Data Analytics Literacy

Authors: Janani Ravi, Axel Sirota

Data Analytics is the detection, interpretation, and communication of meaningful patterns in data.

What you will learn:

  • Describe the general analytics workflow
  • Differentiate data types and identify analyses suitable for specific types of data
  • Determine which analysis is appropriate for a specific business problem
  • Apply hypothesis testing to a new business problem (see the sketch after this list)
  • Describe the key components of an RDBMS (Relational Database Management System) architecture
  • Query and process data using OLTP (Online Transactional Processing) systems
  • Write portable SQL queries against data
  • Define schemas
  • Describe common database programming constructs (stored procedures, triggers, views, etc.)
  • Describe the components of an OLAP (Online Analytical Processing) system
  • Differentiate tabular vs cube data models
  • Write analytical queries
  • Work with nested/repeated data
  • Deal with streaming data in an OLAP context
  • Describe the components of a NoSQL (Not Only SQL) database
  • Differentiate columnar/wide-column databases vs document databases
  • Identify when each is appropriate
  • Describe common methods for getting data in and out of systems: scripting (including specialty languages such as Pig), bulk loading, and streaming inserts
  • Compare and contrast the ETL (extract, transform, and load) workflow with the ELT (extract, load, and transform) workflow
  • Describe the “four V’s” of Big Data (volume, velocity, variety, and veracity) and how they are used to differentiate Big Data problems from “small data”
  • Describe the pros and cons of using cloud vs on-premise solutions for data management
  • Describe the pros and cons of using “hand-rolled” Hadoop/Hive/Spark vs proprietary systems like Teradata/Oracle
  • Identify key decision factors between services on AWS, Azure, GCP, etc.
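
As a quick illustration of the hypothesis-testing objective above, here is a minimal Python sketch of a two-sample t-test using scipy. The scenario, group labels, and numbers are invented for illustration only.

    # Did a new landing page change the average time to convert?
    # All data below is synthetic, generated purely for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    control = rng.normal(loc=30.0, scale=5.0, size=200)  # seconds, old page
    variant = rng.normal(loc=28.5, scale=5.0, size=200)  # seconds, new page

    # Null hypothesis: both groups share the same mean.
    t_stat, p_value = stats.ttest_ind(control, variant)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject the null hypothesis at the 5% level.")
    else:
        print("Fail to reject the null hypothesis at the 5% level.")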

Prerequisites

  • Basic mathematics
  • Basic computer use
  • Basic data skills, such as using spreadsheets

Beginner

Learn the fundamentals of representing, processing, and shaping data for analysis.

Representing, Processing, and Preparing Data

by Janani Ravi

Jun 19, 2019 / 2h 45m

Description

Data science and data modeling are fast emerging as crucial capabilities that every enterprise and every technologist must possess. As the process of actually constructing models becomes democratized, the general view is shifting toward using the right data and using the data right. In this course, Representing, Processing, and Preparing Data, you will gain the ability to correctly represent information from your domain as numeric data and get it into a form where the full capabilities of models can be leveraged. First, you will learn how outliers and missing data can be dealt with in a theoretically sound manner. Next, you will discover how to use spreadsheets, programming languages, and relational databases to work with your data. You will see the different types of data that you may deal with in the real world and how you can collect and integrate data into a common destination to eliminate silos. Finally, you will round out the course by working with visualization tools that allow every member of an enterprise to work with data and extract meaningful insights. When you are finished with this course, you will have the skills and knowledge to use the right data sources, cope with data quality issues, and choose the right technologies to extract insights from your enterprise data.
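
As a taste of the data-preparation techniques this course covers, here is a minimal pandas sketch of two common steps: filling missing values with the median and flagging outliers with the 1.5 × IQR rule. The DataFrame and column name are made up for illustration.

    # Median imputation and IQR-based outlier flagging on toy data.
    import pandas as pd

    df = pd.DataFrame({"revenue": [120.0, 95.0, None, 130.0, 2400.0, 110.0]})

    # Fill missing values with the column median (robust to outliers).
    df["revenue"] = df["revenue"].fillna(df["revenue"].median())

    # Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
    q1, q3 = df["revenue"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["is_outlier"] = ~df["revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    print(df)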

Table of contents
  1. Course Overview
  2. Understanding Data Cleaning and Preparation Techniques
  3. Preparing Data for Analysis Using Spreadsheets and Python
  4. Collecting Data to Extract Insights
  5. Loading and Processing Data Using Relational Databases
  6. Representing Insights Obtained from Data

Combining and Shaping Data

by Janani Ravi

Jun 21, 2019 / 3h 28m

Description

Connecting the dots between data from different sources is becoming one of the most sought-after skills for everyone from business professionals to data scientists. In this course, Combining and Shaping Data, you will gain the ability to connect the dots by pulling together data from disparate sources and shaping it so that extracting connections and relationships becomes relatively easy. First, you will learn how the most common constructs in shaping and combining data stay the same across spreadsheets, programming languages, and databases. Next, you will discover how to use joins and VLOOKUPs to obtain wide datasets, and then use pivots to shape that data into long form. You will then see how both long and wide data can be aggregated to obtain higher-level insights. You will work with Excel spreadsheets and SQL as well as Python. Finally, you will round out the course by integrating data from a variety of sources and working with streaming data, which helps your enterprise gain real-time insights into the world around it. When you are finished with this course, you will have the skills and knowledge to pull together data from disparate sources, including streaming sources, to construct integrated data models that truly connect the dots.
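
To make the join-then-reshape idea concrete, here is a minimal pandas sketch: a SQL-style join (the programmatic counterpart of a spreadsheet VLOOKUP) to build a wide dataset, then a melt to long form, then an aggregation. All table contents and column names are invented for illustration.

    import pandas as pd

    customers = pd.DataFrame({"customer_id": [1, 2], "region": ["East", "West"]})
    sales = pd.DataFrame({"customer_id": [1, 1, 2],
                          "q1_sales": [100, 150, 80],
                          "q2_sales": [110, 160, 90]})

    # Inner join: enrich each sale with its customer's region.
    wide = sales.merge(customers, on="customer_id", how="inner")

    # Un-pivot the quarterly columns from wide form into long form.
    long_form = wide.melt(id_vars=["customer_id", "region"],
                          value_vars=["q1_sales", "q2_sales"],
                          var_name="quarter", value_name="sales")

    # Aggregate the long data for a higher-level insight.
    print(long_form.groupby(["region", "quarter"])["sales"].sum())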

Table of contents
  1. Course Overview
  2. Exploring Techniques to Combine and Shape Data
  3. Combining and Shaping Data Using Spreadsheets
  4. Combining and Shaping Data Using SQL
  5. Combining and Shaping Data Using Python
  6. Integrating Data from Disparate Sources into a Data Warehouse
  7. Working with Streaming Data Using a Data Warehouse

Intermediate

Learn to apply descriptive statistics to data, and design experiments to further your analysis.

Summarizing Data and Deducing Probabilities

by Janani Ravi

Jun 20, 2019 / 2h 49m

Description

Data science and data modeling are fast emerging as crucial capabilities that every enterprise and every technologist must possess. Increasingly, different organizations are using the same models and the same modeling tools, so what differs is how those models are applied to the data. It is therefore really important that you know your data well. In this course, Summarizing Data and Deducing Probabilities, you will gain the ability to summarize your data using univariate, bivariate, and multivariate statistics in a range of technologies. First, you will learn how measures of central tendency, such as the mean, can be calculated in Microsoft Excel and Python. Next, you will discover how to use correlations and covariances to explore pairwise relationships. You will then see how those constructs can be generalized to multiple variables using covariance and correlation matrices. You will understand and apply Bayes' Theorem, one of the most powerful and widely used results in probability, to build a robust classifier. Finally, you will use Seaborn, a visualization library, to represent statistics visually. When you are finished with this course, you will have the skills and knowledge to use univariate, bivariate, and multivariate descriptive statistics from Excel and Python in order to find relationships and calculate probabilities.
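
As a preview of the Bayes' Theorem material, here is a worked example in plain Python: P(A|B) = P(B|A) · P(A) / P(B), applied to a toy spam filter. Every probability below is made up for illustration.

    # Bayes' rule on invented spam-filter numbers.
    p_spam = 0.20                 # prior: P(spam)
    p_word_given_spam = 0.60      # likelihood: P("free" | spam)
    p_word_given_ham = 0.05      # likelihood: P("free" | not spam)

    # Total probability of seeing the word "free" in any message.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

    # Posterior: probability a message is spam given it contains "free".
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(f"P(spam | 'free') = {p_spam_given_word:.3f}")  # 0.750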

Table of contents
  1. Course Overview
  2. Understanding Descriptive Statistics for Data Analysis
  3. Performing Exploratory Data Analysis in Spreadsheets
  4. Summarizing Data and Deducing Probabilities Using Python
  5. Understanding and Applying Bayes' Rule
  6. Visualizing Probabilistic and Statistical Data Using Seaborn

Experimental Design for Data Analysis

by Janani Ravi

Jun 20, 2019 / 2h 45m

Description

Providing crisp, clear, actionable points of view to senior executives is becoming an increasingly important role of data scientists and data professionals. A point of view should represent a hypothesis, ideally backed by data. In this course, Experimental Design for Data Analysis, you will gain the ability to construct such hypotheses from data and use rigorous frameworks to test whether they hold true. First, you will learn how inferential statistics and hypothesis testing form the basis of data modeling and machine learning. Next, you will discover how the process of building machine learning models is akin to that of designing an experiment, and how training and validation techniques help rigorously evaluate the results of such experiments. Then, you will study various forms of cross-validation, including both singular and iterative techniques to cope with independent, identically distributed data as well as grouped data. Finally, you will round out the course by learning how to refine your models using these techniques together with hyperparameter tuning. When you're finished with this course, you will have the skills and knowledge to build and evaluate models, including machine learning models, using rigorous cross-validation frameworks and hyperparameter tuning.
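
To make the cross-validation and tuning workflow concrete, here is a minimal scikit-learn sketch that scores several candidate hyperparameter values with 5-fold cross-validation. The parameter grid and model choice are illustrative, not prescriptions from the course.

    # Grid search over k for k-nearest neighbors, scored by 5-fold CV.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    search = GridSearchCV(
        estimator=KNeighborsClassifier(),
        param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
        cv=KFold(n_splits=5, shuffle=True, random_state=0),
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))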

Table of contents
  1. Course Overview
  2. Designing an Experiment for Data Analysis
  3. Building and Training a Machine Learning Model
  4. Understanding and Overcoming Common Problems in Data Modeling
  5. Leveraging Different Validation Strategies in Data Modeling
  6. Tuning Hyperparameters Using Cross Validation Scores

Advanced

Learn to apply common statistical models to business problems, and to recognize factors that impact your communication of findings.

Interpreting Data with Statistical Models

by Axel Sirota

May 22, 2019 / 2h 54m

Description

Data is everywhere, from the newspaper you read on the subway to the report you are using to analyze yesterday's stock market performance. In this course, Interpreting Data with Statistical Models, you will gain the ability to tackle the problems that appear in your work, choose the right statistical analysis for each, and interpret the results to obtain insights. First, you will learn the very basics of statistics. Next, you will discover hypothesis testing for comparing variables. Finally, you will explore how to make multiple comparisons and detect functional relationships with ANOVA and regression. When you're finished with this course, you will have the skills and knowledge of data analysis and statistical models needed to make your data speak for itself.
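
As a small taste of the ANOVA and regression material, here is a minimal scipy sketch: a one-way ANOVA across three groups, followed by a simple linear regression. All data is synthetic, generated purely for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_a = rng.normal(10.0, 2.0, size=30)
    group_b = rng.normal(11.0, 2.0, size=30)
    group_c = rng.normal(13.0, 2.0, size=30)

    # One-way ANOVA; null hypothesis: all three groups share the same mean.
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

    # Simple linear regression to detect a functional relationship.
    x = np.arange(30)
    y = 2.0 * x + rng.normal(0.0, 3.0, size=30)
    result = stats.linregress(x, y)
    print(f"slope = {result.slope:.2f}, r^2 = {result.rvalue ** 2:.3f}")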

Table of contents
  1. Course Overview
  2. Thinking Like a Statistician
  3. Testing a Hypothesis
  4. Comparing Categorical Values with Frequency Analysis
  5. Analyzing Experiments with ANOVA
  6. Comparing Groups and Effects with ANOVA
  7. Predicting Linear Relationships with Regression
  8. Predicting Non-linear Relationships with Regression

Communicating Data Insights

by Janani Ravi

Jun 21, 2019 / 2h 27m

Description

Providing crisp, clear, actionable points of view to senior executives is becoming an increasingly important role of data scientists and data professionals. In this course, Communicating Data Insights, you will gain the ability to summarize complex information into clear and actionable insights. First, you will learn how to sum up the important descriptive statistics from any numeric dataset. Next, you will discover how to build and use specialized visual representations such as candlestick charts, Sankey diagrams, and funnel charts in Python. You will then see how the data behind such representations can be fed in from enterprise-wide sources such as data warehouses and ETL pipelines. Finally, you will round out the course by working with data residing in different public cloud platforms, and even in a hybrid environment, that is, with some of it on-premises and some in the cloud. When you're finished with this course, you will have the skills and knowledge to pull together data from disparate sources and use nifty visualizations to convey crisp, actionable points of view to a senior executive audience.
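
To show what one of these specialized visualizations looks like in code, here is a minimal Plotly sketch of a funnel chart; Sankey diagrams and candlestick charts follow the same go.Figure pattern. The stage names and counts are invented for illustration.

    # A funnel chart of an illustrative checkout flow.
    import plotly.graph_objects as go

    fig = go.Figure(go.Funnel(
        y=["Visited site", "Viewed product", "Added to cart", "Purchased"],
        x=[10000, 4200, 1300, 600],
    ))
    fig.update_layout(title="Checkout funnel (illustrative data)")
    fig.show()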

Table of contents
  1. Course Overview
  2. Communicating Insights from Statistical Data
  3. Communicating Insights from Business Data
  4. Visualizing Distributions and Relationships in Data
  5. Integrating Data in a Multi-cloud Environment
  6. Integrating Data in a Hybrid Environment