
Instructor
Rahul Kumar

Category
Python and R outline

Course Fees
Quotation on request Rs.
Course name: Supervised and Unsupervised Learning using Python and R (5 Days)
DATA SCIENCE USING R AND PYTHON
GOOGLE’S SELF-DRIVING CARS AND ROBOTS GET A LOT OF PRESS, BUT THE COMPANY’S REAL FUTURE IS IN MACHINE LEARNING, THE TECHNOLOGY THAT ENABLES COMPUTERS TO GET SMARTER AND MORE PERSONAL.
– ERIC SCHMIDT (GOOGLE CHAIRMAN)
This course is intended to give a holistic understanding of statistical and machine learning and their application using Python and R. The workshop will cover:
 An introduction to business analytics
 An introduction to Python for data analysis
 An introduction to R for data analysis
 An introduction to supervised machine learning algorithms
 An introduction to unsupervised machine learning algorithms
 Understanding the core of machine learning – Gradient Descent Algorithm
 Understanding various sampling strategies and their efficacy in the learning process
 An introduction to ensemble methods for handling imbalanced data
 Hands-on practice using Python and R code on real-life datasets
OBJECTIVE
We are living in an era in which computing has moved from mainframes to personal computers to the cloud. Along the way, we have started generating enormous amounts of data. The manifold increase in computing power has also advanced the application of algorithms that extract insights from this data. In this course, you will learn the nuances of building supervised and unsupervised machine learning models on real-life datasets. We will introduce you to the Anaconda framework and to the Python and R kernels in Jupyter Notebook, and use them to apply statistical and machine learning algorithms that come in handy when solving challenging problems.
At the end of the course, you will have a clear understanding of why machine learning algorithms are needed and of the contexts in which to apply them to solve complex business problems.
WHO SHOULD ATTEND
Irrespective of the type of industry (retail, e-commerce, manufacturing, real estate & construction, telecom, hospitality, banking, healthcare, IT, supply chain & logistics, etc.), data forms the crux of decision making. This course is designed to hone the analytical skills and business acumen of mid-level and senior-level corporate professionals who want to understand the nuances of data science. It helps them use machine learning techniques as an efficient way to generate insights for customers, which in turn improves the bottom line of their organizations.
HARDWARE AND SOFTWARE
 Participants should bring their own laptop (preferably with Windows 7 or higher, or Mac OS, installed).
 Operating system (any of the following):
 Mac OS X with XQuartz
 Windows (version XP or later)
 A minimum of 8 GB RAM is advisable.
INSTALLATIONS:
 For Windows, go to https://docs.anaconda.com/anaconda/install/windows.html
 For MacOS, go to https://docs.anaconda.com/anaconda/install/macos
 For Linux, go to https://docs.anaconda.com/anaconda/install/linux
More about Anaconda can be found at https://docs.anaconda.com. Participants are expected to resolve any software installation issues before the session begins.
PREREQUISITE & COURSE DELIVERABLE
 Participants should have basic programming skills. Participants are expected to spend time with the code set as a home assignment to leverage the classroom training hours to the fullest.
 A high-speed internet connection will be provided at the training venue.
 Deliverables: Python code and dataset, and a soft copy (PDF file) of the content covered.
COURSE OUTLINE
Day 1: Understanding Anaconda Framework platform and other useful packages in Python
Session 1 & 2–Introduction to Business Analytics
 What is Business Analytics
 Why is it needed and how industries are adopting it
 Different components of analytics
 Applications of analytics in different domains
 Statistical learning vs. Machine learning
 What is Data Science and skills of a data scientist
 Introduction to cloud machine learning engines
 Different types of machine learning algorithms–Supervised, Unsupervised and Reinforcement learning
Session 3 & 4–Introduction to Anaconda and Python
 Overview of Anaconda framework
 Python – Variables, objects, loops, conditions, functions
 Python data structures – lists, tuples, dictionaries, sets
 Introduction to NumPy – ndarrays, ndarray indexing, ndarray data types and operations, statistics, sorting and set operations
 Introduction to Pandas – Data ingestion, descriptive statistics, visualization, frequent data operations, merging dataframes, parsing timestamps
 Introduction to visualization – Matplotlib
Session 5–Introduction to R Platform
 Overview of R
 Fundamentals of R – Reading Data Files, Data Manipulation
 R Data structures – lists, dataframes
 Loops and Functions in R
 Useful packages for visualization and modelling
 Run-through of a few of the twenty-three example codes in R
Day 2: Understanding basic statistics and R framework for data analysis
Session 1, 2 & 3–Basics of Statistics
 Probability basics – Benford's Law (application in fraud analysis)
 Random variable – Discrete and Continuous
 Probability density function and cumulative distribution function
 Distribution Family – Gaussian Distribution, Standard Normal Distribution
 Population and Sample
 Central limit theorem, Demonstration of Central limit theorem on finance data
 Hypothesis testing – Z-test, t-test, test for proportion, analysis of variance (ANOVA)
 Degrees of freedom, Covariance and Correlation
 Partial and Semi-partial Correlation
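By way of illustration only (this is not the course's lab material), the central limit theorem demonstration listed above could be sketched as follows. The skewed population here is simulated and merely stands in for real finance data; the sample size and number of samples are assumptions chosen for the example.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, strongly skewed population (exponential), not real finance data.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means cluster around the population mean, and their spread
# is close to sigma / sqrt(n), as the central limit theorem predicts.
print("population mean      :", round(pop_mean, 3))
print("mean of sample means :", round(statistics.fmean(means), 3))
print("sd of sample means   :", round(statistics.stdev(means), 3))
print("sigma / sqrt(n)      :", round(pop_sd / 50 ** 0.5, 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is far from normal.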
Session 4 & 5–Introduction to useful packages in R
 R Data structures – lists, dataframes
 Loops and Functions in R
 Useful packages for visualization and modelling
 Run-through of a few of the twenty-three example codes in R
 Introduction to R Markdown, Rattle, Shiny
Day 3: Understanding supervised learning algorithms
Session 1, 2 & 3–Lab 1: Linear Regression
 Introduction to simple and multiple linear regression
 Regression diagnostics – R-squared, t-test, F-test, error terms distribution, heteroscedasticity, identifying and handling multicollinearity, AIC model selection strategy
 Common task framework for model evaluation – training and test set
 Case study using regression techniques and hands-on using R code for regression
Session 4 & 5–Lab 2: Logistic Regression
 Introduction to logistic regression
 Logistic regression diagnostics: Wald statistics, Hosmer-Lemeshow test, Classification Matrix, Sensitivity, Specificity, ROC Curve, precision, recall, F1-score
 Strategy to find the optimal cutoff
 Bias and variance in the model, Bias vs. variance trade-off
 Case study using logistic regression techniques and hands-on using Python code for regression
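As a rough illustration of the diagnostics listed for this lab, the sketch below derives sensitivity, specificity, precision and F1-score from a classification matrix at a given cutoff. The labels and predicted probabilities are made up, and choosing the cutoff by Youden's index (sensitivity + specificity - 1) is only one common strategy, assumed here for the example rather than taken from the course.

```python
def confusion_counts(y_true, y_prob, cutoff=0.5):
    """Return (tp, fp, tn, fn) after thresholding probabilities at `cutoff`."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def metrics(y_true, y_prob, cutoff=0.5):
    """Compute the usual classification metrics at one cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}

def best_cutoff_youden(y_true, y_prob, cutoffs):
    """Pick the candidate cutoff that maximizes Youden's index."""
    def youden(cutoff):
        m = metrics(y_true, y_prob, cutoff)
        return m["sensitivity"] + m["specificity"] - 1
    best = max(cutoffs, key=youden)
    return best, youden(best)

# Hypothetical labels and model probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

m = metrics(y_true, y_prob, cutoff=0.5)
cutoff, j = best_cutoff_youden(y_true, y_prob, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
print(m)
print("chosen cutoff:", cutoff, "Youden's index:", j)
```

In practice the cutoff may instead be chosen from misclassification costs, which is a business decision rather than a purely statistical one.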
Day 4: Understanding supervised learning and gradient descent algorithm
Session 1–Lab 2: Logistic Regression
 Introduction to logistic regression
 Logistic regression diagnostics: Wald statistics, Hosmer-Lemeshow test, Classification Matrix, Sensitivity, Specificity, ROC Curve, precision, recall, F1-score
 Strategy to find the optimal cutoff
 Bias and variance in the model, Bias vs. variance trade-off
 Case study using logistic regression techniques and hands-on using Python code for regression
Session 2 & 3–Lab 3: Introduction to Gradient Descent
 Hypothesis formulation for linear regression
 Deriving the cost function for linear regression
 Cost function– Intuition for linear regression with one parameter and two parameters
 Gradient descent algorithm–application in linear regression
 Hypothesis formulation for logistic regression
 Deriving the cost function for logistic regression
 Cost function– Intuition for logistic regression
 Gradient descent algorithm–application in logistic regression
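To give a flavour of this lab, here is a minimal sketch of batch gradient descent for linear regression with two parameters. It assumes the usual mean-squared-error cost, J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2); the data, learning rate and iteration count are hypothetical and are not taken from the course material.

```python
def cost(theta0, theta1, xs, ys):
    """Mean-squared-error cost J = 1/(2m) * sum of squared errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on the cost above."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(n_iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                  # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m       # dJ/dtheta1
        # Update both parameters simultaneously, stepping against the gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Hypothetical data generated from y = 1 + 2x, so the fit should recover (1, 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print("theta0:", round(theta0, 4), "theta1:", round(theta1, 4))
print("final cost:", cost(theta0, theta1, xs, ys))
```

The same loop carries over to logistic regression by replacing the hypothesis with the sigmoid of theta0 + theta1*x and the cost with the log-loss; the gradient expressions keep the same form.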
Session 4–Lab 4: Decision Trees
 Decision tree – Classification and regression trees (CART), Gini Index, Entropy
 Decision tree – Chi-square automatic interaction detection (CHAID)
 Case study using decision tree techniques
 Hands-on using Python code
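For orientation, the two impurity measures named above can be computed in a few lines. This is a small sketch with made-up class labels; the helper names are hypothetical and it is not the lab's own code.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def weighted_impurity(left, right, measure):
    """Impurity of a binary split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Hypothetical node with a 50/50 class mix, and one candidate split of it.
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes", "no"], ["no", "no"]

print("parent gini   :", gini(parent))
print("parent entropy:", entropy(parent))
print("split gini    :", weighted_impurity(left, right, gini))
print("split entropy :", weighted_impurity(left, right, entropy))
```

A tree-growing algorithm such as CART evaluates candidate splits like this one and keeps the split that reduces impurity the most.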
Session 5–Lab 5: Naïve Bayes Classifier
 Naïve Bayes classifier on structured data
 Case study using decision tree techniques
 Hands-on using R code for CART and Naïve Bayes Classifier using the caret package
Day 5: Understanding unsupervised learning and ensemble methods
Session 1–Lab 6: Clustering and Segmentation
 Supervised and Unsupervised learning
 Clustering–Hierarchical, K means
 Clustering diagnostics–Dendrogram, Calinski and Harabasz index, Silhouette width
 Case study using hierarchical clustering and K–means clustering techniques
 Hands-on using Python code for hierarchical and K–means clustering
Session 2 & 3–Lab 7: Other Machine learning models (Ensemble Methods)
 What is Machine learning
 Different sampling strategies–Bootstrapping, Up–Sample, Down–Sample, Synthetic Sample, Cross–Validation Data
 Introduction to Bagging–Random Forest
 Other Bagging algorithms
 Introduction to Boosting– Adaptive boosting
 Other Boosting algorithms
 Case study on imbalanced data and application of sampling strategies & ensemble methods
 Hands-on using R code on imbalanced data
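As a simple illustration of the up-sample and down-sample strategies named above, the sketch below rebalances a toy dataset by random resampling. The function name, data and class ratio are hypothetical; the lab itself uses R, and Python is used here only for a compact example.

```python
import random
from collections import Counter

def rebalance(rows, labels, strategy="up", seed=0):
    """Randomly up-sample smaller classes or down-sample larger classes.

    strategy="up"   : every class is grown to the size of the largest class.
    strategy="down" : every class is shrunk to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if strategy == "up" else min(sizes)

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if len(group) >= target:
            # Enough rows: draw `target` of them without replacement.
            chosen = rng.sample(group, target)
        else:
            # Too few rows: keep all and add duplicates drawn with replacement.
            chosen = group + rng.choices(group, k=target - len(group))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Hypothetical imbalanced data: 8 rows of class 0 and 2 rows of class 1.
rows = list(range(10))
labels = [0] * 8 + [1] * 2

up_rows, up_labels = rebalance(rows, labels, strategy="up")
down_rows, down_labels = rebalance(rows, labels, strategy="down")
print("up-sampled  :", Counter(up_labels))
print("down-sampled:", Counter(down_labels))
```

Resampling of this kind is normally applied to the training set only, so that the test set keeps the original class proportions.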
Session 4 & 5–Lab 8: Multivariate Gaussian Model for Anomaly Detection
 What is a fraud/anomaly
 Gaussian Model for anomaly detection
 Multivariate Gaussian Model for anomaly detection
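The idea behind this lab can be sketched with a simplified model that treats each feature as an independent Gaussian, which is the special case of a multivariate Gaussian with a diagonal covariance matrix. The full multivariate model additionally captures correlations between features. The data, the threshold `epsilon` and the function names below are hypothetical.

```python
import math
import statistics

def fit_gaussian(rows):
    """Estimate the mean and variance of each feature from normal examples."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.pvariance(col) for col in columns]
    return means, variances

def density(row, means, variances):
    """Product of per-feature Gaussian densities (features assumed independent)."""
    p = 1.0
    for x, mu, var in zip(row, means, variances):
        p *= math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

def is_anomaly(row, means, variances, epsilon):
    """Flag a point whose density under the fitted model falls below epsilon."""
    return density(row, means, variances) < epsilon

# Hypothetical two-feature observations assumed to be normal behaviour.
normal_rows = [(10.0, 1.0), (11.0, 1.2), (9.5, 0.9),
               (10.5, 1.1), (10.2, 1.0), (9.8, 0.95)]
means, variances = fit_gaussian(normal_rows)

epsilon = 0.01  # assumed threshold; in practice tuned on labelled validation data
print(is_anomaly((10.0, 1.0), means, variances, epsilon))   # typical point
print(is_anomaly((25.0, 5.0), means, variances, epsilon))   # far from the data
```

Points that are individually plausible on each feature but unusual in combination are where the full multivariate model, with its covariance matrix, improves on this independent-feature version.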
COURSE SCHEDULE
Day 1: Understanding Anaconda Framework platform and other useful packages in Python
This day will primarily cover an introduction to business analytics and an introduction to Anaconda and Python.
Session 1, 9:00 AM to 10:15 AM: Introduction to Business Analytics
Session 2, 10:30 AM to 11:15 AM: Introduction to Business Analytics (cont.)
Session 3, 12:00 PM to 1:15 PM: Introduction to Anaconda and Python platform
Session 4, 2:15 PM to 3:30 PM: Introduction to Anaconda and Python platform (cont.)
Session 5, 3:45 PM to 5:00 PM: Introduction to R
Day 2: Understanding basic statistics and R framework for data analysis
This day is primarily devoted to concept building on basic statistics and to an introduction to useful R packages for data analysis.
Session 1, 9:00 AM to 10:15 AM: Basics of Statistics
Session 2, 10:30 AM to 11:45 AM: Basics of Statistics (cont.)
Session 3, 12:00 PM to 1:15 PM: Basics of Statistics (cont.)
Session 4, 2:15 PM to 3:30 PM: Introduction to R
Session 5, 3:45 PM to 5:00 PM: Introduction to R (cont.)
Day 3: Understanding supervised learning algorithms
This day will cover concept building on supervised learning, with hands-on linear regression using R code and logistic regression using Python code.
Session 1, 9:00 AM to 10:15 AM: Lab 1: Multiple Linear Regression
Session 2, 10:30 AM to 11:45 AM: Lab 1: Multiple Linear Regression (cont.)
Session 3, 12:00 PM to 1:15 PM: Lab 1: Multiple Linear Regression (cont.)
Session 4, 2:15 PM to 3:30 PM: Lab 2: Logistic Regression
Session 5, 3:45 PM to 5:00 PM: Lab 2: Logistic Regression (cont.)
Day 4: Understanding supervised learning and gradient descent algorithm
This day will cover concept building on supervised learning and gradient descent.
Session 1, 9:00 AM to 10:15 AM: Lab 2: Logistic Regression (cont.)
Session 2, 10:30 AM to 11:45 AM: Lab 3: Gradient Descent
Session 3, 12:00 PM to 1:15 PM: Lab 3: Gradient Descent (cont.)
Session 4, 2:15 PM to 3:30 PM: Lab 4: Decision Tree
Session 5, 3:45 PM to 5:00 PM: Lab 5: Naïve Bayes
Day 5: Understanding unsupervised learning and ensemble methods
This day will cover concept building on unsupervised learning and ensemble methods.
Session 1, 9:00 AM to 10:15 AM: Lab 6: Clustering and Segmentation
Session 2, 10:30 AM to 11:45 AM: Lab 7: Ensemble Methods
Session 3, 12:00 PM to 1:15 PM: Lab 7: Ensemble Methods (cont.)
Session 4, 2:15 PM to 3:30 PM: Lab 8: Gaussian Model
Session 5, 3:45 PM to 5:00 PM: Lab 8: Gaussian Model (cont.)