
Instructor
Rahul Kumar

Category
R Programming and Data science

Course Fees
Quotation on request
Course name : R Programming and Data Science Course 5 Days
R Programming and Data Science
Harvard Business Review says data scientist is the sexiest job of the 21st century, but demand for data scientists is racing ahead of supply. People with the necessary skills are scarce, primarily because the discipline is new. This session will give an overview of data science and its role in complex business decision-making.
This course is intended to give a holistic understanding of R programming and its use in solving business problems. The workshop will cover:
• An introduction to business analytics
• An introduction to the R platform for data analysis
• 23 worked examples illustrating the libraries, functions and features available in R
• An introduction to R packages for building robust, complex machine learning models
• Hands-on practice with R code and real-life datasets
• An introduction to supervised machine learning algorithms
• An introduction to unsupervised machine learning algorithms
OBJECTIVE
We are living in an era in which computing has moved from mainframes to personal computers to the cloud, and along the way we have started generating enormous amounts of data. The manifold increase in computing power has also advanced the application of algorithms that extract insights from this data. In this course, you will learn the nuances of using R to build supervised and unsupervised machine learning models on real-life datasets. We'll introduce you to the R platform, advanced concepts in R, and some of the statistical and machine learning algorithms that come in handy when solving challenging problems.
By the end of the course you will have a clear understanding of the need for the R programming language, of machine learning algorithms, and of the context in which these algorithms are applied to solve complex business problems.
WHO SHOULD ATTEND
Irrespective of industry (retail, e-commerce, manufacturing, real estate & construction, telecom, hospitality, banking, healthcare, IT, supply chain & logistics, etc.), data forms the crux of decision making. This course is designed to hone the analytical skills and business acumen of mid-level and senior-level corporate professionals who want to understand the nuances of data science, and to help them use machine learning techniques as an efficient way to generate insights for customers, which in turn improves the bottom line of their organizations.
HARDWARE AND SOFTWARE

Participants should bring their own laptop (preferably with Windows 7 or higher, or Mac OS, installed).

Operating system (any of the following):

Mac OS X with XQuartz

Windows (version XP or later)

A minimum of 8 GB RAM is advisable.
INSTALLATIONS:

Participants should have the latest versions of R and RStudio installed on their system.

Install R first, then RStudio. The latest versions of the software can be found at:

R can be downloaded from: Link to download R

RStudio can be downloaded from: Link to download R Studio
PREREQUISITE & COURSE DELIVERABLE

Participants should have basic programming skills. Participants are expected to work through the code assigned as homework so that the classroom training hours can be used to the fullest.

A high-speed internet connection will be provided at the training venue.
Deliverables: R code and dataset, plus a soft copy (PDF file) of the content covered.
COURSE OUTLINE
Day 1: Understanding the R platform
Session 1–Introduction to Business Analytics
• What is business analytics
• Why it is needed and how industries are adopting it
• Different components of analytics
• Applications of analytics in different domains
• Types of machine learning algorithms: supervised, unsupervised, reinforcement and evolutionary learning
Session 2 & 3–Introduction to R Platform
• Overview of R
• Overview of RStudio (script editor, console, global environment)
• Creating a working directory
• Built-in datasets in R
• Understanding the structure and summary of data
• The head and tail functions and their use
• Fundamentals of R
  - Reading data files (text file, CSV file, database connectivity)
  - Writing data files and charts to the working directory
• Data manipulation in R
  - Creating new variables
  - Updating a variable
  - Subsetting: use of the which function and the subset function
  - Sort, merge, aggregate
  - Reshaping data
  - Type casting of variables
  - Data imputation techniques for handling missing values
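As an illustration of the data manipulation topics listed above, here is a minimal sketch in base R. It is not taken from the course material: the choice of the built-in mtcars dataset, the variable names (kpl, hp_imputed, size) and the artificially introduced missing values are all assumptions made for the example.

```r
# Illustrative only: dataset choice and variable names are assumptions.
data(mtcars)
cars <- mtcars
cars$model <- rownames(cars)

# Create a new variable: fuel economy in km per litre (1 mpg is about 0.4251 km/l).
cars$kpl <- cars$mpg * 0.4251

# Subsetting with which(): row indices of four-cylinder cars.
four_cyl <- cars[which(cars$cyl == 4), ]

# Subsetting with subset(): filter rows and select columns in one call.
efficient <- subset(cars, mpg > 25, select = c(model, mpg, cyl))

# Sort by mpg, highest first.
sorted <- cars[order(-cars$mpg), ]

# Aggregate: mean mpg per cylinder count.
avg_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# Merge with a small (hypothetical) lookup table keyed on cyl.
labels <- data.frame(cyl = c(4, 6, 8), size = c("small", "medium", "large"))
merged <- merge(cars, labels, by = "cyl")

# Type casting: treat cylinder count as a categorical (factor) variable.
cars$cyl_f <- as.factor(cars$cyl)

# Mean imputation: blank out two values on purpose, then fill them in.
cars$hp_imputed <- cars$hp
cars$hp_imputed[c(2, 5)] <- NA
cars$hp_imputed[is.na(cars$hp_imputed)] <- mean(cars$hp_imputed, na.rm = TRUE)

print(avg_mpg)
print(head(sorted[, c("model", "mpg")]))
```

Mean imputation is only one of several possible techniques; it is used here because it needs no extra packages.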
Session 4 & 5–Introduction to R Platform
• Data types in R
  - Data frame
  - List
  - Vector
  - Matrix
  - Factor variables
• Loops and functions in R
  - The if-else construct in R
  - While loops and for loops in R
  - The apply function family
• Useful packages for visualization and modelling
  - "sandwich", "rms", "Deducer", "ROCR", "cairoDevice", "rattle", "caret", "lubridate", "plotrix", "xts", "manipulate", "Quandl", "ggplot2", "shiny"
Day 2: Using the R platform for data analysis
Session 1 & 2–R programming for Data analysis
• Run through twenty-three example scripts in R
• How to understand correlation amongst variables
• Generate random numbers and plot a histogram and a normality plot
• Read input from the console, create a directory and sink output files into it
• Understand skewness in data using data from the datasets package
• Use the mtcars dataset to build a regression and visualize it using the plot function
• Use the mtcars dataset to build a regression and visualize it using the ggplot2 package
• Create a matrix and an array in R
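The mtcars regression example mentioned in the list above could look roughly like the following. This is a hedged sketch rather than the course's own script: the choice of mpg as the response and wt as the predictor is an assumption, and only base R (the stats and graphics packages) is used.

```r
# Illustrative only: the choice of mpg ~ wt is an assumption.
data(mtcars)

# Correlation between fuel economy and weight.
r <- cor(mtcars$mpg, mtcars$wt)
print(r)

# Simple linear regression: miles per gallon explained by weight (1000 lbs).
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))               # intercept and slope
print(summary(fit)$r.squared)  # share of variance explained

# Visualize with the base plot function and overlay the fitted line.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "mpg vs. weight (mtcars)")
abline(fit, col = "red", lwd = 2)
```

The slope for wt comes out negative, meaning heavier cars in this dataset tend to have lower fuel economy. For a simple regression with one predictor, the R-squared equals the squared correlation.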
Session 3, 4 & 5–R programming for Data analysis
• Run through twenty-three example scripts in R
• Create a factor variable in R
• Use the lubridate package for date manipulation in R
• Read data from Yahoo Finance, subset it and plot it using the plot function
• Apply the central limit theorem to data imported from Yahoo Finance
• Overlay two trend lines on the same plot
• Use the plotrix package to make a 3D pie chart
• Use the xts package for time series manipulation
Day 3: Introduction to ggplot2, R Markdown and Rattle
Session 1–R programming for Data analysis
• Run through twenty-three example scripts in R
• Boxplot using data from Yahoo Finance, and plotting multiple charts in the same window
• Use the manipulate package with the subset and plot functions to make interactive charts
• Creating user-defined functions in R
• Using the sample function to generate multiple samples from a dataset
• Understand the uniform and binomial distributions using R
• Create multiple random samples from a dataset and list them as columns in a spreadsheet
• Using a Quandl dataset to demonstrate the central limit theorem
Session 2 & 3–The aesthetics of the ggplot2 package
• How variables in the data are mapped to visual properties
• Specifying color, line, shape, text and justification
Session 4 & 5–Introduction to useful packages in R
• Introduction to R Markdown: Markdown scripts to create reports
• Introduction to Rattle
Day 4: Introduction to Shiny and Basics of Statistics
Session 1 & 2–Introduction to useful packages in R
• Introduction to Shiny
• Building an online dashboard from HR data with ggplot2 and other libraries
• Deploying the dashboard in the cloud
Session 3 & 4–Basics of Statistics
• Random variables: discrete and continuous
• Probability density function and cumulative distribution function
• Distribution families: Gaussian distribution, standard normal distribution
• Population and sample
• Central limit theorem
• Demonstration of the central limit theorem on finance data
• Hypothesis testing: Z-test, t-test, test for proportion
• Covariance and correlation
Session 5–Lab 1: Linear Regression
• Introduction to simple and multiple linear regression
• Regression diagnostics: R-squared, t-test, F-test, distribution of error terms, heteroscedasticity
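The central limit theorem and hypothesis testing topics above can be illustrated with a short simulation. This is a sketch under stated assumptions: it uses simulated exponential data instead of the finance data used in the course, and the seed, sample size and number of repetitions are arbitrary choices.

```r
# Illustrative only: simulated data stands in for the course's finance data.
set.seed(42)  # arbitrary seed so the run is reproducible

n_samples   <- 2000  # number of repeated samples
sample_size <- 40    # observations per sample

# Population: exponential with rate 1 (mean 1, sd 1), which is strongly skewed.
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# Central limit theorem: the sample means should be roughly normal,
# centred on 1 with standard deviation close to 1 / sqrt(sample_size).
print(mean(sample_means))
print(sd(sample_means))
print(1 / sqrt(sample_size))

hist(sample_means, breaks = 40,
     main = "Distribution of sample means",
     xlab = "Sample mean")

# One-sample t-test: is the mean of a single sample consistent with mu = 1?
one_sample <- rexp(sample_size, rate = 1)
tt <- t.test(one_sample, mu = 1)
print(tt$p.value)
```

Although each individual observation is drawn from a skewed distribution, the histogram of sample means looks approximately bell-shaped, which is what the theorem predicts.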
Day 5: Introduction to Machine Learning and its implementation using R
Session 1–Lab 1: Linear Regression
• Case study using regression techniques
• Hands-on with R code for regression using the stats package and the caret package
Session 2 & 3–Lab 2: Logistic Regression
• Introduction to logistic regression
• Logistic regression diagnostics: Wald statistics, Hosmer-Lemeshow test, classification matrix, sensitivity, specificity, ROC curve
• Strategy to find the optimal cutoff
• Case study using logistic regression techniques
• Hands-on with R code for logistic regression using the stats package and the caret package
Session 4 & 5–Lab 3: Clustering and Segmentation
• Supervised and unsupervised learning
• Clustering: hierarchical, K-means
• Clustering diagnostics: dendrogram, Calinski and Harabasz index, silhouette width
• Case study using hierarchical and K-means clustering techniques
• Hands-on with R code for hierarchical and K-means clustering
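To give a flavour of the clustering lab, here is a minimal sketch of K-means and hierarchical clustering in base R. The dataset (the built-in iris measurements), the number of clusters, the seed and the Ward linkage are assumptions made for this example and are not stated in the course outline.

```r
# Illustrative only: dataset, k = 3 and linkage method are assumptions.
data(iris)

# Standardize the four numeric columns so that no variable dominates distances.
x <- scale(iris[, 1:4])

# K-means with 3 clusters; nstart = 25 tries several random initializations.
set.seed(123)  # arbitrary seed so the run is reproducible
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical clustering on Euclidean distances with Ward linkage.
hc <- hclust(dist(x), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)  # cut the tree into 3 groups

# Dendrogram as a visual diagnostic.
plot(hc, labels = FALSE, main = "Hierarchical clustering (Ward)")

# Cross-tabulate the two solutions to see how far they agree.
print(table(kmeans = km$cluster, hierarchical = hc_groups))
```

Cluster labels are arbitrary, so the two methods may number the same group differently; the cross-tabulation shows the overlap regardless of labelling.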
COURSE SCHEDULE
Day 1: Understanding the R platform
This day will primarily cover an introduction to business analytics and an introduction to the R platform.
Topic | Session | Start | End
Introduction to Business Analytics | 1 | 9:00 AM | 10:15 AM
Introduction to R platform | 2 | 10:30 AM | 11:15 AM
Introduction to R platform (cont.) | 3 | 12:00 PM | 1:15 PM
Introduction to R platform (cont.) | 4 | 2:15 PM | 3:30 PM
Introduction to R platform (cont.) | 5 | 3:45 PM | 5:00 PM
Day 2: Using the R platform for data analysis
This day is primarily devoted to writing R code for data analysis.
Topic | Session | Start | End
R Programming for data analysis | 1 | 9:00 AM | 10:15 AM
R Programming for data analysis (cont.) | 2 | 10:30 AM | 11:45 AM
R Programming for data analysis (cont.) | 3 | 12:00 PM | 1:15 PM
R Programming for data analysis (cont.) | 4 | 2:15 PM | 3:30 PM
R Programming for data analysis (cont.) | 5 | 3:45 PM | 5:00 PM
Day 3: Introduction to ggplot2, R Markdown and Rattle
This day will cover packages such as ggplot2, R Markdown and Rattle.
Topic | Session | Start | End
R Programming for data analysis (cont.) | 1 | 9:00 AM | 10:15 AM
The aesthetics of the ggplot2 package | 2 | 10:30 AM | 11:45 AM
The aesthetics of the ggplot2 package | 3 | 12:00 PM | 1:15 PM
Introduction to useful packages in R | 4 | 2:15 PM | 3:30 PM
Introduction to useful packages in R | 5 | 3:45 PM | 5:00 PM
Day 4: Introduction to Shiny and Basics of Statistics
This day will cover concept building on the basics of statistics.
Topic | Session | Start | End
Introduction to useful packages in R | 1 | 9:00 AM | 10:15 AM
Introduction to useful packages in R (cont.) | 2 | 10:30 AM | 11:45 AM
Basics of statistics | 3 | 12:00 PM | 1:15 PM
Basics of statistics (cont.) | 4 | 2:15 PM | 3:30 PM
Lab 1: Simple Linear Regression | 5 | 3:45 PM | 5:00 PM
Day 5: Introduction to Machine Learning and its implementation using R
This day will cover concept building on supervised and unsupervised machine learning.
Topic | Session | Start | End
Lab 1: Simple Linear Regression (cont.) | 1 | 9:00 AM | 10:15 AM
Lab 2: Logistic Regression | 2 | 10:30 AM | 11:45 AM
Lab 2: Logistic Regression (cont.) | 3 | 12:00 PM | 1:15 PM
Lab 3: Clustering and Segmentation | 4 | 2:15 PM | 3:30 PM
Lab 3: Clustering and Segmentation (cont.) | 5 | 3:45 PM | 5:00 PM