  • Instructor
    Rahul Kumar
  • Category
    R Programming and Data science
  • Course Fees
    Quotation on request

Course name: R Programming and Data Science Course (5 Days)

R Programming and Data Science

HARVARD BUSINESS REVIEW SAYS DATA SCIENTIST IS THE SEXIEST JOB OF THE 21ST CENTURY, BUT DEMAND FOR DATA SCIENTISTS IS RACING AHEAD OF SUPPLY. PEOPLE WITH THE NECESSARY SKILLS ARE SCARCE, PRIMARILY BECAUSE THE DISCIPLINE IS NEW. THIS SESSION WILL GIVE AN OVERVIEW OF DATA SCIENCE AND ITS ROLE IN COMPLEX BUSINESS DECISION-MAKING.

This course is intended to give a holistic understanding of R programming and its use in solving business problems. The workshop will cover:

  • An introduction to business analytics
  • An introduction to R platform for data analysis
  • Twenty-three worked examples that illustrate the libraries, functions and features available in R
  • An introduction to packages that can be used in R to build robust and complex machine learning models
  • Hands-on practice with R code and real-life datasets
  • An introduction to supervised machine learning algorithms
  • An introduction to unsupervised machine learning algorithms

OBJECTIVE

We are living in an era in which computing has moved from mainframes to personal computers to the cloud, and along the way we have started generating huge amounts of data. The manifold increase in computing power has also advanced the application of algorithms that extract insights from this data. In this course, you will learn the nuances of using R to build supervised and unsupervised machine learning models on real-life datasets. We'll introduce you to the R platform, advanced concepts in R, and some of the statistical and machine learning algorithms that come in handy when solving challenging problems.

By the end of the course you will have a clear understanding of why the R programming language is needed, of machine learning algorithms, and of the context in which these algorithms are applied to solve complex business problems.

WHO SHOULD ATTEND

Irrespective of industry (retail, e-commerce, manufacturing, real estate & construction, telecom, hospitality, banking, healthcare, IT, supply chain & logistics, etc.), data forms the crux of decision making. This course is designed to hone the analytical skills and business acumen of mid-level and senior-level corporate professionals who want to understand the nuances of data science, and to help them use machine learning techniques as an efficient way to generate insights for customers, which in turn improves the bottom line of their organizations.

HARDWARE AND SOFTWARE

  1. Participants should bring their own laptop (preferably with Windows 7 or higher, or Mac OS, installed).

  2. Operating system (any of the following):

  • Mac OS X with XQuartz

  • Windows (version XP or later)

  3. A minimum of 8 GB RAM on the system is advisable.

INSTALLATIONS:

  1. Participants should have the latest versions of R and RStudio installed on their system.

  2. Install R first and then RStudio. The latest versions of the software can be found at:

PRE-REQUISITE & COURSE DELIVERABLE

  1. Participants should have basic programming skills. Participants are expected to spend time with the code set as a home assignment to leverage the classroom training hours to the fullest.

  2. A high-speed internet connection will be provided at the training venue.

Deliverable: R code and dataset, plus a soft copy (PDF file) of the content covered.

COURSE OUTLINE

Day 1: Understanding R platform

Session 1–Introduction to Business Analytics

  • What is Business Analytics
  • Why is it needed and how industries are adopting it
  • Different components of analytics
  • Applications of analytics in different domains
  • Different types of machine learning algorithms: supervised, unsupervised, reinforcement and evolutionary learning

Session 2 & 3–Introduction to R Platform

  • Overview of R
  • Overview of RStudio (script editor, console, global environment)
    • Creating a working directory
    • Inbuilt datasets in R
    • Understanding the structure and summary of data
    • The head and tail functions and their use
  • Fundamentals of R
    • Reading data files (text file, CSV file, database connectivity)
    • Writing data files and charts to the working directory
  • Data manipulation in R
    • Creating new variables
    • Updating a variable
    • Subsetting: use of the which function and the subset function
    • Sort, merge, aggregate
    • Reshaping data
    • Type casting of variables
    • Data imputation techniques for handling missing values

Session 4 & 5–Introduction to R Platform

  • Data types in R
    • Data frame
    • List
    • Vector
    • Matrix
    • Factor variables
  • Loops and functions in R
    • If-else construct in R
    • While loop and for loop in R
    • The apply function family
  • Useful packages for visualization and modelling
    • "sandwich", "rms", "Deducer", "ROCR", "cairoDevice", "rattle", "caret", "lubridate", "plotrix", "xts", "manipulate", "Quandl", "ggplot2", "shiny"

Day 2: Using R platform for data analysis

Session 1 & 2–R programming for Data analysis

  • Walkthrough of twenty-three example scripts in R
    • How to understand correlation amongst variables
    • Generate random numbers and plot a histogram and a normality plot
    • Scan an input from the console, create a directory and sink files into the directory
    • Understand skewness in data using data from the datasets package
    • Use the mtcars dataset to build a regression and visualize it using the plot function
    • Use the mtcars dataset to build a regression and visualize it using the ggplot2 package
    • Create a matrix and an array in R
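
As an illustration of the mtcars regression example listed above, here is a minimal sketch in base R. The choice of mpg as the response and wt as the predictor is an assumption made for illustration; the course's own example script may use different variables.

```r
# Minimal sketch: simple linear regression on the built-in mtcars dataset.
# Assumption: mpg (miles per gallon) is modelled on wt (weight in 1000 lbs).
data(mtcars)

# Fit the model and print the coefficient table and R-squared.
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Scatter plot of the data with the fitted regression line overlaid.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "mpg vs. weight (mtcars)")
abline(fit, col = "red", lwd = 2)
```

The same fitted object can be passed to other base R helpers such as coef(fit) or residuals(fit) for further inspection.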

Session 3, 4 & 5–R programming for Data analysis

  • Walkthrough of twenty-three example scripts in R (continued)
    • Create a factor variable in R
    • Use the lubridate package for date manipulation in R
    • Read data from Yahoo Finance, subset the data and plot it using the plot function
    • Apply the central limit theorem to data imported from Yahoo Finance
    • Overlay two trend lines on the same plot
    • Use the plotrix package to make a 3D pie chart
    • Use the xts package for time series manipulation

Day 3: Introduction to ggplot2, R Markdown and Rattle

Session 1–R programming for Data analysis

  • Walkthrough of twenty-three example scripts in R (continued)
    • Boxplot using data from Yahoo finance and plot multiple charts in the same window
    • Use manipulate package with subset and plot function to make interactive charts
    • Creating user defined functions in R
    • Using sample function to generate multiple samples from a dataset
    • Understand uniform distribution, binomial distribution using R
    • Create multiple random samples from a dataset and list them as columns on a spreadsheet
    • Using a Quandl dataset to demonstrate the central limit theorem

Session 2 & 3–The aesthetics of the ggplot2 package

  • How variables in the data are mapped to visual properties
  • Specifying color, line, shape, text and justification

Session 4 & 5–Introduction to useful packages in R

  • Introduction to R Markdown: Markdown script to create reports
  • Introduction to Rattle

Day 4: Introduction to Shiny and Basics of Statistics

Session 1 & 2–Introduction to useful packages in R

  • Introduction to Shiny
    • Using ggplot and other libraries to build an online dashboard using HR data
    • Deploy the dashboard in the cloud

Session 3 & 4–Basics of Statistics

  • Random Variable – Discrete and Continuous
  • Probability density function and cumulative distribution function
  • Distribution Family – Gaussian Distribution, Standard Normal Distribution
  • Population and Sample
  • Central limit theorem
  • Demonstration of Central limit theorem on finance data
  • Hypothesis testing – Z test, t test, test for proportion
  • Covariance and Correlation
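
The central limit theorem item above can be illustrated with a short simulation. This is a minimal sketch that uses simulated exponential data as a stand-in for the finance data used in class; the population size, the sample size of 50 and the 2000 repetitions are arbitrary assumptions.

```r
# Minimal sketch of the central limit theorem using simulated data.
# Assumption: a skewed exponential population stands in for real finance data.
set.seed(42)
population <- rexp(100000, rate = 1)

# Draw 2000 samples of size 50 and record the mean of each sample.
sample_means <- replicate(2000, mean(sample(population, size = 50)))

# The population is strongly skewed, but the sample means are roughly bell-shaped.
par(mfrow = c(1, 2))
hist(population, breaks = 50, main = "Population", xlab = "Value")
hist(sample_means, breaks = 40, main = "Sample means (n = 50)", xlab = "Mean")

# Compare the centre and spread with what the theorem predicts.
cat("Population mean:      ", mean(population), "\n")
cat("Mean of sample means: ", mean(sample_means), "\n")
cat("SD of sample means:   ", sd(sample_means), "\n")
cat("Population SD / sqrt(n):", sd(population) / sqrt(50), "\n")
```

Increasing the sample size narrows the histogram of sample means, in line with the standard error shrinking as the square root of the sample size grows.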

Session 5–Lab 1: Linear Regression

  • Introduction to simple and multiple linear regression
  • Regression diagnostics: R-squared, t-test, F-test, distribution of error terms, heteroscedasticity

Day 5: Introduction to Machine Learning and its implementation using R

Session 1–Lab 1: Linear Regression

  • Case study using regression techniques
  • Hands-on regression in R using the stats and caret packages

Session 2 & 3–Lab 2: Logistic Regression

  • Introduction to logistic regression
  • Logistic regression diagnostics: Wald statistic, Hosmer-Lemeshow test, classification matrix, sensitivity, specificity, ROC curve
  • Strategy to find the optimal cut-off
  • Case study using logistic regression techniques
  • Hands-on logistic regression in R using the stats and caret packages

Session 4 & 5–Lab 3: Clustering and Segmentation

  • Supervised and unsupervised learning
  • Clustering: hierarchical, K-means
  • Clustering diagnostics: dendrogram, Calinski and Harabasz index, silhouette width
  • Case study using hierarchical and K-means clustering techniques
  • Hands-on hierarchical and K-means clustering in R

COURSE SCHEDULE

Day 1: Understanding R platform

This day primarily covers an introduction to business analytics and to the R platform.

  • Session 1, 9:00 AM to 10:15 AM: Introduction to Business Analytics
  • Session 2, 10:30 AM to 11:45 AM: Introduction to R platform
  • Session 3, 12:00 PM to 1:15 PM: Introduction to R platform (cont.)
  • Session 4, 2:15 PM to 3:30 PM: Introduction to R platform (cont.)
  • Session 5, 3:45 PM to 5:00 PM: Introduction to R platform (cont.)

Day 2: Using R platform for data analysis

This day is primarily devoted to writing R code for data analysis.

  • Session 1, 9:00 AM to 10:15 AM: R Programming for data analysis
  • Session 2, 10:30 AM to 11:45 AM: R Programming for data analysis (cont.)
  • Session 3, 12:00 PM to 1:15 PM: R Programming for data analysis (cont.)
  • Session 4, 2:15 PM to 3:30 PM: R Programming for data analysis (cont.)
  • Session 5, 3:45 PM to 5:00 PM: R Programming for data analysis (cont.)

Day 3: Introduction to ggplot2, R Markdown and Rattle

This day covers packages such as ggplot2, R Markdown and Rattle.

  • Session 1, 9:00 AM to 10:15 AM: R Programming for data analysis (cont.)
  • Session 2, 10:30 AM to 11:45 AM: The aesthetics of the ggplot2 package
  • Session 3, 12:00 PM to 1:15 PM: The aesthetics of the ggplot2 package (cont.)
  • Session 4, 2:15 PM to 3:30 PM: Introduction to useful packages in R
  • Session 5, 3:45 PM to 5:00 PM: Introduction to useful packages in R (cont.)

Day 4: Introduction to Shiny and Basics of Statistics

This day covers Shiny and builds the basic concepts of statistics.

  • Session 1, 9:00 AM to 10:15 AM: Introduction to useful packages in R
  • Session 2, 10:30 AM to 11:45 AM: Introduction to useful packages in R (cont.)
  • Session 3, 12:00 PM to 1:15 PM: Basics of statistics
  • Session 4, 2:15 PM to 3:30 PM: Basics of statistics (cont.)
  • Session 5, 3:45 PM to 5:00 PM: Lab 1: Simple Linear Regression

Day 5: Introduction to Machine Learning and its implementation using R

This day builds the concepts of supervised and unsupervised machine learning.

  • Session 1, 9:00 AM to 10:15 AM: Lab 1: Simple Linear Regression (cont.)
  • Session 2, 10:30 AM to 11:45 AM: Lab 2: Logistic Regression
  • Session 3, 12:00 PM to 1:15 PM: Lab 2: Logistic Regression (cont.)
  • Session 4, 2:15 PM to 3:30 PM: Lab 3: Clustering and Segmentation
  • Session 5, 3:45 PM to 5:00 PM: Lab 3: Clustering and Segmentation (cont.)

Course Reviews

Average Rating: 4.6

5 Stars: 24
4 Stars: 5
3 Stars: 2
2 Stars: 0
1 Star: 0
