Learning R Programming – Part 1

According to Glassdoor, the top 5 skills in Data Science job openings are:

  1. Python
  2. R
  3. SQL
  4. Hadoop
  5. Java

Most Java developers today already know SQL, Hadoop and Java reasonably well. The two remaining skills, Python and R, are worth learning for any Java developer / architect / manager who wants to contribute to or work in Data Science. This article gives a structured, step-by-step approach to learning programming in R.

  1. To start, install R and familiarize yourself with the R Console and the R Script editor. You can run commands in either, but it is best to write several commands in the script editor and try them out there. In the script editor of the standard R GUI, use the shortcut CTRL + R to run the current line or selection.
  2. Explore the options in the Packages menu, such as Install package(s), Load package and Set CRAN mirror. Many commands for statistics, visualization, etc. are available out of the box, because a core set of packages is loaded into R by default, and thousands more are available on CRAN. Select any mirror to download new packages, then install and load them step by step; you will need a working internet connection. Learn how to set the working directory (getwd() / setwd()). You can list the packages installed in R using library().
  3. From there, move on to the various R objects / data types: explore vectors, lists, matrices, arrays, factors and data frames. Try out examples of each.
  4. Next, learn to load / read and write datasets with commands like read.csv / write.csv. You can also read / write Excel sheets, but for that you will need additional packages. Use basic commands like summary, str (structure) and fix to analyze / edit your dataset.
  5. Next, go through the categories of operators (arithmetic, relational, logical) and concepts like the pipe %>% (from the magrittr package, also available through dplyr), constants and the rules for naming identifiers, followed by the statistical functions built into R. You can get help on a command by using ?<COMMAND>. Also learn to create functions and use conditions like if.
  6. By now you should revise the basics of statistics and the common visualization charts typically taught in Year 1 / Semester 1 of an MBA. Explore the statistics commands built into R, for example mean, var (variance) and sd (standard deviation).
  7. Learn to manipulate datasets using base R's subset and the dplyr package, which provides commands like select, filter, sample_n and sample_frac among others.
  8. Check the default (base R) visualization commands for charts such as barplot and pie. After that, learn how to use the ggplot2 package.
  9. You will find many datasets on kaggle.com and on various websites such as those of the stock exchanges (NSE / BSE), RBI, Government Open Data portals and others.
  10. Explore the top 20 R packages, categorized by area, listed in the references below.
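Steps 1 and 2 can be tried directly from a script. The sketch below is a minimal example using only base R: it lists the packages attached at startup and the installed ones, then sets and restores the working directory. tempdir() is only an assumed stand-in for your own project folder, and because installing from CRAN needs internet access and a chosen mirror, those calls are left as comments.

```r
# Packages attached in this session at startup (base, stats, graphics, utils, ...)
print(search())

# Names of installed packages; calling library() with no arguments
# shows the same list in a separate window
installed <- rownames(installed.packages())
print(head(installed))

# Installing and loading an add-on package needs internet and a CRAN mirror:
# chooseCRANmirror()          # same as the "Set CRAN mirror" menu option
# install.packages("dplyr")
# library(dplyr)

# Working directory: relative paths in read.csv()/write.csv() start here
old_wd <- getwd()
setwd(tempdir())              # stand-in for your own project folder
print(getwd())
setwd(old_wd)                 # restore the original working directory
```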
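For step 3, here is one small example of each R object type. The values are made up purely for illustration. The example also shows the detail that matrix() fills column by column.

```r
v  <- c(10, 20, 30)                           # numeric vector
m  <- matrix(1:6, nrow = 2)                   # 2 x 3 matrix, filled column by column
a  <- array(1:24, dim = c(2, 3, 4))           # 3-dimensional array
l  <- list(name = "A", marks = c(78, 91),     # list: elements can have different types
           passed = TRUE)
f  <- factor(c("low", "high", "low", "medium"),
             levels = c("low", "medium", "high"))  # categorical values with fixed levels
df <- data.frame(id = 1:3, score = c(88, 92, 79))  # table whose columns have equal length

print(m)
print(dim(a))
str(l)
print(table(f))        # counts per level
str(df)

# First class of each object: numeric, matrix, array, list, factor, data.frame
print(sapply(list(v = v, m = m, a = a, l = l, f = f, df = df),
             function(x) class(x)[1]))
```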
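Step 4 can be sketched as a round trip. A small made-up data frame is written to a CSV file, read back, and inspected with str and summary. The file name scores.csv is hypothetical. readxl is one example of an add-on package for Excel files and is shown only as a comment.

```r
# Made-up example data; the file name is hypothetical
scores <- data.frame(student = c("A", "B", "C"),
                     marks   = c(78, 91, 85))
path <- file.path(tempdir(), "scores.csv")

write.csv(scores, path, row.names = FALSE)  # row.names = FALSE: no extra index column
scores_in <- read.csv(path)

str(scores_in)               # structure: column names, types, first values
print(summary(scores_in))    # min / quartiles / mean / max for numeric columns
# fix(scores_in)             # opens an interactive editor; only in an interactive session

# Excel files need an add-on package, for example readxl:
# install.packages("readxl"); readxl::read_excel("file.xlsx")
```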
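The topics in step 5 fit into one short script covering operators, built-in constants, a valid identifier, a function with if / else, and a pipe. The sketch assumes R 4.1 or later for the native |> pipe. With magrittr or dplyr loaded, %>% would give the same result here. The grade() function and its cut-offs are hypothetical.

```r
x <- 7
y <- 3

# Arithmetic operators: +, -, *, /, ^, integer division %/%, remainder %%
print(c(x + y, x - y, x * y, x / y, x^y, x %/% y, x %% y))
# Relational operators
print(c(x > y, x == y, x != y))                     # TRUE FALSE TRUE
# Logical operators: & (and), | (or), ! (not)
print(c(x > 5 & y > 5, x > 5 | y > 5, !(x > y)))    # FALSE TRUE FALSE

# Built-in constants
print(pi)
print(LETTERS[1:3])                                 # "A" "B" "C"

# Valid names use letters, digits, "." and "_" and start with a letter
# (or a "." not followed by a digit); reserved words like if cannot be used
total_marks.v2 <- x + y

# A (hypothetical) function using if / else if / else
grade <- function(marks) {
  if (marks >= 80) {
    "A"
  } else if (marks >= 60) {
    "B"
  } else {
    "C"
  }
}
print(grade(85))                                    # "A"

# Native pipe (R 4.1+): sqrt gives 2, 3, 4 and sum gives 9
piped <- c(4, 9, 16) |> sqrt() |> sum()
print(piped)

# Help: ?mean or help("mean") opens the documentation for mean()
```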
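The built-in statistics commands from step 6 can be tried on a made-up set of marks. R has no built-in function for the statistical mode, and mode() returns an object's storage mode instead. The sketch therefore finds the most frequent value with table().

```r
marks <- c(60, 70, 70, 80, 90, 50, 70)   # made-up marks

print(mean(marks))       # 70
print(median(marks))     # 70
print(var(marks))        # sample variance (divides by n - 1)
print(sd(marks))         # standard deviation = sqrt(var)
print(range(marks))      # minimum and maximum
print(quantile(marks))   # 0%, 25%, 50%, 75%, 100%
print(summary(marks))

# Most frequent value: count each value, take the name of the largest count
mode_value <- as.numeric(names(which.max(table(marks))))
print(mode_value)
```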
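For step 7, the sketch below uses the mtcars dataset that ships with R. It filters rows and picks columns with base R's subset, and samples rows with base R. It then does the same filtering and sampling with dplyr, but only if dplyr is installed. Recent dplyr versions mark sample_n and sample_frac as superseded by slice_sample, though they still work.

```r
# mtcars ships with R (datasets package): 32 cars, 11 numeric columns
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt, cyl))  # base R: pick rows and columns
print(heavy)

set.seed(42)                                    # make the random sample repeatable
five_rows <- mtcars[sample(nrow(mtcars), 5), ]  # base-R way to sample 5 rows

# dplyr is an add-on package; this part runs only if it is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  heavy2  <- mtcars %>% filter(wt > 4) %>% select(mpg, wt, cyl)
  five    <- sample_n(mtcars, 5)         # 5 random rows
  quarter <- sample_frac(mtcars, 0.25)   # 25% of the rows
  print(heavy2)
}
```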
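Step 8's base charts and a first ggplot2 chart can be drawn from the same counts of cars by cylinder in mtcars. Plots are sent to an assumed PDF file in tempdir() so the sketch also works outside an interactive session. The ggplot2 part runs only if the package is installed.

```r
chart_file <- file.path(tempdir(), "charts.pdf")
pdf(chart_file)                        # draw into a PDF file instead of a window

counts <- table(mtcars$cyl)            # number of cars with 4, 6 and 8 cylinders
barplot(counts, main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Number of cars")
pie(counts, main = "Share of cars by number of cylinders")

# ggplot2 is an add-on package; this part runs only if it is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar() +
    labs(x = "Cylinders", y = "Number of cars")
  print(p)                             # a ggplot is drawn when it is printed
}

invisible(dev.off())                   # close the PDF device and write the file
```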

An advantage of learning R is that you will also become better at statistics and data science. R's syntax and structure are much simpler than Java's, and it will feel familiar to anyone who has used open-source scripting languages.

Reach out to me at neil@techandtrain.com if you want to discuss R, arrange R training for MBA / BE / MCA / MSc students, or conduct a workshop on Data Science / R / Java / etc. for your managers / executives.

References:

Top 10 skills for Data Science – Glassdoor Economic Research

Top 20 packages in R
