The R programming language was designed for doing statistics. In my view, its great popularity among statisticians, people learning statistics, data miners and others is due to the way it facilitates the process of thinking about statistics. R’s syntax greatly aids in expressing statistical models; often, it is an intuitive shorthand for the mathematics. R’s interactive nature and the ability to get near-instantaneous feedback encourage experimentation and self-learning, and, once you get a feel for where the resources can be found, the commitment and creativity of the R community are a source of great encouragement.
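To give a flavor of that shorthand, here is a minimal sketch using the `cars` data set that ships with R (my choice for illustration only). The simple linear model dist = b0 + b1 * speed + error is written as the formula `dist ~ speed`:

```r
# Fit a straight line to stopping distance as a function of speed.
# `cars` is a small built-in data frame with columns `speed` and `dist`.
fit <- lm(dist ~ speed, data = cars)

# The fitted object can be inspected interactively.
coef(fit)      # estimated intercept (b0) and slope (b1)
summary(fit)   # standard errors, R-squared and so on
```

Typing these lines at the R prompt one at a time and looking at what comes back is exactly the kind of quick feedback loop I mean.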
It is true that learning R takes some effort. However, just as with learning a new natural language, useful things can be done and great fun had before achieving fluency. I think that the process of learning R can be broken down into the following five stages:
1. Understand something of the culture of the R community, the environment in which the R programming language is maintained and developed. Become familiar with the resources available. Install R on your computer and run a test script.
2. Read CSV files into data frames and confidently use R functions to perform statistical analyses in a domain with which you are familiar.
3. Use the basic control structures of the R language to write simple programs. Write your own functions, become familiar with the data structures included in R and begin to explore the rich features of the language. Interface with databases, web pages and other external data sources.
4. Write complex programs in the language. Develop an understanding of the deep structure of the language: S3 and S4 objects, closures, etc.
5. Develop programs for production use. Write an R package.
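As a rough sketch of what stages 2 through 4 can look like in practice, the script below reads a CSV file, fits a model, uses a loop and a user-defined function, and ends with a closure. Everything in it is illustrative: the file is a temporary copy of the built-in `mtcars` data set so that the script runs anywhere, and the names `standardize` and `make_counter` are my own inventions, not functions from any package.

```r
# Stage 2: read a CSV file into a data frame and run a familiar analysis.
# A built-in data set is written to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

cars_df <- read.csv(csv_path)
str(cars_df)                       # inspect the structure of the data frame
fit <- lm(mpg ~ wt, data = cars_df)
summary(fit)

# Stage 3: a basic control structure and a user-defined function.
col_means <- numeric(ncol(cars_df))
names(col_means) <- names(cars_df)
for (nm in names(cars_df)) {
  col_means[nm] <- mean(cars_df[[nm]])
}

standardize <- function(x) {
  # Center to mean 0 and scale to standard deviation 1.
  (x - mean(x)) / sd(x)
}
z_mpg <- standardize(cars_df$mpg)

# Stage 4: a closure. The inner function remembers `count` between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()
counter()
```

The loop is there only to show the control structure; an experienced R user would more likely write `colMeans(cars_df)` or `sapply(cars_df, mean)`.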
Stage 1 can be achieved in less than a day and, with the right reference book, should be enough to launch anyone sitting down to learn statistics on a very good trajectory. Completing stage 2, with regular work at stage 3, might be all that most people ever need. Once one becomes familiar with the libraries of R functions that are important to one’s field, it is not inconceivable that proficiency at this level is sufficient for professional scientists, social scientists and others for whom the mechanics of model building and analysis are not the main focus to go about their daily work. For the rest of us who want to do some serious modeling and analysis, it’s a matter of taking Malcolm Gladwell’s advice and getting in your 10,000 hours.
So, how would I advise an R newbie to go about learning R? Jump right in, get oriented, latch on to a learning resource that fits your style, run other people’s R scripts that do something interesting, and begin writing your own.
Getting oriented
The best way to get oriented is to explore the Inside-R web site, CRAN (particularly the task views) and crantastic. Download R and a GUI-based integrated development environment (IDE). If you are fortunate enough to have access to the Revolution Analytics Enterprise R IDE, then you are off to a very good start. Otherwise, try RStudio.
Resources
Resources for learning R generally fit into three categories:
1. Books, papers, presentations and other “slideware”
2. Blogs
3. Formal courses
Books
I am a book person, so my knee-jerk reaction to learning anything new is to find a good book. This might seem quaint to the mobile app generation but, as it turns out, each of the major technical publishing houses specializing in statistics books (Springer, Cambridge University Press and Chapman & Hall/CRC) has excellent books on doing statistics with R. Springer is the clear leader. The short texts in Springer’s Use R! series are at an introductory level, are modestly priced and each focuses on a different statistical area. The following recommendations are just a small sample of what is available. Even the extensive list on the Inside-R site is no longer complete.
Probably the best text for someone new to both statistics and R is Peter Dalgaard’s “Introductory Statistics with R”. A personal favorite of mine at approximately the same level is John Fox’s “An R and S-Plus Companion to Applied Regression”. Slightly more advanced but very readable and enjoyable texts are Maindonald and Braun’s “Data Analysis and Graphics Using R: An Example-based Approach” and Gelman and Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models”. A reference text that every aspiring R-competent statistician ought to have is Venables and Ripley’s “Modern Applied Statistics with S” (Statistics and Computing).
A very short but sweet book that ought to help beginners become familiar with R’s data structures is Phil Spector’s “Data Manipulation with R”. Two other noteworthy books in this class are the O’Reilly publications “R in a Nutshell” by Joe Adler and the “R Cookbook” by Paul Teetor. If you have a SAS or SPSS background, then Robert Muenchen’s “R for SAS and SPSS Users” might be your bible. If you are an accomplished programmer and want a technical overview of the R language, try John Chambers’s “Software for Data Analysis”.
Blogs
Besides books and their accompanying websites, blogs are an excellent place to get your hands on interesting, useful code. My favorite blogs are David Smith’s blog at Revolution, Quick R, R-Bloggers, and Rob Hyndman’s blog.
Courses
If a semi-formal setting better suits your style of learning, then please do have a look at the courses offered by Statistics.com. I took one of their courses, taught by Hadley Wickham, and very much enjoyed it.