Resources for R software for Behavioral Scientists
Theodore A. Walls
University of Rhode Island, Department of Psychology
R is an open source program for statistical computing. By integrating "packages" contributed by volunteer programmers in statistics and other fields, it enables computation of all of the basic statistical tests typically taught from undergraduate psychology through mid-level graduate training. You can read about R on these web sites and download the program if you would like. Installation works in two steps: you first download the base program from CRAN (the Comprehensive R Archive Network), and then you download the packages that you would like to use. A (biased and incomplete) list of packages to start with could include: lmm, nlme (which provides the lme function), lmeSplines, VR, alr3, Design, and Hmisc. Many textbooks have been written on statistical topics ranging from introductory to advanced modeling, and many of these have corresponding packages in R.
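As a rough illustration of that two-step workflow, the sketch below uses nlme from the list above. The CRAN mirror URL and the Orthodont data (an example data set distributed with nlme) are choices made for this sketch, not part of the original guide. Because nlme also ships with standard R distributions as a recommended package, the download step is often skipped.

```r
# Step 1 (once per R installation): download the package from CRAN only if it
# is not already installed. The mirror URL is an illustrative choice; any CRAN
# mirror works. This step needs an internet connection.
if (!requireNamespace("nlme", quietly = TRUE)) {
  install.packages("nlme", repos = "https://cloud.r-project.org")
}

# Step 2 (once per session): attach the package so its functions are available.
library(nlme)

# Use a function the package provides: a linear mixed-effects model of an
# orthodontic distance measurement by age, with a random intercept for each
# child, fitted to the Orthodont example data that come with nlme.
fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)
summary(fit)
fixef(fit)   # estimated fixed effects: intercept and age slope
```

Installing is a one-time step, while library() has to be repeated in every new R session; forgetting the second step is a common source of "could not find function" errors for new users.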
There are several advantages to using R. First, for instruction, R is operated through a command-based programming language rather than through the drop-down menus of "graphical interface" oriented programs like SPSS and, increasingly, SAS. Experience with R will enable students to develop programming skills that transfer to a number of other programming needs they may encounter. By contrast, reliance on graphical interfaces (which are also available for S-family software, for example in the commercial S-based package S-PLUS) seems to distance students both from programming skill development and, often, from the statistical operations they are implementing. Second, R is available worldwide, free of charge. Because of this, students who continue statistical work in resource-limited settings after graduation can gain access to the program when they return home, without paying relatively high purchase or licensing fees. Third, many new statistical models emerge first in R and are only later implemented in packages like SPSS or SAS (if at all). Many methodologists in fields outside of statistics, including quantitative psychology and econometrics, are increasingly developing methods in R. Fourth, for students who wish to go on to develop methods themselves or to simulate data to assess how well models perform in various circumstances, R's programming language makes it comparatively easy to write custom functions and repeated simulation runs.
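A small example of the kind of simulation the fourth point has in mind: estimating the power of a two-sample t-test by repeatedly generating data and counting how often the test rejects. The sample size, effect size, number of replications, and random seed are arbitrary values chosen for illustration; only base R functions are used.

```r
# Estimate by simulation how often a two-sample t-test detects a true mean
# difference of 0.5 SD with 30 participants per group (i.e., its power).
set.seed(123)             # make the simulation reproducible
n_per_group <- 30         # illustrative sample size per group
effect      <- 0.5        # illustrative true mean difference (in SD units)
n_reps      <- 2000       # number of simulated studies

p_values <- replicate(n_reps, {
  control   <- rnorm(n_per_group, mean = 0,      sd = 1)
  treatment <- rnorm(n_per_group, mean = effect, sd = 1)
  # Student's t-test (equal variances), to match power.t.test below
  t.test(treatment, control, var.equal = TRUE)$p.value
})

power_est <- mean(p_values < 0.05)   # proportion of significant results

# Compare with the analytic power from base R's power.t.test()
analytic <- power.t.test(n = n_per_group, delta = effect, sd = 1)$power
c(simulated = power_est, analytic = analytic)
```

The simulated and analytic values should agree closely. The advantage of the simulation route is that the data-generating step can be changed freely (skewed outcomes, unequal variances, missing data) in cases where no analytic formula is available.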
There are some minor challenges in adopting R for student instruction. First, introductory-level textbooks are only just emerging. Second, employers may seek expertise in currently more widely known packages like SPSS, SAS, or Stata. Third, installation differs from that of other programs, in that the base program is installed first, followed by a user-specific selection of downloadable packages. Fourth, some users may be inhibited by the typing-based interface, especially if they have become accustomed to packages with graphical user interfaces. Fifth, there is something of a learning curve in becoming comfortable with the way R is set up, both as a computing language and as a way of handling data. In my opinion, all of these obstacles are relatively easy to overcome. I point out a number of resources below that may help to address them.
For the most comprehensive information on R from the developers:
For a Quick Guide on how to Load R
For a Quick Guide on how to Load Data Files in R
Web pages regarding the use of R in psychology:
Useful books (with links to publishers and associated web pages where applicable). I make some attempt to 'locate' these texts in terms of possible teaching use in behavioral science and to point out their strengths.
Chambers, J. M. & Hastie, T. J. (eds) (1992). Statistical Models in S. This is one of the classics on the topic. See home pages for Chambers and Hastie.

Venables, W.N. & Ripley, B.D. (2002). Modern Applied Statistics with S. This book contains basic information on R (the open source program under discussion here) and S (the statistical language for data analysis and graphics that R implements). It is a widely cited text and is used frequently in statistics. In my opinion, the writing and programming level would be challenging for undergraduates in the social sciences unless the instructor provided a lot of additional handholding and supplementary materials. Moreover, the book moves very quickly from general programming to advanced models that are currently popular in science. That said, it is a clearly written and comprehensive book developed in conjunction with the R Project. Most people using R professionally own it. In addition, the authors have long made a free introductory book available on the web. This is a fairly technical introduction, but it is accurate and an excellent resource. Also see home pages for these authors, Venables and Ripley.
Dalgaard, P. (2002). Introductory Statistics with R. This compact volume maps exactly to the typical content covered in introductory statistics in any field. It is virtually notation free, making it similar to the Maindonald and Braun volume in that it would be best utilized as a supplementary text. It has a nice introduction to R programming concepts. See the author's home page, Dalgaard.
Verzani, J. (2005). Using R for Introductory Statistics. This book is the closest I have seen so far to a book that integrates instruction in probability and statistics with R programming. It is aimed at students at the precalculus level. The text could be used on its own, without a main statistics text, given a good deal of supplemental work on the part of the instructor. Or, it would be a rich supplementary text to accompany a strong undergraduate statistics text (such as Hogg & Tanis, 2001, for statistics students, or Hinkle, Wiersma & Jurs, 2003, one of the many statistics texts in psychology and social science; I used this text last semester and had generally good experiences with it). See also this home page for the author, Verzani.
Maindonald, J. & Braun, J. (2003). Data Analysis and Graphics Using R (or use this alternative link). This is a wonderful book in terms of providing guidelines for applied analysis and for its coverage of graphics and their interpretation. The authors intentionally reduced mathematical notation in favor of accessibility. As such, it is a very nice supplemental text, but could not easily stand alone for use in a course. It would be a nice resource for a graduate-level multivariate course utilizing R. See also the authors' home pages at Maindonald and Braun, and this starter page at UCLA.
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. This book focuses on applied regression and can be thought of as advanced, topic-specific coverage of material found in Verzani or in a comparable subset of Maindonald and Braun. Also note the forthcoming volumes by Fox.
Everitt, B.S. (2005). An R and S-Plus® Companion to Multivariate Analysis.
Sarkar, D. (2008). Lattice: Multivariate Data Visualization in R. Provides in-depth documentation and explanations of lattice graphics.
Spector, P. (2008). Data Manipulation with R.
Concluding comments
Clearly, all of these texts have focused on making R accessible and useful as a supplementary resource to a main text. In terms of teaching efficiency and reducing the cost of texts for students, one might wish for an integrated text that covers probability and statistics well and also covers R, at both the undergraduate and graduate levels. Or perhaps the pluralism of resources that students gain by having both a main text and a computing guide is actually better, as these products may suggest.
Advanced topics and other links
Pinheiro, J.C. & Bates, D.M. (2000). Mixed-Effects Models in S and S-Plus.
Herbrich, R. (2002). Learning Kernel Classifiers
Functional Data Analysis (with R code)
Two variable, Two Group Interrupted Process.
(To be added as identified)
Programs used in professional papers
Chung, Walls & Park, program 1.
Chung, Walls & Park, program 2.