#52WeeksOfCode Week 30 – The R Project

Week: 30

Language: The R Project

IDE(s): R

History (official):

(From the R Project “What is R?” page)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

 

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and

 

  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

 

Once again, this is what you get when programmers write your sales materials – nothing but facts.

Boring, tediously informative facts.

History (real):

In the Olden Days (™), if you wanted a computer to do your math homework, you had to use FORTRAN. It wasn’t what you might call ‘interactive’. You wrote your code, submitted it to the mainframe, which compiled and ran it. Assuming you didn’t have any typos, you got a printout of the results. (FORTRAN was my first programming language back in 1977. We used punch cards.)

This was always annoying and occasionally painful, but there were no good alternatives until the mid-70’s, when researchers at Bell Labs developed the programming language ‘S’. It was standard practice at the time to give programming languages single letter names.

I’m picturing the marketing meetings:

“How about ‘Bell Labs: We Don’t Have Time For This’?”

“Not bad. But I really like ‘Bell Labs: Smart But Terse’.”

“Love it!”

Moving on.

In the early 90’s, researchers at the University of Auckland, New Zealand developed a new version of S that they called R. Currently it’s being maintained by the R Development Core Team, with contributors  from all over the world. The name R is not just a play on the name S, but is also a tribute to the original developers, Robert Gentleman and Ross Ihaka, who were known at university as “R & R”.

R is free and available for Linux, Windows and Mac OS X. The source code is also freely available so you can compile it for any platform you like.

Discussion:

I’ve been looking forward to this for some time. I teach undergraduate math and occasionally blog about math education so math software is a particular interest of mine. I’m a firm believer in letting machines do the grunt work of mathematics. If you understand the problem well enough to explain it to a computer, then by definition you understand the problem.

I downloaded the Mac version of R from the main project site. R is a command line based tool so I wasn’t that surprised when I started up the program and got a window with a command prompt:

R for Mac startup window

R for Mac startup window

 

The window has a toolbar with easy access to common functions:

  • Load data or a script file
  • Open a new window (for charts and plots)
  • Authorize R to run commands as root (system administrator)
  • Show/hide R command history
  • Set R console colors
  • Open document in editor
  • Create a new empty document
  • Print this document
  • Quit

Most of your work is done at the command line.

Before we get going, I’d like to perform the traditional “Hello World!”:

Hello World! in R

Hello World! in R

This term I’m teaching an introductory statistics course so I’ve got some sample data from classroom exercises to run R through it’s paces. R stores with data tables in variables called data frames. There are pre-loaded data frames available in the software with which to experiment. You can enter data manually or just load the data from an outside source.

My data is in spreadsheet format and there are a number of ways to import spreadsheet files directly into R. Since I use Google Sheets as my primary spreadsheet program, the easiest way for me was to save off my data in CSV format. (Excel files can be imported directly.)

R assumes files are in the current working directory. The default is my home directory so I changed that setting to where I’d saved the files.

Get and set the working directory

Get and set the working directory

My first test data was a simple table comparing car prices to the age of the car. I chose a specific make and model (Toyota Sienna) and pulled these numbers straight from AutoTrader.com. I converted the worksheet and loaded the data table into R:

Car age vs. price data table

Car age vs. price data table

Now I can work with it directly. First let’s give it a once-over using the summary() function:

Summary of data table

Summary of data table

This gives me my central tendency numbers amongst others. Now let’s do a quick plot using using the pairs() function.

Data plot using pairs()

Data plot using pairs()

I got two plots, one with age as the dependent variable and other with price. I didn’t tell R which was which so it did both. I can be more specific using the plot() function:

Plot command syntax

Plot command syntax

This tells R that price is dependent on the age of the car. This gives me a single chart:

Chart using plot()

Chart using plot()

Now I can calculate the correlation coefficient with cor() to see how strongly the two sets of data relate to each other:

Correlation Table

Correlation Table

So price is negatively correlated with the age of the car, which fits what the chart told us. Older cars cost less, in other words. It’s a pretty strong correlation, too, at 85%.

Now we’d like to do some prediction so we’ll perform a linear regression on the data. First create a data structure with the regression data, then pull a summary:

Linear regression

Linear regression

Now you have a processed data set and you can continue working with it.

We can get data in and manipulate it but how do we get it out? For text data, such as the correlation summary, you can just copy and paste it from the R gui window. The plots appear in a separate window. I was able to click on the image, select Copy from the Edit menu and paste directly in a document.

R is a very powerful, interactive language for scientific and math computing. So why would you use it instead of a spreadsheet?

Frankly, if you’re not a full-on numbers nerd, you may just want to stick with spreadsheets. But they’re a general purpose tool and R has more math functionality. You can write your own functions and even groups of functions (called packages) to extend R even further.

Another advantage of R is automation. Anything you can type in at the command prompt can be saved into a file, letting you easily set up long, complex sets of calculations that can be loaded into your workspace with a single command. If you’re doing batch processing of multiple datasets, this can save a lot of time and effort.

The documentation is very good and there are plenty of tutorials and examples available at the project homepage and around the Web.

 

#Coding4Humans #DIYMath FreeMat

(Cross-posted at We Hate Math)

Today in DIY Math we’re looking at  FreeMat. As the name suggests It’s modeled after MatLab. FreeMat has been in development for over a decade by a group of volunteers

System Requirements – Specific hardware requirements were not available but the pre-built packages I tested all run on 32 or 64-bit Intel-compatible CPUs. The application itself doesn’t seem to use much memory. As an example, the Mac version uses about 85 MB of real memory on my system. Since Windows XP is supported, we can assume that XP-compatible hardware constitutes the base system.

Installation – The latest version is 4.2 and is available for Windows (XP and up), Linux (various) and Mac OS X. In addition to pre-built packages for the above platforms, the source code is also available and is released under the GPL license. All versions of FreeMat are kept at the same version level and functionality.

Windows: Simply download the 52.5 MB setup file and double-click it. (NOTE: A portable version of Freemat is also available so you can run it from a thumb drive without installation.)

Linux – I installed FreeMat on Debian Linux using APT and on my system it was a 12 MB download, using an additional 22 MB of disk space.

Mac OS X – The installer is a 79.5 MB compressed disk image (DMG) file. Double-click the file to mount it, then drag the program and documentation to your Applications folder. The two files together take up about 250 MB of disk space.

Documentation – The Mac download comes with a PDF manual detailing all of the functions available in FreeMat. (For Windows or Linux, you can download the manual here.) The manual is automatically generated using Doxygen, which scans specially marked comments in the source code and outputs documentation in a variety of file formats.

This is both a good thing and a bad thing. It’s good in that it makes it easy for developers to actually maintain their documentation, assuming they remember to update the comments. It’s a bad thing because there’s no guarantee that the resulting document will be well-written. In fact, the included manual is very sparsely written, despite the 162 page (!) table of contents. It is less a manual than simply an API reference. Each function or class is briefly described and includes one or two usage examples. The target audience for this manual are those who don’t need a manual. It’s comprehensive but very terse.

A much better option to start with is the FreeMat Primer. Who’s the audience? Let the authors (Gary Schafer and Timothy Cyders) tell you:

We assume that you have Freemat properly installed and working. If you have any issues, direct them to the online Freemat group, http://groups.google.com/group/freemat.

This book was originally written for the Windows version. The book now covers more of the Linux and Mac versions, as well. In those cases where there are differences, we’ll point them out.

It’s a much friendlier introduction to the software. It’s very readable, with plenty of screenshots, little tutorials and code examples. With this and the official function reference you have a very good documentation base. In addition, there is also a Google group available for more interactive support. There is another Google community intended to host FreeMat tutorials. (At this time the content is a bit sparse.) You can also type helpwin at the command prompt from within FreeMat.

Compatibility – Based on the scripts I tested, MatLab support is somewhat hit-or-miss. I’ve been able to run scripts with no modifications, minor modifications or not at all. I would suggest that you test your Matlab scripts on a case-by-case basis and then decide whether you want to make the changes or just re-write from scratch. The scripting syntax is similar enough that most of your work will be figuring out equivalent function calls.  (A MatLab to FreeMat translation guide would be a really good project. Better yet, some kind of conversion tool.)

Command Line vs. GUI – The Windows and Mac versions of FreeMat are targeted at a graphical interface so accessing the tool from the command line is at best a non-trivial . The Linux version can be launched from the CLI. With no parameters, the graphical client starts up by default. To use the CLI version only, start the tool with the option -noX or -nogui to suppress the graphical subsystem This will give you a FreeMat command prompt in your terminal window. If you simply wish to run a Freemat command and then exit, use the option -f to run the tool in command mode. (NOTE: if you want to see the output of your command, make sure to specify that as FreeMat will not show any output.)

Integrating FreeMat with your native scripting environment is problematic (okay, just about impossible),as FreeMat scripts are meant to be run from within the FreeMat interface. You can edit them inside FreeMat or using your favorite text editor but make sure that they are saved to FreeMat’s working directory. (You can set this up by running pathtool from within FreeMat.)

 

The GUI for each version is comparable in look and feel.

FreeMat Interface

FreeMat GUI

This is from the Mac version of the tool. In addition to the main terminal window, FreeMat also tracks your command history (allowing you to invoke a previous command simply by double-clicking on it), tracks what variables are currently in memory, along with their data types and values if applicable. The Debug window is supposed to show any error or warning messages but on all three platforms I tested, the messages showed up in the main terminal window and the Debug window remained blank.

Summary

Pros: Easy installation, all supported platforms are kept current with a common codebase, decent documentation and online support.

Cons: Development progress is a bit slow. The latest release (4.2) was posted in June of 2013 and that was two years after the previous release. CLI support is limited or non-existent in the Windows and Mac versions and all scripts are restricted to running within the FreeMat environment. Third party support is a bit anemic.

Would I use this in my class? – I would feel confident recommending this to my students. The ease of installation and minimal setup are a definite plus, you don’t need the latest hardware to run it and the price fits everyone’s budget. It supports nearly everything we might do in 100- and 200-level math classes with enough overhead room for more advanced work.

#DIYMath – Math Wants to be Free

(Cross-posted at We Hate Math)

As a programming and math nerd, I’ve certainly made good use of Wolfram Alpha. After all, it’s free*, it’s ubiquitous (all you need is a Web browser but there are also apps for both Android and IOS) and it’s natural language interface is very powerful and easy to use.  It’s certainly a cost-effective alternative to commercial math packages like Matlab ($50 – $2,150) or even Wolfram’s own Mathematica. ($139 – $2,495)

However, much as I love the folks at Wolfram, it’s nice to have your own math software that:

  • doesn’t require an Internet connection
  • runs on computers you control
  • still gives you a lot of power and flexibility

It’s even better if the software is:

It turns out that there are several software packages that fit the bill, each with their own strengths and weaknesses but all absolutely free and cross-platform. With that in mind, I’m going to be reviewing each of them from the perspective of a teacher and casual programmer. To keep things consistent, I’ll be looking at the following categories:

System Requirements – Because there’s no point downloading the software unless you can actually run it.

Installation – How easy is it to find and install the software? How big of a download is it and how much disk space and RAM does it need? How does installation compare between platforms? I’ll be installing the software on Windows 7, Debian Linux and Mac OS X Mavericks and comparing the experience.

Documentation – Does the developer offer good documentation and/or tutorials? (By ‘good’, I mean documentation you are actually expected to read**.) Is information available from third parties?

Compatibility – Like it or not, MatLab and Mathematica are the big dogs in the math software field. How easily could a MatLab or Mathematica user transition to this package? How easy is it to port code? The easiest way to test this is to see if scripts designed for MatLab or Mathematica will run with minimal or no modification.

Command Line vs. GUI – Some of these packages allow you to run them from the command line as well as in a graphical interface. This is very useful as it allows you to integrate the software with your native scripting language for easy automation. How do the two compare? Do both offer the same functionality? Does the software operate in the same way on different operating systems?

Summary – Pros, Cons and whether I’d recommend this to my students.

As I’ve said, I’m looking at this from the perspective of a math teacher. Are there other aspects of the software you’d like me to examine?

*for a certain definition of ‘free’

**I’ve noticed that a lot of open source software documentation seems to assume that the target audience are those who don’t need to read it. Yes, poor documentation makes me cranky.