#52WeeksOfCode Week 30 – The R Project

Week: 30

Language: The R Project

IDE(s): R

History (official):

(From the R Project “What is R?” page)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Once again, this is what you get when programmers write your sales materials – nothing but facts.

Boring, tediously informative facts.

History (real):

In the Olden Days (™), if you wanted a computer to do your math homework, you had to use FORTRAN. It wasn’t what you might call ‘interactive’. You wrote your code, submitted it to the mainframe, which compiled and ran it. Assuming you didn’t have any typos, you got a printout of the results. (FORTRAN was my first programming language back in 1977. We used punch cards.)

This was always annoying and occasionally painful, but there were no good alternatives until the mid-1970s, when researchers at Bell Labs developed the programming language ‘S’. It was standard practice at the time to give programming languages single-letter names.

I’m picturing the marketing meetings:

“How about ‘Bell Labs: We Don’t Have Time For This’?”

“Not bad. But I really like ‘Bell Labs: Smart But Terse’.”

“Love it!”

Moving on.

In the early 1990s, researchers at the University of Auckland, New Zealand developed a new version of S that they called R. Currently it’s being maintained by the R Development Core Team, with contributors from all over the world. The name R is not just a play on the name S, but is also a tribute to the original developers, Robert Gentleman and Ross Ihaka, who were known at university as “R & R”.

R is free and available for Linux, Windows and Mac OS X. The source code is also freely available so you can compile it for any platform you like.

Discussion:

I’ve been looking forward to this for some time. I teach undergraduate math and occasionally blog about math education so math software is a particular interest of mine. I’m a firm believer in letting machines do the grunt work of mathematics. If you understand the problem well enough to explain it to a computer, then by definition you understand the problem.

I downloaded the Mac version of R from the main project site. R is a command line based tool so I wasn’t that surprised when I started up the program and got a window with a command prompt:

R for Mac startup window

The window has a toolbar with easy access to common functions:

  • Load data or a script file
  • Open a new window (for charts and plots)
  • Authorize R to run commands as root (system administrator)
  • Show/hide R command history
  • Set R console colors
  • Open document in editor
  • Create a new empty document
  • Print this document
  • Quit

Most of your work is done at the command line.

Before we get going, I’d like to perform the traditional “Hello World!”:

Hello World! in R

This term I’m teaching an introductory statistics course so I’ve got some sample data from classroom exercises to put R through its paces. R stores data tables in variables called data frames. There are pre-loaded data frames available in the software with which to experiment. You can enter data manually or just load the data from an outside source.

My data is in spreadsheet format and there are a number of ways to import spreadsheet files directly into R. Since I use Google Sheets as my primary spreadsheet program, the easiest way for me was to save off my data in CSV format. (Excel files can also be imported, with the help of an add-on package.)

R assumes files are in the current working directory. The default is my home directory so I changed that setting to where I’d saved the files.

Get and set the working directory

My first test data was a simple table comparing car prices to the age of the car. I chose a specific make and model (Toyota Sienna) and pulled these numbers straight from AutoTrader.com. I converted the worksheet and loaded the data table into R:

Car age vs. price data table

Now I can work with it directly. First let’s give it a once-over using the summary() function:

Summary of data table

This gives me my central tendency numbers, among others. Now let’s do a quick plot using the pairs() function.

Data plot using pairs()

I got two plots, one with age as the dependent variable and the other with price. I didn’t tell R which was which so it did both. I can be more specific using the plot() function:

Plot command syntax

This tells R that price is dependent on the age of the car. This gives me a single chart:

Chart using plot()

Now I can calculate the correlation coefficient with cor() to see how strongly the two sets of data relate to each other:

Correlation Table

So price is negatively correlated with the age of the car, which fits what the chart told us. Older cars cost less, in other words. It’s a pretty strong correlation, too, with a coefficient of about -0.85.
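For anyone curious about what a correlation coefficient actually measures, here is a minimal sketch of the Pearson formula. It is written in Python rather than R, purely as an illustration; it is not the R session described above. The function name pearson_r and the ages and prices are made up for the example and are not the AutoTrader numbers.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: age in years and asking price in dollars
ages = [1, 2, 3, 5, 7, 9]
prices = [28000, 25500, 24000, 18500, 14000, 9500]

r = pearson_r(ages, prices)
print(round(r, 3))  # negative, because price falls as age rises
```

A value near -1 or +1 means the points sit close to a straight line; a value near 0 means there is little linear relationship.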

Now we’d like to do some prediction so we’ll perform a linear regression on the data. First create a data structure with the regression data, then pull a summary:

Linear regression

Now you have a processed data set and you can continue working with it.
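As a rough picture of what a simple linear regression computes, the sketch below fits a least-squares line by hand. It is Python rather than R and only an illustration under made-up data; fit_line is a hypothetical helper, not an R function, and the numbers are not the ones from my spreadsheet.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: age in years and asking price in dollars
ages = [1, 2, 3, 5, 7, 9]
prices = [28000, 25500, 24000, 18500, 14000, 9500]

slope, intercept = fit_line(ages, prices)

# Use the fitted line to predict the price of a 4-year-old car
predicted = intercept + slope * 4
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The slope is the estimated change in price for each additional year of age, and the intercept is the estimated price of a brand-new car under this model.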

We can get data in and manipulate it but how do we get it out? For text data, such as the correlation summary, you can just copy and paste it from the R GUI window. The plots appear in a separate window. I was able to click on the image, select Copy from the Edit menu and paste directly into a document.

R is a very powerful, interactive language for scientific and math computing. So why would you use it instead of a spreadsheet?

Frankly, if you’re not a full-on numbers nerd, you may just want to stick with spreadsheets. But they’re a general purpose tool and R has more math functionality. You can write your own functions and even groups of functions (called packages) to extend R even further.

Another advantage of R is automation. Anything you can type in at the command prompt can be saved into a file, letting you easily set up long, complex sets of calculations that can be loaded into your workspace with a single command. If you’re doing batch processing of multiple datasets, this can save a lot of time and effort.

The documentation is very good and there are plenty of tutorials and examples available at the project homepage and around the Web.

 

#Review – The Practice of Programming

In a world of enormous and intricate interfaces, constantly changing tools and languages and systems, and relentless pressure for more of everything, one can lose sight of the basic principles— simplicity, clarity, generality— that form the bedrock of good software.

Kernighan, Brian W.; Pike, Rob (1999-02-09). The Practice of Programming (Addison-Wesley Professional Computing Series) (p. ix). Pearson Education. Kindle Edition.

 

Programming is a craft. Some programmers refuse to acknowledge this, insisting instead that it’s a scientific or engineering discipline. There are certainly elements of that but anything that allows a human to place their own distinctive style on a made thing is a craft.

Bridges look a certain way because that’s how the physics make them look, not because the engineer was feeling whimsical that day. That’s why one bridge looks a lot like another. When a carpenter makes a bookshelf, it shares the same functionality with other bookshelves. However, there are a hundred individual decisions made by the carpenter during the design and creation process. A bookshelf is physics seasoned by art.

Two software applications may have similar functions but the underlying source code tells a different story. Anyone who reads or writes code knows that the programmer imposes their own personal style on the code in hundreds of different ways. From the use of a favorite decision loop to the design and implementation of a particular data structure, programmers have always found a way to express themselves in their work.

The Practice of Programming was written to bring programmers who are swimming in complexity back to their roots and help them regain perspective. Just to be clear, this is not a book that will teach you how to program. However, if you are learning to program or even if you’re a veteran coder, you’ll get something useful out of this text.

Despite this, Kernighan and Pike don’t romanticize the work of programming. Instead they show that by embracing (or re-embracing) the fundamental principles of coding, you can become a better, more productive programmer.

They start with a style guide, because clean, consistent code is easier to read, debug and maintain. Establishing and maintaining a consistent coding style frees up your higher brain functions for more complex decisions and problem solving.

Next we move on to algorithms and data structures. These building blocks of software should be familiar to all coders but the right algorithm choice can make the difference between a program that takes an hour versus one that takes seconds to produce the desired result.
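To make the hour-versus-seconds point concrete, here is a small sketch in Python (not an example from the book) that counts how many items two different searches examine in the same sorted list. The function names and the one-million-item list are assumptions chosen for illustration.

```python
def linear_search_steps(items, target):
    """Count items examined when scanning from the front."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps

def binary_search_steps(items, target):
    """Count items examined when halving a sorted list each time."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1_000_000))  # already sorted
target = 999_999               # worst case for the linear scan

linear_steps = linear_search_steps(data, target)
binary_steps = binary_search_steps(data, target)
print(linear_steps, binary_steps)
```

The linear scan looks at every item, while the binary search needs at most about twenty looks for a million items. Same problem, same data, very different amounts of work.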

The authors build on this foundational knowledge with discussions of design, interfaces (how to efficiently pass data), debugging, testing (which reduces debugging), performance and portability. They end with a chapter on notation, which includes a discussion of tools that will help you generate code automatically.

The writing is crisp and direct. Kernighan and Pike speak to you, programmer to programmer. They have decades of combined experience in the coding trenches and understand the problems you face every day, whether you’re doing an assignment for school or creating an analytics solution for your business.

#Book Review – Code Reading

The reading of code is likely to be one of the most common activities of a computing professional, yet it is seldom taught as a subject or formally used as a method for learning how to design and program.

 

Aspiring writers are always told to read as much as they can if they want to become better writers. As Dave Thomas points out in his foreword to Diomidis Spinellis’ Code Reading, aspiring programmers almost never get this advice. By code, of course, I mean the program source code, the fundamental recipe for any piece of software.

Imagine that you’re taking a writing course. You get assignment after assignment – persuasive essays, memos, poetry, research papers or prose. Now imagine that you have to do every one of these assignments from scratch, without any examples. You have to work everything out yourself.

This is the traditional method of teaching programming. I know because I used to teach programming classes.

Code Reading is both refreshing and an eye-opener. Not only does Spinellis present a solid case for the habit of reading program source code, he also fills his book with code examples, complete with commentary. Though the examples are mainly from Java and C, the lessons learned can be applied to any programming language.

Just to be clear, the code presented is not from the toy examples found in programming textbooks. This is material from real, working software projects. The book covers major programming topics and even includes analysis of a complete, working program. Spinellis gives tips and techniques for the novice code reader to aid them in developing their skills.

From my own experience, reading code can be very educational, even entertaining. Every programmer tries to put their own personal stamp on their work and part of the fun is seeing the human mind behind the algorithms.

For example, I was teaching game software development and wanted to get my students some practice in reading code. We downloaded the source code for BZFlag, a tank game based on the video game BattleZone combined with Capture the Flag. It’s a fun, cross-platform game that you can play solo or over the Internet with teams.

There was one particular feature in which I was interested. During play, you can set your tank to AutoPilot mode and let it play for you. This makes a nice change from having to pause your game whenever you have to get up and take care of business.

We grovelled through the source code files for a bit and I finally found the sections having to do with AutoPilot. As I read them, I spotted some code that looked very intriguing but was never actually called by any other part of the program. It looked like someone was trying to build a heuristic system for the AutoPilot mode. In short, it was meant to have the program build a solution starting with the goal and working its way backwards. It was very clever and had a lot of potential. I could also see why it hadn’t been implemented, given the inherent complexity of the method.

The fact that the code was just left there, unfinished, was fascinating. When I saw that code, I put myself in the mind of that programmer. I’ve also had coding ideas that got stuck in blind alleys. But here was someone like me, trying to solve a problem in an interesting way and failing that, leaving a note for the explorers to come after in hopes that they would ultimately succeed where he did not.

This book shouldn’t just be on every programmer’s reference shelf; it should also be the basis of at least one undergraduate programming class.


Spinellis, Diomidis. Code reading: the open source perspective. Addison-Wesley Professional, 2003.

#52WeeksOfCode Week 21 – Facebook API

Week: 21

Language: Facebook API

IDE(s): TextWrangler with MAMP

History (Official):

(From Facebook’s Facebook page)

“Founded in 2004, Facebook’s mission is to give people the power to share and make the world more open and connected. People use Facebook to stay connected with friends and family, to discover what’s going on in the world, and to share and express what matters to them.”

History (Real):

Letting us connect with friends and family and all of the other sharing and caring doesn’t pay the bills.

Facebook makes money by selling the attention of its users.  If that’s a problem for you, don’t get an account.

That’s pretty much all I have to say about that.

Discussion:

Officially, Facebook provides SDKs (Software Development Kits) for iOS, Android, JavaScript and PHP. There are also third-party providers that offer support for other languages such as Objective-C, Java, Ruby, Python and Flash. Facebook does not offer support for these third-party kits.

I like to keep things simple, so I’ll be using the Facebook JavaScript SDK. There is no software to download or install. You simply need to add a short section of code to your Web page. When the page is loaded into a browser, the Facebook SDK will automatically load itself.

First I need to get Facebook to recognize me as a developer. I select Register as a Developer from the Apps menu on http://developers.facebook.com (after logging in with my Facebook account). I’m asked to re-enter my Facebook password.

FB Password Verification

Password Verification

I agree to Facebook’s policies.

FB Policy acceptance

Facebook Policy Acceptance Dialogue

I’m asked to confirm my registration by entering a code that will be sent to my phone by either a text message or a phone call. Facebook seems to be unable to text to my phone for some reason, so I picked Send via Phone Call. In a few moments, my phone rang and a friendly automated voice told me my code.

Account Verification Code

Finally I was successfully enrolled as a Facebook developer. I haven’t written a lick of code yet but I still feel like I’ve accomplished something.

Now I need to get an ID and secret token for my app. These identify my software to Facebook so they can manage what I can and can’t do, and also keep other developers from pretending to be me. (Twitter and other social networking services have similar methods.)

After a couple of false starts (my app title contained a trade name, which is verboten), I was sent to a page which provided the JavaScript code (which included my app ID and secret token) to insert in my Web page to load and initialize the Facebook API.

I created a page called fb_test.html and saved it in my MAMP local site folder. I inserted a bit of code to add a Like button on the page, loaded it up and it worked!

Testing the Like Button

Like Button Test Successful!

So far, so good. Now I’d like to send a message to Facebook from my page. I replaced the Like button code with the code for a Send button. A quick page reload later and voila:

Send Button Load

Loading the Send Button

Now to compose a message to myself:

Composing a Send

Just saying hi….

Meanwhile, over on my Facebook account:

Message received

Hey! I got a message!

Overall, it was a very pleasant experience. Granted, I wasn’t doing anything complicated, but I appreciated the hand-holding. Sample code was easy to find and worked the first time. The documentation was readily available and easy to browse.

I almost hate to say it, but this was probably the most pleasant developer experience I’ve had to date. Whatever your opinion of Facebook, and everyone’s got an opinion, I can’t fault them here.

#Review Hackers – Heroes of the Computer Revolution

Just why Peter Samson was wandering around in Building 26 in the middle of the night is a matter that he would find difficult to explain.

My first programming class was in 1977 at a local community college. What I took away from it was the idea that you could solve any problem, no matter how overwhelming, by breaking it down into smaller and smaller functional pieces, then reassembling the pieces into a solution.

The opening sentence to Steven Levy’s book Hackers: Heroes of the Computer Revolution neatly encapsulates the book’s theme. It’s about very smart people who are compelled to act in ways that are hard for them to describe to others and even sometimes to themselves. Some of them are looking for money, others for redemption. On a higher level, it’s a book about exploration. It’s about compulsion. It’s about obsession. It’s about passion. It’s about America.

It’s difficult to write about computers for a general audience. The problem is similar to what Hollywood faces when they use computers as a story element. Simply put, it’s hard to make typing seem interesting.

This is where Hollywood gets it wrong. Computing isn’t about the technology, it’s about the people.

A good technology writer understands this and Steven Levy is a very good technology writer. He was a senior writer for Wired magazine and chief technology writer for Newsweek. Hackers was his first book and it’s a very engaging read.

The story starts at M.I.T. in 1958 and takes us on a journey across the country and spans almost three decades. It describes a tumultuous time in our modern history, not just politically and socially but technologically. Levy takes us from the Tech Model Railroad Club at M.I.T. to the hardware hackers of the Homebrew Computer Club in Silicon Valley to the first computer game hackers and ends up back in Cambridge with the ‘Last of the True Hackers’. Along the way we see how what was once unimaginable became commonplace.

Most important to me, this book showed I wasn’t alone. I understood these people. They were flawed like everyone else but they had a passion and the skill to make the thoughts inside their heads into reality for the rest of us.

 

References:

Levy, Steven. Hackers: Heroes of the computer revolution. New York: Penguin Books, 2001.

#Coding4Humans #DIYMath FreeMat

(Cross-posted at We Hate Math)

Today in DIY Math we’re looking at FreeMat. As the name suggests, it’s modeled after MATLAB. FreeMat has been in development for over a decade by a group of volunteers.

System Requirements – Specific hardware requirements were not available but the pre-built packages I tested all run on 32- or 64-bit Intel-compatible CPUs. The application itself doesn’t seem to use much memory. As an example, the Mac version uses about 85 MB of real memory on my system. Since Windows XP is supported, we can assume that XP-compatible hardware constitutes the base system.

Installation – The latest version is 4.2 and is available for Windows (XP and up), Linux (various) and Mac OS X. In addition to pre-built packages for the above platforms, the source code is also available and is released under the GPL license. All versions of FreeMat are kept at the same version level and functionality.

Windows – Simply download the 52.5 MB setup file and double-click it. (NOTE: A portable version of FreeMat is also available so you can run it from a thumb drive without installation.)

Linux – I installed FreeMat on Debian Linux using APT and on my system it was a 12 MB download, using an additional 22 MB of disk space.

Mac OS X – The installer is a 79.5 MB compressed disk image (DMG) file. Double-click the file to mount it, then drag the program and documentation to your Applications folder. The two files together take up about 250 MB of disk space.

Documentation – The Mac download comes with a PDF manual detailing all of the functions available in FreeMat. (For Windows or Linux, you can download the manual here.) The manual is automatically generated using Doxygen, which scans specially marked comments in the source code and outputs documentation in a variety of file formats.

This is both a good thing and a bad thing. It’s good in that it makes it easy for developers to actually maintain their documentation, assuming they remember to update the comments. It’s a bad thing because there’s no guarantee that the resulting document will be well-written. In fact, the included manual is very sparsely written, despite the 162-page (!) table of contents. It is less a manual than simply an API reference. Each function or class is briefly described and includes one or two usage examples. The target audience for this manual is people who don’t need a manual. It’s comprehensive but very terse.

A much better option to start with is the FreeMat Primer. Who’s the audience? Let the authors (Gary Schafer and Timothy Cyders) tell you:

We assume that you have Freemat properly installed and working. If you have any issues, direct them to the online Freemat group, http://groups.google.com/group/freemat.

This book was originally written for the Windows version. The book now covers more of the Linux and Mac versions, as well. In those cases where there are differences, we’ll point them out.

It’s a much friendlier introduction to the software. It’s very readable, with plenty of screenshots, little tutorials and code examples. With this and the official function reference you have a very good documentation base. In addition, there is also a Google group available for more interactive support. There is another Google community intended to host FreeMat tutorials. (At this time the content is a bit sparse.) You can also type helpwin at the command prompt from within FreeMat.

Compatibility – Based on the scripts I tested, MATLAB support is somewhat hit-or-miss. Some scripts ran with no modifications, some needed minor modifications and some would not run at all. I would suggest that you test your MATLAB scripts on a case-by-case basis and then decide whether you want to make the changes or just re-write from scratch. The scripting syntax is similar enough that most of your work will be figuring out equivalent function calls. (A MATLAB to FreeMat translation guide would be a really good project. Better yet, some kind of conversion tool.)

Command Line vs. GUI – The Windows and Mac versions of FreeMat are targeted at a graphical interface so accessing the tool from the command line is at best non-trivial. The Linux version can be launched from the CLI. With no parameters, the graphical client starts up by default. To use the CLI version only, start the tool with the option -noX or -nogui to suppress the graphical subsystem. This will give you a FreeMat command prompt in your terminal window. If you simply wish to run a FreeMat command and then exit, use the option -f to run the tool in command mode. (NOTE: if you want to see the output of your command, make sure to ask for it explicitly, as FreeMat will not show any output by default.)

Integrating FreeMat with your native scripting environment is problematic (okay, just about impossible), as FreeMat scripts are meant to be run from within the FreeMat interface. You can edit them inside FreeMat or using your favorite text editor but make sure that they are saved to FreeMat’s working directory. (You can set this up by running pathtool from within FreeMat.)

 

The GUI for each version is comparable in look and feel.

FreeMat Interface

FreeMat GUI

This is from the Mac version of the tool. In addition to the main terminal window, FreeMat also tracks your command history (allowing you to invoke a previous command simply by double-clicking on it) and tracks which variables are currently in memory, along with their data types and values if applicable. The Debug window is supposed to show any error or warning messages but on all three platforms I tested, the messages showed up in the main terminal window and the Debug window remained blank.

Summary

Pros: Easy installation, all supported platforms are kept current with a common codebase, decent documentation and online support.

Cons: Development progress is a bit slow. The latest release (4.2) was posted in June of 2013 and that was two years after the previous release. CLI support is limited or non-existent in the Windows and Mac versions and all scripts are restricted to running within the FreeMat environment. Third party support is a bit anemic.

Would I use this in my class? – I would feel confident recommending this to my students. The ease of installation and minimal setup are a definite plus; you don’t need the latest hardware to run it and the price fits everyone’s budget. It supports nearly everything we might do in 100- and 200-level math classes with enough overhead room for more advanced work.

#Coding4Humans Book Review – Programming Pearls

I enjoy reading books about computer programming. (At this point you’re probably saying to yourself, “Of course you do, Tom. You big old nerd, you.”)

But the books I prefer to read aren’t about a particular programming language or operating system. They’re about the art, history and philosophy of computer programming. Programming Pearls by Jon Bentley is a classic in this particular genre.

Bentley was a computer researcher at the original Bell Labs in Murray Hill, NJ and he used to write a column on various aspects of programming design and problem-solving for the periodical “Communications of the ACM”. This book is made up of selected essays from that column.

This book has earned a permanent spot on my bookshelf in three ways. First, it’s a fascinating glimpse into the history of computing. The year it was published (1986) was the beginning of the personal computer revolution. We were taking the power back from the mainframe computer priesthood. Having a PC was to be like Prometheus with a piece of stolen fire. We had some of the power for ourselves and were struggling to figure out what to do with it. (Hackers: Heroes of the Computer Revolution by Steven Levy is an excellent look at the people and personalities that built this era.)

This book is also about problem-solving. As Bentley says in the introduction:

The essays in this book are about a more glamorous aspect of the profession: programming pearls whose origins lie beyond engineering, in the realm of insight and creativity.

These days we’re accustomed to being able to just throw more hardware at computing problems. Bentley reminds us that there is still value in thinking a problem through and presents some interesting ideas, examples and exercises to aid in that work.

Finally, there is my favorite essay, “The Back of the Envelope”. If I had my way, this would be required reading for all of my math students.

Let me explain.

I encourage the use of calculators and computers in my math classes to do the computational heavy-lifting. My logic is that if you understand the problem well enough to explain it to a machine, then the actual computation is just a mechanical exercise. But this doesn’t mean that you should just trust outright whatever a machine tells you. You need to know what the answer should look like by using estimation so you can judge the machine’s output. Bentley devotes an entire section to estimation and these skills also extend into other essays, such as “Perspectives on Performance” and “Algorithm Design Techniques”.
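As a rough illustration of that habit (this is not an example from the book), the Python sketch below compares a computed answer against a back-of-the-envelope estimate and flags anything that is off by more than a factor of ten. The helper name plausible and the factor-of-ten threshold are assumptions chosen for the example.

```python
def plausible(computed, estimate, factor=10.0):
    """True if computed is within a factor of `factor` of the estimate."""
    if computed <= 0 or estimate <= 0:
        raise ValueError("this check only handles positive quantities")
    ratio = computed / estimate
    return 1.0 / factor <= ratio <= factor

# Envelope estimate of seconds in a year, using rounded numbers:
# about 400 days * 25 hours * 3000 seconds
rough_estimate = 400 * 25 * 3000

seconds_ok = 365 * 24 * 60 * 60   # the intended calculation
seconds_bug = 365 * 24 * 60       # a slip: one factor of 60 is missing

print(plausible(seconds_ok, rough_estimate))
print(plausible(seconds_bug, rough_estimate))
```

The estimate does not have to be accurate. It only has to be close enough that a dropped factor or a misplaced decimal point stands out.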

Programming Pearls includes exercises at the end of each essay to help you develop your mental muscles (don’t worry, there are hints in the back of the book) and an appendix with a catalog of algorithms. At 256 pages, it’s a pretty breezy read and the organization of topics makes it easy to just dip in wherever you like and start reading. It’s not just an excellent reference; Bentley’s writing style is also friendly and intelligent without being condescending. If you’re a programmer (whether hobbyist, student or professional) you need a copy of this book.

References

Bentley, J. L. (1986). Programming pearls. Reading, MA: Addison-Wesley.

Levy, S. (1984). Hackers: Heroes of the computer revolution. Garden City, NY: Anchor Press/Doubleday.