Sunday, July 18, 2010

Thoughts on an iPad R (or similar device)

This has come up a few times on various lists I read and I have a strong opinion, so I figured I'd jot down some notes. The topic of discussion is the notion of R running on an Apple iPad, the inevitable Android Pads or the HP Slate (assuming the thing ever makes it out the door). Maybe even some sort of Windows 7 Mobile device.

There are a couple of immediate hurdles that come to mind:

* Limited RAM. The iPad has 256MB, the iPhone 4 has 512MB. A standard R compute box sports, what, 4GB at a minimum these days? Sure, back when I started using R I had maybe 512MB or 1GB of ram on my laptop, but I also had a lot less data back then.

* Battery. It doesn't really matter how fast the processors in these things are, the big problem is battery. Batteries aren't really getting better, the 10 hour battery life in modern laptops comes from a) bigger batteries (this is real reason you can't remove the battery from an Apple laptop anymore---doing that lets you wedge a bigger battery into the box without weight gain) and b) aggressive power management for the CPU.

* Apple prohibits interpreters outside of very specific use cases (games, Javascript). This is an Apple specific thing, but it's worth mentioning as there are now a lot of iPads out there. (Also Fortran is an issue under the new rules.)

For those reasons, I don't think there should be an R that runs directly on your Pad device the same way it does on your desktop. You don't have enough RAM and if you crank up the processor enough you'll destroy your battery life. Just look at what your favorite game does to your battery and remember that it gets to use specialized hardware for its processing.

So what should we do instead? Those who've known me for a while (they probably don't read this, but whatever) will remember StatPaper and the "Kernel" R hobby projects I've had at various times. Both operated on the premise that the R engine itself should run in a separate process from the UI and interact with the user through some sort of remote connection. This was inspired by Mathematica in the case of StatPaper and by the desire to use the multiple cores found in most machines through a single GUI in the case of Kernel R (which let you spin up multiple R consoles and transfer data between them).

I don't really work on either anymore, I have a day job and don't really program recreationally anymore (not that I dislike programming, it's just that I have work problems that provide a focus for learning new technologies and noodling around so I tend to go there first). But I do still like the idea of them and I think they could work for R. Here's I think what looks like a plan.


* Move away from the notion of R on a filesystem. R should really be living in something like a database, probably an object database. One option might be some of the new-ish document-oriented databases like CouchDB or MongoDB. Both would let you serialize R objects into documents using something that looked a lot like a standard environment.

* These databases should also live in a cloud environment (Cloudy-R! Eh? I know you like it). Having some sort of standard serialization that wasn't R specific could also let you have other applications updating data and using results. This lets R focus one what it's good at without being forced to do something it isn't good at (like web UIs).

* Use the same serialization format as the command format for a remote R UI. The serialization format should ALSO encode the graphics device.


Once you have those things you have the core of what you need for your Pad R. You have an R that exists in an always running state (because you could serialize the entire R environment to your database when the environment is idle. If you build it carefully you could probably even eliminate the need for a sticky session but that would likely involve some rummaging about in the internals).

On the client side, you can start simple with the familiar console interface then you can start to experiment with other interface modalities. Perhaps some sort of direct manipulation of a graphics object for exploratory data analysis.

No comments: