Statistical Computing & Software: June 2007

Saturday, June 30, 2007

RExecServer Preview

Posted last night to R-Sig-MAC:

I've mentioned this little project to a few people off-list as something I'd like to do for Leopard, but it occurred to me that there is nothing Leopard-specific in this piece of code. So, here we go:

RExecServer

This project mostly comes out of a conversation I had with Stefano about using multiple cores in the R GUI and Xgrid. The RExecServer is a first step in that direction. It provides a true Cocoa application that runs as a background server (so no Dock icon or menubar). The user (that's you) communicates via either a normal stdio connection (i.e. Terminal or ESS) or using Distributed Objects (for the GUI). In this initial implementation only the stdio access is working.

To use it, I recommend symlinking the shell script RExecServer.app/Contents/Resources/RExecServer.sh to something handy in either /usr/bin or /usr/local/bin so that you can get at it from ESS or the Terminal. Then just use it as you normally would.

Things you get:

A mostly working, fully responsive Quartz device. This Quartz implementation is completely new, so you may notice that certain things are different. In particular, the font metric calculations are now improved: note the location of elements in plotmath (particularly sum and product). Right now antialiasing is turned off, but that will be an option (I was experimenting with something). It also doesn't update the screen until it's done processing, so while it feels slower it is actually much faster. There might be a few clipping bugs, but we'll sort those out.

A normal readline-based interface that can be used from ESS or Terminal. You can also start multiple copies, though it presently complains about Services. This is harmless though.

Very low CPU usage when idle. I'm forced to use polling with readline, but it doesn't appear to use very much. The event loop works differently in this version so there is no need for a timer or anything.

I'm not sure how much time I'll have, but here's what the design buys me:

We can pipe bitmap and pdf output through the quartz device. This means no more X11 required. Right now this isn't working, but the infrastructure is in place.

We can separate the GUI and R itself. This has pros and cons but I think it will be a long term advantage, especially as we get more cores.

Things I'd really like to do (again, time):

Copy-n-paste objects between servers, using serialize/unserialize and Distributed Objects or NSPasteboard connections.

QuickTime movie output device. This might wait for Leopard.

If you poke around the link above you might find some other ideas. :-)

What this isn't:

Intended as a complete GUI. That's mostly for the front-end implementation, which is a separate application. The graphics device is intended to be very minimal, for ESS users who want something better than the old Aqua device.

Anywhere near complete. You don't get lots of things right now. Like command-line options. Any options at all really. Lots of safety things aren't wired up either.

Let me know what you think and if you run into major trouble. The build is Universal so it should also work on PPC.

Thursday, June 28, 2007

An improved flowSet idiom?

In flowCore, a flowSet is associated with an AnnotatedDataFrame that contains ancillary information about its frames. This seems really useful, but it's only used by the flowViz package, and there isn't anything remotely resembling a useful interactive idiom. Now, we could use subset(), assuming we could ever get the generic working properly, but we already have Subset in flowCore and the opportunity for confusion is high. One idea that has occurred to me is to take advantage of the ellipsis argument in [ and [[ to let us say things like

patient[[CellType="B Cells"]]

to extract the flowFrame whose CellType annotation is "B Cells".
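To make the idea concrete, here is a minimal sketch that does not use flowCore at all. The toy_set class and its constructor are hypothetical stand-ins (a plain list of frames plus a data.frame of annotations instead of a flowSet and its AnnotatedDataFrame), and an S3 method is assumed where flowCore would use S4. It only shows that named arguments passed through the ellipsis of [[ are enough to select a frame by annotation:

```r
# Hypothetical stand-in for a flowSet: a list of "frames" plus one row of
# annotation per frame (flowCore keeps this in an AnnotatedDataFrame).
toy_set <- function(frames, pheno) {
  stopifnot(length(frames) == nrow(pheno))
  structure(list(frames = frames, pheno = pheno), class = "toy_set")
}

# Named arguments arriving through ... are treated as column = value filters.
`[[.toy_set` <- function(x, ...) {
  args  <- list(...)
  parts <- unclass(x)   # avoid re-dispatching on our own class
  keys  <- names(args)

  # Unnamed use (patient[[1]]) falls back to ordinary indexing.
  if (is.null(keys) || any(keys == "")) {
    return(parts$frames[[args[[1]]]])
  }

  stopifnot(all(keys %in% names(parts$pheno)))
  hit <- rep(TRUE, nrow(parts$pheno))
  for (k in keys) {
    hit <- hit & parts$pheno[[k]] == args[[k]]
  }

  idx <- which(hit)
  if (length(idx) != 1L) {
    stop("expected exactly one matching frame, found ", length(idx))
  }
  parts$frames[[idx]]
}

patient <- toy_set(
  frames = list(tcells = matrix(1:4, 2), bcells = matrix(5:8, 2)),
  pheno  = data.frame(CellType = c("T Cells", "B Cells"),
                      stringsAsFactors = FALSE)
)

b <- patient[[CellType = "B Cells"]]
```

In this sketch [[ insists on exactly one match; a [ method built the same way would presumably return a smaller set instead.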

Tuesday, June 26, 2007

If I Only Had The Time....

Other things I'd like to do with R if I had the time (if someone else needs a project, I won't mind :-) )

1. A JITer for R to extend Luke's bytecode stuff. I'm thinking of something along the lines of a binding to, say, GNU Lightning. Now that function pointers can exist as R objects for use in .Call, you'd create an EXTPTRSXP that protects the RAWSXP holding the generated code.

2. Finish up my libffi interface for R. I wrote one of these just after DSC2003. It probably still even works since libffi doesn't drift very much. It would probably be nicer if it was integrated with TypeInfo though. It also let you write R functions that appear to be C function pointers (for use as callbacks for example), though this has issues in multithreaded environments.

3. A centralized object database. One of the things I actually like about S-PLUS is the persistent database notion. I often have little pieces of code (for example my alpha function) that I stash away in scripts and places and then promptly lose. It would be nice to have a little database, perhaps with versioning, that you could easily tag and search. Hell, it could even sync with something online.

4. GData for R. I think it would be cool to be able to access Google Spreadsheets from R. It could be a pretty slick way of distributing data easily. If there was a way to hook it up to Google Docs for documentation and description through the help system that would also be cool. You'd probably have to use RCurl as the back end to get https support. I've started this one a couple of times but I don't really have a pressing need so I end up putting it on the back burner.

5. A complete dbxml interface. Again, I have chunks of this one, but never finished it (I ended up just using pipe() and the command-line tool). DBXML is pretty handy if you have a massive XML file (say a FlowJo workspace) and you only need a teensy tiny little chunk (like the gating strategy).

6. R on Rails! Okay, that might be a little silly (though I wonder how I would do R mixins...)

7. GPU backend. Actually, with the advent of CUDA you could probably do this pretty easily a la the Matrix package.

8. R/Flash (or Flex) interface. Plots as SWF anyone? Do it right and you could use it to serve up things in Flex/Apollo apps. I suppose you could also do a XAML one?

Monday, June 25, 2007

Old Things New Again: Multiple Evaluators for R Under OS X

This post is mostly the result of a conversation I had with Stefano Iacus a couple of weeks ago at WWDC. He observed that a) he would like to be able to run multiple copies of R from the R GUI and b) he would really, really love to run R evaluators over Xgrid.

The second one might be harder, but I think the first one can be solved. Ideally, we would simply be able to spawn off multiple R evaluators in separate threads within the R GUI, and apart from synchronization problems in the GUI we would be good to go. However, I rate the chances of R becoming thread-safe (let alone supporting multiple evaluators) any time soon as "slim to none." Of course, I'm not on R Core so I could be wrong, but given what it would take (every function in every package would need an extra argument, for starters) it seems unlikely. The way around this? Spawn off separate R processes and connect them up within the R GUI. This is basically what people do when they use R from Terminal, so there is no real disadvantage compared to the current methods and a lot of potential advantages.
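As a rough illustration of the separate-process idea using nothing but R itself (this is not how RExecServer talks to the GUI), the sketch below evaluates an expression in a fresh R process started with Rscript and ships the result back through a temporary file. The helper name eval_in_child is made up for this example, and it assumes Rscript lives in R.home("bin"):

```r
# Evaluate R code (given as text) in a brand-new R process and return its
# value. The child saves the result with saveRDS(); the parent reads it back.
eval_in_child <- function(code) {
  rscript     <- file.path(R.home("bin"), "Rscript")
  result_file <- tempfile(fileext = ".rds")
  script_file <- tempfile(fileext = ".R")
  on.exit(unlink(result_file))
  on.exit(unlink(script_file), add = TRUE)

  # deparse() quotes and escapes the strings so they survive the round trip.
  writeLines(c(
    sprintf("value <- eval(parse(text = %s))", deparse(code)),
    sprintf("saveRDS(list(pid = Sys.getpid(), value = value), %s)",
            deparse(result_file))
  ), script_file)

  status <- system2(rscript, c("--vanilla", shQuote(script_file)))
  if (status != 0) stop("child R process failed with status ", status)
  readRDS(result_file)
}

res <- eval_in_child("sum(1:10)")
res$value                  # computed by the child evaluator
res$pid != Sys.getpid()    # the child really was a different process
```

This version blocks until the child finishes. The point of the design above is that the front end would not have to, and that a runaway child can be killed without taking anything else down.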

So, the plan:

1. Implement RExecServer as an LSUIElement application. As much as I'd like to use Leopard-specific features here (garbage collection in particular), people using Tiger have multiple processors too. So, we're stuck with autorelease pools for now.
a. Vend an interface as a NSDistantObject that can be picked up by the GUI
b. Provide a threaded stdin reader (if TERM is set) using the "traditional" R reader. This is to provide ESS support. I think we can actually vend the object and allow the stdin reader at the same time. Er, this could be a cool feature for something we'll talk about in October, if all goes well. This isn't my day job, so it only gets implemented when I have some spare (hah) time :-).
c. Theoretically, we could vend to/from different machines. Using Bonjour you could publish your R session. We'd have to work out some sort of security model. Not sure how that's normally handled by NSDistantObject.

2. Change the graphics device a bit. Mostly I don't think we want to ship around the graphics list. In general, we can ship a bitmap that is appropriate for the display. We can also ship back a PDF if so desired, but performance with a bitmap is likely to be higher with no discernible quality difference except in special circumstances. We can just have a [Device dataInFormat:] that sends back an NSData of the appropriate format (RGBA bitmap, PDF, etc). The nice part is that both the client and server are running OS X and both have access to the Font metrics so there doesn't need to be communication there.
a. On the ESS thing again, we can provide a simple shim device window to give ESS users a decent graphics device with full interaction. It won't be as cool as the one in the GUI, but that's what you get for using ESS ;-).

3. The GUI now maintains connections to any number of GUI console/device windows.
a. Stefano suggested being able to copy and paste between environments. I think this can be done using private Pasteboards and serializing objects to RAWSXP types, converting them to NSData and then transferring them over.
b. The GUI itself never need become unresponsive. You could even force kill a runaway server.
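The R half of the copy-and-paste idea in 3a is small. A minimal sketch, assuming the raw bytes are then handed to whatever transport is used (the NSData and Pasteboard side is not shown, and the two helper names are made up for illustration):

```r
# "Copy": flatten an R object to a raw vector (a RAWSXP at the C level).
copy_object <- function(obj) serialize(obj, connection = NULL)

# "Paste": rebuild the object from those bytes in the receiving session.
paste_object <- function(bytes) unserialize(bytes)

original <- head(cars)
bytes    <- copy_object(original)
restored <- paste_object(bytes)

is.raw(bytes)
identical(restored, original)
```

Plain data round-trips cleanly this way. Objects that carry environments or external pointers (closures, some model fits) would need more thought before being pasted into another session.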

Now, certain things get more difficult. Certain GUI toolkits, like my own Mojave, won't be running in the GUI process anymore, which makes it difficult to write GUIs with them. Of course, nobody that I know of writes GUIs using Mojave (and you'd think I'd know), but this also rules out everything else. Personally, I think the way around this is some sort of Dashboard-style interface where the front end is implemented in HTML and JavaScript with hooks back into R, using either a special protocol handler (which is apparently an SDK these days...) or a JavaScript proxy to allow execution. I tend to favor the protocol handler.
