Monday, June 25, 2007

Old Things New Again: Multiple Evaluators for R Under OS X

This post is mostly the result of a conversation I had with Stefano Iacus a couple of weeks ago at WWDC. He was making the observation that a) he would like to be able to run multiple copies of R from the R GUI and b) that he would really really love to run R evaluators over XGrid.

The second one might be harder, but I think the first one can be solved. Ideally, we would simply be able to spawn off multiple R evaluators in separate threads within the R GUI and apart from synchronization problems in the GUI we would be good to go. However, I rate the chances of R becoming thread-safe (let alone supporting multiple evaluators) any time soon as "slim to none." Of course, I'm not on R Core so I could be wrong, but from what it would take (every function in every package would need an extra argument for starters) it seems unlikely. The way around this? Spawn off separate R processes and connect them up within the R GUI. This is basically what people do when they use R from Terminal so there is no real disadvantage compared to the current methods and a lot of potential advantages.

So, the plan:

1. Implement RExecServer as an LSUIElement application. As much as I'd like to use Leopard-specific features here (garbage collection in particular), people using Tiger have multiple processors too. So, we're stuck with autorelease pools for now.
a. Vend an interface as a NSDistantObject that can be picked up by the GUI
b. Provide a threaded stdin reader (if TERM is set) using the "traditional" R reader. This is to provide ESS support. I think we can actually vend the object and allow the stdin reader at the same time. Er, this could be a cool feature for something we'll talk about in October (if all goes well, this isn't my day job so it only gets implemented when I have some spare (hah) time) :-).
c. Theoretically, we could vend to/from different machines. Using Bonjour you could publish your R session. We'd have to work out some sort of security model. Not sure how that's normally handled by NSDistantObject.

2. Change the graphics device a bit. Mostly I don't think we want to ship around the graphics list. In general, we can ship a bitmap that is appropriate for the display. We can also ship back a PDF if so desired, but performance with a bitmap is likely to be higher with no discernible quality difference except in special circumstances. We can just have a [Device dataInFormat:] that sends back an NSData of the appropriate format (RGBA bitmap, PDF, etc). The nice part is that both the client and server are running OS X and both have access to the Font metrics so there doesn't need to be communication there.
a. On the ESS thing again, we can provide a simple shim device window to give ESS users a decent graphics device with full interaction. It won't be as cool as the one in the GUI, but that's what you get for using ESS ;-).

3. The GUI now maintains connections to any number of GUI console/device windows.
a. Stefano suggested being able to copy and paste between environments. I think this can be done using private Pasteboards and serializing objects to RAWSXP types, converting them to NSData and then transferring them over.
b. The GUI itself never need become unresponsive. You could even force kill a runaway server.

Now, certain things get more difficult. Certain GUI toolkits, like my own Mojave, won't be running in the GUI process anymore making them difficult to write GUIs. Of course, nobody that I know of writes GUIs using Mojave (and you'd think I'd know), but this also rules out everything else. Personally, I think the way around this is some sort of Dashboard-style interface where the front-end is implemented in HTML and Javascript with hooks back into R using a special protocol handler (which is apparently an SDK these days...) or a Javascript proxy to allow execution. I tend to favor the protocol handler.

No comments: