Statistical Computing & Software: July 2007

Tuesday, July 31, 2007

Quick Post

Back from vacation, got a request for instructions on how to set up ESS to work with RExecServer. Consider me on that... right after I dig myself out of this email hole. :-)

Thursday, July 19, 2007

Okay, apparently Blogger decided that this was a spam blog for a while there and I didn't notice because I had other things to do this week. All sorted out now it seems. To celebrate, I updated the leopard branch of the RExecServer repository to activate bitmap graphics output to files, thusly:


quartz("public.jpeg:Rplot.jpg")

Supported UTIs are public.jpeg, public.tiff, public.png and public.jpeg-2000. public.pdf will be coming along in the not-too-distant future. This is actually a side effect of moving from raw bitmaps to PNG images in the GUI. Your average statistical graphic compresses well, so you can get from ~500K for a raw image down to ~20K pretty easily. On the server side it doesn't matter much because we only keep one bitmap for each device, but the GUI keeps a potentially large number of pages, and those can eat up RAM.
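A small convenience along these lines could pick the UTI from the file extension. This is a hypothetical helper, not part of the RExecServer repository. It only assumes the "UTI:filename" string form shown in the `quartz()` call above, and it leaves out public.pdf since that isn't supported yet.

```r
# Hypothetical helper (not part of RExecServer): build the "UTI:filename"
# string used for bitmap file output, choosing the UTI from the extension.
quartz_file_spec <- function(path) {
  utis <- c(jpg = "public.jpeg", jpeg = "public.jpeg",
            tif = "public.tiff", tiff = "public.tiff",
            png = "public.png",  jp2  = "public.jpeg-2000")
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(utis))
    stop("no supported UTI for extension '", ext, "'")
  paste0(utis[[ext]], ":", path)
}

quartz_file_spec("Rplot.jpg")  # "public.jpeg:Rplot.jpg"
```

On the leopard branch this would presumably be used as `quartz(quartz_file_spec("Rplot.png"))`.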

To people tracking the repository, the leopard branch is really no different from the master branch. I just push work done in my Leopard partition there before migrating to the master branch with a test run in Tiger. The leopard branch will work in Tiger unless I've made a mistake.

In other news, posting will be spotty over the next week. I'm heading down to Los Angeles tomorrow afternoon and then to San Diego on Wednesday, and Internet access will come and go.

Sunday, July 15, 2007

RExecServer git repository update

The master branch of the git repository is now up-to-date with the leopard branch, minus any Leopard-specific code (of which there is none at present). R-devel is required to get things to compile. The GUI code isn't available yet; I'd like to get things to a truly usable stage first. I haven't had as much time this week since other things have taken priority, but some minor stuff was added, such as the beginnings of the preference panes (and the infrastructure to support them) as well as the start of the action menus. I also changed the graphics device plot selector to take advantage of available space. Perhaps I'll post a screencast of that soon.

In the meantime, people who aren't using OS X, but want to consider a similar GUI type of set up might want to look at the Omegahat CORBA stuff, which did something very similar a long time ago. The page hasn't even been updated since before OS X 10.0 was released. :-)

Tuesday, July 10, 2007

Just a couple minor updates

Doing some other things today, so only a couple of minor updates while watching The Tour this evening:

Removed the mouse-sensitive pages menu. This seems to be pretty disruptive without putting some extra delay into the transition, but the delay makes it too slow. Instead, I'm now using the toolbar lozenge to show and hide. I think I'll also hook it up to middle-click.

Implemented some display transformers for the Workspace. There's a (localized) "n objects" on the bottom now. The class and size information for objects is now shown (could probably be prettier!) as well.
Update: Oh, yeah. You can also clear a console with Cmd-L now (the current prompt is restored). I saw that feature request on R-SIG-Mac a couple of weeks ago, and it was something like 5 lines of code, so I tossed it in.
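The Workspace display described above (class, size and a localized "n objects" count) boils down to a few base R calls. The sketch below is hypothetical and is not the GUI's actual display-transformer code; it just shows where that information can come from, using `ngettext()` so the count is pluralised through R's usual message-translation mechanism.

```r
# Hypothetical sketch of the data behind the Workspace list: one row per
# object with its class and size, plus a pluralised "n objects" footer.
workspace_summary <- function(envir = globalenv()) {
  objs <- ls(envir)
  info <- data.frame(
    name  = objs,
    class = vapply(objs, function(n) class(get(n, envir = envir))[1L],
                   character(1), USE.NAMES = FALSE),
    size  = vapply(objs, function(n) format(object.size(get(n, envir = envir)),
                                            units = "auto"),
                   character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  n <- length(objs)
  # ngettext() chooses singular/plural and is translatable like other messages
  attr(info, "footer") <- sprintf(ngettext(n, "%d object", "%d objects"), n)
  info
}

# Usage with a scratch environment standing in for a server's workspace
ws <- new.env()
assign("x", 1:100, envir = ws)
assign("fit", lm(dist ~ speed, data = cars), envir = ws)
ws_info <- workspace_summary(ws)
ws_info
attr(ws_info, "footer")
```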

Monday, July 9, 2007

R GUI Screencast

Some of the new GUI features are hard to get from screenshots so I decided to make a little screencast to show off a couple of the new features.

Saturday, July 7, 2007

Starting to think about help...

You know what would be awesome? If the HTML manpages generated by R were actually microformat-enabled HTML. I was just thinking about how it would be nice if you could identify example code and whatnot from the eventual help page. I wonder how scary that code is? IIRC the HTML generated by R's manpage generator was pretty much HTML3 (so not even easy to style) so it could be pretty scary. It may even be one of the Perl bits.

Okay, I need to get some sleep now.

Friday, July 6, 2007

Making Progress on the Console

Making some progress on the GUI's Console implementation. You can see that we now indicate WHICH server we're talking to at the moment. I've added an attached "workspace" inspector in the form of a source list such as the ones you see in Mail and iTunes. This holds information about the objects for inspection or dragging and dropping between servers. The thin black line is a splitter bar and is user selectable (double clicking collapses it entirely).

Both sides now have a status bar (of course I made sure the shading matched the action button... why do you ask?) with an action menu. As well, the hint is back and is now powered by a new object inspection infrastructure in RExecServer (I have some other code to work on, but I will hopefully find a few minutes to finish things and do a git push). We're now parsing out the function information rather than capturing string output, so I think we can do completion in the style of Xcode or TextMate where, say, completing lm would result in:


lm(<#formula>,<#data>,<#subset>,<#weights>,<#na.action>)
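One way such a template could be generated, sketched under the assumption that the argument names come straight from the function's formals (the post only says the hint infrastructure parses out function information). `completion_template()` is a hypothetical helper, not part of RExecServer.

```r
# Hypothetical helper (not part of RExecServer): build an Xcode/TextMate-style
# completion template with <#name> placeholders from a function's arguments.
completion_template <- function(fn_name, max_args = 5L) {
  # args() also covers primitives such as sum(), whose formals() is NULL
  arg_names <- names(formals(args(fn_name)))
  arg_names <- setdiff(arg_names, "...")  # "..." has no sensible placeholder
  arg_names <- head(arg_names, max_args)  # keep long signatures readable
  if (length(arg_names) == 0L) return(paste0(fn_name, "()"))
  paste0(fn_name, "(", paste0("<#", arg_names, ">", collapse = ","), ")")
}

completion_template("lm")
# "lm(<#formula>,<#data>,<#subset>,<#weights>,<#na.action>)"
```

Truncating to the first five arguments is just a choice that reproduces the example; a real implementation might prefer to keep only arguments without defaults.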

I think Ctrl-/ should advance and that TAB (perhaps, I'm open to suggestion) should complete the editing. In other words, imagine we had...


lm(X ~ Y,mydata,<#subset>,<#weights>,<#na.action>)

then TAB would clear the other three arguments and jump us out of the function. You can't see it, but the console is actually doing a lot of invisible markup that we can use to detect the completion regions. We're presently also using text attributes to track input, prompt, error and output regions. This means we can easily cut and paste text WITHOUT prompts (which has been a major feature request) and without ambiguity. We can also save a console session as a sourceable .R file for later use (sort of like dribble), saving the previous output as comments or not at all.
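The editing behaviour and the transcript saving described above can be sketched in plain R. All three functions are hypothetical, and the `regions` data frame is an assumed stand-in for the console's text-attribute markup (one type plus text per region); the real GUI works with Cocoa text attributes, not regular expressions.

```r
# Hypothetical helpers for the <#name> placeholder scheme.
placeholder_re <- "<#[^>]*>"

# Ctrl-/ : find the next placeholder at or after character position `from`.
# Returns its start and end positions, or NULL when none remain.
next_placeholder <- function(text, from = 1L) {
  hit <- regexpr(placeholder_re, substring(text, from))
  if (hit == -1L) return(NULL)
  start <- from + as.integer(hit) - 1L
  c(start = start, end = start + attr(hit, "match.length") - 1L)
}

# TAB : drop every unfilled placeholder along with its leading comma.
finish_call <- function(text) gsub(paste0(",?", placeholder_re), "", text)

# Save a transcript as a sourceable script: inputs are kept verbatim,
# prompts are dropped, and output/error text is commented out or discarded.
as_r_script <- function(regions, keep_output = TRUE) {
  lines <- character(0)
  for (i in seq_len(nrow(regions))) {
    type <- as.character(regions$type[i])
    text <- as.character(regions$text[i])
    if (type == "input") {
      lines <- c(lines, text)
    } else if (keep_output && type %in% c("output", "error")) {
      lines <- c(lines, paste0("# ", strsplit(text, "\n", fixed = TRUE)[[1]]))
    }
  }
  lines
}

s <- "lm(X ~ Y,mydata,<#subset>,<#weights>,<#na.action>)"
next_placeholder(s)  # start 17, end 25: the <#subset> slot
finish_call(s)       # "lm(X ~ Y,mydata)"

regions <- data.frame(type = c("prompt", "input", "output"),
                      text = c("> ", "x <- 1:3; sum(x)", "[1] 6"),
                      stringsAsFactors = FALSE)
as_r_script(regions)  # "x <- 1:3; sum(x)" "# [1] 6"
```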

Thursday, July 5, 2007

Oh Good, Performance Doesn't Totally Suck.

Been doing a bit of performance testing this evening on the R GUI with RExecServer. Qualitatively, on my year old MacBook Pro, it seems that nothing has changed. At least, the overhead incurred by drawing the graphs or outputting the text seems to outweigh the communication overhead by quite a bit. This may change when hinting gets going, but thus far there doesn't appear to be a problem. Additionally, we have plenty of space for optimization in the distributed objects interface since we're not defining protocols yet. Good good good.

One thing does worry me: shipping bitmaps across the wire. I *think* NSData is actually optimized to use shared memory, but I'm not positive. In any case, the bitmap of a full-screen graph is pretty huge. That said, we could always do some timing (hey, we're statisticians, right? Staunch empiricists.) and then compress bitmaps above a certain area for transport via libz or something suitably fast. Hell, RLE would probably be a big win in most plots. I know you're thinking "use PDF," but PDF rendering performance is really awful relative to compositing to a bitmap. (Witness any complex scatterplot in your Keynote presentation. There's a reason it takes 30 seconds for your graph to come up.)
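To get a feel for why RLE or zlib should pay off, here is a small sketch on a synthetic 640x480 8-bit grayscale "plot" (a white canvas with a black frame and a diagonal line), not real device output. It uses base R's `rle()` and `memCompress()`, which uses zlib for type "gzip". Real plots with text and antialiasing will compress less dramatically, and wrapping the calls in `system.time()` is the obvious place to start on the timing.

```r
# Synthetic stand-in for a device bitmap: 8-bit grayscale, white background,
# black frame and one diagonal line. Not output from a real graphics device.
w <- 640L; h <- 480L
img <- matrix(255L, nrow = h, ncol = w)
img[c(1L, h), ] <- 0L
img[, c(1L, w)] <- 0L
for (i in seq_len(h)) img[i, as.integer(i * w / h)] <- 0L

bytes <- as.raw(t(img))               # row-major byte stream, 1 byte/pixel

runs   <- rle(as.integer(bytes))      # run-length encoding
zipped <- memCompress(bytes, "gzip")  # zlib (deflate) compression

# Both encodings are lossless:
stopifnot(identical(as.raw(inverse.rle(runs)), bytes),
          identical(memDecompress(zipped, "gzip"), bytes))

c(raw_bytes = length(bytes), rle_runs = length(runs$lengths),
  zlib_bytes = length(zipped))
```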

Wednesday, July 4, 2007

Ha Ha! Success!

After several hours of overlooking one very important line of code in RExecServer, which kept it from operating in purely vended mode (it was a delegate problem: the TerminalDelegate would replace it), I've gotten RExecServer executing and communicating with an R GUI.

The GUI is Leopard-specific, but here are some details so far:

We use a normal NSDocument, with no modifications to NSDocumentController or NSApplication and none of the questionable things I had to do for the pre-RExecServer GUI implementation.

Devices are considered to be views of the document, so devices and the console are explicitly intertwined now, in much the same way that the main Interface Builder window and its associated UI views are linked. Hopefully this means an end to the infamous "hanging" windows.

Scripts won't be, but we'll need a UI for specifying their target console...

Tuesday, July 3, 2007

Apparently I'm on a tear...

Rob Goedman gets the Official Tester Award for RExecServer, reporting several issues with the graphics subsystem and my inability to appropriately use version control systems. There are a bunch of changes in the git repository:

All the files you need are actually there

Clipping rects are restored on subsequent updates so some things are working better

Terminal support has been refactored into a pseudo-GUI to help make sure I have coverage on the things I'll need for a true front end. It's probably fractionally slower than a true Terminal version of R, but it has some more flexibility.

Device windows close when devices go away. A good thing too because they'd crash if you resized them after the device disappeared.

I've been playing with making the RGUI_Type not be AQUA, which restores all of the help file functionality, but causes an annoying (and untrue) complaint from quartz() among other things like trying to use X11 for select.list(). I wish there was a way to selectively deactivate things like that in R instead of the blanket situation we have now. There are a number of places where we have


if(.Platform$GUI == "windows" | .Platform$GUI == "AQUA") ... else ...

that are just irritating. Personally, I think we should be doing dispatch (S3, S4, I don't care) where .Platform$GUI becomes an _object_ so that I can define functions for my particular GUI and where .default is the stuff on the RHS of the else.
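The dispatch idea above could look something like this minimal S3 sketch. Everything here is hypothetical: `.Platform$GUI` is a plain character string in R, and `make_gui()`, `choose_item()` and the `RExecServer` class are made-up names used only to show a front end overriding one behaviour while `.default` supplies the "else" branch.

```r
# Hypothetical: represent the GUI as an object whose class drives dispatch.
make_gui <- function(name) structure(list(name = name), class = c(name, "gui"))

# A generic for one GUI-sensitive operation (think select.list())
choose_item <- function(gui, choices, ...) UseMethod("choose_item")

# .default plays the role of the RHS of the old "else"
choose_item.default <- function(gui, choices, ...) {
  paste("text menu:", paste(choices, collapse = ", "))
}

# A front end overrides only what it actually implements
choose_item.RExecServer <- function(gui, choices, ...) {
  paste("native sheet:", paste(choices, collapse = ", "))
}

choose_item(make_gui("RExecServer"), c("a", "b"))  # "native sheet: a, b"
choose_item(make_gui("X11"), c("a", "b"))          # "text menu: a, b"
```

The appeal is that adding a new front end means defining methods rather than editing every `if (.Platform$GUI == ...)` test scattered through R.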

Monday, July 2, 2007

RExecServer object vending

The RExecServer started getting basic object vending today. To test it out I implemented some simple object copying using Distributed Objects. This is checked into the public git repository, but will require R-devel to run because of changes to function export under R-devel. Behold!

First we start ourselves a couple of copies of R. Then,


> x = 1:100
> .Call("RES_CopyObject","x","R Execution Server 2")
NULL
> .Call("RES_ServerName")
[1] "R Execution Server 1"
>

Afterwards, we can take a look at the other execution server:


> ls()
[1] "x"
> .Call("RES_ServerName")
[1] "R Execution Server 2"
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
>

Ta Da! There's no big trick to it: we're just using the standard serialization routines to read and write NSData objects on the Cocoa side and then using the usual NSDistantObject routines to actually transmit the data. There's no real error checking at the moment, but that will come.
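At the R level, the same round trip can be approximated with `serialize()` and `unserialize()`. This is a sketch under the assumption that RES_CopyObject's C code does the equivalent through R's serialization API; `copy_object()` and the `remote` environment are hypothetical stand-ins, and no Distributed Objects transport is involved.

```r
# Hypothetical R-level analogue of RES_CopyObject: serialize to bytes (the
# NSData on the Cocoa side), "send" them, and unserialize on the other end.
copy_object <- function(name, to, from = parent.frame()) {
  value   <- get(name, envir = from)
  payload <- serialize(value, connection = NULL)  # raw vector of bytes
  # ... a real server would ship `payload` over Distributed Objects here ...
  assign(name, unserialize(payload), envir = to)
  invisible(length(payload))                      # bytes that went "over the wire"
}

remote <- new.env()  # plays the part of "R Execution Server 2"
x <- 1:100
copy_object("x", to = remote)
identical(remote$x, x)  # TRUE
```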

Also, with the help of some scripts from Rob Goedman I think I've tracked down the last of the clipping and state stacking errors in this latest checkin.

RExecServer available as a git repository

The RExecServer source is now available as a git repository. Once I figure out what exactly they mean, I will be "mobbing" it to allow for patch submission. The binary hasn't been updated, but the source fixes a few reported problems with the pager() and some graphics issues (clipping mostly). It also adds the ability to specify the usual command-line options except for the GUI-related ones (for obvious reasons).
