Statistical Computing & Software: 2007

Friday, October 5, 2007

R 2.7.0 Quartz graphics device clipboard support

Tom Elliot recently posted a request to the R-SIG-Mac list looking for a way to programmatically put graphics onto the clipboard under OS X. Right now this is pretty hard, but I was inspired to patch R-devel's (the future 2.7.0) new Quartz device to allow for clipboard output.

After applying the patch, you can specify file="clipboard://" and the image is written to the clipboard when the device is closed (yes, you MUST close the graphics device to see output; it would probably be a good idea to write the file on a NewPage as well, come to think of it). Then you can wrap up a simple function to copy the current device to the clipboard (or you can simply open the device directly).


copy.to.clipboard = function(dpi=300) { dev.copy(device=quartz,type="png",file="clipboard://",dpi=dpi); dev.off() }

I've attached the patch for the brave souls willing to try it out:


Index: qdBitmap.c
===================================================================
--- qdBitmap.c  (revision 43081)
+++ qdBitmap.c  (working copy)
@@ -49,16 +49,45 @@
         /* On 10.4+ we can employ the CGImageDestination API to create a
            variety of different bitmap formats */
 #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_4
-        CFURLRef    path  = CFURLCreateFromFileSystemRepresentation(kCFAllocatorDefault,(const UInt8*)qbd->path,strlen(qbd->path),FALSE);
-        CFStringRef type  = CFStringCreateWithBytes(kCFAllocatorDefault,(UInt8*)qbd->uti,strlen(qbd->uti),kCFStringEncodingUTF8,FALSE);
-        CGImageDestinationRef dest = CGImageDestinationCreateWithURL(path,type,1,NULL);
-        CGImageRef image = CGBitmapContextCreateImage(qbd->bitmap);
-        CGImageDestinationAddImage(dest,image,NULL);
-        CGImageDestinationFinalize(dest);
-        CFRelease(image);
-        CFRelease(dest);
-        CFRelease(type);
+        CFStringRef pathString = CFStringCreateWithBytes(kCFAllocatorDefault,(UInt8*)qbd->path,strlen(qbd->path),kCFStringEncodingUTF8,FALSE);
+               CFURLRef path;
+               if(CFStringFind(pathString,CFSTR("://"),0).location != kCFNotFound) {
+               CFStringRef pathEscaped= CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault,pathString,NULL,NULL,kCFStringEncodingUTF8);
+                       path = CFURLCreateWithString(kCFAllocatorDefault,pathEscaped,NULL);
+                       CFRelease(pathEscaped);
+               } else {
+                       path = CFURLCreateFromFileSystemRepresentation(kCFAllocatorDefault,(const UInt8*)qbd->path,strlen(qbd->path),FALSE);
+               }
+               CFRelease(pathString);
+               
+               CFStringRef scheme = CFURLCopyScheme(path);
+               CFStringRef type  = CFStringCreateWithBytes(kCFAllocatorDefault,(UInt8*)qbd->uti,strlen(qbd->uti),kCFStringEncodingUTF8,FALSE);
+       CGImageRef image = CGBitmapContextCreateImage(qbd->bitmap);
+               if(CFStringCompare(scheme,CFSTR("file"),0) == 0) {
+               CGImageDestinationRef dest = CGImageDestinationCreateWithURL(path,type,1,NULL);
+               CGImageDestinationAddImage(dest,image,NULL);
+               CGImageDestinationFinalize(dest);
+               CFRelease(dest);
+               } else if(CFStringCompare(scheme,CFSTR("clipboard"),0) == 0) {
+                       //Copy our image into data
+                       CFMutableDataRef      data = CFDataCreateMutable(kCFAllocatorDefault,0);
+                       CGImageDestinationRef dest = CGImageDestinationCreateWithData(data,type,1,NULL);
+                       CGImageDestinationAddImage(dest,image,NULL);
+                       CGImageDestinationFinalize(dest);
+                       CFRelease(dest);
+                       PasteboardRef pb = NULL;
+                       if(noErr == PasteboardCreate(kPasteboardClipboard,&pb)) {
+                               PasteboardClear(pb);
+                               PasteboardSyncFlags syncFlags = PasteboardSynchronize(pb);
+                               PasteboardPutItemFlavor(pb,(PasteboardItemID)1,type,data,0);
+                       }
+                       CFRelease(data);
+               } else
+                       warning("Not a supported scheme, no image data written.");
+               CFRelease(scheme);
+               CFRelease(type);
         CFRelease(path);
+               CFRelease(image);
 #endif
     }
     /* Free ourselves */

Friday, September 21, 2007

Those of you tracking R-devel may have noticed...

...that there was recently a big change to the way the Quartz device works. Turns out, Simon and I had roughly the same idea about a Quartz 2D implementation of the Quartz device for reasons of speed and the combined result (I ran "patch" on my R-devel tree and then Simon did all of the hard work :-) ) is in there now. This means the RExecServer and other R GUIs can now all use the same drawing backend for a variety of sources. I believe, for people who don't want to use RExecServer, that Simon also put in a CGLayer-based Cocoa target that runs an event loop in much the same way as RExecServer to give you interactive graphics natively in v2.7 (I'm guessing, since the feature freeze has passed). It also means native access to all of the Quartz 2D bitmap generation goodness and a lot more speed for ALL of the graphics device implementations.

Simon also went through and added some other features as well, including support for non-square pixels and some scaling things that I never got working correctly.

So, things have been moving along even though it may appear that I dropped off the face of the planet for a while. I'm going to be updating RExecServer soon, which means a dependence on R-devel, to use the converged device and its new API (which is also available to any GUI developer).

Tuesday, September 4, 2007

Not Dead

Just busy trying to get some software sorted out before feature freeze. Turns out to be a lot of work when you break R. :-)

Friday, August 17, 2007

Cover Flow!

Yup. Figured it out. One of two ways to browse display plots. There is a more useful mode, using an iPhoto-like Browser that also lets you edit the list of plots. The arrow keys also let you move between plots.

Thursday, August 16, 2007

iWork '08 Spelunking.

Fiddling around with iWork '08 a bit and Numbers support is trivial to add to the R GUI. Basically, it exports a tab delimited version as text, much like Excel, though it uses the more modern pasteboard type (where Excel exports a much less useful string). We can detect that it is Numbers from the metadata and "native" pasteboard types and throw up the same dialog as for the Excel pasting. Yay!

I'm thinking, for symmetry, that I should let you right click on a data frame in the workspace and copy as a tab delimited type for pasting into Excel/Numbers as well.
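The R side of both directions is small. A minimal sketch, assuming the pasteboard hands over plain tab-delimited text with a header row; the sample string and variable names are made up for illustration, and the Cocoa pasteboard handling itself is not shown:

```r
# Hypothetical text as it might arrive from a spreadsheet paste
pasted <- "dose\tresponse\n1\t0.5\n2\t0.9\n"

# Incoming direction: tab-delimited text becomes a data frame
df <- read.delim(text = pasted)

# Outgoing direction: a data frame becomes tab-delimited text
# suitable for handing back to a spreadsheet
out <- paste(
  capture.output(
    write.table(df, sep = "\t", quote = FALSE, row.names = FALSE)
  ),
  collapse = "\n"
)
cat(out, "\n")
```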

Friday, August 10, 2007

OLPC Musing

So, I've been playing with the OLPC a bit (I got a chance to play with a prototype in person at a party a while back) and I wonder: where's the science/mathematics? Authoring languages and EToys, which seem to be the focus presently (well, that and Tetris of course) are all well and good, but it would be really nice to see some more focus on the physical sciences, which leads naturally into topics like data analysis and statistics (collecting data is just the first step after all).

The minimal implementation would seem to be to start with something like the TILE/StatDocs projects that Duncan Temple Lang (Duncan, I hereby dub you DTL for the remainder of my posts. Sort of like DHH, the Ruby on Rails guy) and Deb Nolan have been doing for a while now (there used to be a StatDocs.org website, but it seems to have disappeared). They give you an organizational unit of a "Lab" or an "Experiment," with some data and ways of interacting with the data (plotting and so on). These'd probably be hosted on the school's server and brought across the network dynamically. Another interesting tack would be to let the server also COLLECT data from the kids' experiments to be combined together into larger datasets via surveys or actual data collection. An ad hoc version of this is pretty common in introductory statistics courses for example.

This leads to the question of what to use as the analytical environment. Being biased, I would tend towards using R + GTK + Gecko, with a UI designed for the more modal experience of the OLPC. Alternatively, Python could be used for the implementation, though I'm not convinced that Python would be a particularly good interactive data language (things like R/S's formula DSL are really quite powerful for example). I tend to dislike point-n-click on principle, especially in an environment that stresses authoring languages.

Beyond that, you want kids to be able to collect their own data. I remember having this thing for my Atari 800 when I was a kid that had a number of different probe options for things like temperature and pH and rainfall and such. The software it had was pretty primitive, but it was lots of fun (for me anyway) playing with the temperature probe and such. It seems like the USB ports could be used to build data acquisition systems pretty cheaply these days---you wouldn't need really high performance for the most part and that lets kids collect their own data, which will always be more compelling than some prepackaged dataset since they "own" the data. There's at least one, possibly more, Open Source hardware project for building lots of these sorts of sensors for DIY weather monitoring that can probably be leveraged for that sort of thing.

Saturday, August 4, 2007

RExecServer can load help

It's not the best implementation in the world---I still think that should take place at the R level not in the GUI implementation---but on the plane to Seattle I gave RExecServer the ability to load help via the pager instead of just complaining.

Friday, August 3, 2007

Using RExecServer from ESS

In the long term I hope that most of this can be configured by an Installer of some sort, but for those wanting to use RExecServer with ESS, here's how my setup is currently configured:

When I use Emacs at all (which is very rarely these days, I mostly use TextMate) I use Aquamacs, which ships with ESS installed. Hopefully the instructions won't be all that different for people using Carbon Emacs and what have you (and if they are, perhaps people would be so kind as to post their changes). I'm also not a super-sophisticated ESS user so there might be better ways of doing the ESS side of things.

First, though, we need to get ourselves easy access to RExecServer from the command line. I like to do this via a symbolic link in /usr/local/bin (/usr/bin would also work). You'd think that we could just symlink RExecServer.app/Contents/MacOS/RExecServer and be done with it, but the way OS X starts applications is... uh... strange and doing that will cause all sorts of problems because the bundle loader won't be able to find any resources and will promptly crash. It's not pretty. What we need to do is actually set a special environment variable called CFProcessPath to point to the application bundle rather than /usr/local or whatever.

Fortunately, I made this into a little script so you won't have to do it yourself. It lives in the Resources folder of the application bundle. So, from Terminal we would do something like:


$ cd /usr/local/bin
$ sudo ln -s /Applications/RExecServer.app/Contents/Resources/RExecServer.sh R-exec

If you have put RExecServer.app somewhere other than Applications, use that path instead of "/Applications." That's pretty much all there is to it. Of course, "/usr/local/bin" should be in your path. Then:


$ R-exec

R version 2.6.0 Under development (unstable) (2007-07-14 r42234)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

You should be able to use R-exec from Terminal just like normal R. For ESS, I have a line in my .emacs file:


(require 'ess-site)
(setq inferior-R-program-name "R-exec")

that starts my RExecServer version of R instead of the normal version. Once upon a time, I think I read that ESS is smart enough to detect multiple installed versions of R with specially named symlinks, but I don't know how to get it to work (if an ESS expert knows how to do this, please chime in).

In any case, that's pretty much all there is to it. Hopefully that helps, but if you have questions please email me or (better yet so others can see) leave a comment.

Thursday, August 2, 2007

Oh, yeah.

I did figure out one thing in an idle moment on vacation last week. How to paste from Excel to the R console. Thought y'all might like something like that. I should probably do another featurecast soon to show some of the things that have been fleshed out a bit. These include...

Multi-level hinting. If you have "plot(foo()," the hinting mechanism will show you "plot," rather than nothing.

Drag-n-drop from the workspace to the console gets a summary() on those objects

Excel (or Excel-like things I suppose. Anything that posts VALU or NSTabularPboardType) pasting

Device recording, with the option to deactivate as well as clear plot listing

Clear the console

Copying text from the console optionally does so in a "source()-able" manner. i.e. prompts and output are removed.

Keybindings for popular things like Command-= to give you "<-"

TextMate-style brace handling. i.e. select some text and type "(" (or ",`,',{,[,Command-[) and it will put the appropriate bracket around the selected text.

An editor. :-)

Preferences :-)

Any requests?

I'm sure there's other stuff I've forgotten at this point, but I'm still unburying myself otherwise and getting ready to head to BioC'07 in a couple of days.

Tuesday, July 31, 2007

Quick Post

Back from vacation, got a request for instructions on how to set up ESS to work with RExecServer. Consider me on that... right after I dig myself out of this email hole. :-)

Thursday, July 19, 2007

And We're Back!

Okay, apparently Blogger decided that this was a spam blog for a while there and I didn't notice because I had other things to do this week. All sorted out now it seems. To celebrate, I updated the leopard branch of the RExecServer repository to activate bitmap graphics output to files, thusly:


quartz("public.jpeg:Rplot.jpg")

Supported UTIs are public.jpeg, public.tiff, public.png and public.jpeg-2000. public.pdf will be coming along in the not too distant future. This is actually a side effect of moving from using raw bitmaps in the GUI to using png images. Your average statistical graphic is usually pretty easy to compress so you can get from ~500K for a raw image down to ~20K pretty easily. On the server side it doesn't matter much because we only keep one bitmap for each device, but on the GUI we're keeping a potentially large number of pages that can eat up RAM.
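The device argument packs two things into one string: a UTI and a file name. As a rough illustration only, here is how such a string could be split at the first colon in R. parse_quartz_spec is a hypothetical helper, not part of RExecServer, and the real device may well parse the string differently.

```r
# Hypothetical helper: split "uti:filename" at the FIRST colon only,
# so that file names containing colons would still survive.
parse_quartz_spec <- function(spec) {
  pos <- regexpr(":", spec, fixed = TRUE)[1]
  if (pos < 1) {
    stop("expected a string of the form 'uti:filename'")
  }
  list(
    uti  = substr(spec, 1, pos - 1),
    file = substr(spec, pos + 1, nchar(spec))
  )
}

spec <- parse_quartz_spec("public.jpeg:Rplot.jpg")
str(spec)
```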

To people tracking the repository, the leopard branch is really no different from the master branch. I just push work done in my Leopard partition there before migrating to the master branch with a test run in Tiger. The leopard branch will work in Tiger unless I've made a mistake.

In other news, posting will be spotty over the next week. I'm heading down to Los Angeles tomorrow afternoon and then down to San Diego on Wednesday and Internet access will come and go.

Sunday, July 15, 2007

RExecServer git repository update

The master branch of the git repository is now up-to-date with the leopard branch, minus any leopard specific code (of which there is none at present). R-devel is required to get things to compile. The GUI code isn't available yet; I'd like to get things to a truly usable stage first. I haven't had as much time this week, other things have taken priority, but some minor stuff was added such as the beginnings of the preference panes (and the infrastructure to support them) as well as the start of the action menus. I also changed the graphics device plot selector to take advantage of available space. Perhaps I'll post a screencast of that soon.

In the meantime, people who aren't using OS X, but want to consider a similar GUI type of set up might want to look at the Omegahat CORBA stuff, which did something very similar a long time ago. The page hasn't even been updated since before OS X 10.0 was released. :-)

Tuesday, July 10, 2007

Just a couple minor updates

Doing some other things today so only a couple of minor updates while watching The Tour this evening:

Removed the mouse-sensitive pages menu. This seems to be pretty disruptive without putting some extra delay into the transition, but the delay makes it too slow. Instead, I'm now using the toolbar lozenge to show and hide. I think I'll also hook it up to middle-click.

Implemented some display transformers for the Workspace. There's a (localized) "n objects" on the bottom now. The class and size information for objects is now shown (could probably be prettier!) as well.
Updated: Oh, yeah. You can also clear a console with Cmd-L now (the current prompt is restored). I saw that feature request on R-SIG-Mac a couple of weeks ago and it was something like 5 lines of code so I tossed it in.

Monday, July 9, 2007

R GUI Screencast

Some of the new GUI features are hard to get from screenshots so I decided to make a little screencast to show off a couple of the new features.

Saturday, July 7, 2007

Starting to think about help...

You know what would be awesome? If the HTML manpages generated by R were actually microformat-enabled HTML. I was just thinking about how it would be nice if you could identify example code and whatnot from the eventual help page. I wonder how scary that code is? IIRC the HTML generated by R's manpage generator was pretty much HTML3 (so not even easy to style) so it could be pretty scary. It may even be one of the Perl bits.

Okay, I need to get some sleep now.

Friday, July 6, 2007

Making Progress on the Console

Making some progress on the GUI's Console implementation. You can see that we now indicate WHICH server we're talking to at the moment. I've added an attached "workspace" inspector in the form of a source list such as the ones you see in Mail and iTunes. This holds information about the objects for inspection or dragging and dropping between servers. The thin black line is a splitter bar and is user selectable (double clicking collapses it entirely).

Both sides now have a status bar (of course I made sure the shading matched the action button... why do you ask?) with an action menu. As well, the hint is back and is now being powered from a new object inspection infrastructure in RExecServer (I have some other code to work on, but I will hopefully find a few minutes to finish things and do a git push). We're now parsing out the function information rather than capturing string output so I think we can do completion in the style of Xcode or TextMate where, say, lm completion would result in:


lm(<#formula>,<#data>,<#subset>,<#weights>,<#na.action>)

I think Ctrl-/ should advance and that TAB (perhaps, I'm open to suggestion) should complete the editing. In other words, imagine we had...


lm(X ~ Y,mydata,<#subset>,<#weights>,<#na.action>)

then TAB would clean the other three arguments and jump us out of the function. You can't see it, but the console is actually doing a lot of invisible markup that we can use to detect the completion regions. We're presently also using text attributes to track input, prompt, error and output regions. This means we can easily cut and paste text WITHOUT prompts (which has been a major feature request) and without ambiguity. We can also save a console script as a sourceable .R file for later use (sort of like dribble) saving the previous output as comments or not at all.
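The placeholder string itself is easy to derive from a function's formal arguments. A minimal sketch in plain R, assuming the template is just the first few argument names wrapped in <#...>; completion_template is a made-up name and this is not the RExecServer inspection code:

```r
# Hypothetical helper: build an Xcode/TextMate-style completion template
# from the formal arguments of a function.
completion_template <- function(fname, n = 5) {
  f <- get(fname, mode = "function")
  args <- names(formals(f))          # NULL for primitives
  args <- setdiff(args, "...")       # "..." is not a useful placeholder
  args <- head(args, n)              # keep the template short
  if (length(args) == 0) {
    return(paste0(fname, "()"))
  }
  paste0(fname, "(", paste0("<#", args, ">", collapse = ","), ")")
}

template <- completion_template("lm")
cat(template, "\n")
```

Cutting the list off at five arguments is an arbitrary choice here; a real implementation would presumably want something smarter, such as showing only arguments without defaults.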

Thursday, July 5, 2007

Oh Good, Performance Doesn't Totally Suck.

Been doing a bit of performance testing this evening on the R GUI with RExecServer. Qualitatively, on my year old MacBook Pro, it seems that nothing has changed. At least, the overhead incurred by drawing the graphs or outputting the text seems to outweigh the communication overhead by quite a bit. This may change when hinting gets going, but thus far there doesn't appear to be a problem. Additionally, we have plenty of space for optimization in the distributed objects interface since we're not defining protocols yet. Good good good.

One thing does worry me: shipping bitmaps across the wire. I *think* that NSData is actually optimized to use shared memory, but I'm not positive. In any case, the bitmap size of a full screen graph is pretty huge. That said, we could always do some timing (hey, we're statisticians, right? Staunch empiricists.) and then compress bitmaps above a certain area for transport via libz or something suitably fast. Hell, RLE would probably be a big win in most plots. I know you're thinking "use PDF," but PDF rendering performance is really awful relative to compositing to a bitmap. (Witness a complex scatterplot in your Keynote presentation. There's a reason it takes 30 seconds for your graph to come up.)
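To get a feel for why RLE or zlib should pay off, here is a toy experiment in R rather than in the Cocoa code. The "plot" is an assumption for illustration: a 200x200 single-channel bitmap that is all white apart from one diagonal line, which is far simpler than a real device bitmap.

```r
# Toy bitmap: white background (255) with a black diagonal line (0)
bitmap <- matrix(255L, nrow = 200, ncol = 200)
diag(bitmap) <- 0L
pixels <- as.vector(bitmap)

# Run-length encode, then check the encoding is lossless
runs     <- rle(pixels)
n_runs   <- length(runs$lengths)
restored <- inverse.rle(runs)

# zlib-style compression of the same bytes, for comparison
gz <- memCompress(as.raw(pixels), type = "gzip")

cat("pixels:", length(pixels),
    " runs:", n_runs,
    " gzip bytes:", length(gz), "\n")
```

Real plots with antialiasing and colour will compress less dramatically than this, which is exactly why timing it on actual device output would be the right next step.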

Wednesday, July 4, 2007

Ha Ha! Success!

After several hours of missing one very important line of code in RExecServer, which was preventing it from operating in purely vended mode---it was a delegate problem and the TerminalDelegate would replace it---I've gotten RExecServer executing and communicating with an R GUI.

The GUI is Leopard specific, but I can offer some details so far:

We use a normal NSDocument with no modifications to NSDocumentController or NSApplication or any of the questionable things I had to do for the pre-RExecServer GUI implementation

Devices are considered to be views of the document so devices and the console are explicitly intertwined now, in much the same way that the main Interface Builder window and the associated UI views are linked. Hopefully this means an end to the infamous "hanging" windows.

Scripts won't be, but we'll need a UI for specifying their target console...

Tuesday, July 3, 2007

Apparently I'm on a tear...

Rob Goedman gets the Official Tester Award for RExecServer, reporting several issues with the graphics subsystem and my inability to appropriately use version control systems. There are a bunch of changes in the git repository:

All the files you need are actually there

Clipping rects are restored on subsequent updates so some things are working better

Terminal support has been refactored into a pseudo-GUI to help make sure I have coverage on the things I'll need for a true front end. It's probably fractionally slower than a true Terminal version of R, but it has some more flexibility

Device windows close when devices go away. A good thing too because they'd crash if you resized them after the device disappeared.

I've been playing with making the RGUI_Type not be AQUA, which restores all of the help file functionality, but causes an annoying (and untrue) complaint from quartz() among other things like trying to use X11 for select.list(). I wish there was a way to selectively deactivate things like that in R instead of the blanket situation we have now. There are a number of places where we have


if(.Platform$GUI == "windows" | .Platform$GUI == "AQUA") ... else ...

that are just irritating. Personally, I think we should be doing dispatch (S3, S4, I don't care) where .Platform$GUI becomes an _object_ so that I can define functions for my particular GUI and where .default is the stuff on the RHS of the else.
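Something like the following S3 sketch is what I have in mind. Everything in it is hypothetical: gui_object, choose_item and the "RExecServer" GUI name are invented for illustration and nothing like this exists in R itself.

```r
# Hypothetical: wrap the GUI identifier in a classed object so that
# front ends can supply their own methods.
gui_object <- function(name = .Platform$GUI) {
  structure(list(name = name), class = c(paste0("gui_", name), "gui"))
}

# A generic that something like select.list() could call internally
choose_item <- function(gui, choices, ...) UseMethod("choose_item")

# The right-hand side of the "else" becomes the default method
choose_item.default <- function(gui, choices, ...) {
  paste("text menu:", paste(choices, collapse = ", "))
}

# A front end defines its own behaviour without touching base R
choose_item.gui_RExecServer <- function(gui, choices, ...) {
  paste("native dialog:", paste(choices, collapse = ", "))
}

res_gui   <- gui_object("RExecServer")
other_gui <- gui_object("unknown")

res_choice   <- choose_item(res_gui, c("a", "b"))
other_choice <- choose_item(other_gui, c("a", "b"))
cat(res_choice, "\n", other_choice, "\n", sep = "")
```

A GUI that defines no method simply falls through to the default, so nothing breaks for front ends that don't care.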

Monday, July 2, 2007

RExecServer object vending

The RExecServer started getting basic object vending today. To test it out I implemented some simple object copying using Distributed Objects. This is checked into the public git repository, but will require R-devel to run because of changes to function export under R-devel. Behold!

First we start ourselves a couple of copies of R. Then,


> x = 1:100
> .Call("RES_CopyObject","x","R Execution Server 2")
NULL
> .Call("RES_ServerName")
[1] "R Execution Server 1"
>

Afterwards, we can take a look at the other execution server:


> ls()
[1] "x"
> .Call("RES_ServerName")
[1] "R Execution Server 2"
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
>

Ta Da! There's no big trick to it--we're just using the standard serialization routines to read and write NSData objects on the Cocoa side and then using the usual NSDistantObject routines to actually transmit the data. There's no real error checking at the moment, but that will come.
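The R half of that round trip can be tried in a single session. This is only a sketch of the serialization step: the raw vector stands in for the NSData payload, and the Distributed Objects transport between the two servers is not shown.

```r
x <- 1:100

# Sender side: turn the object into a raw byte vector
payload <- serialize(x, connection = NULL)

# Receiver side: rebuild the object from the bytes
y <- unserialize(payload)

cat("payload is raw:", is.raw(payload),
    " round trip identical:", identical(x, y), "\n")
```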

Also, with the help of some scripts from Rob Goedman I think I've tracked down the last of the clipping and state stacking errors in this latest checkin.

RExecServer available as a git repository

The RExecServer source is now available as a git repository. Once I figure out what exactly they mean I will be "mobbing" it to allow for patch submission. The binary hasn't been updated, but the source fixes a few reported problems with the pager() and some graphics issues (clipping mostly). It also adds the ability to specify the usual command-line options except for the gui related ones (for obvious reasons).

Saturday, June 30, 2007

RExecServer Preview

Posted last night to R-SIG-Mac:

I've mentioned this little project to a few people off-list as something I'd like to do for Leopard, but it occurred to me that there is nothing particularly Leopard specific in this particular piece of code. So, here we go:

RExecServer

This project mostly comes out of a conversation I had with Stefano about using multiple cores in the R GUI and Xgrid. The RExecServer is a first step in that direction. It provides a true Cocoa application that runs as a background server (so no Dock icon or menubar). The user (that's you) communicates via either a normal stdio connection (i.e. Terminal or ESS) or using Distributed Objects (for the GUI). In this initial implementation only the stdio access is working.

To use it I recommend symlinking the shell script RExecServer.app/Contents/Resources/RExecServer.sh to something handy in either /usr/bin or /usr/local/bin so that you can get at it from ESS or the Terminal. Then, just use as normal.

Things you get:

Mostly a working, fully responsive, Quartz device. This Quartz implementation is actually completely new so you may notice that certain things are different. Particularly, the font metric calculations are now improved---note the location of elements in plotmath (particularly sum and product). Right now aliasing is turned off, but that will be an option (I was experimenting with something). It also doesn't update the screen until it's done processing so while it feels slower it is actually much faster. There might be a few clipping bugs, but we'll sort those out.

A normal readline-based interface that can be used from ESS or Terminal. You can also start multiple copies, though it presently complains about Services. This is harmless though.

Very low CPU usage when idle. I'm forced to use polling with readline, but it doesn't appear to use very much. The event loop works differently in this version so there is no need for a timer or anything.

I'm not sure how much time I'll have, but here's what the design buys me:

We can pipe bitmap and pdf output through the quartz device. This means no more X11 required. Right now this isn't working, but the infrastructure is in place.

We can separate the GUI and R itself. This has pros and cons but I think it will be a long term advantage, especially as we get more cores.

Things I'd really like to do (again, time):

Copy-n-paste objects between Servers. Using serialize/deserialize and Distant Object or NSPasteboard connections

Quicktime movie output device. This might wait for Leopard.

If you poke around the link above you might find some other ideas. :-)

What this isn't:

Intended as a complete GUI. That's mostly for the front-end implementation which is a separate application. The graphics device is intended to be very minimal for ESS users who want something better than the old Aqua device.

Anywhere near complete. You don't get lots of things right now. Like command-line options. Any options at all really. Lots of safety things aren't wired up either.

Let me know what you think and if you run into major trouble. The build is Universal so it should also work on PPC.

Thursday, June 28, 2007

An improved flowSet idiom?

In flowCore, a flowSet is associated with an AnnotatedDataFrame that contains ancillary information about the frame. This seems really useful, but it's really only used by the flowViz package---there really isn't anything remotely resembling a useful interactive idiom. Now, we could use subset(), assuming we could ever get the generic working properly, but we already have Subset in flowCore and the opportunity for confusion is high. One idea that has occurred to me is to take advantage of the ellipsis argument in [ and [[ to let us say things like

patient[[CellType="B Cells"]]

to extract the flowFrame identified by the CellType column.
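To see whether the idiom feels right, here is a toy version that does not touch flowCore at all. It is a sketch under stated assumptions: toy_set is an invented S3 class, the "frames" are plain matrices, and a data frame stands in for the AnnotatedDataFrame. A real implementation would be an S4 method on flowSet.

```r
# Invented stand-in for a flowSet: a list of frames plus per-frame metadata
toy_set <- function(frames, pheno) {
  structure(list(frames = frames, pheno = pheno), class = "toy_set")
}

# Named arguments in [[ ]] are treated as a query against the metadata;
# an unnamed argument is an ordinary index.
`[[.toy_set` <- function(x, ...) {
  parts <- unclass(x)
  query <- list(...)
  if (is.null(names(query))) {
    return(parts$frames[[query[[1]]]])
  }
  keep <- rep(TRUE, nrow(parts$pheno))
  for (column in names(query)) {
    keep <- keep & parts$pheno[[column]] == query[[column]]
  }
  if (sum(keep) != 1) {
    stop("query must identify exactly one frame")
  }
  parts$frames[[which(keep)]]
}

patient <- toy_set(
  frames = list(a = matrix(1:4, 2), b = matrix(5:8, 2)),
  pheno  = data.frame(CellType = c("T Cells", "B Cells"),
                      stringsAsFactors = FALSE)
)

b_cells <- patient[[CellType = "B Cells"]]
first   <- patient[[1]]
print(b_cells)
```

Insisting on exactly one match keeps [[ returning a single frame; a query that matches several frames would more naturally belong to [ and return a smaller set.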

Tuesday, June 26, 2007

If I Only Had The Time....

Other things I'd like to do with R if I had the time (if someone else needs a project, I won't mind :-) )

1. A JITer for R to extend Luke's Bytecode stuff. I would say something along the lines of a binding to, say, GNU Lightning. Now that function pointers can exist as R objects for use in .Call you'd create an EXTPTRSXP that protects the RAWSXP holding the generated code.

2. Finish up my libffi interface for R. I wrote one of these just after DSC2003. It probably still even works since libffi doesn't drift very much. It would probably be nicer if it was integrated with TypeInfo though. It also let you write R functions that appear to be C function pointers (for use as callbacks for example), though this has issues in multithreaded environments.

3. A centralized object database. One of the things I actually like about S-PLUS is the persistent database notion. I often have little pieces of code (for example my alpha function) that I stash away in scripts and places and then promptly lose. It would be nice to have a little database, perhaps with versioning, that you could easily tag and search. Hell, it could even sync with something online.

4. GData for R. I think it would be cool to be able to access Google Spreadsheets from R. It could be a pretty slick way of distributing data easily. If there was a way to hook it up to Google Docs for documentation and description through the help system that would also be cool. You'd probably have to use RCurl as the back end to get https support. I've started this one a couple of times but I don't really have a pressing need so I end up putting it on the back burner.

5. A complete dbxml interface. Again, I have chunks of this one, but never finished it (I ended up just using pipe() and the command-line tool). DBXML is pretty handy if you have a massive XML file (say a FlowJo workspace) and you only need a teensy tiny little chunk (like the gating strategy).

6. R on Rails! Okay, that might be a little silly (though I wonder how I would do R mixins...)

7. GPU backend. Actually, with the advent of CUDA you could probably do this pretty easily a la the Matrix package.

8. R/Flash (or Flex) interface. Plots as SWF anyone? Do it right and you could use it to serve up things in Flex/Apollo apps. I suppose you could also do a XAML one?
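
To make item 3 a bit more concrete, here is a minimal sketch of what a tagged, versioned snippet store could look like in plain R. Everything in it (new_snippet_db, stash, fetch, find_by_tag) is a hypothetical name made up for illustration. It only lives in memory, so persistence and the online sync are left out entirely.

```r
# Hypothetical in-memory snippet store; every name here is illustrative.
new_snippet_db <- function() {
  db <- new.env(parent = emptyenv())
  db$entries <- list()
  db
}

# Store a value under a name with some tags; each call appends a new version.
stash <- function(db, name, value, tags = character()) {
  versions <- db$entries[[name]]
  if (is.null(versions)) versions <- list()
  versions[[length(versions) + 1L]] <-
    list(value = value, tags = tags, saved = Sys.time())
  db$entries[[name]] <- versions
  invisible(length(versions))   # version number just written
}

# Fetch one version of a snippet (the latest by default).
fetch <- function(db, name, version = NULL) {
  versions <- db$entries[[name]]
  if (is.null(versions)) stop("no snippet named ", name)
  if (is.null(version)) version <- length(versions)
  versions[[version]]$value
}

# Names of all snippets whose latest version carries the given tag.
find_by_tag <- function(db, tag) {
  hits <- vapply(db$entries,
                 function(v) tag %in% v[[length(v)]]$tags,
                 logical(1))
  as.character(names(db$entries)[hits])
}

db <- new_snippet_db()
stash(db, "halve", function(x) x / 2, tags = "utility")
stash(db, "halve", function(x) x * 0.5, tags = c("utility", "math"))
find_by_tag(db, "math")
```

Keeping every version in a list per name is the simplest thing that gives you history; a real version would want to write each entry to disk (saveRDS would do) so the stash survives the session.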

Monday, June 25, 2007

Old Things New Again: Multiple Evaluators for R Under OS X

This post is mostly the result of a conversation I had with Stefano Iacus a couple of weeks ago at WWDC. He observed that a) he would like to be able to run multiple copies of R from the R GUI and b) he would really, really love to run R evaluators over Xgrid.

The second one might be harder, but I think the first one can be solved. Ideally, we would simply be able to spawn off multiple R evaluators in separate threads within the R GUI, and apart from synchronization problems in the GUI we would be good to go. However, I rate the chances of R becoming thread-safe (let alone supporting multiple evaluators) any time soon as "slim to none." Of course, I'm not on R Core so I could be wrong, but given what it would take (every function in every package would need an extra argument, for starters) it seems unlikely. The way around this? Spawn off separate R processes and connect them up within the R GUI. This is basically what people do when they use R from Terminal, so there is no real disadvantage compared to the current methods and a lot of potential advantages.
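
As a rough illustration of the "separate process" idea, here is a sketch that hands one expression to a fresh R process and collects what it prints. It leans on Rscript and system2 from a current R on a Unix-like system, and run_in_child is a hypothetical helper, not anything in the R GUI. A real server would stay alive and talk over a connection rather than starting up for every expression.

```r
# Run one R expression in a fresh R process and return what it printed.
# run_in_child() is a hypothetical helper for illustration only.
run_in_child <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # shQuote() keeps the expression together as a single -e argument
  system2(rscript, c("-e", shQuote(code)), stdout = TRUE)
}

# The child evaluates the expression in its own process; the parent
# only ever sees the captured output as a character vector.
res <- run_in_child("cat(6 * 7)")
```

The point is just that the child can crash or spin forever without taking the parent down with it, which is the property the GUI wants.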

So, the plan:

1. Implement RExecServer as an LSUIElement application. As much as I'd like to use Leopard-specific features here (garbage collection in particular), people using Tiger have multiple processors too. So, we're stuck with autorelease pools for now.
a. Vend an interface as an NSDistantObject that can be picked up by the GUI.
b. Provide a threaded stdin reader (if TERM is set) using the "traditional" R reader. This is to provide ESS support. I think we can actually vend the object and allow the stdin reader at the same time. Er, this could be a cool feature for something we'll talk about in October, if all goes well. (This isn't my day job, so it only gets implemented when I have some spare (hah) time. :-))
c. Theoretically, we could vend to/from different machines. Using Bonjour you could publish your R session. We'd have to work out some sort of security model. Not sure how that's normally handled by NSDistantObject.

2. Change the graphics device a bit. Mostly I don't think we want to ship around the graphics list. In general, we can ship a bitmap that is appropriate for the display. We can also ship back a PDF if so desired, but performance with a bitmap is likely to be higher with no discernible quality difference except in special circumstances. We can just have a [Device dataInFormat:] that sends back an NSData of the appropriate format (RGBA bitmap, PDF, etc.). The nice part is that both the client and server are running OS X and both have access to the font metrics, so nothing needs to be communicated there.
a. On the ESS thing again, we can provide a simple shim device window to give ESS users a decent graphics device with full interaction. It won't be as cool as the one in the GUI, but that's what you get for using ESS ;-).

3. The GUI now maintains connections to any number of GUI console/device windows.
a. Stefano suggested being able to copy and paste between environments. I think this can be done using private Pasteboards and serializing objects to RAWSXP types, converting them to NSData and then transferring them over.
b. The GUI itself never need become unresponsive. You could even force kill a runaway server.
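
The R half of the copy-and-paste idea in 3a is already there: serialize() with a NULL connection gives you a raw vector, and unserialize() turns it back into the object. A minimal sketch follows, with hypothetical helper names and the pasteboard/NSData hop left out, since that part lives on the Cocoa side.

```r
# Hypothetical helpers: the raw vector is what would be wrapped in an
# NSData and put on a private pasteboard by the GUI.
to_bytes <- function(obj) serialize(obj, connection = NULL)
from_bytes <- function(bytes) unserialize(bytes)

# Made-up object standing in for something copied from one R session.
obj <- list(x = 1:3, label = "gate", f = c(a = 0.5, b = 1.5))

bytes <- to_bytes(obj)     # raw vector in the sending process
copy <- from_bytes(bytes)  # reconstructed in the receiving process

is.raw(bytes)
identical(copy, obj)
```

Both processes would need the relevant packages loaded for S4 objects and the like to behave after the round trip.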

Now, certain things get more difficult. Certain GUI toolkits, like my own Mojave, won't be running in the GUI process anymore, making it difficult to write GUIs with them. Of course, nobody that I know of writes GUIs using Mojave (and you'd think I'd know), but this also rules out everything else. Personally, I think the way around this is some sort of Dashboard-style interface where the front-end is implemented in HTML and JavaScript with hooks back into R using a special protocol handler (which is apparently an SDK these days...) or a JavaScript proxy to allow execution. I tend to favor the protocol handler.

Friday, May 25, 2007

Interactive Gating with flowCore

One of the biggest hurdles for flowCore adoption is probably the lack of interactive gating. People really like it and it can be useful even to us statistician types who are suspicious of the entire exercise. R and flowCore, unfortunately, aren't particularly good at interactive graphics, so this presents something of a problem.

To help with this, here's a quick function for getting a 2D polygon gate with an optional transform:


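interactiveGate = function(fcs,x,y,filterId="Picked",trans=NULL,...) {
  # A bare function becomes a transform applied to both parameters
  if(is.function(trans)) {
    tnf = structure(list(trans,trans),names=c(x,y))
    trans = do.call("transform",tnf)
  }
  # Plot the two parameters, transformed if a transform was given
  plot(exprs(if(is.null(trans)) fcs else trans %on% fcs)[,c(x,y)],
       pch=20,cex=.5,main=filterId,...)
  # Click the polygon vertices on the plot, then draw the outline
  points = locator(type="n")
  polygon(points)
  # Boundary matrix with one named column per parameter
  points = cbind(points$x,points$y)
  colnames(points) = c(x,y)
  r = polygonGate(filterId,points)
  # Tie the gate to the transform so it applies in transformed space
  if(is.null(trans)) r else r %on% trans
}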

The fcs parameter is a flowFrame and the x and y arguments should be pretty obvious. The trans argument can be one of three things: NULL, a function, or a transform object. If it is a transform object (created by transform(...)) then it is simply applied. If it's a function, a transform will be created for both parameters and also applied to the subsequent gate.

e.g.


interactiveGate(fcs,"APC-Cy7-A","PacOrange-565-A","Control",function(x) asinh(x/32))

Just a little thing, but pretty useful.
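
If you want to poke at the boundary-matrix step of interactiveGate without a plot window or flowCore loaded, the conversion from locator()-style output can be tried on its own. This is only a sketch: the clicks list below is made-up data standing in for what locator() returns, and boundary_matrix is a hypothetical name for the two lines in the function above.

```r
# Stand-in for the list locator() returns after three clicks
# (coordinates are made up for illustration).
clicks <- list(x = c(0, 4, 2), y = c(0, 0, 3))

# Same steps as in interactiveGate: bind the coordinates column-wise
# and name the columns after the two parameters.
boundary_matrix <- function(clicks, x, y) {
  m <- cbind(clicks$x, clicks$y)
  colnames(m) <- c(x, y)
  m
}

m <- boundary_matrix(clicks, "APC-Cy7-A", "PacOrange-565-A")
m
```

The column names are what tie the polygon to the right parameters, which is why they are set from x and y rather than left as the default.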

Monday, April 30, 2007

Is it time for Q?

There was an interesting post over at Jim Hugunin's blog today announcing the Dynamic Language Runtime, which appears to be a refactoring and standardization of the IronPython and Javascript implementations on the CLR. The stated goal is to support the development of dynamic languages atop the CLR, freeing implementors from garbage collection issues and whatnot.

Considering Duncan Temple Lang's recent talks about system interoperability (challenges, advantages, etc.), it seems like this might be a nice way to have an integrated statistical environment that can be bound to the sorts of things software engineering projects have come to expect but that happen to be pretty tedious to implement (a decent network library, etc.).

As a side note, we probably can't call it Q since a language of that name already exists in active development.