Feb 10

All your datasets R belong to us

I would like to introduce Rincanter, a binding from Clojure/Incanter to the R language for statistical computing. It is a thin layer over the existing Java/R bindings done by the folks at rosuda.org.

Why did I write this? Thanks to the hard work of Rich Hickey, David Liebke and others, you can already do impressive statistical data-mining tasks using only Clojure. However, the R project has a huge body of libraries and datasets that the much smaller Incanter community won't be able to match, at least in the short term. Unless…we can provide an easy-to-use bridge that lets us work mostly in Clojure and break into the R cookie jar for its datasets and function libraries when we need to.

So, the goals for this project are:

  • Short Term:
    • Provide access to the vast datasets available in R to Clojure/Incanter users. This requires converting between R data types and Clojure/Incanter data types. For the most part this is working.
    • Provide access to the large body of R packages and function libraries to fill in the gaps where Clojure and Incanter don't have functional coverage. This is partially working, but there are probably quite a few places where the Clojure side and the R side just won't match up without some serious fudging.
  • Long Term:
    • Provide a scaffold for porting R packages, functions and datasets over to what many people believe is a stronger base language. While R is an impressive language in many ways, even some of its founders think that a full-featured Lisp could be a better foundation for an interactive statistical environment. I would like people to strongly consider Clojure for that position.

A quick walkthrough

This quick example shows how to access R datasets available remotely on CRAN and import them into Incanter. We will be interacting with Incanter and R inside a REPL session.

To start with, you will need to get Rincanter up and running. There are fairly detailed instructions for doing this on the project home page.

$ cd /path/to/where/you/downloaded/rincanter
$ lein repl

You should now have a REPL running with the required classes and packages loaded, so we are ready for an interactive session.
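
A session along those lines might look roughly like the sketch below. Treat it as an illustration under assumptions rather than a transcript: the namespace and the r-eval and r-get function names are as I recall them from the Rincanter README and may differ in your version, the HSAUR3 package and its Forbes2000 dataset are just one arbitrary CRAN choice, and it needs a working R installation and network access.

```clojure
;; Assumption: the namespace and the r-eval / r-get names follow the Rincanter
;; README; check them against the version you installed.
(use '(com.evocomputing rincanter))
(use '(incanter core))

;; Arbitrary example package and dataset, chosen only for illustration.
;; The strings are plain R code evaluated on the R side.
(r-eval "install.packages('HSAUR3', repos='https://cloud.r-project.org')")
(r-eval "library(HSAUR3)")
(r-eval "data(Forbes2000)")

;; Pull the R data frame across the bridge. The conversion to an Incanter
;; dataset is assumed here, based on the README's description.
(def forbes (r-get "Forbes2000"))

;; From here on it is ordinary Incanter.
(nrow forbes)
(col-names forbes)
```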

As you can see, it's fairly easy to grab any existing package and dataset on CRAN, download it, and pull the data into Incanter.