R is a popular scripting language oriented towards statistical computing and, with the addition of the Bioconductor packages, suitable for biological data analysis. Taverna comes with support for executing R scripts as part of a workflow. The functionality is very similar to that of the Beanshell processor, so this section only covers what is special about the RShell processor.
First of all, R and the required R packages, such as Bioconductor, must be installed locally on the machines that will execute the workflow. Installing R is outside the scope of this manual; refer to the R FAQ for instructions. Once you have R installed, you can start it either from the command line with the command R or using the appropriate application shortcut. You should get a shell that looks somewhat like this:
: stain@mira ~;R
R version 2.4.1 (2006-12-18)
Copyright (C) 2006 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
> sin(pi)
[1] 1.224606e-16
If this is working, it should be quite easy to install required R packages such as Bioconductor:
> source("http://bioconductor.org/biocLite.R")
>
> biocLite()
Running biocinstall version 1.9.9 with R version 2.4.1
Your version of R requires version 1.9 of Bioconductor.
Will install the following packages:
 [1] "affy"        "affydata"    "affyPLM"     "annaffy"     "annotate"
 [6] "Biobase"     "Biostrings"  "DynDoc"      "gcrma"       "genefilter"
[11] "geneplotter" "hgu95av2"    "limma"       "marray"      "matchprobes"
[16] "multtest"    "ROC"         "vsn"         "xtable"
Please wait...
(..)
The downloaded packages are in /tmp/RtmpNXlF02/downloaded_packages
>
Taverna communicates with the local R installation using the RServe protocol. RServe is a network-based service that allows you to submit a script to be run within an R environment. In our setup this means that the R script is executed by the RServe server process, not by the Taverna workbench. The service can be configured to accept different network users identified by passwords, but since such users would be able to execute essentially any code on that machine, for security reasons we recommend that you stick with the default, which is to listen only on localhost without requiring a password. If your machine has multiple users, we recommend that you enable usernames and passwords to make sure only you can access the RServe service.
Follow the installation instructions for RServe for information on how to install and start the RServe service. Here's the short version for version 0.4.3:
: stain@mira ~/Desktop;curl -fO http://rosuda.org/Rserve/dist/Rserve_0.4-3.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 86336  100 86336    0     0   239k      0 --:--:-- --:--:-- --:--:--  465k
: stain@mira ~/Desktop;R CMD INSTALL Rserve_0.4-3.tar.gz
* Installing *source* package 'Rserve' ...
checking for gcc... gcc-4.0 -arch i386
(..)
** building package indices ...
* DONE (Rserve)
: stain@mira ~/Desktop;cd /tmp
: stain@mira /tmp;sudo -u nobody R CMD Rserve
R version 2.4.1 (2006-12-18)
(..)
Rserv started in daemon mode.
: stain@mira /tmp;
Note that you will have to execute R CMD Rserve to start the service again if you reboot the computer. For security reasons, we recommend that you run RServe under a separate, non-privileged user account on your machine, so that if there is a security problem, the R script won't be able to access your files and can be easily isolated.
The RServe documentation describes several RServe clients. Note that the RShell processor is based on the JRclient library.
Up to R-2.4.0
Download the corresponding binary Rserve.exe from http://rosuda.org/Rserve/dist/rserve-win.html into the directory where R.dll is located (by default C:\Program files\R\R-2.4.0\bin).
Start Rserve by double clicking on Rserve.exe located in the bin directory.
You can now use the R processor in Taverna.
R-2.4.1 and later versions
1. Install Rserve: From the R menu, select Packages, then Install package(s), select a mirror, and choose Rserve.
2. Load Rserve: From the R menu, select Packages, then Load package, and choose Rserve.
3. Start Rserve: From the R workspace, type: Rserve(port = 6311)
You can now use the R processor in Taverna.
Note that you will need to repeat steps 2 and 3 each time you start Rserve.
To add an RShell processor to a workflow, locate RShell under Local Services in the service scavenger panel. Either drag the processor to the Advanced model explorer, or right click on it and use the context menu to add it to the workflow.
Right click on the processor and select the configure option to bring up the RShell configure dialogue. The first tab of the dialogue lets you type in the script, similar to the editor of Beanshell processors. In addition, you can open an existing script from a file. For this example we'll use a rather trivial sine function.
Just as Beanshell inputs and outputs are accessed through variable names, the RShell processor makes input ports available as variables named after the port, and each output port reads the variable of the same name after the script has executed. That is, the last value assigned to the variable will be the one returned from the processor. So for this script to make sense we have to make an input port x and an output port y.
Flip to the input ports tab and add a new port, specifying the port name x. Next, we'll have to specify the type this variable will have within the R script. Although Taverna normally operates by passing around text strings, R is a typed language, so you need to specify that in this case x is to be parsed as a double, for example 0.45.
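To see why the port type matters, the following plain R sketch mimics a text value arriving from the workflow and being parsed as a double. How Taverna performs the conversion internally is not described here; the variable name raw_input and the use of as.numeric are assumptions made purely for illustration.

```r
# Hypothetical stand-in for the text value the workflow passes to port x.
raw_input <- "0.45"

# Arithmetic on the raw text fails, because R does not treat strings as numbers.
failed <- inherits(try(raw_input + 1, silent = TRUE), "try-error")

# Parsing the text as a double gives a value the script can compute with.
x <- as.numeric(raw_input)

print(failed)
print(is.double(x))
print(x + 1)
```

Declaring the port as a double is what lets the script treat x as a number rather than as text.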
Create the output port y in the same way in the output ports tab, and remember to also set its type to double. You should now be able to build the workflow, connect the ports, and run it with an example input 0.5, which should give you an output of 0.479425538604203.
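The script itself is not reproduced in the text above. A minimal sketch, assuming it is a single assignment that applies sin to the input port variable x and stores the result in the output port variable y, could look like this. The first line only stands in for the value Taverna would supply from the input port, so that the snippet can be tried in a plain R session.

```r
# Stand-in for the input port: inside Taverna, x is set from the port value.
x <- 0.5

# The script proper: the output port y reads this variable after execution.
y <- sin(x)

# Print with 15 significant digits to compare with the workflow output.
print(y, digits = 15)
```

When pasting such a script into the RShell processor, the stand-in assignment to x would be left out, since the input port provides the value.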
If you configured your RServe to use a different port, or to require a username and password, you can enter these connection settings in the configure dialogue. The dialogue also provides the option Keep session alive, which will re-use the same connection each time you execute the script. This means that if the script assigns objects to other variable names, say z=x+1337, z will be available in the R namespace for the next execution, for example in an iteration. However, we generally recommend transferring such state through the workflow instead of keeping it in the R environment.
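The effect of a kept-alive session can be imitated in plain R without any RServe connection. In the sketch below, an environment stands in for the R session on the server; the helper run_script and the field had_z are hypothetical names introduced for illustration, not part of Taverna or RServe.

```r
# One environment stands in for the R session kept open between executions.
session <- new.env()

# Hypothetical helper: run the example script with input x in a given session.
run_script <- function(x, env) {
  # Was z left behind by an earlier execution in this session?
  had_z <- exists("z", envir = env, inherits = FALSE)
  assign("x", x, envir = env)
  # The script from the text, evaluated inside the session environment.
  eval(parse(text = "z = x + 1337"), envir = env)
  list(had_z = had_z, z = get("z", envir = env))
}

first  <- run_script(1, session)    # first execution: no z yet
second <- run_script(2, session)    # same session: z from the first run exists
fresh  <- run_script(3, new.env())  # new session: starts clean again

print(c(first$had_z, second$had_z, fresh$had_z))
```

Leftover variables of this kind are easy to depend on by accident, which is one reason to prefer passing state through workflow ports.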
The input and output port type R-expression can be used to link several R processors together without regard to the internal data type. This is useful when passing complex R objects from one R script to another. However, as the whole object will be serialised, this is not recommended for very large structures; in those situations it might be better to use the Keep session alive option and share a global variable.
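The cost of serialising a whole object can be explored in plain R. The format Taverna actually uses to transfer an R-expression is not described in this section, so the sketch below uses the base R functions serialize and unserialize as a stand-in, with a fitted linear model on the built-in cars dataset as an example of a complex object.

```r
# A complex object of the kind one script might hand to another: a fitted model.
fit <- lm(dist ~ speed, data = cars)

# Serialise the whole object to a raw vector, as a stand-in for the transfer.
payload <- serialize(fit, connection = NULL)

# The receiving script restores the object and can use it directly.
restored <- unserialize(payload)
prediction <- predict(restored, newdata = data.frame(speed = 10))

# The payload size hints at why very large structures are costly to pass around.
cat("serialised size in bytes:", length(payload), "\n")
```

The restored model behaves like the original, but every transfer pays for the full size of the object.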
If you select one of the array datatypes, such as double[], integer[] and string[], the processor input port will consume a full list of values of the specified type, which is useful if the R script is to do array indexing or statistical analysis on a vector of items. Similarly, an array output port can be used if you want to return more than one value. The port types are:
true or false (1 and 0 also allowed)
a floating point number
a natural number
an R expression to pass between RShell processors
string value
a list of doubles
a list of integers
a list of strings
an image created by the plotting device (for outputs), see section Graph output below
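As an illustration of the array types, the sketch below treats x as an input port of type double[] and y as an output port of type double[]. The port names and the sample values are assumptions chosen for the example; inside Taverna the first assignment would be replaced by the list arriving on the input port.

```r
# Stand-in for an input port x of type double[]; Taverna would supply the list.
x <- c(2.5, 3.0, 4.5, 6.0)

# Array indexing and vector statistics work directly on the port variable.
first_value <- x[1]
centred <- x - mean(x)

# An output port y of type double[] can return more than one value.
y <- c(mean(x), sd(x))

print(y)
```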
Graph output
In the interactive R environment you might be used to creating fancy graphs. You can create graphs in R through Taverna as well, but instead of the graphs popping up directly on your screen, you have to return them as image data to the workflow. The graphs can then be viewed as part of the workflow output. Make a new output port called g and set its type to the image type. In your R script, use png(g) to enable PNG output to a variable called g, and dev.off(); when you are finished plotting. Example:
png(g); plot(rnorm(1:100)); dev.off();
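To try the plotting script outside Taverna, g needs a value first. How Taverna binds g before the script runs is not described here; the sketch below assumes g names a file that the PNG device writes to, and uses a temporary file as a stand-in. It also assumes the local R installation has a working png device.

```r
# Assumption: g holds a file name; a temporary file stands in for it here.
g <- tempfile(fileext = ".png")

png(g)                 # open the PNG device writing to g
plot(rnorm(100))       # draw 100 standard normal samples
invisible(dev.off())   # close the device so the image is written out

# Check that an image file was produced.
cat("PNG written:", file.exists(g), "size:", file.info(g)$size, "bytes\n")
```

Inside the RShell processor the assignment to g would be omitted, as in the example above.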