-- StianSoiland - 30 Mar 2006
Using the R processor
Taverna can run R scripts. R is a free, open-source implementation of the S programming language, and is widely used for statistics.
See also the MicroArray page.
Requirements
Screenshot
Here's a screenshot showing two R processors in a workflow. Note how the uppermost script calls rnorm(counts), where counts is the defined input port, and how its output is passed to squared, which computes nums*nums (not shown in the picture). The actual workflow, rnorm.xml, is attached to this page.
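As a rough illustration of the data flow in that workflow, here is a minimal Python analogue. It is only a sketch of what the two R scripts compute, not the workflow itself. The function names mirror the processor names, and the assumption that counts holds a single integer is ours.

```python
import random


def rnorm(count, mean=0.0, sd=1.0):
    """Return `count` draws from a normal distribution, like R's rnorm(n)."""
    return [random.gauss(mean, sd) for _ in range(count)]


def squared(nums):
    """Square each element, like the R expression nums*nums."""
    return [x * x for x in nums]
```

Calling squared(rnorm(5)) chains the two steps in the same way the link between the two processors does.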
Activating in Taverna 1.3.2-RC1
At the time 1.3.2-RC1 was released, the R processor was not considered stable, so it is disabled by default.
To activate it:
Note that the R processor included in 1.3.2-RC1 does not support connecting with username/password or to other hostnames/ports.
Activating in CVS build of Taverna
JRclient-RE817.jar is included in taverna1.0/lib/, and support should be built automatically.
In taverna1.0/src/taverna.properties, make sure this line is not commented out:
taverna.scavenger.org.embl.ebi.escience.scuflworkers.rserv.RservScavenger = rserv
Installing Rserve
Rserve, also called Rserv, is a network service that evaluates R scripts. Clients connect through the Rserve protocol, and client libraries are available for languages such as Java and C.
Note: There are no security restrictions on scripts sent to an Rserve server. Since R is a full programming language with access to the file system and the network, only allow Rserve clients you trust.
Normally the Rserve server runs on the same machine as the client and only responds to requests from localhost. By default, no username or password is required to connect, so all local users can access the server.
To install Rserve, first install R, then follow the instructions at http://stats.math.uni-augsburg.de/Rserve/doc.shtml, or for Windows, see http://stats.math.uni-augsburg.de/Rserve/dist/rserve-win.html.
For security reasons, you should run Rserve as a separate, non-privileged user, for instance a user called "r":
adduser r
To limit access to authorized users only, add the following to /etc/Rserv.conf:
auth required
plaintext disable
pwdfile /etc/Rpasswd
Then create the file /etc/Rpasswd, listing one username and password per line:
john Fish4You
peter 1MStUp1d
Note that since the passwords are stored in cleartext, you should not reuse "real" passwords that are in use anywhere else. To protect the password file from everyone except the r user, run chown r /etc/Rpasswd and chmod 400 /etc/Rpasswd. Note, however, that any Rserv user can still read the passwords using R file functions.
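If you would rather generate the password file from a script, the following Python sketch writes the same "user password" lines and applies the equivalent of chmod 400. The helper name write_pwdfile is hypothetical, and the sketch does not change ownership, so the chown step still has to be done as root.

```python
import os


def write_pwdfile(path, users):
    """Write 'user password' lines to path, then make it owner-read-only."""
    # Create the file with mode 0600 so it is never world-readable, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        for user, password in users.items():
            f.write(f"{user} {password}\n")
    # Equivalent of: chmod 400 <path>
    os.chmod(path, 0o400)
```

For example, write_pwdfile("/etc/Rpasswd", {"john": "Fish4You"}) would produce a one-line file in the format shown above.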
To allow access from the network and not just from localhost, add remote enable to Rserv.conf. You should only do this with proper user authentication in place. Additionally, you might want to set up a firewall that only allows connections from your institutional networks. The default Rserve port is 6311.
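A quick way to check from a client machine whether the port is reachable through the firewall is a plain TCP connection attempt. The Python sketch below does only that: it confirms that something accepts connections on the port, not that it is actually Rserve or that your credentials work. The function name rserve_port_open is our own.

```python
import socket


def rserve_port_open(host="localhost", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

Calling rserve_port_open() with no arguments tests the default localhost setup described above.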
Then, as root, start Rserve as the r user:
su -c "R CMD Rserve" r
On Windows, just start Rserve.exe.
If you have Jython installed, you can test the client quite easily once you know the path to JRclient.jar:
: stain@rpc268 ~;CLASSPATH=JRclient-RE817.jar jython
*sys-package-mgr*: processing new jar, '/local/stain/JRclient-RE817.jar'
Jython 2.1 on java1.4.2_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from org.rosuda.JRclient import Rconnection
>>> r = Rconnection() # Connects to localhost by default
>>> r.login("stain", "fisk") # Only needed if you enabled auth in config
>>> r.eval("rnorm(5)") # Generate 5 random numbers
[REAL* (1.0476401366364214, -2.0138810712210984, -0.7726756687183931, 0.46842585768109235, -0.292819895704665)]
>>>
Adding an R processor to a workflow
In Taverna's Services panel, under "Local Services", you should find "R script". Add this to your workflow, preferably with a more descriptive name than "R script", by using "Add to model with name". You might notice that by default, the processor has one output port, "value", and no inputs.
The output will be the result of the script you provide, namely the last expression evaluated.
Right-click on the processor and choose "Configure rserv". Enter or paste your script into the "Script" tab. If you want an example script that does not require any input, try rnorm(20).
Then select the "Ports" tab to add any input ports from your workflow. You can add as many input ports as you like, and each input will be available within the R script as a variable with the same name as the port. For instance, add a port named "numbers". You will have to select which data type the input is to be converted to in R, as R is a typed language and you can't do "3" + "4.3" and expect the double 7.3. Possible types are String, int or double.
(The type REXP is for future use, for passing internal R structures between processors. This is not yet supported.)
Within the R script, values passed from the workflow to the input ports will be available as an array of the specified type.
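To see why the type has to be chosen, consider the following Python sketch of the conversion step. It is an illustration only, not Taverna's actual code: the CONVERTERS table and convert_inputs helper are hypothetical, and only the three type names come from the "Ports" tab. Note that Python concatenates two strings with +, whereas R rejects the operation with an error. In both languages the values must be converted to numbers before they can be added.

```python
# Hypothetical mapping from the port types named above to Python converters.
CONVERTERS = {"String": str, "int": int, "double": float}


def convert_inputs(values, port_type):
    """Convert a list of workflow strings to a list of the chosen type."""
    convert = CONVERTERS[port_type]
    return [convert(v) for v in values]


# Without conversion, + joins the two strings instead of adding numbers.
joined = "3" + "4.3"

# With the port declared as double, the values can be summed numerically.
total = sum(convert_inputs(["3", "4.3"], "double"))
```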
If you set up your Rserv to require a username and password, or if it runs on a different server rather than locally, go to the "Connection" tab to enter these details. Blank fields mean that the default is used; with everything left blank, that is localhost on port 6311 without authentication.
Connect the output value from the R processor to an output port or to another processor, and run the workflow. Note that the output is the value of the last expression evaluated, so if your script stores everything it computes in an array named results, include results as the last line of your script.
Future possibilities
Future versions of the R processor might or might not include:
- Passing of internal REXP values between R processors
- Retrieving variables as extra output ports, i.e. being able to return several different outputs
- Persistent R states, possibly shared between R processors. (Could also be used to avoid transferring large data between processors)
- Prebuilt processors for commonly used microarray analysis functions
Links