r5 - 10 Apr 2006 - 09:44:45 - StianSoiland
-- StianSoiland - 30 Mar 2006

Using the R processor

Taverna can run R scripts. R is an open-source implementation of the S programming language and is very popular for statistical computing.

See also the MicroArray page.

Requirements

Screenshot

Here's a screenshot showing two R processors in a workflow. The upper script calls rnorm(counts), where counts is the processor's input port. Its output is passed to the processor squared, which computes nums*nums (not shown in the picture). The workflow itself, rnorm.xml, is attached to this page.

Screenshot of R workflow

Activating in Taverna 1.3.2-RC1

At the time of the 1.3.2-RC1 release, the R processor was not considered stable, so it is disabled by default.

To activate it:

  • Download JRclient-RE817.jar
  • Place the downloaded jar file into the subfolder lib within your installed taverna-workbench-1.3.2-RC1 folder. There should be plenty of other .jar files in that directory.
  • In the subfolder conf, create a new file named taverna.properties that contains the line:
    taverna.scavenger.org.embl.ebi.escience.scuflworkers.rserv.RservScavenger = rserv
    (If you use Notepad, remember to rename the file from taverna.properties.txt to taverna.properties)
  • Restart Taverna. A processor called "R script" should now be available under "Local services".
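Creating the file from a script avoids the Notepad renaming pitfall. The following is a minimal Python sketch, not something shipped with Taverna; the function name enable_r_processor and the install path you pass to it are assumptions for illustration. It appends the property line quoted above to conf/taverna.properties only if that exact line is not already present.

```python
import os

# The property line from the step above; it enables the Rserve scavenger.
RSERV_LINE = ("taverna.scavenger.org.embl.ebi.escience.scuflworkers."
              "rserv.RservScavenger = rserv")


def enable_r_processor(taverna_home):
    """Add RSERV_LINE to <taverna_home>/conf/taverna.properties.

    taverna_home is the installed taverna-workbench-1.3.2-RC1 folder
    (hypothetical argument; adjust to your installation).
    Returns the path of the properties file.
    """
    conf_dir = os.path.join(taverna_home, "conf")
    os.makedirs(conf_dir, exist_ok=True)
    path = os.path.join(conf_dir, "taverna.properties")

    existing = ""
    if os.path.exists(path):
        # Java .properties files are traditionally ISO-8859-1 encoded.
        with open(path, encoding="latin-1") as f:
            existing = f.read()

    # Append only once, so running the script twice is harmless.
    if RSERV_LINE not in existing.splitlines():
        with open(path, "a", encoding="latin-1") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # do not glue our line onto the last one
            f.write(RSERV_LINE + "\n")
    return path
```

For example, call enable_r_processor() with the folder where you unpacked Taverna, then restart Taverna as described above.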

Note that the R processor included in 1.3.2-RC1 does not support connecting with username/password or to other hostnames/ports.

Activating in CVS build of Taverna

JRclient-RE817.jar is included in taverna1.0/lib/, and support should be built automatically.

In taverna1.0/src/taverna.properties make sure this line is not commented out:

taverna.scavenger.org.embl.ebi.escience.scuflworkers.rserv.RservScavenger = rserv
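To double-check that the line is really active and not hidden behind a comment, a small Python check such as the following may help. It is a hypothetical helper using a simplified reading of the Java .properties format: it only recognizes key = value lines and treats lines starting with # or ! as comments.

```python
KEY = "taverna.scavenger.org.embl.ebi.escience.scuflworkers.rserv.RservScavenger"


def rserv_scavenger_enabled(properties_text):
    """Return True if the RservScavenger line is present and not commented out.

    Simplified parsing: '#' and '!' start comment lines, as in Java
    .properties files; continuation lines and ':' separators are ignored.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == KEY and value.strip() == "rserv":
            return True
    return False
```

Usage: read taverna1.0/src/taverna.properties into a string and pass it to rserv_scavenger_enabled(); False means the line is missing or still commented out.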

Installing Rserve

Rserve, also called Rserv, is a network service that evaluates R scripts. Clients connect using the Rserve protocol, and client libraries are available for languages such as Java and C.

Note: There are no security restrictions on scripts sent to Rserve. Since R is a full programming language with access to the file system and the network, only allow Rserve clients you trust.

Normally Rserve runs on the same machine as Taverna and responds only to requests from localhost. By default, no username or password is required to connect, so all local users can access the server.

To install Rserve, first install R, then follow the instructions at http://stats.math.uni-augsburg.de/Rserve/doc.shtml, or, for Windows, see http://stats.math.uni-augsburg.de/Rserve/dist/rserve-win.html.

For security reasons, you should run Rserve as a separate, non-privileged user, for instance a user called "r":

adduser r

To restrict access to authorized users only, add the following to /etc/Rserv.conf:

auth required
plaintext disable
pwdfile /etc/Rpasswd

Then create the file /etc/Rpasswd, listing one user name and password per line:

john Fish4You
peter 1MStUp1d

Since the passwords are stored in cleartext, do not use passwords that are also used anywhere else. To protect the password file from everyone except the r user, run chown r /etc/Rpasswd and chmod 400 /etc/Rpasswd. Note, however, that any Rserve user can still read the passwords using R's file functions, because their scripts run as the r user.
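Both steps can be combined in a short Python sketch. The function write_rpasswd is hypothetical; it writes the user/password pairs in the format shown above, optionally hands the file to the r user (which, like chown, normally requires root) and finishes with the equivalent of chmod 400. It assumes, as in the example above, that names and passwords contain no spaces.

```python
import os
import shutil


def write_rpasswd(path, users, owner=None):
    """Write an Rserve password file: one "user password" pair per line.

    users maps user names to passwords (stored in cleartext!).
    owner, if given (e.g. "r"), is the account the file is handed to.
    """
    # Validate first, so a bad entry never leaves a half-written file.
    for user, password in users.items():
        if not user or not password or any(c.isspace() for c in user + password):
            raise ValueError(f"invalid entry for user {user!r}")

    if os.path.exists(path):
        os.chmod(path, 0o600)  # make an existing (possibly 0400) file writable

    # New files are created owner-read/write only, never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        for user, password in users.items():
            f.write(f"{user} {password}\n")

    if owner is not None:
        shutil.chown(path, user=owner)  # like: chown r /etc/Rpasswd
    os.chmod(path, 0o400)               # like: chmod 400 /etc/Rpasswd
```

Run as root, a call such as write_rpasswd("/etc/Rpasswd", {"john": "Fish4You"}, owner="r") would produce the file described above.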

To allow access from the network and not just from localhost, add the line remote enable to /etc/Rserv.conf. Only do this with proper user authentication in place. You may also want a firewall that only allows connections from your institution's networks. The default Rserve port is 6311.
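Because remote enable without authentication exposes R, and therefore the file system, to the network, it can be worth sanity-checking the configuration before restarting Rserve. This Python sketch is hypothetical: it assumes one directive per line in the form "directive value", as in the lines above, and that lines starting with # are comments. It only checks the directives discussed on this page.

```python
def check_rserv_conf(text):
    """Return a list of warnings about the contents of an Rserv.conf file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or (assumed) comment
        parts = line.split(None, 1)
        name = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value

    warnings = []
    remote = settings.get("remote", "").lower() == "enable"
    auth = settings.get("auth", "").lower() == "required"
    if remote and not auth:
        # Anyone who can reach port 6311 could run arbitrary R code.
        warnings.append("remote enable without auth required")
    if auth and not settings.get("pwdfile"):
        warnings.append("auth required but no pwdfile set")
    return warnings
```

An empty list means none of these two problems were found; it does not prove the configuration is secure.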

Then, as root, start Rserve as the r user:

su -c "R CMD Rserve" r

In Windows, just start Rserve.exe.

If you have Jython installed, you can easily test the connection to Rserve, as long as you know the path to JRclient-RE817.jar:

: stain@rpc268 ~;CLASSPATH=JRclient-RE817.jar jython
*sys-package-mgr*: processing new jar, '/local/stain/JRclient-RE817.jar'
Jython 2.1 on java1.4.2_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from org.rosuda.JRclient import Rconnection
>>> r = Rconnection()          # Connects to localhost by default
>>> r.login("stain", "fisk")   # Only needed if you enabled auth in config
>>> r.eval("rnorm(5)")         # Generate 5 random numbers
[REAL* (1.0476401366364214, -2.0138810712210984, -0.7726756687183931, 0.46842585768109235, -0.292819895704665)]
>>>
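Without Jython, a plain Python socket check can at least confirm that something answering like Rserve is listening on the port. This sketch assumes the Rserve QAP1 protocol, in which the server sends a 32-byte identification string starting with Rsrv as soon as a client connects; it does not log in or evaluate anything, and rserve_banner is a hypothetical name.

```python
import socket

RSERVE_PORT = 6311  # default Rserve port


def rserve_banner(host="localhost", port=RSERVE_PORT, timeout=5.0):
    """Connect to Rserve and return the 32-byte ID string it sends first.

    Raises OSError if nothing is listening (or a firewall blocks the port)
    and ValueError if the reply does not start with b"Rsrv".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while len(data) < 32:
            chunk = sock.recv(32 - len(data))
            if not chunk:
                break  # server closed the connection early
            data += chunk
    if not data.startswith(b"Rsrv"):
        raise ValueError(f"unexpected reply from {host}:{port}: {data!r}")
    return data
```

A ConnectionRefusedError from rserve_banner() usually means Rserve is not running, or that it only accepts localhost connections when tested from another machine.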

Adding an R processor to a workflow

In Taverna's Services panel, under "Local Services", you should find "R script". Add this to your workflow, preferably with a more descriptive name than "R script", by using "Add to model with name". Note that by default the processor has one output port, "value", and no inputs.

The output will be the result of the script you provide, namely the last expression evaluated.

Right click on the processor and choose "Configure rserv". Enter or paste your script into the "Script" tab. If you want an example script that does not require any input, try rnorm(20).

Then select the "Ports" tab to add input ports. You can add as many input ports as you like, and the input on each port will be available within the R script as a variable named after the port. For instance, add a port named "numbers". You must also select the data type the input is converted to in R, because R does not convert strings to numbers automatically: "3" + "4.3" gives an error rather than the double 7.3. Possible types are String, int and double.

(The type REXP is reserved for future use: passing internal R structures between processors. This is not yet supported.)

Within the R script, values passed from the workflow to the input ports will be available as an array of the specified type.
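The conversion step can be pictured with the following Python sketch. It only illustrates the idea and is not the processor's actual Java code; the names CONVERTERS and convert_port_values are hypothetical. Each workflow value arrives as a string and is converted to the type chosen on the Ports tab, giving a list that corresponds to the R vector the script sees.

```python
# Hypothetical mapping from the port types offered on the "Ports" tab
# to Python conversion functions.
CONVERTERS = {"String": str, "int": int, "double": float}


def convert_port_values(values, port_type):
    """Convert the string values arriving on an input port to port_type.

    Raises ValueError for an unknown type or a value that cannot be
    converted, e.g. "4.3" on an int port.
    """
    try:
        convert = CONVERTERS[port_type]
    except KeyError:
        raise ValueError(f"unsupported port type: {port_type!r}") from None
    return [convert(v) for v in values]
```

With this sketch, convert_port_values(["3", "4.3"], "double") gives [3.0, 4.3], which can be summed to 7.3, while the same values on an int port raise ValueError.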

If your Rserve requires a username and password, or runs on a different host or port, enter these details in the "Connection" tab. Blank fields mean the default; with every field blank, the processor connects to localhost on port 6311 without authentication.
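The default rules for the Connection tab can be summarized in a small Python sketch. The class and function names are hypothetical, and it assumes, consistent with the Jython example above where login is only needed when authentication is enabled, that blank username and password fields mean no login is attempted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RserveConnection:
    host: str = "localhost"
    port: int = 6311
    username: Optional[str] = None  # None: do not log in
    password: Optional[str] = None


def resolve_connection(host="", port="", username="", password=""):
    """Turn possibly blank Connection-tab fields into connection settings."""
    user = username.strip() or None
    if user is None and password:
        raise ValueError("password given without a username")
    return RserveConnection(
        host=host.strip() or "localhost",
        port=int(port) if str(port).strip() else 6311,
        username=user,
        password=(password or None) if user else None,
    )
```

For instance, resolve_connection() yields localhost, port 6311 and no login, while filling in only the host field keeps the default port.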

Connect the output port value of the R processor to a workflow output or to another processor, and run the workflow. Remember that the output is the last expression evaluated, so if your script stores its results in a variable named results, put results on the last line of the script.

Future possibilities

Future versions of the R processor might or might not include:

  • Passing of internal REXP values between R processors
  • Retrieving variables as extra output ports, i.e. being able to return several different outputs
  • Persistent R states, possibly shared between R processors (this could also avoid transferring large data between processors)
  • Prebuilt processors for commonly used microarray analysis functions

Links

Topic attachments

  • rserve.png (125.5 K, 24 Jul 2006 - 09:41, StianSoiland): Screenshot of Rserve processor
  • rnorm.xml (1.2 K, 24 Jul 2006 - 09:41, StianSoiland): Example workflow calling rnorm() in R
Powered by myGrid wiki
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.