4.3. Beanshell scripting

There are often cases in workflow construction where the output of one processor is not quite right for the input of the next. There are various options to cover these cases: the user can make use of so-called shim services exposed in the same way as other operations, they can create a new service to perform the transformation or, for relatively simple cases, they can create a single non-service component in the form of a script.

Beanshell scripts, as the name suggests, use the Beanshell scripting engine. This gives the user access to a form of interpreted Java; this section therefore assumes a minimal level of Java knowledge. For users who have never attempted Java programming we recommend the Java tutorial on Sun Microsystems' website at http://java.sun.com/docs/books/tutorial/. There are certain minor differences between the core language described there and the version used by the Beanshell; these are documented further at the Beanshell website at http://www.beanshell.org/. The good news is that almost all of these differences make it easier to use than conventional Java, and a typical user is unlikely ever to encounter them.

As an example of a simple script consider the following use case: given three raw sequence strings (protein or nucleotide), create a single string containing the three sequences with FASTA format titles. For simplicity's sake assume that the titles are all fixed (although we could easily make the titles parameters to the script).

4.3.1. Creating a new Beanshell instance

Create a new Beanshell processor either by dragging the Beanshell scripting host from the local services section of the service selection panel into the list of processors in the Advanced Model Explorer:

or by right-clicking and selecting Add to model:

We recommend that you change the name of the inserted Beanshell processor to something that describes what the processor will do. For our example, something like FASTA_format_sequences will do.

4.3.2. Defining inputs and outputs

The first things to configure are the input and output ports of the new instance. The Beanshell configuration panel is accessible from the right-click menu of the new processor, by selecting Configure beanshell... from the menu:

This will open a new window containing options to configure the script itself and the inputs and outputs. Selecting the Ports tab allows you to create, remove and modify the types of the inputs and outputs to this processor. Input and output ports are the connection points between the workflow and the executed Beanshell code. From a programming point of view, you can regard the input ports as parameters to a function call, and the output ports as return values. Inputs to the processor are available as variables within the Beanshell script (the names match the input port names), while output ports extract the values of the correspondingly named variables after the script has executed.

A new input is added by entering the name in the text box to the right of the Add Input button, then clicking on the button to create the input. The input port appears in the Inputs list along with the default type, a Plain Text, corresponding to a single string with no additional MIME type information. Although in this case the default is the correct value, it can be changed by clicking either on Plain Text, which presents a drop-down list of the available types, or on the a, which cycles through the collection types such as a list of. Leave the defaults for now and use the port creation mechanism described above to create three inputs and one output with sensible names:

4.3.3. Configuring the script

Now the processor has the correct inputs and outputs, the remaining task is to specify the logic connecting them in the form of a script. Selecting the Script tab makes available a syntax-highlighting editor (based on JEdit) into which the user must enter a Beanshell-compatible script:

Having defined the available ports (both inputs and outputs), the script engine will, when this processor is enacted, create new bound variables in the scripting environment corresponding to the names of the input ports. At the end of the invocation it will extract the bound variables whose names correspond to the output ports, and use their values as the output values of the processor. In this case, therefore, the script must make use of the variables named seq1, seq2 and seq3, and ensure that there is a variable of the appropriate type called fasta bound in the environment when the script completes. The types are determined by the a / a list of ... options in the Ports section: if the type is a single Plain Text, the variable bound to it will be a String object; if it is a list of Plain Text, the value will be a Java List implementation whose items are String objects, and so on. Corresponding logic applies to the output: if the Ports section declares that there is an output called fasta with type a Plain Text, the script must, before it completes, define a String called fasta containing that result value.

The screenshot above showed a script (more verbose than strictly required) which fulfils this contract and performs the desired function; those familiar with Java will realise that this could be done in a single line.
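A minimal sketch of such a script might read as follows; the sample sequence values and the fixed titles (>sequence1 and so on) are illustrative assumptions, since in a real run seq1, seq2 and seq3 are bound by the input ports:

```java
// Sketch of the FASTA formatting script. In a workflow run seq1, seq2
// and seq3 arrive as bound input variables; sample values stand in here
// so the sketch is self-contained.
String seq1 = "MKTAYIAKQR";
String seq2 = "ATGGCGTATT";
String seq3 = "GATTACA";

// Fixed titles are a simplifying assumption from the use case above.
String fasta = ">sequence1\n" + seq1 + "\n"
             + ">sequence2\n" + seq2 + "\n"
             + ">sequence3\n" + seq3 + "\n";
```

When the script completes, the String variable fasta is bound in the environment and is extracted as the value of the output port of the same name.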

Changes to the script are immediately saved to the workflow. Note that if you are using Mac OS X, instead of the usual clipboard shortcuts such as Apple-C for Copy, you will have to use Windows-style Ctrl-C within the script editor.
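As noted above, list-typed ports arrive as Java List objects. The following sketch assumes a hypothetical input port named sequences declared as a list of Plain Text and an output port joined declared as a Plain Text (neither port is part of the worked example above):

```java
import java.util.ArrayList;
import java.util.List;

// A "list of Plain Text" input arrives as a java.util.List of Strings;
// sample values stand in for the workflow input here.
List sequences = new ArrayList();
sequences.add("ACGT");
sequences.add("TTGA");

// Join the list items with newlines; the String bound to "joined" at the
// end of the script becomes the value of the output port of that name.
StringBuffer sb = new StringBuffer();
for (int i = 0; i < sequences.size(); i++) {
    if (i > 0) sb.append("\n");
    sb.append((String) sequences.get(i));
}
String joined = sb.toString();
```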

4.3.4. Sharing and reuse of scripts

Because the Beanshell processor only exists as part of a workflow (unlike, for example, a SOAP service, which exists on some networked resource) there is a potential problem with reuse: having written a potentially complex script it would clearly be desirable to share it and allow some level of reuse, but because the script lives within a workflow it cannot simply be found in the way a networked service can. Fortunately it is possible to share scripts by creating a workflow containing the script and making the workflow definition available online. This can then be used as the target for either a web crawl or single workflow scavenger, which will in turn expose the script as a child element of the workflow scavenger. The script can then be treated exactly like any other processor located in this way.

4.3.5. Depending on third party libraries

4.3.5.1. Using dependencies

Just as in Java, Beanshell allows you to reference existing classes using import statements. By default you have access to the full Java Platform API, so you should have no problems using, say, a java.util.HashSet. However, it is often the case that you already have a library, written by you or by some third party, that does what you want. If such libraries are available as JARs you can access them from within the Beanshell by clicking the Dependencies tab.
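For instance, a script can import and use core API classes directly, with no Dependencies configuration needed; a small sketch using the java.util.HashSet mentioned above to count distinct sequences:

```java
import java.util.HashSet;
import java.util.Set;

// Classes from the Java Platform API are always available to a
// Beanshell script; only third-party JARs need the Dependencies tab.
Set seen = new HashSet();
seen.add("ACGT");
seen.add("ACGT");  // duplicate, ignored by the set
seen.add("TTGA");
int distinct = seen.size();
```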

The dialogue gives you the location of the library folder into which you must copy the needed JARs. Note that you must also copy that library's own dependencies. (Taverna does have support for using Maven repositories for this purpose, but this is unfortunately not yet exposed in the GUI dialogues.)

The library folder is the subdirectory lib within taverna.home, whose default location depends on your operating system.

After copying, close and reopen the Configure dialogue; the Dependencies tab should now allow you to tick the required JAR files. Different processors in a workflow, just like different workflows, can depend on different JAR files without causing conflicts.

The relative filenames are stored in your workflow, so if you open the workflow with another Taverna installation that doesn't have hello.jar installed, that entry will be listed in red in the dialogue to indicate that it is missing.

Important

Workflows with dependencies are inherently more difficult to share with other Taverna users, as other users would also need to download and install the dependencies.

4.3.5.2. Dependency classloaders

Caution

This section can be quite technical even for hard-core Java programmers.

Normally the default settings are sufficient for the simple cases. However, if you have several beanshells with dependencies that are to cooperate through a more complex API, or the library you depend on has complex initialisation routines or stores state in static variables, you might want stronger control over how the classes are loaded.

The default classloader persistence is Shared over iteration, which means that the classes are loaded freshly for each processor, for each workflow run. That means that if you have two beanshell processors in your workflow that depend on hello.jar, when you run the workflow each of the processors (and hence their Beanshell scripts) will see freshly loaded classes. This isolation ensures that you don't get a 'dirty' class, and is necessary in some cases to avoid thread-locking problems with static methods. (Remember that several processors might execute in parallel in a Taverna workflow.) However, some libraries depend on static members for sharing state, and if this is what you want for your workflow you might consider some of the other classloader persistence options.

Always fresh

The classes are loaded freshly for each iteration of each processor. Although this option is slow, it guarantees that each iteration is executed in isolation with regard to the dependency classes. This option is generally not recommended.

Shared over iteration (default)

The classes are loaded freshly for each processor in the workflow, for each workflow run. As each processor is executed in isolation, beanshells can execute in parallel even when accessing non-thread-safe static methods, and can have different transitive dependencies, for instance two different versions of an XML library. Processors cannot share state through static members.

Shared for whole workflow

The classes are loaded freshly for each workflow run, but are shared between all processors with this persistence option. The JAR files that are searched are the union of the selections of all workflow-shared processors; normally this means you only need to tick the required JAR files in one of the processors, as long as all of them have Shared for whole workflow set. This option allows the dependency to share state through internal static members, so the behaviour of one beanshell might depend on the behaviour of another. This is not recommended for scientifically sound provenance, but the isolation level is still the workflow run, so each workflow is run with fresh classes. Try this option if Shared over iteration fails and you have several beanshell processors accessing the same API.

System classloader

The classes are loaded using the system classloader. This means they are only ever loaded once, even if you run several workflows or re-run a workflow. This option is generally only recommended as a last resort, or if you are accessing JNI-based native libraries, which by their nature can only be loaded once. Note that if you don't use the normal Taverna startup script you will have to add the JAR files to the -classpath. See the section on JNI-based libraries for more information.

In general we recommend using Shared over iteration (the default for Beanshell), or if required, Shared for whole workflow (the default for the API consumer).
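The difference between these isolation levels can be illustrated with a hypothetical dependency class that keeps its state in a static member (imagine Counter living inside hello.jar; the class and its names are invented for illustration):

```java
// Hypothetical class keeping state in a static member, as a library
// such as hello.jar might.
class Counter {
    static int count = 0;
    static int next() { return ++count; }
}

// Two processors calling into the same API within one workflow run:
int first = Counter.next();   // call from processor A
int second = Counter.next();  // call from processor B

// Under "Shared for whole workflow" both processors see one Counter
// class, so the second call returns 2 (as simulated here). Under
// "Shared over iteration" each processor would get a freshly loaded
// Counter, and both calls would return 1.
```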

4.3.5.3. JNI-based native libraries

JNI-based libraries are a way for Java programs to access natively compiled code, typically written in languages such as C, C++ or Fortran. Even if you don't depend on such a library directly, one of your dependencies might. A JNI-based library is normally identified by an extension such as .jnilib instead of .jar. Compiling and building JNI libraries is out of scope for this documentation, but we'll cover how to access such libraries from within Taverna. In this section we will assume a Java library hello.jar that depends on some native functions in hello.jnilib. To complicate matters, our hello.jnilib in turn depends on the native dynamic library fish.dll / libfish.so / libfish.dylib (pick your favourite extension depending on the operating system).
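Behind the scenes, the Java side of such a library loads its native part with System.loadLibrary, which consults the java.library.path system property. A small sketch, using our hypothetical hello library, of what happens when the native file cannot be found:

```java
boolean loaded;
try {
    // Looks for libhello.jnilib / libhello.so / hello.dll on
    // java.library.path. "hello" is the hypothetical example library,
    // so on a machine where it is not installed this throws
    // UnsatisfiedLinkError -- compare the error at the end of this section.
    System.loadLibrary("hello");
    loaded = true;
} catch (UnsatisfiedLinkError e) {
    loaded = false;
}
```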

First of all you need to make a decision as to where to install the libraries. We generally recommend installing the .jnilib files in the same location as the .jar files (i.e. in lib in your home directory's Taverna folder), as described in the section Using dependencies, but since supporting JNI will require you to modify the Taverna startup scripts, you might want to install them to the folder lib in the Taverna installation directory instead. Here we will assume the home directory solution.

In the Taverna installation directory, locate runme.bat or runme.sh, depending on your operating system. Open this file in a decent editor.

You need to add a few lines to set the library path so that the .jnilib can find its dependencies. This step might not be required if you have no .dll/.so/.dylib files in addition to the .jnilib file, but it might be if you have more than one .jnilib file. Here we'll set the dynamic library path to be the lib folder in your Taverna home directory.

In addition, we're going to modify the Java startup parameters to set the system property java.library.path, which tells Java where to look for .jnilib files. Since both paths and variable names vary with operating system, we'll show the modifications for Windows, Linux and OS X.

Windows

In the Taverna installation folder, find and edit runme.bat with your favourite editor (in the worst case, Notepad), and add/modify the lines in bold.

@echo off

rem Set to %~dp0\lib for shared installations
set LIB_PATH=%APPDATA%\Taverna-1.7.1\lib
set PATH=%PATH%;%LIB_PATH%

set ARGS=-Xmx300m "-Djava.library.path=%LIB_PATH%"
set ARGS=%ARGS% -Djava.system.class.loader=net.sf.taverna.tools.BootstrapClassLoader
(..)

Linux

In the Taverna installation folder, find and edit runme.sh with your favourite editor, and add/modify the lines in bold.

(..)
TAVERNA_HOME="`dirname "$PRG"`"
cd "$saveddir"

# Set to $TAVERNA_HOME/lib for shared installation
LIB_PATH="$HOME/.taverna-1.7.1/lib"

LD_LIBRARY_PATH="$LIB_PATH"
export LD_LIBRARY_PATH

ARGS="-Xmx300m -Djava.library.path=$LIB_PATH"
ARGS="$ARGS -Djava.system.class.loader=net.sf.taverna.tools.BootstrapClassLoader"
(..)

Mac OS X

On Mac OS X a startup script is not used; Taverna is wrapped in an application bundle, which is a kind of directory. In particular, if dynamic library dependencies are needed, we recommend you install the JNI libraries inside the bundle. The JAR files, however, must be installed as explained in Using dependencies.

Technically you can instead use almost the same solution as on Linux, but you would have to start Taverna using the command line DYLD_LIBRARY_PATH=$HOME/Library/Application\ Support/Taverna-1.7.1/lib/ /Applications/Taverna.app/Contents/MacOS/JavaApplicationStub

Use the Terminal and change directory to inside the Taverna.app bundle (commands are shown in bold):

: stain@mira ~; cd /Applications/Taverna.app/
: stain@mira /Applications/Taverna.app; ls
Contents
: stain@mira /Applications/Taverna.app; cd Contents/MacOS/
: stain@mira /Applications/Taverna.app/Contents/MacOS; ls
JavaApplicationStub  dataviewer.sh  dot  executeworkflow.sh

or in the Finder, right-click (or control-click) on the Taverna icon in Applications and select Show Package Contents.

Navigate down to Contents/MacOS. This is where we will copy in our JNI and dylib files, in this example libhello.jnilib and libfish.dylib.

: stain@mira /Applications/Taverna.app/Contents/MacOS; cp ~/src/jnitest/lib* .
: stain@mira /Applications/Taverna.app/Contents/MacOS; ls
JavaApplicationStub  dataviewer.sh  dot  executeworkflow.sh  libfish.dylib  libhello.jnilib

However, in order for libhello.jnilib to find its dependency libfish.dylib we will have to use the Terminal and modify the library path using install_name_tool. We'll use otool to inspect the paths.

: stain@mira /Applications/Taverna.app/Contents/MacOS; otool -L libhello.jnilib 
libhello.jnilib:
        libhello.jnilib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.5.1)
        libfish.dylib (compatibility version 0.0.0, current version 0.0.0)

: stain@mira /Applications/Taverna.app/Contents/MacOS; install_name_tool -change libfish.dylib @executable_path/libfish.dylib libhello.jnilib

: stain@mira /Applications/Taverna.app/Contents/MacOS; otool -L libhello.jnilib 
libhello.jnilib:
        libhello.jnilib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.5.1)
        @executable_path/libfish.dylib (compatibility version 0.0.0, current version 0.0.0)

Why does this work? @executable_path is resolved to Contents/MacOS because Taverna's Java runtime is started by Contents/MacOS/JavaApplicationStub.

If you experience errors and want to check the console for debug messages from your library, you can start Taverna from the terminal instead of double-clicking the Taverna icon. The example below shows the typical message when libhello.jnilib is located but one of its own dependencies (libfish.dylib) can't be found, for instance because we didn't run the install_name_tool command:

: stain@mira ~; /Applications/Taverna.app/Contents/MacOS/JavaApplicationStub 
Warning: Incorrect memory size qualifier: mm Treating it as m
Exception in thread "Thread-29" java.lang.UnsatisfiedLinkError: /Applications/Taverna.app/Contents/MacOS/libhello.jnilib: 
   (..)