There are often cases in workflow construction where the output of one processor is not quite right for the input of the next. There are several options to cover these cases: the user can make use of so-called 'shim' services exposed in the same way as other operations, they can create a new service to perform the transformation or, for relatively simple cases, they can create a single non-service component in the form of a script.
Beanshell scripts, as the name suggests, use the Beanshell scripting engine. This gives the user access to a form of interpreted Java; this section therefore assumes a minimal level of Java knowledge. For users who have never attempted Java programming we recommend the Java tutorial on Sun Microsystems' website at http://java.sun.com/docs/books/tutorial/. There are certain minor differences between the core language described there and the version used by the Beanshell; these are documented at the Beanshell website at http://www.beanshell.org/. The good news is that almost all of these differences make the Beanshell easier to use than conventional Java, and it is unlikely that a typical user will ever encounter them.
As an example of a simple script consider the following use case: Given three raw sequence strings (protein or nucleotide) create a single string containing the three sequences with FASTA format titles. For simplicity's sake assume that the titles are all fixed (although we could easily have the titles as parameters to the script).
Create a new Beanshell processor either by dragging it from the local services section of the service selection panel into the list of processors in the Advanced Model Explorer, or by right-clicking and selecting the corresponding menu option. We recommend that you change the name of the inserted Beanshell processor to something that describes what the processor will do. For our example, something like FASTA_format_sequences will do.
The first things to configure are the input and output ports of the new instance. The Beanshell configuration panel is accessible from the right-click menu of the new processor. This will open a new window containing options to configure the script itself and the inputs and outputs. The ports tab allows you to create, remove and modify the types of the inputs and outputs of this processor. Input and output ports are the connection points between the workflow and the executed Beanshell code. From a programming point of view, you can look at input ports as parameters to a function call, while output ports are return values. Inputs to the processor will be available as variables within the Beanshell script (the names match the input port names), while output ports extract the values of the identically named variables after the script has executed.

A new input is added by entering its name in the text box. The default type is a Plain Text, corresponding to a single string with no additional MIME type information. Although in this case the default is the correct value, it can be changed: selecting the Plain Text part presents a drop-down list of the available types, while selecting the collection part offers options to cycle through the collection types such as a list of Plain Text. Leave the defaults for now and use the port creation mechanism described above to create three inputs and one output with sensible names.
Now that the processor has the correct inputs and outputs, the remaining task is to specify the logic connecting them together in the form of a script. The script tab makes available a syntax highlighting editor (based on JEdit) into which the user must enter a Beanshell compatible script. Having defined the available ports (both inputs and outputs), the script engine will, when this processor is enacted, create new bound variables in the scripting environment corresponding to the names of the input ports. At the end of the invocation it will extract bound variables with names corresponding to the output ports, and use the values of these as the output values of the processor. In this case, therefore, the script must make use of the variables named seq1, seq2 and seq3 and ensure that there is a variable of the appropriate type called fasta bound in the environment when the script completes. The types are determined by the a Plain Text, a list of Plain Text... options in the ports tab: if the type is a single Plain Text, the variable bound to it will be a String object; if a list of Plain Text, the value will be a Java List implementation where the items in the List are String objects, and so on. Corresponding logic applies to the output: if the ports tab declares that there is an output called fasta with type a Plain Text, the script must, before it completes, define a String called fasta containing that result value.
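The mapping between port types and Java types can be pictured in plain Java. The sketch below is illustrative only: it assumes a hypothetical input port named sequences declared as a list of Plain Text, which arrives in the script as a List whose items are String objects.

```java
import java.util.Arrays;
import java.util.List;

public class TypeMapping {
    // A port declared as a single Plain Text arrives as a String;
    // one declared as a list of Plain Text arrives as a List of Strings.
    static String joinSequences(List<String> sequences) {
        StringBuilder sb = new StringBuilder();
        for (String s : sequences) {   // each item in the list is a String
            sb.append(s).append('\n');
        }
        return sb.toString();          // a String, suitable for a Plain Text output
    }

    public static void main(String[] args) {
        System.out.print(joinSequences(Arrays.asList("ACGT", "GGCC")));
    }
}
```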
The screenshot above showed a script (more verbose than strictly required) which fulfils this contract and performs the desired function; those familiar with Java will realise that this could be done in a single line.
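A minimal sketch of such a script follows. The fixed titles sequence1 to sequence3 are our invention; note also that in the real Beanshell script there is no method wrapper, since seq1 to seq3 arrive as bound variables and assigning to fasta produces the output.

```java
public class FastaFormat {
    // Concatenate the three sequences under fixed (hypothetical) FASTA
    // titles; the value left in "fasta" becomes the processor's output.
    static String format(String seq1, String seq2, String seq3) {
        String fasta = ">sequence1\n" + seq1 + "\n"
                     + ">sequence2\n" + seq2 + "\n"
                     + ">sequence3\n" + seq3 + "\n";
        return fasta;
    }

    public static void main(String[] args) {
        System.out.print(format("MKVL", "ACDE", "GHIK"));
    }
}
```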
Changes to the script are immediately saved to the workflow. Note that if you are using Mac OS X, instead of the usual clipboard shortcuts such as Apple-C for Copy, you will have to use Windows-style Ctrl-C within the script editor.
Because the Beanshell processor only exists as part of a workflow (unlike, for example, a SOAP service which exists on some networked resource), there is a potential problem with reuse: having written a potentially complex script it would clearly be desirable to share it and allow some level of reuse, but because the script is embedded within a workflow it cannot simply be found in the way a networked service can. Fortunately it is possible to share scripts by creating a workflow containing the script and making the workflow definition available online; this can then be used as the target for either a web crawl or single workflow scavenger, which will in turn expose the script as a child element of the workflow scavenger. The script can then be treated exactly like any other processor located in this way.
Just as in Java, in Beanshell you are allowed to reference existing classes using import statements. By default you should have access to the full Java Platform API, so you should have no problem using, say, a java.util.HashSet. However, it is often the case that you already have some library, provided by you or some third party, that does what you want. If these libraries are available as JARs you can access them from within the Beanshell by clicking the dependencies tab.
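As a small illustration of importing platform classes (a hypothetical script, not part of the example workflow), a script might use java.util.HashSet to count distinct values:

```java
import java.util.HashSet;
import java.util.Set;

public class ImportExample {
    // Classes from the platform API can be imported just as in Java;
    // here a HashSet discards duplicate strings.
    static int countDistinct(String[] items) {
        Set<String> seen = new HashSet<String>();
        for (String item : items) {
            seen.add(item);   // adding an existing element has no effect
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(new String[] {"a", "b", "a"}));  // prints 2
    }
}
```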
The dialogue should give you the location of the library folder into which you must copy the needed JARs. Note that you also have to copy any dependencies of that library. (Taverna does have support for using Maven repositories for this purpose, but this is unfortunately not yet represented in the GUI dialogues.)
The library folder is the subdirectory lib within taverna.home, whose default location depends on your operating system.
After copying, close and reopen the dialogue; it should now allow you to tick off the required JAR files. Different processors in the workflow, just like different workflows, can depend on different JAR files without conflicts. The relative filenames will be stored in your workflow, so that if you open the workflow with another Taverna installation that doesn't have hello.jar installed, that entry will be listed in red in the dialogue to indicate that it is missing.
Workflows with dependencies are inherently more difficult to share with other Taverna users, as other users would also need to download and install the dependencies.
This section can be quite technical even for hard-core Java programmers.
Normally the default settings will be sufficient for the simple cases. However, if you have several Beanshell processors with dependencies that are to cooperate using a more complex API, or the library you depend on has complex initialisation routines or stores state in static variables, you might want stronger control over how the classes are loaded.
The default classloader persistence is Shared over iteration, which means that the classes are loaded afresh for each workflow run, for each processor. That means that if you have two Beanshell processors in your workflow that depend on hello.jar, when you run the workflow each of the processors (and hence their Beanshell scripts) will see its classes freshly loaded. This isolation ensures that you don't get a 'dirty' class, and is necessary in some cases to avoid thread locking problems with static methods. (Remember that several processors might execute in parallel in a Taverna workflow.) However, some libraries depend on static members for sharing state, and if this is what you desire for your workflow you might want to consider some of the other classloader persistence options.
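To see why the persistence option matters, consider a hypothetical library class that keeps state in a static member; whether two processors observe each other's increments depends entirely on whether they share a classloader:

```java
public class StaticCounter {
    // A library keeping state in a static member. With the default
    // persistence each processor gets a freshly loaded copy of this class,
    // so count restarts at zero; only the workflow-shared and system
    // classloader options let two processors observe each other's updates.
    private static int count = 0;

    static int increment() {
        return ++count;
    }

    public static void main(String[] args) {
        int first = increment();
        int second = increment();
        // Within a single classloader the state accumulates:
        System.out.println(second - first);  // prints 1
    }
}
```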
The classes are loaded fresh for each iteration of each processor. Although this option is slow, it guarantees that each iteration is executed in isolation with regard to the dependency classes. This option is generally not recommended.
The classes are loaded fresh for each processor in the workflow, for each workflow run. As each processor is executed in isolation, Beanshells can execute in parallel even when accessing non-thread-safe static methods, and can have different transient dependencies, for instance two different versions of an XML library. Processors can't share state through static members.
The classes are loaded fresh for each workflow run, but are shared between all processors with this persistence option. The JAR files that are searched are the union of all the selections of workflow-shared processors; normally this means that you only need to tick off the required JAR files in one of the processors, as long as all of them have this option set. This option allows the dependency to share state through internal static members, and so the behaviour of one Beanshell might depend on the behaviour of another. This is not recommended for scientifically sound provenance, but the isolation level is still at the workflow run, so that each workflow is run with fresh classes. Try this option if the default fails and you have several Beanshell processors accessing the same API.

The classes are loaded using the system classloader. This means they are only ever loaded once, even if you run several workflows or re-run a workflow. This option is generally only recommended as a last resort, or if you are accessing JNI-based native libraries, which by their nature can only be loaded once. Notice that if you don't use the normal Taverna startup script you will have to add the JAR files to the -classpath. See the section on JNI-based libraries for more information.
In general we recommend using the default for the Beanshell or, if required, the default for the API consumer.

JNI-based libraries are a way for Java programs to access natively compiled code, typically written in languages such as C, C++ or Fortran. Even if you don't depend on such a library, one of your dependencies might. A JNI-based library is normally identified by an extension such as .jnilib instead of .jar.
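If you are unsure which filename the JVM will look for on your platform, System.mapLibraryName shows the translation (this snippet is illustrative; the hello library is our running example and need not exist for the snippet to run):

```java
public class LibName {
    public static void main(String[] args) {
        // The JVM translates a bare library name into the platform-specific
        // filename it searches java.library.path for: hello.dll on Windows,
        // libhello.so on Linux, libhello.jnilib/libhello.dylib on Mac OS X.
        // Actually loading it would then be System.loadLibrary("hello").
        System.out.println(System.mapLibraryName("hello"));
    }
}
```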
Compiling and building JNI libraries is outside the scope of this documentation, but we'll cover how to access such libraries from within Taverna. In this section we will assume a Java library hello.jar that depends on some native functions in hello.jnilib. To complicate matters, our hello.jnilib in turn depends on the native dynamic library fish.dll / libfish.so / libfish.dylib (pick your favourite extension depending on the operating system).
First of all you need to make a decision as to where to install the libraries. We generally recommend installing the .jnilib files in the same location as the .jar files (i.e. in lib in your home directory's Taverna folder), as described in the section Using dependencies, but since supporting JNI will require you to modify the Taverna startup scripts, you might want to install them to the folder lib in the Taverna installation directory instead. Here we will assume the home directory solution.
In the Taverna installation directory, locate runme.bat or runme.sh, depending on your operating system, and open this file in a decent editor.
You need to add a few lines to set the library path so that the .jnilib can find its dependencies. This step might not be required if you have no .dll/.so/.dylib files in addition to the .jnilib file, but it might be if you have more than one .jnilib file. Here we'll set the dynamic library path to the lib folder in your Taverna home directory.
In addition, we're going to modify the Java startup parameters to set the system property java.library.path, which tells Java where to look for .jnilib files. Since both paths and variable names vary with operating system, we'll show the modifications for Windows, Linux and Mac OS X.
Windows
In the Taverna installation folder, find and edit runme.bat with your favourite editor (in the worst case, Notepad), and add/modify the lines in bold.
@echo off
rem Set to %~dp0\lib for shared installations
set LIB_PATH=%APPDATA%\Taverna-1.7.1\lib
set PATH=%PATH%;%LIB_PATH%
set ARGS=-Xmx300m "-Djava.library.path=%LIB_PATH%"
set ARGS=%ARGS% -Djava.system.class.loader=net.sf.taverna.tools.BootstrapClassLoader
(..)
Linux
In the Taverna installation folder, find and edit runme.sh with your favourite editor, and add/modify the lines in bold.
(..)
TAVERNA_HOME="`dirname "$PRG"`"
cd "$saveddir"
# Set to $TAVERNA_HOME/lib for shared installation
LIB_PATH="$HOME/.taverna-1.7.1/lib"
LD_LIBRARY_PATH="$LIB_PATH"
export LD_LIBRARY_PATH
ARGS="-Xmx300m -Djava.library.path=$LIB_PATH"
ARGS="$ARGS -Djava.system.class.loader=net.sf.taverna.tools.BootstrapClassLoader"
(..)
Mac OS X
On Mac OS X a startup script is not used; Taverna is wrapped in an application bundle, which is a kind of directory. In particular, if dependencies on dynamic libraries are needed, we recommend you install the JNI libraries inside the bundle. The JAR files, however, must be installed as explained in Using dependencies.
Technically you can instead use almost the same solution as on Linux, but you would have to start Taverna using the command line:

DYLD_LIBRARY_PATH=$HOME/Library/Application\ Support/Taverna-1.7.1/lib/ /Applications/Taverna.app/Contents/MacOS/JavaApplicationStub
Use the Terminal and change directory to inside the Taverna.app bundle (commands are shown in bold):
: stain@mira ~; cd /Applications/Taverna.app/
: stain@mira /Applications/Taverna.app; ls
Contents
: stain@mira /Applications/Taverna.app; cd Contents/MacOS/
: stain@mira /Applications/Taverna.app/Contents/MacOS; ls
JavaApplicationStub     dataviewer.sh   dot     executeworkflow.sh
or in the Finder, right-click (or control-click) on the Taverna icon in Applications and select .
Navigate down to Contents/MacOS. This is where we will copy in our jnilib and dylib files, in this example libhello.jnilib and libfish.dylib.
: stain@mira /Applications/Taverna.app/Contents/MacOS; cp ~/src/jnitest/lib* .
: stain@mira /Applications/Taverna.app/Contents/MacOS; ls
JavaApplicationStub     dataviewer.sh   dot     executeworkflow.sh
libfish.dylib   libhello.jnilib
However, in order for libhello.jnilib to find its dependency libfish.dylib, we will have to use the Terminal and modify the library path using install_name_tool. We'll use otool to inspect the paths.
: stain@mira /Applications/Taverna.app/Contents/MacOS; otool -L libhello.jnilib
libhello.jnilib:
        libhello.jnilib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.5.1)
        libfish.dylib (compatibility version 0.0.0, current version 0.0.0)
: stain@mira /Applications/Taverna.app/Contents/MacOS; install_name_tool -change libfish.dylib @executable_path/libfish.dylib libhello.jnilib
: stain@mira /Applications/Taverna.app/Contents/MacOS; otool -L libhello.jnilib
libhello.jnilib:
        libhello.jnilib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.5.1)
        @executable_path/libfish.dylib (compatibility version 0.0.0, current version 0.0.0)
Why does this work? @executable_path is resolved to Contents/MacOS because Taverna's Java runtime is started by Contents/MacOS/JavaApplicationStub.
If you experience errors and want to check the console for debug messages from your library, instead of double-clicking the Taverna icon you can start it from the terminal. The example below shows the typical message when libhello.jnilib is located but one of its own dependencies (libfish.dylib) can't be, for instance because we didn't run the install_name_tool command:
: stain@mira ~; /Applications/Taverna.app/Contents/MacOS/JavaApplicationStub
Warning: Incorrect memory size qualifier: mm Treating it as m
Exception in thread "Thread-29" java.lang.UnsatisfiedLinkError: /Applications/Taverna.app/Contents/MacOS/libhello.jnilib:
(..)