r2 - 05 Sep 2006 - 14:06:14 - DanieleTuri

Taverna Provenance Plugin v 1.1

Provenance is metadata about the origin, production and usage of resources. The TavernaProvenancePlugin captures the provenance information produced by the TavernaWorkbench during the enactment of a WorkFlow and records it as RDF graphs.

Provenance Ontology

The RDF graphs produced by the plugin together form the instance data for the OWL provenance ontology. At the moment this ontology is "morally" just an RDF schema, i.e. a taxonomy of RDF classes and properties, and indeed the Sesame implementation of the plugin uses a translation to RDFS. The concepts in the ontology correspond mainly to concepts in Taverna's model. The plugin uses a MetadataService to store the ontology, populated with the provenance of the last enacted workflow.

Named Graphs

The provenance RDF graph corresponding to the enactment of a workflow is stored as a named graph using the NG4J (Named Graphs for Jena) API. This makes it possible to retrieve and refer to the entire graph of a workflow enactment without resorting to the expensive RDF reification mechanism.
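The idea behind named graphs can be illustrated with a small conceptual sketch. It deliberately does not use the NG4J API: the class `NamedGraphSketch`, its methods and all `urn:example:` URIs are hypothetical and only show why naming a graph makes a whole enactment addressable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: this is NOT the NG4J API. A named graph is modelled
// as a list of triples stored under the URI that names the graph.
class NamedGraphSketch {
    // graph name -> triples, each triple being {subject, predicate, object}
    private final Map<String, List<String[]>> graphs =
            new LinkedHashMap<String, List<String[]>>();

    // Add one statement to the graph identified by graphName.
    void add(String graphName, String subject, String predicate, String object) {
        List<String[]> triples = graphs.get(graphName);
        if (triples == null) {
            triples = new ArrayList<String[]>();
            graphs.put(graphName, triples);
        }
        triples.add(new String[] {subject, predicate, object});
    }

    // Return every triple recorded for one enactment in a single lookup.
    List<String[]> graph(String graphName) {
        List<String[]> triples = graphs.get(graphName);
        return triples == null ? new ArrayList<String[]>() : triples;
    }

    // Statements about a graph simply use the graph name as their subject,
    // so the individual triples do not have to be reified.
    static NamedGraphSketch example() {
        NamedGraphSketch store = new NamedGraphSketch();
        store.add("urn:example:run1", "urn:example:proc/blast",
                "urn:example:hasInput", "urn:example:data/1");
        store.add("urn:example:run1", "urn:example:proc/blast",
                "urn:example:hasOutput", "urn:example:data/2");
        store.add("urn:example:run2", "urn:example:proc/align",
                "urn:example:hasInput", "urn:example:data/2");
        store.add("urn:example:meta", "urn:example:run1",
                "urn:example:launchedBy", "urn:example:someuser");
        return store;
    }
}
```

With reification, each statement about a triple would need several extra triples; here a single statement in the `urn:example:meta` graph describes the whole `urn:example:run1` enactment.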

Queries

One of the benefits of recording provenance is that one can go back and query the repository for information about previous experiments. For instance, one can ask:

Has a given process already been run against a given input and, if so, what output did it generate?
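The logic of such a question can be sketched as follows. This is a minimal in-memory illustration, not the plugin's query mechanism: `ProcessRunLookup`, `record` and `outputFor` are hypothetical names, and the real repository would be queried over the stored graphs instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the metadata repository; the real
// plugin answers this kind of question with queries over named graphs.
class ProcessRunLookup {
    // One recorded process run: which process, on which input, with which output.
    static final class Run {
        final String process;
        final String input;
        final String output;

        Run(String process, String input, String output) {
            this.process = process;
            this.input = input;
            this.output = output;
        }
    }

    private final List<Run> runs = new ArrayList<Run>();

    // Remember that `process` turned `input` into `output`.
    void record(String process, String input, String output) {
        runs.add(new Run(process, input, output));
    }

    // Returns the output of an earlier run of `process` on `input`,
    // or null if that combination has not been recorded.
    String outputFor(String process, String input) {
        for (Run r : runs) {
            if (r.process.equals(process) && r.input.equals(input)) {
                return r.output;
            }
        }
        return null;
    }
}
```

A caller could use a non-null result to skip re-running an expensive process.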

Architecture

The plugin is linked to Taverna through ProvenanceGenerator, which implements WorkflowEventListener. At the beginning of a workflow enactment the generator initialises a Jena ProvenanceOntology object. During the enactment this object is used to type the provenance data generated by every WorkflowInstanceEvent that Taverna sends. At the end of the workflow the whole graph corresponding to the instance data in ProvenanceOntology is passed to the MetadataService, which stores it in a database.
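The lifecycle described above (initialise at workflow start, accumulate on every event, store once at the end) can be sketched roughly as follows. All names here are hypothetical: `GraphSink` stands in for the MetadataService, `ProvenanceCollectorSketch` for the generator, and the method signatures are not those of Taverna's WorkflowEventListener.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the metadata store; not the MetadataService API.
interface GraphSink {
    void store(String graphName, List<String> statements);
}

// Hypothetical listener illustrating the lifecycle:
// initialise at workflow start, accumulate per event, flush at workflow end.
class ProvenanceCollectorSketch {
    private final GraphSink sink;
    private final Map<String, List<String>> pending =
            new HashMap<String, List<String>>();

    ProvenanceCollectorSketch(GraphSink sink) {
        this.sink = sink;
    }

    // Workflow enactment begins: start an empty graph for this run.
    void workflowStarted(String runId) {
        pending.put(runId, new ArrayList<String>());
    }

    // Each enactment event contributes statements to the run's graph.
    void processCompleted(String runId, String process, String input, String output) {
        List<String> graph = pending.get(runId);
        if (graph == null) {
            return; // event for an unknown run: ignored in this sketch
        }
        graph.add(process + " hasInput " + input);
        graph.add(process + " hasOutput " + output);
    }

    // Workflow enactment ends: hand the whole graph to the store in one go.
    void workflowCompleted(String runId) {
        List<String> graph = pending.remove(runId);
        if (graph != null) {
            sink.store(runId, graph);
        }
    }
}
```

Nothing reaches the sink until `workflowCompleted` is called, which mirrors the description that the whole graph is passed on at the end of the workflow.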

Optionally, a DataService can also be activated; it stores all input and output data, including intermediate data.

The metadata repository can subsequently be queried either via canned queries such as DataForProcess and FailedProcesses, or via generic queries expressed in TriQL, a query language for named graphs.

The following diagram illustrates the overall architecture:

architecture-1.1.png

Performance

The plugin runs asynchronously, so it should not noticeably affect Taverna's performance.
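One common way to achieve this kind of decoupling is to hand each event to a background thread. The sketch below assumes that approach for illustration only; the plugin's actual threading model is not documented here, and `AsyncRecorderSketch` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: events are queued to a single background thread so the
// calling (enactor) thread never waits for provenance to be written.
class AsyncRecorderSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> recorded =
            Collections.synchronizedList(new ArrayList<String>());

    // Returns immediately; the actual recording happens on the worker thread.
    void onEvent(final String event) {
        worker.submit(new Runnable() {
            public void run() {
                recorded.add(event); // stands in for the slow write to the store
            }
        });
    }

    // Stop accepting events and wait until the queued ones have been recorded.
    List<String> shutdownAndGet() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("recording did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<String>(recorded);
    }
}
```

A single worker thread keeps the events in submission order, which matters when later statements refer to earlier ones.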

Requirements

  • Java 5 or later
  • Taverna 1.4 or later

Downloads

Binaries

Source

Anonymous CVS:

  • Host: cvs.mygrid.info
  • Path: /usr/local/cvs/mygrid
  • Module: mygrid/miasgrid/rdf-provenance

Installation

The plugin is included by default in the Taverna 1.4 distribution. You can also download it separately, in which case:

  • Download or build the plugin, unzip it into TAVERNA_HOME (you need Taverna 1.4 or later), and start Taverna.

Configuration

  • The configuration file is TAVERNA_HOME/conf/provenance.properties.
  • By default, the provenance plugin uses Hypersonic's 100% Java database.
    • This means that both data and metadata will be stored on your file system with no further installation required.
    • You can control the location of data and metadata stores by changing the values of the properties mygrid.dataservice.hsql.url and mygrid.metadataservice.hsql.url respectively.
  • To use the MySQL metadata store, change the value of mygrid.kave.type from jena/hypersonic to jena/mysql, and set the corresponding mygrid.metadataservice.mysql.* properties to your own connection values.
  • Similarly, to use MySQL as the database for the data, change the value of mygrid.dataservice.type from hypersonic to mysql.

Here are the default properties:

#====================================================================
# STORE TYPES #

# Metadata Store 
#--------------------------------------------------------------------
     mygrid.kave.type = jena/hypersonic
#    mygrid.kave.type = jena/mysql
#--------------------------------------------------------------------

# Data Store 
#--------------------------------------------------------------------
    mygrid.dataservice.type = hypersonic
#   mygrid.dataservice.type = mysql
#--------------------------------------------------------------------

#====================================================================
# METADATA SERVICE CONNECTIONS #

# Jena/Hypersonic
#--------------------------------------------------------------------
# By default it stores data in user.home/provenance/metadata/hsql/tables
# Uncomment and edit the following if you want a different location:
#   mygrid.metadataservice.hsql.url = jdbc:hsqldb:file:your/path/to/provenance/metadata/hsql/tables
   mygrid.metadataservice.hsql.user = sa
   mygrid.metadataservice.hsql.password = 
#--------------------------------------------------------------------

# Jena/MySQL (requires access to a MySQL DB)
#--------------------------------------------------------------------
   mygrid.metadataservice.mysql.url = jdbc:mysql://rpc103.cs.man.ac.uk/metadata_sandbox
   mygrid.metadataservice.mysql.user = anonymous
   mygrid.metadataservice.mysql.password = anonymous
#--------------------------------------------------------------------

#====================================================================
# DATA SERVICE CONNECTIONS #

# Hypersonic
#--------------------------------------------------------------------
# By default it stores data in user.home/provenance/data/hsql/tables
# Uncomment and edit the following if you want a different location:
#   mygrid.dataservice.hsql.url = jdbc:hsqldb:file:your/path/to/provenance/data/hsql/tables
   mygrid.dataservice.hsql.user = sa
   mygrid.dataservice.hsql.password = 
#--------------------------------------------------------------------

# MySQL (requires access to a MySQL DB)
#--------------------------------------------------------------------
   mygrid.dataservice.mysql.url = jdbc:mysql://rpc103.cs.man.ac.uk/data_sandbox
   mygrid.dataservice.mysql.user = anonymous
   mygrid.dataservice.mysql.password = anonymous
#--------------------------------------------------------------------

# Derby
#--------------------------------------------------------------------
   mygrid.dataservice.derby.url = jdbc:derby:provenance/data/derby/tables;create=true
   mygrid.dataservice.derby.user = 
   mygrid.dataservice.derby.password = 
#--------------------------------------------------------------------

#====================================================================
# USER CONTEXT CONFIGURATION #
# used only if not already set by other means
#--------------------------------------------------------------------
   mygrid.usercontext.experimenter = http://www.someplace/someuser
   mygrid.usercontext.organisation = http://www.someplace/somelab
#--------------------------------------------------------------------
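As a rough illustration of how the store type selects the connection properties, the following sketch reads such a file with `java.util.Properties`. The property names are taken from the listing above; the class `ProvenanceConfigSketch` and its methods are hypothetical and not part of the plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of the plugin: resolves which connection
// settings apply for the selected metadata store type.
class ProvenanceConfigSketch {
    // Parse properties-file text; lines starting with '#' are comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("cannot read properties", e);
        }
        return props;
    }

    // Maps "jena/hypersonic" to "hsql" and "jena/mysql" to "mysql".
    // Falls back to Hypersonic when mygrid.kave.type is not set.
    static String metadataBackend(Properties props) {
        String type = props.getProperty("mygrid.kave.type", "jena/hypersonic").trim();
        return type.endsWith("mysql") ? "mysql" : "hsql";
    }

    // Looks up e.g. mygrid.metadataservice.mysql.url for the active backend;
    // returns null when the property is commented out or absent.
    static String metadataSetting(Properties props, String name) {
        String key = "mygrid.metadataservice." + metadataBackend(props) + "." + name;
        String value = props.getProperty(key);
        return value == null ? null : value.trim();
    }
}
```

In practice the text would come from TAVERNA_HOME/conf/provenance.properties, for example via a `FileReader`. A null result for the Hypersonic `url` corresponds to the commented-out default location.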


Troubleshooting and known bugs

  1. The simplest way to disable the plugin is to rename the provenance.properties file in the conf folder in Taverna. Detailed instructions at: ProvenanceBrowser#Troubleshooting
  2. If provenance collection is filling your disk, move or delete the provenance folder, which is created in the .taverna folder in your home directory. (In Windows XP this is C:\Documents and Settings\your user name\.taverna\provenance, while in Mac OS X it is /Users/your user name/.taverna.) Detailed instructions at: ProvenanceBrowser#Troubleshooting
  3. The plugin is still experimental and might fail to record all steps, in particular if the workflow fails or has iterations of nested workflows. (It is more reliable on Linux than on Windows.)

Notes for developers

  1. The name of the ProvenanceGenerator listener is hard-coded in the file /src/META-INF/services/org.embl.ebi.escience.scufl.enactor.WorkflowEventListener. If you rename or move that class, update the file accordingly.
  2. ProvenancePluginDistribution
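The first note can be illustrated with a small sketch of how a META-INF/services style file ties an interface to an implementation by class name. It assumes the usual provider-configuration layout (one class name per line, `#` for comments); the discovery code Taverna actually uses is not shown, and `ServiceFileSketch` and the class names in the usage below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how a META-INF/services style file refers to classes by name.
// The discovery mechanism Taverna actually uses is not shown here.
class ServiceFileSketch {
    // Extracts class names from provider-configuration text:
    // one name per line, '#' starts a comment, blank lines are skipped.
    static List<String> providerNames(String fileContent) {
        List<String> names = new ArrayList<String>();
        for (String line : fileContent.split("\n")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (name.length() > 0) {
                names.add(name);
            }
        }
        return names;
    }

    // True if the named class can still be loaded. A renamed class whose old
    // name is left in the file fails exactly here.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

For example, `resolves("java.util.ArrayList")` succeeds, whereas a stale entry such as `com.example.Renamed` does not resolve. The compiler cannot catch this kind of mismatch, because the name lives in a text file rather than in code.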

Future Work

See TavernaDesiderata.

See also Provenance Requirements

Author

Acknowledgements

Stian Soiland helped during the last phase of development. Thanks also to the other members of the myGrid team, in particular Matthew Gamble (author of the ProvenanceBrowser), David Withers, Stuart Owen, Jun Zhao, and Ismael Juma (author of an earlier provenance browser). This work is based on the KAVE Taverna plugin by Chris Wroe.

Code debugged using the YourKit Java Profiler.
