r2 - 25 May 2006 - 15:45:00 - DanieleTuri

Taverna Provenance Plugin v 0.9

Provenance is metadata about the origin, production and usage of resources. The TavernaProvenancePlugin captures the provenance information produced by the TavernaWorkbench during the enactment of a WorkFlow and records it as RDF graphs.

Provenance Ontology

The RDF graphs produced by the plugin together form instance data for the OWL provenance ontology. At the moment this ontology is "morally" just an RDF schema, i.e. a taxonomy of RDF classes and properties; indeed, the Sesame implementation of the plugin uses a translation to RDFS. The concepts in the ontology correspond mainly to concepts in Taverna's model. The plugin stores the ontology, populated with the provenance data of the last enacted workflow, in the NamedRDFGraphRepository.
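Because the ontology is in effect an RDFS taxonomy, much of the typing it provides comes from rdfs:subClassOf links: an instance of a subclass also counts as an instance of every superclass. The following plain-Java sketch (no RDF library) illustrates that inference. The class names used with it (for example WebServiceProcess and Process) are hypothetical placeholders, not the actual terms of the provenance ontology.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy RDFS-style taxonomy: each class maps to its direct superclasses
// (rdfs:subClassOf). Class names are hypothetical, not the real ontology terms.
class Taxonomy {
    private final Map<String, Set<String>> superClasses = new HashMap<String, Set<String>>();

    void addSubClassOf(String sub, String sup) {
        Set<String> sups = superClasses.get(sub);
        if (sups == null) {
            sups = new HashSet<String>();
            superClasses.put(sub, sups);
        }
        sups.add(sup);
    }

    // Reflexive-transitive closure of rdfs:subClassOf, as RDFS entailment gives.
    Set<String> allSuperClasses(String cls) {
        Set<String> result = new HashSet<String>();
        collect(cls, result);
        return result;
    }

    private void collect(String cls, Set<String> acc) {
        if (!acc.add(cls)) {
            return; // already visited; also guards against cycles
        }
        Set<String> sups = superClasses.get(cls);
        if (sups != null) {
            for (String s : sups) {
                collect(s, acc);
            }
        }
    }

    // An instance typed as instanceType is also an instance of queried
    // if queried is one of its (inferred) superclasses.
    boolean isInstanceOf(String instanceType, String queried) {
        return allSuperClasses(instanceType).contains(queried);
    }
}
```

This is why a query for a general concept can also retrieve data that was typed with a more specific one.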

Named Graphs

The provenance RDF graph corresponding to the enactment of a workflow is stored as a named graph using either the NG4J (Named Graphs for Jena) API or Sesame 2. This makes it possible to retrieve and refer to the entire graph of a workflow enactment without resorting to the expensive RDF reification mechanism.
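To make the named-graph idea concrete, the following plain-Java sketch models a quad store in which every triple carries the name of the graph it belongs to. It does not use NG4J or Sesame 2, and all names in it are hypothetical. Retrieving a whole enactment is just a filter on the graph name, whereas with reification each statement would itself become a resource described by several extra triples (rdf:subject, rdf:predicate, rdf:object and usually rdf:type rdf:Statement).

```java
import java.util.ArrayList;
import java.util.List;

// A minimal in-memory quad store: each triple is tagged with the name of the
// graph it belongs to. An illustration of named graphs, not the NG4J/Sesame API.
class QuadStore {
    static final class Quad {
        final String graph, subject, predicate, object;

        Quad(String graph, String subject, String predicate, String object) {
            this.graph = graph;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Quad> quads = new ArrayList<Quad>();

    void add(String graph, String s, String p, String o) {
        quads.add(new Quad(graph, s, p, o));
    }

    // Recover every triple recorded for one workflow enactment by its graph
    // name; no reified statements are needed.
    List<Quad> graph(String graphName) {
        List<Quad> result = new ArrayList<Quad>();
        for (Quad q : quads) {
            if (q.graph.equals(graphName)) {
                result.add(q);
            }
        }
        return result;
    }
}
```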

Queries

One of the benefits of recording provenance is that one can go back and query the repository for information about previous experiments. For instance, one can ask:

Check if a given process has already been run against a given input and, if that is the case, give me the output it generated.

Note that, unless the data itself is RDF stored in the same RDF repository, the query actually returns a reference to the data, in the form of the LSID that Taverna generated for it, rather than the data itself.
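The question above amounts to a join over the recorded provenance: find a run of the given process whose inputs include the given LSID, and return the LSIDs of its outputs. The sketch below expresses that logic in plain Java over a hypothetical ProcessRun record; in the plugin the same question would be posed as a TriQL or SeRQL query against the repository, and the record fields, process name and LSID strings here are illustrative assumptions rather than the ontology's actual properties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record of one process execution, with inputs and outputs as LSIDs.
class ProcessRun {
    final String processName;
    final List<String> inputLsids;
    final List<String> outputLsids;

    ProcessRun(String processName, List<String> inputLsids, List<String> outputLsids) {
        this.processName = processName;
        this.inputLsids = inputLsids;
        this.outputLsids = outputLsids;
    }

    // "Has this process already been run against this input? If so, what did
    // it produce?" An empty result means no such run has been recorded.
    static List<String> previousOutputs(List<ProcessRun> history, String process, String inputLsid) {
        List<String> outputs = new ArrayList<String>();
        for (ProcessRun run : history) {
            if (run.processName.equals(process) && run.inputLsids.contains(inputLsid)) {
                outputs.addAll(run.outputLsids);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<ProcessRun> history = new ArrayList<ProcessRun>();
        history.add(new ProcessRun("blast", Arrays.asList("lsid:in1"), Arrays.asList("lsid:out1")));
        System.out.println(previousOutputs(history, "blast", "lsid:in1")); // [lsid:out1]
    }
}
```

As the note above says, what comes back is a list of references (LSIDs), not the output data itself.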

Architecture

The plugin is linked to Taverna through ProvenanceGenerator, which implements WorkflowEventListener. At the beginning of a workflow enactment, the generator initialises a ProvenanceOntology object (the Jena or the Sesame implementation, as selected by the mygrid.kave.model property); during the enactment this object is used to type the provenance data generated by every WorkflowInstanceEvent that Taverna sends. At the end of the workflow, the whole graph corresponding to the instance data in ProvenanceOntology is passed to NamedRDFGraphsRepository, which stores it either in a MySQL database using NG4J or on the file system (native store) using Sesame 2.
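The lifecycle just described (open a graph when enactment starts, add typed statements for each event, hand the finished graph to the repository at the end) can be sketched as follows. This is not Taverna's WorkflowEventListener interface: EnactmentListener, its three methods, Repository and SketchProvenanceGenerator are simplified stand-ins invented for illustration, and the statements are plain strings rather than RDF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Taverna's event listener and the
// plugin's repository; the real interfaces have different methods.
interface EnactmentListener {
    void workflowStarted(String runId);
    void processCompleted(String runId, String processName);
    void workflowCompleted(String runId);
}

interface Repository {
    void storeNamedGraph(String graphName, List<String> statements);
}

class SketchProvenanceGenerator implements EnactmentListener {
    private final Repository repository;
    // One in-progress graph per running workflow, keyed by run id.
    private final Map<String, List<String>> openGraphs = new LinkedHashMap<String, List<String>>();

    SketchProvenanceGenerator(Repository repository) {
        this.repository = repository;
    }

    public void workflowStarted(String runId) {
        // Plays the role of initialising a fresh ProvenanceOntology instance.
        openGraphs.put(runId, new ArrayList<String>());
    }

    public void processCompleted(String runId, String processName) {
        // Type the event's data and add it to this run's graph.
        openGraphs.get(runId).add(runId + " hasProcess " + processName);
    }

    public void workflowCompleted(String runId) {
        // Hand the whole graph to the repository as a single named graph.
        repository.storeNamedGraph(runId, openGraphs.remove(runId));
    }
}
```

Nothing reaches the repository until the workflow completes, which mirrors the "whole graph at the end" behaviour described above.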

The repository can subsequently be queried either via canned queries, such as DataForProcess and FailedProcesses, or via generic named-graph queries. Two named-graph query languages can be used at the moment:

  1. TriQL if you use NG4J
  2. SeRQL if you use Sesame 2

The following diagram illustrates the overall architecture:

[Figure: overall architecture of the TavernaProvenancePlugin (architecture.png)]

Performance

The plugin does not affect Taverna's performance as it is executed asynchronously.
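One common way to achieve this kind of decoupling, shown here as a sketch rather than the plugin's actual code, is to hand provenance work to a background executor so that the enactment thread only enqueues a task and returns immediately. java.util.concurrent is available from Java 5, matching the plugin's requirements; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: provenance recording runs on one background thread,
// so the workflow engine's thread never waits for RDF generation or storage.
class AsyncProvenanceRecorder {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> recorded = new CopyOnWriteArrayList<String>();

    // Called from the enactment thread: enqueue the work and return at once.
    void onEvent(final String event) {
        worker.execute(new Runnable() {
            public void run() {
                // Stand-in for typing the event and writing it to the repository.
                recorded.add(event);
            }
        });
    }

    // Drain pending work, e.g. when the workbench shuts down.
    List<String> shutdownAndGet() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                worker.shutdownNow(); // give up on tasks still pending
            }
        } catch (InterruptedException e) {
            worker.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return recorded;
    }
}
```

A single worker thread also keeps events in the order they were sent. Note that the provenance work is moved off the enactment thread rather than eliminated, so it still uses CPU and I/O in the background.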

Requirements

  • Java 5 and above
  • Taverna 1.3.2-RC1 and above

Downloads

Binaries

  • Provenance and Browser Plugin 0.9
    • note that this is a pre-release
      • the browser (alpha) only works with Jena NG4J, not with Sesame
      • an official release will follow soon

Source

Anonymous CVS:

  • Host: cvs.mygrid.info
  • Path: /usr/local/cvs/mygrid
  • Module: mygrid/miasgrid/rdf-provenance

Installation

  1. Unzip the Provenance and Browser Plugin 0.9 into TAVERNA_HOME (remember that you need Taverna 1.3.2-RC1 or later).
  2. Edit the following properties in TAVERNA_HOME/conf/mygrid.properties:
    • If you use Sesame 2, you do not need a database for the metadata; just choose a directory for the property mygrid.kave.sesame.native.dir.
    • If you want to use the ProvenanceBrowser, you need access to a MySQL database: comment out the Sesame 2 properties, uncomment the NG4J properties, and edit the latter to match your installation.
    • If you want to store intermediate results, you need to activate the BaclavaDataStore by uncommenting and editing the DATA STORE CONFIGURATION properties.

# TAVERNA HOME
taverna.home=/your/path/to/taverna-workbench-1.3.2-RC1

# RDF PROVENANCE CONFIGURATION

# Sesame 2 
mygrid.kave.persister = uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NativeSesameRepository
mygrid.kave.model = uk.ac.man.cs.img.mygrid.provenance.knowledge.ontology.sesame.SesameProvenanceOntology
mygrid.kave.generate.provenance.schema=http://www.mygrid.org.uk/ontology/provenance.rdfs
mygrid.kave.sesame.native.dir = /tmp/sesame2

# Jena NG4J
#mygrid.kave.persister = uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NG4JRepository
#mygrid.kave.model = uk.ac.man.cs.img.mygrid.provenance.knowledge.ontology.jena.JenaProvenanceOntology
#mygrid.kave.generate.provenance.ontology=http://www.mygrid.org.uk/ontology/provenance.owl

#mygrid.kave.jdbc.driver=com.mysql.jdbc.Driver
#mygrid.kave.jdbc.url=jdbc:mysql://rpc103.cs.man.ac.uk:3306/metadata_sandbox
#mygrid.kave.jdbc.user=anonymous
#mygrid.kave.jdbc.password=anonymous

# USER CONTEXT CONFIGURATION

# used only if not already set by other means
mygrid.usercontext.experimenter=http://www.cs.man.ac.uk/~dturi
mygrid.usercontext.organisation=http://www.cs.manchester.ac.uk


# DATA STORE CONFIGURATION

#taverna.datastore.class = org.embl.ebi.escience.baclava.store.JDBCBaclavaDataService

#taverna.datastore.jdbc.driver = com.mysql.jdbc.Driver
#taverna.datastore.jdbc.url = jdbc:mysql://rpc103.cs.man.ac.uk/data_sandbox
#taverna.datastore.jdbc.user = anonymous
#taverna.datastore.jdbc.password = anonymous
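Which back end is active is determined by the mygrid.kave.persister property. As an illustration (not part of the plugin), the following sketch uses java.util.Properties to load a mygrid.properties file and report which store it selects; the class name ProvenanceConfigCheck and its messages are assumptions, while the property keys and repository class names come from the file above. Since Properties keeps the last value read for a key, leaving both groups uncommented would make the NG4J group, which comes later in the file, silently win.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper (not part of the plugin): reports which provenance
// store a mygrid.properties file selects.
class ProvenanceConfigCheck {
    static final String SESAME =
        "uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NativeSesameRepository";
    static final String NG4J =
        "uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NG4JRepository";

    static String describe(Properties props) {
        String persister = props.getProperty("mygrid.kave.persister", "").trim();
        if (persister.equals(SESAME)) {
            return "Sesame 2 native store in "
                + props.getProperty("mygrid.kave.sesame.native.dir", "(unset)").trim();
        }
        if (persister.equals(NG4J)) {
            return "NG4J on " + props.getProperty("mygrid.kave.jdbc.url", "(unset)").trim();
        }
        return "no provenance store configured";
    }

    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        InputStream in = new FileInputStream(path);
        try {
            props.load(in); // lines starting with '#' are skipped as comments
        } finally {
            in.close();
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ProvenanceConfigCheck TAVERNA_HOME/conf/mygrid.properties
        System.out.println(describe(load(args[0])));
    }
}
```

With the configuration shown above, this prints "Sesame 2 native store in /tmp/sesame2".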

Troubleshooting and known bugs

  1. NG4J version 0.4 has a minor bug that causes an error when the repository tables are created with MySQL 4.0.21 or later. The fix is already in the NG4J CVS and will be included in the next release; in the meantime I have fixed it (just removed a comma!) and included a modified ng4j jar in this distribution.
  2. The ProvenanceBrowser only works with NG4J at the moment.
  3. If the ProvenanceBrowser throws exceptions or Taverna hangs, try commenting out the property taverna.datastore.class in mygrid.properties.
  4. The fully qualified name of the ProvenanceGenerator listener is hard-coded in the file /src/META-INF/services/org.embl.ebi.escience.scufl.enactor.WorkflowEventListener; if you rename or move the class, you must update that file accordingly.
  5. Failed processes are not always recorded as such.
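Regarding item 4: a META-INF/services file follows the standard Java provider-configuration format, with one fully qualified class name per line and '#' starting a comment. The sketch below parses such content and checks each name with Class.forName, which is where a renamed ProvenanceGenerator would show up as a missing class. ServicesFileCheck and the sample class names in its usage are hypothetical, and the sketch does not reproduce the discovery code Taverna itself uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for a provider-configuration file such as
// META-INF/services/org.embl.ebi.escience.scufl.enactor.WorkflowEventListener.
class ServicesFileCheck {
    // One class name per line; '#' starts a comment; blank lines are ignored.
    static List<String> providerNames(String content) {
        List<String> names = new ArrayList<String>();
        for (String line : content.split("\r?\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.length() > 0) {
                names.add(line);
            }
        }
        return names;
    }

    // Names that no longer resolve to a class, e.g. after a refactoring rename.
    static List<String> missingClasses(List<String> names) {
        List<String> missing = new ArrayList<String>();
        ClassLoader loader = ServicesFileCheck.class.getClassLoader();
        for (String name : names) {
            try {
                // initialize=false: check that the class exists without running static initialisers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```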

Future Work

See TavernaDesiderata.

See also Provenance Requirements

Author

Acknowledgements

Based on the KAVE Taverna plugin by Chris Wroe and provenance work by Jun Zhao. See also Matthew Gamble's ProvenanceBrowser plugin and Ismael Juma's Ocula-based rdf-provenance browser.