NamedRDFGraphRepository.
The provenance RDF graph corresponding to the enactment of a workflow is stored as a named graph using the NG4J (Named Graphs for Jena) API or Sesame 2. This makes it possible to retrieve and refer to the entire graph for a workflow enactment without resorting to the expensive RDF reification mechanism.
One of the benefits of recording provenance is that one can go back and query the repository for information about previous experiments. For instance, one can ask:
Check if a given process has already been run against a given input and, if that is the case, give me the output it generated.
Note that, unless the data itself is RDF stored in the same RDF repository, the query will actually return a reference to the data, in the form of the LSID that Taverna generated for it.
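The kind of question above can be illustrated with a small, self-contained Java sketch. It does not use the plugin's classes, NG4J, or Sesame: `ProvenanceLookupSketch`, `record`, `outputFor`, and the example graph names and LSIDs are all hypothetical, and the named graphs are modelled as a plain in-memory map. It only shows the shape of the lookup, including the fact that the answer is an LSID reference rather than the data itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the provenance repository (not the plugin's API). */
final class ProvenanceLookupSketch {

    /** One recorded process run inside a workflow enactment. */
    static final class ProcessRun {
        final String process;
        final String inputLsid;
        final String outputLsid;

        ProcessRun(String process, String inputLsid, String outputLsid) {
            this.process = process;
            this.inputLsid = inputLsid;
            this.outputLsid = outputLsid;
        }
    }

    // Named graph name (one per enactment) -> process runs recorded in that graph.
    private final Map<String, List<ProcessRun>> graphs = new LinkedHashMap<>();

    /** Adds one process run to the graph of the given enactment. */
    void record(String graphName, String process, String inputLsid, String outputLsid) {
        graphs.computeIfAbsent(graphName, k -> new ArrayList<>())
              .add(new ProcessRun(process, inputLsid, outputLsid));
    }

    /** Returns the output LSID of the first stored run of the process on that input, if any. */
    Optional<String> outputFor(String process, String inputLsid) {
        for (List<ProcessRun> runs : graphs.values()) {
            for (ProcessRun run : runs) {
                if (run.process.equals(process) && run.inputLsid.equals(inputLsid)) {
                    return Optional.of(run.outputLsid);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ProvenanceLookupSketch repo = new ProvenanceLookupSketch();
        // Made-up identifiers, for illustration only.
        repo.record("urn:example:enactment:1", "alignSequences",
                "urn:lsid:example.org:data:in1", "urn:lsid:example.org:data:out1");
        System.out.println(repo.outputFor("alignSequences", "urn:lsid:example.org:data:in1"));
        System.out.println(repo.outputFor("alignSequences", "urn:lsid:example.org:data:in2"));
    }
}
```

An empty result means the process has not yet been run against that input, so the caller would have to run it; a present result is only a reference that still has to be resolved to the data.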
The plugin is linked to Taverna through
ProvenanceGenerator, which implements
WorkflowEventListener. At the beginning of a
workflow enactment the generator initialises a Jena
ProvenanceOntology object which is used during
the enactment to type the provenance data which is generated
by every WorkflowInstanceEvent sent by Taverna.
At the end of the workflow the whole graph corresponding to
the instance data in ProvenanceOntology is
passed to NamedRDFGraphsRepository, which stores
it either in a MySQL database using NG4J or on the file system
(native store) using Sesame 2.
The repository can subsequently be queried either via canned
queries such as DataForProcess and
FailedProcesses or generic NG4J queries.
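The lifecycle described above (initialise at the start of an enactment, accumulate statements per event, hand the whole graph to the repository at the end) can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's code: `GraphStore`, `InMemoryStore`, `Generator`, its method names, and the `ex:` terms are hypothetical, and triples are plain string arrays rather than Jena or Sesame statements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProvenanceFlowSketch {

    /** Hypothetical stand-in for the repository back end (NG4J or Sesame 2 in the text). */
    interface GraphStore {
        void store(String graphName, List<String[]> triples);
    }

    /** Keeps each stored graph in memory, keyed by its name. */
    static final class InMemoryStore implements GraphStore {
        final Map<String, List<String[]>> graphs = new HashMap<>();

        @Override
        public void store(String graphName, List<String[]> triples) {
            graphs.put(graphName, new ArrayList<>(triples));
        }
    }

    /** Collects statements during one enactment, then passes the whole graph on. */
    static final class Generator {
        private final GraphStore store;
        private String graphName;
        private List<String[]> triples;

        Generator(GraphStore store) {
            this.store = store;
        }

        /** Start of an enactment: begin a fresh graph named after the run. */
        void workflowStarted(String workflowRunId) {
            graphName = workflowRunId;
            triples = new ArrayList<>();
        }

        /** One event during the enactment: add typed statements to the graph. */
        void processCompleted(String process, String outputLsid) {
            triples.add(new String[] {process, "rdf:type", "ex:ProcessRun"});
            triples.add(new String[] {process, "ex:producedOutput", outputLsid});
        }

        /** End of the enactment: nothing is persisted before this point. */
        void workflowCompleted() {
            store.store(graphName, triples);
            triples = null;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Generator generator = new Generator(store);
        generator.workflowStarted("urn:example:enactment:1");
        generator.processCompleted("ex:alignSequences", "urn:lsid:example.org:data:out1");
        generator.workflowCompleted();
        System.out.println(store.graphs.get("urn:example:enactment:1").size() + " triples stored");
    }
}
```

Storing the graph only once, at the end of the enactment, is what lets the whole run be addressed later as a single named graph.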
Two named-graph query languages can be used at the moment:
TAVERNA_HOME (remember you need Taverna 1.3.2-RC1 or later).
TAVERNA_HOME/conf/mygrid.properties. mygrid.kave.sesame.native.dir
USER CONTEXT CONFIGURATION properties.
# TAVERNA HOME
taverna.home=/your/path/to/taverna-workbench-1.3.2-RC1

# RDF PROVENANCE CONFIGURATION
# Sesame 2
mygrid.kave.persister = uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NativeSesameRepository
mygrid.kave.model = uk.ac.man.cs.img.mygrid.provenance.knowledge.ontology.sesame.SesameProvenanceOntology
mygrid.kave.generate.provenance.schema=http://www.mygrid.org.uk/ontology/provenance.rdfs
mygrid.kave.sesame.native.dir = /tmp/sesame2

# Jena NG4J
#mygrid.kave.persister = uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NG4JRepository
#mygrid.kave.model = uk.ac.man.cs.img.mygrid.provenance.knowledge.ontology.jena.JenaProvenanceOntology
#mygrid.kave.generate.provenance.ontology=http://www.mygrid.org.uk/ontology/provenance.owl
#mygrid.kave.jdbc.driver=com.mysql.jdbc.Driver
#mygrid.kave.jdbc.url=jdbc:mysql://rpc103.cs.man.ac.uk:3306/metadata_sandbox
#mygrid.kave.jdbc.user=anonymous
#mygrid.kave.jdbc.password=anonymous

# USER CONTEXT CONFIGURATION
# used only if not already set by other means
mygrid.usercontext.experimenter=http://www.cs.man.ac.uk/~dturi
mygrid.usercontext.organisation=http://www.cs.manchester.ac.uk

# DATA STORE CONFIGURATION
#taverna.datastore.class = org.embl.ebi.escience.baclava.store.JDBCBaclavaDataService
#taverna.datastore.jdbc.driver = com.mysql.jdbc.Driver
#taverna.datastore.jdbc.url = jdbc:mysql://rpc103.cs.man.ac.uk/data_sandbox
#taverna.datastore.jdbc.user = anonymous
#taverna.datastore.jdbc.password = anonymous
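As a rough illustration of how such a file is read, the sketch below parses a fragment of it with the standard `java.util.Properties` class and reports which back end the `mygrid.kave.persister` entry selects. The class `KaveConfigSketch`, its methods, and the back-end labels are hypothetical, and matching on the class-name suffix is only an assumption made for the example, not how the plugin itself chooses its persister. It does show that lines starting with `#` are ignored, so the commented-out NG4J and JDBC settings have no effect until they are uncommented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class KaveConfigSketch {

    /** Parses properties text; lines starting with '#' are treated as comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Hypothetical helper: labels the back end from the persister class name. */
    static String backend(Properties props) {
        String persister = props.getProperty("mygrid.kave.persister", "");
        if (persister.endsWith("NativeSesameRepository")) {
            return "sesame-native";
        }
        if (persister.endsWith("NG4JRepository")) {
            return "ng4j-jdbc";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Sesame 2",
                "mygrid.kave.persister = uk.ac.man.cs.img.mygrid.provenance.knowledge.store.NativeSesameRepository",
                "mygrid.kave.sesame.native.dir = /tmp/sesame2",
                "# Jena NG4J",
                "#mygrid.kave.jdbc.user=anonymous");
        Properties props = parse(text);
        System.out.println(backend(props));
        System.out.println(props.getProperty("mygrid.kave.sesame.native.dir"));
    }
}
```

Note that `Properties` strips whitespace around the `=` separator but keeps trailing whitespace in values, so stray spaces at the end of a class name or path are worth checking for.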
4.0.21 and higher. The correct version is now in the NG4J CVS and will be included in the next release; for the time being I have fixed it (just removed a comma!) and produced a modified NG4J jar, which is included in this distribution.
taverna.datastore.class in mygrid.properties.
The name of the ProvenanceGenerator listener is hard-coded in the file /src/META-INF/services/org.embl.ebi.escience.scufl.enactor.WorkflowEventListener. If that class is renamed or moved, the file must be updated to match.
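Files under META-INF/services follow the standard Java service-provider convention: the file is named after the interface and lists fully qualified implementation class names, one per line, with `#` starting a comment. The sketch below shows why a rename must be mirrored in that file: a stale name simply no longer resolves to a class. `ServiceFileSketch` and the `com.example` class names are hypothetical, and this is not how Taverna itself loads its listeners, only an illustration of the file format.

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceFileSketch {

    /** Extracts provider class names from the text of a META-INF/services file. */
    static List<String> providerNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\r?\n")) {
            int hash = line.indexOf('#');
            // Drop everything after '#', then surrounding whitespace.
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** True if the named class can be found on the class path (without initialising it). */
    static boolean resolvable(String className) {
        try {
            Class.forName(className, false, ServiceFileSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical file content; the real package of ProvenanceGenerator is not shown here.
        String content = "# provenance listener\ncom.example.provenance.ProvenanceGenerator\n";
        for (String name : providerNames(content)) {
            System.out.println(name + " resolvable: " + resolvable(name));
        }
    }
}
```

A check like `resolvable` could be run as part of a build to catch a services file that was left behind by a refactoring.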