MarkGreenwoodSpeculations (r3 - 28 Nov 2002 - 10:54:00 - MarkGreenwood)

Questions

1 Is there going to be provenance information about all data in the myGrid repository (personal repository)? For example: this piece of data was downloaded from URI xxx by user yyy on date zzz.

2 Are we going to have non-repudiation? The WP2 provenance position statement takes this as essential. In the thoughts below I treat provenance as any metadata that indicates how results have been produced.

3 What uses do we envisage for the provenance data?

Thoughts from a workflow viewpoint

This is a biased view of provenance within myGrid. It is based on looking at the provenance records generated by the workflow enactment engine for the pre-prototype (and 0.1). (For an example see the attachments below.)

Identifiers

I think the most significant issue is how we deal with identifiers. Provenance fundamentally depends on the ability to attach an identity to an object. (Indeed, this applies generally to annotating data with metadata.)

At the moment the provenance records generated by the workflow enactment engine have identity (ID) fields, but they are always given dummy values. This is because we don't currently have a way of deciding appropriate identifiers.

workflow input identification

Take an example data input, the swissprot ID "P04637". This could be just a swissprot ID: that is, it could have an identifier org.swissprot.protein#P04637 which has the value "P04637". On the other hand, it could be the result of some previous experiment: it could have an identifier mybestexperiment.run1.result7 which has the value "P04637" and an annotation to say that it is a swissprot ID.
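The two readings of the same input can be sketched as data. This is only an illustration: the IdentifiedValue class, the annotation property name "type" and its value "swissprot-id" are assumptions made for the example, not part of any existing myGrid record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifiedValue:
    """A data value together with the identifier it is known by (hypothetical)."""
    identifier: str
    value: str
    annotations: tuple = ()  # (property, value) pairs; vocabulary is assumed


# Reading 1: the input is simply a swissprot entry.
as_database_entry = IdentifiedValue(
    identifier="org.swissprot.protein#P04637",
    value="P04637",
)

# Reading 2: the same value, but produced by an earlier experiment.
as_experiment_result = IdentifiedValue(
    identifier="mybestexperiment.run1.result7",
    value="P04637",
    annotations=(("type", "swissprot-id"),),
)

# Same value, different identity: provenance attaches to the identifier.
same_value = as_database_entry.value == as_experiment_result.value
same_identity = as_database_entry.identifier == as_experiment_result.identifier
print(same_value, same_identity)  # True False
```

The point of the sketch is that comparing values cannot distinguish the two cases; only the identifier (and its annotations) says where the data came from.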

workflow result identification

The workflow result data will also need an identifier (rather than the current default -1). Is this just an identifier for the personal repository which holds the result data?

There must be an annotation on this identifier that identifies the workflow provenance record. If there is an intermediate result, then it should be possible to give it a corresponding identifier and provenance record (covering the workflow up to that intermediate result). Perhaps it should also have an additional annotation indicating the workflow of which it is an intermediate part.
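As a rough sketch of what these annotations might look like, assume a simple store that maps each result identifier to a set of properties. Every identifier and property name here ("repo:result/42", "provenanceRecord", "intermediateResultOf") is invented for illustration.

```python
# Annotation store: identifier -> {property: value}.
# All identifiers and property names are hypothetical.
annotations = {
    # Final workflow result, annotated with its provenance record.
    "repo:result/42": {
        "provenanceRecord": "prov:workflow-run/7",
    },
    # Intermediate result: its own provenance record, plus a pointer to
    # the workflow run of which it is an intermediate part.
    "repo:result/41": {
        "provenanceRecord": "prov:workflow-run/7#step-2",
        "intermediateResultOf": "prov:workflow-run/7",
    },
}


def provenance_record_for(identifier):
    """Return the provenance record annotated on a result, or None."""
    return annotations.get(identifier, {}).get("provenanceRecord")


def is_intermediate(identifier):
    """True if the result is annotated as part of a larger workflow run."""
    return "intermediateResultOf" in annotations.get(identifier, {})


print(provenance_record_for("repo:result/42"))  # prov:workflow-run/7
print(is_intermediate("repo:result/41"))        # True
```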

service identification

The WSFL workflow is composed of activities, each mapped onto a specific operation provided by a web service. The mapping is between a WSFL service provider, which can provide several activities, and a web service. It is done either statically (the WSFL service provider is mapped to a specific service WSDL) or dynamically (the WSFL service provider indicates the UDDI(-M?) request and the selection policy). In the static case the web service operation is identified by the WSDL file, the service port and the operation name (plus the operation's input message name if the port has overloaded operations).
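The static case can be sketched as a composite key. The OperationId type, the select helper, and the WSDL URL, port, operation and message names below are all hypothetical; the sketch only shows why the input message name is needed when a port has overloaded operations.

```python
from typing import NamedTuple, Optional


class OperationId(NamedTuple):
    """Key identifying one web service operation (static mapping)."""
    wsdl_url: str
    port: str
    operation: str
    input_message: Optional[str] = None  # needed only for overloaded operations


def select(operations, wsdl_url, port, operation, input_message=None):
    """Return the single operation matching the key; fail if ambiguous."""
    matches = [
        op for op in operations
        if (op.wsdl_url, op.port, op.operation) == (wsdl_url, port, operation)
        and (input_message is None or op.input_message == input_message)
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} operations match; expected exactly 1")
    return matches[0]


# Hypothetical port with an overloaded "fetch" operation.
WSDL = "http://example.org/biofetch.wsdl"
ops = [
    OperationId(WSDL, "BioFetchPort", "fetch", "fetchByIdRequest"),
    OperationId(WSDL, "BioFetchPort", "fetch", "fetchByKeywordRequest"),
]

chosen = select(ops, WSDL, "BioFetchPort", "fetch", "fetchByIdRequest")
print(chosen.input_message)  # fetchByIdRequest
```

Without the input message name, the same call raises LookupError because both overloads match.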

This may need some revision depending on how myGrid chooses to identify services. It is possible for a service to have multiple WSDL files describing it. In addition, it is possible to mirror services. I expect that we should look at how the I3C deals with mirrors of a database such as swissprot.

workflow description identification

The workflow description itself is data. It should have an identity so that it can be annotated with metadata: who wrote it, when, whether it is based on an earlier workflow description, and so on. It could be very useful to have a provenance record for a workflow description: who edited it, when, etc.

Service Invocation provenance

The 0.1 portal provided the ability for a user to directly invoke a service. This could generate a service provenance record to link the input and result. If a user does several such direct invocations then we want to be able to traverse the graph of service provenance records and create a corresponding workflow.
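A minimal sketch of such a traversal follows, assuming each service provenance record names the identifiers of its inputs and of its result, and that each result identifier is produced by at most one invocation. The InvocationRecord type, the service names and the identifiers are invented for the example; real records would need the identifier scheme discussed above.

```python
from typing import NamedTuple


class InvocationRecord(NamedTuple):
    """One direct service invocation (hypothetical record shape)."""
    service: str
    inputs: tuple   # identifiers of the input data
    output: str     # identifier of the result data


def derive_workflow(records, result_id):
    """Walk back from result_id and return the services in execution order."""
    produced_by = {r.output: r for r in records}
    ordered, seen = [], set()

    def visit(data_id):
        record = produced_by.get(data_id)
        if record is None or record.output in seen:
            return  # raw input, or a step that was already visited
        seen.add(record.output)
        for input_id in record.inputs:
            visit(input_id)  # the steps this one depends on come first
        ordered.append(record.service)

    visit(result_id)
    return ordered


records = [
    InvocationRecord("fetch", ("in:1",), "res:1"),
    InvocationRecord("blast", ("res:1",), "res:2"),
    InvocationRecord("unrelated", ("in:9",), "res:9"),
    InvocationRecord("align", ("res:1", "res:2"), "res:3"),
]

print(derive_workflow(records, "res:3"))  # ['fetch', 'blast', 'align']
```

The links between records exist only because the output identifier of one invocation reappears as an input identifier of another, which is why the traversal depends on having real identifiers rather than dummy values.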


Concrete Provenance Examples


still to do
  • look at TalismanRAD provenance
  • look at LSID from I3C
  • identification of Grid services

-- MarkGreenwood - 28 Nov 2002


Attachments

  • biofetch3_P13351_wflow_prov.xml (66.4 K, 24 Jul 2006 - 09:39, MarkGreenwood): provenance record for biofetch 3 from 0.0
  • biofetch3_April02.xml (2.4 K, 24 Jul 2006 - 09:39, MarkGreenwood): biofetch 3 workflow from pre-prototype (0.0)
  • ShowProvenanceXML.xml (12.4 K, 24 Jul 2006 - 09:39, MarkGreenwood): provenance for Talisman EMBOSS workflow