r6 - 06 Feb 2008 - 17:15:55 - StuartOwen

Provenance meeting with Norman 18/01/08

Minutes

Minutes are based on a mind map by Paolo

Idea: maximise product value by converging mature research ideas into as few products as possible, so as to satisfy as many use cases as possible.

Three legs of the provenance model:

  • Product:
    • Data Manager
      • capturing initial, intermediate, final data
      • transparent replication / caching planned
    • BioCatalogue:
      • Service registry, Feta 2
      • Annotation model, RDF assertions based on ontologies
      • RDF store
      • nothing "bio" about it... entirely domain-neutral
      • interesting digression on a model-driven generation of a "curated registry toolkit"
    • myExperiment
      • probably many architectural elements in common with BioCatalogue
    • no provenance in T2 - the point of this discussion.
    • EMOs - brief report on the current state of the EMO discussion.
  • Use cases:
    • result exploration, user-centric
      • reorganisation of intermediate / final results for the purpose of creating user views
    • profile / predict service behaviour, system-centric
      • re-run of a workflow selected from eg the myExp registry
      • derive automatic QoS annotations of services by execution analysis
      • potentially supports late binding of services to alternative implementations for the purpose of optimising execution
  • Research Threads:

Actions

Refine and validate use cases

Collect relevant research papers and provide links.

Refined use cases.

U1: Flexible result generation

The idea here is that a single workflow may process data that can usefully be presented in different ways. A straightforward interface that lets users describe a tabular format, to be populated from the intermediate and final results of a workflow, could therefore be useful.

  • Data Lineage: A user runs a workflow; data passes through it and several outputs are generated. The user wants to know which inputs each output originates from. For example, inputs X, Y, Z give outputs A, B, C; given output A, the user would want to know that it was generated from inputs Y and Z (X was not a factor). In effect, the user wants to open up a black-box service and understand its inner workings, with only minimal annotation of services needed to achieve this.

  • Publishing: A user runs a workflow, and a filtered set of the provenance is pushed to a wiki/blog and presented in a customisable way. (Provenance is presented according to a user-definable query, written in a Provenance Presentation Language, over the set of data. Given this query, the provenance can be displayed in different ways, e.g. tabular, GUI, web. From a representation of this query, the user wants to push either the full query or the parts relevant to them to the web (wiki, blog, myExp etc.) in a well-defined manner. The query will also be stored/published for re-use. More user feedback is needed.)

  • Sharing/Aggregation: A group of users working on an experiment have produced some interesting results. They want to encapsulate and share their experiment as a whole: the workflow, supporting data, and provenance, plus any other arbitrary data. It should be possible to make a permanent record of the experiment for archival/publishing purposes. (This relates directly to myExperiment EMOs.)

  • Comparison: Comparison of provenance between workflow runs. A user runs a workflow and then, a month later, runs the same workflow but gets different results. Comparing the provenance logs of the two runs should help identify the root of the change.
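The Data Lineage use case above can be illustrated with a minimal sketch. It assumes a hypothetical provenance log in which each record lists the data items a processor consumed and produced; the record layout, the processor names (`p1` to `p4`) and the intermediate item `m1` are invented for illustration and are not taken from Taverna or any existing provenance store.

```python
from collections import defaultdict

# Hypothetical provenance log: one record per processor invocation.
# All names here are illustrative assumptions, not a real log format.
RUN_LOG = [
    {"processor": "p1", "consumed": ["Y"], "produced": ["m1"]},
    {"processor": "p2", "consumed": ["Z", "m1"], "produced": ["A"]},
    {"processor": "p3", "consumed": ["X"], "produced": ["B"]},
    {"processor": "p4", "consumed": ["X", "m1"], "produced": ["C"]},
]


def build_parents(log):
    """Map each produced item to the items it was directly derived from."""
    parents = defaultdict(set)
    for record in log:
        for out in record["produced"]:
            parents[out].update(record["consumed"])
    return parents


def lineage(item, parents, workflow_inputs):
    """Return the workflow inputs that `item` transitively depends on."""
    seen, stack, found = set(), [item], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in workflow_inputs:
            found.add(current)
        # Walk backwards through everything `current` was derived from.
        stack.extend(parents.get(current, ()))
    return found


PARENTS = build_parents(RUN_LOG)
INPUTS = {"X", "Y", "Z"}
print(sorted(lineage("A", PARENTS, INPUTS)))  # ['Y', 'Z']
```

Note that this sketch treats each processor as a black box: every output of a processor is assumed to depend on all of its inputs. Answering finer-grained questions would need the kind of service annotation mentioned in the use case.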

U2: Workflow and service reliability analysis

The idea here is that it would be useful to annotate both services and workflows with information about their past behaviour that might help users. For a service, it might be useful to know when it was last used successfully, when it was last called but found to be unavailable, what percentage of the time it is unavailable, what typical response times are, etc. For a workflow, it might be useful to know when it was last run successfully, how reliable its constituent services have been recently, how long it typically takes to run, how likely it is to run to completion given the reliability of the services it contains, etc.
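As a rough sketch of how such annotations might be derived from past invocations, the example below computes availability and mean response time per service, and a naive estimate of the likelihood that a workflow completes. The `Invocation` record, the service names and the sample history are assumptions made for illustration. The completion estimate further assumes that each service is called once per run and that failures are independent, which real workflows may well violate.

```python
from dataclasses import dataclass
from math import prod
from statistics import mean


@dataclass
class Invocation:
    """Hypothetical record of one service call taken from a provenance log."""
    service: str
    succeeded: bool
    response_seconds: float  # only meaningful when the call succeeded


def service_stats(invocations, service):
    """Summarise past behaviour of one service; None if it was never called."""
    calls = [i for i in invocations if i.service == service]
    if not calls:
        return None
    ok = [i for i in calls if i.succeeded]
    return {
        "calls": len(calls),
        "availability": len(ok) / len(calls),
        "mean_response_seconds": mean(i.response_seconds for i in ok) if ok else None,
    }


def completion_likelihood(invocations, services):
    """Naive estimate: product of the availabilities of the services used.

    Assumes one call per service per run and independent failures.
    """
    return prod(service_stats(invocations, s)["availability"] for s in services)


# Illustrative history only; not measured data.
HISTORY = [
    Invocation("blast", True, 2.0),
    Invocation("blast", True, 4.0),
    Invocation("blast", False, 0.0),
    Invocation("blast", True, 3.0),
    Invocation("fetch", True, 1.0),
    Invocation("fetch", False, 0.0),
]

print(service_stats(HISTORY, "blast"))
print(completion_likelihood(HISTORY, ["blast", "fetch"]))
```

Timestamps (for "last used successfully" and "last found unavailable") are left out to keep the sketch short; they would be one extra field on the record.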

  • Design guidelines/late time binding: Provenance is used indirectly, as a source of performance metrics. The data may be useful to BioNanny/BioCatalogue. For example, a user chooses a service from provider A, but past provenance shows that the same service from provider B is more reliable. An extension of this is that the user simply defines both services and leaves it to the workflow engine to select the best one, based on the latest metrics known about those services.

  • Debugging: The user runs a workflow and it behaves unexpectedly. The user inspects the provenance to help identify the cause of the problem. Given a set of provenance data, the user wants to query over the results for the purpose of debugging the workflow.

  • BioNanny refers to provenance from previous workflow runs and aggregates information about services, which may help predict the reliability of those services.
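The late-binding idea in the list above could look roughly like the following sketch, in which the engine picks among user-declared alternatives using aggregated metrics. The function name `choose_provider`, the metric fields, the `min_calls` threshold and the sample numbers are all hypothetical; this is one possible selection policy, not a description of how Taverna or BioNanny behaves.

```python
def choose_provider(candidates, metrics, min_calls=5):
    """Pick the candidate with the best track record.

    candidates: provider names in the user's order of preference.
    metrics:    hypothetical mapping of provider name to a dict with
                'calls', 'availability' and 'mean_response_seconds'.
    min_calls:  providers with less history than this are not trusted.
    """
    known = [c for c in candidates
             if metrics.get(c, {}).get("calls", 0) >= min_calls]
    if not known:
        # No usable history: fall back to the user's first choice.
        return candidates[0]
    # Prefer higher availability; break ties on lower mean response time.
    return max(
        known,
        key=lambda c: (metrics[c]["availability"],
                       -metrics[c]["mean_response_seconds"]),
    )


# Illustrative metrics only; not measured data.
METRICS = {
    "provider_a": {"calls": 40, "availability": 0.80, "mean_response_seconds": 1.2},
    "provider_b": {"calls": 35, "availability": 0.95, "mean_response_seconds": 2.0},
    "provider_c": {"calls": 2, "availability": 1.00, "mean_response_seconds": 0.5},
}

print(choose_provider(["provider_a", "provider_b", "provider_c"], METRICS))
# provider_b
```

Here `provider_c` is ignored despite its perfect record because two calls are too little evidence. Whether availability should always outrank response time is a policy decision that would need user feedback.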

Refined research threads.

Papers

(Currently in no particular order)

Adriane Chapman and HV Jagadish, Issues in Building Practical Provenance Systems, University of Michigan Ann Arbor, 2007

Wang-Chiew Tan, Provenance in Databases: Past, Current, and Future, UC Santa Cruz, 2007

Jun Zhao, Carole Goble, Robert Stevens, Daniele Turi, Mining Taverna's Semantic Web of Provenance, Concurrency and Computation: Practice and Experience, 2007

K. Wolstencroft, P. Alper, D. Hull, C. Wroe, P.W. Lord, R.D. Stevens, C.A. Goble, The myGrid Ontology: Bioinformatics Service Discovery, International Journal of Bioinformatics Research and Applications, Special Issue on Ontologies for Bioinformatics, Vol. 3, No. 3, pages 303-325, 2007

Hull, D. and Zolin, E. and Bovykin, A. and Horrocks, I. and Sattler, U. and Stevens, R., Deciding Semantic Matching of Stateless Services, Proc. of the 21st National Conference on Artificial Intelligence (AAAI’2006), 2006

L Moreau, J Freire, J Myers, J Futrelle, P Paulson, The Open Provenance Model, 2007

Susan Davidson, Sarah Cohen-Boulakia, Anat Eyal, Bertram Ludascher, Timothy McPhillips, Shawn Bowers, and Juliana Freire, Provenance in Scientific Workflow Systems, IEEE Data Engineering Bulletin, 32(4), pages 44-50, 2007.

Cheung, K. and Hunter, J., Provenance Explorer - Customized Provenance Views Using Semantic Inferencing, 5th International Semantic Web Conference, ISWC2006, pages 215-227, 2006

R. Stevens, J. Zhao, and C. Goble, Using provenance to manage knowledge of In Silico experiments, Briefings in Bioinformatics, May 14, 2007

Carole Goble, Katy Wolstencroft, Antoon Goderis, Duncan Hull, Jun Zhao, Pinar Alper, Phillip Lord, Chris Wroe, Khalid Belhajjame, Daniele Turi, Robert Stevens and David De Roure, Knowledge discovery for in silico experiments with Taverna: Producing and consuming semantics on the Web of Science, Semantic Web: Revolutionising Knowledge Discovery in Life Sciences, Chris Baker and Kei-Hoi Cheung (eds), ISBN: 978-0-387-48436-5, 2007

Chen L., Shadbolt N.R., Goble C.A., A Semantic Web Based Approach to Knowledge Management for Grid Applications, IEEE Transactions on Knowledge and Data Engineering, Special Issue: Semantic Web Era, 19(2), pages 283-296, February 2007

Khalid Belhajjame, Suzanne M. Embury, Norman W. Paton, Robert Stevens, and Carole A. Goble, Automatic Annotation of Web Services Based on Workflow Definitions, Proc 5th International Semantic Web Conference, pages 116-129, 2006.

Jun Zhao, Chris Wroe, Carole Goble, Robert Stevens, Dennis Quan, Mark Greenwood, Using Semantic Web Technologies for Representing e-Science Provenance, Proc 3rd International Semantic Web Conference ISWC2004, Hiroshima, Japan, 9-11 Nov 2004

Jun Zhao, Carole Goble, and Robert Stevens, An Identity Crisis in the Life Sciences, Proceedings of the 3rd International Provenance and Annotation Workshop, pages 254-269, Springer LNCS, 2006

eBank UK study of provenance - a list of resources recommended by Carole (myGrid ones are out of date).

Other resources

SCOPE - OAI-ORE; Provenance Explorer (Jane Hunter)

DCC Keynote by Carole - curation of workflows and services, which sums up the BioCatalogue project, annotation etc.

PASOA

Grid Provenance

Karma Provenance Framework

Vistrails.doc: Khalid's Vistrails Document
