Minutes of Access Grid meeting, 14 Mar 2003
Attendees
EBI: Peter Rice, Alan Robinson
Manchester: Mark Greenwood, Phil Lord, Nick Sharman
Newcastle: Peter Li, Savas Parastatidis, Anil Wipat
Nottingham: Kevin Glover, Chris Greenhalgh
Southampton: Ananth Krishna, Simon Miles
Agenda
The purpose of the meeting was to walk through AlansLabBookStoryBoard (see also Chris G.'s annotated version, AlansLabBookStoryBoardWP6) and match it against the information model (especially the part held in the MyGridInformationRepository) and the services and interactions identified so far.
Alan's exploration of the Graves' disease annotation task is also relevant: see AlansAnnotationPipelineWalkThrough.
Discussion
(The numbered paragraphs in boldface and the italicised text are taken from AlansLabBookStoryBoard.)
1. I log onto the myGrid system to give me access to my mIR & other services.
In the short term, user authentication will be by simple username/password, passed through the Gateway (GW) to the MIR. We don't support authentication to external applications (so, e.g., Affymetrix data will have to be brought in-house).
2. I check the status of any workflows that I started previously to see if they are now finished & I can check the result. I may also have notifications detailing people or services that have annotated the contents of my mIR, e.g. my supervisor has signed off on the work I did last week with her digital signature. N.B. These annotations need not be stored in my personal mIR, however I must have given something the permission to look at my mIR. Something may have created their own annotations on the contents of my mIR, but decided to keep these private.
The enactment engine (EE) does not currently generate any status-change notifications itself, so we have three options:
- (1) Hand-craft workflows to include explicit notification steps
- (2) Transform workflows to include notification steps on submission
- (3) Make the lab book poll known workflow instances for their status
For options (1) and (2), we need some means of creating new or choosing existing notification topics. Milena is currently experimenting with option (1) and will address this issue.
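Option (2) amounts to a rewrite applied at submission time. The Python sketch below assumes, purely for illustration, that a workflow is an ordered list of steps (the real definitions are WSFL documents) and that every notify step publishes to one per-workflow topic; `Step`, `add_notifications` and the topic string are hypothetical names, not part of any myGrid component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow action (hypothetical stand-in for a WSFL activity)."""
    name: str
    kind: str = "service"   # "service" or "notify"
    topic: str = ""         # notification topic, used only by "notify" steps


def add_notifications(steps, topic):
    """Return a new workflow with a notify step after every service step.

    The input list is not modified, so the hand-written definition can be
    stored unchanged and only the submitted copy carries the extra steps.
    """
    rewritten = []
    for step in steps:
        rewritten.append(step)
        if step.kind == "service":
            rewritten.append(Step(name=f"notify:{step.name}", kind="notify", topic=topic))
    return rewritten


workflow = [Step("blast"), Step("emma")]
for step in add_notifications(workflow, topic="myGrid/workflows/wf-001"):
    print(step.kind, step.name, step.topic)
```

Doing the rewrite on submission keeps the notification plumbing out of workflow authors' hands, at the cost of the engine seeing a workflow that differs from the one the user stored.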
We see the GW as being persistently active, so that it can catch notifications as they arrive and present them to users when they log in. The GW will need to persist notifications in case of failure or shutdown. Currently, we expect to save them in a GW-specific store: should they be saved in the MIR instead? I.e., do they have intrinsic value rather than being essentially transient? The meeting concluded that notifications should NOT be stored in the MIR (until there is a definite requirement for that approach).
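A minimal sketch of a GW-specific store, assuming notifications are small JSON records kept in an append-only file on the Gateway host; `NotificationStore` and the file name are hypothetical. Keeping them out of the MIR, as concluded above, means they can simply be discarded once delivered.

```python
import json
import os
import tempfile


class NotificationStore:
    """Gateway-local notification store (deliberately not the MIR).

    Each notification is appended as one JSON line and synced to disk, so
    notifications caught while a user is logged out survive a GW restart.
    """

    def __init__(self, path):
        self.path = path

    def save(self, user, message):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"user": user, "message": message}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def deliver_on_login(self, user):
        """Return, and remove from the store, all pending notifications for `user`."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        # Write the remaining notifications to a temporary file, then swap it
        # in atomically so a crash mid-write cannot lose them.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for r in records:
                if r["user"] != user:
                    f.write(json.dumps(r) + "\n")
        os.replace(tmp, self.path)
        return [r["message"] for r in records if r["user"] == user]


path = os.path.join(tempfile.mkdtemp(), "gw-notifications.jsonl")
store = NotificationStore(path)
store.save("alan", "workflow wf-001 finished")
store.save("phil", "workflow wf-002 failed")
# A fresh instance reading the same file, as after a GW restart.
print(NotificationStore(path).deliver_on_login("alan"))
```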
We do not plan to generate notifications of updates to the MIR, but we can add extra queries on the MIR to identify changes (e.g. based on creation/modification dates). We could define specific MIR methods to capture such queries, but Chris G. proposes exposing the raw SQL query interface already present 'under the hood' in the MIR to avoid this requirement in the short term [this reduces the amount of development needed for the MIR, but makes clients more dependent on the MIR's schema].
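Chris G.'s proposal can be illustrated with Python's built-in `sqlite3` module standing in for the MIR's SQL interface. The `thing` table, its columns and `changed_since` are hypothetical; note that the query text hard-codes column names, which is exactly the schema dependency the bracketed remark warns about.

```python
import sqlite3

# Hypothetical stand-in for the MIR's relational store; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing (id TEXT PRIMARY KEY, owner TEXT, created TEXT, modified TEXT)"
)
conn.executemany(
    "INSERT INTO thing VALUES (?, ?, ?, ?)",
    [
        ("myData1", "alan", "2003-03-10T09:00:00", "2003-03-10T09:00:00"),
        ("myProject1", "alan", "2003-03-01T12:00:00", "2003-03-13T16:30:00"),
        ("myExpt1", "alan", "2003-03-05T10:00:00", "2003-03-05T10:00:00"),
    ],
)


def changed_since(conn, owner, last_login):
    """Ids of the owner's Things created or modified after `last_login`.

    Creation also sets `modified`, so one comparison covers both cases.
    Fixed-format ISO 8601 strings compare correctly as text.
    """
    rows = conn.execute(
        "SELECT id FROM thing WHERE owner = ? AND modified > ? ORDER BY modified",
        (owner, last_login),
    )
    return [row[0] for row in rows]


print(changed_since(conn, "alan", "2003-03-07T00:00:00"))
```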
3. I browse through my projects, experiments, workflows.
[I imagine this browser to be analogous to those of an MP3 player where one can slice the data in different ways & choose to organise in different ways by artist, genre, album title, etc.]
The MIR schema will model projects, experiments and leaf Things as a heterarchy and the lab book and GW will support navigation through it. For IF-4, we do not expect to model any notion of project/workgroup membership or role, so support for the notion of "my" things will be limited. For example, data Things may be private to users, but workflows will be visible to all.
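What distinguishes a heterarchy from a tree is that one Thing may sit under several work contexts at once. A minimal in-memory sketch, with hypothetical class and method names (the real structure would live in the MIR schema), shows navigation in both directions, which is also what the questions in step 6 ask for.

```python
from collections import defaultdict


class Heterarchy:
    """Work contexts and Things, where any Thing may have several parents.

    Unlike a folder tree, the same data item can be reached from each
    project or experiment it belongs to, so the same collection can be
    "sliced" in different ways, as in an MP3 browser.
    """

    def __init__(self):
        self.children = defaultdict(set)   # context -> Things it contains
        self.parents = defaultdict(set)    # Thing -> contexts containing it

    def link(self, context, thing):
        self.children[context].add(thing)
        self.parents[thing].add(context)

    def contents(self, context):
        return sorted(self.children.get(context, ()))

    def contexts_of(self, thing):
        return sorted(self.parents.get(thing, ()))


h = Heterarchy()
h.link("myProject1", "myExpt1")
h.link("myProject1", "myData1")
h.link("myProject2", "myData1")    # the same data reused in a second project
print(h.contexts_of("myData1"))    # contexts that contain myData1
print(h.contents("myProject1"))   # Things associated with myProject1
```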
4. I create a new project (myProject1) with a title & a free-text description (provenance annotation). I assume that myGrid will record provenance metadata including date, time, user name, user group, default security permissions, the host from which I created the project. The provenance is associated with myProject1 & stored in the mIR.
The lab book will support creation of work contexts (projects and experiments) and the free-text annotation of all Things, including work contexts. As noted above, we do not plan to support user groups for IF-4, but all the other metadata is OK, although hostname is a little awkward (DHCP/NAT issues). It may however be useful when answering "now, where was I when I did this?" questions (though less useful when the answer is "wherever your WiFi-enabled laptop was").
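A sketch of the automatically trapped provenance agreed above, assuming Python dataclasses as the representation; the field names and the default permission value `"private"` are hypothetical. It omits user groups (not supported for IF-4) and treats the hostname as best-effort.

```python
import socket
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Provenance:
    """Basic provenance trapped automatically when a Thing is created.

    No user-group field, since groups are not modelled for IF-4. The
    hostname is best-effort only, because DHCP/NAT can make it misleading.
    """
    user: str
    permissions: str = "private"   # hypothetical default security setting
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    host: str = field(default_factory=socket.gethostname)


@dataclass
class WorkContext:
    """A project or experiment: a title, free-text annotation and provenance."""
    title: str
    description: str
    provenance: Provenance


project = WorkContext("myProject1", "Free-text description of the project",
                      Provenance(user="alan"))
print(project.provenance.created.isoformat(), project.provenance.host)
```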
5. I upload some data (myData1) that I've obtained to my mIR & that will be part of my new project (myProject1) (automatically trap basic provenance metadata of this operation & the data).
The lab book will support data upload via a paste region or a file chooser in the first instance. The lab book (or MIR?) will capture basic provenance (as above) automatically, and prompt for free-text annotation via a pop-up editor dialogue. We also need to associate or suggest syntactic (MIME, XSD, SOAP) and concept types with the data: possibly by REGEXP matching, possibly based on previous inputs (would require an audit trail).
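The REGEXP option might look like the following, assuming a small hand-written rule table; the rules, the type labels and `suggest_types` are hypothetical, and real concept types would come from the myGrid ontology. Several rules can match the same input (a DNA sequence is also a valid string of protein one-letter codes), so the function returns every match in rule order, most specific first, for the user to confirm.

```python
import re

# Hypothetical rules: (pattern, suggested syntactic type, suggested concept).
# The FASTA rules only recognise single-record input.
RULES = [
    (re.compile(r"\A>[^\n]*\n[ACGTNacgtn\n]+\Z"), "text/plain (FASTA)", "nucleotide sequence"),
    (re.compile(r"\A>[^\n]*\n[A-Za-z*\n]+\Z"), "text/plain (FASTA)", "protein sequence"),
    (re.compile(r"\A\s*<\?xml"), "text/xml", None),
]


def suggest_types(data):
    """Return (syntactic type, concept) suggestions for uploaded data.

    These are only suggestions: the user confirms or overrides them in
    the annotation dialogue.
    """
    return [(syntax, concept) for pattern, syntax, concept in RULES if pattern.search(data)]


print(suggest_types(">p1\nMKVLAAG\n"))
```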
Chris pointed out that, for relational databases, it is usual for security to be a protective shell around the application: once inside the shell, changes are not tied to their authors. We therefore need an explicit author/creator field in the repository for all annotations (as in DAS).
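To illustrate, a sketch using SQLite through Python's `sqlite3` module, assuming (hypothetically) that the MIR connects to its database under a single account, so the database itself cannot tell annotation authors apart; the table and column names are invented. Declaring `creator` as `NOT NULL` makes the database reject any annotation stored without an author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: authorship is recorded explicitly in every row,
# because the database connection does not identify the real author.
conn.execute(
    """CREATE TABLE annotation (
           id INTEGER PRIMARY KEY,
           subject TEXT NOT NULL,   -- the annotated Thing
           creator TEXT NOT NULL,   -- who wrote this annotation
           body TEXT NOT NULL
       )"""
)
conn.execute(
    "INSERT INTO annotation (subject, creator, body) VALUES (?, ?, ?)",
    ("myData1", "alan", "Looks like good-quality sequence"),
)
try:
    # No creator supplied: rejected by the NOT NULL constraint.
    conn.execute(
        "INSERT INTO annotation (subject, body) VALUES (?, ?)",
        ("myData1", "anonymous comment"),
    )
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```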
6. I attach some metadata to this uploaded data, e.g. a name, a description, a type (e.g. unordered collection of protein sequences), where did it come from, how was it produced. Perhaps also some other annotation, e.g. what do I think of its quality.
[At another time, I may want to know "what projects did I use myData1 in?" & "What things are associated with myProject1?".]
To what extent are annotations 'authoritative'? Simon pointed out that it's up to each user to interpret the value of any data. Chris G. asked whether we need an 'authoritative' field. [Perhaps we could assume that the annotations made by the original's author are authoritative?]
7. I have an idea of what I want to do with the data as part of an in silico experiment, so I create a new experiment (myExpt1) that is associated with my project (myProject1) with a title, free text description, plus the automatically trapped basic provenance metadata.
As step (4) above.
8. Somehow I find a workflow template (WFTemplate1) that satisfies my requirements & describes the services that are to be used in this in silico experiment.
[I assume that this workflow template doesn't record service instances. I assume that each workflow template comprises a number of services that run without breakpoints, but that have parameters that need to be pre-configured. Within a workflow template, parameters for a service may have recommended values.]
A workflow definition comprises a job definition file (JDF) as well as a WSFL document, and the JDF identifies a semantic concept. To search on these using some arbitrary class expression, we could put the ontology's subsumption hierarchy in the MIR; this could be made browsable and queryable. Chris G will discuss this with Chris Wroe but the meeting was very much in favour of this approach.
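A sketch of the subsumption-hierarchy approach, assuming the hierarchy is stored as child/parent edges in a relational table; the table names, concept names and `workflows_under` are hypothetical. A recursive SQL query collects a concept and everything it subsumes. This handles a single named concept only; evaluating an arbitrary class expression would probably still need a description-logic reasoner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: ontology subsumption edges, and the concept each
# workflow's JDF identifies. Concept names are illustrative only.
conn.executescript(
    """
    CREATE TABLE subclass_of (child TEXT, parent TEXT);
    CREATE TABLE workflow (id TEXT, concept TEXT);
    INSERT INTO subclass_of VALUES
        ('sequence_alignment', 'bioinformatics_task'),
        ('pairwise_alignment', 'sequence_alignment'),
        ('multiple_alignment', 'sequence_alignment'),
        ('gene_annotation', 'bioinformatics_task');
    INSERT INTO workflow VALUES
        ('WFTemplate1', 'pairwise_alignment'),
        ('WFTemplate2', 'multiple_alignment'),
        ('WFTemplate3', 'gene_annotation');
    """
)


def workflows_under(conn, concept):
    """Workflows whose JDF concept is `concept` or any concept it subsumes."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(c) AS (
            SELECT ?
            UNION
            SELECT s.child FROM subclass_of s JOIN sub ON s.parent = sub.c
        )
        SELECT w.id FROM workflow w JOIN sub ON w.concept = sub.c ORDER BY w.id
        """,
        (concept,),
    )
    return [r[0] for r in rows]


print(workflows_under(conn, "sequence_alignment"))
```

Using `UNION` rather than `UNION ALL` discards duplicate concepts, so the recursion also terminates if the stored hierarchy accidentally contains a cycle.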
For IF-4 it will be possible to edit the JDF, so that configuration parameters can be changed.
9. I want to associate this workflow template (WFTemplate1) with my experiment (myExpt1), plus some annotation about why I've chosen this workflow template. I trust myGrid to have recorded in the mIR where WFTemplate1 came from, plus provenance metadata about time, date, permissions, etc.
Membership of a work context will be recorded by an association (an instance of a MetaData Thing) that can contain a string annotation value.
[At another time I may want to find both "what experiments did I use WFTemplate1 in?" & "In myExpt1, what workflow templates did I use?".]
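A minimal sketch of association Things, assuming each records a context, a member, the member's kind and an optional annotation string; the class, field names and sample annotation are hypothetical. Because the associations are stored as Things in their own right, both of the questions above become simple filters over them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Association:
    """A MetaData Thing recording that `member` belongs to work context `context`."""
    context: str
    member: str
    member_kind: str          # e.g. "workflow_template" or "data"
    annotation: str = ""      # free-text reason for the association


ASSOCIATIONS: List[Association] = [
    Association("myExpt1", "WFTemplate1", "workflow_template",
                "Accepts an unordered collection of protein sequences"),
    Association("myExpt2", "WFTemplate1", "workflow_template"),
    Association("myExpt1", "myData1", "data"),
]


def contexts_using(member: str) -> List[str]:
    """Which work contexts is `member` associated with?"""
    return sorted({a.context for a in ASSOCIATIONS if a.member == member})


def members_of(context: str, kind: Optional[str] = None) -> List[str]:
    """Which members (optionally of one kind) does `context` contain?"""
    return sorted({a.member for a in ASSOCIATIONS
                   if a.context == context and (kind is None or a.member_kind == kind)})


print(contexts_using("WFTemplate1"))                     # experiments that used WFTemplate1
print(members_of("myExpt1", kind="workflow_template"))   # templates used in myExpt1
```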
10. I need a registry service that can return specific instances of the services that the workflow template requires. The registry will return metadata about all service instances fitting the criteria. As well as criteria that the workflow template may require, e.g. a particular version of an application, I may have my own personal preferences, e.g. it has to be free.
[Is my interpretation of finding & configuring a workflow correct and/or reasonable? Before running a workflow, I personally want to know where it is running. Perhaps others have better faith? Perhaps only service instances that are trusted by my supervisor are present in the service repository & that is the basis of my trust? Will the workflow enactment engine be able to use all the service instances in the registry, i.e. will they have a suitable API?]
We expect the service registry and the enactment engine to be configured into the myGrid system. [It's not yet clear how the user will choose an appropriate view of the registry.]
11. I may need to choose between alternative service instances (e.g. emma@EBI vs. emma@HGMP) using the metadata returned by the service registry & to be used in the workflow.
[Do I get to choose between different service instances? How much choice do I have? For example, I presume that the workflow enactment engine may only communicate with services using specific APIs. At what point is the check made that a service instance supports an API that the workflow enactment engine can use?]
The EE resolves dynamic service instances at the time they are needed, not 'up front'. This will be a problem when individual workflow steps are expected to be long-running.
The user selects one of a list presented by the user proxy. One approach is for the EE to pass the proxy a reference to the Registry, so that the user can access service metadata in the registry via the proxy.
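The consequence of late binding can be sketched as follows, assuming a registry represented as a dictionary and a `choose` callback standing in for the user proxy; all names are hypothetical. Because each lookup happens only when its step is reached, a long-running earlier step delays the point at which the user is asked to choose.

```python
from typing import Callable, Dict, List

# Hypothetical registry content: abstract service name -> candidate instances,
# each with the metadata a user might base a choice on.
REGISTRY: Dict[str, List[dict]] = {
    "emma": [
        {"endpoint": "emma@EBI", "access": "free to anyone"},
        {"endpoint": "emma@HGMP", "access": "registered users only"},
    ],
}


def run_workflow(steps: List[str], choose: Callable[[str, List[dict]], dict]) -> List[str]:
    """Resolve each step's service instance only when that step is reached.

    `choose` plays the role of the user proxy: it is shown the candidate
    instances, with their registry metadata, and returns one of them.
    """
    log = []
    for step in steps:
        candidates = REGISTRY[step]          # looked up now, not at submission
        chosen = choose(step, candidates)
        log.append(f"{step} -> {chosen['endpoint']}")
    return log


def prefer_free(step, candidates):
    # Stand-in for the user's decision, e.g. "it has to be free".
    return next(c for c in candidates if c["access"] == "free to anyone")


print(run_workflow(["emma"], prefer_free))
```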
12. I want to record why I made a choice in favour of particular service instances for this workflow, e.g. emma@EBI is free to anyone, but emma@HGMP is free to registered users only.
[To what do I attach the annotation about my choices of service instance? Is a part of the workflow annotatable? Or only the whole thing? I.e. is a service instance recognised in the mIR as a separate entity & to which metadata may be attached? I have a vision of a large XML file with multiple name spaces: how conflated is data & metadata in an XML representation of the workflow?]
We would need to annotate the service instance's endpoint URL, linked to the action in the workflow instance.
13. I configure each activity of the workflow with my choice of parameters or default to those recommended in the workflow for the service instance, as well as my starting data. I want to annotate why I made those choices on the parameters.
[Some type checking by myGrid should help prevent me doing incorrect things. Do I need to have a service instance before I can do the parameter configuration?]
As step (8) above. We haven't yet investigated how EMBOSS ACD or similar can be used in this.
14. I want to store the details of my configured workflow (WFConfigured1) for this experiment (myExpt1) in the mIR.
15. I decide it's time to run my workflow.
[Do I have a choice of which workflow enactment service I use? The configured workflow is converted to a workflow enactment language suitable for my workflow enactment engine. How tightly coupled is the service registry, my choice of service instances & the workflow enactment engine? Could I end up choosing service instances with APIs that the enactment engine doesn't understand? Are only service instances with APIs that the enactment engine can understand present in the service registry?]
Currently the EE and the Registry are configured into myGrid.
16. As each activity in the workflow is completed, I'd like to be notified & to have the intermediate results stored in the mIR & associated with WFConfigured1 so that I can find them easily. I expect that provenance metadata about the service instances which are run is stored along with the results: location, input parameters I specified, default parameters the service instance used, including resources that the service instance used, e.g. which version of SWISS-PROT did the BLAST server use. For each result stored in the mIR, as well as the usual provenance metadata (who, date, time), I expect that its metadata includes a syntactic & semantic type for the data (taken from where?).
The EE doesn't write intermediate results to the MIR: they are only present in the provenance record, which is not visible until the end of the workflow.
[Who writes the results & provenance metadata to my mIR? - The enactment engine? The service instance? The Gateway? Who do I trust with my credentials? I might trust my lab's workflow enactment engine to write to my mIR, but I probably wouldn't trust a service instance or a public enactment engine. Is it inefficient to have everything shuttled through the Gateway?
The EE accumulates provenance data and currently writes it directly to the MIR. For IF-4, it will make it available on completion so that the GW retrieves it and stores it in the MIR.
I would expect that the results generated from a service instance & stored to the mIR should be immutable to protect against fraud - maybe use PKI to monitor if the stored results are the same as those sent originally?]
Not for IF-4.
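Although this is out of scope for IF-4, the basic integrity check is simple. This sketch deliberately swaps in a plain SHA-256 digest, recorded when the result arrives, for the PKI signatures suggested in the storyboard: it detects later modification of the stored copy only if the recorded digest is kept where the modifier cannot reach it, and unlike a signature it does not prove who produced the result. The variable names are hypothetical.

```python
import hashlib


def digest(result: bytes) -> str:
    """SHA-256 fingerprint of a result, as a hex string."""
    return hashlib.sha256(result).hexdigest()


received = b">seq1\nACGTACGT\n"       # result as sent by the service instance
recorded = digest(received)            # stored separately, at receipt time

stored = received                      # copy later read back from the mIR
print(digest(stored) == recorded)      # unchanged copy matches

tampered = stored.replace(b"ACGT", b"ACGA", 1)
print(digest(tampered) == recorded)    # any edit changes the digest
```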
17. For each activity, I want to look at the results & record thoughts about them (i.e. annotate them). If I don't like the intermediate results & the workflow is still running, I may want to terminate it.
The use-case here is that a relatively low-cost step produces input for a following, expensive, action. The later action will be useless if the earlier step produces junk, so the user prefers not to waste time/money running the later action if possible. However, the user does not want to 'nurse' the workflow until the earlier step completes, but instead to be informed (by a notification) when it has completed, and then to kill the later action if necessary.
Since intermediate results are not available until the workflow completes, a workflow currently needs to include an explicit store action to support this. Also, stopping the workflow will not automatically kill a service invocation that is already running. We may be able to exploit SOAPLAB's Abort action where appropriate, but there is no general Web Service abort operation.
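A sketch of the workaround, assuming a simplified synchronous workflow in which an explicit store action and a notification follow the cheap step, and a cancellation flag is checked before the expensive step starts. The step functions, the dictionary and list standing in for the MIR and the notification service, and the `on_stored` hook are all hypothetical. Consistent with the point above, this only prevents the expensive invocation from starting: it cannot abort one already in progress.

```python
import threading

mir = {}                       # stand-in for the MIR: id -> stored result
notifications = []             # stand-in for the notification service
cancel = threading.Event()     # could be set from another thread handling the user


def cheap_step(data):
    return data.upper()        # hypothetical low-cost action


def expensive_step(data):
    return data * 1000         # hypothetical costly action


def run(data, on_stored=None):
    intermediate = cheap_step(data)
    mir["wf-001/step1"] = intermediate          # explicit store action
    notifications.append("wf-001: step1 stored")
    if on_stored:
        on_stored(intermediate)                 # user inspects the stored result
    if cancel.is_set():
        return None                             # expensive step never starts
    return expensive_step(intermediate)


# The user decides the intermediate result is junk and kills the workflow.
result = run("junk", on_stored=lambda r: cancel.set())
print(result, mir, notifications)
```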
The meeting ran out of time at this point, and we agreed to resume the walkthrough at the next scheduled Access Grid meeting on 28 Mar 2003.
--
NickSharman - 21 Mar 2003