Minutes of Access Grid meeting, 28 Mar 2003
Attendees
EBI: Alan Robinson
Manchester: Nedim Alpdemir, Chris Garwood, Carole Goble, Mark Greenwood, Phil Lord, Nick Sharman
Newcastle: Peter Li, Anil Wipat
Nottingham: Kevin Glover, Milena Radenkovic
Southampton: Simon Miles, Luc Moreau, Juri Papay, Terry Payne, Victor Tan
Agenda
- Discussion of semantic service discovery, requested by Simon. Specifically, this is about the So'ton/Manchester paper for the Semantic Web conference.
- Continue the walk through Alan's lab book storyboard - see AlansLabBookStoryBoard (see also Chris G's annotated version at AlansLabBookStoryBoardWP6) - and match it with the services and interactions identified so far.
Minutes of previous Access Grid meetings are at AccessGridMinutes.
Discussion
Semantic service discovery
Simon & Phil outlined the relationship between the Semantic Find Service (SFS) and the Service Directory (SD). A number of general points and questions were raised:
- Terry noted that it is useful to abstract from the mechanisms for getting the metadata (e.g. from WSDL, from the service itself, from a third party).
- Simon noted that the SD needed an extension to the UDDI interface for setting metadata on service entries.
- Phil noted that the SFS denormalizes and duplicates the SD's data: do we want to install an SFS within the SD's view mechanism?
- Alan noted that, for storing metadata in WSDL, the current extension points are protocol-specific, and so not useful for metadata that is protocol-independent (though support for service data elements in OGSI GWSDL would be OK). He asked if moving towards DAML+S was the answer, but Terry remarked that DAML+S refers to WSDL descriptions for the services themselves.
- Alan also noted that currently Axis doesn't deal correctly with all XSD types, in particular derived built-in types such as xsd:nonNegativeInteger.
The discussion then departed from the agenda to discuss type systems within myGrid, a topic we've been skirting round since the early days of the project. We have identified:
- concept or semantic types taken from the service ontology
- representation types such as String
- format types such as FASTA
Chris Wroe's starting point for the concept types was the EMBOSS ACD language, which covers all three aspects to some degree (although format is often irrelevant, since EMBOSS tools are capable of coping with any 'reasonable' format). It was decided that a small subgroup should meet physically to investigate ACD further, with a view to adopting it as (a basis for) myGrid's type system. [The meeting was later arranged for 2 Apr 2003 at Southampton.] We noted that ACD only covers types currently used within the EMBOSS suite, and in particular does not yet cover gene expression - an important part of the Graves' disease scenario.
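To make the three aspects concrete, here is a minimal Python sketch of a descriptor that carries a concept, a representation and a format together. The class and field names (`DataType`, `concept`, `representation`, `data_format`), the `compatible` check and the concept name `protein_sequence` are all assumptions made for illustration; none of them come from ACD or from any myGrid schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    """Hypothetical three-part type descriptor; field names are illustrative."""
    concept: str                       # semantic type from the service ontology
    representation: str                # e.g. "String"
    data_format: Optional[str] = None  # e.g. "FASTA"; None when format is irrelevant


def compatible(produced: DataType, required: DataType) -> bool:
    """Naive check: concept and representation must match exactly;
    format is compared only when the consumer requires one."""
    if produced.concept != required.concept:
        return False
    if produced.representation != required.representation:
        return False
    return required.data_format is None or produced.data_format == required.data_format


fasta_seq = DataType("protein_sequence", "String", "FASTA")
any_seq = DataType("protein_sequence", "String")
print(compatible(fasta_seq, any_seq))  # consumer accepts any format
print(compatible(any_seq, fasta_seq))  # consumer insists on FASTA
```

A real check would presumably compare concepts by subsumption in the ontology rather than by string equality; equality is used here only to keep the sketch short.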
Lab Book walk through (concluded)
(The numbered paragraphs in boldface and the italicised text are taken from AlansLabBookStoryBoard)
The previous meeting (AccessGrid14Mar2003) covered the first 17 steps of AlansLabBookStoryBoard, so this discussion picked up from step 18.
18. I log out of myGrid.
The Gateway (GW) supports a logout method. We anticipate that the scientist would simply close the Workbench client, so this will have to invoke the GW logout in its exit sequence.
19. Later I return & look at my results…
As step (1).
In his annotated version of the storyboard (AlansLabBookStoryBoardWP6), Chris asked whether the Workbench/workspace configuration information is persisted either in the MIR or in local files. The initial answer will be 'no', and we are more likely to use local files in the medium term.
20. After logging in, I am notified that my in silico experiment has finished & the results have been stored, including the final result (as R1), along with metadata & provenance, in myExpt1 of myProject1.
As step (2).
21. I find & select that I want to look at the final result (R1) & a viewer displays it for me.
[I expect that the myGrid & the mIR will store the type of my data as part of the metadata, e.g. a MIME type. I may have personalised myGrid so that it uses my preferred viewer. If it's a data type for which I do not have a viewer already specified, then I'd expect that myGrid would help me find & choose an appropriate one, c.f. Netscape Plug-Ins.]
The NetBeans framework underneath the Workbench has some support for MIME types and plugins. This is related to the type system question: how are format types related to MIME types?
22. I decide to change some of the parameters for a service instance in the workflow (WFTemplate1) from their recommended values to produce a new workflow (WFConfigured1.1) & see how this changes the final result (R1.1). I'll need to record why I've done this & have all the parameter values captured & stored in the mIR.
[I assume that this new run of the workflow is still part of myExpt1.]
The user's action involves changing the parameters in the Job Description File (JDF) part of the workflow, leaving the WSFL part unaltered. The Workbench will have to support this edit action, and in addition must offer the user the opportunity to create an annotation that includes the user's comments and links the two workflow definitions. The proposed schema (see MyGridInformationRepository) supports this via a MetaData entity and using the subject & associationObject relationships. [Which workflow is subject, and which object?]
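As an illustration of an annotation that links the two workflow definitions, here is a minimal Python sketch. Only the names MetaData, subject and associationObject come from the discussion; the `author` and `comment` fields, the `annotate_edit` helper and the author value are assumptions. The direction of the link is chosen arbitrarily here, since the question of which workflow is subject and which is object was left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaData:
    """Illustrative annotation record; only 'subject' and 'associationObject'
    are named in the minutes, the remaining fields are assumptions."""
    author: str
    comment: str
    subject: str                              # id of the annotated entity
    association_object: Optional[str] = None  # id of a related entity, if any


def annotate_edit(author: str, original_id: str, edited_id: str, reason: str) -> MetaData:
    # Assumption: the edited workflow is the subject and the original the object.
    # The minutes leave this direction undecided.
    return MetaData(author=author, comment=reason,
                    subject=edited_id, association_object=original_id)


note = annotate_edit("alan", "WFConfigured1", "WFConfigured1.1",
                     "Changed parameters from their recommended values")
print(note.subject, "->", note.association_object)
```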
Otherwise, and for running the new workflow, as steps (8)-(17).
23. I decide that the original result using the recommended parameters is the best & I add two annotations to the final result (R1): one is my conclusions (myNote1), the other is ideas for the next experiment (myNote2).
The Workbench will support creation of arbitrary annotations. [And it should allow the user optionally to choose an associationObject, in this case as a way of pointing to the second result in 'this result is better than that because ...'.]
24. I decide the results are so good, that I'm going to share them with my supervisor & the colleague from whom I got the original data & I send them the location of my result (R1) so that they can view it.
We will not have any support for roles and groups for the next demonstrator. The best we can hope for in the MIR is to support 'private' (to the author) and 'public' (to all registered users of the MIR).
25. Since she's my boss, my supervisor has permissions to see everything I've done. From the result (R1), she can follow the trail back to see how it was generated, using which services, from which workflow template (WFTemplate1), with which parameters (WFConfigured1) & using which starting data (myData1). At each stage, she may also see the annotation that I may have attached to the objects describing why I made particular choices, e.g. which workflow template, which service instances & which parameters. She may make some comments of her own, and/or sign off on the work. Before signing off on the work, she may want to check that I haven't altered R1 to falsify the results. She could do this either by re-running the workflow (WFConfigured1) herself with my data (myData1), or if we had a PKI for checking with a service instance that a result hasn't been tampered with.
[Having a system to detect possible fraud would be interesting, but I'm not sure about the plausibility.]
As in step (24), we will not support access dependent on role & group. The Workbench should support browsing annotations and associated objects where they are visible (see also below).
26. I'm a little more paranoid about my colleague. Although I'd like her to see my final results (R1), conclusions (myNote1) and how I got there (WFTemplate1 & WFConfigured1), I don't want her to see the annotations about what I want to do next (myNote2), or possibly comments I made about my perceptions of the quality of her data attached to myData1.
[I need to be able to grant & revoke privileges to other people on selected items in my mIR for reading & writing.]
As in step (24), we will not support access control of this complexity.
27. Following comments from my supervisor & colleague, I decide to retrieve my original conclusions (myNote1) & re-write parts of it (myNote1.1).
[I & my supervisor need to be sure that the original copy of myNote1 is still available & not overwritten or deleted by myNote1.1.]
To do this, we need to generalize the 'edit workflow' capability of step (22) to support edits on different entity types. Annotating the original and new objects should be part of the Workbench's support for editing, not specific to particular entity types.
Mark asked whether we need a real delete capability, as well as being able to mark an entity as 'deleted'. From experience with ProcessWeb, he suggested the answer should be 'yes': runaway workflows can fill up the disk before terminating. This could be via a special administrative utility (and port type?), although for the immediate demonstrator it's OK to offer this option to the Workbench user (subject to the usual "Are you REALLY sure?" confirmation dialogue!).
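The difference between marking an entity as 'deleted' and really deleting it can be sketched as follows. This is a toy in-memory store written in Python for illustration only; the `Repository` class and its method names are assumptions and do not describe the MIR interface.

```python
class Repository:
    """Toy in-memory store; a stand-in, not the MIR interface."""

    def __init__(self):
        self._entities = {}    # id -> content
        self._deleted = set()  # ids marked as deleted

    def put(self, entity_id, content):
        self._entities[entity_id] = content

    def mark_deleted(self, entity_id):
        """Hide the entity from normal views but keep its content."""
        if entity_id not in self._entities:
            raise KeyError(entity_id)
        self._deleted.add(entity_id)

    def purge(self, entity_id, confirmed=False):
        """Really remove the entity, freeing its storage."""
        if not confirmed:
            raise PermissionError("purge requires explicit confirmation")
        del self._entities[entity_id]
        self._deleted.discard(entity_id)

    def visible_ids(self):
        return sorted(i for i in self._entities if i not in self._deleted)

    def stored_count(self):
        return len(self._entities)


repo = Repository()
repo.put("R1", "final result")
repo.put("runaway-output", "x" * 1000)
repo.mark_deleted("runaway-output")
print(repo.visible_ids(), repo.stored_count())  # hidden, but still stored
repo.purge("runaway-output", confirmed=True)
print(repo.visible_ids(), repo.stored_count())  # storage actually reclaimed
```

The `confirmed` flag stands in for the confirmation dialogue; marking alone never reduces what is stored, which is why it does not help with a runaway workflow.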
28. I run two further in silico experiments in this project: myExpt2 & myExpt3. myExpt2 is a different workflow template (WFTemplate2) which has the same function as WFTemplate1, but uses a different methodology - I decide the result (R2) of this experiment isn't as good as the first one, which I document & have stored in the mIR with R2. The other in silico experiment, myExpt3, takes the final result (R1) of the workflow (WFConfigured1) in the first in silico experiment (myExpt1), plus some new data (myData2) & runs it through a new workflow (WFTemplate3 & WFConfigured3) to produce a final result (R3).
This should all be OK. We noted that, in current thinking, these different workflow instances would probably be part of the same experiment, as they are presumably testing the same hypothesis.
29. Some time later, I am in the process of writing up the paper about these experiments. I find my final result (R3). I need to know how & why that was generated all the way back to myData1, i.e. trace back through two workflows.
[The mIR needs to be able to capture the relationships that R1 was part of the input data to WFConfigured3 in myExpt3 & is also the result of WFConfigured1 in myExpt1 that also used myData1 as input. So the mIR is able to store different types of entities & the relationship between them, as well as the annotation/metadata that is associated with them - Does this sound like an OODBMS?]
The user needs to be able to 'walk' the graph of annotations in the MIR. Starting at any entity, she should be able to step to any annotation of which the entity is either subject or object, and for annotations, additionally the subject entity and (if defined) the object entity. [The appropriate user interface for this will be interesting...]
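The stepping rule described above can be sketched as a breadth-first walk. In this Python sketch the annotation ids (`a1` to `a5`) and the particular links between entities are made-up sample data, loosely following the storyboard's R3-to-myData1 trail; the function names are likewise assumptions.

```python
from collections import deque

# Hypothetical sample data: annotation id -> (subject id, object id or None)
ANNOTATIONS = {
    "a1": ("R3", "WFConfigured3"),
    "a2": ("WFConfigured3", "R1"),
    "a3": ("R1", "WFConfigured1"),
    "a4": ("WFConfigured1", "myData1"),
    "a5": ("R1", None),  # a free-standing note with no object
}


def neighbours(node):
    """One step of the walk: any annotation that has the node as subject or
    object; for an annotation, additionally its subject and (if defined) object."""
    steps = [a for a, (s, o) in ANNOTATIONS.items() if node in (s, o)]
    if node in ANNOTATIONS:
        subject, obj = ANNOTATIONS[node]
        steps.append(subject)
        if obj is not None:
            steps.append(obj)
    return steps


def find_path(start, goal):
    """Breadth-first search; returns the list of ids from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(find_path("R3", "myData1"))
```

With this sample data the walk alternates between entities and annotations, which is the shape a user interface would have to present.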
It's not clear whether the MIR will contain an object that represents the 'write up' of the experiment: probably not, for the next IF.
30. My supervisor wants to identify what people in the group have been working on & where there's commonality.
[A text analysis engine is run over the contents of everyone's mIR (or at least parts of them such as the descriptions & annotations) to identify terms & concepts. Look for people, projects & experiments where the same concepts are being used.]
Unlikely for the next IF.
That concluded the storyboard walk through and the meeting.
--
NickSharman - 3 Apr 2003