Minutes of Access Grid meeting, 21 Feb 2003
Attendees
Manchester: Carole Goble, Phil Lord
Newcastle: Peter Li, Neil Wipat, Savas Parastatidis, Paul Watson
Nottingham: Milena Radenkovic, Kevin Glover
Southampton: Simon Miles, Juri Papay, Victor Tan
IT Innovation: Justin Ferris
EBI: Alan Robinson
Agenda
1. Walk through the GravesDiseaseScenario produced by Newcastle.
2. Review Savas' version of the service interaction matrix: do the identified interactions support the scenario from (1)? [We didn’t have time to start point 2.]
ACTIONS:
- Review Savas' version of the service interaction matrix: this is to be done offline, with each site creating its own copy of the matrix.
- Savas volunteered to produce, with Luc Moreau, a security policy document for the scenario.
Next Meeting:
We will discuss the new information model at the regular AG meeting on 28 Feb 2003.
The scenario turns out to be a good basis for the June demonstrator Lab Book [btw: I still haven’t heard of a new name for the lab book. I’m thinking of calling it Estudio, which is Spanish for study].
The workflow is split into 3 parts.
Annotation pipeline
The services identified by the scenario that we will need to access via the workflow enactment engine are:
- Ensembl: we can access it as a MySQL database and situate a local copy at Newcastle. There are a number of ways of accessing Ensembl. One possibility would be to have Ensembl at two sites and then substitute one for the other during a workflow enactment. Ideally we would have several copies of the same service, so that we can describe it semantically once but have several entries in the UDDI registry.
- Transfac: this is technically OK, as we can access it through EMBOSS, but Alan Robinson raised some licensing issues.
- MEDLINE:
- AJR (11/3/03): MEDLINE is available through both OpenBQS & SRS.
- OMIM: Alan proposed that OMIM would be a better candidate than MEDLINE for the WP7 text processing, as it is more constrained.
- AJR (11/3/03): My suggestion is based on OMIM being much smaller than MEDLINE (15,000 entries rather than 12,000,000), and I also suspect it has a higher signal-to-noise ratio.
- BLAST: this would show off the invocation of a web service with a large number of parameters (see GenericOperationInvocation), and open up the possibility of a computationally intensive Grid activity.
- AJR (11/3/03): BLAST is currently submitted from Soaplab onto the EBI Linux farm using LSF. At one time we looked at using globus_run to submit jobs.
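The GenericOperationInvocation idea for BLAST can be sketched as below: the caller supplies a dictionary of named parameters, which is checked against a description of the operation and merged with defaults before the call is made. This is a minimal sketch under stated assumptions: `OperationDescription`, `invoke_generic`, `fake_blast` and the `blastn` parameter names are all hypothetical, and the real Soaplab BLAST service defines its own parameter set.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class OperationDescription:
    """Hypothetical description of one service operation."""
    name: str
    required: Tuple[str, ...]
    defaults: Dict[str, Any] = field(default_factory=dict)


def invoke_generic(desc: OperationDescription,
                   impl: Callable[..., Any],
                   params: Dict[str, Any]) -> Any:
    """Validate params against desc, fill in defaults, then call impl."""
    missing = [p for p in desc.required if p not in params]
    if missing:
        raise ValueError(f"{desc.name}: missing parameters {missing}")
    unknown = sorted(set(params) - set(desc.required) - set(desc.defaults))
    if unknown:
        raise ValueError(f"{desc.name}: unknown parameters {unknown}")
    # Caller-supplied values override the defaults.
    return impl(**{**desc.defaults, **params})


# Hypothetical BLAST-like operation with illustrative parameter names.
blastn = OperationDescription(
    name="blastn",
    required=("sequence", "database"),
    defaults={"expect": 10.0, "alignments": 50, "filter": True},
)


def fake_blast(**kwargs: Any) -> Dict[str, Any]:
    # Stand-in for the remote call: echo the parameters that would be sent.
    return kwargs


sent = invoke_generic(blastn, fake_blast,
                      {"sequence": "ACGTACGT", "database": "embl", "expect": 1e-5})
```

Keeping the parameter list in a description rather than in the code is what lets one generic client drive an operation with many parameters.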
These services need to be:
- wrapped as web services
- AJR (11/3/03): Any command-line application can be made available as a web service using Soaplab. I am still unsure what it means to wrap a database as a web service: e.g. do you just want a method 'executeSQL(in string SQL)'? Are OGSA/OGSA-DAI in a state where we should be examining them?
- described by the ontology
- described by the UDDI-M/view registry
- at least one should have multiple views
- registered in a UDDI registry
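To make AJR's question concrete, the sketch below contrasts the two ways of wrapping a database that he mentions: a single generic `executeSQL` operation versus a typed operation whose input and output could be described in the ontology and registry. It uses an in-memory SQLite table as a stand-in; the `gene` table, its rows and `getGeneBySymbol` are hypothetical and do not reflect the real Ensembl schema.

```python
import sqlite3

# Hypothetical table standing in for a wrapped database (illustrative rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT, chrom TEXT)")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                 [("G1", "TSHR", "14"), ("G2", "CTLA4", "2")])


def executeSQL(sql: str) -> list:
    """Option 1: one generic operation (name mirrors AJR's example).
    Flexible, but callers must know the schema and the output is untyped."""
    return conn.execute(sql).fetchall()


def getGeneBySymbol(symbol: str) -> dict:
    """Option 2: a typed operation that hides the schema, so its input
    and output can be described precisely."""
    row = conn.execute(
        "SELECT gene_id, symbol, chrom FROM gene WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        raise KeyError(symbol)
    return {"gene_id": row[0], "symbol": row[1], "chrom": row[2]}
```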
For each probe ID we need to get the corresponding EMBL accession number, Ensembl ID, MEDLINE ID, dbSNP ID and OMIM ID. This implies an implementation of LSIDs and an LSID resolver by Nottingham.
- create a personalised view of database entries for the candidate gene: this is quite DAS/SRS-like. Presenting this view in the Lab Book will again fall to Nottingham.
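LSIDs take the form `urn:lsid:<authority>:<namespace>:<object>[:<version>]`. The minimal sketch below parses that syntax and looks up a probe's cross-references in a given namespace. The `XREFS` table stands in for whatever a resolver would actually consult; every authority, namespace and identifier in it is made up, and `cross_references` is a hypothetical helper, not part of any LSID resolver API.

```python
from typing import Dict, List, NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.version is None else f"{base}:{self.version}"


def parse_lsid(text: str) -> LSID:
    """Parse 'urn:lsid:authority:namespace:object[:version]';
    the 'urn:lsid' prefix is matched case-insensitively."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


# Hypothetical cross-reference data a resolver might consult (all made up).
XREFS: Dict[str, List[str]] = {
    "urn:lsid:example.org:probe:P0001": [
        "urn:lsid:example.org:embl:X00001",
        "urn:lsid:example.org:dbsnp:rs0001",
        "urn:lsid:example.org:omim:000001",
    ],
}


def cross_references(probe: str, namespace: str) -> List[LSID]:
    """LSIDs linked to a probe that fall in the requested namespace."""
    key = str(parse_lsid(probe))  # normalises the prefix to lower case
    return [l for l in map(parse_lsid, XREFS.get(key, [])) if l.namespace == namespace]
```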
Genotype assay design system
Design primers: we have three choices here:
1. a highly interactive process with the biologist, perhaps using
Talisman or some other application. The opportunities that this
affords are:
- user interaction with a workflow (halting and resuming a workflow)
- a workflow notifying the user proxy
- launching a third-party tool from the workflow in the lab book, and notifying the workflow when the tool exits
- collecting provenance data as free text notes
2. we use automatic primer generation and just run through the workflow,
perhaps picking up user preferences.
3. the scenario is treated as 2 separate workflows with an
application in the middle. The lab book would host the launching
of the primer application. When it closes, the lab book could
suggest a set of workflows that might follow.
For June and then December, the opinion was that we should implement these in the order 2, 3, 1.
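The "halting and resuming a workflow" opportunity in option 1 can be illustrated with a Python generator: the workflow yields a request when it needs the biologist and is resumed with the reply. This is only a sketch of the control flow, assuming a single in-process enactor; `primer_workflow`, `run_with_user` and the request format are hypothetical, and a real enactment engine would need to persist the halted state.

```python
from typing import Any, Callable, Dict, Generator

Request = Dict[str, Any]


def primer_workflow(gene: str) -> Generator[Request, Any, Dict[str, Any]]:
    """Hypothetical workflow that halts to ask the user for primers."""
    region = f"{gene}-target-region"            # automatic step (placeholder)
    # Halt: yield a request to the lab book and wait to be resumed.
    primers = yield {"request": "choose_primers", "region": region}
    # Resumed: continue with the primers the user (or a tool) supplied.
    return {"gene": gene, "region": region, "primers": primers}


def run_with_user(workflow: Generator[Request, Any, Dict[str, Any]],
                  answer_request: Callable[[Request], Any]) -> Dict[str, Any]:
    """Drive a workflow to completion, answering each halt via answer_request."""
    try:
        request = next(workflow)
        while True:
            request = workflow.send(answer_request(request))
    except StopIteration as finished:
        return finished.value


result = run_with_user(primer_workflow("TSHR"),
                       lambda req: ("FWD-primer", "REV-primer"))
```

Passing an `answer_request` that computes primers automatically would give option 2 with the same workflow definition.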
Determine restriction enzyme for the above SNP.
This is a simple workflow that can be enacted by the workflow enactment engine.
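One way to determine a restriction enzyme for a SNP is to look for enzymes whose recognition site is present in one allele but not the other, so that digestion distinguishes the genotypes. The sketch below does this for a small illustrative set of enzymes with palindromic sites (so scanning one strand suffices); a real workflow would more likely call a tool such as EMBOSS `restrict` with a full enzyme database. `discriminating_enzymes` and `sites_covering` are hypothetical names.

```python
# Illustrative subset of enzymes with palindromic recognition sites.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
           "TaqI": "TCGA", "AluI": "AGCT"}


def sites_covering(seq: str, pos: int, site: str) -> bool:
    """True if an occurrence of site in seq overlaps index pos."""
    first = max(0, pos - len(site) + 1)
    last = min(pos, len(seq) - len(site))
    return any(seq[i:i + len(site)] == site for i in range(first, last + 1))


def discriminating_enzymes(flank5: str, ref: str, alt: str, flank3: str) -> dict:
    """Map enzyme name -> allele ('ref' or 'alt') it cuts, for enzymes
    whose site at the SNP is present in exactly one of the two alleles."""
    pos = len(flank5)                      # 0-based index of the SNP
    ref_seq = (flank5 + ref + flank3).upper()
    alt_seq = (flank5 + alt + flank3).upper()
    result = {}
    for name, site in ENZYMES.items():
        cuts_ref = sites_covering(ref_seq, pos, site)
        cuts_alt = sites_covering(alt_seq, pos, site)
        if cuts_ref != cuts_alt:
            result[name] = "ref" if cuts_ref else "alt"
    return result


# The A allele creates an EcoRI site (GAATTC); the G allele destroys it.
example = discriminating_enzymes("AAGA", "A", "G", "TTCAA")  # {'EcoRI': 'ref'}
```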
3D Protein structure and SNP visualisation
get PDB id for candidate gene.
- AJR (11/3/03): Unless the gene has been crystallised, this is a non-trivial step. Even if it has been crystallised, it's still far from trivial since, for very good practical reasons, the translated DNA sequence and the sequence used for crystallography may not be identical.
- this means making MSD a service for myGrid
- AJR (11/3/03): We've talked to Kim Henrick at the EBI about web services. They've tried them, but are sceptical about the performance.
- MSD is an Oracle relational database, so it is a candidate for turning into an OGSA-DAI service
- AJR (11/3/03): An expert on OGSA-DAI needs to talk to Kim Henrick at the EBI.
- When MSD changes we could notify the user proxy; this would be a convincing example of DatabaseUpdateNotification. The MSD people are keen on this.
- AJR (11/3/03): An expert on myGrid notification needs to talk to Kim Henrick at the EBI.
- we need to match up notification topics with MSD, and set these up through the lab book.
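Topic-based notification from MSD to the user proxy can be sketched as a publish/subscribe broker: the lab book subscribes the user proxy to a topic, and an update (e.g. raised by a database trigger) is published to it. `NotificationBroker`, `UserProxy`, the topic string `msd/structure/updated` and the ID `PDB1` are hypothetical and do not describe the myGrid notification service's actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, dict], None]


class NotificationBroker:
    """Minimal in-process topic-based publish/subscribe broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


class UserProxy:
    """Stand-in for a user's proxy: it simply queues incoming notifications."""

    def __init__(self) -> None:
        self.inbox: List[tuple] = []

    def notify(self, topic: str, message: dict) -> None:
        self.inbox.append((topic, message))


broker = NotificationBroker()
proxy = UserProxy()
# Set up through the lab book (hypothetical topic name).
broker.subscribe("msd/structure/updated", proxy.notify)
# Published when, say, a trigger in the MSD database fires.
delivered = broker.publish("msd/structure/updated", {"pdb_id": "PDB1"})
```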
obtain information about protein and extract information about active site
- InterPro, Swiss-Prot, PESTO: this is a conventional workflow.
- AJR (11/3/03): Not sure what you mean by this. We'll already have analysed InterPro & SWISS-PROT as part of the DNA & protein sequence analysis. I guess this is Sheffield's call.
display the 3D protein structure to the user and highlight the location of the amino acid change caused by the SNP, using the RasMol viewer
- AJR (11/3/03): "Here be dragons!!". You have to make sure that the DNA sequence & (fragment of) crystallised protein structure line-up. I've been down this road with the p53 gene: http://industry.ebi.ac.uk/~alan/MutationViewer/
- Phil raised the issue of associating MIME types with a service, which will have to be included in the service description.
get MEDLINE IDs for the PDB ID and extract protein structure data.
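AJR's warning about lining up the DNA sequence with the crystallised protein can be made concrete. The sketch below translates the reference and variant codon for a coding-sequence SNP using the standard genetic code, then maps the residue number into the structure's numbering. The constant `pdb_offset` is an assumption for illustration only: in practice the correspondence has to come from aligning the translated sequence with the crystallised fragment, which may contain gaps, tags or missing residues. `snp_effect` is a hypothetical helper.

```python
# Standard genetic code, in the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def snp_effect(cds: str, cds_pos: int, alt: str, pdb_offset: int = 0):
    """For a SNP at 1-based position cds_pos of an in-frame coding sequence,
    return (ref_aa, residue, alt_aa, pdb_residue). residue is the 1-based
    position in the translated protein; pdb_residue ASSUMES the structure's
    numbering differs from it by a constant pdb_offset."""
    cds = cds.upper()
    codon_index = (cds_pos - 1) // 3
    frame = (cds_pos - 1) % 3
    ref_codon = cds[3 * codon_index:3 * codon_index + 3]
    alt_codon = ref_codon[:frame] + alt.upper() + ref_codon[frame + 1:]
    residue = codon_index + 1
    return (CODON_TABLE[ref_codon], residue,
            CODON_TABLE[alt_codon], residue + pdb_offset)


# GCT (Ala) -> GAT (Asp) at codon 2; structure numbering assumed offset by 10.
change = snp_effect("ATGGCTTGG", 5, "A", pdb_offset=10)
```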
- an opportunity for PESTO to get involved
- discovering services when building the workflow
- discovering the workflow so that it is reused from the MIR
- editing the workflow with an alternative service that is semantically proposed
- support in constructing the workflow
- using the concepts in the provenance record, data and workflow to link the various components together for browsing and searching. This might mean using an annotation tool.
Ideally we would like to do semantic service substitution during a workflow enactment, but this seems unlikely. We should certainly demonstrate substitution of an alternative service instance.
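Discovering services semantically when building a workflow can be sketched as matching a data item's ontology concept against each service's declared input concept, accepting a service if the data's concept is the same as, or a subclass of, that input. The ontology fragment, concept names and service descriptions below are hypothetical, and the simple tree walk in `is_a` stands in for proper reasoning over the myGrid ontology.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology fragment: concept -> parent concept.
PARENT: Dict[str, Optional[str]] = {
    "BiologicalSequence": None,
    "NucleotideSequence": "BiologicalSequence",
    "DNASequence": "NucleotideSequence",
    "ProteinSequence": "BiologicalSequence",
}

# Hypothetical service descriptions: name -> (input concept, output concept).
SERVICES: Dict[str, Tuple[str, str]] = {
    "blastn": ("NucleotideSequence", "BlastReport"),
    "blastp": ("ProteinSequence", "BlastReport"),
    "seqret": ("BiologicalSequence", "BiologicalSequence"),
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def services_accepting(concept: str) -> List[str]:
    """Names of services whose declared input subsumes the given concept."""
    return sorted(name for name, (inp, _out) in SERVICES.items() if is_a(concept, inp))
```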
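Substitution of an alternative service instance (for example, the Ensembl copies at two sites mentioned earlier) can be sketched as trying each registered instance of the same semantic description in turn. `invoke_with_substitution`, `ServiceUnavailable` and the instance names are hypothetical; a real enactor would obtain the instances from the UDDI registry and would need to decide which faults justify substitution.

```python
from typing import Any, Callable, Sequence, Tuple


class ServiceUnavailable(Exception):
    """Raised by a service instance that cannot handle the request."""


def invoke_with_substitution(instances: Sequence[Tuple[str, Callable[..., Any]]],
                             *args: Any, **kwargs: Any) -> Tuple[str, Any]:
    """Call each instance of the same service in turn; return the name of the
    instance that answered together with its result."""
    errors = []
    for name, call in instances:
        try:
            return name, call(*args, **kwargs)
        except ServiceUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ServiceUnavailable("; ".join(errors) or "no instances registered")


def ensembl_site_a(gene: str) -> str:
    raise ServiceUnavailable("connection refused")   # simulated outage


def ensembl_site_b(gene: str) -> str:
    return f"record for {gene}"


used, record = invoke_with_substitution(
    [("ensembl-site-a", ensembl_site_a), ("ensembl-site-b", ensembl_site_b)], "TSHR")
```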
- the workflow provenance (and there have to be many runs of the flow!) is put into the MyGridInformationRepository
- we also need some free-text annotation provenance.
- mining/querying provenance: through canned queries on the mIR? Through the ontology?
- AJR (11/3/03): There is provenance data of sorts in the EMBOSS outputs that we can capture.
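A canned provenance query over many workflow runs might look like the following sketch, which records each run's service invocations and asks which runs produced a given data item. The two-table schema, `record_run` and `who_produced` are hypothetical and are not the MyGridInformationRepository's actual design; SQLite stands in for the repository's relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run  (run_id INTEGER PRIMARY KEY, workflow TEXT, owner TEXT);
CREATE TABLE step (run_id INTEGER REFERENCES run(run_id),
                   service TEXT, input TEXT, output TEXT);
""")


def record_run(workflow, owner, steps):
    """Store one workflow run and its (service, input, output) steps."""
    run_id = conn.execute("INSERT INTO run (workflow, owner) VALUES (?, ?)",
                          (workflow, owner)).lastrowid
    conn.executemany("INSERT INTO step VALUES (?, ?, ?, ?)",
                     [(run_id, service, inp, out) for service, inp, out in steps])
    return run_id


def who_produced(item):
    """Canned query: which runs, and which service in each, produced item?"""
    return conn.execute(
        "SELECT r.run_id, r.workflow, s.service FROM step AS s "
        "JOIN run AS r ON s.run_id = r.run_id "
        "WHERE s.output = ? ORDER BY r.run_id", (item,)).fetchall()


r1 = record_run("annotation", "alice", [("ensembl", "probe:1", "gene:TSHR")])
r2 = record_run("annotation", "alice", [("ensembl", "probe:2", "gene:CTLA4")])
```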
to be completed
- AJR (11/3/03): Updates generated from a database, e.g. by triggers in MSD's Oracle database.
to be completed
to be completed
to be completed
the role of the LabBook
to be completed
the role of the WorkFlow Enactment
to be completed
the role of the WorkFlow Design
to be completed
- the MSD and the MyGridInformationRepository are both relational databases. So there is the possibility of some sort of distributed query demo.
- what is the role of Soaplab?
- AJR (11/3/03): Soaplab provides a consistent interface to access any command-line driven application.
- there are possibilities with MEDLINE and OMIM
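Since MSD and the MyGridInformationRepository are both relational, the shape of a cross-database query can be sketched with SQLite's `ATTACH DATABASE`, which lets one connection join tables held in two databases. This is a single-process illustration only; a genuinely distributed query over the Oracle-hosted MSD would need different machinery. The tables, columns and IDs such as `PDB1` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # stands in for the mIR
conn.execute("ATTACH DATABASE ':memory:' AS msd")   # stands in for MSD
conn.executescript("""
CREATE TABLE main.experiment (exp_id INTEGER PRIMARY KEY, pdb_id TEXT, owner TEXT);
CREATE TABLE msd.structure (pdb_id TEXT PRIMARY KEY, resolution REAL);
INSERT INTO main.experiment VALUES (1, 'PDB1', 'alice'), (2, 'PDB2', 'bob');
INSERT INTO msd.structure VALUES ('PDB1', 2.1), ('PDB2', 3.4);
""")


def experiments_with_resolution_below(limit: float):
    """Join experiment records (mIR side) with structure metadata (MSD side)."""
    return conn.execute(
        "SELECT e.exp_id, e.pdb_id, s.resolution "
        "FROM main.experiment AS e JOIN msd.structure AS s ON e.pdb_id = s.pdb_id "
        "WHERE s.resolution < ? ORDER BY e.exp_id", (limit,)).fetchall()
```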
Security
We discussed security and digital signatures. We concluded that we should use the scenario as a framework for producing a myGrid security policy document. Carole noted that Comb-e-Chem and Geodise had built authorisation and authentication mechanisms, using Globus and web services, for databases and ontology services. One option is that whoever is logged in can see only their own experimental components in the mIR. Is this possible?
ACTION: Savas volunteered to produce such a document with Luc Moreau.
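The proposal that a logged-in user sees only their own experimental components in the mIR amounts to an authorisation filter applied to every query. The sketch below assumes authentication has already happened elsewhere (for instance through Globus-based mechanisms like those Carole mentioned) and shows only the filtering; the `component` table and the `RepositorySession` class are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE component (comp_id INTEGER PRIMARY KEY, owner TEXT, kind TEXT, label TEXT);
INSERT INTO component (owner, kind, label) VALUES
  ('alice', 'workflow',   'Graves annotation'),
  ('alice', 'provenance', 'run 1'),
  ('bob',   'workflow',   'primer design');
""")


class RepositorySession:
    """All repository reads go through a session bound to the authenticated
    user, and every query is restricted to rows that user owns."""

    def __init__(self, authenticated_user: str) -> None:
        self.user = authenticated_user

    def components(self, kind: Optional[str] = None) -> List[Tuple[int, str, str]]:
        sql = "SELECT comp_id, kind, label FROM component WHERE owner = ?"
        params = [self.user]
        if kind is not None:
            sql += " AND kind = ?"
            params.append(kind)
        return conn.execute(sql + " ORDER BY comp_id", params).fetchall()
```

Putting the filter in one session object, rather than in each query, makes it harder for a new query to bypass the policy by accident.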
CaroleGoble - 21 Feb 2003