myGrid planning for Taverna 2.0
September 9, 2005
Agenda
10.00am Start of meeting; summary of OMII outcome, related projects and status of each component, final report - Carole
10.30am Plans for Taverna 2.0 and priorities - Tom and Justin
11.20am Plans for data and metadata management - Daniele and Jun
1.30pm Priorities, planning and discussion - esp from user projects ISPIDER, PsyGrid and myIB
Carole OMII report
myGrid
Users: PsyGrid and ISPIDER
OntoGrid research, possibly linking in with PASOA
MyGrid specific development:
* 400k Platform grant for speculative work, linking with others, paying interns
* 2.2 million Open Middleware Infrastructure Institute: myGrid, OGSA DAI (main users, channeling) and Soton OMII; Goble chairing OMII. User orientated view; ability to steer OMII towards users
OMII roadmap linking various releases; Taverna 1.3 in April 06 released via OMII. Theme of registry assisted workflow (wf) composition. Grimoires released before All Hands Meeting. Link together Taverna with BPEL/UCL component. Dec 2006 move to secured services part. April 2007 Taverna 2.0 release via OMII.
OMII Proposal: maintain 3 layer development cycle
* Pre-release activity: immediate needs. Eg Andy Brass needs portal quickly; deploy Perl services
* Downloadable/documented software (to friends; sourceforge community)
* full OMII soft dev process. Quality assurance, regression testing, multi platform testing, doc writing; takes a year to do properly.
Chris G Nottingham portal: Stef tested and believes portal works. Problems in Taverna events/MIR.
Justin Simdat demonstrator: difficult to get MIR working with portal; ended up with custom Gridsphere based portal for demo/application and wf specific. Tom suggests web end interface to enactor as short term solution. Andy Brass needs only access to generated files.
Full time person needed long term; data mgt. Tying MIR and portal difficult; fixing MIR.
Steve Pettifer. API accessible from non Java apps? Start Taverna job without UI.
Activate wf from native apps, so wants SOAP-like interface. Data encapsulation used to carry around. Would need to create equivalent data binding libraries in native language, or compile libraries via .NET. Luc Moreau: Grimoires batch mode access API.
Coupling Taverna and Freefluo? Abstraction layer is there; capable of talking to remote enactor. (Web service enactor.)
Security: authentication and authorization
Permeating view throughout software
Ties in heavily with provenance.
View stuff of original proposal: VO, permissions.
Grimoire part: support for security model WS Security; clients that can talk to Grimoire service using WS Sec. Simple access control list in Grimoire service. Plans for XSCML.
“My data is mine, and I can sometimes share it to you.”
Peter wants to share workflows securely through Grimoire. Important priority.
PsyGrid also using role based security model; not just closed world.
Tom and Justin Taverna plans for 2.0
1.3 changes mostly UI; adding more plug-ins
Move towards proper plugin architecture:
Int biology package Peter Li: visualization, API consumer definitions
Similar package for cheminformatics
New functionality in 2.0: problems driven by applications; not necessarily users
Taverna 2 will look very similar to 1.3, but new process model and enactor scalability.
Enactor scalability: memory mgt in Java; deal with real time /streaming data
Simplify process model
Process model of how wf works has evolved and has become messy for provenance (fault tolerance)
Want subwf implemented so can do integration of BPEL or Kepler workflows
Have appear as workflow graph when reporting progress, not just as service
Drilling down workflow structure.
Tree as progress report; visualising will be very hard.
Work on SimDAT secure FreeFluo engine; composing remote Taverna workflows.
Dynamic authorization/service selection. Users writing code against service, not workflow. Help on enactor from SIMDAT end. Justin/Nicholas: working on generic enactor model to support adaptable workflows. SIMDAT engine. Need better coordinated response/consolidated releases -> via OMII.
Luc: What is current myGrid architecture and what is it intended to be? Avoid duplication of work. Design exercise done with mediator architecture. But now Taverna architectural framework is central. No implementation for other; main idea about events is in Taverna anyway at moment; mediator became embedded in Taverna. Taverna architecture is central and distributed. Conceptually it’s modular, most users just use as one. Taverna the workbench container vs the wf execution container. Turned into container for components.
Plug-ins: Feta running over Webdav (as opposed to registry), Provenance plugin. Grimoire could be one too.
Worth revisiting mediator architecture in the new OMII context?
How to glue components together? Original Service oriented architecture? Adopt something in between event driven architecture and that one? Re-assess available technologies to drive component interaction, eg agent stuff
Taverna is application and how enhance that application to help users -> mediate term focus
Versus
Taverna is also about other not directly related components needing working together. -> longer term.
Phil: Which components does Taverna not know about? Interaction between Feta registry and data storage and ontology server. Need such mechanism, but is it necessarily through Taverna? Need more user pull to resolve question.
Neil: Notification service integration Taverna? Use case? Microbase workflows notification based. Tom: No generic way to add notification. Could be done but custom; framework for events there. Notification scheme in OMII 2.x called FINS. Do we want independent notification service in myGrid architecture or not?
New myGrid project manager; we need standardised ways to interact with the project for outsiders.
Rizwan performance and QoS of workflows. Tom would like service runtimes in registry. Iteration is problematic. All real workflows have implicit iteration so can’t predict runtimes based on service performance data alone
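Illustration of Tom's point on iteration (a minimal sketch, not Taverna code). It assumes a hypothetical mean runtime per call taken from the registry, and that a service receiving lists on its input ports is invoked either once per combination of items ("cross") or once per position ("dot", here assumed to stop at the shortest list). The function names are made up for the example.

```python
from math import prod

def invocation_count(list_lengths, strategy="cross"):
    """Number of calls made when input ports receive lists instead of
    the single items the service expects (implicit iteration)."""
    if not list_lengths:
        return 1  # no list inputs: a single invocation
    if strategy == "cross":
        return prod(list_lengths)  # every combination of items
    if strategy == "dot":
        return min(list_lengths)   # items paired position by position
    raise ValueError(f"unknown strategy: {strategy}")

def estimated_runtime(mean_service_seconds, list_lengths, strategy="cross"):
    """Sequential estimate: mean runtime per call times number of calls."""
    return mean_service_seconds * invocation_count(list_lengths, strategy)

# Same service, same registry figure (2 s per call), very different totals:
print(estimated_runtime(2.0, []))                # 2.0
print(estimated_runtime(2.0, [10, 50]))          # 1000.0
print(estimated_runtime(2.0, [10, 50], "dot"))   # 20.0
```

The list lengths are only known at run time, so the registry figure alone does not determine the total.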
Daniele and Jun data and provenance
Daniele presentation
Main points:
* Faithful recording of Taverna events as ontological instance data
* Techniques used allows us to easily keep it in step with Taverna development
* Various additions to Taverna desirable
Integrate with personal/organisational VO/ security information
Presentation slides:
* Metadata;
Representation: RDF flexible and easy to exploit the presence of LSID, potential of semantic googling
* schemas
ontology based on Taverna functionality
for storage use named RDF graphs
retrieve whole graphs (eg workflows) eg complete wf record for execution/provenance
at moment implementation in NG4J (Jena+MySQL), not efficient
major RDF parties have pledged support to NG4J: Sesame, Kowari, Protege
* generation,
Taverna workflow event listener
all events sent out by Taverna are recorded faithfully
instance data ‘typed’ by new provenance ontology, again based on Taverna’s model
* query,
TriQL, query language for named RDF graphs
Jena ontology inspection/reasoning
Canned queries
Workflows with failed processes
Input/output of past process runs
Workflows with data changed by user
* Diagram showing interaction of Taverna with provenance plugin
* browsing
Ocula based browser (new component developed by Tom; GUI framework that uses XML as representation language, allows to embed bean shell scripts; tailored for quick representation; framework for people developing their own; components that can be customised for specific cases)
One browser for each canned query
very rapid development (few hours)
Ismael Juma browser for failed processes; Daniele quickly produced new browser based on that one, for changed data
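Illustration of the generation/storage/query chain from the slides (a minimal sketch under stated assumptions). It uses a toy in-memory store in place of NG4J and a Python function in place of a TriQL query. The class names, the `urn:example:run:` graph names and the `prov:` terms are hypothetical and are not the actual provenance ontology.

```python
from collections import defaultdict

class NamedGraphStore:
    """Toy store: graph name -> set of (subject, predicate, object)."""
    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self.graphs[graph].add((subject, predicate, obj))

    def graph(self, name):
        """Retrieve a whole graph, eg the complete record of one run."""
        return set(self.graphs.get(name, set()))

    def match(self, predicate=None, obj=None):
        """Yield (graph, subject) for triples matching the pattern."""
        for name, triples in self.graphs.items():
            for s, p, o in triples:
                if (predicate is None or p == predicate) and (obj is None or o == obj):
                    yield name, s

class ProvenanceListener:
    """Records each workflow event as triples in the graph of its run."""
    def __init__(self, store):
        self.store = store

    def on_event(self, run_id, process, status):
        graph = f"urn:example:run:{run_id}"  # hypothetical naming scheme
        self.store.add(graph, process, "rdf:type", "prov:ProcessRun")
        self.store.add(graph, process, "prov:status", status)

def runs_with_failed_processes(store):
    """Canned query: names of run graphs that contain a failed process."""
    return sorted({g for g, _ in store.match("prov:status", "failed")})

store = NamedGraphStore()
listener = ProvenanceListener(store)
listener.on_event("1", "blast", "completed")
listener.on_event("2", "blast", "completed")
listener.on_event("2", "render", "failed")
print(runs_with_failed_processes(store))  # ['urn:example:run:2']
```

Keeping one graph per run is what makes "retrieve the complete record for an execution" a single lookup.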
Jun presentation on metadata requirements
* Distinguish service invocations from different people/clients
* Logging of user interaction, check points, invocation of scripts, grid services; now possible in Taverna. Taverna 2 will not have explicit steering/check points. If user interaction then do through specific service, which will generate its own provenance; the user as a service. Enactor not generating the provenance; distributed provenance.
Relate to work of Luc: Luc releasing new provenance discussion; in 3 weeks.
Norman: Looking forward for MIR/data mgt
MyGrid now equals:
Taverna/Freefluo/
and plugins which have UI and store
and domain services
unsuccessful components are ones that cut across that view of world.
an information repository
encompassing interfaces
Vertically integrated things that plug in to Taverna were more successful in the project
4 Good to haves
1/ Consistent user interfaces (look and feel + codebase) for shared tasks. Eg UI+store
- UI: Feta: canned query capability, browse results
- Provenance store: canned query and browse
-> so share codebase/UI feel
Useful for future extensions too
Result viewing open question. Domain specific. Default behaviours but overwritable.
2/ Purposeful things (ie supporting some task)
feta: not very big, does specific thing
find functionality, do design, incremental release/test cycle
versus vision of overarching store (ie go for quick wins)
3/ coordinated storage practice
doesn’t mean need to use same storage technology
but when building stacks, can we be consistent
provenance store vs feta: rdf manipulation choices
just think about how relates
use as few as possible storage technologies
4/ pluggable storage systems where possible
consistent messages relating to stores where possible
reduce cost of moving to different storage technologies, eg when replacing RDF stores; or from RDF to DB; logging in; querying
minimize cost introduced by different adopted technologies
5/ not looked at domain data storage
carole: truly federated environment, daniele uses lsids and doesn’t worry so much about data; put credential mgt in lsid mechanism rather than in data technology
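Illustration of good-to-have 4, pluggable storage (a minimal sketch, not an existing myGrid interface). It assumes a hypothetical `MetadataStore` interface with `put` and `get`, and two stand-in backends: a dictionary and SQLite. Client code is written against the interface only, so swapping the backend costs nothing on the client side.

```python
from abc import ABC, abstractmethod
import sqlite3

class MetadataStore(ABC):
    """Hypothetical common interface shared by every storage backend."""
    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

class MemoryStore(MetadataStore):
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class SqliteStore(MetadataStore):
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._db.commit()

    def get(self, key):
        row = self._db.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

def record_annotation(store, service, description):
    """Client code depends only on the interface, not on the backend."""
    store.put(service, description)
    return store.get(service)

for backend in (MemoryStore(), SqliteStore()):
    print(record_annotation(backend, "blastp", "protein sequence search"))
```

Both backends print the same description; only the constructor call differs.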
Database is too much of implementation of Information Model ?
Or are interfaces direct view of databases ?
Chris G: Information Model works OK as ontology but less well when turned into concrete schema. Was always work in progress, then reifying it in a concrete implementation made it hard to change it as things progressed later on.
Paul: Too much or too little info in information model? Needs detailed review/version
Carole: All or nothing thing? Views not in information model -> would use that as mechanism to hide things
Paul: Views come in 2 ways: interface to access data repository (personalisation) and whether want to enforce through use of security
Chris G: in ontology, use some parts of it, if at all versus database where it’s all there
Luc: we had views in registry
Norman: views relate to purposeful interface. VO as model.
Need VO model for security. Is management interface, not for all. Eg add/remove to projects.
Use view for discovery (eg ones used before by ones in my project)
Use view for provenance
-> different views on views model, not same as how one does in DB
DB views help implement these high level views.
Views implementation specific to interaction task of applications trying to use it. Where cocked up wrt MIR: building bespoke interface to undefined task.
Paul: MIR: because it has query interface can have it all by building queries but people don’t do that. We should identify 10 things we really need to do and support those, still leaving room for the others
Carole: People will populate different datamodels versus all will be put in same database
Paul: Both data and metadata in MIR is backstop. People choose their form, how generically store in structured way? No real answer available from Information Model meeting.
Damian: using SRB in Integrative Bio project. Believes will need to support multiple repositories. And then link to information in MIR. Keeping these in sync is a problem for later. Always will be federated.
Carole: should help end users do synchronization. Has serious implications for provenance. When distributed, how track building up of a data object. Hard for a cook workflow to say that the output is a sandwich/victoria sponge, if it never saw the overall picture and only knows the inputs/ingredients.
Tom: needs lots of role based metadata for security. Always created by people in a context.
Separating out organisational metadata is a task independent of everything else. VO manager needs to be address book on steroids
Carole summing up:
* Reintroduction of views
* Implementing interfaces that match views rather than entire collection of information
* Deploying to appropriate implementation model where not all has to be same impl. model because interacting at level of view. Not fixed to underlying technology.
Paul adds: Security: need all this to integrate with myGrid security model.
Do VO stuff separately. Corner stuff. Go check with VO whether that information holds.
Damian In IB: primitive VO model, based around groupings in SRB. Managed by hand, set up different groups for diff. virtual experiments. Results coming out of other experiments: which bits can you then see (eg only ones coming from your own inputs). Tie it in with data itself.
Tom: via SRB? Need to do it via SRB layer which would be index; contextual metadata; interceptor requests checked.
Damian: Do policy enforcement as close to data as possible. Policy decision can be elsewhere. Wherever checking is done, there is unlimited access downstream, so want it this close, or through linking.
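Illustration of the split between decision and enforcement (a minimal sketch, not SRB or myGrid code). It assumes a hypothetical group-membership model in the spirit of the IB groupings: the decision point only answers yes or no, and the store holding the data calls it on every read. All class and method names are made up for the example.

```python
class PolicyDecisionPoint:
    """Decides; may live elsewhere (eg alongside a VO service)."""
    def __init__(self, group_members):
        self.group_members = group_members  # group -> set of users

    def permits(self, user, item_group):
        return user in self.group_members.get(item_group, set())

class GuardedStore:
    """Enforces at the data: every read goes through the decision point."""
    def __init__(self, pdp):
        self._pdp = pdp
        self._items = {}  # item id -> (owning group, value)

    def put(self, item_id, group, value):
        self._items[item_id] = (group, value)

    def read(self, user, item_id):
        group, value = self._items[item_id]
        if not self._pdp.permits(user, group):
            raise PermissionError(f"{user} may not read {item_id}")
        return value

    def visible_to(self, user):
        """Filter listings too, so results are checked on the way out."""
        return sorted(i for i, (g, _) in self._items.items()
                      if self._pdp.permits(user, g))

pdp = PolicyDecisionPoint({"exp-a": {"alice"}, "exp-b": {"alice", "bob"}})
store = GuardedStore(pdp)
store.put("r1", "exp-a", "result of experiment A")
store.put("r2", "exp-b", "result of experiment B")
print(store.visible_to("bob"))  # ['r2']
```

Because nothing reaches `_items` except through `read` and `visible_to`, there is no unchecked path downstream of the check.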
To do’s wrt VO
Tom:
Come up with VO model, and check whether it covers projects
Implement container
Evaluate true or false security assertions
Test with people
Then go through loop
Luc: experience with (XSCML ?) assertions: do check as requests come in, and as results go out. Tom: done on server; server knows context. Model interface should be same on server and client
Carole: need client side tooling while building underlying model container.
Damian: VO model will be generic organisational model? Not convinced. Perhaps better federated system; submodels to choose.
In any case, do it for one, then test whether can substitute. How pluggable is it? how reconcilable?
Norman summing up: Things discussed:
1 More uniform interface (vertical things look and feel) -> design and code base work
2 Development of VO model : might serve UI
3 Development and deployment enforcement infrastructure serving policies
--> 2 and 3 probably orthogonal; VO provides parameters, but doesn’t make the call
4 Domain data storage approaches we support: using multiple storage mechanisms, how do we name, dereference, check that still there
5 Management interfaces: allow to use/populate stores. Accumulating provenance -> what with deleting. What with stuff in external stores; garbage collect, notify. Maintenance of information resources. Could imagine starting up new projects, combining different databases.
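Illustration of item 4, naming and dereferencing across multiple stores (a minimal sketch). It assumes LSIDs of the form `urn:lsid:authority:namespace:object[:revision]` are used as names, and that each authority maps to one store. `ResolverRegistry`, the example authority and the dictionary standing in for a store are all hypothetical.

```python
from typing import NamedTuple, Optional

class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text):
    """Split urn:lsid:authority:namespace:object[:revision] into parts."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn" or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text}")
    return Lsid(*parts[2:])

class ResolverRegistry:
    """Maps an LSID authority to whichever store holds its data."""
    def __init__(self):
        self._stores = {}

    def register(self, authority, store):
        self._stores[authority] = store

    def dereference(self, text):
        lsid = parse_lsid(text)
        store = self._stores.get(lsid.authority)
        if store is None:
            return None
        return store.get((lsid.namespace, lsid.object_id))

    def still_there(self, text):
        """Cheap liveness check, eg before reporting old provenance."""
        return self.dereference(text) is not None

registry = ResolverRegistry()
registry.register("example.org", {("results", "42"): "ATGC"})
print(registry.dereference("urn:lsid:example.org:results:42"))  # ATGC
print(registry.still_there("urn:lsid:example.org:results:43"))  # False
```

The name stays the same whichever storage mechanism sits behind the authority, which is the property the discussion asks for.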
Carole: also need to address deploy/install problems.
* Soaplab needs revision. How to deploy command line tools. Phil: Needs management tools. General problem deploying applications as Web service containers: potential to outsource to OMII Soton.
Paul: authentication ; common interface, requirement for everyone to log in
Important for context collection. Damian: IB have e-Science certificates. Grimoire use too, PsyGrid, Gold too; uses XSCML based certificate.
Damian: have lot of internal data from applications that needs to be fed in as RDF. Eg running heart model, and user steering it; eg apply electric shock; if collaborative steering with different people doing different aspects of simulation. Equivalent to mini workflows inside applications: interactive. What’s been collected at enactor is not enough. Architecture of PASOA project: from the steering of application, give up piece of RDF that describes provenance, and export it so can join rest of provenance, because it can’t be collected at workflow level. Scientists need to produce it. How to capture it? Big issue.
Khalid: implementation on interfaces. Wants to access Taverna UI components: display the workflow, access the explorer, discovery component. Not well documented outside Javadoc. Starting point is workbench class. No dependencies exist between Taverna UI components.
myGrid roadmap and its place in OMII roadmap is in pipeline