See also
InformationModel,
MirNotifications,
MirQueries,
MirLsidAuthority and (New Sep 04)
MirBrowser
Overview
The primary objective of the myGrid information repository (MIR) is to store users' data. Other requirements on the MIR are described in the WP3 Requirements document.
The June 2003 MIR (MIR3) is undergoing a major revision. See
InformationModel for details.
--
PeterLi - 04 Feb 2003
Outstanding issues
MirFourImplementationIssues details outstanding issues in using the July04 implementation of the MIR.
--
ChrisWroe - 09 Aug 2004
MIR3 Implementation
See also
Mir3Reflections.
I have revised the initial implementation of the MIR (still under branch 'mir3' in the CVS mygrid/PersRepository). The corresponding information model and schema diagrams from Together-J are:
info model and
schema and
Java API. This includes all of the changes previously proposed and documented below:
- added 'addUnknownSubjectAs:String' and changed 'addUnknownAsConcept:boolean' to 'addUnknownObjectAs:String' in addAssociation() and addStringAnnotation() and addXMLAnnotation().
- replaced 'includeDeleted:boolean', 'authoritative:boolean' and 'nonAuthoritative:boolean' with a hashtable keyed on property names with standard values in getMetadataExternalURIsBySimpleMatch() and getAssociatedExternalURIsBySimpleMatch().
- replaced 'includeDeleted:boolean' with 'properties:Hashtable' in getExternalURIsByUserAndLocalType() and getExternalURIsByLocalType().
- added addProxyThing().
- added addCollection(). Note however that metadata over a Collection's members is not yet supported by getMetadataExternalURIsBySimpleMatch or getAssociatedExternalURIsBySimpleMatch.
- added getCollectionStandardMetadata() and associated 'membersExternalURI' property to act as metadata subject or object on behalf of its members (rather than as the collection as a thing in itself).
- changed 'valueXML:String' to 'value:Object' (should be type String or byte[] only) in addDataThing() and addOutputDataThing().
- similarly, changed return type of getDataThingValue() to 'Object' from 'String'.
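The move from separate boolean arguments to a single properties hashtable can be sketched roughly as follows. This is only an illustration, not the actual MIR API: the property names, the string values and the `flag` helper are all assumptions made for the example.

```java
import java.util.Hashtable;

// Illustrative sketch only: property names and values are hypothetical,
// not the standard values defined by the MIR API.
final class QueryPropertiesSketch {
    static final String INCLUDE_DELETED = "includeDeleted";
    static final String AUTHORITATIVE = "authoritative";

    /** Reads a boolean-valued property, using the default when the key is absent. */
    static boolean flag(Hashtable<String, String> properties, String name, boolean defaultValue) {
        String value = properties.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Hashtable<String, String> properties = new Hashtable<>();
        properties.put(INCLUDE_DELETED, "true");
        // A caller sets only the qualifiers it cares about; the rest fall back to defaults.
        System.out.println(flag(properties, INCLUDE_DELETED, false));
        System.out.println(flag(properties, AUTHORITATIVE, true));
    }
}
```

One attraction of this style is that new qualifiers can be added later without changing the method signatures.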
--
ChrisGreenhalgh - 24 Apr 2003
I have done an initial implementation of the MIR (currently under branch 'mir3' in the CVS mygrid/PersRepository). The corresponding information model and schema diagrams from Together-J are:
info model and
schema and
Java API
--
ChrisGreenhalgh - 08 Apr 2003
Here are some refinements/extensions currently under consideration:
- Add ProxyThing localType and API operation to create - used to support annotations and associations involving things outside the MIR such as web services.
- Extend addAssociation/addAnnotation accordingly to allow automatic addition of ProxyThing for unknown URIs (e.g. change addUnknownAsConcept:boolean to addUnknownAsLocalType:String, with null as false, and include this option for the subject as well as the object).
- Support binary as well as text-based data for DataThing and XMLAnnotation - allow easier inclusion of images, etc. in MIR. This probably means adding a binary flag or encoding attribute to the DataThing table, changing the API of addDataThing, addXMLAnnotation and getDataThingValue, e.g. to use Object (which will actually be String or byte[]) rather than String, and adding 'binary' (or encoding/whatever) as a StandardMetadata property for DataThing.
- Drop suggestion in API docs that all DataThing values are XML-wrapped - most of the scenario (e.g. GD) stuff at the moment is basically ASCII or images! Removing this lets clients work more intuitively in terms of (raw) values and their MIME types.
- Allow simple matches for deleted items only (currently can ask for non-deleted, or all including deleted).
- Consider generalisation of arguments to extensible set of flags or qualifiers to subsume deleted/notdeleted and authoritative/nonAuthoritative??
- Add Collection localType and API operation to create - used to support Collections; not a ProxyThing because its ExternalURI will be a new internally generated MIR LSID. (?)
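The proposal to accept either text or binary values through a single Object argument might look roughly like the sketch below. The class and method names are hypothetical, and the choice of UTF-8 for text is an assumption; the notes above do not fix an encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: names are hypothetical and not part of the MIR API.
final class DataThingValueSketch {
    /** Derives the proposed 'binary' flag from the runtime type of the value. */
    static boolean isBinary(Object value) {
        if (value instanceof byte[]) {
            return true;
        }
        if (value instanceof String) {
            return false;
        }
        // Anything else (including null) is rejected, matching "String or byte[] only".
        throw new IllegalArgumentException("value must be a String or a byte[]");
    }

    /** Converts either permitted type to bytes for storage; text is encoded as UTF-8 (assumption). */
    static byte[] toBytes(Object value) {
        return isBinary(value)
                ? (byte[]) value
                : ((String) value).getBytes(StandardCharsets.UTF_8);
    }
}
```

Checking the type once at the API boundary keeps the rest of the repository code free of instanceof tests.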
--
ChrisGreenhalgh - 10 Apr 2003
Information Model
Following the brainstorm of the information model at IF-3 (
inf-model.jpg ) I have had a go at UML-ifying and clarifying an initial attempt at the information model. This is a bit broader than the MIR schema, but there we go.
Known issues:
- It is not clear where Subscription and Notification should go.
- Implementation details are deliberately glossed here (e.g. local vs global references, object vs relational styles).
- GridResource and Service are more or less just place-holders at the moment; should they be reflected here?
- ServiceDescription is also a placeholder, and calls for a range of different kinds of description: of the services actually available, of the service actually used, and of requirements (syntactic, semantic and QoS-related).
- ServiceProvenanceLog is a place-holder for Luc's proposed 3rd party provenance recording.
- No RDF mapping is yet defined.
InformationDescription is a structured description of the broader information model.
MIR schema for IF-4
Introduction and Goals
I (ChrisGreenhalgh) want to rapidly adopt an initial and provisional subset of this for a new MIR schema to support lab-book style functionality. Concurrently, the broader model can be expanded and refined. The IF-4 MIR should include:
- Thing - common functionality across a range of Entities, including nominal support for LSIDs.
- DataThing - which replaces DomainEntity, and becomes a new base class for workflow definitions, etc.
- User - which expands existing User info. with extra fields to (a) relate to user agent and (b) potentially tie in to GSI.
- ActionPerformed (and WFInstance), plus newly separated ActionProvenanceLog (and WFProvenanceLog), to record WF and direct action provenance, plus Input to preserve data-flow dependencies, which expand existing WFInstance.
- ActionDefinition (and WFDefinition), which expand existing WFDefinition and extend DataThing.
- WorkContext, and in particular Experiment, for organising the lab book. This effectively replaces DEGroup.
- Report, and in particular LabBookReport, for the narrative/document presentation of the lab book report.
- Annotation, for 3rd party and other subsequent annotations of Things. In the first instance, signing off a lab book report can be done with a simple Annotation, although in future this should be a SigningAnnotation.
--
ChrisGreenhalgh - 11 Feb 2003
Version 1 (superseded)
I have done an initial relational schema design based on a subset of the above information model. See the
design doc and/or
Together ControlCentre project.
--
ChrisGreenhalgh - 11 Feb 2003
Version 2 (superseded)
Hmm. Having trouble getting a stable relational schema, especially when I start to think about searching with concepts stuff, dealing with notifications, etc, etc. Perhaps it would be better to run for now with an abstract relational schema that directly supports a more RDF-style view of the world, so that stuff is more consistent at this level. Requires another level of schema of course, but then we have some of that in the Ontology and elsewhere... What do you think?
abstract schema gif
--
ChrisGreenhalgh - 27 Feb 2003
Version 3 (current)
Here is my third attempt at a schema; it is semi-abstract, i.e. a number of relationships which were previously separate tables - and some new relationships - are all realised in terms of a general Association class (implemented in the relational schema by the
MetaData table). Concept relationships are also included (equiv, super).
Class diagram and Entity-Relationship diagram for Relational Schema
This is making minimal changes from the
PersRep2 schema (!). Notes:
- I have not looked in detail at whether SQL queries will support the queries we need over this; I doubt it. The two options to improve queryability are (i) use the DB2 XML extensions to include XQueries into the documents (e.g. provenance logs) or (ii) have the repository and/or the gateway create additional explicit MetaData to reflect other information 'hidden' in these documents in a way that is directly queryable using SQL.
- Thing is a new common base class across all referable resources.
- deleted things can just be marked and retained using the deleted attribute.
- externalURI is used e.g. for Concept URIs.
- localType is the name of the nominal class in the information model (class diagram)
- Permission provides a trivial expression of access control (placeholder, only); I assume that we will make WFDefinitions public (shared read) by default.
- DataThing replaces DomainEntity.
- DEGroupID is replaced by Associations of kind isPart, isUsedIn, wasCreatedIn, many of which can apply to each (Data)Thing.
- Some column names are changed to reflect the change.
- WFDefinition becomes a kind of DataThing (and requires no additional data).
- WFInstance becomes a subclass of DataThing.
- InstanceWFDefinition is dropped, replaced by the definedBy:WFDefinitionID, since every WFInstance is an instance of precisely one WFDefinition.
- DEGroup is generalised to WorkContext, which in turn is fully implemented by Thing.
- DEGroup.rDEGroupID is replaced by Associations of kind isPartOf.
- MetaData is added, to express Annotations and Associations.
- See above for some uses; also
- EntityConcept is replaced by Associations of kind isA.
- MetaData.associationDistance allows expanded transitive associations to be placed in the mIR explicitly (with non-zero distance) to avoid recursive queries.
- User is retained but becomes a subclass of Thing.
- User.name is old UserID (string).
- User.X509DN is a placeholder for additional user-related information, in this case their Distinguished Name as used on the Public Key Certificate.
- EntityWF is replaced by Associations of kind hasInput and hasOutput.
- ConceptType becomes a kind of Thing.
- Concept relationships are to be included, expressed using Associations of kind isSuperClass and isEquivClass.
- A LabBookReport is just a kind of DataThing.
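The role of MetaData.associationDistance noted above can be illustrated with a small sketch that precomputes transitive associations so that a query becomes a flat lookup rather than a recursive one. The class and method names are hypothetical, and the convention that a direct association has distance 0 and each extra hop adds 1 is an assumption drawn from the "non-zero distance" wording, not a documented rule.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: expands direct associations of one kind (e.g. isA)
// into explicit transitive ones, each tagged with a distance.
final class AssociationDistanceSketch {
    /**
     * For each subject, returns every reachable object with its distance:
     * 0 for a direct association, n for one that passes through n intermediate things.
     */
    static Map<String, Map<String, Integer>> expand(Map<String, List<String>> direct) {
        Map<String, Map<String, Integer>> expanded = new LinkedHashMap<>();
        for (String subject : direct.keySet()) {
            Map<String, Integer> seen = new LinkedHashMap<>();
            ArrayDeque<String> queue = new ArrayDeque<>();
            for (String object : direct.get(subject)) {
                if (seen.putIfAbsent(object, 0) == null) {
                    queue.add(object);
                }
            }
            // Breadth-first search, so the first distance recorded is the shortest.
            while (!queue.isEmpty()) {
                String current = queue.remove();
                int nextDistance = seen.get(current) + 1;
                for (String object : direct.getOrDefault(current, Collections.emptyList())) {
                    // Skip paths that lead back to the subject, so cycles terminate.
                    if (!object.equals(subject) && seen.putIfAbsent(object, nextDistance) == null) {
                        queue.add(object);
                    }
                }
            }
            expanded.put(subject, seen);
        }
        return expanded;
    }
}
```

With the expanded rows stored, "find everything that isA X, directly or indirectly" needs no recursion, at the cost of extra rows that must be kept in step with the direct associations.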
I intend to prototype a new repository web service API for this real soon now, basing it strongly on the
PersRep2 API wherever possible (although many methods are still likely to change).
--
ChrisGreenhalgh - 13 Mar 2003
Views over MIR
As mentioned in the Overview above, the prime motive of the MIR is to store users' data. However, there is also the issue of supporting users in browsing their own data, and the data of others for which they have the requisite permission.
RobertsMIRViews provides an initial target set of alternative views.
--
MarkGreenwood - 30 Apr 2003
ChrisWroemIRInstallationNotes as at 20th May 2003.
--
ChrisWroe - 20 May 2003
Some
notes on architectures for content management in other projects.
--
ChrisWroe - 12 Sep 2003
mIR Generic Query Interface
The mIR Generic Query Interface would use the OGSA-DAI WS-I version. The version which is available on the OGSA-DAI web site (
http://www.ogsadai.org.uk/downloads) is preconfigured for OMII-1. The bundle attached here is reconfigured to work on any platform independent of OMII. I am also attaching a text document on how to use it.
This has a bug in it - please do not use it. I am uploading a new version -- Arijit
The Twiki for some reason isn't allowing me to upload a new version of the zipped file. I am uploading the client code instead, please replace the client in the zipped file with this one.
--
ArijitMukherjee - 24 Jan 2005
mIR Performance and benchmarking
There are some performance issues around mIR which are highlighted with some benchmarking data in the attached document.
--
ArijitMukherjee - 24 Jan 2005
This is the first draft version of the MIR benchmarking document. The previous document is superseded by this one.
Probably the final version of the benchmarking document - mIR is being modified to de-normalize some of the tables and use a serialization/de-serialization patch submitted by Ian (Roberts). The changes are committed to the CVS branch MIRv0_3 and will be merged with the HEAD branch soon.
More on mIR benchmarking - this version of the document contains the results achieved by setting the plug-in property for storing the traces on a PER_WORKFLOW basis. For workflows producing large volumes of data, this requires a larger heap size in Tomcat; for smaller workflows, the default Tomcat setting works.