myGrid Ontology

The myGrid ontology describes the bioinformatics research domain and the dimensions along which a service can be characterised from the perspective of the scientist. The ontology is therefore logically separated into two components: the domain ontology and the service ontology. The domain ontology acts as an annotation vocabulary, including descriptions of core bioinformatics data types and their relationships to one another. The service ontology describes the physical and operational features of web services, such as their inputs and outputs.

The following concepts cover the scope of the myGrid ontology:

  • Informatics: Captures the key concepts of data, data structures, databases and metadata. The data and metadata hierarchies in the ontology contain this information.
  • Bioinformatics: This builds on informatics. As well as data and metadata, there are domain-specific data sources (e.g. the model organism sequencing databases), and domain-specific algorithms for searching and analyzing data (e.g. the sequence alignment algorithm, clustalw). The algorithm and data_resource hierarchies contain this information.
  • Molecular biology: This includes the higher-level concepts used to describe the bioinformatics data types used as inputs and outputs in services. Examples include protein sequence and nucleic acid sequence.
  • Tasks: A hierarchy describing the generic tasks a service operation can perform. Examples include retrieving, displaying, and aligning.
  • Services: The concepts required to describe the function of web services and their parameters. The service ontology is described in more detail below.

The scope of the ontology is limited to supporting service discovery. Each hierarchy contains abstract concepts that describe the bioinformatics domain at a high level. By combining terms from the ontology, service descriptions are constructed that detail:

  1. What the service does
  2. What data sources it accesses
  3. What each of the inputs and outputs should be
  4. Which domain-specific methods the analysis involves

Describing the domain of interest in this way should allow users to find appropriate services for their experiments from a high-level view of the biological processes they wish to perform on their data.
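To illustrate how a high-level query might match more specific service descriptions, the following is a minimal Python sketch. It is not the myGrid implementation. The concept hierarchy, the identifier spellings, and the services other than blastn (blastp, fetch_record) are hypothetical and chosen only for illustration; the real ontology defines its own concepts and relationships.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
# The real myGrid ontology defines its own concept names and structure.
PARENT = {
    "DNA_sequence": "nucleic_acid_sequence",
    "nucleic_acid_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": None,
}

# Hypothetical service descriptions: name -> (task, input concept).
SERVICES = {
    "blastn": ("aligning", "DNA_sequence"),
    "blastp": ("aligning", "protein_sequence"),
    "fetch_record": ("retrieving", "accession_number"),
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies beneath it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def find_services(task, input_concept):
    """Return names of services that perform `task` and whose input
    concept is `input_concept` or one of its specialisations."""
    return sorted(
        name
        for name, (service_task, service_input) in SERVICES.items()
        if service_task == task and is_a(service_input, input_concept)
    )


if __name__ == "__main__":
    # A general query matches services annotated with more specific concepts.
    print(find_services("aligning", "sequence"))
    print(find_services("aligning", "nucleic_acid_sequence"))
```

In this sketch, a query phrased with the general concept "sequence" finds both alignment services, while the narrower "nucleic_acid_sequence" finds only blastn.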

The myGrid Service Ontology

The core entity in the service ontology model is the Operation, which represents a unit of functionality (i.e. the Processor) for the user. Operations can be grouped into units of publication represented by the Service entity. An Operation has input and output parameters. Each input and output parameter, in turn, has a name and a description, and belongs to a namespace denoting its semantic domain type.
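The entities described above can be sketched as plain Python data classes. This is only an illustration of the relationships between Service, Operation and Parameter; the field names, the example namespace strings and the example service name are assumptions, not the ontology's actual property names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str
    namespace: str  # denotes the parameter's semantic domain type


@dataclass
class Operation:
    """A unit of functionality offered to the user."""
    name: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


@dataclass
class Service:
    """A unit of publication that groups operations."""
    name: str
    operations: List[Operation] = field(default_factory=list)


# Hypothetical example values, for illustration only.
query = Parameter("query", "Sequence to search with", "DNA_sequence")
report = Parameter("report", "Search result", "Blast_report")
blastn_op = Operation("blastn", inputs=[query], outputs=[report])
blast_service = Service("ExampleBlastService", operations=[blastn_op])
```

Keeping Parameter separate from Operation mirrors the description above, in which each parameter carries its own name, description and semantic type.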

Creating Descriptions from the Ontology

Combining the domain ontology and the service ontology enables full descriptions of services. For example, the blastn service would be described as follows:

  • The overall task performed by the operation (i.e. the biological operation it performs) = aligning
  • The bioinformatics algorithm used (i.e. the underlying scientific method) = NCBIBlast
  • The data resource it accesses = NCBI GenBank database
  • The number of inputs = 1
  • The number of outputs = 1
  • Input 1 = DNA sequence (fasta format)
  • Output 1 = Blast report
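The blastn description above could be recorded as a simple structure, as in the following Python sketch. The dictionary keys and the is_consistent helper are hypothetical names introduced for illustration; only the values come from the description above.

```python
# The blastn description, using hypothetical key names.
blastn = {
    "task": "aligning",
    "algorithm": "NCBIBlast",
    "data_resource": "NCBI GenBank database",
    "number_of_inputs": 1,
    "number_of_outputs": 1,
    "inputs": [{"type": "DNA sequence", "format": "fasta"}],
    "outputs": [{"type": "Blast report"}],
}


def is_consistent(description):
    """Check that the declared input and output counts match the
    parameters actually listed in the description."""
    return (
        description["number_of_inputs"] == len(description["inputs"])
        and description["number_of_outputs"] == len(description["outputs"])
    )


if __name__ == "__main__":
    print(is_consistent(blastn))
```

A check of this kind is one way to catch a description whose stated parameter counts have drifted from its listed parameters.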

Ontology Downloads

The myGrid ontology can be downloaded in OWL and RDFS.

Please note: The OWL version of the ontology was developed using Protégé 4 (which now supports OWL 1.1). To view this ontology successfully, please download the latest version of Protégé.