To date, Grid development has focused on the basic issues of storage,
computation and resource management needed to make a global
scientific community’s information and tools accessible in a high
performance environment. However, from the e-Science viewpoint, the
purpose of the Grid is to deliver a collaborative and supportive
environment that enables geographically distributed scientists to
achieve research goals more effectively, while enabling their
results to be used in developments elsewhere. myGrid will integrate
and extend technologies to provide an advanced e-Science
environment. We will design, prototype and demonstrate a testbed
that: supports the scientific process of experimental investigation,
evidence accumulation and result assimilation; supports the
scientist’s use of the community’s information; and directly
supports scientific collaboration at the core of research, allowing
dynamic groupings to tackle emergent research problems. Our
objective is to present the necessary infrastructure within a Grid
environment as an e-Scientist’s Workbench that actively supports the
scientific lifecycle.
The particular focus of myGrid is on data-intensive e-Science and
the provision of a distributed environment that supports the in
silico experimental process. The application is bioinformatics,
specifically post-genomic functional analysis, where the building of
value-added repositories and their use in day-to-day research will
only be truly viable when scientists have efficient tools that allow
them seamlessly to: link together databases and analytical tools,
extract relevant information from free text, and harness available
computational resources for CPU-intensive tasks.
The myGrid project aims to build a demonstrator infrastructure that
supports a personalised problem-solving environment for an
e-Scientist. The vision is of a “lab book” environment where the
e-Scientist can construct in silico experiments, find and adapt
those of others, store partial results in local data repositories,
have their own view on public repositories, and be better informed
as to the provenance and the currency of the tools and data directly
relevant to their experimental space. The Grid becomes egocentrically
based around the Scientist – myGrid. The appropriateness of the
infrastructure will be shown in two ways:
- For the e-Scientist, by two information-intensive demonstration applications in (i) model organism (S. cerevisiae) gene expression analysis, and (ii) GPCR fingerprints database annotation
- For developers, by the assimilation of existing integration platforms found in the Life Sciences, specifically (i) the Sanger Centre’s Distributed Annotation System (DAS) and the EBI’s AppLab and OpenBSA, and (ii) the in-house integration platform of GlaxoSmithKline
The project objectives are as follows. For each, the short-term aim
is to deliver a simplified scheme within 24 months, followed by a
more sophisticated scheme within 33 months:
- An extensible open platform for data and tool interoperability built using existing Web services and Grid technologies, following an agent-based software engineering paradigm, and achieved through exploiting our collective skills and technologies in databases, information and knowledge management, interoperation technologies and multi-agent systems
- Support for in silico experiments based on process flows, building on techniques already available in the consortium and the Web services community
- Support for data provenance and resource change management based on notification and process flow evolution, building on agent-based notification and incremental view management experience within the consortium
- Support for personalisation based on the management of views of, and over, information repositories, and personalisation of process flows, building on view management techniques, the annotation of existing data sets, and the creation of personal data sets using existing or user-specific models
- A set of toolkits that can be configured for specific applications, building on the experience of the consortium in user requirements capture and community-based tools
- A testbed demonstration, incorporating appropriate information resources and analysis tools into myGrid, and populating models with experimental processes, data provenance and personal results
myGrid has:
- A focus on the use of computer science, bringing together 5 computer science departments with established track records and complementary skills, and 5 commercial technology providers
- A strong scientific lead, bringing together biologists and bioinformaticians from Manchester and the European Bioinformatics Institute (EBI), and 3 problem holders from pharmaceutical companies who are willing to contribute significant staff effort, use cases and evaluation effort. The EBI are the key holders of biological databases, tools and standards in Europe
--
NickSharman - 15 Nov 2002