This is a summary of information gathered from the NASC website at http://www.arabidopsis.info (in particular, http://arabidopsis.info/garnet/garnet_2003), from emails with Sean May, and from a meeting with researchers at the NASC on 21st January. The summary first introduces the role of the NASC, and then gives some detail about tasks that researchers at the NASC are performing.
****************************
The NASC (Nottingham Arabidopsis Stock Centre) has three major roles relating to Arabidopsis, a flowering plant commonly used for plant genetic research:
- they store donated arabidopsis seeds, which are made available to the scientific community.
- they maintain databases containing the entire arabidopsis genome, though the sequence information is provided by third parties.
- they perform microarray experiments requested by the scientific community, and store all results in a database.
Access to information held by the NASC is through web interfaces which can be reached from http://www.arabidopsis.info. As part of the PLANET project, they are developing bioMOBY web services, which may in future become an alternative interface.
Storage of seeds
Some donated seeds represent different arabidopsis ecotypes (sub-species specialised to a particular environmental range), but the majority represent experimental plants generated by genetic modification. Seeds are catalogued in a database containing information provided by the seed donor, e.g. ecotype and details of mutant alleles. The NASC provides two web interfaces to its seed bank:
- seed searching (e.g. by ecotype, donor, or required mutant allele)
- seed purchasing
Arabidopsis sequence information
The NASC has been using the ENSEMBL database to store arabidopsis gene sequence information provided by TIGR and by MIPS. This is currently being tested, and will be made available soon. The NASC is not currently using ENSEMBL's capabilities for gene prediction and annotation, but is instead using annotation provided by TIGR and MIPS.
Arabidopsis microarray experiments
The NASC has funding to perform arabidopsis microarray experiments on behalf of external scientists. Experiments must be requested through a web interface. Results are stored in a database, and are made available either through a web interface or on CDs posted monthly (this is a subscription service).
***************************
I talked to the following people, who are researchers at the NASC:
Warren Read warren@arabidopsis.info
Warren is working on the seed bank database. The NASC is currently maintaining two equivalent seed bank databases: one in FileMaker, which is used by the seed ordering web interface, and the other in MySQL, which is used by the seed searching web interface. Both are accessed through scripts written in Lasso, which is a commercial scripting language.
Warren is working on two tasks:
- merging the databases into a single version using MySQL, with an enhanced schema based on a database used by TAIR
- writing improved interfaces to this database in Perl.
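As a rough illustration of the merging task, the sketch below combines two sets of records keyed by a shared stock ID and reports fields on which the two sources disagree. It is written in Python for brevity and is not Warren's code: the field names, the sample values, and the rule of preferring the ordering database's value are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_records(
    ordering: Dict[str, Record],
    searching: Dict[str, Record],
) -> Tuple[Dict[str, Record], List[Tuple[str, str]]]:
    """Merge two record sets keyed by stock ID and report disagreements.

    Returns (merged, conflicts). Each conflict is (stock_id, field_name) for
    a field that is non-empty in both sources but holds different values.
    """
    merged: Dict[str, Record] = {}
    conflicts: List[Tuple[str, str]] = []
    for stock_id in sorted(set(ordering) | set(searching)):
        a = ordering.get(stock_id, {})
        b = searching.get(stock_id, {})
        row: Record = {}
        for field in sorted(set(a) | set(b)):
            va, vb = a.get(field, ""), b.get(field, "")
            if va and vb and va != vb:
                conflicts.append((stock_id, field))
            # Assumption: prefer the ordering database, fall back to the other.
            row[field] = va or vb
        merged[stock_id] = row
    return merged, conflicts


if __name__ == "__main__":
    # Made-up sample records; real field names and IDs will differ.
    ordering_db = {"S1": {"ecotype": "type-a", "donor": ""}}
    searching_db = {"S1": {"ecotype": "type-a", "donor": "donor-x"}}
    print(merge_records(ordering_db, searching_db))
```

Listing conflicts separately, rather than silently overwriting, leaves the decision about which source is correct to a person.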
He is planning to look at a web service interface in the future if it is judged to be useful to clients of the NASC.
Beatrice Schildknecht beatrice@arabidopsis.info
Beatrice is working as part of the PLANET project, developing bioMoby web services to access NASC databases.
She is developing web services in Perl, which are registered with PLANET's own instance of the Moby Central database. The majority of her services are classified under the Moby category "retrieval" - they take an ID as an argument, and return an item of data.
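The "retrieval" pattern described above (an ID goes in, one item of data comes out) can be sketched as follows. This is only an illustration in Python: it does not use the bioMoby libraries or message format, and the service name, the stock ID and the record contents are hypothetical.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]

# Hypothetical sample data standing in for a NASC database table.
_STOCKS: Dict[str, Record] = {
    "N0001": {"ecotype": "example-ecotype", "donor": "example-donor"},
}


def get_stock_by_id(stock_id: str) -> Optional[Record]:
    """Return the record for stock_id, or None if the ID is unknown."""
    return _STOCKS.get(stock_id)


# Registry mapping a (hypothetical) service name to its retrieval function.
SERVICES: Dict[str, Callable[[str], Optional[Record]]] = {
    "getStockById": get_stock_by_id,
}


def call_service(name: str, identifier: str) -> Optional[Record]:
    """Look up a registered retrieval service and call it with one ID."""
    if name not in SERVICES:
        raise KeyError(f"no such service: {name}")
    return SERVICES[name](identifier)


if __name__ == "__main__":
    print(call_service("getStockById", "N0001"))
```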
Nick James nick@arabidopsis.info
Nick has been constructing the NASC arabidopsis ENSEMBL.
A complete gene sequence is built up in ENSEMBL by inserting many small sequences ("contigs"), which are then assembled into a whole sequence. Nick has been using contigs from two sources - TIGR and MIPS (see above).
So that the two sequences generated by this method can be compared for accuracy, Nick has had to work out which TIGR contigs match which MIPS contigs. He has done this using BLAST. This involves many runs of BLAST, so he has set up a Condor cluster, to which BLAST jobs are farmed out.
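One way to turn the BLAST results into a contig-to-contig mapping is to keep, for each query contig, the hit with the highest bit score. The sketch below does this in Python, assuming the BLAST runs were saved in the standard 12-column tab-separated tabular format (query ID first, subject ID second, bit score last). It is an illustration of the matching step only, not necessarily the method Nick used, and the contig names in the example are made up.

```python
from typing import Dict, Iterable, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its (subject ID, bit score) with the top score.

    Expects BLAST tabular output: 12 tab-separated columns, where column 0
    is the query ID, column 1 the subject ID and column 11 the bit score.
    Blank lines and comment lines starting with '#' are skipped.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # Keep the first hit seen for a query, or replace it if this one
        # scores strictly higher.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    # Made-up example rows; contig names are hypothetical.
    rows = [
        "tigr_1\tmips_9\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-100\t370",
        "tigr_1\tmips_4\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-20\t110",
    ]
    print(best_hits(rows))
```

Because each BLAST job produces its own output, a function like this can be applied to each result file as it comes back from the cluster and the per-file dictionaries combined afterwards.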
After contigs have been added to ENSEMBL, Nick has added gene annotation from TIGR and MIPS. He is currently working on linking ENSEMBL to the NASC microarray database (e.g. if a user selects a gene in ENSEMBL, they will be able to access any expression data for that gene).
Other points mentioned
A number of staff at the NASC are working with microarray data. They indicated that one of the problems they have is with clustering, which involves identifying genes whose expression responds in similar ways to similar stimuli. The problems are:
- finding clusters - the NASC holds data for thousands of experiments, each with thousands of data points, so this is a difficult computational task
- getting information about the different genes in a cluster - there may be hundreds, so any automatic method to find information about each might be useful
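To make the clustering problem concrete, the sketch below groups genes whose expression profiles are strongly correlated across experiments. It is a deliberately simple greedy scheme in Python (each gene joins the first cluster whose first member it correlates with above a threshold), chosen for illustration; it is not the method used at the NASC, and the gene names, values and the 0.9 threshold are assumptions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_genes(
    profiles: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Greedily group genes by correlation with each cluster's first member."""
    clusters: List[List[str]] = []
    for gene in sorted(profiles):
        for cluster in clusters:
            if pearson(profiles[gene], profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([gene])
    return clusters


if __name__ == "__main__":
    # Made-up expression values, one number per experiment.
    example = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.5],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    print(cluster_genes(example))
```

Each gene is compared against one representative per cluster, so the cost grows with the number of genes times the number of clusters. With thousands of experiments per gene, each comparison is itself expensive, which is part of why the task is computationally hard.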
****************************
Conclusion
Much of the work at the NASC is not relevant to myGrid - they are mostly providers of bioinformatics resources rather than users. There may be potential for collaboration over their use of bioMoby for web services - I have read papers indicating that parts of myGrid may interoperate with Moby in some way.
Next steps: find out more about bioMoby?