r1 - 07 Mar 2003 - 13:07:00 - AlanRobinson

Gathering Annotation about Genes

This demonstrates a walk-through of two different approaches that biologists may take to answer the question, "Tell me everything there is to know about my gene", aka the annotation pipeline of the GravesDiseaseScenario.

If you don't read through & try some of the following, then I believe that you'll never understand & appreciate the problems that many biologists face in using bioinformatics resources in the real world.

(N.B. SWISS-PROT & TrEMBL are separate databases: SWISS-PROT's annotation has been done by curators; TrEMBL's annotation is automatic. SWISS-PROT & TrEMBL are often merged & referred to by a number of names, including: SWALL, SPTR, SPTREMBL, SWISS-PROT/TrEMBL.)

There are a number of approaches this gathering of annotation could take:


1) AlansRandomWalk: Crawl through resources in a ~exhaustive manner, following links & collecting annotation along the way. This is the route that may be followed by people who don't know where to look before they start. Given the nature of the web, there is a network of resources to be traversed, so you may not agree with the ~random path I've outlined.


2) AlansTargetedQueries: Query a number of resources directly using an identifier. This raises the questions:

  • How do you know where the resources are?
  • How do you know how to query them?
  • How did you find that identifier?
  • Is that identifier unique?

For example, the identifier for the complete cDNA sequence of the human Mbd2 gene in EMBL is AF072242, but the identifier for the protein in SWISS-PROT is Q9UBB5. You're going to need a means to establish that mapping, either by parsing the entry's flat file or by using SRS.
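The "parsing the file" route might look something like the sketch below. It assumes the EMBL flat-file entry carries a /db_xref qualifier naming the SWISS-PROT accession. The excerpt is hand-written and heavily abbreviated for illustration (it is not a verbatim copy of entry AF072242, and the exact database label in the qualifier is an assumption), and the function name `embl_cross_references` is hypothetical.

```python
import re

# Hand-written, abbreviated excerpt in EMBL flat-file style.
# Illustrative only: NOT a verbatim copy of the real AF072242 entry,
# and the "SWISS-PROT" label inside /db_xref is an assumption.
SAMPLE_ENTRY = """\
AC   AF072242;
FT   CDS
FT                   /gene="MBD2"
FT                   /db_xref="SWISS-PROT:Q9UBB5"
//
"""

# AC line: primary accession comes first, terminated by ';'
ACCESSION = re.compile(r'^AC\s+(.+)$')
# Feature-table qualifier of the form /db_xref="DATABASE:IDENTIFIER"
DB_XREF = re.compile(r'^FT\s+/db_xref="([^":]+):([^"]+)"')


def embl_cross_references(entry_text):
    """Return (primary_accession, {database: [identifiers]}) for one entry."""
    accession = None
    xrefs = {}
    for line in entry_text.splitlines():
        ac = ACCESSION.match(line)
        if ac and accession is None:
            accession = ac.group(1).split(";")[0].strip()
            continue
        xref = DB_XREF.match(line)
        if xref:
            xrefs.setdefault(xref.group(1), []).append(xref.group(2))
    return accession, xrefs


accession, xrefs = embl_cross_references(SAMPLE_ENTRY)
print(accession, "->", xrefs.get("SWISS-PROT", []))
```

A real entry has many more line types & may cross-reference several databases, so treat this as a starting point rather than a full parser.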

Oh, btw, Mbd2 is not a unique identifier. It may be the entry in the HUGO database of official gene names, but the same gene name is used in mouse, frog & a type of cress. Sometimes these are homologous genes, sometimes not. So searching for Mbd2 in EMBL or SWISS-PROT will return numerous hits. Which are the right ones?

This is why following a trail of hyperlinks can be both easier & more accurate, although not rigorous. However, an experienced biologist can spot the correct identifiers & use them in the right resources, collecting annotation more rapidly than with the first approach. I'll demonstrate this targeted approach in the second walk-through (AlansTargetedQueries).
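The gene-name clash described above is one place where a small script can help: narrow the hits by organism as well as by name. A minimal sketch, assuming each hit has already been reduced to an (accession, gene, organism) record. Only Q9UBB5 is a real identifier taken from this page; the other accessions are placeholders, the species names are my guesses at "mouse, frog & a type of cress", and `select_hits` is a hypothetical helper.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "accession gene organism")

# Only the first accession comes from this page; the rest are placeholders,
# and the species names are assumed examples of mouse, frog & cress.
hits = [
    Hit("Q9UBB5", "MBD2", "Homo sapiens"),
    Hit("PLACEHOLDER_MOUSE", "Mbd2", "Mus musculus"),
    Hit("PLACEHOLDER_FROG", "mbd2", "Xenopus laevis"),
    Hit("PLACEHOLDER_CRESS", "MBD2", "Arabidopsis thaliana"),
]


def select_hits(hits, gene, organism):
    """Keep hits whose gene name and organism both match, ignoring case."""
    gene = gene.lower()
    organism = organism.lower()
    return [
        h for h in hits
        if h.gene.lower() == gene and h.organism.lower() == organism
    ]


human_hits = select_hits(hits, "Mbd2", "Homo sapiens")
print([h.accession for h in human_hits])
```

Filtering on organism only narrows the list. It says nothing about whether the same-named genes in other species are homologous, so that still needs checking by eye or by sequence comparison.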


3) Carry out a bespoke analysis, e.g. run InterProScan. Usually this is going to repeat what many bioinformatics resources provide already. I am not going to demonstrate this here.


Comments wrt Graves' disease scenario

In the two approaches outlined above with either an ~exhaustive web trawl or a targeted search, I did not cover two parts highlighted for the annotation pipeline of the Graves' disease scenario: BLAST & TRANSFAC:

  • BLAST: What do we want to BLAST against? One possibility is PDB; however, you've almost certainly already got the significant hits to PDB via Pfam & SCOP.
  • TRANSFAC: There is no entry in TRANSFAC that I can find for a gene called MBD2. The alternative is to isolate the upstream sequence of the gene & calculate putative sites using MatInspector. This piece of software requires user registration.
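The first half of that TRANSFAC alternative, isolating the upstream sequence, could be sketched roughly as follows. This assumes you already hold the genomic sequence as a plain string & know the gene's coordinates (0-based, half-open, on the forward strand) & its strand. The function names & the toy sequence are hypothetical, and nothing here talks to MatInspector itself.

```python
# Translation table for complementing DNA bases (upper & lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genomic, gene_start, gene_end, strand, length):
    """Return up to `length` bases immediately upstream of a gene.

    gene_start/gene_end are 0-based, half-open, forward-strand coordinates.
    The result reads 5'->3' on the gene's own strand and is shorter than
    `length` if the gene sits near the end of the sequence.
    """
    if strand == "+":
        begin = max(0, gene_start - length)
        return genomic[begin:gene_start]
    if strand == "-":
        end = min(len(genomic), gene_end + length)
        return reverse_complement(genomic[gene_end:end])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence, purely illustrative: the "gene" is the GGG at positions 6-8.
toy = "AAACCCGGGTTT"
print(upstream_region(toy, 6, 9, "+", 3))
print(upstream_region(toy, 6, 9, "-", 3))
```

How many bases to take upstream is a judgment call that this page doesn't settle, so `length` is left to the caller.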

-- AlanRobinson - 07 Mar 2003
