Message 1
Dear Tim
On Friday you suggested a workflow I should try writing. As I understood it, the workflow should look up the IDs of all Swiss-Prot proteins with a given name, then, for each protein ID, look up the nucleotide sequence from which that protein was translated. I've got that working now. For each protein ID found for a given name, the workflow produces a list of the EMBL nucleotide sequences cross-referenced by that ID's Swiss-Prot record.
For example, given the protein name "apolipoprotein", the workflow generates the following protein IDs:
SWISSPROT:ABME_HUMAN
SWISSPROT:ABME_MESAU
SWISSPROT:ABME_MONDO
SWISSPROT:ABME_MOUSE
SWISSPROT:ABME_RABIT
... (loads more)
Then, for SWISSPROT:ABME_HUMAN, it generates the following EMBL IDs:
EMBL:AB009422
EMBL:AB009423
EMBL:AB009424
... (about 5 more)
The sequences these IDs refer to can then easily be retrieved from EMBL using SRS.
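For reference, one way to do the cross-referencing step is to read the DR lines of each Swiss-Prot record. Below is a minimal Python sketch, assuming the record has been fetched in Swiss-Prot flat-file format, where each cross-reference line starts with the `DR` line code. The function name `embl_xrefs` and the sample record are hypothetical, for illustration only; the placeholder fields are not real data.

```python
def embl_xrefs(record_text: str) -> list:
    """Return the EMBL IDs cross-referenced by one Swiss-Prot flat-file record.

    Assumes the flat-file layout in which each cross-reference is a line
    such as 'DR   EMBL; <accession>; <protein id>; ...'.
    """
    accessions = []
    for line in record_text.splitlines():
        # 'DR' line code followed by three spaces; data starts at column 6.
        if not line.startswith("DR   "):
            continue
        # Fields after the line code are separated by semicolons.
        fields = [f.strip() for f in line[5:].split(";")]
        if fields[0] == "EMBL":
            accessions.append("EMBL:" + fields[1])
    return accessions


# Hypothetical, abbreviated record fragment (placeholders, not real data).
SAMPLE_RECORD = """ID   ABME_HUMAN
DR   EMBL; AB009422; PLACEHOLDER1; -; mRNA.
DR   EMBL; AB009423; PLACEHOLDER2; -; Genomic_DNA.
DR   PDB; PLACEHOLDER3; -.
//"""

if __name__ == "__main__":
    print(embl_xrefs(SAMPLE_RECORD))
```

Non-EMBL cross-references (the PDB line here) are skipped, so only nucleotide-sequence IDs reach the SRS lookup.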
I think you suggested aligning these sequences against each other with ClustalW, but I wasn't sure exactly how you meant.
Should I do a separate alignment for each protein (e.g. align EMBL:AB009422, EMBL:AB009423, EMBL:AB009424, ... against each other, then align all the sequences for SWISSPROT:ABME_MESAU against each other, then all those for SWISSPROT:ABME_MONDO, and so on, producing a list of ClustalW results)? Or do you want all of the sequences generated above aligned against each other, producing a single ClustalW result?
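To make the two options concrete, here is a rough Python sketch of how the cross-reference lists could be grouped into ClustalW jobs either way, with each job rendered as FASTA text (a format ClustalW accepts as input). The function names and the `XREFS` mapping are hypothetical. Only the ABME_HUMAN accessions come from the lookup above; the ABME_MESAU values are placeholders. Fetching the actual sequences (e.g. via SRS) is assumed to happen elsewhere.

```python
from typing import Dict, List

# Hypothetical input: Swiss-Prot ID -> EMBL IDs it cross-references.
# The ABME_MESAU entries are placeholders, not real accessions.
XREFS: Dict[str, List[str]] = {
    "SWISSPROT:ABME_HUMAN": ["EMBL:AB009422", "EMBL:AB009423", "EMBL:AB009424"],
    "SWISSPROT:ABME_MESAU": ["EMBL:PLACEHOLDER_1", "EMBL:PLACEHOLDER_2"],
}


def per_protein_jobs(xrefs: Dict[str, List[str]]) -> List[List[str]]:
    """Option 1: one alignment per Swiss-Prot entry (a list of results)."""
    # An alignment needs at least two sequences, so skip entries with fewer.
    return [list(ids) for ids in xrefs.values() if len(ids) >= 2]


def single_job(xrefs: Dict[str, List[str]]) -> List[List[str]]:
    """Option 2: one alignment over every sequence found (one result)."""
    # dict.fromkeys drops duplicates but keeps first-seen order, in case
    # two Swiss-Prot entries cite the same EMBL record.
    merged = dict.fromkeys(acc for ids in xrefs.values() for acc in ids)
    return [list(merged)]


def to_fasta(job: List[str], seqs: Dict[str, str]) -> str:
    """Render one job as FASTA text for a single ClustalW run."""
    # `seqs` maps EMBL ID -> nucleotide sequence, fetched separately.
    return "".join(f">{acc}\n{seqs[acc]}\n" for acc in job)


if __name__ == "__main__":
    print(len(per_protein_jobs(XREFS)), "jobs for option 1")
    print(len(single_job(XREFS)), "job for option 2")
```

Either way, each job becomes one FASTA input and one ClustalW run; the only difference is whether the sequences are grouped by protein or pooled.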
Stef