r2 - 30 Jan 2004 - 11:02:00 - StefanEgglestone
Interview with John Brameld, Tim Parr, Ron Bardsley
Interview in North Lab, Sutton Bonington Campus, at 9:15 on 15th January 2004.

Research

All three scientists are involved in studying the effects of nutrients and hormones on growth of organisms, particularly farm animals. As such, much of their research involves studying expression of genes related to growth hormone production.

Use of bioinformatics

They use bioinformatics for research, and teach it to second and third year students.

They describe themselves as not being sophisticated bioinformatics users, and normally work with small data sets. I have a copy of a third-year assessed practical used for teaching bioinformatics, which they indicate is fairly representative of the types of task that they perform. A scanned, OCR'd version of this practical is attached to this wiki page. Most of their work involves web interfaces to databases, utilities running on servers and accessed through web interfaces, or command line utilities running on their own computers, often accessed through graphical user interfaces.

They are unlikely to ever write code to access web services, or to develop web services themselves.

Storage of intermediate results

When performing simple tasks, intermediate results are stored in Word documents. Problems with this approach are:

confusion when too many documents are open
issues with speed on slow computers
when using alignment algorithms, problems with fonts - if info in a fixed-width font is converted to a variable-width font, it may not be acceptable input for some web interfaces!

When performing more complex tasks, or when they wish to store intermediate results, they log into the Human Gene Mapping Project (HGMP) website, which offers many bioinformatics tools running on servers, and storage space for intermediate results.

Tools used

GCG - a set of tools similar to EMBOSS, from the University of Wisconsin. These tools are free to run on the desktop, but a subscription must be paid to access versions running on a server.

EMBOSS - through the supplied graphical user interface

protein secondary structure packages - only secondary structure software that comes as a whole package. They've tried using a series of simpler pieces of software, passing the output from one as the input to another, but often fail to get this to work.

NIX, PIX - software available through the HGMP website which queries as many resources as possible to get annotation for unknown nucleotide and protein sequences.

How they find services and data

courses, such as the computational biology course run by the HGMP.
directories - genome web, HGMP website, EMBL website. They like directories which give them information about how easy a service is to use. Services tend to be trusted more if they are supplied by well-known organisations. Sequences are trusted more if they produce good matches against sequences in existing trusted databases (a good match is considered to be ~50% of base pairs matching).
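
The "~50% of base pairs matching" rule of thumb above can be made concrete with a small sketch. This is only an illustration: it assumes the two sequences have already been aligned to the same length (with '-' marking gaps), and the function names and the choice to count gaps as mismatches are assumptions, not something the scientists described.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns where both sequences carry the same base.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    Gap columns count as mismatches in this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return matches / len(aligned_a)


def is_good_match(aligned_a: str, aligned_b: str, threshold: float = 0.5) -> bool:
    """Apply the ~50% rule of thumb from the interview (threshold is adjustable)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ATGC", "ATGA"))
    print(is_good_match("ATGC", "TTAA"))
```

In practice the alignment tool itself would normally report an identity figure; the sketch only shows what the threshold means.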

Scenarios currently performed

The third-year practical mentioned above gives a number of scenarios. The scientists also described the following scenarios as very common examples of what they do:

scenario a

When performing a gene expression experiment, they wish to generate oligonucleotides (small strands of DNA, ~ 2-200 base pairs) to which mRNA taken from a cell being studied will bind if the gene from which it was expressed contained a similar sequence to the oligonucleotide. Experiments work more effectively if oligonucleotides are an exact complement of a section of a strand of mRNA, but if an oligonucleotide representing the whole gene were used, polymorphisms between different versions of the gene would make this unlikely. Instead, the scientists try to identify subsections of a gene they wish to study which vary little across the different versions of the gene that exist in different species. For a given protein they wish to study, they:

1. generate a cDNA sequence which is a complement of the strand of mRNA from which the protein would be translated.
2. look up similar cDNA sequences in human, pig, sheep etc cDNA databases
3. use alignment algorithms with sequences found, to find areas of maximum similarity
4. use PCR to generate probe material for the gene expression experiment from these areas.
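
Steps 1 and 3 above could be sketched roughly as follows. This is a minimal illustration, not the scientists' actual method: it assumes the sequences have already been aligned by an external tool (equal length, '-' for gaps), scores each column by how many sequences share the most common base, and picks the highest-scoring fixed-width window. The function names and the example sequences are hypothetical.

```python
from collections import Counter

# mRNA base -> complementary DNA base
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cdna_from_mrna(mrna: str) -> str:
    """Return the complementary DNA strand, written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


def column_conservation(aligned: list[str]) -> list[float]:
    """For each alignment column, the fraction of sequences sharing the most common base."""
    scores = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]  # ignore gaps when counting
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores


def most_conserved_window(aligned: list[str], width: int) -> tuple[int, float]:
    """Return (start index, mean conservation) of the best window of the given width."""
    scores = column_conservation(aligned)
    starts = range(len(scores) - width + 1)
    best = max(starts, key=lambda i: sum(scores[i:i + width]))
    return best, sum(scores[best:best + width]) / width


if __name__ == "__main__":
    # Hypothetical pre-aligned cDNA fragments from three species.
    aligned = ["ATGCCGTA", "ATGCCGAA", "TTGCCGTC"]
    print(cdna_from_mrna("AUGC"))
    print(most_conserved_window(aligned, 4))
```

A real probe-design step would also weigh factors such as melting temperature, which this sketch ignores.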

scenario b

The scientists find a gene or protein sequence in a database, with no other information about it. To find out more about it, they use BLAST against reliable databases to find similar sequences, and then search for more annotation on these.

what they would find useful

Automation of scenario a
Notification of a new sequence having been added to a database, so that scenario b can be run on it.
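
One simple way to picture the notification idea is to compare two snapshots of a database's sequence identifiers and run a workflow on anything new. The sketch below assumes such snapshots are available as sets of IDs; the function name and the `workflow` callable (standing in for scenario b) are hypothetical, and a real database might instead offer its own notification mechanism.

```python
from typing import Callable, Iterable


def run_on_new_sequences(
    previous_ids: Iterable[str],
    current_ids: Iterable[str],
    workflow: Callable[[str], str],
) -> dict[str, str]:
    """Run `workflow` on every ID present now but absent from the earlier snapshot."""
    new_ids = sorted(set(current_ids) - set(previous_ids))
    return {seq_id: workflow(seq_id) for seq_id in new_ids}


if __name__ == "__main__":
    # Hypothetical identifiers; the lambda stands in for a BLAST-and-annotate workflow.
    results = run_on_new_sequences(
        {"seq1", "seq2"},
        {"seq1", "seq2", "seq3"},
        lambda seq_id: f"annotated {seq_id}",
    )
    print(results)
```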

The scientists' biggest problem is with microarray experiments. They have equipment, but don't use it currently because:

1. they are struggling to process the data produced in a way that gives valid results
2. processing of results. Microarray experiments often produce sets containing hundreds or thousands of cDNA strands with high levels of mRNA expression, which should be investigated further. Investigation can then lead to indications that more of the cDNA strands with lower expression levels, but which are related to the original set, should be investigated. Information per strand which they might like to gather might include GO terms for genes containing the strand. The scientists clearly require tools automating this iterative process, as there is just too much information to do it by hand.
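
The iterative process in point 2 might look something like the sketch below. It is only one possible reading: "related" is assumed here to mean "shares at least one annotation term (for example a GO term) with a strand already selected", and the two thresholds, the function name and the example data are all hypothetical.

```python
def expand_candidates(
    expression: dict[str, float],
    terms: dict[str, set[str]],
    high: float,
    low: float,
) -> set[str]:
    """Start from highly expressed strands, then repeatedly add lower-expressed
    strands (still above `low`) that share an annotation term with the selection."""
    selected = {s for s, level in expression.items() if level >= high}
    changed = True
    while changed:
        changed = False
        # Terms seen on anything selected so far (recomputed each pass).
        seen_terms = set().union(*(terms.get(s, set()) for s in selected))
        for strand, level in expression.items():
            if strand in selected or level < low:
                continue
            if terms.get(strand, set()) & seen_terms:
                selected.add(strand)
                changed = True
    return selected


if __name__ == "__main__":
    # Hypothetical expression levels and annotation terms.
    expression = {"a": 9.0, "b": 3.0, "c": 3.5, "d": 0.5, "e": 4.0}
    terms = {"a": {"t1"}, "b": {"t1", "t2"}, "c": {"t2"}, "d": {"t1"}, "e": {"t9"}}
    print(sorted(expand_candidates(expression, terms, high=8.0, low=2.0)))
```

Here strand "c" is only picked up on the second pass, after "b" brings in a shared term, which is the kind of knock-on investigation that is impractical by hand.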

Comments on existing myGrid software

I showed them screenshots of Taverna, which they seemed very interested in, as they thought it might be able to do some of the automation they would like. They liked the fact that it gave access to intermediate results and stored provenance records of which services had been called. They indicated that it would be useful to store not only the name of the service, but also the arguments that had been passed to it (e.g. parameters that had been passed to BLAST), to make it easier for them to interpret results.
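
To illustrate the request, a provenance record that keeps the arguments alongside the service name might be shaped like the sketch below. This is not Taverna's actual provenance format; the class names, field names and the example BLAST parameters are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServiceCall:
    service: str      # name of the service that was invoked
    parameters: dict  # arguments passed to it, as the scientists requested
    input_ref: str    # where the input came from
    output_ref: str   # where the intermediate result was stored


@dataclass
class ProvenanceLog:
    calls: list = field(default_factory=list)

    def record(self, service, parameters, input_ref, output_ref):
        """Append one service invocation to the log and return it."""
        call = ServiceCall(service, dict(parameters), input_ref, output_ref)
        self.calls.append(call)
        return call

    def to_json(self) -> str:
        """Serialise the whole log so it can be stored next to the results."""
        return json.dumps([asdict(c) for c in self.calls], indent=2)


if __name__ == "__main__":
    log = ProvenanceLog()
    # Hypothetical parameter names and file references.
    log.record("BLAST", {"program": "blastn", "expect": 10}, "query.fasta", "hits.txt")
    print(log.to_json())
```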

Next steps

I suggested that I implement part of the scenario in their third-year practical in Taverna, to provide a demonstration for them so they could decide whether to become more involved in the project.

Possible ideas

Implement some of their scenarios as workflows?
Get them to do this, and gather information about how they found the process?
Look at how the myGrid workbench could be used to trigger a workflow on receiving a notification of a change in a database?
