Interview findings:
User requirements for occasional bioinformaticians: Bio5
Bio5 is a Research Associate in the Human Genetics Department of a University.
Scenario
Bio5 is studying the genetic basis of Graves' disease, an abnormality of the thyroid found in nearly 2% of the female population. Graves' disease is characterised by an accumulation of large numbers of lymphocytes (cells that are important for our immune system) in the thyroid. Many of these lymphocytes are thought to help produce antibodies that attack particular thyroid cell surface structures (TSH receptors). In essence, the body's own immune system begins to attack its own thyroid, preventing it from functioning correctly. Certain genes encode proteins that seem to allow lymphocytes (T cells) to carry out this task in Graves' disease patients but not in healthy individuals.
Thus, Bio5 seeks to identify genes in cells of the immune system whose expression is modified in Graves' disease patients.
Bio5 is using microarrays to compare patterns of global gene expression in the lymphocytes of healthy individuals with those from Graves' disease patients to help identify genes that are responsible for allowing T cells to act in this fashion. Bio5 extracts mRNA from peripheral lymphocytes (those in the blood), synthesises cRNA and then probes Affymetrix microarrays. Data is collected using the Affymetrix software, and genes that are differentially expressed are identified using statistical approaches.
Currently Bio5 moves the data into Microsoft Access and performs custom queries to cut down the data set. Genes are then clustered based on their expression levels between patients and controls using the Genesis and R software packages. Interesting genes are then identified by eye.
Affymetrix uses 'probe-ids' as unique ids for each gene. These probe ids must be converted to accession numbers by extracting the accession number from the description field of the database (using a custom program).
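The extraction step described above might be sketched as follows. This is not Bio5's custom program: the regular expression, the function names and the example rows are assumptions made for illustration, and the real layout of the description field may differ.

```python
import re

# Assumed accession shapes: RefSeq-style (two letters, underscore, digits)
# or GenBank-style (one or two letters followed by 5-6 digits), optionally
# followed by a version suffix such as ".1", which is dropped.
ACCESSION_RE = re.compile(r"\b([A-Z]{2}_\d{6,9}|[A-Z]{1,2}\d{5,6})(?:\.\d+)?\b")


def extract_accession(description):
    """Return the first accession-like token in a description, or None."""
    match = ACCESSION_RE.search(description)
    return match.group(1) if match else None


def map_probes_to_accessions(rows):
    """Map each probe id to the accession found in its description."""
    return {probe_id: extract_accession(desc) for probe_id, desc in rows}


# Hypothetical rows, invented for illustration only.
example_rows = [
    ("probe_0001", "gb:NM_000546.1 /DEF=hypothetical description"),
    ("probe_0002", "Cluster Incl. M12345: hypothetical description"),
    ("probe_0003", "no accession in this description"),
]
mapping = map_probes_to_accessions(example_rows)
```

Probes whose description yields no match are mapped to None so that they can be checked by hand rather than silently dropped.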
Once the list of accession numbers has been retrieved, the location data and data about neighbouring genes are looked up in the ENSEMBL database over the web. Bio5 is currently investigating the possibility of establishing a local copy of ENSEMBL to allow this data to be extracted automatically en masse and moved into a custom database.
Bio5 would also like to be able to automatically retrieve the GO terms for interesting genes and to identify pathways that the gene products are involved in. Currently this is done by hand.
Finally, once the data from ENSEMBL has been retrieved, all the data must be collated and scanned manually to find 'important' genes. Once identified, putative important genes will act as the focus for future laboratory based studies.
Use of services
- Bio5 uses local packages like the Affymetrix data mining tool, Access, Excel, GeneSpring, Genesis and R. Web-based tools include ENSEMBL, various tools on the NCBI site (LocusLink, OMIM etc.) and BLAST at EBI and NCBI (NCBI has prettier output).
- Bio5 gets help with other tools such as KEGG, InterPro and GO browsers.
- Quality of service is assessed with ease and speed of use as the major criteria. Ease of interpreting data is also important. Graphical output is preferred, along with lots of hyperlinks to other services using the same data.
- A slow internet connection and incomprehensible interfaces discourage Bio5 from using a service. A fast connection, ease of use, a simple and pretty interface, and lots of online help/explanation encourage Bio5 to go back to a site.
- If she finds something useful, Bio5 will recommend it to colleagues by word of mouth.
- Bio5 hasn't really found a poor service in terms of the analysis provided. As pointed out above, usability is the main criterion.
- Bio5 would like to see more ways of linking the services together without having to reformat data in databases or Excel spreadsheets.
Discovery of services
- If Bio5 is happy with a service she won't look for a new one.
- Other people may suggest alternatives and new services e.g. at coffee
- When looking for a new service Bio5 asks colleagues or her boss, uses Google, or asks a handy bioinformatician for help.
Tracking changes to services
- Bio5 would like to be notified when changes 'happen' to the genes that she is interested in, e.g. a 'watch this gene' scenario equivalent to 'watch this item' on eBay. Changes of interest include contig reassembly, annotation, new papers, new homologues etc.
- Information about the human genome changes rapidly at the moment as it's still being sequenced.
- Once a week would be fine for notification
- An email would be the best form of notification. Should include hyperlinks to the changed data/services otherwise may never get round to tracking it down.
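A weekly digest of the kind requested above might be assembled as in the sketch below. The shape of the change records, the function name and the example.org addresses are hypothetical; detecting the changes and actually sending the message are outside its scope.

```python
from email.message import EmailMessage


def build_weekly_digest(recipient, changes):
    """Build one email listing each change with a hyperlink to the data."""
    lines = []
    for change in changes:
        lines.append(f"{change['gene']}: {change['kind']}")
        # The link goes on its own line so mail clients make it clickable.
        lines.append(f"  {change['url']}")
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Watched genes: {len(changes)} change(s) this week"
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical change records, invented for illustration only.
example_changes = [
    {"gene": "GENE_A", "kind": "contig reassembly",
     "url": "https://example.org/genes/GENE_A"},
    {"gene": "GENE_B", "kind": "new paper",
     "url": "https://example.org/genes/GENE_B"},
]
digest = build_weekly_digest("bio5@example.org", example_changes)
```

Batching all changes into a single message matches the stated preference for a once-a-week notification rather than one email per change.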
Recording of the in silico process
- Bio5 only records the process once she has finally worked out how to do it (by a trial and error process). She writes it down in her lab book.
- Not very much time is spent recording the in silico process (nowhere near as much time as is spent recording experimental results) - it's too difficult and tedious to describe. Easier to remember it.
- Would be nice if process was recorded for you automatically in a format that you could print out and stick in your lab-book.
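Automatic recording of this kind might look like the sketch below, which keeps a list of steps and renders them as plain text for printing. The class and method names are hypothetical, and the example steps are taken from the scenario only as an illustration; no existing tool is implied.

```python
from datetime import datetime


class ProcessLog:
    """Collects in silico steps and renders them as printable text."""

    def __init__(self):
        self.steps = []

    def record(self, tool, action, when=None):
        """Store one step; the timestamp defaults to the current time."""
        self.steps.append((when or datetime.now(), tool, action))

    def to_text(self):
        """Return a numbered, timestamped record suitable for a lab book."""
        lines = ["In silico process record"]
        for number, (when, tool, action) in enumerate(self.steps, start=1):
            lines.append(f"{number}. [{when:%Y-%m-%d %H:%M}] {tool}: {action}")
        return "\n".join(lines)


# Example steps with fixed, made-up timestamps.
log = ProcessLog()
log.record("Access", "custom query to cut down the data set",
           when=datetime(2003, 1, 5, 10, 30))
log.record("Genesis", "cluster genes by expression level",
           when=datetime(2003, 1, 5, 11, 15))
record_text = log.to_text()
```

Each tool would need to call record() itself for the log to be truly automatic; the sketch only shows the record format.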
Management of provenance
- Bio5 does appreciate the limitations of the available data, but it's better than nothing. After all, the bioinformatics work only provides clues and pointers to the next lab experiment. Everything has to be re-checked experimentally at the end of the day, including re-sequencing, confirmatory PCR and checking expression 'properly' using Northern and Western blots.
- You can see where data has come from. Some labs have 'better' reputations than others.
- Work is published in papers and presented at conferences.
- Trusts data from big institutions like EBI and NCBI
Data Storage
- Data is stored in files (Excel, Word and raw HTML saved from web pages) on the hard disk of the local machine or backed up on Zip disks.
- Bio5's bioinformatician helper stores data in databases for her.
- Changed datasets are saved as a file with a new name. Would be nice to have data changes recorded automatically for you.
Sharing of personalised data
- Excel files and Word files are sent to her boss.
- Data is not usually shared with anybody else unless they ask for it. They are given it as a printout or on disk.
--
AnilWipat - 05 Jan 2003