r9 - 15 Aug 2006 - 13:42:09 - StianSoiland
High Level Requirement Specification

R libraries

Reference Techreq.R
Referenced Use-cases QtlMicroarray, LandscapeGenomics, MicroArray
Dependencies ExpressionProfiler, BioConductor
Champion StianSoiland
Status DEFERRED

                Taverna 1.3   Taverna 2.0
Priority        2
Rough estimate  MONTH(s)      -

Overview

Provide statistical analysis tools from R to Taverna, making it easier for biologists who don't know R to do statistical analysis. The tools should appear as a browsable list of functions (i.e. services), most importantly those exposed from BioConductor.

Overall Goals

Short Term goals:

  • Implementation of workflows using ExpressionProfiler components where possible, and interfacing with the R environment where no EP component is available, so some R functionality can be exposed as services in Taverna for workflow composition.
  • Expose some existing EP components in Taverna so they can be composed using the Taverna editor and submitted to the Taverna engine for execution.
  • Documentation on how to expose EP/R components in Taverna, e.g. expressing the components in Scufl and submitting them for execution using the workflow engine.
  • Documentation on how to use the Taverna Java API and the consumer tool.

Medium to Long-term goals:

  • It should come with a set of pre-packaged services specific to microarray analysis, so that it's easy for microarray data analysis users to get on with their work. For instance, there could be services for performing various QA tests, normalisation, t-tests, RMA, clustering (hierarchical/k-means), PCA and annotation.
  • A pre-configured library (or set of services) exposing R and BioConductor functionality needs to be provided. This is essential because the research community shares most algorithms as R scripts or packages for BioConductor.
  • It should be easy to add new packages and to expose existing algorithms written as Perl/R scripts as services. Good documentation and demonstrations of how to do this would be important.
  • The performance should be good for microarray data analysis, which involves performing lots of statistical tests on a set of fairly large (~10-200MB) data files. Using XML messages to pass them between services might produce significant overhead.
  • By design, data communication between services should be minimal, i.e. services should be able to pass references to their input data rather than the data itself.
  • Monitoring and logging of jobs via a web-based interface and a documented API (Java/WS/etc.) to these services.
  • A set of demonstration applications should be part of the distribution. This would make it easy for new users to see how to do things in the correct way.
  • There should be good documentation (manuals, tutorials, how to etc.) available for users to do their work effectively.
  • Service discovery could be improved. It could be difficult for users/developers to design workflows when there is not enough description provided by service providers. Perhaps a minimal expected description could be recommended/required to publish a service in the registry.
  • The workflow editor could be improved for usability.
  • The timelines or project plan should be published so that the user community knows what to expect and when, and can provide feedback.
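
The goal of passing references rather than data between services can be illustrated with a minimal sketch. This is not Taverna, ExpressionProfiler or Rserve code: the names DataRef, publish, resolve and count_lines_service are hypothetical, and a local directory stands in for whatever shared datastore a real deployment would use. The idea is only that a large file is written once and services exchange a small handle with a checksum.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical handle passed between services instead of the data itself."""
    path: str
    size: int
    sha256: str


def publish(data: bytes, store_dir: str) -> DataRef:
    """Write the data once to a shared store and return a small reference to it."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, digest)
    with open(path, "wb") as fh:
        fh.write(data)
    return DataRef(path=path, size=len(data), sha256=digest)


def resolve(ref: DataRef) -> bytes:
    """Fetch the referenced data and check that it matches the recorded checksum."""
    with open(ref.path, "rb") as fh:
        data = fh.read()
    if hashlib.sha256(data).hexdigest() != ref.sha256:
        raise ValueError("stored data does not match reference checksum")
    return data


def count_lines_service(ref: DataRef) -> int:
    """Stand-in for a downstream service: it receives only the reference."""
    return resolve(ref).count(b"\n")


if __name__ == "__main__":
    store = tempfile.mkdtemp()  # assumption: a directory visible to all services
    ref = publish(b"gene\tvalue\nA\t1.0\nB\t2.0\n", store)
    print(ref.size, count_lines_service(ref))
```

Only the DataRef would travel in the message between services; the data file itself stays in the store until a service actually needs to read it.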

Assessment

R support is probably wholly covered by the ExpressionProfiler and BioConductor, which are listed as separate requirements. This requirement is left open to ensure that all R requirements are met by these two.

Affected Components

Taverna workbench, Enactor?, Feta (potentially Kave and datastore?)

Key Tasks

  1. Examine and choose between the different paths that satisfy this requirement, such as: (3 days)
    • BioConductor and ExpressionProfiler web services could satisfy most needs
    • Locally running Rserve with scavenger that browses registered functions and libraries, allows any function/script
    • Site-local running Rserve, allows only security-approved functions
    • Globally exposed Rserve, but then why not a web service?
  2. Introspect BioConductor/ExpressionProfiler/R for automatically generating either web services or local processors (2 weeks).
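
As a rough illustration of the "security-approved functions" path combined with introspection, the sketch below keeps a registry of approved functions, lists them with their input names for browsing, and refuses to invoke anything else. It is an assumption-laden stand-in: plain Python callables play the role of R functions, Python's inspect module plays the role of R introspection, and ApprovedFunctionRegistry is a hypothetical name, not part of Rserve or Taverna.

```python
import inspect
import statistics
from typing import Any, Callable, Dict, List


class ApprovedFunctionRegistry:
    """Hypothetical site-local registry: only approved functions can be invoked."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def approve(self, name: str, func: Callable[..., Any]) -> None:
        """Register a function that has passed the site's security review."""
        self._functions[name] = func

    def browse(self) -> List[Dict[str, Any]]:
        """Return a browsable listing, with input names found by introspection."""
        listing = []
        for name in sorted(self._functions):
            params = list(inspect.signature(self._functions[name]).parameters)
            listing.append({"name": name, "inputs": params})
        return listing

    def invoke(self, name: str, **kwargs: Any) -> Any:
        """Call an approved function by name; reject everything else."""
        if name not in self._functions:
            raise PermissionError(f"function {name!r} is not approved")
        return self._functions[name](**kwargs)


# Stand-ins for statistical functions that a site might approve.
def mean(values):
    return statistics.mean(values)


def median(values):
    return statistics.median(values)


if __name__ == "__main__":
    registry = ApprovedFunctionRegistry()
    registry.approve("mean", mean)
    registry.approve("median", median)
    print(registry.browse())
    print(registry.invoke("mean", values=[1.0, 2.0, 3.0]))
```

The listing returned by browse() is the kind of information a scavenger would need in order to present each function as a service with named inputs. Whether the same approach suits a locally running or a site-local Rserve is exactly what task 1 would have to determine.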

Appendix

See also BioConductor, ExpressionProfiler and MicroArray.
