myGrid

Taverna does not handle large amounts of data well.

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.6
  • Component/s: None
  • Labels: None

Description

Dealing with large amounts of data should be better handled in T2 with the new enactor.

However, in the meantime, we need to investigate the possibilities for dealing better with large amounts of data. One option is a proxy plug-in that creates references for large blocks of data before they are passed to the Taverna core.

Some more detail:

The proxy would sit between Taverna and the guilty web service. Taverna would not talk directly to the web service but would talk to the proxy via the proxy's own generated WSDL. The proxy would relay the SOAP request from Taverna to the offending web service. The proxy would be configured to manipulate the incoming SOAP responses, removing the offending data and replacing it with a reference. That reference would tie to a permanent record of the data stored on the proxy, and the data would be accessible via that reference.
Ideally the proxy should also be able to handle attachments as well as data embedded within the SOAP response, so it needs to manipulate the HTTP responses as well as the SOAP messages.
The proxy should also stream the incoming data to the data store, so that it never holds the entire response in memory. Otherwise the memory problem simply moves from Taverna to the proxy.
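
As a rough illustration of the streaming step only, the sketch below copies an incoming response body to a file-backed store in fixed-size chunks and hands back a reference in its place. It is not the Data Proxy's actual code: the names `stream_to_store` and `resolve`, the chunk size, and the use of a random hex string as the reference are all assumptions made for the example, and SOAP/attachment parsing is left out entirely.

```python
import os
import uuid

CHUNK_SIZE = 64 * 1024  # assumed chunk size in bytes, chosen for illustration


def stream_to_store(source, store_dir, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like object into store_dir and return a reference.

    Only one chunk is held in memory at a time, so the size of the
    response does not determine the proxy's memory use.
    """
    ref = uuid.uuid4().hex  # hypothetical reference format
    path = os.path.join(store_dir, ref)
    with open(path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
    return ref


def resolve(ref, store_dir):
    """Open the stored data behind a reference for reading."""
    return open(os.path.join(store_dir, ref), "rb")
```

In this sketch the proxy would put the returned reference into the SOAP response in place of the large value, and a later consumer would call `resolve` to read the data back, again as a stream.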

Activity

June Finch added a comment - 2006-09-04 10:59

Stuart - please modify description text as appropriate when you have done some initial investigation. Is the issue large collections of data or large numbers of small pieces of data? Thanks - June.

Stuart Owen added a comment - 2006-09-15 10:57

This covers all the steps required to get it up and running, and the time scales also include incorporating tests and good error handling and resilience.
For now my intention is to store runtime and workflow data as files, but I will architect this using a data adaptor layer so that alternative means can be easily incorporated in the future if need be. I expect the best mechanism to become apparent as it is being created and once it starts being used.
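
Editorial illustration, not part of the original comment: one way a data adaptor layer of this kind might look is an abstract interface with a file-backed implementation and a second backend that can be swapped in later. The class and method names (`DataAdaptor`, `FileAdaptor`, `MemoryAdaptor`, `write`, `read`) are hypothetical and are not taken from the Data Proxy.

```python
import os
from abc import ABC, abstractmethod


class DataAdaptor(ABC):
    """Storage interface the rest of the proxy would depend on."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None:
        """Store data under key."""

    @abstractmethod
    def read(self, key: str) -> bytes:
        """Return the data stored under key."""


class FileAdaptor(DataAdaptor):
    """Stores each item as one file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Assumes keys are plain names without path separators.
        return os.path.join(self.root, key)

    def write(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def read(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


class MemoryAdaptor(DataAdaptor):
    """Alternative backend, included only to show the swap."""

    def __init__(self) -> None:
        self._items = {}

    def write(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def read(self, key: str) -> bytes:
        return self._items[key]
```

Code written against `DataAdaptor` would not need to change if the file store were later replaced by, say, a database-backed adaptor.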

Stian Soiland-Reyes added a comment - 2007-08-28 16:26

Was implemented by Stuart as the Data Proxy.


Dates

  • Created: 2006-09-04 10:58
  • Updated: 2007-08-28 16:26
  • Resolved: 2007-08-28 16:26