Scufl Semantics (Khalid Belhajjame)
Following our discussion about Scufl at the last myGrid meeting, I went through the Taverna user manual and the myGrid publications.
I did not find a description of Scufl semantics. Such a description would be useful because we often wonder how Taverna behaves in particular cases.
In the following I present an example of a question that we discussed during an ISPIDER meeting.
Consider the protein identification workflow (I attached a figure that illustrates the workflow). The operation identifyUniprotProtein
outputs a list of UniProt accession numbers; the members of this list are then processed using uniprot2GO to get the corresponding
Gene Ontology accession numbers. More than one Gene Ontology accession number can be associated with a UniProt accession number. The operation
formatResult is then used for formatting the results. It takes as input the UniProt accession numbers and the Gene Ontology accession
numbers and should associate every UniProt accession number with the corresponding Gene Ontology accession numbers: it is a basic
operation that simply concatenates strings.
As a concrete example, the inputs of formatResult can be the two lists [Q9XAB7, Q5NGG6] and
[[GO:0006098, GO:0017057, GO:0016787],[GO:0008652, GO:0009073]].
By default, Taverna iterates the execution of formatResult over all combinations of input values (a cross product).
As a result we obtain the following four elements:
[Q9XAB7: GO:0006098 GO:0017057 GO:0016787,
Q5NGG6: GO:0006098 GO:0017057 GO:0016787,
Q9XAB7: GO:0008652 GO:0009073,
Q5NGG6: GO:0008652 GO:0009073]
The second and third elements of this list do not make sense, as they associate a UniProt accession number with non-corresponding Gene Ontology IDs.
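The all-combinations behaviour described above can be sketched in a few lines of Python. This is only an illustration of the iteration strategy, not Taverna's implementation: format_result is a hypothetical stand-in for the formatResult operation, and the loop order is chosen simply to reproduce the order of the four elements listed above.

```python
from itertools import product


def format_result(accession, go_terms):
    # Hypothetical stand-in for formatResult: plain string concatenation.
    return f"{accession}: {' '.join(go_terms)}"


accessions = ["Q9XAB7", "Q5NGG6"]
go_lists = [
    ["GO:0006098", "GO:0017057", "GO:0016787"],
    ["GO:0008652", "GO:0009073"],
]

# Cross product: every accession number is paired with every GO list.
# The GO lists form the outer loop only to match the ordering shown above.
cross = [format_result(acc, terms) for terms, acc in product(go_lists, accessions)]

for line in cross:
    print(line)
```

Two input lists of two items each give 2 x 2 = 4 invocations, two of which pair an accession number with the GO terms of the other protein.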
Having checked the Taverna documentation, I found that Taverna allows the iteration strategy to be changed to what is called ‘dot product’.
This is done by modifying the metadata of the formatResult operation using the editor. Using this strategy, the first item from each input list is
processed using formatResult, then the second item from each list, and so on. The input lists should be of the same size.
The dot product strategy meets our needs: enacting the workflow with this strategy, we obtain the following results:
[Q9XAB7: GO:0006098 GO:0017057 GO:0016787,
Q5NGG6: GO:0008652 GO:0009073]
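The dot product strategy can be sketched in the same way, pairing items by position instead of taking all combinations. Again this is a minimal Python illustration rather than Taverna's code: format_result and dot_product are hypothetical names, and raising an error when the lists differ in size is an assumption of this sketch, since the documentation only says the lists should be of the same size.

```python
def format_result(accession, go_terms):
    # Hypothetical stand-in for formatResult: plain string concatenation.
    return f"{accession}: {' '.join(go_terms)}"


def dot_product(accessions, go_lists):
    # Assumption: reject inputs of different sizes. What Taverna itself
    # does in that case is not covered here.
    if len(accessions) != len(go_lists):
        raise ValueError("dot product requires input lists of the same size")
    # Pair the i-th accession number with the i-th GO list.
    return [format_result(acc, terms) for acc, terms in zip(accessions, go_lists)]


accessions = ["Q9XAB7", "Q5NGG6"]
go_lists = [
    ["GO:0006098", "GO:0017057", "GO:0016787"],
    ["GO:0008652", "GO:0009073"],
]

dot = dot_product(accessions, go_lists)

for line in dot:
    print(line)
```

With two items per input list this yields two invocations instead of four, each pairing an accession number with its own GO terms.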
The above gives an example of a question one might have concerning the behaviour of a workflow.
I think a semantic description of the Scufl language itself would help, as it would allow us to determine that behaviour in a systematic way.