How do I use XPath to extract information from XML?
XPath is a way to express a path to
elements and parts of an XML document. Taverna provides a processor for
this purpose, located under
Local Services ->
Local Java widgets ->
xml ->
XPath From Text.
There is an
XPath tutorial
available online that might be helpful; here we'll just walk through a simple
example. Suppose a service returns this XML:
<result>
<users>
<user>Katy</user>
<user>Stian</user>
</users>
<groups>
<group id="mygrid">
<user>Katy</user>
<user>Carole</user>
</group>
</groups>
</result>
Say we are only interested in getting the list
{Katy, Stian} out of this
document, i.e. the content of the
user elements below
users, but not
the ones below
group.
XPath allows you to express this in a path, similar to a path on a
filesystem. The XPath expression:
/result/users/user
would select those two
user elements. In this case we specified the
full path, which is normally the most precise and avoids
unrelated matches such as the two other
user tags further down. Another
path:
//user
would select
all user elements, and thus result in the list
{Katy, Stian, Katy, Carole}. In many cases this short form (the empty
step between the two slashes in
// means 'at any depth below') can be good enough, but as
this example shows, it can be fragile if additional information
is added to the XML at a later time.
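To try the two paths outside Taverna, here is a minimal sketch using Python's standard xml.etree.ElementTree module. This is not the Taverna processor, only an illustration of the same selection. One assumption to be aware of: ElementTree supports only a subset of XPath and evaluates paths relative to an element, so /result/users/user is written as ./users/user starting from the result root element, and //user as .//user. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The example document from above.
XML = """<result>
  <users>
    <user>Katy</user>
    <user>Stian</user>
  </users>
  <groups>
    <group id="mygrid">
      <user>Katy</user>
      <user>Carole</user>
    </group>
  </groups>
</result>"""

root = ET.fromstring(XML)  # root is the <result> element

# Counterpart of /result/users/user, written relative to <result>:
# only the <user> elements directly below <users>.
full_path = [u.text for u in root.findall("./users/user")]

# Counterpart of //user: every <user> element at any depth.
anywhere = [u.text for u in root.findall(".//user")]

print(full_path)  # ['Katy', 'Stian']
print(anywhere)   # ['Katy', 'Stian', 'Katy', 'Carole']
```

The second list picks up the users inside the group as well, which is the fragility described above: any user element added elsewhere in the document later would also show up in it.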
Note that currently the processor in Taverna has an issue with XML
documents that use namespaces. A workaround is suggested in
TAV-492.