LogBook DataLineage
DataLineage is a visualisation component added to LogBook in version 1.2.4. It allows you to:
- display the data derivation graph of a workflow run
- view the actual data values
- find similar data
Launching
The data lineage visualisation of a workflow run is launched by pressing the DataLineage button in the GUI.
A new window then opens with a graph on the left. The current version uses only a very basic graph layout, so the graph will probably look messy, and a very large graph may be almost unreadable. You can, however, arrange the graph manually with the mouse.
The nodes of the graph represent data; an arrow from a node A to a node B represents the fact that A is (directly) derived from B.
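The derivation graph can be modelled roughly as follows. This Java sketch is not taken from the DataLineage source: the class and method names (DerivationGraph, addDerivation, directSources, allSources) are hypothetical. Each arrow A -> B is stored as "A is directly derived from B", and following the arrows transitively yields everything a piece of data was ultimately derived from.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the graph shown by DataLineage (not LogBook's own API):
// nodes are data names, and an arrow A -> B records that A is directly derived from B.
class DerivationGraph {
    // For each data name, the names it is directly derived from.
    private final Map<String, Set<String>> derivedFrom = new HashMap<>();

    // Record the arrow derived -> source; the source is also registered as a node.
    void addDerivation(String derived, String source) {
        derivedFrom.computeIfAbsent(derived, k -> new LinkedHashSet<>()).add(source);
        derivedFrom.computeIfAbsent(source, k -> new LinkedHashSet<>());
    }

    // Data that `name` is directly derived from (empty if none or unknown).
    Set<String> directSources(String name) {
        return derivedFrom.getOrDefault(name, Set.of());
    }

    // All data that `name` is directly or indirectly derived from, found by
    // following arrows breadth-first; the `seen` set guards against cycles.
    Set<String> allSources(String name) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directSources(name));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (seen.add(next)) {
                queue.addAll(directSources(next));
            }
        }
        return seen;
    }
}
```

For example, after recording A -> B and B -> C, allSources("A") returns both B and C, which matches the arrows a user would follow by hand in the graph.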
We distinguish between intermediate data and workflow inputs and outputs.
Intermediate Data
Intermediate data is shown in yellow for data items and in green for data collections (lists). Its name consists of the processor name followed by a colon and the corresponding output port name, e.g.
getImageLinks:imageLinks.
Workflow Inputs and Outputs
The names of workflow inputs and outputs are just the port names, e.g.
todaysDilbert.
Workflow outputs are light blue with rounded edges, and workflow inputs are a darker blue with even more rounded edges.
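The naming and colour conventions for the different node types can be summarised in a short Java sketch. NodeKind and NodeNames are hypothetical names, not part of LogBook, and the sketch assumes that processor names never contain a colon, so that the first colon in an intermediate data name separates the processor from the output port.

```java
// Hypothetical summary of DataLineage node conventions; names are illustrative only.
// Colours follow the description in the text.
enum NodeKind {
    INTERMEDIATE_ITEM("yellow"),
    INTERMEDIATE_COLLECTION("green"),
    WORKFLOW_OUTPUT("light blue"),
    WORKFLOW_INPUT("dark blue");

    final String colour;

    NodeKind(String colour) {
        this.colour = colour;
    }
}

final class NodeNames {
    private NodeNames() {}

    // Intermediate data: processor name, a colon, then the output port name.
    static String intermediate(String processor, String outputPort) {
        return processor + ":" + outputPort;
    }

    // Assumes processor names contain no colon: everything before the first colon
    // is the processor. Workflow inputs/outputs (bare port names) yield null.
    static String processorOf(String nodeName) {
        int colon = nodeName.indexOf(':');
        return colon < 0 ? null : nodeName.substring(0, colon);
    }
}
```

Workflow inputs and outputs need no helper, since their node name is simply the port name.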
Whenever a node is selected, the corresponding data is displayed on the right-hand side of the window.

Again, note that you may need to adjust the window to display the data properly.
Failures
Data associated with failed processor runs is shown in red.
Similar Data
Clicking the Similar Data button gives a list, in reverse chronological order (newest first), of all data similar to the selected item. At present the notion of similarity is purely syntactic, i.e. we look for data with exactly the same (complete) name.
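The similarity search can be sketched in Java as a filter on exact name equality followed by a newest-first sort. DataRecord and SimilarData are hypothetical types, not LogBook's actual data model; the sketch also assumes that each record carries the timestamp of its workflow run and that the selected record itself is left out of the result.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record of one stored piece of data: its complete node name,
// the workflow run it belongs to, and when that run took place.
final class DataRecord {
    final String name;
    final String runId;
    final Instant timestamp;

    DataRecord(String name, String runId, Instant timestamp) {
        this.name = name;
        this.runId = runId;
        this.timestamp = timestamp;
    }
}

final class SimilarData {
    private SimilarData() {}

    // Purely syntactic similarity: keep records whose complete name equals the
    // selected one (excluding the selected record itself), newest first.
    static List<DataRecord> find(DataRecord selected, List<DataRecord> all) {
        List<DataRecord> result = new ArrayList<>();
        for (DataRecord r : all) {
            if (r != selected && r.name.equals(selected.name)) {
                result.add(r);
            }
        }
        result.sort(Comparator.comparing((DataRecord d) -> d.timestamp).reversed());
        return result;
    }
}
```

Because matching is on exact names, data produced by the same processor port in earlier runs is found, while identical values stored under a different name are not.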
Double-clicking an item in the list opens a new data lineage visualisation window for the selected data, together with its workflow run.
You can then arrange this window and compare it with the previous one.
Source Code
Anonymous CVS:
- Host: cvs.mygrid.info
- Path: /usr/local/cvs/mygrid
- Module: datalineage
Author
Acknowledgements
Thanks to the other members of the myGrid team.
Code debugged using the YourKit Java Profiler.