Karma is a research product of Indiana University at Bloomington. Originally conceived by Dennis Gannon, it was Yogesh Simmhan's Ph.D. work.
Yogesh presented Karma at the Lilly kick-off meeting. Here are the slides.
Interestingly, Karma does not require a static model of a workflow before it can start collecting provenance. This is key to the Lilly project, where no such static structure exists. Instead, in LSG Karma will capture the user and system (i.e., WS) events that LSG has been instrumented to emit.
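To make the "no static model" point concrete, here is a minimal sketch of event-driven capture, where a trail is simply whatever events have arrived so far. This is not Karma's API: `ProvenanceEvent`, `ProvenanceStore`, the field names, and the sample event values are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """One notification from an instrumented component (hypothetical shape)."""
    source: str    # who emitted it, e.g. a user session or a web service
    kind: str      # e.g. "invoked", "produced"
    entity: str    # the service or data item the event is about
    trail_id: str  # groups events that belong to the same activity


class ProvenanceStore:
    """Accumulates events as they arrive; no workflow graph is declared up front."""

    def __init__(self):
        self._trails = defaultdict(list)

    def record(self, event):
        # The trail grows one event at a time, in arrival order.
        self._trails[event.trail_id].append(event)

    def trail(self, trail_id):
        # Return a copy so callers cannot mutate the stored trail.
        return list(self._trails[trail_id])


store = ProvenanceStore()
store.record(ProvenanceEvent("user:alice", "invoked", "ws:search", "t1"))
store.record(ProvenanceEvent("ws:search", "produced", "data:hits", "t1"))
store.record(ProvenanceEvent("user:bob", "invoked", "ws:align", "t2"))
```

Nothing here knows in advance which services exist or in what order they run. The structure of a trail emerges only from the events, which is also why the scoping question in the list below matters.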
Here are some of my questions, not all answered yet:
- scalability:
- the model is relational
- recursive queries by "brute force". Queries involve extracting a whole workflow trail into memory (into a service's space) in order to query it
- some published results on scalability, no big deal so far (need to fetch paper)
- however, what about queries over multiple workflows?
- how is the scope defined when there is no "workflow structure" at all? (see below for an example)
- granularity of provenance: Karma assumes a fine-grained state machine and therefore the ability to instrument the middleware at a fine level of granularity.
- cost of instrumenting middleware to extract provenance. There seems to be a high cost involved in externalizing the state of the components that need to be monitored. Has that been measured in some way?
- What is the effort involved in instrumenting at that fine level?
- more importantly, what happens when only a coarser grain is available from the execution infrastructure?
- granularity of resources for which provenance is captured?
- naming scheme for GOID?
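
For reference, this is how I understand the "brute force" recursive query from the scalability notes: pull one whole trail out of the relational store into memory, then do the recursion in application code. It is a sketch under my own assumptions, not Karma's schema or code. The `derivation` table, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical relational store: one row per "product was derived from source".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE derivation (trail_id TEXT, product TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO derivation VALUES (?, ?, ?)",
    [
        ("t1", "report", "alignment"),
        ("t1", "alignment", "sequences"),
        ("t1", "sequences", "raw_upload"),
        ("t2", "other_report", "other_input"),
    ],
)


def ancestors(conn, trail_id, item):
    """Return everything `item` was derived from, directly or indirectly."""
    # Step 1: extract the whole trail into memory with one flat SELECT.
    rows = conn.execute(
        "SELECT product, source FROM derivation WHERE trail_id = ?", (trail_id,)
    ).fetchall()
    parents = defaultdict(set)
    for product, source in rows:
        parents[product].add(source)

    # Step 2: do the recursive part in application code, not in SQL.
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


result = ancestors(conn, "t1", "report")
```

Memory use grows with the size of the trail, and a query spanning several workflows would have to load each trail this way, which is exactly the concern raised above.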