(from a mail I tried to send to taverna-hackers):
Hi guys, t2 data people especially!
I was just drawing a diagram of our data reference architecture to get the thing clear in my mind and a few things stuck out that I'd quite like to change slightly, in particular the mechanism by which we translate one reference type to another.
Firstly there's an open question as to how we treat collections - if I say I require URLs and I ask for a collection, it's not unreasonable to expect that each leaf node (data document) in the collection will contain a URL reference scheme, but that's not how the system currently works.
Secondly there's an issue with write-back. Ideally, if I cause a reference to be translated I want the translated reference to be visible subsequently when calling the get method on the data manager, but at the moment there's no API in DataManager to do an update (and I don't think there should be one either, as it would be too easy to add a reference to the wrong thing and break the 'all references in a data document are the same underlying data' contract).
I propose we add a method to DataManager which is the same as the current resolver but with the ability to specify a set of reference scheme types which must be present in the returned data document (all of them, or at least one, depending on the semantics of the call), or which are ignored if the entity is a collection. This would be an additional getEntity method with the extra information. The translator infrastructure then hangs on the side of DataManager without ever having to be directly exposed to the client code.
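To make the proposal a bit more concrete, here is a rough sketch of what the extra getEntity call could look like. Every name in it (SketchDataManager, Translator, FileToUrlTranslator and so on) is invented for illustration and is not the existing t2 API. It assumes the 'at least one' semantics, ignores the requirement for collections, and writes the translated reference back inside the manager so that no update method is exposed to clients.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// All types here are hypothetical stand-ins, not the real t2 classes.
interface ReferenceScheme { }

final class FileReference implements ReferenceScheme {
    final String path;
    FileReference(String path) { this.path = path; }
}

final class UrlReference implements ReferenceScheme {
    final String url;
    UrlReference(String url) { this.url = url; }
}

interface Entity { }

// Leaf node: a set of references that all point at the same underlying data.
final class DataDocument implements Entity {
    final Set<ReferenceScheme> references = new LinkedHashSet<>();

    boolean hasScheme(Class<? extends ReferenceScheme> type) {
        for (ReferenceScheme r : references) {
            if (type.isInstance(r)) return true;
        }
        return false;
    }
}

// Collection node: holds identifiers of child entities.
final class EntityList implements Entity {
    final List<String> childIds = new ArrayList<>();
}

interface Translator {
    boolean canProduce(Class<? extends ReferenceScheme> target);
    // Returns a new reference to the same data, or null if it cannot translate.
    ReferenceScheme translate(DataDocument doc);
}

// Assumed example translator: turns a local file path into a file URL string.
final class FileToUrlTranslator implements Translator {
    public boolean canProduce(Class<? extends ReferenceScheme> target) {
        return target == UrlReference.class;
    }
    public ReferenceScheme translate(DataDocument doc) {
        for (ReferenceScheme r : doc.references) {
            if (r instanceof FileReference) {
                return new UrlReference("file://" + ((FileReference) r).path);
            }
        }
        return null;
    }
}

final class SketchDataManager {
    private final Map<String, Entity> store = new HashMap<>();
    private final List<Translator> translators = new ArrayList<>();

    void register(String id, Entity entity) { store.put(id, entity); }
    void addTranslator(Translator t) { translators.add(t); }

    // The existing style of call: no constraints on reference schemes.
    Entity getEntity(String id) { return store.get(id); }

    // Proposed extra call: at least one of the required schemes must be present.
    Entity getEntity(String id, Set<Class<? extends ReferenceScheme>> required) {
        Entity entity = store.get(id);
        if (!(entity instanceof DataDocument)) {
            return entity; // collections (and unknown ids): requirement ignored
        }
        DataDocument doc = (DataDocument) entity;
        for (Class<? extends ReferenceScheme> type : required) {
            if (doc.hasScheme(type)) return doc; // already satisfied
        }
        for (Class<? extends ReferenceScheme> type : required) {
            for (Translator t : translators) {
                if (!t.canProduce(type)) continue;
                ReferenceScheme translated = t.translate(doc);
                if (translated != null) {
                    doc.references.add(translated); // write-back, internal only
                    return doc;
                }
            }
        }
        throw new IllegalStateException("No translator could satisfy " + required);
    }
}
```

Because the write-back happens inside the manager, a later plain getEntity(id) sees the translated reference as well, and the only code that can add a reference to a document is the translator path that derived it from that same document.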
I'd also like to split the DataManager into two parts - read and write (really 'register'). This is also the split we've been dithering about wrt the peer API - effectively the 'read' part of the DataManager is the interface a peer exposes on the network. That makes it possible to have the original DataManager API extending both read and write interfaces and makes it easy for us to hide the split away. I'd also remove the getBlobStore from it - there's no reason to treat the blob store distinctly from any other reference type (it just happens to be one we know we can write to locally). The data facade would then communicate directly with the blob store and the data manager when the client wants to store a value.
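As a sketch of how that split might look (all interface and class names here are made up for illustration, and entities are reduced to plain objects to keep it short): the read interface is the part a peer would expose, the combined DataManager simply extends both halves, and the facade is handed the blob store separately rather than fetching it through getBlobStore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical 'read' half: what a peer would expose on the network.
interface ReadDataManager {
    Object getEntity(String id);
}

// Hypothetical 'write' (really 'register') half.
interface WriteDataManager {
    String register(Object entity); // returns the new entity identifier
}

// The original API keeps its shape by extending both halves.
interface DataManager extends ReadDataManager, WriteDataManager { }

final class InMemoryDataManager implements DataManager {
    private final Map<String, Object> entities = new HashMap<>();
    private int nextId = 0;

    public Object getEntity(String id) { return entities.get(id); }

    public String register(Object entity) {
        String id = "entity-" + (nextId++);
        entities.put(id, entity);
        return id;
    }
}

// The blob store is just another backing store with its own API.
interface BlobStore {
    String store(byte[] bytes);      // returns a blob reference
    byte[] retrieve(String blobRef);
}

final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private int nextId = 0;

    public String store(byte[] bytes) {
        String ref = "blob-" + (nextId++);
        blobs.put(ref, bytes.clone());
        return ref;
    }

    public byte[] retrieve(String blobRef) { return blobs.get(blobRef); }
}

// The facade talks to the blob store and the register half directly,
// so the data manager needs no getBlobStore method.
final class DataFacade {
    private final BlobStore blobStore;
    private final WriteDataManager manager;

    DataFacade(BlobStore blobStore, WriteDataManager manager) {
        this.blobStore = blobStore;
        this.manager = manager;
    }

    String storeValue(byte[] value) {
        String blobRef = blobStore.store(value); // write the bytes locally
        return manager.register(blobRef);        // then register the reference
    }
}
```

Client code that only resolves data can be given a ReadDataManager, while anything that still wants the old combined API keeps using DataManager and never notices the split.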
Thoughts? (this post isn't really meant to be particularly comprehensible unless you've been involved in the data manager work but I prefer to keep things in the open just in case anyone has a sudden attack of inspiration :))
Tom
- V2, shows write-back and makes it explicit that there are APIs to the backing stores:
