Details
Type: Bug
Status: Open
Priority: Trivial
Resolution: Unresolved
Affects Version/s: 1.4, 1.5
Fix Version/s: 2.0 essential
Component/s: Taverna Core
Labels: None
Description
Most of our data, such as the Scufl workflow and the Baclava document, is stored as XML. By default, XML uses UTF-8 as its encoding.
However, most of our current code uses shortcuts like FileReader and FileWriter to read and write that XML. As a result, Unicode characters are not stored and read correctly, because FileReader and FileWriter use whatever the default encoding is on the particular platform.
We should save and load as UTF-8 everywhere.
The impact is not very big, except when, say, saving workflows for authors with non-English names, such as my own (Stian Søiland). The changes required are not very big either.
Example:
{code}
InputStreamReader isr = new InputStreamReader(is, Charset.forName("UTF-8"));
{code}
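A slightly fuller sketch of the same idea, covering both saving and loading, might look like the following. The class name Utf8XmlIo and its save/load helpers are made up for illustration and are not existing Taverna code; the point is only that FileWriter/FileReader are replaced by stream wrappers with an explicit charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Taverna: reads and writes text files as UTF-8
// regardless of the platform default encoding.
class Utf8XmlIo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Replaces "new FileWriter(file)", which would use the platform default encoding.
    static void save(File file, String xml) throws IOException {
        Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), UTF8));
        try {
            writer.write(xml);
        } finally {
            writer.close();
        }
    }

    // Replaces "new FileReader(file)", which would use the platform default encoding.
    static String load(File file) throws IOException {
        Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), UTF8));
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("workflow", ".xml");
        file.deleteOnExit();
        // \u00f8 is "ø", written as an escape so the source file encoding does not matter.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<author>Stian S\u00f8iland</author>";
        save(file, xml);
        System.out.println(xml.equals(load(file)));
    }
}
```

Since the charset is fixed on both sides, a file written on one platform should read back identically on another, whatever the default encoding is there.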
Fixed for workflow load/save and for input document loading. We should also do this properly for output document saving ("Save as XML"), result documents, etc., and in general everywhere XML is loaded or saved.
For inputs loaded from file I think it's OK to use the system encoding for now, unless the file is an XML input document.
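For the remaining places, one option is to avoid Readers and Writers altogether and hand byte streams to the XML library, so that the parser works out the encoding from the XML declaration and the serializer is told to emit UTF-8. The sketch below uses the standard JAXP classes rather than whatever XML library a given piece of Taverna code uses, so treat it as an illustration of the approach only. XmlByteStreams is a hypothetical name, and the byte arrays stand in for FileInputStream/FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper, not part of Taverna: keeps XML I/O on byte streams.
class XmlByteStreams {

    // Parsing from an InputStream lets the parser detect the encoding itself
    // (XML declaration or byte order mark, UTF-8 if neither is present).
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    // Serializing to an OutputStream with an explicit ENCODING property
    // makes the output UTF-8 regardless of the platform default.
    static byte[] serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "<author>Stian S\u00f8iland</author>".getBytes("UTF-8");
        byte[] output = serialize(parse(input));
        System.out.println(parse(output).getDocumentElement().getTextContent());
    }
}
```

This also copes with existing documents that declare a different encoding in their XML declaration, since the parser honours the declaration instead of assuming one.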