myGrid

Out of memory error on repeated runs

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 1.4
  • Fix Version/s: 1.5
  • Component/s: None
  • Labels: None

Description

When a workflow is run with medium-large inputs (12 MB), Taverna dies with an OutOfMemoryError after a few runs.

Reported by Iain Milne imilne@scri.ac.uk to taverna-users, thread "CPU at 100% when dealing with large Affymetrix .cel files"

(The CPU usage problem was solved by disabling provenance.)

At http://germinate.scri.ac.uk:8080/normalization/ you can find a simple
workflow that expects two input files and produces one output file. The
input files are also listed there (~12 MB each).

Activity

Stian Soiland-Reyes added a comment - 2006-08-25 16:03

Resolved by several committed patches. There were two reasons why we leaked memory.

The thread freefluo.util.event.Queue.ConsumerThread lived on and kept the reference chain:

Queue -> Flow -> Hash* -> PortTask -> DataThing -> large blob

This was resolved by appropriately calling the .destroy() method on the workflow instance in detachModel of the invocation panel. This method has been in the interfaces all the time, but was never actually invoked anywhere.
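
The leak pattern described above can be illustrated with a minimal sketch. This is not the freefluo or Taverna code: WorkflowInstance, events, and largeBlob are hypothetical names, and the sketch only assumes that a long-lived consumer thread holds a reference back to the object that owns the data. As long as the thread runs, everything reachable from it stays in memory; destroy() ends the thread so the garbage collector can reclaim the instance.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ConsumerLeakSketch {

    /** Hypothetical stand-in for a workflow instance that owns an event queue. */
    static final class WorkflowInstance {
        private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
        private final byte[] largeBlob; // stands in for the large input data
        private final Thread consumer;

        WorkflowInstance(int blobSize) {
            this.largeBlob = new byte[blobSize];
            // The lambda reads the 'events' field, so it captures 'this'.
            // A running thread is a GC root: thread -> this -> largeBlob.
            this.consumer = new Thread(() -> {
                try {
                    while (true) {
                        events.take().run();
                    }
                } catch (InterruptedException e) {
                    // Interrupted by destroy(): leave the loop so the thread ends.
                }
            }, "consumer");
            consumer.setDaemon(true);
            consumer.start();
        }

        int blobSize() {
            return largeBlob.length;
        }

        boolean consumerAlive() {
            return consumer.isAlive();
        }

        /** Stops the consumer thread; without this call the instance is never collectable. */
        void destroy() {
            consumer.interrupt();
            try {
                consumer.join(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        WorkflowInstance instance = new WorkflowInstance(12 * 1024 * 1024);
        System.out.println("consumer alive before destroy: " + instance.consumerAlive());
        instance.destroy();
        System.out.println("consumer alive after destroy: " + instance.consumerAlive());
    }
}
```

The design point is that merely dropping the last application reference to the instance is not enough while its thread is still running; some explicit teardown call has to be made by whoever owns the instance.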

The other part was that the WorkflowEditor used by the EnactorInvocation set static members VertexView.renderer and EdgeView.renderer to instances that implicitly referenced back to the WorkflowEditor and then the EnactorInvocation, which referenced the data sets.

These static members are now cleared in WorkflowEditor.detachFromModel. We set them to null, which could potentially cause problems later on. Further investigation would have to check what these members are originally set to, and preferably move away from the non-thread-safe setting of these members.
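
A minimal sketch of this second pattern, again with hypothetical names (Editor, dataSet, and this simplified VertexView are illustrations, not the Taverna classes): a static field holds a renderer that captures the editor, so the editor and its data stay reachable until the static field is cleared.

```java
public class StaticRendererSketch {

    interface Renderer {
        String render(String label);
    }

    /** Simplified stand-in for a view class with a shared static renderer slot. */
    static final class VertexView {
        static Renderer renderer;
    }

    /** Hypothetical editor that holds a reference to a large data set. */
    static final class Editor {
        private final byte[] dataSet;

        Editor(int size) {
            this.dataSet = new byte[size];
        }

        void attachToModel() {
            // The lambda reads 'dataSet', so it captures 'this'.
            // Static field -> renderer -> editor -> dataSet stays reachable.
            VertexView.renderer = label -> label + " (" + dataSet.length + " bytes)";
        }

        void detachFromModel() {
            // Clearing the static slot breaks the chain. Any later caller of
            // VertexView.renderer must cope with null, which is the risk
            // mentioned in the comment above.
            VertexView.renderer = null;
        }
    }

    public static void main(String[] args) {
        Editor editor = new Editor(16);
        editor.attachToModel();
        System.out.println(VertexView.renderer.render("node"));
        editor.detachFromModel();
        System.out.println("renderer cleared: " + (VertexView.renderer == null));
    }
}
```

Because the static slot is shared by every editor, two editors attaching and detaching in different threads can overwrite or clear each other's renderer, which is the thread-safety concern raised above.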

Stian Soiland-Reyes added a comment - 2006-08-25 16:08

Need to confirm this by running the workflow from Iain Milne

Stian Soiland-Reyes added a comment - 2006-08-31 12:49

Should now be fixed; see email to taverna-users, 2006-08-31:

I've now submitted several (some might say dirty) patches to CVS that
fix most of these problems.

There's still an issue that GUI elements (like buttons) stay alive in
memory longer than they need to. For instance, even after closing the
"Run workflow" window, that window object lives on (through 50 reference
steps into some internal sun.awt.crap), and with it the data that
was loaded at the time. However, when a new Run window is made, those
resources seem to be freed. It seems that a lot of this is caused by Sun
Java internals that are not easy to debug.

With my patches, I managed to run the workflow from Iain several times,
although it needs almost 400 MB while running.

When the workflow has finished and I close both the "Run" window and
the "Result" window (and manually force a GC), Taverna uses about
78 MB of Java heap.

Of these, 23 MB are the two previously loaded data files, which are
cleared when the next "Run" window is made, dropping usage to 55 MB.
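
One rough way to take a comparable reading from inside a JVM is sketched below. This is an assumption about how such a number could be obtained, not the method used for the figures above; HeapUsageSketch and usedHeapBytes are hypothetical names. Note that Runtime.gc() is only a request, so readings are approximate.

```java
public class HeapUsageSketch {

    /** Returns the approximate number of heap bytes in use after requesting a GC. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a hint to the JVM, not a guarantee that collection completes
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long usedMb = usedHeapBytes() / (1024 * 1024);
        System.out.println("approx. used heap: " + usedMb + " MB");
    }
}
```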

This happens even the third time the workflow is run, so I believe it
should now be possible to run such workflows many times. You'll have to
wait until the next release, Taverna 1.5, comes out or build it from
CVS [1] to test for yourself.

I still believe there could be similar memory issues with other GUI
elements, processors, etc, and we'll have to keep this in mind when
refactoring GUI code.

[1] http://www.mygrid.org.uk/wiki/Mygrid/BuildingTaverna


Dates

  • Created: 2006-08-18 10:01
  • Updated: 2006-08-31 12:49
  • Resolved: 2006-08-31 12:49