Monday, February 15, 2010

Writing for the web

I wish I had found these articles by Jakob Nielsen years ago (http://www.useit.com/alertbox/). It seems that I have been doing a poor job of making my stuff easy or compelling to read!!

Saturday, February 13, 2010

Wrestling with the JVM Garbage Collector

It must be time to revive this ancient blog of mine.

Recently, a colleage and I were helping a client with an interesting performance issue. The client wanted to run multiple batch jobs in parrallel on an 8 CPU machine. The example job took about 10 minutes to run by itself and was very cpu intensive apart from a tiny bit of file I/O at the start and end of the job.

Running two instances of the job in the same Java virtual machine at once took a bit longer, which I assumed was due to some contention we had yet to spot. To my surprise running three jobs in parrallel took 15 minutes. Running four jobs in parrallel took two hours!

Initially we suspected that the slowdown might be due to excessive swapping of virtual memory to and from disk. However, the machine still had about 3 or 4 Gb of physical memory free out of its 22Gb total, and there did not seem to be any swapping going on. Then we used jconsole to see what the JVM was up to and noticed that it was doing a LOT of garbage collection.

I suspect many Java developers, like myself, tend to forget that garbage collection ever happens. Or at least treat it a bit like air: we have a rudimentary understanding of what it is, but just do not notice it until it hurts!

So my colleage decided it was time that we learnt more about the JVM garbage collector. Here is a simplictic summary which is probably full of errors, but that hopefully conveys enough information about the GC for you to understand what was going on. A better reference is here: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

The JVM heap is divided into a two sections (generations): young and tenured (actually there is a third "perm" generation which I will ignore here). New objects are created in the young generation. At the risk of over-simplification, the young generation is garbage collected whenever it fills up. This is called a minor collection and is usually very fast (small fraction of a second). After a while, long lived objects are moved into the tenured generation. When the tenured generation fills up, then the whole heap is garbage collected (I think!). This is a major collection and can take many seconds.

Another important thing to know is this: if a major collection leaves the tenured generation full, then all subsequent garbage collections will be major collections until some space is freed up in the tenured generation.

Also: all application threads are suspended while garbage collection occurs.

When a JVM is running in server mode, the default GC uses a concurrent algorithm to garbage collect the young generation, but uses a single thread when garbage collecting the tenured generation. So the difference between a minor and a major collection is even more significant on a multi processor machine.

You can configure the GC to use a parrallel version of the major collection, but this actually made things worse for us, maybe because it actually needs more memory before it is effective.

So what was going on with our application? We had configured the JVM to allow the heap to grow to 16Gb. By default, the tenured generation is twice the size of the young, and so our tenured genuration could grow to just under 11Gb. When running a single job, we noticed that a major garbage collect would leave nearly 4Gb of objects in the tenured generation.

When running 3 jobs concurrently, the tenured generation eventually became permanently full, and so all subsequent garbage collections were major. Hence the 50% increase in time.

When running 4 jobs concurrently, a major garbage collection could only free up a very small amount of memory, which would rapidly fill up and trigger another major garbage collection. Most of the time only one of the eight CPUs seemed to be doing anything which was another hint that it was spending most of its time performing garbage collection.

We provided further evidence that garbage collection was the problem by changing the ratio of the young to tenured generations so that the tenured generation would always have a little free space after a major collection. This resulted in a lot more minor collections but very few major collections. The running time was reduced from 2 hours to 20 minutes! This was still twice as long as it ideally should have been, but the only way that could be solved was to either give the JVM much more memory, or fix the application to not use so much memory (we suspected that it was holding on to some objects long after they were needed).

In the end we ran out of time to explore further. We had tried running each task in its own JVM, but we needed to really combine this with adding a bit more memory to each as well.

So I learnt a couple of valuable lessons out of this. First, it is important that you give the JVM much more memory than it actually needs. Probably 50% to 100% more. The second lesson was that the "stop the world" effect of garbage collection has a more obvious impact on a multi processor machine as the CPU utilization plummets. Of course this may change given that Sun/Oracle have been putting more work into the garbage collector: http://blogs.sun.com/theplanetarium/entry/java_vm_trying_a_new