Mchr3k - Coding: Java Bytecode

In my last post I talked about some new code coverage filters which I have added to my fork of the JaCoCo project. These filters are used at coverage analysis time when the recorded coverage data is analysed against the corresponding class bytecode. In the process of writing these filters I came across the grim truth behind Java finally blocks.

Finally Blocks

It turns out there is no explicit support for this Java language feature so in bytecode it's use has unfortunate consequences. Consider the following Java code:

try {
  try {
    System.out.println("A");
  } catch (RuntimeException ex) {
    System.out.println("B");
  } finally {
    System.out.println("C");
  }
} catch (RuntimeException ex) {
  System.out.println("D");
} finally {
  System.out.println("E");
}

This will end up producing the following bytecode (annotated with line numbers and exception handlers):

try{
try{
try{
try{
L3:
  0 getstatic <java/lang/System.out>
  3 ldc <A>
  5 invokevirtual <java/io/PrintStream.println>
  8 goto 42 (+34)

} catch (RuntimeException) => 11
L4:
 11 astore_1
L5:
 12 getstatic <java/lang/System.out>
 15 ldc <B>
 17 invokevirtual <java/io/PrintStream.println>
} catch (Anything) => 31
L7:
 20 getstatic <java/lang/System.out>
 23 ldc <C>
 25 invokevirtual <java/io/PrintStream.println>
 28 goto 84 (+56)

L6:
 31 astore_2
L7:
 32 getstatic <java/lang/System.out>
 35 ldc <C>
 37 invokevirtual <java/io/PrintStream.println>
L8:
 40 aload_2
 41 athrow

L7:
 42 getstatic <java/lang/System.out>
 45 ldc <C>
 47 invokevirtual <java/io/PrintStream.println>
L8:
 50 goto 84 (+34)

} catch (RuntimeException) => 53
L9:
 53 astore_1
L10:
 54 getstatic <java/lang/System.out>
 57 ldc <D>
 59 invokevirtual <java/io/PrintStream.println>
L12:
 62 getstatic <java/lang/System.out>

} catch (Anything) => 73
 65 ldc <E>
 67 invokevirtual <java/io/PrintStream.println>
 70 goto 92 (+22)

L11:
 73 astore_3
L12:
 74 getstatic <java/lang/System.out>
 77 ldc <E>
 79 invokevirtual <java/io/PrintStream.println>
L13:
 82 aload_3
 83 athrow

L12:
 84 getstatic <java/lang/System.out>
 87 ldc <E>
 89 invokevirtual <java/io/PrintStream.println>
L14:
 92 return

From a code coverage point of view this is a complete nightmare. For any branch within a finally block the number of actual branches in the bytecode is multiplied by the number of times that the finally block is duplicated (at least two).

Technically this is correct as to cover all the possible paths through the bytecode you must consider all of the possible exception paths. However, in practice the developer is likely only concerned with ensuring that all the branches in their finally blocks are covered at least once and don't want to worry about the possible paths into the finally block.

Implementing the filtering required to deduplicate coverage of finally blocks is made tricky by the fact that the start/end of the duplicate blocks are not explicitly annotated in any way in the bytecode. The closest I could get was to write a heuristic which works during the analysis of a method:

When a new line number marker is seen a new list of instructions is started.
Instructions are added to this list as they are seen. The instruction arguments are ignored.
After all the instructions in a method have been visited:

For each line number which was seen:

The instruction sequences seen with that line number are "clustered". Sequences which start with the same instruction are grouped together and assumed to be copies caused by the line being within a finally block.

For each covered probe:

Find the instruction sequence which contains the probe.
Mark the corresponding probe in every other sequence in the cluster as covered.

Run the existing probe coverage propagation.
For each cluster:

If the last instruction of any sequence in the cluster is covered, mark all other sequences in the cluster as covered.
Disable coverage reporting for all but one sequence.

This (relatively) simple heuristic is good enough to work in most cases. However it is straightforward to come up with code which will trip this up:

int i = 0;
for (i++; i < 10; i++) {
  System.out.println("A");
}

This produces the following bytecode:

L1:
  0 iconst_0
  1 istore_1
L2:
  2 iinc 1 by 1
  5 goto 19 (+14)
L3:
  8 getstatic #16 <java/lang/System.out>
 11 ldc #22 <A>
 13 invokevirtual #24 <java/io/PrintStream.println>
L2:
 16 iinc 1 by 1
 19 iload_1
 20 bipush 10
 22 if_icmplt 8 (-14)
L5:
 25 return

There are two blocks of code tagged as being on line 2 and they both start with the same instruction. With my heuristic in place one of these blocks will be ignored as they are assumed to be duplicates. This is a shame but in practice I have yet to think of an example of realistic code which would be matched by my heuristic.

Conclusions

I'm pleased that my heuristic works as well as it does but I would love to come up with a better mechanism. If you have any suggestions I would love to hear them!

Mchr3k - Coding

Wednesday, 9 January 2013

Java Bytecode - Finally Blocks