I noticed Scala ticket #1377 the other day. Even though I think the bug is valid, it’s for different reasons than the ones supplied by the reporter. I had my doubts that the casts would have any measurable cost, and the informal benchmark provided looked suspect. This is all described in the ticket, so I won’t go over it again here. The point of this post is some weird behaviour I noticed in my version of the benchmark.
The benchmark looks like:
object PerfTest {

  def f(i: Iterator[_]) = i.hasNext

  def main(args: Array[String]) {
    test()
  }

  def test() {
    var i = 0
    while (i < 10) {
      val time = System.currentTimeMillis
      val result = inner()
      i += 1
      println("Time: " + (System.currentTimeMillis - time))
      println("Value: " + result)
    }
  }

  def inner() = {
    //val empty = Iterator.empty // checkcast occurs within loop
    val empty: Iterator[_] = Iterator.empty // checkcast outside loop
    var i = 0L
    while (i < 10000000000L) {
      f(empty)
      i += 1
    }
    i
  }
}
According to this version of the benchmark, the extra casts don’t cause any measurable performance difference, so I will just focus on the variant without them. The results with JDK 6 Update 10 RC1 64-bit were:
Time: 4903
Time: 4883
Time: 7213
Time: 7197
Time: 7203
Time: 7212
Time: 7185
Time: 7190
Time: 7210
Time: 7188
That is odd. Instead of getting faster, the benchmark gets slower from the third iteration onwards. With JITs like these, we’re better off with interpreters. ;) Ok, that’s an exaggeration, but let’s investigate this further.
Before we do so, I should clarify two things. Those paying attention might wonder where the “Value” output went. Its only purpose is to make sure HotSpot does not optimize the inner loop away, so I trimmed it from the output above. The second point is that the use of a 64-bit JVM is important because the problem does not occur on the 32-bit version of HotSpot. I initially used the 64-bit version because it’s much faster at operations on longs, and I had to use a long loop index so that the benchmark would run for a reasonable amount of time.
Ok, so the first step is to re-run the benchmark with -Xbatch and -XX:+PrintCompilation. The results were:
1 b test.PerfTest$::f (7 bytes)
2 b scala.Iterator$$anon$5::hasNext (2 bytes)
1% b test.PerfTest$::inner @ 12 (35 bytes)
Time: 4938
3 b test.PerfTest$::inner (35 bytes)
Time: 4861
Time: 7197
Time: 7199
Time: 7241
Ok, so it seems like the inner method got JIT’d a second time, and the recompiled version is slower than the first, which sounds like a bug. I converted the code to Java, and it turns out that we don’t need much more than a loop in the inner method to reproduce the problem:
static long inner() {
    long i = 0L;
    for (; i < 10000000000L; ++i);
    return i;
}
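For anyone who wants to run the Java reproduction end to end, here is a self-contained harness around that method. The class name, the command-line argument, and the small quick-run default are my additions rather than part of the original repro; pass 10000000000 as an argument to match the original benchmark.

```java
public class PerfTest {

    // Same shape as the problematic method: a counted long loop
    // whose result is returned so HotSpot cannot discard it.
    static long inner(long bound) {
        long i = 0L;
        for (; i < bound; ++i);
        return i;
    }

    public static void main(String[] args) {
        // Small default so casual runs finish quickly; use
        // 10000000000 to reproduce the timings from the post.
        long bound = args.length > 0 ? Long.parseLong(args[0]) : 100000000L;
        for (int run = 0; run < 10; ++run) {
            long start = System.currentTimeMillis();
            long result = inner(bound);
            System.out.println("Time: " + (System.currentTimeMillis - start));
            System.out.println("Value: " + result);
        }
    }
}
```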
Before reporting the bug to Sun, I was curious whether -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation would reveal any interesting information. I’ve pasted the parts that someone who is not a HotSpot engineer might understand with the help of this.
<nmethod compile_id='1' compile_kind='osr' compiler='C2' entry='0x000000002ee0d000' size='520' address='0x000000002ee0ced0' relocation_offset='264' code_offset='304' stub_offset='400' consts_offset='420' scopes_data_offset='424' scopes_pcs_offset='456' dependencies_offset='504' oops_offset='512' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' stamp='0.054'/>
<writer thread='12933456'/>
<task_queued compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965' comment='count' hot_count='2'/>
<writer thread='43477328'/>
1 b test.PerfTest::inner (19 bytes)
<nmethod compile_id='1' compiler='C2' entry='0x000000002ee0d240' size='488' address='0x000000002ee0d110' relocation_offset='264' code_offset='304' stub_offset='368' consts_offset='388' scopes_data_offset='392' scopes_pcs_offset='424' dependencies_offset='472' oops_offset='480' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' stamp='4.966'/>
<...>
<task compile_id='1' compile_kind='osr' method='test/PerfTest inner ()J' bytes='19' count='1' backedge_count='14563' iicount='1' osr_bci='5' blocking='1' stamp='0.050'>
<...>
<task_done success='1' nmsize='120' count='1' backedge_count='14563' stamp='0.054'/>
<task compile_id='1' method='test/PerfTest inner ()J' bytes='19' count='2' backedge_count='5000' iicount='2' blocking='1' stamp='4.965'>
<...>
<task_done success='1' nmsize='88' count='2' backedge_count='5000' stamp='4.966'/>
So, it seems like the inner method was first JIT’d through on-stack replacement (i.e. OSR). On-stack replacement usually does not produce the best code, so HotSpot recompiles the method once it gets the chance. Unfortunately, it generates slower code the second time for some reason, even though the compiled method is smaller (an nmsize of 88 instead of 120).
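To make the OSR point concrete: HotSpot decides between the two compilation paths using separate counters, roughly as in the sketch below. A method that is called rarely but loops for a very long time trips the backedge counter while the loop is still running, which is exactly what happens to inner(). The class, field names, and thresholds here are illustrative inventions, not HotSpot’s real internals.

```java
public class CompileTrigger {
    // Made-up thresholds for the sketch; HotSpot's real values and
    // bookkeeping are considerably more involved.
    static final int INVOCATION_THRESHOLD = 10000;
    static final int BACKEDGE_THRESHOLD = 14000;

    int invocations;
    int backedges;

    // Returns which kind of compile a method like inner() earns first.
    String simulate(long loopIterations) {
        invocations++;
        if (invocations >= INVOCATION_THRESHOLD) {
            return "standard compile";
        }
        for (long i = 0; i < loopIterations; ++i) {
            backedges++;
            if (backedges >= BACKEDGE_THRESHOLD) {
                // The method is still executing its loop, so the only
                // way to switch to compiled code is to replace the
                // frame on the stack: an OSR compile.
                return "OSR compile";
            }
        }
        return "interpreted";
    }

    public static void main(String[] args) {
        // One call with a very long loop: the backedge counter trips
        // long before the invocation counter does.
        System.out.println(new CompileTrigger().simulate(1000000));
    }
}
```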
We could go deeper in this investigation by using a debug JVM like Kohsuke Kawaguchi did here, but I decided to just file a bug and let the HotSpot engineers handle it. :) I will update the blog once a bug number is assigned (I wonder when Sun is going to fix their bug database so that the bug id becomes available after submission of a bug…).
Update: The bug is now available in the Sun database.
