Jvm原生代码编译的疯狂 - 即使在代码编译后，我似乎也会在一段时间内遭受奇怪的性能损失。为什么？

java benchmarking jvm jvm-hotspot

2022-09-04 23:03:16

问题

在Java中对简单的QuickSort实现进行基准测试时，我在绘制的图形中遇到了意想不到的驼峰：n vs time

enter image description here

我知道 HotSpot 会尝试将代码编译为本机，因为某些方法似乎被大量使用，所以我用 .经过反复的试验，它似乎总是以相同的方式编译算法的方法：-XX:+PrintCompilation

@ iteration 6 -> sorting.QuickSort::swap (15 bytes)
@ iteration 7 -> sorting.QuickSort::partition (66 bytes)
@ iteration 7 -> sorting.QuickSort::quickSort (29 bytes)

我重复上面的图形，并添加了此信息，只是为了让事情更清晰一些：

enter image description here

在这一点上，我们都必须问自己：为什么在代码编译后我们仍然得到那些丑陋的驼峰？也许它与算法本身有关？它肯定是可能的，幸运的是，对我们来说，有一种快速的方法来解决这个问题，通过：-XX:CompileThreshold=0

enter image description here

无赖！这真的必须是JVM在后台做的事情。但那又如何呢？我从理论上认为，尽管正在编译代码，但可能需要一段时间才能真正开始使用已编译的代码。也许在这里和那里添加几个可以帮助我们解决这个问题？Thread.sleep()

enter image description here

哎哟！绿色函数是 QuickSort 的代码，每次运行之间有 1000 毫秒的内部运行（详细信息在附录中），而蓝色函数是我们的旧函数（仅用于比较）。

一方面，给火锅时间似乎只会让事情变得更糟！也许它只是由于其他一些因素（例如缓存问题）而变得更糟？

免责声明：我正在为所示图形的每个点运行1000次试验，并用于测量结果。System.nanoTime()

编辑

在这个阶段，你们中的一些人可能想知道使用如何扭曲结果。我再次运行了红色情节（没有本机编译），现在介于两者之间：sleep()

enter image description here

可怕！

附录

在这里，我介绍我正在使用的代码，以防万一：QuickSort

public class QuickSort {

    public <T extends Comparable<T>> void sort(int[] table) {
        quickSort(table, 0, table.length - 1);
    }

    private static <T extends Comparable<T>> void quickSort(int[] table,
            int first, int last) {
        if (first < last) { // There is data to be sorted.
            // Partition the table.
            int pivotIndex = partition(table, first, last);
            // Sort the left half.
            quickSort(table, first, pivotIndex - 1);
            // Sort the right half.
            quickSort(table, pivotIndex + 1, last);
        }
    }

    /**
     * @author http://en.wikipedia.org/wiki/Quick_Sort
     */
    private static <T extends Comparable<T>> int partition(int[] table,
            int first, int last) {
        int pivotIndex = (first + last) / 2;
        int pivotValue = table[pivotIndex];
        swap(table, pivotIndex, last);
        int storeIndex = first;
        for (int i = first; i < last; i++) {
            if (table[i]-(pivotValue) <= 0) {
                swap(table, i, storeIndex);
                storeIndex++;
            }
        }
        swap(table, storeIndex, last);
        return storeIndex;
    }

    private static <T> void swap(int[] a, int i, int j) {
        int h = a[i];
        a[i] = a[j];
        a[j] = h;
    }
}

以及我用来运行基准测试的代码：

public static void main(String[] args) throws InterruptedException, IOException {
    QuickSort quickSort = new QuickSort();

    int TRIALS = 1000;

    File file = new File(Long.toString(System.currentTimeMillis()));
    System.out.println("Saving @ \"" + file.getAbsolutePath() + "\"");

    for (int x = 0; x < 30; ++x) {
    //          if (x > 4 && x < 17)
    //              Thread.sleep(1000);

        int[] values = new int[x];

        long start = System.nanoTime();

        for (int i = 0; i < TRIALS; ++i)
            quickSort.sort(values);

        double duration = (System.nanoTime() - start) / TRIALS;
        String line = x + "\t" + duration;
        System.out.println(line);
        FileUtils.writeStringToFile(file, line + "\r\n", true);
    }
}

答案 1

好吧，看来我自己解决了这个问题。

我的想法是正确的，编译代码可能需要一段时间才能启动。问题是我实际实现基准测试代码的方式中的一个缺陷：

if (x > 4 && x < 17)
    Thread.sleep(1000);

在这里，我假设由于唯一的“受影响”区域将在4到17之间，我可以继续对这些值进行睡眠。事实并非如此。以下情节可能很有启发性：

enter image description here

在这里，我将原始的无编译函数（红色）与另一个无编译函数进行比较，但中间有睡眠。正如你所看到的，它们在不同的数量级上工作，这意味着将有睡眠和不睡觉的代码结果混合在一起会产生不合理的结果，就像我一样。

然而，最初的问题仍然没有答案。是什么原因导致即使在编译发生后也会出现驼峰？让我们试着找出答案，把1s睡眠放在所有采取的点：

enter image description here

这会产生预期的结果。奇怪的驼峰正在发生，原生代码仍然没有启动。

将睡眠 50ms 与睡眠 1000ms 函数进行比较，再次得出预期结果：

enter image description here

（灰色的那个似乎仍然显示一点延迟）

答案 2