> vs. >= in bubble sort causes a significant performance difference

performance optimization java c++

2022-08-31 13:03:49

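I just stumbled upon something. At first I thought it might be a case of branch misprediction, as in this case, but I cannot explain why branch misprediction would cause this behaviour.

I implemented two versions of bubble sort in Java and did some performance testing: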

import java.util.Random;

public class BubbleSortAnnomaly {

    public static void main(String... args) {
        final int ARRAY_SIZE = Integer.parseInt(args[0]);
        final int LIMIT = Integer.parseInt(args[1]);
        final int RUNS = Integer.parseInt(args[2]);

        int[] a = new int[ARRAY_SIZE];
        int[] b = new int[ARRAY_SIZE];
        Random r = new Random();
        for (int run = 0; RUNS > run; ++run) {
            for (int i = 0; i < ARRAY_SIZE; i++) {
                a[i] = r.nextInt(LIMIT);
                b[i] = a[i];
            }

            System.out.print("Sorting with sortA: ");
            long start = System.nanoTime();
            int swaps = bubbleSortA(a);

            System.out.println(  (System.nanoTime() - start) / 1e9 + " seconds. "
                               + "It used " + swaps + " swaps.");

            System.out.print("Sorting with sortB: ");
            start = System.nanoTime();
            swaps = bubbleSortB(b);

            System.out.println(  (System.nanoTime() - start) / 1e9 + " seconds. "
                               + "It used " + swaps + " swaps.");
        }
    }

    public static int bubbleSortA(int[] a) {
        int counter = 0;
        for (int i = a.length - 1; i >= 0; --i) {
            for (int j = 0; j < i; ++j) {
                if (a[j] > a[j + 1]) {
                    swap(a, j, j + 1);
                    ++counter;
                }
            }
        }
        return (counter);
    }

    public static int bubbleSortB(int[] a) {
        int counter = 0;
        for (int i = a.length - 1; i >= 0; --i) {
            for (int j = 0; j < i; ++j) {
                if (a[j] >= a[j + 1]) {
                    swap(a, j, j + 1);
                    ++counter;
                }
            }
        }
        return (counter);
    }

    private static void swap(int[] a, int j, int i) {
        int h = a[i];
        a[i] = a[j];
        a[j] = h;
    }
}

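As we can see, the only difference between the two sorting methods is > vs. >=. When running the program with java BubbleSortAnnomaly 50000 10 10, one would obviously expect sortB to be slower than sortA, because it has to execute more swap(...) calls. But I got the following (or similar) output on three different machines: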

Sorting with sortA: 4.214 seconds. It used  564960211 swaps.
Sorting with sortB: 2.278 seconds. It used 1249750569 swaps.
Sorting with sortA: 4.199 seconds. It used  563355818 swaps.
Sorting with sortB: 2.254 seconds. It used 1249750348 swaps.
Sorting with sortA: 4.189 seconds. It used  560825110 swaps.
Sorting with sortB: 2.264 seconds. It used 1249749572 swaps.
Sorting with sortA: 4.17  seconds. It used  561924561 swaps.
Sorting with sortB: 2.256 seconds. It used 1249749766 swaps.
Sorting with sortA: 4.198 seconds. It used  562613693 swaps.
Sorting with sortB: 2.266 seconds. It used 1249749880 swaps.
Sorting with sortA: 4.19  seconds. It used  561658723 swaps.
Sorting with sortB: 2.281 seconds. It used 1249751070 swaps.
Sorting with sortA: 4.193 seconds. It used  564986461 swaps.
Sorting with sortB: 2.266 seconds. It used 1249749681 swaps.
Sorting with sortA: 4.203 seconds. It used  562526980 swaps.
Sorting with sortB: 2.27  seconds. It used 1249749609 swaps.
Sorting with sortA: 4.176 seconds. It used  561070571 swaps.
Sorting with sortB: 2.241 seconds. It used 1249749831 swaps.
Sorting with sortA: 4.191 seconds. It used  559883210 swaps.
Sorting with sortB: 2.257 seconds. It used 1249749371 swaps.

When I set the parameter LIMIT to, e.g., 50000 (java BubbleSortAnnomaly 50000 50000 10), I get the expected results:

Sorting with sortA: 3.983 seconds. It used  625941897 swaps.
Sorting with sortB: 4.658 seconds. It used  789391382 swaps.

I ported the program to C++ to determine whether this problem is Java-specific. Here is the C++ code:

#include <cstdlib>
#include <iostream>

#include <omp.h>

#ifndef ARRAY_SIZE
#define ARRAY_SIZE 50000
#endif

#ifndef LIMIT
#define LIMIT 10
#endif

#ifndef RUNS
#define RUNS 10
#endif

void swap(int * a, int i, int j)
{
    int h = a[i];
    a[i] = a[j];
    a[j] = h;
}

int bubbleSortA(int * a)
{
    const int LAST = ARRAY_SIZE - 1;
    int counter = 0;
    for (int i = LAST; 0 < i; --i)
    {
        for (int j = 0; j < i; ++j)
        {
            int next = j + 1;
            if (a[j] > a[next])
            {
                swap(a, j, next);
                ++counter;
            }
        }
    }
    return (counter);
}

int bubbleSortB(int * a)
{
    const int LAST = ARRAY_SIZE - 1;
    int counter = 0;
    for (int i = LAST; 0 < i; --i)
    {
        for (int j = 0; j < i; ++j)
        {
            int next = j + 1;
            if (a[j] >= a[next])
            {
                swap(a, j, next);
                ++counter;
            }
        }
    }
    return (counter);
}

int main()
{
    int * a = (int *) malloc(ARRAY_SIZE * sizeof(int));
    int * b = (int *) malloc(ARRAY_SIZE * sizeof(int));

    for (int run = 0; RUNS > run; ++run)
    {
        for (int idx = 0; ARRAY_SIZE > idx; ++idx)
        {
            a[idx] = std::rand() % LIMIT;
            b[idx] = a[idx];
        }

        std::cout << "Sorting with sortA: ";
        double start = omp_get_wtime();
        int swaps = bubbleSortA(a);

        std::cout << (omp_get_wtime() - start) << " seconds. It used " << swaps
                  << " swaps." << std::endl;

        std::cout << "Sorting with sortB: ";
        start = omp_get_wtime();
        swaps = bubbleSortB(b);

        std::cout << (omp_get_wtime() - start) << " seconds. It used " << swaps
                  << " swaps." << std::endl;
    }

    free(a);
    free(b);

    return (0);
}

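This program shows the same behaviour. Can someone explain what exactly is going on here?

Executing sortB first and then sortA does not change the results.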

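Answer 1

I think this may indeed be due to branch prediction. If you compare the number of swaps with the number of inner sort-loop iterations, you find:

Limit = 10

A = 560M swaps / 1250M loops
B = 1250M swaps / 1250M loops (0.02% fewer swaps than loops)

Limit = 50000

A = 627M swaps / 1250M loops
B = 850M swaps / 1250M loops

So in the Limit == 10 case the swap is performed 99.98% of the time in sort B, which is obviously favourable for the branch predictor. In the Limit == 50000 case the swap is only hit randomly 68% of the time, so the branch predictor is less beneficial.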
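
The ratios above can be reproduced by counting how often the if inside the inner loop is taken. The following is only a minimal sketch: the class name BranchTakenRatio, the method countSwaps, the flag swapOnEqual, the fixed seed 42 and the reduced array size of 5000 are all assumptions made for illustration and are not part of the original program.

```java
import java.util.Random;

public class BranchTakenRatio {

    /**
     * Bubble-sorts a in place and returns {swaps, comparisons}.
     * swapOnEqual = false mirrors bubbleSortA (>), true mirrors bubbleSortB (>=).
     */
    public static long[] countSwaps(int[] a, boolean swapOnEqual) {
        long swaps = 0;
        long comparisons = 0;
        for (int i = a.length - 1; i >= 0; --i) {
            for (int j = 0; j < i; ++j) {
                ++comparisons;
                // This is the branch whose predictability is being discussed.
                boolean taken = swapOnEqual ? a[j] >= a[j + 1] : a[j] > a[j + 1];
                if (taken) {
                    int h = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = h;
                    ++swaps;
                }
            }
        }
        return new long[] {swaps, comparisons};
    }

    public static void main(String[] args) {
        final int arraySize = 5000;   // assumption: smaller than 50000 so it finishes quickly
        Random r = new Random(42);    // assumption: fixed seed for repeatable runs
        for (int limit : new int[] {10, 50000}) {
            int[] a = new int[arraySize];
            for (int i = 0; i < arraySize; i++) {
                a[i] = r.nextInt(limit);
            }
            int[] b = a.clone();
            long[] resA = countSwaps(a, false);
            long[] resB = countSwaps(b, true);
            System.out.printf("LIMIT=%d  A: %.2f%% taken  B: %.2f%% taken%n",
                    limit,
                    100.0 * resA[0] / resA[1],
                    100.0 * resB[0] / resB[1]);
        }
    }
}
```

The percentages depend on the array size and on the random data, so they will not match the figures above exactly unless arraySize is raised to 50000. The closer a percentage is to 0% or 100%, the easier the branch should be to predict.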

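Answer 2

I think this can indeed be explained by branch misprediction.

Consider, for example, LIMIT=11 and sortB. On the first iteration of the outer loop it will very quickly stumble upon an element equal to 10. So it will have a[j]=10, and therefore a[j]>=a[next] will definitely be true, since there are no elements greater than 10. It will therefore perform a swap, then advance j by one step, only to find a[j]=10 once again (the same swapped value). So a[j]>=a[next] will be true again, and so on. Every comparison except a few at the very beginning will be true. It will run the same way on the following iterations of the outer loop.

Not so for sortA. It will start in roughly the same way, stumble upon a[j]=10, and do some swaps in a similar manner, but only up to the point where it finds a[next]=10 as well. Then the condition will be false and no swap will be done. And so on: every time it stumbles upon a[next]=10, the condition is false and no swap is done. Therefore this condition is true 10 times out of 11 (values of a[next] from 0 to 9) and false 1 time out of 11. It is no surprise that branch prediction fails.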
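
The argument above can be checked by recording the outcome of the comparison during the first pass of the inner loop. This is a rough sketch under stated assumptions: the class FirstPassBranchTrace and its methods are hypothetical helpers, and counting flips between taken and not taken is only a crude proxy for predictability, not a model of any real branch predictor.

```java
import java.util.Random;

public class FirstPassBranchTrace {

    /**
     * Performs one inner-loop pass over a in place and records, for each j,
     * whether the swap branch was taken.
     * swapOnEqual = false mirrors sortA (>), true mirrors sortB (>=).
     */
    public static boolean[] firstPassOutcomes(int[] a, boolean swapOnEqual) {
        boolean[] taken = new boolean[Math.max(0, a.length - 1)];
        for (int j = 0; j < a.length - 1; ++j) {
            taken[j] = swapOnEqual ? a[j] >= a[j + 1] : a[j] > a[j + 1];
            if (taken[j]) {
                int h = a[j];
                a[j] = a[j + 1];
                a[j + 1] = h;
            }
        }
        return taken;
    }

    /** Number of comparisons for which the swap branch was not taken. */
    public static int countNotTaken(boolean[] outcomes) {
        int n = 0;
        for (boolean t : outcomes) {
            if (!t) {
                ++n;
            }
        }
        return n;
    }

    /** Number of positions where the outcome differs from the previous one. */
    public static int countFlips(boolean[] outcomes) {
        int flips = 0;
        for (int k = 1; k < outcomes.length; ++k) {
            if (outcomes[k] != outcomes[k - 1]) {
                ++flips;
            }
        }
        return flips;
    }

    public static void main(String[] args) {
        final int arraySize = 50000;
        final int limit = 11;          // values 0..10, as in the example above
        Random r = new Random(1);      // assumption: fixed seed for repeatable runs
        int[] a = new int[arraySize];
        for (int i = 0; i < arraySize; i++) {
            a[i] = r.nextInt(limit);
        }
        int[] b = a.clone();

        boolean[] outA = firstPassOutcomes(a, false);
        boolean[] outB = firstPassOutcomes(b, true);
        System.out.println("sortA: not taken " + countNotTaken(outA)
                + ", flips " + countFlips(outA));
        System.out.println("sortB: not taken " + countNotTaken(outB)
                + ", flips " + countFlips(outB));
    }
}
```

In the first pass a[j] always holds the largest value seen so far, so with >= the branch can only fall through when a strictly larger value appears, which can happen at most 10 times for values 0 to 10. With > it also falls through every time a[next] equals the current maximum, which is what makes the outcome sequence irregular.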