TreeSet vs Java 8 Streams 性能初步分析实验设置结果讨论

performance java java-stream treeset

2022-09-02 12:05:01

哪种方式处理独特且已排序的馆藏最有效？

1. 使用树集增强循环

Set<MyObj> ret = new TreeSet<>();
for (Foo foo : foos)
    ret.add(new MyObj(foo));

2. 简单流

List<MyObj> ret = foos.stream().map(MyObj::new)
                      .distinct().sorted()
                      .collect(Collectors.toList());

3. 树集流

Set<MyObj> ret = foos.stream().map(MyObj::new)
                     .collect(Collectors.toCollection(TreeSet::new));

第一种方式似乎是最不优雅但易于阅读的。第二种方法让我担心，并且会处理流两次。最后一种方法感觉还不错，但是流中的TreeSet开销是什么？distinctsorted

有什么线索吗？谢谢。

答案 1

初步分析

从 Stream API 源代码来看，我最初的猜测是：对于许多项目，简单流（2）应该是最快的，性能明显优于 TreeSet 版本（1），那么 TreeSet 流（3）应该会落后一点。对于短数据集，（1）可能比（2）更好，后者比（3）更好，因为流创建会增加一些开销。不同排序的流的工作方式大致如下：

Set<MyObj> set = new HashSet<>();
List<MyObj> result = new ArrayList<>();
for (Foo foo : foos) {
    MyObj myObj = new MyObj(foo);
    if(set.add(myObj))
        result.add(myObj);
}
result.sort(null);
return result;

让我们将此实现添加为（4）。它用于检查结果是否不同，将它们添加到中间容器中，然后对其进行排序。这应该比维护更快，因为我们不需要在每次插入后保持顺序（这样做，可能会重新平衡树）。实际的 Stream 实现效率会稍低一些，因为它无法就地对生成的列表进行排序。相反，它创建中间容器，对其进行排序，然后使用一系列调用将结果转储到最终列表中。HashSetTreeSetTreeSetlist.add

结果可能取决于初始集合中的元素数量以及不同元素的数量。我称之为多样性：多样性= 1意味着大致每个元素都是不同的;多样性 = 0.5 表示每个元素大约重复两次。此外，结果可能在很大程度上取决于初始元素顺序：当输入数据预分类或几乎预排序时，排序算法可能会快几个数量级。foos

实验设置

因此，让我们通过以下方式参数化测试：

尺寸（元素数量）： 10， 1000， 100000foos
多样性（不同多样性的分数）：1、0.5、0.2
预分类：真，假

我假设它只包含一个字段。当然，结果可能在很大程度上取决于类的实现，因为版本（2）和（4）使用，而版本（1）和（3）使用。我们将简单地做到这一点：FoointcompareToequalshashCodeFooequalshashCodecompareTo

@Override
public int hashCode() {
    return x;
}

@Override
public boolean equals(Object o) {
    return this == o || (o != null && getClass() == o.getClass() && x == ((Foo) o).x);
}

@Override
public int compareTo(Foo o) {
    return Integer.compare(x, o.x);
}

预分类元素可以通过以下方式生成：

foos = IntStream.range(0, size)
                .mapToObj(x -> new Foo((int)(x*diversity)))
                .collect(Collectors.toList());

随机元素可以通过以下方式生成：

foos = new Random().ints(size, 0, (int) (size * diversity))
                   .mapToObj(Foo::new)
                   .collect(Collectors.toList());

使用 JMH 1.13 和 JDK 1.8.0_101、VM 25.101-b13 64 位执行测量

结果

预分类（所有时间均以 μs 为单位）：

diversity size      (1)      (2)      (3)      (4)
  1         10      0.2      0.5      0.3      0.2
  1       1000     48.0     36.0     53.0     24.2
  1     100000  14165.7   4759.0  15177.3   3341.6
0.5         10      0.2      0.3      0.2      0.2
0.5       1000     36.9     23.1     41.6     20.1
0.5     100000  11442.1   2819.2  12508.7   2661.3
0.2         10      0.1      0.3      0.2      0.2
0.2       1000     32.0     13.0     29.8     16.7
0.2     100000   8491.6   1969.5   8971.9   1951.7

未预分类：

diversity size      (1)      (2)      (3)      (4)
  1         10      0.2      0.4      0.2      0.3
  1       1000     72.8     77.4     73.6     72.7
  1     100000  21599.9  16427.1  22807.8  16322.2
0.5         10      0.2      0.3      0.2      0.2
0.5       1000     64.8     46.9     69.4     45.5
0.5     100000  20335.2  11190.3  20658.6  10806.7
0.2         10      0.1      0.3      0.2      0.2
0.2       1000     48.0     19.6     56.7     22.2
0.2     100000  16713.0   5533.4  16885.0   5930.6

讨论

我最初的猜测总的来说是正确的。对于预排序的数据（2）和（4）是当我们有100，000个元素时的倍数。当我们有许多重复项时，差异会变得更大，因为它们不会增加排序时间，并且重复插入到比重复插入更有效。对于随机数据，差异不那么明显，因为与TimSort算法（Java用于对列表和数组进行排序）相比，性能对输入数据顺序的依赖要小得多。对于小数据集，简单是快速的，但使用（4）版本也可能具有竞争力。HashSetTreeSetTreeSetTreeSet

此处提供了基准测试的源代码以及原始结果。

答案 2

如果不分析输入，很难给出一个好的答案。无论如何，我会分享我的结果：

我为单个做了一个容器，为单个做了一个容器。此外，我使所有测试都以将数据复制到普通数组中结束。此外，我还添加了两种方法：FoolongMyObjFoo

4）. 简单数组

@Benchmark
public void simpleArray(Blackhole bh) {
    MyObj[] ret = new MyObj[foos.size()];
    for (int i=0;i<foos.size();i++)
        ret[i] = new MyObj(foos.get(i));
    Arrays.sort(ret);
    int lastDistinct = 0;
    for (int i = 1; i < ret.length; i++) {
        if (ret[i].equals(ret[lastDistinct])) {
            continue;
        }
        lastDistinct++;
        ret[lastDistinct] = ret[i];
    }
    MyObj[] ret2 = new MyObj[lastDistinct + 1];
    System.arraycopy(ret, 0, ret2, 0, lastDistinct + 1);
    bh.consume(ret2);
}

和（2）的反转顺序：distinctorder

@Benchmark
public void simpleStream_distinctAfterSort(Blackhole bh) {
    List<MyObj> ret = foos.stream().map(MyObj::new)
            .sorted().distinct()
            .collect(Collectors.toList());
    bh.consume(ret.toArray(new MyObj[ret.size()]));
}

测试设置：

public static final int MAX_SIZE = 10_000;
public static final long ELEM_THRESHOLD = MAX_SIZE * 10;
private List<Foo> foos;

@Setup
public void init() throws IOException, IllegalAccessException, InstantiationException {
    foos = new ArrayList<>(MAX_SIZE);
    for (int i = 0; i < MAX_SIZE; i++) {
        foos.add(new Foo(ThreadLocalRandom.current().nextLong(ELEM_THRESHOLD)));
    }
}

现在，具有不同大小和阈值的结果：

Size=10_000
Threshold=Size*10
Benchmark                                         Mode  Cnt    Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2  478,978          ops/s
StreamBenchmark5.simpleArray                     thrpt    2  591,287          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  407,556          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2  533,091          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  492,709          ops/s

Size=10_000
Threshold=Size/10
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   937,908          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   593,983          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  3344,508          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   560,652          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  1000,585          ops/s

Size=500_000
Threshold=Size*10
Benchmark                                         Mode  Cnt  Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2  1,809          ops/s
StreamBenchmark5.simpleArray                     thrpt    2  4,009          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  2,443          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2  4,141          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  2,040          ops/s

Size=500_000
Threshold=Size/10
Benchmark                                         Mode  Cnt   Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   5,724          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   4,567          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  19,001          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   4,840          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2   5,407          ops/s

Size=1_000_000
Threshold=Size/100
Benchmark                                         Mode  Cnt   Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   4,529          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   2,402          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  35,699          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   2,232          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2   4,889          ops/s

如您所见，根据重复的数量，算法会进行更可取的更改。最平衡的方法是（3），但最快的方法几乎总是简单的流（具有和定位对应于输入数据）。TreeSetorderdistinct

如果你愿意玩一下，这里是测试源。您将需要 JMH。