What is more efficient: sorted stream or sorting a list?

java arraylist java-8 java-stream

2022-09-01 23:18:07

Assume we have some items in a collection and we want to sort them using certain comparator, expecting result in a list:

Collection<Item> items = ...;
Comparator<Item> itemComparator = ...;

One of the approaches is to sort items in a list, something like:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

Anothe approach is using a sorted stream:

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

I wonder, which approach is more efficient? Are there any advantages of a sorted stream (like faste sorting on multiple cores)?

Efficient in a sense of runtime complexity/fastest.

I don't trust myself to implement a perfect benchmark and studying did not really enlighten me.SortedOps

答案 1

To be honest I don't trust myself too much either in (unless I understand the assembly, which takes lots of time in my case), especially since I've used , but here is a small test (I took the generation from some other test I did, but it should not matter, it's just some data to sort)JMH@Setup(Level.Invocation)StringInput

@State(Scope.Thread)
public static class StringInput {

    private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
            "y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };

    public String s = "";

    public List<String> list;

    @Param(value = { "1000", "10000", "100000" })
    int next;

    @TearDown(Level.Invocation)
    public void tearDown() {
        s = null;
    }

    @Setup(Level.Invocation)
    public void setUp() {

         list = ThreadLocalRandom.current()
                .ints(next, 0, letters.length)
                .mapToObj(x -> letters[x])
                .map(x -> Character.toString((char) x.intValue()))
                .collect(Collectors.toList());

    }
}


@Fork(1)
@Benchmark
public List<String> testCollection(StringInput si){
    Collections.sort(si.list, Comparator.naturalOrder());
    return si.list;
}

@Fork(1)
@Benchmark
public List<String> testStream(StringInput si){
    return si.list.stream()
            .sorted(Comparator.naturalOrder())
            .collect(Collectors.toList());
}

Results show that is faster, but not by a big margin:Collections.sort

Benchmark                                 (next)  Mode  Cnt   Score   Error  Units
streamvsLoop.StreamVsLoop.testCollection    1000  avgt    2   0.038          ms/op
streamvsLoop.StreamVsLoop.testCollection   10000  avgt    2   0.599          ms/op
streamvsLoop.StreamVsLoop.testCollection  100000  avgt    2  12.488          ms/op
streamvsLoop.StreamVsLoop.testStream        1000  avgt    2   0.048          ms/op
streamvsLoop.StreamVsLoop.testStream       10000  avgt    2   0.808          ms/op
streamvsLoop.StreamVsLoop.testStream      100000  avgt    2  15.652          ms/op

答案 2

It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)

Looking at Java 8 source code for streams (specifically the internal class ), the method adds a component to a stream pipeline that captures all of the stream elements into either an array or an .java.util.stream.SortedOpssorted()ArrayList

An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.
Otherwise, an is used to gather the elements to be sorted.ArrayList

If an is used, you incur the extra overhead of building / growing the list.ArrayList

Then we return to two versions of the code:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

In this version, the constructor copies the elements to an appropriately sized array, and then does an in-place sort of that array. (This happens under the covers).ArrayListitemsCollections.sort

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

In this version, as we have seen above, the code associated with either builds and sorts an array (equivalent to what happens above) or it builds the the slow way. But on top of that, there are the overheads of stream the data from and to the collector.sorted()ArrayListitems

Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the sorting will tend to dominate the overheads of copying. That will mean that the relative difference between the two versions will get smaller.O(NlogN)O(N)

If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)