Collect a stream with grouping, counting and filtering operations

java java-8 java-stream

2022-09-04 03:01:57

I'm trying to collect a stream while throwing away rarely used items, as in this example:

import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

@Test
public void shouldFilterCommonlyUsedWords() {
    // given
    List<String> allWords = Arrays.asList(
       "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

    // when
    Set<String> commonlyUsed = allWords.stream()
            .collect(groupingBy(Function.identity(), counting()))
            .entrySet().stream().filter(e -> e.getValue() > 2)
            .map(Map.Entry::getKey).collect(toSet());

    // then
    assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
}

I have a feeling that this could be done much more simply. Am I right?

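Answer 1

There is no way around creating a Map, unless you are willing to accept a very high CPU complexity.

However, you can remove the second collect operation: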

Map<String,Long> map = allWords.stream()
    .collect(groupingBy(Function.identity(), HashMap::new, counting()));
map.values().removeIf(l -> l<=2);
Set<String> commonlyUsed=map.keySet();

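Note that in Java 8, HashSet still wraps a HashMap, so using the keySet() of a HashMap when you wanted a Set in the first place does not waste space, given the current implementation.

Of course, you can hide the post-processing in a Collector if that feels more "streamy":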

Set<String> commonlyUsed = allWords.stream()
    .collect(collectingAndThen(
        groupingBy(Function.identity(), HashMap::new, counting()),
        map-> { map.values().removeIf(l -> l<=2); return map.keySet(); }));
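
If the same post-processing is needed in several places, the collector above could be wrapped in a small reusable helper. This is only a minimal sketch: the class name FrequentItems and the method name occurringAtLeast are hypothetical and not part of the JDK. The counting collector is assigned to a local variable first, purely to keep type inference simple.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class FrequentItems {
    // Hypothetical helper: collects the distinct elements that occur
    // at least minCount times in the stream.
    static <T> Collector<T, ?, Set<T>> occurringAtLeast(long minCount) {
        // Count occurrences into a mutable HashMap so entries can be removed afterwards.
        Collector<T, ?, Map<T, Long>> counts =
            Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.counting());
        return Collectors.collectingAndThen(counts, map -> {
            // Drop every entry whose count is below the threshold.
            map.values().removeIf(count -> count < minCount);
            // The key set is a view of the remaining map entries.
            return map.keySet();
        });
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
            "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        Set<String> commonlyUsed = allWords.stream().collect(occurringAtLeast(3));
        System.out.println(commonlyUsed);
    }
}
```

The threshold is expressed as "at least minCount", so occurringAtLeast(3) corresponds to the removeIf(l -> l<=2) condition used above.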

Answer 2

A while ago I wrote an experimental distinct(atLeast) method for my library:

public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        return map.merge(t, 1L, (u, v) -> (u + v)) == atLeast;
    });
}

So the idea was to use it like this:

Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();

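This performs stateful filtering, which looks a bit ugly. I doubted whether such a feature would be useful, so I did not merge it into the master branch. Nevertheless, it does the job in a single pass over the stream. Perhaps I should revive it. Meanwhile, you can copy this code into a static method and use it like this: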

Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
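
For reference, the static-method variant might look roughly like the following. It is a sketch adapted from the StreamEx method shown above, using only plain JDK streams. The class name DistinctAtLeast is made up for illustration, and the filter remains stateful, so the same caveats apply.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctAtLeast {
    // Passes each distinct element through exactly once, at the moment
    // its occurrence count reaches atLeast.
    static <T> Stream<T> distinct(Stream<T> stream, long atLeast) {
        if (atLeast <= 1)
            return stream.distinct();
        // ConcurrentHashMap does not allow null keys, so nulls are counted separately.
        AtomicLong nullCount = new AtomicLong();
        ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
        return stream.filter(t -> {
            if (t == null) {
                return nullCount.incrementAndGet() == atLeast;
            }
            // merge returns the updated count for this element.
            return map.merge(t, 1L, Long::sum) == atLeast;
        });
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
            "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
        System.out.println(commonlyUsed);
    }
}
```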

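Update (2015/05/31): I added the distinct(atLeast) method to StreamEx 0.3.1. It is implemented using a custom spliterator. Benchmarks showed that, for sequential streams, this implementation is significantly faster than the stateful filtering described above, and in many cases it is also faster than the other solutions proposed in this topic. It also works nicely if null is encountered in the stream (the groupingBy collector does not support null as a grouping key, so groupingBy-based solutions will fail if null is encountered).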
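
To illustrate the null caveat with plain JDK code: Collectors.groupingBy rejects elements that map to a null key by throwing a NullPointerException. A small demonstration follows; the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingByNullDemo {
    // Returns true if counting the words with groupingBy fails with a
    // NullPointerException, false if the collection succeeds.
    static boolean groupingByRejects(List<String> words) {
        try {
            words.stream()
                 .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
            return false;
        } catch (NullPointerException e) {
            // groupingBy refuses elements that are classified to a null key.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupingByRejects(Arrays.asList("call", "very", "call")));  // prints false
        System.out.println(groupingByRejects(Arrays.asList("call", null, "call")));    // prints true
    }
}
```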