为什么 HashSet.removeAll 需要二次操作?

2022-09-04 21:57:46

我有这个代码来生成一个并调用它。我创建了一个类,它只是一个int的包装器,它记录了调用的次数 - 程序输出该数字。HashSetremoveAll()Aequals

import java.util.*;

class A {
    int x;
    static int equalsCalls;

    A(int x) {
        this.x = x;
    }

    @Override
    public int hashCode() {
        return x;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (!(o instanceof A)) return false;
        return x == ((A)o).x;
    }

    public static void main(String[] args) {
        int setSize = Integer.parseInt(args[0]);
        int listSize = Integer.parseInt(args[1]);
        Set<A> s = new HashSet<A>();
        for (int i = 0; i < setSize; i ++)
            s.add(new A(i));
        List<A> l = new LinkedList<A>();
        for (int i = 0; i < listSize; i++)
            l.add(new A(i));
        s.removeAll(l);
        System.out.println(A.equalsCalls);
    }
}

事实证明,该操作并不像我预期的那样是线性的:removeAll

$ java A 10 10
55
$ java A 100 100
5050
$ java A 1000 1000
500500

事实上,它似乎是二次的。为什么?

用修复替换该行可以按照我期望的方式运行:s.removeAll(l);for (A b : l) s.remove(b);

$ java A 10 10
10
$ java A 100 100
100
$ java A 1000 1000
1000

为什么?

这是一个显示二次曲线的图表:

enter image description here

x 轴和 y 轴是 Java 程序的两个参数。z 轴是呼叫数。A.equals


该图由此渐近线程序生成:

import graph3;
import grid3;
import palette;

currentprojection=orthographic(0.8,1,1);

size(400,300,IgnoreAspect);

defaultrender.merge=true;

real[][] a = new real[][]{
  new real[]{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
  new real[]{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},
  new real[]{0,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3},
  new real[]{0,1,2,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6},
  new real[]{0,1,2,3,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10},
  new real[]{0,1,2,3,4,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15},
  new real[]{0,1,2,3,4,5,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21},
  new real[]{0,1,2,3,4,5,6,28,28,28,28,28,28,28,28,28,28,28,28,28,28},
  new real[]{0,1,2,3,4,5,6,7,36,36,36,36,36,36,36,36,36,36,36,36,36},
  new real[]{0,1,2,3,4,5,6,7,8,45,45,45,45,45,45,45,45,45,45,45,45},
  new real[]{0,1,2,3,4,5,6,7,8,9,55,55,55,55,55,55,55,55,55,55,55},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,66,66,66,66,66,66,66,66,66,66},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,78,78,78,78,78,78,78,78,78},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,91,91,91,91,91,91,91,91},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,105,105,105,105,105,105,105},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,120,120,120,120,120,120},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,136,136,136,136,136},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,153,153,153,153},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,171,171,171},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,190,190},
  new real[]{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,210},
};


surface s=surface(a,(-1/2,-1/2),(1/2,1/2),Spline);
draw(s,mean(palette(s.map(zpart),Rainbow())),black);
//grid3(XYZgrid);

该数组由以下 Haskell 程序生成:a

import System.Process
import Data.List

f :: Integer -> Integer -> IO Integer
f x y = fmap read $ readProcess "/usr/bin/java" ["A", show x, show y] ""

g :: [[Integer]] -> [String]
g xss = map (\xs -> "new real[]{" ++ intercalate "," (map show xs) ++ "},") xss

main = do
  let n = 20
  xs <- mapM (\x -> mapM (\y -> f x y) [0..n]) [0..n]
  putStrLn $ unlines $ g xs

答案 1

工作所需的时间取决于您通过的收藏类型。当您查看以下各项的实现时:removeAllremoveAll

public boolean removeAll(Collection<?> c) {
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

可以看到,当 和 集合具有相同的大小时,它会迭代 和 调用每个元素。由于您使用 a 作为参数,因此这是每个元素的 O(n) 操作。由于它需要对要删除的 n 个元素中的每个元素执行此操作,因此结果是 O(n2) 操作。HashSetHashSetc.containsLinkedList

如果将 替换为提供 更有效实现 的集合,则 的性能应相应提高。如果使集合大于集合,则性能也应显著提高,因为它随后将循环访问集合。LinkedListcontainsremoveAllHashSet


答案 2