如何有效地(性能)从Java中的列表中删除许多项目?

2022-09-01 10:27:45

我有相当大的列表命名项目(> = 1,000,000个项目)和一些由<秒>表示的条件,该条件选择要删除的项目,<秒>对于我列表中的许多(也许是一半)项目都是如此。

我的目标是有效地删除通过<秒>选择的项目并保留所有其他项目,可以修改源列表,可以创建新列表 - 应该选择考虑性能的最佳方法。

这是我的测试代码:

    System.out.println("preparing items");
    List<Integer> items = new ArrayList<Integer>(); // Integer is for demo
    for (int i = 0; i < 1000000; i++) {
        items.add(i * 3); // just for demo
    }

    System.out.println("deleting items");
    long startMillis = System.currentTimeMillis();
    items = removeMany(items);
    long endMillis = System.currentTimeMillis();

    System.out.println("after remove: items.size=" + items.size() + 
            " and it took " + (endMillis - startMillis) + " milli(s)");

和幼稚的实现:

public static <T> List<T> removeMany(List<T> items) {
    int i = 0;
    Iterator<T> iter = items.iterator();
    while (iter.hasNext()) {
        T item = iter.next();
        // <cond> goes here
        if (/*<cond>: */i % 2 == 0) {
            iter.remove();
        }
        i++;
    }
    return items;
}

如您所见,我使用项目索引模数 2 == 0 作为删除条件(<>) - 仅用于演示目的。

可以提供什么更好的版本,为什么这个更好的版本实际上更好?removeMany


答案 1

好了,是时候对所提出的方法进行测试了。以下是我测试过的方法(每种方法的名称在我的源代码中也是类名):

  1. NaiveRemoveManyPerformer - ArrayList使用迭代器和删除 - 在我的问题中给出的第一个和幼稚的实现。
  2. BetterNaiveRemoveManyPerformer - ArrayList从头到尾进行向后迭代和删除。
  3. LinkedRemoveManyPerformer- 幼稚迭代器和删除,但正在工作。不适用:仅适用于 .LinkedListLinkedList
  4. CreateNewRemoveManyPerformer - ArrayList作为副本制作(仅添加保留的元素),迭代器用于遍历输入。ArrayList
  5. SmartCreateNewRemoveManyPerformer- 更好 - 结果的初始大小(容量)设置为最终列表大小。缺点:启动时必须知道列表的最终大小。CreateNewRemoveManyPerformerArrayList
  6. FasterSmartCreateNewRemoveManyPerformer- 甚至更好(? - 使用项目索引 () 而不是迭代器。SmartCreateNewRemoveManyPerformeritems.get(idx)
  7. MagicRemoveManyPerformer- 工作到位(无列表副本)从列表末尾的项目开始压缩孔(已删除的项目)。不考虑:此方法更改列表中项目的顺序。ArrayList
  8. ForwardInPlaceRemoveManyPerformer- 工作到位 - 移动保留项目以填补漏洞,最后返回子列表(没有最终删除或清除)。ArrayList
  9. GuavaArrayListRemoveManyPerformer- 谷歌番石榴用于 - 几乎相同,但做最终删除的项目在列表的末尾。Iterables.removeIfArrayListForwardInPlaceRemoveManyPerformer

完整的源代码在本答案的末尾给出。

使用不同的列表大小(从 10,000 个项目到 10,000,000 个项目)和不同的移除因素(指定必须从列表中移除多少个项目)执行的测试。

正如我在这里发布的其他答案的评论一样 - 我认为将项目从第二个复制到秒将比迭代和仅删除项目更快。Sun的Java文档说,与实现相比,常数因子的因子很低,但令人惊讶的是,在我的问题中并非如此。ArrayListArrayListLinkedListArrayListLinkedList

在实践中,简单迭代和删除在大多数情况下具有最佳性能(此方法在 中实现)。通常只有性能是可比的,其他方法明显较慢。Google番石榴比手工制作的类似代码慢(因为我的代码不会删除列表末尾的不必要项目)。LinkedListLinkedRemoveManyPerformerMagicRemoveManyPerformerLinkedRemoveManyPerformerGuavaArrayListRemoveManyPerformer

从 1,000,000 个源项目中删除 500,000 个项目的示例结果:

  1. NaiveRemoveManyPerformer:未执行测试 - 我不是那个病人,但它的表现比.BetterNaiveRemoveManyPerformer
  2. BetterNaiveRemoveManyPerformer: 226080 毫
  3. LinkedRemoveManyPerformer: 69 毫
  4. CreateNewRemoveManyPerformer: 246 毫
  5. SmartCreateNewRemoveManyPerformer: 112 毫
  6. FasterSmartCreateNewRemoveManyPerformer: 202 毫
  7. MagicRemoveManyPerformer: 74 毫
  8. ForwardInPlaceRemoveManyPerformer: 69 毫
  9. GuavaArrayListRemoveManyPerformer: 118 毫

从 1,000,000 个源项目中删除 1 个项目的示例结果(删除第一个项目):

  1. BetterNaiveRemoveManyPerformer: 34 毫
  2. LinkedRemoveManyPerformer: 41 毫
  3. 创建新移除许多性能: 253 毫
  4. SmartCreateNewRemoveManyPerformer: 108 毫
  5. 更快智能创建新移动多个性能: 71 毫
  6. 魔术移除性能: 43 毫
  7. ForwardInPlaceRemoveManyPerformer: 73 毫
  8. 番石榴阵列列表删除许多性能: 78 毫

从 1,000,000 个源项目中删除 333,334 个项目的示例结果:

  1. BetterNaiveRemoveManyPerformer: 253206 毫
  2. LinkedRemoveManyPerformer: 69 毫
  3. 创建新移除许多性能: 245 毫
  4. SmartCreateNewRemoveManyPerformer: 111 毫
  5. 更快智能创建新移动多个性能: 203 毫
  6. 魔术移除性能: 69 毫
  7. ForwardInPlaceRemoveManyPerformer: 72 毫
  8. 番石榴阵列列表删除许多性能: 102 毫

从 1,000,000 个源项目中删除 1,000,000 个(全部)项目的示例结果(所有项目都将被删除,但需要逐个处理,如果您先验地知道要删除所有项目,则应简单地清除 list):

  1. BetterNaiveRemoveManyPerformer: 58 毫
  2. LinkedRemoveManyPerformer: 88 毫
  3. 创建新移除许多性能: 95 毫
  4. SmartCreateNewRemoveManyPerformer: 91 毫
  5. 更快智能创建新移动多个性能: 48 毫
  6. 魔术移除性能: 61 毫
  7. ForwardInPlaceRemoveManyPerformer: 49 毫
  8. 番石榴阵列列表删除许多性能: 133 毫

我最后的结论:使用混合方法 - 如果处理LinkedList - 简单的迭代和删除是最好的,如果处理ArrayList - 这取决于项目顺序是否重要 - 使用ForwardInPlaceRemoveManyPerformer,那么,如果项目顺序可以改变 - 最好的选择是MagicRemoveManyPerformer。如果删除因子是先验的(您知道将删除与保留的项目数量),那么可能会有更多的条件来选择在特定情况下表现更好的方法。但已知的去除因子不是通常的情况...Google番石榴就是这样一种混合解决方案,但假设略有不同(原始列表必须更改,无法创建新列表,并且项目顺序始终很重要) - 这些是最常见的假设,因此在大多数现实生活中都是最佳选择。Iterables.removeIfremoveIf

还要注意,所有好的方法(天真是不好的!)都足够好 - 其中任何一个在实际应用中都应该做得很好,但必须避免幼稚的方法。

最后 - 我的源代码进行测试。

package WildWezyrListRemovalTesting;

import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RemoveManyFromList {

    public static abstract class BaseRemoveManyPerformer {

        protected String performerName() {
            return getClass().getSimpleName();
        }

        protected void info(String msg) {
            System.out.println(performerName() + ": " + msg);
        }

        protected void populateList(List<Integer> items, int itemCnt) {
            for (int i = 0; i < itemCnt; i++) {
                items.add(i);
            }
        }

        protected boolean mustRemoveItem(Integer itemVal, int itemIdx, int removeFactor) {
            if (removeFactor == 0) {
                return false;
            }
            return itemIdx % removeFactor == 0;
        }

        protected abstract List<Integer> removeItems(List<Integer> items, int removeFactor);

        protected abstract List<Integer> createInitialList();

        public void testMe(int itemCnt, int removeFactor) {
            List<Integer> items = createInitialList();
            populateList(items, itemCnt);
            long startMillis = System.currentTimeMillis();
            items = removeItems(items, removeFactor);
            long endMillis = System.currentTimeMillis();
            int chksum = 0;
            for (Integer item : items) {
                chksum += item;
            }
            info("removing took " + (endMillis - startMillis)
                    + " milli(s), itemCnt=" + itemCnt
                    + ", removed items: " + (itemCnt - items.size())
                    + ", remaining items: " + items.size()
                    + ", checksum: " + chksum);
        }
    }
    private List<BaseRemoveManyPerformer> rmps =
            new ArrayList<BaseRemoveManyPerformer>();

    public void addPerformer(BaseRemoveManyPerformer rmp) {
        rmps.add(rmp);
    }
    private Runtime runtime = Runtime.getRuntime();

    private void runGc() {
        for (int i = 0; i < 5; i++) {
            runtime.gc();
        }
    }

    public void testAll(int itemCnt, int removeFactor) {
        runGc();
        for (BaseRemoveManyPerformer rmp : rmps) {
            rmp.testMe(itemCnt, removeFactor);
        }
        runGc();
        System.out.println("\n--------------------------\n");
    }

    public static class NaiveRemoveManyPerformer
            extends BaseRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            if (items.size() > 300000 && items instanceof ArrayList) {
                info("this removeItems is too slow, returning without processing");
                return items;
            }
            int i = 0;
            Iterator<Integer> iter = items.iterator();
            while (iter.hasNext()) {
                Integer item = iter.next();
                if (mustRemoveItem(item, i, removeFactor)) {
                    iter.remove();
                }
                i++;
            }
            return items;
        }

        @Override
        public List<Integer> createInitialList() {
            return new ArrayList<Integer>();
        }
    }

    public static class BetterNaiveRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
//            if (items.size() > 300000 && items instanceof ArrayList) {
//                info("this removeItems is too slow, returning without processing");
//                return items;
//            }

            for (int i = items.size(); --i >= 0;) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    items.remove(i);
                }
            }
            return items;
        }
    }

    public static class LinkedRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> createInitialList() {
            return new LinkedList<Integer>();
        }
    }

    public static class CreateNewRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            List<Integer> res = createResultList(items, removeFactor);
            int i = 0;

            for (Integer item : items) {
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    res.add(item);
                }
                i++;
            }

            return res;
        }

        protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
            return new ArrayList<Integer>();
        }
    }

    public static class SmartCreateNewRemoveManyPerformer
            extends CreateNewRemoveManyPerformer {

        @Override
        protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
            int newCapacity = removeFactor == 0 ? items.size()
                    : (int) (items.size() * (removeFactor - 1L) / removeFactor + 1);
            //System.out.println("newCapacity=" + newCapacity);
            return new ArrayList<Integer>(newCapacity);
        }
    }

    public static class FasterSmartCreateNewRemoveManyPerformer
            extends SmartCreateNewRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            List<Integer> res = createResultList(items, removeFactor);

            for (int i = 0; i < items.size(); i++) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    res.add(item);
                }
            }

            return res;
        }
    }

    public static class ForwardInPlaceRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            int j = 0; // destination idx
            for (int i = 0; i < items.size(); i++) {
                Integer item = items.get(i);
                if (mustRemoveItem(item, i, removeFactor)) {
                    // no-op
                } else {
                    if (j < i) {
                        items.set(j, item);
                    }
                    j++;
                }
            }

            return items.subList(0, j);
        }
    }

    public static class MagicRemoveManyPerformer
            extends NaiveRemoveManyPerformer {

        @Override
        public List<Integer> removeItems(List<Integer> items, int removeFactor) {
            for (int i = 0; i < items.size(); i++) {
                if (mustRemoveItem(items.get(i), i, removeFactor)) {
                    Integer retainedItem = removeSomeFromEnd(items, removeFactor, i);
                    if (retainedItem == null) {
                        items.remove(i);
                        break;
                    }
                    items.set(i, retainedItem);
                }
            }

            return items;
        }

        private Integer removeSomeFromEnd(List<Integer> items, int removeFactor, int lowerBound) {
            for (int i = items.size(); --i > lowerBound;) {
                Integer item = items.get(i);
                items.remove(i);
                if (!mustRemoveItem(item, i, removeFactor)) {
                    return item;
                }
            }
            return null;
        }
    }

    public static class GuavaArrayListRemoveManyPerformer
            extends BaseRemoveManyPerformer {

        @Override
        protected List<Integer> removeItems(List<Integer> items, final int removeFactor) {
            Iterables.removeIf(items, new Predicate<Integer>() {

                public boolean apply(Integer input) {
                    return mustRemoveItem(input, input, removeFactor);
                }
            });

            return items;
        }

        @Override
        protected List<Integer> createInitialList() {
            return new ArrayList<Integer>();
        }
    }

    public void testForOneItemCnt(int itemCnt) {
        testAll(itemCnt, 0);
        testAll(itemCnt, itemCnt);
        testAll(itemCnt, itemCnt - 1);
        testAll(itemCnt, 3);
        testAll(itemCnt, 2);
        testAll(itemCnt, 1);
    }

    public static void main(String[] args) {
        RemoveManyFromList t = new RemoveManyFromList();
        t.addPerformer(new NaiveRemoveManyPerformer());
        t.addPerformer(new BetterNaiveRemoveManyPerformer());
        t.addPerformer(new LinkedRemoveManyPerformer());
        t.addPerformer(new CreateNewRemoveManyPerformer());
        t.addPerformer(new SmartCreateNewRemoveManyPerformer());
        t.addPerformer(new FasterSmartCreateNewRemoveManyPerformer());
        t.addPerformer(new MagicRemoveManyPerformer());
        t.addPerformer(new ForwardInPlaceRemoveManyPerformer());
        t.addPerformer(new GuavaArrayListRemoveManyPerformer());

        t.testForOneItemCnt(1000);
        t.testForOneItemCnt(10000);
        t.testForOneItemCnt(100000);
        t.testForOneItemCnt(200000);
        t.testForOneItemCnt(300000);
        t.testForOneItemCnt(500000);
        t.testForOneItemCnt(1000000);
        t.testForOneItemCnt(10000000);
    }
}

答案 2

正如其他人所说,你的第一个倾向应该是建立第二个清单。

但是,如果您还想尝试就地编辑列表,那么有效的方法是使用Guava中的Iterables.removeIf()。如果它的论点是一个列表,它会将保留的元素合并到前面,然后简单地砍掉末尾 - 比逐个删除()内部元素要快得多。


推荐