Is there a workaround for Java's poor performance on walking huge directories?

performance java directory-walk

2022-09-01 21:00:25

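I am trying to process files one at a time that are stored over a network. Reading the files is fast thanks to buffering, so that is not the issue. The problem I have is just listing the contents of a folder. I have at least 10k files per folder, across many folders.

Performance is very slow because File.list() returns an array instead of an iterable. Java goes off and collects all the names in a folder and packs them into an array before returning.

The bug entry for this is http://bugs.sun.com/view_bug.do;jsessionid=db7fcf25bcce13541c4289edeb4?bug_id=4285834 and it has no workaround. They just say this has been fixed in JDK7.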
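For reference, the JDK 7 functionality in question is presumably the NIO.2 directory stream API (java.nio.file.Files.newDirectoryStream), which hands back entries through an iterator rather than a fully built array. A minimal sketch, assuming a JDK 7 or later runtime; the class name LazyDirectoryListing and the EntryHandler callback are hypothetical names used only for illustration:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name; needs JDK 7 or later for java.nio.file.
final class LazyDirectoryListing {

    /** Hypothetical callback standing in for the real per-file work. */
    interface EntryHandler {
        void handle(Path entry) throws IOException;
    }

    /** Visits each entry of dir through the iterator; returns how many were visited. */
    static long forEachEntry(Path dir, EntryHandler handler) throws IOException {
        long count = 0;
        // try-with-resources closes the underlying directory handle when done
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                handler.handle(entry);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long n = forEachEntry(dir, new EntryHandler() {
            @Override
            public void handle(Path entry) {
                System.out.println("processing file " + entry.getFileName());
            }
        });
        System.out.println(n + " entries");
    }
}
```

Whether this is actually faster over a network share would still have to be measured; the sketch only shows the shape of the API.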

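A few questions:

Does anyone have a workaround for this performance bottleneck?
Am I trying to achieve the impossible? Is performance going to be poor even if it is just iterating over the directories?
Could I use a beta JDK7 build that has this functionality without having to build my entire project on it?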

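Answer 1

Although it's not pretty, I solved this kind of problem once by piping the output of dir/ls to a file before starting my app, and passing in the filename.

If you need to do it within the app, you could just use Runtime.getRuntime().exec(), but it would create some nastiness.

You asked. The first form is going to be blazingly fast, and the second should be pretty fast as well.

Be sure to use the one-item-per-line (bare, no decoration, no graphics), full-path and recurse options of your selected command.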
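The first form could be consumed roughly like this. A minimal sketch, assuming the listing was written ahead of time with one path per line (for example with dir /b /s redirected to a file) and is in the platform default encoding; the class name ListingFileReader and the LineHandler callback are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper: reads a pre-generated listing, one path per line.
final class ListingFileReader {

    /** Hypothetical callback standing in for the real per-file work. */
    interface LineHandler {
        void handle(String path);
    }

    /** Calls handler for each non-empty line; returns how many lines were handled. */
    static int processListing(String listingFile, LineHandler handler) throws IOException {
        int count = 0;
        // Assumption: the listing uses the platform default charset
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(listingFile), Charset.defaultCharset()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.length() == 0) {
                    continue; // skip blank lines in the listing
                }
                handler.handle(line);
                count++;
            }
        } finally {
            in.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ListingFileReader <listing-file>");
            return;
        }
        int n = processListing(args[0], new LineHandler() {
            public void handle(String path) {
                System.out.println("processing file " + path);
            }
        });
        System.out.println(n + " paths");
    }
}
```

Only one line is held in memory at a time, so the size of the listing does not matter much to the app itself.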

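Edit:

30 minutes just to get a directory listing, wow.

It just struck me that if you use exec(), you can get its stdout redirected into a pipe instead of writing it to a file.

If you did that, you should start getting the files immediately and be able to begin processing before the command has completed.

The interaction may actually slow things down, but maybe not; you might give it a try.

Wow, I just went to find the syntax of the .exec command for you and came across this, possibly exactly what you want (it lists a directory using exec and "ls" and pipes the result into your program for processing): a good link on the Wayback Machine (Jörg provided it in a comment to replace the one from Sun that Oracle broke)

Anyway, the idea is straightforward but getting the code right is annoying. I'll go steal some code from the internet and hack it up. brb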

/**
 * Note: Only use this as a last resort!  It's specific to Windows and even
 * at that it's not a good solution, but it should be fast.
 *
 * To use it, extend FileProcessor and call processFiles("...") with a list
 * of options if you want them, like /s... I highly recommend /b.
 *
 * Override processFile and it will be called once for each line of output.
 */
import java.io.*;

public abstract class FileProcessor
{
   public void processFiles(String dirOptions)
   {
      Process theProcess = null;
      BufferedReader inStream = null;

      // run "dir" through the Windows command interpreter
      try
      {
          theProcess = Runtime.getRuntime().exec("cmd /c dir " + dirOptions);
      }
      catch(IOException e)
      {
         System.err.println("Error on exec() method");
         e.printStackTrace();
         return; // nothing to read if the process never started
      }

      // read from the called program's standard output stream
      try
      {
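         inStream = new BufferedReader(
                                new InputStreamReader( theProcess.getInputStream() ));
         String line;
         // readLine() returns null once the command's output is exhausted
         while ((line = inStream.readLine()) != null)
         {
            processFile(line);
         }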
      }
      catch(IOException e)
      {
         System.err.println("Error on inStream.readLine()");
         e.printStackTrace();  
      }

   } // end method
   /** Override this method--it will be called once for each file */
   public abstract void processFile(String filename);


} // end class

Thank you, code donor from IBM.
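For a variant that is not tied to cmd, the same idea can be sketched with ProcessBuilder, passing the listing command as an argument list so no shell quoting is involved. This is only a sketch under assumptions: the class name StreamingLister and the LineHandler callback are hypothetical, and the usage in main assumes an ls command is available on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: streams a listing command's output line by line.
final class StreamingLister {

    /** Hypothetical callback standing in for the real per-file work. */
    interface LineHandler {
        void handle(String line);
    }

    /** Runs command, passes each output line to handler, returns the exit code. */
    static int stream(List<String> command, LineHandler handler)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so the child cannot stall on an unread
        // stderr pipe; the trade-off is that error text reaches the handler too.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        BufferedReader out = new BufferedReader(
                new InputStreamReader(process.getInputStream()));
        try {
            String line;
            // Lines are handled as they arrive, before the command finishes
            while ((line = out.readLine()) != null) {
                handler.handle(line);
            }
        } finally {
            out.close();
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String dir = args.length > 0 ? args[0] : ".";
        // Assumption: "ls" exists on this machine; -1 prints one name per line
        int exit = stream(Arrays.asList("ls", "-1", dir), new LineHandler() {
            public void handle(String line) {
                System.out.println("processing file " + line);
            }
        });
        System.out.println("exit code " + exit);
    }
}
```

On Windows the command list could instead be something like cmd, /c, dir, /b plus the directory, matching the class above.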

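Answer 2

How about using the File.list(FilenameFilter filter) method and implementing FilenameFilter.accept(File dir, String name) to process each file and return false?

I ran this on a Linux VM for a directory with 10K+ files and it took <10 seconds.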

import java.io.File;  
import java.io.FilenameFilter;

public class Temp {
    private static void processFile(File dir, String name) {
        File file = new File(dir, name);
        System.out.println("processing file " + file.getName());
    }

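    private static void forEachFile(File dir) {
        // accept() is called once per directory entry; returning false
        // leaves the filtered result array empty, so it can be ignored.
        String [] ignore = dir.list(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                processFile(dir, name);
                return false;
            }
        });
    }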

    public static void main(String[] args) {
        long before, after;
        File dot = new File(".");
        before = System.currentTimeMillis();
        forEachFile(dot);
        after = System.currentTimeMillis();
        System.out.println("after call, delta is " + (after - before));
    }  
}