取消长时间运行的正则表达式匹配?

2022-09-02 00:31:12

假设我正在运行一个服务,用户可以提交一个正则表达式来搜索大量数据。如果用户提交的正则表达式非常慢(即 Matcher.find() 需要几分钟才能返回),我想要一种方法来取消该匹配。我能想到这样做的唯一方法是让另一个线程监视匹配需要多长时间,并在必要时使用Thread.stop()取消它。

成员变量:

long REGEX_TIMEOUT = 30000L;
Object lock = new Object();
boolean finished = false;
Thread matcherThread;

匹配器线程:

try {
    matcherThread = Thread.currentThread();

    // imagine code to start monitor thread is here

    try {
        matched = matcher.find();
    } finally {
        synchronized (lock) {
            finished = true;
            lock.notifyAll();
        }
    }
} catch (ThreadDeath td) {
    // send angry message to client
    // handle error without rethrowing td
}

监视线程:

synchronized (lock) {
    while (! finished) {
        try {
            lock.wait(REGEX_TIMEOUT);

            if (! finished) {
                matcherThread.stop();
            }
        } catch (InterruptedException ex) {
            // ignore, top level method in dedicated thread, etc..
        }
    }
}

我已经读过 java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html,我认为这种用法是安全的,因为我通过同步控制了ThreadDeath被抛出的位置并处理它,唯一损坏的对象可能是我的Pattern和Matcher实例,无论如何都会被丢弃。我认为这会破坏Thread.stop(),因为我没有重新抛出错误,但我真的不希望线程死亡,只需中止find()方法。

到目前为止,我已经设法避免使用这些已弃用的API组件,但是Matcher.find()似乎不可中断,可能需要很长时间才能返回。有没有更好的方法来做到这一点?


答案 1

从 赫里特里克斯: (crawler.archive.org)

/**
 * CharSequence that noticed thread interrupts -- as might be necessary 
 * to recover from a loose regex on unexpected challenging input. 
 * 
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    CharSequence inner;
    // public long counter = 0; 

    public InterruptibleCharSequence(CharSequence inner) {
        super();
        this.inner = inner;
    }

    public char charAt(int index) {
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

用这个包装你的字符序列,线程中断将起作用...


答案 2

通过一点点变化,可以避免为此使用其他线程:

public class RegularExpressionUtils {

    // demonstrates behavior for regular expression running into catastrophic backtracking for given input
    public static void main(String[] args) {
        Matcher matcher = createMatcherWithTimeout(
                "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y", 2000);
        System.out.println(matcher.matches());
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, String regularExpression, int timeoutMillis) {
        Pattern pattern = Pattern.compile(regularExpression);
        return createMatcherWithTimeout(stringToMatch, pattern, timeoutMillis);
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, Pattern regularExpressionPattern, int timeoutMillis) {
        CharSequence charSequence = new TimeoutRegexCharSequence(stringToMatch, timeoutMillis, stringToMatch,
                regularExpressionPattern.pattern());
        return regularExpressionPattern.matcher(charSequence);
    }

    private static class TimeoutRegexCharSequence implements CharSequence {

        private final CharSequence inner;

        private final int timeoutMillis;

        private final long timeoutTime;

        private final String stringToMatch;

        private final String regularExpression;

        public TimeoutRegexCharSequence(CharSequence inner, int timeoutMillis, String stringToMatch, String regularExpression) {
            super();
            this.inner = inner;
            this.timeoutMillis = timeoutMillis;
            this.stringToMatch = stringToMatch;
            this.regularExpression = regularExpression;
            timeoutTime = System.currentTimeMillis() + timeoutMillis;
        }

        public char charAt(int index) {
            if (System.currentTimeMillis() > timeoutTime) {
                throw new RuntimeException("Timeout occurred after " + timeoutMillis + "ms while processing regular expression '"
                                + regularExpression + "' on input '" + stringToMatch + "'!");
            }
            return inner.charAt(index);
        }

        public int length() {
            return inner.length();
        }

        public CharSequence subSequence(int start, int end) {
            return new TimeoutRegexCharSequence(inner.subSequence(start, end), timeoutMillis, stringToMatch, regularExpression);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

}

非常感谢dawce为我指出这个解决方案,以回答一个不必要的复杂问题