Java indexOf 函数比 Rabin-Karp 更有效?文本的搜索效率
几周前,我向Stackoverflow提出了一个问题,关于创建一个有效的算法来搜索大块文本中的模式。现在我正在使用字符串函数 indexOf 进行搜索。一项建议是使用Rabin-Karp作为替代方案。我写了一个小测试程序,如下所示来测试Rabin-Karp的实现。
public static void main(String[] args) {
    String test = "Mary had a little lamb whose fleece was white as snow";
    String p = "was";
     long start  = Calendar.getInstance().getTimeInMillis();
     for (int x = 0; x < 200000; x++)
         test.indexOf(p);
     long end = Calendar.getInstance().getTimeInMillis();
     end = end -start;
     System.out.println("Standard Java Time->"+end);
    RabinKarp searcher = new RabinKarp("was");
    start  = Calendar.getInstance().getTimeInMillis();
    for (int x = 0; x < 200000; x++)
    searcher.search(test);
    end = Calendar.getInstance().getTimeInMillis();
    end = end -start;
    System.out.println("Rabin Karp time->"+end);
}
以下是我正在使用的Rabin-Karp的实现:
import java.math.BigInteger;
import java.util.Random;
public class RabinKarp {
private String pat; // the pattern // needed only for Las Vegas
private long patHash; // pattern hash value
private int M; // pattern length
private long Q; // a large prime, small enough to avoid long overflow
private int R; // radix
private long RM; // R^(M-1) % Q
static private long dochash = -1L;
public RabinKarp(int R, char[] pattern) {
    throw new RuntimeException("Operation not supported yet");
}
public RabinKarp(String pat) {
    this.pat = pat; // save pattern (needed only for Las Vegas)
    R = 256;
    M = pat.length();
    Q = longRandomPrime();
    // precompute R^(M-1) % Q for use in removing leading digit
    RM = 1;
    for (int i = 1; i <= M - 1; i++)
        RM = (R * RM) % Q;
    patHash = hash(pat, M);
}
// Compute hash for key[0..M-1].
private long hash(String key, int M) {
    long h = 0;
    for (int j = 0; j < M; j++)
        h = (R * h + key.charAt(j)) % Q;
    return h;
}
// Las Vegas version: does pat[] match txt[i..i-M+1] ?
private boolean check(String txt, int i) {
    for (int j = 0; j < M; j++)
        if (pat.charAt(j) != txt.charAt(i + j))
            return false;
    return true;
}
// check for exact match
public int search(String txt) {
    int N = txt.length();
    if (N < M)
        return -1;
    long txtHash;
    if (dochash == -1L) {
        txtHash = hash(txt, M);
        dochash = txtHash;
    } else
        txtHash = dochash;
    // check for match at offset 0
    if ((patHash == txtHash) && check(txt, 0))
        return 0;
    // check for hash match; if hash match, check for exact match
    for (int i = M; i < N; i++) {
        // Remove leading digit, add trailing digit, check for match.
        txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;
        txtHash = (txtHash * R + txt.charAt(i)) % Q;
        // match
        int offset = i - M + 1;
        if ((patHash == txtHash) && check(txt, offset))
            return offset;
    }
    // no match
    return -1; // was N
}
// a random 31-bit prime
private static long longRandomPrime() {
    BigInteger prime = new BigInteger(31, new Random());
    return prime.longValue();
}
// test client
}
Rabin-Karp的实现的工作原理是它返回我正在寻找的字符串的正确偏移量。然而,令我惊讶的是当我运行测试程序时发生的时序统计信息。它们分别是:
Standard Java Time->39
Rabin Karp time->409
这真的很令人惊讶。Rabin-Karp(至少在这里实现)不仅不比标准的java indexOf String函数快,而且速度慢一个数量级。我不知道出了什么问题(如果有的话)。有人对此有想法吗?
谢谢
埃利奥特