Most efficient way to search for unknown patterns in a string?
I am trying to find patterns that:
- occur more than once
- are more than 1 character long
- are not substrings of any other known pattern
without knowing any of the patterns that might occur.
For example:
- The string "the boy fell by the bell" would return .
'ell', 'the b', 'y '
- The string "the boy fell by the bell, the boy fell by the bell" would return .
'the boy fell by the bell'
Using double for-loops, it can be brute forced very inefficiently:
ArrayList<String> patternsList = new ArrayList<>();
int length = string.length();
for (int i = 0; i < length; i++) {
int limit = (length - i) / 2;
for (int j = limit; j >= 1; j--) {
int candidateEndIndex = i + j;
String candidate = string.substring(i, candidateEndIndex);
if(candidate.length() <= 1) {
continue;
}
if (string.substring(candidateEndIndex).contains(candidate)) {
boolean notASubpattern = true;
for (String pattern : patternsList) {
if (pattern.contains(candidate)) {
notASubpattern = false;
break;
}
}
if (notASubpattern) {
patternsList.add(candidate);
}
}
}
}
However, this is incredibly slow when searching large strings with tons of patterns.