The accepted solution has been around for a long time, but it can be improved, so for anyone with a similar problem:

This is a classic application for multi-pattern search algorithms.

Java pattern search (with Matcher.find) is not well suited to this. Searching for exactly one keyword is optimized in Java, but searching for an or-expression uses the regex nondeterministic automaton, which backtracks on mismatches. In the worst case each character of the text is processed l times (where l is the sum of the pattern lengths).

Single-pattern search is better, but still not well suited. One has to restart the whole search for every keyword pattern. In the worst case each character of the text is processed p times, where p is the number of patterns.

Multi-pattern search processes each character of the text exactly once. Suitable algorithms include Aho-Corasick, Wu-Manber, and Set Backwards Oracle Matching. Implementations can be found in libraries such as Stringsearchalgorithms or byteseek.

Example with StringSearchAlgorithms:

    AhoCorasick stringSearch = new AhoCorasick(asList("123woods", "woods"));
    CharProvider text = new StringCharProvider("I will come and meet you at the woods 123woods and all the woods", 0);
    StringFinder finder = stringSearch.createFinder(text);
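
To show why a multi-pattern search only needs one pass over the text, here is a minimal, self-contained Aho-Corasick sketch in plain Java. It is not the StringSearchAlgorithms or byteseek implementation; the class name AhoCorasickSketch, its findAll method, and the "start:pattern" result format are made up for illustration, and the sketch assumes non-empty patterns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class; not part of StringSearchAlgorithms or byteseek.
class AhoCorasickSketch {

    // One trie node: outgoing edges, failure link, and the patterns ending here.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> patterns) {
        // Phase 1: insert every pattern into the trie.
        for (String pattern : patterns) {
            Node node = root;
            for (char c : pattern.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(pattern);
        }
        // Phase 2: breadth-first pass that sets the failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> edge : node.next.entrySet()) {
                char c = edge.getKey();
                Node child = edge.getValue();
                // Follow failure links until some state has an edge for c.
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit patterns that end at the failure target (proper suffixes).
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Returns "start:pattern" for every occurrence, ordered by end position.
    List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // On a mismatch, fall back along failure links instead of rescanning.
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String pattern : node.outputs) {
                matches.add((i - pattern.length() + 1) + ":" + pattern);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AhoCorasickSketch search = new AhoCorasickSketch(Arrays.asList("123woods", "woods"));
        System.out.println(search.findAll(
            "I will come and meet you at the woods 123woods and all the woods"));
    }
}
```

For the sample sentence this reports "woods" at offsets 32, 41 and 59 and "123woods" at offset 38. The text index only moves forward; mismatches are handled by following failure links rather than by restarting the search for each keyword.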