Boyer-Moore Algorithm
๐ก Concept Name
Boyer-Moore Algorithm โ A landmark string searching algorithm that drastically reduces the number of comparisons by smartly skipping ahead in the text using the pattern's own structure.
๐ Quick Intro
Boyer-Moore is designed for rapid pattern matching in long texts. By analyzing the pattern up front, it can often leap over big sections of text, making it one of the fastest search techniques for real-world cases.
๐ง Analogy / Short Story
Picture searching for a friend in a crowd with red hats. Instead of scanning every face, you jump ahead to spots where you actually see red hatsโignoring the rest. Boyer-Moore works the same way: it only stops where there's a real chance for a match, skipping dead ends with clever shortcuts.
๐ง Technical Explanation
- ๐ฉ Bad Character Rule: If a mismatch happens, it shifts the pattern so that the mismatched character in the text aligns with its last occurrence in the patternโor skips it entirely if it doesn't exist.
- ๐ Good Suffix Rule: Uses information from matched suffixes to jump to the next possible match position.
- โฉ Right-to-Left Comparison: Scans the pattern backwards, starting from the end, which helps in faster mismatches and shifts.
- โก Real-World Speed: While the worst-case is O(mn), in practice itโs much faster, especially with longer patterns and large texts.
๐ฏ Purpose & Use Case
- โ Ultra-fast text search in editors and IDEs.
- โ DNA pattern analysis in bioinformatics.
- โ Compilers and search engines for finding keywords.
- โ Detecting plagiarism or malware signatures in files.
๐ป Real Code Example
// Boyer-Moore search (using Bad Character heuristic)
int BoyerMooreSearch(string text, string pattern)
{
int[] last = new int[256];
for (int i = 0; i < last.Length; i++) last[i] = -1;
for (int i = 0; i < pattern.Length; i++)
last[(int)pattern[i]] = i;
int n = text.Length, m = pattern.Length, s = 0;
while (s <= n - m)
{
int j = m - 1;
while (j >= 0 && pattern[j] == text[s + j]) j--;
if (j < 0)
return s; // Found!
else
s += Math.Max(1, j - last[(int)text[s + j]]);
}
return -1;
}
// Example usage:
Console.WriteLine(BoyerMooreSearch("EXAMPLE IN A SAMPLE TEXT", "SAMPLE")); // Output: 12

โ Interview Q&A
Q1: What is the main benefit of Boyer-Moore over naive search?
A: It skips over sections of text, reducing unnecessary comparisons.
Q2: Which two key heuristics power Boyer-Moore?
A: The Bad Character Rule and Good Suffix Rule.
Q3: How does Boyer-Moore decide how far to jump?
A: Based on the mismatched characterโs position in the pattern or matched suffixes.
Q4: In which direction does it check characters?
A: From right to left (end of the pattern backwards).
Q5: When does Boyer-Moore perform best?
A: On long patterns and large texts, especially with few pattern repetitions.
Q6: What is the Bad Character Rule?
A: It shifts the pattern so the bad character aligns with the rightmost occurrence in the pattern.
Q7: What does the Good Suffix Rule do?
A: It shifts the pattern based on matching suffixes after a mismatch.
Q8: How does Boyer-Moore preprocess the pattern?
A: By building tables for bad characters and good suffixes.
Q9: Is Boyer-Moore efficient for short patterns?
A: It's most efficient for longer patterns.
Q10: What is a real-world application of Boyer-Moore?
A: Plagiarism detection and text editors.
๐ MCQs
Q1. What is the Boyer-Moore algorithm designed for?
- Sorting
- Fast string search
- Compression
- Data hashing
Q2. Which rule lets Boyer-Moore skip text on mismatch?
- Quick Exit
- Suffix Check
- Bad Character Rule
- Heap Skip
Q3. Boyer-Moore scans the pattern in which direction?
- Left to right
- Random
- Middle out
- Right to left
Q4. Why does Boyer-Moore often beat KMP in practice?
- It sorts first
- Skips ahead using more pattern info
- It’s O(1)
- It reverses the pattern
Q5. Which area benefits from Boyer-Moore?
- Number sorting
- Plagiarism detection
- Array merging
- Stack overflow
Q6. What tables does Boyer-Moore preprocess?
- Hash table
- Bad character and good suffix tables
- Frequency table
- Priority queue
Q7. What happens when a mismatch occurs?
- Restart search
- Pattern shifts based on rules
- Search stops
- Swap characters
Q8. Which rule applies when a mismatch occurs at the end of the pattern?
- Bad Character Rule
- Good Suffix Rule
- Prefix Rule
- Suffix Rule
Q9. What is the worst-case time complexity of Boyer-Moore?
- O(n^2)
- O(n + m)
- O(m log n)
- O(n)
Q10. Which search algorithm is generally faster in practice, Boyer-Moore or naive?
- Naive
- Boyer-Moore
- KMP
- Rabin-Karp
๐ก Bonus Insight
Because Boyer-Moore uses knowledge about both mismatches and matches, it can sometimes skip most of the text entirelyโmaking it a favorite for applications like grep and text editors searching through gigabytes of data.
๐ PDF Download
Need a handy summary for your notes? Download this topic as a PDF!