String Matching Algo: KMP

Introduction

Nowadays, there is so much information around us. It’s impossible to read all of them. String matching algorithm becomes indispensable. It helps us grab the pieces we want.

Background

Before talking about KMP, let’s understand how to get a string match. There are 2 strings, the Main string, and the Pattern string. We need to find the pattern in the main.

a a b c      a a b c
| ==>
a b > a b

Partial Match Table (PTM)

Let’s look at the pattern when a mismatch occurs.

a b a b a b c
|
a b a b c
  • The suffix ab mathes the ab in the prefix of pattern string
a b a b a b c
> > a b a b c
pattern : a b a b a b c aPTM     : 0 0 1 2 3 4 0 1
  • PTM[1] = 0: In the string `ab`, the last `b` doesn’t matche the first `a`. So the matched string length is 0.
  • PTM[2] = 1: In the string `aba`, the last `a` matches the first `a`. So the matched string length is 1.
  • PTM[3] = 2: In the string `abab`, the last `ab` matches the last `ab`. the matched string length is 2.
  • … (same as the case PTM[3] = 2)
  • PTM[7] = 0: In the string `abababc`, the last `c` doesn’t matche the first `a`. So the matched string length is 0.
  • PTM[8] = 1: In the string `abababca`, the last `a` matches the first `a`, and `ca` doesn’t matches the first `ab`. So the matched string length is 1.

Implement

PTM is the key point of KMP. Since we’ve understood it, we can write codes to implement this algorithm.

pattern :  a b a b a b c aPTM     :  0 0 1 2 3 4 0 1next    : -1 0 0 1 2 3 4 0 1

Developper in Paris, interested in Big Front-end, Full Stack, Explainable AI. Try to introduce AI into Full Stack workflow