A collection of edit distance algorithms in Crystal.

Includes Levenshtein, Restricted Edit (Optimal Alignment) and Damerau-Levenshtein distances, and Jaro and Jaro-Winkler similarity.

Add this to your application's `shard.yml`:

```
dependencies:
  edits:
    github: tcrouch/edits.cr
```

```
require "edits"
```

Calculate the edit distance between two sequences with variants of the Levenshtein distance algorithm.

```
Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3
```

- **Levenshtein** edit distance, counting insertion, deletion and substitution.
- **Restricted Damerau-Levenshtein** edit distance (aka **Optimal Alignment**), counting insertion, deletion, substitution and transposition (adjacent symbols swapped). Restricted by the condition that no substring is edited more than once.
- **Damerau-Levenshtein** edit distance, counting insertion, deletion, substitution and transposition (adjacent symbols swapped).
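To make the difference between the first two variants concrete, here is a minimal sketch of the textbook dynamic-programming recurrence, with the adjacent-transposition rule as an optional extra step. It is not the shard's implementation: `edit_distance_sketch` and its `transpositions` flag are hypothetical names used only for illustration.

```crystal
# Hypothetical helper, not part of the shard.
# With transpositions = false this is plain Levenshtein; with true it is the
# restricted (optimal alignment) variant.
def edit_distance_sketch(a, b, transpositions)
  s = a.chars
  t = b.chars
  # d[i][j] = distance between the first i chars of s and the first j chars of t.
  # Row 0 and column 0 hold the cost of building a prefix from nothing.
  d = Array.new(s.size + 1) { |i| Array.new(t.size + 1) { |j| i == 0 ? j : (j == 0 ? i : 0) } }
  (1..s.size).each do |i|
    (1..t.size).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      # deletion, insertion, substitution (or match)
      best = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
      # adjacent symbols swapped: "ab" -> "ba" counts as one edit
      if transpositions && i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        best = [best, d[i - 2][j - 2] + 1].min
      end
      d[i][j] = best
    end
  end
  d[s.size][t.size]
end

puts edit_distance_sketch("iota", "atom", false) # => 4
puts edit_distance_sketch("iota", "atom", true)  # => 3
```

The unrestricted Damerau-Levenshtein distance needs extra bookkeeping to allow edits between transposed symbols, which this sketch deliberately leaves out.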

| | Levenshtein | Restricted Damerau-Levenshtein | Damerau-Levenshtein |
|---|---|---|---|
| "raked" vs. "bakers" | 3 | 3 | 3 |
| "iota" vs. "atom" | 4 | 3 | 3 |
| "acer" vs. "earn" | 4 | 4 | 3 |

Levenshtein and Restricted Edit distances accept an optional maximum bound.

```
Edits::Levenshtein.distance "fghijk", "abcde", 3
# => 3
```
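The README does not describe how the bound is applied internally. One common technique, sketched here under that caveat, is to stop as soon as the distance can no longer fall below the bound, so the result is the true distance capped at `max`. The name `bounded_levenshtein_sketch` is hypothetical.

```crystal
# Hypothetical helper, not part of the shard: Levenshtein distance that
# returns `max` as soon as the true distance is known to be at least `max`.
def bounded_levenshtein_sketch(a, b, max)
  s = a.chars
  t = b.chars
  # The length difference is a lower bound on the distance.
  return max if (s.size - t.size).abs >= max
  # prev[j] = distance between the chars of s seen so far and the first j chars of t
  prev = (0..t.size).to_a
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    # Row minimums never decrease, so the final distance cannot be below this one.
    return max if curr.min >= max
    prev = curr
  end
  [prev[t.size], max].min
end

puts bounded_levenshtein_sketch("fghijk", "abcde", 3)  # => 3
puts bounded_levenshtein_sketch("raked", "bakers", 10) # => 3
```

In the first call the unbounded distance would be 6, but the loop exits after the third row because every entry has already reached the bound.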

The convenience method `most_similar` searches for the best match to a given sequence from a collection. It is similar to using `min_by`, but leverages a maximum bound.

```
Edits::RestrictedEdit.most_similar "atom", ["iota", "tome", "mown", "tame"]
# => "tome"
```
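As a rough illustration of why a bound helps in such a search, the sketch below tightens the bound to the best distance found so far, so later candidates can be abandoned early. This is an assumption about the general technique, not the shard's code: `bounded_distance` and `most_similar_sketch` are hypothetical names, and plain Levenshtein stands in for the restricted edit distance used in the example above.

```crystal
# Hypothetical helpers, not part of the shard.

# Levenshtein distance capped at `max`, giving up early where possible.
def bounded_distance(a, b, max)
  s = a.chars
  t = b.chars
  return max if (s.size - t.size).abs >= max
  prev = (0..t.size).to_a
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + (sc == tc ? 0 : 1)].min
    end
    return max if curr.min >= max
    prev = curr
  end
  [prev[t.size], max].min
end

# Returns the first candidate with the smallest distance to `target`,
# or nil when `candidates` is empty.
def most_similar_sketch(target, candidates)
  best = nil
  # No distance exceeds the longer length, so this bound excludes nothing at first.
  bound = ([target.size] + candidates.map { |c| c.size }).max + 1
  candidates.each do |c|
    d = bounded_distance(target, c, bound)
    if d < bound
      best = c
      bound = d # only strictly closer candidates can win from here on
    end
  end
  best
end

puts most_similar_sketch("atom", ["iota", "tome", "mown", "tame"]) # => tome
```

A plain `min_by` would compute every distance in full; here, once "tome" is found at distance 2, the remaining words are rejected as soon as they are known to be no closer.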

Calculate the Jaro and Jaro-Winkler similarity/distance of two sequences.

```
Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643
Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863
```
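The values above can be reproduced by hand, assuming the standard definitions: $m$ matching symbols, $t$ transpositions (half the number of matched symbols that are out of order), a common prefix of length $\ell$ capped at 4, and a prefix scaling factor $p = 0.1$. The README does not state these parameters; they are inferred because they reproduce the outputs shown.

```latex
\[
\mathrm{sim}_J(s_1, s_2) = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right),
\qquad
\mathrm{sim}_{JW} = \mathrm{sim}_J + \ell\, p\,\bigl(1 - \mathrm{sim}_J\bigr),
\qquad
\mathrm{dist} = 1 - \mathrm{sim}
\]

% "information" (11 symbols) vs. "informant" (9 symbols): m = 9, t = 1, \ell = 4
\[
\mathrm{sim}_J = \frac{1}{3}\left(\frac{9}{11} + \frac{9}{9} + \frac{8}{9}\right) = \frac{268}{297} \approx 0.90236,
\qquad
\mathrm{sim}_{JW} = \frac{268}{297} + 4 \cdot 0.1 \cdot \frac{29}{297} \approx 0.94141
\]
```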

- [tcrouch] Tom Crouch - creator, maintainer