Suggests alternative words for a given input word, useful for expanding search results.
Search Corrections
Suggests alternative words for a given input word.
This can be useful in a website search feature where the given search term produces no results, but an alternative spelling or stem of the term may produce results.
The module has two methods intended for public use:
findSimilarWords()
: this method suggests corrected spellings or similar alternatives for the given word based on words that exist in the website.stem()
: this method returns the stem of the given word, which may give a full or partial match for a word within the website.
The module doesn't dictate any particular way of using it in a website search feature, but one possible approach is as follows. If a search produces no matching pages you can take the search term (or if multiple terms, split and then loop over each term) and use the module methods to find alternative words and/or the stem word. Then automatically perform a new search using the alternative word(s), and show a notice to the user, e.g.
Your search for "begining" produced no matches. Including results for "beginning" and "begin".
findSimilarWords()
This method creates a list of unique words (the "word list") that exist on the pages and fields that you define, and compares those words to a target word that you give it. The method returns an array of words that are sufficiently similar to the target word.
For multi-language sites, the $user
language determines which language populates the word list.
Similarity
The method ranks similar words by calculating the Levenshtein distance from the target word.
Where several results have the same Levenshtein distance from the target word these are ordered so that results which have more letters in common with the target word at the start of the result word are higher in the order.
Method arguments
$target
(string)
The input word.
$selector
(string)
A selector string to find the pages that the word list will be derived from.
$fields
(array)
An array of field names that the word list will be derived from.
$options
(array)
Optional: an array of options as described below.
minWordLength
(int)
Words below this length will not be included in the word list. Default:4
lengthRange
(int)
Words that are longer or shorter than the target word by more than this number will not be included in the word list. Default:2
expire
(int)
The word list is cached for this number of seconds, to improve performance. Default:3600
maxChangePercent
(int)
When the Levenshtein distance between a word and the target word is calculated, the distance is then converted into a percentage of changed letters relative to the target word. Words that have a higher percentage change than this value are not included in the results. Default:50
insertionCost
(int)
This is an optional argument for the PHPlevenshtein()
function. See the docs for details. Default:1
replacementCost
(int)
This is an optional argument for the PHPlevenshtein()
function. See the docs for details. Default:1
deletionCost
(int)
This is an optional argument for the PHPlevenshtein()
function. See the docs for details. Default:1
Example of use// The input word that may need correcting
$target = 'dispraxia';
// Get the Search Corrections module
$sc = $modules->get('SearchCorrections');
// Define a selector string to find the pages that the word list will be derived from
$selector = "template=basic-page";
// Define an array of field names that the word list will be derived from
$flds = ['title', 'body'];
// Optional: override any of the default options
$options = ['maxChangePercent' => 55];
// Get an array of similar words that exist in the pages/fields you defined
// The return value is in the format $word => $levenshtein_distance
$results = $sc->findSimilarWords($target, $selector, $flds, $options);
Example result:
stem()
This method uses php-stemmer to return the stem of the given word. As an example, "fish" is the stem of "fishing", "fished", and "fisher".
The returned stem may be the original given word in some cases. The stem is not necessarily a complete word, e.g. the stem of "argued" is "argu".
If using the stem in a search you will probably want to use a selector operator that can match partial words.
Method arguments
$word
(string)
The input word.
$language
(string)
Optional: the language name in English. The valid options are shown below. Default: english
- catalan
- danish
- dutch
- english
- finnish
- french
- german
- italian
- norwegian
- portuguese
- romanian
- russian
- spanish
- swedish
Alternatively, you can use the ISO 639 language code for any of the above languages.
Example of use// The input word
$word = 'fishing';
// Get the Search Corrections module
$sc = $modules->get('SearchCorrections');
// Get the stem of the word
$stem = $sc->stem($word);
Install and use modules at your own risk. Always have a site and database backup before installing new modules.