Thursday, April 28, 2011

Detecting misspelled words

I have a list of airport names, and my users can type in an airport name to select it for further processing.

How would you handle misspelled names and present a list of suggestions?

From Stack Overflow
  • Look up Levenshtein distances to match a correct name against a given user input.

    Wedge : Levenshtein distances are pretty expensive to calculate, O(n^2) in the length of the strings, so calculating them against every word in a dictionary is a non-starter.
    TokenMacGuy : The dictionary in this case is just the list of airports. Worldwide there are only a few hundred notable airports.
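
As a rough illustration of the Levenshtein approach discussed above, here is a minimal Python sketch. The airport list, the `suggest` helper, and the `max_distance` cutoff are all made up for the example; with only a few hundred names, comparing the input against every entry should be cheap enough.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, O(len(a) * len(b)) time."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical sample data; a real list would come from your own source.
AIRPORTS = ["Heathrow", "Gatwick", "Schiphol", "Charles de Gaulle", "Frankfurt"]


def suggest(user_input, names=AIRPORTS, limit=3, max_distance=3):
    """Return up to `limit` names within `max_distance` edits, closest first."""
    query = user_input.casefold()
    scored = [(levenshtein(query, name.casefold()), name) for name in names]
    close = sorted((d, name) for d, name in scored if d <= max_distance)
    return [name for _, name in close[:limit]]
```

Calling `suggest("Heathro")` on this sample list returns only "Heathrow", since every other name is more than three edits away. Python's standard library also offers `difflib.get_close_matches`, which uses a different similarity measure but serves the same purpose.
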
  • It may be better to let the user select from the list of airport names instead of letting them type in their own. No mistakes can be made that way.

  • While it won't help right away, you could keep track of typos and record which name the user finally selects after a misspelled entry. That way you can learn the most common typos and offer the best suggestions for them.
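
One way the typo-tracking idea could look in Python is sketched below. The `TypoLog` class and its method names are invented for this example, and a real application would persist the counts somewhere rather than keep them in memory.

```python
from collections import Counter, defaultdict


class TypoLog:
    """Remembers which correct name users ended up choosing for each typo."""

    def __init__(self):
        # Maps a normalized typo to a Counter of the names chosen afterwards.
        self._corrections = defaultdict(Counter)

    def record(self, typed, chosen):
        """Store one observation: the user typed `typed`, then picked `chosen`."""
        key = typed.strip().casefold()
        if key != chosen.casefold():  # exact matches are not typos
            self._corrections[key][chosen] += 1

    def suggestions(self, typed, limit=3):
        """Return the names most often chosen after this typo, most common first."""
        key = typed.strip().casefold()
        counts = self._corrections.get(key, Counter())
        return [name for name, _ in counts.most_common(limit)]
```

The log starts out empty, so it only complements another matching strategy; it becomes useful once enough corrections have been observed.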

  • Employ spell check in your code. The list of words should contain only correct spellings of airports.

    This is not a great way to do this. You should go for either a control that provides an autocomplete option or a drop-down, as someone else suggested.

    Use AJAX if your technology supports it.

  • Adding to Kevin's suggestion, you might get the best of both worlds by using an input box with JavaScript autocomplete, such as jQuery Autocomplete.

    edit: danish beat me :(

    Jayrox : a reason for the downvote?
    Michal Sznajder : I tried to remove the downvote but the system forbids this. The main reason was: for God's sake, not everyone is writing in jQuery/JavaScript/HTML. Language agnostic, people, language agnostic.
    Jayrox : Some people want a simple solution for a simple task. I provided a simple solution for a simple task. Every problem has multiple routes available to solve it.
  • There may be an existing spell-check library you can use. The code to do this sort of thing well is non-trivial. If you do want to write this yourself, you might want to look at dictionary tries.

    One method that may work is to just generate a huge list of possible error words and their corrections (here's an implementation in Python), which you could cache for greater performance.

  • http://norvig.com/spell-correct.html
    does something like Levenshtein but, because it only considers candidates within an edit distance of one or two from the input instead of computing full distances, it's more efficient
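
The sketch below is in the spirit of the linked essay, not a copy of its code: it generates every string one edit away from the input (deletions, transpositions, replacements, insertions) and keeps those that are known names, falling back to two edits if nothing matches. The `candidates` helper and the alphabet (lowercase ASCII plus space) are assumptions for this example, and the word-frequency ranking from the essay is left out.

```python
import string

# Assumed alphabet: lowercase ASCII letters plus space, enough for simple names.
ALPHABET = string.ascii_lowercase + " "


def edits1(word):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def candidates(user_input, names):
    """Known names matching the input exactly, else within one or two edits."""
    known = {name.casefold(): name for name in names}
    query = user_input.casefold()
    if query in known:
        return [known[query]]
    one_edit = edits1(query)
    hits = sorted(known[e] for e in one_edit if e in known)
    if hits:
        return hits
    two_edits = {e2 for e1 in one_edit for e2 in edits1(e1)}
    return sorted(known[e] for e in two_edits if e in known)
```

Because lookups are set and dictionary membership tests, the cost depends on the length of the input rather than on the number of known names, which is where the efficiency gain over comparing against every entry comes from.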

  • I know it's not what you asked, but if this is an application where getting the right airport is important (e.g. booking tickets) then you might want to have a confirmation stage to make sure you have the right one. There have been cases of people getting tickets for the wrong Sydney, for instance.
