One of the ingredients for a good (and fast) search is the stopwords list. It contains all the words that appear so often in a language that their relevance for search is almost zero. Take the word “and”: if a search listed every text that contains “and”, the list of results would be enormous! Of course this wouldn’t harm the quality of a good search algorithm, but it would hurt its performance and inflate the size of the index. And that’s why we wanted a stopwords list for the new Russian search.
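To make the effect on the index concrete, here is a minimal sketch of stopword filtering at indexing time. It is not the code behind our search: the function names, the sample documents and the tiny `STOPWORDS` set are all made up for illustration, and a real list would be far longer.

```python
import re
from collections import defaultdict

# Hypothetical, tiny sample; a real stopwords list is much longer.
STOPWORDS = {"and", "the", "a", "of", "и", "в", "на", "не"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs, stopwords=STOPWORDS):
    """Map each non-stopword token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in stopwords:  # stopwords never reach the index
                index[token].add(doc_id)
    return index


# Made-up sample documents.
docs = {
    1: "Cats and dogs",
    2: "Кошки и собаки",
    3: "Salt and pepper",
}
index = build_index(docs)
```

Without the filter, “and” would get its own entry pointing at documents 1 and 3, and on a real corpus at almost every document. With the filter, the entry is never created, so the index stays smaller and a query never has to walk that huge list.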
We invited Olha Biletska and Tetyana Dekola, two students I met at an alumni meeting recently, to help us with this task. After losing everything because of this great IDEA bug, we finally came up with a list. It’s bigger than the German or English one because the Russian language has six cases instead of four.
Download (UTF-8)
