Vlad Fedorkov

Performance consulting for MySQL and Sphinx

Top 100 and top 500 stopwords for Sphinx Search

Back in 2006, when I was working on my first Sphinx search project, I was playing with stopword files. Stopwords are basically a small set of highly frequent words you usually don’t want to search for (like “I”, “am”, “the”, etc.). For most Sphinx instances they only waste index space and slow down your search queries, because the engine has to find every occurrence of these unimportant words.

Say you are searching for “when is jane’s birthday”: you are actually looking for documents that contain “jane’s birthday”, and you don’t really care about the many documents (blog posts, news articles, etc.) that contain only “when” and “is”.
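To make the effect concrete, here is a minimal Python sketch of what dropping stopwords does to that query. It is only an illustration, not how Sphinx works internally: the tiny STOPWORDS set, the strip_stopwords function, and the simple regex tokenizer are all made up for this example.

```python
import re

# Hypothetical, tiny stopword set for illustration only; a real list
# would be loaded from a file such as stopwords.txt.
STOPWORDS = {"i", "am", "the", "when", "is"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Lowercase the query, split it into words, and drop the stopwords."""
    # Assumption: words are runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", query.lower())
    return [word for word in words if word not in stopwords]


print(strip_stopwords("when is jane's birthday"))
# prints ["jane's", 'birthday']
```

Only the two meaningful words are left to match against documents, which is roughly what you want the search engine to spend its time on.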

Removing those high-frequency words from the search index is usually a smart move, and ages ago I created two sample stopword files which I am still using today.

stopwords.txt contains the 100 most frequent words in my blog post collection, while stopwords-500.txt, as you might expect, contains the 500 most frequent. They are old, but not yet included in the Sphinx distribution, so I would suggest starting with stopwords.txt and adding it to your Sphinx config file using the stopwords option, as below:

     stopwords = /path/to/stopwords.txt

You could also download stopword files using wget:

wget http://astellar.com/downloads/stopwords.txt
wget http://astellar.com/downloads/stopwords-500.txt
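If your documents look very different from blog posts, a similar list could be derived from your own collection. The following is only a rough Python sketch, not the way the files above were produced: the function names, the simple regex tokenizer, the sample documents, and the output file name are all assumptions made for illustration, and Sphinx's own tokenization may split words differently.

```python
import re
from collections import Counter


def top_words(texts, n=100):
    """Return the n most frequent words across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Assumption: words are runs of letters, digits, and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return [word for word, _ in counts.most_common(n)]


def write_stopwords(words, path):
    """Write one word per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words) + "\n")


if __name__ == "__main__":
    # Hypothetical sample documents; in practice these would be your posts.
    docs = [
        "The party is on the first of May",
        "When is the party and who is invited",
    ]
    # Hypothetical output name, chosen to avoid clobbering stopwords.txt.
    write_stopwords(top_words(docs, n=100), "my-stopwords.txt")
```

It is worth reading through the resulting list before using it, since a small or specialized collection can push words you do want to search for into the top 100.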

Learn more Sphinx tips and tricks from my talks at various conferences and meetups, read my blog posts about Sphinx, and follow me on Twitter.

If you are looking for help with Sphinx installation and integration, troubleshooting, or fine-tuning, please contact me for a quote and include a description of your problem.


P.S. If you found this article useful please share it!

Category: Performance
  • froth says:

    Great, thank you very much. But when I configure stopwords I get an error message. My Sphinx version is 0.9; does it not support this?

    October 15, 2012 at 11:40 pm
    • vlad says:

      Sphinx has had stopwords support since its early days, at least since 0.9.7, so your version most likely does support it. But since version 0.9 is more than two years old, I would recommend upgrading to a recent stable release (currently 2.0).

      What error messages do you get?

      October 18, 2012 at 9:30 pm
  • froth says:

    Thank you, I have updated to the latest version. It works great :)

    October 18, 2012 at 10:54 pm
