Vlad Fedorkov

Technical stuff and random personal thoughts

Difference between myisam_sort_buffer_size and sort_buffer_size

MySQL has two confusingly identical by the first look variables myisam_sort_buffer_size and sort_buffer_size. Thing is that those two confusingly similar variables has absolutely different meanings.

 

sort_buffer_size is a per-connection variable and do not belongs to any specific storage engine. It doesn’t matter do you use MyISAM or InnoDB – MySQL will allocate sort_buffer_size for every sort (required most of the times for ORDER BY and GROUP BY queries) so increasing it’s value might help speeding up those queries however I would not recommend to change it from the default value unless you are absolutely sure about all the drawbacks. Value for out-of-the-box MySQL-5.1.41 installation on Ubuntu is 2Mb and it’s recommended to keep it that way.

 

On the other side myisam_sort_buffer_size used by MyISAM to perform index sorting on relatively rare table-wide modifications like ALTER/REPAIR TABLE. Stock value is 8Mb so if you are using MyISAM tables intensively (please refer to my other post to see how to know your tables type) I would recommend to set it to some higher value close or even more than key_buffer_size but still small enough to keep it in memory and prevent MySQL from swapping.

Why is stock MySQL slow?

“I’ve installed MySQL and it doesn’t work fast enough for me”. MySQL server is heart of database driven application (if it uses MySQL as database of course!) and any slowness related to running queries is affecting all application layers.

 

MySQL server tuning and query slowness hunting are always step by step process and without knowing all the data (SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS, SHOW TABLE STATUS LIKE ‘tablename’, EXPLAIN details for slow query is just some of the required information) it would be generally a blind guess. But there are still few things which is related to newly installed MySQL server.

 

If you are using stock MySQL you might need to check memory pool size which MySQL used to load index data to avoid slow IO requests and increase queries speed. Connect to MySQL and fire two queries: Continue reading

Replacing MySQL Full-text search with Sphinx

It’s very handy to have FT search out of the box, but there are several drawbacks attached. Problem is that MyISAM Full-text search is not designed to handle big amounts of text data. If you plan to index more than 1M documents you will probably need to take a look on the external search system like Lucene or Sphinx. For the usual LAMP-based service I personally would prefer to use Sphinx as it provides simple transition from MySQL FT and easy to integrate into any application (Sphinx basically could be queried via native API or via MySQL protocol.

 

Say we have table called <my_table> with `title` and `content` text fields. In MySQL you have to fire query like this: SELECT * FROM <my_table> WHERE MATCH(`title`,`content`) AGAINST (‘I love Sphinx’); Let’s see how could we do the same query with Sphinx.

 

There are two steps to run the Sphinx as a MySQL FT replacement. First you need to pull text data from MySQL to Sphinx. For that you will need to configure source and index definition. Second step is to run indexer program which will connect to MySQL and fetch all the desired data and fire up search daemon which will serve queries. Simplified Sphinx configuration example is below: Continue reading

Top 100 and top 500 stopwords for Sphinx Search

Back to year 2006 when I was working for my first sphinxsearch project I was playing with stopwords files. Stopwords is basically a small set of highly frequent words you often don’t want to search for (like “I”, “Am”, “The”, etc). For most sphinx instances they only wasting index space and slower your search queries by finding all occurrences of these non-important words.

 

Say if you are searching for “when is jane’s birthday” you are actually looking to find documents with “jane’s birthday”, and you don’t really care about lot’s of documents (blog posts, news articles, etc) with only “when” and “is” inside. Continue reading