Vlad Fedorkov

Performance consulting for MySQL and Sphinx

Replacing MySQL Full-text search with Sphinx

It’s very handy to have FT search out of the box, but there are several drawbacks attached. Problem is that MyISAM Full-text search is not designed to handle big amounts of text data. If you plan to index more than 1M documents you will probably need to take a look on the external search system like Lucene or Sphinx. For the usual LAMP-based service I personally would prefer to use Sphinx as it provides simple transition from MySQL FT and easy to integrate into any application (Sphinx could be queried via native APIs or via MySQL protocol).

Say we have table called <my_table> with `title` and `content` text fields. In MySQL you have to fire query like this:

SELECT * FROM <my_table> WHERE MATCH(`title`,`content`) AGAINST ('I love Sphinx');

Let’s see how could we do the same query with Sphinx.

There are two steps to run the Sphinx as a MySQL FT replacement. First you will need to pull all needed text data from MySQL to Sphinx. For that you have to configure source and index definition in Sphinx search config. Second step is to run indexer program which will connect to MySQL and fetch all the desired data. Then fire up the search daemon which will serve queries. Simplified Sphinx configuration example is below:

You need to let Sphinx know where to look for the data (source configuration):

source my_source
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = ********
    sql_db    = <my_database_name>
    sql_port  = 3306
    sql_query = SELECT id, title, description FROM <my_table>
}

Please note that id field in sql_query. This field MUST be positive integer and have to be unique across all the documents in collection. Auto_incremented integer primary key from MySQL table with work in this case like a charm.

Now we need to tell Sphinx where to store all that data and configure index:

index my_first_sphinx_index
{
    source        = my_source
    path          = <path_to_sphinx_home>/var/index1
    docinfo       = extern
    charset_type  = utf-8
}

That’s it. Let’s add few more required sections to complete configuration:

Indexer settings:

indexer
{
    mem_limit    = 256M
    write_buffer = 8M
}

and daemon configuration:

searchd
{
    listen                  = 9312
    listen                  = 9306:mysql41
    pid_file                = <sphinx_path>/var/searchd.pid
    max_matches             = 1000
}

Put blocks above to <sphinx_path>/etc/sphinx.conf file which will be your main Sphinx configuration file.
Please also make sure that <sphinx_path>/var directory is writable for user you planning to run sphinx daemon.

Now we have to perform initial indexing by running indexer binary

<sphinx_path>/bin/indexer my_first_sphinx_index -c <sphinx_path>/etc/sphinx.conf

indexer my_first_sphinx_index tells indexer to create index called my_first_sphinx_index described in sphinx config. To create all the indexes at once (if you have two or more) run indexer –all -c <sphinx_path>/etc/sphinx.conf

Now you have to run the search daemon apparently called searchd

<sphinx_path>/bin/searchd -c <sphinx_path>/etc/sphinx.conf

Now Sphinx should be able to answer queries.

Fire up mysql client and connect to brand new Sphinx installation:

$ mysql -h0 -P 9306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.0.3-id64-dev (rel20-r3043)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Please note server version. It is Sphinx! Now you can hire all power of Sphinx full-text query language!

mysql> SELECT * FROM my_first_sphinx_index WHERE
MATCH('I love Sphinx') LIMIT 0,5; SHOW META;
+---------+--------+
| id      | weight |
+---------+--------+
| 7637682 |   2652 |
| 6598265 |   2612 |
| 6941386 |   2612 |
| 6913297 |   2584 |
| 7139957 |   1667 |
+---------+--------+
5 rows in set (0.01 sec)

+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| total         | 51     |
| total_found   | 51     |
| time          | 0.013  |
| keyword[0]    | love   |
| docs[0]       | 227990 |
| hits[0]       | 472541 |
| keyword[1]    | sphinx |
| docs[1]       | 114    |
| hits[1]       | 178    |
+---------------+--------+
9 rows in set (0.00 sec)
mysql>

Please note – Sphinx returns document IDs, not a document content, so you need to query MySQL to fetch additional fields: SELECT * FROM <my_table> WHERE id IN (7637682, 6598265, …, 7139957)

Config above is fully working but very simple and provided as an example. You can download it from this website directly or using wget:

wget http://astellar.com/downloads/sphinx.conf

Another way to create initial Sphinx configuration is to adopt Sphinx configuration sample called sphinx-min.conf.dist bundled to the Sphinx RPM and Deb packets.

You could also learn more about Sphinx tips and tricks from my talks on various conferences and meetups, read blog posts about Sphinx and follow me on twitter.

If you are looking for help with Sphinx installation and integration, troubleshooting and fine tuning please contact me for a quote with your problem description.

Enjoy!

P.S. If you found this article useful please share it!

Category: Guide
  • Rashid says:

    man you are awesome.

    But i dont understand how to start with.

    i opened Editor and copied all the four coding like Indexer settings, daemon configuration and all.

    /bin/indexer my_first_sphinx_index -c /etc/sphinx.conf

    what is this . where to run this??

    terminal??

    May 15, 2012 at 12:53 pm
  • Stan says:

    I get the following error when trying to query my index
    ERROR 1064 (42000): index …: fullscan requires extern docinfo

    It is setup as you have it in this post.

    If I add to the source
    sql_attr_string title
    sql_attr_string description

    SELECT * FROM my_first_sphinx_index works but I get an Empty Set

    Any help is greatly appreciated – I’m brand new to sphinx.

    August 29, 2012 at 8:37 pm
  • Stan says:

    Found the solution to my problem :)

    Instead of sql_attr_string I used sql_field_string
    sql_attr_string title
    sql_attr_string description
    which makes those fields an index with full text.

    August 29, 2012 at 8:46 pm
  • Sivakumar says:

    Hi i am currently working on Sphinx 2.06 , installed it ubuntu and configuration are made, i created index and make searcd also, its work fine i need to visulaize in mysql client, as i created SphinxSE engine also and created a table with fields that have three columns must be integer and SOme other, But dont know how to query after this… HElp me

    December 3, 2012 at 11:50 am
  • yuvakumar says:

    Hi,
    i am using sphinx search in codeigniter project. Please tell me any references or any coding help. I have already integrated sphinx search. The main problem is –
    1. How to use sphinx search in codeigniter
    2. Where to place the sphinxapi i.e, controller or model. Then how?
    3. Please suggest any live code.
    4. I am also saw the so many references like you.
    5. In references i found only basics, not completely code.

    thank you.

    July 6, 2013 at 9:56 am
  • aisha says:

    Thanks, one of a kind tute on the web on sphinx.

    December 9, 2013 at 5:15 pm

Your email address will not be published. Required fields are marked *

*