Vlad Fedorkov

Performance consulting for MySQL and Sphinx

Debugging Sphinx indexes with indextool

Sometimes you need to debug your Sphinx indexes to know what's inside them: is the index okay, is the document you are trying to find actually there? In such cases the indextool utility can be very handy, as it reads information directly from the index files, even when searchd is not running. Here are a few examples of indextool usage:

Checking index consistency

One of the most important functions of indextool is checking index consistency. You will need the Sphinx config file and the index files:
/path/to/indextool -c sphinx.conf --check my_sphinx_index

This checks my_sphinx_index for consistency between the document list, hit list, positions and other internal Sphinx index structures. Please note that indextool checks only disk indexes (starting from 2.0.2 it can also check the on-disk part of Real-Time indexes, but not the in-memory part). Typical output for a healthy index looks like this:

using config file 'sphinx.conf'...
checking index 'my_sphinx_index'...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
check passed, 4.4 sec elapsed

indextool doesn't fix issues itself; it only tells you whether the index is okay or not. In case of trouble you will need to rebuild the broken index. Usually you can do that with indexer [--rotate] my_sphinx_index, where --rotate rebuilds the index on the fly while searchd is running.
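The check-then-rebuild routine described above can be wrapped in a small shell function. This is a minimal sketch, not something shipped with Sphinx: the function name check_or_rebuild and the INDEXTOOL/INDEXER override variables are hypothetical, and it assumes a healthy index is recognised by the "check passed" line shown in the sample output rather than by indextool's exit code.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx): check a disk index with
# indextool and rebuild it with indexer --rotate if the check output
# does not contain "check passed". Binaries can be overridden via
# INDEXTOOL / INDEXER; by default they are looked up on PATH.
INDEXTOOL=${INDEXTOOL:-indextool}
INDEXER=${INDEXER:-indexer}

check_or_rebuild() {
    conf=$1
    index=$2
    # Merge stderr too, since we only care whether the success line appears.
    if "$INDEXTOOL" -c "$conf" --check "$index" 2>&1 | grep -q 'check passed'; then
        echo "$index: OK"
        return 0
    fi
    echo "$index: check failed, rebuilding" >&2
    # --rotate rebuilds the index on the fly while searchd keeps running;
    # the function returns indexer's exit status.
    "$INDEXER" -c "$conf" --rotate "$index"
}

# Usage (commented out so that sourcing this file runs nothing):
# check_or_rebuild sphinx.conf my_sphinx_index
```

Keeping the decision on the printed "check passed" line ties the script to exactly the output shown in this post, at the cost of depending on that wording.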

Getting number of documents from index

indextool also lets you reverse engineer an index to see its internal structures and settings, along with global information such as the number of documents stored, the number of bits per document identifier (32 or 64), the tokenizer, the morphology type and some other index settings. Usage:
indextool --dumpheader my_sphinx_index.sph

Shortened output looks like this:
dumping header for index 'my_sphinx_index'...
dumping header file 'my_sphinx_index.sph'...
version: 23
idbits: 64
docinfo: extern
fields: 5
field 0: page
field 1: title
field 2: description
field 3: tags
attrs: 23
attr 0: user_id, uint, bitoff 0
attr 1: url_crc32, uint, bitoff 32
attr 12: deleted, uint, bitoff 384
attr 13: private, uint, bitoff 416
attr 14: external, uint, bitoff 448
attr 15: enabled, uint, bitoff 480
attr 16: lastactivity, timestamp, bitoff 512
attr 17: url, ordinal, bitoff 544
attr 18: has_picture, bool, bitoff 576
total-documents: 3438767
total-bytes: 21164723870
min-prefix-len: 0
min-infix-len: 0
exact-words: 0
html-strip: 1

Lots of interesting internal index information, as you can see. Besides total-documents and total-bytes, you can find the names and internal types of all non-full-text attributes together with their positions in the row. The bitoff (bit offset) of the last attr record gives you an idea of attribute memory consumption. For full-text fields you can find names and text processing settings like prefix/infix indexing, stemming length, stopwords, zones support and some other settings.
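To turn the bitoff hint into a number, the header dump can be piped through awk. This is a rough sketch under stated assumptions: the function name attr_size_estimate is hypothetical, the per-type bit widths (bool 1, bigint 64, every other type 32) and the padding of each row to whole 32-bit words are assumptions rather than documented Sphinx behaviour, and document IDs are not counted. On a shortened dump like the one above, which omits attrs 19-22, the result is only a lower bound.

```shell
#!/bin/sh
# Hypothetical helper: read `indextool --dumpheader` output on stdin and
# estimate attribute storage from the last attr record.
# ASSUMPTIONS (not from Sphinx docs): bool = 1 bit, bigint = 64 bits,
# any other type = 32 bits; rows padded to whole 32-bit words;
# document IDs are not included in the estimate.
attr_size_estimate() {
    awk '
    /^total-documents:/ { docs = $2 }
    /^attr [0-9]+:/ {
        # e.g. "attr 18: has_picture, bool, bitoff 576"
        type = $4; sub(/,$/, "", type)
        off = $NF + 0
        if (off >= last_off) {
            last_off = off
            width = (type == "bool") ? 1 : ((type == "bigint") ? 64 : 32)
            bits = off + width
        }
    }
    END {
        words = int((bits + 31) / 32)   # round up to 32-bit words
        printf "bytes-per-doc: %d\n", words * 4
        printf "estimated-attr-bytes: %.0f\n", words * 4 * docs
    }'
}

# Usage:
# indextool --dumpheader my_sphinx_index.sph | attr_size_estimate
```

For the sample above, the last attr shown is has_picture (bool, bitoff 576), which gives 577 bits, 19 words or 76 bytes per document, roughly 261 MB of attributes for 3438767 documents under these assumptions.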

Are there documents in the index that match my keyword?

Easy! Even when the daemon is not running:

# indextool -c sphinx.conf --dumphitlist <my_sphinx_index> <keyword>

or, more conveniently, just to get the number of docs:

# indextool -c sphinx.conf --dumphitlist <my_sphinx_index> <keyword> | grep docs | wc -l
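When you need to check several keywords at once, the counting command can be run in a loop. A minimal sketch: count_keyword_docs and the INDEXTOOL override are hypothetical names, and grep -c docs stands in for grep docs | wc -l (it counts the same matching lines without wc's whitespace padding). Like the original command, it relies on the "docs" match and does not interpret the hitlist format itself.

```shell
#!/bin/sh
# Hypothetical helper: run the --dumphitlist count from the article for
# several keywords. grep -c counts the same lines as `grep docs | wc -l`.
INDEXTOOL=${INDEXTOOL:-indextool}

count_keyword_docs() {
    conf=$1
    index=$2
    shift 2
    for kw in "$@"; do
        # grep -c still prints 0 when nothing matches, but exits 1;
        # `|| true` keeps that from aborting scripts run with set -e.
        n=$("$INDEXTOOL" -c "$conf" --dumphitlist "$index" "$kw" | grep -c docs) || true
        printf '%s\t%s\n' "$kw" "$n"
    done
}

# Usage:
# count_keyword_docs sphinx.conf my_sphinx_index sphinx mysql
```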

You can find indextool in any Sphinx distribution since version 0.9.9; it's usually located in the same directory as searchd.
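If indextool is not on your PATH, a small function can look for it next to searchd first. This is a sketch assuming a POSIX shell; find_indextool is a hypothetical name, and it simply falls back to any indextool on PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the path of indextool, preferring the
# directory that contains searchd, else whatever indextool is on PATH.
# Returns non-zero if neither is found.
find_indextool() {
    searchd_path=$(command -v searchd) || searchd_path=
    if [ -n "$searchd_path" ]; then
        dir=$(dirname "$searchd_path")
        if [ -x "$dir/indextool" ]; then
            echo "$dir/indextool"
            return 0
        fi
    fi
    command -v indextool
}

# Usage:
# "$(find_indextool)" -c sphinx.conf --check my_sphinx_index
```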
