Sometimes you need to debug your Sphinx indexes: what is inside, is the index healthy, does it contain the document you are trying to find? In such cases the indextool utility is very handy, because it reads information directly from the index files, even when searchd is not running. Here are a few examples of indextool usage:
Checking index consistency
One of the most important functions of indextool is checking index consistency. You will need the Sphinx config file and the index files.
/path/to/indextool -c sphinx.conf --check my_sphinx_index
This checks my_sphinx_index for consistency between the document list, hit list, positions and other internal Sphinx index structures. Note that indextool only checks disk indexes (starting from 2.0.2 it can also check the on-disk part of Real-Time indexes, but not the in-memory part). The usual output for a healthy index looks like this:
using config file 'sphinx.conf'...
checking index 'my_sphinx_index'...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
check passed, 4.4 sec elapsed
indextool doesn’t fix issues itself; it only tells you whether the index is okay or not. If it reports problems, you will need to rebuild the broken index. Usually you can do that with indexer [--rotate] my_sphinx_index, where --rotate is used to rebuild the index on the fly while searchd is running.
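If you run the check from cron or a deploy script, a small wrapper can be handy. Below is a minimal sketch, assuming that a healthy run prints a "check passed" line as in the sample output above. check_passed and check_index are hypothetical helper names, not part of Sphinx.

```shell
# Return 0 if the `indextool --check` output on stdin contains the
# "check passed" line (assumption: taken from the sample output above).
check_passed() {
    grep -q 'check passed'
}

# Hypothetical wrapper: check one index and suggest a rebuild on failure.
# Usage: check_index sphinx.conf my_sphinx_index
check_index() {
    conf=$1
    index=$2
    if indextool -c "$conf" --check "$index" | check_passed; then
        echo "$index: OK"
    else
        echo "$index: check did not pass, consider: indexer --rotate $index"
        return 1
    fi
}
```

The sketch looks at the printed report instead of the exit status, because that is the only success signal shown in the sample output.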
Getting number of documents from index
indextool lets you look inside an index to see its internal structures and settings, along with global information such as the number of documents stored, the number of bits per document identifier (32 or 64), the tokenizer, the morphology type and some other index settings. Usage:
indextool --dumpheader my_sphinx_index.sph
Shortened results would look like this:
dumping header for index 'my_sphinx_index'...
dumping header file 'my_sphinx_index.sph'...
version: 23
idbits: 64
docinfo: extern
fields: 5
field 0: page
field 1: title
field 2: description
field 3: tags
attrs: 23
attr 0: user_id, uint, bitoff 0
attr 1: url_crc32, uint, bitoff 32
[...]
attr 12: deleted, uint, bitoff 384
attr 13: private, uint, bitoff 416
attr 14: external, uint, bitoff 448
attr 15: enabled, uint, bitoff 480
attr 16: lastactivity, timestamp, bitoff 512
attr 17: url, ordinal, bitoff 544
attr 18: has_picture, bool, bitoff 576
[...]
total-documents: 3438767
total-bytes: 21164723870
min-prefix-len: 0
min-infix-len: 0
exact-words: 0
html-strip: 1
[...]
As you can see, there is a lot of interesting internal index information. Besides total-documents and total-bytes, you can find the names and internal types of all non-full-text attributes along with their bit offsets (and hence sizes). The bitoff (bit offset) value of the last attr record gives you an idea of the per-document attribute memory consumption. For full-text fields you can find the names and the text processing settings such as prefix/infix indexing, stemming length, stopwords, zones support and some other settings.
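To pull these numbers out in a script, you can filter the header dump. The helpers below are a rough sketch that assumes the line layout shown in the sample above ("total-documents: N" and "attr N: name, type, bitoff X"). The function names are made up for illustration, and the estimate is only a lower bound: it ignores the width of the last attribute and the document id.

```shell
# Print the document count from `indextool --dumpheader` output on stdin.
total_documents() {
    awk '/^[[:space:]]*total-documents:/ { print $NF }'
}

# Print the largest bitoff among the "attr N: name, type, bitoff X" lines.
last_bitoff() {
    awk '/^[[:space:]]*attr [0-9]+:/ && $(NF-1) == "bitoff" {
             if ($NF + 0 > max) max = $NF + 0
         }
         END { print max + 0 }'
}

# Rough lower bound of attribute storage in bytes.
# $1 = number of documents, $2 = last bitoff (in bits).
attr_bytes_lower_bound() {
    echo $(( $2 / 8 * $1 ))
}
```

For example, `indextool --dumpheader my_sphinx_index.sph | total_documents` prints just the document count. For the shortened sample above, `attr_bytes_lower_bound 3438767 576` gives 247591224, i.e. at least about 236 MB of attribute data.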
Are there documents in the index that match my keyword?
Easy! Even when the daemon is not running:
# indextool -c sphinx.conf --dumphitlist <my_sphinx_index> <keyword>
or, more conveniently, just to get the number of docs:
# indextool -c sphinx.conf --dumphitlist <my_sphinx_index> <keyword> | grep docs | wc -l
You can find indextool in any Sphinx distribution since version 0.9.9; it is usually located in the same directory as searchd.