Vlad Fedorkov

Performance consulting for MySQL and Sphinx

One, two, three, MySQLers… OOW 2019, Percona Live, ProxySQL Tech Day!

As people are getting back from vacations and application traffic is going back and hitting database backend hard, it is a good time to think about performance and prepare for the winter holiday spike. Whenever you are on-premise or in the cloud, questions are the same:

  • Are we satisfied with the current MySQL performance?
  • How much traffic are we currently serving?
  • Is there enough capacity to stay stable as traffic increases?
  • What would be the best way to stay cost-effective as we grow?

Unfortunately, the answers are not so obvious and require a thorough performance analysis. However, there is a chance that your problem has already been solved by someone else. I do not mean StackOverflow (which is a beautiful place, by the way), but some higher level questions which people usually discuss at conferences, and we have several of them ongoing or just a couple of weeks away.

Right this moment René Cannào and Peter Boros are preparing to deliver their talk MySQL with ProxySQL at Zuora at Oracle Open World conference, 09:00 AM @ Room 301, Moscone South.

At the same time next door in the Room 312 Oracle’ Frederic Descamps will be talking about MySQL Shell for DBAs.

If you are a developer, than you probably won’t miss Seventeen Things Developers Need to Know About MySQL by Peter Zaitsev 11:15 AM @ Room 3.

But that’s California, and what if you’re in Europe? Please hold on, we’re crossing the Atlantic!

Let’s move to Amsterdam where you might want to attend Percona Live Europe held from September 30th to October 2nd. As you land, you will not even need a taxi to get there as your venue is next to the airport! Besides useful tutorials on the first day, there will be two complete days filled with Database Wisdom. Perconians and their guests will share virtually every aspect of the database performance and operations.

Here is a list of my favorite talks from day one @ Percona Live:

  • MyRocks and RocksDB Advanced Features and Performance by Yoshinori Matsunobu (2:30 PM),
  • ClickHouse Features to Blow Your Mind by  Aleksei Milovidov (6:00 PM),
  • Tracing and Profiling MySQL by Valerii Kravchuk (same 6:00 PM).

The next day is also full of great talks, here’re just some examples:

  • Sveta Smirnova helps you to save life with Histograms (9:00 AM),
  • Marco Tusa will make a deep dive into Using ProxySQL as Firewall (10:00 AM),
  • Morgan Tocker will share his First 90 days experience with Vitess (11:00 AM).
  • And if you are looking for real hardcore stuff, please visit gdb basics for MySQL DBAs  by Valerii Kravchuk (3:30 PM).

That’s a really intense line-up for three days, but that’s not the end, here comes the ProxySQL Tech Day!

Right after Percona Live, on Thursday Oct 3rd, we are going to meet in person at a free event in Ghent, Belgium. We’ll be talking specifically about large-scale MySQL environments and how to serve a really huge amount of transactions with MySQL. Experts from 5 different companies will gather at Co.Station Gent and share their knowledge. Tricky questions, mind-breaking cases, crazy requirements and challenges of all kinds are super welcome! Please use this form and be sure that we’ll tackle your issue.

We’re starting at 5:00 PM, so you will have some time for a walk across the nice City of Ghent!

P.S. The views expressed here are my own and do not necessarily reflect the opinion of my employer.

What is stuck in MySQL server?

There are few easy ticks to see what is stuck inside MySQL instance. All these techniques will not give you whole picture, but might help to find queries that block performance. Let’s start from what exactly doing your MySQL server right now.

Which queries are running now?

This will give you an idea what’s running right now so you can find long running queries which slowing down MySQL and/or causing replication lag:

mysql -e "SHOW PROCESSLIST" | grep -v -i "sleep"

It is more convenient than just run “SHOW PROCESSLIST” as it’s hiding all connected threads in “Sleep” state so you’ll get a clean output. Also you can get same output but updating each second:

watch -n1 'mysql -e "SHOW FULL PROCESSLIST" | grep -v -i "Sleep"'

What to look for? This is complex output but you can start with Time and State columns. When you see a query running for more than one second it’s time for query review. If you sure it’s ok for query to be slow (like for queries in complex reports) you can move it away from critical server to special “reporting” replica.

When you see states like “Copying to tmp table” or any kinds of “Waiting …” like “Waiting for query cache lock” even if it’s fast then your server performance is less than stellar and it’s time to dig in.

How to collect bad qeuries

How to easily collect bad queries? Non-intrusive way is to listen MySQL network communications and dump all the data using tcpdump. This is extremely useful when you need to get an idea about most time consuming queries without touching MySQL configuration and query logging at all. This will also require Percona toolkit to be installed, but it’s worth it. Here is quick example. Let tcpdump run for some time and then run pt-query-digest to aggregate tcpdump output into readable report.

tcpdump -s 65535 -X -nn -q -tttt -i any port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump ./mysql.tcp.txt > digest.log

This will produce report with most resource consuming queries on the top. Optimizing them one by one will improve speed or your MySQL instances, application response time and so give you a chance to handle more traffic with less hardware.

Which engines are in use?

Where also to look? Performance optimization techniques quite complex but depends on the engine that you use to store data. InnoDB, MyISAM, TokuDB – they all require different approach in query tunning and troubleshooting. So next step is to see how your data is distributed across the engines. In that case the following query will help:

SELECT
    engine,
    count(*) TABLES,
    concat(round(sum(table_rows)/1000000,2),'M') rows,
    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') Size
FROM
    information_schema.TABLES
WHERE
    table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema')
GROUP BY engine
ORDER BY Size DESC;
Find all tables across all databases uses specific engine

Now you would probably like to see all tables that uses say MyISAM engine to store data. Here you go – example for MyISAM with big tables first.

SELECT
    concat(table_schema,'.',table_name),
    engine,
    concat(round(table_rows/1000000,2),'M') Rows,
    concat(round((data_length+index_length)/(1024*1024*1024),2),'G') Size
FROM
    information_schema.TABLES
WHERE
    engine = 'MyISAM' AND
    table_schema NOT IN
        ('mysql', 'information_schema', 'performance_schema')
ORDER BY
    Size DESC;

Keeping your queries in a good shape could save you money on hardware and make your application fast but it’s also cause servers to do less work which means less electricity consumed and less CO2 and heat released to the atmosphere. So it is good thing to do whatever you believe on global warming or not :)

Full table scans and MySQL performance

High season is coming, how do you make sure that MySQL will handle the increased load? Stress tests could help with that, but it’s not a good idea to run them in a production environment. In this case Select_scan, Select_full_join and other MySQL counters could quickly give you an idea of how many queries are not performing well and could cause a performance degradation as the load goes up.

Select_scan from SHOW GLOBAL STATUS indicates how many full table scans were done since last MySQL restart. Scanning the entire table is a resource intensive operation. It also forces MySQL to store unnecessary data in the buffer pool, wasting memory and IO resources. Continue reading

Difference between myisam_sort_buffer_size and sort_buffer_size

MySQL has two confusingly identical by the first look variables myisam_sort_buffer_size and sort_buffer_size. Thing is that those two confusingly similar variables has absolutely different meanings.

sort_buffer_size is a per-connection variable and do not belongs to any specific storage engine. It doesn’t matter do you use MyISAM or InnoDB – MySQL will allocate sort_buffer_size for every sort (required most of the times for ORDER BY and GROUP BY queries) so increasing it’s value might help speeding up those queries however I would not recommend to change it from the default value unless you are absolutely sure about all the drawbacks. Value for out-of-the-box MySQL-5.1.41 installation on Ubuntu is 2Mb and it’s recommended to keep it that way.

On the other side myisam_sort_buffer_size used by MyISAM to perform index sorting on relatively rare table-wide modifications like ALTER/REPAIR TABLE. Stock value is 8Mb so if you are using MyISAM tables intensively (please refer to my other post to see how to know your tables type) I would recommend to set it to some higher value close or even more than key_buffer_size but still small enough to keep it in memory and prevent MySQL from swapping.

Why is stock MySQL slow?

“I’ve installed MySQL and it doesn’t work fast enough for me”. MySQL server is heart of database driven application (if it uses MySQL as database of course!) and any slowness related to running queries is affecting all application layers.

MySQL server tuning and query slowness hunting are always step by step process and without knowing all the data (SHOW GLOBAL VARIABLES, SHOW GLOBAL STATUS, SHOW TABLE STATUS LIKE ‘tablename’, EXPLAIN details for slow query is just some of the required information) it would be generally a blind guess. But there are still few things which is related to newly installed MySQL server.

If you are using stock MySQL you might need to check memory pool size which MySQL used to load index data to avoid slow IO requests and increase queries speed. Connect to MySQL and fire two queries: Continue reading

Top 100 and top 500 stopwords for Sphinx Search

Back to year 2006 when I was working for my first sphinxsearch project I was playing with stopwords files. Stopwords is basically a small set of highly frequent words you often don’t want to search for (like “I”, “Am”, “The”, etc). For most sphinx instances they only wasting index space and slower your search queries by finding all occurrences of these non-important words.

Say if you are searching for “when is jane’s birthday” you are actually looking to find documents with “jane’s birthday”, and you don’t really care about lot’s of documents (blog posts, news articles, etc) with only “when” and “is” inside. Continue reading