Yo, my indexes got indexes

Posted by Andy Huggins on November 01, 2017

Have you ever heard that you should add indexes to your MySQL tables?

Have you ignored that advice because you didn't believe it would be a serious improvement?

In a recent project, we had to add a lot of past data, like four years worth of data for basically an entire industry. I'm talking 22,500 CSV files worth of data.

I had already created the import scripts and ran them many times uploading new data, but now I needed to import this old data. I transferred the CSV files to the server, edited some scripts to turn those CSV files into jobs, and setup multiple workers do process the files.

I kicked off this "run" and left it. After a few days, the files had been worked down to 15,500 files, or around one third had been completed. And a few days was probably like 60 hours or so.

Clearly, this was taking a long time.

I talked with a co-worker and we did a quick code review to see if we could think of anything. We ended up noticing that a lot of the imports were using a Laravel Relationship on our models. So thinking about how those work, we realized that we had not set up indexes on the main keys we were using to relate the data.

We figured it would be worth a try and created new migrations for these "key" fields.

Deployed and ran the migrations and kept looking at the number of jobs left to process. As soon as the indexes were built, each refresh of the count of jobs in Sequel Pro, it looked like there was an issue.

15,500 jobs had dropped to 14,200 in a few minutes.

A few minutes later there were 13,000 (or less) left. And no new "failed jobs" in the failed_jobs table.

I looked at the `updated_at` fields on the records that should be being imported, and to my amazement, things were being updated.

The jobs were flying. Within about 30-45 minutes, the CSV file import was done. I could not believe it.

Another Example of Indexes

After the import completed, I poked around in the application to see if things slowed down at all with all the new data. And on one of the most important pieces of the app, I noticed that the load time had increased by 10x.

I then loaded in the Laravel Debugbar and looked at the queries that were happening on the page. I had 56 queries and it was taking 26-32 seconds.

But this is where the power of the debugbar comes in. If you look through the queries, it lists how long it takes for each query to execute. Out of the 56 queries for that page, most were pretty fast, 1.32ms, 12ms, 600ns (nanoseconds?), and then I came across one that was running for 24 seconds.

The debugbar said the page took 26-32 seconds (multiple loads the time varied) and this one query was taking 24 seconds. Well, I think I just found the next thing to optimize.

Reading the query, it was clear that another index was needed. I made the migration and ran it.

Reload the page and the page loads in 1.5 seconds. What? I think, oh, it's probably cached, let me force a rebuild...page loads in 1.55 seconds. Oh. I see. It worked.

Repeating this process, I was able to get it down to about a half second of load time.