As I work on projects, there are often interesting decisions that I am faced with. In the most recent project that I have been working on, I decided to use a queue to solve a particular problem.
The particular problem was the need to allow a user to upload an import file, but not have to wait for the import process to complete before the user can leave the page.
Using a queue ended up seeming like a great choice because it allowed me to upload the file, add it to the queue, and then send a message to the user that the file had been successfully uploaded and is in the process of being imported.
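To make that flow concrete, here is a minimal sketch of what the upload endpoint could look like, assuming a reasonably recent Laravel version. The controller name, the `import` form field, the `imports` directory, and the `ProcessImport` job are all made up for illustration; they are not the names from my project.

```php
<?php

namespace App\Http\Controllers;

use App\Jobs\ProcessImport; // hypothetical queued job (implements ShouldQueue)
use Illuminate\Http\Request;

class ImportController extends Controller
{
    public function store(Request $request)
    {
        // Reject the request early if no file was sent.
        $request->validate(['import' => 'required|file']);

        // Save the upload to the default disk; store() returns the stored path.
        $path = $request->file('import')->store('imports');

        // Push the job onto the queue; a worker picks it up later.
        ProcessImport::dispatch($path);

        // Respond right away instead of waiting for the import to finish.
        return back()->with('status', 'File uploaded. The import is running in the background.');
    }
}
```

The important part is that the request only stores the file and queues the work, so the response time no longer depends on how long the import takes.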
I set up a database queue...which is not usually the recommended driver, but in this case, because the app has pretty much one single user, I determined it would be fine to use the database queue instead of setting up Redis or some other queue system.
I added the queue system to the project and deployed it, and everything was going smoothly.
Over the next week or so, I ended up having to adjust the import script. So I made my changes, tested it locally in the CLI and since everything looked right, I pushed it up and deployed my changes.
The client messaged me the next day and said that "something is not working correctly." This seemed odd to me since I had tested my code and deployed it. I opened the db and looked, and sure enough the change I had made did not seem to be working. I then opened up the import file in vim on the server to see if the code was there. It was. Odd.
I figured that something odd had happened, and had a script I could run to update the new records with the missing info. Ran the script, told the client something weird happened, but the code was there and we will have to see tomorrow what happens.
Side note, this client is awesome and we have a great working relationship where he understands that code can sometimes be complicated. And with it being him as the only user, I did not have to worry about a business critical problem since he was not fully using the app at this point.
The next day, I got another message with the same issue. Looking again, the code was there, but the db records just didn't update. It took me some time and a little Googling, but eventually something I read about using a queue with Supervisor to keep the worker alive made it click. PHP usually kills the process after each request, but since Supervisor "keeps the worker alive," the worker was a long-running instance of the application. It had loaded the old code into memory when it started and was never reading the new code.
This is the first thing that I want to point out about working with queues...1. Be sure to restart your workers when you deploy new code. I have talked with other devs about this, and they also learned this lesson the hard way. So just be aware of it. Luckily, Laravel provides an easy way to do this with `php artisan queue:restart`, and you can add that to your deploy script to ensure that the workers are updated when you deploy.
Once I knew this, the fix was easy. The import scripts began running correctly, and I no longer had any of these seemingly unexplained code "not working" issues.
MaxAttemptsExceededException in failed_jobs
The next piece that was a little tricky to figure out was a lot of failed jobs showing up. I think it was tricky because what was really happening is not described by the "MaxAttemptsExceededException" that was being thrown.
The project required importing a lot of data, like 22,500+ csv files. It's essentially four years' worth of data for a particular industry. But this is almost an ideal situation for a queue: each csv contains data, and each one is independent, meaning each file can be imported by itself and in no particular order.
So I created a script to create jobs for each csv, and then just let the workers run. After it was running for 24+ hours, I noticed these failed jobs, and the above exception.
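For anyone who has not written a queued job before, a "one job per csv" setup can be sketched roughly like this. It assumes a recent Laravel version; the `ImportCsvFile` class name and the `storage/app/imports` directory are placeholders I picked for the example, and the actual import logic is left out.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Hypothetical job: imports a single csv file, independent of all others.
class ImportCsvFile implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    /** @var string Path to the csv file this job should import. */
    public $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function handle(): void
    {
        // The real import logic would go here; it is not shown in this post.
    }
}
```

Queuing every file is then a short loop, for example inside an Artisan command or a tinker session:

```php
<?php

use App\Jobs\ImportCsvFile;

// glob() can return false on error, so fall back to an empty list.
foreach (glob(storage_path('app/imports/*.csv')) ?: [] as $path) {
    ImportCsvFile::dispatch($path); // one independent job per file
}
```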
I started thinking about the `max_execution_time` option that php has, but I was able to exceed that in the CLI for some test import files, so I did not think that was the cause. I also tried importing one of the failed csv files locally, and it completed without issue.
So I began trying to think about what could be causing a problem, and then thought that there could be a timeout on how long a worker process tries to run.
Turns out, that is exactly what happens. It's in the docs, but this is my first experience with queues, and I guess I simply skipped this part of the docs.
In Laravel, you edit your `config/queue.php` file and set the `expire` option for the queue connection you are using to a value that fits how long the job might take. In my case, the db showed most imports completing in 45 seconds to a minute and a half.
I set mine to a minute and a half (90 seconds) and deployed. But I kept noticing that the failed jobs were still happening. After a little more reading, it turns out you should also set the `retry_after` value to match the `expire` setting. The `retry_after` setting tells the queue how many seconds to wait before it assumes a job that is still being processed has stalled and makes it available to be tried again.
I tried another csv locally and timed it. It took 1 minute 36 seconds, which made it clear that the `retry_after` setting was causing the problem: jobs that were still running were being handed out again. I adjusted this setting, set it to four minutes, and deployed.
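For reference, the four-minute setting looks roughly like this in `config/queue.php`. This is only an excerpt showing the database connection, and everything other than `retry_after` is what I understand the stock Laravel config to contain, so treat it as a sketch rather than a copy of my file.

```php
<?php

// config/queue.php (excerpt): only the database connection is shown.

return [
    'connections' => [
        'database' => [
            'driver' => 'database',
            'table' => 'jobs',
            'queue' => 'default',
            // Seconds to wait before a job that is still being processed is
            // treated as stalled and retried. 240 = the four minutes above.
            'retry_after' => 240,
        ],
    ],
];
```

The Laravel docs also advise keeping the worker's `--timeout` value a few seconds shorter than `retry_after`, so a stuck job is killed before the queue hands it out a second time.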
I restarted the entire import, all 22,500 files and let it run. After an hour, I had zero failed jobs, and this seemed to solve the problem. And this is the second thing I learned from my experience working with queues...2. Be sure your workers have enough time to complete the task given.
Setting up a queue table is super simple in Laravel...`php artisan queue:table` will generate the migration for a basic queue table.
I assumed that table would also handle failed jobs. As you can guess, after running some test jobs I assumed that things were working correctly, but then realized I had no way of knowing.
Back to the docs, and I see there is another command you need to make the migration for the failed_jobs table: `php artisan queue:failed-table`.
Just be sure to run `php artisan migrate` after the two queue table commands and you will have the needed queue tables in your db.
Wrap up and general experience with queues
After sorting out the above, and considering this is my first time using queues, I think queues are awesome and surprisingly easy.
The above issues were a bit my fault for not fully reading through the docs, but because I experienced them firsthand, I will not forget them soon.
I think the queue was the right decision to allow the user to get feedback from the system quickly, and not have to wait for the import to complete. This is a great user experience, and could be improved with a notification to the user when the queue is complete, but I did not see the need for that for the client.
I do think that queues are great, but they are not needed for everything. Sometimes it's perfectly fine to have the user wait if it's only expected to be a few seconds. When you get into minutes, or need something to "happen in the background," queues are a great choice.