Show HN: WakaQ - a Python distributed task queue

>>osdev+He
1. That's the part I'm not happy with, but currently I use the `@wakaq.after_worker_started` decorator to setup an error logging handler in each worker. It outputs to a file that gets aggregated for error reporting, but without examples most people wouldn't know to do that. Here's the code: https://gist.github.com/alanhamlett/365d48276ac054ae75e59525...

2. Delayed tasks are added to a Redis sorted set, where the eta datetime when task should run is the score. Then the sorted set is queried for all items/tasks between zero and the current time. That returns eta tasks which are ready to run. Those tasks are added to their equivalent non-eta queue at the front and executed by the next worker. Eta tasks might not run right at the eta datetime, but they shouldn't run too early if all worker machines have clocks synced with time servers.

3. wakaq-info prints all queues with counts of pending tasks and pending eta tasks. Throughput isn't printed and has to be calculated by sampling wakaq-info at least twice. You can also import info from wakaq.utils to use it from a Python script.

4. No built-in way to pause. I pause by sending a broadcast task to all workers, which sets exclude_queues to all queues and then restarts the worker. Then it only listens for broadcast tasks and can be turned back on with another broadcast task.

> Congrats

Thanks!

>>notaco+4h
Tests? We don't need no stinking tests! [1] I didn't plan to open source this so quickly, if ever. I'm just happy it's working for WakaTime. The makefile has plans for pytest, but honestly my motivation for tests has waned after fixing the production issues with Celery.

[1] https://en.wikipedia.org/wiki/Stinking_badges

>>llIIll+Xa
Dramatiq is easy to daemonize which I like a about a lot but the schedule feature is not there and that's big drawback. I had to use my custom open source scheduler to schedule a function. https://github.com/rajasimon/beatserver

>>nerdpo+Zh
Ah yes, "hobby projects" you don't want to go paying those any attention ... https://en.wikipedia.org/wiki/History_of_Linux#The_creation_... ...

>>welder+(OP)
i made one as well called SAQ!

https://github.com/tobymao/saq

Looks like you use BLPOP, but there could be race conditions or lost jobs if your workers crash, check out BLMOVE and RPOPLPUSH.

>>captai+gG
> i made one as well called SAQ! https://github.com/tobymao/saq

Awesome! I'll check out your implementation.

> Looks like you use BLPOP, but there could be race conditions or lost jobs if your workers crash, check out BLMOVE and RPOPLPUSH.

I prefer ack-first (blpop) and designing the tasks to expect some amount of errors.

>>zerop+BE
https://github.com/wakatime/wakaq/blob/main/wakaq/__init__.p...

and

https://github.com/wakatime/wakaq/blob/main/wakaq/worker.py

is the meat of it. The blog post talks about the Redis data structures used, and there's not much to it beyond that.

>>ttymck+9r
Yes, if you're accepting the type as a param or returning the type. It happens very frequently, and without moving those imports behind "if typing.TYPE_CHECKING", you constantly run into cyclic imports.

https://stackoverflow.com/questions/39740632/python-type-hin...

>>nerdpo+Zh
It's being used in production at https://wakatime.com and performing better than Celery was, but yes it's an internal project that was open sourced early stage.

>>avinas+wE1
WakaQ re-forks workers that crash, but normally a crash means an exception was raised in application task code.

I do a combination of a few things to handle exceptions in tasks:

1. `@wakaq.task(max_retries=X)` auto retries the task on soft timeouts, which are usually intermittent and faster the second time the task runs

2. use error handling to track task failure rates https://gist.github.com/alanhamlett/365d48276ac054ae75e59525...

3. build tasks to expect some failures by being safe to re-run without, for ex: sending an email twice. this usually means exclusive locking in tasks or keeping counters of recent task runs or task progress somewhere

4. try catch blocks inside tasks, which rollback and re-enqueue the same task to retry later

Basically I handle all crashes at the application task level, not in WakaQ.

>>alecr9+f37
> Or wanting to shift the architecture entirely to avoid using memory-bound Redis as a queue with an overflow risk.

I wanted to use SSDB[1] instead of Redis for that reason, but it doesn't support the necessary data structures.

[1] https://github.com/ideawu/ssdb

zlacker

Show HN: WakaQ - a Python distributed task queue