On SQS - zlacker

>>mpweih+(OP)
A long time ago, as new-ish developer, I was building a system that needed to take inputs, then run "pass/fail/wait and try again later" until timeout or completion. This wasn't mission-critical stuff, mind you, so a lost message would annoy someone but not cause any actual harm.

As I was figuring out how to setup a datastore, query it for running workflows and all that jazz, I happened upon an interesting SQS feature: Post with Delay.

And so, the system has no database. Instead, when new work arrives it posts the details of the work to be done to SQS. All hosts in the fleet are polling SQS for messages. When they receive one, they do the checks and if the process isn't complete they repost the message again with a 5-minute delay. In 5 minutes, a host in the fleet will receive the message and try again. The process continues as long as it needs to.

Looking back, part of me now is horrified at this design. But: that system now has thousands of users and continues to scale really well. Data loss is very rare. Costs are low. No datastore to manage. SQS is just really darned neat because it can do things like that.

>>mabbo+kK
Why are you horrified at a design that works well, scales well, is resilient enough for its use case, and is low cost? The whole point of an engineering design process is to find designs that meet these types of requirements. Honestly, this sounds like the perfect solution for what you're trying to accomplish.