As I was figuring out how to set up a datastore, query it for running workflows and all that jazz, I happened upon an interesting SQS feature: posting a message with a delay (SQS calls these message timers).
And so, the system has no database. Instead, when new work arrives, it posts the details of the work to be done to SQS. Every host in the fleet polls SQS for messages. When a host receives one, it runs the checks, and if the process isn't complete, it reposts the message with a 5-minute delay. Five minutes later, some host in the fleet receives the message and tries again. This continues for as long as the process needs.
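A rough sketch of that poll/check/repost loop, assuming Python. `FakeDelayQueue`, `poll_once`, and the `is_complete` check are hypothetical stand-ins so the example runs without AWS. Against real SQS, the repost would be a boto3 `send_message(QueueUrl=..., MessageBody=..., DelaySeconds=300)` call followed by `delete_message` on the copy that was received; per-message delays in SQS max out at 900 seconds (15 minutes).

```python
import heapq
import itertools
import json

REPOST_DELAY_SECONDS = 300  # the 5-minute delay described above


class FakeDelayQueue:
    """Hypothetical in-memory stand-in for an SQS queue with per-message delays.

    receive() both fetches and removes a message, collapsing SQS's
    receive + delete into one step to keep the sketch short.
    """

    def __init__(self):
        self.now = 0  # simulated clock, in seconds
        self._heap = []  # entries are (visible_at, sequence, body)
        self._seq = itertools.count()  # tie-breaker so bodies are never compared

    def send(self, body, delay_seconds=0):
        heapq.heappush(self._heap, (self.now + delay_seconds, next(self._seq), body))

    def receive(self):
        """Return the next message that has become visible, or None."""
        if self._heap and self._heap[0][0] <= self.now:
            return heapq.heappop(self._heap)[2]
        return None

    def advance(self, seconds):
        self.now += seconds


def poll_once(queue, is_complete):
    """One worker iteration: receive, check, and repost with a delay if unfinished.

    Returns None if no message was visible, True if the work was complete
    (the message is simply not reposted), and False if it was reposted.
    """
    body = queue.receive()
    if body is None:
        return None
    work = json.loads(body)
    if is_complete(work):
        return True
    work["attempts"] = work.get("attempts", 0) + 1
    queue.send(json.dumps(work), delay_seconds=REPOST_DELAY_SECONDS)
    return False


# Usage: a hypothetical job that counts as finished once 15 simulated minutes pass.
q = FakeDelayQueue()
q.send(json.dumps({"job_id": "job-1"}))
results = []
for _ in range(5):
    results.append(poll_once(q, lambda work: q.now >= 900))
    q.advance(REPOST_DELAY_SECONDS)
# results == [False, False, False, True, None]
```

With real SQS, reposting before deleting the received copy means a crash between the two calls leaves a duplicate rather than a lost message, which is one reason duplicates have to be tolerated.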
Looking back, part of me is now horrified at this design. But that system now has thousands of users and continues to scale really well. Data loss is very rare, costs are low, and there's no datastore to manage. SQS is just really darned neat for making things like that possible.
Depending on the workload, this could be a non-issue or very expensive. Treating a queue as a database, particularly a queue that can't participate in XA (distributed) transactions, can get you into trouble quickly.
So duplicate messages have to be handled safely anyway (standard SQS queues deliver at least once), e.g. by deduplicating within a reasonable window and/or by making operations idempotent.
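A minimal sketch of both techniques, assuming Python. `DedupWindow` and `mark_step_done` are hypothetical helpers, and the window lives in memory on a single host; across a fleet, a duplicate can land on a different host, so the window would need shared storage, which is why idempotent operations are the sturdier half of the pair.

```python
class DedupWindow:
    """Hypothetical helper: remembers message IDs seen in the last window_seconds."""

    def __init__(self, window_seconds, clock):
        self.window = window_seconds
        self.clock = clock  # callable returning the current time in seconds
        self._seen = {}  # message_id -> time it was first seen

    def first_time(self, message_id):
        now = self.clock()
        # Forget IDs that have aged out of the window.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True


def mark_step_done(state, step):
    """Idempotent update: applying it twice gives the same state as applying it once."""
    done = set(state.get("done", []))
    done.add(step)
    return {**state, "done": sorted(done)}


# Usage with a simulated clock.
clock = [0.0]
dedup = DedupWindow(window_seconds=60, clock=lambda: clock[0])
processed = []
for msg_id in ["m1", "m1", "m2"]:
    if dedup.first_time(msg_id):
        processed.append(msg_id)
clock[0] = 120.0
if dedup.first_time("m1"):  # outside the window, so it looks new again
    processed.append("m1")
# processed == ["m1", "m2", "m1"]

once = mark_step_done({}, "resize")
twice = mark_step_done(once, "resize")
# once == twice == {"done": ["resize"]}
```

The last `m1` shows the limit of a window: a duplicate that arrives late slips through, and only an idempotent operation like `mark_step_done` makes that harmless.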
I can't run any analytics on how long things typically take, who my biggest users are, etc. I mean, I could, but I'd have to add a datastore for that.
Adding new parameters to the messages requires very careful work to keep every change backward- and forward-compatible, so that mid-deployment old hosts don't receive messages they can't process and new hosts don't see old messages they don't understand. That's good practice generally, but with this design it's super critical to get right.
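One common way to get that compatibility is a "tolerant reader": default any missing field and pass unknown fields through untouched. The sketch below assumes JSON message bodies in Python; `WorkMessage`, its fields (including the `priority` field standing in for a later addition), and the helpers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class WorkMessage:
    job_id: str
    attempts: int = 0
    priority: str = "normal"  # hypothetical field added in a later release


KNOWN_FIELDS = {f.name for f in fields(WorkMessage)}


def decode(body):
    """Parse a message, tolerating both missing and unknown fields.

    Missing fields fall back to their defaults (new host, old message);
    unknown fields are returned separately (old host, new message).
    """
    raw = json.loads(body)
    known = {k: v for k, v in raw.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return WorkMessage(**known), unknown


def encode(msg, unknown):
    # Carry unknown fields through so a repost from an older host
    # doesn't strip data that a newer host added.
    return json.dumps({**unknown, **asdict(msg)})


# Usage
old_msg, old_extra = decode('{"job_id": "j1", "attempts": 2}')
new_msg, new_extra = decode('{"job_id": "j2", "region": "eu"}')
reposted = json.loads(encode(new_msg, new_extra))
# old_msg == WorkMessage("j1", 2, "normal"), old_extra == {}
# reposted == {"region": "eu", "job_id": "j2", "attempts": 0, "priority": "normal"}
```

Passing unknown fields through matters specifically because messages are reposted: every hop through an older host would otherwise silently drop whatever a newer host added.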
Also, a dropped message is invisible. SQS has redrive (dead-letter queues), sure, and that helps, but if a bug or edge case made the system quietly stop processing something, that processing would just stop and we'd never know. If the entries were in a datastore, we'd see "Hey, this one didn't finish and I haven't worked on it lately, what gives?".
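For contrast, here is roughly what that "didn't finish and nobody has touched it lately" check could look like with a datastore, which the system described above deliberately doesn't have. This is a minimal sketch assuming an in-memory SQLite table; the `work` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE work ("
        " job_id TEXT PRIMARY KEY,"
        " done INTEGER NOT NULL DEFAULT 0,"
        " last_touched REAL NOT NULL)"
    )
    return db


def touch(db, job_id, now, done=False):
    """Record that a host worked on job_id at time `now` (hypothetical helper)."""
    db.execute(
        "INSERT OR REPLACE INTO work (job_id, done, last_touched) VALUES (?, ?, ?)",
        (job_id, int(done), now),
    )


def stale_jobs(db, now, max_idle_seconds):
    """Unfinished jobs that no host has touched within max_idle_seconds."""
    rows = db.execute(
        "SELECT job_id FROM work WHERE done = 0 AND last_touched < ? ORDER BY job_id",
        (now - max_idle_seconds,),
    )
    return [row[0] for row in rows]


# Usage: job-a was silently dropped, job-b finished, job-c is still in progress.
db = open_store()
for job in ["job-a", "job-b", "job-c"]:
    touch(db, job, now=0)
touch(db, "job-b", now=1000, done=True)
touch(db, "job-c", now=1000)
stale = stale_jobs(db, now=1200, max_idle_seconds=600)
# stale == ["job-a"]
```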