I don't see a downside to this approach. Perhaps some increased latency?
Each shard allows at most 5 GetRecords calls per second. If you want to fan out to many consumers, you will hit those limits quickly and be forced into a significant latency/throughput tradeoff to make it work.
For API limits, see: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_...
Then any new Lambdas or other services that want to subscribe to those messages will need another queue, and another, and so on.
I haven't had a case where service groups were coming up and down; I'm struggling to think of a use case.
For example, an AWS Lambda triggered from SQS can scale to thousands of concurrent executions, each one pulling its own message from the queue.
But another consumer group, maybe a set of load-balanced EC2 instances, will need a separate queue.
In general, I don't know of cases where you want a single message duplicated across a variable number of consumer groups - services are not ephemeral things, even if their underlying processes are. You don't build a service, deploy it, and then tear it down the next day and throw away the code.
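The usual AWS pattern for this is SNS -> SQS fan-out: one topic, one queue per consumer group. A rough sketch assuming boto3; the queue policy helper is pure, and the topic/queue names are hypothetical:

```python
import json


def queue_policy(topic_arn: str, queue_arn: str) -> str:
    """IAM policy allowing an SNS topic to deliver into an SQS queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def fan_out(topic_arn: str, group_names: list[str]) -> None:
    """Create one SQS queue per consumer group and subscribe each
    of them to the shared SNS topic."""
    import boto3  # only needed when actually provisioning

    sns, sqs = boto3.client("sns"), boto3.client("sqs")
    for name in group_names:
        queue_url = sqs.create_queue(QueueName=f"{name}-events")["QueueUrl"]
        queue_arn = sqs.get_queue_attributes(
            QueueUrl=queue_url, AttributeNames=["QueueArn"]
        )["Attributes"]["QueueArn"]
        sqs.set_queue_attributes(
            QueueUrl=queue_url,
            Attributes={"Policy": queue_policy(topic_arn, queue_arn)},
        )
        sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```

Each new consumer group is one more queue and one more subscription, which is exactly the "another queue, and another" growth described above - manageable when groups are stable, annoying when they aren't.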
Google Cloud really outshines AWS here with its serverless PubSub - it's trivial to fan out, it's low latency, it has similar delivery semantics (I think), and IMHO better, easier APIs. It's a really impressive service.
But their only method of throttling is to scale up and down based on failures. And it has been very unpredictable for me.
Even though my webhook started failing and timing out on requests, pubsub just kept hammering my servers until they were brought completely to their knees. Logs on Google's end showed 1,500 failed attempts per second and 0.2 successes per second. It kept up that rate for half an hour.
Seems like their Push option really needs some work.
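For contrast, what you'd want on failures is exponential backoff - something a pull consumer can do for itself, but push delivery takes out of your hands. A minimal sketch (function and parameter names are mine):

```python
def next_delay(current: float, failed: bool,
               base: float = 0.1, cap: float = 60.0) -> float:
    """Exponential backoff: double the wait after each failed delivery,
    cap it, and reset to the base interval once a delivery succeeds."""
    if not failed:
        return base
    return min(max(current * 2, base), cap)
```

With this in the consumer's poll loop, 1,500 failures per second would rapidly stretch into one attempt a minute, instead of hammering a downed server at full rate for half an hour.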