You could deduplicate based on vector similarity within a time window: 7 days for big stories, since their news cycle is longer, and 24 or 48 hours for smaller stories.
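A rough sketch of what that could look like, assuming each story already has an embedding vector from whatever model you choose. The field names (`vector`, `published`), the 0.9 threshold and the 48-hour default are all made up for illustration, and the greedy "compare against the first story in each cluster" approach is just one simple way to do it:

```python
import math
from datetime import timedelta


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_stories(stories, threshold=0.9, window=timedelta(hours=48)):
    """Greedily group stories that are similar and published close together.

    Each story is a dict with hypothetical keys "vector" (list of floats)
    and "published" (datetime). A story joins the first cluster whose
    earliest story is within `window` and at least `threshold` similar;
    otherwise it starts a new cluster.
    """
    clusters = []
    for story in sorted(stories, key=lambda s: s["published"]):
        for cluster in clusters:
            seed = cluster[0]  # earliest story in this cluster
            close_in_time = story["published"] - seed["published"] <= window
            if close_in_time and cosine(story["vector"], seed["vector"]) >= threshold:
                cluster.append(story)
                break
        else:
            clusters.append([story])
    return clusters
```

For big stories you would pass `window=timedelta(days=7)`. Keeping the first story of each cluster and dropping the rest gives you the deduplicated feed.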
Idea: The number of stories within a cluster, weighted by the credibility of each source, could be another element of the rating.
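Something like this, as a minimal sketch. The `source` field, the credibility table and the 0.5 default for unknown sources are assumptions for the example, not real data:

```python
def cluster_score(cluster, credibility, default=0.5):
    """Sum the credibility of the source behind each story in a cluster.

    `cluster` is a list of story dicts with a hypothetical "source" key.
    `credibility` maps source name to a weight between 0 and 1; sources
    missing from the table get `default`.
    """
    return sum(credibility.get(story["source"], default) for story in cluster)


# Hypothetical weights, purely for illustration.
credibility = {"wire-service": 1.0, "small-blog": 0.3}
cluster = [
    {"source": "wire-service"},
    {"source": "small-blog"},
    {"source": "unknown-site"},
]
score = cluster_score(cluster, credibility)
```

A cluster covered by several credible outlets then scores higher than one repeated by many low-credibility ones.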
>>julien+(OP)
Wow, thank you! I'm just a frontend dev and know almost nothing about this.
Will research vector similarity, sounds like the solution I need.