Any chance AI could be used to dedupe the stories (i.e. "these two are identical, only show the one from the higher-ranked source")?
The problem with deduping is that some news stories get posted and reposted by different sources for several days (sometimes even weeks) in a row. That's a huge amount of context I'd have to feed to the AI.
1,000 news titles * 3 days * 70 characters per title = 210,000 characters ≈ 40,000 words ≈ 53,000 tokens. My current context window is 8,000 tokens, and I think 32,000 tokens is the max GPT-4 allows.
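A quick sanity check on that estimate, assuming the common rule of thumb of ~4 characters per token for English text (the 70-characters-per-title figure is from above):

```python
# Back-of-the-envelope context size, assuming ~4 chars per token.
titles_per_day, days, chars_per_title = 1000, 3, 70
chars = titles_per_day * days * chars_per_title  # 210,000 characters
tokens = chars // 4                              # ~52,500 tokens
print(f"{chars:,} chars ~= {tokens:,} tokens")   # well past even a 32k window
```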
---
Edit: now that I think about it, this should be possible to do in several runs. Will keep thinking about it, thanks for the suggestion.
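A minimal sketch of what the multi-run idea could look like: greedily pack titles into batches that fit a token budget, then carry a running list of canonical titles from run to run. The batching budget and `llm_dedup` are hypothetical; the stub below just does a normalized exact match where the real version would prompt GPT-4 with the canonical list plus the new batch and parse the reply.

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def batch_titles(titles: list[str], budget: int = 6000) -> list[list[str]]:
    # Greedily pack titles into batches under the token budget, leaving the
    # rest of the window for the prompt and the carried-over canonical list.
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for title in titles:
        cost = estimate_tokens(title)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(title)
        used += cost
    if current:
        batches.append(current)
    return batches

def llm_dedup(canonical: list[str], batch: list[str]) -> list[str]:
    # Stand-in for the real model call: here, a trivial normalized exact
    # match. The actual version would ask the model to map near-duplicate
    # titles onto existing canonical entries.
    seen = {t.casefold().strip() for t in canonical}
    merged = list(canonical)
    for title in batch:
        key = title.casefold().strip()
        if key not in seen:
            seen.add(key)
            merged.append(title)
    return merged

def dedup_in_runs(titles: list[str]) -> list[str]:
    # `canonical` accumulates one representative title per story; each run
    # only needs one batch plus the running list to fit in the window.
    canonical: list[str] = []
    for batch in batch_titles(titles):
        canonical = llm_dedup(canonical, batch)
    return canonical
```

One caveat with this shape: the canonical list itself counts against the window and grows over time, so it would need pruning (e.g. dropping stories older than the repost horizon) to keep each run under budget.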