It's not clear to me that this is true for all data about an arrest/charge.
Courts have held that arrest mugshots and fingerprints taken at jail intake can both be retained by the law enforcement system even if the arrested person is exonerated (acquitted, charges dropped, etc.).
Surely there is a way to achieve the stated goal of the project without the collateral damage of exposing information that individuals might, for whatever reason, prefer not be published online.
You’re still correct that datasets such as these might need to be globally distributed, instead of hosted with a single commercial provider.
Edit: or am I thinking of libel?
EDIT: I'm not going to do your legal homework for you, but here is South Carolina, for example. As stated above, each of the 50 United States has its own laws and regulations regarding arrest and criminal records. Violate those laws at your own risk, but if a lawyer is not involved in this project on an ongoing basis, I strongly recommend that everyone avoid it: https://www.scjustice.org/criminal-records-come-back-haunt-e...
That's just about respecting expungement (takedown required within 30 days of notice). If you improperly record or transcode the scraped data and, as a result, someone is attributed with something the record never showed, you are subject to the full weight of defamation lawsuits. If you unwittingly expose the private information of someone who is in witness protection, for example, you can be subject to legal and civil penalties: https://www.gsa.gov/reference/gsa-privacy-program/rules-and-...
I am sympathetic to minimizing collateral damage to the innocent, or even to those without a chronic history of abuse (we should never be judged by the single worst day of our lives), but I also believe in the vigorous application of sunlight to the nefarious.