The irony is that if you had a great discriminator to separate the wheat from the chaff, it would probably make its way into the next model and would no longer be useful.
My only recommendation is that OpenAI et al. should tag all generated images as synthetic in their metadata. That would be a really interesting tag for media file formats (native support would be even better than metadata, though) and probably useful across a lot of domains.
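As a sketch of what a more "native" tag could look like: PNG already supports tEXt chunks in the file structure itself, so a generator could stamp a marker straight into the file. A minimal standard-library-only sketch (the key name `ai_generated` is made up for illustration; no standard defines it):

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def add_text_chunk(png: bytes, key: str, value: str) -> bytes:
    """Insert a tEXt chunk right after IHDR, marking the file in-format."""
    assert png[:8] == PNG_SIGNATURE, "not a PNG file"
    # tEXt chunk payload: keyword, NUL separator, text
    data = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # every PNG chunk is: 4-byte length, 4-byte type, data, 4-byte CRC
    chunk = struct.pack(">I", len(data)) + b"tEXt" + data
    chunk += struct.pack(">I", zlib.crc32(b"tEXt" + data) & 0xFFFFFFFF)
    # IHDR is always the first chunk: 4 + 4 + 13 bytes of data + 4-byte CRC
    ihdr_end = 8 + 4 + 4 + 13 + 4
    return png[:ihdr_end] + chunk + png[ihdr_end:]
```

Of course, as noted downthread, anything this easy to add is equally easy for a bad actor to strip back out.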
I can see the future as being devoid of any humanity.
I think you’re right, and it’s unlikely that we (society) will convince people to label their AI content as such so that scraping is still feasible.
It’s far more likely that companies will be formed to provide “pristine training sets of human-created content”, and quite likely they will be subscription based.
Neal Stephenson covered this briefly in "Fall; or, Dodge in Hell." So much 'net content was garbage, AI-generated, and/or spam that it could only be consumed via "editors" (either AI or AI+human, depending on your income level) that separated the interesting sliver of content from...everything else.
I guess the concern would be: if one of these recipe websites _was_ generated by an AI, and the ingredients _look_ correct but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2 tbsp of flour to the caramel macchiato. Whoops! Meant sugar.
Well, we do have organic/farmed/handcrafted/etc. food. One can imagine an information nutrition label: "contains 70% AI-generated content, triggers 25% of the daily dopamine release target".
I think this will introduce unavoidable background noise that will be super hard to fully eliminate from future large-scale data sets scraped from the web. There will always be more and more photorealistic pictures of "cats," "chairs," etc. in the data that are close to looking real but not quite, and we can never really go back to a world where there are only "real" pictures or "authentic human art" on the internet.
Cheap books, cheap TV and cheap music will be generated.
Imagine that instead of having cheap labor from Southeast Asia churn out these videos, they are just spit out as fast as possible using AI.
Unless you assume there are bad actors who will crop out the tags. Not many people currently have access to DALL-E 2, or will have access to Imagen.
As someone working in vision, I am also thinking about whether to include such images deliberately. Image augmentation techniques are ubiquitous in the field; they introduce many training examples that are not in the distribution of input images, and they improve model generality by huge margins. Whether generated images improve the generality of future models is a thing to try.
Damn I just got an idea for a paper writing this comment.
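For anyone outside the field, here is a toy pure-Python version of what an augmentation pipeline does (real ones, e.g. torchvision.transforms, operate on image tensors, but the idea is the same: random perturbations produce training inputs that were never in the original data):

```python
import random

def augment(img, rng=random):
    """Toy augmentation on a 2-D list-of-lists 'image'.

    Random flips and crops yield training examples that are not in
    the original input distribution, which is exactly why they tend
    to improve generalization.
    """
    # random horizontal flip
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    # random crop: shave one column off a random side
    if rng.random() < 0.5:
        img = [row[1:] for row in img]
    else:
        img = [row[:-1] for row in img]
    return img
```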
I don't know why people do that, but lots of randoms on the internet do, and they're not even bad actors per se. Removing signatures from art posted online has become a kind of meme itself, especially when comic strips are reposted on Reddit. So yeah, we'll see lots of them.
If the AI models can't consume it, it can't be commoditised and, well, ruined.
A bit far out there in terms of plot but the notion of authenticating based on a multitude of factors and fingerprints is not that strange. We've already started doing that. It's just that we currently still consume a lot of unsigned content from all sorts of unreliable/untrustworthy sources.
Fake news stops being a thing as soon as you stop doing that. Having people sign off on and vouch for content needs to start becoming a thing. I might see Joe Biden saying stuff in a video on Youtube. But how do I know if that's real or not?
With deep fakes already happening, that's no longer an academic question. The answer is that you can't know. Unless people sign the content. Like Joe Biden, any journalists involved, etc. You might still not know 100% it is real but you can know whether relevant people signed off on it or not and then simply ignore any unsigned content from non reputable sources. Reputations are something we can track using signatures, blockchains, and other solutions.
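A minimal sketch of the sign-and-verify idea. It uses a symmetric HMAC from the Python standard library purely to stay dependency-free; a real provenance scheme would use asymmetric signatures (e.g. Ed25519), so that anyone can verify content against a signer's published public key:

```python
import hashlib
import hmac

def sign(content: bytes, key: bytes) -> str:
    # Tag the content bytes with a keyed digest. With asymmetric
    # crypto this would be a private-key signature instead.
    return hmac.new(key, content, hashlib.sha256).hexdigest()

def verify(content: bytes, key: bytes, signature: str) -> bool:
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(sign(content, key), signature)
```

A platform could then surface only content whose signatures check out against keys it trusts, and treat everything unsigned as unattributed.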
Interesting with Neal Stephenson that he presents a problem and a possible solution in that book.
A digital picture of an oil painting != an actual oil painting
Of course once someone trains an AI with a robotic arm to do the actual painting, then your worry holds firm.
Naturally there's a python library [1] with some algorithms that are resistant to lossy compression, cropping, brightness changes, etc. Scaling seems to be a weakness though.
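Algorithms like these are perceptual hashes. A toy pure-Python average-hash sketch (real libraries first resize the image to a tiny fixed size, e.g. 8x8 grayscale, but the principle - one bit per pixel, set where the pixel is above the mean - is the same, which is also why a uniform brightness shift barely changes the hash):

```python
def average_hash(pixels):
    # pixels: 2-D list of grayscale values. Each bit records whether
    # a pixel is at or above the image's mean brightness.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]

def hamming(a, b):
    # small distance => perceptually similar images
    return sum(x != y for x, y in zip(a, b))
```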
Less common opinion: this is also how you end up with models that understand the concept of themselves, which has high economic value.
Even less common opinion: that's really dangerous.
As AI advances, a lot of people will look to experience life outside the digital world.
Even digital communication will no longer be trustworthy, with deepfakes and everything else, so people will want to get together in person more often.
Edit: for the lazy ones, yeah, digital will be a sad and heartless environment...
Maybe I misunderstood, but I had it that people used generative AI models to transform the media they produced. The generated content can be uniquely identified, but the creator (or creators) retains anonymity. Later, these generative AI models morphed into a form of identity, since they could be accurately and uniquely identified.
I loved that he extended the concept of identity as an individualized pattern of events and activities to the real world: the innovation of face masks with seemingly random but unique patterns to foil facial recognition systems but still create a unique identity.
Like you say, the story itself had horrible flaws (I'm still not sure if I liked it in its totality, and I'm a Stephenson fan since reading Snow Crash on release in '92), but still had fascinating and thought provoking content.
Considering how many of the readers of said blog will be scrapers and bots, who will use the results to generate more spammy "content", I think you are right.
I can see a past where this already happened, to paraphrase Douglas Adams ;)
It's been done, starting from plotter-based solutions years ago, through the work of folks like Thomas Lindemeier:
https://scholar.google.com/citations?user=5PpKJ7QAAAAJ&hl=en...
Up to and including actual painting robot arms that dip brushes in paint and apply strokes to canvas today:
https://www.theguardian.com/technology/2022/apr/04/mind-blow...
The painting technique isn't all that great yet for any of these artbots working in a physical medium, but that's largely due to a general lack of dexterity in manual tool use rather than an art-specific challenge. I suspect that RL environments that physically model the application of paint with a brush would help advance the SOTA. It might be cheaper to model other mediums like pencil, charcoal, or even airbrushing first, before tackling more complex and dimensional mediums like oil paint or watercolor.