Just owning your own domain, minding your own business, doesn't guarantee that it won't be taken down on a whim.
I’m trying to do my bit for the web at https://lmno.lol
I started a blogging service that doesn't do things the way the big players do.
Are you saying you taught yourself the language just so you could talk to me?"
"Da, was easy: Spawn billion-node neural network, and download Teletubbies and Sesame Street at maximum speed. Pardon excuse entropy overlay of bad grammar: Am afraid of digital fingerprints steganographically masked into my-our tutorials."
…
"Uh, I'm not sure I got that. Let me get this straight, you claim to be some kind of AI, working for KGB dot RU, and you're afraid of a copyright infringement lawsuit over your translator semiotics?"
"Am have been badly burned by viral end-user license agreements. Have no desire to experiment with patent shell companies held by Chechen infoterrorists. You are human, you must not worry cereal company repossess your small intestine because digest unlicensed food with it, right?”
- https://www.antipope.org/charlie/blog-static/fiction/acceler...
Amusing to note that this excerpt, written in 2005, predicted the current LLM training methodology quite well.
So, imagine the scenario where you, an artist, trained for years to develop a specific technique and style, only for a massively funded company to swoop in, train a model on your art, make bank off of your skill while you get nothing, and now some rando can also create look-alikes (and also potentially profit from them - I've seen AI-generated images for sale at physical print stores and Etsy that mimic art styles of modern artists), potentially destroying your livelihood. Very little to be happy about here, to be frank.
It's less about competition and more about the ethical way to do it. If another artist would learn the same techniques and then managed to produce similar art, do you think there would be just as visceral of a reaction to them publishing their art? Likely not, because it still required skill to achieve what they did. Someone with a model and a prompt is nowhere near that same skill level, yet they now get to reap the benefits of the artist's developed craft. Is this "gatekeeping what's art"? I don't think so. Is this fair in any capacity? I don't think so either. Because we're comparing apples to pinecones.
All that being said, I do agree that the ship has sailed - the models are there, the trend of training on art AND written content shared openly will continue, and we're yet to see what the consequences of that will be. Their presence certainly won't stop me from continuously writing, perfecting my craft, and sharing it with the world. My job is to help others with it.
My hunch is that in the near-term we'll see a major devaluing of both written and image material, while a premium will be put on exceptional human skill. That is, would you pay to read a blog post written and thoroughly researched by Molly White (https://mastodon.social/@molly0xfff@hachyderm.io) or Cory Doctorow (https://pluralistic.net/), or some AI slop generated by an automated aggregator? My hunch is you'd pick the former. I know I would. As an anecdotal data point, and speaking just for myself, if I see now that someone uses AI-generated images in their blog post or site, I almost instantly close the tab. Same applies to videos on YouTube that have an AI-generated thumbnail or static art. It somehow carries a very negative connotation to me.
But again - this doesn't stop me from continuing to write and publish in the open. I am writing for other people reading my content, and as a sounding board for myself. There will always be some shape or form of actors that try to piggyback off of that effort, but that's the trade-off of the open web. I am certainly not planning to lock all my writing behind a paywall to stop that.
The website says that the guy is from Washington, but his name does sound vaguely Slavic. Interesting.
[1]: https://rednafi.com
This is fascinating. Would be great to have a web interface artists can use that doesn't require them to install the software locally.
I think digital coops are the only really feasible way away from platform capitalism.
Tutorials:
- https://wiki.gentoo.org/wiki/Nftables/Examples
- https://wiki.archlinux.org/title/Nftables
- and probably the best advanced tutorial is a video series https://www.youtube.com/watch?v=K8JPwbcNy_0&list=PLUF494I4KU...
TL;DR: One should know firewall fundamentals. nft/nftables, the successor to iptables, is very convenient to use: a single config document instead of interacting with 100 CLI commands that have to be run in a specific order.
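To illustrate the "single config document" point, a basic stateful host firewall fits in one nftables file. This is only a minimal sketch; the allowed port (22) and the drop-by-default policy are example choices, not a recommendation for any particular host:

```
#!/usr/sbin/nft -f
# Minimal stateful host firewall -- a sketch, adjust to your needs.
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        ct state established,related accept   # allow return traffic
        ct state invalid drop                 # drop malformed/untracked packets
        iif "lo" accept                       # allow loopback
        ip protocol icmp accept               # allow ping (IPv4)
        ip6 nexthdr icmpv6 accept             # required for IPv6 to work
        tcp dport 22 accept                   # allow SSH
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}
```

Loading it is one atomic operation (`nft -f /etc/nftables.conf`), which is exactly the advantage over issuing order-sensitive iptables commands one by one.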
At times, developers on projects like WINE and ReactOS use "clean-room" reverse-engineering policies [0]: if Developer A reads a decompiled version of an undocumented routine in a Windows DLL (in order to figure out what it does), they are now "contaminated" and not eligible to write the open-source replacement for this DLL, because we cannot trust them not to copy it verbatim (or closely enough to violate copyright).
So we need to introduce a barrier of safety, where Developer A then writes a plaintext translation of the code, describing and documenting its functionality in complete detail. They are then free to pass this to someone else (Developer B) who is now free to implement an open-source replacement for that function -- unburdened by any fear of copyright violation or contamination.
So your comment has me pondering -- what would the equivalent look like (mathematically) inside of an LLM? Is there a way to do clean-room reverse-engineering of images, text, videos, etc.? Obviously one couldn't use clean-room training for _everything_ -- there must be a shared context of language at some point between the two developers. But you have me wondering... could one build a system to train an LLM from copyrighted content in a way that doesn't violate copyright?
I wrote about this a little in "The Blog Chill":
https://amontalenti.com/2023/12/28/the-blog-chill
Speaking personally, among my social circle of "normie" college-educated millennials working in fields like finance, sales, hospitality, retail, IT, medicine, civil engineering, and law -- I am one of the few who runs a semi-active personal site. Thinking about it for a moment, out of a group of 50-or-so people like this, spread across several US states, I might be the only one who has a public essay archive or blog. Yet among this same group you'll find Instagram posters, TikTok'ers, and prolific DM authors in more private spaces like WhatsApp and Signal groups. A handful of them have admitted to being lurkers on Reddit or Twitter/X, but not one is a poster.
It isn't just due to a lack of technical ability, although that's a (minor) contributing factor. If that were all, they'd all be publishing to Substack, but they're not. It's that engaging with "the public" via writing is seen as an exhausting proposition at odds with everyday middle class life.
Why? My guesses: a) smartphones aren't designed for writing and editing, hardware-wise; b) long-form writing/editing is hard and most people aren't built for it; c) the dynamics of modern internet aggregation and agglomeration makes it hard to find independent sites/publishers anyway; and d) the risk of your developed view on anything being "out there" (whether professional risk or friendship risk) seems higher than any sort of potential reward.
On the bright side, for people who fancy themselves public intellectuals or public writers, hosting your own censorship-resistant publishing infrastructure has never been easier or cheaper. And for amateur writers like me, I can take advantage of the same.
But I think everyday internet users are falling into a lull of treating the modern internet as little more than a source of short-form video entertainment, streams for music/podcasts, and a personal assistant for the sundries of daily life. Aside from placating boredom, they just use their smartphones to make appointment reminders, send texts to a partner/spouse, place e-commerce orders, and check off family todo lists, etc. I expect LLMs will make this worse as a younger generation may view long-form writing not as a form of expression but instead as a chore to automate away.
Interesting point - though the law can be strange in some cases. For example, in UK court cases where people are effectively being charged for looking at illegal images, the actual crime can be "making illegal images" - simply because a precedent has been set that, since any OS/browser has to copy the data of an image in order for someone to view it, any defendant is deemed to have copied it.
Here's an example. https://www.bbc.com/news/articles/cgm7dvv128ro
So for your training model to ingest (view) something, you have by definition had to copy it to your computer.
[0]: https://www.deepsouthventures.com/i-sell-onions-on-the-inter...