zlacker

[parent] [thread] 23 comments
1. pxoe+(OP)[view] [source] 2023-12-27 16:24:44
that just sounds like "we didn't even try to build those systems in that way, and we're all out of ideas, so it basically will never work"

which is really just a very, very common story with ai problems, be it sources/citations/licenses/usage tracking/etc.: it's all "too complex if not impossible to solve", which at this point just seems like a facade for intentionally ignoring those problems for benefit. those problems definitely exist, so why not try to solve them? because actually trying to solve them would entail using data properly and paying creators, and that would cut into the bottom line. the whole point is free data use without having to pay, so why would they ruin that for themselves?

replies(2): >>simonw+T >>KHRZ+y3
2. simonw+T[view] [source] 2023-12-27 16:28:28
>>pxoe+(OP)
What makes you think AI researchers (including the big labs like OpenAI and Anthropic) aren't trying to solve these problems?
replies(1): >>pxoe+V3
3. KHRZ+y3[view] [source] 2023-12-27 16:42:31
>>pxoe+(OP)
Just a question, do you remember a source for all the knowledge in your mind, or did you at least try to remember?
replies(2): >>pxoe+D4 >>bluefi+rb
◧◩
4. pxoe+V3[view] [source] [discussion] 2023-12-27 16:44:14
>>simonw+T
the solutions haven't arrived. neither have any interim changes in lieu of solutions. "trying" isn't an actual, present, functional change, and it just gets passed around as an excuse for companies to keep doing whatever they're doing.
replies(1): >>pama+Yj
◧◩
5. pxoe+D4[view] [source] [discussion] 2023-12-27 16:47:13
>>KHRZ+y3
a computer isn't a human. aren't computers good at storing data? why can't they just store that data? they literally have the sources right there in the datasets. why can't they just reference those sources?

human analogies are cute, but they're completely irrelevant. this is specifically about computers, and the analogies don't change or excuse how computers actually work.

replies(8): >>umvi+D7 >>wrs+V7 >>qup+38 >>jquery+J8 >>KHRZ+lb >>Tao330+ch >>Kim_Br+Lh >>Levitz+RM
◧◩◪
6. umvi+D7[view] [source] [discussion] 2023-12-27 17:04:39
>>pxoe+D4
Yes, computers are good at storing data. But there's a big difference between information stored in a database and information stored in a neural network. The former is well defined; the latter is a giant list of numbers, literally a black box. So in this case the analogy to a human brain is fairly on point: just as you can't perfectly cite a source for everything that comes out of your (black box) brain, other black boxes have similar challenges.
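To make the contrast concrete, a toy sketch (made-up data and sizes, nothing from any real system):

```python
import random

# A database row carries its provenance explicitly alongside the content.
db_row = {
    "text": "The sky is blue.",
    "source": "example.com/article-123",  # hypothetical source field
}

# A trained network "stores" what it learned only as weights: a grid
# of floats with no per-fact metadata attached to any of them.
random.seed(0)
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(8)]

print(db_row["source"])   # provenance is a direct lookup
print(weights[0][:3])     # just numbers; no source to look up
```

The question "which row did this come from?" is well posed for the first object and simply has no referent for the second.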
◧◩◪
7. wrs+V7[view] [source] [discussion] 2023-12-27 17:05:30
>>pxoe+D4
The analogy to a database is also irrelevant. LLMs aren’t databases.
◧◩◪
8. qup+38[view] [source] [discussion] 2023-12-27 17:06:50
>>pxoe+D4
When all the legal precedents we have are about humans, human analogies are incredibly relevant.
replies(1): >>jazzyj+ni
◧◩◪
9. jquery+J8[view] [source] [discussion] 2023-12-27 17:10:57
>>pxoe+D4
LLMs are not databases. There is no "citation" associated with a specific query, any more than you can cite the source of the comment you just made.
replies(1): >>aantix+de
◧◩◪
10. KHRZ+lb[view] [source] [discussion] 2023-12-27 17:26:03
>>pxoe+D4
OK, let's say you were given a source for an LLM output, such as "Common Crawl/reddit/1000000 books collection". Would this be useful? Probably not. Or do you want the chat system to run orders of magnitude slower so it can search petabytes of sources and constantly warn of similarities for every sentence? That's obviously a huge waste of resources; it should be done by users as appropriate for their use case, like those NY Times journalists who were easily able to find such similarities themselves for their use case of "specifically crafted prompts to output NY Times text".
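The after-the-fact check a user can run is cheap against one known source; the expensive part is running it against the entire training set for every sentence generated. A toy version of such a check (made-up texts, shared word trigrams as the similarity signal):

```python
# Overlapping word n-grams between a model output and one known source
# text: a cheap heuristic a downstream user could apply themselves.
def ngrams(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

output = "the quick brown fox jumps over the lazy dog"
source = "a quick brown fox jumps over a sleeping dog"

overlap = ngrams(output) & ngrams(source)
# any shared trigram is a hint of copying worth a closer look
print(sorted(overlap))
```

Scaling this from one document to petabytes, inline with every generation, is where the resource argument comes in.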
◧◩
11. bluefi+rb[view] [source] [discussion] 2023-12-27 17:26:22
>>KHRZ+y3
No, but I'm a human and treating computers like humans is a huge mistake that we shouldn't make.
replies(1): >>pama+Hj
◧◩◪◨
12. aantix+de[view] [source] [discussion] 2023-12-27 17:41:57
>>jquery+J8
That's fine. Solve it a different way.

OpenAI doesn't just get to steal work and then say "sorry, not possible" and shrug it off.

The NYTimes should be suing.

replies(4): >>MeImCo+vi >>Kim_Br+Ti >>Levitz+lP >>slyall+CT
◧◩◪
13. Tao330+ch[view] [source] [discussion] 2023-12-27 17:57:44
>>pxoe+D4
You'd effectively be asking it to cite sources on why the next token is statistically likely. Then it will hallucinate anyway and tell you the NYT said so. You might think you want this, but you don't.
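To illustrate what a single generation step actually is, a toy next-token draw (tiny vocabulary and made-up scores; real models do this over tens of thousands of tokens):

```python
import math
import random

# Toy next-token step: made-up scores over a tiny vocabulary.
vocab = ["the", "cat", "sat", "NYT"]
logits = [2.0, 0.5, 0.1, -1.0]

# softmax turns raw scores into a probability distribution
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

random.seed(0)
next_token = random.choices(vocab, weights=probs, k=1)[0]
# One token is drawn from a distribution. Nothing in this step records
# *why* "the" scored 2.0, so there is no citation to attach to it.
```

Any "source" bolted onto that draw would itself be generated text, which is exactly the hallucination problem.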
◧◩◪
14. Kim_Br+Lh[view] [source] [discussion] 2023-12-27 18:01:17
>>pxoe+D4
Can't have your cake and eat it too.

1. If you run different software (an LLM), install different hardware (GPU/TPU), and use it differently (natural language), to the point that in many ways it's a different kind of machine, does it actually surprise you that it works differently? There are definitely computer components in there somewhere, but they're combined in a different way, just like you can use the same lego bricks to make either a house or a spaceship. For one: GPT-4 is not going to display a Windows desktop for you (right this minute, at least).

2. Comparing to humans is fine. Otherwise, by similar logic, a robot arm is not a human arm and thus should not be capable of gripping things and picking them up. Obviously that logic has a flaw somewhere. A more useful approach might be to compare, say, a human arm, a gorilla arm, and a robot arm: they're all arms!

◧◩◪◨
15. jazzyj+ni[view] [source] [discussion] 2023-12-27 18:03:57
>>qup+38
There is a hundred years of legal precedent in the realm of technology upsetting the assumptions of copyright law. Humans use tools: radios, xerox machines, home video tape. AI is just another tool that makes making copies way easier. The law will be updated, hopefully without comparing an LLM to a man.
◧◩◪◨⬒
16. MeImCo+vi[view] [source] [discussion] 2023-12-27 18:04:29
>>aantix+de
And god willing, if there is any justice in the courts, the NYTimes will lose this frivolous lawsuit.

Copyright law is a prehistoric and corrupt system that, for as long as anyone living can remember, has been about protecting the profit margins of Disney and Warner Bros rather than protecting real art and science. Unless copy/paste superhero movies are your definition of art, I suppose.

Unfortunately, it seems like judges and the general public are so clueless about how this technology works that it might get regulated into the ground by uneducated people before it ever has a chance to take off. All so we can protect endless listicle factories. What a shame.

replies(1): >>lewhoo+d01
◧◩◪◨⬒
17. Kim_Br+Ti[view] [source] [discussion] 2023-12-27 18:07:01
>>aantix+de
Clearly, "theft" is an analogy here (since we can't get it to fit exactly), but we can work with it.

You are correct, if I were to steal something, surely I can be made to give it back to you. However, if I haven't actually stolen it, there is nothing for me to return.

By analogy, if OpenAI copied data from the NYT, they should be able to at least provide a reference. But if they don't actually have a proper copy of it, they cannot.

◧◩◪
18. pama+Hj[view] [source] [discussion] 2023-12-27 18:11:31
>>bluefi+rb
Treating computers like humans in this one particular way is very appropriate. It is the only way LLMs can synthesize a worldview when their training data is many thousands of times larger than their number of parameters. Imagine scaling the total data up by another factor of a million in a few years. There is no current technology to store that info inside the model, but we can easily train large neural nets that recreate the essence of it, just as we traditionally trained humans to recall ideas.
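Rough arithmetic on the data-vs-parameters gap, with assumed rather than official numbers (the model and dataset sizes below are illustrative, loosely in the range of recent open models):

```python
# Assumed figures: ~8 billion parameters stored as 2-byte floats,
# trained on ~15 trillion tokens at ~4 bytes of raw text per token.
params = 8e9
tokens = 15e12
param_bytes = params * 2   # fp16 weights
train_bytes = tokens * 4   # rough size of the raw training text
ratio = train_bytes / param_bytes
print(f"training text is roughly {ratio:.0f}x the size of the weights")
```

Whatever the exact numbers, the weights cannot hold a verbatim copy of the corpus; they hold a lossy compression of it, which is the point about recalling the essence rather than the record.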
◧◩◪
19. pama+Yj[view] [source] [discussion] 2023-12-27 18:13:04
>>pxoe+V3
Please recall how much the world changed in just the last year. What would your expected timescale be for solving this particular problem, and why is it more important than instilling in models the ability to plan logically and answer correctly?
replies(1): >>pxoe+po8
◧◩◪
20. Levitz+RM[view] [source] [discussion] 2023-12-27 20:44:08
>>pxoe+D4
I'm sorry if this is too callous, but if you don't understand what you are talking about, you should first familiarize yourself with the problem and only then make claims about what should be done.

It would be great if we could tell specifically how something like ChatGPT creates its output. It would be great for research, so it's not like there is no interest in it; it's just not an easy thing to do. It's more "Where did you get your identity from?" than "Who's the author of that book?". You might think "But sometimes what the machine gives CAN literally be the answer to 'Who's the author of that book?'", but even in those cases the answer doesn't come from that one work alone; there is an entire background that makes it understand that this is what you want.

◧◩◪◨⬒
21. Levitz+lP[view] [source] [discussion] 2023-12-27 20:58:46
>>aantix+de
Really? Solve it a different way? Do you realize the kind of tech we are talking about here?

This kind of mentality would have stopped the internet from existing. After all, it has been an absolute copyright nightmare, has it not?

If that's what copyright does then we are better without it.

◧◩◪◨⬒
22. slyall+CT[view] [source] [discussion] 2023-12-27 21:24:55
>>aantix+de
You sound like one of those government people who demand encryption that has government backdoors but is perfectly safe from attackers.

When told it's impossible, they go "geek harder then, nerd", as if demanding it will make it happen.

◧◩◪◨⬒⬓
23. lewhoo+d01[view] [source] [discussion] 2023-12-27 22:03:04
>>MeImCo+vi
> Copyright law is a prehistoric and corrupt system that has been about protecting the profit margins of Disney and Warner Bros rather than protecting real art

These types of arguments miss the mark entirely, imho. First, not every instance of copyrighted creation involves a giant corporation. Second, what you are actually arguing against is the unfair leverage corporations have when negotiating a deal with a rising artist.

◧◩◪◨
24. pxoe+po8[view] [source] [discussion] 2023-12-30 15:02:50
>>pama+Yj
the timeline for LLMs and image generation has been 6+ years. it is not a thing that "arrived just this year and is only just changing". it's been in development for a long time. and yet.