overview for lily33

AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To) in c/[email protected]

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago) (2 children)

From Wikipedia, "a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work".

You can probably call the output of an LLM 'derived', in the same way that if I counted the number of 'Q's in Harry Potter, the result would be derived from Rowling's work.

But it's not 'derivative'.

Technically it's possible for an LLM to output a derivative work if you prompt it to do so. But most of its outputs aren't.

AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To) in c/[email protected]

[–] [email protected] 10 points 2 years ago* (last edited 2 years ago)

On the contrary - the reason copyright is called that is because it started as the right to make copies. Since then it's been expanded to include more than just copies, such as distributing derivative works.

But the act of distribution is key. If I wanted to, I could write whatever derivative works in my personal diary.

I also have the right to count the number of occurrences of the letter 'Q' in Harry Potter without Rowling's permission. I can also post my count online for other lovers of 'Q', because it's not derivative (it is 'derived', but 'derivative' is different - according to Wikipedia it means 'includes major copyrightable elements').

Or do more complex statistical analysis.
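To make the 'Q' example concrete, here is a minimal sketch of that kind of counting. The sample sentence is a placeholder I made up, and the function names are just illustrative; you would feed in the text of whatever book you have a copy of. Note that the output is only numbers, and none of the original prose survives in it.

```python
from collections import Counter


def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of one letter in the text."""
    return text.lower().count(letter.lower())


def letter_frequencies(text: str) -> Counter:
    """Tally every alphabetic character in the text, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


# Placeholder text (an assumption), standing in for a full book.
sample = "The quick quiz question was quite quiet."

q_count = count_letter(sample, "q")
frequencies = letter_frequencies(sample)

print(f"Occurrences of 'q': {q_count}")
print(f"Most common letters: {frequencies.most_common(3)}")
```

The "more complex statistical analysis" would just be more of the same: word counts, n-gram tables, and so on, all of which are aggregates rather than copies of the text.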

Which is one conspiracy theory that you think/believe may have some verity to it? in c/[email protected]

[–] [email protected] 7 points 2 years ago

Indeed not a conspiracy. A conspiracy involves a crew of conspirators conspiring to do something illegal or unethical.

If it's just the laws of physics conspiring, it doesn't count.

AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To) in c/[email protected]

[–] [email protected] 23 points 2 years ago (17 children)

I'm sick and tired of this "parrots the works of others" narrative. Here's a challenge for you: go to https://huggingface.co/chat/, input some prompt (for example, "Write a three paragraphs scene about Jason and Carol playing hide and seek with some other kids. Jason gets injured, and Carol has to help him."). And when you get the response, try to find the author that it "parroted". You won't be able to - because it wouldn't just reproduce someone else's already made scene. It'll mesh maaany things from all over the training data in such a way that none of them will be even remotely recognizable.

AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To) in c/[email protected]

[–] [email protected] 30 points 2 years ago* (last edited 2 years ago) (29 children)

They have the right to ingest data, not because they're "just learning like a human would", but because I - a human - have a right to grab all data that's available on the public internet and process it however I want, including by training statistical models. The only thing I don't have a right to do is distribute it (or works that resemble it too closely).

If you actually show me people who are extracting books from LLMs and reading them that way, then I'd agree that would be piracy - but that'd be such a terrible experience, if it ever worked at all, that I can't see it actually happening.

A disturbing number of TikTok videos about autism include claims that are “patently false,” study finds in c/[email protected]

[–] [email protected] 9 points 2 years ago* (last edited 2 years ago)

BREAKING NEWS: PEOPLE SAY WRONG THINGS ON THE INTERNET!

Today's Large Language Models are Essentially BS Machines in c/[email protected]

[–] [email protected] 4 points 2 years ago (1 children)

That same critique should apply to the LLM as well.

No, it shouldn't. Instead, you should compare it to the alternatives you have on hand.

The fact is,

Using an LLM was a better experience for me than reading a textbook.
And it was also a better experience for me than watching recorded video lectures.

So, if I have to learn something, I have enough background to spot hallucinations, and I don't have a teacher (having graduated college, that's always true), I would consider using it, because it's better than the alternatives.

I just would never fully trust knowledge I gained from an LLM

There are plenty of cases where you shouldn't fully trust knowledge you gained from a human, too.

And there are, actually, cases where you can trust the knowledge gained from an LLM. Not because it sounds confident, but because you know how it behaves.

Today's Large Language Models are Essentially BS Machines in c/[email protected]

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago) (3 children)

Why is that a problem?

For example, I've used it to learn the basics of Galois theory, and it worked pretty well.

The information is stored in the model, so it can tell me the basics.
The interactive nature of talking to an LLM actually helped me learn better than just reading.
And I know enough general math so I can tell the rare occasions (and they indeed were rare) when it makes things up.
Asking it questions can be better than searching Google, because Google needs exact keywords to find the answer, and the LLM can be more flexible (of course, neither will answer if the answer isn't in the index/training data).

So what if it doesn't understand Galois theory - it could teach it to me well enough. Frankly if it did actually understand it, I'd be worried about slavery.

Today's Large Language Models are Essentially BS Machines in c/[email protected]

[–] [email protected] 11 points 2 years ago* (last edited 2 years ago) (5 children)

have no thoughts

True

know no information

False. There's plenty of information stored in the models, and plenty of papers that delve into how it's stored, or how to extract or modify it.

I guess you can nitpick over the word "know" and what it means, but as someone else pointed out, we don't actually know what that means in humans anyway. But LLMs do use the information stored in context; they don't simply regurgitate it verbatim. For example (from this article):

If you ask an LLM what's near the Eiffel Tower, it'll list locations in Paris. If you edit its stored information to think the Eiffel Tower is in Rome, it'll actually start suggesting sights in Rome instead.

"OSI still recommends the Cyber Resilience Act should exclude all activities prior to commercial deployment of software and clearly ensure that responsibility for CE marks does not rest with any actor in c/[email protected]

[–] [email protected] 3 points 2 years ago* (last edited 2 years ago)

At the early stages of the legislative process, every time this was brought up people kept saying, "it's fine, they've excluded non-commercial open source". Now it seems there are problems with what might count as "commercial".

But at this stage of the process, the EU legislators can't make arbitrary amendments. There are two versions of the text now - one proposed by the Parliament, and one by the Council - and the final text must be a compromise between those two. It can still be rejected by Parliament, but that would be rejecting it in whole (and they won't do that).

Fediverser: bring content and users from legacy social media networks into the fediverse in c/[email protected]

[–] [email protected] 6 points 2 years ago

Anyone who still has to regularly visit Reddit because of all the niche subreddits that have great communities there but 1 post per month here.

FCC says “too bad” to ISPs complaining that listing every fee is too hard in c/[email protected]

[–] [email protected] 5 points 2 years ago (2 children)

For those of us not American, can someone explain what fees they are talking about? Isn't it like one fee of $X/month?