Jaded

joined 2 years ago
[–] Jaded 2 points 2 years ago (2 children)

Most LoRAs use instruction-type datasets. I know of only one that used straight text, but that was the Unreal docs, not just a book.

From what I understand, if you want it to answer questions on the book, you need to feed paragraphs into ChatGPT and generate questions and answers.

If you want to generate text, you will want the input to be the previous paragraphs and the output to be the current paragraph. Depending on what you want, you can grab paragraphs, have ChatGPT write a summary, and put that summary in the input as the prompt instead of the previous paragraphs.
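The previous-paragraphs-in, current-paragraph-out idea can be sketched in a few lines. This is just an illustration; the context window size and the "input"/"output" field names are my assumptions, not from any particular training framework.

```python
# Sketch: build next-paragraph training pairs from a book's text.
# Each example pairs a few preceding paragraphs (input) with the
# paragraph that follows them (output).

def make_pairs(paragraphs, context=2):
    """Pair each paragraph with the `context` paragraphs before it."""
    pairs = []
    for i in range(context, len(paragraphs)):
        pairs.append({
            "input": "\n\n".join(paragraphs[i - context:i]),
            "output": paragraphs[i],
        })
    return pairs

book = ["Para one.", "Para two.", "Para three.", "Para four."]
pairs = make_pairs(book)
print(len(pairs))  # one pair per paragraph after the context window
```

Swapping the joined paragraphs for a ChatGPT-written summary would just mean replacing the "input" value here.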

I like the Alpaca format, so I would add an instruction above all that explaining what it's supposed to do.
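For reference, one record in the Alpaca style looks roughly like this. The field names (instruction / input / output) follow the Alpaca dataset convention; the actual text in each field is made up for illustration.

```python
import json

# One Alpaca-style training record: an instruction on top, then the
# context (previous paragraphs or a summary) as input, and the target
# paragraph as output.
record = {
    "instruction": "Continue the story in the style of the book.",
    "input": "Summary of the story so far: ...",
    "output": "The next paragraph of the book goes here.",
}

print(json.dumps(record, indent=2))
```

A full dataset is usually just a JSON list of records like this one.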

I would look into how the other fine-tunes are structuring their data and mirror that. I would even grab some of their data and add it to boost the diversity in yours. I find training on just one narrow subject makes it a bit dumb, but I don't have all that much experience with it.

[–] Jaded 1 points 2 years ago

Ignoring the fact that training an AI is insanely transformative and definitely fair use, people would not get any kind of pay. The data is owned by websites and corporations.

If AI training were to be highly restricted, Microsoft and Google would just pay each other for the data and pay the few websites they don't own (Stack Overflow, GitHub, Reddit, Shutterstock, etc.). A bit of money would go to publishing houses and record companies, not enough for the actual artists to get anything over a few dollars.

And they would happily do it, since they would be the only players in the game and could easily overcharge for a product that is eventually going to replace 30% of our workforce.

Your emotional, short-sighted response kills all open source and literally hands our economy to Google and Microsoft. They become the sole owners of AI tech. Don't be stupid, please. They want you to be mad; it literally only helps them.

[–] Jaded 6 points 2 years ago

I had my insurance company ask me for my phone number for security purposes. It was an old one I had since replaced and forgotten, so they read it out to me and asked me to confirm it.

[–] Jaded 2 points 2 years ago

Check out Fusion 360. There is a free version for personal use; you have to search for it on their website since they hide it.

[–] Jaded 11 points 2 years ago

The purchased service is internet access. I should be able to use it how I want, including supplying it to other devices through my phone. This is the equivalent of Netflix not letting us cast onto TVs.

Not sure what you are defending here, this is clearly unethical and gross corporate behavior.

[–] Jaded 1 points 2 years ago

There is no open source future if all we have is Blender and nothing else

[–] Jaded 0 points 2 years ago

It depends on what kind of AI, but no, citing sources and building on volunteer data alone just isn't possible at our current technological level. I'm mostly talking about large LLMs, because that's what's really at stake, and they train on huge amounts of data. Like ALL of Stack Overflow, GitHub, Reddit, etc. Just fine-tuning them at a consumer level takes more than 50,000 question-and-answer pairs, and that's just one tiny superficial layer added on top.

Grammarly should absolutely add an opt-out option to gain consumers' trust, but forcing the whole industry to do so is a disaster.

If individuals can opt out, so will websites, to "protect their users". Then we get data hoarding, where Stack Overflow and GitHub opt out of all open source options but sell to the only ones that can still afford to build AIs: Microsoft and Google. It won't include data from certain individuals, the few that opt out, but I'm guessing eventually the opt-in will go directly into the terms of service of websites: you opt in or you fuck off.

How does anyone except corporations benefit from this kind of circus? In 10 years, AI will be doing most office work. Google isn't dumb and wants that profit. They and OpenAI have all the data; they can strong-arm or buy what they are missing. Restricting and legislating only widens their moat.

[–] Jaded 2 points 2 years ago* (last edited 2 years ago) (2 children)

Most of the data is scraped, so it's not up to the website. You can't give a list of citations since it isn't a search engine; it doesn't know where the information comes from, and it's highly transformative, melding information from hundreds if not thousands of different sources.

If it worked only with volunteer work, there would simply be not enough data.

Any law restricting data use in AI is only going to benefit corporations; there isn't a solution for individual content creators. You can't pay them for the drop in the bucket they add, the logistics are insane. You can let them opt out, but then you need to do the same for whole websites, which leads to a corporate hellscape where three companies own our whole economy since they are the only ones who can train AIs.

[–] Jaded 1 points 2 years ago (2 children)

What happens when every corporation and website closes their doors to AI? There isn't any open source if we can't use scraped information from Stack Overflow, GitHub, Reddit, etc.

Sure, some users will opt out, but most won't. Every single website will restrict access though, and then they will sell the data to Google and Microsoft, who will be the only companies able to build AIs.

[–] Jaded -2 points 2 years ago* (last edited 2 years ago) (9 children)

Models need vast amounts of data. Paying individual users isn't feasible, and like you said, most of it can be scraped.

The only way I see this working is if scraped content is a no-go and you then pay the website, publishing house, record company, etc., which kills any open source solution and doesn't really help the users or creators that much. It also paves the way for certain companies owning a lot of our economy as we move towards an AI-driven society.

It's definitely a hot mess but the way I see it, the more restrictive we are with it, the more gross monopolies we create for no real gains.

[–] Jaded 12 points 2 years ago (3 children)

It's because certain companies are stirring the pot and manipulating. They want people mad so they can put restrictions on training AI, to stifle the open source scene.

[–] Jaded 6 points 2 years ago

Use Firefox with an adblocker, then disable the YouTube app so links open in Firefox by default
