daniskarma

joined 1 year ago
[–] daniskarma 1 points 1 week ago (1 children)

Probably. Here in Spain public workers have a 35-hour work week, and a general 37.5-hour week is being introduced. For this we usually take half an hour or an entire hour off each day.

[–] daniskarma 1 points 1 week ago (3 children)

Belgium is 38 hours for instance.

[–] daniskarma 38 points 1 week ago* (last edited 1 week ago) (15 children)

United States of America is not a planet.

There are countries with both more and fewer work hours.

[–] daniskarma 4 points 1 week ago (1 children)

Yeah, sorry, I forgot. I will release them now. Thanks for reaching out to me, the person responsible for releasing the files, through this unrelated Lemmy thread.

How did you know to find me here?

[–] daniskarma 19 points 1 week ago (1 children)

Capitalism is working so well that capitalists are the most consistently class-conscious group. They are aware of which class they belong to and side with it.

[–] daniskarma 1 points 1 week ago (2 children)

I don't think it's easy to do, given how unreliable "AI detectors" are in general.

Also, why? Music is very much driven by feeling. If you like it, you like it; if you don't, you don't. I don't think a quantitative measure of how a song was made is a reasonable way to decide which songs you like and which you don't.

I can just imagine:

  • Do you like this song?

  • I don't know yet. (Pulls phone out to measure AIness of the song) No I don't like it.

[–] daniskarma 47 points 1 week ago (1 children)

Money. The answer is always money. If it's cheaper to build on land they will.

The answer would probably be a special tax that forces them to move to more environmentally friendly locations.

[–] daniskarma -2 points 1 week ago* (last edited 1 week ago) (4 children)

Why would they request the same data so many times a day if the objective was AI model training? It makes zero sense.

Also, Google's bots obey robots.txt, so they are easy to manage.
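To illustrate what obeying robots.txt means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the example.com URLs are made up for illustration. It shows how a well-behaved crawler could check a rule before fetching, and says nothing about how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url):
    """Return True if the rules above let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot may fetch everything except /private/; every other bot is shut out.
print(is_allowed("Googlebot", "https://example.com/blog/post"))
print(is_allowed("Googlebot", "https://example.com/private/data"))
print(is_allowed("SomeOtherBot", "https://example.com/blog/post"))
```

Of course, this only helps against crawlers that bother to ask. A bot that ignores robots.txt never runs a check like this.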

There may be tons of reasons Google is crawling your website, from ad research to any other kind of research. The only AI-related use I can think of is RAG. But that would take some user requests away, because if the user got the info through Google's AI response they would not visit the website. I suppose that would suck for the website owner, but it won't drastically increase the number of requests.

But for training I don't see it. There's no need at all to keep constantly scraping the same website for model training.

[–] daniskarma -1 points 1 week ago

Cloudflare has a clear advantage in the sense that it can put the door away from the host and can spread attacks across thousands of servers. It is also able to analyze attacks from its position of seeing half the internet, so it can develop and implement very efficient blocklists.

I'm the first one who is not a fan of Cloudflare, though. So I use CrowdSec, which builds community blocklists based on user statistics.

PoW as bot detection is not new. It has been around for ages, but it has never been popular because there have always been better ways to achieve the same or even better results. Captcha may be more intrusive for the user, but it can actually deflect bots completely (even the best AI could be unable to solve a well-made captcha), while PoW only introduces an energy penalty that is expected to act as a deterrent.
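For anyone who has not seen one, this is roughly what a PoW challenge looks like. It is a minimal hashcash-style sketch, assuming SHA-256 and a "leading zero bits" difficulty. The function names and the challenge string are made up, and it is not meant to reproduce the exact scheme Anubis or any other tool uses.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty):
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute-force nonces until one passes verification."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge string; 12 bits means about 4096 hashes on average.
nonce = solve("example-challenge", difficulty=12)
print(nonce, verify("example-challenge", nonce, 12))
```

Each extra bit of difficulty doubles the expected work for the client, while the server still verifies with one hash. Nothing here tells a bot from a human, though. Anyone willing to burn the CPU time gets through, which is why it is a deterrent and not a block.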

My bet is that Invidious is under constant attack from Google, for obvious reasons. It's a hard situation to be in overall. It's true that they are a very particular use case, with both a lot of users and bots interested in their content, very resource-heavy content, and also being the target of one of the biggest corporations in the world. I suppose Anubis could act as mitigation there, at the cost of being less user friendly. And if YouTube goes and does the same, it would really make for a shitty experience.

[–] daniskarma 0 points 1 week ago* (last edited 1 week ago)

Most of those companies are what's called "GPT wrappers". They don't train anything. They just wrap an existing model or service into their software. AI is a trendy word that gets quick funding, so many companies will say they are AI related even if they are just making an API call to ChatGPT.

For the few that will attempt to train something, there is already a wide variety of datasets for AI training. Or they may try to get data on a very specific topic. But in order to be scraping the bottom of the pan so hard that you need to scrape some little website, you need to be talking about a model with a massive number of parameters. That is something only about 5 companies in the world would actually need to improve their models. The rest of the people trying to train a model are not going to try to scrape the whole internet, because they have no way to process it and train on it.

Also, if some company is willing to spend a ton of energy training on some data, then having to do some PoW to obtain that data, while it would be an inconvenience, is not going to stop them, I think. They are literally building nuclear plants for training; a little crypto challenge is nothing in comparison. But it can be quite intrusive for legitimate users. For starters, it prevents browsing with JS disabled.

[–] daniskarma -3 points 1 week ago* (last edited 1 week ago) (1 children)

It's very intrusive in the sense that it runs a PoW challenge, unsolicited on the client. That's literally like having a cryptominer running on your computer for each challenge.

Everyone can do what they want with their server, of course. But I'm very fond of scraping. For instance, I have FreshRSS running on my server, and the way it works is that when the target website doesn't provide an RSS feed, it scrapes the site to get the articles. I also have another service that scrapes to detect page changes.
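To give an idea of the kind of harmless scraping I mean, here is a minimal sketch that pulls article titles and links out of a page that has no feed, using only Python's standard `html.parser`. The page markup and the `ArticleLinkParser` class are invented for the example. It is not how FreshRSS does it internally, just the general idea.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for this example.
PAGE = """
<html><body>
  <article><h2><a href="/posts/first">First post</a></h2></article>
  <article><h2><a href="/posts/second">Second post</a></h2></article>
  <footer><a href="/about">About</a></footer>
</body></html>
"""


class ArticleLinkParser(HTMLParser):
    """Collect (title, href) pairs for links found inside <article> tags."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._article_depth = 0
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._article_depth += 1
        elif tag == "a" and self._article_depth > 0:
            # Start capturing the link text (skipped if there is no href).
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._article_depth > 0:
            self._article_depth -= 1
        elif tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


parser = ArticleLinkParser()
parser.feed(PAGE)
print(parser.items)
# prints [('First post', '/posts/first'), ('Second post', '/posts/second')]
```

A script like this has no JS engine, so a PoW wall stops it even though it only fetches a page every now and then.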

I think part of the beauty of the internet is being able to automate processes; software like Anubis puts a globally significant energy tax on these automations.

Once again, everyone is able to do whatever they want with their server. But the thing I like the least is that the software is being promoted, with some great PR, as part of some great anti-AI crusade. I don't know if that comes from the devs themselves or from some other party. And I don't like this mostly because I think it is disinformation, and manipulative towards people who are maybe easy to manipulate if you say the right words. I also think it's a discourse that pushes towards radicalization on certain topics, and I'm a firm believer that right now we need to reduce radicalization overall, not increase it.
