daniskarma

joined 1 year ago
[–] daniskarma 3 points 8 hours ago (1 children)

Yeah, sorry, I forgot; I will release them now. Thanks for reaching me, the person responsible for releasing the files, through this unrelated Lemmy thread.

How did you know to find me here?

[–] daniskarma 7 points 8 hours ago

Capitalism is working so well that capitalists are the most consistently class-conscious group. They are aware of which class they belong to and they side with it.

[–] daniskarma 2 points 8 hours ago

I don't think it's easy to do, given how unreliable "AI detectors" are in general.

Also, why? Music is something very feeling-driven. If you like it you like it; if you don't, you don't. I don't think a quantitative measure of how a song was made is a reasonable way to decide which songs you like and which you don't.

I can just imagine:

  • Do you like this song?

  • I don't know yet. (Pulls phone out to measure the AI-ness of the song.) No, I don't like it.

[–] daniskarma 28 points 9 hours ago (1 children)

Money. The answer is always money. If it's cheaper to build on land, they will.

The answer would probably be a special tax that forces them to move to more environmentally friendly locations.

[–] daniskarma 0 points 9 hours ago* (last edited 9 hours ago) (4 children)

Why would they request the same data so many times a day if the objective were AI model training? It makes zero sense.

Also, Google bots obey robots.txt, so they are easy to manage.
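
For illustration, managing them can be as simple as a few lines of robots.txt. A minimal sketch: Googlebot and Google-Extended are real, documented user-agent tokens (the latter is Google's opt-out for AI training data), while the path is just a placeholder for whatever is expensive on your site.

```
# Opt out of Google's AI-training crawler entirely
User-agent: Google-Extended
Disallow: /

# Keep regular search indexing, but fence off costly pages
User-agent: Googlebot
Disallow: /expensive-endpoint/
```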

There may be tons of reasons Google is crawling your website, from ad research to any other kind of research. The only AI-related use I can think of is RAG. But that would take some user visits away, because if users got the info through Google's AI response they would not enter the website. I suppose that would suck for the website owner, but it won't drastically increase the number of requests.

But for training I don't see it; there's no need at all to keep constantly scraping the same website for model training.

[–] daniskarma 1 points 10 hours ago

Cloudflare has a clear advantage in the sense that it can move the front door away from the host and redistribute attacks across thousands of servers. It's also able to analyze attacks from its position of seeing half the internet, so it can develop and implement very efficient blocklists.

I'm the first one who is not a fan of Cloudflare, though, so I use CrowdSec, which builds community blocklists based on user statistics.

PoW as bot detection is not new. It has been around for ages, but it has never been popular because there have always been better ways to achieve the same or better results. A captcha may be more intrusive for the user, but it can actually deflect bots completely (even the best AI could be unable to solve a well-made captcha), while PoW only introduces an energy penalty that is expected to act as a deterrent.
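
To make the energy-penalty point concrete, this is roughly what a PoW challenge boils down to. It's a generic sketch, not Anubis's actual scheme; the seed and difficulty are made up.

```python
import hashlib
import itertools

def solve_challenge(seed: bytes, difficulty: int) -> int:
    """Brute-force a nonce so sha256(seed + nonce) starts with `difficulty`
    zero hex digits: cheap for the server to verify, costly for the client."""
    target = "0" * difficulty
    for nonce in itertools.count():
        if hashlib.sha256(seed + str(nonce).encode()).hexdigest().startswith(target):
            return nonce

def verify(seed: bytes, nonce: int, difficulty: int) -> bool:
    # The server does a single hash per submitted answer, so its side stays cheap.
    return hashlib.sha256(seed + str(nonce).encode()).hexdigest().startswith("0" * difficulty)

nonce = solve_challenge(b"challenge-from-server", 4)
print(verify(b"challenge-from-server", nonce, 4))  # True
```

Each extra zero digit multiplies the expected client work by 16, but a determined bot can always pay that cost; nothing here is "unsolvable" the way a hard captcha can be.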

My bet is that Invidious is under constant attack from Google, for obvious reasons. It's a hard situation to be in overall. It's true that they are a very particular use case: a lot of both users and bots interested in their content, very resource-heavy content, and also being the target of one of the biggest corporations in the world. I suppose Anubis could act as mitigation there, at the cost of being less user friendly. And if YouTube went and did the same, it would really make for a shitty experience.

[–] daniskarma 0 points 10 hours ago* (last edited 10 hours ago)

Most of those companies are what's called "GPT wrappers". They don't train anything. They just wrap an existing model or service in their software. AI is a trendy word that gets quick funding; many companies will say they are AI-related even if they are just making an API call to ChatGPT.
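
To illustrate what "wrapper" means here, a sketch of the sort of thing I mean: the endpoint and payload follow OpenAI's public chat completions API, but the function and model name are just an example.

```python
import os
import requests

def summarize(text: str) -> str:
    """The entire 'AI feature': one HTTP call to someone else's hosted model."""
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",  # example model name
            "messages": [
                {"role": "system", "content": "Summarize the user's text."},
                {"role": "user", "content": text},
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

No training, no dataset, no scraping: just an API key, a prompt, and someone else's model.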

For the few that will attempt to train something, there is already a wide variety of datasets for AI training, or they may try to get data on a very specific topic. But to be scraping the bottom of the pan so hard that you need to scrape some little website, you have to be talking about a model with a massive number of parameters, something that only about five companies in the world would actually need to improve their models. The rest of the people trying to train a model are not going to try to scrape the whole internet, because they have no way to process and train on that.

Also, if some company is willing to waste a ton of energy training on some data, doing some PoW to obtain that data would be an inconvenience, but I don't think it will stop them. They are literally building nuclear plants for training; a little crypto challenge is nothing in comparison. But it can be quite intrusive for legitimate users. For starters, it forbids navigation with JS deactivated.

[–] daniskarma -2 points 11 hours ago* (last edited 11 hours ago) (1 children)

It's very intrusive in the sense that it runs a PoW challenge, unsolicited, on the client. That's literally like having a cryptominer running on your computer for each challenge.

Everyone can do what they want with their server, of course. But, for instance, I'm very fond of scraping. I have FreshRSS running on my server, and the way it works is that when the target website doesn't provide an RSS feed, it scrapes the site to get the articles. I also have another service that scrapes pages to detect changes.
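
As a rough sketch of what that kind of feed-from-scraping looks like (not FreshRSS's actual code; the selectors are placeholders for whatever the target site uses):

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def articles_from_page(url: str) -> list[dict]:
    """Turn a plain HTML page into feed-like items when no RSS is offered."""
    html = requests.get(url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select("article"):      # hypothetical item selector
        title = node.select_one("h2")
        link = node.select_one("a")
        if title and link:
            items.append({"title": title.get_text(strip=True),
                          "link": link.get("href")})
    return items
```

A PoW wall in front of the page makes every one of those fetches burn CPU for no reason, which is exactly my complaint.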

I think part of the beauty of the internet is being able to automate processes; software like Anubis puts a globally significant energy tax on these automations.

Once again, everyone is able to do with their server whatever they want. But the thing I like the least is that they are marketing their software, with some great PR, as part of some great anti-AI crusade; I don't know if it's the devs themselves or some other party. And I don't like this mostly because I think it's disinformation, and manipulative towards people who are maybe easy to manipulate if you say the right words. I also think it's a discourse that pushes toward radicalization around a certain topic, and I'm a firm believer that right now we need to reduce radicalization overall, not increase it.

[–] daniskarma 0 points 11 hours ago (2 children)

Not really. I only ask because people always say it's for LLM training, which seems a little illogical to me, knowing the small number of companies that have access to the computing power to actually train on that data. And big companies are not going to scrape the same resource hundreds of times for a piece of information they already have.

But I think people should be more critical and try to understand who is making the requests and for what purpose. Then people could make a better-informed decision about whether they need that system (which is very intrusive for the clients) or not.

[–] daniskarma -3 points 11 hours ago (2 children)

I don't think it's millions. Take into account that a DDoS attacker is not going to execute JavaScript code, at least not any competent one, so they are not going to run the PoW.

In fact, the unsolicited and unannounced PoW does not provide more protection against DDoS than a captcha.

The mitigation comes from the server's response to each request being smaller and cheaper, so the number of requests needed to saturate the service must increase. By how much? It depends on how demanding the "real" website is in comparison. I doubt the answer is millions. And they would achieve the exact same result with a captcha, without running literal malware on the clients.
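
Back-of-the-envelope, with made-up numbers just to show the shape of the argument:

```python
# Hypothetical per-request server cost, in milliseconds of CPU time.
full_page_cost = 50.0   # rendering the real, dynamic page
challenge_cost = 1.0    # serving a small static challenge page instead

# Requests now hit the cheap challenge, so to cause the same load the
# attacker needs roughly this many times more of them:
print(full_page_cost / challenge_cost)  # 50.0: tens of times more, not millions
```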

[–] daniskarma -3 points 11 hours ago* (last edited 11 hours ago) (5 children)

That's precisely my point. It fits a very narrow risk profile: people who are going to be DDoSed, but not by a big actor.

It's not the most common risk profile. Usually DDoS attacks are either very heavy or don't happen at all. These "half throttle" DDoS attacks are not really common.

I think that's why, when I read about Anubis, it's never in the context of DDoS protection. It's always in the context of "let's fuck AI", like this precise comment thread.

 

This is not about any specific case. It's just a theoretical scenario that popped into my mind.

For context, in many places it is required to label AI-generated content as such; in other places it is not required, but it is considered good etiquette.

But imagine the following: an artist is going to make an image. The normal first step is to search for references online and then do the drawing using those as reference. But this artist cannot find proper references online, or maybe wants to experiment, and so decides to use a diffusion model to generate a bunch of AI images for reference. Then the artist proceeds to draw the image using the AI images as references.

The picture is 100% handmade, each line was manually drawn. But AI was used in the process of making this image. Should it have some kind of "AI warning label"?

What do you think?

 

Reminder: This post is from the Community Actual Discussion. You’re encouraged to use voting for elevating constructive, or lowering unproductive, posts and comments here. When disagreeing, replies detailing your views are appreciated. For other rules, please see this pinned thread. Thanks!

I think one of the issues with online arguing, in most takes on it, is that the main reason people argue is to spread an idea, whether by convincing the opposing party and making them change their mind, or by changing or reinforcing the minds of anonymous readers of the argument.

Most of the time this leads to one of two outcomes. If someone tries to change the other person's mind, they will most likely find themselves hitting a wall, which leads to frustration, disinterest, or personal attacks once it becomes clear that the other person will not change their mind. If they don't care about changing the other person's mind and just want to make clear that their own position is the right one to have, then the argument becomes a game of winning and losing. That can play out in many ways depending on the context: it can lead to insults and group pressure (via downvotes, for instance) to make the other person's opinion look like the "bad" one, or to creating a game of rules and playing that game better in order to come out the winner.

Please excuse the small attack I'm about to make on this very space, but part of this second approach is the rules of debate: treating arguments without sources, emotional responses, or fallacies as losing points in the game of arguing. And often, when the other party falls into one of these issues, the goal quickly becomes to point out all the "faults" they made, so they are clearly shown as the loser. Don't get me wrong, it is important to argue without fallacies and to be able to back up the statements one makes. But I don't think anyone gains anything when the argument becomes a match over who can ask for more sources, link more articles, and identify more fallacies.

That being said, I'm going to link some literature that supports the basis of these statements: Can Arguments Change Minds?. This article goes to great lengths to show something that's easily seen when arguing online: people don't change their minds because of an argument. The process of changing someone's mind is very complex. The article discusses some case studies where people from extremist backgrounds changed their minds over time, in a context of discussion, but it states that this change had a lot more going on than just a proper intellectual discussion.

Why bother then? In my opinion, the best thing we can get from arguing with someone whose opinion differs from our own is to understand them: to find out their way of seeing things, their motives, their reasoning. That's of great value. To get it, we often need to let them talk the way they want to talk, which tends to lead to some undesirable things, like the fallacies mentioned above, unsupported claims, or straight-up bigotry and name-calling. But I think it is still valuable to know whether that's their only reasoning, or to try to push past it and see if there's something deeper behind why they don't agree with us.

Ultimately, focusing the discussion on getting a win will often make us miss a lot of valuable information that we could have gotten if we just saw the argument as a way to understand the other person and, of course, ourselves. And it's not only about us understanding them, but about them understanding us: explaining our point of view as clearly as possible and focusing, when we talk about our opinions, not on winning but on showing why we hold them, so we can reach a point of "I don't agree with you, but I understand you".

Of course, the big elephant in the room is that taking this approach to its logical conclusion would mean letting some people express ideas that we don't want expressed. The obvious example is hate speech. Should hate speech, or extremist arguments, be allowed and discussed? If allowed, what is our goal when engaging in an argument with them: to convince, or to understand and make the other party also understand us? This is where I'm most torn, as the logic of this reasoning leads me to believe the latter is best, but it conflicts with everything I've learned so far about how to deal with hate speech and dangerous ideologies. Hence the (OPEN-ENDED) tag, and why I hope people will jump in and give their opinion on this.

 

This is not a question about whether you think it is possible or not.

This is a question about your own will and desires. If there were a vote and you had a ballot in your hand, what would you vote? Do you want Artificial Intelligence to exist, do you not, or do you maybe not care?

Here I define Artificial Intelligence as something created by humans that is capable of rational thinking, that is creative, that is self-aware and has consciousness, all with the processing power of computers behind it.

As for the important question that would arise, "Who is creating this AI?", I'm not that focused on the first AI created, as it's assumed that with time multiple AIs will be created by multiple entities. The question is whether you want this process to start or not.

 

I cannot stand Google News any more: too much spam, clickbait, and advertisement. So I decided to try to self-host an RSS aggregator to make myself a news feed that I would be comfortable with. RSS being such an "ancient" thing, I thought there would be many mature systems, but I'm not sure that's the case.

As far as my investigation goes, there are two main options out there: **TT-RSS (Tiny Tiny RSS)** and **FreshRSS**. There also seems to be Miniflux, but it supposedly has very few features.

So I tried both of the main ones and ended up kind of disappointed; I hope I'm missing something. My requirements are:

1- Have a nice interface: card view, phone friendly, basically able to look the way Google News looked. Both have a pretty dated interface and terrible responsive UI for phones. I was kind of able to make a "card view" with TT-RSS, but it looked hideous and didn't really work on a phone screen. Applying themes also broke TT-RSS; this will be a recurring theme, as TT-RSS seems to be constantly breaking: its rolling-release model makes it very unstable, and many plugins, themes, and third-party apps don't work right now because some new update broke everything. So native theming wasn't going to be a thing, and I tried third-party apps instead. I found many that worked with FreshRSS and settled on Feedme; it looked exactly as I wanted, great. One point for FreshRSS. Feedme was supposedly compatible with TT-RSS, but I could not log in; I suspect some update broke the integration. I'm not even going to attempt to ask in their forums, since I saw that some time ago somebody asked the same question and got banned from their forums.

2- Being able to filter or prioritize feeds. The problem is that I would love to subscribe to very diverse feeds; some would post maybe over 100 posts per day and others maybe one post every week or even month. If I left everything at the defaults, the former would flood the feed and I would never see the posts from the small feeds. Both offer categories that I could use, but ideally I would love a curated main page. FreshRSS supposedly has a priority system, but it seems quite simple and not effective for my needs; AFAIK you can put some feeds in "important feeds", but then it only shows those feeds in that category. TT-RSS does have an advanced filter system that is complex enough, and with some fiddling I think I could build a set of rules that satisfies my needs (something like the sketch just after this list). One point for TT-RSS.

3- Being able to subscribe to any feed, or even scrape websites that don't provide feeds. Here FreshRSS wins: I had zero issues subscribing to everything I wanted. With TT-RSS I couldn't even subscribe to some pages that did provide a feed, albeit in an unconventional way. The TT-RSS devs say that's the webpage's problem (even though FreshRSS had no problem with it). Another point for FreshRSS.
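
The kind of per-feed cap I mean in point 2 would be something like this. It's a sketch built on the feedparser library, with made-up feed URLs and limits; neither app exposes exactly this, which is the gap I'm describing.

```python
import time
import feedparser  # pip install feedparser

# Made-up feeds with per-refresh caps: noisy feeds get trimmed,
# small feeds always surface.
FEEDS = {
    "https://example.com/bigsite.rss": 5,
    "https://example.org/tinyblog.rss": 50,
}

def curated_front_page():
    entries = []
    for url, cap in FEEDS.items():
        parsed = feedparser.parse(url)
        entries.extend(parsed.entries[:cap])  # keep only the first `cap` posts (usually newest first)
    # Newest first across all feeds; entries without a date sink to the bottom.
    entries.sort(
        key=lambda e: time.mktime(e.published_parsed) if e.get("published_parsed") else 0.0,
        reverse=True,
    )
    return [(e.get("title"), e.get("link")) for e in entries]
```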

And that is it; I'm not demanding that much. But I wasn't able to find a system that ticks those three checkboxes. FreshRSS was so close, but unless I'm missing something you can't really create a curated feed that prioritizes and sorts feeds and posts the way you can with TT-RSS sorting; if there is a way, please let me know. Without that, the whole thing becomes useless because of the flooding feeds. And while I'm in love with TT-RSS's filter and sorting system, the whole app seems too unstable and buggy to be usable, at least for my desired use case (and I've seen many people complaining about TT-RSS updates breaking things all the time).

My two main questions are:

-Am I missing some other self-hosted app that could do all I wanted?

-Am I missing some FreshRSS feature or extension that could curate a main feed with my own rules?

Any thoughts?
