AI crawlers cause Wikimedia (the umbrella organization of Wikipedia and a dozen or so other crowdsourced knowledge projects) Commons bandwidth demands to surge 50%.

Tea@programming.dev · edit-2 2 days ago

andybytes@programming.dev · 2 days ago

Even better, stop joining major platforms like social media, and then they won’t be able to create data sets. Be a leech, especially when they give it away for free, but don’t contribute to the project. Understand how it works, sure. But it seems like most of humanity says they don’t want something, yet they do the contrary. It’s like we choose to comply before we’re even asked to comply, for fear of missing out. But if you look at what there is today, what are you really missing out on?

pulsewidth@lemmy.world · 2 days ago

So, uh. What about Lemmy?

They can also crawl this publicly accessible social media source for their data sets.

I’m on board with abandoning mainstream social media, but my point is that your suggestion would not solve the problem, just relocate it. A better solution to the AI conglomerates stealing everyone’s data from the open Internet is legislation and regulation, i.e. tackling the whole ‘stealing data’ component, along with stronger privacy regulations for everyone to make it harder for them to do the same in the future. It’s nice seeing the EU taking some positive steps, but we will not see the US take any steps in that direction anytime soon, due to corporate capture of their politicians and the AI companies all being in the top 10 most wealthy companies in the US.

Spaniard@lemmy.world · edit-2 2 days ago

It’s nice seeing the EU taking some positive steps

Yet they helped introduce the super cookies and are trying to end encryption on communications.

Saik0@lemmy.saik0.com · 1 day ago

They can also crawl this publicly accessible social media source for their data sets.

Crawling would be silly. They can simply set up a Lemmy node and subscribe to every other server. An ActivityPub crawler would be much more efficient, as it wouldn’t accidentally crawl things that haven’t changed, but instead could read the ActivityPub updates.
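
To make the "read the updates instead of re-crawling" point concrete, here is a rough sketch of a local mirror that only touches objects named in incoming activities. It assumes activities shaped like ActivityStreams `Create`/`Update`/`Delete`; the `apply_activity` helper, the `example.invalid` IDs, and the in-memory `mirror` are made up for illustration, and a real federated node would also need to handle follows, HTTP signatures, and many more activity types.

```python
from typing import Any

def apply_activity(store: dict[str, dict[str, Any]], activity: dict[str, Any]) -> None:
    """Apply one ActivityStreams-style activity to a local mirror (hypothetical helper)."""
    kind = activity.get("type")
    obj = activity.get("object")
    if kind in ("Create", "Update") and isinstance(obj, dict):
        # Only the object named in the activity is written; nothing else is re-fetched.
        store[obj["id"]] = obj
    elif kind == "Delete":
        # A Delete may carry the object inline or only its id.
        obj_id = obj["id"] if isinstance(obj, dict) else obj
        store.pop(obj_id, None)
    # Other activity types (Like, Announce, ...) are ignored in this sketch.

# Illustrative inbox contents; the IDs and text are invented for the example.
mirror: dict[str, dict[str, Any]] = {}
inbox = [
    {"type": "Create", "object": {"id": "https://example.invalid/post/1", "type": "Note", "content": "first"}},
    {"type": "Update", "object": {"id": "https://example.invalid/post/1", "type": "Note", "content": "edited"}},
    {"type": "Create", "object": {"id": "https://example.invalid/post/2", "type": "Note", "content": "second"}},
    {"type": "Delete", "object": "https://example.invalid/post/2"},
]
for activity in inbox:
    apply_activity(mirror, activity)
```

The work done scales with the number of changes pushed to the node, not with the number of pages that exist, which is the efficiency argument above.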

Strawberry@lemmy.blahaj.zone · 1 day ago

Sure, but we’re in the comments section of an article about Wikipedia being crawled, which is silly because they could just download a snapshot of Wikipedia.

How crawlers impact the operations of the Wikimedia projects – Diff