That's the logic I don't get, I guess I will never be rich, unless I win the lottery??..
Scraping is a huge business nowadays.
NB: many LLMs are based on something called the Pile, which is weird and shady to say the least. I don't think using LLMs for business is good for reputation. But clearly, we are not really allowed to think otherwise (the Physics Nobel Prize for AI was the end of the argument for me), and I want to work, it is MY fault, I should've known better.
Ok, time to deploy Anubis in front of Gitea, I'm done with those FAANG oligarchs scraping my repos 24/7 to check if anything changed...
F*ck off.
But that also means Gitea might get unstable for some time, woops
If you are curious : https://git.halis.io
If you see the cute furry, it worked
Watt is being Dunn about AI scraping images and descriptions?
Make RED sure you fill your gravy description meat with AI hostile get em on the beaches words.
Images uploaded to mastodon should have AI poison added to them.
Really interesting project, Anubis, to protect against #LLM scraping bots: https://anubis.techaro.lol/ #Scraping #bots
Paid #scraping: toward a radical change in the business model of generative #IA #AI?
#Cloudflare stonewalls AI crawlers unless they pay for #Scraping | heise online https://www.heise.de/news/Cloudflare-laesst-KI-Crawler-auflaufen-wenn-nicht-fuer-Scraping-bezahlt-wird-10467015.html #PayPerCrawl #ArtificialIntelligence #copyright #Urheberrecht
@akamran @davidtoddmccarty If you search Google for #Mastodon hashtag scraping, you find software and programs that help AI do that. It exists.
Fact is that from today, the main instances mastodon.social and mastodon.online prohibit #scraping officially: https://techcrunch.com/2025/06/17/mastodon-updates-its-terms-to-prohibit-ai-model-training/
The problem with decentralisation: admins/users of other instances must become aware of the problem and change their terms, too.
It may be funny but it's no joke.
A note on the usability of #Data #Analytics / #Data #Science methods (#Scraping, #Pattern #Recognition, #Machine #Learning or #Text #Mining) for sociological research.
#Sutter / #Maasen - #Neuerfindung #Soziologie, p. 76 f., 2020, DOI: 10.5771/9783845295008-73
5 Best JavaScript Web Scraping Libraries in 2025, by @apify.bsky.social:
https://blog.apify.com/best-javascript-web-scraping-libraries/
@anirvan @404mediaco the only way to deal with this is the same as with any other #malware and #DDoS:
I do maintain a #blocklist of those and will happily accept suggestions and pull requests...
https://github.com/greyhat-academy/lists.d/blob/main/scrapers.ipv4.block.list.tsv
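A minimal sketch of how a blocklist like this might be applied. The file layout is an assumption, not something the post confirms: the sketch supposes that each line carries an IPv4 address or CIDR range in its first tab-separated column and that lines starting with `#` are comments. The sample entries use documentation-reserved ranges, and `parse_blocklist` and `is_blocked` are hypothetical helper names.

```python
import ipaddress

# Hypothetical sample data; the real file's columns may differ.
SAMPLE_TSV = """\
# network\tcomment
192.0.2.0/24\texample scraper range
198.51.100.7\tsingle example host
"""


def parse_blocklist(text):
    """Return the networks found in the first column of each data line."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        first_column = line.split("\t")[0].strip()
        try:
            # strict=False accepts entries whose host bits are set
            networks.append(ipaddress.ip_network(first_column, strict=False))
        except ValueError:
            continue  # ignore rows that are not addresses or ranges
    return networks


def is_blocked(address, networks):
    """True if the address falls inside any listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


BLOCKED = parse_blocklist(SAMPLE_TSV)
print(is_blocked("192.0.2.55", BLOCKED))
print(is_blocked("203.0.113.9", BLOCKED))
```

A linear scan is fine for a few thousand entries; a larger list would more likely be loaded into a firewall set than checked in application code.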
#AI #scraping #GLAM #CulturalHeritage
'AI bots that scrape the internet for training data are hammering the servers of libraries, archives, museums, and galleries, and are in some cases knocking their collections offline, according to a new survey published today.'
https://www.404media.co/ai-scraping-bots-are-breaking-open-libraries-archives-and-museums/
A Thought on JavaScript “Proof of Work” Anti-Scraper Systems, by @cks:
https://utcc.utoronto.ca/~cks/space/blog/web/JavaScriptScraperObstacles
PyDoll – Async Python scraping engine with native CAPTCHA bypass
A thought on JavaScript "proof of work" anti-scraper systems
https://utcc.utoronto.ca/~cks/space/blog/web/JavaScriptScraperObstacles
#HackerNews #JavaScript #proof #of #work #anti-scraper #systems #web #scraping #technology #security
Analysts at #Cybernews believe the #Facebook data is new. They reportedly examined a sample of 100,000 records to reach that conclusion.
Stolen:
User IDs
Names
Email addresses
Usernames
Phone numbers
Locations
Birthdays
Genders
#Data #Scraping
www.security-insider.de/cyberkrimina...
Facebook Nutzerdaten im Darkne...
I set up archiving of all the links, notes, and pages from my Readwise Reader profile. Now not a single priceless post will be lost!
3/
For more on scraping (as in web-scraping) see here:
https://mastodon.social/@reiver/114353728684249608
CC: @404mediaco
2/
Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.
If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.
...
And, getting data in JSON (key-value pairs) is definitely NOT scraping — as JSON's purpose is to communicate data in a machine-legible manner.
CC: @404mediaco
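The distinction drawn in the thread can be sketched in code. This is only an illustration under made-up assumptions: the HTML page, the JSON document, and the guess that the price sits in the first `<b>` element are all hypothetical. The first function has to infer where the data lives in markup meant for human readers (scraping in the thread's sense); the second simply reads a field from a format designed for machines.

```python
import json
from html.parser import HTMLParser

# Hypothetical inputs carrying the same information in two forms.
HTML_PAGE = "<html><body><h1>Bike</h1><p>Price: <b>120</b> EUR</p></body></html>"
JSON_DOC = '{"name": "Bike", "price": 120, "currency": "EUR"}'


class PriceScraper(HTMLParser):
    """Takes the text of the first <b> element, guessing it is the price."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        # Only the first bold text is kept; a layout change would break this.
        if self.in_bold and self.price is None:
            self.price = int(data.strip())


def scrape_price(html):
    """Scraping: recover the value from presentation markup."""
    scraper = PriceScraper()
    scraper.feed(html)
    return scraper.price


def read_price(json_text):
    """Not scraping: the value is addressed by its key."""
    return json.loads(json_text)["price"]


print(scrape_price(HTML_PAGE))
print(read_price(JSON_DOC))
```

The scraper depends on the page's layout staying the same, while the JSON reader depends only on the key name, which is the practical difference the thread points at.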