mastodontech.de is one of many independent Mastodon servers you can use to take part in the fediverse.
Open to everyone (over 16) and provided by Markus'Blog


#scraper


»Cloudflare Introduces Default Blocking of A.I. Data Scrapers«

Nice, but it will hardly work. The reason: advanced scrapers use browser emulation and rotating IPs to pass themselves off as real users and evade technical detection. Since this is only a server-side measure with no legal force, such actors can ignore the blocks easily and without consequences.
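To make the point about disguised scrapers concrete, here is a minimal sketch of a naive server-side filter. It is not how Cloudflare detects bots; the blocklist entries, the `Request` type and the `is_blocked` helper are all hypothetical and only illustrate why matching on User-Agent and IP alone is easy to sidestep.

```python
from dataclasses import dataclass

# Hypothetical blocklist; real bot-detection products use far richer signals.
BLOCKED_AGENT_MARKERS = ("GPTBot", "CCBot", "python-requests")
BLOCKED_IPS = {"203.0.113.7"}  # documentation-range address, for illustration


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str


def is_blocked(request: Request) -> bool:
    """Return True if the request matches the naive blocklist."""
    if request.ip in BLOCKED_IPS:
        return True
    agent = request.user_agent.lower()
    return any(marker.lower() in agent for marker in BLOCKED_AGENT_MARKERS)


# A crawler that announces itself honestly is caught.
honest_bot = Request(ip="203.0.113.7", user_agent="GPTBot/1.0")

# A scraper that borrows a browser User-Agent and a fresh IP slips through.
disguised_bot = Request(
    ip="198.51.100.23",
    user_agent="Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
)

print(is_blocked(honest_bot))     # True
print(is_blocked(disguised_bot))  # False
```

The filter only stops clients that identify themselves truthfully, which is exactly the weakness the post describes.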

nytimes.com/2025/07/01/technol

#cloudflare #ai #ki #scraper

/kuk

Matthew Prince, the chief executive of Cloudflare, said he was “deeply concerned that the incentives for content creation are dead.”
The New York Times · Cloudflare Introduces Blocking of A.I. Scrapers By Default · By Natallie Rocha

Mastodon prohibits training AI on content from its own platform.

New #TermsOfService against #AITraining 🚫 Starting July 1, #Mastodon explicitly prohibits #scraping and the use of user data for #training #AIModels on its #MainServer.

Clear #protection for the #community 🤝 The new rules prohibit automated tools such as #bots and #scrapers from harvesting data, with the exception of ordinary search engines and browsers. (1/2)

The most disgusting feature of this relatively new plague of #AI #scraper bots is that they are about to defile everything we like in the *good* internet.

Images with relevant #AltText? Perfect training materials for text-to-image generative models.

Static webpages? No #Anubis - no problem to scrape.

#Anubis uses proof-of-work ( #PoW ), which requires either #JavaScript or manual instructions. Don't get me wrong, it is a good solution... the best of the worst (as if there were any good ones...)

In the last few days I learned that (1) #Tor has a #PoW mechanism and (2) Anubis seems to somehow whitelist the #lynx browser, letting no-JS Lynx users in (a big favour for #accessibility and #smolweb ). Good (let's hope all of this persists).

MWoffliner, the @mediawiki #scraper, has been released in version 1.15!

1.15 brings a significant number of improvements:
* Support for the "ActionParse" API, which is widely used outside Wikimedia
* Use of the latest libzim (we were stuck with an older version), which fixes many suggestion problems with non-Latin alphabets
* Move to Node.js 24 + many install fixes
* Better, more sophisticated remote error handling
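As a rough illustration of the first item, and assuming "ActionParse" refers to the `action=parse` module of the MediaWiki Action API, a request URL for rendered page HTML could be built as below. The wiki endpoint and the `action_parse_url` helper are hypothetical, and this is not how MWoffliner itself issues the call.

```python
from urllib.parse import urlencode


def action_parse_url(api_endpoint: str, title: str) -> str:
    """Build a MediaWiki Action API URL asking for the parsed HTML of a page."""
    query = urlencode(
        {
            "action": "parse",  # the parse module of the Action API
            "page": title,      # page title to render
            "prop": "text",     # return the rendered HTML
            "format": "json",   # machine-readable response envelope
        }
    )
    return f"{api_endpoint}?{query}"


# Hypothetical wiki endpoint, for illustration only.
url = action_parse_url("https://wiki.example.org/w/api.php", "Main Page")
print(url)
# https://wiki.example.org/w/api.php?action=parse&page=Main+Page&prop=text&format=json
```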

Full changelog at github.com/openzim/mwoffliner/

Available as a container image and as an npm package!

GitHub · Release 1.15.0 · openzim/mwoffliner
NEW: Check early for availability APIs + add check for module API (@benoit74 #2246 / #2248) NEW: Add support for ActionParse API as renderer (@benoit74 #2127) CHANGED: Upgrade to node-libzim 3.3.0 ...
Continued thread

2/

Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.

If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.

...

And, getting data in JSON (key-value pairs) is definitely NOT scraping — as JSON's purpose is to communicate data in a machine-legible manner.
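The distinction drawn in this thread can be sketched in a few lines. The sample HTML, the sample JSON and the `PriceScraper` class are made up for illustration: the first half has to dig a value out of presentation markup, the second half simply parses a format designed for machines.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample documents carrying the same fact.
HTML_PAGE = (
    "<html><body><h1>Widget</h1>"
    '<p>Price: <span class="price">19.99</span> EUR</p>'
    "</body></html>"
)
JSON_DOC = '{"name": "Widget", "price": 19.99, "currency": "EUR"}'


class PriceScraper(HTMLParser):
    """Scraping: recover a value from markup written for human readers."""

    def __init__(self) -> None:
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # Depends on the page's layout; breaks if the class name changes.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.price = float(data)


scraper = PriceScraper()
scraper.feed(HTML_PAGE)
scraped_price = scraper.price

# Not scraping, by the thread's definition: the key is part of the format.
parsed_price = json.loads(JSON_DOC)["price"]

print(scraped_price, parsed_price)  # 19.99 19.99
```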

CC: @404mediaco

#Scraper #Scraping #WebScraper

Update: I reported the bot. Thanks.

A Mastodon bot account at mastodon.cloud scans the fediverse, scrapes selected web pages shared there, rewrites them with AI, posts them to its own site, and shares the rewritten AI slop on Mastodon as tech news. The bot scraped a post of mine (including the attached image) within minutes of my federated blog publishing it.

Is it worth flagging the bot and reporting it to its instance? Are the mods likely to take action?

#mastodon #moderation #ai