mastodontech.de is one of many independent Mastodon servers that you can use to participate in the Fediverse.
Open to everyone (over 16) and provided by Markus'Blog

Server statistics:

1.5K
active profiles

#Webscraper

@reiver ⊼ (Charles) :batman:<p>3/</p><p>For more on scraping (as in web-scraping) see here:<br><a href="https://mastodon.social/@reiver/114353728684249608" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">mastodon.social/@reiver/114353</span><span class="invisible">728684249608</span></a></p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>404mediaco</span></a></span> </p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
@reiver ⊼ (Charles) :batman:<p>2/</p><p>Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.</p><p>If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.</p><p>...</p><p>And, getting data in JSON (key-value pairs) is definitely NOT scraping — as JSON's purpose is to communicate data in a machine-legible manner.</p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>404mediaco</span></a></span> </p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
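The distinction drawn in this post can be illustrated with a minimal sketch. It assumes two hypothetical sources that carry the same messages: an HTML fragment written for display, and a JSON body written for machines. The sample data and the names `MessageScraper`, `scrape_messages`, and `read_messages` are invented for illustration; only Python's standard `html.parser` and `json` modules are used.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data, invented for illustration only.
HTML_PAGE = '<ul><li class="msg">hello</li><li class="msg">world</li></ul>'
JSON_BODY = '{"messages": ["hello", "world"]}'


class MessageScraper(HTMLParser):
    """Collects the text of every <li class="msg"> element."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._in_msg = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "msg") in attrs:
            self._in_msg = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_msg = False

    def handle_data(self, data):
        if self._in_msg:
            self.messages.append(data)


def scrape_messages(html):
    """Scraping: recover the data by walking display markup."""
    parser = MessageScraper()
    parser.feed(html)
    parser.close()
    return parser.messages


def read_messages(json_body):
    """Not scraping: the format already states what the data is."""
    return json.loads(json_body)["messages"]
```

The scraper depends on guesses about page layout (here, that messages live in `li` elements with class `msg`) and breaks when the markup changes, while the JSON reader only depends on a documented key.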
@reiver ⊼ (Charles) :batman:<p>1/</p><p>If these researchers used a typical HTTP-based API that returns JSON, then —</p><p>What these researchers did is NOT scraping.</p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>404mediaco</span></a></span></p><p>RE: <a href="https://www.404media.co/researchers-scrape-2-billion-discord-messages-and-publish-them-online/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">404media.co/researchers-scrape</span><span class="invisible">-2-billion-discord-messages-and-publish-them-online/</span></a></p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebScraping</span></a></p>
Hacker News<p>Scraperr – A Self Hosted Webscraper</p><p><a href="https://github.com/jaypyles/Scraperr" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/jaypyles/Scraperr</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/Scraperr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Scraperr</span></a> <a href="https://mastodon.social/tags/Webscraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Webscraper</span></a> <a href="https://mastodon.social/tags/SelfHosted" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SelfHosted</span></a> <a href="https://mastodon.social/tags/TechTools" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechTools</span></a> <a href="https://mastodon.social/tags/OpenSource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenSource</span></a></p>
Fox Ritch :fjoxicon:<p>apparently webscrapers are addicted to my archive </p><p><a href="https://mastodon.hostnetwork.xyz/tags/hostnetwork" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>hostnetwork</span></a> <a href="https://mastodon.hostnetwork.xyz/tags/archive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>archive</span></a> <a href="https://mastodon.hostnetwork.xyz/tags/webscraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraper</span></a></p>
Fox Ritch :fjoxicon:<p>just caught chatgpt web scraper lacking</p><p><a href="https://mastodon.hostnetwork.xyz/tags/chatgpt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>chatgpt</span></a> <a href="https://mastodon.hostnetwork.xyz/tags/webscraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraper</span></a></p>
@reiver ⊼ (Charles) :batman:web scraper
Enzyklopädie Roter Kreis<p>To find out which activities exist in particular fields of work across the federated association, the DRK is experimenting with web scraping of the websites of its district and state associations.<br>➡️ <a href="https://drk-wohlfahrt.de/blog/eintrag/mit-webscraping-data-science-die-wohnungslosenhilfen-im-drk-verstehen.html" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">drk-wohlfahrt.de/blog/eintrag/</span><span class="invisible">mit-webscraping-data-science-die-wohnungslosenhilfen-im-drk-verstehen.html</span></a> ("How data science can support the DRK in homelessness assistance")</p><p><a href="https://sozial.roter-kreis.de/tags/DRK" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DRK</span></a> <a href="https://sozial.roter-kreis.de/tags/RotesKreuz" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RotesKreuz</span></a> <a href="https://sozial.roter-kreis.de/tags/DataScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataScience</span></a> <a href="https://sozial.roter-kreis.de/tags/DataScienceHub" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataScienceHub</span></a> <a href="https://sozial.roter-kreis.de/tags/Webscraping" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Webscraping</span></a> <a href="https://sozial.roter-kreis.de/tags/Webscraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Webscraper</span></a> <a href="https://sozial.roter-kreis.de/tags/DSSG" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DSSG</span></a> <a href="https://sozial.roter-kreis.de/tags/Wohlfahrt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Wohlfahrt</span></a> <a href="https://sozial.roter-kreis.de/tags/Wohlfahrtspflege" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Wohlfahrtspflege</span></a></p>
Stark<p>I did it again!</p><p>So I created <a href="https://techhub.social/tags/MastoBot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MastoBot</span></a>, a generic <a href="https://techhub.social/tags/Python" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Python</span></a> Mastodon bot that allows anyone to create a bot.</p><p>I created a few versions, and I use it for <span class="h-card"><a href="https://techhub.social/@3dprinting" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>3dprinting</span></a></span>. But naturally, knowing how to implement it and developing functions, I need a use case.</p><p>So after a discussion this morning, I spent the entire day writing <span class="h-card"><a href="https://techhub.social/@Python" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>Python</span></a></span>. Yes, I did it again.</p><p>However, this one now has a built-in <a href="https://techhub.social/tags/webscraper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>webscraper</span></a> to cross-post new posts from https://discuss.python.org/, because why not.</p><p>This <span class="h-card"><a href="https://techhub.social/@Python" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>Python</span></a></span> required a few things, and updates were made to <a href="https://techhub.social/tags/MastoBot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MastoBot</span></a>. 
I had to make it even more generic, implement an overkill datastore with <a href="https://techhub.social/tags/Redis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Redis</span></a>, and extend the config system.</p><p><span class="h-card"><a href="https://techhub.social/@Python" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>Python</span></a></span> will behave exactly like <span class="h-card"><a href="https://techhub.social/@3dprinting" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>3dprinting</span></a></span> with the added feature of crossposts. These posts will, however, be "follower only" posts, so as not to pollute <a href="https://techhub.social/tags/Python" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Python</span></a> and flood everything initially.</p><p>The bot will <a href="https://techhub.social/tags/boost" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>boost</span></a> parent posts, allowing for threads and discussions to be created.</p><p>The source code will be out tomorrow, just cleaning up.</p>
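The cross-posting behaviour described in this post might look roughly like the sketch below. It is not MastoBot's code, which had not been published at the time of the post. The function names, the topic dictionaries, and the `example.org` URLs are hypothetical, and a plain Python `set` stands in for the Redis datastore. The one external fact relied on is that the Mastodon API calls follower-only visibility `private`.

```python
def select_new_topics(topics, seen_ids):
    """Return topics that were not cross-posted yet and mark them as seen.

    `seen_ids` is a plain set here; the post says the real bot keeps
    its state in Redis, so this is only a stand-in.
    """
    fresh = []
    for topic in topics:
        if topic["id"] not in seen_ids:
            seen_ids.add(topic["id"])
            fresh.append(topic)
    return fresh


def build_status(topic):
    """Build the payload for one cross-post (hypothetical shape).

    "private" is the Mastodon API's name for follower-only posts.
    """
    return {
        "status": f'{topic["title"]}\n{topic["url"]}',
        "visibility": "private",
    }


# Hypothetical scraped topics, invented for illustration.
scraped = [
    {"id": 1, "title": "First topic", "url": "https://example.org/t/1"},
    {"id": 2, "title": "Second topic", "url": "https://example.org/t/2"},
]
seen = set()
first_run = [build_status(t) for t in select_new_topics(scraped, seen)]
second_run = [build_status(t) for t in select_new_topics(scraped, seen)]
```

Recording each topic id before posting means a second scrape of the same page produces no duplicates, which is the job the post assigns to the datastore.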