
A sharded DuckDB on 63 nodes runs 1T row aggregation challenge in 5 sec
https://gizmodata.com/blog/gizmoedge-one-trillion-row-challenge

{targets} daily pipeline
| {meteospain} ->
| {meteoland} ->
| {medfate} ->
| {arrow} + {geoarrow}
|
|- geoparquet files in custom S3 (No AWS)
|
|- {shiny}
| {duckdb} +
| {mapdeck} +
| {echarts4r}
|
|- Interactive visualizations of daily modelled forest water balance in Spain at 500m2 resolution.

"Let me just quickly create a script that exports all #Garmin badges."
Oooh shoot… Garmin hardened their login flow and my #cURL-fu capitulated. Hours later… not even Selenium worked anymore; it just got my IP blocked at Cloudflare.
Now using #Garth to get me an access token: https://github.com/matin/garth. I wish I could just sign up for the Garmin developer program, but they say "only business"… Meeeh.
The script basically wrote itself with #DuckDB
https://github.com/michael-simons/garmin-babel/blob/main/bin/export_badges.sh

Duck-UI – Browser-Based SQL IDE for DuckDB
DuckDB extension for A5 DGGS: A community extension by Query.Farm brings native support for the (equal-area, pentagonal-cell-based) #A5 Discrete Global Grid System (#DGGS) to #DuckDB. In development since April, the extension includes a set of functions for coordinate...
https://spatialists.ch/posts/2025/10/19-duckdb-extension-for-a5-dggs/ #GIS #GISchat #geospatial #SwissGIS

One other reason I covered Garage (https://dailydrop.hrbrmstr.dev/2025/10/06/drop-714-2025-10-06-monday-morning-grab-bag/) in the 10-06 Drop was b/c we're making super cool new hourly Parquet timeseries now at work & they `aws s3 sync` super fast to my local server.
Garage makes local S3 rly easy (MinIO drank the "AI" kool-aid, so intercourse them) & works well from #RStats & #DuckDB.
Notwithstanding the global slide-to-authoritarian mess + daft "AI" stuff, we do live in amazing tech times w/rly clever folks making awesome FOSS.
Getting a s***-ton of CSV files as input for spatial analysis can be a pain.
Yes, you can write a quick Python loop in the #QGIS console to load them but it's not a great workflow, imho.
The DuckDB read_csv function has been very convenient for this kind of ETL workflow:
From a bunch of csv files to a neat #SpatialAnalytics dataset:
Step-by-step GeoLife #GPS track collection processing with #DuckDB, #QGIS & #Trajectools

Appreciate the nod that spatial data viz is hard, @motherduck!
The motivation part about expensive GIS tools is a little off though
https://motherduck.com/blog/geospatial-for-beginner-duckdb-spatial-motherduck/
#spatial @duckdb based #OGCAPI Features server written in #golang
fork of #pg_featureserv (cc @pwramsey)
#gischat #geodata #geospatial #duckdb

Detecting #WarCrimes from #Space. Building an Open-Source Pipeline with SQLMesh and #DuckDB - Tom Uijtdehaag - #Data Engineer https://bigdatarepublic.nl/articles/detecting-war-crimes-from-space/

#DuckDB is great for testing IPv4 CIDR membership at scale.
If you have a massive table of IPs (millions or more) and a table of CIDRs (thousands or more), this will crank through them locally in mere seconds.
```sql
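-- The inet extension provides the INET type and its containment operators.
INSTALL inet;
LOAD inet;

-- Pair every IP with every CIDR, then keep the pairs where the IP falls inside the block.
FROM ips
CROSS JOIN cidrs
SELECT
    ip,
    cidr
WHERE
    ip::INET <<= cidr::INET;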
```
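For anyone unsure what the `<<=` filter is doing: assuming it behaves like PostgreSQL's "is contained within or equals" operator, the same cross-join check can be sketched in plain Python with the standard-library `ipaddress` module. The sample IPs, the sample CIDRs and the `cidr_matches` helper are made up for illustration. This is only meant for spot-checking a handful of rows, not as a replacement for the DuckDB query at scale.

```python
import ipaddress


def cidr_matches(ips, cidrs):
    """Return (ip, cidr) pairs where the IP lies inside the CIDR block.

    Hypothetical helper: a brute-force cross join, mirroring the shape
    of the SQL query rather than its performance.
    """
    # Parse each CIDR once so the inner loop only does membership tests.
    networks = [ipaddress.ip_network(c) for c in cidrs]
    pairs = []
    for raw in ips:
        addr = ipaddress.ip_address(raw)
        for net in networks:
            # `addr in net` is True when the address falls within the block.
            if addr in net:
                pairs.append((raw, str(net)))
    return pairs


# Illustrative sample data (not from the original post).
ips = ["10.1.2.3", "192.168.1.77", "8.8.8.8"]
cidrs = ["10.0.0.0/8", "192.168.1.0/24"]

matches = cidr_matches(ips, cidrs)
for ip, cidr in matches:
    print(ip, "is in", cidr)
```

Comparing a few of these pairs against the DuckDB result is a cheap sanity check before trusting the query on millions of rows.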
New #GDAL tutorial just published, on working with #DuckDB - https://gdal.org/en/latest/tutorials/vector_duckdb_tut.html (also includes some #MapServer at the end for good measure)

#Rstats #duckdb hivemind
Best strategy for this situation:
- very large number of individual files with the same structure (millions of rows)
- concurrent, client-side queries against individual files, based on attributes stored in different tables/files (basically: search the attribute tables, then join)
- don't want to/cannot pay for MotherDuck for now
What's best:
Just add tables, store everything in Parquet/DuckDB files and try to do most of the work on the client side, or use Postgres?
There's a new release, 1.4.0, of #DuckDB, my favourite in-process analytical #SQL database. In this release the sorting code has been rewritten, again! I love this kind of development work, where improved algorithms enable us to do more with less. (Unlike generative AI which…) https://duckdb.org/2025/09/24/sorting-again.html