mastodontech.de is one of many independent Mastodon servers you can use to participate in the Fediverse.
Open to everyone (over 16) and provided by Markus'Blog

Server statistics:

1.5K active profiles

#multimodal

1 post · 1 participant · 0 posts today

🚗 x 🚌 On the road together: the strong partnership between stadtmobil, VVS and SSB makes mobility in Stuttgart even smarter! Many trips can be made comfortably by bus & train – and whenever things need to be flexible, spontaneous, or involve a lot of luggage, stadtmobil is the perfect complement. 🤝

When do you take the bus & train, and when do you prefer to fall back on a stadtmobil?

#stadtmobil #VVS #SSB

Want an economically competitive city? A fiscally smart city? A healthy city? A sustainable, climate-responsible & resilient city? An equitable & accessible city? A livable city? A city with more choices? A successful city today that’s positioned for a successful future? Build a #multimodal city.

Continued thread

The stadtnavi system, built on Trufi’s open-source platform, shows how cities can democratize mobility. Real-time updates, CO₂ comparisons, and weather alerts aren’t exclusive to Herrenberg—they’re open for any city to implement. The project’s success lies in its adaptability: a white-label solution that lets cities rebrand and expand it freely.

tinyurl.com/yn9t9ro7

Trufi Association · Trufi’s Tour-de-Force of Multimodal Possibilities: stadtnavi · Active transport, motorized transport, public toilets and more – they're in the stadtnavi app: bus, train, bike, car, rideshare, taxi...

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems. Multi-modal LLM system simulates human communication using speech and generates human-like dialogues with consistent content, rhythm, & emotion.

Funnily enough, they also elaborate on a "think before you speak" design aspect. This might also be applicable to our everyday lives.

doi: 10.48550/arXiv.2401.03945

#30DayChartChallenge Day 10: Diving into the VIX Distribution! 🌊

Instead of just looking at the VIX line, today we analyze its "probability distribution" by US presidency (Clinton -> Trump 2nd). Shape is everything!

Using #rstats and #ggplot2, these faceted densities let us investigate:
* Dominant modes: What was the "normal" VIX level (the tallest peak)? Did it change much?
* Multimodality: Is there evidence of multiple volatility states (secondary peaks) within a single term? 🤔
* Tail risk: How likely was "panic" (VIX > 35)? Compare the right tails!

These patterns reflect the different volatility regimes and the perception of systemic risk. It's not just the level, but the "structure" of the uncertainty that matters!

Data: Yahoo Finance via #quantmod.
📂 Code: t.ly/kikdo
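
For the gist without following the link, here is a minimal sketch (not the posted script) of how such a chart could be built with the tools named above: quantmod for the Yahoo Finance download and ggplot2 for the faceted densities. The presidency breakpoints, the VIX > 35 "panic" line, and all styling are illustrative assumptions.

```r
library(quantmod)   # getSymbols() pulls ^VIX from Yahoo Finance
library(ggplot2)

# Daily VIX closes since Clinton's first inauguration (illustrative range)
getSymbols("^VIX", src = "yahoo", from = "1993-01-20", auto.assign = TRUE)
vix <- data.frame(date = as.Date(index(VIX)), close = as.numeric(Cl(VIX)))

# Label each observation with its presidency (inauguration-day breakpoints)
vix$presidency <- cut(
  vix$date,
  breaks = as.Date(c("1993-01-20", "2001-01-20", "2009-01-20",
                     "2017-01-20", "2021-01-20", "2025-01-20", "2100-01-01")),
  labels = c("Clinton", "Bush", "Obama", "Trump 1st", "Biden", "Trump 2nd"),
  right = FALSE
)

# Faceted densities: one panel per presidency, so shape (not just level)
# and the right tail beyond the dashed "panic" line are easy to compare
ggplot(vix, aes(x = close)) +
  geom_density(fill = "steelblue", alpha = 0.6) +
  geom_vline(xintercept = 35, linetype = "dashed") +
  facet_wrap(~ presidency, ncol = 2) +
  labs(x = "VIX close", y = "Density",
       title = "VIX distribution by US presidency")
```

Faceting by term gives each density its own panel, which is what makes secondary modes and differences in right-tail mass stand out at a glance.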

#Day10 #Multimodal #dataviz

NEWS: Meta has unveiled Llama 4, its latest AI model, featuring advanced multimodal capabilities that integrate text, video, images, and audio processing. This release includes Llama 4 Scout and Llama 4 Maverick, both open-source and designed to enhance Meta’s AI assistant across platforms like WhatsApp, Messenger, and Instagram. Is this a new benchmark in AI versatility?
#Llama4 #AI #Multimodal
ai.meta.com/blog/llama-4-multi

Meta AI · The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation · We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.

🚀 Exciting news for devs & AI enthusiasts! Introducing Qwen2.5-Omni, the latest multimodal model from Alibaba Cloud 🌟. It excels in text, vision, and audio tasks—chat, image gen, speech recog, you name it! Access now via 🌐 Qwen's GitHub or try it on ModelScope. Open weights soon! #AI #Multimodal #Tech #Qwen

SmolDocling: An ultra-compact VLM for end-to-end multi-modal document conversion

arxiv.org/abs/2503.11576

arXiv.org · SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion · We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike existing approaches that rely on large foundational models, or ensemble solutions that rely on handcrafted pipelines of multiple specialized models, SmolDocling offers an end-to-end conversion for accurately capturing content, structure and spatial location of document elements in a 256M parameters vision-language model. SmolDocling exhibits robust performance in correctly reproducing document features such as code listings, tables, equations, charts, lists, and more across a diverse range of document types including business documents, academic papers, technical reports, patents, and forms -- significantly extending beyond the commonly observed focus on scientific papers. Additionally, we contribute novel publicly sourced datasets for charts, tables, equations, and code recognition. Experimental results demonstrate that SmolDocling competes with other Vision Language Models that are up to 27 times larger in size, while reducing computational requirements substantially. The model is currently available, datasets will be publicly available soon.
#HackerNews #SmolDocling #VLM