mastodontech.de is one of many independent Mastodon servers you can use to participate in the fediverse.
Open to everyone (over 16) and provided by Markus'Blog

Server statistics:

1.5K
active profiles

#nlp

10 posts · 10 participants · 1 post today

Predictive Testimony explores how compiled syntax in AI-generated police reports and judicial narratives redefines testimony as a syntactic construct, not observation. Key reading for #MedicalNLP and #LegalTech.
zenodo.org/records/16695136

Zenodo · Predictive Testimony: Compiled Syntax in AI-Generated Police Reports and Judicial Narratives

As AI systems draft police reports, insurance narratives, and judicial statements, a form of testimony emerges that is produced by syntax rather than direct observation. The paper argues that these systems operate as a regla compilada, mapping heterogeneous inputs into surface sentences whose operator choices carry evidentiary force independent of officer perception. The analysis targets operator-level mechanisms, including agentless passives, evidential frame insertion, temporal anchoring shifts, modal attenuation, serial nominalization, and quasi-quotation, which shape who appears to act, what appears to occur, and how certainty is signaled.

Method: twelve aligned pairs of body-cam ASR segments and AI-drafted report segments were tagged for six operators and compared with simple before-and-after counts. Findings show higher operator incidence in AI-drafted text, preassigned narrative paths, and evidentiary posture shifts that do not depend on factual grounding or sensory access. The paper specifies audit artifacts for adversarial review, including compilation logs, prompt and template versions, operator traces, model release hashes, and officer edit diffs. The contribution is to locate evidentiary authority in operator-conditioned form, not in content alone, and to establish a testable pathway from input stream to evidentiary surface relevant to confrontation, hearsay, and reliability analysis.

In plain terms: institutions are starting to use AI to write reports, and these reports can read like testimony even when nobody actually witnessed the events. This paper explains how a regla compilada chooses sentence operators that change how a report works as evidence. We counted six operators in twelve matched pairs of audio and AI text. AI drafts used more operators, especially in accusatory parts.
This matters for confrontation (who made the statement), hearsay (what source is being used), reliability (how certain the claim is), and chain of custody (what happened when). We provide a small audit kit: logs, operator traces, links to records, and an edit diff. Courts can run a simple screen: if a sentence has no identified source, cites "records" without a link, and replaces event time with system time, it should be cured or limited.

DOI
Primary archive: https://doi.org/10.5281/zenodo.16689540
Secondary archive: https://doi.org/10.6084/m9.figshare.29790617
SSRN: pending assignment (ETA: Q3 2025)
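The proposed court screen can be sketched as a rule check over each sentence. A minimal illustration, assuming simple regex surface heuristics; the function name and the concrete rules below are hypothetical stand-ins, since the paper does not publish an implementation:

```python
import re

# Hypothetical screening heuristics based on the paper's proposed court
# screen; rules and patterns here are illustrative assumptions.
def screen_sentence(sentence: str, has_identified_source: bool) -> list[str]:
    """Return the screening flags a reviewer might raise for one sentence."""
    flags = []
    if not has_identified_source:
        flags.append("no identified source")
    # Cites "records" without an accompanying link.
    if re.search(r"\brecords\b", sentence, re.I) and "http" not in sentence:
        flags.append("cites records without a link")
    # System timestamp (ISO 8601 shape) used in place of narrated event time.
    if re.search(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}", sentence):
        flags.append("system time replaces event time")
    return flags
```

A sentence raising all three flags would, under the paper's screen, be a candidate for curing or limitation.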

New article published: Syntax Without Subject
What happens when AI writes rules but removes the speaker?
This study tracks how LLMs erase the subject from legal, medical, and policy texts.
We call this structural delegation.
🔗 zenodo.org/records/16571077
#MedicalNLP #LegalTech
#MedTech #AIethics #AIgovernance #healthcare #ArtificialIntelligence #NLP #aifutures #LawFedi #lawstodon #tech #finance #business #agustinvstartari #medical #linguistics #ai #LRM #ClinicalAI #politics

Zenodo · Syntax Without Subject: Structural Delegation and the Disappearance of Political Agency in LLM-Governed Contexts

Abstract: This article examines the syntactic disappearance of the subject in LLM-governed documents. Structural delegation refers to the transfer of agency to impersonal grammatical forms that preclude subject reappearance. Subjects are not censored but syntactically eliminated through passive constructions, nominalizations, and imperative prompt formats with suppressed agents. Building on prior work on synthetic ethos and impersonal command grammars, the article shows that AI-generated institutional texts display consistent patterns of subject erasure. The study analyzes 172 documents produced by GPT‑4 class models (temperature 0.2–0.7, 2024–2025) across legal, healthcare, and administrative domains. Metrics include passive ratio (via dependency label parsing), nominalization density (via POS and suffix filters), and instruction-format frequency. The result is a form of executable authority grounded not in referential authorship but in compliance with a regla compilada (type-0 production). The study proposes a typology of structural delegation and a formal framework for detecting syntactic absence in automated governance.

Also published on Figshare: https://doi.org/10.6084/m9.figshare.29665697. SSRN ID pending (ETA: Q3 2025).
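The abstract's three metrics can be illustrated with crude surface proxies. The study itself computes them from dependency parses and POS tags; the regex patterns, suffix list, and imperative-verb list below are simplifying assumptions for illustration only:

```python
import re

# Illustrative stand-ins; the paper uses dependency labels and POS tags.
NOMINAL_SUFFIXES = ("tion", "ment", "ance", "ence", "ity")
IMPERATIVE_VERBS = {"submit", "complete", "ensure", "provide"}  # hypothetical list

def subject_erasure_metrics(sentences):
    """Surface proxies for passive ratio, nominalization density,
    and instruction-format frequency."""
    n = len(sentences)
    # Passive proxy: a form of "be" followed by an -ed participle.
    passive = sum(
        bool(re.search(r"\b(is|are|was|were|been|being)\s+\w+ed\b", s.lower()))
        for s in sentences
    )
    tokens = [w for s in sentences for w in re.findall(r"[a-z]+", s.lower())]
    nominals = sum(w.endswith(NOMINAL_SUFFIXES) for w in tokens)
    # Instruction-format proxy: sentence opens with a bare imperative verb.
    imperatives = sum(s.split()[0].lower() in IMPERATIVE_VERBS for s in sentences)
    return {
        "passive_ratio": passive / n,
        "nominalization_density": nominals / max(len(tokens), 1),
        "instruction_frequency": imperatives / n,
    }
```

A document scoring high on all three would, in the article's terms, exhibit structural delegation: agency shifted into agentless, nominal, imperative forms.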


This is a Ficaria verna (formerly Ranunculus ficaria L.), or lesser celandine or pilewort (as per Wikipedia and other sites).

I have tested the gemma 3 4b-it-q4_0 multimodal vision model for a while: it is not accurate and can't be trusted.
For instance, it thinks this is a Ranunculus acris (it isn't). It's really hit-and-miss with this model. I guess it could still be useful for providing some clues or vocabulary.

#taxonomy #localLLM #NLP

New article published:
Syntax Without Subject: Structural Delegation and the Disappearance of Political Agency in LLM-Governed Contexts
zenodo.org/records/16571077

This study introduces the concept of structural delegation to explain how large language models produce legally …
#MedicalNLP #LegalTech
#MedTech #AIethics #AIgovernance #healthcare #ArtificialIntelligence #NLP #aifutures #LawFedi #lawstodon #tech #finance #business #agustinvstartari #medical #linguistics #ai #LRM #ClinicalAI #politics


📄 New article: Sovereign Syntax in Financial Disclosure
How LLMs simulate trust using grammar, not facts.
🔍 SDRI = a syntactic risk index for crypto whitepapers.
Audit language structure, not just claims.
🔗 zenodo.org/records/16421548 | papers.ssrn.com/abstract=53662

Zenodo · Sovereign Syntax in Financial Disclosure: How LLMs Shape Trust in Tokenized Economies

Through structural analysis of LLM-generated or LLM-refined whitepapers, this study identifies a recurring pattern in tokenized finance: legitimacy is simulated through formal syntactic depth rather than verifiable disclosure. It introduces the Syntactic Deception Risk Index (SDRI), a quantitative measure of non-referential persuasion derived from syntactic volatility. Grounded in Algorithmic Obedience and The Grammar of Objectivity, the findings show that high-risk disclosures converge on a formal grammar that substitutes surface coherence for substantive content. The concept of sovereign syntax is formalized as the regla compilada (type-0 production) that governs trust independently of source or reference. From this model follow concrete pathways for audit automation, exchange-side filtration, and real-time regulatory screening. SDRI thus exposes how non-human authority embeds in financial language without a traceable epistemic anchor.

DOI: https://doi.org/10.5281/zenodo.16421548. Also published on Figshare: https://doi.org/10.6084/m9.figshare.29646473. SSRN ID pending.
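The abstract does not specify how SDRI is computed. As a purely hypothetical reading of "syntactic volatility", one could measure sentence-to-sentence variability in a subordination-depth proxy; the function name, keyword list, and statistic below are all assumptions, not the paper's construction:

```python
import re
import statistics

def sdri_proxy(sentences):
    """Toy volatility proxy: population std. dev. of a per-sentence
    subordination-marker count. Not the paper's actual SDRI."""
    depths = [
        len(re.findall(r"\b(that|which|whereby|wherein)\b", s.lower()))
        for s in sentences
    ]
    return statistics.pstdev(depths)
```

Under this sketch, a whitepaper alternating flat claims with heavily embedded clauses scores higher than uniformly simple prose.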
#LLM #MedicalNLP #LegalTech

🧾 New paper out:
When Grammar Fails the Audit: How One Bad Sentence Can Cost Your Company Millions

🔍 Solution: fair-syntax transformation → reduces errors by 15%.

📄 Read: zenodo.org/records/16322760
✍️ Agustin V. Startari
#LLM #MedicalNLP #LegalTech #MedTech #AIethics #AIgovernance #cryptoreg #healthcare #ArtificialIntelligence #NLP #aifutures #LawFedi #lawstodon #tech #finance #business #agustinvstartari #medical #linguistics #ai #LRM #ClinicalAI #politics #regulation

Zenodo · Expense Coding Syntax: Misclassification in AI-Powered Corporate ERPs

Abstract: This study examines how syntactic constructions in expense narratives affect misclassification rates in AI-powered corporate ERP systems. We trained transformer-based classifiers on labeled accounting data to predict expense categories and observed that these models frequently relied on grammatical form rather than financial semantics. We extracted syntactic features including nominalization frequency, defined as the ratio of deverbal nouns to verbs; coordination depth, measured by the maximum depth of coordinated clauses; and subordination complexity, expressed as the number of embedded subordinate clauses per sentence. Using SHAP (SHapley Additive exPlanations), we identified that these structural patterns significantly contribute to false allocations, thus increasing the likelihood of audit discrepancies. For interpretability, we applied the method introduced by Lundberg and Lee in "A Unified Approach to Interpreting Model Predictions," Advances in Neural Information Processing Systems 30 (2017): 4765–4774.

To mitigate these syntactic biases, we implemented a rule-based debiasing module that re-parses each narrative into a standardized fair-syntax transformation, structured around a minimal Subject-Verb-Object sequence. Evaluation on a corpus of 18,240 expense records drawn from the U.S. Federal Travel Expenditure dataset (GSA SmartPay, 2018–2020, https://smartpay.gsa.gov) shows that the fair-syntax transformation reduced misclassification rates by 15 percent. It also improved key pre-audit compliance indicators, including GL code accuracy, defined as the percentage of model-assigned codes matching human-validated general ledger categories, with a target threshold of ≥ 95 percent, and reconciliation match rate, the proportion of expense records successfully aligned with authorized payment entries, aiming for ≥ 98 percent.

The findings reveal a direct operational link between linguistic form and algorithmic behavior in accounting automation, providing a replicable interpretability framework and a functional safeguard against structural bias in enterprise classification systems.

DOI: https://doi.org/10.5281/zenodo.16322760. Also published on Figshare: https://doi.org/10.6084/m9.figshare.29618654. SSRN ID pending (ETA: Q3 2025).
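The three syntactic features named in the abstract can be sketched with surface counts. The paper derives them from full parses (deverbal-noun ratios, clause depth); the suffix filters and keyword counts below are illustrative assumptions, not the study's extractor:

```python
import re

def expense_syntax_features(narrative):
    """Crude proxies for nominalization frequency, coordination depth,
    and subordination complexity in one expense narrative."""
    words = re.findall(r"[a-z]+", narrative.lower())
    # Deverbal-noun proxy via suffix filter (assumption, not the paper's parser).
    deverbal = [w for w in words if w.endswith(("tion", "ment", "ance"))]
    # Verb proxy via inflectional suffixes.
    verbs = [w for w in words if w.endswith(("ed", "ing"))]
    coordination = narrative.lower().count(" and ") + narrative.lower().count(" or ")
    subordination = sum(narrative.lower().count(k) for k in ("which", "that", "because"))
    return {
        "nominalization_freq": len(deverbal) / max(len(verbs), 1),
        "coordination_depth": coordination,
        "subordination_complexity": subordination,
    }
```

The paper's fair-syntax transformation would then rewrite a high-scoring narrative into a minimal Subject-Verb-Object form before classification.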

🤔 What is #NLP research 𝘳𝘦𝘢𝘭𝘭𝘺 about?
We analyzed 29k+ papers to find out! 📚🔍

📌 Our NLPContributions dataset, from the ACL Anthology, reveals what authors actually contribute—artifacts, insights, and more.

📈 Trends show a swing back towards language & society. Curious where you fit in?

🎁 Tools, data, and analysis await you:

📄 Paper: arxiv.org/abs/2409.19505
🌐Project: ukplab.github.io/acl25-nlp-con
💻 Code: github.com/UKPLab/acl25-nlp-co
💾 Data: tudatalib.ulb.tu-darmstadt.de/

(1/🧵)


🎓 Join us: Fully-Funded PhD Position in ML/NLP – Starting October in Berlin

Are you interested in doing a PhD in Machine Learning or Natural Language Processing? A fully-funded PhD position is now open in our group!

You'll work on cutting-edge topics in large language models (LLMs) and open-source ML research, alongside a collaborative and curious team.

1/3

#jobs #berlin #nlp