mastodontech.de is one of many independent Mastodon servers you can use to take part in the Fediverse.
Open to everyone (over 16) and provided by Markus'Blog

#softwaredevelopment

30 posts · 29 participants · 2 posts today

Can AI really code? Study maps the roadblocks to autonomous software engineering:

news.mit.edu/2025/can-ai-reall

“Without a channel for the #AI to expose its own confidence — ‘this part’s correct … this part, maybe double‑check’ — developers risk blindly trusting hallucinated logic that compiles, but collapses in production. Another critical aspect is having the AI know when to defer to the user for clarification.”

THIS! 👆

MIT News | Massachusetts Institute of Technology · Can AI really code? Study maps the roadblocks to autonomous software engineering · By Rachel Gordon | MIT CSAIL
Continued thread

I mean, I'm never really compiling binaries, unless they are #NodeJs dependencies (some might be), so it's never *really* mattered, but the obsessive-compulsive part of me just wants my dev environment to be as much like production as possible.

#cloud #serverless #lambda

Do you write code that runs on Linux and macOS? If so, what does your development environment look like? Please boost for visibility.

Do AI models help produce verified bug fixes?

"Abstract: Among areas of software engineering where AI techniques — particularly, Large Language Models — seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR), the production of satisfactory corrections to software bugs. Does this expectation materialize in practice? How do we find out, making sure that proposed corrections actually work? If programmers have access to LLMs, how do they actually use them to complement their own skills?

To answer these questions, we took advantage of the availability of a program-proving environment, which formally determines the correctness of proposed fixes, to conduct a study of program debugging with two randomly assigned groups of programmers, one with access to LLMs and the other without, both validating their answers through the proof tools. The methodology relied on a division into general research questions (Goals in the Goal-Query-Metric approach), specific elements admitting specific answers (Queries), and measurements supporting these answers (Metrics). While applied so far to a limited sample size, the results are a first step towards delineating a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.

These results caused surprise as compared to what one might expect from the use of AI for debugging and APR. The contributions also include: a detailed methodology for experiments in the use of LLMs for debugging, which other projects can reuse; a fine-grain analysis of programmer behavior, made possible by the use of full-session recording; a definition of patterns of use of LLMs, with 7 distinct categories; and validated advice for getting the best of LLMs for debugging and Automatic Program Repair."

arxiv.org/abs/2507.15822

#AI #GenerativeAI #LLMs

Interesting tidbits from #Anthropic’s blog on how they use Claude Code:
anthropic.com/news/how-anthrop

Top tip from Data Science and ML Engineering teams: treat it like a *slot machine*. Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh…

Top tip from Product Engineering teams: treat it as an *iterative partner*, not a one-shot solution…

www.anthropic.com · How Anthropic teams use Claude Code · Discover how Anthropic's internal teams leverage Claude Code for development workflows, from debugging to code assistance.
#AI #coding #genAI

How can DevOps principles accelerate the development of embedded systems? 🤔

Discover how DevOps can improve embedded development with Mariusz Walczyk, our Senior DevOps Engineer.

👉 youtube.com/watch?v=XuG3sur2bs

From faster builds to streamlined CI/CD pipelines and automated hardware testing - this presentation covers it all.

You'll learn about the core basics, how to overcome some build challenges, what tools to use, and more!

“[W]hat we are doing is shepherding AI, limiting it to certain contexts. We are learning where it’s best to call it, how best to feed it, and what to do with the output. So it looks very much like an editorial process, an editorial workflow where you provide some initial input, maybe some idea of what content to produce, then you review it. There’s always that quality assurance, quality control side, the supervision.

AI is not really autonomous. It relies a lot on us. And I feel like sometimes there are days where, when coding through AIs or doing some assisted writing, I’m spending more time helping out the AI doing the actual task that I’m asking the AI to do. But I take this as a learning process. I read this article the other day, Nobody knows how to build with AI yet. And it was a developer saying that they haven’t quite figured out how to best work with AI. There were lots of comments around the fact that you have to spend lots of time, you have to learn how to talk to it, and when the model changes, you have to also maybe change something you’re doing. You have to learn how to optimize your time. But your presence is always mandatory.”

passo.uno/webinar-ai-tech-writ

passo.uno · Webinar: What's Wrong with AI Generated Docs · Today I discussed how tech writers can use AI at work with Tom Johnson and Scott Abel. It all started from my post What’s wrong with AI-generated docs, though we didn’t just focus on the negatives; in fact, we ended up acknowledging that, while AI has limitations, it’s also the most powerful productivity tool at our disposal. Here are some of the things I said during the webinar, transcribed and edited for clarity.