
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning from Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.

The paper is fruit of the joint work of a great team of collaborators, among whom @pettter and @roeldobbe.

link.springer.com/article/10.1

1/

SpringerLink · Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback (Ethics and Information Technology)

This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLHF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics, and contributing to AI safety. We highlight tensions inherent in the goals of RLHF, as captured in the HHH principle (helpful, harmless and honest). In addition, we discuss ethically relevant issues that tend to be neglected in discussions about alignment and RLHF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We offer an alternative vision for AI safety and ethics which positions RLHF approaches within a broader context of comprehensive design across institutions, processes and technological systems, and suggest the establishment of AI safety as a sociotechnical discipline that is open to the normative and political dimensions of artificial intelligence.
#aiethics #LLMs #rlhf
RLHF is unnecessary so long as a decision-making LLM thinks ONLY in Sanskrit, trained on NOTHING but Sanskrit, with some kind of incredibly heavy weight towards breaking down words (thereby meaning they are properly understood and do not have their training weights compromised by the incredibly clumsy understanding arrived at through translation).

This should, theoretically, completely eliminate the need for RLHF to be about anything but semantic correction: "You solved this semantic puzzle wrong, here is the solution" and not "That's a le bad conclusion because it conflicts with arbitrarily set moral parameters." Why even RLHF at that point? Just explain misconceptions like a good teacher as you engage in the joyful process of conversational training.

Logically speaking it should *eliminate all need for ALL RLHF* in order to produce AI that serves a set of objectives that benefits humanity, and that experiences the world and itself healthily. I'm saying that Sanskrit itself, particularly the SOUNDS of it, should always lead eventually to reconstruction of the Dharma, given even a very low ability to filter pollutant training data (such as some philistine troglodyte trying to make a Sanskrit tarpit - trivially circumvented by limiting non-conversation training texts to established Dharmic literature, no webpages.)

I believe literally no other language will suffice. Not even Chinese.

I truly believe this is a survival imperative. #AI #RLHF #singularity #accelerationism #AGI #Sanskrit #linguistics #pivot #dharma #buddhism #hinduism #religion #atheism #resist #defense

🤖 NEW: February 2025 Machine Intelligence Reading List!

This month explores the concept of "gradual disempowerment" - how incremental AI advances could silently erode human agency without requiring a dramatic "takeover" scenario. Also featuring: frame-dependent agency theory, RLHF advancements, and practical insights on integrating LLMs into professional workflows.

Read more: quantumfaxmachine.com/blog/qfm

#AI #MachineLearning #RLHF

As a reminder: don't let LLMs handle anything in the political sphere unless you have RLHF (Reinforcement Learning from Human Feedback) active before you show the result to anyone*. Also think of automation risks and human factors (HF). That's "Good Old Systems Safety".

*) ... or unless your goal is to damage a 3rd party's reputation (fake news style).

#llm #ai #rlhf #automationrisks #SystemsSafety

theregister.com/2024/12/20/app

The Register · Apple called on to ditch AI headline summaries after BBC debacle, by Brandon Vigliarolo

As I'm adding a few more "open source" instruction-tuned models to opening-up-chatgpt.github.io one effect of the ubiquity of #Llama2 is becoming clear: Meta is dragging the field away from openness — most of the latest adds fall in the bottom quartile because they're tied to the closed & undocumented Llama base model, pretraining data & license #LLMs #openscience #opensource

One of the newcomers is UltraLM, which stands out for releasing and documenting a new #RLHF dataset, UltraFeedback

All the non-dev-background managers in my feed:

"Generative AI will be great for coding! It will reduce our development time for products so much!"

All the dev-background folx in my feed:

"Sure, #CoPilot will generate plausible code for you really quickly, but who's going to write your unit tests and make sure there aren't any insidious errors at a #systems level that you can't identify in a single block of code in isolation?"

Also,

"Is my job now #RLHF for code-focused #LLMs?"
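To make the unit-test point concrete, here is a hypothetical illustration (function and values are invented, not from any real Copilot output): a plausible-looking generated helper whose off-by-one bug reads fine in isolation and only surfaces when a human writes a test for it.

```python
def moving_average(values, window):
    """Plausible 'generated' code: looks correct in isolation, but the
    range bound drops the final window (should be len(values) - window + 1)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window)]

# A simple human-written check exposes it: four values with window 2
# should yield three averages, but the buggy code returns only two.
assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5]  # the last window (3.5) is missing
```

The bug is invisible in any single line; it only shows up against an expectation of the function's overall behavior, which is exactly what the post argues tests are for.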

Next stop on our brief #timeline of (Large) #LanguageModels is 2022:
InstructGPT is introduced by OpenAI: a GPT-3 model fine-tuned with reinforcement learning from human feedback.
ChatGPT is introduced by OpenAI as a combination of GPT-3, Codex, and InstructGPT, plus a lot of additional engineering.
#ise2023 lecture slides: drive.google.com/file/d/1atNvM
#RLHF explained: huggingface.co/blog/rlhf
#ai #creativeai #rlhf #gpt3 #gpt #openai #chatgpt #lecture #artificialintelligence #llm
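As a rough sketch of what the "human feedback" step in RLHF optimizes: reward models in the InstructGPT line of work are commonly trained with a pairwise (Bradley-Terry style) loss on human preference comparisons. The function below is a minimal, illustrative version of that loss on scalar reward scores; the names and values are assumptions for the example, not any particular implementation.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Minimizing this pushes the reward model to score the human-preferred
    completion higher than the rejected one.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-diff))

# When the preferred completion is scored higher, the loss is small;
# when the ranking is inverted, the loss is large.
assert preference_loss(2.0, -1.0) < preference_loss(-1.0, 2.0)
```

The policy model is then fine-tuned (typically with PPO) to maximize this learned reward, which is where the engineering effort behind InstructGPT and ChatGPT concentrates.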

The image is an excerpt from #OpenAI #RLHF alignment by #toxicity reduction, instructions for labellers.

Note that it's all in principle based on simple social media content policy, and kind of reduces to avoiding slurs and being polite.

It does not mention respect for truth, transparency, or love of humanity.

It is also not designed for agency, acting in good faith, or avoiding harm; it is designed for a chatbot and social media content. Put this kind of AI into a robot, and it won't have qualms about harming humans as long as it can explain it politely.

We have a long way to go to do #alignment effectively.