Reinforcement learning, explained with a minimum of math and jargon https://www.understandingai.org/p/reinforcement-learning-explained (excellent, clear explainer) #AI #RLHF #training

Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among whom @pettter and @roeldobbe.
https://link.springer.com/article/10.1007/s10676-025-09837-2
1/
Models should take languages into account, but also cultures.
Arabic is not a culture: it includes many dialects, and many cultures.
@amr-keleg.bsky.social
surveys current practices in https://alphaxiv.org/pdf/2503.15003
and closes with a call to action.
#LLMs #multiculturality #AI #RLHF
NEW: February 2025 Machine Intelligence Reading List!
This month explores the concept of "gradual disempowerment" - how incremental AI advances could silently erode human agency without requiring a dramatic "takeover" scenario. Also featuring: frame-dependent agency theory, RLHF advancements, and practical insights on integrating LLMs into professional workflows.
Read more: https://quantumfaxmachine.com/blog/qfm053-machine-intelligence-reading-list-february-2025
#ACMPrize
#2024ACMPrize
#ACMTuringAward
» #ReinforcementLearning: An Introduction (1998), the standard reference, cited over 75,000 times ...
a prominent example of #RL is the #AlphaGo victory over the best human #Go players in 2016 and 2017 ...
more recent has been the development of the chatbot #ChatGPT ...
a large language model (#LLM) trained in two phases ... employs a technique called reinforcement learning from human feedback (#RLHF) «
aka cheap labor unnamed in papers
https://awards.acm.org/about/2024-turing
2/2
As a reminder: don't let LLMs handle anything in the political sphere unless you have RLHF (Reinforcement Learning from Human Feedback) active before you show the result to anyone*. Also think of automation risks and human factors (HF). That's "Good Old Systems Safety".
*) ... or unless your goal is to damage a 3rd party's reputation (fake news style).
#llm #ai #rlhf #automationrisks #SystemsSafety
https://www.theregister.com/2024/12/20/apple_ai_headline_summaries/?td=rt-3a
I'm not at #NeurIPS2024 :-/ but my fantastic collaborators are and will present our work
"Distributional Preference Alignment of LLMs via Optimal Transport" tomorrow Wed 11 Dec 11 a.m.
Check it out!
https://neurips.cc/virtual/2024/poster/96822
Anyone out there work for #OutlierAI via something called #G2i ?
#RLHF #LLM
Something seems weird about it to me...
But, I'm hyper ... what's the phrase?
Cautiously optimistic.
The Future of Open Human Feedback https://www.alphaxiv.org/abs/2408.16961?_bhlid=9fad3a07a1557a4963640d2475e4330dd586c418 #AI #RLHF #open
Human feedback is critical for aligning LLMs, so why don’t we collect it in the open ecosystem?
We (15 orgs) gathered the key issues and next steps.
Envisioning
a community-driven feedback platform, like Wikipedia
https://alphaxiv.org/abs/2408.16961
#machinelearning #RLHF #hci #ethics #LLM #ml #NLP #NLProc
@parismarx and well-known AI researchers are leaving OpenAI or have already left. Which of the authors of the original #RLHF paper are still there?
As I'm adding a few more "open source" instruction-tuned models to https://opening-up-chatgpt.github.io one effect of the ubiquity of #Llama2 is becoming clear: Meta is dragging the field away from openness. Most of the latest additions fall in the bottom quartile because they're tied to the closed and undocumented Llama base model, pretraining data, and license. #LLMs #openscience #opensource
One of the newcomers is UltraLM, which stands out for releasing and documenting a new #RLHF dataset, UltraFeedback
GPT-4V: How image analysis and language models are now changing the AI landscape
#GPT4V #OpenAI #ArtificialIntelligence #Multimodal #ImageAnalysis #LanguageModels #RLHF #MedicalApplications #Science #technology
https://kinews24.de/gpt-4v-bildanalyse-und-sprachmodelle-veraendern-ki/
GPT-4V - Multimodality brings many new options - a deep look into the future of AI
#GPT4V #OpenAI #KünstlicheIntelligenz #Multimodal #Bildanalyse #Sprachmodelle #RLHF #MedizinischeAnwendungen #Wissenschaft #Technologie #KI #AI
https://kinews24.de/gpt-4v-bildanalyse-und-sprachmodelle-veraendern-ki/
All the non-dev-background managers in my feed:
"Generative AI will be great for coding! It will reduce our development time for products so much!"
All the dev-background folx in my feed:
"Sure, #CoPilot will generate plausible code for you really quickly, but who's going to write your unit tests and make sure there aren't any insidious errors at a #systems level that you can't identify in a single block of code in isolation?"
Also,
Next stop on our brief #timeline of (Large) #LanguageModels is 2022:
InstructGPT is introduced by OpenAI, a GPT-3 model complemented and fine-tuned with reinforcement learning from human feedback.
ChatGPT is introduced by OpenAI as a combination of GPT-3, Codex, and InstructGPT including lots of additional engineering.
#ise2023 lecture slides: https://drive.google.com/file/d/1atNvMYNkeKDwXP3olHXzloa09S5pzjXb/view?usp=drive_link
#RLHF explained: https://huggingface.co/blog/rlhf
#ai #creativeai #rlhf #gpt3 #gpt #openai #chatgpt #lecture #artificialintelligence #llm
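The RLHF step mentioned above starts with a reward model trained on human preference data. A minimal sketch of the core idea, assuming a Bradley-Terry pairwise loss (the objective commonly used for reward-model training; the function name and scores below are hypothetical):

```python
import math

def reward_pref_loss(r_chosen, r_rejected):
    """Pairwise preference loss for a reward model (Bradley-Terry style).

    r_chosen / r_rejected are scalar reward scores the model assigns to the
    human-preferred and the rejected completion. The loss is
    -log(sigmoid(r_chosen - r_rejected)): small when the model already
    ranks the preferred answer higher, large when the ranking is inverted.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores for one preference pair:
loss_correct = reward_pref_loss(2.0, -1.0)   # preferred answer scored higher
loss_inverted = reward_pref_loss(-1.0, 2.0)  # ranking inverted
print(loss_correct < loss_inverted)  # True
```

The trained reward model then provides the scalar signal that the policy (the LLM) is optimized against in the second RLHF phase, typically with PPO.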
The image is an excerpt from #OpenAI #RLHF alignment by #toxicity reduction, instructions for labellers.
Note that it's all in principle based on simple social media content policy, and kind of reduces to avoiding slurs and being polite.
It does not mention respect for truth, transparency, or love of humanity.
It is also not designed for agency, acting in good faith, or avoiding harm; it is designed for a chatbot and social media content. Put this kind of AI into a robot, and it won't have qualms about harming humans as long as it can explain itself politely.
We have a long way to go to do #alignment effectively.