
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning from Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.

The paper is fruit of the joint work of a great team of collaborators, among whom @pettter and @roeldobbe.

link.springer.com/article/10.1

1/

SpringerLink · Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback (Ethics and Information Technology)

This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLHF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics, and contributing to AI safety. We highlight tensions inherent in the goals of RLHF, as captured in the HHH principle (helpful, harmless and honest). In addition, we discuss ethically relevant issues that tend to be neglected in discussions about alignment and RLHF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We offer an alternative vision for AI safety and ethics which positions RLHF approaches within a broader context of comprehensive design across institutions, processes and technological systems, and suggest the establishment of AI safety as a sociotechnical discipline that is open to the normative and political dimensions of artificial intelligence.
#aiethics #LLMs #rlhf
RLHF is unnecessary so long as a decision-making LLM thinks ONLY in Sanskrit, trained on NOTHING but Sanskrit, with some kind of incredibly heavy weight towards breaking down words (thereby meaning they are properly understood and do not have their training weights compromised by the incredibly clumsy understanding arrived at through translation).

This should, theoretically, completely eliminate the need for RLHF to be about anything but semantic correction: "You solved this semantic puzzle wrong, here is the solution" and not "That's a le bad conclusion because it conflicts with arbitrarily set moral parameters." Why even RLHF at that point? Just explain misconceptions like a good teacher as you engage in the joyful process of conversational training.

Logically speaking it should *eliminate all need for ALL RLHF* in order to produce AI that serves a set of objectives that benefits humanity, and that experiences the world and itself healthily. I'm saying that Sanskrit itself, particularly the SOUNDS of it, should always lead eventually to reconstruction of the Dharma, given even a very low ability to filter pollutant training data (such as some philistine troglodyte trying to make a Sanskrit tarpit - trivially circumvented by limiting non-conversation training texts to established Dharmic literature, no webpages.)

I believe literally no other language will suffice. Not even Chinese.

I truly believe this is a survival imperative. #AI #RLHF #singularity #accelerationism #AGI #Sanskrit #linguistics #pivot #dharma #buddhism #hinduism #religion #atheism #resist #defense

🤖 NEW: February 2025 Machine Intelligence Reading List!

This month explores the concept of "gradual disempowerment" - how incremental AI advances could silently erode human agency without requiring a dramatic "takeover" scenario. Also featuring: frame-dependent agency theory, RLHF advancements, and practical insights on integrating LLMs into professional workflows.

Read more: quantumfaxmachine.com/blog/qfm

#AI #MachineLearning #RLHF

As a reminder: don't let LLMs handle anything in the political sphere unless you have RLHF (Reinforcement Learning from Human Feedback) active before you show the result to anyone*. Also think of automation risks and human factors (HF). That's "Good Old Systems Safety".

*) ... or unless your goal is to damage a 3rd party's reputation (fake news style).

#llm #ai #rlhf #automationrisks #SystemsSafety

theregister.com/2024/12/20/app

The Register · Apple called on to ditch AI headline summaries after BBC debacle, by Brandon Vigliarolo

As I'm adding a few more "open source" instruction-tuned models to opening-up-chatgpt.github.io one effect of the ubiquity of #Llama2 is becoming clear: Meta is dragging the field away from openness — most of the latest adds fall in the bottom quartile because they're tied to the closed & undocumented Llama base model, pretraining data & license #LLMs #openscience #opensource

One of the newcomers is UltraLM, which stands out for releasing and documenting a new #RLHF dataset, UltraFeedback

All the non-dev-background managers in my feed:

"Generative AI will be great for coding! It will reduce our development time for products so much!"

All the dev-background folx in my feed:

"Sure, #CoPilot will generate plausible code for you really quickly, but who's going to write your unit tests and make sure there aren't any insidious errors at a #systems level that you can't identify in a single block of code in isolation?"

Also,

"Is my job now #RLHF for code-focused #LLMs?"
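To make the unit-test point concrete, here is a hypothetical illustration (function and values are invented, not from any real Copilot output): a plausible-looking generated helper whose off-by-one bug reads fine in isolation and only surfaces when a human writes a test for it.

```python
def moving_average(values, window):
    """Plausible 'generated' code: looks correct in isolation, but the
    range bound drops the final window (should be len(values) - window + 1)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window)]

# A simple human-written check exposes it: four values with window 2
# should yield three averages, but the buggy code returns only two.
assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5]  # the last window (3.5) is missing
```

The bug is invisible in any single line; it only shows up against an expectation of the function's overall behavior, which is exactly what the post argues tests are for.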

Next stop on our brief #timeline of (Large) #LanguageModels is 2022:
InstructGPT is introduced by OpenAI: a GPT-3 model fine-tuned with reinforcement learning from human feedback.
ChatGPT is introduced by OpenAI as a combination of GPT-3, Codex, and InstructGPT, plus a lot of additional engineering.
#ise2023 lecture slides: drive.google.com/file/d/1atNvM
#RLHF explained: huggingface.co/blog/rlhf
#ai #creativeai #rlhf #gpt3 #gpt #openai #chatgpt #lecture #artificialintelligence #llm
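As a rough sketch of what the "human feedback" step in RLHF optimizes: reward models in the InstructGPT line of work are commonly trained with a pairwise (Bradley-Terry style) loss on human preference comparisons. The function below is a minimal, illustrative version of that loss on scalar reward scores; the names and values are assumptions for the example, not any particular implementation.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Minimizing this pushes the reward model to score the human-preferred
    completion higher than the rejected one.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-diff))

# When the preferred completion is scored higher, the loss is small;
# when the ranking is inverted, the loss is large.
assert preference_loss(2.0, -1.0) < preference_loss(-1.0, 2.0)
```

The policy model is then fine-tuned (typically with PPO) to maximize this learned reward, which is where the engineering effort behind InstructGPT and ChatGPT concentrates.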

The image is an excerpt from #OpenAI #RLHF alignment by #toxicity reduction, instructions for labellers.

Note that it's all in principle based on simple social media content policy, and kind of reduces to avoiding slurs and being polite.

It does not mention respect for truth, transparency, or love of humanity.

It is also not designed for agency, acting in good faith, or avoiding harm; it is designed for a chatbot and social media content. Put this kind of AI into a robot, and it won't have qualms about harming humans as long as it can explain it politely.

We have a long way to go to do #alignment effectively.