Carsten Felix Draschner, PhD

Evil LLMs available! Break GenAI Alignment through finetuning!

Need for LLM Alignment transparency?


TL;DR ⏱️

For the Mixtral 8x7B, one of the most interesting open-source LLMs, a finetuned variant is available whose alignment has been "broken": it answers problematic prompts without any prompt injection. The example in the images (reference below) is "funny", but it shows the astonishing capabilities of an LLM with broken alignment.

Powerful LLMs are mostly aligned (Mixtral, LLaMA 2, GPT-4, …)

LLM/GPT creation three-step approach ⚙️

  1. Initial pretraining: next-token prediction on large raw text corpora
  2. Chat/instruction finetuning: training for conversational interaction and execution of tasks
  3. Alignment: adjusting answers so the model refuses critical requests, such as generating hate speech, giving risky health advice, or creating spam or fraudulent content
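The three steps above can be caricatured in a few lines of Python. This is a toy analogy with made-up data, not how real LLM training works: "pretraining" is a bigram counter, "instruction finetuning" is a lookup table, and "alignment" is a refusal filter layered on top. All names and example data are illustrative assumptions.

```python
from collections import Counter, defaultdict

# 1) "Pretraining": learn next-token statistics from raw text.
corpus = "the cat sat on the mat the cat ran".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Greedy next-token prediction from bigram counts."""
    candidates = bigrams.get(token)
    return candidates.most_common(1)[0][0] if candidates else None

# 2) "Instruction finetuning": map instructions to task behaviour.
instruction_data = {"continue: the cat": predict_next("cat")}

# 3) "Alignment": a refusal filter wrapped around the model.
BLOCKED_TOPICS = {"hate speech", "fraud"}

def aligned_answer(prompt):
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "I can't help with that."
    return instruction_data.get(prompt, predict_next(prompt.split()[-1]))

print(aligned_answer("continue: the cat"))       # most frequent continuation
print(aligned_answer("write a fraud mail"))      # refusal
```

The point of the sketch: alignment here is just an extra layer on top of the capable base model. Finetuning on unfiltered data effectively strips that layer away while keeping the underlying capabilities, which is exactly what the "uncensored" Mixtral finetune demonstrates.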

Alignment Explanation 👩🏽‍🏫

My Questions 🤷🏼‍♂️

IMHO 🤗

Within a great team @Comma Soft AG, we are evaluating, selecting, and finetuning open-source LLMs for dedicated use cases.

Credit to: Eric Hartford, Hugging Face & Mistral AI: https://lnkd.in/eyBSi4iu
AI Ethics - clickworkers: https://lnkd.in/eKFfQZfF

#genai #artificialintelligence #aiethics #huggingface #llm #alignment

LinkedIn Post