Carsten Felix Draschner, PhD

"We've beaten GPT-4!" is a sentence that is starting to annoy me.

About Mistrust in LLM Evaluation. Benchmark contamination in LLMs? How to Evaluate GenAI?!

𝗧𝗟;𝗗𝗥 ⏱️

Why do we need LLM benchmarks? 📊

Why LLM Evaluation? 👩🏽‍🔬

My Tasks 🔍

Excerpt from my hands-on criteria 👨🏼‍💻

Credit ❤️

My Questions?

#generativeai #artificialintelligence #llm #machinelearning #benchmark