Blog Posts
-
Insights from the Largest Study on ChatGPT Usage
(Read More) (Original-Websource)
OpenAI & Harvard on Consumer ChatGPT Usage
TL;DR ⏱️
- 1.5M conversations analyzed, set against 700M weekly active users
- Usage is now globally balanced, with strong growth in low- and middle-income countries
- ChatGPT is used for asking, doing, and expressing — across work and private life
-
One of the biggest missing pieces between Open-Source and Open-Weight models ... and a Weird Finding about Pre-training Data!
(Read More) (Original-Websource)
Open-Source ≠ Open-Weight — Pre-training Data Mystery
TL;DR ⏱️
- Pre-training data shapes everything downstream (RAG, RLHF, prompting).
- FineWeb uses per-snapshot deduplication only, not global.
- Raises open questions about contamination vs. genuine generalization.
-
Unmasking AI’s Footprint - The Real Cost of Large Language Models
(Read More) (Original-Websource)
The Hidden Environmental Cost of AI
TL;DR ⏱️
- Mistral AI published the first full life-cycle analysis (LCA) of an LLM
- Training has huge up-front CO₂, water, and resource costs
- Smart choices (renewables, smaller models, optimisation) reduce the footprint
-
Is your Data on European Servers of US-Hyperscalers?
(Read More) (Original-Websource)
Microsoft admits: "No, We Can’t Guarantee It"
TL;DR ⏱️
- Microsoft France admitted under oath: EU data on U.S. hyperscalers isn’t fully protected
- U.S. CLOUD Act enables access without informing you
- Enterprises need exit plans & sovereign AI options
-
Understanding the Levels of LLM Evaluation
(Read More) (Original-Websource)
A Crash-Course in Evaluating Large Language Models 🚀
TL;DR ⏱️
- Complex tasks → open-ended outputs
- Wide range of benchmark approaches
- Big differences in scale & complexity
-
Risk of LLMs Getting Stuck in Local Optima – Are We Training Optimally?
(Read More) (Original-Websource)
Local Optima Risk in Large Language Model Training
TL;DR ⏱️
- LLMs risk getting stuck in local optima during training
- Extreme dimensionality reduces “bad minima” but makes intuition hard
- Research shows many optima are connected, reducing real-world risk
-
LLM Context Engineering and Vision Language Models – What do they have in common?
(Read More) (Original-Websource)
Context Matters – In Language and Vision
TL;DR ⏱️
- Context engineering goes beyond prompt engineering
- More context = better predictions (for humans and models)
- Vision Transformers outperform CNNs by leveraging global context early
-
Real AI Business Process Risks – LLM & RAG Attacks
(Read More) (Original-Websource)
Invisible Context Editing: How Hidden Attacks Undermine AI & RAG Pipelines
TL;DR ⏱️
- Malicious actors hide invisible instructions in PDFs
- LLMs & RAG pipelines can be silently manipulated
- Guard-rail models comparing PDF text vs OCR output can detect anomalies
-
What color is the Coke can in this picture? On Context and Neural Structures
(Read More) (Original-Websource)
Context and Neural Structures in LLMs
TL;DR ⏱️
- LLMs only predict based on the context we provide
- Humans also rely on assumptions instead of raw data
- Understanding neural structures improves AI deployment
-
Battle for Talent and Resources in AI
(Read More) (Original-Websource)
Who Will Stay on Top of the AI Race?
TL;DR ⏱️
- Global battle for AI talent and compute resources
- U.S., China, and Europe positioning themselves with different strategies
- Open-weight tooling becoming an important backbone
-
Bigger AI Risk due to Sleeper Agents, MCP & Reasoning Attacks
(Read More) (Original-Websource)
Now bigger AI Risk due to Sleeper Agents, MCP & Reasoning Attacks
TL;DR ⏱️
- Increased risks from LLM backdoors or data poisoning via tool calling
- “Sleeper Agents” hidden in open-weight models
- Closed-source models vulnerable through pre-tuning data injection
- Possible new attack surface when using MCP
-
Comma Talk 25 – Keynote & GenAI Roundtable Highlights
(Read More) (Original-Websource)
AI is happening in the heart of EU industry! 🤖🇪🇺
TL;DR ⏱️
- Keynote talk on (Gen)AI trends and hype demystification
- GenAI Roundtable with open, unfiltered discussions
- Industry-wide AI applications across Europe
-
Meta’s LLaMA 4 Fraud Has Been Exposed!
(Read More) (Original-Websource)
AI Benchmarks and the Meta LLaMA 4 Controversy
TL;DR ⏱️
- Meta’s Arena Elo #2 model is not the same as the Hugging Face version
- Performance discrepancies raise trust issues in AI benchmarks
- Benchmark gaming and contamination are growing concerns
-
Meta’s Llama 4 Release – Issues with Elo Scores?
(Read More) (Original-Websource)
LLaMA 4 is here – but questions remain about Elo Scores and benchmarks.
TL;DR ⏱️
- Wait for verified SOTA benchmark results and reliable Elo scores
- Elo Score #2 based on "Experimental" Maverick raises questions
- Meta switched to Mixture of Experts (MoE) architecture, similar to DeepSeek and Mistral
- Released multiple model sizes, scaling up to 2 trillion parameters
- Proposing multi-million token context windows
-
VIBE Coding – SWE & AI Engineering Jobs, Code Models & Reinforcement Learning
(Read More) (Original-Websource)
VIBE Coding – Code Models & Huge Opportunity in Reinforcement Learning
TL;DR ⏱️
- VIBE Coding: no longer coding yourself, just prompting desired functions
- Code models + RL = measurable GenAI performance & big opportunities
- Still controversial: productivity vs. technical debt
-
There is also technical foundation for thinking capability in Reasoning LLMs
(Read More) (Original-Websource)
Do LLMs Think – Or Is It Just Next Token Prediction? 🤔
TL;DR ⏱️
- Short explanation of reasoning models
- Technical description: where is the potential for "thoughts" in transformers
- My hypothesis on LLMs and thinking capabilities
-
DeepSeek has huge issues. Code on last Slide. Be careful!
(Read More) (Original-Websource)
DeepSeek Alignment and Censorship Concerns
TL;DR ⏱️
- DeepSeek R1 alignment issues
- Censorship on topics related to China
- Use with caution
-
What a Wonderful Start, More to Come!
(Read More) (Original-Websource)
My journey at Comma Soft AG and my recent promotion to Senior Level 🥳🥰
TL;DR ⏱️
- Promotion to Senior Level
- Leading AI R&D and Product Development
- Building Alan.de for the European market
-
What happens in a learning (AI) brain? Problems of AI Model Knowledge Editing and LLM Cut Off dates lead to more errors in question answering and worse climate footprint
(Read More) (Original-Websource)
Discussing the challenges of AI Model Knowledge Editing and the impact on question answering systems and climate footprint.
TL;DR ⏱️
- Knowledge editing in AI models is complex and can lead to errors.
- Re-training models from scratch has a significant climate impact.
- Alternative solutions like RAG and Knowledge Graphs have their own challenges.
-
Expanding my GenAI content beyond LinkedIn!
(Read More) (Original-Websource)
I'm excited to share that my work-in-progress blog is now live at carstendraschner.github.io!
TL;DR ⏱️
- New blog live at carstendraschner.github.io
- Reach broader audience beyond LinkedIn
- Experiment with Alan.de and GitHub Pages
-
LLAMA 3.3 70B is there! Here are the facts!
(Read More) (Original-Websource)
New Release of LLAMA 3.3 70B with Significant Improvements
TL;DR ⏱️
- Performance Improvements over 3.2 & 3.1
- Similar License of Meta
- On par with other API Models from OpenAI & Anthropic
-
New DE/EU Open Source LLM 🇪🇺 Teuken 7B, Now Truly Open Source?
(Read More) (Original-Websource)
New Model out of Open-GPT-X developed within Germany from European Perspective
TL;DR ⏱️
- New German OS LLM
- 7B, Transperant Docs
- Problems in Alignment?
-
We're Dealing with the Butterfly Effect in LLM Creation Pipelines 🦋🤖
(Read More) (Original-Websource)
When we develop novel LLMs, AI-Agents and GenAI Pipelines for Alan.de or within our various other GenAI projects, we're continuously learning about the Butterfly Effect in GenAI creation pipelines and how to mitigate this problem. 👨🏼💻 🤯 Every change can have a huge impact on the entire pipeline that creates and executes models
TL;DR ⏱️
- The Butterfly Effect can significantly impact LLM creation pipelines.
- Small changes in preprocessing or generation configs can have large ripple effects.
- Constantly evaluating trade-offs and staying up-to-date with new models is crucial.
- Strategies to mitigate the Butterfly Effect are essential for optimal performance.
-
New Official OSI "Open Source AI definition" attacks Meta's LLAMA as "Open Washing"
(Read More) (Original-Websource)
We getting closer what should be considered being open source in GenAI and not only open weight.
TL;DR 📝
- Official OSI definition of Open Source AI available.
- Meta doesn't like it 🙅♂️.
- State-of-the-art (SOTA) models lack being truly Open Source 🤔.
- "Open Washing" Problem of Meta and others?!
-
The Uncanny Valley Phenomenon in GenAI Face Synthesis
(Read More) (Original-Websource)
GenAI hits Uncanny Valley Problems as we have seen in Animation and Computer Games
TL;DR 📝
- Official OSI definition of Open Source AI available.
- Meta doesn't like it 🙅♂️.
- State-of-the-art (SOTA) models lack being truly Open Source 🤔.
- "Open Washing" Problem of Meta and others?!
-
A GenAI Solution/Tweak to Rescue ARD tagesschau webpage?!
(Read More) (Original-Websource)
Brainstorming how old regulations meet new GenAI opportunities.
TL;DR 📝
- Debate over new regulations restricting text content of public broadcasters online.
- Reform could impact news portals like tagesschau.de by limiting text content to only that which accompanies broadcasts.
- GenAI could potentially auto-generate videos from text content to comply with regulations.
- Questions about the impact and regulation of GenAI in public broadcasting contexts.
-
We Cannot Waste Time and Resources through Regenerating GenAI Creations! GenAI Control is Key for Productivity!
(Read More) (Original-Websource)
We will see more and more precise adjustment options in GenAI Tooling
TL;DR ⏱️
- GenAI often lacks precise Adjuments
- Tools and Models start CLosing the Gap
- Major Waste of Time and Resources through inefficient Regenerates
-
Choosing the Right LLM for Your Needs - Key Considerations
(Read More) (Original-Websource)
Consider the key factors when selecting a Large Language Model (LLM) for your project.
TL;DR ⏱️
- Benchmark Performance
- License
- Model Size
- Alignment
-
Stop adding Languages to LLMs! The Potential Drawbacks of Training Multilingual Large Language Models (LLMs) for Performance and Sustainability!
(Read More) (Original-Websource)
Exploring the downsides of creating multilingual LLMs and their impact on performance and resource utilization.
TL;DR ⏱️
- Challenges of building multilingual LLMs
- Inefficiencies in token usage and context length
- Increased hardware costs and reduced token training
- Weighing multilingual models against language-specific models
-
Where Science Meets Innovation - My personal Highlights & Insights into the PG 2024! Do you have answers to the open Questions?
(Read More) (Original-Websource)
Highlights and open questions from the Petersberger Gespräche (PG) 2024, covering AI, energy transition, chip technologies, and more.
TL;DR ⏱️
- AI and consciousness discussions
- Energy transition and regulatory challenges
- Distributed chip technologies in Europe
- Generative AI in media
- Metaverse applications beyond gaming
-
Today's Research Proposal - How to achieve "real" thinking and reasoning in GenAI, rather than just relying on a silent Chain of Thought, as seen in ReflectionAI or possibly GPT-o1?
(Read More) (Original-Websource)
Exploring the potential for achieving true reasoning and thinking in Generative AI models beyond the current Chain of Thought methodologies.
TL;DR ⏱️
- Current state of reasoning in models
- Possibilities for transformers to learn to think
- Customization ideas for achieving true reasoning
- Open questions and discussion points
-
For more sustainability transparency in GenAI! Share your knowledge and reduce energy waste!
(Read More) (Original-Websource)
Emphasizing the importance of transparency and shared knowledge to enhance sustainability in GenAI.
TL;DR ⏱️
- GenAI involves very large models and significant training efforts
- Transparency can help share emissions and reduce energy waste
- Open source models can optimize future development
-
Sustainable Air-Gapped On-Prem LLM Solution! How can we make GenAI available on almost any hardware, and how is it also available as a portable demo on our Alan Notebook
(Read More) (Original-Websource)
Exploring the development of a full-stack GenAI LLM solution that can run on a variety of hardware configurations, including a portable demo setup.
TL;DR ⏱️
- Developing Alan, a full-stack GenAI LLM solution
- Hosted on German hyperscaler infrastructure
- Offers a smaller version, Alan-S-LLM
- Portable demo available on Alan Notebook
-
Combining the Hugging Face Model Platform and Knowledge Graph trend analysis over time could improve GenAI research and reduce waste of energy!
(Read More) (Original-Websource)
Exploring the potential of leveraging knowledge graphs to analyze trends in evolving models for better GenAI research and efficiency.
TL;DR ⏱️
- Leveraging knowledge graphs for GenAI trends
- Identifying high-performing models and best practices
- Potential for a crowd-sourced GenAI cookbook
-
What is the perfect approach to adjust an LLM to your GenAI use case?
(Read More) (Original-Websource)
Exploring various methods to customize LLMs for specific GenAI use cases, ranging from simple to complex approaches.
TL;DR ⏱️
- Various ways to customize LLMs for specific use cases
- Approaches vary in difficulty and complexity
- Pros and cons of different methods
- More dimensions to improve GenAI use cases
-
These results give me hope for sustainable AI 🌱
(Read More) (Original-Websource)
I'm impressed by some of the recent advances in the field of "small" open-weight Language Models (LLMs).
TL;DR ⏱️
- Increased documentation supports reproducibility
- Data quality improves model performance
- Model distillation reduces hardware needs
-
LLMs - Big vs Small. Bigger is Better!? OR Let's not waste energy!?
(Read More) (Original-Websource)
The AI community is abuzz with debates over the efficacy of large versus small language models. Both have their own merits and limitations.
TL;DR ⏱️
- AI community debates model sizes
- Massive models vs. smaller, efficient models
- Insights and future predictions
- Links to further reading
-
GenAI, what is plagiarism? Its Impact on Science. How should it be handled? What is your perspective?
(Read More) (Original-Websource)
Discussing the implications of GenAI on scientific work and the thin line between acceptable use and plagiarism.
TL;DR ⏱️
- Use of GenAI in scientific work
- Acceptable vs. debatable vs. critical usage
- Questions and concerns about plagiarism
- The pressure on researchers and students
- Opportunities for better research
-
Adjust GenAI responses towards more ethical behavior possible through system prompts!? Do you trust in such LLM-chat prepended pamphlets?
(Read More) (Original-Websource)
Exploring the potential and challenges of using system prompts to guide LLM behavior towards ethical outputs.
TL;DR ⏱️
- GenAI chat interactions often include system prompts
- System prompts aim to guide ethical LLM behavior
- Challenges exist in ensuring compliance and formulation
- Questions on designing and revealing system prompts
-
Alternative to GenAI creativity? Watch and try out these fun Evolutionary Algorithms. Problem-solving without GenAI and SGD-based approaches explained!
(Read More) (Original-Websource)
Exploring Evolutionary Algorithms as an alternative to GenAI for problem-solving, using a fun 2D vehicle race example.
TL;DR ⏱️
- There is hype around GenAI and LLMs
- Evolutionary Algorithms (EAs) offer an alternative
- A fun example of EAs using a 2D vehicle race
- Steps involved in EAs explained
-
Fast New ELO Over MixEval. But Looking into Code, I Got Doubts!
(Read More) (Original-Websource)
Have you seen that MixEval Hard has two interesting but little critical aspects
TLDR
- Examined MixEval, an open-source LLM benchmark
- Uses LLMs as judges to predict continuous values
- Evaluation data contains duplicates
- Uncertain about trusting these methods
-
Do we need another GenAI solution? How & why we developed a full-stack GenAI LLM+RAG tool called Alan. A sneak peek at what I am currently working on.
(Read More) (Original-Websource)
An overview of the motivations and technical aspects behind developing our own GenAI solution, Alan, at Comma Soft AG.
TL;DR ⏱️
- Diverse GenAI solutions exist
- Unique motivations for developing our own tool
- Technical advantages of our solution
- Questions on custom development vs. wrappers
-
Will we reach AGI, and if so, by transformers-based architectures? Share your perspective!
(Read More) (Original-Websource)
Exploring the potential of transformers-based architectures in achieving Artificial General Intelligence (AGI) and the ongoing debate surrounding it.
TL;DR ⏱️
- GenAI's impact on AGI discussions
- Technical challenges with transformer-based architectures
- Optimistic yet cautious approach at Comma Soft AG
-
Do you differentiate AI Ethics principles between AI/ML fields like GenAI/LLMs or Knowledge Graph-based ML? How do we deal with so-called facts on the internet as training data?
(Read More) (Original-Websource)
Exploring the nuances of AI ethics across different AI/ML fields and handling internet-based training data responsibly.
TL;DR ⏱️
- AI ethics principles across different AI/ML fields
- Personal background and perspective on AI ethics
- Recommendations and further reading on AI ethics
- Questions to ponder on AI ethics practices
-
What expectations do you have regarding the values and norms of your GenAI chat assistants? Highly sensitive topic in the LLM space! My take...
(Read More) (Original-Websource)
Exploring the ethical considerations and expectations surrounding the values and norms embedded in GenAI chat assistants.
TL;DR ⏱️
- LLMs generate text based on training
- Alignment and finetuning influence behavior
- Ethical considerations in different languages
- Need for a holistic view on model behavior
-
Be careful when you speak of Open (Source) GenAI. Why OpenAI and Meta (shouldn't) use the word Open within their GenAI efforts?
(Read More) (Original-Websource)
Examining the implications of using the term "Open" in the context of GenAI by organizations like OpenAI and Meta.
TL;DR 🚅
- Open Source is a huge and important field in computer science and AI
- The word "Open" is used widely within the GenAI field: OpenAI, Open Source LLMs
-
Thanks to the Open Source Community for all their efforts! Greetings from PyCon 2024
(Read More) (Original-Websource)
Expressing gratitude to the open-source community and sharing experiences from PyConDE & PyData Berlin 2024.
TL;DR ⏱️
- Trip to PyCon with colleagues
- Attended insightful talks in various AI fields
- Appreciation for open-source community
- Gratitude to all contributors and supporters
-
Who will take care of truly low-resource languages? A good step towards more fair GenAI LLM pricing at OpenAI for Japanese-using people!
(Read More) (Original-Websource)
Exploring the challenges and recent developments in addressing low-resource languages within the GenAI landscape, with a focus on OpenAI's efforts for the Japanese language.
TL;DR ⏱️
- Issues with LLMs for low-resource languages
- Major challenges with different character languages
- OpenAI's new dedicated model for Japanese
- Concerns about AI ethics and inequality
-
What is your preferred LLM family? And do you start with an already finetuned LLM? Why you have chosen this LLM? I love to hear your perspective!
(Read More) (Original-Websource)
Understanding the preferences and choices behind selecting specific LLM families and their finetuned variants.
TL;DR ⏱️
- GenAI for text implemented by LLMs
- Many open-source models available
- Continuous influx of new models
- Key foundation model families
- LLM-based GenAI pipelines at Comma Soft AG
-
NVIDIA Benchmark might be WRONG cause it states - You lose money AND LLM inference speed if you add more NVIDIA A100. This NVIDIA Benchmark is NOT reliable.
(Read More) (Original-Websource)
Analyzing the reliability of NVIDIA's benchmark results and the implications for LLM inference speed and hardware investment.
TL;DR ⏱️
- Terms and background on LLMs and inference
- Strange findings in NVIDIA's benchmark results
- Concerns about the reliability of these benchmarks
- Questions and further reading on the topic
-
Too many LLMs?! How to keep track with all the Open Source Models? Identify the finetuned-masked LLMs and its position within the GenAI landscape!
(Read More) (Original-Websource)
Navigating the complex landscape of GenAI models can be challenging, but it's crucial to understand the foundational and finetuned models to make informed decisions.
TL;DR ⏱️
- The GenAI landscape is crowded with many models
- Keeping track of innovations and true effects is hard
- Transparency issues with many so-called "open-source" models
- Recommendations for navigating this landscape
-
Be careful when you are using LLAMA-2! Legal risks & Sustainability Implications due to LLAMA-2 is (NOT) Open Source.
(Read More) (Original-Websource)
Important considerations regarding LLAMA-2's legal and sustainability implications.
TL;DR ⏱️
- LLAMA-2's legal and sustainability challenges
- Not truly open-source according to OSD
- Technical implications of its license
- Meta's restrictions and their broader impact
-
The major players in GenAI are facing challenges with their Generative AIs. GenAI capabilities and security issues related to LLMs Tools • 37C3 Presentation
(Read More) (Original-Websource)
Challenges and security issues in GenAI and LLMs, highlighted at 37C3.
TL;DR ⏱️
- GenAI has immense capabilities
- Ethical and secure GenAI pipelines are crucial
- 37C3 presentation on security issues and exploitations
- Categories of threats and challenges in GenAI
-
Not prompting in English?... You have Problems!! LLM Language Barriers • Democratizing GenAI and fair pricing
(Read More) (Original-Websource)
Exploring the challenges of using Generative AI with languages other than English and the implications for cost and performance.
TL;DR ⏱️
- Tokenizers and their role in LLMs
- Challenges of non-English prompts
- Efficiency and fairness in GenAI
- Recommendations for LLM pipelines
-
Ever wondered about open source LLLM sizes - 7B, 13B, 70B?
(Read More) (Original-Websource)
Where do those model sizes come from?... My findings!
TL;DR ⏱️
- Open-source LLM alternatives to AIaaS
- Hugging Face as a source for open-source models
- Many models are finetuned variants
- Bigger models imply slower inference & higher costs
- Different use cases require different model capabilities
- Questioning the parameter step sizes of models
-
We've beaten GPT4! ... is a sentence which starts to annoy me.
(Read More) (Original-Websource)
About Mistrust in LLM Evaluation. Benchmark contamination in LLMs? How to Evaluate GenAI?!
𝗧𝗟;𝗗𝗥 ⏱️
- News is flooded with LLMs being “better” than GPT4
- LLM Evaluation is Difficult
- Benchmark Contamination is a serious issue
- Build your own use case specific benchmarks
-
DALLE has surprising guardrails. Your image is not filtered based on your prompt. "Dead cookies" may be generated ...sometimes
(Read More) (Original-Websource)
Interesting findings on DALLE's content filtering mechanisms.
TL;DR ⏱️
- DALLE-3 filters your content AFTER image creation
- With prompt “dead cookies” you can reproduce inconsistent filtering over OpenAI API
- 40% of cases with same “dead cookies” prompt stop through content filter and 60% reach us over API
-
Evil LLMs available! Break GenAI Alignment through finetuning!
(Read More) (Original-Websource)
Need for LLM Alignment transparency?
TL;DR ⏱️
- Powerful LLMs are mostly aligned
- Alignment can be broken through finetuning
- Need for transparency in alignment processes
- Questions about alignment in LLMs
-
LLAMA2 13B is faster than LLAMA2 7B, according to NVIDIA benchmark!
(Read More) (Original-Websource)
Interesting findings on NVIDIA's LLAMA 2 benchmark results.
TL;DR ⏱️
- NVIDIA LLAMA 2 Benchmark results
- LLAMA 13B reported faster than LLAMA 7B
- Questions about the accuracy of these findings
- Seeking community insights