
Explainability

By Tech Brew Staff

Definition:

Explainability is more or less what it sounds like: the ability to explain why an AI model has yielded a given result. For very simple machine learning algorithms, that can be straightforward: you can trace exactly how the input numbers are transformed at each step of an equation to reach the end result. It's tedious, but doable. Models like these are collectively referred to as white-box AI.
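
To make that concrete, here's a minimal sketch of why a linear model counts as white-box. The feature names and weights are invented for illustration; the point is that every prediction breaks down into per-feature contributions you can read off directly.

```python
# A minimal sketch of white-box traceability: a linear credit-scoring
# model whose prediction decomposes, term by term, into the contribution
# of each input. Feature names and weights here are made up.

weights = {"income": 0.4, "debt": -0.7, "credit_history": 0.9}
bias = 0.1

applicant = {"income": 1.2, "debt": 0.5, "credit_history": 0.8}

score = bias
print(f"bias contributes {bias:+.2f}")
for feature, weight in weights.items():
    contribution = weight * applicant[feature]
    score += contribution
    print(f"{feature} contributes {contribution:+.2f}")

print(f"final score: {score:.2f}")
```

Every step of the model's "reasoning" is right there in the printout, which is exactly the property that breaks down as models get bigger.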

OK, explain it a little more

For most of the more sophisticated neural networks at issue when we discuss AI today, it's much more difficult, if not impossible. That's why these algorithms are often referred to as black-box systems. When it comes to large language models (key word: large), which by nature consist of billions of parameters, there's no known way to explain exactly how they arrive at their results.
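
As a rough illustration of why scale defeats line-by-line tracing, consider a toy two-layer network (sizes invented here; real LLMs have billions of parameters, not a hundred-odd). Every output mixes every input through every weight, so there's no single equation term to point at and say "this is why."

```python
import numpy as np

# Even in a toy network, each output depends on every weight through
# nonlinear mixing, and parameter counts multiply with width and depth.
# All sizes and values below are illustrative placeholders.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # 128 parameters
W2 = rng.normal(size=(8, 1))    # 8 more

x = rng.normal(size=16)          # one input example
hidden = np.maximum(0, x @ W1)   # ReLU entangles all 16 inputs per unit
y = hidden @ W2

print("parameters:", W1.size + W2.size)
print("output:", float(y[0]))
```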

The best that AI researchers currently have is an emerging subfield called mechanistic interpretability, which seeks to understand the reasoning processes at play in massive foundation models—basically, how they “think.” Anthropic, for instance, has been able to identify clusters of concepts within Claude’s neural networks. More recently, it has begun to trace how the system plans out sentences and connects concepts across different languages.
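
One of the simplest ideas in this area (a far cry from what labs like Anthropic actually do, but useful for intuition) is to look for directions in a model's activation space that line up with a concept. The sketch below fakes the activations with random numbers purely to show the shape of the technique.

```python
import numpy as np

# A toy sketch of a contrastive "concept direction": contrast average
# activations on inputs that mention a concept against ones that don't.
# The activations here are random placeholders; a real study would
# record a model's actual hidden states.

rng = np.random.default_rng(0)
dim = 64

concept_acts = rng.normal(loc=0.5, size=(100, dim))  # concept present
other_acts = rng.normal(loc=0.0, size=(100, dim))    # concept absent

# The difference of means gives a crude direction tied to the concept.
direction = concept_acts.mean(axis=0) - other_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# Score a new activation by projecting onto that direction.
new_act = rng.normal(loc=0.5, size=dim)
print("concept score:", float(new_act @ direction))
```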

The lack of explainability in most sophisticated AI can be an issue for a slew of reasons. If a company is using AI to make decisions about loans, jobs, or housing, regulators typically require more justification for denying an applicant than an algorithm’s say-so. AI systems are also prone to all sorts of biases, and the lack of explainability can make it harder to root them out.

Absent explainability in most modern AI systems, safety researchers must instead rely on evaluation, testing, and validation to establish trustworthiness. The industry has also somewhat moved away from explainability as a goal, focusing instead on transparency and openness around the underlying weights, source code, and training data.
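
To show what that evaluation-style approach looks like in practice, here's a toy harness: rather than explaining the model's internals, you test its behavior against known cases. The `model` function is a hypothetical stand-in for whatever system you'd actually be testing.

```python
# A toy sketch of behavioral evaluation: score a model's outputs against
# expected answers instead of explaining how it produced them.

def model(prompt: str) -> str:
    # Placeholder; a real harness would call an actual model here.
    return "Paris" if "capital of France" in prompt else "unsure"

eval_cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

passed = sum(model(q).strip() == expected for q, expected in eval_cases)
print(f"passed {passed}/{len(eval_cases)} checks")
```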