An interview with Marco Wobben, information modeling expert and creator of CaseTalk

The Lost Art of Understanding Data

In an era where organizations are drowning in data yet starving for meaning, there's a methodology developed decades ago that addresses a problem more relevant today than ever: how do we ensure that the people building IT systems truly understand what the business needs?

Marco Wobben has been working on fact-based modeling since the early 2000s, when a university professor handed him the source code of a modeling tool and asked him to maintain it. "I had to learn it from the inside out," he explains. "And now, with a lot of professors retired and the young people not having caught on yet, I'm kind of being considered the expert."

What Is Fact-Based Modeling?

At its core, fact-based modeling is about grounding abstract discussions in concrete examples. Instead of debating what a "customer" is at a conceptual level—which Wobben calls "type level arguments"—fact-based modeling forces you to use actual data.

"I'm not talking about 'customer buys product,'" Wobben explains. "I'm saying 'customer with customer number 123 buys product XYZ'—where both 123 and XYZ are actual data, real examples to ground the discussion."

This simple shift has profound implications. When you use real examples, you quickly discover that different departments may use the same words but mean entirely different things.

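To make the difference between a type-level statement and a grounded one concrete, here is a minimal Python sketch. The `FactType` class, its field names, and the role names `customer_number` and `product_code` are assumptions made for illustration; they are not taken from CaseTalk or any other tool. The idea is only that a fact type carries a sentence template together with its example population.

```python
from dataclasses import dataclass, field


@dataclass
class FactType:
    """A sentence template plus the concrete examples that ground it."""
    name: str
    template: str
    population: list = field(default_factory=list)

    def add(self, **values):
        # Formatting first means an example with a missing role is rejected
        # (str.format raises KeyError) before it reaches the population.
        self.template.format(**values)
        self.population.append(values)

    def verbalize(self):
        """Render every stored example as a readable sentence."""
        return [self.template.format(**row) for row in self.population]


purchase = FactType(
    name="Purchase",
    template="customer with customer number {customer_number} "
             "buys product {product_code}",
)
purchase.add(customer_number=123, product_code="XYZ")
sentences = purchase.verbalize()
print(sentences[0])
```

Keeping the examples next to the template is the design choice that matters here: the sentence a domain expert reads is produced from the same data a modeler would later structure.
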
The Inventory Problem

Consider the word "inventory." Everyone in an organization might agree on the definition: it's the number of units you have of a certain article. But ask different departments how many they have:

  • Sales says "we have three" (because they already sold some)
  • Purchasing says "we have eight" (because they ordered more)
  • Warehouse says "we only have one on the shelf"

"Even though everybody agrees on the definition, they have different data," Wobben notes. "It's only by looking at the actual data that you start realizing—wait a minute, we're all calling this inventory, but we're seeing different things."

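The inventory mismatch can be sketched in a few lines of Python. The quantities are the ones quoted above; the article code "XYZ" and the function name are assumptions for illustration. The sketch does not try to decide who is right. It only shows that putting the concrete numbers side by side exposes a disagreement the shared definition hides.

```python
from collections import defaultdict

# One concrete fact per department: (department, article, quantity).
# The article code "XYZ" is illustrative.
inventory_facts = [
    ("Sales", "XYZ", 3),
    ("Purchasing", "XYZ", 8),
    ("Warehouse", "XYZ", 1),
]


def conflicting_counts(facts):
    """Return the articles for which departments report different quantities."""
    per_article = defaultdict(dict)
    for department, article, quantity in facts:
        per_article[article][department] = quantity
    return {
        article: counts
        for article, counts in per_article.items()
        if len(set(counts.values())) > 1
    }


conflicts = conflicting_counts(inventory_facts)
for article, counts in conflicts.items():
    print(article, counts)
```
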
The Problem with Traditional Data Modeling

Traditional data modeling focuses on tables, entities, classes, attributes, columns, and foreign keys. "The data itself is a secondary citizen," Wobben observes. "It's all on a type level and usually geared towards how we structure the storage of data—not how you and I figure out how we talk about the data."

This technical focus made sense when systems were built for specific departments where everyone knew the context. But as IT proliferated and computers appeared in every department, each system was built in isolation, with implicit context that was never documented.

"This is where data warehousing came in," Wobben explains. "But the data warehousing team now had to figure out—what does this data even mean? They were really trying to rediscover the context, reverse-engineering the human aspects of understanding on top of the data structure."

Facts, Claims, and the Illusion of Truth

The word "fact" in fact-based modeling doesn't mean absolute truth. Wobben is careful to distinguish between reality and what we record:

"There's something seriously flawed with the idea of a 'single point of truth.' We all perceive from our own bias and subjective reality. So there is no such thing as truth. But when you store data and consider it to be true in your world, then you can state that as a fact—as in 'I'm writing this down and me and my colleagues agree on it.'"

He's even considering whether "claims" might be a better term: "All the source systems claim a certain statement about what happened. Some might be alternate facts."

How Fact-Based Modeling Works in Practice

Wobben illustrates with a simple example:

Start with a fact statement: "Marco Wobben lives in Utrecht."

This is something people can agree on. But there's embedded knowledge here:

  • Marco Wobben is the citizen (identified by first name and surname)
  • Utrecht is the city (identified by city name)
  • The relationship is city of residence

Now populate with more examples: Marco's wife lives in Utrecht, his kids live in Utrecht. Each of these statements is an instance of the same fact type, "city of residence."

Then ask constraining questions: "Could Marco live in Utrecht AND New York at the same time?"

The business expert would say no—there's something wrong with that. Through these interactive sessions, you discover business constraints that drive data structure:

  • If you can only live in one city → city becomes an attribute of citizen
  • If you can live in multiple cities → you need a linking table

"These constraints steer how the data is structured," Wobben explains.

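A minimal sketch of that step, in Python. The function names and the relational layouts they return are illustrative assumptions, not the output of any particular tool. Note that the constraint itself comes from the business expert's answer; the code only checks whether the example facts contradict it and then picks a structure accordingly.

```python
def one_city_per_citizen(population):
    """True if no citizen appears with two different cities in the examples."""
    seen = {}
    for citizen, city in population:
        # setdefault stores the first city seen and returns it on later visits.
        if seen.setdefault(citizen, city) != city:
            return False
    return True


def suggest_structure(single_city):
    """Map the expert's answer to one of the two layouts described above."""
    if single_city:
        # City is a plain attribute of the citizen.
        return ["citizen(first_name, surname, city_name)"]
    # Multiple cities allowed: a separate linking table is needed.
    return [
        "citizen(first_name, surname)",
        "residence(first_name, surname, city_name)",
    ]


examples = [(("Marco", "Wobben"), "Utrecht")]
counter_example = examples + [(("Marco", "Wobben"), "New York")]

print(one_city_per_citizen(examples))         # examples respect the constraint
print(one_city_per_citizen(counter_example))  # the "Utrecht AND New York" case
print(suggest_structure(single_city=True))
```
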
The Alignment Challenge

Different systems identify the same concepts differently. One department uses names, another uses email addresses. Fact-based modeling handles this naturally:

"My citizen can either be identified by first name and surname OR by email address. So in the modeling, it's easy to find statements that generalize—allowing different ways of identification."

But this creates a data challenge: which combination of first name and surname goes with which email address? You need another fact statement of the form "Marco Wobben has email address ...", with his actual address filled in.

"Nothing in my communication has to be altered to support alignment," Wobben notes. "Doing data mapping becomes just part of the verbalization."

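As a rough Python sketch of that idea: facts identified by name can be re-keyed to email identification using nothing more than the population of the mapping fact type. The address below is a placeholder for illustration, not a real one, and the function and variable names are assumptions.

```python
# Facts from a system that identifies citizens by (first name, surname).
residence_by_name = [(("Marco", "Wobben"), "Utrecht")]

# Population of the mapping fact type
# "<first name> <surname> has email address <email>".
# The address is a placeholder, not Wobben's real one.
has_email = {("Marco", "Wobben"): "marco@example.com"}


def align_to_email(name_keyed_facts, has_email):
    """Re-identify name-keyed facts by email; report names with no mapping."""
    aligned, unmapped = [], []
    for name, city in name_keyed_facts:
        email = has_email.get(name)
        if email is None:
            unmapped.append(name)
        else:
            aligned.append((email, city))
    return aligned, unmapped


aligned, unmapped = align_to_email(residence_by_name, has_email)
print(aligned, unmapped)
```

Names that have no mapping fact are returned separately rather than dropped, so the gaps in the alignment stay visible to the people who can fill them.
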
Technical Debt and Business Debt

Organizations accumulate two kinds of debt:

Technical debt: Systems that are undocumented or poorly documented.

Business debt: Nobody knows what the data meant to the business anymore.

"Business wanted changes faster, and IT said 'we can deliver faster with this new tech.' But neither party realized what they were losing along the way," Wobben observes.

In government, the problem is even worse: "There's a massive gap between the legal articles made up by politicians—full of compromises and loopholes—and the actual systems used by government bodies. Now we change the law—which system do we need to change? Or we're looking at data, but we have no idea if we're even allowed to have it."

Why Not Just Use AI?

With large language models capturing everyone's attention, isn't AI the solution?

"An LLM is wonderful, magical stuff," Wobben acknowledges. "But it's not going to solve the real thinking issues. We can do data profiling, but we're still not sure if we got the context right. We can use LLMs, but we're still not sure if the relationships are correct. There's still that gap of knowledge that we lost and somehow need to reintroduce."

The key insight: LLMs can generate plausible-sounding explanations, but they can't verify whether those explanations match your organization's actual reality. Fact-based modeling, with its grounding in concrete examples, provides exactly the kind of verified context that keeps AI output useful rather than hallucinated.

The Knowledge Evaporation Problem

Modern career patterns make documentation even more critical:

"With short-lived career jumps, you may find an expert, but they might be gone in four years. Seniors who actually know the organization are reaching retirement. The average job tenure is four to six years. The knowledge evaporates while we're trying to document it."

Traditional data modeling creates diagrams and schemas, but "there's not really a story there." Architects and data modelers often say, "Yeah, that's what I do in my head." Wobben's obvious question: "But does it leave your head? Does it get written down so your colleague can take over?"

From Information Model to Implementation

The power of properly captured information models is that they can generate technical artifacts automatically:

  • SQL databases with comments containing the original business language
  • Test data that came from the interview examples
  • Database views representing full user stories
  • Data Vault models, normalized models, or JSON schemas

"The physical parts become automatable. The data model becomes automatable. And having all the semantics and verbiage and examples, it makes it verifiable and readable by the business."

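To give a feel for what such generation might look like, here is a small Python sketch that turns one fact type into SQL: the business sentence becomes a comment, and the interview example becomes test data. This is not CaseTalk's generator; the table name, column names, and the blanket `VARCHAR(100)` type are assumptions chosen for illustration. The result is run against an in-memory SQLite database to confirm it is valid SQL.

```python
import sqlite3


def to_sql(table, columns, verbalization, examples):
    """Emit a commented CREATE TABLE plus INSERTs built from the example facts."""
    lines = [f"-- Fact type: {verbalization}"]
    column_defs = ",\n".join(f"    {c} VARCHAR(100) NOT NULL" for c in columns)
    lines.append(f"CREATE TABLE {table} (\n{column_defs}\n);")
    column_list = ", ".join(columns)
    for row in examples:
        # Double any single quote so the value is a valid SQL string literal.
        values = ", ".join("'" + str(v).replace("'", "''") + "'" for v in row)
        lines.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return "\n".join(lines)


sql = to_sql(
    table="city_of_residence",
    columns=["first_name", "surname", "city_name"],
    verbalization="Marco Wobben lives in Utrecht.",
    examples=[("Marco", "Wobben", "Utrecht")],
)
print(sql)

# Check that the generated script actually runs and the example round-trips.
conn = sqlite3.connect(":memory:")
conn.executescript(sql)
rows = conn.execute(
    "SELECT first_name, surname, city_name FROM city_of_residence"
).fetchall()
conn.close()
```
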
The IBM Banking Model Paradox

Wobben shares a telling observation: "There's not a bank in the world that hasn't purchased the IBM banking model. At the same time, I dare any bank to have actually implemented it."

The problem? To implement a standard model, you first have to know what you already do—and nobody has that documented. Plus, every bank wants to differentiate from competitors, so they don't really want to conform to one standard.

"There is no real universal pattern because everybody tries to give it their own little edge. You have to capture that edge, not just build something you think will fit."

Data Outlasts Technology

One insight that should give every organization pause:

"I've seen systems where the database was designed with rigor and the software was developed three times over, but the database didn't change. Mainframe became Windows, became Internet, became mobile. The technology changed, the business processes changed—but the data itself didn't change."

This underscores why getting the data model right, and preserving the story behind it, matters more than any particular technology choice.

Getting Started with Fact-Based Modeling

For those interested in exploring further:

  1. Start with a problem domain. You can't just jump into the jungle and start describing insects on the jungle floor. There's always a problem to solve, an integration challenge to address.

  2. Sit down with subject matter experts. Carve out time to ask: What's the issue? What are you doing? Start writing it down.

  3. Ground everything in examples. Don't accept abstract definitions. If someone can't give you a concrete example, they're likely outside their scope of expertise.

  4. Tie language to data. Keep the examples connected to the fact types so you can transform models into any artifact while preserving the original meaning.

Resources

  • CaseTalk: casetalk.com — Software tool supporting fact-based modeling
  • "Just the Facts" by Marco Wobben (Technics Publications) — An overview for management, architects, modelers, and developers
  • DAMA-DMBOK — Contains articles on fact-based modeling (improved in version 2)
  • Wikipedia — Background on fact-oriented modeling

Fact-based modeling won't solve all your data problems overnight. But in a world where everyone's chasing the next silver bullet, there's wisdom in an approach that starts with a simple question: Can we talk to each other, and are we writing down how we do that?