dog PNG50380 smallNormalization is not always fun, and even in 1989 someone by the name of Marc Rettig tried to make it more fun and easy to absorb. He must have thought, "everybody loves puppies", let's make everybody love normalization by using puppies as an example. Naturally he was right, everybody loves puppies. :-P

In this article we are going through the process of verbalizing the data shown, and information stated from the original poster published in 1989. First though we need to credit Thomas Friesendal for bringing the poster back to life in his article at dataversity.

rules of normalization

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.

Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).

~ Wikipedia.

The weird part of this poster is that it immediately assumes we have non normalized data available. Naturally, we would start at looking at data and have the business user or domain expert state facts about the data. But we'll play ball...

Apparently we have data of some sort which lists:

  • puppy number, puppy name
  • kennel code, kennel name, kennel location
  • trick id, trick name, trick where learned, skill level

Eliminate Repeating Groups

Make a separate table for each set of related attributes, and give each table a primary key.

Separate the puppies from their learned tricks. Puppies and tricks are identified independently. This can be accomplished by simply stating the names of the puppies and the tricks:

Puppy Name
"Puppy 51 is called Chief."
"Puppy 52 is called Boo."
"Puppy 53 is called Phydeaux."
"Puppy 54 is called Bugger."

Trick Name
"Trick 27 is called Roll Over."
"Trick 16 is called Nose Stand."
"Trick 18 is called Sit Up and Beg."

Eliminate redundant data

If an attribute depends on only part of a multivalued key, remove it to a separate table.

Having broken apart the puppies and the tricks, we're left with information which connects the two concepts. Where did the puppy learn the trick, and secondly at which level does it master its trick?

Puppy Trick where Learned
"Puppy 52 learned trick 27 at kennel 16."
"Puppy 53 learned trick 16 at kennel 9."
"Puppy 54 learned trick 27 at kennel 9."

Puppy Trick Skill Level
"Puppy 52 knows trick 27 at level 9."
"Puppy 53 knows trick 16 at level 9."
"Puppy 54 knows trick 27 at level 5."

eliminate columns not dependent on key

If attributes do not contribute to a description of the key, remove them to a separate table.

Break down the puppy information further down into puppies and kennels.

Puppy Location
"Puppy 54 is housed in kennel 16."

Kennel Name
"Kennel 5 is named Daisy Hill Puppy Farm."
"Kennel 6 is named Khabul Khennels."

Kennel Location
"Kennel 5 is located in Chicago."

isolate independent multiple relationships

No table may contain two or more 1:n or n:m relationships that are not directly related.

The costumes the puppies can wear should be separated out from the puppies and the tricks they know.

Puppy Costume
"Puppy 52 can wear a Groucho Marx mask."
"Puppy 54 can wear a Wet suit."

isolate semantically related multiple relationships

There may be practical constraints on information that justify separating logically related many-to-many relationships.

The breed offered by the kennel, is not the same information as the breed certain breeders handle, or which breeds our breeders provide for certain kennels.

Breed
"The breed Afghans exists."

Kennel-Breed
"Kennel 5 has dogs of the breed Spaniel."
"Kennel 5 has dogs of the breed Dachshund."
"Kennel 5 has dogs of the breed Banana-Biter."

Kennel-Breeder
"Kennel 5 sells dogs from breeder Acme."
"Kennel 5 sells dogs from breeder Puppy Factory."
"Kennel 5 sells dogs from breeder Whatapuppy."

Breed Full Name
"The full name of the breed Banana-Biters is West Indian Banana-Biters."

Breeder Breed
"Breeder Acme breeds Spaniel."
"Breeder Acme breeds Dachshund."
"Breeder Acme breeds Banana-Biter."
"Breeder Puppy Factory breeds Spaniel."
"Breeder Puppy Factory breeds Dachshund."
"Breeder Puppy Factory breeds Banana-Biter."
"Breeder Whatapuppy breeds Spaniel."
"Breeder Whatapuppy breeds Dachshund."
"Breeder Whatapuppy breeds Banana-Biter."

 Finally, there are a few more facts mentioned in the text of the poster. Trivial, but probably worth mentioning to be more complete:

Puppy Cause of Death
"Puppy 51 deceased for it was eaten by a lion."

Breed Full Name
"The full name of the breed Banana-Biters is West Indian Banana-Biters."

The posters goal was to explain the steps of normalization of data for database implementations, hence the terms "table" and "key". The interesting part of stating elementary fact expression as done above, is that the actual normalization is merely a step to transform the elementary facts into a grouped and lexicalized form, forming tables as the outcome.

So, let's start with a diagram which shows all the facts above in a single diagram.

puppytricks

Facts form the database

After having transformed the fact oriented information model with a click of a button, we'll end up with the following set of tables for our database:

puppytricks.erd

Facts form other artifacts

As one may expect, databases are not all we want or need. There's also a whole wide world of software developers. So, why not also present this as a UML Class diagram:

puppytricks.uml

High level view

As you can see, the process of normalization is about structuring the data into tables. But when stating facts, we can have this process done almost automatically. And for our managers who just want to know the high level diagrams? Yes, we can also fold our models into an even simpler diagram, called a concept map:

puppytricks.map

questions for our business experts

As can be seen in the first diagram, a lot is assumed. The poster does not answer all questions, and provides some information and data which are not addressed further. Here are some:

  • Which puppy is called Fifi?
  • What's considered a location where the puppy learned the trick?
  • Is there a relation between breeds and puppies? And if so, how does that impact the facts about breeds, breeders and kennels?

On top of this are all the time related questions:

  • Do puppies relocate to different locations over time, and do we need to keep track of those?
  • Are we interested when the puppy learned a trick?
  • Do we need to know how old puppies are?

And since the article is about structuring, constraints are not really discussed.

  • Do all puppies have a name?
  • Can one puppy wear more than a single costume?
  • We assumed so, but can a puppy only learn a trick at a location where it is housed?

Conclusion

All these questions need to be answered by experts. The poster is a cute start to get the idea of normalization across, but there's so much more that drives the structure of the database. And these are barely addressed. The constraints are very important, as are the time-related topics. Some require more facts, others are merely technical implementations. It demonstrates the Fact Oriented Model is so much more than a way to derive a data model.

Do you still think normalization is hard, or would you want to adopt your own little puppy now? Whatever your next step, I certainly hope it is the right one. In the meanwhile you may download this project from our github page. Best of luck, and see you next time, online or in the park. :-)

The poster

rettigNormalizationPosterFrom an email exchange with Marc, I've learned about the background of it. I post relevant parts of his email, and his kind words, below.

Hello Marco,

Thank you for your kind note. I continue to be surprised at the life this poster has had. I’m gratified that people still find it useful, and grateful that it has given me a little window into conversations in the data and database communities over the years since I moved on to other quests.

Around the time that poster came out I was getting more heavily involved with object-oriented programming, working with people who saw object models as a kind of philosophical Truth. Closer, I think, to the “Fact Oriented Models” you describe. The work of object modeling certainly puts one in a different state of mind than I experienced when working with relational databases and questions of normalization.

I enjoyed reading your article. For what it’s worth coming from someone who has been away from these ideas since the mid-1990’s—I found it insightful!

As you surmised, the poster was purely an attempt to make basic concepts more accessible. At the time there were books (Codd and Date, for example), some dense articles in magazines, but little in the way of an accessible on-ramp to the ideas. Philip Chapnick, then editor of Database Programming and Design magazine, had the idea that such a poster would make a good subscription incentive for the magazine. He gave me the challenge, I did what I could, and here we are.
...
Lastly, I’m attaching a copy of the poster that fixes a bug. Somehow a version got out there which was missing a final sentence in the section on first normal form. The attached version restores it.
...
Cheers to you. Thanks for all you do.
Marc

CaseTalk

We make IT better. Together!