Foreign Keys. Who Needs Them?

Foreign Keys:  Should you use them or are they to be avoided like the plague? First off, what is a foreign key? I’m going to assume most of you know what they are, but just in case, this is how Wikipedia defines them:

In the context of relational databases, a foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table.

Basically a foreign key is how the database relates two tables.

Don’t Join w/o a Foreign Key

Is it the foreign key that relates two tables or is it really the data in the table that does this? I mean, can’t we just do a join between two tables and resolve any relationship? You do resolve a relationship with a join, but, when you do a join you are resolving a relationship between the records in the tables, not the tables themselves.

You might wonder if you can resolve a relationship between records without defining a relationship between tables, wouldn’t it be easier and more flexible to just not bother defining the relationship? But by not defining the relationships between your tables with foreign keys you are keeping these relationships a secret from yourself, your teammates and your relational database and this will make your life harder. Now doesn’t keeping the relationships a secret from a RELATIONAL DATABASE seem like a naughty thing to do?  It sure does, and it certainly is naughty.

When you define these foreign key relationships you provide two very important features to your database. First off, you document a map that you, your friends, and your programming tools can follow in order to derive relationships and create the joins we talked about earlier, and secondly you enforce the relationship.

The Truth About Enforcement

So what does enforcement of a relationship get you?  After all, that sounds restrictive and processor intensive! You’re saying, “I don’t want that.  My database must perform at its best and if that means forgoing the niceties of foreign keys then I have to make that sacrifice.”  I will need to address this in two parts.

First, let me talk about restrictiveness. Isn’t restrictiveness bad? It sounds bad. But it is not bad, at all.  It is a good and necessary component of any and all databases or other forms of structured data. Think about it. When you specify a field as a number is that helpful or unhelpful? It lets people inserting the data know they need to insert a number and it guarantees to anyone accessing that information that they are going to get a number. What is the answer to 2 + apple? This is the kind of silly thing a computer will try to do if we don’t make sure it only gets numbers when that is what it expects. That is why restricting data is a good and necessary thing to do. So when you define a foreign key you are saying that this relationship exists and can be relied on and that makes sure that the computers, programs and people that rely on those relationship don’t fail or pull their hair out.

Secondly, I must address the concerns on performance. Surely all of this checking and enforcing of a foreign key relationship must introduce overhead that is going to drag the database down. Thankfully, as it turns out this is not the case. In fact, foreign keys will speed up your database in a way you might not expect. Lets think about what a computer must do to ensure a foreign key relationship is not violated. Basically, whenever someone inserts anything into a table that has such a relationship with another table the database must look at the other table and ensure that whatever the record is saying it relates to actually exists over in that other table. So there is extra work to be done, BUT when you make sure that the column that you use to specify the relationship is a primary key this look up will be lightning fast, and this, in fact, should always be the case. Still, even if the look up is lightning fast and the insert is lightning fast, isn’t that twice as slow as the database was before? The problem with drawing this conclusion is that the insert was never lightning fast to begin with. It wasn’t, and if it was you are probably doing something wrong. Any DBA worth her salt knows every table must have a primary key. I won’t get into explaining why here, but if I need to I can address that in a separate post. Maintaining this primary key is intensive and it slows inserts down many orders of magnitudes, but in so doing make retrieval faster than magic. So when you do an insert on a table with a foreign key, the database actually does a lightning fast lookup followed by a relatively snail paced insert as is always the case. What this means is that the enforcement of the foreign key goes unnoticed from a performance perspective due to the economies of scale. So how does a foreign key speed up a database? The short answer is that they ensure your database is in an optimized state. When you take it upon yourself to define foreign keys you identify to yourself and to everyone else how your database is to be used and this allows you to understand how to best optimize it as well as provide the database engine with the information it needs to best optimize your queries.

Conclusion

Foreign keys are necessary components of any database.  Attempting to avoid them is one of the most destructive things that can be done to a database and any organization the relies on it. Everyone should take the time to learn more about them and make sure they are a part of any project they work on.

Epilogue: When we can Forego Foreign Keys

Lastly, I want to mention there are times when foreign keys constraints and primary keys can be omitted. As with most things, you can use a database out of the context of its original purpose. For example, were not wrong for building our houses out of wood. While trees originally evolved for other reasons all together, using it’s wood to build a house can provide additional rewards. The same is to be said for a database table.

One case of out of band usage could be an effort to increase the performance of inserts. How this would be done is to remove primary key and other indexes on the target table. Duplicated records and inconsistent data would be allowed, but under the circumstances it would be more efficient to deal with these details later. Unfortunately, retrieving specific records would be like trying to find the part where “Bob misplaces his teeth” in a book you have never read before without the use of any references. There really is no good strategy for accomplishing this. Therefore, in loading this table, what you have created can’t be used for retrieving specific records. This table is essentially just a queue, as you would be best served reading and acting upon its contents sequentially. You do not need a relational database if all you need is a queue. So while this proves a table without foreign keys can be useful, it is not an example of when foreign keys can be omitted within the context of what I’m discussing in the article and what the database is actually needed for.

What you need a database for typically happens after you dequeue something. At this point you will generally want to make some use of the data, but if you can’t do that right now, you will want to stash it somewhere that is quick to find later on. Finding things, albeit maybe not teeth, is what databases are there for. Sure they can store things, but the file system can do that with much less ceremony. What you need a database for is finding things, and a database won’t help you much here if it doesn’t maintain an index. Maintaining an index makes the insert so slow the foreign key constraints have effectively zero impact on performance and so this can’t be used as an excuse for their absence.

A table without a primary key and dependent foreign key references is not a replacement for one with them. Such a table is nothing more than an accessory and provides no value within the context of an actual data model. If such a table schema is all someone has they don’t have a data model at all and therefore are not experiencing the advantages that one would provide. This is unfortunate because such an individual would believe that they do have one and might discredit database theory all together.

About these ads

About this entry