The main feature in RDBMS (Relational Database management system) is the relational or the relationship part of it. If done correctly, it makes the connections among entities easy and faster to retrieve.
Business rules will dictate that, so a clear understanding of the rules will make it easier to identify those relationships and how they should be connected.
These are some terms that should be understood in order to talk about relationships in the database world.
- Primary Key
- Surrogate Key
- Foreign Key
- Natural Key
- Candidate Key
- Unique Key
- Composite Key
- NOT NULL Constraint
- Parent table
- Child table
To be able to make a good database design and make a database model that works, the terms above must be understood throughout.
Let’s understand those terms, before we talk about relationships.
Primary Key is the unique identifier of a row. It can have one or more columns. If it has more than 1 column, it will be called a composite key. Every time a primary key is created, a clustered index is also created. The clustered index holds all the column of a row.
The clustered index, in this case the primary key is ordered in the table. So, when an INSERT happens, SQL Server must make sure it will insert in the right order.
Most of the times, we can find combinations that will make the row unique. In that case, these rows are candidate keys to become a primary key. However, we may not have control of the datatype, or we don’t really know the rule behind that that column specification.
A good example is the Social Security Number, for a time, many people thought that SS numbers were unique per person. We know, however, that it is mathematically impossible to be unique for every single person, when the total number it can hold is less than the total individuals that lived, live, or will live in the U.S.
Another option to make it “unique”, could be the association of this number with the name of the person and date of birth. If we add all these two other columns to the social security number, we will now have a very possible unique identification in the table. Those 3 would make a almost “perfect” candidate key. However, I am not sure, if, this is what IRS really does to identify a person or if they use something internally that I am not aware of.
Since IRS, as far as I know, manages these SS numbers, they could tomorrow say that these numbers now would also have letters and an X amount of characters will be added to it or removed from it. Now, if I have a column datatype of type VARCHAR(9) that holds social security numbers, if this column is used as primary key, I will need to drop it first, before I recreate as different datatype. I am also not sure what application is using that and how it may be affected. In other words, too much burden, maintenance and doubts.
Surrogate Keys are the solution for this problem. This key is a non-business meaning key, that will guarantee that the primary is unique and under my control. So instead of using Social Security Number as my primary key, which is a candidate key, and will talk more about it later, I will use a key that I created and defined the datatype as the primary key.
Surrogate keys can be of any data type, however, the most used is a numeric integer number. There is also an explanation for it to be numeric. As I said before, a primary key is sorted. Numeric values are easier to sort. Numeric values will help particularly on data manipulation language.
You will still find database applications with strings or uniqueidentifier types for this key, but the hard and cold truth is, it will cause issues. So, do not do it!
Foreign Key is a primary Key or unique Key of another table. It sets, first, the relationship between two tables and second ensures data integrity. What do I mean by integrity? Imagine Table A has a Primary Key named A_ID. I can make that A_ID a foreign key in table B. If I try to insert anything in table B, but in that insert the _AID value does not exist in table A. It will fail because SQL Server is expecting a value existing in table A. The relations must exist no matter what!
Foreign Keys do not have to be unique in a table, we can force it to be, but it is not needed. One table can have foreign keys referencing many different tables at once. A table can reference itself as well. I this case, table A would have a primary key and a foreign key that references its own primary key.
Natural Key is a column that by its nature could be a primary key. So, I bring back the SSN example again, it would fit almost perfectly as a primary key, if it was unique. Another good example is an email address. An email should be unique per person, in this case it is a natural key for sure. If wanted, they can be the primary key of that table. However, control could be an issue again.
Candidate Keys are columns that could be a primary key. Yes, it is like a natural key, but at this point we are only looking for possibilities. So, why do I care about them, if I will end up creating a surrogate key at the end of the day? The answer is simple, in order to be able to normalize this data later, we must know what each column represents.
Unique Keys are created every time we create a primary key or any time we add an UNIQUE key constraint to the table. It ensures that that the column has a unique value thought the table. The key, just like a primary key, can also be reference by another table as a foreign key.
Composite Key is a combination of one or more columns, and they can also uniquely identify a row in a table. Their association will become a primary key.
NOT NULL is a constraint that will enforce that the column is not empty. It can be blank, but not empty. NULL is the default value and will accept empty values. I say empty, but let’s understand 1 thing. SQL Server interprets NULL value as unknown. So, if you try to do operations with null values, you may get unexpected results.
Parent table is the table that holds the primary key as foreign key in a related table.
Child table host the primary key of the parent table as one of the columns in its table.
When designing a database these objects must be understood to be able to make an excellent database design. They will be dictated by many of the business rules and will define relationship and how data should be manipulated and/or retrieved. If done correctly, development will be smoother and onboarding new database professionals will also be easier.
It will also make the normalization process suitable and easy going.
There are three different relationships when designing a database application. They are:
- One to one
- One to many
- Many to many
One to One Relationship
As the name implies, in this scenario, we would have two different tables connecting to each other with the first table having a record and the second table having one single record associated with it. Many database professionals believe that since these two tables is mapping one to one, why not have it in the same table and avoid a join?
There are few reasons, and let’s start with the 8kb page mentioned above. Let’s say that we have a field in user table and that that field is PersonBio taking NVARCHAR(MAX) as the datatype. In the worst-case scenario, that will fill out the whole 8kb page. If I have that field in association with other fields, we will most likely have a page split.
Also, since this field will be part of the clustered index page, all the columns including this one, may live in the cache area, which will cause the database to clean data more often making the system somewhat slow.
So, if the column in question utilizes any of those datatypes to its MAX. We should consider putting them in a different table, specially if that field is not accessed very much. I certainly do not want things in memory if they are not utilized that much.
I used that TEXT with MAX example to help illustrate the problem, but it can apply to any column that is not very utilized. Let’s say that we have a table with 30 columns but out of those 30 columns only 20 are frequently queried, why not separate those 10 into a detail table with a one to one relationship? That sounds like a good idea!
Another good reason to have a one-to-one relationship is for fields that for, private purposes, should not be queried or live in the same area as other information. Columns like username and password are good candidates to be separated in a one to one relationship. This would mitigate security risks for accessing that data without necessity.
Here is an example:
Notice that PersonID is in both tables. The parent table, in this case Person has a Primary key named PersonID. The child table has a primary key also called PersonID. This will also be the Foreign key referencing Person. In that way uniqueness will be enforced as well.
There is also another approach you may see. Check a very similar example:
In this case, we have the same parent table, Person, holding the primary key, and the child table, PersonDetail, holding a Foreign key column called PersonID, which is referencing to the Person table. To make sure this works as one to one, we need to create a unique key on PersonID in the child table.
One to many relationship
In this case we would have one record in a table connecting to many records in a different table. This concept is much easier to understand than the one to one because in this case, we can easily notice the need for this.
A person, as we know, may have many different emails. However, that email can only belong to 1 single person. Now we have a one to many relationship.
See the tables below:
In that example we see the parent table called Person with a column named PersonID. PersonId is the primary key. We then see the child table called Email with a foreign key of the PersonID in the third column.
Can you see how Joe (PersonID = 1) has more than 1 email stored? That characterizes the one to many relationship. List table are the ones to usually hold this type of data. The connection is usually between a Core or parent table with this child or list table. In tables that are normalized to the 3rd or above form, will see these table rightfully hanging all over
Many to many relationship
In this case, we would have a table that connects with another table, but both can have multiple records of each other. For example, a person from the Person table, who buys products from the Product table. The customer can buy many different products from the Product table. The same product can also be bought by different customers.
In the example above we have 2 different tables. Notice that Joe can buy, if he wants, more than one type of phones. If he buys 1 iPhone and 1 Galaxy for example, he would now have to records on the product table to be connected.
Notice also that an iPhone can be bought by any of the other persons. So, if John and Jennifer also bough an iPhone 12, those records would have to be connected show how.
That would be a many to many relationship. The question then is how to connect them? In previous examples, we saw the relation of one parent table with one child table. This will not work here, without making the 2 tables extremely redundant.
We should now use a Cross-reference table to connect those two tables. Those two tables are, now, parents, and the cross-reference table is the child.
Now, this new table called Order, has a reference to the Primary key of the Person table and a reference of the primary key of the Product table. The primary key of the order table could be all three columns forming a composite key.
These cross-reference table start to show up in our designs the moment we start eliminating partial dependencies. Which is very common when normalizing from 1st to 2nd normal form.
In this article, we talked about the main terminologies when designing a database application. We also talked about the relationships among tables and how to determine them.
Designing a database requires some deep thought to make sure that it will be easier to develop, to understand, and to use once available.