What are the best examples of natural keys in SQL?

I've been reading the great debates about natural vs. surrogate keys in data modeling, and to be clear, I'm not trying to get into that thorny question here. All I want to know is: what are some of the best examples of good natural keys?
All I seem to find online are keys that someone thought might be good but turn out not to be, like social security numbers. (For that one: privacy concerns, not everyone has one, reused after death, can be changed after identity theft, can double as business tax id.)
My own guess is that internationally standardized codes (ISBN, VIN, country codes, language codes) would make good keys.

Invoice numbers, vehicle registration numbers, scheduled flight codes, login names, email addresses, employee numbers, room numbers, UPC codes. There are also many thousands of industry, public and international standards for everything from currencies and languages to financial instruments, chemical compounds and medical diagnoses. All of these are potentially good candidates for key attributes. Some sensible criteria for choosing and designing keys are: Simplicity, Stability and Familiarity (i.e. familiar within the business or other context in which they are used).
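A minimal sketch of one of these standardized codes used directly as a primary key, using Python's built-in sqlite3 module with an in-memory database for illustration. The table and column names (currency, price, currency_code) are assumptions, and the codes are just sample ISO 4217 values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# The three-letter currency code itself is the primary key (a natural key).
conn.execute("""
    CREATE TABLE currency (
        code TEXT NOT NULL PRIMARY KEY CHECK (length(code) = 3),
        name TEXT NOT NULL
    )
""")
# Hypothetical referencing table: rows carry the familiar code directly.
conn.execute("""
    CREATE TABLE price (
        item          TEXT NOT NULL,
        amount        NUMERIC NOT NULL,
        currency_code TEXT NOT NULL REFERENCES currency(code),
        PRIMARY KEY (item, currency_code)
    )
""")
conn.executemany("INSERT INTO currency VALUES (?, ?)",
                 [("EUR", "Euro"), ("USD", "US Dollar")])
conn.execute("INSERT INTO price VALUES ('widget', 9.99, 'EUR')")


def add_price(item, amount, code):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO price VALUES (?, ?, ?)", (item, amount, code))
        return True
    except sqlite3.IntegrityError:
        return False


print(conn.execute("SELECT item, currency_code FROM price").fetchall())  # [('widget', 'EUR')]
```

Because the key is familiar, rows in price are readable without joining back to currency, which is the Familiarity criterion at work.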
Some people seem to struggle with the choice of "natural" key attributes because they hypothesize situations where a particular key might not be unique in some given population. This misses the point. The point of a key is to impose a business rule that attributes must and will be unique for the population of data within a particular table at any given point in time. The table always represents data in a particular and hopefully well-understood context (the "business domain" AKA "domain of discourse"). It is the intention/requirement to apply a uniqueness constraint within that domain that matters.
For example, if my website requires each user to supply a unique email address when they register then email address may be a valid choice of key in the database supporting that website. The fact that there are other populations of people in other domains where email addresses are not required to be unique does not necessarily invalidate that choice of key for my website.
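A small sketch of that rule, assuming a hypothetical site_user table in SQLite (driven from Python's sqlite3 module). The primary key on email is what turns "each user supplies a unique email address" into something the database itself enforces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical users table for the website in the example. NOT NULL is
# explicit because SQLite otherwise permits NULL in a TEXT primary key.
conn.execute("""
    CREATE TABLE site_user (
        email        TEXT NOT NULL PRIMARY KEY,
        display_name TEXT NOT NULL
    )
""")


def register(email, display_name):
    """Register a user; return False if the email is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO site_user (email, display_name) VALUES (?, ?)",
                (email, display_name))
        return True
    except sqlite3.IntegrityError:
        return False


print(register("alice@example.com", "Alice"))        # True
print(register("alice@example.com", "Other Alice"))  # False: duplicate key
```

Whether Alice@Example.com and alice@example.com count as the same address is a separate business decision; this sketch compares them exactly as stored.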

Assume there is a table named person. If we use the columns LastName, FirstName and Address together as a key, this will be a natural key, as those columns are completely natural to people, and there is also a logical relationship between the columns in the table.
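A sketch of that composite key in SQLite (via Python's sqlite3 module). The Phone column is a hypothetical non-key attribute added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite natural key as described: (LastName, FirstName, Address).
conn.execute("""
    CREATE TABLE person (
        LastName  TEXT NOT NULL,
        FirstName TEXT NOT NULL,
        Address   TEXT NOT NULL,
        Phone     TEXT,
        PRIMARY KEY (LastName, FirstName, Address)
    )
""")


def add_person(last, first, address, phone=None):
    """Return True if inserted, False if the key combination already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO person VALUES (?, ?, ?, ?)",
                         (last, first, address, phone))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_person("Smith", "Jane", "1 Main St"))  # True
print(add_person("Smith", "John", "1 Main St"))  # True: different FirstName
print(add_person("Smith", "Jane", "1 Main St"))  # False: same key combination
```

Under this key, two people with the same name at the same address (a parent and child both called Jane Smith, say) cannot both be stored; whether that is acceptable depends on the business rule the table is meant to enforce.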

Your DNA code would be one really good example of a natural key in real life.

Related

Need help answering a question about SQL data integrity

The question is:
For your final designed database, find a scenario in which a relatively prominent business data integrity rule cannot be ensured by your current primary keys and foreign keys, nor by directly adding more such keys or check clauses to the created tables.
In other words, the data integrity ensured by the keys within the database may not be enough to ensure all the data integrity within the business context.
Write a SQL statement that will determine if such a problem exists or not, and where, for any given state of the database.
I am not too sure what this question is asking or how to approach it.
I need help writing the SQL for this question.
I think the question is asking you to define some business logic that cannot be encoded in the database. However, it then wants you to find conflicts that could occur because the business logic is not encoded in the database. This second part seems to be in conflict with the first, but not necessarily.
An example based on your previous assignment would be if a coach is suddenly sick and there are too few additional coaches to cover the booked clients, or some coaches are not qualified to replace the sick coach, or had previous conflicts with certain clients and therefore can't be assigned to those clients. Therefore, some training bookings must be cancelled.
The decision on which are best to cancel may be difficult or impossible to code in SQL, but you can use SQL to verify that all of the sick coach's slots have either been filled by others or cancelled after the external business logic is applied.
EDIT: I think the above scenario fits the question's requirement that you can't find the conflicts (such as clients that don't like certain coaches) in your existing foreign key relationships, but you can verify that the external logic is consistent with the final requirements (all slots accounted for).
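A sketch of what such a verification statement might look like, assuming hypothetical coach, client_coach_conflict and booking tables (the original assignment's schema isn't shown) and using Python's sqlite3 module to run it. The query returns every booking that violates a rule, so an empty result means the externally applied reassignment left the data consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the original assignment's table names are unknown.
conn.executescript("""
    CREATE TABLE coach (
        coach_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        is_sick  INTEGER NOT NULL DEFAULT 0 CHECK (is_sick IN (0, 1))
    );
    -- Client/coach pairs that must never be booked together.
    CREATE TABLE client_coach_conflict (
        client_id INTEGER NOT NULL,
        coach_id  INTEGER NOT NULL REFERENCES coach(coach_id),
        PRIMARY KEY (client_id, coach_id)
    );
    CREATE TABLE booking (
        booking_id INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL,
        coach_id   INTEGER NOT NULL REFERENCES coach(coach_id),
        slot       TEXT NOT NULL,
        status     TEXT NOT NULL CHECK (status IN ('booked', 'cancelled'))
    );
    INSERT INTO coach VALUES (1, 'Ann', 1), (2, 'Ben', 0);
    INSERT INTO client_coach_conflict VALUES (10, 2);
    INSERT INTO booking VALUES (100, 10, 1, 'Mon 09:00', 'booked'),
                               (101, 11, 2, 'Mon 10:00', 'booked');
""")

# One statement that reports every violating booking and why.
PROBLEMS_SQL = """
    SELECT b.booking_id, 'assigned to a sick coach' AS problem
    FROM booking b JOIN coach c ON c.coach_id = b.coach_id
    WHERE c.is_sick = 1 AND b.status = 'booked'
    UNION ALL
    SELECT b.booking_id, 'client has a conflict with this coach'
    FROM booking b
    JOIN client_coach_conflict x
      ON x.client_id = b.client_id AND x.coach_id = b.coach_id
    WHERE b.status = 'booked'
    ORDER BY 1
"""


def find_problems():
    return conn.execute(PROBLEMS_SQL).fetchall()


print(find_problems())  # [(100, 'assigned to a sick coach')]
```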
Perhaps a better example is the traveling salesman problem: it is difficult to code least-cost routing in SQL, but it's easy to verify that all cities have been visited.
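A sketch of the "easy to verify" half, assuming hypothetical city and route_stop tables. The route itself is supposed to come from some solver outside SQL; a NOT EXISTS query simply lists the cities it missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a route is an ordered list of stops.
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE route_stop (
        route_id INTEGER NOT NULL,
        stop_no  INTEGER NOT NULL,
        city_id  INTEGER NOT NULL REFERENCES city(city_id),
        PRIMARY KEY (route_id, stop_no)
    );
    INSERT INTO city (city_id, name) VALUES (1, 'Athens'), (2, 'Berlin'), (3, 'Cairo');
    -- Route 1 was produced outside SQL (e.g. by a heuristic solver).
    INSERT INTO route_stop VALUES (1, 1, 1), (1, 2, 3);
""")


def unvisited_cities(route_id):
    """Names of cities the route never visits; [] means every city is covered."""
    rows = conn.execute("""
        SELECT c.name
        FROM city c
        WHERE NOT EXISTS (SELECT 1 FROM route_stop s
                          WHERE s.route_id = ? AND s.city_id = c.city_id)
        ORDER BY c.name
    """, (route_id,))
    return [name for (name,) in rows]


print(unvisited_cities(1))  # ['Berlin']
```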
Scenarios where every row can have a variable number of attributes, and each attribute can have a different datatype, are generally modeled using the EAV (entity-attribute-value) data model (see EAV on Wikipedia).
Here, an attribute can hold a variety of values, so we cannot always enforce a check constraint. In some scenarios, if a finite list of attributes is not available, we cannot have a foreign key for attributeID.
This data model is popular for medical histories, where every patient can have different kinds of symptoms.
Maybe this can serve as an example of a scenario where data integrity cannot be completely enforced.
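A minimal EAV sketch in SQLite (via Python's sqlite3 module), with illustrative table names. Because every value lives in one TEXT column, no CHECK or foreign key can tie a value's format to its attribute's declared type, but a query can still report where the data breaks that rule. The numeric test here is a deliberately crude GLOB pattern, an assumption rather than a complete validator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal EAV layout; names are illustrative. Every value is stored as TEXT.
conn.executescript("""
    CREATE TABLE attribute (
        attribute_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE,
        data_type    TEXT NOT NULL CHECK (data_type IN ('number', 'text'))
    );
    CREATE TABLE patient_observation (
        patient_id   INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
        value        TEXT,
        PRIMARY KEY (patient_id, attribute_id)
    );
    INSERT INTO attribute VALUES (1, 'body_temperature_c', 'number'),
                                 (2, 'symptom_note', 'text');
    INSERT INTO patient_observation VALUES (7, 1, '38.5'),
                                           (7, 2, 'dry cough'),
                                           (8, 1, 'high');
""")

# Detection query: 'number' attributes whose value is missing or contains
# anything other than digits and a decimal point (this simple test would
# also flag negative numbers).
BAD_NUMBERS_SQL = """
    SELECT o.patient_id, a.name, o.value
    FROM patient_observation o
    JOIN attribute a ON a.attribute_id = o.attribute_id
    WHERE a.data_type = 'number'
      AND (o.value IS NULL OR o.value = '' OR o.value GLOB '*[^0-9.]*')
    ORDER BY o.patient_id
"""
print(conn.execute(BAD_NUMBERS_SQL).fetchall())  # [(8, 'body_temperature_c', 'high')]
```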

Primary key requirements

Is it a good idea to use a phone number as the primary key in an RDBMS? Phone numbers are unique to nearly all of us. But my friend suggests it is not a good idea, for the following reasons:
What if two people in a family share a phone number?
What if a person does not have a phone number?
What are your insights? Please let me know!
I'd be against this idea, generally for these reasons:
It is personally identifiable information, and I'd recommend using it with caution if you're bound by GDPR. Some users might ask you not to use their phone numbers. It might later be required to hash or mask part of the phone number, or even to get rid of it completely.
The value depends on user input, even if it is validated. There are several services that lend you a phone number for validation if you're not in the validator's target country.
A format needs to be defined for the phone number: whether it will contain a country code, parentheses or spaces.
There should be validation to prevent duplicates and null values.
In summary, it is not a good idea to use a field that depends on external facts. As others mentioned, using an autogenerated identifier for the ID and a non-unique index for the phone number seems like a better approach.
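A sketch of that alternative in SQLite (via Python's sqlite3 module), with hypothetical table and column names: an automatically assigned surrogate key, and a phone column that is indexed for lookups but neither unique nor mandatory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate key: SQLite assigns INTEGER PRIMARY KEY values automatically.
# phone is nullable and only indexed (not UNIQUE), so family members can
# share a number and people without one can still be stored.
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,
        phone     TEXT
    );
    CREATE INDEX ix_person_phone ON person (phone);
""")


def add_person(full_name, phone=None):
    cur = conn.execute(
        "INSERT INTO person (full_name, phone) VALUES (?, ?)", (full_name, phone))
    return cur.lastrowid


def find_by_phone(phone):
    rows = conn.execute(
        "SELECT full_name FROM person WHERE phone = ? ORDER BY person_id", (phone,))
    return [name for (name,) in rows]


add_person("Ana", "+15550100")
add_person("Ben", "+15550100")  # shared family number: allowed
add_person("Cy")                # no phone number: allowed
print(find_by_phone("+15550100"))  # ['Ana', 'Ben']
```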
A phone number certainly can make sense as a key but all depends on what you need to identify and how you intend to use it. There is no general right or wrong answer.
Three very good criteria (but not absolute rules) for choosing and designing keys are: Simplicity, Stability, Familiarity. Phone numbers are simple and familiar enough for many purposes. Whether they are stable enough is probably highly dependent on circumstances. For example you might require all your employees to supply a unique phone number for third-factor authentication but probably it's quite acceptable to change that number occasionally.
What is the purpose of having a phone number as the primary key? Is it to identify an individual? If so, one individual can have multiple phone numbers (mobile/home phone), so it is not advisable to use a phone number as the primary key.
Also, your question is a valid one: what if a person does not have a phone number?

Is this Library Management System ER diagram correct?

Quick question about an ER/EER Diagram.
I have made this Entity Relationship Diagram, but a friend told me that there is something wrong with it. Is there something wrong with it?
The ER diagram is a design of a Library Management System, where a member can borrow 5 books at a time. The rest of the functionality of the system is how a normal library functions.
[Image: Library Management System EER diagram]
I don't understand the utility of the relationship between the librarian and the card, and I don't understand why the books are split into two entities.
I would do 3 entities:
-member
-card
-book
Every member has one card, and every card belongs to one member;
every member can borrow many books, and every book can be borrowed by many members.
The relationship between member and book creates another table in the logical schema: loans. Before inserting a new loan, you can check whether the member already has 5 active loans (by checking the active attribute in the loans table).
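A minimal sketch of that three-entity design plus the loans table, using SQLite through Python's sqlite3 module; all names are assumptions. The UNIQUE, NOT NULL member_id on card makes each card belong to exactly one member and prevents a member from holding two cards, while the 5-loan limit is checked in application code before each insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE card (
        card_id   INTEGER PRIMARY KEY,
        -- each card belongs to exactly one member; no member has two cards
        member_id INTEGER NOT NULL UNIQUE REFERENCES member(member_id)
    );
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- resolves the many-to-many member/book relationship
    CREATE TABLE loans (
        loan_id   INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES member(member_id),
        book_id   INTEGER NOT NULL REFERENCES book(book_id),
        active    INTEGER NOT NULL DEFAULT 1 CHECK (active IN (0, 1))
    );
    INSERT INTO member VALUES (1, 'Mia');
    INSERT INTO card VALUES (500, 1);
    INSERT INTO book VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E'), (6, 'F');
""")

MAX_ACTIVE_LOANS = 5


def borrow(member_id, book_id):
    """Check the member's active loans first; refuse if already at the limit."""
    with conn:
        (active,) = conn.execute(
            "SELECT COUNT(*) FROM loans WHERE member_id = ? AND active = 1",
            (member_id,)).fetchone()
        if active >= MAX_ACTIVE_LOANS:
            return False
        conn.execute("INSERT INTO loans (member_id, book_id) VALUES (?, ?)",
                     (member_id, book_id))
    return True


print([borrow(1, b) for b in range(1, 7)])  # [True, True, True, True, True, False]
```

Because the count and the insert are separate statements issued from application code, two concurrent requests could both pass the check; enforcing the rule inside the database, as the next answer suggests with a trigger, closes that gap.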
The context you have given is incomplete for me. I do not see the whole description of your problem/situation, so I will answer based on assumptions and on my own experience. So let's see...
The user tino questioned the existence of two entities, title and volume, which is something important. Let me explain this for a moment, which will show that this is not an error. Some time ago we had video rental stores (I don't know if this is the right name where you live; English is not my native language). Remember? We used to go there to rent VHS tapes to watch at home.
What we rented were not the films themselves, but copies/media of them. A film will always have the same actors, director, title, etc., but a copy could have different attributes/properties, like the year the media was manufactured, the available languages, the expiration year, among other things. So we had two distinctly different things.
But despite this, we have to consider whether there is a need to create two entities for persistence, that is, whether we actually need to store this information. If a copy/media has no attributes of its own, then its entity should not exist, and what a user rents would really be the movie titles.
In your case, the relationship between volume and title, I believe, is really expressing this distinction.
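A sketch of the title/volume split in SQLite (via Python's sqlite3 module). The columns acquired_on and copy_condition are hypothetical examples of attributes that belong to a physical copy rather than to the work itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# title: what never changes about a work. volume: one physical copy of it.
conn.executescript("""
    CREATE TABLE title (
        title_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        author   TEXT NOT NULL
    );
    CREATE TABLE volume (
        volume_id      INTEGER PRIMARY KEY,  -- distinguishes individual copies
        title_id       INTEGER NOT NULL REFERENCES title(title_id),
        acquired_on    TEXT,
        copy_condition TEXT
    );
    INSERT INTO title VALUES (1, 'Example Title', 'A. Writer');
    INSERT INTO volume (title_id, acquired_on, copy_condition)
        VALUES (1, '2019-03-01', 'worn'), (1, '2023-06-15', 'new');
""")

# Two copies of the same title, each with its own properties.
for row in conn.execute(
        "SELECT v.volume_id, t.name, v.copy_condition "
        "FROM volume v JOIN title t ON t.title_id = v.title_id ORDER BY v.volume_id"):
    print(row)
# (1, 'Example Title', 'worn')
# (2, 'Example Title', 'new')
```

If volume had no columns beyond its key and title_id, it would add nothing that a count on title couldn't, which is the test described above.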
Let's talk about the relationship between librarian and title. What does a librarian manage? Does a librarian manage the titles, which never change and are abstract things, or the physical objects present in the library? :)
Finally, let's talk about the borrows relationship. When we break down 1-N (or N-1) relationships, we always pass the primary key from the 1 side to the N side, resolving the relationship when forming the Physical Model from an Entity-Relationship Diagram.
Although this relationship is 0-5 here, when we decompose it we will not have exactly a 0-5 relationship. We would in any case have to pass the primary keys from both sides to the table formed by this relationship. Therefore, what we initially have here is an N-N relationship between member and volume.
N-N relationships allow optional relations between entities. This means we can have zero cardinality on one side here. To limit the number of books that can be borrowed, you need to implement a restriction/constraint with SQL, or with a procedural language in your database. In this case, you can implement a before insert trigger. This trigger has the duty of verifying this restriction and allowing or denying the completion of the operation as a whole.
Let it be clear that I'm not saying you should remove this notation. Your Conceptual Model should express it. But when you are decomposing, you have to remember that. I think you should just correct it.
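A sketch of the before insert trigger described above, written for SQLite and driven from Python's sqlite3 module. The borrows table, its returned_on column (NULL while a book is out) and the trigger name are assumptions; other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY);
    CREATE TABLE volume (volume_id INTEGER PRIMARY KEY);
    -- Junction table for the N-N member/volume relationship.
    CREATE TABLE borrows (
        member_id   INTEGER NOT NULL REFERENCES member(member_id),
        volume_id   INTEGER NOT NULL REFERENCES volume(volume_id),
        borrowed_on TEXT NOT NULL,
        returned_on TEXT,  -- NULL while the book is still out
        PRIMARY KEY (member_id, volume_id, borrowed_on)
    );
    -- Enforces the 0..5 cardinality: abort the insert at 5 open loans.
    CREATE TRIGGER borrows_max_five
    BEFORE INSERT ON borrows
    WHEN (SELECT COUNT(*) FROM borrows
          WHERE member_id = NEW.member_id AND returned_on IS NULL) >= 5
    BEGIN
        SELECT RAISE(ABORT, 'member already has 5 books on loan');
    END;
    INSERT INTO member VALUES (1);
    INSERT INTO volume VALUES (1), (2), (3), (4), (5), (6);
""")


def borrow(member_id, volume_id, day):
    """Return True if the trigger lets the loan through, False if it aborts it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO borrows (member_id, volume_id, borrowed_on) VALUES (?, ?, ?)",
                (member_id, volume_id, day))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as a database error
        return False


print([borrow(1, v, "2024-01-01") for v in range(1, 7)])
# [True, True, True, True, True, False]
```

Because the check runs inside the database, it applies to every client that inserts into borrows, not just one application.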
Remember one important rule: relationships that have attributes/properties can only exist as N-N relationships. If you have to put attributes/properties on a 1-N (or N-1) relationship, they will always go on the N side. In summary, there are no N-1 (or 1-N) relationships with attributes in the relationship itself. Only N-N relationships can have attributes/properties. So be careful with this.
If you have any questions or need clarification, please comment and I will answer.
I see no reason to distinguish member and card. Volume and Librarian don't have primary keys. Are they supposed to be weak entities? That doesn't make sense for Librarian and Volume needs an identifier to distinguish different copies.

Is there any abnormality in this modeling?

I am using a table to centralize the addresses. Customers and Suppliers have a reference to address.
A Customer has an Address, an Address may or may not be associated with a Customer.
A Supplier has an Address, an Address may or may not be associated with a Supplier.
To ensure that an address is not associated with more than one Customer or Supplier, I have a unique index on the AddressID column of both the Customer and Supplier tables.
I suspect that this relationship is abnormal, because I am not able to map it using Entity Framework with the Fluent API.
Edit:
In my real scenario, the address table will have many more columns. In fact, this is an adaptation to simplify a complex scenario where the address table is a financial release and the customer and supplier tables represent the origin of the financial release, like Sale and Purchase.
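A minimal sketch of the design as described in the question, in SQLite via Python's sqlite3 module, with the address columns cut down to one for brevity. The unique index on each table stops two customers (or two suppliers) from sharing an address, and nothing requires every address to be attached to someone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, Street TEXT NOT NULL);
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        AddressID  INTEGER NOT NULL REFERENCES Address(AddressID)
    );
    CREATE TABLE Supplier (
        SupplierID INTEGER PRIMARY KEY,
        AddressID  INTEGER NOT NULL REFERENCES Address(AddressID)
    );
    CREATE UNIQUE INDEX UX_Customer_AddressID ON Customer (AddressID);
    CREATE UNIQUE INDEX UX_Supplier_AddressID ON Supplier (AddressID);
    INSERT INTO Address VALUES (1, '1 Main St'), (2, '2 High St'), (3, '3 Low Rd');
    INSERT INTO Customer VALUES (10, 1);
""")


def try_insert(sql, params):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A second customer cannot reuse address 1.
print(try_insert("INSERT INTO Customer VALUES (?, ?)", (11, 1)))  # False
# Addresses not attached to any customer or supplier are allowed.
print(conn.execute("""
    SELECT COUNT(*) FROM Address a
    WHERE NOT EXISTS (SELECT 1 FROM Customer c WHERE c.AddressID = a.AddressID)
      AND NOT EXISTS (SELECT 1 FROM Supplier s WHERE s.AddressID = a.AddressID)
""").fetchone()[0])  # 2
```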
Your model seems reasonable. The merits of having a second table when you want to enforce a 1-1 relationship may not be obvious to everyone.
I can think of two good reasons off-hand:
You want the addresses in one place so you can treat all addresses equivalently (say geo-coding them, standardizing them, extracting features).
The address column is long and many queries do not require it, so you gain efficiency by not storing the address with the rest of the data ("vertical partitioning").
And there may be other reasons. I can't speak to why EF makes such relationships difficult to express.
In your current design, this is not an EF issue, and the design by itself cannot prevent you from assigning the same address to both a customer and a supplier. If you go down this path, you are solely responsible for enforcing this uniqueness through business rules and validations in your model.
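A sketch of the gap described here, reusing the question's table names in a reduced SQLite schema (via Python's sqlite3 module). Each unique index only sees its own table, so the same AddressID can be used by a customer and a supplier; a validation query like the one below can at least report such rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the question's design, reduced to the key columns.
conn.executescript("""
    CREATE TABLE Address  (AddressID INTEGER PRIMARY KEY);
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, AddressID INTEGER NOT NULL);
    CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY, AddressID INTEGER NOT NULL);
    CREATE UNIQUE INDEX UX_Customer_AddressID ON Customer (AddressID);
    CREATE UNIQUE INDEX UX_Supplier_AddressID ON Supplier (AddressID);
    INSERT INTO Address VALUES (1), (2), (3);
    INSERT INTO Customer VALUES (10, 1), (11, 2);
    -- Accepted: each unique index only looks at its own table.
    INSERT INTO Supplier VALUES (20, 2), (21, 3);
""")

# Validation query: addresses used by both a customer and a supplier.
SHARED_SQL = """
    SELECT c.AddressID, c.CustomerID, s.SupplierID
    FROM Customer c
    JOIN Supplier s ON s.AddressID = c.AddressID
    ORDER BY c.AddressID
"""
print(conn.execute(SHARED_SQL).fetchall())  # [(2, 11, 20)]
```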
On the other hand, the correctness (or not) of your model design, apart from what Linoff points out in his answer, depends on the nature of your problem and how important an address is to your business. For example, if this is an app for a post office, then Address merits its own table, as it will be one of the core concepts in your application. If not, the current approach is going to add complexity to your model.

How many foreign keys is too many?

After running across this article:
http://diovo.com/2008/08/are-foreign-keys-really-necessary-in-a-database-design/
It seems like a good idea to use foreign keys when designing a database. But when are you using too many?
For example, suppose I have a main table used to store a list of machinery part information that other programs make reference to with the following columns:
ID
Name
Colour
Price
Measurement Units
Category
etc...
Should I be making tables containing a list of all possible colours, units and categories, and then setting these as foreign keys on the corresponding columns in my machine part info table? At what point would the benefit of using foreign keys outweigh the fact that I'm making all these extra tables and relationships?
Any attribute for which you want to be able to state, with certainty, that there are only known, valid values present in the database should be protected with a foreign key. Otherwise, you can only hope to catch invalid values in your application code and whatever interfaces are created in the future.
It is NOT a bad thing to have more tables and relationships. The only issue -- and it usually is not one -- has to do with the overhead of maintaining the indexes that are used in enforcing those relationships. Until you experience performance issues you should create a foreign key relationship for every column that "should" have one (because the values need to be validated against a list).
The performance considerations would have to be pretty dire before I would be willing to sacrifice correctness for performance.
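A sketch of the lookup-table approach for the question's part table, in SQLite via Python's sqlite3 module. Table and column names are assumptions; the discrete attributes (colour, unit, category) reference lookup tables, while price is only range-checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE colour        (colour_name TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE unit          (unit_code   TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE part_category (category    TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE part (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        colour_name TEXT NOT NULL REFERENCES colour (colour_name),
        price       NUMERIC NOT NULL CHECK (price >= 0),
        unit_code   TEXT NOT NULL REFERENCES unit (unit_code),
        category    TEXT NOT NULL REFERENCES part_category (category)
    );
    INSERT INTO colour VALUES ('black'), ('red');
    INSERT INTO unit VALUES ('mm'), ('kg');
    INSERT INTO part_category VALUES ('fastener'), ('bearing');
""")


def add_part(name, colour, price, unit, category):
    """Return True if accepted; False if any value fails a constraint."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO part (name, colour_name, price, unit_code, category) "
                "VALUES (?, ?, ?, ?, ?)", (name, colour, price, unit, category))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_part("bolt", "black", 0.25, "mm", "fastener"))        # True
print(add_part("nut", "burnt sienna", 0.10, "mm", "fastener"))  # False: unknown colour
```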
Every design is a compromise between competing goals, so there are very few simple answers (except the wrong ones).
I would certainly put discrete measures such as name, color, category, measurement unit, etc. in their own key tables. Variable measures (cost, number of units, etc.) not so much, unless you have units in standard-size packages (e.g. only 1, 6, 12, etc.).
The simplest way to design a database is to start with the requirements. In one classical methodology, the requirements are summarized in an ER (Entity-Relationship) model. In this model, relationships between entities are not invented, they are discovered. If they lie within the scope of the information the database is supposed to cover, then they are part of the model. Period.
From there, when you turn to database design, you already know what relationships you need. You have a few decisions to make about the structure of your tables, but almost all the foreign keys that reference a primary key are a direct consequence of the requirements.
Of course, if you are at liberty to change the requirements as you go through the design process, then you can do anything you want.
Dimensional modelling covers all points of your question well. Having too many foreign key relationships can make query performance suffer. The Kimball Group Reader is a great introduction to dimensional design and to translating customer requirements into a schema.
http://en.wikipedia.org/wiki/Dimensional_modeling
The main question to ask is "how constrained does the data need to be?" Concerning the color of machine parts, I'd assume it would be in everyone's best interest not to have burnt sienna and chamomile as color options. So a lookup table for these would be best.