Primary key requirements - sql

Is it a good idea to use a phone number as a primary key in an RDBMS? Phone numbers are unique to nearly all of us, but my friend suggests it is not a good idea for the following reasons:
What if two people in a family share a phone number?
What if a person does not have a phone number?
What are your insights?

I'd be against this idea, generally for these reasons:
It is personally identifiable information, and I'd recommend using it with caution if you're bound by GDPR. Some users might ask you not to use their phone numbers. You might later be required to hash or mask part of the phone number, or even get rid of it completely.
The value depends on user input, even if it is validated. There are several services that lend you a phone number for validation if you're not in the validator's target country.
A format needs to be defined for the phone number: whether it will contain a country code, parentheses or spaces.
There must be validation to prevent duplicates and null values.
In summary, it is not a good idea to use a field that depends on external facts. As others mentioned, using an autogenerated identifier for the ID and a non-unique index for the phone number seems like a better approach.
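A minimal sketch of that last suggestion, using Python's built-in sqlite3 module. The table name (person), the column names and the sample numbers are hypothetical and only illustrate the layout: a generated integer key, plus a nullable phone column with a non-unique index.

```python
import sqlite3

def build_person_table() -> sqlite3.Connection:
    """Create an in-memory table keyed by a generated integer, not the phone number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE person (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no external meaning
            name  TEXT NOT NULL,
            phone TEXT                                -- nullable: some people have no phone
        )
        """
    )
    # Non-unique index: fast lookup by phone, but shared numbers are allowed.
    conn.execute("CREATE INDEX idx_person_phone ON person (phone)")
    return conn

conn = build_person_table()
conn.executemany(
    "INSERT INTO person (name, phone) VALUES (?, ?)",
    [("Ann", "+15550100"), ("Bob", "+15550100"), ("Cy", None)],
)
# Two family members share a number; both rows are found through the index.
shared = conn.execute(
    "SELECT name FROM person WHERE phone = ? ORDER BY id", ("+15550100",)
).fetchall()
```

Both objections from the question (a shared number, a missing number) are accepted by this schema, and a changed phone number becomes a plain UPDATE that touches no foreign keys.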

A phone number certainly can make sense as a key, but it all depends on what you need to identify and how you intend to use it. There is no general right or wrong answer.
Three very good criteria (but not absolute rules) for choosing and designing keys are: Simplicity, Stability, Familiarity. Phone numbers are simple and familiar enough for many purposes. Whether they are stable enough is probably highly dependent on circumstances. For example you might require all your employees to supply a unique phone number for third-factor authentication but probably it's quite acceptable to change that number occasionally.

What is the purpose of having the phone number as the primary key? Is it to identify an individual? If so, one individual can have multiple phone numbers (mobile, home), so it is not advisable to use the phone number as the primary key.
Your friend's point also stands: what if a person does not have a phone number?

Primary Key Type Guid or Int? [closed]

I am wondering what the recommended type for a primary key in SQL Server is. I remember reading this article a long time ago, but now I am wondering whether it is still a wise decision to use a GUID.
One thing that got me thinking is that these days many sites use the ID in the URL. For instance, Course/1 would return the information for that record.
You can't really do that with a GUID, which means you would need a new unique column for that purpose. That is more work, as you have to make sure each record has a unique number.
There is never a "one solution fits all". You have to carefully design your architecture and select the best options for your scenario. Both INT and GUID types are valid options like they've always been.
You can absolutely use a GUID in a URL. In fact, in most scenarios it is better to use a GUID (or another random ID) in the URL than a sequential numeric ID, for security reasons. If you use sequential IDs, your site visitors can easily guess other users' IDs and potentially access their content. For example, if my profile URL is /Profiles/111, I can try /Profiles/112 and see if I can access it. If my reservation URL is Reservation/444, I can try Reservation/441 and see what happens. Of course, you must have strong permission checks, so that I cannot see pages that don't belong to my account, but if there are any holes in your permissions and security, a breach can happen. With GUIDs and other randomly generated IDs, the other IDs in the system are impractical to guess, so such a breach is much more difficult.
Another issue with sequential IDs is that your users can guess how many accounts or records you have and their order in your database. If my ID is 50269, I know that you must have roughly that many records. If my ID is 4, I know that you had very few accounts when I registered. For that reason, many developers start the first ID at some arbitrary higher number like 1529 instead of 1. It doesn't solve the issue entirely, but it avoids the issues with small IDs. How important all that guessing is depends on the system, so you have to evaluate your scenario carefully.
That's on top of the benefits mentioned in the article you linked in your question. Still, an integer is better in some areas, so choose the best option for your scenario.
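One way to get the benefits of both is to keep a sequential integer as the internal key and expose only a random identifier in URLs. The sketch below is a variation on the answer, not something it prescribes: the profile table, the public_id column and the helper names are hypothetical, and it uses Python's sqlite3 and uuid modules.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE profile (
        id        INTEGER PRIMARY KEY,    -- sequential, internal only
        public_id TEXT NOT NULL UNIQUE,   -- random, safe to expose in a URL
        name      TEXT NOT NULL
    )
    """
)

def create_profile(name: str) -> str:
    """Insert a profile and return the random identifier used in its URL."""
    public_id = str(uuid.uuid4())  # version 4 UUIDs are generated from random bits
    conn.execute(
        "INSERT INTO profile (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id

def profile_url(public_id: str) -> str:
    """Build the public URL; the sequential id never leaves the database."""
    return f"/Profiles/{public_id}"

first = create_profile("alice")
second = create_profile("bob")
```

Knowing one URL reveals nothing about the neighbouring rows or the row count. It does not replace the permission checks mentioned above; it only makes a missing check harder to exploit.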
EDIT: To answer the point you raised in your comment about user-friendly URLs: in those scenarios, sequential numbers are the wrong answer. A better solution is a unique string in the URL that is linked to your numeric ID. For example, the Cars movie has this URL on IMDb:
https://www.imdb.com/title/tt0317219/
Now, compare that to the URL of the same movie on Wikipedia, Rotten Tomatoes, Plugged In, or Facebook:
https://en.wikipedia.org/wiki/Cars_(film)
https://www.rottentomatoes.com/m/cars/
https://www.pluggedin.ca/movie-reviews/cars/
https://www.facebook.com/PixarCars
We must agree that those URLs are much friendlier than the one from IMDB.
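A rough sketch of linking such a unique string to the numeric ID, assuming a hypothetical movie table with a slug column. The slugify rule is a deliberately simple assumption (lowercase, hyphens for everything else) and does not reproduce how any of the sites above build their URLs.

```python
import re
import sqlite3

def slugify(title: str) -> str:
    """Lowercase the title and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie ("
    " id INTEGER PRIMARY KEY,"      # numeric key used for joins
    " slug TEXT NOT NULL UNIQUE,"   # friendly string used in the URL
    " title TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO movie (slug, title) VALUES (?, ?)", (slugify("Cars"), "Cars")
)

def find_movie_id(slug: str):
    """Resolve the slug from the URL back to the numeric primary key, or None."""
    row = conn.execute("SELECT id FROM movie WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None
```

The UNIQUE constraint on slug is what makes the string usable as a lookup key; two films with the same title would need a disambiguating suffix, as in the Wikipedia URL above.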
I've worked on small, medium, and large-scale implementations (100k+ users) with SQL and Oracle. The majority of the time, an INT primary key is used. GUIDs were more popular 10-15 years ago, but even at their height they were not as popular as INT. Unless you see a need for a GUID, I would recommend INT.
My experience has been that the only time a GUID is needed is when your data is on the move or is merged with other databases. For example, say you have three sites running the same application and you merge those three systems for reporting purposes.
If your data is stationary or you run a single instance, INT should be sufficient.
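The three-site merge can be simulated in a few lines of Python. Everything here is a toy model under stated assumptions: each "site" is just a list of (key, name) rows, the names are made up, and integer keys restart at 1 on every site.

```python
import uuid

def make_site_rows(names, use_guid):
    """Simulate one site's table: a list of (key, name) rows."""
    rows = []
    for n, name in enumerate(names, start=1):
        # Each site restarts its integer sequence at 1; GUIDs are generated independently.
        key = str(uuid.uuid4()) if use_guid else n
        rows.append((key, name))
    return rows

def merge(sites):
    """Merge rows from several sites; return (merged dict, list of colliding keys)."""
    merged, collisions = {}, []
    for rows in sites:
        for key, name in rows:
            if key in merged:
                collisions.append(key)
            else:
                merged[key] = name
    return merged, collisions

site_names = [["ann", "bob"], ["cy", "dee"], ["eve", "fay"]]
_, int_collisions = merge([make_site_rows(n, use_guid=False) for n in site_names])
guid_merged, guid_collisions = merge([make_site_rows(n, use_guid=True) for n in site_names])
```

With integer keys, the second and third sites collide with the first on keys 1 and 2, so the merge needs a re-keying step or a composite (site, id) key. With GUIDs, all six rows merge as they are.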
According to the article you mention:
GUIDs are unique across every table, every database, every server
Well... this is a great promise, but it fails to deliver. GUIDs are supposed to be unique snowflakes. However, reality is more complicated than that, and there are several reasons why they can end up not being unique.
One of the main reasons lies not in the UUID/GUID specification but in poor implementations of it. For example, some JavaScript implementations rank among the worst, using pseudo-random numbers that are quite predictable. Other implementations are much better.
So, bottom line: study the specific UUID/GUID implementation you are using and will be using. Don't just read and trust the specification. Otherwise you may be in for a surprise when you get called at 3 am on a Saturday night by angry customers.

What are the best examples of natural keys in SQL?

I've been reading the great debates about natural vs surrogate keys in data modeling, and to be clear I'm not trying to get into that thorny question here. All I want to know is what are some of the best examples of good natural keys?
All I seem to find online are keys that someone thought might be good but turn out not to be, like social security numbers. (For that one: privacy concerns, not everyone has one, reused after death, can be changed after identity theft, can double as business tax id.)
My own guess is that internationally standardized codes (ISBN, VIN, country codes, language codes) would make good keys.
Invoice numbers, vehicle registration numbers, scheduled flight codes, login names, email addresses, employee numbers, room numbers, UPC codes. There are also many thousands of industry, public and international standards for everything from currencies, languages, financial instruments, chemical compounds and medical diagnoses. All of these are potentially good candidates for key attributes. Some sensible criteria for choosing and designing keys are: Simplicity, Stability and Familiarity (i.e. familiar within the business or other context in which they are used).
Some people seem to struggle with the choice of "natural" key attributes because they hypothesize situations where a particular key might not be unique in some given population. This misses the point. The point of a key is to impose a business rule that attributes must and will be unique for the population of data within a particular table at any given point in time. The table always represents data in a particular and hopefully well-understood context (the "business domain" AKA "domain of discourse"). It is the intention/requirement to apply a uniqueness constraint within that domain that matters.
For example, if my website requires each user to supply a unique email address when they register then email address may be a valid choice of key in the database supporting that website. The fact that there are other populations of people in other domains where email addresses are not required to be unique does not necessarily invalidate that choice of key for my website.
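A small sketch of that business rule, enforced by the database itself, using Python's sqlite3 module. The site_user table and the register helper are hypothetical, and lowercasing the address before storing it is a simplifying assumption about how this imaginary website treats case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_user ("
    " email TEXT PRIMARY KEY NOT NULL,"   # natural key: unique within this website
    " display_name TEXT NOT NULL)"
)

def register(email: str, display_name: str) -> bool:
    """Return True if the user was added, False if the email is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO site_user (email, display_name) VALUES (?, ?)",
                (email.lower(), display_name),  # assumption: addresses compare case-insensitively
            )
        return True
    except sqlite3.IntegrityError:
        # The key constraint rejected a duplicate within this table's population.
        return False
```

The constraint says nothing about email addresses in general. It only states that, within this table, no two rows may share one, which is exactly the point made above.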
Assume there is a table named person. If we use the columns LastName, FirstName and Address together as a key, this is a natural key: those columns are attributes that people really have, and there is a logical relationship between the columns in the table.
Your DNA code would be one really good example of a natural key in real life.

Using GUIDs for Custom Tables?

As far as I know, SAP CRM and HANA both utilise GUIDs to uniquely identify records instead of using classic incremented integers. Are there best practices or clear guidelines that cover their use?
Here are some factors I've considered in favour of GUIDs:
Offline creation of objects. IIRC GUIDs are near-guaranteed to be unique in these situations so merging or integration of disparate data sets is not an issue.
Surrogate keys have distinct development advantages. While incrementing integers are a form of surrogate key, use of different number sequences can impose a functional meaning on them.
And some scenarios that favour classic keys:
Users require human-readable keys to identify records in the system. This can be handled in GUID tables by also specifying an external ID with a readable value.
Users want to use number sequences to identify different types of records, similar to sales or purchase documents. Though I actually consider this bad design.
What scenarios for custom development would make you prefer GUIDs over classic keys?
Is blanket-usage of GUIDs for all tables a good idea?
To answer the question at the end: No, it isn’t (at least not in an ABAP environment, and I doubt it’s sensible elsewhere). Using GUIDs for primary keys everywhere makes it awfully hard to maintain and follow complex foreign key relationships at runtime. Just imagine having to debug a program that handles everything using GUIDs instead of the semantic keys you’re used to. And remember that the total length of the primary key may not exceed 255, and should not exceed 120 if you want to be able to transport table entries using fully qualified keys. Using GUIDs in composite keys blows the keys up unnecessarily, and using them as synthetic keys makes using foreign key relationships virtually impossible. So no, using GUIDs everywhere is not a good idea, especially not for configuration / customizing data.
It is however a good idea to use GUIDs in almost every place where you would have used a number range object in “old-school ABAP development”. GUIDs can be generated by the application server, while number ranges require network communication to the enqueuing server. (Yes, there is some buffering involved, but generally speaking, GUIDs are a lot faster and easier to handle). So unless you need your keys to follow a certain pattern, you should consider using a GUID. Even if you need some kind of sequential number for whatever business reasons, it might be sensible to use a GUID as the primary key and store the sequential number inside an (indexed) attribute to increase flexibility at development time.

About GUID usage

Wikipedia says GUIDs are used to uniquely identify classes and interfaces. What about objects (actual instances)?
When working with SQL, I also see GUIDs used for ID fields (for example the user table in the aspnetdb database of the ASP.NET MVC template project).
So I want to clearly understand GUID usage: in which cases should I use one, and is it really unique?
Any explanation is appreciated.
Thanks.
For a good overview of what a GUID is, check out our good friend Wikipedia: GUID.
and is it really unique
GUIDs generated from the same machine are virtually guaranteed to be unique. You have an infinitesimally small chance of generating the same one twice on the same machine. Arguably you have a tiny chance of generating two GUIDs the same out in the wider world, but that chance is still small and the chances of those two GUIDs ever meeting are also pretty small. In fact you probably have a greater chance of the Large Hadron Collider generating a black hole that swallows the Earth than you would having two identical GUIDs meeting somewhere on a network.
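To put a rough number on "infinitesimally small", the sketch below applies the standard birthday approximation to randomly generated (version 4) UUIDs, which carry 122 random bits. It assumes a good random source and says nothing about other GUID versions or weak generators.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 122 random bits; 6 are fixed by the format

def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p is about 1 - exp(-n(n-1) / (2 * 2**bits))."""
    exponent = -n * (n - 1) / (2 * 2**bits)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities

# Chance of at least one duplicate among a billion random GUIDs.
p_billion = collision_probability(10**9)
```

Under these assumptions p_billion is below 1e-18, and the probability only becomes substantial at around 2**61 generated values.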
Because of this, some people like to use it as the primary key for database tables. Personally I don't like to do this because:
an auto-incrementing integer gives me enough uniqueness to be able to use it as a primary key
GUIDs are a massive PITA to deal with when you are writing SQL queries.
Wikipedia says GUIDs are used to uniquely identify classes and interfaces
If you need an identifier that is unique across several disparate areas (like hives in a registry), then GUIDs are a good solution. In this particular case they are being used to identify a type. A concrete instance could also internally use a GUID identifier, but this is really only useful for data objects.

OOP: Is it going too far to create a phone number object, or an address object?

Many things can have phone numbers and addresses. . . people, places, etc. You want phone numbers and addresses to have the same functionality, format and validation whether it is a phone number or address for a person or a place etc.
Is it going too far to create a phone number class and an address class, and use them in those objects that have phone numbers and addresses?
My question also applies to other properties that could be reusable across diverse objects.
Yes, you can go too far and this is borderline. I tend to draw the line at the point where it becomes cumbersome to treat things as more than a string, or another already defined class/type.
If you need to somehow manipulate phone numbers (by, for example, separating them into area code and other bits) or addresses (number, street, city, country and so forth) then, yes, consider making them objects.
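As an illustration of the point where a class starts to pay off, here is a minimal value-object sketch in Python. It assumes 10-digit North American numbers purely for the example; the class name, field names and formatting are hypothetical, and real-world phone formats vary far more than this handles.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class PhoneNumber:
    """Value object for a 10-digit North American number (an assumed format)."""
    area_code: str
    exchange: str
    line: str

    @classmethod
    def parse(cls, raw: str) -> "PhoneNumber":
        digits = re.sub(r"\D", "", raw)  # drop spaces, parentheses, dots, dashes
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]  # strip the country code
        if len(digits) != 10:
            raise ValueError(f"expected 10 digits, got {len(digits)}")
        return cls(digits[:3], digits[3:6], digits[6:])

    def __str__(self) -> str:
        return f"({self.area_code}) {self.exchange}-{self.line}"

office = PhoneNumber.parse("+1 212.555.0147")
```

Validation and formatting now live in one place, and two differently typed spellings of the same number compare equal. If all you ever do is store and display the value, a plain string remains the simpler choice.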
I rarely do anything with phone numbers or addresses other than store and display them, in which case they're fine as strings without having to have their own dedicated class. For addresses, I don't even impose a separation based on parts (except maybe the zipcode), preferring free-format entry so as to not annoy those with addresses of a format I don't know about.
Going the reductio ad absurdum route, you could also objectify the characters that make up your phone number but that would be silly.
I think it would be perfectly acceptable. A well-designed class will allow you to reuse it in many different projects. If you have many projects that could use this sort of functionality, using an object is a good way to ensure that your code is reusable and portable. The potential to extend the class to handle anything related to phone numbers or addresses would be unmatched by a set of functions or one-off code that you rewrite over and over.
In the end it's your call. Personally, I think it would fall under good practice.
You need an Entity class and an Address class.
An Entity can be a person, place, organisation, coffee shop and so on, whereas an Address can capture a phone number, email address, latitude/longitude and similar contact details.
Keeping Entity and Address separate will help you reuse them across diverse objects.
A many-to-many relationship between Entity and Address, with loose coupling between them, would also help in the long run.