Do two tables with the same content break data normalization?

Do two tables with the same content break data normalization? - sql

[Assuming there is a one to many relationship between an individual and an address, and assuming there is a one to many relationship between an agency and an address.]
Given the following table structure:
Wouldn't you want to merge the two address tables together and instead of using a foreign key within each one use a tie table?
Like this:
Are they both valid for normalization or only one?

Depends what you want to do.
In your second example with the tie tables, if I want to do a mailshot to my customers then my query has to go out to the agency tie table to exclude any agency addresses.
Of course you could have an address type column to differentiate but then you have a more complex query for your insert statement.
So although "address" is a global idea, sometimes it is easier to have it segregated by context.
Secondly, your customer data would usually be changing much more than your agency data. There may also be organisational and legal requirements around storage of personal data that make it better to separate the two.
e.g. in a health records system I want to be able to easily extract / restrict client data and to keep my configuration or commissioning data separate.
Thus in all the client systems I have used, the model tends to be the first one you describe rather than the second.

Related

Should I use one-to-one relationships to avoid repetition even if the new table wasn't really needed?

I'm designing a database from scratch and I'm wondering if the way I'm using one-to-one relationships is correct.
Imagine I have a table that needs the columns city and country_id, the first being alphanumeric and the second being a foreign key to another table. Should I place these in a locations table and use a one-to-one relationship?
Another example:
I have a table with the factory information of a device like the serial number and other fields. These will later be used to register a device in another table. Of course this is a one-to-one relationship, but should the columns of the first table be in the second table instead? Have in mind that the registrations table has another 4-5 columns.
I've read a lot of times that these relationships can often be omitted. However, I like the separation of concerns that creating a new table can give, in some cases.
Thanks in advance!

This may be a duplicate question, e.g. see:
SQL one to one relationship vs. single table
There's no perfect answer to this question as it depends on use case, but here's the rule of thumb I recommend abiding by: if you can already envision the potential need for separate tables, then I would err on the side of splitting them and using a 1:1 relationship. For example, imagine in the future that you want to have some kind of one-to-many relationship between some new table and the country table, or between some new table and the device table. In these cases you probably don't want city information mixed up with the former, and you probably don't want device registration information mixed up with the latter. By keeping your DB schema normalized, you can better future-proof it and you can mitigate the need for (likely extremely painful) updates that may have otherwise cropped up.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, a column containing the AD group and the second column containing what list of computers belong to that list.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?

You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.

There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.

I not really a bad idea to store multiple values in one column, but will depend the search you want.
If you just only want to know the persons that is part of a group then you can store persons in one column with a group id as key. For update you just update the entire list in a group.
But if you want to search a specified person that belongs to group, then its not recommended that you store this multiple persons in one column. In this case its better to store a itermedium table that store person id, and group id.

Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.

a list has some columns like: name, family name, phone number etc.
and rows like name=john familyName= lee number=12321321
name=... familyname=... number=...
an sql database works same way. every row in a sql database is a record. so you jusr add records of your list into your database using insert query.
complete explanation in here:
http://www.w3schools.com/sql/sql_insert.asp

This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to eachother. In this situation, it is often recommended to use a mapping table, a.k.a. "junction table" or "cross-reference" table. This table consist solely of the two foreign keys in your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.

You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

SQL one to one relationship vs. single table

Consider a data structure such as the below where the user has a small number of fixed settings.
User
[Id] INT IDENTITY NOT NULL,
[Name] NVARCHAR(MAX) NOT NULL,
[Email] VNARCHAR(2034) NOT NULL
UserSettings
[SettingA],
[SettingB],
[SettingC]
Is it considered correct to move the user's settings into a separate table, thereby creating a one-to-one relationship with the users table? Does this offer any real advantage over storing it in the same row as the user (the obvious disadvantage being performance).

You would normally split tables into two or more 1:1 related tables when the table gets very wide (i.e. has many columns). It is hard for programmers to have to deal with tables with too many columns. For big companies such tables can easily have more than 100 columns.
So imagine a product table. There is a selling price and maybe another price which was used for calculation and estimation only. Wouldn't it be good to have two tables, one for the real values and one for the planning phase? So a programmer would never confuse the two prices. Or take logistic settings for the product. You want to insert into the products table, but with all these logistic attributes in it, do you need to set some of these? If it were two tables, you would insert into the product table, and another programmer responsible for logistics data would care about the logistic table. No more confusion.
Another thing with many-column tables is that a full table scan is of course slower for a table with 150 columns than for a table with just half of this or less.
A last point is access rights. With separate tables you can grant different rights on the product's main table and the product's logistic table.
So all in all, it is rather rare to see 1:1 relations, but they can give a clearer view on data and even help with performance issues and data access.
EDIT: I'm taking Mike Sherrill's advice and (hopefully) clarify the thing about normalization.
Normalization is mainly about avoiding redundancy and relateded lack of consistence. The decision whether to hold data in only one table or more 1:1 related tables has nothing to do with this. You can decide to split a user table in one table for personal information like first and last name and another for his school, graduation and job. Both tables would stay in the normal form as the original table, because there is no data more or less redundant than before. The only column used twice would be the user id, but this is not redundant, because it is needed in both tables to identify a record.
So asking "Is it considered correct to normalize the settings into a separate table?" is not a valid question, because you don't normalize anything by putting data into a 1:1 related separate table.

Creating a new table with 1-1 relationships is not a reasonable solution. You might need to do it sometimes, but there would typically be no reason to have two tables where the user id is the primary key.
On the other hand, splitting the settings into a separate table with one row per user/setting combination might be a very good idea. This would be a three-table solution. One for users, one for all possible settings, and one for the junction table between them.
The junction table can be quite useful. For instance, it might contain the effective and end dates of the setting.
However, this assumes that the settings are "similar" to each other, in a SQL sense. If the settings are different such as:
Preferred location as latitude/longitude
Preferred time of day to receive an email
Flag to be excluded from certain contacts
Then you have a data-type problem when storing them in a table. So, the answer is "it depends". A lot of the answer depends on what the settings look like, how they will be used, and the type of constraints on them.

You're all wrong :) Just kidding.
On a very high load, high volume, heavily updated system splitting a table by 1:1 helps optimize I/O.
For example, this way you can place heavily read columns onto separate physical hard-drives to speed-up parallel reads (the 1-1 tables have to be in different "filegroups" for this). Or you can optimize table-level locks. Etc. Etc.
But this type of optimization usually does not happen until you have millions of rows and huge read/write concurrency

Splitting tables into distinct tables with 1:1 relationships between them is usually not practiced, because :
If the relationship is really 1:1, then integrity enforcement boils down to "inserts being done in all concerned tables, or none at all". Achieving this on the server side requires systems that support deferred constraint checking, and AFAIK that's a feature of the rather high-end systems. So in many cases the 1:1 enforcement is pushed over to the application side, and that approach has its own obvious downsides.
A case when splitting tables is nonetheless advisable, is when there are security perspectives, i.e. when not all columns can be updated by one user. But note that by definition, in such cases the relationship between the tables can never be strictly 1:1 .
(I also suggest you read carefully the discussion between Thorsten/Mike. You used the word 'normalization' but normalization has very little to do with your scenario - except if you were considering 6NF, which I think is rather unlikely.)

It makes more sense that your settings are not only in a separate table, but also use a on-to-many relationship between the ID and Settings. This way, you could potentially have a as many (or as few) setting as required.
UserSettings
[Settings_ID]
[User_ID]
[Settings]
In fact, one could make the same argument for the [Email] field.

database normalization

EDIT:
Would it be a good idea to just keep it all under 1 big table and have a flag that differentiates the different forms?
I have to build a site with 5 forms, maybe more. so far the fields for the forms are the following:
What would be the best approach to normalize this design?
I was thinking about splitting "Personal Details" into 3 different tables:
and then reference them from the others with an ID...
Would that make sense? It looks like I'll end up with lots of relationships...

Normalized data essentially means that the same data is not stored multiple times in multiple places. For example, instead of storing the customer contact info with an order, the customer ID is stored with the order and the customer's contact information is 'related' to the order. When the customer's phone number is updated, there is only one place the phone number needs to be updated (the customer table) and all the orders will have the correct information without being updated. Each piece of data exists in one, and only one, place. This is normalized data.
So, to answer your question: no, you will not make your database structure more normalized by breaking up a large table as you described.
The reason to break up a single table into multiple tables is usually to create a one to many relationship. For example, one person might have multiple e-mail addresses. Or multiple physical addresses. Another common reason for breaking up tables is to make systems modular, so that tables can be created that join to existing tables without modifying the existing tables.
Breaking one big table into multiple little tables, with a one to one relationship between them, doesn't make the data any more normalized, it just makes your queries more of a pain to write.* And you don't want to structure your database design around interfaces (forms) unless there is a good reason. There usually isn't.
*Although there are sometimes good reasons to break up big tables and create one to one relationships, normalization isn't one of them.

Database Design for One to One relationships

I'm trying to finalize my design of the data model for my project, and am having difficulty figuring out which way to go with it.
I have a table of users, and an undetermined number of attributes that apply to that user. The attributes are in almost every case optional, so null values are allowed. Each of these attributes are one to one for the user. Should I put them on the same table, and keep adding columns when attributes are added (making the user table quite wide), or should I put each attribute on a separate table with a foreign key to the user table.
I have decided against using the EAV model.
Thanks!
Edit
Properties include thing like marital status, gender, age, first and last name, occupation, etc. All are optional.

Tables:
USERS
USER_PREFERENCE_TYPE_CODES
USER_PREFERENCES
USER_PREFERENCES is a many-to-many table, connecting the USERS and USER_PREFERENCE_TYPE_CODES tables. This will allow you to normalize the preference type attribute, while still being flexible to add preferences without needing an ALTER TABLE statement.

Could you give some examples of what kind of properties you'd want to add to the user table? As long as you stay below roughly 50 columns, it shouldn't be a big deal.
How ever, one way would be to split the data:
One table (users) for username, hashed_password, last_login, last_ip, current_ip etc, another table (profiles) for display_name, birth_day etc.
You'd link them either via the same id property or you'd add an user_id column to the other tables.

It depends.
You need to Look at what percentage of users will have that attribute. If the attribute is 'WalkedOnTheMoon' then split it out, if it is 'Sex' include it on the user's table. Also consider the number of columns on the base table, a few, 10-20, won't hurt that much.
If you have several related attributes you could group them into a common table: 'MedicalSchoolId', 'MedicalSpeciality', 'ResidencyHospitalId', etc. could be combined in UserMedical table.

Personally I would decide on whether there are natural groupings of attributes. You might put the most commonly queried in the user table and the others in a separate table with a one-to-one relationship to keep the table from being too wide (we usually call that something like User_Extended). If some of the attributes fall into natural groupings, they may call for a separate table because those attributes will usually be queried together.
In looking at the attributes, examine if some can be combined into one column (for instance if a user cannot simlutaneoulsy be three differnt things (say intern, resident, attending) but only one of them at a time, it is better to have one field and put the data into it rather than three bit fields that have to be transalted. This is especially true if you will need to use a case statement with all three fileds to get the information (say title) that you want in reporting. IN other words look over your attributes and see if they are truly separate or if they can be abstracted into a more general one.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas