Why would a database architect choose to de-normalize referenced child tables

Why would a DBA choose one large, heavily referenced lookup table over several small, dedicated lookup tables, each referenced by only one or two tables? For example:
CREATE TABLE value_group (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    group_name VARCHAR(30) NOT NULL
);
CREATE TABLE value (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    value_text VARCHAR(30) NOT NULL
);
CREATE TABLE value_group_value (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    value_group_id INT NOT NULL,
    value_id INT NOT NULL,
    FOREIGN KEY (value_group_id) REFERENCES value_group(id),
    FOREIGN KEY (value_id) REFERENCES value(id)
);
Example groups would be something along the lines of:
'State Abbreviation' with the corresponding values being a list of all the U.S. state abbreviations.
'Name Prefix' with the corresponding values being a list of strings such as 'Mr.', 'Mrs.', 'Dr.', etc.
In my experience, normalizing these values into one table per value_group makes changes easier, provides clarity, and makes queries faster:
CREATE TABLE state_abbrv (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    abbreviation CHAR(2) NOT NULL
);
CREATE TABLE name_prefix (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    prefix VARCHAR(30) NOT NULL
);
There would be n such tables for the n groups in the value_group table. Each of these new tables could then be referenced directly from another table, or through an intermediary table, depending on the desired relationship.
What factors would lead a DBA to choose the first setup over the second?

In my experience, the primary advantages of a single, standardized "table of tables" structure for lookups are code reuse, simplified documentation (if you're in the 1% of folks who document your database, that is), and the ability to add new lookup tables without changing the database structure.
And if I had a dollar for every time I saw something in a database that made me wonder "what was the DBA thinking?", I could retire to the Bahamas.
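As a rough illustration of the "no structural change" and "code reuse" points, here is a minimal sketch using Python's built-in sqlite3 module with the generic tables from the question. The helper names add_lookup and lookup_values are hypothetical, SQLite's AUTOINCREMENT stands in for MySQL's AUTO_INCREMENT, and the UNIQUE constraint on group_name is an assumption added so that names identify groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE value_group (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    group_name VARCHAR(30) NOT NULL UNIQUE  -- UNIQUE is an assumption
);
CREATE TABLE value (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    value_text VARCHAR(30) NOT NULL
);
CREATE TABLE value_group_value (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    value_group_id INT NOT NULL REFERENCES value_group (id),
    value_id INT NOT NULL REFERENCES value (id)
);
""")


def add_lookup(conn, group_name, values):
    """Register a new lookup group using INSERTs only (no DDL needed)."""
    group_id = conn.execute(
        "INSERT INTO value_group (group_name) VALUES (?)", (group_name,)
    ).lastrowid
    for text in values:
        value_id = conn.execute(
            "INSERT INTO value (value_text) VALUES (?)", (text,)
        ).lastrowid
        conn.execute(
            "INSERT INTO value_group_value (value_group_id, value_id) "
            "VALUES (?, ?)",
            (group_id, value_id),
        )


def lookup_values(conn, group_name):
    """One reusable query serves every lookup group."""
    rows = conn.execute(
        """
        SELECT v.value_text
        FROM value_group g
        JOIN value_group_value gv ON gv.value_group_id = g.id
        JOIN value v ON v.id = gv.value_id
        WHERE g.group_name = ?
        ORDER BY v.id
        """,
        (group_name,),
    ).fetchall()
    return [row[0] for row in rows]


add_lookup(conn, "Name Prefix", ["Mr.", "Mrs.", "Dr."])
add_lookup(conn, "State Abbreviation", ["AL", "AK", "AZ"])
prefixes = lookup_values(conn, "Name Prefix")
states = lookup_values(conn, "State Abbreviation")
```

The flip side, as the question points out, is that a referencing table can only point at value(id) in general; nothing in this structure stops a "state" column from referencing a name prefix.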

Related

What is the best way to construct database tables for many to many relationship (with additional condition)?

Business problem: Suppose that we have a few medical centers and doctors, who work in these centers. Obviously, many doctors can work in one center. But also one doctor can work in many centers at the same time. And we have to store information about who is the head doctor of each medical center (each medical center can have only one head doctor and one doctor can be the head doctor in multiple centers).
Question: What is the best way to construct database tables to serve these business requirements?
I see two variants (described below) but if you see more, please, let me know.
Variant 1
In this variant, we store information about the head doctor in the join table jobs. I see two disadvantages here:
the column jobs.is_head will contain false in most cases, which looks strange (as if we were storing unnecessary information).
we somehow need to prevent adding two head doctors to one center.
create table doctors
(
id bigint not null
constraint doctors_pk
primary key,
name varchar not null
);
create table medical_centers
(
id bigint not null
constraint medical_centers_pk
primary key,
address varchar not null
);
create table jobs
(
medical_center_id bigint not null
constraint centers_fk
references medical_centers,
doctor_id bigint not null
constraint doctors_fk
references doctors,
is_head boolean not null,
constraint jobs_pk
primary key (doctor_id, medical_center_id)
);
Variant 2
In this variant, we store information about the head doctor in the medical_centers table. Two disadvantages again:
we now have two types of relationships between the tables: many to many and one to many (because one doctor can be the head doctor in multiple centers), which is a bit complicated, especially considering that I want to use this schema through an ORM framework (a JPA implementation).
we somehow have to prevent setting a doctor as head doctor of a center where this doctor does not work.
create table doctors
(
id bigint not null
constraint doctors_pk
primary key,
name varchar not null
);
create table medical_centers
(
id bigint not null
constraint medical_centers_pk
primary key,
address varchar not null,
head_doctor_id bigint
constraint head_doctor_id_fk
references doctors
);
create table jobs
(
medical_center_id bigint not null
constraint centers_fk
references medical_centers,
doctor_id bigint not null
constraint doctors_fk
references doctors,
constraint jobs_pk
primary key (doctor_id, medical_center_id)
);

It is a trade-off between operational complexity and a little storage. I think variant 1 looks good. If you want to change it a little, you can add another table called "med_cen_heads" with a unique constraint on the medical_center_id column. This prevents adding a second head doctor to the same medical center. The hard part is checking, before the insert, whether the head doctor works in that medical center.
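-- Inserts a row only if the doctor already works in that center (per jobs).
-- The two quoted strings are placeholders for the actual ids.
INSERT INTO med_cen_heads (medical_center_id, doctor_id)
SELECT medical_center_id, doctor_id
FROM jobs
WHERE medical_center_id = 'medical_center_id_to_insert'
  AND doctor_id = 'doctor_id_to_insert';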
Alternatively, you can create a BEFORE INSERT trigger that checks whether the values exist in the jobs table.
The schema could look like this:
create table doctors
(
id bigint not null
constraint doctors_pk
primary key,
name varchar not null
);
create table medical_centers
(
id bigint not null
constraint medical_centers_pk
primary key,
address varchar not null
);
create table jobs
(
job_id serial,
medical_center_id bigint not null
constraint centers_fk
references medical_centers,
doctor_id bigint not null
constraint doctors_fk
references doctors
);
create table med_cen_heads
(
medical_center_id bigint unique not null,
doctor_id bigint not null
);
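As an alternative to the pre-insert check or trigger, the database itself can enforce that a head doctor works in the center by giving med_cen_heads a composite foreign key into jobs. This is not part of the schema above; it is a minimal sketch using Python's sqlite3 module, with hypothetical sample rows and a hypothetical helper try_insert_head, and it assumes jobs has a unique key over (doctor_id, medical_center_id) as in variant 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE doctors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE medical_centers (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
CREATE TABLE jobs (
    medical_center_id INTEGER NOT NULL REFERENCES medical_centers (id),
    doctor_id INTEGER NOT NULL REFERENCES doctors (id),
    PRIMARY KEY (doctor_id, medical_center_id)
);
CREATE TABLE med_cen_heads (
    medical_center_id INTEGER NOT NULL UNIQUE,  -- one head per center
    doctor_id INTEGER NOT NULL,
    -- the head must be an existing (doctor, center) pair in jobs
    FOREIGN KEY (doctor_id, medical_center_id)
        REFERENCES jobs (doctor_id, medical_center_id)
);
-- hypothetical sample data
INSERT INTO doctors VALUES (1, 'Dr. A'), (2, 'Dr. B');
INSERT INTO medical_centers VALUES (10, 'Sample address');
INSERT INTO jobs (medical_center_id, doctor_id) VALUES (10, 1);
""")


def try_insert_head(center_id, doctor_id):
    """Return True if the head row was accepted, False if a constraint failed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO med_cen_heads (medical_center_id, doctor_id) "
                "VALUES (?, ?)",
                (center_id, doctor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Doctor 2 does not work at center 10 yet: the foreign key rejects the row.
outsider_accepted = try_insert_head(10, 2)
# Doctor 1 works at center 10: accepted.
head_accepted = try_insert_head(10, 1)
# Even after doctor 2 joins center 10, the UNIQUE constraint blocks a second head.
with conn:
    conn.execute("INSERT INTO jobs (medical_center_id, doctor_id) VALUES (10, 2)")
second_head_accepted = try_insert_head(10, 2)
```

The same idea should carry over to PostgreSQL, where a composite foreign key likewise needs a primary key or unique constraint on the referenced column pair.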
This saves the storage for the extra column, but a BOOLEAN is only 1 byte. Let's assume you have 1 billion medical center and doctor pairs (I don't think you will). The extra column then costs only about 0.93 GiB (10^9 bytes). Of course, because there is only one head doctor per medical center, this column will be heavily skewed. Yet when you query ordinary doctors this column is not a concern; you filter on doctor_id or other columns instead.
In short, from my point of view, stick with variant 1 with this small change:
create unique index unique_row on jobs(medical_center_id) where is_head;
With this index in place, you cannot add a second head doctor to a medical center.
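To see the partial unique index at work, here is a small sketch using Python's sqlite3 module (SQLite also supports partial indexes, so the same statement can be tried without a PostgreSQL server). The ids and the helper try_insert are hypothetical, and the foreign keys of variant 1 are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    medical_center_id INTEGER NOT NULL,
    doctor_id INTEGER NOT NULL,
    is_head BOOLEAN NOT NULL,
    PRIMARY KEY (doctor_id, medical_center_id)
);
-- uniqueness is checked only among rows where is_head is true
CREATE UNIQUE INDEX unique_row ON jobs (medical_center_id) WHERE is_head;
""")


def try_insert(center_id, doctor_id, is_head):
    """Return True if the row was accepted, False if a constraint failed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO jobs (medical_center_id, doctor_id, is_head) "
                "VALUES (?, ?, ?)",
                (center_id, doctor_id, is_head),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_head = try_insert(1, 100, True)      # first head of center 1
colleague_a = try_insert(1, 101, False)    # any number of non-head rows
colleague_b = try_insert(1, 102, False)
second_head = try_insert(1, 103, True)     # rejected by the partial index
other_center = try_insert(2, 100, True)    # same doctor may head another center
```

Rows with is_head = false never enter the index, so only the head rows compete for uniqueness per center.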

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is: can I use int identity(1,1) for the primary key in more than one table in the same ER model? I read on the Internet that a primary key needs to have unique values, for example if I set int identity(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?

Constraints in general are defined on, and scoped to, one table (object) in the database. The only exception is the FOREIGN KEY, which usually REFERENCES another table.
The PRIMARY KEY (or any UNIQUE key) constrains only the table it is defined on; it neither affects nor is affected by constraints on other tables.
The PRIMARY KEY defines a column or set of columns that uniquely identifies one record in one table, and none of those columns can hold NULL. UNIQUE, on the other hand, allows NULLs, and how they are treated differs between database engines.
So yes, PersonID and JobID might hold the same value, but their meaning is different. To select the one unique record, you need to tell SQL Server which table and which column of that table to look in; this is what the table list and the WHERE or JOIN conditions in the query do.
The queries SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Persons WHERE PersonID = 1; have different meanings even though the value you are searching for is the same.
You define IDENTITY per table (a table can have only one IDENTITY column). A column doesn't need an IDENTITY definition to hold the value 1; IDENTITY just gives you an easy way to generate unique values per table.
You can share generated values across tables by using a SEQUENCE, but that will not prevent you from manually inserting the same values into multiple tables.
In short, the value stored in the column is just a value; the table name, the column name, and the business rules and roles give it meaning.
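The point that identical key values in different tables do not collide can be checked with a small sketch. It uses Python's sqlite3 module, with SQLite's AUTOINCREMENT standing in for SQL Server's IDENTITY(1,1) (an assumption made so the example runs without a server); the sample names 'Smith' and 'Welder' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (
    Personid INTEGER PRIMARY KEY AUTOINCREMENT,
    LastName VARCHAR(255) NOT NULL
);
CREATE TABLE Job (
    jobID INTEGER PRIMARY KEY AUTOINCREMENT,
    nameJob NVARCHAR(25) NOT NULL,
    Personid INTEGER REFERENCES Persons (Personid)
);
""")

# Each table numbers its own rows independently, starting at 1.
person_id = conn.execute(
    "INSERT INTO Persons (LastName) VALUES ('Smith')"
).lastrowid
job_id = conn.execute(
    "INSERT INTO Job (nameJob, Personid) VALUES ('Welder', ?)", (person_id,)
).lastrowid

# The table and column named in the query decide what the value 1 means.
row = conn.execute(
    """
    SELECT p.LastName, j.nameJob
    FROM Job j
    JOIN Persons p ON p.Personid = j.Personid
    WHERE j.jobID = ?
    """,
    (job_id,),
).fetchone()
```

Both inserts succeed and both generated keys are 1; no error is raised because each primary key is checked only within its own table.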
To the notion that "every table needs to have a PRIMARY KEY and IDENTITY", I would like to add that in most cases there are multiple (independent) keys in a table. Usually every entity has something you can call a business key, which is, loosely, the key that the business (humans) uses to identify something. This key has very similar, and often the same, characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, an employee's ID card number, something generated in another system (say HR), or a code assigned to a customer or partner.
These business keys are useful for humans but not always for computers, although they could serve as a PRIMARY KEY.
In databases we (the developers and architects) like simplicity, and a business key can be very complex in computer terms: it can consist of multiple columns and can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one column). Worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and over which we have more control, so we use things like IDENTITYs and GUIDs and whatnot.

SQL: How to link two tables that don't share a column name without creating a composite table?

I have to create a database named "Elections" and then write some queries to get the answers provided by my teacher.
I created the database. My issue is that I don't know how to link two tables (candidate and constituency) because they do not share any primary or foreign key.
The teacher is saying that a composite table should not be created in order to link those two tables.
Please see picture (the tables are linked on the pictures but I do not know how to do it when I create the database).
I am also including:
the query that I have to write. The thing is, without knowing how to link those two tables, I cannot write the query.
the SQL code representing the creation of those two tables.
QUERY:
1. Display the number of candidates eliminated in the first
round, for each constituency, and show the constituency number,
and name.
SQL CODE:
CREATE TABLE CANDIDATE (
CANDIDATE_NB smallint CONSTRAINT PK_CANDIDATE_NB PRIMARY KEY NOT NULL,
PARTY_NB smallint NOT NULL,
ROUND tinyint NOT NULL
);
alter table CANDIDATE ADD CONSTRAINT FK_PARTY_NB FOREIGN KEY (PARTY_NB) REFERENCES PARTY(PARTY_NB);
CREATE TABLE CONSTITUENCY (
CONSTITUENCY_NB smallint IDENTITY (100, 100) CONSTRAINT PK_CONSTITUENCY_NB PRIMARY KEY NOT NULL,
CONSTITUENCY_NAME varchar(20) NOT NULL,
NB_REGISTERED smallint NULL,
TOTAL_CANDIDATES smallint NULL
);
I am trying to understand how to link those two tables but I really can't think of anything apart from creating a composite table, which is not what has to be done as per my teacher's instructions.

Best practice for verifying correctness of data in MS SQL

We have multiple tables with different data (for example masses, heights, widths, ...) that need to be verified by employees. To keep track of already verified data, we are thinking about designing the following table:
TableName varchar
ColumnName varchar
ItemID varchar
VerifiedBy varchar
VerificationDate date
This table links the product IDs, tables and columns that will be verified, for example:
Table dbo.Chairs
Column dbo.Chairs.Mass
ItemId 203
VerifiedBy xy
VerificationDate 10.09.2020
While creating foreign keys, we were able to link the ItemID to the central ProductsID table. We wanted to create two more foreign keys for the database tables and columns, but we were unable to do this, since "sys.tables" and "INFORMATION_SCHEMA.COLUMNS" are views.
How can I create foreign keys to the available database tables/columns?
Is there a better way to do this kind of data verification?
Thanks.

You can add a CHECK constraint to verify the data that is inserted or updated in the columns TableName and ColumnName, like this:
CREATE TABLE Products (
ItemID VARCHAR(10) PRIMARY KEY,
ItemName NVARCHAR(50) UNIQUE
)
CREATE TABLE Chairs (
ItemID VARCHAR(10) PRIMARY KEY,
FOREIGN KEY (ItemID) REFERENCES dbo.Products,
Legs TINYINT NOT NULL
)
CREATE TABLE Sofas (
ItemID VARCHAR(10) PRIMARY KEY,
FOREIGN KEY (ItemID) REFERENCES dbo.Products,
Extendable BIT NOT NULL
)
CREATE TABLE Verifications (
TableName sysname NOT NULL,
ColumnName sysname NOT NULL,
ItemID VARCHAR(10) REFERENCES dbo.Products,
VerifiedBy varchar(30) NOT NULL,
VerificationDate date NOT NULL,
CHECK (COLUMNPROPERTY(OBJECT_ID(TableName),ColumnName,'ColumnId') IS NOT NULL)
)
You need to grant VIEW DEFINITION on the tables to the users who have rights to insert/update the data.
This will not entirely prevent wrong data, because the check constraint is not re-verified when you drop a table or a column.
However, I don't think this is necessarily a good idea. A better (and more conventional) way would be to add VerifiedBy and VerificationDate to the Products table (if you can force the user to verify all the properties at once), or, if the verification really needs to be done separately for each column, to create separate columns for each verified column (for example LegsVerifiedBy and LegsVerificationDate in the Chairs table, ExtendableVerifiedBy and ExtendableVerificationDate in the Sofas table, etc.).
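The per-column variant can be sketched as follows, using Python's sqlite3 module in place of SQL Server so the example runs on its own. The column names LegsVerifiedBy and LegsVerificationDate come from the suggestion above; the sample rows, the verifier 'xy' as a value, and the date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chairs (
    ItemID VARCHAR(10) PRIMARY KEY,
    Legs INTEGER NOT NULL,
    LegsVerifiedBy VARCHAR(30),   -- NULL until someone verifies Legs
    LegsVerificationDate DATE
);
-- hypothetical sample rows
INSERT INTO Chairs (ItemID, Legs) VALUES ('203', 4), ('204', 3);
""")

# An employee verifies the Legs value of item 203.
with conn:
    conn.execute(
        "UPDATE Chairs SET LegsVerifiedBy = ?, LegsVerificationDate = ? "
        "WHERE ItemID = ?",
        ("xy", "2024-01-15", "203"),
    )

# Items whose Legs value still needs verification.
unverified = [
    row[0]
    for row in conn.execute(
        "SELECT ItemID FROM Chairs WHERE LegsVerifiedBy IS NULL ORDER BY ItemID"
    )
]
```

Because the verification columns live next to the data they describe, renaming or dropping Legs is a change to the same table, and there are no table or column names stored as strings that can go stale.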

SQL - Field Grouping and temporary data restructuring

I would like to apologize first for my title, because I understand it may be technically incorrect.
I currently have a database with multiple tables, 4 of which are relevant in this example:
FORMS
FIELDS
ENTRIES
VALUES
Below is a shortened version of the tables
Create table Form_Master
(
    form_id int identity primary key,
    form_name varchar(255),
    form_description varchar(255),
    form_create_date date
)
Create table Field_Master
(
    field_id int identity primary key,
    form_ID int foreign key references Form_Master(form_id),
    field_name varchar(255),
    type_ID int
)
Create table Entry_Master
(
    entry_id int identity primary key,
    entry_date date,
    form_id int foreign key references Form_Master(form_id)
)
Create table Value_Master
(
    value_id int identity primary key,
    value varchar(255),
    field_id int foreign key references Field_Master(field_id),
    entry_id int foreign key references Entry_Master(entry_id)
)
The purpose of these tables is to create a dynamic method of capturing and retrieving information: a form is a table, a field is a column, an entry is a row, and a value is a cell.
Currently, when I retrieve information from a form, I create a temporary table with one column per field in field_master, then select all entries linked to the form and the values linked to those entries, and insert them into the temporary table.
The reason for the temporary table is to restructure the data into an organised format and display it in a DataGridView.
My problem is one of performance: creating the table as described above becomes slower once a form has more than 20 fields or more than 100 linked entries.
My questions are:
Is there a way to select the data directly from field_master in the format of the temporary table mentioned above?
Do you think I should re-think my database design?
Is there an easier method to do what I am trying to do?
Any input will be appreciated. I do know how to use Google; however, in this instance I am not sure what exactly to look for, so even a keyword would be nice.
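For reference, the design described here is commonly called entity-attribute-value (EAV), and the restructuring step is a pivot. One possible way to get the grid without a temporary table is conditional aggregation: one MAX(CASE ...) column per field, grouped by entry. The following is only a sketch under assumptions, not something from the post: it uses Python's sqlite3 module instead of SQL Server, a trimmed-down version of the tables, hypothetical sample data, and a hypothetical helper pivot_form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Field_Master (field_id INTEGER PRIMARY KEY, form_id INT, field_name TEXT);
CREATE TABLE Entry_Master (entry_id INTEGER PRIMARY KEY, form_id INT);
CREATE TABLE Value_Master (
    value_id INTEGER PRIMARY KEY, value TEXT, field_id INT, entry_id INT
);
-- hypothetical sample data: one form with two fields and two entries
INSERT INTO Field_Master VALUES (1, 1, 'name'), (2, 1, 'city');
INSERT INTO Entry_Master VALUES (1, 1), (2, 1);
INSERT INTO Value_Master (value, field_id, entry_id) VALUES
    ('Ann', 1, 1), ('Oslo', 2, 1), ('Bob', 1, 2), ('Rome', 2, 2);
""")


def pivot_form(conn, form_id):
    """Return (header, rows) with one column per field and one row per entry."""
    fields = conn.execute(
        "SELECT field_id, field_name FROM Field_Master "
        "WHERE form_id = ? ORDER BY field_id",
        (form_id,),
    ).fetchall()
    # Build one MAX(CASE ...) expression per field. Field ids are cast to int
    # and field names are quoted as identifiers before entering the SQL text.
    select_list = ["e.entry_id AS entry_id"] + [
        'MAX(CASE WHEN v.field_id = {} THEN v.value END) AS "{}"'.format(
            int(field_id), name.replace('"', '""')
        )
        for field_id, name in fields
    ]
    sql = (
        "SELECT " + ", ".join(select_list) + " "
        "FROM Entry_Master e "
        "LEFT JOIN Value_Master v ON v.entry_id = e.entry_id "
        "WHERE e.form_id = ? "
        "GROUP BY e.entry_id ORDER BY e.entry_id"
    )
    cursor = conn.execute(sql, (form_id,))
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


header, rows = pivot_form(conn, 1)
```

The whole grid comes back from a single query, so there is no per-cell insert into a temporary table. SQL Server also has a PIVOT operator that covers the same ground, although with field names that are only known at run time it would still need dynamically built SQL.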
