Can Database Normalization occur from values - sql

Say I want to normalize a table
itemID | itemDate | itemSource | type | color | size | material
254 03/08/1988 toyCo doll null 16 plastic
255 03/08/1988 toyCo car blue null plastic
256 03/08/1988 toyCo boat purple 20 wood
Now the type field can only have one of three values: doll, car, or boat. The attributes color, size, and material are functionally dependent on type. As you can see, though, items of type doll have no color. I do not know if this is a problem, but moving on.
type(pk) | color | size | material = table A
itemID(pk) | itemDate | itemSource = table B
We are now in 1NF. My question is: can the type key, along with its attributes, be split into separate tables based on the type key's possible values?
typeDoll(pk) | size | material = table C
typeCar(pk) | color | material = table D
typeBoat(pk) | color | size | material = table E

I'm not sure I understand exactly what you're asking, but here's one approach to creating an exclusive arc in SQL.
-- Columns common to all types.
create table items (
item_id integer primary key,
item_type varchar(10) not null
check (item_type in ('doll', 'car', 'boat')),
-- This constraint lets the pair of columns be the target of a foreign key reference.
unique (item_id, item_type),
item_date date not null default current_date,
item_source varchar(25) not null
);
-- Columns unique to dolls. I'd assume that "size" means one thing when you're
-- talking about dolls, and something slightly different when you're talking
-- about boats.
create table dolls (
item_id integer primary key,
item_type varchar(10) not null default 'doll'
check(item_type = 'doll'),
foreign key (item_id, item_type) references items (item_id, item_type),
doll_size integer not null
check(doll_size between 1 and 20),
doll_material varchar(25) not null -- In production, probably references a table
-- of valid doll materials.
);
The column dolls.item_type, along with its CHECK constraint and the foreign key reference, guarantees that
every row in "dolls" has a matching row in "items", and
that matching row is also about dolls. (Not about boats or cars.)
Tables for boats and cars are similar.
If you have to implement this in MySQL, be aware that MySQL versions before 8.0.16 parse CHECK constraints but don't enforce them, so on those versions you'll have to replace them. In some cases, you can replace them with a foreign key reference to a tiny table. In other cases, you might have to write a trigger.
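To see the two guarantees in action, here is a minimal sketch that runs a trimmed version of the schema in SQLite through Python's sqlite3 module. SQLite, the shortened column list, and the helper names open_db and try_insert_doll are my assumptions for illustration, not part of the answer above.

```python
import sqlite3

SCHEMA = """
create table items (
    item_id   integer primary key,
    item_type text not null
        check (item_type in ('doll', 'car', 'boat')),
    unique (item_id, item_type)   -- target of the composite foreign key
);
create table dolls (
    item_id   integer primary key,
    item_type text not null default 'doll'
        check (item_type = 'doll'),
    doll_size integer not null
        check (doll_size between 1 and 20),
    foreign key (item_id, item_type) references items (item_id, item_type)
);
"""


def open_db():
    # Autocommit mode, so the PRAGMA is not issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("pragma foreign_keys = on")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    conn.execute(
        "insert into items (item_id, item_type) values (254, 'doll'), (255, 'car')"
    )
    return conn


def try_insert_doll(conn, item_id, item_type, size):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "insert into dolls (item_id, item_type, doll_size) values (?, ?, ?)",
            (item_id, item_type, size),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_db()
print(try_insert_doll(conn, 254, "doll", 16))  # accepted: item 254 is a doll
print(try_insert_doll(conn, 255, "doll", 10))  # rejected: no (255, 'doll') in items
print(try_insert_doll(conn, 255, "car", 10))   # rejected: CHECK on dolls.item_type
```

The second insert fails on the foreign key and the third on the CHECK constraint, which is exactly the pair of guarantees described above.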

What I am trying to achieve is called Polymorphic Association. This can be accomplished by creating a super table to store all possible columns and using a second and third table to constrain foreign keys to primary keys.
It's explained in detail here

Related

What's the best way to store an enum type in postgres being used in a join table?

I have the following three tables:
Movie Types
CREATE TYPE category AS ENUM ('comedy', 'drama', 'action', 'thriller');
CREATE TABLE movie_types (
id BIGSERIAL PRIMARY KEY,
category category NOT NULL
);
Column | Type | Modifiers
--------+--------+----------------------------------------------------
id | bigint | not null
category | category | not null
Movie to Movie types joins table
CREATE TABLE movie_categories (
id BIGSERIAL PRIMARY KEY,
movie_type_id BIGINT NOT NULL REFERENCES movie_types(id),
movie_id BIGINT NOT NULL REFERENCES movies(id)
);
Column | Type | Modifiers
--------+--------+----------------------------------------------------
id | bigint | not null
movie_type_id | bigint | not null
movie_id | bigint | not null
Movies
CREATE TABLE movies (
id BIGSERIAL PRIMARY KEY,
name varchar
);
Column | Type | Modifiers
--------+--------+----------------------------------------------------
id | bigint | not null
name | string | not null
The movie types are a limited list of categories stored as an enum. A movie can have several different categories associated with it.
What's the best practice when storing something in a similar data model? Is using the enum type here good practice, or is it better to just use varchar for category in movie_types?
An ENUM sets up a predetermined set of values that a column can take. A lookup table provides the same restriction on a column. The problem you face is that you are attempting to implement both. The solution is to choose one of them and implement it. I tend to lean toward minimizing maintenance where possible, so the following implements the lookup table approach.
First: drop the ENUM.
Second: alter the category column of the movie_types table to text.
Third: insert the prior enum values into the movie_types table.
Fourth: adjust the other tables as needed.
The result becomes a simple M:M between movies and movie_types, with movie_categories as the intersection table.
create table movie_types (
id bigint generated always as identity primary key
, category text not null unique
);
insert into movie_types(category)
values ('comedy'), ('drama'), ('action'), ('thriller');
create table movies (
id bigint generated always as identity primary key
, name varchar
);
create table movie_categories (
movie_type_id bigint references movie_types(id)
, movie_id bigint references movies(id)
, constraint movie_categories_pk
primary key (movie_id,movie_type_id)
);
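As a rough illustration of how the lookup-table design behaves, here is a sketch using SQLite through Python's sqlite3 module. SQLite has no "generated always as identity", so a plain integer primary key stands in for it. The sample movie row and the helper names add_category and categories_of are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table movie_types (
    id       integer primary key,   -- SQLite stand-in for an identity column
    category text not null unique
);
create table movies (
    id   integer primary key,
    name text
);
create table movie_categories (
    movie_type_id integer not null references movie_types(id),
    movie_id      integer not null references movies(id),
    primary key (movie_id, movie_type_id)
);
insert into movie_types (category)
    values ('comedy'), ('drama'), ('action'), ('thriller');
insert into movies (id, name) values (1, 'Example Movie');  -- made-up row
""")


def add_category(movie_id, category):
    """Link a movie to a category by name; fail if the category is unknown."""
    row = conn.execute(
        "select id from movie_types where category = ?", (category,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown category: {category}")
    conn.execute(
        "insert into movie_categories (movie_type_id, movie_id) values (?, ?)",
        (row[0], movie_id),
    )


def categories_of(movie_id):
    """Return the category names attached to one movie, alphabetically."""
    rows = conn.execute("""
        select t.category
        from movie_categories mc
        join movie_types t on t.id = mc.movie_type_id
        where mc.movie_id = ?
        order by t.category
    """, (movie_id,))
    return [r[0] for r in rows]


add_category(1, "comedy")
add_category(1, "action")
print(categories_of(1))  # ['action', 'comedy']
```

With this layout, adding a new category later is an ordinary INSERT into movie_types rather than a change to a type definition.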

Primary key of a simple 1:1-mapping table with NULL values?

This feels like a very basic question, but I really don't see the obvious answer at the moment.
I have a simple table that maps object ids between two namespaces:
|---------------------|------------------|
| id_in_ns1 | id_in_ns2 |
|---------------------|------------------|
| 1 | 5 |
|---------------------|------------------|
| 2 | 17 |
|---------------------|------------------|
| 3 | NULL |
|---------------------|------------------|
| NULL | 1 |
|---------------------|------------------|
The mapping is basically 1:1, but as you can see, some objects from namespace 1 do not exist in namespace 2, and vice versa, so that there are NULL values in the table.
So, what would be the primary key of this table? As a PK cannot be NULL, I can neither use (id_in_ns1) nor (id_in_ns2) nor the composite.
The only idea I have is to replace NULL with a definite value, say -1, and to use (id_in_ns1, id_in_ns2) as the PK. However, this feels not only hackish but also "unnormal", because the non-NULL (or non-(-1)) value alone is already sufficient to uniquely identify an object.
Only add entries that have a valid id on both sides. This will effectively get rid of all NULL values, allowing you to specify a proper composite key on (id_in_ns1, id_in_ns2).
Ultimately, those are the values that allow you to identify a single row, and you will not lose relevant information: a SELECT id_in_ns2 FROM mapping_table WHERE id_in_ns1 = x yields no usable id either way, returning NULL if there is an (x, NULL) row and an empty result if there is not.
If you insist on keeping those NULLs you could add another column with an artificial (auto incrementing) primary key, but that feels as hacky as using -1.
Use a synthetic primary key and use unique constraints for the rest:
create table mapping (
mappingId int auto_increment primary key, -- or whatever for your database
id_in_ns1 int references ns1(id),
id_in_ns2 int references ns2(id),
unique (id_in_ns1),
unique (id_in_ns2)
);
Just one caveat: some databases only allow one NULL value for UNIQUE constraints. You might need to use a filtered unique index instead (or some other mechanism) for this construct.
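Here is a small sketch of that design in SQLite, via Python's sqlite3 module. SQLite treats NULLs as distinct in UNIQUE constraints, so several NULLs are accepted there. Other databases may behave differently, as the caveat says. The references to ns1 and ns2 are left out to keep the example self-contained, and add_mapping is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table mapping (
        mapping_id integer primary key,   -- synthetic key
        id_in_ns1  integer unique,
        id_in_ns2  integer unique
    )
""")


def add_mapping(ns1, ns2):
    """Insert one mapping row; return False if a UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into mapping (id_in_ns1, id_in_ns2) values (?, ?)",
                (ns1, ns2),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_mapping(1, 5),
    add_mapping(3, None),
    add_mapping(4, None),   # a second NULL in id_in_ns2
    add_mapping(None, 1),
    add_mapping(1, 17),     # id_in_ns1 = 1 is already mapped
]
print(results)  # [True, True, True, True, False]
```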

Can a foreign key refer to a primary key in the same table?

I think the answer is no, because a foreign key doesn't have the uniqueness property.
But some people said that it is possible when self-joining the table.
I am new to SQL. If it's true, please explain how and why.
Employee table
| e_id | e_name | e_sala | d_id |
|---- |------- |----- |--------|
| 1 | Tom | 50K | A |
| 2 | Billy | 15K | A |
| 3 | Bucky | 15K | B |
department table
| d_id | d_name |
|---- |------- |
| A | XXX |
| B | YYY |
Now, d_id is a foreign key, so how can it be a primary key? Also, please explain joins: what are they used for?
I think the question is a bit confusing.
If you mean "can foreign key 'refer' to a primary key in the same table?", the answer is a firm yes as some replied. For example, in an employee table, a row for an employee may have a column for storing manager's employee number where the manager is also an employee and hence will have a row in the table like a row of any other employee.
If you mean "can a column (or set of columns) be a primary key as well as a foreign key in the same table?", the answer, in my view, is no; it seems meaningless. However, the following definition succeeds in SQL Server!
create table t1(c1 int not null primary key foreign key references t1(c1))
But I think it is meaningless to have such a constraint unless somebody comes up with a practical example.
AmanS, in your example d_id in no circumstance can be a primary key in Employee table. A table can have only one primary key. I hope this clears your doubt. d_id is/can be a primary key only in department table.
This may be a good explanation example
CREATE TABLE employees (
id INTEGER NOT NULL PRIMARY KEY,
managerId INTEGER REFERENCES employees(id),
name VARCHAR(30) NOT NULL
);
INSERT INTO employees(id, managerId, name) VALUES(1, NULL, 'John');
INSERT INTO employees(id, managerId, name) VALUES(2, 1, 'Mike');
-- Explanation:
-- In this example.
-- John is Mike's manager. Mike does not manage anyone.
-- Mike is the only employee who does not manage anyone.
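Since the question also asks what a join is for, here is a sketch of a self-join on that same employees table, run in SQLite through Python's sqlite3 module. The choice of SQLite and the helper name employees_with_managers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")  # SQLite leaves FKs off by default
conn.executescript("""
create table employees (
    id        integer not null primary key,
    managerId integer references employees(id),
    name      varchar(30) not null
);
insert into employees (id, managerId, name) values (1, null, 'John');
insert into employees (id, managerId, name) values (2, 1, 'Mike');
""")


def employees_with_managers():
    # The same table appears twice under two aliases:
    # e is the employee row, m is the row of that employee's manager.
    return conn.execute("""
        select e.name, m.name
        from employees e
        left join employees m on m.id = e.managerId
        order by e.id
    """).fetchall()


print(employees_with_managers())  # [('John', None), ('Mike', 'John')]
```

The left join keeps John in the result even though he has no manager. An inner join would drop him.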
Sure, why not? Let's say you have a Person table, with id, name, age, and parent_id, where parent_id is a foreign key to the same table. You wouldn't need to normalize the Person table to Parent and Child tables, that would be overkill.
Person
| id | name | age | parent_id |
|----|-------|-----|-----------|
| 1 | Tom | 50 | null |
| 2 | Billy | 15 | 1 |
Something like this.
I suppose to maintain consistency, there would need to be at least 1 null value for parent_id, though. The one "alpha male" row.
EDIT: As the comments show, Sam found a good reason not to do this. It seems that in MySQL, when you attempt to edit the primary key, the edit won’t propagate properly even if you specify ON UPDATE CASCADE. Although primary keys are (usually) off-limits to editing in production, it is nevertheless a limitation not to be ignored. Thus I change my answer to: you should probably avoid this practice unless you have pretty tight control over the production system (and can guarantee no one will implement a control that edits the PKs). I haven't tested it outside of MySQL.
E.g., n levels of sub-categories for categories. In such a table, the primary key id is referenced by the foreign key sub_category_id.
A good example of using ids of other rows in the same table as foreign keys is nested lists.
Deleting a row that has children (i.e., rows, which refer to parent's id), which also have children (i.e., referencing ids of children) will delete a cascade of rows.
This will save a lot of pain (and a lot of code of what to do with orphans - i.e., rows, that refer to non-existing ids).
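A minimal sketch of that cascade, assuming SQLite (via Python's sqlite3 module) and a hypothetical categories table with a parent_id column:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")  # needed for cascades in SQLite
conn.executescript("""
create table categories (
    id        integer primary key,
    name      text not null,
    parent_id integer references categories(id) on delete cascade
);
insert into categories (id, name, parent_id) values
    (1, 'root', null),
    (2, 'child', 1),
    (3, 'grandchild', 2),
    (4, 'other root', null);
""")


def delete_category(category_id):
    """Delete one row; the database removes all of its descendants."""
    conn.execute("delete from categories where id = ?", (category_id,))


def remaining_names():
    rows = conn.execute("select name from categories order by id")
    return [r[0] for r in rows]


delete_category(1)
print(remaining_names())  # ['other root']
```

Deleting the root removes the child and, through it, the grandchild, with no application code for orphans.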
Other answers have given clear enough examples of a record referencing another record in the same table.
There are even valid use cases for a record referencing itself in the same table. For example, a point of sale system accepting many tenders may need to know which tender to use for change when the payment is not the exact value of the sale. For many tenders that's the same tender, for others that's domestic cash, for yet other tenders, no form of change is allowed.
All this can be pretty elegantly represented with a single tender attribute which is a foreign key referencing the primary key of the same table, and whose values sometimes match the respective primary key of same record. In this example, the absence of value (also known as NULL value) might be needed to represent an unrelated meaning: this tender can only be used at its full value.
Popular relational database management systems support this use case smoothly.
Take-aways:
When inserting a record, the foreign key reference is verified to be present after the insert, rather than before the insert.
When inserting multiple records with a single statement, the order in which the records are inserted matters. The constraints are checked for each record separately.
Certain other data patterns, such as those involving circular dependences on record level going through two or more tables, cannot be purely inserted at all, or at least not with all the foreign keys enabled, and they have to be established using a combination of inserts and updates (if they are truly necessary).
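The tender example could be sketched like this, assuming SQLite through Python's sqlite3 module. The table layout, the tender names, and the helper change_tender_for are all hypothetical. Note that other engines may check the constraint at a different moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table tenders (
    id               integer primary key,
    name             text not null unique,
    change_tender_id integer references tenders(id)
);
-- Cash gives change in cash: the row references itself.
insert into tenders (id, name, change_tender_id) values (1, 'cash', 1);
-- A foreign-currency tender gives change in domestic cash.
insert into tenders (id, name, change_tender_id) values (2, 'foreign cash', 1);
-- NULL: this tender can only be used at its full value.
insert into tenders (id, name, change_tender_id) values (3, 'voucher', null);
""")


def change_tender_for(name):
    """Return the name of the tender used for change, or None if no change is given."""
    row = conn.execute("""
        select c.name
        from tenders t
        left join tenders c on c.id = t.change_tender_id
        where t.name = ?
    """, (name,)).fetchone()
    return row[0] if row else None


for tender in ("cash", "foreign cash", "voucher"):
    print(tender, "->", change_tender_for(tender))
```

The first insert succeeds even though the referenced row does not exist until that same statement has run, which matches the first take-away above.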
Adding to the answer by @mysagar, the way to do the same in MySQL is demonstrated below:
CREATE TABLE t1 (
c1 INT NOT NULL,
PRIMARY KEY (c1),
CONSTRAINT fk FOREIGN KEY (c1)
REFERENCES t1 (c1)
ON UPDATE RESTRICT
ON DELETE RESTRICT
);
would give error -
ERROR 1822 (HY000): Failed to add the foreign key constraint. Missing index for constraint 'fk' in the referenced table 't1'
The correct way to do it is -
CREATE TABLE t1 (
c1 INT NOT NULL,
PRIMARY KEY (c1),
KEY i (c1),
CONSTRAINT fk FOREIGN KEY (c1)
REFERENCES t1 (c1)
ON UPDATE RESTRICT
ON DELETE RESTRICT
);
One practical utility I can think of is a quick-fix to ensure that after a value is entered in the PRIMARY KEY column, it can neither be updated, nor deleted.
For example, over here let's populate table t1 -
INSERT INTO t1 (c1) VALUES
(1),
(2),
(3),
(4),
(5);
SELECT * FROM t1;
+----+
| c1 |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
Now, let's try updating row1 -
UPDATE t1
SET c1 = 6 WHERE c1 = 1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`constraints`.`t1`, CONSTRAINT `fk` FOREIGN KEY (`c1`) REFERENCES `t1` (`c1`) ON DELETE RESTRICT ON UPDATE RESTRICT)
Now, let's try deleting row1 -
DELETE FROM t1
WHERE c1 = 1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`constraints`.`t1`, CONSTRAINT `fk` FOREIGN KEY (`c1`) REFERENCES `t1` (`c1`) ON DELETE RESTRICT ON UPDATE RESTRICT)

Which one is best practice for database table having two columns...one for Item1 and second for Item2?

Table Structure
Item1 | Item2
Apple | Orange
Pen | Paper
OR
ID | Item1 | Item2
1 | Apple | Orange
2 | Pen | Paper
In short, I wish to know: is it good practice to add a primary key column/field ID to a table even if its other columns are allowed to hold duplicate values?
You should have a primary key. My guess from the extremely limited info you have posted is that neither of your fields will count as a primary key. Therefore, you need an id field.
(note: You could do it without, but it's a bad idea)
Well, having a numeric ID for each record, and making that ID the primary key of the table, is the preferred way to go:
comparison operations on numbers are much faster than those on strings (provided the number fits into a CPU integer or long value; check the INTEGER SQL type);
your database stays consistent if you want to rename "Apple" to "apple" or, maybe, "APPLE" one day. Without the extra ID column, you'd have to update all the dependent tables (if you're planning to have primary/foreign keys, of course);
you will have only one column in your primary key even when you have both Orange and Green apples; without the extra ID you'd have to make the primary key (Item1, Item2).
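The renaming point can be sketched as follows, assuming SQLite through Python's sqlite3 module. The order_lines table is a hypothetical dependent table, and rename_item and order_report are made-up helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table items (
    id   integer primary key,
    name text not null unique
);
create table order_lines (          -- hypothetical dependent table
    id      integer primary key,
    item_id integer not null references items(id),
    qty     integer not null
);
insert into items (id, name) values (1, 'Apple'), (2, 'Pen');
insert into order_lines (item_id, qty) values (1, 3), (2, 10);
""")


def rename_item(item_id, new_name):
    # Only the items row changes; order_lines keeps pointing at the same id.
    conn.execute("update items set name = ? where id = ?", (new_name, item_id))


def order_report():
    return conn.execute("""
        select i.name, o.qty
        from order_lines o
        join items i on i.id = o.item_id
        order by o.id
    """).fetchall()


rename_item(1, "APPLE")
print(order_report())  # [('APPLE', 3), ('Pen', 10)]
```

Had order_lines stored the item name instead of the id, the rename would have required updating every dependent row as well.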
Sounds like you might need more than one table, but it really depends on what you need to store and why. Try designing your model first, then create a data structure that supports your model.
To answer your question, it looks like the second table structure is the "best practice" (or should I say "better practice"), as it contains an ID column that can act as the primary key for indexing, but it really depends on usage. Consider the following:
One possible data structure (separates by type)
table - Fruit
=============
FruitId int not null identity(1,1) primary key
Name varchar(100) not null
table - OfficeSupply
====================
OfficeSupplyId int not null identity(1,1) primary key
Name varchar(100) not null
Another possibility (combined with a type column)
table - Item
============
ItemId int not null identity(1,1) primary key
Name varchar(100) not null
Type varchar(100) not null

SQL - Properties Structure

What is the best way to set up this table structure?
I have 3 tables, one table we'll call fruit and the other two tables are properties of that fruit so fruit_detailed and fruit_basic.
fruit
id | isDetailed
fruit_detailed
id | price | color | source | weight | fruitid?
fruit_basic
id | value | fruitid?
So what I want to do is have a property in fruit called isDetailed and, if it is true, fill the fruit_detailed table with properties like color, weight, source, etc. (multiple columns). If it's false, then store the properties in the fruit_basic table, written in a single row.
Storage sounds quite basic, but if I want to select a fruit and get its properties, how can I determine which table to join? I could use an IF statement on the isDetailed property and then join accordingly, but then you have two different types of properties coming back.
How would you create the tables or do the join to get the properties? Am I missing something?
Personally, I see no need to split the basic and detailed attributes out into separate tables. I think they can/should all be columns of the main fruit table.
I would probably model this like so:
CREATE TABLE Fruits (
fruit_id INT NOT NULL,
CONSTRAINT PK_Fruit PRIMARY KEY CLUSTERED (fruit_id)
)
CREATE TABLE Fruit_Details (
fruit_id INT NOT NULL,
price MONEY NOT NULL,
color VARCHAR(20) NOT NULL,
source VARCHAR(20) NOT NULL,
weight DECIMAL(10, 4) NOT NULL,
CONSTRAINT PK_Fruit_Detail PRIMARY KEY CLUSTERED (fruit_id),
CONSTRAINT FK_Fruit_Detail_Fruit FOREIGN KEY (fruit_id) REFERENCES Fruits (fruit_id)
)
I had to guess on appropriate data types for some of the columns. I'm also not sure exactly what the "value" column is in your Fruit_Basic table, so I've left that out for now.
Don't bother putting a bunch of IDs out there simply for the sake of having an ID column on every table. The Fruits->Fruit_Details relationship is a one-to-zero-or-one relationship. In other words, you can have at most one Fruit_Details row for each Fruits row. In some cases you might have no row in Fruit_Details for a particular row in Fruits.
When you're querying you can simply OUTER JOIN from the Fruits table to the Fruit_Details table. If you get back a NULL value for Fruit_Details.fruit_id then you know that the fruit doesn't have any details. You can always include the Fruit_Details columns, they'll just be NULL if the row doesn't exist. That way you can always have homogeneous resultsets. As you've discovered, otherwise you end up having to worry about different column lists coming back depending on the row in question, which will lead to tons of headaches.
If you want to include an "isDetailed" column then you can just use this:
CASE WHEN Fruit_Details.fruit_id IS NULL THEN 0 ELSE 1 END AS isDetailed
This approach also has an advantage over putting all of the columns in one table because it lowers the number of NULL columns in your database and depending on your data can substantially decrease storage requirements and improve performance.
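To make the outer join concrete, here is a sketch in SQLite via Python's sqlite3 module. The SQL Server types are swapped for SQLite ones, the sample rows are invented, and fruits_with_details is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table Fruits (
    fruit_id integer not null primary key
);
create table Fruit_Details (
    fruit_id integer not null primary key references Fruits (fruit_id),
    price    numeric not null,
    color    varchar(20) not null,
    source   varchar(20) not null,
    weight   numeric not null
);
insert into Fruits (fruit_id) values (1), (2);
-- Only fruit 1 has a details row (invented sample data).
insert into Fruit_Details (fruit_id, price, color, source, weight)
    values (1, 0.5, 'red', 'local farm', 0.2);
""")


def fruits_with_details():
    # Every fruit comes back with the same column list;
    # the detail columns are NULL when no details row exists.
    return conn.execute("""
        select f.fruit_id,
               case when d.fruit_id is null then 0 else 1 end as isDetailed,
               d.price,
               d.color
        from Fruits f
        left outer join Fruit_Details d on d.fruit_id = f.fruit_id
        order by f.fruit_id
    """).fetchall()


print(fruits_with_details())  # [(1, 1, 0.5, 'red'), (2, 0, None, None)]
```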
I'm not sure why you would need to store a basic or detailed list of the fruit in different tables. You should just have 1 table and then leave some of the fields null if the information doesn't exist.
Assuming that value from fruit_basic is the same as price from fruit_detailed, you'd have something like this.
fruit
id | detail_id (fk to fruit_detailed table)
fruit_detailed
detail_id | price | color | source | weight