Migrating legacy table to normalized data structure with foreign keys in Oracle SQL - sql

I am having some trouble wrapping my head around restructuring databases. I have a book database that includes only one table, where all of the author's data is repeated with each book. I'm trying to rebuild this database so that it has an author table and a book table.
I made the author table using:
CREATE TABLE AUTHORS
AS SELECT AUTHOR_NAME, AUTHOR_SURNAME, AUTHOR_BIRTHDATE
If I now want to remake the book table, how do I add the foreign key so that the author of each book will be the correct one? That is, if the first entry on the original book table was:
ISBN1 Title1 Author_Name1 Author_Surname1 Author_Birthdate1
How do I import this data into the new table so that the new author field, a foreign key, references the correct entry in the author table? Sorry if it's confusing.

You are looking to split the existing table into two tables, one to store the authors and the other for books. For this to work properly, you need to create a unique id for each author. Here is a step by step approach.
Assuming the following legacy data structure:
create table old_books (
isbn NUMBER(13, 0),
title VARCHAR2(200),
author_name VARCHAR2(200),
author_surname VARCHAR2(200),
author_birthdate DATE
);
And this sample data:
ISBN | TITLE | AUTHOR_NAME | AUTHOR_SURNAME | AUTHOR_BIRTHDATE
------------: | :----- | :---------- | :------------- | :---------------
1000000000001 | book 1 | name 1 | surname 1 | 01-MAR-90
1000000000002 | book 2 | name 2 | surname 2 | 01-MAR-95
1000000000003 | book 3 | name 1 | surname 1 | 01-MAR-90
First, let's create and populate the new data structure for authors. (You don't want to use CREATE TABLE AS SELECT ... here, because it gives you much less control over column definitions, constraints, and other useful options.)
To generate a unique author id, we use the IDENTITY feature (available starting with Oracle 12c; without this feature, we would need to create a sequence and a trigger).
In the legacy data, we assume that each author is uniquely identified by their name, surname and birthdate:
CREATE TABLE authors (
id NUMBER GENERATED ALWAYS AS IDENTITY,
name VARCHAR2(200),
surname VARCHAR2(200),
birthdate DATE,
PRIMARY KEY (id)
);
INSERT INTO AUTHORS (name, surname, birthdate)
SELECT DISTINCT author_name, author_surname, author_birthdate FROM old_books;
2 rows affected
SELECT * FROM authors;
ID | NAME | SURNAME | BIRTHDATE
-: | :----- | :-------- | :--------
1 | name 1 | surname 1 | 01-MAR-90
2 | name 2 | surname 2 | 01-MAR-95
With this first table in place, we can now create the books table. It contains a foreign key that references the primary key of the authors table. To feed the table, we need to join the legacy table with the new authors table to recover the id of each author:
CREATE TABLE books (
isbn NUMBER(13, 0),
title VARCHAR2(200),
author_id NUMBER,
CONSTRAINT book_author FOREIGN KEY(author_id) REFERENCES authors(id),
PRIMARY KEY (isbn)
);
INSERT INTO books(isbn, title, author_id)
SELECT ob.isbn, ob.title, a.id
FROM old_books ob
INNER JOIN authors a
ON a.name = ob.author_name
AND a.surname = ob.author_surname
AND a.birthdate = ob.author_birthdate;
3 rows affected
SELECT * FROM books;
ISBN | TITLE | AUTHOR_ID
------------: | :----- | --------:
1000000000001 | book 1 | 1
1000000000002 | book 2 | 2
1000000000003 | book 3 | 1
All set! Data is properly spread between the two tables, with the proper constraints in place. We can join both tables with a query like:
SELECT b.isbn, b.title, a.name, a.surname, a.birthdate
FROM authors a
INNER JOIN books b ON a.id = b.author_id;
ISBN | TITLE | NAME | SURNAME | BIRTHDATE
------------: | :----- | :----- | :-------- | :--------
1000000000001 | book 1 | name 1 | surname 1 | 01-MAR-90
1000000000002 | book 2 | name 2 | surname 2 | 01-MAR-95
1000000000003 | book 3 | name 1 | surname 1 | 01-MAR-90
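The same two-step migration can be tried end to end without an Oracle instance. The following is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module as a stand-in, so INTEGER PRIMARY KEY replaces the Oracle IDENTITY column and dates are stored as text. The helper names load_legacy and migrate are made up for illustration.

```python
import sqlite3

# Sample rows from the answer; dates are written as ISO text for SQLite.
LEGACY_ROWS = [
    (1000000000001, "book 1", "name 1", "surname 1", "1990-03-01"),
    (1000000000002, "book 2", "name 2", "surname 2", "1995-03-01"),
    (1000000000003, "book 3", "name 1", "surname 1", "1990-03-01"),
]


def load_legacy() -> sqlite3.Connection:
    """Create an in-memory database holding only the legacy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute(
        "CREATE TABLE old_books (isbn INTEGER, title TEXT, author_name TEXT,"
        " author_surname TEXT, author_birthdate TEXT)"
    )
    conn.executemany("INSERT INTO old_books VALUES (?, ?, ?, ?, ?)", LEGACY_ROWS)
    conn.commit()
    return conn


def migrate(conn: sqlite3.Connection) -> None:
    """Split old_books into authors and books, then check that no book was lost."""
    conn.executescript(
        """
        CREATE TABLE authors (
            id        INTEGER PRIMARY KEY,  -- stand-in for GENERATED ALWAYS AS IDENTITY
            name      TEXT,
            surname   TEXT,
            birthdate TEXT
        );
        INSERT INTO authors (name, surname, birthdate)
        SELECT DISTINCT author_name, author_surname, author_birthdate FROM old_books;

        CREATE TABLE books (
            isbn      INTEGER PRIMARY KEY,
            title     TEXT,
            author_id INTEGER REFERENCES authors(id)
        );
        INSERT INTO books (isbn, title, author_id)
        SELECT ob.isbn, ob.title, a.id
        FROM old_books ob
        JOIN authors a
          ON a.name = ob.author_name
         AND a.surname = ob.author_surname
         AND a.birthdate = ob.author_birthdate;
        """
    )
    old_count = conn.execute("SELECT COUNT(*) FROM old_books").fetchone()[0]
    new_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
    if old_count != new_count:
        raise RuntimeError(f"{old_count - new_count} legacy rows did not match an author")
```

The row-count comparison at the end is a cheap sanity check: a legacy row with NULL in any of the three author columns would not satisfy the equality join and would silently drop out of books.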

You say that an author's first name plus surname are your author table's primary key. This is a valid approach. In case of two authors with the same name you'd have to find a solution like 'John' + 'Smith' and 'John R.' + 'Smith' or 'John' + 'Smith (the fantasy author)'. This is called a natural composite key, albeit not a perfect one as we may have to deal with duplicate names as mentioned. On the other hand there exist authors with the same name, so we may face this problem right away ;-)
Books are identified by their ISBN, which makes for an even better natural key, because there can be no duplicates. (Only if you wanted to add very old books or self-marketed books that have no ISBN, you'd have to create a fake ISBN.)
In order to have your book referring to an author, you must include the whole key, which is first and surname here. This is no redundancy, as this is the key needed to identify an author in your database.
CREATE TABLE books AS SELECT isbn, title, author_name, author_surname FROM old_table;
ALTER TABLE books ADD CONSTRAINT fk_book_author FOREIGN KEY (author_name, author_surname)
REFERENCES authors (author_name, author_surname);
An alternative would be to introduce surrogate (i.e. technical) keys. You would generate an ID (a number) for each book and each author and work with them. (That means the book table would contain an author_id.) But for a good database you should still think about what identifies a row naturally. This makes it easier for people who write the queries later. (E.g. someone asks to select a list of authors and the number of books they've written. How to write that query? Does it suffice to show first and surname or could we end up with two rows "John Smith | 5" and "John Smith | 2" and the enquirer saying they cannot use this ambiguous result?) Even when providing surrogate keys you should still have a unique constraint on the natural key, if there is one. For books with optional ISBNs this may be title + author_id and for authors it could be first name + surname + date of birth.
By the way: There exist books with more than one author ;-)
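To make the surrogate-key variant concrete, here is a minimal sketch, again using SQLite via Python's sqlite3 module as a stand-in. The table layout, the sample rows and the function names build_library and books_per_author are assumptions for illustration. It keeps a unique constraint on the natural key (first name + surname + date of birth) next to the generated id, and groups by the id so that two authors named John Smith stay distinguishable in the result.

```python
import sqlite3


def build_library() -> sqlite3.Connection:
    """Authors get a surrogate id, but the natural key stays unique as well."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE authors (
            id         INTEGER PRIMARY KEY,          -- surrogate key
            first_name TEXT NOT NULL,
            surname    TEXT NOT NULL,
            birthdate  TEXT NOT NULL,
            UNIQUE (first_name, surname, birthdate)  -- natural key
        );
        CREATE TABLE books (
            isbn      TEXT PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id)
        );
        INSERT INTO authors (id, first_name, surname, birthdate) VALUES
            (1, 'John', 'Smith', '1950-01-01'),
            (2, 'John', 'Smith', '1980-06-15');
        INSERT INTO books (isbn, title, author_id) VALUES
            ('isbn-1', 'First book', 1),
            ('isbn-2', 'Second book', 1),
            ('isbn-3', 'Third book', 2);
        """
    )
    return conn


def books_per_author(conn: sqlite3.Connection):
    """Count books per author; the birthdate column disambiguates equal names."""
    return conn.execute(
        """
        SELECT a.first_name, a.surname, a.birthdate, COUNT(b.isbn)
        FROM authors a
        LEFT JOIN books b ON b.author_id = a.id
        GROUP BY a.id
        ORDER BY a.id
        """
    ).fetchall()
```

Showing the date of birth alongside the name is one way to avoid the ambiguous "John Smith | 5" and "John Smith | 2" rows mentioned above.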

Related

Making combinations of attributes unique in PostgreSQL

PostgreSQL lets us declare a constraint that makes a combination of attributes of one table unique:
UNIQUE (A, B, C)
Is it possible to take attributes from multiple tables and make their combination unique in some way?
Edit:
Table 1: List of Book
Attributes: ID, Title, Year, Publisher
Table 2: List of Author
Attributes: Name, ID
Table 3: Written By: Relation between Book and Author
Attributes: Book_ID, Author_ID
Now I have a situation where I don't want the combination (Title, Year, Publisher, Authors) to be repeated anywhere in my database.
There are four possible solutions to this problem:
1. You add a column "authorID" to the table "book", as a foreign key. You can then add the UNIQUE constraint to the table "book".
2. You add a foreign key on the 2 columns (bookID, authorID) which references the table bookAuthor.
3. You create a trigger on insert on the table "book" which checks whether the combination exists and does not insert the row if it does. You will find a working example of this option below.
Whilst working on option 3 I realised that the JOIN to WrittenBy must be done on Title and not ID. Otherwise we can record the same book as many times as we like just by using a new ID. The problem with using the title is that the slightest change in spelling or punctuation means that it is treated as a new title.
In the example the 3rd insert fails because the book already exists. In the 4th I have left 2 spaces in "Tom  Sawyer" and it is accepted as a different title.
Also, as we use a join to find the author, the real effect of our rule is exactly the same as if we had a UNIQUE constraint on the table Book on the columns Title, Year and Publisher. This means that everything I have coded is a waste of time.
We thus decide, after coding it, that this option is not effective.
4. We could create a fourth table with the 4 columns and a UNIQUE constraint on all 4. This seems a heavy solution compared to option 1.
CREATE TABLE Book (
ID int primary key,
Title varchar(25),
Year int,
Publisher varchar(10) );
CREATE TABLE Author (
ID int primary key,
Name varchar(10)
);
CREATE TABLE WrittenBy(
Book_ID int primary key,
Titlew varchar(25),
Author_ID int
);
CREATE FUNCTION book_insert_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
DECLARE
authID INTEGER;
coun INTEGER;
BEGIN
IF pg_trigger_depth() <> 1 THEN
RETURN NEW;
END IF;
SELECT MAX(Author_ID) into authID
FROM WrittenBy w
WHERE w.Titlew = NEW.Title;
SELECT COUNT(*) INTO coun FROM
Book b LEFT JOIN WrittenBy w ON
b.Title = w.Titlew
WHERE NEW.year = b.year
AND NEW.title=b.title
AND NEW.publisher=b.publisher
AND authID = COALESCE(w.Author_ID,authID);
IF coun > 0 THEN
RETURN null; -- this means that we do not insert
ELSE
RETURN NEW;
END IF;
END;
$$
;
CREATE TRIGGER book_insert_trigger
BEFORE INSERT
ON Book
FOR EACH ROW
EXECUTE PROCEDURE book_insert_trigger_function();
INSERT INTO WrittenBy VALUES
(1,'Tom Sawyer',1),
(2,'Huckleberry Finn',1);
INSERT INTO Book VALUES (1,'Tom Sawyer',1950,'Classics');
INSERT INTO Book VALUES (2,'Huckleberry Finn',1950,'Classics');
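INSERT INTO Book VALUES (3,'Tom Sawyer',1950,'Classics'); -- rejected by the trigger: the combination already exists
INSERT INTO Book VALUES (3,'Tom  Sawyer',1950,'Classics'); -- two spaces in the title: accepted as a different title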
SELECT *
FROM Book b
LEFT JOIN WrittenBy w on w.Titlew = b.Title
LEFT JOIN Author a on w.author_ID = a.ID;
>
> id | title | year | publisher | book_id | titlew | author_id | id | name
> -: | :--------------- | ---: | :-------- | ------: | :--------------- | --------: | ---: | :---
> 1 | Tom Sawyer | 1950 | Classics | 1 | Tom Sawyer | 1 | null | null
> 2 | Huckleberry Finn | 1950 | Classics | 2 | Huckleberry Finn | 1 | null | null
> 3 | Tom Sawyer | 1950 | Classics | null | null | null | null | null
>
db<>fiddle here
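The conclusion that the trigger behaves like a UNIQUE constraint on Title, Year and Publisher can be checked with a much smaller experiment. This is a sketch only, using SQLite through Python's sqlite3 module rather than PostgreSQL; the function name insert_books and the use of distinct ids 1 to 4 are assumptions made for the illustration.

```python
import sqlite3


def insert_books(rows):
    """Insert rows into a Book table guarded by UNIQUE (Title, Year, Publisher).

    Returns the ids of the rows that were accepted.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Book (
            ID        INTEGER PRIMARY KEY,
            Title     TEXT,
            Year      INTEGER,
            Publisher TEXT,
            UNIQUE (Title, Year, Publisher)
        )
        """
    )
    accepted = []
    for row in rows:
        try:
            conn.execute("INSERT INTO Book VALUES (?, ?, ?, ?)", row)
            accepted.append(row[0])
        except sqlite3.IntegrityError:
            # Constraint violation: the row is skipped, which mirrors
            # RETURN NULL in the trigger function.
            pass
    return accepted


ATTEMPTS = [
    (1, "Tom Sawyer", 1950, "Classics"),
    (2, "Huckleberry Finn", 1950, "Classics"),
    (3, "Tom Sawyer", 1950, "Classics"),   # exact duplicate of row 1
    (4, "Tom  Sawyer", 1950, "Classics"),  # two spaces: a different string
]
```

Calling insert_books(ATTEMPTS) accepts rows 1, 2 and 4: the plain constraint rejects the exact duplicate and, like the trigger, is fooled by the extra space.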

Creating a database. Design help needed

I am practicing Python Flask and I am going to make a very simple music site, and I am now creating the database.
I am new to databases, so I just wanted help to see if the tables and relations are correct.
Also, would I store multiple song IDs in the playlist_songs songID column?
userTable
userID (PK), username, email, password, level
songTable
songID (PK), songName, songArtist, songGenre, songDuration
playlistTable
playlistID (PK), userID (FK), playlistName, playlistDescription
playlist_songs
playlistID (FK), songID (FK)
As requested, I'm adding some collective info based on your question and comments.
Your design looks fine. As recommended by Rowland, it could perhaps use an order column, something to order by. If you choose not to add one, the songs of a playlist will be returned in no guaranteed order; you could order by the SongId column and at least get the same order every time (within a playlist), but that order couldn't be changed.
You asked how data is entered into the playlist_songs table:
SongTable
SongId | SongName | ...
-----------------------------
1 | Happy Birthday | ...
2 | Last Christmas | ...
3 | Christmas tree | ...
4 | Some song | ...
PlaylistTable
PlaylistId | PlaylistName | ...
-------------------------------------
1 | My Birthday songs | ...
2 | My Christmas songs | ...
3 | All my songs | ...
Playlist_songs
PlaylistId (FK) | SongId (FK)
-----------------------------
1 | 1
2 | 2
2 | 3
3 | 1
3 | 2
3 | 3
3 | 4
As you can see the Playlist_songs table can contain many playlists and many songs. If you query Playlist_songs for PlaylistId = 2 it will return SongId 2 & 3, and so on.
With the current design, the primary key would have to be a constraint on both columns (a compound key). This is also where you could add an Order column, or just add a standalone primary key (Id, for example) and order by that.
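As a rough sketch of the compound key plus an ordering column, the following uses SQLite through Python's sqlite3 module (the question mentions Flask, so Python seemed a natural fit). The column name position and the function names build_playlists and songs_in_playlist are hypothetical; the other names follow the question.

```python
import sqlite3


def build_playlists() -> sqlite3.Connection:
    """One row per (playlist, song) pair; never several song ids in one column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE songTable (songID INTEGER PRIMARY KEY, songName TEXT);
        CREATE TABLE playlistTable (playlistID INTEGER PRIMARY KEY, playlistName TEXT);
        CREATE TABLE playlist_songs (
            playlistID INTEGER NOT NULL REFERENCES playlistTable(playlistID),
            songID     INTEGER NOT NULL REFERENCES songTable(songID),
            position   INTEGER NOT NULL,       -- hypothetical ordering column
            PRIMARY KEY (playlistID, songID)   -- compound key
        );
        INSERT INTO songTable VALUES
            (1, 'Happy Birthday'), (2, 'Last Christmas'), (3, 'Christmas tree');
        INSERT INTO playlistTable VALUES
            (1, 'My Birthday songs'), (2, 'My Christmas songs');
        INSERT INTO playlist_songs VALUES (1, 1, 1), (2, 3, 1), (2, 2, 2);
        """
    )
    return conn


def songs_in_playlist(conn: sqlite3.Connection, playlist_id: int):
    """Return the song names of one playlist in the order chosen by the user."""
    rows = conn.execute(
        """
        SELECT s.songName
        FROM playlist_songs ps
        JOIN songTable s ON s.songID = ps.songID
        WHERE ps.playlistID = ?
        ORDER BY ps.position
        """,
        (playlist_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

With this layout the compound key also stops the same song from being added twice to one playlist; if repeats should be allowed, a standalone id column would be the alternative mentioned above.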

How do I delete duplicate records, or merge them with foreign-key constraints intact?

I have a database generated from an XML document with duplicate records. I know how to delete one record from the main table, but not those with foreign-key constraints.
I have a large number of XML documents, and they are inserted without checking for duplicates. One solution to removing duplicates is to just delete the lowest Primary_Key values (and all related foreign key records) and keep the highest. I don't know how to do that, though.
The database looks like this:
Table 1: [type]
+-------------+---------+-----------+
| Primary_Key | Food_ID | Food_Type |
+-------------+---------+-----------+
| 70001 | 12345 | fruit |
| 70002 | 12345 | fruit |
| 70003 | 12345 | meat |
+----^--------+---------+-----------+
|
|-----------------|
|
| Linked to primary key in the first table
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key | Information_ID | Food_Name | Information | Comments |
+-------------+-----------------+-------------+-------------+------------+
| 0001 | 70001 | banana | buy # toms | delicious! |
| 0002 | 70002 | banana | buy # mats | so-so |
| 0003 | 70003 | decade meat | buy # sals | disgusting |
+-------------+-----------------+-------------+-------------+------------+
^ Table 2: [food_information]
There are several other linked tables as well, which all have a foreign key value of the matched primary key value in the main table ([type]).
My questions, depending on which solution might be best:
1. How do I delete all of those records, except 70003 (the highest one)? We can't know if it's a duplicate record unless [Food_ID] shows up more than once. If it shows up more than once, we need to delete records from ALL tables (there are 10) based on the Primary_Key and Foreign_Key relationship.
2. How do I update/merge these SQL records on insertion to avoid having to delete multiples again?
I'd prefer #1, as it prevents me from having to rebuild the database, and it makes inserting much easier.
Thanks!
Even if a [foodID] is not duplicated, the subquery still returns its max(Primary_Key), so that row will not be deleted.
The important part is that the WHERE condition is NOT IN:
delete tableX
where tableX.informationID not in ( select max(Primary_Key)
from [type]
group by [foodID] )
then just do [type] last
delete [type]
where [type].[Primary_Key] not in ( select max(Primary_Key)
from [type]
group by [foodID] )
then just create a unique constraint on [foodID]
something like...
assumed:
create table food (
primary_key int,
food_id int,
food_type varchar(20)
);
insert into food values (70001,12345,'fruit');
insert into food values (70002,12345,'fruit');
insert into food values (70003,12345,'meat');
insert into food values (70004,11111,'taco');
create table info (
primary_key int,
info_id int,
food_name varchar(20)
);
insert into info values (1,70001,'banana');
insert into info values (2,70002,'banana');
insert into info values (3,70003,'decade meat');
insert into info values (4,70004,'taco taco');
and then...
-- yields: 12345 70003
select food_id, max(info_id) as max_info_id
from food
join info on food.primary_key=info.info_id
where food_id in (
select food_id
from food
join info on food.primary_key=info.info_id
group by food_id
having count(*)>1)
group by food_id;
Then something like this gets the rows to delete (there might be a better way to write this):
select *
from food
join info on food.primary_key=info.info_id
join ( select food_id, max(info_id) as max_info_id
from food
join info on food.primary_key=info.info_id
where food_id in (
select food_id
from food
join info on food.primary_key=info.info_id
group by food_id
having count(*)>1)
group by food_id
) as dont_delete
on food.food_id=dont_delete.food_id and
info.info_id<max_info_id
gives you:
PRIMARY_KEY FOOD_ID FOOD_TYPE INFO_ID FOOD_NAME MAX_INFO_ID
70001 12345 fruit 70001 banana 70003
70002 12345 fruit 70002 banana 70003
So you could delete from food where primary_key in (select food.primary_key from that_big_query_up_there), and delete from info where info_id in (select food.primary_key from that_big_query_up_there).
For future issues, maybe consider a unique constraint on food, such as unique(primary_key, food_id) or something similar. But if it's one-to-one, why don't you just store them together?
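The "keep the highest Primary_Key per food id" approach from this thread can be sketched as a small script. It is an illustration only: it runs against SQLite through Python's sqlite3 module rather than the asker's database, it reuses the assumed food and info tables from the answer above, and the function name remove_duplicates is made up.

```python
import sqlite3

# Subquery returning the one primary key to keep for each food_id.
KEEP = "SELECT MAX(primary_key) FROM food GROUP BY food_id"


def remove_duplicates(conn: sqlite3.Connection, child_tables) -> None:
    """Delete duplicate food rows and their children, keeping the highest key.

    child_tables is a list of (table_name, fk_column) pairs. The names are
    interpolated into SQL, so they must come from trusted code, not user input.
    """
    with conn:  # one transaction: everything is deleted or nothing is
        for table, fk_column in child_tables:
            conn.execute(f"DELETE FROM {table} WHERE {fk_column} NOT IN ({KEEP})")
        # The parent table goes last, after all rows referencing it are gone.
        conn.execute(f"DELETE FROM food WHERE primary_key NOT IN ({KEEP})")
        # Stop new duplicates from being inserted later.
        conn.execute("CREATE UNIQUE INDEX food_id_unique ON food (food_id)")


def sample_database() -> sqlite3.Connection:
    """The sample rows used in the answer above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE food (primary_key INTEGER, food_id INTEGER, food_type TEXT);
        INSERT INTO food VALUES
            (70001, 12345, 'fruit'), (70002, 12345, 'fruit'),
            (70003, 12345, 'meat'),  (70004, 11111, 'taco');
        CREATE TABLE info (primary_key INTEGER, info_id INTEGER, food_name TEXT);
        INSERT INTO info VALUES
            (1, 70001, 'banana'), (2, 70002, 'banana'),
            (3, 70003, 'decade meat'), (4, 70004, 'taco taco');
        """
    )
    return conn
```

For the ten linked tables in the question, child_tables would list each table with its foreign-key column, for example [("info", "info_id")] for the sample data.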

Dynamically creating Columns from rows Access 2010

I am relatively new to Access and I have a table that has AuthorName, BookTitle, and CoAuthor. AuthorName and BookTitle are a composite key.
Currently the query pulls information like:
AuthorName---------BookTitle------CoAuthor
Steven King--------Dark Half------Peter Straub
Steven King--------Dark Half------John Doe
James Patterson----Another Time
Jeff Hanson--------Tales of Time---Joe Smith
I would like it to read (dynamically) if possible
AuthorName---------BookTitle---------CoAuthor1--------CoAuthor2
Steven King----------Dark Half--------Peter Straub-----John Doe
James Patterson----Another Time
Jeff Hanson----------Tales of Time----Joe Smith
So if there is another author that is later added, a third column for CoAuthor would appear.
Is this possible with VBA or SQL?
What you are asking goes against the whole point of using a relational database; in short, you should design your tables in a way that minimizes (rather eliminates) the need to redesign the tables. I suggest you read about database normalization.
Your question is, as a matter of fact, an example that I use very frequently when I teach about databases: How would you design a database to hold all the information about books and authors. The scenarios are:
A book can have one or more authors
An author may have written one or more books
So, this is a many-to-many relation, and there's a way to design such a database.
First, design a table for the authors:
tbl_authors
author_id (primary key, numeric, preferably AutoNumber)
first_name (String)
last_name (String)
Then, design a table for the books:
tbl_books
book_id (primary key, numeric, preferably AutoNumber)
book_title (String)
And finally, a third table is needed to relate authors and books:
tbl_books_authors
book_id (primary key, numeric)
author_id (primary key, numeric)
main_author (boolean: "Yes" if it is the main author, "No" otherwise)
(Both fields must be part of the primary key)
And now, the main question: how to query for books and their authors?
Assuming the above design, you could write an SQL query to get the full list of books and their authors:
select book_title, first_name, last_name
from
(tbl_authors as a
inner join tbl_books_authors as ba on a.author_id = ba.author_id)
inner join tbl_books as b on ba.book_id = b.book_id
This way, you'll have something like this:
book_title | first_name | last_name
-----------+------------+-----------
book1 | John | Doe
book1 | Jane | Doe
book2 | John | Doe
book2 | Peter | Who
book3 | Jane | Doe
book4 | Peter | Who
book5 | John | Doe
book5 | Jane | Doe
book5 | Peter | Who
book5 | Jack | Black
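A compact way to experiment with this three-table design outside Access is sketched below. It uses SQLite through Python's sqlite3 module as a stand-in, keeps the table and column names from the answer, and stores main_author as 0/1. The sample rows and the function names build_catalog and books_with_authors are assumptions for illustration.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Authors, books, and a junction table for the many-to-many relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE tbl_authors (
            author_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
        );
        CREATE TABLE tbl_books (book_id INTEGER PRIMARY KEY, book_title TEXT);
        CREATE TABLE tbl_books_authors (
            book_id     INTEGER NOT NULL REFERENCES tbl_books(book_id),
            author_id   INTEGER NOT NULL REFERENCES tbl_authors(author_id),
            main_author INTEGER NOT NULL DEFAULT 0,  -- 1 = "Yes", 0 = "No"
            PRIMARY KEY (book_id, author_id)         -- both fields form the key
        );
        INSERT INTO tbl_authors VALUES
            (1, 'John', 'Doe'), (2, 'Jane', 'Doe'), (3, 'Peter', 'Who');
        INSERT INTO tbl_books VALUES (1, 'book1'), (3, 'book3');
        INSERT INTO tbl_books_authors VALUES (1, 1, 1), (1, 2, 0), (3, 2, 1);
        """
    )
    return conn


def books_with_authors(conn: sqlite3.Connection):
    """List every (book, author) pair, main author first within each book."""
    return conn.execute(
        """
        SELECT b.book_title, a.first_name, a.last_name
        FROM tbl_authors AS a
        JOIN tbl_books_authors AS ba ON a.author_id = ba.author_id
        JOIN tbl_books AS b ON ba.book_id = b.book_id
        ORDER BY b.book_id, ba.main_author DESC, a.author_id
        """
    ).fetchall()
```

Adding another author to a book is then a single INSERT into tbl_books_authors; no table has to be altered.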
Why is this design better than your original idea?
Because you won't need to alter the structure of your tables to add another author
Because you don't know a priori how many authors a book can have
Because you avoid redundancy in your database
Because, with this design, you'll be able to use front-end tools (like Access forms and reports, or other tools) to create almost any arrangement from this data.
Further reading:
The Access Web (especially "The Ten Commandments of Access")
Minor update
This kind of design will help you avoid lots and lots of headaches in the future, because you won't need to alter your tables every time you add a third or fourth author. I learned about database normalization some years ago reading "Running Access 97" by John Viescas (this is not a commercial, it's just a reference ;) )... an old book, yes, but it has a very good introduction on the topic.
Now, after you have normalized your database, you can use pivot queries to get what you need (as noted in the answer posted by Conrad Frix).
If your table had a Type column like the one below
+-----------------+---------------+--------------+-----------+
| AuthorName | BookTitle | CoAuthor | Type |
+-----------------+---------------+--------------+-----------+
| Steven King | Dark Half | Peter Straub | CoAuthor1 |
| Steven King | Dark Half | John Doe | CoAuthor2 |
| James Patterson | Another Time | | CoAuthor1 |
| Jeff Hanson | Tales of Time | Joe Smith | CoAuthor1 |
+-----------------+---------------+--------------+-----------+
it would be a pretty simple transform
TRANSFORM First(Books.CoAuthor) AS FirstOfCoAuthor
SELECT Books.AuthorName, Books.BookTitle
FROM Books
GROUP BY Books.AuthorName, Books.BookTitle
PIVOT Books.Type;
Since it doesn't, we need to create it on the fly by first assigning a number to each row (simulating ROW_NUMBER() OVER) and then transforming. On large data sets this may be quite slow:
TRANSFORM First(b.coauthor) AS firstofcoauthor
SELECT b.authorname,
b.booktitle
FROM (SELECT authorname,
booktitle,
coauthor,
'CoAuthor' & k AS Type
FROM (SELECT b.authorname,
b.booktitle,
b.coauthor,
Count(*) AS K
FROM books AS b
LEFT JOIN books AS b1
ON b.authorname = b1.authorname
WHERE [b].[coauthor] <= [b1].[coauthor]
OR (( ( b1.coauthor ) IS NULL ))
GROUP BY b.authorname,
b.booktitle,
b.coauthor) AS t) AS b
GROUP BY b.authorname,
b.booktitle
PIVOT b.type
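If the pivot turns out to be too slow or awkward in Access SQL, the same "number the co-authors per book, then spread them into columns" idea can be done on the client side. The following is a plain-Python sketch, not Access code; the function name pivot_coauthors and the tuple layout of the input rows are assumptions.

```python
from collections import defaultdict


def pivot_coauthors(rows):
    """Turn (author, title, coauthor) rows into one row per book.

    rows is an iterable of (author_name, book_title, coauthor_or_None).
    Returns a list of lists whose first entry is the header row.
    """
    grouped = defaultdict(list)
    for author, title, coauthor in rows:
        coauthors = grouped[(author, title)]  # creates the entry even without co-authors
        if coauthor:
            coauthors.append(coauthor)

    # The widest book decides how many CoAuthorN columns are needed.
    width = max((len(names) for names in grouped.values()), default=0)
    header = ["AuthorName", "BookTitle"] + [f"CoAuthor{i}" for i in range(1, width + 1)]

    table = [header]
    for (author, title), names in grouped.items():
        padding = [None] * (width - len(names))
        table.append([author, title] + names + padding)
    return table


SAMPLE = [
    ("Steven King", "Dark Half", "Peter Straub"),
    ("Steven King", "Dark Half", "John Doe"),
    ("James Patterson", "Another Time", None),
    ("Jeff Hanson", "Tales of Time", "Joe Smith"),
]
```

Because the number of columns is computed from the data, a third co-author added later simply produces a CoAuthor3 column on the next run.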

Create a new table with each new record

I am building a Hospital Management System and I wanted to make a table in the database that contains all the hospitals. Each hospital has another table that contains info about its employees.
If I wanted to add a new record (a new Hospital) in the hospital table after the system is released, can the system generate a new table for that new hospital's employees?
It would have a standard form and the system would ask the user to fill it (through GUI, or any other way).
Is it technically possible? And if not is there other ways to do it?
Why would you need to create a separate table for each hospital? Add Hospital_ID as a column to the Employee table and then you can tell, from that one table, which hospital an employee works for.
SQLFiddle
First, you should read about table relationships. It's really important that you understand how they work before designing your database.
Now as for your question, tables don't contain tables. This is why you should review your design.
You should have these tables:
Hospital
---------------------------------------------------
| ID | Name | Address | Phone |
---------------------------------------------------
| 1 | SomeName | 13 Mercy Street | 555-555-5555 |
---------------------------------------------------
Employee
-------------------------
| ID | Hospital_ID | Name |
-------------------------
| 01 | 1 | John |
-------------------------
This way every employee is associated with a hospital. We now know John works in the SomeName hospital.
To help in your research,
Hospital.ID is a PRIMARY KEY
Employee.ID is a PRIMARY KEY
Employee.Hospital_ID is a FOREIGN KEY
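To show that adding a hospital after release needs only a new row and never a new table, here is a small sketch using SQLite through Python's sqlite3 module. The table and column names follow the layout above; the function names (build_hospitals, add_hospital, add_employee, employees_of) are hypothetical.

```python
import sqlite3


def build_hospitals() -> sqlite3.Connection:
    """Create the two tables once; they serve every hospital added later."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(
        """
        CREATE TABLE Hospital (
            ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, Phone TEXT
        );
        CREATE TABLE Employee (
            ID          INTEGER PRIMARY KEY,
            Hospital_ID INTEGER NOT NULL REFERENCES Hospital(ID),
            Name        TEXT
        );
        """
    )
    return conn


def add_hospital(conn, name, address, phone) -> int:
    """Insert one hospital row and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO Hospital (Name, Address, Phone) VALUES (?, ?, ?)",
        (name, address, phone),
    )
    return cur.lastrowid


def add_employee(conn, hospital_id, name) -> int:
    """Insert one employee row tied to an existing hospital."""
    cur = conn.execute(
        "INSERT INTO Employee (Hospital_ID, Name) VALUES (?, ?)",
        (hospital_id, name),
    )
    return cur.lastrowid


def employees_of(conn, hospital_id):
    """Return the names of everyone working at the given hospital."""
    rows = conn.execute(
        "SELECT Name FROM Employee WHERE Hospital_ID = ? ORDER BY ID",
        (hospital_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

The values collected by the GUI form would be passed straight to add_hospital and add_employee; the foreign key rejects an employee whose Hospital_ID does not exist.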