Polymorphism in SQL database tables?

I currently have multiple tables in my database which consist of the same 'basic fields' like:
name character varying(100),
description text,
url character varying(255)
But I have multiple specializations of that basic table: for example, tv_series has the fields season, episode and airing, while the movies table has release_date, budget, etc.
Now at first this is not a problem, but I want to create a second table, called linkgroups with a Foreign Key to these specialized tables. That means I would somehow have to normalize it within itself.
One way of solving this I have heard of is to normalize it with a key-value-pair-table, but I do not like that idea since it is kind of a 'database-within-a-database' scheme, I do not have a way to require certain keys/fields nor require a special type, and it would be a huge pain to fetch and order the data later.
So I am looking for a way now to 'share' a Primary Key between multiple tables or even better: a way to normalize it by having a general table and multiple specialized tables.

Right, the problem is that you want only one object of one subtype to reference any given row of the parent class. Starting from the example given by @Jay S, try this:
create table media_types (
media_type int primary key,
media_name varchar(20)
);
insert into media_types (media_type, media_name) values
(2, 'TV series'),
(3, 'movie');
create table media (
media_id int not null,
media_type int not null,
name varchar(100),
description text,
url varchar(255),
primary key (media_id),
unique key (media_id, media_type),
foreign key (media_type)
references media_types (media_type)
);
create table tv_series (
media_id int primary key,
media_type int not null check (media_type = 2),
season int,
episode int,
airing date,
foreign key (media_id, media_type)
references media (media_id, media_type)
);
create table movies (
media_id int primary key,
media_type int not null check (media_type = 3),
release_date date,
budget numeric(9,2),
foreign key (media_id, media_type)
references media (media_id, media_type)
);
This is an example of the disjoint subtypes mentioned by @mike g.
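The effect of the composite foreign key combined with the CHECK constraint can be tried out in a small script. This is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the target RDBMS: SQLite needs `PRAGMA foreign_keys = ON` and writes `unique (...)` instead of `unique key (...)`. The helper `try_insert` and the sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
create table media_types (media_type int primary key, media_name varchar(20));
insert into media_types values (2, 'TV series'), (3, 'movie');
create table media (
  media_id int primary key,
  media_type int not null references media_types (media_type),
  name varchar(100),
  unique (media_id, media_type)
);
create table movies (
  media_id int primary key,
  media_type int not null check (media_type = 3),
  release_date date,
  foreign key (media_id, media_type) references media (media_id, media_type)
);
insert into media values (1, 2, 'Example series');  -- hypothetical sample rows
insert into media values (2, 3, 'Example movie');
""")

def try_insert(media_id, media_type):
    """Return True if the movies row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "insert into movies (media_id, media_type) values (?, ?)",
                (media_id, media_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(2, 3),  # media 2 is a movie
    try_insert(1, 3),  # media 1 is a TV series: there is no parent row (1, 3)
    try_insert(1, 2),  # the CHECK only allows media_type = 3 in movies
]
print(results)
```

Only the first insert should go through: the subtype table can neither claim a parent row of another type nor store a type other than its own.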
Re comments by @Countably Infinite and @Peter:
INSERT to two tables would require two insert statements. But that's also true in SQL any time you have child tables. It's an ordinary thing to do.
UPDATE may require two statements, but some brands of RDBMS support multi-table UPDATE with JOIN syntax, so you can do it in one statement.
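If the two INSERT statements should succeed or fail together, they can be wrapped in one transaction. The following is a sketch under assumptions, not a prescribed method: it uses Python's sqlite3 module, a trimmed-down version of the schema, and a hypothetical helper `insert_movie`; `release_date` is declared NOT NULL here only so that a failure can be provoked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table media (
  media_id int primary key,
  media_type int not null,
  name varchar(100) not null,
  unique (media_id, media_type)
);
create table movies (
  media_id int primary key,
  media_type int not null check (media_type = 3),
  release_date date not null,  -- NOT NULL is an assumption made for this demo
  foreign key (media_id, media_type) references media (media_id, media_type)
);
""")

def insert_movie(media_id, name, release_date):
    """Insert the parent row and the subtype row as one unit of work."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "insert into media (media_id, media_type, name) values (?, 3, ?)",
            (media_id, name),
        )
        conn.execute(
            "insert into movies (media_id, media_type, release_date) values (?, 3, ?)",
            (media_id, release_date),
        )

insert_movie(1, "Example movie", "2000-01-01")
try:
    insert_movie(2, "Broken movie", None)  # second statement violates NOT NULL
except sqlite3.IntegrityError:
    pass

media_count = conn.execute("select count(*) from media").fetchone()[0]
print(media_count)  # the failed call should not leave an orphan row in media
```

It is still two statements, but the application sees them as a single operation.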
When querying data, you can do it simply by querying the media table if you only need information about the common columns:
SELECT name, url FROM media WHERE media_id = ?
If you know you are querying a movie, you can get movie-specific information with a single join:
SELECT m.name, v.release_date
FROM media AS m
INNER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If you want information for a given media entry, and you don't know what type it is, you'd have to join to all your subtype tables, knowing that only one such subtype table will match:
SELECT m.name, t.episode, v.release_date
FROM media AS m
LEFT OUTER JOIN tv_series AS t USING (media_id)
LEFT OUTER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If the given media is a movie, then all columns in t.* will be NULL.
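The NULL behaviour of the outer joins can be checked with a few sample rows. This sketch assumes Python's sqlite3 module, a reduced schema without the constraints, and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table media (media_id int primary key, media_type int not null, name varchar(100));
create table tv_series (media_id int primary key, season int, episode int);
create table movies (media_id int primary key, release_date date);
insert into media values (1, 3, 'Example movie'), (2, 2, 'Example series');
insert into movies values (1, '2000-01-01');
insert into tv_series values (2, 1, 5);
""")

QUERY = """
select m.name, t.episode, v.release_date
from media as m
left outer join tv_series as t using (media_id)
left outer join movies as v using (media_id)
where m.media_id = ?
"""

movie_row = conn.execute(QUERY, (1,)).fetchone()
series_row = conn.execute(QUERY, (2,)).fetchone()
print(movie_row)   # episode comes back as None (SQL NULL) for a movie
print(series_row)  # release_date comes back as None for a TV series
```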

Consider using a main basic data table with tables extending off of it with specialized information.
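For example (the column types of the specialized fields are assumptions):
create table basic_data (
id int primary key,
name character varying(100),
description text,
url character varying(255)
);
create table tv_series (
id int primary key,
BDID int references basic_data (id), -- foreign key to basic_data
season int,
episode int,
airing date
);
create table movies (
id int primary key,
BDID int references basic_data (id), -- foreign key to basic_data
release_date date,
budget numeric(9,2)
);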

What you are looking for is called 'disjoint subtypes' in the relational world. They are not supported in SQL at the language level, but can be more or less implemented on top of SQL.

You could create one table with the main fields plus a uid, then extension tables with the same uid for each specific case. To query these like separate tables, you could create views.
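A rough sketch of the view idea, assuming Python's sqlite3 module; the table and view names (`basic_data`, `movie_ext`, `tv_series_ext`, `movies_v`, `tv_series_v`) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table basic_data (uid integer primary key, name text, url text);
create table movie_ext (
  uid integer primary key references basic_data (uid),
  release_date text,
  budget numeric
);
create table tv_series_ext (
  uid integer primary key references basic_data (uid),
  season int,
  episode int
);

-- one view per specific case, so each can be queried like a standalone table
create view movies_v as
  select b.uid, b.name, b.url, x.release_date, x.budget
  from basic_data b join movie_ext x on x.uid = b.uid;
create view tv_series_v as
  select b.uid, b.name, b.url, x.season, x.episode
  from basic_data b join tv_series_ext x on x.uid = b.uid;

insert into basic_data values (1, 'Example movie', null), (2, 'Example series', null);
insert into movie_ext values (1, '2000-01-01', 1000000);
insert into tv_series_ext values (2, 1, 5);
""")

movie_rows = conn.execute("select uid, name, release_date from movies_v").fetchall()
series_rows = conn.execute("select uid, name, season, episode from tv_series_v").fetchall()
print(movie_rows)
print(series_rows)
```

Reads go through the views; writes would still target the underlying tables unless the database supports updatable views or triggers for this case.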

Using the disjoint subtype approach suggested by Bill Karwin, how would you do INSERTs and UPDATEs without having to do it in two steps?
To get data, I can introduce a view that joins and selects based on a specific media_type, but AFAIK I can't update or insert into that view because it affects multiple tables (I am talking about MS SQL Server here). Can this be done without two operations and, naturally, without a stored procedure?
Thanks

The question is quite old, but for modern PostgreSQL versions it's also worth considering the json/jsonb/hstore types.
For example:
create table some_table (
name character varying(100),
description text,
url character varying(255),
additional_data json
);
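To show how such a table might be used from application code, here is a hedged sketch. It assumes Python's sqlite3 module instead of PostgreSQL, so the JSON document is stored as plain text and decoded in Python; with a real json/jsonb column the database itself could query inside the document. The helpers `add_item` and `load_extra` and the sample values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table some_table (
  name varchar(100),
  description text,
  url varchar(255),
  additional_data text  -- JSON document stored as text in this sketch
)
""")

def add_item(name, **extra):
    """Store the common columns normally and everything else as one JSON document."""
    with conn:
        conn.execute(
            "insert into some_table (name, additional_data) values (?, ?)",
            (name, json.dumps(extra)),
        )

def load_extra(name):
    """Return the type-specific fields of one row as a dict."""
    row = conn.execute(
        "select additional_data from some_table where name = ?", (name,)
    ).fetchone()
    return json.loads(row[0])

add_item("Example movie", release_date="2000-01-01", budget=1000000)
add_item("Example series", season=1, episode=5)
print(load_extra("Example movie"))
print(load_extra("Example series"))
```

The trade-off is the one raised in the question: nothing in this schema requires a movie to have a release_date or checks its type, so that validation moves into the application.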

Related

How to handle multivalued fields in MYSQL database for movie collection?

I have a large number of movies and TV series, which I currently keep track of in an MS Excel worksheet. Due to the large number of records and the variety of data required, it is no longer a convenient option, so I want to switch to a MySQL database, accessed through a GUI programmed in Java using the NetBeans IDE.
I have the following tables in Excel:
Media_Library,
To_Be_Watched,
Statistics,
Wish_List,
Orders
Each film and TV series in my collection is in the Media_Library table, which has the following fields:
Sorting_Title
Title
Collection
Genre
Release_Year
Director
Age_Rating
Country
Runtime (min)
Watched
Media_Type
Format
For example: 'Alien 2', 'Aliens', 'Alien: Anthology', 'Action/Horror/Sci-Fi', 1986, 'James Cameron', 'M', 'America', 137, 'Yes', 'Movie', '4K UHD'
I'm stuck on what to do for the following fields: Genre, Director, Country, Runtime
Those 4 fields can each have multiple values, and I don't know how best to handle that; e.g. most films only have 1 runtime, but many have multiple (2 of the films have 4 different cuts). Also anthology films can have something like 6 different directors. I want to include all relevant genres, directors, countries and runtimes, but I don't know how to best do that.
I've tried adding a column for each value; genre1, genre2, ... This results in many blank values though. In the spreadsheet in Excel I put all applicable genres in a single field as one string, e.g. 'comedy/horror'.
What would be the easiest way to resolve this issue? Can I do a many-to-many relationship to achieve what I want?

Simply put a hard limit on the number of genres.
For instance, while you may want the user to be able to enter as many genres as they want, is it rational to go above 20 genres? That doesn't make much sense and will only make searches much more time-intensive.
For other possible duplicates, you can do something like this (in sqlite3 at least):
CREATE TABLE IF NOT EXISTS Directors
(id INTEGER PRIMARY KEY,
director TEXT,
UNIQUE(director) ON CONFLICT IGNORE);
CREATE TABLE IF NOT EXISTS file
(file_id INTEGER PRIMARY KEY,
filename TEXT,
director_id INTEGER,
watched INTEGER,
FOREIGN KEY (director_id)
REFERENCES Directors (id)
ON UPDATE CASCADE
ON DELETE SET NULL);
It doesn't matter if more than one genre has the same director, as long as the 'file' table knows which one it's referencing and stays updated.
The 'watched' column holds a kind of value that doesn't warrant a table of its own. For instance, say a song's track number is 2. Creating a table just for track numbers doesn't make sense, because you would store the value once in that table and then store a reference to it in the 'file' table. It is cheaper to store the value once, directly in 'file'.
https://www.sqlitetutorial.net/sqlite-foreign-key/

Generally, you would add a second table, Directors, for instance, and then you relate that back to the movie title. You will need a uniqueID for the movie, and you do a join where that uniqueID is referenced in the Directors table, something like this working demo (not all fields were included in my demo):
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9e8d73bc798767b56b974ab4ebd30517
SELECT m.id, m.title, d.director
FROM Media_Library m
JOIN Directors d ON (m.id = d.mediaID);
or to concatenate the directors:
SELECT m.id, m.title, group_concat(d.director) as directors
FROM Media_Library m
JOIN Directors d ON (m.id = d.mediaID)
GROUP BY m.id;
Usually when you make this kind of relationship you will define a foreign key constraint, creating a link between the primary key of one table and a key (or keys) in another. In this case, the link is between id in Media_Library and mediaID in Directors, so you would alter the create statement like this:
CREATE TABLE Directors (
id int not null auto_increment,
mediaID int,
director varchar(50),
PRIMARY KEY(id),
FOREIGN KEY (mediaID) REFERENCES Media_Library(id)
);
The foreign key is not strictly necessary, but it can reinforce database integrity. The ins and outs of foreign keys are out of scope for this answer, but you should probably read about them.
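The Directors table above ties each director row to one movie. For values that are shared between many movies, such as genres, a junction table gives a true many-to-many relationship. The following is only a sketch of that idea, assuming Python's sqlite3 module rather than MySQL; the `Genres` and `Media_Genres` tables are hypothetical names, and the sample row reuses the 'Aliens' example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
create table Media_Library (id integer primary key, title text not null);
create table Genres (id integer primary key, genre text not null unique);
-- junction table: one row per (movie, genre) pair
create table Media_Genres (
  mediaID int not null references Media_Library (id),
  genreID int not null references Genres (id),
  primary key (mediaID, genreID)
);
insert into Media_Library values (1, 'Aliens');
insert into Genres values (1, 'Action'), (2, 'Horror'), (3, 'Sci-Fi');
insert into Media_Genres values (1, 1), (1, 2), (1, 3);
""")

rows = conn.execute("""
select m.title, g.genre
from Media_Library m
join Media_Genres mg on mg.mediaID = m.id
join Genres g on g.id = mg.genreID
order by g.genre
""").fetchall()
genres = [genre for _title, genre in rows]
print(genres)
```

Each genre is stored once, a movie can have any number of them, and the composite primary key keeps the same pair from being entered twice.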
It is also possible to store the data in a JSON field (available since MySQL 5.7), like this:
CREATE TABLE test.Media_Library (id int not null auto_increment, title varchar (50), director JSON, PRIMARY KEY (id));
INSERT INTO test.Media_Library (title, director) VALUES
('Alien', json_array("Scott", "Scorsese")),
('The Alienist', '["Tarantino", "Nolan", "Kubrick"]'),
('Alien 2', '["Scott"]');
SELECT * FROM test.Media_Library;
https://www.db-fiddle.com/f/tG1SZorEHEYi5cYwgXjPeY/1
In the second query in that fiddle, I select only the first director from the list:
SELECT id, title, director->>'$[0]' as firstDirector
FROM Media_Library;
There are advantages to storing data this way, but there are tradeoffs, and unless you know what you are doing or you have a specific reason to be using JSON fields (for instance, you are getting the data from an API as JSON and you just want to use it as is), I would stick with the join method. Also, storing arrays is inherently non-normal (read about database normalization; a quick overview is on Wikipedia: https://en.wikipedia.org/wiki/Database_normalization).

Google 'Big Table' like data in SQL? How to design DB?

I need to create a database that, among other things, lets people choose 1 - N zip codes in the US.
Intuitively it seems best to make users a row and zip codes columns.
The problem I am having is that this would be about 42k columns, which I am confident is beyond most SQL databases' upper bound on columns.
I could have separate tables for each state, and then have something like 500 to 5K columns per table?
I mean that is doable, but the whole thing just seems a little ridiculous.
All thoughts, critiques, etc. are appreciated.
Also, does anyone know the best place to get a list of zip codes (maybe broken down by state)? Googling yielded some dated stuff. So far I have the USPS APIs for live verification, but I just need a static list.
Thanks everyone.

In any database -- including BigQuery -- your description suggests a table UserZips with one row per UserId and ZipCode.
BigQuery does not require such a structure. It supports arrays within a row, so you can have an array of the zip codes that a user chooses.
It also supports records within a row, so you can have an array of records. Each record could have a zip code and other information.
In many databases, including BigQuery, you might find a JSON object to be the appropriate representation.
Nonetheless, what first comes to mind is a table with a column for the user and zip code.
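A minimal sketch of that one-row-per-user-and-zip-code table, assuming Python's sqlite3 module; the helper functions and the user ids are made up, and the composite primary key is an added assumption to keep a user from choosing the same zip code twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table UserZips (
  UserId int not null,
  ZipCode char(5) not null,      -- stored as text, so leading zeros survive
  primary key (UserId, ZipCode)  -- each user can pick a zip code only once
);
insert into UserZips values (1, '00501'), (1, '00544'), (2, '00501');
""")

def zips_for_user(user_id):
    """All zip codes chosen by one user."""
    rows = conn.execute(
        "select ZipCode from UserZips where UserId = ? order by ZipCode", (user_id,)
    )
    return [zip_code for (zip_code,) in rows]

def users_for_zip(zip_code):
    """All users who chose a given zip code."""
    rows = conn.execute(
        "select UserId from UserZips where ZipCode = ? order by UserId", (zip_code,)
    )
    return [user_id for (user_id,) in rows]

print(zips_for_user(1))
print(users_for_zip("00501"))
```

The table grows by rows, not columns, so 1 to N zip codes per user needs no schema change.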

A structure like this would allow you to add as many zip codes as needed and link as many zips as you needed to as many users as you have.
http://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=6d1c3a41cb439d17f4def51a672766e1
CREATE TABLE zipCodes (
zipID int identity
, zipcode varchar(5) NOT NULL
, zipPlusFour varchar(4) DEFAULT '0000'
, CONSTRAINT PK_zipID PRIMARY KEY (zipID)
) ;
CREATE TABLE users (
userID int identity
, username nvarchar(20) NOT NULL
, CONSTRAINT PK_userID PRIMARY KEY (userID)
) ;
CREATE TABLE xref_users_zips (
userID int NOT NULL
, zipID int NOT NULL
, CONSTRAINT FK_userID FOREIGN KEY (userID) REFERENCES users(userID)
, CONSTRAINT FK_zipID FOREIGN KEY (zipID) REFERENCES zipCodes(zipID)
) ;
INSERT INTO zipCodes (zipcode)
VALUES ('00501'), ('00544'), ('00601')
;
INSERT INTO users (username)
VALUES ('johndoe'),('robertbuilder'),('zaphodbeeblebrox')
;
INSERT INTO xref_users_zips (userID, zipID)
VALUES (1,1), (2,2), (3,3)
;
SELECT *
FROM users u
INNER JOIN xref_users_zips xuz ON u.userID = xuz.userID
INNER JOIN zipCodes z ON xuz.zipID = z.zipID
;

How do I realize this 3 simple things in SQL?

I am an absolute SQL beginner, and I have already searched a lot with Google but didn't find what I needed. So how do I realize the following in SQL (translated from an ER model)?
An entity having an attribute that can have multiple entries (I already found the ARRAY constraint, but I am unsure about that).
An entity having an attribute that itself consists of a few more attributes (picture: http://static3.creately.com/blog/wp-content/uploads/2012/03/Attributes-ER-Diagrams.jpeg).
Something like an isA relation, and especially the total/partial and disjoint characteristics.
Thanks already

Postgres supports all three of these. An attribute with multiple entries can be stored in an array column:
create table one
(
id integer primary key,
tags text[] -- can store multiple tags in a single column
);
A single column with multiple attributes can be done through a record type:
create type address as (number integer, street varchar(100), city varchar(100));
create table customer
(
id integer primary key,
name varchar(100) not null,
billing_address address
);
An isA relation can be done using inheritance:
create table paying_customer
(
paid_at timestamp not null,
paid_amount decimal(14,2) not null
)
inherits (customer);
A paying customer has all the attributes of a customer, plus the time the invoice was paid and the amount paid.

SQL Relation and Query

I am trying to create a database that contains two tables. I have included the create_tables.sql code in case it helps. I am trying to set up the relationship so that STKEY is the defining key, so that a query can search for the key and show what issues this student has been having. At the moment when I search using:
SELECT *
FROM student, student_log
WHERE 'tilbun' like student.stkey
It shows all the issues in the table regardless of the STKEY. I think I may have the foreign key set incorrectly. I have included the create_tables.sql here.
CREATE TABLE `student`
(
`STKEY` VARCHAR(10),
`first_name` VARCHAR(15),
`surname` VARCHAR(15),
`year_group` VARCHAR(4),
PRIMARY KEY (STKEY)
)
;
CREATE TABLE `student_log`
(
`issue_number` int NOT NULL AUTO_INCREMENT,
`STKEY` VARCHAR(10),
`date_field` DATETIME,
`issue` VARCHAR(150),
PRIMARY KEY (issue_number),
INDEX (STKEY),
FOREIGN KEY (STKEY) REFERENCES student (STKEY)
)
;
Cheers for the help.

Though you have correctly defined the foreign key relationship in the tables, you must still specify a join condition when performing the query. Otherwise, you'll get a Cartesian product of the two tables (all rows of one times all rows of the other):
SELECT
student.*,
student_log.*
FROM student INNER JOIN student_log ON student.STKEY = student_log.STKEY
WHERE student.STKEY LIKE 'tilbun'
Note that rather than using an implicit join (a comma-separated list of tables), I have used an explicit INNER JOIN, which is the preferred modern syntax.
Finally, there's little point in using LIKE instead of = unless you also use wildcard characters:
WHERE student.STKEY LIKE '%tilbun%'
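The difference between the two queries can be reproduced with a couple of rows. This is a sketch assuming Python's sqlite3 module instead of MySQL, a trimmed schema, and made-up sample data (only the key 'tilbun' comes from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table student (STKEY varchar(10) primary key, first_name varchar(15));
create table student_log (
  issue_number integer primary key autoincrement,
  STKEY varchar(10) references student (STKEY),
  issue varchar(150)
);
insert into student values ('tilbun', 'A'), ('other', 'B');
insert into student_log (STKEY, issue) values
  ('tilbun', 'issue 1'), ('other', 'issue 2'), ('other', 'issue 3');
""")

# Comma join without a join condition: every log row is paired with 'tilbun'.
cross_rows = conn.execute(
    "select student_log.issue from student, student_log "
    "where student.STKEY = 'tilbun'"
).fetchall()

# Explicit join: only the log rows that belong to 'tilbun' remain.
joined_rows = conn.execute(
    "select student_log.issue from student "
    "inner join student_log on student.STKEY = student_log.STKEY "
    "where student.STKEY = 'tilbun'"
).fetchall()

print(len(cross_rows), len(joined_rows))
```

The first query returns every issue in the log, which matches the behaviour described in the question; the second returns only the issues logged for that student.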

SQL One-to-Many Table vs. multiple one-to-one relationships

I'm working on a project with the following objective: A User can create a Challenge and select an optional Rival to take part in this challenge. The Challenge generates Daily entries and will track stats on these.
The basic User and Entry entities look like this:
CREATE TABLE users (
id INT,
PRIMARY KEY (id)
);
CREATE TABLE entries (
challengeId INT,
userId INT,
entryDate DATE,
entryData VARCHAR,
PRIMARY KEY (challengeId, userId, entryDate)
)
The piece I'm having trouble with is the Challenge piece with the Rival concept. I can see two approaches.
-- Hard code the concept of a Challenge Owner and Rival:
CREATE TABLE challenges (
id INT,
name VARCHAR,
ownerId INT,
rivalId INT NULL,
PRIMARY KEY (id),
UNIQUE KEY (ownerId, name)
);
-- Create a many-to-one relationship:
CREATE TABLE challenges (
id INT,
name VARCHAR,
PRIMARY KEY (id),
UNIQUE KEY (name)
)
CREATE TABLE participant (
challengeId INT,
userId INT,
isOwner BIT,
PRIMARY KEY (challengeId, userId)
)
The problem with the first approach is that referential integrity is tough since now there are two columns where userIds reside (ownerId and rivalId). I'd have to create two tables for everything (owner_entries, rival_entries, owner_stats, etc.) in order to set up foreign keys.
The second approach solves this and has some advantages, like allowing multiple rivals in the future. However, one thing I can't do anymore with that approach is enforce Challenge name uniqueness per user instead of across the whole Challenge table. Additionally, tasks like finding a Challenge's owner are now trickier.
What's the right approach to the Challenges table? Is there any way to set up these tables in a developer-friendly manner, or should I just jump all the way to Class Table Inheritance and manage the concept of Owner/Rivals there?

I think the way I would set this up is as follows (using the second approach):
CREATE TABLE challenges (
id INT,
name VARCHAR,
owner_id INT,
PRIMARY KEY (id),
UNIQUE KEY (name, owner_id)
);
CREATE TABLE participant (
challengeId INT,
userId INT,
PRIMARY KEY (challengeId, userId)
);
This allows easy tracking of who owns the challenge, yet extracts out the individual participants.
This would also allow you to safely make the challenge name unique per owner, and foreign keys on the userId in participant are easy. 'Rivals' are then all participants that are not the challenge owner.
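Both claims, per-owner name uniqueness and rivals being the non-owner participants, can be checked with a short script. It is a sketch assuming Python's sqlite3 module (hence `unique (...)` rather than `UNIQUE KEY (...)`); the helper names, user ids and the challenge name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table challenges (
  id int primary key,
  name varchar(100) not null,
  owner_id int not null,
  unique (name, owner_id)
);
create table participant (
  challengeId int not null references challenges (id),
  userId int not null,
  primary key (challengeId, userId)
);
-- two different owners may use the same challenge name
insert into challenges values (1, 'Pushups', 10), (2, 'Pushups', 20);
insert into participant values (1, 10), (1, 11), (1, 12);
""")

def rivals(challenge_id):
    """Participants of a challenge other than its owner."""
    rows = conn.execute(
        """
        select p.userId
        from participant p
        join challenges c on c.id = p.challengeId
        where c.id = ? and p.userId <> c.owner_id
        order by p.userId
        """,
        (challenge_id,),
    )
    return [user_id for (user_id,) in rows]

def same_name_rejected():
    """True if owner 10 cannot create a second challenge called 'Pushups'."""
    try:
        with conn:
            conn.execute("insert into challenges values (3, 'Pushups', 10)")
        return False
    except sqlite3.IntegrityError:
        return True

print(rivals(1))
print(same_name_rejected())
```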

I consider the first approach the right one.
You could have one table for users and one for challenges.
Are you aware that you can reference one table twice like below?
SELECT * FROM CHALLENGES
INNER JOIN USERS AS OWNERS ON OWNERS.ID = CHALLENGES.OWNERID
LEFT JOIN USERS AS RIVALS ON RIVALS.ID = CHALLENGES.RIVALID -- LEFT JOIN keeps challenges that have no rival
In this case you can reference both rivals and owners without creating new tables.
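A runnable sketch of joining the users table twice, assuming Python's sqlite3 module and made-up sample data. Two things here are assumptions beyond the answer: both ownerId and rivalId carry a foreign key to users, and the rival side uses LEFT JOIN because the question declares rivalId as nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table users (id int primary key, name varchar(50));
create table challenges (
  id int primary key,
  name varchar(100),
  ownerId int not null references users (id),
  rivalId int null references users (id)  -- optional rival
);
insert into users values (1, 'alice'), (2, 'bob');
insert into challenges values (1, 'with rival', 1, 2), (2, 'solo', 2, null);
""")

rows = conn.execute("""
select c.name, owners.name, rivals.name
from challenges c
inner join users as owners on owners.id = c.ownerId
left join users as rivals on rivals.id = c.rivalId
order by c.id
""").fetchall()
print(rows)
```

One users table is enough for both roles, and a challenge without a rival still appears in the result with None in the rival column.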
