SQLite : how to achieve ORM "eager loading" in plain SQL?

SQLite : how to achieve ORM "eager loading" in plain SQL? - sql

I have a SQLite database with Movies, Actors, and Tags.
There is a many-to-many relation between movies and actors, and movies and tags.
In my app, I want to list all movies with their corresponding actors and tags, for example:
Mr. & Mrs. Smith: Actors: Brad Pitt, Angelina Jolie, Tags: Action, Comedy, Crime
Passengers: Actors: Jennifer Lawrence, Chris Pratt, Tags: Adventure, Drama, Romance
And I'm wondering what are the correct SQL statements to achieve that.
The tables in my database are defined as follows :
CREATE TABLE "Movie"
(
id INTEGER PRIMARY KEY,
name VARCHAR
)
CREATE TABLE "Actor"
(
id INTEGER PRIMARY KEY,
name VARCHAR
)
CREATE TABLE "Tag"
(
id INTEGER PRIMARY KEY,
name VARCHAR
)
CREATE TABLE "Movie_Actor"
(
movie_id INTEGER,
actor_id INTEGER,
FOREIGN KEY(movie_id) REFERENCES "Movie" (id),
FOREIGN KEY(actor_id) REFERENCES "Actor" (id),
UNIQUE(movie_id, actor_id)
);
CREATE TABLE "Movie_Tag"
(
movie_id INTEGER,
tag_id INTEGER,
FOREIGN KEY(movie_id) REFERENCES "Movie" (id),
FOREIGN KEY(tag_id) REFERENCES "Tag" (id),
UNIQUE(movie_id, tag_id)
);
To get a single movie with it's actors and tags I use the following 3 queries (for example Movie.id = 1):
To get the movie row:
SELECT *
FROM Movie
WHERE id = 1
To get the actors:
SELECT *
FROM
(SELECT * FROM Actor) AS T1
JOIN
(SELECT * FROM Movie_Actor WHERE Movie_Actor.movie_id = 1) AS T2 ON T1.id = T2.actor_id
To get tags:
SELECT *
FROM
(SELECT * FROM Tag) AS T1
JOIN
(SELECT * FROM Movie_Tag WHERE Movie_Tag.movie_id = 1) AS T2 ON T1.id = T2.tag_id
My question is, how should I go about retrieving the tags and actors when I'm getting a list of movies such as SELECT * FROM Movie?
Many ORMs have an option to 'eager load' relations, and I'm wondering how can I do it in plain SQL?
Do I need to execute extra 2 queries on each row I get from SELECT * FROM Movie?
Thank You.

To get movie with id = 1 along with all of the actors associated with that movie you do the following:
SELECT * FROM Movie
LEFT JOIN Movie_Actor ON Movie_Actor.movie_id = Movie.id
LEFT JOIN Actor ON Actor.id = Movie_Actor.actor_id
WHERE id = 1
To also get all the tags, keep joining the associated tables Movie_Tag and Tag.
You might think that this would be terribly inefficient because a lot of information is going to be duplicated, for example the name of a movie is going to be fetched not just once, but NA * NT times, where NA is the number of fetched actors and NT is the number of fetched tags.
Actually, databases tend to be smart about that, (precisely because this is a very popular mechanism of retrieving data with as few as possible roundtrips to the database,) so within their communication protocols they contain special measures to avoid transmitting field values that are identical from row to row. So, the actual amount of data transmitted is very close to exactly the amount of data that would have been transmitted if you queried each table separately.
The benefit, of course, is that you suffer the penalty of a single round-trip to the database, instead of several round-trips, one for each table.

Related

How to handle multivalued fields in MYSQL database for movie collection?

I have a large number of movies and TV series, which I currently keep track of in an MS Excel worksheet. Due to the large number of records and various data required, it is no longer a convenient option, so I want to switch to a MYSQL database, accessed through a GUI programmed in Java using Netbeans IDE.
I have the following tables in Excel:
Media_Library,
To_Be_Watched,
Statistics,
Wish_List,
Orders
Each film and TV series in my collection is in the Media_Library table, which has the following fields:
Sorting_Title
Title
Collection
Genre
Release_Year
Director
Age_Rating
Country
Runtime (min)
Watched
Media_Type
Format
For example: 'Alien 2', 'Aliens', 'Alien: Anthology', 'Action/Horror/Sci-Fi', 1986, 'James Cameron', 'M', 'America', 137, 'Yes', 'Movie', '4K UHD'
I'm stuck on what to do for the following fields: Genre, Director, Country, Runtime
Those 4 fields can each have multiple values, and I don't know how best to handle that; e.g. most films only have 1 runtime, but many have multiple (2 of the films have 4 different cuts). Also anthology films can have something like 6 different directors. I want to include all relevant genres, directors, countries and runtimes, but I don't know how to best do that.
I've tried adding a column for each value; genre1, genre2, ... This results in many blank values though. In the spreadsheet in Excel I put all applicable genres in a single field as one string, e.g. 'comedy/horror'.
What would be the easiest way to resolve this issue? Can I do a many-to-many relationship to achieve what I want?

Simply put a hard limit on the amount of genres.
For instance, while you may want the user to be able to enter as many genres as they want, is it rational to go above 20 genres? That doesn't make much sense and will only make searches much more time intensive.
For other possible duplicates, you can do something like this (in sqlite3 at least):
CREATE TABLE IF NOT EXISTS Directors
(id INTEGER PRIMARY KEY,
director TEXT,
UNIQUE(director) ON CONFLICT IGNORE)
CREATE TABLE IF NOT EXISTS file
(file_id INTEGER PRIMARY KEY,
filename TEXT,
director_id INTEGER,
watched INTEGER,
FOREIGN KEY (director_id)
REFERENCES Directors (id)
ON UPDATE CASCADE
ON DELETE SET NULL)
It doesn't matter if more than one genre have the same director, just as long as the 'file' table knows which one it's referencing and staying updated.
The 'watched' column holds a type of value that doesn't make sense to create an individual table for. For instance, say a song's track number is 2. Creating a table just for track numbers to reference doesn't make sense because you're going to spend a point in that table, then spend another point in the 'file' table to reference. So, just spend 1 point and put in the 'file'.
https://www.sqlitetutorial.net/sqlite-foreign-key/

Generally, you would add a second table, Directors, for instance, and then you relate that back to the movie title. You will need a uniqueID for the movie, and you do a join where that uniqueID is referenced in the Directors table, something like this working demo (not all fields were included in my demo):
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9e8d73bc798767b56b974ab4ebd30517
SELECT m.id, m.title, d.director
FROM Media_Library m
JOIN Directors d ON (m.id = d.mediaID);
or to concatenate the directors:
SELECT m.id, m.title, group_concat(d.director) as directors
FROM Media_Library m
JOIN Directors d ON (m.id = d.mediaID)
GROUP BY m.id;
Usually when you make this kind of relationship you will define a foreign key restraint, creating a link between the primary key of one table and a key (or keys) in another. In this case, the link is between id in Media_Library and mediaID in directors, so you would alter the create statement like this:
CREATE TABLE Directors (
id int not null auto_increment,
mediaID int,
director varchar(50),
PRIMARY KEY(id),
FOREIGN KEY (mediaID) REFERENCES Media_Library(id)
);
The foreign key is not strictly necessary, but it can reinforce database integrity. The ins and outs of foreign keys are out of scope for this answer, but you should probably read about them.
It is also possible to store the data in a JSON field since v5.7, like this:
CREATE TABLE test.Media_Library (id int not null auto_increment, title varchar (50), director JSON, PRIMARY KEY (id));
INSERT INTO test.Media_Library (title, director) VALUES
('Alien', json_array("Scott", "Scorsese")),
('The Alienist', '["Tarantino", "Nolan", "Kubrick"]'),
('Alien 2', '["Scott"]');
SELECT * FROM test.Media_Library;
https://www.db-fiddle.com/f/tG1SZorEHEYi5cYwgXjPeY/1
In the second query in that fiddle, I select only the first director from the list:
SELECT id, title, director->>'$[0]' as firstDirector
FROM Media_Library;
There are advantages to storing data this way, but there are tradeoffs, and unless you know what you are doing or you have a specific reason to be using JSON fields (for instance, you are getting the data from an api as JSON and you just want to use it as is), I would stick with the join method. Also, storing arrays is inherently non-normal (read about database normalization, a quick overview on wiki: https://en.wikipedia.org/wiki/Database_normalization).

Where to store cross referenced SQL values?

I'm building a movie database with two tables, one containing movie names and the other containing actor names. How do I cross reference the data?
For example, should entries in the movie names table contain a list of actors for each movie, or should entries in the actor names table contain a list of movies for actor?

You will need an additional table which links the two, and contains at least the movie id and actor id as its columns. Assuming your Actors table has actor_id as its primary key, and Movies has movie_id as its primary key. This table could also include information about the actor specific to that movie, for example, the character's name or other character-related info.
CREATE TABLE actors_in_movies (
/* Use whatever auto increment keyword is needed for your RDBMS */
id INT NOT NULL PRIMARY KEY <auto increment>,
actor_id INT,
movie_id INT,
character_name VARCHAR(),
other_character_info VARCHAR(),
character_best_quote TEXT,
FOREIGN KEY (actor_id) REFERENCES actors (actor_id),
FOREIGN KEY (movie_id) REFERENCES actors (movies_id)
);
To query all the movies an actor appears in,
use something to the effect of:
SELECT
actors.*,
movies.name
FROM
actors
JOIN actors_in_movies ON actors.actor_id = actors_in_movies.actor_id
JOIN movies ON movies.movie_id = actors_in_movies.movie_id
WHERE actors.actor_id = <some actor id>
To query all the actors in a particular movie,
use something to the effect of:
SELECT
actors.*
FROM
actors
JOIN actors_in_movies ON actors.actor_id = actors_in_movies.actor_id
JOIN movies ON movies.movie_id = actors_in_movies.movie_id
WHERE movies.movie_id = <some movie id>
To add an actor to a movie, insert a row into your actors_in_movies table:
INSERT INTO actors_in_movies (actor_id, movie_id) VALUES (..., ...);

Use a cross table! Have a table which has only two columns: movieID, actorID.

You should have a third table, one called ActorInMovie. This table should have 2 columns. ActorID, MovieID.
This way, you'll be able to have a many to many rel.

Matching delimited string to table rows

So I have two tables in this simplified example: People and Houses. People can own multiple houses, so I have a People.Houses field which is a string with comma delimeters (eg: "House1, House2, House4"). Houses can have multiple people in them, so I have a Houses.People field, which works the same way ("Sam, Samantha, Daren").
I want to find all the rows in the People table corresponding to the the names of people in the given house, and vice versa for houses belong to people. But I can't figure out how to do that.
This is as close as I've come up with so far:
SELECT People.*
FROM Houses
LEFT JOIN People ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'
But I get some false positives (eg: Sam and Samantha might both get grabbed when I just want Samantha. And likewise with House3, House34, and House343, when I want House343).
I thought I might try and write a SplitString function so I could split a string (using a list of delimiters) into a set, and do some subquery on that set, but MySQL functions can't have tables as return values.
Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I can think of some different ways to get what I want but I'm wondering if there isn't a nice solution.

Likewise you can't store arrays as fields, and from what I gather the comma-delimited elements in a long string seems to be the usual way to approach this problem.
I hope that's not true. Representing "arrays" in SQL databases shouldn't be in a comma-delimited format, but the problem can be correctly solved by using a junction table. Comma-separated fields should have no place in relational databases, and they actually violates the very first normal form.
You'd want your table schema to look something like this:
CREATE TABLE people (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE houses (
id int NOT NULL,
name varchar(50),
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
) ENGINE=INNODB;
Then searching for people will be as easy as this:
SELECT p.*
FROM houses h
JOIN people_houses ph ON ph.house_id = h.id
JOIN people p ON p.id = ph.person_id
WHERE h.name = 'SomeArbitraryHouseImInterestedIn';
No more false positives, and they all lived happily ever after.

The nice solution is to redesign your schema so that you have the following tables:
People
------
PeopleID (PK)
...
PeopleHouses
------------
PeopleID (PK) (FK to People)
HouseID (PK) (FK to Houses)
Houses
------
HouseID (PK)
...

Short Term Solution
For your immediate problem, the FIND_IN_SET function is what you want to use for joining:
For People
SELECT p.*
FROM PEOPLE p
JOIN HOUSES h ON FIND_IN_SET(p.name, h.people)
WHERE h.name = ?
For Houses
SELECT h.*
FROM HOUSES h
JOIN PEOPLE p ON FIND_IN_SET(h.name, p.houses)
WHERE p.name = ?
Long Term Solution
Is to properly model this by adding a table to link houses to people, because you're likely storing redundant relationships in both tables:
CREATE TABLE people_houses (
house_id int,
person_id int,
PRIMARY KEY (house_id, person_id),
FOREIGN KEY (house_id) REFERENCES houses (id),
FOREIGN KEY (person_id) REFERENCES people (id)
)

The problem is that you have to use another schema, like the one proposed by #RedFilter. You can see it as:
People table:
PeopleID
otherFields
Houses table:
HouseID
otherFields
Ownership table:
PeopleID
HouseID
otherFields
Hope that helps,

Hi you just change the table name places, left side is People and then right side is Houses:
SELECT People.*
FROM People
LEFT JOIN Houses ON Houses.People Like CONCAT(CONCAT('%', People.Name), '%')
WHERE House.Name = 'SomeArbitraryHouseImInterestedIn'

how to insert data using multiple tables

i have created a db names movielibrarysystem in which i have 3 tables..
that are type,publisher and movie... now 1 publisher could have many many movies and 1 movie is of many type.. in the movie table, the publisher id as well as the typeid are acting as a foreign keys..
my question is that how to insert a data into a movie table... i have already inserted data into publisher and type tables but i`m not able to insert into movie table..

Here's what you do. First, the relationship between publisher and movie is one to many - a publisher can publish many movies but each movie only has one publisher. However, type to movie is a many-to-many relationship (you state that one movie is of many types, but it's also the case that one type is of many movies), so you should have an extra table for that relationship.
Essentially:
publisher:
publisher_id
publisher_name
<other publisher info>
type:
type_id
type_name
<other type info>
movie:
movie_id
movie_name
publisher_id references publisher(publisher_id)
<other movie info>
movie_type:
movie_id references movie(movie_id)
type_id references type(type_id)
with suitable primary keys for all those.
Then assuming you have the publisher and type inserted, you can insert the movie thus:
begin transaction;
insert into movie (movie_name,publisher_id) values (
'Avatar',
(select publisher_id from publisher where publisher_name = 'Spielberg')
);
insert into movie_type (movie_id,type_id) values (
(select movie_id from movie where movie_name = 'Avatar'),
(select type_id from type where type_name = 'SciFi')
);
insert into movie_type (movie_id,type_id) values (
(select movie_id from movie where movie_name = 'Avatar'),
(select type_id from type where type_name = 'GrownUpSmurfs')
);
commit;
In other words, you use sub-selects to get the IDs from the relevant tables based on a unique set of properties (above example assumes movie names are unique, in reality you will need a more specific query, such as to handle the different films with the same name: The Omega Man, for example).
If you're not using a DBMS that supports selects in value sections, your best bet will be probably just to remember or extract the relevant values to a variable in whatever programming language you're using and construct a query to do it. In pseudo-code:
begin transaction;
insert into movie (movie_name,publisher_id) values (
'Avatar',
(select publisher_id from publisher where publisher_name = 'Spielberg')
);
select movie_id into :m_id from movie where movie_name = 'Avatar';
select type_id into :t_id1 from type where type_name = 'SciFi';
select type_id into :t_id2 from type where type_name = 'GrownUpSmurfs';
insert into movie_type (movie_id,type_id) values (:m_id, :t_id1);
insert into movie_type (movie_id,type_id) values (:m_id, :t_id2);
commit;
In response to your comment:
hey, i didn't got the point that of type and movie relationship.. will you please elaborate this point .. regards Abid
Both Avatar and Solaris can be considered of the type SciFi. So many movies to one genre. And Xmen:Wolverine can be considered both action and comic-remake. So many types to one movie.
Many-to-many relationships are best represented with a separate table containing the cross matches between the two related tables.

If you constrain tables type and publisher to use foreign keys you will need to first insert both ids into your movies table. If possible I would let the movies table be the one that increments the foreign keys.

Polymorphism in SQL database tables?

I currently have multiple tables in my database which consist of the same 'basic fields' like:
name character varying(100),
description text,
url character varying(255)
But I have multiple specializations of that basic table, which is for example that tv_series has the fields season, episode, airing, while the movies table has release_date, budget etc.
Now at first this is not a problem, but I want to create a second table, called linkgroups with a Foreign Key to these specialized tables. That means I would somehow have to normalize it within itself.
One way of solving this I have heard of is to normalize it with a key-value-pair-table, but I do not like that idea since it is kind of a 'database-within-a-database' scheme, I do not have a way to require certain keys/fields nor require a special type, and it would be a huge pain to fetch and order the data later.
So I am looking for a way now to 'share' a Primary Key between multiple tables or even better: a way to normalize it by having a general table and multiple specialized tables.

Right, the problem is you want only one object of one sub-type to reference any given row of the parent class. Starting from the example given by #Jay S, try this:
create table media_types (
media_type int primary key,
media_name varchar(20)
);
insert into media_types (media_type, media_name) values
(2, 'TV series'),
(3, 'movie');
create table media (
media_id int not null,
media_type not null,
name varchar(100),
description text,
url varchar(255),
primary key (media_id),
unique key (media_id, media_type),
foreign key (media_type)
references media_types (media_type)
);
create table tv_series (
media_id int primary key,
media_type int check (media_type = 2),
season int,
episode int,
airing date,
foreign key (media_id, media_type)
references media (media_id, media_type)
);
create table movies (
media_id int primary key,
media_type int check (media_type = 3),
release_date date,
budget numeric(9,2),
foreign key (media_id, media_type)
references media (media_id, media_type)
);
This is an example of the disjoint subtypes mentioned by #mike g.
Re comments by #Countably Infinite and #Peter:
INSERT to two tables would require two insert statements. But that's also true in SQL any time you have child tables. It's an ordinary thing to do.
UPDATE may require two statements, but some brands of RDBMS support multi-table UPDATE with JOIN syntax, so you can do it in one statement.
When querying data, you can do it simply by querying the media table if you only need information about the common columns:
SELECT name, url FROM media WHERE media_id = ?
If you know you are querying a movie, you can get movie-specific information with a single join:
SELECT m.name, v.release_date
FROM media AS m
INNER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If you want information for a given media entry, and you don't know what type it is, you'd have to join to all your subtype tables, knowing that only one such subtype table will match:
SELECT m.name, t.episode, v.release_date
FROM media AS m
LEFT OUTER JOIN tv_series AS t USING (media_id)
LEFT OUTER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?
If the given media is a movie,then all columns in t.* will be NULL.

Consider using a main basic data table with tables extending off of it with specialized information.
Ex.
basic_data
id int,
name character varying(100),
description text,
url character varying(255)
tv_series
id int,
BDID int, --foreign key to basic_data
season,
episode
airing
movies
id int,
BDID int, --foreign key to basic_data
release_data
budget

What you are looking for is called 'disjoint subtypes' in the relational world. They are not supported in sql at the language level, but can be more or less implemented on top of sql.

You could create one table with the main fields plus a uid then extension tables with the same uid for each specific case. To query these like separate tables you could create views.

Using the disjoint subtype approach suggested by Bill Karwin, how would you do INSERTs and UPDATEs without having to do it in two steps?
Getting data, I can introduce a View that joins and selects based on specific media_type but AFAIK I cant update or insert into that view because it affects multiple tables (I am talking MS SQL Server here). Can this be done without doing two operations - and without a stored procedure, natually.
Thanks

Question is quite old but for modern postresql versions it's also worth considering using json/jsonb/hstore type.
For example:
create table some_table (
name character varying(100),
description text,
url character varying(255),
additional_data json
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas