How can I optimize this MySQL query that involves two left joins?

I cannot figure out why my query slows down. It boils down to four tables: team, player, equipment, and metadata. Records in player and equipment have a FK to team, making team a parent of player and equipment. Every row in all three of those tables also has a record in metadata, which stores things like creation date, creator user id, etc.
What I would like to retrieve all at once are all player and equipment records that belong to a particular team, in order of creation date. I start from the metadata table and left join the player and equipment tables via the metadata_id FK, but when I try to filter the SELECT to only retrieve records for a certain team, the query slows down badly when there are lots of rows.
Here is the query:
SELECT metadata.creation_date, player.id, equipment.id
FROM
metadata
JOIN datatype ON datatype.id = metadata.datatype_id
LEFT JOIN player ON player.metadata_id = metadata.id
LEFT JOIN equipment ON equipment.metadata_id = metadata.id
WHERE
datatype.name IN ('player', 'equipment')
AND (player.team_id = 1 OR equipment.team_id = 1)
ORDER BY metadata.creation_date;
You'll need to add a lot of rows to really see the slowdown, around 10,000 in each table. What I don't understand is why the query is really quick if I only filter in the WHERE clause on one table, for example "... AND player.team_id = 1", but when I add the other table to make it "... AND (player.team_id = 1 OR equipment.team_id = 1)" it takes much, much longer.
Here are the table definitions and datatypes. Note that one thing that seems to help somewhat, but not enough, is the composite index on (metadata_id, team_id) on both player and equipment.
CREATE TABLE `metadata` (
`id` INT(4) unsigned NOT NULL auto_increment,
`creation_date` DATETIME NOT NULL,
`datatype_id` INT(4) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `datatype` (
`id` INT(4) unsigned NOT NULL auto_increment,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `team` (
`id` INT(4) unsigned NOT NULL auto_increment,
`metadata_id` INT(4) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `player` (
`id` INT(4) unsigned NOT NULL auto_increment,
`metadata_id` INT(4) unsigned NOT NULL,
`team_id` INT(4) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `equipment` (
`id` INT(4) unsigned NOT NULL auto_increment,
`metadata_id` INT(4) unsigned NOT NULL,
`team_id` INT(4) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
ALTER TABLE `metadata` ADD INDEX ( `datatype_id` ),
ADD INDEX ( `creation_date` );
ALTER TABLE `team` ADD INDEX ( `metadata_id` );
ALTER TABLE `player` ADD INDEX `metadata_id` ( `metadata_id`, `team_id` ),
ADD INDEX ( `team_id` );
ALTER TABLE `equipment` ADD INDEX `metadata_id` ( `metadata_id`, `team_id` ),
ADD INDEX ( `team_id` );
ALTER TABLE `metadata` ADD CONSTRAINT `metadata_ibfk_1` FOREIGN KEY (`datatype_id`) REFERENCES `datatype` (`id`);
ALTER TABLE `team` ADD CONSTRAINT `team_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `player` ADD CONSTRAINT `player_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `player` ADD CONSTRAINT `player_ibfk_2` FOREIGN KEY (`team_id`) REFERENCES `team` (`id`);
ALTER TABLE `equipment` ADD CONSTRAINT `equipment_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `metadata` (`id`);
ALTER TABLE `equipment` ADD CONSTRAINT `equipment_ibfk_2` FOREIGN KEY (`team_id`) REFERENCES `team` (`id`);
INSERT INTO `datatype` VALUES(1,'team'),(2,'player'),(3,'equipment');
Please note that I realize I could easily speed this up by doing a UNION of two SELECTs on player and equipment for a given team id, but the ORM I'm using does not natively support UNIONs, so I'd much rather see if I can optimize this query instead. Also, I'm just plain curious.

In MySQL it's hard to optimize OR conditions, especially when the OR spans columns of two different tables: the condition can't be pushed down to either table, so the optimizer can't drive the join from an index on either team_id column.
One common remedy is to split the query into two simpler queries and combine them with UNION.
(SELECT metadata.creation_date, datatype.name, player.id
FROM metadata
JOIN datatype ON datatype.id = metadata.datatype_id
JOIN player ON player.metadata_id = metadata.id
WHERE datatype.name = 'player' AND player.team_id = 1)
UNION ALL
(SELECT metadata.creation_date, datatype.name, equipment.id
FROM metadata
JOIN datatype ON datatype.id = metadata.datatype_id
JOIN equipment ON equipment.metadata_id = metadata.id
WHERE datatype.name = 'equipment' AND equipment.team_id = 1)
ORDER BY creation_date;
You have to use the parentheses so that the ORDER BY applies to the result of the UNION instead of only to the result of the second SELECT.
Update: What you're doing is called Polymorphic Associations, and it's hard to use in SQL. I'd even call it an SQL antipattern, despite the fact that some ORM frameworks encourage its use.
What you really have in this case is a relationship between teams and players, and between teams and equipment. Players are not equipment and equipment are not players; they don't have a common supertype. It's misleading in both an OO sense and a relational sense that you have modeled them that way.
I'd say dump your metadata and datatype tables. These are anti-relational structures. Instead, use the team_id (which I assume is a foreign key to a teams table). Treat players and equipment as distinct types. Fetch them separately if you can't use UNION in your ORM. Then combine the result sets in your application.
You don't have to fetch everything in a single SQL query.
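For example, with the polymorphic layer removed, the two fetches become trivially indexable. This is only a sketch; it assumes creation_date is moved onto player and equipment (it currently lives in metadata):
-- each query can use a composite index such as (team_id, creation_date)
SELECT id, creation_date FROM player WHERE team_id = 1 ORDER BY creation_date;
SELECT id, creation_date FROM equipment WHERE team_id = 1 ORDER BY creation_date;
Your application then merges the two result sets, which are already sorted, by creation_date.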


multiple foreign keys as primary key postgres, should I do it?

This is one of those "should I or shouldn't I" questions.
My books app has reviews, but one user must not review the same book more than once. From my point of view it makes sense to create a table for the reviews and make user_id and book_id (the ISBN) together the PRIMARY KEY of the reviews table. But could it become a problem at some point if the application gets too many reviews? For example, could that decision slow down queries?
I am using Postgres and I am not sure whether the following code is correct:
CREATE TABLE users(
user_id PRIMARY KEY SERIAL,
user_name VARCHAR NOT NULL UNIQUE,
pass_hash VARCHAR NOT NULL,
email VARCHAR NOT NULL UNIQUE,
);
CREATE TABLE books(
book_id PRIMARY KEY BIGINT,
author VARCHAR NOT NULL,
title VARCHAR NOT NULL,
year INT NOT NULL CHECK (year > 1484),
review_count INT DEFAULT 0 NOT NULL,
avrg_score FLOAT,
);
CREATE TABLE reviews(
user_id INT FOREIGN KEY REFERENCES users(user_id) NOT NULL
book_id INT FOREIGN KEY REFERENCES books(book_id) NOT NULL
score INT NOT NULL CHECK (score > 0, score < 11)
PRIMARY KEY (book_id, user_id)
);
This is a perfectly valid design choice.
You have a many-to-many relationship between books and users, which is represented by the reviews table. A compound primary key built from the two foreign keys lets you enforce that a given (user, book) pair can only appear once, and at the same time provides a primary key for the table.
Another option would be a surrogate primary key for the bridge table. That could make things easier if you ever need to reference reviews from another table, but you would still need a unique constraint on the pair of foreign key columns for integrity, so it would actually use extra space.
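For comparison, a minimal sketch of the surrogate-key variant (the extra unique constraint is what costs the additional space):
CREATE TABLE reviews(
review_id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(user_id),
book_id BIGINT NOT NULL REFERENCES books(book_id),
score INT NOT NULL CHECK (score > 0 AND score < 11),
UNIQUE (book_id, user_id) -- still required so a user can review a book only once
);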
When it comes to your code, it has a few issues:
the primary key keyword goes after the datatype
the check constraint is incorrectly formed
missing or additional commas here and there
Consider:
CREATE TABLE users(
user_id SERIAL PRIMARY KEY,
user_name VARCHAR NOT NULL UNIQUE,
pass_hash VARCHAR NOT NULL,
email VARCHAR NOT NULL UNIQUE
);
CREATE TABLE books(
book_id BIGINT PRIMARY KEY,
author VARCHAR NOT NULL,
title VARCHAR NOT NULL,
year INT NOT NULL CHECK (year > 1484),
review_count INT DEFAULT 0 NOT NULL,
avrg_score FLOAT
);
CREATE TABLE reviews(
user_id INT REFERENCES users(user_id) NOT NULL,
book_id INT REFERENCES books(book_id) NOT NULL,
score INT NOT NULL CHECK (score > 0 and score < 11),
PRIMARY KEY (book_id, user_id)
);
The above is good, but I would drop the review_count and avrg_score columns from books. They can be derived when needed. If your application needs them, create a view to derive them instead of storing them; this avoids the always-complicated process of maintaining running aggregates:
create view books_vw as
select b.book_id
, b.author
, b.title
, b.year
, count(r.*) review_count
, avg(r.score) avrg_score
from books b
left join reviews r
on r.book_id = b.book_id
group by
b.book_id
, b.author
, b.title
, b.year
;
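The application then queries the view exactly as it would have queried the stored columns, for example:
SELECT title, review_count, avrg_score
FROM books_vw
WHERE book_id = 1; -- any book_id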

Fill in bridge table with query

I have four tables whose main purpose is to implement a many-to-many keyword-to-message relationship. Each keyword can have many messages and each message can have many keywords; they are related if their category_id matches.
CREATE TABLE public.trigger_category
(
id integer NOT NULL DEFAULT nextval('trigger_category_id_seq'::regclass),
description text COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT trigger_category_id PRIMARY KEY (id)
)
CREATE TABLE public.trigger_keyword
(
id integer NOT NULL DEFAULT nextval('trigger_keyword_id_seq'::regclass),
keyword text COLLATE pg_catalog."default" NOT NULL,
category_id bigint NOT NULL,
CONSTRAINT trigger_keyword_id PRIMARY KEY (id),
CONSTRAINT trigger_keyword_category_id_fkey FOREIGN KEY (category_id)
REFERENCES public.trigger_category (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE NO ACTION
)
CREATE TABLE public.trigger_message
(
id integer NOT NULL DEFAULT nextval('trigger_message_id_seq'::regclass),
message text COLLATE pg_catalog."default" NOT NULL,
category_id bigint NOT NULL,
CONSTRAINT trigger_message_id PRIMARY KEY (id),
CONSTRAINT trigger_message_category_id_fkey FOREIGN KEY (category_id)
REFERENCES public.trigger_category (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
CREATE TABLE public.trigger_keyword_trigger_message
(
trigger_keyword_id bigint NOT NULL,
trigger_message_id bigint NOT NULL,
CONSTRAINT trigger_keyword_trigger_message_trigger_keyword_id_trigger_mess PRIMARY KEY (trigger_keyword_id, trigger_message_id),
CONSTRAINT trigger_keyword_trigger_message_trigger_keyword_id_fkey FOREIGN KEY (trigger_keyword_id)
REFERENCES public.trigger_keyword (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE NO ACTION,
CONSTRAINT trigger_keyword_trigger_message_trigger_message_id_fkey FOREIGN KEY (trigger_message_id)
REFERENCES public.trigger_message (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE NO ACTION
)
I manually insert keywords into the trigger_keyword table and messages into the trigger_message table; if they are related, they get the same category_id.
Is it possible to write a query that automatically goes through the rows and, wherever a keyword and a message have the same category_id, creates the appropriate rows in the bridge table trigger_keyword_trigger_message?
You could achieve this with a MERGE statement (standard SQL, supported by PostgreSQL since version 15).
The USING clause selects all candidate keyword/message pairs, and the WHEN NOT MATCHED branch inserts into the bridge table the pairs that are not already there.
MERGE INTO trigger_keyword_trigger_message tktm
USING (
SELECT tk.id AS tk_id, tm.id AS tm_id
FROM trigger_keyword tk
INNER JOIN trigger_message tm ON tm.category_id = tk.category_id
) us
ON tktm.trigger_keyword_id = us.tk_id AND tktm.trigger_message_id = us.tm_id
WHEN NOT MATCHED THEN
INSERT (trigger_keyword_id, trigger_message_id)
VALUES (us.tk_id, us.tm_id);
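On PostgreSQL versions before 15, which lack MERGE, a plain INSERT ... SELECT with ON CONFLICT achieves the same thing. This is a sketch against your tables; the bridge table's primary key is what makes ON CONFLICT skip pairs that already exist:
-- insert every (keyword, message) pair that shares a category_id,
-- silently skipping pairs already present in the bridge table
INSERT INTO trigger_keyword_trigger_message (trigger_keyword_id, trigger_message_id)
SELECT tk.id, tm.id
FROM trigger_keyword tk
JOIN trigger_message tm ON tm.category_id = tk.category_id
ON CONFLICT (trigger_keyword_id, trigger_message_id) DO NOTHING;
Either form is safe to re-run whenever new keywords or messages are added.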

SQL - Query if the same card exists among multiple decks

Given the schema below, how can I query whether, and how many times, the same character's card appears across multiple decks, without changing the underlying data?
CREATE TABLE `deck` (
`deckId` BIGINT(20) NOT NULL AUTO_INCREMENT,
`title` TEXT(255) NOT NULL,
PRIMARY KEY (`deckId`)
);
CREATE TABLE `character` (
`characterId` BIGINT(20) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`characterId`)
);
CREATE TABLE `card` (
`cardId` BIGINT(20) NOT NULL AUTO_INCREMENT,
`color` VARCHAR(255) NOT NULL,
`characterId` BIGINT(20) NOT NULL,
`deckId` BIGINT(20) NOT NULL,
PRIMARY KEY (`cardId`),
CONSTRAINT `FK_card_character`
FOREIGN KEY (`characterId`)
REFERENCES `character` (`characterId`),
CONSTRAINT `FK_card_deck`
FOREIGN KEY (`deckId`)
REFERENCES `deck` (`deckId`)
);
The good news is that this is a properly normalized schema; very often the questions posed here involve bad database design.
Essentially you need to GROUP BY and then filter with a HAVING clause. The JOIN is only there so you can also get the character's name, which is usually something people want. While you're getting comfortable with this type of query, you might want to start with just the underlying table, which in your case is card.
SELECT ch.characterId, ch.name, COUNT(*) AS countOf
FROM card ca
JOIN `character` ch ON ch.characterId = ca.characterId
GROUP BY ch.characterId, ch.name
HAVING COUNT(*) > 1;
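If "in multiple decks" should mean distinct decks, so that two copies in the same deck don't count, a small variation with COUNT(DISTINCT ...) would do it. This is a sketch, not part of the original answer:
SELECT ch.characterId, ch.name, COUNT(DISTINCT ca.deckId) AS deckCount
FROM card ca
JOIN `character` ch ON ch.characterId = ca.characterId
GROUP BY ch.characterId, ch.name
HAVING COUNT(DISTINCT ca.deckId) > 1;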

Use a common table with many to many relationship

I have two SQL tables: Job and Employee. I need to compare Job Language Proficiencies with Employee Language Proficiencies. A Language Proficiency is composed of a Language and a Language Level.
create table dbo.EmployeeLanguageProficiency (
EmployeeId int not null,
LanguageProficiencyId int not null,
constraint PK_ELP primary key clustered (EmployeeId, LanguageProficiencyId)
)
create table dbo.JobLanguageProficiency (
JobId int not null,
LanguageProficiencyId int not null,
constraint PK_JLP primary key clustered (JobId, LanguageProficiencyId)
)
create table dbo.LanguageProficiency (
Id int identity not null
constraint PK_LanguageProficiency_Id primary key clustered (Id),
LanguageCode nvarchar (4) not null,
LanguageLevelId int not null,
constraint UQ_LP unique (LanguageCode, LanguageLevelId)
)
create table dbo.LanguageLevel (
Id int identity not null
constraint PK_LanguageLevel_Id primary key clustered (Id),
Name nvarchar (80) not null
constraint UQ_LanguageLevel_Name unique (Name)
)
create table dbo.[Language]
(
Code nvarchar (4) not null
constraint PK_Language_Code primary key clustered (Code),
Name nvarchar (80) not null
)
My question is about the LanguageProficiency table. I added an Id as PK but I am not sure this is the best option.
What do you think about this schema?
Your primary key of (EmployeeId, LanguageProficiencyId) allows an employee to have more than one proficiency level for the same language, which sounds counterintuitive.
This would be cleaner, as it allows only one entry per language:
create table dbo.EmployeeLanguageProficiency (
EmployeeId int not null,
LanguageId int not null,
LanguageLevelId int not null,
constraint PK_ELP primary key clustered (EmployeeId, LanguageId)
)
I don't see the point of the LanguageProficiency table at the moment.
The same applies to Job, of course, unless you would like to allow a "range" of proficiencies. But assuming that a "too high" proficiency does not hurt, the requirement can easily be expressed with a >= comparison in your queries, as sketched below.
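A sketch of such a query, assuming JobLanguageProficiency is restructured the same way as EmployeeLanguageProficiency above, and assuming a higher LanguageLevelId means a higher proficiency (neither is guaranteed by the original schema): it finds the employees who meet or exceed every language requirement of a given job.
declare @JobId int = 1; -- hypothetical job id

-- an employee qualifies when, for every language the job requires,
-- they have that language at a level >= the required level
select elp.EmployeeId
from dbo.JobLanguageProficiency jlp
join dbo.EmployeeLanguageProficiency elp
on elp.LanguageId = jlp.LanguageId
and elp.LanguageLevelId >= jlp.LanguageLevelId
where jlp.JobId = @JobId
group by elp.EmployeeId
having count(*) = (select count(*) from dbo.JobLanguageProficiency where JobId = @JobId);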

Problems understanding FOREIGN KEY / CASCADE constraints

I need some help understanding how foreign keys and cascades work. I understand the theory, but I'm having trouble applying it to a real-world example.
Let's assume I've got the following tables (and an arbitrary number of other tables that may reference table tags):
CREATE TABLE tags (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) UNIQUE
) Engine=InnoDB;
CREATE TABLE news (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(63),
content TEXT,
INDEX (title)
) Engine=InnoDB;
So I create a further table to provide the many-to-many relation between news and tags:
CREATE TABLE news_tags (
news_id INT UNSIGNED,
tags_id INT UNSIGNED,
FOREIGN KEY (news_id) REFERENCES news (id) ON DELETE ...,
FOREIGN KEY (tags_id) REFERENCES tags (id) ON DELETE ...
) Engine=InnoDB;
My requirements to the cascades:
If I delete a news item, all corresponding entries in news_tags should be removed as well.
The same applies to any table x that may be added later with its own x_tags table.
If I delete a tag, all corresponding entries in news_tags and in every further table x_tags should be removed as well.
I'm afraid that I may have to revisit my table structure for this purpose, but that's alright since I'm only trying to figure out how stuff works.
Any links to good tutorials, SQL queries, or JPA examples are appreciated!
You seem to be proposing something like this, which sounds reasonable to me:
CREATE TABLE tags
(
id INTEGER NOT NULL,
name VARCHAR(20) NOT NULL,
UNIQUE (id),
UNIQUE (name)
);
CREATE TABLE news
(
id INTEGER NOT NULL,
title VARCHAR(30) NOT NULL,
content VARCHAR(200) NOT NULL,
UNIQUE (id)
);
CREATE TABLE news_tags
(
news_id INTEGER NOT NULL,
tags_id INTEGER NOT NULL,
UNIQUE (tags_id, news_id),
FOREIGN KEY (news_id)
REFERENCES news (id)
ON DELETE CASCADE
ON UPDATE CASCADE,
FOREIGN KEY (tags_id)
REFERENCES tags (id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
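A quick way to convince yourself the cascades behave as required (illustrative data, not from the original post):
-- illustrative rows
INSERT INTO news (id, title, content) VALUES (1, 'Hello', 'World');
INSERT INTO tags (id, name) VALUES (1, 'greeting');
INSERT INTO news_tags (news_id, tags_id) VALUES (1, 1);

-- deleting either parent removes the linking row automatically
DELETE FROM news WHERE id = 1; -- cascades to news_tags
SELECT * FROM news_tags;       -- now empty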