How to optimize SQL query that uses GROUP BY and joined many-to-many relation tables?


I have tables with many-to-many relations:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL DEFAULT '',
`size_id` tinyint(3) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `size` (`size_id`)
);
CREATE TABLE `items_styles` (
`style_id` smallint(5) unsigned NOT NULL,
`item_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`item_id`, `style_id`),
INDEX `style` (`style_id`),
INDEX `item` (`item_id`),
CONSTRAINT `items_styles_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);
CREATE TABLE `items_themes` (
`theme_id` tinyint(3) unsigned NOT NULL,
`item_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`item_id`, `theme_id`),
INDEX `theme` (`theme_id`),
INDEX `item` (`item_id`),
CONSTRAINT `items_themes_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);
I'm trying to get the report that shows style_id and the number of items that use this style but with applying filters to the item table and/or to another table, like this:
SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
GROUP BY i_s.style_id;
-- or like this
SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
WHERE i.size_id != 3
GROUP BY i_s.style_id;
The problem is that tables are pretty big so queries take a long time to execute (~8 seconds)
item - 8M rows
items_styles - 12M rows
items_themes - 11M rows
Is there any way to optimize these queries? If not, what approach can be used to produce such reports?
I will be grateful for any help. Thanks.

First, you don't need the item table for the first query (the second one still needs it for the size_id filter). Dropping it probably doesn't have much impact on performance, but there is no need for it.
So you can write the query as:
SELECT i_s.style_id, COUNT(*) as total
FROM items_themes i_t JOIN
items_styles i_s
ON i_s.item_id = i_t.item_id
WHERE i_t.theme_id IN (6, 7)
GROUP BY i_s.style_id;
For this query, you want an index on items_themes(theme_id, item_id). There is not much you can do about the GROUP BY.
Then, I don't think this is what you really want, because it will double count an item that has both themes. So, use EXISTS instead:
SELECT i_s.style_id, COUNT(*) as total
FROM items_styles i_s
WHERE EXISTS (SELECT 1
FROM items_themes i_t
WHERE i_t.item_id = i_s.item_id AND
i_t.theme_id IN (6, 7)
)
GROUP BY i_s.style_id;
For this, you want an index on items_themes(item_id, theme_id). You can also try an index on items_styles(style_id). Some databases might be able to use that one, but I am guessing not MariaDB.
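To see the double counting concretely, here is a minimal sketch that runs both forms of the query against a tiny in-memory SQLite database. SQLite and the sample rows are assumptions made only so the example can run; the real tables live in MySQL/MariaDB and timings will not carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_styles (item_id INTEGER, style_id INTEGER,
                           PRIMARY KEY (item_id, style_id));
CREATE TABLE items_themes (item_id INTEGER, theme_id INTEGER,
                           PRIMARY KEY (item_id, theme_id));
-- item 1 has both themes 6 and 7, item 2 only theme 6, item 3 only theme 9
INSERT INTO items_themes VALUES (1, 6), (1, 7), (2, 6), (3, 9);
-- all three items use style 10
INSERT INTO items_styles VALUES (1, 10), (2, 10), (3, 10);
""")

join_sql = """
SELECT i_s.style_id, COUNT(*) AS total
FROM items_themes i_t
JOIN items_styles i_s ON i_s.item_id = i_t.item_id
WHERE i_t.theme_id IN (6, 7)
GROUP BY i_s.style_id
"""

exists_sql = """
SELECT i_s.style_id, COUNT(*) AS total
FROM items_styles i_s
WHERE EXISTS (SELECT 1
              FROM items_themes i_t
              WHERE i_t.item_id = i_s.item_id
                AND i_t.theme_id IN (6, 7))
GROUP BY i_s.style_id
"""

join_counts = conn.execute(join_sql).fetchall()
exists_counts = conn.execute(exists_sql).fetchall()
print(join_counts)    # item 1 is counted once per matching theme
print(exists_counts)  # each item is counted at most once per style
```

With this data the JOIN version reports 3 for style 10 while the EXISTS version reports 2, because item 1 matches two themes.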

In a many-to-many table, it is optimal to have these two indexes:
PRIMARY KEY (`item_id`, `style_id`),
INDEX `style` (`style_id`, `item_id`)
And be sure to use InnoDB.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Still, you have two many-to-many mappings, so there probably is no great solution.

Related

In a SELECT command, how do I use data from one table to specify data in another?

I have 2 tables. What is important is the PlayerId and the Username.
CREATE TABLE [dbo].[Run]
(
[RunId] INT NOT NULL,
[PlayerId] INT NOT NULL,
[Duration] TIME(7) NOT NULL,
[DateUploaded] NCHAR(10) NOT NULL,
[VersionId] INT NOT NULL,
PRIMARY KEY CLUSTERED ([RunId] ASC),
CONSTRAINT [FK_Run_Player]
FOREIGN KEY ([PlayerId]) REFERENCES [dbo].[Player] ([PlayerId]),
CONSTRAINT [FK_Run_Version]
FOREIGN KEY ([VersionId]) REFERENCES [dbo].[Version] ([VersionId])
);
CREATE TABLE [dbo].[Player]
(
[PlayerId] INT NOT NULL,
[Username] NCHAR(20) NOT NULL,
[ProfilePicture] IMAGE NULL,
[Country] NCHAR(20) NOT NULL,
[LeagueId] INT NULL,
[DateJoined] DATE NULL,
PRIMARY KEY CLUSTERED ([PlayerId] ASC),
CONSTRAINT [FK_Player_League]
FOREIGN KEY ([LeagueId]) REFERENCES [dbo].[League] ([LeagueId])
);
I have a select command:
SELECT
PlayerId, Duration, VersionId, DateUploaded
FROM
[Run]
(with apologies in advance for my messy made up pseudocode), what I need it to do is:
SELECT (Player.PlayerId.Username)
What I basically need it to do, is instead of giving me just PlayerId, I need it to get the corresponding Username (from the other table) that matches each PlayerId (PlayerId is a foreign key)
So say for example instead of returning
1, 2, 3, 4, 5
it should return
John12, Abby2003, amy_932, asha7494, luke_ww
assuming, for example, Abby2003's PlayerId was 2.
I've done trial and error and either nobody's tried this before or I'm searching the wrong keywords. This is using VS 2022, ASP.NET Web Forms, and Visual Basic, but that shouldn't affect anything I don't think. Any syntax ideas or help would be greatly appreciated.

Try this to join the two tables together:
SELECT R.RunId
,R.PlayerId
,R.Duration
,R.DateUploaded
,R.VersionId
,P.Username
,P.ProfilePicture
,P.Country
,P.LeagueId
,P.DateJoined
FROM Run R
inner join Player P on R.PlayerId = P.PlayerId

Usually in this case joins are used. You can join the two tables together, give them aliases (or don't, personal preference really), then select what you need. In this case, you would probably want an inner join. Your query would probably look something like this:
SELECT p.Username FROM [Run] r
INNER JOIN [Player] p ON r.PlayerId = p.PlayerId
Then if you need to you can put a WHERE clause after that.
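As a quick check of the join, here is a small sketch using Python's sqlite3 module with an in-memory database. The cut-down tables and sample rows are assumptions for illustration; the question itself uses SQL Server, where the SELECT is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Player (PlayerId INTEGER PRIMARY KEY, Username TEXT NOT NULL);
CREATE TABLE Run (RunId INTEGER PRIMARY KEY,
                  PlayerId INTEGER NOT NULL REFERENCES Player (PlayerId),
                  Duration TEXT NOT NULL);
INSERT INTO Player VALUES (1, 'John12'), (2, 'Abby2003');
INSERT INTO Run VALUES (10, 2, '00:01:30'), (11, 1, '00:02:05');
""")

# Each Run row is matched to the Player row with the same PlayerId,
# so the result carries the username instead of the numeric id.
rows = conn.execute("""
SELECT p.Username, r.Duration
FROM Run r
INNER JOIN Player p ON r.PlayerId = p.PlayerId
ORDER BY r.RunId
""").fetchall()
print(rows)
```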
More about joins here

Retrieving data from a many-to-many relationship in a recipes database

I am attempting to create a recipes database where a user can input ingredients and it will output a list of potential recipes. I have created three tables:
CREATE TABLE [dbo].[Ingredients] (
[Ingredient_ID] INT IDENTITY (1, 1) NOT NULL,
[Name] VARCHAR (50) NOT NULL,
PRIMARY KEY CLUSTERED ([Ingredient_ID] ASC)
);
CREATE TABLE [dbo].[recipes] (
[Recipe_ID] INT IDENTITY (1, 1) NOT NULL,
[Name] VARCHAR (50) NOT NULL,
[Instructions] TEXT NULL,
[Preperation_Time] FLOAT (53) NULL,
[Author] VARCHAR (50) NULL,
CONSTRAINT [PK.recipes] PRIMARY KEY CLUSTERED ([Recipe_ID] ASC)
);
CREATE TABLE [dbo].[RecipeIngredients] (
[Recipe_ID] INT NOT NULL,
[Ingredient_ID] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Recipe_ID] ASC, [Ingredient_ID] ASC),
CONSTRAINT [FK_RecipeIngredients_To_Ingredients] FOREIGN KEY ([Ingredient_ID]) REFERENCES [dbo].[Ingredients] ([Ingredient_ID]),
CONSTRAINT [FK_RecipeIngredients_To_Recipes] FOREIGN KEY ([Recipe_ID]) REFERENCES [dbo].[recipes] ([Recipe_ID])
);
I have populated all tables and I am now attempting to retrieve the recipes based on what the user has entered.
I have created a test SQL statement to retrieve all recipes that contain 'Eggs' using:
string sqlString = "SELECT recipes.Name, Instructions, recipes.Preperation_Time, Author FROM RecipeIngredients" +
" INNER JOIN recipes ON recipes.Recipe_ID = RecipeIngredients.Recipe_ID" +
" INNER JOIN Ingredients ON Ingredients.Ingredient_ID = RecipeIngredients.Ingredient_ID" +
" WHERE ingredients.Name = 'Eggs'";
The data does not show up in my dataGridView, but I am unsure if it is because the statement is wrong or other factors.
Is the statement correct? I am unfamiliar with the INNER JOIN command.
I am also unsure how to design an Sql statement that can take a varying amount of ingredient names without creating an Sql statement for every possibility.
Thanks in advance, if you need me to expand on anything I have asked please ask.

Here is a query that should work. Using aliases is recommended:
SELECT r.Name, r.Instructions, r.Preperation_Time, r.Author
FROM Ingredients i
join RecipeIngredients ri on i.Ingredient_ID = ri.Ingredient_ID
join recipes r on ri.Recipe_ID = r.Recipe_ID
where i.Name = 'Eggs'
You may want to run it in SQL Server Management Studio first to confirm that it returns rows before coding it into your C# solution.
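For the second part of the question (a varying number of ingredient names), one common approach is to build an IN list with one parameter placeholder per name. The sketch below is only an illustration under assumptions: it uses Python and SQLite instead of C# and SQL Server, the helper name recipes_with_any is made up, and it returns recipes that contain at least one of the ingredients. The same idea carries over to any driver that supports parameters.

```python
import sqlite3

def recipes_with_any(conn, ingredient_names):
    """Return names of recipes that use at least one of the given ingredients."""
    if not ingredient_names:
        return []
    # One "?" per name, so the values are bound as parameters, not concatenated.
    placeholders = ", ".join("?" for _ in ingredient_names)
    sql = f"""
        SELECT DISTINCT r.Name
        FROM Ingredients i
        JOIN RecipeIngredients ri ON i.Ingredient_ID = ri.Ingredient_ID
        JOIN recipes r ON ri.Recipe_ID = r.Recipe_ID
        WHERE i.Name IN ({placeholders})
        ORDER BY r.Name
    """
    return [row[0] for row in conn.execute(sql, list(ingredient_names))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ingredients (Ingredient_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE recipes (Recipe_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE RecipeIngredients (Recipe_ID INTEGER, Ingredient_ID INTEGER,
                                PRIMARY KEY (Recipe_ID, Ingredient_ID));
INSERT INTO Ingredients VALUES (1, 'Eggs'), (2, 'Flour'), (3, 'Milk');
INSERT INTO recipes VALUES (1, 'Omelette'), (2, 'Pancakes'), (3, 'Porridge');
INSERT INTO RecipeIngredients VALUES (1, 1), (2, 1), (2, 2), (2, 3), (3, 3);
""")

print(recipes_with_any(conn, ["Eggs"]))
print(recipes_with_any(conn, ["Eggs", "Milk"]))
```

Only the placeholders are interpolated into the SQL text; the ingredient names themselves are always passed as bound values.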

MySQL, need some performance suggestions on my match query

I need some performance improvement guidance, my query takes several seconds to run and this is causing problems on the server. This query runs on the most common page on my site. I think a radical rethink may be required.
~ EDIT ~
This query produces a list of records whose keywords match those of the program (record) being queried. My site is a software download directory. And this list is used on the program listing page to show other similar programs. PadID is the primary key of the program records in my database.
~ EDIT ~
Here's my query:
select match_keywords.PadID, count(match_keywords.Word) as matching_words
from keywords current_program_keywords
inner join keywords match_keywords on
match_keywords.Word=current_program_keywords.Word
where match_keywords.Word IS NOT NULL
and current_program_keywords.PadID=44243
group by match_keywords.PadID
order by matching_words DESC
LIMIT 0,11;
Here's the query explained.
Here's some sample data. I doubt you'd be able to see the effects of any performance tweaks without more data, which I can provide if you'd like.
CREATE TABLE IF NOT EXISTS `keywords` (
`Word` varchar(20) NOT NULL,
`PadID` bigint(20) NOT NULL,
`LetterIdx` varchar(1) NOT NULL,
KEY `Word` (`Word`),
KEY `LetterIdx` (`LetterIdx`),
KEY `PadID_2` (`PadID`,`Word`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `keywords` (`Word`, `PadID`, `LetterIdx`) VALUES
('tv', 44243, 'T'),
('satellite tv', 44243, 'S'),
('satellite tv to pc', 44243, 'S'),
('satellite', 44243, 'S'),
('your', 44243, 'X'),
('computer', 44243, 'C'),
('pc', 44243, 'P'),
('soccer on your pc', 44243, 'S'),
('sports on your pc', 44243, 'S'),
('television', 44243, 'T');
I've tried adding an index, but this doesn't make much difference.
ALTER TABLE `keywords` ADD INDEX ( `PadID` )

You might find this helpful, if I understood you correctly. The solution takes advantage of InnoDB's clustered primary key indexes (http://pastie.org/1195127)
EDIT: here's some links that may prove of interest:
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html
drop table if exists programmes;
create table programmes
(
prog_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
insert into programmes (name) values
('prog1'),('prog2'),('prog3'),('prog4'),('prog5'),('prog6');
drop table if exists keywords;
create table keywords
(
keyword_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
insert into keywords (name) values
('tv'),('satellite tv'),('satellite tv to pc'),('pc'),('computer');
drop table if exists programme_keywords;
create table programme_keywords
(
keyword_id mediumint unsigned not null,
prog_id mediumint unsigned not null,
primary key (keyword_id, prog_id), -- note clustered composite primary key
key (prog_id)
)
engine=innodb;
insert into programme_keywords values
-- keyword 1
(1,1),(1,5),
-- keyword 2
(2,2),(2,4),
-- keyword 3
(3,1),(3,2),(3,5),(3,6),
-- keyword 4
(4,2),
-- keyword 5
(5,2),(5,3),(5,4);
/*
efficiently list all other programmes whose keywords match that of the
programme currently being queried (for instance prog_id = 1)
*/
drop procedure if exists list_matching_programmes;
delimiter #
create procedure list_matching_programmes
(
in p_prog_id mediumint unsigned
)
proc_main:begin
select
p.*
from
programmes p
inner join
(
select distinct -- other programmes with same keywords as current
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from programme_keywords where prog_id = p_prog_id
) current_programme -- the current program keywords
on pk.keyword_id = current_programme.keyword_id
inner join programmes p on pk.prog_id = p.prog_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;
end proc_main #
delimiter ;
call list_matching_programmes(1);
call list_matching_programmes(6);
explain
select
p.*
from
programmes p
inner join
(
select distinct
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from programme_keywords where prog_id = 1
) current_programme
on pk.keyword_id = current_programme.keyword_id
inner join programmes p on pk.prog_id = p.prog_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;
EDIT: added char_idx functionality as requested
alter table keywords add column char_idx char(1) null after name;
update keywords set char_idx = upper(substring(name,1,1));
select * from keywords;
explain
select
p.*
from
programmes p
inner join
(
select distinct
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from keywords where char_idx = 'P' -- just change the driver query
) keywords_starting_with
on pk.keyword_id = keywords_starting_with.keyword_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;

Try this approach, not sure if it will help but at least is different:
select PadID, count(Word) as matching_words
from keywords k
where Word in (
select Word
from keywords
where PadID=44243 )
group by PadID
order by matching_words DESC
LIMIT 0,11
Anyway, the job you want done is heavy and full of string comparisons. Moving the keywords to their own table and storing only numeric ids in the keywords table may reduce the times.

OK, after reviewing your database I don't think there is much room to improve the query itself. On my test server it takes about 0.15s with an index on Word; without the index it is almost 4x slower.
Still, I think the change in database structure that f00 and I suggested will improve the response time.
Also, drop the index PadID_2: as things stand it is useless and only slows your writes.
What you should also do, although it requires cleaning the database, is avoid duplicate keyword-progID pairs. First remove all the duplicates currently in the DB (around 90k in my test with 3/4 of your DB); that will reduce query time and give meaningful results. If you query a progID that has the keyword ABC, and ABC is duplicated for progID2, then progID2 will rank above other progIDs that have the same ABC keyword only once. In my tests I have seen a progID that gets more matches than the very progID I am querying.
After dropping the duplicates you will need to change your application so the problem does not come back, and to be safe you could add a primary key (or a unique index) on Word + ProgID.
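Here is a rough sketch of the clean-up described above, run against an in-memory SQLite database so it can be tried in isolation. The sample rows, the index name and the use of SQLite's rowid are assumptions; on MySQL the DELETE would need a different form (for example a self-join or a rebuilt table). To keep the output short, the match query also leaves out the program being queried, which the original query does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (Word TEXT NOT NULL, PadID INTEGER NOT NULL);
INSERT INTO keywords VALUES
  ('tv', 44243), ('pc', 44243),
  ('tv', 100), ('tv', 100), ('tv', 100),   -- the same pair stored three times
  ('tv', 200), ('pc', 200);
""")

match_sql = """
SELECT m.PadID, COUNT(m.Word) AS matching_words
FROM keywords c
JOIN keywords m ON m.Word = c.Word
WHERE c.PadID = 44243 AND m.PadID <> 44243
GROUP BY m.PadID
ORDER BY matching_words DESC, m.PadID
"""
before = conn.execute(match_sql).fetchall()

# Keep exactly one row per (Word, PadID) pair.
conn.execute("""
DELETE FROM keywords
WHERE rowid NOT IN (SELECT MIN(rowid) FROM keywords GROUP BY Word, PadID)
""")
# The unique index stops the duplicates from coming back.
conn.execute("CREATE UNIQUE INDEX keywords_word_pad ON keywords (Word, PadID)")
after = conn.execute(match_sql).fetchall()

print(before)  # program 100 ranks first only because of its duplicates
print(after)   # program 200, with two distinct matches, now ranks first
```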

SQL Query always uses filesort in order by clause

I am trying to optimize a sql query which is using order by clause. When I use EXPLAIN the query always displays "using filesort". I am applying this query for a group discussion forum where there are tags attached to posts by users.
Here are the 3 tables I am using: users, user_tag, tags
user_tag is the association mapping table for users and their tags.
CREATE TABLE `usertable` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_name` varchar(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`user_name`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `user_tag` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned NOT NULL,
`tag_id` int(11) unsigned NOT NULL,
`usage_count` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `tag_id` (`tag_id`),
KEY `usage_count` (`usage_count`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I update the usage_count on the server side in application code. Here is the query that is giving me problems. It finds the tag_id and usage_count for a particular user name, sorted by usage count in descending order:
select user_tag.tag_id, user_tag.usage_count
from user_tag inner join usertable on usertable.user_id = user_tag.user_id
where user_name="abc" order by usage_count DESC;
Here is the explain output:
mysql> explain select
user_tag.tag_id,
user_tag.usage_count from user_tag
inner join usertable on
user_tag.user_id = usertable.user_id
where user_name="abc" order by
user_tag.usage_count desc;
Explain output here
What should I change to lose that "Using filesort"?

I'm rather rusty with this, but here goes.
The key used to fetch the rows is not the same as the one used in the ORDER BY:
http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
As mentioned by OMG Ponies, an index on user_id, usage_count may resolve the filesort.
KEY `user_id_usage_count` (`user_id`,`usage_count`)
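The effect of such a composite index can be illustrated with SQLite's EXPLAIN QUERY PLAN, which reports a separate sort step in much the same spirit as MySQL's "Using filesort". This is only an analogy under assumptions: it runs on SQLite instead of MySQL, it filters on user_id directly instead of joining through usertable, and it leaves out the other indexes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_tag (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    usage_count INTEGER NOT NULL)
""")

query = """
SELECT tag_id, usage_count
FROM user_tag
WHERE user_id = ?
ORDER BY usage_count DESC
"""

def plan(conn):
    """Return the planner's description of the query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

plan_before = plan(conn)
conn.execute(
    "CREATE INDEX user_id_usage_count ON user_tag (user_id, usage_count)")
plan_after = plan(conn)

print(plan_before)  # includes a temporary b-tree step for the ORDER BY
print(plan_after)   # rows come out of the index already in order
```

The index helps because rows for one user_id are stored in usage_count order inside it, so the engine can read them in (reverse) index order instead of sorting.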

"Using filesort" is not necessarily bad; in many cases it doesn't actually matter.
Also, its name is somewhat confusing. The filesort() function does not necessarily use temporary files to perform the sort. For small data sets, the data are sorted in memory which is pretty fast.
Unless you think it's a specific problem (for example, after profiling your application on production-grade hardware in the lab, removing the ORDER BY solves a specific performance issue), or your data set is large, you should probably not worry about it.

How can I insert into tables with relations?

I have only done databases without relations, but now I need to do something more serious and correct.
Here is my database design:
Kunde = Customer
Vare = Product
Ordre = Order (Read: I want to make an order)
VareGruppe = ehm..type? (Read: Car, chair, closet etc.)
VareOrdre = Product_Orders
Here is my SQL (SQLite) schema:
CREATE TABLE Post (
Postnr INTEGER NOT NULL PRIMARY KEY,
Bynavn VARCHAR(50) NOT NULL
);
CREATE TABLE Kunde (
CPR INTEGER NOT NULL PRIMARY KEY,
Navn VARCHAR(50) NOT NULL,
Tlf INTEGER NOT NULL,
Adresse VARCHAR(50) NOT NULL,
Postnr INTEGER NOT NULL
CONSTRAINT fk_postnr_post REFERENCES Post(Postnr)
);
CREATE TABLE Varegruppe (
VGnr INTEGER PRIMARY KEY,
Typenavn VARCHAR(50) NOT NULL
);
CREATE TABLE Vare (
Vnr INTEGER PRIMARY KEY,
Navn VARCHAR(50) NOT NULL,
Pris DEC NOT NULL,
Beholdning INTEGER NOT NULL,
VGnr INTEGER NOT NULL
CONSTRAINT fk_varegruppevgnr_vgnr REFERENCES Varegruppe(VGnr)
);
CREATE TABLE Ordre (
Onr INTEGER PRIMARY KEY,
CPR INTEGER NOT NULL
CONSTRAINT fk_kundecpr_cpr REFERENCES Kunde(CPR),
Dato DATETIME NOT NULL,
SamletPris DEC NOT NULL
);
CREATE TABLE VareOrdre (
VareOrdreID INTEGER PRIMARY KEY,
Onr INTEGER NOT NULL
CONSTRAINT fk_ordrenr_onr REFERENCES Ordre(Onr),
Vnr INTEGER NOT NULL
CONSTRAINT fk_varevnr_vnr REFERENCES Vare(Vnr),
Antal INTEGER NOT NULL
);
It should work correctly.
But I am confused about Product_Orders.
How do I create an order? For example, 2 products using SQL INSERT INTO?
I can get nothing to work.
So far:
Only when I manually insert products and data into Product_Orders and then add that data to Orders = which makes it complete. Or the other way around (create an order in with 1 SQL, then manually inserting products into Product_orders - 1 SQL for each entry)

You should first create an order and then insert products in the table Product_Orders. This is necessary because you need an actual order with an id to associate it with the table Product_Orders.
You should always create the record in the referenced (parent) table before you create a record that points to it. So create them in this order: a "Post", a customer, a type, a product, an order and a product_order.

Try this ...
first you have to insert a customer
insert into kunde values(1, 'navn', 1, 'adresse', 1)
then you insert a type
insert into VareGruppe values(1, 'Type1')
then you insert a product
insert into vare values(1, 'product1', '10.0', 1, 1)
then you add an order
insert into ordre values(1, 1, '20090101', '10.0')
then you insert a register to the product_orders table
insert into VareOrdre values (1, 1, 1, 1)
I think this is it. :-)
As the primary keys are autoincrement, don't add them to the insert and specify the columns like this
insert into vare(Navn, Pris, Beholdning, VGnr) values('product1', '10.0', 1, 1)
Use SELECT @@IDENTITY to see the Onr value (that is SQL Server syntax; in SQLite, use SELECT last_insert_rowid() instead)
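Putting the steps together, here is a sketch that creates one order with two products using Python's sqlite3 module. The schema is the one from the question; the sample values (postal code, names, prices) are made up, and cursor.lastrowid is used to pick up the generated keys, which is the Python counterpart of SELECT last_insert_rowid().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Post (Postnr INTEGER NOT NULL PRIMARY KEY, Bynavn VARCHAR(50) NOT NULL);
CREATE TABLE Kunde (CPR INTEGER NOT NULL PRIMARY KEY, Navn VARCHAR(50) NOT NULL,
    Tlf INTEGER NOT NULL, Adresse VARCHAR(50) NOT NULL,
    Postnr INTEGER NOT NULL REFERENCES Post(Postnr));
CREATE TABLE Varegruppe (VGnr INTEGER PRIMARY KEY, Typenavn VARCHAR(50) NOT NULL);
CREATE TABLE Vare (Vnr INTEGER PRIMARY KEY, Navn VARCHAR(50) NOT NULL,
    Pris DEC NOT NULL, Beholdning INTEGER NOT NULL,
    VGnr INTEGER NOT NULL REFERENCES Varegruppe(VGnr));
CREATE TABLE Ordre (Onr INTEGER PRIMARY KEY,
    CPR INTEGER NOT NULL REFERENCES Kunde(CPR),
    Dato DATETIME NOT NULL, SamletPris DEC NOT NULL);
CREATE TABLE VareOrdre (VareOrdreID INTEGER PRIMARY KEY,
    Onr INTEGER NOT NULL REFERENCES Ordre(Onr),
    Vnr INTEGER NOT NULL REFERENCES Vare(Vnr), Antal INTEGER NOT NULL);
""")

# Parent rows first: postal code, customer, product group, products.
conn.execute("INSERT INTO Post VALUES (8000, 'Aarhus')")
conn.execute("INSERT INTO Kunde VALUES (1, 'navn', 12345678, 'adresse', 8000)")
vgnr = conn.execute(
    "INSERT INTO Varegruppe (Typenavn) VALUES ('Type1')").lastrowid
insert_vare = "INSERT INTO Vare (Navn, Pris, Beholdning, VGnr) VALUES (?, ?, ?, ?)"
chair = conn.execute(insert_vare, ("chair", 10.0, 5, vgnr)).lastrowid
table = conn.execute(insert_vare, ("table", 30.0, 2, vgnr)).lastrowid

# Then the order itself, and one VareOrdre row per product on the order.
onr = conn.execute(
    "INSERT INTO Ordre (CPR, Dato, SamletPris) VALUES (1, '2009-01-01', 50.0)"
).lastrowid
conn.executemany(
    "INSERT INTO VareOrdre (Onr, Vnr, Antal) VALUES (?, ?, ?)",
    [(onr, chair, 2), (onr, table, 1)])
conn.commit()

lines = conn.execute(
    "SELECT Vnr, Antal FROM VareOrdre WHERE Onr = ? ORDER BY Vnr", (onr,)
).fetchall()
print(onr, lines)
```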

I think you already have the hang of what needs to happen. But what I think you are getting at is how to ensure data integrity.
This is where Transactions become important.
http://www.sqlteam.com/article/introduction-to-transactions
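A minimal sketch of what a transaction buys you here, again using Python's sqlite3 module with a cut-down version of the schema. The helper name place_order and the sample data are assumptions for illustration. If any order line fails, the order row inserted just before it is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE Vare (Vnr INTEGER PRIMARY KEY, Navn TEXT NOT NULL);
CREATE TABLE Ordre (Onr INTEGER PRIMARY KEY, Dato TEXT NOT NULL);
CREATE TABLE VareOrdre (VareOrdreID INTEGER PRIMARY KEY,
    Onr INTEGER NOT NULL REFERENCES Ordre(Onr),
    Vnr INTEGER NOT NULL REFERENCES Vare(Vnr),
    Antal INTEGER NOT NULL);
INSERT INTO Vare VALUES (1, 'chair');
""")

def place_order(conn, dato, lines):
    """Insert an order and its lines atomically; return the new Onr."""
    with conn:  # commits on success, rolls back if an exception escapes
        onr = conn.execute(
            "INSERT INTO Ordre (Dato) VALUES (?)", (dato,)).lastrowid
        conn.executemany(
            "INSERT INTO VareOrdre (Onr, Vnr, Antal) VALUES (?, ?, ?)",
            [(onr, vnr, antal) for vnr, antal in lines])
    return onr

try:
    # Product 99 does not exist, so the second line violates the foreign key.
    place_order(conn, "2009-01-01", [(1, 2), (99, 1)])
except sqlite3.IntegrityError:
    pass
orders_after_failure = conn.execute("SELECT COUNT(*) FROM Ordre").fetchone()[0]

place_order(conn, "2009-01-02", [(1, 2)])
orders_after_success = conn.execute("SELECT COUNT(*) FROM Ordre").fetchone()[0]
print(orders_after_failure, orders_after_success)
```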

Is it the SalesPrice (I'm guessing that's what SamletPris means) that's causing the issue? I can see that being a problem here. One common design solution is to have 2 tables: Order and OrderLine. The Order is a header table - it will have the foreign key relationship to the Customer table, and any other 'top level' data. The OrderLine table has FK relationships to the Order table and to the Product table, along with quantity, unit price, etc. that are unique to an order's line item. Now, to get the sales price for an order, you sum the (unit price * quantity) of the OrderLine table for that order. Storing the SalesPrice for a whole order is likely to cause big issues down the line.
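The header/line design can be sketched as follows. The table and column names (OrderHeader, OrderLine, unit_price_cents) are hypothetical and not taken from the question's schema, and storing prices as integer cents is just one way to avoid floating-point rounding. The order total is computed from the lines when it is needed instead of being stored on the order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderHeader (order_id INTEGER PRIMARY KEY,
                          customer_id INTEGER NOT NULL);
CREATE TABLE OrderLine (
    order_id INTEGER NOT NULL REFERENCES OrderHeader(order_id),
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,  -- price at the time of the order
    PRIMARY KEY (order_id, product_id));
INSERT INTO OrderHeader VALUES (1, 42);
INSERT INTO OrderLine VALUES (1, 100, 2, 1000), (1, 200, 1, 3000);
""")

# Sum (unit price * quantity) over the lines of one order.
total_cents = conn.execute("""
SELECT SUM(quantity * unit_price_cents)
FROM OrderLine
WHERE order_id = ?
""", (1,)).fetchone()[0]
print(total_cents)
```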

A note just in case this is MySQL: If you're using MyISAM, the MySQL server ignores the foreign keys completely. You have to set the engine to InnoDB if you want any kind of integrity actually enforced on the database end instead of just in your logic. This isn't your question but it is something to be aware of.
fbinder got the question right :)
