MariaDB query slow, only in first client session

Maybe just forget everything below...
After investigating and adding indexes we get different results, and other queries become slow...
What really helps is this, on another VM (all VMs use the same image):
Open HeidiSQL and monitor the processes; explaining a query might also help. (No changes are made to the database.) Then restart the machine, and everything is fast. Just restarting without doing anything in HeidiSQL doesn't work.
In this database, some temp (TMP) tables are filled first (these are real tables, not in-memory temporary tables). Then a large stored procedure is executed to process the data from the TMP tables into the other tables. Depending on the data in the TMP tables, the query is very slow (timeouts). Using HeidiSQL I figured out that one query was always busy.
After killing the client process, the same scenario is fast and stays fast. Even after a restart of the machine the scenario stays fast.
With 4000 TMPProperty records and 1 TMPTransaction the query is fast. With 4000 TMPProperty records and 100 TMPTransactions the query is very slow, leading to timeouts in the client application.
I can imagine it has something to do with the join between TMPPropertyValue and TMPTransaction, but why only the first time?
Does anyone have an idea what is wrong with the query?
UPDATE TMPPropertyValue, PropertyValue AS PV
INNER JOIN Object AS O ON PV.ObjectRowId = O.RowId
INNER JOIN TMPObject AS TMPO ON O.ObjectId = TMPO.ObjectId
INNER JOIN Transaction AS T ON T.RowId = PV.TransactionRowId
INNER JOIN Datastore AS D ON T.DatastoreRowId = D.RowId, TMPTransaction AS TT
SET TMPPropertyValue.Active = CASE
WHEN TT.TransactionDateTime > T.TransactionDateTime THEN 1
WHEN TT.TransactionDateTime < T.TransactionDateTime THEN 0
ELSE
CASE
WHEN TT.DatastoreID > D.DatastoreId THEN 1
ELSE 0
END
END
WHERE TMPO.RowId = TMPPropertyValue.ObjectRowId
AND TMPPropertyValue.TransactionRowId = TT.RowId
AND TMPPropertyValue.Active IS NULL
AND PV.PropertyRowId = TMPPropertyValue.PropertyRowId
AND (TT.DatastoreId <> D.DatastoreId OR TT.TransactionSeqNr <> T.TransactionSeqNr)
AND TMPPropertyValue.IsNew = 1;
As a C# developer, I can't see how I should rewrite this query, or whether indexes or something similar are needed.
Update:
After killing the client, I managed to keep the TMP tables' content and reproduce the slow query as a SELECT statement. I also replaced the mix of joins, ',' joins and WHERE conditions with inner joins throughout.
MariaDB version 10.3.11
SELECT * FROM TMPPropertyValue
INNER JOIN TMPObject AS TMPO ON TMPPropertyValue.ObjectRowId = TMPO.RowId
INNER JOIN Object AS O ON TMPO.ObjectId = O.ObjectId
INNER JOIN PropertyValue AS PV ON PV.ObjectRowId = O.RowId
INNER JOIN Transaction AS T ON T.RowId = PV.TransactionRowId
INNER JOIN Datastore AS D ON T.DatastoreRowId = D.RowId
INNER JOIN TMPTransaction AS TT ON TMPPropertyValue.TransactionRowId = TT.RowId
WHERE PV.PropertyRowId = TMPPropertyValue.PropertyRowId
AND (TT.DatastoreId <> D.DatastoreId OR TT.TransactionSeqNr <> T.TransactionSeqNr);
What really makes the query fast again is omitting the JOIN with the Datastore table. This table has just 3 columns and about 10 records.
Adding an index on the key and the foreign key (Transaction.DatastoreRowId and Datastore.RowId) doesn't influence the result.
The SELECT returns 4004 records and 84 or 87 columns, depending on whether Datastore is joined.
CREATE TABLE `datastore` (
`RowId` int(11) NOT NULL AUTO_INCREMENT,
`DatastoreId` binary(16) NOT NULL,
`IsSynchronizable` tinyint(1) NOT NULL,
PRIMARY KEY (`RowId`),
KEY `IDX_Datastore_DatastoreIdRowId` (`DatastoreId`,`RowId`),
KEY `IDX_Datastore_RowIdDataStoreId` (`RowId`,`DatastoreId`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `Transaction` (
`RowId` int(11) NOT NULL AUTO_INCREMENT,
`DatastoreRowId` int(11) NOT NULL,
`TransactionSeqNr` int(11) NOT NULL DEFAULT 1,
`------
PRIMARY KEY (`RowId`),
UNIQUE KEY `IX_PerceptionActive_ContextDatastoreTransSeqNr` (`ContextRowId`,`DatastoreRowId`,`TransactionSeqNr`),
KEY `IDX_Transaction_TransactionDateTime` (`TransactionDateTime`),
KEY `IX_Transaction_ContextPerceptionSeqNrTransactionDateTime` (`ContextRowId`,`PerceptionSeqNr`,`TransactionDateTime`),
KEY `Transaction_Host` (`HostRowId`),
KEY `FK_Transaction_Datastore` (`DatastoreRowId`),
KEY `FK_Transaction_User` (`UserRowId`),
KEY `FK_Transaction_ClassificationDomain` (`ClassificationDomainRowId`),
CONSTRAINT `FK_Transaction_ClassificationDomain` FOREIGN KEY (`ClassificationDomainRowId`) REFERENCES `classificationdomain` (`RowId`),
CONSTRAINT `FK_Transaction_Context` FOREIGN KEY (`ContextRowId`) REFERENCES `context` (`RowId`),
CONSTRAINT `FK_Transaction_Datastore` FOREIGN KEY (`DatastoreRowId`) REFERENCES `datastore` (`RowId`),
CONSTRAINT `FK_Transaction_Host` FOREIGN KEY (`HostRowId`) REFERENCES `host` (`RowId`),
CONSTRAINT `FK_Transaction_User` FOREIGN KEY (`UserRowId`) REFERENCES `perceptionuser` (`RowId`)
) ENGINE=InnoDB AUTO_INCREMENT=2208 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The EXPLAIN output is the same in both situations.
(Image: MariaDB EXPLAIN output)

What finally fixed the issue:
Changed several queries in the stored procedure to use cleaner joins/cross joins, without mixing join styles.
Added some indexes on the columns used in the joins.
But what finally fixed it was adding a FORCE INDEX hint (on the new index) to the specific query, since it was sometimes slow and then, after some investigating and experimenting in HeidiSQL, fast again.

Related

How to optimize SQL query that uses GROUP BY and joined many-to-many relation tables?

I have tables with many-to-many relations:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL DEFAULT '',
`size_id` tinyint(3) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `size` (`size_id`)
);
CREATE TABLE `items_styles` (
`style_id` smallint(5) unsigned NOT NULL,
`item_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`item_id`, `style_id`),
INDEX `style` (`style_id`),
INDEX `item` (`item_id`),
CONSTRAINT `items_styles_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);
CREATE TABLE `items_themes` (
`theme_id` tinyint(3) unsigned NOT NULL,
`item_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`item_id`, `theme_id`),
INDEX `theme` (`theme_id`),
INDEX `item` (`item_id`),
CONSTRAINT `items_themes_item_id_item_id` FOREIGN KEY (`item_id`) REFERENCES `item` (`id`)
);
I'm trying to get the report that shows style_id and the number of items that use this style but with applying filters to the item table and/or to another table, like this:
SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
GROUP BY i_s.style_id;
-- or like this
SELECT i_s.style_id, COUNT(i.id) total FROM item i
JOIN items_themes i_t ON i.id = i_t.item_id AND i_t.theme_id IN (6, 7)
JOIN items_styles i_s ON i.id = i_s.item_id
WHERE i.size_id != 3
GROUP BY i_s.style_id;
The problem is that tables are pretty big so queries take a long time to execute (~8 seconds)
item - 8M rows
items_styles - 12M rows
items_themes - 11M rows
Is there any way to optimize these queries? If not, what approach can be used to produce such reports?
I will be grateful for any help. Thanks.
First, you don't need the item table for the first query. Dropping it probably doesn't have much impact on performance, but there is no need for it.
So you can write the query as:
SELECT i_s.style_id, COUNT(*) as total
FROM items_themes i_t JOIN
items_styles i_s
ON i_s.item_id = i_t.item_id
WHERE i_t.theme_id IN (6, 7)
GROUP BY i_s.style_id;
For this query, you want an index on items_themes(theme_id, item_id). There is not much you can do about the GROUP BY.
Then, I don't think this is what you really want, because it will double count an item that has both themes. So, use EXISTS instead:
SELECT i_s.style_id, COUNT(*) as total
FROM items_styles i_s
WHERE EXISTS (SELECT 1
FROM items_themes i_t
WHERE i_t.item_id = i_s.item_id AND
i_t.theme_id IN (6, 7)
)
GROUP BY i_s.style_id;
For this, you want an index on items_themes(item_id, theme_id). You can also try an index on items_styles(style_id). Some databases might be able to use that one, but I am guessing not MariaDB.
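The double counting mentioned above can be reproduced on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module rather than MariaDB, and the three theme rows and two style rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_styles (style_id INTEGER, item_id INTEGER,
                               PRIMARY KEY (item_id, style_id));
    CREATE TABLE items_themes (theme_id INTEGER, item_id INTEGER,
                               PRIMARY KEY (item_id, theme_id));
    -- Hypothetical data: item 1 has themes 6 and 7, item 2 has only theme 6.
    INSERT INTO items_themes (theme_id, item_id) VALUES (6, 1), (7, 1), (6, 2);
    -- Both items use style 10.
    INSERT INTO items_styles (style_id, item_id) VALUES (10, 1), (10, 2);
""")

# JOIN version: item 1 matches two theme rows, so it is counted twice.
join_counts = conn.execute("""
    SELECT i_s.style_id, COUNT(*) AS total
    FROM items_themes i_t
    JOIN items_styles i_s ON i_s.item_id = i_t.item_id
    WHERE i_t.theme_id IN (6, 7)
    GROUP BY i_s.style_id
""").fetchall()

# EXISTS version: each items_styles row is counted at most once.
exists_counts = conn.execute("""
    SELECT i_s.style_id, COUNT(*) AS total
    FROM items_styles i_s
    WHERE EXISTS (SELECT 1 FROM items_themes i_t
                  WHERE i_t.item_id = i_s.item_id
                    AND i_t.theme_id IN (6, 7))
    GROUP BY i_s.style_id
""").fetchall()

print(join_counts)    # [(10, 3)]
print(exists_counts)  # [(10, 2)]
```

Only two items use style 10, so the EXISTS result is the one that matches the intended report.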
In a many-to-many table, it is optimal to have these two indexes:
PRIMARY KEY (`item_id`, `style_id`),
INDEX `style` (`style_id`, `item_id`)
And be sure to use InnoDB.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Still, you have two many-to-many mappings, so there probably is no great solution.
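As a rough illustration of the two-index layout for a mapping table, the sketch below asks SQLite (through Python's sqlite3 module) for its query plan in both lookup directions. SQLite is not InnoDB, so the storage details differ; the point is only that each direction can be answered from an index that starts with the filtered column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_styles (
        style_id INTEGER NOT NULL,
        item_id  INTEGER NOT NULL,
        PRIMARY KEY (item_id, style_id)
    );
    CREATE INDEX style ON items_styles (style_id, item_id);
""")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# item -> styles is served by the primary key index,
# style -> items by the secondary index.
by_item = plan("SELECT style_id FROM items_styles WHERE item_id = 1")
by_style = plan("SELECT item_id FROM items_styles WHERE style_id = 1")
print(by_item)
print(by_style)
```

Both plans should report a covering index search, meaning neither lookup needs to read the table rows themselves.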

Bad SQLite query performance with outer joins

I have an SQLite database as part of an iOS app which works fine for the most part, but certain small changes to a query can result in it taking 1000x longer to complete. Here are the 2 tables involved:
create table "journey_item" ("id" SERIAL NOT NULL PRIMARY KEY,
"position" INTEGER NOT NULL,
"last_update" BIGINT NOT NULL,
"rank" DOUBLE PRECISION NOT NULL,
"skipped" BOOLEAN NOT NULL,
"item_id" INTEGER NOT NULL,
"journey_id" INTEGER NOT NULL);
create table "content_items" ("id" SERIAL NOT NULL PRIMARY KEY,
"full_id" VARCHAR(32) NOT NULL,
"title" VARCHAR(508),
"timestamp" BIGINT NOT NULL,
"item_size" INTEGER NOT NULL,
"http_link" VARCHAR(254),
"local_url" VARCHAR(254),
"creator_id" INTEGER NOT NULL,
"from_id" INTEGER,"location_id" INTEGER);
Tables have indexes on primary and foreign keys.
And here are 2 queries which give a good example of my problem
SELECT * FROM content_items ci
INNER JOIN journey_item ji ON ji.item_id = ci.id WHERE ji.journey_id = 1
SELECT * FROM content_items ci
LEFT OUTER JOIN journey_item ji ON ji.item_id = ci.id WHERE ji.journey_id = 1
The first query takes 167 ms to complete while the second takes 3.5 minutes and I don't know why the outer join would make such a huge difference.
Edit:
Without the WHERE part the second query only takes 267 ms
The two queries should have the same result set (the where clause turns the left join into an inner join). However, SQLite probably doesn't recognize this.
If you have an index on journey_item(journey_id, item_id), then this would be used for the inner join version. However, the second version is probably scanning the first table for the join. An index on journey_item(item_id) would help, but probably still not match the performance of the first query.
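The claim that the WHERE clause turns the left join into an inner join can be checked on a few rows. The sketch below uses Python's sqlite3 module with cut-down versions of the two tables and made-up sample rows. The third query, with the filter moved into the ON clause, is an addition for contrast: it is what a genuine outer join would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_items (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE journey_item (id INTEGER PRIMARY KEY,
                               item_id INTEGER NOT NULL,
                               journey_id INTEGER NOT NULL);
    INSERT INTO content_items VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Item 1 is in journey 1, item 2 in journey 2, item 3 in no journey.
    INSERT INTO journey_item VALUES (10, 1, 1), (11, 2, 2);
""")

inner = conn.execute("""
    SELECT ci.id, ji.journey_id FROM content_items ci
    INNER JOIN journey_item ji ON ji.item_id = ci.id
    WHERE ji.journey_id = 1 ORDER BY ci.id
""").fetchall()

# The WHERE filter rejects the NULL-extended rows, so this equals `inner`.
left_where = conn.execute("""
    SELECT ci.id, ji.journey_id FROM content_items ci
    LEFT OUTER JOIN journey_item ji ON ji.item_id = ci.id
    WHERE ji.journey_id = 1 ORDER BY ci.id
""").fetchall()

# With the filter in the ON clause, unmatched content_items rows survive.
left_on = conn.execute("""
    SELECT ci.id, ji.journey_id FROM content_items ci
    LEFT OUTER JOIN journey_item ji
        ON ji.item_id = ci.id AND ji.journey_id = 1
    ORDER BY ci.id
""").fetchall()

print(inner)       # [(1, 1)]
print(left_where)  # [(1, 1)]
print(left_on)     # [(1, 1), (2, None), (3, None)]
```

Since the first two results are identical, writing the query as an INNER JOIN loses nothing and leaves the planner free to start from journey_item.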

Inner join between different database

I want to create a table using the following script in a database called DeltaDatabase:
CREATE TABLE [dbo].[OutStatus](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[OutId] [int] NOT NULL,
[StatusType] [varchar](255) NULL,
[StatusDate] [datetime] NULL)
I would then like to INNER JOIN a column into this table from another database called CoreDatabase.
The column name is sourceId from the table Client. So in other words OutId needs to be foreign key of SourceId.
How do I join that column into my OutStatus table from the other database using the create table script?
The basic syntax to retrieve data would be:
SELECT *
FROM CoreDatabase.dbo.Client c
INNER JOIN DeltaDatabase.dbo.OutStatus os ON c.SourceId = os.OutId
You need to fully qualify the table names as: DatabaseName.Schema.TableName
You may wish to limit the columns or add a where clause to reduce the data that is returned.
As far as creating a foreign key across databases goes, it's not something you can do. You would have to use triggers or some other logic to maintain referential integrity between the primary and foreign keys.
Try the below query
Select * from DeltaDatabase.dbo.OutStatus OUS
Inner Join CoreDatabase.dbo.Client CL on OUS.OutId=CL.sourceId

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.
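The index-only behaviour described above can be observed in a query plan. The following is a sketch using Python's sqlite3 module and a reduced version of the table, not MySQL itself: SQLite labels an index that satisfies the whole query as a COVERING INDEX, which corresponds to what MySQL's EXPLAIN reports as "Using index".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search_engine_rankings (
        id INTEGER PRIMARY KEY,
        keyword_id INTEGER,
        search_engine_id INTEGER,
        "rank" INTEGER,
        url TEXT,
        indexed_at TEXT
    );
    CREATE UNIQUE INDEX unique_ranking ON search_engine_rankings
        (keyword_id, search_engine_id, "rank", indexed_at);
""")

WHERE = ("WHERE keyword_id = 19 AND search_engine_id = 11 "
         "AND indexed_at = '2010-12-03'")

def plan(select_list):
    """Return the query plan for the given select list as one string."""
    sql = (f"EXPLAIN QUERY PLAN SELECT {select_list} "
           f"FROM search_engine_rankings {WHERE}")
    return " | ".join(row[3] for row in conn.execute(sql))

without_url = plan('id, "rank"')      # every column needed is in the index
with_url = plan('id, "rank", url')    # url forces a lookup in the table
print(without_url)
print(with_url)
```

The first plan should mention a covering index, while the second uses the same index only to locate rows and then has to visit the table for url.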
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means the WHERE clause can use the index columns only up to the first column it does not constrain.
Currently, your UNIQUE index is keyword_id,search_engine_id,rank,indexed_at. It can filter on the columns keyword_id and search_engine_id, but the remaining rows still have to be scanned to filter on indexed_at.
But if you change it to keyword_id,search_engine_id,indexed_at,rank (only the order changes), it can filter on keyword_id, search_engine_id and indexed_at.
I believe it will then be able to fully use that index to read the appropriate part of your table.
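The effect of the column order can be seen by comparing query plans for the two index definitions. This sketch again uses Python's sqlite3 module as a stand-in for MySQL, with an empty, reduced table; the helper name plan_for is made up for the example. SQLite lists the index columns it actually constrains in parentheses at the end of the plan line.

```python
import sqlite3

def plan_for(index_columns):
    """Create the table with the given index column order; return the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE search_engine_rankings (
            id INTEGER PRIMARY KEY, keyword_id INTEGER,
            search_engine_id INTEGER, "rank" INTEGER,
            url TEXT, indexed_at TEXT)
    """)
    conn.execute("CREATE UNIQUE INDEX unique_ranking "
                 f"ON search_engine_rankings ({index_columns})")
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT id, "rank", url FROM search_engine_rankings
        WHERE keyword_id = 19 AND search_engine_id = 11
          AND indexed_at = '2010-12-03'
    """).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)

original = plan_for('keyword_id, search_engine_id, "rank", indexed_at')
reordered = plan_for('keyword_id, search_engine_id, indexed_at, "rank"')
print(original)   # the search stops after search_engine_id
print(reordered)  # the search also uses indexed_at
```

With the original order the search is narrowed by two columns only; after the reorder all three WHERE conditions are used to seek into the index.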
I know it's an old post but I was experiencing the same situation and I didn't find an answer.
This really happens in MySQL: when you have varchar columns, processing takes a lot of time. My query took about 20 sec to process 1.7M rows and now takes about 1.9 sec.
Ok first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on primary key and selecting just some few rows was slow. Testing a bit, I found it not to be limited to the varchar column, selecting an int also took considerable amounts of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index spanning someint as well, and it will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround. Something is clearly wrong: it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After optimize table, both queries take <10ms.

MySQL, need some performance suggestions on my match query

I need some performance improvement guidance, my query takes several seconds to run and this is causing problems on the server. This query runs on the most common page on my site. I think a radical rethink may be required.
~ EDIT ~
This query produces a list of records whose keywords match those of the program (record) being queried. My site is a software download directory. And this list is used on the program listing page to show other similar programs. PadID is the primary key of the program records in my database.
~ EDIT ~
Here's my query
select match_keywords.PadID, count(match_keywords.Word) as matching_words
from keywords current_program_keywords
inner join keywords match_keywords on
match_keywords.Word=current_program_keywords.Word
where match_keywords.Word IS NOT NULL
and current_program_keywords.PadID=44243
group by match_keywords.PadID
order by matching_words DESC
LIMIT 0,11;
Here's the query explained.
Here's some sample data, however I doubt you'd be able to see the effects of any performance tweaks without more data, which I can provide if you'd like.
CREATE TABLE IF NOT EXISTS `keywords` (
`Word` varchar(20) NOT NULL,
`PadID` bigint(20) NOT NULL,
`LetterIdx` varchar(1) NOT NULL,
KEY `Word` (`Word`),
KEY `LetterIdx` (`LetterIdx`),
KEY `PadID_2` (`PadID`,`Word`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `keywords` (`Word`, `PadID`, `LetterIdx`) VALUES
('tv', 44243, 'T'),
('satellite tv', 44243, 'S'),
('satellite tv to pc', 44243, 'S'),
('satellite', 44243, 'S'),
('your', 44243, 'X'),
('computer', 44243, 'C'),
('pc', 44243, 'P'),
('soccer on your pc', 44243, 'S'),
('sports on your pc', 44243, 'S'),
('television', 44243, 'T');
I've tried adding an index, but this doesn't make much difference.
ALTER TABLE `keywords` ADD INDEX ( `PadID` )
You might find this helpful if I understood you correctly. The solution takes advantage of InnoDB's clustered primary key indexes (http://pastie.org/1195127)
EDIT: here's some links that may prove of interest:
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html
drop table if exists programmes;
create table programmes
(
prog_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
insert into programmes (name) values
('prog1'),('prog2'),('prog3'),('prog4'),('prog5'),('prog6');
drop table if exists keywords;
create table keywords
(
keyword_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
insert into keywords (name) values
('tv'),('satellite tv'),('satellite tv to pc'),('pc'),('computer');
drop table if exists programme_keywords;
create table programme_keywords
(
keyword_id mediumint unsigned not null,
prog_id mediumint unsigned not null,
primary key (keyword_id, prog_id), -- note clustered composite primary key
key (prog_id)
)
engine=innodb;
insert into programme_keywords values
-- keyword 1
(1,1),(1,5),
-- keyword 2
(2,2),(2,4),
-- keyword 3
(3,1),(3,2),(3,5),(3,6),
-- keyword 4
(4,2),
-- keyword 5
(5,2),(5,3),(5,4);
/*
efficiently list all other programmes whose keywords match that of the
programme currently being queried (for instance prog_id = 1)
*/
drop procedure if exists list_matching_programmes;
delimiter #
create procedure list_matching_programmes
(
in p_prog_id mediumint unsigned
)
proc_main:begin
select
p.*
from
programmes p
inner join
(
select distinct -- other programmes with same keywords as current
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from programme_keywords where prog_id = p_prog_id
) current_programme -- the current program keywords
on pk.keyword_id = current_programme.keyword_id
inner join programmes p on pk.prog_id = p.prog_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;
end proc_main #
delimiter ;
call list_matching_programmes(1);
call list_matching_programmes(6);
explain
select
p.*
from
programmes p
inner join
(
select distinct
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from programme_keywords where prog_id = 1
) current_programme
on pk.keyword_id = current_programme.keyword_id
inner join programmes p on pk.prog_id = p.prog_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;
EDIT: added char_idx functionality as requested
alter table keywords add column char_idx char(1) null after name;
update keywords set char_idx = upper(substring(name,1,1));
select * from keywords;
explain
select
p.*
from
programmes p
inner join
(
select distinct
pk.prog_id
from
programme_keywords pk
inner join
(
select keyword_id from keywords where char_idx = 'P' -- just change the driver query
) keywords_starting_with
on pk.keyword_id = keywords_starting_with.keyword_id
) matches
on matches.prog_id = p.prog_id
order by
p.prog_id;
Try this approach, not sure if it will help but at least is different:
select PadID, count(Word) as matching_words
from keywords k
where Word in (
select Word
from keywords
where PadID=44243 )
group by PadID
order by matching_words DESC
LIMIT 0,11
Anyway, the job you want done is heavy and full of string comparisons; moving the keywords out to a separate table and storing only numeric ids in the keyword table may reduce the times.
OK, after reviewing your database I think there is not a lot of room for improvement in the query. In fact, on my test server with an index on Word it only takes about 0.15s to complete; without the index it is almost 4x slower.
Anyway, I think that implementing the change in database structure that f00 and I have suggested will improve the response time.
Also drop the index PadID_2; as it stands it is useless and will only slow your writes.
What you should do, although it requires cleaning the database, is avoid duplicate keyword-progID pairs: first remove all the duplicates currently in the DB (around 90k in my test with 3/4 of your DB). That will reduce query time and give meaningful results. If you ask for a progID that has the keyword ABC, and ABC is duplicated for progID2, then progID2 will rank above other progIDs that have the same ABC keyword without duplicates. In my tests I have seen a progID that gets several more matches than the very progID I am querying.
After dropping the duplicates from the DB you will need to change your application to avoid this problem in the future, and just to be safe you could add a primary key (or a unique index) on Word + ProgID.
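The effect of duplicate keyword pairs on the ranking can be shown with a few rows. This is a sketch using Python's sqlite3 module instead of MySQL; the five sample rows and the index name word_pad are made up, and INSERT OR IGNORE is SQLite's way of skipping rows that violate the unique index (MySQL spells it INSERT IGNORE).

```python
import sqlite3

QUERY = """
    SELECT match_keywords.PadID, COUNT(match_keywords.Word) AS matching_words
    FROM keywords current_program_keywords
    INNER JOIN keywords match_keywords
        ON match_keywords.Word = current_program_keywords.Word
    WHERE current_program_keywords.PadID = ?
    GROUP BY match_keywords.PadID
    ORDER BY matching_words DESC, match_keywords.PadID
"""

# Hypothetical data: program 1 has 'tv' and 'pc'; program 2 has 'tv' three times.
ROWS = [("tv", 1), ("pc", 1), ("tv", 2), ("tv", 2), ("tv", 2)]

def ranking(unique):
    """Load ROWS, optionally behind a unique (Word, PadID) index, and rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keywords (Word TEXT NOT NULL, PadID INTEGER NOT NULL)")
    if unique:
        conn.execute("CREATE UNIQUE INDEX word_pad ON keywords (Word, PadID)")
    conn.executemany(
        "INSERT OR IGNORE INTO keywords (Word, PadID) VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY, (1,)).fetchall()
    conn.close()
    return result

with_duplicates = ranking(unique=False)
deduplicated = ranking(unique=True)
print(with_duplicates)  # [(2, 3), (1, 2)]
print(deduplicated)     # [(1, 2), (2, 1)]
```

With duplicates, program 2 outranks program 1 even though program 1 is the one being queried and shares only a single keyword with it. With the unique index in place the counts reflect distinct shared keywords.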