Bad SQLite query performance with outer joins

I have an SQLite database as part of an iOS app which works fine for the most part, but certain small changes to a query can make it take 1000x longer to complete. Here are the two tables involved:
create table "journey_item" ("id" SERIAL NOT NULL PRIMARY KEY,
"position" INTEGER NOT NULL,
"last_update" BIGINT NOT NULL,
"rank" DOUBLE PRECISION NOT NULL,
"skipped" BOOLEAN NOT NULL,
"item_id" INTEGER NOT NULL,
"journey_id" INTEGER NOT NULL);
create table "content_items" ("id" SERIAL NOT NULL PRIMARY KEY,
"full_id" VARCHAR(32) NOT NULL,
"title" VARCHAR(508),
"timestamp" BIGINT NOT NULL,
"item_size" INTEGER NOT NULL,
"http_link" VARCHAR(254),
"local_url" VARCHAR(254),
"creator_id" INTEGER NOT NULL,
"from_id" INTEGER,"location_id" INTEGER);
Tables have indexes on primary and foreign keys.
And here are two queries which give a good example of my problem:
SELECT * FROM content_items ci
INNER JOIN journey_item ji ON ji.item_id = ci.id WHERE ji.journey_id = 1
SELECT * FROM content_items ci
LEFT OUTER JOIN journey_item ji ON ji.item_id = ci.id WHERE ji.journey_id = 1
The first query takes 167 ms to complete, while the second takes 3.5 minutes, and I don't know why the outer join would make such a huge difference.
Edit:
Without the WHERE part, the second query takes only 267 ms.

The two queries should have the same result set (the WHERE clause turns the LEFT JOIN into an INNER JOIN). However, SQLite probably doesn't recognize this.
If you have an index on journey_item(journey_id, item_id), then this would be used for the inner join version. However, the second version is probably scanning the first table for the join. An index on journey_item(item_id) would help, but probably still not match the performance of the first query.
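A sketch of the indexes suggested above (the index names are made up; only the column lists come from the answer):
CREATE INDEX idx_journey_item_journey_item ON journey_item (journey_id, item_id);
-- or, at minimum, an index on the join column alone:
CREATE INDEX idx_journey_item_item ON journey_item (item_id);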

MariaDB query slow, only in the first client session

Maybe just forget everything below...
After investigating and adding indexes we get other results, and other queries become slow...
What really helps is this: take another VM (all from the same image),
open HeidiSQL and monitor the processes; explaining a query also might help (no changes to the database). Then restart the machine and everything is fast. Just restarting without doing anything in HeidiSQL doesn't work.
In this database, some Temp (TMP) tables are filled first (these are real tables, not in-memory temp tables). Then a large stored procedure is executed to process the data from the temp tables into the other tables. Depending on the data in the TMP tables the query is very slow (timeouts). Using HeidiSQL I figured out that one query was always busy.
After killing the client process, the same scenario is fast, and stays fast. Even after a restart of the machine the scenario stays fast.
With 4000 TMPProperty records and 1 TMPTransaction the query is fast. With 4000 TMPProperty records and 100 TMPTransactions the query is very slow, leading to timeouts in the client application.
I can imagine it has something to do with the join between TMPPropertyValue and TMPTransaction, but why only the first time?
Does anyone have an idea what is wrong in the query?
UPDATE TMPPropertyValue, PropertyValue AS PV
INNER JOIN Object AS O ON PV.ObjectRowId = O.RowId
INNER JOIN TMPObject AS TMPO ON O.ObjectId = TMPO.ObjectId
INNER JOIN Transaction AS T ON T.RowId = PV.TransactionRowId
INNER JOIN Datastore AS D ON T.DatastoreRowId = D.RowId, TMPTransaction AS TT
SET TMPPropertyValue.Active = CASE
WHEN TT.TransactionDateTime > T.TransactionDateTime THEN 1
WHEN TT.TransactionDateTime < T.TransactionDateTime THEN 0
ELSE
CASE
WHEN TT.DatastoreID > D.DatastoreId THEN 1
ELSE 0
END
END
WHERE TMPO.RowId = TMPPropertyValue.ObjectRowId
AND TMPPropertyValue.TransactionRowId = TT.RowId
AND TMPPropertyValue.Active IS NULL
AND PV.PropertyRowId = TMPPropertyValue.PropertyRowId
AND (TT.DatastoreId <> D.DatastoreId OR TT.TransactionSeqNr <> T.TransactionSeqNr)
AND TMPPropertyValue.IsNew = 1;
I can't see how I need to rewrite this query, or whether indexes or something else are needed. (I'm a C# developer.)
Update:
After killing the client, I managed to keep the TMP tables' content and reproduce the slow query with a SELECT statement; I also replaced the joins, ',' joins and WHERE conditions with all INNER JOINs.
MariaDB version 10.3.11
SELECT * FROM TMPPropertyValue
INNER JOIN TMPObject AS TMPO ON TMPPropertyValue.ObjectRowId = TMPO.RowId
INNER JOIN Object AS O ON TMPO.ObjectId = O.ObjectId
INNER JOIN PropertyValue AS PV ON PV.ObjectRowId = O.RowId
INNER JOIN Transaction AS T ON T.RowId = PV.PerceptionRowId
INNER JOIN Datastore AS D ON T.DatastoreRowId = D.RowId
INNER JOIN TMPTransaction AS TT ON TMPPropertyValue.PerceptionRowId = TT.RowId
WHERE PV.PropertyRowId = TMPPropertyValue.PropertyRowId
AND (TT.DatastoreId <> D.DatastoreId OR TT.TransactionSeqNr <> T.TransactionSeqNr);
What really makes the query fast again is omitting the JOIN with the Datastore table. That table has just 3 columns and about 10 records.
Adding an index on the key and foreign key (Transaction.DatastoreRowId and Datastore.RowId) doesn't influence the result.
The SELECT returns 4004 records and 84/87 columns, depending on the Datastore join.
CREATE TABLE `datastore` (
`RowId` int(11) NOT NULL AUTO_INCREMENT,
`DatastoreId` binary(16) NOT NULL,
`IsSynchronizable` tinyint(1) NOT NULL,
PRIMARY KEY (`RowId`),
KEY `IDX_Datastore_DatastoreIdRowId` (`DatastoreId`,`RowId`),
KEY `IDX_Datastore_RowIdDataStoreId` (`RowId`,`DatastoreId`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `Transaction` (
`RowId` int(11) NOT NULL AUTO_INCREMENT,
`DatastoreRowId` int(11) NOT NULL,
`TransactionSeqNr` int(11) NOT NULL DEFAULT 1,
`------
PRIMARY KEY (`RowId`),
UNIQUE KEY `IX_PerceptionActive_ContextDatastoreTransSeqNr` (`ContextRowId`,`DatastoreRowId`,`TransactionSeqNr`),
KEY `IDX_Transaction_TransactionDateTime` (`TransactionDateTime`),
KEY `IX_Transaction_ContextPerceptionSeqNrTransactionDateTime` (`ContextRowId`,`PerceptionSeqNr`,`TransactionDateTime`),
KEY `Transaction_Host` (`HostRowId`),
KEY `FK_Transaction_Datastore` (`DatastoreRowId`),
KEY `FK_Transaction_User` (`UserRowId`),
KEY `FK_Transaction_ClassificationDomain` (`ClassificationDomainRowId`),
CONSTRAINT `FK_Transaction_ClassificationDomain` FOREIGN KEY (`ClassificationDomainRowId`) REFERENCES `classificationdomain` (`RowId`),
CONSTRAINT `FK_Transaction_Context` FOREIGN KEY (`ContextRowId`) REFERENCES `context` (`RowId`),
CONSTRAINT `FK_Transaction_Datastore` FOREIGN KEY (`DatastoreRowId`) REFERENCES `datastore` (`RowId`),
CONSTRAINT `FK_Transaction_Host` FOREIGN KEY (`HostRowId`) REFERENCES `host` (`RowId`),
CONSTRAINT `FK_Transaction_User` FOREIGN KEY (`UserRowId`) REFERENCES `perceptionuser` (`RowId`)
) ENGINE=InnoDB AUTO_INCREMENT=2208 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The EXPLAIN output is the same in both situations (MariaDB EXPLAIN screenshot not reproduced here).
What finally fixed the issue:
Changed several queries in the stored procedure to use consistent JOIN/CROSS JOIN syntax, with no mixing of styles, etc.
Added some indexes for the columns used in the joins.
But what finally did the fix: adding a FORCE INDEX (on the new index) for the specific query, since it was sometimes slow and then, after some investigating and trying things out in HeidiSQL, fast. A minimal sketch of such a hint follows.
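A minimal sketch of a FORCE INDEX hint in MariaDB; the index name idx_tmppv_object is a placeholder, and the real statement is the larger UPDATE shown earlier:
SELECT TMPPropertyValue.PropertyRowId
FROM TMPPropertyValue FORCE INDEX (idx_tmppv_object)
INNER JOIN TMPObject AS TMPO ON TMPPropertyValue.ObjectRowId = TMPO.RowId
WHERE TMPPropertyValue.IsNew = 1;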

In a SELECT command, how do I use data from one table to specify data in another?

I have two tables. What is important here are the PlayerId and the Username.
CREATE TABLE [dbo].[Run]
(
[RunId] INT NOT NULL,
[PlayerId] INT NOT NULL,
[Duration] TIME(7) NOT NULL,
[DateUploaded] NCHAR(10) NOT NULL,
[VersionId] INT NOT NULL,
PRIMARY KEY CLUSTERED ([RunId] ASC),
CONSTRAINT [FK_Run_Player]
FOREIGN KEY ([PlayerId]) REFERENCES [dbo].[Player] ([PlayerId]),
CONSTRAINT [FK_Run_Version]
FOREIGN KEY ([VersionId]) REFERENCES [dbo].[Version] ([VersionId])
);
CREATE TABLE [dbo].[Player]
(
[PlayerId] INT NOT NULL,
[Username] NCHAR(20) NOT NULL,
[ProfilePicture] IMAGE NULL,
[Country] NCHAR(20) NOT NULL,
[LeagueId] INT NULL,
[DateJoined] DATE NULL,
PRIMARY KEY CLUSTERED ([PlayerId] ASC),
CONSTRAINT [FK_Player_League]
FOREIGN KEY ([LeagueId]) REFERENCES [dbo].[League] ([LeagueId])
);
I have a select command:
SELECT
PlayerId, Duration, VersionId, DateUploaded
FROM
[Run]
(with apologies in advance for my messy made-up pseudocode), what I need it to do is:
SELECT (Player.PlayerId.Username)
What I basically need it to do is: instead of giving me just the PlayerId, get the corresponding Username (from the other table) that matches each PlayerId (PlayerId is a foreign key).
So say for example instead of returning
1, 2, 3, 4, 5
it should return
John12, Abby2003, amy_932, asha7494, luke_ww
assuming, for example, Abby2003's PlayerId was 2.
I've tried trial and error, and either nobody's tried this before or I'm searching with the wrong keywords. This is using VS 2022, ASP.NET Web Forms, and Visual Basic, but I don't think that should affect anything. Any syntax ideas or help would be greatly appreciated.
Try this to join the two tables together:
SELECT R.RunId
,R.PlayerId
,R.Duration
,R.DateUploaded
,R.VersionId
,P.Username
,P.ProfilePicture
,P.Country
,P.LeagueId
,P.DateJoined
FROM Run R
inner join Player P on R.PlayerId = P.PlayerId
Usually in this case joins are used. You can join the two tables together, give them aliases (or don't, personal preference really), then select what you need. In this case, you would probably want an inner join. Your query would probably look something like this:
SELECT p.Username FROM [Run] r
INNER JOIN [Player] p ON r.PlayerId = p.PlayerId
Then, if you need to, you can put a WHERE clause after that, for example:
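For example, filtering on a column from Run (the VersionId value here is made up):
SELECT p.Username FROM [Run] r
INNER JOIN [Player] p ON r.PlayerId = p.PlayerId
WHERE r.VersionId = 1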
More about joins here

Index Guidance for SQL Query

Anyone have guidance on how to approach building indexes for the following query? The query works as expected, but I can't seem to get around full table scans. Working with Oracle 11g.
SELECT v.volume_id
FROM ( SELECT MIN (usv.volume_id) volume_id
FROM user_stage_volume usv
WHERE usv.status = 'NEW'
AND NOT EXISTS
(SELECT 1
FROM user_stage_volume kusv
WHERE kusv.deal_num = usv.deal_num
AND kusv.locked = 'Y')
GROUP BY usv.deal_num, usv.volume_type
ORDER BY MAX (usv.priority) DESC, MIN (usv.last_update) ASC) v
WHERE ROWNUM = 1;
Please request any more info you may need in comments and I'll edit.
Here is the create script for the table. The PK is VOLUME_ID. DEAL_NUM is not unique.
CREATE TABLE ENDUR.USER_STAGE_VOLUME
(
DEAL_NUM NUMBER(38) NOT NULL,
EXTERNAL_ID NUMBER(38) NOT NULL,
VOLUME_TYPE NUMBER(38) NOT NULL,
EXTERNAL_TYPE VARCHAR2(100 BYTE) NOT NULL,
GMT_START DATE NOT NULL,
GMT_END DATE NOT NULL,
VALUE FLOAT(126) NOT NULL,
VOLUME_ID NUMBER(38) NOT NULL,
PRIORITY INTEGER NOT NULL,
STATUS VARCHAR2(100 BYTE) NOT NULL,
LAST_UPDATE DATE NOT NULL,
LOCKED CHAR(1 BYTE) NOT NULL,
RETRY_COUNT INTEGER DEFAULT 0 NOT NULL,
INS_DATE DATE NOT NULL
)
ALTER TABLE ENDUR.USER_STAGE_VOLUME ADD (
PRIMARY KEY
(VOLUME_ID))
An index on (deal_num) would help the subquery greatly. In fact, an index on (deal_num, locked) would allow the subquery to avoid the table itself altogether.
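A sketch of that composite index (the index name is illustrative):
CREATE INDEX idx_usv_deal_locked ON user_stage_volume (deal_num, locked);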
You should expect a full table scan on the main query, as it filters on status which is not indexed (and most likely would not benefit from being indexed, unless 'NEW' is a fairly rare value for status).
I think it's running your inner subquery (inside not exists...) once for every run of the outer subquery.
That will be where performance takes a hit - it will run through all of user_stage_volume for each row in user_stage_volume, which is O(n^2), n being the number of rows in usv.
An alternative would be to create a view for the inner subquery and use that view, or to name a temporary view using a WITH clause. A sketch of the WITH approach follows.
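A sketch of the WITH-clause idea, reusing only the tables and columns from the question (untested against the real data):
WITH locked_deals AS (
SELECT DISTINCT deal_num
FROM user_stage_volume
WHERE locked = 'Y'
)
SELECT v.volume_id
FROM ( SELECT MIN (usv.volume_id) volume_id
FROM user_stage_volume usv
LEFT JOIN locked_deals ld ON ld.deal_num = usv.deal_num
WHERE usv.status = 'NEW'
AND ld.deal_num IS NULL
GROUP BY usv.deal_num, usv.volume_type
ORDER BY MAX (usv.priority) DESC, MIN (usv.last_update) ASC) v
WHERE ROWNUM = 1;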

SQL JOIN To Find Records That Don't Have a Matching Record With a Specific Value

I'm trying to speed up some code that I wrote years ago for my employer's purchase authorization app. Basically I have a SLOW subquery that I'd like to replace with a JOIN (if it's faster).
When the director logs into the application he sees a list of purchase requests he has yet to authorize or deny. That list is generated with the following query:
SELECT * FROM SA_ORDER WHERE ORDER_ID NOT IN
(SELECT ORDER_ID FROM SA_SIGNATURES WHERE TYPE = 'administrative director');
There are only about 900 records in sa_order and 1800 records in sa_signature and this query still takes about 5 seconds to execute. I've tried using a LEFT JOIN to retrieve records I need, but I've only been able to get sa_order records with NO matching records in sa_signature, and I need sa_order records with "no matching records with a type of 'administrative director'". Your help is greatly appreciated!
The two tables involved have the following layout:
CREATE TABLE sa_order
(
`order_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_number` BIGINT NOT NULL,
`submit_date` DATE NOT NULL,
`vendor_id` BIGINT NOT NULL,
`DENIED` BOOLEAN NOT NULL DEFAULT FALSE,
`MEMO` MEDIUMTEXT,
`year_id` BIGINT NOT NULL,
`advisor` VARCHAR(255) NOT NULL,
`deleted` BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE TABLE sa_signature
(
`signature_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_id` BIGINT NOT NULL,
`signature` VARCHAR(255) NOT NULL,
`proxy` BOOLEAN NOT NULL DEFAULT FALSE,
`timestamp` TIMESTAMP NOT NULL DEFAULT NOW(),
`username` VARCHAR(255) NOT NULL,
`type` VARCHAR(255) NOT NULL
);
Create an index on sa_signatures (type, order_id).
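A sketch of that index (the name is illustrative; a prefix length on type may be needed depending on the column's character set and MySQL version):
CREATE INDEX idx_signatures_type_order ON sa_signatures (`type`, order_id);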
It is not necessary to convert the query into a LEFT JOIN unless sa_signatures allows NULLs in order_id; with the index, the NOT IN will perform just as well. However, just in case you're curious:
SELECT o.*
FROM sa_order o
LEFT JOIN
sa_signatures s
ON s.order_id = o.order_id
AND s.type = 'administrative director'
WHERE s.type IS NULL
You should pick a NOT NULL column from sa_signatures for the WHERE clause to perform well.
You could replace the [NOT] IN operator with EXISTS for faster performance.
So you'll have:
SELECT * FROM SA_ORDER WHERE NOT EXISTS
(SELECT ORDER_ID FROM SA_SIGNATURES
WHERE TYPE = 'administrative director'
AND ORDER_ID = SA_ORDER.ORDER_ID);
Reason : "When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, query can use an index within the sub-query."
Source : http://decipherinfosys.wordpress.com/2007/01/21/32/
The following query should work; however, I suspect your real issue is that you don't have the proper indexes in place. You should have an index on the SA_SIGNATURES table on the ORDER_ID column.
SELECT *
FROM
SA_ORDER
LEFT JOIN
SA_SIGNATURES
ON
SA_ORDER.ORDER_ID = SA_SIGNATURES.ORDER_ID AND
TYPE = 'administrative director'
WHERE
SA_SIGNATURES.ORDER_ID IS NULL;
select * from sa_order as o inner join sa_signature as s on o.order_id = s.order_id and s.type = 'administrative director'
Also, you can create a non-clustered index on type in the sa_signature table.
Even better: have a master table for types with a type id and a type name, and then instead of saving the type as text in your sa_signature table, simply save the type as an integer. That's because comparing integers is much faster than comparing text. A sketch of that idea follows.
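A rough sketch of that lookup-table idea (all names here are made up):
CREATE TABLE signature_type
(
`type_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`type_name` VARCHAR(255) NOT NULL UNIQUE
);
-- sa_signature would then store a `type_id` BIGINT column referencing signature_type
-- instead of the free-text `type` column.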

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.
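If an index-only plan is wanted for the second query as well, one option is a covering index that includes url. This is only a sketch (the index name is made up, and widening a secondary index is not free on a 250M-row table):
CREATE INDEX idx_rankings_covering
ON search_engine_rankings (keyword_id, search_engine_id, indexed_at, `rank`, url);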
To add to Tim's answer: an index in MySQL can only be used left-to-right, which means the columns of your index can serve your WHERE clause only up to the first column you skip.
Currently, your UNIQUE index is keyword_id, search_engine_id, rank, indexed_at. It can filter on the columns keyword_id and search_engine_id, but it still needs to scan the remaining rows to filter on indexed_at.
But if you change it to keyword_id, search_engine_id, indexed_at, rank (just the order), it can filter on keyword_id, search_engine_id, and indexed_at.
I believe it will then be able to fully use that index to read the appropriate part of your table. A sketch of that change follows.
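A sketch of that change on the existing unique key (this rebuilds the index, which will take a while on 250M rows):
ALTER TABLE search_engine_rankings
DROP INDEX unique_ranking,
ADD UNIQUE KEY unique_ranking (keyword_id, search_engine_id, indexed_at, `rank`);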
I know it's an old post, but I was experiencing the same situation and didn't find an answer.
This really happens in MySQL: when you have VARCHAR columns, processing takes a lot of time. My query took about 20 sec to process 1.7M rows and now takes about 1.9 sec.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, the same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on the primary key, selecting just a few rows, were slow. Testing a bit, I found it was not limited to the VARCHAR column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The accepted answer here is to just make an index spanning someint as well, and it will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround: something is clearly wrong; it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10 ms.
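For reference, the fix is simply (mytable being the placeholder table name used in this answer):
OPTIMIZE TABLE mytable;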