How to optimize this low-performance MySQL query?

I’m currently using the following query for jsPerf. In the likely case you don’t know jsPerf — there are two tables: pages containing the test cases / revisions, and tests containing the code snippets for the tests inside the test cases.
There are currently 937 records in pages and 3817 records in tests.
As you can see, it takes quite a while to load the “Browse jsPerf” page where this query is used.
The query takes about 7 seconds to execute:
SELECT
    id AS pID,
    slug AS url,
    revision,
    title,
    published,
    updated,
    (
        SELECT COUNT(*)
        FROM pages
        WHERE slug = url
          AND visible = "y"
    ) AS revisionCount,
    (
        SELECT COUNT(*)
        FROM tests
        WHERE pageID = pID
    ) AS testCount
FROM pages
WHERE updated IN (
        SELECT MAX(updated)
        FROM pages
        WHERE visible = "y"
        GROUP BY slug
    )
  AND visible = "y"
ORDER BY updated DESC
I’ve added indexes on all fields that appear in WHERE clauses. Should I add more?
How can this query be optimized?
P.S. I know I could implement a caching system in PHP — I probably will, so please don’t tell me :) I’d just really like to find out how this query could be improved, too.

Use a single pass with derived tables instead of the per-row correlated subqueries:
SELECT x.id AS pID,
       x.slug AS url,
       x.revision,
       x.title,
       x.published,
       x.updated,
       y.revisionCount,
       COALESCE(z.testCount, 0) AS testCount
FROM pages x
JOIN (SELECT p.slug,
             MAX(p.updated) AS max_updated,
             COUNT(*) AS revisionCount
      FROM pages p
      WHERE p.visible = 'y'
      GROUP BY p.slug) y ON y.slug = x.slug
                        AND y.max_updated = x.updated
LEFT JOIN (SELECT t.pageid,
                  COUNT(*) AS testCount
           FROM tests t
           GROUP BY t.pageid) z ON z.pageid = x.id
ORDER BY x.updated DESC

You want to learn how to use EXPLAIN. Prefixing a SELECT with EXPLAIN makes MySQL show the query plan instead of executing the statement: which indexes are being used, and what row scans are being performed. The goal is to reduce the number of row scans (i.e., the database searching row by row for values).
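For example, you can run it against any one of the subqueries on its own (the exact plan depends on your schema and indexes):
EXPLAIN
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug;
In the output, type = ALL means a full table scan; what you want to see is the index you expect under key, and a small rows estimate.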

You may want to try the subqueries one at a time to see which one is slowest.
This query:
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
makes MySQL group the rows by slug, which typically implies a sort. That is probably the slow part.
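A composite index tailored to that subquery may help. A sketch, assuming no similar index exists yet (the index name is arbitrary):
CREATE INDEX idx_pages_visible_slug_updated
ON pages (visible, slug, updated);
With visible first, slug second, and updated last, MySQL can satisfy the WHERE, the GROUP BY, and the MAX() from the index alone instead of touching the table.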

Multiple columns in a subquery

I am trying to compare the products selected in the previous week with the products selected this week, to find the churn in the selection decision. Currently I am doing it for a single site, and the result works fine with the correct number of records. Now I want to change the code so that I get the output for 10 sites in a single query.
create temporary view removes as
select distinct
asin,
lagweek,
fc,
'removed' as state,
demand_pp,
instock_pp,
source,
filter_reason,
is_missing_in_pp,
is_missing_in_dc,
is_missing_in_nmi,
asin_nmi,
asin_pre,
asin_dc,
filter_reason_old,
asin_omi,
asin_preo,
asin_dco
from sel_old so where asin not in (select asin from sel_new sn where sn.lagweek = so.lagweek);
Since this is for a single site, just doing asin not in (select asin ...) works fine, but now I want to apply the same logic to ASINs across multiple sites. I tried the approach below, but it returns an incorrect number of records.
create temporary view removes as
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so left join (select asin, fc, lagweek from sel_new) as sn
on (so.asin <> sn.asin)
and (so.fc = sn.fc)
and (so.lagweek = sn.lagweek);
How should I approach this? I haven't been able to find an easier solution, if there is one.
You can use the EXISTS predicate (here, NOT EXISTS). Unlike a join, it doesn't produce additional records; it just tests whether a matching row exists and filters accordingly.
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so
where not exists (
select 1
from sel_new sn
where so.fc = sn.fc
and so.lagweek = sn.lagweek
and so.asin = sn.asin
)
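If NOT EXISTS is ever awkward in your engine, the same filter can be written as a left anti-join. A sketch assuming the same tables and columns (so.* stands in for the full column list above):
select distinct so.*, 'removed' as state
from sel_old so
left join sel_new sn
  on  so.fc = sn.fc
  and so.lagweek = sn.lagweek
  and so.asin = sn.asin
where sn.asin is null;
Unlike NOT IN, both this form and NOT EXISTS behave sanely when sel_new.asin contains NULLs.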

Why does my query not load, or take so long?

I already had a query which showed the wrong details (cg_tracking.distance), which I have now tried to change from cg_tracking.distance to cg_01_ziele.strecke, but it seems not to load.
It was like this before, and showed results very fast:
SELECT DISTINCT cg_tracking.f_nr,
cg_tracking.date_cg,
cg_tracking.manummer,
cg_tracking.distance,
cg_tracking.longitude,
cg_tracking.latitude,
cg_tracking.datetime_cg,
cg_tracking.speed
FROM cg_tracking
WHERE f_nr = '317'
GROUP BY cg_tracking.f_nr,
cg_tracking.date_cg,
cg_tracking.manummer,
cg_tracking.distance,
cg_tracking.longitude,
cg_tracking.latitude,
cg_tracking.datetime_cg,
cg_tracking.speed
ORDER BY cg_tracking.date_cg ASC
Now I've changed it to this, and it takes really long to load and doesn't even give me the right details.
SELECT DISTINCT cg_tracking.f_nr,
cg_tracking.date_cg,
cg_tracking.manummer,
Round(( cg_01_ziele.strecke / 1000 ), 1) AS Strecke,
cg_tracking.longitude,
cg_tracking.latitude,
cg_tracking.datetime_cg,
cg_tracking.speed
FROM cg_tracking, cg_01_Ziele
JOIN cg_zielfahrtstatuslog
ON cg_ZielfahrtstatusLog.ZielID = cg_01_Ziele.ZielID
JOIN cg_02_kunden
ON cg_02_kunden.zielid = cg_01_ziele.zielid
WHERE cg_tracking.F_NR = '317'
AND NOT( cg_zielfahrtstatuslog.status = 7
AND cg_zielfahrtstatuslog.interruption = 0)
AND cg_01_Ziele.DATETIME_CG between '2020-06-02T00:00:00'
AND '2020-06-02T23:59:59'
GROUP BY cg_tracking.f_nr,
cg_tracking.date_cg,
cg_tracking.manummer,
Round(( cg_01_ziele.strecke / 1000 ), 1),
cg_tracking.longitude,
cg_tracking.latitude,
cg_tracking.datetime_cg,
cg_tracking.speed
ORDER BY cg_tracking.date_cg ASC
It always gives me other F_NR and DATETIME_CG values, even though I wrote WHERE F_NR = '317' and the date range I wanted.
I already deleted the AND NOT conditions, and it still takes a lot of time and doesn't give me the right answer.
My assumption is that it's because of the joins across the different tables, but I don't know a solution.
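One likely culprit: in FROM cg_tracking, cg_01_Ziele there is no join condition between cg_tracking and cg_01_Ziele, so every tracking row is paired with every row of the joined target set. That Cartesian product both slows the query down and surfaces rows with other F_NR and DATETIME_CG values. A sketch of an explicit join, where the linking column cg_01_ziele.f_nr is a pure assumption, since the real relationship depends on your schema:
SELECT DISTINCT cg_tracking.f_nr,
       cg_tracking.date_cg,
       cg_tracking.manummer,
       Round((cg_01_ziele.strecke / 1000), 1) AS Strecke,
       cg_tracking.longitude,
       cg_tracking.latitude,
       cg_tracking.datetime_cg,
       cg_tracking.speed
FROM cg_tracking
JOIN cg_01_ziele
  ON cg_01_ziele.f_nr = cg_tracking.f_nr  -- hypothetical link, replace with the real one
JOIN cg_zielfahrtstatuslog
  ON cg_zielfahrtstatuslog.zielid = cg_01_ziele.zielid
JOIN cg_02_kunden
  ON cg_02_kunden.zielid = cg_01_ziele.zielid
WHERE cg_tracking.f_nr = '317'
  AND NOT (cg_zielfahrtstatuslog.status = 7
           AND cg_zielfahrtstatuslog.interruption = 0)
  AND cg_01_ziele.datetime_cg BETWEEN '2020-06-02T00:00:00' AND '2020-06-02T23:59:59'
ORDER BY cg_tracking.date_cg ASC
With DISTINCT already deduplicating the select list, the GROUP BY over every selected column can be dropped.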

SQL query gets data very slowly from different tables

I am writing a SQL query to get data from different tables, but it retrieves the data very slowly, taking over 2 minutes to complete.
What I am doing here is:
1. I am getting date differences, and based on the date difference I am getting account numbers
2. I am comparing tables to get the exact data I need.
Here is my query:
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
please help that how i can use joins or any other way to get data within very less time say 5-10 seconds.
Since it appears you are getting EVERY ACCOUNT, and performance is slow, I would try creating a prequery grouped by account alone, then do a single join to the other tables, something like:
select T.AccountNo,
       T.MxDt,
       datediff(MM, T.MxDt, '2011-6-30') as Diffs,
       P.Name as POName
from (select T1.AccountNo,
             max(T1.DateTxn) MxDt
      from AccountTxn_skd T1
      group by T1.AccountNo) T
join Account_skd A
  on T.AccountNo = A.AccountNo
join POName P
  on A.POCode = P.Code        -- guessing, as you didn't qualify alias.field
 and A.OfficeCode = P.GPOCode -- in your query for these two fields
order by len(T.AccountNo) DESC
You had other conditions based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, does it matter which service code was used? If it does, you would need to group by both the account AND the service code; in that case I would add the service code into the prequery and use it as a join condition to the account table too, as sketched below.
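A sketch of that variant, keeping the same guessed join columns:
select T.AccountNo,
       T.ServiceCode,
       T.MxDt,
       datediff(MM, T.MxDt, '2011-6-30') as Diffs,
       P.Name as POName
from (select T1.AccountNo,
             T1.ServiceCode,
             max(T1.DateTxn) MxDt
      from AccountTxn_skd T1
      group by T1.AccountNo, T1.ServiceCode) T
join Account_skd A
  on T.AccountNo = A.AccountNo
 and T.ServiceCode = A.ServiceCode
join POName P
  on A.POCode = P.Code
 and A.OfficeCode = P.GPOCode
order by len(T.AccountNo) DESC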

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG(media_files.id) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine; the goal was to extract album/artist combinations and, in the result set, have an array of media file ids for each combo.
The problem is that I have another column in media_files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/window functions (first_value() OVER ...)?
Yes, it's possible. First, let's tweak your query by adding aliases and explicit column qualifiers so it's clear what comes from where, assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate, and it isn't really worth implementing one for this job. You could use min(artwork) or max(artwork), and you may find that this actually performs better than the solutions below.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can, at a performance cost, randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
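Since the question asks specifically about first_value() OVER, here is a window-function sketch as well, untested and on the same assumed schema. The window hands every row of an album/artist partition the same artwork, and min() merely collapses those identical values:
SELECT album,
       artist,
       array_agg(id) AS media_file_ids,
       min(picked_artwork) AS artwork
FROM (
    SELECT mf.album, mf.artist, mf.id,
           first_value(mf.artwork) OVER (
               PARTITION BY mf.album, mf.artist
               ORDER BY mf.id
           ) AS picked_artwork
    FROM media_files mf
    JOIN playlist_media_files pmf ON mf.id = pmf.media_file_id
    WHERE pmf.playlist_id = 1
) sub
GROUP BY album, artist
ORDER BY album;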
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For an arbitrary selection that is not truly random, the result will depend on the physical order of rows in the table and on implementation specifics:
WITH x AS (
    SELECT m.album, m.artist, m.id, m.artwork
    FROM playlist_media_files p
    JOIN media_files m ON m.id = p.media_file_id
    WHERE p.playlist_id = 1
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
    SELECT album, artist, array_agg(id) AS media_file_ids
    FROM x
    GROUP BY album, artist
) a
JOIN (
    SELECT DISTINCT ON (1, 2) album, artist, artwork
    FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

Why does adding RAND() cause MySQL to overload?

OK I have this query which gives me DISTINCT product_series, plus all the other fields in the table:
SELECT pi.*
FROM (
    SELECT DISTINCT product_series
    FROM cart_product
) pd
JOIN cart_product pi
  ON pi.product_id = (
    SELECT product_id
    FROM cart_product po
    WHERE product_brand = "everlon"
      AND product_type = "'.$type.'"
      AND product_available = "yes"
      AND product_price_contact = "no"
      AND product_series != ""
      AND po.product_series = pd.product_series
    ORDER BY product_price
    LIMIT 1
)
ORDER BY product_price
This works fine. I am also ordering by price so I can get the starting price for each series. Nice.
However, today my boss told me that all the products that show up from this query are of metal_type white gold, and he wants to show random metal types. So I added RAND() to the ORDER BY after product_price, so that I will still get the lowest price, but a random metal at the lowest price. Here is the new query:
SELECT pi.*
FROM (
    SELECT DISTINCT product_series
    FROM cart_product
) pd
JOIN cart_product pi
  ON pi.product_id = (
    SELECT product_id
    FROM cart_product po
    WHERE product_brand = "everlon"
      AND product_type = "'.$type.'"
      AND product_available = "yes"
      AND product_price_contact = "no"
      AND product_series != ""
      AND po.product_series = pd.product_series
    ORDER BY product_price, RAND()
    LIMIT 1
)
ORDER BY product_price, RAND()
When I run this query, MySQL completely shuts down and tells me that there are too many connections, and I get a phone call from the host admin asking me what the hell I did.
I didn't believe that could be just from adding RAND() to the query, and I thought it had to be a coincidence. I waited a few hours after everything was fixed and ran the query again. Immediately... same issue.
So what is going on? Because I have no clue. Is there something wrong with my query?
Thanks!!!!
Using RAND() for ORDER BY is not a good idea, because it does not scale as the data increases. You can see more information on it, including two alternatives you can adapt, in my answer to this question.
Here's a blog post that explains the issue quite well, and workarounds:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
And here's a similar warning against ORDER BY RAND() for MySQL, I think the cause is basically the same there:
http://www.webtrenches.com/post.cfm/avoid-rand-in-mysql
Depending on the number of products on your site, that function call is going to execute once per record, potentially slowing the query down considerably.
The Too Many Connections error is probably due to this query blocking others while it tries to compute those numbers.
Find another way. ;)
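The workaround those posts describe boils down to picking a random key instead of sorting every row. A minimal sketch, assuming an auto-increment product_id without large gaps (gaps skew the distribution):
SELECT p.*
FROM cart_product p
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(product_id) FROM cart_product)) AS rid) r
  ON p.product_id >= r.rid
ORDER BY p.product_id
LIMIT 1;
Here RAND() is evaluated once, in the one-row derived table, and the primary key does the rest.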
Instead, you can generate the random number in the programming language you're using rather than on the MySQL side, since RAND() is evaluated for each row.
If you know how many records you have, you can select a random record like this (this is Perl):
$handle->Sql("SELECT COUNT(0) AS nor FROM table");
$handle->FetchRow();
$nor = $handle->Data('nor');
$rand = int(rand() * $nor);   # 0-based offset for LIMIT: rand() < 1, so this stays within 0..$nor-1
$handle->Sql("SELECT * FROM table LIMIT $rand,1");
$handle->FetchRow();
...