Two left joins and one query to MySQL performance problem - sql

I'm designing a project for quizzes and quizz results. So I have two tables: quizz_result and quizz. quizz has primary key on ID and quizz_result has foreign key QUIZZ_ID to quizz identity.
Query below is designed to take public quizzes ordered by date with asociated informations: if current user (683735) took this quizz and has a valid result (>0) and how many people filled this quizz till this point in time.
So i did this simple query with two left joins:
select
a.*,
COUNT(countt.QUIZZ_ID) SUMFILL
from
quizz a
left join quizz_result countt
on countt.QUIZZ_ID = a.ID
group by
a.ID
And added indexes on these columns:
Quizz:
ID, (ID, DATE), PUBLIC, (PUBLIC, DATE)
And on quizz_result:
ID, (QUIZZ_ID, USER_ID), QUIZZ_ID, USER_ID, (QUIZZ_ID, QUIZZ_RESULT_ID)
But still when I do query it takes like about one minute. And i have only 34k rows in QUIZZ_RESULTS and 120 rows in QUIZZ table.
When I do EXPLAIN on this query I get this:
SELECT TYPE: simple, possible keys: IDX_PUBLIC,DATE, rows: 34 extra: Using where; Using temporary; Using filesort
SELECT TYPE: simple, possible keys: IDX_QUIZZ_USER,IDX_QUIZZ_RES_RES_QUIZ,IDX_USERID,I..., rows: 1, extra: nothing here
SELECT TYPE: simple, possible keys: IDX_QUIZZ_USER,IDX_QUIZ_RES_RES_QUIZZ,ID_RESULT_ID, rows: 752, extra: Using index
And I don't know what to do to optimise this query. I see this:
Using where; Using temporary; Using filesort
But still I don't know how to get this better, or maybe number of rows in last select is to hight? 752?
How can I optimise this query?
EDIT: I've upadated query to this one with only one left join because it has the same long execution time.
EDIT2: I did remove everything to and thats it: this simple select with one query takes 1s to execute. How to optimise it?

Try taking some of those additional conditions out of your joins.
Moving them to the where clause can sometimes help. Also, consider putting the core joins into their own subquery and then limiting that with a where clause.

What about an index on (USER_ID, QUIZZ_ID, QUIZZ_RESULT_ID), since they're all AND'd together?

I've changed it to this:
select
a.*,
COUNT(a.ID) SUMFILL
from
quizz a
left join quizz_result countt
on countt.QUIZZ_ID = a.ID
group by
a.ID
And it's good now.

Try this:
SELECT q.*,
(
SELECT COUNT(*)
FROM quizz_results qr
WHERE qr.quizz_id = q.id
) AS total_played,
(
SELECT result
FROM qr.quizz_id = q.id
AND user_id = 683735
) AS current_user_won
FROM quizz q

Related

What is index do i need?

I have problems with the performance of this query. If I remove Order by section all work well. But I really want it. I tried to use many indexes but have not any results. Can you help me pls?
SELECT *
FROM "refuel_request" AS "refuel_request"
LEFT OUTER JOIN "user" AS "user" ON "refuel_request"."user_id" = "user"."user_id"
LEFT OUTER JOIN "bill_qr" AS "bill_qr" ON "refuel_request"."bill_qr_id" = "bill_qr"."bill_qr_id"
LEFT OUTER JOIN "car" AS "order.car" ON "refuel_request"."car_id" = "order.car"."car_id"
LEFT OUTER JOIN "refuel_request_status" AS "refuel_request_status" ON "refuel_request"."refuel_request_status_id" = "refuel_request_status"."refuel_request_status_id"
WHERE
refuel_request."refuel_request_status_id" IN ( '1', '2', '3')
ORDER BY "refuel_request".created_at desc
LIMIT 10
There is explain of this query
EXPLAIN (ANALYZE, BUFFERS)
Primary Keys and/or Foreign Keys
pk_refuel_request_id
refuel_request_bill_qr_id_fkey
refuel_request_user_id_fkey
All outer joind tables are 1:n related to refuel_request. This means your query is looking for the last ten created refuel requests with status 1 to 3.
You are outer joining the tables, because not every reful_request is related to a user, a bill_qr, a car, and a status. Or you outer join mistakenly. Anyway, none of the joins changes the number of retrieved rows; it's still one row per refuel request. In order to join the other tables' rows the DBMS just needs their primary key indexes. Nothing to worry about.
The only thing we must care about is finding the top reful_request rows for the statuses you are interested in as quickly as possible.
Use a partial index that only contains data for the statuses in question. The column you index is the created_at column, so as to get the top 10 immediately.
CREATE INDEX idx ON refuel_request (created_at DESC)
WHERE refuel_request_status_id IN (1, 2, 3);
Partial indexes are explained here: https://www.postgresql.org/docs/current/indexes-partial.html
You cannot have an index that supports both the WHERE condition and the ORDER BY, because you are using IN and not =.
The fastest option is to split the query into three parts, so that each part compares refuel_request.refuel_request_status_id with =. Combine these three queries with UNION ALL. Each of the queries has ORDER BY and LIMIT 10, and you wrap the whole thing in an outer query that has another ORDER BY and LIMIT 10.
Then you need these indexes:
CREATE INDEX ON refuel_request (refuel_request_status_id, created_at);
CREATE INDEX ON "user" (user_id);
CREATE INDEX ON bill_qr (bill_qr_id);
CREATE INDEX ON car (car_id);
CREATE INDEX ON refuel_request_status (refuel_request_status_id);
You need at least the indexes for the joins (do you really need LEFT joins?)
LEFT OUTER JOIN "user" AS "user" ON "refuel_request"."user_id" = "user"."user_id"
So, refuel_request.user_id must be in the index
LEFT OUTER JOIN "bill_qr" AS "bill_qr" ON "refuel_request"."bill_qr_id" =
LEFT OUTER JOIN "car" AS "order.car" ON "refuel_request"."car_id" =
bill_qr_id and car_id too
LEFT OUTER JOIN "refuel_request_status" AS "refuel_request_status" ON "refuel_request"."refuel_request_status_id" =
and refuel_request_status_id
WHERE
refuel_request."refuel_request_status_id" IN ( '1', '2', '3')
refuel_request_status_id must be the first key in the index as we need it in the WHERE
ORDER BY "refuel_request".created_at desc
and then created_at since it's in the ORDER clause. This will not improve performances per se, but will allow to run the ORDER BY without requiring access to the table data, the same reason why we put the other non-WHERE columns in there. Of course a partial index is even better, we shift the WHERE in the partiality clause and use created_at for the rest (the LIMIT 10 now means that we can do without the extra columns in the index, since retrieving three 1:N rows costs very little; in a different situation we might find it useful to keep those extra columns).
So one index that contains, in this order:
refuel_request_status_id, created_at, bill_qr_id, car_id too, user_id
^ WHERE ^ ORDER ^ used by the JOINS
However, do you really need a SELECT *? I believe you'd get better performances if you only included the fields you're really going to use.
The most effective index for this query would be on refuel_request (refuel_request_status_id, created_at DESC) so that both the main filtering and the ordering can be done using the index. You also want indexes on the columns you're joining, but those tables are small and inconsequential at the moment. In any case, the index I suggest isn't actually going to help much with the performance pain points you're having right now. Here are some suggestions:
Don't use SELECT * unless you really need all of the columns from all of these tables you're joining. Specifying only the necessary columns means postgres can load less data into memory, and work over it faster.
Postgres is spending a lot of time on the joins, joining about a million rows each time, when you're really only interested in ten of those rows. We can encourage it do the order/limit first by rearranging the query somewhat:
WITH refuel_request_subset AS MATERIALIZED (
SELECT *
FROM refuel_request
WHERE refuel_request_status_id IN ('1', '2', '3')
ORDER BY created_at DESC
LIMIT 10
)
SELECT *
FROM refuel_request_subset AS refuel_request
LEFT OUTER JOIN user ON refuel_request.user_id = user.user_id
LEFT OUTER JOIN bill_qr ON refuel_request.bill_qr_id = bill_qr.bill_qr_id
LEFT OUTER JOIN car AS "order.car" ON refuel_request.car_id = "order.car".car_id
LEFT OUTER JOIN refuel_request_status ON refuel_request.refuel_request_status_id = refuel_request_status.refuel_request_status_id;
Note: This assumes that the LEFT JOINS will not add rows to the result set, as is the case with your current dataset.
This trick only really works if you have a fixed number of IDs, but you can do the refuel_request_subset query separately for each ID and then UNION the results, as opposed to using the IN operator. That would allow postgres to fully use the index mentioned above.

Use ORDER BY UNIX_DATE() for join table in Bigquery

In my queries, I have used the unix_date function to group and count the data from backlogs to specific date. All works very well.
..
SELECT
*,
FROM
table1
FULL OUTER JOIN table2 USING (ID)
I'm not sure what should I add for the joining part to get a right query. I skipped the details of query as the query is quite long to be put on this post. Please let me know if you need the full query.
Problem: I think the join table append the row instead of just adding the column from joined query results because there are many same IDs in all tables (many-many relationship problem).However, not sure how to solve it.
Solved using composite key.
..
SELECT
*,
FROM
table1
FULL OUTER JOIN table2 USING (ID, Date)

Best way to get distinct count from a query joining two tables

I have 2 tables, table A & table B.
Table A (has thousands of rows)
id
uuid
name
type
created_by
org_id
Table B (has a max of hundred rows)
org_id
org_name
I am trying to get the best join query to obtain a count with a WHERE clause. I need the count of distinct created_bys from table A with an org_name in Table B that contains 'myorg'. I currently have the below query (producing expected results) and wonder if this can be optimized further?
select count(distinct a.created_by)
from a left join
b
on a.org_id = b.org_id
where b.org_name like '%myorg%';
You don't need a left join:
select count(distinct a.created_by)
from a join
b
on a.org_id = b.org_id
where b.org_name like '%myorg%'
For this query, you want an index on b.org_id, which I assume that you have.
I would use exists for this:
select count(distinct a.created_by)
from a
where exists (select 1 from b where b.org_id = a.org_id and b.org_name like '%myorg%')
An index on b(org_id) would help. But in terms of performance, key points are:
searching using like with a wildcard on both sides is not good for performance (this cannot take advantage of an index); it would be far better to search for an exact match, or at least to not have a wildcard on the left side of the string.
count(distinct ...) is more expensive than a regular count(); if you don't really need distinct, then don't use it.
Your query looks good already. Use a plain [INNER] JOIN instead or LEFT [OUTER] JOIN, like Gordon suggested. But that won't change much.
You mention that table B has only ...
a max of hundred rows
while table A has ...
thousands of rows
If there are many rows per created_by (which I'd expect), then there is potential for an emulated index skip scan.
(The need to emulate it might go away in one of the coming Postgres versions.)
Essential ingredient is this multicolumn index:
CREATE INDEX ON a (org_id, created_by);
It can replace a simple index on just (org_id) and works for your simple query as well. See:
Is a composite index also good for queries on the first field?
There are two complications for your case:
DISTINCT
0-n org_id resulting from org_name like '%myorg%'
So the optimization is harder to implement. But still possible with some fancy SQL:
SELECT count(DISTINCT created_by) -- does not count NULL (as desired)
FROM b
CROSS JOIN LATERAL (
WITH RECURSIVE t AS (
( -- parentheses required
SELECT created_by
FROM a
WHERE org_id = b.org_id
ORDER BY created_by
LIMIT 1
)
UNION ALL
SELECT (SELECT created_by
FROM a
WHERE org_id = b.org_id
AND created_by > t.created_by
ORDER BY created_by
LIMIT 1)
FROM t
WHERE t.created_by IS NOT NULL -- stop recursion
)
TABLE t
) a
WHERE b.org_name LIKE '%myorg%';
db<>fiddle here (Postgres 12, but works in Postgres 9.6 as well.)
That's a recursive CTE in a LATERAL subquery, using a correlated subquery.
It utilizes the multicolumn index from above to only retrieve a single row for every (org_id, created_by). With an index-only scans if the table is vacuumed enough.
The main objective of the sophisticated SQL is to completely avoid a sequential scan (or even a bitmap index scan) on the big table and only read very few fast index tuples.
Due to the added overhead it can be a bit slower for an unfavorable data distribution (many org_id and/or only few rows per created_by) But it's much faster for favorable conditions and is scales excellently, even for millions of rows. You'll have to test to find the sweet spot.
Related:
Optimize GROUP BY query to retrieve latest row per user
What is the difference between LATERAL and a subquery in PostgreSQL?
Is there a shortcut for SELECT * FROM?

Can any one please suggest which sql query is better approach for large data set in mysql

I have to find the max id row for same group in a table and show the roe details.Using following two approach we can achieve it. But want know which will be good approach for large data. Or any other new approach that will take less time to execute?
Approach 1:
select a.* from tab1 a left join (SELECT max(id) as id,name from tab1
GROUP by name) as tab2 on a.id=tab2.id where a.id=tab2.id
Approach 2:
SELECT id,name from tab1 where id in(SELECT MAX(id) FROM tab1 GROUP by name)
Taken from the manual (13.2.10.11 Rewriting Subqueries as Joins):
A LEFT JOIN can be faster than a subquery because
the server might be able to optimize it better.
So subqueries can be slower than LEFT [OUTER] JOINS, but in my opinion their strength is slightly higher readability. But since there is one LEFT JOIN and a subquery in the first approach, the second approach might be faster at large scale query.
You can also use a window function to avoid a self-join:
SELECT id, name
FROM (
SELECT
id,
name,
RANK() OVER(PARTITION BY name ORDER BY Id DESC) AS IdRankPerGroup
FROM tab1
) src
WHERE IdRankPerGroup = 1
The RANK() function orders each row per "name" group and assigns a ranking based on the "id" value within each group. Then in the outer query, you just get the rows with a ranking = 1.
Try all three queries, check out the EXPLAIN plans, and see which one works best with large amounts of data.

SQL COUNT(col) vs extra logging column... efficiency?

I can't seem to find much information about this.
I have a table to log users comments. I have another table to log likes / dislikes from other users for each comment.
Therefore, when selecting this data to be displayed on a web page, there is a complex query requiring joins and subqueries to count all likes / dislikes.
My example is a query someone kindly helped me with on here to achieve the required results:
SELECT comments.comment_id, comments.descr, comments.created, usrs.usr_name,
(SELECT COUNT(*) FROM comment_likers WHERE comment_id=comments.comment_id AND liker=1)likes,
(SELECT COUNT(*) FROM comment_likers WHERE comment_id=comments.comment_id AND liker=0)dislikes,
comment_likers.liker
FROM comments
INNER JOIN usrs ON ( comments.usr_id = usrs.usr_id )
LEFT JOIN comment_likers ON ( comments.comment_id = comment_likers.comment_id
AND comment_likers.usr_id = $usrID )
WHERE comments.topic_id=$tpcID
ORDER BY comments.created DESC;
However, if I added a likes and dislikes column to the COMMENTS table and created a trigger to automatically increment / decrement these columns as likes get inserted / deleted / updated to the LIKER table then the SELECT statement would be more simple and more efficient than it is now. I am asking, is it more efficient to have this complex query with the COUNTS or to have the extra columns and triggers?
And to generalise, is it more efficient to COUNT or to have an extra column for counting when being queried on a regular basis?
Your query is very inefficient. You can easily eliminate those sub queries, which will dramatically increase performance:
Your two sub queries can be replaced by simply:
sum(liker) likes,
sum(abs(liker - 1)) dislikes,
Making the whole query this:
SELECT comments.comment_id, comments.descr, comments.created, usrs.usr_name,
sum(liker) likes,
sum(abs(liker - 1)) dislikes,
comment_likers.liker
FROM comments
INNER JOIN usrs ON comments.usr_id = usrs.usr_id
LEFT JOIN comment_likers ON comments.comment_id = comment_likers.comment_id
AND comment_likers.usr_id = $usrID
WHERE comments.topic_id=$tpcID
ORDER BY comments.created DESC;