Improve SQL queries by reducing inner joins with IN - sql

I came across some legacy code that used two inner joins. I replaced it with the following:
select p.id
from publisher p
inner join retailer r on p.retailer_id = r.id
where p.id IN (1,4,5 .... around 100 or more random ids I get from code)
order by case r.sector_id when r.sector_id then 1 else 2 end,
p.seo_frontpage_factor asc,
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end,
p.id asc
The previous query looked like this:
select p.id
from publisher p
inner join retailer r on p.retailer_id = r.id
inner join (select 1 as id union all select 2 union all select 3 ......) as x on p.id = x.id
order by case r.sector_id when r.sector_id then 1 else 2 end,
p.seo_frontpage_factor asc,
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end,
p.id asc
My question is: is reducing inner joins like this a good way to improve performance? If not, what can I do to optimise this, or is it good as it is?

The IN clause is more readable than the inner join, so the change is a good idea for readability alone. The optimizer should treat the two queries just the same, if it is any good :-)
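As a quick sanity check that the two forms return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the three IDs are made up for illustration, and it says nothing about performance on your actual database:

```python
import sqlite3

# Hypothetical sample data; only the shape of the two queries matters here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE retailer (id INTEGER PRIMARY KEY);
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, retailer_id INTEGER);
    INSERT INTO retailer VALUES (10), (20);
    INSERT INTO publisher VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 10), (6, 20);
""")

# Legacy form: the ID list is a derived table that is inner-joined.
join_ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM publisher p
    INNER JOIN retailer r ON p.retailer_id = r.id
    INNER JOIN (SELECT 1 AS id UNION ALL SELECT 4 UNION ALL SELECT 5) AS x
            ON p.id = x.id
    ORDER BY p.id
""")]

# Rewritten form: the same ID list as an IN predicate.
in_ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM publisher p
    INNER JOIN retailer r ON p.retailer_id = r.id
    WHERE p.id IN (1, 4, 5)
    ORDER BY p.id
""")]

print(join_ids, in_ids)
```

Both lists contain the same IDs. Note that the equivalence relies on the ID list having no duplicates: a duplicated ID in the derived table would duplicate rows in the join form but not in the IN form.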
The query is obviously made to rank the 100+ publisher IDs by their retailer. It is quite rare to only return the publisher IDs, but well, we don't know what's done with the result of course. Obviously the app only needs the IDs ordered by retailer rank.
UPDATE: I had errors in the following part that are now corrected thanks to joop.
You can increase readability further:
case r.sector_id when r.sector_id then 1 else 2 end
is simply
case when r.sector_id is null then 2 else 1 end
Same for
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end
which is
case when r.poi_sector_id is null then 4 else 3 end
A strange way of writing this. May even be a mistake.
The CASE expressions sort nulls after non-nulls, so we could just as well use
order by r.sector_id is null, p.seo_frontpage_factor, r.poi_sector_id is null, p.id
as PostgreSQL sorts false before true (same as MySQL by the way).
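The NULL behaviour described above can be checked with a small sketch. It uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the retailer rows are invented; the point is only that a simple CASE compares with "=", so a NULL never matches itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE retailer (id INTEGER PRIMARY KEY, sector_id INTEGER);
    -- hypothetical rows: a normal value, a zero, and a NULL
    INSERT INTO retailer VALUES (1, 7), (2, 0), (3, NULL);
""")

case_rows = conn.execute("""
    SELECT sector_id,
           CASE sector_id WHEN sector_id THEN 1 ELSE 2 END AS simple_case,
           CASE WHEN sector_id IS NULL THEN 2 ELSE 1 END   AS searched_case,
           sector_id IS NULL                               AS is_null_flag
    FROM retailer
    ORDER BY id
""").fetchall()

for row in case_rows:
    print(row)
```

The two CASE columns agree on every row, and the IS NULL flag is 0 for non-null values and 1 for NULL, so ordering by it puts non-null rows first.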

Related

Optimization of Top-N and last-N in one query

In PostgreSQL 12.8 I have one table (for simplicity I've omitted several columns):
figure (~3.5 million rows)
id
----
1
2
...
and another table
figure_step (~20 rows per figure, ~70 million rows in total)
id figure_id number status
--------------------------
1 1 1 'FINISHED'
2 1 2 'STARTED'
3 2 1 'FINISHED'
4 2 2 'DELAYED'
5 2 3 'CANCELLED'
...
I have a query that selects, for each figure, the first step by number in 'DELAYED' status and the last step by number in 'FINISHED' status:
SELECT * FROM figure f
LEFT JOIN LATERAL (SELECT * FROM figure_step fs
WHERE fs.status = 'DELAYED' AND f.id = fs.figure_id
ORDER BY number ASC
LIMIT 1) step_one
ON step_one.figure_id = f.id
LEFT JOIN LATERAL (SELECT * FROM figure_step fs
WHERE fs.status = 'FINISHED' AND f.id = fs.figure_id
ORDER BY number DESC
LIMIT 1) step_two
ON step_two.figure_id = f.id
WHERE ...
LIMIT ... OFFSET ...
The WHERE clause selects about 19,000 of the ~3.5 million rows, with a LIMIT of about 150 rows. The LATERAL keyword is used here to avoid joining the big tables figure and figure_step in full.
So, I have two questions:
Is it appropriate to use lateral here? I think so, because without it we need to join figure_step two times using left join.
Here we join figure_step two times with lateral join. Are there any ways to optimize this, for example, reuse part of the subquery?
I may be wrong about this and misunderstand the importance of that number, but one lateral join might be enough if you use conditional aggregation instead of the double ORDER BY and LIMIT.
...
LEFT JOIN LATERAL (
SELECT
MIN(CASE WHEN fs.status = 'DELAYED' THEN fs.id END) MinDelayedId
, MAX(CASE WHEN fs.status = 'FINISHED' THEN fs.id END) MaxFinishedId
FROM figure_step fs
WHERE f.id = fs.figure_id
) step
...
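To illustrate the conditional-aggregation idea, here is a minimal sketch run against SQLite via Python's sqlite3 module. SQLite has no LATERAL, so the sketch swaps in a plain LEFT JOIN with GROUP BY; it also aggregates the number column rather than id, which is an assumption based on the question ordering by number. Figure 3 is an invented row with no steps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE figure (id INTEGER PRIMARY KEY);
    CREATE TABLE figure_step (
        id INTEGER PRIMARY KEY, figure_id INTEGER, number INTEGER, status TEXT
    );
    INSERT INTO figure VALUES (1), (2), (3);
    INSERT INTO figure_step VALUES
        (1, 1, 1, 'FINISHED'),
        (2, 1, 2, 'STARTED'),
        (3, 2, 1, 'FINISHED'),
        (4, 2, 2, 'DELAYED'),
        (5, 2, 3, 'CANCELLED');
""")

# One pass over figure_step per figure: each aggregate only "sees" the rows
# whose status matches, because CASE yields NULL for every other row.
step_rows = conn.execute("""
    SELECT f.id,
           MIN(CASE WHEN fs.status = 'DELAYED'  THEN fs.number END) AS first_delayed,
           MAX(CASE WHEN fs.status = 'FINISHED' THEN fs.number END) AS last_finished
    FROM figure f
    LEFT JOIN figure_step fs ON fs.figure_id = f.id
    GROUP BY f.id
    ORDER BY f.id
""").fetchall()

for row in step_rows:
    print(row)
```

This only returns the step numbers. If the whole step rows are needed, the found numbers would still have to be joined back to figure_step, so whether this beats two lateral subqueries is something to measure rather than assume.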

SQL Query SELECT same data from different tables, show all records, but show / display matches

I have two tables where I would like to compare information (In order to get from the initial table to the other I need to go through a reference table).
I am getting the results I am looking for, except that when a match is found an extra row of data is added (screenshot below). There should be only 4 rows; I don't understand why the value in column 1 row 5 wasn't just added to column 1 row 4.
Any help would be much appreciated.
Code
Select DISTINCT
CASE
WHEN LIC.ORDER_NUM = LN_STLIC.ORDER_NUMBER THEN LIC.ORDER_NUM
ELSE ''
END 'ORDER Number 1',
LN_STLIC.ORDER_NUMBER 'ORDER Number 2'
from LN_TABLE1 LN_STLIC
LEFT OUTER JOIN LN_REF LN_PDE_RTN on LN_STLIC.LNPID = LN_PDE_RTN.LNPID
LEFT OUTER JOIN LN_TABLE2 LIC on LN_PDE_RTN.ID = LIC.ID
where LIC.ID = '123456'
Example Table Data
LN_TABLE1
LN_REF
LN_TABLE2
Results
You've defined Order Number 1 as
CASE
WHEN LIC.ORDER_NUM = LN_STLIC.ORDER_NUMBER THEN LIC.ORDER_NUM
ELSE ''
END
So, you can reasonably infer that when Order Number 1 is blank, it's because LIC.ORDER_NUM doesn't match LN_STLIC.ORDER_NUMBER.
You asked for a DISTINCT on the combination of Order Number 1 and Order Number 2. So every combination of the two of them that appears in the data, will appear just once.
Because LN_TABLE1 has four different order numbers for the same value of LNPID, you're going to generate 3 records with a blank Order Number 1 and Order Number 2 set to 210414. But the distinct will take it down to just one, plus the one where they match (records 4 and 5 in your example).
You'd probably have to join on LIC.ORDER_NUM = LN_STLIC.ORDER_NUMBER in order to match the order numbers to each other, and get only 1 record for order number 210414.
I could give you a better query if I knew a bit more about what you were trying to achieve.
Your question is confusing:
We can do a simple LEFT JOIN with LN_STLIC and LIC :
Select CASE WHEN LN_STLIC.order_number = LIC.order_num THEN LIC.ORDER_NUM END ORD_Number1,
LN_STLIC.order_number ORD_Number2
from LN_TABLE1 LN_STLIC
LEFT OUTER JOIN LN_TABLE2 LIC ON LN_STLIC.order_number = LIC.order_num;
The query below also gets you the same result in case you want to use LIC.ID = '1234', and there is no need to use DISTINCT.
Select
CASE WHEN LIC.ORDER_NUM = LN_STLIC.ORDER_NUMBER THEN LIC.ORDER_NUM
ELSE ''
END ORD_NBR_1,
LN_STLIC.ORDER_NUMBER ORD_NBR_2
from #LN_TABLE1 LN_STLIC
LEFT OUTER JOIN #LN_REF LN_PDE_RTN on LN_STLIC.LNPID = LN_PDE_RTN.LNPID
LEFT OUTER JOIN #LN_TABLE2 LIC on LN_PDE_RTN.ID = LIC.ID AND
LN_STLIC.ORDER_NUMBER =LIC.ORDER_NUM
;
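The effect of moving the order-number comparison into the join can be reproduced with a small sketch in SQLite (via Python's sqlite3 module). All rows below are invented, since the original table screenshots are not available; they are chosen so that one LN_TABLE2 id carries two order numbers, which is one way the extra row can arise:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical data: order numbers stored as text
    CREATE TABLE LN_TABLE1 (LNPID INTEGER, ORDER_NUMBER TEXT);
    CREATE TABLE LN_REF    (LNPID INTEGER, ID TEXT);
    CREATE TABLE LN_TABLE2 (ID TEXT, ORDER_NUM TEXT);
    INSERT INTO LN_TABLE1 VALUES (1, '210411'), (1, '210412'), (1, '210413'), (1, '210414');
    INSERT INTO LN_REF    VALUES (1, '123456');
    INSERT INTO LN_TABLE2 VALUES ('123456', '210414'), ('123456', '210499');
""")

# Join on ID only: order 210414 pairs with both LN_TABLE2 rows, giving one
# matching and one blank combination, and DISTINCT keeps both of them.
id_only_rows = conn.execute("""
    SELECT DISTINCT
           CASE WHEN LIC.ORDER_NUM = T1.ORDER_NUMBER THEN LIC.ORDER_NUM ELSE '' END,
           T1.ORDER_NUMBER
    FROM LN_TABLE1 T1
    LEFT JOIN LN_REF R      ON T1.LNPID = R.LNPID
    LEFT JOIN LN_TABLE2 LIC ON R.ID = LIC.ID
    WHERE LIC.ID = '123456'
    ORDER BY 2, 1
""").fetchall()

# Join on ID and order number: each LN_TABLE1 row pairs with at most one row.
id_and_order_rows = conn.execute("""
    SELECT CASE WHEN LIC.ORDER_NUM = T1.ORDER_NUMBER THEN LIC.ORDER_NUM ELSE '' END,
           T1.ORDER_NUMBER
    FROM LN_TABLE1 T1
    LEFT JOIN LN_REF R      ON T1.LNPID = R.LNPID
    LEFT JOIN LN_TABLE2 LIC ON R.ID = LIC.ID
                           AND T1.ORDER_NUMBER = LIC.ORDER_NUM
    ORDER BY 2, 1
""").fetchall()

print(len(id_only_rows), len(id_and_order_rows))
```

With this sample data the first query returns five rows (a blank and a matching row for 210414) and the second returns four.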

Multiple choice answer T-SQL query

The query should determine whether the answers a user selected for a multiple-choice question are right: 1 if correct, else 0.
I have two tables, question_answer and user_exam_answer; the entries table has the user-submitted answer in the column submitted_option_id.
user_exam_answer table
question_answer
I tried to write a query that checks whether the user's answers (user_exam_answer) match the question's answers (question_answer):
select
count(1) as result
from
(select
qa.question_id,
count(qa.correct_option_id) as col1,
count(sa.submited_option_id) as col2
from
question_answer qa
left join
user_exam_answers sa on (sa.question_id = qa.question_id
and sa.submited_option_id = qa.correct_option_id
and sa.exam_id = 'html_001'
and sa.user_id = 'user_123')
group by
qa.question_id
having
count(qa.correct_option_id) = count(sa.submited_option_id)
) as t
But the problem is when:
QA (question_answer.correct_option_id) has 3 entries and SA (user_exam_answers.submited_option_id) has 2 entries then the query is correct and returns
QA (question_answer.correct_option_id) has 2 entries and SA (user_exam_answers.submited_option_id) has 3 entries then the query is correct and returns
QA (question_answer.correct_option_id) has 2 entries and SA (user_exam_answers.submited_option_id) has 1 entry then the query is correct and returns
but when
QA (question_answer.correct_option_id) has 1 entry and SA (user_exam_answers.submited_option_id) has 2 entries then the query returns the wrong answer
I am looking for a query which holds true for all four conditions.
For each question list the expected answers and the submitted answers (you need a FULL OUTER JOIN to do this, a LEFT join is not enough) and count the number of matches. Then compare this count with the count of the expected answers.
select question_id, case when cnt = sum_test then 1 else 0 end as mark
from (
select question_id, count(*) cnt, sum(test) sum_test
from (
select coalesce(q.question_id, s.question_id) as question_id,
correct_option_id,
submitted_option_id,
case when correct_option_id = submitted_option_id then 1 else 0 end as test
from question_answer q full outer join user_exam_answer s
on q.question_id = s.question_id and q.correct_option_id = s.submitted_option_id
) x
group by question_id
) y
You can find a live demo here
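The counting idea can be tried out with a minimal sketch in SQLite (via Python's sqlite3 module). Older SQLite versions lack FULL OUTER JOIN, so the sketch emulates it with a LEFT JOIN plus a UNION ALL of the unmatched submitted rows; that emulation and all sample rows are assumptions made for illustration, and the user and exam filters are left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question_answer  (question_id INTEGER, correct_option_id TEXT);
    CREATE TABLE user_exam_answer (question_id INTEGER, submitted_option_id TEXT);
    -- hypothetical data
    -- Q1: exact match, Q2: one extra option submitted,
    -- Q3: one correct option missing, Q4: nothing submitted
    INSERT INTO question_answer  VALUES (1,'a'), (1,'b'), (2,'c'), (3,'d'), (3,'e'), (4,'f');
    INSERT INTO user_exam_answer VALUES (1,'a'), (1,'b'), (2,'c'), (2,'x'), (3,'d');
""")

marks = conn.execute("""
    SELECT question_id,
           CASE WHEN COUNT(*) = SUM(test) THEN 1 ELSE 0 END AS mark
    FROM (
        -- every expected option, flagged 1 when it was also submitted
        SELECT q.question_id AS question_id,
               CASE WHEN s.submitted_option_id IS NOT NULL THEN 1 ELSE 0 END AS test
        FROM question_answer q
        LEFT JOIN user_exam_answer s
               ON s.question_id = q.question_id
              AND s.submitted_option_id = q.correct_option_id
        UNION ALL
        -- every submitted option that matches no expected option
        SELECT s.question_id, 0
        FROM user_exam_answer s
        LEFT JOIN question_answer q
               ON q.question_id = s.question_id
              AND q.correct_option_id = s.submitted_option_id
        WHERE q.question_id IS NULL
    ) x
    GROUP BY question_id
    ORDER BY question_id
""").fetchall()

print(marks)
```

A question is marked 1 only when every row in the combined set is a match, which covers both the "too few" and the "too many" submitted cases.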
It's very unclear what you're trying to do here, but the below should help get you started:
select
sa.user_id,
sa.exam_id,
qa.question_id,
sa.submitted_option_id,
qa.correct_option_id,
case when sa.submitted_option_id = qa.correct_option_id then 1 else 0 end as question_score
from
question_answer qa
LEFT JOIN user_exam_answer sa ON
sa.question_id = qa.question_id
where
sa.exam_id='html_001'
and sa.user_id='user_123'
I'd expect the qa table to also have an exam_id column, but that isn't in your images.
Please try the following...
SELECT user_id,
exam_id,
question_answer.question_id AS question_id,
submited_option_id,
correct_option_id,
CASE
WHEN correct_option_id = submited_option_id THEN
1
ELSE
0
END AS marked_option_id
FROM question_answer
LEFT JOIN user_exam_answer ON question_answer.question_id = user_exam_answer.question_id
ORDER BY user_id,
exam_id,
question_id;
This query joins the two tables together using a LEFT JOIN so that a record for an unanswered question is still returned. It then compares the correct answer to the supplied answer for each record and returns 1 if they match, otherwise 0. The result of this comparison is included in the output as the field marked_option_id. The output is then sorted for convenience of reading.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Tricky (MS)SQL query with aggregated functions

I have these three tables:
table_things: [id]
table_location: [id]
[location]
[quantity]
table_reservation: [id]
[quantity]
[location]
[list_id]
Example data:
table_things:
id
1
2
3
table_location
id location quantity
1 100 10
1 101 4
2 100 1
table_reservation
id quantity location list_id
1 2 100 500
1 1 100 0
2 1 100 0
They are connected by [id] being the same in all three tables and [location] being the same in table_location and table_reservation.
[quantity] in table_location shows how many ([quantity]) things ([id]) are in a certain place ([location]).
[quantity] in table_reservation shows how many ([quantity]) things ([id]) are reserved in a certain place ([location]).
There can be 0 or many rows in table_reservation that correspond to table_location.id = table_reservation.id, so I probably need to use an outer join for that.
I want to create a query that answers the question: How many things ([id]) are in this specific place (WHERE table_location=123), how many of those things are reserved (table_reservation.[quantity]), and how many of those that are reserved are on a table_reservation.list_id where table_reservation.list_id > 0?
I can't get the aggregate functions right to where the answer contains only the number of lines that are in table_location with the given WHERE clause and at the same time I get the correct number of table_reservation.quantity.
If I do this I get the correct number of lines in the answer:
SELECT table_things.[id],
table_location.[quantity],
SUM(table_reservation.[quantity])
FROM table_location
INNER JOIN table_things ON table_location.[id] = table_things.[id]
RIGHT OUTER JOIN table_reservation ON table_things.location = table_reservation.location
WHERE table_location.location = 100
GROUP BY table_things.[id], table_location.[quantity]
But the problem with that query is that I (of course) get an incorrect value for SUM(table_reservation.[quantity]) since it sums up all the corresponding rows in table_reservation and posts the same value on each of the rows in the result.
The second part is trying to get the correct value for the number of table_reservation.[quantity] whose list_id > 0. I tried something like this for that, in the SELECT list:
(SELECT SUM(CASE WHEN table_reservation.list_id > 0 THEN table_reservation.[quantity] ELSE 0 END)) AS test
But that doesn't even parse... I'm just showing it to show my thinking.
Probably an easy SQL problem, but it's been too long since I was doing these kinds of complicated queries.
For your first two questions:
How many things ([id]) are in this specific place (WHERE table_location=123), how many of those things are reserved (table_reservation.[quantity])
I think you simply need a LEFT OUTER JOIN instead of RIGHT, and an additional join predicate for table_reservation
SELECT l.id,
l.quantity,
Reserved = SUM(ISNULL(r.quantity, 0))
FROM table_location AS l
INNER JOIN table_things AS t
ON t.id = l.ID
LEFT JOIN table_reservation r
ON r.id = t.id
AND r.location = l.location
WHERE l.location = 100
GROUP BY l.id, l.quantity;
N.B I have added ISNULL so that when nothing is reserved you get a result of 0 rather than NULL. You also don't actually need to reference table_things at all, but I am guessing this is a simplified example and you may need other fields from there so have left it in. I have also used aliases to make the query (in my opinion) easier to read.
For your 3rd question:
and how many of those that are reserved are on a table_reservation.list_id where table_reservation.list_id > 0.
Then you can use a conditional aggregate (CASE expression inside your SUM):
SELECT l.id,
l.quantity,
Reserved = SUM(r.quantity),
ReservedWithListOver0 = SUM(CASE WHEN r.list_id > 0 THEN r.[quantity] ELSE 0 END)
FROM table_location AS l
INNER JOIN table_things AS t
ON t.id = l.ID
LEFT JOIN table_reservation r
ON r.id = t.id
AND r.location = l.location
WHERE l.location = 100
GROUP BY l.id, l.quantity;
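Here is a minimal sketch of the conditional aggregate, run in SQLite through Python's sqlite3 module. T-SQL's ISNULL is swapped for the standard COALESCE, and thing 3 is an invented row with no reservations, added to show the difference between a bare SUM and one wrapped in a NULL guard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_things (id INTEGER);
    CREATE TABLE table_location (id INTEGER, location INTEGER, quantity INTEGER);
    CREATE TABLE table_reservation (id INTEGER, quantity INTEGER, location INTEGER, list_id INTEGER);
    INSERT INTO table_things VALUES (1), (2), (3);
    -- the row (3, 100, 7) is hypothetical; the rest is the question's sample data
    INSERT INTO table_location VALUES (1, 100, 10), (1, 101, 4), (2, 100, 1), (3, 100, 7);
    INSERT INTO table_reservation VALUES (1, 2, 100, 500), (1, 1, 100, 0), (2, 1, 100, 0);
""")

reserved_rows = conn.execute("""
    SELECT l.id,
           l.quantity,
           SUM(r.quantity)              AS reserved_raw,  -- NULL when nothing is reserved
           COALESCE(SUM(r.quantity), 0) AS reserved,      -- 0 when nothing is reserved
           SUM(CASE WHEN r.list_id > 0 THEN r.quantity ELSE 0 END) AS reserved_list_over_0
    FROM table_location AS l
    INNER JOIN table_things AS t ON t.id = l.id
    LEFT JOIN table_reservation AS r
           ON r.id = t.id
          AND r.location = l.location
    WHERE l.location = 100
    GROUP BY l.id, l.quantity
    ORDER BY l.id
""").fetchall()

for row in reserved_rows:
    print(row)
```

For thing 3 the bare SUM is NULL while the guarded SUM and the CASE-based SUM are both 0, because the CASE falls through to ELSE 0 on the all-NULL row produced by the LEFT JOIN.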
As a couple of side notes, unless you are doing it for the right reasons (so that different tables are queried depending on who is executing the query), it is a good idea to always use the schema prefix, i.e. dbo.table_reservation rather than just table_reservation. It is also quite antiquated to prefix your object names with the object type (i.e. dbo.table_things rather than just dbo.things). It is somewhat subjective, but this page gives a good example of why it might not be the best idea.
You can use a query like the following:
SELECT tt.[id],
tl.[quantity],
tr.[total_quantity],
tr.[partial_quantity]
FROM table_location AS tl
INNER JOIN table_things AS tt ON tl.[id] = tt.[id]
LEFT JOIN (
SELECT id, location,
SUM(quantity) AS total_quantity,
SUM(CASE WHEN list_id > 0 THEN quantity ELSE 0 END) AS partial_quantity
FROM table_reservation
GROUP BY id, location
) AS tr ON tl.id = tr.id AND tl.location = tr.location
WHERE tl.location = 100
The trick here is to do a LEFT JOIN to an already aggregated version of table_reservation, so that you get one row per id and location. The derived table uses conditional aggregation to calculate the field partial_quantity, which contains the quantity where list_id > 0.
Output:
id quantity total_quantity partial_quantity
-----------------------------------------------
1 10 3 2
2 1 1 0
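The output above can be reproduced from the question's sample data with a small sketch in SQLite (via Python's sqlite3 module); the bracket quoting is dropped, but the query is otherwise the same pre-aggregated join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_things (id INTEGER);
    CREATE TABLE table_location (id INTEGER, location INTEGER, quantity INTEGER);
    CREATE TABLE table_reservation (id INTEGER, quantity INTEGER, location INTEGER, list_id INTEGER);
    INSERT INTO table_things VALUES (1), (2), (3);
    INSERT INTO table_location VALUES (1, 100, 10), (1, 101, 4), (2, 100, 1);
    INSERT INTO table_reservation VALUES (1, 2, 100, 500), (1, 1, 100, 0), (2, 1, 100, 0);
""")

# Reservations are collapsed to one row per (id, location) before the join,
# so the join cannot multiply location rows.
derived_rows = conn.execute("""
    SELECT tt.id, tl.quantity, tr.total_quantity, tr.partial_quantity
    FROM table_location AS tl
    INNER JOIN table_things AS tt ON tl.id = tt.id
    LEFT JOIN (
        SELECT id, location,
               SUM(quantity) AS total_quantity,
               SUM(CASE WHEN list_id > 0 THEN quantity ELSE 0 END) AS partial_quantity
        FROM table_reservation
        GROUP BY id, location
    ) AS tr ON tl.id = tr.id AND tl.location = tr.location
    WHERE tl.location = 100
    ORDER BY tt.id
""").fetchall()

for row in derived_rows:
    print(row)
```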
This was a classic case of sitting with a problem for a few hours and getting nowhere and then when you post to stackoverflow, you suddenly come up with the answer. Here's the query that gets me what I want:
SELECT table_things.[id],
table_location.[quantity],
SUM(table_reservation.[quantity]),
(SELECT SUM(CASE WHEN table_reservation.list_id > 0 THEN ISNULL(table_reservation.[quantity], 0) ELSE 0 END)) AS test
FROM table_location
INNER JOIN table_things ON table_location.[id] = table_things.[id]
RIGHT OUTER JOIN table_reservation ON table_things.location = table_reservation.location AND table_things.[id] = table_reservation.[id]
WHERE table_location.location = 100
GROUP BY table_things.[id], table_location.[quantity]
Edit: After having read GarethD's reply below, I did the changes he suggested (to my real code, not to the query above) which makes the (real) query correct.

Join Table on one record but calculate field based on other rows in the join

I am trying to write a query to identify my subscribers who have abandoned a shopping cart in the last day, but I also need a calculated field that represents whether or not they have received an incentive in the last 7 days.
I have the following tables
AbandonCart_Subscribers
Sendlog
The first part of the query is easy, get abandoners in the last day
select a.* from AbandonCart_Subscribers a
where DATEDIFF(day,a.DateAbandoned,GETDATE()) <= 1
Here is my attempt to calculate the incentive but I am fairly certain it is not correct as IncentiveRecieved is always 0 even when I know it should not be...
select a.*,
CASE
WHEN DATEDIFF(D,s.SENDDATE,GETDATE()) >= 7
THEN 1
ELSE 0
END As IncentiveRecieved
from AbandonCart_Subscribers a
left join SendLog s on a.EmailAddress = s.EmailAddress and s.CampaignID IS NULL
where
DATEDIFF(day,a.DateAbandoned,GETDATE()) <= 1
Here is a SQL fiddle with the objects and some data. I would really appreciate some help.
Thanks
http://sqlfiddle.com/#!3/f481f/1
Kishore is right in saying the main problem is that it should be <=7, not >=7. However, there is another problem.
As it stands, you could get multiple results. You don't want to do a left join on SendLog in case the same email address is in there more than once. Instead you should be getting a unique result from that table. There's a couple of ways of doing that; here is one such way which uses a derived table. The table I have called s will give you a unique list of emails that have been sent an incentive in the last week.
select a.*,
CASE
WHEN s.EmailAddress is not null
THEN 1
ELSE 0
END As IncentiveRecieved
from AbandonCart_Subscribers a
left join (select distinct EmailAddress
from SendLog s
where s.CampaignID IS NULL
and DATEDIFF(D,s.SENDDATE,GETDATE()) <= 7
) s on a.EmailAddress = s.EmailAddress
where DATEDIFF(day,a.DateAbandoned,GETDATE()) <= 1
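To see the duplicate-row problem and the derived-table fix side by side, here is a minimal sketch in SQLite via Python's sqlite3 module. SQLite has no DATEDIFF or GETDATE, so the sketch substitutes julianday() differences against a fixed reference date passed in as a parameter; that substitution, the column types and every row are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AbandonCart_Subscribers (EmailAddress TEXT, DateAbandoned TEXT);
    CREATE TABLE SendLog (EmailAddress TEXT, SendDate TEXT, CampaignID INTEGER);
    -- hypothetical data
    INSERT INTO AbandonCart_Subscribers VALUES
        ('a@example.com', '2024-01-10'),
        ('b@example.com', '2024-01-09'),
        ('c@example.com', '2024-01-01');   -- abandoned too long ago
    INSERT INTO SendLog VALUES
        ('a@example.com', '2024-01-08', NULL),
        ('a@example.com', '2024-01-05', NULL),   -- second send in the same week
        ('b@example.com', '2023-12-20', NULL);   -- older than 7 days
""")
today = {"today": "2024-01-10"}  # stands in for GETDATE()

# Joining SendLog directly repeats a subscriber once per matching log row.
naive_emails = [row[0] for row in conn.execute("""
    SELECT a.EmailAddress
    FROM AbandonCart_Subscribers a
    LEFT JOIN SendLog s ON a.EmailAddress = s.EmailAddress AND s.CampaignID IS NULL
    WHERE julianday(:today) - julianday(a.DateAbandoned) <= 1
    ORDER BY a.EmailAddress
""", today)]

# Joining a DISTINCT list of recently incentivised addresses keeps one row each.
flagged = conn.execute("""
    SELECT a.EmailAddress,
           CASE WHEN s.EmailAddress IS NOT NULL THEN 1 ELSE 0 END AS IncentiveRecieved
    FROM AbandonCart_Subscribers a
    LEFT JOIN (SELECT DISTINCT EmailAddress
               FROM SendLog
               WHERE CampaignID IS NULL
                 AND julianday(:today) - julianday(SendDate) <= 7
              ) s ON a.EmailAddress = s.EmailAddress
    WHERE julianday(:today) - julianday(a.DateAbandoned) <= 1
    ORDER BY a.EmailAddress
""", today).fetchall()

print(naive_emails)
print(flagged)
```

The direct join lists a@example.com twice, while the derived-table version returns each recent abandoner once with the flag set only where a send falls inside the 7-day window.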
You can set a variable using a condition:
select a.*,(DATEDIFF(D,s.SENDDATE,GETDATE()) >= 7) as `IncentiveRecieved `
from AbandonCart_Subscribers a
left join SendLog s on a.EmailAddress = s.EmailAddress and s.CampaignID IS NULL
where
DATEDIFF(day,a.DateAbandoned,GETDATE()) <= 1
Is this what you're looking for?
Should it not be less than 7 instead of greater than 7?
select a.*,
CASE
WHEN DATEDIFF(D,s.SENDDATE,GETDATE()) <= 7 AND CampaignID is not null
THEN 1
ELSE 0
END As IncentiveRecieved
from AbandonCart_Subscribers a
left join SendLog s on a.EmailAddress = s.EmailAddress --and s.CampaignID IS NULL
where
DATEDIFF(day,a.DateAbandoned,GETDATE()) <= 1
Hope this satisfies your need.