Tricky (MS)SQL query with aggregated functions - sql

I have these three tables:
table_things: [id]
table_location: [id]
[location]
[quantity]
table_reservation: [id]
[quantity]
[location]
[list_id]
Example data:
table_things:
id
1
2
3
table_location
id location quantity
1 100 10
1 101 4
2 100 1
table_reservation
id quantity location list_id
1 2 100 500
1 1 100 0
2 1 100 0
They are connected by [id] being the same in all three tables and [location] being the same in table_loation and table_reservation.
[quantity] in table_location shows how many ([quantity]) things ([id]) are in a certain place ([location]).
[quantity] in table_reservation shows how many ([quantity]) things ([id]) are reserved in a certain place ([location]).
There can be 0 or many rows in table_reservation that correspond to table_location.id = table_reservation_id, so I probably need to use an outer join for that.
I want to create a query that answers the question: How many things ([id]) are in this specific place (WHERE table_location=123), how many of of those things are reserved (table_reservation.[quantity]) and how many of those that are reserved are on a table_reservation.list_id where table_reservation.list_id > 0.
I can't get the aggregate functions right to where the answer contains only the number of lines that are in table_location with the given WHERE clause and at the same time I get the correct number of table_reservation.quantity.
If I do this I get the correct number of lines in the answer:
SELECT table_things.[id],
table_location.[quantity],
SUM(table_reservation.[quantity]
FROM table_location
INNER JOIN table_things ON table_location.[id] = table_things.[id]
RIGHT OUTER JOIN table_reservation ON table_things.location = table_reservation.location
WHERE table_location.location = 100
GROUP BY table_things.[id], table_location[quantity]
But the problem with that query is that I (of course) get an incorrect value for SUM(table_reservation.[quantity]) since it sums up all the corresponding rows in table_reservation and posts the same value on each of the rows in the result.
The second part is trying to get the correct value for the number of table_reservation.[quantity] whose list_id > 0. I tried something like this for that, in the SELECT list:
(SELECT SUM(CASE WHEN table_reservation.list_id > 0 THEN table_reservation.[quantity] ELSE 0 END)) AS test
But that doesn't even parse... I'm just showing it to show my thinking.
Probably an easy SQL problem, but it's been too long since I was doing these kinds of complicated queries.

For your first two questions:
How many things ([id]) are in this specific place (WHERE table_location=123), how many of of those things are reserved (table_reservation.[quantity])
I think you simply need a LEFT OUTER JOIN instead of RIGHT, and an additional join predicate for table_reservation
SELECT l.id,
l.quantity,
Reserved = SUM(ISNULL(r.quantity, 0))
FROM table_location AS l
INNER JOIN table_things AS t
ON t.id = l.ID
LEFT JOIN table_reservation r
ON r.id = t.id
AND r.location = l.location
WHERE l.location = 100
GROUP BY l.id, l.quantity;
N.B I have added ISNULL so that when nothing is reserved you get a result of 0 rather than NULL. You also don't actually need to reference table_things at all, but I am guessing this is a simplified example and you may need other fields from there so have left it in. I have also used aliases to make the query (in my opinion) easier to read.
For your 3rd question:
and how many of those that are reserved are on a table_reservation.list_id where table_reservation.list_id > 0.
Then you can use a conditional aggregate (CASE expression inside your SUM):
SELECT l.id,
l.quantity,
Reserved = SUM(r.quantity),
ReservedWithListOver0 = SUM(CASE WHEN r.list_id > 0 THEN r.[quantity] ELSE 0 END)
FROM table_location AS l
INNER JOIN table_things AS t
ON t.id = l.ID
LEFT JOIN table_reservation r
ON r.id = t.id
AND r.location = l.location
WHERE l.location = 100
GROUP BY l.id, l.quantity;
As a couple of side notes, unless you are doing it for the right reasons (so that different tables are queried depending on who is executing the query), then it is a good idea to always use the schema prefix, i.e. dbo.table_reservation rather than just table_reservation. It is also quite antiquated to prefix your object names with the object type (i.e. dbo.table_things rather than just dbo.things). It is somewhat subject, but this page gives a good example of why it might not be the best idea.

You can use a query like the following:
SELECT tt.[id],
tl.[quantity],
tr.[total_quantity],
tr.[partial_quantity]
FROM table_location AS tl
INNER JOIN table_things AS tt ON tl.[id] = tt.[id]
LEFT JOIN (
SELECT id, location,
SUM(quantity) AS total_quantity,
SUM(CASE WHEN list_id > 0 THEN quantity ELSE 0 END) AS partial_quantity
FROM table_reservation
GROUP BY id, location
) AS tr ON tl.id = tr.id AND tl.location = tr.location
WHERE tl.location = 100
The trick here is to do a LEFT JOIN to an already aggregated version of table table_reservation, so that you get one row per id, location. The derived table uses conditional aggregation to calculate field partial_quantity that contains the quantity where list_id > 0.
Output:
id quantity total_quantity partial_quantity
-----------------------------------------------
1 10 3 2
2 1 1 0

This was a classic case of sitting with a problem for a few hours and getting nowhere and then when you post to stackoverflow, you suddenly come up with the answer. Here's the query that gets me what I want:
SELECT table_things.[id],
table_location.[quantity],
SUM(table_reservation.[quantity],
(SELECT SUM(CASE WHEN table_reservation.list_id > 0 THEN ISNULL(table_reservation.[quantity], 0) ELSE 0 END)) AS test
FROM table_location
INNER JOIN table_things ON table_location.[id] = table_things.[id]
RIGHT OUTER JOIN table_reservation ON table_things.location = table_reservation.location AND table_things.[id] = table_reservation.[id]
WHERE table_location.location = 100
GROUP BY table_things.[id], table_location[quantity]
Edit: After having read GarethD's reply below, I did the changes he suggested (to my real code, not to the query above) which makes the (real) query correct.

Related

How can I get a distinct count of a named inner join?

I've built a query for a summary table of information, and it's almost there, with one small bug. The confirmed_class_count variable comes back too high if there's multiple users on a class, leading me to believe that the number isn't distinct
Here's my current code:
SELECT "staffs".*,
count(distinct subclasses) as class_count,
sum(case when users.confirmed_at is not null then 1 else 0 end) confirmed_class_count
FROM
staffs
INNER JOIN classes as subclasses on staffs.staff_id = ANY(subclasses.staff)
INNER JOIN "classes_users" ON "classes_users"."class_id" = "subclasses"."id"
INNER JOIN "users" ON "users"."id" = "classes_users"."user_id"
INNER JOIN class_types ON class_types.code = subclasses.class_type_code
WHERE
(subclasses.closed_date is NULL OR subclasses.closed_date > '2019-09-06')
GROUP BY
staffs.id ORDER BY "staffs"."full_name" ASC
I want to replace the sum with something like (select distinct count(*) from subcases where users.confirmed_at is not null) as confirmed_case_count but I get relation "subclasses" does not exist.
How do I get what I'm intending here?
You can use count distinct with conditional aggregation. Replace
sum(class when users.confirmed_at is not null then 1 else 0 end) confirmed_class_count
^ looks like a typo, this should be case not class
with
count(distinct case when users.confirmed_at is not null then classes_users.class_id end) confirmed_class_count

Best way to compare two sets of data w/ SQL

What I have is a query that grabs a set of data. This query is ran at a certain time. Then, 30 minutes later, I have another query (same syntax) that runs and grabs that same set of data. Finally, I have a third query (which is the query in question) that compares both sets of data. The records it pulls out are ones that agree with: if "FEDVIP_Active" was FALSE in the first data set and TRUE in the second data set, OR "UniqueID" didn't exist in the first data set and does in the second data set AND FEDVIP_Active is TRUE. I'm questioning the performance of the query below that does the comparison. It times out after 30 minutes. Is there anything you can see that I shouldn't be doing in order to be the most efficient to run? The two identical-ish data sets I'm comparing have around a million records each.
First query that grabs the initial set of data:
select Unique_ID, First_Name, FEDVIP_Active, Email_Primary
from Master_Subscribers_Prospects
Second query is exactly the same as the first.
Then, the third query below compares the data:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b
on 1 = 1
where a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0 or
(b.Unique_ID not in (select Unique_ID from Master_Subscribers_Prospects_1) and b.FEDVIP_Active = 1)
If I understand correctly, you want all records from the second data set where the corresponding unique id in the first data set is not active (either by not existing or by having the flag set to not active).
I would suggest exists:
select a.*
from Master_Subscribers_Prospects_1 a
where a.FEDVIP_Active = 1 and
not exists (select 1
from Master_Subscribers_Prospects_2 b
where b.Unique_ID = a.Unique_ID and
b.FEDVIP_Active = 1
);
For performance, you want an index on Master_Subscribers_Prospects_2(Unique_ID, FEDVIP_Active).
An inner join on 1 = 1 is a disguised cross join and the number of rows a cross join produces can grow rapidly. It's the product of the number of rows in both relations involved. For performance you want to keep intermediate results as small as possible.
Then instead of IN EXISTS is often performing better, when the number of rows of the subquery is large.
But I think you don't need IN or EXITS at all.
Assuming unique_id identifies a record and is not null, you could left join the first table to the second one on common unique_ids. Then if and only if no record for an unique_id in the second table exits the unique_id of the first table in the result of the join is null, so you can check for that.
SELECT b.fedvip_active,
b.unique_id,
b.first_name,
b.email_primary
FROM master_subscribers_prospects_2 b
LEFT JOIN master_subscribers_prospects_1 a
ON b.unique_id = a.unique_id
WHERE a.fedvip_active = 1
AND b.fedvip_active = 0
OR a.unique_id IS NULL
AND b.fedvip_active = 1;
For that query indexes on master_subscribers_prospects_1 (unique_id, fedvip_active) and master_subscribers_prospects_2 (unique_id, fedvip_active) might also help to speed things up.
Doing an inner select in where sats is always bad.
Here is a same version with a left join, that might work for you.
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b on 1 = 1
left join Master_Subscribers_Prospects_1 sa on sa.Unique_ID = b.Unique_ID
where (a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0) or
(sa.Unique_ID is null and b.FEDVIP_Active = 1)

Improve SQL queries reducing inner joins with In

I came across some legacy code in which they used two inner joins I replaced that with following .
select p.id
from publisher p
inner join retailer r on p.retailer_id = r.id
where p.id IN (1,4,5 .... around 100 or more random ids I get from code)
order by case r.sector_id when r.sector_id then 1 else 2 end,
p.seo_frontpage_factor asc,
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end,
p.id asc
Previous query was like below.
select p.id
from publisher p
inner join retailer r on p.retailer_id = r.id
inner join (select 1 as id union all select 2 ,union all select 3 ......) as x on p.id = x.id
order by case r.sector_id when r.sector_id then 1 else 2 end,
p.seo_frontpage_factor asc,
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end,
p.id asc
My question is would this be a good idea reducing inner joins to improve performance ? If not what can I do optimise this or this is good as it is ?
The IN clause is more readable than the inner join. So it is a good idea to make this change as to increase readablility. The optimizer should treat the two queries just the same, if it is any good :-)
The query is obviously made to rank the 100+ publisher IDs by their retailer. It is quite rare to only return the publisher IDs, but well, we don't know what's done with the result of course. Obviously the app only needs the IDs ordered by retailer rank.
UPDATE: I had errors in the following part that are now corrected thanks to joop.
You can increase readability further:
case r.sector_id when r.sector_id then 1 else 2 end
is simply
case when r.sector_id is null then 2 else 1 end
Same for
case (r.poi_sector_id) when r.poi_sector_id then 3 else 4 end
which is
case when r.poi_sector_id is null then 4 else 3 end
A strange way of writing this. May even be a mistake.
The CASE expressions sort nulls after non-nulls, so we could just as well use
order by r.sector_id is null, p.seo_frontpage_factor, r.poi_sector_id is null, p.id
as PostgreSQL sorts false before true (same as MySQL by the way).

Why left join is not giving distinct result?

I have following sql query and my left join is not giving me distinct result please help me to trace out.
SELECT DISTINCT
Position.Date,
Position.SecurityId,
Position.PurchaseLotId,
Position.InPosition,
ISNULL(ClosingPrice.Bid, Position.Mark) AS Mark
FROM
Fireball_Reporting.dbo.Reporting_DailyNAV_Pricing POSITION WITH (NOLOCK, READUNCOMMITTED)
LEFT JOIN Fireball.dbo.AdditionalSecurityPrice ClosingPrice WITH (NOLOCK, READUNCOMMITTED) ON
ClosingPrice.SecurityID = Position.PricingSecurityID AND
ClosingPrice.Date = Position.Date AND
ClosingPrice.SecurityPriceSourceID = #SourceID AND
ClosingPrice.PortfolioID IN (5,6)
WHERE
DatePurchased > #NewPositionDate AND
Position.Date = #CurrentPositionDate AND
InPosition = 1 AND
Position.PortfolioId IN (
SELECT
PARAM
FROM
Fireball_Reporting.dbo.ParseMultiValuedParameter(#PortfolioId, ',')
) AND
(
Position > 1 OR
Position < - 1
)
Now here in above my when I use LEFT JOIN ISNULL(ClosingPrice.Bid, Position.Mark) AS Mark and LEFT JOIN it is giving me more no of records with mutiple portfolio ids
for e.g . (5,6)
If i put portfolioID =5 giving result as 120 records
If i put portfolioID =6 giving result as 20 records
When I put portfolioID = (5,6) it should give me 140 records
but it is giving result as 350 records which is wrong . :(
It is happening because when I use LEFT JOIN there is no condition of PurchaseLotID in that as table Fireball.dbo.AdditionalSecurityPrice ClosingPrice not having column PurchaseLotID so it is giving me other records also whoes having same purchaseLotID's with diferent prices .
But I dont want that records
How can I eliminate those records ?
You get one Entry per DailyLoanAndCashPosition.PurchaseLotId = NAVImpact.PurchaseLotId
which would mean you must have more entrys in with the same PurchaseLotId
The most likely cause is that the left join produces duplicated PurchaseLotIds. The best way to know if if you perform a select distinct(PurchaseLotId) on your left side of the inner join.

SQL Having Clause

I'm trying to get a stored procedure to work using the following syntax:
select count(sl.Item_Number)
as NumOccurrences
from spv3SalesDocument as sd
left outer join spv3saleslineitem as sl on sd.Sales_Doc_Type = sl.Sales_Doc_Type and
sd.Sales_Doc_Num = sl.Sales_Doc_Num
where
sd.Sales_Doc_Type='ORDER' and
sd.Sales_Doc_Num='OREQP0000170' and
sl.Item_Number = 'MCN-USF'
group by
sl.Item_Number
having count (distinct sl.Item_Number) = 0
In this particular case when the criteria is not met the query returns no records and the 'count' is just blank. I need a 0 returned so that I can apply a condition instead of just nothing.
I'm guessing it is a fairly simple fix but beyond my simple brain capacity.
Any help is greatly appreciated.
Wally
First, having a specific where clause on sl defeats the purpose of the left outer join -- it bascially turns it into an inner join.
It sounds like you are trying to return 0 if there are no matches. I'm a T-SQL programmer, so I don't know if this will be meaningful in other flavors... and I don't know enough about the context for this query, but it sounds like you are trying to use this query for branching in an IF statement... perhaps this will help you on your way, even if it is not quite what you're looking for...
IF NOT EXISTS (SELECT 1 FROM spv3SalesDocument as sd
INNER JOINs pv3saleslineitem as sl on sd.Sales_Doc_Type = sl.Sales_Doc_Type
and sd.Sales_Doc_Num = sl.Sales_Doc_Num
WHERE sd.Sales_Doc_Type='ORDER'
and sd.Sales_Doc_Num='OREQP0000170'
and sl.Item_Number = 'MCN-USF')
BEGIN
-- Do something...
END
I didn't test these but off the top of my head give them a try:
select ISNULL(count(sl.Item_Number), 0) as NumOccurrences
If that one doesn't work, try this one:
select
CASE count(sl.Item_Number)
WHEN NULL THEN 0
WHEN '' THEN 0
ELSE count(sl.Item_Number)
END as NumOccurrences
This combination of group by and having looks pretty suspicious:
group by sl.Item_Number
having count (distinct sl.Item_Number) = 0
I'd expect this having condition to approve only groups were Item_Number is null.
To always return a row, use a union. For example:
select name, count(*) as CustomerCount
from customers
group by
name
having count(*) > 1
union all
select 'No one found!', 0
where not exists
(
select *
from customers
group by
name
having count(*) > 1
)