Query returning too many results - sql

This SQL query returns the expected 29 results for a.id = 366:
select a.name, c.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
where a.id = 366
GROUP BY a.name, c.name
order by MAX(b.renew_date), MAX(b.date) desc;
The SQL below returns 34 results: there are multiple rows wherever two different providers supplied the same course. I know the extra rows appear because I added e.name to the select list and the GROUP BY. All I need is the 29 entries, each with the latest date and the provider's name.
select a.name, c.name, e.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
GROUP BY a.name, c.name, e.name
order by MAX(b.renew_date), MAX(b.date) desc;
Can anyone rework this query to return a single provider name with the MAX(renew_date) for each course?

This returns exactly one row per distinct combination of (a.name, c.name):
The one with the latest renew_date.
Among these, the one with the latest date (may differ from global max(date)!).
Among these, the one with the alphabetically first e.name:
SELECT DISTINCT ON (a.name, c.name)
a.name AS a_name, c.name AS c_name, e.name AS e_name
, b.renew_date, b.date
FROM boson_course c
JOIN boson_coursedetail b on c.id = b.course_id
JOIN boson_coursedetail_attendance d on d.coursedetail_id = b.id
JOIN boson_employee a on a.id = d.employee_id
JOIN boson_provider e on b.provider_id = e.id
WHERE a.id = 366
ORDER BY a.name, c.name
, b.renew_date DESC NULLS LAST
, b.date DESC NULLS LAST
, e.name;
The result is sorted by a_name, c_name first. If you need your original sort order, wrap this in a subquery:
SELECT *
FROM (<query from above>) sub
ORDER BY renew_date DESC NULLS LAST
, date DESC NULLS LAST
, a_name, c_name, e_name;
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
Why DESC NULLS LAST?
PostgreSQL sort by datetime asc, null first?
Aside: Don't use basic type names like date as column names. Also, name is hardly ever a good name. As you can see, we have to use aliases to make this query useful. Some general advice on naming conventions:
How to implement a many-to-many relationship in PostgreSQL?
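DISTINCT ON is PostgreSQL-specific. As a rough sketch of the same "first row per group" rule on other engines, ROW_NUMBER() can do the picking. The snippet below runs against an in-memory SQLite database from Python (SQLite 3.25+ is assumed for window functions). The attendance table, its columns (course, provider, renew_date, taken) and the sample rows are made up for illustration and are not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, flattened stand-in for the joined tables in the question.
    CREATE TABLE attendance (course TEXT, provider TEXT, renew_date TEXT, taken TEXT);
    INSERT INTO attendance VALUES
        ('First Aid', 'Acme',   '2015-01-01', '2014-01-01'),
        ('First Aid', 'Zenith', '2016-01-01', '2015-01-01'),
        ('First Aid', 'Beta',   '2016-01-01', '2015-01-01'),
        ('Forklift',  'Acme',   '2014-06-01', '2013-06-01');
""")

# Rank rows inside each course: latest renew_date first, then latest taken
# date, then provider name alphabetically. Keep only the top-ranked row.
latest_per_course = conn.execute("""
    SELECT course, provider, renew_date
    FROM (
        SELECT course, provider, renew_date,
               ROW_NUMBER() OVER (
                   PARTITION BY course
                   ORDER BY renew_date DESC, taken DESC, provider
               ) AS rn
        FROM attendance
    ) AS ranked
    WHERE rn = 1
    ORDER BY course
""").fetchall()

print(latest_per_course)
```

With this sample data, 'First Aid' has two providers tied on both dates, so the alphabetical tie-breaker decides which one is kept.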

Try using distinct on:
select distinct on (a.name, c.name, e.name)
a.name, c.name, e.name,
b.date, b.renew_date as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
ORDER BY a.name, c.name, e.name, b.date desc;

Related

SQL - Sum of query not true

I have this query;
SELECT l.Name, COALESCE(SUM(A.Count), 0) AS A, COALESCE(SUM(B.Count), 0) AS B
FROM List l
LEFT JOIN A ON A.Name = l.Name
LEFT JOIN B ON B.Name = l.Name
GROUP BY l.Name
ORDER BY l.Name
But the query results are wrong.
The sum for Product3 in table A is incorrect.
Demo : https://www.db-fiddle.com/f/rdKLkyaeEsi8bPcNPkUnTE/4
You could sum separately for A and B and then combine results:
SELECT Name, MAX(A) AS A, MAX(B) AS B
FROM (
SELECT l.Name, SUM(A.Count) AS A, 0 AS B
FROM List l
LEFT JOIN A ON A.Name = l.Name
GROUP BY l.Name
UNION ALL
SELECT l.Name, 0 AS A, SUM(B.Count)AS B
FROM List l
LEFT JOIN B ON B.Name = l.Name
GROUP BY l.Name) sub
GROUP BY Name
ORDER BY Name;
db-fiddle.com demo
You should be aggregating the A and B tables in separate subqueries:
SELECT
l.Name,
COALESCE(a.cnt, 0) AS a_cnt,
COALESCE(b.cnt, 0) AS b_cnt
FROM List l
LEFT JOIN
(
SELECT Name, SUM(Count) AS cnt
FROM A
GROUP BY Name
) a
ON l.Name = a.name
LEFT JOIN
(
SELECT Name, SUM(Count) AS cnt
FROM B
GROUP BY Name
) b
ON l.Name = b.name;
The problem with your current approach is that the double join to the A and B tables is likely resulting in double counting. By using separate subqueries we avoid this problem.
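To see the double counting concretely, here is a minimal sketch using an in-memory SQLite database from Python. The sample rows are invented (the fiddle data is not reproduced here): one row in A and two rows in B for the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List (Name TEXT);
    CREATE TABLE A (Name TEXT, Count INTEGER);
    CREATE TABLE B (Name TEXT, Count INTEGER);
    -- Made-up data: one A row and two B rows for Product3.
    INSERT INTO List VALUES ('Product3');
    INSERT INTO A VALUES ('Product3', 5);
    INSERT INTO B VALUES ('Product3', 1), ('Product3', 2);
""")

# Joining both tables first pairs the single A row with each B row,
# so A.Count is summed twice.
naive = conn.execute("""
    SELECT l.Name, COALESCE(SUM(A.Count), 0), COALESCE(SUM(B.Count), 0)
    FROM List l
    LEFT JOIN A ON A.Name = l.Name
    LEFT JOIN B ON B.Name = l.Name
    GROUP BY l.Name
""").fetchall()

# Aggregating each table on its own before joining keeps one row per name.
pre_aggregated = conn.execute("""
    SELECT l.Name, COALESCE(a.cnt, 0), COALESCE(b.cnt, 0)
    FROM List l
    LEFT JOIN (SELECT Name, SUM(Count) AS cnt FROM A GROUP BY Name) a
           ON a.Name = l.Name
    LEFT JOIN (SELECT Name, SUM(Count) AS cnt FROM B GROUP BY Name) b
           ON b.Name = l.Name
""").fetchall()

print(naive)           # A is inflated by the number of matching B rows
print(pre_aggregated)  # A and B are each summed once
```

The naive query reports 10 for A because the A row appears once per matching B row; the pre-aggregated version reports 5.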
In your original question on this subject, I suggested correlated subqueries. These are probably the easiest way to accomplish what you want:
select l.name,
(select sum(a.count)
from a
where a.name = l.name
) as a,
(select sum(b.count)
from b
where b.name = l.name
) as b
from list l;
You should check null values before sum() not after.
SELECT l.Name, SUM(COALESCE(A.Count, 0)) AS A, SUM(COALESCE(B.Count, 0)) AS B
FROM List l
LEFT JOIN A ON A.Name = l.Name
LEFT JOIN B ON B.Name = l.Name
GROUP BY l.Name
ORDER BY l.Name

Can I do an inner "With" inside another "With" in SQL?

I'm trying to use multiple SQL With clauses.
The reason I'm using multiple With clauses is that I'm sending this SQL to an AS400 project. The With Temp is mandatory, whereas Temp2 is optional.
I can't figure out how to do it. This SQL still throws an error:
With Temp2 As
(
With Temp As
(
Select Name, Surname, Age
From People
Where Age > 18
)
Select A.*, B.*
From Temp A
Left Join City B on B.Name = A.Name
and B.Surname = A.Surname
Where B.City = "Venice"
)
Select *
From Temp2 C
Left Join State D on D.City = C.City
I'd like to understand how I can do something like that.
Yes, any CTE can reference a CTE that is created before it. The first CTE must be prefaced by "With" and terminated with a comma, which allows for another CTE to be created.
with temp as
(
select name, surname, age
from people
where age > 18
),
temp2 as
(
select a.*, b.*
from temp a
left join city b
on b.name = a.name
and b.surname = a.surname
where b.city = 'Venice'
)
select *
from temp2 c
left join state d
on d.city = c.city
;
This is functionally equivalent to the query below, which does not require any CTEs.
select *
from people as a
join city b
on b.name = a.name
and b.surname = a.surname
and b.city = 'Venice'
left join state c
on c.city = b.city
where a.age > 18
;
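For a version you can run end to end, here is a small sketch of two comma-separated CTEs, the second reading from the first, executed against in-memory SQLite from Python. The CTE names (adults, venetians) and the sample rows are invented for illustration, and the State join is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, surname TEXT, age INTEGER);
    CREATE TABLE city (name TEXT, surname TEXT, city TEXT);
    -- Made-up sample rows.
    INSERT INTO people VALUES
        ('Anna', 'Rossi', 30), ('Luca', 'Bianchi', 16), ('Marco', 'Verdi', 40);
    INSERT INTO city VALUES
        ('Anna', 'Rossi', 'Venice'), ('Luca', 'Bianchi', 'Venice'),
        ('Marco', 'Verdi', 'Rome');
""")

# One WITH keyword, CTEs separated by a comma; the second CTE
# references the first by name.
rows = conn.execute("""
    WITH adults AS (
        SELECT name, surname, age
        FROM people
        WHERE age > 18
    ),
    venetians AS (
        SELECT a.name, a.surname, a.age, c.city
        FROM adults a
        JOIN city c ON c.name = a.name AND c.surname = a.surname
        WHERE c.city = 'Venice'
    )
    SELECT name, surname, age, city
    FROM venetians
    ORDER BY name
""").fetchall()

print(rows)
```

Only Anna survives both filters: Luca is under 18 and Marco is not in Venice.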
For what you are describing, you shouldn't need either CTEs or subqueries. Just use regular JOINs.
SELECT p.Name, p.Surname, p.Age, C.City, s.StateName, s.CountryName
FROM People p
INNER JOIN City c ON p.Name = c.Name
AND p.Surname = c.Surname
AND c.City = 'Venice'
LEFT OUTER JOIN State s ON c.City = s.City
WHERE p.Age > 18
See https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=6d16c8325ee1da354588ddddc75bb162 for demo.

Group by and aggregate by multiple columns

Example tables
taccount
tuser
tproject
What I want to achieve:
accountName count(u.id) count(p.id)
-----------------------------------
Account A 1 1
Account B 1 1
Account C 2 3
In other words, I want a single query that joins these tables together and counts users and projects per account.
I tried:
SELECT
a.name as "accountName",
count(u.name),
count(p.id)
FROM "taccount" a
INNER JOIN "tuser" u ON u.account_id = a.id
INNER JOIN "tproject" p ON p.admin_id = u.id
GROUP BY u.name, a.name, p.id
But it's not grouping by account. It's giving me the following result
Any advice?
You can try below
SELECT
a.name as "accountName",
count(distinct u.name),
count(p.id)
FROM "taccount" a
INNER JOIN "tuser" u ON u.account_id = a.id
INNER JOIN "tproject" p ON p.admin_id = u.id
GROUP BY a.name
When you use aggregate functions, every selected column that is not aggregated must appear in your GROUP BY, because aggregate functions perform a calculation on a set of rows and return a single row per group.
SELECT
a.name as "accountName",
count(distinct u.name),
count(p.id)
FROM
"taccount" a
INNER JOIN "tuser" u ON u.account_id = a.id
INNER JOIN "tproject" p ON p.admin_id = u.id
GROUP BY
a.name
So you only need to GROUP BY the column behind "accountName".
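Here is a quick sketch of why the DISTINCT matters, run against in-memory SQLite from Python. The original example tables are not reproduced in the post, so the rows below are invented to match the desired output (Account C has two users and three projects).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taccount (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tuser (id INTEGER PRIMARY KEY, name TEXT, account_id INTEGER);
    CREATE TABLE tproject (id INTEGER PRIMARY KEY, admin_id INTEGER);
    -- Made-up rows: user 3 administers two projects, user 4 one.
    INSERT INTO taccount VALUES (1, 'Account A'), (2, 'Account B'), (3, 'Account C');
    INSERT INTO tuser VALUES (1, 'ann', 1), (2, 'bob', 2), (3, 'cat', 3), (4, 'dan', 3);
    INSERT INTO tproject VALUES (1, 1), (2, 2), (3, 3), (4, 3), (5, 4);
""")

# The join yields one row per project, so a plain COUNT(u.id) counts a
# user once per project; COUNT(DISTINCT u.id) counts each user once.
per_account = conn.execute("""
    SELECT a.name,
           COUNT(u.id)          AS user_rows,
           COUNT(DISTINCT u.id) AS users,
           COUNT(p.id)          AS projects
    FROM taccount a
    JOIN tuser u ON u.account_id = a.id
    JOIN tproject p ON p.admin_id = u.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

for row in per_account:
    print(row)
```

For Account C the plain count returns 3 user rows, while the distinct count returns the 2 actual users.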
Change your GROUP BY column:
SELECT
a.name as "accountName",
count(distinct u.account_id),
count(p.id)
FROM "taccount" a
INNER JOIN "tuser" u ON u.account_id = a.id
INNER JOIN "tproject" p ON p.admin_id = u.id
GROUP BY a.name
This will work:
select a.name, count(distinct u.id), count(p.id)
from taccount a, tuser u, tproject p
where a.id = u.account_id and
u.id = p.admin_id
group by a.name;

SQL Query Refactoring Possible?

Assume this is part of the result set, and assume Dob, Name, Address, Postcode, Telephone and EmailAddress are the same for each ID. These columns are used in the GROUP BY clause.
Sample data:
ID date Amount
---------------------------
12345 1/1/2017 100
12345 1/2/2017 200
12345 1/3/2017 300
With the outer query included I get the following which is what I want to achieve
ID date Amount
--------------------------
12345 1/1/2017 600
I want to confirm if there's a better way in terms of performance for this code. I feel like I could do a join, or a shorter version of the query but I can't get the logic right.
When I remove the outer query and do the MIN and SUM aggregate functions inside, the results don't group correctly. It shows more than one row for each id.
Also, is a shorter GROUP BY possible?
Here's a partial version of the final code:
SELECT
a.id, a.dob, a.claim_id,
a.name, a.Address, a.postcode,
a.Telephone, a.EmailAdress,
MIN(a.date), SUM(a.amount) as Amount
FROM
(SELECT DISTINCT
i.date, i.id, cl.name, cl.address,
cl.postcode, cl.telephone, cl.dob,
cl.EmailAdress, i.amount, cm.claim_id
FROM
testdb.dbo.invoice i
JOIN
testdb.dbo.claim cm with (nolock) ON i.id = cm.id
JOIN
testdb.dbo.clients cl with (nolock) ON cm.clientid = cl.id
JOIN
(....) c ON i.id = c.id
WHERE
.....) AS a
GROUP BY
a.id, a.dob, a.claim_id, a.name, a.Address,
a.postcode, a.Telephone, a.EmailAdress
ORDER BY
1
SELECT DISTINCT
MIN(i.date) AS date ,i.id ,cl.name ,cl.address
,cl.postcode ,cl.telephone ,cl.dob
,cl.EmailAdress ,SUM(i.amount) AS Amount ,cm.claim_id
FROM
testdb.dbo.invoice i
JOIN
testdb.dbo.claim cm with (nolock) on i.id = cm.id
JOIN
testdb.dbo.clients cl with (nolock) on cm.clientid = cl.id
JOIN
( .... ) c on i.id = c.id
WHERE
.....
GROUP BY
i.id,cl.dob,cm.claim_id,cl.name,cl.Address,cl.postcode,
cl.Telephone,cl.EmailAdress
ORDER BY 1
This is pretty much the previous code with the outer query removed. I'm not sure why it gave me multiple records before, or what differs between then and now, but it isn't doing that anymore with this code.
Why not do the calculation inline and then join the detail tables afterwards,
something like:
SELECT
a.id, a.dob, claimDetails.claim_id,
a.name, a.Address, a.postcode,
a.Telephone, a.EmailAdress,
claimDetails.FirstDate, claimDetails.Amount
FROM a
LEFT JOIN
(
SELECT i.id, cm.claim_id, MIN(i.date) as FirstDate, SUM(i.amount) as Amount
FROM testdb.dbo.invoice i
JOIN testdb.dbo.claim cm ON i.id = cm.id
GROUP BY i.id, cm.claim_id
) claimDetails
ON claimDetails.id = a.id
LEFT JOIN Clients....

Get the row with max(timestamp)

I need to select most recently commented articles, with the last comment for each article, i.e. other columns of the row which contains max(c.created):
SELECT a.id, a.title, a.text, max(c.created) AS cid, c.text?
FROM subscriptions s
JOIN articles a ON a.id=s.article_id
JOIN comments c ON c.article_id=a.id
WHERE s.user_id=%d
GROUP BY a.id, a.title, a.text
ORDER BY max(c.created) DESC LIMIT 10;
Postgres tells me that I have to put c.text into GROUP BY. Obviously, I don't want to do this, and min/max doesn't fit either. I have no idea how to select this.
Please advise.
In PostgreSQL, DISTINCT ON is probably the optimal solution for this kind of query:
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = %d
ORDER BY a.id, c.created DESC
This retrieves articles with their latest comment and the associated additional columns.
Explanation, links and a benchmark in this closely related answer.
To get the latest 10, wrap this in a subquery:
SELECT *
FROM (
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
ORDER BY a.id, c.created DESC
) x
ORDER BY created DESC
LIMIT 10;
Alternatively, you could use window functions in combination with standard DISTINCT:
SELECT DISTINCT
a.id, a.title, a.text
,first_value(c.created) OVER w AS c_created
,first_value(c.text) OVER w AS c_text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
WINDOW w AS (PARTITION BY c.article_id ORDER BY c.created DESC)
ORDER BY c_created DESC
LIMIT 10;
This works, because DISTINCT (unlike aggregate functions) is applied after window functions.
You'd have to test which is faster. I'd guess the last one is slower.
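As a small illustration of DISTINCT being applied after window functions, here is a sketch against in-memory SQLite from Python (SQLite 3.25+ is assumed for window functions and the WINDOW clause). The single comments table, the body column name and the rows are simplified, made-up stand-ins for the joined tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down comments table.
    CREATE TABLE comments (article_id INTEGER, created TEXT, body TEXT);
    INSERT INTO comments VALUES
        (1, '2020-01-01', 'old comment on 1'),
        (1, '2020-01-03', 'newest comment on 1'),
        (2, '2020-01-02', 'only comment on 2');
""")

# first_value() stamps every row of an article with that article's newest
# comment; DISTINCT then collapses the now-identical rows to one per article.
latest = conn.execute("""
    SELECT DISTINCT
           article_id,
           first_value(created) OVER w AS c_created,
           first_value(body)    OVER w AS c_body
    FROM comments
    WINDOW w AS (PARTITION BY article_id ORDER BY created DESC)
    ORDER BY c_created DESC
""").fetchall()

print(latest)
```

Note that the select list contains only the partition key and the window results; adding the raw created or body columns would make each row unique again and defeat the DISTINCT.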