SQL query not finding all rows in SUM and COUNT

select coalesce(ratings.positive,0) as positive,coalesce(ratings.negative,0) as negative,articles.id,x.username,commentnumb,
articles.category,
articles."createdAt",
articles.id,
articles.title,
articles."updatedAt"
FROM articles
LEFT JOIN (SELECT id AS userId,username,about FROM users) x ON articles.user_id = x.userId
LEFT JOIN (SELECT id,
article_id,
sum(case when rating = '1' then 1 else 0 end) as positive,
sum(case when rating = '0' then 1 else 0 end) as negative
from article_ratings
GROUP by id
) as ratings ON ratings.article_id = articles.id
LEFT JOIN (SELECT article_id,id,
count(article_id) as commentNumb
from article_comments
GROUP by id
) as comments ON comments.article_id = articles.id
WHERE articles."createdAt" <= :date
group by ratings.positive,ratings.negative,articles.id,x.username,commentnumb
order by articles."createdAt" desc
LIMIT 10
The code is working, however I have many more comments and many more ratings than what is counted in both SUM and COUNT functions.
How do I fix this query?
This is using postgres.
I've done some experimentation and it seems that the third join for comments is the one causing issues.

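The derived tables should group by article_id, but yours group by id. Because id is unique per rating and per comment, each derived table returns one row per rating or comment (each with a count of 1) instead of one row per article, and the outer GROUP BY then collapses those rows. I have modified the query accordingly; with one row per article in each derived table, the outer GROUP BY is no longer needed.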
SELECT COALESCE(ratings.positive,0) AS positive,COALESCE(ratings.negative,0) AS negative,articles.id,x.username,COALESCE(comments.commentnumb,0) AS commentnumb,
articles.category,
articles."createdAt",
articles.title,
articles."updatedAt"
FROM articles
LEFT OUTER JOIN (SELECT id AS userId,username,about FROM users) x ON articles.user_id = x.userId
LEFT OUTER JOIN (SELECT article_id,
SUM(case when rating = '1' then 1 else 0 end) as positive,
SUM(case when rating = '0' then 1 else 0 end) as negative
FROM article_ratings
GROUP by article_id
) AS ratings ON ratings.article_id = articles.id
LEFT OUTER JOIN (SELECT article_id,
count(article_id) as commentNumb
FROM article_comments
GROUP by article_id
) AS comments ON comments.article_id = articles.id
WHERE articles."createdAt" <= :date
ORDER BY articles."createdAt" desc
LIMIT 10;
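To see why the grouping column matters, here is a minimal sketch using Python's sqlite3 module rather than Postgres. The table is reduced to two columns and the sample rows are made up for illustration; only the GROUP BY column differs between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_comments (id INTEGER PRIMARY KEY, article_id INTEGER);
    -- hypothetical data: article 1 has three comments, article 2 has one
    INSERT INTO article_comments (id, article_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Grouping by the comment's own id: one group per comment, so every count is 1.
by_id = conn.execute(
    "SELECT article_id, COUNT(article_id) FROM article_comments "
    "GROUP BY id ORDER BY id"
).fetchall()

# Grouping by article_id: one group per article, carrying the real total.
by_article = conn.execute(
    "SELECT article_id, COUNT(article_id) FROM article_comments "
    "GROUP BY article_id ORDER BY article_id"
).fetchall()

print(by_id)       # [(1, 1), (1, 1), (1, 1), (2, 1)]
print(by_article)  # [(1, 3), (2, 1)]
```

The first result is what the original derived table fed into the join: several rows per article, each claiming a single comment.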

Related

Re-writing EXISTS as JOIN or a subquery in Oracle

I have a query which is very costly and takes more than an hour to execute. I tried converting the EXISTS clause to a join but I am stuck. Can anyone help?
The purpose is to find duplicate products within a unique space id. FLAT_TABLE contains 5 million records.
Query:
select
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM
schema1.flat_table tbl1
WHERE
tbl1.status = 'Active'
AND tbl1.product = 'Cage'
AND EXISTS
(SELECT 1
FROM schema1.flat_table tbl2
WHERE tbl2.product = 'Cage'
AND tbl2.status = 'Active'
AND tbl2.reservation <> 'Space Reserved'
AND tbl1.unique_space_id = tbl2.unique_space_id
GROUP BY tbl2.unique_space_id
HAVING COUNT (1) > 1
);
You can use COUNT as an analytic function, which reads the table only once:
select * from
(select tbl1.product, tbl1.status, tbl1.reservation, tbl1.unique_space_id,
count(case when tbl1.reservation <> 'Space Reserved' then 1 end)
over(partition by tbl1.unique_space_id) as cnt
FROM schema1.flat_table tbl1
WHERE tbl1.status = 'Active' AND tbl1.product = 'Cage')
where cnt > 1
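As a quick sanity check that the analytic COUNT returns the same rows as the original EXISTS query, here is a small sketch run against SQLite (3.25 or later for window functions) instead of Oracle. The sample rows are invented, and the column list is trimmed to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flat_table (product TEXT, status TEXT, reservation TEXT, unique_space_id INTEGER);
    -- hypothetical data: space 1 holds two non-reserved active cages, space 2 only one
    INSERT INTO flat_table VALUES
        ('Cage', 'Active',   'Open',           1),
        ('Cage', 'Active',   'Held',           1),
        ('Cage', 'Active',   'Space Reserved', 1),
        ('Cage', 'Active',   'Open',           2),
        ('Cage', 'Active',   'Space Reserved', 2),
        ('Cage', 'Inactive', 'Open',           2);
""")

exists_sql = """
    SELECT t1.reservation, t1.unique_space_id
    FROM flat_table t1
    WHERE t1.status = 'Active' AND t1.product = 'Cage'
      AND EXISTS (SELECT 1
                  FROM flat_table t2
                  WHERE t2.product = 'Cage' AND t2.status = 'Active'
                    AND t2.reservation <> 'Space Reserved'
                    AND t1.unique_space_id = t2.unique_space_id
                  GROUP BY t2.unique_space_id
                  HAVING COUNT(1) > 1)
    ORDER BY t1.reservation
"""

window_sql = """
    SELECT reservation, unique_space_id
    FROM (SELECT reservation, unique_space_id,
                 COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
                     OVER (PARTITION BY unique_space_id) AS cnt
          FROM flat_table
          WHERE status = 'Active' AND product = 'Cage')
    WHERE cnt > 1
    ORDER BY reservation
"""

exists_rows = conn.execute(exists_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(exists_rows == window_rows, window_rows)
```

Both queries keep every active cage row in space 1, including the 'Space Reserved' one, because the filter on reservation only affects the count, not which rows are returned.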
You could rewrite your query as an inner join to the current exists subquery. The join would have the effect of filtering in the same way the exists clause was behaving.
SELECT DISTINCT
tbl1.product,
tbl1.status,
tbl1.reservation,
tbl1.unique_space_id
FROM schema1.flat_table tbl1
INNER JOIN
(
SELECT unique_space_id
FROM schema1.flat_table
WHERE product = 'Cage' AND
status = 'Active' AND
reservation <> 'Space Reserved'
GROUP BY unique_space_id
HAVING COUNT(*) > 1
) tbl2
ON tbl2.unique_space_id = tbl1.unique_space_id
WHERE
tbl1.status = 'Active' AND
tbl1.product = 'Cage';
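Here is a more concise version using COUNT as an analytic function along with a QUALIFY clause. Note that QUALIFY is a vendor extension (Teradata, Snowflake, BigQuery and others) that Oracle releases such as 19c do not accept, so check that your database supports it: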
SELECT DISTINCT product, status, reservation, unique_space_id
FROM schema1.flat_table
WHERE status = 'Active' AND product = 'Cage'
QUALIFY COUNT(CASE WHEN reservation <> 'Space Reserved' THEN 1 END)
OVER (PARTITION BY unique_space_id) > 1;

SQL code in Access says error in FROM clause

SELECT
p.Name,
p.Age,
MAX(COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)))
FROM Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age;
I can suggest the following query:
SELECT TOP 1
p.Name,
p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID))
FROM (Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID)
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age
ORDER BY
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)) DESC;
This fixes the syntax problem with your joins. It also interprets the MAX as meaning that you want the record with the maximum ratio of counts. In this case, we can use TOP 1 along with ORDER BY to identify this max record.
MS Access requires parentheses around each nested join when a query has more than one join. In addition, MAX(COUNT(m.winTeam_ID)) doesn't make sense, because an aggregate cannot be nested directly inside another aggregate. I don't know what you are trying to calculate in the SELECT. Perhaps this does what you want:
SELECT p.Name, p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID))
FROM (Players AS p INNER JOIN
Teams AS t
ON t.ID = p.Team_ID
) INNER JOIN
Matches AS m
ON m.Team_ID = t.ID
GROUP BY p.Name, p.Age;
I think your Matches table shouldn't have a Team_ID column; instead it should have winTeam_ID and lossTeam_ID.
I also assume you want the players of the team with the best win rate.
If so, use a query like this (tested on SQL Server only):
select
p.Age, p.Name, ts.rate
from
Players p
join
(select top(1) -- sub-query will return just first record
t.ID
, sum(case when (t.ID = winTeam_ID) then 1 else 0 end) as wins
, sum(case when (t.ID = lossTeam_ID) then 1 else 0 end) as losses
, sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) /
(sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) + sum(case when (t.ID = lossTeam_ID) then 1.0 else 0.0 end)) as rate
from Teams as t
left join Matches as m
on t.ID = m.winTeam_ID
or t.ID = lossTeam_ID
group by t.ID
order by rate desc -- This will make max rate as first
) as ts -- Team stats calculated in this sub-query
on p.Team_ID = ts.ID;
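The same idea can be tried out quickly in SQLite. This is only a sketch under a few assumptions: the schema follows the guess above (Matches has winTeam_ID and lossTeam_ID), the rows are made up, LIMIT 1 stands in for SQL Server's TOP(1), and an inner join to Matches is used so that teams without matches cannot cause a division by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (ID INTEGER PRIMARY KEY);
    CREATE TABLE Players (Name TEXT, Age INTEGER, Team_ID INTEGER);
    CREATE TABLE Matches (winTeam_ID INTEGER, lossTeam_ID INTEGER);
    -- hypothetical data
    INSERT INTO Teams VALUES (1), (2), (3);
    INSERT INTO Players VALUES ('Ann', 30, 1), ('Bob', 25, 2), ('Cy', 41, 2), ('Di', 22, 3);
    INSERT INTO Matches VALUES (2, 1), (2, 3), (1, 3), (3, 2);
""")

best_team_players = conn.execute("""
    SELECT p.Name, p.Age, ts.rate
    FROM Players p
    JOIN (SELECT t.ID,
                 -- 1.0 / 0.0 keep the division in floating point
                 SUM(CASE WHEN t.ID = m.winTeam_ID THEN 1.0 ELSE 0.0 END) /
                 (SUM(CASE WHEN t.ID = m.winTeam_ID THEN 1.0 ELSE 0.0 END) +
                  SUM(CASE WHEN t.ID = m.lossTeam_ID THEN 1.0 ELSE 0.0 END)) AS rate
          FROM Teams t
          JOIN Matches m ON t.ID = m.winTeam_ID OR t.ID = m.lossTeam_ID
          GROUP BY t.ID
          ORDER BY rate DESC
          LIMIT 1) ts ON p.Team_ID = ts.ID
    ORDER BY p.Name
""").fetchall()

for name, age, rate in best_team_players:
    print(name, age, round(rate, 3))
```

With this data team 2 has two wins and one loss, so its two players are returned with a rate of about 0.667.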

How to optimize multiple subqueries to the same data set

Imagine I have a query like the following one:
SELECT
u.ID,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 1
) AS interesting_posts,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 2
) AS boring_posts,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 1
) AS interesting_comments,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 2
) AS boring_comments
FROM
USERS u;
( Hopefully it's correct because I just came up with it and didn't test it )
where I try to calculate the number of interesting and boring posts and comments that the user has.
Now, the problem with this query is that we have 2 sequential scans on both the posts and comments table and I wonder if there is a way to avoid that?
I could probably LEFT JOIN both posts and comments to the users table and do some aggregation but it's gonna generate a lot of rows before aggregation and I am not sure if that's a good way to go.
Aggregate posts and comments and outer join them to the users table.
select
u.id as user_id,
coalesce(p.interesting, 0) as interesting_posts,
coalesce(p.boring, 0) as boring_posts,
coalesce(c.interesting, 0) as interesting_comments,
coalesce(c.boring, 0) as boring_comments
from users u
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from posts
group by user_id
) p on p.user_id = u.id
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from comments
group by user_id
) c on c.user_id = u.id;
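If you want to convince yourself that pre-aggregating and outer joining gives the same numbers as the four correlated subqueries, a small check like the following can help. It is a sketch against SQLite with invented rows, not a benchmark; the execution plan still has to be compared on your real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (user_id INTEGER, type INTEGER);
    CREATE TABLE comments (user_id INTEGER, type INTEGER);
    -- hypothetical data; user 3 has no posts and no comments
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO posts VALUES (1, 1), (1, 1), (1, 2), (2, 2);
    INSERT INTO comments VALUES (1, 2), (2, 1), (2, 1);
""")

correlated = conn.execute("""
    SELECT u.id,
           (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id AND p.type = 1),
           (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id AND p.type = 2),
           (SELECT COUNT(*) FROM comments c WHERE c.user_id = u.id AND c.type = 1),
           (SELECT COUNT(*) FROM comments c WHERE c.user_id = u.id AND c.type = 2)
    FROM users u
    ORDER BY u.id
""").fetchall()

aggregated = conn.execute("""
    SELECT u.id,
           COALESCE(p.interesting, 0), COALESCE(p.boring, 0),
           COALESCE(c.interesting, 0), COALESCE(c.boring, 0)
    FROM users u
    LEFT JOIN (SELECT user_id,
                      COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
                      COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
               FROM posts GROUP BY user_id) p ON p.user_id = u.id
    LEFT JOIN (SELECT user_id,
                      COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
                      COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
               FROM comments GROUP BY user_id) c ON c.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(aggregated)  # [(1, 2, 1, 0, 1), (2, 0, 1, 2, 0), (3, 0, 0, 0, 0)]
```

Because each derived table is aggregated to one row per user before the join, the join cannot multiply rows, which was the concern with joining posts and comments directly.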
Compare results and execution plan (here you scan posts once):
with c as (
select distinct
count(1) filter (where TYPE = 1) over (partition by USER_ID) interesting_posts
, count(1) filter (where TYPE = 2) over (partition by USER_ID) boring_posts
, USER_ID
from POSTS
)
, p as (select USER_ID,max(interesting_posts) interesting_posts, max(boring_posts) boring_posts from c group by USER_ID)
SELECT
u.ID, interesting_posts,boring_posts
, ( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
) AS comments
FROM
USERS u
JOIN p on p.USER_ID = u.ID

Remove grouped data set when total of count is zero with subquery

I'm generating a data set that looks like this
category user total
1 jonesa 0
2 jonesa 0
3 jonesa 0
1 smithb 0
2 smithb 0
3 smithb 5
1 brownc 2
2 brownc 3
3 brownc 4
Where a particular user has 0 records in all categories, is it possible to remove their rows from the set? If a user has some activity, as smithb does, I'd like to keep all of their records, even the zero rows. I'm not sure how to go about that. I thought a CASE statement might help, but this is pretty complicated for me. Here is my query:
SELECT DISTINCT c.category,
u.user_name,
CASE WHEN (
SELECT COUNT(e.entry_id)
FROM category c1
INNER JOIN entry e1
ON c1.category_id = e1.category_id
WHERE c1.category_id = c.category_id
AND e.user_name = u.user_name
AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
THEN 'Yes'
ELSE NULL
END AS TOTAL
FROM user u
INNER JOIN role r
ON u.id = r.user_id
AND r.id IN (1,2),
category c
LEFT JOIN entry e
ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)
I realise the case statement won't work, but it was an attempt on how this might be possible. I'm really not sure if it's possible or the best direction. Appreciate any guidance.
Try this:
delete from t1
where user in (
select user
from t1
group by user
having count(distinct category) = sum(case when total=0 then 1 else 0 end) )
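The subquery finds all the users that meet your removal requirement (assuming one row per user and category).
count(distinct category) gives the number of categories a user has.
sum(case when total=0 then 1 else 0 end) gives the number of that user's rows with no activity (total = 0). When the two are equal, every category is zero for that user.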
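Here is that DELETE applied to the sample data from the question, as a sketch in SQLite. The table name t1 comes from the answer; the column is called user_name here (an assumption) to avoid clashing with the USER keyword on some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (category INTEGER, user_name TEXT, total INTEGER);
    INSERT INTO t1 VALUES
        (1, 'jonesa', 0), (2, 'jonesa', 0), (3, 'jonesa', 0),
        (1, 'smithb', 0), (2, 'smithb', 0), (3, 'smithb', 5),
        (1, 'brownc', 2), (2, 'brownc', 3), (3, 'brownc', 4);
""")

# Remove users whose number of zero rows equals their number of categories.
conn.execute("""
    DELETE FROM t1
    WHERE user_name IN (
        SELECT user_name
        FROM t1
        GROUP BY user_name
        HAVING COUNT(DISTINCT category) = SUM(CASE WHEN total = 0 THEN 1 ELSE 0 END))
""")

remaining = conn.execute(
    "SELECT user_name, category, total FROM t1 ORDER BY user_name, category"
).fetchall()
print(remaining)
```

jonesa has three categories and three zero rows, so all of jonesa's rows go; smithb keeps all three rows, including the two zeroes.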
There are a number of ways to do this, but the less verbose the SQL is, the harder it may be for you to follow along with the logic. For that reason, I think that using multiple Common Table Expressions will avoid the need to use redundant joins, while being the most readable.
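-- assuming user_name and category_name are unique on [user] and [category] respectively.
-- note: the [bracket] quoting is SQL Server style; on Oracle (which TO_DATE and NVL suggest) use double quotes or drop the brackets.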
WITH valid_categories (category_id, category_name) AS
(
-- get set of valid categories
SELECT c.category_id, c.category AS category_name
FROM category c
WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS
(
-- get set of users who belong to valid roles
SELECT u.[user_name]
FROM [user] u
WHERE EXISTS (
SELECT *
FROM [role] r
WHERE u.id = r.[user_id] AND r.id IN (1,2)
)
),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
-- provides a flag of 1 for easier aggregation
SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
FROM [entry] e
WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
-- determines if entry is within date range
),
user_categories ([user_name], category_id, category_name) AS
( SELECT u.[user_name], c.category_id, c.category_name
FROM valid_users u
-- get the cartesian product of users and categories
CROSS JOIN valid_categories c
-- get only users with a valid entry
WHERE EXISTS (
SELECT *
FROM valid_entries e
WHERE e.[user_name] = u.[user_name]
)
)
/*
You can use these for testing.
SELECT COUNT(*) AS valid_categories_count
FROM valid_categories
SELECT COUNT(*) AS valid_users_count
FROM valid_users
SELECT COUNT(*) AS valid_entries_count
FROM valid_entries
SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT uc.[user_name], uc.[category_name], e.[entry_count]
FROM user_categories uc
INNER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/
-- Finally, the results:
SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
GROUP BY uc.[user_name], uc.[category_name];
Here's another method:
WITH totals AS (
SELECT
c.category,
u.user_name,
COUNT(e.entry_id) AS total,
SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
FROM
user u
INNER JOIN
role r ON u.id = r.user_id
CROSS JOIN
category c
LEFT JOIN
entry e ON c.category_id = e.category_id
AND u.user_name = e.user_name
AND e.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
AND e.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
WHERE
r.id IN (1, 2)
AND c.category_id NOT IN (19, 20)
GROUP BY
c.category,
u.user_name
)
SELECT
category,
user_name,
total
FROM
totals
WHERE
user_total > 0
;
The totals CTE calculates the total per user and category as well as the total across all categories per user (using SUM() OVER ...). The main query returns only rows where the user total is greater than zero.
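The per-user total can also be computed in two steps, which may be easier to follow: first the count per user and category, then a window SUM over those counts. The sketch below does that in SQLite (3.25 or later) with a stripped-down, hypothetical schema: no role table, no date filter, and a table named users instead of user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_name TEXT);
    CREATE TABLE category (category_id INTEGER);
    CREATE TABLE entry (entry_id INTEGER, category_id INTEGER, user_name TEXT);
    -- hypothetical data: jonesa has no entries, smithb has two in category 2
    INSERT INTO users VALUES ('jonesa'), ('smithb');
    INSERT INTO category VALUES (1), (2);
    INSERT INTO entry VALUES (10, 2, 'smithb'), (11, 2, 'smithb');
""")

rows = conn.execute("""
    WITH totals AS (
        -- step 1: one row per user and category, zero when there are no entries
        SELECT c.category_id, u.user_name, COUNT(e.entry_id) AS total
        FROM users u
        CROSS JOIN category c
        LEFT JOIN entry e
               ON e.category_id = c.category_id
              AND e.user_name = u.user_name
        GROUP BY c.category_id, u.user_name
    ),
    with_user_total AS (
        -- step 2: repeat each user's overall total on every one of their rows
        SELECT category_id, user_name, total,
               SUM(total) OVER (PARTITION BY user_name) AS user_total
        FROM totals
    )
    SELECT category_id, user_name, total
    FROM with_user_total
    WHERE user_total > 0
    ORDER BY user_name, category_id
""").fetchall()

print(rows)  # [(1, 'smithb', 0), (2, 'smithb', 2)]
```

smithb's zero row for category 1 survives because the filter looks at the user total, not at the row's own total.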

Joined SQL query MAX aggregate with condition

I am having trouble with a SQL query. Here is a representation of my schema on SQL Fiddle:
http://sqlfiddle.com/#!15/14c8e/1
The issue is that I want to return rows from the Invitations table and join them with counts of the 'sent' and 'viewed' event_type values from the associated events, as well as the latest created_at date of the 'sent' events.
I can get all the data and counts working, but am having an issue with last_sent_on. Is there a way I can use a condition in a MAX aggregate function?
e.g.
MAX(
SELECT events.created_at
WHERE event_type='sent'
)
If not, how would I write the proper subselect?
I am currently using Postgresql.
Thank you.
You can use a case statement inside of max just as you've done with sum. The query below will select the maximum created_at for event_type='sent'
SELECT
i.id,
i.name,
i.email,
max(case when e.event_type='sent' then e.created_at end) AS last_sent_on,
sum(case when e.event_type='sent' then 1 else 0 end) AS sent_count,
sum(case when e.event_type='viewed' then 1 else 0 end) AS view_count
FROM
invitations i
LEFT OUTER JOIN
events e
ON e.eventable_id = i.id
AND e.eventable_type='Invitation' -- filter in ON, not WHERE, so invitations without events are kept
GROUP BY i.id, i.name, i.email
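A small runnable sketch of the conditional MAX, using SQLite and made-up rows instead of the fiddle's Postgres schema. Dates are stored as ISO-8601 strings here (an assumption) so that MAX compares them correctly, and the eventable_type filter sits in the ON clause so that an invitation without events still shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invitations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (eventable_id INTEGER, eventable_type TEXT,
                         event_type TEXT, created_at TEXT);
    -- hypothetical data: invitation 2 has no events at all
    INSERT INTO invitations VALUES (1, 'first'), (2, 'second');
    INSERT INTO events VALUES
        (1, 'Invitation', 'sent',   '2014-01-01'),
        (1, 'Invitation', 'sent',   '2014-02-01'),
        (1, 'Invitation', 'viewed', '2014-03-01');
""")

summary = conn.execute("""
    SELECT i.id,
           -- CASE yields NULL for non-'sent' rows, and MAX ignores NULLs
           MAX(CASE WHEN e.event_type = 'sent' THEN e.created_at END) AS last_sent_on,
           SUM(CASE WHEN e.event_type = 'sent' THEN 1 ELSE 0 END) AS sent_count,
           SUM(CASE WHEN e.event_type = 'viewed' THEN 1 ELSE 0 END) AS view_count
    FROM invitations i
    LEFT JOIN events e
           ON e.eventable_id = i.id
          AND e.eventable_type = 'Invitation'
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()

print(summary)  # [(1, '2014-02-01', 2, 1), (2, None, 0, 0)]
```

The later 'viewed' event on 2014-03-01 does not affect last_sent_on, because the CASE expression hides it from MAX.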
Try using a subquery to build the max value for sent.
SELECT
i.id,
i.name,
i.email,
sent.last_sent,
sum(case when e.event_type='sent' then 1 else 0 end) AS sent_count,
sum(case when e.event_type='viewed' then 1 else 0 end) AS view_count
FROM
invitations i
LEFT OUTER JOIN
events e
ON e.eventable_id = i.id
AND e.eventable_type='Invitation' -- filter in ON so the outer join keeps invitations without events
LEFT JOIN ( SELECT eventable_id uid, MAX(created_at) AS last_sent
FROM events
WHERE event_type = 'sent'
AND eventable_type = 'Invitation'
GROUP BY eventable_id ) AS sent
ON sent.uid = i.id
GROUP BY i.id, i.name, i.email, sent.last_sent