selective count in a group by

I'm trying to make a report about the state of our clubs and their teams for the current season. A club can have many teams and each team has season-specific information (like image_url) in the table season_team.
I want to know how many teams have already entered any season-specific information and how many have uploaded an image. The following SQL is invalid, but maybe my intention comes across:
SELECT
t.club_id,
count(distinct t.id) as team_count,
count(st.id) WHERE st.id IS NOT null as season_team_count,
count(st.id) WHERE st.image_url IS NOT null as teams_with_image,
count(st.id) WHERE st.state = 'APPROVED' as approved_teams,
count(st.id) WHERE st.state = 'REJECTED' as rejected_teams
FROM
team t
LEFT OUTER JOIN season_team st ON (st.team_id = t.id and st.season_id = 8)
GROUP BY
t.club_id
Can anyone point me in the right direction on how best to do this?
I know I can use a dirty trick and do "count(distinct st.image_url) -1" to get all the different images and subtract 1 for the null entry. But that wouldn't work for approved_teams.

You can use FILTER:
SELECT t.club_id,
count(distinct t.id) as team_count,
count(st.id) FILTER (WHERE st.id IS NOT null) as season_team_count,
count(st.id) FILTER (WHERE st.image_url IS NOT null) as teams_with_image,
count(st.id) FILTER (WHERE st.state = 'APPROVED') as approved_teams,
count(st.id) FILTER (WHERE st.state = 'REJECTED') as rejected_teams
FROM team t LEFT OUTER JOIN
season_team st
ON st.team_id = t.id and st.season_id = 8
GROUP BY t.club_id
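FILTER is standard SQL, but not every engine implements it. A minimal sketch of the same report using COUNT over a CASE expression, which relies on COUNT skipping NULLs and so runs almost anywhere. It uses Python's built-in sqlite3 module; the sample rows are made up for illustration, and the table layout is only assumed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (id INTEGER PRIMARY KEY, club_id INTEGER);
CREATE TABLE season_team (
    id INTEGER PRIMARY KEY, team_id INTEGER, season_id INTEGER,
    image_url TEXT, state TEXT
);
-- Hypothetical sample data: club 10 has three teams, club 20 has one.
INSERT INTO team VALUES (1, 10), (2, 10), (3, 10), (4, 20);
INSERT INTO season_team VALUES
    (1, 1, 8, 'a.png',   'APPROVED'),
    (2, 2, 8, NULL,      'REJECTED'),
    (3, 4, 8, 'b.png',   'APPROVED'),
    (4, 3, 7, 'old.png', 'APPROVED');  -- other season, excluded by the join
""")

# CASE without ELSE yields NULL when the condition fails,
# and COUNT(expr) only counts non-NULL values.
rows = conn.execute("""
SELECT t.club_id,
       COUNT(DISTINCT t.id) AS team_count,
       COUNT(st.id) AS season_team_count,
       COUNT(CASE WHEN st.image_url IS NOT NULL THEN 1 END) AS teams_with_image,
       COUNT(CASE WHEN st.state = 'APPROVED' THEN 1 END) AS approved_teams,
       COUNT(CASE WHEN st.state = 'REJECTED' THEN 1 END) AS rejected_teams
FROM team t
LEFT JOIN season_team st ON st.team_id = t.id AND st.season_id = 8
GROUP BY t.club_id
ORDER BY t.club_id
""").fetchall()

for row in rows:
    print(row)
```

Note that season_team_count needs no condition at all: COUNT(st.id) already ignores the NULLs produced by the LEFT JOIN for teams without a season row.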

Related

SQL UPDATE statement with LEFT JOIN, GROUP BY and HAVING?

I need to update some rows in a table. I've created a Select statement to make sure I've got the rows I wanted to select.
I want to update task_status_id in the table task. I've tried various ways but always end up with a syntax error, and I honestly have no idea how to do it, even though I've tried to follow others' examples by using INNER JOIN and putting the select statement in parentheses. Any help would be appreciated.
Here is the UPDATE statement I want to merge with the SELECT statement:
UPDATE task
SET task_status_id = (SELECT task_status_id
FROM task_status
WHERE task_type_id = 1
AND name = 'Completed');
WHERE
SELECT
t.task_id
FROM task t
LEFT JOIN user u
ON t.user_id = u.user_id
LEFT JOIN contract co
ON u.user_id = co.user_id
LEFT JOIN task_status ts
ON t.task_status_id = ts.task_status_id
WHERE co.status = 'Closed' AND
t.task_type_id = 1 AND
t.task_status_id != (SELECT task_status_id
FROM task_status
WHERE task_type_id = 1
AND name = 'Completed')
GROUP BY t.task_id
HAVING count(t.contract_id) <= 2;

First of all, it doesn't make sense to use LEFT JOIN contract co and then filter results using co.status = 'Closed', because if you're going to filter by a column from a joined table then you should use INNER JOIN (unless you're comparing to null in the filter).
Secondly, the comparison below only works if the subquery returns exactly one row; if it can return more than one, use NOT IN instead of !=:
AND t.task_status_id != (SELECT task_status_id
FROM task_status
WHERE task_type_id = 1
AND name = 'Completed')
However, since you already joined the task_status table you can replace the above block of code with the following (assuming that task_status_id is a unique column):
AND ts.name != 'Completed'
Either way, you should post sample data and expected result.
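One way to merge the two statements is to put the SELECT into a `WHERE task_id IN (...)` clause. The sketch below is only an illustration under assumptions: the column layout is guessed from the question, the sample rows are invented, the `user` bridge table is left out for brevity, and it runs on SQLite through Python's sqlite3 module. Some engines (MySQL, for instance) refuse to read the table being updated inside such a subquery unless it is wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_status (task_status_id INTEGER PRIMARY KEY,
                          task_type_id INTEGER, name TEXT);
CREATE TABLE contract (contract_id INTEGER PRIMARY KEY,
                       user_id INTEGER, status TEXT);
CREATE TABLE task (task_id INTEGER PRIMARY KEY, user_id INTEGER,
                   contract_id INTEGER, task_type_id INTEGER,
                   task_status_id INTEGER);
-- Hypothetical sample data.
INSERT INTO task_status VALUES (1, 1, 'Open'), (2, 1, 'Completed');
INSERT INTO contract VALUES (100, 1, 'Closed'), (101, 2, 'Active');
INSERT INTO task VALUES
    (1, 1, 100, 1, 1),  -- open task, closed contract: should be updated
    (2, 2, 101, 1, 1),  -- open task, active contract: untouched
    (3, 1, 100, 1, 2);  -- already completed: untouched
""")

# The SELECT from the question becomes the IN (...) filter of the UPDATE.
conn.execute("""
UPDATE task
SET task_status_id = (SELECT task_status_id FROM task_status
                      WHERE task_type_id = 1 AND name = 'Completed')
WHERE task_id IN (
    SELECT t.task_id
    FROM task t
    JOIN contract co ON co.user_id = t.user_id
    JOIN task_status ts ON ts.task_status_id = t.task_status_id
    WHERE co.status = 'Closed'
      AND t.task_type_id = 1
      AND ts.name != 'Completed'
    GROUP BY t.task_id
    HAVING COUNT(t.contract_id) <= 2
)
""")

statuses = conn.execute(
    "SELECT task_id, task_status_id FROM task ORDER BY task_id"
).fetchall()
print(statuses)
```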

SELECT DISTINCT count() in Microsoft Access

I've created a database where we can track bugs we have raised with our developers (Table: ApplixCalls) and track any correspondence related to the logged bugs (Table: Correspondence).
I'm trying to create a count where we can see the number of bugs which have no correspondence or only correspondence from us. This should give us the visibility to see where we should be chasing our developers for updates etc.
So far I have this SQL:
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
I'm finding that this part is counting every occasion we have sent an update, when I need it to count 1 where OurRef is unique and it only has updates from us:
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
Hopefully that makes sense...
Is there a way around this?

MS Access does not support count(distinct). In your case, you can use a subquery. In addition, your query should not work. Perhaps this is what you intend:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (Correspondence.OurRef Is Null AND ApplixCalls.Position <> 'Closed') OR
(ApplixCalls.Position <> 'Closed' AND Correspondence.[SBSUpdate?] = True)
GROUP BY ApplixCalls.OurRef
) as x;
Modifications:
You have a HAVING clause with no GROUP BY. I think this should be a WHERE (although I am not 100% sure of the logic you intend).
The SELECT DISTINCT is replaced by SELECT . . . GROUP BY.
The COUNT(DISTINCT) is now COUNT(*) with a subquery.
EDIT:
Based on the description in your comments:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (Correspondence.OurRef Is Null AND ApplixCalls.Position <> 'Closed') OR
(ApplixCalls.Position <> 'Closed' AND Correspondence.[SBSUpdate?] = True)
GROUP BY ApplixCalls.OurRef
HAVING SUM(IIF(Correspondence.[SBSUpdate?] = False, 1, 0)) = 0
) as x;
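A small sketch of the "COUNT(*) over a grouped subquery" idea, run on SQLite via Python's sqlite3 module rather than on Access, so IIF is written as CASE and the column is named SBSUpdate without the question mark. The rows are invented. One assumption worth flagging: here the "only updates from us" test lives entirely in HAVING, because a WHERE that already discards the developers' rows would leave HAVING nothing to reject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ApplixCalls (OurRef INTEGER PRIMARY KEY, Position TEXT);
CREATE TABLE Correspondence (id INTEGER PRIMARY KEY, OurRef INTEGER,
                             SBSUpdate INTEGER);  -- 1 = sent by us
-- Hypothetical sample data.
INSERT INTO ApplixCalls VALUES (1, 'Open'), (2, 'Open'), (3, 'Open'), (4, 'Closed');
INSERT INTO Correspondence (OurRef, SBSUpdate) VALUES
    (2, 1), (2, 1),   -- bug 2: two updates, both from us
    (3, 1), (3, 0);   -- bug 3: the developers replied once
""")

# Inner query: one row per open bug with no developer correspondence.
# Outer query: count those rows, which replaces COUNT(DISTINCT ...).
(count,) = conn.execute("""
SELECT COUNT(*)
FROM (SELECT a.OurRef
      FROM ApplixCalls a
      LEFT JOIN Correspondence c ON a.OurRef = c.OurRef
      WHERE a.Position <> 'Closed'
      GROUP BY a.OurRef
      HAVING SUM(CASE WHEN c.SBSUpdate = 0 THEN 1 ELSE 0 END) = 0
     ) AS x
""").fetchone()
print(count)
```

Bug 1 (no correspondence) and bug 2 (only our updates) are counted once each, even though bug 2 has two matching rows.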

I cannot understand why you are using a HAVING clause. I hope this query will fulfill your need.
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);

If you are counting all the elements that match your condition, you don't need SELECT DISTINCT; DISTINCT is for removing duplicate results.
SELECT Count(distinct ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);

Column I don't want to group => COLUMN must appear IN the GROUP BY clause OR be used IN an aggregate FUNCTION

I am getting this error, the query works in SQLite.
ActiveRecord::StatementInvalid (PG::GroupingError: ERROR: COLUMN "receipts.is_read" must appear IN the GROUP BY clause OR be used IN an aggregate FUNCTION
Unsure what to do, it also occurs when I remove the coalesce part. I am not sure what else I can specify to make it work. Any ideas?
SELECT DISTINCT conversations.id, COALESCE(CASE WHEN receipts.is_read = 't' THEN 't' END, 'f') AS READ, conversations.updated_at, p1.nickname, p2.nickname
FROM conversations
INNER JOIN notifications ON notifications.conversation_id = conversations.id
INNER JOIN receipts ON receipts.notification_id = notifications.id
INNER JOIN profiles p1 ON p1.id = notifications.sender_id
INNER JOIN profiles p2 ON p2.id = receipts.receiver_id
WHERE (receipts.receiver_id = #{SELF.id} ) AND notifications.sender_id != #{SELF.id} OR (notifications.sender_id = #{SELF.id}) AND receipts.receiver_id != #{SELF.id}
GROUP BY conversations.id
ORDER BY conversations.updated_at DESC

You have an invalid mix of DISTINCT and GROUP BY and some un-grouped columns.
After replacing the mess with DISTINCT ON (and some other clean-up) this would work, but it's unclear what you actually want to achieve:
SELECT DISTINCT ON (c.id)
c.id, COALESCE(r.is_read, FALSE) AS read
, c.updated_at, p1.nickname AS nickname1, p2.nickname AS nickname2
FROM conversations c
JOIN notifications n ON n.conversation_id = c.id
JOIN receipts r ON r.notification_id = n.id
JOIN profiles p1 ON p1.id = n.sender_id
JOIN profiles p2 ON p2.id = r.receiver_id
WHERE (r.receiver_id = #{SELF.id} AND n.sender_id <> #{SELF.id} OR
r.receiver_id <> #{SELF.id} AND n.sender_id = #{SELF.id})
ORDER BY c.id, c.updated_at DESC;
This gets you the row with the latest conversations.updated_at out of each group sharing the same conversations.id. Explanation:
Select first row in each GROUP BY group?
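For readers not on PostgreSQL, the semantics of DISTINCT ON can be sketched in plain Python: sort the rows, then keep the first row of each group. The tuples below are hypothetical stand-ins for joined rows of the form (id, updated_at, is_read); this illustrates the behaviour only, not how the database implements it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (id, updated_at, is_read)
rows = [
    (1, "2024-01-02", True),
    (1, "2024-01-05", False),
    (2, "2024-01-03", None),
    (1, "2024-01-01", True),
]

# ORDER BY id, updated_at DESC: two stable sorts, secondary key first.
rows.sort(key=itemgetter(1), reverse=True)
rows.sort(key=itemgetter(0))

# DISTINCT ON (id): keep only the first row of each id group.
# COALESCE(is_read, FALSE): a missing flag becomes False.
latest = []
for row_id, group in groupby(rows, key=itemgetter(0)):
    _, updated_at, is_read = next(group)
    latest.append((row_id, updated_at, False if is_read is None else is_read))

print(latest)
```

Which row counts as "first" depends entirely on the ORDER BY, which is why the leading ORDER BY expressions have to match the DISTINCT ON expressions.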

Joined Tables Causing Interference

I have a large but somewhat straightforward SQL query. Basically, users on my site develop reputations for different types of activities, such as writing reviews, leaving comments, and adding entries to our database. For the most part, these points are stored in the reputable_actions table, and I retrieve them by LEFT JOINing the reputable_actions table repeatedly. This feels sloppy, but it mostly works.
The problem I'm experiencing is with two of the reputations, "reviewer" and "community." Unlike the others, they aren't stored in the reputable_actions table. Instead, their values are derived from the votes table, which I access by first LEFT JOINing the comments table. For some reason, joining the comments table causes all my other reputations to balloon. In one trial, the "archivist" reputation was supposed to be 25, but when I joined the comments, it jumped to 10050.
I'm a novice with SQL and I've tried what I know (namely, applying GROUP BY clauses to users.id), but I haven't had any luck yet. Some guidance would be greatly appreciated.
SELECT users.*,
SUM(COALESCE(reviewers.value, 0)) as reviewer,
SUM(COALESCE(communities.value,0)) as community,
SUM(COALESCE(developers.value,0)) as developer,
SUM(COALESCE(moderators.value,0)) as moderator,
SUM(COALESCE(marketers.value,0)) as marketer,
SUM(COALESCE(archivists.value,0)) as archivist,
SUM(COALESCE(karmas.value,0)) as karma
FROM `users`
LEFT JOIN comments AS impressions
ON impressions.user_id = users.id
AND impressions.type = 'impression'
LEFT JOIN comments AS replies
ON replies.user_id = users.id
AND replies.type = 'reply'
LEFT JOIN votes AS reviewers
ON reviewers.voteable_type = 'impression'
AND reviewers.voteable_id = impressions.id
LEFT JOIN votes AS communities
ON communities.voteable_type = 'reply'
AND communities.voteable_id = replies.id
LEFT JOIN reputable_actions AS developers
ON developers.reputation_type = 'developer'
AND developers.user_id = users.id
LEFT JOIN reputable_actions AS moderators
ON moderators.reputation_type = 'moderator'
AND moderators.user_id = users.id
LEFT JOIN reputable_actions AS marketers
ON marketers.reputation_type = 'marketer'
AND marketers.user_id = users.id
LEFT JOIN reputable_actions AS archivists
ON archivists.reputation_type = 'archivist'
AND archivists.user_id = users.id
LEFT JOIN reputable_actions AS karmas
ON karmas.reputation_type = 'karma'
AND karmas.user_id = users.id
GROUP BY users.id

Basically, you need to do two separate GROUP BYs and combine the results. There's a trick to avoid joining multiple times, though it may not be faster if you have many other voteable types or reputation types.
Select
u.*,
Coalesce(r.developer, 0) as developer,
Coalesce(r.moderator, 0) as moderator,
Coalesce(r.marketer, 0) as marketer,
Coalesce(r.archivist, 0) as archivist,
Coalesce(r.karma, 0) as karma,
Coalesce(v.impressions, 0) as impressions,
Coalesce(v.replies, 0) as replies
From
users u
Left Outer Join (
Select
user_id,
Sum(Case When reputation_type = 'developer' Then value Else 0 End) as developer,
Sum(Case When reputation_type = 'moderator' Then value Else 0 End) as moderator,
Sum(Case When reputation_type = 'marketer' Then value Else 0 End) as marketer,
Sum(Case When reputation_type = 'archivist' Then value Else 0 End) as archivist,
Sum(Case When reputation_type = 'karma' Then value Else 0 End) as karma
From
reputable_actions
Group By
user_id
) r On u.id = r.user_id
Left Outer Join (
Select
c.user_id,
Sum(Case When c.type = 'impression' Then v.value Else 0 End) as impressions,
Sum(Case When c.type = 'reply' Then v.value Else 0 End) as replies
From
comments c
inner join -- maybe left outer?
votes v
on v.voteable_type = c.type And v.voteable_id = c.id
Group By
c.user_id
) v On u.id = v.user_id
Example (with no data). If your tables are structured differently to this, let me know.

There are multiple rows in comments and/or votes that match one row in the "rest" of the join. This "multiplies" the resulting rows and multiplies the results of other aggregate functions with it (as you already noted).
The simplest solution is to get the SUM(reviewers.value) and SUM(communities.value) in a separate query.
BTW, you'll experience the same problem if there is ever more than one reputable_actions row (of the same reputation_type) matching the same users row.
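The multiplication is easy to reproduce on a toy data set. The sketch below uses SQLite through Python's sqlite3 module with invented rows (one user, three comments, two archivist actions worth 25 in total) and a trimmed-down, assumed schema; it compares the direct join with aggregating in a derived table first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
CREATE TABLE reputable_actions (id INTEGER PRIMARY KEY, user_id INTEGER,
                                reputation_type TEXT, value INTEGER);
-- Hypothetical sample data.
INSERT INTO users VALUES (1);
INSERT INTO comments (user_id, type) VALUES
    (1, 'impression'), (1, 'impression'), (1, 'impression');
INSERT INTO reputable_actions (user_id, reputation_type, value) VALUES
    (1, 'archivist', 10), (1, 'archivist', 15);
""")

# Joining both child tables directly: 3 comments x 2 actions = 6 rows,
# so every action value is summed three times.
(inflated,) = conn.execute("""
SELECT SUM(COALESCE(a.value, 0))
FROM users u
LEFT JOIN comments c ON c.user_id = u.id
LEFT JOIN reputable_actions a
       ON a.user_id = u.id AND a.reputation_type = 'archivist'
GROUP BY u.id
""").fetchone()

# Aggregating in a derived table first yields one row per user,
# so joining it cannot multiply anything.
(correct,) = conn.execute("""
SELECT COALESCE(r.archivist, 0)
FROM users u
LEFT JOIN (SELECT user_id,
                  SUM(CASE WHEN reputation_type = 'archivist'
                           THEN value ELSE 0 END) AS archivist
           FROM reputable_actions
           GROUP BY user_id) r ON r.user_id = u.id
""").fetchone()

print(inflated, correct)
```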

"Simple" SQL Query

Each of my clients can have many todo items and every todo item has a due date.
What would be the query for discovering the next undone todo item by due date for each file? In the event that a client has more than one todo, the one with the lowest id is the correct one.
Assuming the following minimal schema:
clients (id, name)
todos (id, client_id, description, timestamp_due, timestamp_completed)
Thank you.

I haven't tested this yet, so you may have to tweak it:
SELECT
TD1.client_id,
TD1.id,
TD1.description,
TD1.timestamp_due
FROM
Todos TD1
LEFT OUTER JOIN Todos TD2 ON
TD2.client_id = TD1.client_id AND
TD2.timestamp_completed IS NULL AND
(
TD2.timestamp_due < TD1.timestamp_due OR
(TD2.timestamp_due = TD1.timestamp_due AND TD2.id < TD1.id)
)
WHERE
TD1.timestamp_completed IS NULL AND
TD2.id IS NULL
Instead of trying to sort and aggregate, you're basically answering the question, "Is there any other todo that would come before this one?" (based on your definition of "before"). If not, then this is the one that you want.
This should be valid on most SQL platforms.
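A runnable sketch of this anti-join, using SQLite through Python's sqlite3 module and invented rows. It assumes the schema from the question and also requires the candidate row itself to be undone, so completed todos are never returned. Due dates are stored as ISO text here, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE todos (id INTEGER PRIMARY KEY, client_id INTEGER,
                    description TEXT, timestamp_due TEXT,
                    timestamp_completed TEXT);
-- Hypothetical sample data.
INSERT INTO todos VALUES
    (1, 1, 'done already', '2024-01-01', '2024-01-01'),
    (2, 1, 'call back',    '2024-02-01', NULL),
    (3, 1, 'send invoice', '2024-02-01', NULL),  -- same due date, higher id
    (4, 2, 'renew',        '2024-03-01', NULL),
    (5, 2, 'review',       '2024-01-15', NULL);
""")

# td2 looks for any undone todo of the same client that comes "before" td1;
# rows where no such td2 exists (td2.id IS NULL) are the winners.
next_todos = conn.execute("""
SELECT td1.client_id, td1.id, td1.description
FROM todos td1
LEFT JOIN todos td2
       ON td2.client_id = td1.client_id
      AND td2.timestamp_completed IS NULL
      AND (td2.timestamp_due < td1.timestamp_due
           OR (td2.timestamp_due = td1.timestamp_due AND td2.id < td1.id))
WHERE td1.timestamp_completed IS NULL
  AND td2.id IS NULL
ORDER BY td1.client_id
""").fetchall()

print(next_todos)
```

For client 1 the tie on the due date is broken by the lower id; for client 2 the earlier due date wins even though its id is higher.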

This question is the classic pick-a-winner for each group. It gets posted about twice a day.
SELECT *
FROM todos t
WHERE t.timestamp_completed is null
and
(
SELECT top 1 t2.id
FROM todos t2
WHERE t.client_id = t2.client_id
and t2.timestamp_completed is null
--there is no earlier record
and
(t.timestamp_due > t2.timestamp_due
or (t.timestamp_due = t2.timestamp_due and t.id > t2.id)
)
) is null

SELECT c.name, MIN(t.id)
FROM clients c, todos t
WHERE c.id = t.client_id AND t.timestamp_completed IS NULL
GROUP BY c.id
HAVING t.timestamp_due <= MIN(t.timestamp_due)
This avoids a subquery, correlated or otherwise, but introduces a bunch of aggregate operations, which aren't much better.

Some Jet SQL. I realize it is unlikely that the questioner is using Jet; however, the reader may be.
SELECT c.name, t.description, t.timestamp_due
FROM (clients c
INNER JOIN
(SELECT t.client_id, Min(t.id) AS MinOfid
FROM todos t
WHERE t.timestamp_completed Is Null
GROUP BY t.client_id) AS tm
ON c.id = tm.client_id)
INNER JOIN todos t ON tm.MinOfid = t.id

The following should get you close: first get the minimum due time for each client, then look up the client/todo information.
SELECT
C.Id,
C.Name,
T.Id,
T.Description,
T.timestamp_due
FROM
(
SELECT
client_id,
MIN(timestamp_due) AS "DueDate"
FROM todos
WHERE timestamp_completed IS NULL
GROUP BY client_id
) AS MinValues
INNER JOIN Clients C
ON (MinValues.client_id = C.Id)
INNER JOIN todos T
ON (MinValues.client_id = T.client_id
AND MinValues.DueDate = T.timestamp_due)
ORDER BY C.Name
NOTE: Written assuming SQL Server
