Issues with Counting Records with SQL across multiple tables - sql

There are three tables: Cases, Calls, and SubEvents.
The schema is hierarchical: a Case can have multiple Calls, and each Call can have multiple SubEvents.
I'd like to use a query to get a count of all calls and a count of all sub-events (if there are any) for each case after a certain date.
For example, a case named John has 3 calls... the first call has 2 sub-events, the second call has 1 sub-event, and the third call has zero. So the query should return this result:
Case | Call Total | SubEvent Total
-----+------------+---------------
John | 3          | 3
I've tried writing the query multiple ways, with subqueries etc, but I can't get it to work properly. The closest I've come is the query below, but this provides the incorrect count for Calls. It gives me 4 when it should give me 3. Another example I tried had 4 calls with 10 sub-events, but the query returned 11 total calls instead of 4.
I'd appreciate any help. My SQL has gotten rusty after a period of disuse, and this is in someone's Access database, which is pretty fickle when it comes to writing SQL queries.
SELECT c.casename, Count(e.callid) AS EventTotal, Count(s.id) AS SubEventTotal
FROM (cases AS c INNER JOIN calls AS e ON c.contactid = e.contactid) left JOIN tblSubEvents AS s ON e.callid = s.callid
WHERE e.calldate > #1/1/2022#
GROUP BY c.casename

Try this:
SELECT
c.casename
,Count(e.callid) AS EventTotal
,sum(s.id) AS SubEventTotal
FROM (cases AS c INNER JOIN calls AS e ON c.contactid = e.contactid)
left JOIN (select callid, count(s.id) as id from tblSubEvents s group by callid) AS s ON e.callid = s.callid
WHERE e.calldate > #1/1/2022#
GROUP BY c.casename
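Basically, this pushes the responsibility for the sub-event count into the joined subquery. Each call now matches at most one pre-aggregated row, so the join no longer duplicates calls and Count(e.callid) is no longer inflated.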
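To see the over-count and the effect of pre-aggregating, here is a minimal sketch that runs both queries with Python's built-in sqlite3 module. SQLite stands in for Access here (an assumption, so the date literal and COALESCE are SQLite syntax; in Access the equivalent of COALESCE would be Nz()). The table and column names follow the question, and the sample rows are made up to match the John example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (contactid INTEGER, casename TEXT);
    CREATE TABLE calls (callid INTEGER, contactid INTEGER, calldate TEXT);
    CREATE TABLE tblSubEvents (id INTEGER, callid INTEGER);
    -- John: call 10 has 2 sub-events, call 11 has 1, call 12 has none
    INSERT INTO cases VALUES (1, 'John');
    INSERT INTO calls VALUES
        (10, 1, '2022-02-01'), (11, 1, '2022-03-01'), (12, 1, '2022-04-01');
    INSERT INTO tblSubEvents VALUES (100, 10), (101, 10), (102, 11);
""")

# Original shape: the LEFT JOIN repeats a call once per sub-event,
# so COUNT(e.callid) counts joined rows, not calls.
naive = conn.execute("""
    SELECT c.casename, COUNT(e.callid), COUNT(s.id)
    FROM cases AS c
    INNER JOIN calls AS e ON c.contactid = e.contactid
    LEFT JOIN tblSubEvents AS s ON e.callid = s.callid
    WHERE e.calldate > '2022-01-01'
    GROUP BY c.casename
""").fetchall()

# Pre-aggregated shape: one row per call, carrying its sub-event count.
fixed = conn.execute("""
    SELECT c.casename, COUNT(e.callid), COALESCE(SUM(s.n), 0)
    FROM cases AS c
    INNER JOIN calls AS e ON c.contactid = e.contactid
    LEFT JOIN (SELECT callid, COUNT(id) AS n
               FROM tblSubEvents
               GROUP BY callid) AS s
        ON e.callid = s.callid
    WHERE e.calldate > '2022-01-01'
    GROUP BY c.casename
""").fetchall()

print(naive)  # [('John', 4, 3)]
print(fixed)  # [('John', 3, 3)]
```

The naive result reproduces the 4 reported in the question: call 10 appears twice because it has two sub-events.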

Consider joining two aggregate derived tables, each grouped by contactid so that both join one-to-one to the case:
SELECT c.casename, c_agg.EventTotal, e_agg.SubEventTotal
FROM (cases AS c
INNER JOIN (
SELECT contactid, COUNT(callid) AS EventTotal
FROM calls
WHERE calldate > CDate('2022-01-01')
GROUP BY contactid
) AS c_agg
ON c.contactid = c_agg.contactid)
LEFT JOIN (
SELECT e.contactid, COUNT(s.id) AS SubEventTotal
FROM calls AS e
INNER JOIN tblSubEvents AS s ON e.callid = s.callid
WHERE e.calldate > CDate('2022-01-01')
GROUP BY e.contactid
) AS e_agg
ON c.contactid = e_agg.contactid

Related

Sum fields of an Inner join

How I can add two fields that belong to an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfFloors,
ACT.NumOfJournals
ORDER BY ACT.NumberOfFloors DESC
However, it does not perform the complete sum. It returns the column values row by row instead of a single total for the whole column.
For example, the query returns these results:
What I expect are the final sums: NumberOfPlants should total 163,237 and NumberOfJornals should total 61.
How can I do this?
First of all, the (NOLOCK) hints are probably not giving you the benefit you hope for. NOLOCK is not an automatic "go faster" option; if such an option existed, you can be sure it would already be enabled. It can help in some situations, but it works by allowing dirty reads of uncommitted data, and the situations where it is most likely to improve anything are the same situations where the risk of reading bad data is highest.
That out of the way, with that much code in the question we're better served with a general explanation and solution for you to adapt.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
1. Remove the GROUP BY. If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
2. Nest the query. If option 1 is somehow not possible (say, the original is actually a view), you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.
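As a quick illustration of the difference, here is a small sketch using Python's built-in sqlite3 module (an assumption; the question itself is about SQL Server). The table t and its rows are made up for the example; the column names follow the generic queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ColumnA TEXT, ColumnB INTEGER);
    INSERT INTO t VALUES ('x', 1), ('x', 2), ('y', 10), ('z', 100);
""")

# With GROUP BY: one SUM per ColumnA group, even though ColumnA is not selected.
per_group = conn.execute(
    "SELECT SUM(ColumnB) AS SumB FROM t GROUP BY ColumnA ORDER BY ColumnA"
).fetchall()

# Option 1: drop the GROUP BY to get a single total.
no_group = conn.execute("SELECT SUM(ColumnB) AS SumB FROM t").fetchall()

# Option 2: keep the grouped query and sum its results in an outer query.
nested = conn.execute("""
    SELECT SUM(SumB)
    FROM (SELECT SUM(ColumnB) AS SumB FROM t GROUP BY ColumnA) AS g
""").fetchall()

print(per_group)  # [(3,), (10,), (100,)]
print(no_group)   # [(113,)]
print(nested)     # [(113,)]
```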

Is it possible to count rows based on column value in a single query?

Let's start with an example.
I have 2 tables: Therapists and their Sessions.
I have this query:
select therapists.id, s1.completed_count, s2.canceled_count
from therapists
inner join (select therapist_id, count(*) as completed_count
from sessions
where sessions.status = 'complete'
group by therapist_id) s1 on s1.therapist_id = therapists.id
inner join (select therapist_id, count(*) as canceled_count
from sessions
where sessions.status = 'canceled'
group by therapist_id) s2 on s2.therapist_id = therapists.id;
Is it possible to count sessions like this with a single subquery join? I have several statuses, and I don't think writing that many joins is good for performance.
If you have a static number of statuses, numerous JOINs are fine. Alternatively, you could consider something along the lines of crosstab to "pivot" a row result to a column result. See PostgreSQL: tablefunc Module.
If you have a dynamic number of statuses, you could perhaps leverage array_agg to convert row results to an array. See PostgreSQL: Array Functions and Operators.
Simple: join once and count conditionally. (The tables are aliased, so the columns must be referenced through the aliases t and s.)
SELECT
t.id,
COUNT(
CASE WHEN s.status = 'complete' THEN 1 END
) AS completed_count,
COUNT(
CASE WHEN s.status = 'canceled' THEN 1 END
) AS canceled_count
FROM therapists t
JOIN sessions s ON s.therapist_id = t.id
GROUP BY t.id
;
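A minimal sketch of this conditional-count pattern, run through Python's built-in sqlite3 module rather than PostgreSQL (an assumption made so the example is self-contained; the sample rows are invented). COUNT ignores NULLs, and a CASE without ELSE yields NULL, so each COUNT only sees the rows with the matching status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE therapists (id INTEGER PRIMARY KEY);
    CREATE TABLE sessions (therapist_id INTEGER, status TEXT);
    INSERT INTO therapists VALUES (1), (2);
    INSERT INTO sessions VALUES
        (1, 'complete'), (1, 'complete'), (1, 'canceled'),
        (2, 'complete');
""")

rows = conn.execute("""
    SELECT t.id,
           COUNT(CASE WHEN s.status = 'complete' THEN 1 END) AS completed_count,
           COUNT(CASE WHEN s.status = 'canceled' THEN 1 END) AS canceled_count
    FROM therapists AS t
    JOIN sessions AS s ON s.therapist_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(rows)  # [(1, 2, 1), (2, 1, 0)]
```

Note that therapist 2 is reported with 0 canceled sessions. The two-INNER-JOIN version from the question would drop that therapist entirely, because the canceled subquery has no row for them.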

Aggregate query across two tables in SQL?

I'm working in BigQuery. I've got two tables:
TABLE: orgs
code: STRING
group: STRING
TABLE: org_employees
code: STRING
employee_count: INTEGER
The code in each table is effectively a foreign key. I want to get all unique groups, with a count of the orgs in them, and (this is the tricky bit) a count of how many of those orgs only have a single employee. Data that looks like this:
group,orgs,single_handed_orgs
00Q,23,12
00K,15,7
I know how to do the first bit, get the unique groups and count of associated orgs from the orgs table:
SELECT
count(code), group
FROM
[orgs]
GROUP BY group
And I know how to flag the single-handed orgs in the org_employees table:
SELECT
code,
(employee_count==1) AS is_single_handed
FROM
[org_employees]
But I'm not sure how to glue them together. Can anyone help?
For BigQuery legacy SQL:
SELECT
[group],
COUNT(o.code) as orgs,
SUM(employee_count = 1) as single_handed_orgs
FROM [orgs] AS o
LEFT JOIN [org_employees] AS e
ON e.code = o.code
GROUP BY [group]
The LEFT JOIN is used in case some codes are missing from the org_employees table.
For BigQuery standard SQL:
SELECT
grp,
COUNT(o.code) AS orgs ,
SUM(CASE employee_count WHEN 1 THEN 1 ELSE 0 END) AS single_handed_orgs
FROM orgs AS o
LEFT JOIN org_employees AS e
ON e.code = o.code
GROUP BY grp
Note the use of grp instead of group: group is a reserved keyword, so standard SQL does not accept it as a bare column name.
Confirmed: you can use the keyword as a column name if you put backticks around it (`group`).
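The join-then-aggregate shape can be tried locally with a small sketch. This uses Python's built-in sqlite3 module instead of BigQuery (an assumption, as is the sample data); the column is named grp as in the standard SQL version. Org C deliberately has no row in org_employees to show what the LEFT JOIN does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orgs (code TEXT, grp TEXT);
    CREATE TABLE org_employees (code TEXT, employee_count INTEGER);
    INSERT INTO orgs VALUES ('A', '00Q'), ('B', '00Q'), ('C', '00Q'), ('D', '00K');
    -- org C has no employee row at all
    INSERT INTO org_employees VALUES ('A', 1), ('B', 5), ('D', 1);
""")

rows = conn.execute("""
    SELECT o.grp,
           COUNT(o.code) AS orgs,
           SUM(CASE e.employee_count WHEN 1 THEN 1 ELSE 0 END) AS single_handed_orgs
    FROM orgs AS o
    LEFT JOIN org_employees AS e ON e.code = o.code
    GROUP BY o.grp
    ORDER BY o.grp
""").fetchall()

print(rows)  # [('00K', 1, 1), ('00Q', 3, 1)]
```

Org C is still counted in the orgs column for 00Q, but contributes 0 to single_handed_orgs because its employee_count is NULL. With an INNER JOIN it would disappear from both counts.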
You could join the two tables to get the groups that have just one employee. Then you wrap this in a sub query and you count the groups that you have.
I'm using a COUNT DISTINCT and GROUP BY because I don't know how your data is structured. Is there only a single line per group or multiple?
SELECT
COUNT(DISTINCT group)
FROM (
SELECT
group
FROM
orgs AS o INNER JOIN org_employees AS e ON o.code = e.code
WHERE
employee_count = 1
GROUP BY
group
)

SQL select with join are returning double results

I am trying to select some data from different tables using join.
First, here is my SQL (MS) query:
SELECT Polls.pollID,
Members.membername,
Polls.polltitle, (SELECT COUNT(*) FROM PollChoices WHERE pollID=Polls.pollID) AS 'choices',
(SELECT COUNT(*) FROM PollVotes WHERE PollVotes.pollChoiceID = PollChoices.pollChoicesID) AS 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID;
And the tables involved in this query are here:
The query returns this result:
pollID | membername | polltitle | choices | votes
---------+------------+-----------+---------+-------
10000036 | TestName | Test Title| 2 | 0
10000036 | TestName | Test Title| 2 | 1
Any help will be greatly appreciated.
Your INNER JOIN with PollChoices is bringing in more than 1 row for a given poll as there are 2 choices for the poll 10000036 as indicated by choices column.
You can change the query to use GROUP BY and get the counts.
In case you don't have entries for each member in the PollVotes or Polls table, you need to use LEFT JOIN
SELECT Polls.pollID,
Members.membername,
Polls.polltitle,
-- DISTINCT keeps a choice with several votes from being counted more than once
COUNT(DISTINCT PollChoices.pollChoicesID) as 'choices',
COUNT(PollVotes.pollvoteId) as 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID
-- LEFT JOIN keeps choices that have no votes yet
LEFT JOIN PollVotes
ON PollVotes.pollChoiceID = PollChoices.pollChoicesID
AND PollVotes.memberID = Members.memberID
GROUP BY Polls.pollID,
Members.membername,
Polls.polltitle
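One thing to watch with a grouped query over Polls, PollChoices and PollVotes is that every vote repeats its choice row, so a plain COUNT on the choice side is inflated. Here is a rough sketch of that effect using Python's built-in sqlite3 module (an assumption; the question is about SQL Server). The Members table is left out for brevity, and the pollvoteId column and the sample rows are hypothetical, since the schema image is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Polls (pollID INTEGER, polltitle TEXT);
    CREATE TABLE PollChoices (pollChoicesID INTEGER, pollID INTEGER);
    CREATE TABLE PollVotes (pollvoteId INTEGER, pollChoiceID INTEGER);
    INSERT INTO Polls VALUES (10000036, 'Test Title');
    INSERT INTO PollChoices VALUES (1, 10000036), (2, 10000036);
    -- three votes for choice 1, none for choice 2
    INSERT INTO PollVotes VALUES (1, 1), (2, 1), (3, 1);
""")

rows = conn.execute("""
    SELECT p.pollID,
           COUNT(pc.pollID)                 AS inflated_choices,
           COUNT(DISTINCT pc.pollChoicesID) AS choices,
           COUNT(pv.pollvoteId)             AS votes
    FROM Polls AS p
    INNER JOIN PollChoices AS pc ON pc.pollID = p.pollID
    LEFT JOIN PollVotes AS pv ON pv.pollChoiceID = pc.pollChoicesID
    GROUP BY p.pollID
""").fetchall()

print(rows)  # [(10000036, 4, 2, 3)]
```

The join produces four rows (three for choice 1, one for choice 2), which is why the plain COUNT reports 4 choices while COUNT(DISTINCT ...) reports the expected 2.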
You are getting 1 row for each PollChoices record since there are multiple choices per Polls INNER JOIN Members. You may be expecting the SELECT COUNT(*) sub-queries to act as a GROUP BY clause, but they don't.
If that doesn't make sense, add a bare minimum of sample data and the expected result and we can help more.
This query result is telling you the number of votes per choice in each poll.
In your example, this voter named TestName answered the poll (with ID 10000036) and gave one choice 1 vote, and the second choice 0 votes. This is why you are getting two rows in your result.
I'm not sure if you are expecting just one row because you didn't specify what data, exactly, you are trying to select. However, if you are trying to see only the choices that received at least one vote, along with their vote counts, then you will have to modify your query like this:
select * from
(SELECT Polls.pollID,
Members.membername,
Polls.polltitle, (SELECT COUNT(*) FROM PollChoices WHERE pollID=Polls.pollID) AS 'choices',
(SELECT COUNT(*) FROM PollVotes WHERE PollVotes.pollChoiceID = PollChoices.pollChoicesID) AS 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID) as mysubquery where votes <> 0;

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
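For what it's worth, the reason the self join returns more records than the table holds is that every row matches every other row with the same rep_id and hcp_cid, so a group of n rows becomes n * n joined rows. A small sketch with Python's built-in sqlite3 module (an assumption; the sample rows are invented, and the threshold of 2 is taken from the question's HAVING clause, not from the unstated business rules):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SE_Violation_Detection (rep_id INTEGER, rep_name TEXT, hcp_cid INTEGER);
    INSERT INTO SE_Violation_Detection VALUES
        (1, 'Ann', 500), (1, 'Ann', 500), (1, 'Ann', 500),
        (2, 'Bob', 600);
""")

# Self join: Ann's 3 rows pair up with each other, giving 3 * 3 = 9.
self_join = conn.execute("""
    SELECT a.rep_id, a.rep_name, COUNT(*)
    FROM SE_Violation_Detection AS a
    INNER JOIN SE_Violation_Detection AS b
        ON a.rep_id = b.rep_id AND a.hcp_cid = b.hcp_cid
    GROUP BY a.rep_id, a.rep_name
    ORDER BY a.rep_id
""").fetchall()

# Plain GROUP BY with HAVING: counts the actual rows per rep.
grouped = conn.execute("""
    SELECT rep_id, rep_name, COUNT(*)
    FROM SE_Violation_Detection
    GROUP BY rep_id, rep_name
    HAVING COUNT(*) >= 2
""").fetchall()

print(self_join)  # [(1, 'Ann', 9), (2, 'Bob', 1)]
print(grouped)    # [(1, 'Ann', 3)]
```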
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
--Rep violation for 5 or more different HCPs (distinct count > 4)
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)