Return count of total group membership when providers are part of a group - sql

TABLE A: Pre-joined table - Holds a list of providers who belong to a group and the group the provider belongs to. Columns are something like this:
ProviderID (PK, FK) | ProviderName | GroupID | GroupName
1234 | LocalDoctor | 987 | LocalDoctorsUnited
5678 | Physican82 | 987 | LocalDoctorsUnited
9012 | Dentist13 | 153 | DentistryToday
0506 | EyeSpecial | 759 | OphtaSpecialist
TABLE B: Another pre-joined table, holds a list of providers and their demographic information. Columns as such:
ProviderID (PK,FK) | ProviderName | G_or_I | OtherColumnsThatArentInUse
1234 | LocalDoctor | G | Etc.
5678 | Physican82 | G | Etc.
9012 | Dentist13 | I | Etc.
0506 | EyeSpecial | I | Etc.
The expected result is something like this:
ProviderID | ProviderName | ProviderStatus | GroupCount
1234 | LocalDoctor | Group | 2
5678 | Physican82 | Group | 2
9012 | Dentist13 | Individual | N/A
0506 | EyeSpecial | Individual | N/A
The goal is to determine whether or not a provider belongs to a group or operates individually, by the G_or_I column. If the provider belongs to a group, I need to include an additional column that provides the count of total providers in that group.
The Group/Individual portion is relatively easy - I've done something like this:
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus
FROM
TableA A
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID;
So far so good, this returns the expected results based on the G_or_I flag.
However, I can't seem to wrap my head around how to complete the COUNT portion. I feel like I may be overthinking it, and stuck in a loop of errors. Some things I've tried:
Add a second CASE STATEMENT:
CASE
WHEN B.G_or_I = 'G'
THEN (
SELECT CountedGroups
FROM (
SELECT ProviderID, count(GroupID) AS CountedGroups
FROM TableA
WHERE A.ProviderID = B.ProviderID
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
)
)
ELSE 'N/A' END
This returns an error stating that a single-row sub-query is returning more than one row. If I limit the number of rows returned to 1, the CountedGroups column returns 1 for every row. This makes me think that it's not performing the count function as I expect it to.
I've also tried including a direct count of TableA as a factored sub-query:
WITH CountedGroups AS
( SELECT ProviderID, count(GroupID) AS GroupSum
FROM TableA
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
) --This as a standalone query works just fine
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus,
CASE
WHEN B.G_or_I = 'G'
THEN GroupSum
ELSE 'N/A' END
FROM
CountedGroups CG
JOIN TableA A
ON CG.ProviderID = A.ProviderID
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID
This returns either null or completely incorrect column values.
Other attempts have been a number of variations of this, with a mix of bad results or Oracle errors. As I mentioned above, I'm probably way overthinking it and the solution could be rather simple. Apologies if the information is confusing or I've not provided enough detail. The real tables have a lot of private medical information, and I tried to translate the essence of the issue as best I could.
Thank you.

You can use the CASE..WHEN and analytical function COUNT as follows:
SELECT
A.PROVIDERID,
A.PROVIDERNAME,
CASE
WHEN B.G_OR_I = 'G' THEN 'Group'
ELSE 'Individual'
END AS PROVIDERSTATUS,
CASE
WHEN B.G_OR_I = 'G' THEN TO_CHAR(COUNT(1) OVER(
PARTITION BY A.GROUPID
))
ELSE 'N/A'
END AS GROUPCOUNT
FROM
TABLE_A A
JOIN TABLE_B B ON A.PROVIDERID = B.PROVIDERID;
TO_CHAR is needed around COUNT because every branch of a CASE expression must return the same data type, and the ELSE branch returns the string 'N/A'.
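A minimal sketch of the same idea that can be run locally, assuming Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later) in place of Oracle. CAST(... AS TEXT) stands in for TO_CHAR here, and the table layout is reduced to the columns shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ProviderID TEXT PRIMARY KEY, ProviderName TEXT,
                     GroupID INTEGER, GroupName TEXT);
CREATE TABLE TableB (ProviderID TEXT PRIMARY KEY, ProviderName TEXT, G_or_I TEXT);
INSERT INTO TableA VALUES
  ('1234', 'LocalDoctor', 987, 'LocalDoctorsUnited'),
  ('5678', 'Physican82',  987, 'LocalDoctorsUnited'),
  ('9012', 'Dentist13',   153, 'DentistryToday'),
  ('0506', 'EyeSpecial',  759, 'OphtaSpecialist');
INSERT INTO TableB VALUES
  ('1234', 'LocalDoctor', 'G'),
  ('5678', 'Physican82',  'G'),
  ('9012', 'Dentist13',   'I'),
  ('0506', 'EyeSpecial',  'I');
""")

# COUNT(*) OVER (PARTITION BY GroupID) counts every provider sharing the row's group
# without collapsing rows, so no GROUP BY or DISTINCT is needed.
rows = conn.execute("""
SELECT A.ProviderID,
       A.ProviderName,
       CASE WHEN B.G_or_I = 'G' THEN 'Group' ELSE 'Individual' END AS ProviderStatus,
       CASE WHEN B.G_or_I = 'G'
            THEN CAST(COUNT(*) OVER (PARTITION BY A.GroupID) AS TEXT)
            ELSE 'N/A'
       END AS GroupCount
FROM TableA A
JOIN TableB B ON A.ProviderID = B.ProviderID
ORDER BY A.ProviderID
""").fetchall()

for row in rows:
    print(row)
```

With the sample rows from the question, the two LocalDoctorsUnited providers should both report '2' and the two individuals 'N/A'.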

Your problem seems to be that you are missing a column. You need to add group name, otherwise you won't be able to differentiate rows for the same practitioner who works under multiple business entities (groups). This is probably why you have a DISTINCT on your query. Things looked like duplicates which weren't. Once you've done that, just use an analytic function to figure out the rest:
SELECT ta.providerid,
ta.providername,
DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
ta.groupname,
CASE
WHEN tb.g_or_i = 'G' THEN TO_CHAR(COUNT(DISTINCT ta.providerid) OVER (PARTITION BY ta.groupid))
ELSE 'N/A'
END AS GROUP_COUNT
FROM table_a ta
INNER JOIN table_b tb ON ta.providerid = tb.providerid;
Is it possible that your LEFT JOIN was going the wrong direction? It makes more sense that your base demographic table would have all practitioners in it and then the Group table might be missing some records. For instance if the solo prac was operating under their own SSN and Type I NPI without applying for a separate Type II NPI or TIN.

Related

postgresql total column sum

SELECT pp.id, TO_CHAR(pp.created_dt::date, 'dd.mm.yyyy') AS "Date", CAST(pp.created_dt AS time(0)) AS "Time",
au.username AS "User", ss.name AS "Service", pp.amount, REPLACE(pp.status, 'SUCCESS', ' ') AS "Status",
pp.account AS "Props", pp.external_id AS "External", COALESCE(pp.external_status, null, 'indefined') AS "External status"
FROM payment AS pp
INNER JOIN auth_user AS au ON au.id = pp.creator_id
INNER JOIN services_service AS ss ON ss.id = pp.service_id
WHERE pp.created_dt::date = (CURRENT_DATE - INTERVAL '1' day)::date
AND ss.name = 'Some Name' AND pp.status = 'SUCCESS'
id | Date | Time | Service |amount | Status |
------+-----------+-----------+------------+-------+--------+---
9 | 2021.11.1 | 12:20:01 | some serv | 100 | stat |
10 | 2021.12.1 | 12:20:01 | some serv | 89 | stat |
------+-----------+-----------+------------+-------+--------+-----
Total | | | | 189 | |
I have a SELECT like this. I need to get something like the one shown above. That is, I need to get the total of one column. I've tried a lot of things already, but nothing works out for me.
If I understand correctly, you want an extra row with the aggregated value appended after the result of the original query. You can achieve this in several ways:
1. (recommended) The simplest way is probably to union your original query with a helper query:
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv','stat',89)
)
select t.id::text, t.other_column1, t.other_column2, t.amount from t
union all
select 'Total', null, null, sum(amount) from t
2. You can also use the group by rollup clause, whose purpose is exactly this. Your case makes it harder because your query contains many columns not involved in the aggregation. Hence it is better to compute the aggregation separately and join the remaining columns afterwards:
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv','stat',89)
)
select case when t.id is null then 'Total' else t.id::text end as id
, t.other_column1
, t.other_column2
, case when t.id is null then ext.sum else t.amount end as amount
from (
select t.id, sum(amount) as sum
from t
group by rollup(t.id)
) ext
left join t on ext.id = t.id
order by ext.id
3. For completeness, here is what has to be done to avoid the join. In that case the group by clause has to use all columns except amount (to preserve the original rows) plus the empty grouping set (to get the sum row), hence the grouping sets clause with 2 sets is handy. (The rollup clause is a special case of grouping sets, after all.) The obvious drawback is repeating the case grouping... expression for each column not involved in the aggregation.
with t(id,other_column1,other_column2,amount) as (values
(9,'some serv','stat',100),
(10,'some serv2','stat',89)
)
select case grouping(t.id) when 0 then t.id::text else 'Total' end as id
, case grouping(t.id) when 0 then t.other_column1 end as other_column1
, case grouping(t.id) when 0 then t.other_column2 end as other_column2
, sum(t.amount) as amount
from t
group by grouping sets((t.id, t.other_column1, t.other_column2), ())
order by t.id
(To be frank, I can hardly imagine any purpose other than plain reporting where a column mixes id of number type with label Total of text type.)
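The union approach (option 1) can be tried out with a small sketch like the following, assuming Python's built-in sqlite3 module rather than PostgreSQL. The payment table here is a hypothetical three-column reduction of the one in the question, and the is_total helper column is an addition that keeps the total row last, since UNION ALL by itself does not guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (id INTEGER PRIMARY KEY, service TEXT, amount INTEGER);
INSERT INTO payment VALUES (9, 'some serv', 100), (10, 'some serv', 89);
""")

# Detail rows and the single aggregated row are stacked with UNION ALL.
# id is cast to text so it can share a column with the 'Total' label.
report = conn.execute("""
SELECT id, service, amount
FROM (
    SELECT CAST(id AS TEXT) AS id, service, amount, 0 AS is_total FROM payment
    UNION ALL
    SELECT 'Total', NULL, SUM(amount), 1 FROM payment
)
ORDER BY is_total, CAST(id AS INTEGER)
""").fetchall()

for row in report:
    print(row)
```

Sorting on CAST(id AS INTEGER) within the detail rows avoids the text ordering that would otherwise place '10' before '9'.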

Mandatory condition matching in ´where-in´ clause

Consider a simple where clause
select * from table_abc where col_a in (1,2,3)
I know the current conditions
If 1,2,3 are absent, I will not get any results
If 1,2,3 are present, I will get all results associated with 1,2,3
If 1 is present and 2,3 is absent, I will get only results associated with 1.
My question is if we can execute the query for the condition for
If 1 is present and 2,3 is absent, I should still get all results associated with 1,2,3
However, if 1,2,3 are absent, I will not get any results
In other words, can I have a particular value in the where-in clause set as mandatory? How can we change the current query?
EDIT: As pointed out in the comments, I forgot to add the table structure. It is better that I explain the use case as well.
Table 1 : Admins
ID admin_id
-------------
1 001
2 002
Table 2 : Events
ID event_id
-------------
1 110
2 220
Table 3 : Admins_Events
admin_id event_id
-------------
001 110
001 220
002 220
Now, as a part of filtering, let's say I have the query
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (110) AND admins.admin_id IN (001))
And currently, I am getting the results as
admin_id event_id
-------------
001 110
where as I would want something like
admin_id event_id
-------------
001 110
001 220
I have to still show the other events associated with the admin even though I do not pass it in the where-in clause. I was thinking to pass all the event_id's every time and match the mandatory event_id and also match the remaining event_ids in case the mandatory event_id is found.
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (mandatory[110], 220) AND admins.admin_id IN (001))
How can I change the query?
Add another condition with EXISTS in the WHERE clause:
SELECT a.admin_id, e.event_id
FROM Admins a
LEFT JOIN Admins_Events ae ON ae.admin_id = a.admin_id
LEFT JOIN Events e ON e.event_id = ae.event_id
WHERE (e.event_id IN (110, 220) AND a.admin_id IN (001))
AND EXISTS (
SELECT 1 FROM Admins_Events
WHERE event_id = 110 AND admin_id = a.admin_id
)
Results:
| admin_id | event_id |
| -------- | -------- |
| 1 | 110 |
| 1 | 220 |
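As a rough, runnable illustration of the EXISTS condition, here is a sketch assuming Python's built-in sqlite3 module. It is simplified to the admins_events join table only, and the event 999 used for the negative case is a made-up value that no admin has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admins_events (admin_id TEXT, event_id INTEGER);
INSERT INTO admins_events VALUES ('001', 110), ('001', 220), ('002', 220);
""")

# The IN list selects which events to show; the EXISTS clause makes one
# event mandatory: an admin without it contributes no rows at all.
QUERY = """
SELECT ae.admin_id, ae.event_id
FROM admins_events ae
WHERE ae.event_id IN (110, 220)
  AND EXISTS (
      SELECT 1 FROM admins_events m
      WHERE m.admin_id = ae.admin_id AND m.event_id = ?
  )
ORDER BY ae.admin_id, ae.event_id
"""

with_110 = conn.execute(QUERY, (110,)).fetchall()  # mandatory event is 110
with_999 = conn.execute(QUERY, (999,)).fetchall()  # mandatory event nobody has

print(with_110)
print(with_999)
```

Admin 002 is linked to event 220 but not to 110, so it drops out of the first result even though 220 is in the IN list.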
It sounds like in your example you want all events associated with admins who are associated with event 110. In MySQL I'd do it with the following query, which joins admins to events twice: once to filter for the event you need, and once to get all the other events.
However, in your example, you don't need the admins table in the query at all, since you just need the admin ID, which you can get directly from the admins_events table. I left it in, in case your real "admins" table also had other attributes you wanted (name, location, etc) which are not available in the admins_events join table.
The specific query is:
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
JOIN "admins_events" AS "specific_admins_events" ON "specific_admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" AS "specific_events" ON "specific_events"."event_id" = "specific_admins_events"."event_id"
WHERE (specific_events.event_id IN (110))
First, you only need the admins_events table.
This method uses window functions:
SELECT ae.*
FROM (SELECT ae.*,
SUM(CASE WHEN ae.event_id IN (110) THEN 1 ELSE 0 END) OVER (PARTITION BY ae.admin_id) as num_110
FROM admins_events ae
) ae
WHERE ae.admin_id IN ('001') AND -- assume this is a string
num_110 > 0 AND
ae.event_id IN (110, 220);

Filtering within Postgres aggregations

I have a table in Postgres called tasks. It records Mechanical Turk-style tasks. It has the following columns:
entity_name, text (the thing being reviewed)
reviewer_email, text (the email address of the person doing the reviewing)
result, boolean (the entry provided by the reviewer)
Each entity that needs to be reviewed leads to the generation of two task rows, each assigned to a different reviewer. When both reviewers disagree (e.g. their values for result are not equal), the application kicks off a third task, assigned to a moderator. The moderators always have the same email domain.
I'm trying to get, for each reviewer, the number of times they have been overruled or affirmed by a moderator. I think I'm fairly close, but the last bit is proving tricky:
SELECT
reviewer_email,
COUNT(*) FILTER(
WHERE entity_name IN (
SELECT entity_name
FROM tasks
GROUP BY entity_name
HAVING
COUNT(*) FILTER (WHERE result IS NOT NULL) = 3 -- find the entities that have exactly three reviews
AND
-- this is the tricky part:
-- need something like:
-- WHERE current_review.result = moderator_review.result
)
) AS overruled_count
FROM
tasks
WHERE
result IS NOT NULL
GROUP BY
reviewer_email
HAVING
reviewer_email NOT LIKE '%#moderators-domain.net'
Sample data:
id | entity_name | reviewer_email | result
1 | apple | bob#email.net | true
2 | apple | alice#email.net | false
3 | apple | mod##moderators-domain.net | true
4 | pair | bob#email.net | true
5 | pair | alice#email.net | false
6 | pair | mod##moderators-domain.net | false
7 | kiwi | bob#email.net | true
8 | kiwi | alice#email.net | true
Desired results:
reviewer_email | overruled_count | affirmed_count
bob#email.net | 1 | 1
alice#email.net | 1 | 1
Bob and Alice each have done three reviews. On one review, they agreed, therefore there was no moderation. They disagreed on the other two reviews and were overruled once, and affirmed once by the moderator.
I believe the code above has me on the right track, but I'm definitely interested in other approaches to this.
I think this is a harder problem than you might realize. The following appends the moderator review to each non-moderator review:
select t.*, tm.result as moderator_result
from tasks t join
tasks tm
on t.entity_name = tm.entity_name
where t.reviewer_email NOT LIKE '%#moderators-domain.net' and
tm.reviewer_email LIKE '%#moderators-domain.net';
From this, we can aggregate the results that you want:
select reviewer_email,
sum( (result = moderator_result)::int ) as moderator_agrees,
sum( (result <> moderator_result)::int ) as moderator_disagrees
from (select t.*, tm.result as moderator_result
from tasks t join
tasks tm
on t.entity_name = tm.entity_name
where t.reviewer_email NOT LIKE '%#moderators-domain.net' and
tm.reviewer_email LIKE '%#moderators-domain.net'
) t
group by reviewer_email;
There may be a way to do this using filter and even window functions. This method seems the most natural to me.
I should note that the subquery is not necessary, of course:
select t.reviewer_email,
sum( (t.result = tm.result)::int ) as moderator_agrees,
sum( (t.result <> tm.result)::int ) as moderator_disagrees
from tasks t join
tasks tm
on t.entity_name = tm.entity_name
where t.reviewer_email NOT LIKE '%#moderators-domain.net' and
tm.reviewer_email LIKE '%#moderators-domain.net'
group by t.reviewer_email;
Just adding some changes to make the query a bit easier to understand in my opinion.
I'm guessing we also need to consider the case where we have users who have never been either affirmed or overruled (so the counts for them would be 0)
SELECT
tasks.reviewer_email,
COUNT(*) FILTER (WHERE tasks.result = modtasks.result) AS affirmed_count,
COUNT(*) FILTER (WHERE tasks.result <> modtasks.result) AS overruled_count
FROM tasks
LEFT JOIN tasks modtasks
ON modtasks.entity_name = tasks.entity_name
AND modtasks.reviewer_email LIKE '%#moderators-domain.net'
WHERE tasks.reviewer_email NOT LIKE '%#moderators-domain.net'
GROUP BY tasks.reviewer_email
Sample data
CREATE TABLE tasks
("id" int, "entity_name" text, "reviewer_email" text, "result" boolean)
;
INSERT INTO tasks
("id", "entity_name", "reviewer_email", "result")
VALUES
(1, 'apple', 'bob#email.net', 'true'),
(2, 'apple', 'alice#email.net', 'false'),
(3, 'apple', 'mod##moderators-domain.net', 'true'),
(4, 'pair', 'bob#email.net', 'true'),
(5, 'pair', 'alice#email.net', 'false'),
(6, 'pair', 'mod##moderators-domain.net', 'false'),
(7, 'kiwi', 'bob#email.net', 'true'),
(8, 'kiwi', 'alice#email.net', 'true')
;
Query 1
WITH
CTE_moderated_tasks
AS
(
SELECT
id AS mod_id
,entity_name AS mod_entity_name
,result AS mod_result
FROM tasks
WHERE reviewer_email LIKE '%#moderators-domain.net'
)
SELECT
tasks.reviewer_email
,SUM(CASE WHEN tasks.result <> mod_result THEN 1 ELSE 0 END) AS overruled_count
,SUM(CASE WHEN tasks.result = mod_result THEN 1 ELSE 0 END) AS affirmed_count
FROM
CTE_moderated_tasks
INNER JOIN tasks ON
tasks.entity_name = CTE_moderated_tasks.mod_entity_name
AND tasks.id <> CTE_moderated_tasks.mod_id
GROUP BY
tasks.reviewer_email
I split the query into two parts.
At first I want to find all tasks where moderator was involved (CTE_moderated_tasks). It assumes that moderator can't be involved more than once in the same task.
This result is inner joined to original tasks table thus naturally filtering out all tasks where moderator was not involved. This also gives us moderator opinion next to the reviewer opinion. This assumes that there are only two reviewers for the same task.
All that is left now is simple grouping by reviewers and counting how many times reviewer's and moderator's opinions matched. I used a classic SUM(CASE ...) for this conditional aggregate.
You don't have to use CTE, I used it primarily for readability.
I'd also like to highlight that this query uses LIKE only during one scan of the table. If there is an index on entity_name the join may be rather efficient.
Result
| reviewer_email | overruled_count | affirmed_count |
|-----------------|-----------------|----------------|
| alice#email.net | 1 | 1 |
| bob#email.net | 1 | 1 |
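For anyone who wants to reproduce this result without a Postgres instance, here is a small sketch assuming Python's built-in sqlite3 module. Booleans are stored as 0/1 integers, which is an adaptation for SQLite, and the CTE is inlined as a plain self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER, entity_name TEXT, reviewer_email TEXT, result INTEGER);
INSERT INTO tasks VALUES
  (1, 'apple', 'bob#email.net', 1),
  (2, 'apple', 'alice#email.net', 0),
  (3, 'apple', 'mod##moderators-domain.net', 1),
  (4, 'pair',  'bob#email.net', 1),
  (5, 'pair',  'alice#email.net', 0),
  (6, 'pair',  'mod##moderators-domain.net', 0),
  (7, 'kiwi',  'bob#email.net', 1),
  (8, 'kiwi',  'alice#email.net', 1);
""")

# m = moderator rows, t = the other reviews of the same entity.
# The inner join drops entities that never reached a moderator (kiwi).
rows = conn.execute("""
SELECT t.reviewer_email,
       SUM(CASE WHEN t.result <> m.result THEN 1 ELSE 0 END) AS overruled_count,
       SUM(CASE WHEN t.result =  m.result THEN 1 ELSE 0 END) AS affirmed_count
FROM tasks m
JOIN tasks t ON t.entity_name = m.entity_name AND t.id <> m.id
WHERE m.reviewer_email LIKE '%#moderators-domain.net'
GROUP BY t.reviewer_email
ORDER BY t.reviewer_email
""").fetchall()

print(rows)
```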
Variant without self-join
Here is another variant without self-join, which may be more efficient. You need to test with your real data, indexes and hardware.
This query uses window function with partitioning by entity_name to bring moderator result for each row without explicit self-join. You can use any aggregate function here (SUM or MIN or MAX), because there will be at most one row from moderator for each entity_name.
Then a simple grouping with conditional aggregates gives us the counts.
Here conditional aggregate uses the fact that NULL compared to any value never returns true. mod_result for entities that don't have a moderator would have nulls and both result <> mod_result and result = mod_result would yield false, so such rows don't contribute to either count.
Final HAVING reviewer_email NOT LIKE '%#moderators-domain.net' removes the count of moderator results themselves.
Again, you don't have to use a CTE here; I used it primarily for readability. I'd recommend running just the CTE first and examining the intermediate results to understand how the query works.
Query 2
WITH
CTE
AS
(
SELECT
id
,entity_name
,reviewer_email
,result::int
,SUM(result::int)
FILTER (WHERE reviewer_email LIKE '%#moderators-domain.net')
OVER (PARTITION BY entity_name) AS mod_result
FROM tasks
)
SELECT
reviewer_email
,SUM(CASE WHEN result <> mod_result THEN 1 ELSE 0 END) AS overruled_count
,SUM(CASE WHEN result = mod_result THEN 1 ELSE 0 END) AS affirmed_count
FROM CTE
GROUP BY reviewer_email
HAVING reviewer_email NOT LIKE '%#moderators-domain.net'
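The window-function variant can be sketched the same way, assuming Python's built-in sqlite3 module with SQLite 3.25 or later. Two things differ from Query 2 and are adaptations for this sketch: MAX(CASE ...) OVER replaces SUM(...) FILTER (...) OVER, and the moderator rows are removed with WHERE instead of HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER, entity_name TEXT, reviewer_email TEXT, result INTEGER);
INSERT INTO tasks VALUES
  (1, 'apple', 'bob#email.net', 1),
  (2, 'apple', 'alice#email.net', 0),
  (3, 'apple', 'mod##moderators-domain.net', 1),
  (4, 'pair',  'bob#email.net', 1),
  (5, 'pair',  'alice#email.net', 0),
  (6, 'pair',  'mod##moderators-domain.net', 0),
  (7, 'kiwi',  'bob#email.net', 1),
  (8, 'kiwi',  'alice#email.net', 1);
""")

# mod_result carries the moderator's verdict onto every row of the same entity.
# For kiwi there is no moderator row, so mod_result is NULL and both comparisons
# below evaluate to NULL, falling through to ELSE 0.
rows = conn.execute("""
WITH with_mod AS (
    SELECT reviewer_email,
           result,
           MAX(CASE WHEN reviewer_email LIKE '%#moderators-domain.net' THEN result END)
               OVER (PARTITION BY entity_name) AS mod_result
    FROM tasks
)
SELECT reviewer_email,
       SUM(CASE WHEN result <> mod_result THEN 1 ELSE 0 END) AS overruled_count,
       SUM(CASE WHEN result =  mod_result THEN 1 ELSE 0 END) AS affirmed_count
FROM with_mod
WHERE reviewer_email NOT LIKE '%#moderators-domain.net'
GROUP BY reviewer_email
ORDER BY reviewer_email
""").fetchall()

print(rows)
```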

UPDATE SQL table based on query

I have a table called students with 1000 students in it. I have a query which tells me which of those students have free tuition. In the students table I have a field called FreeTuition and I want to populate/update that field with the results of the query. Do I need to use some kind of loop?
The students table has StuCode which is unique, the query returns StuCode of all the students with free tuition. This is how I want it to look:
| StuCode | FreeTuition |
-------------------------
| S12345 | Yes |
| S12346 | No |
-------------------------
Not at all. Something like this:
with yourquery as (
<your query here>
)
update s
set FreeTuition = (case when yq.StuCode is not null then 'Y' else 'N' end)
from students s left join
yourquery yq
on s.StuCode = yq.StuCode;
Note: This sets the value for all students, yes or no. To set the value only for students returned by the query, change the left join to an inner join.
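To show that no loop is needed, here is a set-based sketch assuming Python's built-in sqlite3 module. It swaps the UPDATE ... FROM join for a correlated EXISTS, which is a different technique chosen because it also works on older SQLite builds. The free_tuition_query table is a hypothetical stand-in for the result of your own query, and the Yes/No values follow the layout in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (StuCode TEXT PRIMARY KEY, FreeTuition TEXT);
INSERT INTO students (StuCode) VALUES ('S12345'), ('S12346');
-- Hypothetical stand-in for "your query": codes of students with free tuition.
CREATE TABLE free_tuition_query (StuCode TEXT);
INSERT INTO free_tuition_query VALUES ('S12345');
""")

# One statement updates every row: each student is checked against the
# query result, so no row-by-row loop is required.
conn.execute("""
UPDATE students
SET FreeTuition = CASE
        WHEN EXISTS (SELECT 1 FROM free_tuition_query q
                     WHERE q.StuCode = students.StuCode)
        THEN 'Yes' ELSE 'No'
    END
""")

rows = conn.execute(
    "SELECT StuCode, FreeTuition FROM students ORDER BY StuCode"
).fetchall()
print(rows)
```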

Self-referencing Query and Not Equals

I'm trying to pull data from a single table called tblTooling for two specific TlPartNo values, keeping only the rows whose TlToolNo is not shared between those two TlPartNo values. This is an Access DB and the following statement gets me close, but still returns too much data.
SELECT DISTINCT
tblTooling.TlToolNo,
tblTooling.TlPartNo,
tblTooling.TlOP,
tblTooling.TlQuantity
FROM tblTooling, tblTooling AS tblTooling_1
WHERE (((tblTooling.TlToolNo)<>tblTooling_1.TlToolNo)
AND ((tblTooling.TlPartNo)="10290722")
AND ((tblTooling_1.TlPartNo)="10295379"));
The included image has the tblTooling structure and Data. Plus the expected results from the query.
You seem to want to exclude a ToolNo value when it occurs with both PartNo values. In that case you could group intermediate results by ToolNo and check (with having) whether only one PartNo is present in each group. If so, keep that record, and in the outer query add the two other columns to it:
SELECT DISTINCT
tblTooling.TlToolNo,
tblTooling.TlPartNo,
tblTooling.TlOP,
tblTooling.TlQuantity
FROM tblTooling
INNER JOIN (
SELECT TlToolNo,
Min(TlPartNo) AS MinTlPartNo,
Max(TlPartNo) AS MaxTlPartNo
FROM tblTooling
WHERE TlPartNo IN ("10290722", "10295379")
GROUP BY TlToolNo
HAVING Min(TlPartNo) = Max(TlPartNo)
) AS grp
ON grp.TlToolNo = tblTooling.TlToolNo
AND grp.MinTlPartNo = tblTooling.TlPartNo
Note that for your sample data this will return 4 rows:
TlToolNo | TlPartNo | TlOP | TlQuantity
----------+----------+------+-----------
T00012362 | 10290722 | OP10 | 2
T00012456 | 10290722 | OP10 | 1
T00013456 | 10290722 | OP20 | 1
T00014348 | 10295379 | OP20 | 1
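The grouping idea can be checked with a small sketch, assuming Python's built-in sqlite3 module instead of Access. The original table contents are only available in the image, so the rows below are illustrative: two are taken from the result above, and the tool T00099999, which is used by both parts, is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblTooling (TlToolNo TEXT, TlPartNo TEXT, TlOP TEXT, TlQuantity INTEGER);
INSERT INTO tblTooling VALUES
  ('T00012362', '10290722', 'OP10', 2),
  ('T00014348', '10295379', 'OP20', 1),
  ('T00099999', '10290722', 'OP10', 1),  -- invented tool shared by both parts
  ('T00099999', '10295379', 'OP10', 1);
""")

# Per tool, MIN(TlPartNo) = MAX(TlPartNo) holds only when the tool appears
# with exactly one of the two part numbers; shared tools are filtered out.
rows = conn.execute("""
SELECT t.TlToolNo, t.TlPartNo, t.TlOP, t.TlQuantity
FROM tblTooling t
JOIN (
    SELECT TlToolNo, MIN(TlPartNo) AS OnlyPartNo
    FROM tblTooling
    WHERE TlPartNo IN ('10290722', '10295379')
    GROUP BY TlToolNo
    HAVING MIN(TlPartNo) = MAX(TlPartNo)
) grp ON grp.TlToolNo = t.TlToolNo AND grp.OnlyPartNo = t.TlPartNo
ORDER BY t.TlToolNo
""").fetchall()

print(rows)
```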
I think you can do this with not exists:
select t.*
from tblTooling as t
where not exists (select 1
from tblTooling as t2
where t2.TlPartNo in ("10290722", "10295379") and
t2.TlToolNo = t.TlToolNo and
t2.tiid <> t.tiid
) and
t.TlPartNo in ("10290722", "10295379");
This saves on the select distinct, which should be a performance boost.