Mandatory condition matching in ´where-in´ clause - sql

Consider a simple where clause
select * from table_abc where col_a in (1,2,3)
I know the current conditions
If 1,2,3 are absent, I will not get any results
If 1,2,3 are present, I will get all results associated with 1,2,3
If 1 is present and 2,3 is absent, I will get only results associated with 1.
My question is if we can execute the query for the condition for
If 1 is present and 2,3 is absent, I should still get all results associated with 1,2,3
However, if 1,2,3 are absent, I will not get any results
In other words, can I have a particular value in the where-in clause set as mandatory? How can we change the current query?
EDIT : As pointed out in the comment, I have forgot to add the table structure. It is better that I explain the use case as well.
Table 1 : Admins
ID admin_id
-------------
1 001
2 002
Table 2 : Events
ID event_id
-------------
1 110
2 220
Table 3 : Admins_Events
admin_id event_id
-------------
001 110
001 220
002 220
Now, as a part of filtering, let's say I have the query
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (110) AND admins.admin_id IN (001))
And currently, I am getting the results as
admin_id event_id
-------------
001 110
where as I would want something like
admin_id event_id
-------------
001 110
001 220
I have to still show the other events associated with the admin even though I do not pass it in the where-in clause. I was thinking to pass all the event_id's every time and match the mandatory event_id and also match the remaining event_ids in case the mandatory event_id is found.
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (mandatory[110], 220) AND admins.admin_id IN (001))
How can I change the query?

Add another condition with EXISTS in the WHERE clause:
SELECT a.admin_id, e.event_id
FROM Admins a
LEFT JOIN Admins_Events ae ON ae.admin_id = a.admin_id
LEFT JOIN Events e ON e.event_id = ae.event_id
WHERE (e.event_id IN (110, 220) AND a.admin_id IN (001))
AND EXISTS (
SELECT 1 FROM Admins_Events
WHERE event_id = 110 AND admin_id = a.admin_id
)
See the demo.
Results:
| admin_id | event_id |
| -------- | -------- |
| 1 | 110 |
| 1 | 220 |

It sounds like in your example you want all events associated with admins who are associated with event 110. In Mysql I'd do it with the following query, which joins admins to events twice: once to filter for the event you need, and once to get all the other events.
However, in your example, you don't need the admins table in the query at all, since you just need the admin ID, which you can get directly from the admins_events table. I left it in, in case your real "admins" table also had other attributes you wanted (name, location, etc) which are not available in the admins_events join table.
The specific query is:
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
JOIN "admins_events" AS "specific_admins_events" ON "specific_admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" AS "specific_events" ON "specific_events"."event_id" = "specific_admins_events"."event_id"
WHERE (specific_events.event_id IN (110))

First, you only need the admin_events table.
Then method uses window functions:
SELECT ae.*
FROM (SELECT ae.*,
SUM(CASE WHEN ae.event_id IN (110) THEN 1 ELSE 0 END) OVER (PARTITION BY ae.admin_id) as num_110
FROM admins_events ae
) ae
WHERE ae.admin_id IN ('001') AND -- assume this is a string
num_110 > 0 AND
ae.event_id IN (110, 220);

Related

How to: SELECT from multiple CTEs that are mutually exclusive

I am in a situation where I need to show stats of a user based on referrals that user made & then stats about those referred users' activities. So, if user A refers user B, C then it means user A referred 2 users. Now, I need to get how many of those referred users did take an action i.e. PURCHASED. But they may not purchase which counts to 0. I have following query using CTEs which does work as expected but it also returns some false negative results. i.e.
WITH direct_referrals AS (
SELECT id
FROM "user"
WHERE "user"."referredBy" = ${userId}
),
application_stats AS (
SELECT count(status), status
FROM "application"
WHERE "userId" IN (SELECT id FROM direct_referrals)
GROUP BY status
)
SELECT *, (SELECT count(id) FROM direct_referrals) AS "totalReferrals"
FROM application_stats;
This query returns correct result if at-least 1 referred user took some action but it fails when none has taken any action in which case this query returns the referred users to be 0 which is not true.
I can see that SELECT is dependent on application_stats CTE which may not return any result & hence the direct_referrals are also not returned. I am somewhat new to SQL so don't really know many options. Any help is appreciated.
Thanks
Update with sample Data & Expected Results
// User model
Id username referredBy
---- -------- -------------------
1 jon NULL
2 jane 1
3 doe 1
4 smith 2
5 john 1
// Application model
Id userId status
---- -------- -------------------
1 12 'APPLIED'
2 13 'APPLIED'
3 14 'VIEWED'
Expected Result (for userId = 1):
User (referral) stats Application stats
------------------- -------------------
3 0
Actual Result:
User (referral) stats Application stats
------------------- -------------------
0 0
Something like this should give you what you want:
WITH REFERRAL AS (
SELECT REFERREDBY, COUNT(USERNAME) AS "REF_CNT"
FROM USER_MODEL
WHERE REFERREDBY IS NOT NULL
GROUP BY REFERREDBY
),
APPLICATIONS AS (
SELECT UM.REFERREDBY, COUNT(ML.APPL_STATUS) AS "APPL_CNT"
FROM USER_MODEL UM
LEFT OUTER JOIN APPLICATION_MODEL ML ON UM.ID = ML.USER_ID AND ML.APPL_STATUS = 'PURCHASED'
GROUP BY UM.REFERREDBY
)
SELECT R.REFERREDBY, R.REF_CNT, A.APPL_CNT
FROM REFERRAL R
LEFT OUTER JOIN APPLICATIONS A ON R.REFERREDBY = A.REFERREDBY;

Return count of total group membership when providers are part of a group

TABLE A: Pre-joined table - Holds a list of providers who belong to a group and the group the provider belongs to. Columns are something like this:
ProviderID (PK, FK) | ProviderName | GroupID | GroupName
1234 | LocalDoctor | 987 | LocalDoctorsUnited
5678 | Physican82 | 987 | LocalDoctorsUnited
9012 | Dentist13 | 153 | DentistryToday
0506 | EyeSpecial | 759 | OphtaSpecialist
TABLE B: Another pre-joined table, holds a list of providers and their demographic information. Columns as such:
ProviderID (PK,FK) | ProviderName | G_or_I | OtherColumnsThatArentInUse
1234 | LocalDoctor | G | Etc.
5678 | Physican82 | G | Etc.
9012 | Dentist13 | I | Etc.
0506 | EyeSpecial | I | Etc.
The expected result is something like this:
ProviderID | ProviderName | ProviderStatus | GroupCount
1234 | LocalDoctor | Group | 2
5678 | Physican82 | Group | 2
9012 | Dentist13 | Individual | N/A
0506 | EyeSpecial | Individual | N/A
The goal is to determine whether or not a provider belongs to a group or operates individually, by the G_or_I column. If the provider belongs to a group, I need to include an additional column that provides the count of total providers in that group.
The Group/Individual portion is relatively easy - I've done something like this:
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus
FROM
TableA A
LEFT OUTER JOIN TableB B
ON A.ProviderID = B.ProviderID;
So far so good, this returns the expected results based on the G_or_I flag.
However, I can't seem to wrap my head around how to complete the COUNT portion. I feel like I may be overthinking it, and stuck in a loop of errors. Some things I've tried:
Add a second CASE STATEMENT:
CASE
WHEN B.G_or_I = 'G'
THEN (
SELECT CountedGroups
FROM (
SELECT ProviderID, count(GroupID) AS CountedGroups
FROM TableA
WHERE A.ProviderID = B.ProviderID
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
)
)
ELSE 'N/A' END
This returns an error stating that a single row sub-query is returning more than one row. If I limit the number of rows returned to 1, the CountedGroups column returns 1 for every row. This makes me think that its not performing the count function as I expect it to.
I've also tried including a direct count of TableA as a factored sub-query:
WITH CountedGroups AS
( SELECT Provider ID, count(GroupID) As GroupSum
FROM TableA
GROUP BY ProviderID --originally had this as ORDER BY, but that was a mis-type on my part
) --This as a standalone query works just fine
SELECT DISTINCT
A.ProviderID,
A.ProviderName,
CASE
WHEN B.G_or_I = 'G'
THEN 'Group'
WHEN B.G_or_I = 'I'
THEN 'Individual' END AS ProviderStatus,
CASE
WHEN B.G_or_I = 'G'
THEN GroupSum
ELSE 'N/A' END
FROM
CountedGroups CG
JOIN TableA A
ON CG.ProviderID = A.ProviderID
LEFT OUTER JOIN TableB
ON A.ProviderID = B.ProviderID
This returns either null or completely incorrect column values
Other attempts have been a number of variations of this, with a mix of bad results or Oracle errors. As I mentioned above, I'm probably way overthinking it and the solution could be rather simple. Apologies if the information is confusing or I've not provided enough detail. The real tables have a lot of private medical information, and I tried to translate the essence of the issue as best I could.
Thank you.
You can use the CASE..WHEN and analytical function COUNT as follows:
SELECT
A.PROVIDERID,
A.PROVIDERNAME,
CASE
WHEN B.G_OR_I = 'G' THEN 'Group'
ELSE 'Individual'
END AS PROVIDERSTATUS,
CASE
WHEN B.G_OR_I = 'G' THEN TO_CHAR(COUNT(1) OVER(
PARTITION BY A.GROUPID
))
ELSE 'N/A'
END AS GROUPCOUNT
FROM
TABLE_A A
JOIN TABLE_B B ON A.PROVIDERID = B.PROVIDERID;
TO_CHAR is needed on COUNT as output expression must be of the same data type in CASE..WHEN
Your problem seems to be that you are missing a column. You need to add group name, otherwise you won't be able to differentiate rows for the same practitioner who works under multiple business entities (groups). This is probably why you have a DISTINCT on your query. Things looked like duplicates which weren't. Once you've done that, just use an analytic function to figure out the rest:
SELECT ta.providerid,
ta.providername,
DECODE(tb.g_or_i, 'G', 'Group', 'I', 'Individual') AS ProviderStatus,
ta.group_name,
CASE
WHEN tb.g_or_i = 'G' THEN COUNT(DISTINCT ta.provider_id) OVER (PARTITION BY ta.group_id)
ELSE 'N/A'
END AS GROUP_COUNT
FROM table_a ta
INNER JOIN table_b tb ON ta.providerid = tb.providerid
Is it possible that your LEFT JOIN was going the wrong direction? It makes more sense that your base demographic table would have all practitioners in it and then the Group table might be missing some records. For instance if the solo prac was operating under their own SSN and Type I NPI without applying for a separate Type II NPI or TIN.

Get specific row from each group

My question is very similar to this, except I want to be able to filter by some criteria.
I have a table "DOCUMENT" which looks something like this:
|ID|CONFIG_ID|STATE |MAJOR_REV|MODIFIED_ON|ELEMENT_ID|
+--+---------+----------+---------+-----------+----------+
| 1|1234 |Published | 2 |2019-04-03 | 98762 |
| 2|1234 |Draft | 1 |2019-01-02 | 98762 |
| 3|5678 |Draft | 3 |2019-01-02 | 24244 |
| 4|5678 |Published | 2 |2017-10-04 | 24244 |
| 5|5678 |Draft | 1 |2015-05-04 | 24244 |
It's actually a few more columns, but I'm trying to keep this simple.
For each CONFIG_ID, I would like to select the latest (MAX(MAJOR_REV) or MAX(MODIFIED_ON)) - but I might want to filter by additional criteria, such as state (e.g., the latest published revision of a document) and/or date (the latest revision, published or not, as of a specific date; or: all documents where a revision was published/modified within a specific date interval).
To make things more interesting, there are some other tables I want to join in.
Here's what I have so far:
SELECT
allDocs.ID,
d.CONFIG_ID,
d.[STATE],
d.MAJOR_REV,
d.MODIFIED_ON,
d.ELEMENT_ID,
f.ID FILE_ID,
f.[FILENAME],
et.COLUMN1,
e.COLUMN2
FROM DOCUMENT -- Get all document revisions
CROSS APPLY ( -- Then for each config ID, only look at the latest revision
SELECT TOP 1
ID,
MODIFIED_ON,
CONFIG_ID,
MAJOR_REV,
ELEMENT_ID,
[STATE]
FROM DOCUMENT
WHERE CONFIG_ID=allDocs.CONFIG_ID
ORDER BY MAJOR_REV desc
) as d
LEFT OUTER JOIN ELEMENT e ON e.ID = d.ELEMENT_ID
LEFT OUTER JOIN ELEMENT_TYPE et ON e.ELEMENT_TYPE_ID=et.ID
LEFT OUTER JOIN TREE t ON t.NODE_ID = d.ELEMENT_ID
OUTER APPLY ( -- This is another optional 1:1 relation, but it's wrongfully implemented as m:n
SELECT TOP 1
FILE_ID
FROM DOCUMENT_FILE_RELATION
WHERE DOCUMENT_ID=d.ID
ORDER BY MODIFIED_ON DESC
) as df -- There should never be more than 1, but we're using TOP 1 just in case, to avoid duplicates
LEFT OUTER JOIN [FILE] f on f.ID=df.FILE_ID
WHERE
allDocs.CONFIG_ID = '5678' -- Just for testing purposes
and d.state ='Released' -- One possible filter criterion, there may be others
It looks like the results are correct, but multiple identical rows are returned.
My guess is that for documents with 4 revisions, the same values are found 4 times and returned.
A simple SELECT DISTINCT would solve this, but I'd prefer to fix my query.
This would be a classic row_number & partition by question I think.
;with rows as
(
select <your-columns>,
row_number() over (partion by config_id order by <whatever you want>) as rn
from document
join <anything else>
where <whatever>
)
select * from rows where rn=1

UPDATE SQL table based on query

I have a table called students with 1000 students in. I have a query which tells me which of those students has free tuition. In the stduents table I have a field called FreeTuition and I want to populate/update that field with the results of the query. Do I need to use some kind of loop?
The students table has StuCode which is unique, the query returns StuCode of all the students with free tuition. This is how I want it to look:
| StuCode | FreeTuition |
-------------------------
| S12345 | Yes |
| S12346 | No |
-------------------------
Not at all. Something like this:
with yourquery as (
<your query here>
)
update s
set FreeTuition = (case when yq. StuCode is not null then 'Y' else 'N' end)
from students s left join
yourquery yq
on s. StuCode = yq. StuCode;
Note: This sets the value for all students, yes or no. You can change the left join to just join to set the value only for students returned by the subquery.

SQL - Computing overlap between Interests

I have a schema (millions of records with proper indexes in place) that looks like this:
groups | interests
------ | ---------
user_id | user_id
group_id | interest_id
A user can like 0..many interests and belong to 0..many groups.
Problem: Given a group ID, I want to get all the interests for all the users that do not belong to that group, and, that share at least one interest with anyone that belongs to the same provided group.
Since the above might be confusing, here's a straightforward example (SQLFiddle):
| 1 | 2 | 3 | 4 | 5 | (User IDs)
|-------------------|
| A | | A | | |
| B | B | B | | B |
| | C | | | |
| | | D | D | |
In the above example users are labeled with numbers while interests have characters.
If we assume that users 1 and 2 belong to group -1, then users 3 and 5 would be interesting:
user_id interest_id
------- -----------
3 A
3 B
3 D
5 B
I already wrote a dumb and very inefficient query that correctly returns the above:
SELECT * FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "interests" WHERE "interest_id" IN (
SELECT "interest_id" FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
) AND "user_id" NOT IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
);
But all my attempts to translate that into a proper joined query revealed themselves fruitless: either the query returns way more rows than it should or it just takes 10x as long as the sub-query, like:
SELECT "iii"."user_id" FROM "interests" AS "iii"
WHERE EXISTS
(
SELECT "ii"."user_id", "ii"."interest_id" FROM "groups" AS "gg"
INNER JOIN "interests" AS "ii" ON "gg"."user_id" = "ii"."user_id"
WHERE EXISTS
(
SELECT "i"."interest_id" FROM "groups" AS "g"
INNER JOIN "interests" AS "i" ON "g"."user_id" = "i"."user_id"
WHERE "group_id" = -1 AND "i"."interest_id" = "ii"."interest_id"
) AND "group_id" != -1 AND "ii"."user_id" = "iii"."user_id"
);
I've been struggling trying to optimize this query for the past two nights...
Any help or insight that gets me in the right direction would be greatly appreciated. :)
PS: Ideally, one query that returns an aggregated count of common interests would be even nicer:
user_id totalInterests commonInterests
------- -------------- ---------------
3 3 1/2 (either is fine, but 2 is better)
5 1 1
However, I'm not sure how much slower it would be compared to doing it in code.
Using the following to set up test tables
--drop table Interests ----------------------------
CREATE TABLE Interests
(
InterestId char(1) not null
,UserId int not null
)
INSERT Interests values
('A',1)
,('A',3)
,('B',1)
,('B',2)
,('B',3)
,('B',5)
,('C',2)
,('D',3)
,('D',4)
-- drop table Groups ---------------------
CREATE TABLE Groups
(
GroupId int not null
,UserId int not null
)
INSERT Groups values
(-1, 1)
,(-1, 2)
SELECT * from Groups
SELECT * from Groups
The following query would appear to do what you want:
DECLARE #GroupId int
SET #GroupId = -1
;WITH cteGroupInterests (InterestId)
as (-- List of the interests referenced by the target group
select distinct InterestId
from Groups gr
inner join Interests nt
on nt.UserId = gr.UserId
where gr.GroupId = #GroupId)
-- Aggregate interests for each user
SELECT
UserId
,count(OwnInterstId) OwnInterests
,count(SharedInterestId) SharedInterests
from (-- Subquery lists all interests for each user
select
nt.UserId
,nt.InterestId OwnInterstId
,cte.InterestId SharedInterestId
from Interests nt
left outer join cteGroupInterests cte
on cte.InterestId = nt.InterestId
where not exists (-- Correlated subquery: is "this" user in the target group?)
select 1
from Groups gr
where gr.GroupId = #GroupId
and gr.UserId = nt.UserId)) xx
group by UserId
having count(SharedInterestId) > 0
It appears to work, but I'd want to do more elaborate tests, and I've no idea how well it'd work against millions of rows. Key points are:
cte creates a temp table referenced by the later query; building an actual temp table might be a performance boost
Correlated subqueries can be tricky, but indexes and not exists should make this pretty quick
I was lazy and left out all the underscores, sorry
This is a bit confounding. I think the best approach is exists and not exists:
select i.*
from interest i
where not exists (select 1
from groups g
where i.user_id = g.user_id and
g.group_id = $group_id
) and
exists (select 1
from groups g join
interest i2
on g.user_id = i2.user_id
where g.user_id <> i.user_user_id and
i.interest_id = i2.interest_id
);
The first subquery is saying that the user is not in the group. The second is saying that the interest is shared with someone who is in the group.