How to: SELECT from multiple CTEs that are mutually exclusive - sql

I need to show stats for a user based on the referrals that user made, and then stats about the referred users' activity. If user A refers users B and C, then user A has 2 referrals. I then need to count how many of those referred users took an action, i.e. PURCHASED. They may not have purchased at all, in which case the count is 0. The following query, which uses CTEs, mostly works as expected, but it returns a wrong result in some cases:
WITH direct_referrals AS (
SELECT id
FROM "user"
WHERE "user"."referredBy" = ${userId}
),
application_stats AS (
SELECT count(status), status
FROM "application"
WHERE "userId" IN (SELECT id FROM direct_referrals)
GROUP BY status
)
SELECT *, (SELECT count(id) FROM direct_referrals) AS "totalReferrals"
FROM application_stats;
This query returns the correct result if at least 1 referred user took some action, but it fails when none of them has: in that case the query reports 0 referred users, which is not true.
I can see that the outer SELECT depends on the application_stats CTE, which may return no rows, and hence the direct_referrals count is not returned either. I am somewhat new to SQL, so I don't know many options. Any help is appreciated.
Thanks
Update with sample Data & Expected Results
// User model
Id username referredBy
---- -------- -------------------
1 jon NULL
2 jane 1
3 doe 1
4 smith 2
5 john 1
// Application model
Id userId status
---- -------- -------------------
1 12 'APPLIED'
2 13 'APPLIED'
3 14 'VIEWED'
Expected Result (for userId = 1):
User (referral) stats Application stats
------------------- -------------------
3 0
Actual Result:
User (referral) stats Application stats
------------------- -------------------
0 0

Something like this should give you what you want:
WITH REFERRAL AS (
SELECT REFERREDBY, COUNT(USERNAME) AS "REF_CNT"
FROM USER_MODEL
WHERE REFERREDBY IS NOT NULL
GROUP BY REFERREDBY
),
APPLICATIONS AS (
SELECT UM.REFERREDBY, COUNT(ML.APPL_STATUS) AS "APPL_CNT"
FROM USER_MODEL UM
LEFT OUTER JOIN APPLICATION_MODEL ML ON UM.ID = ML.USER_ID AND ML.APPL_STATUS = 'PURCHASED'
GROUP BY UM.REFERREDBY
)
SELECT R.REFERREDBY, R.REF_CNT, A.APPL_CNT
FROM REFERRAL R
LEFT OUTER JOIN APPLICATIONS A ON R.REFERREDBY = A.REFERREDBY;
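The LEFT JOIN idea can be checked against the question's own sample data. The following is a minimal sketch using Python's built-in sqlite3 module, not the asker's actual database; the helper names build_sample_db and referral_stats are made up for this illustration, and the query counts distinct referred users with a 'PURCHASED' application.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, username TEXT, "referredBy" INTEGER);
        CREATE TABLE application (id INTEGER PRIMARY KEY, "userId" INTEGER, status TEXT);
        INSERT INTO "user" VALUES
            (1, 'jon', NULL), (2, 'jane', 1), (3, 'doe', 1), (4, 'smith', 2), (5, 'john', 1);
        INSERT INTO application VALUES
            (1, 12, 'APPLIED'), (2, 13, 'APPLIED'), (3, 14, 'VIEWED');
        """
    )
    return conn


def referral_stats(conn, user_id):
    """Return (total_referrals, referred_users_who_purchased) for one referrer.

    The LEFT JOIN keeps every referred user even when no application row
    matches, so the referral count survives when nobody has purchased.
    """
    return conn.execute(
        """
        SELECT COUNT(DISTINCT u.id)       AS total_referrals,
               COUNT(DISTINCT a."userId") AS purchased
        FROM "user" u
        LEFT JOIN application a
               ON a."userId" = u.id AND a.status = 'PURCHASED'
        WHERE u."referredBy" = ?
        """,
        (user_id,),
    ).fetchone()


if __name__ == "__main__":
    print(referral_stats(build_sample_db(), 1))
```

Because the aggregate has no GROUP BY, the query always returns exactly one row, so a referrer with no referrals yields (0, 0) rather than an empty result.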

Related

SQL: Process condition in each group

I have a table like the one below.
The rule is: in each group there is always one person with Code > 0; we can treat that person as the primary person of the group.
In each group, if the primary person (Code > 0) has Status = "active", we choose that record (Allen in group A).
However, if the primary person has Status != "active", we need to look at the other people in the group.
In group B, Amanda has a Code but is inactive, so she is out. Among the remaining people, Sarah's Status has higher priority than Joe's (Status priority: active -> pre_active -> pending -> inactive), so we choose Sarah and give her Code 2 (the same as Amanda, the primary record in this group).
If multiple records in a group have Code = 0 and the same Status, we choose the topmost one (by sequence).
In the end, I have to keep 1 record per group and, if the selected record has Code 0, give it the Code of the group's primary record.
Name    Code  Status      Group
------  ----  ----------  -----
Allen   8     active      A
Louis   0     inactive    A
Cindy   0     inactive    A
Joe     0     pending     B
Amanda  2     inactive    B
Sarah   0     pre_active  B
The result should be like below:
Name    Code  Status      Group
------  ----  ----------  -----
Allen   8     active      A
Sarah   2     pre_active  B
Thanks in advance!
I did it with two correlated subqueries, which may not be very efficient.
I also used the field name GroupCode to avoid escaping the keyword GROUP.
In MySQL syntax:
SELECT Name,
(SELECT MAX(Code)
FROM YourTable tbl2
WHERE tbl1.GroupCode=tbl2.GroupCode) Code,
Status,
GroupCode
FROM YourTable tbl1
WHERE tbl1.Name = (SELECT Name
FROM YourTable tbl3
WHERE tbl1.GroupCode=tbl3.GroupCode
ORDER BY
-- active -> pre_active -> pending -> inactive
CASE Status
WHEN 'active' THEN 0
WHEN 'pre_active' THEN 1
WHEN 'pending' THEN 2
WHEN 'inactive' THEN 3
END
LIMIT 1);
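The correlated-subquery approach can be tried on the question's sample rows. This is a rough sketch in Python's sqlite3 rather than MySQL; the table name people and the seq column are assumptions added for the illustration (seq stands in for the "sequence" the question mentions and is used as a tie-breaker, which the query above does not have).

```python
import sqlite3

# Pick one row per group by status priority, then by sequence, and
# report the group's primary Code (the only Code > 0 in the group).
PICK_SQL = """
SELECT t1.Name,
       (SELECT MAX(t2.Code) FROM people t2
         WHERE t2.GroupCode = t1.GroupCode) AS Code,
       t1.Status,
       t1.GroupCode
FROM people t1
WHERE t1.seq = (SELECT t3.seq FROM people t3
                 WHERE t3.GroupCode = t1.GroupCode
                 ORDER BY CASE t3.Status
                            WHEN 'active'     THEN 0
                            WHEN 'pre_active' THEN 1
                            WHEN 'pending'    THEN 2
                            WHEN 'inactive'   THEN 3
                          END,
                          t3.seq          -- hypothetical tie-breaker
                 LIMIT 1)
ORDER BY t1.GroupCode
"""


def build_sample_db():
    """Load the six sample rows; seq records their original order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (seq INTEGER PRIMARY KEY, Name TEXT, Code INTEGER,"
        " Status TEXT, GroupCode TEXT)"
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Allen", 8, "active", "A"),
            (2, "Louis", 0, "inactive", "A"),
            (3, "Cindy", 0, "inactive", "A"),
            (4, "Joe", 0, "pending", "B"),
            (5, "Amanda", 2, "inactive", "B"),
            (6, "Sarah", 0, "pre_active", "B"),
        ],
    )
    return conn


def pick_per_group(conn):
    """Return the chosen (Name, Code, Status, GroupCode) row for each group."""
    return conn.execute(PICK_SQL).fetchall()


if __name__ == "__main__":
    for row in pick_per_group(build_sample_db()):
        print(row)
```

Matching on seq instead of Name also avoids returning several rows when two people in a group share a name.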

Mandatory condition matching in 'where-in' clause

Consider a simple where clause
select * from table_abc where col_a in (1,2,3)
I know the current behaviour:
If 1, 2 and 3 are all absent, I will not get any results.
If 1, 2 and 3 are present, I will get all results associated with 1, 2 and 3.
If 1 is present and 2 and 3 are absent, I will get only the results associated with 1.
My question is whether we can write the query so that:
If 1 is present and 2 and 3 are absent, I still get all results associated with 1, 2 and 3.
However, if 1, 2 and 3 are all absent, I will not get any results.
In other words, can I mark a particular value in the where-in clause as mandatory? How can we change the current query?
EDIT: As pointed out in the comments, I forgot to add the table structure. It is better that I explain the use case as well.
Table 1 : Admins
ID admin_id
-------------
1 001
2 002
Table 2 : Events
ID event_id
-------------
1 110
2 220
Table 3 : Admins_Events
admin_id event_id
-------------
001 110
001 220
002 220
Now, as a part of filtering, let's say I have the query
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (110) AND admins.admin_id IN (001))
And currently, I am getting the results as
admin_id event_id
-------------
001 110
whereas I would want something like
admin_id event_id
-------------
001 110
001 220
I have to still show the other events associated with the admin even though I do not pass it in the where-in clause. I was thinking to pass all the event_id's every time and match the mandatory event_id and also match the remaining event_ids in case the mandatory event_id is found.
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
LEFT JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
LEFT JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
WHERE (events.event_id IN (mandatory[110], 220) AND admins.admin_id IN (001))
How can I change the query?
Add another condition with EXISTS in the WHERE clause:
SELECT a.admin_id, e.event_id
FROM Admins a
LEFT JOIN Admins_Events ae ON ae.admin_id = a.admin_id
LEFT JOIN Events e ON e.event_id = ae.event_id
WHERE (e.event_id IN (110, 220) AND a.admin_id IN (001))
AND EXISTS (
SELECT 1 FROM Admins_Events
WHERE event_id = 110 AND admin_id = a.admin_id
)
See the demo.
Results:
| admin_id | event_id |
| -------- | -------- |
| 1 | 110 |
| 1 | 220 |
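The EXISTS condition can be exercised on the sample rows from the question. The sketch below uses Python's sqlite3 and only the admins_events table; the function name events_for_admin is hypothetical, and admin_id is stored as text here (an assumption, so that '001' keeps its leading zeros).

```python
import sqlite3


def build_sample_db():
    """Create the admins_events join table with the question's three rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admins_events (admin_id TEXT, event_id INTEGER)")
    conn.executemany(
        "INSERT INTO admins_events VALUES (?, ?)",
        [("001", 110), ("001", 220), ("002", 220)],
    )
    return conn


def events_for_admin(conn, admin_id, mandatory_event_id):
    """Return all events of an admin, but only if the mandatory event is among them."""
    return conn.execute(
        """
        SELECT ae.admin_id, ae.event_id
        FROM admins_events ae
        WHERE ae.admin_id = ?
          AND EXISTS (SELECT 1
                      FROM admins_events m
                      WHERE m.admin_id = ae.admin_id
                        AND m.event_id = ?)
        ORDER BY ae.event_id
        """,
        (admin_id, mandatory_event_id),
    ).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(events_for_admin(conn, "001", 110))
    print(events_for_admin(conn, "002", 110))
```

Admin 002 is not linked to event 110, so the second call returns an empty list even though that admin has other events.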
It sounds like in your example you want all events associated with admins who are associated with event 110. In MySQL I'd do it with the following query, which joins admins to events twice: once to filter for the event you need, and once to get all the other events.
However, in your example, you don't need the admins table in the query at all, since you just need the admin ID, which you can get directly from the admins_events table. I left it in, in case your real "admins" table also had other attributes you wanted (name, location, etc) which are not available in the admins_events join table.
The specific query is:
SELECT "admins"."admin_id", "events"."event_id" FROM "admins"
JOIN "admins_events" ON "admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" ON "events"."event_id" = "admins_events"."event_id"
JOIN "admins_events" AS "specific_admins_events" ON "specific_admins_events"."admin_id" = "admins"."admin_id"
JOIN "events" AS "specific_events" ON "specific_events"."event_id" = "specific_admins_events"."event_id"
WHERE (specific_events.event_id IN (110))
First, you only need the admins_events table.
This method then uses window functions:
SELECT ae.*
FROM (SELECT ae.*,
SUM(CASE WHEN ae.event_id IN (110) THEN 1 ELSE 0 END) OVER (PARTITION BY ae.admin_id) as num_110
FROM admins_events ae
) ae
WHERE ae.admin_id IN ('001') AND -- assume this is a string
num_110 > 0 AND
ae.event_id IN (110, 220);

SQL - to find the most complete strings in a column from a table

Each time a user searches for a text on the website, the search text gets recorded to search_table. The sub-searches are also recorded. They are recorded with an asterisk.
The goal is to find the most complete search texts that the user searched for.
The ideal way would be:
Group the ids = 1,4,6 and obtain id=6
Group the ids = 2,5,7 and obtain id = 7
Group the ids = 3 and obtain id = 3
Group the ids 8, 9 and obtain id = 9
SEARCH_TABLE
id user search_text
--------------------
1 user1 data manag*
2 user1 confer*
3 user1 incomplete sear*
4 user1 data managem*
5 user1 conference c*
6 user1 data management
7 user1 conference call
8 user1 status in*
9 user1 status information
Output should be
user search_text
---------------------
user1 data management
user1 conference call
user1 incomplete sear*
user1 status information
Can you help please?
Something like below should do the work:
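SELECT * FROM
SEARCH_TABLE st
WHERE
NOT EXISTS (
SELECT 1 FROM
SEARCH_TABLE st2
-- remove the asterisk and append %
WHERE st2.search_text LIKE replace(st.search_text,'*','')||'%'
-- skip the row itself and other users' searches
AND st2.id <> st.id
AND st2."user" = st."user"
)
This filters out every search that is a prefix of another search by the same user.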
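As a quick check of the NOT EXISTS idea, here is a small sketch that loads the question's nine rows into an in-memory SQLite database through Python's sqlite3. The function name most_complete_searches is made up for this example, and the query assumes a row must be compared only against other rows of the same user.

```python
import sqlite3


def build_sample_db():
    """Create search_table with the nine sample searches from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE search_table (id INTEGER PRIMARY KEY, "user" TEXT, search_text TEXT)'
    )
    conn.executemany(
        "INSERT INTO search_table VALUES (?, ?, ?)",
        [
            (1, "user1", "data manag*"),
            (2, "user1", "confer*"),
            (3, "user1", "incomplete sear*"),
            (4, "user1", "data managem*"),
            (5, "user1", "conference c*"),
            (6, "user1", "data management"),
            (7, "user1", "conference call"),
            (8, "user1", "status in*"),
            (9, "user1", "status information"),
        ],
    )
    return conn


def most_complete_searches(conn):
    """Return (id, user, search_text) rows that are not a prefix of another row."""
    return conn.execute(
        """
        SELECT st.id, st."user", st.search_text
        FROM search_table st
        WHERE NOT EXISTS (
            SELECT 1
            FROM search_table st2
            WHERE st2.id <> st.id
              AND st2."user" = st."user"
              AND st2.search_text LIKE replace(st.search_text, '*', '') || '%'
        )
        ORDER BY st.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in most_complete_searches(build_sample_db()):
        print(row)
```

One caveat of this sketch: a search text that itself contains % or _ would be treated as a LIKE wildcard, and two identical rows would eliminate each other.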
This is probably not the most elegant way, but here's a go at it:
alter table your_table
add group_id int
select [user], left(search_text, 5) as Group_Text, IDENTITY(int, 1,1) as Group_ID
into #group_id_table
from your_table
group by [user], left(search_text, 5)
order by [user], left(search_text, 5)
update a
set a.group_id = b.group_id
from your_table as a
join #group_id_table as b
on left(search_text, 5) = group_text
select [user], max(search_text), group_id
from your_table
group by [user], group_id
order by [user], group_id
This achieved the desired results when I ran it, but of course because you're basing the group_id's off a user specified string length there could be issues there. I hope this does the job for you.
Give this a shot. I separated out the completed texts (and their shorter partials), and then found the longest partial for each record. Tested in Oracle as I don't have access to a PostgreSQL right now, but I didn't use anything exotic so it should work.
with
--Contains all completed searches
COMPLETE as (select * from SEARCH_TABLE where SEARCH_TEXT not like '%*'),
--Contains all searches that are incomplete and dont have a completed match
INCOMPLETE as (
select S.*
from SEARCH_TABLE S
left join COMPLETE C
on S.USR = C.USR
and C.SEARCH_TEXT like replace(S.SEARCH_TEXT, '*', '%')
where C.ID is null
),
--chains all incompleted with any matching pattern shorter than it.
CHAINED_INC as (
select LONGER.USR, LONGER.ID, LONGER.SEARCH_TEXT, SHORTER.SEARCH_TEXT SEARCH_TEXT_SHORT
from INCOMPLETE LONGER
join INCOMPLETE SHORTER
on LONGER.SEARCH_TEXT like replace(SHORTER.SEARCH_TEXT, '*', '%')
and LONGER.ID <> SHORTER.ID
)
--if a text is not the shorter text for a different record, that means it's the longest text for that pattern.
select distinct T1.USR, T1.SEARCH_TEXT
from CHAINED_INC T1
left join CHAINED_INC T2
on T1.USR = T2.USR
and T1.SEARCH_TEXT = T2.SEARCH_TEXT_SHORT
where T2.SEARCH_TEXT_SHORT is null
--finally, union back to the completed texts.
union all
select USR, SEARCH_TEXT from COMPLETE
;
Edit: removed ID from select

SQL - Computing overlap between Interests

I have a schema (millions of records with proper indexes in place) that looks like this:
groups | interests
------ | ---------
user_id | user_id
group_id | interest_id
A user can like 0..many interests and belong to 0..many groups.
Problem: Given a group ID, I want to get all the interests for all the users that do not belong to that group, and, that share at least one interest with anyone that belongs to the same provided group.
Since the above might be confusing, here's a straightforward example (SQLFiddle):
| 1 | 2 | 3 | 4 | 5 | (User IDs)
|-------------------|
| A | | A | | |
| B | B | B | | B |
| | C | | | |
| | | D | D | |
In the above example users are labeled with numbers while interests have characters.
If we assume that users 1 and 2 belong to group -1, then users 3 and 5 would be interesting:
user_id interest_id
------- -----------
3 A
3 B
3 D
5 B
I already wrote a dumb and very inefficient query that correctly returns the above:
SELECT * FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "interests" WHERE "interest_id" IN (
SELECT "interest_id" FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
) AND "user_id" NOT IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
);
But all my attempts to translate that into a proper joined query proved fruitless: either the query returns way more rows than it should, or it takes 10x as long as the sub-query version, like:
SELECT "iii"."user_id" FROM "interests" AS "iii"
WHERE EXISTS
(
SELECT "ii"."user_id", "ii"."interest_id" FROM "groups" AS "gg"
INNER JOIN "interests" AS "ii" ON "gg"."user_id" = "ii"."user_id"
WHERE EXISTS
(
SELECT "i"."interest_id" FROM "groups" AS "g"
INNER JOIN "interests" AS "i" ON "g"."user_id" = "i"."user_id"
WHERE "group_id" = -1 AND "i"."interest_id" = "ii"."interest_id"
) AND "group_id" != -1 AND "ii"."user_id" = "iii"."user_id"
);
I've been struggling trying to optimize this query for the past two nights...
Any help or insight that gets me in the right direction would be greatly appreciated. :)
PS: Ideally, one query that returns an aggregated count of common interests would be even nicer:
user_id totalInterests commonInterests
------- -------------- ---------------
3 3 1/2 (either is fine, but 2 is better)
5 1 1
However, I'm not sure how much slower it would be compared to doing it in code.
Using the following to set up test tables
--drop table Interests ----------------------------
CREATE TABLE Interests
(
InterestId char(1) not null
,UserId int not null
)
INSERT Interests values
('A',1)
,('A',3)
,('B',1)
,('B',2)
,('B',3)
,('B',5)
,('C',2)
,('D',3)
,('D',4)
-- drop table Groups ---------------------
CREATE TABLE Groups
(
GroupId int not null
,UserId int not null
)
INSERT Groups values
(-1, 1)
,(-1, 2)
SELECT * from Interests
SELECT * from Groups
The following query would appear to do what you want:
DECLARE @GroupId int
SET @GroupId = -1
;WITH cteGroupInterests (InterestId)
as (-- List of the interests referenced by the target group
select distinct InterestId
from Groups gr
inner join Interests nt
on nt.UserId = gr.UserId
where gr.GroupId = @GroupId)
-- Aggregate interests for each user
SELECT
UserId
,count(OwnInterestId) OwnInterests
,count(SharedInterestId) SharedInterests
from (-- Subquery lists all interests for each user
select
nt.UserId
,nt.InterestId OwnInterestId
,cte.InterestId SharedInterestId
from Interests nt
left outer join cteGroupInterests cte
on cte.InterestId = nt.InterestId
where not exists (-- Correlated subquery: is "this" user in the target group?
select 1
from Groups gr
where gr.GroupId = @GroupId
and gr.UserId = nt.UserId)) xx
group by UserId
having count(SharedInterestId) > 0
It appears to work, but I'd want to do more elaborate tests, and I've no idea how well it'd work against millions of rows. Key points are:
cte creates a temp table referenced by the later query; building an actual temp table might be a performance boost
Correlated subqueries can be tricky, but indexes and not exists should make this pretty quick
I was lazy and left out all the underscores, sorry
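The aggregated counts can be reproduced on the question's sample data. The following is a sketch in Python's sqlite3 using the question's original column names (user_id, interest_id, group_id) rather than the renamed ones above; the function name interest_overlap is hypothetical, and the derived table plays the role of the CTE.

```python
import sqlite3


def build_sample_db():
    """Load the example matrix: users 1 and 2 belong to group -1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE interests (user_id INTEGER, interest_id TEXT);
        CREATE TABLE "groups" (user_id INTEGER, group_id INTEGER);
        INSERT INTO interests VALUES
            (1, 'A'), (3, 'A'), (1, 'B'), (2, 'B'), (3, 'B'),
            (5, 'B'), (2, 'C'), (3, 'D'), (4, 'D');
        INSERT INTO "groups" VALUES (1, -1), (2, -1);
        """
    )
    return conn


def interest_overlap(conn, group_id):
    """Return (user_id, total_interests, common_interests) for non-members
    who share at least one interest with a member of the group."""
    return conn.execute(
        """
        SELECT i.user_id,
               COUNT(*)               AS total_interests,
               COUNT(gi.interest_id)  AS common_interests
        FROM interests i
        LEFT JOIN (SELECT DISTINCT i2.interest_id
                   FROM "groups" g
                   JOIN interests i2 ON i2.user_id = g.user_id
                   WHERE g.group_id = ?) gi
               ON gi.interest_id = i.interest_id
        WHERE NOT EXISTS (SELECT 1 FROM "groups" g
                          WHERE g.user_id = i.user_id AND g.group_id = ?)
        GROUP BY i.user_id
        HAVING COUNT(gi.interest_id) > 0
        ORDER BY i.user_id
        """,
        (group_id, group_id),
    ).fetchall()


if __name__ == "__main__":
    print(interest_overlap(build_sample_db(), -1))
```

User 4 is dropped by the HAVING clause because interest D is not held by any member of the group.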
This is a bit confounding. I think the best approach is exists and not exists:
select i.*
from interests i
where not exists (select 1
from groups g
where i.user_id = g.user_id and
g.group_id = $group_id
) and
exists (select 1
from groups g join
interests i2
on g.user_id = i2.user_id
where g.group_id = $group_id and
i.interest_id = i2.interest_id
);
The first subquery is saying that the user is not in the group. The second is saying that the interest is shared with someone who is in the group.
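If the goal is the exact row set from the question (every interest of a qualifying user, including unshared ones such as 3/D), the second subquery can be correlated on the user instead of on the interest. This is a variation on the exists / not exists idea, not the query above, sketched with Python's sqlite3; the function name candidate_interests is made up for the example.

```python
import sqlite3


def build_sample_db():
    """Load the example matrix: users 1 and 2 belong to group -1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE interests (user_id INTEGER, interest_id TEXT);
        CREATE TABLE "groups" (user_id INTEGER, group_id INTEGER);
        INSERT INTO interests VALUES
            (1, 'A'), (3, 'A'), (1, 'B'), (2, 'B'), (3, 'B'),
            (5, 'B'), (2, 'C'), (3, 'D'), (4, 'D');
        INSERT INTO "groups" VALUES (1, -1), (2, -1);
        """
    )
    return conn


def candidate_interests(conn, group_id):
    """Return all (user_id, interest_id) rows of non-members who share
    at least one interest with some member of the group."""
    return conn.execute(
        """
        SELECT i.user_id, i.interest_id
        FROM interests i
        WHERE NOT EXISTS (SELECT 1 FROM "groups" g
                          WHERE g.user_id = i.user_id AND g.group_id = ?)
          AND EXISTS (SELECT 1
                      FROM interests mine
                      JOIN interests theirs
                        ON theirs.interest_id = mine.interest_id
                      JOIN "groups" g
                        ON g.user_id = theirs.user_id
                      WHERE mine.user_id = i.user_id   -- any interest of this user
                        AND g.group_id = ?)            -- held by a group member
        ORDER BY i.user_id, i.interest_id
        """,
        (group_id, group_id),
    ).fetchall()


if __name__ == "__main__":
    for row in candidate_interests(build_sample_db(), -1):
        print(row)
```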

The best way to come up with the desired result in this query

I have two tables: one has a set of ids, and the other has a set of client ids with a user id, as follows:
Client
id
-----
3
4
6
7
9
11
Business
ClientId | userId
----------------------
4 2
4 3
9 2
So basically a parameter @userId comes in. If @userId = 2, for example, then that user has access to client ids 4 and 9 as well as all the other ids in the Client table. But if, say, @userId = 5, this user cannot access client ids 4 and 9, because the Business table restricts them to users 2 and 3.
My desired result is a list of all the client ids a user can see. The query must check the Business table: if a client id is listed there and the user id is not one of those listed for it, that user cannot see that client id when querying the Client table.
I am sorry it is so confusing. I am having a hard time coming up with this one. Any pointers would be much appreciated.
The Result should be
Assume User ID = 2
id
---
3
4
6
7
9
11
Assume User ID = 13
id
---
3
6
7
11
Because client 4 is restricted to users 2 and 3, and client 9 to user 2.
Edit: based on my re-read, my understanding of the logic is this: return all Clients if the UserID is in the Business table, return all clients that don't exist in the Business table otherwise.
IF EXISTS (SELECT ClientId FROM Business WHERE UserId = @userId)
BEGIN
SELECT DISTINCT Id
FROM Client
END
ELSE
BEGIN
SELECT Id
FROM Client
WHERE Id NOT IN
(
SELECT DISTINCT ClientId
FROM Business
)
END
select DISTINCT ID from Client C
left join Business B on C.ID = B.ClientID
Where B.ClientID is null OR B.UserID = @UserID
I believe this is the query you want based on my interpretation of your question.
Here's the plain English description of what the query does: Remove from the list of all Clients, the clients that are restricted by the Business table. Then add the list of Clients that the user has explicit access to.
declare @userId int = 5
(
select Id from Client
except
select clientId from Business
)
union
select clientId from Business where userId = @userId
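The LEFT JOIN variant above can be checked against the two expected results in the question. This is a minimal sketch with Python's sqlite3 rather than SQL Server; the function name visible_clients is hypothetical, and the @userId parameter becomes a bound ? placeholder.

```python
import sqlite3


def build_sample_db():
    """Create the Client and Business tables with the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE client (id INTEGER PRIMARY KEY);
        CREATE TABLE business (clientId INTEGER, userId INTEGER);
        INSERT INTO client VALUES (3), (4), (6), (7), (9), (11);
        INSERT INTO business VALUES (4, 2), (4, 3), (9, 2);
        """
    )
    return conn


def visible_clients(conn, user_id):
    """Return the client ids a user may see: unrestricted clients plus
    restricted clients that list this user in the business table."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.id
        FROM client c
        LEFT JOIN business b ON b.clientId = c.id
        WHERE b.clientId IS NULL   -- client has no restriction at all
           OR b.userId = ?         -- or this user is explicitly allowed
        ORDER BY c.id
        """,
        (user_id,),
    ).fetchall()
    return [client_id for (client_id,) in rows]


if __name__ == "__main__":
    conn = build_sample_db()
    print(visible_clients(conn, 2))
    print(visible_clients(conn, 13))
```

Under this interpretation user 3 sees client 4 but not client 9, which differs from the IF EXISTS answer, where any user present in Business sees every client.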