How to query two same values in SQL Server 2008? - sql

I have a table in SQL Server 2008 with the following fields:
RoomUserId->Primary key
RoomId->Foreign Key of Table Rooms
UserId->Foreign Key of Table Users
The table holds the following values, where each RoomId is shared by two users:
RoomUserId RoomId UserId
1 11 1
2 11 2
3 12 1
4 12 3
5 13 1
6 13 4
Now I need a SQL query to find the distinct RoomId shared by two given users, i.e. the RoomId of user 1 and user 2, or the RoomId of user 1 and user 3.
Can anyone please help me with this? I am new to SQL Server.

Try this:
SELECT r1.roomId FROM room r1 JOIN room r2 ON r1.roomId = r2.roomId WHERE r1.userId = 1 AND r2.userId = 3

If you mean find a roomid that is common to users:
select distinct f.roomid
from rooms f
inner join rooms s on f.roomid = s.roomid and f.userid <> s.userid
Or you can use grouping:
select roomid
from rooms
group by roomid
having count(distinct userid) > 1

If you only ever need rooms where there is more than one user then this will work:
SELECT DISTINCT RoomID
FROM RoomUser r1
INNER JOIN RoomUser r2
ON r1.RoomID = r2.RoomID
AND r1.RoomUserID != r2.RoomUserID
If you need the room IDs of rooms with x users, use a HAVING clause, which is more extensible than self-joining. For example, to find room IDs with 3 or more user IDs you would end up with:
SELECT RoomID
FROM RoomUser
GROUP BY RoomID
HAVING COUNT(DISTINCT UserID) >= 3
Self joins, while probably more efficient, will end up as quite messy SQL for the same requirement. Check execution plans and run some tests to see which is more efficient for your needs.
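As a minimal sketch of the HAVING approach applied to the original question (the room shared by a specific pair of users), the query below first filters to the users of interest and then keeps only rooms in which all of them appear. It runs against an in-memory SQLite database through Python purely so that it is self-contained; SQLite stands in for SQL Server 2008 here, and the helper name `rooms_shared_by` is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RoomUser ("
    "RoomUserId INTEGER PRIMARY KEY, RoomId INTEGER, UserId INTEGER)"
)
conn.executemany(
    "INSERT INTO RoomUser VALUES (?, ?, ?)",
    [(1, 11, 1), (2, 11, 2), (3, 12, 1), (4, 12, 3), (5, 13, 1), (6, 13, 4)],
)


def rooms_shared_by(user_ids):
    """Return the RoomIds that contain every user in user_ids (hypothetical helper)."""
    placeholders = ", ".join("?" for _ in user_ids)
    sql = (
        "SELECT RoomId FROM RoomUser "
        f"WHERE UserId IN ({placeholders}) "
        "GROUP BY RoomId "
        "HAVING COUNT(DISTINCT UserId) = ? "
        "ORDER BY RoomId"
    )
    # The last parameter is the number of distinct users that must all be present.
    params = list(user_ids) + [len(set(user_ids))]
    return [row[0] for row in conn.execute(sql, params)]


print(rooms_shared_by([1, 2]))  # [11]
print(rooms_shared_by([1, 3]))  # [12]
print(rooms_shared_by([2, 3]))  # []
```

The same SELECT should carry over to SQL Server unchanged apart from the parameter syntax, and it extends to three or more users without adding further joins.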
If you actually need the user IDs of the Users in Rooms with more than one User ID then you could use a CTE to build a comma separated string of users in each room:
;WITH RoomUserCTE AS
( SELECT RoomID,
MIN(UserID) [UserID],
CONVERT(VARCHAR(20), MIN(UserID)) [Users],
0 [Recursion]
FROM RoomUser
GROUP BY RoomID
UNION ALL
SELECT a.RoomID,
b.UserID [UserID],
CONVERT(VARCHAR(20), Users + ', ' + CONVERT(VARCHAR, b.UserID)),
Recursion + 1
FROM RoomUserCTE a
INNER JOIN RoomUser b
ON a.RoomID = b.RoomID
AND b.UserID > a.UserID
)
SELECT RoomID, Users
FROM ( SELECT *, MAX(Recursion) OVER(PARTITION BY RoomID) [MaxRecursion]
FROM RoomUserCTE
) cte
WHERE MaxRecursion = Recursion
For the data in your question this will yield
| RoomID | Users |
|---------+---------|
| 11 | 1, 2 |
| 12 | 1, 3 |
| 13 | 1, 4 |
This would work no matter how many user IDs were associated with the same Room ID, so again is more forward compatible.

Related

Postgres, groupBy and count for table and relations at the same time

I have a table called 'users' that has the following structure:
id (PK)  campaign_id  createdAt
1        123          2022-07-14T10:30:01.967Z
2        1234         2022-07-14T10:30:01.967Z
3        123          2022-07-14T10:30:01.967Z
4        123          2022-07-14T10:30:01.967Z
At the same time I have a table that tracks clicks per user:
id (PK)  user_id (FK)  createdAt
1        1             2022-07-14T10:30:01.967Z
2        2             2022-07-14T10:30:01.967Z
3        2             2022-07-14T10:30:01.967Z
4        2             2022-07-14T10:30:01.967Z
Both of these tables hold up to millions of records... I need the most efficient query to group the data per campaign_id.
The result I am looking for would look like this:
campaign_id  total_users  total_clicks
123          3            1
1234         1            3
Unfortunately I have no idea how to achieve this while keeping performance in mind. Most importantly, I need to use WHERE or HAVING to limit the query to a certain time range by createdAt.
Note, PostgreSQL is not my forte, nor is SQL, but I'm learning by spending some time on your question. Have a go with an INNER JOIN of two separate SELECT statements:
SELECT * FROM
(
SELECT campaign_id, COUNT (t1."id(PK)") total_users FROM t1 GROUP BY campaign_id
) tbl1
INNER JOIN
(
SELECT campaign_id, COUNT (t2."user_id(FK)") total_clicks FROM t2 INNER JOIN t1 ON t1."id(PK)" = t2."user_id(FK)" GROUP BY campaign_id
) tbl2
USING(campaign_id)
See an online fiddle. I believe this is now also ready for a WHERE clause in both SELECT statements to filter by "createdAt". I'm pretty sure someone else will come up with something better.
Good luck.
Hope this will help you.
select u.campaign_id,
count(distinct u.id) users_count,
count(c.user_id) clicks_count
from
users u left join clicks c on u.id=c.user_id
group by 1;
See here query output
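Neither answer shows the createdAt time range the question asks for, so here is a minimal sketch of one way to add it to the LEFT JOIN query. It assumes the range should apply both to users and to their clicks (the question does not say which), and it uses an in-memory SQLite database through Python as a stand-in for Postgres. The function name `campaign_totals` and the half-open range convention are illustrative choices, not part of the original.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, campaign_id INTEGER, createdAt TEXT);
CREATE TABLE clicks (id INTEGER PRIMARY KEY, user_id INTEGER, createdAt TEXT);
""")
ts = "2022-07-14T10:30:01.967Z"
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, 123, ts), (2, 1234, ts), (3, 123, ts), (4, 123, ts)],
)
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [(1, 1, ts), (2, 2, ts), (3, 2, ts), (4, 2, ts)],
)


def campaign_totals(start, end):
    """Users and clicks per campaign for start <= createdAt < end (hypothetical helper)."""
    sql = """
    SELECT u.campaign_id,
           COUNT(DISTINCT u.id) AS total_users,
           COUNT(c.id)          AS total_clicks
    FROM users u
    LEFT JOIN clicks c
           ON c.user_id = u.id
          AND c.createdAt >= :start AND c.createdAt < :end  -- click filter stays in ON
    WHERE u.createdAt >= :start AND u.createdAt < :end
    GROUP BY u.campaign_id
    ORDER BY u.campaign_id
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


print(campaign_totals("2022-07-14T00:00:00.000Z", "2022-07-15T00:00:00.000Z"))
# [(123, 3, 1), (1234, 1, 3)]
```

Keeping the click filter in the ON clause rather than in WHERE is what preserves users who have no clicks in the range; moving it to WHERE would effectively turn the LEFT JOIN into an inner join. On tables with millions of rows, indexes on the createdAt columns and on clicks.user_id would likely matter more than the exact query shape, but that needs checking with EXPLAIN on real data.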

SQL query (Postgres) how to answer that?

I have a table with company IDs (non-unique) and an attribute (let's call it status ID); status can be between 1 and 18 (many-to-many; the row ID is what is unique).
Now I need to get the companies that only have rows with status 1 and 18; if they have any other number as well (let's say 3), the company should not be returned.
The data is stored as row id, some meta data, company id and one status id, the example below is AFTER I ran a group by query.
So as an example if I do group by and string agg, I am getting these values:
Company ID Status
1 1,9,12,18
2 12,13,18
3 1
4 8
5 18
So in this case I need to return only 3 and 5.
You should fix your data model. Here are some reasons:
Storing numbers in strings is BAD.
Storing multiple values in a string is BAD.
SQL has poor string processing capabilities.
Postgres offers many ways to store multiple values -- a junction table, arrays, and JSON come to mind.
For your particular problem, how about an explicit comparison?
where status in ('1', '18', '1,18', '18,1')
You can group by companyid and set 2 conditions in the having clause:
select companyid
from tablename
group by companyid
having
sum((status in (1, 18))::int) > 0
and
sum((status not in (1, 18))::int) = 0
Or with EXCEPT:
select companyid from tablename
except
select companyid from tablename
where status not in (1, 18)
See the demo.
Results:
> | companyid |
> | --------: |
> | 3 |
> | 5 |
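For readers who want to try both queries without a Postgres instance, here is a minimal sketch that runs them on the question's sample data in an in-memory SQLite database through Python. The `::int` cast is Postgres-specific, so the HAVING version is written with a portable CASE expression instead; the table and column names follow this answer, and the unaggregated rows are reconstructed from the grouped example in the question.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (companyid INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, 1), (1, 9), (1, 12), (1, 18),
     (2, 12), (2, 13), (2, 18),
     (3, 1), (4, 8), (5, 18)],
)

# Keep companies that have no status outside the allowed set.
having_sql = """
SELECT companyid
FROM tablename
GROUP BY companyid
HAVING SUM(CASE WHEN status NOT IN (1, 18) THEN 1 ELSE 0 END) = 0
"""

# Same result via set difference: all companies minus those with a disallowed status.
except_sql = """
SELECT companyid FROM tablename
EXCEPT
SELECT companyid FROM tablename WHERE status NOT IN (1, 18)
"""

via_having = sorted(row[0] for row in conn.execute(having_sql))
via_except = sorted(row[0] for row in conn.execute(except_sql))
print(via_having)  # [3, 5]
print(via_except)  # [3, 5]
```

Both forms would also return a company that has rows with both 1 and 18 and nothing else, which is usually what "only 1 and 18" means.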
You can utilize group by and having. ie:
select *
from myTable
where statusId in (1,18)
and companyId in (select companyId
from myTable
group by companyId
having count(distinct statusId) = 1);
EDIT: If you meant to include those who have 1,18 and 18,1 too, then you could use array_agg instead:
select *
from t t1
inner join
(select companyId, array_agg(statusId) as statuses
from t
group by companyId
) t2 on t1.companyid = t2.companyid
where array[1,18] @> t2.statuses;
EDIT: If you meant to get back only companyIds without the rest of columns and data:
select companyId
from t
group by companyId
having array[1,18] @> array_agg(statusId);
DbFiddle Demo

In SQL, how can i segment users by number of items they have? (redshift)

I'm not a SQL expert so apologies if this is actually really simple.
I have a table that lists users and the different questionnaires they have taken. Users can take questionnaires in any order and take as many as they like. There are a total of 7 available and I want to get a view of how many have taken 1 out of 7, 2 of 7, 3 of 7 etc etc
So a really rough example is the table might look like this:
And I want a query that will show me:
count Users with 1 Q: 1
count Users with 2 Q: 2
count Users with 3 Q: 0
count Users with 4 Q: 0
count Users with 5 Q: 1
count Users with 6 Q: 0
count Users with 7 Q: 0
You can do this with two levels of aggregation:
select cnt_questionnaires, count(*) cnt_users
from (
select count(*) cnt_questionnaires from mytable group by userID
) t
group by cnt_questionnaires
order by cnt_questionnaires
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t ;
create table #t (userid INT, q nvarchar(32));
insert into #t
values
(1,'Q1'),
(1,'Q3'),
(2,'Q2'),
(3,'Q1'),
(3,'Q2'),
(3,'Q3'),
(3,'Q4'),
(3,'Q5'),
(4,'Q2'),
(4,'Q3')
-- select * from #t
SELECT
v.qCount,
Count(c.userid) uCount
FROM
(VALUES (1),(2),(3),(4),(5),(6),(7)) v(qCount)
LEFT JOIN (
select
userid, count(q) qCount
from
#t
group by userid
) c ON c.qCount = v.qCount
GROUP BY
v.qCount
Assuming you have user_id on each row, the challenge is getting the zero values. Redshift is not very flexible when it comes to creating tables. Assuming your source data has enough rows, you can use:
select n.n, count(u.user_id) as num_users
from (select row_number() over () as n
from t
limit 7
) n left join
(select user_id, count(*) as cnt
from t
group by user_id
) u
on n.n = u.cnt
group by n.n
order by n.n;
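To make the two levels of aggregation and the zero-filling concrete, here is a minimal sketch that runs on the sample rows from the temp-table answer above. It uses an in-memory SQLite database through Python rather than Redshift, and it builds the 1..7 buckets with a VALUES list, which is an assumption about what the target engine accepts.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (userid INTEGER, q TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "Q1"), (1, "Q3"), (2, "Q2"),
     (3, "Q1"), (3, "Q2"), (3, "Q3"), (3, "Q4"), (3, "Q5"),
     (4, "Q2"), (4, "Q3")],
)

sql = """
WITH buckets(n) AS (VALUES (1), (2), (3), (4), (5), (6), (7)),
per_user AS (
    -- level 1: questionnaires taken per user
    SELECT userid, COUNT(*) AS q_count FROM t GROUP BY userid
)
-- level 2: users per questionnaire count; the LEFT JOIN keeps empty buckets
SELECT b.n, COUNT(p.userid) AS user_count
FROM buckets b
LEFT JOIN per_user p ON p.q_count = b.n
GROUP BY b.n
ORDER BY b.n
"""
segments = conn.execute(sql).fetchall()
print(segments)
# [(1, 1), (2, 2), (3, 0), (4, 0), (5, 1), (6, 0), (7, 0)]
```

COUNT(p.userid) rather than COUNT(*) is what yields 0 for buckets with no users, because the LEFT JOIN produces a single row with NULL userid for those buckets.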

SQL - Computing overlap between Interests

I have a schema (millions of records with proper indexes in place) that looks like this:
groups | interests
------ | ---------
user_id | user_id
group_id | interest_id
A user can like 0..many interests and belong to 0..many groups.
Problem: Given a group ID, I want to get all the interests for all the users that do not belong to that group, and, that share at least one interest with anyone that belongs to the same provided group.
Since the above might be confusing, here's a straightforward example (SQLFiddle):
| 1 | 2 | 3 | 4 | 5 | (User IDs)
|-------------------|
| A | | A | | |
| B | B | B | | B |
| | C | | | |
| | | D | D | |
In the above example users are labeled with numbers while interests have characters.
If we assume that users 1 and 2 belong to group -1, then users 3 and 5 would be interesting:
user_id interest_id
------- -----------
3 A
3 B
3 D
5 B
I already wrote a dumb and very inefficient query that correctly returns the above:
SELECT * FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "interests" WHERE "interest_id" IN (
SELECT "interest_id" FROM "interests" WHERE "user_id" IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
) AND "user_id" NOT IN (
SELECT "user_id" FROM "groups" WHERE "group_id" = -1
)
);
But all my attempts to translate that into a proper joined query proved fruitless: either the query returns way more rows than it should, or it takes 10x as long as the sub-query version, like:
SELECT "iii"."user_id" FROM "interests" AS "iii"
WHERE EXISTS
(
SELECT "ii"."user_id", "ii"."interest_id" FROM "groups" AS "gg"
INNER JOIN "interests" AS "ii" ON "gg"."user_id" = "ii"."user_id"
WHERE EXISTS
(
SELECT "i"."interest_id" FROM "groups" AS "g"
INNER JOIN "interests" AS "i" ON "g"."user_id" = "i"."user_id"
WHERE "group_id" = -1 AND "i"."interest_id" = "ii"."interest_id"
) AND "group_id" != -1 AND "ii"."user_id" = "iii"."user_id"
);
I've been struggling trying to optimize this query for the past two nights...
Any help or insight that gets me in the right direction would be greatly appreciated. :)
PS: Ideally, one query that returns an aggregated count of common interests would be even nicer:
user_id totalInterests commonInterests
------- -------------- ---------------
3 3 1/2 (either is fine, but 2 is better)
5 1 1
However, I'm not sure how much slower it would be compared to doing it in code.
Using the following to set up test tables
--drop table Interests ----------------------------
CREATE TABLE Interests
(
InterestId char(1) not null
,UserId int not null
)
INSERT Interests values
('A',1)
,('A',3)
,('B',1)
,('B',2)
,('B',3)
,('B',5)
,('C',2)
,('D',3)
,('D',4)
-- drop table Groups ---------------------
CREATE TABLE Groups
(
GroupId int not null
,UserId int not null
)
INSERT Groups values
(-1, 1)
,(-1, 2)
SELECT * from Interests
SELECT * from Groups
The following query would appear to do what you want:
DECLARE @GroupId int
SET @GroupId = -1
;WITH cteGroupInterests (InterestId)
as (-- List of the interests referenced by the target group
select distinct InterestId
from Groups gr
inner join Interests nt
on nt.UserId = gr.UserId
where gr.GroupId = @GroupId)
-- Aggregate interests for each user
SELECT
UserId
,count(OwnInterestId) OwnInterests
,count(SharedInterestId) SharedInterests
from (-- Subquery lists all interests for each user
select
nt.UserId
,nt.InterestId OwnInterestId
,cte.InterestId SharedInterestId
from Interests nt
left outer join cteGroupInterests cte
on cte.InterestId = nt.InterestId
where not exists (-- Correlated subquery: is "this" user in the target group?
select 1
from Groups gr
where gr.GroupId = @GroupId
and gr.UserId = nt.UserId)) xx
group by UserId
having count(SharedInterestId) > 0
It appears to work, but I'd want to do more elaborate tests, and I've no idea how well it'd work against millions of rows. Key points are:
The CTE behaves like an inline view referenced by the later query; building an actual temp table instead might be a performance boost
Correlated subqueries can be tricky, but indexes and not exists should make this pretty quick
I was lazy and left out all the underscores, sorry
This is a bit confounding. I think the best approach is exists and not exists:
select i.*
from interests i
where not exists (select 1
from groups g
where i.user_id = g.user_id and
g.group_id = $group_id
) and
exists (select 1
from groups g join
interests i2
on g.user_id = i2.user_id
where g.group_id = $group_id and
i.interest_id = i2.interest_id
);
The first subquery is saying that the user is not in the group. The second is saying that the interest is shared with someone who is in the group.
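As a minimal sketch of the aggregated variant asked for in the PS (total and common interests per outside user), the query below combines a list of the group's interests with a NOT EXISTS membership check. It runs on the question's sample data in an in-memory SQLite database through Python, which stands in for the real engine, so it says nothing about performance on millions of rows. The CTE name `group_interests` and the parameter name `gid` are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE interests (user_id INTEGER, interest_id TEXT);
CREATE TABLE "groups"  (user_id INTEGER, group_id INTEGER);
""")
conn.executemany(
    "INSERT INTO interests VALUES (?, ?)",
    [(1, "A"), (3, "A"), (1, "B"), (2, "B"), (3, "B"), (5, "B"),
     (2, "C"), (3, "D"), (4, "D")],
)
conn.executemany('INSERT INTO "groups" VALUES (?, ?)', [(1, -1), (2, -1)])

sql = """
WITH group_interests AS (
    -- every interest held by at least one member of the target group
    SELECT DISTINCT i.interest_id
    FROM "groups" g
    JOIN interests i ON i.user_id = g.user_id
    WHERE g.group_id = :gid
)
SELECT i.user_id,
       COUNT(*)               AS total_interests,
       COUNT(gi.interest_id)  AS common_interests
FROM interests i
LEFT JOIN group_interests gi ON gi.interest_id = i.interest_id
WHERE NOT EXISTS (SELECT 1 FROM "groups" g
                  WHERE g.group_id = :gid AND g.user_id = i.user_id)
GROUP BY i.user_id
HAVING COUNT(gi.interest_id) > 0
ORDER BY i.user_id
"""
overlap = conn.execute(sql, {"gid": -1}).fetchall()
print(overlap)  # [(3, 3, 2), (5, 1, 1)]
```

User 4 is dropped by the HAVING clause because their only interest (D) is not held by anyone in the group.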

Query to select distinct values from different tables and not have them repeat (show them as a flat file)

I'm trying to get all phones, emails, and organizations for a person and show it in a flat file format. There should be n number of rows, where n is the max count of organizations, emails, or phones. NULL values will be shown once all values have been shown in the rows, with NULL being the last values. The emails and phones can only have 1 PreferredInd per person. I want these to be on the same row (1 of them can be NULL). I've tried to do this on a more complex query, but couldn't get it to work, so I've started over using this simpler example.
Example tables and values:
#ContactPerson
Id Name
1 John Doe
#ContactEmail
Id PersonId Email PreferredInd
1 1 johndoe@us.gov 0
2 1 jdoe@us.gov 1
3 1 johndoe@gmail.com 0
#ContactPhone
Id PersonId Phone PreferredInd
1 1 888-867-5309 0
2 1 305-476-5234 1
#ContactOrganization
Id PersonId Organization
1 1 US Government
2 1 US Army
I want a resulting set to look like:
Name Organization PreferredInd Email Phone
John Doe US Government 1 jdoe@us.gov 888-867-5309
John Doe US Army 0 johndoe@us.gov 305-467-5234
John Doe NULL 0 johndoe@gmail.com NULL
The complete sql code that I have for this example is here on pastebin. It also includes code to create the sample tables. It works when the count of emails exceeds the count of organizations or phones, but that won't always be true. I can't seem to figure out how to get the result that I'm looking for. The actual tables I'm working with can have 0 or infinity emails, phones, or organizations per person. There will also be many more values, but I can fix that myself.
Can you help me fix my query or show me a simpler way to do it? If you have any questions, just let me know and I can try to answer them.
something like this?
with cte_e as (
select
*,
row_number() over(partition by PersonId order by PreferredInd desc, Id) as rn
from ContactEmail
), cte_p as (
select
*,
row_number() over(partition by PersonId order by PreferredInd desc, Id) as rn
from ContactPhone
), cte_o as (
select
*,
row_number() over(partition by PersonId order by Organization) as rn
from ContactOrganization
), cte_d as (
select distinct rn, PersonId from cte_e union
select distinct rn, PersonId from cte_p union
select distinct rn, PersonId from cte_o
)
select
pr.Name, o.Organization, e.Email, p.Phone
from cte_d as d
left outer join ContactPerson as pr on pr.Id = d.PersonId
left outer join cte_e as e on e.PersonId = d.PersonId and e.rn = d.rn
left outer join cte_p as p on p.PersonId = d.PersonId and p.rn = d.rn
left outer join cte_o as o on o.PersonId = d.PersonId and o.rn = d.rn
sql fiddle demo
It's a bit clumsy, and I can think of a couple of other possible ways to do this, but I think this one is the most readable.
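Here is a minimal sketch of the row-numbering approach run end to end on the question's sample data, using an in-memory SQLite database through Python (window functions need SQLite 3.25 or newer). It numbers rows per person with PARTITION BY PersonId, which is an addition intended to keep the numbering correct when there is more than one person; the short CTE names are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContactPerson (Id INTEGER, Name TEXT);
CREATE TABLE ContactEmail (Id INTEGER, PersonId INTEGER, Email TEXT, PreferredInd INTEGER);
CREATE TABLE ContactPhone (Id INTEGER, PersonId INTEGER, Phone TEXT, PreferredInd INTEGER);
CREATE TABLE ContactOrganization (Id INTEGER, PersonId INTEGER, Organization TEXT);
INSERT INTO ContactPerson VALUES (1, 'John Doe');
INSERT INTO ContactEmail VALUES (1, 1, 'johndoe@us.gov', 0), (2, 1, 'jdoe@us.gov', 1),
                                (3, 1, 'johndoe@gmail.com', 0);
INSERT INTO ContactPhone VALUES (1, 1, '888-867-5309', 0), (2, 1, '305-476-5234', 1);
INSERT INTO ContactOrganization VALUES (1, 1, 'US Government'), (2, 1, 'US Army');
""")

sql = """
WITH e AS (SELECT PersonId, Email,
                  ROW_NUMBER() OVER (PARTITION BY PersonId
                                     ORDER BY PreferredInd DESC, Id) AS rn
           FROM ContactEmail),
     p AS (SELECT PersonId, Phone,
                  ROW_NUMBER() OVER (PARTITION BY PersonId
                                     ORDER BY PreferredInd DESC, Id) AS rn
           FROM ContactPhone),
     o AS (SELECT PersonId, Organization,
                  ROW_NUMBER() OVER (PARTITION BY PersonId
                                     ORDER BY Organization) AS rn
           FROM ContactOrganization),
     -- one row per (person, position) that exists in any of the three lists
     d AS (SELECT PersonId, rn FROM e UNION
           SELECT PersonId, rn FROM p UNION
           SELECT PersonId, rn FROM o)
SELECT pr.Name, o.Organization, e.Email, p.Phone
FROM d
JOIN ContactPerson pr ON pr.Id = d.PersonId
LEFT JOIN e ON e.PersonId = d.PersonId AND e.rn = d.rn
LEFT JOIN p ON p.PersonId = d.PersonId AND p.rn = d.rn
LEFT JOIN o ON o.PersonId = d.PersonId AND o.rn = d.rn
ORDER BY d.PersonId, d.rn
"""
flat_rows = conn.execute(sql).fetchall()
for row in flat_rows:
    print(row)
```

Because emails and phones are both ordered by PreferredInd descending, the preferred email and the preferred phone land on the same (first) row, and the shorter lists are padded with NULL (None in Python) at the end.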
Step 1
Write a query that does a full join of all the tables; it will end up with lots of duplicate rows for each person (one for each email or phone number).
Step 2
Write a second query that uses GROUP BY to group the rows and a CASE expression (or DECODE, similar to a C# switch statement) to find the preferred row value and select it as the value to display.