Count how many users have both month A and B - sql

I want to count how many users have rows for both january and february. I have a users table with this structure and data:
id | user | month
---+------+---------
1 | u1 | january
2 | u1 | february
3 | u2 | january
In my example the response would be 1.
I've tried:
SELECT COUNT(*) FROM (SELECT * FROM users WHERE users.month = 'january') s1 LEFT JOIN users s2 ON s1.user = s2.user AND s2.month = 'february';
In my actual data set, SELECT COUNT(*) FROM users WHERE users.month = 'january' returns about 100, so I expected the overall result to be no larger than that, yet it is much higher.
I'm sure the answer is very simple, but I'm not very proficient in SQL, so I don't know which part of the documentation I should be reading.

You can use aggregation with a HAVING clause:
select count(*)
from (select t.user
from t
where t.month in ('january', 'february')
group by t.user
having count(distinct t.month) = 2
) t;
If there is at most one row per user per month, then a join might have better performance:
select count(*)
from t tj join
t tf
on tj.user = tf.user and
tj.month = 'january' and
tf.month = 'february';
If you can have duplicates, then count(distinct user) is needed.
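As a rough check of the two approaches above, here is a minimal sketch that loads the question's three sample rows into an in-memory SQLite database (SQLite and the Python harness are assumptions made for illustration; the question does not name a database). It also runs the LEFT JOIN from the question, which shows one reason that count can be too high: a LEFT JOIN keeps january rows that have no february match.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user TEXT, month TEXT)")
conn.executemany(
    "INSERT INTO users (id, user, month) VALUES (?, ?, ?)",
    [(1, "u1", "january"), (2, "u1", "february"), (3, "u2", "january")],
)

# The LEFT JOIN from the question: u2 has no february row but is still counted.
left_join_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT * FROM users WHERE month = 'january') s1
    LEFT JOIN users s2 ON s1.user = s2.user AND s2.month = 'february'
    """
).fetchone()[0]

# Aggregation: keep users that have both months, then count them.
having_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT user
          FROM users
          WHERE month IN ('january', 'february')
          GROUP BY user
          HAVING COUNT(DISTINCT month) = 2) t
    """
).fetchone()[0]

# Self-join: COUNT(DISTINCT ...) guards against duplicate rows per user and month.
join_count = conn.execute(
    """
    SELECT COUNT(DISTINCT tj.user)
    FROM users tj
    JOIN users tf ON tj.user = tf.user
    WHERE tj.month = 'january' AND tf.month = 'february'
    """
).fetchone()[0]

print(left_join_count, having_count, join_count)  # prints: 2 1 1
```

On the sample data the LEFT JOIN returns 2 while both suggested queries return the expected 1. In a larger data set, duplicate february rows per user would inflate the LEFT JOIN count further.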

With EXISTS:
select count(distinct user)
from tablename t
where t.month in ('january', 'february')
and exists (
select 1 from tablename where user = t.user and month in ('january', 'february') and month > t.month
)

Related

SQL query that joins two tables and sum from second

I have this tables with example data:
Calendar
ID | Date
---+-----------
1 | 2020-01-01
2 | 2020-01-02
3 | 2020-01-03
EmployeeTimeWorked
ID | Date | HoursWorked | UserID
---+------------+--------------+-------
1 | 2020-01-01 | 2 | 2
2 | 2020-01-01 | 4 | 2
I want to write an MS-SQL query that shows the days on which a user has not worked their full hours, and how many hours they have left to work (they should work 8 hours per day), all within a time period, say a week.
The result should look like this:
EmployeeHaveNotWorked
Date | HoursLeftToWork
-----------+----------------
2020-01-01 | 2
Any idea how to make such a MS-SQL Query?
First get all users with all dates. This is done with a cross join. Seeing that you are using a UserID I suppose there is a users table. Otherwise get the users from the EmployeeTimeWorked table.
Then outer join the working times per user and date. This is a simple aggregation query.
Then subtract the worked hours from the required 8 hours.
select
u.userid,
c.date,
8 - coalesce(w.hours_worked, 0) as hours_left_to_work
from users u
cross join calendar c
left outer join
(
select userid, date, sum(hoursworked) as hours_worked
from employeetimeworked
group by userid, date
) w on w.userid = u.userid and w.date = c.date
order by u.userid, c.date;
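The three steps above can be tried on the question's sample data. This is a minimal sketch using an in-memory SQLite database rather than SQL Server (an assumption for the sake of a runnable example). Since the question shows no users table, the users are taken from EmployeeTimeWorked, and the one-week date range is a hypothetical filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE employeetimeworked (
        id INTEGER PRIMARY KEY, date TEXT, hoursworked INTEGER, userid INTEGER
    );
    INSERT INTO calendar VALUES (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
    INSERT INTO employeetimeworked VALUES (1, '2020-01-01', 2, 2), (2, '2020-01-01', 4, 2);
    """
)

rows = conn.execute(
    """
    SELECT u.userid, c.date, 8 - COALESCE(w.hours_worked, 0) AS hours_left_to_work
    FROM (SELECT DISTINCT userid FROM employeetimeworked) u  -- no users table in the sample
    CROSS JOIN calendar c                                    -- every user on every date
    LEFT JOIN (
        SELECT userid, date, SUM(hoursworked) AS hours_worked
        FROM employeetimeworked
        GROUP BY userid, date
    ) w ON w.userid = u.userid AND w.date = c.date           -- hours worked, if any
    WHERE c.date BETWEEN '2020-01-01' AND '2020-01-07'       -- hypothetical one-week window
    ORDER BY u.userid, c.date
    """
).fetchall()

for row in rows:
    print(row)
```

With this data, user 2 has 2 hours left on 2020-01-01 and the full 8 hours left on the two days without any entries. Adding a condition on hours_left_to_work > 0 in an outer query would hide fully worked days.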
Use a cross join to generate all user/date combinations, left join the hours actually worked, and keep the combinations with fewer than 8 hours:
select u.userid, c.date,
8 - coalesce(sum(etw.HoursWorked), 0) as remaining_time
from calendar c cross join
(select distinct userid from EmployeeTimeWorked) u left join
EmployeeTimeWorked etw
on etw.userid = u.userid and etw.date = c.date
group by u.userid, c.date
having coalesce(sum(etw.HoursWorked), 0) < 8
This query seems to have done it for me:
select * from (select c.Date, 8 - coalesce(sum(t.durationHours),0) hours_left_to_work
from Calendar c
left join TimeLog t on t.Date = c.Date
where c.date >= '2020-08-01' and c.date <= '2020-08-31'
group by c.Date) as q1
where q1.hours_left_to_work IS NOT NULL
AND q1.hours_left_to_work > 0;
(TimeLog is my name for the EmployeeTimeWorked table.)

Find the rows for customers that joined during a specific time period in another table

I have the 2 following tables:
customer_transaction
customer_id| event_name | event_date
1 | joined_rewards|2019-07-10
12 | joined_rewards|2018-07-10
17 | joined_rewards|2009-07-10
visit
customer_id| visit_start| visit_end|visit_type
1 | 2019-07-09|2019-07-11| IP
12 | 2018-06-11|2018-07-12| IP
17 | 2009-07-08|2009-07-10| EP
I want to know all the customers in the customer_transaction table that joined the rewards program between their visits of visit_type = IP. So for all the visit_types = IP, I want to know the customers who joined the rewards program during the frame of their visit period.
In this example, my new table would have customer ids 1 and 12.
I tried
SELECT DISTINCT customer_id, event_date
INTO visit_rewards
FROM customer_transaction
WHERE event_date BETWEEN (Select customer_id, visit_start, visit_end from visit)
SELECT
v.customer_id
FROM
visit v
JOIN customer_transaction ct
ON ct.customer_id = v.customer_id
AND ct.event_date BETWEEN v.visit_start AND v.visit_end
AND v.visit_type = 'IP'
If there can be multiple rows per customer in the visit table (as suggested by the name), I would suggest exists:
select ct.*
from customer_transaction ct
where ct.event_name = 'joined_rewards' and
exists (select 1
from visit v
where v.customer_id = ct.customer_id and
v.visit_start <= ct.event_date and
v.visit_end >= ct.event_date and
v.visit_type = 'IP'
);
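To see the EXISTS approach run against the question's sample rows, here is a minimal sketch on an in-memory SQLite database (SQLite is an assumption made for illustration; dates are stored as ISO text, so plain comparison orders them correctly). It uses the question's table name, visit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_transaction (customer_id INTEGER, event_name TEXT, event_date TEXT);
    CREATE TABLE visit (customer_id INTEGER, visit_start TEXT, visit_end TEXT, visit_type TEXT);
    INSERT INTO customer_transaction VALUES
        (1, 'joined_rewards', '2019-07-10'),
        (12, 'joined_rewards', '2018-07-10'),
        (17, 'joined_rewards', '2009-07-10');
    INSERT INTO visit VALUES
        (1, '2019-07-09', '2019-07-11', 'IP'),
        (12, '2018-06-11', '2018-07-12', 'IP'),
        (17, '2009-07-08', '2009-07-10', 'EP');
    """
)

# A customer qualifies if at least one of their own IP visits spans the join date.
rows = conn.execute(
    """
    SELECT ct.customer_id
    FROM customer_transaction ct
    WHERE ct.event_name = 'joined_rewards'
      AND EXISTS (SELECT 1
                  FROM visit v
                  WHERE v.customer_id = ct.customer_id
                    AND v.visit_start <= ct.event_date
                    AND v.visit_end >= ct.event_date
                    AND v.visit_type = 'IP')
    ORDER BY ct.customer_id
    """
).fetchall()

customer_ids = [r[0] for r in rows]
print(customer_ids)  # prints: [1, 12]
```

Customer 17 is excluded because their only visit has visit_type EP. Because EXISTS only tests for a match, a customer with several overlapping IP visits still appears once.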

Select count of total records and also distinct records

I have a table such as this:
PalmId | UserId | CreatedDate
1 | 1 | 2018-03-08 14:18:27.077
1 | 2 | 2018-03-08 14:18:27.077
1 | 3 | 2018-03-08 14:18:27.077
1 | 1 | 2018-03-08 14:18:27.077
I wish to know how many dates were created for Palm 1 and I also wish to know how many users have created those dates for Palm 1. So the outcome for first is 4 and outcome for second is 3
I am wondering if I can do that in a single query, as opposed to having to do a subquery and a join on itself as in the example below.
SELECT MT.[PalmId], COUNT(*) AS TotalDates, T1.[TotalUsers]
FROM [MyTable] MT
LEFT OUTER JOIN (
SELECT MT2.[PalmId], COUNT(*) AS TotalUsers
FROM [MyTable] MT2
GROUP BY MT2.[UserId]
) T1 ON T1.[PalmId] = MT.[PalmId]
GROUP BY MT.[PalmId], T1.[TotalUsers]
Based on your table, you could do something like this:
select count(distinct UserId) as N_Users,
count(CreatedDate) as N_Dates, -- count(*) would also count rows where CreatedDate is NULL
PalmId
from your_table
group by PalmId
If you want "4" and "3", then I think you want:
SELECT MT.PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT mt.UserId) as NumUsers
FROM MyTable MT
GROUP BY MT.PalmId
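A quick way to confirm the "4" and "3" is to run the single-query version on the four sample rows. This sketch uses an in-memory SQLite database, which is an assumption for illustration; COUNT(*) and COUNT(DISTINCT ...) behave the same way in SQL Server for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (PalmId INTEGER, UserId INTEGER, CreatedDate TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, 1, "2018-03-08 14:18:27.077"),
        (1, 2, "2018-03-08 14:18:27.077"),
        (1, 3, "2018-03-08 14:18:27.077"),
        (1, 1, "2018-03-08 14:18:27.077"),
    ],
)

# One pass over the table: total rows and distinct users per PalmId.
rows = conn.execute(
    """
    SELECT PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT UserId) AS NumUsers
    FROM MyTable
    GROUP BY PalmId
    """
).fetchall()

print(rows)  # prints: [(1, 4, 3)]
```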

Left Join with Group By

I am using PostgreSQL 9.4.
I have a table of workouts. Users can create multiple results for each workout, and a result has a score.
Given a list of workout_ids and two user_ids, I want to return the best score for each workout for each user. If the user does not have a result for that workout, I want to return a padded/null result.
SELECT "results".*, "workouts".*
FROM "results" LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
WHERE (
(user_id, workout_id, score) IN
(SELECT user_id, workout_id, MAX(score)
FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3)
GROUP BY user_id, workout_id)
)
In this query, the left join is acting as an inner join; I'm not getting any padding if the user has not got a result for the workout. This query should always return six rows, regardless of how many results exist.
Example data:
results
user_id | workout_id | score
-----------------------------
1 | 1 | 10
1 | 3 | 10
1 | 3 | 15
2 | 1 | 5
Desired result:
results.user_id | results.workout_id | max(results.score) | workouts.name
-------------------------------------------------------------------------
1 | 1 | 10 | Squat
1 | 2 | null | Bench
1 | 3 | 15 | Deadlift
2 | 1 | 5 | Squat
2 | 2 | null | Bench
2 | 3 | null | Deadlift
The WHERE clause filters out the rows that would carry NULL values, which is why the result is not what you expect.
Join the subquery results instead of filtering on them in the WHERE clause.
SELECT "results".*, "workouts".*,"max_score".*
FROM "results"
LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
LEFT JOIN (SELECT user_id, workout_id, MAX(score)
FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3)
GROUP BY user_id, workout_id) max_score ON workouts.id = max_score.workout_id AND results.user_id = max_score.user_id;
You need to alter the SELECT to get the correct columns.
SELECT DISTINCT ON (1, 2)
u.user_id
, w.id AS workout_id
, r.score
, w.name AS workout_name
FROM workouts w
CROSS JOIN (VALUES (1), (2)) u(user_id)
LEFT JOIN results r ON r.workout_id = w.id
AND r.user_id = u.user_id
WHERE w.id IN (1, 2, 3)
ORDER BY 1, 2, r.score DESC NULLS LAST;
Step by step explanation
Form a complete Cartesian product of given workouts and users.
Assuming the given workouts always exist.
Assuming that not all given users have results for all given workouts.
LEFT JOIN to results. All conditions go into the ON clause of the LEFT JOIN, not into the WHERE clause, which would exclude (workout_id, user_id) combinations that have no result. See:
Rails includes query with conditions not returning all results from left table
Finally pick the best result per (user_id, workout_id) with DISTINCT ON. While being at it, produce the desired sort order. See:
Select first row in each GROUP BY group?
Depending on the size of tables and data distribution there may be faster solutions. See:
Optimize GROUP BY query to retrieve latest row per user
Simple version
If all you want is the maximum score for each (user_id, workout_id) combination, there is a simpler version:
SELECT user_id, workout_id, max(r.score) AS score
FROM unnest('{1,2}'::int[]) u(user_id)
CROSS JOIN unnest('{1,2,3}'::int[]) w(workout_id)
LEFT JOIN results r USING (user_id, workout_id)
GROUP BY 1, 2
ORDER BY 1, 2;
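The same idea (build every user/workout combination first, then LEFT JOIN the results and aggregate) can be checked against the question's sample data. This is a sketch on an in-memory SQLite database, which is an assumption made so the example runs anywhere; SQLite has no unnest(), so VALUES lists in CTEs stand in for the arrays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (user_id INTEGER, workout_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 3, 10), (1, 3, 15), (2, 1, 5)],
)

rows = conn.execute(
    """
    WITH u(user_id) AS (VALUES (1), (2)),
         w(workout_id) AS (VALUES (1), (2), (3))
    SELECT u.user_id, w.workout_id, MAX(r.score) AS score
    FROM u
    CROSS JOIN w                               -- all six combinations
    LEFT JOIN results r
           ON r.user_id = u.user_id
          AND r.workout_id = w.workout_id      -- conditions stay in ON, not WHERE
    GROUP BY u.user_id, w.workout_id
    ORDER BY u.user_id, w.workout_id
    """
).fetchall()

for row in rows:
    print(row)
```

The query returns six rows regardless of how many results exist. Combinations without a result get a NULL score, which Python shows as None.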
How about using distinct on or row_number()?
SELECT DISTINCT ON (r.user_id, r.workout_id) r.*, w.*
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
ORDER BY r.user_id, r.workout_id, score desc;
The row_number() equivalent requires a subquery:
SELECT rw.*
FROM (SELECT r.*, w.*,
row_number() over (partition by user_id, workout_id order by score desc) as seqnum
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
) rw
WHERE seqnum = 1;
You should choose the columns more judiciously than using a *. The subquery might return errors in the case of duplicate column names.
EDIT:
You need to generate the rows first, and then the results for each. Here is one method, building on the second query:
SELECT u.user_id, w.workout_id, rw.score, rw.name
FROM (SELECT 1 as user_id UNION ALL SELECT 2) u CROSS JOIN
(SELECT 1 as workout_id UNION ALL SELECT 2 UNION ALL SELECT 3) w LEFT JOIN
(SELECT r.*, w.*,
row_number() over (partition by user_id, workout_id order by score desc) as seqnum
FROM "results" r LEFT JOIN
"workouts" w
ON w."id" = r."workout_id"
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3)
) rw
ON rw.user_id = u.user_id and rw.workout_id = w.workout_id and
rw.seqnum = 1;

(SQL) Match users belong to which group given user_id[]

user table
ID | name
1 | ada
2 | bob
3 | tom
group Table
ID | name
1 | group A
2 | group B
3 | group C
user_group Table
user_id | group_id
1 | 1
2 | 1
1 | 2
2 | 2
3 | 2
1 | 3
3 | 3
Given group of user ids : [1, 2, 3]
How do I query the group that all users in the above list belong to? (In this case: group B.)
To get all groups that contain exactly the specified users (i.e. all specified users and no other users)
DECLARE @numUsers int = 3
SELECT ug.group_id
--The MAX doesn't really do anything here because all
--rows with the same group id have the same name. The
--MAX is just used so we can select the group name even though
--we aren't grouping by group name
, MAX(g.name) AS name
FROM user_group ug
--Filter to only groups with three users
JOIN (SELECT group_id FROM user_group GROUP BY group_id HAVING COUNT(*) = @numUsers) ug2
ON ug.group_id = ug2.group_id
JOIN [group] g
ON ug.group_id = g.ID
WHERE user_id IN (1, 2, 3)
GROUP BY ug.group_id
--The distinct is only necessary if user_group
--isn't keyed by group_id, user_id
HAVING COUNT(DISTINCT user_id) = @numUsers
To get groups that contain all specified users:
DECLARE @numUsers int = 3
SELECT ug.group_id
--The MAX doesn't really do anything here because all
--rows with the same group id have the same name. The
--MAX is just used so we can select the group name even though
--we aren't grouping by group name
, MAX(g.name) AS name
FROM user_group ug
JOIN [group] g
ON ug.group_id = g.ID
WHERE user_id IN (1, 2, 3)
GROUP BY ug.group_id
--The distinct is only necessary if user_group
--isn't keyed by group_id, user_id
HAVING COUNT(DISTINCT user_id) = @numUsers
SQL Fiddle: http://sqlfiddle.com/#!6/0e968/3
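The "contains all specified users" query can also be tried on the question's sample data with the id list passed in as parameters. This is a minimal sketch on an in-memory SQLite database (an assumption; the answer above targets SQL Server). The table name groups is a hypothetical stand-in for the question's group table, since GROUP is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE groups (id INTEGER PRIMARY KEY, name TEXT);  -- hypothetical name for "group"
    CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
    INSERT INTO groups VALUES (1, 'group A'), (2, 'group B'), (3, 'group C');
    INSERT INTO user_group VALUES (1, 1), (2, 1), (1, 2), (2, 2), (3, 2), (1, 3), (3, 3);
    """
)

user_ids = [1, 2, 3]
placeholders = ", ".join("?" for _ in user_ids)  # one "?" per requested user id

# Keep only memberships of the requested users, then require that a group
# has as many distinct members among them as there are requested users.
query = f"""
    SELECT ug.group_id, g.name
    FROM user_group ug
    JOIN groups g ON g.id = ug.group_id
    WHERE ug.user_id IN ({placeholders})
    GROUP BY ug.group_id, g.name
    HAVING COUNT(DISTINCT ug.user_id) = ?
"""
matching = conn.execute(query, (*user_ids, len(user_ids))).fetchall()

print(matching)  # prints: [(2, 'group B')]
```

Passing len(user_ids) as the final parameter keeps the HAVING count in step with the list, so the same query works for lists of other lengths.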
Try This:
Select t2.name
FROM
(Select group_id
From
user_group
Group by group_id
Having Count(user_id) = (Select Count(*) FROM User_Table)) AS T1
INNER JOIN
Group_Table AS T2
ON T1.group_id = T2.ID
See Fiddle: http://sqlfiddle.com/#!2/fa7250/4
Select UserID,count(*)
From UserGroupTable
group by UserID
This will give a count of 3 where the UserID/GroupID is unique (as zerkms pointed out)
SELECT name FROM group_tbl WHERE id IN (SELECT g_id FROM user_grp GROUP BY g_id HAVING Count(u_id)=(SELECT Count(id) FROM user_tbl));