get sum of count of count postgres - sql

I have two tables jobs and users.
Users has a one-to-many relationship with jobs.
I want to segment users into groups of jobs_done.
In other words, how many users did 1 job, 2 jobs, 3 jobs, etc.?
The query below does that. However, I would like to lump together all users that have done 3 or more jobs into one group.
Here is the query I currently have:
select
jobs_done,
count(1) as number_of_users
from ( select
u.id,
count(*) as jobs_done
from jobs j
JOIN users u on j.user_id = u.id
group by u.id ) a
group by jobs_done
Current Output:
jobs_done number_of_users
1 255
2 100
3 30
4 10
5 9
Desired Output:
jobs_done number_of_users
1 255
2 100
3+ 49

You can use a case expression to group values 3+ into one large group. This should work:
select
case
when jobs_done >= 3 then '3+'
else cast(jobs_done as varchar(5))
end as jobs_done,
count(1) as number_of_users
from (
select
u.id,
count(*) as jobs_done
from jobs j
join users u on j.user_id = u.id
group by u.id
) a
group by case when jobs_done >= 3 then '3+'
else cast(jobs_done as varchar(5))
end;
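To see the bucketing on concrete rows, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite is only a stand-in for Postgres here, the five users and their jobs are made-up sample data, and the alias `jobs_done_group` is a name chosen for this illustration so that it does not clash with the inner `jobs_done` column.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users (id) VALUES (1), (2), (3), (4), (5);
    -- Made-up data: users 1 and 2 did 1 job, user 3 did 2, user 4 did 3, user 5 did 5.
    INSERT INTO jobs (user_id) VALUES
        (1), (2), (3), (3), (4), (4), (4), (5), (5), (5), (5), (5);
""")

rows = conn.execute("""
    SELECT CASE WHEN jobs_done >= 3 THEN '3+'
                ELSE CAST(jobs_done AS TEXT)
           END AS jobs_done_group,
           COUNT(*) AS number_of_users
    FROM (
        SELECT u.id, COUNT(*) AS jobs_done
        FROM jobs j
        JOIN users u ON j.user_id = u.id
        GROUP BY u.id
    ) a
    GROUP BY jobs_done_group
    ORDER BY jobs_done_group
""").fetchall()

print(rows)  # [('1', 2), ('2', 1), ('3+', 2)]
```

Note that the group label is text, so the ordering is alphabetical; with single-digit buckets plus '3+' that happens to match the numeric order.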

You can group by basically any expression. This is a simplistic example:
test=# SELECT CASE WHEN x < 4 THEN x::text ELSE '4+' END AS y,
count(*)
FROM generate_series(1, 10) AS x
GROUP BY y
ORDER BY 1;
 y  | count
----+-------
 1  |     1
 2  |     1
 3  |     1
 4+ |     7
(4 rows)

Related

SUM the COUNT results from a database table with two different value

I'd like to count total events, which can have two different values, and I could not figure out how to merge them together. My query is the following:
SELECT TOP(20)
[MatchEvents].[PlayerID], [MatchEvents].[EventType],
COUNT([MatchEvents].[ID]) AS [TOTAL]
FROM
[MatchEvents]
INNER JOIN
[Match] ON [MatchEvents].[MatchID] = [Match].[ID]
AND [Match].[Season] = 1
WHERE
([MatchEvents].[EventType] = 0 OR [MatchEvents].[EventType] = 1)
GROUP BY
[MatchEvents].[PlayerID], [MatchEvents].[EventType]
ORDER BY
[TOTAL] DESC
Current output:
PlayerID EventType Total
1 0 8
1 1 3
2 0 8
2 1 3
3 0 8
3 1 3
Expected output:
PlayerID Total
1 11
2 11
3 11
How could I merge my current results further?
Thanks!
From your expected results, it appears you just need to remove the grouping by EventType.
I would suggest the following:
select top(20) me.PlayerID, Count(*) as Total
from MatchEvents me
join [Match] m on m.Id = me.MatchId and m.Season = 1
where me.EventType in (0, 1)
group by me.PlayerID
order by Total desc;
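As a quick check of the idea (group by player only, so both event types fall into the same count), here is a small sketch using SQLite from Python. SQLite stands in for SQL Server, so `TOP(20)` becomes `LIMIT 20`, and the rows are invented sample data rather than the asker's real table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Match] (ID INTEGER PRIMARY KEY, Season INTEGER);
    CREATE TABLE MatchEvents (
        ID INTEGER PRIMARY KEY, MatchID INTEGER, PlayerID INTEGER, EventType INTEGER
    );
    INSERT INTO [Match] (ID, Season) VALUES (1, 1), (2, 2);
    -- Made-up events: only season-1 rows with EventType 0 or 1 should be counted.
    INSERT INTO MatchEvents (MatchID, PlayerID, EventType) VALUES
        (1, 1, 0), (1, 1, 0), (1, 1, 1),
        (1, 2, 0), (1, 2, 1),
        (1, 2, 2),
        (2, 1, 0);
""")

rows = conn.execute("""
    SELECT me.PlayerID, COUNT(*) AS Total
    FROM MatchEvents me
    JOIN [Match] m ON m.ID = me.MatchID AND m.Season = 1
    WHERE me.EventType IN (0, 1)
    GROUP BY me.PlayerID
    ORDER BY Total DESC
    LIMIT 20
""").fetchall()

print(rows)  # [(1, 3), (2, 2)]
```

Player 2's event of type 2 and player 1's season-2 event are excluded by the filters, so each player gets a single merged total.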

How to get these rows as columns in an SQL query

I need some help in writing up this SQL query using a single table. Something like this
User ID | Category | Spend | Transactions | Country
1 | Sport | 30 | 2 | USA
1 | Bills | 60 | 3 | USA
2 | Sport | 10 | 1 | MEX
3 | Grocery | 50 | 8 | CAN
2 | Grocery | 70 | 4 | MEX
3 | Sport | 20 | 5 | CAN
3 | Bills | 30 | 2 | CAN
1 | Petrol | 60 | 5 | USA
I then want one row per user ID, with the spend and the transactions each broken out into one column per category, and the country as a column by itself, like this:
User ID | Sport_Spend | Bills_Spend | Grocery_Spend | Petrol_Spend | Sport_Transactions | Bills_Transactions | Grocery_Transactions | Petrol_Transactions | Country
1 | 30 | 60 | 0 | 60 | 2 | 3 | 0 | 5 | USA
2 | 10 | 0 | 70 | 0 | 1 | 0 | 4 | 0 | MEX
3 | 20 | 30 | 50 | 0 | 5 | 2 | 8 | 0 | CAN
It's stumping me a bit, so I would appreciate some help.
@jarlh's comments are the most relevant and need to be addressed, but here is something to start with (MS SQL code). I left out the transactions columns to reduce the problem; the coding is just the same. https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=25550539029ba1c4be0826725bf9e00a
with data (UserID,Category,Spend,Transactions,Country) as(
select 1,'Sport',30,2,'USA' union all
select 1,'Bills',60,3,'USA' union all
select 2,'Sport',10,1,'MEX' union all
select 3,'Grocery',50,8,'CAN' union all
select 2,'Grocery',70,4,'MEX' union all
select 3,'Sport',20,5,'CAN' union all
select 3,'Bills',30,2,'CAN' union all
select 1,'Petrol',60,5,'USA'
)
select UserID
,isnull(SUM([Sport]),0)as Sport
,isnull(SUM([Bills]),0)as Bills
,isnull(SUM([Grocery]),0)as Grocery
,isnull(SUM([Petrol]),0)as Petrol
,MAX(Country)as Country
from (
select UserID,Category,Spend,Transactions,Country
from data) p
PIVOT(
SUM(SPEND)
For CATEGORY in ([Sport] ,[Bills] ,[Grocery] ,[Petrol])
)as PivotTable
group by UserID
select
COALESCE(user_id,0) as user_id,
COALESCE(Sport_Spend,0) as Sport_Spend,
COALESCE(Bills_Spend,0) as Bills_Spend,
COALESCE(Grocery_Spend,0) as Grocery_Spend,
COALESCE(Petrol_Spend,0) as Petrol_Spend,
COALESCE(Sport_Transactions,0) as Sport_Transactions,
COALESCE(Bills_Transactions,0) as Bills_Transactions,
COALESCE(Grocery_Transactions,0) as Grocery_Transactions,
COALESCE(Petrol_Transactions,0) as Petrol_Transactions
,country from
(SELECT DISTINCT user_id,country from table_name) as A
LEFT JOIN
(select user_id, spend as Sport_Spend ,transactions as Sport_Transactions from table_name where category='Sport') as B using (user_id)
LEFT JOIN
(select user_id, spend as Bills_Spend ,transactions as Bills_Transactions from table_name where category='Bills') as C using (user_id)
LEFT JOIN
(select user_id, spend as Grocery_Spend ,transactions as Grocery_Transactions from table_name where category='Grocery') as D using (user_id)
LEFT JOIN
(select user_id, spend as Petrol_Spend ,transactions as Petrol_Transactions from table_name where category='Petrol') as E using (user_id)
ORDER BY user_id;
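A different technique from the two answers above, conditional aggregation, can produce the same shape without PIVOT or repeated joins: one `SUM(CASE ...)` per output column. The sketch below runs it in SQLite from Python on the question's sample rows. The table name `spending` and the lowercase column names are assumptions made for this illustration.

```python
import sqlite3

# In-memory SQLite database; `spending` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spending (user_id INTEGER, category TEXT, spend INTEGER,"
    " transactions INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO spending VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sport", 30, 2, "USA"), (1, "Bills", 60, 3, "USA"),
        (2, "Sport", 10, 1, "MEX"), (3, "Grocery", 50, 8, "CAN"),
        (2, "Grocery", 70, 4, "MEX"), (3, "Sport", 20, 5, "CAN"),
        (3, "Bills", 30, 2, "CAN"), (1, "Petrol", 60, 5, "USA"),
    ],
)

# Each CASE keeps the value only for its own category; missing categories sum to 0.
rows = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN category = 'Sport'   THEN spend ELSE 0 END) AS Sport_Spend,
           SUM(CASE WHEN category = 'Bills'   THEN spend ELSE 0 END) AS Bills_Spend,
           SUM(CASE WHEN category = 'Grocery' THEN spend ELSE 0 END) AS Grocery_Spend,
           SUM(CASE WHEN category = 'Petrol'  THEN spend ELSE 0 END) AS Petrol_Spend,
           SUM(CASE WHEN category = 'Sport'   THEN transactions ELSE 0 END) AS Sport_Transactions,
           SUM(CASE WHEN category = 'Bills'   THEN transactions ELSE 0 END) AS Bills_Transactions,
           SUM(CASE WHEN category = 'Grocery' THEN transactions ELSE 0 END) AS Grocery_Transactions,
           SUM(CASE WHEN category = 'Petrol'  THEN transactions ELSE 0 END) AS Petrol_Transactions,
           MAX(country) AS Country
    FROM spending
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

for row in rows:
    print(row)
```

Like the PIVOT answer, this takes `MAX(country)` per user, which assumes each user has a single country. Unlike the LEFT JOIN version, it also still returns one row per user if a user has several rows in the same category.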

How to select IDs that have at least two specific instances in a given column

I'm working with a medical claim table in pyspark and I want to return only userid's that have at least 2 claim_ids. My table looks something like this:
claim_id | userid | diagnosis_type | claim_type
__________________________________________________
1 1 C100 M
2 1 C100a M
3 2 D50 F
5 3 G200 M
6 3 C100 M
7 4 C100a M
8 4 D50 F
9 4 A25 F
From this example, I would want to return userid's 1, 3, and 4 only. Currently I'm building a temp table to count all of the distinct instances of the claim_ids
create table temp.claim_count as
select distinct userid, count(distinct claim_id) as claims
from medical_claims
group by userid
and then pulling from this table when the number of claim_id >1
select distinct userid
from medical_claims
where userid in (
select distinct userid
from temp.claim_count
where claims>1)
Is there a better / more efficient way of doing this?
If you want only the ids, then use group by:
select userid, count(*) as claims
from medical_claims
group by userid
having count(*) > 1;
If you want the original rows, then use window functions:
select mc.*
from (select mc.*, count(*) over (partition by userid) as num_claims
from medical_claims mc
) mc
where num_claims > 1;
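The GROUP BY / HAVING form can be checked against the sample table from the question. This is only a sketch: it uses SQLite from Python rather than a Spark session, and it loads just the `claim_id` and `userid` columns, since the other columns do not affect the count.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Spark SQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medical_claims (claim_id INTEGER, userid INTEGER)")
conn.executemany(
    "INSERT INTO medical_claims VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (5, 3), (6, 3), (7, 4), (8, 4), (9, 4)],
)

# Keep only users with more than one claim; no temp table is needed.
rows = conn.execute("""
    SELECT userid, COUNT(*) AS claims
    FROM medical_claims
    GROUP BY userid
    HAVING COUNT(*) > 1
    ORDER BY userid
""").fetchall()

print(rows)  # [(1, 2), (3, 2), (4, 3)]
```

User 2 has a single claim and is filtered out, leaving users 1, 3, and 4 as the question expects. If `claim_id` could repeat within a user, `COUNT(DISTINCT claim_id)` would be the safer choice.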

Display all available users on a given day

I want to display all available users (user type: Employee) on a given schedule date. A user is not available if they are scheduled for both parts of the day (AM and PM).
Here are my tables:
User Types
TypeID TypeName
1 Admin
2 Employee
Users
UserID TypeID Name
1 1 Admin 1
2 2 Employee 1
3 2 Employee 2
4 1 Admin 2
5 2 Employee 3
6 2 Employee 4
7 2 Employee 5
Schedule
SchedID UserID SchedDate Day (PM/AM)
1 2 8/27/2013 PM
2 2 8/27/2013 AM
3 3 8/27/2013 AM
4 5 8/27/2013 PM
5 6 8/27/2013 AM
Expected Result (WHERE SchedDate='8/27/2013')
UserID Name
3 Employee 2
5 Employee 3
6 Employee 4
7 Employee 5
This is my current SQL statement:
SELECT Users.UserID, Users.Name FROM Users LEFT OUTER JOIN
Schedule ON Schedule.UserID = Users.UserID WHERE Users.TypeID = 5
Let's phrase this a little differently. A user is unavailable if the user has both AM and PM scheduled for the DAY column. Otherwise, the user is available.
Given that there are only two values in that column, the following query does the filtering you want:
SELECT u.UserID, u.Name
FROM Users u LEFT OUTER JOIN
Schedule s
ON s.UserID = u.UserID and
s.SchedDate = '2013-08-27'
WHERE u.TypeID = 2
GROUP BY u.UserID, u.Name
HAVING COUNT(distinct s.day) < 2;
If you know the values are never repeated, then you can change the having clause to:
HAVING COUNT(*) < 2;
This is a bit of a trick. When there is no match in the schedule table at all, the counts will return 0 (in the first case) or 1 (in the second case).
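The LEFT JOIN plus `HAVING COUNT(DISTINCT ...)` approach can be tried on the tables from the question. The sketch below uses SQLite from Python as a stand-in database, stores the date as ISO text ('2013-08-27'), uses the `SchedDate` column name from the Schedule table, and filters on TypeID 2, which is the Employee row in the User Types table.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, TypeID INTEGER, Name TEXT);
    CREATE TABLE Schedule (
        SchedID INTEGER PRIMARY KEY, UserID INTEGER, SchedDate TEXT, Day TEXT
    );
    INSERT INTO Users VALUES
        (1, 1, 'Admin 1'), (2, 2, 'Employee 1'), (3, 2, 'Employee 2'),
        (4, 1, 'Admin 2'), (5, 2, 'Employee 3'), (6, 2, 'Employee 4'),
        (7, 2, 'Employee 5');
    INSERT INTO Schedule VALUES
        (1, 2, '2013-08-27', 'PM'), (2, 2, '2013-08-27', 'AM'),
        (3, 3, '2013-08-27', 'AM'), (4, 5, '2013-08-27', 'PM'),
        (5, 6, '2013-08-27', 'AM');
""")

# The date filter sits in the ON clause so unscheduled employees keep a NULL-extended row.
rows = conn.execute("""
    SELECT u.UserID, u.Name
    FROM Users u
    LEFT JOIN Schedule s
           ON s.UserID = u.UserID AND s.SchedDate = ?
    WHERE u.TypeID = 2
    GROUP BY u.UserID, u.Name
    HAVING COUNT(DISTINCT s.Day) < 2
    ORDER BY u.UserID
""", ("2013-08-27",)).fetchall()

print(rows)
```

Employee 1 is booked for both AM and PM and drops out. Employee 5 has no schedule row at all, so `COUNT(DISTINCT s.Day)` is 0 for that user and the row is kept.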
SELECT USERS.USERID,
USERS.NAME
FROM USERS
WHERE USERS.TYPEID = 2
AND (NOT EXISTS (SELECT SCHEDID
FROM SCHEDULE
WHERE SCHEDULE.USERID = USERS.USERID
AND SCHEDULE.SCHEDDATE = '2013-08-27'
AND DAY = 'AM')
OR NOT EXISTS (SELECT SCHEDID
FROM SCHEDULE
WHERE SCHEDULE.USERID = USERS.USERID
AND SCHEDULE.SCHEDDATE = '2013-08-27'
AND DAY = 'PM'))

MySQL group by max value

Sample tables (many2many = users has many tickers and tickers has many users):
#users
id relevance
1 10
2 6
3 8
4 3
5 5
#users_tickers
user_id ticker_id
1 2
1 3
2 4
2 1
2 3
3 2
4 2
...
I must select users with max relevance for each ticker - so for each ticker one user with the best relevance.
How would you do that?
Something like this should do:
SELECT * FROM users u
INNER JOIN users_tickers ut ON ut.user_id=u.id
WHERE NOT EXISTS(
SELECT * FROM users u1
INNER JOIN users_tickers ut1 ON ut1.user_id=u1.id
WHERE ut1.ticker_id=ut.ticker_id AND u1.relevance > u.relevance
)
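Here is a small sketch of the NOT EXISTS pattern on the sample rows, run in SQLite from Python instead of MySQL. Only the seven `users_tickers` rows listed in the question are loaded (the elided rows are not guessed), and the selected columns are chosen for this illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, relevance INTEGER);
    CREATE TABLE users_tickers (user_id INTEGER, ticker_id INTEGER);
    INSERT INTO users VALUES (1, 10), (2, 6), (3, 8), (4, 3), (5, 5);
    INSERT INTO users_tickers VALUES
        (1, 2), (1, 3), (2, 4), (2, 1), (2, 3), (3, 2), (4, 2);
""")

# A row survives only if no other user on the same ticker has a higher relevance.
rows = conn.execute("""
    SELECT ut.ticker_id, u.id, u.relevance
    FROM users u
    JOIN users_tickers ut ON ut.user_id = u.id
    WHERE NOT EXISTS (
        SELECT 1
        FROM users u1
        JOIN users_tickers ut1 ON ut1.user_id = u1.id
        WHERE ut1.ticker_id = ut.ticker_id
          AND u1.relevance > u.relevance
    )
    ORDER BY ut.ticker_id
""").fetchall()

print(rows)  # [(1, 2, 6), (2, 1, 10), (3, 1, 10), (4, 2, 6)]
```

If two users tie for the top relevance on a ticker, this pattern returns both of them; a tie-breaker (for example on `id`) would be needed to guarantee exactly one user per ticker.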
Here is how I would do this in T-SQL. I am not familiar with MySQL's dialect.
SELECT * FROM users AS U
INNER JOIN user_tickers t ON t.user_id = U.user_id
WHERE U.RELEVANCE = (
SELECT MAX(RELEVANCE) FROM users as usub WHERE U.user_id = usub.user_id
)
ORDER BY U.relevance DESC
Produces:
user_id relevance user_id ticker_id
1 10 1 2
1 10 1 3
3 8 3 2
2 6 2 4
2 6 2 1
2 6 2 3
4 3 4 2
I think you can do this:
SELECT *,
(SELECT U2.user_id FROM users as U2, users_tickers as UT2
WHERE UT2.ticker_id = T1.ticker_id
AND UT2.user_id = U2.user_id
ORDER BY U2.relevance DESC
LIMIT 1)
FROM tickers as T1
I don't know if this is good code, but I think this is what you're looking for. Last time I did SQL, everybody was saying joins are slow and to use subqueries instead. I don't know if that still applies.
EDIT: I tested it in MySQL with the values you gave (and an added tickers table) and it works.
EDIT: And of course you can change the 'n' users by changing LIMIT 1... to LIMIT n
SELECT * FROM users_tickers LEFT JOIN users ON users.id = users_tickers.user_id WHERE ticker_id = 2 ORDER BY relevance DESC LIMIT 1;
This would give you the info you requested for a given ticker. Is this what you are asking for? or a complete list of tickers with its most relevant user?
I'm looking for a fast sql. I did it like this...
SELECT *
FROM (
SELECT id, relevance, ticker_id
FROM `users`
INNER JOIN users_tickers ON users_tickers.user_id=users.id
ORDER BY users.relevance DESC, users.created_at DESC
) AS X
GROUP BY ticker_id
ORDER BY relevance DESC
LIMIT 0,10
Still far from optimal :).