Count how many times a user logged in 1x, 2x, 3x - sql

I'm just beginning to learn SQL and this has completely stumped me. I join two tables on user_id where the event was a login. So far so good. Then I need to group those occurrences and count them to answer the question: how many users logged in once, twice, three times, and so on?
What I am having trouble with is referencing the first count (occurrences). I also can't group by occurrences, since it is an aggregate function.
Here is the code; it returns two columns, user_id and occurrences. The data is on www.mode.com.
SELECT
Users.user_id,
COUNT(Users.user_id) AS occurrences
FROM
tutorial.playbook_users Users
JOIN tutorial.playbook_events EVENTS ON Users.user_id = EVENTS.user_id
WHERE
EVENTS.event_name = 'login'
GROUP BY
1
ORDER BY
2

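So just aggregate it again. Compute the logins per user in a subquery, then group by that count and count the users in each bucket: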
SELECT
q.UserLogins AS occurrences,
COUNT(*) AS Total
FROM
(
SELECT
Users.user_id,
COUNT(EVENTS.user_id) AS UserLogins
FROM tutorial.playbook_users Users
JOIN tutorial.playbook_events EVENTS
ON EVENTS.user_id = Users.user_id
AND EVENTS.event_name = 'login'
GROUP BY Users.user_id
) q
GROUP BY q.UserLogins
ORDER BY q.UserLogins
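The two-level aggregation can be checked locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Mode (PostgreSQL) database. It drops the tutorial. schema prefix, and the rows are made up for illustration. The inner query yields one row per user with that user's login count, and the outer query counts how many users share each count.

```python
import sqlite3

# Made-up rows in stand-ins for tutorial.playbook_users / tutorial.playbook_events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playbook_users (user_id INTEGER PRIMARY KEY);
CREATE TABLE playbook_events (user_id INTEGER, event_name TEXT);
INSERT INTO playbook_users VALUES (1), (2), (3), (4);
INSERT INTO playbook_events VALUES
  (1, 'login'),
  (2, 'login'), (2, 'login'),
  (3, 'login'), (3, 'login'), (3, 'send_message'),
  (4, 'login'), (4, 'login'), (4, 'login');
""")

# Inner query: logins per user. Outer query: number of users per login count.
histogram = conn.execute("""
SELECT q.UserLogins AS occurrences, COUNT(*) AS Total
FROM (
  SELECT u.user_id, COUNT(e.user_id) AS UserLogins
  FROM playbook_users u
  JOIN playbook_events e
    ON e.user_id = u.user_id AND e.event_name = 'login'
  GROUP BY u.user_id
) q
GROUP BY q.UserLogins
ORDER BY q.UserLogins
""").fetchall()

print(histogram)  # [(1, 1), (2, 2), (3, 1)]
```

Users 2 and 3 both logged in twice, so the bucket for 2 logins has a total of 2. The send_message event is excluded by the join condition.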

Related

Trying to count occurrences from 3 columns in 2 tables for each row of my organizations table? I need the counts joined in one table

-- 2. In one table, show how many private topics, admins, and standard users each organization has.
SELECT organizations.name, COUNT(topics.privacy) AS private_topic, COUNT(users.type) AS user_admin, COUNT(users.type) AS user_standard
FROM organizations
LEFT JOIN topics
ON organizations.id=topics.org_id
AND topics.privacy='private'
LEFT JOIN users
ON users.org_id=organizations.id
AND users.type='admin'
LEFT JOIN users
ON users.org_id=organizations.id
AND users.type='standard'
GROUP BY organizations.name
;
org_id is the foreign key that relates both the users table and the topics table to organizations. The query keeps giving me the wrong result: it counts either only the admins or only the standard users and puts that same number in every column of each row. Any help is really appreciated, as I have been stuck on this for a while now!
So, when I do as you said, I get an error saying that the users table cannot be specified more than once. I updated the code the way you said to write it, but it still doesn't work. They don't give me any sample data either, but I ran some queries and checked, for example, how many private topics there are (the privacy column of the topics table). When I don't get this error, the joins seem to overwrite each other, so every column in a row shows the same value as the last join.
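It appears to me that topics and users have no relationship. You're just trying to get the results together in a single query. Because both tables are joined to organizations, every private topic is repeated once per user row, so plain counts are inflated; counting distinct ids removes the duplicates. There are other and possibly better ways to accomplish that, but I think this will fix what you've got already (assuming you have id columns for each table).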
SELECT
organizations.name,
COUNT(DISTINCT topics.id) AS private_topic,
COUNT(DISTINCT users.id) FILTER (WHERE users.type = 'admin') AS user_admin,
COUNT(DISTINCT users.id) FILTER (WHERE users.type = 'standard') AS user_standard
FROM organizations
LEFT JOIN topics
ON organizations.id = topics.org_id AND topics.privacy = 'private'
LEFT JOIN users
ON users.org_id = organizations.id
GROUP BY organizations.name;
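The next sketch shows the fan-out and the COUNT(DISTINCT ...) fix side by side. It uses Python's sqlite3 with hypothetical data, assumes each table has an id column, and writes the FILTER clause as an equivalent CASE expression. FILTER is the PostgreSQL spelling, and CASE also works on engines that lack it.

```python
import sqlite3

# Hypothetical sample data; id columns assumed as in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE topics (id INTEGER PRIMARY KEY, org_id INTEGER, privacy TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER, type TEXT);
INSERT INTO organizations VALUES (1, 'Acme'), (2, 'Beta');
INSERT INTO topics VALUES (1, 1, 'private'), (2, 1, 'private'), (3, 1, 'public');
INSERT INTO users VALUES
  (1, 1, 'admin'), (2, 1, 'standard'), (3, 1, 'standard'), (4, 1, 'standard'),
  (5, 2, 'standard');
""")

JOINS = """
FROM organizations o
LEFT JOIN topics t ON o.id = t.org_id AND t.privacy = 'private'
LEFT JOIN users u ON u.org_id = o.id
"""

# Plain COUNT: each private topic row is repeated once per user row (fan-out).
naive = conn.execute(
    "SELECT o.name, COUNT(t.id)" + JOINS + "GROUP BY o.name ORDER BY o.name"
).fetchall()

# COUNT(DISTINCT ...) collapses the duplicates. CASE without ELSE yields NULL,
# which COUNT ignores, so each column only counts one user type.
fixed = conn.execute(
    """
    SELECT o.name,
           COUNT(DISTINCT t.id) AS private_topic,
           COUNT(DISTINCT CASE WHEN u.type = 'admin' THEN u.id END) AS user_admin,
           COUNT(DISTINCT CASE WHEN u.type = 'standard' THEN u.id END) AS user_standard
    """ + JOINS + "GROUP BY o.name ORDER BY o.name"
).fetchall()

print(naive)  # [('Acme', 8), ('Beta', 0)]
print(fixed)  # [('Acme', 2, 1, 3), ('Beta', 0, 0, 1)]
```

Acme has 2 private topics and 4 users, so the join produces 8 rows. That is why a plain COUNT reports 8 private topics.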
I propose this as a more straightforward way:
SELECT
min(o.name) as "name",
(
select count(*) from topics t
where t.org_id = o.id AND t.privacy = 'private'
) as private_topics,
(
select count(*) from users u
where u.org_id = o.id and u.type = 'admin'
) AS user_admin,
(
select count(*) from users u
where u.org_id = o.id and u.type = 'standard'
) AS user_standard
FROM organizations o
GROUP BY o.id;
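The correlated-subquery approach avoids the fan-out altogether, because each count runs against its own table. Here is a minimal sketch on made-up data, with Python's sqlite3 standing in for the real database. It drops the GROUP BY and min(), on the assumption that organizations holds one row per id.

```python
import sqlite3

# Hypothetical sample data with the same shape as the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE topics (id INTEGER PRIMARY KEY, org_id INTEGER, privacy TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER, type TEXT);
INSERT INTO organizations VALUES (1, 'Acme'), (2, 'Beta');
INSERT INTO topics VALUES (1, 1, 'private'), (2, 1, 'private'), (3, 1, 'public');
INSERT INTO users VALUES
  (1, 1, 'admin'), (2, 1, 'standard'), (3, 1, 'standard'), (4, 1, 'standard'),
  (5, 2, 'standard');
""")

# Each scalar subquery is evaluated per organization row, so rows never multiply
# and an organization with no matches simply gets 0.
counts = conn.execute("""
SELECT o.name,
       (SELECT COUNT(*) FROM topics t
         WHERE t.org_id = o.id AND t.privacy = 'private') AS private_topics,
       (SELECT COUNT(*) FROM users u
         WHERE u.org_id = o.id AND u.type = 'admin') AS user_admin,
       (SELECT COUNT(*) FROM users u
         WHERE u.org_id = o.id AND u.type = 'standard') AS user_standard
FROM organizations o
ORDER BY o.name
""").fetchall()

print(counts)  # [('Acme', 2, 1, 3), ('Beta', 0, 0, 1)]
```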

How to count and group by column across one to many relationship while handling 0 case?

I am trying to formulate a single SQL query that will count rows across a one-to-many relationship. Here is the short version of my schema:
User(id)
Group(id)
UserGroup(user_id, group_id)
Post(id, user_id, group_id)
The goal is to return the count of posts for each user in a group. The specific issue I am running into is my current query cannot return 0 for a user that has no posts. Here is my naive query:
SELECT
COUNT(*) as total,
user_id
FROM
posts
WHERE
group_id = ?
GROUP BY user_id
ORDER BY
total DESC
This works fine when every user has a post, but when some have no posts, they do not show up in the list. How can I write a single query that handles this scenario and returns count 0 for said users? I know I need to somehow incorporate UserGroup to get the list of users, but am stuck from there.
Use a left join. Count a column from posts rather than *, so that users without posts get 0 instead of 1:
SELECT u.id, COUNT(p.id) as total
FROM users u LEFT JOIN
posts p
ON p.user_id = u.id AND
p.group_id = ?
GROUP BY u.id
ORDER BY total DESC
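The next sketch restricts the result to members of one group and shows why the counted column matters. It uses Python's sqlite3 with hypothetical rows and only the user_group and posts tables, because user_group already lists each member's id. The group id 10 is made up for the example.

```python
import sqlite3

# Hypothetical data: users 1-3 belong to group 10, user 4 to group 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
INSERT INTO user_group VALUES (1, 10), (2, 10), (3, 10), (4, 20);
INSERT INTO posts VALUES (100, 1, 10), (101, 1, 10), (102, 2, 10), (103, 3, 20);
""")

rows = conn.execute("""
SELECT ug.user_id,
       COUNT(p.id) AS total,      -- NULL-extended rows are skipped, giving 0
       COUNT(*)    AS count_star  -- counts the NULL-extended row, giving 1
FROM user_group ug
LEFT JOIN posts p
  ON p.user_id = ug.user_id AND p.group_id = ug.group_id
WHERE ug.group_id = ?
GROUP BY ug.user_id
ORDER BY total DESC, ug.user_id
""", (10,)).fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 1), (3, 0, 1)]
```

User 3 only posted in group 20, so the count for group 10 is 0 with COUNT(p.id) and 1 with COUNT(*).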
I think I got it, but I'm not sure how performant it is.
select count(p), u.id from users u left join (select * from workouts where group_id = ?) p on p.user_id = u.id where u.id in (select user_id from user_group where group_id = ?) group by u.id;

LEFT JOIN discarding left rows in results?

Simplifying my issue, let's say I have two tables:
"Users" storing user_id and event_date from users who access each day.
"Purchases" storing user_id, event_date and product_id from users who make purchases each day.
For all users, I need their respective product purchases, or a null product_id if the user didn't make a purchase. For that purpose I wrote this query:
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select user_id,product_id
from all_users
left join `my_project.my_dataset.Purchases`
using(user_id)
where event_date = "2019-12-01"
But this query returns only the user_ids that made purchases. In other words, rows from the LEFT from_item (all_users) are being omitted from the result.
Is this working as expected? I read that a LEFT JOIN always retains all rows of the left from_item.
EDIT 1:
Adding some screenshots:
This is the full query described above, but with the real names (table "Users" is "user_metrics_daily" and table "Purchases" is "virtual_currency_daily"). As you can see, I added COUNT(DISTINCT user_pseudo_id) OVER () to count how many distinct users are in the result.
On the other hand, this is a query to get the number of users I expect to have in the result (8935 users, with null values in product_id for users who don't purchase). But I actually got 2724 distinct users (the number of users who made purchases).
EDIT 2: I found a way to get my desired result, but I still don't understand what's wrong with my first query.
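Your query (as it is) filters on event_date after the join. all_users projects only user_id, so event_date can only come from my_project.my_dataset.Purchases. For users without a matching purchase that column is NULL, so the WHERE condition is not true for those rows. They are removed, and the LEFT JOIN ends up behaving like an inner join.
Besides that, say explicitly which table each projected column comes from: user_id from all_users and product_id from my_project.my_dataset.Purchases. Then move the Purchases date condition into the ON clause so that it only limits which purchases match: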
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select
a.user_id,
p.product_id
from all_users as a
left join `my_project.my_dataset.Purchases` as p on a.user_id = p.user_id
and p.event_date = "2019-12-01"
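The difference between filtering the right-hand table in WHERE and filtering it in ON can be reproduced on a small scale. The sketch uses Python's sqlite3 rather than BigQuery, with plain table names and made-up rows.

```python
import sqlite3

# Made-up rows: user 1 bought on the day, user 3 bought on another day, user 2 never.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER, event_date TEXT);
CREATE TABLE Purchases (user_id INTEGER, event_date TEXT, product_id TEXT);
INSERT INTO Users VALUES (1, '2019-12-01'), (2, '2019-12-01'), (3, '2019-12-01');
INSERT INTO Purchases VALUES (1, '2019-12-01', 'gems'), (3, '2019-11-30', 'coins');
""")

CTE = """
WITH all_users AS (
  SELECT user_id FROM Users WHERE event_date = '2019-12-01'
)
"""

# Date filter in WHERE: it runs after the join, so NULL-extended rows fail it.
in_where = conn.execute(CTE + """
SELECT a.user_id, p.product_id
FROM all_users AS a
LEFT JOIN Purchases AS p ON a.user_id = p.user_id
WHERE p.event_date = '2019-12-01'
ORDER BY a.user_id
""").fetchall()

# Date filter in ON: it only restricts which purchases match; every user is kept.
in_on = conn.execute(CTE + """
SELECT a.user_id, p.product_id
FROM all_users AS a
LEFT JOIN Purchases AS p
  ON a.user_id = p.user_id AND p.event_date = '2019-12-01'
ORDER BY a.user_id
""").fetchall()

print(in_where)  # [(1, 'gems')]
print(in_on)     # [(1, 'gems'), (2, None), (3, None)]
```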

Join two tables with sum and filtering

I have two tables Users and Inputs with the following schema
Users - id, name, create_time
Inputs - id, user_id, create_time, amount
I've created 2 similar queries that select joined data from the two tables.
The first query returns all users, adding a field called daily_amount which sums all the Inputs of each user in a given time range - Works fine
The second query - Adds a user id filter
When I limit the query to a specific user by id (id = 12 in the query below), I get inconsistent results: a single record, but it has a different id and the daily_amount is incorrect.
Your assistance is appreciated.
-- All users query - Works fine
SELECT Users.id,
Users.name,
Users.create_time,
SUM(Inputs.amount) AS daily_amount
FROM Users
LEFT JOIN Inputs ON Users.id = Inputs.user_id
AND Inputs.create_time BETWEEN startTime AND endTime
GROUP BY Users.id,
Users.name
-- User specific query
SELECT Users.id,
Users.name,
Users.create_time,
SUM(Inputs.amount) AS daily_amount
FROM Users
LEFT JOIN Inputs ON Users.id = Inputs.user_id
AND Users.id = 12 -- trying to filter only specific user by id
AND Inputs.create_time BETWEEN startTime AND endTime
GROUP BY Users.id,
Users.name
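Put the user filter in the WHERE clause. In a LEFT JOIN, a condition on the left table inside ON only decides which Inputs rows match; it never removes Users rows. Keep the Inputs date range in ON, though, because moving it to WHERE would also discard a user who has no inputs in the range:
SELECT Users.id,
Users.name,
Users.create_time,
SUM(Inputs.amount) AS daily_amount
FROM Users
LEFT JOIN Inputs ON Users.id = Inputs.user_id
AND Inputs.create_time BETWEEN startTime AND endTime
WHERE Users.id = 12
GROUP BY Users.id, Users.name, Users.create_time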
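The next sketch compares the three placements of the filters. It uses Python's sqlite3 with hypothetical rows, and start_time / end_time are made-up stand-ins for startTime / endTime.

```python
import sqlite3

# Hypothetical data: user 12 has two inputs in February and one in March.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT);
CREATE TABLE Inputs (id INTEGER PRIMARY KEY, user_id INTEGER,
                     create_time TEXT, amount INTEGER);
INSERT INTO Users VALUES (12, 'ann', '2020-01-01'), (13, 'bob', '2020-01-01'),
                         (14, 'cy', '2020-01-01');
INSERT INTO Inputs VALUES
  (1, 12, '2020-02-01', 5), (2, 12, '2020-02-02', 8),
  (3, 13, '2020-02-01', 100), (4, 12, '2020-03-01', 50);
""")

start_time, end_time = "2020-02-01", "2020-02-28"  # hypothetical range

def user_filter_in_on():
    # Users.id = 12 inside ON only limits which Inputs match; every user returns.
    return conn.execute("""
        SELECT u.id, SUM(i.amount) FROM Users u
        LEFT JOIN Inputs i ON u.id = i.user_id AND u.id = 12
          AND i.create_time BETWEEN ? AND ?
        GROUP BY u.id ORDER BY u.id
    """, (start_time, end_time)).fetchall()

def daily_amount(user_id):
    # User filter in WHERE, date range kept in ON.
    return conn.execute("""
        SELECT u.id, SUM(i.amount) AS daily_amount FROM Users u
        LEFT JOIN Inputs i ON u.id = i.user_id
          AND i.create_time BETWEEN ? AND ?
        WHERE u.id = ?
        GROUP BY u.id, u.name, u.create_time
    """, (start_time, end_time, user_id)).fetchall()

def both_in_where(user_id):
    # Date range also in WHERE: a user with no inputs in range disappears.
    return conn.execute("""
        SELECT u.id, SUM(i.amount) AS daily_amount FROM Users u
        LEFT JOIN Inputs i ON u.id = i.user_id
        WHERE u.id = ? AND i.create_time BETWEEN ? AND ?
        GROUP BY u.id, u.name, u.create_time
    """, (user_id, start_time, end_time)).fetchall()

print(user_filter_in_on())  # [(12, 13), (13, None), (14, None)]
print(daily_amount(12))     # [(12, 13)]
print(daily_amount(14))     # [(14, None)]
print(both_in_where(14))    # []
```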

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user who used that handset. I have a table UserSessions that stores the last activity of a particular user as well as the handset used in that activity. There are roughly 40 handsets, yet I always get back far too many records (around 10,000 rows) when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset, showing the user's last activity on that handset. What I get is multiple records for all handsets and dates, over 10,000 rows.
You typically GROUP BY the same columns you SELECT, except those that are arguments to set (aggregate) functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that no two result rows are duplicates across all the columns referenced in the SELECT. And there is MAX(), an aggregation function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
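The same "latest row per group" pattern can be tried with Python's sqlite3, which needs SQLite 3.25 or later for window functions. This sketch computes the per-user MAX in an inner subquery and applies RANK() one level up, which is equivalent to ranking over MAX directly as the query above does. The handset names, users, and timestamps are made up, and the timestamps are ISO strings so that they sort correctly as text.

```python
import sqlite3

# Hypothetical data; handset 1002 is marked deleted and should be ignored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0), (3, 1002, 1);
INSERT INTO Users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO UserSessions VALUES
  (1, 1, '2023-01-01 08:00'), (1, 1, '2023-01-03 09:00'),
  (2, 1, '2023-01-02 10:00'), (2, 2, '2023-01-05 11:00'),
  (1, 2, '2023-01-04 12:00'), (1, 3, '2023-01-06 07:00');
""")

latest = conn.execute("""
SELECT Name, Username, last_activity
FROM (
  -- Rank each handset's users by their latest activity on that handset.
  SELECT Name, Username, last_activity,
         RANK() OVER (PARTITION BY Name ORDER BY last_activity DESC) AS seqnum
  FROM (
    -- One row per (handset, user) with that user's latest activity.
    SELECT h.Name, u.Username, MAX(us.LastActivity) AS last_activity
    FROM UserSessions us
    JOIN Handsets h ON h.HandsetId = us.HandsetId
    JOIN Users u ON u.UserId = us.UserId
    WHERE h.Name IN (1000, 1001, 1002) AND h.Deleted = 0
    GROUP BY h.Name, u.Username
  ) per_user
) ranked
WHERE seqnum = 1
ORDER BY Name
""").fetchall()

print(latest)  # [(1000, 'ann', '2023-01-03 09:00'), (1001, 'bob', '2023-01-05 11:00')]
```

RANK() keeps every user tied for the latest timestamp. ROW_NUMBER() would return exactly one row per handset and pick among ties arbitrarily unless the ORDER BY includes a tie-breaker.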