SQL subquery rows as GROUP BY columns - sql

I want columns in the response for each task_type, with counts grouped by date_trunc('day') and user_id. So once the whole query runs, it would return a task_type_1 column whose value is the number of tasks of that type for a given user on a given day.
So far I have this, which runs, but I'm not sure how to add the task_type grouping to this query:
SELECT users.id AS user_id,
date_trunc('day', workforce_assigned_tasks.created_at) AS day,
SUM(workforce_assigned_tasks.duration) AS duration,
SUM(workforce_assigned_tasks.earnings_cents) AS earnings_cents,
SUM(workforce_assigned_tasks.subtask_count) AS subtask_count,
WHAT GOES HERE?
FROM users
JOIN workforce_assigned_tasks ON workforce_assigned_tasks.user_id = users.id
JOIN workforce_tasks ON workforce_assigned_tasks.workforce_task_id = workforce_tasks.id
GROUP BY date_trunc('day', workforce_assigned_tasks.created_at), users.id;

You can use conditional aggregation, which in Postgres uses the FILTER clause:
SELECT u.id AS user_id, date_trunc('day', wat.created_at) AS day,
SUM(wat.duration) AS duration,
SUM(wat.earnings_cents) AS earnings_cents,
SUM(wat.subtask_count) AS subtask_count,
COUNT(*) FILTER (WHERE wt.task_type = 1) AS task_type_1
FROM users u JOIN
workforce_assigned_tasks wat
ON wat.user_id = u.id JOIN
workforce_tasks wt
ON wat.workforce_task_id = wt.id
GROUP BY date_trunc('day', wat.created_at), u.id;
I am guessing that task_type is in workforce_tasks.
Note that the use of table aliases makes the query easier to write and to read.
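To see conditional aggregation end to end, here is a minimal sketch that runs against an in-memory SQLite database from Python. It is only a stand-in for the Postgres setup in the question: the sample rows, the integer task_type values 1 and 2, and the use of date() in place of date_trunc('day', ...) are all assumptions made for illustration. It uses the portable SUM(CASE ...) form, which should behave the same as COUNT(*) FILTER (...) here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workforce_tasks (id INTEGER PRIMARY KEY, task_type INTEGER);
CREATE TABLE workforce_assigned_tasks (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    workforce_task_id INTEGER,
    created_at TEXT
);
-- Hypothetical sample data: task 1 has type 1, task 2 has type 2.
INSERT INTO workforce_tasks VALUES (1, 1), (2, 2);
INSERT INTO workforce_assigned_tasks VALUES
    (1, 10, 1, '2021-01-01 09:00:00'),
    (2, 10, 1, '2021-01-01 17:30:00'),
    (3, 10, 2, '2021-01-01 18:00:00'),
    (4, 10, 2, '2021-01-02 08:00:00'),
    (5, 11, 1, '2021-01-01 12:00:00');
""")

# One output column per task type; each CASE contributes 1 only for matching rows.
rows = conn.execute("""
SELECT wat.user_id,
       date(wat.created_at) AS day,
       SUM(CASE WHEN wt.task_type = 1 THEN 1 ELSE 0 END) AS task_type_1,
       SUM(CASE WHEN wt.task_type = 2 THEN 1 ELSE 0 END) AS task_type_2
FROM workforce_assigned_tasks wat
JOIN workforce_tasks wt ON wat.workforce_task_id = wt.id
GROUP BY wat.user_id, date(wat.created_at)
ORDER BY wat.user_id, day
""").fetchall()

for row in rows:
    print(row)
```

Each task type needs its own expression, so this works best when the set of types is small and known in advance.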

Related

Join activities with users makes redundant rows, I just want first activity

I am on a Postgres database.
I have a raw SQL query like the following that I intend to use to query the database:
Select
acts.created_at as "firstActivity",
users.*
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
The problem is that there are multiple activities between the user's creation and update. The created_at and updated_at fields are timestamps like 2021-11-10 09:27:14+00.
I would like to return only the first of the activities between the two times.
Take a look at DISTINCT ON (expression); this is probably what you need:
Select DISTINCT ON (users.id) users.id, acts.created_at as "firstActivity"
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
Order by users.id, acts.created_at
See documentation
If I understood your question correctly, you should use the function
min(created_at/updated_at)
in your query, and you can add the condition that the value falls between dates A and B.
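A minimal sketch of that MIN() idea, run on an in-memory SQLite database from Python as a stand-in for Postgres. The table layout follows the question, but the sample rows are made up, and timestamps are stored as ISO-8601 text (an assumption that makes string comparison match chronological order).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, region_id TEXT, created_at TEXT, updated_at TEXT
);
CREATE TABLE activities (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
-- Hypothetical sample data.
INSERT INTO users VALUES
    (1, '1', '2021-11-10 09:00:00', '2021-11-20 09:00:00'),
    (2, '1', '2021-11-11 09:00:00', '2021-11-12 09:00:00');
INSERT INTO activities VALUES
    (1, 1, '2021-11-09 08:00:00'),  -- before user 1 was created, so excluded
    (2, 1, '2021-11-12 10:00:00'),
    (3, 1, '2021-11-15 10:00:00'),
    (4, 2, '2021-11-11 10:00:00');
""")

# Grouping by user collapses the matching activities to one row per user;
# MIN() keeps the earliest timestamp inside the created/updated window.
rows = conn.execute("""
SELECT u.id, MIN(a.created_at) AS first_activity
FROM users u
JOIN activities a
  ON a.user_id = u.id
 AND a.created_at > u.created_at
 AND a.created_at < u.updated_at
WHERE u.region_id = '1'
GROUP BY u.id
ORDER BY u.id
""").fetchall()

print(rows)
```

This returns only the timestamp of the first activity. If other columns of that activity row are needed, DISTINCT ON or a lateral join is likely the better fit.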
If you only want to join the first activity per user, then use a lateral join:
select
a.created_at as "firstActivity",
u.*
from users u
cross join lateral
(
select *
from activities acts
where acts.user_id = u.id
and acts.created_at > u.created_at
and acts.created_at < u.updated_at
order by acts.created_at
fetch first row only
) a
where u.region_id = '1';
By joining only the rows you are interested in, you avoid building a big intermediate result that you must then deal with afterwards.

right way to alias count * in a subquery

I have the query below:
select t.comment_count, count(*) as frequency
from
(select u.id, count(c.user_id) as comment_count
from users u
left join comments c
on u.id = c.user_id
and c.created_at between '2020-01-01' and '2020-01-31'
group by 1) t
group by 1
order by 1
When I try to write the count(*) as count(t.*), it gives an error. Can I not alias it with the t from the table? I'm not sure what I am missing.
Thank you
Count(*) stands for the count of all rows returned by a query (per group, when there is a GROUP BY). So it makes no sense to qualify it with one of the involved tables; consider counting the rows produced by a join, for example. If you need a count of rows from the specific table t, you can use count(distinct t.<unique column>).
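The difference between count(*) and counting a column shows up most clearly with a left join. Below is a small sketch using Python's sqlite3 module; the users and comments tables mirror the question, while the rows themselves are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
-- Hypothetical sample data.
INSERT INTO users VALUES (1), (2), (3), (4);
INSERT INTO comments VALUES
    (1, 1, '2020-01-05'),
    (2, 1, '2020-01-20'),
    (3, 2, '2020-01-10'),
    (4, 3, '2020-02-01');  -- outside January, filtered out by the ON clause
""")

# COUNT(*) counts joined rows (including the NULL-extended row of a user
# without comments); COUNT(c.user_id) counts only rows with a real comment.
per_user = conn.execute("""
SELECT u.id, COUNT(*) AS row_count, COUNT(c.user_id) AS comment_count
FROM users u
LEFT JOIN comments c
  ON u.id = c.user_id
 AND c.created_at BETWEEN '2020-01-01' AND '2020-01-31'
GROUP BY u.id
ORDER BY u.id
""").fetchall()

# The histogram from the question: how many users have each comment count.
histogram = conn.execute("""
SELECT t.comment_count, COUNT(*) AS frequency
FROM (SELECT u.id, COUNT(c.user_id) AS comment_count
      FROM users u
      LEFT JOIN comments c
        ON u.id = c.user_id
       AND c.created_at BETWEEN '2020-01-01' AND '2020-01-31'
      GROUP BY u.id) t
GROUP BY t.comment_count
ORDER BY t.comment_count
""").fetchall()

print(per_user)
print(histogram)
```

In the outer query the subquery t is the only row source, so count(*) already counts rows of t and no qualifier is needed.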

How to pull the count of occurences from 2 SQL tables

I am using Python on a SQLite3 DB I created. The DB is set up, and for now I am just using the command line to try to get the SQL statement right.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this statement to the creation of my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing the name and message count during 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling with figuring out if I should Join the tables, or if there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';
One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
You could also join and aggregate. If you want to include users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to filter the date column against literal dates than to apply a function to it (which precludes the use of an index).
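Since the question is about Python and SQLite, here is a small runnable sketch of the join-and-aggregate variant. The sample rows and the index name idx_messages_user_date are made up, dates are assumed to be stored as ISO-8601 text, and the message_count column from the question is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    date TEXT,
    message TEXT,
    user_id INTEGER,
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);
-- Hypothetical index name; supports lookups by user and date range.
CREATE INDEX idx_messages_user_date ON messages (user_id, date);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO messages (date, message, user_id) VALUES
    ('2020-03-01', 'hi', 1),
    ('2020-12-31', 'bye', 1),
    ('2021-01-01', 'happy new year', 1),
    ('2019-12-31', 'too early', 2);
""")

# LEFT JOIN keeps users with no 2020 messages; COUNT(m.user_id) gives them 0.
with_zeroes = conn.execute("""
SELECT u.name, COUNT(m.user_id) AS cnt_messages
FROM users u
LEFT JOIN messages m
  ON m.user_id = u.user_id
 AND m.date >= '2020-01-01' AND m.date < '2021-01-01'
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()

# An inner join with the filter in WHERE drops those users entirely.
only_active = conn.execute("""
SELECT u.name, COUNT(*) AS cnt_messages
FROM users u
JOIN messages m ON m.user_id = u.user_id
WHERE m.date >= '2020-01-01' AND m.date < '2021-01-01'
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()

print(with_zeroes)
print(only_active)
```

Which form is preferable depends on whether users without messages in 2020 should appear in the list with a count of zero.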
You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user that used that handset. I have a table UserSessions that stores the last activity of a particular user as well as the handset they used in that activity. There are roughly 40 handsets, yet I always get back far too many records, around 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset, showing the user's last activity with that handset. What I get is multiple records for every handset and date, over 10,000 rows.
You typically GROUP BY the same columns as you SELECT, except those that are arguments to aggregate (set) functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
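To illustrate why the original GROUP BY returns so many rows, here is a small sketch on an in-memory SQLite database driven from Python. The table and column names follow the question; the rows and the integer handset names are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
-- Hypothetical sample data.
INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0);
INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO UserSessions VALUES
    (1, 1, '2021-03-01 08:00:00'),
    (1, 1, '2021-03-02 08:00:00'),
    (1, 1, '2021-03-03 10:00:00'),
    (2, 2, '2021-03-01 09:00:00'),
    (2, 2, '2021-03-02 09:00:00');
""")

joins = """
FROM UserSessions us
JOIN Handsets h ON h.HandsetId = us.HandsetId
JOIN Users u ON u.UserId = us.UserId
WHERE h.Name IN (1000, 1001) AND h.Deleted = 0
"""

# Grouping by LastActivity makes every distinct timestamp its own group,
# so MAX() is computed over groups that hold a single timestamp each.
too_many = conn.execute(
    "SELECT MAX(us.LastActivity), h.Name, u.Username" + joins +
    "GROUP BY us.LastActivity, h.Name, u.Username"
).fetchall()

# Leaving the aggregated column out of GROUP BY yields one row per group.
fixed = conn.execute(
    "SELECT MAX(us.LastActivity), h.Name, u.Username" + joins +
    "GROUP BY h.Name, u.Username ORDER BY h.Name"
).fetchall()

print(len(too_many))
print(fixed)
```

Note that the corrected query still returns one row per handset and user pair, so a handset used by several users appears several times.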
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that the combination of all columns referenced in the SELECT is not duplicated across multiple rows. And there is MAX(), an aggregation function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1

Using SQL Aggregate Functions With Multiple Joins

I am attempting to use multiple aggregate functions across multiple tables in a single SQL query (using Postgres).
My tables are structured similarly to the following:
CREATE TABLE user (user_id INT PRIMARY KEY, user_date_created TIMESTAMP NOT NULL);
CREATE TABLE item_sold (item_sold_id INT PRIMARY KEY, sold_user_id INT NOT NULL);
CREATE TABLE item_bought (item_bought_id INT PRIMARY KEY, bought_user_id INT NOT NULL);
I want to count the number of items bought and sold for each user. The solution I thought up does not work:
SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id)
FROM user
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
That seems to perform all the combinations of (item_sold_id, item_bought_id), e.g. if there are 4 sold and 2 bought, both COUNT()s are 8.
How can I properly query the table to obtain both counts?
The easy fix to your query is to use distinct:
SELECT user_id, COUNT(distinct item_sold_id), COUNT(distinct item_bought_id)
FROM "user"
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
However, the query is doing unnecessary work. If someone has 100 items bought and 200 items sold, then the join produces 20,000 intermediate rows. That is a lot.
The solution is to pre-aggregate the results or use a correlated subquery in the select. In this case, I prefer the correlated subquery solution (assuming the right indexes are available):
SELECT u.user_id,
(select count(*) from item_sold s where u.user_id = s.sold_user_id),
(select count(*) from item_bought b where u.user_id = b.bought_user_id)
FROM "user" u
WHERE u.user_date_created > '2014-01-01';
The right indexes are item_sold(sold_user_id) and item_bought(bought_user_id). I prefer this over pre-aggregation because of the filtering on the user table. This only does the calculations for users created this year -- that is harder to do with pre-aggregation.
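The row multiplication and the correlated-subquery fix can be reproduced with a small sketch in Python's sqlite3 module, used here as a stand-in for Postgres. The table is named app_user (a hypothetical name chosen to sidestep the reserved word user), and the rows are invented to match the "4 sold and 2 bought" case from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, user_date_created TEXT NOT NULL);
CREATE TABLE item_sold (item_sold_id INTEGER PRIMARY KEY, sold_user_id INTEGER NOT NULL);
CREATE TABLE item_bought (item_bought_id INTEGER PRIMARY KEY, bought_user_id INTEGER NOT NULL);
-- Hypothetical sample data: user 1 sold 4 items and bought 2; user 2 has none.
INSERT INTO app_user VALUES (1, '2014-06-01 00:00:00'), (2, '2014-07-01 00:00:00');
INSERT INTO item_sold VALUES (1, 1), (2, 1), (3, 1), (4, 1);
INSERT INTO item_bought VALUES (1, 1), (2, 1);
""")

# Two independent one-to-many joins pair every sold row with every bought row.
naive = conn.execute("""
SELECT u.user_id, COUNT(s.item_sold_id), COUNT(b.item_bought_id)
FROM app_user u
LEFT JOIN item_sold s ON s.sold_user_id = u.user_id
LEFT JOIN item_bought b ON b.bought_user_id = u.user_id
WHERE u.user_date_created > '2014-01-01'
GROUP BY u.user_id
ORDER BY u.user_id
""").fetchall()

# Correlated subqueries count each child table on its own, one user at a time.
correlated = conn.execute("""
SELECT u.user_id,
       (SELECT COUNT(*) FROM item_sold s WHERE s.sold_user_id = u.user_id),
       (SELECT COUNT(*) FROM item_bought b WHERE b.bought_user_id = u.user_id)
FROM app_user u
WHERE u.user_date_created > '2014-01-01'
ORDER BY u.user_id
""").fetchall()

print(naive)
print(correlated)
```

The naive query reports 8 for both counts of user 1 because 4 sold rows times 2 bought rows gives 8 joined rows, while the subquery version never builds that intermediate result.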
With a lateral join it is possible to pre-aggregate only the filtered users:
select user_id, total_item_sold, total_item_bought
from
"user" u
left join lateral (
select sold_user_id, count(*) as total_item_sold
from item_sold
where sold_user_id = u.user_id
group by sold_user_id
) item_sold on user_id = sold_user_id
left join lateral (
select bought_user_id, count(*) as total_item_bought
from item_bought
where bought_user_id = u.user_id
group by bought_user_id
) item_bought on user_id = bought_user_id
where u.user_date_created >= '2014-01-01'
Notice that you need >= in the filter; otherwise it is possible to miss the exact first moment of the year. Although that timestamp is unlikely with naturally entered data, it is common with an automated job.
Another way to solve this problem is to use two nested selects.
select user_id,
(select count(*) from item_sold where sold_user_id = user_id),
(select count(*) from item_bought where bought_user_id = user_id)
from "user"
where user_date_created > '2014-01-01'