How to pull the count of occurrences from 2 SQL tables

I am using Python with an SQLite3 database I created. The database exists, and for now I am just using the command line to try to get the SQL statement right.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this clause when creating my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing each user's name and their message count for 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling to figure out whether I should join the tables, or whether there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';

One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
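To try the correlated subquery from Python, here is a minimal sketch using the standard sqlite3 module and an in-memory database. The sample rows and the index name idx_messages_user_date are hypothetical, and it assumes date holds ISO-8601 text (e.g. '2020-06-15 12:00:00') so that string comparison matches chronological order. EXPLAIN QUERY PLAN is used to check that the subquery searches the index on messages(user_id, date).

```python
import sqlite3

# Hypothetical sample data; dates are assumed to be ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, message_count INTEGER);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    date TEXT,
    message TEXT,
    user_id INTEGER,
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);
-- Index suggested by the answer (the name is made up).
CREATE INDEX idx_messages_user_date ON messages (user_id, date);
INSERT INTO users (user_id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO messages (date, message, user_id) VALUES
    ('2019-12-31 23:59:59', 'too early', 1),
    ('2020-01-01 00:00:00', 'a', 1),
    ('2020-06-15 12:00:00', 'b', 1),
    ('2020-12-31 23:59:59', 'c', 2),
    ('2021-01-01 00:00:00', 'too late', 2);
""")

query = """
SELECT u.name,
       (SELECT COUNT(*)
          FROM messages m
         WHERE m.user_id = u.user_id
           AND m.date >= '2020-01-01' AND m.date < '2021-01-01') AS cnt_messages
FROM users u
ORDER BY u.user_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 2), ('bob', 1), ('carol', 0)]

# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(plan)
```

Note that carol, who has no 2020 messages, still appears with a count of 0, because the subquery is evaluated once per row of users.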
You could also join and aggregate. If you want to include users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to filter the date column against literal dates than applying a function on it (which precludes the use of an index).
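A sketch of the LEFT JOIN version and of the index note, again with sqlite3 and hypothetical data (ISO-8601 text dates assumed; idx_messages_date is a made-up index name). It also shows why the answer counts m.user_id rather than *: for a user with no matching messages the LEFT JOIN still produces one row of NULLs, which COUNT(*) would count as 1.

```python
import sqlite3

# Hypothetical data; ISO-8601 text dates assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, date TEXT, message TEXT, user_id INTEGER);
CREATE INDEX idx_messages_date ON messages (date);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO messages (date, message, user_id) VALUES
    ('2020-03-01 10:00:00', 'hi', 1),
    ('2020-04-01 10:00:00', 'again', 1),
    ('2021-02-01 10:00:00', 'later', 2);
""")

left_join = """
SELECT u.name, COUNT(m.user_id) AS cnt_messages
FROM users u
LEFT JOIN messages m
  ON m.user_id = u.user_id AND m.date >= '2020-01-01' AND m.date < '2021-01-01'
GROUP BY u.user_id, u.name
ORDER BY u.user_id
"""
counts = conn.execute(left_join).fetchall()
print(counts)  # [('alice', 2), ('bob', 0)]

# Same query with COUNT(*): bob's all-NULL row is counted, giving 1 instead of 0.
count_star = conn.execute(left_join.replace("COUNT(m.user_id)", "COUNT(*)")).fetchall()
print(count_star)  # [('alice', 2), ('bob', 1)]

def plan(sql):
    """Concatenate the detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A range on the bare column can SEARCH the index; substr(date, ...) cannot.
range_plan = plan("SELECT COUNT(*) FROM messages "
                  "WHERE date >= '2020-01-01' AND date < '2021-01-01'")
substr_plan = plan("SELECT COUNT(*) FROM messages WHERE substr(date, 1, 4) = '2020'")
print(range_plan)
print(substr_plan)
```

The exact plan wording differs between SQLite versions, but the range query should report a SEARCH on the index while the substr() query can at best SCAN it.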

You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name
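A small sqlite3 sketch (hypothetical data) contrasting the query from the question with the grouped version: without GROUP BY the aggregate collapses every match into a single row, while with it each user gets a row. Note that the inner JOIN drops users with no 2020 messages (carol here), unlike the LEFT JOIN variant above.

```python
import sqlite3

# Hypothetical data; carol only posted in 2019.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, date TEXT, message TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO messages (date, message, user_id) VALUES
    ('2020-01-05', 'a', 1), ('2020-02-05', 'b', 1),
    ('2020-03-05', 'c', 2), ('2019-03-05', 'd', 3);
""")

# The query from the question: no GROUP BY, so all matches collapse into one row.
ungrouped = conn.execute("""
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id
WHERE substr(date, 1, 4) = '2020'
""").fetchall()
print(len(ungrouped), ungrouped[0][1])  # 1 3

# The answer's query: one row per user that has 2020 messages.
grouped = conn.execute("""
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m ON m.user_id = u.user_id
WHERE substr(m.date, 1, 4) = '2020'
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()
print(grouped)  # [(1, 'alice', 2), (2, 'bob', 1)]
```

SQLite accepts the ungrouped query and returns an arbitrary name next to the total; stricter databases such as PostgreSQL reject it outright.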

Related

Join activities with users makes redundant rows, I just want first activity

I am on a Postgres database.
I have a raw SQL query like the following, which I intend to use to query the database:
Select
acts.created_at as "firstActivity",
users.*
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
The problem is that there are multiple activities between the user's creation and update. The created_at and updated_at fields are timestamps like 2021-11-10 09:27:14+00.
I would like to only return the first activity of those activities between the two times.
Take a look at DISTINCT ON (expression); this is probably what you need:
Select DISTINCT ON (users.id) users.id, acts.created_at as "firstActivity"
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
Order by users.id, acts.created_at
See the PostgreSQL documentation on DISTINCT ON.
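SQLite has no DISTINCT ON, so to experiment locally this sketch swaps in a different technique that gives the same result: ROW_NUMBER() numbers each user's qualifying activities by created_at and only row 1 is kept. It needs SQLite 3.25 or newer; the table contents are hypothetical and timestamps are stored as ISO-8601 text without the time zone.

```python
import sqlite3

# Hypothetical data; timestamps stored as ISO-8601 text without a time zone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, region_id TEXT,
                    created_at TEXT, updated_at TEXT);
CREATE TABLE activities (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
    (1, '1', '2021-11-01 00:00:00', '2021-11-30 00:00:00'),
    (2, '1', '2021-11-01 00:00:00', '2021-11-30 00:00:00');
INSERT INTO activities (user_id, created_at) VALUES
    (1, '2021-10-31 09:00:00'),  -- before user 1 was created, so excluded
    (1, '2021-11-12 08:00:00'),
    (1, '2021-11-10 09:27:14'),
    (2, '2021-11-20 12:00:00'),
    (2, '2021-11-05 12:00:00');
""")

# ROW_NUMBER() stands in for DISTINCT ON (users.id) ... ORDER BY users.id, acts.created_at.
first = conn.execute("""
SELECT user_id, firstActivity
FROM (
    SELECT users.id AS user_id,
           acts.created_at AS firstActivity,
           ROW_NUMBER() OVER (PARTITION BY users.id
                              ORDER BY acts.created_at) AS rn
    FROM users
    JOIN activities acts ON acts.user_id = users.id
     AND acts.created_at > users.created_at
     AND acts.created_at < users.updated_at
    WHERE users.region_id = '1'
) AS ranked
WHERE rn = 1
ORDER BY user_id
""").fetchall()
print(first)  # [(1, '2021-11-10 09:27:14'), (2, '2021-11-05 12:00:00')]
```

The ROW_NUMBER() form also works in PostgreSQL, but DISTINCT ON is usually shorter there.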
Well, if I understood your question right, you should use the aggregate function min(acts.created_at) in your query, grouped per user, and add the condition that the activity falls between the dates A and B (the user's created_at and updated_at).
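A sketch of the MIN approach in sqlite3, with hypothetical users/activities data: grouping by user and taking MIN(acts.created_at) returns the first activity time. The limitation is that MIN gives only the timestamp; if you need other columns of that activity row, the DISTINCT ON or lateral-join answers fit better. The sketch keeps the question's strict > and < bounds, whereas SQL BETWEEN is inclusive at both ends.

```python
import sqlite3

# Hypothetical data; timestamps stored as ISO-8601 text without a time zone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, region_id TEXT,
                    created_at TEXT, updated_at TEXT);
CREATE TABLE activities (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
    (1, '1', '2021-11-01 00:00:00', '2021-11-30 00:00:00'),
    (2, '1', '2021-11-01 00:00:00', '2021-11-30 00:00:00');
INSERT INTO activities (user_id, created_at) VALUES
    (1, '2021-10-31 09:00:00'),  -- before user 1 was created, so excluded
    (1, '2021-11-12 08:00:00'),
    (1, '2021-11-10 09:27:14'),
    (2, '2021-11-20 12:00:00'),
    (2, '2021-11-05 12:00:00');
""")

first = conn.execute("""
SELECT users.id AS user_id, MIN(acts.created_at) AS firstActivity
FROM users
JOIN activities acts ON acts.user_id = users.id
WHERE users.region_id = '1'
  AND acts.created_at > users.created_at   -- strict bounds, as in the question
  AND acts.created_at < users.updated_at
GROUP BY users.id
ORDER BY users.id
""").fetchall()
print(first)  # [(1, '2021-11-10 09:27:14'), (2, '2021-11-05 12:00:00')]
```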
If you only want to join the first activity per user, then use a lateral join:
select
a.created_at as "firstActivity",
u.*
from users u
cross join lateral
(
select *
from activities acts
where acts.user_id = u.id
and acts.created_at > u.created_at
and acts.created_at < u.updated_at
order by acts.created_at
fetch first row only
) a
where u.region_id = '1';
By joining only the rows you are interested in, you avoid producing a big intermediate result that you would then have to deal with afterwards.

How to print two attribute values from your Sub query table

Suppose I have two tables,
User
Post
Posts are made by Users (i.e. the Post table has a foreign key to User).
Now my question is,
Print the details of all the users who have more than 10 posts
To solve this, I can type the following query and it would give me the desired result,
SELECT * from USER where user_id in (SELECT user_id from POST group by user_id having count(user_id) > 10)
The problem occurs when I also want to print the count of posts along with the user details. The post count cannot be obtained from the USER table; it can only come from the POST table. But I can't return two values from my subquery, i.e. I can't do the following:
SELECT * from USER where user_id in (SELECT user_id, count(user_id) from POST group by user_id having count(user_id) > 10)
So, how do I resolve this issue? One solution I know is the following, but I think it is a very naive approach that makes the query much more complex and much slower:
SELECT u.*, (SELECT po.count(user_id) from POST as po group by user_id having po.count(user_id) > 10) from USER u where u.user_id in (SELECT p.user_id from POST p group by user_id having p.count(user_id) > 10)
Is there any other way to solve this using subqueries?
Move the aggregation to the from clause:
SELECT u.*, p.num_posts
FROM user u JOIN
(SELECT p.user_id, COUNT(*) as num_posts
FROM post p
GROUP BY p.user_id
HAVING COUNT(*) > 10
) p
ON u.user_id = p.user_id;
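A minimal sqlite3 sketch of the derived-table join, with hypothetical data (alice has 12 posts, bob exactly 10, carol none). The table is named users here only to stay clear of user, which is a reserved word in some databases (e.g. PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (post_id INTEGER PRIMARY KEY,
                   user_id INTEGER REFERENCES users (user_id));
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
""")
# Hypothetical data: alice has 12 posts, bob 10, carol none.
conn.executemany("INSERT INTO post (user_id) VALUES (?)", [(1,)] * 12 + [(2,)] * 10)

# Aggregate in the FROM clause, then join the per-user counts back to users.
rows = conn.execute("""
SELECT u.*, p.num_posts
FROM users u JOIN
     (SELECT p.user_id, COUNT(*) AS num_posts
        FROM post p
       GROUP BY p.user_id
      HAVING COUNT(*) > 10) p
  ON u.user_id = p.user_id
""").fetchall()
print(rows)  # [(1, 'alice', 12)]
```

bob is excluded because HAVING COUNT(*) > 10 is strict.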
You can do this with subqueries:
select u.*
from (select u.*,
(select count(*) from post p where p.user_id = u.user_id) as num_posts
from users u
) u
where num_posts > 10;
With an index on post(user_id), this might actually have better performance than the version using JOIN/GROUP BY.
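The correlated-subquery version can be checked the same way. This sketch (hypothetical data and a made-up index name idx_post_user) also prints the query plan, so you can confirm the per-user COUNT(*) is answered from the index on post(user_id). Whether it actually beats the JOIN/GROUP BY version depends on your data and engine, so measure with your own tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE INDEX idx_post_user ON post (user_id);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
""")
# Hypothetical data: alice has 12 posts, bob 10, carol none.
conn.executemany("INSERT INTO post (user_id) VALUES (?)", [(1,)] * 12 + [(2,)] * 10)

query = """
SELECT u.*
FROM (SELECT u.*,
             (SELECT COUNT(*) FROM post p WHERE p.user_id = u.user_id) AS num_posts
        FROM users u) u
WHERE num_posts > 10
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'alice', 12)]

# The detail column should mention a SEARCH using idx_post_user.
plan = " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(plan)
```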
You can try joining the tables; I prefer a JOIN to a subquery here:
SELECT user.*, count( post.user_id ) as postcount
FROM user LEFT JOIN post ON user.user_id = post.user_id
GROUP BY user.user_id
HAVING postcount > 10 ;

SQL subquery rows as GROUP BY columns

I want a column in the response for each task_type, with counts grouped by date_trunc('day') and user_id. Once the whole query runs, it should return a task_type_1 column whose value is the number of tasks of that type for the given user on the given day.
So far I have the following, which runs, but I'm not sure how to add the task_type grouping to it:
SELECT users.id AS user_id,
date_trunc('day', workforce_assigned_tasks.created_at) AS day,
SUM(workforce_assigned_tasks.duration) AS duration,
SUM(workforce_assigned_tasks.earnings_cents) AS earnings_cents,
SUM(workforce_assigned_tasks.subtask_count) AS subtask_count,
WHAT GOES HERE?
FROM users
JOIN workforce_assigned_tasks ON workforce_assigned_tasks.user_id = users.id
JOIN workforce_tasks ON workforce_assigned_tasks.workforce_task_id = workforce_tasks.id
GROUP BY date_trunc('day', workforce_assigned_tasks.created_at), users.id;
You can use conditional aggregation, which in Postgres uses the FILTER clause:
SELECT u.id AS user_id, date_trunc('day', wat.created_at) AS day,
SUM(wat.duration) AS duration,
SUM(wat.earnings_cents) AS earnings_cents,
SUM(wat.subtask_count) AS subtask_count,
COUNT(*) FILTER (WHERE wt.task_type = 1) AS task_type_1
FROM users u JOIN
workforce_assigned_tasks wat
ON wat.user_id = u.id JOIN
workforce_tasks wt
ON wat.workforce_task_id = wt.id
GROUP BY date_trunc('day', wat.created_at), u.id;
I am guessing that task_type is in workforce_tasks.
Note that the use of table aliases makes the query easier to write and to read.
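Since the question is about PostgreSQL, here is a hedged SQLite sketch of the same conditional-aggregation idea, mainly to show the shape of the result. Two substitutions are assumed: date() stands in for date_trunc('day', ...), and SUM(CASE WHEN ... THEN 1 ELSE 0 END) stands in for COUNT(*) FILTER (WHERE ...), being the portable spelling. The task_type values 'type_a' and 'type_b' are hypothetical; add one such expression per task type you want as a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE workforce_tasks (id INTEGER PRIMARY KEY, task_type TEXT);
CREATE TABLE workforce_assigned_tasks (
    id INTEGER PRIMARY KEY, user_id INTEGER, workforce_task_id INTEGER,
    created_at TEXT, duration INTEGER, earnings_cents INTEGER, subtask_count INTEGER);
INSERT INTO users VALUES (1);
INSERT INTO workforce_tasks VALUES (10, 'type_a'), (11, 'type_b');
INSERT INTO workforce_assigned_tasks
    (user_id, workforce_task_id, created_at, duration, earnings_cents, subtask_count)
VALUES
    (1, 10, '2021-05-01 09:00:00', 30, 100, 2),
    (1, 10, '2021-05-01 14:00:00', 20, 50, 1),
    (1, 11, '2021-05-01 16:00:00', 10, 25, 1),
    (1, 11, '2021-05-02 10:00:00', 15, 40, 3);
""")

rows = conn.execute("""
SELECT u.id AS user_id, date(wat.created_at) AS day,
       SUM(wat.duration) AS duration,
       SUM(wat.earnings_cents) AS earnings_cents,
       SUM(wat.subtask_count) AS subtask_count,
       -- one conditional count per task type becomes one column per type
       SUM(CASE WHEN wt.task_type = 'type_a' THEN 1 ELSE 0 END) AS type_a_count,
       SUM(CASE WHEN wt.task_type = 'type_b' THEN 1 ELSE 0 END) AS type_b_count
FROM users u
JOIN workforce_assigned_tasks wat ON wat.user_id = u.id
JOIN workforce_tasks wt ON wat.workforce_task_id = wt.id
GROUP BY date(wat.created_at), u.id
ORDER BY day
""").fetchall()
for row in rows:
    print(row)
```

Each (user, day) group yields one row, and the CASE expressions split its task count by type without adding task_type to the GROUP BY.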

PostgreSQL: Get the count of rows in a join query

I am trying to get some data by joining a few tables. I have an audit table where I store the audits for actions performed by users. I am trying to get the list of users, ordered by the number of audits they have, together with that number. I have the following query:
SELECT s.user_created,
u.first_name,
u.last_name,
u.email,
a.message as audits
FROM cbrain_school s
inner join ugrp_user u on s.user_created = u.user_id
inner join audit a on u.user_id = a.user_id
order by u.created_time desc;
This query will give me 1 row per entry in the audit table. I just want 1 row per user and the count of entries in the audit table ordered by the number of audits.
Is there any way to do that? I got an error when I tried to include count() in the above query.
First of all, you are joining with the table cbrain_school. Why? You are selecting no data from this table (except for s.user_created, which is simply u.user_id). I suppose you want to limit the users shown to those in cbrain_school.user_created? Then use EXISTS or IN to look this up.
select u.user_id, u.first_name, u.last_name, u.email, a.message as audits
from ugrp_user u
inner join audit a on u.user_id = a.user_id
where u.user_id in (select user_created from cbrain_school)
order by u.created_time desc;
This shows much more clearly that cbrain_school.user_created is merely a criterion. (But the query result is the same, of course.) It's a good habit to avoid joins when you are not really interested in the joined rows.
Now you don't want to show each message anymore, but merely count them per user. So rather than joining the messages, you should join the message count:
select u.user_id, u.first_name, u.last_name, u.email, a.cnt
from ugrp_user u
inner join
(
select user_id, count(*) as cnt
from audit
group by user_id
) a on u.user_id = a.user_id
where u.user_id in (select user_created from cbrain_school)
order by a.cnt desc;
(You could also join all messages and only then aggregate, but I don't recommend this. It would work for this and many other queries, but it is prone to errors when working with multiple tables, where you might suddenly count or add up values multiple times. It's a good habit to aggregate before joining.)
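The problem of counting rows multiple times is easy to reproduce. This sqlite3 sketch adds a hypothetical second child table, login, next to audit: joining both and then counting multiplies each user's audit rows by their login rows, while aggregating each table first and then joining gives the true counts (ordered by audit count, as the question asked).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ugrp_user (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE audit (id INTEGER PRIMARY KEY, user_id INTEGER, message TEXT);
CREATE TABLE login (id INTEGER PRIMARY KEY, user_id INTEGER);  -- hypothetical table
INSERT INTO ugrp_user VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO audit (user_id, message) VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd');
INSERT INTO login (user_id) VALUES (1), (1), (2);
""")

# Join first, aggregate later: each audit row is repeated once per login row.
join_then_count = conn.execute("""
SELECT u.user_id, COUNT(a.id) AS audits, COUNT(l.id) AS logins
FROM ugrp_user u
JOIN audit a ON a.user_id = u.user_id
JOIN login l ON l.user_id = u.user_id
GROUP BY u.user_id
ORDER BY u.user_id
""").fetchall()
print(join_then_count)  # [(1, 6, 6), (2, 1, 1)]

# Aggregate first, then join: one pre-counted row per user from each table.
count_then_join = conn.execute("""
SELECT u.user_id, a.cnt AS audits, l.cnt AS logins
FROM ugrp_user u
JOIN (SELECT user_id, COUNT(*) AS cnt FROM audit GROUP BY user_id) a
  ON a.user_id = u.user_id
JOIN (SELECT user_id, COUNT(*) AS cnt FROM login GROUP BY user_id) l
  ON l.user_id = u.user_id
ORDER BY a.cnt DESC
""").fetchall()
print(count_then_join)  # [(1, 3, 2), (2, 1, 1)]
```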

Best way to construct this query? [duplicate]

Possible Duplicate:
Retrieving the last record in each group
I have two tables set up similar to this (simplified for the question):
actions-
id - user_id - action - time
users -
id - name
I want to output the latest action for each user. I have no idea how to go about it.
I'm not great with SQL, but from what I've looked up, it should look something like the following, though I'm not sure:
SELECT `users`.`name`, *
FROM users, actions
JOIN < not sure what to put here >
ORDER BY `actions`.`time` DESC
< only one per user_id >
Any help would be appreciated.
SELECT * FROM users JOIN actions ON actions.id=(SELECT id FROM actions WHERE user_id=users.id ORDER BY time DESC LIMIT 1);
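A quick sqlite3 check of this query with hypothetical users/actions rows, assuming time is stored as sortable text. Because the subquery uses LIMIT 1, each user gets exactly one action even if two actions share the latest time (which one wins is then unspecified).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, time TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
INSERT INTO actions (user_id, action, time) VALUES
    (1, 'login',  '2012-01-01 10:00'),
    (1, 'post',   '2012-01-01 11:00'),
    (2, 'login',  '2012-01-02 09:00'),
    (2, 'logout', '2012-01-02 17:00');
""")

# For each user, join only the action whose id the correlated subquery picks
# as the most recent one.
latest = conn.execute("""
SELECT users.name, actions.action, actions.time
FROM users
JOIN actions ON actions.id = (SELECT id FROM actions
                               WHERE user_id = users.id
                               ORDER BY time DESC LIMIT 1)
ORDER BY users.id
""").fetchall()
print(latest)  # [('ann', 'post', '2012-01-01 11:00'), ('ben', 'logout', '2012-01-02 17:00')]
```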
You need to do a groupwise max; please refer to the examples here: http://jan.kneschke.de/projects/mysql/groupwise-max/
Here's an example I did for someone else, which is similar to your requirements:
http://pastie.org/925108
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
select u.name, a.action, a.time
from users u, actions a
where u.id = a.user_id
and a.time in (select max(time) from actions where user_id = u.id group by user_id )
Note: untested, but this should be the pattern.
CREATE TABLE #Table (User_ID INT, Time DATETIME)
-- This gets the latest action time for each user
INSERT INTO #Table (User_ID, Time)
SELECT z.User_ID, MAX(z.Time)
FROM actions z
GROUP BY z.User_ID
-- Join back on user and time to get the resulting action
SELECT z.User_ID, z.Action
FROM actions z
INNER JOIN #Table x ON x.User_ID = z.User_ID AND x.Time = z.Time
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow. Follow the greatest-n-per-group tag for dozens of other posts on this problem.
Here's how to do it in MySQL given your schema with no subqueries and no GROUP BY:
SELECT u.*, a1.*
FROM users u JOIN actions a1 ON (u.id = a1.user_id)
LEFT OUTER JOIN actions a2 ON (u.id = a2.user_id AND a1.time < a2.time)
WHERE a2.id IS NULL;
In other words, show the user with her action such that if we search for another action with the same user and a later time, we find none.
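A sqlite3 sketch of this self-join with hypothetical data. Two of ben's actions are given the same latest time on purpose: the anti-join returns both, because neither has a strictly later action, so add a tie-breaker (for example on id) if you need exactly one row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, time TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
INSERT INTO actions (user_id, action, time) VALUES
    (1, 'login',    '2012-01-01 10:00'),
    (1, 'post',     '2012-01-01 11:00'),
    (2, 'login',    '2012-01-02 09:00'),
    (2, 'logout',   '2012-01-02 17:00'),
    (2, 'shutdown', '2012-01-02 17:00');  -- deliberate tie with 'logout'
""")

# Keep a1 only when no a2 for the same user has a strictly later time.
latest = conn.execute("""
SELECT u.name, a1.action
FROM users u JOIN actions a1 ON (u.id = a1.user_id)
LEFT OUTER JOIN actions a2 ON (u.id = a2.user_id AND a1.time < a2.time)
WHERE a2.id IS NULL
ORDER BY a1.id
""").fetchall()
print(latest)  # [('ann', 'post'), ('ben', 'logout'), ('ben', 'shutdown')]
```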
It seems to me that the following will work:
WITH GetMaxTimePerUser (user_id, time) AS (
SELECT user_id, MAX(time)
FROM actions
GROUP BY user_id
)
SELECT u.name, a.action, u_maxtime.time
FROM actions AS a
INNER JOIN users AS u ON u.id=a.user_id
INNER JOIN GetMaxTimePerUser AS u_maxtime ON u_maxtime.user_id=u.id
WHERE a.time=u_maxtime.time
Using a temporary named result set (a common table expression, or CTE) instead of subqueries and an OUTER JOIN leaves the query most open to optimization. (A CTE is something like a VIEW, but it exists only virtually, inline in the query.)
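A sqlite3 sketch of the CTE approach with hypothetical data (SQLite supports WITH, including a column list after the CTE name). Like the self-join answer, it returns more than one row for a user whose latest time is shared by several actions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, time TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
INSERT INTO actions (user_id, action, time) VALUES
    (1, 'login',  '2012-01-01 10:00'),
    (1, 'post',   '2012-01-01 11:00'),
    (2, 'login',  '2012-01-02 09:00'),
    (2, 'logout', '2012-01-02 17:00');
""")

latest = conn.execute("""
WITH GetMaxTimePerUser (user_id, time) AS (
    SELECT user_id, MAX(time)
    FROM actions
    GROUP BY user_id
)
SELECT u.name, a.action, u_maxtime.time
FROM actions AS a
INNER JOIN users AS u ON u.id = a.user_id
INNER JOIN GetMaxTimePerUser AS u_maxtime ON u_maxtime.user_id = u.id
WHERE a.time = u_maxtime.time  -- keep only each user's latest action(s)
ORDER BY u.id
""").fetchall()
print(latest)  # [('ann', 'post', '2012-01-01 11:00'), ('ben', 'logout', '2012-01-02 17:00')]
```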