Join activities with users makes redundant rows, I just want first activity - sql

I am on a Postgres database.
I have a raw SQL query like the following that I intend to run against the database:
Select
acts.created_at as "firstActivity",
users.*
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
The problem is that there are multiple activities between the user's creation and update. The created_at and updated_at fields are timestamps such as 2021-11-10 09:27:14+00.
I would like to only return the first activity of those activities between the two times.

Take a look at DISTINCT ON (expression), this is probably what you need
Select DISTINCT ON (users.id) users.id, acts.created_at as "firstActivity"
from users
join activities acts on acts.user_id = users.id
and acts.created_at > users.created_at
and acts.created_at < users.updated_at
where users.region_id='1'
Order by users.id, acts.created_at
See documentation

If I understood your question correctly, you can use the aggregate function min() on the activity's created_at in your query, grouped per user, and keep the condition that restricts the activity to the range between dates A and B (the user's created_at and updated_at).
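A minimal sketch of that min() idea, run against an in-memory SQLite database from Python so it can be tried without a Postgres server. The sample rows and the reduced column set are assumptions made for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, region_id TEXT, created_at TEXT, updated_at TEXT);
CREATE TABLE activities (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
-- Illustrative sample data (assumed, not from the question)
INSERT INTO users VALUES
  (1, '1', '2021-11-01 00:00:00', '2021-11-30 00:00:00'),
  (2, '1', '2021-11-05 00:00:00', '2021-11-20 00:00:00');
INSERT INTO activities (user_id, created_at) VALUES
  (1, '2021-11-10 09:27:14'),
  (1, '2021-11-12 10:00:00'),
  (1, '2021-12-01 08:00:00'),
  (2, '2021-11-06 12:00:00'),
  (2, '2021-11-07 12:00:00');
""")

# One row per user: the earliest activity between the user's created_at and updated_at.
first_activity = conn.execute("""
SELECT u.id, MIN(a.created_at) AS first_activity
FROM users u
JOIN activities a
  ON a.user_id = u.id
 AND a.created_at > u.created_at
 AND a.created_at < u.updated_at
WHERE u.region_id = '1'
GROUP BY u.id
ORDER BY u.id
""").fetchall()

print(first_activity)
```

Grouping by the user's primary key collapses the duplicate rows. In Postgres, the remaining users columns can also be selected when grouping by the primary key.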

If you only want to join the first activity per user, then use a lateral join:
select
a.created_at as "firstActivity",
u.*
from users u
cross join lateral
(
select *
from activities acts
where acts.user_id = u.id
and acts.created_at > u.created_at
and acts.created_at < u.updated_at
order by acts.created_at
fetch first row only
) a
where u.region_id = '1';
By joining only the rows you are interested in, you avoid producing a large intermediate result that you would then have to deal with afterwards.

Related

How to pull the count of occurences from 2 SQL tables

I am using Python on a SQLite3 DB I created. The DB exists, and I am currently just using the command line to try to get the SQL statement correct.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this statement in the creation of my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing the name and message count during 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling with figuring out if I should Join the tables, or if there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';
One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
You could also join and aggregate. If you want to allow users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to filter the date column against literal dates than applying a function on it (which precludes the use of an index).
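As a rough sketch, the left join with the date range in the ON clause can be tried from Python like this. The sample users and messages are invented for illustration, and the users table is simplified (the message_count column is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    date TEXT,
    message TEXT,
    user_id INTEGER,
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);
-- Illustrative sample data (assumed)
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO messages (date, message, user_id) VALUES
  ('2019-12-31', 'old', 1),
  ('2020-01-01', 'a', 1),
  ('2020-06-15', 'b', 1),
  ('2020-03-02', 'c', 2),
  ('2021-01-01', 'new', 2);
""")

# The date range lives in the ON clause, so users without 2020 messages survive with a count of 0.
counts_2020 = conn.execute("""
SELECT u.name, COUNT(m.user_id) AS cnt_messages
FROM users u
LEFT JOIN messages m
  ON m.user_id = u.user_id
 AND m.date >= ? AND m.date < ?
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""", ("2020-01-01", "2021-01-01")).fetchall()

print(counts_2020)
```

COUNT(m.user_id) counts only matched rows, which is why carol shows up with 0 rather than 1.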
You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name

Best approach for limiting rows coming back in SQL when joining for a sum

I need to get back a list of users and the total amount that they have ordered. In reality my query is more complex but I think this sums it up. My issue is, if a user made 5 orders for example, I'll get back their name and the total they've ordered 5 times due to the join (having 5 rows in the order table for that user).
What's the recommended approach for when you need to total the records in one table that has multiple rows without requiring many rows to come back? distinct could work but is this the best? (especially when my select chooses more information than what's below)
SELECT user.name, sum(order.amount) FROM USER user
INNER JOIN USER_ORDERS order
ON (user.user_id = order.user_id)
Are you just looking for GROUP BY?
SELECT u.name, SUM(uo.amount)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Note that this has included user_id in the GROUP BY, just in case two users have the same name.
If you want all users, even those without orders, then you want a LEFT JOIN:
SELECT u.name, SUM(uo.amount)
FROM USER u LEFT JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Or a correlated subquery:
SELECT u.name,
(SELECT SUM(uo.amount)
FROM USER_ORDERS uo
WHERE u.user_id = uo.user_id
)
FROM USER u;
You could use the analytic version of SUM.
SELECT u.name, SUM(uo.amount) OVER(PARTITION BY u.name)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id;
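To see the GROUP BY variants behave as described, here is a small sketch using Python's sqlite3. The table names users and user_orders and the sample rows are assumptions chosen for the demo; they stand in for USER and USER_ORDERS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
-- Illustrative data (assumed): two different users share the name 'ann', 'cy' has no orders
INSERT INTO users VALUES (1, 'ann'), (2, 'ann'), (3, 'cy');
INSERT INTO user_orders (user_id, amount) VALUES (1, 10), (1, 20), (1, 30), (2, 5);
""")

# One row per user, however many orders that user has.
totals = conn.execute("""
SELECT u.user_id, u.name, SUM(uo.amount) AS total
FROM users u
LEFT JOIN user_orders uo ON uo.user_id = u.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()

# Grouping by name alone merges the two users called 'ann'.
by_name_only = conn.execute("""
SELECT u.name, SUM(uo.amount) AS total
FROM users u
LEFT JOIN user_orders uo ON uo.user_id = u.user_id
GROUP BY u.name
ORDER BY u.name
""").fetchall()

print(totals)
print(by_name_only)
```

With the LEFT JOIN, a user without orders gets a NULL total (None in Python); wrapping the sum in COALESCE(..., 0) is one way to turn that into 0 if preferred.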

SQL Statement JOIN only returns elements with association

I have Acts by Users who are joined to Groups by Memberships in a PostgreSQL db.
I have a query to generate rows for a leaderboard. However, it currently excludes Users where the Acts table does not include a row with the corresponding users_id. I want to include all group members, even those with 0 Acts.
The current query:
SELECT acts.users_id, username, avatar_url, COUNT(acts.id)
FROM acts
JOIN users ON acts.users_id = users.id
JOIN memberships on memberships.users_id = users.id
WHERE memberships.groups_id = ' + req.params.group_id + '
AND acts.created_at >= (CURRENT_DATE - 6)
GROUP BY acts.users_id, username, avatar_url
ORDER BY COUNT(acts.id) DESC
I have tried changing JOIN before users to RIGHT JOIN and LEFT JOIN, but I get the same result. At one point, I think RIGHT JOIN was working, but somehow, I have gone awry.
I want to include all group members, even those with 0 Acts.
Tripping wires:
If you want to include members with 0 Acts, you cannot return acts.users_id. Use memberships.users_id instead.
The condition a.created_at >= (CURRENT_DATE - 6) in the WHERE clause voids all attempts with LEFT JOIN. Move that condition into the JOIN clause. See:
Postgres Left Join with where condition
SELECT m.users_id -- !!!
, u.username, avatar_url
, COUNT(a.users_id) AS ct_acts
FROM memberships m
JOIN users u ON m.users_id = u.id
LEFT JOIN acts a ON a.users_id = u.id
AND a.created_at >= (CURRENT_DATE - 6) -- !!!
WHERE m.groups_id = ' + req.params.group_id + '
GROUP BY 1, 2, 3
ORDER BY ct_acts DESC;
Assuming referential integrity between memberships and users (FK constraint), so the join to users can remain as [INNER] JOIN.
Also assuming "all group members" is supposed to mean all WHERE m.groups_id = ' + req.params.group_id + ', or we need to do more.
But what exactly are you counting there? Currently, this looks like a multiplication of acts with group memberships. May be a misunderstanding. See:
Two SQL LEFT JOINS produce incorrect result
Depending on exact table definitions and what you want to count, exactly, there might be a faster query ...
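The effect of moving the date condition from WHERE into the JOIN clause can be reproduced with a small sketch in Python's sqlite3. The sample rows, the fixed cutoff date (standing in for CURRENT_DATE - 6), and the group id 7 are assumptions for illustration. The sketch also passes the group id as a bound parameter instead of concatenating it into the SQL string, which is generally the safer habit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE memberships (users_id INTEGER, groups_id INTEGER);
CREATE TABLE acts (id INTEGER PRIMARY KEY, users_id INTEGER, created_at TEXT);
-- Illustrative data (assumed): bob is in group 7 but has no recent acts
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO memberships VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO acts (users_id, created_at) VALUES
  (1, '2024-01-10'), (1, '2024-01-11'), (2, '2023-12-01');
""")

params = {"group_id": 7, "cutoff": "2024-01-05"}

# Date filter in WHERE: rows with no matching act fail the filter, so bob disappears.
filter_in_where = conn.execute("""
SELECT m.users_id, u.username, COUNT(a.users_id) AS ct_acts
FROM memberships m
JOIN users u ON u.id = m.users_id
LEFT JOIN acts a ON a.users_id = u.id
WHERE m.groups_id = :group_id
  AND a.created_at >= :cutoff
GROUP BY m.users_id, u.username
ORDER BY ct_acts DESC, m.users_id
""", params).fetchall()

# Date filter in the JOIN condition: every group member is kept, with 0 where nothing matched.
filter_in_join = conn.execute("""
SELECT m.users_id, u.username, COUNT(a.users_id) AS ct_acts
FROM memberships m
JOIN users u ON u.id = m.users_id
LEFT JOIN acts a ON a.users_id = u.id
                AND a.created_at >= :cutoff
WHERE m.groups_id = :group_id
GROUP BY m.users_id, u.username
ORDER BY ct_acts DESC, m.users_id
""", params).fetchall()

print(filter_in_where)
print(filter_in_join)
```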
I have had issues like this before. What I would do is remove all the WHERE conditions and the joins. Start by left joining users to acts only and check whether the query retains the inactive users you want. Likewise, try a left join between users and memberships. Once you have a two-table query that keeps the users who do not exist in the acts table, join the third table to its output. Finally, apply your WHERE conditions and the GROUP BY count.
If you want to return users who don't have acts, you need to start from a table other than acts, and the date filter on acts has to move out of the WHERE clause.
You can, for example, start from the users table instead:
SELECT users.id, username, avatar_url, COUNT(acts.id)
FROM users
JOIN memberships ON memberships.users_id = users.id
LEFT JOIN acts ON acts.users_id = users.id
AND acts.created_at >= (CURRENT_DATE - 6)
WHERE memberships.groups_id = ' + req.params.group_id + '
GROUP BY users.id, username, avatar_url
ORDER BY COUNT(acts.id) DESC
This should return all members of the group, even those who don't have any acts.

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user that used that handset. I have a table UserSessions that stores the last activity of a particular user as well as what handset they used in that activity. There are roughly 40 handsets, yet I always get back way too many records, like 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset with the user's last activity on that handset. What I get is multiple records for all handsets and dates, over 10,000 rows.
You typically GROUP BY the same columns as you SELECT, except those that are arguments to aggregate (set) functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that the combination of columns in the SELECT is not duplicated across rows, and there is MAX(), an aggregate function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
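A simplified variant of this ranking idea can be tried from Python's sqlite3 (window functions need SQLite 3.25 or newer). Note that it is not the exact query above: it numbers the raw session rows with ROW_NUMBER() instead of ranking grouped maxima, and the sample data is assumed. It also runs the plain GROUP BY on handset and username for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
-- Illustrative data (assumed): two handsets, each used by two users
INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0);
INSERT INTO Users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO UserSessions VALUES
  (1, 1, '2024-01-01 08:00'),
  (2, 1, '2024-01-03 09:00'),
  (1, 1, '2024-01-02 10:00'),
  (1, 2, '2024-01-05 11:00'),
  (2, 2, '2024-01-04 12:00');
""")

# Grouping by handset AND user yields one row per (handset, user) pair.
per_pair = conn.execute("""
SELECT MAX(us.LastActivity), h.Name, u.Username
FROM UserSessions us
JOIN Handsets h ON h.HandsetId = us.HandsetId
JOIN Users u ON u.UserId = us.UserId
WHERE h.Deleted = 0
GROUP BY h.Name, u.Username
""").fetchall()

# Numbering sessions per handset, newest first, and keeping row 1 yields one row per handset.
latest = conn.execute("""
SELECT Name, Username, LastActivity
FROM (
    SELECT h.Name, u.Username, us.LastActivity,
           ROW_NUMBER() OVER (PARTITION BY h.HandsetId
                              ORDER BY us.LastActivity DESC) AS seqnum
    FROM UserSessions us
    JOIN Handsets h ON h.HandsetId = us.HandsetId
    JOIN Users u ON u.UserId = us.UserId
    WHERE h.Deleted = 0
) ranked
WHERE seqnum = 1
ORDER BY Name
""").fetchall()

print(len(per_pair))
print(latest)
```

ROW_NUMBER() always keeps exactly one row per handset, whereas RANK() would return several rows if two sessions tie on the latest timestamp.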

PostgreSQL: Get the count of rows in a join query

I am trying to get some data by joining a few tables. I have an audit table where I store the audits for actions performed by users. I am trying to get the list of users together with the number of audits each one has, ordered by that number. I have the following query:
SELECT s.user_created,
u.first_name,
u.last_name,
u.email,
a.message as audits
FROM cbrain_school s
inner join ugrp_user u on s.user_created = u.user_id
inner join audit a on u.user_id = a.user_id
order by u.created_time desc;
This query will give me 1 row per entry in the audit table. I just want 1 row per user and the count of entries in the audit table ordered by the number of audits.
Is there any way to do that? I was getting an error when I tried to include count() in the above query.
First of all you are joining with the table cbrain_school. Why? You are selecting no data from this table (except for s.user_created which is simply u.user_id). I suppose you want to limit the users shown to the cbrain_school.user_created? Then use EXISTS or IN to look this up.
select u.user_id, u.first_name, u.last_name, u.email, a.message as audits
from ugrp_user u
inner join audit a on u.user_id = a.user_id
where u.user_id in (select user_created from cbrain_school)
order by u.created_time desc;
This shows much better that cbrain_school.user_created is mere criteria. (But the query result is the same, of course.) It's a good habit to avoid joins, when you are not really interested in the joined rows.
Now you don't want to show each message anymore, but merely count them per user. So rather than joining messages, you should join the message count:
select u.user_id, u.first_name, u.last_name, u.email, a.cnt
from ugrp_user u
inner join
(
select user_id, count(*) as cnt
from audit
group by user_id
) a on u.user_id = a.user_id
where u.user_id in (select user_created from cbrain_school)
order by a.cnt desc;
(You could also join all messages and only then aggregate, but I don't recommend this. It would work for this and many other queries, but is prone to errors when working with multiple tables, where you might suddenly count or add up values multifold. It's a good habit to aggregate before joining.)
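The multifold counting mentioned above is easy to reproduce. The following sketch uses Python's sqlite3 with heavily reduced versions of the three tables; the columns kept and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ugrp_user (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE audit (id INTEGER PRIMARY KEY, user_id INTEGER, message TEXT);
CREATE TABLE cbrain_school (id INTEGER PRIMARY KEY, user_created INTEGER);
-- Illustrative data (assumed): user 1 created two schools and has three audits
INSERT INTO ugrp_user VALUES (1, 'ann'), (2, 'bob');
INSERT INTO audit (user_id, message) VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd');
INSERT INTO cbrain_school (user_created) VALUES (1), (1), (2);
""")

# Join everything, then count: each audit is counted once per school the user created.
joined_then_counted = conn.execute("""
SELECT u.user_id, COUNT(*) AS cnt
FROM cbrain_school s
JOIN ugrp_user u ON s.user_created = u.user_id
JOIN audit a ON a.user_id = u.user_id
GROUP BY u.user_id
ORDER BY cnt DESC
""").fetchall()

# Count per user first, then join the counts: each audit is counted exactly once.
counted_then_joined = conn.execute("""
SELECT u.user_id, a.cnt
FROM ugrp_user u
JOIN (
    SELECT user_id, COUNT(*) AS cnt
    FROM audit
    GROUP BY user_id
) a ON a.user_id = u.user_id
WHERE u.user_id IN (SELECT user_created FROM cbrain_school)
ORDER BY a.cnt DESC
""").fetchall()

print(joined_then_counted)
print(counted_then_joined)
```

User 1 has three audits, but the join-first query reports six because each audit row is paired with both of that user's schools.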