Group by multiple columns and limit per group - Postgres - sql

I'm creating a messaging app as a side project and I'm trying to query a user's conversations efficiently.
The messages table structure is basic right now with some dummy data:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | time |
| 2 | 2 | 1 | text | time |
| 3 | 1 | 2 | text | time |
| 4 | 1 | 3 | text | time |
| 5 | 3 | 2 | text | time |
| 6 | 3 | 1 | text | time |
| 7 | 2 | 1 | text | time |
I'd like to query the DB and group by "conversation", i.e. any rows that involve the same pair of users, no matter which of them is the sender: rows (1, 2, 3, 7), (4, 6), and (5). I'd like to limit each group to n rows, ordered by the created_at column (newest first). Ideally it would look like this (the created_at values are arbitrary numbers chosen to show the descending order):
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 2 | 2 | 1 | text | 300 |
| 3 | 1 | 2 | text | 200 |
| 7 | 2 | 1 | text | 100 |
| 4 | 1 | 3 | text | 700 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
Ideally there would also be an additional column numbering each group (to make it easy to build a multi-dimensional array).
So far I've been able to "group" by sender/receiver ids, order by created_at, and limit the number per group. However, it's not quite right. Here's the query:
```sql
SELECT
    filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
FROM (
    SELECT messages.*,
           rank() OVER (
               PARTITION BY sender_id
               ORDER BY created_at DESC
           )
    FROM messages
    WHERE messages.sender_id = 1 or messages.receiver_id = 1
) filter WHERE rank <= 50;
```
My result set looks like this:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 3 | 1 | 2 | text | 300 |
| 4 | 1 | 3 | text | 700 |
| 2 | 2 | 1 | text | 300 |
| 7 | 2 | 1 | text | 100 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
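You can see that the rows with id 4 and 6 (the third and sixth rows above) should be grouped together but aren't, because the partition is on sender_id alone.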

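You can use rank(). To limit the number of records per conversation (i.e. per sender/receiver or receiver/sender pair), partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id), which gives the same key for both directions of a conversation. Note that rank() gives tied rows the same number, so ties on created_at can return more rows than the limit; use row_number() if you need a hard cap: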
```sql
select t.id, t.sender_id, t.receiver_id, t.message, t.created_at
from (
    select
        t.*,
        rank() over(
            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
            order by created_at desc
        ) rn
    from mytable t
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn
```
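To check the partitioning idea end to end, here is a small Python sketch that loads the question's sample rows into an in-memory SQLite database and runs the same query shape. It assumes an SQLite build of 3.25 or later (needed for window functions). SQLite has no least()/greatest(), but its two-argument min()/max() scalar functions do the same job here. The sketch uses row_number() instead of rank() so ties on created_at can never push a conversation past n rows, and it adds a dense_rank() column that numbers each conversation, which the question also asked for. The helper name latest_per_conversation is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, sender_id, receiver_id, message, created_at).
# The created_at numbers are the arbitrary values from the desired output.
SAMPLE_ROWS = [
    (1, 1, 2, "text", 400),
    (2, 2, 1, "text", 300),
    (3, 1, 2, "text", 200),
    (4, 1, 3, "text", 700),
    (5, 3, 2, "text", 300),
    (6, 3, 1, "text", 500),
    (7, 2, 1, "text", 100),
]

# Two-argument min()/max() are SQLite's scalar stand-ins for
# PostgreSQL's least()/greatest().
QUERY = """
SELECT id, sender_id, receiver_id, created_at, conversation, rn
FROM (
    SELECT m.*,
           -- one number per unordered user pair (the "group" column)
           dense_rank() OVER (
               ORDER BY min(sender_id, receiver_id), max(sender_id, receiver_id)
           ) AS conversation,
           -- position of the message inside its conversation, newest first
           row_number() OVER (
               PARTITION BY min(sender_id, receiver_id), max(sender_id, receiver_id)
               ORDER BY created_at DESC
           ) AS rn
    FROM messages AS m
) AS t
WHERE rn <= ?
ORDER BY conversation, rn
"""


def latest_per_conversation(n):
    """Return at most n newest messages per conversation (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE messages (id INTEGER, sender_id INTEGER, "
            "receiver_id INTEGER, message TEXT, created_at INTEGER)"
        )
        con.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return con.execute(QUERY, (n,)).fetchall()
    finally:
        con.close()


for row in latest_per_conversation(2):
    print(row)
```

With n = 2 this keeps ids 1 and 2 from the 1/2 conversation, 4 and 6 from the 1/3 conversation, and 5 from the 2/3 conversation. In PostgreSQL, the question's `where sender_id = 1 or receiver_id = 1` filter can go inside the subquery to restrict the result to one user's conversations.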

Related

Get row for each unique user based on highest column value

I have the following data:
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
+--------+-----------+--------+
I am trying to find the distribution of each user's Rating, based on their latest row in the table (latest is determined by Timestamp). As a first step, I am trying to get a list of user IDs and Ratings that looks like this:
+--------+--------+
| UserId | Rating |
+--------+--------+
| 1 | 1198 |
| 2 | 1202 |
+--------+--------+
To get there, I sorted the list by Timestamp (descending) and then UserId, which gives the following:
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
+--------+-----------+--------+
So now I just need to take the top N rows, where N is the number of players. But I can't use LIMIT, because it needs a constant expression and I want to use count(id) as the limit, which doesn't seem to work.
Any suggestions on how I can get the data I need?
Cheers!
Andy
This should work:
```sql
SELECT test.UserId, Rating
FROM test
JOIN (SELECT UserId, MAX(Timestamp) AS Timestamp FROM test GROUP BY UserId) m
  ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
```
If your database supports window functions, you can use the following instead:
```sql
SELECT UserId, Rating
FROM (
    SELECT UserId, Rating,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) row_num
    FROM test
) m
WHERE row_num = 1
```
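As a quick cross-check, the Python sketch below runs both answers against the question's data in an in-memory SQLite database (assuming SQLite 3.25 or later for ROW_NUMBER()) and then goes one step further to the distribution the question is ultimately after: how many users have each latest rating. The helper name run and the query constant names are hypothetical.

```python
import sqlite3

# Sample data from the question: (UserId, Timestamp, Rating).
SAMPLE_ROWS = [
    (1, 1, 1202), (2, 1, 1198),
    (1, 2, 1204), (2, 2, 1196),
    (1, 3, 1206), (2, 3, 1194),
    (1, 4, 1198), (2, 4, 1202),
]

# First answer: join each row to its user's maximum Timestamp.
JOIN_QUERY = """
SELECT test.UserId, test.Rating
FROM test
JOIN (SELECT UserId, MAX(Timestamp) AS Timestamp FROM test GROUP BY UserId) AS m
  ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
ORDER BY test.UserId
"""

# Second answer: number each user's rows newest first and keep row 1.
WINDOW_QUERY = """
SELECT UserId, Rating
FROM (
    SELECT UserId, Rating,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) AS row_num
    FROM test
) AS m
WHERE row_num = 1
ORDER BY UserId
"""

# Distribution: count users per latest rating.
DISTRIBUTION_QUERY = """
SELECT Rating, COUNT(*) AS users
FROM (
    SELECT Rating,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) AS row_num
    FROM test
) AS latest
WHERE row_num = 1
GROUP BY Rating
ORDER BY Rating
"""


def run(query):
    """Load the sample table into a fresh in-memory database and run query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE test (UserId INTEGER, Timestamp INTEGER, Rating INTEGER)")
        con.executemany("INSERT INTO test VALUES (?, ?, ?)", SAMPLE_ROWS)
        return con.execute(query).fetchall()
    finally:
        con.close()


print(run(JOIN_QUERY))
print(run(WINDOW_QUERY))
print(run(DISTRIBUTION_QUERY))
```

The two answers agree here, but they differ when a user has two rows at the same maximum Timestamp: the join returns both, while ROW_NUMBER() keeps exactly one (an arbitrary one unless the ORDER BY breaks the tie).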

PostgreSQL: Group rows into one row with an array

Hi, I have a table like this:
+----+---------+----------+
| id | room_id | house_id |
+----+---------+----------+
| 1  | 1       | 1        |
| 2  | 2       | 1        |
| 3  | 3       | 1        |
| 4  | 1       | 2        |
| 5  | 2       | 2        |
| 6  | 3       | 2        |
| 7  | 1       | 3        |
| 8  | 2       | 3        |
| 9  | 3       | 3        |
+----+---------+----------+
and I want to create a view like this:
+----+----------+---------+
| id | house_id | rooms   |
+----+----------+---------+
| 1  | 1        | [1,2,3] |
| 2  | 2        | [1,2,3] |
| 3  | 3        | [1,2,3] |
+----+----------+---------+
I tried many ways, but I can't group them into one row.
Thanks for any help.
You can use array_agg():
```sql
select house_id, array_agg(room_id order by room_id) as rooms
from t
group by house_id;
```
If you want the first column to be incremental, you can use row_number():
```sql
select row_number() over (order by house_id) as id, . . .
```
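To make the semantics concrete without a PostgreSQL server, the sketch below models in plain Python what the grouped query returns: sort the rows by house_id, collect each house's room_id values in ascending order (what array_agg(room_id order by room_id) does), and number the groups in house_id order (what row_number() over (order by house_id) does). It assumes the elided part of the second select list is house_id plus the same array_agg expression; it illustrates the result shape and is not a replacement for the SQL. The function name rooms_per_house is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, room_id, house_id).
ROWS = [
    (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (4, 1, 2), (5, 2, 2), (6, 3, 2),
    (7, 1, 3), (8, 2, 3), (9, 3, 3),
]


def rooms_per_house(rows):
    """Plain-Python model of the grouped array_agg query (hypothetical helper)."""
    # groupby only merges adjacent items, so sort by house_id first (like GROUP BY).
    by_house = sorted(rows, key=itemgetter(2))
    result = []
    for n, (house_id, group) in enumerate(groupby(by_house, key=itemgetter(2)), start=1):
        # array_agg(room_id ORDER BY room_id): collect and sort the room ids.
        rooms = sorted(room_id for _id, room_id, _house in group)
        # n plays the role of row_number() over (order by house_id).
        result.append((n, house_id, rooms))
    return result


print(rooms_per_house(ROWS))
```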

Count rows in table that are the same in a sequence

I have a table that looks like this:
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 2 |
| 2 | 1 | 4 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 3 | 2 |
| 7 | 3 | 1 |
+----+------------+------+
I would like to count the occurrences of each type within a sequence, i.e. consecutive rows with the same type.
The output should look something like this:
+------------+------+-----+
| Session_ID | Type | cnt |
+------------+------+-----+
| 1 | 2 | 1 |
| 1 | 4 | 1 |
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 1 |
| 3 | 1 | 1 |
+------------+------+-----+
A simple GROUP BY like
```sql
SELECT session_id, type, COUNT(type)
FROM table
GROUP BY session_id, type
```
doesn't work, since I need to group only rows that are "touching" (adjacent).
Is this possible with a single SQL SELECT, or will I need some sort of code, such as a stored procedure or application-side logic?
UPDATE (what counts as a sequence):
Rows are ordered by ID, and a row continues the current sequence if it has the same type as the previous row. Only rows with the same Session_ID are grouped together.
For example, suppose one session has 3 rows: the row with ID 1 has type 1, the row with ID 2 has type 1, and the row with ID 3 has type 2.
Input:
+----+------------+------+
| ID | Session_ID | Type |
+----+------------+------+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
+----+------------+------+
The sequence is rows 1 to 2, so these three rows should produce:
Output:
+------------+------+-------+
| Session_ID | Type | count |
+------------+------+-------+
| 1 | 1 | 2 |
| 1 | 2 | 1 |
+------------+------+-------+
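You can use the difference between id and row_number() to identify the gaps, and then perform your count. Within one (session_id, type) partition, id and row_number() both go up by 1 across a run of adjacent rows, so their difference stays constant for the run and changes after any interruption. This assumes the ids are consecutive; if they can have gaps, use row_number() over (partition by session_id order by id) in place of id.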
```sql
;with cte as
(
    select *, id - row_number() over (partition by session_id, type order by id) as grp
    from table
)
select session_id, type, count(*) as cnt
from cte
group by session_id, type, grp
order by max(id)
```
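The Python sketch below runs the gaps-and-islands query on the question's sample data in an in-memory SQLite database (assuming SQLite 3.25 or later). It swaps in the common two-row-number variant of the same idea: row_number() over the session minus row_number() over (session, type). That difference is constant across a run of adjacent same-type rows, and unlike id - row_number() it does not depend on the ids being consecutive. The table name events and the helper count_runs are hypothetical (table itself is a reserved word).

```python
import sqlite3

# Sample rows from the question: (ID, Session_ID, Type).
SAMPLE_ROWS = [(1, 1, 2), (2, 1, 4), (3, 1, 2), (4, 2, 2), (5, 2, 2), (6, 3, 2), (7, 3, 1)]

RUNS_QUERY = """
WITH numbered AS (
    SELECT id, session_id, type,
           -- constant within a run of adjacent same-type rows in one session
           row_number() OVER (PARTITION BY session_id ORDER BY id)
         - row_number() OVER (PARTITION BY session_id, type ORDER BY id) AS grp
    FROM events
)
SELECT session_id, type, COUNT(*) AS cnt
FROM numbered
GROUP BY session_id, type, grp
ORDER BY MAX(id)
"""


def count_runs(rows):
    """Count each run of consecutive same-type rows per session (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (id INTEGER, session_id INTEGER, type INTEGER)")
        con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return con.execute(RUNS_QUERY).fetchall()
    finally:
        con.close()


print(count_runs(SAMPLE_ROWS))
print(count_runs([(1, 1, 1), (2, 1, 1), (3, 1, 2)]))  # the example from the update
```

On the first sample this reproduces the question's expected output row for row; on the update's example it returns one run of two type-1 rows followed by one type-2 row.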

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. For each A.Id, I want the top 2 B.Id values, sorted by Value in table M (descending), producing the results below. This is running on an Azure SQL database.
Table A
+----+
| Id |
+----+
| 1  |
| 2  |
| 3  |
| 4  |
+----+
Table M
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1   | 3   | 4     |
| 1   | 2   | 3     |
| 3   | 2   | 3     |
| 3   | 5   | 6     |
| 3   | 3   | 4     |
| 4   | 1   | 2     |
| 4   | 2   | 1     |
| 4   | 4   | 3     |
+-----+-----+-------+
Table B
+----+
| Id |
+----+
| 1  |
| 2  |
| 3  |
| 4  |
| 5  |
+----+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select the M rows where AId equals 1, sort them, and take the top 2, but I need to do this for every row in table A. I tried GROUP BY, but I wasn't sure how to sort and limit within each group, and searching didn't turn up any useful resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
```sql
SELECT
    AId, BId, Value
FROM (
    SELECT *,
           Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
    FROM M
) t
WHERE Rn <= 2
```
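The `Rn = ROW_NUMBER() ...` alias form is T-SQL syntax (fine on Azure SQL); the sketch below uses the standard `... AS rn` spelling so it can run against table M from the question in an in-memory SQLite database (assuming SQLite 3.25 or later). It adds BId as a tie-breaker in the ORDER BY, an assumption not in the original answer, so equal Values always pick the same rows. The helper name top_n_per_a is hypothetical.

```python
import sqlite3

# Table M from the question: (AId, BId, Value).
M_ROWS = [(1, 3, 4), (1, 2, 3), (3, 2, 3), (3, 5, 6), (3, 3, 4), (4, 1, 2), (4, 2, 1), (4, 4, 3)]

TOP_N_QUERY = """
SELECT AId, BId, Value
FROM (
    SELECT AId, BId, Value,
           -- 1 for the highest Value per AId; BId breaks ties deterministically
           ROW_NUMBER() OVER (PARTITION BY AId ORDER BY Value DESC, BId) AS rn
    FROM M
) AS t
WHERE rn <= ?
ORDER BY AId, rn
"""


def top_n_per_a(n):
    """Return the n highest-Value (AId, BId, Value) rows for each AId (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE M (AId INTEGER, BId INTEGER, Value INTEGER)")
        con.executemany("INSERT INTO M VALUES (?, ?, ?)", M_ROWS)
        return con.execute(TOP_N_QUERY, (n,)).fetchall()
    finally:
        con.close()


for row in top_n_per_a(2):
    print(row)
```

A.Id 2 has no rows in M, so it is absent from the result, which matches the expected output; a LEFT JOIN from A would be needed to list it with NULLs.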

SQL Exclude Records with Leveling

For example, I have a table like this:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 4 | 4 | A |
| 5 | 5 | A |
| 6 | 3 | A |
| 7 | 4 | U |
| 8 | 5 | A |
| 9 | 6 | A |
| 10 | 7 | A |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
The security column is A for Authorized and U for Unauthorized. I need to exclude the records that sit under an Unauthorized record, based on their level.
Read as a tree, each record is a child of the nearest preceding record with a lower level. The Unauthorized records are sort_id 3 and 7, and the records under them (4 and 5 under 3; 8, 9, and 10 under 7) should be excluded.
So the SQL result should be the following table:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 6 | 3 | A |
| 7 | 4 | U |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
How can we produce it using a simple SELECT statement? Thanks in advance! Just comment if something is unclear.
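If I understand "under the unauthorized records" as meaning a run of records with increasing ids (and levels increasing by one) that follows an unauthorized record, then here is an approach. Within such a run, row_number() minus level is constant, so it serves as a group key; the query then keeps each run's rows up to and including its first unauthorized record:
```sql
select sort_id, level, security
from (select t.*,
             min(case when security = 'U' then sort_id end)
                 over (partition by grp) as minuid
      from (select t.*,
                   (row_number() over (order by sort_id) - level) as grp
            from mytable t
           ) t
     ) t
where minuid is null or sort_id <= minuid;
```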
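The Python sketch below runs the run-grouping idea against the question's sample data in an in-memory SQLite database (assuming SQLite 3.25 or later). The keep condition is `minuid IS NULL OR sort_id <= minuid`, which reproduces the expected output; a condition of `id > minuid` selects exactly the rows that should be excluded. The table name records and the helper visible_records are hypothetical.

```python
import sqlite3

# Sample rows from the question: (sort_id, level, security).
SAMPLE_ROWS = [
    (1, 1, "A"), (2, 2, "A"), (3, 3, "U"), (4, 4, "A"),
    (5, 5, "A"), (6, 3, "A"), (7, 4, "U"), (8, 5, "A"),
    (9, 6, "A"), (10, 7, "A"), (11, 3, "A"), (12, 3, "A"),
]

QUERY = """
SELECT sort_id, level, security
FROM (
    SELECT t.*,
           -- first Unauthorized sort_id in each run (NULL if the run has none)
           MIN(CASE WHEN security = 'U' THEN sort_id END) OVER (PARTITION BY grp) AS minuid
    FROM (
        SELECT r.*,
               -- stays constant while level rises by exactly 1 per row
               ROW_NUMBER() OVER (ORDER BY sort_id) - level AS grp
        FROM records AS r
    ) AS t
) AS x
WHERE minuid IS NULL OR sort_id <= minuid
ORDER BY sort_id
"""


def visible_records(rows):
    """Drop records that follow an Unauthorized record in the same run (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE records (sort_id INTEGER, level INTEGER, security TEXT)")
        con.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in visible_records(SAMPLE_ROWS):
    print(row)
```

This matches the sample because every subtree here is a straight chain. The grouping key is a heuristic: a record that steps back to a shallower level while still under an Unauthorized ancestor (for example levels 3 U, 4, 5, 4) starts a new group and is kept, so branching hierarchies need a different approach, such as a recursive query.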