First two rows per combination of two columns - sql

Given a table like this in PostgreSQL:
Messages
message_id | creating_user_id | receiving_user_id | created_utc
-----------+------------------+-------------------+-------------
1 | 1 | 2 | 1424816011
2 | 3 | 2 | 1424816012
3 | 3 | 2 | 1424816013
4 | 1 | 3 | 1424816014
5 | 1 | 3 | 1424816015
6 | 2 | 1 | 1424816016
7 | 2 | 1 | 1424816017
8 | 1 | 2 | 1424816018
I want to get the newest two rows per creating_user_id/receiving_user_id where the other user_id is 1. So the result of the query should look like:
message_id | creating_user_id | receiving_user_id | created_utc
-----------+------------------+-------------------+-------------
1 | 1 | 2 | 1424816011
4 | 1 | 3 | 1424816014
5 | 1 | 3 | 1424816015
6 | 2 | 1 | 1424816016
Using a window function with row_number() I can get the first 2 messages for each creating_user_id or the first 2 messages for each receiving_user_id, but I'm not sure how to get the first two messages for per creating_user_id/receiving_user_id.

Since you filter rows where one of both columns is 1 (and irrelevant), and 1 happens to be the smallest number of all, you can simply use GREATEST(creating_user_id, receiving_user_id) to distill the relevant number to PARTITION BY. (Else you could employ CASE.)
The rest is standard procedure: calculate a row number in a subquery and select the first two in the outer query:
SELECT message_id, creating_user_id, receiving_user_id, created_utc
FROM (
SELECT *
, row_number() OVER (PARTITION BY GREATEST (creating_user_id
, receiving_user_id)
ORDER BY created_utc) AS rn
FROM messages
WHERE 1 IN (creating_user_id, receiving_user_id)
) sub
WHERE rn < 3
ORDER BY created_utc;
Exactly your result.
SQL Fiddle.

Related

How to combine previously grouped rows by one or more shared rows?

In my PostgreSQL database, I have two separate queries (q1, q2) joining across multiple tables assigning the same items to different groups (I call these subgroups) based on different criteria. I get query result 1 and 2 (qr1, qr2).
An item might appear in one or both, but within a result it is unique.
I want to assign a new group id based on both subgroups and assigning the same group id if the subgroups share one or more items.
qr1 qr2
+----------+-------------+ +----------+-------------+
| item | subgroup1 | | item | subgroup2 |
+----------+-------------+ +----------+-------------+
| 1 | 1 | | 1 | 5 |
| 2 | 1 | | 5 | 6 |
| 3 | 2 | | 6 | 6 |
| 4 | 3 | | 7 | 7 |
| 5 | 3 | | 8 | 5 |
| 6 | 4 | | 10 | 5 |
+----------+-------------+ +----------+-------------+
combined (interested in item and group):
+---------+------------+------------+-------+
| item | subgroup1 | subgroup2 | group |
+---------+------------+------------+-------+
| 1 | 1 | 5 | 1 |
| 2 | 1 | N | 1 |
| 3 | 2 | N | 2 |
| 4 | 3 | N | 3 |
| 5 | 3 | 6 | 3 |
| 6 | 4 | 6 | 3 |
| 7 | N | 7 | 4 |
| 8 | N | 5 | 1 |
| 10 | N | 5 | 1 |
+---------+------------+------------+-------+
Subgroup 1 and 5 would be combined because of shared item 1. (group items 1, 2, 8, 10)
Subgroup 3, 4! and 6 would be combined because of shared item 5. (group items 4, 5, 6)
Subgroup 2 and 7 constitute their own group and do not share an item with any other group.
I tried to use window functions, count by duplicate item, and going from there. But I got stuck.
As mentioned in the request comments, you need a recursive query. The recursive part is what I call cte im my query. There is an array called items that I use to avoid cycles.
The idea is per item I assign all other items that are directly or indirectly associated the same group. Then I aggregate by item and take the smallest associated item and thus detect all items belonging to the same group. I use DENSE_RANK to get consecutive group numbers.
with recursive
qr1(item, subgroup) as (values (1,1), (2,1), (3,2), (4,3), (5,3), (6,4)),
qr2(item, subgroup) as (values (1,5), (5,6), (6,6), (7,7), (8,5), (10,5)),
qr(item, subgroup) as (select * from qr1 union all select * from qr2),
cte(item, other, items) as
(
select item, item, array[item]
from qr
union all
select cte.item, g.item, cte.items || g.item
from cte
join qr on qr.item = cte.other
join qr g on g.subgroup = qr.subgroup
where g.item <> all (cte.items)
)
select
item,
min(qr1.subgroup) as sg1,
min(qr2.subgroup) as sg2,
dense_rank() over (order by min(other)) as grp
from cte
left join qr1 using (item)
left join qr2 using (item)
group by item
order by item;
Demo: https://dbfiddle.uk/fG6AnX6l

How to order id's using subtotal from another column in PostgreSQL

I have a table returned by a select query. Example :
id | day | count |
-- | ------ | ----- |
1 | 71 | 3 |
1 | 70 | 2 |
1 |Subtotal| 5 |
2 | 70 | 5 |
2 | 71 | 2 |
2 | 69 | 2 |
2 |Subtotal| 9 |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
the day column contains text values (so varchar)
subtotal is the sum of the counts for an id (e.g. id 2 has subtotal of 5 + 2 + 2 = 9)
I now want to order this table so the id’s with the lowest subtotal count come first, and then ordered by day with subtotal at the end (like before)
Expected output:
id | day | count |
-- | ------ | ----- |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
1 | 70 | 2 |
1 | 71 | 3 |
1 |Subtotal| 5 |
2 | 69 | 2 |
2 | 70 | 5 |
2 | 71 | 2 |
2 |Subtotal| 9 |
I can't figure out how to order based on subtotal only ?
i've tried multiple order by (eg: ORDER BY day = 'Subtotal' & a mix of others) and using window functions but none are helping. Cheers !
Not sure if it's directly applicable to your source query (since you haven't included it) however the ordering you require on the sample data can be done with:
order by Max(count) over(partition by id), day
Note - ordering by day works with your sample data but as it's a string it will not honour numeric ordering, this should really be ordered by the source of the numerical value - again since we don't have your actual query I can't suggest anything more applicable but I'm sure you can substitute the correct column/expression.
I just crated table with 3 columns and tried to reproduce your expected result. I assume that there might be a problem ordering by day, subtotal would be always on top, but it seems as working solution.
create table test
(
id int,
day varchar(15),
count int
)
insert into test
values
(1,'71',3),
(1,'70',2),
(2,'70',5),
(2,'71',2),
(2,'69',2),
(3,'69',1),
(3,'70',1)
select id, day, count
from
(
select id, day, sum(count) as count
from test
group by id, rollup(day)
) as t
order by Max(count) over(partition by id), day

Select max value from column for every value in other two columns

I'm working on a webapp that tracks tvshows, and I need to get all episodes id's that are season finales, which means, the highest episode number from all seasons, for all tvshows.
This is a simplified version of my "episodes" table.
id tvshow_id season epnum
---|-----------|--------|-------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 2 | 1
5 | 1 | 2 | 2
6 | 2 | 1 | 1
7 | 2 | 1 | 2
8 | 2 | 1 | 3
9 | 2 | 1 | 4
10 | 2 | 2 | 1
11 | 2 | 2 | 2
The expect output:
id
---|
3 |
5 |
9 |
11 |
I've managed to get this working for the latest season but I can't make it work for all seasons.
I've also tried to take some ideas from this but I can't seem to find a way to add the tvshow_id in there.
I'm using Postgres v10
SELECT Id from
(Select *, Row_number() over (partition by tvshow_id,season order by epnum desc) as ranking from tbl)c
Where ranking=1
You can use the below SQL to get your result, using GROUP BY with sub-subquery as:
select id from tab_x
where (tvshow_id,season,epnum) in (
select tvshow_id,season,max(epnum)
from tab_x
group by tvshow_id,season)
Below is the simple query to get desired result. Below query is also good in performance with help of using distinct on() clause
select
distinct on (tvshow_id,season)
id
from your_table
order by tvshow_id,season ,epnum desc

Oracle: sql query for deleting duplicate rows based on a group

i need a SQL-Query to delete duplicates from a table. Lets start with my tables
rc_document: (there are more entries, this is just an example)
+----------------+-------------+----------------------+
| rc_document_id | document_id | rc_document_group_id |
+----------------+-------------+----------------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
| 6 | 3 | 2 |
+----------------+-------------+----------------------+
(document_id can be exists in mulitple rc_document-group´s)
rc_document_group:
+----------------------+----------+
| rc_document_group_id | priority |
+----------------------+----------+
| 1 | 1 |
| 2 | 2 |
+----------------------+----------+
Each rc_document can be joined with the rc_document_group. In the rc_document_group is the priority for each rc_document.
I want to delete the rc_document rows with document_id which have not the highest priority in the rc_document_group. Because the document_id can be exists in multiple rc_document-group´s .. i just want to keep that one, with the highest priority.
here is my expected rc_document table after deleting duplicate document_id´s:
+----------------+-------------+----------------------+
| rc_document_id | document_id | rc_document_group_id |
+----------------+-------------+----------------------+
| 2 | 2 | 1 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
| 6 | 3 | 2 |
+----------------+-------------+----------------------+
the rc_document´s with rc_document_id 1 and 3 must be deleted, because there document_id 1 and 3 are in another rc_document_group with higher priority.
Im new in sql and i have no idea how to write these sql query ... thank for your help!!
First, you could join the two tables in order to get the corresponding priority on each row. After that, you could use the analytic function MAX() to get, for each row, the max priority within each group of document_id. At this point, you filter out the rows where the priority is not equal to the max priority in the group.
Try this query:
SELECT t.rc_document_id,
t.document_id,
t.rc_document_group_id
FROM (SELECT d.*,
g.priority,
MAX(g.priority) OVER(PARTITION BY document_id) max_priority
FROM rc_document d
INNER JOIN rc_document_group g
ON d.rc_document_group_id = g.rc_document_group_id) t
WHERE t.priority = t.max_priority

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to make dynamic group of lines (of product by TYPE & COLOR in fact)
I don't know if it's possible just with one select query.
But : I want to create group of lines (A PRODUCT is a TYPE and a COLOR) as per the number_per_group column and I want to do this grouping depending on the date order (Order By DATE)
A single product with a NB_PER_GROUP number 2 is exclude from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.