Group by groups of 3 by column - sql

Say I have a table that looks like this:
| id | category_id | created_at |
| 1 | 3 | date... |
| 2 | 4 | date... |
| 3 | 1 | date... |
| 4 | 2 | date... |
| 5 | 5 | date... |
| 6 | 6 | date... |
And imagine there are a lot more entries. I'd like to grab these in a way that they are fresh, so ordering them by created_at DESC - but I'd also like to group them by category, in groups of 3!
So in pseudocode it looks something like this:
Go to category 1
-> Pick last 3
Go to category 2
-> Pick last 3
Go to category 3
-> Pick last 3
And so forth, starting over from category_id 1 when there's no other category to grab from. This will then be paginated as well so I need to make it work with offset & limit as well somehow.
I'm not at all sure where to start or what the keywords to google for are. I'd be happy with some nudges in the right direction so I can find the answer myself, or a full answer.

Another case for the window function row_number().
Just the latest 3 rows per category
SELECT id, category_id, created_at
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
WHERE rn < 4
ORDER BY category_id, rn;
The latest 3 rows per category, plus later rows
If you want to append the rest of the rows (your question is fuzzy about whether and how):
SELECT *
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
ORDER BY (rn > 3), category_id, rn;
One can sort by the outcome of the boolean expression (rn > 3). Boolean values sort in this order:
FALSE (0)
TRUE (1)
NULL (because default is NULLS LAST - not applicable here)
This way, the latest 3 rows per category come first and all the rest later.
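To see the boolean sort in action, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is an assumption made only to keep the example self-contained (the answer targets PostgreSQL, and SQLite needs version 3.25 or later for window functions); the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, category_id INTEGER, created_at TEXT)"
)
# Hypothetical data: category 1 has five rows, category 2 has two.
rows = [
    (1, 1, "2020-01-01"), (2, 1, "2020-01-02"), (3, 1, "2020-01-03"),
    (4, 1, "2020-01-04"), (5, 1, "2020-01-05"),
    (6, 2, "2020-01-01"), (7, 2, "2020-01-02"),
]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

query = """
SELECT id, category_id, rn
FROM (
    SELECT id, category_id, created_at,
           row_number() OVER (PARTITION BY category_id
                              ORDER BY created_at DESC) AS rn
    FROM tbl
) sub
ORDER BY (rn > 3), category_id, rn
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # the three newest rows per category first, the rest afterwards
```

Appending LIMIT and OFFSET to the outer query is one possible way to paginate this ordering.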
Or use a CTE and UNION ALL:
WITH cte AS (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
)
(
SELECT id, category_id, created_at
FROM cte
WHERE rn < 4
ORDER BY category_id, rn
)
UNION ALL
(
SELECT id, category_id, created_at
FROM cte
WHERE rn >= 4
ORDER BY category_id, rn
);
Same result.
All parentheses are required to attach ORDER BY to the individual legs of a UNION query.

Related

How to select the last record of each ID

I need to extract the last records of each user from the table. The table schema is like below.
mytable
product | user_id |
-------------------
A | 15 |
B | 15 |
A | 16 |
C | 16 |
-------------------
The output I want to get is
product | user_id |
-------------------
B | 15 |
C | 16 |
Basically the last records of each user.
Thanks in advance!
You can use the window function ROW_NUMBER. Here is a solution:
WITH CTE AS
(SELECT product, user_id,
ROW_NUMBER() OVER(PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id FROM CTE WHERE RN=1 ;
You can try using row_number()
select product, user_id
from
(
select product, user_id, row_number() over(partition by user_id order by product desc) as rn
from tablename
) A where rn = 1
There is no such thing as a "last" record unless you have a column that specifies the ordering. SQL tables represent unordered sets (well technically, multisets).
If you have such a column, then use distinct on:
select distinct on (user_id) t.*
from t
order by user_id, <ordering col> desc;
Distinct on is a very handy Postgres extension that returns one row per "group". It is the first row based on the ordering specified in the order by clause.
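As a rough illustration of what distinct on does, the sketch below mimics it in plain Python: sort by the grouping key and the ordering column, then keep the first row of each group. The `seq` value stands in for the `<ordering col>` placeholder and is purely hypothetical, as is the sample data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (user_id, seq, product); seq plays the role of <ordering col>.
rows = [(15, 1, "A"), (15, 2, "B"), (16, 1, "A"), (16, 2, "C")]

# Equivalent of: ORDER BY user_id, seq DESC
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))

# Equivalent of: DISTINCT ON (user_id), i.e. keep the first row of each user_id group.
latest = [next(group) for _, group in groupby(ordered, key=itemgetter(0))]
print(latest)
```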
You should have a column that stores the insertion order. Whether through auto increment or a value with date and time.
Ex:
autoIncrement | product | user_id
---------------------------------
1 | A | 15
2 | B | 15
3 | A | 16
4 | C | 16
SELECT product, user_id FROM mytable INNER JOIN
( SELECT MAX(autoIncrement) AS id FROM mytable GROUP BY user_id ) AS table_Aux
ON mytable.autoIncrement = table_Aux.id

Need to sum all Most Recent Rows from each Store that have ItemID

I have a table with, among other things, these columns: DateTransferred, ComputedQuantity, StoreID, ItemID
I have two goals. My simpler goal is to write a query where I fill in the ItemID and it sums up the ComputedQuantity where it matches that ItemID, only using the most recent DateTransferred for each StoreID. So with the following example data:
DateTransferred | StoreID | ItemID | ComputedQuantity
11/10/17 | 1 | 1 | 3 <
10/10/17 | 1 | 1 | 4
09/10/17 | 2 | 1 | 9 <
08/10/17 | 3 | 1 | 1 <
07/10/17 | 3 | 1 | 10
I would want it to pull every row with < next to it, as that's the most recent Date for that StoreID, and sum up to 13
My more complicated goal is that I would like to include the above-calculated value into a 'join' where I'm dealing with the Item table, so that I can pull all the items and join them with a new column which has the summed up ComputedQuantity
This is on SQL Server 10 on Windows Server 2008, if that matters
One simple method uses a correlated subquery:
select t.*
from t
where t.DateTransferred = (select max(t2.DateTransferred)
from t t2
where t2.storeid = t.storeid
);
Another even simpler method uses window functions:
select t.*
from (select t.*,
row_number() over (partition by storeid order by DateTransferred desc) as seqnum
from t
) t
where seqnum = 1;
In either case, you can add a where clause to the subquery if you want the most recent date on or before some given date (say a year ago).
Also, both queries assume that your data has no future dates. If it does, then add where DateTransferred < getdate().
The final statement which sums the ComputedQuantities:
select ItemID, SUM(ComputedQuantity) Quantity
from (select t.*,
row_number() over (partition by StoreID, ItemID order by DateTransferred DESC) as seqnum
from [db].[dbo].[InventoryTransferLog] t
) t
where seqnum = 1 and ComputedQuantity > 0
GROUP BY ItemID
ORDER BY ItemID
I decided not to sum values < 0
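The final statement can be checked against the sample rows from the question. This is only a sketch: it uses Python's sqlite3 module instead of SQL Server, a shortened table name, and ISO-formatted dates (an assumption that preserves the relative order of the dates in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE InventoryTransferLog (
           DateTransferred TEXT, StoreID INTEGER,
           ItemID INTEGER, ComputedQuantity INTEGER)"""
)
# Sample rows from the question; dates rewritten in ISO format (assumed).
conn.executemany(
    "INSERT INTO InventoryTransferLog VALUES (?, ?, ?, ?)",
    [
        ("2017-10-11", 1, 1, 3),   # most recent for store 1
        ("2017-10-10", 1, 1, 4),
        ("2017-10-09", 2, 1, 9),   # most recent for store 2
        ("2017-10-08", 3, 1, 1),   # most recent for store 3
        ("2017-10-07", 3, 1, 10),
    ],
)

totals = conn.execute(
    """
    SELECT ItemID, SUM(ComputedQuantity) AS Quantity
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY StoreID, ItemID
                                    ORDER BY DateTransferred DESC) AS seqnum
          FROM InventoryTransferLog t
         ) t
    WHERE seqnum = 1 AND ComputedQuantity > 0
    GROUP BY ItemID
    ORDER BY ItemID
    """
).fetchall()
print(totals)  # [(1, 13)]
```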

Compare different orders of the same table

I have this following scenario, a table with these columns:
table_id|user_id|os_number|inclusion_date
In the system, the os_number is sequential for the users, but due to a system bug some users inserted OSs in wrong order. Something like this:
table_id | user_id | os_number | inclusion_date
-----------------------------------------------
1 | 1 | 1 | 2015-11-01
2 | 1 | 2 | 2015-11-02
3 | 1 | 3 | 2015-11-01
Note the os number 3 inserted before the os number 2
What I need:
Recover the table_id of rows 2 and 3, which are out of order.
I have these two select that show me the table_id in two different orders:
select table_id from table order by user_id, os_number
select table_id from table order by user_id, inclusion_date
I can't figure out how I can compare these two selects and see which users are affected by this system bug.
Your question is a bit difficult because there is no correct ordering (as presented) -- because dates can have ties. So, use the rank() or dense_rank() function to compare the two values and return the ones that are not in the correct order:
select t.*
from (select t.*,
rank() over (partition by user_id order by inclusion_date) as seqnum_d,
rank() over (partition by user_id order by os_number) as seqnum_o
from t
) t
where seqnum_d <> seqnum_o;
Use row_number() over both orders:
select *
from (
select *,
row_number() over (order by os_number) rnn,
row_number() over (order by inclusion_date) rnd
from a_table
) s
where rnn <> rnd;
table_id | user_id | os_number | inclusion_date | rnn | rnd
----------+---------+-----------+----------------+-----+-----
3 | 1 | 3 | 2015-11-01 | 3 | 2
2 | 1 | 2 | 2015-11-02 | 2 | 3
(2 rows)
Not entirely sure about the performance on this but you could use a cross apply on the same table to get the results in one query. This will bring up the pairs of table_ids which are incorrect.
select
a.table_id as InsertedAfterTableId,
c.table_id as InsertedBeforeTableId
from table a
cross apply
(
select b.table_id
from table b
where b.inclusion_date < a.inclusion_date and b.os_number > a.os_number
) c
Both query examples given below simply check a mismatch between inclusion date and os_number:
This first query should return the offending row (the one whose os_number is off from its inclusion date)--in the case of the example row 3.
select table.table_id, table.user_id, table.os_number from table
where EXISTS(select * from table t
where t.user_id = table.user_id and
t.inclusion_date > table.inclusion_date and
t.os_number < table.os_number);
This second query will return the table numbers and users for two rows that are mismatched:
select first_table.table_id, second_table.table_id, first_table.user_id from
table first_table
JOIN table second_table
ON (first_table.user_id = second_table.user_id and
first_table.inclusion_date > second_table.inclusion_date and
first_table.os_number < second_table.os_number);
I would use WINDOW FUNCTIONS to get row numbers in orders in question and then compare them:
SELECT
sub.table_id,
sub.user_id,
sub.os_number,
sub.inclusion_date,
number_order_1, number_order_2
FROM (
SELECT
table_id,
user_id,
os_number,
inclusion_date,
row_number() OVER (PARTITION BY user_id
ORDER BY os_number
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) AS number_order_1,
row_number() OVER (PARTITION BY user_id
ORDER BY inclusion_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) AS number_order_2
FROM
table
) sub
WHERE
number_order_1 <> number_order_2
;
EDIT:
a_horse_with_no_name made a good point about my final answer, so I've gone back to my first answer (see the edit history), which also works if os_number isn't gapless.
select *
from (
select a_table.*,
lag(inclusion_date) over (partition by user_id order by os_number) as last_date
from a_table
) result
where last_date is not null AND last_date>inclusion_date;
This should cover gaps as well as ties. Basically, I check the inclusion_date of the previous os_number and flag the row when that date is strictly greater than the current row's date (so two versions on the same date are fine).
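A quick way to try the lag() query is to run it on the three example rows from the question. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption; SQLite 3.25 or later is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE a_table (
           table_id INTEGER, user_id INTEGER,
           os_number INTEGER, inclusion_date TEXT)"""
)
# The three example rows from the question.
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2015-11-01"),
        (2, 1, 2, "2015-11-02"),
        (3, 1, 3, "2015-11-01"),
    ],
)

out_of_order = conn.execute(
    """
    SELECT table_id, user_id, os_number
    FROM (
        SELECT a_table.*,
               lag(inclusion_date) OVER (PARTITION BY user_id
                                         ORDER BY os_number) AS last_date
        FROM a_table
    ) result
    WHERE last_date IS NOT NULL AND last_date > inclusion_date
    """
).fetchall()
print(out_of_order)  # [(3, 1, 3)]
```

Note that this flags only the row dated earlier than its predecessor (table_id 3), not both rows of the mismatched pair.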

Rank function for date in Oracle SQL

I have the following code for example:
SELECT id, order_day, purchase_id FROM d
customer_id and purchase_id are unique. Each customer_id could have multiple purchase_id. Assume every one has made at least 5 orders.
Now, I just want to pull the first 5 purchase IDs of each customers ID (this depends on the earliest dates of purchases). I want the result to look like this:
id | purchase_id | rank
-------------------------
A | WERFEW43 | 1
A | ERTGDSFV | 3
A | FDGRT45 | 2
A | BRTE4TEW | 4
A | DFGDV | 5
B | DSFSF | 1
B | CF345 | 2
B | SDFSDFSDFS | 4
I thought of Ranking order_day, but my knowledge is not good enough to pull this off.
select id, purchase_id, rank() over (partition by id order by order_day) as rnk
from d
You can also try dense_rank() over (partition by id order by order_day) or row_number() over (partition by id order by order_day) and choose whichever suits you best.
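The three functions differ only in how they treat ties, which a small experiment makes visible. The sketch below uses Python's sqlite3 module rather than Oracle, with made-up rows in which one customer has two purchases on the same day. PARTITION BY id is included so the numbering restarts per customer, and purchase_id is added as a tie-breaker for row_number() only to make its output deterministic; both choices are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id TEXT, order_day TEXT, purchase_id TEXT)")
# Hypothetical data: P1 and P2 share the same order_day.
conn.executemany(
    "INSERT INTO d VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", "P1"),
        ("A", "2020-01-01", "P2"),
        ("A", "2020-01-02", "P3"),
    ],
)

ranked = conn.execute(
    """
    SELECT purchase_id,
           rank()       OVER (PARTITION BY id ORDER BY order_day) AS rnk,
           dense_rank() OVER (PARTITION BY id ORDER BY order_day) AS drnk,
           row_number() OVER (PARTITION BY id
                              ORDER BY order_day, purchase_id)    AS rn
    FROM d
    ORDER BY purchase_id
    """
).fetchall()
for row in ranked:
    print(row)
# ('P1', 1, 1, 1)
# ('P2', 1, 1, 2)
# ('P3', 3, 2, 3)
```

With a filter such as "first 5 per customer", only row_number() guarantees at most 5 rows when dates tie.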
select *
from
( SELECT
id
,order_day
,purchase_id
,row_number() -- ranking
over (partition by id -- each customer
order by order_day) as rn -- based on oldest dates
FROM d
) dt -- Oracle does not accept the AS keyword before a table alias
where rn <= 5

SQL: Ignore some returned rows, deleting others

I have this table :
| Column | Type |
+---------------+--------------------------------+
| id | integer |
| recipient_id | integer |
| is_read | boolean |
| updated_at | timestamp(0) without time zone |
I have to delete items from this table with this specific rule:
for each recipient_id, we keep the 5 most recently read items and delete the older read ones.
I tried to bend my mind with RECURSIVE WITH statements but failed miserably. I've implemented my solution programmatically but I wanted to know if there was a decent pure SQL solution.
DELETE FROM tbl t
USING (
SELECT id, row_number() OVER (PARTITION BY recipient_id
ORDER BY updated_at DESC) as rn
FROM tbl
WHERE is_read
) x
WHERE x.rn > 5
AND x.id = t.id;
A JOIN is usually faster than an IN expression, especially with larger numbers of items.
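And use row_number(), not rank()! With ties on updated_at, rank() gives several rows the same rank, so more than 5 read rows per recipient could be kept.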
Check out window functions:
DELETE FROM table
WHERE id IN (
SELECT id
FROM (
SELECT id, rank() OVER (PARTITION BY recipient_id ORDER BY updated_at DESC) as position
FROM table
WHERE is_read
) subselect WHERE position > 5
)
delete from t
where id in (
select id
from (
select
id,
row_number() over(partition by recipient_id order by updated_at desc) rn
from t
where is_read
) s
where s.rn > 5
)
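The IN-based delete can be tried on a throwaway table. This is a minimal sketch using Python's sqlite3 module, not PostgreSQL: SQLite has no boolean type, so is_read is stored as 0/1 here, and the rows are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE t (
           id INTEGER PRIMARY KEY, recipient_id INTEGER,
           is_read INTEGER, updated_at TEXT)"""
)
# Hypothetical data for recipient 1: seven read items (ids 1-7, higher id = newer)
# and one old unread item (id 8) that must never be deleted.
rows = [(i, 1, 1, f"2020-01-{i:02d}") for i in range(1, 8)]
rows.append((8, 1, 0, "2019-12-31"))
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   row_number() OVER (PARTITION BY recipient_id
                                      ORDER BY updated_at DESC) AS rn
            FROM t
            WHERE is_read
        ) s
        WHERE s.rn > 5
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)  # [3, 4, 5, 6, 7, 8]
```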