SQL - Multiple conditionals with row_number()

+------------+------------------+-------+-----------------+------------------------------+
| product_id | date             | STOCK | REAL_STOCK-DIFF | Counting one time?           |
+------------+------------------+-------+-----------------+------------------------------+
| 1ab7       | 10/18/2022 18:30 |  6009 | 495             | 495                          |
| 1ab7       | 10/18/2022 20:10 |  6003 | 495             | 0                            |
| 1ab7       | 10/20/2022 10:05 |  5514 | 495             | 0                            |
| 1ab7       | 10/20/2022 11:05 | 23856 | 0               | 0                            |
| 1ab7       | 10/20/2022 12:05 | 25850 | 0               | 0                            |
| 1ab7       | 10/20/2022 13:05 | 44160 | 0               | 0                            |
| 1ab7       | 10/20/2022 14:05 | 48205 | 130             | 130                          |
| 1ab7       | 10/20/2022 17:05 | 48122 | 130             | 0                            |
| 1ab7       | 10/20/2022 18:05 | 48075 | 130             | 0                            |
| 1ab7       | 10/20/2022 19:05 | 17438 | 128             | 128                          |
| 1ab7       | 10/21/2022 1:38  | 17310 | 128             | 0                            |
| 2ab7       | 10/18/2022 18:30 | 85692 | 0               | 0                            |
| 2ab7       | 10/20/2022 14:05 | 84498 |                 | SUM DIF STOCK == 495+130+128 |
| 2ab7       | 10/20/2022 15:05 | 84477 |                 |                              |
| 2ab7       | 10/20/2022 16:05 |     0 |                 |                              |
| 2ab7       | 10/20/2022 23:38 |     0 |                 |                              |
| 2ab7       | 10/21/2022 0:05  |     0 |                 |                              |
+------------+------------------+-------+-----------------+------------------------------+
This data shows the SELECT that I tried to build with PARTITION BY. I'm controlling the stock and have to show the stock difference per product. I was doing something like MAX - MIN over a partition; however, several conditions apply: the stock can suddenly grow, decrease, or even be removed completely (stock = 0), so a plain PARTITION BY won't solve it.
My real stock is the third column, "STOCK": it drops from 6009 down to the minimum of 5514, then jumps to 23856 and climbs to 48205, then is deducted down to 17438.
Here we could do something like: if 23856 > the previous value (5514), then new minimum and maximum = 23856; I just don't know how to partition it. For the 17438, something like: if the previous row's stock > 17438 * 1.2 (20% higher), then new minimum = 17438.
The SQL that I wrote gives me the "Dif_Stock" column, which wrongly shows 42691 as the difference.
All I'm trying to do is reproduce the values I entered in the "Counting one time?" column:
"SUM DIF STOCK == 495+130+128"
My SQL code:
SELECT DISTINCT
       product_id,
       date,
       stock,
       MaxStock,
       MinStock,
       (MaxStock - MinStock) AS Dif_Stock
FROM (SELECT product_id,
             date,
             stock,
             MAX(stock) OVER (PARTITION BY product_id) AS MaxStock,
             MIN(stock) OVER (PARTITION BY product_id) AS MinStock,
             ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY product_id) AS ROWN
      FROM (SELECT product_id,
                   category,
                   product_name,
                   vol,
                   price,
                   CAST(stock AS int) AS stock,
                   date
            FROM stock_control
            WHERE 1 = 1) STOCK
      GROUP BY date,
               product_id,
               category,
               product_name,
               vol,
               price,
               stock
      --HAVING STOCK != 0
     ) STOCK_2
--ORDER BY (MaxStock - MinStock) DESC
ORDER BY product_id,
         date ASC;
Real image below from select * from mytable.
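For illustration only, here is a minimal, untested sketch of the segmentation logic described above, assuming SQL Server-style window functions and the stock_control table from the query: flag the start of a new segment with LAG() (stock jumps above the previous reading, or the previous reading exceeds the current one by more than 20%), number the segments with a running SUM, then add up MAX - MIN once per segment.
-- Sketch only: the thresholds mirror the rules described in the question.
WITH readings AS (
    SELECT product_id,
           date,
           CAST(stock AS int) AS stock,
           LAG(CAST(stock AS int)) OVER (PARTITION BY product_id ORDER BY date) AS prev_stock
    FROM stock_control
),
flagged AS (
    SELECT *,
           CASE
               WHEN prev_stock IS NULL THEN 1       -- first reading starts a segment
               WHEN stock > prev_stock THEN 1       -- sudden restock: new segment
               WHEN prev_stock > stock * 1.2 THEN 1 -- dropped by more than 20%: new segment
               ELSE 0
           END AS is_new_segment
    FROM readings
),
segments AS (
    SELECT *,
           SUM(is_new_segment) OVER (PARTITION BY product_id ORDER BY date
                                     ROWS UNBOUNDED PRECEDING) AS segment_no
    FROM flagged
)
SELECT product_id,
       SUM(seg_diff) AS total_stock_diff -- e.g. 495 + 130 + 128 for product 1ab7
FROM (SELECT product_id, segment_no,
             MAX(stock) - MIN(stock) AS seg_diff
      FROM segments
      GROUP BY product_id, segment_no
     ) per_segment
GROUP BY product_id;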

Related

How to get Max date and sum of its rows SQL

I have the following table:
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/1/2020    | 1000     | 500     |
| 1    | 1/3/2020    | 1558     | 100     |
| 1    | 1/3/2020    | 126      | 200     |
| 2    | 2/5/2020    | 4921     | 500     |
| 2    | 2/5/2020    | 15       | 100     |
| 2    | 1/1/2020    | 5951     | 140     |
| 2    | 1/2/2020    | 1588     | 10      |
| 2    | 1/3/2020    | 1568     | 56      |
| 2    | 1/4/2020    | 12558    | 45      |
+------+-------------+----------+---------+
I need to get each id's max date and the sums of its amount and amount2 for that date. According to the above data, I need the following output:
+------+-------------+----------+---------+
| id   | date        | amount   | amount2 |
+------+-------------+----------+---------+
| 1    | 1/3/2020    | 1684     | 300     |
| 2    | 2/5/2020    | 4936     | 600     |
+------+-------------+----------+---------+
How can I do this?
Aggregate and use MAX OVER to get the IDs' maximum dates:
select id, [date], sum_amount, sum_amount2
from
(
    select id, [date],
           sum(amount) as sum_amount,
           sum(amount2) as sum_amount2,
           max([date]) over (partition by id) as max_date_for_id
    from mytable
    group by id, [date]
) aggregated
where [date] = max_date_for_id
order by id;
First, use dense_rank() to find the rows with the latest date per id:
dense_rank() over (partition by id order by [date] desc)
After that, simply group by with sum() on the amounts:
select id, [date], sum(amount), sum(amount2)
from
(
    select *,
           dr = dense_rank() over (partition by id order by [date] desc)
    from your_table
) t
where dr = 1
group by id, [date];

SQL (Redshift) get start and end values for consecutive data in a given column

I have a table that has the subscription state of users on any given day. The data looks like this
+------------+------------+--------------+
| account_id | date | current_plan |
+------------+------------+--------------+
| 1 | 2019-08-01 | free |
| 1 | 2019-08-02 | free |
| 1 | 2019-08-03 | yearly |
| 1 | 2019-08-04 | yearly |
| 1 | 2019-08-05 | yearly |
| ... | | |
| 1 | 2020-08-02 | yearly |
| 1 | 2020-08-03 | free |
| 2 | 2019-08-01 | monthly |
| 2 | 2019-08-02 | monthly |
| ... | | |
| 2 | 2019-08-31 | monthly |
| 2 | 2019-09-01 | free |
| ... | | |
| 2 | 2019-11-26 | free |
| 2 | 2019-11-27 | monthly |
| ... | | |
| 2 | 2019-12-27 | monthly |
| 2 | 2019-12-28 | free |
+------------+------------+--------------+
I would like to have a table that gives the start and end dates of each subscription. It would look something like this:
+------------+------------+------------+-------------------+
| account_id | start_date | end_date | subscription_type |
+------------+------------+------------+-------------------+
| 1 | 2019-08-03 | 2020-08-02 | yearly |
| 2 | 2019-08-01 | 2019-08-31 | monthly |
| 2 | 2019-11-27 | 2019-12-27 | monthly |
+------------+------------+------------+-------------------+
I started with a LAG window function and a bunch of WHERE clauses to grab the "state changes", but this makes it difficult to see when customers float in and out of subscriptions, and I'm not sure it is the best method.
with lag as (
    select *,
           LAG(current_plan) OVER (PARTITION BY account_id ORDER BY date ASC) AS previous_plan,
           LAG(date) OVER (PARTITION BY account_id ORDER BY date ASC) AS previous_plan_date
    from data
)
SELECT *
FROM lag
WHERE (current_plan = 'free' AND previous_plan IN ('monthly', 'yearly'))
This is a gaps-and-islands problem. I think a difference of row numbers works:
select account_id, current_plan, min(date), max(date)
from (select d.*,
             row_number() over (partition by account_id order by date) as seqnum,
             row_number() over (partition by account_id, current_plan order by date) as seqnum_2
      from data d
     ) d
where current_plan <> 'free'
group by account_id, current_plan, (seqnum - seqnum_2);

Find rows with adjoining date ranges and accumulate their durations

My PostgreSQL database stores school vacation, public holiday and weekend dates for parents to plan their vacation. Many times school vacations are adjoined by weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjoining weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2 and calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
       (ends_on - starts_on + 1) AS duration,
       holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Anyone looking at the calendar would see that ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) adjoin each other, so they add up to one 6-day vacation period. So far I compute this with a program, but that takes quite a lot of resources (the actual table contains some 500,000 rows).
Problem 1
Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjoining periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
This is a gaps and islands problem. In this case you can use lag() to see where an island starts and then a cumulative sum.
The final operation is some aggregation (using window functions):
SELECT p.*,
       (MAX(ends_on) OVER (PARTITION BY location_id, grp)
        - MIN(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       ARRAY_AGG(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
             COUNT(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                 OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                   LAG(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods
           ) p
     ) p;

Obtain MIN() and MAX() over non-consecutive values in PostgreSQL

I have a problem that I can't find a solution to. This is my scenario:
parent_id | transaction_code | way_to_pay | type_of_receipt | unit_price | period | series | number_from | number_to | total_numbers
10 | 2444 | cash | local | 15.000 | 2018 | A | 19988 | 26010 | 10
This is the result when grouping by parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series and taking MIN(number), MAX(number) and COUNT(number). But the grouping hides that the numbers are not consecutive, because this is my child data:
parent_id | child_id | number
10 | 1 | 19988
10 | 2 | 19989
10 | 3 | 19990
10 | 4 | 19991
10 | 5 | 22001
10 | 6 | 22002
10 | 7 | 26007
10 | 8 | 26008
10 | 9 | 26009
10 | 10 | 26010
What is the magic SQL to achieve the following?
parent_id | transaction_code | way_to_pay | type_of_receipt | unit_price | period | series | number_from | number_to | total_numbers
10 | 2444 | cash | local | 15.000 | 2018 | A | 19988 | 19991 | 4
10 | 2444 | cash | local | 15.000 | 2018 | A | 22001 | 22002 | 2
10 | 2444 | cash | local | 15.000 | 2018 | A | 26007 | 26010 | 4
You can identify adjacent numbers by subtracting a sequence. It would help if you showed your query, but the idea is this:
select parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series,
       min(number), max(number), count(*)
from (select t.*,
             row_number() over (partition by parent_id, transaccion_code, way_to_pay,
                                             type_of_receipt, unit_price, periodo, series
                                order by number
                               ) as seqnum
      from t
     ) t
group by parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series,
         (number - seqnum);
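To see why grouping by (number - seqnum) isolates each consecutive run, here is a small self-contained illustration (PostgreSQL syntax; the VALUES list simply reproduces the child numbers from the question):
-- Illustration only: number - row_number() is constant within each consecutive run.
SELECT number,
       ROW_NUMBER() OVER (ORDER BY number) AS seqnum,
       number - ROW_NUMBER() OVER (ORDER BY number) AS grp
FROM (VALUES (19988), (19989), (19990), (19991),
             (22001), (22002),
             (26007), (26008), (26009), (26010)) AS t(number);
-- grp = 19987 for 19988..19991, 21996 for 22001..22002, and 26000 for 26007..26010,
-- so grouping by it splits the three ranges shown in the desired output.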

How do you join records one to one with multiple possible matches ...?

I have a table of transactions like the following
| ID | Trans Type | Date | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
| 1 | Issue | 11/27/2012 | 3 | 3.50 | NULL | 10 |
| 2 | Issue | 11/27/2012 | 3 | 3.50 | NULL | 11 |
| 3 | Issue | 11/25/2012 | 1 | 1.25 | NULL | 12 |
| 4 | ID Issue | 11/27/2012 | -3 | -3.50 | 100 | NULL |
| 5 | ID Issue | 11/27/2012 | -3 | -3.50 | 102 | NULL |
| 6 | ID Issue | 11/25/2012 | -1 | -1.25 | 104 | NULL |
These transactions are duplicates where the 'Issue' rows have a work order ID while the 'ID Issue' transactions have the item number. I would like to update the [Item Number] field for the 'Issue' transactions to include the item number. When I do a join on the Date, Qty, and Total I get something like this:
| ID | Trans Type | Date | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
| 1 | Issue | 11/27/2012 | 3 | 3.50 | 100 | 10 |
| 1 | Issue | 11/27/2012 | 3 | 3.50 | 102 | 10 |
| 2 | Issue | 11/27/2012 | 3 | 3.50 | 100 | 11 |
| 2 | Issue | 11/27/2012 | 3 | 3.50 | 102 | 11 |
| 3 | Issue | 11/25/2012 | 1 | 1.25 | 104 | 12 |
The duplicates are multiplied! I would like this
| ID | Trans Type | Date | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
| 1 | Issue | 11/27/2012 | 3 | 3.50 | 100 | 10 |
| 2 | Issue | 11/27/2012 | 3 | 3.50 | 102 | 11 |
| 3 | Issue | 11/25/2012 | 1 | 1.25 | 104 | 12 |
Or this (Item Number is switched for the two matches)
| ID | Trans Type | Date | Qty | Total | Item Number | Work Order |
-------------------------------------------------------------------------
| 1 | Issue | 11/27/2012 | 3 | 3.50 | 102 | 10 |
| 2 | Issue | 11/27/2012 | 3 | 3.50 | 100 | 11 |
| 3 | Issue | 11/25/2012 | 1 | 1.25 | 104 | 12 |
Either would be fine. What would be a simple solution?
Use SELECT DISTINCT to filter out identical results, or you could partition your results to get the first item in each grouping.
UPDATE
Here's the code to illustrate the partition approach.
SELECT ID, [Trans Type], [Date], [Qty], [Total], [Item Number], [Work Order]
FROM
(
    SELECT ID, [Trans Type], [Date], [Qty], [Total], [Item Number], [Work Order],
           ROW_NUMBER() OVER (PARTITION BY ID, [Trans Type], [Date], [Qty], [Total]
                              ORDER BY [Item Number]) AS ItemRank
    FROM YourTable
) AS SubQuery
WHERE ItemRank = 1;
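If a strict one-to-one pairing is needed (so that item numbers 100 and 102 are each used exactly once), another option, not part of the answer above and only a sketch assuming SQL Server syntax and the YourTable name used there, is to row-number the 'Issue' and 'ID Issue' rows separately within each Date/Qty/Total group and join on that rank. Quantities carry opposite signs in the sample, so the join matches on absolute values.
-- Sketch only: pair the Nth 'Issue' row with the Nth 'ID Issue' row of each group.
WITH issues AS (
    SELECT ID, [Trans Type], [Date], [Qty], [Total], [Work Order],
           ROW_NUMBER() OVER (PARTITION BY [Date], ABS([Qty]), ABS([Total]) ORDER BY ID) AS rn
    FROM YourTable
    WHERE [Trans Type] = 'Issue'
),
id_issues AS (
    SELECT [Date], [Qty], [Total], [Item Number],
           ROW_NUMBER() OVER (PARTITION BY [Date], ABS([Qty]), ABS([Total]) ORDER BY ID) AS rn
    FROM YourTable
    WHERE [Trans Type] = 'ID Issue'
)
SELECT i.ID, i.[Trans Type], i.[Date], i.[Qty], i.[Total],
       d.[Item Number], i.[Work Order]
FROM issues i
JOIN id_issues d
  ON  d.[Date] = i.[Date]
  AND ABS(d.[Qty]) = ABS(i.[Qty])
  AND ABS(d.[Total]) = ABS(i.[Total])
  AND d.rn = i.rn;
On the sample data this pairs ID 1 with item 100, ID 2 with 102 and ID 3 with 104, i.e. the first of the two acceptable results in the question.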