base_table
month id sales cumulative_sales
2021-01-01 33205 10 10
2021-02-01 33205 15 25
Based on the base table above, I would like to add more rows up to the current month,
even if there is no sales.
Expected table
month id sales cumulative_sales
2021-01-01 33205 10 10
2021-02-01 33205 15 25
2021-03-01 33205 0 25
2021-04-01 33205 0 25
2021-05-01 33205 0 25
.........
2021-11-01 33205 0 25
My query stops at
select month, id, sales,
sum(sales) over (partition by id
order by month
rows between unbounded preceding and current row) as cumulative_sales
from base_table
This works. Assumes the month column is constrained to hold only "first of the month" dates. Use the desired hard-coded start date, or use another CTE to get the earliest date from base_table:
with base_table as (
select *
from (values
('2021-01-01'::date,33205,10)
,('2021-02-01' ,33205,15)
,('2021-01-01' ,12345,99)
,('2021-04-01' ,12345,88)
) dat("month",id,sales)
)
select cal.dt::date
,list.id
,coalesce(dat.sales,0) as sales
,coalesce(sum(dat.sales) over (partition by list.id order by cal.dt),0) as cumulative_sales
from generate_series('2020-06-01' /* use desired start date here */,current_date,'1 month') cal(dt)
cross join (select distinct id from base_table) list
left join base_table dat on dat."month" = cal.dt and dat.id = list.id
;
Results:
| dt | id | sales | cumulative_sales |
+------------+-------+-------+------------------+
| 2020-06-01 | 12345 | 0 | 0 |
| 2020-07-01 | 12345 | 0 | 0 |
| 2020-08-01 | 12345 | 0 | 0 |
| 2020-09-01 | 12345 | 0 | 0 |
| 2020-10-01 | 12345 | 0 | 0 |
| 2020-11-01 | 12345 | 0 | 0 |
| 2020-12-01 | 12345 | 0 | 0 |
| 2021-01-01 | 12345 | 99 | 99 |
| 2021-02-01 | 12345 | 0 | 99 |
| 2021-03-01 | 12345 | 0 | 99 |
| 2021-04-01 | 12345 | 88 | 187 |
| 2021-05-01 | 12345 | 0 | 187 |
| 2021-06-01 | 12345 | 0 | 187 |
| 2021-07-01 | 12345 | 0 | 187 |
| 2021-08-01 | 12345 | 0 | 187 |
| 2021-09-01 | 12345 | 0 | 187 |
| 2021-10-01 | 12345 | 0 | 187 |
| 2021-11-01 | 12345 | 0 | 187 |
| 2020-06-01 | 33205 | 0 | 0 |
| 2020-07-01 | 33205 | 0 | 0 |
| 2020-08-01 | 33205 | 0 | 0 |
| 2020-09-01 | 33205 | 0 | 0 |
| 2020-10-01 | 33205 | 0 | 0 |
| 2020-11-01 | 33205 | 0 | 0 |
| 2020-12-01 | 33205 | 0 | 0 |
| 2021-01-01 | 33205 | 10 | 10 |
| 2021-02-01 | 33205 | 15 | 25 |
| 2021-03-01 | 33205 | 0 | 25 |
| 2021-04-01 | 33205 | 0 | 25 |
| 2021-05-01 | 33205 | 0 | 25 |
| 2021-06-01 | 33205 | 0 | 25 |
| 2021-07-01 | 33205 | 0 | 25 |
| 2021-08-01 | 33205 | 0 | 25 |
| 2021-09-01 | 33205 | 0 | 25 |
| 2021-10-01 | 33205 | 0 | 25 |
| 2021-11-01 | 33205 | 0 | 25 |
The cross join pairs every date output by generate_series() with every id value from base_table.
The left join ensures that no dt+id pairs get dropped from the output when no such record exists in base_table.
The coalesce() functions ensure that the sales and cumulative_sales show 0 instead of null for dt+id combinations that don't exist in base_table. Remove them if you don't mind seeing nulls in those columns.
Related
I have the following table (example):
+----+-------+-------------+----------------+
| id | value | last_update | ingestion_date |
+----+-------+-------------+----------------+
| 1 | 30 | 2021-02-03 | 2021-02-07 |
+----+-------+-------------+----------------+
| 1 | 29 | 2021-02-03 | 2021-02-06 |
+----+-------+-------------+----------------+
| 1 | 28 | 2021-01-25 | 2021-02-02 |
+----+-------+-------------+----------------+
| 1 | 25 | 2021-01-25 | 2021-02-01 |
+----+-------+-------------+----------------+
| 1 | 23 | 2021-01-20 | 2021-01-31 |
+----+-------+-------------+----------------+
| 1 | 20 | 2021-01-20 | 2021-01-30 |
+----+-------+-------------+----------------+
| 2 | 55 | 2021-02-03 | 2021-02-06 |
+----+-------+-------------+----------------+
| 2 | 50 | 2021-01-25 | 2021-02-02 |
+----+-------+-------------+----------------+
The result I need:
It should be the last updated value in the column value and the penult value (based in the last_update and ingestion_date) in the value2.
+----+-------+-------------+----------------+--------+
| id | value | last_update | ingestion_date | value2 |
+----+-------+-------------+----------------+--------+
| 1 | 30 | 2021-02-03 | 2021-02-07 | 28 |
+----+-------+-------------+----------------+--------+
| 2 | 55 | 2021-02-03 | 2021-02-06 | 50 |
+----+-------+-------------+----------------+--------+
The query I have right now is the following:
SELECT id, value, last_update, ingestion_date, value2
FROM
(SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY last_update DESC, ingestion_date DESC) AS order,
LAG(value) OVER(PARTITION BY id ORDER BY last_update, ingestion_date) AS value2
FROM table)
WHERE ordem = 1
The result I am getting:
+----+-------+-------------+----------------+--------+
| ID | value | last_update | ingestion_date | value2 |
+----+-------+-------------+----------------+--------+
| 1 | 30 | 2021-02-03 | 2021-02-07 | 29 |
+----+-------+-------------+----------------+--------+
| 2 | 55 | 2021-02-03 | 2021-02-06 | 50 |
+----+-------+-------------+----------------+--------+
Obs: I am using Athena from AWS
I have a table that looks like this:
| id | date_start | gap_7_days |
| -- | ------------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 |
| 1 | 2021-06-13 00:00:00 | 0 |
| 1 | 2021-06-19 00:00:00 | 0 |
| 1 | 2021-06-27 00:00:00 | 0 |
| 2 | 2021-07-04 00:00:00 | 1 |
| 2 | 2021-07-11 00:00:00 | 1 |
| 2 | 2021-07-18 00:00:00 | 1 |
| 2 | 2021-07-25 00:00:00 | 1 |
| 2 | 2021-08-01 00:00:00 | 1 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-09 00:00:00 | 0 |
| 2 | 2021-08-16 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
| 2 | 2021-08-30 00:00:00 | 1 |
| 2 | 2021-08-31 00:00:00 | 0 |
| 2 | 2021-09-01 00:00:00 | 0 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-15 00:00:00 | 1 |
| 2 | 2021-08-22 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
For each ID, I check whether consecutive date_start values are 7 days apart, and put a 1 or 0 in gap_7_days accordingly.
I want to do the following (using Redshift SQL only):
Get the length of each sequence of consecutive 1s in gap_7_days for each ID
Expected output:
| id | date_start | gap_7_days | sequence_length |
| -- | ------------------- | --------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 | |
| 1 | 2021-06-13 00:00:00 | 0 | |
| 1 | 2021-06-19 00:00:00 | 0 | |
| 1 | 2021-06-27 00:00:00 | 0 | |
| 2 | 2021-07-04 00:00:00 | 1 | 6 |
| 2 | 2021-07-11 00:00:00 | 1 | 6 |
| 2 | 2021-07-18 00:00:00 | 1 | 6 |
| 2 | 2021-07-25 00:00:00 | 1 | 6 |
| 2 | 2021-08-01 00:00:00 | 1 | 6 |
| 2 | 2021-08-08 00:00:00 | 1 | 6 |
| 2 | 2021-08-09 00:00:00 | 0 | |
| 2 | 2021-08-16 00:00:00 | 1 | 3 |
| 2 | 2021-08-23 00:00:00 | 1 | 3 |
| 2 | 2021-08-30 00:00:00 | 1 | 3 |
| 2 | 2021-08-31 00:00:00 | 0 | |
| 2 | 2021-09-01 00:00:00 | 0 | |
| 2 | 2021-08-08 00:00:00 | 1 | 4 |
| 2 | 2021-08-15 00:00:00 | 1 | 4 |
| 2 | 2021-08-22 00:00:00 | 1 | 4 |
| 2 | 2021-08-23 00:00:00 | 1 | 4 |
Get the number of sequences for each ID
Expected output:
| id | num_sequences |
| -- | ------------------- |
| 1 | 0 |
| 2 | 3 |
How can I achieve this?
If you want the number of sequences, just look at the previous value. When the current value is "1" and the previous is NULL or 0, then you have a new sequence.
So:
select id,
sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select t.*,
lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
from t
) t
group by id;
If you actually want the lengths of the sequences, as in the intermediate results, then ask a new question. That information is not needed for this question.
I want to get front value and back value using min-max comparing.
Here is my sample table.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=322aafeb7970e25f9a85d0cd2f6c00d6
For example, this is temp01 table.
I want to get start&end NULL value in temp01.
So, I fill
price = [NULL, NULL, 13000]
to
price = [12000, 12000, 13000]
Because the 12000 is minimum value in [230, 1]. And I fill end NULL is filled maximum value in [cat01, cat02] group
| SEQ | cat01 | cat02 | dt_day | price |
+-----+-------+-------+------------+-------+
| 1 | 230 | 1 | 2019-01-01 | NULL |
| 2 | 230 | 1 | 2019-01-02 | NULL |
| 3 | 230 | 1 | 2019-01-03 | 13000 |
...
| 11 | 230 | 1 | 2019-01-11 | NULL |
| 12 | 230 | 1 | 2019-01-12 | NULL |
| 1 | 230 | 2 | 2019-01-01 | NULL |
| 2 | 230 | 2 | 2019-01-02 | NULL |
| 3 | 230 | 2 | 2019-01-03 | 12000 |
...
| 12 | 230 | 2 | 2019-01-11 | NULL |
| 13 | 230 | 2 | 2019-01-12 | NULL |
[result]
| SEQ | cat01 | cat02 | dt_day | price |
+-----+-------+-------+------------+-------+
| 1 | 230 | 1 | 2019-01-01 | 12000 | --START
| 2 | 230 | 1 | 2019-01-02 | 12000 |
| 3 | 230 | 1 | 2019-01-03 | 13000 |
| 4 | 230 | 1 | 2019-01-04 | 12000 |
| 5 | 230 | 1 | 2019-01-05 | NULL |
| 6 | 230 | 1 | 2019-01-06 | NULL |
| 7 | 230 | 1 | 2019-01-07 | 19000 |
| 8 | 230 | 1 | 2019-01-08 | 20000 |
| 9 | 230 | 1 | 2019-01-09 | 21500 |
| 10 | 230 | 1 | 2019-01-10 | 21500 |
| 11 | 230 | 1 | 2019-01-11 | 21500 |
| 12 | 230 | 1 | 2019-01-12 | 21500 |
| 13 | 230 | 1 | 2019-01-13 | 21500 | --END
| 1 | 230 | 2 | 2019-01-01 | 12000 | --START
| 2 | 230 | 2 | 2019-01-02 | 12000 |
| 3 | 230 | 2 | 2019-01-03 | 12000 |
| 4 | 230 | 2 | 2019-01-04 | 17000 |
| 5 | 230 | 2 | 2019-01-05 | 22000 |
| 6 | 230 | 2 | 2019-01-06 | NULL |
| 7 | 230 | 2 | 2019-01-07 | 23000 |
| 8 | 230 | 2 | 2019-01-08 | 23200 |
| 9 | 230 | 2 | 2019-01-09 | NULL |
| 10 | 230 | 2 | 2019-01-10 | 24000 |
| 11 | 230 | 2 | 2019-01-11 | 24000 |
| 12 | 230 | 2 | 2019-01-12 | 24000 |
| 13 | 230 | 2 | 2019-01-13 | 24000 | --END
Please, let me know what a good way to fill NULL using linear relationships.
find the min() and max() for the price GROUP BY cat01, cat02.
Also find the min and max seq for the row where price is not null
after that it is just simply inner join to your table and update where price is null
with val as
(
select cat01, cat02,
min_price = min(price),
max_price = max(price),
min_seq = min(case when price is not null then seq end),
max_seq = max(case when price is not null then seq end)
from temp01
group by cat01, cat02
)
update t
set price = case when t.seq < v.min_seq then min_price
when t.seq > v.max_seq then max_price
end
FROM temp01 t
inner join val v on t.cat01 = v.cat01
and t.cat02 = v.cat02
where t.price is null
dbfiddle
EDIT : returning the price as a new column in SELECT query
with val as
(
select cat01, cat02, min_price = min(price), max_price = max(price),
min_seq = min(case when price is not null then seq end),
max_seq = max(case when price is not null then seq end)
from temp01
group by cat01, cat02
)
select t.*,
new_price = coalesce(t.price,
case when t.seq < v.min_seq then min_price
when t.seq > v.max_seq then max_price
end)
FROM temp01 t
left join val v on t.cat01 = v.cat01
and t.cat02 = v.cat02
Updated dbfiddle
sample
+---------+------------+----------+------------+
| prdt_no | order_date | quantity | unit_price |
+---------+------------+----------+------------+
| A001 | 2020-01-01 | 100 | 10 |
| A001 | 2020-01-10 | 200 | 10 |
| A001 | 2020-02-01 | 100 | 20 |
| A001 | 2020-02-05 | 100 | 20 |
| A001 | 2020-02-07 | 100 | 20 |
| A001 | 2020-02-10 | 100 | 15 |
| A002 | 2020-01-01 | 100 | 10 |
| A002 | 2020-01-10 | 200 | 10 |
| A002 | 2020-02-01 | 100 | 20 |
| A002 | 2020-02-05 | 100 | 20 |
| A002 | 2020-02-07 | 100 | 20 |
| A002 | 2020-02-10 | 100 | 15 |
+---------+------------+----------+------------+
expected
if query condition is order_date between 2020-02-02 and 2020-02-10 then the data will be expected to get below result
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
| prdt_no | order_date | quantity | unit_price | last_unit_price_before | unit_price_diff | cost_reduce | last_unit_price_change_date |
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
| A001 | 2020-02-05 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A001 | 2020-02-07 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A001 | 2020-02-10 | 100 | 15 | 20 | -5 | -500 | 2020-02-10 |
| A002 | 2020-02-05 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A002 | 2020-02-07 | 100 | 20 | 10 | 10 | 1000 | 2020-02-01 |
| A002 | 2020-02-10 | 100 | 15 | 20 | -5 | -500 | 2020-02-10 |
+---------+------------+----------+------------+------------------------+-----------------+-------------+-----------------------------+
logic
I hope to get same product last unit price before then use it to calculate the price difference
the data record count actually over 200K
like photo
test demo link
SQL Server 2012 | db<>fiddle
you can use OUTER APPLY() to get the last low with price difference
SELECT *,
unit_price_diff = T.[unit_price] - L.[last_unit_price_before]
FROM T
OUTER APPLY
(
SELECT TOP 1
last_unit_price_before = x.[unit_price],
last_unit_price_change_date = x.[order_date]
FROM T x
WHERE x.[prdt_no] = T.[prdt_no]
AND x.[order_date] < T.[order_date]
AND x.[unit_price] <> T.[unit_price]
ORDER BY x.[order_date] DESC
) L
WHERE T.[order_date] >= '2020-02-01'
AND T.[order_date] <= '2020-02-10'
db<>fiddle
I have a table "Product" as :
| ProductId | ProductCatId | Price | Date | Deadline |
--------------------------------------------------------------------
| 1 | 1 | 10.00 | 2016-01-01 | 2016-01-27 |
| 2 | 2 | 10.00 | 2016-02-01 | 2016-02-27 |
| 3 | 3 | 10.00 | 2016-03-01 | 2016-03-27 |
| 4 | 1 | 10.00 | 2016-04-01 | 2016-04-27 |
| 5 | 3 | 10.00 | 2016-05-01 | 2016-05-27 |
| 6 | 3 | 10.00 | 2016-06-01 | 2016-06-27 |
| 7 | 1 | 20.00 | 2016-01-01 | 2016-01-27 |
| 8 | 2 | 30.00 | 2016-02-01 | 2016-02-27 |
| 9 | 1 | 40.00 | 2016-03-01 | 2016-03-27 |
| 10 | 4 | 15.00 | 2016-04-01 | 2016-04-27 |
| 11 | 1 | 25.00 | 2016-05-01 | 2016-05-27 |
| 12 | 5 | 55.00 | 2016-06-01 | 2016-06-27 |
| 13 | 5 | 55.00 | 2016-06-01 | 2016-01-27 |
| 14 | 5 | 55.00 | 2016-06-01 | 2016-02-27 |
| 15 | 5 | 55.00 | 2016-06-01 | 2016-03-27 |
I want to create SP count rows of Product each month with condition Year = CurrentYear , like :
| Month| SumProducts | SumExpiredProducts |
-------------------------------------------
| 1 | 3 | 3 |
| 2 | 3 | 3 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 2 | 2 |
What should i do ?
You can use a query like the following:
SELECT MONTH([Date]),
COUNT(*) AS SumProducts ,
COUNT(CASE WHEN [Date] > Deadline THEN 1 END) AS SumExpiredProducts
FROM mytable
WHERE YEAR([Date]) = YEAR(GETDATE())
GROUP BY MONTH([Date])