How to fill missing values for missing dates with value from date before in sql bigquery? [duplicate] - google-bigquery

Hi, I have a product table with daily prices. The catch is that the table is only updated when there is a price change; the dates in between are not written to the table because the price is the same as the day before.
How do I fill in the missing dates, using the price from the last recorded date before each of them?
date        id  price
2022-01-01  1   5
2022-01-03  1   6
2022-01-05  1   7
2022-01-01  2   10
2022-01-02  2   11
2022-01-06  2   12
into
date        id  price
2022-01-01  1   5
2022-01-02  1   5
2022-01-03  1   6
2022-01-04  1   6
2022-01-05  1   7
2022-01-01  2   10
2022-01-02  2   11
2022-01-03  2   11
2022-01-04  2   11
2022-01-05  2   11
2022-01-06  2   12
I am currently thinking of creating a table of dates, joining it, and using the LAG function. Can anyone help?
select
  date, id,
  case
    when price is null then nullPrice
    else price
  end as price
from (
  select *,
    Lag(price, 1) OVER (ORDER BY date, id ASC) AS nullPrice
  from price_table
  join date_table using (date)
)

Consider below:
WITH days_by_id AS (
SELECT id, GENERATE_DATE_ARRAY(MIN(date), MAX(date)) days
FROM sample
GROUP BY id
)
SELECT date, id,
IFNULL(price, LAST_VALUE(price IGNORE NULLS) OVER (PARTITION BY id ORDER BY date)) AS price
FROM days_by_id, UNNEST(days) date LEFT JOIN sample USING (id, date);

You can use generate_date_array function for this
with date_arr
as(
select *
from unnest(generate_date_array('2022-01-01', '2022-05-01')) as dt)
select da.dt, t1.*
from date_arr da
left outer join table1 t1
on da.dt = t1.dt
You can replace hardcoded dates with max and min date from table.
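The forward-fill logic that both answers rely on can also be sketched outside SQL, which is handy for checking the expected output by hand. This is only an illustrative sketch in Python: the function name `fill_missing_dates` and the in-memory `rows` list are assumptions made for the example, not part of the BigQuery solutions above.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, id, price), written only on price changes.
rows = [
    (date(2022, 1, 1), 1, 5),
    (date(2022, 1, 3), 1, 6),
    (date(2022, 1, 5), 1, 7),
    (date(2022, 1, 1), 2, 10),
    (date(2022, 1, 2), 2, 11),
    (date(2022, 1, 6), 2, 12),
]


def fill_missing_dates(rows):
    """Return one (date, id, price) row per id and calendar day,
    carrying the last known price forward over the gaps."""
    by_id = {}
    for day, item_id, price in rows:
        by_id.setdefault(item_id, {})[day] = price

    filled = []
    for item_id in sorted(by_id):
        prices = by_id[item_id]
        day, last_day = min(prices), max(prices)
        last_price = None
        while day <= last_day:
            # Use the recorded price if there is one, otherwise keep the previous one.
            last_price = prices.get(day, last_price)
            filled.append((day, item_id, last_price))
            day += timedelta(days=1)
    return filled


filled = fill_missing_dates(rows)
for day, item_id, price in filled:
    print(day.isoformat(), item_id, price)
```

As in the first answer, the date range is taken per id (from its first to its last recorded date), so id 1 stops at 2022-01-05 while id 2 runs to 2022-01-06.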

Related

Is there anyway to check some interval in sql

For example I have a table like this:
CREATE TABLE sales (
id int NOT NULL PRIMARY KEY,
sku text NOT NULL,
date date NOT NULL,
amount real NOT NULL,
CONSTRAINT date_sku UNIQUE (sku,date)
)
Is there any way to check, for each sku, whether the average sales over every 2 consecutive days is bigger than a given amount, for example 14 sold? I want to find the date ranges, the percentage, and the amount sold in those days.
For example, sku B in my example sold 15 on 2022-01-01 and 20 on 2022-01-02. The average for these 2 days is 17.5, which is bigger than 14, so it will appear in my result, and the change is 17.5 / 14 = 1.25.
For the next 2 days we have 20 on 2022-01-02 and 13 on 2022-01-03. The average is 16.5, which is bigger than 14, so it will also appear in the result.
But for 13 on 2022-01-03 and 12 on 2022-01-04 the average is 12.5. Because 12.5 is not bigger than 14, it will not appear in the result.
my desired output with 14 amount example is:
sku start_date end_date amount_sold change_rate
B 2022-01-01 2022-01-02 17.5 1.25
B 2022-01-02 2022-01-03 16.5 1.17
D 2022-01-01 2022-01-02 28 2
I tried using CASE WHEN, but I know that it won't work for a large date range such as one year:
SELECT *
FROM (
SELECT sku,
AVG(CASE WHEN date BETWEEN '2022-01-01' AND '2022-01-02' THEN amount END) AS first_in,
AVG(CASE WHEN date BETWEEN '2022-01-02' AND '2022-01-03' THEN amount END) AS second_in,
AVG(CASE WHEN date BETWEEN '2022-01-03' AND '2022-01-04' THEN amount END) AS third_in
FROM sales
GROUP BY sku
) AS t
WHERE first_in > 14
OR second_in > 14
OR third_in > 14
As a general rule, use the LEAD (or LAG) to retrieve data from the next or previous record. At least this is what I did before you asked for possibly several days. Other window functions are suitable for your need if you want more than 1 day:
SELECT *, averageamount/14
FROM (
SELECT sku, date,
MAX(date) OVER w AS nextdate,
AVG(amount) OVER w AS averageAmount
FROM sales
WINDOW w AS (PARTITION BY sku ORDER BY date RANGE BETWEEN '0 day' PRECEDING AND '2 days' FOLLOWING )
) s
WHERE averageAmount > 14
The query above selects all the ranges that are up to 3 days long (days D, D+1 and D+2). You may want to remove the ranges that are shorter than 3 days by appending this additional condition:
AND nextdate >= date + interval '2 days'
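To sanity-check the numbers in the question, the same sliding-window idea can be sketched in Python for the 2-day case the asker describes. This is a rough sketch, not the answer's method: the function name `windows_above` is made up for the example, only sku B's amounts are used because they are the only ones spelled out in the question, and skipping windows with a missing day is an assumption.

```python
from datetime import date, timedelta

# Only sku B's daily amounts are given explicitly in the question.
sales = {
    ("B", date(2022, 1, 1)): 15.0,
    ("B", date(2022, 1, 2)): 20.0,
    ("B", date(2022, 1, 3)): 13.0,
    ("B", date(2022, 1, 4)): 12.0,
}


def windows_above(sales, threshold, days=2):
    """Return (sku, start, end, average, average / threshold) for every
    run of `days` consecutive days whose average exceeds `threshold`."""
    results = []
    for sku, start in sorted(sales):
        window = [start + timedelta(days=i) for i in range(days)]
        # Assumption: a window with a missing day is skipped, not averaged over fewer days.
        if any((sku, d) not in sales for d in window):
            continue
        average = sum(sales[(sku, d)] for d in window) / days
        if average > threshold:
            results.append((sku, window[0], window[-1], average, average / threshold))
    return results


results = windows_above(sales, threshold=14)
for sku, start, end, average, rate in results:
    print(sku, start, end, average, rate)
```

For sku B this keeps the windows starting on 2022-01-01 (average 17.5) and 2022-01-02 (average 16.5) and drops the one starting on 2022-01-03 (average 12.5), in line with the walkthrough in the question.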

snowflake sql: sum for each day between two dates

I hope someone can help. Suppose I have this table
id  actual_date  target_date  qty
1   2022-01-01   2022-01-01   2
2   2022-01-02   2022-01-01   1
3   2022-01-03   2022-01-01   3
4   2022-01-03   2022-01-02   1
5   2022-01-03   2022-01-03   2
What I would like to calculate is the qty that has to be processed on each date.
E.g. on the target date 2022-01-01 the quota qty is 6 (2+1+3).
On 2022-01-02 I would also have to add the qtys that haven't been processed on the day before, which means id 2, because its actual date is 2022-01-02 (so after the target date), and id 3. The quota qty for 2022-01-02 is then 1+3+1.
And for 2022-01-03 it is 6 = 2+1+3, because id 3 has an actual date on 2022-01-02 (it was processed neither on 01-01 nor on 01-02) and id 4 wasn't processed on 01-02.
Here's what the desired output would look like:
target_date  qty_qouta
2022-01-01   6
2022-01-02   4
2022-01-03   6
Hopefully this gets you started ... recommend testing heaps more edge cases, the business rules don't quite feel right to me -> as you don't seem to show when actual>target. But hope this helps.
WITH CTE AS( SELECT 1 ID, '2022-01-01'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 2 QTY
UNION ALL SELECT 2 ID, '2022-01-02'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 1 QTY
UNION ALL SELECT 3 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 3 QTY
UNION ALL SELECT 4 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-02'::DATE TARGET_DATE, 1 QTY
UNION ALL SELECT 5 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-03'::DATE TARGET_DATE, 2 QTY
)
,CTE2 AS(SELECT
ACTUAL_DATE D
, SUM(QTY) ACTUAL_QTY
FROM CTE GROUP BY 1)
,CTE3 AS(SELECT
TARGET_DATE D
, SUM(QTY) TARGET_QTY
FROM CTE GROUP BY 1)
SELECT
D DATE
,ACTUAL_QTY
,TARGET_QTY
,TARGET_QTY-ACTUAL_QTY DELTA
,ZEROIFNULL(LAG(DELTA)OVER(PARTITION BY 1 ORDER BY D))GHOST
,GREATEST(TARGET_QTY,DELTA+GHOST,ACTUAL_QTY)VOLIA
FROM
CTE2 FULL OUTER JOIN CTE3 USING(D);

Find top common value

I am trying to get the most common date, grouped by code and item. Is there any way I can achieve this in Snowflake?
My current table looks something like this. I need to extract the date that is present for all items of each code. E.g. for code = 1, I only want date = 2022-03-01 because it's the only date that is common to items a, b and c.
Code  Date        item
1     2022-01-01  a
1     2022-03-01  a
1     2022-01-01  b
1     2022-03-01  b
1     2022-03-01  c
1     2022-05-01  c
2     2022-01-01  a
2     2022-05-01  a
2     2022-01-01  b
2     2022-03-01  b
2     2022-01-01  c
My end result:
Code  Date        item
1     2022-03-01  a
1     2022-03-01  b
1     2022-03-01  c
2     2022-01-01  a
2     2022-01-01  b
2     2022-01-01  c
You may use the count window function to count the occurrences of each date within each code, then use the dense_rank function to get the rows whose date has the maximum count.
with count_dates as
(
select *,
count(*) over (partition by Code, Date) cn
from table_name
)
select Code, Date, item
from
(
select *,
dense_rank() over (partition by Code order by cn desc) rnk
from count_dates
) T
where rnk=1
order by Code
Using dense_rank() over (partition by Code order by cn desc) rnk will return all of the most common dates (dates with the same maximum count value). If you want to get only the latest of those dates, use dense_rank() over (partition by Code order by cn desc, Date desc) rnk.
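The count-then-keep-the-maximum idea can be checked against the sample data with a short Python sketch. The function name `most_common_dates` and the tuple layout are assumptions for the example; like the first `dense_rank` variant, it keeps every date tied for the highest count within a code.

```python
from collections import Counter

# Sample rows from the question: (code, date, item).
rows = [
    (1, "2022-01-01", "a"), (1, "2022-03-01", "a"),
    (1, "2022-01-01", "b"), (1, "2022-03-01", "b"),
    (1, "2022-03-01", "c"), (1, "2022-05-01", "c"),
    (2, "2022-01-01", "a"), (2, "2022-05-01", "a"),
    (2, "2022-01-01", "b"), (2, "2022-03-01", "b"),
    (2, "2022-01-01", "c"),
]


def most_common_dates(rows):
    """Keep only the rows whose date has the highest row count within its code."""
    # Equivalent of count(*) over (partition by Code, Date).
    counts = Counter((code, day) for code, day, _item in rows)

    # Highest count per code, i.e. the rows that dense_rank would number 1.
    best = {}
    for (code, _day), n in counts.items():
        best[code] = max(best.get(code, 0), n)

    return [r for r in rows if counts[(r[0], r[1])] == best[r[0]]]


common = most_common_dates(rows)
for code, day, item in common:
    print(code, day, item)
```

Note that this counts rows rather than distinct items, exactly like the query, so it assumes each (code, date, item) combination appears at most once.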

Generate multiples rows of new column based on one value of another column

I have a table like below:
ID  Date
1   2022-01-01
2   2022-03-21
I want to add a new column based on the date and it should look like this
ID  Date        NewCol
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
Let's say that there is an @EndDate = 2022-05-31 (that's where it should stop).
I'm having a hard time trying to figure out how to do it in SSMS. Would appreciate any insights! Thanks :)
In the following solutions we leverage string_split in combination with replicate to generate new records.
select ID
,Date
,dateadd(month, row_number() over(partition by ID order by (select null)), Date) as NewCol
from (
select *
from t
outer apply string_split(replicate(',',datediff(month, Date, '2022-05-31')-1),',')
) t
ID  Date        NewCol
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
For Azure SQL and SQL Server 2022 we have a cleaner solution based on the ordinal output column.
"The enable_ordinal argument and ordinal output column are currently
supported in Azure SQL Database, Azure SQL Managed Instance, and Azure
Synapse Analytics (serverless SQL pool only). Beginning with SQL
Server 2022 (16.x) Preview, the argument and output column are
available in SQL Server."
select ID
,Date
,dateadd(month, ordinal, Date) as NewCol
from (
select *
from t
outer apply string_split(replicate(',',datediff(month, Date, '2022-05-31')-1),',',1)
) t
with cal (id, dt) as
(
select id, date as dt from t
union all select id, dateadd(month, 1, dt) from cal where month(dt) < month('2022-05-31')
)
select t.id
,t.date
,cal.dt as new_col
from cal join t on t.id = cal.id and t.date != cal.dt
order by id, new_col
id  date        new_col
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
There are many ways to "explode" a row into a set, the simplest in my opinion is a recursive CTE:
DECLARE @endpoint date = '20220531';
DECLARE @prev date = DATEADD(MONTH, -1, @endpoint);
WITH x AS
(
  SELECT ID, date, NewCol = DATEADD(MONTH, 1, date) FROM #d
  UNION ALL
  SELECT ID, date, DATEADD(MONTH, 1, NewCol) FROM x
  WHERE NewCol < @prev
)
SELECT * FROM x
ORDER BY ID, NewCol;
Keep in mind that if you could have > 100 months you'll need to add OPTION (MAXRECURSION) (or just consider using a different solution at scale).
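For comparison, the "explode one row into one row per month" logic can be sketched in Python, which makes the stopping rule easy to inspect. This is an illustrative sketch only: `add_months` and `expand` are hypothetical helper names, and clamping to the last day of a shorter month is an assumption meant to mirror how DATEADD(MONTH, ...) behaves.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Shift `day` by a number of months, clamping to the last day of the target month."""
    index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def expand(rows, end_date):
    """For each (id, start) row, emit (id, start, new_col) for every monthly
    step after `start` that does not go past `end_date`."""
    out = []
    for row_id, start in rows:
        step = 1
        while add_months(start, step) <= end_date:
            out.append((row_id, start, add_months(start, step)))
            step += 1
    return out


rows = [(1, date(2022, 1, 1)), (2, date(2022, 3, 21))]
expanded = expand(rows, end_date=date(2022, 5, 31))
for row_id, start, new_col in expanded:
    print(row_id, start, new_col)
```

With the sample rows this yields four months for ID 1 (2022-02-01 through 2022-05-01) and two for ID 2 (2022-04-21 and 2022-05-21), the same rows as the desired output in the question.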

How to do one-to-one inner join

I've a transaction table of purchased and returned items, and I want to match a return transaction with the transaction where that corresponding item was purchased. (Here I used the same item ID and amount in all records for simplicity)
trans_ID  date        item_ID  amt   type
1         2022-01-09  100      5000  purchase
2         2022-01-07  100      5000  return
3         2022-01-06  100      5000  purchase
4         2022-01-05  100      5000  purchase
5         2022-01-04  100      5000  return
6         2022-01-03  100      5000  return
7         2022-01-03  100      5000  purchase
8         2022-01-02  100      5000  purchase
9         2022-01-01  100      5000  return
Matching conditions are:
1) The return date must be greater than or equal to the purchase date.
2) The return and purchase transactions must relate to the same item ID and the same transaction amount.
3) For each return, there must be only 1 purchase matched to it. (In case there are many related purchases, choose the one with the most recent purchase date. But if the most recent purchase was already used for mapping with another return, choose the second-most recent purchase instead, and so on.)
4) From 3), that means each purchase must be matched with only 1 return as well.
The result should look like this.
trans_ID  date        trans_ID_matched  date_matched
2         2022-01-07  3                 2022-01-06
5         2022-01-04  7                 2022-01-03
6         2022-01-03  8                 2022-01-02
This is what I've tried.
with temp as (
  select a.trans_ID, a.date
    , b.trans_ID as trans_ID_matched
    , b.date as date_matched
    , row_number() over (partition by a.trans_ID, a.date order by b.date desc) as rn
  from
  (
    select *
    from transaction_table
    where type = 'return'
  ) a
  inner join
  (
    select *
    from transaction_table
    where type = 'purchase'
  ) b
  on a.item_ID = b.item_ID and a.amt = b.amt and a.date >= b.date
)
select * from temp where rn = 1
But what I got is
trans_ID  date        trans_ID_matched  date_matched
2         2022-01-07  3                 2022-01-06
5         2022-01-04  7                 2022-01-03
6         2022-01-03  7                 2022-01-03
Here, the trans ID 7 shouldn't be used again in the last row as it has been already matched with trans ID 5 in the row 2. So is there any way to match trans ID 6 with 8 (or any way to tell SQL not to use the already-used one like the purchase 7) ?
I created a fiddle. The result seems OK, but it's up to you to test whether this is OK in all situations. 😉
WITH cte as (
SELECT
t1.trans_ID,
t1.[date],
t1.item_ID,
t1.amt,
t1.[type],
pur.trans_ID trans_ID_matched,
pur.[date] datE_matched,
jojo.c
FROM table1 t1
CROSS APPLY (
SELECT
trans_ID,
item_ID,
[date],
amt
FROM table1 pur
WHERE pur.[type] = 'purchase' and t1.[type]='return'
and pur.item_ID = t1.item_ID
and pur.amt = t1.amt
and pur.[date] <= t1.[date]
) pur
CROSS APPLY (
SELECt count(*) as c FROM table1 WHERE trans_ID> t1.trans_ID and trans_ID<pur.trans_ID
) jojo
where jojo.c <=2
)
select
trans_ID,
[date],
item_ID,
amt,
CASE WHEN min(c)=0 then min(trans_ID_matched) else max(trans_ID_matched) end
from cte
group by
trans_ID,
[date],
item_ID,
amt
order by trans_ID;
The count(*) detects the distance between the selected trans_ID from the return and the purchase.
This might go wrong if there are more than 2 adjacent 'returns'. (I am afraid it will break, so I did not test this 😢.)
But it's a nice problem. Hopefully this will give you some other ideas for finding the correct solution!
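Because each match depends on which purchases earlier matches have already consumed, the rules read most naturally as a greedy loop. The sketch below shows that loop in Python rather than SQL. It is not the answer's method, the function name `match_returns` is made up, and the processing order (latest return first, ties broken by lower trans_ID) is an assumption chosen because it reproduces the expected result in the question.

```python
from datetime import date

# Sample rows from the question: (trans_ID, date, item_ID, amt, type).
transactions = [
    (1, date(2022, 1, 9), 100, 5000, "purchase"),
    (2, date(2022, 1, 7), 100, 5000, "return"),
    (3, date(2022, 1, 6), 100, 5000, "purchase"),
    (4, date(2022, 1, 5), 100, 5000, "purchase"),
    (5, date(2022, 1, 4), 100, 5000, "return"),
    (6, date(2022, 1, 3), 100, 5000, "return"),
    (7, date(2022, 1, 3), 100, 5000, "purchase"),
    (8, date(2022, 1, 2), 100, 5000, "purchase"),
    (9, date(2022, 1, 1), 100, 5000, "return"),
]


def match_returns(transactions):
    """Greedily pair each return with the most recent unused purchase of the
    same item and amount dated on or before the return."""
    purchases = [t for t in transactions if t[4] == "purchase"]
    returns = [t for t in transactions if t[4] == "return"]
    # Assumption: later returns pick first; ties go to the lower trans_ID.
    returns.sort(key=lambda t: (-t[1].toordinal(), t[0]))

    used = set()
    matches = []
    for r_id, r_date, r_item, r_amt, _type in returns:
        candidates = [
            p for p in purchases
            if p[0] not in used and p[2] == r_item and p[3] == r_amt and p[1] <= r_date
        ]
        if not candidates:
            continue  # No eligible purchase left, so the return stays unmatched.
        best = max(candidates, key=lambda p: (p[1], -p[0]))
        used.add(best[0])
        matches.append((r_id, r_date, best[0], best[1]))
    return matches


matches = match_returns(transactions)
for r_id, r_date, p_id, p_date in matches:
    print(r_id, r_date, p_id, p_date)
```

On the sample data this pairs return 2 with purchase 3, return 5 with purchase 7, and return 6 with purchase 8, and leaves return 9 unmatched. A different processing order can produce a different, equally valid one-to-one pairing, so the order is worth pinning down before porting the idea to SQL.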