Valid_from / valid_to from a fully loaded table - SQL

There is a source table that is loaded in full every month. The table looks like the example below.
Source table:

pk  code_paym  code_terms  etl_id
1   2          3           2020-08-01
1   2          3           2020-09-01
1   2          4           2020-10-01
1   2          4           2020-11-01
1   2          4           2020-12-01
1   2          4           2021-01-01
1   2          3           2021-02-01
1   2          3           2021-03-01
1   2          3           2021-04-01
1   2          3           2021-05-01
I would like to create valid_from and valid_to columns from the source table, as in the example below.
Desired Output:

pk  code_paym  code_terms  valid_from  valid_to
1   2          3           2020-08-01  2020-09-01
1   2          4           2020-10-01  2021-01-01
1   2          3           2021-02-01  2021-05-01
As can be seen, attributes can return to the same values over time.
How can I produce this output with SQL?
Thank you very much,
Regards

Using the CONDITIONAL_TRUE_EVENT window function to determine contiguous subgroups:
CREATE OR REPLACE TABLE t( pk INT, code_paym INT, code_terms INT, etl_id DATE)
AS
SELECT 1, 2, 3, '2020-08-01'
UNION ALL SELECT 1, 2, 3, '2020-09-01'
UNION ALL SELECT 1, 2, 4, '2020-10-01'
UNION ALL SELECT 1, 2, 4, '2020-11-01'
UNION ALL SELECT 1, 2, 4, '2020-12-01'
UNION ALL SELECT 1, 2, 4, '2021-01-01'
UNION ALL SELECT 1, 2, 3, '2021-02-01'
UNION ALL SELECT 1, 2, 3, '2021-03-01'
UNION ALL SELECT 1, 2, 3, '2021-04-01'
UNION ALL SELECT 1, 2, 3, '2021-05-01';
Query:
WITH cte AS (
    SELECT t.*,
           CONDITIONAL_TRUE_EVENT(CODE_TERMS != LAG(CODE_TERMS, 1, CODE_TERMS)
                                      OVER (PARTITION BY PK, CODE_PAYM ORDER BY ETL_ID))
               OVER (PARTITION BY PK, CODE_PAYM ORDER BY ETL_ID) AS grp
    FROM t
)
SELECT PK, CODE_PAYM, grp, MIN(ETL_ID) AS valid_from, MAX(ETL_ID) AS valid_to
FROM cte
GROUP BY PK, CODE_PAYM, grp;
Output:

PK  CODE_PAYM  GRP  VALID_FROM  VALID_TO
1   2          0    2020-08-01  2020-09-01
1   2          1    2020-10-01  2021-01-01
1   2          2    2021-02-01  2021-05-01

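CONDITIONAL_TRUE_EVENT is available in Snowflake and Vertica. If your database does not have it, a more portable sketch of the same gaps-and-islands idea uses LAG plus a running SUM of change flags (standard window functions, against the same table t as above):

WITH flagged AS (
    SELECT t.*,
           CASE
               WHEN code_terms = LAG(code_terms) OVER (PARTITION BY pk, code_paym ORDER BY etl_id)
               THEN 0 ELSE 1
           END AS chg                                   -- 1 whenever code_terms differs from the previous month
    FROM t
), grouped AS (
    SELECT f.*,
           SUM(chg) OVER (PARTITION BY pk, code_paym ORDER BY etl_id) AS grp   -- running total = island number
    FROM flagged f
)
SELECT pk, code_paym, code_terms,
       MIN(etl_id) AS valid_from,
       MAX(etl_id) AS valid_to
FROM grouped
GROUP BY pk, code_paym, code_terms, grp
ORDER BY valid_from;

The running SUM turns the change flags into a group number per contiguous run of identical code_terms, so the final GROUP BY collapses each run into one valid_from / valid_to row.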

How to find the row with the highest value cell based on another column from within a group of values?

I have this table:
Site_ID  Volume  RPT_Date    RPT_Hour
1        10      01/01/2021  1
1        7       01/01/2021  2
1        13      01/01/2021  3
1        11      01/16/2021  1
1        3       01/16/2021  2
1        5       01/16/2021  3
2        9       01/01/2021  1
2        24      01/01/2021  2
2        16      01/01/2021  3
2        18      01/16/2021  1
2        7       01/16/2021  2
2        1       01/16/2021  3
I need to select the RPT_Hour with the highest Volume for each set of dates
Needed Output:

Site_ID  Volume  RPT_Date    RPT_Hour
1        13      01/01/2021  1
1        11      01/16/2021  1
2        24      01/01/2021  2
2        18      01/16/2021  1
SELECT site_id, volume, rpt_date, rpt_hour
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY site_id, rpt_date ORDER BY volume DESC) AS rn
      FROM MyTable t) t
WHERE rn = 1;
I cannot figure out how to group the table into like-date groups. If I could do that, I think rn = 1 would return the highest-volume row for each date.
The way I see it, your query is OK (but the rpt_hour values in your desired output are not).
SQL> with test (site_id, volume, rpt_date, rpt_hour) as
2 (select 1, 10, date '2021-01-01', 1 from dual union all
3 select 1, 7, date '2021-01-01', 2 from dual union all
4 select 1, 13, date '2021-01-01', 3 from dual union all
5 select 1, 11, date '2021-01-16', 1 from dual union all
6 select 1, 3, date '2021-01-16', 2 from dual union all
7 select 1, 5, date '2021-01-16', 3 from dual union all
8 --
9 select 2, 9, date '2021-01-01', 1 from dual union all
10 select 2, 24, date '2021-01-01', 3 from dual union all
11 select 2, 16, date '2021-01-01', 3 from dual union all
12 select 2, 18, date '2021-01-16', 1 from dual union all
13 select 2, 7, date '2021-01-16', 2 from dual union all
14 select 2, 1, date '2021-01-16', 3 from dual
15 ),
16 temp as
17 (select t.*,
18 row_number() over (partition by site_id, rpt_date order by volume desc) rn
19 from test t
20 )
21 select site_id, volume, rpt_date, rpt_hour
22 from temp
23 where rn = 1
24 /
SITE_ID VOLUME RPT_DATE RPT_HOUR
---------- ---------- ---------- ----------
1 13 01/01/2021 3
1 11 01/16/2021 1
2 24 01/01/2021 3
2 18 01/16/2021 1
SQL>
One option would be the MAX(..) KEEP (DENSE_RANK ..) OVER (PARTITION BY ..) analytic function, without the need for any subquery, such as:
SELECT DISTINCT
       site_id,
       MAX(volume) KEEP (DENSE_RANK FIRST ORDER BY volume DESC)
           OVER (PARTITION BY site_id, rpt_date) AS volume,
       rpt_date,
       MAX(rpt_hour) KEEP (DENSE_RANK FIRST ORDER BY volume DESC)
           OVER (PARTITION BY site_id, rpt_date) AS rpt_hour
FROM t
GROUP BY site_id, rpt_date, volume, rpt_hour
ORDER BY site_id, rpt_date
Demo
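If you prefer a plain aggregate, the same KEEP (DENSE_RANK FIRST ...) idea can also be written without the analytic OVER clause and without DISTINCT. A sketch against MyTable from the question (Oracle syntax; swap in your actual table name):

SELECT site_id,
       MAX(volume) AS volume,                                                    -- highest volume per site/date
       rpt_date,
       MAX(rpt_hour) KEEP (DENSE_RANK FIRST ORDER BY volume DESC) AS rpt_hour    -- hour taken from that highest-volume row
FROM MyTable
GROUP BY site_id, rpt_date
ORDER BY site_id, rpt_date;

If two rows tie on the maximum volume, MAX(rpt_hour) returns the later hour among the tied rows.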

How to make a query showing purchases of a client on the same day, but only if those were made in different stores (Oracle)?

I want to show cases of clients with at least 2 purchases on the same day. But I only want to count those purchases that were made in different stores.
So far I have:
Select Purchase.PurClientId, Purchase.PurDate, Purchase.PurId
from Purchase
join
(
    Select count(Purchase.PurId),
           Purchase.PurClientId,
           to_date(Purchase.PurDate)
    from Purchase
    group by Purchase.PurClientId,
             to_date(Purchase.PurDate)
    having count(Purchase.PurId) >= 2
) k
on k.PurClientId = Purchase.PurClientId
But I have no clue how to make it count purchases only if they were made in different stores. The column that identifies the store is Purchase.PurShopId.
Thanks for help!
You can use:
SELECT PurId,
       PurDate,
       PurClientId,
       PurShopId
FROM (
    SELECT p.*,
           COUNT(DISTINCT PurShopId) OVER (
               PARTITION BY PurClientId, TRUNC(PurDate)
           ) AS num_stores
    FROM Purchase p
)
WHERE num_stores >= 2;
Or
SELECT *
FROM Purchase p
WHERE EXISTS (
    SELECT 1
    FROM Purchase x
    WHERE p.purclientid = x.purclientid
      AND p.purshopid != x.purshopid
      AND TRUNC(p.purdate) = TRUNC(x.purdate)
);
Which, for the sample data:
CREATE TABLE purchase (
    purid PRIMARY KEY,
    purdate,
    purclientid,
    purshopid
) AS
SELECT 1, DATE '2021-01-01', 1, 1 FROM DUAL UNION ALL
SELECT 2, DATE '2021-01-02', 1, 1 FROM DUAL UNION ALL
SELECT 3, DATE '2021-01-02', 1, 2 FROM DUAL UNION ALL
SELECT 4, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 5, DATE '2021-01-03', 1, 1 FROM DUAL UNION ALL
SELECT 6, DATE '2021-01-04', 1, 2 FROM DUAL;
Both output:
PURID  PURDATE              PURCLIENTID  PURSHOPID
2      2021-01-02 00:00:00  1            1
3      2021-01-02 00:00:00  1            2
db<>fiddle here
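If you only need the client/day combinations rather than the individual purchase rows, a simpler aggregate sketch over the same Purchase table would be:

SELECT PurClientId,
       TRUNC(PurDate) AS PurDay,                   -- purchases grouped per calendar day
       COUNT(DISTINCT PurShopId) AS num_stores     -- different stores visited that day
FROM Purchase
GROUP BY PurClientId, TRUNC(PurDate)
HAVING COUNT(DISTINCT PurShopId) >= 2;             -- keep only days with at least two stores

You could join this back to Purchase on the client and the truncated date to recover the individual rows, which is essentially what the analytic COUNT in the first query does in one pass.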

How can I partition by group that falls within a time range?

I have the following table showing when customers bought a certain product. The data I have is CustomerID, Amount, and Dat. I am trying to create the column ProductsIn30Days, which represents how many products a customer bought in the range from Dat - 30 days up to and including the current day.
For example, ProductsIn30Days for CustomerID 1 on Dat 25.3.2020 is 7, since the customer bought 2 products on 25.3.2020 and 5 more products on 24.3.2020, which falls within 30 days before 25.3.2020.
CustomerID  Amount  Dat        ProductsIn30Days
1           1       23.3.2018  1
1           2       24.3.2020  2
1           3       24.3.2020  5
1           2       25.3.2020  7
1           2       24.5.2020  2
1           1       15.6.2020  3
2           7       24.3.2017  7
2           2       24.3.2020  2
I tried something like this with no success, since the partition only works on a single date rather than on a range like I would need:
select CustomerID, Amount, Dat,
       sum(Amount) over (partition by CustomerID, Dat-30)
from table
Thank you for help.
You can use an analytic SUM function with a range window:
SELECT t.*,
       SUM(Amount) OVER (
           PARTITION BY CustomerID
           ORDER BY Dat
           RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW
       ) AS ProductsIn30Days
FROM table_name t;
Which, for the sample data:
CREATE TABLE table_name (CustomerID, Amount, Dat) AS
SELECT 1, 1, DATE '2018-03-23' FROM DUAL UNION ALL
SELECT 1, 2, DATE '2020-03-24' FROM DUAL UNION ALL
SELECT 1, 3, DATE '2020-03-24' FROM DUAL UNION ALL
SELECT 1, 2, DATE '2020-03-25' FROM DUAL UNION ALL
SELECT 1, 2, DATE '2020-05-24' FROM DUAL UNION ALL
SELECT 1, 1, DATE '2020-06-15' FROM DUAL UNION ALL
SELECT 2, 7, DATE '2017-03-24' FROM DUAL UNION ALL
SELECT 2, 2, DATE '2020-03-24' FROM DUAL;
Outputs:
CUSTOMERID  AMOUNT  DAT                  PRODUCTSIN30DAYS
1           1       2018-03-23 00:00:00  1
1           2       2020-03-24 00:00:00  5
1           3       2020-03-24 00:00:00  5
1           2       2020-03-25 00:00:00  7
1           2       2020-05-24 00:00:00  2
1           1       2020-06-15 00:00:00  3
2           7       2017-03-24 00:00:00  7
2           2       2020-03-24 00:00:00  2
Note: If you have multiple values on the same date then they are tied in the ordering and always aggregated together (i.e. rows 2 & 3). If you want them aggregated separately then you need to order by something else to break the ties, but that would not work with a RANGE window.
db<>fiddle here
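An equivalent, though usually slower, way to express the same 30-day lookback is a correlated scalar subquery. This sketch assumes the same table_name and Oracle-style date arithmetic as above:

SELECT t.*,
       (SELECT SUM(x.Amount)
          FROM table_name x
         WHERE x.CustomerID = t.CustomerID
           AND x.Dat BETWEEN t.Dat - 30 AND t.Dat   -- 30 days back, inclusive of the current day
       ) AS ProductsIn30Days
FROM table_name t;

It has the same tie behaviour as the RANGE window: rows sharing the same date get the same total.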

Next value per group in SQL

I am trying to fix a data quality issue and I have the following table origin:
WITH origin AS (
SELECT 1 AS item_id, 'cake' as item_group, DATE '2020-04-01' AS start_date, DATE '2020-12-07' AS end_date, 1 as group_rank UNION ALL
SELECT 2, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 3, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 4, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 5, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 6, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank UNION ALL
SELECT 7, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank UNION ALL
SELECT 8, 'pie', DATE '2020-12-07',DATE '2020-12-31', 1 as group_rank UNION ALL
SELECT 9, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank UNION ALL
SELECT 10, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank
)
select *
from origin
item_id  item_group  start_date  end_date    group_rank
1        cake        2020-04-01  2020-12-07  1
2        cake        2020-12-07  2020-12-31  2
3        cake        2020-12-07  2020-12-31  2
4        cake        2020-12-07  2020-12-31  2
5        cake        2020-12-07  2020-12-31  2
6        cake        2020-12-31  2021-12-07  3
7        cake        2020-12-31  2021-12-07  3
8        pie         2020-12-07  2020-12-31  1
9        pie         2020-12-31  2021-12-07  2
10       pie         2020-12-31  2021-12-07  2
Every row is a unique item belonging to an item_group: pie or cake. Items within a group are ranked according to start_date. The problem with the table is that when I join it with a calendar table I end up with duplicates, because some items have overlapping dates (one item ends on the same day the next one starts). What I want to achieve is to fix the end_date of the older items by subtracting one day.
For that I need to know whether the items overlap by one day. I thought I'd use the rank to find the next value within the group: check the current rank, find the next higher one, and take the start_date of that higher rank. But I couldn't figure out how to get this right.
So my ideal table is the following:
WITH final_result AS (
SELECT 1 AS item_id, 'cake' as item_group, DATE '2020-04-01' AS start_date, DATE '2020-12-07' AS end_date, 1 as group_rank, DATE '2020-12-07' as next_group_start_date, 1 as end_date_equals_next_group_start_date, DATE '2020-12-06' as new_end_date UNION ALL
SELECT 2, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 3, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 4, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 5, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 6, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank, NULL, 0, DATE '2020-12-07' UNION ALL
SELECT 7, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank, NULL, 0, DATE '2020-12-07' UNION ALL
SELECT 8, 'pie', DATE '2020-12-07',DATE '2020-12-31', 1 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 9, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank, NULL, 0, DATE '2020-12-06' UNION ALL
SELECT 10, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank, NULL, 0, DATE '2020-12-06'
)
select *
from final_result
item_id  item_group  start_date  end_date    group_rank  next_group_start_date  end_date_equals_next_group_start_date  new_end_date
1        cake        2020-04-01  2020-12-07  1           2020-12-07             1                                      2020-12-06
2        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
3        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
4        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
5        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
6        cake        2020-12-31  2021-12-07  3           NULL                   0                                      2020-12-07
7        cake        2020-12-31  2021-12-07  3           NULL                   0                                      2020-12-07
8        pie         2020-12-07  2020-12-31  1           2020-12-31             1                                      2020-12-30
9        pie         2020-12-31  2021-12-07  2           NULL                   0                                      2020-12-06
10       pie         2020-12-31  2021-12-07  2           NULL                   0                                      2020-12-06
By identifying the next_group_start_date I can tell whether there is a one-day overlap. end_date_equals_next_group_start_date shows whether end_date = next_group_start_date, i.e. whether there is an overlap. If so, I can create a new_end_date, which is end_date - 1.
What you want to do is use the LEAD() window function.
SELECT *,
       LEAD(start_date, 1) OVER (PARTITION BY item_group ORDER BY group_rank) AS next_group_start_date
FROM origin
This works but doesn't give the exact result you were expecting. To get the expected result, you need to join the origin table with a derived table that applies the LEAD() window function to the distinct (item_group, group_rank, start_date) rows.
SELECT *,
       end_date - end_date_equals_next_group_start_date AS new_end_date
FROM (
    SELECT origin.*,
           b.next_group_start_date,
           CASE
               WHEN origin.end_date = b.next_group_start_date THEN 1
               ELSE 0
           END AS end_date_equals_next_group_start_date
    FROM origin
    JOIN (
        SELECT item_group,
               group_rank,
               LEAD(start_date, 1) OVER (PARTITION BY item_group ORDER BY group_rank) AS next_group_start_date
        FROM (
            SELECT DISTINCT item_group, group_rank, start_date
            FROM origin
        ) a
    ) b ON origin.item_group = b.item_group AND origin.group_rank = b.group_rank
) c
Here's a dbfiddle of the query
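If your database supports RANGE windows with a numeric offset (Oracle and PostgreSQL 11+ do, for example), the self-join can be avoided by reading the earliest start_date of the next group_rank directly. A sketch against the origin CTE, assuming the ranks are consecutive within each group:

SELECT o.*,
       MIN(start_date) OVER (
           PARTITION BY item_group
           ORDER BY group_rank
           RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING   -- rows whose group_rank is exactly the current rank + 1
       ) AS next_group_start_date
FROM origin o

For the last rank in each group the window is empty and next_group_start_date is NULL, matching the expected column above; a new_end_date for the overlapping rows is then CASE WHEN end_date = next_group_start_date THEN end_date - 1 ELSE end_date END (or the equivalent date arithmetic in your dialect).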

Add a default value in a date range if it does not exist (SQL - BigQuery)

I need to add a missing default value of 0 across a range of dates.
id date item store sells
1 2015-08-01 1 1 2
2 2015-08-03 1 1 4
3 2015-08-04 1 1 1
4 2015-08-01 3 2 2
5 2015-08-02 3 2 2
6 2015-08-04 3 2 5
After applying the insert or completing the values:
id date item store sells
1 2015-08-01 1 1 2
7 2015-08-02 1 1 0
2 2015-08-03 1 1 4
3 2015-08-04 1 1 1
4 2015-08-01 3 2 2
5 2015-08-02 3 2 2
8 2015-08-03 3 2 0
6 2015-08-04 3 2 5
I want to do this in BigQuery on Google Cloud; I could use INSERT or SELECT.
Below is for BigQuery Standard SQL
#standardSQL
WITH start_end AS (
SELECT item, store, MIN(date) start_date, MAX(date) end_date
FROM `project.dataset.table`
GROUP BY item, store
),
dates AS (
SELECT item, store, date
FROM start_end, UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS date
)
SELECT d.date, d.item, d.store, IFNULL(t.sells, 0) sell
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.date = t.date
AND d.item = t.item
AND d.store = t.store
ORDER BY item, d.date
You can test / play with it using the dummy data from your question, as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2015-08-01' date, 1 item, 1 store, 2 sells UNION ALL
SELECT 2, DATE '2015-08-03', 1, 1, 4 UNION ALL
SELECT 3, DATE '2015-08-04', 1, 1, 1 UNION ALL
SELECT 4, DATE '2015-08-01', 3, 2, 2 UNION ALL
SELECT 5, DATE '2015-08-02', 3, 2, 2 UNION ALL
SELECT 6, DATE '2015-08-04', 3, 2, 5
),
start_end AS (
SELECT item, store, MIN(date) start_date, MAX(date) end_date
FROM `project.dataset.table`
GROUP BY item, store
),
dates AS (
SELECT item, store, date
FROM start_end, UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS date
)
SELECT d.date, d.item, d.store, IFNULL(t.sells, 0) sell
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.date = t.date
AND d.item = t.item
AND d.store = t.store
ORDER BY item, d.date
The output is as below
date item store sell
2015-08-01 1 1 2
2015-08-02 1 1 0
2015-08-03 1 1 4
2015-08-04 1 1 1
2015-08-01 3 2 2
2015-08-02 3 2 2
2015-08-03 3 2 0
2015-08-04 3 2 5
Update for the follow-up question in a comment: what about the id, since BigQuery doesn't have autoincrement?
#standardSQL
WITH start_end AS (
SELECT item, store, MIN(DATE) start_date, MAX(DATE) end_date
FROM `project.dataset.table`
GROUP BY item, store
),
dates AS (
SELECT item, store, DATE
FROM start_end, UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS DATE
)
SELECT ROW_NUMBER() OVER(ORDER BY IFNULL(id, 999999999), d.item, d.date) id, d.date, d.item, d.store, IFNULL(t.sells, 0) sell
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.date = t.date
AND d.item = t.item
AND d.store = t.store
ORDER BY d.item, d.date
You can test / play with it using the dummy data below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2015-08-01' DATE, 1 item, 1 store, 2 sells UNION ALL
SELECT 2, DATE '2015-08-03', 1, 1, 4 UNION ALL
SELECT 3, DATE '2015-08-04', 1, 1, 1 UNION ALL
SELECT 4, DATE '2015-08-01', 3, 2, 2 UNION ALL
SELECT 5, DATE '2015-08-02', 3, 2, 2 UNION ALL
SELECT 6, DATE '2015-08-04', 3, 2, 5
),
start_end AS (
SELECT item, store, MIN(DATE) start_date, MAX(DATE) end_date
FROM `project.dataset.table`
GROUP BY item, store
),
dates AS (
SELECT item, store, DATE
FROM start_end, UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS DATE
)
SELECT ROW_NUMBER() OVER(ORDER BY IFNULL(id, 999999999), d.item, d.date) id, d.date, d.item, d.store, IFNULL(t.sells, 0) sell
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.date = t.date
AND d.item = t.item
AND d.store = t.store
ORDER BY d.item, d.date
The output is
id date item store sell
1 2015-08-01 1 1 2
7 2015-08-02 1 1 0
2 2015-08-03 1 1 4
3 2015-08-04 1 1 1
4 2015-08-01 3 2 2
5 2015-08-02 3 2 2
8 2015-08-03 3 2 0
6 2015-08-04 3 2 5
Note: the update above is based solely on the limited example provided in your question and may not cover all cases.
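If you would rather have the generated ids continue from the existing maximum than rely on the IFNULL(id, 999999999) ordering trick, one possible variation (just a sketch, replacing the final SELECT of the query above) numbers only the gap rows and offsets them by the current highest id:

SELECT COALESCE(t.id,
                MAX(t.id) OVER ()                                 -- highest existing id in the table
                + ROW_NUMBER() OVER (PARTITION BY t.id IS NULL    -- gap rows (NULL id) are numbered 1, 2, ... in their own partition
                                     ORDER BY d.item, d.date)) AS id,
       d.date, d.item, d.store, IFNULL(t.sells, 0) AS sell
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.date = t.date
AND d.item = t.item
AND d.store = t.store
ORDER BY d.item, d.date

For the sample data this also yields ids 7 and 8 for the two filled-in rows, and the numbering stays contiguous no matter how large the existing ids are.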