Related
I have the following table where I want to filter only the last 4 weeks - challenge: the date range of the underlying table must be from 2018 - 2021 so that all other columns can be filled. Filtering the date did not work for me, because then I wouldn't get data for the columns of the previous year.
How can I filter the table so that I always get the last 4 weeks from today but also have the data of all other columns?
+---------+---------------+---------------+---------------+---------------+--+
| isoweek | sessions_2021 | sessions_2020 | sessions_2019 | sessions_2018 | |
+---------+---------------+---------------+---------------+---------------+--+
| 44 | 534260 | 156450 | 476604 | 539819 | |
+---------+---------------+---------------+---------------+---------------+--+
| 45 | 514197 | 133285 | 481228 | 491133 | |
+---------+---------------+---------------+---------------+---------------+--+
| 46 | 487541 | 122930 | 448876 | 485281 | |
+---------+---------------+---------------+---------------+---------------+--+
| 47 | 502791 | 169920 | 267869 | 491630 | |
+---------+---------------+---------------+---------------+---------------+--+
| 48 | 430129 | 144058 | null | 459051 | |
+---------+---------------+---------------+---------------+---------------+--+
| 49 | 410885 | 127426 | null | 468970 | |
+---------+---------------+---------------+---------------+---------------+--+
| 50 | 183323 | 147254 | null | 438814 | |
+---------+---------------+---------------+---------------+---------------+--+
| 51 | null | 122491 | null | 455786 | |
+---------+---------------+---------------+---------------+---------------+--+
| 52 | null | 70972 | null | 478501 | |
+---------+---------------+---------------+---------------+---------------+--+
| 53 | null | null | 52712 | null | |
+---------+---------------+---------------+---------------+---------------+--+
If the input is the example table you have above you can try the following:
with sample_data as (
select isoweek from unnest(generate_array(1,52,1)) as isoweek
)
select isoweek
from sample_data
where isoweek between extract(isoweek from date_sub(current_date(),INTERVAL 4 WEEK))
and extract(isoweek from current_date());
I'd expect this to work to get me a list of calendar dates over the past 12 months excluding weekends; but it just gives me the entire list of dates - which I suppose is fine - but want to know why the below is incorrect.
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')
Because you're doing this:
AND to_char(sysdate,'DY') NOT IN ('SAT','SUN')
And today isn't Saturday or Sunday. You need to look at the calculated CalendarDate value; but you can't do that in the same level of subquery. You could try to recalculate it:
AND to_char(ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum,'DY') NOT IN ('SAT','SUN')
but this will return no rows - at least when run at the moment. As it happens, March 1st 2020 was a Sunday, so that is excluded; and because of when and how rownum is generated, that result is excluded, and the next one sees the same value, which is excluded, and so on.
You can use an inline view to avoid both issues:
SELECT CalendarDate
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
CALENDARDATE
02-MAR-20
03-MAR-20
04-MAR-20
05-MAR-20
06-MAR-20
09-MAR-20
10-MAR-20
...
db<>fiddle
I've chucked in a language modifier to stop it behaving differently for users with sessions not set to English.
Querying against all_objects isn't ideal though, it would be better to use a hierarcical query:
SELECT *
FROM (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + level AS CalendarDate
FROM dual
CONNECT BY level <= TRUNC(SYSDATE) - ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) + 1
)
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
db<>fiddle
or a recursive CTE, if you're 11gR2+:
WITH rcte (CalendarDate) AS (
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12)
FROM dual
UNION ALL
SELECT rcte.CalendarDate + interval '1' day
FROM rcte
WHERE rcte.CalendarDate < TRUNC(SYSDATE)
)
SELECT CalendarDate
FROM rcte
WHERE to_char(CalendarDate,'DY','NLS_DATE_LANGUAGE=ENGLISH') NOT IN ('SAT','SUN')
ORDER BY CalendarDate
db<>fiddle (as 18c to avoid a couple of issues with the patch level in the 11g version it uses).
You checking whether today is sunday or monday with to_char(sysdate,'DY'). you need to check CalendarDate which is not available in your window. You can use cte to calculate the calendar then you can remove weekends with your condition as below.
with cte (CalendarDate) as
(
SELECT ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum AS CalendarDate
FROM all_objects
WHERE ADD_MONTHS(TRUNC(SYSDATE,'MM'),-12) - 1 + rownum <= sysdate
)
select * from cte where
to_char(CalendarDate,'DY') not in ('SAT','SUN');
| CALENDARDATE |
| :----------- |
| 02-MAR-20 |
| 03-MAR-20 |
| 04-MAR-20 |
| 05-MAR-20 |
| 06-MAR-20 |
| 09-MAR-20 |
| 10-MAR-20 |
| 11-MAR-20 |
| 12-MAR-20 |
| 13-MAR-20 |
| 16-MAR-20 |
| 17-MAR-20 |
| 18-MAR-20 |
| 19-MAR-20 |
| 20-MAR-20 |
| 23-MAR-20 |
| 24-MAR-20 |
| 25-MAR-20 |
| 26-MAR-20 |
| 27-MAR-20 |
| 30-MAR-20 |
| 31-MAR-20 |
| 01-APR-20 |
| 02-APR-20 |
| 03-APR-20 |
| 06-APR-20 |
| 07-APR-20 |
| 08-APR-20 |
| 09-APR-20 |
| 10-APR-20 |
| 13-APR-20 |
| 14-APR-20 |
| 15-APR-20 |
| 16-APR-20 |
| 17-APR-20 |
| 20-APR-20 |
| 21-APR-20 |
| 22-APR-20 |
| 23-APR-20 |
| 24-APR-20 |
| 27-APR-20 |
| 28-APR-20 |
| 29-APR-20 |
| 30-APR-20 |
| 01-MAY-20 |
| 04-MAY-20 |
| 05-MAY-20 |
| 06-MAY-20 |
| 07-MAY-20 |
| 08-MAY-20 |
| 11-MAY-20 |
| 12-MAY-20 |
| 13-MAY-20 |
| 14-MAY-20 |
| 15-MAY-20 |
| 18-MAY-20 |
| 19-MAY-20 |
| 20-MAY-20 |
| 21-MAY-20 |
| 22-MAY-20 |
| 25-MAY-20 |
| 26-MAY-20 |
| 27-MAY-20 |
| 28-MAY-20 |
| 29-MAY-20 |
| 01-JUN-20 |
| 02-JUN-20 |
| 03-JUN-20 |
| 04-JUN-20 |
| 05-JUN-20 |
| 08-JUN-20 |
| 09-JUN-20 |
| 10-JUN-20 |
| 11-JUN-20 |
| 12-JUN-20 |
| 15-JUN-20 |
| 16-JUN-20 |
| 17-JUN-20 |
| 18-JUN-20 |
| 19-JUN-20 |
| 22-JUN-20 |
| 23-JUN-20 |
| 24-JUN-20 |
| 25-JUN-20 |
| 26-JUN-20 |
| 29-JUN-20 |
| 30-JUN-20 |
| 01-JUL-20 |
| 02-JUL-20 |
| 03-JUL-20 |
| 06-JUL-20 |
| 07-JUL-20 |
| 08-JUL-20 |
| 09-JUL-20 |
| 10-JUL-20 |
| 13-JUL-20 |
| 14-JUL-20 |
| 15-JUL-20 |
| 16-JUL-20 |
| 17-JUL-20 |
| 20-JUL-20 |
| 21-JUL-20 |
| 22-JUL-20 |
| 23-JUL-20 |
| 24-JUL-20 |
| 27-JUL-20 |
| 28-JUL-20 |
| 29-JUL-20 |
| 30-JUL-20 |
| 31-JUL-20 |
| 03-AUG-20 |
| 04-AUG-20 |
| 05-AUG-20 |
| 06-AUG-20 |
| 07-AUG-20 |
| 10-AUG-20 |
| 11-AUG-20 |
| 12-AUG-20 |
| 13-AUG-20 |
| 14-AUG-20 |
| 17-AUG-20 |
| 18-AUG-20 |
| 19-AUG-20 |
| 20-AUG-20 |
| 21-AUG-20 |
| 24-AUG-20 |
| 25-AUG-20 |
| 26-AUG-20 |
| 27-AUG-20 |
| 28-AUG-20 |
| 31-AUG-20 |
| 01-SEP-20 |
| 02-SEP-20 |
| 03-SEP-20 |
| 04-SEP-20 |
| 07-SEP-20 |
| 08-SEP-20 |
| 09-SEP-20 |
| 10-SEP-20 |
| 11-SEP-20 |
| 14-SEP-20 |
| 15-SEP-20 |
| 16-SEP-20 |
| 17-SEP-20 |
| 18-SEP-20 |
| 21-SEP-20 |
| 22-SEP-20 |
| 23-SEP-20 |
| 24-SEP-20 |
| 25-SEP-20 |
| 28-SEP-20 |
| 29-SEP-20 |
| 30-SEP-20 |
| 01-OCT-20 |
| 02-OCT-20 |
| 05-OCT-20 |
| 06-OCT-20 |
| 07-OCT-20 |
| 08-OCT-20 |
| 09-OCT-20 |
| 12-OCT-20 |
| 13-OCT-20 |
| 14-OCT-20 |
| 15-OCT-20 |
| 16-OCT-20 |
| 19-OCT-20 |
| 20-OCT-20 |
| 21-OCT-20 |
| 22-OCT-20 |
| 23-OCT-20 |
| 26-OCT-20 |
| 27-OCT-20 |
| 28-OCT-20 |
| 29-OCT-20 |
| 30-OCT-20 |
| 02-NOV-20 |
| 03-NOV-20 |
| 04-NOV-20 |
| 05-NOV-20 |
| 06-NOV-20 |
| 09-NOV-20 |
| 10-NOV-20 |
| 11-NOV-20 |
| 12-NOV-20 |
| 13-NOV-20 |
| 16-NOV-20 |
| 17-NOV-20 |
| 18-NOV-20 |
| 19-NOV-20 |
| 20-NOV-20 |
| 23-NOV-20 |
| 24-NOV-20 |
| 25-NOV-20 |
| 26-NOV-20 |
| 27-NOV-20 |
| 30-NOV-20 |
| 01-DEC-20 |
| 02-DEC-20 |
| 03-DEC-20 |
| 04-DEC-20 |
| 07-DEC-20 |
| 08-DEC-20 |
| 09-DEC-20 |
| 10-DEC-20 |
| 11-DEC-20 |
| 14-DEC-20 |
| 15-DEC-20 |
| 16-DEC-20 |
| 17-DEC-20 |
| 18-DEC-20 |
| 21-DEC-20 |
| 22-DEC-20 |
| 23-DEC-20 |
| 24-DEC-20 |
| 25-DEC-20 |
| 28-DEC-20 |
| 29-DEC-20 |
| 30-DEC-20 |
| 31-DEC-20 |
| 01-JAN-21 |
| 04-JAN-21 |
| 05-JAN-21 |
| 06-JAN-21 |
| 07-JAN-21 |
| 08-JAN-21 |
| 11-JAN-21 |
| 12-JAN-21 |
| 13-JAN-21 |
| 14-JAN-21 |
| 15-JAN-21 |
| 18-JAN-21 |
| 19-JAN-21 |
| 20-JAN-21 |
| 21-JAN-21 |
| 22-JAN-21 |
| 25-JAN-21 |
| 26-JAN-21 |
| 27-JAN-21 |
| 28-JAN-21 |
| 29-JAN-21 |
| 01-FEB-21 |
| 02-FEB-21 |
| 03-FEB-21 |
| 04-FEB-21 |
| 05-FEB-21 |
| 08-FEB-21 |
| 09-FEB-21 |
| 10-FEB-21 |
| 11-FEB-21 |
| 12-FEB-21 |
| 15-FEB-21 |
| 16-FEB-21 |
| 17-FEB-21 |
| 18-FEB-21 |
| 19-FEB-21 |
| 22-FEB-21 |
| 23-FEB-21 |
| 24-FEB-21 |
| 25-FEB-21 |
| 26-FEB-21 |
| 01-MAR-21 |
| 02-MAR-21 |
| 03-MAR-21 |
| 04-MAR-21 |
| 05-MAR-21 |
| 08-MAR-21 |
| 09-MAR-21 |
db<>fiddle here
I have a table (t_stocks) with data like this:
exchanged,stock_symbol,closing_date,closing_price
NSE,TCS,2009-08-09,2200.1
NSE,TCS,2009-08-10,2300.1
NSE,TCS,2009-08-11,12200.1
NSE,TCS,2009-08-12,22300.1
NSE,TCS,2009-09-09,2200.1
NSE,TCS,2009-09-10,2300.1
NSE,TCS,2009-09-11,12200.1
NSE,TCS,2009-09-12,22300.1
NSE,INFY,2009-08-09,2500.34
NSE,INFY,2009-08-10,1500.34
NSE,INFY,2009-08-09,7500.34
NSE,INFY,2009-08-10,14500.34
NSE,INFY,2009-09-09,2500.34
NSE,INFY,2009-09-10,1500.34
NSE,INFY,2009-09-09,7500.34
NSE,INFY,2009-09-10,14500.34
NSE,TCS,2010-08-09,2200.1
NSE,TCS,2010-08-10,2300.1
NSE,TCS,2010-08-11,12200.1
NSE,TCS,2010-08-12,22300.1
NSE,TCS,2010-09-09,2200.1
NSE,TCS,2010-09-10,2300.1
NSE,TCS,2010-09-11,12200.1
NSE,TCS,2010-09-12,22300.1
NSE,INFY,2010-08-09,2500.34
NSE,INFY,2010-08-10,1500.34
NSE,INFY,2010-08-09,7500.34
NSE,INFY,2010-08-10,14500.34
NSE,INFY,2010-09-09,2500.34
NSE,INFY,2010-09-10,1500.34
NSE,INFY,2010-09-09,7500.34
NSE,INFY,2010-09-10,14500.34
...
...
I would need to write a query which generate a report as follow.
exchanged, stock_symbol , closing_date , closing_price , yesterday_closing , diff_yesterday_price
(Price difference between Yesterday prices and today prices) with an output like this:
+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| exchanged | stock_symbol | closing_date | closing_price | yesterday_closing | diff_yesterday_price |
+----------------+-------------------+-------------------+--------------------+------------------------+-----------------------+--+
| NSE | INFY | 2009-08-09 | 2500.34 | NULL | NULL |
| NSE | INFY | 2009-08-09 | 7500.34 | 2500.34 | -5000 |
| NSE | INFY | 2009-08-10 | 14500.34 | 7500.34 | -7000 |
| NSE | INFY | 2009-08-10 | 1500.34 | 14500.34 | 13000 |
| NSE | INFY | 2009-09-09 | 7500.34 | 1500.34 | -6000 |
| NSE | INFY | 2009-09-09 | 2500.34 | 7500.34 | 5000 |
| NSE | INFY | 2009-09-10 | 14500.34 | 1500.34 | -13000 |
| NSE | INFY | 2009-09-10 | 1500.34 | 2500.34 | 1000 |
| NSE | INFY | 2010-08-09 | 7500.34 | 14500.34 | 7000 |
| NSE | INFY | 2010-08-09 | 2500.34 | 7500.34 | 5000 |
.....
.....
May anyone give me some clues to do this in an efficient way.
Thanks in advance,
Regards.
You can use hive window function lag() to solve this. You can read more about window functions in hive here.
Here is the working DEMO in PostgreSQL, but same query works in HIVE as well.
select
exchanged,
stock_symbol,
closing_date,
closing_price,
yesterday_price,
(yesterday_price - closing_price) as diff_yesterday_price
from
(
select
*,
lag(closing_price) over (partition by stock_symbol order by closing_date) as yesterday_price
from stockExchange
) la
order by
stock_symbol,
closing_date
I have list of database that needed to be grouped. I've successfully done this by using R, yet now I have to do this by using BigQuery. The data is shown as per following table
| category | sub_category | date | day | timestamp | type | cpc | gmv |
|---------- |-------------- |----------- |----- |------------- |------ |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:36 PM | BI | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:39 PM | RT | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:38:29 PM | RT | 1.58 | 205,041 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:14 AM | BI | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:18 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:52 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:06:33 AM | BI | 1.55 | 201,354 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:55:47 PM | PP | 1 | 129,282 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:56:23 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:19 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:34 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:58:46 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:27 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:51 PM | RT | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:00:57 AM | BI | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:01:11 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:03:01 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:12:42 AM | RT | 1.19 | 154,886 |
I wanted to group the rows. A row that has <= 8 minutes timestamp-difference with the next row will be grouped as one row with below output example:
| category | sub_category | date | day | time | start_timestamp | end_timestamp | type | cpc | gmv |
|---------- |-------------- |----------------------- |--------- |---------- |--------------------- |--------------------- |---------- |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 23:37:36 | (02/17/20 23:37:36) | (02/17/20 23:38:29) | BI|RT | 1.82 | 236,542 |
| ABC | ABC-1 | 2/18/2020 | Tue | 0:05:14 | (02/18/20 00:05:14) | (02/18/20 00:06:33) | BI|RT | 1.59 | 206,636 |
| XYZ | XYZ-1 | 02/17/2020|02/18/2020 | Mon|Tue | 0:06:21 | (02/17/20 23:55:47) | (02/18/20 00:12:42) | PP|RT|BI | 0.95 | 123,815 |
There were some new-generated fields as per below:
| fields | definition |
|----------------- |-------------------------------------------------------- |
| day | Day of the row (combination if there's different days) |
| time | Start of timestamp |
| start_timestamp | Start timestamp of the first row in group |
| end_timestamp | Start timestamp of the last row in group |
| type | Type of Row (combination if there's different types) |
| cpc | Average CPC of the Group |
| gwm | Average GMV of the Group |
Could anyone help me to make the query as per above requirements?
Thank you
This is a gaps and island problem. Here is a solution that uses lag() and a cumulative sum() to define groups of adjacent records with less than 8 minutes gap; the rest is aggregation.
select
category,
sub_category,
string_agg(distinct day, '|' order by dt) day,
min(dt) start_dt,
max(dt) end_dt,
string_agg(distinct type, '|' order by dt) type,
avg(cpc) cpc,
avg(gwm) gwm
from (
select
t.*,
sum(case when dt <= datetime_add(lag_dt, interval 8 minute) then 0 else 1 end)
over(partition by category, sub_category order by dt) grp
from (
select
t.*,
lag(dt) over(partition by category, sub_category order by dt) lag_dt
from (
select t.*, datetime(date, timestamp) dt
from mytable t
) t
) t
) t
) t
group by category, sub_category, grp
Note that you should not be storing the date and time parts of your timestamps in separated columns: this makes the logic more complicated when you need to combine them (I added another level of nesting to avoid repeated conversions, which would have obfuscated the code).
I have a question about retention rate.
I have 2 tables, including the customer data and the order data.
DISTRIBUTOR as d
+---------+-----------+--------------+--------------------+
| ID | SETUP_DT | REINSTATE_DT | LOCAL_REINSTATE_DT |
+---------+-----------+--------------+--------------------+
| C111111 | 2018/1/1 | Null | Null |
| C111112 | 2015/12/9 | 2018/10/25 | 2018/10/25 |
| C111113 | 2018/10/1 | Null | Null |
| C111114 | 2018/10/6 | 2018/12/14 | 2018/12/14 |
+---------+-----------+--------------+--------------------+
ORDER as o, please noted that the data is for reference...
+---------+----------+-----+
| ID | ORD_DT | OAL |
+---------+----------+-----+
| C111111 | 2018/1/1 | 112 |
| C111111 | 2018/1/1 | 100 |
| C111111 | 2018/1/1 | 472 |
| C111111 | 2018/1/1 | 452 |
| C111111 | 2018/1/1 | 248 |
| C111111 | 2018/1/1 | 996 |
+---------+----------+-----+
The 3rd Table in my mind to create the retention rate report
+---------+-----------+-----------+---------------+-----------+
| ID | APP_MON | ORDER_MON | TimeDiff(Mon) | TTL AMT |
+---------+-----------+-----------+---------------+-----------+
| C111111 | 2018/1/1 | 2018/1/1 | - | 25,443 |
| C111111 | 2018/1/1 | 2018/2/1 | 1 | 7,610 |
| C111111 | 2018/1/1 | 2018/3/1 | 2 | 20,180 |
| C111111 | 2018/1/1 | 2018/4/1 | 3 | 22,265 |
| C111111 | 2018/1/1 | 2018/5/1 | 4 | 34,118 |
| C111111 | 2018/1/1 | 2018/6/1 | 5 | 19,523 |
| C111111 | 2018/1/1 | 2018/7/1 | 6 | 20,220 |
| C111111 | 2018/1/1 | 2018/8/1 | 7 | 2,006 |
| C111111 | 2018/1/1 | 2018/9/1 | 8 | 15,813 |
| C111111 | 2018/1/1 | 2018/10/1 | 9 | 16,733 |
| C111111 | 2018/1/1 | 2018/11/1 | 10 | 20,973 |
| C111112 | 2018/10/1 | 2017/11/1 | - | 516 |
| C111112 | 2018/10/1 | 2018/10/1 | - | 1 |
| C111113 | 2018/10/1 | Null | - | Null |
| C111114 | 2018/12/1 | Null | - | Null |
+---------+-----------+-----------+---------------+-----------+
Definition:
- APP_MON: the month that the customer joined, which is the max date from the start date of [d.SETUP_DT], [d.REINSTATE_DT] and [d.LOCAL_REINSTATE_DT]
- ORD_MON: the month that the customer purchased, which is the start date of the order date month
- TimeDiff: The duration by month between APP_MON and ORD_MON, e.g. if A's ODR_MON is 2018/1/1 and A'S APP_MON is 2018/2/1, the duration is 1.
- TTL_AMT: the total order amount that the customer bought in the related order date month
I tried to get the data from 3rd table.
But I run the code below and it's very slow... I need a more effective way since I have millions of data...
Thanks.
I don't think you need to use unpivot. To get the latest date you can just use the greatest() function.
This solution has two subqueries, one to calculate the app_mon for each new customer and the other to calculate the earliest order date for all customers who placed an order in the last two years. This may not be the most performative approach but your first priority should be to get the correct outcome; once you have that you can tune it if necessary:
with cust as
(
select d.dist_id as id
, greatest(d.setup_dt, d.reinstate_dt, d.local_reinstate_dt) as app_mo
from mjensen_dev.gc_distributor d
where d.setup_dt >= date '2017-01-01'
or d.reinstate_dt >= date '2017-01-01'
or d.local_reinstate_dt >= date '2017-01-01'
) , ord as
(
select o.dist_id as id
, min(o.ord_dt) as ord_mon
, sum(o.oal) as ord_amt
from gc_orders o
where o.ord_dt >= date '2017-01-01'
group by o.dist_id
, trunc(o.ord_dt,'mm')
)
select cust.dist_id as id
, cust.app_mon
, ord.ord_mon
, floor(months_between(ord.ord_mon, cust.app_mon ) as mon_diff
, sum(o.oal) as ord_amt
from cust
inner join gc_orders o on cust.id = o.dist_id
order by 1, 2
/
You may wish to tweak at my calculation of mon_diff. This calculation treats 2018/2/1 - 2018/1/1 as one month difference. Because it seems odd to me that a customer who places an order on the day they joined would have a mon_diff of 1 rather than zero. But if your statement of the business rule is correct you would need to add 1 to the calculation. Likewise I have not included the trunc() in the processing of the dates but you may wish to reinstate it.