Get data from multiple tables by two timestamps - SQL

PostgreSQL 10.12
I have a table with calculated data grouped by date with hour, e.g.:
hourly_stats
clicks_count | visitors_count | product_id | promoter_id | bundle_id | date_time
------------------------------------------------------------------------------------------
15 | 6 | 123 | 456 | 789 | 2018-11-02 12:00:00
8 | 3 | 123 | 456 | 789 | 2018-11-02 16:00:00
2 | 1 | 123 | 456 | 789 | 2018-11-13 10:00:00
5 | 2 | 123 | 456 | 789 | 2018-11-13 21:00:00
Every new hour I collect statistics for the previous hour and insert it into the table.
In addition, to always display fresh data, I use a materialized view, which stores the calculated data from the beginning of the current hour to the current moment (refreshed every 5 minutes).
The core part of the query is always based on two timestamp values and looks like this:
SELECT *
FROM (
    SELECT
        clicks_count,
        visitors_count,
        product_id,
        promoter_id,
        bundle_id,
        date_time
    FROM hourly_stats
    UNION ALL
    SELECT
        clicks_count,
        visitors_count,
        product_id,
        promoter_id,
        bundle_id,
        date_time
    FROM materialized_stats
) AS stats
WHERE date_time > start_date AND date_time <= end_date
This core part is used in multiple really complex queries, which are too slow. For example, one of them takes more than 1.5 minutes to complete (even when no row is filtered out by start_date and end_date) once the table has more than 20 million records.
I decided to add two more tables with calculated data grouped by year-month-day:
daily_stats
clicks_count | visitors_count | product_id | promoter_id | bundle_id | date_time
------------------------------------------------------------------------------------------
23 | 9 | 123 | 456 | 789 | 2018-11-02
7 | 3 | 123 | 456 | 789 | 2018-11-13
and by year-month:
monthly_stats
clicks_count | visitors_count | product_id | promoter_id | bundle_id | date_time
------------------------------------------------------------------------------------------
30 | 12 | 123 | 456 | 789 | 2018-11
So, if I have start_date = '2019-01-01 00:00:00' and end_date = '2020-08-12 16:00:00', I will be able to collect the data like this:
(SELECT
    clicks_count,
    visitors_count,
    product_id,
    promoter_id,
    bundle_id,
    date_time
FROM monthly_stats
WHERE 'monthly_condition')
UNION ALL
(SELECT
    clicks_count,
    visitors_count,
    product_id,
    promoter_id,
    bundle_id,
    date_time
FROM daily_stats
WHERE 'daily_condition')
UNION ALL
(SELECT
    clicks_count,
    visitors_count,
    product_id,
    promoter_id,
    bundle_id,
    date_time
FROM hourly_stats
WHERE 'hourly_condition')
UNION ALL
(SELECT
    clicks_count,
    visitors_count,
    product_id,
    promoter_id,
    bundle_id,
    date_time
FROM materialized_stats)
Each calculated row is added to the corresponding table only after the base time period (month, day, or hour) is over. So for a specific set of product_id | promoter_id | bundle_id I should get:
19 rows from monthly_stats +
11 rows from daily_stats +
16 rows from hourly_stats +
1 row from materialized_stats
Already implemented restrictions (at the application layer):
the maximum end_date value is the end of the current day
start_date is always less than end_date
start_date and end_date are specified with hour precision
Question: how do I implement the 'monthly_condition', 'daily_condition' and 'hourly_condition' above? They should be based on parts of start_date and end_date, but I don't quite understand how to do this.
Thanks for any help.

This is an interesting problem. I had to solve this once before for SQL Server. PostgreSQL makes it much easier. Everything down to the fullness cte has been tested. The allstats cte is a best guess since I do not have your tables or data.
with invars as (
    select '2016-08-15 12:35:00'::timestamptz as start_date,
           '2020-08-12 19:00:00'::timestamptz as end_date
), days as (
    select c.dhour,
           tstzrange(
               date_trunc('hour', i.start_date),
               date_trunc('hour', i.end_date), '[)') as qrange
    from invars i
    cross join lateral generate_series(
        date_trunc('hour', i.start_date),
        date_trunc('hour', i.end_date),
        interval '1 hour'
    ) as c(dhour)
), calendar as (
    select dhour,
           date_trunc('day', dhour) as dday,
           date_trunc('month', dhour) as dmonth,
           qrange
    from days
), fullness as (
    select dhour, dday, dmonth, qrange,
           qrange @> tstzrange(dday, dday + interval '1 day', '[)') as full_day,
           qrange @> tstzrange(dmonth, dmonth + interval '1 month', '[)') as full_month
    from calendar
), allstats as (
    select clicks_count, visitors_count, product_id, promoter_id, bundle_id
    from monthly_stats
    where date_time in (select distinct to_char(dmonth, 'YYYY-MM')
                        from fullness where full_month)
    union all
    select clicks_count, visitors_count, product_id, promoter_id, bundle_id
    from daily_stats
    where date_time in (select distinct to_char(dday, 'YYYY-MM-DD')
                        from fullness where full_day and not full_month)
    union all
    select clicks_count, visitors_count, product_id, promoter_id, bundle_id
    from hourly_stats
    where date_time in (select dhour from fullness
                        where not full_day and not full_month
                          and dhour < date_trunc('hour', now()))
    union all
    select clicks_count, visitors_count, product_id, promoter_id, bundle_id
    from materialized_stats
)
select * from allstats;
I think your problem description leaves out the fact that start_date can begin in the middle of a month, or even in the middle of a day. This query covers that.
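The fullness logic above can also be checked procedurally. The following is a minimal sketch in Python (with a hypothetical classify_hours helper, not part of the original answer), assuming half-open hour-precision ranges as in the query: it walks every hour of the range and buckets it into full months, full days within partial months, and leftover hours, which is exactly what the monthly/daily/hourly conditions select.

```python
from datetime import datetime, timedelta

def add_month(d):
    # First day of the month after d.
    return datetime(d.year + d.month // 12, d.month % 12 + 1, 1)

def classify_hours(start, end):
    """Bucket each hour in [start, end) by whether its containing
    month or day lies fully inside the range (cf. full_month/full_day)."""
    months, days, hours = [], [], []
    cur = start
    while cur < end:
        m0 = datetime(cur.year, cur.month, 1)          # month start
        d0 = datetime(cur.year, cur.month, cur.day)    # day start
        if start <= m0 and add_month(m0) <= end:       # whole month inside
            if m0 not in months:
                months.append(m0)
        elif start <= d0 and d0 + timedelta(days=1) <= end:  # whole day inside
            if d0 not in days:
                days.append(d0)
        else:                                          # partial day -> hourly
            hours.append(cur)
        cur += timedelta(hours=1)
    return months, days, hours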

Related

Stack several rows into one with date condition

I've got raw data from a table with information about clients. The information comes from different sources, which causes duplicates, but with different dates:
id  | pp | type | start_dt | end_dt
----|----|------|----------|---------
100 | 1  | Y    | 01.05.19 | 01.10.20
100 | 1  | Y    | 10.08.20 | 01.10.20
100 | 1  | N    | 01.10.20 | 02.12.21
100 | 1  | N    | 13.12.20 | 02.12.21
100 | 1  | Y    | 02.12.21 | 02.12.26
100 | 1  | Y    | 20.12.21 | 20.12.26
For example, in this table rows 2, 4 and 6 have a start date within the "start_dt" and "end_dt" of the previous row. They are duplicates, but I need to combine the min start date and max end date from both rows for each type.
FYI: the first two rows and the last two rows have the same id, pp and type, but I need to stack them separately because of the timeline.
What I want to get (continuous timeline for a client is a key):
id  | pp | type | start_dt | end_dt   | cnt
----|----|------|----------|----------|----
100 | 1  | Y    | 01.05.19 | 01.10.20 | 2
100 | 1  | N    | 01.10.20 | 02.12.21 | 2
100 | 1  | Y    | 02.12.21 | 20.12.26 | 2
I'm using PL/SQL. I think it could be solved with window functions, but I can't figure out which ones to use.
I tried to solve it with GROUP BY ... HAVING count(*) > 1, but in that case it stacks all four rows of the same type (rows 1, 2 and 5, 6) into one. I need two separate rows for each type while preserving a continuous timeline of dates for one client.
From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id, pp
ORDER BY start_dt
MEASURES
FIRST(type) AS type,
FIRST(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
PATTERN (overlapping* last_row)
DEFINE
overlapping AS type = NEXT(type)
AND MAX(end_dt) >= NEXT(start_dt)
)
Which, for the sample data:
CREATE TABLE table_name (id, pp, type, start_dt, end_dt) AS
SELECT 100, 1, 'Y', DATE '2019-05-01', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2020-08-10', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-10-01', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-12-13', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-02', DATE '2026-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-20', DATE '2026-12-20' FROM DUAL;
Outputs:
ID  | PP | TYPE | START_DT            | END_DT              | CNT
----|----|------|---------------------|---------------------|----
100 | 1  | Y    | 2019-05-01 00:00:00 | 2020-10-01 00:00:00 | 2
100 | 1  | N    | 2020-10-01 00:00:00 | 2021-12-02 00:00:00 | 2
100 | 1  | Y    | 2021-12-02 00:00:00 | 2026-12-20 00:00:00 | 2
fiddle
If you want to use analytic and aggregation functions then it is a bit more complicated:
SELECT id, pp, type,
MIN(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
FROM (
SELECT id, pp, type, start_dt, end_dt,
SUM(grp_change) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
) AS grp
FROM (
SELECT t.*,
CASE
WHEN start_dt <= MAX(end_dt) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS grp_change
FROM table_name t
)
)
GROUP BY id, pp, type, grp
ORDER BY id, pp, start_dt
fiddle
I prefer this version, because comparing type = next(type) when "type" is not part of the "order by" may lead to errors:
match_recognize(
partition by id, pp, type
order by start_dt,end_dt
measures first(start_dt) as start_dt, max(end_dt) as end_dt, count(*) as n
pattern (merged* strt)
define
merged as max(end_dt) >= next(start_dt)
)
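The merge that both MATCH_RECOGNIZE variants perform can be cross-checked outside the database. Below is a minimal sketch in Python (the stack_rows helper is hypothetical, not from either answer), assuming rows are processed per (id, pp) in start-date order: a row is folded into the previous group when the type matches and its start falls inside the running interval, accumulating min start, max end and a count.

```python
from datetime import date

def stack_rows(rows):
    """rows: (id, pp, type, start_dt, end_dt) tuples.
    Merge a row into the previous group when the type matches and
    its start date falls inside the running interval."""
    out = []
    for rid, pp, typ, s, e in sorted(rows, key=lambda r: (r[0], r[1], r[3])):
        if out:
            lid, lpp, ltyp, ls, le, cnt = out[-1]
            if (rid, pp, typ) == (lid, lpp, ltyp) and s <= le:
                # Overlap with the previous same-type group: extend it.
                out[-1] = (lid, lpp, ltyp, ls, max(le, e), cnt + 1)
                continue
        out.append((rid, pp, typ, s, e, 1))  # start a new group
    return out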

Create time intervals based on values in one column / SQL Oracle

I need to create a query that will return time intervals from a table that has attributes for (almost) every day.
The original table looks like the following:
Person | Date | Date_Type
-------|------------|----------
Sam | 01.06.2020 | Vacation
Sam | 02.06.2020 | Vacation
Sam | 03.06.2020 | Work
Sam | 04.06.2020 | Work
Sam | 05.06.2020 | Work
Frodo | 01.06.2020 | Work
Frodo | 02.06.2020 | Work
.....
And the desired should look like:
Person | Date_Interval | Date_Type
-------|-----------------------|----------
Sam | 01.06.2020-02.06.2020 | Vacation
Sam | 03.06.2020-05.06.2020 | Work
Frodo | 01.06.2020-02.06.2020 | Work
.....
Will be grateful for any idea :)
This reads like a gaps-and-island problem. Here is one approach:
select person, min(date) startdate, max(date) enddate, date_type
from (
select t.*,
row_number() over(partition by person order by date) rn1,
row_number() over(partition by person, date_type order by date) rn2
from mytable t
) t
group by person, date_type, rn1 - rn2
This also works if not all dates are contiguous (since you said you have almost all dates, I take it you don't have them all).
This is a type of gaps-and-islands problem.
To get adjacent days with the same date_type, you can subtract a sequence. It will be constant for adjacent days. Then you can aggregate:
select person, date_type, min(date), max(date)
from (select t.*,
row_number() over (partition by person, date_type
order by date) as seqnum
from t
) t
group by person, date_type, (date - seqnum);
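To see why the row-number difference is constant within an island, the trick can be simulated step by step. This is a minimal sketch in Python (the islands helper is hypothetical), assuming one row per person per date: rn1 numbers each person's rows, rn2 numbers each (person, date_type) pair's rows, and rn1 - rn2 only changes when the date_type changes.

```python
from datetime import date
from collections import defaultdict

def islands(rows):
    """rows: (person, day, date_type) tuples.
    Group rows whose (person, date_type, rn1 - rn2) key matches,
    keeping min and max date per group."""
    rn1, rn2 = defaultdict(int), defaultdict(int)
    groups = {}
    for person, day, typ in sorted(rows):
        rn1[person] += 1                 # row_number() over (partition by person)
        rn2[(person, typ)] += 1          # ... over (partition by person, date_type)
        key = (person, typ, rn1[person] - rn2[(person, typ)])
        lo, hi = groups.get(key, (day, day))
        groups[key] = (min(lo, day), max(hi, day))
    return sorted((p, t, lo, hi) for (p, t, _), (lo, hi) in groups.items())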
One of the simplest methods is to use MATCH_RECOGNIZE to perform a row-by-row comparison and aggregation:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY Person
ORDER BY "DATE"
MEASURES
FIRST( "DATE" ) AS start_date,
LAST( "DATE") AS end_date,
FIRST( Date_Type ) AS date_type
ONE ROW PER MATCH
PATTERN ( successive_dates+ )
DEFINE
SUCCESSIVE_DATES AS (
FIRST( Date_Type ) = NEXT( Date_Type )
AND MAX( "DATE" ) + INTERVAL '1' DAY = NEXT( "DATE")
)
);
Which, for the sample data:
CREATE TABLE table_name ( Person, "DATE", Date_Type ) AS
SELECT 'Sam', DATE '2020-06-01', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-02', 'Vacation' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-03', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-04', 'Work' FROM DUAL UNION ALL
SELECT 'Sam', DATE '2020-06-05', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-01', 'Work' FROM DUAL UNION ALL
SELECT 'Frodo', DATE '2020-06-02', 'Work' FROM DUAL;
Outputs:
PERSON | START_DATE | END_DATE | DATE_TYPE
:----- | :------------------ | :------------------ | :--------
Frodo | 2020-06-01 00:00:00 | 2020-06-01 00:00:00 | Work
Sam | 2020-06-01 00:00:00 | 2020-06-01 00:00:00 | Vacation
Sam | 2020-06-03 00:00:00 | 2020-06-04 00:00:00 | Work
db<>fiddle here

Cannot GROUP BY on date from a timestamp

I am unable to group by the date part of a timestamp column in the query below:
CHG_TABLE
+----+--------+----------------+-----------------+-------+-----------+
| Key|Seq_Num | Start_Date | End_Date | Value |Record_Type|
+----+--------+----------------+-----------------+-------+-----------+
| 1 | 1 | 5/25/2019 2.05 | 12/31/9999 00.00| 800 | Insert |
| 1 | 1 | 5/25/2019 2.05 | 5/31/2019 11.12 | 800 | Update |
| 1 | 2 | 5/31/2019 11.12| 12/31/9999 00.00| 900 | Insert |
| 1 | 2 | 5/31/2019 11.12| 6/15/2019 12.05 | 900 | Update |
| 1 | 3 | 6/15/2019 12.05| 12/31/9999 00.00| 1000 | Insert |
| 1 | 3 | 6/15/2019 12.05| 6/25/2019 10.20 | 1000 | Update |
+----+--------+----------------+-----------------+-------+-----------+
RESULT:
+-----+------------------+----------------+-----------+----------+
| Key | Month_Start_Date | Month_End_Date |Begin_Value|End_Value |
+---- +------------------+----------------+-----------+----------+
| 1 | 6/1/2019 | 6/30/2019 | 1700 | 1000 |
| 1 | 7/1/2019 | 7/31/2019 | 1000 | 1000 |
+-----+------------------+----------------+-----------+----------+
Begin_Value : Sum(Value) for Max(Start_Date) < Month_Start_Date -> Should pick up latest date from last month
End_Value : Sum(Value) for Max(Start_Date) <= Month_End_Date -> Should pick up the latest date
SELECT k.key,
dd.month_start_date,
dd.month_end_date,
gendata.value first_value,
gendata.next_value last_value
FROM dim_date dd CROSS JOIN dim_person k
JOIN (SELECT ct.key,
dateadd('day',1,last_day(ct.start_date)) start_date ,
SUM(ct.value),
lead(SUM(ct.value)) OVER(ORDER BY ct.start_date) next_value
FROM (SELECT key,to_char(start_Date,'MM-YYYY') MMYYYY, max(start_Date) start_date
FROM CHG_TABLE
GROUP BY to_char(start_Date,'MM-YYYY'), key
) dt JOIN CHG_TABLE ct ON
dt.start_date = ct.start_date AND
dt.key = ct.key
group by ct.key, to_char(start_Date,'MM-YYYY')
) gendata ON
to_char(dd.month_end_date,'MM-YYYY') = to_char(to_char(start_Date,'MM-YYYY')) AND
k.key = gendata.key;
Error:
start_Date is not a valid group by expression
Related post:
Monthly Snapshot using Date Dimension
Hoping I understood your question correctly, you can check the query below:
WITH chg_table ( key, seq_num, start_date, end_date, value, record_type ) AS
(
SELECT 1,1,TO_DATE('5/25/2019 2.05','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 800, 'Insert' FROM DUAL UNION ALL
SELECT 1,1,TO_DATE('5/25/2019 2.05','MM/DD/YYYY HH24.MI'),TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'), 800, 'Update' FROM DUAL UNION ALL
SELECT 1,2,TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 900, 'Insert' FROM DUAL UNION ALL
SELECT 1,2,TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'),TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'), 900, 'Update' FROM DUAL UNION ALL
SELECT 1,3,TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 1000, 'Insert' FROM DUAL UNION ALL
SELECT 1,3,TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'),TO_DATE('6/25/2019 10.20','MM/DD/YYYY HH24.MI'), 1000, 'Update' FROM DUAL
)
select key , new_start_date Month_Start_Date , new_end_date Month_End_Date , begin_value ,
nvl(lead(begin_value) over(order by new_start_date),begin_value) end_value
from
(
select key , new_start_date , new_end_date , sum(value) begin_value
from
(
select key, seq_num, start_date
, value, record_type ,
trunc(add_months(start_date,1),'month') new_start_date ,
trunc(add_months(start_date,2),'month')-1 new_end_date
from chg_table
where record_type = 'Insert'
)
group by key , new_start_date , new_end_date
)
order by new_start_date
;
Db Fiddle link: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=c77a71afa82769b48f424e1c0fa1c0b6
I am assuming that you are getting an "ORA-00979: not a GROUP BY expression" error, and that it is due to the use of TO_CHAR(timestamp_col,'DD-MM-YYYY') in the GROUP BY clause.
Adding TO_CHAR(timestamp_col,'DD-MM-YYYY') to the select list of your statement should resolve this and provide the results you expect:
a, b, dateadd('day',1,last_day(timestamp_col)) start_date, TO_CHAR(timestamp_col,'DD-MM-YYYY'), ...

Generating multiple rows from a single row based on dates

I have a database table with a start date and a number of months. How can I transform that into multiple rows based on the number of months?
I want to transform this (source table screenshot):
Into this (expected output screenshot):
We can try using a calendar table here, which includes all possible start of month dates which might appear in the expected output:
with calendar as (
select '2017-09-01'::date as dt union all
select '2017-10-01'::date union all
select '2017-11-01'::date union all
select '2017-12-01'::date union all
select '2018-01-01'::date union all
select '2018-02-01'::date union all
select '2018-03-01'::date union all
select '2018-04-01'::date union all
select '2018-05-01'::date union all
select '2018-06-01'::date union all
select '2018-07-01'::date union all
select '2018-08-01'::date
)
select
t.id as subscription_id,
c.dt,
t.amount_monthly
from calendar c
inner join your_table t
on c.dt >= t.start_date and
c.dt < t.start_date + (t.month_count::text || ' month')::interval
order by
t.id,
c.dt;
Demo
This can easily be done using generate_series() in Postgres
select t.id,
g.dt::date,
t.amount_monthly
from the_table t
cross join generate_series(t.start_date,
t.start_date + interval '1' month * (t.month_count - 1),
interval '1' month) as g(dt);
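For readers without generate_series at hand, the expansion is plain arithmetic on a year*12 + month counter. This is a minimal sketch in Python (the month_rows helper is hypothetical), assuming start dates fall on the first of a month as in the sample data:

```python
from datetime import date

def month_rows(sub_id, start, month_count, amount_monthly):
    """Emit one (id, month_start, amount) row per month, like
    generate_series(start, start + (n - 1) months, '1 month')."""
    rows = []
    base = start.year * 12 + start.month - 1  # months since year 0
    for i in range(month_count):
        ym = base + i
        rows.append((sub_id, date(ym // 12, ym % 12 + 1, start.day),
                     amount_monthly))
    return rows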
OK, it is very easy to implement this in PostgreSQL: just use generate_series, as below:
select * from month_table ;
id | start_date | month_count | amount | amount_monthly
------+------------+-------------+--------+----------------
1382 | 2017-09-01 | 3 | 38 | 1267
1383 | 2018-02-01 | 6 | 50 | 833
(2 rows)
select
id,
generate_series(start_date,start_date + (month_count || ' month') :: interval - '1 month'::interval, '1 month'::interval)::date as date,
amount_monthly
from
month_table ;
id | date | amount_monthly
------+------------+----------------
1382 | 2017-09-01 | 1267
1382 | 2017-10-01 | 1267
1382 | 2017-11-01 | 1267
1383 | 2018-02-01 | 833
1383 | 2018-03-01 | 833
1383 | 2018-04-01 | 833
1383 | 2018-05-01 | 833
1383 | 2018-06-01 | 833
1383 | 2018-07-01 | 833
(9 rows)
You may not need so many subqueries, but this should help you understand how the problem can be broken down:
WITH date_minmax AS(
SELECT
min(start_date) as date_first,
(max(start_date) + (month_count::text || ' months')::interval)::date AS date_last
FROM "your_table"
GROUP BY month_count
), series AS (
SELECT generate_series(
date_first,
date_last,
'1 month'::interval
)::date as list_date
FROM date_minmax
)
SELECT
id as subscription_id,
list_date as date,
amount_monthly as amount
FROM series
JOIN "your_table"
ON list_date <# daterange(
start_date,
(start_date + (month_count::text || ' months')::interval)::date
)
ORDER BY list_date
This should achieve the desired result http://www.sqlfiddle.com/#!17/7d943/1

Split one row into multiple rows in Oracle

I have a simple select query with this result:
first_date | last_date | outstanding
14/01/2015 | 14/04/2015 | 100000
I want to split it into:
first_date | last_date | period | outstanding
14/01/2015 | 31/01/2015 | 31/01/2015 | 100000
01/02/2015 | 28/02/2015 | 28/02/2015 | 100000
01/03/2015 | 31/03/2015 | 31/03/2015 | 100000
01/04/2015 | 14/04/2015 | 31/04/2015 | 100000
Please show me how to do it simply, without using functions/procedures, objects or cursors.
Try:
WITH my_query_result AS(
SELECT date '2015-01-14' as first_date , date '2015-04-14' as last_date,
10000 as outstanding
FROM dual
)
SELECT greatest( trunc( add_months( first_date, level - 1 ),'MM'), first_date )
as first_date,
least( trunc( add_months( first_date, level ),'MM')-1, last_date )
as last_date,
trunc( add_months( first_date, level ),'MM')-1 as period,
outstanding
FROM my_query_result t
connect by level <= months_between( trunc(last_date,'MM'), trunc(first_date,'MM') ) + 1;
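The connect by level expansion above can be cross-checked procedurally. This is a minimal sketch in Python (the split_by_month helper is hypothetical): it cuts the range at calendar-month boundaries, taking the real last day of each month as the period, which is why it never emits an impossible date.

```python
from datetime import date, timedelta

def split_by_month(first, last, outstanding):
    """One (segment_start, segment_end, period, outstanding) row per
    calendar month touched by [first, last]."""
    segs = []
    cur = first
    while cur <= last:
        # First day of the next month, then step back one day.
        nm = date(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
        month_end = nm - timedelta(days=1)
        segs.append((cur, min(month_end, last), month_end, outstanding))
        cur = nm
    return segs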
A side note: April has only 30 days, so the date 31/04/2015 in your question is wrong.