Postgres Join tables using timestamp (part of it) - sql

I have data with a frequency of one minute for 3 years and I would need to put it in one table to make it comparable.
Table1-2019
date_time            v_2020
01.01.2019 01:00:00  50
01.01.2019 01:01:00  49
01.01.2019 01:02:00  56
Table2-2020
date_time            v_2020
01.01.2020 01:00:00  60
01.01.2020 01:01:00  59
01.01.2020 01:02:00  56
Table3-2021
date_time            v_2020
01.01.2021 01:00:00  55
01.01.2021 01:01:00  54
01.01.2021 01:02:00  48
requested table
date_time        v_2019  v_2020  v_2021
01.01. 01:00:00  50      60      55
01.01. 01:01:00  49      59      54
01.01. 01:02:00  56      56      48
I tried several queries, but none of them worked. With JOIN and LEFT, I have a problem with the format of the date_time column (it is a timestamp without time zone). With SUBSTR I also had a problem with the format of date_time.
Finally I tried the code below, but it doesn't work either.
CREATE TABLE all AS
SELECT A.date_time, A.v_2019 FROM Table1 AS A
JOIN Table2
WHERE (select datepart(day, month, hour, minute) from A.date_time)=(select datepart(day, month, hour, minute) from Table2.date_time)
JOIN Table3
WHERE (select datepart(day, month, hour, minute) from A.date_time)=(select datepart(day, month, hour, minute) from Table3.date_time)

Once you create your tables run this query. I believe that it is straightforward:
select to_char(t1.date_time, 'mm-dd hh24:mi') date_time,
t1.v_2020 v_2020_2019,
t2.v_2020 v_2020_2020,
t3.v_2020 v_2020_2021
from table1 t1
join table2 t2 on t2.date_time = t1.date_time + interval '1 year'
join table3 t3 on t3.date_time = t1.date_time + interval '2 years';
See DB-fiddle
date_time    v_2020_2019  v_2020_2020  v_2020_2021
01-01 01:00  50           60           55
01-01 01:01  49           59           54
01-01 01:02  56           56           48

While you can do this with an INTERVAL, I think you should consider a JOIN condition that uses date-manipulation functions.
Keep in mind that something like WHERE DATE_TRUNC(...) or JOIN ... ON DATE_TRUNC(...) cannot use an ordinary index on these columns. Once the column value is passed into a function, the planner only sees the function's result, which a plain index on date_time does not cover. You would need to create an expression index specifically on, for example, DATE_TRUNC('DAY', date_time).
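As a minimal sketch of such an expression index, assuming the t_2019 table used in the queries of this answer and a date_time column of type timestamp without time zone (the index name is made up for illustration):

```sql
-- Hypothetical index name; a matching index would be needed on each yearly table.
-- The indexed expressions mirror the ones compared in the join condition.
CREATE INDEX t_2019_month_day_time_idx
    ON t_2019 (
        DATE_PART('MONTH', date_time),
        DATE_PART('DAY', date_time),
        (date_time::TIME)  -- a cast must be wrapped in its own parentheses here
    );
```

Index expressions have to be immutable. That holds for these expressions on a timestamp without time zone, but as far as I know TO_CHAR on a timestamp is not marked immutable in PostgreSQL, so a TO_CHAR-based join key cannot be indexed this way. Whether the planner actually picks the index depends on table sizes and statistics, so it is worth checking with EXPLAIN.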
Here is another DBFiddle for you to consider
You can do this in a couple ways:
SELECT TO_CHAR(v19.date_time, 'MM-DD HH24:MI') datetime
, v19.v_2019
, v20.v_2020
, v21.v_2021
FROM t_2019 v19
FULL JOIN t_2020 v20
ON DATE_PART('MONTH', v19.date_time) = DATE_PART('MONTH', v20.date_time)
AND DATE_PART('DAY', v19.date_time) = DATE_PART('DAY', v20.date_time)
AND v19.date_time::TIME = v20.date_time::TIME
FULL JOIN t_2021 v21
ON DATE_PART('MONTH', v20.date_time) = DATE_PART('MONTH', v21.date_time)
AND DATE_PART('DAY', v20.date_time) = DATE_PART('DAY', v21.date_time)
AND v20.date_time::TIME = v21.date_time::TIME
;
-- Alternative: compare the formatted month-day-time strings.
-- Both sides of each comparison must use the same format mask, otherwise nothing matches.
SELECT TO_CHAR(v19.date_time, 'MM-DD HH24:MI') datetime
, v19.v_2019
, v20.v_2020
, v21.v_2021
FROM t_2019 v19
FULL JOIN t_2020 v20
ON TO_CHAR(v19.date_time, 'MM-DD HH24:MI') = TO_CHAR(v20.date_time, 'MM-DD HH24:MI')
FULL JOIN t_2021 v21
ON TO_CHAR(v20.date_time, 'MM-DD HH24:MI') = TO_CHAR(v21.date_time, 'MM-DD HH24:MI')
;
Both of these result in the following:
datetime     v_2019  v_2020  v_2021
01-01 01:00  50      60      55
01-01 01:01  49      59      54
01-01 01:02  56      56      48
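One caveat with FULL JOIN: for a minute that exists in t_2020 or t_2021 but not in t_2019, v19.date_time is NULL, so the label column comes back empty. A possible variation, sketched here under the assumption that the tables and columns are named as in the queries above, computes the label once per table and joins with USING, which merges the key from whichever side has it (md_time is a name invented for this example):

```sql
-- md_time is a hypothetical alias for the "month-day hour:minute" join key.
SELECT md_time,
       v19.v_2019,
       v20.v_2020,
       v21.v_2021
FROM (SELECT TO_CHAR(date_time, 'MM-DD HH24:MI') AS md_time, v_2019 FROM t_2019) v19
FULL JOIN
     (SELECT TO_CHAR(date_time, 'MM-DD HH24:MI') AS md_time, v_2020 FROM t_2020) v20
     USING (md_time)   -- USING yields a single merged md_time column
FULL JOIN
     (SELECT TO_CHAR(date_time, 'MM-DD HH24:MI') AS md_time, v_2021 FROM t_2021) v21
     USING (md_time)
ORDER BY md_time;
```

Because the second join also goes through the merged key, a minute that is present in 2019 and 2021 but missing in 2020 still ends up on one row.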

Related

Need to create a 5 minute interval in trino for a date and show an aggregate function

data table format
timestamp                stream_id
2021-01-01 12:30:29.928  123
2021-01-01 01:30:29.928  124
2021-01-01 05:30:29.928  223
2021-01-01 01:23:29.928  134
expected output
day         timestamp  count(stream_id)
2021-01-01  12:00      5
2021-01-01  12:05      18
2021-01-01  12:10      39
2021-01-01  12:20      90
2021-01-01  12:25      45
2021-01-01  12:30      76
2021-01-01  12:35      93
You can use some date manipulation to split timestamp into parts and then use them to group data and perform aggregation:
-- sample data
WITH dataset(timestamp, stream_id) AS (
values (timestamp '2021-01-01 12:30:29.928', 123),
(timestamp '2021-01-01 01:30:29.928', 124),
(timestamp '2021-01-01 05:30:29.928', 223),
(timestamp '2021-01-01 01:23:29.928', 134),
(timestamp '2021-01-01 01:24:29.928', 135),
(timestamp '2021-01-01 01:26:29.928', 136)
)
-- query
select day,
date_add('minute', 5*grp, day) ts, -- construct the 5 minute group back
count(*) cnt
from
(select date_trunc('day', timestamp) day,
date_diff('millisecond', date_trunc('day', timestamp), timestamp) / (5 * 60 * 1000) grp,
stream_id
from dataset)
group by day, grp;
Output:
day                      ts                       cnt
2021-01-01 00:00:00.000  2021-01-01 12:30:00.000  1
2021-01-01 00:00:00.000  2021-01-01 01:25:00.000  1
2021-01-01 00:00:00.000  2021-01-01 05:30:00.000  1
2021-01-01 00:00:00.000  2021-01-01 01:30:00.000  1
2021-01-01 00:00:00.000  2021-01-01 01:20:00.000  2
Use date_format if needed to turn timestamps into needed representations.
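For instance, a rough sketch of the same grouping with date_format applied to get the day and the HH:MM start of each 5-minute bucket. The column names ts, day_start, day_label and bucket_start are made up for this example, and the sample rows are a subset of the data above:

```sql
-- sample data (hypothetical column name ts instead of timestamp)
WITH dataset(ts, stream_id) AS (
    VALUES (timestamp '2021-01-01 12:30:29.928', 123),
           (timestamp '2021-01-01 01:23:29.928', 134),
           (timestamp '2021-01-01 01:24:29.928', 135)
)
SELECT date_format(day_start, '%Y-%m-%d') AS day_label,
       -- rebuild the bucket start, then keep only hours and minutes (%i = minutes)
       date_format(date_add('minute', 5 * grp, day_start), '%H:%i') AS bucket_start,
       count(stream_id) AS cnt
FROM (
    SELECT date_trunc('day', ts) AS day_start,
           -- integer division: number of whole 5-minute steps since midnight
           date_diff('minute', date_trunc('day', ts), ts) / 5 AS grp,
           stream_id
    FROM dataset
)
GROUP BY day_start, grp
ORDER BY 1, 2;
```

Trino's date_format takes MySQL-style format specifiers, so minutes are %i rather than %M.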

How to fill the time gap after grouping date record for months in postgres

I have table records as -
date n_count
2020-02-19 00:00:00 4
2020-07-14 00:00:00 1
2020-07-17 00:00:00 1
2020-07-30 00:00:00 2
2020-08-03 00:00:00 1
2020-08-04 00:00:00 2
2020-08-25 00:00:00 2
2020-09-23 00:00:00 2
2020-09-30 00:00:00 3
2020-10-01 00:00:00 11
2020-10-05 00:00:00 12
2020-10-19 00:00:00 1
2020-10-20 00:00:00 1
2020-10-22 00:00:00 1
2020-11-02 00:00:00 376
2020-11-04 00:00:00 72
2020-11-11 00:00:00 1
I want to group all the records by month to get a total count per month. That part works, but months without any records are missing from the result. How can I fill these gaps?
time month_count
"2020-02-01" 4
"2020-07-01" 4
"2020-08-01" 5
"2020-09-01" 5
"2020-10-01" 26
"2020-11-01" 449
This is what I have tried.
SELECT (date_trunc('month', date))::date AS time,
sum(n_count) as month_count
FROM table1
group by time
order by time asc
You can use generate_series() to generate all starts of months between the earliest and latest date available in the table, then bring the table with a left join:
select d.dt, coalesce(sum(t.n_count), 0) as month_count
from (
select generate_series(date_trunc('month', min(date)), date_trunc('month', max(date)), '1 month') as dt
from table1
) as d(dt)
left join table1 t on t.date >= d.dt and t.date < d.dt + interval '1 month'
group by d.dt
order by d.dt
I would simply UNION a date series, generated from MIN and MAX date:
demo:db<>fiddle
WITH cte AS ( -- 1
SELECT
*,
date_trunc('month', date)::date AS time
FROM
t
)
SELECT
time,
SUM(n_count) as month_count --3
FROM (
SELECT
time,
n_count
FROM cte
UNION ALL -- ALL is required: plain UNION would drop duplicate (time, n_count) rows and undercount
SELECT -- 2
generate_series(
(SELECT MIN(time) FROM cte),
(SELECT MAX(time) FROM cte),
interval '1 month'
)::date,
0
) s
GROUP BY time
ORDER BY time
1. Use a CTE to calculate date_trunc only once. It could be left out if you prefer to call your table twice in the UNION below.
2. Generate a monthly date series from the MIN to the MAX date, with 0 as the n_count value, and add it to the table rows.
3. Do your calculation.

cast two separate columns which has hour ( datatype number) and minutes ( datatype number) to time datatype and subtract 90 minutes in oracle

I have two separate columns for hours and minutes in my table, and for a report I need to subtract 90 minutes from the combined time (that is, 1 hour from the hours field and 30 minutes from the minutes field). The output can be in minutes or hours.
I tried to_char(hours_column - 1, '00') || ':' || to_char(minutes_column - 30, '00') AS "MAX_TIME", but this fails for a time like 9:00: I get 8:-30 when I need 7:30.
I came up with some SQL code using DATEADD and CAST that worked, but it fails when I run it in Oracle.
Select Substring(Cast(DATEADD(minute, -90, Cast(hourscolumn + ':' + minutes column as Time)) as varchar(20)),1,5) as max_time
Can someone help me to implement the above code in Oracle? I'm just trying to deduct 90 minutes by putting the hours and minutes columns together.
Something like this?
The test CTE represents your data. Why store the time that way, though? Nothing prevents someone from putting 32 hours and 87 minutes into those columns.
The query itself contains:
time: builds a valid DATE value from the two columns. It fails if the hours and/or minutes are invalid (such as the previously mentioned 32:87).
subtracted: subtracts 90 minutes from time. DATE arithmetic in Oracle works in days, and (24 * 60) is the number of minutes in a day, so 90 / (24 * 60) is 90 minutes expressed as a fraction of a day. The result contains both a date and a time component.
result: the final value, obtained by applying to_char with the appropriate format mask (hh24:mi) to the subtracted value.
SQL> alter session set nls_date_format = 'dd.mm.yyyy hh24:mi';
Session altered.
SQL> with test (hours, minutes) as
       (select '09', '00' from dual union all
        select '23', '30' from dual union all
        select '00', '20' from dual
       )
     select hours,
            minutes,
            to_date(hours||minutes, 'hh24mi') time,
            --
            to_date(hours||minutes, 'hh24mi') - 90 / (24 * 60) subtracted,
            --
            to_char(to_date(hours||minutes, 'hh24mi') - 90 / (24 * 60), 'hh24:mi') result
     from test;
HO MI TIME             SUBTRACTED       RESUL
-- -- ---------------- ---------------- -----
09 00 01.07.2019 09:00 01.07.2019 07:30 07:30
23 30 01.07.2019 23:30 01.07.2019 22:00 22:00
00 20 01.07.2019 00:20 30.06.2019 22:50 22:50
SQL>
Use NUMTODSINTERVAL to convert the hours and minutes to INTERVAL data types and then you can subtract INTERVAL '90' MINUTE and EXTRACT the resulting hour and minute components.
Oracle Setup:
CREATE TABLE table_name ( hours_column, minutes_column ) AS
SELECT 0, 0 FROM DUAL UNION ALL
SELECT 1, 30 FROM DUAL UNION ALL
SELECT 2, 45 FROM DUAL UNION ALL
SELECT 3, 0 FROM DUAL UNION ALL
SELECT 27, 59 FROM DUAL
Query:
SELECT EXTRACT( HOUR FROM time ) + EXTRACT( DAY FROM time ) * 24 AS hours,
EXTRACT( MINUTE FROM time ) AS minutes,
time,
TO_CHAR( EXTRACT( HOUR FROM time ) + EXTRACT( DAY FROM time ) * 24, '00' )
|| ':' || TO_CHAR( ABS( EXTRACT( MINUTE FROM time ) ), 'FM00' ) AS as_string
FROM (
SELECT NUMTODSINTERVAL( hours_column, 'HOUR' )
+ NUMTODSINTERVAL( minutes_column, 'MINUTE' )
- INTERVAL '90' MINUTE AS time
FROM table_name
)
Output:
HOURS | MINUTES | TIME | AS_STRING
----: | ------: | :---------------------------- | :--------
-1 | -30 | -000000000 01:30:00.000000000 | -01:30
0 | 0 | +000000000 00:00:00.000000000 | 00:00
1 | 15 | +000000000 01:15:00.000000000 | 01:15
1 | 30 | +000000000 01:30:00.000000000 | 01:30
26 | 29 | +000000001 02:29:00.000000000 | 26:29
db<>fiddle here
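Since the question says the columns are numbers, another option is plain arithmetic on total minutes. This is only a sketch: it reuses the table_name, hours_column and minutes_column names from the setup above with made-up sample rows, total_minutes is a name introduced here, and it assumes the result never goes below zero and stays under 100 hours (TRUNC and MOD on negative totals would need separate sign handling):

```sql
-- Hypothetical sample rows standing in for the real table.
WITH table_name (hours_column, minutes_column) AS (
    SELECT 9, 0   FROM DUAL UNION ALL
    SELECT 1, 30  FROM DUAL UNION ALL
    SELECT 27, 59 FROM DUAL
)
SELECT hours_column,
       minutes_column,
       total_minutes,
       -- whole hours, then the leftover minutes, both zero-padded to two digits
       TO_CHAR(TRUNC(total_minutes / 60), 'FM00') || ':' ||
       TO_CHAR(MOD(total_minutes, 60), 'FM00') AS max_time
FROM (
    SELECT hours_column,
           minutes_column,
           hours_column * 60 + minutes_column - 90 AS total_minutes
    FROM table_name
);
```

For 9:00 this gives 450 total minutes and the string 07:30. For 27:59 it gives 1589 minutes and 26:29, matching the interval-based answer.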

How can I generate zeros for missing id values?

I'm trying to generate an output that fills in missing counts with 0s.
I'm using Oracle SQL. So far, my solution is based on the question "Grouping records hour by hour or day by day and filling gaps with zero or null", with small additions.
WITH TEMP
AS ( SELECT MINDT + ( (LEVEL - 1) / 24) DDD
FROM (SELECT TRUNC (MIN (MY_TIMESTAMP), 'HH24') MINDT,
TRUNC (MAX (MY_TIMESTAMP), 'HH24') MAXDT
FROM MAIN_TABLE.TABLE_VIEW THV
WHERE MY_TIMESTAMP BETWEEN TO_DATE ('08/01/2018:00:00:00',
'MM/DD/YYYY:HH24:MI:SS')
AND TO_DATE (
'08/03/2018:23:59:59',
'MM/DD/YYYY:HH24:MI:SS')) V
CONNECT BY MINDT + ( (LEVEL - 1) / 24) <= MAXDT)
SELECT TO_CHAR (TRUNC (D1, 'HH24'), 'YYYY-MM-DD HH24'), COUNT (D2), ID
FROM (SELECT NVL (MY_TIMESTAMP, DDD) D1,
MY_TIMESTAMP D2,
THV.ID ID
FROM MAIN_TABLE.TABLE_VIEW THV
RIGHT OUTER JOIN
(SELECT DDD FROM TEMP) AD
ON DDD = TRUNC (MY_TIMESTAMP, 'HH24')
WHERE MY_TIMESTAMP BETWEEN TO_DATE ('08/01/2018:00:00:00',
'MM/DD/YYYY:HH24:MI:SS')
AND TO_DATE ('08/03/2018:23:59:59',
'MM/DD/YYYY:HH24:MI:SS'))
GROUP BY ID, TRUNC (D1, 'HH24')
ORDER BY ID, TRUNC (D1, 'HH24')
Right now I'm getting:
CNT ID DT
4 1 2018-08-01 00
1 1 2018-08-01 01
1 1 2018-08-01 04
20 1 2018-08-01 05
76 1 2018-08-01 07
But what I want is:
CNT ID DT
4 1 2018-08-01 00
1 1 2018-08-01 01
0 1 2018-08-01 02
0 1 2018-08-01 03
1 1 2018-08-01 04
20 1 2018-08-01 05
0 1 2018-08-01 06
76 1 2018-08-01 07
Any help would be appreciated.
It works pretty smoothly if you join against a table that contains every hour you expect to see in the results. To cover all the hours of a day, such a table needs just 24 records.
It can be a temp table, but a real table would reduce your report to a standard query. And if this report is used regularly, why not have an extra table? I've seen DBAs keep a generic "numbers" table with lots of numbers in it for tricks like this (to get 0-23, query the table where n is between 0 and 23). As another example, if you want every individual date in a 90-day period, you can take 0-89 from the numbers table and add each value to a start date, which lets you join on every possible date in that period.
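A minimal sketch of that idea in Oracle, generating the hours on the fly with CONNECT BY instead of a stored numbers table. The events CTE and its rows are invented sample data standing in for MAIN_TABLE.TABLE_VIEW, and the hours and ids CTE names are also assumptions:

```sql
WITH events (id, my_timestamp) AS (   -- hypothetical stand-in for the real view
    SELECT 1, TO_DATE('2018-08-01 00:10', 'YYYY-MM-DD HH24:MI') FROM DUAL UNION ALL
    SELECT 1, TO_DATE('2018-08-01 01:20', 'YYYY-MM-DD HH24:MI') FROM DUAL UNION ALL
    SELECT 1, TO_DATE('2018-08-01 04:05', 'YYYY-MM-DD HH24:MI') FROM DUAL
),
hours AS (                            -- one row per hour: 00, 01, 02, 03, 04
    SELECT TO_DATE('2018-08-01', 'YYYY-MM-DD') + (LEVEL - 1) / 24 AS hr
    FROM DUAL
    CONNECT BY LEVEL <= 5
),
ids AS (                              -- every id that should appear in the report
    SELECT DISTINCT id FROM events
)
SELECT COUNT(e.my_timestamp) AS cnt,  -- counts only matched rows, so gaps give 0
       i.id,
       TO_CHAR(h.hr, 'YYYY-MM-DD HH24') AS dt
FROM hours h
CROSS JOIN ids i                      -- every (hour, id) pair exists before joining
LEFT JOIN events e
       ON e.id = i.id
      AND TRUNC(e.my_timestamp, 'HH24') = h.hr
GROUP BY i.id, h.hr
ORDER BY i.id, h.hr;
```

The important details are that the date-range filter lives in the hours generator rather than in a WHERE clause on the outer-joined table (which would discard the NULL-extended rows again), and that the id comes from the ids list so it is not NULL for the empty hours. With this sample data, hours 02 and 03 come back with a count of 0.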

Alternative to this query to run under MariaDb 10.1

This query works as expected under MySQL 8, but my server runs MariaDB 10.1. Is there an alternative that works there, and how would I write it?
SELECT * FROM (
SELECT
*,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(hs.`ending_hour`, hs.`starting_hour`))) OVER (ORDER BY hs.starting_hour RANGE BETWEEN INTERVAL '12' HOUR PRECEDING AND INTERVAL '12' HOUR following)) AS tot
FROM
time_table hs
WHERE hs.`starting_hour` > DATE_SUB(NOW(), INTERVAL 50 DAY) AND hs.`ending_hour` <= NOW()
ORDER BY hs.`starting_hour` ASC
) t1
HAVING tot >= '14:00:00'
;
fiddle
The problem is that RANGE BETWEEN INTERVAL in an OVER window clause does not exist in MariaDB at the moment.
Thank you
Sample data:
id starting_hour ending_hour
------ ------------------- ---------------------
1 2018-09-02 06:00:00 2018-09-02 08:30:00
2 2018-09-02 08:30:00 2018-09-02 10:00:00
4 2018-09-03 11:00:00 2018-09-03 15:00:00
5 2018-09-04 15:30:00 2018-09-04 16:00:00
6 2018-09-04 16:15:00 2018-09-04 17:00:00
7 2018-09-19 00:00:00 2018-09-19 03:00:00
8 2018-09-19 04:00:00 2018-09-19 15:00:00
9 2018-09-20 00:00:00 2018-09-20 22:01:00
10 2018-10-21 12:00:00 2018-10-21 11:00:00
11 2018-10-29 09:09:00 2018-10-29 10:10:00
12 2018-10-09 02:10:00 2018-10-09 14:00:00
In my use case id 7, 8 and 9 are the results.
RE-EDIT
Thanks to @Gordon Linoff's answer, this is the corrected query.
But in the end it doesn't work as expected: increasing INTERVAL 50 DAY returns unwanted rows that the MySQL window function does not.
SELECT hs.*,
(
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(hs2.ending_hour, hs2.starting_hour))))
FROM hours_sailor hs2
WHERE hs2.starting_hour >= DATE_SUB(hs.starting_hour, INTERVAL 12 HOUR) AND hs2.starting_hour <= DATE_SUB(NOW(), INTERVAL 12 HOUR)
) AS duration
FROM `time_table` hs
WHERE hs.`starting_hour` > DATE_SUB(NOW(), INTERVAL 50 DAY) AND hs.`ending_hour` <= NOW()
HAVING duration >= '14:00:00'
ORDER BY hs.starting_hour ASC;
You can express this using a correlated subquery. I think this is the equivalent logic:
SELECT hs.*,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(hs2.ending_hour, hs2.starting_hour))))
FROM time_table hs2
WHERE hs2.starting_hour >= hs.starting_hour - INTERVAL '12' HOUR AND
hs2.starting_hour <= hs.starting_hour + INTERVAL '12' HOUR
) AS tot
FROM time_table hs
WHERE hs.starting_hour > DATE_SUB(NOW(), INTERVAL 50 DAY) AND
hs.ending_hour <= NOW()
HAVING tot >= '14:00:00'
ORDER BY hs.starting_hour ASC;
EDIT:
If you also want the timing restriction for the "range" calculation, you need to include it in the subquery. With the window function this filtering is built in, because the window only sees rows that passed the WHERE clause, but it is more often a hindrance than a feature:
SELECT hs.*,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(hs2.ending_hour, hs2.starting_hour))))
FROM time_table hs2
WHERE hs2.starting_hour >= hs.starting_hour - INTERVAL '12' HOUR AND
hs2.starting_hour <= hs.starting_hour + INTERVAL '12' HOUR AND
hs2.starting_hour > DATE_SUB(NOW(), INTERVAL 50 DAY) AND
hs2.ending_hour <= NOW()
) AS tot
FROM time_table hs
WHERE hs.starting_hour > DATE_SUB(NOW(), INTERVAL 50 DAY) AND
hs.ending_hour <= NOW()
HAVING tot >= '14:00:00'
ORDER BY hs.starting_hour ASC;