I'm relatively new to working with PostgreSQL and I could use some help with this.
Suppose I have a table in which forecasted values (let's say temperature) are stored, each identified by a dump_date_time. This dump_date_time is the date_time at which the values were stored in the table. Each temperature forecast is also identified by the date_time to which the forecast corresponds. Let's say that a forecast is published every 6 hours.
Example:
At 06:00 today the temperature for tomorrow at 16:00 is published and stored in the table. Then at 12:00 today the temperature for tomorrow at 16:00 is published and also stored in the table. I now have two forecasts for the same date_time (16:00 tomorrow) which are published at two different times (06:00 and 12:00 today), indicated by the dump_date_time.
All these values are stored in the same table, with three columns: dump_date_time, date_time and value. My goal is to SELECT from this table the difference between the temperatures of the two forecasts. How do I do this?
One option uses a join:
select date_time, t1.value - t2.value as value_diff
from mytable t1
inner join mytable t2 using (date_time)
where t1.dump_date_time = '2020-01-01 06:00:00'::timestamp
and t2.dump_date_time = '2020-01-01 12:00:00'::timestamp;
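The self-join idea can be tried without a PostgreSQL server. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, stores timestamps as ISO text, and the two sample rows (values 50 and 52) are hypothetical data chosen for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; timestamps are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (dump_date_time TEXT, date_time TEXT, value REAL)"
)
# Hypothetical sample data: two forecasts for the same target time.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2020-01-01 06:00:00", "2020-01-02 16:00:00", 50.0),
        ("2020-01-01 12:00:00", "2020-01-02 16:00:00", 52.0),
    ],
)

# Self-join on the forecast target time, pinning each side of the join
# to one publication time (dump_date_time).
rows = conn.execute(
    """
    SELECT t1.date_time, t1.value - t2.value AS value_diff
    FROM mytable t1
    JOIN mytable t2 ON t2.date_time = t1.date_time
    WHERE t1.dump_date_time = ?
      AND t2.dump_date_time = ?
    """,
    ("2020-01-01 06:00:00", "2020-01-01 12:00:00"),
).fetchall()

print(rows)  # [('2020-01-02 16:00:00', -2.0)]
```

The sign of value_diff depends on which publication time is bound to t1 and which to t2.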
Something like:
create table forecast(dump_date_time timestamptz, date_time timestamptz, value numeric);
insert into forecast values ('09/24/2020 06:00', '09/25/2020 16:00', 50), ('09/24/2020 12:00', '09/25/2020 16:00', 52);
select max(value) - min(value) from forecast where date_time = '09/25/2020 16:00';
?column?
----------
2
--Specifying dump_date_time range
select
max(value) - min(value)
from
forecast
where
date_time = '09/25/2020 16:00'
and
dump_date_time <@
tstzrange(current_date + interval '6 hours',
current_date + interval '12 hours', '[]');
?column?
----------
2
This is a very simple case. If you need something else you will need to provide more information.
UPDATE
Added an example that uses a timestamptz range to select dump_date_time values within a range.
I use Google BigQuery. My query includes 2 different timestamps: start_at and end_at.
The goal of the query is to round these 2 timestamps to the nearest 30 minutes interval, which I manage using this: TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE),MINUTE) and the same goes for end_at.
Events occur (net_lost_orders) at each rounded timestamp.
The 2 problems that I encounter are:
First, as long as start_at and end_at fall in the same 30-minute interval, things work well. When they do not (for example start_at: 19:15, rounded to 19:00, and end_at: 21:15, rounded to 21:00), the results are not as expected. Additionally, I need not only the 2 extreme intervals but every 30-minute interval between start_at and end_at (19:00/19:30/20:00/20:30/21:00 in the example).
Secondly, I can't manage to create a condition that shows each interval on a separate row. I have tried to CAST, TRUNCATE and EXTRACT the timestamps, and to use CASE WHEN and GROUP BY, without success.
Here's the final part of the query (timestamps rounded excluded):
...
-------LOST ORDERS--------
a AS (SELECT created_date, closure, zone_id, city_id, l.interval_start,
l.net as net_lost_orders, l.starts_at, CAST(DATETIME(l.starts_at, timezone)AS TIMESTAMP) as start_local_time
FROM `XXX`, UNNEST(lost_orders) as l),
b AS (SELECT city_id, city_name, zone_id, zone_name FROM `YYY`),
lost AS (SELECT DISTINCT created_date, closure, zone_name, city_name, start_local_time,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_local_time, INTERVAL MOD(EXTRACT(MINUTE FROM start_local_time), 30) MINUTE),MINUTE) AS lost_order_30_interval,
net_lost_orders
FROM a LEFT JOIN b ON a.city_id=b.city_id AND a.zone_id=b.zone_id AND a.city_id=b.city_id
WHERE zone_name='Atlanta' AND created_date='2021-09-09'
ORDER BY rt ASC),
------PREPARATION CLOSURE START AND END INTERVALS------
f AS (SELECT
DISTINCT TIMESTAMP_TRUNC(TIMESTAMP_SUB(start_at, INTERVAL MOD(EXTRACT(MINUTE FROM start_at), 30) MINUTE),MINUTE) AS start_closure_30_interval,
TIMESTAMP_TRUNC(TIMESTAMP_SUB(end_at, INTERVAL MOD(EXTRACT(MINUTE FROM end_at), 30) MINUTE),MINUTE) AS end_closure_30_interval,
country_code,
report_date,
Day,
CASE
WHEN Day="Monday" THEN 1
WHEN Day="Tuesday" THEN 2
WHEN Day="Wednesday" THEN 3
WHEN Day="Thursday" THEN 4
WHEN Day="Friday" THEN 5
WHEN Day="Saturday" THEN 6
WHEN Day="Sunday" THEN 7
END AS Weekday_order,
report_week,
city_name,
events_mod.zone_name,
closure,
start_at,
end_at,
activation_threshold,
deactivation_threshold,
shrinkage_drive_time,
ROUND(duration/60,2) AS duration,
FROM events_mod
WHERE report_date="2021-09-09"
AND events_mod.zone_name="Atlanta"
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
ORDER BY report_date, start_at ASC)
------FINAL TABLE------
SELECT DISTINCT
start_closure_30_interval,end_closure_30_interval, report_date, Day, Weekday_order, report_week, f.city_name, f.zone_name, closure,
start_at, end_at, start_time,end_time, activation_threshold, deactivation_threshold, duration, net_lost_orders
FROM f
LEFT JOIN lost ON f.city_name=lost.city_name
AND f.zone_name=lost.zone_name
AND f.report_date=lost.created_date
AND f.start_closure_30_interval=lost.lost_order_30_interval
AND f.end_closure_30_interval=lost.lost_order_30_interval
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
Results:
Expected results:
I would be really grateful if you could help and explain to me how to get all the rounded timestamps between start_at and end_at on separate rows. Thank you in advance. Best, Fabien
Spreadsheet here
Consider below approach
select intervals, any_value(t).*, sum(Nb_lost_orders) as Nb_lost_orders
from table1 t,
unnest(generate_timestamp_array(
timestamp_seconds(div(unix_seconds(starts_at), 1800) * 1800),
timestamp_seconds(div(unix_seconds(ends_at), 1800) * 1800),
interval 30 minute
)) intervals
left join (
select Nb_lost_orders,
timestamp_seconds(div(unix_seconds(Time_when_the_lost_order_occurred), 1800) * 1800) as intervals
from Table2
)
using(intervals)
group by intervals
if applied to sample data in your question
with Table1 as (
select 'Closure' Event, timestamp '2021-09-09 11:00:00' starts_at, timestamp '2021-09-09 11:45:00' ends_at union all
select 'Closure', '2021-09-09 12:05:00', '2021-09-09 14:10:00'
), Table2 as (
select 5 Nb_lost_orders, timestamp '2021-09-09 11:38:00' Time_when_the_lost_order_occurred
)
output is
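The flooring and expansion logic of the query can be cross-checked outside BigQuery. The sketch below mirrors it in Python under a few assumptions: all timestamps are treated as UTC, and the helper names floor_to_slot, slots_between and utc are hypothetical, introduced only for this illustration. It reuses the sample rows from the CTE above.

```python
from datetime import datetime, timedelta, timezone

SLOT = 1800  # 30 minutes in seconds, as in div(unix_seconds(...), 1800) * 1800


def floor_to_slot(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute slot."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch // SLOT * SLOT, tz=timezone.utc)


def slots_between(starts_at: datetime, ends_at: datetime) -> list:
    """Slot starts from the slot containing starts_at through the one containing ends_at."""
    current, last = floor_to_slot(starts_at), floor_to_slot(ends_at)
    slots = []
    while current <= last:
        slots.append(current)
        current += timedelta(seconds=SLOT)
    return slots


def utc(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' as a UTC timestamp (assumption: data is in UTC)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


closures = [
    (utc("2021-09-09 11:00:00"), utc("2021-09-09 11:45:00")),
    (utc("2021-09-09 12:05:00"), utc("2021-09-09 14:10:00")),
]
lost_orders = [(5, utc("2021-09-09 11:38:00"))]

# Sum lost orders per 30-minute slot (the subquery on Table2).
lost_per_slot = {}
for count, occurred_at in lost_orders:
    slot = floor_to_slot(occurred_at)
    lost_per_slot[slot] = lost_per_slot.get(slot, 0) + count

# One row per closure per slot, with the lost orders attached when present.
for starts_at, ends_at in closures:
    for slot in slots_between(starts_at, ends_at):
        print(slot.strftime("%H:%M"), lost_per_slot.get(slot))
```

For the first closure this yields the slots 11:00 and 11:30, with the 5 lost orders attached to 11:30. The second closure expands to 12:00 through 14:00 with no lost orders.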
I have monthly data in BigQuery in the following form:
CREATE TABLE if not EXISTS spend (
id int,
created_at DATE,
value float
);
INSERT INTO spend VALUES
(1, '2020-01-01', 100),
(2, '2020-02-01', 200),
(3, '2020-03-01', 100),
(4, '2020-04-01', 100),
(5, '2020-05-01', 50);
I would like a query to translate it into daily data in the following way:
One row per day.
The value of each day should be the monthly value divided by the number of days of the month.
What's the simplest way of doing this in BigQuery?
You can make use of GENERATE_DATE_ARRAY() in order to get an array between the desired dates (in your case, between 2020-01-01 and 2020-05-31) and create a calendar table, and then divide the value of a given month among the days in the month :)
Try this and let me know if it worked:
with calendar_table as (
select
calendar_date
from
unnest(generate_date_array('2020-01-01', '2020-05-31', interval 1 day)) as calendar_date
),
final as (
select
ct.calendar_date,
s.value,
s.value / extract(day from last_day(ct.calendar_date)) as daily_value
from
spend as s
cross join
calendar_table as ct
where
format_date('%Y-%m', date(ct.calendar_date)) = format_date('%Y-%m', date(s.created_at))
)
select * from final
My recommendation is to do this "locally". That is, run generate_date_array() for each row in the original table. This is much faster than a join across rows. BigQuery also makes this easy with the last_day() function:
select t.id, dt,
t.value / extract(day from last_day(t.created_at)) as daily_value
from `table` t cross join
unnest(generate_date_array(t.created_at,
last_day(t.created_at, month)
)
) as dt;
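The arithmetic behind both answers (one row per day, monthly value divided by the number of days in that month) can be sketched in Python. This is an illustration only: the helper name spread_over_month is hypothetical, and it assumes, as the sample data does, that created_at is always the first day of the month.

```python
import calendar
from datetime import date, timedelta

# Sample rows from the question: (id, created_at, value).
monthly_spend = [
    (1, date(2020, 1, 1), 100.0),
    (2, date(2020, 2, 1), 200.0),
    (3, date(2020, 3, 1), 100.0),
    (4, date(2020, 4, 1), 100.0),
    (5, date(2020, 5, 1), 50.0),
]


def spread_over_month(first_day: date, value: float) -> list:
    """Return (day, daily_value) pairs from first_day to the end of its month."""
    days_in_month = calendar.monthrange(first_day.year, first_day.month)[1]
    daily_value = value / days_in_month
    return [(first_day + timedelta(days=i), daily_value) for i in range(days_in_month)]


daily_rows = [
    row
    for _, created_at, value in monthly_spend
    for row in spread_over_month(created_at, value)
]

print(len(daily_rows))  # 152 (31 + 29 + 31 + 30 + 31, 2020 being a leap year)
print(daily_rows[0])
```

Expanding each monthly row on its own, as here, corresponds to the "local" unnest approach rather than to the calendar-table join.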
I need to get the count of records using PostgreSQL from time 7:00:00 am till next day 6:59:59 am and the count resets again from 7:00am to 6:59:59 am.
I am using Java (Spring Boot) as the backend.
The columns in my table are
id (primary_id)
createdon (timestamp)
name
department
createdby
How do I write the condition so that it works per shift?
You'd need to pick a slice based on the current time-of-day (I am assuming this to be some kind of counter which will be auto-refreshed in some application).
One way to do that is using time ranges:
SELECT COUNT(*)
FROM mytable
WHERE createdon <@ (
SELECT CASE
WHEN current_time < '07:00'::time THEN
tsrange(CURRENT_DATE - '1d'::interval + '07:00'::time, CURRENT_DATE + '07:00'::time, '[)')
ELSE
tsrange(CURRENT_DATE + '07:00'::time, CURRENT_DATE + '1d'::interval + '07:00'::time, '[)')
END
)
;
Example with data: https://rextester.com/LGIJ9639
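The CASE expression picks the half-open window [07:00, next day 07:00) that contains the current time. The same decision can be sketched in Python. The function name current_shift_window is hypothetical, and the sketch assumes naive local timestamps, matching a timestamp (without time zone) column.

```python
from datetime import datetime, time, timedelta

SHIFT_START = time(7, 0)  # shifts run from 07:00 to 07:00 the next day


def current_shift_window(now: datetime) -> tuple:
    """Return (start, end) of the 24-hour shift containing `now`; end is exclusive."""
    start = datetime.combine(now.date(), SHIFT_START)
    if now < start:
        # Before 07:00 we are still in the shift that began yesterday.
        start -= timedelta(days=1)
    return start, start + timedelta(days=1)


print(current_shift_window(datetime(2019, 3, 7, 6, 59, 59)))
print(current_shift_window(datetime(2019, 3, 7, 7, 0, 0)))
```

An application could compute these two bounds itself and pass them as parameters to a plain condition such as createdon >= start AND createdon < end, which is an alternative to building the range inside the query.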
As I understand the question, you need to have a separate group for values in each 24-hour period that starts at 07:00:00.
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
This uses the date and time functions and the GROUP BY clause:
Shift the timestamp value back 7 hours ((createdon - '7h'::interval)), so the distinction can be made by a change of date (at 00:00:00). Then,
Truncate the value to the date (date_trunc('day', …)), so that all values in a bucket are flattened to a single value (the date at midnight). Then,
Add 7 hours again to the value (… + '7h'::interval), so that it represents the starting time of the bucket. Then,
Group by that value (GROUP BY date_bucket).
A more complete example, with schema and data:
DROP TABLE IF EXISTS lorem;
CREATE TABLE lorem (
id serial PRIMARY KEY,
createdon timestamp not null
);
INSERT INTO lorem (createdon) (
SELECT
generate_series(
CURRENT_TIMESTAMP - '36h'::interval,
CURRENT_TIMESTAMP + '36h'::interval,
'45m'::interval)
);
Now the query:
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
;
produces this result:
date_bucket | count
---------------------+-------
2019-03-06 07:00:00 | 17
2019-03-07 07:00:00 | 32
2019-03-08 07:00:00 | 32
2019-03-09 07:00:00 | 16
(4 rows)
You can use aggregation -- by subtracting 7 hours:
select (createdon - interval '7 hour')::date as dy, count(*)
from t
group by dy
order by dy;
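The shift-back, truncate, shift-forward steps used in the grouping answers can be checked on a handful of boundary values. The sketch below is a Python illustration with hypothetical sample timestamps; shift_bucket is a name introduced here, not something from the answers.

```python
from collections import Counter
from datetime import datetime, timedelta

OFFSET = timedelta(hours=7)  # the day starts at 07:00


def shift_bucket(createdon: datetime) -> datetime:
    """Start of the 07:00-to-07:00 bucket that contains createdon."""
    shifted = createdon - OFFSET                      # shift back 7 hours
    midnight = shifted.replace(hour=0, minute=0,      # truncate to the date
                               second=0, microsecond=0)
    return midnight + OFFSET                          # shift forward again


# Hypothetical timestamps placed around the 07:00 boundary.
sample = [
    datetime(2019, 3, 7, 6, 59, 59),
    datetime(2019, 3, 7, 7, 0, 0),
    datetime(2019, 3, 7, 23, 30, 0),
    datetime(2019, 3, 8, 6, 59, 59),
    datetime(2019, 3, 8, 7, 0, 0),
]

counts = Counter(shift_bucket(ts) for ts in sample)
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

A row created at 06:59:59 lands in the bucket that started at 07:00 the previous day, while a row created at exactly 07:00:00 opens a new bucket.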
I have two tables:
user journeys
 id | timestamp | bus
----+-----------+-----
  1 | 00:10     |  12
  1 | 16:10     |  12
  2 | 14:00     |  23
bus
 id | timestamp | price
----+-----------+-------
 12 | 00:00     |   1.3
 12 | 00:10     |   1.5
 12 | 00:20     |   1.7
 12 | 18:00     |   2.0
 13 | 00:00     |   3.0
My goal is to find how much each user spent on travel today.
In our case, the user took bus number 12 at 00:10 and paid 1.5, and another one at 16:10 where the price increased to 1.7. In total, this person paid 3.2 today. We always take the latest updated price.
I've done this using a massive subquery and it looks inefficient. Does anyone have a slick solution?
Sample Data For Reproduction:
Please see http://sqlfiddle.com/#!17/10ad6/2
Or Build Schema:
drop table if exists journeys;
create table journeys(
id numeric,
timestamp timestamp without time zone,
bus numeric
);
truncate table journeys;
insert into journeys
values
(1, '2018-08-22 00:10:00', 12),
(1, '2018-08-22 16:10:00', 12),
(2, '2018-08-22 14:00:00', 23);
-- Bus Prices
drop table if exists bus;
create table bus (
bus_id int,
timestamp timestamp without time zone,
price numeric
);
truncate table bus;
insert into bus
values
(12, '2018-08-22 00:00:00', 1.3),
(12, '2018-08-22 00:10:00', 1.5),
(12, '2018-08-22 00:20:00', 1.7),
(12, '2018-08-22 18:00:00', 2.0),
(13, '2018-08-22 00:00:00', 3.0);
I don't know that this is faster than your solution (which you don't show). A correlated subquery seems like a reasonable solution.
But another method is:
SELECT j.*, b.price
FROM journeys j LEFT JOIN
(SELECT b.*, LEAD(timestamp) OVER (PARTITION BY bus_id ORDER BY timestamp) as next_timestamp
FROM bus b
) b
ON b.bus_id = j.bus AND
j.timestamp >= b.timestamp AND
(j.timestamp < b.next_timestamp OR b.next_timestamp IS NULL);
You may also do this using an inner join and windowing functions:
SELECT user_id, SUM(price)
FROM
(
SELECT user_id, journey_timestamp, bus_id, price_change_timestamp,
COALESCE(LEAD(price_change_timestamp) OVER(PARTITION BY user_id, journey_timestamp, bus_id ORDER BY price_change_timestamp), CAST('2100-01-01 00:00:00' AS TIMESTAMP)) AS next_price_timestamp, price
FROM
(
SELECT a.id AS user_id, a.timestamp AS journey_timestamp, a.bus AS bus_id, b.timestamp AS price_change_timestamp, b.price
FROM journeys a
INNER JOIN bus b
ON a.bus = b.bus_id
) a1
) a2
WHERE journey_timestamp >= price_change_timestamp AND journey_timestamp < next_price_timestamp
GROUP BY user_id
This is essentially what is happening:
1) The inner query joins the tables, ensuring that each journey is matched to every fare the bus has had at any point in time.
2) The LEAD function partitions by journey (user_id, journey_timestamp, bus_id), ordered by the times when the bus fares changed, to create a "window" during which each fare is valid. Partitioning by bus_id alone would not be enough, because the join repeats every fare row once per journey, so the "next" row could be the same fare belonging to another journey. The COALESCE hack works around the NULL that LEAD returns for the last fare.
3) We keep the rows where the journey timestamp lies within the "window", and sum the fares for each user with a GROUP BY.
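The "latest price at or before the journey" rule can also be checked outside the database. The sketch below is a Python illustration using binary search over each bus's sorted price changes. It uses the prices from the table in the question (1.3 effective from 00:00), and price_at is a hypothetical helper name.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# (user id, journey timestamp, bus id), as in the question.
journeys = [
    (1, "2018-08-22 00:10:00", 12),
    (1, "2018-08-22 16:10:00", 12),
    (2, "2018-08-22 14:00:00", 23),
]
# (bus id, price change timestamp, price), as in the question's bus table.
prices = [
    (12, "2018-08-22 00:00:00", "1.3"),
    (12, "2018-08-22 00:10:00", "1.5"),
    (12, "2018-08-22 00:20:00", "1.7"),
    (12, "2018-08-22 18:00:00", "2.0"),
    (13, "2018-08-22 00:00:00", "3.0"),
]

# Per bus: price changes sorted by time.
history = defaultdict(list)
for bus_id, ts, price in sorted(prices, key=lambda p: (p[0], p[1])):
    history[bus_id].append((datetime.fromisoformat(ts), Decimal(price)))


def price_at(bus_id, when):
    """Latest price whose change timestamp is <= when, or None if there is none."""
    changes = history.get(bus_id, [])
    idx = bisect_right([ts for ts, _ in changes], when) - 1
    return changes[idx][1] if idx >= 0 else None


spent = defaultdict(Decimal)
for user_id, ts, bus_id in journeys:
    price = price_at(bus_id, datetime.fromisoformat(ts))
    if price is not None:  # like the inner join, skip buses without prices
        spent[user_id] += price

print(dict(spent))  # {1: Decimal('3.2')}
```

User 2 travelled on bus 23, which has no price rows, so that user does not appear in the totals, just as with the inner join.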
I have a table of orders which have a create_date_time (e.g. 02/12/2015 14:00:44).
What I would like to do is group two months' worth of orders by this create_date_time, but instead of using trunc and a calendar day I'd like each day to run from 6am to 6am. I've tried the query below, but it doesn't work that way: it truncates first and then shifts the create_date_time.
select "Date", sum(CFS), sum(MCR) from
(select trunc(phi.create_date_Time)+6/24 as "Date",
case when pkt_sfx = 'CFS' then sum(total_nbr_of_units)
End as CFS,
case when pkt_sfx <> 'CFS' then sum(total_nbr_of_units)
end as MCR
from pkt_hdr ph
inner join pkt_hdr_intrnl phi
on phi.pkt_ctrl_nbr = ph.pkt_ctrl_nbr
where sale_grp = 'I'
group by trunc(phi.create_date_time)+6/24, pkt_sfx
union
select trunc(phi.create_date_Time)+6/24 as "Date",
case when pkt_sfx = 'CFS' then sum(total_nbr_of_units)
End as CFS,
case when pkt_sfx <> 'CFS' then sum(total_nbr_of_units)
end as MCR
from wm_archive.pkt_hdr ph
inner join wm_archive.pkt_hdr_intrnl PHI
on phi.pkt_Ctrl_nbr = ph.pkt_ctrl_nbr
where sale_grp = 'I'
and trunc(phi.create_date_time) >= trunc(sysdate)-60
group by trunc(phi.create_date_time)+6/24, pkt_sfx
)
group by "Date"
Please note the union isn't necessarily important but it is required in the code as half the results will be archived but the current archive day will cause date overlap that must be removed with the outer query.
Thanks
If I understood correctly, you need to subtract six hours from date and then trunc this date:
select trunc(dt-6/24) dt, sum(units) u
from ( select dt, units from t1 union all
select dt, units from t2 )
group by trunc(dt-6/24)
Test:
create table t1 (dt date, units number(5));
insert into t1 values (timestamp '2015-12-01 12:47:00', 7);
insert into t1 values (timestamp '2015-12-01 23:47:00', 7);
create table t2 (dt date, units number(5));
insert into t2 values (timestamp '2015-12-02 05:47:00', 7);
insert into t2 values (timestamp '2015-12-02 14:47:00', 7);
Output:
Dt U
---------- ---
2015-12-01 21
2015-12-02 7
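To see why the order of operations matters, the sketch below compares truncating first and then adding six hours (as in the question) with subtracting six hours and then truncating (as in this answer). It is a Python illustration over the four test rows above. The trunc helper is a hypothetical stand-in for Oracle's TRUNC on a date.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SIX_HOURS = timedelta(hours=6)


def trunc(dt: datetime) -> datetime:
    """Stand-in for Oracle TRUNC(date): drop the time-of-day part."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


# The rows inserted into t1 and t2 in the test above.
rows = [
    (datetime(2015, 12, 1, 12, 47), 7),
    (datetime(2015, 12, 1, 23, 47), 7),
    (datetime(2015, 12, 2, 5, 47), 7),
    (datetime(2015, 12, 2, 14, 47), 7),
]


def totals(bucket) -> dict:
    """Sum units per bucket, where bucket maps a timestamp to its group key."""
    out = defaultdict(int)
    for dt, units in rows:
        out[bucket(dt)] += units
    return dict(out)


# Truncate, then add 6h: still groups by calendar day, only the label moves.
truncate_then_add = totals(lambda dt: trunc(dt) + SIX_HOURS)
# Subtract 6h, then truncate: groups from 06:00 to 06:00.
shift_then_truncate = totals(lambda dt: trunc(dt - SIX_HOURS))

print(truncate_then_add)
print(shift_then_truncate)
```

With truncate-then-add, the 05:47 order on 2 December stays with 2 December (14 units per day). With shift-then-truncate it is counted with 1 December, giving 21 and 7 as in the output above.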
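Just swap the order of TRUNC and the 6-hour shift, and subtract the six hours rather than add them, so that each 6am-to-6am window maps onto a single calendar day. Instead of
select trunc(phi.create_date_Time)+6/24 as "Date"
use
select trunc(phi.create_date_Time - 6/24) as "Date"
(you also need to change the other occurrences of trunc())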
BTW: I'd use another name for the "Date" column - DATE is a SQL data type, so having a column named "Date" is somewhat confusing.