How do I give the condition to group by time period? - SQL

I need to get the count of records in PostgreSQL from 7:00:00 AM until 6:59:59 AM the next day, with the count resetting again at 7:00 AM.
The backend is Java (Spring Boot).
The columns in my table are
id (primary key)
createdon (timestamp)
name
department
createdby
How do I write the condition for counting shift-wise?

You'd need to pick a slice based on the current time of day (I assume this is some kind of counter that is auto-refreshed in some application).
One way to do that is using time ranges:
SELECT COUNT(*)
FROM mytable
WHERE createdon <@ (
    SELECT CASE
        WHEN current_time < '07:00'::time THEN
            tsrange(CURRENT_DATE - '1d'::interval + '07:00'::time,
                    CURRENT_DATE + '07:00'::time, '[)')
        ELSE
            tsrange(CURRENT_DATE + '07:00'::time,
                    CURRENT_DATE + '1d'::interval + '07:00'::time, '[)')
    END
);
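The <@ containment operator tests whether createdon falls inside the selected range: before 07:00 the CASE picks the range from yesterday 07:00 to today 07:00, otherwise the range from today 07:00 to tomorrow 07:00.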
Example with data: https://rextester.com/LGIJ9639

As I understand the question, you need to have a separate group for values in each 24-hour period that starts at 07:00:00.
SELECT
    (
        date_trunc('day', (createdon - '7h'::interval))
        + '7h'::interval
    ) AS date_bucket,
    count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
This uses the date and time functions and the GROUP BY clause:
- Shift the timestamp value back 7 hours ((createdon - '7h'::interval)), so the distinction can be made by a change of date (at 00:00:00). Then,
- Truncate the value to the date (date_trunc('day', …)), so that all values in a bucket are flattened to a single value (the date at midnight). Then,
- Add 7 hours again to the value (… + '7h'::interval), so that it represents the starting time of the bucket. Then,
- Group by that value (GROUP BY date_bucket).
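As a quick check of the logic, a record created at 2019-03-07 03:30:00 lands in the bucket that started the previous morning:
SELECT date_trunc('day', timestamp '2019-03-07 03:30:00' - interval '7 hours')
       + interval '7 hours' AS date_bucket;
-- returns 2019-03-06 07:00:00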
A more complete example, with schema and data:
DROP TABLE IF EXISTS lorem;
CREATE TABLE lorem (
    id serial PRIMARY KEY,
    createdon timestamp not null
);
INSERT INTO lorem (createdon) (
    SELECT generate_series(
        CURRENT_TIMESTAMP - '36h'::interval,
        CURRENT_TIMESTAMP + '36h'::interval,
        '45m'::interval)
);
Now the query:
SELECT
    (
        date_trunc('day', (createdon - '7h'::interval))
        + '7h'::interval
    ) AS date_bucket,
    count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
;
produces this result:
date_bucket | count
---------------------+-------
2019-03-06 07:00:00 | 17
2019-03-07 07:00:00 | 32
2019-03-08 07:00:00 | 32
2019-03-09 07:00:00 | 16
(4 rows)

You can use aggregation -- by subtracting 7 hours:
select (createdon - interval '7 hour')::date as dy, count(*)
from t
group by dy
order by dy;
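Note that the bucket label dy here is the shifted date (at midnight), not the 7:00 shift start. A variant (a sketch on the same assumed table t) that restores the start time as the label:
select (createdon - interval '7 hour')::date + interval '7 hour' as shift_start, count(*)
from t
group by shift_start
order by shift_start;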

Related

Extract previous row calculated value for use in current row calculations - Postgres

I have a requirement where I need to carry the calculated value of the previous row into the calculation for the current row.
The following is a sample of how the data currently looks:
ID | Date       | Days
1  | 2022-01-15 | 30
2  | 2022-02-18 | 30
3  | 2022-03-15 | 90
4  | 2022-05-15 | 30
The following is the output I am expecting:
ID | Date       | Days | CalVal
1  | 2022-01-15 | 30   | 2022-02-14
2  | 2022-02-18 | 30   | 2022-03-16
3  | 2022-03-15 | 90   | 2022-06-14
4  | 2022-05-15 | 30   | 2022-07-14
The value of CalVal for the first row is Date + Days.
From the second row onwards, it should take the CalVal value of the previous row and add the current row's Days to it.
Essentially, what I am looking for is a means to access the previous row's calculated value for use in the current row.
Is there any way we can achieve the above via Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
    id,
    date,
    coalesce(
        days - lag(days, 1) over (order by date, days),
        days
    ) as days,
    first_date + cast(days as integer) as newdate
from (
    select
        -- get a running sum of days
        id,
        first_date,
        date,
        sum(days) over (order by date, days) as days
    from (
        select
            -- get the first date
            id,
            (select min(date) from table1) as first_date,
            date,
            days
        from table1
    ) A
) B
This query gets the exact output you described. I'm not at all ready to say it is the best solution, but the strategy employed is essentially to create a running total of the days: we can just add this running total to the first date, and that will always be the next date in the desired sequence. One finesse: to put the original days values back into the result, we calculate the current running total less the previous running total to arrive at the original amount.
Assuming that the table name is table1:
select
    id,
    date,
    days,
    first_value(date) over (order by id)
        + (sum(days) over (order by id rows between unbounded preceding and current row))
          * interval '1 day' as calval
from table1;
We just add the cumulative sum of days to the first date in the table. It's not literally what you asked for (we don't need the date from the previous row, just the cumulative sum of days), but it gives the same output.
A solution with recursion:
with recursive prev_row as (
    select id, date, days, date + days * interval '1 day' as calval
    from table1
    where id = 1
    union all
    select t.id, t.date, t.days, p.calval + t.days * interval '1 day' as calval
    from prev_row p
    join table1 t on t.id = p.id + 1
)
select *
from prev_row;
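Note that the join condition t.id = p.id + 1 assumes the id values are consecutive with no gaps; if they are not, you would need to join on a row number instead.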

Create a temporary SQL table using recursion as a loop to populate a custom time interval

Suppose you have a table like:
id | subscription_start | subscription_end | segment
1  | 2016-12-01         | 2017-02-01       | 87
2  | 2016-12-01         | 2017-01-24       | 87
...
And wish to generate a temporary table with months.
One way would be to encode the month date as:
with months as (
select
'2016-12-01' as 'first',
'2016-12-31' as 'last'
union
select
'2017-01-01' as 'first',
'2017-01-31' as 'last'
...
) select * from months;
So that I have an output table like:
first_day  | last_day
2017-01-01 | 2017-01-31
2017-02-01 | 2017-02-28
2017-03-01 | 2017-03-31
I would like to generate a temporary table with a custom interval (as above), without manually encoding all the dates.
Say the interval is 12 months per year, for as many years as there are in the db.
I'd like a general approach to compute the months table with the same output as above.
Alternatively, one may adjust the range to a custom interval (months split a year into 12 parts, but one may want to split time into a custom interval of days).
To start, I was thinking to use recursive query like:
with months(id, first_day, last_day, month) as (
select
id,
first_day,
last_day,
0
where
subscriptions.first_day = min(subscriptions.first_day)
union all
select
id,
first_day,
last_day,
months.month + 1
from
subscriptions
left join months on cast(
strftime('%m', datetime(subscriptions.subscription_start)) as int
) = months.month
where
months.month < 13
)
select
*
from
months
where
month = 1;
but it does not do what I'd expect: here I was attempting to select the first row from the table with the minimum date, and populate a table at intervals of months, ranging from 1 to 12. For each month, I was comparing the string date field of my table (e.g. 2017-03-01 = 3 is March).
The query above does not work and also seems a bit complicated; for the sake of learning, which alternative would you propose to create a temporary table months without manually coding the intervals?
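A minimal sketch of the recursive approach, assuming SQLite (the strftime call above suggests it), a table named subscriptions, and a hardcoded anchor month (it could instead be seeded from min(subscription_start)):
with recursive months(first_day, last_day) as (
    -- anchor: the first month of interest (assumed here)
    select date('2016-12-01'),
           date('2016-12-01', '+1 month', '-1 day')
    union all
    -- step: advance one month at a time from the previous first_day
    select date(first_day, '+1 month'),
           date(first_day, '+2 months', '-1 day')
    from months
    where first_day < (select max(subscription_end) from subscriptions)
)
select * from months;
Stepping from the first of each month keeps the date arithmetic safe, since SQLite normalizes overflowing dates such as Jan 31 + 1 month.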

How to use BigQuery Analytic Functions to calculate time between timestamped rows?

I have a data set that represents analytics events like:
Row | timestamp               | account_id | type
1   | 2018-11-14 21:05:40 UTC | abc        | start
2   | 2018-11-14 21:05:40 UTC | xyz        | another_type
3   | 2018-11-26 22:01:19 UTC | xyz        | start
4   | 2018-11-26 22:01:23 UTC | abc        | start
5   | 2018-11-26 22:01:29 UTC | xyz        | some_other_type
11  | 2018-11-26 22:13:58 UTC | xyz        | start
...
There are some number of account_ids, and I need to find the average time between start records per account_id.
I'm trying to use analytic functions as described here. My end goal would be a table like:
Row | account_id | avg_time_between_events_mins
1   | xyz        | 53
2   | abc        | 47
3   | pqr        | 65
...
My best attempt, based on this post, looks like this:
WITH
events AS (
SELECT
COUNTIF(type = 'start' AND account_id='abc') OVER (ORDER BY timestamp) as diff,
timestamp
FROM
`myproject.dataset.events`
WHERE
account_id='abc')
SELECT
min(timestamp) AS start_time,
max(timestamp) AS next_start_time,
ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
events
GROUP BY
diff
This calculates the time between each start event and the last non-start event prior to the next start event for a specific account_id.
I tried to use PARTITION and a WINDOW FRAME CLAUSE like this:
WITH
events AS (
SELECT
COUNT(*) OVER (PARTITION BY account_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) as diff,
timestamp
FROM
`myproject.dataset.events`
WHERE
type = 'start')
SELECT
min(timestamp) AS start_time,
max(timestamp) AS next_start_time,
ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
events
GROUP BY
diff
But I got a nonsense result table. Can anyone walk me through how I would write and reason about a query like this?
You don't really need analytic functions for this:
select account_id,
       timestamp_diff(max(timestamp), min(timestamp), MINUTE) / nullif(count(*) - 1, 0) as avg_time_between_events_mins
from `myproject.dataset.events`
where type = 'start'
group by account_id;
This is the timestamp of the most recent start minus the oldest, divided by one less than the number of starts. That is the average time between the starts.
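Since the question asks specifically about analytic functions, a LAG()-based sketch (same table, not from the original answer) computes essentially the same averages by diffing each start against the previous one per account:
SELECT account_id,
       AVG(diff_minutes) AS avg_time_between_events_mins
FROM (
    SELECT account_id,
           -- minutes since the previous start for the same account
           TIMESTAMP_DIFF(
               timestamp,
               LAG(timestamp) OVER (PARTITION BY account_id ORDER BY timestamp),
               MINUTE) AS diff_minutes
    FROM `myproject.dataset.events`
    WHERE type = 'start'
)
WHERE diff_minutes IS NOT NULL
GROUP BY account_id;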

PostgreSQL sum of intervals

In my database I have rows like:
date , value
16:13:00, 500
16:17:00, 700
16:20:00, 0
Now I want to do a "special sum" over value from 16:00:00 to 17:00:00. So until 16:13 we assume the value is 0.
So special sum would look like (I'll omit seconds):
...
0 + -- (16:12)
500 + -- (16:13)
500 + -- (16:14)
500 + -- (16:15)
500 + -- (16:16)
700 + -- (16:17)
700 + -- (16:18)
700 + -- (16:19)
0 + -- (16:20)
...
So I have in the database only the changes of value and when each change occurs, and I want to sum over the whole hour. The result of this should be 4100.
What is the optimal way of doing that kind of sum in SQL with PostgreSQL?
Best
You could first select only the hour of your timestamp and then group by this hour:
SELECT
sum(s.value),
s.hour
FROM
(SELECT
value,
EXTRACT(HOUR FROM time) as hour
FROM la_table) as s
GROUP BY s.hour
This way you would just get values from 15:00:00 to 15:59:59 of course.
SQLFiddle for playing: http://sqlfiddle.com/#!1/d6ad1/1
If I've understood you correctly, you are simply looking for totals per hour?
SELECT EXTRACT(hour FROM "date") hr,
SUM(value) total
FROM yourtable
GROUP BY hr
ORDER BY hr;
If I understand, you wish to select all entries that occur within a time period. In this example, any row with a time value within 10:00 and 11:00 is selected if it starts before or during the period, or if it ends during or after the period.
select * from table
where table.start_time < end_of_period and table.end_time > start_of_period
select * from table
where (start_time < '2017-05-16 11:00:00') and (end_time > '2017-05-16 10:00:00')
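For the duration-weighted sum the asker describes (each value counted once per minute it is in effect, totalling 4100), a minimal sketch using lead(), assuming the change times live in a time column named t (hypothetical name) and that the value before the first change in the hour is 0:
SELECT sum(value * (EXTRACT(EPOCH FROM next_t - t) / 60)) AS special_sum
FROM (
    SELECT t,
           -- each value stays in effect until the next change, or until 17:00
           lead(t, 1, '17:00:00'::time) OVER (ORDER BY t) AS next_t,
           value
    FROM la_table
    WHERE t >= '16:00:00'::time AND t < '17:00:00'::time
) s;
With the sample data this yields 500 * 4 + 700 * 3 + 0 * 40 = 4100.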

Modulo Time in SQL Server 2005 - Return data every n hours

I have something like this:
SELECT *
FROM (
    SELECT prodid, date, time, tmp, rowid
    FROM live_pilot_plant
    WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
                   AND CONVERT(DATETIME, '3/31/2012', 101)
) b
WHERE b.rowid % 400 = 0
FYI: the reason for the CONVERT in the WHERE clause is that my date is stored as a varchar(10); I had to convert it to datetime in order to get the correct range of data. (I tried a bunch of different things and this worked.)
I'm wondering how I can return the data I want every 4 hours during those selected dates. I have data collected approximately every 5 seconds, with some breaks in the data - i.e. data wasn't collected during a 2-hour period, but then continues at 5-second increments.
In my example I just used a modulo with my rowid - and the syntax works, but as I mentioned above there are some periods where data isn't collected, so logic like "data arrives every 5 seconds, so multiply to find how many rows span 4 hours" won't work.
My time column is a varchar column and is in the form hh:mm:ss
My ideal output is:
| prodid | date | time | tmp |
| 4 | 3/19/2012 | 10:00:00 | 2.3 |
| 7 | 3/19/2012 | 14:00:24 | 3.2 |
As you can see, the times can be a bit off (in terms of seconds) - I mostly need the approximate value in terms of time.
Thank you in advance.
This should work
select prodid, date, time, tmp, rowid
from live_pilot_plant as lpp
inner join (
    select min(prodid) as prodid -- is prodid your PK? if not, change it to rowid or whatever your PK is
    from live_pilot_plant
    where date between CONVERT(DATETIME, '3/19/2012', 101) -- or whatever range you want;
                   and CONVERT(DATETIME, '3/31/2012', 101) -- filtering in the inner select is better for performance
    group by date,
        floor( -- floor does the trick
            convert(float, convert(datetime, time)) -- assumes "time" is a varchar like '19:23:05'
            * 6 -- 6 comes from 24 hours / 4 hours
        )
) as filter on lpp.prodid = filter.prodid -- if prodid is not the PK, also correct it here
A side note for everyone else who has date + time data in a single datetime field, say named when_it_was: the GROUP BY can be as simple as:
group by floor(convert(float, when_it_was) * 6) -- again, 6 comes from 24/4; the datetime must be converted to float before multiplying
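To see why this works (a hypothetical check, not from the original answer): converting a datetime to float yields days since 1900-01-01, so multiplying by 6 counts 4-hour units, and FLOOR gives a stable block number:
-- both timestamps fall in the same 08:00-12:00 block of the same day,
-- so the two expressions return the same integer
SELECT FLOOR(CONVERT(float, CONVERT(datetime, '2012-03-19 10:00:00')) * 6) AS block_a,
       FLOOR(CONVERT(float, CONVERT(datetime, '2012-03-19 11:59:00')) * 6) AS block_b;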
Something along the lines of the following should work: basically, create date + time partitions, each partition representing a block of 4 hours, and pick the first record from each partition.
select * from (
    select *,
           row_number() over (
               partition by date, cast(left(time, charindex(':', time) - 1) as int) / 4
               order by date, time
           ) as ranker
    from live_pilot_plant
) Z where ranker = 1
Assuming rowid is a PK and increases with date/time: just convert the time field to a 4-hour interval number, convert(int, substring(time, 1, 2)) / 4, and select min(rowid) from each of the 4-hour groups in a day:
select prodid, date, time, tmp, rowid
from live_pilot_plant
where rowid in (
    select min(rowid)
    from live_pilot_plant
    where CONVERT(DATETIME, date, 101) between CONVERT(DATETIME, '3/19/2012', 101)
                                           and CONVERT(DATETIME, '3/31/2012', 101)
    group by date, convert(int, substring(time, 1, 2)) / 4
)
order by CONVERT(DATETIME, date, 101), time