Postgres generate_series() of given length - sql

We can generate a datetime array with specified bin-width using:
select generate_series(
    timestamp without time zone '2020-10-01 00:00:00',
    '2020-10-04 00:00:00',
    '24 hours') as ts

| ts         |
| :--------- |
| 2020-10-01 |
| 2020-10-02 |
| 2020-10-03 |
| 2020-10-04 |
Is it possible to generate an array of set length i.e. a given number of bins/breaks?
I want to provide a date range and number of equal intervals to divide it into.

From the comments:
I want to provide a date range and number of equal intervals to divide it into
You can use generate_series() with integers, and date arithmetics:
with params as (
    select
        timestamp '2020-10-01' ts_start,
        timestamp '2020-10-04' ts_end,
        3 num
)
select ts_start + (ts_end - ts_start) * i / num as ts
from params
cross join lateral generate_series(0, num) s(i)
This splits the given time range into 3 intervals (resulting in a total of 4 timestamps).
Demo on DB Fiddle:
| ts |
| :------------------ |
| 2020-10-01 00:00:00 |
| 2020-10-02 00:00:00 |
| 2020-10-03 00:00:00 |
| 2020-10-04 00:00:00 |
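The arithmetic behind the `ts_start + (ts_end - ts_start) * i / num` expression can be sketched outside SQL as well; here is a minimal Python equivalent (not Postgres itself, just the same interval math):

```python
from datetime import datetime

def split_range(ts_start: datetime, ts_end: datetime, num: int) -> list:
    """Divide [ts_start, ts_end] into `num` equal intervals,
    returning the num + 1 boundary timestamps."""
    step = (ts_end - ts_start) / num          # one equal slice of the range
    return [ts_start + step * i for i in range(num + 1)]

for b in split_range(datetime(2020, 10, 1), datetime(2020, 10, 4), 3):
    print(b)
```

As with the SQL version, asking for 3 intervals yields 4 boundary timestamps, one day apart for this range.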

Related

How to group sum results by date with custom start time in PostgreSQL

I am trying to group my sum results by a custom day in PostgreSQL.
A regular day starts at 00:00, but I would like mine to start at 04:00 am, so an entry with time 2019-01-03 02:23 would count toward '2019-01-02' instead.
Right now my code looks like this:
The bottom part works perfectly for regular 00:00 - 23:59 days, but I would like to group by the custom range created above. I just don't know how to connect those two parts.
with dateRange as (
    select generate_series(
        min(to_date(payments2.paymenttime, 'DD Mon YYYY')) + interval '4 hour',
        max(to_date(payments2.paymenttime, 'DD Mon YYYY')),
        '24 hour') as theday
    from payments2
)
select
    sum(cast(payments2.servicecharge as money)) as total,
    to_date(payments2.paymenttime, 'DD Mon YYYY') as date
from payments2
group by date
The result currently looks like this:
+------------+------------+
| total | date |
+------------+------------+
| 20 | 2019-01-01 |
+------------+------------+
| 60 | 2019-01-02 |
+------------+------------+
| 35 | 2019-01-03 |
+------------+------------+
| 21 | 2019-01-04 |
+------------+------------+
Many thanks for your help.
If I didn't misunderstand your question, you just need to subtract 4 hours from the timestamp before casting it to a date; you don't even need the CTE.
Something like
select
sum(cast(payments2.servicecharge as money)) as total,
(to_timestamp(payments2.paymenttime,'DD Mon YYYY HH24:MI:SS') - interval '4 hours')::date as date
from payments2
group by date
You may need to use a different format in the to_timestamp() function, depending on the format of the payments2.paymenttime string.
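The shift-then-truncate idea is easy to check in isolation; a minimal Python sketch of the same "subtract 4 hours, then take the date" logic:

```python
from datetime import datetime, timedelta

def business_date(ts: datetime, day_start_hour: int = 4):
    """Map a timestamp to its custom day: the day rolls over at 04:00,
    so anything before 04:00 counts toward the previous date."""
    return (ts - timedelta(hours=day_start_hour)).date()

print(business_date(datetime(2019, 1, 3, 2, 23)))   # 2019-01-02
print(business_date(datetime(2019, 1, 3, 4, 0)))    # 2019-01-03
```

This matches the question's example: 2019-01-03 02:23 falls in the custom day '2019-01-02'.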

How to convert a number (YYMMDDhhmmss) into timestamp in Oracle's SQL?

I am trying to calculate a difference between systimestamp and a number in Oracle's SQL.
The number format is YYMMDDhhmmss (For example 190903210000). This number is held in a table and is based on 24 hour timezone.
I am trying to calculate a difference that number and system timestamp in seconds.
I have to use a select sentence as below:
SELECT X FROM table
Then I need to calculate a difference (SYSTEMTIMESTAMP - X)
My system timestamp is formatted like 03-SEP-19 06.21.49.817757 PM +00:00
Could you please advise me on the right approach?
To convert a NUMBER to a TIMESTAMP you can use an expression like TO_TIMESTAMP(TO_CHAR(...)). To compute the number of seconds between two timestamps, one solution is to cast both to dates and subtract them: you get a (decimal) result in days, which can then be converted to seconds.
Consider:
(
CAST(systimestamp AS DATE)
- CAST(TO_TIMESTAMP(TO_CHAR(190903210000), 'YYMMDDhh24miss') AS DATE)
) * 60 * 60 * 24
However, since your numeric representation of the timestamp does not contain fractional seconds (nor timezone), it would probably be simpler to convert directly to a DATE, which would remove the need to CAST it afterwards, hence:
(
CAST(systimestamp AS DATE)
- TO_DATE(TO_CHAR(190903210000), 'YYMMDDhh24miss')
) * 60 * 60 * 24
Demo on DB Fiddle:
SELECT
systimestamp,
190903210000 num,
TO_TIMESTAMP(TO_CHAR(190903210000), 'YYMMDDhh24miss') num_as_timestamp,
(
CAST(systimestamp AS DATE) -
CAST(TO_TIMESTAMP(TO_CHAR(190903210000), 'YYMMDDhh24miss') AS DATE)
) * 60 * 60 * 24 diff
FROM DUAL;
| SYSTIMESTAMP                        |          NUM | NUM_AS_TIMESTAMP                | DIFF |
| :---------------------------------- | -----------: | :------------------------------ | ---: |
| 03-SEP-19 09.13.39.989343 PM +01:00 | 190903210000 | 03-SEP-19 09.00.00.000000000 PM |  819 |
SELECT
systimestamp,
190903210000 num,
TO_DATE(TO_CHAR(190903210000), 'YYMMDDhh24miss') num_as_date,
(
CAST(systimestamp AS DATE)
- TO_DATE(TO_CHAR(190903210000), 'YYMMDDhh24miss')
) * 60 * 60 * 24 diff
FROM DUAL;
| SYSTIMESTAMP                        |          NUM | NUM_AS_DATE        |                                      DIFF |
| :---------------------------------- | -----------: | :----------------- | ----------------------------------------: |
| 03-SEP-19 09.20.44.524445 PM +01:00 | 190903210000 | 03-SEP-19 21:00:00 | 1243.999999999999999999999999999999999996 |
Try this:
select 190903210000 n,
to_timestamp(to_char(190903210000), 'YYMMDDHH24MISS') ts
from dual;
Demo in SQLFiddle; see also the Oracle documentation.
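The parsing and the seconds arithmetic can be sanity-checked outside Oracle; a small Python sketch of the same YYMMDDhhmmss conversion (the format string differs, but the logic mirrors TO_DATE plus the days-to-seconds multiplication):

```python
from datetime import datetime

def num_to_datetime(n: int) -> datetime:
    """Parse a YYMMDDhhmmss number such as 190903210000."""
    return datetime.strptime(str(n), "%y%m%d%H%M%S")

def seconds_since(n: int, now: datetime) -> float:
    """Difference in seconds between `now` and the encoded timestamp."""
    return (now - num_to_datetime(n)).total_seconds()

print(num_to_datetime(190903210000))  # 2019-09-03 21:00:00
```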

Summing counts based on overlapping intervals in postgres

I want to sum the column over every two-minute interval (so the sum of minutes 1 and 2, then 2 and 3, then 3 and 4, etc.), but I'm not exactly sure how to go about doing that.
My data looks something like:
| minute           | source | count |
| :--------------- | :----- | ----: |
| 2018-01-01 10:00 | a      |     7 |
| 2018-01-01 10:01 | a      |     5 |
| 2018-01-01 10:02 | a      |    10 |
| 2018-01-01 10:00 | b      |    20 |
| 2018-01-01 10:05 | a      |    12 |
What I want
(e.g. row1+row2, row2+row3, row3, row4, row5)

| minute           | source | count |
| :--------------- | :----- | ----: |
| 2018-01-01 10:00 | a      |    12 |
| 2018-01-01 10:01 | a      |    15 |
| 2018-01-01 10:02 | a      |    10 |
| 2018-01-01 10:00 | b      |    20 |
| 2018-01-01 10:05 | a      |    12 |
You can use a correlated subquery selecting the sum of the counts for the records in the interval sharing the source. (I assume matching source is a requirement; if not, just remove that comparison from the WHERE clause.)
SELECT "t1"."minute",
"t1"."source",
(SELECT sum("t2"."count")
FROM "elbat" "t2"
WHERE "t2"."source" = "t1"."source"
AND "t2"."minute" >= "t1"."minute"
AND "t2"."minute" <= "t1"."minute" + INTERVAL '1 MINUTE') "count"
FROM "elbat" "t1";
SQL Fiddle
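To make the correlated-subquery logic concrete, here is a small Python sketch of the same per-row computation over the sample data (each row sums same-source counts whose minute falls within the next minute, inclusive):

```python
from datetime import datetime, timedelta

rows = [
    (datetime(2018, 1, 1, 10, 0), "a", 7),
    (datetime(2018, 1, 1, 10, 1), "a", 5),
    (datetime(2018, 1, 1, 10, 2), "a", 10),
    (datetime(2018, 1, 1, 10, 0), "b", 20),
    (datetime(2018, 1, 1, 10, 5), "a", 12),
]

def two_minute_sums(rows):
    """For each row, sum the counts of same-source rows whose minute lies
    in [minute, minute + 1] -- the correlated subquery's WHERE clause."""
    out = []
    for m, src, _ in rows:
        total = sum(c for m2, s2, c in rows
                    if s2 == src and m <= m2 <= m + timedelta(minutes=1))
        out.append((m, src, total))
    return out

for r in two_minute_sums(rows):
    print(r)
```

The totals come out as 12, 15, 10, 20, 12, matching the expected result table in the question.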
The post above assumes all the timestamps fall on exact minutes. If you want to check every 2 minutes throughout the day, you can use the generate_series function. The issue with including both the beginning minute and the ending time in each interval is that source b will have 2 rows in the results.
i.e.:
select begintime,
endtime,
source,
sum(count)
from mytable
inner join (
select begintime, endtime
from (
select lag(time, 1) over (order by time) as begintime,
time as endtime
from (
select *
from generate_series('2018-01-01 00:00:00', '2018-01-02 00:00:00', interval '2 minutes') time
) q
) q2
where begintime is not null
) times on minute between begintime and endtime
group by begintime, endtime, source
order by begintime, endtime, source
You can change the 'minute between begintime and endtime' condition to 'minute > begintime and minute <= endtime' if you don't want that overlap.

Convert timestamp value from string to timestamp hive

I have timestamp value stored as string in my table created in hive, and want to convert it to the timestamp type.
I tried the following code:
select date_value, FROM_UNIXTIME(UNIX_TIMESTAMP(date_value, 'dd-MMM-YY HH.mm.ss')) from sales limit 2;
Original time and result is as following:
| Original time      | Result              |
| :----------------- | :------------------ |
| 07-NOV-12 17.07.03 | 2012-01-01 17:07:03 |
| 25-FEB-13 04.26.53 | 2012-12-30 04:26:53 |
What's wrong in my script?
Use yy instead of YY. (Hive's date functions delegate to Java's SimpleDateFormat, where capital Y means week-year rather than calendar year, which is why the dates shifted.)
select date_value
,FROM_UNIXTIME(UNIX_TIMESTAMP(date_value, 'dd-MMM-yy HH.mm.ss')) as ts
from sales
;
+--------------------+---------------------+
| date_value | ts |
+--------------------+---------------------+
| 07-NOV-12 17.07.03 | 2012-11-07 17:07:03 |
| 25-FEB-13 04.26.53 | 2013-02-25 04:26:53 |
+--------------------+---------------------+
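The same pitfall can be demonstrated outside Hive; a quick Python check of the intended calendar-year parse (Python's strptime has no week-year pattern, so %y here corresponds to Hive's lowercase yy):

```python
from datetime import datetime

# The same sample strings from the question, parsed with the
# calendar-year directive %y (the equivalent of Hive's 'yy'):
for s in ("07-NOV-12 17.07.03", "25-FEB-13 04.26.53"):
    print(datetime.strptime(s, "%d-%b-%y %H.%M.%S"))
```

Both timestamps come out with the correct month and day, unlike the week-year results in the question.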

Filling Out & Filtering Irregular Time Series Data

Using Postgresql 9.4, I am trying to craft a query on time series log data that logs new values whenever the value updates (not on a schedule). The log can update anywhere from several times a minute to once a day.
I need the query to accomplish the following:
Filter too much data by just selecting the first entry for the timestamp range
Fill in sparse data by using the last reading for the log value. For example, suppose I am grouping the data by hour, there was an entry at 8am with a log value of 10, and the next entry isn't until 11am with a log value of 15. I would want the query to return something like this:
| Timestamp        | Value |
| :--------------- | ----: |
| 2015-07-01 08:00 |    10 |
| 2015-07-01 09:00 |    10 |
| 2015-07-01 10:00 |    10 |
| 2015-07-01 11:00 |    15 |
I have got a query that accomplishes the first of these goals:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
)
select
time_range.hour,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;
But I can't figure out how to fill in the nulls where there is no value. I tried using the lag() feature of Postgresql's Window functions, but it didn't work when there were multiple nulls in a row.
Here's a SQLFiddle that demonstrates the issue:
http://sqlfiddle.com/#!15/f4d13/5/0
Your columns are log_hour and first_value:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
),
base as (
select
time_range.hour lh,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1)
SELECT
log_hour, log_val, value_partition, first_value(log_val) over (partition by value_partition order by log_hour)
FROM (
SELECT
date_trunc('hour', base.lh) as log_hour,
log_val,
sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
FROM base) as q
UPDATE
This is what your query returns:

| Timestamp        | Value |
| :--------------- | ----: |
| 2015-07-01 01:00 |    10 |
| 2015-07-01 02:00 |  null |
| 2015-07-01 03:00 |  null |
| 2015-07-01 04:00 |    15 |
| 2015-07-01 05:00 |  null |
| 2015-07-01 06:00 |    19 |
| 2015-07-01 08:00 |    13 |
I want this result set to be split into groups, each starting at a non-null value, like this:
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null
---
2015-07-01 04:00 | 15
2015-07-01 05:00 | null
---
2015-07-01 06:00 | 19
---
2015-07-01 08:00 | 13
and to assign to every row in a group the value of the first row from that group (done by the last select).
In this case, a method for obtaining the grouping is to create a column which holds the count of non-null values seen up to the current row, and to partition by this value (the sum(case) expression):
| value | sum(case) |
| ----: | --------: |
|    10 |         1 |
|  null |         1 |
|  null |         1 |
|    15 |         2 | <-- new non-null, increment
|  null |         2 |
|    19 |         3 | <-- new non-null, increment
|    13 |         4 | <-- new non-null, increment
and now I can partition by sum(case).
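The running-count-of-non-nulls trick amounts to a forward fill; a minimal Python sketch of the same sum(case)/first_value() logic (not the SQL itself, just the carrying rule):

```python
def fill_forward(values):
    """Carry the last non-null value forward: rows sharing the same
    running count of non-null values form one partition, and every row
    in the partition takes that partition's first (non-null) value."""
    out, current = [], None
    for v in values:
        if v is not None:
            current = v          # a new non-null starts a new partition
        out.append(current)
    return out

print(fill_forward([10, None, None, 15, None, 19, 13]))
# [10, 10, 10, 15, 15, 19, 13]
```

Applied to the result set above, the gaps at 02:00, 03:00, and 05:00 are filled with the preceding readings 10, 10, and 15.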