Getting random time interval in PostgreSQL - sql

I need a random time interval between 0 and (10 days and 5 hours).
My code:
select random() * (interval '10 days 5 hours')
from generate_series(1, 50)
It works as it should, except for a few strange results, like:
0 years 0 mons 7 days 26 hours 10 mins 1.353353 secs
The problem is the 26 hours; it shouldn't be more than 23. Also, I never get 10 days, which I'd like to.

Intervals in Postgres are quite flexible, so hour values greater than 23 do not necessarily roll over to days. Use justify_interval() to return them to the normal "days" and "hours".
So:
select justify_interval(random() * interval '10 day 5 hour')
from generate_series(1, 200)
order by 1 desc;
will return values with appropriate values for days, hours, minutes, and seconds.
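To make the rollover concrete, here is a minimal Python sketch of the hours-to-days part of that normalization. The helper name `justify_hours` is borrowed from the Postgres function for illustration only; this is not how Postgres implements it, and justify_interval() additionally folds 30-day periods into months, which the sketch ignores.

```python
def justify_hours(days: int, hours: int) -> tuple[int, int]:
    """Roll every full 24 hours over into the days field (illustrative only)."""
    extra_days, remaining_hours = divmod(hours, 24)
    return days + extra_days, remaining_hours


# The strange result from the question: 7 days 26 hours
print(justify_hours(7, 26))  # (8, 2)
```

So "7 days 26 hours" and "8 days 2 hours" describe the same length of time; only the presentation differs.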
Now, why aren't you getting intervals with more than 10 days? This is simple randomness. If you increase the number of rows to 200 (as above), you'll see them (in all likelihood). If you run the code multiple times, sometimes you'll see none in that range; sometimes you'll see two.
Why? You are asking how often you get a value of 240+ hours in a range of 245 hours. Those top 5 hours account for about 2% of the range (about 1/50). In other words, a sample of 50 is not big enough -- any given sample of 50 random values is likely to be missing one or more 5-hour ranges.
Plus, without justify_interval(), you are likely to miss those anyway because they may show up as 9 days with an hours component larger than 23.
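As a rough check of that reasoning, the chance of a sample containing no value at all in the top 5 hours can be worked out directly. This is a back-of-the-envelope Python sketch that assumes the values are independent and uniformly distributed over the 245 hours; the function name is made up for illustration.

```python
TOP_FRACTION = 5 / 245  # share of the range above 10 days (240 hours)


def chance_of_missing(sample_size: int) -> float:
    """Probability that none of sample_size uniform draws lands in the top 5 hours."""
    return (1 - TOP_FRACTION) ** sample_size


print(f"50 rows:  {chance_of_missing(50):.3f}")
print(f"200 rows: {chance_of_missing(200):.3f}")
```

With 50 rows the chance of seeing nothing above 10 days comes out at roughly one in three, while with 200 rows it drops below 2%.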

Try this:
select justify_hours(random() * (interval '245 hours'))
FROM generate_series(1, 50)
See Postgres Documentation for an explanation of the justify_* functions.

One option would be to use an interval of one hour and multiply it by 245 and by a random number between 0 and 1, once for each row of the series:
select random() * 245 * interval '1 hour'
from generate_series(1, 50);
I can see that the other answers suggest using justify_interval. If you just want a series of intervals between 0 and 245 hours (245 hours corresponding to 10 days and 5 hours), then my answer should suffice.

Related

Converting duration in varchar to number type and minutes

I'm struggling with this.
I have a column in Snowflake called DURATION, it is VARCHAR type.
The values include basically number in days, hours, minutes, seconds. The value could include either just the number with one unit of time (day or hour or minute or second) such as 3 hours or 14 minutes or 3 seconds or it could include the combination of either all units of time or a few such as 1 day 3 hours 35 minutes or 1 hour 9 minutes or 45 minutes 1 second.
The value could also be blank or invalid such as text or it could be indicating day, hour or minute but without a number (see the last 3 rows in the table below).
I would greatly appreciate it if you guys could help me with the following:
in SNOWFLAKE, convert all valid values to number type and normalize them to minutes (e.g. the resulting value for 7 Hours and 13 Minutes would be 433).
Thanks a lot, guys!
DURATION
1 Second
10 Seconds
1 Minute
3 Minutes
20 Minutes
1 Hour
2 Hours
7 Hours 13 Minutes
1 Hour 1 Minute
1 Day
1 Day 1 Hour
1 Day 1 Hour 1 Minute
1 Day 10 Hours
2 Days 1 Hour
3 Days 9 Hours
1 Day 3 Hours 45 Minutes
Duration (invalid)
Days
Day Minute
Minutes
I tried many things using regex_substr, try_to_number, coalesce functions in CASE statements but I'm getting either 0s or NULL for all values. Very frustrating
I think you would want to use STRTOK_TO_ARRAY in a CTE subquery or put into a temp table. Then you could use ARRAY_POSITION to find the labels and the index one less than the label should be the value. Those values could be put into separate columns with a case for each label pulling the found values. The case statements could be computed columns if you insert the results of the first query into a table. From there you can concatenate colons and cast to a time type and use datediff, or do the arithmetic to calculate the minutes.
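The tokenize-then-pair idea can be sketched outside Snowflake to check the arithmetic. The Python below is only an illustration of the logic, not Snowflake code: the function name and the unit table are assumptions, and it treats any value that is not a clean sequence of "number label" pairs as invalid.

```python
from typing import Optional

# Minutes per unit; labels are matched case-insensitively, singular or plural.
MINUTES_PER_UNIT = {"day": 1440, "hour": 60, "minute": 1, "second": 1 / 60}


def duration_to_minutes(text: Optional[str]) -> Optional[float]:
    """Return the duration in minutes, or None for blank or invalid values."""
    tokens = (text or "").split()
    # A valid value is a non-empty list of (number, label) pairs.
    if not tokens or len(tokens) % 2 != 0:
        return None
    total = 0.0
    for number, label in zip(tokens[0::2], tokens[1::2]):
        unit = label.lower().rstrip("s")
        if not number.isdecimal() or unit not in MINUTES_PER_UNIT:
            return None
        total += int(number) * MINUTES_PER_UNIT[unit]
    return total


print(duration_to_minutes("7 Hours 13 Minutes"))  # 433.0
print(duration_to_minutes("Day Minute"))          # None
```

Rows such as "Days" or "Duration (invalid)" fall out as None, which would correspond to NULL in the table.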

Optimization: How to get TimeId from time for each minute in a week?

I am creating a table which will have 2 columns:
Day_time (time from 1978-01-01 00:00:00 Sunday, till 1978-01-07 23:59:00.0 Saturday, Granularity: Minute)
Time_id (a unique id for each minute), to be populated
I have column one populated. I want to populate column two.
How I am doing it right now:
EXTRACT(dayofweek FROM day_time) * 10000 + DATEDIFF('minutes', TRUNC(day_time), day_time)
I basically want a function where I pass any date and it tells me where I am in a week. So, I need a function, just like the function above. Just more optimized, where I give a date and get a unique ID. The unique ID should repeat weekly.
Example: ID for Jan 1, 2015 00:00:00 will be same as Jan 8, 2015 00:00:00.
Why 1978-01-01? cuz it starts from a Sunday.
Why 10,000? cuz the number of minutes in a day are in four digits.
You can do it all in one fell swoop, without needing to extract the date separately:
SELECT DATEDIFF('minutes', date_trunc('week',day_time), day_time)
I'd expect this to be marginally faster.
Another approach that I'd expect to be significantly faster would be converting the timestamp to epoch, dividing by 60 to get minutes from epoch and then taking the value modulus of 10,080 (for 60 * 24 * 7 minutes in a week).
SELECT (extract(epoch from day_time) / 60) % 10080
If you don't care about the size of the weekly index, you could also skip the division step altogether, which should make it faster still:
SELECT (extract(epoch from day_time)) % 604800
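The epoch-modulus arithmetic is easy to sanity-check outside the database. The following Python sketch assumes timestamps are in UTC and uses a made-up function name; note that because 1970-01-01 was a Thursday, the ID counts minutes from Thursday 00:00 rather than from Sunday, which still satisfies the "repeats weekly" requirement.

```python
from datetime import datetime, timezone

MINUTES_PER_WEEK = 60 * 24 * 7  # 10080


def minute_of_week_id(moment: datetime) -> int:
    """Minutes since the Unix epoch, wrapped to one week (illustrative only)."""
    epoch_minutes = int(moment.timestamp()) // 60
    return epoch_minutes % MINUTES_PER_WEEK


first = minute_of_week_id(datetime(2015, 1, 1, tzinfo=timezone.utc))
second = minute_of_week_id(datetime(2015, 1, 8, tzinfo=timezone.utc))
print(first == second)  # True
```

This matches the example in the question: Jan 1, 2015 00:00:00 and Jan 8, 2015 00:00:00 get the same ID.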

Updating dates with random time

I have a datetime column, all of them at 12:00 am. Is there a way to update them with random hours, minutes to nearest 1/2 hour while keeping the same date(day) value?
Update Activities set ActivityDate = ....
Here's one option using dateadd:
update Activities
set ActivityDate = DateAdd(minute,
30 * (abs(checksum(NewId())) % 48), ActivityDate);
SQL Fiddle Demo
And here's a good post about generating random numbers. Using that, multiply by 30 minutes to get to the nearest half hour.
Note, this uses % 48 since there are 1440 minutes in a day -- that divides into 48 potential half hour segments (0 through 47) in that same day. The largest offset is then 47 * 30 = 1410 minutes, i.e. 23:30 on the same date.
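The slot arithmetic can be checked with a small Python sketch. It is not T-SQL and the function name is hypothetical; it only shows that picking one of the 48 half-hour slots (numbered 0 to 47) and multiplying by 30 minutes always stays on the same date.

```python
import random
from datetime import datetime, timedelta

HALF_HOUR_SLOTS = 48  # 1440 minutes in a day / 30


def randomize_half_hour(midnight: datetime) -> datetime:
    """Add a random whole number of half hours to a midnight timestamp."""
    slot = random.randrange(HALF_HOUR_SLOTS)  # 0..47
    return midnight + timedelta(minutes=30 * slot)


print(randomize_half_hour(datetime(2020, 5, 17)))
```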

Efficient PostgreSQL Query for Mins and Maxes within equal intervals in a time period

I am using Postgres v9.2.6.
I have a system with lots of devices that take measurements. These measurements are stored in a
table with three fields:
device_id
measurement (Indexed)
time (Indexed)
There could be 10 Million measurements in a single year. Most of the time the user is only interested in 100 min max pairs within equal interval for a certain period, for example in last 24 hours or in last 53 weeks. To get these 100 mins and maxs the period is divided into 100 equal intervals. From each interval min and max is extracted. Would you recommend the most efficient approach to query the data? So far I have tried the following query:
WITH periods AS (
SELECT time.start AS st, time.start + (interval '1 year' / 100) AS en FROM generate_series(now() - interval '1 year', now(), interval '1 year' / 100) AS time(start)
)
SELECT * FROM sample_data
JOIN periods
ON created_at BETWEEN periods.st AND periods.en AND
customer_id = 23
WHERE
sample_data.id = (SELECT id FROM sample_data WHERE created_at BETWEEN periods.st AND periods.en ORDER BY sample ASC LIMIT 1)
This test approach took over a minute for 1 million points on MacBook Pro.
Thanks...
Sorry about that. It was actually my question, and it looks like the author of this post caught a cold, so I cannot ask him to edit it. I've posted a "more good" question here - Slow PostgreSQL Query for Mins and Maxs within equal intervals in a time period. Could you please close this question?

date_trunc 5 minute interval in PostgreSQL [duplicate]

Possible Duplicate:
What is the fastest way to truncate timestamps to 5 minutes in Postgres?
Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)
I want to aggregate data at 5 minute intervals in PostgreSQL. If I use the date_trunc() function, I can aggregate data at an hourly, monthly, daily, weekly, etc. interval but not a specific interval like 5 minute or 5 days.
select date_trunc('hour', date1), count(*) from table1 group by 1;
How can we achieve this in PostgreSQL?
SELECT date_trunc('hour', date1) AS hour_stump
, (extract(minute FROM date1)::int / 5) AS min5_slot
, count(*)
FROM table1
GROUP BY 1, 2
ORDER BY 1, 2;
You could GROUP BY two columns: a timestamp truncated to the hour and a 5-minute-slot.
The example produces slots 0 - 11. Add 1 if you prefer 1 - 12.
I cast the result of extract() to integer, so the division / 5 truncates fractional digits. The result:
minute 0 - 4 -> slot 0
minute 5 - 9 -> slot 1
etc.
This query only returns values for those 5-minute slots where values are found. If you want a value for every slot or if you want a running sum over 5-minute slots, consider this related answer:
PostgreSQL: running count of rows for a query 'by minute'
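The grouping key from that query can be mirrored in a few lines of Python to see how timestamps map to slots. This is only an illustration of the (hour, slot) idea with a made-up function name, not a replacement for the SQL.

```python
from datetime import datetime


def five_minute_group(ts: datetime) -> tuple[datetime, int]:
    """Return (timestamp truncated to the hour, 5-minute slot 0..11)."""
    hour_stump = ts.replace(minute=0, second=0, microsecond=0)
    min5_slot = ts.minute // 5  # integer division drops the fraction
    return hour_stump, min5_slot


print(five_minute_group(datetime(2024, 1, 1, 10, 7, 30)))
```

A timestamp at 10:07:30 lands in the 10:00 hour, slot 1, consistent with "minute 5 - 9 -> slot 1".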
Here's a simple query you can either wrap in a function or cut and paste all over the place:
select now()::timestamp(0), (extract(epoch from now()::timestamptz(0)-date_trunc('d',now()))::int)/60;
It'll give you the current time and the number of whole 60-second buckets that have passed since the start of the day (the divisor is 60 here). To bucket by 5 minutes, make that number 300, and so on. It groups by the seconds since the start of the day. To make it group by seconds since year begin, hour begin, or whatever else, change the 'd' in the date_trunc.
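For comparison, the same bucket arithmetic written as a minimal Python sketch. The function and parameter names are assumptions for illustration: it takes the seconds since midnight and integer-divides them by the bucket size.

```python
from datetime import datetime


def bucket_since_midnight(ts: datetime, bucket_seconds: int = 300) -> int:
    """Index of the bucket_seconds-wide bucket that ts falls into, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((ts - midnight).total_seconds()) // bucket_seconds


print(bucket_since_midnight(datetime(2024, 1, 1, 0, 7, 30)))  # 1
```

With 300-second buckets a day has 288 of them, so the index runs from 0 to 287.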