postgresql query to get counts between 12:00 and 12:00 - sql

I have the following query that works fine, but it is giving me counts for a single, whole day (00:00 to 23:59 UTC). For example, it's giving me counts for all of January 1 2017 (00:00 to 23:59 UTC).
My dataset lends itself to be queried from 12:00 UTC to 12:00 UTC. For example, I'm looking for all counts from Jan 1 2017 12:00 UTC to Jan 2 2017 12:00 UTC.
Here is my query:
SELECT count(DISTINCT ltg_data.lat), cwa, to_char(time, 'MM/DD/YYYY')
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time, 'MM/DD/YYYY');
FYI...I'm changing the format of the time so I can use the results more readily in javascript.
And a description of the dataset: it consists of thousands of data points that occur within various polygons every second. I'm determining whether the points occur within the polygon "cwa = MFR" and then counting them.
Thanks for any help!

I see two approaches here.
First, join against generate_series(start_date::timestamp,end_date,'12 hours'::interval) and count the rows that fall into each generated interval. I believe this is the more correct approach, but it has a major drawback: you have to lateral join it against the existing data set to use min(time) and max(time).
Second, a monkey hack, but with much less code and fewer queries: use a different time zone so that 12:00 becomes the start of the day. For example (you did not provide sample data, so I generate the content of counties with generate_series at 2-hour intervals):
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval)
g)
select count(1),to_char(g,'MM/DD/YYYY') from counties
group by to_char(g,'MM/DD/YYYY')
order by 2;
count | to_char
-------+------------
12 | 09/01/2017
12 | 09/02/2017
12 | 09/03/2017
1 | 09/04/2017
(4 rows)
So for the UTC time zone there are 12 two-hour rows for each full day above and, because generate_series is inclusive in my sample, 1 row for the last day: 37 rows in total.
Now a monkey hack:
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval)
g)
select count(1),to_char(g at time zone 'utc+12','MM/DD/YYYY') from counties
group by to_char(g at time zone 'utc+12','MM/DD/YYYY')
order by 2;
count | to_char
-------+------------
6 | 08/31/2017
12 | 09/01/2017
12 | 09/02/2017
7 | 09/03/2017
(4 rows)
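I select the same dates in a different time zone, shifted by exactly 12 hours, so the first day starts at midday on 31 Aug rather than at midnight on 1 Sep. The counts change, still totalling 37 rows, but they are grouped the way you requested. Note that PostgreSQL reads 'utc+12' as a POSIX-style zone, where a positive offset means west of Greenwich, so the timestamps move 12 hours back and each label is the date on which the noon-to-noon period begins.
Update: for your query I'd try something like: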
SELECT count(DISTINCT ltg_data.lat), cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY')
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY');
Also, if you want to apply the 12-hour shift in the WHERE clause, add at time zone 'utc+12' to the "time" comparison as well.
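The effect of the 12-hour shift can be checked outside the database. The following is a minimal Python sketch, not the answer's own code: it assumes timezone-aware timestamps, and the function name noon_to_noon_counts is hypothetical. It rebuilds the 37-row sample from above and groups it by the date obtained after moving each instant 12 hours back.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def noon_to_noon_counts(timestamps):
    """Count timestamps per noon-to-noon UTC period, labelled by the day the period starts."""
    shift = timedelta(hours=12)
    # Moving each instant back 12 hours turns 12:00 UTC into 00:00, so the
    # calendar date of the shifted value identifies the noon-to-noon period.
    return Counter(
        (ts.astimezone(timezone.utc) - shift).strftime("%m/%d/%Y")
        for ts in timestamps
    )


# Same sample as in the answer: every 2 hours from 2017-09-01 to 2017-09-04 inclusive.
start = datetime(2017, 9, 1, tzinfo=timezone.utc)
sample = [start + timedelta(hours=2 * i) for i in range(37)]

counts = noon_to_noon_counts(sample)
for day in sorted(counts):
    print(counts[day], day)
```

The printed counts should match the second psql output above (6, 12, 12, 7).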

Related

SQL - Fuzzy JOIN on Timestamp columns within X amount of time

Say I have two tables:
a:
timestamp | precipitation
-------------------------+---------------
2015-08-03 21:00:00 UTC | 3
2015-08-03 22:00:00 UTC | 3
2015-08-04 3:00:00 UTC | 4
2016-02-04 18:00:00 UTC | 4
and b:
timestamp | loc
-------------------------+---------------
2015-08-03 21:23:00 UTC | San Francisco
2016-02-04 16:04:00 UTC | New York
I want a fuzzy join in which every row in b is matched to a row in a, subject to these criteria:
The time is within 60 minutes. If a match does not exist within 60 minutes, do not include that row in the output.
In the case of a tie where some row in b could join onto two rows in a, pick the closest one in terms of time.
Example Output:
timestamp | loc | precipitation
-------------------------+---------------+---------------
2015-08-03 21:00:00 UTC | San Francisco | 3
What you need is an ASOF join. I don't think there is an easy way to do this with BigQuery. Other databases like Kinetica (and I think Clickhouse) support ASOF functions that can be used to perform 'fuzzy' joins.
The syntax for Kinetica would be something like the following.
SELECT *
FROM a
LEFT JOIN b
ON ASOF(a.timestamp, b.timestamp, INTERVAL '0' MINUTES, INTERVAL '60' MINUTES, MIN)
The ASOF function above sets up an interval of 60 minutes within which to look for matches on the right side table. When there are multiple matches, it selects the one that is closest (MAX would pick the one that is farthest away).
As per my understanding and based on the data you provided I think the below query should work for your use case.
create temporary table a as(
select TIMESTAMP('2015-08-03 21:00:00 UTC') as ts, 3 as precipitation union all
select TIMESTAMP('2015-08-03 22:00:00 UTC'), 3 union all
select TIMESTAMP('2015-08-04 3:00:00 UTC'), 4 union all
select TIMESTAMP('2016-02-04 18:00:00 UTC'), 4
);
create temporary table b as(
select TIMESTAMP('2015-08-03 21:23:00 UTC') as ts,'San Francisco ' as loc union all
select TIMESTAMP('2016-02-04 14:04:00 UTC') as ts,'New York ' as loc
);
select b_ts,a_ts,loc,precipitation,diff_time_sec
from(
select b.ts b_ts,a.ts a_ts,
ABS(TIMESTAMP_DIFF(b.ts,a.ts, SECOND)) as diff_time_sec,
*
from b
inner join a on b.ts between date_sub(a.ts, interval 60 MINUTE) and date_add(a.ts, interval 60 MINUTE)
)
qualify RANK() OVER(partition by b_ts ORDER BY diff_time_sec) = 1
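To make the matching rule concrete, here is a minimal Python sketch of the same idea (keep only candidates within 60 minutes, then take the closest one). It is an illustration rather than a translation of the BigQuery query: the function name fuzzy_join is hypothetical, it uses the rows from the question, and on an exact tie it keeps the earlier row from a, whereas RANK() = 1 would keep both.

```python
from datetime import datetime, timedelta


def fuzzy_join(a_rows, b_rows, window=timedelta(minutes=60)):
    """For each (ts, loc) in b_rows, attach the closest (ts, precipitation) in a_rows within the window."""
    joined = []
    for b_ts, loc in b_rows:
        # Candidates are the rows of a whose timestamp is at most `window` away.
        candidates = [
            (abs(b_ts - a_ts), a_ts, precip)
            for a_ts, precip in a_rows
            if abs(b_ts - a_ts) <= window
        ]
        if candidates:  # rows of b without any match are dropped
            _, a_ts, precip = min(candidates)
            joined.append((a_ts, loc, precip))
    return joined


a = [
    (datetime(2015, 8, 3, 21, 0), 3),
    (datetime(2015, 8, 3, 22, 0), 3),
    (datetime(2015, 8, 4, 3, 0), 4),
    (datetime(2016, 2, 4, 18, 0), 4),
]
b = [
    (datetime(2015, 8, 3, 21, 23), "San Francisco"),
    (datetime(2016, 2, 4, 16, 4), "New York"),
]

matches = fuzzy_join(a, b)
print(matches)
```

San Francisco is 23 minutes from the 21:00 row and 37 minutes from the 22:00 row, so it joins to 21:00; New York is 116 minutes from its nearest row and is left out.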

PostgreSQL - Select splitted rows based on a column value

Could someone please suggest a query which splits items by working minutes per hour?
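Source table
start_timestamp | item_id | total_working_minutes
------------------+---------+-----------------------
2021-02-01 14:10 | A | 120
2021-02-01 14:30 | B | 20
2021-02-01 16:30 | A | 10
Expected result
timestamp_by_hour | item_id | working_minutes
-------------------+---------+-----------------
2021-02-01 14:00 | A | 50
2021-02-01 14:00 | B | 20
2021-02-01 15:00 | A | 60
2021-02-01 16:00 | A | 20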
Thanks in advance!
You can accomplish this using a recursive query, which should work in both Redshift and PostgreSQL. First, extract the hour and the number of minutes worked in the first hour, along with the total minutes worked.
Then, repeat by recursion for each row where the minutes worked in the current hour is less than total minutes worked. In the recursion, increase the starting hour by 1, and reduce total minutes worked by the minutes worked in the preceding hour.
Finally, aggregate the results by hour and ID.
with recursive
split_times(timestamp_by_hour, item_id, working_minutes, total_working_minutes) as
(
select
date_trunc('hour', start_timestamp),
item_id,
least(total_working_minutes, 60 - extract(minutes from start_timestamp)),
total_working_minutes
from work_time
union all
select
timestamp_by_hour + interval '1 hour',
item_id,
least(total_working_minutes - working_minutes, 60),
total_working_minutes - working_minutes
from split_times
where total_working_minutes > working_minutes
)
select timestamp_by_hour, item_id, sum(working_minutes) working_minutes
from split_times
group by timestamp_by_hour, item_id
order by timestamp_by_hour, item_id;
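The recursion can also be followed step by step in ordinary code. The following is a minimal Python sketch of the same splitting logic, using a loop in place of the recursive CTE; the function name split_by_hour is hypothetical and the rows are the ones from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_hour(rows):
    """Split (start, item_id, total_minutes) rows into per-hour working minutes."""
    totals = defaultdict(int)
    for start, item_id, minutes in rows:
        hour = start.replace(minute=0, second=0, microsecond=0)
        room = 60 - start.minute  # minutes left in the first hour
        remaining = minutes
        while remaining > 0:
            worked = min(remaining, room)
            totals[(hour, item_id)] += worked  # aggregate by hour and item
            remaining -= worked
            hour += timedelta(hours=1)
            room = 60  # every following hour is a full hour
    return sorted((hour, item, mins) for (hour, item), mins in totals.items())


work_time = [
    (datetime(2021, 2, 1, 14, 10), "A", 120),
    (datetime(2021, 2, 1, 14, 30), "B", 20),
    (datetime(2021, 2, 1, 16, 30), "A", 10),
]

result = split_by_hour(work_time)
for hour, item, mins in result:
    print(hour.strftime("%Y-%m-%d %H:%M"), item, mins)
```

Item A's first row contributes 50, 60 and 10 minutes to the 14:00, 15:00 and 16:00 buckets, and its second row adds another 10 minutes to 16:00, which gives the expected 20.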

How to get users which were online everyday last week?

Data example:
id visiting_time
1 13.01.2001 02:34
1 14.01.2001 02:36
1 15.01.2001 02:36
1 16.01.2001 02:37
1 17.01.2001 02:38
1 18.01.2001 02:39
1 19.01.2001 02:40
2 13.01.2001 02:35
2 15.01.2001 02:36
2 16.01.2001 02:37
2 17.01.2001 02:38
2 18.01.2001 02:39
2 19.01.2001 02:40
I want to get all users who were online every day for the last week, e.g. from 13th January 00:00 till 20th January 00:00.
For my data sample the answer is:
id
1
Considering "every day for the last week, e.g. from 13th January 00:00 till 20th January 00:00" and "I point it out myself. In general, I can choose any number of days I want", I guess the dates work only as a filter, so the task is "find users online every day during a selected interval":
SELECT id,
count(DISTINCT toDate(visiting_time)) AS number_of_days_visited
FROM user_visits
WHERE visiting_time BETWEEN '2001-01-13 00:00:00' AND '2001-01-20 00:00:00'
GROUP BY id
HAVING number_of_days_visited =
round((toUInt32(toDateTime('2001-01-20 00:00:00')) - toUInt32(toDateTime('2001-01-13 00:00:00'))) / 60 / 60 / 24)
In HAVING I computed number of days from the WHERE filter.
The below code will work only if the visiting_time column format is YYYY-MM-DD HH:MM, otherwise the dates are not comparable:
SELECT t.id
FROM (
SELECT id, COUNT(DISTINCT substr(visiting_time, 1, 10)) AS counter
FROM table1
WHERE visiting_time >= '2001-01-13 00:00' AND visiting_time < '2001-01-20 00:00'
GROUP BY id
) AS t
WHERE t.counter = 7
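Both answers rely on the same idea: count the distinct calendar days per user inside the interval and compare that with the number of days the interval spans. Here is a minimal Python sketch of that idea, assuming a half-open interval whose bounds fall on midnight; the function name users_online_every_day is hypothetical and the visit times are simplified from the sample data.

```python
from collections import defaultdict
from datetime import datetime


def users_online_every_day(visits, start, end):
    """Return ids seen on every calendar day in [start, end); both bounds are at midnight."""
    required_days = (end - start).days
    days_seen = defaultdict(set)
    for user_id, visited_at in visits:
        if start <= visited_at < end:
            days_seen[user_id].add(visited_at.date())
    return sorted(u for u, days in days_seen.items() if len(days) == required_days)


# User 1 visits on each day from the 13th to the 19th; user 2 skips the 14th.
visits = [(1, datetime(2001, 1, d, 2, 34)) for d in range(13, 20)]
visits += [(2, datetime(2001, 1, d, 2, 35)) for d in (13, 15, 16, 17, 18, 19)]

every_day = users_online_every_day(
    visits, datetime(2001, 1, 13), datetime(2001, 1, 20)
)
print(every_day)
```

Deriving the required day count from the interval bounds avoids hard-coding 7, so the same function works for any number of days.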

Using sum function with a condition based on a returned value

I have a set of months, each with a number of hours associated with it:
DATE HOURS
8/1/2013 3
9/1/2013 8
10/1/2013 2
11/1/2013 4
12/1/2013 1
I need to return the sum of hours for everything in the past, including the current month. In the example below, starting in August, the sum for August would be August only; for September, I'd need August + September.
DATE HOURS SUM
8/1/2013 3 3
9/1/2013 8 11
10/1/2013 2 13
11/1/2013 4 17
12/1/2013 1 18
I am not sure how to proceed, since the date condition is different for each line.
If anyone can help on this, it'd be greatly appreciated
You can do this in most SQL dialects using a correlated subquery (or a non-equijoin, but I find the subquery cleaner):
select date, hours,
(select sum(t2.hours)
from t t2
where t2.date <= t.date
) as cum
from t;
Many SQL engines also support the cumulative sum function, which would typically look like this:
select date, hours, sum(hours) over (order by date) as cum
from t
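For comparison, the same running total can be sketched in a few lines of Python. This is only an illustration of what the cumulative sum computes, using the rows from the question; it assumes the rows are sorted by date before accumulating, which plays the role of the order by in the window function.

```python
from datetime import date
from itertools import accumulate

rows = [
    (date(2013, 8, 1), 3),
    (date(2013, 9, 1), 8),
    (date(2013, 10, 1), 2),
    (date(2013, 11, 1), 4),
    (date(2013, 12, 1), 1),
]

rows.sort()  # order by date
running = list(accumulate(hours for _, hours in rows))

for (month, hours), cum in zip(rows, running):
    print(month, hours, cum)
```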

SQL - Grouping results by custom 24 hour period

I need to create an Oracle 11g SQL report showing daily productivity: how many units were shipped during a 24 hour period. Each period starts at 6am and finishes at 5:59am the next day.
How could I group the results in such a way as to display this 24 hour period? I've tried grouping by day, but, a day is 00:00 - 23:59 and so the results are inaccurate.
The results will cover the past 2 months.
Many thanks.
group by trunc(your_date - 1/4)
Days are whole numbers in Oracle, so 6 hours is 0.25 of a day. Subtracting it before truncating makes each period start at 6am:
select
trunc(your_date - 0.25) as period, count(*) as shipped
from your_table
group by trunc(your_date - 0.25)
I haven't got an Oracle instance to try it on at the moment.
Well, you could group by a calculated date.
So, subtract 6 hours from the dates and group by the result, which groups your dates into the correct 6am-to-6am periods.
Assuming that you have a units column or similar on your table, perhaps something like this:
SELECT
TRUNC(us.shipping_datetime - 0.25) + 0.25 period_start
, TRUNC(us.shipping_datetime - 0.25) + 1 + (1/24 * 5) + (1/24/60 * 59) period_end
, SUM(us.units) units
FROM units_shipped us
GROUP BY TRUNC(us.shipping_datetime - 0.25)
ORDER BY 1
This simply subtracts 6 hours (0.25 of a day) from each date. If the time is earlier than 6am, the subtraction will make it fall prior to midnight, and when the resultant value is truncated (time element is removed, the date at midnight is returned), it falls within the grouping for the previous day.
Results:
| PERIOD_START | PERIOD_END | UNITS |
-----------------------------------------------------------------------
| April, 22 2013 06:00:00+0000 | April, 23 2013 05:59:00+0000 | 1 |
| April, 23 2013 06:00:00+0000 | April, 24 2013 05:59:00+0000 | 3 |
| April, 24 2013 06:00:00+0000 | April, 25 2013 05:59:00+0000 | 1 |
The bit of dynamic maths in the SELECT is just to help readability of the results. If you don't have a units column to SUM() up, i.e. each row represents a single unit, then substitute COUNT(*) instead.
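The subtract-then-truncate step is easy to verify on a few boundary values. The following is a minimal Python sketch of that step; the function name period_start and the sample shipment times are hypothetical, chosen to sit on either side of the 6am boundary.

```python
from collections import Counter
from datetime import datetime, timedelta


def period_start(shipped_at, day_starts_at=timedelta(hours=6)):
    """Return the 6am start of the 24-hour period that contains shipped_at."""
    shifted = shipped_at - day_starts_at  # like shipping_datetime - 0.25
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)  # like TRUNC
    return midnight + day_starts_at  # like + 0.25


# Hypothetical shipments, one row per unit shipped.
shipments = [
    datetime(2013, 4, 23, 5, 59),   # still belongs to the period starting 22 April 06:00
    datetime(2013, 4, 23, 6, 0),    # first minute of the period starting 23 April 06:00
    datetime(2013, 4, 23, 23, 30),
    datetime(2013, 4, 24, 2, 15),   # after midnight, but before the next 6am
]

units_per_period = Counter(period_start(s) for s in shipments)
for start in sorted(units_per_period):
    print(start, units_per_period[start])
```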