postgresql query to get counts between 12:00 and 12:00 - sql

I have the following query that works fine, but it is giving me counts for a single, whole day (00:00 to 23:59 UTC). For example, it's giving me counts for all of January 1 2017 (00:00 to 23:59 UTC).
My dataset lends itself to be queried from 12:00 UTC to 12:00 UTC. For example, I'm looking for all counts from Jan 1 2017 12:00 UTC to Jan 2 2017 12:00 UTC.
Here is my query:
SELECT count(*), cwa, to_char(time, 'MM/DD/YYYY')
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time, 'MM/DD/YYYY');
FYI: I'm changing the format of the time so I can use the results more readily in JavaScript.
And a description of the dataset: it's thousands of data points that occur within various polygons every second. I'm determining whether the points occur within the polygon "cwa = MFR" and then counting them.
Thanks for any help!

I see two approaches here.
First, join against generate_series(start_date::timestamp, end_date, '12 hours'::interval) and count the rows that fall into each generated interval. I believe this is the more correct approach, but it has a major drawback: you have to lateral join it against the existing data set to get min(time) and max(time) for the series bounds.
Second, a monkey hack, but with much less coding and less querying: use a different time zone so that 12:00 becomes the start of the day. You did not give sample data, so here I build counties with generate_series at a 2 hour interval:
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval) g)
select count(1),to_char(g,'MM/DD/YYYY') from counties
group by to_char(g,'MM/DD/YYYY')
order by 2;
count | to_char
12 | 09/01/2017
12 | 09/02/2017
12 | 09/03/2017
1 | 09/04/2017
(4 rows)
So in the UTC time zone each full day above has 12 rows (one per 2 hour interval), and the last day has 1 row because generate_series includes its end point. In total: 37 rows.
Now the monkey hack:
t=# with counties as (select generate_series('2017-09-01'::timestamptz,'2017-09-04'::timestamptz,'2 hours'::interval) g)
select count(1),to_char(g at time zone 'utc+12','MM/DD/YYYY') from counties
group by to_char(g at time zone 'utc+12','MM/DD/YYYY')
order by 2;
count | to_char
6 | 08/31/2017
12 | 09/01/2017
12 | 09/02/2017
7 | 09/03/2017
(4 rows)
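I select the same timestamps in a different time zone, shifted by exactly 12 hours. The first day now starts at midday on 31 Aug instead of midnight on 1 Sep, and the per-day counts change. The total is still 37 rows, but they are grouped the way you requested. Note that a POSIX-style zone spec such as 'utc+12' means 12 hours behind UTC, which is why each label covers 12:00 UTC of that date to 12:00 UTC of the next.
For your query I'd try something like:
SELECT count(*), cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY')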
FROM counties
JOIN ltg_data on ST_contains(counties.the_geom, ltg_data.ltg_geom)
WHERE cwa = 'MFR'
AND time BETWEEN '1987-06-01'
AND '1992-08-1'
GROUP BY cwa, to_char(time at time zone 'utc+12', 'MM/DD/YYYY');
Also, if you want to apply the 12 hour shift to the WHERE clause, add at time zone 'utc+12' to the "time" comparison as well.
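To make the shift easier to see outside the database, here is a minimal Python sketch of the same idea. It assumes the timestamps are in UTC and simply subtracts 12 hours before taking the date, which is the effect the time zone trick has in the query above. The function name noon_to_noon_label is made up for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def noon_to_noon_label(ts: datetime) -> str:
    """Label a timestamp with the date on which its 12:00-to-12:00 UTC window starts."""
    shifted = ts.astimezone(timezone.utc) - timedelta(hours=12)
    return shifted.strftime("%m/%d/%Y")


# Same sample as in the answer: one row every 2 hours, both end points included.
start = datetime(2017, 9, 1, tzinfo=timezone.utc)
sample = [start + timedelta(hours=2 * i) for i in range(37)]

counts = Counter(noon_to_noon_label(ts) for ts in sample)
for label in sorted(counts):
    print(counts[label], label)
```

The per-label counts match the second result set above (6, 12, 12, 7).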


SQL - Fuzzy JOIN on Timestamp columns within X amount of time

Say I have two tables, a:
2015-08-03 21:00:00 UTC
2015-08-03 22:00:00 UTC
2015-08-04 3:00:00 UTC
2016-02-04 18:00:00 UTC
and b:
2015-08-03 21:23:00 UTC
San Francisco
2016-02-04 16:04:00 UTC
New York
I want to do a fuzzy join that produces a table in which every row in b is matched to a row in a where possible. Criteria:
The time is within 60 minutes. If a match does not exist within 60 minutes, do not include that row in the output.
In the case of a tie where some row in b could join onto two rows in a, pick the closest one in terms of time.
Example Output:
2015-08-03 21:00:00 UTC
San Francisco
What you need is an ASOF join. I don't think there is an easy way to do this with BigQuery. Other databases like Kinetica (and I think Clickhouse) support ASOF functions that can be used to perform 'fuzzy' joins.
The syntax for Kinetica would be something like the following.
ON ASOF(a.timestamp, b.timestamp, INTERVAL '0' MINUTES, INTERVAL '60' MINUTES, MIN)
The ASOF function above sets up an interval of 60 minutes within which to look for matches on the right side table. When there are multiple matches, it selects the one that is closest (MAX would pick the one that is farthest away).
As per my understanding, and based on the data you provided, I think the query below should work for your use case.
create temporary table a as (
select TIMESTAMP('2015-08-03 21:00:00 UTC') as ts, 3 as precipitation union all
select TIMESTAMP('2015-08-03 22:00:00 UTC'), 3 union all
select TIMESTAMP('2015-08-04 3:00:00 UTC'), 4 union all
select TIMESTAMP('2016-02-04 18:00:00 UTC'), 4
);
create temporary table b as (
select TIMESTAMP('2015-08-03 21:23:00 UTC') as ts, 'San Francisco' as loc union all
select TIMESTAMP('2016-02-04 16:04:00 UTC') as ts, 'New York' as loc
);
select b_ts, a_ts, loc, precipitation, diff_time_sec
from (
select b.ts b_ts, a.ts a_ts,
ABS(TIMESTAMP_DIFF(b.ts, a.ts, SECOND)) as diff_time_sec,
loc, precipitation
from b
inner join a on b.ts between date_sub(a.ts, interval 60 MINUTE) and date_add(a.ts, interval 60 MINUTE)
)
qualify RANK() OVER(partition by b_ts ORDER BY diff_time_sec) = 1
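As a cross-check of the matching rule (within 60 minutes, closest wins), here is a small Python sketch run against the sample rows from the question. The function name fuzzy_join is hypothetical, and unlike RANK(), which returns every tied row, min() keeps only the first of two equally close candidates.

```python
from datetime import datetime, timedelta


def fuzzy_join(a_rows, b_rows, window=timedelta(minutes=60)):
    """Pair each (ts, loc) in b_rows with the closest timestamp in a_rows within window."""
    matches = []
    for b_ts, loc in b_rows:
        candidates = [a_ts for a_ts in a_rows if abs(b_ts - a_ts) <= window]
        if candidates:  # rows of b without a match inside the window are dropped
            closest = min(candidates, key=lambda a_ts: abs(b_ts - a_ts))
            matches.append((closest, loc))
    return matches


a = [
    datetime(2015, 8, 3, 21, 0),
    datetime(2015, 8, 3, 22, 0),
    datetime(2015, 8, 4, 3, 0),
    datetime(2016, 2, 4, 18, 0),
]
b = [
    (datetime(2015, 8, 3, 21, 23), "San Francisco"),
    (datetime(2016, 2, 4, 16, 4), "New York"),
]

result = fuzzy_join(a, b)
print(result)
```

San Francisco pairs with 21:00 (23 minutes away, closer than 22:00), and New York is dropped because its nearest row in a is almost two hours away.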

PostgreSQL - Select splitted rows based on a column value

Could someone please suggest a query which splits items by working minutes per hour?
Source table
2021-02-01 14:10
2021-02-01 14:30
2021-02-01 16:30
Expected result
2021-02-01 14:00
2021-02-01 14:00
2021-02-01 15:00
2021-02-01 16:00
Thanks in advance!
You can accomplish this using a recursive query, which should work in both Redshift and PostgreSQL. First, extract:
the hour and the number of minutes worked in the first hour
the total minutes worked
Then, repeat by recursion for each row where the minutes worked in the current hour is less than total minutes worked. In the recursion, increase the starting hour by 1, and reduce total minutes worked by the minutes worked in the preceding hour.
Finally, aggregate the results by hour and ID.
with recursive
split_times(timestamp_by_hour, item_id, working_minutes, total_working_minutes) as
(
-- anchor: the first hour of each item
select
date_trunc('hour', start_timestamp),
item_id,
least(total_working_minutes, 60 - extract(minutes from start_timestamp)::int),
total_working_minutes
from work_time
union all
-- recursion: move to the next hour while minutes remain
select
timestamp_by_hour + interval '1 hour',
item_id,
least(total_working_minutes - working_minutes, 60),
total_working_minutes - working_minutes
from split_times
where total_working_minutes > working_minutes
)
select timestamp_by_hour, item_id, sum(working_minutes) working_minutes
from split_times
group by timestamp_by_hour, item_id
order by timestamp_by_hour, item_id;
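The same splitting logic can be sketched in Python, which may help when checking the recursion by hand. The rows in work_time below are hypothetical (item id, start timestamp, total working minutes), since the column values of the source table are not shown above, and split_by_hour is a made-up helper name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_hour(start: datetime, total_minutes: int):
    """Yield (hour_start, minutes_worked_in_that_hour) pairs for one work item."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    worked = min(total_minutes, 60 - start.minute)  # minutes in the first hour
    remaining = total_minutes
    while True:
        yield hour, worked
        remaining -= worked
        if remaining <= 0:
            break
        hour += timedelta(hours=1)
        worked = min(remaining, 60)  # later hours take at most 60 minutes


# Hypothetical rows: (item_id, start_timestamp, total_working_minutes)
work_time = [
    (1, datetime(2021, 2, 1, 14, 10), 30),
    (2, datetime(2021, 2, 1, 14, 30), 120),
]

totals = defaultdict(int)
for item_id, start, minutes in work_time:
    for hour, worked in split_by_hour(start, minutes):
        totals[(hour, item_id)] += worked

for (hour, item_id), worked in sorted(totals.items()):
    print(hour, item_id, worked)
```

Item 2 starts at 14:30 with 120 minutes, so it contributes 30 minutes to 14:00, 60 to 15:00 and 30 to 16:00.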

How to get users which were online everyday last week?

Data example:
id visiting_time
1 13.01.2001 02:34
1 14.01.2001 02:36
1 15.01.2001 02:36
1 16.01.2001 02:37
1 17.01.2001 02:38
1 18.01.2001 02:39
1 19.01.2001 02:40
2 13.01.2001 02:35
2 15.01.2001 02:36
2 16.01.2001 02:37
2 17.01.2001 02:38
2 18.01.2001 02:39
2 19.01.2001 02:40
I want to get all users which were online everyday for the last week, f.e. from 13th january 00:00 till 20th january 00:00.
For my data sample the answer is:
"everyday for the last week, f.e. from 13th january 00:00 till 20th january 00:00"
I point it out myself. In general, I can choose any number of days I
I guess it works only as a filter, so the task is "find users online every day during the selected interval":
SELECT id,
count(DISTINCT toDate(visiting_time)) AS number_of_days_visited
FROM user_visits
WHERE visiting_time BETWEEN '2001-01-13 00:00:00' AND '2001-01-20 00:00:00'
GROUP BY id
HAVING number_of_days_visited =
round((toUInt32(toDateTime('2001-01-20 00:00:00')) - toUInt32(toDateTime('2001-01-13 00:00:00'))) / 60 / 60 / 24)
In HAVING I compute the number of days covered by the WHERE filter.
The code below will work only if the visiting_time column is stored as text in the format YYYY-MM-DD HH:MM; otherwise the dates are not comparable:
SELECT id FROM (SELECT id, COUNT(DISTINCT substr(visiting_time, 1, 10)) AS counter FROM table1 WHERE visiting_time >= '2001-01-13 00:00' AND visiting_time < '2001-01-20 00:00' GROUP BY id) AS t WHERE t.counter = 7
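Both queries boil down to counting distinct visit days per user and comparing that to the number of days in the interval. A minimal Python sketch of that check, using the ids and dates from the question (the minutes are simplified, and online_every_day is a made-up name), could look like this:

```python
from datetime import date, datetime


def online_every_day(visits, start: date, end: date):
    """Return sorted ids with at least one visit on every day in [start, end)."""
    required = (end - start).days
    days_by_user = {}
    for user_id, visited_at in visits:
        if start <= visited_at.date() < end:
            days_by_user.setdefault(user_id, set()).add(visited_at.date())
    return sorted(u for u, days in days_by_user.items() if len(days) == required)


# User 1 visits on each of 13-19 January; user 2 skips 14 January.
visits = [(1, datetime(2001, 1, d, 2, 34)) for d in range(13, 20)]
visits += [(2, datetime(2001, 1, d, 2, 35)) for d in (13, 15, 16, 17, 18, 19)]

result = online_every_day(visits, date(2001, 1, 13), date(2001, 1, 20))
print(result)
```

The end of the interval is treated as exclusive here, like the second query.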

Using sum function with a condition based on a returned value

I have a set of months, each with a number of hours associated with it:
8/1/2013 3
9/1/2013 8
10/1/2013 2
11/1/2013 4
12/1/2013 1
I need to return the sum of hours for everything in the past, including the current month. In the example below, starting in August, the sum would be August only. For September, I'd need August + September:
8/1/2013 3 3
9/1/2013 8 11
10/1/2013 2 13
11/1/2013 4 17
12/1/2013 1 18
I am not sure how to proceed, since the date condition is different for each line.
If anyone can help on this, it'd be greatly appreciated
You can do this in most SQL dialects using a correlated subquery (or a non-equijoin, but I find the subquery cleaner):
select date, hours,
(select sum(t2.hours)
from t t2
where t2.date <= t.date
) as cum
from t;
Many SQL engines also support the cumulative sum function, which would typically look like this:
select date, hours, sum(hours) over (order by date) as cum
from t
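For comparison, the same running total can be sketched in Python with itertools.accumulate. This assumes the rows are already sorted by date, which is what "order by date" guarantees in the window function version.

```python
from itertools import accumulate

# Rows from the question, already in date order: (month, hours)
rows = [
    ("8/1/2013", 3),
    ("9/1/2013", 8),
    ("10/1/2013", 2),
    ("11/1/2013", 4),
    ("12/1/2013", 1),
]

# accumulate() yields the sum of all hours up to and including each row.
running = list(accumulate(hours for _, hours in rows))

for (month, hours), cum in zip(rows, running):
    print(month, hours, cum)
```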

SQL - Grouping results by custom 24 hour period

I need to create an Oracle 11g SQL report showing daily productivity: how many units were shipped during a 24 hour period. Each period starts at 6am and finishes at 5:59am the next day.
How could I group the results in such a way as to display this 24 hour period? I've tried grouping by day, but, a day is 00:00 - 23:59 and so the results are inaccurate.
The results will cover the past 2 months.
Many thanks.
group by trunc(your_date - 1/4)
Days are whole numbers in Oracle, so 6 hours is 0.25 of a day. Subtracting that from the date moves the day boundary to 6am:
select trunc(your_date - 0.25) as period, count(*) as shipped_count
from your_table
group by trunc(your_date - 0.25)
I haven't got an Oracle instance to try it on at the moment.
Well, you could group by a calculated date.
So, subtract 6 hours from the dates and group by the truncated result, which puts each row into the correct 6am to 6am period.
Assuming that you have a units column or similar on your table, perhaps something like this:
SELECT
TRUNC(us.shipping_datetime - 0.25) + 0.25 period_start
, TRUNC(us.shipping_datetime - 0.25) + 1 + (1/24 * 5) + (1/24/60 * 59) period_end
, SUM(us.units) units
FROM units_shipped us
GROUP BY TRUNC(us.shipping_datetime - 0.25)
This simply subtracts 6 hours (0.25 of a day) from each date. If the time is earlier than 6am, the subtraction will make it fall prior to midnight, and when the resultant value is truncated (time element is removed, the date at midnight is returned), it falls within the grouping for the previous day.
| April, 22 2013 06:00:00+0000 | April, 23 2013 05:59:00+0000 | 1 |
| April, 23 2013 06:00:00+0000 | April, 24 2013 05:59:00+0000 | 3 |
| April, 24 2013 06:00:00+0000 | April, 25 2013 05:59:00+0000 | 1 |
The bit of dynamic maths in the SELECT is just to help readability of the results. If you don't have a units column to SUM() up, i.e. each row represents a single unit, then substitute COUNT(*) instead.
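To check the boundary behaviour, here is a small Python sketch of the TRUNC(dt - 0.25) + 0.25 expression. The shipment timestamps are hypothetical, chosen so that one of them falls just before 6am, and period_start is a name made up for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def period_start(shipped_at: datetime) -> datetime:
    """Return the 06:00 boundary that opens the 24-hour period containing shipped_at."""
    shifted = shipped_at - timedelta(hours=6)  # same as subtracting 0.25 of a day
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)  # TRUNC
    return midnight + timedelta(hours=6)


# Hypothetical shipments, one row per unit shipped
shipments = [
    datetime(2013, 4, 23, 5, 59),   # before 6am, so it belongs to the 22 April period
    datetime(2013, 4, 23, 6, 0),
    datetime(2013, 4, 23, 18, 30),
    datetime(2013, 4, 24, 2, 15),   # after midnight, still the 23 April period
]

per_period = Counter(period_start(ts) for ts in shipments)
for start in sorted(per_period):
    print(start, per_period[start])
```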