PostgreSQL "nested"? distincts and count - sql

I need to get the count of the distinct names per hour in one query in PostgreSQL 9.1
The relevant columns (generalized for the question) in my table are:
occurred timestamp with time zone and
name character varying(250)
And the table name, for the sake of the question, is just table.
The occurred timestamps will all be within a midnight-to-midnight (exclusive) range for one day. So far my query looks like:
SELECT COUNT(DISTINCT ON (name)) FROM table
It would be nice if I could get the output formatted as a list of 24 integers (one for each hour of the day); the names aren't required to be returned.
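A minimal sketch of one way to get all 24 hourly counts, including hours with no rows, assuming the table is called mytable (standing in for table above) and the columns are named as described; the generate_series join only exists so empty hours still come back as 0:

SELECT h.hr,
       COUNT(DISTINCT t.name) AS name_ct
FROM generate_series(0, 23) AS h(hr)
LEFT JOIN mytable t
       ON EXTRACT(HOUR FROM t.occurred) = h.hr
GROUP BY h.hr
ORDER BY h.hr;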

If I understand correctly what you want, you can write:
SELECT EXTRACT(HOUR FROM occurred),
       COUNT(DISTINCT name)
FROM ...
WHERE ...
GROUP BY EXTRACT(HOUR FROM occurred)
ORDER BY EXTRACT(HOUR FROM occurred);

SELECT date_trunc('hour', occurred) AS hour_slice
,count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;
DISTINCT ON is a different feature.
date_trunc() gives you one count per distinct calendar hour, while EXTRACT aggregates per hour-of-day across longer periods of time. The two results do not add up, because the sum of several count(DISTINCT x) values is equal to or greater than a single count(DISTINCT x) over the combined rows.
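For illustration, assuming mytable holds two days of data, the two groupings behave differently; this is just a sketch showing both variants side by side:

-- one row per calendar hour (e.g. up to 48 rows for two full days)
SELECT date_trunc('hour', occurred) AS hour_slice, count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;

-- one row per hour-of-day (at most 24 rows, both days folded together)
SELECT EXTRACT(HOUR FROM occurred) AS hour_of_day, count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;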

You want this by hour:
select extract(hour from occurred) as hr, count(distinct name)
from table t
group by extract(hour from occurred)
order by 1
This assumes there is data for only one day. Otherwise, hours from different days would be combined. To get around this, you would need to include date information as well.
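A sketch of that variant, assuming the same columns and a table named mytable; grouping on the date as well keeps each day's hours separate:

select occurred::date as day, extract(hour from occurred) as hr, count(distinct name)
from mytable t
group by occurred::date, extract(hour from occurred)
order by 1, 2;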

Related

Obtain latest record for a given second Postgres

I have data with millisecond-precision timestamps. I want to keep only the most recent timestamp within a given second. I.e., the records (2020-07-13 5:05:38.009, event1) and (2020-07-13 5:05:38.012, event2) should only return the latter.
I've tried the following:
SELECT
timestamp as time, event as value, event_type as metric
FROM
table
GROUP BY
date_trunc('second', time)
But then I'm asked to group by event as well and I see all the data (as if no group by was provided)
In Postgres, you can use distinct on:
select distinct on (date_trunc('second', time)) t.*
from t
order by date_trunc('second', time), time desc;

Note that the DISTINCT ON expression has to lead the ORDER BY; time desc then picks the latest row within each second.
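A rough window-function alternative (just a sketch, keeping the answer's assumed table t and column time), which can be handy if the DISTINCT ON syntax is unfamiliar or you also want to see the runner-up rows:

select *
from (
    select t.*,
           row_number() over (partition by date_trunc('second', time)
                              order by time desc) as rn
    from t
) x
where rn = 1;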

SQL question: count of occurrence greater than N in any given hour

I'm looking through login logs (in Netezza) and trying to find users who have greater than a certain number of logins in any 1 hour time period (any consecutive 60 minute period, as opposed to strictly a clock hour) since December 1st. I've viewed the following posts, but most seem to address searching within a specific time range, not ANY given time period. Thanks.
https://dba.stackexchange.com/questions/137660/counting-number-of-occurences-in-a-time-period
https://dba.stackexchange.com/questions/67881/calculating-the-maximum-seen-so-far-for-each-point-in-time
Count records per hour within a time span
You could use the analytic function lag to look back in a sorted sequence of timestamps and check whether the record that came 19 entries earlier is less than an hour older:
with cte as (
select user_id,
login_time,
lag(login_time, 19) over (partition by user_id order by login_time) as lag_time
from userlog
order by user_id,
login_time
)
select user_id,
min(login_time) as login_time
from cte
where extract(epoch from (login_time - lag_time)) < 3600
group by user_id
The output will show the matching users together with the first time they logged in for the twentieth time within one hour.
I think you might do something like this (I'll use a login table with user and datetime as its only columns, for the sake of simplicity):
with connections as (
select ua.user
, ua.datetime
from user_logons ua
where ua.datetime >= timestamp'2018-12-01 00:00:00'
)
select ua.user
, ua.datetime
, (select count(*)
from connections ut
where ut.user = ua.user
and ut.datetime between ua.datetime and (ua.datetime + 1 hour)
) as consecutive_logons
from connections ua
It is up to you to fill in your actual column names (user, datetime).
It is also up to you to find the date-addition facilities (ua.datetime + 1 hour won't work as written); this is largely dependent on the DB implementation, for example it is DATE_ADD in MySQL (https://www.w3schools.com/SQl/func_mysql_date_add.asp). See the sketch after these notes.
Due to the subquery (select count(*) ...), the whole query will not be the fastest, because it is a correlated subquery: it has to be re-evaluated for each row.
The WITH clause is simply there to compute a subset of user_logons and limit that cost. It might not be strictly necessary, but it keeps the query simpler.
You might get better performance using a stored procedure or an application-side (e.g. Java, PHP, ...) routine.
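A sketch of the date-addition part using standard interval arithmetic (Netezza's SQL is close to PostgreSQL here, but verify against your version; user_id and datetime are placeholder column names, since user is a reserved word):

select ua.user_id
     , ua.datetime
     , (select count(*)
        from user_logons ut
        where ut.user_id = ua.user_id
          and ut.datetime between ua.datetime
                              and ua.datetime + interval '1 hour'
       ) as consecutive_logons
from user_logons ua
where ua.datetime >= timestamp '2018-12-01 00:00:00';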

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2
You didn't show us your table definition, but something like this:
select *
from (
select t.*,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp not date (otherwise the result of the difference wouldn't be an interval)
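If the column does turn out to be a DATE, the subtraction yields a number of days rather than an interval; a hypothetical variant of the same idea (date_column is a placeholder name) would then look like:

select *
from (
    select t.*,
           t.date_column - lag(t.date_column) over (order by t.date_column) as diff_days
    from the_table t
) x
where diff_days > 2 / 24;  -- 2 hours expressed as a fraction of a day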

sql Query to find the maximum hour of particular event in table

I have a single table with fields (crime-id int, crime_time timestamp, crime string, city string).
There are only 9 unique crimes in the table. I need to find the time, i.e. the hour, in which a particular crime occurs most frequently. E.g. if Robbery happens most often between 10 and 11, it must show 10 or 11 ... the time may start from 00:00 and end at 23:59.
viod's answer is almost OK.
But you need a GROUP BY to count the robberies per time slot.
You also need to put an alias on the subquery.
SELECT period, max(nb)
FROM (
SELECT extract(hour from crime_time) as period, count(*) as nb
FROM crimes
WHERE crime_string = 'Robbery'
GROUP BY extract(hour from crime_time)
) as subquery_alias
GROUP BY period
This should do, but I have not tested it (and you may have to find the Hive equivalents of the Postgres function I use, extract; the doc is available here: http://www.postgresql.org/docs/9.1/static/functions-datetime.html).
SELECT max(nb), period
FROM (
SELECT count(*) as nb, period
FROM (
SELECT crime_string, extract(hour from crime_time) as period
FROM crimes
WHERE crime_string = 'Robbery'
)
GROUP BY period
);
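To get just the single hour with the highest count, a sketch of a variant of the corrected query above is to order by the count and keep one row (LIMIT works in Postgres and Hive; on older Hive versions you may need hour(crime_time) instead of extract):

SELECT extract(hour from crime_time) AS period, count(*) AS nb
FROM crimes
WHERE crime_string = 'Robbery'
GROUP BY extract(hour from crime_time)
ORDER BY nb DESC
LIMIT 1;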

How to get number of hits by time regardless of Date?

I am working on a SQL view that should get the average number of hits per hour of the day, regardless of what day/date it is, for traffic monitoring (e.g. 12:00:00.000 - 12:59:59.999). Any ideas?
EDIT
Now I have the total, how do I get the average? Wrapping the query below in SELECT AVG(...) does not work.
SELECT COUNT(*) AS total, DATEPART(hh, LogDate) AS HourOfDay
FROM dbo.Log
GROUP BY DATEPART(hh, LogDate)
Convert with DATEPART(hh, ...).
Example: SELECT DATEPART(hh, GETDATE())
Since you are on SQL Server 2008, you can also use the time data type; just convert to time.
Example: SELECT CONVERT(TIME, GETDATE())
Then you can filter on that as well.
Since I am not sure what your output is supposed to look like, I am showing you both; but if all you need is to group by hour, then just use DATEPART(hh, ...).
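For instance, a sketch of the filtering idea for the 12:00 - 12:59 window mentioned in the question (table and column names taken from the query above):

SELECT COUNT(*) AS hits_at_noon
FROM dbo.Log
WHERE CONVERT(TIME, LogDate) >= '12:00:00'
  AND CONVERT(TIME, LogDate) < '13:00:00';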
The query below may be good enough for you. It divides the count by the difference between today's date and the minimum date in the LogDate column.
SELECT DATEPART(hh, LogDate) as Hour
     , CAST(COUNT(*) as decimal) / DATEDIFF(d, (SELECT MIN(LogDate) FROM log), CURRENT_TIMESTAMP) as AverageHits
     , COUNT(*) as Count
FROM log
GROUP BY DATEPART(hh, LogDate)
ORDER BY DATEPART(hh, LogDate) asc
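If you would rather average only over days that actually have traffic (instead of every day since the first log entry), a sketch of an alternative, assuming the same dbo.Log table and LogDate column, is to count per day and hour first and then average:

SELECT HourOfDay
     , AVG(CAST(total AS decimal(18, 2))) AS AvgHits
FROM (
    SELECT CAST(LogDate AS date) AS LogDay
         , DATEPART(hh, LogDate) AS HourOfDay
         , COUNT(*) AS total
    FROM dbo.Log
    GROUP BY CAST(LogDate AS date), DATEPART(hh, LogDate)
) AS per_day
GROUP BY HourOfDay
ORDER BY HourOfDay;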