SQL query to find the hour with the maximum count of a particular event in a table

I have a single table with fields (crime_id int, crime_time timestamp, crime string, city string).
There are only 9 unique crimes in the table. I need to find the hour in which a particular crime occurred most frequently. E.g. if robbery occurs most often between 10:00 and 11:00, it must show 10 or 11. The time may range from 00:00 to 23:59.

viod's answer is almost OK.
But you need a GROUP BY to count the robberies per time slot.
You also need to put an alias on the subquery.
SELECT period, max(nb)
FROM (
SELECT extract(hour from crime_time) as period, count(*) as nb
FROM crimes
WHERE crime_string = 'Robbery'
GROUP BY extract(hour from crime_time)
) as subquery_alias
GROUP BY period
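If you only need the single hour with the highest count rather than one row per hour, a small variation should work; this is an untested sketch that orders the hourly counts and keeps the top row (in Hive you may need hour(crime_time) instead of extract):
SELECT period, nb
FROM (
    SELECT extract(hour from crime_time) AS period, count(*) AS nb
    FROM crimes
    WHERE crime_string = 'Robbery'
    GROUP BY extract(hour from crime_time)
) AS subquery_alias
ORDER BY nb DESC
LIMIT 1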

This should do, but I have not tested it (and you may have to find Hive equivalents of the Postgres function I use, extract; the doc is available here: http://www.postgresql.org/docs/9.1/static/functions-datetime.html).
SELECT max(nb), period
FROM (
SELECT count(*) as nb, period
FROM (
SELECT crime_string, extract(hour from crime_time) as period
FROM crimes
WHERE crime_string = 'Robbery'
)
GROUP BY period
);

Related

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I am in need of an equivalent query based on window functions to report the total value of the column named balance for all the days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
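Spelled out, that manual per-day computation is just the first query above with a cutoff on time (shown here for the 2009-01-15 example; the cutoff literal is only illustrative):
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
WHERE time < '2009-01-16 00:00:00'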
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for the differences between the queries' result sets was in the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total via a window function after aggregating the data to the day level. And since you aggregate with a single condition, the FILTER condition can be converted to a basic WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily

SQL question: count of occurrence greater than N in any given hour

I'm looking through login logs (in Netezza) and trying to find users who have greater than a certain number of logins in any 1-hour time period (any consecutive 60-minute period, as opposed to strictly a clock hour) since December 1st. I've viewed the following posts, but most seem to address searching within a specific time range, not ANY given time period. Thanks.
https://dba.stackexchange.com/questions/137660/counting-number-of-occurences-in-a-time-period
https://dba.stackexchange.com/questions/67881/calculating-the-maximum-seen-so-far-for-each-point-in-time
Count records per hour within a time span
You could use the analytic function lag to look back in a sorted sequence of time stamps to see whether the record that came 19 entries earlier is within an hour's difference:
with cte as (
select user_id,
login_time,
lag(login_time, 19) over (partition by user_id order by login_time) as lag_time
from userlog
order by user_id,
login_time
)
select user_id,
min(login_time) as login_time
from cte
where extract(epoch from (login_time - lag_time)) < 3600
group by user_id
The output will show the matching users, with the first occurrence at which they logged in for the twentieth time within an hour.
I think you might do something like this (I'll use a login table with user and datetime as its only columns, for the sake of simplicity):
with connections as (
select ua.user
, ua.datetime
from user_logons ua
where ua.datetime >= timestamp'2018-12-01 00:00:00'
)
select ua.user
, ua.datetime
, (select count(*)
from connections ut
where ut.user = ua.user
and ut.datetime between ua.datetime and (ua.datetime + 1 hour)
) as consecutive_logons
from connections ua
It is up to you to complete this with your own columns (user, datetime).
It is also up to you to find the date-add facilities (ua.datetime + 1 hour won't work as written); this is more or less dependent on the DB implementation, for example it is DATE_ADD in MySQL (https://www.w3schools.com/SQl/func_mysql_date_add.asp). A possible spelling for Netezza is sketched below.
Due to the subquery (select count(*) ...), the whole query will not be the fastest, because it is a correlated subquery: it needs to be re-evaluated for each row.
The WITH is simply there to compute a subset of user_logons to minimize that cost. It might not be useful; however, it lessens the complexity of the query.
You might get better performance using a stored function or a language-driven (e.g. Java, PHP, ...) function.
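For completeness, here is an untested sketch of the same idea with the date arithmetic spelled out in Postgres-style interval syntax, which Netezza generally accepts, and with a threshold filter added. The 20 is only an example value, and the column names are the simplified placeholders used above (user is a reserved word in many databases, so it may need quoting in practice):
with connections as (
    -- logons since Dec 1st, same pre-filter as above
    select ua.user
         , ua.datetime
    from user_logons ua
    where ua.datetime >= timestamp '2018-12-01 00:00:00'
),
counted as (
    -- for each logon, count logons by the same user in the following hour
    select ua.user
         , ua.datetime
         , (select count(*)
            from connections ut
            where ut.user = ua.user
              and ut.datetime between ua.datetime
                                  and ua.datetime + interval '1 hour'
           ) as consecutive_logons
    from connections ua
)
-- keep only rows that exceed the example threshold of 20
select user, datetime, consecutive_logons
from counted
where consecutive_logons > 20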

PostgreSQL "nested"? distincts and count

I need to get the count of the distinct names per hour in one query in PostgreSQL 9.1
The relevant columns (generalized for the question) in my table are:
occurred timestamp with time zone and
name character varying(250)
And the table name, for the sake of the question, is just table.
The occurred timestamps will all be within a midnight-to-midnight (exclusive) range for one day. So far my query looks like:
'SELECT COUNT(DISTINCT ON (name)) FROM table'
It would be nice if I could get the output formatted as a list of 24 integers (one for each hour of the day); the names aren't required to be returned.
If I understand correctly what you want, you can write:
SELECT EXTRACT(HOUR FROM occurred),
COUNT(DISTINCT name)
FROM ...
WHERE ...
GROUP
BY EXTRACT(HOUR FROM occurred)
ORDER
BY EXTRACT(HOUR FROM occurred)
;
SELECT date_trunc('hour', occurred) AS hour_slice
,count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;
DISTINCT ON is a different feature.
date_trunc() gives you one count for every distinct hour, while EXTRACT counts per hour-of-day across longer periods of time. The two results do not add up, because summing up multiple count(DISTINCT x) values gives a result equal to or greater than a single count(DISTINCT x).
You want this by hour:
select extract(hour from occurred) as hr, count(distinct name)
from table t
group by extract(hour from occurred)
order by 1
This assumes there is data for only one day. Otherwise, hours from different days would be combined. To get around this, you would need to include date information as well.
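If you also want hours with no rows to show up as zero, so that you always get the full list of 24 integers, a LEFT JOIN against a generated list of hours should work on PostgreSQL 9.1. An untested sketch, reusing the mytable name from above:
SELECT h.hr,
       count(DISTINCT t.name) AS name_ct   -- counts 0 for hours with no rows
FROM generate_series(0, 23) AS h(hr)       -- one row per hour of the day
LEFT JOIN mytable t
       ON extract(hour from t.occurred) = h.hr
GROUP BY h.hr
ORDER BY h.hr;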

How do I produce a time interval query in SQLite?

I have an events-based table from which I would like to produce a query, by minute, of the number of events that were occurring.
For example, I have an event table like:
CREATE TABLE events (
session_id TEXT,
event TEXT,
time_stamp DATETIME
)
Which I have transformed into the following type of table:
CREATE TABLE sessions (
session_id TEXT,
start_ts DATETIME,
end_ts DATETIME,
duration INTEGER
);
Now I want to create a query that counts, for each particular minute, the sessions that were active during that minute, where I would essentially get back something like:
TIME_INTERVAL ACTIVE_SESSIONS
------------- ---------------
18:00 1
18:01 5
18:02 3
18:03 0
18:04 2
OK, I think I got more of what I wanted. It doesn't account for intervals that are empty, but it is good enough for what I need.
select strftime('%Y-%m-%dT%H:%M:00.000',start_ts) TIME_INTERVAL,
(select count(session_id)
from sessions s2
where strftime('%Y-%m-%dT%H:%M:00.000',s1.start_ts) between s2.start_ts and s2.end_ts) ACTIVE_SESSIONS
from sessions s1
group by strftime('%Y-%m-%dT%H:%M:00.000',start_ts);
This will generate a row per minute for the period that the data covers, with a count of the number of sessions that had started (start_ts) but hadn't finished (end_ts).
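If the empty minutes matter after all, one option is to generate the minutes first and count against them, assuming SQLite 3.8.3 or later for recursive CTEs and timestamps stored as ISO-8601 text as above. An untested sketch:
WITH RECURSIVE minutes(minute_ts) AS (
    -- start at the first session's minute
    SELECT strftime('%Y-%m-%dT%H:%M:00.000', MIN(start_ts)) FROM sessions
    UNION ALL
    -- keep adding one minute until the last session has ended
    SELECT strftime('%Y-%m-%dT%H:%M:00.000', datetime(minute_ts, '+1 minute'))
    FROM minutes
    WHERE minute_ts < (SELECT MAX(end_ts) FROM sessions)
)
SELECT m.minute_ts AS time_interval,
       (SELECT count(session_id)
        FROM sessions s
        WHERE m.minute_ts BETWEEN s.start_ts AND s.end_ts) AS active_sessions
FROM minutes m;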
PostgreSQL allows the following query.
In contrast to your example, this returns an additional column for the day, and it omits the minutes where nothing happened (count=0).
select
day, hour, minute, count(*)
from
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),
(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),
(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),
(50),(51),(52),(53),(54),(55),(56),(57),(58),(59))
as minutes (minute),
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23))
as hours (hour),
(select distinct cast(start_ts as date) from sessions
union
select distinct cast(end_ts as date) from sessions)
as days (day),
sessions
where
(day,hour,minute)
between (cast(start_ts as date),extract(hour from start_ts),extract(minute from start_ts))
and (cast(end_ts as date), extract(hour from end_ts), extract(minute from end_ts))
group by
day, hour, minute
order by
day, hour, minute;
This isn't exactly your query, but I think it could help. Did you look into the SQLite R-Tree module? This would allow you to create a virtual index on the start/stop time:
CREATE VIRTUAL TABLE sessions_index USING rtree (id, start, end);
Then you could search via:
SELECT * FROM sessions_index WHERE end >= <first minute> AND start <= <last minute>;

MySQL to get the count of rows that fall on a date for each day of a month

I have a table that contains a list of community events with columns for the days the event starts and ends. If the end date is 0 then the event occurs only on the start day. I have a query that returns the number of events happening on any given day:
SELECT COUNT(*) FROM p_community e WHERE
(TO_DAYS(e.date_ends)=0 AND DATE(e.date_starts)=DATE('2009-05-13')) OR
(DATE('2009-05-13')>=DATE(e.date_starts) AND DATE('2009-05-13')<=DATE(e.date_ends))
I just sub in any date I want to test for "2009-05-13".
I need to be able to fetch this data for every day in an entire month. I could just run the query against each day one at a time, but I'd rather run one query that can give me the entire month at once. Does anyone have any suggestions on how I might do that?
And no, I can't use a stored procedure.
Try:
SELECT COUNT(*), DATE(date) FROM table WHERE DATE(dtCreatedAt) >= DATE('2009-03-01') AND DATE(dtCreatedAt) <= DATE('2009-03-10') GROUP BY DATE(date);
This would get the count for each day in the given date range.
UPDATED: Now works on a range of dates spanning months/years.
Unfortunately, MySQL lacks a way to generate a rowset of a given number of rows.
You can create a helper table:
CREATE TABLE t_day (day INT NOT NULL PRIMARY KEY)
INSERT
INTO t_day (day)
VALUES (1),
(2),
…,
(31)
and use it in a JOIN:
SELECT day, COUNT(*)
FROM t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day
Or you may use an ugly subquery:
SELECT day, COUNT(*)
FROM (
SELECT 1 AS day
UNION ALL
SELECT 2 AS day
…
UNION ALL
SELECT 31 AS day
) t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day
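Note that with the question's column names the helper value has to be turned into an actual date before it is compared to date_starts / date_ends. An untested sketch for a single month, using '2009-05-01' only because it matches the example date above; the LEFT JOIN makes days with no events show up with a count of 0:
SELECT d.day, COUNT(e.date_starts) AS events
FROM t_day d
LEFT JOIN p_community e
       ON DATE('2009-05-01') + INTERVAL (d.day - 1) DAY   -- the actual calendar date for this day number
          BETWEEN DATE(e.date_starts)
              AND IF(DATE(e.date_ends), DATE(e.date_ends), DATE(e.date_starts))
WHERE d.day <= DAY(LAST_DAY('2009-05-01'))                -- skip day numbers beyond the month's length
GROUP BY d.day
ORDER BY d.day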