SQL statement to get the most frequent hours and/or period for a user

I'm new to working with datetimes in SQL.
Given records in MySQL with Datetime data (as seen below),
what is the best way to get the most frequently occurring HOUR, or range/period, of these records for a particular user (e.g. john)?
ID | Datetime | User | Activity
0 2010-03-29 13:15:56 john visit
1 2010-03-29 13:13:14 ariel visit
2 2010-03-29 13:09:13 john visit
3 2010-03-29 13:07:21 john visit
4 2010-02-23 12:21:03 john visit
5 2010-02-23 12:01:03 john visit
6 2010-02-23 11:01:03 john visit
7 2010-02-23 02:01:03 john visit
With the above data,
the most frequent hour for john doing a visit would be 13, while the period would perhaps be 12-13.
The goal is to find the period/time at which the user does a certain activity most often.
Thanks in advance for all the help!

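It can depend on the database you're using, but in MySQL (replace your_table with the actual table name; the WHERE clause restricts the count to one user and activity):
SELECT COUNT(id) as count, HOUR(Datetime) as hour FROM your_table WHERE User = 'john' AND Activity = 'visit' GROUP BY hour ORDER BY count DESC LIMIT 1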
(see http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html)

With Oracle you can write something like this (your_table stands for the actual table name):
select TO_CHAR(Datetime, 'HH24'), count(ID) from your_table group by TO_CHAR(Datetime, 'HH24')
On other RDBMSs, use the equivalent function to extract the hour part from the Datetime field.
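As a minimal sketch of the per-user version of this query, the example below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and returns the busiest hour for one user. The table name visits and the helper most_frequent_hour are assumptions made for illustration; SQLite's strftime('%H', ...) plays the role of MySQL's HOUR().

```python
import sqlite3

# Sample rows copied from the question; the table name `visits` is hypothetical.
ROWS = [
    (0, "2010-03-29 13:15:56", "john", "visit"),
    (1, "2010-03-29 13:13:14", "ariel", "visit"),
    (2, "2010-03-29 13:09:13", "john", "visit"),
    (3, "2010-03-29 13:07:21", "john", "visit"),
    (4, "2010-02-23 12:21:03", "john", "visit"),
    (5, "2010-02-23 12:01:03", "john", "visit"),
    (6, "2010-02-23 11:01:03", "john", "visit"),
    (7, "2010-02-23 02:01:03", "john", "visit"),
]


def most_frequent_hour(conn, user, activity):
    """Return (hour, count) for the hour in which `user` most often did `activity`."""
    return conn.execute(
        """
        SELECT CAST(strftime('%H', Datetime) AS INTEGER) AS hour,
               COUNT(*) AS cnt
        FROM visits
        WHERE User = ? AND Activity = ?
        GROUP BY hour
        ORDER BY cnt DESC, hour   -- ties are broken by the earlier hour
        LIMIT 1
        """,
        (user, activity),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (ID INTEGER, Datetime TEXT, User TEXT, Activity TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", ROWS)

print(most_frequent_hour(conn, "john", "visit"))  # (13, 3)
```

Dropping the LIMIT 1 gives the full per-hour histogram, which is one way to eyeball a "period" such as 12-13 from adjacent busy hours.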

Related

Excluding identical returns from ACCESS query when names are equal

In MS ACCESS, I have a table containing names and the dates on which a person's yearly exam expires. This exam is valid for 12 months, so the next exam is typically done before all 12 months have expired.
Table, called "Exam", looks like this (in the real table names are unique):
ID Name Dateexp
1 Peter 30/07/2020
2 john 10/09/2020
3 Bob 11/10/2019
4 Peter 25/06/2021
I have a query that shows the persons with a "valid" exam. It looks like this:
SELECT Name As Name, Dateexp As Expiry FROM Overall WHERE Dateexp > now();
It returns:
Name Expiry
Peter 30/07/2020
John 10/09/2020
Peter 25/06/2021
The problem is that "Peter" has done a new exam, thereby extending his expiry date from 30/07/2020 to 25/06/2021, and I only want the latest one to be shown.
Query should return:
Name Expiry
Peter 25/06/2021
John 10/09/2020
I am truly lost - does anyone have an idea as to how this can be solved?
Thank you!
You can use max and having clause:
Select name, max(dateexp) as dateexp
from overall
Group by name
Having max(dateexp) > now()
If I followed you correctly, you can just use aggregation, and filter with a having clause:
select name, max(dateexp) as expiry
from overall
group by name
having max(dateexp) > now();
This filters on names whose latest expiry date is in the future.
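The aggregation can be checked against the question's sample rows with a small sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than on Access, stores the dates as ISO strings so they compare correctly as text, and passes the reference date as a parameter instead of calling now() so the result is reproducible. The helper name valid_exams is an assumption for illustration.

```python
import sqlite3


def valid_exams(conn, today):
    """Latest expiry per name, keeping only names whose latest expiry is after `today`."""
    return conn.execute(
        """
        SELECT Name, MAX(Dateexp) AS Expiry
        FROM Overall
        GROUP BY Name
        HAVING MAX(Dateexp) > ?
        ORDER BY Name
        """,
        (today,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Overall (ID INTEGER, Name TEXT, Dateexp TEXT)")
conn.executemany(
    "INSERT INTO Overall VALUES (?, ?, ?)",
    [
        (1, "Peter", "2020-07-30"),
        (2, "John", "2020-09-10"),
        (3, "Bob", "2019-10-11"),   # already expired at the reference date
        (4, "Peter", "2021-06-25"),  # Peter's renewed exam
    ],
)

print(valid_exams(conn, "2020-01-01"))
# [('John', '2020-09-10'), ('Peter', '2021-06-25')]
```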

Use 12 different date ranges in join statement-SQL without re-running the join 12 times

Suppose I have 2 tables in the formats below:
Table 1: Visits
Customer | Website_visit_id | Time_of_visit
Table 2: Booking
Customer | Hotel_booking_id | Time_of_booking
I created a table that has customer's id, their booking, and all the visits they made to the website within 30 days prior to making the booking using this:
select customer, booking_id, website_visit_id
from visits a
join booking b
on a.customer = b.customer
and Time_of_visit between dateadd(days, -30, time_of_booking) and time_of_booking
My question is: if I want to expand this time frame to look at how many visits were made within 60, 90, 120 ... 365 days prior to making the booking, how do I do that most efficiently, instead of having to run the join 12 times with the dateadd number changed? Is there a way to add a parameter in place of the '30' days in the join statement?
The output can be in the form of a list of bookings and their corresponding visits and time frames, or a table of bookings and the total number of visits for each time frame, something that looks like this:
Time_frame| Customer | Hotel_booking_id | Number of visits
T30D | Mike | 1A | 5
T60D | Mike | 1A | 15
T90D | Mike | 1A | 22
T120D | Mike | 1A | 27
Thank you in advance
If you are looking to dynamically build a query, you should look to see if your DB supports dynamic queries/sql. Essentially, instead of running a query, you would run a stored procedure with logic to identify what your query parameters should be. From there, you would be able to dynamically create your query based on your preferred logic and execute it.
I'm not sure which DB you are using; however, I know that MySQL and Oracle DBs support this.
Please see links below for further information:
Oracle Documentation
MySQL Documentation
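As an alternative to dynamic SQL (a different technique from the one suggested above), the look-back windows can themselves be rows in a small helper table that is joined in, so a single statement produces one count per booking and window. The sketch below is only an illustration under assumptions: the time_frames table, the sample rows, and the function visits_per_time_frame are invented here, and it runs on SQLite via Python's sqlite3 module, using julianday() where the question's database would use dateadd.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits  (customer TEXT, website_visit_id INTEGER, time_of_visit TEXT);
    CREATE TABLE booking (customer TEXT, hotel_booking_id TEXT, time_of_booking TEXT);
    -- Hypothetical helper table: one row per look-back window.
    CREATE TABLE time_frames (label TEXT, days INTEGER);

    INSERT INTO booking VALUES ('Mike', '1A', '2020-12-31');
    INSERT INTO visits VALUES
        ('Mike', 1, '2020-12-20'),  -- 11 days before the booking
        ('Mike', 2, '2020-11-15'),  -- 46 days before
        ('Mike', 3, '2020-10-10'),  -- 82 days before
        ('Mike', 4, '2019-06-01');  -- more than 365 days before
    INSERT INTO time_frames VALUES
        ('T30D', 30), ('T60D', 60), ('T90D', 90), ('T365D', 365);
    """
)


def visits_per_time_frame(conn):
    """One row per (time frame, booking) with the number of visits inside that window."""
    return conn.execute(
        """
        SELECT t.label, b.customer, b.hotel_booking_id,
               COUNT(v.website_visit_id) AS number_of_visits
        FROM booking b
        CROSS JOIN time_frames t
        LEFT JOIN visits v
               ON v.customer = b.customer
              AND julianday(b.time_of_booking) - julianday(v.time_of_visit)
                  BETWEEN 0 AND t.days
        GROUP BY t.label, t.days, b.customer, b.hotel_booking_id
        ORDER BY b.hotel_booking_id, t.days
        """
    ).fetchall()


for row in visits_per_time_frame(conn):
    print(row)
# ('T30D', 'Mike', '1A', 1)
# ('T60D', 'Mike', '1A', 2)
# ('T90D', 'Mike', '1A', 3)
# ('T365D', 'Mike', '1A', 3)
```

The LEFT JOIN keeps a row for windows with no visits at all (count 0); adding another window is then an INSERT into the helper table rather than another copy of the join.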

Time between date. (More advanced than just Datediff)

I have a table that contains Guest_ID and Trip_Date. I have been tasked with trying to find out for each Guest_ID how many times they have had over 365 days between trips. I know that for the time between the dates I can use datediff formula but I am unsure of how to get the dates plugged in properly. I think if I can get help with this part I can do the rest.
For each time this happened I need to report back Guest_ID, Prior_Last_Trip, New_Trip, days between. This data goes back for over a decade so it is possible for a Guest to have multiple periods of over a year between visits.
I was thinking of just loading a table with that data that can be queried later. That way once I figure out how to make this work the first time I can setup a stored procedure or trigger to check for new occurrences of this and populate the table.
I was not sure where to begin on this code. I was thinking recursion might be the answer, but I do not know recursion, just that it exists.
This table is quite large. Around 1.5 million unique Guest_ID's with over 30 million trips.
I am using SQL Server 2012. If there is anything else I can add to help this let me know. I will edit and update this as I have ideas on how to make this work myself.
Edit 1: Sample Data and Desired Results
Guest_ID Trip_Date
1 1/1/2013
1 2/5/2013
1 12/5/2013
1 1/1/2015
1 6/5/2015
1 8/1/2017
1 10/2/2017
1 1/6/2018
1 6/7/2018
1 7/1/2018
1 7/5/2018
2 1/1/2018
2 2/6/2018
2 4/2/2018
2 7/3/2018
3 1/1/2014
3 6/5/2014
3 9/4/2014
Guest_ID Prior_Last_Trip New_Trip DaysBetween
1 12/5/2013 1/1/2015 392
1 6/5/2015 8/1/2017 788
So you can see that Guest 1 had 2 different times where they did not have a trip for over a year, and those two instances are recorded in the results. Guest 2 never had a gap of over a year and therefore has no records in the results. Guest 3 has not had a trip in over a year, but without a return trip they currently do not qualify for the result set. Should Guest 3 ever make another trip, they would then be added to the result set.
Edit 2: Working Query
Thanks to @Code4ml I got this working. Here is the complete query.
Select
    Guest_ID, CurrentTrip, DaysBetween, Lasttrip
From (
    Select
        Guest_ID
        ,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date) as LastTrip
        ,Trip_Date as CurrentTrip
        ,DATEDIFF(d,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date),Trip_Date) as DaysBetween
    From UCS
) as A
Where DaysBetween > 365
You can use the SQL LAG function to access the previous trip date, as below. The window must be ordered by trip_date ascending; with a descending order, LAG would return the following trip instead.
SELECT guest_id, trip_date,
LAG (trip_date,1) OVER (PARTITION BY guest_id ORDER BY trip_date) AS prev_trip_date
FROM tripsTable
Now you can use this as a subquery to calculate the number of days between trips and filter the data as required.
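The LAG-as-subquery idea can be tried out on the question's sample data with a sketch along these lines. It is written for SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), so julianday() differences stand in for SQL Server's DATEDIFF; the table name trips and the function long_gaps are assumptions for illustration, and the dates are rewritten in ISO format.

```python
import sqlite3

# Sample trips from the question (month/day/year rewritten as ISO dates).
TRIPS = [
    (1, "2013-01-01"), (1, "2013-02-05"), (1, "2013-12-05"), (1, "2015-01-01"),
    (1, "2015-06-05"), (1, "2017-08-01"), (1, "2017-10-02"), (1, "2018-01-06"),
    (1, "2018-06-07"), (1, "2018-07-01"), (1, "2018-07-05"),
    (2, "2018-01-01"), (2, "2018-02-06"), (2, "2018-04-02"), (2, "2018-07-03"),
    (3, "2014-01-01"), (3, "2014-06-05"), (3, "2014-09-04"),
]


def long_gaps(conn, min_days=365):
    """Rows (Guest_ID, Prior_Last_Trip, New_Trip, DaysBetween) for gaps over `min_days`."""
    return conn.execute(
        """
        SELECT Guest_ID, Prior_Last_Trip, New_Trip,
               CAST(julianday(New_Trip) - julianday(Prior_Last_Trip) AS INTEGER)
                   AS DaysBetween
        FROM (
            SELECT Guest_ID,
                   LAG(Trip_Date) OVER (PARTITION BY Guest_ID ORDER BY Trip_Date)
                       AS Prior_Last_Trip,
                   Trip_Date AS New_Trip
            FROM trips
        )
        -- A guest's first trip has no prior trip (NULL), so it is filtered out here.
        WHERE julianday(New_Trip) - julianday(Prior_Last_Trip) > ?
        ORDER BY Guest_ID, New_Trip
        """,
        (min_days,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (Guest_ID INTEGER, Trip_Date TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", TRIPS)

for row in long_gaps(conn):
    print(row)
# (1, '2013-12-05', '2015-01-01', 392)
# (1, '2015-06-05', '2017-08-01', 788)
```

Guest 3 produces no row because the comparison is only ever made between two recorded trips, which matches the desired results in the question.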

Count The number of Visit in SQL

I need to find a way to record the visit number of a person in my table: the first time a person comes, the value in VISIT should be 1, and the second time that person comes, the value should be 2.
Below is an illustration of the situation.
CNT PATID DATE PATName VISIT
----------------------------------------
300 3001 16/08/2015 Jason 1
300 3002 16/08/2015 Sayde 1
300 3003 20/08/2015 Sayde 2
300 3004 20/08/2015 wetni 1
300 3005 20/08/2015 Jason 2
The column Visit is the thing I want to be able to calculate and show.
The best way is to calculate this on the fly with the row_number() function:
select CNT, PATID, DATE, PATName,
row_number() over (partition by PATName order by PATID) as VISIT
from table
The above will work in SQL Server and Oracle. If you use a MySQL version older than 8.0, which has no window functions, you need to use a variable instead.
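A quick way to see the numbering in action is the sketch below, which runs the same row_number() expression on the question's sample rows in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name patient_visits and the function numbered_visits are assumptions for illustration; the DATE column is quoted because it is a type name in many databases.

```python
import sqlite3

# Sample rows from the question, without the VISIT column that is to be computed.
ROWS = [
    (300, 3001, "2015-08-16", "Jason"),
    (300, 3002, "2015-08-16", "Sayde"),
    (300, 3003, "2015-08-20", "Sayde"),
    (300, 3004, "2015-08-20", "wetni"),
    (300, 3005, "2015-08-20", "Jason"),
]


def numbered_visits(conn):
    """Return every row with a per-patient running visit number appended."""
    return conn.execute(
        """
        SELECT CNT, PATID, "DATE", PATName,
               ROW_NUMBER() OVER (PARTITION BY PATName ORDER BY PATID) AS VISIT
        FROM patient_visits
        ORDER BY PATID
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE patient_visits (CNT INTEGER, PATID INTEGER, "DATE" TEXT, PATName TEXT)'
)
conn.executemany("INSERT INTO patient_visits VALUES (?, ?, ?, ?)", ROWS)

for row in numbered_visits(conn):
    print(row)
# (300, 3001, '2015-08-16', 'Jason', 1)
# (300, 3002, '2015-08-16', 'Sayde', 1)
# (300, 3003, '2015-08-20', 'Sayde', 2)
# (300, 3004, '2015-08-20', 'wetni', 1)
# (300, 3005, '2015-08-20', 'Jason', 2)
```

Ordering by PATID assumes that IDs are handed out in visit order, as in the sample; ordering by the date column would work too if no patient has two visits on the same day.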

a question about sql group by

I have a table named visiting that looks like this:
id | visitor_id | visit_time
-------------------------------------
1 | 1 | 2009-01-06 08:45:02
2 | 1 | 2009-01-06 08:58:11
3 | 1 | 2009-01-06 09:08:23
4 | 1 | 2009-01-06 21:55:23
5 | 1 | 2009-01-06 22:03:35
I want to work out an SQL query that gets how many times a user visits within one session (successive visits less than 1 hour apart).
So, for the example data, I want to get following result:
visitor_id | count
-------------------
1 | 3
1 | 2
BTW, I use postgresql 8.3.
Thanks!
UPDATE: updated the timestamps in the example data table. sorry for the confusion.
UPDATE: I don't care much whether the solution is a single SQL query or uses a stored procedure, subquery etc. I only care how to get it done :)
The question is slightly ambiguous because you're making the assumption or requiring that the hours are going to start at a set point, i.e. a natural query would also indicate that there's a result record of (1,2) for all the visits between the hour of 08:58 and 09:58. You would have to "tell" your query that the start times are for some determinable reason visits 1 and 4, or you'd get the natural result set:
visitor_id | count
--------------------
1 | 3
1 | 2 <- extra result starting at visit 2
1 | 1 <- extra result starting at visit 3
1 | 2
1 | 1 <- extra result starting at visit 5
That extra logic is going to be expensive and too complicated for my fragile mind this morning, somebody better than me at postgres can probably solve this.
I would normally want to solve this by having a sessionkey column in the table that I could cheaply group by, for performance reasons, but I think there's also a logical problem. Deriving session info from timings seems dangerous to me, because I don't believe that the user will definitely be logged out after an hour's activity. Most session systems work by expiring the session after a period of inactivity, i.e. it's very likely that a visit after 9:45 is going to be in the same session, because your hourly period is going to be reset at 9:08.
The problem seems a little fuzzy.
It gets more complicated as id 3 is within an hour of id 1 and 2, but if the user had visited at 9:50 then that would have been within an hour of 2 but not 1.
You seem to be after a smoothed total - for a given visit, how many visits are within the following hour?
Perhaps you should be asking for how many visits have a succeeding visit less than an hour distant? If a visit is less than an hour from the preceding one then should it 'count'?
So what you probably want is how many chains do you have where the links are less than an arbitrary amount (so the hypothetical 9:50 visit would be included in the chain that starts with id 1).
no simple solution
There is no simple way to do this in a single SQL statement (at least without window functions, which PostgreSQL 8.3 lacks).
Below are 2 ideas: one uses a loop to count visits, the other changes the way the visiting table is populated.
loop solution
However, it can be done without too much trouble with a loop.
(I have tried to get the postgresql syntax correct, but I'm no expert)
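/* find entries where there is no previous entry for */
/* the same visitor within the previous hour; each one starts a session: */
select v1.* ,
       v1.visit_time as last_time ,  /* time of the latest visit in the session */
       1 as visits                   /* the first visit itself counts */
into temp_table
from visiting v1
where not exists ( select 1
                   from visiting v2
                   where v2.visitor_id = v1.visitor_id
                   and v2.visit_time < v1.visit_time
                   and v1.visit_time - interval '1 hour' < v2.visit_time
                 )
/* the loop control below is T-SQL style; in postgresql it would go in a plpgsql function */
select @rows = @@rowcount
while @rows > 0
begin
    /* extend each session by its next visit, if that visit is within the hour */
    update temp_table
    set visits = visits + 1 ,
        last_time = v.visit_time
    from temp_table t ,
         visiting v
    where t.visitor_id = v.visitor_id
    and v.visit_time > t.last_time
    and v.visit_time - interval '1 hour' < t.last_time
    /* no other visit strictly between the session's last visit and this one */
    and not exists ( select 1
                     from visiting v2
                     where v2.visitor_id = t.visitor_id
                     and v2.visit_time > t.last_time
                     and v2.visit_time < v.visit_time
                   )
    select @rows = @@rowcount
end
/* get the result: */
select visitor_id,
       visits
from temp_table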
The idea here is to do this:
get all visits where there is no prior visit inside of an hour.
this identifies the sessions
loop, getting the next visit for each of these "first visits"
until there are no more "next visits"
now you can just read off the number of visits in each session.
best solution?
I suggest:
add a column to the visiting table: session_id int not null
change the process which makes the entries so that it checks to see if the previous visit by the current visitor was less than an hour ago. If so, it sets session_id to the same as the session id for that earlier visit. If not, it generates a new session_id .
you could put this logic in a trigger.
Then your original query can be solved by:
SELECT session_id, visitor_id, count(*)
FROM visiting
GROUP BY session_id, visitor_id
Hope this helps. If I've made mistakes (I'm sure I have), leave a comment and I'll correct it.
PostgreSQL 8.4 will have windowing functions; once they are available, we can avoid creating a temporary table just to simulate row numbers (for sequencing purposes).
create table visit
(
visitor_id int not null,
visit_time timestamp not null
);
insert into visit(visitor_id, visit_time)
values
(1, '2009-01-06 08:45:02'),
(2, '2009-02-06 08:58:11'),
(1, '2009-01-06 08:58:11'),
(1, '2009-01-06 09:08:23'),
(1, '2009-01-06 21:55:23'),
(2, '2009-02-06 08:59:11'),
(2, '2009-02-07 00:01:00'),
(1, '2009-01-06 22:03:35');
create temp table temp_visit(visitor_id int not null, sequence serial not null, visit_time timestamp not null);
insert into temp_visit(visitor_id, visit_time) select visitor_id, visit_time from visit order by visitor_id, visit_time;
select
reference.visitor_id, count(nullif(reference.visit_time - prev.visit_time < interval '1 hour',false))
from temp_visit reference
left join temp_visit prev
on prev.visitor_id = reference.visitor_id and prev.sequence = reference.sequence - 1
group by reference.visitor_id;
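Once window functions are available, the per-session counts the question asks for can be sketched in a single statement: flag each visit that starts a new session (no previous visit, or a gap of an hour or more), turn the running sum of those flags into a session number, and count rows per session. The example below is an illustration under assumptions, run on SQLite through Python's sqlite3 module (3.25 or later) with the sample rows from the insert statement above; the function session_counts is a name made up here, and strftime('%s', ...) is SQLite's way of getting epoch seconds.

```python
import sqlite3

# Same sample rows as the insert statement above.
VISITS = [
    (1, "2009-01-06 08:45:02"), (2, "2009-02-06 08:58:11"),
    (1, "2009-01-06 08:58:11"), (1, "2009-01-06 09:08:23"),
    (1, "2009-01-06 21:55:23"), (2, "2009-02-06 08:59:11"),
    (2, "2009-02-07 00:01:00"), (1, "2009-01-06 22:03:35"),
]


def session_counts(conn, gap_seconds=3600):
    """Return (visitor_id, visits) per session; a gap >= gap_seconds starts a new one."""
    return conn.execute(
        """
        WITH with_prev AS (
            SELECT visitor_id, visit_time,
                   LAG(visit_time) OVER (PARTITION BY visitor_id ORDER BY visit_time)
                       AS prev_time
            FROM visit
        ),
        flagged AS (
            SELECT visitor_id, visit_time,
                   CASE WHEN prev_time IS NULL
                          OR CAST(strftime('%s', visit_time) AS INTEGER)
                           - CAST(strftime('%s', prev_time) AS INTEGER) >= ?
                        THEN 1 ELSE 0 END AS new_session
            FROM with_prev
        ),
        numbered AS (
            -- Running total of session starts = session number within the visitor.
            SELECT visitor_id,
                   SUM(new_session) OVER (PARTITION BY visitor_id ORDER BY visit_time)
                       AS session_no
            FROM flagged
        )
        SELECT visitor_id, COUNT(*) AS visits
        FROM numbered
        GROUP BY visitor_id, session_no
        ORDER BY visitor_id, session_no
        """,
        (gap_seconds,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (visitor_id INTEGER NOT NULL, visit_time TEXT NOT NULL)")
conn.executemany("INSERT INTO visit VALUES (?, ?)", VISITS)

print(session_counts(conn))  # [(1, 3), (1, 2), (2, 2), (2, 1)]
```

This follows the "chain" reading of a session discussed earlier in the thread: a visit belongs to the current session as long as it is less than an hour after the previous visit, however long the session has already lasted.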
One or both of these may work? However, both will end up giving you more columns in the result than you are asking for.
SELECT visitor_id,
date_part('year', visit_time),
date_part('month', visit_time),
date_part('day', visit_time),
date_part('hour', visit_time),
COUNT(*)
FROM visiting
GROUP BY 1, 2, 3, 4, 5;
SELECT visitor_id,
FLOOR(EXTRACT(EPOCH FROM visit_time) / 3600) * 3600, -- start of the hour in epoch seconds; avoids %, which is not defined for double precision
COUNT(*)
FROM visiting
GROUP BY 1, 2;
This can't easily be done in a single SQL statement.
The better option is to handle it in a stored procedure.
If it were T-SQL, I would write something as:
SELECT visitor_id, COUNT(id),
DATEPART(yy, visit_time), DATEPART(m, visit_time),
DATEPART(d, visit_time), DATEPART(hh, visit_time)
FROM visiting
GROUP BY
visitor_id,
DATEPART(yy, visit_time), DATEPART(m, visit_time),
DATEPART(d, visit_time), DATEPART(hh, visit_time)
which gives me:
1 3 2009 1 6 8
1 2 2009 1 6 21
I do not know how, or if, you can write this in PostgreSQL though.