Group by period of time in Oracle SQL

I am trying to group part of my data by a given time period using a timestamp column, and count the number of rows in each group, but I have not been successful after many attempts.
First of all I start with this data:
REQ_DATA PROVIDER_NAME
24/01/2023 12:01:01 PRO_1
24/01/2023 12:03:01 PRO_1
24/01/2023 12:00:01 PRO_1
24/01/2023 12:05:01 PRO_1
24/01/2023 12:12:01 PRO_1
24/01/2023 12:18:01 PRO_1
24/01/2023 12:18:50 PRO_1
24/01/2023 12:18:54 PRO_2
What I want to get is a result like this (the period being 10 minutes):
TIME_FILTER PROVIDER_NAME COUNT
24/01/2023 12:00:01 - 24/01/2023 12:05:01 PRO_1 4
24/01/2023 12:12:01 - 24/01/2023 12:18:50 PRO_1 3
24/01/2023 12:18:54 - 24/01/2023 12:18:54 PRO_2 1
The TIME_FILTER column could be omitted; I include it here to make the intent clearer.
The solution should be compatible with Oracle 11g.
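For reference, the bucketing logic can be sketched outside the database. This is an illustration, not the Oracle answer itself: it assumes fixed 10-minute wall-clock windows anchored at the hour. In Oracle 11g, if REQ_DATA is a DATE, the usual equivalent trick is date arithmetic such as `TRUNC(req_data) + FLOOR((req_data - TRUNC(req_data)) * 144) / 144` (144 being the number of 10-minute intervals in a day), grouped together with PROVIDER_NAME.

```python
from datetime import datetime

# Sample rows (timestamp, provider) from the question.
rows = [
    ("24/01/2023 12:01:01", "PRO_1"),
    ("24/01/2023 12:03:01", "PRO_1"),
    ("24/01/2023 12:00:01", "PRO_1"),
    ("24/01/2023 12:05:01", "PRO_1"),
    ("24/01/2023 12:12:01", "PRO_1"),
    ("24/01/2023 12:18:01", "PRO_1"),
    ("24/01/2023 12:18:50", "PRO_1"),
    ("24/01/2023 12:18:54", "PRO_2"),
]

def bucket(ts, minutes=10):
    """Floor a timestamp to the start of its fixed-size window."""
    dt = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")
    return dt.replace(minute=dt.minute - dt.minute % minutes, second=0)

# Count rows per (window start, provider), i.e. the GROUP BY.
counts = {}
for ts, provider in rows:
    key = (bucket(ts), provider)
    counts[key] = counts.get(key, 0) + 1

for (start, provider), n in sorted(counts.items()):
    print(start, provider, n)
```

With the sample data this yields 4 PRO_1 rows in the 12:00 window, 3 PRO_1 rows and 1 PRO_2 row in the 12:10 window, matching the desired counts.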

Create an extra column based on a condition in pandas

I have a data frame as shown below
Tenancy_ID Start_Date Cancelled_Date
1 2011-10-02 07:18:16 2011-12-02 08:15:16
2 2012-10-22 07:18:17 NaT
1 2013-06-02 07:14:12 NaT
3 2016-10-02 07:18:16 2017-03-02 08:18:15
From the above I would like to create a new column named Cancelled_Status, based on whether a cancelled date is present in Cancelled_Date.
Expected Output:
Tenancy_ID Start_Date Cancelled_Date Cancelled_status
1 2011-10-02 07:18:16 2011-12-02 08:15:16 Cancelled
2 2012-10-22 07:18:17 NaT Not_Cancelled
1 2013-06-02 07:14:12 NaT Not_Cancelled
3 2016-10-02 07:18:16 2017-03-02 08:18:15 Cancelled
Use numpy.where with Series.isna:
df['Cancelled_status'] = np.where(df['Cancelled_Date'].isna(), 'Not_Cancelled', 'Cancelled')
Alternative with Series.notna:
df['Cancelled_status'] = np.where(df['Cancelled_Date'].notna(), 'Cancelled', 'Not_Cancelled')
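For completeness, here is a self-contained, runnable version of the same approach, rebuilding the question's frame inline:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Tenancy_ID": [1, 2, 1, 3],
    "Start_Date": pd.to_datetime([
        "2011-10-02 07:18:16", "2012-10-22 07:18:17",
        "2013-06-02 07:14:12", "2016-10-02 07:18:16",
    ]),
    "Cancelled_Date": pd.to_datetime([
        "2011-12-02 08:15:16", None, None, "2017-03-02 08:18:15",
    ]),
})

# NaT (missing) cancellation date -> Not_Cancelled, otherwise Cancelled.
df["Cancelled_status"] = np.where(
    df["Cancelled_Date"].isna(), "Not_Cancelled", "Cancelled"
)

print(df)
```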

SQL query to show user session length

I have a table that looks like this:
user_id page happened_at
2 'page3' 2017-10-05 11:31
1 'page2' 2016-02-01 00:02
2 'page1' 2017-10-05 15:24
3 'page3' 2017-03-31 19:35
4 'page1' 2017-07-09 00:24
2 'page3' 2017-10-05 15:28
1 'page3' 2018-02-01 13:02
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
etc
I have a query that identifies user sessions: pages #1, #2 and #3 opened in that particular order, each within one hour of the previous one (page2 within an hour of page1, page3 within an hour of page2). Any pages opened in between can be ignored. Example of a session from the table above:
user_id page happened_at
2 'page1' 2017-10-05 15:24
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
My query so far looks like this and shows user_id of users, who had sessions:
select user_id
from (select user_id, page, happened_at,
             lag(page) over (partition by user_id order by happened_at) as prev_page,
             lead(page) over (partition by user_id order by happened_at) as next_page,
             datediff(minute, lag(happened_at) over (partition by user_id order by happened_at), happened_at) as time_diff_with_prev_action,
             datediff(minute, happened_at, lead(happened_at) over (partition by user_id order by happened_at)) as time_diff_with_next_action
      from tbl
     ) t
where page = 'page2' and prev_page = 'page1' and next_page = 'page3'
  and time_diff_with_prev_action <= 60 and time_diff_with_next_action <= 60
What I need is to edit the query to add 2 columns to the output: session start time and session end time, the latter being the last action + 1 hour. Please advise how to do this. Temporary tables are forbidden, so it should be a single query. Example output:
user_id session_start session_end
2 2017-10-05 15:24 2017-10-05 17:34
Thanks for your time!
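One thing to watch: plain lag()/lead() look only at the immediately adjacent rows, so pages in between are not actually ignored (in the sample data, a page3 at 15:28 sits between user 2's page1 and page2). A sketch that both tolerates in-between pages and returns the requested columns uses framed MIN/MAX aggregates to find the nearest page1 before and page3 after each page2. It is verified here with SQLite (standing in for the question's datediff dialect; julianday differences give minutes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl (user_id INT, page TEXT, happened_at TEXT);
INSERT INTO tbl VALUES
 (2,'page3','2017-10-05 11:31'), (1,'page2','2016-02-01 00:02'),
 (2,'page1','2017-10-05 15:24'), (3,'page3','2017-03-31 19:35'),
 (4,'page1','2017-07-09 00:24'), (2,'page3','2017-10-05 15:28'),
 (1,'page3','2018-02-01 13:02'), (2,'page2','2017-10-05 16:14'),
 (2,'page3','2017-10-05 16:34');
""")

# Anchor on each page2 row: session_start is the latest page1 before it,
# session_end is the earliest page3 after it, plus one hour.
rows = con.execute("""
SELECT user_id, prev_page1 AS session_start,
       datetime(next_page3, '+1 hour') AS session_end
FROM (
  SELECT user_id, page, happened_at,
         MAX(CASE WHEN page = 'page1' THEN happened_at END) OVER (
           PARTITION BY user_id ORDER BY happened_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_page1,
         MIN(CASE WHEN page = 'page3' THEN happened_at END) OVER (
           PARTITION BY user_id ORDER BY happened_at
           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS next_page3
  FROM tbl
) t
WHERE page = 'page2'
  AND prev_page1 IS NOT NULL AND next_page3 IS NOT NULL
  AND (julianday(happened_at) - julianday(prev_page1)) * 1440 <= 60
  AND (julianday(next_page3) - julianday(happened_at)) * 1440 <= 60
""").fetchall()

print(rows)
```

This returns user 2 with session_start 2017-10-05 15:24 and session_end 2017-10-05 17:34, matching the example output. In a dialect with DATEDIFF and DATEADD, the julianday arithmetic and datetime(..., '+1 hour') would be replaced accordingly.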

Getting a count by date based on the number of observations with encompassing date ranges

I am working with a table in Microsoft Access whereby I have 2 columns with a start and end date.
I want to get the count by date of the number of rows with date ranges that encompass the date in the output table.
Input Data
Start Date End Date
01/02/2017 03/02/2017
07/02/2017 19/02/2017
09/02/2017 19/02/2017
11/02/2017 12/02/2017
12/02/2017 17/02/2017
Desired Output
Date Count
01/02/2017 1
02/02/2017 1
03/02/2017 1
04/02/2017 0
05/02/2017 0
06/02/2017 0
07/02/2017 1
08/02/2017 1
09/02/2017 2
10/02/2017 2
11/02/2017 3
12/02/2017 4
13/02/2017 3
14/02/2017 3
15/02/2017 3
16/02/2017 3
17/02/2017 3
18/02/2017 2
19/02/2017 2
20/02/2017 0
For this project, I have to use Microsoft Access 2010, so a solution in either SQL code or design view input would be great.
Any help on this would be appreciated. Thanks!
The query below gives a count of rows per end date; you can change the grouped column to suit your requirements. Note that it only counts rows sharing the same end date, so on its own it will not spread each range across every date it covers.
SELECT END_DATE AS [Date], COUNT(*) AS [Count]
FROM TABLE_NAME
GROUP BY END_DATE
ORDER BY END_DATE;
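The desired output instead counts, for every calendar date, the ranges that cover it. In Access this is usually done by joining a calendar (tally) table of dates against the data on `BETWEEN [Start Date] AND [End Date]`. The counting logic itself can be sketched as follows (table and reporting window are taken from the question):

```python
from datetime import date, timedelta

# (start, end) ranges from the question, inclusive on both ends.
ranges = [
    (date(2017, 2, 1),  date(2017, 2, 3)),
    (date(2017, 2, 7),  date(2017, 2, 19)),
    (date(2017, 2, 9),  date(2017, 2, 19)),
    (date(2017, 2, 11), date(2017, 2, 12)),
    (date(2017, 2, 12), date(2017, 2, 17)),
]

# For each date in the reporting window, count the ranges covering it --
# the same thing a join against a calendar table does in SQL.
counts = {}
day = date(2017, 2, 1)
while day <= date(2017, 2, 20):
    counts[day] = sum(1 for s, e in ranges if s <= day <= e)
    day += timedelta(days=1)

for d, n in counts.items():
    print(d.strftime("%d/%m/%Y"), n)
```

This reproduces the desired output, including the zero-count dates (04-06/02 and 20/02) that a plain GROUP BY on the data table cannot produce.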

How to set an event duration limit to define "same event" and "new event" in Access 2010 query or SQL?

I am working in Access 2010 with datetime-stamped records (photographs from camera traps) that signify visits by specific animals (SpeciesID 0-10, AnimalID 1-20) to different camera sites (StationID). I want to calculate the number and duration of visits by each AnimalID to each StationID.
The problem is that sometimes animals visit the same station multiple times in the same day. I have tried queries that group the records by date and show the 'First of' and 'Last of' the datetime field, but this just gives the datetime of the first and last records of that animal at each station on that day, not of each individual visit.
The criterion I want to use is: "if consecutive records of the same animal, species and station are >20 minutes apart, then they are separate visits". I wonder whether a way to solve this is to create a new field with an update query that gives each visit a unique 'VisitID' number using this criterion, so I can then group records by VisitID to calculate the first and last datetime for each separate visit. Can anyone suggest a way to do this as a query or in SQL, or think of another approach?
My data table (called Capture) is laid out like this:
CaptureID | StationID | SpeciesID | AnimalID | cDateTime
CaptureID is a unique Autonumber for every record. SpeciesID can be 1-10, AnimalID can be 1-20 (but AnimalIDs are only assigned to records of Species 1), StationID can be 1-12, cDateTime can be any time as the camera traps are motion-triggered, and is formatted as DD/MM/YYYY hh:mm:ss. I want the visit duration to be formatted as hh:mm:ss.
Any help or advice much appreciated!
Here is my solution. My test data is
CaptureID AnimalID StationID cDateTime VisitStart VisitEnd
--------- -------- --------- ------------------- ------------------- -------------------
1 1 1 2013-05-21 08:00:00
2 2 1 2013-05-21 08:02:00
3 1 1 2013-05-21 08:07:00
4 2 1 2013-05-21 08:21:00
5 1 1 2013-05-21 08:28:00
Notes:
I have omitted SpeciesID since AnimalID is the unique identifier, so SpeciesID really belongs in the [Animals] table with the other details about that particular animal.
All VisitStart values are initially NULL. That is important for one of the queries below.
To populate VisitStart, we'll just use the cDateTime for any capture that does not have a previous capture within 20 minutes for the same AnimalID and StationID.
UPDATE Captures SET VisitStart = cDateTime
WHERE NOT EXISTS
(
    SELECT * FROM Captures c2
    WHERE c2.AnimalID = Captures.AnimalID AND c2.StationID = Captures.StationID
        AND c2.cDateTime < Captures.cDateTime
        AND c2.cDateTime >= DateAdd("n", -20, Captures.cDateTime)
)
That gives us the start times for the discrete visits:
CaptureID AnimalID StationID cDateTime VisitStart VisitEnd
--------- -------- --------- ------------------- ------------------- -------------------
1 1 1 2013-05-21 08:00:00 2013-05-21 08:00:00
2 2 1 2013-05-21 08:02:00 2013-05-21 08:02:00
3 1 1 2013-05-21 08:07:00
4 2 1 2013-05-21 08:21:00
5 1 1 2013-05-21 08:28:00 2013-05-21 08:28:00
Now we can fill in the rest of the VisitStart values by finding the largest previous VisitStart for that AnimalID/StationID:
UPDATE Captures
SET VisitStart = DMax("VisitStart", "Captures", "AnimalID=" & AnimalID & " AND StationID=" & StationID & " AND cDateTime<#" & Format(cDateTime, "yyyy-mm-dd Hh:Nn:Ss") & "#")
WHERE VisitStart IS NULL
That gives us
CaptureID AnimalID StationID cDateTime VisitStart VisitEnd
--------- -------- --------- ------------------- ------------------- -------------------
1 1 1 2013-05-21 08:00:00 2013-05-21 08:00:00
2 2 1 2013-05-21 08:02:00 2013-05-21 08:02:00
3 1 1 2013-05-21 08:07:00 2013-05-21 08:00:00
4 2 1 2013-05-21 08:21:00 2013-05-21 08:02:00
5 1 1 2013-05-21 08:28:00 2013-05-21 08:28:00
A similar query can calculate the VisitEnd values:
UPDATE Captures
SET VisitEnd = DMax("cDateTime", "Captures", "AnimalID=" & AnimalID & " AND StationID=" & StationID & " AND VisitStart=#" & Format(VisitStart, "yyyy-mm-dd Hh:Nn:Ss") & "#")
The result is
CaptureID AnimalID StationID cDateTime VisitStart VisitEnd
--------- -------- --------- ------------------- ------------------- -------------------
1 1 1 2013-05-21 08:00:00 2013-05-21 08:00:00 2013-05-21 08:07:00
2 2 1 2013-05-21 08:02:00 2013-05-21 08:02:00 2013-05-21 08:21:00
3 1 1 2013-05-21 08:07:00 2013-05-21 08:00:00 2013-05-21 08:07:00
4 2 1 2013-05-21 08:21:00 2013-05-21 08:02:00 2013-05-21 08:21:00
5 1 1 2013-05-21 08:28:00 2013-05-21 08:28:00 2013-05-21 08:28:00
Calculating the visit duration is simply a matter of using DateDiff() on VisitStart and VisitEnd. Note that the last visit will have a duration of zero since there was only one capture for it.
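As a sanity check, the 20-minute-gap logic of the two UPDATE queries can be mirrored in a short Python sketch over the same test data (an illustrative stand-in for the Access table, not part of the original answer; a gap of exactly 20 minutes counts as the same visit, matching the SQL's inclusive DateAdd comparison):

```python
from datetime import datetime, timedelta

# (CaptureID, AnimalID, StationID, cDateTime) -- the answer's test data.
captures = [
    (1, 1, 1, datetime(2013, 5, 21, 8, 0, 0)),
    (2, 2, 1, datetime(2013, 5, 21, 8, 2, 0)),
    (3, 1, 1, datetime(2013, 5, 21, 8, 7, 0)),
    (4, 2, 1, datetime(2013, 5, 21, 8, 21, 0)),
    (5, 1, 1, datetime(2013, 5, 21, 8, 28, 0)),
]

GAP = timedelta(minutes=20)
visit_start = {}
last_seen = {}  # (animal, station) -> (previous capture time, its visit start)

# First pass: a capture within 20 minutes of the previous capture for the
# same animal/station inherits that visit's start; otherwise it opens one.
for cap_id, animal, station, t in sorted(captures, key=lambda c: c[3]):
    key = (animal, station)
    if key in last_seen and t - last_seen[key][0] <= GAP:
        start = last_seen[key][1]
    else:
        start = t
    visit_start[cap_id] = start
    last_seen[key] = (t, start)

# Second pass: visit end = latest capture sharing (animal, station, start).
visit_end = {}
for cap_id, animal, station, t in captures:
    k = (animal, station, visit_start[cap_id])
    visit_end[k] = max(visit_end.get(k, t), t)

for cap_id, animal, station, t in captures:
    end = visit_end[(animal, station, visit_start[cap_id])]
    print(cap_id, visit_start[cap_id], end, end - visit_start[cap_id])
```

The starts and ends it prints match the answer's final table, including the zero-duration single-capture visit at 08:28.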
You could define an on-insert trigger (in Access 2010, an After Insert data macro is the closest equivalent) for the Captures table and a new field 'VisitStart'.
The trigger would, in pseudocode:
Search for any record with the same AnimalID and a capture date within 20 minutes of this capture.
If one exists, take its VisitStart field to populate the new record's VisitStart.
If none exists, set the new VisitStart to the capture date.
I realise this does not help with your existing data, but a one-off process to pump-prime the system should be possible.
Any good?

How to ORDER before GROUP with DBIx::Class

I've got a simple temporal table that looks like this:
Table: item_approval
item user status modified
2 fred approved 2010-12-01 00:00:00
3 fred approved 2010-12-02 00:00:00
4 fred disapproved 2010-12-03 00:00:00
7 jack unapproved 2010-12-05 00:00:00
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
4 fred disapproved 2010-12-04 00:00:00
I'm using DBIx::Class. My "Item" result is defined with:
__PACKAGE__->has_many(
    "item_approvals",
    "Schema::Result::ItemApproval",
    { "foreign.item" => "self.id" },
    { cascade_copy => 0, cascade_delete => 0 },
);
Which means I can do:
my $item = $schema->resultset('Item')->find({id=>4});
Which is fine. Then, I can do:
my @approvals = $item->item_approvals;
to get a resultset like this:
item user status modified
4 fred disapproved 2010-12-03 00:00:00
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
4 fred disapproved 2010-12-04 00:00:00
My question: How do I get the set of Fred and Jack's single most recent approval status? That is, I want to get this resultset:
item user status modified
4 fred approved 2010-12-06 00:00:00
4 jack unapproved 2010-12-07 00:00:00
I tried things like this:
my @approvals = $item->item_approvals->search({}, {
    group_by => 'user',
    order_by => { -desc => 'modified' },
});
but the "ORDER BY" is executed after the "GROUP BY", so I get things like this instead:
item user status modified
4 fred disapproved 2010-12-03 00:00:00
4 jack unapproved 2010-12-07 00:00:00
Help?
From the behavior described in your comments I'm guessing your database is MySQL.
I'm also assuming your item_approval table has a primary key which I will call PK.
One option is to use a sub select to pick the row that has the largest (most recent) modified value:
select item, user, status, modified
from item_approval me
where PK = (select s.PK from item_approval s
            where me.item = s.item and me.user = s.user
            order by s.modified desc, s.PK desc
            limit 1)
  and me.item = 4
This is a fairly slow option because it will re-run the sub select for each row and then reject all but one row for each item/user combination.
Other databases have slightly different ways to get similar results.