Oracle - Calculating time differences - sql

Let's say I have the following data:
Create Table Pm_Test (
Ticket_id Number,
Department_From varchar2(100),
Department_To varchar2(100),
Routing_Date Date
);
Insert Into Pm_Test Values (1,'A','B',To_Date('20140101120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'B','C',To_Date('20140101130004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'C','D',To_Date('20140101130004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'D','E',To_Date('20140201150004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (2,'A','B',To_Date('20140102120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (3,'D','B',To_Date('20140102120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (3,'B','A',To_Date('20140102170005','yyyymmddhh24miss'));
For the following requirements I have already added two extra columns, which I think might be necessary:
Select t.*,
Count(Ticket_id) Over (Partition By Ticket_id Order By Ticket_id) Cnt_Id,
Row_Number() Over (Partition By Ticket_id Order By Ticket_id ) row_number
From Pm_Test t;
1) I want to measure how long each ticket stayed in a department (routing_date of successor_department - routing_date of predecessor department) by adding the column PROCESSING_TIME:
2) I want to measure the total processing time by adding the column TOTAL_PROCESSING_TIME:
What SQL statements would be necessary to do so?
Thank you very much in advance!

The following SQL should solve your problem as you described it. One thing to keep in mind: this data model does not seem the most efficient way to capture processing times, if that is its true intent, because the time spent in the first department to get the ticket is not measured.
select dept.ticket_id, department_from, department_to, routing_date, dept_processing_time, total_ticket_processing_time
from
(select ticket_id, max(routing_date) - min(routing_date) total_ticket_processing_time
from pm_test
group by ticket_id) total
join
(select ticket_id, department_from, department_to, routing_date,
coalesce(routing_date - lag(routing_date) over (partition by ticket_id order by routing_date), 0) dept_processing_time
from pm_test) dept
on (total.ticket_id = dept.ticket_id);

This query produces the desired output. The analytic functions max(), min() and lag() are used for the calculations.
Results are in hours, as in your question.
select t.ticket_id, t.department_from, t.department_to,
to_char(t.routing_date, 'mm.dd.yy hh24:mi:ss') rd,
count(ticket_id) over (partition by ticket_id) cnt_id,
row_number() over (partition by ticket_id order by t.routing_date ) rn,
round(24 * (t.routing_date-
nvl(lag(t.routing_date) over (partition by ticket_id
order by t.routing_date), routing_date) ) , 8) dept_time,
round(24 * (max(t.routing_date) over (partition by ticket_id)
- min(t.routing_date) over (partition by ticket_id)), 8) total_time
from pm_test t
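As a cross-check of the LAG and MAX/MIN logic used in both answers, here is a minimal sketch that replays the sample rows in SQLite through Python's standard sqlite3 module. It is not Oracle code: SQLite has no DATE type, so the timestamps are stored as ISO text and converted to epoch seconds with strftime('%s', ...). The names rid, ts, dept_seconds and total_seconds are made up for this illustration, and the sketch assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pm_test (
        ticket_id       INTEGER,
        department_from TEXT,
        department_to   TEXT,
        routing_date    TEXT   -- ISO text stands in for Oracle's DATE
    );
    INSERT INTO pm_test VALUES
        (1, 'A', 'B', '2014-01-01 12:00:05'),
        (1, 'B', 'C', '2014-01-01 13:00:04'),
        (1, 'C', 'D', '2014-01-01 13:00:04'),
        (1, 'D', 'E', '2014-02-01 15:00:04'),
        (2, 'A', 'B', '2014-01-02 12:00:05'),
        (3, 'D', 'B', '2014-01-02 12:00:05'),
        (3, 'B', 'A', '2014-01-02 17:00:05');
""")

# Per-row time since the previous routing of the same ticket, plus the
# ticket's overall span (latest minus earliest routing), both in seconds.
rows = conn.execute("""
    SELECT ticket_id, department_from, department_to,
           ts - COALESCE(LAG(ts) OVER (PARTITION BY ticket_id ORDER BY ts, rid), ts)
               AS dept_seconds,
           MAX(ts) OVER (PARTITION BY ticket_id)
             - MIN(ts) OVER (PARTITION BY ticket_id) AS total_seconds
    FROM (SELECT rowid AS rid, p.*,
                 CAST(strftime('%s', routing_date) AS INTEGER) AS ts
          FROM pm_test p)
    ORDER BY ticket_id, ts, rid
""").fetchall()

for row in rows:
    print(row)
```

Two rows of ticket 1 share the same routing_date, so this sketch breaks the tie by insertion order (rid). In the Oracle queries the order among such ties is not guaranteed unless a further ORDER BY column is added.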

Related

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should only see three records and the last STATUS per that account_2.
Do you know a way to do this?
create table tbl (
account_id int,
status string,
create_date date
);
select account_id, max(create_date) from tbl group by account_id;
gives you each account_id together with its latest create_date, that is, the date closest to today (assuming create_date can never be in the future, which makes sense).
Now you can join with that data to get what you want, for example something along these lines:
select account_id, status, create_date from tbl where (account_id, create_date) in (select account_id, max(create_date) from tbl group by account_id);
If you use that frequently (account with the latest create date), then consider defining a view for that.
If you have many columns and want to keep only the latest row per account, you can use QUALIFY to run the ranking logic and keep the best row, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern that Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution will give you both. You can also get that behavior via QUALIFY if you use RANK.
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK form, which will show all rows tied for the latest date, is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;
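To make the difference between ROW_NUMBER and RANK concrete, here is a small sketch run against SQLite through Python's sqlite3 module. SQLite has no QUALIFY, so it uses the subquery form shown above. The sample rows and the helper name latest are invented for this illustration (the question's data is not shown), and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (account_id INTEGER, status TEXT, create_date TEXT);
    INSERT INTO tbl VALUES
        (1, 'open',   '2021-01-01'),
        (1, 'closed', '2021-02-01'),
        (2, 'open',   '2021-03-01'),
        (2, 'hold',   '2021-03-01'),  -- tie on the latest date
        (3, 'open',   '2021-01-05');
""")

def latest(ranking_function):
    """Return the rows ranked first per account by the given window function."""
    sql = f"""
        SELECT account_id, status, create_date
        FROM (SELECT t.*,
                     {ranking_function}() OVER (PARTITION BY account_id
                                                ORDER BY create_date DESC) AS rn
              FROM tbl t)
        WHERE rn = 1
        ORDER BY account_id, status
    """
    return conn.execute(sql).fetchall()

with_row_number = latest("row_number")  # exactly one row per account
with_rank = latest("rank")              # keeps every row tied for the latest date
print(len(with_row_number), len(with_rank))
```

With ROW_NUMBER, which of the two tied rows for account 2 survives is arbitrary unless the ORDER BY is extended with a tie-breaking column.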

Display duplicate row indicator and get only one row when duplicate

I built the schema at http://sqlfiddle.com/#!18/7e9e3
CREATE TABLE BoatOwners
(
BoatID INT,
OwnerDOB DATETIME,
Name VARCHAR(200)
);
INSERT INTO BoatOwners (BoatID, OwnerDOB,Name)
VALUES (1, '2021-04-06', 'Bob1'),
(1, '2020-04-06', 'Bob2'),
(1, '2019-04-06', 'Bob3'),
(2, '2012-04-06', 'Tom'),
(3, '2009-04-06', 'David'),
(4, '2006-04-06', 'Dale1'),
(4, '2009-04-06', 'Dale2'),
(4, '2013-04-06', 'Dale3');
I would like to write a query that would produce the following result characteristics :
Returns only one owner per boat
When multiple owners on a single boat, return the youngest owner.
Display a column to indicate if a boat has multiple owners.
So applying that query to the data set above would produce
I tried
ROW_NUMBER() OVER (PARTITION BY ....
but haven't had much luck so far.
with data as (
select BoatID, OwnerDOB, Name,
row_number() over (partition by BoatID order by OwnerDOB desc) as rn,
count(*) over (partition by BoatID) as cnt
from BoatOwners
)
select BoatID, OwnerDOB, Name,
case when cnt > 1 then 'Yes' else 'No' end as MultipleOwner
from data
where rn = 1
This is just a case of numbering the rows for each BoatId group and also counting the rows in each group, then filtering accordingly:
select BoatId, OwnerDob, Name, Iif(qty=1,'No','Yes') MultipleOwner
from (
select *, Row_Number() over(partition by boatid order by OwnerDOB desc)rn, Count(*) over(partition by boatid) qty
from BoatOwners
)b where rn=1
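The numbering-plus-counting approach can be tried without a SQL Server instance. The sketch below runs the same logic in SQLite via Python's sqlite3 module, under a few assumptions: dates are stored as ISO text (which sorts correctly), CASE replaces Iif for portability, and SQLite is version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BoatOwners (BoatID INTEGER, OwnerDOB TEXT, Name TEXT);
    INSERT INTO BoatOwners (BoatID, OwnerDOB, Name) VALUES
        (1, '2021-04-06', 'Bob1'),
        (1, '2020-04-06', 'Bob2'),
        (1, '2019-04-06', 'Bob3'),
        (2, '2012-04-06', 'Tom'),
        (3, '2009-04-06', 'David'),
        (4, '2006-04-06', 'Dale1'),
        (4, '2009-04-06', 'Dale2'),
        (4, '2013-04-06', 'Dale3');
""")

# rn = 1 marks the youngest owner (latest date of birth) of each boat;
# cnt is the number of owners of that boat.
rows = conn.execute("""
    SELECT BoatID, OwnerDOB, Name,
           CASE WHEN cnt > 1 THEN 'Yes' ELSE 'No' END AS MultipleOwner
    FROM (SELECT b.*,
                 ROW_NUMBER() OVER (PARTITION BY BoatID ORDER BY OwnerDOB DESC) AS rn,
                 COUNT(*)     OVER (PARTITION BY BoatID) AS cnt
          FROM BoatOwners b)
    WHERE rn = 1
    ORDER BY BoatID
""").fetchall()

for row in rows:
    print(row)
```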

Aggregating consecutive rows in SQL

Given the sql table (I'm using SQLite3):
CREATE TABLE person(name text, number integer);
And filling with the values:
insert into person values
('Leandro', 2),
('Leandro', 4),
('Maria', 8),
('Maria', 16),
('Jose', 32),
('Leandro', 64);
What I want is to get the sum of the number column, but only for consecutive rows, so that I get the following result, which maintains the original insertion order:
Leandro|6
Maria|24
Jose|32
Leandro|64
The "closest" I got so far is:
select name, sum(number) over(partition by name) from person order by rowid;
But it clearly shows I'm far from understanding SQL, as the most important features (grouping and summation of consecutive rows) are missing, but at least the order is there :-):
Leandro|70
Leandro|70
Maria|24
Maria|24
Jose|32
Leandro|70
Preferably the answer should not require creation of temporary tables, as the output is expected to always have the same order of how the data was inserted.
This is a type of gaps-and-islands problem. You can use the difference of row numbers for this purpose:
select name, sum(number)
from (select p.*,
row_number() over (order by number) as seqnum,
row_number() over (partition by name order by number) as seqnum_1
from person p
) p
group by name, (seqnum - seqnum_1)
order by min(number);
Why this works is a little tricky to explain. However, it becomes pretty obvious when you look at the results of the subquery. The difference of row numbers is constant on adjacent rows when the name does not change.
Here is a db<>fiddle.
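To see why the difference of row numbers identifies the islands, it helps to print the subquery's rows next to the final totals. The following is a small sketch using Python's sqlite3 module with the question's data; the column name diff and the variable names are made up for this illustration, and SQLite 3.25 or later is assumed. Like the query above, it relies on number increasing in insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person(name text, number integer);
    INSERT INTO person VALUES
        ('Leandro', 2), ('Leandro', 4), ('Maria', 8),
        ('Maria', 16), ('Jose', 32), ('Leandro', 64);
""")

# The inner query from the answer: a global row number and a per-name row number.
numbered = """
    SELECT name, number,
           ROW_NUMBER() OVER (ORDER BY number) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY number) AS seqnum_1
    FROM person
"""

# seqnum - seqnum_1 stays constant while the name repeats on adjacent rows.
inner = conn.execute(f"""
    SELECT name, number, seqnum, seqnum_1, seqnum - seqnum_1 AS diff
    FROM ({numbered})
    ORDER BY number
""").fetchall()

totals = conn.execute(f"""
    SELECT name, SUM(number)
    FROM ({numbered})
    GROUP BY name, seqnum - seqnum_1
    ORDER BY MIN(number)
""").fetchall()

for row in inner:
    print(row)
print(totals)
```

The two Leandro islands get differences 0 and 3, so grouping by name together with the difference keeps them apart.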
You can do it with window functions:
LAG() to check if the previous name is the same as the current one
SUM() to create groups for consecutive same names
and then group by the groups and aggregate:
select name, sum(number) total
from (
select *, sum(flag) over (order by rowid) grp
from (
select *, rowid, name <> lag(name, 1, '') over (order by rowid) flag
from person
)
)
group by grp
order by grp
See the demo.
Results:
> name | total
> :------ | ----:
> Leandro | 6
> Maria | 24
> Jose | 32
> Leandro | 64
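The intermediate columns of the LAG/SUM approach can be inspected with a short sketch like the one below, which runs the two inner steps in SQLite through Python's sqlite3 module. The alias rid is introduced here only for readability, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person(name text, number integer);
    INSERT INTO person VALUES
        ('Leandro', 2), ('Leandro', 4), ('Maria', 8),
        ('Maria', 16), ('Jose', 32), ('Leandro', 64);
""")

# flag is 1 whenever the name differs from the previous row's name;
# the running sum of flag then numbers the groups of consecutive names.
steps = conn.execute("""
    SELECT name, number, flag,
           SUM(flag) OVER (ORDER BY rid) AS grp
    FROM (SELECT rowid AS rid, name, number,
                 name <> LAG(name, 1, '') OVER (ORDER BY rowid) AS flag
          FROM person)
    ORDER BY rid
""").fetchall()

for row in steps:
    print(row)
```

Every row where flag is 1 starts a new group, so the second Leandro run gets its own grp value instead of being merged with the first.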
I would change the create table statement to the following:
CREATE TABLE person(id integer, firstname nvarchar(255), number integer);
You need a third column to determine the insert order.
I would also rename the column name to something like firstname, because name is a keyword in some DBMSs. The same applies to the column named number. Moreover, I would change the type of the name column from text to nvarchar, because it is sortable in the GROUP BY clause.
Then you can insert your data:
insert into person values
(1, 'Leandro', 2),
(2, 'Leandro', 4),
(3, 'Maria', 8),
(4, 'Maria', 16),
(5, 'Jose', 32),
(6, 'Leandro', 64);
After that you can query the data in the following way:
SELECT firstname, value FROM (
SELECT p.id, p.firstname, p.number, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
ELSE number
END as value
FROM Person p
) AS temp
WHERE temp.firstname <> temp.prevname OR
temp.prevname IS NULL
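First, the CASE expression computes the value: if the next row has the same name, it adds that row's number to the current one.
Then you filter the data and keep only those entries whose previous name differs from the current name (or that have no previous row). Note that the CASE expression only adds the immediately following row, so this query is correct only when a run has at most two consecutive rows with the same name, as in the sample data.
To understand the query better, you can run the subquery on its own: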
SELECT p.id, p.firstname, p.number, LEAD(p.firstname) over (ORDER BY p.id) as nextname, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
ELSE number
END as value
FROM Person p
Based on Gordon Linoff's answer (https://stackoverflow.com/a/64727401/1721672), I extracted the inner select as CTE and the following query works pretty well:
with p(name, number, seqnum, seqnum_1) as
(select name, number,
row_number() over (order by number) as seqnum,
row_number() over (partition by name order by number) as seqnum_1
from person)
select
name, sum(number)
from
p
group by
name, (seqnum - seqnum_1)
order by
min(number);
Producing the expected result:
Leandro|6
Maria|24
Jose|32
Leandro|64

Oracle SQL: how to group same value in different group

Database:
Oracle Database 12c Release 12.2.0.1.0
Following is my test case script:
create table test
(
id number(1),
sdate date,
tdate date,
prnt_id number(1)
);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/17/2012','mm/dd/yyyy'), to_date('10/16/2014','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/16/2014','mm/dd/yyyy'), to_date('2/16/2016','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('2/16/2016','mm/dd/yyyy'), to_date('9/30/2016','mm/dd/yyyy'), 3);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('9/30/2016','mm/dd/yyyy'), to_date('3/16/2017','mm/dd/yyyy'), 3);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('3/16/2017','mm/dd/yyyy'), to_date('1/16/2019','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('1/16/2019','mm/dd/yyyy'), to_date('10/16/2019','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/16/2019','mm/dd/yyyy'), to_date('12/1/2999','mm/dd/yyyy'), 2);
commit;
select * from test order by sdate;
Question:
I want to modify the above Select SQL which returns all 7 rows from test table, selects all the columns plus two more columns.
First additional column (min_sdate) will return 10/17/2012 for rows 1,2 and 2/16/2016 for rows 3,4 and 3/16/2017 for rows 5,6,7.
Second additional column (max_tdate) will return 2/16/2016 for rows 1,2 and 3/16/2017 for rows 3,4 and 12/1/2999 for rows 5,6,7.
Basically, I'm trying to group by prnt_id column but instead of two groups (prnt_id: 2 and 3), I want three groups (prnt_id: 2,3,2), and then for those three groups get the min(sdate) and max(tdate).
I was thinking I could use analytical function min() and max() with window clause to achieve this, but not sure how to frame the SQL.
Any or all help will be appreciated. Thanks!
This is a form of gaps-and-islands. Assuming that the dates tile with no gaps, you can use the difference of row numbers to identify the islands:
select t.*,
min(sdate) over (partition by id, prnt_id, seqnum - seqnum_2) as min_sdate,
max(tdate) over (partition by id, prnt_id, seqnum - seqnum_2) as max_tdate
from (select t.*,
row_number() over (partition by id order by sdate) as seqnum,
row_number() over (partition by id, prnt_id order by sdate) as seqnum_2
from test t
) t;
Why this works is a little tricky to explain. But if you look at the results of the subquery, you will be able to see how the difference in row numbers defines the groups you want to define.
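As a rough way to check the island logic against the question's expected values, the sketch below loads the seven test rows into SQLite through Python's sqlite3 module and computes the min of sdate and the max of tdate per island. It is an adaptation, not Oracle code: dates are stored as ISO text, the output aliases min_sdate and max_tdate follow the names used in the question, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, sdate TEXT, tdate TEXT, prnt_id INTEGER);
    INSERT INTO test (id, sdate, tdate, prnt_id) VALUES
        (1, '2012-10-17', '2014-10-16', 2),
        (1, '2014-10-16', '2016-02-16', 2),
        (1, '2016-02-16', '2016-09-30', 3),
        (1, '2016-09-30', '2017-03-16', 3),
        (1, '2017-03-16', '2019-01-16', 2),
        (1, '2019-01-16', '2019-10-16', 2),
        (1, '2019-10-16', '2999-12-01', 2);
""")

# seqnum - seqnum_2 is constant within each run of consecutive rows that
# share a prnt_id, so it separates the first prnt_id = 2 run from the second.
rows = conn.execute("""
    SELECT sdate, tdate, prnt_id,
           MIN(sdate) OVER (PARTITION BY id, prnt_id, seqnum - seqnum_2) AS min_sdate,
           MAX(tdate) OVER (PARTITION BY id, prnt_id, seqnum - seqnum_2) AS max_tdate
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY sdate) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY id, prnt_id ORDER BY sdate) AS seqnum_2
          FROM test t)
    ORDER BY sdate
""").fetchall()

for row in rows:
    print(row)
```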

Select the max timestamp in hive

I have a customer table with two records that have different timestamps. I want to select the record with the max timestamp: 2014-08-15 15:54:07.379.
Select Customer_ID, Account_ID, max(ProcessTimeStamp)
from Customer
group by Customer_ID, Account_ID
I should get one record, but the actual result is two records.
How can I get the max ProcessTimeStamp records?
You can use a window function here.
You can use either dense_rank() or row_number().
1. Using DENSE_RANK()
select customer_id,account_id,processTimeStamp
from (select *
,dense_rank() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
2. Using ROW_NUMBER()
select customer_id,account_id,processTimeStamp
from (select *
,row_number() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
Note, however, that row_number() gives each row a unique number, so if several records share the same latest timestamp, only the one with row number 1 is returned, whereas dense_rank() returns all of them.