Oracle SQL: how to group same value in different group

Database:
Oracle Database 12c Release 12.2.0.1.0
Following is my test case script:
create table test
(
id number(1),
sdate date,
tdate date,
prnt_id number(1)
);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/17/2012','mm/dd/yyyy'), to_date('10/16/2014','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/16/2014','mm/dd/yyyy'), to_date('2/16/2016','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('2/16/2016','mm/dd/yyyy'), to_date('9/30/2016','mm/dd/yyyy'), 3);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('9/30/2016','mm/dd/yyyy'), to_date('3/16/2017','mm/dd/yyyy'), 3);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('3/16/2017','mm/dd/yyyy'), to_date('1/16/2019','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('1/16/2019','mm/dd/yyyy'), to_date('10/16/2019','mm/dd/yyyy'), 2);
insert into test (id, sdate, tdate, prnt_id) values (1, to_date('10/16/2019','mm/dd/yyyy'), to_date('12/1/2999','mm/dd/yyyy'), 2);
commit;
select * from test order by sdate;
Question:
I want to modify the SELECT above so that it still returns all 7 rows and all columns of the test table, plus two additional columns.
First additional column (min_sdate) will return 10/17/2012 for rows 1,2 and 2/16/2016 for rows 3,4 and 3/16/2017 for rows 5,6,7.
Second additional column (max_tdate) will return 2/16/2016 for rows 1,2 and 3/16/2017 for rows 3,4 and 12/1/2999 for rows 5,6,7.
Basically, I'm trying to group by prnt_id column but instead of two groups (prnt_id: 2 and 3), I want three groups (prnt_id: 2,3,2), and then for those three groups get the min(sdate) and max(tdate).
I was thinking I could use the analytic functions min() and max() with a windowing clause to achieve this, but I am not sure how to frame the SQL.
Any or all help will be appreciated. Thanks!

This is a form of gaps-and-islands. Assuming that the dates tile with no gaps, you can use the difference of row numbers to identify the islands:
select t.*,
       min(sdate) over (partition by id, prnt_id, seqnum - seqnum_2) as min_sdate,
       max(tdate) over (partition by id, prnt_id, seqnum - seqnum_2) as max_tdate
from (select t.*,
             row_number() over (partition by id order by sdate) as seqnum,
             row_number() over (partition by id, prnt_id order by sdate) as seqnum_2
      from test t
     ) t;
Why this works is a little tricky to explain. But if you look at the results of the subquery, you will be able to see how the difference in row numbers defines the groups you want to define.
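As a rough illustration of the point above, the sketch below loads the seven test rows into an in-memory SQLite database and prints the two row numbers next to their difference. SQLite (3.25 or later, for window functions) and ISO date strings are assumptions made only so the example can be run anywhere; the query itself has the same shape as the Oracle one, with the grp column added for inspection.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer, sdate text, tdate text, prnt_id integer)")
conn.executemany(
    "insert into test values (?, ?, ?, ?)",
    [
        (1, "2012-10-17", "2014-10-16", 2),
        (1, "2014-10-16", "2016-02-16", 2),
        (1, "2016-02-16", "2016-09-30", 3),
        (1, "2016-09-30", "2017-03-16", 3),
        (1, "2017-03-16", "2019-01-16", 2),
        (1, "2019-01-16", "2019-10-16", 2),
        (1, "2019-10-16", "2999-12-01", 2),
    ],
)

query = """
select sdate, prnt_id, seqnum, seqnum_2, seqnum - seqnum_2 as grp,
       min(sdate) over (partition by id, prnt_id, seqnum - seqnum_2) as min_sdate,
       max(tdate) over (partition by id, prnt_id, seqnum - seqnum_2) as max_tdate
from (select t.*,
             row_number() over (partition by id order by sdate) as seqnum,
             row_number() over (partition by id, prnt_id order by sdate) as seqnum_2
      from test t
     ) t
order by sdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Within an unbroken run of one prnt_id, both row numbers increase by one per row, so their difference stays constant. When a different prnt_id interrupts the run, seqnum keeps counting but seqnum_2 does not, so the difference jumps. The difference alone is not unique (here the prnt_id 3 rows and the later prnt_id 2 rows both get 2), which is why prnt_id is part of the partition as well.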

Related

Ranking and obtaining data across moving window

I have following table -
create table iphone_defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
insert into iphone_defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into iphone_defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into iphone_defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into iphone_defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into iphone_defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into iphone_defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into iphone_defects values ('iPhone','Audio problem',25,202106,'2020-08-09');
Am expecting the following result -
fwkyr refers to Financial Week in a year. I have added in additional column fwenddate basically referring to max date in the financial week of the year.
Basically the ask is to obtain the defect with largest quantity in a 4 week window from the current week. Say for the fwkyr - 202112, the highest defects is for 'Glass breakage' and the total quantity is 100.
This is a static window. My actual use case needs 52 weeks.
Without the moving window, I know that I can rank and get the data but not sure on how to even approach this problem. Any help?
Per your updated question, my updated solution is much longer and changes quite a bit.
I am still not sure whether the user selects the week from which the next 52 weeks are counted, or whether you want this calculation from the start (week 1) of every year.
I also assume there is a typo in one of your insert statements compared with your desired output table, so I changed it to match the output table.
1. Create table
create table table.defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
2. Insert data (adjusted last insert to match your output table)
insert into table.defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into table.defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into table.defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into table.defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into table.defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into table.defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into table.defects values ('iPhone','Audio problem',55,202106,'2020-08-09');
3. Query for results
###############################################################################
### start count of weeks since selected first week and
### get number of weeks by desired range
###############################################################################
WITH
get_weeks AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_numbering,
SPLIT(CAST(ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr)/4 AS string), '.')[OFFSET(0)] AS week_id_0,
FROM
table.defects
ORDER BY
fwkyr DESC
),
###############################################################################
### produce filter column for each window period by offsetting
###############################################################################
get_weeks_consequtive AS (
SELECT
*,
LAG(week_id_0,1) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_1,
LAG(week_id_0,2) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_2,
LAG(week_id_0,3) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_3
FROM
get_weeks ),
###############################################################################
### create tables and calculations per window using filter column where you group by for qty and keep top qty only
###############################################################################
week_id_0 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_0 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_1 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_1 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_2 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_2 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_3 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_3 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1)
###############################################################################
### union all selected windows
###############################################################################
SELECT
*
FROM
week_id_0
UNION ALL
SELECT
*
FROM
week_id_1
UNION ALL
SELECT
*
FROM
week_id_2
UNION ALL
SELECT
*
FROM
week_id_3
ORDER BY
week_id DESC
(Screenshots of the get_weeks, get_weeks_consequtive, week_id_1 and final result outputs are not reproduced here.)
PS:
I put this together quickly after your update. There may well be a better way, and I would be interested in seeing it.
In any case, for queries this long I typically write a Python script with text templates for the repetitive parts, and use a loop with f-strings to expand those parts to the desired length by incrementing the values that change.
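A minimal sketch of that templating idea, assuming the repetitive parts are exactly the LAG columns, the week_id_N CTEs and the final UNION ALL shown in the query above. The function name build_window_sql and its n_windows parameter are hypothetical. The script only reproduces the text of the query for a given number of windows; it does not check that the windowing logic itself is right for 52 weeks.

```python
def build_window_sql(n_windows: int) -> str:
    """Generate the repetitive tail of the query for n_windows windows.

    The output is meant to be appended after the handwritten
    'WITH get_weeks AS (...),' part.
    """
    if n_windows < 2:
        raise ValueError("n_windows must be at least 2")
    ids = range(n_windows)
    except_cols = ", ".join(f"week_id_{i}" for i in ids)

    # One LAG column per extra window (week_id_0 already exists in get_weeks).
    lag_cols = ",\n".join(
        f"    LAG(week_id_0, {i}) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_{i}"
        for i in ids
        if i > 0
    )
    consecutive = (
        "get_weeks_consequtive AS (\n"
        "  SELECT\n"
        "    *,\n"
        f"{lag_cols}\n"
        "  FROM get_weeks\n"
        ")"
    )

    # One CTE per window, identical except for the filter column.
    window_ctes = [
        f"week_id_{i} AS (\n"
        f"  SELECT SUM(qty) AS qty, product, defect, week_id\n"
        f"  FROM (\n"
        f"    SELECT * EXCEPT({except_cols}), MAX(fwkyr) OVER() AS week_id\n"
        f"    FROM get_weeks_consequtive\n"
        f"    WHERE week_id_{i} = '1'\n"
        f"  )\n"
        f"  GROUP BY 2, 3, 4\n"
        f"  ORDER BY 1 DESC\n"
        f"  LIMIT 1\n"
        f")"
        for i in ids
    ]

    union = "\nUNION ALL\n".join(f"SELECT * FROM week_id_{i}" for i in ids)
    return ",\n".join([consecutive, *window_ctes]) + "\n" + union + "\nORDER BY week_id DESC"


sql = build_window_sql(4)
print(sql)
```

Calling build_window_sql(52) produces the same structure with 52 window CTEs, which is tedious and error-prone to write by hand.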

Display duplicate row indicator and get only one row when duplicate

I built the schema at http://sqlfiddle.com/#!18/7e9e3
CREATE TABLE BoatOwners
(
BoatID INT,
OwnerDOB DATETIME,
Name VARCHAR(200)
);
INSERT INTO BoatOwners (BoatID, OwnerDOB,Name)
VALUES (1, '2021-04-06', 'Bob1'),
(1, '2020-04-06', 'Bob2'),
(1, '2019-04-06', 'Bob3'),
(2, '2012-04-06', 'Tom'),
(3, '2009-04-06', 'David'),
(4, '2006-04-06', 'Dale1'),
(4, '2009-04-06', 'Dale2'),
(4, '2013-04-06', 'Dale3');
I would like to write a query that would produce the following result characteristics :
Returns only one owner per boat
When multiple owners on a single boat, return the youngest owner.
Display a column to indicate if a boat has multiple owners.
So the following data set when apply that query would produce
I tried
ROW_NUMBER() OVER (PARTITION BY ....
but haven't had much luck so far.
with data as (
select BoatID, OwnerDOB, Name,
row_number() over (partition by BoatID order by OwnerDOB desc) as rn,
count(*) over (partition by BoatID) as cnt
from BoatOwners
)
select BoatID, OwnerDOB, Name,
case when cnt > 1 then 'Yes' else 'No' end as MultipleOwner
from data
where rn = 1
This is just a case of numbering the rows for each BoatId group and also counting the rows in each group, then filtering accordingly:
select BoatId, OwnerDob, Name, Iif(qty = 1, 'No', 'Yes') as MultipleOwner
from (
    select *,
           Row_Number() over (partition by BoatId order by OwnerDOB desc) as rn,
           Count(*) over (partition by BoatId) as qty
    from BoatOwners
) b
where rn = 1
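To check the approach against the sample data, here is a small runnable sketch. It assumes SQLite (3.25 or later) in place of SQL Server, so it uses a CASE expression rather than Iif and stores the dates as ISO strings; the numbering and counting logic is the same as in the answers above.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; OwnerDOB is stored as an ISO string.
conn = sqlite3.connect(":memory:")
conn.execute("create table BoatOwners (BoatID integer, OwnerDOB text, Name text)")
conn.executemany(
    "insert into BoatOwners values (?, ?, ?)",
    [
        (1, "2021-04-06", "Bob1"),
        (1, "2020-04-06", "Bob2"),
        (1, "2019-04-06", "Bob3"),
        (2, "2012-04-06", "Tom"),
        (3, "2009-04-06", "David"),
        (4, "2006-04-06", "Dale1"),
        (4, "2009-04-06", "Dale2"),
        (4, "2013-04-06", "Dale3"),
    ],
)

query = """
select BoatID, OwnerDOB, Name,
       case when cnt > 1 then 'Yes' else 'No' end as MultipleOwner
from (
    select BoatID, OwnerDOB, Name,
           -- rn = 1 marks the youngest owner (latest date of birth) per boat
           row_number() over (partition by BoatID order by OwnerDOB desc) as rn,
           -- cnt is the number of owners of the boat, repeated on every row
           count(*) over (partition by BoatID) as cnt
    from BoatOwners
) b
where rn = 1
order by BoatID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The count has to be computed in the inner query, before the rn = 1 filter removes the other owners; counting after the filter would always give 1.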

ORA-00904: "PREV_TEMP": invalid identifier with LAG function

What is wrong with this query?
It returns:
ORA-00904: "PREV_TEMP": invalid identifier
SELECT Id, RecordDate, Temperature, LAG(Temperature) OVER (ORDER BY RecordDate) as prev_temp
FROM Weather
WHERE Temperature > prev_temp;
SQL schema:
Create table If Not Exists Weather (Id int, RecordDate date, Temperature int)
Truncate table Weather
insert into Weather (Id, RecordDate, Temperature) values ('1', '2015-01-01', '10')
insert into Weather (Id, RecordDate, Temperature) values ('2', '2015-01-02', '25')
insert into Weather (Id, RecordDate, Temperature) values ('3', '2015-01-03', '20')
insert into Weather (Id, RecordDate, Temperature) values ('4', '2015-01-04', '30')
What is wrong with the query is that column aliases cannot be re-used in the SELECT, WHERE, FROM, or GROUP BY clauses where they are defined. This applies to window functions, as well as everything else. And this is a rule in SQL, not Oracle (although some databases relax the restriction on GROUP BY).
In your case, there are basically two solutions, a subquery and a CTE:
WITH w AS (
SELECT w.*,
LAG(Temperature) OVER (ORDER BY RecordDate) as prev_temp
FROM weather w
)
SELECT Id, RecordDate, Temperature, prev_temp
FROM w
WHERE Temperature > prev_temp;
You cannot reference the alias directly at the same query level. Wrap the query in a subquery so that the value returned by the analytic function can be used in the outer WHERE clause:
SELECT *
FROM
(
SELECT Id, RecordDate, Temperature,
LAG(Temperature) OVER (ORDER BY RecordDate) as prev_temp
FROM Weather
)
WHERE Temperature > prev_temp;
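The sketch below runs the CTE form against the sample rows. SQLite (3.25 or later) is assumed here in place of Oracle, with dates stored as ISO strings; the point it illustrates is only that the alias becomes usable once the window function is computed one level down.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; RecordDate is stored as an ISO string.
conn = sqlite3.connect(":memory:")
conn.execute("create table Weather (Id integer, RecordDate text, Temperature integer)")
conn.executemany(
    "insert into Weather values (?, ?, ?)",
    [
        (1, "2015-01-01", 10),
        (2, "2015-01-02", 25),
        (3, "2015-01-03", 20),
        (4, "2015-01-04", 30),
    ],
)

query = """
with w as (
    -- the window function is evaluated here, so prev_temp is an ordinary column outside
    select Id, RecordDate, Temperature,
           lag(Temperature) over (order by RecordDate) as prev_temp
    from Weather
)
select Id, RecordDate, Temperature, prev_temp
from w
where Temperature > prev_temp
order by RecordDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first row has no predecessor, so its prev_temp is NULL and the comparison filters it out without any special handling.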

Oracle - Calculating time differences

Let's say I have following data:
Create Table Pm_Test (
Ticket_id Number,
Department_From varchar2(100),
Department_To varchar2(100),
Routing_Date Date
);
Insert Into Pm_Test Values (1,'A','B',To_Date('20140101120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'B','C',To_Date('20140101130004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'C','D',To_Date('20140101130004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (1,'D','E',To_Date('20140201150004','yyyymmddhh24miss'));
Insert Into Pm_Test Values (2,'A','B',To_Date('20140102120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (3,'D','B',To_Date('20140102120005','yyyymmddhh24miss'));
Insert Into Pm_Test Values (3,'B','A',To_Date('20140102170005','yyyymmddhh24miss'));
For the following requirements I already added two virtual columns, I think they might be necessary:
Select t.*,
Count(Ticket_id) Over (Partition By Ticket_id Order By Ticket_id) Cnt_Id,
Row_Number() Over (Partition By Ticket_id Order By Ticket_id ) row_number
From Pm_Test t;
1) I want to measure how long each ticket stayed in a department (routing_date of successor_department - routing_date of predecessor department) by adding the column PROCESSING_TIME:
2) I want to measure the total processing time by adding the column TOTAL_PROCESSING_TIME:
What SQL statements would be necessary to do so?
Thank you very much in advance!
The following SQL should solve the problem as you described it. One thing to keep in mind: this data model is not the most efficient way to capture processing times, if that is its intent, because the time spent in the first department to receive the ticket is not measured.
select dept.ticket_id, department_from, department_to, routing_date, dept_processing_time, total_ticket_processing_time
from
(select ticket_id, max(routing_date) - min(routing_date) total_ticket_processing_time
from pm_test
group by ticket_id) total
join
(select ticket_id, department_from, department_to, routing_date,
coalesce(routing_date - lag(routing_date) over (partition by ticket_id order by routing_date), 0) dept_processing_time
from pm_test) dept
on (total.ticket_id = dept.ticket_id);
This query produces the desired output. The analytic functions max(), min() and lag() are used for the calculations.
Results are in hours, like in your question.
select t.ticket_id, t.department_from, t.department_to,
to_char(t.routing_date, 'mm.dd.yy hh24:mi:ss') rd,
count(ticket_id) over (partition by ticket_id) cnt_id,
row_number() over (partition by ticket_id order by t.routing_date ) rn,
round(24 * (t.routing_date-
nvl(lag(t.routing_date) over (partition by ticket_id
order by t.routing_date), routing_date) ) , 8) dept_time,
round(24 * (max(t.routing_date) over (partition by ticket_id)
- min(t.routing_date) over (partition by ticket_id)), 8) total_time
from pm_test t
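A runnable sketch of the same calculation follows, under the assumption that SQLite (3.25 or later) replaces Oracle. Oracle date subtraction yields days directly; SQLite has no date type, so this version converts each timestamp to epoch seconds with strftime('%s', ...) and divides by 3600 to get hours. The column names dept_hours and total_hours are chosen for the example.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; timestamps are stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Pm_Test (Ticket_id integer, Department_From text, "
    "Department_To text, Routing_Date text)"
)
conn.executemany(
    "insert into Pm_Test values (?, ?, ?, ?)",
    [
        (1, "A", "B", "2014-01-01 12:00:05"),
        (1, "B", "C", "2014-01-01 13:00:04"),
        (1, "C", "D", "2014-01-01 13:00:04"),
        (1, "D", "E", "2014-02-01 15:00:04"),
        (2, "A", "B", "2014-01-02 12:00:05"),
        (3, "D", "B", "2014-01-02 12:00:05"),
        (3, "B", "A", "2014-01-02 17:00:05"),
    ],
)

query = """
with t as (
    select Ticket_id, Department_From, Department_To, Routing_Date,
           cast(strftime('%s', Routing_Date) as integer) as secs
    from Pm_Test
)
select Ticket_id, Department_From, Department_To, Routing_Date,
       -- time since the previous routing of the same ticket (0 for the first one)
       (secs - coalesce(lag(secs) over (partition by Ticket_id order by secs), secs))
           / 3600.0 as dept_hours,
       -- span between the first and last routing of the ticket
       (max(secs) over (partition by Ticket_id) - min(secs) over (partition by Ticket_id))
           / 3600.0 as total_hours
from t
order by Ticket_id, secs
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Two routings of ticket 1 share the same timestamp, so their relative order is not defined; one of them gets the elapsed time and the other gets 0. If that matters, an additional tie-breaking column would be needed in the ORDER BY.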

I want to generate marksheet containing percentage, rank standard wise and division wise in oracle database

Here is my table:
create table exam_details(
Stud_id varchar2(50),
Stud_course_id Number,
Stud_div char,
Stud_Sub_id Number,
Stud_Marks Number,
Sub_total_Marks Number,
Exam_id Number,
Exam_date Date
);
and the content of the table is:
insert into exam_details values ('1A1',1,'A',1,55,100,1,'2-jan-2015');
insert into exam_details values ('1A1',1,'A',2,65,100,1,'3-jan-2015');
insert into exam_details values ('1A1',1,'A',3,72,100,1,'5-jan-2015');
insert into exam_details values ('1B1',1,'B',1,45,100,1,'2-jan-2015');
insert into exam_details values ('1B1',1,'B',2,65,100,1,'3-jan-2015');
insert into exam_details values ('1B1',1,'B',3,58,100,1,'5-jan-2015');
insert into exam_details values ('2A1',2,'A',1,75,100,1,'2-jan-2015');
insert into exam_details values ('2A1',2,'A',2,65,100,1,'3-jan-2015');
insert into exam_details values ('2A1',2,'A',3,82,100,1,'5-jan-2015');
I have tried, but I am getting a result only for
select stud_id,
RANK() OVER(ORDER BY stud_marks DESC) AS "Rank"
from exam_details;
Maybe this one:
select stud_id, Stud_div,
RANK() OVER(ORDER BY stud_marks DESC) AS Rank_all,
RANK() OVER(PARTITION BY Stud_div ORDER BY stud_marks DESC) AS Rank_div
from exam_details;
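The query above ranks individual subject marks. If the goal is one line per student with a percentage and a rank, one possible reading is sketched below. It assumes that "standard" corresponds to Stud_course_id and "division" to Stud_div, that the percentage is total marks over total possible marks, and that SQLite (3.25 or later) stands in for Oracle. The exam columns are left out of the sample table for brevity, and the names percentage, rank_in_standard and rank_in_division are chosen for the example.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; Exam_id and Exam_date are omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table exam_details (Stud_id text, Stud_course_id integer, Stud_div text, "
    "Stud_Sub_id integer, Stud_Marks integer, Sub_total_Marks integer)"
)
conn.executemany(
    "insert into exam_details values (?, ?, ?, ?, ?, ?)",
    [
        ("1A1", 1, "A", 1, 55, 100), ("1A1", 1, "A", 2, 65, 100), ("1A1", 1, "A", 3, 72, 100),
        ("1B1", 1, "B", 1, 45, 100), ("1B1", 1, "B", 2, 65, 100), ("1B1", 1, "B", 3, 58, 100),
        ("2A1", 2, "A", 1, 75, 100), ("2A1", 2, "A", 2, 65, 100), ("2A1", 2, "A", 3, 82, 100),
    ],
)

query = """
with totals as (
    -- one row per student: percentage over all subjects
    select Stud_id, Stud_course_id, Stud_div,
           round(100.0 * sum(Stud_Marks) / sum(Sub_total_Marks), 2) as percentage
    from exam_details
    group by Stud_id, Stud_course_id, Stud_div
)
select Stud_id, Stud_course_id, Stud_div, percentage,
       rank() over (partition by Stud_course_id order by percentage desc) as rank_in_standard,
       rank() over (partition by Stud_course_id, Stud_div order by percentage desc) as rank_in_division
from totals
order by Stud_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Aggregating in a CTE first and ranking in the outer query keeps the two steps separate, which makes it easy to change the partitioning if "division wise" is meant differently.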