Insert the table data based on grouping of two columns - sql

I have an Oracle table in the following format, for example:
JLID Dcode SID TDT QTY
8295783 3119255 9842 3/5/2018 14
8269771 3119255 9842 3/6/2018 11
8302211 3119255 1126 3/1/2018 19
Here I have different SIDs for the same Dcode, and I need to get the SID with the maximum total QTY, i.e. for SID 9842 it is (14+11)=25 and for SID 1126 it is 19, so the result should be based on SID 9842. Our query should return the following result:
JLID Dcode START_DT END_DT SID
111 3119255 3/1/2018 3/31/2018 12:00 9842
START_DT and END_DT should be calculated from TDT, i.e. the start date is the first day of the month and the end date is the last day of the month.
Can anyone please suggest some ideas on how to do this?
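For reference, the month boundaries can be derived from any date in that month with Oracle's TRUNC and LAST_DAY (a minimal standalone sketch):
SELECT TRUNC(DATE '2018-03-05', 'MONTH') AS start_dt,  -- 01-MAR-2018 00:00
       LAST_DAY(DATE '2018-03-05')       AS end_dt     -- 31-MAR-2018 00:00
  FROM dual;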

It might be as simple as this:
SELECT Dcode, start_date, end_date, SID FROM (
SELECT Dcode, SID, TRUNC(start_date, 'MONTH') AS start_date
, LAST_DAY(end_date) AS end_date
, ROW_NUMBER() OVER ( PARTITION BY Dcode ORDER BY total_qty DESC ) AS rn
FROM (
SELECT Dcode, SID, MIN(TDT) AS start_date, MAX(TDT) AS end_date
, SUM(QTY) AS total_qty
FROM mytable
GROUP BY Dcode, SID
)
) WHERE rn = 1
In the innermost subquery I use aggregation to get the range of dates and the total quantity for particular values of Dcode and SID. Then I use an analytic (window) function to get the row for which the total quantity is the greatest. (You would want to use RANK() in place of ROW_NUMBER() in the event you want to return more than one value of SID with the same total quantity.)
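For example, that RANK() variant would be the same query with only the analytic call changed (a minimal sketch, untested, against the same mytable as above):
SELECT Dcode, start_date, end_date, SID FROM (
SELECT Dcode, SID, TRUNC(start_date, 'MONTH') AS start_date
, LAST_DAY(end_date) AS end_date
, RANK() OVER ( PARTITION BY Dcode ORDER BY total_qty DESC ) AS rnk
FROM (
SELECT Dcode, SID, MIN(TDT) AS start_date, MAX(TDT) AS end_date
, SUM(QTY) AS total_qty
FROM mytable
GROUP BY Dcode, SID
)
) WHERE rnk = 1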

Here's one option which doesn't contain JLID = 111 in the final result as I have no idea where you took it from.
with test (jlid, dcode, sid, tdt, qty) as
  (select 8295783, 3119255, 9842, date '2018-03-05', 14 from dual union
   select 8269771, 3119255, 9842, date '2018-08-22', 11 from dual union
   select 8302211, 3119255, 1126, date '2018-03-01', 19 from dual union
   --
   select 1234567, 1112223, 1000, date '2018-06-16', 88 from dual
  )
select dcode,
       min (trunc (tdt, 'mm')) start_dt,   --> MIN
       max (last_day (tdt)) end_dt,        --> MAX
       sid
  from (select dcode,
               sid,
               tdt,
               sqty,
               rank () over (partition by dcode order by sqty desc) rnk
          from (select dcode,
                       sid,
                       tdt,
                       sum (qty) over (partition by dcode, sid) sqty
                  from test))
 where rnk = 1
 group by dcode, sid;                      --> GROUP BY
DCODE START_DT END_DT SID
---------- ---------------- ---------------- ----------
1112223 01.06.2018 00:00 30.06.2018 00:00 1000
3119255 01.03.2018 00:00 31.08.2018 00:00 9842

Related

Oracle sum most recent records until a defined value then Ignore the rest

I'm looking to sum a column until a defined value is reached and then ignore the rest of the records.
ID   WHEN  VALUE  AVG_COL
101  2016  6      84.5
101  2015  3      76
101  2014  3      87
101  2013  15     85.8
101  2012  6      92
101  2011  3      81
101  2010  3      82.3
I need a single result set of
ID   VALUE  AVG_COL
101  30     82.3
I have tried the following
SELECT
ID,
WHEN,
VALUE,
AVG_COL,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN) AS VALUE, --must equal 30
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
FROM
TABLE_ONE
WHERE
VALUE = 30;
Any help would be greatly appreciated!
Hi, try something like this, where I modified the WHERE clause:
-- Untested
SELECT
ID,
WHEN,
VALUE,
AVG_COL,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN) AS VALUE, --must equal 30
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
FROM
TABLE_ONE
WHERE
ID IN (SELECT ID FROM (SELECT ID, SUM(VALUE) sum_val FROM
TABLE_ONE GROUP BY ID) WHERE sum_val = 30);
try this
select id,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS VALUE,
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
from table_one
where VALUE <= 30
order by when desc
fetch first 1 rows only;
You can compute the running sum with window functions, then filter and limit in an outer query.
Assuming a table like (id, dt, val, ...):
select *
from (
select t.*,
sum(val) over(partition by id order by dt) sum_val
from mytable t
) t
where sum_val >= 30
order by row_number() over(partition by id order by dt desc)
fetch first row with ties
Notes:
this handles multiple ids at once
for each id, this brings the first row where the running sum of the value reaches at least 30, if any (rows are processed by descending date)
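A hypothetical adaptation to the question's table and column names might look like this (untested; the WHEN column is quoted defensively in case it clashes with a reserved word):
select *
from (
select t.*,
sum(VALUE) over(partition by ID order by "WHEN") as sum_val
from TABLE_ONE t
) t
where sum_val >= 30
order by row_number() over(partition by ID order by "WHEN" desc)
fetch first row with ties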
What you need is a running sum of the values and a decision on which ORDER BY you want to apply to that running sum.
SAMPLE DATA
WITH
tbl AS
(
Select 101 "ID", 2016 "YR", 6 "VAL", 84.5 "AVG_COL" From Dual Union All
Select 101 "ID", 2015 "YR", 3 "VAL", 76 "AVG_COL" From Dual Union All
Select 101 "ID", 2014 "YR", 3 "VAL", 87 "AVG_COL" From Dual Union All
Select 101 "ID", 2013 "YR", 15 "VAL", 85.8 "AVG_COL" From Dual Union All
Select 101 "ID", 2012 "YR", 6 "VAL", 92 "AVG_COL" From Dual Union All
Select 101 "ID", 2011 "YR", 3 "VAL", 81 "AVG_COL" From Dual Union All
Select 101 "ID", 2010 "YR", 3 "VAL", 82.3 "AVG_COL" From Dual
),
Create a CTE (I named it grid) to prepare your data - in this case I used descending order by year:
grid AS
(
Select
ID, YR,
VAL,
Sum(VAL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row) "RUNNING_SUM",
CASE WHEN Sum(VAL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row) >= 30
THEN Sum(1) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row)
END "IS_OVER_RN",
AVG_COL,
Round(AVG(AVG_COL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row), 2) "RUNNING_AVG"
From
tbl
)
This CTE results in:
        ID         YR        VAL RUNNING_SUM IS_OVER_RN    AVG_COL RUNNING_AVG
---------- ---------- ---------- ----------- ---------- ---------- -----------
       101       2016          6           6                  84.5        84.5
       101       2015          3           9                    76       80.25
       101       2014          3          12                    87        82.5
       101       2013         15          27                  85.8       83.33
       101       2012          6          33          5         92       85.06
       101       2011          3          36          6         81       84.38
       101       2010          3          39          7       82.3       84.09
Now you can get your result with the code below
-- Main SQL
SELECT
ID, RUNNING_SUM, RUNNING_AVG
FROM
grid g
WHERE IS_OVER_RN = (Select Min(IS_OVER_RN) From grid Where ID = g.ID)
Result:
-- with YEARS in DESCENDING order
ID RUNNING_SUM RUNNING_AVG
---------- ----------- -----------
101 33 85.06
If you make the orderings within the analytic functions in the grid CTE ASCENDING, then the same main SQL would return:
-- with YEARS in ASCENDING order
ID RUNNING_SUM RUNNING_AVG
---------- ----------- -----------
101 30 82.5
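For reference, that ascending variant is the same grid CTE with only the ORDER BY direction flipped (a minimal sketch, untested):
grid AS
(
Select
ID, YR,
VAL,
Sum(VAL) OVER(Partition By ID Order By YR ASC Rows Between Unbounded Preceding And Current Row) "RUNNING_SUM",
CASE WHEN Sum(VAL) OVER(Partition By ID Order By YR ASC Rows Between Unbounded Preceding And Current Row) >= 30
THEN Sum(1) OVER(Partition By ID Order By YR ASC Rows Between Unbounded Preceding And Current Row)
END "IS_OVER_RN",
AVG_COL,
Round(AVG(AVG_COL) OVER(Partition By ID Order By YR ASC Rows Between Unbounded Preceding And Current Row), 2) "RUNNING_AVG"
From
tbl
)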

BigQuery - Picking the latest non-null value within a 28-day interval

I'm trying to add a column to this table and have been stuck for a little while.
ID  Category 1  Date        Data1
A   1           2022-05-30  21
B   2           2022-05-21  15
A   2           2022-05-02  33
A   1           2022-02-11  3
B   2           2022-05-01  19
A   1           2022-05-15  null
A   1           2022-05-20  11
A   2           2022-04-20  22
to
ID  Category 1  Date        Data1  Picked_Data
A   1           2022-05-30  21     11
B   2           2022-05-21  15     19
A   2           2022-05-02  33     22
A   1           2022-02-11  3      some number or null
B   2           2022-05-01  19     some number or null
A   1           2022-05-15  null   some number or null
A   1           2022-05-20  11     some number or null
A   2           2022-04-20  22     some number or null
The logic is to partition by Category 1 and ID, then pick the latest non-null value within the past 28 days. If no such data exists, it'll be null.
For the first row (ID = A, Category 1 = 1), it will pick the 7th row, as they share the same ID and category and the date difference is <= 28. It skips the 4th and 6th rows because the date is too far back and the value is null, respectively.
I've tried querying this by
select first_value(Data1) over (partition by Category1 order by case when Data1 is not null and Date between Date - interval 28 day and Date then 1 else 2 end) as Picked_Data
but it's picking incorrect rows; my guess is that this part of the query
Date between Date - interval 28 day and Date
is not picking the correct dates. Could anyone give me advice/suggestions on how I could tweak this query?
Consider below approach
select *,
first_value(data1 ignore nulls) over past_28_days as picked_data
from your_table
window past_28_days as (
partition by id, category_1
order by unix_date(date)
range between 29 preceding and 1 preceding
)
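Note that the named window spans unix_date(date) - 29 through unix_date(date) - 1, so only rows from the preceding days are considered and a row's own Data1 can never be picked for itself.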
if applied to sample data in your question - output is
Consider below approach:
with sample_data as (
select 'A' as ID, 1 as category_1, date('2022-05-30') as date, 21 as data1
union all select 'B' as ID, 2 as category_1, date('2022-05-21') as date, 15 as data1
union all select 'A' as ID, 2 as category_1, date('2022-05-02') as date, 33 as data1
union all select 'A' as ID, 1 as category_1, date('2022-02-11') as date, 3 as data1
union all select 'B' as ID, 2 as category_1, date('2022-05-01') as date, 19 as data1
union all select 'A' as ID, 1 as category_1, date('2022-05-15') as date, NULL as data1
union all select 'A' as ID, 1 as category_1, date('2022-05-20') as date, 11 as data1
union all select 'A' as ID, 2 as category_1, date('2022-04-20') as date, 22 as data1
),
with_next_data as (
select *,
lag(date) over (partition by ID,category_1 order by date) as next_date,
lag(data1) over (partition by ID,category_1 order by date) as next_data
from sample_data
)
select
id,
category_1,
date,
data1,
if(date_diff(date, next_date,day) <= 28, next_data, null) as picked_data
from with_next_data
Output:

How to create a start and end date with no gaps from one date column and to sum a value within the dates

I am new to SQL coding, using SQL Developer.
I have a table that has 4 columns: patient ID (ptid), service date (dt), insurance payment amount (insr_amt), and out-of-pocket payment amount (op_amt) (see Table 1 below).
What I would like to do is (1) create two columns, "start_dt" and "end_dt", from the "dt" column: if there are no gaps in the dates for a patient ID, populate the start and end date with that patient's first and last date, but if there is a gap in the service dates within a patient ID, create separate start/end date rows for that patient ID; and (2) sum the two payment amounts by patient ID within each set of start and end dates (see Table 2 below).
What would be the way to do this using SQL code in SQL Developer?
Thank you!
Table 1:
Ptid  dt        insr_amt  op_amt
A     1/1/2021  30        20
A     1/2/2021  30        10
A     1/3/2021  30        10
A     1/4/2021  30        30
B     1/6/2021  10        10
B     1/7/2021  20        10
C     2/1/2021  15        30
C     2/2/2021  15        30
C     2/6/2021  60        30
Table 2:
Ptid  start_dt  end_dt    total_insr_amt  total_op_amt
A     1/1/2021  1/4/2021  120             70
B     1/6/2021  1/7/2021  30              20
C     2/1/2021  2/2/2021  30              60
C     2/6/2021  2/6/2021  60              30
You didn't mention the specific database, so this solution is written for PostgreSQL. You can do:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select *,
sum(inc) over(partition by ptid order by dt) as grp
from (
select *,
case when dt - interval '1 day' = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
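The innermost query flags the start of a new group (inc = 1) whenever a row's date is not exactly one day after the previous date for the same ptid; the running sum of those flags then assigns each consecutive streak its own group number (grp), and the outer query aggregates per ptid and grp.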
Result:
ptid start_dt end_dt total_insr_amt total_op_amt
----- ---------- ---------- -------------- -----------
A 2021-01-01 2021-01-04 120 70
B 2021-01-06 2021-01-07 30 20
C 2021-02-01 2021-02-02 30 60
C 2021-02-06 2021-02-06 60 30
See running example at DB Fiddle 1.
EDIT for Oracle
As requested, the modified query that works in Oracle is:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select x.*,
sum(inc) over(partition by ptid order by dt) as grp
from (
select t.*,
case when dt - 1 = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
See running example at db<>fiddle 2.

Oracle Running Subtraction

I have the data below. I want to subtract the first row's QTY from TOTAL QTY (80) and then, for each subsequent row, subtract its QTY from the previous row's QTY1.
QTY QTY1 DATE TOTAL QTY
2 78 01-JAN-20 80
1 77 15-JAN-20
46 31 22-JAN-20
16 15 27-JAN-20
Is there a way to do this? Any help is greatly appreciated. Thanks
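Worked through on the sample data, that is: 80 - 2 = 78, 78 - 1 = 77, 77 - 46 = 31, and 31 - 16 = 15, which reproduces the QTY1 column.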
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Full example with your sample data:
with T(QTY, DT, TOTAL_QTY) as (
select 2 , to_date('01-JAN-20','dd-mon-yy'),80 from dual union all
select 1 , to_date('15-JAN-20','dd-mon-yy'),null from dual union all
select 46, to_date('22-JAN-20','dd-mon-yy'),null from dual union all
select 16, to_date('27-JAN-20','dd-mon-yy'),null from dual
)
select
t.*
,first_value(TOTAL_QTY)over(order by DT) - sum(QTY)over(order by DT) as QTY1
from t;
Result:
QTY DT TOTAL_QTY QTY1
2 2020-01-01 80 78
1 2020-01-15 77
46 2020-01-22 31
16 2020-01-27 15
SQL tables represent unordered sets. Your question seems to rely on the ordering of the rows. Let me assume you have a column that represents the ordering.
Use a cumulative sum:
select t.*,
sum(total_qty) over () - sum(qty) over (order by <ordering col>) as qty1
from t;
Here is a db<>fiddle.
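Note that sum(total_qty) over () works here because TOTAL_QTY is populated only on the first row (80) and is null elsewhere, so summing it across the whole table still yields the overall total.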
Something like this (the CTE is just your data): if you add any more stuff later (in the total_qty column), then that would also get added to the total_qty calculation (as would be typical for additions to, and subtractions from, inventory).
with d as
(select 2 qty, 78 qty1 , to_date('01-JAN-20','dd-mon-rr') datecol, 80 total_qty from dual union all
select 1, 77, to_date('15-JAN-20','dd-mon-rr'),null from dual union all
select 46 , 31, to_date('22-JAN-20','dd-mon-rr'),null from dual union all
select 16 , 15 , to_date('27-JAN-20','dd-mon-rr'),null from dual
)
select sum(total_qty) over (order by datecol) - sum(qty) over (order by datecol)
from d
You can do:
select
qty,
first_value(total_qty) over(order by date)
- sum(qty) over(order by date) as qty1,
date, total_qty
from t
order by date

Oracle SQL overlap between begin date and end date in 2 or more records

Database my_table:
id seq start_date end_date
1 1 01-01-2017 02-01-2017
1 2 07-01-2017 09-01-2017
1 3 11-01-2017 11-01-2017
2 1 20-01-2017 20-01-2017
3 1 01-02-2017 02-02-2017
3 2 03-02-2017 04-02-2017
3 3 08-02-2017 09-02-2017
3 4 09-02-2017 10-02-2017
3 5 10-02-2017 12-02-2017
My requirement is to get the first date (normally the seq 1 start date), the last date (normally the last seq's end date), and the number of distinct dates covered across all seq rows for each unique ID.
Dates occurred:
id 1: 01-01-2017, 02-01-2017, 07-01-2017, 08-01-2017, 09-01-2017, 11-01-2017 (total 6)
id 2: 20-01-2017 (total 1)
id 3: 01-02-2017, 02-02-2017, 03-02-2017, 04-02-2017, 08-02-2017, 09-02-2017, 10-02-2017, 11-02-2017, 12-02-2017 (total 9)
Here is the result I want:
id start_date end_date num_date
1 01-01-2017 11-01-2017 6
2 20-01-2017 20-01-2017 1
3 01-02-2017 12-02-2017 9
I have tried
SELECT id
, MIN(start_date)
, MAX(end_date)
, SUM(end_date - start_date + 1)
FROM my_table
GROUP BY id
and this SQL statement works fine for id 1 and 2, since there are no overlapping dates between the begin and end dates. But for id 3 the resulting num_date is 11. Could you please suggest a SQL statement to solve this problem? Thank you.
One more question: the dates in the database are in datetime format. How do I convert them to dates? I tried to use the TRUNC function but it sometimes converts the date to the previous day instead.
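For reference, TRUNC with no format model simply zeroes out the time portion of a DATE (a minimal sketch with a made-up timestamped value):
SELECT TRUNC(TO_DATE('11-01-2017 15:30', 'DD-MM-YYYY HH24:MI')) AS day_only
  FROM dual;
-- 11-01-2017 00:00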
You need to count how many times an end_date equals the following start_date. For this you need to use the lag() or the lead() analytic function. You can use a case expression for the comparison, but alas you can't wrap the case expression within a COUNT or SUM in the same query; you need a subquery and an outer query.
Something like this; not tested, since you didn't provide CREATE TABLE and INSERT statements to recreate your sample data.
select id, min(start_date) as start_date, max(end_date) as end_date,
sum(end_date - start_date + 1 - flag) as num_days
from ( select id, start_date, end_date,
case when start_date = lag(end_date)
over (partition by id order by end_date) then 1
else 0 end as flag
from my_table
)
group by id;
SELECT id,
MIN( start_date ) AS start_date,
MAX( end_date ) AS end_date,
SUM( end_date - start_date + 1 ) AS num_days
FROM (
SELECT id,
GREATEST(
start_date,
COALESCE(
LAG( end_date ) OVER ( PARTITION BY id ORDER BY seq ) + 1,
start_date
)
) AS start_date,
end_date
FROM your_table
)
WHERE start_date <= end_date
GROUP BY id;
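In this version, LAG(end_date) + 1 is the day after the previous row's end date, GREATEST clips each row's start date to that value so days already covered by the previous range are not counted twice, the WHERE clause drops rows whose range is entirely covered by the previous one, and the outer SUM adds up the lengths of the remaining clipped ranges.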