Oracle: select rows using analytical conditions - SQL

I have a table with the data below:
id start_dt cancel_dt record_eff_dt latest_row_flag
1 null null 01/01/2018 N
1 01/02/2018 01/02/2018 01/02/2018 N
1 01/03/2018 null 01/03/2018 Y
2 null 01/04/2018 01/04/2018 Y
3 01/05/2018 null 01/05/2018 N
3 null 01/06/2018 01/06/2018 Y
I have to rank the rows by grouping rows with the same id (partition by id) using the conditions below.
Condition 1:
when start_dt is not null and cancel_dt is null and latest_row_flag = 'Y', the rank is 1.
Condition 2:
when cancel_dt is null and latest_row_flag = 'Y', scan all the rows for that same id and see if there is ever a row with start_dt not null and cancel_dt null; if so, rank it as 2, else 3.
Condition 3:
rank all other cases as 3.
I'm struggling to come up with the code for condition 2, where I have to look through all the previous rows to see if there is ever such a case. Please help.

Hmmm . . . the condition just depends on window functions:
select t.*,
       (case when start_dt is not null and cancel_dt is null and latest_row_flag = 'Y'
             then 1
             when cancel_dt is null and latest_row_flag = 'Y' and
                  -- does any row for this id ever have start_dt set while cancel_dt is null?
                  sum(case when start_dt is not null and cancel_dt is null then 1 else 0 end)
                      over (partition by id) > 0
             then 2
             else 3
        end) as ranking
from t;
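
Condition 2 can also be phrased with a correlated EXISTS, which matches the "scan all the rows for that same id" wording more literally. A sketch against the same table t; this is usually slower than the window-function version, since the subquery is probed per row:

select t.*,
       (case when start_dt is not null and cancel_dt is null and latest_row_flag = 'Y'
             then 1
             when cancel_dt is null and latest_row_flag = 'Y' and
                  -- look for any row of this id with start_dt set and cancel_dt null
                  exists (select 1
                          from t t2
                          where t2.id = t.id
                            and t2.start_dt is not null
                            and t2.cancel_dt is null)
             then 2
             else 3
        end) as ranking
from t;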


Get a Count of DISTINCT Rows having non NULL Values in Columns with a GroupBy clause

Suppose the table below is my raw data:
ID Type Date Value1 Value2
1 A 31-Oct-22 NULL 0.5
1 B 31-Oct-22 NULL 0.6
1 C 31-Oct-22 0.8 0.7
1 A 30-Sep-22 0.6 NULL
2 A 31-Oct-22 0.2 NULL
2 C 31-Oct-22 NULL 0.3
2 B 30-Sep-22 NULL NULL
2 D 30-Sep-22 NULL NULL
What I want to do is find the unique count of IDs which have non-NULL values in Value1 and Value2, with a group by on Date. Ideally the query output would look as follows:
Date Value1 Value2
31-Oct-22 2 2
30-Sep-22 1 0
Explanation of the above values:
For 31-Oct-22 & Value1: both ID 1 and ID 2 have non-NULL entries, so the distinct count is 2.
For 30-Sep-22 & Value1: ID 1 has a single non-NULL entry and ID 2's entries are all NULL, so the distinct count is 1.
For 30-Sep-22 & Value2: both ID 1 and ID 2 have only NULL entries, hence the count is 0.
I initially thought about DISTINCT. However, I'm not sure how to apply DISTINCT on a different column and combine it to get a non-NULL count of these columns.
Please help me.
We can do this with two levels of aggregation. In Postgres, we could use boolean aggregate functions:
select date,
       count(*) filter (where has_value_1) cnt_value1,
       count(*) filter (where has_value_2) cnt_value2
from (
    -- first level: one row per (date, id), flagging whether that id
    -- has any non-null value on that date
    select date, id,
           bool_or(value1 is not null) has_value_1,
           bool_or(value2 is not null) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date
A more portable way to phrase this would be:
select date,
       sum(has_value_1) cnt_value1,
       sum(has_value_2) cnt_value2
from (
    select date, id,
           max(case when value1 is not null then 1 else 0 end) has_value_1,
           max(case when value2 is not null then 1 else 0 end) has_value_2
    from mytable
    group by date, id
) t
group by date
order by date
Demo on DB Fiddle:
date cnt_value1 cnt_value2
2022-09-30 1 0
2022-10-31 2 2
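
Since the inner aggregation level only exists to collapse each (date, id) pair to a single row, the same numbers can also be produced with one level of aggregation and a conditional COUNT(DISTINCT ...). A sketch against the same mytable; COUNT ignores the NULLs that the CASE yields for rows where the value is missing:

select date,
       -- each id is counted at most once per date, and only if it ever
       -- has a non-null value in that column on that date
       count(distinct case when value1 is not null then id end) cnt_value1,
       count(distinct case when value2 is not null then id end) cnt_value2
from mytable
group by date
order by date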

SQL: Unable to join multiple select statements

I have a table as below:
SourceCustomerId BusinessDate HasTaxBenifit HasCollateral HasLoan
BS:100037 2016-12-23 No No Yes
BS:100056 2018-01-13 No Yes No
BS:100037 2011-06-03 No Yes Yes
BS:100056 2019-10-14 Yes No No
BS:100022 2014-09-17 Yes No Yes
BS:100037 2013-07-18 Yes Yes No
BS:100056 2016-03-19 Yes Yes Yes
BS:100022 2015-04-20 Yes No No
BS:100022 2017-08-14 No Yes No
BS:100022 2012-11-23 No Yes No
And the output that I am expecting is:
BinaryTaxBenefit BinaryLoan BinaryCollateral diff_BinaryTaxBenefit diff_BinaryLoan diff_BinaryCollateral
0 0 0 NULL NULL NULL
1 0 1 1 0 1
1 0 0 0 0 -1
0 1 0 -1 1 0
0 1 1 NULL NULL NULL
1 1 0 0 -1 1
0 0 1 0 1 0
1 1 1 NULL NULL NULL
0 1 0 -1 0 1
1 0 0 0 0 0
To obtain this output, we need to follow three steps:
1. Partition the data by SourceCustomerId and then order it by SourceCustomerId and BusinessDate.
2. Create the columns BinaryTaxBenefit, BinaryLoan and BinaryCollateral. Every Yes/No column has a binary equivalent that is 0 when the column's value is 'No' and 1 when it is 'Yes'.
3. The last and most difficult part: subtract the rows (binary columns only). The subtraction must stay within the group, so the first difference in each group is always NULL and the rest are row-over-row differences.
I am able to write separate SQL queries for Step 1 and Step 2.
Step 1: Partition the data by SourceCustomerId and then order it by SourceCustomerId and BusinessDate:
SELECT
SourceCustomerId,
BusinessDate,
ROW_NUMBER() OVER(PARTITION BY SourceCustomerId ORDER BY SourceCustomerId, BusinessDate ASC) RowNumber,
HasTaxBenifit,
HasLoan,
HasCollateral
from personDetail pd
Step 2: Create the columns BinaryTaxBenefit, BinaryLoan and BinaryCollateral:
select * ,
(case when HasTaxBenifit = 'Yes' then 1 else 0 end) as BinaryTaxBenefit,
(case when HasLoan = 'Yes' then 1 else 0 end) as BinaryLoan,
(case when HasCollateral = 'Yes' then 1 else 0 end) as BinaryCollateral
from personDetail pd
How do I club Step 1 and Step 2 into a single SQL query?
Step 3: The last and most difficult part, subtracting the rows (binary columns only):
Here it subtracts across all the rows without considering the group; I am not sure how to fix this:
with v as (
    select RowNumber, BinaryTaxBenefit, BinaryLoan, BinaryCollateral
    from personDetailTrial
)
select RowNumber, BinaryTaxBenefit, BinaryLoan, BinaryCollateral,
       BinaryTaxBenefit - lag(BinaryTaxBenefit, 1) over (order by RowNumber) as diff_BinaryTaxBenefit,
       BinaryLoan - lag(BinaryLoan, 1) over (order by RowNumber) as diff_BinaryLoan,
       BinaryCollateral - lag(BinaryCollateral, 1) over (order by RowNumber) as diff_BinaryCollateral
from v
If I understood correctly, you can use CROSS APPLY:
https://www.sqlshack.com/es/la-diferencia-entre-cross-apply-y-outer-apply-en-sql-server/
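
Alternatively, all three steps collapse into one query with no join at all: derive the binary columns in a subquery (Step 2), then subtract LAG over a window partitioned by SourceCustomerId (Steps 1 and 3). A sketch against the personDetail table from the question; LAG returns NULL for the first row of each partition, which produces the NULL diffs:

select t.SourceCustomerId,
       t.BusinessDate,
       t.BinaryTaxBenefit,
       t.BinaryLoan,
       t.BinaryCollateral,
       -- LAG is NULL on the first row of each customer, so the first diff is NULL
       t.BinaryTaxBenefit - lag(t.BinaryTaxBenefit)
           over (partition by t.SourceCustomerId order by t.BusinessDate) as diff_BinaryTaxBenefit,
       t.BinaryLoan - lag(t.BinaryLoan)
           over (partition by t.SourceCustomerId order by t.BusinessDate) as diff_BinaryLoan,
       t.BinaryCollateral - lag(t.BinaryCollateral)
           over (partition by t.SourceCustomerId order by t.BusinessDate) as diff_BinaryCollateral
from (select pd.*,
             -- binary equivalents of the Yes/No columns
             case when HasTaxBenifit = 'Yes' then 1 else 0 end as BinaryTaxBenefit,
             case when HasLoan = 'Yes' then 1 else 0 end as BinaryLoan,
             case when HasCollateral = 'Yes' then 1 else 0 end as BinaryCollateral
      from personDetail pd
     ) t
order by t.SourceCustomerId, t.BusinessDate;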

The number of items corresponding to the day - postgresql

How to make a table as follows?
table: item
id region_id date
1 2 2020-11-10
2 1 2020-11-11
3 3 2020-11-10
... ... ...
Result: the number of items corresponding to the day
region_id 2020-11-10 2020-11-11 ...
1 0 1
2 1 0
3 1 0
You can use either crosstab or conditional aggregation with group by:
select region_id
     -- SUM counts the items per region for each date (MAX would cap the count at 1)
     , sum(case when date = '2020-11-10' then 1 else 0 end) as "2020-11-10"
     , sum(case when date = '2020-11-11' then 1 else 0 end) as "2020-11-11"
     , sum(case when date = '2020-11-12' then 1 else 0 end) as "2020-11-12"
     , ...
from item
group by region_id
If you don't want to hard-code the column list, you have to generate it with dynamic SQL, either with the group by approach or with crosstab.
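
For the crosstab route, a sketch using Postgres's tablefunc extension against the item table from the question; the output columns still have to be spelled out in the column definition list, and combinations with no rows come back as NULL rather than 0, hence the coalesce:

create extension if not exists tablefunc;

select region_id,
       coalesce("2020-11-10", 0) as "2020-11-10",
       coalesce("2020-11-11", 0) as "2020-11-11"
from crosstab(
    -- source query: row name, category, value
    $$ select region_id, date, count(*)::int
       from item
       group by region_id, date
       order by 1, 2 $$,
    -- category query: one row per output column, in column order
    $$ select distinct date from item order by 1 $$
) as ct(region_id int, "2020-11-10" int, "2020-11-11" int);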

How to update a column based on values of other columns

I have a table as below
row_wid id code sub_code item_nbr orc_cnt part_cnt variance reporting_date var_start_date
1 1 ABC PQR 23AB 0 1 1 11-10-2019 NULL
2 1 ABC PQR 23AB 0 1 1 12-10-2019 NULL
3 1 ABC PQR 23AB 1 1 0 13-10-2019 NULL
4 1 ABC PQR 23AB 1 2 1 14-10-2019 NULL
5 1 ABC PQR 23AB 1 3 2 15-10-2019 NULL
I have to update the var_start_date column with min(reporting_date) for each combination of id, code, sub_code and item_nbr, but only until the variance field becomes zero. A row with variance = 0 should have a NULL var_start_date, and the rows after it should get the next min(reporting_date). FYI, variance is calculated as part_cnt - orc_cnt.
So my output should look like this -
row_wid id code sub_code item_nbr orc_cnt part_cnt variance reporting_date var_start_date
1 1 ABC PQR 23AB 0 1 1 11-10-2019 11-10-2019
2 1 ABC PQR 23AB 0 1 1 12-10-2019 11-10-2019
3 1 ABC PQR 23AB 1 1 0 13-10-2019 NULL
4 1 ABC PQR 23AB 1 2 1 14-10-2019 14-10-2019
5 1 ABC PQR 23AB 1 3 2 15-10-2019 14-10-2019
I am trying to write a function using the query below to divide the data into sets:
SELECT DISTINCT
       MIN(reporting_date) OVER (PARTITION BY id, code, sub_code, item_nbr ORDER BY row_wid),
       RANK() OVER (PARTITION BY id, code, sub_code, item_nbr ORDER BY row_wid) AS rnk,
       id, code, sub_code, item_nbr, orc_cnt, part_cnt, variance, row_wid
FROM TABLE T1
But I don't know how to include the variance field to split the sets.
I would suggest:
select t.*,
       (case when variance <> 0
             -- exclude the variance-0 boundary row's date from each group's minimum
             then min(case when variance <> 0 then reporting_date end)
                      over (partition by id, code, sub_code, item_nbr, grp)
        end) as new_reporting_date
from (select t.*,
             -- running count of variance-0 rows: each 0 starts a new group
             sum(case when variance = 0 then 1 else 0 end)
                 over (partition by id, code, sub_code, item_nbr
                       order by row_wid) as grp
      from t
     ) t;
Note that this does not use a JOIN. It should be more efficient than an answer that does.
Try as below
SELECT T.*,
       CASE WHEN T.variance = 0 THEN NULL
            ELSE MIN(reporting_date) OVER (PARTITION BY T1.[Rank])
       END AS New_var_start_date
FROM mytbl T
LEFT JOIN (
    -- rank = number of variance-0 rows strictly before this row, plus 1
    SELECT row_wid, variance,
           COUNT(CASE variance WHEN 0 THEN 1 END)
               OVER (ORDER BY row_wid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) + 1 AS [Rank]
    FROM mytbl
) T1 ON T.row_wid = T1.row_wid
SQL FIDDLE DEMO

Inserting closing rows into a table while keeping the date column

Following this question.
My table
id sum type date
1 3 -1 2017-02-02
1 6 -1 2017-02-04
1 -6 2 2017-02-01
1 -3 1 2017-02-09
1 3 -1 2017-02-17
1 6 -1 2017-02-05
This query finds users who pass the conditions and returns, for each of them, occurrences rows with some columns modified.
with t as (
    select id
         , -abs(sum) as sum
         , sum(case when type = -1 then 1 else -1 end) as occurrences
      -- , collect_list(date) as time_col
    from table
    group by id, abs(sum)
    having sum(case when type = -1 then 1 else -1 end) > 15
)
select t.id
     , t.sum
     , 2 as type
from t
lateral view explode(split(space(cast(occurrences as int) - 1), ' ')) e
-- lateral view explode(time_col) time_table as time_key;
The problem is, I need every row to hold one date from the list. I tried adding , collect_list(date) as time_col and then
lateral view explode(time_col) time_table as time_key;
but this just returned all possible combinations. I could probably use a join (would that work?), but I wondered if that's really necessary.
In the end these rows
1 3 -1 2017-02-17
1 6 -1 2017-02-05
would transform into
1 -3 2 2017-02-17
1 -6 2 2017-02-05
select val_id
      ,-val_sum as val_sum
      ,2 as val_type
      ,val_date
from (select val_id
            ,val_sum
            ,val_type
            ,val_date
            -- net number of unmatched type -1 rows per (id, amount) pair
            ,sum (case when val_type = -1 then 1 else -1 end) over
             (
                 partition by val_id, -abs (val_sum)
             ) as occurrences
            -- newest rows first, so rn <= occurrences keeps the most recent ones
            ,row_number () over
             (
                 partition by val_id, val_sum
                 order by val_date desc
             ) as rn
      from mytable
     ) t
where val_type = -1
  and rn <= occurrences
  and occurrences > 15
;
Execution results (without the and occurrences > 15 condition):
+--------+---------+----------+------------+
| val_id | val_sum | val_type | val_date |
+--------+---------+----------+------------+
| 1 | -3 | 2 | 2017-02-17 |
+--------+---------+----------+------------+
| 1 | -6 | 2 | 2017-02-05 |
+--------+---------+----------+------------+