Rank and partition query in SQL

I have a table in AS400 as below:
Type Values Status
A 1 Y
A 2 N
A 3 Y
A 4 Y
A 5 N
B 2 Y
B 7 N
C 3 Y
C 5 N
C 4 Y
C 6 Y
C 7 Y
C 1 Y
D 3 Y
D 5 Y
E 7 N
E 4 N
E 3 Y
E 6 N
E 7 Y
E 8 N
What I need is the top 2 of each type that have status Y, i.e. the result should be something like A 1, A 3, B 2, C 3, C 4, D 3, D 5, E 3, E 7.
The query I have used is this:
SELECT type,
REFERENCES
FROM (
SELECT type,
REFERENCES,
STATUS,
rank() OVER (PARTITION BY Type ORDER BY REFERENCES DESC) AS Rank
FROM Tables
) a
WHERE rank <= 2
AND Type IN ('A','B','C','D','E')
AND STATUS = 'Y';
The issue is that it doesn't filter on the status beforehand. It picks the top 2 first and then filters on Y. So the result looks like A 1 instead of A 1 and A 3, because it first picked A 1 and A 2, and then filtered out A 2.
Where do I insert the Status = 'Y' condition to get the correct result?
I am a novice in SQL, so if there's a better way to write the above query, I am fine with that too.

This ought to do it. Use the WITH clause to feed the filtered result into a new query, which you can then rank.
with testtable (type, value, status) as (
select 'A', 1, 'Y' from dual union all
select 'A', 2, 'N' from dual union all
select 'A', 3, 'Y' from dual union all
select 'A', 4, 'Y' from dual union all
select 'A', 5, 'N' from dual union all
select 'B', 2, 'Y' from dual union all
select 'B', 7, 'N' from dual union all
select 'C', 3, 'Y' from dual union all
select 'C', 5, 'N' from dual union all
select 'C', 4, 'Y' from dual union all
select 'C', 6, 'Y' from dual union all
select 'C', 7, 'Y' from dual union all
select 'C', 1, 'Y' from dual union all
select 'D', 3, 'Y' from dual union all
select 'D', 5, 'Y' from dual union all
select 'E', 7, 'N' from dual union all
select 'E', 4, 'N' from dual union all
select 'E', 3, 'Y' from dual union all
select 'E', 6, 'N' from dual union all
select 'E', 7, 'Y' from dual union all
select 'E', 8, 'N' from dual
)
, ys as (
select
*
from testtable
where STATUS = 'Y'
)
, yrank as (
select
type,
value,
status,
rank() over(partition by type order by value) Y_RANK
from ys
)
select
*
from yrank
where Y_RANK <= 2

Related

SQL How to align ranges of data points in rows?

Suppose we have the following data set:
with
data_table(title, x) as (
select 'a', 1 from dual union all
select 'a', 3 from dual union all
select 'a', 4 from dual union all
select 'a', 5 from dual union all
select 'b', 1 from dual union all
select 'b', 2 from dual union all
select 'b', 3 from dual union all
select 'b', 6 from dual
)
select * from data_table;
TITLE | X
-----------
a 1
a 3
a 4
a 5
b 1
b 2
b 3
b 6
We see that the points for a and b are different.
How can I align the values in the X column so that both groups have the same points, filling the gaps with NULL?
Expected result is:
TITLE | X
-----------
a 1
a NULL
a 3
a 4
a 5
a NULL
b 1
b 2
b 3
b NULL
b NULL
b 6
The straightforward solution I came up with is:
with
data_table(title, x) as (
select 'a', 1 from dual union all
select 'a', 3 from dual union all
select 'a', 4 from dual union all
select 'a', 5 from dual union all
select 'b', 1 from dual union all
select 'b', 2 from dual union all
select 'b', 3 from dual union all
select 'b', 6 from dual
),
all_points(x) AS (
select distinct x from data_table
),
all_titles(title) AS (
select distinct title from data_table
),
aligned_data(title, x) as (
select t.title, p.x from all_points p cross join all_titles t
)
select ad.title, dt.x
from aligned_data ad
left join data_table dt on dt.title = ad.title and dt.x = ad.x
order by ad.title, ad.x;
As we can see, the cross join in the aligned_data definition is a bottleneck and can kill performance on sizable data sets.
I wonder if this task could be solved more elegantly. Maybe a trick with window functions could work.

Update Oracle SQL - table with values from duplicates

I hope somebody can help. I need to update a table from a select with duplicates.
ID;CLASS;VALUE;NEW
1;a;a3;
1;b;s6;
1;c;b99;
2;a;s3;
2;b;r6;
2;c;b99;
3;a;s5;
4;a;r6;
4;b;a3;
Look at my example table: there is a column NEW which I have to update. In the example, the column NEW was filled manually.
Here is the goal (as shown in table col NEW):
1. Find duplicates via ID (HAVING COUNT(*) > 1 or something like that)
UPDATE TABLE SET NEW =
CLASS || '_' || VALUE
WHERE CLASS = 'a' OR CLASS = 'b'
Easy for you?
Thanks in advance.
The logic behind this is not completely clear, but this could be one way.
Setup:
create table yourTable(id, class, value, new) as
(
select 1, 'a', 'a3', cast (null as varchar2(10)) from dual union all
select 1, 'b', 's6', null from dual union all
select 1, 'c', 'b99', null from dual union all
select 2, 'a', 's3', null from dual union all
select 2, 'b', 'r6', null from dual union all
select 2, 'c', 'b99', null from dual union all
select 3, 'a', 's5', null from dual union all
select 4, 'a', 'r6', null from dual union all
select 4, 'b', 'a3', null from dual
)
query:
merge into yourTable t1
using (
select listagg(value, '_') within group (order by class) as new,
id
from yourTable
where class in ('a', 'b')
group by id
having count(distinct class) = 2
) t2
on ( t1.id = t2.id
and t1.class in ('a', 'b')
)
when matched then
update set t1.new = t2.new
result:
select *
from yourTable;
        ID C VAL NEW
---------- - --- ----------
         1 a a3  a3_s6
         1 b s6  a3_s6
         1 c b99
         2 a s3  s3_r6
         2 b r6  s3_r6
         2 c b99
         3 a s5
         4 a r6  r6_a3
         4 b a3  r6_a3
9 rows selected.
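For anyone wanting to reproduce that result without Oracle, here is a rough sketch using Python's bundled SQLite. SQLite has neither MERGE nor LISTAGG, so this swaps in a different technique: a correlated UPDATE with a self-join. It assumes at most one row per (id, class); the names your_table, new_val and fill_new_column are made up for this sketch.

```python
import sqlite3

# Rows from the setup above: (id, class, value). The NEW column starts out NULL.
ROWS = [
    (1, "a", "a3"), (1, "b", "s6"), (1, "c", "b99"),
    (2, "a", "s3"), (2, "b", "r6"), (2, "c", "b99"),
    (3, "a", "s5"),
    (4, "a", "r6"), (4, "b", "a3"),
]

# For every 'a' or 'b' row, look up the 'a' value and the 'b' value of the same id.
# If an id lacks either class, the subquery finds no row and new_val stays NULL.
UPDATE = """
UPDATE your_table
SET new_val = (
    SELECT a.value || '_' || b.value
    FROM your_table AS a
    JOIN your_table AS b ON b.id = a.id
    WHERE a.class = 'a' AND b.class = 'b' AND a.id = your_table.id
)
WHERE class IN ('a', 'b')
"""


def fill_new_column(rows):
    """Run the update on an in-memory copy of the table and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # new_val stands in for the NEW column of the original table.
        conn.execute(
            "CREATE TABLE your_table (id INTEGER, class TEXT, value TEXT, new_val TEXT)"
        )
        conn.executemany(
            "INSERT INTO your_table (id, class, value) VALUES (?, ?, ?)", rows
        )
        conn.execute(UPDATE)
        return conn.execute(
            "SELECT id, class, value, new_val FROM your_table ORDER BY id, class"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in fill_new_column(ROWS):
        print(row)
```

The self-join plays the role of the HAVING COUNT(DISTINCT class) = 2 check: only ids that have both an 'a' and a 'b' row produce a concatenated value.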

Filter rows according to changes among sets of rows

I have the following sample data:
LOAN_ACCOUNT LOAN_VERSION LENDER PROPORTION PARAM_1
------------- ------------ ------ ---------- -------
1 1 A 0.6 a
1 1 B 0.4 b
1 2 A 0.6 a
1 2 B 0.4 b
1 3 A 0.6 a
1 3 B 0.4 b
...
...
2 1 A 0.55 a
2 1 B 0.45 b
2 2 A 0.55 a
2 2 B 0.45 b
2 2 C 0 c -- << Note the addition of lender C in LOAN_VERSION = 2
2 3 A 0.55 a
2 3 B 0.45 b
...
...
3 1 A 0.555 a
3 1 B 0.445 b
3 2 A 0.555 a
3 2 B 0.445 bbbbb -- << Note the modification of PARAM_1 for lender B in LOAN_VERSION = 2
...
...
4 1 A 0.555 a
4 1 B 0.445 b
4 2 A 0.555 a
4 2 D 0.445 d -- << Note the modification of lenders from B to D in loan versions 1 and 2
Requirements:
The expected output is:
LOAN_ACCOUNT SHOULD_BE_RETURNED
------------ ------------------
1 Yes
2 No
4 No
3 No
Column SHOULD_BE_RETURNED is 'Yes' when, for a loan account, there are no changes in lenders, their proportions, or the param_1 column across all its loan versions. Loan account 1 in the above example satisfies these conditions.
The column should be 'No' when any of the following holds:
There is a new lender in any version of a loan. Lender "C" is added in loan account 2, hence for loan account 2, SHOULD_BE_RETURNED = 'No'.
There is a change in any of the lenders' proportions or param_1 values across the loan versions. Note the change in param_1 for loan account 3, hence for that too SHOULD_BE_RETURNED = 'No'.
There is a change in lenders. Notice the change in lender from "B" to "D" for loan account 4, hence for that too SHOULD_BE_RETURNED = 'No'.
What I have tried:
So far I have managed only up to this, but it gives me the wrong output:
/*
WITH cte_loan_version AS (
SELECT 1 loan_account, 1 loan_version, 'A' lender, 0.6 proportion, 'a' param_1 FROM dual
UNION ALL SELECT 1, 1, 'B', 0.4, 'b' FROM dual
UNION ALL SELECT 1, 2, 'A', 0.6, 'a' FROM dual
UNION ALL SELECT 1, 2, 'B', 0.4, 'b' FROM dual
UNION ALL SELECT 1, 3, 'A', 0.6, 'a' FROM dual
UNION ALL SELECT 1, 3, 'B', 0.4, 'b' FROM dual
UNION ALL SELECT 2, 1, 'A', 0.55, 'a' FROM dual
UNION ALL SELECT 2, 1, 'B', 0.45, 'b' FROM dual
UNION ALL SELECT 2, 2, 'A', 0.55, 'a' FROM dual
UNION ALL SELECT 2, 2, 'B', 0.45, 'b' FROM dual
UNION ALL SELECT 2, 2, 'C', 0.00, 'c' FROM dual
UNION ALL SELECT 2, 3, 'A', 0.55, 'a' FROM dual
UNION ALL SELECT 2, 3, 'B', 0.45, 'b' FROM dual
UNION ALL SELECT 3, 1, 'A', 0.555, 'a' FROM dual
UNION ALL SELECT 3, 1, 'B', 0.445, 'b' FROM dual
UNION ALL SELECT 3, 2, 'A', 0.555, 'a' FROM dual
UNION ALL SELECT 3, 2, 'B', 0.445, 'bbbbb' FROM dual
UNION ALL SELECT 4, 1, 'A', 0.555, 'a' FROM dual
UNION ALL SELECT 4, 1, 'B', 0.445, 'b' FROM dual
UNION ALL SELECT 4, 2, 'A', 0.555, 'a' FROM dual
UNION ALL SELECT 4, 2, 'D', 0.445, 'd' FROM dual) -- */
SELECT lv.loan_account,
CASE
WHEN NOT EXISTS
(SELECT 1
FROM cte_loan_version lv_in
WHERE lv_in.loan_account = lv.loan_account
HAVING COUNT (DISTINCT lender) > 1
OR COUNT (DISTINCT proportion) > 1
OR COUNT (DISTINCT param_1) > 1)
THEN 'Yes'
ELSE 'No'
END AS should_be_returned
FROM cte_loan_version lv
GROUP BY lv.loan_account;
Any help on this will be much appreciated.
One approach is to aggregate the values together for each version, and then check if they are the same:
select loan_account,
(case when min(lpp) = max(lpp) then 'Yes' else 'No' end) as should_be_returned
-- inner query: one signature string per (loan_account, loan_version)
from (select loan_account, loan_version,
listagg(lender || '-' || proportion || '-' || param_1, ', ') within group (order by lender, proportion, param_1) as lpp
from sampledata
group by loan_account, loan_version
)
group by loan_account
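As a cross-check of the same idea, here is a sketch that can be run locally with Python's bundled SQLite. It deliberately swaps the string-aggregation approach for a counting one, because SQLite has no LISTAGG: a loan account is 'Yes' only if every distinct (lender, proportion, param_1) combination shows up in every one of its versions. It assumes a lender appears at most once per version; the table name loans and the function name stable_accounts are made up.

```python
import sqlite3

# (loan_account, loan_version, lender, proportion, param_1), from the question.
ROWS = [
    (1, 1, "A", 0.6, "a"), (1, 1, "B", 0.4, "b"),
    (1, 2, "A", 0.6, "a"), (1, 2, "B", 0.4, "b"),
    (1, 3, "A", 0.6, "a"), (1, 3, "B", 0.4, "b"),
    (2, 1, "A", 0.55, "a"), (2, 1, "B", 0.45, "b"),
    (2, 2, "A", 0.55, "a"), (2, 2, "B", 0.45, "b"), (2, 2, "C", 0.0, "c"),
    (2, 3, "A", 0.55, "a"), (2, 3, "B", 0.45, "b"),
    (3, 1, "A", 0.555, "a"), (3, 1, "B", 0.445, "b"),
    (3, 2, "A", 0.555, "a"), (3, 2, "B", 0.445, "bbbbb"),
    (4, 1, "A", 0.555, "a"), (4, 1, "B", 0.445, "b"),
    (4, 2, "A", 0.555, "a"), (4, 2, "D", 0.445, "d"),
]

QUERY = """
WITH combos AS (
    -- in how many versions does each lender/proportion/param_1 combination occur?
    SELECT loan_account, COUNT(DISTINCT loan_version) AS n_versions
    FROM loans
    GROUP BY loan_account, lender, proportion, param_1
),
versions AS (
    -- how many versions does the account have in total?
    SELECT loan_account, COUNT(DISTINCT loan_version) AS total_versions
    FROM loans
    GROUP BY loan_account
)
SELECT v.loan_account,
       CASE WHEN MIN(c.n_versions) = v.total_versions THEN 'Yes' ELSE 'No' END
FROM versions v
JOIN combos c ON c.loan_account = v.loan_account
GROUP BY v.loan_account, v.total_versions
ORDER BY v.loan_account
"""


def stable_accounts(rows):
    """Return (loan_account, 'Yes'/'No') pairs for the given loan version rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE loans (loan_account INTEGER, loan_version INTEGER, "
            "lender TEXT, proportion REAL, param_1 TEXT)"
        )
        conn.executemany("INSERT INTO loans VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(stable_accounts(ROWS))
```

Any added lender, changed proportion, or changed param_1 creates a combination that is missing from at least one version, which pulls the minimum below the version count.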

Calculating data point which have Precision of 99%

We have a table with millions of entries. The table has two columns, X and Y, and there is a correlation between them: when X is beyond a value, Y tends to be B (however, this is not always true; it is a trend, not a certainty).
Here I want to find the threshold value for X, i.e. X1, such that at least 99% of the values less than X1 are B.
It can be done easily in code, but is there a SQL query that can do the computation?
For the dataset below, the expected answer is 6, because below 6 more than 99% of the values are 'B' and there is no bigger value of X for which more than 99% are 'B'. However, if I change the precision to 90%, it becomes 12, because for X < 12 more than 90% of the values are 'B' and there is no bigger value of X for which that holds.
So we need to find the biggest value X1 such that at least 99% of the values less than X1 are 'B'.
X Y
------
2 B
3 B
3 B
4 B
5 B
5 B
5 B
6 G
7 B
7 B
7 B
8 B
8 B
8 B
12 G
12 G
12 G
12 G
12 G
12 G
12 G
12 G
13 G
13 G
13 B
13 G
13 G
13 G
13 G
13 G
14 B
14 G
14 G
OK, I think this accomplishes what you want, but it will not scale to the data volume you mention. I'm posting it anyway in case it helps someone else provide an answer.
This may be one of those cases where the most efficient way is to use a cursor with sorted data.
Oracle has some built-in functions for correlation analysis, but I've never worked with them, so I don't know how they work.
select max(x)
from (select x
,y
,num_less
,num_b
,num_b / nullif(num_less,0) as percent_b
from (select x
,y
,(select count(*) from table b where b.x<a.x) as num_less
,(select count(*) from table b where b.x<a.x and b.y = 'B') as num_b
from table a
)
where num_b / nullif(num_less,0) >= 0.99
);
The inner select does the following:
For every value of X:
Count the number of values < X
Count the number of 'B' values among them
The next SELECT computes the ratio of B's and keeps only the rows where the ratio is at or above the threshold. The outer SELECT just picks the max(x) from those remaining rows.
Edit:
The non-scalable part of the above query is the semi-cartesian self-joins.
This is mostly inspired by the previous answer, which had some flaws.
select max(next_x) from
(
select
count(case when y='B' then 1 end) over (order by x) correct,
count(case when y='G' then 1 end) over (order by x) wrong,
lead(x) over (order by x) next_x
from table_name
)
where correct/(correct + wrong) > 0.99
Sample data:
create table table_name(x number, y varchar2(1));
insert into table_name
select 2, 'B' from dual union all
select 3, 'B' from dual union all
select 3, 'B' from dual union all
select 4, 'B' from dual union all
select 5, 'B' from dual union all
select 5, 'B' from dual union all
select 5, 'B' from dual union all
select 6, 'G' from dual union all
select 7, 'B' from dual union all
select 7, 'B' from dual union all
select 7, 'B' from dual union all
select 8, 'B' from dual union all
select 8, 'B' from dual union all
select 8, 'B' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 12, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'B' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 13, 'G' from dual union all
select 14, 'B' from dual union all
select 14, 'G' from dual union all
select 14, 'G' from dual;
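The running-count query above can be checked against this sample data with a small harness. This is a sketch that assumes Python's bundled SQLite (3.25+) in place of Oracle; the function name threshold_x is invented, and the threshold is passed as a bind parameter. One engine-specific tweak: SQLite divides integers as integers, so the ratio is multiplied by 1.0 here.

```python
import sqlite3

# The 33 sample rows (x, y), written compactly.
ROWS = (
    [(2, "B"), (3, "B"), (3, "B"), (4, "B")]
    + [(5, "B")] * 3
    + [(6, "G")]
    + [(7, "B")] * 3
    + [(8, "B")] * 3
    + [(12, "G")] * 8
    + [(13, "G")] * 2 + [(13, "B")] + [(13, "G")] * 5
    + [(14, "B"), (14, "G"), (14, "G")]
)

# Running counts of 'B' and 'G' up to and including each x (rows with equal x
# are counted together), paired with the next x in sorted order.
QUERY = """
SELECT MAX(next_x) FROM (
    SELECT COUNT(CASE WHEN y = 'B' THEN 1 END) OVER (ORDER BY x) AS correct,
           COUNT(CASE WHEN y = 'G' THEN 1 END) OVER (ORDER BY x) AS wrong,
           LEAD(x) OVER (ORDER BY x) AS next_x
    FROM table_name
)
WHERE correct * 1.0 / (correct + wrong) > ?
"""


def threshold_x(rows, precision):
    """Largest X1 such that more than `precision` of the rows with x < X1 are 'B'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (x INTEGER, y TEXT)")
        conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
        return conn.execute(QUERY, (precision,)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(threshold_x(ROWS, 0.99))
    print(threshold_x(ROWS, 0.90))
```

Each window function is a single sorted pass, so this avoids the per-row correlated counts of the first answer. How it behaves on millions of rows in Oracle is not something this sketch tests.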
Give this a try and share the results.
Assuming the table is named table_name with columns x and y:
with TAB AS (
select (count(x) over (PARTITION BY Y order by x rows between unbounded preceding and current row))/
(COUNT(case when y='B' then 1 end) OVER (PARTITION BY Y)) * 100 CC, x, y
from table_name)
select x,y from (SELECT min(cc) over (partition by y) min_cc, x, cc, y
FROM TAB
where cc >= 99)
where min_cc = cc

group by field for specific values

How do we use GROUP BY so that only certain values of a column are grouped?
E.g., if the column has values like Y, N, and null, and I only want to group the records with merge_ind = 'Y' or null; if it is, say, N, each record should be treated as a separate value.
Merge1 Merge2
A Y
A Y
A Y
B Y
B Y
B Y
C N
C N
C N
D N
D N
E null
E null
F null
F null
null null
The output should be:
count Merge1 merge2
3 A Y
3 B Y
1 C N
1 C N
1 C N
1 D N
1 D N
2 E null
1 F null
1 null null
I implemented it using a union but am not very happy with the performance.
Thanks
Ali
You can do something like this:
with data as (select 'A' Merge1, 'Y' Merge2 from dual union all
select 'A', 'Y' from dual union all
select 'A', 'Y' from dual union all
select 'B', 'Y' from dual union all
select 'B', 'Y' from dual union all
select 'B', 'Y' from dual union all
select 'C', 'N' from dual union all
select 'C', 'N' from dual union all
select 'C', 'N' from dual union all
select 'D', 'N' from dual union all
select 'D', 'N' from dual union all
select 'E', null from dual union all
select 'E', null from dual union all
select 'F', null from dual union all
select 'F', null from dual union all
select null, null from dual)
select merge1, max(merge2), count(*)
from (select merge1, merge2,
case when merge2 = 'Y' or merge2 is null then merge2 else to_char(rownum) end grp
from data)
group by merge1, grp
order by merge1;
M M COUNT(*)
- - ----------
A Y 3
B Y 3
C N 1
C N 1
C N 1
D N 1
D N 1
E 2
F 2
1
test fiddle: http://sqlfiddle.com/#!4/b85cc/1
Try:
select Merge1, Merge2, count(*)
from table1
group by Merge1, Merge2, case when Merge2 = 'N' then to_char(rownum) else Merge2 end
order by Merge1
Here is a sqlfiddle demo
After some considerable mucking around, I have a query that does the job, although I could swear this question was originally tagged mysql, and unfortunately this is a MySQL-only answer:
select count, merge1, merge2
from (
select count(*) count, merge1, merge2,
if(merge2 = 'Y' or merge2 is null, 0, n)
from (
select merge1, merge2,
(@n := if(@n is null, 1, @n + 1)) n
from t
) x
group by 2, 3, 4
) y
Values other than Y (or null) are treated as separate values, each with its own group.
It works by assigning a unique number to each row, then selectively grouping by that number too when the value is not Y, thus creating a separate group for each non-Y row.
Here's an sqlfiddle with this query running your data.
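The "unique number per row, used as a grouping key only for N rows" trick can also be written with a standard window function instead of rownum or MySQL variables. A rough sketch, assuming Python's bundled SQLite (3.25+) as the engine; the table name t follows the MySQL answer and the function name grouped_counts is made up.

```python
import sqlite3

# Sample rows (merge1, merge2) from the question; None stands for null.
ROWS = (
    [("A", "Y")] * 3 + [("B", "Y")] * 3
    + [("C", "N")] * 3 + [("D", "N")] * 2
    + [("E", None)] * 2 + [("F", None)] * 2
    + [(None, None)]
)

# Number every row, then add that number to the grouping key only for 'N' rows,
# so each 'N' row lands in its own group while 'Y' and null rows collapse.
QUERY = """
WITH numbered AS (
    SELECT merge1, merge2, ROW_NUMBER() OVER () AS rn
    FROM t
)
SELECT merge1, merge2, COUNT(*) AS cnt
FROM numbered
GROUP BY merge1, merge2, CASE WHEN merge2 = 'N' THEN rn ELSE 0 END
ORDER BY merge1, merge2
"""


def grouped_counts(rows):
    """Return (merge1, merge2, count) with one group per 'N' row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (merge1 TEXT, merge2 TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in grouped_counts(ROWS):
        print(row)
```

SQLite sorts nulls first in ascending order, so the all-null row comes out at the top here, whereas the Oracle output above lists it last.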