Recursive ORDER BY - sql

I have a USERS table which is a membership matrix like the one below. The table is unique on ID, and each ID belongs to at least one group but could belong to all 3.
SELECT 1 AS ID, 0 AS IS_A, 0 AS IS_B, 1 AS IS_C FROM DUAL UNION ALL
SELECT 2,0,1,0 FROM DUAL UNION ALL
SELECT 3,0,1,1 FROM DUAL UNION ALL
SELECT 4,1,1,0 FROM DUAL UNION ALL
SELECT 5,1,1,0 FROM DUAL UNION ALL
SELECT 6,1,1,1 FROM DUAL UNION ALL
SELECT 7,0,1,1 FROM DUAL UNION ALL
SELECT 8,0,0,1 FROM DUAL UNION ALL
SELECT 9,1,0,0 FROM DUAL UNION ALL
SELECT 10,1,0,1 FROM DUAL UNION ALL
SELECT 11,0,0,1 FROM DUAL UNION ALL
SELECT 12,0,1,1 FROM DUAL
The final goal is to SELECT randomly a sample of at least 4 users from A, 3 from B and 5 from C (just an example) but with exactly 10 distinct IDs (otherwise the solution is trivial; just SELECT *).
The focus is less on determining whether this is possible at all and more on making a best effort to maximize memberships.
The output is expected to be unique on ID.
I can only think of a procedural way to achieve this:
Take the first ID with MAX(IS_A+IS_B+IS_C)
Check if the quotas are reached
If, for example, we already have 4 users from A, then we'll continue with the next ID with MAX(IS_B+IS_C), completely ignoring any further contributions from IS_A column
If we have already achieved all quotas, revert back to taking MAX(IS_A+IS_B+IS_C) to get "bonus" points
Stop upon reaching the overall maximum of 10
In essence, we prioritize and incrementally take the ID that has the most memberships in groups that have not reached the quota
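The procedure above can be sketched outside SQL as follows. This is only a minimal Python illustration of the greedy rule, not an Oracle solution: the function name `greedy_sample`, the dictionary layout, and the tie-breaking (total memberships first, then lowest ID) are assumptions made for the example, and the selection is deterministic rather than random.

```python
# Sample membership matrix from the question: ID -> group flags.
USERS = {
    1: {"A": 0, "B": 0, "C": 1},
    2: {"A": 0, "B": 1, "C": 0},
    3: {"A": 0, "B": 1, "C": 1},
    4: {"A": 1, "B": 1, "C": 0},
    5: {"A": 1, "B": 1, "C": 0},
    6: {"A": 1, "B": 1, "C": 1},
    7: {"A": 0, "B": 1, "C": 1},
    8: {"A": 0, "B": 0, "C": 1},
    9: {"A": 1, "B": 0, "C": 0},
    10: {"A": 1, "B": 0, "C": 1},
    11: {"A": 0, "B": 0, "C": 1},
    12: {"A": 0, "B": 1, "C": 1},
}


def greedy_sample(users, quotas, limit):
    """Pick up to `limit` IDs, always taking the ID with the most
    memberships in groups whose quota is not yet filled."""
    counts = {group: 0 for group in quotas}
    pool = dict(users)  # IDs still available
    chosen = []
    while pool and len(chosen) < limit:
        # Groups that still need members; empty once all quotas are met.
        open_groups = [g for g in quotas if counts[g] < quotas[g]]

        def score(uid):
            flags = pool[uid]
            # Primary: memberships in open groups.
            # Secondary (assumed tie-break): total memberships, which is
            # also the "bonus" rule once every quota is filled.
            return (sum(flags[g] for g in open_groups), sum(flags.values()))

        best = max(pool, key=score)  # first ID wins among equal scores
        for group, flag in pool.pop(best).items():
            counts[group] += flag
        chosen.append(best)
    return chosen, counts


chosen, counts = greedy_sample(USERS, {"A": 4, "B": 3, "C": 5}, limit=10)
print(chosen, counts)
```

Because each pick depends on the running counts of all earlier picks, the ordering key changes from row to row, which is exactly what a single window ORDER BY cannot express.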
However, I can't figure out how to do this in Oracle SQL since the ORDER BY would depend on not just the current row's values, but also recursively on whether the earlier rows have filled up the respective quotas.
I've tried ROWNUM, ROW_NUMBER(), SUM(IS_A) OVER (ORDER BY ...) and a recursive CTE, but to no avail. The best I have is:
WITH CTE AS (
SELECT ID, IS_A, IS_B, IS_C
, ROW_NUMBER() OVER (ORDER BY IS_A+IS_B+IS_C DESC) AS RN
FROM USERS
)
, CTE2 AS (
SELECT CTE.*
, GREATEST(4 - SUM(IS_A) OVER (ORDER BY RN), 0.001) AS QUOTA_A --clip negatives to 0.001
, GREATEST(3 - SUM(IS_B) OVER (ORDER BY RN), 0.001) AS QUOTA_B --so that when all quotas are exhausted,
, GREATEST(5 - SUM(IS_C) OVER (ORDER BY RN), 0.001) AS QUOTA_C --we still prioritize those that contribute most number of concurrent memberships
FROM CTE
)
SELECT ID FROM CTE2
ORDER BY QUOTA_A*IS_A + QUOTA_B*IS_B + QUOTA_C*IS_C DESC
FETCH NEXT 10 ROWS ONLY
but it does not work because QUOTA_A is computed based on ORDER BY RN instead of recursively.
Thanks in advance!

Related

Distribution of data in buckets - Oracle 11g

I have a table with two columns, BRANCH and ACTIVITIES, where BRANCH is the unique ID of a location and ACTIVITIES is the number of records belonging to that BRANCH. These records are to be distributed into 5 buckets so that all buckets contain an almost equal number of records (a difference of +/-1000 does not matter).
The challenge is that if one branch is placed in a bucket, all activities of that branch must go into the same bucket; in other words, the activities belonging to one BRANCH cannot be split. Let's take a very simple example so that I can explain what I am trying to achieve:
Total Branches=10
Total Number of activities (records) = 55,000
Average (total activities/total buckets) = 11,000
Sample Data
After Distribution
All buckets contain 11,000 records, but things are not so straightforward when we look at real data.
All Oracle query masters are requested to please look into this. Your expert opinion will be highly appreciated.
Unfortunately, this is a bin-packing problem and a "perfect" solution requires -- essentially -- searching through all possible assignments of buckets and then choosing the "best" one. And such an approach is not really suitable for SQL.
For a "good-enough" solution, though, something like a round-robin approach often works well. Simply enumerate the branches from biggest to smallest and assign them to buckets:
select a.branch,
1 + mod(seqnum, 5) as bucket
from (select a.branch, count(*) as cnt,
row_number() over (order by count(*) desc) as seqnum
from activities a
group by a.branch
) a;
Because of the ordering, this is going to generally create buckets of different sizes. So, a slight variation assigns the buckets as 1-2-3-4-5-5-4-3-2-1:
select a.branch,
(case when mod(seqnum, 10) in (0, 9) then 1
when mod(seqnum, 10) in (1, 8) then 2
when mod(seqnum, 10) in (2, 7) then 3
when mod(seqnum, 10) in (3, 6) then 4
when mod(seqnum, 10) in (4, 5) then 5
end) as bucket
from (select a.branch, count(*) as cnt,
row_number() over (order by count(*) desc) as seqnum
from activities a
group by a.branch
) a;
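To see why the back-and-forth order tends to balance better than plain round-robin, the two assignments can be compared on the ten sample branches (1,000 to 10,000 activities). This is a minimal Python sketch, not the answer's SQL; `assign_buckets` and `bucket_totals` are hypothetical helper names, and positions are counted from 0, so the first lap runs 1-2-3-4-5 and the second 5-4-3-2-1.

```python
def assign_buckets(sizes, n_buckets=5, snake=True):
    """Map each branch to a 1-based bucket, largest branch first.

    snake=False: plain round-robin (1-2-3-4-5-1-2-...).
    snake=True:  reverse direction on every second lap (1-2-3-4-5-5-4-...).
    """
    ordered = sorted(sizes, key=sizes.get, reverse=True)
    assignment = {}
    for pos, branch in enumerate(ordered):
        lap, offset = divmod(pos, n_buckets)
        if snake and lap % 2 == 1:
            offset = n_buckets - 1 - offset  # walk back down the buckets
        assignment[branch] = offset + 1
    return assignment


def bucket_totals(sizes, assignment, n_buckets=5):
    """Sum the activities that ended up in each bucket."""
    totals = [0] * n_buckets
    for branch, bucket in assignment.items():
        totals[bucket - 1] += sizes[branch]
    return totals


# Sample data: branch 1 has 1000 activities, ..., branch 10 has 10000.
sizes = {branch: branch * 1000 for branch in range(1, 11)}

round_robin = bucket_totals(sizes, assign_buckets(sizes, snake=False))
snake = bucket_totals(sizes, assign_buckets(sizes, snake=True))
print(round_robin)  # → [15000, 13000, 11000, 9000, 7000]
print(snake)        # → [11000, 11000, 11000, 11000, 11000]
```

The perfect balance here is a property of this evenly spaced sample; on real data the back-and-forth order is only a heuristic.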
You could also try below query. I added some stats columns in this inline view stats_cols_added_tab before I applied dense_rank analytic function to that inline view. Finally I used NTILE analytic function to get five groups.
with sample_data (branch, activities) as (
select 1, 1000 from dual union all
select 2, 2000 from dual union all
select 3, 3000 from dual union all
select 4, 4000 from dual union all
select 5, 5000 from dual union all
select 6, 6000 from dual union all
select 7, 7000 from dual union all
select 8, 8000 from dual union all
select 9, 9000 from dual union all
select 10, 10000 from dual
)
,
stats_cols_added_tab as (
select s.*
, count(*)over() total_branches
, sum(activities)over() total_number_of_activities
, avg(activities)over() * 2 Average
, case when row_number()over(order by s.branch) <= count(*)over() / 2 then 1 else 2 end grp
from sample_data s
)
SELECT BRANCH, ACTIVITIES, NTILE(5) OVER (ORDER BY ranked_grp, BRANCH) AS bucket
FROM (
select BRANCH, ACTIVITIES
, dense_rank()over(
PARTITION BY grp
order by decode(grp, 1, activities, -1 * activities)
) ranked_grp
from stats_cols_added_tab t
) t
order by ranked_grp, BRANCH
;

ORACLE SQL group based on values in a reference table

The Customer and Acct tables share a global SEQ_NO counter, which both of them increment.
Below is the customer table. SEQ_NO 1 is the beginning of one customer's data, and SEQ_NO 238 is the beginning of another customer's data.
The other table is the account table. All accounts whose SEQ_NO falls inside a customer's boundaries should get the same group (I want to group those accounts under the same customer so that I can use LISTAGG to concatenate the account IDs). For example, the accounts below from SEQ_NO 2 to SEQ_NO 224 (inclusive) should be assigned to the same group.
Is there a way to do this in SQL? The worst case I can think of is to define an Oracle type and do it in a function.
Any help is appreciated.
If I understand your question correctly, you want to be able to assign rows in the account table to groups, one per customer, so that you can then aggregate based on these groups.
So, the question is how to identify to which customer each account belongs, based on the sequence boundaries given in the first table ("customer") and the specific account numbers in the second table ("account").
This can be done in plain SQL, and relatively easily. You need a join between the accounts table and a subquery based on the customers table. The subquery must show the first and the last sequence number allocated to each client; to do that, you can use the lead analytic function. A bit of care must be taken regarding the last customer, for whom there is no upper limit for the sequence numbers.
You didn't provide test data in a usable format, so I created sample data in the with clause below (which is not part of the query - it's just there as a placeholder for test data).
with
customer (cust_id, seq_no) as (
select 101, 1 from dual union all
select 102, 34 from dual union all
select 200, 58 from dual union all
select 130, 90 from dual
)
, account (acct_id, seq_no) as (
select 1003, 3 from dual union all
select 1005, 11 from dual union all
select 1007, 33 from dual union all
select 1008, 60 from dual union all
select 1103, 77 from dual union all
select 1140, 92 from dual union all
select 1145, 99 from dual
)
select c.cust_id,
listagg(a.acct_id, ',') within group (order by a.acct_id) as acct_list
from (
select cust_id, seq_no as lower_no,
lead(seq_no) over (order by seq_no) - 1 as upper_no
from customer
) c
left outer join account a
on a.seq_no between c.lower_no and nvl(c.upper_no, a.seq_no)
group by c.cust_id
order by c.cust_id
;
OUTPUT
CUST_ID ACCT_LIST
------- --------------------
101 1003,1005,1007
102
130 1140,1145
200 1008,1103
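For readers who want to check the boundary logic independently of Oracle, here is a minimal Python sketch of the same lookup: each account goes to the customer with the greatest starting seq_no that is not above the account's seq_no, and the last customer has no upper bound. The function name `group_accounts` is hypothetical; the data is the test data from the query above.

```python
from bisect import bisect_right

# (cust_id, seq_no) and (acct_id, seq_no), copied from the WITH clause above.
customers = [(101, 1), (102, 34), (200, 58), (130, 90)]
accounts = [(1003, 3), (1005, 11), (1007, 33), (1008, 60),
            (1103, 77), (1140, 92), (1145, 99)]


def group_accounts(customers, accounts):
    """Return {cust_id: [acct_id, ...]} using the customers' seq_no
    values as lower boundaries of consecutive ranges."""
    ordered = sorted(customers, key=lambda c: c[1])
    starts = [seq_no for _, seq_no in ordered]
    groups = {cust_id: [] for cust_id, _ in ordered}
    for acct_id, seq_no in accounts:
        # Index of the last customer whose start is <= this seq_no.
        idx = bisect_right(starts, seq_no) - 1
        if idx >= 0:  # accounts before the first customer are skipped
            groups[ordered[idx][0]].append(acct_id)
    return groups


groups = group_accounts(customers, accounts)
for cust_id in sorted(groups):
    print(cust_id, ",".join(str(a) for a in sorted(groups[cust_id])))
```

Customers without accounts (102 here) keep an empty list, which corresponds to the left outer join in the SQL.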

Numbering sequential numbers in a column in SQL

I want to number each value based on the value in the row above it: when the sequence breaks, the numbering should start again from 1; otherwise it should keep incrementing.
The query is :
select'30300001' as lst union all
select'30300002' union all
select'30300003' union all
select'30300004' union all
select'30300001' union all
select'30300006' union all
select'30300007' union all
select'30300008' union all
select'30300009'
And the output I want to be as:
select'30300001' as lst,1 as rnk union all
select'30300002',2 union all
select'30300003',3 union all
select'30300004',4 union all
select'30300001',1 union all
select'30300006',1 union all
select'30300007',2 union all
select'30300008',3 union all
select'30300009',4
I tried it with row_number and rank functions but could not get the required output. How can I get the desired result?
If you have an orderingid (of some sort), then you can use the difference between your column and a column with a sequence. This identifies a group, which can then be used with other window functions:
select q.*,
row_number() over (partition by grp order by orderingid)
from (select q.*,
(lst - row_number() over (order by orderingid)) as grp
from query q
) q;
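Note: this assumes that lst is actually a number or a value readily converted to a number. It also assumes that two separate runs never produce the same difference; in the sample data, 30300006 in position 6 and 30300001 in position 1 both give 30300000 and would land in the same group. A safer grouping starts a new group whenever lst is not exactly one more than the previous row's lst, for example with lag() and a running sum of the break flags.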
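The numbering rule itself can be sketched in a few lines of Python: restart at 1 whenever a value is not exactly one more than the value before it. This illustrates the expected output only and is not a replacement for the SQL; `number_runs` is a hypothetical name and the input is assumed to be already in the desired row order.

```python
def number_runs(values):
    """Number each value within its run of consecutive integers."""
    ranks = []
    previous = None
    for value in values:
        if previous is not None and value == previous + 1:
            ranks.append(ranks[-1] + 1)  # sequence continues
        else:
            ranks.append(1)              # first row, or the sequence broke
        previous = value
    return ranks


lst = [30300001, 30300002, 30300003, 30300004, 30300001,
       30300006, 30300007, 30300008, 30300009]
print(number_runs(lst))  # → [1, 2, 3, 4, 1, 1, 2, 3, 4]
```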

In oracle How can I Find out one/two Columns data which corresponding other columns have maximum value

I'm using Oracle.
I have a table (FE_IMPORT_LC); a few of its columns, with data, are shown below:
TRANSMIT_LC_NO  LIAB_AMT_LCY  REM_LC_AMT_LCY  IMP_AMEND_NO
108615020048    10022000      10022112        00
108615020048    10022000      10022112        01
108615020048    10022000      10022112        02
108615020048    11692000      8351760         03
I want to fetch the data of the red-marked row, i.e. the one whose IMP_AMEND_NO value is the maximum. In other words, I want one or two column values from the row in which another column has its maximum value.
So, I already create following query:
SELECT l1.liab_amt_lcy
FROM fe_import_lc l1
WHERE l1.transmit_lc_no = '108615020048'
AND l1.imp_amend_no = (SELECT MAX(l2.imp_amend_no)
FROM fe_import_lc l2
WHERE l2.transmit_lc_no = l1.transmit_lc_no)
But I want a more efficient query for this. If anyone knows one, please reply as early as possible.
Try:
select liab_amt_lcy
from (
SELECT l1.liab_amt_lcy, imp_amend_no
FROM fe_import_lc l1
WHERE l1.transmit_lc_no = '108615020048'
order by imp_amend_no desc
)
where rownum < 2
Try something like below, where l1 would be your FE_IMPORT_LC table. Better to create a view with the logic of l2 table given below and then select.
with l1(TRANSMIT_LC_NO, LIAB_AMT_LCY, REM_LC_AMT_LCY, IMP_AMEND_NO) as(
select 108615020048,10022000,10022112,00 from dual union
select 108615020048,10022000,10022112,01 from dual union
select 108615020048,10022000,10022112,02 from dual union
select 108615020048,10022000,10022112,03 from dual
), l2 as(
select l1.*,row_number() over (partition by TRANSMIT_LC_NO order by IMP_AMEND_NO desc) as rno from l1)
select TRANSMIT_LC_NO, LIAB_AMT_LCY,REM_LC_AMT_LCY,IMP_AMEND_NO from l2
where rno=1;
If 2 rows have the same max(IMP_AMEND_NO) and you want both, use the query below (I am using rank instead of row_number here; the rest is the same).
with l1(TRANSMIT_LC_NO, LIAB_AMT_LCY, REM_LC_AMT_LCY, IMP_AMEND_NO) as(
select 108615020048,10022000,10022112,00 from dual union all
select 108615020048,10022000,10022112,01 from dual union all
select 108615020048,10022000,10022112,03 from dual union all
select 108615020048,10022000,10022112,03 from dual
), l2 as(
select l1.*,rank() over (partition by TRANSMIT_LC_NO order by IMP_AMEND_NO desc) as rno from l1)
select TRANSMIT_LC_NO, LIAB_AMT_LCY,REM_LC_AMT_LCY,IMP_AMEND_NO from l2
where rno=1;
Here you don't have to specify TRANSMIT_LC_NO explicitly. Even if you have many records, you still get only the row corresponding to max(IMP_AMEND_NO) for each TRANSMIT_LC_NO. But if you want to use this in a PL/SQL block, put the TRANSMIT_LC_NO in the where clause of the select from FE_IMPORT_LC.
You can try this. I don't currently have an environment to test it for syntax errors, but I think it should work with little modification:
select * from
(
select TRANSMIT_LC_NO, LIAB_AMT_LCY, REM_LC_AMT_LCY, IMP_AMEND_NO,
row_number() over(partition by transmit_lc_no order by imp_amend_no desc) as MAX_ID
from fe_import_lc
)
t where t.MAX_ID=1
and T.TRANSMIT_LC_NO = '108615020048';
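The practical difference between the ROW_NUMBER and RANK variants shown in the answers (one row per TRANSMIT_LC_NO versus all rows tied for the maximum) can be illustrated with a small Python sketch. The function name `latest_amendments` and the dictionary layout are assumptions made for the example; where ROW_NUMBER picks an arbitrary row among ties, this sketch simply keeps the first one in input order.

```python
from collections import defaultdict


def latest_amendments(rows, keep_ties=False):
    """Return the row(s) with the highest imp_amend_no per transmit_lc_no.

    keep_ties=False mimics ROW_NUMBER() = 1 (one row per LC number);
    keep_ties=True mimics RANK() = 1 (every row tied for the maximum).
    """
    by_lc = defaultdict(list)
    for row in rows:
        by_lc[row["transmit_lc_no"]].append(row)

    result = []
    for lc_rows in by_lc.values():
        top = max(r["imp_amend_no"] for r in lc_rows)
        winners = [r for r in lc_rows if r["imp_amend_no"] == top]
        result.extend(winners if keep_ties else winners[:1])
    return result


# Same shape as the RANK example above: two rows share amendment 3.
rows = [
    {"transmit_lc_no": "108615020048", "liab_amt_lcy": 10022000, "imp_amend_no": 0},
    {"transmit_lc_no": "108615020048", "liab_amt_lcy": 10022000, "imp_amend_no": 1},
    {"transmit_lc_no": "108615020048", "liab_amt_lcy": 10022000, "imp_amend_no": 3},
    {"transmit_lc_no": "108615020048", "liab_amt_lcy": 10022000, "imp_amend_no": 3},
]
print(len(latest_amendments(rows)))                  # → 1
print(len(latest_amendments(rows, keep_ties=True)))  # → 2
```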

How can i get rid of 'ORA-01489: result of string concatenation is too long' in this query?

This query gets the dominating sets in a network. So, for example, given a network
A<----->B
B<----->C
B<----->D
C<----->E
D<----->C
D<----->E
F<----->E
it returns
B,E
B,F
A,E
But it doesn't work for large data because I'm using string methods to build the result. I have been trying to remove the string methods and return a view or something similar, but to no avail.
With t as (select 'A' as per1, 'B' as per2 from dual union all
select 'B','C' from dual union all
select 'B','D' from dual union all
select 'C','B' from dual union all
select 'C','E' from dual union all
select 'D','C' from dual union all
select 'D','E' from dual union all
select 'E','C' from dual union all
select 'E','D' from dual union all
select 'F','E' from dual)
,t2 as (select distinct least(per1, per2) as per1, greatest(per1, per2) as per2 from t union
select distinct greatest(per1, per2) as per1, least(per1, per2) as per2 from t)
,t3 as (select per1, per2, row_number() over (partition by per1 order by per2) as rn from t2)
,people as (select per, row_number() over (order by per) rn
from (select distinct per1 as per from t union
select distinct per2 from t)
)
,comb as (select sys_connect_by_path(per,',')||',' as p
from people
connect by rn > prior rn
)
,find as (select p, per2, count(*) over (partition by p) as cnt
from (
select distinct comb.p, t3.per2
from comb, t3
where instr(comb.p, ','||t3.per1||',') > 0 or instr(comb.p, ','||t3.per2||',') > 0
)
)
,rnk as (select p, rank() over (order by length(p)) as rnk
from find
where cnt = (select count(*) from people)
order by rnk
) select distinct trim(',' from p) as p from rnk where rnk.rnk = 1
One of Oracle's limits is that, by default, SQL cannot handle a VARCHAR2 bigger than 4000 bytes. If you attempt to return a string exceeding this size it hurls ORA-01489. Ideally you should try to break down the resultset into multiple small rows. Alternatively you could return it as a CLOB.
edit
how i can return the above as a CLOB
Hmm...
Having looked closely at your code I think the only place which is going to hurl ORA-01489 is this line:
select sys_connect_by_path(per,',')||',' as p
from people
It would be easy to wrap that call in TO_CLOB(). Unfortunately turning P into a CLOB breaks some of the subsequent processing (the DISTINCT on p and the PARTITION BY p), so it probably isn't an option. Sorry.
As for other workarounds....
Does your site have a license for Oracle Spatial? I know not many sites do, but if yours is one of the lucky ones (and you're using 10gR2 or higher) then you should check out the Oracle Spatial Network Data Model.
Otherwise, if there is no way to restrict the output of the sys_connect_by_path() call, you might just have to implement this in PL/SQL. You can use a PIPELINED FUNCTION to return the final output so you can still call it from a SELECT statement.
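To show what a procedural version might look like without any string concatenation, here is a minimal Python sketch that works on sets instead of comma-separated paths. It is not PL/SQL and not the asker's query; `minimum_dominating_sets` is a hypothetical name, and the search is still exhaustive (exponential in the number of nodes), so it only removes the string-length limit, not the combinatorial cost.

```python
from itertools import combinations


def minimum_dominating_sets(edges):
    """Return all smallest node sets such that every node is either in
    the set or adjacent to a member of it. Edges are undirected."""
    # Closed neighbourhood: each node together with its neighbours.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, {a}).add(b)
        neighbours.setdefault(b, {b}).add(a)

    nodes = sorted(neighbours)
    everyone = set(nodes)
    for size in range(1, len(nodes) + 1):
        found = [
            combo
            for combo in combinations(nodes, size)
            if set().union(*(neighbours[n] for n in combo)) == everyone
        ]
        if found:  # stop at the first size that yields a solution
            return found
    return []


# The per1/per2 pairs from the question's sample data.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "B"), ("C", "E"),
         ("D", "C"), ("D", "E"), ("E", "C"), ("E", "D"), ("F", "E")]
print(minimum_dominating_sets(edges))
# → [('A', 'E'), ('B', 'E'), ('B', 'F')]
```

A pipelined function could emit each member of each set as its own row (for example a set number plus a node), which avoids building long strings in the result.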
In my experience you do not want to do complex string handling in large, complicated queries, and this query is quite complicated. I would guess that this problem very well could benefit from a rethink and a different approach rather than an optimization of the existing query.
What do the underlying tables look like, and what exactly are you trying to achieve? Is it possible to alter the data model?