Quartile in Oracle SQL

I have a table, table1:
member
10010
10020
10030
10040
10050
10060
10070
10080
10090
10100
I want to divide the 10 rows into 4 buckets. I did the following:
select a.*, NTILE(4) over(order by member) as segment1 from table1 a
order by member;
But this distributes the rows almost evenly across the 4 buckets (3, 3, 2 and 2 rows).
I would like the 4 buckets to decrease in size: the first should hold 40% of the rows, then 30%, then 20%, then 10%.
The output should be:
member segment1
10010 1
10020 1
10030 1
10040 1
10050 2
10060 2
10070 2
10080 3
10090 3
10100 4
How can I achieve it using Oracle SQL?

One simple approach is to divide ROW_NUMBER by the total row count (COUNT(*) OVER ()) and map the resulting fraction to a bucket with a CASE expression:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (ORDER BY member) / COUNT(*) OVER () rn
FROM table1 t
)
SELECT
member,
CASE WHEN rn <= 0.4 THEN 1
WHEN rn <= 0.7 THEN 2
WHEN rn <= 0.9 THEN 3
ELSE 4 END AS segment1
FROM cte
ORDER BY
member;
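To check the cumulative-fraction bucketing outside Oracle, here is a minimal sketch that runs the same idea in SQLite from Python (SQLite 3.25 or later is assumed for window functions). One deliberate change: the fraction test is rewritten as integer arithmetic (rn * 10 <= 4 * cnt). SQLite divides two integers with integer division, so ROW_NUMBER() / COUNT(*) would be 0 for every row except the last. The helper name segment_rows is hypothetical.

```python
import sqlite3


def segment_rows(members):
    """Hypothetical helper: assign 40/30/20/10 percent buckets by row position."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (member INTEGER)")
    conn.executemany("INSERT INTO table1 (member) VALUES (?)", [(m,) for m in members])
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT member,
                   ROW_NUMBER() OVER (ORDER BY member) AS rn,
                   COUNT(*) OVER () AS cnt
            FROM table1
        )
        SELECT member,
               CASE WHEN rn * 10 <= 4 * cnt THEN 1  -- first 40% of rows
                    WHEN rn * 10 <= 7 * cnt THEN 2  -- next 30%
                    WHEN rn * 10 <= 9 * cnt THEN 3  -- next 20%
                    ELSE 4                          -- last 10%
               END AS segment1
        FROM cte
        ORDER BY member
        """
    ).fetchall()
    conn.close()
    return rows


# The ten members from the question: 10010, 10020, ..., 10100
result = segment_rows(range(10010, 10101, 10))
for member, segment in result:
    print(member, segment)
```

In Oracle the original division is fine as written, since NUMBER arithmetic does not truncate; the integer form only matters in engines that do.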

I would use row_number() and count() for this:
select a.*,
(case when row_number() over (order by member) <= 0.4 * count(*) over () then 1
when row_number() over (order by member) <= 0.7 * count(*) over () then 2
when row_number() over (order by member) <= 0.9 * count(*) over () then 3
else 4
end) as segment
from table1 a
order by member;
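Of course, you can also use ntile(10). This gives exactly 40/30/20/10 only when the row count is a multiple of 10, because NTILE hands the leftover rows to the lowest-numbered tiles: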
select a.*,
(case when ntile(10) over (order by member) <= 4 then 1
when ntile(10) over (order by member) <= 7 then 2
when ntile(10) over (order by member) <= 9 then 3
else 4
end) as segment
from table1 a
order by member;
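The two answers agree on the 10-row example but round differently when the row count is not a multiple of 10. This sketch, again run in SQLite through Python (an assumption; the answers target Oracle), computes both CASE mappings side by side and reports the bucket sizes. The row_number branch uses integer multiplication instead of 0.4 * count(*) to avoid SQLite's integer/float quirks; bucket_counts is a hypothetical helper.

```python
import sqlite3
from collections import Counter


def bucket_counts(n_rows):
    """Hypothetical helper: bucket sizes from the ntile(10) and row_number mappings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (member INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(n_rows)])
    rows = conn.execute(
        """
        SELECT CASE WHEN ntile(10) OVER (ORDER BY member) <= 4 THEN 1
                    WHEN ntile(10) OVER (ORDER BY member) <= 7 THEN 2
                    WHEN ntile(10) OVER (ORDER BY member) <= 9 THEN 3
                    ELSE 4 END AS by_ntile,
               CASE WHEN row_number() OVER (ORDER BY member) * 10 <= 4 * count(*) OVER () THEN 1
                    WHEN row_number() OVER (ORDER BY member) * 10 <= 7 * count(*) OVER () THEN 2
                    WHEN row_number() OVER (ORDER BY member) * 10 <= 9 * count(*) OVER () THEN 3
                    ELSE 4 END AS by_row_number
        FROM table1
        """
    ).fetchall()
    conn.close()
    ntile_sizes = Counter(r[0] for r in rows)
    rn_sizes = Counter(r[1] for r in rows)
    # Sizes of buckets 1..4 for each mapping
    return [ntile_sizes[b] for b in range(1, 5)], [rn_sizes[b] for b in range(1, 5)]


print(bucket_counts(10))  # ([4, 3, 2, 1], [4, 3, 2, 1])
print(bucket_counts(12))  # ([6, 3, 2, 1], [4, 4, 2, 2])
```

With 12 rows, ntile(10) puts two rows in each of tiles 1 and 2, so bucket 1 grows to 6 rows, while the row_number version rounds each cumulative boundary (4.8, 8.4, 10.8) down. Neither is exact; which rounding is preferable depends on the use case.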

Related

Assign weighted value to rows using SQL

I have a table of customers, and I want to assign each customer to a test group based on weighted values.
Example:
Group 1 - 50%
Group 2 - 25%
Group 3 - 20%
Group 4 - 5%
Result:
customer_id group
1 group 1
2 group 4
3 group 1
4 group 2
5 group 1
6 group 1
7 group 2
8 group 1
9 group 3
10 group 1
If you shuffle the rows randomly, you can then split them based on the cumulative frequencies. This works neatly when the fractions divide your row count evenly; otherwise the group sizes get rounded, so I'm not sure it will meet the most general case you have.
with data as (
select *,
count(*) over () * 1.0 as cnt,
row_number() over (order by random()) * 1.0 as rn
from T
)
select customer_id,
case when rn / cnt <= 0.50 then 'Group 1'
when rn / cnt <= 0.75 then 'Group 2'
when rn / cnt <= 0.95 then 'Group 3'
when rn / cnt <= 1.00 then 'Group 4'
end as grp
from data;
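As a minimal sketch of the shuffle-then-split idea, the query below runs in SQLite via Python (an assumption; the answer's random() also exists in PostgreSQL). The rn / cnt comparisons are rewritten as integer arithmetic, and the final branch becomes ELSE, which is equivalent because rn / cnt is never above 1. The helper assign_groups is hypothetical. For 20 customers the group sizes are always 10, 5, 4 and 1; only which customers land in each group changes between runs.

```python
import sqlite3
from collections import Counter


def assign_groups(n_customers):
    """Hypothetical helper: randomly assign customers to 50/25/20/5 percent groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (customer_id INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?)", [(i,) for i in range(1, n_customers + 1)])
    rows = conn.execute(
        """
        WITH data AS (
            SELECT customer_id,
                   count(*) OVER () AS cnt,
                   row_number() OVER (ORDER BY random()) AS rn  -- random shuffle
            FROM T
        )
        SELECT customer_id,
               CASE WHEN rn * 100 <= 50 * cnt THEN 'Group 1'  -- cumulative 50%
                    WHEN rn * 100 <= 75 * cnt THEN 'Group 2'  -- cumulative 75%
                    WHEN rn * 100 <= 95 * cnt THEN 'Group 3'  -- cumulative 95%
                    ELSE 'Group 4'
               END AS grp
        FROM data
        """
    ).fetchall()
    conn.close()
    return dict(rows)  # customer_id -> group label


groups = assign_groups(20)
print(Counter(groups.values()))
```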

How to rank groups of data?

Given the following, and tasked with ranking the raw data by the SUM(volume) within each group:
group_id volume
1 2
1 3
2 5
3 1
3 3
How can I obtain the following?
group_id volume group_volume rank
1 2 5 1
1 3 5 1
2 5 5 2
3 1 4 3
3 3 4 3
I can get group_volume easily, but I am struggling with how to break the ties in rank without grouping and ranking in a separate subquery and then joining it back in.
SELECT *
, SUM(volume) OVER (PARTITION BY group_id) AS grouped_volume
, ... AS rank
FROM groups
Use a CTE and DENSE_RANK():
WITH CTE1 AS (SELECT group_id, volume,
sum(volume) over(partition by group_id) group_volume
from table1)
SELECT A.*, dense_rank() over (order by group_volume desc, group_id) rank FROM CTE1 A;
Use two levels of window functions for this:
select g.*,
dense_rank() over (order by group_volume desc, group_id) as rank
from (select g.*,
sum(volume) over (partition by group_id) as group_volume
from groups g
) g;
There is no need for a JOIN.
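Here is a quick check of the two-level window approach against the question's data, run in SQLite from Python (an assumption; SQLite 3.25+ for window functions). The table is named grp_data and the rank column grp_rank, both hypothetical, to avoid any clash with the GROUPS and RANK keywords in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# grp_data stands in for the question's "groups" table
conn.execute("CREATE TABLE grp_data (group_id INTEGER, volume INTEGER)")
conn.executemany("INSERT INTO grp_data VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 5), (3, 1), (3, 3)])
ranked = conn.execute(
    """
    SELECT g.group_id, g.volume, g.group_volume,
           -- outer level: rank groups by total volume, group_id breaks ties
           dense_rank() OVER (ORDER BY group_volume DESC, group_id) AS grp_rank
    FROM (SELECT g.*,
                 -- inner level: total volume of each row's group
                 sum(volume) OVER (PARTITION BY group_id) AS group_volume
          FROM grp_data g) g
    ORDER BY g.group_id, g.volume
    """
).fetchall()
conn.close()
for row in ranked:
    print(row)
```

Groups 1 and 2 both total 5, so group_id is what separates their ranks; dense_rank keeps the ranks consecutive (1, 2, 3) rather than skipping numbers.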

How to select top 2 values for each id

I have a table with the following values:
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
1 1 "2015-01-01"
1 1 "2015-01-01"
2 7 "2015-01-05"
2 6 "2015-01-04"
2 4 "2015-01-03"
3 11 "2015-01-08"
3 10 "2015-01-07"
3 9 "2015-01-06"
3 8 "2015-01-05"
I want to select the top two values for each id, as shown in the desired output.
Desired output:
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
2 7 "2015-01-05"
2 6 "2015-01-04"
3 11 "2015-01-08"
3 10 "2015-01-07"
My attempt:
select transactions.salesperson_id, transactions.id, transactions.date
from transactions
ORDER BY transactions.salesperson_id ASC, transactions.date DESC;
Can someone help me with this? Thank you in advance!
This can be done using window functions:
select id, sales, "date"
from (
select id, sales, "date",
dense_rank() over (partition by id order by "date" desc) as rnk
from transactions
) t
where rnk <= 2;
If there are multiple rows on the same date, this might return more than two rows for the same id. If you don't want that, use row_number() instead of dense_rank().
row_number() will get what you want.
select id, sales, "date" from
(select id, sales, "date", row_number() over (partition by id order by "date" desc) as rn from transactions) t1
where t1.rn <= 2;
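A minimal check of the row_number() version against the question's sample data, run in SQLite from Python (an assumption; any engine with window functions should behave the same). The dates are stored as ISO-8601 text, so sorting them as strings is also chronological. Note the descending order: without it, the query returns the two earliest rows per id instead of the latest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE transactions (id INTEGER, sales INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, 5, "2015-01-04"), (1, 3, "2015-01-03"), (1, 1, "2015-01-01"), (1, 1, "2015-01-01"),
    (2, 7, "2015-01-05"), (2, 6, "2015-01-04"), (2, 4, "2015-01-03"),
    (3, 11, "2015-01-08"), (3, 10, "2015-01-07"), (3, 9, "2015-01-06"), (3, 8, "2015-01-05"),
])
top_two = conn.execute(
    """
    SELECT id, sales, "date"
    FROM (SELECT id, sales, "date",
                 -- number rows per id, newest first
                 row_number() OVER (PARTITION BY id ORDER BY "date" DESC) AS rn
          FROM transactions) t
    WHERE rn <= 2
    ORDER BY id, "date" DESC
    """
).fetchall()
conn.close()
for row in top_two:
    print(row)
```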

Get MAX value of each record in group by query

I have a SQL query that looks like this:
select fldCustomer, fldTerminal, COUNT(fldbill)
from tblDataBills
group by fldCustomer, fldTerminal
order by fldCustomer
The results look like this:
fldCustomer fldTerminal (number of bills)
0 1 19086
0 2 10
0 5 236
1 1 472
1 5 3
1 500 19
2 1 292
2 500 22
How can I get the row with the maximum count for each customer, so that I get results like this:
0 1 19086
1 1 472
2 1 292
Thanks in advance!
Use a subquery with row_number():
select fldCustomer, fldTerminal, cnt
from (select fldCustomer, fldTerminal, COUNT(*) as cnt,
row_number() over (partition by fldCustomer order by count(*) desc) as seqnum
from tblDataBills
group by fldCustomer, fldTerminal
) db
where seqnum = 1
order by fldCustomer ;
Note that in the event of ties, this will arbitrarily return one of the rows. If you want all of them, then use rank() or dense_rank().
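The row_number() query can be tried in SQLite from Python (an assumption; SQLite evaluates window functions after GROUP BY, so ORDER BY COUNT(*) inside OVER works there too). The bill rows below are hypothetical and much smaller than the question's counts, but they have the same shape: each customer has one terminal with clearly more bills than the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDataBills (fldCustomer INTEGER, fldTerminal INTEGER, fldbill INTEGER)")
# Hypothetical data: (customer, terminal) pairs repeated to give small per-terminal counts
bills = [(0, 1)] * 3 + [(0, 2)] + [(1, 1)] * 2 + [(1, 5)] + [(2, 1)] * 2 + [(2, 500)]
conn.executemany("INSERT INTO tblDataBills VALUES (?, ?, ?)",
                 [(c, t, n) for n, (c, t) in enumerate(bills)])
busiest = conn.execute(
    """
    SELECT fldCustomer, fldTerminal, cnt
    FROM (SELECT fldCustomer, fldTerminal, COUNT(*) AS cnt,
                 -- 1 = terminal with the most bills for this customer
                 row_number() OVER (PARTITION BY fldCustomer ORDER BY COUNT(*) DESC) AS seqnum
          FROM tblDataBills
          GROUP BY fldCustomer, fldTerminal) db
    WHERE seqnum = 1
    ORDER BY fldCustomer
    """
).fetchall()
conn.close()
print(busiest)  # [(0, 1, 3), (1, 1, 2), (2, 1, 2)]
```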
This can be done with a little trickery using the RANK() function, which also keeps tied rows:
SELECT fldCustomer, fldTerminal, [(number of bills)]
FROM (
SELECT fldCustomer, fldTerminal, COUNT(fldbill) [(number of bills)],
RANK() OVER (PARTITION BY fldCustomer ORDER BY COUNT(fldbill) DESC) Ranking
FROM tblDataBills
GROUP BY fldCustomer, fldTerminal
) a
WHERE Ranking = 1
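To see the tie behaviour, here is the RANK() version run in SQLite from Python (an assumption). The bracketed T-SQL alias [(number of bills)] is replaced by a plain bill_count alias, and the data is hypothetical: customer 2 has the same number of bills on terminals 1 and 500, so both rows come back with rank 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDataBills (fldCustomer INTEGER, fldTerminal INTEGER, fldbill INTEGER)")
# Hypothetical data: customer 2 is tied (2 bills on terminal 1, 2 on terminal 500)
bills = [(0, 1)] * 3 + [(0, 2)] + [(2, 1)] * 2 + [(2, 500)] * 2
conn.executemany("INSERT INTO tblDataBills VALUES (?, ?, ?)",
                 [(c, t, n) for n, (c, t) in enumerate(bills)])
top = conn.execute(
    """
    SELECT fldCustomer, fldTerminal, bill_count
    FROM (SELECT fldCustomer, fldTerminal, COUNT(fldbill) AS bill_count,
                 -- tied counts share rank 1
                 RANK() OVER (PARTITION BY fldCustomer ORDER BY COUNT(fldbill) DESC) AS ranking
          FROM tblDataBills
          GROUP BY fldCustomer, fldTerminal) a
    WHERE ranking = 1
    ORDER BY fldCustomer, fldTerminal
    """
).fetchall()
conn.close()
print(top)  # [(0, 1, 3), (2, 1, 2), (2, 500, 2)]
```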

Select and aggregate last records based on order

I have different versions of the charges in a table. I want to take the last charge for each Type and sum them.
So I want to add 9.87, 9.63 and 1.65.
I want the query to return the Parent ID and sum(9.87 + 9.63 + 1.65).
We use MSSQL (SQL Server).
ID ORDER CHARGES TYPE PARENT ID
1 1 6.45 1 1
2 2 1.25 1 1
3 3 9.87 1 1
4 1 6.54 2 1
5 2 5.64 2 1
6 3 0.84 2 1
7 4 9.63 2 1
8 1 7.33 3 1
9 2 5.65 3 1
10 3 8.65 3 1
11 4 5.14 3 1
12 5 1.65 3 1
WITH recordsList
AS
(
SELECT Type, Charges,
ROW_NUMBER() OVER (PARTITION BY TYPE
ORDER BY [ORDER] DESC) rn
FROM tableName
)
SELECT SUM(Charges) totalCharge
FROM recordsList
WHERE rn = 1
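Here is a sketch of the ROW_NUMBER() approach using the question's data, run in SQLite from Python rather than SQL Server (an assumption). Two small extensions, not in the answer: the partition also includes PARENT_ID and the result is grouped by it, since the question asks for the Parent ID next to the total; and ROUND(..., 2) hides floating-point noise from the REAL column. "ORDER" must be quoted because it is a reserved word; tableName is the answer's placeholder.

```python
import sqlite3

ROWS = [  # (ID, ORDER, CHARGES, TYPE, PARENT_ID) from the question's table
    (1, 1, 6.45, 1, 1), (2, 2, 1.25, 1, 1), (3, 3, 9.87, 1, 1),
    (4, 1, 6.54, 2, 1), (5, 2, 5.64, 2, 1), (6, 3, 0.84, 2, 1), (7, 4, 9.63, 2, 1),
    (8, 1, 7.33, 3, 1), (9, 2, 5.65, 3, 1), (10, 3, 8.65, 3, 1), (11, 4, 5.14, 3, 1),
    (12, 5, 1.65, 3, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tableName (ID INTEGER, "ORDER" INTEGER, CHARGES REAL, '
             '"TYPE" INTEGER, PARENT_ID INTEGER)')
conn.executemany("INSERT INTO tableName VALUES (?, ?, ?, ?, ?)", ROWS)
totals = conn.execute(
    """
    WITH recordsList AS (
        SELECT PARENT_ID, CHARGES,
               -- rn = 1 marks the latest version within each (PARENT_ID, TYPE)
               ROW_NUMBER() OVER (PARTITION BY PARENT_ID, "TYPE" ORDER BY "ORDER" DESC) AS rn
        FROM tableName
    )
    SELECT PARENT_ID, ROUND(SUM(CHARGES), 2) AS totalCharge
    FROM recordsList
    WHERE rn = 1
    GROUP BY PARENT_ID
    """
).fetchall()
conn.close()
print(totals)
```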
Use row_number() to identify the rows to be summed, and then sum them:
select SUM(charges)
from (select t.*,
ROW_NUMBER() over (PARTITION by type order by id desc) as seqnum
from t
) t
where seqnum = 1
Alternatively you could use a window aggregate MAX():
SELECT SUM(Charges)
FROM (
SELECT
[ORDER],
Charges,
MaxOrder = MAX([ORDER]) OVER (PARTITION BY [TYPE])
FROM atable
) s
WHERE [ORDER] = MaxOrder
;
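The window MAX() variant can be checked the same way. This sketch runs it in SQLite from Python (an assumption), so the T-SQL alias form MaxOrder = MAX(...) is written as MAX(...) AS MaxOrder, and the bracketed [ORDER]/[TYPE] identifiers become double-quoted ones. The table name atable is the answer's placeholder.

```python
import sqlite3

charges = [  # (ORDER, Charges, TYPE) from the question's table
    (1, 6.45, 1), (2, 1.25, 1), (3, 9.87, 1),
    (1, 6.54, 2), (2, 5.64, 2), (3, 0.84, 2), (4, 9.63, 2),
    (1, 7.33, 3), (2, 5.65, 3), (3, 8.65, 3), (4, 5.14, 3), (5, 1.65, 3),
]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE atable ("ORDER" INTEGER, Charges REAL, "TYPE" INTEGER)')
conn.executemany("INSERT INTO atable VALUES (?, ?, ?)", charges)
(total,) = conn.execute(
    """
    SELECT SUM(Charges)
    FROM (SELECT "ORDER", Charges,
                 -- highest ORDER within each TYPE, repeated on every row
                 MAX("ORDER") OVER (PARTITION BY "TYPE") AS MaxOrder
          FROM atable) s
    WHERE "ORDER" = MaxOrder
    """
).fetchone()
conn.close()
print(round(total, 2))  # 21.15
```

Unlike ROW_NUMBER(), this keeps every row that shares the maximum ORDER within a TYPE, so duplicate ORDER values would be summed together.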
SELECT t.PARENT_ID, SUM(t.CHARGES)
FROM dbo.test73 t
WHERE EXISTS (
SELECT 1
FROM dbo.test73
WHERE [TYPE] = t.[TYPE]
HAVING MAX([ORDER]) = t.[ORDER]
)
GROUP BY t.PARENT_ID
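The EXISTS/HAVING answer relies on HAVING without GROUP BY, which older SQLite versions reject. So this sketch, run in SQLite from Python (an assumption), swaps in an equivalent correlated scalar subquery. It keeps a row only if its ORDER equals the maximum ORDER for its TYPE, then groups by PARENT_ID as the answer does. The table name test73 comes from the answer; everything else mirrors the question's data.

```python
import sqlite3

ROWS = [  # (ID, ORDER, CHARGES, TYPE, PARENT_ID) from the question's table
    (1, 1, 6.45, 1, 1), (2, 2, 1.25, 1, 1), (3, 3, 9.87, 1, 1),
    (4, 1, 6.54, 2, 1), (5, 2, 5.64, 2, 1), (6, 3, 0.84, 2, 1), (7, 4, 9.63, 2, 1),
    (8, 1, 7.33, 3, 1), (9, 2, 5.65, 3, 1), (10, 3, 8.65, 3, 1), (11, 4, 5.14, 3, 1),
    (12, 5, 1.65, 3, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE test73 (ID INTEGER, "ORDER" INTEGER, CHARGES REAL, '
             '"TYPE" INTEGER, PARENT_ID INTEGER)')
conn.executemany("INSERT INTO test73 VALUES (?, ?, ?, ?, ?)", ROWS)
parent_totals = conn.execute(
    """
    SELECT t.PARENT_ID, SUM(t.CHARGES) AS total
    FROM test73 t
    -- keep a row only if its ORDER is the highest within its TYPE
    WHERE t."ORDER" = (SELECT MAX(m."ORDER") FROM test73 m WHERE m."TYPE" = t."TYPE")
    GROUP BY t.PARENT_ID
    """
).fetchall()
conn.close()
print(parent_totals)
```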