Assign weighted value to rows using SQL

I have a table of customers, and I want to assign each customer to a test group based on weighted values.
Example:
Group 1 - 50%
Group 2 - 25%
Group 3 - 20%
Group 4 - 5%
Result:
customer_id group
1 group 1
2 group 4
3 group 1
4 group 2
5 group 1
6 group 1
7 group 2
8 group 1
9 group 3
10 group 1

If you shuffle the rows randomly, you can then split them based on the cumulative frequencies. This works cleanly when the fractions divide the row count evenly, but I'm not sure it will handle the most general case you have.
with data as (
select *,
count(*) over () * 1.0 as cnt,
row_number() over (order by random()) * 1.0 as rn
from T
)
select customer_id,
case when rn / cnt <= 0.50 then 'Group 1'
when rn / cnt <= 0.75 then 'Group 2'
when rn / cnt <= 0.95 then 'Group 3'
when rn / cnt <= 1.00 then 'Group 4'
end as grp
from data;
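As a rough check of the shuffle-and-split idea, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The 100-row table T and the group_sizes tally are assumptions made for illustration; the original question does not say which database is in use, and random() is not available under that name everywhere.

```python
import sqlite3

# Hypothetical table T with 100 customers; the data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table T (customer_id integer primary key)")
conn.executemany("insert into T values (?)", [(i,) for i in range(1, 101)])

query = """
with data as (
    select customer_id,
           count(*) over () * 1.0 as cnt,
           row_number() over (order by random()) * 1.0 as rn
    from T
)
select customer_id,
       case when rn / cnt <= 0.50 then 'Group 1'
            when rn / cnt <= 0.75 then 'Group 2'
            when rn / cnt <= 0.95 then 'Group 3'
            else 'Group 4'
       end as grp
from data
"""
assignments = dict(conn.execute(query).fetchall())

# Tally group sizes; which customer lands in which group changes on every run.
group_sizes = {}
for grp in assignments.values():
    group_sizes[grp] = group_sizes.get(grp, 0) + 1
print(sorted(group_sizes.items()))
```

With 100 rows the group sizes come out as exact multiples of the weights. With row counts that the weights do not divide evenly, the sizes are only approximate, which is the caveat mentioned above.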

Related

How to calculate average omitting duplicates per partition in SQL?

I want to calculate the average item count accounting for sub-partitions in each partition.
Sample Data:
id session item_count random_field_1
1 weoifn2 3 A
1 weoifn2 3 B
1 iuboiwe 2 K
2 oeino33 5 R
2 vergeeg 8 C
2 feooinn 9 P
2 feooinn 9 M
Logic:
id = 1: (3 + 2) / 2 = 2.5
id = 2: (5 + 8 + 9) / 3 = 7.33
Expected Output:
id avg
1 2.5
2 7.33
My Query:
SELECT
id
, AVG(item_count) OVER (PARTITION BY id) AS avg
FROM my_table
However, I believe this will factor in duplicates twice, which is unintended. How can I fix my query to only consider one item_count value per session?
Consider the approach below:
select id, avg(item_count) as avg
from (
select distinct id, session, item_count
from your_table
)
group by id
Applied to the sample data in your question, this returns 2.5 for id 1 and about 7.33 for id 2, which matches the expected output.
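To make that result reproducible, here is a small sketch that loads the sample rows into an in-memory SQLite database via Python's sqlite3 and runs the DISTINCT-then-average query. SQLite, the per_session alias and the avg_item_count column name are assumptions for the example; the question does not name its database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer, session text, item_count integer, random_field_1 text)"
)
sample_rows = [
    (1, "weoifn2", 3, "A"),
    (1, "weoifn2", 3, "B"),
    (1, "iuboiwe", 2, "K"),
    (2, "oeino33", 5, "R"),
    (2, "vergeeg", 8, "C"),
    (2, "feooinn", 9, "P"),
    (2, "feooinn", 9, "M"),
]
conn.executemany("insert into my_table values (?, ?, ?, ?)", sample_rows)

# Collapse to one row per (id, session, item_count) first, then average per id.
query = """
select id, avg(item_count) as avg_item_count
from (
    select distinct id, session, item_count
    from my_table
) as per_session
group by id
order by id
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

The inner DISTINCT is what removes the repeated session rows, so the outer AVG sees each session once.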
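Alternatively, keep the window function but null out the repeated rows within each session, so that each session contributes only one item_count:
SELECT DISTINCT id, AVG(item_count) OVER (PARTITION BY id) AS avg
FROM (
SELECT
id
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY id, session ORDER BY item_count) = 1
THEN item_count
ELSE NULL
END
AS item_count
FROM my_table
)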

How to select top 2 values for each id

I have a table with values
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
1 1 "2015-01-01"
1 1 "2015-01-01"
2 7 "2015-01-05"
2 6 "2015-01-04"
2 4 "2015-01-03"
3 11 "2015-01-08"
3 10 "2015-01-07"
3 9 "2015-01-06"
3 8 "2015-01-05"
I want to select top two values of each id as shown in desired output.
Desired output:
id sales date
1 5 "2015-01-04"
1 3 "2015-01-03"
2 7 "2015-01-05"
2 6 "2015-01-04"
3 11 "2015-01-08"
3 10 "2015-01-07"
My attempt:
select transactions.salesperson_id, transactions.id, transactions.date
from transactions
ORDER BY transactions.salesperson_id ASC, transactions.date DESC;
Can someone help me with this? Thank you in advance!
This can be done using window functions:
select id, sales, "date"
from (
select id, sales, "date",
dense_rank() over (partition by id order by "date" desc) as rnk
from transactions
) t
where rnk <= 2;
If there are multiple rows on the same date, this might return more than two rows for the same ID. If you don't want that, use row_number() instead of dense_rank().
row_number() will get what you want:
select id, sales, "date" from
(select row_number() over (partition by id order by "date" desc) as rn, id, sales, "date" from transactions) t1
where t1.rn <= 2
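The difference between the two ranking functions only shows up when dates tie. The sketch below, run on SQLite through Python's sqlite3, first applies both functions to the sample data and then adds one hypothetical row (id 1, sales 4, on 2015-01-04) to create a tie. The top_two helper and the extra row are assumptions for illustration and are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table transactions (id integer, sales integer, "date" text)')
sample = [
    (1, 5, "2015-01-04"), (1, 3, "2015-01-03"), (1, 1, "2015-01-01"), (1, 1, "2015-01-01"),
    (2, 7, "2015-01-05"), (2, 6, "2015-01-04"), (2, 4, "2015-01-03"),
    (3, 11, "2015-01-08"), (3, 10, "2015-01-07"), (3, 9, "2015-01-06"), (3, 8, "2015-01-05"),
]
conn.executemany("insert into transactions values (?, ?, ?)", sample)


def top_two(rank_function):
    # rank_function is interpolated into the SQL, so pass only trusted names.
    query = f"""
        select id, sales, "date"
        from (
            select id, sales, "date",
                   {rank_function}() over (partition by id order by "date" desc) as rnk
            from transactions
        ) t
        where rnk <= 2
        order by id, "date" desc, sales desc
    """
    return conn.execute(query).fetchall()


with_row_number = top_two("row_number")
with_dense_rank = top_two("dense_rank")

# Hypothetical extra row that ties with the latest date of id 1.
conn.execute("insert into transactions values (1, 4, '2015-01-04')")
tied_row_number = top_two("row_number")
tied_dense_rank = top_two("dense_rank")
print(len(tied_row_number), len(tied_dense_rank))
```

On the original sample both functions return the same six rows. After the tie is added, row_number() still returns two rows per id, while dense_rank() returns three rows for id 1. Which of the tied rows row_number() keeps is not determined unless the ORDER BY includes a tiebreaker.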

Quartile in Oracle SQL

I have a table
table1
member
10010
10020
10030
10040
10050
10060
10070
10080
10090
10100
I want to divide the 10 rows into 4 buckets. I did the following:
select a.*, NTILE(4) over(order by member) as segment1 from table1 a
order by member;
But this distributes the rows equally across the 4 buckets.
I would like 4 buckets of decreasing size: the 1st with 40% of the rows, then 30%, then 20%, then 10%.
Output should be:
member segment1
10010 1
10020 1
10030 1
10040 1
10050 2
10060 2
10070 2
10080 3
10090 3
10100 4
How can I achieve it using Oracle SQL?
One simple approach would be to use ROW_NUMBER along with a CASE expression:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (ORDER BY member) / COUNT(*) OVER () rn
FROM table1 t
)
SELECT
member,
CASE WHEN rn <= 0.4 THEN 1
WHEN rn <= 0.7 THEN 2
WHEN rn <= 0.9 THEN 3
ELSE 4 END AS segment1
FROM cte
ORDER BY
member;
I would use row_number() and count() for this:
select a.*,
(case when row_number() over (order by member) <= 0.4 * count(*) over () then 1
when row_number() over (order by member) <= 0.7 * count(*) over () then 2
when row_number() over (order by member) <= 0.9 * count(*) over () then 3
else 4
end) as segment
from table1 a
order by member;
Of course, you can also use ntile():
select a.*,
(case when ntile(10) over (order by member) <= 4 then 1
when ntile(10) over (order by member) <= 7 then 2
when ntile(10) over (order by member) <= 9 then 3
else 4
end) as segment
from table1 a
order by member;
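The row_number()/count() approach can be checked against the ten sample rows with a short sketch. It runs on SQLite through Python's sqlite3 rather than Oracle, which is an assumption made so the example is self-contained. Because SQLite uses integer division and binary floating point, the thresholds are written as integer comparisons (rn * 10 <= 4 * cnt) instead of rn / cnt <= 0.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (member integer)")
conn.executemany(
    "insert into table1 values (?)", [(m,) for m in range(10010, 10101, 10)]
)

# Cumulative thresholds 40%, 70%, 90% expressed in tenths to stay in integers.
query = """
select member,
       case when rn * 10 <= 4 * cnt then 1
            when rn * 10 <= 7 * cnt then 2
            when rn * 10 <= 9 * cnt then 3
            else 4
       end as segment1
from (
    select member,
           row_number() over (order by member) as rn,
           count(*) over () as cnt
    from table1
) t
order by member
"""
segments = conn.execute(query).fetchall()
for row in segments:
    print(row)
```

For the ten members this yields bucket sizes 4, 3, 2 and 1, matching the requested output. For row counts that are not multiples of ten, the bucket sizes are rounded down at each threshold.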

Assign column value based on the percentage of rows

In DB2 is there a way to assign a column value based on the first x%, then y% and remaining z% of rows?
I've tried using row_number() function but no luck!
Example below
Assume the input below is already sorted by count(id) in descending order.
Input:
ID count(id)
5 10
3 8
1 5
4 3
2 1
Output:
The first 30% of the rows above should be assigned code H, the last 30% of the rows code L, and the remaining rows code M. If 30% of the rows is not a whole number, round it to 0 decimal places.
ID code
5 H
3 H
1 M
4 L
2 L
You can use window functions:
select t.id,
(case ntile(3) over (order by count(id) desc)
when 1 then 'H'
when 2 then 'M'
when 3 then 'L'
end) as grp
from t
group by t.id;
This puts them into equal sized groups.
For a 30-40-30% split with your conditions, you have to be more careful:
select t.id,
(case when (seqnum - 1.0) < 0.3 * num_ids then 'H'
when seqnum > 0.7 * num_ids then 'L'
else 'M'
end) as grp
from (select t.id,
count(*) as cnt,
count(*) over () as num_ids,
row_number() over (order by count(*) desc) as seqnum
from t
group by t.id
) t;
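A quick way to sanity-check a 30-40-30 rule against the five sample rows is sketched below. It uses SQLite through Python's sqlite3 instead of DB2, and it assumes the input is already aggregated into a hypothetical table t(id, count_id). The thresholds are written with integers, so a row is H when (seqnum - 1) * 10 < 3 * num_ids and L when seqnum * 10 > 7 * num_ids, which rounds a fractional 30% up to the next whole row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: one row per id with its precomputed count.
conn.execute("create table t (id integer, count_id integer)")
conn.executemany(
    "insert into t values (?, ?)", [(5, 10), (3, 8), (1, 5), (4, 3), (2, 1)]
)

query = """
select id,
       case when (seqnum - 1) * 10 < 3 * num_ids then 'H'
            when seqnum * 10 > 7 * num_ids then 'L'
            else 'M'
       end as code
from (
    select id,
           row_number() over (order by count_id desc) as seqnum,
           count(*) over () as num_ids
    from t
) ranked
order by seqnum
"""
codes = conn.execute(query).fetchall()
for row in codes:
    print(row)
```

With five rows, 30% is 1.5 rows, so two rows get H and two get L, leaving one M in the middle, as in the requested output.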
Try this:
with t(ID, count_id) as (values
(5, 10)
, (3, 8)
, (1, 5)
, (4, 3)
, (2, 1)
)
select t.*
, case
when pst <=30 then 'H'
when pst <=70 then 'M'
else 'L'
end as code
from
(
select t.*
, rownumber() over (order by count_id desc) as rn
, 100*rownumber() over (order by count_id desc)/nullif(count(1) over(), 0) as pst
from t
) t;
The result is:
ID COUNT_ID RN PST CODE
-- -------- -- --- ----
5 10 1 20 H
3 8 2 40 M
1 5 3 60 M
4 3 4 80 L
2 1 5 100 L

Select and aggregate last records based on order

I have different versions of the charges in a table. I want to grab the last charge for each Type and sum them.
For the data below, that means adding 9.87, 9.63 and 1.65.
The query should return the Parent ID and sum(9.87 + 9.63 + 1.65).
We use MSSQL.
ID ORDER CHARGES TYPE PARENT ID
1 1 6.45 1 1
2 2 1.25 1 1
3 3 9.87 1 1
4 1 6.54 2 1
5 2 5.64 2 1
6 3 0.84 2 1
7 4 9.63 2 1
8 1 7.33 3 1
9 2 5.65 3 1
10 3 8.65 3 1
11 4 5.14 3 1
12 5 1.65 3 1
WITH recordsList
AS
(
SELECT Type, Charges,
ROW_NUMBER() OVER (PARTITION BY TYPE
ORDER BY [ORDER] DESC) rn
FROM tableName
)
SELECT SUM(Charges) totalCharge
FROM recordsList
WHERE rn = 1
Use row_number() to identify the rows to be summed, and then sum them:
select SUM(charges)
from (select t.*,
ROW_NUMBER() over (PARTITION by type order by id desc) as seqnum
from t
) t
where seqnum = 1
Alternatively you could use a window aggregate MAX():
SELECT SUM(Charges)
FROM (
SELECT
[ORDER],
Charges,
MaxOrder = MAX([ORDER]) OVER (PARTITION BY [TYPE])
FROM atable
) s
WHERE [ORDER] = MaxOrder
;
SELECT t.PARENT_ID, SUM(t.CHARGES)
FROM dbo.test73 t
WHERE EXISTS (
SELECT 1
FROM dbo.test73
WHERE [TYPE] = t.[TYPE]
HAVING MAX([ORDER]) = t.[ORDER]
)
GROUP BY t.PARENT_ID
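The latest-row-per-type pattern can be verified against the sample data with the sketch below. It runs on SQLite through Python's sqlite3 rather than MSSQL, so the bracketed identifiers become double-quoted ones. The table name charge_versions and the column names are assumptions chosen for the example. Unlike some of the queries above, it also returns the Parent ID, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table charge_versions '
    '(id integer, "order" integer, charges real, type integer, parent_id integer)'
)
rows = [
    (1, 1, 6.45, 1, 1), (2, 2, 1.25, 1, 1), (3, 3, 9.87, 1, 1),
    (4, 1, 6.54, 2, 1), (5, 2, 5.64, 2, 1), (6, 3, 0.84, 2, 1), (7, 4, 9.63, 2, 1),
    (8, 1, 7.33, 3, 1), (9, 2, 5.65, 3, 1), (10, 3, 8.65, 3, 1),
    (11, 4, 5.14, 3, 1), (12, 5, 1.65, 3, 1),
]
conn.executemany("insert into charge_versions values (?, ?, ?, ?, ?)", rows)

# rn = 1 marks the row with the highest "order" within each (parent_id, type).
query = """
select parent_id, round(sum(charges), 2) as total_charge
from (
    select parent_id, charges,
           row_number() over (
               partition by parent_id, type order by "order" desc
           ) as rn
    from charge_versions
) latest
where rn = 1
group by parent_id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Partitioning by both parent_id and type keeps the query correct if the table later holds more than one parent; with the sample data there is only parent 1.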