Find the common value over partition - sql

I have a table like this :
Loan_Num asset LTV
1 20 0.2
2 20 0.2
3 20 0.12
4 20 0.2
5 10 0.3
6 10 0.3
7 10 0.22
8 10 0.3
I want to add a column holding the most common LTV value within each asset group:
Loan_Num asset LTV cV
1 20 0.2 0.2
2 20 0.2 0.2
3 20 0.12 0.2
4 20 0.2 0.2
5 10 0.3 0.3
6 10 0.3 0.3
7 10 0.22 0.3
8 10 0.3 0.3
Any suggestions on how to do this? Is there a built-in function for the most common value?

One way of doing this would be
WITH CTE1
AS (SELECT *,
COUNT(*) OVER (PARTITION BY [asset], [LTV]) AS C
FROM YourTable),
CTE2
AS (SELECT *,
RANK() OVER (PARTITION BY [asset] ORDER BY C DESC, [LTV] DESC) AS R
FROM CTE1)
SELECT [Loan_Num],
[asset],
[LTV],
MAX(CASE
WHEN R = 1
THEN [LTV]
END) OVER (PARTITION BY [asset]) AS cV
FROM CTE2
Though this version should actually be slightly more efficient, as it removes a sort:
WITH CTE1
AS (SELECT *,
COUNT(*) OVER (PARTITION BY [asset], [LTV]) AS C
FROM YourTable),
CTE2
AS (SELECT *,
MAX(C) OVER (PARTITION BY [asset]) AS MaxC
FROM CTE1)
SELECT [Loan_Num],
[asset],
[LTV],
MAX(CASE
WHEN C = MaxC
THEN [LTV]
END) OVER (PARTITION BY [asset]) AS cV
FROM CTE2
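To check the second query against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. Using SQLite (3.25 or later, for window functions) is an assumption made for this illustration, since the answer's bracket syntax suggests SQL Server; the table name YourTable and the rows come from the question.

```python
import sqlite3

# Sample rows from the question: (Loan_Num, asset, LTV)
rows = [(1, 20, 0.2), (2, 20, 0.2), (3, 20, 0.12), (4, 20, 0.2),
        (5, 10, 0.3), (6, 10, 0.3), (7, 10, 0.22), (8, 10, 0.3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Loan_Num INTEGER, asset INTEGER, LTV REAL)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

query = """
WITH CTE1 AS (
    -- C = how often this LTV occurs within its asset group
    SELECT *, COUNT(*) OVER (PARTITION BY asset, LTV) AS C
    FROM YourTable),
CTE2 AS (
    -- MaxC = the highest frequency within the asset group
    SELECT *, MAX(C) OVER (PARTITION BY asset) AS MaxC
    FROM CTE1)
SELECT Loan_Num, asset, LTV,
       -- on a tie between two LTV values, the larger one wins
       MAX(CASE WHEN C = MaxC THEN LTV END) OVER (PARTITION BY asset) AS cV
FROM CTE2
ORDER BY Loan_Num
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample rows, every loan with asset 20 should get cV 0.2 and every loan with asset 10 should get cV 0.3.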

Related

Select TOP 3 values with their respective groups and values

So I have a table like this
user GROUP VALUE
A G1 0.9
A G2 0.8
A G3 0.3
A G4 0.7
B G1 0.9
B G2 0.8
B G3 0.7
C G1 0.9
C G2 0.8
and need to get to something like this
user first_G Fir_G_val second_G sec_G_val third_G thi_G_val
A G1 0.9 G2 0.8 G4 0.7
B G1 0.9 G2 0.8 G3 0.7
C G1 0.9 G2 0.8 NULL NULL
I tried in different ways, none worked out for me (Guided by this post)
This operation is called a pivot and can be carried out by:
- computing the rank of each row within its user, using the ROW_NUMBER window function
- extracting the group and the value alternately, one pair for each of the new fields
- aggregating the values for each user
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY user_ ORDER BY VALUE_ DESC) AS rn
FROM tab
)
SELECT user_,
MAX(CASE WHEN rn = 1 THEN GROUP_ END) AS first_G,
MAX(CASE WHEN rn = 1 THEN VALUE_ END) AS first_G_val,
MAX(CASE WHEN rn = 2 THEN GROUP_ END) AS second_G,
MAX(CASE WHEN rn = 2 THEN VALUE_ END) AS second_G_val,
MAX(CASE WHEN rn = 3 THEN GROUP_ END) AS third_G,
MAX(CASE WHEN rn = 3 THEN VALUE_ END) AS third_G_val
FROM cte
GROUP BY user_
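The pivot can be tried out on the question's data with a short sketch using Python's sqlite3 module. SQLite is an assumption here (the answer does not name a database); the table name tab and the column names user_, GROUP_ and VALUE_ follow the answer's query.

```python
import sqlite3

# Rows from the question: (user_, GROUP_, VALUE_)
rows = [("A", "G1", 0.9), ("A", "G2", 0.8), ("A", "G3", 0.3), ("A", "G4", 0.7),
        ("B", "G1", 0.9), ("B", "G2", 0.8), ("B", "G3", 0.7),
        ("C", "G1", 0.9), ("C", "G2", 0.8)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (user_ TEXT, GROUP_ TEXT, VALUE_ REAL)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    -- rank each row within its user, highest value first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY user_ ORDER BY VALUE_ DESC) AS rn
    FROM tab
)
SELECT user_,
       MAX(CASE WHEN rn = 1 THEN GROUP_ END) AS first_G,
       MAX(CASE WHEN rn = 1 THEN VALUE_ END) AS first_G_val,
       MAX(CASE WHEN rn = 2 THEN GROUP_ END) AS second_G,
       MAX(CASE WHEN rn = 2 THEN VALUE_ END) AS second_G_val,
       MAX(CASE WHEN rn = 3 THEN GROUP_ END) AS third_G,
       MAX(CASE WHEN rn = 3 THEN VALUE_ END) AS third_G_val
FROM cte
GROUP BY user_
ORDER BY user_
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

User C has only two groups, so its third_G and third_G_val come back as NULL (None in Python), because MAX over an empty set of non-NULL values is NULL.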

Finding categories that make up 70% of total turnover in SQL

I want to find the categories that make up a certain percentage of sales, and segment them according to their percentages in SQL. To do this, I must first sort them by their revenues in descending order and then select the top N percent. For example, if the total revenue is 20M:
Category Revenue
1 6.000.000
2 4.000.000
3 4.000.000
4 3.000.000
5 1.500.000
6 500.000
7 400.000
8 300.000
9 200.000
10 100.000
Total 20.000.000
-Categories that make up 70% (14M) of revenue - segment A
-Categories that make up 15% (3M) - segment B
-Categories that make up 10% (2M) - segment C
-Categories that make up 5% (1M) - segment D
So, the segments should be like this:
Category Segment
1 A
2 A
3 A
4 B
5 C
6 C
7 D
8 D
9 D
10 D
I'm sure there's a simpler way of getting this result, but here's one [rather long] query that classifies the categories in segments according to your logic:
with
q as (
select *,
sum(revenue) over(order by revenue desc) as acc_revenue,
sum(revenue) over() as tot_revenue
from t
),
a as (
select * from q where acc_revenue <= 0.7 * tot_revenue
),
b as (
select *
from q
where acc_revenue - (select max(acc_revenue) from a) <= 0.15 * tot_revenue
and category not in (select category from a)
),
c as (
select *
from q
where acc_revenue - (select max(acc_revenue) from b) <= 0.10 * tot_revenue
and category not in (select category from a)
and category not in (select category from b)
),
d as (
select *
from q
where category not in (select category from a)
and category not in (select category from b)
and category not in (select category from c)
)
select *, 'A' as segment from a
union all select *, 'B' from b
union all select *, 'C' from c
union all select *, 'D' from d
Result:
category revenue acc_revenue tot_revenue segment
--------- -------- ------------ ------------ -------
1 6000000 6000000 20000000 A
2 4000000 14000000 20000000 A
3 4000000 14000000 20000000 A
4 3000000 17000000 20000000 B
5 1500000 18500000 20000000 C
6 500000 19000000 20000000 C
7 400000 19400000 20000000 D
8 300000 19700000 20000000 D
9 200000 19900000 20000000 D
10 100000 20000000 20000000 D
See running example at DB Fiddle.
This answer uses window functions like the previous answer, but includes a CASE expression instead of multiple CTEs.
WITH
cte AS (
SELECT category, revenue,
sum(revenue) OVER(ORDER BY revenue DESC, category)*1. /*multiply by 1. (or use CAST) to avoid integer truncation in immediate next step*/
/sum(revenue) OVER() RunTtlPct /*Running total of sales, as a percent of the grand total*/
FROM t)
SELECT category, revenue,
CASE
WHEN RunTtlPct <= 0.7 THEN 'A'
WHEN RunTtlPct <= 0.85 THEN 'B'
WHEN RunTtlPct <= 0.95 THEN 'C'
ELSE 'D'
END Segment
FROM cte /*cte was included to avoid repeating RunTtlPct's expression in every WHEN clause.*/
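Several categories in the sample land exactly on a threshold (category 3 at 70%, category 4 at 85%, category 6 at 95%). As a hedged variant of the CASE approach, the sketch below compares integers (running total times 100 against threshold times grand total) so the boundary rows do not depend on floating-point rounding. It assumes integer revenue and runs on SQLite through Python's sqlite3 module; the column names running and total are made up for this illustration.

```python
import sqlite3

# Revenue per category from the question
revenues = [6000000, 4000000, 4000000, 3000000, 1500000,
            500000, 400000, 300000, 200000, 100000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 list(enumerate(revenues, start=1)))

query = """
WITH cte AS (
    SELECT category, revenue,
           -- category is a tie-breaker, so equal revenues get distinct running totals
           SUM(revenue) OVER (ORDER BY revenue DESC, category) AS running,
           SUM(revenue) OVER () AS total
    FROM t)
SELECT category,
       CASE WHEN running * 100 <= 70 * total THEN 'A'
            WHEN running * 100 <= 85 * total THEN 'B'
            WHEN running * 100 <= 95 * total THEN 'C'
            ELSE 'D'
       END AS segment
FROM cte
ORDER BY category
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this should reproduce the segments listed in the question: A for categories 1 to 3, B for 4, C for 5 and 6, and D for the rest.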

How to select 70 percent of a column based on condition in SQL?

This is my existing table data
C1 C2 C3
1 A 1
2 B 1
3 C 0
4 D 0
5 E 0
6 F 0
7 G 1
8 H 1
9 I 1
10 J 0
I want to get the result below. I want to select the rows that cover 70% of the rows where C3 is 1. C3 contains five ones in total, and 70% of 5 is 3.5, which rounds up to 4 ones. So the final dataset should contain 70 percent of the ones in C3:
C1 C2 C3
1 A 1
2 B 1
3 C 0
4 D 0
5 E 0
7 G 1
8 H 1
Here is the answer
select *
from
(SELECT *,
(SELECT SUM(C3) FROM table_name t1 WHERE t1.C1 <= t.C1) AS cumulative_sum,
(select sum(C3) from table_name) as total_sum
FROM table_name t) t
where (cumulative_sum - C3) < 0.7 * total_sum
Hmmm. You don't seem to want a random selection. They seem to be ordered by col1. So, you can calculate this as:
select t.*
from (select t.*,
sum(case when col3 = 1 then 1 else 0 end) over (order by col1) as running_col3,
sum(case when col3 = 1 then 1 else 0 end) over () as total_col3
from t
) t
where (running_col3 - col3) < 0.7 * total_col3;
Note: If col3 has only 0 and 1, you can simplify the above to:
select t.*
from (select t.*,
sum(col3) over (order by col1) as running_col3,
sum(col3) over () as total_col3
from t
) t
where (running_col3 - col3) < 0.7 * total_col3
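The running-total cut-off can be checked on the question's data with a small sketch using Python's sqlite3 module. SQLite and the table name t with columns col1, col2, col3 are assumptions for this illustration (the question calls the columns C1, C2, C3). A row is kept while the number of ones seen before it is still below 70% of the total.

```python
import sqlite3

# Rows from the question: (col1, col2, col3)
rows = [(1, "A", 1), (2, "B", 1), (3, "C", 0), (4, "D", 0), (5, "E", 0),
        (6, "F", 0), (7, "G", 1), (8, "H", 1), (9, "I", 1), (10, "J", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT col1, col2, col3
FROM (SELECT t.*,
             SUM(col3) OVER (ORDER BY col1) AS running_col3,  -- ones up to and including this row
             SUM(col3) OVER () AS total_col3                  -- ones in the whole table
      FROM t) AS s
-- running_col3 - col3 is the number of ones strictly before this row
WHERE (running_col3 - col3) < 0.7 * total_col3
ORDER BY col1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With five ones in total the threshold is 3.5, so the fourth 1 (row 8) is the last row kept and rows 9 and 10 are dropped.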

Assigning categories based on percentage within a group in SQL

Suppose I have a table like this:
CampaignId Category Strike
1 A 2
1 B 3
1 Others 5
2 A 4
2 B 2
3 C 1
3 C 4
4 A 1
4 B 1
4 C 1
4 D 1
4 Others 1
Then, I would calculate percentage of Strike for each Category by CampaignId like this:
SELECT CampaignId, Category, Strike, (SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId, Category) / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
FROM myTable
resulting in the intermediate table below:
CampaignId Category Strike PercentageOfStrikesByCategoryByCampaignId
1 A 2 20.0
1 B 3 30.0
1 Others 5 50.0
2 A 4 66.6
2 B 2 33.3
3 C 1 20.0
3 C 4 80.0
4 A 1 20.0
4 B 1 20.0
4 C 1 20.0
4 D 1 20.0
4 Others 1 20.0
Now, I would like to assign a final label, say FinalCategory, based on the PercentageOfStrikesByCategoryByCampaignId calculated above. The gist of the criteria for FinalCategory is: if one of the categories in a CampaignId is 'Others' AND its PercentageOfStrikesByCategoryByCampaignId is >= 30.0, then all rows in that CampaignId group will be labeled 'Others'. Otherwise, we copy Category directly into FinalCategory. The result table should look like this:
CampaignId Category Strike PercentageOfStrikesByCategoryByCampaignId FinalCategory
1 A 2 20.0 Others
1 B 3 30.0 Others
1 Others 5 50.0 Others
2 A 4 66.6 A
2 B 2 33.3 B
3 C 1 20.0 C
3 C 4 80.0 C
4 A 1 20.0 A
4 B 1 20.0 B
4 C 1 20.0 C
4 D 1 20.0 D
4 Others 1 20.0 Others
How could I achieve this using as simple a SQL query as possible? Thank you in advance for your help!
SELECT CampaignId, Category, Strike, PercentageOfStrikesByCategoryByCampaignId,
CASE WHEN Others_count > 0 AND
MAX(CASE WHEN Category='Others' THEN PercentageOfStrikesByCategoryByCampaignId END) OVER (PARTITION BY CampaignId) >= 30 THEN 'Others'
ELSE Category END AS FinalCategory
FROM (
SELECT CampaignId, Category, Strike,
(SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId, Category)
/ SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
,SUM(CASE WHEN Category='Others' THEN 1 ELSE 0 END) OVER (PARTITION BY CampaignId) as Others_count
FROM myTable
) T
Two things are added to the existing query:
Others_count for each CampaignId, computed with a SUM window function.
A CASE expression that uses Others_count together with a MAX window function to check whether the row with the 'Others' category has a percentage >= 30. If so, 'Others' is assigned as the final category; otherwise the category is used as-is.
Let's start with your query as a CTE or subquery:
WITH t as (
SELECT CampaignId, Category, Strike,
(SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId, Category) / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
FROM myTable
)
select t.*,
(case when OthersFlag > 0 then 'Others' else category end) as FinalCategory
from (select t.*,
sum(case when category = 'Others' and PercentageOfStrikesByCategoryByCampaignId >= 30.0 then 1 else 0 end) over
(partition by campaignid) as OthersFlag
from t
) t;
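The labeling rule can be exercised on the question's rows with a sketch using Python's sqlite3 module. SQLite is an assumption here: it has no ::FLOAT cast, so the sketch multiplies by 100.0 instead, and the short column name Pct is made up for this illustration. The percentage is each category's strikes divided by the campaign total.

```python
import sqlite3

# Rows from the question: (CampaignId, Category, Strike)
rows = [(1, "A", 2), (1, "B", 3), (1, "Others", 5),
        (2, "A", 4), (2, "B", 2),
        (3, "C", 1), (3, "C", 4),
        (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

query = """
WITH t AS (
    SELECT CampaignId, Category, Strike,
           -- category strikes as a percentage of the campaign total
           SUM(Strike) OVER (PARTITION BY CampaignId, Category) * 100.0
             / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
    FROM myTable)
SELECT CampaignId, Category, Strike,
       -- flag is 1 for the whole campaign if it has an 'Others' row at 30% or more
       CASE WHEN MAX(CASE WHEN Category = 'Others' AND Pct >= 30.0 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY CampaignId) = 1
            THEN 'Others'
            ELSE Category
       END AS FinalCategory
FROM t
ORDER BY CampaignId, Category, Strike
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only campaign 1 has an 'Others' share of at least 30% (5 of 10 strikes), so only its rows are relabeled; campaign 4 keeps its original categories because 'Others' accounts for just 20%.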

sql query to get for each item the line of a group that has the maximum occurrence

The question is difficult to summarize in the title, so here is a more verbose example:
I have a huge dataset of dozens of measurements for thousands of different objects. Most of them have an associated type but this type is not unambiguous.
So a Select like
SELECT oid, type, count(type) FROM data GROUP BY oid, type;
will produce something like:
oid type count(type)
0 0 22
1 0 22
2 1 61
2 2 104
3 2 63
4 0 34
6 0 1
8 2 76
9 0 1
11 3 33
12 0 55
13 4 1
13 5 28
13 1 2
13 2 255
14 4 148
14 1 4
14 2 3
15 3 10
16 0 13
18 4 137
18 1 5
How can I get only one line per object in the result, where that line has to be the one with the most occurrences?
Bonus-Question: also get a percentage per object line that represents the occurrence ratio of this type.
The result should look like:
oid type P(type)
0 0 1.0
1 0 1.0
2 2 0.64
3 2 1.0
4 0 1.0
6 0 1.0
8 2 1.0
9 0 1.0
11 3 1.0
12 0 1.0
13 2 0.89
14 4 0.95
15 3 1.0
16 0 1.0
18 4 0.96
edit:
some test data and the almost-correct output of one solution:
http://pastebin.com/jVvHErJ2
This query solves both your problems
SELECT s.oid,
s.type,
s.total_per_oid_per_type,
(s.total_per_oid_per_type + 0.0) / s.total_per_oid AS percentage
FROM (SELECT v.oid,
v.type,
v.total_per_oid_per_type,
ROW_NUMBER() OVER (PARTITION BY v.oid ORDER BY v.total_per_oid_per_type DESC) AS object_number,
SUM(v.total_per_oid_per_type) OVER (PARTITION BY v.oid) AS total_per_oid
FROM (SELECT t.oid, t.type, count(1) AS total_per_oid_per_type
FROM data t
GROUP BY t.oid, t.type) v ) s
WHERE object_number = 1
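As a quick check of the ROW_NUMBER query, the sketch below runs it with Python's sqlite3 module on a subset of the question's counts (oids 0, 2 and 13), expanded back into one row per measurement. SQLite 3.25 or later is assumed for window function support, and the subset is chosen only to keep the example short.

```python
import sqlite3

# (oid, type, number of measurements), a subset of the counts in the question
counts = [(0, 0, 22), (2, 1, 61), (2, 2, 104),
          (13, 4, 1), (13, 5, 28), (13, 1, 2), (13, 2, 255)]

# Expand each count back into individual measurement rows
rows = []
for oid, typ, n in counts:
    rows.extend([(oid, typ)] * n)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (oid INTEGER, type INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

query = """
SELECT s.oid, s.type,
       (s.total_per_oid_per_type + 0.0) / s.total_per_oid AS percentage
FROM (SELECT v.oid, v.type, v.total_per_oid_per_type,
             -- 1 = the most frequent type of this oid
             ROW_NUMBER() OVER (PARTITION BY v.oid
                                ORDER BY v.total_per_oid_per_type DESC) AS object_number,
             SUM(v.total_per_oid_per_type) OVER (PARTITION BY v.oid) AS total_per_oid
      FROM (SELECT t.oid, t.type, COUNT(1) AS total_per_oid_per_type
            FROM data t
            GROUP BY t.oid, t.type) v) s
WHERE object_number = 1
ORDER BY s.oid
"""
result = conn.execute(query).fetchall()
for oid, typ, pct in result:
    print(oid, typ, round(pct, 2))
```

For oid 13 the winning type is 2 with 255 of 286 measurements, which rounds to 0.89 as in the question's expected output. If two types tie for the highest count, ROW_NUMBER picks one of them arbitrarily.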
Equivalent solution for SQLite3, using joins instead of window functions (note that it returns every type tied for the maximum count):
WITH v AS (
SELECT oid,
type,
COUNT(1) AS total_per_oid_per_type
FROM data
GROUP BY oid, type
),
s AS (
SELECT oid,
MAX(total_per_oid_per_type) AS max_total_per_oid
FROM v
GROUP BY oid
),
totals AS (
SELECT oid,
SUM(total_per_oid_per_type) AS total_per_oid
FROM v
GROUP BY oid
)
SELECT v.oid,
v.type,
v.total_per_oid_per_type,
(v.total_per_oid_per_type + 0.0) / totals.total_per_oid AS percentage
FROM v
INNER JOIN s ON v.oid = s.oid AND v.total_per_oid_per_type = s.max_total_per_oid
INNER JOIN totals ON v.oid = totals.oid
ORDER BY v.oid, v.type
Try this; it should work:
create table ##TBL (oid INT, [type] INT, [count(type)] INT)
INSERT INTO ##TBL VALUES
(0,0,22),
(1,0,22),
(2,1,61),
(2,2,104),
(3,2,63),
(4,0,34),
(6,0,1),
(8,2,76),
(9,0,1),
(11,3,33),
(12,0,55),
(13,4,1),
(13,5,28),
(13,1,2),
(13,2,255),
(14,4,148),
(14,1,4),
(14,2,3),
(15,3,10),
(16,0,13),
(18,4,137),
(18,1,5)
--------------------------------
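SELECT oid
,[type]
,CAST( CAST( [count(type)] AS DECIMAL(10,2) ) / CAST( total AS DECIMAL(10,2) ) AS DECIMAL(10,2) ) AS 'Percent %'
from (
SELECT oid, [type], [count(type)]
,ROW_NUMBER() OVER (PARTITION BY oid ORDER BY [count(type)] DESC) AS rn -- 1 = most frequent type per oid
,SUM([count(type)]) OVER (PARTITION BY oid) AS total
from ##TBL
) s
where rn = 1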