How to get a percentile rank based on a computation - sql

There are four tables:
T_SALES has columns like
CUST_KEY,
ITEM_KEY,
SALE_DATE,
SALES_DLR_SALES_QTY,
ORDER_QTY.
T_CUST has columns like
CUST_KEY,
CUST_NUM,
PEER_GRP_ID
T_PEER_GRP has columns like
PEER_GRP_ID,
PEER_GRP_DESC,
PRNT_PEER_GRP_ID
T_PRNT_PEEER has columns like
PRNT_PEER_GRP_ID,
PRNT_PEER_DESC
Now, for the above tables, I need to generate a percentile rank of each customer based on the computation fillrate = SALES_QTY / ORDER_QTY * 100, by peer group within a parent peer.
Could someone please help with this?

You can use the analytic function PERCENT_RANK() to calculate the percentile rank, as below:
SELECT
t_s.cust_key,
t_c.cust_num,
PERCENT_RANK() OVER (ORDER BY (t_s.SALES_DLR_SALES_QTY / ORDER_QTY) DESC) as pr
FROM t_sales t_s
INNER JOIN t_cust t_c ON t_s.cust_key = t_c.cust_key
ORDER BY pr;
Reference:
PERCENT_RANK on Oracle® Database SQL Reference

If by "percentile rank" you mean "percent rank" (documented here), then the harder part is the joins. I think this is the basic data that you want for the percentile rank:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
sum(s.SALES_DLR_SALES_QTY) * 100.0 / sum(s.ORDER_QTY) as fill_rate
from t_sales s join
t_cust c
on s.CUST_KEY = c.CUST_KEY join
t_peer_grp t
on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
You can then calculate the percent rank (a value from 0 to 1; multiply by 100 for a 0 to 100 scale) as:
select t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID,
sum(s.SALES_DLR_SALES_QTY) * 100.0 / sum(s.ORDER_QTY) as fill_rate,
percent_rank() over (partition by t.PRNT_PEER_GRP_ID
order by sum(s.SALES_DLR_SALES_QTY) * 100.0 / sum(s.ORDER_QTY)
) as pr
from t_sales s join
t_cust c
on s.CUST_KEY = c.CUST_KEY join
t_peer_grp t
on t.PEER_GRP_ID = c.PEER_GRP_ID
group by t.PEER_GRP_ID, t.PRNT_PEER_GRP_ID;
Note that this mixes analytic functions with aggregation functions: the aggregates are computed per group first, and the analytic function then runs over the grouped rows. This can look awkward when you first learn about it.
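For anyone who wants to try the idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is assumed, for window function support). The table contents are made up, and ranking each customer within its peer group and parent peer group is only one reading of the question. The fill rate is computed per customer in a CTE first, then PERCENT_RANK() is applied on top.

```python
import sqlite3

# Hypothetical miniature versions of the question's tables (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_peer_grp (peer_grp_id INTEGER, prnt_peer_grp_id INTEGER);
CREATE TABLE t_cust (cust_key INTEGER, cust_num TEXT, peer_grp_id INTEGER);
CREATE TABLE t_sales (cust_key INTEGER, sales_dlr_sales_qty INTEGER, order_qty INTEGER);
INSERT INTO t_peer_grp VALUES (10, 1), (20, 1);
INSERT INTO t_cust VALUES (1, 'C1', 10), (2, 'C2', 10), (3, 'C3', 10), (4, 'C4', 20);
INSERT INTO t_sales VALUES (1, 20, 50), (1, 30, 50), (2, 90, 100), (3, 75, 100), (4, 80, 100);
""")

rows = conn.execute("""
WITH fill AS (
    -- one row per customer with its fill rate in percent
    SELECT c.cust_num, c.peer_grp_id, p.prnt_peer_grp_id,
           SUM(s.sales_dlr_sales_qty) * 100.0 / SUM(s.order_qty) AS fill_rate
    FROM t_sales s
    JOIN t_cust c ON c.cust_key = s.cust_key
    JOIN t_peer_grp p ON p.peer_grp_id = c.peer_grp_id
    GROUP BY c.cust_num, c.peer_grp_id, p.prnt_peer_grp_id
)
SELECT cust_num, fill_rate,
       PERCENT_RANK() OVER (
           PARTITION BY prnt_peer_grp_id, peer_grp_id
           ORDER BY fill_rate
       ) AS pr
FROM fill
ORDER BY cust_num
""").fetchall()

for row in rows:
    print(row)
```

PERCENT_RANK() returns (rank - 1) / (rows in partition - 1), so a customer that is alone in its peer group gets 0.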

Related

Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents had per each sex for each severity. I know I can do this with group by but I wanted to use a partition by in order to work out % too.
However, I get a very large table (I assume one row for each underlying row in each sex/severity combination). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver | accident_severity | (No column name)
1 | 1 | 1
1 | 2 | 1
-1 | 2 | 1
-1 | 1 | 1
1 | 3 | 1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL-Server thing?
I want to get the same result as below (obv without the CASE etc)
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. per pair of sex_of_driver, accident_severity values). Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.
OVER, on the other hand, gives you a windowed aggregate, which is calculated after GROUP BY has reduced your rows. Therefore, when you write count(accident_severity) over (partition by sex_of_driver, accident_severity), the aggregate only receives a single row in each partition, because the rows have already been reduced.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate; you probably want something like a.vehicle_id = v.vehicle_id.
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
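To see the "windowed aggregate over a normal aggregate" pattern in action, here is a small sketch run through Python's built-in sqlite3 module (SQLite 3.25+ assumed) rather than SQL Server. The single crash table and its rows are invented stand-ins for the already-joined accident/vehicle data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-joined data: one row per vehicle involved in an accident.
conn.execute("CREATE TABLE crash (sex_of_driver INTEGER, accident_severity INTEGER)")
conn.executemany(
    "INSERT INTO crash VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (1, 2), (2, 2), (2, 2), (2, 2), (2, 2)],
)

rows = conn.execute("""
SELECT sex_of_driver,
       accident_severity,
       COUNT(*) AS n,
       -- COUNT(*) is evaluated per group, SUM(...) OVER then runs across the grouped rows
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY sex_of_driver) AS pct_of_sex,
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct_of_total
FROM crash
GROUP BY sex_of_driver, accident_severity
ORDER BY sex_of_driver, accident_severity
""").fetchall()

for row in rows:
    print(row)
```

Each group keeps a single output row, and the two window sums supply the denominators (the per-sex total and the grand total).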

SQL: Aggregations over partition by

Query: Considering only Italian routes, for each category of goods and for each year
select the average daily income for each month
the total monthly income since the beginning of the year
SQL:
SELECT
gc.GoodCategory,
tm.Month,
tm.Year,
SUM(ro.Income) / COUNT(DISTINCT tm.Date),
SUM(ro.Income) OVER (PARTITION BY gc.GoodCategory, tm.Year
ORDER BY tm.Month ROWS UNBOUNDED PRECEDING)
FROM FactRoutes ro,
DimLocation dp,
DimLocation ds,
DimGoodCategory gc,
DimTime tm
WHERE ro.DepartureID = dp.LocationID
AND ro.DestinationID = ds.LocationID
AND ro.GoodCategoryID = gc.GoodCategoryID
AND ro.GoodTimeID = tm.GoodTimeID
AND dp.State = 'Italy'
AND ds.State = 'Italy'
GROUP BY gc.GoodCategory,
tm.Month,
tm.Year;
But I am facing the error below:
Column 'FactRoutes.Income' is invalid in the select list
because it is not contained in either an aggregate function
or the GROUP BY clause.
What's a better way to handle it?
I think that you want:
SELECT
gc.GoodCategory,
tm.Month,
tm.Year,
SUM(ro.Income) / COUNT(DISTINCT tm.Date),
SUM(SUM(ro.Income)) OVER (PARTITION BY gc.GoodCategory, tm.Year ORDER BY tm.Month)
FROM FactRoutes ro
INNER JOIN DimLocation dp ON ro.DepartureID = dp.LocationID
INNER JOIN DimLocation ds ON ro.DestinationID = ds.LocationID
INNER JOIN DimGoodCategory gc ON ro.GoodCategoryID = gc.GoodCategoryID
INNER JOIN DimTime tm ON ro.GoodTimeID = tm.GoodTimeID
WHERE dp.State = 'Italy' AND ds.State = 'Italy'
GROUP BY gc.GoodCategory, tm.Month, tm.Year;
The main point is that, in order to make your query a valid aggregate query, you need to use an aggregate function within the window function, like SUM(SUM(ro.Income)) OVER (...) instead of just SUM(ro.Income) OVER (...), so you get a window sum over the previous groups of records.
Other notable points:
always use explicit joins (with the ON keyword) rather than old-school, implicit joins (with commas in the FROM clause), whose syntax fell out of favor decades ago
ROWS UNBOUNDED PRECEDING is not needed; because your window function has an ORDER BY clause, the default frame is RANGE UNBOUNDED PRECEDING, which gives the same running total here since each month appears only once per partition
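The nested SUM(SUM(...)) OVER (...) form can be tried out with the following sketch, which uses Python's built-in sqlite3 module (SQLite 3.25+ assumed) instead of SQL Server. The flattened route_income table and its figures are hypothetical; they stand in for the joined fact and dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per route with its category, date parts and income.
conn.execute(
    "CREATE TABLE route_income (category TEXT, year INTEGER, month INTEGER, day INTEGER, income INTEGER)"
)
conn.executemany(
    "INSERT INTO route_income VALUES (?, ?, ?, ?, ?)",
    [
        ("food", 2020, 1, 1, 100),
        ("food", 2020, 1, 2, 200),
        ("food", 2020, 2, 1, 50),
        ("food", 2020, 3, 5, 150),
        ("food", 2020, 3, 5, 50),
    ],
)

rows = conn.execute("""
SELECT category, year, month,
       -- average daily income within the month
       SUM(income) * 1.0 / COUNT(DISTINCT day) AS avg_daily_income,
       -- inner SUM is the monthly total; the outer windowed SUM accumulates it over the year
       SUM(SUM(income)) OVER (PARTITION BY category, year ORDER BY month) AS income_ytd
FROM route_income
GROUP BY category, year, month
ORDER BY category, year, month
""").fetchall()

for row in rows:
    print(row)
```

The inner SUM(income) is the ordinary per-month aggregate; the outer windowed SUM then adds those monthly totals up in month order within each category and year.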

Showing all channel groups even without record in database

I have advertiser and channel_group columns. My code and its output are below. I want my output to contain ALL channel groups (for instance A to J, even if there is no value for one). How can I accomplish that? Any tips are welcome, because I don't have any idea.
WITH sos AS
(SELECT
advertiser,
channel_group,
ROUND ( sum(cost) ) AS cost
FROM student_37.data_table
JOIN student_37.dict
ON student_37.data_table.audiocode = student_37.dict.audiocode
JOIN student_37.channel_group
ON student_37.data_table.medium = student_37.channel_group.channel
GROUP BY advertiser,channel_group
ORDER BY advertiser)
SELECT
advertiser,
channel_group,
cost,
ROUND (cost::decimal/SUM(cost) OVER(PARTITION BY advertiser),2) AS sos_adv ,
ROUND (cost::decimal/SUM(cost) OVER(PARTITION BY channel_group),2) AS sos_channel_group ,
ROUND(cost / ( SELECT sum(cost) FROM sos),2) AS sos
FROM sos
ORDER BY advertiser,
array_position(ARRAY['TVP1','TVP2','TVP tem','TVN','TVN tem','Polsat','Polsat tem','unknown'],channel_group);
"company1";"B";"TV";16537
"company1";"C";"TV";20406
"company1";"D";"TV";33380
"company1";"E";"TV";193633
"company1";"F";"TV";14957
"company1";"G";"TV";5338
"company2";"A";"TV";46580
"company2";"B";"TV";56223
"company2";"G";"TV";80735
"company2";"H";"TV";80874
"company2";"J";"TV";38511
I want to get something like this. I don't have any records for those rows, so do I need to generate them somehow?
"company1";"A";"TV";
"company1";"B";"TV";16537
"company1";"C";"TV";20406
"company1";"D";"TV";33380
"company1";"E";"TV";193633
"company1";"F";"TV";14957
"company1";"G";"TV";5338
"company1";"I";"TV";
"company1";"J";"TV";
"company2";"A";"TV";46580
"company2";"B";"TV";56223
"company2";"C";"TV";
"company2";"D";"TV";
"company2";"E";"TV";
"company2";"F";"TV";
"company2";"G";"TV";56223
"company2";"H";"TV";80874
"company2";"I";"TV";
"company2";"J";"TV";38511
If you want all channel groups -- even those with no data -- then you want outer joins.
You don't provide sample data, but I am guessing that you want:
SELECT advertiser, channel_group,
ROUND( SUM(cost) ) AS cost
FROM student_37.channel_group cg LEFT JOIN
student_37.data_table dt
ON dt.medium = cg.channel LEFT JOIN
student_37.dict d
ON dt.audiocode = d.audiocode
GROUP BY advertiser, channel_group
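If the goal is one row per advertiser and channel group pair, a common variation (not part of the answer above) is to build every pair with a CROSS JOIN first and then LEFT JOIN the data onto it. The sketch below illustrates this with Python's built-in sqlite3 module; the simplified spend and all_groups tables and their rows are hypothetical stand-ins for the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical simplified tables: spend holds the facts, all_groups lists every channel group.
CREATE TABLE spend (advertiser TEXT, channel_group TEXT, cost INTEGER);
CREATE TABLE all_groups (channel_group TEXT);
INSERT INTO all_groups VALUES ('A'), ('B');
INSERT INTO spend VALUES ('company1', 'B', 16537), ('company2', 'A', 46580), ('company2', 'B', 56223);
""")

rows = conn.execute("""
SELECT a.advertiser, g.channel_group, ROUND(SUM(s.cost)) AS cost
FROM (SELECT DISTINCT advertiser FROM spend) AS a
CROSS JOIN all_groups AS g              -- every advertiser paired with every channel group
LEFT JOIN spend AS s
       ON s.advertiser = a.advertiser
      AND s.channel_group = g.channel_group
GROUP BY a.advertiser, g.channel_group
ORDER BY a.advertiser, g.channel_group
""").fetchall()

for row in rows:
    print(row)
```

Pairs with no matching rows come back with a NULL cost (None in Python), which can be wrapped in COALESCE(..., 0) if a zero is preferred.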

getting avg of column based on the result set

I have a select statement that breaks down the count of sales by country and price band (see example below).
The select statement looks as follows:
SELECT p.[Price Band]
,t.[Country]
,COUNT(o.[Order]) as [Order Count]
FROM #price p -- temp table
INNER JOIN country t ON p.CountryCode = t.countryCode
INNER JOIN sales o ON o.salesValue >= p.startPrice and o.salesValue < p.endPrice
GROUP BY p.[Price Band], t.[Country]
Based on this result, I want to get an average of the unit count, i.e. for all orders that are under 20, what is the average unit count, and the same for all the other bands. How can I do this?
It's most likely simple, but I can't think it through.
What I am after:
So as you can see, in the price band <20 in the UK the order count is 50, and the avg Units of that is 2. As I mentioned earlier, I want the Avg Units of all orders that are under 20 (of which there are 50 in the picture).
Is that clearer?
Thanks in advance.
EDIT:
The first table: assume it to be the source
And the second table gets the avg, that's what I am after.
Wouldn't you just use avg()?
SELECT p.[Price Band], t.[Country],
COUNT(*) as [Order Count],
AVG(o.Items) as [Avg Units]
FROM #price p INNER JOIN
country t
ON p.CountryCode = t.countryCode INNER JOIN
sales o
ON o.salesValue >= p.startPrice and o.salesValue < p.endPrice
GROUP BY p.[Price Band], t.[Country]
ORDER BY t.[Country], p.[Price Band]
Note: SQL Server does integer division of integers (so 3/2 = 1 not 1.5) and similarly for AVG(). It is more accurate to use a decimal point number. An easy way is to use AVG(items * 1.0).
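The integer division pitfall can be reproduced with the sketch below, which uses Python's built-in sqlite3 module as a stand-in. One caveat: SQLite's own AVG() always returns a floating-point value, unlike SQL Server's AVG() on an integer column, so the sketch spells the average out as SUM / COUNT to show the same truncation. The sales table and its two rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical order data: two orders with 1 and 2 items.
CREATE TABLE sales (items INTEGER);
INSERT INTO sales VALUES (1), (2);
""")

int_avg, dec_avg = conn.execute("""
SELECT SUM(items) / COUNT(*),        -- integer / integer truncates: 3 / 2 = 1
       SUM(items) * 1.0 / COUNT(*)   -- multiplying by 1.0 forces decimal arithmetic
FROM sales
""").fetchone()

print(int_avg, dec_avg)
```

Multiplying by 1.0 before dividing (or before averaging, as in AVG(items * 1.0)) is what keeps the fractional part.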

How to use rounded value for join in SQL

I'm just learning SQL today, and I never thought how fun it is until I started fiddling with it.
I've got a problem and I need some help.
I have 2 tables, Customer and Rate, with the details stated below.
Customer
idcustomer = int
namecustomer = varchar
rate = decimal(3,0)
with value as described:
idcustomer---namecustomer---rate
1---JOHN DOE---100
2---MARY JANE---90
3---CLIVE BAKER---12
4---DANIEL REYES---47
Rate
rate = decimal(3,0)
description = varchar(40)
with value as described:
rate---description
10---G Rank
20---F Rank
30---E Rank
40---D Rank
50---C Rank
60---B Rank
70---A Rank
80---S Rank
90---SS Rank
100---SSS Rank
Then I ran the query below in order to round all values in the customer.rate field and then inner join them with the rate table.
SELECT *, round(rate,-1) as roundedrate
FROM customer INNER JOIN rate ON customer.roundedrate = rate.rate
It didn't produce this result:
idcustomer---namecustomer---rate---roundedrate---description
1---JOHN DOE---100---100---SSS Rank
2---MARY JANE---90---90---SS Rank
3---CLIVE BAKER---12---10---G Rank
4---DANIEL REYES---47---50---C Rank
Is there anything wrong with my code ?
Your query should produce an 'ambiguous column' error because you're not specifying a table name when referring to rate (in round(rate,-1)), which exists in both tables.
Also, the FROM and JOIN parts of a SQL query are evaluated before the SELECT part, so you can't refer to the alias roundedrate in your ON clause.
Try this instead:
SELECT *, round(customer.rate,-1) as roundedrate
FROM customer INNER JOIN rate ON round(customer.rate,-1) = rate.rate
http://sqlfiddle.com/#!9/e94a60/2
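An alternative that keeps the alias is to compute it in a derived table, so that roundedrate already exists by the time the join is evaluated. The sketch below tries this on the question's data with Python's built-in sqlite3 module. One assumption to note: SQLite's ROUND() treats a negative digits argument as 0, so the sketch rounds to the nearest ten with ROUND(rate / 10.0) * 10 instead of round(rate, -1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (idcustomer INTEGER, namecustomer TEXT, rate INTEGER);
CREATE TABLE rate (rate INTEGER, description TEXT);
INSERT INTO customer VALUES
  (1, 'JOHN DOE', 100), (2, 'MARY JANE', 90), (3, 'CLIVE BAKER', 12), (4, 'DANIEL REYES', 47);
INSERT INTO rate VALUES
  (10, 'G Rank'), (20, 'F Rank'), (30, 'E Rank'), (40, 'D Rank'), (50, 'C Rank'),
  (60, 'B Rank'), (70, 'A Rank'), (80, 'S Rank'), (90, 'SS Rank'), (100, 'SSS Rank');
""")

rows = conn.execute("""
SELECT c.namecustomer, c.rate, c.roundedrate, r.description
FROM (
    -- the alias is defined here, one level below the join that uses it
    SELECT customer.*,
           CAST(ROUND(rate / 10.0) * 10 AS INTEGER) AS roundedrate
    FROM customer
) AS c
JOIN rate AS r ON c.roundedrate = r.rate
ORDER BY c.idcustomer
""").fetchall()

for row in rows:
    print(row)
```

Because the rounding happens inside the derived table, the outer ON clause can use c.roundedrate like any ordinary column.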
I would suggest a correlated subquery for this:
select c.*,
(select r.description
from rate r
where r.rate <= c.rate
order by r.rate desc
fetch first 1 row only
) as description
from customer c;
Note: fetch first 1 row only is ANSI standard SQL, which some databases do not support. MySQL uses limit. Older versions of SQL Server use select top 1 instead.