Retrieving the top 20% of a particular column in Oracle SQL? [duplicate]

This question already has answers here:
Top n percent top n%
(4 answers)
Closed 7 years ago.
I've created a database where workers have a charging rate and work a particular number of hours. From this I've done a select statement which simply displays some information about the worker, and then I've created a new column called Earnings which uses rate_per_hour * task_hours to give their total earnings.
My question is, is there a way to only display the top 20% highest earners based on the new Earnings column created in the select statement?
So far I have this:
SELECT worker.worker_id
,worker_first_name
,worker_surname
,worker_case_id
,task_hours
,rate.rate_id
,rate_per_hour
,task_hours * rate_per_hour AS Earnings
FROM worker
,note
,rate
WHERE worker.worker_id = note.worker_id
AND rate.rate_id = note.rate_id;
I just need to display the top 20% of earnings based on that new column I've made. Is this possible?
Thanks, apologies for my lack of experience!

First, you should use explicit join syntax. Second, you can do what you want using percentile_cont() or percentile_disc(). However, I often do this using row_number() and count():
SELECT wnr.*
FROM (SELECT w.worker_id, worker_first_name, worker_surname, worker_case_id,
task_hours, r.rate_id, rate_per_hour,
task_hours * rate_per_hour AS Earnings,
row_number() over (order by task_hours * rate_per_hour desc) as seqnum,
count(*) over () as cnt
FROM worker w JOIN
note n
ON w.worker_id = n.worker_id JOIN
rate r
ON r.rate_id = n.rate_id
) wnr
WHERE seqnum <= cnt * 0.2;
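To see the row_number() / count() filter in action, here is a minimal sketch that runs the same pattern from Python against an in-memory SQLite database. SQLite is only a stand-in for Oracle here (it needs SQLite 3.25 or later for window functions), and the `earners` table with its ten rows is made up for illustration.

```python
import sqlite3

# Hypothetical data: ten workers with distinct earnings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earners (worker_id INTEGER, earnings REAL)")
conn.executemany(
    "INSERT INTO earners VALUES (?, ?)",
    [(1, 100.0), (2, 500.0), (3, 300.0), (4, 200.0), (5, 400.0),
     (6, 50.0), (7, 450.0), (8, 250.0), (9, 350.0), (10, 150.0)],
)

# Number the rows from highest to lowest earnings, attach the total row
# count to every row, then keep the rows whose position is within 20%.
top_20_percent = conn.execute(
    """
    SELECT worker_id, earnings
    FROM (SELECT worker_id, earnings,
                 row_number() OVER (ORDER BY earnings DESC) AS seqnum,
                 count(*) OVER () AS cnt
          FROM earners) ranked
    WHERE seqnum <= cnt * 0.2
    ORDER BY earnings DESC
    """
).fetchall()
print(top_20_percent)
```

With ten rows the filter keeps positions 1 and 2, so only the two highest earners come back.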

You could also use the rank() analytic function instead of row_number() if you want equal earnings to share the same rank.
Select * from (
Select employee,earnings,rank() over (order by earnings desc )/count(*) over() As top from employees)
Where top<=0.2;
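The difference between rank() and row_number() only shows up when earnings tie at the cutoff. The sketch below illustrates that, again using SQLite from Python as a stand-in for Oracle; the `employees` rows are invented, and the `* 1.0` is there only because SQLite would otherwise do integer division (Oracle does not need it).

```python
import sqlite3

# Hypothetical data: two employees tie for second place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee INTEGER, earnings INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 500), (2, 400), (3, 400), (4, 300), (5, 250),
     (6, 200), (7, 150), (8, 100), (9, 90), (10, 80)],
)

# rank() gives tied earnings the same rank, so both 400s have rank 2
# and a share of 2/10 = 0.2, which passes the filter.
top_with_ties = conn.execute(
    """
    SELECT employee, earnings
    FROM (SELECT employee, earnings,
                 rank() OVER (ORDER BY earnings DESC) * 1.0
                     / count(*) OVER () AS share
          FROM employees) ranked
    WHERE share <= 0.2
    ORDER BY earnings DESC, employee
    """
).fetchall()
print(top_with_ties)
```

Here rank() returns three rows for a 20% cutoff on ten rows, whereas row_number() would return exactly two and pick one of the tied rows arbitrarily.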

Related

Top 1 Profitable movie for each decade [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed last month.
I need help with SQL Server query. I need to extract only the top 1 profitable movie for each decade.
Suppose there are 20 distinct decades and I only want to extract the top 1 profitable movie for each decade. Can someone help me with the query?
I have attached the screen shot for the reference. My result shows all the profitable movies for each decade. I only want the top 1 profitable movie for each decade.
Select
decade, Movie_Title, Profit
from
DW.IMDB_MOVIE
group by
decade, Movie_Title, profit
order by
decade, profit desc
One option is using WITH TIES in concert with the window function row_number()
Example
Select top 1 with ties *
From DW.IMDB_MOVIE
Order by row_number() over (partition by decade order by profit desc)
Or a nudge more performant
with cte as (
Select *
,RN = row_number() over (partition by decade order by profit desc)
From DW.IMDB_MOVIE
)
Select *
From cte
Where RN=1
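As a rough check of the row_number() per partition idea, the following sketch runs the CTE form against a tiny in-memory SQLite table from Python. SQLite stands in for SQL Server here (so there is no TOP ... WITH TIES), and the `movies` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical data: a few movies spread over three decades.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (decade INTEGER, movie_title TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(1990, "A", 10), (1990, "B", 30), (2000, "C", 20),
     (2000, "D", 5), (2010, "E", 7)],
)

# Number the movies inside each decade by profit, highest first,
# and keep only the first row of every decade.
top_per_decade = conn.execute(
    """
    WITH cte AS (
        SELECT decade, movie_title, profit,
               row_number() OVER (PARTITION BY decade ORDER BY profit DESC) AS rn
        FROM movies
    )
    SELECT decade, movie_title, profit
    FROM cte
    WHERE rn = 1
    ORDER BY decade
    """
).fetchall()
print(top_per_decade)
```

Changing the filter to `rn <= 2` would return the top two movies per decade instead.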

SQL Selecting dates with maximum sale for each department [duplicate]

This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Oracle SQL query: Retrieve latest values per group based on time [duplicate]
(2 answers)
Get value based on max of a different column grouped by another column [duplicate]
(1 answer)
SQL: getting the max value of one column and the corresponding other columns [duplicate]
(2 answers)
Closed 3 years ago.
I am troubled with writing a tricky query.
I have the following table:
For each department I want to print date with largest profit;
I tried coming up with such a query myself:
Select DISTINCT(Name), Date_sale, MAX(A) as B FROM (SELECT
Departments.Name, SALES.Date_sale, SUM(GOODS.Price * SALES.Quantity)
AS A FROM DEPARTMENTS, GOODS, SALES
WHERE DEPARTMENTS.Dept_id = GOODS.Dept_id AND GOODS.Good_id =
SALES.Good_id GROUP BY DEPARTMENTs.Name, SALES.Date_sale)
GROUP BY Name, Date_sale;
But the problem is that departments are printed several times because I grouped by both name and date.
How should I fix it?
You can try it the following way:
with cte as
(
SELECT
Departments.Name, SALES.Date_sale, SUM(GOODS.Price * SALES.Quantity)
AS profit FROM DEPARTMENTS inner join GOODS on DEPARTMENTS.Dept_id = GOODS.Dept_id
inner join SALES on GOODS.Good_id = SALES.Good_id
GROUP BY DEPARTMENTs.Name, SALES.Date_sale
)
select * from cte a
where profit =
(select max(profit) from cte b where a.Name = b.Name)
Or you can use row_number() (with the same cte definition in front):
select * from
(
select cte.*, row_number() over(partition by Name order by profit desc) as rn
from cte
) A where rn=1
You can write it using ROW_NUMBER, which numbers each date's total within its department from highest to lowest, as follows; you can then take the date with the highest sales using rn = 1:
SELECT NAME, DATE_SALE, A
FROM
(
SELECT
DEPARTMENTS.NAME, SALES.DATE_SALE,
ROW_NUMBER() OVER(
PARTITION BY DEPARTMENTS.NAME
ORDER BY SUM(GOODS.PRICE * SALES.QUANTITY) DESC NULLS LAST
) AS RN,
SUM(GOODS.PRICE * SALES.QUANTITY) AS A
FROM DEPARTMENTS
JOIN GOODS ON ( DEPARTMENTS.DEPT_ID = GOODS.DEPT_ID )
JOIN SALES ON ( GOODS.GOOD_ID = SALES.GOOD_ID )
GROUP BY DEPARTMENTS.NAME,
SALES.DATE_SALE
)
WHERE RN = 1;
Important: use standard ANSI joins.
Cheers!!
I would use joins here, since you need to pull information from two tables that are linked via a third table.
Something like this (but I have not tested this query, just suggesting an approach):
Select departments.name as dept, MAX(sales.quantity) as max_sales, sales.date_sale
from goods
Left outer join departments on departments.dept_id = goods.dept_id
Left outer join sales on sales.good_id = goods.good_id
Group by departments.name, sales.date_sale

How to query samples in relativity?

I have a large data set with about 100 million rows. I want to 'compress' it and get a 1% sample of the entire data set while ensuring relativity (that is, keeping the relative distribution of the data).
How can such query be implemented?
Step 1: create the helper table
You can use aggregation to group records by visit_number, and CROSS JOIN with a query that computes the total number of records in the table, to compute each group's share of the table (its distribution). Multiplying by 1.0 avoids integer division, which would otherwise round every share down to 0 in some databases:
CREATE TABLE my_helper AS
SELECT
t.visit_number,
COUNT(*) visit_count,
SUM(t.purchase_id) sum_purchase,
COUNT(*) * 1.0 / total.cnt distribution
FROM
mytable t
CROSS JOIN (SELECT COUNT(*) cnt FROM mytable) total
GROUP BY t.visit_number, total.cnt
Step 2: sample the main table using the helper table
Within a subquery, you can use ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) to assign a random rank to each record within groups of records sharing the same visit_number. Then, in the outer query, you can join on the helper table to select the correct number of records for each visit_number:
SELECT x.*
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) rn
FROM mytable t
) x
INNER JOIN my_helper h ON h.visit_number = x.visit_number
WHERE x.rn <= 1000000 * h.distribution
Side notes:
this only works if there are indeed more than 1 million records in the source table
the exact number of records in the output might be slightly below or above 1 million (depending on the distribution in the original table)
it should be possible to combine both queries into a single one, which would avoid the need to use a helper table
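As a sketch of the last side note, the two steps can be folded into one query: the target size times a group's share of the table equals the sampling fraction times the group's own row count, so a per-partition count(*) can replace the helper table. The example below runs this from Python against SQLite as a stand-in database; the 1,000-row `mytable`, the 10% fraction, and the `group_cnt` alias are assumptions made for illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical data: 1,000 rows split 700 / 200 / 100 over three visit numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, visit_number INTEGER)")
rows = [(1,)] * 700 + [(2,)] * 200 + [(3,)] * 100
conn.executemany("INSERT INTO mytable (visit_number) VALUES (?)", rows)

fraction = 0.10  # sampling fraction; the question asks for 0.01

# Shuffle the rows inside each visit_number group, then keep the first
# fraction * group size rows of every group.
sample = conn.execute(
    """
    SELECT id, visit_number
    FROM (SELECT t.id, t.visit_number,
                 row_number() OVER (PARTITION BY t.visit_number
                                    ORDER BY random()) AS rn,
                 count(*) OVER (PARTITION BY t.visit_number) AS group_cnt
          FROM mytable t) x
    WHERE rn <= group_cnt * ?
    """,
    (fraction,),
).fetchall()

# Which rows are picked is random, but the per-group sizes are fixed.
sample_counts = Counter(visit for _id, visit in sample)
print(dict(sample_counts))
```

The sample keeps the 70% / 20% / 10% split of the source table, which is the property the helper table was built to preserve.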
This is doable. A quick way is to take only every nth record.
1) order by an arbitrary column (probably ID)
2) apply a rownum (row number) attribute
3) filter on mod(rownum, n) = 0 for whatever percentage makes sense (e.g. 1% would be mod(rownum, 100) = 0)
You may need steps 1/2 in a subquery and step 3 on the outside.
Enjoy and good luck!
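The three steps above can be sketched as follows, using SQLite from Python as a stand-in database. Here row_number() plays the role of Oracle's rownum and `%` the role of mod(); the 1,000-row `mytable` is hypothetical. Note that this is a systematic sample rather than a random one, so it is only as representative as the ordering column allows.

```python
import sqlite3

# Hypothetical data: 1,000 rows with ids 1..1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 1001)],
)

# Steps 1 and 2 in the subquery: order by id and number the rows.
# Step 3 outside: keep every 100th row, which is a 1% sample.
every_100th = conn.execute(
    """
    SELECT id
    FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn
          FROM mytable) numbered
    WHERE rn % 100 = 0
    ORDER BY id
    """
).fetchall()
print(every_100th)
```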

SQL Getting Top 2 Results for each individual column value

I have a table 'Cashup_Till' that records all data on what a particular till has recorded in a venue for a given day, each venue has multiple tills all with a designated number 'Till_No'. I need to get the previous 2 days entries for each till number. For each till Individually I can do this...
SELECT TOP 2 T.* FROM CashUp_Till T
WHERE T.Till_No = (Enter Till Number Here)
ORDER BY T.Till_Id DESC
Some venues have 20-30 tills so Ideally I need to do all the tills in one call. I can pass in a user defined table type of till numbers, then select them in a subquery, but that's as far as my SQL knowledge takes me, does anyone have a solution?
Here is one way:
SELECT T.*
FROM (SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY Till_No ORDER BY Till_Id DESC) as seqnum
FROM CashUp_Till T
) T
WHERE seqnum <= 2;
This assumes that there is one record per day, which I believe is suggested by the question.
If you have a separate table of tills (called t below), then:
select ct.*
from t cross apply
(select top 2 ct.*
from cashup_till ct
where ct.till_no = t.till_no
order by till_id desc
) ct;

Query in sql to get the top 10 percent in standard sql (without limit, top and the likes, without window functions)

I'm wondering how to retrieve the top 10% athletes in terms of points, without using any clauses such as TOP, Limit etc, just a plain SQL query.
My idea so far:
Table Layout:
Score:
ID | Name | Points
Query:
select *
from Score s
where 0.10 * (select count(*) from Score x) >
(select count(*) from Score p where p.Points > s.Points)
Is there an easier way to do this? Any suggestions?
In most databases, you would use the ANSI standard window functions:
select s.*
from (select s.*,
count(*) over () as cnt,
row_number() over (order by points desc) as seqnum
from Score s
) s
where seqnum*10 <= cnt;
Try:
select s1.id, s1.name, s1.points, count(s2.points)
from score s1 left join score s2
on s2.points > s1.points
group by s1.id, s1.name, s1.points
having count(s2.points) < (select count(*)*.1 from score)
Basically, this calculates the number of players with a higher score than the current player; if that count is less than 10% of the count of all scores, the player is in the top 10%. The left join keeps the top scorer, who has no one above them and would otherwise drop out of the result.
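For a quick sanity check of the count-based approach, here is a minimal sketch using SQLite from Python with twenty made-up athletes. It assumes a left join (so the top scorer, with nobody above them, is kept) and a strict comparison, and it multiplies by 10 instead of by 0.1 to stay in integer arithmetic; none of the sample data comes from the question.

```python
import sqlite3

# Hypothetical data: twenty athletes with distinct points 1..20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (id INTEGER, name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [(i, f"athlete_{i}", i) for i in range(1, 21)],
)

# For each athlete, count how many athletes have strictly more points.
# An athlete is in the top 10% if fewer than 10% of all rows beat them.
top_10_percent = conn.execute(
    """
    SELECT s1.id, s1.points
    FROM score s1
    LEFT JOIN score s2 ON s2.points > s1.points
    GROUP BY s1.id, s1.name, s1.points
    HAVING count(s2.points) * 10 < (SELECT count(*) FROM score)
    ORDER BY s1.points DESC
    """
).fetchall()
print(top_10_percent)
```

No window functions, TOP, or LIMIT are involved, at the cost of a self-join that compares every row with every other row.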
The PERCENTILE_DISC function is standard SQL and can help you here. Not every SQL implementation supports it, but the following should work in SQL Server 2012, for example. If you need to be particular about ties, or what the top 10% means if there are fewer than 10 athletes, make sure this is computing what you want. PERCENTILE_CONT may be a better option for some questions.
WITH C(cutoff) AS (
SELECT DISTINCT
PERCENTILE_DISC(0.90)
WITHIN GROUP (ORDER BY points)
OVER ()
FROM Score
)
SELECT *
FROM Score JOIN C
ON points >= cutoff;