SELECT TOP 20 Percent SQL

I have a query that selects the TOP 20 PERCENT of employees with the highest GrandTotal, but the result is not fair. For example, 20 percent of 10 people is 2, so the output shows this:
EmpName GrandTotal
Kelvin 50
Gem 40
But the 3rd and 4th people also have a GrandTotal of 40. I need some ideas and advice: how can I solve this problem?
SELECT TOP 20 PERCENT
EmpName,
SUM(Scoring) AS GrandTotal
FROM
[masterView]
GROUP BY
EmpName
ORDER BY
GrandTotal DESC, EmpName ASC

On SQL Server you can use WITH TIES to include ties:
SELECT TOP 20 PERCENT WITH TIES Id, sum(Score) as GrandTotal
FROM myTable GROUP BY Id
ORDER BY GrandTotal DESC

Test Data
CREATE TABLE Table1
([ID] int, [Score] int)
;
INSERT INTO Table1
([ID], [Score])
VALUES
(1, 10), (2, 20),
(3, 30), (4, 20),
(5, 10), (6, 40),
(7, 40), (8, 50),
(9, 10), (10, 5);
Query
with ranked as (
select
id,
rank() over (order by Score desc) as rnk
from Table1
),
total as (
select count(*) as total
from Table1
)
SELECT *
FROM ranked
CROSS JOIN total
WHERE ranked.rnk <= 0.2 * total.total
OUTPUT
| id | rnk | total |
|----|-----|-------|
| 8 | 1 | 10 |
| 6 | 2 | 10 |
| 7 | 2 | 10 |
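The rank-based answer can be combined with the GROUP BY from the question, so that ties are kept without SQL Server's WITH TIES. The sketch below uses SQLite (through Python's sqlite3 module, which needs SQLite 3.25+ for window functions) purely as a runnable stand-in; the employee names other than Kelvin and Gem are hypothetical sample data. Note that COUNT(*) OVER () here counts employees after grouping, not raw rows, and the cutoff is not rounded up, so it may differ from TOP n PERCENT when 20% of the row count is not a whole number.

```python
import sqlite3

# Hypothetical sample data: several Scoring rows per employee (10 employees).
SAMPLE = [
    ("Kelvin", 30), ("Kelvin", 20), ("Gem", 40), ("Amy", 25), ("Amy", 15),
    ("Ben", 40), ("Cara", 30), ("Dan", 20), ("Eve", 20), ("Fay", 10),
    ("Gus", 10), ("Hal", 5),
]

# RANK() gives tied GrandTotals the same rank, so filtering on the rank
# keeps every employee tied with the last qualifying position.
QUERY = """
    WITH totals AS (
        SELECT EmpName, SUM(Scoring) AS GrandTotal
        FROM masterView
        GROUP BY EmpName
    ),
    ranked AS (
        SELECT EmpName, GrandTotal,
               RANK() OVER (ORDER BY GrandTotal DESC) AS rnk,
               COUNT(*) OVER () AS total
        FROM totals
    )
    SELECT EmpName, GrandTotal
    FROM ranked
    WHERE rnk <= ? * total / 100.0
    ORDER BY GrandTotal DESC, EmpName ASC
"""


def top_percent_with_ties(rows, percent=20):
    """Return (EmpName, GrandTotal) pairs in the top `percent`, ties included."""
    con = sqlite3.connect(":memory:")
    try:
        # masterView is created as a plain table in this stand-in.
        con.execute("CREATE TABLE masterView (EmpName TEXT, Scoring INTEGER)")
        con.executemany("INSERT INTO masterView VALUES (?, ?)", rows)
        return con.execute(QUERY, (percent,)).fetchall()
    finally:
        con.close()


print(top_percent_with_ties(SAMPLE))
# [('Kelvin', 50), ('Amy', 40), ('Ben', 40), ('Gem', 40)]
```

With 10 employees the cutoff is rank 2, and because Amy, Ben and Gem all share rank 2, all three are returned alongside Kelvin.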


SQL Occurrence of Sequence Number

I want to find whether any Name has 4 or more consecutive SeqNo values in a row.
Even if there is a break in SeqNo, I still need that Name as long as 4 or more of its rows are consecutive.
Example:
SeqNo | Name
10 | A
15 | A
16 | A
17 | A
18 | A
9 | B
10 | B
13 | B
14 | B
6 | C
7 | C
9 | C
10 | C
OUTPUT:
A
Below is a script for anyone who wants to help:
create table testseq (Id int, Name char)
INSERT into testseq values
(10, 'A'),
(15, 'A'),
(16, 'A'),
(17, 'A'),
(18, 'A'),
(9, 'B'),
(10, 'B'),
(13, 'B'),
(14, 'B'),
(6, 'C'),
(7, 'C'),
(9, 'C'),
(10, 'C')
SELECT * FROM testseq
You can use some gaps-and-islands techniques for this.
If you want names that have at least 4 consecutive records where seqno increases by 1, you can use the difference between seqno and `row_number()` to define the groups, and then aggregate:
select distinct name
from (
select t.*, row_number() over(partition by name order by seqno) rn
from testseq t
) t
group by name, rn - seqno
having count(*) >= 4
For your sample data, this returns A: its seqno values 15, 16, 17 and 18 form a run of 4 consecutive records, while B and C only have runs of two.
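To check the gaps-and-islands query against the sample data, here is a small sketch that runs it on SQLite (3.25+ for window functions) via Python's sqlite3 module. As an assumption, the column is named seqno to match the answer's query, although the question's script calls it Id; the sketch also assumes a name never repeats the same seqno.

```python
import sqlite3

# Sample rows from the question, as (seqno, name).
ROWS = [(10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
        (9, "B"), (10, "B"), (13, "B"), (14, "B"),
        (6, "C"), (7, "C"), (9, "C"), (10, "C")]

QUERY = """
    SELECT DISTINCT name
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY seqno) AS rn
        FROM testseq t
    ) t
    GROUP BY name, rn - seqno   -- rn - seqno stays constant along a +1 run
    HAVING COUNT(*) >= ?
    ORDER BY name
"""


def names_with_run(rows, min_run=4):
    """Names having at least `min_run` consecutive seqno values."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
        con.executemany("INSERT INTO testseq VALUES (?, ?)", rows)
        return [name for (name,) in con.execute(QUERY, (min_run,))]
    finally:
        con.close()


print(names_with_run(ROWS))     # ['A']
print(names_with_run(ROWS, 2))  # ['A', 'B', 'C']
```

Lowering the threshold to 2 shows the shorter runs in B (9, 10 and 13, 14) and C (6, 7 and 9, 10).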
I don't really view this as a "gaps-and-islands" problem. You are just looking for a minimum number of adjacent rows. This is easily handled using lag() or lead():
select t.*
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
This checks the sequence number three rows ahead within the same name. If it equals seqno + 3, that name has four consecutive sequence numbers in a row.
If you just want the names, without duplicates:
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
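The lead() version can be exercised the same way. This sketch runs it on SQLite (3.25+) through Python's sqlite3 module as a stand-in, with a hypothetical testseq table whose column is named seqno; like the answer, it assumes no duplicate seqno within a name.

```python
import sqlite3

# Sample rows from the question, as (seqno, name).
ROWS = [(10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
        (9, "B"), (10, "B"), (13, "B"), (14, "B"),
        (6, "C"), (7, "C"), (9, "C"), (10, "C")]

# LEAD(seqno, 3) looks three rows ahead within the same name; if that value
# equals seqno + 3, the four rows seqno .. seqno + 3 are consecutive.
QUERY = """
    SELECT DISTINCT name
    FROM (
        SELECT t.*,
               LEAD(seqno, 3) OVER (PARTITION BY name ORDER BY seqno) AS seqno_name_3
        FROM testseq t
    ) t
    WHERE seqno_name_3 = seqno + 3
    ORDER BY name
"""


def names_with_four_in_a_row(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
        con.executemany("INSERT INTO testseq VALUES (?, ?)", rows)
        return [name for (name,) in con.execute(QUERY)]
    finally:
        con.close()


print(names_with_four_in_a_row(ROWS))  # ['A']
```

For A, the row with seqno 15 has seqno_name_3 = 18, which is exactly 15 + 3; no row of B or C satisfies the condition.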
If the sequence numbers can have gaps (but are otherwise adjacent):
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3,
lead(seqno, 3) over (order by seqno) as seqno_3
from t
) t
where seqno_name_3 = seqno_3;
A solution in plain SQL, no LAG() or LEAD() or ROW_NUMBER():
SELECT t1.Name
FROM testseq t1
WHERE (
SELECT count(t2.Id)
FROM testseq t2
WHERE t2.Name=t1.Name
and t2.Id between t1.Id and t1.Id+3
GROUP BY t2.Name)>=4
GROUP BY t1.Name;

Unexpected behavior of window function first_value

I have 2 columns, order no and value. Table value constructor:
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
I need to get:
(1, 5) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(2, 5)
,(3, 2) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(4, 2) -- analogous
,(5, 2)
,(6, 1)
This is the query that I think should work:
;with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select
*
,first_value(Value) over (
order by
case when Value is not null then 0 else 1 end
, OrderNo
rows between current row and unbounded following
) as X
from SourceTable
order by OrderNo
The issue is that it returns exactly the same result set as SourceTable, and I don't understand why. E.g., when the first row (OrderNo = 1) is processed, I'd expect column X to return 5, because the frame should include all rows (current row and unbounded following) and it orders by Value (non-nulls first), then by OrderNo. So the first row in the frame should be OrderNo = 2. Obviously it doesn't work like that, but I don't get why.
I'd much appreciate it if someone could explain how the first frame is constructed. I need this for SQL Server and also PostgreSQL.
Many thanks
Although probably more expensive than two window functions, you can do this without a subquery using arrays:
with SourceTable as (
select *
from (values (1, null),
(2, 5),
(3, null),
(4, null),
(5, 2),
(6, 1)
) T(OrderNo, Value)
)
select st.*,
(array_remove(array_agg(value) over (order by orderno rows between current row and unbounded following), null))[1] as x
from SourceTable st
order by OrderNo;
Or using a lateral join:
select st.*, st2.value
from SourceTable st left join lateral
(select st2.*
from SourceTable st2
where st2.value is not null and st2.orderno >= st.orderno
order by st2.orderno asc
limit 1
) st2
on 1=1
order by OrderNo;
With the right indexes on the source table, the lateral join might be the best solution from a performance perspective (I have been surprised by the performance of lateral joins under the right circumstances).
It's pretty easy to see why first_value doesn't work if you order the results by case when Value is not null then 0 else 1 end, orderno:
orderno | value | x
---------+-------+---
2 | 5 | 5
5 | 2 | 2
6 | 1 | 1
1 | |
3 | |
4 | |
(6 rows)
For orderno = 1, there is nothing non-null after it in the frame: every non-null row sorts before the null rows, so the frame that starts at orderno = 1 contains only nulls.
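The behaviour can be reproduced with the question's own query. This sketch runs it on SQLite (3.25+ for window functions) through Python's sqlite3 module, used here only as a stand-in for SQL Server and PostgreSQL; it shows that X always equals Value, because the frame follows the window's ORDER BY rather than the final ORDER BY OrderNo.

```python
import sqlite3

# The question's query; the frame is built in the window's own sort order
# (non-nulls first), so each NULL row only "sees" the NULL rows after it.
QUERY = """
    WITH SourceTable(OrderNo, Value) AS (
        VALUES (1, NULL), (2, 5), (3, NULL), (4, NULL), (5, 2), (6, 1)
    )
    SELECT OrderNo, Value,
           FIRST_VALUE(Value) OVER (
               ORDER BY CASE WHEN Value IS NOT NULL THEN 0 ELSE 1 END, OrderNo
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS X
    FROM SourceTable
    ORDER BY OrderNo
"""

con = sqlite3.connect(":memory:")
rows = con.execute(QUERY).fetchall()
con.close()

for row in rows:
    print(row)  # (OrderNo, Value, X); SQL NULL comes back as None
```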
Instead, we can arrange the orders into groups using count as a window function in a sub-query. We then use max as a window function over that group (this is arbitrary, min would work just as well) to get the one non-null value in that group:
with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select orderno, order_group, max(value) OVER (PARTITION BY order_group) FROM (
SELECT *,
count(value) OVER (ORDER BY orderno DESC) as order_group
from SourceTable
) as sub
order by orderno;
orderno | order_group | max
---------+-------------+-----
1 | 3 | 5
2 | 3 | 5
3 | 2 | 2
4 | 2 | 2
5 | 2 | 2
6 | 1 | 1
(6 rows)
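The count-based grouping is portable enough to run on SQLite (3.25+) as well. The sketch below uses Python's sqlite3 module as a stand-in and relies on the default window frame, under which COUNT(Value) OVER (ORDER BY OrderNo DESC) is a running count of non-null values read from the last row backwards. Each non-null value therefore shares an order_group with the NULL rows that precede it, and MAX() spreads that value across the group.

```python
import sqlite3

QUERY = """
    WITH SourceTable(OrderNo, Value) AS (
        VALUES (1, NULL), (2, 5), (3, NULL), (4, NULL), (5, 2), (6, 1)
    )
    SELECT OrderNo, MAX(Value) OVER (PARTITION BY order_group) AS X
    FROM (
        -- running count of non-null values, counted from the end
        SELECT *, COUNT(Value) OVER (ORDER BY OrderNo DESC) AS order_group
        FROM SourceTable
    ) AS sub
    ORDER BY OrderNo
"""

con = sqlite3.connect(":memory:")
result = con.execute(QUERY).fetchall()
con.close()

print(result)  # [(1, 5), (2, 5), (3, 2), (4, 2), (5, 2), (6, 1)]
```

Any NULL rows after the last non-null value would fall into order_group 0 and keep X as NULL, which matches "no non-null value ahead".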

Calculate percentage / aggregation based on a baseline row

I would like to calculate the productivity of a sales team compared to a specific team member.
Given this query:
with t1 (rep_id, place_id, sales_qty) as (values
(0, 1, 3),
(1, 1, 1),
(1, 2, 2),
(1, 3, 4),
(1, 4, 1),
(2, 2, 1),
(2, 3, 3)
)
select
rep_id,
count(distinct place_id) as qty_places,
sum(sales_qty) as qty,
sum(sales_qty) / count(place_id) as productivity
from
t1
group by
rep_id
Result:
rep_id | qty_places | qty_sales | productivity
---------------------------------------------
0 | 1 | 6 | 6
1 | 4 | 22 | 5
2 | 2 | 9 | 4
I would like to express each rep's productivity relative to the productivity of rep_id = 1, so I would like something like this:
rep_id | qty_places | qty_sales | productivity | productivity %
--------------------------------------------------------------
0 | 1 | 6 | 6 | 1.2
1 | 4 | 22 | 5 | 1 <- Baseline
2 | 2 | 9 | 4 | 0.8
How can I achieve that with SQL on PostgreSQL?
This should do the trick:
with t1 (rep_id, place_id, sales_qty) as (values
(0, 1, 3),
(1, 1, 1),
(1, 2, 2),
(1, 3, 4),
(1, 4, 1),
(2, 2, 1),
(2, 3, 3)
),
cte as (select
rep_id,
count(distinct place_id) as qty_places,
sum(sales_qty) as qty,
sum(sales_qty) / count(place_id) as productivity
from
t1
group by
rep_id)
select rep_id, qty_places, qty, productivity,
productivity::numeric/(select productivity::numeric from cte where rep_id = 1)
as productivity_percent from cte
We can try computing the rep_id = 1 figures in a separate CTE, and then cross join that to your current table:
WITH cte AS (
SELECT SUM(CASE WHEN rep_id = 1 THEN sales_qty ELSE 0 END) /
COUNT(CASE WHEN rep_id = 1 THEN 1 END) AS baseline
FROM t1
)
SELECT
rep_id,
COUNT(DISTINCT place_id) AS qty_places,
SUM(sales_qty) AS qty,
SUM(sales_qty) / COUNT(place_id) AS productivity,
(1.0*SUM(sales_qty) / COUNT(place_id)) / t2.baseline AS productivity_pct
FROM t1
CROSS JOIN cte t2
GROUP BY
t1.rep_id, t2.baseline;
Simply use conditional aggregation. I would do this using a subquery:
select t.*,
productivity / max(productivity) filter (where rep_id = 1) over ()
from (select rep_id,
count(distinct place_id) as qty_places,
sum(sales_qty) as qty,
sum(sales_qty)::numeric / count(place_id) as productivity
from t1
group by rep_id
) t
Note that you can actually express this without the subquery, but I think that just makes the query more complicated.
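The window-function approach can be sketched on SQLite (3.25+) through Python's sqlite3 module. As a swap, MAX(CASE WHEN rep_id = ? THEN productivity END) OVER () is used instead of the FILTER clause, because it is plain conditional aggregation and does not depend on FILTER support for window functions; the baseline rep is a hypothetical parameter. Note that the sample rows as written produce totals of 3, 8 and 4, not the 6, 22 and 9 shown in the question's illustrative result.

```python
import sqlite3

QUERY = """
    WITH t1(rep_id, place_id, sales_qty) AS (
        VALUES (0, 1, 3), (1, 1, 1), (1, 2, 2), (1, 3, 4),
               (1, 4, 1), (2, 2, 1), (2, 3, 3)
    ),
    per_rep AS (
        SELECT rep_id,
               COUNT(DISTINCT place_id) AS qty_places,
               SUM(sales_qty) AS qty,
               SUM(sales_qty) * 1.0 / COUNT(place_id) AS productivity  -- avoid integer division
        FROM t1
        GROUP BY rep_id
    )
    SELECT rep_id, qty_places, qty, productivity,
           productivity
             / MAX(CASE WHEN rep_id = ? THEN productivity END) OVER () AS productivity_pct
    FROM per_rep
    ORDER BY rep_id
"""


def relative_productivity(baseline_rep=1):
    """Each rep's productivity divided by the baseline rep's productivity."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(QUERY, (baseline_rep,)).fetchall()
    finally:
        con.close()


for row in relative_productivity():
    print(row)
# (0, 1, 3, 3.0, 1.5)
# (1, 4, 8, 2.0, 1.0)
# (2, 2, 4, 2.0, 1.0)
```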

Remove duplicates values when all values are the same

I am using SQL Workbench/J connected to Amazon Redshift.
I have the following data in a table (there are more columns that need to be kept but are all the exact same values for each unique claim_id regardless of line number):
Member ID | Claim_ID | Line_Number |
1 100 1
1 100 2
1 100 1
1 100 2
2 101 13
2 101 13
2 101 13
2 101 13
3 102 12
3 102 12
1 103 2
1 103 2
I want it to become the following which will remove any duplicates based on claim_id (it does not matter which line number is kept):
Member ID | Claim_ID | Line_Number |
1 100 1
2 101 13
3 102 12
1 103 2
I have tried the following:
select er_main.member_id, er_main.claim_id, er_main.line_number,
temp.claim_id, temp.line_number
from OK_ER_30 er_main
inner join (
select row_number() over (partition by claim_id order by line_number desc) as seqnum
from
OK_ER_30 temp) temp
ON er_main.claim_id = temp.claim_id and seqnum = 1
Order by er_main.claim_id, temp.line_number
and this:
select * from ok_er_30
where claim_id in
(select distinct claim_id
from ok_er_30
group by claim_id
)
order by claim_id desc
I have checked many other ways of pulling only one row per distinct claim_id but nothing has worked.
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number
from OK_ER_30
group by Member_ID, Claim_ID
Check out the following code (SQL Server syntax):
declare @OK_ER_30 table(Member_ID int, Claim_ID int, Line_Number int);
insert @OK_ER_30 values
(1, 100, 1),
(1, 100, 2),
(1, 100, 1),
(1, 100, 2),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(3, 102, 12),
(3, 102, 12),
(1, 103, 2),
(1, 103, 2);
-- number the rows within each (Member_ID, Claim_ID) pair, in arbitrary order
with
t as(
select *, row_number() over(
partition by Member_ID, Claim_ID order by (select 0)
) rn
from @OK_ER_30
)
-- deleting through the CTE keeps only the first row of each pair
delete from t where rn > 1;
select * from @OK_ER_30;
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number
from OK_ER_30
group by Member_ID, Claim_ID
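If you want to keep one whole row per claim_id (including the other columns) without deleting anything, the row_number() idea also works as a plain SELECT. This sketch runs it on SQLite (3.25+) through Python's sqlite3 module as a stand-in for Redshift, using a hypothetical ok_er_30 table with only the three columns shown; ordering by line_number inside the window is an arbitrary choice, since the question says any line number may be kept.

```python
import sqlite3

# Sample rows from the question, as (member_id, claim_id, line_number).
ROWS = [(1, 100, 1), (1, 100, 2), (1, 100, 1), (1, 100, 2),
        (2, 101, 13), (2, 101, 13), (2, 101, 13), (2, 101, 13),
        (3, 102, 12), (3, 102, 12), (1, 103, 2), (1, 103, 2)]

QUERY = """
    SELECT member_id, claim_id, line_number
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY claim_id ORDER BY line_number) AS rn
        FROM ok_er_30
    ) numbered
    WHERE rn = 1          -- exactly one row survives per claim_id
    ORDER BY claim_id
"""


def one_row_per_claim(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE ok_er_30 "
                    "(member_id INTEGER, claim_id INTEGER, line_number INTEGER)")
        con.executemany("INSERT INTO ok_er_30 VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in one_row_per_claim(ROWS):
    print(row)
# (1, 100, 1)
# (2, 101, 13)
# (3, 102, 12)
# (1, 103, 2)
```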

Get sum of all rows in each row

Is it possible to get the sum of all rows in each row? For example:
Rows | TotalCount
1 | 20
2 | 30
3 | 10
4 | 60
I want to get the following result:
Rows | TotalCount
1 | 120
2 | 120
3 | 120
4 | 120
If this is possible in SQL Server, please help.
Use window functions:
select t.*, sum(totalcount) over ()
from t;
Window functions are often faster than join/aggregation solutions, although in a simple case like this one the performance may be essentially the same.
Use a sub-query where you do the SUM work:
select rows, (select sum(TotalCount) from tablename) as TotalCount
from tablename
Or a cross join:
select t1.rows, t2.TotalCount
from tablename t1
cross join (select sum(TotalCount) as TotalCount from tablename) t2
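To confirm that the window-function and cross-join answers agree, this sketch runs both on SQLite (3.25+) through Python's sqlite3 module, used as a stand-in for SQL Server. The table name tablename comes from the answer; the Rows column is quoted as an identifier to avoid any clash with the ROWS frame keyword.

```python
import sqlite3

DATA = [(1, 20), (2, 30), (3, 10), (4, 60)]

# SUM(...) OVER () with an empty window spans every row of the result.
WINDOW_QUERY = """
    SELECT "Rows", SUM(TotalCount) OVER () AS TotalCount
    FROM tablename
    ORDER BY "Rows"
"""

# The grand total is computed once in a derived table and joined to each row.
CROSS_JOIN_QUERY = """
    SELECT t1."Rows", t2.TotalCount
    FROM tablename t1
    CROSS JOIN (SELECT SUM(TotalCount) AS TotalCount FROM tablename) t2
    ORDER BY t1."Rows"
"""


def sum_on_every_row(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE tablename ("Rows" INTEGER, TotalCount INTEGER)')
        con.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
        return (con.execute(WINDOW_QUERY).fetchall(),
                con.execute(CROSS_JOIN_QUERY).fetchall())
    finally:
        con.close()


by_window, by_join = sum_on_every_row(DATA)
print(by_window)  # [(1, 120), (2, 120), (3, 120), (4, 120)]
print(by_join)    # [(1, 120), (2, 120), (3, 120), (4, 120)]
```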
Try it like this:
DECLARE @table TABLE
(
Rows INT,
TotalCount INT
)
INSERT INTO @table
VALUES (1, 20),
(2, 30),
(3, 10),
(4, 60)
DECLARE @total INT = (SELECT SUM(TotalCount)
FROM @table)
SELECT Rows,
@total AS TotalCount
FROM @table