How to select top 10% of values in SQL

I know it's possible to select the maximum value in SQL with MAX(), but if I have a table with two columns, an ID and a value, is it possible to select the top 10% of values for each ID? The number of values per ID is not fixed.
EDIT: Apologies for not being clearer.
I'm working with Microsoft SQL Server Management Studio. Table A looks like:
ID, Value
112345, 1
112345, 2
112345, 3
112345, 2
112345, 3
112345, 18
112345, 32
112357, 10
112346, 15
112346, 16
If the query selected the top 50% for each ID, I would want it to produce:
ID, Value
112345, 3
112345, 3
112345, 18
112345, 32
112357, 10
112346, 16
I would prefer the number of returned rows to be rounded up, e.g. 10% of an ID that had 4 rows would still return 1 value.

Assuming you want the first 10% of rows ordered by Value (desc), you can achieve that by using window functions:
select *
from (
    select ID, Value,
           COUNT(*) over (partition by ID) as countrows,
           ROW_NUMBER() over (partition by ID order by Value desc) as rowno
    from mytable
) as innertab
where rowno <= floor(countrows*0.1+0.9)
order by ID, rowno
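The floor(countrows*0.1+0.9) expression rounds the 10% row count up, so you get 1 row for 1-10 rows, 2 rows for 11-20 rows and so on. CEILING(countrows*0.1) gives the same cutoff.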
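As a quick check of the rounding, here is a minimal sketch that runs the same window-function query against the sample rows from the question. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions) as a stand-in for SQL Server. The pct parameter and the helper name top_percent_per_id are my own, and the cutoff is written with integer division, which for whole numbers equals CEILING(countrows * pct / 100.0).

```python
import sqlite3

# Sample rows copied from the question: (ID, Value).
ROWS = [
    (112345, 1), (112345, 2), (112345, 3), (112345, 2), (112345, 3),
    (112345, 18), (112345, 32), (112357, 10), (112346, 15), (112346, 16),
]

# Same shape as the window-function query above, with the percentage as a
# parameter. (countrows * pct + 99) / 100 is integer division in SQLite and
# rounds the row count up, like the floor(...) expression in the answer.
QUERY = """
SELECT ID, Value
FROM (
    SELECT ID, Value,
           COUNT(*) OVER (PARTITION BY ID) AS countrows,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value DESC) AS rowno
    FROM mytable
) AS innertab
WHERE rowno <= (countrows * :pct + 99) / 100
ORDER BY ID, rowno
"""


def top_percent_per_id(pct):
    """Return the top pct percent of rows per ID, row count rounded up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (ID INTEGER, Value INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY, {"pct": pct}).fetchall()
    conn.close()
    return result


top_half = top_percent_per_id(50)
top_tenth = top_percent_per_id(10)
```

With pct = 50, top_half holds the six rows the question asks for (ordered by ID, highest Value first); with pct = 10, every ID still gets one row.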

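Alternatively, you could use CROSS APPLY and specify TOP n PERCENT:
SELECT x.*
FROM ( SELECT DISTINCT ID FROM tab ) a
CROSS APPLY ( SELECT TOP 10 PERCENT ID, Value FROM tab b WHERE b.ID = a.ID ORDER BY b.Value DESC ) x
TOP n PERCENT rounds a fractional row count up, so it produces at least one row for every ID. Without an ORDER BY inside the APPLY, TOP would return an arbitrary 10% of each ID's rows rather than the highest values.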

Related

How to add single value in a new column

My goal is to put the value from the first row into every row of a new column.
The first value in this example is 10.
The new table shows my goal.
Table
Product ID Name Value
1 ABC 10
2 XYZ 22
3 LMM 8
New Table
Product ID Name Value New Column
1 ABC 10 10
2 XYZ 22 10
3 LMM 8 10
I could fetch the value with the row_number function, but how do I get that value into every row?
You can use the first_value() window function:
select product_id, name, value,
first_value(value) over (order by product_id) as new_column
from the_table
order by product_id;
Rows in a table have no implied sort order. So the "first row" can only be defined when an order by is present.
Assuming you want to pick the first one according to the product ID, you can do:
select *,
( select value
from (select *, row_number() over(order by product_id) as rn from t) x
where rn = 1
) as new_column
from t
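Both answers depend on an explicit ORDER BY to define which row is "first". A small sketch of that point, assuming Python's built-in sqlite3 module (SQLite 3.25+) in place of the asker's database; the helper with_first_value is a hypothetical name introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (product_id INTEGER, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, "ABC", 10), (2, "XYZ", 22), (3, "LMM", 8)],  # rows from the question
)


def with_first_value(order_column):
    """Add new_column = value of the first row under the given ordering."""
    # The column name is interpolated into the SQL, so only allow known names.
    if order_column not in ("product_id", "name", "value"):
        raise ValueError("unknown column: " + order_column)
    sql = f"""
        SELECT product_id, name, value,
               FIRST_VALUE(value) OVER (ORDER BY {order_column}) AS new_column
        FROM the_table
        ORDER BY product_id
    """
    return conn.execute(sql).fetchall()


by_product_id = with_first_value("product_id")  # first row is product 1
by_value = with_first_value("value")            # first row is the smallest value
```

Ordering by product_id fills new_column with 10, as in the question; ordering by value fills it with 8, which is why the ordering has to be stated.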

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need a SQL (for MS-SQL) select query that returns a single ID for each group: the one with the lowest rank. Each group must return only one ID, even if two rows share the same rank (as IDs 1 and 3 do in the above table). I've tried selecting the min value, but the requirement that only one row be returned, and that the returned value be the ID column, is throwing me.
Does anyone know how to do this?
Use row_number():
select t.*
from (select t.*,
row_number() over (partition by groupid order by rank) as seqnum
from t
) t
where seqnum = 1;
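Here is a minimal sketch of the row_number() query run against the question's table, assuming Python's built-in sqlite3 module (SQLite 3.25+) rather than MS-SQL. One deliberate difference, which is my assumption and not part of the answer: id is added to the window's ORDER BY as a tie-breaker, so the row picked for tied ranks is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rank" is quoted because RANK is also the name of a window function.
conn.execute('CREATE TABLE t (id INTEGER, groupid INTEGER, "rank" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 10), (5, 2, 1), (6, 3, 1), (7, 4, 5)],
)

# seqnum = 1 marks the lowest-ranked row of each group; ties on rank are
# broken by the smaller id (an assumption added for this sketch).
lowest_per_group = conn.execute("""
    SELECT groupid, id
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY groupid
                                    ORDER BY "rank", id) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY groupid
""").fetchall()
```

For group 1, where IDs 1 and 3 tie on rank 1, the tie-breaker picks ID 1; without it either of the two could come back.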

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
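One way is to use a difference-of-row-numbers approach to classify consecutive equal values into one group. Within a run, the overall row number and the per-value row number both increase by 1, so their difference stays constant; it changes only when a different value interrupts the run. A row_number function partitioned by value and that difference then gives the position within each group.
Query to assign groups (running this will help you understand how the groups are assigned):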
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final query, using row_number to get the position within each group assigned by the query above:
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
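To see the grouping at work, this sketch runs the final query on the input table from the question and keeps rnum_diff in the output. It assumes Python's built-in sqlite3 module (SQLite 3.25+) instead of Hive; the outer alias grouped is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1)],  # input table
)

# Inner query: rnum_diff is constant within each run of equal values.
# Outer query: number the rows inside each (value, rnum_diff) group.
positions = conn.execute("""
    SELECT id, value, rnum_diff,
           ROW_NUMBER() OVER (PARTITION BY value, rnum_diff
                              ORDER BY id) AS pos_in_grp
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY id)
                 - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
          FROM tbl t) AS grouped
    ORDER BY id
""").fetchall()
```

The two runs of value 1 get rnum_diff 0 (ids 1-2) and 2 (ids 5-7), so they are numbered separately even though the value is the same.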

SQL Get highest repeating value for a group clause

I want a SQL query that tells me, for each ID, which value is repeated most often.
For example, let's take the following table:
Id Value
1 10
1 20
1 10
1 10
2 1
1 3
Desired Output
Id Value Count
1 10 3
2 1 1
The example shows that for Id 1, value 10 is repeated most often, and for Id 2, value 1 is repeated most often.
Any suggestions would be really appreciated.
Use rank to number each id's values by their counts in descending order, and pick the first-ranked rows.
select id, value, cnt
from (select id, value, count(*) as cnt,
rank() over (partition by id order by count(*) desc) as rnk
from t
group by id, value) x
where rnk = 1
Based on Gordon's comment, if you need only one value per id in case of ties, use row_number instead of rank, as rank returns all the ties in value counts.
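The difference between rank and row_number only shows up when counts tie. The sketch below uses the question's rows plus one hypothetical extra row (2, 5), added purely to create a tie for Id 2. It assumes Python's built-in sqlite3 module (SQLite 3.25+), and it computes the counts in a CTE first rather than ranking inside the GROUP BY query, which should be equivalent; the helper most_frequent is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 10), (1, 10), (2, 1), (1, 3),  # rows from the question
     (2, 5)],  # hypothetical row: makes values 1 and 5 tie for Id 2
)

TEMPLATE = """
WITH counts AS (
    SELECT id, value, COUNT(*) AS cnt FROM t GROUP BY id, value
)
SELECT id, value, cnt
FROM (SELECT id, value, cnt, {ranking} AS rnk FROM counts) AS x
WHERE rnk = 1
ORDER BY id, value
"""


def most_frequent(ranking):
    """Run the query with the given window expression as the ranking."""
    return conn.execute(TEMPLATE.format(ranking=ranking)).fetchall()


# rank() keeps every value tied for the highest count.
with_ties = most_frequent("RANK() OVER (PARTITION BY id ORDER BY cnt DESC)")
# row_number() keeps one row per id; value is an assumed tie-breaker.
one_per_id = most_frequent(
    "ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC, value)"
)
```

with_ties returns both tied values for Id 2, while one_per_id returns a single row per Id.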

How to find the SQL medians for a grouping

I am working with SQL Server 2008.
If I have a table like this:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when each group has an odd number of members, i.e. the median exists within the sample. With an even number of members the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
(   SELECT  Code,
            Value,
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        (MAX(CASE WHEN Half1 = 1 THEN Value END) +
         MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0 AS Median
FROM    CTE
GROUP BY Code;
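A minimal sketch of the NTILE approach on the table from the question, assuming Python's built-in sqlite3 module (SQLite 3.25+) in place of SQL Server. SQLite does not accept the [alias] = expression form, so the columns are aliased with AS; the logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Code INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
     (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)],  # rows from the question
)

# half1 = 1 marks the lower half (ascending order), half2 = 1 the upper half
# (descending order). With an odd count both halves include the middle row,
# so the average below is just that row's value.
medians = conn.execute("""
    WITH CTE AS (
        SELECT Code, Value,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value) AS half1,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value DESC) AS half2
        FROM T
        WHERE Value IS NOT NULL
    )
    SELECT Code,
           (MAX(CASE WHEN half1 = 1 THEN Value END) +
            MIN(CASE WHEN half2 = 1 THEN Value END)) / 2.0 AS Median
    FROM CTE
    GROUP BY Code
    ORDER BY Code
""").fetchall()
```

This reproduces the result set requested in the question: 16.5 for Code 2 (the NULL is ignored), 240 for Code 4 and 30 for Code 6.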
In SQL Server 2012 and later you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
SQL Server does not have a dedicated MEDIAN function, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
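Now, from this result set, if you return only those rows where Rnk equals Cnt / 2 + 1 (integer division), you get the middle row of each group. For a group with an even number of rows this is the upper of the two middle values rather than their average, and a NULL Value counts as a row unless it is filtered out, as in group 2 above.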
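To make the behaviour on even-sized groups and NULLs concrete, this sketch runs the ROW_NUMBER query on the ten rows from the question, once as written and once with NULLs filtered out. It assumes Python's built-in sqlite3 module (SQLite 3.25+), which sorts NULLs first in ascending order as SQL Server does; the helper middle_rows and its skip_nulls flag are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Code INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
     (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)],  # rows from the question
)


def middle_rows(skip_nulls):
    """Return (Code, Value) for the row at position Cnt / 2 + 1 per Code."""
    where = "WHERE Value IS NOT NULL" if skip_nulls else ""
    sql = f"""
        WITH RankedTable AS (
            SELECT Code, Value,
                   ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
                   COUNT(*) OVER (PARTITION BY Code) AS Cnt
            FROM MyTable
            {where}
        )
        SELECT Code, Value
        FROM RankedTable
        WHERE Rnk = Cnt / 2 + 1   -- integer division
        ORDER BY Code
    """
    return conn.execute(sql).fetchall()


as_written = middle_rows(skip_nulls=False)
nulls_filtered = middle_rows(skip_nulls=True)
```

Codes 4 and 6 come out as 240 and 30 either way. Code 2 comes out as 3 with the NULL counted and 30 with it filtered, neither of which is the 16.5 the question expects, so this query suits cases where a middle row is acceptable in place of an averaged median.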