Calculate Mode and Median for data in SQL [duplicate] - sql

According to MSDN, Median is not available as an aggregate function in Transact-SQL. However, I would like to find out whether it is possible to create this functionality (using the Create Aggregate function, user defined function, or some other method).
What would be the best way (if possible) to do this - allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?

If you're using SQL 2005 or better this is a nice, simple-ish median calculation for a single column in a table:
SELECT
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2 AS Median
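The idea behind this query can be checked outside the database. Below is a minimal Python sketch, assuming that `TOP 50 PERCENT` rounds a fractional row count up; the function name `median_by_halves` is made up for illustration and is not part of the answer.

```python
import math
import statistics

def median_by_halves(values):
    """Average the largest value of the bottom half and the smallest of the top half."""
    ordered = sorted(values)
    half = math.ceil(len(ordered) / 2)   # assumption: TOP 50 PERCENT rounds up
    bottom_half = ordered[:half]         # ... ORDER BY Score
    top_half = ordered[-half:]           # ... ORDER BY Score DESC
    return (max(bottom_half) + min(top_half)) / 2

scores = [7, 1, 9, 4]
print(median_by_halves(scores), statistics.median(scores))
```

For an odd row count both halves contain the middle row, so it is averaged with itself. Note that in T-SQL the final `/ 2` is integer division when Score is an integer column, so multiply by 1.0 first if a fractional median is wanted.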

2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved the query optimizer, which may affect the perf of various median solutions. Net-net, my original 2009 post is still OK, but there may be better solutions for modern SQL Server apps. Take a look at this article from 2012, which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median
This article found the following pattern to be much, much faster than all other alternatives, at least on the simple schema they tested. This solution was 373x faster (!!!) than the slowest (PERCENTILE_CONT) solution tested. Note that this trick requires two separate queries which may not be practical in all cases. It also requires SQL 2012 or later.
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);
SELECT AVG(1.0 * val)
FROM (
SELECT val FROM dbo.EvenRows
ORDER BY val
OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;
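The OFFSET/FETCH arithmetic is easy to misread, so here is a small Python sketch of it. It only mirrors the row selection on an in-memory list; the function name is hypothetical.

```python
import statistics

def median_offset_fetch(values):
    """Mimic OFFSET (c - 1) / 2 ROWS FETCH NEXT 1 + (1 - c % 2) ROWS ONLY."""
    ordered = sorted(values)
    c = len(ordered)
    offset = (c - 1) // 2        # integer division, as in T-SQL
    fetch = 1 + (1 - c % 2)      # 1 row when c is odd, 2 rows when c is even
    middle = ordered[offset:offset + fetch]
    return sum(middle) / len(middle)

print(median_offset_fetch([1, 2, 3, 4]), statistics.median([1, 2, 3, 4]))
```

The sketch raises an error on an empty list, whereas the SQL query would return NULL.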
Of course, just because one test on one schema in 2012 yielded great results, your mileage may vary, especially if you're on SQL Server 2014 or later. If perf is important for your median calculation, I'd strongly suggest trying and perf-testing several of the options recommended in that article to make sure that you've found the best one for your schema.
I'd also be especially careful using the (new in SQL Server 2012) function PERCENTILE_CONT that's recommended in one of the other answers to this question, because the article linked above found this built-in function to be 373x slower than the fastest solution. It's possible that this disparity has been improved in the 7 years since, but personally I wouldn't use this function on a large table until I verified its performance vs. other solutions.
ORIGINAL 2009 POST IS BELOW:
There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from Medians, ROW_NUMBERs, and performance. This is a particularly optimal solution when it comes to actual I/Os generated during execution – it looks more costly than other solutions, but it is actually much faster.
That page also contains a discussion of other solutions and performance testing details. Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column.
As with all database performance scenarios, always try to test a solution out with real data on real hardware – you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally-speedy solution slower.
SELECT
CustomerId,
AVG(TotalDue)
FROM
(
SELECT
CustomerId,
TotalDue,
-- SalesOrderId in the ORDER BY is a disambiguator to break ties
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
FROM Sales.SalesOrderHeader SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;
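To see why the filter `RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)` keeps only the middle row or rows of each partition, here is a minimal Python sketch of the same bookkeeping. The function name and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

def medians_by_group(rows):
    """rows: iterable of (group_key, value, unique_id) tuples."""
    groups = defaultdict(list)
    for key, value, uid in rows:
        groups[key].append((value, uid))
    result = {}
    for key, items in groups.items():
        items.sort()                       # ORDER BY value, unique_id (tie-breaker)
        n = len(items)
        middle = []
        for row_asc, (value, _) in enumerate(items, start=1):
            row_desc = n - row_asc + 1     # same ordering, numbered from the other end
            if row_asc in (row_desc, row_desc - 1, row_desc + 1):
                middle.append(value)
        result[key] = sum(middle) / len(middle)
    return result

orders = [("a", 10, 1), ("a", 10, 2), ("a", 30, 3), ("b", 5, 4), ("b", 7, 5)]
print(medians_by_group(orders))
```

Because the descending numbering is the exact mirror of the ascending one, only one row (odd count) or two rows (even count) pass the filter. Without the unique tie-breaker the two numberings are no longer guaranteed to mirror each other.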

In SQL Server 2012 you should use PERCENTILE_CONT:
SELECT SalesOrderID, OrderQty,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
See also : http://blog.sqlauthority.com/2011/11/20/sql-server-introduction-to-percentile_cont-analytic-functions-introduced-in-sql-server-2012/

My original quick answer was:
select max(my_column) as [my_column], quartile
from (select my_column, ntile(4) over (order by my_column) as [quartile]
from my_table) i
--where quartile = 2
group by quartile
This will give you the median and interquartile range in one fell swoop. If you really only want one row that is the median then uncomment the where clause.
When you stick that into an explain plan, 60% of the work is sorting the data which is unavoidable when calculating position dependent statistics like this.
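A short Python sketch shows what the NTILE(4) query returns. It assumes that NTILE gives the extra rows to the lower-numbered tiles when the row count does not divide evenly; the function names are invented for this illustration.

```python
import statistics

def ntile(sorted_values, buckets):
    """Pair each value with a 1-based tile number; earlier tiles get the extra rows."""
    base, extra = divmod(len(sorted_values), buckets)
    tiles = []
    for tile in range(1, buckets + 1):
        size = base + (1 if tile <= extra else 0)
        tiles.extend([tile] * size)
    return list(zip(sorted_values, tiles))

def max_per_tile(values, buckets=4):
    """Equivalent of SELECT MAX(my_column), quartile ... GROUP BY quartile."""
    result = {}
    for value, tile in ntile(sorted(values), buckets):
        result[tile] = max(result.get(tile, value), value)
    return result

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(max_per_tile(data)[2], statistics.median(data))
```

With an even number of rows the second quartile's maximum is a single row's value rather than the average of the two middle rows, so it can differ from the true median.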
I've amended the answer to follow the excellent suggestion from Robert Ševčík-Robajz in the comments below:
;with PartitionedData as
(select my_column, ntile(10) over (order by my_column) as [percentile]
from my_table),
MinimaAndMaxima as
(select min(my_column) as [low], max(my_column) as [high], percentile
from PartitionedData
group by percentile)
select
case
when b.percentile = 10 then cast(b.high as decimal(18,2))
else cast((a.low + b.high) as decimal(18,2)) / 2
end as [value], --b.high, a.low,
b.percentile
from MinimaAndMaxima a
join MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5
This should calculate the correct median and percentile values when you have an even number of data items. Again, uncomment the final where clause if you only want the median and not the entire percentile distribution.

Even better:
-- the variable's precision is chosen for illustration
DECLARE @Median DECIMAL(12, 2);
SELECT @Median = AVG(1.0 * val)
FROM
(
SELECT o.val, rn = ROW_NUMBER() OVER (ORDER BY o.val), c.c
FROM dbo.EvenRows AS o
CROSS JOIN (SELECT c = COUNT(*) FROM dbo.EvenRows) AS c
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2);
From the master Himself, Itzik Ben-Gan!

MS SQL Server 2012 (and later) has the PERCENTILE_DISC function which computes a specific percentile for sorted values. PERCENTILE_DISC (0.5) will compute the median - https://msdn.microsoft.com/en-us/library/hh231327.aspx

Simple, fast, accurate
SELECT x.Amount
FROM (SELECT amount,
Count(1) OVER (partition BY 'A') AS TotalRows,
Row_number() OVER (ORDER BY Amount ASC) AS AmountOrder
FROM facttransaction ft) x
WHERE x.AmountOrder = Round(x.TotalRows / 2.0, 0)

If you want to use the Create Aggregate function in SQL Server, this is how to do it. Doing it this way has the benefit of letting you write clean queries. Note that this process could be adapted to calculate a percentile value fairly easily.
Create a new Visual Studio project and set the target framework to .NET 3.5 (this is for SQL 2008; it may be different in SQL 2012). Then create a class file and put in the following code, or its C# equivalent:
Imports Microsoft.SqlServer.Server
Imports System.Data.SqlTypes
Imports System.IO
<Serializable>
<SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToNulls:=True, IsInvariantToDuplicates:=False, _
IsInvariantToOrder:=True, MaxByteSize:=-1, IsNullIfEmpty:=True)>
Public Class Median
Implements IBinarySerialize
Private _items As List(Of Decimal)
Public Sub Init()
_items = New List(Of Decimal)()
End Sub
Public Sub Accumulate(value As SqlDecimal)
If Not value.IsNull Then
_items.Add(value.Value)
End If
End Sub
Public Sub Merge(other As Median)
If other._items IsNot Nothing Then
_items.AddRange(other._items)
End If
End Sub
Public Function Terminate() As SqlDecimal
If _items.Count <> 0 Then
Dim result As Decimal
_items = _items.OrderBy(Function(i) i).ToList()
If _items.Count Mod 2 = 0 Then
'even count: average the two middle items (\ is integer division, 2D is a Decimal literal)
result = (_items(_items.Count \ 2 - 1) + _items(_items.Count \ 2)) / 2D
Else
result = _items((_items.Count - 1) \ 2)
End If
Return New SqlDecimal(result)
Else
Return New SqlDecimal()
End If
End Function
Public Sub Read(r As BinaryReader) Implements IBinarySerialize.Read
'deserialize it from a string
Dim list = r.ReadString()
_items = New List(Of Decimal)
For Each value In list.Split(","c)
Dim number As Decimal
If Decimal.TryParse(value, number) Then
_items.Add(number)
End If
Next
End Sub
Public Sub Write(w As BinaryWriter) Implements IBinarySerialize.Write
'serialize the list to a string
Dim list = ""
For Each item In _items
If list <> "" Then
list += ","
End If
list += item.ToString()
Next
w.Write(list)
End Sub
End Class
Then compile it and copy the DLL and PDB file to your SQL Server machine and run the following command in SQL Server:
CREATE ASSEMBLY CustomAggregate FROM '{path to your DLL}'
WITH PERMISSION_SET=SAFE;
GO
CREATE AGGREGATE Median(@value decimal(9, 3))
RETURNS decimal(9, 3)
EXTERNAL NAME [CustomAggregate].[{namespace of your DLL}.Median];
GO
You can then write a query to calculate the median like this:
SELECT dbo.Median(Field) FROM Table

I just came across this page while looking for a set-based solution for the median. After looking at some of the solutions here, I came up with the following. Hope it helps.
DECLARE @test TABLE(
i int identity(1,1),
id int,
score float
)
INSERT INTO @test (id,score) VALUES (1,10)
INSERT INTO @test (id,score) VALUES (1,11)
INSERT INTO @test (id,score) VALUES (1,15)
INSERT INTO @test (id,score) VALUES (1,19)
INSERT INTO @test (id,score) VALUES (1,20)
INSERT INTO @test (id,score) VALUES (2,20)
INSERT INTO @test (id,score) VALUES (2,21)
INSERT INTO @test (id,score) VALUES (2,25)
INSERT INTO @test (id,score) VALUES (2,29)
INSERT INTO @test (id,score) VALUES (2,30)
INSERT INTO @test (id,score) VALUES (3,20)
INSERT INTO @test (id,score) VALUES (3,21)
INSERT INTO @test (id,score) VALUES (3,25)
INSERT INTO @test (id,score) VALUES (3,29)
DECLARE @counts TABLE(
id int,
cnt int
)
INSERT INTO @counts (
id,
cnt
)
SELECT
id,
COUNT(*)
FROM
@test
GROUP BY
id
SELECT
drv.id,
drv.start,
AVG(t.score)
FROM
(
SELECT
MIN(t.i)-1 AS start,
t.id
FROM
@test t
GROUP BY
t.id
) drv
INNER JOIN @test t ON drv.id = t.id
INNER JOIN @counts c ON t.id = c.id
WHERE
t.i = ((c.cnt+1)/2)+drv.start
OR (
t.i = (((c.cnt+1)%2) * ((c.cnt+2)/2))+drv.start
AND ((c.cnt+1)%2) * ((c.cnt+2)/2) <> 0
)
GROUP BY
drv.id,
drv.start

The following query returns the median from a list of values in one column. It cannot be used as or along with an aggregate function, but you can still use it as a sub-query with a WHERE clause in the inner select.
SQL Server 2005+:
SELECT TOP 1 value from
(
SELECT TOP 50 PERCENT value
FROM table_name
ORDER BY value
)for_median
ORDER BY value DESC

Although Justin Grant's solution appears solid, I found that when a given partition key has a number of duplicate values, the row numbers for the ASC duplicate values end up out of sequence, so they do not properly align.
Here is a fragment from my result:
KEY VALUE ROWA ROWD
13 2 22 182
13 1 6 183
13 1 7 184
13 1 8 185
13 1 9 186
13 1 10 187
13 1 11 188
13 1 12 189
13 0 1 190
13 0 2 191
13 0 3 192
13 0 4 193
13 0 5 194
I used Justin's code as the basis for this solution. Although not as efficient given the use of multiple derived tables it does resolve the row ordering problem I encountered. Any improvements would be welcome as I am not that experienced in T-SQL.
SELECT PKEY, cast(AVG(VALUE)as decimal(5,2)) as MEDIANVALUE
FROM
(
SELECT PKEY,VALUE,ROWA,ROWD,
'FLAG' = (CASE WHEN ROWA IN (ROWD,ROWD-1,ROWD+1) THEN 1 ELSE 0 END)
FROM
(
SELECT
PKEY,
cast(VALUE as decimal(5,2)) as VALUE,
ROWA,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY ROWA DESC) as ROWD
FROM
(
SELECT
PKEY,
VALUE,
ROW_NUMBER() OVER (PARTITION BY PKEY ORDER BY VALUE ASC,PKEY ASC ) as ROWA
FROM [MTEST]
)T1
)T2
)T3
WHERE FLAG = '1'
GROUP BY PKEY
ORDER BY PKEY

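In a UDF, write:
-- returns the smallest value that has at least half of the rows at or below it
-- (the lower of the two middle values when the row count is even)
Select Top 1 T.medianSortColumn from Table T
Where (Select Count(*) from Table
Where medianSortColumn <= T.medianSortColumn) >=
(Select Count(*) From Table) / 2.0
Order By T.medianSortColumn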

Justin's example above is very good, but the need for a primary key (or another unique column) as a tie-breaker should be stated very clearly. I have seen that code in the wild without the key, and the results are bad.
The complaint I get about PERCENTILE_CONT is that it won't give you an actual value from the dataset.
To get a "median" that is an actual value from the dataset, use PERCENTILE_DISC.
SELECT SalesOrderID, OrderQty,
PERCENTILE_DISC(0.5)
WITHIN GROUP (ORDER BY OrderQty)
OVER (PARTITION BY SalesOrderID) AS MedianDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
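The difference between the two functions can be sketched in Python. This follows the usual definitions of the inverse distribution functions (linear interpolation at row position 1 + p * (n - 1) for the continuous version, and the smallest value whose cumulative distribution reaches p for the discrete one); treat it as an illustration of those definitions, not as SQL Server's implementation. The function names simply echo the SQL ones.

```python
import math

def percentile_cont(values, p):
    """Interpolate linearly between the two rows around the target position."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)      # zero-based fractional index
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def percentile_disc(values, p):
    """Return the smallest value whose share of rows at or below it is >= p."""
    ordered = sorted(values)
    n = len(ordered)
    for row_number, value in enumerate(ordered, start=1):
        if row_number / n >= p:
            return value

quantities = [1, 2, 3, 4]
print(percentile_cont(quantities, 0.5), percentile_disc(quantities, 0.5))
```

For an even number of rows the continuous version returns a value between the two middle rows, while the discrete version returns the lower middle row itself. For an odd number of rows they agree.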

Using a single statement: one way is to use the ROW_NUMBER() and COUNT() window functions and filter the sub-query. Here is how to find the median salary:
SELECT AVG(e_salary)
FROM
(SELECT
ROW_NUMBER() OVER(ORDER BY e_salary) as row_no,
e_salary,
(COUNT(*) OVER()+1)*0.5 AS row_half
FROM Employee) t
WHERE row_no IN (FLOOR(row_half),CEILING(row_half))
I have seen similar solutions around the net using FLOOR and CEILING, but I tried to do it in a single statement.

Median Finding
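This is the simplest method to find the median of an attribute. It returns a row only when some value has exactly as many values below it as above it (for example, an odd number of distinct values).
Select round(S.salary,4) median from employee S
where (select count(salary) from employee
where salary < S.salary ) = (select count(salary) from employee
where salary > S.salary)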

See other solutions for median calculation in SQL here:
"Simple way to calculate median with MySQL" (the solutions are mostly vendor-independent).

Building on Jeff Atwood's answer above here it is with GROUP BY and a correlated subquery to get the median for each group.
SELECT TestID,
(
(SELECT MAX(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID = Posts_parent.TestID ORDER BY Score) AS BottomHalf)
+
(SELECT MIN(Score) FROM
(SELECT TOP 50 PERCENT Score FROM Posts WHERE TestID = Posts_parent.TestID ORDER BY Score DESC) AS TopHalf)
) / 2 AS MedianScore,
AVG(Score) AS AvgScore, MIN(Score) AS MinScore, MAX(Score) AS MaxScore
FROM Posts_parent
GROUP BY Posts_parent.TestID

For a continuous variable/measure 'col1' from 'table1'
select col1
from
(select top 50 percent col1,
ROW_NUMBER() OVER(ORDER BY col1 ASC) AS Rowa,
ROW_NUMBER() OVER(ORDER BY col1 DESC) AS Rowd
from table1 ) tmp
where tmp.Rowa = tmp.Rowd

Frequently, we need to calculate the median not just for the whole table, but per group with respect to some ID. In other words, calculate the median for each ID in our table, where each ID has many records. (Based on the solution edited by @gdoron: good performance, and it works in many SQL dialects.)
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rnk
FROM our_table
) AS x
WHERE rnk IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;
Hope it helps.

For large scale datasets, you can try this GIST:
https://gist.github.com/chrisknoll/1b38761ce8c5016ec5b2
It works by aggregating the distinct values you would find in your set (such as ages, or year of birth, etc.), and uses SQL window functions to locate any percentile position you specify in the query.

To get median value of salary from employee table
with cte as (select salary, ROW_NUMBER() over (order by salary asc) as num from employees)
select avg(salary) from cte where num in ((select (count(*)+1)/2 from employees), (select (count(*)+2)/2 from employees));

I wanted to work out a solution by myself, but my brain tripped and fell on the way. I think it works, but don't ask me to explain it in the morning. :P
DECLARE @table AS TABLE
(
Number int not null
);
insert into @table select 2;
insert into @table select 4;
insert into @table select 9;
insert into @table select 15;
insert into @table select 22;
insert into @table select 26;
insert into @table select 37;
insert into @table select 49;
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, Number) AS
(
SELECT RowNo, Number FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Number) AS RowNo, Number FROM @table) AS Foo
)
SELECT AVG(Number) FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)

--Create Temp Table to Store Results in
DECLARE @results AS TABLE
(
[Month] datetime not null
,[Median] int not null
);
--This variable will determine the date
DECLARE @IntDate as int
set @IntDate = -13
WHILE (@IntDate < 0)
BEGIN
--Create Temp Table
DECLARE @table AS TABLE
(
[Rank] int not null
,[Days Open] int not null
);
--Insert records into Temp Table
insert into @table
SELECT
rank() OVER (ORDER BY DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0), DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')),[SVR].[ref_num]) as [Rank]
,DATEDIFF(day,DATEADD(ss, SVR.open_date, '1970'),DATEADD(ss, SVR.close_date, '1970')) as [Days Open]
FROM
mdbrpt.dbo.View_Request SVR
LEFT OUTER JOIN dbo.dtv_apps_systems vapp
on SVR.category = vapp.persid
LEFT OUTER JOIN dbo.prob_ctg pctg
on SVR.category = pctg.persid
Left Outer Join [mdbrpt].[dbo].[rootcause] as [Root Cause]
on [SVR].[rootcause]=[Root Cause].[id]
Left Outer Join [mdbrpt].[dbo].[cr_stat] as [Status]
on [SVR].[status]=[Status].[code]
LEFT OUTER JOIN [mdbrpt].[dbo].[net_res] as [net]
on [net].[id]=SVR.[affected_rc]
WHERE
SVR.Type IN ('P')
AND
SVR.close_date IS NOT NULL
AND
[Status].[SYM] = 'Closed'
AND
SVR.parent is null
AND
[Root Cause].[sym] in ( 'RC - Application','RC - Hardware', 'RC - Operational', 'RC - Unknown')
AND
(
[vapp].[appl_name] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
OR
pctg.sym in ('Systems.Release Health Dashboard.Problem','DTV QA Test.Enterprise Release.Deferred Defect Log')
AND
[Net].[nr_desc] in ('3PI','Billing Rpts/Files','Collabrent','Reports','STMS','STMS 2','Telco','Comergent','OOM','C3-BAU','C3-DD','DIRECTV','DIRECTV Sales','DIRECTV Self Care','Dealer Website','EI Servlet','Enterprise Integration','ET','ICAN','ODS','SB-SCM','SeeBeyond','Digital Dashboard','IVR','OMS','Order Services','Retail Services','OSCAR','SAP','CTI','RIO','RIO Call Center','RIO Field Services','FSS-RIO3','TAOS','TCS')
)
AND
DATEADD(mm, DATEDIFF(mm, 0, DATEADD(ss, SVR.close_date, '1970')), 0) = DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0)
ORDER BY [Days Open]
DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;
WITH MyResults(RowNo, [Days Open]) AS
(
SELECT RowNo, [Days Open] FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Days Open]) AS RowNo, [Days Open] FROM @table) AS Foo
)
insert into @results
SELECT
DATEADD(mm, DATEDIFF(mm,0,DATEADD(mm,@IntDate,getdate())), 0) as [Month]
,AVG([Days Open])as [Median] FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)
set @IntDate = @IntDate+1
DELETE FROM @table
END
select *
from @results
order by [Month]

This works with SQL 2000:
DECLARE @testTable TABLE
(
VALUE INT
)
--INSERT INTO @testTable -- Even Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56
--
--INSERT INTO @testTable -- Odd Test
--SELECT 3 UNION ALL
--SELECT 5 UNION ALL
--SELECT 7 UNION ALL
--SELECT 12 UNION ALL
--SELECT 13 UNION ALL
--SELECT 14 UNION ALL
--SELECT 21 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 23 UNION ALL
--SELECT 29 UNION ALL
--SELECT 39 UNION ALL
--SELECT 40 UNION ALL
--SELECT 56
DECLARE @RowAsc TABLE
(
ID INT IDENTITY,
Amount INT
)
INSERT INTO @RowAsc
SELECT VALUE
FROM @testTable
ORDER BY VALUE ASC
SELECT AVG(amount)
FROM @RowAsc ra
WHERE ra.id IN
(
SELECT ID
FROM @RowAsc
WHERE ra.id -
(
SELECT MAX(id) / 2.0
FROM @RowAsc
) BETWEEN 0 AND 1
)

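For newbies like myself who are learning the very basics, I personally find this example easier to follow, as it is easier to understand exactly what's happening and where the values are coming from. Note that (max + min) / 2 is the midrange (the midpoint between the smallest and largest value), which is not the median in general; the two coincide only for some data, such as symmetrically distributed values.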
select
( max(a.[Value1]) + min(a.[Value1]) ) / 2 as [Median Value1]
,( max(a.[Value2]) + min(a.[Value2]) ) / 2 as [Median Value2]
from (select
datediff(dd,startdate,enddate) as [Value1]
,xxxxxxxxxxxxxx as [Value2]
from dbo.table1
)a
In absolute awe of some of the code above though!!!
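A quick check shows how far (max + min) / 2 can be from the median on skewed data. This is a small Python sketch with made-up numbers; `midrange` is a name chosen here for illustration.

```python
import statistics

def midrange(values):
    """Midpoint between the smallest and the largest value."""
    return (max(values) + min(values)) / 2

days_open = [1, 2, 2, 3, 40]             # hypothetical sample with one outlier
print(midrange(days_open))               # 20.5
print(statistics.median(days_open))      # 2
```

A single outlier moves the midrange a long way while the median stays put, which is usually the reason for asking for a median in the first place.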

This is as simple an answer as I could come up with. Worked well with my data. If you want to exclude certain values just add a where clause to the inner select.
SELECT TOP 1
ValueField AS MedianValue
FROM
(SELECT TOP(SELECT (COUNT(1) + 1) / 2 FROM tTABLE)
ValueField
FROM
tTABLE
ORDER BY
ValueField) A
ORDER BY
ValueField DESC

The following solution works under these assumptions:
No duplicate values
No NULLs
Code:
IF OBJECT_ID('dbo.R', 'U') IS NOT NULL
DROP TABLE dbo.R
CREATE TABLE R (
A FLOAT NOT NULL);
INSERT INTO R VALUES (1);
INSERT INTO R VALUES (2);
INSERT INTO R VALUES (3);
INSERT INTO R VALUES (4);
INSERT INTO R VALUES (5);
INSERT INTO R VALUES (6);
-- Returns Median(R)
select SUM(A) / CAST(COUNT(A) AS FLOAT)
from R R1
where ((select count(A) from R R2 where R1.A > R2.A) =
(select count(A) from R R2 where R1.A < R2.A)) OR
((select count(A) from R R2 where R1.A > R2.A) + 1 =
(select count(A) from R R2 where R1.A < R2.A)) OR
((select count(A) from R R2 where R1.A > R2.A) =
(select count(A) from R R2 where R1.A < R2.A) + 1) ;

DECLARE @Obs int
DECLARE @RowAsc table
(
ID INT IDENTITY,
Observation FLOAT
)
INSERT INTO @RowAsc
SELECT Observations FROM MyTable
ORDER BY 1
-- (COUNT(*) + 1) / 2 is the middle row for odd counts and the lower middle row for even counts
SELECT @Obs=(COUNT(*)+1)/2 FROM @RowAsc
SELECT Observation AS Median FROM @RowAsc WHERE ID=@Obs

I tried several alternatives, but because my data has repeated values, the ROW_NUMBER versions did not seem to be an option for me. So here is the query I used (a version with NTILE):
SELECT distinct
CustomerId,
(
MAX(CASE WHEN Percent50_Asc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId) +
MIN(CASE WHEN Percent50_desc=1 THEN TotalDue END) OVER (PARTITION BY CustomerId)
)/2 MEDIAN
FROM
(
SELECT
CustomerId,
TotalDue,
NTILE(2) OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC) AS Percent50_Asc,
NTILE(2) OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC) AS Percent50_desc
FROM Sales.SalesOrderHeader SOH
) x
ORDER BY CustomerId;

For your question, Jeff Atwood has already given a simple and effective solution. But if you are looking for an alternative approach to calculating the median, the SQL code below will help you.
create table employees(salary int);
insert into employees values(8); insert into employees values(23); insert into employees values(45); insert into employees values(123); insert into employees values(93); insert into employees values(2342); insert into employees values(2238);
select * from employees;
declare @odd_even int; declare @cnt int; declare @middle_no int;
set @cnt=(select count(*) from employees); set @middle_no=(@cnt/2)+1; select @odd_even=case when (@cnt%2=0) THEN -1 ELse 0 END ;
select AVG(tbl.salary) from (select salary,ROW_NUMBER() over (order by salary) as rno from employees group by salary) tbl where tbl.rno=@middle_no or tbl.rno=@middle_no+@odd_even;
If you are looking to calculate median in MySQL, this github link will be useful.

Related

Sorting top ten vendors and showing remained vendors as "other"

Please consider a table of vendors having two columns: VendorName and PayableAmount
I'm looking for a query which returns top ten vendors sorted by PayableAmount descending and sum of other payable amounts as "other" in 11th row.
Obviously, sum of PayableAmount from Vendors table should be equal to sum of PayableAmount from Query.
Technically, it's possible to do in one query:
declare @t table (
Name varchar(50) primary key,
Amount money not null
);
-- Dummy data
insert into @t (Name, Amount)
select top (20) sq.*
from (
select name, max(number) as [Amount]
from master.dbo.spt_values
where number between 100 and 100000
and name is not null
group by name
) sq
order by newid();
-- The table itself, for verification
select * from @t order by Amount desc;
-- Actual query
select top (11)
case when sq.RN > 10 then '<All others>' else sq.Name end as [VendorName],
case
when sq.RN > 10 then sum(sq.Amount) over(partition by case when sq.rn > 10 then 1 else 0 end)
else sq.Amount
end as [Value]
from (
select t.Name, t.Amount, row_number() over(order by t.Amount desc) as [RN]
from @t t
) sq
order by sq.RN;
It will even work on any SQL Server version starting with 2005. But, in real life, I would prefer to calculate these 2 parts separately and then UNION them.
This would perform the query you're looking for: first extract the top 10, then UNION that result with the sum of the remaining vendors (those ranked below the top 10), labelled 'Other'.
WITH rank AS (SELECT
VendorName,
PayableAmount,
ROW_NUMBER() OVER (ORDER BY PayableAmount DESC) AS rn
FROM vendors)
SELECT VendorName,
rn,
PayableAmount
FROM
rank WHERE rn <= 10
UNION
SELECT VendorName, 11 AS rn, PayableAmount
FROM
(
SELECT 'Other' AS VendorName,
SUM(PayableAmount) AS PayableAmount
FROM
rank WHERE rn > 10
) X11
ORDER BY rn
This has been tested in SQLFiddle.
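The intended result shape (ten rows plus one aggregated row whose totals still add up) can be sketched in Python. The function name, the label and the sample data are assumptions for illustration only.

```python
def top_n_with_other(vendors, n=10, other_label="Other"):
    """vendors: list of (name, amount); returns the top n plus one aggregated row."""
    ranked = sorted(vendors, key=lambda vendor: vendor[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if rest:
        top.append((other_label, sum(amount for _, amount in rest)))
    return top

vendors = [(f"vendor{i}", i * 100) for i in range(1, 13)]   # 12 hypothetical vendors
report = top_n_with_other(vendors)
print(report[-1])
print(sum(amount for _, amount in report) == sum(amount for _, amount in vendors))
```

The last check mirrors the requirement from the question: the sum over the report must equal the sum over the whole Vendors table.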
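This is for the 11th row (I didn't test it):
declare @i int
set @i =
(select sum(x.PayableAmount)
from
(select * from Vendors
except
-- TOP and its ORDER BY sit in their own derived table; otherwise ORDER BY would apply to the whole EXCEPT
select * from (select top 10 * from Vendors order by PayableAmount desc) as top10) as x)
select 'another', @i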

Cumulative Total in MS Sql server [duplicate]

Possible Duplicate:
Calculate a Running Total in SqlServer
I need to get the cumulative (running) total of a column in MS SQL Server. I.e. if there is a column named “Marks”, then for each row the cumulative sum will be the sum of the current and previous rows. Can we get this result without using joins? My query is already pretty big.
I have included a sample table and data:
CREATE TABLE "SCORE_CHART"
(
"STUDENT_NAME" NVARCHAR(20),
"MARKS" INT
)
INSERT INTO SCORE_CHART (STUDENT_NAME, MARKS) VALUES ('STUD1', 95);
INSERT INTO SCORE_CHART (STUDENT_NAME, MARKS) VALUES ('STUD2', 90);
INSERT INTO SCORE_CHART (STUDENT_NAME, MARKS) VALUES ('STUD3', 98);
SELECT STUDENT_NAME, MARKS FROM SCORE_CHART;
Expected result:
In oracle it’s easy to write like:
SELECT
STUDENT_NAME,
MARKS,
SUM(MARKS) OVER (ORDER BY STUDENT_NAME) CUM_SUM
FROM SCORE_CHART
ORDER BY STUDENT_NAME;
Thanks in advance.
The same query is supported from SQL Server 2012 onwards. In older versions there are several approaches; refer to http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals
try this:
you can get the cumulative sum just by joining the same table itself
SELECT S1.STUDENT_NAME, S1.MARKS ,sum(S2.MARKS) CUM_SUM
FROM SCORE_CHART S1 join SCORE_CHART S2
on S1.STUDENT_NAME>=S2.STUDENT_NAME
group by S1.STUDENT_NAME, S1.MARKS
order by S1.STUDENT_NAME, S1.MARKS
SQL Fiddle demo
You said no joins, so what about an APPLY? ;)
SELECT STUDENT_NAME, MARKS, running.total
FROM SCORE_CHART a
cross apply
(
select SUM(marks) total
from score_chart b
where b.student_name <= a.student_name
) running
ORDER BY STUDENT_NAME;
With an index on student_name, speed should be okay!
Check the query for Recursive CTE.
;with CTE as (select ROW_NUMBER() over (order by (select 0)) as id,STUDENT_NAME,MARKS from SCORE_CHART)
,CTE1 as (
select id,STUDENT_NAME,marks,marks as CUM_SUM from CTE where id=1
UNION ALL
select c.id,c.STUDENT_NAME,c.marks,c.marks+c1.CUM_SUM as CUM_SUM from CTE1 c1 inner join CTE c on c.id-1=c1.id)
select * from CTE1
Use a recursive CTE to achieve this.
Just doing a join doesn't seem to guarantee order, but it comes up with the final answer OK:
select
x.STUDENT_NAME
, sum(y.marks) marks
from
SCORE_CHART x
join SCORE_CHART y
on x.STUDENT_NAME <= y.STUDENT_NAME
group by x.STUDENT_NAME
order by x.STUDENT_NAME
Just seen the NO JOINS rule - will re-think
EDIT - running ok now: LIVE FIDDLE HERE
Creating the data
CREATE TABLE "SCORE_CHART"
(
"STUDENT_NAME" NVARCHAR(20),
"MARKS" INT
)
INSERT INTO SCORE_CHART (STUDENT_NAME, MARKS)
VALUES
('STUD1', 95),
('STUD2', 90),
('STUD3', 98)
Using a recursive CTE:
;WITH
init_cte(row,STUDENT_NAME,MARKS)
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY STUDENT_NAME),
STUDENT_NAME,
MARKS
FROM SCORE_CHART
)
,MinMax_cte(MinRow,MaxRow) AS (SELECT MIN(row),MAX(row) FROM init_cte)
,recursive_cte (row,STUDENT_NAME,MARKS,RUNNING_MARKS) AS
(
SELECT row,STUDENT_NAME,MARKS,MARKS
FROM init_cte
WHERE row = (SELECT MinRow FROM MinMax_cte)
UNION ALL
SELECT Y.row,y.STUDENT_NAME,y.MARKS,x.RUNNING_MARKS + y.MARKS
FROM recursive_cte x
INNER JOIN init_cte y
ON y.row = x.row + 1
WHERE y.row <= (SELECT [MaxRow] from MinMax_cte)
)
SELECT * FROM recursive_cte
As mentioned in a comment on your OP, there is a similar question HERE ON SO
In that question Sam Saffron put forward a very elegant way of doing a running total using UPDATE. Here it is applied to your data:
Using the same data created above but with the UPDATE trick:
CREATE TABLE #t ( ROW int, STUDENT_NAME NVARCHAR(20) , MARKS int, MARKS_RUNNING int)
INSERT INTO #t
SELECT
ROW_NUMBER() OVER (ORDER BY STUDENT_NAME),
STUDENT_NAME,
MARKS,
0
FROM SCORE_CHART
DECLARE @total int
SET @total = 0
UPDATE #t SET MARKS_RUNNING = @total, @total = @total + MARKS
SELECT * FROM #t
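For reference, the cumulative sums that the queries in this thread are meant to produce for the sample data can be computed with a few lines of Python. This is only a cross-check on an in-memory list; the function name is hypothetical.

```python
from itertools import accumulate

def running_totals(rows):
    """rows: list of (student_name, marks); returns (name, marks, cum_sum) ordered by name."""
    ordered = sorted(rows)
    totals = accumulate(marks for _, marks in ordered)
    return [(name, marks, total) for (name, marks), total in zip(ordered, totals)]

score_chart = [("STUD1", 95), ("STUD2", 90), ("STUD3", 98)]
for row in running_totals(score_chart):
    print(row)
```

Ordered by STUDENT_NAME, the cumulative column should read 95, 185, 283.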

SQL Server - Counting number of times an attribute in a dataset changes (non-concurrently)

I have a query that returns either a 1 or 0 based on whether or not an event occurred on a given date. This is ordered by date. Basically, a simple result set is:
Date | Type
---------------------
2010-09-27 1
2010-10-11 1
2010-11-29 0
2010-12-06 0
2010-12-13 1
2010-12-15 0
2010-12-17 0
2011-01-03 1
2011-01-04 0
What I would now like to be able to do is to count the number of separate, non-concurrent instances of '0's there are - i.e. count how many different groups of 0s appear.
In the above instance, the answer should be 3 (1 group of 2, then another group of 2, then finally 1 to end with).
Hopefully, the above example illustrates what I am trying to get at. I have been searching for a while, but am finding it difficult to succinctly describe what I am looking for, and hence haven't found anything of relevance.
Thanks in advance,
Josh
You could give each row a number in a CTE. Then you can join the table on itself to find the previous row. Knowing the previous row, you can sum the number of times the previous row is 1 and the current row is 0. For example:
; with NumberedRows as
(
select row_number() over (order by date) as rn
, type
from YourTable
)
select sum(case when cur.type = 0 and IsNull(prev.type,1) = 1 then 1 end)
from NumberedRows cur
left join
NumberedRows prev
on cur.rn = prev.rn + 1
This is a variant of the "islands" problem. My first answer uses Itzik Ben Gan's double row_number trick to identify contiguous groups of data efficiently. The combination of Type,Grp identifies each individual island in the data.
You can read more about the different approaches to tackling this problem here.
;WITH T AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY Date) -
ROW_NUMBER() OVER(PARTITION BY Type ORDER BY Date) AS Grp
FROM YourTable
)
SELECT COUNT(DISTINCT Grp)
FROM T
WHERE Type=0
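The double ROW_NUMBER trick is easier to trust after stepping through it once. Here is a minimal Python sketch that computes the same `Grp` value (overall row number minus row number within the type) and compares it against a plain run count; both function names are invented for this illustration.

```python
from itertools import groupby

def count_islands_row_number(types, target=0):
    """Count islands of `target` via Grp = overall row number - row number within type."""
    seen = {}
    groups = set()
    for rn, value in enumerate(types, start=1):      # ROW_NUMBER() OVER (ORDER BY Date)
        seen[value] = seen.get(value, 0) + 1         # ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Date)
        if value == target:
            groups.add(rn - seen[value])             # constant within one island
    return len(groups)

def count_runs(types, target=0):
    """Straightforward check: count maximal runs of consecutive `target` values."""
    return sum(1 for value, _ in groupby(types) if value == target)

types_by_date = [1, 1, 0, 0, 1, 0, 0, 1, 0]          # the sample from the question
print(count_islands_row_number(types_by_date), count_runs(types_by_date))
```

Within one island both row numbers advance together, so their difference stays constant. Every row of the other type in between increases the difference, which is why each island gets its own `Grp`.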
My second answer requires a single pass through the data. It is not guaranteed to work but is on the same principle as a technique that many people successfully use to concatenate strings without problems.
DECLARE @Count int = 0
SELECT @Count = CASE WHEN Type = 0 AND @Count <=0 THEN -@Count+1
WHEN Type = 1 AND @Count > 0 THEN - @Count
ELSE @Count END
FROM YourTable
ORDER BY Date
SELECT ABS(@Count)
Have a look at this example, using Sql Server 2005+
DECLARE @Table TABLE(
Date DATETIME,
Type INT
)
INSERT INTO @Table SELECT '2010-09-27',1
INSERT INTO @Table SELECT '2010-10-11',1
INSERT INTO @Table SELECT '2010-11-29',0
INSERT INTO @Table SELECT '2010-12-06',0
INSERT INTO @Table SELECT '2010-12-13',1
INSERT INTO @Table SELECT '2010-12-15',0
INSERT INTO @Table SELECT '2010-12-17',0
INSERT INTO @Table SELECT '2011-01-03',1
INSERT INTO @Table SELECT '2011-01-04',0
;WITH Vals AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY Date) ROWID
FROM @Table
)
SELECT v.*
FROM Vals v LEFT JOIN
Vals vNext ON v.ROWID + 1 = vNext.ROWID
WHERE v.Type = 0
AND (vNext.Type = 1 OR vNext.Type IS NULL)

Equivalent of LIMIT and OFFSET for SQL Server?

In PostgreSQL there are the LIMIT and OFFSET keywords, which allow very easy pagination of result sets.
What is the equivalent syntax for SQL Server?
This feature is built in and easy to use from SQL Server 2012 onwards.
Limit with offset to select 11 to 20 rows in SQL Server:
SELECT email FROM emailTable
WHERE user_id=3
ORDER BY Id
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
ORDER BY: required
OFFSET: number of rows to skip (required when FETCH is used)
FETCH NEXT: optional number of rows to return after the offset
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql
The equivalent of LIMIT is SET ROWCOUNT, but if you want generic pagination it's better to write a query like this:
;WITH Results_CTE AS
(
SELECT
Col1, Col2, ...,
ROW_NUMBER() OVER (ORDER BY SortCol1, SortCol2, ...) AS RowNum
FROM Table
WHERE <whatever>
)
SELECT *
FROM Results_CTE
WHERE RowNum >= @Offset
AND RowNum < @Offset + @Limit
The advantage here is the parameterization of the offset and limit in case you decide to change your paging options (or allow the user to do so).
Note: the @Offset parameter should use one-based indexing for this rather than the normal zero-based indexing.
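The one-based note is easy to get wrong, so here is a small Python sketch of the same filter applied to an in-memory list. The function name `page_by_row_number` is hypothetical; it only mimics the RowNum comparison above and assumes the rows are already in the desired sort order.

```python
def page_by_row_number(rows, offset, limit):
    """Return the rows whose 1-based row number is in [offset, offset + limit)."""
    numbered = enumerate(rows, start=1)  # plays the role of ROW_NUMBER()
    return [row for row_num, row in numbered
            if offset <= row_num < offset + limit]


rows = list("abcdefghij")
print(page_by_row_number(rows, 1, 3))  # prints ['a', 'b', 'c']
print(page_by_row_number(rows, 4, 3))  # prints ['d', 'e', 'f']
```

With a one-based offset the first page starts at offset 1; passing 0 would silently return one row fewer than the limit.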
select top {LIMIT HERE} * from (
select *, ROW_NUMBER() over (order by {ORDER FIELD}) as r_n_n
from {YOUR TABLES} where {OTHER OPTIONAL FILTERS}
) xx where r_n_n >={OFFSET HERE}
A note:
This solution will only work in SQL Server 2005 or above, since this was when ROW_NUMBER() was implemented.
You can use ROW_NUMBER in a Common Table Expression to achieve this.
;WITH My_CTE AS
(
SELECT
col1,
col2,
ROW_NUMBER() OVER(ORDER BY col1) AS row_number
FROM
My_Table
WHERE
<<<whatever>>>
)
SELECT
col1,
col2
FROM
My_CTE
WHERE
row_number BETWEEN @start_row AND @end_row
Specifically for SQL Server, you can achieve this in many different ways. For a real example, we use the Customers table here.
Example 1: With "SET ROWCOUNT"
SET ROWCOUNT 10
SELECT CustomerID, CompanyName from Customers
ORDER BY CompanyName
To return all rows, set ROWCOUNT to 0
SET ROWCOUNT 0
SELECT CustomerID, CompanyName from Customers
ORDER BY CompanyName
Example 2: With "ROW_NUMBER and OVER"
With Cust AS
( SELECT CustomerID, CompanyName,
ROW_NUMBER() OVER (order by CompanyName) as RowNumber
FROM Customers )
select *
from Cust
Where RowNumber Between 0 and 10
Example 3: With "OFFSET and FETCH" (note that "ORDER BY" is mandatory with this)
SELECT CustomerID, CompanyName FROM Customers
ORDER BY CompanyName
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY
Hope this helps you.
-- @RowsPerPage can be a fixed number and @PageNumber (assumed 1-based here) can be passed in
DECLARE @RowsPerPage INT = 10, @PageNumber INT = 2
SELECT *
FROM MemberEmployeeData
ORDER BY EmployeeNumber
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
For me the use of OFFSET and FETCH together was slow, so I used a combination of TOP and OFFSET like this (which was faster):
SELECT TOP 20 * FROM (SELECT columname1, columname2 FROM tablename
WHERE <conditions...> ORDER BY columname1 OFFSET 100 ROWS) aliasname
Note: If you use TOP and OFFSET together in the same query like:
SELECT TOP 20 columname1, columname2 FROM tablename
WHERE <conditions...> ORDER BY columname1 OFFSET 100 ROWS
Then you get an error, so to use TOP and OFFSET together you need to separate them with a sub-query.
And if you need to use SELECT DISTINCT then the query is like:
SELECT TOP 20 * FROM (SELECT DISTINCT columname1, columname2 FROM tablename
WHERE <conditions...> ORDER BY columname1 OFFSET 100 ROWS) aliasname
Note: The use of SELECT ROW_NUMBER with DISTINCT did not work for me.
Adding a slight variation on Aaronaught's solution, I typically parametrize page number (@PageNum) and page size (@PageSize). This way each page click event just sends in the requested page number along with a configurable page size:
begin
with My_CTE as
(
SELECT col1,
ROW_NUMBER() OVER(ORDER BY col1) AS RowNum
FROM
My_Table
WHERE
<<<whatever>>>
)
select * from My_CTE
WHERE RowNum BETWEEN (@PageNum - 1) * @PageSize + 1
AND @PageNum * @PageSize
end
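The page arithmetic is where off-by-one mistakes tend to creep in, so here is a minimal Python sketch of the row range for a BETWEEN filter on ROW_NUMBER(). The helper name `page_bounds` is made up for illustration, and it assumes both the page number and the row numbers are 1-based.

```python
def page_bounds(page_num, page_size):
    """Return the inclusive (first_row, last_row) row-number range for a page."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must both be >= 1")
    first_row = (page_num - 1) * page_size + 1  # the + 1 sits outside the product
    last_row = page_num * page_size
    return first_row, last_row


print(page_bounds(1, 10))  # prints (1, 10)
print(page_bounds(2, 10))  # prints (11, 20)
```

Each page covers exactly page_size rows and consecutive pages neither overlap nor leave gaps, which is a quick way to check any variant of the formula.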
Another sample:
declare @limit int
declare @offset int
set @offset = 2;
set @limit = 20;
declare @count int
declare @idxini int
declare @idxfim int
select @idxfim = @offset * @limit
select @idxini = @idxfim - (@limit-1);
WITH paging AS
(
SELECT
ROW_NUMBER() OVER (order by object_id) AS rowid, *
FROM
sys.objects
)
select *
from
(select COUNT(1) as rowqtd from paging) qtd,
paging
where
rowid between @idxini and @idxfim
order by
rowid;
Someone here describes this feature in SQL 2011. It's a pity they chose the slightly different keywords "OFFSET / FETCH", but LIMIT isn't standard anyway, so that's OK.
The closest I could get is
select * FROM( SELECT *, ROW_NUMBER() over (ORDER BY ID ) as ct from [db].[dbo].[table] ) sub where ct > fromNumber and ct <= toNumber
which I guess is similar to select * from [db].[dbo].[table] LIMIT 0, 10
Elaborating on Somnath Muluk's answer, just use:
SELECT *
FROM table_name_here
ORDER BY (SELECT NULL AS NOORDER)
OFFSET 9 ROWS
FETCH NEXT 25 ROWS ONLY
without adding any extra column.
Tested in SQL Server 2019, but I guess it could work in older versions as well.
select top (@TakeCount) * --FETCH NEXT
from(
Select ROW_NUMBER() OVER (order by StartDate) AS rowid,*
From YourTable
)A
where Rowid>@SkipCount --OFFSET
@nombre_row: number of rows per page
@page: page number
-- SQL code
declare @page int,@nombre_row int;
set @page=2;
set @nombre_row=5;
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY etudiant_ID ) AS RowNum, *
FROM etudiant
) AS RowConstrainedResult
WHERE RowNum >= ((@page-1)*@nombre_row)+1
AND RowNum < ((@page)*@nombre_row)+1
ORDER BY RowNum
Since nobody has provided this code yet:
SELECT TOP (@limit) f1, f2, f3...
FROM t1
WHERE c1 = v1 AND c2 > v2...
AND
t1.id NOT IN
(SELECT TOP (@offset) id
FROM t1
WHERE c1 = v1 AND c2 > v2...
ORDER BY o1, o2...)
ORDER BY o1, o2...
Important points:
ORDER BY must be identical in both queries
@limit can be replaced with the number of results to retrieve
@offset is the number of results to skip
Please compare performance with the previous solutions, as they may be more efficient
This solution duplicates the WHERE and ORDER BY clauses, and will give incorrect results if they get out of sync
On the other hand, ORDER BY is there explicitly if that's what's needed
I assume that a C# LINQ expression using Skip and Take generates the SQL command below:
DECLARE @p0 Int = 1
DECLARE @p1 Int = 3
SELECT [t1].[Id]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[Id]) AS [ROW_NUMBER], [t0].[Id]
FROM [ShoppingCart] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]
In SQL server you would use TOP together with ROW_NUMBER()
I have tested this script many times. With 1 million records and 100 records per page, pagination is fast: my PC executes the script in about 0 sec, whereas MySQL with its own LIMIT and OFFSET takes about 4.5 sec to return the result.
Some people may mistakenly believe that ROW_NUMBER() always has to sort by a specific field. If we only need the rows numbered in sequence, we can use:
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
SELECT TOP {LIMIT} * FROM (
SELECT TOP ({LIMIT} + {OFFSET}) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ROW_NO,*
FROM {TABLE_NAME}
) XX WHERE ROW_NO > {OFFSET}
Explanation:
{LIMIT}: Number of records for each page
{OFFSET}: Number of records to skip

Calculating percentile rankings in MS SQL

What's the best way to calculate percentile rankings (e.g. the 90th percentile or the median score) in MSSQL 2005?
I'd like to be able to select the 25th, median, and 75th percentiles for a single column of scores (preferably in a single record so I can combine with average, max, and min). So for example, table output of the results might be:
Group MinScore MaxScore AvgScore pct25 median pct75
----- -------- -------- -------- ----- ------ -----
T1 52 96 74 68 76 84
T2 48 98 74 68 75 85
I would think that this would be the simplest solution:
SELECT TOP N PERCENT * FROM TheTable ORDER BY TheScore DESC
Where N = (100 - desired percentile). So if you wanted all rows in the 90th percentile, you'd select the top 10%.
I'm not sure what you mean by "preferably in a single record". Do you mean calculate which percentile a given score for a single record would fall into? e.g. do you want to be able to make statements like "your score is 83, which puts you in the 91st percentile." ?
EDIT: OK, I thought some more about your question and came up with this interpretation. Are you asking how to calculate the cutoff score for a particular percentile? e.g. something like this: to be in the 90th percentile you must have a score greater than 78.
If so, this query works. I dislike sub-queries though, so depending on what it was for, I'd probably try to find a more elegant solution. It does, however, return a single record with a single score.
-- Find the minimum score for all scores in the 90th percentile
SELECT Min(subq.TheScore) FROM
(SELECT TOP 10 PERCENT TheScore FROM TheTable
ORDER BY TheScore DESC) AS subq
Check out the NTILE command -- it will give you percentiles pretty easily!
SELECT SalesOrderID,
OrderQty,
RowNum = Row_Number() OVER(Order By OrderQty),
Rnk = RANK() OVER(ORDER BY OrderQty),
DenseRnk = DENSE_RANK() OVER(ORDER BY OrderQty),
NTile4 = NTILE(4) OVER(ORDER BY OrderQty)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43689, 63181)
How about this:
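SELECT
[Group],
[75_percentile] = MAX(CASE WHEN Quartile = 3 THEN score END),
[90_percentile] = MAX(CASE WHEN Decile = 9 THEN score END)
FROM (
-- window functions cannot be nested inside aggregates, so compute the tiles first
SELECT [Group], score,
NTILE(4) OVER(PARTITION BY [Group] ORDER BY score ASC) AS Quartile,
NTILE(10) OVER(PARTITION BY [Group] ORDER BY score ASC) AS Decile
FROM TheScore
) AS t
GROUP BY [Group]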
I've been working on this a little more, and here's what I've come up with so far:
CREATE PROCEDURE [dbo].[TestGetPercentile]
@percentile as float,
@resultval as float output
AS
BEGIN
WITH scores(score, prev_rank, curr_rank, next_rank) AS (
SELECT dblScore,
(ROW_NUMBER() OVER ( ORDER BY dblScore ) - 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [prev_rank],
(ROW_NUMBER() OVER ( ORDER BY dblScore ) + 0.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [curr_rank],
(ROW_NUMBER() OVER ( ORDER BY dblScore ) + 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [next_rank]
FROM TestScores
)
SELECT @resultval = (
SELECT TOP 1
CASE WHEN t1.score = t2.score
THEN t1.score
ELSE
t1.score + (t2.score - t1.score) * ((@percentile - t1.curr_rank) / (t2.curr_rank - t1.curr_rank))
END
FROM scores t1, scores t2
WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile))
AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile))
)
END
Then in another stored procedure I do this:
DECLARE @pct25 float;
DECLARE @pct50 float;
DECLARE @pct75 float;
exec TestGetPercentile .25, @pct25 output
exec TestGetPercentile .50, @pct50 output
exec TestGetPercentile .75, @pct75 output
Select
min(dblScore) as minScore,
max(dblScore) as maxScore,
avg(dblScore) as avgScore,
@pct25 as percentile25,
@pct50 as percentile50,
@pct75 as percentile75
From TestScores
It still doesn't do quite what I'm looking for. This will get the stats for all tests; whereas I would like to be able to select from a TestScores table that has multiple different tests in it and get back the same stats for each different test (like I have in my example table in my question).
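For checking the interpolation in the stored procedure by hand, here is a rough Python sketch of what I understand it to compute: each sorted score gets rank i / (n + 1), and the result is interpolated linearly between the two scores whose ranks bracket the requested percentile. The function name `interpolated_percentile` is made up, and returning None when the percentile falls outside the ranked range is my assumption about the NULL the procedure would produce in that case.

```python
import math


def interpolated_percentile(scores, percentile):
    """Linear interpolation between sorted scores ranked i / (n + 1)."""
    ordered = sorted(scores)
    n = len(ordered)
    position = percentile * (n + 1)  # fractional 1-based position in the sorted list
    lower = math.floor(position)     # row just at or below the percentile
    upper = math.ceil(position)      # row just at or above the percentile
    if lower < 1 or upper > n:
        return None                  # no bracketing rows exist
    if lower == upper:
        return ordered[lower - 1]    # percentile lands exactly on a row
    fraction = position - lower
    low_score, high_score = ordered[lower - 1], ordered[upper - 1]
    return low_score + (high_score - low_score) * fraction


print(interpolated_percentile([1, 2, 3, 4], 0.5))       # prints 2.5
print(interpolated_percentile([10, 20, 30, 40], 0.25))  # prints 12.5
```

Running it per test (one list of scores per group) gives the per-group statistics from the question, which is the part the procedure above does not yet handle.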
The 50th percentile is the same as the median. To compute another percentile, say the 80th, take the bottom 80 percent of the data sorted in ascending order and the top 20 percent sorted in descending order, then average the two values where they meet.
NB: The median query has been around for a long time, but I cannot remember exactly where I got it from; I have only amended it to compute other percentiles.
DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5))
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(2)
INSERT INTO @Temp VALUES(8)
INSERT INTO @Temp VALUES(4)
INSERT INTO @Temp VALUES(3)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(7)
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(1)
INSERT INTO @Temp VALUES(NULL)
--50th percentile or median
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--90th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 90 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 10 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--75th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 75 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 25 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
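To check the three queries above against the sample values without a server, here is a small Python sketch of the same "two halves" idea. The function name `top_percent_percentile` is hypothetical, and it assumes that TOP n PERCENT rounds a fractional row count up to the next integer.

```python
import math


def top_percent_percentile(values, pct):
    """Average the largest of the bottom pct% and the smallest of the top (100 - pct)%."""
    if not 0 < pct < 100:
        raise ValueError("pct must be strictly between 0 and 100")
    data = sorted(v for v in values if v is not None)  # WHERE DATA IS NOT NULL
    n = len(data)
    if n == 0:
        raise ValueError("no non-NULL values")
    low_count = math.ceil(n * pct / 100)           # rows in TOP pct PERCENT ascending
    high_count = math.ceil(n * (100 - pct) / 100)  # rows in TOP (100 - pct) PERCENT descending
    low_edge = data[low_count - 1]                 # largest value of the bottom slice
    high_edge = data[n - high_count]               # smallest value of the top slice
    return (low_edge + high_edge) / 2.0


sample = [0, 2, 8, 4, 3, 6, 6, 6, 7, 0, 1, None]
print(top_percent_percentile(sample, 50))  # prints 4.0
print(top_percent_percentile(sample, 90))  # prints 7.0
print(top_percent_percentile(sample, 75))  # prints 6.0
```

With an odd number of rows the two slices overlap on the middle row, so the average is just that row; with an even number they meet at two different rows and the result is their mean.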
I'd do something like:
declare @n int, @median int, @p75 int, @p90 int
select @n = count(*) from tbl1
select @median = @n / 2
select @p75 = @n * 3 / 4
select @p90 = @n * 9 / 10
select top 1 score from (select top (@median) score from tbl1 order by score asc) as t order by score desc
Is this right?
I'd probably use the SQL Server 2005
1.0 * row_number() over (order by score) / (select count(*) from scores)
or something along those lines.
Percentile rank is calculated as
(Rank - 1) / (total_rows - 1) when you sort the values in ascending order.
The query below gives a percentile value between 0 and 1. The person with the lowest marks will have percentile 0.
SELECT Name, Marks, (rank_1 - 1) * 1.0 / ((select count(*) as total_1 from [table]) - 1) as percentile_rank
from
(
SELECT Name,
Marks,
RANK() OVER (ORDER BY Marks) AS rank_1
from [table]
) as A
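As a quick cross-check of the (Rank - 1) / (total_rows - 1) formula, here is a minimal Python sketch with RANK()-style ties, where equal marks share the lowest rank. The function name `percentile_ranks` is made up for illustration. Note that Python's `/` is true division, whereas in T-SQL dividing two integers truncates, so the SQL needs one side converted to a decimal (for example by multiplying by 1.0).

```python
def percentile_ranks(marks):
    """Return (rank - 1) / (n - 1) for each mark, in the original order."""
    n = len(marks)
    if n < 2:
        raise ValueError("need at least two rows to avoid dividing by zero")
    first_position = {}  # RANK(): 1-based position of the first occurrence when sorted
    for position, mark in enumerate(sorted(marks), start=1):
        first_position.setdefault(mark, position)
    return [(first_position[mark] - 1) / (n - 1) for mark in marks]


print(percentile_ranks([50, 90, 70, 70]))
```

The lowest mark comes out at 0.0 and the highest at 1.0, while the two tied marks share the same value.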