Cumulating value of current row + sum of previous rows - sql

How would you transform a column in a table from this:
ColumnA ColumnB
2 a
3 b
4 c
5 d
1 a
to this:
ColumnA ColumnB
3 a
6(=3+3) b
10(=4+3+3) c
15(=5+4+3+3) d
I'm especially interested to see which method you would pick.

Like this:
;WITH cte
AS
(
SELECT ColumnB, SUM(ColumnA) asum
FROM #t
GROUP BY ColumnB
), cteRanked AS
(
SELECT asum, ColumnB, ROW_NUMBER() OVER(ORDER BY ColumnB) rownum
FROM cte
)
SELECT (SELECT SUM(asum) FROM cteRanked c2 WHERE c2.rownum <= c1.rownum),
ColumnB
FROM cteRanked c1;
This should give you:
ColumnA ColumnB
3 a
6 b
10 c
15 d

I'd generally avoid trying to do so, but the following matches what you've asked for:
create table #T (ColumnA int,ColumnB char(1))
insert into #T(ColumnA,ColumnB) values
(2 , 'a'),
(3 , 'b'),
(4 , 'c'),
(5 , 'd'),
(1, 'a')
;With Bs as (
select distinct ColumnB from #T
)
select
SUM(t.ColumnA),b.ColumnB
from
Bs b
inner join
#T t
on
b.ColumnB >= t.ColumnB
group by
b.ColumnB
Result:
ColumnB
----------- -------
3 a
6 b
10 c
15 d
For small data sets, this will be fine. For larger data sets, note that every row is joined to all rows that sort before it, so the last row relies on obtaining the SUM over the entire contents of the original table.

Try the below script,
CREATE TABLE #T (ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO #T VALUES
(2, 'a'),
(3, 'b'),
(4, 'c'),
(5, 'd'),
(1, 'a');
SELECT SUM(ColumnA) OVER(ORDER BY ColumnB) AS ColumnA,ColumnB
FROM ( SELECT SUM(ColumnA) AS ColumnA,ColumnB
FROM #T GROUP BY ColumnB )T
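
The same "aggregate per ColumnB first, then take a windowed running SUM" idea can be tried outside SQL Server. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), and the table name t and the function name running_totals are made up for the illustration.

```python
import sqlite3

def running_totals(rows):
    """Return (running_sum, ColumnB) pairs, one per distinct ColumnB."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT SUM(ColumnA) OVER (ORDER BY ColumnB) AS ColumnA, ColumnB
        FROM (SELECT SUM(ColumnA) AS ColumnA, ColumnB   -- one row per ColumnB
              FROM t
              GROUP BY ColumnB) AS g
        ORDER BY ColumnB
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question.
sample = [(2, "a"), (3, "b"), (4, "c"), (5, "d"), (1, "a")]
print(running_totals(sample))
```

The inner query collapses the two 'a' rows into a single row before the window function runs, which is why 'a' shows 3 rather than 2.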

Not sure if this is optimal, but how about:
SELECT x.A + COALESCE(SUM(y.A),0) ColumnA, x.ColumnB
FROM
(
SELECT SUM(ColumnA) A, ColumnB
FROM myTable
GROUP BY ColumnB
) x
LEFT OUTER JOIN
(
SELECT SUM(ColumnA) A, ColumnB
FROM myTable
GROUP BY ColumnB
) y ON y.ColumnB < x.ColumnB
GROUP BY x.ColumnB, x.A

create table #T
(
ID int primary key,
ColumnA int,
ColumnB char(1)
);
insert into #T
select row_number() over(order by ColumnB),
sum(ColumnA) as ColumnA,
ColumnB
from YourTable
group by ColumnB;
with C as
(
select ID,
ColumnA,
ColumnB
from #T
where ID = 1
union all
select T.ID,
T.ColumnA + C.ColumnA,
T.ColumnB
from #T as T
inner join C
on T.ID = C.ID + 1
)
select ColumnA,
ColumnB
from C
option (maxrecursion 0);
drop table #T;

Are you using SQL Server? If so:
Suppose you have a table (called YourTable here, as a placeholder) with 3 columns C_1, C_2, C_3, ordered by C_1.
Simply use Over (Order By C_1) to add a column holding the running sum of C_3:
Select C_1, C_2, C_3, Sum(C_3) Over (Order By C_1) From YourTable
If you want a row number too, add it in the same way:
Select Row_Number() Over (Order By C_1), C_1, C_2, C_3, Sum(C_3) Over (Order By C_1) From YourTable

If you are using SQL Server 2012 or greater then this will produce the required result.
CREATE TABLE #t (
ColumnA int,
ColumnB varchar(50)
);
INSERT INTO #t VALUES
(2,'a'),
(3,'b'),
(4,'c'),
(5,'d'),
(1,'a');
SELECT
SUM(ColumnA) OVER (ORDER BY ColumnB ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ColumnA,
ColumnB
FROM (
SELECT
ColumnB,
SUM(ColumnA) AS ColumnA
FROM #t
GROUP BY ColumnB
) DVTBL
ORDER BY ColumnB

CREATE TABLE #t (ColumnA INT, ColumnB VARCHAR(50));
INSERT INTO #t VALUES
(2, 'a'),
(3 , 'b'),
(4 , 'c'),
(5 , 'd'),
(1 , 'a');
;WITH cte
AS
(
SELECT ColumnB, sum(ColumnA) value,ROW_NUMBER() OVER(ORDER BY ColumnB) sr_no FROM #t group by ColumnB
)
SELECT ColumnB
,SUM(value) OVER ( ORDER BY ColumnB ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING)
FROM cte c1;

The best solution (simplest and quickest) is to use an OVER(ORDER BY) clause.
I will describe my problem and the solution I found.
I have a table containing annual transactions, with the following columns:
Yearx INT
NoSeq INT
Amount DECIMAL(10,2)
Balance DECIMAL(10,2)
The first three columns have values; the Balance column is empty.
Problem
How do I fill in the Balance values, given that the opening balance on 1 January is 5000€?
Example
NoSeq Amount Balance
----- -------- ---------
1 120.00+ 5120.00+ <= 5000 + 120
2 16.00- 5104.00+ <= 5000 + 120 - 16
3 3000.00- 2104.00+ <= 5000 + 120 - 16 - 3000
4 640.00+ 2744.00+ <= 5000 + 120 - 16 - 3000 + 640
Solution (based on Abdul Rasheed's answer)
WITH
t AS
(
SELECT NoSeq
,Amount
FROM payements
WHERE Yearx = 2021
)
SELECT NoSeq
,Amount
,1179.18 + SUM(Amount) OVER(ORDER BY NoSeq
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS Balance
FROM t
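In PostgreSQL, when ORDER BY is given without a frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That gives the same result as the ROWS frame used above as long as NoSeq is unique, so the previous SELECT can be reduced to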
WITH
t AS
(
SELECT NoSeq
,Amount
FROM payements
WHERE Yearx = 2021
)
SELECT NoSeq
,Amount
,1179.18 + SUM(Amount) OVER(ORDER BY NoSeq) as balance
FROM t
The first part (the WITH clause) defines the table to which OVER(ORDER BY) is applied in the final SELECT.
The second part computes the running sum using the temporary table t.
In my case the WITH clause is not necessary, and the SELECT can ultimately be reduced to the following SQL command
SELECT NoSeq
,Amount
,1179.18 + SUM(Amount) OVER(ORDER BY NoSeq) as balance
FROM payements
WHERE Yearx = 2021
I use this last SQL command in my VB.Net - PostgreSQL application.
To compute more than one year, knowing the Balance value on 1 January 2010, I use the following SQL command
SELECT Yearx
,NoSeq
,Amount
,-279.34 + SUM(Amount) OVER(ORDER BY Yearx,NoSeq) as balance
FROM payements
WHERE Yearx BETWEEN 2010 AND 2021
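
The shortened form without an explicit frame is only safe when the ORDER BY key has no ties. The sketch below illustrates the difference between the default frame and an explicit ROWS frame. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, whose default frame matches the PostgreSQL behaviour in this respect; the sample rows and the function name compare_frames are invented for the illustration.

```python
import sqlite3

def compare_frames(rows):
    """Return (NoSeq, Amount, default_frame_sum, rows_frame_sum) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payements (NoSeq INTEGER, Amount INTEGER)")
    con.executemany("INSERT INTO payements VALUES (?, ?)", rows)
    query = """
        SELECT NoSeq,
               Amount,
               -- no frame clause: RANGE ... CURRENT ROW, so tied rows are summed together
               SUM(Amount) OVER (ORDER BY NoSeq) AS default_frame,
               -- explicit ROWS frame: strictly one row at a time
               SUM(Amount) OVER (ORDER BY NoSeq
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND CURRENT ROW) AS rows_frame
        FROM payements
        ORDER BY NoSeq, rows_frame
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Unique NoSeq values: both frames agree.
print(compare_frames([(1, 10), (2, 20), (3, 30)]))
# Duplicate NoSeq = 2: the default frame jumps straight to 50 for both tied rows.
print(compare_frames([(1, 10), (2, 20), (2, 20), (3, 40)]))
```

With the multi-year query, ordering by (Yearx, NoSeq) keeps the key unique as long as NoSeq is unique within each year.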

You can also do it this way:
WITH grpAllData
AS
(
SELECT ColumnB, SUM(ColumnA) grpValue
FROM table_Name
GROUP BY ColumnB
)
SELECT g.ColumnB, sum(grpValue) OVER(ORDER BY ColumnB) desireValue
FROM grpAllData g
order by ColumnB
In the above query, we first aggregate all values in the same group, then in the final SELECT we apply a window function to that result.

SELECT g.ColumnB as "ColumnB",
SUM(g.group_sum) over (ORDER BY g.ColumnB ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as "ColumnA"
FROM (
SELECT SUM(ColumnA) as group_sum,
ColumnB
FROM cand
GROUP BY ColumnB) g
ORDER BY g.ColumnB
This groups by ColumnB, summing ColumnA, and then applies a window function to the group sums to generate the cumulative sum. The ORDER BY belongs inside OVER(), because the ordering of a subquery is not guaranteed to carry through to the window.

That was my question too, and I used the answers here. With more research I found another solution, based on window functions, which is better optimized and easier (and more fun). Here it is:
--- creating table and inserting values of the question
CREATE TABLE #tmp ( ColumnA INT , ColumnB VARCHAR(1))
INSERT INTO #tmp
VALUES (2,'a'),(3,'b'),(4,'c'),(5,'d'),(1,'a')
---- my solution
SELECT
SUM(ColumnA) OVER (ORDER BY ColumnB ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ColumnA
,ColumnB
FROM
(
SELECT SUM(ColumnA) ColumnA,ColumnB FROM #tmp GROUP BY ColumnB
) X
And the result is:
ColumnA ColumnB
----------- -------
3 a
6 b
10 c
15 d

This computes a cumulative sum of a column within groups defined by other columns.
See the SQL below:
SELECT a.product,
a.product_group,
a.fiscal_year,
Sum(a.quantity) OVER ( partition BY a.fiscal_year, a.product_group ORDER BY a.posting_date, a.product_group rows UNBOUNDED PRECEDING) AS quantity
FROM report a
-- WHERE ... (the filter was left out in the original)
ORDER BY a.fiscal_year DESC

You can use the simple SELECT statement below for the same purpose:
SELECT COLUMN_A, COLUMN_B,
(SELECT SUM(COLUMN_B) FROM #TBL T2 WHERE T2.ID <= T1.ID) as SumofPreviousRow FROM #TBL T1;

Related

Select first occurrence of list item in table

I have a list like this example:
abc, efg, rty
and a table with following data:
1 abcd
2 efgh
3 abcd
4 rtyu
5 efgh
Now I want to find, for each list item, the first row in the table that starts with it. My expected result is:
1 abcd
2 efgh
4 rtyu
This is a complete script to do the job
Create Table #v_List
(
Text nvarchar(100)
)
Create Table #v_Data
(
Number int,
Text nvarchar(100)
)
Insert Into #v_List values(N'abc')
Insert Into #v_List values(N'efg')
Insert Into #v_List values(N'rty')
Insert Into #v_Data values(1, N'abcd')
Insert Into #v_Data values(2, N'efgh')
Insert Into #v_Data values(3, N'abcd')
Insert Into #v_Data values(4, N'rtyu')
Insert Into #v_Data values(5, N'efgh')
;with CTE as
(
Select D.Number,
D.Text,
ROW_NUMBER() OVER (PARTITION BY L.Text Order By D.Number) as Row_No
From #v_Data D
Join #v_List L
On D.Text like L.Text + '%'
)
Select CTE.Number,
CTE.Text
From CTE
Where CTE.Row_No = 1
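
The join-on-prefix plus ROW_NUMBER pattern used in the script above can be sketched in a self-contained form as follows. This is an illustration only, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the table names v_list and v_data, the column name txt, and the function name are hypothetical, and SQLite's || operator stands in for T-SQL's + string concatenation.

```python
import sqlite3

def first_match_per_prefix(prefixes, data):
    """For each prefix, return the lowest-numbered row whose text starts with it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE v_list (txt TEXT)")
    con.execute("CREATE TABLE v_data (number INTEGER, txt TEXT)")
    con.executemany("INSERT INTO v_list VALUES (?)", [(p,) for p in prefixes])
    con.executemany("INSERT INTO v_data VALUES (?, ?)", data)
    query = """
        WITH ranked AS (
            SELECT d.number,
                   d.txt,
                   -- number the matches of each list item, lowest row number first
                   ROW_NUMBER() OVER (PARTITION BY l.txt ORDER BY d.number) AS row_no
            FROM v_data AS d
            JOIN v_list AS l ON d.txt LIKE l.txt || '%'
        )
        SELECT number, txt FROM ranked WHERE row_no = 1 ORDER BY number
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

rows = [(1, "abcd"), (2, "efgh"), (3, "abcd"), (4, "rtyu"), (5, "efgh")]
print(first_match_per_prefix(["abc", "efg", "rty"], rows))
```

Partitioning by the list item (rather than by the data value) means two different data values that share a prefix still compete for the same "first" slot.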
select * from TableName
where Id in
(
select min(Id) from
(
select Id,
case
when Val like 'abc%' then 1
when Val like 'efg%' then 2
when Val like 'rty%' then 3
else 0 end temp
from TableName
)t where temp > 0
group by temp
)
You can use a windowed ROW_NUMBER to generate a sequential number by each different value, then just display the first one only.
;WITH RowNumbersByValue AS
(
SELECT
T.ID,
T.Value,
RowNumber = ROW_NUMBER() OVER (PARTITION BY T.Value ORDER BY T.ID)
FROM
YourTable AS T
)
SELECT
R.ID,
R.Value
FROM
RowNumbersByValue AS R
WHERE
R.Value IN ('abcd', 'efgh', 'rtyu') AND
R.RowNumber = 1
For SQL Server I prefer this version, which does not require a subquery:
SELECT TOP 1 WITH TIES ID, Value
FROM yourTable
WHERE Value LIKE 'abc%' OR Value LIKE 'efg%' OR Value LIKE 'rty%'
ORDER BY ROW_NUMBER() OVER (PARTITION BY Value ORDER BY ID);
SELECT * INTO #temp FROM (VALUES
(1 ,'abcd'),
(2 ,'efgh'),
(3 ,'abcd'),
(4 ,'rtyu'),
(5 ,'efgh'))a([id], [name])
You can use min and group by function
SELECT MIN(id), name FROM #temp GROUP BY name
There are many ways to achieve this; use whichever suits you best.
Using a subquery:
select id, col from
(select Row_number() over (partition by col order by id) as slno, id, col from yourtable)
as tb where tb.slno=1
Using a CTE:
; with cte as (
select row_number() over (partition by col order by id) as Slno, id, col from yourtable)
select id, col from cte where slno=1
Using MIN:
select Min(id) , col from yourtable group by col
Note:
You can append a WHERE clause to any of the queries above to filter the records as needed.

2 rows differences

I would like to get 2 consecutive rows from an SQL table.
One of the columns stores a UNIX timestamp, and this value is the only thing I want to compare between the 2 rows.
For example:
id_int dt_int
1. row 8211721 509794233
2. row 8211722 509794233
I need only those rows where dt_int is the same (edited).
Do you want both lines to be shown?
A solution could be this:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
select
id_int
from
(
select
id_int
,id_int-isnull(lag(id_int,1) over (order by id_int) ,id_int-6) prev
,isnull(lead(id_int,1) over (order by id_int) ,id_int+6)-id_int nxt
from foo
) a
where prev<=5 or nxt<=5
We use lead and lag to find the differences between neighbouring rows, and keep the rows whose difference from the row before or after is less than or equal to 5.
If you use SQL Server 2008 R2, lag and lead are not available. You could use ROW_NUMBER instead:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
, rownums as
(
select
id_int
,row_number() over (order by id_int) rn
from foo
)
select
id_int
from
(
select
cur.id_int
,cur.id_int-prev.id_int prev
,nxt.id_int-cur.id_int nxt
from rownums cur
left join rownums prev
on cur.rn-1=prev.rn
left join rownums nxt
on cur.rn+1=nxt.rn
) a
where isnull(prev,6)<=5 or isnull(nxt,6)<=5
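
For a quick check of the lag/lead variant against the sample ids, here is a minimal sketch. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later; the function name close_neighbours and the max_gap parameter are made up. Instead of substituting a dummy gap of 6 with isnull, it relies on a NULL gap simply not satisfying the comparison.

```python
import sqlite3

def close_neighbours(ids, max_gap):
    """Return the ids that lie within max_gap of the previous or next id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (id_int INTEGER)")
    con.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in ids])
    query = """
        SELECT id_int
        FROM (
            SELECT id_int,
                   id_int - LAG(id_int)  OVER (ORDER BY id_int) AS prev_gap,
                   LEAD(id_int) OVER (ORDER BY id_int) - id_int AS next_gap
            FROM foo
        ) AS gaps
        -- first/last rows have a NULL gap on one side, which never passes the test
        WHERE prev_gap <= ? OR next_gap <= ?
        ORDER BY id_int
    """
    result = [row[0] for row in con.execute(query, (max_gap, max_gap))]
    con.close()
    return result

sample_ids = [8211721, 8211722, 8211728, 8211740, 8211741]
print(close_neighbours(sample_ids, 5))
```

Here 8211728 is dropped because it is 6 away from its predecessor and 12 away from its successor.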
Assuming:
the lead() analytic function is available;
ID_INT is what we need to sort by to determine table order;
you may need to partition by some value, e.g. lead(ID_int) over (partition by SomeKeysuchasOrderNumber order by ID_int asc), so that orders and dates don't get mixed together.
WITH CTE AS (
SELECT A.*
, lead(ID_int) over ([missing partition info] ORDER BY id_Int asc) - id_int as ID_INT_DIFF
FROM Table A)
SELECT *
FROM CTE
WHERE ID_INT_DIFF < 5;
You can try this. It works on SQL Server 2005 and above (it needs CTEs and ROW_NUMBER). I don't have a more recent SQL Server available to test on today.
create table #t (id_int int, dt_int int)
INSERT #T SELECT 8211721 , 509794233
INSERT #T SELECT 8211722 , 509794233
INSERT #T SELECT 8211723 , 509794235
INSERT #T SELECT 8211724 , 509794236
INSERT #T SELECT 8211729 , 509794237
INSERT #T SELECT 8211731 , 509794238
;with cte_t as
(SELECT
ROW_NUMBER() OVER (ORDER BY id_int) id
,id_int
,dt_int
FROM #t),
cte_diff as
( SELECT
id_int
,dt_int
,(SELECT TOP 1 dt_int FROM cte_t b WHERE a.id < b.id ORDER BY b.id) dt_int1
,dt_int - (SELECT TOP 1 dt_int FROM cte_t b WHERE a.id < b.id ORDER BY b.id) Difference
FROM cte_t a
)
SELECT DISTINCT id_int , dt_int FROM #t a
WHERE
EXISTS(SELECT 1 FROM cte_diff b where b.Difference =0 and a.dt_int = b.dt_int)

Get two random records (different in one attribute) from table

Very simple table as an example but no idea how to achieve this:
Example: Table1
ColumnA ColumnB
1 A
1 B
2 C
For two random records: I know I could do like
Select top 2 *
From Table1
order by NewID()
But now I would like to select two random records that do not share the same value in ColumnA, which means the result cannot contain '1 A' together with '1 B'; any other combination is fine.
Any ideas? Thanks in advance
DROP TABLE #T
CREATE TABLE #T(ID INT
,Vals CHAR(2)
)
INSERT INTO #T VALUES
(1,'A')
,(1,'B')
,(2,'A')
,(2,'C')
,(3,'D')
,(4,'E')
,(5,'E')
SELECT TOP 2
ID,
Vals
FROM
(
SELECT
ID
,VALS
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NEWID()) Rnk
FROM
#T) T
WHERE
Rnk = 1
order by NewID()
Here's a way to do it, but it can get expensive if your table is very large:
;With Random As
(
Select *,
Row_Number() Over (Partition By ColumnA Order By NewId()) As RN
From Table1
)
Select Top 2 ColumnA, ColumnB
From Random
Where RN = 1
Order By NewId()
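
To see that this pattern never returns two rows with the same ColumnA, it can be run repeatedly against the question's sample table. The sketch below is an assumption-laden translation, not the answer's exact code: it uses Python's built-in sqlite3 module (SQLite 3.25 or later), with random() in place of NewId() and LIMIT in place of TOP; the function name is hypothetical.

```python
import sqlite3

def two_random_distinct(rows):
    """Pick two random rows that never share the same ColumnA."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (ColumnA INTEGER, ColumnB TEXT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)
    query = """
        WITH randomised AS (
            SELECT ColumnA,
                   ColumnB,
                   -- shuffle the rows inside each ColumnA group
                   ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY random()) AS rn
            FROM Table1
        )
        SELECT ColumnA, ColumnB
        FROM randomised
        WHERE rn = 1          -- keep one random row per ColumnA
        ORDER BY random()     -- then pick two of those groups at random
        LIMIT 2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

table1 = [(1, "A"), (1, "B"), (2, "C")]
print(two_random_distinct(table1))
```

Note that this picks groups uniformly, so a row in a small group is more likely to be chosen than a row in a large group.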

SQL group by if values are close

Class| Value
-------------
A | 1
A | 2
A | 3
A | 10
B | 1
I am not sure whether it is practical to achieve this using SQL.
If the difference between neighbouring values is less than 5 (or x), then group the rows together (within the same Class, of course).
Expected result
Class| ValueMin | ValueMax
---------------------------
A | 1 | 3
A | 10 | 10
B | 1 | 1
For fixed intervals, we can easily use "GROUP BY". But now the grouping is based on nearby row's value. So if the values are consecutive or very close, they will be "chained together".
Thank you very much
Assuming MSSQL
You are trying to group things by gaps between values. The easiest way to do this is to use the lag() function to find the gaps:
select class, min(value) as minvalue, max(value) as maxvalue
from (select class, value,
sum(IsNewGroup) over (partition by class order by value) as GroupId
from (select class, value,
(case when lag(value) over (partition by class order by value) > value - 5
then 0 else 1
end) as IsNewGroup
from t
) t
) t
group by class, groupid;
Note that this assumes SQL Server 2012 for the use of lag() and cumulative sum.
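
The two steps of this answer (flag each row that starts a new group with lag(), then turn the flags into a group id with a running sum) can be checked on the question's data with a small sketch. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later; the function name group_close_values and the gap parameter are invented for the illustration.

```python
import sqlite3

def group_close_values(rows, gap):
    """Chain values of the same class whose neighbours differ by less than gap."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (class TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT class, MIN(value) AS minvalue, MAX(value) AS maxvalue
        FROM (
            SELECT class, value,
                   -- running count of group starts = id of the current group
                   SUM(is_new_group) OVER (PARTITION BY class ORDER BY value) AS group_id
            FROM (
                SELECT class, value,
                       -- 0 when the previous value is close enough, else 1
                       -- (the first row of a class has no previous value, so it gets 1)
                       CASE WHEN LAG(value) OVER (PARTITION BY class ORDER BY value) > value - ?
                            THEN 0 ELSE 1
                       END AS is_new_group
                FROM t
            ) AS flagged
        ) AS grouped
        GROUP BY class, group_id
        ORDER BY class, minvalue
    """
    result = con.execute(query, (gap,)).fetchall()
    con.close()
    return result

data = [("A", 1), ("A", 2), ("A", 3), ("A", 10), ("B", 1)]
print(group_close_values(data, 5))
```

Because each value is compared only with its immediate predecessor, a long chain such as 1, 4, 7, 10 ends up in a single group even though its ends are more than 5 apart, which matches the "chained together" wording of the question.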
Update:
*This answer is incorrect*
Assuming the table you gave is called sd_test, the following query will give you the output you are expecting
In short, we need a way to find what was the value on the previous row. This is determined using a join on row ids. Then create a group to see if the difference is less than 5. and then it is just regular 'Group By'.
If your version of SQL Server supports windowing functions with partitioning the code would be much more readable.
SELECT
A.CLASS
,MIN(A.VALUE) AS MIN_VALUE
,MAX(A.VALUE) AS MAX_VALUE
FROM
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS A
LEFT JOIN
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS B
ON A.CLASS = B.CLASS AND A.ROW_ID=B.ROW_ID+1
GROUP BY A.CLASS,CASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END
ORDER BY A.CLASS,cASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END DESC
ps: I think the above is ANSI compliant. So should run in most SQL variants. Someone can correct me if it is not.
These give the correct result, using the fact that you must have the same number of group starts as ends and that they will both be in ascending order.
if object_id('tempdb..#temp') is not null drop table #temp
create table #temp (class char(1),Value int);
insert into #temp values ('A',1);
insert into #temp values ('A',2);
insert into #temp values ('A',3);
insert into #temp values ('A',10);
insert into #temp values ('A',13);
insert into #temp values ('A',14);
insert into #temp values ('b',7);
insert into #temp values ('b',8);
insert into #temp values ('b',9);
insert into #temp values ('b',12);
insert into #temp values ('b',22);
insert into #temp values ('b',26);
insert into #temp values ('b',67);
Method 1 Using CTE and row offsets
with cte as
(select distinct class,value,ROW_NUMBER() over ( partition by class order by value ) as R from #temp),
cte2 as
(
select
c1.class
,c1.value
,c2.R as PreviousRec
,c3.r as NextRec
from
cte c1
left join cte c2 on (c1.class = c2.class and c1.R= c2.R+1 and c1.Value < c2.value + 5)
left join cte c3 on (c1.class = c3.class and c1.R= c3.R-1 and c1.Value > c3.value - 5)
)
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where PreviousRec is null) as Starts join
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where NextRec is null) as Ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
Method 2 Inline views using not exists
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value < t.Value and Value > t.Value -5 )
) Starts join
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value > t.Value and Value < t.Value +5 )
) ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
In both methods I start with a select distinct, because if you have a duplicate entry at a group start or end, things go awry without it.
Here is one way of getting the information you are after:
SELECT Under5.Class,
(
SELECT MIN(m2.Value)
FROM MyTable AS m2
WHERE m2.Value < 5
AND m2.Class = Under5.Class
) AS ValueMin,
(
SELECT MAX(m3.Value)
FROM MyTable AS m3
WHERE m3.Value < 5
AND m3.Class = Under5.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m1.Class
FROM MyTable AS m1
WHERE m1.Value < 5
) AS Under5
UNION
SELECT Over4.Class,
(
SELECT MIN(m4.Value)
FROM MyTable AS m4
WHERE m4.Value >= 5
AND m4.Class = Over4.Class
) AS ValueMin,
(
SELECT Max(m5.Value)
FROM MyTable AS m5
WHERE m5.Value >= 5
AND m5.Class = Over4.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m6.Class
FROM MyTable AS m6
WHERE m6.Value >= 5
) AS Over4

How to select top 3 values from each group in a table with SQL which have duplicates [duplicate]

Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.
If there are no duplicates in the table, the issue can be solved by the query provided in the article "Select top 3 values from each group in a table with SQL". But if there are duplicates, what is the solution?
For example, if for one name John, he has 5 values related to him. They are 20,7,7,7,4. I need to return the name/value pairs as below order by value descending for each name:
-----------+-------+
| name | value |
-----------+-------+
| John | 20 |
| John | 7 |
| John | 7 |
-----------+-------+
Only 3 rows should be returned for John even though there are three 7s for John.
In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER() which is part of the latest SQL standard:
WITH cte AS
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
)
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
Without CTE, only ROW_NUMBER():
SELECT name, value, rn
FROM
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
Postgres
Oracle
SQL-Server
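
As a quick check of how ROW_NUMBER() treats the duplicate 7s, here is a minimal sketch of the same query. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (SQLite is not one of the systems listed above); the extra Jane rows and the function name top_n_per_name are added for the illustration.

```python
import sqlite3

def top_n_per_name(rows, n):
    """Return at most n (name, value) rows per name, highest values first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT name, value,
                   -- ties still get distinct row numbers, so duplicates are cut off at n
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS rn
            FROM t
        )
        SELECT name, value
        FROM cte
        WHERE rn <= ?
        ORDER BY name, rn
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result

people = [("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
          ("Jane", 30), ("Jane", 21)]
print(top_n_per_name(people, 3))
```

Swapping ROW_NUMBER() for RANK() or DENSE_RANK() would let all three 7s through, which is exactly what the question wants to avoid.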
In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.
The (tid) is assumed to be the primary key of the table:
SELECT t.tid, t.name, t.value, -- self join and GROUP BY
COUNT(*) AS rn
FROM t
JOIN t AS t2
ON t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;
SELECT t.tid, t.name, t.value, rn
FROM
( SELECT t.tid, t.name, t.value,
( SELECT COUNT(*) -- inline, correlated subquery
FROM t AS t2
WHERE t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
) AS rn
FROM t
) AS t
WHERE rn <= 3
ORDER BY name, rn ;
Tested in MySQL
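
The tie-breaking condition in the correlated subquery (count the rows with a higher value, plus the rows with an equal value and a lower or equal tid) can be traced on John's values with a short sketch. It assumes Python's built-in sqlite3 module and uses no window functions; the tid values and the function name are made up for the illustration.

```python
import sqlite3

def top_n_without_window_functions(rows, n):
    """Top n rows per name using only a correlated COUNT(*) subquery."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (tid INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT t.tid, t.name, t.value
        FROM t
        WHERE ( SELECT COUNT(*)
                FROM t AS t2
                WHERE t2.name = t.name
                  AND ( t2.value > t.value
                        -- equal values are ranked by primary key to break ties
                        OR (t2.value = t.value AND t2.tid <= t.tid) )
              ) <= ?
        ORDER BY t.name, t.value DESC, t.tid
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result

john = [(1, "John", 20), (2, "John", 7), (3, "John", 7), (4, "John", 7), (5, "John", 4)]
print(top_n_without_window_functions(john, 3))
```

The row with tid 4 gets a count of 4 (one higher value plus three 7s with tid up to 4), so it is the first one excluded.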
I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.
Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3)
However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.
Here is an example of it working in SQL Server:
with t as (
select 1 as personid, 5 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 7 as val union all
select 1 as personid, 8 as val
)
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3);
Using GROUP_CONCAT and FIND_IN_SET you can do that.
SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
FROM tbl t1
WHERE t1.name = t.name
GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc
If your result set is not too large, you can write a stored procedure (or an anonymous PL/SQL block) that iterates over the result set and finds the three biggest values with a simple comparison algorithm.
Try this -
CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);
WITH cte
AS (
SELECT NAME
,value
,ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY (value) DESC
) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC
This works for MS SQL. It should be workable in any other SQL dialect that can assign row numbers in a GROUP BY or OVER clause (or equivalent).
if object_id('tempdb..#Data') is not null drop table #Data;
GO
create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO
with D as (
SELECT
name
,Value
,row_number() over (partition by name order by value desc) rn
From
#Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc
Name Value
Jack 5
Jane 30
Jane 21
Jane 18
John 20
John 7
John 7