SQL: sum of the highest n values in each group [duplicate]

This question already has answers here:
Using LIMIT within GROUP BY to get N results per group?
(14 answers)
In MySQL, how can I find the sum of the N largest values grouped on a particular column? [duplicate]
(1 answer)
Closed 4 years ago.
I need to find the sum of the n highest values in each group.
For example, with n = 2:
group | points
g1 | 3
g2 | 3
g3 | 4
g1 | 2
g1 | 4
g2 | 5
g2 | 5
g3 | 1
g3 | 2
Expected result:
group | sum
g1 | 7
g2 | 10
g3 | 6
I'd like an SQL solution using JOIN and GROUP BY.
Thanks!

If your RDBMS supports window functions, you can use ROW_NUMBER() to number the records in each group, ordered by points descending, and then keep only the top 2 records of each group in an outer, aggregated query.
SELECT grp, SUM(points) total
FROM (
SELECT grp, points, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY points DESC) rn
FROM mytable
) x
WHERE rn <= 2
GROUP BY grp
ORDER BY grp
This MySQL 8.0 DB Fiddle with your sample data yields:
| grp | total |
| --- | ----- |
| g1 | 7 |
| g2 | 10 |
| g3 | 6 |
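A minimal sketch for trying the ROW_NUMBER() query locally, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions). The table and column names follow the answer; the helper name sum_top_two is hypothetical.

```python
import sqlite3

# Sample data from the question: (group, points) pairs.
SAMPLE = [("g1", 3), ("g2", 3), ("g3", 4), ("g1", 2), ("g1", 4),
          ("g2", 5), ("g2", 5), ("g3", 1), ("g3", 2)]

def sum_top_two(rows):
    """Return [(grp, total)] where total is the sum of the 2 highest points per grp."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    conn.execute("CREATE TABLE mytable (grp TEXT, points INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT grp, SUM(points) AS total
        FROM (
            SELECT grp, points,
                   ROW_NUMBER() OVER (PARTITION BY grp ORDER BY points DESC) AS rn
            FROM mytable
        ) AS x
        WHERE rn <= 2
        GROUP BY grp
        ORDER BY grp
    """).fetchall()
    conn.close()
    return result

print(sum_top_two(SAMPLE))  # [('g1', 7), ('g2', 10), ('g3', 6)]
```

Ties (such as the two 5s in g2) are numbered in an arbitrary order, but that does not change the sum.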

Example dataset:
create table #temp (name varchar(20), value int)
insert into #temp values ('g1',2),('g2',2),('g3',2),('g2',7),
('g3',9),('g1',4),('g2',8),('g3',1),('g1',3),('g1',11)
Another way to handle this in SQL Server is to use CROSS APPLY:
-- Returns the top 2 rows (highest value) for each group.
SELECT x.*
FROM ( SELECT DISTINCT name FROM #temp ) c
CROSS APPLY ( SELECT TOP 2 * FROM #temp t WHERE c.name = t.name order by value desc ) x
-- Returns the sum of the top 2 values in each group.
SELECT x.name, SUM(x.Value) as Total
FROM ( SELECT DISTINCT name FROM #temp ) c
CROSS APPLY ( SELECT TOP 2 * FROM #temp t WHERE c.name = t.name order by value desc ) x
group by x.name
Since n is not fixed here and will change based on the user's choice, you can use a dynamic query like the one below. (On SQL Server 2005 and later, TOP (@n) also accepts a variable directly, which avoids dynamic SQL.)
declare @n int = 2;
declare @sql nvarchar(max) = 'SELECT x.name, SUM(x.Value) as Total
FROM ( SELECT DISTINCT name FROM #temp ) c
CROSS APPLY ( SELECT TOP '+cast(@n as nvarchar(10))+' * FROM #temp t
WHERE c.name = t.name order by value desc ) x
group by x.name'
exec sp_executesql @sql
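SQLite has neither TOP nor CROSS APPLY, so this sketch swaps in the ROW_NUMBER() filter from the first answer. What it illustrates is passing n as a bound query parameter instead of concatenating it into the SQL text. It assumes Python's sqlite3 module with SQLite 3.25+, renames #temp to a hypothetical scores table, and reuses the example dataset above; sum_top_n is a hypothetical name.

```python
import sqlite3

# Data from the #temp example: (name, value) pairs.
TEMP_ROWS = [("g1", 2), ("g2", 2), ("g3", 2), ("g2", 7), ("g3", 9),
             ("g1", 4), ("g2", 8), ("g3", 1), ("g1", 3), ("g1", 11)]

def sum_top_n(rows, n):
    """Sum the n highest values per name; n is bound as a parameter, not concatenated."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT name, SUM(value) AS total
        FROM (
            SELECT name, value,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS rn
            FROM scores
        ) AS x
        WHERE rn <= ?
        GROUP BY name
        ORDER BY name
    """, (n,)).fetchall()
    conn.close()
    return result

print(sum_top_n(TEMP_ROWS, 2))  # [('g1', 15), ('g2', 15), ('g3', 11)]
print(sum_top_n(TEMP_ROWS, 3))  # [('g1', 18), ('g2', 17), ('g3', 12)]
```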

Related

How to find duplicate sets of values in column SQL

I have a database table in SQL Server like this:
+----+--------+
| ID | Number |
+----+--------+
| 1 | 4 |
| 2 | 2 |
| 3 | 6 |
| 4 | 5 |
| 5 | 3 |
| 6 | 2 |
| 7 | 6 |
| 8 | 4 |
| 9 | 5 |
| 10 | 1 |
| 11 | 6 |
| 12 | 4 |
| 13 | 2 |
| 14 | 6 |
+----+--------+
I want to find every earlier place in column Number whose values match the last row (or the last 2 rows, the last 3 rows, and so on), then get the number that comes right after each match and count how many times it appears.
Result output like this:
Matching the last row (6):
The numbers that follow 6 in column Number are 4 and 5.
The pair 6,4 appears 2 times and the pair 6,5 appears once.
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6 | 5 | 1 |
| 6 | 4 | 2 |
+---------------------+-------------------------+--------------+
Matching the last two rows (2,6):
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 2,6 | 5 | 1 |
| 2,6 | 4 | 1 |
+---------------------+-------------------------+--------------+
Matching the last 3 rows (4,2,6):
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 4,2,6 | 5 | 1 |
+---------------------+-------------------------+--------------+
And so on for the last 4, 5, 6, ... rows, until Times appear is 0:
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6,4,2,6 | | |
+---------------------+-------------------------+--------------+
Any idea how to do this? Thanks so much!
Here's an answer that uses the LEAD function, which (given an ordering) takes the value from a specified number of rows ahead.
It extends each row of your table with the next 3 numbers.
You can then join on those columns to count each combination.
CREATE TABLE #Src (Id int PRIMARY KEY, Num int)
INSERT INTO #Src (Id, Num) VALUES
( 1, 4),
( 2, 2),
( 3, 6),
( 4, 5),
( 5, 3),
( 6, 2),
( 7, 6),
( 8, 4),
( 9, 5),
(10, 1),
(11, 6),
(12, 4),
(13, 2),
(14, 6)
CREATE TABLE #SrcWithNext (Id int PRIMARY KEY, Num int, Next1 int, Next2 int, Next3 int)
-- First step - use LEAD to get the next 1, 2, 3 values
INSERT INTO #SrcWithNext (Id, Num, Next1, Next2, Next3)
SELECT ID, Num,
LEAD(Num, 1, NULL) OVER (ORDER BY Id) AS Next1,
LEAD(Num, 2, NULL) OVER (ORDER BY Id) AS Next2,
LEAD(Num, 3, NULL) OVER (ORDER BY Id) AS Next3
FROM #Src
SELECT * FROM #SrcWithNext
/* Find number with each combination */
-- Sequences of 2 numbers
SELECT A.Num, A.Next1, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B ON A.Num = B.Num AND A.Next1 = B.Next1
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1
ORDER BY A.Num, A.Next1
-- Sequences of 3 numbers
SELECT A.Num, A.Next1, A.Next2, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2
ORDER BY A.Num, A.Next1, A.Next2
-- Sequences of 4 numbers
SELECT A.Num, A.Next1, A.Next2, A.Next3, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2, Next3 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
AND A.Next3 = B.Next3
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2, A.Next3
ORDER BY A.Num, A.Next1, A.Next2, A.Next3
Here's a db<>fiddle to check.
Notes
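Because the join already requires A.Num = B.Num, the A.Num <= B.Num condition is always true. COUNT(*) simply counts the rows of #SrcWithNext that have each combination. Rows whose Next columns are NULL (at the end of the table) drop out of the equality join, so incomplete sequences are not counted.
This answer finds all combinations. To filter, you currently need to filter on the separate columns. For example, instead of 2,6 you'd filter on Num = 2 AND Next1 = 6. You can then use string concatenation functions to build keys that suit your preferred search/filter approach.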
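The self-join only counts the rows that share each combination, so the same counts can be sketched with a plain GROUP BY over the LEAD() output. This assumes Python's sqlite3 module with SQLite 3.25+. The table src and the helper pair_counts are hypothetical names, and the data is the question's Number column.

```python
import sqlite3

NUMS = [4, 2, 6, 5, 3, 2, 6, 4, 5, 1, 6, 4, 2, 6]  # Number column for ids 1..14

def pair_counts(nums):
    """Return [(num, next1, num_instances)] for every adjacent pair, ordered by num, next1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, num INTEGER)")
    conn.executemany("INSERT INTO src VALUES (?, ?)", enumerate(nums, start=1))
    rows = conn.execute("""
        SELECT num, next1, COUNT(*) AS num_instances
        FROM (SELECT num, LEAD(num, 1) OVER (ORDER BY id) AS next1 FROM src) AS s
        WHERE next1 IS NOT NULL          -- the last row has no successor
        GROUP BY num, next1
        ORDER BY num, next1
    """).fetchall()
    conn.close()
    return rows

for row in pair_counts(NUMS):
    print(row)
```

Extending this to 3 or 4 numbers means adding LEAD(num, 2) and LEAD(num, 3) columns and grouping on them too.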
Hmmm... I think you want to build the "pattern to find" as a string. Unfortunately, string_agg() is not a window function, but you can use APPLY:
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p;
You would change the "3" to whatever number of rows that you want.
Then you can use this to identify the rows whose pattern matches the last row's pattern, and aggregate:
with tp as (
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p
)
select pattern_to_find, next_number, count(*)
from (select tp.*,
first_value(pattern) over (order by id desc) as pattern_to_find,
lead(number) over (order by id) as next_number
from tp
) tp
where pattern = pattern_to_find and next_number is not null  -- skip the last row matching itself
group by pattern_to_find, next_number;
Here is a db<>fiddle.
If you are using an older version of SQL Server (before 2017, which lacks string_agg()), you can calculate the pattern using lag():
with tp as (
select t.*,
concat(lag(number, 2) over (order by id), ',',
lag(number, 1) over (order by id), ',',
number
) as pattern
from t
)
Actually, if you have a large amount of data, it would be interesting to know which is faster -- the apply version or the lag() version. I suspect that lag() might be faster.
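A minimal end-to-end sketch of the lag()-based idea, generalized to any window length k by generating the lag() terms in Python. It assumes SQLite 3.25+ via Python's sqlite3 module. In SQLite, NULL || text yields NULL, so the first k-1 rows get no pattern at all, whereas SQL Server's concat() treats NULL as an empty string. The function pattern_matches is a hypothetical name.

```python
import sqlite3

NUMS = [4, 2, 6, 5, 3, 2, 6, 4, 5, 1, 6, 4, 2, 6]  # Number column for ids 1..14

def pattern_matches(nums, k):
    """For the pattern formed by the last k numbers, count which number follows
    each earlier occurrence. Returns [(pattern, next_number, times_appear)]."""
    # Build "lag(k-1) || ',' || ... || num"; k is an int we control, so formatting is safe.
    parts = [f"CAST(LAG(num, {i}) OVER (ORDER BY id) AS TEXT)" for i in range(k - 1, 0, -1)]
    parts.append("CAST(num AS TEXT)")
    pattern_sql = " || ',' || ".join(parts)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, num INTEGER)")
    conn.executemany("INSERT INTO src VALUES (?, ?)", enumerate(nums, start=1))
    rows = conn.execute(f"""
        WITH tp AS (
            SELECT id, {pattern_sql} AS pattern,
                   LEAD(num) OVER (ORDER BY id) AS next_number
            FROM src
        ),
        target AS (SELECT pattern FROM tp ORDER BY id DESC LIMIT 1)
        SELECT tp.pattern, tp.next_number, COUNT(*) AS times_appear
        FROM tp JOIN target ON tp.pattern = target.pattern
        WHERE tp.next_number IS NOT NULL  -- skip the last row matching itself
        GROUP BY tp.pattern, tp.next_number
        ORDER BY tp.next_number
    """).fetchall()
    conn.close()
    return rows

for k in range(1, 5):
    print(k, pattern_matches(NUMS, k))
```

An empty result for some k is the "Times appear returns 0" stopping point from the question.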
EDIT:
In older, unsupported versions of SQL Server (which lack both string_agg() and lag()), you can get the pattern using FOR XML PATH:
select t.*, p.*
from t cross apply
(select (select cast(number as varchar(255)) + ','
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
order by t2.id  -- ascending, so the pattern reads oldest to newest as in the string_agg() version
for xml path ('')
) as pattern
) p
You can use similar logic for lead().
I solved this problem by converting the Number column into a string.
Here is my code, written as a function whose input is the number of last rows to match.
(Note: the main table is assumed to be named test, each Number is assumed to be a single digit, and IDs are assumed to run from 1 with no gaps.)
create function duplicate(@nlast int)
returns @temp table (RowNumbers varchar(20), Number varchar(1))
as
begin
-- Concatenate every Number, in ID order, into one string (one digit per row).
declare @num varchar(max)
set @num=''
declare @count int=1
while @count <= (select count(id) from test)
begin
set @num = @num + cast((select Number from test where @count=ID) as varchar(20))
set @count=@count+1
end
-- The pattern to find is the last @nlast digits.
declare @lastnum varchar(20)
set @lastnum= (select RIGHT(@num,@nlast))
declare @count2 int=1
-- Slide a window over the string, stopping before the window that ends at the last row.
while @count2 <= len(@num)-@nlast
begin
if (SUBSTRING(@num,@count2,@nlast) = @lastnum)
begin
insert into @temp
select @lastnum ,SUBSTRING(@num,@count2+@nlast,1)
end
set @count2=@count2+1
end
return
end
go
select RowNumbers AS "Condition to find", Number AS "Next Number in column" , COUNT(Number) AS "Times appear" from dbo.duplicate(2)
group by Number, RowNumbers
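The same sliding-window scan can be sketched in plain Python on a list of numbers instead of a digit string, which removes the single-digit restriction. The name next_number_counts is hypothetical; nlast plays the role of the function's @nlast argument.

```python
from collections import Counter

NUMS = [4, 2, 6, 5, 3, 2, 6, 4, 5, 1, 6, 4, 2, 6]  # Number column for ids 1..14

def next_number_counts(nums, nlast):
    """Map each number that follows an earlier occurrence of the last `nlast`
    numbers to how many times it does so."""
    target = nums[-nlast:]
    counts = Counter()
    # Windows start at 0 .. len-nlast-1, so the final window (the target itself) is skipped.
    for start in range(len(nums) - nlast):
        if nums[start:start + nlast] == target:
            counts[nums[start + nlast]] += 1
    return dict(counts)

for nlast in range(1, 5):
    print(nlast, next_number_counts(NUMS, nlast))
```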

How to obtain list in sql query

In one application, I have a table with three fields: Id, Name and Value.
Id | Name | Value
1 | A | 5
2 | B | 9
3 | C | 9
4 | D | 5
5 | E | 6
6 | F | 6
Now, how can I obtain a table that counts every value between the minimum and the maximum, including values that never appear? I mean, as follows:
Value | Count
---- | ----
5 | 2
6 | 2
7 | 0
8 | 0
9 | 2
Can you help, please?
First, you need to create a tally table; there are many methods for that. You use the tally table to list every value between the min and max of your source table. Once you have all those numbers, LEFT JOIN them to a version of your table that uses COUNT() and GROUP BY to total the number of times each value appears.
Below, table A is the tally table and table B is your aggregated source table.
DECLARE @MinValue INT
DECLARE @MaxValue INT
SET @MinValue = (SELECT MIN(Value) FROM dbo.MyTable)
SET @MaxValue = (SELECT MAX(Value) FROM dbo.MyTable)
-- master..spt_values (undocumented) is used here as a ready-made list of small integers
SELECT number as Value, COALESCE(Count,0) AS Count
FROM (
SELECT DISTINCT number
FROM master..spt_values
WHERE number
BETWEEN @MinValue AND @MaxValue
) AS A
LEFT JOIN (
SELECT Value, COUNT(Value) AS Count
FROM dbo.MyTable
GROUP BY Value
) AS B
ON A.number = B.value
ORDER BY A.number
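For databases without master..spt_values, a recursive CTE can stand in as the tally table. This sketch assumes Python's sqlite3 module (recursive CTEs need SQLite 3.8.3+); the table name mytable and the helper value_counts_with_gaps are hypothetical.

```python
import sqlite3

ROWS = [("A", 5), ("B", 9), ("C", 9), ("D", 5), ("E", 6), ("F", 6)]

def value_counts_with_gaps(rows):
    """Count each value from MIN(value) to MAX(value), reporting 0 for missing values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO mytable (name, value) VALUES (?, ?)", rows)
    result = conn.execute("""
        WITH RECURSIVE tally(number) AS (          -- table A: every integer in range
            SELECT MIN(value) FROM mytable
            UNION ALL
            SELECT number + 1 FROM tally
            WHERE number < (SELECT MAX(value) FROM mytable)
        )
        SELECT a.number AS value, COALESCE(b.cnt, 0) AS cnt
        FROM tally AS a
        LEFT JOIN (                                -- table B: aggregated source
            SELECT value, COUNT(*) AS cnt FROM mytable GROUP BY value
        ) AS b ON a.number = b.value
        ORDER BY a.number
    """).fetchall()
    conn.close()
    return result

print(value_counts_with_gaps(ROWS))  # [(5, 2), (6, 2), (7, 0), (8, 0), (9, 2)]
```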

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A). ie,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. In particular, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.
You can try the following query:
SELECT *
FROM table
WHERE criteria = 1
AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)
ORDER BY value;
If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
select *
from table
where criteria = 1
),
min_group as (
select min(group) as min_grp from main_query
)
select *
from main_query
where group in (select min_grp from min_group)
order by group, value;
-- this where clause should be fast since there will only be 1 record in min_group
Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
[group] NVARCHAR(50),
value INT,
criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )
;WITH cte AS
(
SELECT i.* ,
DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
FROM @yourTbl AS i
WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
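A sketch of the DENSE_RANK() approach in SQLite 3.25+, via Python's sqlite3. The extra row ('0', 9, 0) is hypothetical: it sorts before 'A' but fails the criteria, which shows that the rank is computed after the WHERE clause, so the criteria only has to be written once. The helper first_group is a hypothetical name.

```python
import sqlite3

ROWS = [("A", 0, 1), ("A", 1, 1), ("B", 2, 1), ("B", 3, 1),
        ("0", 9, 0)]  # hypothetical extra row: sorts first but fails criteria = 1

def first_group(rows):
    """Return all rows of the first group (alphabetically) among rows with criteria = 1."""
    conn = sqlite3.connect(":memory:")
    # "group" must be quoted because GROUP is a reserved word.
    conn.execute('CREATE TABLE t ("group" TEXT, value INTEGER, criteria INTEGER)')
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute('''
        WITH cte AS (
            SELECT "group", value, criteria,
                   DENSE_RANK() OVER (ORDER BY "group") AS gn
            FROM t
            WHERE criteria = 1
        )
        SELECT "group", value, criteria
        FROM cte
        WHERE gn = 1
        ORDER BY value
    ''').fetchall()
    conn.close()
    return result

print(first_group(ROWS))  # [('A', 0, 1), ('A', 1, 1)]
```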

Select only one row with same key value from a table

I have this table:
reg_no | cname | no
1 | X | 1
1 | Y | 2
2 | X | 1
2 | Y | 2
I want to select rows from this table, but only one row per reg_no: the one with the highest no.
The output should be:
1 Y 2
2 Y 2
Use the ROW_NUMBER() window function:
select reg_no,cname,no from
(
select row_number() over(partition by reg_no order by no desc) Rn,*
from yourtable
) A
where rn=1
Or use this standard SQL version, which also works in SQL Server 2000: find the max no for each reg_no, then join the result back to the main table. (If two rows share the highest no, it returns both.)
select A.reg_no,A.cname,A.no
from yourtable As A
Inner Join
(
select max(no) As no,Reg_no
from yourtable
group by Reg_No
) As B
on A.No=B.No and A.Reg_No=B.Reg_no
In SQL Server, using CROSS APPLY, this would be:
SELECT DISTINCT
r1.reg_no, r2.cname, r2.no
FROM
table_name r1
CROSS APPLY
(SELECT TOP 1
r.cname, r.no
FROM
table_name r
WHERE r1.reg_no = r.reg_no
ORDER BY r.no DESC) r2
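A sketch of the ROW_NUMBER() answer in SQLite 3.25+ via Python's sqlite3. The column "no" is quoted because NO is an SQLite keyword; the helper name highest_no_per_reg is hypothetical.

```python
import sqlite3

ROWS = [(1, "X", 1), (1, "Y", 2), (2, "X", 1), (2, "Y", 2)]

def highest_no_per_reg(rows):
    """Return one row per reg_no: the row with the highest no."""
    conn = sqlite3.connect(":memory:")
    # "no" is quoted because NO is a keyword in SQLite.
    conn.execute('CREATE TABLE yourtable (reg_no INTEGER, cname TEXT, "no" INTEGER)')
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    result = conn.execute('''
        SELECT reg_no, cname, "no"
        FROM (
            SELECT reg_no, cname, "no",
                   ROW_NUMBER() OVER (PARTITION BY reg_no ORDER BY "no" DESC) AS rn
            FROM yourtable
        ) AS a
        WHERE rn = 1
        ORDER BY reg_no
    ''').fetchall()
    conn.close()
    return result

print(highest_no_per_reg(ROWS))  # [(1, 'Y', 2), (2, 'Y', 2)]
```

Unlike the MAX()-and-join version, ROW_NUMBER() always returns exactly one row per reg_no, picking arbitrarily among ties.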

SQL Server - Sum entire column AND Group By

Suppose I had the following table in SQL Server:
grp: val: criteria:
a 1 1
a 1 1
b 1 1
b 1 1
b 1 1
c 1 1
c 1 1
c 1 1
d 1 1
Now what I want is to get an output which would basically be:
Select grp, val / [sum(val) for all records] grouped by grp where criteria = 1
So, given the following is true:
Sum of all values = 9
Sum of values in grp(a) = 2
Sum of values in grp(b) = 3
Sum of values in grp(c) = 3
Sum of values in grp(d) = 1
The output would be as follows:
grp: calc:
a 2/9
b 3/9
c 3/9
d 1/9
What would my SQL have to look like?
Thanks!
You should be able to use something like this, which uses sum() over():
select distinct grp,
sum(val) over(partition by grp)
/ (sum(val) over(partition by criteria)*1.0) Total
from yourtable
where criteria = 1
See SQL Fiddle with Demo
The result is:
| GRP | TOTAL |
------------------------
| a | 0.222222222222 |
| b | 0.333333333333 |
| c | 0.333333333333 |
| d | 0.111111111111 |
I completely agree with @bluefeet's response; this is just a slightly more database-independent approach (it should work with most RDBMSs):
select distinct
grp,
sum(val)/cast(total as decimal)
from yourtable
cross join
(
select SUM(val) as total
from yourtable
) sumtable
where criteria = 1
GROUP BY grp, total
And here is the SQL Fiddle.
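A sketch of the same ratio in SQLite via Python's sqlite3. It divides each group's sum by a scalar subquery over the same filtered rows, which avoids the SELECT DISTINCT, and multiplying by 1.0 avoids integer division. The helper name group_shares is hypothetical.

```python
import sqlite3

ROWS = [("a", 1, 1), ("a", 1, 1), ("b", 1, 1), ("b", 1, 1), ("b", 1, 1),
        ("c", 1, 1), ("c", 1, 1), ("c", 1, 1), ("d", 1, 1)]

def group_shares(rows):
    """Return [(grp, share)] where share = group sum / total sum over criteria = 1 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (grp TEXT, val INTEGER, criteria INTEGER)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        SELECT grp,
               SUM(val) * 1.0 / (SELECT SUM(val) FROM yourtable WHERE criteria = 1) AS calc
        FROM yourtable
        WHERE criteria = 1
        GROUP BY grp
        ORDER BY grp
    """).fetchall()
    conn.close()
    return result

for grp, share in group_shares(ROWS):
    print(grp, round(share, 4))
```

Filtering the subquery on criteria = 1 makes the denominator match bluefeet's partition-by-criteria total. The cross-join version above divides by the total over all records instead, which only differs when some rows have other criteria values.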