How to average the top N in each SQL group

I'm trying to figure out how to average the top N values within each group. I have a table with two columns, Group and Value. My goal is to average the top N values within each group where N is different based on another table.
For group A, N equals 3 and is highlighted in red. The output is the average of the top 3 values.
For group B, N equals 2 and is highlighted in green. Because we only have 1 value of 2.2 for group B, we need to go to the filler table. The filler value for group B is 2.0, so we will average 2.2 and 2.0. If N = 5, then the filler value will be repeated 4 times for Group B.
My initial idea is to:
Rank the values in each group
Join it to the second table
Use where Rank <= N to remove the duplicates before averaging
However, I'm not sure how the filler table could be incorporated, since N could be greater than the number of values I have. I need to use SQL Server 2008.

First of all, I hope that you're using more adequate names than Group and Value. Here's sample code that first numbers the rows in each group from highest to lowest value, then keeps the top N rows and averages them, padding with the filler value when a group has fewer than N rows. The code is untested, as you didn't provide consumable sample data.
WITH CTE AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY [Value] DESC) AS rn,
        COUNT(*) OVER (PARTITION BY [Group]) AS ItemCount
    FROM TableWithValues
)
SELECT c.[Group],
    -- Sum of the top N values, plus the filler once per missing row
    (SUM(c.[Value]) + CASE WHEN N.n > c.ItemCount
                           THEN (N.n - c.ItemCount) * F.Filler
                           ELSE 0 END) / N.n AS [Value]
FROM CTE c
JOIN TableWithN N ON c.[Group] = N.[Group] AND c.rn <= N.n
JOIN Fillers F ON c.[Group] = F.[Group]
-- Every non-aggregated column used in the SELECT must be grouped
GROUP BY c.[Group], N.n, c.ItemCount, F.Filler;
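To check the arithmetic the query is meant to perform, here is a small Python sketch of the same calculation outside the database. The function name `top_n_average` and the group A numbers are made up for illustration; only group B's figures (one value of 2.2, filler 2.0, N = 2) come from the question.

```python
def top_n_average(values, n, filler):
    """Average the n largest values, padding with `filler` when fewer than n exist."""
    top = sorted(values, reverse=True)[:n]   # same role as rn <= N.n
    missing = n - len(top)                   # same role as N.n - ItemCount (never negative here)
    return (sum(top) + missing * filler) / n


# Group B from the question: one value, N = 2, filler 2.0
avg_b = top_n_average([2.2], n=2, filler=2.0)

# Group A: hypothetical values, N = 3, so the filler is not needed
avg_a = top_n_average([5.0, 4.0, 3.0, 1.0], n=3, filler=0.0)
```

For group B this averages 2.2 and 2.0; with N = 5 the filler would be counted four times, as the question describes.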

Related

Group By 2 Columns and Find Percentage of Group In SQL

I have a Game table with two columns TeamZeroScore and TeamOneScore. I would like to calculate the % of games that end with each score variance. The max score one team can have is 5.
I have the following code, which selects each team's score plus two additional columns holding the max and min of the two values. I did this because I thought the next step would be to group by these two columns.
SELECT TOP (100000) [TeamOneScore],[TeamZeroScore],
(SELECT Max(v)
FROM (VALUES ([TeamOneScore]), ([TeamZeroScore])) AS value(v)) as [MaxScore],
(SELECT Min(v)
FROM (VALUES ([TeamOneScore]), ([TeamZeroScore])) AS value(v)) as [MinScore]
FROM [Database].[dbo].[Game]
Below is the sample data I have for the code above.
How do I produce something similar to this? I think I need to Group By MaxScore, MinScore and then use Count on each group to calculate the percentage based on the total.
SELECT
    COUNT(*) AS [number],
    100.0 * COUNT(*) / b.t AS [percentage],   -- 100.0 avoids integer division
    a.MinScore,
    a.MaxScore
FROM (
    -- Normalize each game so the lower score always comes first
    SELECT TeamOneScore AS MinScore, TeamZeroScore AS MaxScore
    FROM Game
    WHERE TeamOneScore <= TeamZeroScore
    UNION ALL
    SELECT TeamZeroScore, TeamOneScore
    FROM Game
    WHERE TeamOneScore > TeamZeroScore
) a
CROSS JOIN (SELECT COUNT(*) AS t FROM Game) b
GROUP BY a.MinScore, a.MaxScore, b.t
ORDER BY a.MinScore, a.MaxScore;
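The idea behind the query (put the lower score first so that 3-5 and 5-3 count as the same result, then divide each group's count by the total) can be sketched in Python. The function name and the sample games are hypothetical and are not the question's data.

```python
from collections import Counter


def score_pair_percentages(games):
    """Return {(low, high): percentage} for a list of (team_zero, team_one) scores."""
    # Order each pair so that 3-5 and 5-3 land in the same bucket
    pairs = Counter((min(a, b), max(a, b)) for a, b in games)
    total = len(games)
    return {pair: 100.0 * count / total for pair, count in sorted(pairs.items())}


# Hypothetical results, purely for illustration
games = [(5, 3), (3, 5), (5, 0), (2, 5)]
percentages = score_pair_percentages(games)
```

Here the two games that ended 5-3 and 3-5 fall into the single bucket `(3, 5)`, which is what grouping by the min and max columns achieves in SQL.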

Random Samples of XX rows per Column Value

I'm using T-SQL and require some sample output of random rows.
Typically I would write some SQL as per below
Select top 10 *
from SampleTable as ST
Order by NewID()
However, this time I want, say, 100 rows, split equally by another column's value, for instance the column 'Type'.
That is, 100 rows made up of a sample of 25 rows for Type A, 25 rows for Type B, 25 rows for Type C and lastly 25 rows for Type D.
My 'Type' values are saved to a temp table
Select top 10 *
from SampleTable as ST
Inner Join #Types as TY
on TY.Type = ST.Type
Order by NewID()
I've seen NTILE but not sure if applicable for my problem.
Thanks.
Use ROW_NUMBER in conjunction with NEWID():
WITH cte AS (
    -- ST.* rather than * so the CTE does not get two columns named Type
    SELECT ST.*, ROW_NUMBER() OVER (PARTITION BY ST.Type ORDER BY NEWID()) AS rn
    FROM SampleTable AS ST
    INNER JOIN #Types AS TY ON TY.Type = ST.Type
)
SELECT *
FROM cte
WHERE rn <= 25;
The above solution will return 25 records from each type (or however many fewer might be available), randomly.
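As a rough illustration of what the query does, the same per-type sampling can be written in Python. The function name `sample_per_type` and the sample rows are assumptions made for this sketch; only the column name `Type` and the figure of 25 rows per type come from the question.

```python
import random
from collections import defaultdict


def sample_per_type(rows, per_type, rng=random):
    """Pick up to `per_type` random rows for each value of row['Type']."""
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["Type"]].append(row)
    picked = []
    for group in by_type.values():
        # min() covers types that have fewer rows than requested
        picked.extend(rng.sample(group, min(per_type, len(group))))
    return picked


# Hypothetical rows: 40 of type A, 40 of type B, 10 of type C
rows = [{"Id": i, "Type": t} for i, t in enumerate("A" * 40 + "B" * 40 + "C" * 10)]
sample = sample_per_type(rows, per_type=25)
```

Types A and B contribute 25 rows each, while type C contributes all 10 of its rows because fewer than 25 are available.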

Separating the list of value by comparing frequency

I have the data set below; in the output, each value should appear one time fewer than it does in the input.
I am new to SQL, so I don't have much of an idea how to do this.
In the input I have 1 three times, 2 twice, 3 three times and 4 twice. In the output I want 1 twice, 2 once, 3 twice and 4 once.
Any suggestions on how to achieve this output?
This can be written in a more compact form, but just for clarity:
With Src As ( --< Source table
Select * From (Values (1),(2),(3),(1),(1),(2),(3),(3),(4),(4),(5)) V (Id)
), Numbers As ( --< Auxiliary table with numbers from 1 to maximum row count of Src
Select ROW_NUMBER() Over (Order By Id) As N From Src
), Counted As ( --< Calculate current number of ID occurrences
Select Id, Count(Id) As Cnt From Src Group By Id
)
Select Id
From Counted --< From distinct list of IDs
Inner Join Numbers --< replicate each row
On Numbers.N < Counted.Cnt --< one less time than the Cnt
Expression to replicate the row taken from SQL: Repeat a result row multiple times...
jpw implementation (please feel free to copy it into your own answer):
With Src As ( --< Source table
Select * From (Values (1),(2),(3),(1),(1),(2),(3),(3),(4),(4),(5)) V (Id)
), Numbered As ( --< Number ID occurrences
Select Id, row_number() Over (Partition By id Order By id) As n From Src
)
Select Id From Numbered Where n > 1 --< Take one off
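For comparison, the "one occurrence fewer per Id" rule is easy to verify in Python against the same Src values used above. The function name `drop_one_per_id` is made up for this sketch.

```python
from collections import Counter


def drop_one_per_id(ids):
    """Return each id repeated (count - 1) times, in ascending id order."""
    counts = Counter(ids)
    result = []
    for id_, count in sorted(counts.items()):
        result.extend([id_] * (count - 1))
    return result


ids = [1, 2, 3, 1, 1, 2, 3, 3, 4, 4, 5]   # the Src values from the queries
reduced = drop_one_per_id(ids)
```

The result contains 1 twice, 2 once, 3 twice and 4 once; Id 5 occurs only once in the input, so it disappears, just as it does in both queries.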

Select random rows from multiple tables in one query

I'm trying to insert some dummy data into a table (A), for which I need the IDs from two other tables (B and C). How can I get n rows, each with a random B.Id and a random C.Id?
I've got:
select
(Select top 1 ID from B order by newid()) as 'B.Id',
(select top 1 ID from C order by newid()) as 'C.Id'
which gives me random Ids from each table, but what's the best way to get n of these? I've tried joining on a large table and doing top n, but the IDs from B and C are the same random Ids repeated for each row.
So looking to end up with something like this, but able to specify N rows.
INSERT INTO A (B-Id,C-Id,Note)
select
(Select top 1 ID from B order by newid()) as 'B.Id',
(select top 1 ID from C order by newid()) as 'C.Id',
'Rar'
So if B had Ids 1,2,3,4 and C had Ids 11,12,13,14, I'm after the equivalent of:
INSERT INTO A (B-Id,C-Id,Note)
Values
(3,11,'rar'), (1,14,'rar'),(4,11,'rar')
Where the Ids from each table are combined at random
If you want to avoid duplicates, you can use row_number() to enumerate the values in each table (randomly) and then join them:
select b.id as b_id, c.id as c_id
from (select b.*, row_number() over (order by newid()) as seqnum
from b
) b join
(select c.*, row_number() over (order by newid()) as seqnum
from c
) c
on b.seqnum = c.seqnum;
You can just add top N or where seqnum <= N to limit the number.
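The same shuffle-then-pair idea can be sketched in Python, using the example Ids from the question. The function name `random_pairs` is hypothetical; the sketch only mirrors the logic and is not a replacement for the query.

```python
import random


def random_pairs(b_ids, c_ids, n, rng=random):
    """Pair ids from two lists at random, using each id at most once."""
    # Shuffling each list plays the role of row_number() over (order by newid())
    b_shuffled = rng.sample(b_ids, len(b_ids))
    c_shuffled = rng.sample(c_ids, len(c_ids))
    # zip stops at the shorter list, like the inner join on seqnum
    return list(zip(b_shuffled, c_shuffled))[:n]


pairs = random_pairs([1, 2, 3, 4], [11, 12, 13, 14], n=3)
```

Because each Id is used at most once, this approach can return no more rows than the smaller of the two tables holds.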
If I'm reading your question correctly, I think you want N random rows from the union of the two tables - so on any given execution you will get X rows from table B and N-X rows from table C. To accomplish this, you first UNION tables B and C together, then ORDER BY the random value generated by NEWID() while pulling your overall TOP N.
SELECT TOP 50 --or however many you like
DerivedUnionOfTwoTables.[ID],
DerivedUnionOfTwoTables.[Source]
FROM
(
(SELECT NEWID() AS [Random ID], [ID], 'Table B' AS [Source] FROM B)
UNION ALL
(SELECT NEWID() AS [Random ID], [ID], 'Table C' AS [Source] FROM C)
) DerivedUnionOfTwoTables
ORDER BY
[Random ID] DESC
I included a column showing which source table any given record comes from so you could see the distribution of the two table sources changing each time it is executed. If you don't need it and/or don't care to verify, simply comment it out from the topmost select.
You shouldn't need to join to a large table - Select top N ID from B order by newid() should work as newid() works per-row (unlike RAND()). Your join is probably doing a cross-join which will give you multiple results for each newid value.

How to use ROWNUM for a maximum and another minimum ordering in ORACLE?

Currently I am trying to output the top row for 2 conditions. One is max and one is min.
Current code
Select *
from (MY SELECT STATEMENT order by A desc)
where ROWNUM <= 1
UPDATE
I am now able to do this for both conditions, but I need A to be the highest and, if there is a tie on A, B to be the lowest.
E.g. say there are 2 rows: A is 100 for both, and B is 50 for one and 60 for the other.
In this case the 100:50 row should be chosen, because A is the same and its B is the lowest.
E.g. say there are 2 rows: A is 100 for one and 90 for the other. Since one A is higher, there is no need to check B.
I tried using max and min, but this method seems to work better. Any suggestions?
Well, after your clarification, you are looking for one record: the one with the maximum A and, in case more than one record has that maximum A, the smallest B among them. This is simply:
Select *
from (MY SELECT STATEMENT order by A desc, B)
where ROWNUM = 1;
This sorts by A descending first, so you get all maximal A records first. Then it sorts by B, so inside each A group you get the least B first. This gives you the desired A record first, no matter if the found A is unique or not.
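The two-level ordering can be checked with a short Python sketch that uses the numbers from the question. The function name `pick_row` and the B values in the second example are assumptions, since the question gives no B values for that case.

```python
def pick_row(rows):
    """Return the row with the highest A, breaking ties by the lowest B."""
    # Sorting on (-A, B) is the equivalent of ORDER BY A DESC, B
    return sorted(rows, key=lambda r: (-r["A"], r["B"]))[0]


# Tie on A: the row with the lower B wins
tie = pick_row([{"A": 100, "B": 60}, {"A": 100, "B": 50}])

# No tie on A: B is never consulted (B values here are hypothetical)
no_tie = pick_row([{"A": 90, "B": 10}, {"A": 100, "B": 70}])
```

In the first call the 100:50 row is returned; in the second the row with A = 100 is returned even though its B is higher.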
or avoid the vagaries of ROWNUM and go for ROW_NUMBER() instead:
SELECT
    *
FROM (
    SELECT
        sq.*
        , ROW_NUMBER() OVER (ORDER BY A DESC) adesc
        , ROW_NUMBER() OVER (ORDER BY B ASC) basc
    FROM SomeQuery sq
)
WHERE adesc = 1
OR basc = 1
footnote: select * is a convenience only, please replace with the actual columns required along with table names etc.
Try this and see if it works:
Select *
from (MY SELECT STATEMENT order by A desc)
where ROWNUM <= 1
union
Select *
from (MY SELECT STATEMENT order by A asc)
where ROWNUM <= 1
SELECT * FROM
(Select foo.*, 0 as union_order
from (MY SELECT STATEMENT order by A desc) foo
where ROWNUM <= 1
UNION
Select foo.*, 1
from (MY SELECT STATEMENT order by B asc) foo
where ROWNUM <= 1)
ORDER BY
union_order