How to determine top x rows with a group by - SQL Server 2017


I have a dataset that looks like the following:
| Category | Employee | Output |
|:--------:|:--------:|:------:|
| Top | A | 97 |
| Mid | B | 50 |
| Mid | C | 35 |
| Mid | D | 45 |
| Low | E | 15 |
| Low | F | 16 |
| Top | G | 92 |
| Top | H | 84 |
| Mid | I | 49 |
| Mid | J | 31 |
| Low | K | 22 |
| Top | L | 79 |
| Mid | M | 63 |
| Mid | N | 33 |
| Low | O | 19 |
| Mid | P | 33 |
| Top | Q | 77 |
| Top | R | 88 |
| Low | S | 30 |
| Mid | T | 53 |
| Mid | U | 68 |
| Mid | V | 72 |
| Mid | W | 66 |
| Mid | X | 51 |
| Mid | Y | 35 |
| Mid | Z | 70 |
(The real dataset is much larger, about 20K rows.)
I am trying to find the top 3 output numbers for each category, ultimately resulting in a dataset like:
| Category | Output |
|:--------:|:------:|
| Low | 30 |
| Low | 22 |
| Low | 19 |
| Mid | 72 |
| Mid | 70 |
| Mid | 68 |
| Top | 97 |
| Top | 92 |
| Top | 88 |
I have tried:
SELECT TOP 10
Category,
Output
FROM
raw_data
ORDER BY
Output DESC
But that only lists the top 10 overall, not by category.
Adding GROUP BY Category, Count_Placements obviously does nothing, and I cannot group by Category itself.
Sorry there is no SQL Fiddle like I normally do, it is currently down.

You can use row_number():
select category, output
from (
    select t.*, row_number() over(partition by category order by output desc) rn
    from mytable t
) t
where rn <= 3
order by category, output desc
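As a quick way to check the row_number() approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later) rather than SQL Server, and the in-memory table is named raw_data as in the question; the alias `ranked` is only illustrative.

```python
import sqlite3

# Sample rows copied from the question: (Category, Employee, Output).
rows = [
    ("Top", "A", 97), ("Mid", "B", 50), ("Mid", "C", 35), ("Mid", "D", 45),
    ("Low", "E", 15), ("Low", "F", 16), ("Top", "G", 92), ("Top", "H", 84),
    ("Mid", "I", 49), ("Mid", "J", 31), ("Low", "K", 22), ("Top", "L", 79),
    ("Mid", "M", 63), ("Mid", "N", 33), ("Low", "O", 19), ("Mid", "P", 33),
    ("Top", "Q", 77), ("Top", "R", 88), ("Low", "S", 30), ("Mid", "T", 53),
    ("Mid", "U", 68), ("Mid", "V", 72), ("Mid", "W", 66), ("Mid", "X", 51),
    ("Mid", "Y", 35), ("Mid", "Z", 70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (Category TEXT, Employee TEXT, Output INTEGER)")
conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)", rows)

# Number the rows inside each category from highest to lowest Output,
# then keep only the first three of every category.
top3 = conn.execute("""
    SELECT Category, Output
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Output DESC) AS rn
        FROM raw_data AS t
    ) AS ranked
    WHERE rn <= 3
    ORDER BY Category, Output DESC
""").fetchall()

for category, output in top3:
    print(category, output)
```

Note that row_number() breaks ties arbitrarily; if tied values should all be kept, rank() or dense_rank() may be a better fit.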

Related

DB2 SQL - Limit the number of groups returned

I am trying to find a way to limit the first n groups returned. I have a scenario where I want to only select 10 groups of user data and no more. How would I limit 10 groups of user data where the group size for the user can vary. Some groups may have more than 4 records for a user, some may have less than 4 records for a user. But I only want to get 10 users at a time. I tried thinking about how ROW_NUMBER() and PARTITION BY could be leveraged or even FETCH FIRST N ROWS ONLY could be leveraged, but couldn't come up with a solution.
Below is some sample data. NOTE: The GROUP_NUMBER column doesn't exist in the data set I am working with. It is what I was thinking about creating via SQL so that I can leverage this to select where the "GROUP_NUMBER" < 11 for example. I am absolutely open to other solutions given my question, but this was one solution I was thinking about but didn't know how to do it.
+-----------+--------------+-----------+-----------+----------+------------------+--------------+
| REQUESTID | USERID | COMPANYID | FIRSTNAME | LASTNAME | EMAIL | GROUP_NUMBER |
+-----------+--------------+-----------+-----------+----------+------------------+--------------+
| 157 | test.bulkup1 | 44 | BulkUp | Test | bulkup1#test.com | 1 |
| 157 | test.bulkup1 | 44 | BulkUp | Test | bulkup1#test.com | 1 |
| 157 | test.bulkup1 | 44 | BulkUp | Test | bulkup1#test.com | 1 |
| 162 | test.bulkup2 | 44 | BulkUp | Test | bulkup2#test.com | 2 |
| 162 | test.bulkup2 | 44 | BulkUp | Test | bulkup2#test.com | 2 |
| 162 | test.bulkup2 | 44 | BulkUp | Test | bulkup2#test.com | 2 |
| 162 | test.bulkup2 | 44 | BulkUp | Test | bulkup2#test.com | 2 |
| 187 | test.bulkup3 | 44 | BulkUp | Test | bulkup3#test.com | 3 |
| 187 | test.bulkup3 | 44 | BulkUp | Test | bulkup3#test.com | 3 |
| 187 | test.bulkup3 | 44 | BulkUp | Test | bulkup3#test.com | 3 |
| 187 | test.bulkup3 | 44 | BulkUp | Test | bulkup3#test.com | 3 |
| 192 | test.bulkup4 | 44 | BulkUp | Test | bulkup4#test.com | 4 |
+-----------+--------------+-----------+-----------+----------+------------------+--------------+

You can use dense_rank(). I think you want:
select t.*
from (select t.*,
             dense_rank() over (order by requestId) as seqnum
      from t
     ) t
where seqnum <= 10;
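The idea of numbering groups with dense_rank() can be tried out with a small sketch like the one below. It uses Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) instead of DB2, a trimmed-down table with only two of the question's columns, and a hypothetical limit `max_groups` of 2 so that the effect is visible on four sample users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (requestid INTEGER, userid TEXT)")
# Group sizes vary, as in the question: 3, 4, 4 and 1 rows per user.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(157, "test.bulkup1")] * 3
    + [(162, "test.bulkup2")] * 4
    + [(187, "test.bulkup3")] * 4
    + [(192, "test.bulkup4")],
)

max_groups = 2  # hypothetical limit; the question asks for 10

# dense_rank() gives every row of the same requestid the same number,
# with no gaps, so filtering on it limits the number of groups.
limited = conn.execute("""
    SELECT requestid, userid, seqnum
    FROM (
        SELECT t.*, DENSE_RANK() OVER (ORDER BY requestid) AS seqnum
        FROM t
    ) AS numbered
    WHERE seqnum <= ?
    ORDER BY seqnum
""", (max_groups,)).fetchall()

for row in limited:
    print(row)
```

All rows of the first two groups come back (seven rows here), regardless of how many rows each group has.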

Left join in sql table?

I have the following SQL Server table:
+-------+----------+----------+----------+
| group | subgroup | position | value |
+-------+----------+----------+----------+
| D924 | A | 50 | 9144142 |
| D924 | A | 52 | 9268118 |
| D924 | A | 60 | 9144588 |
| D924 | A | 70 | 10116006 |
| D924 | A | 110 | 9074177 |
| D924 | A | 171 | 7367052 |
| D924 | A | 180 | 10118595 |
| D924 | A | 190 | 9074522 |
| D924 | B | 150 | 12423396 |
| D955 | ... | ... | ... |
+-------+----------+----------+----------+
I need to list all the positions for every subgroup within the same group.
Like so:
+-------+----------+----------+----------+
| group | subgroup | position | value |
+-------+----------+----------+----------+
| D924 | A | 50 | 9144142 |
| D924 | A | 52 | 9268118 |
| D924 | A | 60 | 9144588 |
| D924 | A | 70 | 10116006 |
| D924 | A | 110 | 9074177 |
| D924 | A | 171 | 7367052 |
| D924 | A | 180 | 10118595 |
| D924 | A | 190 | 9074522 |
| D924 | A | 150 | |
| D924 | B | 50 | |
| D924 | B | 52 | |
| D924 | B | 60 | |
| D924 | B | 70 | |
| D924 | B | 110 | |
| D924 | B | 171 | |
| D924 | B | 180 | |
| D924 | B | 190 | |
| D924 | B | 150 | 12423396 |
| D955 | ... | ... | ... |
+-------+----------+----------+----------+
I would like to achieve the result table in a single SQL query. Can you advise?

This is simply a DISTINCT list of the position values and a DISTINCT list of the [Group] and subgroup values, cross joined, with a LEFT JOIN back to the table.
Running two DISTINCT queries will be expensive, so if you have tables of your groups and positions, I would suggest using those rather than the CTEs:
WITH Groups AS
(SELECT DISTINCT
[group],
subgroup
FROM dbo.YourTable),
Positions AS
(SELECT DISTINCT
position
FROM dbo.YourTable)
SELECT G.[Group],
G.subgroup,
P.Position,
YT.[value]
FROM Groups G
CROSS JOIN Positions P
LEFT JOIN dbo.YourTable YT ON G.[Group] = YT.[Group]
AND G.subgroup = YT.subgroup
AND P.Position = YT.Position;

You seem to only want the value for the first "position" in each group. That suggests row_number():
select [group], subgroup, position,
       (case when row_number() over (partition by [group], position order by subgroup) = 1
             then value
        end) as value
from t;

Select distinct subgroups and positions first, then join them and outer join your table.
with sub as (select distinct [group], subgroup from mytable)
, pos as (select distinct [group], position from mytable)
select
    sub.[group], sub.subgroup, pos.position, t.value
from sub
join pos on pos.[group] = sub.[group]
left join mytable t on t.[group] = sub.[group]
    and t.subgroup = sub.subgroup
    and t.position = pos.position
order by sub.[group], sub.subgroup, pos.position;
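To see how the distinct-then-outer-join pattern fills in the missing combinations, here is a small sketch run through Python's sqlite3 module. It is SQLite rather than SQL Server, so the reserved word group is quoted with double quotes, and only three rows of the question's sample are loaded to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is quoted throughout.
conn.execute(
    'CREATE TABLE mytable ("group" TEXT, subgroup TEXT, position INTEGER, value INTEGER)'
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    ("D924", "A", 50, 9144142),
    ("D924", "A", 52, 9268118),
    ("D924", "B", 150, 12423396),
])

# Pair every subgroup with every position seen in the same group,
# then outer join the table so missing pairs get a NULL value.
filled = conn.execute("""
    WITH sub AS (SELECT DISTINCT "group", subgroup FROM mytable),
         pos AS (SELECT DISTINCT "group", position FROM mytable)
    SELECT sub."group", sub.subgroup, pos.position, t.value
    FROM sub
    JOIN pos ON pos."group" = sub."group"
    LEFT JOIN mytable AS t
           ON t."group" = sub."group"
          AND t.subgroup = sub.subgroup
          AND t.position = pos.position
    ORDER BY sub."group", sub.subgroup, pos.position
""").fetchall()

for row in filled:
    print(row)
```

Ordering by the columns of sub and pos, not by the outer-joined table, keeps the filled-in rows in their proper place, since the table's columns are NULL on those rows.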

select all rows that match criteria if not get a random one

+----+---------------+--------------------+------------+----------+-----------------+
| id | restaurant_id | filename | is_profile | priority | show_in_profile |
+----+---------------+--------------------+------------+----------+-----------------+
| 40 | 20 | 1320849687_390.jpg | | | 1 |
| 60 | 24 | 1320853501_121.png | 1 | | 1 |
| 61 | 24 | 1320853504_847.png | | | 1 |
| 62 | 24 | 1320853505_732.png | | | 1 |
| 63 | 24 | 1320853505_865.png | | | 1 |
| 64 | 29 | 1320854617_311.png | 1 | | 1 |
| 65 | 29 | 1320854617_669.png | | | 1 |
| 66 | 29 | 1320854618_636.png | | | 1 |
| 67 | 29 | 1320854619_791.png | | | 1 |
| 74 | 154 | 1320922653_259.png | | | 1 |
| 76 | 154 | 1320922656_332.png | | | 1 |
| 77 | 154 | 1320922657_106.png | | | 1 |
| 84 | 130 | 1321269380_960.jpg | 1 | | 1 |
| 85 | 130 | 1321269383_555.jpg | | | 1 |
| 86 | 130 | 1321269384_251.jpg | | | 1 |
| 89 | 28 | 1321269714_303.jpg | | | 1 |
| 90 | 28 | 1321269716_938.jpg | 1 | | 1 |
| 91 | 28 | 1321269717_147.jpg | | | 1 |
| 92 | 28 | 1321269717_774.jpg | | | 1 |
| 93 | 28 | 1321269717_250.jpg | | | 1 |
| 94 | 28 | 1321269718_964.jpg | | | 1 |
| 95 | 28 | 1321269719_830.jpg | | | 1 |
| 96 | 43 | 1321270013_629.jpg | 1 | | 1 |
+----+---------------+--------------------+------------+----------+-----------------+
I have this table and I want to select the filename for a given list of restaurant ids.
For example for 24,29,154:
+--------------------+
| filename           |
+--------------------+
| 1320853501_121.png | (has is_profile 1)
| 1320854617_311.png | (has is_profile 1)
| 1320922653_259.png | (chosen as profile picture because restaurant doesn't have a profile pic but has pictures)
+--------------------+
I tried group by and case statements but I got nowhere. Also, if you use group by, it should be a full group by.

You can do this with aggregation and some logic:
select restaurant_id,
coalesce(max(case when is_profile = 1 then filename end),
max(filename)
) as filename
from t
where restaurant_id in (24, 29, 154)
group by restaurant_id;
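First look for a profile filename. If there is none, fall back to an arbitrary one; here max(filename) picks the filename that sorts last, which is deterministic rather than random.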
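The fallback behaviour of this query can be checked with a short sketch using Python's sqlite3 module. It assumes the blank is_profile cells in the question are NULL and loads only a few of the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, restaurant_id INTEGER, filename TEXT, is_profile INTEGER)"
)
# Blank is_profile cells from the question are stored as NULL (an assumption).
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (60, 24, "1320853501_121.png", 1),
    (61, 24, "1320853504_847.png", None),
    (64, 29, "1320854617_311.png", 1),
    (65, 29, "1320854617_669.png", None),
    (74, 154, "1320922653_259.png", None),
    (76, 154, "1320922656_332.png", None),
    (77, 154, "1320922657_106.png", None),
])

# The CASE yields NULL for non-profile rows, so its MAX is NULL when a
# restaurant has no profile picture and COALESCE falls back to MAX(filename).
picked = conn.execute("""
    SELECT restaurant_id,
           COALESCE(MAX(CASE WHEN is_profile = 1 THEN filename END),
                    MAX(filename)) AS filename
    FROM t
    WHERE restaurant_id IN (24, 29, 154)
    GROUP BY restaurant_id
    ORDER BY restaurant_id
""").fetchall()

for row in picked:
    print(row)
```

For restaurant 154 this returns 1320922657_106.png, not the 1320922653_259.png shown in the question; replacing max(filename) with min(filename) would reproduce the question's example.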

MS Access SQL query from 3 tables

I have 3 tables shown below in MS Access 2010:
Table: devices
id | device_id | Company | Version | Revision |
-----------------------------------------------
1 | dev_a | Almaras | 1.5.1 | 0.2A |
2 | dev_b | Enigma | 1.5.1 | 0.2A |
3 | dev_c | Almaras | 1.5.1 | 0.2C |
*Field: device_id is Primary Key Unique String
*Field ID is just an auto-number column
Table: activities
id | act_id | act_date | act_type | act_note |
------------------------------------------------
1 | dev_a | 07/22/2013 | usb_axc | ok |
2 | dev_a | 07/23/2013 | usb_axe | ok | (LAST ROW for dev_a)
3 | dev_c | 07/22/2013 | usb_axc | ok | (LAST ROW for dev_c)
4 | dev_b | 07/21/2013 | usb_axc | ok | (LAST ROW for dev_b)
*Field: act_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Table: matrix
id | mat_id | tc | ts | bat | cycles |
-----------------------------------------
1 | dev_a | 2811 | 10 | 99 | 200 |
2 | dev_a | 2911 | 10 | 97 | 400 |
3 | dev_a | 3007 | 10 | 94 | 600 |
4 | dev_a | 3210 | 10 | 92 | 800 | (LAST ROW for dev_a)
5 | dev_b | 1100 | 5 | 98 | 100 |
6 | dev_b | 1300 | 8 | 93 | 200 |
7 | dev_b | 1411 | 11 | 90 | 300 | (LAST ROW for dev_b)
8 | dev_c | 4000 | 27 | 77 | 478 | (LAST ROW for dev_c)
*Field: mat_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Is there any way to query tables to get results as shown below (each device from devices and only last row added [see example output table] from each of the other two tables):
Query Results:
device_id | Company | act_date | act_type | bat | cycles |
------------------------------------------------------------
dev_a | Almaras | 07/23/2013 | usb_axe | 92 | 800 |
dev_b | Enigma | 07/21/2013 | usb_axc | 90 | 300 |
dev_c | Almaras | 07/22/2013 | usb_axc | 77 | 478 |
Any ideas? Thank you in advance for reading and helping me out :)

I think this is what you want:
SELECT a.device_id, a.Company,
b.act_date, b.act_type,
c.bat, c.cycles
FROM ((((devices AS a
INNER JOIN activities AS b
ON a.device_id = b.act_id)
INNER JOIN matrix AS c
ON a.device_id = c.mat_id)
INNER JOIN
(
SELECT act_id, MAX(act_date) AS max_date
FROM activities
GROUP BY act_id
) AS d ON b.act_id = d.act_id AND b.act_date = d.max_date)
INNER JOIN
(
SELECT mat_id, MAX(tc) AS max_tc
FROM matrix
GROUP BY mat_id
) AS e ON c.mat_id = e.mat_id AND c.tc = e.max_tc)
The subqueries d and e find the latest row for every act_id (by act_date) and every mat_id (by tc), respectively.
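The join-to-the-maximum pattern can be tried on the question's sample data with the sketch below. It runs on SQLite through Python's sqlite3 module, not on MS Access, so the nested parentheses Access requires around joins are dropped. It also assumes dates are stored as ISO strings (YYYY-MM-DD) so that MAX compares them correctly, and that the highest tc marks the last matrix row, as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, Company TEXT);
    CREATE TABLE activities (act_id TEXT, act_date TEXT, act_type TEXT);
    CREATE TABLE matrix (mat_id TEXT, tc INTEGER, bat INTEGER, cycles INTEGER);
    INSERT INTO devices VALUES
        ('dev_a', 'Almaras'), ('dev_b', 'Enigma'), ('dev_c', 'Almaras');
    INSERT INTO activities VALUES
        ('dev_a', '2013-07-22', 'usb_axc'), ('dev_a', '2013-07-23', 'usb_axe'),
        ('dev_c', '2013-07-22', 'usb_axc'), ('dev_b', '2013-07-21', 'usb_axc');
    INSERT INTO matrix VALUES
        ('dev_a', 2811, 99, 200), ('dev_a', 2911, 97, 400),
        ('dev_a', 3007, 94, 600), ('dev_a', 3210, 92, 800),
        ('dev_b', 1100, 98, 100), ('dev_b', 1300, 93, 200),
        ('dev_b', 1411, 90, 300), ('dev_c', 4000, 77, 478);
""")

# d and e hold one "latest" key per device; joining b and c to them
# discards every activity and matrix row except the latest one.
latest = conn.execute("""
    SELECT a.device_id, a.Company, b.act_date, b.act_type, c.bat, c.cycles
    FROM devices AS a
    JOIN activities AS b ON a.device_id = b.act_id
    JOIN matrix AS c ON a.device_id = c.mat_id
    JOIN (SELECT act_id, MAX(act_date) AS max_date
          FROM activities GROUP BY act_id) AS d
      ON b.act_id = d.act_id AND b.act_date = d.max_date
    JOIN (SELECT mat_id, MAX(tc) AS max_tc
          FROM matrix GROUP BY mat_id) AS e
      ON c.mat_id = e.mat_id AND c.tc = e.max_tc
    ORDER BY a.device_id
""").fetchall()

for row in latest:
    print(row)
```

Because these are inner joins, a device with no activity or no matrix rows would be left out of the result.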

Try
SELECT devices.device_id, devices.Company, activities.act_date, activities.act_type, matrix.bat, matrix.cycles
FROM (devices
LEFT JOIN activities
ON devices.device_id = activities.act_id)
LEFT JOIN matrix
ON devices.device_id = matrix.mat_id;

What do you consider the "last" row in matrix?
You need to do something like
WHERE act_date IN (SELECT MAX(a.act_date) FROM activities a WHERE a.act_id = d.device_id GROUP BY a.act_id)
and something similar for the join to matrix.

Adding rows and changing the row name

I have fetched the values from the SQL Server database with the following code:
SELECT [Zone Name]
,[Zone Count]
,[Phase Name]
FROM [Interface].[dbo].[VwZoneCount]
where [Zone Name] IN ('EB2GFNMZ','EB2GFSMZ','EB2GFNZ1','EB2GFSZ1','EB21FNZ1','EB21FSMZ','EB2IFSZ1','EB22FNZ1','EB22FSZ1','EB22FSMZ','EB23FNMZ','EB23FNZ1','EB23FNZ2','EB23FNZ3','EB23FSMZ','EB23FSZ1','EB23FSZ2','EB24FNMZ','EB24FNZ1','EB24FSMZ','EB24FSZ1','EB25FNMZ','EB25FNZ1','EB25FSMZ','EB25FSZ1','EB26FNMZ','EB26FNZ1','EB26FSMZ','EB26FSZ1','EB27FNZ1','EB27FSMZ')
GO
The output of the above query is:
Zone Name Zone Count
EB24FNZ1 160
EB24FSMZ 10
EB24FSZ1 87
EB25FNMZ 82
EB25FNZ1 82
EB25FSMZ 12
EB25FSZ1 123
EB26FNMZ 4
EB26FNZ1 92
EB26FSMZ 23
EB26FSZ1 91
EB27FNZ1 1
EB27FSMZ 64
EB2GFNMZ 12
EB2GFNZ1 152
EB2GFSMZ 36
EB2GFSZ1 212
But I need the output with some row values summed. I need to combine the rows matching 'EB2GFN%' into one row with a different name, likewise 'EB21FN%', and similarly for the other rows. Can anybody suggest how I would do that?
Desired output:
Zone Name Zone Count
EB24F_NORTH_WING 160
EB24F_SOUTH_WING 10+87
EB25F_NORTH_WING 82+82
EB25F_SOUTH_WING 12+123
EB26F_NORTH_WING 4+92
EB26F_SOUTH_WING 23+91
EB27F_NORTH_WING 1
EB27F_SOUTH_WING 64
EB2GF_NORTH_WING 12+152
EB2GF_SOUTH_WING 36+212

You can do this:
;WITH AllZones
AS
(
SELECT * FROM YourQuery
), WithGroupedZones
AS
(
SELECT
ZoneName,
ZoneCount,
LEFT(ZoneName, 2) Eb,
SUBSTRING(ZoneName, 3, 1) EbNumber,
SUBSTRING(ZoneName, 4, 3) F,
SUBSTRING(ZoneName, 8, 1) FNumber
FROM AllZones
)
SELECT
ZoneName,
(SELECT SUM(t2.ZoneCount)
FROM WithGroupedZones t2
WHERE t1.Eb = t2.Eb
AND t1.F = t2.F
AND t1.EBNumber= t2.EBnumber
) ZonesCount
FROM WithGroupedZones t1;
This will give you:
| ZONENAME | ZONESCOUNT |
-------------------------
| EB24FNZ1 | 160 |
| EB24FSMZ | 97 |
| EB24FSZ1 | 97 |
| EB25FNMZ | 164 |
| EB25FNZ1 | 164 |
| EB25FSMZ | 135 |
| EB25FSZ1 | 135 |
| EB26FNMZ | 96 |
| EB26FNZ1 | 96 |
| EB26FSMZ | 114 |
| EB26FSZ1 | 114 |
| EB27FNZ1 | 1 |
| EB27FSMZ | 64 |
| EB2GFNMZ | 164 |
| EB2GFNZ1 | 164 |
| EB2GFSMZ | 248 |
| EB2GFSZ1 | 248 |
Note that this might not be exactly the result set you are looking for, but you can get your desired output by modifying the condition I used in my query:
t1.Eb = t2.Eb
AND t1.F = t2.F
AND t1.EBNumber= t2.EBnumber
Also note that the zone names are split into these parts:
| ZONENAME | ZONECOUNT | EB | EBNUMBER | F | FNUMBER |
--------------------------------------------------------
| EB24FNZ1 | 160 | EB | 2 | 4FN | 1 |
| EB24FSMZ | 10 | EB | 2 | 4FS | Z |
| EB24FSZ1 | 87 | EB | 2 | 4FS | 1 |
| EB25FNMZ | 82 | EB | 2 | 5FN | Z |
| EB25FNZ1 | 82 | EB | 2 | 5FN | 1 |
| EB25FSMZ | 12 | EB | 2 | 5FS | Z |
| EB25FSZ1 | 123 | EB | 2 | 5FS | 1 |
| EB26FNMZ | 4 | EB | 2 | 6FN | Z |
| EB26FNZ1 | 92 | EB | 2 | 6FN | 1 |
| EB26FSMZ | 23 | EB | 2 | 6FS | Z |
| EB26FSZ1 | 91 | EB | 2 | 6FS | 1 |
| EB27FNZ1 | 1 | EB | 2 | 7FN | 1 |
| EB27FSMZ | 64 | EB | 2 | 7FS | Z |
| EB2GFNMZ | 12 | EB | 2 | GFN | Z |
| EB2GFNZ1 | 152 | EB | 2 | GFN | 1 |
| EB2GFSMZ | 36 | EB | 2 | GFS | Z |
| EB2GFSZ1 | 212 | EB | 2 | GFS | 1 |
Then you can compare the ZoneName values using these parts: EB, EBNUMBER, F, FNUMBER.

Try this. It will give you sums for the groups that have the first 6 letters in common:
SELECT LEFT(Zone_Prefix, 5) + CASE WHEN RIGHT(Zone_Prefix, 1) = 'N' THEN '_NORTH_WING' ELSE '_SOUTH_WING' END AS [Zone Name],
Cnt AS [Zone Count]
FROM
(
SELECT LEFT([Zone Name], 6) AS Zone_Prefix
,SUM([Zone Count]) Cnt
FROM [Interface].[dbo].[VwZoneCount]
WHERE [Zone Name] IN ('EB2GFNMZ','EB2GFSMZ','EB2GFNZ1','EB2GFSZ1','EB21FNZ1','EB21FSMZ','EB2IFSZ1','EB22FNZ1','EB22FSZ1','EB22FSMZ','EB23FNMZ','EB23FNZ1','EB23FNZ2','EB23FNZ3','EB23FSMZ','EB23FSZ1','EB23FSZ2','EB24FNMZ','EB24FNZ1','EB24FSMZ','EB24FSZ1','EB25FNMZ','EB25FNZ1','EB25FSMZ','EB25FSZ1','EB26FNMZ','EB26FNZ1','EB26FSMZ','EB26FSZ1','EB27FNZ1','EB27FSMZ')
GROUP BY
LEFT([Zone Name], 6)
) tbl
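The prefix-grouping idea can be checked against the output listed in the question with a sketch like the following. It uses Python's sqlite3 module, so SQLite's substr() and || stand in for T-SQL's LEFT(), RIGHT() and +, and the table and column names (zone_counts, zone_name, zone_count) are simplified stand-ins for the view in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zone_counts (zone_name TEXT, zone_count INTEGER)")
# Rows copied from the query output shown in the question.
conn.executemany("INSERT INTO zone_counts VALUES (?, ?)", [
    ("EB24FNZ1", 160), ("EB24FSMZ", 10), ("EB24FSZ1", 87),
    ("EB25FNMZ", 82), ("EB25FNZ1", 82), ("EB25FSMZ", 12), ("EB25FSZ1", 123),
    ("EB26FNMZ", 4), ("EB26FNZ1", 92), ("EB26FSMZ", 23), ("EB26FSZ1", 91),
    ("EB27FNZ1", 1), ("EB27FSMZ", 64),
    ("EB2GFNMZ", 12), ("EB2GFNZ1", 152), ("EB2GFSMZ", 36), ("EB2GFSZ1", 212),
])

# Inner query: sum the counts per 6-character prefix (e.g. 'EB24FN').
# Outer query: turn the 6th character (N or S) into the wing suffix.
wings = conn.execute("""
    SELECT substr(prefix, 1, 5)
           || CASE WHEN substr(prefix, 6, 1) = 'N'
                   THEN '_NORTH_WING' ELSE '_SOUTH_WING' END AS zone_name,
           cnt
    FROM (
        SELECT substr(zone_name, 1, 6) AS prefix, SUM(zone_count) AS cnt
        FROM zone_counts
        GROUP BY substr(zone_name, 1, 6)
    ) AS grouped
    ORDER BY zone_name
""").fetchall()

for name, count in wings:
    print(name, count)
```

The sums match the desired output in the question, for example 10+87 = 97 for EB24F_SOUTH_WING and 36+212 = 248 for EB2GF_SOUTH_WING.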
