Get the top N rows by row count in GROUP BY


I'm querying a records table to find which users are my top record creators for certain record types. The basic starting point of my query looks something like this:
SELECT recordtype, createdby, COUNT(*)
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
ORDER BY recordtype, createdby DESC
But there are many users who have created records - I want to narrow this down further.
I added HAVING COUNT(*) > ..., but some record types only have a few records, while others have hundreds. If I do HAVING COUNT(*) > 10, I won't see that all 9 records of type "XYZ" were created by the same person, but I will have to scroll through every person that's created only 15, 30, 50, etc. of the 3,500 records of type "ABC."
I only want the top 5, 10, or so creators for each record type.
I've found a few questions that address the "select top N in group" part of the question, but I can't figure out how to apply them to what I need. The answers I could find are in cases where the "rank by" column is a value stored in the table, not an aggregate.
(Example: "what are the top cities in each country by population?", with data that looks like this:)
Country        City      Population
United States  New York  123456789
United States  Chicago   123456789
France         Paris     123456789
I don't know how to apply the methods I've seen used to answer that (row_number(), mostly) to get the top N by COUNT(*).

Here is one way to get the top 10 rows in each group:
select * from(
select *, row_number() over (partition by recordtype order by cnt desc) rn
from (
SELECT recordtype, createdby, COUNT(*) cnt
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
)t
)t where rn <= 10
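One thing to watch with row_number() is ties: if two creators have the same count at the cutoff, only one of them is kept, and which one is arbitrary. A minimal sketch of a variant that keeps ties, assuming made-up sample rows (the table and column names follow the question, the data and the cutoff of 2 are hypothetical). It also shows that the aggregate can be used directly inside the window's ORDER BY, since window functions are evaluated after GROUP BY:

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TABLE recordtable (recordtype VARCHAR(10), createdby VARCHAR(10));

INSERT INTO recordtable (recordtype, createdby) VALUES
  ('ABC', 'ann'), ('ABC', 'ann'), ('ABC', 'ann'),
  ('ABC', 'bob'), ('ABC', 'bob'),
  ('ABC', 'cat'), ('ABC', 'cat'),
  ('ABC', 'dan'),
  ('XYZ', 'eve'), ('XYZ', 'eve');

-- Top 2 counts per record type; change 2 to the N you need.
SELECT recordtype, createdby, cnt
FROM (
  SELECT recordtype, createdby, COUNT(*) AS cnt,
         -- DENSE_RANK gives tied counts the same rank, so ties at the cutoff all survive
         DENSE_RANK() OVER (PARTITION BY recordtype ORDER BY COUNT(*) DESC) AS drnk
  FROM recordtable
  GROUP BY recordtype, createdby
) ranked
WHERE drnk <= 2
ORDER BY recordtype, cnt DESC, createdby;
-- ABC: ann (3), bob (2), cat (2); dan (1) is dropped. XYZ: eve (2).
```

With row_number() in place of DENSE_RANK() the same query would return exactly two rows for ABC, dropping either bob or cat.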

If I understand correctly, you want the top N records with the biggest count. You can achieve this with a subquery like the one below. (I assume you are using MySQL, PostgreSQL, or Db2; in other engines the limit and offset syntax may differ. In SQL Server, for example, this is done with SELECT TOP n * FROM ...)
SELECT A.recordtype, A.createdby, A.total FROM (
SELECT recordtype, createdby, COUNT(*) as total
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
) AS A ORDER BY total DESC, recordtype, createdby
LIMIT 10 OFFSET 0
LIMIT is the number of records you want in the result page, and OFFSET is the number of records to skip before taking the result page.
If you use SQL Server, it may look like this (there is also a way to apply an offset; see the question "SQL Server OFFSET and LIMIT"):
SELECT TOP 10 A.recordtype, A.createdby, A.total FROM (
SELECT recordtype, createdby, COUNT(*) as total
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
) AS A ORDER BY total DESC, recordtype, createdby
For a grouped result, take a look at this post: http://www.silota.com/docs/recipes/sql-top-n-group.html
So, to take the first 10 records in each group, not only the first, I combined this answer with the link above and the approach of @eshirvana:
SELECT * FROM (
SELECT *, row_number() OVER (PARTITION BY recordtype ORDER BY total DESC) rn
FROM (
SELECT recordtype, createdby, COUNT(*) as total
FROM recordtable
WHERE recordtype in (...)
GROUP BY recordtype, createdby
) t
) t WHERE rn <= 10

Related

sql - Data with more than one record

I have the following temporary table.
The aim is to flag the data that has more than one record by putting "More than one records" in nom_etp.
In my example below, if Siren appears more than once, I would have:
Siren      ETS_RS                                 Voie                      Ville              nom_etp
348177155  POITOU-CHARENTES ENGRAIS P.C.E. (SNC)  BOULEVARD WLADIMIR MORCH  17000 LA ROCHELLE  More than one records
For records that appear only once, I will have the single name of the company (here nom_etp):
Siren      ETS_RS                         Voie                Ville              nom_etp
344843347  PRESTIGE AUTO ROCHELAIS (SAS)  4 RUE JEAN DEMEOCQ  17000 LA ROCHELLE  NIGER
I tried a few things based on the idea that if I can get a count greater than one, I could flag those rows easily with a CASE:
First, I tried to do a count:
WITH cte_ssrep_moraux AS (...)
SELECT SIREN,ETS_RS,Voie,Ville
,Denomination AS nom_etp,COUNT(SIREN)
FROM cte_ssrep_moraux
GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
It hits a snag: all counts were equal to one and I get the same dataset as in the picture.
Second, I tried a rank:
WITH cte_ssrep_moraux AS (...)
SELECT ETS_RS,Voie,Ville
,Denomination AS nom_etp,SIREN,
RANK() OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS xx
FROM cte_ssrep_moraux
GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
It hits the same snag: all values were equal to one and I get the same dataset as in the picture.
I'm a bit confused about what I should do next. I have the feeling it will be an easy one and I'll facepalm.
Many thanks for reading my question

If this is your criterion:
if Siren appears more than once,
then the GROUP BY clause should only contain Siren:
SELECT SIREN, COUNT(*)
FROM cte_ssrep_moraux
GROUP BY SIREN
HAVING COUNT(*) > 1;
I'm not sure what you want to do after that, but this will return the SIREN values that appear more than once.

If there is more than one row and you change every nom_etp to 'more than one record', you end up with identical rows. That is why I prepared a slightly tweaked query. See the following (table simplified for clarity):
CREATE TABLE Duplicates
(
Id int,
Name varchar(20),
Item varchar(20)
)
INSERT Duplicates VALUES
(1,'Name1', 'Item1'),
(2,'Name2', 'Item2'),
(2,'Name2', 'Item3'),
(3,'Name3', 'Item4'),
(3,'Name3', 'Item5'),
(3,'Name3', 'Item6'),
(4,'Name4', 'Item7');
If you need just a query:
WITH Numbered AS
(
SELECT Id, Name, Item,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) RowNum,
COUNT(*) OVER (PARTITION BY Id ORDER BY ID) TotalInGroup
FROM Duplicates
)
SELECT Id, Name,
CASE WHEN RowNum=1 AND TotalInGroup>1 THEN 'More records' ELSE Item END Item
FROM Numbered
If you need to normalize:
WITH Numbered AS
(
SELECT Id, Name, Item,
ROW_NUMBER() OVER (ORDER BY Id) Number,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) RowNum,
COUNT(*) OVER (PARTITION BY Id ORDER BY ID) TotalInGroup
FROM Duplicates
)
MERGE Numbered AS tgt
USING Numbered AS src
ON src.Number=tgt.Number
WHEN MATCHED AND tgt.RowNum=1 AND tgt.TotalInGroup>1 THEN
UPDATE SET tgt.Item='More'
WHEN MATCHED AND tgt.RowNum>1 THEN
DELETE;
Table will look like below:
Id Name Item
-- ---- ----
1 Name1 Item1
2 Name2 More
3 Name3 More
4 Name4 Item7
If there are multiple rows with the same Id, the first of them is updated with the constant 'More' and all the others in the group are deleted.

Use CTE for this purpose
;WITH CTE AS(
SELECT ETS_RS,Voie,Ville,Denomination AS nom_etp,SIREN,
ROW_NUMBER() OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS RN
FROM cte_ssrep_moraux
--GROUP BY ETS_RS,Voie,Ville,Denomination,SIREN
)
SELECT ETS_RS,
Voie,Ville,
CASE WHEN RN > 1 THEN 'More than one records'
ELSE nom_etp
END AS 'nom_etp',
SIREN
FROM CTE

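;with cte
as
(
select siren, count(*) as cnt
from yourtable
group by siren          -- count rows per siren
having count(*) > 1     -- keep only sirens that appear more than once
)
update t
set nom_etp = 'more than one records'
from yourtable t        -- T-SQL UPDATE ... FROM syntax
where exists (select 1 from cte c where c.siren = t.siren)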

Since you still want all the records, including the unique ones,
you can use COUNT as a window function,
with a CASE to choose what to display as nom_etp:
select Siren, ETS_RS, Voie, Ville,
(case when count(*) over (partition by Siren) > 1 then 'More than one records' else nom_etp end) as nom_etp
from cte_ssrep_moraux;
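To see what the window COUNT does on actual rows, here is a small sketch on a made-up table (etp_sample and its three rows are hypothetical, only the column names Siren, Ville and nom_etp come from the question). It also adds DISTINCT, which is one possible way to collapse the flagged rows to a single row per Siren, assuming the other selected columns are identical within a Siren:

```sql
-- Hypothetical sample table; not the asker's real data.
CREATE TABLE etp_sample (
  Siren   VARCHAR(9),
  Ville   VARCHAR(20),
  nom_etp VARCHAR(30)
);

INSERT INTO etp_sample (Siren, Ville, nom_etp) VALUES
  ('111', 'Bordeaux', 'Firm A'),
  ('111', 'Bordeaux', 'Firm B'),
  ('222', 'Paris',    'Firm ABC');

-- COUNT(*) OVER (PARTITION BY Siren) attaches the per-Siren row count to every row
-- without collapsing rows the way GROUP BY does.
SELECT DISTINCT
       Siren,
       Ville,
       CASE WHEN COUNT(*) OVER (PARTITION BY Siren) > 1
            THEN 'More than one records'
            ELSE nom_etp
       END AS nom_etp
FROM etp_sample;
-- Two rows: ('111', 'Bordeaux', 'More than one records') and ('222', 'Paris', 'Firm ABC')
```

Without DISTINCT the query returns three rows, with the flagged row for Siren 111 appearing twice.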

Please find what I did
WITH cte_ssrep_moraux AS (
SELECT SIREN,ETS_RS,Voie,Ville
,Denomination AS nom_etp,ROW_NUMBER()
OVER (PARTITION BY ETS_RS ORDER BY ETS_RS ASC) AS Counting
FROM
(my_initial_cte) AS tb
)
SELECT Siren, ETS_RS, Voie, Ville,nom_etp
FROM cte_ssrep_moraux
WHERE counting = 1
AND Siren NOT IN (SELECT Siren FROM cte_ssrep_moraux WHERE counting > 1)
UNION ALL
SELECT DISTINCT Siren, ETS_RS, Voie, Ville,'More than one records'
FROM cte_ssrep_moraux
WHERE counting > 1
Explanation: After the initial CTE, I tried many of the solutions mentioned above, especially those using CASE.
The issue with CASE was that it would produce something like this:
Siren  ETS_RS  Voie   Ville     nom_etp
xxxx   xyxy    xyzet  Bordeaux  More than one records
xxxx   xyxy    xyzet  Bordeaux  More than one records
xxxx   xyxy    xyzet  Bordeaux  More than one records
xxxy   zzzy    ssare  Paris     Firm ABC
So instead of putting everything under a CASE, I split the query into two parts:
The first part returns everything with a counting equal to 1.
The second part returns the rest, with a counting above 1, using DISTINCT.
The two results are then combined with UNION ALL, which works because both sets have the same number of columns.

I need the Top 10 results from table

I need to get the Top 10 results for each Region, Market and Name, along with those with the highest counts (Gaps). There are 4 Regions with 1 to N Markets. I can get the Top 10 overall but cannot figure out how to do this without using a UNION for every Market. Any ideas on how to do this?
SELECT DISTINCT TOP 10
Region, Market, Name, Gaps
FROM
TableName
ORDER BY
Region, Market, Gaps DESC

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your Region, Market, Name - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH RegionsMarkets AS
(
SELECT
Region, Market, Name, Gaps,
RN = ROW_NUMBER() OVER(PARTITION BY Region, Market, Name ORDER BY Gaps DESC)
FROM
dbo.TableName
)
SELECT
Region, Market, Name, Gaps
FROM
RegionsMarkets
WHERE
RN <= 10
Here, I am selecting only the first 10 entries for each "partition" (i.e. for each Region, Market, Name tuple), ordered by Gaps in descending fashion.
With this, you get the top 10 rows for each (Region, Market, Name) tuple. Is that close to what you're looking for?

I think you want row_number():
select t.*
from (select t.*,
row_number() over (partition by region, market order by gaps desc) as seqnum
from tablename t
) t
where seqnum <= 10;
I am not sure if you want name in the partition by clause. If you have more than one name within a market, that may be what you are looking for. (Hint: Sample data and desired results can really help clarify a question.)

Selecting 5 Most Recent Records Of Each Group

The below statement retrieves the top 2 records within each group in SQL Server. It works correctly, however as you can see it doesn't scale at all. I mean that if I wanted to retrieve the top 5 or 10 records instead of just 2, you can see how this query statement would grow very quickly.
How can I convert this query into something that returns the same records, but that I can quickly change it to return the top 5 or 10 records within each group instead, rather than just 2? (i.e. I want to just tell it to return the top 5 within each group, rather than having 5 unions as the below format would require)
Thanks!
WITH tSub
as (SELECT CustomerID,
TransactionTypeID,
Max(EventDate) as EventDate,
Max(TransactionID) as TransactionID
FROM Transactions
WHERE ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID)
SELECT *
from tSub
UNION
SELECT t.CustomerID,
t.TransactionTypeID,
Max(t.EventDate) as EventDate,
Max(t.TransactionID) as TransactionID
FROM Transactions t
WHERE t.TransactionID NOT IN (SELECT tSub.TransactionID
FROM tSub)
and ParentTransactionID is NULL
Group By CustomerID,
TransactionTypeID

Use PARTITION BY to solve this type of problem:
select ut.*
from (select t.*,
ROW_NUMBER() over (PARTITION by <GroupColumn> order by <OrderColumn> desc) as rownum
from YourTable t) ut
where ut.rownum <= 5
This partitions the result on the group column, orders each partition by the EventDate column (descending, so the most recent rows come first), and then selects the entries having rownum <= 5. You can change the value 5 to get the top N most recent entries of each group.
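Applied to the Transactions table from the question, the pattern might look like the sketch below. The column names come from the question; the table definition, the column types and the tie-breaker on TransactionID are assumptions made only so the example is self-contained:

```sql
-- Hypothetical definition; only the column names are taken from the question.
CREATE TABLE Transactions (
  TransactionID       INT PRIMARY KEY,
  ParentTransactionID INT NULL,
  CustomerID          INT NOT NULL,
  TransactionTypeID   INT NOT NULL,
  EventDate           DATE NOT NULL
);

-- 5 most recent top-level transactions per (CustomerID, TransactionTypeID).
-- Change the 5 in the final WHERE to return a different N.
SELECT CustomerID, TransactionTypeID, EventDate, TransactionID
FROM (
  SELECT t.CustomerID, t.TransactionTypeID, t.EventDate, t.TransactionID,
         ROW_NUMBER() OVER (
           PARTITION BY t.CustomerID, t.TransactionTypeID
           ORDER BY t.EventDate DESC, t.TransactionID DESC  -- TransactionID breaks date ties
         ) AS rownum
  FROM Transactions t
  WHERE t.ParentTransactionID IS NULL
) ut
WHERE ut.rownum <= 5;
```

The filter on ParentTransactionID stays inside the subquery so that child transactions are excluded before the rows are numbered.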

How do I use ROW_NUMBER()?

I want to use ROW_NUMBER() to get:
1. The max(ROW_NUMBER()), or I guess this would also be the count of all rows.
I tried doing:
SELECT max(ROW_NUMBER() OVER(ORDER BY UserId)) FROM Users
but it didn't seem to work...
2. The ROW_NUMBER() for a given piece of information, i.e. if I have a name and I want to know what row the name came from.
I assume it would be something similar to what I tried for #1:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
but this didn't work either...
Any Ideas?

For the first question, why not just use?
SELECT COUNT(*) FROM myTable
to get the count.
And for the second question, the primary key of the row is what should be used to identify a particular row. Don't try and use the row number for that.
If you returned Row_Number() in your main query,
SELECT ROW_NUMBER() OVER (Order by Id) AS RowNumber, Field1, Field2, Field3
FROM User
Then when you want to go 5 rows back then you can take the current row number and use the following query to determine the row with currentrow -5
SELECT us.Id
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS Row, Id
FROM User ) us
WHERE Row = CurrentRow - 5

Though I agree with others that you could use count() to get the total number of rows, here is how you can use row_number():
To get the total number of rows:
with temp as (
select row_number() over (order by id) as rownum
from table_name
)
select max(rownum) from temp
To get the row numbers where name is Matt:
with temp as (
select name, row_number() over (order by id) as rownum
from table_name
)
select rownum from temp where name like 'Matt'
You can further use min(rownum) or max(rownum) to get the first or last row for Matt respectively.
These were very simple implementations of row_number(). You can use it for more complex grouping. Check out my response on Advanced grouping without using a sub query

If you need the table's total row count, there is an alternative to the SELECT COUNT(*) statement.
Because SELECT COUNT(*) has to scan the table (or one of its indexes) to return the row count, it can take a very long time on a large table. In that case you can use the sysindexes system table instead. It has a ROWS column that contains the total row count for each table in your database. You can use the following select statement:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This can drastically reduce the time your query takes. Note that sysindexes is kept only as a compatibility view in SQL Server 2005 and later, and the stored count may not be exact.

You can use this to get the first record that matches the WHERE clause:
SELECT TOP(1) * , ROW_NUMBER() OVER(ORDER BY UserId) AS rownum
FROM Users
WHERE UserName = 'Joe'
ORDER BY rownum ASC

ROW_NUMBER() returns a unique number for each row, starting with 1. You can use it by simply writing:
ROW_NUMBER() OVER (ORDER BY Column_Name DESC) AS RowNum

May not be related to the question here. But I found it could be useful when using ROW_NUMBER -
SELECT *,
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS Any_ID
FROM #Any_Table

select
Ml.Hid,
ml.blockid,
row_number() over (partition by ml.blockid order by Ml.Hid desc) as rownumber,
H.HNAME
from MIT_LeadBechmarkHamletwise ML
join [MT.HAMLE] h on ML.Hid=h.HID

SELECT num, UserName FROM
(SELECT UserName, ROW_NUMBER() OVER(ORDER BY UserId) AS num
From Users) AS numbered
WHERE UserName='Joe'

You can use ROW_NUMBER to limit a query result.
Example:
SELECT * FROM (
select row_number() OVER (order by createtime desc) AS ROWINDEX,*
from TABLENAME ) TB
WHERE TB.ROWINDEX between 1 and 10
With the above query, I get page 1 of the results from TABLENAME.

If you absolutely want to use ROW_NUMBER for this (instead of count(*)) you can always use:
SELECT TOP 1 ROW_NUMBER() OVER (ORDER BY Id)
FROM USERS
ORDER BY ROW_NUMBER() OVER (ORDER BY Id) DESC

You need to create a virtual table using WITH ... AS, as shown in the query below.
Using this virtual table, you can perform CRUD operations with respect to row_number.
QUERY:
WITH NumberedUsers AS
(SELECT row_number() OVER(ORDER BY UserId) rn, * FROM Users)
SELECT * FROM NumberedUsers WHERE UserName='Joe'
You can use INSERT, UPDATE or DELETE in the last statement instead of SELECT.

The SQL ROW_NUMBER() function sorts the rows in a record set and assigns an order number to each. It is used to number rows, for example to identify the top 10 rows with the highest order amount, or to identify each customer's order with the highest amount.
If you want to sort the dataset and number each row by separating the rows into categories, use ROW_NUMBER() with a PARTITION BY clause. For example, sorting the orders of each customer within that customer's own orders, where the dataset contains all orders.
SELECT
SalesOrderNumber,
CustomerId,
SubTotal,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SubTotal DESC) rn
FROM Sales.SalesOrderHeader
But as I understand it, you want to calculate the number of rows grouped by a column. To visualize the requirement: if you want to see the count of all orders of the related customer as a separate column beside the order info, you can use the COUNT() aggregate function with a PARTITION BY clause.
For example,
SELECT
SalesOrderNumber,
CustomerId,
COUNT(*) OVER (PARTITION BY CustomerId) CustomerOrderCount
FROM Sales.SalesOrderHeader

This query:
SELECT ROW_NUMBER() OVER(ORDER BY UserId) From Users WHERE UserName='Joe'
will return all rows where the UserName is 'Joe', unless you have no UserName='Joe'.
They will be listed in order of UserID, and because the WHERE clause is applied before the rows are numbered, the row_number field will start with 1 and increment for however many rows contain UserName='Joe'.
If it does not work for you, then your WHERE clause has an issue or there is no UserID column in the table. Check the spelling of both fields, UserID and UserName.

SQL Select Bottom Records

I have a query where I wish to retrieve the oldest X records. At present my query is something like the following:
SELECT Id, Title, Comments, CreatedDate
FROM MyTable
WHERE CreatedDate > @OlderThanDate
ORDER BY CreatedDate DESC
I know that normally I would remove the 'DESC' keyword to switch the order of the records, however in this instance I still want to get records ordered with the newest item first.
So I want to know if there is any means of performing this query such that I get the oldest X items sorted such that the newest item is first. I should also add that my database exists on SQL Server 2005.

Why not just use a subquery?
SELECT T1.*
FROM
(SELECT TOP X Id, Title, Comments, CreatedDate
FROM MyTable
WHERE CreatedDate > @OlderThanDate
ORDER BY CreatedDate) T1
ORDER BY CreatedDate DESC

Embed the query. You take the top x when sorted in ascending order (i.e. the oldest) and then re-sort those in descending order ...
select *
from
(
SELECT top X Id, Title, Comments, CreatedDate
FROM MyTable
WHERE CreatedDate > @OlderThanDate
ORDER BY CreatedDate
) a
order by createddate desc
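For completeness, here is a sketch of the same nested query with X and the cutoff date as T-SQL variables, run against a small made-up table. The table definition, the four sample rows and the variable values are hypothetical; TOP with a parenthesized variable is assumed to be available, which is the case from SQL Server 2005 on:

```sql
-- Hypothetical table and data; only the column names come from the question.
CREATE TABLE MyTable (
  Id          INT PRIMARY KEY,
  Title       VARCHAR(50),
  Comments    VARCHAR(200),
  CreatedDate DATETIME
);

INSERT INTO MyTable (Id, Title, Comments, CreatedDate) VALUES (1, 'first',  '', '20200105');
INSERT INTO MyTable (Id, Title, Comments, CreatedDate) VALUES (2, 'second', '', '20200110');
INSERT INTO MyTable (Id, Title, Comments, CreatedDate) VALUES (3, 'third',  '', '20200115');
INSERT INTO MyTable (Id, Title, Comments, CreatedDate) VALUES (4, 'fourth', '', '20200120');

DECLARE @X INT;
DECLARE @OlderThanDate DATETIME;
SET @X = 2;
SET @OlderThanDate = '20200101';

-- Inner query: the @X oldest rows after the cutoff (ascending sort).
-- Outer query: re-sort just those rows so the newest of them comes first.
SELECT T1.Id, T1.Title, T1.Comments, T1.CreatedDate
FROM (SELECT TOP (@X) Id, Title, Comments, CreatedDate
      FROM MyTable
      WHERE CreatedDate > @OlderThanDate
      ORDER BY CreatedDate ASC) T1
ORDER BY T1.CreatedDate DESC;
-- Returns Id 2 (Jan 10) followed by Id 1 (Jan 5).
```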
