How to get top records of subset?

How to get top records of subset? - sql

Say I have a table
StoreID TotalSales Month Year
-- ---------- ----- ----
1 10 1 2012
2 2 1 2012
3 15 1 2012
1 4 2 2012
2 5 2 2012
I need: For each unique "Month/Year", grab the top two StoreID's with the highest Sales.
I'm at a loss on how to do this. I tried with a cross apply but that doesn't seem to work. This is all way over my head so hopefully someone can give me a nudge in the right direction.

This query uses Common Table Expression and Window Function to be able to get all the columns within the row. It works on SQL Server 2005 and up
WITH records
AS
(
SELECT StoreID, TotalSales , Month, Year,
DENSE_RANK() OVER (PARTITION BY Month, Year
ORDER BY TotalSales DESC) rn
FROM tableName
)
SELECT StoreID, TotalSales , Month, Year
FROM records
WHERE rn <= 2
SQLFiddle Demo

Related

SQL How to take the minium for multiple fields?

Consider the following data set that records the product sold, year, and revenue from that particular product in thousands of dollars. This data table (YEARLY_PRODUCT_REVENUE) is stored in SQL and has many more rows.
Year | Product | Revenue
2000 Table 100
2000 Chair 200
2000 Bed 150
2010 Table 120
2010 Chair 190
2010 Bed 390
Using SQL, for every year I would like to find the product that has the maximum revenue.
That is, I would like my output to be the following:
Year | Product | Revenue
2000 Chair 200
2010 Bed 390
My attempt so far has been this:
SELECT year, product, MIN(revenue)
FROM YEARLY_PRODUCT_REVENUE
GROUP BY article, month;
But when I do this, I get multiple-year values for distinct products. For instance, I'm getting the output below which is an error. I'm not entirely sure what the error here is. Any help would be much appreciated!
Year | Product | Revenue
2000 Table 100
2000 Bed 150
2010 Table 120
2010 Chair 190

You don't mention the database so I'll assume it's PostgreSQL. You can do:
select distinct on (year) * from t order by year, revenue desc

You want filtering rather than aggregation. We can use window functions (which most databases support) to rank yearly product sales, and then retain only the top selling product per year.
select *
from (
select r.*, rank() over(partition by year order by revenue desc) rn
from yearly_product_revenue r
) r
where rn = 1;
Here is a shorter solution if your database support the standard WITH TIES clause:
select *
from yearly_product_revenue r
order by rank() over(partition by year order by revenue desc)
fetch first row with ties

Order table by the total count but do not lose the order by names

I have a table, consisting of 3 columns (Person, Year and Count), so for each person, there are several rows with different years and counts and the final row with total count. I want to keep the table ordered by Name, but also order it by the total count.
So the rows should be ordered by sum, but also grouped by the Person and ordered by year. When I am trying to order by sum, of course, both person and years are messed up. Is there a way to sort like this?

You've stored those "total" rows as well? Gosh! Why did you do that?
Anyway: if you
compute rank for rows whose year column is equal to 'total' and
add case expression into the order by clause,
you might get what you want:
SQL> with sorter as
2 (select name, cnt,
3 rank() over (order by cnt) rnk
4 from test
5 where year = 'total'
6 )
7 select t.*
8 from test t join sorter s on s.name = t.name
9 order by s.rnk, case when year = 'total' then '9'
10 else year
11 end;
NAME YEAR CNT
---- ----- ----------
John 2018 3
John 2019 2
John total 5
Bob 2017 2
Bob 2019 4
Bob total 6
6 rows selected.
SQL>

Firebird Query- Return first row each group

In a firebird database with a table "Sales", I need to select the first sale of all customers. See below a sample that show the table and desired result of query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tested following code "Select first row in each GROUP BY group?", but it did not work.

In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.

Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.

in Firebird 3 , get first row foreach customer by min sales_date :
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
İf you want to get other related columns, This is where your self-answer fails.
select customer_id , min(sales_date)
, id, total --what about other colums
from SALES
group by customer_id

So simple as:
select CUSTOMERID min(DTHRSALE) from SALES group by CUSTOMERID

SQL Server : count types with totals by date change

I need to count a value (M_Id) at each change of a date (RS_Date) and create a column grouped by the RS_Date that has an active total from that date.
So the table is:
Ep_Id Oa_Id M_Id M_StartDate RS_Date
--------------------------------------------
1 2001 5 1/1/2014 1/1/2014
1 2001 9 1/1/2014 1/1/2014
1 2001 3 1/1/2014 1/1/2014
1 2001 11 1/1/2014 1/1/2014
1 2001 2 1/1/2014 1/1/2014
1 2067 7 1/1/2014 1/5/2014
1 2067 1 1/1/2014 1/5/2014
1 3099 12 1/1/2014 3/2/2014
1 3099 14 2/14/2014 3/2/2014
1 3099 4 2/14/2014 3/2/2014
So my goal is like
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 10
If the M_startDate = RS_Date I need to count the M_id and then for
each RS_Date that is not equal to the start date I need to count the M_Id and then add that to the M_StartDate count and then count the next RS_Date and add that to the last active count.
I can get the basic counts with something like
(Case when M_StartDate <= RS_Date
then [m_Id] end) as Test.
But I am stuck as how to get to the result I want.
Any help would be greatly appreciated.
Brian
-added in response to comments
I am using Server Ver 10

If using SQL SERVER 2012+ you can use ROWS with your the analytic/window functions:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT *,SUM(CT) OVER(ORDER BY RS_Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Run_CT
FROM cte
Demo: SQL Fiddle
If stuck using something prior to 2012 you can use:
;with cte AS (SELECT RS_Date
,COUNT(DISTINCT M_ID) AS CT
FROM Table1
GROUP BY RS_Date
)
SELECT a.RS_Date
,SUM(b.CT)
FROM cte a
LEFT JOIN cte b
ON a.RS_DAte >= b.RS_Date
GROUP BY a.RS_Date
Demo: SQL Fiddle

You need a cumulative sum, easy in SQL Server 2012 using Windowed Aggregate Functions. Based on your description this will return the expected result
SELECT p_id, RS_Date,
SUM(COUNT(*))
OVER (PARTITION BY p_id
ORDER BY RS_Date
ROWS UNBOUNDED PRECEDING)
FROM tab
GROUP BY p_id, RS_Date

It looks like you want something like this:
SELECT
RS_Date,
SUM(c) OVER (PARTITION BY M_StartDate ORDER BY RS_Date ROWS UNBOUNDED PRECEEDING)
FROM
(
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
) counts
The inline view computes the counts of distinct M_Id values within each (M_StartDate, RS_Date) group (distinctness enforced only within the group), and the outer query uses the analytic version of SUM() to add up the counts within each M_StartDate.
Note that this particular query will not exactly reproduce your example results. It will instead produce:
RS_Date Active
-----------------
1/1/2014 5
1/5/2014 7
3/2/2014 8
3/2/2014 2
This is on account of some rows in your example data with RS_Date 3/2/2014 having a later M_StartDate than others. If this is not what you want then you need to clarify the question, which currently seems a bit inconsistent.
Unfortunately, analytic functions are not available until SQL Server 2012. In SQL Server 2010, the job is messier. It could be done like this:
WITH gc AS (
SELECT M_StartDate, RS_Date, COUNT(DISTINCT M_Id) AS c
FROM my_table
GROUP BY M_StartDate, RS_Date
)
SELECT
RS_Date,
(
SELECT SUM(c)
FROM gc2
WHERE gc2.M_StartDate = gc.M_StartDate AND gc2.RS_Date <= gc.RS_Date
) AS Active
FROM gc

If you are using SQL 2012 or newer you can use LAG to produce a running total.
https://msdn.microsoft.com/en-us/library/hh231256(v=sql.110).aspx

Need to print highest year and their highest quarter in SQL Server 2012

I have a requirement to print the corresponding highest year and highest quarter for a given column.
Input is in a table:
cityprogram year quarter
=========== ==== =======
Abc 1998 1
Abc 1999 4
Abc 1999 4
Abc 1998 3
xyz 1998 4
xyz 1998 1
xyz 2000 3
It should print
Abc 1999 4
xyz 2000 3
I tried many joins, max conditions, I seem to get quarter 4 and 4 for both of them :( thanks

Use a window function like ROW_NUMBER in a common-table-expression:
WITH CTE AS(
SELECT [cityprogram], [year], [quarter],
RN = ROW_NUMBER() OVER (
PARTITION BY [cityprogram]
ORDER BY [year] DESC, [quarter] DESC)
FROM dbo.TableName
)
SELECT [cityprogram], [year], [quarter]
FROM CTE
WHERE RN = 1
DEMO
CITYPROGRAM YEAR QUARTER
Abc 1999 4
xyz 2000 3
ROW_NUMBER returns only one row per group even if there are ties(cityprograms with the same highest year+quarter). If you then want to show all highest you can replace ROW_NUMBER with DENSE_RANK.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to get top records of subset? - sql

Related

SQL How to take the minium for multiple fields?

Order table by the total count but do not lose the order by names

Firebird Query- Return first row each group

SQL Server : count types with totals by date change

Need to print highest year and their highest quarter in SQL Server 2012

Categories

Resources