Is there a way to do something like SQL NOT top statement? - sql

I'm trying to write a SQL statement that gives me the top X records and then sums all the others. The first part is easy...
select top 3 Department, Sum(sales) as TotalSales
from Sales
group by Department
What would be nice is if I union a second query something like...
select NOT top 3 "Others" as Department, Sum(sales) as TotalSales
from Sales
group by Department
... for a result set that looks like,
Department TotalSales
----------- -----------
Mens Clothes 120.00
Jewelry 113.00
Shoes 98.00
Others 312.00
Is there a way to do an equivalent to a NOT operator on a TOP? (I know I can probably make a temp table of the top X and work with that, but I'd prefer a solution that was just a single sql statement.)

WITH q AS
(
SELECT ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) rn,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) <= 3 THEN
department
ELSE
'Others'
END AS dept,
SUM(sales) AS sales
FROM sales
GROUP BY
department
)
SELECT dept, SUM(sales)
FROM q
GROUP BY
dept
ORDER BY
MAX(rn)

WITH cte
As (SELECT Department,
Sum(sales) as TotalSales
from Sales
group by Department),
cte2
AS (SELECT *,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY TotalSales DESC) <= 3 THEN
ROW_NUMBER() OVER (ORDER BY TotalSales DESC)
ELSE 4
END AS Grp
FROM cte)
SELECT MAX(CASE
WHEN Grp = 4 THEN 'Others'
ELSE Department
END) AS Department,
SUM(TotalSales) AS TotalSales
FROM cte2
GROUP BY Grp
ORDER BY Grp
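
For anyone who wants to try the ranked-CTE idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions). The sample rows and the CTE names `totals` and `ranked` are made up for illustration, chosen so the totals match the result set in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Department TEXT, sales REAL)")
# Hypothetical sample rows; only the per-department totals matter here.
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [
    ("Mens Clothes", 70.0), ("Mens Clothes", 50.0),
    ("Jewelry", 113.0), ("Shoes", 98.0),
    ("Toys", 90.0), ("Books", 80.0), ("Garden", 72.0), ("Electronics", 70.0),
])

query = """
WITH totals AS (              -- step 1: one row per department
    SELECT Department, SUM(sales) AS TotalSales
    FROM Sales
    GROUP BY Department
),
ranked AS (                   -- step 2: rank departments by their total
    SELECT Department, TotalSales,
           ROW_NUMBER() OVER (ORDER BY TotalSales DESC) AS rn
    FROM totals
)
SELECT CASE WHEN rn <= 3 THEN Department ELSE 'Others' END AS Department,
       SUM(TotalSales) AS TotalSales
FROM ranked
GROUP BY CASE WHEN rn <= 3 THEN Department ELSE 'Others' END
ORDER BY MIN(rn)              -- top 3 first, 'Others' last
"""
rows = conn.execute(query).fetchall()
for department, total in rows:
    print(department, total)
```

Everything ranked fourth or lower collapses into a single 'Others' row, so no second query or temp table is needed.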

You can use a union to sum all other departments. A common table expression makes this a little bit more readable:
; with Top3Sales as
(
select top 3 Department
, Sum(sales) as TotalSales
from Sales
group by
Department
order by
Sum(sales) desc
)
select Department
, TotalSales
from Top3Sales
union all
select 'Other'
, SUM(Sales)
from Sales
where Department not in (select Department from Top3Sales)
Example at data.stackexchange.com.

SELECT Department, TotalSales
FROM (SELECT TOP 3 Department, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY Department
ORDER BY SUM(Sales) DESC) T
UNION ALL
SELECT 'Others', SUM(s.Sales)
FROM Sales s
WHERE s.Department NOT IN
(SELECT Department
FROM (SELECT TOP 3 Department
FROM Sales
GROUP BY Department
ORDER BY SUM(Sales) DESC) D)

Related

Selecting rows that have row_number more than 1

I have the following table (using BigQuery):
id   year  month  sales  row_number
---  ----  -----  -----  ----------
111  2020  11     1000   1
111  2020  12     2000   2
112  2020  11     3000   1
113  2020  11     1000   1
Is there a way to select only the rows whose id has more than one row, that is, an id whose row numbers go above one?
For example, my desired output is:
id   year  month  sales  row_number
---  ----  -----  -----  ----------
111  2020  11     1000   1
111  2020  12     2000   2
I don't want only the rows with row_number = 2; I also want the row_number = 1 rows for the same id.
The original code block I used for the first table result is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number,
COUNT(*) OVER (Partition by id) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
You can wrap your query as in the example below:
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it looks like this:
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
When applied to the sample data in your question, it returns the two rows for id 111.
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
This is a straightforward use of an EXISTS condition.
This is what I use. It's similar to @ElapsedSoul's answer, but from my understanding "IN" is better than "EXISTS" for a static list. I'm not sure whether the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);
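
The IN-based filter can be checked locally with a small sketch in Python's built-in sqlite3 module instead of BigQuery. Assumptions: SQLite 3.25 or later for window functions, a hypothetical table name `monthly_sales` standing in for `table`, and an ORDER BY year, month inside the window (instead of ORDER BY id) so the row numbers are deterministic. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# monthly_sales is a hypothetical stand-in for the question's `table`.
conn.execute(
    "CREATE TABLE monthly_sales (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)"
)
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?, ?)", [
    (111, 2020, 11, 1000),
    (111, 2020, 12, 2000),
    (112, 2020, 11, 3000),
    (113, 2020, 11, 1000),
])

query = """
WITH monthly AS (
    SELECT id, year, month, SUM(sales) AS sales
    FROM monthly_sales
    GROUP BY id, year, month
),
numbered AS (
    SELECT id, year, month, sales,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY year, month) AS row_num
    FROM monthly
)
SELECT id, year, month, sales, row_num
FROM numbered
-- keep every row of an id as soon as that id has a second row
WHERE id IN (SELECT id FROM numbered WHERE row_num > 1)
ORDER BY id, row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only id 111 has a second row, so both of its rows come back and ids 112 and 113 are dropped.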

Find top N most frequent categories with top N most frequent sub-categories for each category

I'm trying to make a single query that will retrieve:
The top e.g. 3 most popular brands from a list of cars. For each of the top 3 brands I want to retrieve the top 5 most popular models.
I tried both a ranking/partitioning strategy and a DISTINCT ON strategy, but I cannot figure out how to get the two limits to work within a single query.
Here is some sample data: http://sqlfiddle.com/#!15/1e81d5/1
From the ranking query I would expect an output like this, given the sample data (order not important):
brand car_model count
'Audi' 'A4' 3
'Audi' 'A1' 3
'Audi' 'Q7' 2
'Audi' 'Q5' 2
'Audi' 'A3' 2
'VW' 'Passat' 3
'VW' 'Beetle' 3
'VW' 'Caravelle' 2
'VW' 'Golf' 2
'VW' 'Fox' 2
'Volvo' 'V70' 3
'Volvo' 'V40' 3
'Volvo' 'S60' 2
'Volvo' 'XC70' 2
'Volvo' 'V50' 2
Turns out I could use LATERAL join as suggested in comments. Thanks.
SELECT brand, car_model, the_count
FROM
(
SELECT brand FROM cars GROUP BY brand ORDER BY COUNT(*) DESC LIMIT 3
) o1
INNER JOIN LATERAL
(
SELECT car_model, count(*) as the_count
FROM cars
WHERE brand = o1.brand
GROUP BY brand, car_model
ORDER BY count(*) DESC LIMIT 5
) o2 ON true;
http://sqlfiddle.com/#!15/1e81d5/9
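You can try a CTE with the window function row_number(). Note that this limits each brand to its top 5 models but does not restrict the result to the top 3 brands: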
with cte as
(
select brand,car_model,count(*) as cnt from cars group by brand,car_model
) , cte2 as
(
select * ,row_number() over(partition by brand order by cnt desc) rn from cte
)
select brand,car_model,cnt from cte2 where rn<=5
You can use window functions for this:
select brand, car_model, cnt_car
from (select c.*, dense_rank() over (order by cnt_brand desc, brand) as seqnum_b
from (select brand, car_model, count(*) as cnt_car,
row_number() over (partition by brand order by count(*) desc) as seqnum_bc,
sum(count(*)) over (partition by brand) as cnt_brand
from cars c
group by brand, car_model
) c
) c
where seqnum_bc <= 5 and seqnum_b <= 3
order by cnt_brand desc, brand, cnt_car desc;
If you know that each brand (or at least each top brand) has at least five cars, then you can simplify the query to:
select brand, car_model, cnt_car
from (select brand, car_model, count(*) as cnt_car,
row_number() over (partition by brand order by count(*) desc) as seqnum_bc,
sum(count(*)) over (partition by brand) as cnt_brand
from cars c
group by brand, car_model
) c
where seqnum_bc <= 5
order by cnt_brand desc, brand, cnt_car desc
limit 15
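
The two-level ranking (top brands, then top models within each of those brands) can be sketched with Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later for window functions, a made-up `cars` data set, and limits of 2 brands and 2 models instead of 3 and 5 to keep the data small. SQLite has no LATERAL join, so this uses window functions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (brand TEXT, car_model TEXT)")
# Hypothetical data: Audi has 7 cars, VW 6, Volvo 3, Fiat 1.
cars = (
    [("Audi", "A4")] * 4 + [("Audi", "A1")] * 2 + [("Audi", "Q7")]
    + [("VW", "Golf")] * 3 + [("VW", "Passat")] * 2 + [("VW", "Fox")]
    + [("Volvo", "V70")] * 2 + [("Volvo", "V40")]
    + [("Fiat", "500")]
)
conn.executemany("INSERT INTO cars VALUES (?, ?)", cars)

query = """
WITH model_counts AS (
    SELECT brand, car_model, COUNT(*) AS cnt_car
    FROM cars
    GROUP BY brand, car_model
),
ranked AS (
    SELECT brand, car_model, cnt_car,
           SUM(cnt_car) OVER (PARTITION BY brand) AS cnt_brand,
           ROW_NUMBER() OVER (PARTITION BY brand
                              ORDER BY cnt_car DESC, car_model) AS model_rank
    FROM model_counts
),
ranked_brands AS (
    -- every row of a brand shares cnt_brand, so DENSE_RANK numbers the brands
    SELECT *, DENSE_RANK() OVER (ORDER BY cnt_brand DESC, brand) AS brand_rank
    FROM ranked
)
SELECT brand, car_model, cnt_car
FROM ranked_brands
WHERE brand_rank <= ? AND model_rank <= ?
ORDER BY cnt_brand DESC, brand, cnt_car DESC, car_model
"""
top_brands, top_models = 2, 2
rows = conn.execute(query, (top_brands, top_models)).fetchall()
for row in rows:
    print(row)
```

The brand rank has to sort by the brand total in descending order; sorting ascending would return the least popular brands instead.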

T-SQL: Select partitions which have more than 1 row

I've managed to use this query
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer
ORDER BY
PartGrp, VendorPn, totalSales DESC
To get a result set like this. A list of sales records grouped by a group, a product ID (VendorPn), a customer, the customer's sales, and a sequence number which is partitioned by the group and the productID.
PartGrp VendorPn Customer totalSales seqnum
------------------------------------------------------------
AGS-AS 002A0002-252 10021013 19307.00 1
AGS-AS 002A0006-86 10021013 33092.00 1
AGS-AS 010-63078-8 10020987 10866.00 1
AGS-SQ B71040-39 10020997 7174.00 1
AGS-SQ B71040-39 10020998 2.00 2
AIRFRAME 0130-25 10017232 1971.00 1
AIRFRAME 0130-25 10000122 1243.00 2
AIRFRAME 0130-25 10008637 753.00 3
HARDWARE MS28775-261 10005623 214.00 1
M250 23066682 10013266 175.00 1
How can I filter the result set to only return rows which have more than 1 seqnum? I would like the result set to look like this
PartGrp VendorPn Customer totalSales seqnum
------------------------------------------------------------
AGS-SQ B71040-39 10020997 7174.00 1
AGS-SQ B71040-39 10020998 2.00 2
AIRFRAME 0130-25 10017232 1971.00 1
AIRFRAME 0130-25 10000122 1243.00 2
AIRFRAME 0130-25 10008637 753.00 3
Out of the first result set example, only rows with VendorPn "B71040-39" and "0130-25" had multiple customers purchase the product. All products which had only 1 customer were removed. Note that my desired result set isn't simply seqnum > 1, because I still need the first seqnum per partition.
I would change your query to compute a count per partition and filter on it in an outer query, since a window function cannot be referenced in HAVING:
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (SELECT PartGrp,
VendorPn,
customer,
sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp,vendorpn ORDER BY SUM(sales) DESC) as seqnum,
COUNT(*) OVER (PARTITION BY partgrp,vendorpn) as cnt
FROM BG_Invoice
GROUP BY PartGrp,VendorPn, customer) t
WHERE cnt > 1
ORDER BY PartGrp,VendorPn, totalSales desc
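You can try something like the following. Note that T-SQL does not allow a window-function alias such as seqnum in HAVING, so the filter has to move to an outer query, and seqnum <> 1 on its own would also drop the first row of each partition: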
SELECT PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp,vendorpn ORDER BY SUM(sales) DESC) as seqnum
FROM BG_Invoice
GROUP BY PartGrp,VendorPn, customer
HAVING seqnum <> '1'
ORDER BY PartGrp,VendorPn, totalSales desc
WITH CTE AS (
SELECT
PartGrp,VendorPn, customer, sum(sales) as totalSales,
ROW_NUMBER() OVER (PARTITION BY partgrp, vendorpn ORDER BY SUM(sales) DESC) AS seqnum
FROM
BG_Invoice
GROUP BY
PartGrp, VendorPn, customer)
SELECT DISTINCT
a.*
FROM
CTE a
JOIN
CTE b
ON a.PartGrp = b.PartGrp
AND a.VendorPn = b.VendorPn
WHERE
b.seqnum > 1
ORDER BY
a.PartGrp,
a.VendorPn,
a.totalSales DESC;
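
The count-per-partition approach can be tried against the question's own numbers with Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later for window functions, and each row of the first result set loaded as a single invoice row in `BG_Invoice`. The derived-table aliases `per_customer` and `numbered` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BG_Invoice (PartGrp TEXT, VendorPn TEXT, customer INTEGER, sales REAL)"
)
# One invoice row per line of the question's first result set (an assumption).
conn.executemany("INSERT INTO BG_Invoice VALUES (?, ?, ?, ?)", [
    ("AGS-AS", "002A0002-252", 10021013, 19307.0),
    ("AGS-AS", "002A0006-86", 10021013, 33092.0),
    ("AGS-AS", "010-63078-8", 10020987, 10866.0),
    ("AGS-SQ", "B71040-39", 10020997, 7174.0),
    ("AGS-SQ", "B71040-39", 10020998, 2.0),
    ("AIRFRAME", "0130-25", 10017232, 1971.0),
    ("AIRFRAME", "0130-25", 10000122, 1243.0),
    ("AIRFRAME", "0130-25", 10008637, 753.0),
    ("HARDWARE", "MS28775-261", 10005623, 214.0),
    ("M250", "23066682", 10013266, 175.0),
])

query = """
SELECT PartGrp, VendorPn, customer, totalSales, seqnum
FROM (
    SELECT PartGrp, VendorPn, customer, totalSales,
           ROW_NUMBER() OVER (PARTITION BY PartGrp, VendorPn
                              ORDER BY totalSales DESC) AS seqnum,
           COUNT(*) OVER (PARTITION BY PartGrp, VendorPn) AS cnt
    FROM (SELECT PartGrp, VendorPn, customer, SUM(sales) AS totalSales
          FROM BG_Invoice
          GROUP BY PartGrp, VendorPn, customer) AS per_customer
) AS numbered
WHERE cnt > 1      -- the window result is only visible in the outer query
ORDER BY PartGrp, VendorPn, totalSales DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because cnt is the same for every row of a partition, filtering on it keeps or drops whole partitions, so seqnum 1 survives wherever a seqnum 2 exists.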

Taking the Largest SUM from a table

I'm trying to get the Employee with the highest sales
Employee DeptNo Date Sales
Chris 2 2012/1/1 1000
Joe 1 2012/1/1 900
Arthur 3 2012/1/1 1100
Chris 2 2012/3/1 1200
Joe 1 2012/2/1 1500
Arthur 3 2010/2/1 1200
Joe 1 2010/3/1 900
Arthur 3 2010/3/1 1100
Arthur 3 2010/4/1 1200
Joe 1 2012/4/1 1500
Chris 2 2010/4/1 1800
I've tried using two subqueries, and then comparing them together to find the higher value
SELECT c1.Employee,
c1.TOTAL_SALES
FROM (SELECT Employee,
Sum(sales) AS TOTAL_SALES
FROM EmployeeSales
GROUP BY Employee) c1,
(SELECT Employee,
Sum(sales) AS TOTAL_SALES
FROM EmployeeSales
GROUP BY Employee) c2
WHERE ( c1.TOTAL_SALES > c2.TOTAL_SALES
AND c1.Employee > c2.Employee )
But the resulting query gives me two rows of
Employee TOTAL_SALES
joe 4800
joe 4800
What am I doing wrong?
I would use a CTE.
;With [CTE] as (
Select
[Employee]
,sum([Sales]) as [Total_Sales]
,Row_Number()
Over(order by sum([sales]) Desc) as [RN]
From [EmployeeSales]
Group by [Employee]
)
Select
[Employee]
,[Total_Sales]
From [CTE]
Where [RN] = 1
Example of working code SQL Fiddle:
http://sqlfiddle.com/#!3/bd772/2
To return all employees with the highest total sales, you can use SQL Server's proprietary TOP WITH TIES:
SELECT TOP (1) WITH TIES name, SUM(sales) as total_sales
FROM employees
GROUP BY name
ORDER BY SUM(sales) DESC
SELECT name, SUM(sales) as total_sales
FROM employees
GROUP BY name
ORDER by total_sales DESC
LIMIT 1;
A better solution is to group by an employee ID so we can be sure the rows belong to the same person, since there can be two employees named Chris.
I would use a window partition
select * from
(
select
employee
, sum(sales) as sales
, row_number() over
(
order by sum(sales) desc
) as rank
from EmployeeSales
group by employee
) tmp
where tmp.rank = 1
And I agree with what someone said (Shawn) about having an employeeID and group by that for this, rather than the name.
(I removed the partition from the row_number() call as it is not needed for this)
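You can use a CTE for that. The rows must be grouped by employee, and the row number must not be partitioned by employee, otherwise every employee gets RN = 1:
WITH CTE
AS ( select employee , sum(sales) as sales,
ROW_NUMBER() OVER (ORDER BY sum(sales) desc) RN
FROM EmployeeSales
GROUP BY employee)
SELECT employee ,
sales
FROM CTE
WHERE RN = 1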
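
To see the numbers behind these answers, here is a minimal sketch with Python's built-in sqlite3 module, loaded with the sample rows from the question. It assumes SQLite 3.25 or later for window functions and uses RANK() so that tied employees would all be returned, similar in spirit to TOP (1) WITH TIES. The CTE names `totals` and `ranked` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeSales (Employee TEXT, DeptNo INTEGER, Date TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO EmployeeSales VALUES (?, ?, ?, ?)", [
    ("Chris", 2, "2012/1/1", 1000), ("Joe", 1, "2012/1/1", 900),
    ("Arthur", 3, "2012/1/1", 1100), ("Chris", 2, "2012/3/1", 1200),
    ("Joe", 1, "2012/2/1", 1500), ("Arthur", 3, "2010/2/1", 1200),
    ("Joe", 1, "2010/3/1", 900), ("Arthur", 3, "2010/3/1", 1100),
    ("Arthur", 3, "2010/4/1", 1200), ("Joe", 1, "2012/4/1", 1500),
    ("Chris", 2, "2010/4/1", 1800),
])

query = """
WITH totals AS (
    SELECT Employee, SUM(Sales) AS total_sales
    FROM EmployeeSales
    GROUP BY Employee
),
ranked AS (
    -- no PARTITION BY: all employees compete in a single ranking
    SELECT Employee, total_sales,
           RANK() OVER (ORDER BY total_sales DESC) AS rnk
    FROM totals
)
SELECT Employee, total_sales
FROM ranked
WHERE rnk = 1
"""
all_totals = dict(conn.execute(
    "SELECT Employee, SUM(Sales) FROM EmployeeSales GROUP BY Employee"
).fetchall())
top = conn.execute(query).fetchall()
print(all_totals)
print(top)
```

Joe totals 4800 against 4600 for Arthur and 4000 for Chris, which is why the self-join in the question returned Joe once per employee he beats.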

Conditional Max in SQL

I have the following query in SQL Server:
SELECT EmployeeID,
TotalQuantity AS TotalQty,
TotalSales,
MAX(CASE WHEN MonthNumber = MAX(MonthNumber)
THEN TotalSales END) as RecentMonthSale
FROM vwSales
GROUP BY EmployeeID, TotalQuantity , TotalSales
But it gives me the error:
Cannot perform an aggregate function on an expression
containing an aggregate or a subquery.
Input View is as follows:
EmployeeID TotalSales MonthNumber
1 4000 1
1 6000 2
2 8500 1
2 6081 2
Desired output:
EmployeeID TotalSale RecentMonthSale
1 10000 6000
2 14581 6081
3 11458 1012
I want the following columns in my output: EmployeeID, TotalQuantity, TotalSale, RecentMonthSale. My view has the following columns: EmployeeID, TotalSale, TotalQuantity, MonthNumber.
This query will show the output that you need, and will scan the table only one time.
select EmployeeID, sum(TotalSales), sum(case when MaxMonth = 1 then TotalSales else 0 end) RecentMonthSales
from
(
select *, rank() over(partition by EmployeeID order by MonthNumber desc) MaxMonth
from
(
select EmployeeID, MonthNumber, sum(TotalSales) TotalSales
from vwSales
group by EmployeeID, MonthNumber
) tt
) tt
group by EmployeeID
SELECT
vw.EmployeeID,
SUM(vw.TotalSale) as Total,
Recent.RecentMonthSale
FROM
vwSales vw
LEFT JOIN
(
SELECT
_vw.EmployeeID,
_vw.TotalSale as RecentMonthSale
FROM
vwSales _vw
INNER JOIN
(
SELECT EmployeeID, MAX(MonthNumber) as MaxMonth
FROM vwSales
GROUP BY EmployeeID
) _a
on _vw.EmployeeID = _a.EmployeeID
and _vw.MonthNumber = _a.MaxMonth
) Recent
on Recent.EmployeeID = vw.EmployeeID
GROUP BY
vw.EmployeeID,
Recent.RecentMonthSale
If you execute each of the subqueries on its own and look at the results, you should get a good idea of how this works.
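
The "total plus most recent month" calculation can be sketched with Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later for window functions, `vwSales` created as a plain table rather than a view, and two hypothetical rows for employee 3 (whose latest month is 3, not 2) added so that the per-employee ranking is visible. The rows for employees 1 and 2 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# vwSales is a view in the question; a plain table stands in for it here.
conn.execute(
    "CREATE TABLE vwSales (EmployeeID INTEGER, TotalSales INTEGER, MonthNumber INTEGER)"
)
conn.executemany("INSERT INTO vwSales VALUES (?, ?, ?)", [
    (1, 4000, 1), (1, 6000, 2),
    (2, 8500, 1), (2, 6081, 2),
    (3, 10446, 1), (3, 1012, 3),   # hypothetical rows, not from the question
])

query = """
SELECT EmployeeID,
       SUM(TotalSales) AS TotalSale,
       SUM(CASE WHEN month_rank = 1 THEN TotalSales ELSE 0 END) AS RecentMonthSale
FROM (
    SELECT EmployeeID, TotalSales,
           -- rank months per employee so each one gets their own latest month
           RANK() OVER (PARTITION BY EmployeeID
                        ORDER BY MonthNumber DESC) AS month_rank
    FROM vwSales
) AS ranked
GROUP BY EmployeeID
ORDER BY EmployeeID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ranking inside a subquery avoids nesting MAX inside MAX, which is what triggers the "aggregate function on an expression containing an aggregate" error.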