I have the following tables in SQL Server:
COMMANDLINES: ID_LINE - ID_COMMAND - ID_ARTICLE - QUANTITY
COMMAND: ID_COMMAND - ID_CLIENT - PRICE - PRINTED
CLIENT: ID_CLIENT - FULL_NAME - SSN - PH_NUM - MOBILE - USERNAME - PASSWORD
ARTICLE: ID_ARTICLE - DES - NAME - PRICE - TYPE - CURRENT_QTT - MINIMUM_QTT
ID_COMMAND from COMMANDLINES references COMMAND.ID_COMMAND
ID_CLIENT from COMMAND references CLIENT.ID_CLIENT
ID_ARTICLE from COMMANDLINES references ARTICLE.ID_ARTICLE
I need to create a view that shows all COMMANDLINES belonging to the best client (the one with the highest total of PRICE), ordered by ID_COMMAND in descending order and then by ID_LINE in ascending order.
Sample data:
COMMANDLINE table:
COMMAND table:
Only these 2 are needed to resolve the problem. I added the other just for more information.
Sample output:
To be honest, I'm not sure if both outputs are supposed to be "output" at the same time or that I need 2 VIEWS for each output.
WHAT HAVE I DONE SO FAR:
I looked through what I could find on StackOverflow about MAX of SUM, but unfortunately, it has not helped me much in this case. I always seem to be doing something wrong.
I also found out that in order to use ORDER BY in VIEWS you need to, in this case, use TOP, but I've no idea how to apply it correctly when I need to select all of the COMMANDLINES. In one of my previous things, I used the following SELECT TOP:
create view PRODUCTS_BY_TYPE
as
select top (select count(*) from ARTICLE
where CURRENT_QTT > MINIMUM_QTT)*
from
ARTICLE
order by
TYPE
This allowed me to show all PRODUCT data where the CURRENT_QTT was more than the minimum ordering them by type, but I can't figure out for the life of me, how to apply this to my current situation.
I could start with something like this:
create view THE_BEST
as
select COMMANDLINE.*
from COMMANDLINE
But then I don't know how to apply the TOP.
I figured that first, I need to see who the best client is, by SUM-ing all of the PRICE under his ID and then doing a MAX on all of the SUM of all clients.
So far, the best I could come up with is this:
create view THE_BEST
as
select top (select count(*)
from (select max(max_price)
from (select sum(PRICE) as max_price
from COMMAND) COMMAND) COMMAND) COMMANDLINE.*
from COMMANDLINE
inner join COMMAND on COMMANDLINE.ID_COMMAND = COMMAND.ID_COMMAND
order by COMMAND.ID_COMMAND desc, COMMANDLINE.ID_LINE asc
Unfortunately, in the select count(*) the COMMAND is underlined in red (a.k.a. the 3rd COMMAND word) and it says that there is "no column specified for column 1 of COMMAND".
EDIT:
I've come up with something closer to what I want:
create view THE_BEST
as
select top (select count(*)
from (select max(total_price) as MaxPrice
from (select sum(PRICE) as total_price
from COMMAND) COMMAND) COMMAND)*
from COMMANDLINE
order by ID_LINE asc
It is still missing the ordering by ID_COMMAND, and I only get 1 row in the output when there should be 2.
Here is some code that will hopefully show you how to use the TOP clause, and also a different approach to showing only the "top" :-)
/* Creating Tables*/
CREATE TABLE ARTICLE (ID_ARTICLE int,DES varchar(10),NAME varchar(10),PRICE float,TYPE int,CURRENT_QTT int,MINIMUM_QTT int)
CREATE TABLE COMMANDLINES (ID_LINE int,ID_COMMAND int,ID_ARTICLE int,QUANTITY int)
CREATE TABLE COMMAND (ID_COMMAND int, ID_CLIENT varchar(20), PRICE float, PRINTED int)
CREATE TABLE CLIENT (ID_CLIENT varchar(20), FULL_NAME varchar(50), SSN varchar(50), PH_NUM varchar(50), MOBILE varchar(50), USERNAME varchar(50), PASSWORD varchar(50))
INSERT INTO COMMANDLINES VALUES (1,1,10,20),(2,1,12,3),(3,1,2,21),(1,2,30,2),(2,2,21,5),(1,3,32,20),(2,3,21,2)
INSERT INTO COMMAND VALUES (1,'1695152D',1200,0),(2,'1695152D',500,0),(3,'2658492D',200,0)
INSERT INTO ARTICLE VALUES(1, 'A','AA',1300,0,10,5),(2,'B','BB',450,0,10,5),(30,'C','CC',1000,0,5,5),(21,'D','DD',1500,0,5,5),(32,'E','EE',1600,1,4,5),(3,'F','FF',210,2,15,5)
INSERT INTO CLIENT VALUES ('1695152D', 'DoombringerBG', 'A','123','321','asdf','asf'),('2658492D', 'tgr', 'A','123','321','asdf','asf')
GO
/* Your View-Problem*/
CREATE VIEW PRODUCTS_BY_TYPE AS
SELECT TOP 100 PERCENT *
FROM ARTICLE
WHERE CURRENT_QTT > MINIMUM_QTT -- You really don't want >= ??
ORDER BY [Type]
-- Why do you need your view with an ordered output? Can't your query order the data?
GO
OUTPUT:
ID_ARTICLE | DES | NAME | PRICE | TYPE | CURRENT_QTT | MINIMUM_QTT
-------------+-------+-------+-------+------+--------------+-------------
1 | A | AA | 1300 | 0 | 10 | 5
2 | B | BB | 450 | 0 | 10 | 5
3 | F | FF | 210 | 2 | 15 | 5
I hope this is what you were looking for :-)
-- your top customers
-- (COMMANDLINES is not joined here: COMMAND.PRICE is per order, so joining the lines would count it once per line)
SELECT cli.FULL_NAME, SUM(c.PRICE)
FROM COMMAND as c
INNER JOIN CLIENT as cli
on cli.ID_CLIENT = c.ID_CLIENT
GROUP BY cli.FULL_NAME
ORDER BY SUM(c.PRICE) DESC -- highest value first
SELECT *
FROM (
-- your top customers with a rank
SELECT cli.FULL_NAME, SUM(c.PRICE) as Price, ROW_NUMBER() OVER (ORDER BY SUM(c.PRICE) DESC) AS RowN
FROM COMMAND as c
INNER JOIN CLIENT as cli
on cli.ID_CLIENT = c.ID_CLIENT
GROUP BY cli.FULL_NAME
) as a
-- only the best :-)
where RowN = 1
--considerations: what if two customers have the same value?
Output:
FULL_NAME |Price | RowN
----------------+---------+-------
DoombringerBG | 1700 | 1
Regards
tgr
===== EDITED =====
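The syntax correction for your THE_BEST view (the derived-table columns needed aliases; note that the inner COUNT(*) still evaluates to 1, so this returns a single row):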
create view THE_BEST AS
SELECT TOP (
SELECT count(*) as cnt
FROM (
SELECT max(max_price) as max_price
FROM (
SELECT sum(PRICE) AS max_price
FROM COMMAND
) COMMAND
) COMMAND
)
cl.*
FROM COMMANDLINES as cl
INNER JOIN COMMAND as c
ON cl.ID_COMMAND = c.ID_COMMAND
ORDER BY c.ID_COMMAND DESC
,cl.ID_LINE ASC
Without the OVER-Clause:
SELECT TOP 1 *
FROM (
-- your top customers (COMMANDLINES is left out so each order's PRICE is counted once)
SELECT cli.FULL_NAME, SUM(c.PRICE) as Price
FROM COMMAND as c
INNER JOIN CLIENT as cli
on cli.ID_CLIENT = c.ID_CLIENT
GROUP BY cli.FULL_NAME
) as a
-- only the best :-)
ORDER BY Price DESC
Your PRODUCTS_BY_TYPE without PERCENT:
CREATE VIEW PRODUCTS_BY_TYPE AS
SELECT TOP (select
SUM(p.rows)
from sys.partitions as p
inner join sys.all_objects as ao
on p.object_id = ao.object_id
where ao.name = 'ARTICLE'
and ao.type = 'U')
*
FROM ARTICLE
WHERE CURRENT_QTT > MINIMUM_QTT -- You really don't want >= ??
ORDER BY [Type]
go
But to be honest, I would never use such a query in production. I only posted it because you need it for study purposes.
It is quite likely that there is some misunderstanding between you and your teacher. You can technically have ORDER BY clause in a view definition, but it never guarantees any order of the rows in the query that uses the view, such as SELECT ... FROM your_view. Without ORDER BY in the final SELECT the order of the result set is not defined. The order of rows returned to the client by the server is determined only by the final outermost ORDER BY of the query, not by the ORDER BY in the view definition.
The purpose of having TOP in the view definition is to limit the number of returned rows somehow. For example, TOP (1). In this case ORDER BY specifies which row(s) to return.
Having TOP 100 PERCENT in a view does nothing. It doesn't reduce the number of returned rows and it doesn't guarantee any specific order of returned rows.
Having said all that, in your case you need to find one best client, so it makes sense to use TOP (1) in a sub-query.
This query would return the ID of the best client:
SELECT
TOP (1)
-- WITH TIES
ID_CLIENT
FROM COMMAND
GROUP BY ID_CLIENT
ORDER BY SUM(PRICE) DESC
If there can be several clients with the same maximum total price and you want to return data related to all of them, not just one random client, then use TOP WITH TIES.
Finally, you need to return lines that correspond to the chosen client(s):
create view THE_BEST
as
SELECT
COMMANDLINE.ID_LINE
,COMMANDLINE.ID_COMMAND
,COMMANDLINE.ID_ARTICLE
,COMMANDLINE.QUANTITY
FROM
COMMANDLINE
INNER JOIN COMMAND ON COMMAND.ID_COMMAND = COMMANDLINE.ID_COMMAND
WHERE
COMMAND.ID_CLIENT IN
(
SELECT
TOP (1)
-- WITH TIES
ID_CLIENT
FROM COMMAND
GROUP BY ID_CLIENT
ORDER BY SUM(PRICE) DESC
)
;
This is how the view can be used:
SELECT
ID_LINE
,ID_COMMAND
,ID_ARTICLE
,QUANTITY
FROM THE_BEST
ORDER BY ID_COMMAND DESC, ID_LINE ASC;
Note that ORDER BY ID_COMMAND DESC, ID_LINE ASC has to be in the actual query, not in the view definition.
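To try the idea out without a SQL Server instance, here is a minimal sketch in Python using the standard sqlite3 module. The assumptions are mine: SQLite stands in for SQL Server, so LIMIT 1 replaces TOP (1), and the rows are the sample rows posted in the other answer. The view only filters to the best client, and the ordering is done by the query that reads the view.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, so LIMIT 1 replaces TOP (1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMMAND (ID_COMMAND INTEGER, ID_CLIENT TEXT, PRICE REAL, PRINTED INTEGER);
CREATE TABLE COMMANDLINE (ID_LINE INTEGER, ID_COMMAND INTEGER, ID_ARTICLE INTEGER, QUANTITY INTEGER);
INSERT INTO COMMAND VALUES (1,'1695152D',1200,0),(2,'1695152D',500,0),(3,'2658492D',200,0);
INSERT INTO COMMANDLINE VALUES (1,1,10,20),(2,1,12,3),(3,1,2,21),(1,2,30,2),(2,2,21,5),(1,3,32,20),(2,3,21,2);

-- The view only filters to the best client; it carries no ORDER BY.
CREATE VIEW THE_BEST AS
SELECT cl.ID_LINE, cl.ID_COMMAND, cl.ID_ARTICLE, cl.QUANTITY
FROM COMMANDLINE AS cl
INNER JOIN COMMAND AS c ON c.ID_COMMAND = cl.ID_COMMAND
WHERE c.ID_CLIENT IN (
    SELECT ID_CLIENT
    FROM COMMAND
    GROUP BY ID_CLIENT
    ORDER BY SUM(PRICE) DESC
    LIMIT 1
);
""")

# The ordering belongs to the query that reads the view.
rows = conn.execute(
    "SELECT ID_LINE, ID_COMMAND, ID_ARTICLE, QUANTITY FROM THE_BEST "
    "ORDER BY ID_COMMAND DESC, ID_LINE ASC"
).fetchall()
for row in rows:
    print(row)
```

With that sample data the best client is '1695152D' (1200 + 500), so only the lines of commands 2 and 1 come back, in that order.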
I need to find the last P.O. for parts purchased from Vendors.
I was trying to come up with a way to do this using a query I found that allowed me to find
the max Creation date for a group of Quotes linked to an Opportunity:
SELECT
t1.[quoteid]
,t1.[OpportunityId]
,t1.[Name]
FROM
[Quote] t1
WHERE
t1.[CreatedOn] = (SELECT MAX(t2.[CreatedOn])
FROM [Quote] t2
WHERE t2.[OpportunityId] = t1.[OpportunityId])
In the case of Purchase Orders, though, I have a header table and a line item table.
So, I need to include info from both:
SELECT
PURCHASE_ORDER.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURCHASE_ORDER,
PURC_ORDER_LINE
WHERE
PURCHASE_ORDER.ID=
PURC_ORDER_LINE.PURC_ORDER_ID
If the ORDER_DATE from the header were available in the PURC_ORDER_LINE table I thought
this could be done like so:
SELECT
PURC_ORDER_LINE.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURC_ORDER_LINE T1
WHERE T1.ORDER_DATE=(SELECT MAX(T2.ORDER_DATE)
FROM PURC_ORDER_LINE T2
WHERE T2.PURC_ORDER_ID=T1.PURC_ORDER_ID)
But I'm not sure that's correct and, in any case, there are 2 things:
The ORDER_DATE is in the Header table, not in the line table
I need the last P.O. created for each of the Parts (PART_ID)
So:
PART_A and PART_B, as an example, may appear on several P.O.s
Part   | Order Date | P.O. #
-------+------------+---------
PART_A | 2020-08-17 | PO12345
PART_A | 2020-11-21 | PO23456
PART_A | 2021-07-08 | PO29986
PART_B | 2019-11-30 | PO00861
PART_B | 2021-08-30 | PO30001
The result set would be (including the other fields from above):
ORDER_DATE | PURC_ORDER_ID | PART_ID | UNIT_PRICE | ORDER_QTY
-----------+---------------+---------+------------+----------
2021-07-08 | PO29986       | PART_A  | 321.00     | 12
2021-08-30 | PO30001       | PART_B  | 426.30     | 8
I need a query that will give me such a result set.
You can use row-numbering for this. Just place the whole join inside a subquery (derived table), add a row-number, then filter on the outside.
SELECT *
FROM (
SELECT
pol.PART_ID,
po.ORDER_DATE,
pol.PURC_ORDER_ID,
pol.UNIT_PRICE,
pol.USER_ORDER_QTY,
rn = ROW_NUMBER() OVER (PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC)
FROM PURCHASE_ORDER po
JOIN PURC_ORDER_LINE pol ON po.ID = pol.PURC_ORDER_ID
) po
WHERE po.rn = 1;
Note the use of proper join syntax, as well as table aliases.
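For anyone who wants to check the row-numbering approach against the sample from the question, here is a small sketch using Python's sqlite3 module. The assumptions are mine: SQLite (3.25 or later, for window functions) stands in for SQL Server, ORDER_DATE is stored as ISO text, and the prices and quantities on the older P.O.s are made up because the question only gives them for the latest ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PURCHASE_ORDER (ID TEXT PRIMARY KEY, ORDER_DATE TEXT);
CREATE TABLE PURC_ORDER_LINE (PURC_ORDER_ID TEXT, PART_ID TEXT, UNIT_PRICE REAL, USER_ORDER_QTY INTEGER);
INSERT INTO PURCHASE_ORDER VALUES
  ('PO12345','2020-08-17'),('PO23456','2020-11-21'),('PO29986','2021-07-08'),
  ('PO00861','2019-11-30'),('PO30001','2021-08-30');
-- Prices and quantities on the older P.O.s are made up for the demo.
INSERT INTO PURC_ORDER_LINE VALUES
  ('PO12345','PART_A',300.00,5),('PO23456','PART_A',310.00,7),('PO29986','PART_A',321.00,12),
  ('PO00861','PART_B',400.00,3),('PO30001','PART_B',426.30,8);
""")

# Number the lines of each part from newest to oldest order, then keep number 1.
latest = conn.execute("""
SELECT ORDER_DATE, PURC_ORDER_ID, PART_ID, UNIT_PRICE, USER_ORDER_QTY
FROM (
    SELECT po.ORDER_DATE, pol.PURC_ORDER_ID, pol.PART_ID, pol.UNIT_PRICE, pol.USER_ORDER_QTY,
           ROW_NUMBER() OVER (PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC) AS rn
    FROM PURCHASE_ORDER AS po
    JOIN PURC_ORDER_LINE AS pol ON po.ID = pol.PURC_ORDER_ID
) AS ranked
WHERE rn = 1
ORDER BY PART_ID
""").fetchall()
for row in latest:
    print(row)
```

Selecting the columns by name in the outer query keeps the helper column rn out of the result.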
You can use a window function:
select * from (
select * , row_number() over (partition by PART_ID order by ORDER_DATE desc) rn
from tablename
) t where t.rn = 1
I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
[Image: the two remittance rows for the same claim, with identical timestamps]
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without more information on the structure of your database (especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them), it's not really possible to advise you on how to introduce a remittance UUID into Remit_To_Activate.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
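One detail worth spelling out: when two remittances share the same timestamp, ROW_NUMBER() still assigns 1 to only one of them, but which one is arbitrary unless the ORDER BY includes a tie-breaker. Here is a minimal sketch of that in Python's sqlite3 module. The assumptions are mine: SQLite 3.25+ stands in for SQL Server, the rows and UUID strings are hypothetical, and remittance_uuid is used as the tie-breaker only because it is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims_group2 (remittance_uuid TEXT, claim_id INTEGER, billed_amount REAL, create_datetime TEXT);
-- Hypothetical rows: claim 1 has two remittances with the same latest timestamp.
INSERT INTO claims_group2 VALUES
  ('uuid-a', 1, 50.0, '2018-08-15 13:07:50.933'),
  ('uuid-b', 1, 50.0, '2018-08-15 13:07:50.933'),
  ('uuid-c', 1, 50.0, '2018-01-02 09:00:00.000'),
  ('uuid-d', 2, 75.0, '2017-12-31 10:00:00.000');
""")

picked = conn.execute("""
SELECT claim_id, remittance_uuid, create_datetime
FROM (
    SELECT claim_id, remittance_uuid, create_datetime,
           ROW_NUMBER() OVER (
               PARTITION BY claim_id
               ORDER BY create_datetime DESC, remittance_uuid  -- tie-breaker
           ) AS rn
    FROM claims_group2
    WHERE billed_amount > 0
) AS t
WHERE rn = 1
ORDER BY claim_id
""").fetchall()
for row in picked:
    print(row)
```

Each claim yields exactly one row, and for claim 1 the choice between the two same-timestamp rows is now repeatable.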
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE
Given the following database structure (it's just a demo structure):
order:
| id | orderedat |
order_item:
| id | id_order | id_article | price | currency |
What I want to get is the latest order price of each article:
| id_article | orderedat | price | currency |
I wrote this statement to achieve the goal:
WITH "last_ordered" AS (
SELECT "oi"."id_article", "oi"."id" as "id_oi", MAX("o"."orderedat")
FROM "order" "o", "order_item" "oi"
WHERE ("oi"."id_order" = "o"."id")
GROUP BY "oi"."id", "oi"."id_article"
)
SELECT "oi"."id_article", "o"."orderedat", "oi"."price", "oi"."currency"
FROM "order_item" "oi"
INNER JOIN "last_ordered" ON ("oi"."id" = "last_ordered"."id_oi")
INNER JOIN "order" "o" ON ("oi"."id_order" = "o"."id")
However, the group by is wrong, as it returns one line for each order_item the article is linked to. On the other hand column "oi.id" must appear in the GROUP BY clause. I don't know how to solve this issue. Can someone help me please?
Maybe there is another solution for this issue.
Thanks in advance :)
select distinct on (id_article)
id_article, orderedat, price, currency
from
"order" o
inner join
order_item oi on o.id = oi.id_order
order by id_article, orderedat desc
Check distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
WITH "last_ordered" AS (
SELECT "o".*,"oi".*,
ROW_NUMBER() OVER (PARTITION BY "oi"."id_article" ORDER BY "o"."orderedat" DESC) as Rnum
FROM "order" "o"
INNER JOIN "order_item" "oi"
ON "oi"."id_order" = "o"."id"
)
SELECT * FROM last_ordered
WHERE Rnum =1
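A third option stays closer to the original attempt: group by id_article only (so the CTE has one row per article) and join back on both the article and the date instead of on the order_item id. Below is a minimal sketch in Python's sqlite3 module. The assumptions are mine: SQLite stands in for PostgreSQL, the rows are made up, and orderedat is stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "order" (id INTEGER PRIMARY KEY, orderedat TEXT);
CREATE TABLE order_item (id INTEGER PRIMARY KEY, id_order INTEGER, id_article INTEGER, price REAL, currency TEXT);
-- Made-up demo rows.
INSERT INTO "order" VALUES (1,'2020-01-01'),(2,'2020-06-01'),(3,'2021-03-01');
INSERT INTO order_item VALUES
  (1, 1, 10, 5.0, 'EUR'), (2, 2, 10, 6.0, 'EUR'),
  (3, 2, 20, 9.0, 'EUR'), (4, 3, 20, 9.5, 'USD');
""")

rows = conn.execute("""
WITH last_ordered AS (
    -- Group by article only, so there is one row per article.
    SELECT oi.id_article, MAX(o.orderedat) AS last_at
    FROM "order" AS o
    JOIN order_item AS oi ON oi.id_order = o.id
    GROUP BY oi.id_article
)
SELECT oi.id_article, o.orderedat, oi.price, oi.currency
FROM order_item AS oi
JOIN "order" AS o ON oi.id_order = o.id
JOIN last_ordered AS lo
  ON lo.id_article = oi.id_article AND lo.last_at = o.orderedat
ORDER BY oi.id_article
""").fetchall()
for row in rows:
    print(row)
```

Unlike the ROW_NUMBER and DISTINCT ON versions, this returns more than one row for an article if the same article appears twice with the same latest orderedat.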
Referring to the diagram below the records table has unique Records. Each record is updated, via comments through an Update Table. When I join the two I get lots of duplicates.
How to remove duplicates? Group By does not work for me as I have more than 10 fields in select query and some of them are functions.
Write a sub query which pulls the last updates in the Update table for each record that is updated in a particular month. Joining with this sub query will solve my problem.
Thanks!
Edit
Table structure that is of interest is
create table Records(
recordID int,
90more_fields various
)
create table Updates(
update_id int,
record_id int,
comment text,
byUser varchar(25),
datecreate datetime
)
Here's one way.
SELECT * /*But list columns explicitly*/
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
FROM Blue b
WHERE b.datecreate >= '20110901'
AND b.datecreate < '20111001'
AND o.RecordID = b.Record_ID2
ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)
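The same "last update per record and month" idea can also be sketched with ROW_NUMBER() by adding the month to the partition. This is a minimal example in Python's sqlite3 module. The assumptions are mine: SQLite 3.25+ is used, so strftime('%Y-%m', ...) takes the place of date_part, the column names follow the Updates table from the question, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Updates (update_id INTEGER, record_id INTEGER, comment TEXT, byUser TEXT, datecreate TEXT);
-- Made-up demo rows.
INSERT INTO Updates VALUES
  (1, 1, 'first',  'ann', '2011-09-03 10:00:00'),
  (2, 1, 'second', 'bob', '2011-09-20 16:30:00'),
  (3, 1, 'third',  'ann', '2011-10-05 08:15:00'),
  (4, 2, 'only',   'cat', '2011-09-11 12:00:00');
""")

# One partition per (record, calendar month); the newest update in each gets rn = 1.
last_per_month = conn.execute("""
SELECT record_id, month, update_id, comment
FROM (
    SELECT record_id, update_id, comment,
           strftime('%Y-%m', datecreate) AS month,
           ROW_NUMBER() OVER (
               PARTITION BY record_id, strftime('%Y-%m', datecreate)
               ORDER BY datecreate DESC
           ) AS rn
    FROM Updates
) AS ranked
WHERE rn = 1
ORDER BY record_id, month
""").fetchall()
for row in last_per_month:
    print(row)
```

Joining this result to Records on record_id then gives at most one update row per record and month.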
Imagine I have a table showing the sales of Acme Widgets, and where they were sold. It's fairly easy to produce a report grouping sales by country. It's fairly easy to find the top 10. But what I'd like is to show the top 10, and then have a final row saying Other. E.g.,
Ctry | Sales
=============
GB | 100
US | 80
ES | 60
...
IT | 10
Other | 50
I've been searching for ages but can't seem to find any help which takes me beyond the standard top 10.
TIA
I tried some of the other solutions here, however they seem to be either slightly off, or the ordering wasn't quite right.
My attempt at a Microsoft SQL Server solution appears to work correctly:
SELECT Ctry, Sales FROM
(
SELECT TOP 2
Ctry,
SUM(Sales) AS Sales
FROM
Table1
GROUP BY
Ctry
ORDER BY
Sales DESC
) AS Q1
UNION ALL
SELECT
'Other' AS Ctry,
SUM(Sales) AS Sales
FROM
Table1
WHERE
Ctry NOT IN (SELECT TOP 2
Ctry
FROM
Table1
GROUP BY
Ctry
ORDER BY
SUM(Sales) DESC)
Note that in my example, I'm only using TOP 2 rather than TOP 10. This is simply due to my test data being rather more limited. You can easily substitute the 2 for a 10 in your own data.
Here's the SQL Script to create the table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Table1](
[Ctry] [varchar](50) NOT NULL,
[Sales] [float] NOT NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
And my data looks like this:
GB 10
GB 21.2
GB 34
GB 16.75
US 10
US 11
US 56.43
FR 18.54
FR 98.58
WE 44.33
WE 11.54
WE 89.21
KR 10
PO 10
DE 10
Note that the query result is correctly ordered by the Sales value aggregate and not the alphabetic country code, and that the "Other" category is always last, even if its Sales value aggregate would ordinarily push it to the top of the list.
I'm not saying this is the best (read: most optimal) solution, however, for the dataset that I provided it seems to work pretty well.
SELECT Ctry, sum(Sales) Sales
FROM (SELECT COALESCE(T2.Ctry, 'OTHER') Ctry, T1.Sales
FROM (SELECT Ctry, sum(Sales) Sales
FROM Table1
GROUP BY Ctry) T1
LEFT JOIN
(SELECT TOP 10 Ctry, sum(sales) Sales
FROM Table1
GROUP BY Ctry
ORDER BY sum(Sales) DESC) T2
on T1.Ctry = T2.Ctry
) T
GROUP BY Ctry
The pure SQL solutions to this problem make more than one pass through the individual records. The following solution queries the data only once and uses a SQL ranking function, ROW_NUMBER(), to determine whether a result belongs in the "Other" category. The ROW_NUMBER() function has been available in SQL Server since SQL Server 2005. In my database, this seems to have resulted in a more efficient query. Please note that the "Other" row will appear above some of the top 10 rows if the total of the "Other" sales exceeds theirs. If this is not desired, some adjustments would need to be made to this query:
SELECT CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END AS Ctry,
SUM(Sales) as Sales FROM
(
SELECT Ctry, SUM(Sales) as Sales,
ROW_NUMBER() OVER(ORDER BY SUM(Sales) DESC) AS RowNumber
FROM Table1 GROUP BY Ctry
) as AggregateQuery
GROUP BY CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END
ORDER BY SUM(Sales) DESC
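As for keeping "Other" at the bottom, one possible adjustment is to sort first on a flag that is 1 only for the "Other" group and then on the total. Below is a minimal sketch in Python's sqlite3 module. The assumptions are mine: SQLite 3.25+ stands in for SQL Server, the rows are the Table1 test data posted in another answer here, TOP_N is 2 to suit that small data set, and the CTE names totals and ranked are just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Ctry TEXT NOT NULL, Sales REAL NOT NULL)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    ("GB", 10), ("GB", 21.2), ("GB", 34), ("GB", 16.75),
    ("US", 10), ("US", 11), ("US", 56.43),
    ("FR", 18.54), ("FR", 98.58),
    ("WE", 44.33), ("WE", 11.54), ("WE", 89.21),
    ("KR", 10), ("PO", 10), ("DE", 10),
])

TOP_N = 2  # the question asks for 10; 2 suits this small data set

report = conn.execute("""
WITH totals AS (
    SELECT Ctry, SUM(Sales) AS total FROM Table1 GROUP BY Ctry
),
ranked AS (
    SELECT Ctry, total, ROW_NUMBER() OVER (ORDER BY total DESC, Ctry) AS rn
    FROM totals
)
SELECT CASE WHEN rn > :n THEN 'Other' ELSE Ctry END AS label,
       ROUND(SUM(total), 2) AS sales
FROM ranked
GROUP BY CASE WHEN rn > :n THEN 'Other' ELSE Ctry END
ORDER BY CASE WHEN MIN(rn) > :n THEN 1 ELSE 0 END,  -- 'Other' sorts last
         SUM(total) DESC
""", {"n": TOP_N}).fetchall()
for label, sales in report:
    print(label, sales)
```

With this data the "Other" total is larger than either of the top two countries, yet it still comes out as the last row.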
Using a real analytics SQL engine, such as Apache Spark, you can use a common table expression (WITH) to do this:
with t as (
select rank() over (order by sales desc) as r, sales,city
from DB
order by sales desc
)
select sales, city, r
from t where r <= 10
union
select sum(sales) as sales, "Other" as city, 11 as r
from t where r > 10
In pseudo SQL:
select top 10 order by sales
UNION
select 'Other',SUM(sales) where Ctry not in (select top 10 like above)
Union the top ten with an outer Join of the top ten with the table it self to aggregate the rest.
I don't have access to SQL here but I'll hazard a guess:
select Ctry, sales
from (select top (10) Ctry, sum(sales) as sales
from table1
group by Ctry
order by sum(sales) desc) as top10
union all
select 'Other', sum(table1.sales)
from table1
left outer join (select top (10) Ctry
from table1
group by Ctry
order by sum(sales) desc) as table2
on table1.Ctry = table2.Ctry
where table2.Ctry is null
Of course if this is a rapidly changing top(10) then you either lock or maintain a copy of the top(10) for the duration of the query.
Keep in mind that, depending on your use (and database volume / restrictions), you can achieve the same results using application code (Python, Node, C#, Java, etc.). Sure, it will depend on your use case, but hey, it's possible.
I ended up doing this in C# for instance:
// Mockup class that has a CATEGORY and its VOLUME
class YourModel { public string category; public double volume; }
// wholeList is assumed to be a List<YourModel> already sorted by volume, descending
List<YourModel> groupedList = wholeList.Take (5).ToList ();
groupedList.Add (new YourModel()
{
category = "Others",
volume = wholeList.Skip (5).Select (t => t.volume).Sum ()
});
Disclaimer
I understand that this is a "SQL Only" tagged question, but there might be other people like me out there who can make use of the application layer instead of relying only on SQL to make it happen. I am just trying to show people other ways of doing the same thing that might be helpful. Even if this gets downvoted to oblivion, I know that someone will be happy to read this because they were taught to use each tool to its best, and think "outside the box".