Select all data not in Top 'n' as 'Other' - sql

I hope somebody can point out where I'm going wrong here; I've been looking at this for the last 30 minutes and have not gotten anywhere with it.
I have a temporary table that is populated with data. The front-end application cannot do any logic for me, so please excuse the ugly case statement logic in the table.
The user is happy with the result set brought back, as I get the top 10 records. They have now decided they want to see the remaining countries (all rows not in the top 10) grouped as 'Other'.
I have tried to create a grouping of countries not in the top 10, but it's not working. I was planning on UNIONing this result to the top 10 results.
SELECT c.Country, count(*) AS 'Total_Number_of_customers', COALESCE(ili.new_customers,0) AS 'New_Customers', COALESCE(ilb.existing_first,0) AS 'Existing_First_Trans', COALESCE(ilc.existing_old,0) AS 'Existing_Prev_Trans'
FROM #customer_tmp c
LEFT JOIN (SELECT z.country, count(*) AS 'new_customers' FROM #customer_tmp z where z.customer_type='New_Customer' group by z.country)ili ON ili.country = c.country
LEFT JOIN (SELECT zy.country, count(*) AS 'existing_first' FROM #customer_tmp zy where zy.customer_type='Existing_Customer' AND zy.first_transaction=1 group by zy.country)ilb ON ilb.country = c.country
LEFT JOIN (SELECT zx.country, count(*) AS 'existing_old' FROM #customer_tmp zx where zx.customer_type='Existing_Customer' AND zx.first_transaction=0 group by zx.country)ilc ON ilc.country = c.country
GROUP BY c.country, ili.new_customers, ilb.existing_first, ilc.existing_old
ORDER BY 2 DESC
Here is the SQL that I use to get results from my table.
For reference, each row in my temporary table contains a customer ID, the date they were created and their customer type, which is specific to what I'm trying to achieve.
Hopefully this is a simple problem and I'm just being a bit slow.
Many thanks in advance.

Use the EXCEPT operator in SQL Server:
SELECT <fields>
FROM <table>
WHERE <conditions>
EXCEPT
<Query you want excluded>
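
To make the EXCEPT idea concrete, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than SQL Server. The table layout, the sample rows and the function name countries_outside_top are assumptions for illustration only; SQLite uses LIMIT where SQL Server uses TOP, and the ranked query is wrapped in a subquery because SQLite does not allow ORDER BY/LIMIT directly on one side of EXCEPT.

```python
import sqlite3

def countries_outside_top(rows, n):
    """Return the countries that are NOT among the n most frequent ones."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for #customer_tmp with only the columns needed here.
    conn.execute("CREATE TABLE customer_tmp (customer_id INTEGER, country TEXT)")
    conn.executemany("INSERT INTO customer_tmp VALUES (?, ?)", rows)
    query = """
        SELECT country FROM customer_tmp
        EXCEPT
        SELECT country FROM (
            SELECT country
            FROM customer_tmp
            GROUP BY country
            ORDER BY COUNT(*) DESC, country   -- country breaks ties deterministically
            LIMIT ?
        )
        ORDER BY country
    """
    result = [r[0] for r in conn.execute(query, (n,))]
    conn.close()
    return result

demo_rows = [(1, "UK"), (2, "UK"), (3, "UK"), (4, "US"), (5, "US"), (6, "FR"), (7, "DE")]
print(countries_outside_top(demo_rows, 2))
```

Note that EXCEPT only tells you which countries fall outside the top n; you would still need to aggregate those rows into a single 'Other' row.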

Yup; EXCEPT, or maybe add a row number to your query and then select by that:
SELECT * FROM (
SELECT c.Country, count(*) AS 'Total_Number_of_customers',
row_number() OVER (ORDER BY COUNT(*) DESC) AS 'r',
COALESCE(ili.new_customers,0) AS 'New_Customers', COALESCE(ilb.existing_first,0) AS 'Existing_First_Trans', COALESCE(ilc.existing_old,0) AS 'Existing_Prev_Trans'
FROM #customer_tmp c
LEFT JOIN (SELECT z.country, count(*) AS 'new_customers' FROM #customer_tmp z where z.customer_type='New_Customer' group by z.country)ili ON ili.country = c.country
LEFT JOIN (SELECT zy.country, count(*) AS 'existing_first' FROM #customer_tmp zy where zy.customer_type='Existing_Customer' AND zy.first_transaction=1 group by zy.country)ilb ON ilb.country = c.country
LEFT JOIN (SELECT zx.country, count(*) AS 'existing_old' FROM #customer_tmp zx where zx.customer_type='Existing_Customer' AND zx.first_transaction=0 group by zx.country)ilc ON ilc.country = c.country
GROUP BY c.country, ili.new_customers, ilb.existing_first, ilc.existing_old
) sub_query WHERE sub_query.r > 10
ORDER BY sub_query.r
This may be more flexible, as you can run one query and then divide the results up into "top ten" and "the rest" quite easily.
(This is equivalent to bobs' answer; I guess we were working on this at exactly the same time!)

Here's an approach using common table expressions (CTE)
WITH CTE AS
(
SELECT c.Country, count(*) AS 'Total_Number_of_customers', COALESCE(ili.new_customers,0) AS 'New_Customers', COALESCE(ilb.existing_first,0) AS 'Existing_First_Trans', COALESCE(ilc.existing_old,0) AS 'Existing_Prev_Trans'
, ROW_NUMBER() OVER (ORDER BY count(*) DESC) AS sequence
FROM #customer_tmp c
LEFT JOIN (SELECT z.country, count(*) AS 'new_customers' FROM #customer_tmp z where z.customer_type='New_Customer' group by z.country)ili ON ili.country = c.country
LEFT JOIN (SELECT zy.country, count(*) AS 'existing_first' FROM #customer_tmp zy where zy.customer_type='Existing_Customer' AND zy.first_transaction=1 group by zy.country)ilb ON ilb.country = c.country
LEFT JOIN (SELECT zx.country, count(*) AS 'existing_old' FROM #customer_tmp zx where zx.customer_type='Existing_Customer' AND zx.first_transaction=0 group by zx.country)ilc ON ilc.country = c.country
GROUP BY c.country, ili.new_customers, ilb.existing_first, ilc.existing_old
)
SELECT *
FROM CTE
WHERE sequence > 10
ORDER BY sequence

SELECT country, COUNT(*) AS cnt, SUM(new_customer) AS new_customers, SUM(existing_first_trans) AS existing_first_trans, SUM(existing_prev_trans) AS existing_prev_trans
FROM (
SELECT CASE
WHEN country IN
(
SELECT TOP 10 country
FROM #customer_tmp
GROUP BY
country
ORDER BY
COUNT(*) DESC
) THEN
country
ELSE 'Others'
END AS country,
CASE WHEN customer_type = 'New_Customer' THEN 1 ELSE 0 END AS new_customer,
CASE WHEN customer_type = 'Existing_Customer' AND first_transaction = 1 THEN 1 ELSE 0 END AS existing_first_trans,
CASE WHEN customer_type = 'Existing_Customer' AND first_transaction = 0 THEN 1 ELSE 0 END AS existing_prev_trans
FROM #customer_tmp
) q
GROUP BY
country
ORDER BY
CASE country WHEN 'Others' THEN 2 ELSE 1 END, cnt DESC
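
For anyone who wants to try the "bucket everything outside the top n into one row" idea without a SQL Server instance, here is a rough sketch using SQLite via Python's sqlite3 module. The table definition, the sample rows and the function name top_n_with_other are made up for illustration, and LIMIT stands in for TOP.

```python
import sqlite3

def top_n_with_other(rows, n):
    """Return (country, total) for the n largest countries plus one 'Other' row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for #customer_tmp with only the columns needed here.
    conn.execute("CREATE TABLE customer_tmp (customer_id INTEGER, country TEXT)")
    conn.executemany("INSERT INTO customer_tmp VALUES (?, ?)", rows)
    query = """
        SELECT country_group, COUNT(*) AS total
        FROM (
            -- Relabel every row: keep the country if it is in the top n, else 'Other'.
            SELECT CASE WHEN c.country IN (
                       SELECT country
                       FROM customer_tmp
                       GROUP BY country
                       ORDER BY COUNT(*) DESC, country
                       LIMIT ?
                   ) THEN c.country ELSE 'Other' END AS country_group
            FROM customer_tmp AS c
        ) AS grouped
        GROUP BY country_group
        -- Keep 'Other' last, then sort the real countries by size.
        ORDER BY CASE country_group WHEN 'Other' THEN 2 ELSE 1 END,
                 COUNT(*) DESC, country_group
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

sample_rows = [(1, "UK"), (2, "UK"), (3, "UK"), (4, "US"), (5, "US"), (6, "FR"), (7, "DE")]
for row in top_n_with_other(sample_rows, 2):
    print(row)
```

The relabelling happens before the GROUP BY, so the 'Other' row is aggregated in the same pass as the named countries and no UNION is needed.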

Related

Multiple subquery join in View with group by returns duplicate rows

I have created a view using a subquery, but I want the view to return a few mandatory columns that can't be added to the GROUP BY subquery, so I had to create one more SELECT statement and join it with the GROUP BY subquery.
I have come up with the following query.
The problem I am facing is that if the GROUP BY for a seller covers 28 rows, it returns 28 duplicate rows. I also want the whole query to be ordered by TotalOrderItem.
Alter VIEW [dbo].[SellersPerformance] AS
Select
RequiredColumns.Id as Id,
aggrgateDT.SellerId as SellerId,
aggrgateDT.TenantId as TenantId,
aggrgateDT.Active as Active,
aggrgateDT.TotalOrderedItem as TotalOrderItem,
aggrgateDT.MoveToPurchase as MoveToPurchase,
aggrgateDT.GoodPurchase as GoodPurchase,
RequiredColumns.Created as Created,
RequiredColumns.Modified as Modified,
RequiredColumns.CreatorId as CreatorId,
RequiredColumns.ModifierId as ModifierId
From
(
(Select
sellerId, p.TenantId, p.Active, count(*) as TotalOrderedItem,
count(*) - count(o.Id) as MoveToPurchase,
count(o.Id) as GoodPurchase,
count(case when o.ApplicationStatus = 'Perfect' then 1 end) as Perfect,
count(case when o.ApplicationStatus = 'R-Perfect' then 1 end) as R_Perfect
FROM [dbo].[AmazonOrderPurchaseInfo] p
left join [dbo].[AmazonOrder] o
on p.AmazonOrderId = o.Id
AND p.Id = o.[AmazonOrderPurchaseInfoId]
group by SellerId, p.TenantId, p.Active
order by TotalOrderedItem offset 0 rows
) aggrgateDT
Left outer Join (
SELECT
NEWID() Id,
purchase.Created AS Created,
purchase.Modified AS Modified,
purchase.CreatorId AS CreatorId,
purchase.ModifierId AS ModifierId,
purchase.SellerId As SellerId
From dbo.AmazonOrderPurchaseInfo purchase
) RequiredColumns ON aggrgateDT.SellerId = RequiredColumns.SellerId
)
GO
You may try Group by for this.
Alter VIEW [dbo].[SellersPerformance] AS
select res.Id, res.SellerId, res.TenantId, res.Active, res.TotalOrderItem, res.MoveToPurchase, res.GoodPurchase, res.Created, res.Modified, res.CreatorId, res.ModifierId
from
(
Select
RequiredColumns.Id as Id,
aggrgateDT.SellerId as SellerId,
aggrgateDT.TenantId as TenantId,
aggrgateDT.Active as Active,
aggrgateDT.TotalOrderedItem as TotalOrderItem,
aggrgateDT.MoveToPurchase as MoveToPurchase,
aggrgateDT.GoodPurchase as GoodPurchase,
RequiredColumns.Created as Created,
RequiredColumns.Modified as Modified,
RequiredColumns.CreatorId as CreatorId,
RequiredColumns.ModifierId as ModifierId
From
(
(Select
sellerId, p.TenantId, p.Active, count(*) as TotalOrderedItem,
count(*) - count(o.Id) as MoveToPurchase,
count(o.Id) as GoodPurchase,
count(case when o.ApplicationStatus = 'Perfect' then 1 end) as Perfect,
count(case when o.ApplicationStatus = 'R-Perfect' then 1 end) as R_Perfect
FROM [dbo].[AmazonOrderPurchaseInfo] p
left join [dbo].[AmazonOrder] o
on p.AmazonOrderId = o.Id
AND p.Id = o.[AmazonOrderPurchaseInfoId]
group by SellerId, p.TenantId, p.Active
order by TotalOrderedItem offset 0 rows
) aggrgateDT
Left outer Join (
SELECT
NEWID() Id,
purchase.Created AS Created,
purchase.Modified AS Modified,
purchase.CreatorId AS CreatorId,
purchase.ModifierId AS ModifierId,
purchase.SellerId As SellerId
From dbo.AmazonOrderPurchaseInfo purchase
) RequiredColumns ON aggrgateDT.SellerId = RequiredColumns.SellerId
)
) as res
group by res.Id, res.SellerId, res.TenantId, res.Active, res.TotalOrderItem, res.MoveToPurchase, res.GoodPurchase, res.Created, res.Modified, res.CreatorId, res.ModifierId
GO
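This gives the expected result only if the Id, Created, Modified, CreatorId and ModifierId columns hold the same values on every row for a seller. Note that NEWID() produces a different value for each row, so Id would have to come from somewhere else for the rows to collapse.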

Limit result from inner join query to 2 rows

My query returns results from grouped data, but now I want only two rows.
I tried HAVING COUNT(*) <= 2, but it fails with the error that a column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
My query is:
select f.CompanyName, f.EmployeeCity, f.PrioritySL ,f.EmployeeSeniorityLevel ,f.EmployeeID
from (
select ConcatKey, min(PrioritySL) as PSL
from dbo.WalkerItContacts group by ConcatKey
) as x inner join dbo.WalkerItContacts as f on f.ConcatKey = x.ConcatKey and f.PrioritySL = x.PSL
where f.PrioritySL != '10'
Company Apple has 9 records; I want only 2 of them.
My data:
company name   priority
a              10
a              1
a              3
b              2
b              4
b              3
b              5
c              1
c              10
c              2
My expected data:
company name   priority
a              1
a              3
b              2
b              3
c              1
c              2
Add a 'top 2' clause to the outer query:
select top 2 f.CompanyName, f.EmployeeCity, f.PrioritySL ,f.EmployeeSeniorityLevel ,f.EmployeeID
from (
select ConcatKey, min(PrioritySL) as PSL
from dbo.WalkerItContacts group by ConcatKey
) as x inner join dbo.WalkerItContacts as f on f.ConcatKey = x.ConcatKey and f.PrioritySL = x.PSL
where f.PrioritySL != '10'
and f.CompanyName= 'Apple'
will give you two rows. Add an ORDER BY clause to the outer query so you can control which two rows are returned.
You can phrase this more succinctly and with better performance as:
select top (2) wic.*
from (select wic.*,
rank() over (partition by CompanyName, ConcatKey order by PrioritySL) as seqnum
from dbo.WalkerItContacts wic
) wic
where seqnum = 1 and
wic.PrioritySL <> 10 and
wic.CompanyName = 'Apple';
I think you could solve your problem using the ROW_NUMBER() function to count the rows and filter it in the WHERE clause to only show 2 rows per group.
I think something like this might work for you:
SELECT x.rownum, x.CompanyName, x.EmployeeCity, x.PrioritySL,
x.EmployeeSeniorityLevel, x.EmployeeID
FROM ( SELECT f.*, ROW_NUMBER() OVER(PARTITION BY f.CompanyName
ORDER BY f.PrioritySL) AS rownum
FROM dbo.WalkerItContacts AS f
WHERE f.PrioritySL != '10') AS x
WHERE x.rownum <= 2
ORDER BY x.CompanyName ASC, x.rownum ASC;
Hope this helps some.
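
In case it helps, here is a small sketch of the ROW_NUMBER() idea using the sample data from the question. It runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, and the table name contacts, its two columns and the function name two_lowest_per_company are simplifications made up for this example.

```python
import sqlite3

# Sample data copied from the question: (company name, priority).
SAMPLE = [("a", 10), ("a", 1), ("a", 3),
          ("b", 2), ("b", 4), ("b", 3), ("b", 5),
          ("c", 1), ("c", 10), ("c", 2)]

def two_lowest_per_company(rows):
    """Return the two lowest priorities per company, ignoring priority 10."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column stand-in for dbo.WalkerItContacts.
    conn.execute("CREATE TABLE contacts (company_name TEXT, priority INTEGER)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
    query = """
        SELECT company_name, priority
        FROM (
            -- Number the rows inside each company, lowest priority first.
            SELECT company_name, priority,
                   ROW_NUMBER() OVER (PARTITION BY company_name
                                      ORDER BY priority) AS rn
            FROM contacts
            WHERE priority <> 10
        ) AS ranked
        WHERE rn <= 2
        ORDER BY company_name, priority
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for row in two_lowest_per_company(SAMPLE):
    print(row)
```

The row number has to be computed in an inner query because a window function cannot be referenced in the WHERE clause of the same SELECT.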

SQL Select TOP 1 for each group in subquery

Good morning,
I want to alter my query in such a way that only the top 1 row, ordered by h.started asc, is selected.
select h.started, * from wshhistory h
join asset a on h.assetid = a.uid
inner join
(
select Count(*) as TotalLatest, a.uid, a.deleted from asset a
join wshhistory h on a.uid = h.assetid
where h.latest = 1
group by a.uid, a.deleted
having Count(*) > 1
) X
on X.uid = h.assetid
where X.deleted = 0 and h.latest = 1
order by h.assetid desc
I searched all over, and found in most posts, to use:
ROW_NUMBER() OVER (PARTITION BY a.uid ORDER BY h.started asc) as rn
But I can't seem to use this since I need to use GROUP BY, and this results in the error message:
Column 'wshhistory.started' is invalid in the select list because it
is not contained in either an aggregate function or the GROUP BY
clause.
To give some extra info about my query:
I need to find the cases where I have duplicates of Latest = 1 (table: wshhistory) for the same assetid. Then I need to set them all to 0 except the latest one.
I think you want something like this:
with toupdate as (
select h.*,
row_number() over (partition by h.assetid order by h.started desc) as seqnum
from wshhistory h
where h.latest = 1
)
update toupdate
set latest = 0
where seqnum > 1 and
exists (select 1
from asset a
where a.uid = toupdate.assetid and a.deleted = 0
);
Sample data and desired results are much easier to work with than non-working queries.

Complex Full Outer Join

Sigh ... can anyone help? In the SQL query below, the results I get are incorrect. There are three (3) labor records in [LaborDetail]
Hours / Cost
2.75 / 50.88
2.00 / 74.00
1.25 / 34.69
There are two (2) material records in [WorkOrderInventory]
Material Cost
42.75
35.94
The issue is that the query incorrectly returns the following:
sFunction cntWO sumLaborHours sumLaborCost sumMaterialCost
ROBOT HARNESS 1 12 319.14 236.07
What am I doing wrong in the query that is causing the sums to be multiplied? The correct values are sumLaborHours = 6, sumLaborCost = 159.57, and sumMaterialCost = 78.69. Thank you for your help.
SELECT CASE WHEN COALESCE(work_orders.location, Work_Orders_Archived.location) IS NULL
THEN '' ELSE COALESCE(work_orders.location, Work_Orders_Archived.location) END AS sFunction,
(SELECT COUNT(*)
FROM work_orders
FULL OUTER JOIN Work_Orders_Archived
ON work_orders.order_number = Work_Orders_Archived.order_number
WHERE COALESCE(work_orders.order_number, Work_Orders_Archived.order_number) = '919630') AS cntWO,
SUM(Laborhours) AS sumLaborHours,
SUM(LaborCost) AS sumLaborCost,
SUM(MaterialCost*MaterialQuanity) AS sumMaterialCost
FROM work_orders
FULL OUTER JOIN Work_Orders_Archived
ON work_orders.order_number = Work_Orders_Archived.order_number
LEFT OUTER JOIN
(SELECT HoursWorked AS Laborhours, TotalDollars AS LaborCost, WorkOrderNo
FROM LaborDetail) AS LD
ON COALESCE(work_orders.order_number, Work_Orders_Archived.order_number) = LD.WorkOrderNo
LEFT OUTER JOIN
(SELECT UnitCost AS MaterialCost, Qty AS MaterialQuanity, OrderNumber
FROM WorkOrderInventory) AS WOI
ON COALESCE(work_orders.order_number, Work_Orders_Archived.order_number) = WOI.OrderNumber
WHERE COALESCE(work_orders.order_number, Work_Orders_Archived.order_number) = '919630'
GROUP BY CASE WHEN COALESCE(work_orders.location, Work_Orders_Archived.location) IS NULL
THEN '' ELSE COALESCE(work_orders.location, Work_Orders_Archived.location) END
ORDER BY sFunction
Try using the SUM function inside a derived table subquery when doing the full join to "WorkOrderInventory" like so...
select
...
sum(hrs) as sumlaborhrs,
sum(cost) as sumlaborcost,
-- calculate material cost in subquery
summaterialcost
from labordetail a
full outer join
(select ordernumber, sum(materialcost) as summaterialcost
from WorkOrderInventory
group by ordernumber
) b on a.workorderno = b.ordernumber
I created a simple SQL Fiddle to demonstrate this (I simplified your query for example's sake).
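
Here is a rough, self-contained sketch of why the sums get multiplied and how aggregating in a derived table avoids it. It uses the figures from the question, but runs on SQLite via Python's sqlite3 module with simplified, made-up table and column names (orders, labor, material), and assumes a quantity of 1 for each material row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (order_no TEXT);
    CREATE TABLE labor    (order_no TEXT, hours REAL, cost REAL);
    CREATE TABLE material (order_no TEXT, cost REAL);
    INSERT INTO orders   VALUES ('919630');
    INSERT INTO labor    VALUES ('919630', 2.75, 50.88), ('919630', 2.00, 74.00),
                                ('919630', 1.25, 34.69);
    INSERT INTO material VALUES ('919630', 42.75), ('919630', 35.94);
""")

def sums(query):
    """Run a one-row query and round each value to 2 decimals."""
    return tuple(round(v, 2) for v in conn.execute(query).fetchone())

# Joining both detail tables directly gives 3 x 2 = 6 rows for the order,
# so labor is counted twice and material three times.
naive = sums("""
    SELECT SUM(l.hours), SUM(l.cost), SUM(m.cost)
    FROM orders o
    LEFT JOIN labor    l ON l.order_no = o.order_no
    LEFT JOIN material m ON m.order_no = o.order_no
    WHERE o.order_no = '919630'
""")

# Aggregating each detail table first leaves one row per order on each side.
fixed = sums("""
    SELECT l.hours, l.cost, m.cost
    FROM orders o
    LEFT JOIN (SELECT order_no, SUM(hours) AS hours, SUM(cost) AS cost
               FROM labor GROUP BY order_no) l ON l.order_no = o.order_no
    LEFT JOIN (SELECT order_no, SUM(cost) AS cost
               FROM material GROUP BY order_no) m ON m.order_no = o.order_no
    WHERE o.order_no = '919630'
""")

print("naive:", naive)
print("fixed:", fixed)
```

The naive totals match the incorrect figures in the question (labor doubled, material tripled), which suggests the two detail tables are multiplying each other's rows rather than anything being wrong with the full outer join itself.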
It looks to me as though work_orders and work_orders_archived contain the same kind of data and you need both tables as if they were one. So instead of joining, you could create a UNION and use it as if it were one table:
select location as sfunction
from
(select location
from work_orders
union
select location
from work_orders_archived) wo
Then you use it to join the rest. Which DBMS are you on? You could use WITH, but it is not available in MySQL before version 8.0.
with wo as
(select location as sfunction, order_number
from work_orders
union
select location, order_number
from work_orders_archived)
select sfunction,
count(*) AS cntWO,
SUM(Laborhours) AS sumLaborHours,
SUM(LaborCost) AS sumLaborCost,
SUM(MaterialCost*MaterialQuanity) AS sumMaterialCost
from wo
LEFT OUTER JOIN
(SELECT HoursWorked AS Laborhours, TotalDollars AS LaborCost, WorkOrderNo
FROM LaborDetail) AS LD
ON wo.order_number = LD.WorkOrderNo
LEFT OUTER JOIN
(SELECT UnitCost AS MaterialCost, Qty AS MaterialQuanity, OrderNumber
FROM WorkOrderInventory) AS WOI
ON wo.order_number = WOI.OrderNumber
where wo.order_number = '919630'
group by sfunction
order by sfunction
The best guess is that the work orders appear more than once in one of the tables. Try these queries to check for duplicates in the two most obvious candidate tables:
select cnt, COUNT(*), MIN(order_number), MAX(order_number)
from (select order_number, COUNT(*) as cnt
from work_orders
group by order_number
) t
group by cnt
order by 1;
select cnt, COUNT(*), MIN(order_number), MAX(order_number)
from (select order_number, COUNT(*) as cnt
from work_orders_archived
group by order_number
) t
group by cnt
order by 1;
If either returns a row where cnt is not 1, then you have duplicates in the tables.

SQL Server 2005, SELECT DISTINCT query

How can I modify this query to return distinct Models and VideoProcessor columns ?
SELECT TOP(20) *
FROM
(SELECT
[3D_Benchmarks].Id AS BenchmarkId,
Manufacturer, Model, Slug, VideoProcessor,
FPS, CPU
FROM
[3D_Benchmarks]
JOIN
[3D_Slugs] ON [3D_Benchmarks].Id = [3D_Slugs].BenchmarkId) AS tb
ORDER BY
tb.FPS DESC;
New answer based on your comment. This looks up the twenty highest FPS Model+VideoProcessor. For each of those, it picks the row with the highest FPS.
select details.Model
, details.VideoProcessor
, details.FPS
, <add other columns here>
from (
select top 20 b.Model
, VideoProcessor
from [3D_Benchmarks] b
join [3D_Slugs] s
on b.Id = s.BenchmarkId
group by
b.Model
, VideoProcessor
order by
max(b.FPS) desc
) top20
cross apply
(
select top 1 *
from [3D_Benchmarks] b
join [3D_Slugs] s
on b.Id = s.BenchmarkId
where b.Model = top20.Model
and b.VideoProcessor = top20.VideoProcessor
order by
b.FPS desc
) details