SQL Server Group By Complex Query - sql

In SQL Server, suppose we have a SALES_HISTORY table as below.
CustomerNo PurchaseDate ProductId
1 20120411 12
1 20120330 13
2 20120312 14
3 20120222 16
3 20120109 16
... and many records for each purchase of each customer...
How can I write the appropriate query for finding:
For each customer,
find the product he bought the MOST,
find the percentage of this product over all products he bought.
The result table must have columns like:
CustomerNo,
MostPurchasedProductId,
MostPurchasedProductPercentage

Assuming SQL Server 2005+, you can do the following:
;WITH CTE AS
(
SELECT *,
COUNT(*) OVER(PARTITION BY CustomerNo, ProductId) TotalProduct,
COUNT(*) OVER(PARTITION BY CustomerNo) Total
FROM YourTable
), CTE2 AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY CustomerNo
ORDER BY TotalProduct DESC)
FROM CTE
)
SELECT CustomerNo,
ProductId MostPurchasedProductId,
CAST(TotalProduct AS NUMERIC(16,2))/Total*100 MostPurchasedProductPercent
FROM CTE2
WHERE RN = 1
You still need to decide how to handle ties, that is, cases where more than one product is the most purchased one. Here is a sqlfiddle with a demo for you to try.
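One way to handle the tie case is to rank with RANK() instead of ROW_NUMBER(), so that every product sharing the top count is returned. The sketch below is only an illustration: it runs the query through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions) rather than SQL Server, and the CTE names counts and ranked are made up for the example. The rows are the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_HISTORY (CustomerNo INTEGER, PurchaseDate TEXT, ProductId INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_HISTORY VALUES (?, ?, ?)",
    [
        (1, "20120411", 12),
        (1, "20120330", 13),
        (2, "20120312", 14),
        (3, "20120222", 16),
        (3, "20120109", 16),
    ],
)

query = """
WITH counts AS (
    -- purchases per (customer, product)
    SELECT CustomerNo, ProductId, COUNT(*) AS cnt
    FROM SALES_HISTORY
    GROUP BY CustomerNo, ProductId
), ranked AS (
    SELECT CustomerNo, ProductId, cnt,
           SUM(cnt) OVER (PARTITION BY CustomerNo) AS total,
           -- RANK gives tied products the same rank, so all of them survive rnk = 1
           RANK() OVER (PARTITION BY CustomerNo ORDER BY cnt DESC) AS rnk
    FROM counts
)
SELECT CustomerNo,
       ProductId AS MostPurchasedProductId,
       100.0 * cnt / total AS MostPurchasedProductPercentage
FROM ranked
WHERE rnk = 1
ORDER BY CustomerNo, ProductId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the sample rows, customer 1 bought products 12 and 13 once each, so both come back at 50 percent. Swapping RANK() for ROW_NUMBER() would keep only one of them.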

Could do a lot prettier, but it works:
with cte as(
select CustomerNo, ProductId, count(1) as c
from SALES_HISTORY
group by CustomerNo, ProductId)
select CustomerNo, ProductId as MostPurchasedProductId, (t.c * 1.0)/(select sum(c) from cte t2 where t.CustomerNo = t2.CustomerNo) as MostPurchasedProductPercentage
from cte t
where c = (select max(c) from cte t2 where t.CustomerNo = t2.CustomerNo)

Related

Find the most frequent data every year in SQL

TableA :
Years Data
2000 A
2000 B
2000 C
2000 C
2000 D
2001 A
2001 B
2001 B
2002 B
2002 D
2002 D
I want to output:
Years Data
2000 C
2001 B
2002 D
My solution:
SELECT DISTINCT Years, Data
FROM
(
SELECT Years, Data, COUNT(*) AS _count
FROM TableA
GROUP BY Years, Data
) a1
ORDER BY Years, _count DESC
But it has a problem:
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
How do I correct my SQL code?
Assuming your database supports row_number(), you can do it like this:
SELECT Years, Data
FROM
(
SELECT Years,
Data,
ROW_NUMBER() OVER(PARTITION BY Years ORDER BY count(*) DESC) rn
FROM TableA
GROUP BY Years, Data
) x
WHERE rn = 1
ORDER BY Years, Data
See a live demo on rextester.
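For anyone who wants to try the row_number() idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later). It aggregates in a CTE first and then numbers the rows, which is a slight restructuring of the query above, not the exact same statement. The CTE name counts is made up for the example; the data is the TableA sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Years INTEGER, Data TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        (2000, "A"), (2000, "B"), (2000, "C"), (2000, "C"), (2000, "D"),
        (2001, "A"), (2001, "B"), (2001, "B"),
        (2002, "B"), (2002, "D"), (2002, "D"),
    ],
)

query = """
WITH counts AS (
    -- how often each value occurs in each year
    SELECT Years, Data, COUNT(*) AS cnt
    FROM TableA
    GROUP BY Years, Data
)
SELECT Years, Data
FROM (
    SELECT Years, Data,
           -- 1 = most frequent value within the year
           ROW_NUMBER() OVER (PARTITION BY Years ORDER BY cnt DESC) AS rn
    FROM counts
) AS x
WHERE rn = 1
ORDER BY Years
"""
most_frequent = conn.execute(query).fetchall()
print(most_frequent)
```

The sample data has no ties, so each year yields exactly one row. With ties, ROW_NUMBER() would pick one of the tied values arbitrarily.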
Try this:
select t.Years, t.[Data]
from (
select *, count(*) cnt
from TableA
group by years, [Data]
) t
left join (
select Years, max(cnt) maxCnt
from (
select *, count(*) cnt
from TableA
group by years, [Data]
) t
group by Years
) tt on t.Years = tt.Years -- tt is a view that gives you max count of each year
where t.cnt = tt.maxCnt -- you need `years`, `[Data]` that its count is max count
order by years;
Another way is to use rank() in a DBMS that supports it:
;with t as (
select *, count(*) cnt
from TableA
group by years, [Data]
), tt as (
select *, rank() over (partition by years order by cnt desc) rn
from t
)
select years, [Data]
from tt
where rn = 1
order by years;
If you're using Oracle, you can use the STATS_MODE function:
select years, stats_mode(data)
from TableA
group by years;
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm
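Your error is "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
This means you have put something in the ORDER BY that is not in the SELECT list. In this case, _count is in the ORDER BY but not in the SELECT. Adding it to the select list (without DESC, which belongs only in ORDER BY) removes the error, although the query then returns every (Years, Data) pair rather than only the most frequent one per year:
SELECT DISTINCT Years, Data, _count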
FROM
(
SELECT Years, Data, COUNT(*) AS _count
FROM TableA
GROUP BY Years, Data
) a1
ORDER BY Years, _count DESC

Max count by customer and category

I have transactional data that looks like this
Account ProductCategory
1 a
1 a
1 b
2 c
2 d
2 d
I need to find the ProductCategory that appears most per customer. Results:
Account ProductCategory
1 a
2 d
My result was a long query with many nested subqueries. Any good ideas?
Thank you in advance for the help.
Most databases support the ANSI-standard window functions, particularly row_number(). You can use this with aggregation to get what you want:
select Account, ProductCategory
from (select Account, ProductCategory, count(*) as cnt,
row_number() over (partition by Account order by count(*) desc) as seqnum
from table t
group by Account, ProductCategory
) apc
where seqnum = 1;
This can be done using analytic SQL, or just using a count over a group. The syntax depends on the RDBMS, as Michael asked.
You can try the following SQL:
select * from
(select account, ProductCategory, ct, ROW_NUMBER() OVER (partition by account ORDER BY ct DESC) As myRank
from (select account, ProductCategory, count(0) as ct
from <table>
group by account, ProductCategory ) t ) t2
where t2.myRank = 1
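The choice of PARTITION BY columns matters here. Partitioning by account alone ranks the categories against each other within an account. Partitioning by account and ProductCategory puts each aggregated row in its own partition, so every row gets rank 1. A small sketch of the difference, run through Python's built-in sqlite3 module (assuming SQLite 3.25 or later); the table name Transactions and the helper top_rows are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Account INTEGER, ProductCategory TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "d")],
)

def top_rows(partition_cols):
    # partition_cols is a hard-coded column list for this demo only;
    # never build SQL from untrusted input like this.
    query = f"""
        SELECT Account, ProductCategory
        FROM (
            SELECT Account, ProductCategory,
                   ROW_NUMBER() OVER (PARTITION BY {partition_cols}
                                      ORDER BY ct DESC) AS myRank
            FROM (
                SELECT Account, ProductCategory, COUNT(*) AS ct
                FROM Transactions
                GROUP BY Account, ProductCategory
            ) AS t
        ) AS t2
        WHERE myRank = 1
        ORDER BY Account, ProductCategory
    """
    return conn.execute(query).fetchall()

per_account = top_rows("Account")                 # one row per account
per_pair = top_rows("Account, ProductCategory")   # every aggregated row survives
print(per_account)
print(per_pair)
```

Only the per-account partition gives the result the question asks for.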
Code:
-- count per (Account, ProductCategory), not per category across all accounts
WITH A AS (SELECT [Account], ProductCategory, COUNT([ProductCategory]) OVER(PARTITION BY [Account], ProductCategory) AS [Count]
FROM tbl_all)
SELECT A.Account, ProductCategory
FROM A INNER JOIN (SELECT Account, MAX([Count]) AS Count FROM A GROUP BY A.Account) AS B ON A.Account=B.Account AND A.Count=B.Count
GROUP BY A.Account, ProductCategory

Maximum count - PostgreSQL

I'm looking to pull the max(count(*)) of something from a table.
Effectively, what I'm trying to do is pull out a customer's favourite brand. Say they buy 300 bars of soap a year, and I'd like to know which brand is their favourite. So the max(count(brand_id)), basically.
I was thinking of doing it like this:
SELECT
transaction.customer_id,
max(occ)
FROM
( SELECT
transaction.customer_id,
count(transaction.brand_id) as occ,
FROM
transaction
GROUP BY
transaction.customer_id,
) AS foo
GROUP BY
transaction.customer_id
Thanks in advance
You can do it like this:
with cte as (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
)
select distinct on (customer_id)
customer_id, brand_id, cnt
from cte
order by customer_id, cnt desc
Keep in mind that if more than one brand has the same top count for some customer, you'll end up with one arbitrary record. If you want to get all of the tied records, use the dense_rank() function:
with cte1 as (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
), cte2 as (
select
customer_id, brand_id,
dense_rank() over(partition by customer_id order by cnt desc) as rn
from cte1
)
select customer_id, brand_id
from cte2
where rn = 1
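To see the difference between keeping one row per customer and keeping all tied rows, here is a small sketch. It uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later) instead of PostgreSQL, and since SQLite has no DISTINCT ON, row_number() stands in for it. The rows inserted into test1 are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (customer_id INTEGER, brand_id INTEGER)")
# Hypothetical data: customer 1 has a tie between brands 10 and 20.
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 20), (1, 20), (1, 30), (2, 10)],
)

template = """
WITH cte1 AS (
    SELECT customer_id, brand_id, COUNT(*) AS cnt
    FROM test1
    GROUP BY customer_id, brand_id
), cte2 AS (
    SELECT customer_id, brand_id,
           {rank_fn}() OVER (PARTITION BY customer_id ORDER BY cnt DESC) AS rn
    FROM cte1
)
SELECT customer_id, brand_id
FROM cte2
WHERE rn = 1
ORDER BY customer_id, brand_id
"""

# dense_rank keeps every brand that shares the top count
all_ties = conn.execute(template.format(rank_fn="dense_rank")).fetchall()
# row_number keeps exactly one brand per customer (which one is arbitrary on ties)
one_each = conn.execute(template.format(rank_fn="row_number")).fetchall()
print(all_ties)
print(one_each)
```

Which behaviour is right depends on whether a customer may have several favourite brands.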
For PostgreSQL 8.3:
select distinct on (customer_id)
customer_id, brand_id, cnt
from (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
) as c
order by customer_id, cnt desc;
or like this
with cte as (
SELECT
transaction.customer_id,
count(transaction.brand_id) as occ
FROM
transaction
GROUP BY
transaction.customer_id
)
select max(occ) from cte

Delete where one column contains duplicates

Consider the data below:
ProductID Supplier
--------- --------
111 Microsoft
112 Microsoft
222 Apple Mac
222 Apple
223 Apple
In this example, product 222 is repeated because the supplier is known by two names in the data supplied.
I have data like this for thousands of products. How can I delete the duplicate products or select individual results - something like a self join with SELECT TOP 1 or something like that?
Thanks!
I think you want to do the following:
select t.*
from (select t.*,
row_number() over (partition by ProductID order by (select NULL)) as seqnum
from t
) t
where seqnum = 1
This selects an arbitrary row for each product.
To delete all rows but one, you can use the same idea:
with todelete as (
select t.*,
row_number() over (partition by ProductID order by (select NULL)) as seqnum
from t
)
delete from todelete where seqnum > 1
DELETE a
FROM tableName a
LEFT JOIN
(
SELECT Supplier, MIN(ProductID) min_ID
FROM tableName
GROUP BY Supplier
) b ON a.supplier = b.supplier AND
a.ProductID = b.min_ID
WHERE b.Supplier IS NULL
Or, if you want to delete the extra rows for any ProductID that appears more than once:
WITH cte
AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Supplier) rn
FROM tableName
)
DELETE FROM cte WHERE rn > 1
;WITH Products_CTE AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY <some value>) as rn
FROM PRODUCTS
)
SELECT *
FROM Products_CTE
WHERE rn = 1
The <some value> placeholder is the key that determines which version of Supplier you keep. If you want the first instance of the supplier, you could use a DateAdded column, if one exists.
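A runnable sketch of the delete, for experimenting. Note that this is a SQLite variant, not the SQL Server syntax shown above: SQLite cannot delete through a CTE, so the example deletes by rowid instead. It assumes SQLite 3.25 or later, uses Python's built-in sqlite3 module, and picks ORDER BY Supplier as the key, so the alphabetically first supplier name is the one kept. The table name Products follows the last answer; the rows are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, Supplier TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [
        (111, "Microsoft"),
        (112, "Microsoft"),
        (222, "Apple Mac"),
        (222, "Apple"),
        (223, "Apple"),
    ],
)

# Number the rows within each ProductID, then delete everything after the first.
conn.execute(
    """
    DELETE FROM Products
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY ProductID
                                      ORDER BY Supplier) AS rn
            FROM Products
        )
        WHERE rn > 1
    )
    """
)
remaining = conn.execute(
    "SELECT ProductID, Supplier FROM Products ORDER BY ProductID"
).fetchall()
print(remaining)
```

Product 222 keeps "Apple" because it sorts before "Apple Mac"; a different ORDER BY key would keep a different row.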

Return top 5 from SUM in select statement

I need to return the following statement, but I only want to return the TOP 5 of each Sale value, not all the records.
Select ID, Code, sum(Sale) as Sale from TableName
Where Code = 11
Group By ID, code
I do not want this!
Select TOP 5 ID, Code, sum(Sale) as Sale from TableName
Where Code = 11
Group By ID, code
With Cte as
( Select ID, Code, sale as Sales ,
row_number() over (partition by ID,code order by sale desc) as row_num
from TableName where code=11
)
Select Id,code,sum(sales) from cte
WHERE row_num < 6
GROUP BY ID, code
WITH TopSales AS (
SELECT *, RANK() OVER (PARTITION BY ID, Code ORDER BY Sale DESC) saleRank
FROM TableName
)
SELECT ID, Code, SUM(Sale) AS Sale
FROM TopSales
WHERE (Code = 11) AND (saleRank <= 5)
GROUP BY ID, code
select id, code, SUM (sale)
from
(
select id, code, sale,
ROW_NUMBER() over(partition by id, code order by sale desc) rn
from tablename
) v
where rn<=5
group by id, code
Probably you need something like:
;WITH sales AS (
SELECT
id,
code,
sale,
ROW_NUMBER() OVER (PARTITION BY id, code ORDER BY sale DESC) n
FROM
TableName
WHERE
Code = 11
)
SELECT
id, code, sum(sale) sale
FROM
sales
WHERE
n <= 5
GROUP BY
id,
code
ROW_NUMBER() and PARTITION BY number the sales within each (id, code) group from highest to lowest. Then you SUM only the top (highest) 5.
This query returns sum of top 5 sales for each (id, code) group.
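A quick way to check that behaviour is to run the same shape of query against a few made-up rows. The sketch below uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later) rather than SQL Server; the CTE name ranked and all the Sale values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (ID INTEGER, Code INTEGER, Sale INTEGER)")
# Hypothetical data: ID 1 has seven sales, ID 2 has two, ID 3 has a different Code.
rows = [(1, 11, s) for s in (10, 20, 30, 40, 50, 60, 70)]
rows += [(2, 11, 5), (2, 11, 5), (3, 12, 100)]
conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)

query = """
WITH ranked AS (
    SELECT ID, Code, Sale,
           -- 1 = highest sale within the (ID, Code) group
           ROW_NUMBER() OVER (PARTITION BY ID, Code ORDER BY Sale DESC) AS n
    FROM TableName
    WHERE Code = 11
)
SELECT ID, Code, SUM(Sale) AS Sale
FROM ranked
WHERE n <= 5
GROUP BY ID, Code
ORDER BY ID
"""
top5_sums = conn.execute(query).fetchall()
print(top5_sums)
```

For ID 1 only the five highest sales (70, 60, 50, 40, 30) are summed; groups with fewer than five rows are summed in full.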
If you want to return just 5 results for each group, in no particular order, you could do this:
with cte as
(select ID, Code, Sale,ROW_NUMBER() over(partition by ID,
Code order by (select 0)) rownum
from TableName)
Select ID, Code, sum(Sale) as Sale from cte
Where Code = 11
and rownum<=5
Group By ID, code
If you want to return the top 5 results with the highest Sale for each group, you could do this:
with cte as
(select ID, Code, Sale,ROW_NUMBER() over(partition by ID,
Code order by Sale desc) rownum
from TableName)
Select ID, Code, sum(Sale) as Sale from cte
Where Code = 11
and rownum<=5
Group By ID, code
select id, code, sum(sale) as sale from tablename
where code = 11
group by id, code
order by sum(sale) desc
limit 5