I have transactional data that looks like this:
Account ProductCategory
1 a
1 a
1 b
2 c
2 d
2 d
I need to find the ProductCategory that appears most often for each customer (Account). Expected result:
Account ProductCategory
1 a
2 d
My solution was a long query with many nested subqueries. Any good ideas?
Thank you in advance for the help.
Most databases support the ANSI-standard window functions, particularly row_number(). You can use this with aggregation to get what you want:
select Account, ProductCategory
from (select Account, ProductCategory, count(*) as cnt,
row_number() over (partition by Account order by count(*) desc) as seqnum
from table t
group by Account, ProductCategory
) apc
where seqnum = 1;
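The pattern above can be checked end to end with a minimal sketch. It assumes SQLite 3.25+ (the first version with window functions), accessed through Python's built-in sqlite3 module as a stand-in for the real RDBMS. The table name `transactions` is hypothetical. The counting is moved into an inner subquery so the window only has to order by a plain column:

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (Account INTEGER, ProductCategory TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "d")],
)

# Step 1: count rows per (Account, ProductCategory).
# Step 2: number the categories within each Account by descending count.
# Step 3: keep the first one per Account.
query = """
SELECT Account, ProductCategory
FROM (SELECT Account, ProductCategory,
             ROW_NUMBER() OVER (PARTITION BY Account
                                ORDER BY cnt DESC) AS seqnum
      FROM (SELECT Account, ProductCategory, COUNT(*) AS cnt
            FROM transactions
            GROUP BY Account, ProductCategory) AS counts) AS apc
WHERE seqnum = 1
ORDER BY Account
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 'a'), (2, 'd')]
```

If two categories tie for an account, ROW_NUMBER picks one of them arbitrarily. Add a second ORDER BY key to make the choice deterministic.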
This can be done using analytic (window) functions, or just with COUNT and GROUP BY. The syntax depends on the RDBMS, as Michael asked.
You can try the following SQL:
select * from
(select account, ProductCategory, ct , ROW_NUMBER() OVER (partition by account ORDER BY ct DESC ) As myRank
from (select account, ProductCategory, count(0) as ct
from <table>
group by account, ProductCategory ) t ) t2
where t2.myRank = 1
Code:
WITH A AS (SELECT [Account], ProductCategory, COUNT([ProductCategory]) OVER(PARTITION BY [Account], ProductCategory) AS [Count]
FROM tbl_all)
SELECT A.Account, ProductCategory
FROM A INNER JOIN (SELECT Account, MAX([Count]) AS Count FROM A GROUP BY A.Account) AS B ON A.Account=B.Account AND A.Count=B.Count
GROUP BY A.Account, ProductCategory
Related
I am facing an issue with an SQL query.
I currently have two tables. The first lists sales by vendor and country, for example (there are many more rows, but this is the gist):
Country id Sale
US 1 100
UK 2 1000
US 3 150
UK 2 200
The second table maps ids to vendor names, for example:
id name
1 john
2 david
3 tom
I need to get the top vendor in each country by total sales. The output should look something like this:
country id name sum_sales
Would you be able to help? Currently I can only GROUP BY and SUM; I can't get the top vendor in each country. Thank you!
I am running this on BigQuery SQL.
Use dense_rank() with aggregation:
select yr, Country, id, name, total_sales
from (select extract(year from s.date) as yr,
s.Country, s.id, v.name, sum(s.sale) as total_sales,
dense_rank() over (partition by extract(year from s.date), s.country order by sum(s.sale) desc) as seq
from sales s inner join
vendors v
on v.id = s.id
group by extract(year from s.date), s.Country, s.id, v.name
) t
where seq = 1;
EDIT: For a specific year format, use FORMAT_DATETIME, for example:
FORMAT_DATETIME("%Y", DATETIME "2020-03-19")
This way, you get the vendor(s) with the highest total sales in each country (per year).
Note: If two or more vendors tie on total sales, all of them are returned. If you want only one of them, use row_number() instead of dense_rank().
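The tie behaviour described in the note can be seen in a small sketch. It uses Python's sqlite3 module with SQLite 3.25+ as a stand-in for BigQuery (an assumption). The data is hypothetical, chosen so vendors 1 and 3 tie in the US. The ROW_NUMBER version breaks the tie on `id`, which is an arbitrary rule picked for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Country TEXT, id INTEGER, Sale INTEGER)")
# Hypothetical data: vendors 1 and 3 both total 150 in the US.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("US", 1, 100), ("US", 1, 50), ("US", 3, 150),
    ("UK", 2, 1000), ("UK", 2, 200),
])

# {col} selects which ranking column decides the "top" vendors.
template = """
WITH totals AS (
    SELECT Country, id, SUM(Sale) AS total_sales
    FROM sales
    GROUP BY Country, id
), ranked AS (
    SELECT Country, id,
           DENSE_RANK() OVER (PARTITION BY Country
                              ORDER BY total_sales DESC) AS dr,
           ROW_NUMBER() OVER (PARTITION BY Country
                              ORDER BY total_sales DESC, id) AS rn
    FROM totals
)
SELECT Country, id FROM ranked WHERE {col} = 1 ORDER BY Country, id
"""
dense_winners = conn.execute(template.format(col="dr")).fetchall()
row_winners = conn.execute(template.format(col="rn")).fetchall()
print(dense_winners)  # [('UK', 2), ('US', 1), ('US', 3)]
print(row_winners)    # [('UK', 2), ('US', 1)]
```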
In BigQuery, you can use window functions with aggregation:
select id, name, country, sum_sales
from (select s.id, v.name, s.country, sum(s.sale) as sum_sales,
row_number() over (partition by s.country order by sum(s.sale) desc) as seqnum
from sales s join
vendors v
on v.id = s.id
group by s.id, v.name, s.country
) sv
where seqnum = 1;
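A runnable sketch of this query on the sample data from the question is shown below. It runs on SQLite 3.25+ through Python's sqlite3 module rather than BigQuery (an assumption). The join condition is `v.id = s.id`, and the aggregation sits in an inner subquery so the window orders by a plain column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Country TEXT, id INTEGER, Sale INTEGER)")
conn.execute("CREATE TABLE vendors (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("US", 1, 100), ("UK", 2, 1000), ("US", 3, 150), ("UK", 2, 200),
])
conn.executemany("INSERT INTO vendors VALUES (?, ?)", [
    (1, "john"), (2, "david"), (3, "tom"),
])

# Inner query: total sales per vendor and country (join on the sales id).
# Outer query: number vendors per country by total, keep the top one.
query = """
SELECT country, id, name, sum_sales
FROM (SELECT country, id, name, sum_sales,
             ROW_NUMBER() OVER (PARTITION BY country
                                ORDER BY sum_sales DESC) AS seqnum
      FROM (SELECT s.Country AS country, s.id AS id, v.name AS name,
                   SUM(s.Sale) AS sum_sales
            FROM sales s
            JOIN vendors v ON v.id = s.id
            GROUP BY s.Country, s.id, v.name) AS totals) AS sv
WHERE seqnum = 1
ORDER BY country
"""
result = conn.execute(query).fetchall()
print(result)  # [('UK', 2, 'david', 1200), ('US', 3, 'tom', 150)]
```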
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY sum_sale DESC LIMIT 1)[OFFSET(0)]
FROM (
SELECT country, id, name, SUM(sale) sum_sale
FROM `project.dataset.vendors`
JOIN `project.dataset.sales`
USING(id)
GROUP BY id, name, country
) t
GROUP BY country
I have to find the 3 highest-spending customers for every week, using the tables Customer (customer name, id) and Order (order id, order amt, order date). If I run the query today, it should show the top 3 for every week in which an order date exists.
I am thinking about partitioning over the date (weekly), but I can't find a way to do that. Has anyone done a weekly partition of results?
I know it's not right, but this is what I have:
Select Top 3 customer_name, id OVER (partition by [week])
(
Select c.customer_name, c.id, o.order_amt,
from customer c
Join Order o
on c.id=o.id
group by c.id
)
Based on your table structure, where orders.order_id refers to customer.id, use this statement:
select
*
from
(
select
details.*
,dense_rank() over (partition by week_num order by order_amt desc) as rank_num
from
(
select
c.id as customer_id
,c.name
,sum(o.order_amt) as order_amt
,datepart(WEEK,o.order_date) as week_num
from customer c
join orders o on c.id=o.order_id
group by c.id,c.name,datepart(WEEK,o.order_date)
)details
)dets
where dets.rank_num<=3
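The same "aggregate per week, then rank within each week" idea can be tried in a small sketch. It uses SQLite 3.25+ via Python's sqlite3 module, where `strftime('%Y-%W', ...)` stands in for `datepart(WEEK, ...)`. Adding the year is a deliberate change that keeps week 1 of different years apart. The `orders.customer_id` column, the customer names, and all rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
# Hypothetical schema: orders.customer_id links each order to a customer.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
             "order_amt INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee"), (5, "eve")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, 100, "2024-01-01"), (2, 2, 300, "2024-01-02"),
    (3, 3, 200, "2024-01-03"), (4, 4, 50, "2024-01-04"),
    (5, 1, 150, "2024-01-05"), (6, 4, 500, "2024-01-08"),
    (7, 5, 20, "2024-01-09"),
])

# Inner query: total spend per customer per (year, week).
# Outer query: dense-rank customers inside each week and keep ranks 1-3.
query = """
SELECT week, customer_id, total
FROM (SELECT week, customer_id, total,
             DENSE_RANK() OVER (PARTITION BY week
                                ORDER BY total DESC) AS rank_num
      FROM (SELECT strftime('%Y-%W', o.order_date) AS week,
                   c.id AS customer_id,
                   SUM(o.order_amt) AS total
            FROM customer c
            JOIN orders o ON o.customer_id = c.id
            GROUP BY strftime('%Y-%W', o.order_date), c.id) AS details) AS dets
WHERE rank_num <= 3
ORDER BY week, total DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Customer 4 spends only 50 in the first week, so it drops out there but tops the second week.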
Update: changed the statement to use just two tables.
The query should be something like this:
Select customer_name, id, order_amt, [week]
from (
Select c.customer_name, c.id, o.order_amt, datepart(week, o.order_date) as [week],
rn = row_number() over (partition by datepart(week, o.order_date) order by o.order_amt desc)
from customer c
Join [Order] o
on c.id=o.id
) d
where rn <= 3
Here is an idea:
;WITH CTE
AS (
SELECT c.customer_name
,c.id
,o.order_amt
,datepart(wk, o.order_date) AS Weekcol
FROM customer c
JOIN [Order] o ON c.id = o.id
)
,CTE1
AS (
SELECT customer_name
,id
,order_amt
,Weekcol
,ROW_NUMBER() OVER (
PARTITION BY Weekcol ORDER BY order_amt DESC
) AS rowNUm
FROM CTE
)
SELECT *
FROM CTE1
WHERE rowNUm <= 3
In SQL Server, suppose we have a SALES_HISTORY table as below.
CustomerNo PurchaseDate ProductId
1 20120411 12
1 20120330 13
2 20120312 14
3 20120222 16
3 20120109 16
... and many records for each purchase of each customer...
How can I write a query that, for each customer:
finds the product he bought most often,
and finds the percentage of this product over all products he bought?
The result table must have columns like:
CustomerNo,
MostPurchasedProductId,
MostPurchasedProductPercentage
Assuming SQL Server 2005+, you can do the following:
;WITH CTE AS
(
SELECT *,
COUNT(*) OVER(PARTITION BY CustomerNo, ProductId) TotalProduct,
COUNT(*) OVER(PARTITION BY CustomerNo) Total
FROM YourTable
), CTE2 AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY CustomerNo
ORDER BY TotalProduct DESC)
FROM CTE
)
SELECT CustomerNo,
ProductId MostPurchasedProductId,
CAST(TotalProduct AS NUMERIC(16,2))/Total*100 MostPurchasedProductPercentage
FROM CTE2
WHERE RN = 1
You still need to decide how to handle ties, when more than one product is the most purchased. Here is a SQL Fiddle demo for you to try.
This could be a lot prettier, but it works:
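One way to handle ties, sketched under assumptions: break them on the lowest ProductId. This rule is hypothetical; pick whatever rule suits your data. The sketch runs the CTE approach on SQLite 3.25+ via Python's sqlite3 module instead of SQL Server, using the sample rows from the question. `100.0 *` forces decimal division, in place of the CAST:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES_HISTORY "
             "(CustomerNo INTEGER, PurchaseDate TEXT, ProductId INTEGER)")
conn.executemany("INSERT INTO SALES_HISTORY VALUES (?, ?, ?)", [
    (1, "20120411", 12), (1, "20120330", 13), (2, "20120312", 14),
    (3, "20120222", 16), (3, "20120109", 16),
])

# cte: per-row counts for (customer, product) and for the whole customer.
# cte2: number each customer's rows, most-bought product first; on a tie the
#       lower ProductId wins (hypothetical tie-break rule).
query = """
WITH cte AS (
    SELECT CustomerNo, ProductId,
           COUNT(*) OVER (PARTITION BY CustomerNo, ProductId) AS TotalProduct,
           COUNT(*) OVER (PARTITION BY CustomerNo) AS Total
    FROM SALES_HISTORY
), cte2 AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CustomerNo
                              ORDER BY TotalProduct DESC, ProductId) AS RN
    FROM cte
)
SELECT CustomerNo,
       ProductId AS MostPurchasedProductId,
       100.0 * TotalProduct / Total AS MostPurchasedProductPercentage
FROM cte2
WHERE RN = 1
ORDER BY CustomerNo
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 12, 50.0), (2, 14, 100.0), (3, 16, 100.0)]
```

Customer 1 bought products 12 and 13 once each, so the tie-break decides the result there.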
with cte as(
select CustomerNo, ProductId, count(1) as c
from SALES_HISTORY
group by CustomerNo, ProductId)
select CustomerNo, ProductId as MostPurchasedProductId, (t.c * 1.0)/(select sum(c) from cte t2 where t.CustomerNo = t2.CustomerNo) as MostPurchasedProductPercentage
from cte t
where c = (select max(c) from cte t2 where t.CustomerNo = t2.CustomerNo)
Consider the following:
ProductID Supplier
--------- --------
111 Microsoft
112 Microsoft
222 Apple Mac
222 Apple
223 Apple
In this example, product 222 is repeated because the supplier is known by two names in the supplied data.
I have data like this for thousands of products. How can I delete the duplicate products, or select one row per product, perhaps with a self join and SELECT TOP 1?
Thanks!
I think you want to do the following:
select t.*
from (select t.*,
row_number() over (partition by ProductID order by (select NULL)) as seqnum
from t
) t
where seqnum = 1
This selects an arbitrary row for each product.
To delete all rows but one, you can use the same idea:
with todelete as (
select t.*,
row_number() over (partition by ProductID order by (select NULL)) as seqnum
from t
)
delete from todelete where seqnum > 1
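SQL Server can delete through the CTE directly; SQLite cannot. The sketch below swaps in a different technique for SQLite 3.25+ (via Python's sqlite3 module): number the rows in a subquery and delete by `rowid`. The table name `products` is hypothetical. Ordering by `rowid` keeps the first-inserted row per ProductID, which is an arbitrary choice here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (ProductID INTEGER, Supplier TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    (111, "Microsoft"), (112, "Microsoft"),
    (222, "Apple Mac"), (222, "Apple"), (223, "Apple"),
])

# Number rows within each ProductID (first inserted = 1) and delete the rest.
conn.execute("""
DELETE FROM products
WHERE rowid IN (
    SELECT rid
    FROM (SELECT rowid AS rid,
                 ROW_NUMBER() OVER (PARTITION BY ProductID
                                    ORDER BY rowid) AS seqnum
          FROM products)
    WHERE seqnum > 1)
""")
remaining = conn.execute(
    "SELECT ProductID, Supplier FROM products ORDER BY ProductID"
).fetchall()
print(remaining)
```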
DELETE a
FROM tableName a
LEFT JOIN
(
SELECT Supplier, MIN(ProductID) min_ID
FROM tableName
GROUP BY Supplier
) b ON a.supplier = b.supplier AND
a.ProductID = b.min_ID
WHERE b.Supplier IS NULL
Or, if you want to delete the extra rows for each ProductID that appears more than once:
WITH cte
AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Supplier) rn
FROM tableName
)
DELETE FROM cte WHERE rn > 1
;WITH Products_CTE AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY <some value>) as rn
FROM PRODUCTS
)
SELECT *
FROM Products_CTE
WHERE rn = 1
<some value> is the key that determines which version of Supplier you keep. For example, to keep the first instance of the supplier, you could order by a DateAdded column, if one exists.
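Here is a minimal sketch of choosing the ORDER BY key. It assumes a hypothetical `DateAdded` column with made-up dates and runs on SQLite 3.25+ via Python's sqlite3 module. Ordering by DateAdded keeps the earliest-added supplier name for each ProductID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DateAdded column; the question's table has only two columns.
conn.execute("CREATE TABLE products (ProductID INTEGER, Supplier TEXT, "
             "DateAdded TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (111, "Microsoft", "2019-03-01"), (112, "Microsoft", "2019-03-02"),
    (222, "Apple Mac", "2020-05-01"), (222, "Apple", "2019-01-15"),
    (223, "Apple", "2019-02-10"),
])

# DateAdded is the "<some value>" key: the earliest row per product gets rn = 1.
query = """
WITH Products_CTE AS (
    SELECT ProductID, Supplier,
           ROW_NUMBER() OVER (PARTITION BY ProductID
                              ORDER BY DateAdded) AS rn
    FROM products
)
SELECT ProductID, Supplier
FROM Products_CTE
WHERE rn = 1
ORDER BY ProductID
"""
kept = conn.execute(query).fetchall()
print(kept)
```

Ordering by DateAdded DESC instead would keep "Apple Mac" for product 222.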
I have a table Posts which has a memberID and createdDate.
I need to return the most recent post per member, and the posts must be ordered with the most recent at the top.
I am not sure how to do this in SQL Server. Can anyone help?
WITH PostsRanked AS (
SELECT
memberID, postTitle, createdDate,
RANK() OVER (
PARTITION BY memberID
ORDER BY createdDate DESC
) AS rk
FROM Posts
)
SELECT
memberID, postTitle, createdDate
FROM PostsRanked
WHERE rk = 1
ORDER BY createdDate DESC
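The RANK() version can be tried in a small sketch. It runs on SQLite 3.25+ via Python's sqlite3 module and uses hypothetical rows, with ISO date strings so text ordering matches date ordering. Because RANK() gives equal ranks to ties, member 3, with two posts on the same date, returns both rows. Use ROW_NUMBER() if exactly one row per member is required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (memberID INTEGER, postTitle TEXT, "
             "createdDate TEXT)")
# Hypothetical rows; member 3 has two posts on the same date.
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?)", [
    (1, "hello", "2024-01-01"), (1, "update", "2024-02-01"),
    (2, "first", "2024-01-15"),
    (3, "a", "2024-03-01"), (3, "b", "2024-03-01"),
])

# Rank each member's posts newest first, keep rank 1, newest overall on top.
# memberID and postTitle are extra sort keys so the output order is stable.
query = """
WITH PostsRanked AS (
    SELECT memberID, postTitle, createdDate,
           RANK() OVER (PARTITION BY memberID
                        ORDER BY createdDate DESC) AS rk
    FROM Posts
)
SELECT memberID, postTitle, createdDate
FROM PostsRanked
WHERE rk = 1
ORDER BY createdDate DESC, memberID, postTitle
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```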
select p.*
from post p
join (select memberId,
max(createdDate) as maxd
from post
group by memberId) as p2 on p.member_id=p2.member_id
and p.createdDate=p2.maxd
order by p.createdDate desc
Here is the working query
select p.*
from post p join
(
select memberId, max(createdDate) as maxd
from post
group by memberId
) as p2 on p.memberid = p2.memberid and p.createdDate=p2.maxd
order by p.createdDate desc
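The join-to-MAX approach needs no window functions, so it also works on database versions that lack them. Below is a minimal sketch on SQLite via Python's sqlite3 module, using the `post` table name from the query and hypothetical rows. Like RANK(), it returns every post that shares a member's latest date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (memberId INTEGER, postTitle TEXT, "
             "createdDate TEXT)")
# Hypothetical rows; member 3 has two posts on the same (latest) date.
conn.executemany("INSERT INTO post VALUES (?, ?, ?)", [
    (1, "hello", "2024-01-01"), (1, "update", "2024-02-01"),
    (2, "first", "2024-01-15"),
    (3, "a", "2024-03-01"), (3, "b", "2024-03-01"),
])

# p2 holds each member's latest date; joining back finds the matching posts.
query = """
SELECT p.memberId, p.postTitle, p.createdDate
FROM post p
JOIN (SELECT memberId, MAX(createdDate) AS maxd
      FROM post
      GROUP BY memberId) AS p2
  ON p.memberId = p2.memberId AND p.createdDate = p2.maxd
ORDER BY p.createdDate DESC, p.memberId, p.postTitle
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```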