Incremental count of duplicates - sql

The following query displays duplicates in a table with the qty alias showing the total count, eg if there are five duplicates then all five will have the same qty = 5.
select s.*, t.*
from [Migrate].[dbo].[Table1] s
join (
select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[d1] and s.[product] = t.[h1]
ORDER BY s.[product], s.[date], s.[id]
Is it possible to amend the count(*) as qty to show an incremental count so that five duplicates would display 1,2,3,4,5?

The answer to your question is row_number(). How you use it is rather unclear, because you provide no guidance, such as sample data or desired results. Hence this answer is rather general:
select s.*, t.*,
row_number() over (partition by s.product order by s.date) as seqnum
from [Migrate].[dbo].[Table1] s join
(select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t
on s.[date] = t.[d1] and s.[product] = t.[h1]
order by s.[product], s.[date], s.[id];
The speculation is that the duplicates are by product, and this enumerates them by date within each product. Some combination of the partition by and group by columns is almost certainly what you need.
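To make the idea concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The sample rows are invented, and it assumes the duplicates are defined by the (date, product) pair, as in the question's group by. It replaces the join with count(*) over, which is a swapped-in variation rather than the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (id integer primary key, date text, product text)")
# Hypothetical sample data: three duplicates of (2020-01-01, A), two of (2020-01-01, B).
conn.executemany(
    "insert into Table1 (date, product) values (?, ?)",
    [("2020-01-01", "A"), ("2020-01-01", "A"), ("2020-01-01", "A"),
     ("2020-01-02", "A"), ("2020-01-01", "B"), ("2020-01-01", "B")],
)

query = """
select id, date, product, qty, seqnum
from (select s.*,
             count(*)     over (partition by date, product)             as qty,
             row_number() over (partition by date, product order by id) as seqnum
      from Table1 s) x
where qty > 1              -- keep only rows that have at least one duplicate
order by product, date, id
"""
dup_rows = conn.execute(query).fetchall()
for row in dup_rows:
    print(row)
```

Each duplicate group keeps its total in qty, while seqnum restarts at 1 for every group and counts upward.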

Related

SQL get top 3 values / bottom 3 values with group by and sum

I am working on a restaurant management system. There I have two tables
order_details(orderId,dishId,createdAt)
dishes(id,name,imageUrl)
My customer wants to see a report top 3 selling items / least selling 3 items by the month
For the moment I did something like this
SELECT
*
FROM
(SELECT
SUM(qty) AS qty,
order_details.dishId,
MONTHNAME(order_details.createdAt) AS mon,
dishes.name,
dishes.imageUrl
FROM
rms.order_details
INNER JOIN dishes ON order_details.dishId = dishes.id
GROUP BY order_details.dishId , MONTHNAME(order_details.createdAt)) t
ORDER BY t.qty
This gives me all the dishes sold count order by qty.
I have to manually filter max 3 records and reject the rest. There should be a SQL way of doing this. How do I do this in SQL?
You would use row_number() for this purpose. You don't specify the database you are using, so I am guessing at the appropriate date functions. I also assume that you mean a month within a year, so you need to take the year into account as well:
SELECT ym.*
FROM (SELECT YEAR(od.CreatedAt) as yyyy,
MONTH(od.createdAt) as mm,
SUM(qty) AS qty,
od.dishId, d.name, d.imageUrl,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_desc,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) ASC) as seqnum_asc
FROM rms.order_details od INNER JOIN
dishes d
ON od.dishId = d.id
GROUP BY YEAR(od.CreatedAt), MONTH(od.CreatedAt), od.dishId
) ym
WHERE seqnum_asc <= 3 OR
seqnum_desc <= 3;
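The two-row-number approach can be tried out with a small sketch in Python and the standard-library sqlite3 module (SQLite 3.25 or later). The sample rows and the qty column on order_details are assumptions made for illustration, strftime replaces YEAR/MONTH because SQLite lacks them, and the aggregation is moved into a CTE so the window functions run over already-summed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table order_details (dishId integer, qty integer, createdAt text)")
# Hypothetical data: seven dishes in one month, dish 1 sold in two separate orders.
conn.executemany(
    "insert into order_details values (?, ?, ?)",
    [(1, 4, "2021-01-03"), (1, 6, "2021-01-20"), (2, 20, "2021-01-05"),
     (3, 30, "2021-01-07"), (4, 40, "2021-01-09"), (5, 50, "2021-01-11"),
     (6, 60, "2021-01-13"), (7, 70, "2021-01-15")],
)

query = """
with monthly as (
    select strftime('%Y-%m', createdAt) as ym, dishId, sum(qty) as total
    from order_details
    group by strftime('%Y-%m', createdAt), dishId
)
select ym, dishId, total
from (select m.*,
             row_number() over (partition by ym order by total desc) as seqnum_desc,
             row_number() over (partition by ym order by total asc)  as seqnum_asc
      from monthly m) r
where seqnum_desc <= 3 or seqnum_asc <= 3   -- top 3 or bottom 3 per month
order by ym, total desc
"""
top_bottom = conn.execute(query).fetchall()
for row in top_bottom:
    print(row)
```

Dish 4, which sits in the middle of the ranking, is the only one filtered out.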
Using the above info, I used a combination of group by, order by and limit,
as shown below. I hope this is what you are looking for:
SELECT
t.qty,
t.dishId,
t.month,
d.name,
d.imageUrl
from
(
SELECT
od.dishId,
count(od.dishId) AS 'qty',
date_format(od.createdAt,'%Y-%m') as 'month'
FROM
rms.order_details od
group by date_format(od.createdAt,'%Y-%m'),od.dishId
order by qty desc
limit 3) t
join rms.dishes d on (t.dishId = d.id)

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
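To see the difference of row numbers at work, here is a small sketch in Python with the standard-library sqlite3 module (SQLite 3.25 or later), loaded with the ten prices from the question. The table name t comes from the query above; the grp alias and the use of SQLite instead of the asker's database are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date integer, price integer)")
prices = [100, 100, 115, 120, 120, 100, 100, 120, 120, 120]
conn.executemany("insert into t values (?, ?)", list(enumerate(prices, start=1)))

inner = """
select t.*,
       row_number() over (order by date) as seqnum,
       row_number() over (partition by price order by date) as seqnum2
from t
"""
# Rows of the same price that are adjacent in date share the same difference.
for date, price, seqnum, seqnum2 in conn.execute(inner + " order by date"):
    print(date, price, seqnum, seqnum2, seqnum - seqnum2)

islands = conn.execute(f"""
select price, min(date), max(date)
from ({inner}) x
group by price, (seqnum - seqnum2)
order by min(date)
""").fetchall()
print(islands)
```

For dates 1 and 2 the difference is 0, while for dates 6 and 7 (also price 100) it is 3, so the two runs of price 100 fall into separate groups.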
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

MAX of SUMs from GROUP BY of JOIN

I'm stuck.
I have two tables:
First, [PurchasedItemsByCustomer] with the columns:
[CustID] INT NULL,
[ItemId] INT NULL,
[Quantity] INT NULL,
[OnDate] DATE NULL
Second, Table [Items] with the columns:
[ItemId] INT NULL,
[Price] FLOAT NULL,
[CategoryId] INT NULL
I need to output a list with 3 columns:
a month
the category which sold the most (in items quantity) in that month
how many items from that category were purchased in that month.
Thank you
I think you can use a query like this:
;With SoldPerMonth as (
select datepart(month, p.onDate) [Month], i.CategoryId [Category], sum(p.Quantity) [Count]
from PurchasedItemsByCustomer p
join Items i on p.ItemId = i.ItemId
group by datepart(month, p.onDate), i.CategoryId
), SoldPerMonthRanked as (
select *, rank() over (partition by [Month] order by [Count] desc) rnk
from SoldPerMonth
)
select [Month], [Category], [Count]
from SoldPerMonthRanked
where rnk = 1;
SQL Server Demo
Note: In the above query, rank() returns every category that ties for the maximum in a month. If you want to return just one row per month, use row_number() instead.
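The difference between rank() and row_number() on ties can be checked with a short sketch in Python and the standard-library sqlite3 module (SQLite 3.25 or later). The pre-aggregated SoldPerMonth table, its sample rows, and the category tie-breaker are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table SoldPerMonth (month integer, category integer, cnt integer)")
# Hypothetical totals: categories 10 and 20 tie for first place in month 1.
conn.executemany(
    "insert into SoldPerMonth values (?, ?, ?)",
    [(1, 10, 5), (1, 20, 5), (1, 30, 2), (2, 10, 7), (2, 20, 1)],
)

template = """
select month, category, cnt
from (select s.*,
             {fn} over (partition by month order by cnt desc{tiebreak}) as rnk
      from SoldPerMonth s) r
where rnk = 1
order by month, category
"""
# rank() gives tied rows the same rank, so both leaders of month 1 survive.
with_rank = conn.execute(template.format(fn="rank()", tiebreak="")).fetchall()
# row_number() is always unique; the extra sort key makes the pick deterministic.
with_row_number = conn.execute(
    template.format(fn="row_number()", tiebreak=", category")
).fetchall()
print(with_rank)
print(with_row_number)
```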
Divide et Impera:
with dept_sales as(
select month(p.OnDate) as month, year(p.OnDate) as year, i.CategoryId as category, sum(p.Quantity) as N -- measure items sold for each month and category
from PurchasedItemsByCustomer p join Items i on p.ItemId = i.ItemId
group by year(p.OnDate), month(p.OnDate), i.CategoryId)
select top 1 * --pick the highest
from dept_sales
where year = year(current_timestamp) -- I imagine you need data only for current year
order by N desc --order by N asc if you want the least selling category
If you don't group by year, the Januaries of all years are merged into a single 'January' entry, so I added a filter on the current year.
I used a CTE for code clarity, to split the phases of the calculation; you can nest them if you want to.
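The point about grouping by year can be illustrated with a tiny sketch in Python and the standard-library sqlite3 module. The purchases table and its rows are invented for the example, and strftime stands in for the month() and year() functions, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table purchases (OnDate text, Quantity integer)")
# Hypothetical rows: two different Januaries and one February.
conn.executemany(
    "insert into purchases values (?, ?)",
    [("2020-01-10", 3), ("2021-01-15", 4), ("2021-02-01", 5)],
)

# Grouping by month only: both Januaries collapse into one row.
by_month = conn.execute("""
select cast(strftime('%m', OnDate) as integer) as month, sum(Quantity)
from purchases
group by cast(strftime('%m', OnDate) as integer)
order by month
""").fetchall()

# Grouping by year and month keeps them apart.
by_year_month = conn.execute("""
select cast(strftime('%Y', OnDate) as integer) as year,
       cast(strftime('%m', OnDate) as integer) as month,
       sum(Quantity)
from purchases
group by strftime('%Y', OnDate), strftime('%m', OnDate)
order by year, month
""").fetchall()

print(by_month)
print(by_year_month)
```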
Here you go,
SELECT
A.[CategoryId],
A.[Month],
A.[CategoryMonthCount]
FROM
(
SELECT
A.[CategoryId],
A.[Month],
A.[CategoryMonthCount],
RANK() OVER(
PARTITION BY A.[Month]
ORDER BY A.[CategoryMonthCount] DESC) [RN]
FROM
(
SELECT
I.[CategoryId],
MONTH(PIBC.[OnDate]) [Month],
SUM(PIBC.[Quantity]) [CategoryMonthCount]
FROM
[dbo].[PurchasedItemsByCustomer] PIBC
JOIN
[dbo].[Items] I
ON PIBC.[ItemId] = I.[ItemId]
GROUP BY
I.[CategoryId],
MONTH(PIBC.[OnDate])
) A
) A
WHERE
A.[RN] = 1;

Tweaking a Query - looking for duplicates within a certain day range

I posted a question similar to this, and got an answer, but the answer isn't configurable - my fault I should have been more clear, so I'll try again.
I have a table where TABLENAME has the following information - OrderDate, OrderNumber, CustomerID, ProductSKU, ProductName exist. This table has lines for invoices. So an order will have a data line for every item in the order.
I want to know which customers have ordered the same item more than once, where the order is within 90 days of any other order of that same product by that customer, after a specific date. The same product within the same order number does not count. The catch is that I want "more than once" to be configurable, so if I need to see 3 or more, or 4 or more, I can adjust, AND I want to see the counts. Here's the query I have so far, which I think gives me the items and the counts, but not the 90-day part:
EDITED: I don't think the former version gave me the right counts
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >'2017-11-01'
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 2
Try doing this. It'll go back 90 days:
declare @date date = '2017-11-01'
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >= dateadd(DD,-90,@date) and orderdate <= @date
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 1
Yes, that is what I was doing in the first query. This might be a really crappy way of doing it, but without seeing any data it was kind of tough. This query gives you the order dates as well. Hope it helps:
WITH DupsWithin90Days (customerid,productsku,productname,orderdate,num)
as
(
select customerid,productsku,productname,orderdate ,count(*) num from (
SELECT X.customerid, X.productsku, X.productname,X.ORDERDATE,ROW_NUMBER() OVER (partition by x.customerid,x.orderdate order by x.orderdate) rownum
FROM
(
SELECT T1.customerid, T1.productsku, T1.productname,T1.ORDERDATE
FROM TABLENAME1 T1
) X
JOIN
(
SELECT T2.customerid, T2.productsku, T2.productname,T2.ORDERDATE
FROM
TABLENAME1 T2
) Y
ON X.customerid = Y.customerid AND X.orderdate >= dateadd(DD,-90,Y.orderdate)
) dup
where rownum > 1
group by customerid,productsku,productname,orderdate
)
select customerid,productsku,productname,orderdate
from DupsWithin90Days
order by customerid ,orderdate desc
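For the configurable threshold asked about in the question, one possible reading is sketched below in Python with the standard-library sqlite3 module. Everything here is an assumption made for illustration: the sample rows, the repeat_buyers helper, the use of julianday in place of T-SQL date functions, and the interpretation that an order counts when it lies within 90 days before or after some anchor order of the same customer and product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table tablename
    (orderdate text, ordernumber text, customerid integer, productsku text)""")
conn.executemany("insert into tablename values (?, ?, ?, ?)", [
    ("2018-01-01", "O1", 1, "X"),
    ("2018-01-01", "O1", 1, "X"),   # same order listed twice: counted once
    ("2018-02-01", "O2", 1, "X"),
    ("2018-03-15", "O3", 1, "X"),
    ("2018-01-01", "O4", 2, "X"),
    ("2018-06-01", "O5", 2, "X"),   # 151 days after O4: outside the window
])

query = """
select customerid, productsku, max(n) as orders_in_window
from (select a.customerid, a.productsku, a.ordernumber,
             count(distinct b.ordernumber) as n   -- includes the anchor order itself
      from tablename a
      join tablename b
        on b.customerid = a.customerid
       and b.productsku = a.productsku
       and abs(julianday(b.orderdate) - julianday(a.orderdate)) <= :window_days
      where a.orderdate > :start_date and b.orderdate > :start_date
      group by a.customerid, a.productsku, a.ordernumber) w
group by customerid, productsku
having max(n) >= :min_orders
order by customerid, productsku
"""

def repeat_buyers(min_orders, start_date="2017-11-01", window_days=90):
    """Hypothetical helper: customers with at least min_orders distinct orders in a window."""
    params = {"min_orders": min_orders, "start_date": start_date,
              "window_days": window_days}
    return conn.execute(query, params).fetchall()

print(repeat_buyers(3))
print(repeat_buyers(4))
```

Changing min_orders adjusts "more than once" without touching the SQL, and the third column reports the count.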

ROW_NUMBER() OVER (PARTITION BY) showing duplicate results for Group By Clause

I have the query below, which was created to show the sum of the "last" values for a year. Usually this is a December value, but the year could potentially end in any month, so I want to add together the last values for each GoalMonteCarloHeaderID. I have it working 99%, but there are some random duplicates in the [year] value.
WITH endBalances AS (
SELECT ROW_NUMBER() OVER (PARTITION By GoalMonteCarloHeaderID, Year(Convert(date,MonthDate)) Order By Max(Month(Convert(date,MonthDate))) desc) n, Max(Month(Convert(date,MonthDate))) maxMonth, GrowthBucket, WithdrawalBucket, NoTaxesBucket,
Year(MonthDate) [year]
From GoalMonteCarloMedianResults mcmr
full join GoalMonteCarloHeader mch on mch.ID = mcmr.GoalMonteCarloHeaderID
full join GoalChartData gcd on gcd.ID = mch.GoalChartDataID and gcd.TypeID = 2
inner join Goal g on g.iGoalID = gcd.GoalID
where g.iTypeID in (1) and g.iHHID = 850802
group by GoalMonteCarloHeaderID, MonthDate, GrowthBucket, WithdrawalBucket, NoTaxesBucket
)
SELECT [year], Sum(GrowthBucket) GrowthBucket, Sum(WithdrawalBucket) WithdrawalBucket,Sum(NoTaxesBucket) NoTaxesBucket, maxMonth
From endBalances
where [year] is not null and n=1
Group By [year], maxMonth
order by [year] asc
The database result shows two random duplicates: there are two examples where the year is duplicated and displayed for more than just the 'last' month in the year. Am I doing something wrong with the GROUP BY or the PARTITION BY in my query? I am not very familiar with this functionality of T-SQL.
T-SQL has a lovely function for this, which had no direct equivalent in MySQL before version 8.0.
ROW_NUMBER() OVER (PARTITION BY [year] ORDER BY MonthDate DESC) AS rn
Then anything with rn=1 will be the last entry in a year.
The answers to this question have a few ideas:
ROW_NUMBER() in MySQL
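The rn = 1 filter described above can be sketched in Python with the standard-library sqlite3 module (SQLite 3.25 or later). The results table and its rows are invented, strftime stands in for Year(), and the partition is by year only, as in the snippet above; with several headers per year, the header ID would presumably have to be added to the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table results (MonthDate text, GrowthBucket real)")
# Hypothetical data: 2020 ends in December, 2021 ends in June.
conn.executemany(
    "insert into results values (?, ?)",
    [("2020-11-01", 100.0), ("2020-12-01", 110.0),
     ("2021-05-01", 120.0), ("2021-06-01", 130.0)],
)

query = """
select yr, MonthDate, GrowthBucket
from (select strftime('%Y', MonthDate) as yr, MonthDate, GrowthBucket,
             row_number() over (partition by strftime('%Y', MonthDate)
                                order by MonthDate desc) as rn
      from results) x
where rn = 1          -- the latest MonthDate within each year
order by yr
"""
last_per_year = conn.execute(query).fetchall()
print(last_per_year)
```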