Tweaking a Query - looking for duplicates within a certain day range

Tweaking a Query - looking for duplicates within a certain day range - sql

I posted a question similar to this, and got an answer, but the answer isn't configurable - my fault I should have been more clear, so I'll try again.
I have a table where TABLENAME has the following information - OrderDate, OrderNumber, CustomerID, ProductSKU, ProductName exist. This table has lines for invoices. So an order will have a data line for every item in the order.
I want to know, which customers have ordered the same item, more than once, where the order is within 90 of any other order of that same product by that customer, after a specific date. Same product in the same order number do not count. The catch is that I want "more than once" to be configurable, so if I need to see 3 or more, or 4 or more I can adjust AND I want to see the counts. Here's the query I have so far, which I think gives me the items and the counts - but not the 90 day thing:
EDITED: I don't think the former version gave me the right counts
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >'2017-11-01'
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 2

Try doing this. it'll go back 90 days
declare #date date = '2017-11-01'
SELECT customerid, productsku, productname, count(distinct ordernumber) FROM tablename
WHERE orderdate >= dateadd(DD,-90,#date) and orderdate <= #date
GROUP BY customerid, productsku, productname
HAVING COUNT(distinct ordernumber) > 1

yes that is what I was doing in the first query. so this might be a really crappy way of doing it but without seeing any data it was kind of tough. this query shows gives you the order dates as well. hope it helps
WITH DupsWithin90Days (customerid,productsku,productname,orderdate,num)
as
(
select customerid,productsku,productname,orderdate ,count(*) num from (
SELECT X.customerid, X.productsku, X.productname,X.ORDERDATE,ROW_NUMBER() OVER (partition by x.customerid,x.orderdate order by x.orderdate) rownum
FROM
(
SELECT T1.customerid, T1.productsku, T1.productname,T1.ORDERDATE
FROM TABLENAME1 T1
) X
JOIN
(
SELECT T2.customerid, T2.productsku, T2.productname,T2.ORDERDATE
FROM
TABLENAME1 T2
) Y
ON X.customerid = Y.customerid AND X.orderdate >= dateadd(DD,-90,Y.orderdate)
) dup
where rownum > 1
group by customerid,productsku,productname,orderdate
)
select customerid,productsku,productname,orderdate
from DupsWithin90Days
order by customerid ,orderdate desc

Related

SQL get top 3 values / bottom 3 values with group by and sum

I am working on a restaurant management system. There I have two tables
order_details(orderId,dishId,createdAt)
dishes(id,name,imageUrl)
My customer wants to see a report top 3 selling items / least selling 3 items by the month
For the moment I did something like this
SELECT
*
FROM
(SELECT
SUM(qty) AS qty,
order_details.dishId,
MONTHNAME(order_details.createdAt) AS mon,
dishes.name,
dishes.imageUrl
FROM
rms.order_details
INNER JOIN dishes ON order_details.dishId = dishes.id
GROUP BY order_details.dishId , MONTHNAME(order_details.createdAt)) t
ORDER BY t.qty
This gives me all the dishes sold count order by qty.
I have to manually filter max 3 records and reject the rest. There should be a SQL way of doing this. How do I do this in SQL?

You would use row_number() for this purpose. You don't specify the database you are using, so I am guessing at the appropriate date functions. I also assume that you mean a month within a year, so you need to take the year into account as well:
SELECT ym.*
FROM (SELECT YEAR(od.CreatedAt) as yyyy,
MONTH(od.createdAt) as mm,
SUM(qty) AS qty,
od.dishId, d.name, d.imageUrl,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_desc,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_asc
FROM rms.order_details od INNER JOIN
dishes d
ON od.dishId = d.id
GROUP BY YEAR(od.CreatedAt), MONTH(od.CreatedAt), od.dishId
) ym
WHERE seqnum_asc <= 3 OR
seqnum_desc <= 3;

Using the above info i used i combination of group by, order by and limit
as shown below. I hope this is what you are looking for
SELECT
t.qty,
t.dishId,
t.month,
d.name,
d.mageUrl
from
(
SELECT
od.dishId,
count(od.dishId) AS 'qty',
date_format(od.createdAt,'%Y-%m') as 'month'
FROM
rms.order_details od
group by date_format(od.createdAt,'%Y-%m'),od.dishId
order by qty desc
limit 3) t
join rms.dishes d on (t.dishId = d.id)

Incremental count of duplicates

The following query displays duplicates in a table with the qty alias showing the total count, eg if there are five duplicates then all five will have the same qty = 5.
select s.*, t.*
from [Migrate].[dbo].[Table1] s
join (
select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[d1] and s.[product] = t.[h1]
ORDER BY s.[product], s.[date], s.[id]
Is it possible to amend the count(*) as qty to show an incremental count so that five duplicates would display 1,2,3,4,5?

The answer to your question is row_number(). How you use it is rather unclear, because you provide no guidance, such as sample data or desired results. Hence this answer is rather general:
select s.*, t.*,
row_number() over (partition by s.product order by s.date) as seqnum
from [Migrate].[dbo].[Table1] s join
(select [date] as d1, [product] as h1, count(*) as qty
from [Migrate].[dbo].[Table1]
group by [date], [product]
having count(*) > 1
) t
on s.[date] = t.[d1] and s.[product] = t.[h1]
order by s.[product], s.[date], s.[id];
The speculation is that the duplicates are by product. This enumerates them by date. Some combination of the partition by and group by is almost certainly what you need.

Summing a column over a date range in a CTE?

I'm trying to sum a certain column over a certain date range. The kicker is that I want this to be a CTE, because I'll have to use it multiple times as part of a larger query. Since it's a CTE, it has to have the date column as well as the sum and ID columns, meaning I have to group by date AND ID. That will cause my results to be grouped by ID and date, giving me not a single sum over the date range, but a bunch of sums, one for each day.
To make it simple, say we have:
create table orders (
id int primary key,
itemID int foreign key references items.id,
datePlaced datetime,
salesRep int foreign key references salesReps.id,
price int,
amountShipped int);
Now, we want to get the total money a given sales rep made during a fiscal year, broken down by item. That is, ignoring the fiscal year bit:
select itemName, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName
Simple enough. But when you add anything else, even the price, the query spits out way more rows than you wanted.
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
group by itemName, price
Now, each group is (name, price) instead of just (name). This is kind of sudocode, but in my database, just this change causes my result set to jump from 13 to 32 rows. Add to that the date range, and you really have a problem:
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between 150101 and 151231
group by itemName, price
This is identical to the last example. The trouble is making it a CTE:
with totals as (
select itemName, price, sum(price) as totalSales, sum(totalShipped) as totalShipped, orderDate as startDate, orderDate as endDate
from orders
join items on items.id = orders.itemID
where orders.salesRep = '1234'
and orderDate between startDate and endDate
group by itemName, price, startDate, endDate
)
select totals_2015.itemName as itemName_2015, totals_2015.price as price_2015, ...
totals_2016.itemName as itemName_2016, ...
from (
select * from totals
where startDate = 150101 and endDate = 151231
) totals_2015
join (
select *
from totals
where startDate = 160101 and endDate = 160412
) totals_2016
on totals_2015.itemName = totals_2016.itemName
Now the grouping in the CTE is way off, more than adding the price made it. I've thought about breaking the price query into its own subquery inside the CTE, but I can't escape needing to group by the dates in order to get the date range. Can anyone see a way around this? I hope I've made things clear enough. This is running against an IBM iSeries machine. Thank you!

Depending on what you are looking for, this might be a better approach:
select 'by sales rep' breakdown
, salesRep
, '' year
, sum(price * amountShipped) amount
from etc
group by salesRep
union
select 'by sales rep and year' breakdown
, salesRep
, convert(char(4),orderDate, 120) year
, sum(price * amountShipped) amount
from etc
group by salesRep, convert(char(4),orderDate, 120)
etc

When possible group by the id columns or foreign keys because the columns are indexed already you'll get faster results. This applies to any database.
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from sommewhere
where date between x and y
group by id,rep
) select * from cte order by rep
or more fancy
with cte as (
select id,rep, sum(sales) sls, count(distinct itemid) did, count(*) cnt from sommewhere
where date between x and y
group by id,rep
) select * from cte join reps on cte.rep = reps.rep order by sls desc

I eventually found a solution, and it doesn't need a CTE at all. I wanted the CTE to avoid code duplication, but this works almost as well. Here's a thread explaining summing conditionally that does exactly what I was looking for.

Selecting only if at least one row matches condition

I have a select statement and want to return all values only if at least one of them has a date with 60 days of difference from today.
The problem is that i have an outer apply which returns the column i want to compare to, and they come from different tables (one belongs to cash items, and the other to card items).
Considering I have the following:
OUTER APPLY (
SELECT COUNT(*) AS quantity, MIN(date) AS item_date
FROM dbo.get_cash_items(loans.id_cash) AS cash_item
HAVING loans.id_product_type = 1 --Cash
UNION
SELECT COUNT(*) AS quantity, MIN(date) AS item_date
FROM dbo.get_card_items(loans.id_card) AS card_item
HAVING loans.id_product_type = 2 --Card
) AS items
I want to return all the rows only when DATEDIFF(DAY, MIN(items.item_date), GETDATE()) >= 60, but I want them all even if only one matches this condition.
What would be the best approach to do this?
EDIT
To make it clearer, I'll explain the use case:
I need to show the items of every loan, only if the client is late for more than 60 days of the due date on any of it

I am also not sure, what do you expect, but how about that:
WITH items
AS (SELECT Count(*) AS quantity,
Min(date) AS item_date
FROM dbo.Get_cash_items(loans.id_cash) AS cash_item
HAVING loans.id_product_type = 1
UNION
SELECT Count(*) AS quantity,
Min(date) AS item_date
FROM dbo.Get_card_items(loans.id_card) AS card_item
HAVING loans.id_product_type = 2)
SELECT a.*
FROM items AS a,
(SELECT TOP 1 *
FROM items AS b
WHERE Datediff(day, b.item_date, Getdate()) >= 60) AS c
It's a sort of CROSS JOIN, where table C will have one or zero rows depending on that if the condition is met - it will than join to every row in other table.

Have you tried something like this?
SELECT a.quantity, a.item_date
FROM
(SELECT COUNT(*) AS quantity, MIN(date) AS item_date
FROM dbo.get_cash_items(loans.id_cash) AS cash_item
HAVING loans.id_product_type = 1
UNION
SELECT COUNT(*) AS quantity, MIN(date) AS item_date
FROM dbo.get_card_items(loans.id_card) AS card_item
HAVING loans.id_product_type = 2) a
WHERE DATEDIFF(day, a.item_date, GETDATE()) >= 60

Typically I do this using a CTE to select the key for the records I want to select and then join on that. Below is an attempt at an example:
with LateClients as
(
SELECT LoadId FROM Payment Where /*payment date later than 60 days*/
)
SELECT p.LoanId,
p.UserId
FROM Payment as p
INNER JOIN LateClients as LC
ON p.LoanId = lc.LoanId
OrderBy p.LoanId, p.UserId
I know it's a bit different from the code you posted, but this is a simplified example that should explain the concept. Good luck!

SQL Grouping Issues

I'm attempting to write a query that will return any customer that has multiple work orders with these work orders falling on different days of the week. Every work order for each customer should be falling on the same day of the week so I want to know where this is not the case so I can fix it.
The name of the table is Core.WorkOrder, and it contains a column called CustomerId that specifies which customer each work order belongs to. There is a column called TimeWindowStart that can be used to see which day each work order falls on (I'm using DATENAME(weekday, TimeWindowStart) to do so).
Any ideas how to write this query? I'm stuck here.
Thanks!

Select ...
From WorkOrder As W
Where Exists (
Select 1
From WorkOrder As W1
And W1.CustomerId = W.CustomerId
And DatePart( dw, W1.TimeWindowStart ) <> DatePart( dw, W.TimeWindowStart )
)

SELECT *
FROM (
SELECT *,
COUNT(dp) OVER (PARTITION BY CustomerID) AS cnt
FROM (
SELECT DISTINCT CustomerID, DATEPART(dw, TimeWindowStart) AS dp
FROM workOrder
) q
) q
WHERE cnt >= 2

SELECT CustomerId,
MIN(DATENAME(weekday, TimeWindowStart)),
MAX(DATENAME(weekday, TimeWindowStart))
FROM Core.WorkOrder
GROUP BY CustomerId
HAVING MIN(DATENAME(weekday, TimeWindowStart)) != MAX(DATENAME(weekday, TimeWindowStart))

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas