SQL Server: Group similar sales together

I'm trying to do some reporting in SQL Server.
Here's the basic table setup:
Order (ID, DateCreated, Status)
Product(ID, Name, Price)
Order_Product_Mapping(OrderID, ProductID, Quantity, Price, DateOrdered)
I want to create a report that groups products with similar sales totals over a time period, like this:
Sales over 1 month:
Coca, Pepsi, Tiger: $20000 average(coca:$21000, pepsi: $19000, tiger: $20000)
Bread, Meat: $10000 avg (bread:$11000, meat: $9000)
Note that the text in parentheses is only there to clarify; it is not needed in the report.
The user defines how much sales may vary and still be considered similar. For example, products whose sales vary by less than 5% are considered similar and should be grouped together. The time period is also user-defined.
I can calculate total sales over a period, but I have no idea how to group products by how much their sales vary. I'm using SQL Server 2012.
Any help is appreciated.
Sorry, my English is not very good :)
UPDATE: *I figured out what I actually need ;)*
For a known array of numbers like: 1,2,3,50,52,100,102,105
I need to split them into groups that have at least 3 numbers each, where the difference between any two items in a group is smaller than 10.
For the above array, output should be:
[1,2,3]
[100,102,105]
=> the algorithm takes 3 parameters: the array, the minimum number of items needed to form a group, and the maximum difference between 2 items.
How can I implement this in C#?

By the way, if you just want C#:
using System.Collections.Generic;
using System.Linq;
var maxDifference = 10;
var minItems = 3;
// I just assume your list is not ordered, so order it first
var array = new List<int> { 3, 2, 50, 1, 51, 100, 105, 102 }.OrderBy(a => a).ToList();
var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
    // totalDiff accumulates the gaps between neighbours, so it equals the
    // distance between the current number and the first number in the current group
    totalDiff += n - lastNum;
    // if that distance is within the threshold, add the number to the current group
    if (totalDiff <= maxDifference)
    {
        group.Add(n);
        lastNum = n;
        continue;
    }
    // if the current group has minItems items or more, add it to the final result
    if (group.Count >= minItems)
        result.Add(group);
    // start a new group
    group = new List<int>() { n };
    lastNum = n;
    totalDiff = 0;
}
// don't forget the last group
if (group.Count >= minItems)
    result.Add(group);
The key here is that the array needs to be ordered, so that you do not need to jump around or store values to calculate distances.
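For comparison, here is a minimal Python sketch of the same greedy pass, run on the array from the question. The function name `group_close_numbers` is made up for illustration, and it follows the C# code above in treating a difference equal to the maximum as still belonging to the group (the question says "smaller than", so change `>` to `>=` if you need the strict version).

```python
def group_close_numbers(values, min_items, max_difference):
    """Greedy pass over the sorted values: a group keeps growing while the
    current value stays within max_difference of the group's first value."""
    groups = []
    current = []
    for n in sorted(values):
        # the current value is too far from the start of the group: close it
        if current and n - current[0] > max_difference:
            if len(current) >= min_items:
                groups.append(current)
            current = []
        current.append(n)
    # the last group is still open when the loop ends
    if len(current) >= min_items:
        groups.append(current)
    return groups


result = group_close_numbers([1, 2, 3, 50, 52, 100, 102, 105], 3, 10)
print(result)  # [[1, 2, 3], [100, 102, 105]]
```

Because the values are sorted, comparing against the first item of the group is enough to bound the difference between any two items in it.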

I can't believe I did it~~~
-- this threshold is the key to this query:
-- if the difference between two neighbouring values is
-- no more than the threshold,
-- the two values belong to the same group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200
-- very simple, calculate total price for a time range
;WITH totals AS (
SELECT p.name AS col, sum(op.price * op.quantity) AS val
FROM order_product_mapping op
JOIN [order] o ON o.id = op.orderid
JOIN product p ON p.id = op.productid
WHERE dateordered > '2013-03-01' AND dateordered < '2013-04-01'
GROUP BY p.name
),
-- give a row number for each row
cte_rn AS ( --
SELECT col, val, row_number()over(ORDER BY val DESC) rn
FROM totals
),
-- the show starts now:
-- first, we let each row know which row comes before it
cte_last_rn AS (
SELECT col, val, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
FROM cte_rn
),
-- then we join each row to the row before it and calculate
-- the difference between the previous row's total and the current row's total;
-- if the difference is more than the threshold we mark it 1, otherwise 0
cte_range AS (
SELECT
c1.col, c1.val,
CASE
WHEN c2.val - c1.val <= @th THEN 0
ELSE 1
END AS range,
rn
FROM cte_last_rn c1
JOIN cte_rn c2 ON lrn = rn
),
-- even trickier here:
-- now we join the last CTE to itself and, for each row,
-- sum the 0/1 flags calculated previously over all rows up to and including the current row
cte_rank AS (
SELECT c1.col, c1.val, sum(c2.range) rank
FROM cte_range c1
JOIN cte_range c2 ON c1.rn >= c2.rn
GROUP BY c1.col, c1.val
)
-- now the totals are properly grouped, and we can group on the rank
SELECT
avg(c1.val) AVG,
(
SELECT c2.col + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) product,
(
SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2
The result will look like:
AVG  PRODUCT  PRICE
28   A, B, C  30, 29, 27
12   D, E, F  15, 12, 10
3    G, H, I  4, 3, 2
To understand how the concatenation works, please read this question:
Concatenate many rows into a single text string?
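The chain of CTEs above boils down to three steps: sort the totals in descending order, flag every row whose gap to the previous row exceeds the threshold, and take a running sum of those flags as the group number. Here is a small Python sketch of that logic, useful for checking the grouping by hand. The function name `chain_groups`, the threshold of 3 and the product totals are illustrative assumptions chosen to match the sample result; the average uses integer division, as an integer AVG does in SQL Server.

```python
from itertools import accumulate


def chain_groups(totals, threshold, min_items=3):
    # sort by total, highest first (the ROW_NUMBER() ... ORDER BY val DESC step)
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # 1 marks a row whose gap to the previous (larger) total exceeds the threshold
    breaks = [0] + [1 if prev[1] - cur[1] > threshold else 0
                    for prev, cur in zip(rows, rows[1:])]
    # running sum of the flags gives each row its group number
    ranks = list(accumulate(breaks))
    groups = {}
    for (name, val), rank in zip(rows, ranks):
        groups.setdefault(rank, []).append((name, val))
    # keep groups with enough members (HAVING count(1) > 2 in the query)
    return [
        (sum(v for _, v in members) // len(members),
         [n for n, _ in members],
         [v for _, v in members])
        for members in groups.values()
        if len(members) >= min_items
    ]


sample = {"A": 30, "B": 29, "C": 27, "D": 15, "E": 12, "F": 10,
          "G": 4, "H": 3, "I": 2}
report = chain_groups(sample, threshold=3)
for avg, products, prices in report:
    print(avg, products, prices)
```

Note that this chains neighbours together: two products can land in the same group even if they differ by more than the threshold, as long as every step between them is small enough.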

This query should produce what you expect; it displays product sales for every month in which you have orders:
SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month ,
Product.Name ,
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
Product.Name
Not tested, but if you provide sample data I can work on it.

Looks like I made things more complicated than they needed to be.
Here is what should solve the problem:
- Run a query to get the sales for each product.
- Run k-means or a similar clustering algorithm on the results.
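As a rough illustration of the second step, here is a minimal one-dimensional k-means sketch in plain Python, run on the sales figures from the question. The function name `kmeans_1d` and the deterministic choice of starting centres are assumptions made for this example, and it requires k to be no larger than the number of values. Note that k-means needs the number of groups up front, which is not the same thing as the user-defined percentage from the question.

```python
def kmeans_1d(values, k, iterations=100):
    values = sorted(values)
    # deterministic start: spread the initial centres across the sorted values
    centres = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # assignment step: every value joins its nearest centre
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # update step: move each centre to the mean of its cluster
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters


sales = [21000, 19000, 20000, 11000, 9000]
clusters = kmeans_1d(sales, k=2)
print(clusters)  # [[9000, 11000], [19000, 20000, 21000]]
```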

Related

SQL self join to get count difference between records

Pardon the title as I could not think of a good title for my problem.
I have a table as below
L_DATE      GRP  Counts
20.01.2023  A    100
21.01.2023  A    150
22.01.2023  B    200
20.01.2023  C    500
21.01.2023  C    800
22.01.2023  C    1200
The desired output is like this
GRP  Current Count  Last Count  Diff1  Last2Last Count  Diff2
A    0              150         -150   100              -100
B    200            0           200    0                200
C    1200           800         400    500              700
where,
Current Count is the count of latest date - 22.01.2023
Last Count is the count of previous date - 21.01.2023
Last2Last Count is the count of last to last date - 20.01.2023
Diff1 is the difference between Current Count and Last Count
Diff2 is the difference between Current Count and Last2Last Count
0 appears where there is no data for that date, for example A does not have any record for latest date 22.01.2023 so its 'Current Count' is 0. Similarly B does not have any record for 21.01.2023 or 20.01.2023 so its 'Last Count' and 'Last2Last Count' is 0.
I have tried all sorts of joins but cannot achieve the desired results. Below is my latest code, which gives me result of C and B but not A.
select distinct
T1.GRP,
T1.Counts as "Current Count",
ifnull(T2.Counts,0) as "Last Count",
T1.Counts - T2.Counts as "Diff1",
ifnull(T3.Counts,0) as "Last2Last Count",
T1.Counts - T3.Counts as "Diff2"
from tbl T1
left join tbl T2 on (T2.L_DATE = '21.01.2023' and T2.GRP = T1.GRP)
left join tbl T3 on (T3.L_DATE = '20.01.2023' and T3.GRP = T1.GRP)
where T1.L_DATE = ('22.01.2023')
I tried to achieve it via GROUP BY but did not succeed. Any help or guidance is appreciated.
Generate test data
CREATE TABLE TEST (L_DATE DATE, GRP VARCHAR(1), COUNTS INTEGER);
INSERT INTO TEST VALUES ('20.1.2023', 'A', 100);
INSERT INTO TEST VALUES ('21.1.2023', 'A', 150);
INSERT INTO TEST VALUES ('22.1.2023', 'B', 200);
INSERT INTO TEST VALUES ('20.1.2023', 'C', 500);
INSERT INTO TEST VALUES ('21.1.2023', 'C', 800);
INSERT INTO TEST VALUES ('22.1.2023', 'C', 1200);
Next you need to "fill the empty lines". For dates you may want to use SERIES_GENERATE instead, if not all dates are present in the data.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT el.L_DATE, el.GRP, ifnull(t.COUNTS, 0) AS COUNTS
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
As you proposed, two self-joins based on this intermediate result would do the job. However, I would prefer to use window function LAG instead.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
Note that this gives you the desired result also for historical dates. You can add a WHERE condition for the current date. Also you can additionally calculate the differences:
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT L_DATE, GRP, COUNTS, LASTCOUNT, COUNTS-LASTCOUNT DIFF1, LAST2LASTCOUNT, COUNTS-LAST2LASTCOUNT DIFF2
FROM
(
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
)
WHERE L_DATE = '22.1.2023'
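To check the expected numbers independently of the database, the same idea (build a dense date-by-group series with 0 for missing rows, then look one and two positions back) can be sketched in Python. The function name `latest_with_lags` is made up for this example; the rows are the sample data from the question.

```python
from datetime import date

rows = [
    (date(2023, 1, 20), "A", 100),
    (date(2023, 1, 21), "A", 150),
    (date(2023, 1, 22), "B", 200),
    (date(2023, 1, 20), "C", 500),
    (date(2023, 1, 21), "C", 800),
    (date(2023, 1, 22), "C", 1200),
]


def latest_with_lags(rows):
    counts = {(d, g): c for d, g, c in rows}
    dates = sorted({d for d, _, _ in rows})
    groups = sorted({g for _, g, _ in rows})
    report = {}
    for g in groups:
        # dense series per group: 0 where the (date, group) pair is absent
        series = [counts.get((d, g), 0) for d in dates]
        current = series[-1]
        # like LAG, there is no value when the history is too short
        last = series[-2] if len(series) >= 2 else None
        last2 = series[-3] if len(series) >= 3 else None
        report[g] = (
            current,
            last,
            None if last is None else current - last,
            last2,
            None if last2 is None else current - last2,
        )
    return report


report = latest_with_lags(rows)
for grp, values in report.items():
    print(grp, values)
```

Each tuple holds Current Count, Last Count, Diff1, Last2Last Count and Diff2, in that order.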

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the hours SUM of all [Hours] including only a single result from each [DevelopmentID] where [Revision] is highest value
e.g SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE @CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
Use ROW_NUMBER() to identify the latest revision (note the descending order, so that the highest Revision gets R = 1):
SELECT SUM([Hours])
FROM (
    SELECT s.[Hours], R = ROW_NUMBER() OVER (PARTITION BY d.DevID
                                             ORDER BY s.Revision DESC)
    FROM [dbo].[tblDev] d
    JOIN [dbo].[tblSpec] s
        ON d.[DevID] = s.[DevID]
) d
WHERE R = 1
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2 WHERE s2.DevID = s.DevID)
GROUP BY s.DevId;
Also, don't use WITH (NOLOCK) unless you really know what you are doing -- and I'm guessing you do not. It is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square brackets. They just make the query harder to write and to read.

SQL - combining consecutive months of the same block with same quantity

This question will seem very easy at first, but the complexity hits as you start writing. I have attached a picture below with the result set of my SQL. The result is 39 rows. I need to combine all the consecutive rows of the same block that have the same value. With this example, the end result should be 29 rows, where each set of red-boxed rows below should be consolidated into 1 row.
So, for example, the first red box with quantity = 40 should combine into 1 row with term_start = 2017-06-01 and term_end = 2017-08-01.
Here's my code:
SELECT
pp.position
, term_start = pq.begtime
, term_end = pq.endtime
, quantity = CONVERT(VARCHAR,convert(double precision, pq.energy))
, block = p.block
FROM trade t
INNER JOIN position p on p.trade = t.trade
INNER JOIN powerposition pp on p.position = pp.position
INNER JOIN powerquantity pq on pq.position = pp.position
AND pq.posdetail = pp.posdetail
AND pq.quantitystatus = 'TRADE'
WHERE 1=1
AND p.positionmode = 'PHYSICAL'
AND t.collaboration = 13119572
I've been stuck on this problem for three days straight now. I've explored using CTEs and Row_Number() over () but with no success. Any help would be greatly appreciated!!
You are looking for consecutive values. Here is one way, using a difference of row numbers to identify a group:
with t as (<your query here>)
select min(term_start), max(term_end), block, quantity
from (select t.*,
(row_number() over (partition by block order by position) -
row_number() over (partition by quantity, block order by position)
) as grp
from t
) t
group by quantity, grp, block;
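To see why the difference of two row numbers identifies a run, here is a small Python sketch that computes both counters by hand. The function name `merge_runs`, the block name and the sample rows are hypothetical, and the rows are ordered by term_start within each block (an assumption; the query above orders by position).

```python
from collections import defaultdict


def merge_runs(rows):
    """rows: (block, term_start, term_end, quantity) tuples."""
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    per_block = defaultdict(int)      # row_number() partitioned by block
    per_block_qty = defaultdict(int)  # row_number() partitioned by quantity, block
    islands = {}
    for block, start, end, qty in rows:
        per_block[block] += 1
        per_block_qty[(block, qty)] += 1
        # the difference stays constant while the quantity repeats consecutively
        grp = per_block[block] - per_block_qty[(block, qty)]
        key = (block, qty, grp)
        if key in islands:
            first, last = islands[key]
            islands[key] = (min(first, start), max(last, end))
        else:
            islands[key] = (start, end)
    return sorted((s, e, b, q) for (b, q, _), (s, e) in islands.items())


# hypothetical sample: ISO date strings sort correctly as plain text
sample = [
    ("PEAK", "2017-06-01", "2017-07-01", 40),
    ("PEAK", "2017-07-01", "2017-08-01", 40),
    ("PEAK", "2017-08-01", "2017-09-01", 25),
    ("PEAK", "2017-09-01", "2017-10-01", 40),
]
merged = merge_runs(sample)
for row in merged:
    print(row)
```

The two leading rows with quantity 40 collapse into one row, while the later 40 stays separate because the 25 in between changes the difference.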

SQL query for adding column value to compare with other column

I have two tables:
table_inventory
- inventory_rack_key (primary key)
- node_key
- rack_id
- inventory_item_key
- in_coming_qty
- locked_qty
- quantity
table_item
- inventory_item_key (primary key)
- item_id
- product_zone
Example tables are provided here: DB TABLES
I need a query to find the items for which net_qty, i.e. the sum of in_coming_qty and quantity minus locked_qty, is negative, arranged by node_key, rack_id, item_id, net_qty.
Note: each distinct set {node_key, rack_id, item_id, net_qty} will have only 1 row in the output.
For example, {node_key, rack_id, item_id} = {ABD101, RK-01, 562879} has 4 rows in table_inventory,
but the output has a single row with net_qty = -78.
The query I wrote gives me the result, but can it be done some other way?
SELECT l.node_key,
l.rack_id,
i.item_id,
( SUM(l.quantity + l.in_coming_qty) - SUM(l.locked_qty) ) AS net_qty
FROM table_inventory l,
table_item i
WHERE l.inventory_item_key = i.inventory_item_key
GROUP BY l.node_key,
l.rack_id,
i.item_id
HAVING SUM(l.quantity + l.in_coming_qty) - SUM(l.locked_qty) < 0
Not really. There is this minor variant:
select v.* from (
SELECT l.node_key,
l.rack_id,
i.item_id,
SUM(l.quantity + l.in_coming_qty - l.locked_qty) AS net_qty
FROM table_inventory l,
table_item i
WHERE l.inventory_item_key = i.inventory_item_key
GROUP BY l.node_key,
l.rack_id,
i.item_id
) v
where net_qty < 0
This means that the SUM calculation only needs to be written once, but you do still need to do a SUM.

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the RptPeriod doesn't run consecutively for the account. For example, 344 200803 would have an X next to it, since the account goes from 200801 straight to 200803.
This is for about 19321 rows, and I want it on a per-company basis: I don't care how the numbers compare between different companies, I just want to see where the same company has breaks in the number pattern.
Any ideas?
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less than this row's RptPeriod
else 'x' -- Preceding row's RptPeriod is not exactly one less than this row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT DISTINCT r.*
FROM report r
LEFT JOIN report r2
ON r2.accountnumber = r.accountnumber
-- "next period" written as + 1, assuming RptPeriod is a yyyymm integer (ignores year rollover)
AND r2.rptperiod = r.rptperiod + 1
JOIN report r3
ON r3.accountnumber = r.accountnumber
AND r3.rptperiod > r.rptperiod
WHERE r2.rptperiod IS NULL
I'm not sure of SQL Server's date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is missing (r2 is NULL) and there exists at least one greater rptPeriod (r3). The query isn't super straightforward, I guess, but if you have an index on the two columns, it'll probably be the most efficient way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
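SELECT
r1.AccountNumber,
r1.RptPeriod,
-- compare the month parts, adding 12 when the previous row falls in an earlier year
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 > r2.RptPeriod / 100 THEN 12 ELSE 0 END
+ r1.RptPeriod % 100 - r2.RptPeriod % 100, 1)
WHEN 1 THEN ''
ELSE 'X'
END AS Chk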
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.
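As a cross-check for any of the queries above, here is a small Python sketch that flags the same rows. It assumes RptPeriod is a yyyymm integer and converts it to a running month count, so year transitions and gaps longer than a year are handled as well. The function names `months` and `flag_gaps` are made up for this example; the data is the result set from the question.

```python
def months(period):
    # yyyymm -> number of months, so that consecutive periods differ by exactly 1
    return (period // 100) * 12 + period % 100


def flag_gaps(rows):
    flagged = []
    previous = {}  # last RptPeriod seen per account
    for account, period in sorted(rows):
        prev = previous.get(account)
        # the first row of an account is never flagged
        gap = prev is not None and months(period) - months(prev) != 1
        flagged.append((account, period, "X" if gap else ""))
        previous[account] = period
    return flagged


report = [(123, 200801), (123, 200802), (123, 200803),
          (234, 200801), (344, 200801), (344, 200803)]
flags = flag_gaps(report)
for row in flags:
    print(row)
```

Only 344 200803 gets an X here, which matches the example in the question.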