SQL - combining consecutive months of the same block with same quantity - sql

This question seems very easy at first, but the complexity hits as soon as you start writing. I have attached a picture below with the result set of my SQL. The result is 39 rows. I need to combine all the consecutive rows of the same block with the same value. With this example, the end result should be 29 rows, where each group of red-boxed rows below is consolidated into 1 row.
For example, the first red box with quantity = 40 should combine into 1 row with term_start = 2017-06-01 and term_end = 2017-08-01.
Here's my code:
SELECT
pp.position
, term_start = pq.begtime
, term_end = pq.endtime
, quantity = CONVERT(VARCHAR,convert(double precision, pq.energy))
, block = p.block
FROM trade t
INNER JOIN position p on p.trade = t.trade
INNER JOIN powerposition pp on p.position = pp.position
INNER JOIN powerquantity pq on pq.position = pp.position
AND pq.posdetail = pp.posdetail
AND pq.quantitystatus = 'TRADE'
WHERE 1=1
AND p.positionmode = 'PHYSICAL'
AND t.collaboration = 13119572
I've been stuck on this problem for three days straight now. I've explored using CTEs and Row_Number() over () but with no success. Any help would be greatly appreciated!!

You are looking for consecutive values. Here is one way, using a difference of row numbers to identify a group:
with t as (<your query here>)
select min(term_start), max(term_end), block, quantity
from (select t.*,
(row_number() over (partition by block order by position) -
row_number() over (partition by quantity, block order by position)
) as grp
from t
) t
group by quantity, grp, block;
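To see why the difference of row numbers identifies a group, here is a minimal Python sketch of the same idea on in-memory rows. It is only an illustration, not the query itself: the name `merge_consecutive` and the tuple layout are made up for this example, and it orders rows by term_start within each block (an assumption; the query above orders by position).

```python
from collections import defaultdict
from itertools import count


def merge_consecutive(rows):
    """rows: iterable of (term_start, term_end, block, quantity) tuples.

    Hypothetical helper mirroring the difference-of-row-numbers trick.
    Dates are assumed to be ISO strings (or anything that sorts correctly).
    """
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))  # by block, then term_start
    rn_block = defaultdict(count)       # row_number() over (partition by block)
    rn_block_qty = defaultdict(count)   # row_number() over (partition by quantity, block)
    islands = {}
    for start, end, block, qty in ordered:
        # The difference stays constant while the quantity repeats without interruption.
        grp = next(rn_block[block]) - next(rn_block_qty[(block, qty)])
        key = (block, qty, grp)
        if key in islands:
            first, last = islands[key]
            islands[key] = (min(first, start), max(last, end))
        else:
            islands[key] = (start, end)
    return sorted((s, e, b, q) for (b, q, _), (s, e) in islands.items())


sample = [
    ("2017-06-01", "2017-06-30", "A", 40),
    ("2017-07-01", "2017-07-31", "A", 40),
    ("2017-08-01", "2017-08-31", "A", 40),
    ("2017-09-01", "2017-09-30", "A", 50),
    ("2017-10-01", "2017-10-31", "A", 40),
]
for merged in merge_consecutive(sample):
    print(merged)
```

The three 40s at the start collapse into one row, while the later 40 stays separate because the 50 in between shifts the second row number.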

Related

Self join taking time

The query below selects the SUM of the AAD_00TO30 column subject to some conditions.
It executes in 1 second when I remove the condition below, but takes more than a minute when that condition is included.
Can someone please suggest an alternative way to write the query for better performance?
AND A.AAD_DATE >= (SELECT MAX(B.AAD_DATE)
FROM MST_AR_AS_ON_DATE B
WHERE MONTH(B.AAD_DATE) = MONTH(A.AAD_DATE) AND YEAR(B.AAD_DATE) = YEAR(A.AAD_DATE))
Query:
SELECT '00-30 #66ff66',SUM(A.AAD_00TO30) FROM MST_AR_AS_ON_DATE A
WHERE MONTH(A.AAD_DATE) = MONTH(DATEADD(MM,-1,GETDATE()))
AND YEAR(A.AAD_DATE) = YEAR(DATEADD(MM,-1,GETDATE()))
AND A.AAD_RESP_NOW = 4
AND A.AAD_DATE >= (SELECT MAX(B.AAD_DATE)
FROM MST_AR_AS_ON_DATE B
WHERE MONTH(B.AAD_DATE) = MONTH(A.AAD_DATE) AND YEAR(B.AAD_DATE) = YEAR(A.AAD_DATE))
Try using RANK() to tag rows that meet the criteria of having the last date of the month. Then eliminate rows without a winning rank:
WITH
MST_AR_AS_ON_DATE_RANKED AS (
SELECT
*,
RANK() OVER (
PARTITION BY
YEAR(AAD_DATE),
MONTH(AAD_DATE)
ORDER BY
AAD_DATE DESC -- last day of month ranked highest
) AS AAD_DATE_RANK
FROM
MST_AR_AS_ON_DATE
)
SELECT
'00-30 #66ff66',
SUM(AAD_00TO30)
FROM
MST_AR_AS_ON_DATE_RANKED
WHERE
MONTH(AAD_DATE) = MONTH(DATEADD(MM,-1,GETDATE()))
AND YEAR(AAD_DATE) = YEAR(DATEADD(MM,-1,GETDATE()))
AND AAD_RESP_NOW = 4
AND AAD_DATE_RANK = 1
;
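If it helps to see the logic outside SQL, here is a rough Python sketch of what the ranked query computes: find the latest date in each (year, month), then sum only the rows that fall on that date. The function name and the (date, amount) row shape are assumptions made for the example. Like RANK(), it keeps every row tied for the latest date.

```python
from datetime import date


def sum_on_last_date_per_month(rows):
    """rows: list of (aad_date, amount). Returns {(year, month): total}.

    Hypothetical helper; one pass finds the latest date per month,
    a second pass sums the rows that fall on that date.
    """
    latest = {}
    for d, _ in rows:
        key = (d.year, d.month)
        if key not in latest or d > latest[key]:
            latest[key] = d
    totals = {}
    for d, amount in rows:
        key = (d.year, d.month)
        if d == latest[key]:  # same effect as AAD_DATE_RANK = 1
            totals[key] = totals.get(key, 0) + amount
    return totals


rows = [
    (date(2024, 1, 30), 5),
    (date(2024, 1, 31), 10),
    (date(2024, 1, 31), 7),
    (date(2024, 2, 28), 3),
]
print(sum_on_last_date_per_month(rows))
```

The point is the same as in the SQL: the maximum per month is computed once for the whole set rather than once per outer row, which is what the correlated subquery was doing.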

Select all rows with max date for each ID

I have the following query returning the data as shown below. But I need to exclude the rows with MODIFIEDDATETIME shown in red as they have a lower time stamp by COMMITRECID. As depicted in the data, there may be multiple rows with the max time stamp by COMMITRECID.
SELECT REQCOMMIT.COMMITSTATUS, NOTEHISTORY.NOTE, NOTEHISTORY.MODIFIEDDATETIME, NOTEHISTORY.COMMITRECID
FROM REQCOMMIT INNER JOIN NOTEHISTORY ON REQCOMMIT.RECID = NOTEHISTORY.COMMITRECID
WHERE REQCOMMIT.PORECID = 1234
Here is the result of the above query
The desired result is only 8 rows with 5 in Green and 3 in Black (6 in Red should get eliminated).
Thank you very much for your help :)
Use RANK:
WITH CTE AS
(
SELECT R.COMMITSTATUS,
N.NOTE,
N.MODIFIEDDATETIME,
N.COMMITRECID,
RN = RANK() OVER(PARTITION BY N.COMMITRECID ORDER BY N.MODIFIEDDATETIME)
FROM REQCOMMIT R
INNER JOIN NOTEHISTORY N
ON R.RECID = N.COMMITRECID
WHERE R.PORECID = 1234
)
SELECT *
FROM CTE
WHERE RN = 1;
As an aside, please try to use table aliases instead of the whole table name in your queries.
*Disclaimer: You said that you wanted the max date, but the selected values in your post were those with the min date, so I used that criterion in my answer.
This method just limits your history table to the rows with the MIN date, as you described.
SELECT
REQCOMMIT.COMMITSTATUS,
NOTEHISTORY.NOTE,
NOTEHISTORY.MODIFIEDDATETIME,
NOTEHISTORY.COMMITRECID
FROM REQCOMMIT
INNER JOIN NOTEHISTORY ON REQCOMMIT.RECID = NOTEHISTORY.COMMITRECID
INNER JOIN (SELECT COMMITRECID, MIN(MODIFIEDDATETIME) DT FROM NOTEHISTORY GROUP BY COMMITRECID) a on a.COMMITRECID = NOTEHISTORY.COMMITRECID and a.DT = NOTEHISTORY.MODIFIEDDATETIME
WHERE REQCOMMIT.PORECID = 1234

Postgres: Making column in first row contain sum of same column in other rows

I'm a newbie in Postgres and I have a troubling issue.
Suppose the output of my SQL query is
123456789;"2014-11-20 12:30:35.454875";500;200;"2014-11-16 16:16:26.976258";300
123456789;"2014-11-20 12:30:35.454875";500;200;"2014-11-16 16:16:27.173523";100
What I want is to sum up the 4th column across all rows, so that the first row contains that sum:
123456789;"2014-11-20 12:30:35.454875";500;400;"2014-11-16 16:16:26.976258";300
My query is
select l.phone_no, l.loan_time, l.cents_loaned/100, r.cents_deducted/100, r.event_time,
r.cents_balance/100
from tbl_table1 l
LEFT JOIN tbl_table2 r
ON l.tb1_id = r.tbl2_id
where l.phone_no=123456789
order by r.event_time desc
Any help will be appreciated.
Maybe this helps. It returns the first row (latest event_time) with its 4th column replaced by the sum of the 4th column; the other rows are returned unchanged.
WITH query AS (
SELECT l.phone_no, l.loan_time, l.cents_loaned/100 AS cents_loaned,
r.cents_deducted/100 AS cents_deducted, r.event_time,
r.cents_balance/100 AS cents_balance,
ROW_NUMBER() OVER (ORDER BY r.event_time DESC) rn,
SUM(cents_deducted/100) OVER () AS sum_cents_deducted
FROM tbl_table1 l
LEFT JOIN tbl_table2 r
ON l.tb1_id = r.tbl2_id
WHERE l.phone_no=123456789
)
SELECT phone_no, loan_time, cents_loaned, cents_deducted, event_time, cents_balance
FROM query
WHERE rn > 1
UNION ALL
SELECT phone_no, loan_time, cents_loaned, sum_cents_deducted, event_time, cents_balance
FROM query
WHERE rn = 1
Use a window function over the whole set (OVER ()) as frame:
select l.phone_no, l.loan_time, l.cents_loaned/100
, sum(r.cents_deducted) OVER () / 100 AS total_cents_deducted
, r.event_time, r.cents_balance/100
FROM tbl_table1 l
LEFT JOIN tbl_table2 r ON l.tb1_id = r.tbl2_id
WHERE l.phone_no = 123456789
ORDER BY r.event_time desc
This will return all rows, not just the first. Your question is unclear as to that.
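For anyone unsure what the empty OVER () does, here is a tiny Python sketch of the behaviour: the grand total is computed once over the whole result set and attached to every row. The function name and the dict-based rows are made up for illustration.

```python
def with_grand_total(rows, column):
    """Attach the sum of `column` over all rows to each row (hypothetical helper)."""
    # Like SUM(...) OVER (), NULLs (None here, e.g. from the LEFT JOIN) are ignored.
    total = sum(row[column] for row in rows if row[column] is not None)
    return [{**row, "total_" + column: total} for row in rows]


rows = [{"cents_deducted": 200}, {"cents_deducted": 200}]
for row in with_grand_total(rows, "cents_deducted"):
    print(row)
```

Every row comes back, each carrying the same total, which matches the note above that the query returns all rows and not just the first.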

SQL Server: Group similar sales together

I'm trying to do some reporting in SQL Server.
Here's the basic table setup:
Order (ID, DateCreated, Status)
Product(ID, Name, Price)
Order_Product_Mapping(OrderID, ProductID, Quantity, Price, DateOrdered)
Here I want to create a report that groups products with a similar amount of sales over a time period, like this:
Sales over 1 month:
Coca, Pepsi, Tiger: $20000 average(coca:$21000, pepsi: $19000, tiger: $20000)
Bread, Meat: $10000 avg (bread:$11000, meat: $9000)
Note that the text in () is just to clarify; it is not needed in the report.
The user defines how much sales may vary and still be considered similar. For example, sales that vary by less than 5% are considered similar and should be grouped together. The time period is also user-defined.
I can calculate total sales over a period but have no idea how to group them by how much the sales vary. I'm using SQL Server 2012.
Any help is appreciated.
Sorry, my English is not very good :)
UPDATE: *I figured out what I actually need ;)*
For a known array of numbers like: 1,2,3,50,52,100,102,105
I need to group them into groups which have at least 3 numbers and where the difference between any two items in a group is smaller than 10.
For the above array, output should be:
[1,2,3]
[100,102,105]
=> the algorithm takes 3 params: the array, the minimum number of items to form a group, and the maximum difference between 2 items.
How can I implement this in C#?
By the way, if you just want C#:
var maxDifference = 10;
var minItems = 3;
// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a);
var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
totalDiff += n - lastNum;
// if distance of current number and first number in current group
// is less than the threshold, add into current group
if (totalDiff <= maxDifference)
{
group.Add(n);
lastNum = n;
continue;
}
// if current group has 3 items or more, add to final result
if (group.Count >= minItems)
result.Add(group);
// start new group
group = new List<int>() { n };
lastNum = n;
totalDiff = 0;
}
// don't forget the last group
if (group.Count >= minItems)
result.Add(group);
The key here is that the array needs to be ordered, so that you do not need to jump around or store values to calculate distances.
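For comparison, here is roughly the same approach as a short Python sketch. The function name and defaults are made up for the example. As in the C# above, a value joins the current group as long as it is at most max_difference away from the first value of that group.

```python
def group_close_numbers(values, min_items=3, max_difference=10):
    """Group sorted values; a group spans at most max_difference from its first item."""
    groups, current = [], []
    for n in sorted(values):
        # Too far from the first item of the current group: close it, start a new one.
        if current and n - current[0] > max_difference:
            if len(current) >= min_items:
                groups.append(current)
            current = []
        current.append(n)
    # The last group is still open when the loop ends.
    if len(current) >= min_items:
        groups.append(current)
    return groups


print(group_close_numbers([1, 2, 3, 50, 52, 100, 102, 105]))
# [[1, 2, 3], [100, 102, 105]]
```

Because the input is sorted first, the first item of a group is also its minimum, so checking against it bounds the difference between any two items in the group.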
I can't believe I did it~~~
-- this threshold is the key to this query
-- it means that
-- if the difference between two values is less than the threshold,
-- the two values belong to one group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200
-- very simple, calculate total price for a time range
-- (the price comes from order_product_mapping, since [order] has no price column)
;WITH totals AS (
SELECT p.name AS col, sum(op.price * op.quantity) AS val
FROM order_product_mapping op
JOIN [order] o ON o.id = op.orderid
JOIN product p ON p.id = op.productid
WHERE dateordered > '2013-03-01' AND dateordered < '2013-04-01'
GROUP BY p.name
),
-- give a row number for each row
cte_rn AS (
SELECT col, val, row_number()over(ORDER BY val DESC) rn
FROM totals
),
-- the show starts now:
-- first, we let each row know the row before it
cte_last_rn AS (
SELECT col, val, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
FROM cte_rn
),
-- then we join each row to the row before it, and calculate
-- the difference between the total price of the current row and that of the previous row;
-- if the difference is more than the threshold we make it '1', otherwise '0'
cte_range AS (
SELECT
c1.col, c1.val,
CASE
WHEN c2.val - c1.val <= @th THEN 0
ELSE 1
END AS range,
rn
FROM cte_last_rn c1
JOIN cte_rn c2 ON lrn = rn
),
-- even trickier here:
-- now we join the last cte to itself, and for each row
-- sum all the values (the 0s and 1s calculated previously) of the rows up to the current row
cte_rank AS (
SELECT c1.col, c1.val, sum(c2.range) rank
FROM cte_range c1
JOIN cte_range c2 ON c1.rn >= c2.rn
GROUP BY c1.col, c1.val
)
-- now we have properly grouped these total prices, and we can group on their rank
SELECT
avg(c1.val) AVG,
(
SELECT c2.col + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) product,
(
SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val DESC
FOR xml path('')
) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2
The result will look like:
AVG PRODUCT PRICE
28 A, B, C 30, 29, 27
12 D, E, F 15, 12, 10
3 G, H, I 4, 3, 2
To understand how I did the concatenation, please read this:
Concatenate many rows into a single text string?
This query should produce what you expect; it displays product sales for every month for which you have orders:
SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month ,
Product.Name ,
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
Product.Name
Not tested, but if you provide sample data I could work on it
Looks like I made things more complicated than they should be.
Here is what should solve the problem:
-Run a query to get sales for each product.
-Run k-means or a similar clustering algorithm.

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the records where the RptPeriod doesn't run consecutively for the account. For example, 344 200803 would have an X next to it, since it goes from 200801 to 200803.
This is for about 19321 rows, and I want it on a per-company basis: between different companies I don't care what the numbers are, I just want each company to show where there are breaks in the number pattern.
Any Ideas??
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less than This row's RptPeriod
else 'x' -- Preceding row's RptPeriod is not exactly one less than This row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT DISTINCT r.*
FROM report r
LEFT JOIN report r2
ON r2.accountnumber = r.accountnumber
AND r2.rptperiod = r.rptperiod + 1 -- stands in for "r2.rptperiod is the period right after r.rptPeriod"; adjust for real dates or year rollover
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r.rptPeriod
WHERE r2.rptPeriod IS NULL
I'm not sure of SQL Server's date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is NULL (r2) and there exists at least one greater rptPeriod (r3). The query isn't super straightforward I guess, but if you have an index on the two columns, it'll probably be the most efficient way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
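SELECT
r1.AccountNumber,
r1.RptPeriod,
-- r2 is the preceding row; across a year boundary (e.g. 200812 -> 200901)
-- the raw difference is 89, so adjust it by 12 - 100 to get 1
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 > r2.RptPeriod / 100 THEN 12 - 100 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1)
WHEN 1 THEN ''
ELSE 'X'
END AS Chk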
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.
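The neighbouring-row comparison can also be sketched in a few lines of Python, which may make the year-transition handling easier to follow. This is only an illustration under the same assumption as above (RptPeriod is an integer encoding YYYYMM); the function name is made up. Converting each period to a running month index avoids special-casing December to January.

```python
def flag_gaps(rows):
    """rows: iterable of (account_number, rpt_period), rpt_period as integer YYYYMM.

    Returns (account_number, rpt_period, flag) tuples, with flag 'X' when the
    period does not directly follow the previous period of the same account.
    """
    flagged = []
    previous = {}  # last month index seen per account
    for account, period in sorted(rows):
        month_index = (period // 100) * 12 + period % 100
        prior = previous.get(account)
        # The first row of an account has no predecessor and is never flagged.
        flag = 'X' if prior is not None and month_index - prior != 1 else ''
        flagged.append((account, period, flag))
        previous[account] = month_index
    return flagged


sample = [(123, 200801), (123, 200802), (123, 200803),
          (234, 200801), (344, 200801), (344, 200803)]
for row in flag_gaps(sample):
    print(row)
```

With the sample data from the question, only 344 200803 gets the X.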