TSQL - Calculate difference between values from yesterday to today, SELF JOIN question

I have pieced together code from various answers online to get the result I want, but I don't understand why it works, and I would like to know what the JOIN is actually doing where it says RowNum + 1.
The original problem is to calculate the percentage difference between a value from yesterday and the same value today. I'm a little fuzzy on self joins, but I understand the basic idea; it's the added RowNum column that confuses me.
Question
What is T2.RowNum = T1.RowNum + 1 doing in the self join please?
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;

CREATE TABLE #t1 (
    ProductTotal int
    ,CountDate date
);

INSERT INTO #t1
VALUES
    (893911, '20200815')
    ,(888970, '20200816')
    ,(899999, '20200817');  -- the statement before a CTE must be terminated with a semicolon

WITH cte AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY CountDate) AS RowNum
        ,ProductTotal
        ,CountDate
    FROM #t1
    WHERE CountDate > CAST(GETDATE() - 2 AS DATE)
)
SELECT
    t1.RowNum
    ,t1.ProductTotal
    ,CAST(((t1.ProductTotal - t2.ProductTotal) * 1.0 / t2.ProductTotal) * 100 AS DECIMAL(10,2)) AS ProductDiff
    ,t1.CountDate
FROM cte AS t1
LEFT JOIN cte AS t2 ON T2.RowNum = T1.RowNum + 1

Assuming you have values on each day, a better approach uses lag():
SELECT ProductTotal, CountDate,
       (ProductTotal - prev_ProductTotal) * 1.0 / ProductTotal
FROM (SELECT t.*,
             LAG(ProductTotal) OVER (ORDER BY CountDate) as prev_ProductTotal
      FROM #t1 t
     ) t
WHERE CountDate > CAST(GETDATE() - 1 AS DATE)

Note that, as I commented, I completely agree with Gordon here: LAG (or LEAD) is the right answer. To explain what you asked in the comment, "I don't understand how T2.RowNum = T1.RowNum + 1 works":
A JOIN returns rows where the expression in the ON clause is true. With an INNER JOIN, only rows from both sides of the JOIN where the expression evaluates to true are returned. With a LEFT JOIN, rows from the left-hand side that have no match are not "lost"; they are still returned, with NULLs for the right-hand columns. (There are other types of joins too.)
For T2.RowNum = T1.RowNum + 1 this is basic maths: 2 is matched to 1 (1 + 1), 3 is matched to 2 (2 + 1), ... 100 is matched to 99 (99 + 1). So each row from T1 is matched to the row "after" it in terms of the ROW_NUMBER order defined within the CTE; in this case, that is the row with the next value of CountDate in ascending order.
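Concretely, with the three sample rows above (and assuming all of them pass the CountDate filter), RowNum 1/2/3 correspond to 2020-08-15/16/17, and the LEFT JOIN pairs each t1 row with the t2 row whose RowNum is one higher:
t1.RowNum  t1.CountDate  t1.ProductTotal  t2.RowNum  t2.ProductTotal  ProductDiff
1          2020-08-15    893911           2          888970           0.56
2          2020-08-16    888970           3          899999           -1.23
3          2020-08-17    899999           NULL       NULL             NULL
RowNum 3 has no RowNum 4 to match, so the LEFT JOIN keeps it with NULLs on the t2 side (an INNER JOIN would drop it).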

Related

SQL Server 2008: duplicate a row n-times, where n is a value in a field

In SQL Server 2018 I have three tables:
T1 (idService, dateStart, dateStop)
T2 (idService, totalCostOfService)
T3 (idService, companyName)
Using joins, I created a view:
V1 (idService, dateStart, dateStop, totalCostOfService, companyName)
And we are fine. I can do my selects on the view and obtain the list of services done.
What I would like to do now is to duplicate every row of the view n times, where n=dateStart-dateStop; every row should have a "new" totalCostOfService = totalCostOfService/n.
I can do that using a temporary table, declaring variables, inserting into the temp table with a WHILE loop, etc. Let's call it "the procedure".
But what I would like to understand is:
is it possible to do that directly with a select on V1? If not, is it possible to save "the procedure" as a view so that I can have it as an easy select?
Sorry if my question looks somewhat stupid, but I'm totally new to SQL. I tried searching here and on Google but I couldn't find an answer to my questions.
Thank you!
Rather than an rCTE (which is RBAR), you could use a Tally Table:
WITH N AS (
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS I
    FROM N N1
    CROSS JOIN N N2 --100
    CROSS JOIN N N3 --1000
    CROSS JOIN N N4) --10000
SELECT *
FROM YourTable
JOIN Tally T ON T.I <= dateStart - dateStop --Assumes dateStart and dateStop are integer values, even though their names imply otherwise
                                            --If they are dates, then use DATEDIFF(DAY, dateStart, dateStop)
That tally will generate numbers up to 10,000, which is over 27 years' worth of days; that should be far more than enough.
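Applied to the view from the question, the final query might look like the sketch below (assuming dateStart and dateStop are real dates and that both endpoints should be charged, mirroring the +1 used in the other answers; the column names come from V1 in the question):
;WITH N AS (
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS I
    FROM N N1 CROSS JOIN N N2 CROSS JOIN N N3 CROSS JOIN N N4)
SELECT v.idService,
       DATEADD(DAY, T.I, v.dateStart) AS dateOfService,                              -- one calendar day per generated row
       v.totalCostOfService / (DATEDIFF(DAY, v.dateStart, v.dateStop) + 1.0) AS dailyCostOfService,
       v.companyName
FROM V1 v
JOIN Tally T ON T.I <= DATEDIFF(DAY, v.dateStart, v.dateStop);                       -- one row per day from dateStart to dateStop inclusive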
I will assume the existence of a numbers table which has the column val for the individual values. If you don't have one, you will find plenty of ways to build one by searching around.
Add this at the end of the FROM clause of your view:
cross apply (select datediff(day, T1.dateStart, T1.dateStop) + 1 as n_days) q1  -- number of days, INCLUDING the start
cross apply (select dateadd(day, n.val, T1.dateStart) as day_of_charge
             from numbers n
             where n.val between 0 and n_days - 1) q2
Then you will be able to have the following field on your SELECT:
T2.totalCostOfService/n_days as totalCostOfService
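For completeness, a minimal numbers table matching the name and column assumed above could be built like the sketch below ("numbers" and "val" are just the names this answer assumes, not existing objects):
-- Sketch: a 0..9999 numbers table named "numbers" with a single column "val"
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS val
INTO   numbers
FROM   sys.all_objects a
       CROSS JOIN sys.all_objects b;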
I'll add a numbers table solution shortly.
You can use a recursive CTE:
with cte as (
      select idService, dateStart, dateStop,
             totalCostOfService / (datediff(day, dateStart, dateStop) + 1) as dailyCostOfService,
             companyName
      from v1
      union all
      select idService,
             dateadd(day, 1, dateStart),
             dateStop,
             dailyCostOfService,
             companyName
      from cte
      where dateStart < dateStop  -- stop recursing once the last day has been produced
     )
select idService, dateStart as dateOfService,
       dailyCostOfService, companyName
from cte;
Note that if there are more than 100 days in any row, then you will need to add OPTION (MAXRECURSION 0).
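The hint goes on the outer statement that references the CTE, i.e. on the final SELECT above:
select idService, dateStart as dateOfService,
       dailyCostOfService, companyName
from cte
option (maxrecursion 0);  -- 0 removes the default limit of 100 recursion levels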

Update record for the last week

I'm building a report that needs to show how many users were upgraded from account status 1 to account status 2 each hour for the last week (and drop hours where the number of upgrades is 0). My table has an updated date; however, it isn't certain that the account status is the item being updated (it could be contact information etc.).
The basic table config that I'm working with is below. There are other columns but they aren't needed for my query.
account_id, account_status, updated_date.
My initial idea was to first filter and look at the data for the current week, then find if they were at account_status = 1 and later account_status = 2.
What's the best way to tackle this?
This is the kind of thing that you would use a SELF JOIN for. It's tough to say exactly how to do this without getting any kind of example data, but hopefully you can build off of this at least. There are a lot of tutorials on how to write a successful self join, so I'd refer to those if you're having difficulties.
select a.account_id
from tableName a, tableName b
where a.account_id = b.account_id
  and (a.DateModified > 'YYYY-MM-DD' and a.account_status = 1)
  and (b.DateModified < 'YYYY-MM-DD' and b.account_status = 2)
Maybe you could try this: for each update with a status of 2, rank all of that account's strictly older updates by timestamp descending. Then check whether the entry with rank 1 has status 1, to know that the younger update did change the status from 1 to 2.
SELECT *
FROM elbat t1
WHERE t1.account_status = 2
      AND EXISTS (SELECT *
                  FROM (SELECT rank() OVER (ORDER BY t2.updated_date DESC) r,
                               t2.account_status
                        FROM elbat t2
                        WHERE t2.account_id = t1.account_id
                              AND t2.updated_date < t1.updated_date) x  -- strictly older updates only
                  WHERE x.account_status = 1
                        AND x.r = 1);
Then, to get the hours, you could create a table variable and fill it with a week's worth of hours (unless you already have a suitable calendar/time table). Then INNER JOIN that table (variable) to the result from above. Since it's an INNER JOIN, hours where no status update exists won't be in the result.
DECLARE @current_time datetime = getdate();
DECLARE @current_hour datetime = dateadd(hour,
                                         datepart(hour, @current_time),
                                         convert(datetime, convert(date, @current_time)));

DECLARE @hours TABLE (hour datetime);

DECLARE @interval_size integer = 7 * 24;
WHILE @interval_size > 0
BEGIN
  INSERT INTO @hours (hour)
  VALUES (dateadd(hour, -1 * @interval_size, @current_hour));
  SET @interval_size = @interval_size - 1;
END;

SELECT *
FROM @hours h
     INNER JOIN (SELECT *
                 FROM elbat t1
                 WHERE t1.account_status = 2
                       AND EXISTS (SELECT *
                                   FROM (SELECT rank() OVER (ORDER BY t2.updated_date DESC) r,
                                                t2.account_status
                                         FROM elbat t2
                                         WHERE t2.account_id = t1.account_id
                                               AND t2.updated_date < t1.updated_date) x
                                   WHERE x.account_status = 1
                                         AND x.r = 1)) y
                ON convert(date, y.updated_date) = convert(date, h.hour)
                   AND datepart(hour, y.updated_date) = datepart(hour, h.hour);
If you use this often and/or performance is important, you might consider introducing persisted, computed and indexed columns for the convert(...) and datepart(...) expressions and using them in the query instead. Indexing the calendar/time table and the columns used in the subqueries is also worth considering.
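For illustration, such columns might look like the sketch below (against the assumed table elbat from above; the column names updated_day and updated_hour are hypothetical):
-- Sketch: persist the date and hour parts so the join predicates can use an index
-- instead of computing convert()/datepart() per row.
ALTER TABLE elbat
    ADD updated_day  AS convert(date, updated_date) PERSISTED,
        updated_hour AS datepart(hour, updated_date) PERSISTED;

CREATE INDEX ix_elbat_day_hour
    ON elbat (updated_day, updated_hour)
    INCLUDE (account_id, account_status);
The join above could then compare updated_day and updated_hour against the values derived from h.hour.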
(Disclaimer: Since you didn't provide DDL of the table nor any sample data this is totally untested.)

TSQL Repeat values on outer join

I've been working on this for some time now and I would like to get some help.
My database is SQL Server 2008 R2 (I know, very old).
I basically have a transaction table that captures values per week, by job.
I would like to repeat the last value of a job until it finds the next value.
I have included some data from my table. The last column (values needed) is what I'm trying to achieve.
Thank you very much.
Bruce
[image of data]
I've tried the SQL below, but it is not giving the correct values. Please see the attachment.
SQL
select t.*, t2.percentcomp as value_needed
from #1 t
outer apply
(select top 1 t2.*
from #1 t2
where t2.job_skey = t.job_skey and
t2.COST_CODE_SKEY=t2.COST_CODE_SKEY and
t2.period_end_date <= t.period_end_date and
t2.percentcomp is not null
order by t.JOB_SKEY,t.phase,t.period_end_date desc
) t2
Attachment: view of the SQL output. Value_needed should begin with 5.
You can do what you want using OUTER APPLY:
select t.*, t2.percentcomp as value_needed
from #1 t outer apply
     (select top 1 t2.*
      from #1 t2
      where t2.job_skey = t.job_skey and
            t2.period_end_date <= t.period_end_date and
            t2.percentcomp is not null
      order by t2.period_end_date desc
     ) t2;

SQL Using PARTITION when comparing values in consecutive DataRows

I'm using a SQL statement to compare consecutive values of a field [Allocation] as follows:
;WITH cteMain AS
(SELECT AllocID, CaseNo, FeeEarner, Allocation, ROW_NUMBER() OVER (ORDER BY AllocID) AS sn
FROM tblAllocations)
SELECT m.AllocID, m.CaseNo, m.FeeEarner, m.Allocation,
ISNULL(sLag.Allocation, 0) AS prevAllocation,
(m.Allocation - ISNULL(sLag.Allocation, 0)) AS movement
FROM cteMain AS m
LEFT OUTER JOIN cteMain AS sLag
ON sLag.sn = m.sn-1;
The query returns a calculated field [movement] which is the increase or decrease in consecutive values of [Allocation].
I have included a screen shot of the data returned by this query.
However the query is not yet complete. I need to revise the statement so that the consecutive values of [Allocation] compared are grouped / partitioned by [FeeEarner] and [CaseNo].
For example, at line 18 of the data, the [Allocation] is 800 and is compared to a previous value of 600. But the previous value belongs to a different [CaseNo] i.e. 6 rather than 31. In fact [FeeEarner] 'PJW' has no previous [Allocation] on [CaseNo] '31' and so the [prevAllocation] should be '0' from the ISNULL keyword.
I have tried changing
OVER (ORDER BY AllocID)
to
OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID)
But that results in a lot of lines of data being repeated.
Can someone advise how to compare consecutive values of [Allocation] but only between rows of data with matching [FeeEarner] AND [CaseNo] please?
NOTE - I cannot use LAG because my customer is using SQL Server 2008 R2, which does not support it (LAG requires SQL Server 2012+ or Parallel Data Warehouse).
I believe you were close. Try this (notice the added pieces in the join clause to match the partition - without this you will match every row number 3 with every row number 2 across partitions, which is what you were seeing):
;WITH cteMain AS
(
SELECT AllocID, CaseNo, FeeEarner, Allocation,
ROW_NUMBER() OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID) AS sn
FROM tblAllocations
)
SELECT m.AllocID, m.CaseNo, m.FeeEarner, m.Allocation,
ISNULL(sLag.Allocation, 0) AS prevAllocation,
(m.Allocation - ISNULL(sLag.Allocation, 0)) AS movement
FROM cteMain AS m
LEFT OUTER JOIN cteMain AS sLag
ON sLag.CaseNo = m.CaseNo
AND sLag.FeeEarner = m.FeeEarner
AND sLag.sn = m.sn-1
You need to change your join condition as well:
FROM cteMain m LEFT OUTER JOIN
     cteMain sLag
     ON sLag.sn = m.sn - 1 and sLag.FeeEarner = m.FeeEarner and sLag.CaseNo = m.CaseNo
Also, you should have only one order by in the row_number() call.
Also, if you are using Oracle, SQL Server 2012, newer versions of DB2, or Postgres, then the lead()/lag() functions would be a better choice.
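For reference, on SQL Server 2012 or later the same result can be had in a single pass with LAG; a sketch using the columns from the question:
SELECT AllocID, CaseNo, FeeEarner, Allocation,
       ISNULL(LAG(Allocation) OVER (PARTITION BY CaseNo, FeeEarner
                                    ORDER BY AllocID), 0) AS prevAllocation,
       Allocation - ISNULL(LAG(Allocation) OVER (PARTITION BY CaseNo, FeeEarner
                                                 ORDER BY AllocID), 0) AS movement
FROM tblAllocations;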
One more option, with OUTER APPLY and EXISTS:
SELECT t1.AllocID, t1.CaseNo, t1.FeeEarner, t1.Allocation,
       ISNULL(o.Allocation, 0) AS PrevAllocation,
       (t1.Allocation - ISNULL(o.Allocation, 0)) AS movement
FROM tblAllocations t1
     OUTER APPLY (
        SELECT t2.AllocID, t2.CaseNo, t2.FeeEarner, t2.Allocation
        FROM tblAllocations t2
        WHERE EXISTS (
                 SELECT 1
                 FROM tblAllocations t3
                 WHERE t1.AllocID > t3.AllocID
                 HAVING MAX(t3.AllocID) = t2.AllocID
              ) AND t1.CaseNo = t2.CaseNo
     ) o

Math with previous row in SQL, avoiding nested queries?

I want to do some math on the previous rows in an SQL request in order to avoid doing it in my code.
I have a table representing the sales of two entities (the data shown here doesn't make much sense; it's just an excerpt):
YEAR ID SALES PURCHASE MARGIN
2009 1 10796820,57 2662369,19 8134451,38
2009 2 2472271,53 2066312,34 405959,19
2008 1 9641213,19 1223606,68 8417606,51
2008 2 3436363,86 2730035,19 706328,67
I want to know how the sales, purchase, margin... have evolved and compare one year to the previous one.
In short I want an SQL result with the evolutions pre-computed like this :
YEAR ID SALES SALES_EVOLUTION PURCHASE PURCHASE_EVOLUTION MARGIN MARGIN_EVOLUTION
2009 1 10796820,57 11,99 2662369,19 117,58 8134451,38 -3,36
2009 2 2472271,53 -28,06 2066312,34 -24,31 405959,19 -42,53
2008 1 9641213,19 1223606,68 8417606,51
2008 2 3436363,86 2730035,19 706328,67
I could do some ugly stuff:
SELECT *, YEAR, ID, SALES , (SALES/(SELECT SALES FROM TABLE WHERE YEAR = OUTER_TABLE.YEAR-1 AND ID = OUTER_TABLE.ID) -1)*100 as SALES_EVOLUTION (...)
FROM TABLE as OUTER_TABLE
ORDER BY YEAR DESC, ID ASC
But I have around 20 fields for which I would have to write a nested query like that, meaning I would end up with a very huge and ugly query.
Is there a better way to do this, with less SQL?
Using SQL Server (but this should work in almost any SQL dialect), with the table provided you can use a LEFT JOIN:
DECLARE @Table TABLE(
    [YEAR] INT,
    ID INT,
    SALES FLOAT,
    PURCHASE FLOAT,
    MARGIN FLOAT
)

INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2009,1,10796820.57,2662369.19,8134451.38
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2009,2,2472271.53,2066312.34,405959.19
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2008,1,9641213.19,1223606.68,8417606.51
INSERT INTO @Table ([YEAR],ID,SALES,PURCHASE,MARGIN) SELECT 2008,2,3436363.86,2730035.19,706328.67

SELECT cur.*,
       ((cur.SALES / prev.SALES) - 1) * 100
FROM @Table cur
     LEFT JOIN @Table prev
         ON cur.ID = prev.ID AND cur.[YEAR] - 1 = prev.[YEAR]
The LEFT JOIN will allow you to still see values from 2008, where an INNER JOIN would not.
Old skool solution:
SELECT c.YEAR, c.ID, c.SALES, c.PURCHASE, c.MARGIN
     , p.YEAR, p.ID, p.SALES, p.PURCHASE, p.MARGIN
FROM tab AS c              -- current
     INNER JOIN tab AS p   -- previous
         ON c.year = p.year + 1  -- the previous row is one year earlier
        AND c.id = p.id
If you have a db with analytical functions (MS SQL, Oracle) you can use the LEAD or LAG analytical functions, see http://www.oracle-base.com/articles/misc/LagLeadAnalyticFunctions.php
I think this would be the correct application:
SELECT c.YEAR, c.ID, c.SALES, c.PURCHASE, c.MARGIN
, LAG(c.YEAR, 1, 0) OVER (ORDER BY ID,YEAR)
, LAG(c.ID, 1, 0) OVER (ORDER BY ID,YEAR)
, LAG(c.SALES, 1, 0) OVER (ORDER BY ID,YEAR)
, LAG(c.PURCHASE, 1, 0) OVER (ORDER BY ID,YEAR)
, LAG(c.MARGIN, 1, 0) OVER (ORDER BY ID,YEAR)
FROM tab AS c -- current
(not really sure, haven't played with this enough)
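For what it's worth, partitioning by ID and ordering by YEAR is probably closer to the intent; a sketch (requires LAG support, e.g. SQL Server 2012+, and uses the tab table name from the answer above):
SELECT [YEAR], ID,
       SALES,
       (SALES    / LAG(SALES)    OVER (PARTITION BY ID ORDER BY [YEAR]) - 1) * 100 AS SALES_EVOLUTION,
       PURCHASE,
       (PURCHASE / LAG(PURCHASE) OVER (PARTITION BY ID ORDER BY [YEAR]) - 1) * 100 AS PURCHASE_EVOLUTION,
       MARGIN,
       (MARGIN   / LAG(MARGIN)   OVER (PARTITION BY ID ORDER BY [YEAR]) - 1) * 100 AS MARGIN_EVOLUTION
FROM tab
ORDER BY [YEAR] DESC, ID;
The oldest year per ID gets NULL evolutions, matching the blanks in the desired output.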
You can do it like this:
SELECT t1.*, t1.YEAR, t1.ID, t1.SALES , ((t1.sales/t2.sales) -1) * 100 as SALES_EVOLUTION
(...)
FROM Table t1 JOIN Table t2 ON t1.Year = (t2.Year + 1) AND t1.Id = t2.Id
ORDER BY t1.YEAR DESC, t1.ID ASC
Now, if you want to compare more years, you'd have to do more joins, so it is a slightly ugly solution.