SQL Server: LAG & LEAD instead of recursive calculation

I am pretty new to SQL Server 2016 and haven't used the new LAG & LEAD functions yet.
If I understood correctly, they make work easier in cases where we currently use the ROW_NUMBER() function and then join the results to connect records in a certain order.
A case where I currently connect records this way is:
;WITH IncrementingRowNums AS
(
    SELECT d.MyKey
          ,d.Outstanding
          ,d.Rate
          ,AMO.PaymentAmount
          ,AMO.AmoDate
          ,ROW_NUMBER() OVER (PARTITION BY d.MyKey ORDER BY AMO.AmoDate ASC) AS RowNum
    FROM Deals d
    INNER JOIN Amortization AMO
        ON d.MyKey = AMO.MyKey
),
lagged AS
(
    SELECT MyKey
          ,Outstanding AS new_outstanding
          ,Rate
          ,PaymentAmount
          ,AmoDate
          ,RowNum
    FROM IncrementingRowNums
    WHERE RowNum = 1
    UNION ALL
    SELECT i.MyKey
          ,(l.new_outstanding - l.PaymentAmount)
           * (1 + i.Rate * (DATEDIFF(DAY, l.AmoDate, i.AmoDate) / 365.25))
           AS new_outstanding
          ,i.Rate
          ,i.PaymentAmount
          ,i.AmoDate
          ,i.RowNum
    FROM IncrementingRowNums i
    INNER JOIN lagged l
        ON i.RowNum = l.RowNum + 1
        AND i.MyKey = l.MyKey
)
SELECT * FROM lagged;
What's the best way to solve this with the LAG & LEAD functions?
I tried several approaches, but none worked out.
The only column I need to calculate is new_outstanding, which is computed as:
(previous_record.new_outstanding - previous_record.PaymentAmount)
* (1 + current_record.Rate * (DATEDIFF(DAY,previous_record.AmoDate, current_record.AmoDate)/365.25))
As there is no SQL Server 2016 version on rextester, I can only provide a little test data and my old solution with the recursive calculation: http://rextester.com/WVTM46505
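For intuition, here is a rough Python sketch (made-up figures, single MyKey) of the recurrence the recursive CTE evaluates. Note that each row's new_outstanding is computed from the previously computed value, not from a stored column, which is why a plain LAG() over a table column cannot express it directly:

```python
from datetime import date

# Hypothetical amortization rows for one MyKey, ordered by AmoDate:
# (Outstanding, Rate, PaymentAmount, AmoDate) -- made-up figures
rows = [
    (10000.0, 0.05, 500.0, date(2016, 1, 1)),
    (10000.0, 0.05, 500.0, date(2016, 7, 1)),
    (10000.0, 0.05, 500.0, date(2017, 1, 1)),
]

def new_outstanding(rows):
    """Replay the recursive CTE: each row's value depends on the
    PREVIOUS COMPUTED value, so it must be carried forward row by row."""
    out = [rows[0][0]]  # RowNum = 1 anchor: Outstanding as stored
    for prev, cur in zip(rows, rows[1:]):
        days = (cur[3] - prev[3]).days
        out.append((out[-1] - prev[2]) * (1 + cur[1] * days / 365.25))
    return out

print(new_outstanding(rows))
```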
Thanks

Related

Hive SQL Cross Join Question on condition

Wonderful coders of the universe, I have a question:
I'm writing a Hive SQL script, and I'm wondering if it's possible to cross join on a condition (the condition below is that the day of week is a Friday), or if there's a performance-light alternative to what I'm doing below. I ONLY need to add 2 rows for dates that are Fridays, which just persists the Friday data for Saturday and Sunday. I get an error on the join condition, but I'm wondering if it's possible to bypass that somehow.
To be crystal clear, the query as written below gives me an error (specifically on and DAYOFWEEK(performance_end_date) = 6). I'm just wondering if there's a way to write this so the syntax is accepted.
Please advise.
select
portfolio_name
,Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date
,return
,nav
,nav_id
,row_no
from
(
SELECT portfolio_name, performance_end_date, return, cast(cast(nav as decimal(20,2))as string) as nav, nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
WHERE
portfolio_code IN ('1994','2078','2155','2365','2367')
and
year=2020 and month=09
) a
CROSS JOIN (SELECT stack(2, 1,2) as crs) crs and DAYOFWEEK(performance_end_date) = 6
where a.row_no = 1
CROSS JOIN doesn't have a join condition, so move your criteria to the WHERE clause:
CROSS JOIN (SELECT stack(2, 1,2) as crs) crs
where a.row_no = 1 and DAYOFWEEK(performance_end_date) = 6
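The fix (filter in WHERE rather than on the CROSS JOIN) can be sketched outside Hive. A rough Python/SQLite equivalent, where a two-row UNION ALL stands in for Hive's stack(2, 1, 2) and strftime('%w') = '5' stands in for DAYOFWEEK(...) = 6 (table name and dates are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE perf (portfolio_name TEXT, performance_end_date TEXT);
INSERT INTO perf VALUES ('A', '2020-09-04'),  -- a Friday
                        ('A', '2020-09-03');  -- a Thursday
""")
rows = con.execute("""
SELECT p.portfolio_name,
       date(p.performance_end_date, '+' || crs.crs || ' day') AS performance_end_date
FROM perf p
CROSS JOIN (SELECT 1 AS crs UNION ALL SELECT 2) crs
-- Friday filter lives in WHERE, not on the CROSS JOIN
-- (SQLite %w: Friday = '5'; Hive DAYOFWEEK: Friday = 6)
WHERE strftime('%w', p.performance_end_date) = '5'
ORDER BY performance_end_date
""").fetchall()
print(rows)  # only the Friday row fans out, into Saturday and Sunday
```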

LAG() function in sql 2008

I have looked at a few other questions regarding this problem. We are trying to get a stored procedure working that contains the LAG() function, but the machine we are now installing an instance on runs SQL Server 2008, where LAG() is not available:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
What I've tried so far (edited to reflect Zohar Peled's suggestion):
SELECT se.SetID,se.SetName,se.ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level,
QuestionType
FROM tblSet se
left join tblSet se2 on se.ParentSetId = se2.ParentSetId -1
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where se.CollectionId=#colID and se.IsDeleted=0
order by se.SetID
It does not seem to bring back all of the same records when I run them side by side, and the level values differ as well.
I have put some of the output into an HTML-formatted table: the first result set is from the version containing LAG(), the second from the new version, where the levels do not come out the same:
https://jsfiddle.net/gyn8Lv3u/
LAG() can be implemented using a self-join, as Jeroen wrote in his comment, or by using a correlated subquery. In this case it's a simple LAG(), so the correlated subquery is also simple:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (
(
SELECT TOP 1 ParentSetId
FROM tblSet seInner
WHERE seInner.ParentSetId < se.ParentSetId
ORDER BY seInner.ParentSetId DESC
)
<> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
If you had specified an offset, it would be harder to implement with a correlated subquery, and a self-join would be a much easier solution.
Sample data and desired results would help. This construct:
(case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1
end) as level
is quite strange. You are lagging by the only column used in the ORDER BY, and then comparing the value to that same column, which only makes sense if there are duplicates.
If you have duplicates, then order by se.ParentSetId is unstable. That is, the "previous" row is indeterminate because of the duplicate values being ordered. You can run the query twice and get different results.
I am guessing you want one row with the value 1 for each parent set id. If so, then in either database, you would use:
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level
This still has the unstable-ordering problem. You can fix it by changing the ORDER BY to what you really want.
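The ROW_NUMBER() replacement can be tried outside SQL Server as well. A minimal sketch in Python against SQLite (3.25+ for window functions); the table and values are made up, and SetID is used as a deterministic tie-breaker in the ORDER BY:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite 3.25+ supports window functions
con.executescript("""
CREATE TABLE tblSet (SetID INTEGER, ParentSetId INTEGER);
INSERT INTO tblSet VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 20);
""")
rows = con.execute("""
SELECT SetID,
       ParentSetId,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY ParentSetId
                                    ORDER BY SetID) = 1
            THEN 1 ELSE 2 END AS level
FROM tblSet
ORDER BY SetID
""").fetchall()
print(rows)  # first row of each ParentSetId gets level 1, the rest level 2
```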

DB2 getting QDT Array List maximum exceeded using CTE and sql recursion

I am using CTE to create a recursive query to merge multiple column data into one.
I have about 9 working CTEs (I need to merge columns a few times in one row per request, hence the CTE helpers). When I add the 10th, I get an error ("QDT Array List maximum exceeded") when running the query from Visual Studio 2010.
I also checked the AS400 system with the WRKOBJLCK MyUserProfile *USRPRF command.
I can't find any information on this error.
I am using DB2 running on an AS400 system (i5/OS, V5R4M0).
I repeat these same 3 CTEs, but with different conditions to compare against:
t1A (ROWNUM, PARTNO, LOCNAM, LOCCODE, QTY) AS
(
SELECT rownumber() over(partition by s2.LOCPART), s2.LOCPART, s2.LOCNAM, s2.LOCCODE, s2.LOCQTY
FROM (
SELECT distinct s1.LOCPART, L.LOCNAM, L.LOCCODE, L.LOCQTY
FROM(
SELECT COUNT(LOCPART) AS counts, LOCPART
FROM LOCATIONS
WHERE LOCCODE = 'A'
GROUP BY LOCPART) S1, LOCATIONS L
WHERE S1.COUNTS > 1 AND S1.LOCPART = L.LOCPART AND L.LOCCODE = 'A'
)s2
),
t2A(PARTNO, LIST, QTY, CODE, CNT) AS
(
select PARTNO, LOCNAM, QTY, LOCCODE, 1
from t1A
where ROWNUM = 1
UNION ALL
select t2A.PARTNO, t2A.LIST || ', ' || t1A.LOCNAM, t1A.QTY, t1A.LOCCODE, t2A.CNT + 1
FROM t2A, t1A
where t2A.PARTNO = t1A.PARTNO
AND t2A.CNT + 1 = t1A.ROWNUM
),
t3A(PARTNO, LIST, QTY, CODE, CNT) AS
(
select t2.PARTNO, t2.LIST, q.SQTY, t2.CODE, t2.CNT
from(
select SUM(QTY) as SQTY, PARTNO
FROM t1A
GROUP BY PARTNO
) q, t2A t2
where t2.PARTNO = q.PARTNO
)
Using these, I just call a simple SELECT on one of the CTEs for testing, and I get the error each time I have more than 9 CTEs (even if only one is being called).
In the AS400 error (green-screen snapshot), what does QDT stand for, and where am I using an array here?
This was a mess, with error after error. The only way I could get around it was to create views and piece them together.
When creating a view I could only get it to work with one CTE, not multiple; what worked fine as one recursive CTE wouldn't work when defined as a view. I had to break the subquery apart into views. I also couldn't create a view directly out of a SELECT rownumber() over(partition by COL1, Col2) that contained a subquery; I had to split that into two views. Putting the rownumber() select, with its inner view, into another view finally let me use it in the CTE, and then I could create a main view out of all of that.
Also, each error I got was a system error, not an SQL error.
So, in conclusion: I relied heavily on views to fix my issue, in case anyone ever runs across this same problem.

Sql grabbing most recent record

I'm currently using the DBISAM SQL compiler. It's nearly identical to the MS SQL compiler; the only difference is that I can't have any nested join statements.
The query below is a nested query that grabs the most recent loan record and its rate. I'm wondering if there's another way to write this without the nested select statement.
select * from
(select Loan_Id, Max(effectiveDate) as EffectiveDate from InterestTerms
group by Loan_Id) as Y
join InterestTerms as X on Y.Loan_Id = X.Loan_Id and Y.EffectiveDate = X.EffectiveDate
order by Y.Loan_Id
You could try the following:
select
X.*
FROM
InterestTerms AS X
WHERE
X.effectiveDate IN (
select
Max(Y.effectiveDate) as MaxED
from
InterestTerms as Y
WHERE
Y.Loan_Id = X.Loan_Id
)
order by
X.Loan_Id
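The correlated-subquery approach can be sanity-checked with a small script. A sketch in Python against SQLite (the table contents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE InterestTerms (Loan_Id INTEGER, effectiveDate TEXT, Rate REAL);
INSERT INTO InterestTerms VALUES
    (1, '2020-01-01', 3.0), (1, '2020-06-01', 3.5),
    (2, '2020-03-01', 4.0), (2, '2020-09-01', 4.25);
""")
rows = con.execute("""
SELECT X.*
FROM InterestTerms AS X
WHERE X.effectiveDate IN (
    -- correlated: the max effectiveDate for THIS loan
    SELECT MAX(Y.effectiveDate)
    FROM InterestTerms AS Y
    WHERE Y.Loan_Id = X.Loan_Id
)
ORDER BY X.Loan_Id
""").fetchall()
print(rows)  # one most-recent row per loan
```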

TSQL Last Record Efficiency Cursor, SubQuery, or CTE

Consider the following query...
SELECT
*
,CAST(
(CurrentSampleDateTime - PreviousSampleDateTime) AS FLOAT
) * 24.0 * 60.0 AS DeltaMinutes
FROM
(
SELECT
C.SampleDateTime AS CurrentSampleDateTime
,C.Location
,C.CurrentValue
,(
SELECT TOP 1
Previous.SampleDateTime
FROM Samples AS Previous
WHERE
Previous.Location = C.Location
AND Previous.SampleDateTime < C.SampleDateTime
ORDER BY Previous.SampleDateTime DESC
) AS PreviousSampleDateTime
FROM Samples AS C
) AS TempResults
Assuming all things being equal (indexing, etc.), is this the most efficient way of achieving the above results, that is, using a subquery to retrieve the last record?
Would I be better off creating a cursor that orders by Location, SampleDateTime and setting up variables for CurrentSampleDateTime and PreviousSampleDateTime...setting the Previous to the Current at the bottom of the while loop?
I'm not very good with CTEs; is this something that could be accomplished more efficiently with a CTE? If so, what would that look like?
I'm likely going to have to retrieve PreviousValue along with PreviousSampleDateTime in order to get an average of the two. Does that change the results any?
Long story short what is the best/most efficient way of holding onto the values of a previous record if you need to use those values in calculations on the current record?
----UPDATE
I should note that I have a clustered index on Location, SampleDateTime, CurrentValue so maybe that is what is affecting the results more than anything.
With 5,591,571 records, my query (the one above) takes 3 min 20 s on average.
The CTE from Joachim Isaksson's answer below averages 5 min 15 s.
Maybe it's taking longer because it's not using the clustered index, joining on the row number instead?
I started testing the cursor method, but it was already at 10 minutes... so no go on that one.
I'll give it a day or so, but I think I will accept the CTE answer provided by Joachim Isaksson, just because I found a new method of getting the last row.
Can anyone confirm that it's the index on Location, SampleDateTime, CurrentValue that makes the subquery method faster?
I don't have SQL Server 2012, so I can't test the LEAD/LAG method. I'd bet that would be quicker than anything I've tried, assuming Microsoft implemented it efficiently; it probably just has to swap a pointer to a memory reference at the end of each row.
If you are using SQL Server 2012, you can use the LAG window function that retrieves the value of the specified column from the previous row. It returns null if there is no previous row.
SELECT
a.*,
CAST((a.SampleDateTime - LAG(a.SampleDateTime) OVER(PARTITION BY a.location ORDER BY a.SampleDateTime ASC)) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM samples a
ORDER BY
a.location,
a.SampleDateTime
You'd have to run some tests to see if it's faster. If you're not using SQL Server 2012, then at least this may give others an idea of how it can be done in 2012. I like @Joachim Isaksson's answer using a CTE with ROW_NUMBER()/PARTITION BY for 2008 and 2005.
SQL Fiddle
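For those without SQL Server 2012 at hand, the LAG() approach can be tried in any engine with window functions. A sketch in Python against SQLite (3.25+), where julianday() stands in for SQL Server's direct datetime subtraction and the sample data is made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for LAG()
con.executescript("""
CREATE TABLE samples (location INTEGER, SampleDateTime TEXT, CurrentValue REAL);
INSERT INTO samples VALUES
    (1, '2020-01-01 00:00:00', 10.0),
    (1, '2020-01-01 00:05:00', 11.0),
    (1, '2020-01-01 00:12:00', 12.0);
""")
rows = con.execute("""
SELECT location, SampleDateTime, CurrentValue,
       (julianday(SampleDateTime)
        - julianday(LAG(SampleDateTime) OVER (PARTITION BY location
                                              ORDER BY SampleDateTime)))
       * 24.0 * 60.0 AS DeltaMinutes
FROM samples
ORDER BY location, SampleDateTime
""").fetchall()
print(rows)  # first row's delta is NULL; then 5 and 7 minutes
```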
Have you considered creating a temp table to use instead of a CTE or subquery? You can create indexes on the temp table that are more suited for the join on RowNumber.
CREATE TABLE #tmp (
RowNumber INT,
Location INT,
SampleDateTime DATETIME,
CurrentValue INT)
;
INSERT INTO #tmp
SELECT
ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn,
Location,
SampleDateTime,
CurrentValue
FROM Samples
;
CREATE INDEX idx_location_row ON #tmp(Location,RowNumber) INCLUDE (SampleDateTime,CurrentValue);
SELECT
a.Location,
a.SampleDateTime,
a.CurrentValue,
CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM #tmp a
LEFT JOIN #tmp b ON
a.Location = b.Location
AND b.RowNumber = a.RowNumber +1
ORDER BY
a.Location,
a.SampleDateTime
SQL Fiddle #2
As always, testing with your real data is king.
Here's a CTE version that shows the samples for each location with time deltas from the previous sample. It uses OVER ranking, which usually does well in comparison to subqueries for solving the same problem.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn
FROM Samples
)
SELECT a.*,CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM cte a
LEFT JOIN cte b ON a.Location = b.Location AND b.rn = a.rn +1
An SQLfiddle to test with.
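The CTE self-join can be tested the same way. A Python/SQLite sketch (made-up data; julianday() replaces the direct datetime subtraction, and rn is numbered newest-first so b.rn = a.rn + 1 picks the previous sample):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite 3.25+ (window functions)
con.executescript("""
CREATE TABLE Samples (Location INTEGER, SampleDateTime TEXT);
INSERT INTO Samples VALUES
    (1, '2020-01-01 00:00:00'),
    (1, '2020-01-01 00:05:00'),
    (1, '2020-01-01 00:12:00');
""")
rows = con.execute("""
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Location
                                 ORDER BY SampleDateTime DESC) AS rn
    FROM Samples
)
SELECT a.Location, a.SampleDateTime,
       (julianday(a.SampleDateTime) - julianday(b.SampleDateTime))
       * 24.0 * 60.0 AS DeltaMinutes
FROM cte a
LEFT JOIN cte b
    ON a.Location = b.Location AND b.rn = a.rn + 1
ORDER BY a.Location, a.SampleDateTime
""").fetchall()
print(rows)  # the oldest row has no predecessor, so its delta is NULL
```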