Find the max value from previous row - sql

I want to find in the below rows the maximum "book_type" value:
book_id | book_type | book_time | uniq_step | book_ordered
1 |  | 2022-10-13 00:00:00 | 800 | 0
1 |  | 2022-10-13 00:00:00 | 801 | 0
1 | poetry | 2022-10-13 00:00:00 | 802 | 1
1 |  | 2022-10-13 00:00:00 | 803 | 0
1 |  | 2022-10-13 01:00:00 | 804 | 0
1 | poetry | 2022-10-13 01:00:00 | 802 | 1
In the row with uniq_step = 804 I want book_type to be poetry, but when I use the LAG window function I get ' ' (a blank string).
So is there any way to take the max value from the partition by book_time, the way a lag would?

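You could try using the LAST_VALUE window function in place of LAG. Since the missing "book_type" values in your data are empty strings rather than NULLs, you can use a CASE expression inside the window function to turn them into NULLs, and then skip them with IGNORE NULLS. This needs a DBMS that supports IGNORE NULLS (its exact placement varies by DBMS); without it, LAST_VALUE over the default frame just returns the current row's own value.
LAST_VALUE(CASE WHEN book_type <> '' THEN book_type END) IGNORE NULLS OVER(
    PARTITION BY book_id
    ORDER BY uniq_step
)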
Side Note: Empty spaces/strings are still values in a DBMS. If you can convert the empty values in your db to NULL values, window and aggregate functions will be able to skip them directly, with no CASE workaround.
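For a DBMS without IGNORE NULLS, the same idea can be sketched with MAX as a window function, since aggregates skip NULLs. The following is a minimal sketch, not the answer's own code: it uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions), a hypothetical table name `books`, and assumes the blank values are empty or whitespace-only strings.

```python
import sqlite3

# Hypothetical in-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (book_id INT, book_type TEXT, book_time TEXT, "
    "uniq_step INT, book_ordered INT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        (1, "", "2022-10-13 00:00:00", 800, 0),
        (1, "", "2022-10-13 00:00:00", 801, 0),
        (1, "poetry", "2022-10-13 00:00:00", 802, 1),
        (1, "", "2022-10-13 00:00:00", 803, 0),
        (1, "", "2022-10-13 01:00:00", 804, 0),
        (1, "poetry", "2022-10-13 01:00:00", 802, 1),
    ],
)

query = """
SELECT uniq_step,
       book_time,
       -- Blank strings become NULL, and MAX ignores NULLs, so each row
       -- sees the largest non-blank book_type up to its own uniq_step.
       MAX(CASE WHEN TRIM(book_type) <> '' THEN book_type END) OVER (
           PARTITION BY book_id
           ORDER BY uniq_step
       ) AS filled_type
FROM books
ORDER BY uniq_step, book_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that MAX returns the largest non-blank value seen so far rather than the most recent one; the two coincide here only because the sample has a single distinct book_type.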

Related

SQL Server: Limit filling NULL values up to row with specific text

I have a CTE in which I fill NULL values with the previous available value before row 100, using the following script:
SELECT [YearMonth],grp,
CASE WHEN grp>grp
THEN (FIRST_VALUE([Value 1]) over (partition by grp order by [YearMonth]))
The problem is that I want the rows after the "Latest" to be NULL, but I don't want these rows to be deleted because there are values in other columns which I need to show. I would appreciate any help.
EDIT
Current Table
YearMonth | Value 1
2021-01 | 0.9575
2021-02 | NULL
2021-03 | NULL
2021-04 | NULL
2021-05 | NULL
2021-06 | 0.9875
Expected table
YearMonth | Value 2
2021-01 | 0.9575
2021-02 | 0.9575
2021-03 | 0.9575
2021-04 | 0.9575
2021-05 | 0.9575
2021-06 | 0.9875
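The fill shown in the expected table can be sketched as follows. This is only an illustration under assumptions: it runs through Python's sqlite3 module rather than SQL Server, the table name `monthly` and column names are hypothetical, and `grp` is assumed to be a running count of non-NULL values (the question's script does not show how `grp` is built). It does not cover the "Latest" cut-off, because the sample data contains no such row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (yearmonth TEXT, value1 REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [
        ("2021-01", 0.9575),
        ("2021-02", None),
        ("2021-03", None),
        ("2021-04", None),
        ("2021-05", None),
        ("2021-06", 0.9875),
    ],
)

query = """
WITH grouped AS (
    SELECT yearmonth,
           value1,
           -- COUNT skips NULLs, so grp only increases on rows that carry a
           -- value; each value and the NULL rows after it share one grp.
           COUNT(value1) OVER (ORDER BY yearmonth) AS grp
    FROM monthly
)
SELECT yearmonth,
       -- The first row of each grp is the one holding the value.
       FIRST_VALUE(value1) OVER (PARTITION BY grp ORDER BY yearmonth) AS value2
FROM grouped
ORDER BY yearmonth
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```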

CTE recursion infinite loop

I'm working with a stored procedure that uses a CTE in SQL Server to fetch some data from two tables, but when execution reaches the CTE query it goes into an infinite loop and never ends. Is there a way to prevent that infinite loop?
This is the query that I created:
WITH tableName(Id, enddate, statusDte, closeId, shceDte, calcDte, closeEndDte, ParentId, LastClose, lasCloseDte, closeClass,addSe,twon,code)
AS
(
SELECT
tba.Id,
CASE WHEN tb.ParentId IS NOT NULL
THEN tb.Id
WHEN tb.statusDte IN (1,2,3)
THEN tb.calcDte ELSE tb.shceDte
END ForecastDueDate,
statusDte, closeId, shceDte, calcDte,
CASE WHEN tb.ParentId IS NULL
THEN closeEndDte ELSE NULL END, tb.ParentId, 0,
CASE WHEN tb.ParentId IS NOT NULL
THEN statusDte
WHEN tb.statusDte = 5
AND (tb.calcDte BETWEEN '1/1/2020 12:00:00 AM' AND '12/31/2020 11:59:59 PM'
OR tb.closeEndDte BETWEEN '1/1/2020 12:00:00 AM' AND '12/31/2020 11:59:59 PM')
THEN ams.GetPreviousNthFullAuditDate(tb.Id, tb.AuditID, 2) ELSE a.statusDate END as lastDate,
a.closeClass, tba.addSe,tba.town,tba.code
FROM
tableA tba
INNER JOIN
tableB tb ON tb.Id = tba.Id
WHERE
statusDte NOT IN (3,4) AND tba.IsAtve = 1
UNION ALL
SELECT
Id, enddate,
statusDte, statusDte, shceDte, calcDte, closeEndDte, ParentId,
0, lasCloseDte, closeClass,addSe,twon,code
FROM
tableName
WHERE
enddate BETWEEN enddate AND '12/31/2020 11:59:59 PM'
)
SELECT *
FROM tableName
OPTION (maxrecursion 0)
Expected results
Id enddate statusDte closeId shceDte calcDte closeEndDte parentId lastClose lastCloseDte closeClass addSe town code
----------- ----------------------- ------------- ----------- ----------------------- ----------------------- ----------------------- ----------------------- ----------- ----------------------- ----------- --------------------------------- ---------------------- --------------------------------------------------
133 2011-04-04 00:00:00.000 22 14453 NULL 2011-04-04 00:00:00.000 2099-12-31 00:00:00.000 NULL 0 NULL 1 4707 EXECUTIVE DRIVE '' SAN DIEGO 123
56 2018-12-07 13:00:00.000 22 52354 NULL 2018-12-07 13:00:00.000 2019-12-07 00:00:00.000 NULL 0 NULL 1 75 STATE ST FL 24 '' BOSTON 345
12 2021-02-05 17:00:00.000 22 75751 NULL 2021-02-05 17:00:00.000 NULL NULL 0 NULL 1 1450 FRAZEE RD STE 308 '' SAN DIEGO 678
334 2019-03-07 16:30:00.000 15 66707 2019-03-07 16:30:00.000 2019-03-23 21:00:00.000 NULL NULL 0 2019-03-07 16:30:00.000 1 42690 RIO NEDO, STE E '' TEMECULA 91011
33 2020-01-10 17:00:00.000 22 65568 NULL 2020-01-10 17:00:00.000 NULL NULL 0 2018-01-10 17:00:00.000 1 2518 UNICORNIO ST. '' CARLSBAD 136
55 2020-04-16 20:00:00.000 22 67812 NULL 2020-04-16 20:00:00.000 NULL NULL 0 2018-04-17 20:00:00.000 1 4534 OSPREY STREET '' SAN DIEGO 653
66 2020-02-21 17:00:00.000 22 75956 NULL 2020-02-21 17:00:00.000 NULL NULL 0 2019-02-21 17:00:00.000 1 3511 CAMINO DEL RIO S, STE 305 '' SAN DIEGO 0484
094 2021-02-20 21:00:00.000 22 75629 NULL 2021-02-20 21:00:00.000 NULL NULL 0 NULL 1 29349 EAGLE DR '' MURRIETA 345
First, let's try to add some best practices. Qualify all your columns with the appropriate table alias. Just doing some of them is inconsistent and inconsistent style is difficult to read and prone to errors.
Next, you've (hopefully) dumbed down your actual query. Generic names like "tableA" hinder understanding.
Next - your first case expression seems highly suspicious. One branch returns tb.Id and the others return what appears to be a date (or datetime). You can, unfortunately, cast an int to a datetime; it might not make any sense, but it won't generate an error. So - does this make sense?
Next - you've made a common mistake with your datetime boundaries. Depending on your data you might never notice it, but there is no reason to rely on that and every reason to write your logic so that it avoids the possibility. Tibor discusses this in great detail here. Shorter version - your upper boundary should always be an exclusive one to support all possible values of time for your datatype. An upper bound of 23:59:59 will ignore any time values in the final second of the day that have non-zero milliseconds. And use a literal format that is not dependent on language or connection settings.
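The boundary problem can be illustrated with a small sketch. It is an assumption-laden stand-in, not SQL Server code: it uses Python's sqlite3 module, a hypothetical `events` table, and ISO-8601 text timestamps (which SQLite compares as strings), but the inclusive-versus-exclusive effect is the same one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2020-06-15 12:00:00.000",),
        ("2020-12-31 23:59:59.500",),  # inside 2020, but after 23:59:59
        ("2021-01-01 00:00:00.000",),  # first instant of 2021
    ],
)

# Inclusive upper bound at 23:59:59 misses the row with 500 milliseconds.
inclusive = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE ts BETWEEN '2020-01-01 00:00:00' AND '2020-12-31 23:59:59'"
).fetchone()[0]

# Half-open range: inclusive lower bound, exclusive upper bound at the
# start of the next year, written in a language-independent format.
half_open = conn.execute(
    "SELECT COUNT(*) FROM events WHERE ts >= '2020-01-01' AND ts < '2021-01-01'"
).fetchone()[0]

print(inclusive, half_open)
```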
Next, you add confusion. You named your columns in the cte declaration, but your code also includes aliases for some (but not all - see the consistency comment above) columns, and those aliases differ significantly from the actual column names of the cte. The 2nd column of the cte is "enddate"; the anchor query uses the alias "ForecastDueDate".
Next, you have this: tb.statusDte = 5. The name implies date; the literal implies something different. You have other columns that end in "Dte" that are obviously dates, but not this one? Danger, danger!
Next, you refer to columns "a.closeClass" and "a.statusDate". There is no table or alias named "a".
Lastly, you have:
WHERE enddate BETWEEN enddate AND '12/31/2020 11:59:59 PM'
Think about what you wrote. Isn't enddate always between enddate and Dec 31 2020 (so long as enddate <= that value)? I think this is the source of your issue. You're not computing or adjusting anything from the anchor, so the recursive part just keeps selecting the same rows over and over. There is no logic to end the recursion.
The next question is obviously "how to fix it". That is impossible to say without knowing your schema, what it represents, and your goal. The use of recursion here is not obvious.
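For contrast, here is a minimal sketch of a recursive CTE that does terminate, because the recursive member changes a value on every pass and its WHERE clause eventually becomes false. It is not a fix for the query above: the CTE name `steps` and its columns are hypothetical, and it runs on SQLite through Python's sqlite3 module (SQLite requires the RECURSIVE keyword; SQL Server does not use it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

query = """
WITH RECURSIVE steps(n, day) AS (
    -- Anchor member: the starting row.
    SELECT 0, '2020-12-29'
    UNION ALL
    -- Recursive member: advances day by one, so the predicate below
    -- stops matching once day reaches the upper bound.
    SELECT n + 1, date(day, '+1 day')
    FROM steps
    WHERE day < '2020-12-31'
)
SELECT n, day FROM steps ORDER BY n
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If the recursive member selected `day` unchanged, as the question's query does with enddate, the WHERE clause would stay true forever and the recursion would never stop.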
If the hierarchy between records forms a loop, the recursion never ends, which is a problem in SQL. You will see the resources used by the SQL process increase tremendously.
If you use MAXRECURSION with a value other than 0 (zero lets SQL continue recursing without a limit), you can limit the recursion; the query then stops with an error once that limit is reached.
With data that loops or references itself, you can use this MAXRECURSION option.

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (as in https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
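The effect of putting the group in PARTITION BY can be checked with a small sketch. It makes several assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module, so DATEDIFF is replaced by a difference of `strftime('%s', ...)` values, and the table `t` uses the hypothetical column names `grp` and `ts` in place of the reserved word group and the column Time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INT, id INT, ts TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "16:00:00"),
        (1, 2, "16:02:00"),
        (1, 3, "16:03:00"),
        (2, 4, "16:09:00"),
        (2, 5, "16:10:00"),
        (2, 6, "16:14:00"),
    ],
)

query = """
SELECT grp,
       id,
       -- LAG restarts for every grp, so the first row of each group has no
       -- previous row and the difference comes out as NULL.
       CAST(strftime('%s', ts) AS INTEGER)
         - CAST(strftime('%s', LAG(ts) OVER (PARTITION BY grp ORDER BY id)) AS INTEGER)
         AS diff_seconds
FROM t
ORDER BY id
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```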

Detect Intervals

id_person transaction internation_in internation_out
1 456465 2015-01-01 2015-02-01
2 564564 2015-02-03 2015-04-02
3 4564654 2015-01-01 2015-01-05
4 4564646 2015-01-01 2015-02-04
4 4564656 2015-03-01 2015-04-15
4 87899465 2015-05-16 2015-05-25
5 56456456 2015-01-01 2105-01-08
5 45456546 2015-02-04 2015-03-04
I want to know how to compute, per id_person, the difference (interval in hours) between the internation_out of one transaction and the internation_in of the next transaction.
I tried lag and lead, but I can't group by id_person.
I want this result, using id_person 4 as an example:
id_person transaction Gap
4 4564646 Null
4 4564656 The result of (2015-02-04- 2015-03-01)
4 87899465 The result of (2015-04-15- 2015-05-16)
If your time periods are not overlapping (and yours are not), then there is a simple calculation for the total gap per person: it is the number of days from the earliest internation_in to the latest internation_out, minus the sum of the durations of the individual rows. So, you don't need lead() or lag():
select id_person,
(case when count(*) > 1
then (max(internation_out) - min(internation_in) -
sum(internation_out - internation_in)
)
end) as gap_duration
from table t
group by id_person;
Note that this returns NULL if there is only one row for the person. If you want 0, then you don't need the case.
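The arithmetic can be verified for id_person 4 with a short sketch. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, uses `julianday()` for date subtraction (the original query relies on the DBMS's native date arithmetic), a hypothetical table name `stays`, and loads only the rows for persons 1 and 4 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (id_person INT, transaction_id INT, "
    "internation_in TEXT, internation_out TEXT)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        (1, 456465, "2015-01-01", "2015-02-01"),
        (4, 4564646, "2015-01-01", "2015-02-04"),
        (4, 4564656, "2015-03-01", "2015-04-15"),
        (4, 87899465, "2015-05-16", "2015-05-25"),
    ],
)

query = """
SELECT id_person,
       CASE WHEN COUNT(*) > 1 THEN
           -- whole span covered by the person's stays, in days ...
           (julianday(MAX(internation_out)) - julianday(MIN(internation_in)))
           -- ... minus the days actually spent inside the stays
           - SUM(julianday(internation_out) - julianday(internation_in))
       END AS gap_days
FROM stays
GROUP BY id_person
ORDER BY id_person
"""
gaps = conn.execute(query).fetchall()
for row in gaps:
    print(row)
```

For person 4 the two individual gaps are 25 days (2015-02-04 to 2015-03-01) and 31 days (2015-04-15 to 2015-05-16), which matches the total the query produces; multiplying by 24 gives the figure in hours.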

sql sliding window - finding max value over interval

i have a sliding window problem. specifically, i do not know where my window should start and where it should end. i do know the size of my interval/window.
i need to find the start/end of the window that delivers the best (or worst, depending on how you look at it) case scenario.
here is an example dataset:
value | tstamp
100 | 2013-02-20 00:01:00
200 | 2013-02-20 00:02:00
300 | 2013-02-20 00:03:00
400 | 2013-02-20 00:04:00
500 | 2013-02-20 00:05:00
600 | 2013-02-20 00:06:00
500 | 2013-02-20 00:07:00
400 | 2013-02-20 00:08:00
300 | 2013-02-20 00:09:00
200 | 2013-02-20 00:10:00
100 | 2013-02-20 00:11:00
let's say i know that my interval needs to be 5 minutes. so, i need to know the value and timestamps included in the 5 minute interval where the sum of 'value' is the highest. in my above example, the rows from '2013-02-20 00:04:00' to '2013-02-20 00:08:00' would give me a sum of 400+500+600+500+400 = 2400, which is the highest value over 5 minutes in that table.
i'm not opposed to using multiple tables if needed. but i'm trying to find a "best case scenario" interval. results can go either way, as long as they net the interval. if i get all data points over that interval, it still works. if i get the start and end points, i can use those as well.
i've found several sliding window problems for SQL, but haven't found any where the window size is the known factor, and the starting point is unknown.
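SELECT  *,
        (
        SELECT  SUM(value)
        FROM    mytable mi
        -- BETWEEN includes both endpoints, so with one row per minute this sums six rows;
        -- subtract '4 minute' instead if the window should hold exactly five rows
        WHERE   mi.tstamp BETWEEN m.tstamp - '5 minute'::INTERVAL AND m.tstamp
        ) AS maxvalue
FROM    mytable m
ORDER BY
        maxvalue DESC
LIMIT 1
In PostgreSQL 11 and above:
-- the RANGE frame likewise includes the row exactly 5 minutes back
SELECT  SUM(value) OVER (ORDER BY tstamp RANGE '5 minute' PRECEDING) AS maxvalue,
        *
FROM    mytable m
ORDER BY
        maxvalue DESC
LIMIT 1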
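A variant that reproduces the question's expected total of 2400 can be sketched as below. It is not the PostgreSQL query above: it runs on SQLite through Python's sqlite3 module, uses a self-join instead of a window frame, and assumes the lower bound should be exclusive so that a 5-minute window over one-row-per-minute data holds exactly five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value INT, tstamp TEXT)")
values = [100, 200, 300, 400, 500, 600, 500, 400, 300, 200, 100]
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(v, f"2013-02-20 00:{i:02d}:00") for i, v in enumerate(values, start=1)],
)

query = """
SELECT MIN(mi.tstamp) AS window_start,
       m.tstamp       AS window_end,
       SUM(mi.value)  AS window_sum
FROM mytable m
JOIN mytable mi
  -- Exclusive lower bound, inclusive upper bound: rows in the 5 minutes
  -- ending at m.tstamp.
  ON mi.tstamp >  datetime(m.tstamp, '-5 minutes')
 AND mi.tstamp <= m.tstamp
GROUP BY m.tstamp
ORDER BY window_sum DESC
LIMIT 1
"""
best = conn.execute(query).fetchone()
print(best)
```

Returning both window_start and window_end gives the start and end points the question asks for, and the rows in between can then be fetched with a plain range filter.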