Calculated column reference in DB2 - sql

I have a table with the columns date1, name and price. What I want to do is to add two columns holding the minimum and maximum dates of each run of consecutive dates for the same name.
I have written the following query that explains the rule:
select date1, name, price,
    case
        when lag(name,1) over(order by date1 ASC, name ASC) = name
            then lag(minDate,1) over(order by date1 ASC, name ASC)
        else date1
    end as minDate,
    case
        when lag(name,1) over(order by date1 DESC, name DESC) = name
            then lag(maxDate,1) over(order by date1 DESC, name DESC)
        else date1
    end as maxDate
from MyTable
order by date1 ASC, name ASC
My problem is that I get an "invalid context for minDate/maxDate" error (SQLCODE=-206, SQLSTATE=42703).
Why can't I refer to a calculated column? Is there any other way?

It's complaining about lag(minDate,1) and lag(maxDate,1): those aliases aren't defined in that scope. They aren't columns in MyTable, and in DB2 a column alias isn't available until after the SELECT list has been evaluated (it becomes available if you push the query into a subquery, or in the ORDER BY clause).
Incidentally, your query can be better written as the following:
SELECT date1, name, price,
       LAG(date1, 1, date1) OVER (PARTITION BY name ORDER BY date1) AS minDate,
       LEAD(date1, 1, date1) OVER (PARTITION BY name ORDER BY date1) AS maxDate
FROM MyTable
ORDER BY date1, name
(I've left out ASC, as it's the default for all ordering)
LEAD(...) is the counterpart of LAG(...): it looks one row ahead instead of one row behind. Using both functions over the same OVER(...) specification also allows the optimizer to compute just one window.
The third parameter to the window functions here is a default value: if there is no next/previous row, the function returns date1 from the current row instead.
PARTITION BY is essentially a grouping for window functions, and takes the place of the CASE ... WHEN ... in your original query. (In general, I find that such constructs reflect a mentality common to imperative programming, which tends to run counter to the set-based nature of SQL. There are usually better ways to do things.)
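As a quick sanity check, the rewritten query can be run on SQLite (3.25+, which shares the LAG/LEAD semantics) through Python; the table name matches the question, but the rows below are invented:

```python
import sqlite3

# Demo of LAG/LEAD with a default third argument and PARTITION BY.
# Data is invented; SQLite stands in for DB2 here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (date1 TEXT, name TEXT, price INTEGER)")
con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("2023-01-01", "a", 10),
    ("2023-01-02", "a", 11),
    ("2023-01-03", "a", 12),
    ("2023-01-01", "b", 20),
])
rows = con.execute("""
    SELECT date1, name, price,
           LAG(date1, 1, date1)  OVER (PARTITION BY name ORDER BY date1) AS minDate,
           LEAD(date1, 1, date1) OVER (PARTITION BY name ORDER BY date1) AS maxDate
    FROM MyTable
    ORDER BY date1, name
""").fetchall()
for r in rows:
    print(r)
```

Note how the single row for name "b" gets its own date1 back for both minDate and maxDate, thanks to the default argument.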

Related

How to get the previous non 0 value from a table?

I still have not read a simple solution to this problem.
This is the table that I have:
This is the result that I want:
Basically, if the value in column SeriesStartRowNum is 0, it should retrieve the latest previous non-0 value from the table.
Anyone that knows how to easily do this?
Thanks in advance!
This is untested, since the data was posted as images, but this looks like a gaps-and-islands problem. One method, therefore, would be to use a windowed COUNT to count the number of non-zero values so far in the column SeriesStartRowNum to create "groups", and then get the MAX value of SeriesStartRowNum within each group:
WITH Groups AS (
    SELECT RowNumber,
           CustomerID,
           Value,
           StartDate,
           EndDate,
           SeriesStartRowNum,
           COUNT(NULLIF(SeriesStartRowNum, 0)) OVER (/*PARTITION BY ???*/ ORDER BY RowNumber) AS Grp
    FROM dbo.YourTable)
SELECT RowNumber,
       CustomerID,
       Value,
       StartDate,
       EndDate,
       MAX(SeriesStartRowNum) OVER (PARTITION BY Grp) AS SeriesStartRowNum
FROM Groups;
If your actual data is like the sample data you posted, with the first row of each combination of CustomerID and Value having a non-0 value and all the rest being 0s, then all you need is the MAX() window function:
SELECT RowNumber, CustomerID, Value, StartDate, EndDate,
       MAX(SeriesStartRowNum) OVER (PARTITION BY CustomerID, Value) AS SeriesStartRowNum
FROM tablename
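To illustrate the fill-down mechanics of the first query, here is a trimmed-down sketch run on SQLite (3.25+) via Python; only RowNumber and SeriesStartRowNum are kept, and the rows are invented to mimic the question's shape (each series starts with a non-0 value followed by 0s):

```python
import sqlite3

# Gaps-and-islands fill-down: the windowed COUNT of non-zero values
# assigns a group number, then MAX per group fills the 0s.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (RowNumber INTEGER, SeriesStartRowNum INTEGER)")
con.executemany("INSERT INTO YourTable VALUES (?, ?)",
                [(1, 1), (2, 0), (3, 0), (4, 4), (5, 0)])
rows = con.execute("""
    WITH Groups AS (
        SELECT RowNumber,
               SeriesStartRowNum,
               COUNT(NULLIF(SeriesStartRowNum, 0))
                   OVER (ORDER BY RowNumber) AS Grp
        FROM YourTable)
    SELECT RowNumber,
           MAX(SeriesStartRowNum) OVER (PARTITION BY Grp) AS Filled
    FROM Groups
    ORDER BY RowNumber
""").fetchall()
print(rows)   # rows 1-3 fill to 1, rows 4-5 fill to 4
```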

SQL Server : get date range and number of days

I have four columns: two filter columns, one date column and one flag column.
For a combination of column 1 and column 2, and the open flag, I want to find the open date, the close date and the number of days it was open.
Here is my example; what I want to find with SQL is shown in the blue/grey color.
You can use the DATEDIFF() function as shown below, if you already have the two dates available as OpenDate and CloseDate:
SELECT DATEDIFF(DAY, OpenDate, CloseDate) AS NumberOfOpenDays FROM YourTable
This looks like a gaps-and-islands problem -- but the islands are already defined by opendt (and perhaps col1 and col2).
So, you can just use row_number():
select row_number() over (partition by col1, col2, opendt order by effectivedt)
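A runnable sketch combining both ideas, using SQLite via Python. The column names col1/col2/OpenDate/CloseDate and the rows are assumptions, and SQLite's julianday() stands in for SQL Server's DATEDIFF(DAY, ...):

```python
import sqlite3

# Days open per row plus a row number within each (col1, col2) island.
# julianday() difference on date-only strings is an exact whole number.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, OpenDate TEXT, CloseDate TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("a", "x", "2023-01-01", "2023-01-05"),
    ("a", "x", "2023-02-01", "2023-02-02"),
])
rows = con.execute("""
    SELECT col1, col2, OpenDate, CloseDate,
           CAST(julianday(CloseDate) - julianday(OpenDate) AS INTEGER) AS NumberOfOpenDays,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY OpenDate) AS rn
    FROM t
    ORDER BY OpenDate
""").fetchall()
print(rows)
```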

Why Window Functions Require My Aggregated Column in Group

I have been working with window functions a fair amount but I don't think I understand enough about how they work to answer why they behave the way they do.
For the query I was working on (below), why am I required to take my aggregated field and add it to the GROUP BY? (In the second half of the query below, I can't produce a result unless I include "Events" in the second GROUP BY.)
With Data as (
    Select
        CohortDate as month
        ,datediff(week, CohortDate, EventDate) as EventAge
        ,count(distinct case when EventDate is not null then GUID end) as Events
    From MyTable
    where month >= [getdate():month] - interval '12 months'
    group by 1, 2
    order by 1, 2
)
Select
    month
    ,EventAge
    ,sum(Events) over (partition by month order by SubAge asc
                       rows between unbounded preceding and current row) as TotEvents
from data
group by 1, 2, Events
order by 1, 2
I have run into this enough that I have just taken it for granted, but would really love some more color as to why this is needed. Is there a way I should be formatting these differently in order to avoid this (somewhat non-intuitive) requirement?
Thanks a ton!
What you are looking for is presumably a cumulative sum. That would be:
select month, EventAge,
sum(sum(Events)) over (partition by month
order by SubAge asc
rows between unbounded preceding and current row
) as TotEvents
from data
group by 1, 2
order by 1, 2 ;
Why? That might be a little hard to explain. Perhaps the equivalent version with a subquery makes it clearer:
select me.*,
sum(sum_events) over (partition by month
order by SubAge asc
rows between unbounded preceding and current row
) as TotEvents
from (select month, EventAge, sum(events) as sum_events
from data
group by 1, 2
) me
order by 1, 2 ;
This is pretty much an exact shorthand for that query. The window function is evaluated after aggregation: you want to sum the per-group SUM of the events, hence sum(sum(events)). After aggregation, events itself is no longer available.
Nesting aggregation functions is awkward at first -- at least it was for me. When I first started using window functions, I spent a few days writing the aggregation queries with subqueries and then rewriting them without; I quickly got used to writing them without subqueries.
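The aggregate-then-window evaluation order can be sketched in plain Python over invented (month, EventAge, Events) rows; step 1 is the GROUP BY, step 2 is the running sum window:

```python
from itertools import groupby, accumulate

# Invented rows of (month, EventAge, Events). The two steps mirror how
# the engine evaluates the query: GROUP BY aggregates first, then the
# window function runs over those group sums.
rows = [("Jan", 1, 2), ("Jan", 1, 3), ("Jan", 2, 4), ("Feb", 1, 7)]

# Step 1: the inner aggregation -- sum(Events) per (month, EventAge).
grouped = [(m, age, sum(e for _, _, e in grp))
           for (m, age), grp in groupby(rows, key=lambda r: (r[0], r[1]))]

# Step 2: the window -- running total of those sums within each month.
result = []
for month, grp in groupby(grouped, key=lambda r: r[0]):
    grp = list(grp)
    for (m, age, s), tot in zip(grp, accumulate(g[2] for g in grp)):
        result.append((m, age, s, tot))
print(result)
```

Because step 2 only ever sees the output of step 1, the raw Events values are gone by then -- which is exactly why the SQL needs sum(sum(events)).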

Getting the Delta between the rows on SQL

I need to get the difference between rows in the same column. I can use LAG() OVER, but how can I get the result shown in the picture, where the count restarts whenever the name changes?
Something like this:
select t.*,
(call_received -
lag(call_received, 1, 0) over (partition by agent_name order by id)
) as delta
from t;
Note that this uses the rarely seen three-argument form of lag().
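A small demonstration on SQLite (3.25+) via Python, with invented id/agent_name/call_received data; the default of 0 makes the first delta of each agent equal to that row's own value:

```python
import sqlite3

# Per-agent delta using the three-argument LAG: the third argument (0)
# is returned when there is no previous row in the partition.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, agent_name TEXT, call_received INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "ann", 5), (2, "ann", 8), (3, "bob", 3), (4, "bob", 7),
])
rows = con.execute("""
    SELECT t.*,
           call_received -
               LAG(call_received, 1, 0) OVER (PARTITION BY agent_name ORDER BY id)
               AS delta
    FROM t
    ORDER BY id
""").fetchall()
print(rows)
```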

Calculating deltas in time series with duplicate & missing values

I have an Oracle table that consists of tuples of logtime/value1, value2, ..., plus additional columns such as a metering point ID. The values are sampled values of different counters that are each monotonically increasing, i.e. a newer value cannot be less than an older one. However, values can stay equal across several samplings, and values are sometimes missing, so the corresponding table entry is NULL while other values at the same logtime are valid. Also, the intervals between logtimes are not constant.
In the following, for simplicity I will regard only the logtime and one counter value.
I have to calculate the deltas from each logtime to the previous one. Using the method described in another question here gives two NULL deltas for each NULL value because two subtractions are invalid. A second solution fails when consecutive values are identical since the difference to the previous value is calculated twice.
Another solution is to construct a derived table/view with those NULL values replaced by the latest older valid value. My approach looks like this:
SELECT A.logtime, A.val,
       (A.val - (SELECT MAX(C.val)
                 FROM tab C
                 WHERE C.logtime =
                       (SELECT MAX(B.logtime)
                        FROM tab B
                        WHERE B.logtime < A.logtime AND B.val IS NOT NULL))) AS delta
FROM tab A;
I suspect that this will result in a quite inefficient query, especially when doing it for all N counters in the table, which results in (1 + 2*N) SELECTs. It also does not take advantage of the fact that the counter is monotonically increasing.
Are there any alternative approaches? I'd think others have similar problems, too.
An obvious solution would of course be to fill in those NULL values by constructing a new table or modifying the existing one, but unfortunately that is not possible in this case. Avoiding/eliminating them on entry isn't possible either.
Any help would be greatly appreciated.
select logtime,
       val,
       last_value(val ignore nulls) over (order by logtime) as not_null_val,
       last_value(val ignore nulls) over (order by logtime) -
           last_value(val ignore nulls) over (order by logtime
               rows between unbounded preceding and 1 preceding) as delta
from your_tab
order by logtime;
I found a way to avoid the nested SELECT statements using Oracle SQL's built-in LAG function:
SELECT logtime, val,
NVL(val-LAG(val IGNORE NULLS) OVER (ORDER BY logtime), 0) AS delta
FROM tab;
seems to work as I intended.
(Repeated here as a separate answer)
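For engines without IGNORE NULLS (SQLite, for instance, does not support it), the same logic can be sketched in plain Python; None stands in for SQL NULL, and the sample data is invented:

```python
# Plain-Python sketch of NVL(val - LAG(val IGNORE NULLS) OVER (ORDER BY
# logtime), 0): each delta is taken against the most recent non-NULL
# value before the current row; NULL rows and the first row yield 0.
samples = [(1, 10), (2, None), (3, 10), (4, 13), (5, None), (6, 15)]

deltas = []
prev_non_null = None            # last non-NULL val seen so far
for logtime, val in samples:
    if val is None or prev_non_null is None:
        delta = 0               # mirrors the NVL(..., 0) in the query
    else:
        delta = val - prev_non_null
    if val is not None:
        prev_non_null = val
    deltas.append((logtime, val, delta))
print(deltas)
```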