How to filter out NULLs from a window function - sql

I'm trying to return just non-null diff_review values > 1 and so far this has been my result:
This is the query that generated this:
select review_number
, review_number - lag(review_number, 1) over (partition by organization_id order by review_number) as diff_review from applications
where organization_id = 25144
and kind = 'annual_review'
and review_number is not null
order by diff_review desc
I can't use ...and diff_review is not null since you can't use aliases in where clauses, but I found out today you also can't use windowing functions in where clauses either.
This is the first time I've ever used windowing in SQL (I hadn't even heard of it until an hour ago) so I'm still very green at this. I'd appreciate someone clueing me in thanks!!!

You can use a table expression to "alias" the column. For example:
select *
from (
select review_number
, review_number - lag(review_number, 1)
over (partition by organization_id order by review_number)
as diff_review
from applications
where organization_id = 25144
and kind = 'annual_review'
and review_number is not null
) x
where diff_review is not null -- here you can use the aliased column
order by diff_review desc

Alternative method: a self join on row_number():
with omg AS (
select review_number
, row_number() over (partition by organization_id order by review_number) as rn
from applications
where organization_id = 25144
and kind = 'annual_review'
)
SELECT o2.review_number
, o2.review_number - o1.review_number AS diff_review
FROM omg o2
JOIN omg o1 ON (o2.review_number = o1.review_number AND o2.rn = o1.rn +1)
order by 2 desc
;

Related

Use of MAX function in SQL query to filter data

The code below joins two tables and I need to extract only the latest date per account, though it holds multiple accounts and history records. I wanted to use the MAX function, but not sure how to incorporate it for this case. I am using My SQL server.
Appreciate any help !
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes than any given account would not have two records on the same EffDate. If that's the case, and there is no other objective means to determine the latest account, you could also use row_numer to pick a somewhat arbitrary record in the case of a tie.
Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate <= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
Not tested
with temp1 as
(
select foo
from bar
whre xy = MAX(xy)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.dbo.PROP_DATA.FileID
ORDER BY PROP.EffDate DESC

Rank & find difference of value in the same column

I have the below table -
Here, I have created the "Order" column by using the rank function partitioned by case_identifier ordered by audit_date.
Now, I want to create a new column as below -
The logic for the new column would be -
select *,
case when [order] = '1' then [days_diff]
else (val of [days_diff] in rank 2) - (val of [days_diff] in rank 1) ...
end as '[New_Col]'
from TABLE
Can you please help me with the syntax? Thanks.
Take a look at the LAG function. It will provide you with what you want.
something like:
declare #temptable TABLE (case_id varchar(2), row_order int, days_diff float)
INSERT INTO #temptable values ('A',1,5)
INSERT INTO #temptable values ('A',2,3)
INSERT INTO #temptable values ('A',3,2)
INSERT INTO #temptable values ('B',1,5)
INSERT INTO #temptable values ('B',2,1)
--select * from #temptable
SELECT case_id,row_order, LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row,days_diff,
CASE
WHEN row_order = 1 THEN days_diff
ELSE LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff
END AS newcolumn
FROM #temptable
order by case_id,row_order asc
SELECT case_id,row_order,LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) AS prev_row, days_diff,
COALESCE(LAG(days_diff,1) OVER (PARTITION BY case_id ORDER BY row_order) - days_diff , days_diff)
FROM #temptable
order by case_id,row_order asc
Other answers will use a coalesce in place of the CASE statement. It's probably faster, but I feel like this is clearer.
If you run both and look at the execution plans they are the same.
I believe the following query gets you what you want.
SELECT a.*,
'NEW DAYS DIFF' =
CASE
WHEN a.[order] = 1 THEN a.days_diff
ELSE a.days_diff - b.days_diff
END
FROM dbo.tblCaseDaysDiff a
INNER JOIN dbo.tblCaseDaysDiff b
ON
(b.CASE_ID = a.CASE_ID AND b.[order] + 1 = a.[order] ) -- Get the current row and compare with the next highest order
OR (b.CASE_ID = a.CASE_ID AND b.[order] = 1 AND a.[order] = 1) --WHEN ORDER = 1 Get days_diff value
ORDER BY a.CASE_ID, a.[order]
As it happens, you're already hip-deep in window functions, and as others have pointed out, LAG will do the trick. In general, though, you can always get the difference of two rows by making one row: by joining the table to itself.
with T (CASE_IDENTIFIER, AUDIT_DATE, order, days_diff)
as (
... your query ...
)
select a.*,
a.days_diff - coalesce(b.days_diff, 0) as delta_days_diff
from T as a left join T as b
on a.CASE_IDENTIFIER = b.CASE_IDENTIFIER
and b.days_diff = a.days_diff - 1
LAG METHOD
SELECT
CASE_IDENTIFIER
,AUDIT_DATE
,[order]
,days_diff
,days_diff - ISNULL(LAG(days_diff,1) OVER (PARTITION BY CASE_IDENTIFIER ORDER BY [order]),0) AS New_Column
FROM #Table
SELF JOIN METHOD
SELECT
t1.CASE_IDENTIFIER
,AUDIT_DATE
,t1.[order]
,t1.days_diff
,t1.days_diff - ISNULL(t2.days_diff,0) AS New_Column
FROM
#Table t1
LEFT JOIN #Table t2
ON t1.CASE_IDENTIFIER = t2.CASE_IDENTIFIER
AND t1.[order] - 1 = t2.[order]
I feel like a lot of the other answers are on the right track but there are some nuances or easier ways of writing some of them. Or also some of the answer provide the write direction but had something wrong with their join or syntax. Anyways, you don't need the CASE STATEMENT whether you use the LAG of SELF JOIN Method. Next COALESCE() is great but you are only comparing 2 values so ISNULL() works fine too for sql-server but either will do.

Update table with another column in the same table

I have a table like this
Test_order
Order Num Order ID Prev Order ID
987Y7OP89 919325 0
987Y7OP90 1006626 919325
987Y7OP91 1029350 1006626
987Y7OP92 1756689 0
987Y7OP93 1756690 0
987Y7OP94 1950100 1756690
987Y7OP95 1977570 1950100
987Y7OP96 2160462 1977570
987Y7OP97 2288982 2160462
Target table should be like below,
Order Num Order ID Prev Order ID
987Y7OP89 919325 0
987Y7OP90 1006626 919325
987Y7OP91 1029350 1006626
987Y7OP92 1756689 1029350
987Y7OP93 1756690 1756689
987Y7OP94 1950100 1756690
987Y7OP95 1977570 1950100
987Y7OP96 2160462 1977570
987Y7OP97 2288982 2160462
987Y7OP97 2288900 2288982
Prev Order ID should be updated with the Order ID from the previous record from the same table.
I'm trying to create a dummy data set and update..but it's not working..
WITH A AS
(SELECT ORDER_NUM, ORDER_ID, PRIOR_ORDER_ID,ROWNUM RID1 FROM TEST_ORDER),B AS (SELECT ORDER_NUM, ORDER_ID, PRIOR_ORDER_ID,ROWNUM+1 RID2 FROM TEST_ORDER)
SELECT A.ORDER_NUM,B.ORDER_ID,A.PRIOR_ORDER_ID,B.PRIOR_ORDER_ID FROM A,B WHERE RID1 = RID2
You could use Oracles Analytical Functions (also called Window functions) to pick up the value from the previous order:
UPDATE Test_Order
SET ORDERID = LAG(ORDERID, 1, 0) OVER (ORDER BY ORDERNUM ASC)
WHERE PrevOrderId = 0
See here for the documentation on LAG()
In sql-server you cannot use window function in update statement, not positive but don't think so in Oracle either. Anyway to get around that you can just update a cte as follows.
WITH cte AS (
SELECT
*
,NewPreviousOrderId = LAG(OrderId,1,0) OVER (ORDER BY OrderNum)
FROM
TableName
)
UPDATE cte
SET PrevOrderId = NewPreviousOrderId
And if you want to stick with the ROW_NUMBER route you were going this would be the way of doing it.
;WITH cte AS (
SELECT
*
,ROW_NUMBER() OVER (ORDER BY OrderNum) AS RowNum
FROM
TableName
)
UPDATE c1
SET PrevOrderId = c2.OrderId
FROM
cte c1
INNER JOIN cte c2
ON (c1.RowNum - 1) = c2.RowNum

SQL ROW_NUMBER OVER syntax

I have this:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured
on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND RowNumber = 1
ORDER BY homepagefeatured.orderby
It fails on "Invalid column name RowNumber"
Not sure how to prefix it to access it?
Thanks
You can't reference the field like that. You can however use a subquery or a common-table-expression:
Here's a subquery:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
) T
WHERE RowNumber = 1
ORDER BY orderby
Rereading your query, since you aren't partitioning by any fields, the order by at the end is useless (it contradicts the order of the ranking function). You're probably better off using top 1...
Using top:
SELECT top 1 vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
ORDER BY homepagefeatured.orderby
(EDIT: This answer is a nearby pitfall. I leave it for documentation.)
Have a look here: Referring to a Column Alias in a WHERE Clause
This is the same situation.
It is a matter of how the sql query is parsed/compiled internally, so your field alias names are not known at the time the where clause is interpreted. Therefore you might try, in reference to the example above:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid, vwmain.manufacturer,vwmain.model,vwmain.derivative, vwmain.isspecial,vwmain.created,vwmain.updated,vwmain.stocklevel, vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND ROW_NUMBER() OVER (ORDER BY vwmain.ch) = 1
ORDER BY homepagefeatured.orderby
Thus you see your expression in the select statement is exactly reused in the where clause.

SQL ROW_NUMBER with INNER JOIN

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity
You need to stick it in a table expression to filter on ROW_NUMBER. You won't be able to use * as it will complain about the column name starRating appearing more than once so will need to list out the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN
You can use a with clause. Please try the following
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10;