SQL Server - How to fill in missing column values - sql

I have a set of records at day level with 2 columns:
Invoice_date
Invoice_amount
For a few records, the value of invoice_amount is missing.
I need to fill invoice_amount wherever it is NULL, using this logic:
Look for the next available invoice_amount (on dates later than the date of the blank record).
For records where invoice_amount is still blank (no invoice_amount on any future date), look for the most recent previous invoice_amount (on dates before the date of the blank record).
Note: the dataset contains runs of multiple consecutive days where invoice_amount is blank.

Use OUTER APPLY to find the next and previous non-null Invoice_Amount:
update p
set Invoice_Amount = coalesce(nx.Invoice_Amount, pr.Invoice_Amount)
from Problem p
outer apply -- Next non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date > p.Invoice_Date
order by Invoice_Date
) nx
outer apply -- Prev non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date < p.Invoice_Date
order by Invoice_Date desc
) pr
where p.Invoice_Amount is null
This updates your table in place. If you need a SELECT query instead, it can easily be modified into one.
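The fill rule from the question (next non-null value first, otherwise the most recent previous one) can be sketched outside SQL as well. This is only an illustration in Python, assuming one row per date; the function name `fill_missing` and the `(date, amount)` tuple layout are made up for the example and are not part of the original table.

```python
from datetime import date


def fill_missing(rows):
    """rows: list of (invoice_date, invoice_amount or None), one row per date (assumption)."""
    ordered = sorted(rows, key=lambda r: r[0])
    amounts = [amount for _, amount in ordered]

    # next_known[i] = first non-null amount on a strictly later date than row i
    next_known = [None] * len(ordered)
    upcoming = None
    for i in range(len(ordered) - 1, -1, -1):
        next_known[i] = upcoming
        if amounts[i] is not None:
            upcoming = amounts[i]

    filled = []
    previous = None  # most recent non-null amount on an earlier date
    for i, (day, amount) in enumerate(ordered):
        if amount is None:
            # same precedence as coalesce(next, previous) in the UPDATE
            new_amount = next_known[i] if next_known[i] is not None else previous
        else:
            new_amount = amount
            previous = amount
        filled.append((day, new_amount))
    return filled


sample = [
    (date(2020, 1, 1), None),
    (date(2020, 1, 2), 10),
    (date(2020, 1, 3), None),
    (date(2020, 1, 4), None),
    (date(2020, 1, 5), 30),
    (date(2020, 1, 6), None),
]
print([amount for _, amount in fill_missing(sample)])
```

Runs of consecutive blanks are all filled from the same next value, and trailing blanks fall back to the last known value.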

Not efficient but seems to work. Try:
update test set invoice_amount =
coalesce ((select top 1 next.invoice_amount from test next
where next.invoiceDate > test.invoiceDate and next.invoice_amount is not null
order by next.invoiceDate),
(select top 1 prev.invoice_amount from test prev
where prev.invoiceDate < test.invoiceDate and prev.invoice_amount is not null
order by prev.invoiceDate desc))
where invoice_amount is null;

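As per the given example, you could use a window function with a self join. Note that each window here spans only the current row and one neighbouring row, so this fills a NULL only when an adjacent row has a value; it does not cover runs of consecutive NULLs.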
update t set t.amount = tt.NewAmount
from table t
inner join (
select Dates, coalesce(min(amount) over (order by dates desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
min(amount) over (order by dates asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
from table t
) tt on tt.dates = t.dates
where t.amount is null

Related

Exclude nulls from postgres window function

For each row, I want to take the average of the last 20 non-null values.
The window function takes the 20 preceding rows including the null ones and averages them, while I want the average of the last 20 non-null rows.
What have I tried?
WITH adv_calculated AS (
SELECT security_id AS sec_id,
date AS pd,
AVG(volume)
OVER (PARTITION BY (security_id) ORDER BY date ROWS BETWEEN 20 PRECEDING AND CURRENT ROW) AS adv
FROM tbl_financial_index_data
WHERE volume IS NOT NULL
GROUP BY security_id, date
)
UPDATE tbl_financial_index_data
SET adv = amc.adv
FROM adv_calculated amc
WHERE amc.sec_id = security_id
AND amc.pd = date
This solution works well for all rows where volume is NOT NULL, but it does not calculate the adv (average) for the rows where volume is NULL.
For the rows where adv and volume are NULL, I then have to run this query, which is really slow:
UPDATE tbl_financial_index_data
SET average_daily_volume =
(SELECT avg(t.volume)
FROM (
SELECT a.volume
FROM tbl_financial_index_data a
WHERE a.security_id = tbl_financial_index_data.security_id
AND a.date::date <= tbl_financial_index_data.date::date
AND a.volume IS NOT NULL
ORDER BY a.date DESC
LIMIT 21
) t)
WHERE volume IS NULL;
I want to avoid using the second query and calculate ADV for all rows using first query (because it is much faster).
Simply omit the WHERE condition WHERE volume IS NOT NULL, then you should get what you want.
You can nest the query in an outer query to remove the undesired values later:
WITH adv_calculated AS (
SELECT ...
FROM (SELECT AVG(volume) OVER (... ROWS BETWEEN ... AND ...),
...
FROM tbl_financial_index_data
GROUP BY security_id, date
) AS subq
WHERE volume IS NOT NULL
)
SELECT ...
This is just a workaround.
Okay, so I found the solution.
If the volume is null for some row then the adv for that row is going to be equal to the previous non-null volume row's adv, so I had to find some way to carry forward previous non-null adv for rows where it is null.
I was able to find a way to do that from this answer.
Here's the code to carry forward the non-null value:
WITH temp_adv_filled_values AS (
SELECT security_id,
date,
FIRST_VALUE(adv) OVER W AS adv
FROM (
SELECT security_id,
date,
adv,
SUM(CASE WHEN volume IS NULL THEN 0 ELSE 1 END)
OVER (PARTITION BY security_id ORDER BY date ) AS value_partition
FROM tbl_financial_index_data
) AS q
WINDOW W AS (PARTITION BY security_id, value_partition ORDER BY date)
)
UPDATE tbl_financial_index_data tfid
SET adv = tmcfv.adv
FROM temp_adv_filled_values tmcfv
WHERE tmcfv.security_id = tfid.security_id
AND tmcfv.date = tfid.date;
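The carry-forward trick in the query above relies on a running count of non-null rows: every NULL row gets the same count as the last non-null row before it, so the first value of each resulting group is the value to carry. A rough Python sketch of the same idea on a single list, with the hypothetical name `carry_forward` (not from the original answer):

```python
def carry_forward(values):
    """Fill each None with the most recent earlier non-None value, if any."""
    # Running count of non-null values, like SUM(CASE ...) OVER (ORDER BY date)
    partition_ids = []
    running = 0
    for value in values:
        if value is not None:
            running += 1
        partition_ids.append(running)

    # First value seen in each partition, like FIRST_VALUE(...) OVER W
    first_value = {}
    for pid, value in zip(partition_ids, values):
        first_value.setdefault(pid, value)

    return [first_value[pid] for pid in partition_ids]


print(carry_forward([None, 5, None, None, 7, None]))
```

Leading NULLs stay NULL because partition 0 has no non-null value to start from.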

SQL : Date condition Updated

I want to update my column depending on dates from two different tables. Below is my query:
UPDATE T_TRN_DEAL_DETAILS SET MATURITY_DT =
(CASE
WHEN (Select top 1 INS.INVOICE_DUE_DT as invoice_date from T_TRN_INVOICE_DETAILS IND
INNER jOIN T_TRN_INVOICE_SUMMARY INS on IND.INVOICE_ID=INS.INVOICE_ID
where IND.DEAL_ID ='1234'
order by INS.INVOICE_DUE_DT desc ) > (Select DEAL_ID,MATURITY_DT as invoice_date from T_TRN_DEAL_DETAILS WHERE DEAL_ID ='1234')
THEN (Select top 1 INS.INVOICE_DUE_DT as invoice_date from T_TRN_INVOICE_DETAILS IND
INNER jOIN T_TRN_INVOICE_SUMMARY INS on IND.INVOICE_ID=INS.INVOICE_ID
where IND.DEAL_ID ='1234'
order by INS.INVOICE_DUE_DT desc)
ELSE (Select DEAL_ID,MATURITY_DT as invoice_date from T_TRN_DEAL_DETAILS WHERE DEAL_ID ='DL18111213586')
END )
WHERE DEAL_ID ='1234'
But I am getting the error below:
Only one expression can be specified in the select list when the
subquery is not introduced with EXISTS
Although I am just comparing two dates.
The error is clear:
A subquery must return only one column, but in your ELSE branch you have written the following:
(Select DEAL_ID,MATURITY_DT as invoice_date
from T_TRN_DEAL_DETAILS WHERE DEAL_ID ='DL18111213586')
So you are trying to return two columns. You must remove the DEAL_ID column, because MATURITY_DT is the field you want to use to update your main table.
You have made the same error in the first branch, where you compare two subqueries (>) and the second subquery returns two columns instead of one:
(Select DEAL_ID,MATURITY_DT as invoice_date
from T_TRN_DEAL_DETAILS WHERE DEAL_ID ='1234')

Get value of same column from next row if current column value is null

I have a table and I want to select one column such that, if its record is not found (because I have joins with other tables) or exists but is NULL, the value of the same column from the next row is selected instead. I tried the ISNULL and COALESCE functions, but I am unable to get the value of the next row.
Any help or link would be appreciated.
Here is my query so far
Select
(select top 1 OpenPrice from #tbltempData where Dated=D.Dated) [Open],
ISNULL((select top 1 ClosePrice from #tbltempData where Dated= DATEADD(hour,@Interval-1, D.Dated)),
(select top 1 ClosePrice from #tbltempData where Dated= DATEADD(hour,0, D.Dated))) [Close],
[Min],[Max],Dated
from #tbltempData2 D
Order BY Dated Asc
The Open column contains NULL values.
Here is a screenshot of my sample data,
and here is the output I am getting.
Details: my sample data has records for the date '28/06/2019', and the first record's time is 9 am. I am grouping the data into 2-hour intervals, so after grouping, the first group for that date starts at 8 am. Since the sample data has no value for that time, I get NULL values. To avoid this, I want to take the OpenPrice from the 9 am record (in the sample data) of the same date, because that time falls in the same group.
If you want the "next row" to always be at or after the current time:
[Open] = (
select top 1 OpenPrice
from #tbltempData t
where DATEDIFF(day,t.Dated,D.Dated) = 0 -- make sure the price for same day
AND t.Dated>=D.Dated
ORDER BY t.Dated ASC
)
In case you want the "next row" to be the closest available time slot:
[Open] = (
select top 1 OpenPrice
from #tbltempData t
where DATEDIFF(day,t.Dated,D.Dated) = 0 -- make sure the price for same day
ORDER BY ABS(DATEDIFF(minute,t.Dated,D.Dated)) ASC
)
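To make the "closest available time slot" rule concrete, here is a small Python sketch of the same selection: restrict to the same calendar day, then take the row with the smallest absolute time difference. The function name `closest_open_price` and the `(timestamp, price)` pairs are assumptions for illustration only.

```python
from datetime import datetime


def closest_open_price(prices, slot):
    """prices: list of (timestamp, open_price); slot: the group's start time."""
    # Keep only rows from the same calendar day as the slot
    same_day = [(ts, price) for ts, price in prices if ts.date() == slot.date()]
    if not same_day:
        return None
    # Smallest absolute distance in time wins, like ORDER BY ABS(DATEDIFF(...))
    return min(same_day, key=lambda row: abs(row[0] - slot))[1]


prices = [
    (datetime(2019, 6, 28, 9, 0), 100.0),
    (datetime(2019, 6, 28, 11, 0), 105.0),
]
print(closest_open_price(prices, datetime(2019, 6, 28, 8, 0)))
```

With the 8 am group from the question and a first record at 9 am, the 9 am price is picked.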
I think a correlated subquery does what you want:
select d.*,
(select top (1) ClosePrice
from #tbltempData td
where td.Dated <= D.Dated
order by td.Dated desc
) as ClosePrice
from #tbltempData2 d
order by dated Asc

TSQL syntax to feed results into subquery

I'm after some help on how best to write a query that does the following. I think I need a subquery but I don't know how to use the data returned in the row to feed back into the subquery without hardcoding values? A subquery may not be the right thing here?
Ideally I only want 1 variable ...WHERE t_Date = '2018-01-01'
Desired Output:
The COUNT Criteria column has the following rules
Date < current row
Area = current row
Name = current row
Value = 1
For example, the first row indicates there are 2 records with Date < '2018-01-01' AND Area = 'Area6' AND Name = 'Name1' AND Value = 1
Example Data:
SQLFiddle: http://sqlfiddle.com/#!18/92ba3/4
Effectively I only want to return the first 2 rows but summarise the historic data into a column based on the output in that column.
The right way to do this is to use the cumulative sum functionality in ANSI SQL and SQL Server since 2012:
select t.*,
sum(case when t.value = 1 then 1 else 0 end) over (partition by t_area, t_name order by t_date)
from t;
This actually includes the current row. If you have only one row per date (for the area/name combo), then you can just subtract it or use a windowing clause:
select t.*,
sum(case when t.value = 1 then 1 else 0 end) over
(partition by t_area, t_name
order by t_date
rows between unbounded preceding and 1 preceding
)
from t;
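The windowing clause above amounts to a per-group running count that is read before the current row is added to it. A minimal Python sketch of that logic, assuming at most one row per date for each area/name pair (the same assumption the answer makes); `prior_counts` and the tuple layout are hypothetical:

```python
from collections import defaultdict


def prior_counts(rows):
    """rows: list of (t_date, t_area, t_name, t_value), dates as ISO strings."""
    seen = defaultdict(int)  # (area, name) -> number of earlier rows with value 1
    result = []
    for t_date, area, name, value in sorted(rows, key=lambda r: r[0]):
        # Read the count first, so the current row is excluded
        result.append((t_date, area, name, value, seen[(area, name)]))
        if value == 1:
            seen[(area, name)] += 1
    return result


rows = [
    ("2017-11-01", "Area6", "Name1", 1),
    ("2017-12-01", "Area6", "Name1", 1),
    ("2017-12-15", "Area6", "Name1", 0),
    ("2018-01-01", "Area6", "Name1", 1),
    ("2018-01-01", "Area2", "Name2", 0),
]
for row in prior_counts(rows):
    if row[0] == "2018-01-01":
        print(row)
```

Filtering on the date afterwards, as in the last loop, corresponds to keeping the single `WHERE t_Date = '2018-01-01'` condition on the outer result.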
Use a self join to find records in the same table that are related to a particular record:
SELECT t1.t_Date, t1.t_Area, t1.t_Name, t1.t_Value,
COUNT(t2.t_Name) AS COUNTCriteria
FROM Table1 as t1
LEFT OUTER JOIN Table1 as t2
ON t1.t_Area=t2.t_Area
AND t1.t_Name=t2.T_Name
AND t2.t_Date<t1.t_Date
AND t2.t_Value=1
GROUP BY t1.t_Date, t1.t_Area, t1.t_Name, t1.t_Value

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the RptPeriod doesn't run consecutively for the account. For example, 344 200803 would have an X next to it, since it jumps from 200801 to 200803.
This is for about 19321 rows and I want it on a per-company basis: between different companies I don't care what the numbers are, I just want to see where there are breaks in the number pattern within the same company.
Any ideas?
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less than this row's RptPeriod
else 'x' -- Preceding row's RptPeriod is not exactly one less than this row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
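The query above uses the "gaps and islands" idea: within one account, RptPeriod minus the row number stays constant as long as the periods increase by exactly 1, so a new value of that difference marks the start of a new island. A hedged Python sketch of the same logic follows; `flag_breaks` is a made-up name, and periods are treated as plain integers, so a step from 200812 to 200901 would also be flagged.

```python
from collections import defaultdict


def flag_breaks(rows):
    """rows: list of (account_number, rpt_period) with integer periods."""
    by_account = defaultdict(list)
    for account, period in rows:
        by_account[account].append(period)

    result = []
    for account in sorted(by_account):
        seen_groups = set()
        for rn, period in enumerate(sorted(by_account[account]), start=1):
            grp = period - rn  # constant within a run of consecutive periods
            first_in_group = grp not in seen_groups
            seen_groups.add(grp)
            # First row of an island, but not the account's very first row
            flag = "X" if first_in_group and rn > 1 else None
            result.append((account, period, flag))
    return result


report = [
    (123, 200801), (123, 200802), (123, 200803),
    (234, 200801),
    (344, 200801), (344, 200803),
]
for row in flag_breaks(report):
    print(row)
```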
SELECT *
FROM report r
LEFT JOIN report r2
ON r2.accountnumber = r.accountnumber
AND {r2.rptperiod is one day after r.rptPeriod}
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r.rptPeriod
WHERE r2.rptPeriod IS NULL
-- the inner join on r3 already guarantees that a later rptPeriod exists
I'm not sure of SQL Server's date logic syntax, but hopefully you get the idea. r will be all the records for which no next rptPeriod exists (r2 is NULL) and at least one greater rptPeriod exists (r3). The query isn't super straightforward, I guess, but if you have an index on the two columns, it'll probably be the most efficient way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
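SELECT
r1.AccountNumber,
r1.RptPeriod,
-- r1 is the current row, r2 the preceding one; subtract 88 across a year boundary so that 200812 -> 200901 counts as 1
Chk = CASE ISNULL(CASE WHEN r1.RptPeriod / 100 > r2.RptPeriod / 100 THEN -88 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1)
WHEN 1 THEN ''
ELSE 'X'
END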
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.
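If RptPeriod really is a year and month encoded as YYYYMM (as this answer assumes), one way to handle year transitions and longer gaps uniformly is to convert the difference into a number of months. A small Python sketch of that arithmetic, with the hypothetical helper name `months_between`:

```python
def months_between(earlier, later):
    """Number of months from one YYYYMM integer to another (assumed encoding)."""
    year_diff = later // 100 - earlier // 100
    month_diff = later % 100 - earlier % 100
    return year_diff * 12 + month_diff


print(months_between(200812, 200901))
print(months_between(200801, 200803))
```

A row would then be marked only when the result for the preceding row and the current row is not exactly 1, whatever the size of the gap.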