Replace null values with most recent non-null values SQL - sql

I have a table where each row consists of an ID, date, variable values (eg. var1).
When there is a null value for var1 in a row, I want to replace it with the most recent non-null value before that date for that ID. How can I do this quickly for a very large table?
So presume I start with this table:
+----+--------------+------+
| id | date         | var1 |
+----+--------------+------+
| 1  | '01-01-2022' | 55   |
| 2  | '01-01-2022' | 12   |
| 3  | '01-01-2022' | 45   |
| 1  | '01-02-2022' | Null |
| 2  | '01-02-2022' | Null |
| 3  | '01-02-2022' | 20   |
| 1  | '01-03-2022' | 15   |
| 2  | '01-03-2022' | Null |
| 3  | '01-03-2022' | Null |
| 1  | '01-04-2022' | Null |
| 2  | '01-04-2022' | 77   |
+----+--------------+------+
Then I want this:
+----+--------------+------+
| id | date         | var1 |
+----+--------------+------+
| 1  | '01-01-2022' | 55   |
| 2  | '01-01-2022' | 12   |
| 3  | '01-01-2022' | 45   |
| 1  | '01-02-2022' | 55   |
| 2  | '01-02-2022' | 12   |
| 3  | '01-02-2022' | 20   |
| 1  | '01-03-2022' | 15   |
| 2  | '01-03-2022' | 12   |
| 3  | '01-03-2022' | 20   |
| 1  | '01-04-2022' | 15   |
| 2  | '01-04-2022' | 77   |
+----+--------------+------+

A CTE suits this well.
The snippet below returns every row with the nulls filled in; turning it into an UPDATE is the remaining step.
WITH selectcte AS
(
    -- Only rows that actually carry a value can serve as a fill source.
    SELECT * FROM testnulls WHERE var1 IS NOT NULL
)
SELECT t1A.id, t1A.date, ISNULL(t1A.var1, t1B.var1) AS varvalue
FROM testnulls t1A  -- read the full table so the NULL rows are returned too
OUTER APPLY (SELECT TOP 1 var1
             FROM selectcte
             WHERE id = t1A.id AND date < t1A.date
             ORDER BY date DESC) t1B
You can read more about CTEs here:
https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver16
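For a very large table, a window-function version may be worth comparing against the OUTER APPLY approach, since it avoids one lookup per row. Below is a minimal sketch of that idea, run through Python's sqlite3 module only so that it is self-contained. The ISO date strings are an assumption (the question's dates are read as MM-DD-YYYY), and SQLite 3.25 or later is assumed for window functions. The same SELECT should carry over to SQL Server with little change, but that has not been verified here.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption)
# so that plain string ordering is chronological.
rows = [
    (1, "2022-01-01", 55), (2, "2022-01-01", 12), (3, "2022-01-01", 45),
    (1, "2022-01-02", None), (2, "2022-01-02", None), (3, "2022-01-02", 20),
    (1, "2022-01-03", 15), (2, "2022-01-03", None), (3, "2022-01-03", None),
    (1, "2022-01-04", None), (2, "2022-01-04", 77),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testnulls (id INTEGER, date TEXT, var1 INTEGER)")
conn.executemany("INSERT INTO testnulls VALUES (?, ?, ?)", rows)

# COUNT(var1) skips NULLs, so the running count only increases on rows that
# carry a value. Each NULL row therefore lands in the same group as the last
# non-null row before it, and MAX over that group returns that single value.
fill_sql = """
SELECT id, date, MAX(var1) OVER (PARTITION BY id, grp) AS var1
FROM (
    SELECT id, date, var1,
           COUNT(var1) OVER (PARTITION BY id ORDER BY date) AS grp
    FROM testnulls
) AS g
ORDER BY date, id
"""
filled = conn.execute(fill_sql).fetchall()
for row in filled:
    print(row)
```

Rows that come before the first non-null value for an id stay NULL, because there is nothing earlier to carry forward.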

Related

Merging multiple "state-change" time series

Given a number of tables like the following, representing state-changes at time t of an entity identified by id:
| A | | B |
| t | id | a | | t | id | b |
| - | -- | - | | - | -- | - |
| 0 | 1 | 1 | | 0 | 1 | 3 |
| 1 | 1 | 2 | | 2 | 1 | 2 |
| 5 | 1 | 3 | | 3 | 1 | 1 |
where t is in reality a DateTime field with millisecond precision (making discretisation infeasible), how would I go about creating the following output?
| output |
| t | id | a | b |
| - | -- | - | - |
| 0 | 1 | 1 | 3 |
| 1 | 1 | 2 | 3 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 1 |
| 5 | 1 | 3 | 1 |
The idea is that for any given input timestamp, the entire state of a selected entity can be extracted by selecting one row from the resulting table. So the latest state of each variable corresponding to any time needs to be present in each row.
I've tried various JOIN statements, but I seem to be getting nowhere.
Note that in my use case:
rows also need to be joined by entity id
there may be more than two source tables to be merged
I'm running PostgreSQL, but I will eventually translate the query to SQLAlchemy, so a pure SQLAlchemy solution would be even better
I've created a db<>fiddle with the example data.
I think you want a full join and some other manipulations. The ideal would be:
select t, id,
       last_value(a.a ignore nulls) over (partition by id order by t) as a,
       last_value(b.b ignore nulls) over (partition by id order by t) as b
from a full join
     b
     using (t, id);
But Postgres doesn't support ignore nulls, so an alternative method is:
select t, id,
       max(a) over (partition by id, grp_a) as a,
       max(b) over (partition by id, grp_b) as b
from (select *,
             count(a.a) over (partition by id order by t) as grp_a,
             count(b.b) over (partition by id order by t) as grp_b
      from a full join
           b
           using (t, id)
     ) ab;
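To check the grouping trick against the example data, here is a small sketch run through Python's sqlite3 module. It makes several assumptions. The tables are renamed series_a and series_b (hypothetical names) to avoid a table and a column sharing one name. The full join is emulated with UNION ALL plus GROUP BY, because older SQLite versions lack FULL JOIN. SQLite 3.25 or later is assumed for window functions. In PostgreSQL the query from the answer can be used as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series_a (t INTEGER, id INTEGER, a INTEGER);
CREATE TABLE series_b (t INTEGER, id INTEGER, b INTEGER);
INSERT INTO series_a VALUES (0, 1, 1), (1, 1, 2), (5, 1, 3);
INSERT INTO series_b VALUES (0, 1, 3), (2, 1, 2), (3, 1, 1);
""")

merge_sql = """
WITH ab AS (
    -- Stand-in for "a FULL JOIN b USING (t, id)": stack both tables,
    -- then collapse the rows that share the same (t, id).
    SELECT t, id, MAX(a) AS a, MAX(b) AS b
    FROM (
        SELECT t, id, a, NULL AS b FROM series_a
        UNION ALL
        SELECT t, id, NULL AS a, b FROM series_b
    ) AS stacked
    GROUP BY t, id
),
grouped AS (
    -- The running count of non-null values marks where each state begins.
    SELECT t, id, a, b,
           COUNT(a) OVER (PARTITION BY id ORDER BY t) AS grp_a,
           COUNT(b) OVER (PARTITION BY id ORDER BY t) AS grp_b
    FROM ab
)
SELECT t, id,
       MAX(a) OVER (PARTITION BY id, grp_a) AS a,
       MAX(b) OVER (PARTITION BY id, grp_b) AS b
FROM grouped
ORDER BY id, t
"""
merged = conn.execute(merge_sql).fetchall()
for row in merged:
    print(row)
```

A third source table would add one more branch to the UNION ALL and one more count/max pair, so the pattern extends to more than two tables.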

How do I get the latest user-updated column value in a table based on a timestamp entry in a different table in SQL Server?

I have a temp table #StatusInfo with the following data
+---------+--------------+-------+-------------------------+--+
| OrderNo | GroupLineNum | Type1 | UpdateDate | |
+---------+--------------+-------+-------------------------+--+
| Order85 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order86 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order86 | 2 | 2 | 2019-11-25 05:32:23.773 | |
+---------+--------------+-------+-------------------------+--+
| Order87 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order87 | 1 | 2 | 2019-11-25 05:43:37.637 | | B
+---------+--------------+-------+-------------------------+--+
| Order87 | 2 | 2 | 2019-11-25 05:42:32.390 | | A
+---------+--------------+-------+-------------------------+--+
| Order88 | NULL | 1 | 2019-11-25 06:35:13.000 | |
+---------+--------------+-------+-------------------------+--+
| Order88 | 1 | 2 | 2019-11-25 06:39:16.170 | |
+---------+--------------+-------+-------------------------+--+
Any update the user makes to an order is pulled into this temp table. A value of 2 in the Type1 column denotes a change to the 'Required Date' field by the user. The timestamp of the change is in the UpdateDate column.
I have another temp table #LineInfo with the following data. This table is created by joining other tables and a left join with the above table too. The 'LineNum' column from below table will match the 'GroupLineNum' column in the above table for Type1=2
+---------+-----------+---------+------------+-------------------------+-------+
| OrderNo | RowNumber | LineNum | TotalCost | ReqDate | Type1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 2 | 2 | 265.560000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 2 | 2 | 265.560000 | 2019-12-28 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 1 | 1 | 309.110000 | 2020-01-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 2 | 2 | 265.560000 | 2020-01-01 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 1 | 1 | 309.110000 | 2019-11-29 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 2 | 2 | 265.560000 | 2019-12-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
I will be joining #LineInfo with other tables to generate a new table with only one record per OrderNo. It's grouped by OrderNo.
What I need to do is ensure that the new select query has a column 'ReqDate' holding the latest ReqDate value for the order.
For example, Order87 has two lines in the order. The user updated Line 2 first, at '2019-11-25 05:42:32.390' (the row marked 'A' in the first table), followed by Line 1, marked 'B', at '2019-11-25 05:43:37.637'.
The new query should have the data from LineInfo and only the 'ReqDate' value matching the 'LineNum' that has the maximum 'UpdateDate' for Type1=2, grouped by OrderNo.
So in our example, the output should have the ReqDate value '2020-01-31 23:59:00.000'.
In short, an order should have the most recently updated required date. An order can have multiple line items where ReqDate is updated. If there is no entry in the #StatusInfo table with Type1=2 for an order, then any one of the ReqDate values from the #LineInfo table will suffice, maybe the first line's.
I wrote something like this, but it doesn't pull orders without any entry in the StatusInfo table. Those orders will have a default value even though the user didn't update it, and I am not sure how to join the result of this with the LineInfo table to set the latest value.
Select SIT.Orderno, max_date,grouplinenum
from #StatusInfo SIT
inner join
(SELECT Orderno, MAX(ActDate) as max_date
FROM #StatusInfo SI
WHERE SI.Type1=2
GROUP BY SI.Orderno)a
on a.Orderno = SIT.Orderno and a.max_date = SIT.ActDate
This is what I did. I created the CTE below to load the orders with a required-date change, ordered by update date, and assigned each a row number. The record with row number 1 is the most recently updated one.
-- We need the latest ReqDate value the user set, so we order the status rows by
-- update date (newest first), number them per order, and carry each line's ReqDate.
;WITH cteLatestReqDate AS (
SELECT SIT.OrderNo, SIT.UpdateDate, SIT.GroupLineNum, LLI.ReqDate,
ROW_NUMBER() OVER (PARTITION BY SIT.OrderNo ORDER BY SIT.UpdateDate DESC) AS RowNum
FROM #StatusInfo SIT INNER JOIN #LineLevelInfo LLI ON SIT.OrderNo = LLI.OrderNo AND SIT.GroupLineNum = LLI.LineNum
WHERE SIT.Type1 = 2
)
Then I added the condition below to my select query. The select query shown is partial:
SELECT
CASE WHEN MAX(LRD.ReqDate) IS NULL THEN CAST(FORMAT(MAX(LLI.ReqDate), 'yyMMdd') AS NVARCHAR(10))
ELSE CAST(FORMAT(MAX(LRD.ReqDate), 'yyMMdd') AS NVARCHAR(10)) END AS LatestReqDate
FROM #LineLevelInfo LLI
LEFT JOIN(SELECT * FROM cteLatestReqDate WHERE RowNum = 1)LRD ON LRD.OrderNo = LLI.OrderNo And LRD.GroupLineNum = LLI.LineNum
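To see the row-number idea end to end on the sample data, here is a rough sketch run through Python's sqlite3 module. It rests on assumptions: plain tables named StatusInfo and LineInfo stand in for the temp tables, only the columns needed for the lookup are kept, and the fallback for orders without a Type1=2 entry is the line with the smallest LineNum (the question only says "maybe the first line"). It is not the poster's full query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Plain tables stand in for the #StatusInfo and #LineInfo temp tables.
CREATE TABLE StatusInfo (OrderNo TEXT, GroupLineNum INTEGER, Type1 INTEGER, UpdateDate TEXT);
CREATE TABLE LineInfo (OrderNo TEXT, LineNum INTEGER, ReqDate TEXT);
INSERT INTO StatusInfo VALUES
  ('Order85', NULL, 1, '2019-11-25 05:15:55.000'),
  ('Order86', NULL, 1, '2019-11-25 05:15:55.000'),
  ('Order86', 2,    2, '2019-11-25 05:32:23.773'),
  ('Order87', NULL, 1, '2019-11-25 05:15:55.000'),
  ('Order87', 1,    2, '2019-11-25 05:43:37.637'),
  ('Order87', 2,    2, '2019-11-25 05:42:32.390'),
  ('Order88', NULL, 1, '2019-11-25 06:35:13.000'),
  ('Order88', 1,    2, '2019-11-25 06:39:16.170');
INSERT INTO LineInfo VALUES
  ('Order85', 1, '2019-10-30 23:59:00.000'), ('Order85', 2, '2019-10-30 23:59:00.000'),
  ('Order86', 1, '2019-10-30 23:59:00.000'), ('Order86', 2, '2019-12-28 23:59:00.000'),
  ('Order87', 1, '2020-01-31 23:59:00.000'), ('Order87', 2, '2020-01-01 23:59:00.000'),
  ('Order88', 1, '2019-11-29 23:59:00.000'), ('Order88', 2, '2019-12-31 23:59:00.000');
""")

latest_sql = """
WITH latest AS (
    -- Newest Type1 = 2 change per order, with the ReqDate of the changed line.
    SELECT s.OrderNo, l.ReqDate,
           ROW_NUMBER() OVER (PARTITION BY s.OrderNo ORDER BY s.UpdateDate DESC) AS rn
    FROM StatusInfo s
    JOIN LineInfo l ON l.OrderNo = s.OrderNo AND l.LineNum = s.GroupLineNum
    WHERE s.Type1 = 2
),
first_line AS (
    -- Fallback for orders the user never changed: the lowest-numbered line.
    SELECT l.OrderNo, l.ReqDate
    FROM LineInfo l
    WHERE l.LineNum = (SELECT MIN(LineNum) FROM LineInfo WHERE OrderNo = l.OrderNo)
)
SELECT f.OrderNo, COALESCE(x.ReqDate, f.ReqDate) AS LatestReqDate
FROM first_line f
LEFT JOIN latest x ON x.OrderNo = f.OrderNo AND x.rn = 1
ORDER BY f.OrderNo
"""
latest_req = conn.execute(latest_sql).fetchall()
for row in latest_req:
    print(row)
```

Driving the final select from the line table rather than from the status table is what keeps orders such as Order85, which have no Type1=2 entry, in the result.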

How to get a child table records which exist in results of a table valued function

I have a table named TBL_WorkOrder as below :
+----+------------+----------------+
| Id | SystemCode | WorkOrderTitle |
+----+------------+----------------+
| 1 | C001 | Title 1 |
| 2 | C002 | Title 2 |
| 3 | C003 | Title 3 |
+----+------------+----------------+
and another table named TBL_WorkGroup
+----+---------------+
| Id | WorkGroupName |
+----+---------------+
| 1 | WorkGroup1 |
| 2 | WorkGroup2 |
| 3 | WorkGroup3 |
+----+---------------+
Each work order can contain different work groups as below (TBL_WorkOrderGroup)
+----+-------------+-------------+
| Id | WorkOrderId | WorkGroupId |
+----+-------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 3 |
| 3 | 2 | 1 |
+----+-------------+-------------+
The problem is that I send a varchar string like '1,3' to the stored procedure. This varchar is changed to a table using a table valued function. I want to obtain the work orders that contain both '1' and '3' as their work groups.
What should I do in this case?
DECLARE @String VARCHAR(100) = '1,3'
;WITH Split AS
(
-- Anchor: peel the first item off the comma-separated string.
SELECT SUBSTRING(@String,0,CHARINDEX(',',@String)) SplitStr,SUBSTRING(@String,CHARINDEX(',',@String)+1,LEN(@String)) RemainStr
UNION ALL
-- Recursive step: keep peeling items until nothing remains.
SELECT CASE WHEN CHARINDEX(',',RemainStr) = 0 THEN RemainStr ELSE SUBSTRING(RemainStr,0,CHARINDEX(',',RemainStr)) END,
CASE WHEN CHARINDEX(',',RemainStr) = 0 THEN '' ELSE SUBSTRING(RemainStr,CHARINDEX(',',RemainStr)+1,LEN(RemainStr)) END
FROM Split
WHERE ISNULL(RemainStr,'') <> ''
)
SELECT SplitStr FROM Split
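The split only turns '1,3' into rows; finding the work orders that contain every requested group still needs a grouping step. One common way is to filter to the requested groups and keep the work orders whose distinct match count equals the number of requested ids. Below is a minimal sketch using Python's sqlite3 module. The helper name work_orders_with_all_groups is hypothetical, and the string is split in Python here rather than by the table-valued function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBL_WorkOrderGroup (Id INTEGER, WorkOrderId INTEGER, WorkGroupId INTEGER);
INSERT INTO TBL_WorkOrderGroup VALUES (1, 1, 1), (2, 1, 3), (3, 2, 1);
""")

def work_orders_with_all_groups(conn, csv_ids):
    """Return ids of work orders that contain every work group in csv_ids."""
    # De-duplicate so that an input like '1,1,3' still counts two groups.
    wanted = sorted({int(part) for part in csv_ids.split(",") if part.strip()})
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT WorkOrderId
        FROM TBL_WorkOrderGroup
        WHERE WorkGroupId IN ({placeholders})
        GROUP BY WorkOrderId
        HAVING COUNT(DISTINCT WorkGroupId) = ?
        ORDER BY WorkOrderId
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

both = work_orders_with_all_groups(conn, "1,3")
only_one = work_orders_with_all_groups(conn, "1")
print(both, only_one)
```

In T-SQL the same shape would join TBL_WorkOrderGroup to the split result and compare the distinct count with the number of rows the split returned.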

No rowid or key need most recent row

I am trying my hardest to get a list of the most recent rows by date in a DB2 file. The file has no unique id, so I am trying to get the entries by matching a set of columns. I need DESCGA most importantly as that changes often. When it does they keep another row for historical reasons.
SELECT B.COGA, B.COMSUBGA, B.ACCTGA, B.PRFXGA, B.DESCGA
FROM mylib.myfile B
WHERE
(
SELECT COUNT(*)
FROM
(
SELECT A.COGA,A.COMSUBGA,A.ACCTGA,A.PRFXGA,MAX(A.DATEGA) AS EDATE
FROM mylib.myfile A
GROUP BY A.COGA, A.COMSUBGA, A.ACCTGA, A.PRFXGA
) T
WHERE
(B.ACCTGA = T.ACCTGA AND
B.COGA = T.COGA AND
B.COMSUBGA = T.COMSUBGA AND
B.PRFXGA = T.PRFXGA AND
B.DATEGA = T.EDATE)
) > 1
This is what I am trying and so far I get 0 results.
If I remove
B.ACCTGA = T.ACCTGA AND
It will return results (of course wrong).
I am using ODBC in VS 2013 to structure this query.
I have a table with the following
| a | b | descri | date |
-----------------------------
| 1 | 0 | string | 20140102 |
| 2 | 1 | string | 20140103 |
| 1 | 1 | string | 20140101 |
| 1 | 1 | string | 20150101 |
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |
and I need
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by date desc) as seqnum
from mylib.myfile t
) t
where seqnum = 1;
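As a quick check of the row_number() approach against the small example table, here is a sketch run through Python's sqlite3 module. The table is named myfile without the library qualifier, and SQLite 3.25 or later is assumed for window functions. For the real file, the partition would presumably list COGA, COMSUBGA, ACCTGA and PRFXGA and order by DATEGA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myfile (a INTEGER, b INTEGER, descri TEXT, date INTEGER)")
conn.executemany(
    "INSERT INTO myfile VALUES (?, ?, ?, ?)",
    [
        (1, 0, "string", 20140102),
        (2, 1, "string", 20140103),
        (1, 1, "string", 20140101),
        (1, 1, "string", 20150101),
        (1, 0, "string", 20150102),
        (2, 1, "string", 20150103),
        (1, 1, "string", 20150103),
    ],
)

# Number the rows inside each (a, b) group from newest to oldest,
# then keep only the newest row of every group.
latest_sql = """
SELECT a, b, descri, date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY date DESC) AS seqnum
    FROM myfile AS t
) AS ranked
WHERE seqnum = 1
ORDER BY a, b
"""
latest_rows = conn.execute(latest_sql).fetchall()
for row in latest_rows:
    print(row)
```

If two rows in a group share the same latest date, row_number() keeps one of them arbitrarily; rank() would keep both.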

ASP.NET Join duplicate results into one and sum other fields

Currently I have the following table in database dbo.test :
agentid | serv | func | com |
--------+------+------+-----|
ampg | 1 | 0 | 1 |
jrep | 0 | 0 | 1 |
ampg | 1 | 1 | 0 |
jrep | 1 | 0 | 1 |
Desired result:
agentid | serv | func | com |
--------+------+------+-----|
ampg | 2 | 1 | 1 |
jrep | 1 | 0 | 2 |
So it recognizes same agent id and combines into one row summing up the values of each other column. I will then present it in gridview in visual. Is it possible?
Thank you
Try this:
select agentid ,sum(serv) as [serv], sum(func) as [func], sum(com) as [com]
from tablename
group by agentid
SQL Group By:
http://www.w3schools.com/sql/sql_groupby.asp
select
agentid,
sum(serv) [sum_serv],
sum(func) [sum_func],
sum(com) [sum_com]
from [table]
group by agentid
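Both answers use the same GROUP BY with SUM. Here is a small sketch that runs it on the sample rows through Python's sqlite3 module, just to confirm the expected totals. The table is named test here without the dbo schema, and binding the result to a grid is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (agentid TEXT, serv INTEGER, func INTEGER, com INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [("ampg", 1, 0, 1), ("jrep", 0, 0, 1), ("ampg", 1, 1, 0), ("jrep", 1, 0, 1)],
)

# One output row per agent, with each numeric column summed.
totals = conn.execute("""
    SELECT agentid, SUM(serv) AS serv, SUM(func) AS func, SUM(com) AS com
    FROM test
    GROUP BY agentid
    ORDER BY agentid
""").fetchall()
for row in totals:
    print(row)
```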