Recreating historical field changes from an EAV audit table


Edit: for anyone with a similar problem, there's a good article covering various solutions here.
Given the following tables recs and audit, how would one transform them in SQL into the desired result shown below?
A little background: the former table is a simplified example of a standard SQL table used in a CRUD application that collects data. On any update to a column, a record is written to an audit table in EAV form. There is now a requirement to transform the recs table into a historical table holding a copy of each row as it was at each point in time, for reporting (the data will ultimately be stored in a star schema data warehouse).
It seems like this would be straightforward enough in a procedural language and manageable (if ugly) using cursors, but is there a set based approach that would work?
I'm using T-SQL right now, but I imagine that I could port any examples or ideas from any sufficiently rich SQL dialect.
Setup
create table recs
(
ID int identity(1,1) not null primary key,
Column1 nvarchar(30) not null,
Column2 nvarchar(30) not null,
sys_updated_on datetime not null
)
create table audit
(
ID int identity(1,1) not null primary key,
recs_id int not null,
fieldname nvarchar(30) not null,
old_value nvarchar(30) not null,
new_value nvarchar(30) not null,
sys_updated_on datetime not null
)
insert into recs (Column1, Column2, sys_updated_on)
values ('A', 'B', '2012-10-31 22:00')
, ('C', 'D', '2012-10-31 22:30')
insert into audit (recs_id, fieldname, old_value, new_value, sys_updated_on)
values (1, 'Column1', 'Z', 'A', '2012-10-31 22:00')
, (2, 'Column2','X', 'D', '2012-10-31 22:30')
, (1, 'Column1', 'Y', 'Z', '2012-10-31 21:00')
Resultant Data
Recs
ID Column1 Column2 sys_updated_on
1 A B 31/10/2012 22:00:00
2 C D 31/10/2012 22:30:00
Audit
ID recs_id fieldname old_value new_value sys_updated_on
1 1 Column1 Z A 31/10/2012 22:00:00
2 2 Column2 X D 31/10/2012 22:30:00
3 1 Column1 Y Z 31/10/2012 21:00:00
Desired result
recs_id sys_updated_on Column1 Column2
1 null Y B
1 31/10/2012 21:00:00 Z B
1 31/10/2012 22:00:00 A B
2 null C X
2 31/10/2012 22:30:00 C D

Interesting....
Try this
;with cte as
(
select recs_id, sys_updated_on, column1, column2,
ROW_NUMBER() over (order by sys_updated_on) rn
from audit a
pivot
(max(old_value) for fieldname in (column1,column2)) p
)
select
recs_id,
case when ud1>ud2 then ud1 else ud2 end as updateddate,
coalesce(cte.column1,mc1,recs.column1),
coalesce(cte.column2,mc2,recs.column2)
from cte
outer apply
(
select top 1
column1 as mc1, sys_updated_on as ud1
from cte prev1
where prev1.recs_id=cte.recs_id and prev1.rn<cte.rn
order by prev1.rn desc
) r1
outer apply
(
select top 1
column2 as mc2, sys_updated_on as ud2
from cte prev2
where prev2.recs_id=cte.recs_id and prev2.rn<cte.rn
order by prev2.rn desc
) r2
inner join recs on cte.recs_id = recs.id
where cte.sys_updated_on is not null
union
select id, sys_updated_on, Column1, Column2 from recs
order by recs_id, updateddate
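For comparison, here is a minimal set-based sketch of a different technique: build one row per record per change time (plus one NULL-timestamp row for the initial state), then look up each field's value as of that time with OUTER APPLY. It is not the answer's method. It assumes old_value and new_value are never NULL (as in the setup above), that audit.ID is only a tie-breaker for identical timestamps, and it uses table variables @recs and @audit as hypothetical stand-ins for the real tables.

```sql
-- Stand-ins for the recs and audit tables from the question.
DECLARE @recs TABLE (ID int PRIMARY KEY, Column1 nvarchar(30), Column2 nvarchar(30), sys_updated_on datetime);
DECLARE @audit TABLE (ID int IDENTITY(1,1) PRIMARY KEY, recs_id int, fieldname nvarchar(30),
                      old_value nvarchar(30), new_value nvarchar(30), sys_updated_on datetime);

INSERT INTO @recs VALUES (1, N'A', N'B', '2012-10-31T22:00:00'), (2, N'C', N'D', '2012-10-31T22:30:00');
INSERT INTO @audit (recs_id, fieldname, old_value, new_value, sys_updated_on) VALUES
    (1, N'Column1', N'Z', N'A', '2012-10-31T22:00:00'),
    (2, N'Column2', N'X', N'D', '2012-10-31T22:30:00'),
    (1, N'Column1', N'Y', N'Z', '2012-10-31T21:00:00');

WITH points AS (
    -- The initial state of every record, marked by a NULL timestamp ...
    SELECT ID AS recs_id, CAST(NULL AS datetime) AS sys_updated_on FROM @recs
    UNION ALL
    -- ... plus one row per distinct change time.
    SELECT DISTINCT recs_id, sys_updated_on FROM @audit
)
SELECT p.recs_id,
       p.sys_updated_on,
       -- latest new_value at or before this time, else the earliest old_value,
       -- else the current value (the field was never audited)
       COALESCE(c1.new_value, f1.old_value, r.Column1) AS Column1,
       COALESCE(c2.new_value, f2.old_value, r.Column2) AS Column2
FROM points AS p
JOIN @recs AS r ON r.ID = p.recs_id
OUTER APPLY (SELECT TOP (1) a.new_value FROM @audit AS a
             WHERE a.recs_id = p.recs_id AND a.fieldname = N'Column1'
               AND a.sys_updated_on <= p.sys_updated_on   -- never true for the NULL row
             ORDER BY a.sys_updated_on DESC, a.ID DESC) AS c1
OUTER APPLY (SELECT TOP (1) a.old_value FROM @audit AS a
             WHERE a.recs_id = p.recs_id AND a.fieldname = N'Column1'
             ORDER BY a.sys_updated_on ASC, a.ID ASC) AS f1
OUTER APPLY (SELECT TOP (1) a.new_value FROM @audit AS a
             WHERE a.recs_id = p.recs_id AND a.fieldname = N'Column2'
               AND a.sys_updated_on <= p.sys_updated_on
             ORDER BY a.sys_updated_on DESC, a.ID DESC) AS c2
OUTER APPLY (SELECT TOP (1) a.old_value FROM @audit AS a
             WHERE a.recs_id = p.recs_id AND a.fieldname = N'Column2'
             ORDER BY a.sys_updated_on ASC, a.ID ASC) AS f2
ORDER BY p.recs_id, p.sys_updated_on;   -- SQL Server sorts NULL first in ascending order
```

On the sample data this yields the five rows of the desired result (Y/B, Z/B, A/B for record 1 and C/X, C/D for record 2). Each additional audited column needs its own pair of OUTER APPLY lookups, so for wide tables the query would probably have to be generated.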

Related

MS SQL Server counting nulls by column in a table

I have been given a task to find how many nulls there are for each column in a given table. The table has many columns (50-80, depending on the individual table). I would like the result pivoted so the column names are records, like this:
column_name  null_count
columnA      253
columnB      25
columnC      0
columnD      456
...          ...
Currently, I do
SELECT 'columnA' as column_name, sum(case when columnA IS NULL then 1 else 0 end) null_count from [table] UNION
SELECT 'columnB', sum(case when columnB IS NULL then 1 else 0 end) from [table] UNION
...
for all the columns. This is kind of tedious, and I would like to know if there is a more flexible way to do this in SQL Server Management Studio. Maybe something that can step through each record in [database].INFORMATION_SCHEMA.COLUMNS.

Please try the following solution.
It is using SQL Server's XML and XQuery magic.
No need for dynamic SQL.
We are leveraging the fact that the FOR XML ... clause omits columns with NULL values.
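A tiny illustration of that behaviour, using made-up column aliases a, b and c rather than anything from the question:

```sql
-- Column b is NULL, so FOR XML PATH emits no <b> element for it.
SELECT (SELECT 1 AS a, CAST(NULL AS int) AS b, 'cat' AS c
        FOR XML PATH('root'), TYPE) AS x;
-- x is <root><a>1</a><c>cat</c></root>
```

Counting how often each element name appears across all rows therefore gives the non-NULL count per column, which is what the queries below subtract from the total row count.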
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, columnB VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, columnB, columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end
DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);
;WITH rs AS
(
SELECT column_name = c.value('local-name(.)', 'sysname')
FROM dbo.tbl AS p
CROSS APPLY (SELECT *
FROM dbo.tbl AS c
WHERE c.ID = p.ID
FOR XML PATH('root'), TYPE) AS t1(x)
CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
AND TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;
SQL #2
It covers table column names with spaces. A minor FOR XML ... clause change automatically converts spaces into _x0020_ in the XML element names.
<root>
<ID>3</ID>
<column_x0020_B>cat</column_x0020_B>
</root>
The rest is identical.
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, [column B] VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, [column B], columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end
DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);
;WITH rs AS
(
SELECT column_name = REPLACE(c.value('local-name(.)', 'sysname'), '_x0020_', SPACE(1))
FROM dbo.tbl AS p
CROSS APPLY (SELECT *
FROM dbo.tbl AS [root]
WHERE [root].ID = p.ID
FOR XML AUTO, ELEMENTS, TYPE) AS t1(x)
CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
AND TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;
Output
+-------------+--------------+
| column_name | null_counter |
+-------------+--------------+
| columnA | 1 |
| columnB | 2 |
| columnC | 4 |
| ID | 0 |
+-------------+--------------+
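For comparison only, a dynamic SQL sketch of the same count, which the answer above deliberately avoids. It generates the UNION ALL query from the question out of sys.columns. It assumes SQL Server 2017 or later (for STRING_AGG) and recreates the dbo.tbl sample table so that it runs on its own; COUNT_BIG(column) does not accept text, ntext or image columns.

```sql
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, [column B] VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, [column B], columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);

DECLARE @sql nvarchar(max);

-- Build one "SELECT '<name>' AS column_name, COUNT_BIG(*) - COUNT_BIG([<name>]) ..." branch per column.
SELECT @sql = STRING_AGG(
           CAST(N'SELECT ' + QUOTENAME(c.name, '''') + N' AS column_name, '
              + N'COUNT_BIG(*) - COUNT_BIG(' + QUOTENAME(c.name) + N') AS null_count '
              + N'FROM dbo.tbl' AS nvarchar(max)),
           N' UNION ALL ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.tbl');

EXEC sys.sp_executesql @sql;
```

COUNT_BIG(column) skips NULLs, so the difference from COUNT_BIG(*) is the NULL count. Note that this scans the table once per column, whereas the XML approach reads each row once.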

Select rows after the current row which are null and combine them to rows

I have written the following SQL code to display the data as rows, where each header row is followed by rows that are NULL in every column except Description.
DECLARE @StudentData TABLE
(
RowID INT NOT NULL PRIMARY KEY IDENTITY(1,1),
RemarksDate NVARCHAR(20),
StudentName NVARCHAR(1000),
Description NVARCHAR(MAX),
TotStudents NVARCHAR(100)
)
INSERT INTO @StudentData(RemarksDate, StudentName, Description, TotStudents)
VALUES('2/1/2021', NULL, 'Poor In English', '14'),
(NULL, NULL, '1 ABC', NULL),
(NULL, NULL, '1 XYZ', NULL),
(NULL, NULL, '1 MNO', NULL),
(NULL, NULL, '1 IGH', NULL),
(NULL, NULL, '10 KKK', NULL),
('2/1/2021', NULL, 'Poor In Maths', '5'),
(NULL, NULL, '5 PQR', NULL),
('2/8/2021', NULL, 'Poor In Social', '1'),
(NULL, NULL, '1 RST', NULL)
This results in the output as follows:
I have written the following query to group and display rows:
SELECT t1.RemarksDate, LTRIM(RIGHT(t2.Description, LEN(t2.Description) - PATINDEX('%[0-9][^0-9]%', t2.Description ))) StudentName, t1.Description
,LEFT(t2.Description, PATINDEX('%[0-9][^0-9]%', t2.Description ))
FROM (
SELECT *, RowID + TotStudents MaxVal
FROM @StudentData
WHERE RemarksDate is NOT NULL
) t1
JOIN (
SELECT *
FROM @StudentData
WHERE RemarksDate is NULL
) t2 ON t2.RowId BETWEEN t1.RowID and t1.MaxVal
The data is displayed as follows
Expected output is as follows
2/1/2021 ABC Poor In English 1
2/1/2021 XYZ Poor In English 1
2/1/2021 MNO Poor In English 1
2/1/2021 IGH Poor In English 1
2/1/2021 KKK Poor In English 10
2/1/2021 PQR Poor In Maths 5
2/8/2021 RST Poor In Social 1

This is a type of gaps-and-islands problem. There are many solutions; I will give you one that requires only a single scan of the base table.
We have a header row and child rows, and we need to apply the header row values to the child rows.
We can solve this by defining the start point of each group, then taking windowed header values for each group, and finally filtering out the header rows:
WITH Groupings AS (
SELECT *,
GroupId = MAX(CASE WHEN Description LIKE 'Poor%' THEN RowID END)
OVER (ORDER BY RowID ROWS UNBOUNDED PRECEDING)
FROM @StudentData s
),
GroupValues AS (
SELECT
RemarksDate = MAX(CASE WHEN Description LIKE 'Poor%' THEN RemarksDate END)
OVER (PARTITION BY GroupId),
DescriptionHeader = MAX(CASE WHEN Description LIKE 'Poor%' THEN Description END)
OVER (PARTITION BY GroupId),
Space = CHARINDEX(' ', Description),
Description
FROM Groupings
)
SELECT
RemarksDate,
DescriptionHeader,
StudentName = SUBSTRING(Description, Space + 1, LEN(Description)),
SomeNumber = LEFT(Description, Space - 1)
FROM GroupValues
WHERE Description NOT LIKE 'Poor%';
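To see how the running MAX assigns a group to every row, here is a minimal sketch on a made-up table variable @t with a hypothetical IsHeader flag (the answer's query detects headers with Description LIKE 'Poor%' instead):

```sql
DECLARE @t TABLE (RowID int PRIMARY KEY, IsHeader bit);
INSERT INTO @t VALUES (1, 1), (2, 0), (3, 0), (4, 1), (5, 0);

SELECT RowID,
       IsHeader,
       -- Child rows contribute NULL, so the running MAX is the RowID of the most recent header.
       GroupId = MAX(CASE WHEN IsHeader = 1 THEN RowID END)
                 OVER (ORDER BY RowID ROWS UNBOUNDED PRECEDING)
FROM @t;
-- GroupId per row: 1, 1, 1, 4, 4
```

Every child row ends up carrying the RowID of its header, which is what lets the second CTE partition by GroupId.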

Apart from the fact that the table design is pretty awful, I would suggest the following approach:
WITH cteRemarks AS(
SELECT *, LEAD(RowId) OVER (ORDER BY RowID) AS RowIdNxt
FROM @StudentData
WHERE TotStudents IS NOT NULL
)
SELECT r.RemarksDate
,RIGHT(t.Description, LEN(t.Description)-CHARINDEX(' ', t.Description)) AS StudentsName
,r.Description AS Description
,LEFT(t.Description, CHARINDEX(' ', t.Description)-1) AS Val
FROM cteRemarks r
LEFT JOIN @StudentData t ON t.TotStudents IS NULL
AND t.RowID > r.RowID
AND t.RowID < ISNULL(r.RowIDNxt, 99999999)

SQL: Upsert and get the old and the new values

I have the following table Items:
Id MemberId MemberGuid ExpiryYear Hash
---------------------------------------------------------------------------
1 1 Guid1 2017 Hash1
2 1 Guid2 2018 Hash2
3 2 Guid3 2020 Hash3
4 2 Guid4 2017 Hash1
I need to copy the items from one member to another (not just update MemberId, but insert a new record). The rule is: if I want to migrate all the items from one member to another, I have to check that each item does not already exist for the new member.
For example, if I want to move the items from member 1 to member 2, I will move only the item with id 2, because member 2 already has an item with the same hash and the same expiry year as item 1 (these are the columns that I need to check before inserting the new items).
How can I write a query that migrates only the non-existing items from one member to another and returns the old id and the new id of the records? Somehow with an upsert?

You can do it as below:
-- MOCK DATA
DECLARE @Tbl TABLE
(
Id INT IDENTITY NOT NULL PRIMARY KEY,
MemberId INT,
MemberGuid CHAR(5),
ExpiryYear CHAR(4),
Hash CHAR(5)
)
INSERT INTO @Tbl
VALUES
(1, 'Guid1', '2017', 'Hash1'),
(1, 'Guid2', '2018', 'Hash1'),
(2, 'Guid3', '2020', 'Hash3'),
(2, 'Guid4', '2017', 'Hash1')
-- MOCK DATA
-- Parameters
DECLARE @FromParam INT = 1
DECLARE @ToParam INT = 2
DECLARE @TmpTable TABLE (NewDataId INT, OldDataId INT)
MERGE @Tbl AS T
USING
(
SELECT * FROM @Tbl
WHERE MemberId = @FromParam
) AS F
ON T.Hash = F.Hash AND
T.ExpiryYear = F.ExpiryYear AND
T.MemberId = @ToParam
WHEN NOT MATCHED THEN
INSERT ( MemberId, MemberGuid, ExpiryYear, Hash)
VALUES ( @ToParam, F.MemberGuid, F.ExpiryYear, F.Hash)
OUTPUT inserted.Id, F.Id INTO @TmpTable;
SELECT * FROM @TmpTable

Step 1:
Open a cursor over all of member 1's items.
Step 2:
While moving through the cursor, run for each row:
select Hash, ExpiryYear from Items where MemberId=2 and Hash=member1.Hash and ExpiryYear=member1.ExpiryYear
Step 3:
If the query above returns any row, do not insert.
Otherwise, insert.
Hope this helps.
Note: this is not actual code. I am just giving you the steps on which you can base your SQL.
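The same steps can also be written without a cursor as a single INSERT ... SELECT guarded by NOT EXISTS. The sketch below assumes the Items layout from the question and uses a table variable @Items as a hypothetical stand-in; the column types are guesses.

```sql
DECLARE @Items TABLE (
    Id int IDENTITY(1,1) PRIMARY KEY,
    MemberId int, MemberGuid varchar(10), ExpiryYear int, Hash varchar(10));

INSERT INTO @Items (MemberId, MemberGuid, ExpiryYear, Hash) VALUES
    (1, 'Guid1', 2017, 'Hash1'),
    (1, 'Guid2', 2018, 'Hash2'),
    (2, 'Guid3', 2020, 'Hash3'),
    (2, 'Guid4', 2017, 'Hash1');

DECLARE @From int = 1, @To int = 2;

-- Copy every item of @From that has no counterpart (same Hash and ExpiryYear) under @To.
INSERT INTO @Items (MemberId, MemberGuid, ExpiryYear, Hash)
SELECT @To, s.MemberGuid, s.ExpiryYear, s.Hash
FROM @Items AS s
WHERE s.MemberId = @From
  AND NOT EXISTS (SELECT 1
                  FROM @Items AS t
                  WHERE t.MemberId = @To
                    AND t.Hash = s.Hash
                    AND t.ExpiryYear = s.ExpiryYear);

SELECT * FROM @Items;   -- one new row: Id 5, MemberId 2, Guid2, 2018, Hash2
```

A plain INSERT cannot return the old id next to the new one, because its OUTPUT clause can reference only the inserted columns. That is why a MERGE with OUTPUT is the usual choice when the old-to-new id mapping is needed.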

Actually, you just need an insert. When ExpiryYear and Hash match, you don't want to do anything; you just want to insert from source to target where those columns don't match. You can do that with MERGE or INSERT.
CREATE TABLE YourTable
(
Oldid INT,
OldMemberId INT,
Id INT,
MemberId INT,
MemberGuid CHAR(5),
ExpiryYear CHAR(4),
Hash CHAR(5)
)
INSERT INTO YourTable VALUES
(null, null, 1, 1, 'Guid1', '2017', 'Hash1'),
(null, null, 2, 1, 'Guid2', '2018', 'Hash2'),
(null, null, 3, 2, 'Guid3', '2020', 'Hash3'),
(null, null, 4, 2, 'Guid4', '2017', 'Hash1')
DECLARE @SourceMemberID AS INT = 1
DECLARE @TargetMemberID AS INT = 2
MERGE [YourTable] AS t
USING
(
SELECT * FROM [YourTable]
WHERE MemberId = @SourceMemberID
) AS s
ON t.Hash = s.Hash AND t.ExpiryYear = s.ExpiryYear AND t.MemberId = @TargetMemberID
WHEN NOT MATCHED THEN
-- Caveat: if more than one source row is unmatched, MAX(Id) + 1 may assign the same Id to each of them.
INSERT(Oldid, OldMemberId, Id, MemberId, MemberGuid, ExpiryYear, Hash) VALUES (s.Id, s.MemberId, (SELECT MAX(Id) + 1 FROM [YourTable]), @TargetMemberID, s.MemberGuid, s.ExpiryYear, s.Hash);
SELECT * FROM YourTable
DROP TABLE YourTable
/* Output:
Oldid OldMemberId Id MemberId MemberGuid ExpiryYear Hash
-----------------------------------------------------------------
NULL NULL 1 1 Guid1 2017 Hash1
NULL NULL 2 1 Guid2 2018 Hash2
NULL NULL 3 2 Guid3 2020 Hash3
NULL NULL 4 2 Guid4 2017 Hash1
2 1 5 2 Guid2 2018 Hash2
*/
If you just want to select, then do the following:
SELECT null AS OldID, null AS OldMemberID, Id, MemberId, MemberGuid, ExpiryYear, Hash FROM YourTable
UNION ALL
SELECT A.Id AS OldID, A.MemberId AS OldMemberID, (SELECT MAX(Id) + 1 FROM YourTable) AS Id, @TargetMemberID AS MemberId, A.MemberGuid, A.ExpiryYear, A.Hash
FROM YourTable A
LEFT JOIN
(
SELECT * FROM YourTable WHERE MemberId = @TargetMemberID
) B ON A.ExpiryYear = B.ExpiryYear AND A.Hash = B.Hash
WHERE A.MemberId = @SourceMemberID AND B.Id IS NULL

How to return only numbers from query where column is nvarchar

I have a simple query that is returning records where "column2" > 0
Here is the data in the database
Column1 Column2
1 123456789
2 123456781
3 13-151-1513
4 alsdjf
5
6 000000000
Here is the query
select column1, replace(column2,'-','')
from table1
where isnumeric(column2) = 1
I'd like to return the following:
Column1 Column2
1 123456789
2 123456781
3 131511513
This means I won't select any records where the column is blank (or null), won't return a row if the value is not an integer, will strip out the '-', and won't show row 6 since it's all 0.
How can I do this?

I think you can use something like this :
USE tempdb
GO
CREATE TABLE #Temp
(
ID INT IDENTITY
,VALUE VARCHAR(30)
)
INSERT INTO #Temp (VALUE) VALUES ('1213213'), ('1213213'), ('121-32-13'), ('ASDFASF2123')
GO
WITH CteData
AS
(
SELECT REPLACE(VALUE,'-','') as Valor FROM #Temp
)
-- NOT LIKE '%[^0-9]%' keeps only values made up entirely of digits
SELECT * FROM CteData WHERE (ISNUMERIC(Valor) = 1 AND Valor NOT LIKE '%[^0-9]%')
DROP TABLE #Temp
Then you can apply validations for empty, NULL, 0, etc.

If you are using SQL Server 2012 or above, you can also use TRY_PARSE, which is more selective in its parsing. This function returns NULL if a value can't be converted. You could use it like this:
CREATE TABLE #temp
(
ID INT IDENTITY ,
VALUE VARCHAR(30)
)
INSERT INTO #temp
( VALUE )
VALUES ( '1213213' ),
( '1213213' ),
( '121-32-13' ),
( 'ASDFASF2123' ),
( '0000000' )
SELECT ParsedValue
FROM #temp
CROSS APPLY ( SELECT TRY_PARSE(
Value AS INT ) AS ParsedValue
) details
WHERE ParsedValue IS NOT NULL
AND ParsedValue>0
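Putting the pieces together for the data in the question, here is a hedged sketch that strips the dashes first and then filters. It uses TRY_CONVERT rather than TRY_PARSE, a table variable @table1 as a stand-in for table1, and assumes the blank in row 5 is an empty string.

```sql
DECLARE @table1 TABLE (Column1 int, Column2 nvarchar(20));
INSERT INTO @table1 VALUES
    (1, N'123456789'), (2, N'123456781'), (3, N'13-151-1513'),
    (4, N'alsdjf'),    (5, N''),          (6, N'000000000');

SELECT t.Column1, v.Cleaned AS Column2
FROM @table1 AS t
CROSS APPLY (SELECT REPLACE(t.Column2, N'-', N'') AS Cleaned) AS v
WHERE v.Cleaned NOT LIKE N'%[^0-9]%'          -- digits only
  AND TRY_CONVERT(bigint, v.Cleaned) > 0;     -- drops blanks, NULLs and all-zero values
-- Returns rows 1, 2 and 3, with row 3 shown as 131511513
```

The result column is still the cleaned string, so any leading zeros are preserved. Values too long to fit in a bigint convert to NULL and are filtered out as well.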

SQL script to aggregate column values

i'd appreciate some help putting together a sql script to copy data from one table to another. essentially what i need to do for each row in the source table is aggregate the column values and store them into a single column in the target table.
TableA: ID, ColumnA, ColumnB, ColumnC
TableB: Identity, ColumnX
so, ColumnX needs to be something like 'ColumnA, ColumnB, ColumnC'.
in addition though, i need to keep track of each TableA.ID -> SCOPE_IDENTITY() mapping in order to update a third table.
thanks in advance!
EDIT: TableA.ID is not the same as TableB.Identity. TableB.Identity will return a new identity value on insert. so either i need to store the mapping in a temp table or update TableC with each insert into TableB.

Here is a row-by-row processing example. This will provide you the results in a way where you can process each row at a time. Or you can use TableC at the end and do whatever processing you need to do.
However, it would be a lot faster if you added an extra column to TableB (Called TableA_ID) and just INSERTED the result into it. You would have instant access to TableA.ID and TableB.Identity. But without knowing your exact situation, this may not be feasible. (But you could always add the column and then drop it afterwards!)
USE tempdb
GO
CREATE TABLE TableA (
ID int NOT NULL PRIMARY KEY,
ColumnA varchar(10) NOT NULL,
ColumnB varchar(10) NOT NULL,
ColumnC varchar(10) NOT NULL
)
CREATE TABLE TableB (
[Identity] int IDENTITY(1,1) NOT NULL PRIMARY KEY,
ColumnX varchar(30) NOT NULL
)
CREATE TABLE TableC (
TableA_ID int NOT NULL,
TableB_ID int NOT NULL,
PRIMARY KEY (TableA_ID, TableB_ID)
)
GO
INSERT INTO TableA VALUES (1, 'A', 'A', 'A')
INSERT INTO TableA VALUES (2, 'A', 'A', 'B')
INSERT INTO TableA VALUES (3, 'A', 'A', 'C')
INSERT INTO TableA VALUES (11, 'A', 'B', 'A')
INSERT INTO TableA VALUES (12, 'A', 'B', 'B')
INSERT INTO TableA VALUES (13, 'A', 'B', 'C')
INSERT INTO TableA VALUES (21, 'A', 'C', 'A')
INSERT INTO TableA VALUES (22, 'A', 'C', 'B')
INSERT INTO TableA VALUES (23, 'A', 'C', 'C')
GO
-- Do row-by-row processing to get the desired results
SET NOCOUNT ON
DECLARE @TableA_ID int
DECLARE @TableB_Identity int
DECLARE @ColumnX varchar(100)
SET @TableA_ID = 0
WHILE 1=1 BEGIN
-- Get the next row to process
SELECT TOP 1
@TableA_ID=ID,
@ColumnX = ColumnA + ColumnB + ColumnC
FROM TableA
WHERE ID > @TableA_ID
ORDER BY ID -- without this, TOP 1 is not guaranteed to return the lowest remaining ID
-- Check if we are all done
IF @@ROWCOUNT = 0
BREAK
-- Insert row into TableB
INSERT INTO TableB (ColumnX)
SELECT @ColumnX
-- Get the identity of the new row
SET @TableB_Identity = SCOPE_IDENTITY()
-- At this point, you have @TableA_ID and @TableB_Identity.
-- Go to town with whatever extra processing you need to do
INSERT INTO TableC (TableA_ID, TableB_ID)
SELECT @TableA_ID, @TableB_Identity
END
GO
SELECT * FROM TableC
GO
SELECT * FROM TableA
ID ColumnA ColumnB ColumnC
----------- ---------- ---------- ----------
1 A A A
2 A A B
3 A A C
11 A B A
12 A B B
13 A B C
21 A C A
22 A C B
23 A C C
SELECT * FROM TableB
Identity ColumnX
----------- ------------------------------
1 AAA
2 AAB
3 AAC
4 ABA
5 ABB
6 ABC
7 ACA
8 ACB
9 ACC
SELECT * FROM TableC
TableA_ID TableB_ID
----------- -----------
1 1
2 2
3 3
11 4
12 5
13 6
21 7
22 8
23 9
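A set-based alternative to the loop, which also avoids adding a column to TableB, is MERGE with an OUTPUT clause. Unlike INSERT, MERGE lets OUTPUT reference source columns alongside the generated identity. The sketch below uses table variables as stand-ins for TableA, TableB and TableC, with a few sample rows. The ON 1 = 0 condition is a deliberate trick so that no source row ever matches and all of them are inserted.

```sql
DECLARE @TableA TABLE (ID int PRIMARY KEY, ColumnA varchar(10), ColumnB varchar(10), ColumnC varchar(10));
DECLARE @TableB TABLE ([Identity] int IDENTITY(1,1) PRIMARY KEY, ColumnX varchar(40));
DECLARE @TableC TABLE (TableA_ID int, TableB_ID int);

INSERT INTO @TableA VALUES (1, 'A', 'A', 'A'), (11, 'A', 'B', 'A'), (21, 'A', 'C', 'A');

MERGE INTO @TableB AS b
USING @TableA AS a
   ON 1 = 0                                   -- never matches: every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (ColumnX) VALUES (a.ColumnA + ', ' + a.ColumnB + ', ' + a.ColumnC)
OUTPUT a.ID, inserted.[Identity] INTO @TableC (TableA_ID, TableB_ID);

SELECT * FROM @TableC;   -- one (TableA_ID, TableB_ID) pair per source row
```

The order in which identity values are handed out is not guaranteed to follow TableA.ID, so the mapping should always be read from the captured pairs rather than inferred.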

Assuming TableB exists:
INSERT INTO TableB (ColumnX)
SELECT ColumnA + ',' + ColumnB + ',' + ColumnC AS ColumnX
FROM TableA;
