How to use OPENXML() for complex XML in SQL Server 2005

I have the following complex XML
<Collection>
<VOUCHER>
<DATE TYPE="Date">20110401</DATE>
<NARRATION TYPE="String">MUNNA CONVENT ROAD</NARRATION>
<VOUCHERTYPENAME>RETAIL</VOUCHERTYPENAME>
<VOUCHERNUMBER>R-2-I2-9-6-27751</VOUCHERNUMBER>
<ALLLEDGERENTRIES.LIST>
<LEDGERNAME>U.S.T. CANTEEN</LEDGERNAME>
<AMOUNT>-2678.9985</AMOUNT>
</ALLLEDGERENTRIES.LIST>
<ALLLEDGERENTRIES.LIST>
<LEDGERNAME>U.S.T. CANTEEN</LEDGERNAME>
<AMOUNT>-2678.9985</AMOUNT>
</ALLLEDGERENTRIES.LIST>
</VOUCHER>
<VOUCHER>
<DATE TYPE="Date">20110401</DATE>
<NARRATION TYPE="String">MUNNA CONVENT ROAD</NARRATION>
<VOUCHERTYPENAME>RETAIL</VOUCHERTYPENAME>
<VOUCHERNUMBER>R-2-I2-9-6-27751</VOUCHERNUMBER>
<ALLLEDGERENTRIES.LIST>
<LEDGERNAME>U.S.T. CANTEEN</LEDGERNAME>
<AMOUNT>-2678.9985</AMOUNT>
</ALLLEDGERENTRIES.LIST>
<ALLLEDGERENTRIES.LIST>
<LEDGERNAME>U.S.T. CANTEEN</LEDGERNAME>
<AMOUNT>-2678.9985</AMOUNT>
</ALLLEDGERENTRIES.LIST>
</VOUCHER>
</Collection>
I'm saving the voucher details in one table and the ALLLEDGERENTRIES.LIST details in another table.
Both tables are related on VoucherID. For a particular VoucherID, the related ledger-entry values should be stored. In my stored procedure I'm using OPENXML().
A piece of my SP:
INSERT INTO SalesVoucher(AbsID,VoucherNumber,VoucherTypeName,Narration,VoucherDate)
SELECT @AID,VOUCHERNUMBER,VOUCHERTYPENAME,NARRATION,CAST(DATE AS DATETIME)
FROM OPENXML(@XMLHandle,'ENVELOPE/BODY/DATA/COLLECTION/VOUCHER',3)
WITH (
VOUCHERNUMBER nVarchar(200),VOUCHERTYPENAME varchar(100),NARRATION varchar(500),DATE DATETIME
)
SELECT @VID=@@IDENTITY
INSERT INTO SalesLedger(VoucherID,LedgerName,Amount)
SELECT @VID,LEDGERNAME,AMOUNT
FROM OPENXML(@XMLHandle,'ENVELOPE/BODY/DATA/COLLECTION/VOUCHER/ALLLEDGERENTRIES.LIST',3)
WITH(
LEDGERNAME varchar(200),AMOUNT decimal(18,0)
)
All the values are stored in the DB, but the VoucherID column in the SalesLedger table is the same for all rows (it should not be); since I used @@IDENTITY, it returns only the last identity value.
Please can someone help me store the related VoucherID in the SalesLedger table using OPENXML() in SQL?

I would probably use the native XQuery capabilities of SQL Server to do this. First, grab the items you need for your SalesVoucher table and insert those.
When you come to insert the details, your "parent" info is already stored in the SalesVoucher table - so go grab the necessary info from there.
Your code would be something like this (assuming your XML data is in a SQL variable called @input of type XML):
-- Insert the "parent" info into SalesVoucher
INSERT INTO dbo.SalesVoucher(VoucherNumber, VoucherTypeName, Narration, VoucherDate)
SELECT
v.value('(VOUCHERNUMBER)[1]', 'NVARCHAR(200)'),
v.value('(VOUCHERTYPENAME)[1]', 'VARCHAR(100)'),
v.value('(NARRATION)[1]', 'VARCHAR(500)'),
v.value('(DATE)[1]', 'DATETIME')
FROM
@input.nodes('/Collection/VOUCHER') AS Coll(V)
This inserts the basic info in your SalesVoucher table.
When you want to parse the details, you need to make a reference back to the VoucherNumber of the parent - with that info, you can retrieve the AbsID from SalesVoucher and insert the appropriate value into SalesLedger:
INSERT INTO dbo.SalesLedger (VoucherID, LedgerName, Amount)
SELECT
sv.AbsID,
AL.LS.value('(LEDGERNAME)[1]', 'VARCHAR(200)'),
AL.LS.value('(AMOUNT)[1]', 'DECIMAL(18,4)')
FROM
@input.nodes('/Collection/VOUCHER') AS Coll(V)
INNER JOIN
dbo.SalesVoucher sv
ON sv.VoucherNumber = v.value('(VOUCHERNUMBER)[1]', 'NVARCHAR(200)')
CROSS APPLY
Coll.V.nodes('.//ALLLEDGERENTRIES.LIST') AS AL(LS)
The CROSS APPLY gets the details for that one particular node, and thus "connects" the details to the "parent" info for the VoucherNumber in the XML above.
As a PS: a datatype of DECIMAL(18,0) is not suitable for values like -2678.9985. DECIMAL(18,0) will store a maximum of 18 digits, with 0 of them after the decimal point - so this value would be stored as -2679. I've changed this to a more useful datatype of DECIMAL(18,4) - 18 digits max, 4 of which come after the decimal point.
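For instance, a quick check of the two datatypes with the sample amount shows the difference:
-- DECIMAL(18,0) rounds away the fraction; DECIMAL(18,4) keeps it
SELECT CAST(-2678.9985 AS DECIMAL(18,0)) AS scale_0, -- -2679
       CAST(-2678.9985 AS DECIMAL(18,4)) AS scale_4; -- -2678.9985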

Related

Convert List Of XML Tags in varchar column to comma separated list

I have a table containing a list of XML tags/values that I need to join to another table to retrieve their actual values and display the result as a CSV list.
Example varchar data:
<choice id="100"/><choice id="101"/><choice id="102"/>
However, these values actually translate to other values: red, white, blue respectively. I need to convert that list to the following list:
red,white,blue
As a recap: the "source" table column is varchar and contains a list of XML attribute values, and those values translate to other values by joining to another table. The other table has a primary key of id (int) with rows for 100, 101, and 102; those rows have the values red, white, and blue respectively. I hope this makes enough sense.
Here is the ddl to set up the scenario:
create table datatable(
id int,
data nvarchar(449),
primary key (id)
);
insert into datatable(id, data)
values(1,'<choice id="100"/><choice id="101"/><choice id="102"/>')
,(2,'<choice id="100"/>')
,(3,'<choice id="101"/>')
,(4,'<choice id="102"/>');
create table choicetable(
id int,
choicevalue nvarchar(449),
primary key (id)
);
insert into choicetable(id, choicevalue)
values(100,'red')
,(101,'white')
,(102,'blue');
This would be the first time I've tried parsing XML in this manner so I'm a little stumped where to start. Also, I do not have control over the database I am retrieving the data from (3rd party software).
Without proper sample data it's hard to give an exact query, but you would do something like this:
Use CROSS APPLY to convert the varchar to xml
Use .nodes to shred the XML into separate rows.
Join using .value to get the id attribute
Group up, and concatenate using STRING_AGG. You may not need GROUP BY depending on your situation.
SELECT
xt.Id,
STRING_AGG(ot.Value, ',')
FROM XmlTable xt
CROSS APPLY (SELECT CAST(xt.XmlColumn AS xml) ) v(XmlData)
CROSS APPLY v.XmlData.nodes('/choice') x1(choice)
JOIN OtherTable ot ON ot.Id = x1.choice.value('#id','int')
GROUP BY
xt.Id;
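Applied to the sample DDL from the question, the same pattern might look something like this (a sketch only; note that STRING_AGG requires SQL Server 2017 or later):
-- same approach, using the question's datatable/choicetable
SELECT dt.id,
       STRING_AGG(ct.choicevalue, ',') AS choices
FROM datatable dt
CROSS APPLY (SELECT CAST(dt.data AS xml)) v(XmlData)
CROSS APPLY v.XmlData.nodes('/choice') x1(choice)
JOIN choicetable ct ON ct.id = x1.choice.value('@id', 'int')
GROUP BY dt.id;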
I would advise you to store XML data in an xml typed column if at all possible.

SQL importing Extended Events file using sys.fn_xe_file_target_read_file how to only get values since last import

I am using SQL Server 2012
I have a long running extended event (runs for days to capture events) that saves to a .xel file.
I have a job that runs periodically to import the data into a staging table.
I am only importing the XML event_data column from the file so I can parse out the XML fields I need and save to a table for reporting.
I know when the last time I ran the import was so I want to see if I can only select records from the file that were added since the import process last ran.
I have it working now but it imports ALL the records from the files into staging tables, parses out the fields I need (including timestamp), then only imports the records that have a timestamp since the job last ran.
My process only inserts the new ones since the last time the job ran, so this all works fine, but it does a lot of work importing and parsing the XML for ALL the records in the file, including the ones I already imported the previous times the job ran.
So I want to find a way to not import from the file at all if it was already imported, or at least not have to parse the XML for the records that were already imported (though I have to parse it now to get the timestamp to exclude the ones already processed).
Below is what I have; as I said, it works, but it is doing a lot of extra work, and I'd like to find a way to skip the records I already imported.
I only included the steps for my process that I need the help on:
-- pull data from file path and insert into staging table
INSERT INTO #CaptureObjectUsageFileData (event_data)
SELECT cast(event_data as XML) as event_data
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)
-- parse out the data needed (only columns using) and insert into temp table for parsed data
INSERT INTO #CaptureObjectUsageEventData (EventTime, EventObjectType, EventObjectName)
SELECT n.value('(@timestamp)[1]', 'datetime') AS [utc_timestamp],
n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') as ObjectName
from (
SELECT event_data
FROM #CaptureObjectUsageFileData (NOLOCK)
) ed
CROSS apply ed.event_data.nodes('event') as q(n)
-- select from temp table as another step for speed/conversion
-- converting the timestamp to smalldatetime so it doesn't keep milliseconds, so when we select distinct it won't have lots of dupes
INSERT INTO DBALocal.dbo.DBObjectUsageTracking(DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, EventObjectType, EventObjectName, CAST(EventTime AS SMALLDATETIME)
FROM #CaptureObjectUsageEventData
WHERE EventTime > @LastRunDateTime
Okay, I've placed a comment already, but - after thinking a bit deeper and looking into your code - this might be rather simple:
You can store the time of your last import and use a predicate in .nodes() (like you do in .value() to get the correct <data> element).
Try something like this:
DECLARE @LastImport DATETIME=GETDATE(); --put the last import's time here
and then
CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastImport")]') as q(n)
Doing so, .nodes() should return only <event> elements where the condition is fulfilled. If this does not help, please show a reduced example of the XML and what you want to get.
Accepted the answer above, but posting the code for the section I had questions on in full, with updates from comments/fixes I made (again, not the entire code, just the important parts). Using @Shnugo's help I was able to completely remove a temp table from my process that I needed for the date filtering before inserting into my permanent table; with his answer I can just insert directly into the permanent table. In my testing on small data sets, the update and the removal of the extra code reduced the running time by 1/3. The more data I get, the bigger the impact this improvement will have.
This is designed to run an Extended Event session over a long period of time.
It will tell me what Objects are being used (to later query up against the system tables) to tell me what ones are NOT being used.
See Extended Event generation code below:
I am grabbing info on sp_statement_starting, only capturing SP and function events, and only saving the object name, type, and timestamp.
I am NOT saving the SQL Text because it is not needed for my purpose.
sp_statement_starting fires for every statement inside a stored procedure, so when an SP runs it could raise 1-100 statement-starting events
and insert that many records into the file (which is way more data than needed for my purposes).
In my code, after I import the file into the staging table, I am shortening the timestamp to smalldatetime and selecting distinct values from all the records in the file.
I am doing this because a record is inserted for every statement inside an SP; shortening the data to smalldatetime and selecting distinct greatly reduces the number of records inserted.
I know I could just keep the object name and only insert unique values and ignore the time completely, but I want to see approximately how often they are called.
CREATE EVENT SESSION [CaptureObjectUsage_SubmissionEngine] ON SERVER
ADD EVENT sqlserver.sp_statement_starting(
-- collect object name but NOT statement, thats not needed
SET collect_object_name=(1),
collect_statement=(0)
WHERE (
-- this is for functions or SP's
(
-- functions
[object_type]=(8272)
-- SProcs
OR [object_type]=(20038)
)
AND [sqlserver].[database_name]=N'DBNAMEHERE'
AND [sqlserver].[is_system]=(0))
)
ADD TARGET package0.event_file(
SET filename=N'c:\Path\CaptureObjectUsage.xel' -- the default path the UI gave me
)
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
-- ***************************************************************************
-- code for importing
-- ***************************************************************************
-- pull data from file path and insert into staging table
INSERT INTO #CaptureObjectUsageFileData (event_data)
SELECT cast(event_data as XML) as event_data
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)
-- with the XML.nodes parsing I can insert directly into my final table because it does the logic here
INSERT INTO DBALocal.dbo.DBObjectUsageTracking(DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, -- @DBID is a variable I set above so I don't need to use DBNAME and take up a ton more space
n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') as ObjectName,
CAST(n.value('(@timestamp)[1]', 'datetime') AS SMALLDATETIME) AS [utc_timestamp]
from (
SELECT event_data
FROM #CaptureObjectUsageFileData (NOLOCK)
) ed
-- original before adding the .node logic
--CROSS apply ed.event_data.nodes('event') as q(n)
-- updated to reduce amount of data to import
CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastRunDateTime")]') as q(n)
Old question, but since no one has offered a solution using the initial_offset parameter of sys.fn_xe_file_target_read_file, I'll drop some code showing how I used it a few years ago. It's not a working solution as-is, because I cut and pasted it from a larger code base, but it shows everything that is needed to get it working.
-- table to hold the config, i.e. the last file read and the offset.
IF OBJECT_ID('session_data_reader_config', 'U') IS NULL
CREATE TABLE session_data_reader_config
(
lock bit PRIMARY KEY
DEFAULT 1
CHECK(lock=1) -- to allow only one record in the table
, file_target_path nvarchar(260)
, last_file_read nvarchar(260)
, last_file_read_offset bigint
, file_exists AS dbo.fn_file_exists(last_file_read)
)
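The computed column above relies on a dbo.fn_file_exists helper that the answer doesn't include. A minimal sketch of such a function - an assumption on my part, built on the classic xp_fileexist trick rather than anything from the original post - could look like this:
-- hypothetical helper (not part of the original answer): wraps
-- master.dbo.xp_fileexist so the computed column can test for the file
CREATE FUNCTION dbo.fn_file_exists(@path nvarchar(260))
RETURNS bit
AS
BEGIN
    DECLARE @result int;
    SET @result = 0;
    IF @path IS NOT NULL
        EXEC master.dbo.xp_fileexist @path, @result OUTPUT;
    RETURN @result;
END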
-- Insert the default value to start reading the log files, if no values are already present.
IF NOT EXISTS(SELECT 1 FROM session_data_reader_config )
INSERT INTO session_data_reader_config (file_target_path,last_file_read,last_file_read_offset)
VALUES ('PathToYourFiles*.xel',NULL,NULL)
-- import the EE data into the staging table
IF EXISTS(SELECT 1 FROM [session_data_reader_config] WHERE file_exists = 1 )
BEGIN
INSERT INTO [staging_table] ([file_name], [file_offset], [data])
SELECT t2.file_name, t2.file_offset, t2.event_data --, CAST(t2.event_data as XML)
FROM [session_data_reader_config]
CROSS APPLY sys.fn_xe_file_target_read_file(file_target_path,NULL, last_file_read, last_file_read_offset) t2
END
ELSE
BEGIN
INSERT INTO [staging_table] ([file_name], [file_offset], [data])
SELECT t2.file_name, t2.file_offset, t2.event_data
FROM [session_data_reader_config]
CROSS APPLY sys.fn_xe_file_target_read_file(file_target_path,NULL, NULL, NULL) t2
END
-- update the config table with the last file and offset
UPDATE [session_data_reader_config]
SET [last_file_read] = T.[file_name]
, [last_file_read_offset] = T.[file_offset]
FROM (
SELECT TOP (1)
[file_name]
, [file_offset]
FROM [staging_table]
ORDER BY [id] DESC
) AS T ([file_name], [file_offset])

SQL: querying specific value in column with XML Data

Boy, I've been researching this subject, but I am just not getting it. Sorry if I'm asking a question that has already been asked a million times, but it's hard to understand when you're a noob like me, and the values in my tables aren't like the others I've seen. So here it goes...
In my SQL Server database, I have a table that has all my listings in it, called ItemsEbay.
The main identifier for each individual item has a column ID value called ItemID, so I would like to refer to that as needed.
Within that table is a column named ItemSpecifics that contains XML data.
Within the ItemSpecifics XML data is a node with a <Name> of UPC and a random value for that node.
What I would like is a query that would allow me to search all the items in the ItemsEbay table that have a specific UPC value of my choosing, such as 1000100 or 10U100.
When I find the matching values, I would like to be able to replace them all at once with a new value, Does Not Apply.
First of all: do not post pictures!
What I've done here is your job: how to create an MCVE!
A dummy table with two rows - one contains the searched value, one doesn't:
DECLARE @dummyTable TABLE(ID INT IDENTITY, ItemSpecifications XML);
INSERT INTO @dummyTable VALUES
(N'<SelectedValues>
<SelectedValue>
<Name>TestName</Name>
<Value>Xml1</Value>
</SelectedValue>
<SelectedValue>
<Name>UPC</Name>
<Value>123</Value><!--The UPC named value = 123 -->
</SelectedValue>
</SelectedValues>')
,(N'<SelectedValues>
<SelectedValue>
<Name>TestName</Name>
<Value>Xml2</Value>
</SelectedValue>
<SelectedValue>
<Name>UPC</Name>
<Value>999</Value><!--The UPC named value = 999 -->
</SelectedValue>
</SelectedValues>');
--I search for "123" and want to replace it with "SomeOther"
DECLARE @SearchFor VARCHAR(100)='123';
DECLARE @ReplaceWith VARCHAR(100)='SomeOther';
--The update statement uses .modify() for the XML change and .exist() to check for the search value below a <SelectedValue> whose <Name> element has a text() of "UPC":
UPDATE @dummyTable SET ItemSpecifications.modify(N'replace value of (/SelectedValues
/SelectedValue[(Name/text())[1]="UPC"]
/Value/text())[1]
with sql:variable("@ReplaceWith")')
WHERE ItemSpecifications.exist(N'/SelectedValues
/SelectedValue[(Name/text())[1]="UPC"]
/Value[text()=sql:variable("@SearchFor")]')=1;
--Check the result
SELECT * FROM @dummyTable;

Fixing SQL Update using XQuery modify to work on SQL 2005

I'm trying to move a bunch of fields from a table into an xml blob contained within the same table. After this is successful I will be removing the column from the table. A really simple version (without the drop column) of what I've come up with is below, and this works fine on SQL 2008 - however I've discovered that this will not work on SQL 2005. I get the error XQuery: SQL type 'datetime' is not supported in XQuery. I'm actually doing this through the execution of a constructed SQL statement within a SP because of the number of fields, but for simplicity I've used a normal statement in the example:
if OBJECT_ID('tempdb..#Case') is not null
DROP Table #Case;
CREATE TABLE #Case
(
id INT,
DecisionDate DateTime,
CaseBlob xml
)
INSERT INTO #Case Values(1, '10-OCT-2011 10:10:00', '<CaseBlob></CaseBlob>')
INSERT INTO #Case Values(2, '20-OCT-2011 10:10:00', '<CaseBlob></CaseBlob>')
INSERT INTO #Case Values(3, null, '<CaseBlob></CaseBlob>')
INSERT INTO #Case Values(4, '21-OCT-2011 10:10:00', '<CaseBlob></CaseBlob>')
INSERT INTO #Case Values(5, null, '<CaseBlob></CaseBlob>')
UPDATE #Case
SET CaseBlob.modify('insert <DecisionDate>{sql:column("#Case.DecisionDate")}</DecisionDate>
as last into (/CaseBlob)[1]')
WHERE #Case.DecisionDate is not null
AND CaseBlob.exist('(/CaseBlob/DecisionDate)') = 0
SELECT * FROM #CASE
I've tried wrapping the sql:column("#Case.DecisionDate") with xs:string(sql:column("#Case.DecisionDate")), but that doesn't seem to help. It has been pointed out by @marc_s that the use of sql:column() within a .modify() insert statement wasn't introduced until SQL 2008 - so I think this is a red herring.
Since this is a one-off migration script that only needs to be run once, I'm thinking I should move away from the set-based methods towards a procedural looping method to cater for my requirements. Does this sound like the correct approach, given the server version limitation and what I'm trying to achieve? Any pointers greatly appreciated.
First part:
You can use a CTE to query the date part converted to a string using the date time style 126.
;with C as
(
select CaseBlob,
convert(varchar(23), DecisionDate, 126) as DecisionDate
from #Case
where DecisionDate is not null and
CaseBlob.exist('(/CaseBlob/DecisionDate)') = 0
)
update C
set CaseBlob.modify('insert <DecisionDate>{sql:column("DecisionDate")}</DecisionDate>
as last into (/CaseBlob)[1]')
There is a tiny difference in the output of this compared to your update statement: it will omit the milliseconds if they are .000. If they actually have a value, they will be included, so you are not missing any data. It's just different.
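For example, a quick check of the style-126 behaviour (using made-up dates):
-- style 126 drops the fractional part when the milliseconds are .000
SELECT convert(varchar(23), CAST('20111010 10:10:00.000' AS datetime), 126) AS no_ms,   -- 2011-10-10T10:10:00
       convert(varchar(23), CAST('20111010 10:10:00.123' AS datetime), 126) AS with_ms; -- 2011-10-10T10:10:00.123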
Second part:
I don't really understand how this connects to the above well enough to give you some sample code. But if you need to add more stuff from other tables, you can join to those tables in the CTE and expose the columns as output from the CTE to be used in the .modify() statement inserting values.

SQL Pivot table

Is there a way to pivot an Entity-Attribute table?
I want to flip all the rows into columns, regardless of how many different attributes there are.
Here's an example of what I want to accomplish. The example uses two attributes: FirstName, LastName. But in the real database, there are thousands of attributes and I want to flip them into columns for reporting purposes.
I don't want to have to write a CTE for every attribute.
USE TempDB
DECLARE @Attribute TABLE(
AttributeID Int Identity(10,1) PRIMARY KEY,
AttributeName Varchar(MAX))
INSERT INTO @Attribute(AttributeName) VALUES('Firstname')
INSERT INTO @Attribute(AttributeName) VALUES('Lastname')
DECLARE @tbl TABLE(
AttributeID Int,
EntityValue Varchar(MAX)
)
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(10,'John')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(10,'Paul')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(10,'George')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(10,'Ringo')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(11,'Lennon')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(11,'McCartney')
INSERT INTO @tbl(AttributeID,EntityValue) VALUES(11,'Harrison')
SELECT A.AttributeID,AttributeName,EntityValue FROM @tbl T
INNER JOIN @Attribute A
ON T.AttributeID=A.AttributeID
DECLARE @Tbl2 Table(
FirstName Varchar(MAX),
LastName Varchar(MAX)
)
INSERT INTO @Tbl2(FirstName,LastName) VALUES('John','Lennon')
INSERT INTO @Tbl2(FirstName,LastName) VALUES('Paul','McCartney')
INSERT INTO @Tbl2(FirstName,LastName) VALUES('George','Harrison')
INSERT INTO @Tbl2(FirstName) VALUES('Ringo')
SELECT * FROM @Tbl2
Based on what you posted, you're dealing with SQL Server.
The old-school method is to use IF or CASE expressions to represent each column you want to create, e.g.:
CASE WHEN t.AttributeID = 10 THEN t.EntityValue ELSE NULL END 'FirstName'
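Fleshed out, that old-school pivot might look like the sketch below. Note that the sample @tbl above has no entity key at all, so an EntityID column is assumed here purely for illustration - without one there is nothing to tie 'John' to 'Lennon':
-- minimal sketch of the CASE-based pivot; EntityID is an assumed
-- column, not part of the posted schema
SELECT
    MAX(CASE WHEN t.AttributeID = 10 THEN t.EntityValue END) AS FirstName,
    MAX(CASE WHEN t.AttributeID = 11 THEN t.EntityValue END) AS LastName
FROM @tbl t
GROUP BY t.EntityID;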
The alternative is to use PIVOT (SQL Server 2005+).
In either case, you're going to have to define the output columns by hand. If your model was set up to address it, you might be able to use dynamic SQL.
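For comparison, a PIVOT version under the same assumed EntityID column might read:
-- equivalent PIVOT form; EntityID is again an assumption
SELECT EntityID, [10] AS FirstName, [11] AS LastName
FROM (SELECT EntityID, AttributeID, EntityValue FROM @tbl) src
PIVOT (MAX(EntityValue) FOR AttributeID IN ([10], [11])) AS p;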
In case you're curious, the reason Microsoft SQL Server's PIVOT operator isn't "dynamic," and that you must specify each value to pivot, is that this makes it possible to identify the table structure of the PIVOT query from the query text alone. That's an important principle of most programming languages - it should be possible to determine the type of an expression from the expression. The type shouldn't depend on the run-time values of anything mentioned in the expression.
That said, some implementations of SQL implement what you want. For example, I think Microsoft Access does this with TRANSFORM.
If you search the web for "dynamic pivot", you'll find a lot.