Alternatives to XML shredding in SQL

I tried to shred XML into a temporary table using the XQuery .nodes() method, as shown below, but I ran into a performance problem: the shredding takes a long time. Please give me some ideas on alternatives.
My requirement is to pass bulk records to a stored procedure, parse those records, and perform some operations based on the record values.
CREATE TABLE #DW_TEMP_TABLE_SAVE(
    [USER_ID] [NVARCHAR](30),
    [USER_NAME] [NVARCHAR](255)
)

insert into #DW_TEMP_TABLE_SAVE
select
    A.B.value('(USER_ID)[1]', 'nvarchar(30)' ) [USER_ID],
    A.B.value('(USER_NAME)[1]', 'nvarchar(30)' ) [USER_NAME]
from
    @l_n_XMLDoc.nodes('//ROW') as A(B)

Specify the text() node in your value() calls.
insert into #DW_TEMP_TABLE_SAVE
select A.B.value('(USER_ID/text())[1]', 'nvarchar(30)' ) [USER_ID],
       A.B.value('(USER_NAME/text())[1]', 'nvarchar(30)' ) [USER_NAME]
from   @l_n_XMLDoc.nodes('/USER_DETAILS/RECORDSET/ROW') as A(B)
If you don't use text(), the query plan includes a step that concatenates the value of the specified node with the values of all its child nodes, which I guess you don't want in this scenario. That concatenation is done by the UDX operator, and it is a good thing not to have it in your plan.
Another thing to try is OPENXML. In some scenarios (large XML documents) I have found that OPENXML performs faster.
declare @idoc int
exec sp_xml_preparedocument @idoc out, @l_n_XMLDoc

insert into #DW_TEMP_TABLE_SAVE
select USER_ID, USER_NAME
from openxml(@idoc, '/USER_DETAILS/RECORDSET/ROW', 2)
     with (USER_ID nvarchar(30), USER_NAME nvarchar(30))

exec sp_xml_removedocument @idoc

Related

Converting a single XML bracket column to multiple columns in T-SQL (SQL Server)

I've seen similar questions to mine, but the XML format has always been different, as the XML format I have does not follow the "standard" structure. My table looks like the following (a single column with an XML fragment as its row values):
|VAL|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
And the outcome I'm looking for is:
|Name|age|city |occupation|
|bob |22 |new york|student |
|bob |22 |new york|student |
I can create a hardcoded script with these column names, but the problem is that I have over 20 tables that would then all require a custom script. My thinking is that there should be a way to do this dynamically: given a destination table and a source (XML) table, a procedure could generate this data.
Your question is not entirely clear...
As far as I understand, you have a variety of different XMLs and you want to read them generically. If this is true, I'd suggest reflecting this in your sample data in your next question.
One general statement: there is no way around dynamically created statements in cases where you want to set the descriptive elements of a result set (in this case, the names of the columns) dynamically. T-SQL relies on some things being known in advance.
Try this:
I set up a mockup scenario to simulate your issue (please try to do this yourself in your next question):
DECLARE @tbl TABLE(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO @tbl VALUES
 ('One person',N'<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--this query relies on all possible attributes being known in advance.
--common attributes, like the name, are returned for a person and for a country
--differing attributes return as NULL.
--One advantage might be that you can use a specific datatype where appropriate.
SELECT t.ID
      ,t.Descr
      ,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
      ,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
      ,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
      ,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
      ,t.VAL.value('(/*[1]/@capital)[1]','nvarchar(max)') AS [capital]
      ,t.VAL.value('(/*[1]/@continent)[1]','nvarchar(max)') AS [continent]
FROM @tbl t;
--This query returns a classical EAV (entity-attribute-value) list.
--In this result you get each attribute on its own line.
SELECT t.ID
      ,t.Descr
      ,A.attrs.value('local-name(.)','nvarchar(max)') AS AttrName
      ,A.attrs.value('.','nvarchar(max)') AS AttrValue
FROM @tbl t
CROSS APPLY t.VAL.nodes('/*[1]/@*') A(attrs);
Both approaches might be generated as statements at string level and then executed with EXEC() or sp_executesql.
Hint: One approach might be to insert the EAV list into a tolerant staging table and proceed from there with conditional aggregation, PIVOT or hardcoded VIEWs, as sketched below.
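To illustrate that hint, here is a minimal sketch (my own addition, reusing the @tbl mockup above) of how the EAV output for the <person> entities could be pivoted back into columns with conditional aggregation:
--Minimal sketch: pivot the EAV rows for <person> entities back into columns
--using conditional aggregation (reuses the @tbl mockup from above).
SELECT t.ID
      ,t.Descr
      ,MAX(CASE WHEN A.attrs.value('local-name(.)','nvarchar(max)')='name'       THEN A.attrs.value('.','nvarchar(max)') END) AS [name]
      ,MAX(CASE WHEN A.attrs.value('local-name(.)','nvarchar(max)')='age'        THEN A.attrs.value('.','nvarchar(max)') END) AS [age]
      ,MAX(CASE WHEN A.attrs.value('local-name(.)','nvarchar(max)')='city'       THEN A.attrs.value('.','nvarchar(max)') END) AS [city]
      ,MAX(CASE WHEN A.attrs.value('local-name(.)','nvarchar(max)')='occupation' THEN A.attrs.value('.','nvarchar(max)') END) AS [occupation]
FROM @tbl t
CROSS APPLY t.VAL.nodes('/*[1]/@*') A(attrs)
WHERE t.VAL.value('local-name(/*[1])','varchar(100)')='person'
GROUP BY t.ID, t.Descr;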
Dynamic approach
In order to read the <person> elements we would need this:
SELECT t.ID
      ,t.Descr
      ,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
      ,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
      ,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
      ,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
FROM @tbl t
WHERE VAL.value('local-name(/*[1])','varchar(100)')='person';
All we have to do is generate the changing part:
Try this:
A new mockup with a real table
CREATE TABLE SimulateYourTable(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO SimulateYourTable VALUES
 ('One person',N'<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--Filter for <person> entities
DECLARE @entityName NVARCHAR(100)='person';
--This is a string representing the command
DECLARE @cmd NVARCHAR(MAX)=
'SELECT t.ID
,t.Descr
***columns here***
FROM SimulateYourTable t
WHERE VAL.value(''local-name(/*[1])'',''varchar(100)'')=''***name here***''';
--with this we can create all the columns
--Hint: With SQL Server 2017+ there is STRING_AGG() - much simpler! (a sketch follows after this example)
DECLARE @columns NVARCHAR(MAX)=
(
    SELECT CONCAT(',t.VAL.value(''(/*[1]/@',Attrib.[name],')[1]'',''nvarchar(max)'') AS ',QUOTENAME(Attrib.[name]))
    FROM SimulateYourTable t
    CROSS APPLY t.VAL.nodes('//@*') AllAttrs(a)
    CROSS APPLY (SELECT a.value('local-name(.)','varchar(max)')) Attrib([name])
    WHERE VAL.value('local-name(/*[1])','varchar(100)')=@entityName
    GROUP BY Attrib.[name]
    FOR XML PATH(''),TYPE
).value('.','nvarchar(max)');
--Now we stuff this into our command
SET @cmd=REPLACE(@cmd,'***columns here***',@columns);
SET @cmd=REPLACE(@cmd,'***name here***',@entityName);
--This is the command.
--Hint: You might use this to create physical VIEWs without the need to type them in...
PRINT @cmd;
You can use EXEC(@cmd) to execute this dynamic SQL and check the result.
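As the hint in the script mentions, on SQL Server 2017+ the column list can be built with STRING_AGG() instead of the FOR XML PATH trick. A minimal sketch of that variant, as a drop-in replacement for the @columns assignment above (it assumes @entityName is declared as before):
--SQL Server 2017+ only: build the column list with STRING_AGG()
--(replaces the FOR XML PATH based assignment of @columns above).
DECLARE @columns NVARCHAR(MAX)=
(
    SELECT STRING_AGG(CAST(col AS NVARCHAR(MAX)),' ')
    FROM
    (
        SELECT DISTINCT CONCAT(',t.VAL.value(''(/*[1]/@',Attrib.[name],')[1]'',''nvarchar(max)'') AS ',QUOTENAME(Attrib.[name])) AS col
        FROM SimulateYourTable t
        CROSS APPLY t.VAL.nodes('//@*') AllAttrs(a)
        CROSS APPLY (SELECT a.value('local-name(.)','varchar(max)')) Attrib([name])
        WHERE t.VAL.value('local-name(/*[1])','varchar(100)')=@entityName
    ) AS attrs
);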

Issue with data population from XML

I am reading data from XML into a table. When I do select from the table, the table is empty.
SET @INPUTXML = CAST(@Attribute AS XML)
EXEC sp_xml_preparedocument @TestDoc OUTPUT, @INPUTXML
SELECT ROW_NUMBER() OVER (ORDER BY Name) AS Row, *
INTO #tData
FROM OPENXML(@TestDoc, N'/DocumentElement/dtData')
WITH (
      ID VARCHAR(100) './ID'
    , Name VARCHAR(100) './Name'
    , Value VARCHAR(max) './Value'
    , [Column] VARCHAR(100) './Column'
)
EXEC sp_xml_removedocument @TestDoc
Below are my questions:
select * from #tData returns an empty table. Why is the data not getting populated?
What does sp_xml_preparedocument do? When I print @TestDoc, it gives me a number.
What is sp_xml_removedocument?
To answer your questions, though:
#tData is empty because your SELECT statement returned no data. A SELECT...INTO statement will still create the table, even if the SELECT returns no rows. Why your SELECT returns no data is impossible for us to say, because we have no sample data. If you remove the INTO clause you will see that no rows are returned, so you need to fix your SELECT, FROM, etc., but that brings me on to my point in a minute (about using XQuery).
sp_xml_preparedocument (Transact-SQL) explains it better than I could. Really though, you shouldn't be using it any more, as it was the way to read XML back in SQL Server 2000 (maybe 2005) and prior. SQL Server 2008 certainly supported XQuery, which you must at least be on if you are using SSMS 2014. To quote the opening statement of the documentation:
Reads the XML text provided as input, parses the text by using the MSXML parser (Msxmlsql.dll), and provides the parsed document in a state ready for consumption. This parsed document is a tree representation of the various nodes in the XML document: elements, attributes, text, comments, and so on.
sp_xml_removedocument (Transact-SQL), but again, you should be using XQUERY.
Removes the internal representation of the XML document specified by the document handle and invalidates the document handle.
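For reference, a minimal XQuery-based sketch of the same shred without sp_xml_preparedocument. It assumes the /DocumentElement/dtData shape and the element names from the WITH clause above, and that @Attribute holds the XML text; adjust the paths and types to your real data:
--Minimal XQuery sketch, assuming the /DocumentElement/dtData shape and the
--element names used in the question; @Attribute is the original input string.
DECLARE @INPUTXML XML = CAST(@Attribute AS XML);

SELECT ROW_NUMBER() OVER (ORDER BY d.n.value('(Name/text())[1]','VARCHAR(100)')) AS Row
      ,d.n.value('(ID/text())[1]'    ,'VARCHAR(100)') AS ID
      ,d.n.value('(Name/text())[1]'  ,'VARCHAR(100)') AS Name
      ,d.n.value('(Value/text())[1]' ,'VARCHAR(MAX)') AS Value
      ,d.n.value('(Column/text())[1]','VARCHAR(100)') AS [Column]
INTO #tData
FROM @INPUTXML.nodes('/DocumentElement/dtData') AS d(n);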

SQL SELECT ... IN(xQuery)

My first XQuery (it may be a bit basic, but it's the only place I need to use it at the moment).
DECLARE @IdList XML;
SET @IdList =
'<Ids>
    <Id>6faf5db8-b434-437f-99c8-70299f82dab4</Id>
    <Id>5b3ddaf1-3412-471a-a6cf-71f8e1c31168</Id>
    <Id>1da6136d-2ff5-44cc-8510-4713451aac4d</Id>
</Ids>';
What I want to do is:
SELECT * FROM [MyTable] WHERE [Id] IN ( /* This list of Id's in the XML */ );
What is the desired way to do this?
Note: The format of the XML (passed into a Stored Procedure from C#) is also under my control, so if there is a better structure (for performance), then please include details.
Also, there could be thousands of Ids in the real list, if that makes a difference.
If you're passing thousands of Ids, I don't think this will be a stellar performer, but give it a try:
DECLARE @IdList XML;
SET @IdList =
'<Ids>
    <Id>6faf5db8-b434-437f-99c8-70299f82dab4</Id>
    <Id>5b3ddaf1-3412-471a-a6cf-71f8e1c31168</Id>
    <Id>1da6136d-2ff5-44cc-8510-4713451aac4d</Id>
</Ids>';
select * from MyTable where Id in
(
    select cast(T.c.query('text()') as varchar(36)) as result from @IdList.nodes('/Ids/Id') as T(c)
)
You might look at table valued parameters as a better way, unless you already have your data in XML.
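For completeness, a minimal sketch of the table-valued parameter route mentioned above (requires SQL Server 2008 or later); the type and procedure names here are illustrative, not from the original post:
--Illustrative TVP sketch; the type and procedure names are made up for the example.
CREATE TYPE dbo.GuidIdList AS TABLE (Id UNIQUEIDENTIFIER PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetMyTableByIds
    @Ids dbo.GuidIdList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    --A join against the TVP replaces the IN (...) over shredded XML.
    SELECT t.*
    FROM [MyTable] t
    INNER JOIN @Ids i ON i.Id = t.[Id];
END
GO
From C# the parameter is passed as SqlDbType.Structured (for example a DataTable with a single uniqueidentifier column), which avoids building and parsing XML altogether.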

Importing XML data with a namespace into SQL Server

I have an XML document containing data that I want to import into existing SQL server tables:
<?xml version="1.0" encoding="UTF-8"?>
<geia:GEIA-STD-0007 xmlns:geia="http://www.geia_STD_0007.com/2006/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<geia:full_file>
<geia:XA_end_item_acronym_code_data>
<geia:end_item_acronym_code>ON565</geia:end_item_acronym_code>
<geia:logistics_support_analysis_control_number_structure>32222222</geia:logistics_support_analysis_control_number_structure>
</geia:XA_end_item_acronym_code_data>
<geia:XB_logistics_support_analysis_control_number_indentured_item_data>
<geia:end_item_acronym_code>ON565</geia:end_item_acronym_code>
<geia:logistics_support_analysis_control_number>E2C06CAAE</geia:logistics_support_analysis_control_number>
<geia:alternate_logistics_support_analysis_control_number_code>06</geia:alternate_logistics_support_analysis_control_number_code>
<geia:logistics_support_analysis_control_number_type>P</geia:logistics_support_analysis_control_number_type>
<geia:logistics_support_analysis_control_number_nomenclature>CABLE ASSEMBLY W5</geia:logistics_support_analysis_control_number_nomenclature>
<geia:reliability_availability_and_maintainability_indicator>Y</geia:reliability_availability_and_maintainability_indicator>
<geia:system_end_item_identifier>N</geia:system_end_item_identifier>
<geia:technical_manual_functional_group_code>41JE150</geia:technical_manual_functional_group_code>
</geia:XB_logistics_support_analysis_control_number_indentured_item_data>
<geia:XB_logistics_support_analysis_control_number_indentured_item_data>
<geia:end_item_acronym_code>ON565</geia:end_item_acronym_code>
<geia:logistics_support_analysis_control_number>E2C06CAAMZZ</geia:logistics_support_analysis_control_number>
<geia:alternate_logistics_support_analysis_control_number_code>06</geia:alternate_logistics_support_analysis_control_number_code>
<geia:logistics_support_analysis_control_number_type>P</geia:logistics_support_analysis_control_number_type>
<geia:logistics_support_analysis_control_number_nomenclature>CONSUMABLES</geia:logistics_support_analysis_control_number_nomenclature>
<geia:system_end_item_identifier>N</geia:system_end_item_identifier>
</geia:XB_logistics_support_analysis_control_number_indentured_item_data>
</geia:full_file>
</geia:GEIA-STD-0007>
I have been looking online for code that can help me accomplish this task but have not had much luck. So far this is the code I have been trying to use:
----step 1 Import XML data from an XML file into SQL Server table using the OPENROWSET function
drop table lsa.XMLwithOpenXML
CREATE TABLE lsa.XMLwithOpenXML
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)
INSERT INTO lsa.XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'D:\Temp\e2c.xml', SINGLE_CLOB) AS x;
--SELECT * FROM lsa.XMLwithOpenXML
--get xmldata to shred
-------------------------------------------------------------------------
Declare @xmlData as xml
Select @xmlData = XMLData FROM lsa.XMLwithOpenXML
------------------------------------------------------------
--create variable to hold the int id of the xmldoc created by the sp
DECLARE @XMLdocId AS INT
--procedureName, outputId, InputData
EXEC sp_xml_preparedocument @XMLdocId OUTPUT, @xmlData , '<geia:GEIA-STD-0007 xmlns:geia="http://www.geia_STD_0007.com/2006/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
--create an OpenXML query to shred the doc, or transform it into rowsets
INSERT INTO [lsa].[XB]
(
[end_item_acronym_code])
Select end_item_acronym_code from
OpenXML(@XMLdocId, '/geia:XB_logistics_support_analysis_control_number_indentured_item_data',2)
;WITH XMLNAMESPACES ('xmlns:geia="http://www.geia_STD_0007.com/2006/schema' AS geia, DEFAULT 'http://www.w3.org/2001/XMLSchema-instance' )
SELECT
x.c( '.' ) AS result
FROM @xmlData.nodes('geia:GEIA-STD-0007/geia:full_file/geia:XB_logistics_support_analysis_control_number_indentured_item_data') x(c)
EXEC sp_xml_removedocument @XMLdocId
I realize that this code is very wrong. The path that I am passing to the OpenXML() function is wrong, but I have tried many iterations of it and none have been successful. I am also not 100% certain how to go about pulling out the data for the different tables (i.e. XA, XB), but my plan is to pull from one table and then repeat the code for each additional table. This code will be used to import large amounts of data (I only posted part of the XML file) with many different tables. If anyone has a better idea, I would welcome it, as I am still learning.
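Not a tested answer, but a minimal sketch of how namespaced XML like this is commonly shredded with WITH XMLNAMESPACES and .nodes() instead of OPENXML. It assumes the lsa.XMLwithOpenXML staging table loaded by the script above and an end_item_acronym_code column in lsa.XB; the VARCHAR(50) length is a guess, and the value() list would need to be extended for the remaining XB columns:
--Sketch: shred the geia-namespaced XML with XQuery instead of OPENXML.
--Assumes lsa.XMLwithOpenXML has been loaded as in the script above; the
--VARCHAR(50) length is a guess, so adjust it to the real lsa.XB definition.
WITH XMLNAMESPACES ('http://www.geia_STD_0007.com/2006/schema' AS geia)
INSERT INTO [lsa].[XB] (end_item_acronym_code)
SELECT x.c.value('(geia:end_item_acronym_code/text())[1]', 'VARCHAR(50)')
FROM lsa.XMLwithOpenXML src
CROSS APPLY src.XMLData.nodes('/geia:GEIA-STD-0007/geia:full_file/geia:XB_logistics_support_analysis_control_number_indentured_item_data') x(c);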

Is there any way to retrieve inserted rows of a command

We probably all know SCOPE_IDENTITY() to retrieve the identity generated by a single insert. Currently I need some kind of magic variable or function to retrieve all the rows generated by a statement, e.g.:
INSERT INTO [dbo].[myMagicTable]
(
[name]
)
SELECT [name]
FROM [dbo].[myMagicSource]
WHERE /* some weird where-clauses with several subselects ... */;
INSERT INTO [dbo].[myMagicBackupTable]
(
[id],
[name]
)
SELECT
[id],
[name]
FROM ???
An insert trigger is not an option, as this will perform a single insert, which is a problem for a batch of 10,000 rows...
So, is there any way to achieve this?
We are using MSSQL 2005.
For SQL Server 2005+, you can use the OUTPUT clause.
DECLARE @InsertedIDs table(ID int);
INSERT INTO [dbo].[myMagicTable]
OUTPUT INSERTED.ID
INTO @InsertedIDs
SELECT ...
You could define a temporary table (possibly a table variable) and make use of the OUTPUT clause on your INSERT (you can make use of the Inserted pseudo-table, like in a trigger):
DECLARE @NewIDs TABLE (MagicID INT, Name VARCHAR(50))
INSERT INTO [dbo].[myMagicTable]([name])
OUTPUT Inserted.MagicID, Inserted.Name INTO @NewIDs(MagicID, Name)
SELECT [name]
FROM [dbo].[myMagicSource]
WHERE /* some weird where-clauses with several subselects ... */
and then use that table variable after the INSERT:
INSERT INTO
[dbo].[myMagicBackupTable]([id], [name])
SELECT MagicID, [name]
FROM @NewIDs
and go from there.
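Putting the pieces together, here is a self-contained sketch of the pattern. The table definitions and sample values are illustrative, since the original schemas are not shown, and the multi-step insert keeps it SQL Server 2005 compatible:
--Self-contained sketch of OUTPUT ... INTO; table definitions are illustrative.
CREATE TABLE dbo.myMagicTable       (id INT IDENTITY PRIMARY KEY, name VARCHAR(50));
CREATE TABLE dbo.myMagicSource      (name VARCHAR(50));
CREATE TABLE dbo.myMagicBackupTable (id INT, name VARCHAR(50));

INSERT INTO dbo.myMagicSource (name) SELECT 'alpha' UNION ALL SELECT 'beta';

DECLARE @NewRows TABLE (id INT, name VARCHAR(50));

--Capture the identity value and name of every row inserted by the statement.
INSERT INTO dbo.myMagicTable (name)
OUTPUT inserted.id, inserted.name INTO @NewRows (id, name)
SELECT name
FROM dbo.myMagicSource;

--Copy the captured rows into the backup table.
INSERT INTO dbo.myMagicBackupTable (id, name)
SELECT id, name
FROM @NewRows;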