Parse XML text to identify the node values - sql

I'm trying to parse an XML file that I get from a url (sample below) and I need to parse the items that are on the record node. I am planning to put it into a SQL database with code at the bottom but can't figure out that line of code
<pmcids status="ok">
<request idtype="pmid" pmids="" versions="yes" showaiid="no">
<echo>ids=19240239;tool=HCC;email=morgenxxx%40xxxx.edu;format=xml</echo>
</request>
<record requested-id="19240239" pmcid="PMC2668929" pmid="19240239" doi="10.1158/1055-9965.EPI-08-0866">
<versions><version pmcid="PMC2668929.1" mid="NIHMS104698" current="true"/>
</versions>
</record>
</pmcids>
SQL code:
nref.value('@PMID[1]','varchar(max)') pmid,
nref.value('@PMCID[1]','varchar(max)') PMCID
All help is appreciated. I hope that this is enough information to determine the correct syntax

Use the native XQuery support in SQL Server! Much simpler than OPENXML ....
Try this:
DECLARE @input XML = '<pmcids status="ok">
<request idtype="pmid" pmids="" versions="yes" showaiid="no">
<echo>ids=19240239;tool=HCC;email=morgenxxx%40xxxx.edu;format=xml</echo>
</request>
<record requested-id="19240239" pmcid="PMC2668929" pmid="19240239" doi="10.1158/1055-9965.EPI-08-0866">
<versions>
<version pmcid="PMC2668929.1" mid="NIHMS104698" current="true"/>
</versions>
</record>
</pmcids>'
SELECT
RequestedId = xc.value('@requested-id', 'int'),
pmcid = xc.value('@pmcid', 'varchar(50)'),
pmid = xc.value('@pmid', 'int'),
doi = xc.value('@doi', 'varchar(50)')
FROM
@input.nodes('/pmcids/record') AS XT(XC)
Basically, the .nodes() call returns a "virtual" table XT with a column XC that contains the XML fragment for each of the XML nodes that match your XPath expression - here a list of all <record> nodes under the <pmcids> root node.
Then, using the .value() call, you can "reach into" each of those XML nodes and retrieve the individual bits - since those are all attributes, you use the @ prefix to indicate an attribute, and define the data type of each value.
This returns one row per <record> node, which you could easily insert into a database table
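As a rough sketch of that last step, the SELECT can feed an INSERT directly. The table dbo.PmcRecords and its column sizes are assumptions made up for illustration, not something from the question, and a shortened copy of the sample XML is declared so the batch can run on its own:

```sql
-- Hypothetical target table; the name and column sizes are assumptions.
CREATE TABLE dbo.PmcRecords
(
    RequestedId INT,
    Pmcid       VARCHAR(50),
    Pmid        INT,
    Doi         VARCHAR(50)
);

DECLARE @input XML = '<pmcids status="ok">
<record requested-id="19240239" pmcid="PMC2668929" pmid="19240239" doi="10.1158/1055-9965.EPI-08-0866">
<versions><version pmcid="PMC2668929.1" mid="NIHMS104698" current="true"/></versions>
</record>
</pmcids>';

-- One row is inserted per <record> node found under <pmcids>.
INSERT INTO dbo.PmcRecords (RequestedId, Pmcid, Pmid, Doi)
SELECT
    xc.value('@requested-id', 'int'),
    xc.value('@pmcid', 'varchar(50)'),
    xc.value('@pmid', 'int'),
    xc.value('@doi', 'varchar(50)')
FROM
    @input.nodes('/pmcids/record') AS XT(XC);
```

If the feed can return several <record> nodes, the same statement inserts them all in one pass.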
Update: if you also need the mid from the <version> node - use this:
SELECT
RequestedId = xc.value('@requested-id', 'int'),
pmcid = xc.value('@pmcid', 'varchar(50)'),
pmid = xc.value('@pmid', 'int'),
doi = xc.value('@doi', 'varchar(50)'),
VersionPmcid = xver.value('@pmcid', 'varchar(50)'),
mid = xver.value('@mid', 'varchar(50)')
FROM
@input.nodes('/pmcids/record') AS XT(XC)
CROSS APPLY
XC.nodes('versions/version') AS XT2(XVer)
(I added the pmcid attribute from the <version> node, since there might be multiple <version> nodes under a <record> from what this sample looks like)

Related

SQL query for XML data

I have a SQL Server database table with a column called XML that contains XML data which is structured like this:
<Item xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://test/data">
<Roots>
<Root>
<Name>Field Name</Name>
<Value>Field Value</Value>
</Root>
<Root>
<Name>Field Name</Name>
<Value>Field Value</Value>
</Root>
</Roots>
I want to use T-SQL to get the Value where Name = Total. I have tried the following but it isn't returning any data:
SELECT [XML]
FROM [BusinessAccount]
WHERE [XML].value('(/Root/Name)[13]', 'VARCHAR(MAX)') LIKE '%Total%'
If anyone could tell me where I've gone wrong?
You are missing the required WITH XMLNAMESPACES for your XML and the path is incorrect.
If you want to bring back rows where the 13th element consists of the text Total you can use the below.
WITH XMLNAMESPACES (DEFAULT 'http://test/data')
SELECT [XML]
FROM [BusinessAccount]
WHERE 1 = [XML].exist('(/Item/Roots/Root/Name)[13][text() = "Total"]')
Otherwise you can add the WITH XMLNAMESPACES to your original query and fix the path there too.
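A minimal sketch of that second option follows. The table variable @BusinessAccount and its sample values are stand-ins invented here so the batch runs on its own, and the position [2] replaces the question's [13] only because this sample has two <Root> nodes:

```sql
-- Stand-in for the BusinessAccount table from the question (assumed shape, made-up values).
DECLARE @BusinessAccount TABLE ([XML] XML);

INSERT INTO @BusinessAccount VALUES ('<Item xmlns="http://test/data">
<Roots>
<Root><Name>Discount</Name><Value>10</Value></Root>
<Root><Name>Total</Name><Value>100</Value></Root>
</Roots>
</Item>');

-- The original query with the default namespace declared and the full path spelled out.
WITH XMLNAMESPACES (DEFAULT 'http://test/data')
SELECT [XML]
FROM @BusinessAccount
WHERE [XML].value('(/Item/Roots/Root/Name)[2]', 'VARCHAR(MAX)') LIKE '%Total%';
```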
You need to specify namespaces. You can then match <Name> and <Value> pairs and extract the contents of <Value> like so:
SELECT NameNode.value('declare namespace x="http://test/data"; (../x:Value)[1]', 'varchar(100)')
FROM [BusinessAccount]
CROSS APPLY [XML].nodes('declare namespace x="http://test/data"; //x:Root/x:Name') AS n(NameNode)
WHERE NameNode.value('.', 'varchar(100)') = 'Total'

Importing XML to SQL Server

I am wondering how I can insert an XML file into a SQL Server DB. Below is the XML I have but I am unsure how to do this in a way that will scale. My thought is a Insert Into Select statement but I do not know if that is going to work as the data increases. Thank you in advance!
<Records>
<Record>
<ID SpecNum="5069580" IssueNum="001" SpecStatus="Pre-Approved">
<NutritionDetails>
<NutrientFacts>
<NutrientNameId>ENERC_KCAL</NutrientNameId>
<NutrientName>ENERC_KCAL</NutrientName>
<NutrientPer100gUnrounded>1.91</NutrientPer100gUnrounded>
<NutrientPer100gRounded>191</NutrientPer100gRounded>
</NutrientFacts>
</NutritionDetails>
</ID>
</Record>
</Records>
Once you've successfully created a proper, valid XML - you should be able to use this T-SQL code to grab the details:
SELECT
-- get the attributes from the <ID> node
IDSpecNum = XC.value('(ID/@SpecNum)[1]', 'int'),
IDIssueNum = XC.value('(ID/@IssueNum)[1]', 'int'),
IDSpecStatus = XC.value('(ID/@SpecStatus)[1]', 'varchar(100)'),
-- get the element values from the children of the <NutrientFacts> node
NutrientNameId = NUT.value('(NutrientNameId)[1]', 'varchar(100)'),
NutrientName = NUT.value('(NutrientName)[1]', 'varchar(100)'),
NutrientPer100gUnrounded = NUT.value('(NutrientPer100gUnrounded)[1]', 'decimal(20,4)'),
NutrientPer100gRounded = NUT.value('(NutrientPer100gRounded)[1]', 'decimal(20,4)')
FROM
dbo.YourTable
CROSS APPLY
-- get one XML fragment per <Record>
XmlData.nodes('/Records/Record') AS XT(XC)
CROSS APPLY
-- get one XML fragment per <NutrientFacts> inside
XC.nodes('ID/NutritionDetails/NutrientFacts') AS XT2(NUT)
The first CROSS APPLY basically gets an "inline pseudo table" with one XML fragment for each <Record> node in your XML in the XmlData column of your table (this is just an assumption on my part - adapt to your reality!). These XML fragments are referenced as "pseudo-table" XT with a single column XC.
With that XC column's XML fragment, you can "reach in" and grab the attribute values from the <ID> node in the <Record> - that's the first three values.
Then, based on the XT pseudo table, I apply another CROSS APPLY to get all the <NutrientFacts> nodes inside <ID> / <NutritionDetails> - those are referenced as pseudo-table XT2 with column NUT, which again holds an XML fragment for each <NutrientFacts> node; I reach into that XML node and extract the values from the sub-elements of that node - those are the four additional values that are shown in the select.
Now that you have a SELECT that returns all the values, you can easily pick the bits you need and use them in an INSERT INTO dbo.MyTable(list-of-columns) SELECT list-of-columns ... scenario. Enjoy!
UPDATE: to import an XML file from disk (local disk on your SQL Server machine's file system) into your table - use something like this:
INSERT INTO dbo.YourTable(XmlData)
SELECT
CONVERT(XML, BulkColumn) AS BulkColumn
FROM
OPENROWSET(BULK 'C:\temp\records.xml', SINGLE_BLOB) AS x;
Again: adapt to your needs - I don't know if you want to insert additional information into dbo.YourTable - and I don't even know your table name; you can load one XML file at a time from disk.

Extract data values for XML column with XML namespaces in SQL Server

Can anybody please help me with the below XML? I need to extract all the XML values like below.
AwardYear | Comments | FieldCode | FieldNumber | Key | Value
AY2013-14 | AAI: Adjusted Available Income | AAI | 306 | Blank | None Calculated
Here is the sample XML.
<SchemaType xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/process">
<AwardYear>AY2013_14</AwardYear>
<Fields>
<FieldSchema>
<Comments>AAI: Adjusted Available Income</Comments>
<DbLocation>IsirData</DbLocation>
<FieldCode>AAI</FieldCode>
<FieldNumber>306</FieldNumber>
<ReportDisplay>Data</ReportDisplay>
<ValidContent>
<ValidValueContent xmlns:d5p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d5p1:KeyValueOfstringstring>
<d5p1:Key>Blank</d5p1:Key>
<d5p1:Value>None calculated</d5p1:Value>
</d5p1:KeyValueOfstringstring>
</ValidValueContent>
</ValidContent>
</FieldSchema>
</Fields>
</SchemaType>
Please do the needful. Thanks in advance.
Assuming you have your XML in a table inside an XML column like this:
DECLARE @XmlTable TABLE (ID INT NOT NULL, XMLDATA XML)
INSERT INTO @XmlTable VALUES(1, '<SchemaType xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/process">
<AwardYear>AY2013_14</AwardYear>
<Fields>
<FieldSchema>
<Comments>AAI: Adjusted Available Income</Comments>
<DbLocation>IsirData</DbLocation>
<FieldCode>AAI</FieldCode>
<FieldNumber>306</FieldNumber>
<ReportDisplay>Data</ReportDisplay>
<ValidContent>
<ValidValueContent xmlns:d5p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d5p1:KeyValueOfstringstring>
<d5p1:Key>Blank</d5p1:Key>
<d5p1:Value>None calculated</d5p1:Value>
</d5p1:KeyValueOfstringstring>
</ValidValueContent>
</ValidContent>
</FieldSchema>
</Fields>
</SchemaType>')
then you can use this T-SQL statement to fetch the values:
;WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/process',
'http://schemas.microsoft.com/2003/10/Serialization/Arrays' AS ns1)
SELECT
AwardYear = XmlData.value('(SchemaType/AwardYear)[1]', 'varchar(25)'),
Comments = XmlData.value('(SchemaType/Fields/FieldSchema/Comments)[1]', 'varchar(50)'),
FieldCode = XmlData.value('(SchemaType/Fields/FieldSchema/FieldCode)[1]', 'varchar(10)'),
FieldNumber = XmlData.value('(SchemaType/Fields/FieldSchema/FieldNumber)[1]', 'int'),
[Key] = XmlData.value('(SchemaType/Fields/FieldSchema/ValidContent/ValidValueContent/ns1:KeyValueOfstringstring/ns1:Key)[1]', 'varchar(10)'),
[Value] = XmlData.value('(SchemaType/Fields/FieldSchema/ValidContent/ValidValueContent/ns1:KeyValueOfstringstring/ns1:Value)[1]', 'varchar(50)')
FROM
@XmlTable
I defined the top-level XML namespace as the "default" namespace (that doesn't need to be referenced all over the place), and the second namespace deep inside your structure is defined explicitly with a separate XML namespace prefix.
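The same namespace declaration also applies to .nodes(). As a sketch that goes beyond the original answer: if a document holds several <FieldSchema> elements, the [1] paths above return only the first one, whereas shredding with .nodes() returns one row per element. The variable @doc and the second <FieldSchema> (code XYZ, number 307) are made up for illustration:

```sql
-- Shortened sample; the second <FieldSchema> is invented to show multiple rows.
DECLARE @doc XML = '<SchemaType xmlns="http://schemas.datacontract.org/process">
<AwardYear>AY2013_14</AwardYear>
<Fields>
<FieldSchema><FieldCode>AAI</FieldCode><FieldNumber>306</FieldNumber></FieldSchema>
<FieldSchema><FieldCode>XYZ</FieldCode><FieldNumber>307</FieldNumber></FieldSchema>
</Fields>
</SchemaType>';

-- The default namespace covers both the .nodes() path and the relative .value() paths.
WITH XMLNAMESPACES(DEFAULT 'http://schemas.datacontract.org/process')
SELECT
    AwardYear   = @doc.value('(SchemaType/AwardYear)[1]', 'varchar(25)'),
    FieldCode   = FS.value('(FieldCode)[1]', 'varchar(10)'),
    FieldNumber = FS.value('(FieldNumber)[1]', 'int')
FROM
    @doc.nodes('SchemaType/Fields/FieldSchema') AS XT(FS);
```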

Modify xml element name in SQL Server

How to change element name from Cust to Customer?
<Cust id="1">
<Name>aaaaaaaaaa</Name>
<Desc>bbbbbbbbbb</Desc>
</Cust>
When I'm using following statement
select @myXml.query('/node()[1]/node()') for xml raw('Customer')
sql removes attributes
<Customer>
<Name>aaaaaaaaaa</Name>
<Desc>bbbbbbbbbb</Desc>
</Customer>
Try this:
SELECT
@myXml.value('(/Cust/@id)[1]', 'int') AS '@id',
@myXml.query('/node()[1]/node()')
FOR XML PATH('Customer')
Gives me an output of:
<Customer id="1">
<Name>aaaaaaaaaa</Name>
<Desc>bbbbbbbbbb</Desc>
</Customer>
With the FOR XML PATH, you can fairly easily "restore" that attribute that gets lost in the conversion.
You could use replace:
replace(replace(@YourXml, '<Cust id', '<Customer id'), '</Cust>', '</Customer>')
This is fairly safe, as < is not valid as data in XML; inside data it would appear as &lt; or as a numeric (ASCII or Unicode) character reference.
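One caveat worth sketching: REPLACE works on character data, so a value held in an xml variable or column has to be cast to a string first and back to xml afterwards. The variable name and sample content below simply reuse the question's example:

```sql
DECLARE @YourXml XML = '<Cust id="1"><Name>aaaaaaaaaa</Name><Desc>bbbbbbbbbb</Desc></Cust>';

-- Cast to NVARCHAR(MAX) for the string replacement, then back to XML.
SELECT CAST(
           REPLACE(
               REPLACE(CAST(@YourXml AS NVARCHAR(MAX)), '<Cust id', '<Customer id'),
               '</Cust>', '</Customer>')
           AS XML) AS Renamed;
```

Casting the result back to XML also acts as a check that the renamed document is still well-formed.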

How do I select a top-level attribute of an XML column in SQL Server?

I have an XML column in SQL Server that is the equivalent of:
<Test foo="bar">
<Otherstuff baz="belch" />
</Test>
I want to get the value of the foo attribute of Test (the root element) as a varchar. My goal would be something along the lines of:
SELECT CAST('<Test foo="bar"><Otherstuff baz="belch" /></Test>' AS xml).value('@foo', 'varchar(20)') AS Foo
But when I run the above query, I get the following error:
Msg 2390, Level 16, State 1, Line 1
XQuery [value()]: Top-level attribute nodes are not supported
John Saunders has it almost right :-)
declare @Data XML
set @Data = '<Test foo="bar"><Otherstuff baz="belch" /></Test>'
select @Data.value('(/Test/@foo)[1]','varchar(20)') as Foo
This works for me (SQL Server 2005 and 2008)
Marc
If you don't know the root element:
select @Data.value('(/*/@foo)[1]','varchar(20)') as Foo
Why does .value('@foo', 'varchar(20)') generate the error “Top-level attribute nodes are not supported”?
When you query the xml data type, the context is the document node, which is an implicit node that contains the root element(s) of your XML document. The document node has no name and no attributes.
How can I get the value of an attribute on the root element?
In your XQuery expression, include the path to the first root element:
DECLARE @Data xml = '<Customer ID="123"><Order ID="ABC" /></Customer>'
SELECT @Data.value('Customer[1]/@ID', 'varchar(20)')
-- Result: 123
If you don’t know (or don’t want to specify) the name of the root element, then just use * to match any element:
SELECT @Data.value('*[1]/@ID', 'varchar(20)')
-- Result: 123
Because the query context is the document node, you don’t need to prefix the XQuery expression with a forward slash (as the other answers unnecessarily do).
Why do I have to include [1]?
The XQuery expression you pass to value() must be guaranteed to return a singleton. The expression Customer/@ID doesn’t satisfy this requirement because it matches both ID="123" and ID="456" in the following example:
DECLARE @Data xml = '<Customer ID="123" /><Customer ID="456" />'
Remember that the xml data type represents an XML document fragment, not an XML document, so it can contain multiple root elements.
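When every root element is wanted rather than just the first, one option (a sketch, not part of the original answer) is to shred the fragment with .nodes() so that value() sees a single <Customer> per row. The aliases T, C and CustomerId are arbitrary names chosen here:

```sql
DECLARE @Data xml = '<Customer ID="123" /><Customer ID="456" />';

-- .nodes() yields one row per root-level <Customer>; inside each row
-- the context node is that single element, so @ID is a singleton.
SELECT C.value('@ID', 'varchar(20)') AS CustomerId
FROM @Data.nodes('Customer') AS T(C);
-- Returns two rows, one for each ID (123 and 456)
```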
What’s the difference between Customer[1]/@ID and (Customer/@ID)[1]?
The expression Customer[1]/@ID retrieves the ID attribute of the first <Customer> element.
The expression (Customer/@ID)[1] retrieves the ID attribute of all <Customer> elements, and from this list of attributes, picks the first.
The following example demonstrates the difference:
DECLARE @Data xml = '<Customer /><Customer ID="123" /><Customer ID="456" />'
SELECT @Data.value('Customer[1]/@ID', 'varchar(20)')
-- Result: NULL (because the first Customer element doesn't have an ID attribute)
SELECT @Data.value('(Customer/@ID)[1]', 'varchar(20)')
-- Result: 123