I have a table with two fields of type NUMERIC and one field of type XML. Here is a rough sample:
CREATE TABLE books (
ID INT NOT NULL,
price NUMERIC(4,2),
discount NUMERIC(2,2),
book XML
);
The XML value will look something like, say,
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
</Store>
</book>
Now my question is: using xml.modify(), how can I add two elements under Store holding the price and discount, with values taken from books.price and books.discount?
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
<Price>value from books.price from the same row</Price>
<Discount>value from books.discount from the same row</Discount>
</Store>
</book>
This is a rough example, so please don't worry about where the XML data came from. Let's just say the book column already has the XML data present.
I know how to update the table with static values:
UPDATE books
SET book.modify('insert <Price>10.99</Price><Discount>20.00</Discount> after (/book/Store/Address)[1]')
Performance is not a consideration here.
It is not possible to do two modifications in one statement.
In this case you can work around the restriction by first combining both values into one XML fragment and then inserting that fragment in a single operation.
I use an updatable CTE to achieve this:
CREATE TABLE books (
ID INT NOT NULL,
price NUMERIC(4,2),
discount NUMERIC(2,2),
book XML
);
--Fill the table with data
INSERT INTO books VALUES(1,10.5,.5,
'<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
</Store>
</book>');
--This is the actual query
WITH CTE AS
(
SELECT *
,(SELECT price AS Price,discount AS Discount FOR XML PATH(''),TYPE) AS XmlNode
FROM books
)
UPDATE CTE SET book.modify('insert sql:column("XmlNode") after (/book/Store/Address)[1]');
--Check the result
SELECT *
FROM books;
--Clean-up (careful with real data!)
GO
--DROP TABLE books;
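If combining the values feels like too much machinery, the same result can be reached with two separate UPDATE statements, one modification each. The following is a minimal sketch, assuming SQL Server 2008 or later; the table variable @books is a hypothetical stand-in for the books table so the snippet can be run on its own.

```sql
-- Hypothetical stand-in for the books table from the question
DECLARE @books TABLE (ID INT NOT NULL, price NUMERIC(4,2), discount NUMERIC(2,2), book XML);

INSERT INTO @books VALUES (1, 10.5, .5,
N'<book>
  <title>Harry Potter</title>
  <author>J K Rowling</author>
  <Store>
    <Name>Burke and Burkins</Name>
    <Address>Some St, Somewhere, Some City</Address>
  </Store>
</book>');

-- First modification: append <Price> as the last child of <Store>,
-- taking its content from the price column of the same row
UPDATE @books
SET book.modify('insert <Price>{sql:column("price")}</Price> as last into (/book/Store)[1]');

-- Second modification: append <Discount> after it, from the discount column
UPDATE @books
SET book.modify('insert <Discount>{sql:column("discount")}</Discount> as last into (/book/Store)[1]');

-- Check the result
SELECT ID, book FROM @books;
```

This touches every row twice, which is acceptable here since performance is not a consideration.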
One hint
Your XML column, if it is really XML, will - for sure! - not contain an XML starting with <?xml version="1.0" encoding="UTF-8"?>. The internal encoding is always Unicode (UCS-2, which is almost UTF-16) and one cannot change this. If you pass in a declaration, it is either omitted or you'll get an error.
UPDATE
Another approach is to first read the XML's values and then rebuild it:
WITH CTE AS
(
SELECT *
,(SELECT b.value('title[1]','nvarchar(max)') AS [title]
,b.value('author[1]','nvarchar(max)') AS [author]
,b.value('(Store/Name)[1]','nvarchar(max)') AS [Store/Name]
,b.value('(Store/Address)[1]','nvarchar(max)') AS [Store/Address]
,price AS [Store/Price]
,discount AS [Store/Discount]
FROM book.nodes('book') AS A(b)
FOR XML PATH('book'),TYPE
) AS bookNew
FROM books
)
UPDATE CTE SET book=bookNew;
Related
I have a SQL table Products with 2 columns as below.
ID  ProductDetails
--  --------------
2   <XML>
3   <XML>
The XML column holds the following data:
<Products>
<product key="0" description="">Product1</product>
<product key="1" description="">Product2</product>
<product key="2" description="">Product3</product>
<product key="3" description="">Product4</product>
<product key="4" description="">Product5</product>
<product key="5" description="">Product6</product>
<product key="6" description="">Product7</product>
<product key="7" description="">Product8</product>
</Products>
How can I get the relevant node from the ProductDetails for ProductTitle?
For example: if the ID column has 3, I need to query the ProductDetails column and create a new column with just the ProductTitle to be Product3.
ID  ProductDetails  ProductTitle
--  --------------  ------------
5   <XML>           Product5
3   <XML>           Product3
Any help would be appreciated.
Please try the following.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT PRIMARY KEY, ProductDetails XML);
INSERT INTO @tbl (ID, ProductDetails) VALUES
(3, N'<Products>
<product key="0" description="">Product1</product>
<product key="1" description="">Product2</product>
<product key="2" description="">Product3</product>
<product key="3" description="">Product4</product>
<product key="4" description="">Product5</product>
<product key="5" description="">Product6</product>
<product key="6" description="">Product7</product>
<product key="7" description="">Product8</product>
</Products>');
-- DDL and sample data population, end
SELECT ID
, ProductDetails.value('(/Products/product[@key=sql:column("ID")]/text())[1]','VARCHAR(20)') AS ProductTitle
FROM @tbl;
Output
ID  ProductTitle
--  ------------
3   Product4
You need to use the XML .nodes() function to access and filter the "product" XML elements, and then the .value() function to extract the desired text value from that node. Both take xpath parameters to specify the data to be extracted.
Try
SELECT P.*, X.N.value('text()[1]', 'nvarchar(max)')
FROM @Products P
CROSS APPLY @ProductXml.nodes('/Products/product[@key=sql:column("P.ID")]') X(N)
or equivalently:
SELECT P.*, PN.ProductName
FROM @Products P
CROSS APPLY (
SELECT ProductName = X.N.value('text()[1]', 'nvarchar(max)')
FROM @ProductXml.nodes('/Products/product[@key=sql:column("P.ID")]') X(N)
) PN
or even
SELECT P.*, PN.ProductName
FROM @Products P
JOIN (
SELECT
ProductKey = X.N.value('@key', 'nvarchar(max)'),
ProductName = X.N.value('text()[1]', 'nvarchar(max)')
FROM @ProductXml.nodes('/Products/product') X(N)
) PN ON PN.ProductKey = P.ID
The subselect in the latter could also be used to build a temp table for more traditional query access.
In the xpath strings:
/Products/product selects all product nodes
[@key = ...] selects the key attribute to be used as a filter (equality)
sql:column("P.ID") allows a reference back to a column in the containing SQL
text() is a special selector that chooses the text within the XML element
[1] filters that down to at most a single value (a singleton)
nvarchar(max) defines the result datatype for the .value() function.
See this db<>fiddle.
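The queries above reference a product table and an XML variable without declaring them. A self-contained sketch of the CROSS APPLY variant might look like the following; the declarations of @Products and @ProductXml and their sample rows are assumptions made for illustration, not part of the original answer.

```sql
-- Hypothetical sample data: a list of IDs and one shared XML document
DECLARE @Products TABLE (ID INT PRIMARY KEY);
INSERT INTO @Products (ID) VALUES (3), (5);

DECLARE @ProductXml XML = N'<Products>
  <product key="2" description="">Product3</product>
  <product key="3" description="">Product4</product>
  <product key="4" description="">Product5</product>
  <product key="5" description="">Product6</product>
</Products>';

-- For each row in @Products, pick the <product> whose key attribute
-- equals that row's ID and return the element's text
SELECT P.ID,
       X.N.value('text()[1]', 'nvarchar(max)') AS ProductTitle
FROM @Products P
CROSS APPLY @ProductXml.nodes('/Products/product[@key=sql:column("P.ID")]') X(N);
```

Because CROSS APPLY drops rows without a match, an ID with no matching key produces no output row; OUTER APPLY would keep it with a NULL title.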
I am just new to reading in an XML file into a table through SQL Server Management Studio. There are probably better ways but I would like to use this approach.
Currently I am reading in a standard XML file of records on people. A <record> tag is the highest level of each row of data. I want to read all the records into separate rows into my SQL table.
I have gotten along fine so far using the following approach:
SELECT
-- Record
category, editor, entered, subcategory, uid, updated,
-- Person
first_name, last_name, ssn, ei, title, Position
FROM OPENXML(@hDoc, 'records/record/person/names')
WITH
(
-- Record
category [varchar](100) '../../@category',
editor [varchar](100) '../../@editor',
entered Datetime '../../@entered',
subcategory [varchar](100) '../../@subcategory',
uid BIGINT '../../@uid',
updated [varchar](100) '../../@updated',
-- Person
first_name [varchar](100) 'first_name',
last_name [varchar](100) 'last_name',
ssn [varchar](100) '../@ssn',
ei [varchar](100) '../@e-i',
title [varchar](100) '../title',
Position [varchar](100) '../position'
)
This approach has worked fine so far because the tag names have all been unique within each record/person. The issue is that within the <Person> tag I now have an <Aliases> tag that contains a list of more than one <Alias> test name </Alias> tag. If I use the above approach and reference '../aliases', I get all the Alias elements mixed together as one long string per row. If I try '../aliases/alias', only the first element is returned per record row. If there were 10 Alias elements within the Aliases tag, I would like 10 rows returned.
Is there a way to specify that when there are multiple tags of the same name within a higher level tag, return them all & not just one row?
The following is the example block within the XML I am referring to:
<aliases>
<alias>test 1</alias>
<alias>test 2</alias>
<alias>test 3</alias>
</aliases>
I would like the following in the SQL table:
Record Aliases
Record 1 test 1
Record 1 test 2
Record 1 test 3
Record 2 test 4
Record 2 test 5
but all I get is:
Record 1 test 1
Record 2 test 4
Apologies if I have not explained this correctly - any help would be greatly appreciated.
As Antonio indicated, I would use XQuery instead of Openxml. I've guessed at your source xml, and provided code for your 2 queries:
/* Please note I have used different cases than you to follow a standard (first letter capitalised for elements, lowercase for attributes).
As xml is case sensitive, you may need to change the below code to suit the case of your data */
/* Bring your xml into an xml variable */
DECLARE @xml xml = '
<Records>
<Record category="category1" editor="editor1" entered="2015-01-01" subcategory="subcategory1" uid="100001" updated="updated1">
<Person ssn="ssn1" e-i="ei1">
<Title>Title1</Title>
<Position>Position1</Position>
<Names>
<First_name>FirstName1</First_name>
<Last_name>LastName1</Last_name>
<Aliases>
<Alias>test1</Alias>
<Alias>test2</Alias>
<Alias>test3</Alias>
</Aliases>
</Names>
</Person>
</Record>
<Record category="category2" editor="editor2" entered="2015-01-02" subcategory="subcategory2" uid="100002" updated="updated2">
<Person ssn="ssn2" e-i="ei2">
<Title>Title2</Title>
<Position>Position2</Position>
<Names>
<First_name>FirstName2</First_name>
<Last_name>LastName2</Last_name>
<Aliases>
<Alias>test4</Alias>
<Alias>test5</Alias>
</Aliases>
</Names>
</Person>
</Record>
</Records>'
/* The unary relationship of 1 element/attribute type per record in xpath */
SELECT T.rows.value('@category[1]', '[varchar](100)') AS Category
,T.rows.value('@editor[1]', '[varchar](100)') AS Editor
,T.rows.value('@entered[1]', '[Datetime]') AS Entered
,T.rows.value('@subcategory[1]', '[varchar](100)') AS Subcategory
,T.rows.value('@uid[1]', '[bigint]') AS [UID]
,T.rows.value('@updated[1]', '[varchar](100)') AS Updated
,T.rows.value('(Person/Names/First_name)[1]', '[varchar](100)') AS First_Name
,T.rows.value('(Person/Names/Last_name)[1]', '[varchar](100)') AS Last_Name
,T.rows.value('(Person/@ssn)[1]', '[varchar](100)') AS SSN
,T.rows.value('(Person/@e-i)[1]', '[varchar](100)') AS ei
,T.rows.value('(Person/Title)[1]', '[varchar](100)') AS Title
,T.rows.value('(Person/Position)[1]', '[varchar](100)') AS Position
FROM @xml.nodes('/Records/Record') T(rows)
/* Record to alias, one-to-many mapping */
SELECT T.rows.value('../../../../@category[1]', '[varchar](100)') AS Category
,T.rows.value('.', '[varchar](100)') AS Alias
FROM @xml.nodes('/Records/Record/Person/Names/Aliases/Alias') T(rows)
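The one-to-many query above climbs back up from each Alias with a chain of parent steps. An alternative sketch, assuming the same guessed element names, shreds the records first and then applies a second .nodes() call per record, so no parent-axis navigation is needed. The trimmed-down sample XML and the aliases R(rec) and A(al) are illustrative only.

```sql
-- Reduced sample in the same shape as the guessed source XML
DECLARE @xml XML = N'<Records>
  <Record uid="100001">
    <Person><Names><Aliases>
      <Alias>test1</Alias><Alias>test2</Alias><Alias>test3</Alias>
    </Aliases></Names></Person>
  </Record>
  <Record uid="100002">
    <Person><Names><Aliases>
      <Alias>test4</Alias><Alias>test5</Alias>
    </Aliases></Names></Person>
  </Record>
</Records>';

-- One row per Record, multiplied by one row per Alias inside that Record
SELECT R.rec.value('@uid', 'bigint')      AS RecordUid,
       A.al.value('.', 'varchar(100)')    AS Alias
FROM @xml.nodes('/Records/Record') AS R(rec)
CROSS APPLY R.rec.nodes('Person/Names/Aliases/Alias') AS A(al);
```

Switching CROSS APPLY to OUTER APPLY would also keep records that have no aliases at all.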
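Try modifying the FROM clause so that the row pattern targets the repeating <alias> element; OPENXML returns one row per node matched by the row pattern. The column patterns then have to be written relative to <alias>:
SELECT
...
FROM OPENXML(@hDoc, 'records/record/person/aliases/alias')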
When I work with XML files I do the following to split the nodes into columns:
XML file sitting in a column in a table with the following format:
<CustomData>
<Name>Shaun</Name>
<Surname>Johnson</Surname>
<Title>Mr</Title>
<Age>29</Age>
</CustomData>
The Select statement I use:
SELECT
[ColumnName].value('(/CustomData//Name/node())[1]' , 'nvarchar(max)') AS Name,
[ColumnName].value('(/CustomData//Surname/node())[1]','nvarchar(max)') AS Surname,
[ColumnName].value('(/CustomData//Title/node())[1]' , 'nvarchar(max)') AS Title,
[ColumnName].value('(/CustomData//Age/node())[1]' , 'nvarchar(max)') AS Age
FROM TableName
Results:
Name Surname Title Age
---- ------- ----- ---
Shaun Johnson Mr 29
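Each OPENXML column pattern returns a single scalar value per row. To get one row per repeating element, use the .nodes() method of the xml data type, joined to your OPENXML rows if you want to keep that part.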
I have a task that requires me to pull in a set of xml files, which are all related, then pick out a subset of records from these files, transform some columns, and export to a single xml file using a different format.
I'm new to SSIS, and so far I've managed to first import two of the xml files (for simplicity, starting with just two files).
The first file we can call "Item". It contains some basic metadata, including an ID that is used to identify related records in the second file, "Milestones". I filter my "valid records" using a lookup transformation in my dataflow, so now I have the valid Item IDs to fetch the records I need. I funnel these valid IDs (along with the rest of the columns from Item.xml) through a Sort, then into a merge join.
The second file is structured with 2 outputs: one containing two columns (ItemID and RowID), the other containing all of the Milestone related data plus a RowID. I put these through an inner merge join, based on RowID, so I have the ItemID in with the milestone data. Then I do a full outer merge join on both files, using ItemID.
This gives me data sort of like this:
ItemID[1] - MilestoneData[2]
ItemID[1] - MilestoneData[3]
ItemID[1] - MilestoneData[4]
ItemID[1] - MilestoneData[5]
ItemID[2] - MilestoneData[6]
ItemID[2] - MilestoneData[7]
ItemID[2] - MilestoneData[8]
I can put this data through derived column transformations to create the columns of data I actually need, but I can't see how to structure this in a relational way/normalize it into a different xml format.
The idea is to output something like:
<item id="1">
<Milestone id="2">
</Milestone>
<Milestone id="3">
</Milestone>
</item>
Can anyone point me in the right direction?
UPDATE:
A bit more detailed picture of what I have, and what I'd like to achieve:
Item.xml:
<Items>
<Item ItemID="1">
<Title>
Data
</Title>
</Item>
<Item ItemID="2">
...
</Item>
...
</Items>
Milestone.xml:
<Milestones>
<Item ItemID="2">
<MS id="3">
<MS_DATA>
Data
</MS_DATA>
</MS>
<MS id="4">
<MS_DATA>
Data
</MS_DATA>
</MS>
</Item>
<Item ItemID="3">
<MS id="5">
<MS_DATA>
Data
</MS_DATA>
</MS>
</Item>
</Milestones>
The way it's presented in SSIS when I use an XML Source is not entirely intuitive: the Item rows and the MS rows are two separate outputs. I had to run these through a join in order to get the Milestones that correspond to specific Items. No problem there. I then run the result through a full outer join with the items, so I get a flattened table with multiple rows containing the same data for an Item and different data for each MS. Basically I get what I tried to show in my table: lots of redundant Item data for each unique MilestoneData.
In the end it has to look similar to:
<NewItems>
<myNewItem ItemID="2">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="3">
data
</MS_DATA_TRANSFORMED>
<MS_DATA_TRANSFORMED MSID="4">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
<myNewItem ItemID="3">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="5">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
<myNewItem ItemID="4">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones></MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
</NewItems>
I would try to handle this using a Script Component of type Transformation. As you are new to SSIS, I assume you haven't used one before. Basically, you:
define the input columns your component expects (e.g. a column input_xml containing ItemID[1] - MilestoneData[2];...)
use C# to write the logic that cuts the rows apart and sticks them back together
define the output columns your component uses to deliver the transformed row
You will face the problem that one row will probably be used twice in the end; for example,
ItemID[1] - MilestoneData[2]
will result in
<item id="1">
<Milestone id="2">
I have done something pretty similar using Pentaho Kettle, even without using something like a script component in which you define your own logic. But I guess SSIS lacks suitable tasks here.
How about importing the XML into relational tables (e.g. in tempdb) and then using FOR XML PATH to reconstruct the XML? FOR XML PATH offers a high degree of control over how you want the XML to look. A very simple example below:
CREATE TABLE #items ( itemId INT PRIMARY KEY, title VARCHAR(50) NULL )
CREATE TABLE #milestones ( itemId INT, msId INT, msData VARCHAR(50) NOT NULL, PRIMARY KEY ( itemId, msId ) )
GO
DECLARE @itemsXML XML
SELECT @itemsXML = x.y
FROM OPENROWSET( BULK 'c:\temp\items.xml', SINGLE_CLOB ) x(y)
INSERT INTO #items ( itemId, title )
SELECT
i.c.value('@ItemID', 'INT' ),
i.c.value('(Title/text())[1]', 'VARCHAR(50)' )
FROM @itemsXML.nodes('Items/Item') i(c)
GO
DECLARE @milestoneXML XML
SELECT @milestoneXML = x.y
FROM OPENROWSET( BULK 'c:\temp\milestone.xml', SINGLE_CLOB ) x(y)
INSERT INTO #milestones ( itemId, msId, msData )
SELECT
i.c.value('@ItemID', 'INT' ),
i.c.value('(MS/@id)[1]', 'VARCHAR(50)' ) msId,
i.c.value('(MS/MS_DATA/text())[1]', 'VARCHAR(50)' ) msData
FROM @milestoneXML.nodes('Milestones/Item') i(c)
GO
SELECT
i.itemId AS "@ItemID"
FROM #items i
INNER JOIN #milestones ms ON i.itemId = ms.itemId
FOR XML PATH('myNewItem'), ROOT('NewItems'), TYPE
DROP TABLE #items
DROP TABLE #milestones
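The final SELECT in the simple example emits only the ItemID attribute. To get milestones nested under each item, as in the desired output, one option is a correlated subquery with its own FOR XML PATH ... TYPE. The following is a sketch under assumptions: the table variables @items and @milestones and their rows are hypothetical stand-ins for the temp tables, and the element names are borrowed from the question's target format.

```sql
-- Hypothetical stand-ins for #items and #milestones, with sample rows
DECLARE @items TABLE (itemId INT PRIMARY KEY, title VARCHAR(50) NULL);
DECLARE @milestones TABLE (itemId INT, msId INT, msData VARCHAR(50) NOT NULL,
                           PRIMARY KEY (itemId, msId));

INSERT INTO @items VALUES (2, 'Title 2'), (3, 'Title 3'), (4, 'Title 4');
INSERT INTO @milestones VALUES (2, 3, 'data'), (2, 4, 'data'), (3, 5, 'data');

SELECT
    i.itemId AS [@ItemID],                    -- attribute on <myNewItem>
    i.title  AS [SomeDataDirectlyFromItem],   -- child element
    (
        -- One <MS_DATA_TRANSFORMED MSID="..."> element per milestone of this item
        SELECT ms.msId   AS [@MSID],
               ms.msData AS [text()]
        FROM @milestones ms
        WHERE ms.itemId = i.itemId
        ORDER BY ms.msId
        FOR XML PATH('MS_DATA_TRANSFORMED'), TYPE
    ) AS [DataConstructedFromMultipleColumnsInItem/MyMilestones]
FROM @items i
ORDER BY i.itemId
FOR XML PATH('myNewItem'), ROOT('NewItems'), TYPE;
```

For an item without milestones the subquery yields NULL, so the wrapper elements are left out for that item rather than emitted empty; if an empty MyMilestones element is required, that case needs separate handling.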
I am using InfoPath 2007 to send out a survey (it is not connected to SharePoint or a DataBase). The file I will get back is an XML file. Every place there is an answer block, it has its own unique id (aka field name).
Now, I have a SQL Server Database (2007?) with a table "Responses". Its columns are: AnswerID (unique PK), QuestionID (FK, which is the unique id, i.e. the field name), and Answer. The QuestionID column is already populated with the unique ids (field names). There are more than 300 records for QuestionID.
What I need to be able to do is reach into the XML file, find the QuestionID (field name), grab the data for that field name, and then put the data into the DB column "Answer" that matches the field name in the QuestionID column.
Is there an easy/medium way to do this mapping/updating with the least amount of chance of error?
NOTE: I tried to use the DB import XML data wizard, the information breaks out into an unmanageable number of tables.
You can shred the XML into rows and columns and then use that to update your table.
Here is a little example of what you can do.
create table Responses(QuestionID varchar(10), Answer varchar(10))
insert into Responses values('Q1', null)
insert into Responses values('Q2', null)
insert into Responses values('Q3', null)
declare @xml xml
set @xml =
'<root>
<question ID="Q1">Answer1</question>
<question ID="Q2">Answer2</question>
<question ID="Q3">Answer3</question>
</root>'
;with cte as
(
select
r.n.value('@ID', 'varchar(10)') as QuestionID,
r.n.value('.', 'varchar(10)') as Answer
from @xml.nodes('/root/*') as r(n)
)
update R
set Answer = C.Answer
from Responses as R
inner join cte as C
on R.QuestionID = C.QuestionID
select *
from Responses
Result:
QuestionID Answer
---------- ----------
Q1 Answer1
Q2 Answer2
Q3 Answer3
The XML I used most certainly does not look anything like what you have but it should give you a hint of what you can do. If you post a sample of your XML file, table structure and expected result/output you can probably get a more precise answer.
Edit
declare @xml xml =
'<?xml version="1.0" encoding="UTF-8"?>
<my:myFields xmlns:my="xx.com" xml:lang="en-us">
<my:group1>
<my:group2>
<my:field1>Im an analyst.</my:field1>
<my:group3>
<my:group4>
<my:field2>1</my:field2>
<my:field3>I click the mouse.</my:field3>
</my:group4>
<my:group4>
<my:field2>2</my:field2>
<my:field3>I type on the keyboard.</my:field3>
</my:group4>
</my:group3>
</my:group2>
<my:group2>
<my:field1>Im a stay at home mom.</my:field1>
<my:group3>
<my:group4>
<my:field2>1</my:field2>
<my:field3>I Cook.</my:field3>
</my:group4>
<my:group4>
<my:field2>2</my:field2>
<my:field3>I clean.</my:field3>
</my:group4>
</my:group3>
</my:group2>
</my:group1>
</my:myFields>'
;with xmlnamespaces('xx.com' as my)
select
T.N.value('../../my:field1[1]', 'varchar(50)') as Field1,
T.N.value('my:field2[1]', 'varchar(50)') as Field2,
T.N.value('my:field3[1]', 'varchar(50)') as Field3
from @xml.nodes('my:myFields/my:group1/my:group2/my:group3/my:group4') as T(N)
Result:
Field1                  Field2  Field3
----------------------  ------  -----------------------
Im an analyst.          1       I click the mouse.
Im an analyst.          2       I type on the keyboard.
Im a stay at home mom.  1       I Cook.
Im a stay at home mom.  2       I clean.
I am using a CTE to recurse data I have stored in a recursive table. The trouble is I am trying to figure out how I can use "FOR XML" to build the desired xml output. I have a Table of Contents table I am recursing and I want to be able to use that data to generate the XML.
Here is an example of what the data is similar to:
ID|TOC_ID|TOC_SECTION|TOC_DESCRIPTON|PARENT_ID
1|I|Chapter|My Test Chapter|-1
2|A|Section|My Test Section|1
3|1|SubSection|My SubSection|2
I want to be able to spit out the data like so:
XML Attributes:
ID = Appended values from the TOC_ID field
value = value from TOC_Section field
<FilterData>
<Filter id="I" value="Chapter">
<Description>My Test Chapter</Description>
<Filter id="I_A" value="Section">
<Description>My Test Section</Description>
<Filter id="I_A_1" value="SubSection">
<Description>My Test SubSection</Description>
</Filter>
</Filter>
</Filter>
</FilterData>
Not sure how I can take the CTE data and produce a similar format to the above. When the data is in separate tables it isn't too difficult to build this type of output.
As always appreciate the input.
Thanks,
S
You may get some mileage from Recursive Hierarchies to XML in Christian Wade's blog - it all looks mighty painful to me!
Check this out Will (Not sure you are still following)....this does have a 32 level max, but that should still work fine for my stuff...can't see going deeper than that. Found this on another forum:
CREATE TABLE tree ( id INT, name VARCHAR(5), parent_id INT )
GO
INSERT INTO tree VALUES ( 1, 'N1', NULL )
INSERT INTO tree VALUES ( 3, 'N4', 1 )
INSERT INTO tree VALUES ( 4, 'N10', 3 )
INSERT INTO tree VALUES ( 5, 'N7', 3 )
GO
CREATE FUNCTION dbo.treeList(@parent_id int)
RETURNS XML
WITH RETURNS NULL ON NULL INPUT
BEGIN RETURN
(SELECT id as "@id", name as "@name",
CASE WHEN parent_id=@parent_id
THEN dbo.treeList(id)
END
FROM dbo.tree WHERE parent_id=@parent_id
FOR XML PATH('tree'), TYPE)
END
GO
SELECT id AS "@id", name AS "@name",
CASE WHEN id=1
THEN dbo.treeList(id)
END
FROM tree
WHERE id=1
FOR XML PATH('tree'), TYPE
Now isn't that nice and simple?
Customised from the great example over on http://msdn.microsoft.com/en-us/library/ms345137.aspx
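The same recursive-function pattern could be adapted to the table of contents data from the question, carrying the accumulated id prefix down as a second parameter. This is only a sketch: the table name dbo.toc, the function name dbo.tocFilters, the column types, and the spelled-out column name TOC_DESCRIPTION are assumptions, and the 32-level nesting limit mentioned above still applies.

```sql
-- Hypothetical table mirroring the sample data in the question
CREATE TABLE dbo.toc (
    ID INT PRIMARY KEY,
    TOC_ID VARCHAR(10),
    TOC_SECTION VARCHAR(50),
    TOC_DESCRIPTION VARCHAR(100),
    PARENT_ID INT
);
GO
INSERT INTO dbo.toc VALUES (1, 'I', 'Chapter', 'My Test Chapter', -1);
INSERT INTO dbo.toc VALUES (2, 'A', 'Section', 'My Test Section', 1);
INSERT INTO dbo.toc VALUES (3, '1', 'SubSection', 'My SubSection', 2);
GO
-- Returns the <Filter> elements for all children of @parent_id;
-- @prefix holds the ids of the ancestors, already joined with '_'
CREATE FUNCTION dbo.tocFilters(@parent_id INT, @prefix VARCHAR(200))
RETURNS XML
AS
BEGIN
    RETURN (
        SELECT @prefix + TOC_ID AS [@id],
               TOC_SECTION      AS [@value],
               TOC_DESCRIPTION  AS [Description],
               -- Same CASE guard as in the treeList example above
               CASE WHEN PARENT_ID = @parent_id
                    THEN dbo.tocFilters(ID, @prefix + TOC_ID + '_')
               END
        FROM dbo.toc
        WHERE PARENT_ID = @parent_id
        ORDER BY ID
        FOR XML PATH('Filter'), TYPE
    );
END
GO
-- Start at the root marker (-1 in the sample data) with an empty prefix
SELECT dbo.tocFilters(-1, '')
FOR XML PATH('FilterData'), TYPE;
```

A leaf row has no children, so the inner call returns NULL and nothing is nested under that Filter element.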