I am new to reading an XML file into a table through SQL Server Management Studio. There are probably better ways, but I would like to use this approach.
Currently I am reading in a standard XML file of records on people. A <record> tag is the highest level of each row of data. I want to read each record into a separate row of my SQL table.
I have gotten along fine so far using the following approach:
SELECT
-- Record
category, editor, entered, subcategory, uid, updated,
-- Person
first_name, last_name, ssn, ei, title, position
FROM OPENXML(@hDoc, 'records/record/person/names')
WITH
(
-- Record
category [varchar](100) '../../@category',
editor [varchar](100) '../../@editor',
entered Datetime '../../@entered',
subcategory [varchar](100) '../../@subcategory',
uid BIGINT '../../@uid',
updated [varchar](100) '../../@updated',
-- Person
first_name [varchar](100) 'first_name',
last_name [varchar](100) 'last_name',
ssn [varchar](100) '../@ssn',
ei [varchar](100) '../@e-i',
title [varchar](100) '../title',
position [varchar](100) '../position'
)
This approach has worked fine so far because the tag names have all been unique within each record/person. The issue is that within the <Person> tag I now have an <Aliases> tag that contains a list of more than one <Alias>test name</Alias> tag. If I use the above approach and reference '../aliases', I get all the Alias elements mixed together as one long string in a single row. If I try '../aliases/alias', only the first element is returned per record row. If there were 10 Alias elements within the Aliases tag, I would like 10 rows returned.
Is there a way to specify that when there are multiple tags of the same name within a higher-level tag, all of them are returned and not just one row?
The following is the example block within the XML I am referring to:
<aliases>
<alias>test 1</alias>
<alias>test 2</alias>
<alias>test 3</alias>
</aliases>
I would like the following in the SQL table:
Record Aliases
Record 1 test 1
Record 1 test 2
Record 1 test 3
Record 2 test 4
Record 2 test 5
but all I get is:
Record 1 test 1
Record 2 test 4
Apologies if I have not explained this correctly - any help would be greatly appreciated.
As Antonio indicated, I would use XQuery instead of OPENXML. I've guessed at your source XML and provided code for your two queries:
/* Please note I have used different cases than you to follow a standard (first letter capitalised for elements, lowercase for attributes).
As XML is case sensitive, you may need to change the code below to suit the case of your data */
/* Bring your xml into an xml variable */
DECLARE @xml xml = '
<Records>
<Record category="category1" editor="editor1" entered="2015-01-01" subcategory="subcategory1" uid="100001" updated="updated1">
<Person ssn="ssn1" e-i="ei1">
<Title>Title1</Title>
<Position>Position1</Position>
<Names>
<First_name>FirstName1</First_name>
<Last_name>LastName1</Last_name>
<Aliases>
<Alias>test1</Alias>
<Alias>test2</Alias>
<Alias>test3</Alias>
</Aliases>
</Names>
</Person>
</Record>
<Record category="category2" editor="editor2" entered="2015-01-02" subcategory="subcategory2" uid="100002" updated="updated2">
<Person ssn="ssn2" e-i="ei2">
<Title>Title2</Title>
<Position>Position2</Position>
<Names>
<First_name>FirstName2</First_name>
<Last_name>LastName2</Last_name>
<Aliases>
<Alias>test4</Alias>
<Alias>test5</Alias>
</Aliases>
</Names>
</Person>
</Record>
</Records>'
/* The unary relationship of 1 element/attribute type per record in xpath */
SELECT T.rows.value('@category[1]', '[varchar](100)') AS Category
,T.rows.value('@editor[1]', '[varchar](100)') AS Editor
,T.rows.value('@entered[1]', '[Datetime]') AS Entered
,T.rows.value('@subcategory[1]', '[varchar](100)') AS Subcategory
,T.rows.value('@uid[1]', '[bigint]') AS [UID]
,T.rows.value('@updated[1]', '[varchar](100)') AS Updated
,T.rows.value('(Person/Names/First_name)[1]', '[varchar](100)') AS First_Name
,T.rows.value('(Person/Names/Last_name)[1]', '[varchar](100)') AS Last_Name
,T.rows.value('(Person/@ssn)[1]', '[varchar](100)') AS SSN
,T.rows.value('(Person/@e-i)[1]', '[varchar](100)') AS ei
,T.rows.value('(Person/Title)[1]', '[varchar](100)') AS Title
,T.rows.value('(Person/Position)[1]', '[varchar](100)') AS Position
FROM @xml.nodes('/Records/Record') T(rows)
/* Record to alias, one-to-many mapping */
SELECT T.rows.value('../../../../@category[1]', '[varchar](100)') AS Category
,T.rows.value('.', '[varchar](100)') AS Alias
FROM @xml.nodes('/Records/Record/Person/Names/Aliases/Alias') T(rows)
Try modifying the FROM clause as follows:
SELECT
...
FROM OPENXML(@hDoc, 'records/record/person/aliases')
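With OPENXML, the row pattern decides how many rows come back, so one option is to end the pattern at the repeated <alias> element itself and climb back up for the record-level values. The following is a minimal sketch under assumptions: the sample document, the variable names and the position of <aliases> directly under <person> are guesses, so the paths may need adjusting to the real file.

```sql
-- Hypothetical sample document; element names and nesting are assumed
DECLARE @doc nvarchar(max) = N'
<records>
  <record uid="1">
    <person>
      <aliases><alias>test 1</alias><alias>test 2</alias><alias>test 3</alias></aliases>
    </person>
  </record>
  <record uid="2">
    <person>
      <aliases><alias>test 4</alias><alias>test 5</alias></aliases>
    </person>
  </record>
</records>';

DECLARE @hDoc int;
EXEC sp_xml_preparedocument @hDoc OUTPUT, @doc;

-- The row pattern ends at <alias>, so OPENXML emits one row per alias
SELECT uid, alias
FROM OPENXML(@hDoc, '/records/record/person/aliases/alias', 2)
WITH
(
    uid   bigint       '../../../@uid',  -- alias -> aliases -> person -> record
    alias varchar(100) '.'               -- text of the current <alias>
);

-- Release the parsed document when done
EXEC sp_xml_removedocument @hDoc;
```

Record-level columns repeat on every alias row, which matches the "Record 1 / test 1, test 2, test 3" layout asked for in the question.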
When I work with XML files I do the following to split the nodes into columns:
The XML sits in a column of a table, in the following format:
<CustomData>
<Name>Shaun</Name>
<Surname>Johnson</Surname>
<Title>Mr</Title>
<Age>29</Age>
</CustomData>
The Select statement I use:
SELECT
[ColumnName].value('(/CustomData//Name/node())[1]' , 'nvarchar(max)') AS Name,
[ColumnName].value('(/CustomData//Surname/node())[1]','nvarchar(max)') AS Surname,
[ColumnName].value('(/CustomData//Title/node())[1]' , 'nvarchar(max)') AS Title,
[ColumnName].value('(/CustomData//Age/node())[1]' , 'nvarchar(max)') AS Age
FROM TableName
Results:
Name Surname Title Age
---- ------- ----- ---
Shaun Johnson Mr 29
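Each column pattern in the OPENXML WITH clause must return a single scalar value per row, so it cannot expand repeated elements. To get one row per repeated element, use the .nodes() method of the xml type alongside your OPENXML query.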
I have a table with two fields of type NUMERIC and one field of type XML. Here is a rough sample:
CREATE TABLE books (
ID INT NOT NULL,
price NUMERIC(4,2),
discount NUMERIC(2,2),
book XML
);
The XML value will look something like, say,
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
</Store>
</book>
Now my question is: using xml.modify(), how can I add two elements under Store that hold the price and discount, with values taken from books.price and books.discount?
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
<Price>value from books.price from the same row</Price>
<Discount>value from books.discount from the same row</Discount>
</Store>
</book>
This is a rough example, so please don't worry about where the XML data came from. Let's just say the book column has the XML data already present.
I know how to update the table with static values:
UPDATE books
SET book.modify('insert <Price>10.99</Price><Discount>20.00</Discount> after (/book/Store/Address)[1]')
Performance is not a consideration here.
It is not possible to apply two modifications to the same XML column in one statement.
In this case you can work around that by first combining both values into one XML fragment and then inserting them at once.
I use an updatable CTE to achieve this:
CREATE TABLE books (
ID INT NOT NULL,
price NUMERIC(4,2),
discount NUMERIC(2,2),
book XML
);
--Fill the table with data
INSERT INTO books VALUES(1,10.5,.5,
'<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
</Store>
</book>');
--This is the actual query
WITH CTE AS
(
SELECT *
,(SELECT price AS Price,discount AS Discount FOR XML PATH(''),TYPE) AS XmlNode
FROM books
)
UPDATE CTE SET book.modify('insert sql:column("XmlNode") after (/book/Store/Address)[1]');
--Check the result
SELECT *
FROM books;
--Clean-up (careful with real data!)
GO
--DROP TABLE books;
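If combining the values feels too indirect, an alternative is to run two separate UPDATE statements, each inserting one element built from a column value with sql:column(). This is a minimal sketch, assuming a table variable named @books that mirrors the books table above. It is meant instead of the CTE update, not in addition to it, since running both would insert the elements twice.

```sql
-- Hypothetical stand-in for the books table, so the sketch is self-contained
DECLARE @books TABLE (ID int NOT NULL, price numeric(4,2), discount numeric(2,2), book xml);

INSERT INTO @books VALUES (1, 10.5, .5,
N'<book>
<title>Harry Potter</title>
<author>J K Rowling</author>
<Store>
<Name>Burke and Burkins</Name>
<Address>Some St, Somewhere, Some City</Address>
</Store>
</book>');

-- First statement: append <Price> as the last child of <Store>
UPDATE @books
SET book.modify('insert <Price>{sql:column("price")}</Price> as last into (/book/Store)[1]');

-- Second statement: append <Discount> after it
UPDATE @books
SET book.modify('insert <Discount>{sql:column("discount")}</Discount> as last into (/book/Store)[1]');

-- Check the result
SELECT ID, book FROM @books;
```

Two statements mean two passes over the table, which should not matter here because performance is stated not to be a consideration.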
One hint
Your XML column, if it is really XML, will - for sure! - not contain an XML starting with <?xml version="1.0" encoding="UTF-8"?>. The internal encoding is always Unicode (UCS-2, which is almost UTF-16) and one cannot change this. If you pass in a declaration, it is either omitted or you'll get an error.
UPDATE
Another approach would be to first read the XML's values and then rebuild it:
WITH CTE AS
(
SELECT *
,(SELECT b.value('title[1]','nvarchar(max)') AS [title]
,b.value('author[1]','nvarchar(max)') AS [author]
,b.value('(Store/Name)[1]','nvarchar(max)') AS [Store/Name]
,b.value('(Store/Address)[1]','nvarchar(max)') AS [Store/Address]
,price AS [Store/Price]
,discount AS [Store/Discount]
FROM book.nodes('book') AS A(b)
FOR XML PATH('book'),TYPE
) AS bookNew
FROM books
)
UPDATE CTE SET book=bookNew;
Consider this simple table: id (int), name (varchar), customFields (xml).
customFields contains an XML representation of a C# Dictionary, e.g.:
<dictionary>
<item>
<key><int>1</int></key>
<value><string>Audi</string></value>
</item>
<item>
<key><int>2</int></key>
<value><string>Red</string></value>
</item>
</dictionary>
How do I select all rows of my table with 3 columns: id, name, and one column holding the content of the string tag of the item whose int tag equals 1?
The closest to the result I managed to get is:
SELECT id, name, C.value('(value/string/text())[1]', 'varchar(max)')
FROM myTable
OUTER APPLY customFields.nodes('/dictionary/item') N(C)
WHERE (customFields IS NULL OR C.value('(key/int/text())[1]', 'int') = 1)
The problem is that if the XML doesn't have an int tag equal to 1, the row is not returned at all (I would still like to get a row with id, name and NULL in the 3rd column).
I've created a table the same as yours and this query worked fine:
select id, name,
(select C.value('(value/string/text())[1]','varchar(MAX)')
from xmlTable inr outer apply
customField.nodes('/dictionary/item') N(C)
where
C.value('(key/int/text())[1]','int') = 1
and inr.id = ou.id) as xmlVal
from xmlTable ou
Here is my result:
Your query didn't work because OUTER APPLY first produces one row per item for every row of "myTable", and the WHERE clause then filters those rows. NULL therefore appears only for rows whose XML field is empty; for every other row (which contains some XML value), "value/string/text()" is returned once per item, and a row disappears completely when none of its items passes the filter. This is what your query returns without the WHERE clause:
id,name,xml
1 lol NULL
2 rotfl Audi
2 rotfl Red
3 troll Red
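Another way to keep every row is to move the key test into the XPath expression itself, so that no WHERE clause filters rows afterwards. This is a minimal sketch, assuming a table variable @myTable filled with sample data modelled on the rows above; the text comparison with "1" assumes the key is stored without surrounding whitespace.

```sql
-- Hypothetical sample data mirroring the rows shown above
DECLARE @myTable TABLE (id int, name varchar(50), customFields xml);

INSERT INTO @myTable VALUES
(1, 'lol', NULL),
(2, 'rotfl', N'<dictionary>
  <item><key><int>1</int></key><value><string>Audi</string></value></item>
  <item><key><int>2</int></key><value><string>Red</string></value></item>
</dictionary>'),
(3, 'troll', N'<dictionary>
  <item><key><int>2</int></key><value><string>Red</string></value></item>
</dictionary>');

-- The predicate [key/int/text() = "1"] picks the matching item inside the XML;
-- value() returns NULL when no item matches or when the column is NULL
SELECT id, name,
       customFields.value(
           '(/dictionary/item[key/int/text() = "1"]/value/string/text())[1]',
           'varchar(max)') AS xmlVal
FROM @myTable;
```

Each table row appears exactly once, and rows without an item keyed 1 get NULL in the third column.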
I have a task that requires me to pull in a set of XML files, which are all related, then pick out a subset of records from these files, transform some columns, and export to a single XML file using a different format.
I'm new to SSIS, and so far I've managed to import two of the XML files (for simplicity, I am starting with just two files).
The first file, which we can call "Item", contains some basic metadata, including an ID that is used to identify related records in the second file, "Milestones". I filter my "valid records" using a lookup transformation in my data flow, so I now have the valid Item IDs to fetch the records I need. I funnel these valid IDs (along with the rest of the columns from Item.xml) through a Sort, then into a merge join.
The second file is structured with two outputs: one containing two columns (ItemID and RowID), the other containing all of the Milestone-related data plus a RowID. I put these through an inner merge join based on RowID, so I have the ItemID together with the milestone data. Then I do a full outer merge join on both files, using ItemID.
This gives me data sort of like this:
ItemID[1] - MilestoneData[2]
ItemID[1] - MilestoneData[3]
ItemID[1] - MilestoneData[4]
ItemID[1] - MilestoneData[5]
ItemID[2] - MilestoneData[6]
ItemID[2] - MilestoneData[7]
ItemID[2] - MilestoneData[8]
I can put this data through derived column transformations to create the columns of data I actually need, but I can't see how to structure this in a relational way/normalize it into a different xml format.
The idea is to output something like:
<item id="1">
<Milestone id="2"></Milestone>
<Milestone id="3"></Milestone>
</item>
Can anyone point me in the right direction?
UPDATE:
A bit more detailed picture of what I have, and what I'd like to achieve:
Item.xml:
<Items>
<Item ItemID="1">
<Title>
Data
</Title>
</Item>
<Item ItemID="2">
...
</Item>
...
</Items>
Milestone.xml:
<Milestones>
<Item ItemID="2">
<MS id="3">
<MS_DATA>
Data
</MS_DATA>
</MS>
<MS id="4">
<MS_DATA>
Data
</MS_DATA>
</MS>
</Item>
<Item ItemID="3">
<MS id="5">
<MS_DATA>
Data
</MS_DATA>
</MS>
</Item>
</Milestones>
The way it's presented in SSIS when I use an XML source is not entirely intuitive: the Item rows and the MS rows are two separate outputs. I had to run these through a join in order to get the Milestones that correspond to specific Items. No problem there; I then run the result through a full outer join with the items, so I get a flattened table with multiple rows containing the same data for an Item and different data for each MS. Basically I get what I tried to show in my table: lots of redundant Item data for each unique MilestoneData.
In the end it has to look similar to:
<NewItems>
<myNewItem ItemID="2">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="3">
data
</MS_DATA_TRANSFORMED>
<MS_DATA_TRANSFORMED MSID="4">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
<myNewItem ItemID="3">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="5">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
<myNewItem ItemID="4">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones></MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
</NewItems>
I would try to handle this using a script component of type transformation. As you are new to SSIS, I assume you haven't used this before. Basically you:
define the input columns your component will expect (e.g. a column input_xml containing ItemID[1] - MilestoneData[2]; ...)
use C# to write the logic that cuts the data apart and sticks it back together
define the output columns your component will use to deliver the transformed row
You will face the problem that one row will probably be used twice in the end, e.g.
ItemID[1] - MilestoneData[2]
will result in
<item id="1">
<Milestone id="2">
I have done something pretty similar using Pentaho Kettle, without even needing a script component in which you define your own logic. But I guess SSIS lacks suitable tasks here.
How about importing the XML into relational tables ( eg in tempdb ) then using FOR XML PATH to reconstruct the XML? FOR XML PATH offers a high degree of control over how you want the XML to look. A very simple example below:
CREATE TABLE #items ( itemId INT PRIMARY KEY, title VARCHAR(50) NULL )
CREATE TABLE #milestones ( itemId INT, msId INT, msData VARCHAR(50) NOT NULL, PRIMARY KEY ( itemId, msId ) )
GO
DECLARE @itemsXML XML
SELECT @itemsXML = x.y
FROM OPENROWSET( BULK 'c:\temp\items.xml', SINGLE_CLOB ) x(y)
INSERT INTO #items ( itemId, title )
SELECT
i.c.value('@ItemID', 'INT' ),
i.c.value('(Title/text())[1]', 'VARCHAR(50)' )
FROM @itemsXML.nodes('Items/Item') i(c)
GO
DECLARE @milestoneXML XML
SELECT @milestoneXML = x.y
FROM OPENROWSET( BULK 'c:\temp\milestone.xml', SINGLE_CLOB ) x(y)
-- Shred one row per <MS> element so every milestone of an item is kept
INSERT INTO #milestones ( itemId, msId, msData )
SELECT
m.c.value('../@ItemID', 'INT' ),
m.c.value('@id', 'INT' ) msId,
m.c.value('(MS_DATA/text())[1]', 'VARCHAR(50)' ) msData
FROM @milestoneXML.nodes('Milestones/Item/MS') m(c)
GO
SELECT
i.itemId AS "@ItemID"
FROM #items i
INNER JOIN #milestones ms ON i.itemId = ms.itemId
FOR XML PATH('myNewItem'), ROOT('NewItems'), TYPE
DROP TABLE #items
DROP TABLE #milestones
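To get the milestones nested inside each item, rather than one flat row per join match, the usual FOR XML PATH idiom is a correlated subquery that returns typed XML. This is a minimal sketch, assuming table variables @items and @milestones with made-up sample rows in place of the temp tables; the element names are taken from the target format in the question.

```sql
-- Hypothetical stand-ins for #items and #milestones, with made-up rows
DECLARE @items TABLE (itemId int PRIMARY KEY, title varchar(50) NULL);
DECLARE @milestones TABLE (itemId int, msId int, msData varchar(50) NOT NULL,
                           PRIMARY KEY (itemId, msId));

INSERT INTO @items VALUES (2, 'Title 2'), (3, 'Title 3'), (4, 'Title 4');
INSERT INTO @milestones VALUES (2, 3, 'data'), (2, 4, 'data'), (3, 5, 'data');

SELECT
    i.itemId AS [@ItemID],                  -- attribute on <myNewItem>
    i.title  AS [SomeDataDirectlyFromItem], -- child element
    (
        -- One <MS_DATA_TRANSFORMED> per milestone of the current item
        SELECT ms.msId   AS [@MSID],
               ms.msData AS [text()]
        FROM @milestones ms
        WHERE ms.itemId = i.itemId
        ORDER BY ms.msId
        FOR XML PATH('MS_DATA_TRANSFORMED'), ROOT('MyMilestones'), TYPE
    ) AS [DataConstructedFromMultipleColumnsInItem]
FROM @items i
ORDER BY i.itemId
FOR XML PATH('myNewItem'), ROOT('NewItems'), TYPE;
```

The TYPE directive keeps the subquery result as XML, so it is embedded as nested elements instead of escaped text. For an item without milestones the subquery produces no XML, so the wrapper element may be absent; that case needs extra handling if an empty <MyMilestones> element is required.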
I am using InfoPath 2007 to send out a survey (it is not connected to SharePoint or a DataBase). The file I will get back is an XML file. Every place there is an answer block, it has its own unique id (aka field name).
Now, I have a SQL Server database (2007?) with a table "Responses". Its columns are: AnswerID (unique PK), QuestionID (FK, which is the unique id, i.e. the field name), and Answer. The QuestionID column is already populated with the unique ids (field names). There are more than 300 records for QuestionID.
What I need to be able to do is reach into the XML file, find the QuestionID (field name), grab the data for that field name, and then put the data into the DB column "Answer" that matches the field name in the QuestionID column.
Is there an easy/medium way to do this mapping/updating with the least amount of chance of error?
NOTE: I tried to use the DB import XML data wizard, but the information breaks out into an unmanageable number of tables.
You can shred the XML into rows and columns and then use that to update your table.
Here is a little example of what you can do.
create table Responses(QuestionID varchar(10), Answer varchar(10))
insert into Responses values('Q1', null)
insert into Responses values('Q2', null)
insert into Responses values('Q3', null)
declare @xml xml
set @xml =
'<root>
<question ID="Q1">Answer1</question>
<question ID="Q2">Answer2</question>
<question ID="Q3">Answer3</question>
</root>'
;with cte as
(
select
r.n.value('@ID', 'varchar(10)') as QuestionID,
r.n.value('.', 'varchar(10)') as Answer
from @xml.nodes('/root/*') as r(n)
)
update R
set Answer = C.Answer
from Responses as R
inner join cte as C
on R.QuestionID = C.QuestionID
select *
from Responses
Result:
QuestionID Answer
---------- ----------
Q1 Answer1
Q2 Answer2
Q3 Answer3
The XML I used most certainly does not look anything like what you have but it should give you a hint of what you can do. If you post a sample of your XML file, table structure and expected result/output you can probably get a more precise answer.
Edit
declare @xml xml =
'<?xml version="1.0" encoding="UTF-8"?>
<my:myFields xmlns:my="xx.com" xml:lang="en-us">
<my:group1>
<my:group2>
<my:field1>Im an analyst.</my:field1>
<my:group3>
<my:group4>
<my:field2>1</my:field2>
<my:field3>I click the mouse.</my:field3>
</my:group4>
<my:group4>
<my:field2>2</my:field2>
<my:field3>I type on the keyboard.</my:field3>
</my:group4>
</my:group3>
</my:group2>
<my:group2>
<my:field1>Im a stay at home mom.</my:field1>
<my:group3>
<my:group4>
<my:field2>1</my:field2>
<my:field3>I Cook.</my:field3>
</my:group4>
<my:group4>
<my:field2>2</my:field2>
<my:field3>I clean.</my:field3>
</my:group4>
</my:group3>
</my:group2>
</my:group1>
</my:myFields>'
;with xmlnamespaces('xx.com' as my)
select
T.N.value('../../my:field1[1]', 'varchar(50)') as Field1,
T.N.value('my:field2[1]', 'varchar(50)') as Field2,
T.N.value('my:field3[1]', 'varchar(50)') as Field3
from @xml.nodes('my:myFields/my:group1/my:group2/my:group3/my:group4') as T(N)
Result:
Field1 Field2 Field3
Im an analyst. 1 I click the mouse.
Im an analyst. 2 I type on the keyboard.
Im a stay at home mom. 1 I Cook.
Im a stay at home mom. 2 I clean.
I am trying to find Relative information from a table and return those results (along with other unrelated results) in one row as part of a larger query.
I already tried using this example, modified for my data.
How to return multiple values in one column (T-SQL)?
But I cannot get it to work. It will not pull any data (I am sure it is user [me] error).
If I query the table directly using a TempTable, I can get the results correctly.
DECLARE @res NVARCHAR(100)
SET @res = ''
CREATE TABLE #tempResult ( item nvarchar(100) )
INSERT INTO #tempResult
SELECT Relation AS item
FROM tblNextOfKin
WHERE ID ='xxx' AND Address ='yyy'
ORDER BY Relation
SELECT @res = @res + item + ', ' from #tempResult
SELECT substring(@res,1,len(@res)-1) as Result
DROP TABLE #tempResult
Note the WHERE line above: xxx and yyy would vary based on the input criteria for the function. But since you cannot use temp tables in a function, I am stuck.
The relevant fields in the table I am trying to query are as follows.
tblNextOfKin
ID - varchar(12)
Name - varchar(60)
Relation - varchar(30)
Address - varchar(100)
I hope this makes enough sense... I saw on another post an expression that fits.
My SQL-fu is not so good.
Once I get a working function, I will place it into the main query for the SSIS package I am working on which is pulling data from many other tables.
I can provide more details if needed, but the site said to keep it simple, and I tried to do so.
Thanks !!!
Follow-up (because when I added a comment to the response below, I could not edit the formatting)
I need to be able to get results from different columns.
ID Name Relation Address
1, Mike, SON, 100 Main St.
1, Sara, DAU, 100 Main St.
2, Tim , SON, 123 South St.
Both the first two people live at the same address, so if I query for ID='1' and Address='100 Main St.' I need the results to look something like...
"DAU, SON"
MySQL has GROUP_CONCAT:
SELECT GROUP_CONCAT(Relation ORDER BY Relation SEPARATOR ', ') AS item
FROM tblNextOfKin
WHERE ID ='xxx' AND Address ='yyy'
You can do it for the whole table with
SELECT ID, Address, GROUP_CONCAT(Relation ORDER BY Relation SEPARATOR ', ') AS item
FROM tblNextOfKin
GROUP BY ID, Address
(assuming ID is not unique)
Note: this is usually bad practice as an intermediate step; it is acceptable only as final formatting for presentation (otherwise you will end up ungrouping it, which will be painful).
I think you need something like this (SQL Server):
SELECT stuff((select ',' +Relation
FROM tblNextOfKin a
WHERE ID ='xxx' AND Address ='yyy'
ORDER BY Relation
FOR XML path('')),1,1,'') AS res;
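Since the question asks for a function, the same STUFF / FOR XML PATH pattern can be wrapped in a scalar function, which avoids the temp table altogether. This is a minimal sketch for a scratch database: the function name dbo.fnRelations and its parameter names are made up for illustration, while the table definition and sample rows follow the question.

```sql
-- Setup for a scratch database; skip if tblNextOfKin already exists
CREATE TABLE tblNextOfKin (
    ID varchar(12),
    Name varchar(60),
    Relation varchar(30),
    Address varchar(100)
);
INSERT INTO tblNextOfKin VALUES
('1', 'Mike', 'SON', '100 Main St.'),
('1', 'Sara', 'DAU', '100 Main St.'),
('2', 'Tim',  'SON', '123 South St.');
GO

-- Hypothetical function name and parameters
CREATE FUNCTION dbo.fnRelations (@ID varchar(12), @Address varchar(100))
RETURNS nvarchar(max)
AS
BEGIN
    RETURN STUFF(
        (SELECT ', ' + Relation
         FROM tblNextOfKin
         WHERE ID = @ID AND Address = @Address
         ORDER BY Relation
         FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
        1, 2, '');   -- strip the leading ', '
END;
GO

SELECT dbo.fnRelations('1', '100 Main St.') AS Result;
```

The .value() call on the typed XML avoids entity escaping if a relation ever contains characters such as & or <. On SQL Server 2017 and later, STRING_AGG(Relation, ', ') WITHIN GROUP (ORDER BY Relation) expresses the same aggregation more directly.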