Converting a single XML bracket column to multiple columns in T-SQL (SQL Server)

I've seen questions similar to mine, but the XML format has always been different, as the XML I have does not follow the "standard" structure. My table looks like the following (a single column with an XML element as each row value):
|VAL|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
|<person name="bob" age="22" city="new york" occupation="student"></person>|
And the outcome I'm looking for is:
|Name|age|city |occupation|
|bob |22 |new york|student |
|bob |22 |new york|student |
I can create a hardcoded script with these column names, but the problem is that I have over 20 tables that would then each require a custom script. My thinking is that there should be a way to do this dynamically: a procedure that takes a source table (XML) and a destination table into account and generates this data.

Your question is not entirely clear...
As far as I understand, you have a variety of different XMLs and you want to read them generically. If this is true, I'd suggest reflecting this in the sample data of your next question.
One general statement: there is no way around dynamically created statements in cases where you want to set the descriptive elements of a result set (here: the column names) dynamically. T-SQL relies on some things you must know in advance.
Try this:
I set up a mockup scenario to simulate your issue (please try to do this yourself in your next question):
DECLARE @tbl TABLE(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO @tbl VALUES
('One person',N'|<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--This query relies on all possible attributes being known in advance.
--Common attributes, like the name, are returned for a person and for a country;
--attributes an entity does not have are returned as NULL.
--One advantage might be that you can use a specific datatype if appropriate.
SELECT t.ID
,t.Descr
,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
,t.VAL.value('(/*[1]/@capital)[1]','nvarchar(max)') AS [capital]
,t.VAL.value('(/*[1]/@continent)[1]','nvarchar(max)') AS [continent]
FROM @tbl t;
--This query returns a classical EAV (entity-attribute-value) list.
--In this result you get each attribute on its own line.
SELECT t.ID
,t.Descr
,A.attrs.value('local-name(.)','nvarchar(max)') AS AttrName
,A.attrs.value('.','nvarchar(max)') AS AttrValue
FROM @tbl t
CROSS APPLY t.VAL.nodes('/*[1]/@*') A(attrs);
Both approaches might be generated as a statement on string-level and then executed by EXEC() or sp_executesql.
Hint: One approach might be to insert the EAV list into a tolerant staging table and proceed from there with conditional aggregation, PIVOT or hardcoded VIEWs.
Dynamic approach
In order to read the <person> elements we would need this:
SELECT t.ID
,t.Descr
,t.VAL.value('(/*[1]/@name)[1]','nvarchar(max)') AS [name]
,t.VAL.value('(/*[1]/@age)[1]','nvarchar(max)') AS [age]
,t.VAL.value('(/*[1]/@city)[1]','nvarchar(max)') AS [city]
,t.VAL.value('(/*[1]/@occupation)[1]','nvarchar(max)') AS [occupation]
FROM @tbl t
WHERE VAL.value('local-name(/*[1])','varchar(100)')='person';
All we have to do is to generate the changing part:
Try this:
A new mockup with a real table
CREATE TABLE SimulateYourTable(ID INT IDENTITY, Descr VARCHAR(100), VAL XML);
INSERT INTO SimulateYourTable VALUES
('One person',N'|<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One more person','<person name="bob" age="22" city="new york" occupation="student"></person>')
,('One country','<country name="Germany" capital="Berlin" continent="Europe"></country>');
--Filter for <person> entities
DECLARE @entityName NVARCHAR(100)='person';
--This is a string representing the command
DECLARE @cmd NVARCHAR(MAX)=
'SELECT t.ID
,t.Descr
***columns here***
FROM SimulateYourTable t
WHERE VAL.value(''local-name(/*[1])'',''varchar(100)'')=''***name here***''';
--with this we can create all the columns
--Hint: With SQL Server 2017+ there is STRING_AGG() - much simpler!
DECLARE @columns NVARCHAR(MAX)=
(
SELECT CONCAT(',t.VAL.value(''(/*[1]/@',Attrib.[name],')[1]'',''nvarchar(max)'') AS ',QUOTENAME(Attrib.[name]))
FROM SimulateYourTable t
CROSS APPLY t.VAL.nodes('//@*') AllAttrs(a)
CROSS APPLY (SELECT a.value('local-name(.)','varchar(max)')) Attrib([name])
WHERE VAL.value('local-name(/*[1])','varchar(100)')=@entityName
GROUP BY Attrib.[name]
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)');
--Now we stuff this into our command
SET @cmd=REPLACE(@cmd,'***columns here***',@columns);
SET @cmd=REPLACE(@cmd,'***name here***',@entityName);
--This is the command.
--Hint: You might use this to create physical VIEWs without the need to type them in...
PRINT @cmd;
You can use EXEC(@cmd) to execute this dynamic SQL and check the result.
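The STRING_AGG() hint in the script could look roughly like the sketch below. It assumes SQL Server 2017 or later and the SimulateYourTable mockup from this answer; the aliases Attr and A are made up for illustration. STRING_AGG() has no DISTINCT option, so the attribute names are de-duplicated in a derived table first, and the expression is converted to NVARCHAR(MAX) so a long column list is not cut off.

```sql
-- Sketch only: assumes SQL Server 2017+ and the SimulateYourTable mockup.
DECLARE @entityName NVARCHAR(100) = N'person';
DECLARE @columns NVARCHAR(MAX);

-- Build one ",t.VAL.value(...) AS [attr]" fragment per distinct attribute name
SELECT @columns = STRING_AGG(
           CONVERT(NVARCHAR(MAX),
               CONCAT(N',t.VAL.value(''(/*[1]/@', A.[name],
                      N')[1]'',''nvarchar(max)'') AS ', QUOTENAME(A.[name]))),
           N' ')
FROM
(
    -- Distinct attribute names of the root element for the chosen entity
    SELECT DISTINCT Attr.a.value('local-name(.)', 'nvarchar(128)') AS [name]
    FROM SimulateYourTable t
    CROSS APPLY t.VAL.nodes('/*[1]/@*') AS Attr(a)
    WHERE t.VAL.value('local-name(/*[1])', 'nvarchar(100)') = @entityName
) AS A;

PRINT @columns;
```

The resulting string can replace the ***columns here*** placeholder in the same way as the FOR XML PATH version.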

Related

SQL XML contains() with fn:lowercase()

I'm trying to make this Xpath query using the lowercase text() value in my contains() statement. For example, searching "New York" versus "New york" will return different results.
I plan to make sure the parameter is lowercase going into the stored procedure from now on, but I need to make sure the text() in the XML is also lowercase. I've tried a few different ways but keep getting syntax errors. Note: I'm searching the <Company> node for New York to ensure I don't get any records that match the <City> node. I had started with regular full-text contains() but have since gone to XPath for accuracy.
DECLARE @Company nvarchar(100) = "new york"
SELECT ...
FROM OrderObject o
WHERE o.Address.exist('//Company/text()[contains(.,sql:variable("@Company"))]') = 1)
The XML is like this (shortened for brevity):
<Address>
<Company>1</Company>
<City>2</City>
<State>3</State>
</Address>
Thanks
Here is a correct way to do it.
You need to apply lower-case() function for both parameters of the contains() function.
This way stored procedure parameter could be in absolutely any case: upper, lower, mixed, etc.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<Address>
<Company>1</Company>
<City>NeW YoRk</City>
<State>NY</State>
</Address>'),
(N'<Address>
<Company>2</Company>
<City>Miami</City>
<State>FL</State>
</Address>');
-- DDL and sample data population, end
DECLARE @City NVARCHAR(100) = 'new york';
SELECT *
FROM @tbl
WHERE xmldata.exist('/Address/City[contains(lower-case((./text())[1]),lower-case(sql:variable("@City")))]') = 1;

Incorrect Results in my SQL select Query when parsing XML text column

Hi all, I have a table that holds my business IDs; the column is of data type varchar(255).
I also have a separate table that stores an XML-structured document in a text column when the business gets approved by a lender (it stores the company's information, etc.).
I am trying to return all business IDs that are NOT approved by a lender. The only way I can know this is if the business ID does not exist in the XML.
I cannot join on any tables as I do not have any relational data, but I am trying to use a subquery.
Any ideas? Here is what I have:
Select bus_id
From dbo.tbl_business
Where bus_id Not In (
Select Cast(company_xml_info As Varchar(Max))
From tbl_company_reports
Where Cast(company_xml_info As Varchar(Max)) Is Not Null
And company_xml_info Like '%Business id="' + bus_id + '"%'
And company_xml_info Is Not Null
And company_xml_current_status = 'Approved'
)
Here is an example mockup of something similar you can do. This should run fine in SQL Server Management Studio 2008 and up:
DECLARE @Data TABLE (BusinessId VARCHAR(8))
INSERT INTO @Data (BusinessId) VALUES ('A68'),('A69'),('A70');
DECLARE @CompanyXml TABLE (company_xml_info VARCHAR(MAX));
INSERT INTO @CompanyXml (company_xml_info ) VALUES ('<CompanyInfo>
<Businesses>
<Business id="A68">
<Businessceo>Test</Businessceo>
</Business>
</Businesses>
</CompanyInfo>')
,('<CompanyInfo>
<Businesses>
<Business id="A70">
<Businessceo>Test2</Businessceo>
</Business>
</Businesses>
</CompanyInfo>')
--Data as is
Select *
From @Data
--example of your code as is
SELECT *
From @CompanyXml
--exclusionary listing
SELECT *
From @Data
EXCEPT
--The secret is, first, casting the text to xml. Then you extend that with '.value', which wants a path to get to the id.
--I wrap that path in ()'s and take the first instance with [1], as in theory you could have more instances and do very complex parsing.
--Finally it needs a SQL type to convert the value into.
SELECT CAST(company_xml_info AS XML).value('(CompanyInfo/Businesses/Business/@id)[1]', 'varchar(8)')
From @CompanyXml
Update 6-29-17
If your XML has repeating elements in a tree structure, I prefer the 'nodes' method, which returns one row per element, so you do not have to worry about picking the first one. You simply iterate over what 'nodes' returns and read a value from each row, like so:
DECLARE @X XML = '<CompanyInfo><Businesses><Business id="C1405"/><Business id="C1408"/><Business id="C1408"/></Businesses> </CompanyInfo>'
SELECT
x.query('.')
, x.value('@id', 'varchar(8)')
FROM @X.nodes('/CompanyInfo/Businesses/Business') AS y(x)
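The 'nodes' approach can also drive the exclusion itself when one stored document lists several businesses. The following is only a sketch under the same mockup assumptions as above (table variables @Data and @CompanyXml, XML kept in a VARCHAR(MAX) column); the aliases Casted, B, x and n are invented for illustration. For this sample data it should leave only A69.

```sql
-- Sketch only: same mockup names as above, but one document with two businesses.
DECLARE @Data TABLE (BusinessId VARCHAR(8));
INSERT INTO @Data (BusinessId) VALUES ('A68'),('A69'),('A70');

DECLARE @CompanyXml TABLE (company_xml_info VARCHAR(MAX));
INSERT INTO @CompanyXml (company_xml_info) VALUES
('<CompanyInfo><Businesses><Business id="A68"/><Business id="A70"/></Businesses></CompanyInfo>');

-- Return every business id that appears in no stored document
SELECT d.BusinessId
FROM @Data d
WHERE NOT EXISTS
(
    SELECT 1
    FROM @CompanyXml c
    -- Cast the text to XML once, then shred every <Business> element
    CROSS APPLY (SELECT CAST(c.company_xml_info AS XML)) AS Casted(x)
    CROSS APPLY Casted.x.nodes('/CompanyInfo/Businesses/Business') AS B(n)
    WHERE B.n.value('@id', 'varchar(8)') = d.BusinessId
);
```

Unlike the EXCEPT query with [1], this does not miss ids that are not the first Business element in a document.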

Alternatives to XML shredding in SQL

I tried to shred XML into a temporary table by using XQuery .nodes as follows, but I ran into a performance problem: the shredding takes a long time. Please suggest alternatives.
My requirement is to pass bulk records to a stored procedure and parse those records and do some operation based on record values.
CREATE TABLE #DW_TEMP_TABLE_SAVE(
[USER_ID] [NVARCHAR](30),
[USER_NAME] [NVARCHAR](255)
)
insert into #DW_TEMP_TABLE_SAVE
select
A.B.value('(USER_ID)[1]', 'nvarchar(30)' ) [USER_ID],
A.B.value('(USER_NAME)[1]', 'nvarchar(30)' ) [USER_NAME]
from
@l_n_XMLDoc.nodes('//ROW') as A(B)
Specify the text() node in your values clause.
insert into #DW_TEMP_TABLE_SAVE
select A.B.value('(USER_ID/text())[1]', 'nvarchar(30)' ) [USER_ID],
A.B.value('(USER_NAME/text())[1]', 'nvarchar(30)' ) [USER_NAME]
from @l_n_XMLDoc.nodes('/USER_DETAILS/RECORDSET/ROW') as A(B)
Not using text() will create a query plan that tries to concatenate the values from the specified node with all its child nodes, and I guess you don't want that in this scenario. Without text(), the concatenation part of the query is done by the UDX operator, and it is a good thing not to have it in your plan.
Another thing to try is OPENXML. In some scenarios (large XML documents) I have found that OPENXML performs faster.
declare @idoc int
exec sp_xml_preparedocument @idoc out, @l_n_XMLDoc
insert into #DW_TEMP_TABLE_SAVE
select USER_ID, USER_NAME
from openxml(@idoc, '/USER_DETAILS/RECORDSET/ROW', 2)
with (USER_ID nvarchar(30), USER_NAME nvarchar(30))
exec sp_xml_removedocument @idoc

SQL How to Split One Column into Multiple Variable Columns

I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:
190230943204;190234443204;
However, some rows have more numbers than others, so in the database you can have
190230943204;190234443204;
121340944534;340212343204;134530943204
I've seen some solutions for splitting one column into a specific number of columns, but not variable columns. The columns that have less data (2 series of strings separated by commas instead of 3) will have nulls in the third place.
Ideas? Let me know if I must clarify anything.
Splitting this data into separate columns is a very good start (comma-separated values are a heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.
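CREATE TABLE main_entity (
id INT PRIMARY KEY,
other_fields INT
);
-- The key spans both columns so that one entity can have many property rows
CREATE TABLE entity_properties (
main_entity_id INT,
property_value INT,
PRIMARY KEY (main_entity_id, property_value),
FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);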
entity_properties.main_entity_id is a foreign key to main_entity.id.
Congratulations, you are on the right path, this is called normalisation. You are about to reach the First Normal Form.
Beware, however: these properties should be of a reasonably similar nature (i.e. all phone numbers, or all addresses, etc.). Do not fall to the dark side (a.k.a. the Entity-Attribute-Value anti-pattern) and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.
If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):
select substring(col, 1+13*(n-1), 12) as val
from t join
(select 1 as n union all select 2 union all select 3
) n
on len(t.col) >= 13*n.n - 1
This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.
If I were you, I would create a simple function that splits values separated by ';', like this:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
DROP FUNCTION [dbo].[fn_Split_List]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fn_Split_List](@List NVARCHAR(512))
RETURNS @ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
DECLARE @XML xml = N'<r><![CDATA[' + REPLACE(@List, ';', ']]></r><r><![CDATA[') + ']]></r>'
INSERT INTO @ResultRowset ([Value])
SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
FROM @XML.nodes('//r') Tbl(Col)
RETURN
END
GO
Then simply call it like this:
SET NOCOUNT ON
GO
DECLARE @RawData TABLE( [Value] NVARCHAR(256))
INSERT INTO @RawData ([Value] )
VALUES ('1111111;22222222')
,('3333333;113113131')
,('776767676')
,('89332131;313131312;54545353')
SELECT SL.[Value]
FROM @RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
SET NOCOUNT OFF
GO
The result is as follows:
Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131
Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.
Note: There is no limit on how many values can be separated by ';', but there are length limits in the function, which you can change to NVARCHAR(MAX) if you need.
EDIT:
As I can see, some rows in your example will cause the function to return empty strings. For example:
number;number;
will return:
number
number
'' (empty string)
To clear them, just add the following where clause to the statement above like this:
SELECT SL.[Value]
FROM @RawData AS RD
CROSS APPLY [fn_Split_List] ([Value]) as SL
WHERE LEN(SL.[Value]) > 0
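On newer versions the same row-per-value result may not need a custom function at all. The sketch below assumes SQL Server 2016 or later with database compatibility level 130 or higher, where the built-in STRING_SPLIT() is available; the @RawData table variable mirrors the one above and the sample strings are taken from the question.

```sql
-- Sketch only: assumes SQL Server 2016+ (compatibility level 130 or higher).
DECLARE @RawData TABLE ([Value] NVARCHAR(256));
INSERT INTO @RawData ([Value])
VALUES (N'190230943204;190234443204;')
      ,(N'121340944534;340212343204;134530943204');

-- STRING_SPLIT returns one row per item in a column named [value]
SELECT RD.[Value] AS RawValue
      ,LTRIM(RTRIM(S.[value])) AS Item
FROM @RawData AS RD
CROSS APPLY STRING_SPLIT(RD.[Value], N';') AS S
WHERE LEN(S.[value]) > 0;   -- drops the empty item after a trailing ';'
```

Note that this form of STRING_SPLIT() does not guarantee the order of the returned items, so it fits best when the position of a value within the list does not matter.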

how to get values inside an xml column, when it's of type nvarchar

My question is similar to this one: Choose a XML node in SQL Server based on max value of a child element
except that my column is NOT of type XML, it's of type nvarchar(max).
I want to extract the XML node values from a column that looks like this:
<Data>
<el1>1234</el1>
<el2>Something</el2>
</Data>
How can I extract the values '1234' and 'Something' ?
doing a convert and using the col.nodes is not working.
CONVERT(XML, table1.col1).value('(/Data/el1)[1]','int') as 'xcol1',
After that, I would like to compare the value of el1 (1234) with another column and update el1 accordingly. Right now I'm trying to just rebuild the XML when passing the update:
i.e.
Update table set col1 ='<Data><el1>'+@col2+'</el1><el2>???</el2>
You've got to tell SQL Server the number of the node you're after, like:
(/Data/el1)[1]
^^^
Full example:
declare @t table (id int, col1 varchar(max))
insert @t values (1, '<Data><el1>1234</el1><el2>Something</el2></Data>')
select CAST(col1 as xml).value('(/Data/el1)[1]', 'int')
from @t
-->
1234
SQL Server provides a modify function to change XML columns. But I think you can only use it on columns with the xml type. Here's an example:
declare @q table (id int, col1 xml)
insert @q values (1, '<Data><el1>1234</el1><el2>Something</el2></Data>')
update @q
set col1.modify('replace value of (/Data/el1/text())[1] with "5678"')
select *
from @q
-->
<Data><el1>5678</el1><el2>Something</el2></Data>
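If the column has to stay nvarchar(max), one possible workaround is to round-trip a single row through an XML variable, since modify can be applied to a variable with SET. This is only a sketch: the table variable @t follows the example above, and @x and @newValue are hypothetical names introduced for illustration.

```sql
-- Sketch only: handles one row at a time; @x and @newValue are illustrative names.
DECLARE @t TABLE (id INT, col1 NVARCHAR(MAX));
INSERT @t VALUES (1, N'<Data><el1>1234</el1><el2>Something</el2></Data>');

DECLARE @newValue INT = 5678;

-- Load the text into an XML variable
DECLARE @x XML = (SELECT CAST(col1 AS XML) FROM @t WHERE id = 1);

-- Change only el1; el2 is left untouched
SET @x.modify('replace value of (/Data/el1/text())[1] with sql:variable("@newValue")');

-- Write the result back as text
UPDATE @t SET col1 = CAST(@x AS NVARCHAR(MAX)) WHERE id = 1;

SELECT col1 FROM @t;
```

This avoids rebuilding the whole document by string concatenation, at the cost of processing rows one by one.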
At the end of the day, SQL Server's XML support makes simple things very hard. If you value maintainability, you're better off processing XML on the client side.