How to combine different schemas - azure-data-lake

I'm using a custom OUTPUTTER to generate XML from my "flat data" like so:
SELECT *..
OUTPUT @all_data
TO "/patient/{ID}.tsv"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlOutputter("Patient");
Which generates individual files that look like this:
<Patient>
<ID>5283293478</ID>
<ANESTHESIA_START>09/06/2019 11:52:00</ANESTHESIA_START>
<ANESHTHESIA_END>09/06/2019 14:40:00</ANESHTHESIA_END>
<SURGERY_START_TIME>9/6/2019 11:52:00 AM</SURGERY_START_TIME>
<SURGERY_END_TIME>9/6/2019 2:34:00 PM</SURGERY_END_TIME>
<INCISION_START>9/6/2019 12:45:00 PM</INCISION_START>
<INCISION_END>9/6/2019 2:18:00 PM</INCISION_END>
</Patient>
A separate script is generating data like this:
SELECT *..
OUTPUT @other_data
TO "/charge/{ID}.tsv"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlOutputter("Charge");
Yielding files that look like this:
<Charge>
<ID>5283293478</ID>
<PROVIDER_TYPE>CRNA</PROVIDER_TYPE>
</Charge>
<Charge>
<ID>5283293478</ID>
<PROVIDER_TYPE>Student Nurse Anesthetist</PROVIDER_TYPE>
</Charge>
As you can see, the files that are being created are:
/patient/{ID}.tsv
/charge/{ID}.tsv
How do I concatenate the two sets of files based on ID?
The result I'd like is:
<Patient>
<ID>5283293478</ID>
<ANESTHESIA_START>09/06/2019 11:52:00</ANESTHESIA_START>
<ANESHTHESIA_END>09/06/2019 14:40:00</ANESHTHESIA_END>
<SURGERY_START_TIME>9/6/2019 11:52:00 AM</SURGERY_START_TIME>
<SURGERY_END_TIME>9/6/2019 2:34:00 PM</SURGERY_END_TIME>
<INCISION_START>9/6/2019 12:45:00 PM</INCISION_START>
<INCISION_END>9/6/2019 2:18:00 PM</INCISION_END>
</Patient>
<Charge>
<ID>5283293478</ID>
<PROVIDER_TYPE>CRNA</PROVIDER_TYPE>
</Charge>
<Charge>
<ID>5283293478</ID>
<PROVIDER_TYPE>Student Nurse Anesthetist</PROVIDER_TYPE>
</Charge>

If you have the two sets of files, you can simply extract both, taking the ID from the file name:
DECLARE @patient string = "/patient/{Id}.tsv";
DECLARE @charge string = "/charge/{Id}.tsv";
@patients =
EXTRACT Id string, content string FROM @patient USING Extractors.Text();
@charges =
EXTRACT Id string, content string FROM @charge USING Extractors.Text();
Then you can simply join by ID, concatenate the patient and charge content, and output the result.
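The join step could be sketched roughly as follows. This is an untested U-SQL sketch under several assumptions: it continues from rowsets named @patients and @charges as extracted above, each patient file is assumed to yield a single row, each XML fragment is assumed to sit on one line (row order inside ARRAY_AGG is not guaranteed), and the rowset names @charge_text and @combined and the output path /combined/{Id}.xml are made up for illustration. Since one patient can have several charges, the charge rows are collapsed per Id first so that the join does not repeat the patient block.

```sql
// Collapse all charge rows for one Id into a single string.
// Assumption: each row of @charges holds one complete <Charge> fragment.
@charge_text =
    SELECT Id,
           string.Join("\n", ARRAY_AGG(content)) AS charges
    FROM @charges
    GROUP BY Id;

// Join patient and charge content on Id and concatenate them.
// INNER JOIN drops patients without charges; a LEFT JOIN would keep them.
@combined =
    SELECT p.Id,
           p.content + "\n" + c.charges AS content
    FROM @patients AS p
         INNER JOIN @charge_text AS c
         ON p.Id == c.Id;

// Hypothetical output location: one file per Id.
OUTPUT @combined
TO "/combined/{Id}.xml"
USING Outputters.Text(quoting : false);
```

Note that the combined text has no single root element, which matches the result shown in the question but is not a well-formed XML document on its own.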

Related

Parse XML data from a column in SQL Server

I'm trying to put together a report for a badge program. Some of the data for custom fields we created is stored in a single column in the table as XML. I need to return a couple of those items in the report and am having a hard time getting the proper syntax to parse them out.
Aside from the XML, the query itself is simple:
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata
FROM
Person1
The field "customdata" is the XML field that I need to pull the Title and two different dates out of. This is what the XML looks like:
<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>
I have tried a couple of different methods, following the advice from "How to query for Xml values and attributes from table in SQL Server?", and then tried declaring an XML namespace with the following query:
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as X)
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata.value('(/X:person1_7:customdata/X:Title)[1]', 'varchar(100)') AS Title
FROM
Person1
So far all of my results keep returning this error:
XQuery [Person1.customData.value()]: ")" was expected.
I'm assuming I have a syntax issue that I'm overlooking, as I've never had to manipulate XML with SQL before. Thank you in advance for any help.
Please try the following solution.
Notable points:
The XQuery .nodes() method establishes a context, so you can access any XML element right away without long XPath expressions.
The use of text() in the XPath expressions is for performance reasons.
SQL
-- DDL and sample data population, start
DECLARE @person1 TABLE (firstname varchar(50), customdata xml);
INSERT INTO @person1(firstname, customdata) VALUES
('John', '<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>');
-- DDL and sample data population, end
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as person1_7)
SELECT firstName
, c.value('(Title/text())[1]', 'VARCHAR(100)') AS Title
, c.value('(Gaming_x0020_Issue_x0020_Date/text())[1]', 'DATETIME') AS GamingIssueDate
FROM @person1
CROSS APPLY customdata.nodes('/person1_7:CustomData') AS t(c);
Output
+-----------+-------------+-------------------------+
| firstName | Title | GamingIssueDate |
+-----------+-------------+-------------------------+
| John | IT Director | 2021-02-18 12:00:00.000 |
+-----------+-------------+-------------------------+
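As for why the original attempt failed: X:person1_7:customdata contains two colons and so is not a valid qualified name, which is the likely cause of the parser error. XML names are also case-sensitive, so customdata would not match CustomData. In the sample, only the root element carries the namespace prefix; Title and the other child elements are in no namespace, so they need no prefix. A minimal sketch of the corrected direct call without .nodes(), assuming a shortened copy of the sample data and an arbitrary prefix X, also returning the second date the question asks for:

```sql
-- Shortened sample data; only the elements used below are kept.
DECLARE @person1 TABLE (firstName varchar(50), customdata xml);
INSERT INTO @person1 (firstName, customdata) VALUES
('John', N'<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
  <Title>IT Director</Title>
  <Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
  <Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
</person1_7:CustomData>');

-- The prefix X is bound to the namespace URI; only the root element needs it.
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' AS X)
SELECT firstName
     , customdata.value('(/X:CustomData/Title/text())[1]', 'varchar(100)') AS Title
     , customdata.value('(/X:CustomData/Gaming_x0020_Issue_x0020_Date/text())[1]', 'datetime') AS GamingIssueDate
     , customdata.value('(/X:CustomData/Gaming_x0020_Expire_x0020_Date/text())[1]', 'datetime') AS GamingExpireDate
FROM @person1;
```

The prefix used in the query does not have to match the one in the document; only the namespace URI has to match.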

Sum of values extracted using SQL

I have an xml like the below.
<LPNDetail>
<ItemName>5054807025389</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>2</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0002</RefField4>
<RefField5>Retail</RefField5>
<Qty>4</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
<LPNDetail>
<ItemName>5054807025563</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>4</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0004</RefField4>
<RefField5>Retail</RefField5>
<Qty>2</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
I have extracted the XML field using extract.xmltype and am now getting the result below.
42
But I need to sum the quantity values, i.e. I need to get 6 (4+2) as the result.
Any help will be appreciated.
Thanks,
Shihaj
It is not clear what you mean by "an xml". If it's supposed to be an XML document, you are missing the outermost tags, perhaps something like <Document> ..... </Document>
If your text value is EXACTLY as you have shown it (which would be pretty bad), you can wrap within such outermost tags manually, and then use standard Oracle XML tools. For the illustration below I assume you simply have a string (VARCHAR2 or CLOB), not converted to XML type; in that case, I concatenate the beginning and end tags, and then convert to XMLtype, in the query.
with t ( str ) as (
select '<LPNDetail>
<ItemName>5054807025389</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>2</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0002</RefField4>
<RefField5>Retail</RefField5>
<Qty>4</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
<LPNDetail>
<ItemName>5054807025563</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>4</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0004</RefField4>
<RefField5>Retail</RefField5>
<Qty>2</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>'
from dual
)
-- End of SIMULATED table (for testing purposes only, not part of the solution)
-- Query begins below this line
select sum(x.qty) as total_quantity
from t,
xmltable('/Document/LPNDetail'
passing xmltype('<Document>' || t.str || '</Document>')
columns qty number path 'Qty') x
;
Output:
TOTAL_QUANTITY
--------------
6
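An alternative is to let XQuery do the summing instead of aggregating rows in SQL. The following is a minimal sketch under the same assumption as above (the value is a plain string that is wrapped in a made-up <Document> root); the sample string is shortened to the Qty elements, and the query has not been run against the asker's data.

```sql
with t ( str ) as (
  -- Shortened sample: only the Qty elements are kept.
  select '<LPNDetail><Qty>4</Qty></LPNDetail><LPNDetail><Qty>2</Qty></LPNDetail>'
  from dual
)
select xmlcast(
         -- fn:sum adds up all Qty values inside the XQuery expression.
         xmlquery('sum(/Document/LPNDetail/Qty)'
                  passing xmltype('<Document>' || t.str || '</Document>')
                  returning content)
         as number) as total_quantity
from t;
```

The 42 in the question is most likely the two text values 4 and 2 concatenated, which is what happens when the matched text nodes are returned as one string instead of being converted to numbers first.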

Reading dynamic XML nodes in SQL Server

I have the following XML structure:
set @MailXML =
'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Age> 24 </Age>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
<DestinationCountry> USA </DestinationCountry>
<OriginCountry> Australia </OriginCountry>
<OriginAddress> 120 St Kilda Road </OriginAddress>
</Letter>
</Mailman>
</MailingCompany>'
My SQL currently looks like this:
-- Mail Insertion
INSERT INTO mailDB.dbo.Mailman
SELECT
m.value('Name[1]','varchar(50)') as Name,
m.value('Age[1]','varchar(50)') as Age
FROM
@MailXML.nodes('/MailingCompany/Mailman') as A(m)
SET @MailPersonFK = SCOPE_IDENTITY();
-- Letter Insertion
INSERT INTO mailDB.dbo.Letter
SELECT
l.value('DestinationAddress[1]', 'varchar(50)') as DestinationAddress,
l.value('DestinationCountry[1]', 'varchar(50)') as DestinationCountry,
l.value('OriginCountry[1]', 'varchar(50)') as OriginCountry,
l.value('OriginAddress[1]', 'varchar(50)') as OriginAddress,
@MailPersonFK as MailID
FROM
@MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
I am trying to extract the Mailman and Letter data into their own respective tables. I have that working; however, my issue is that the MailingCompany node is dynamic. Sometimes it may be MailVehicle, for example, and I still need to read the corresponding Mailman and Letter node data and insert it into the respective tables.
So both
FROM @MailXML.nodes('/MailingCompany/Mailman') as A(m)
and
FROM @MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
will need to be changed to allow MailingCompany to be dynamic.
I have tried to extract the parent node name and concatenate it into a string to pass to the .nodes() function, like the following:
set @DynXML = '/' + @parentNodeVar + '/Mailman'
FROM @MailXML.nodes(@DynXML) as A(t)
However I get the following error:
The argument 1 of the XML data type method "nodes" must be a string literal.
How can I overcome this dynamic XML issue?
Thank you very much in advance
Look at this reduced example:
DECLARE @xml1 XML=
N'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
</Letter>
</Mailman>
</MailingCompany>';
DECLARE @xml2 XML=
N'<OtherName>
<Mailman>
<Name>Jodie</Name>
<Letter>
<DestinationAddress> This is the other address </DestinationAddress>
</Letter>
</Mailman>
</OtherName>';
SELECT @xml1.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
You can replace a node's name with *.
Another trick is the deep search with // (same result as before):
SELECT @xml1.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
The general rule: Be as specific as possible.
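Applied to the two queries from the question, the wildcard could go directly inside .nodes(). The following is a minimal sketch rather than a tested solution for the asker's schema: the sample XML is shortened, the INSERT targets are left out, and @parentNodeVar is the hypothetical variable mentioned in the question. The second query shows a more specific alternative that compares local-name() with sql:variable(); this is allowed because the XPath argument itself remains a string literal.

```sql
-- Shortened sample with a different root element name.
DECLARE @MailXML xml = N'<MailVehicle>
  <Mailman>
    <Name>Jamie</Name>
    <Age> 24 </Age>
    <Letter>
      <DestinationAddress> 440 Mountain View Parade </DestinationAddress>
      <DestinationCountry> USA </DestinationCountry>
    </Letter>
  </Mailman>
</MailVehicle>';

-- Hypothetical variable holding the root element name.
DECLARE @parentNodeVar nvarchar(100) = N'MailVehicle';

-- Variant 1: the wildcard matches any root element name.
SELECT m.value('(Name/text())[1]', 'varchar(50)') AS Name
     , m.value('(Age/text())[1]', 'varchar(50)') AS Age
FROM @MailXML.nodes('/*/Mailman') AS A(m);

-- Variant 2: match only the root element whose name equals the variable.
SELECT l.value('(DestinationAddress/text())[1]', 'varchar(50)') AS DestinationAddress
     , l.value('(DestinationCountry/text())[1]', 'varchar(50)') AS DestinationCountry
FROM @MailXML.nodes('/*[local-name(.) = sql:variable("@parentNodeVar")]/Mailman/Letter') AS B(l);
```

Variant 2 follows the "be as specific as possible" rule more closely, since an unexpected root element yields no rows instead of being read silently.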

Capturing multiple XML strings with the same node names in SQL

Weaving my way through the XML string world, I've come across the following issue.
I have two XML strings that are very similar to each other; the only difference is the data inside the nodes.
XML string 1:
<DocumentElement>
<Readings>
<ReadingID>1</ReadingID>
<ReadingDate>2013-12-19T00:00:00-05:00</ReadingDate>
<Sys>120</Sys>
<Dia>80</Dia>
<PageNumber>4</PageNumber>
<AddedDate>2015-04-17T19:30:22.2255116-04:00</AddedDate>
<UpdateDate>2015-04-17T19:30:22.2255116-04:00</UpdateDate>
</Readings>
<Readings>
<ReadingID>2</ReadingID>
<ReadingDate>2014-01-10T00:00:00-05:00</ReadingDate>
<Sys>108</Sys>
<Dia>86</Dia>
<PageNumber>8</PageNumber>
<AddedDate>2015-04-17T19:32:08.5121747-04:00</AddedDate>
<UpdateDate>2015-04-17T19:32:08.5121747-04:00</UpdateDate>
</Readings>
</DocumentElement>
XML String 2:
<DocumentElement>
<Readings>
<ReadingID>1</ReadingID>
<ReadingDate>2013-12-20T00:00:00-05:00</ReadingDate>
<Sys>140</Sys>
<Dia>70</Dia>
<PageNumber>10</PageNumber>
<AddedDate>2015-04-17T19:30:22.2255116-04:00</AddedDate>
<UpdateDate>2015-04-17T19:30:22.2255116-04:00</UpdateDate>
</Readings>
</DocumentElement>
Now this is really just an example - I could have an infinite amount of strings just like this that I would want to pull data from. In this case I have two strings and I'm looking to extract all info on <Sys>, <Dia> and <ReadingDate>
I would also like to display this info in a table like this:
Reading Date | Sys | Dia
----------------------------
12/19/2013 | 120 | 80
----------------------------
1/10/2014 | 108 | 86
----------------------------
12/20/2013 | 140 | 70
I am totally unsure how to proceed with this - any and all help is appreciated!
Assuming those XML documents are stored in an XML column named MyXmlColumn, in a table named MyTable*, you can try something like this:
SELECT
R.value('ReadingDate[1]', 'DATETIME') as ReadingDate
, R.value('Sys[1]', 'INT') as Sys
, R.value('Dia[1]', 'INT') as Dia
FROM MyTable t
CROSS APPLY t.MyXmlColumn.nodes('/DocumentElement/Readings') as readings(R)
*: Next time, please provide this information in the question.
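To try this without an existing table, a self-contained variant could look like the sketch below. The table variable @MyTable and its Id column are assumptions made for illustration, and the XML is shortened to the three elements of interest. The ReadingDate values carry a UTC offset; as far as I know, converting them to DATETIME shifts them to UTC, so the sketch uses DATETIMEOFFSET to keep the offset.

```sql
-- Hypothetical table variable standing in for MyTable; one row per XML string.
DECLARE @MyTable TABLE (Id int IDENTITY(1,1), MyXmlColumn xml);
INSERT INTO @MyTable (MyXmlColumn) VALUES
(N'<DocumentElement>
  <Readings><ReadingDate>2013-12-19T00:00:00-05:00</ReadingDate><Sys>120</Sys><Dia>80</Dia></Readings>
  <Readings><ReadingDate>2014-01-10T00:00:00-05:00</ReadingDate><Sys>108</Sys><Dia>86</Dia></Readings>
</DocumentElement>'),
(N'<DocumentElement>
  <Readings><ReadingDate>2013-12-20T00:00:00-05:00</ReadingDate><Sys>140</Sys><Dia>70</Dia></Readings>
</DocumentElement>');

-- CROSS APPLY yields one row per <Readings> element across all XML strings.
SELECT t.Id
     , R.value('(ReadingDate/text())[1]', 'datetimeoffset') AS ReadingDate
     , R.value('(Sys/text())[1]', 'int') AS Sys
     , R.value('(Dia/text())[1]', 'int') AS Dia
FROM @MyTable AS t
CROSS APPLY t.MyXmlColumn.nodes('/DocumentElement/Readings') AS readings(R);
```

With this sample the query returns three rows, one for each <Readings> element, regardless of how many rows the table holds.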

xmlquery returns all values as one long line instead of separate entities

I am trying to query the telephone numbers from the following xml file
<xmlPhoneEntity>
<TelephoneEntity>
<number>123</number>
</TelephoneEntity>
<TelephoneEntity>
<number>456</number>
</TelephoneEntity>
<TelephoneEntity>
<number>789</number>
</TelephoneEntity>
</xmlPhoneEntity>
This XML is located in my DB - the table looks like this
id customer_id telephone blabla
1 111 xmlfile
2 222 xmlfile
etc
My sql query looks like this -
select xmlserialize(xmlquery('/xmlPhoneEntity/TelephoneEntity/number/text()
passing telephone"
The response: 123456789
I tried using /nodes() instead of /text() and the result is the same.
How do I separate the values?