Parse a varchar column containing XML like data into row wise - sql

i have a column in table which contain data like XML,i would like to get data in rows.
My table data as-
select printDataColumn from Mytable
It returns value-
<PrintData>
<Line1>.MERCHANT ID: *****4005</Line1>
<Line2>.CLERK ID: ADMIN</Line2>
<Line3>.</Line3>
<Line4>. VOID SALE</Line4>
<Line5>.</Line5>
<Line6>.VISA ************0006</Line6>
<Line7>.ENTRY METHOD: SWIPED</Line7>
<Line8>.DATE: 03/05/2019 TIME: 16:57:20</Line8>
<Line9>.</Line9>
<Line10>.INVOICE: 1551785225020</Line10>
<Line11>.REFERENCE: 1008</Line11>
<Line12>.AUTH CODE: 08354A</Line12>
<Line13>.</Line13>
<Line14>.AMOUNT USD$ 1.14</Line14>
<Line15>. ==========</Line15>
<Line16>.TOTAL USD$ 1.14</Line16>
<Line17>.</Line17>
<Line18>. APPROVED - THANK YOU</Line18>
<Line19>.</Line19>
<Line20>.I AGREE TO PAY THE ABOVE TOTAL AMOUNT</Line20>
<Line21>.ACCORDING TO CARD ISSUER AGREEMENT</Line21>
<Line22>.(MERCHANT AGREEMENT IF CREDIT VOUCHER)</Line22>
<Line23>.</Line23>
<Line24>.</Line24>
<Line25>.</Line25>
<Line26>.x_______________________________________</Line26>
<Line27>. Merchant Signature</Line27>
<Line28>.</Line28>
</PrintData>
but i want to use this information in another way like that
MERCHANT ID: *****4005
CLERK ID: ADMIN
SALE
AMEX ***********1006
ENTRY METHOD: CHIP
DATE: 03/07/2019 TIME: 14:37:23
INVOICE: 1551949638173
REFERENCE: 1005
AUTH CODE: 040749. . . . .and so on.
any help is appreciable.

Besides the fact, that it always is a good idea to use the appropriate type to store your data, you can use a cast on the fly to use your xml-like-data with XML methods:
DECLARE #tbl TABLE(ID INT IDENTITY,PrintData VARCHAR(4000));
INSERT INTO #tbl VALUES
('<PrintData>
<Line1>.MERCHANT ID: *****4005</Line1>
<Line2>.CLERK ID: ADMIN</Line2>
<Line3>.</Line3>
<Line4>. VOID SALE</Line4>
<!-- more lines -->
</PrintData>');
SELECT t.ID
,A.Casted.value(N'(/PrintData/Line1/text())[1]','nvarchar(max)') AS Line1
FROM #tbl t
CROSS APPLY(SELECT CAST(t.PrintData AS XML)) A(Casted);
In this case I use CROSS APPLY to add a column A.Casted to the result set, which is a row-wise casted XML.
This will break, in cases of invalid XML (of course). You might try TRY_CAST instead. This would return NULL, but will hide data errors...
Some more background
The cast to XML is a rather expensive operation. Doing this whenever you want to read out of your data is some heavy load for your server. Furthermore, using VARCHAR is prone to two major errors:
If there are foreign characters you might get question marks
If the XML is not valid, you will not see it - until you use it.
If possible, try to change the table's design to use native XML.
And one more hint
It is a bad approach to name-number elements (same for columns). Instead of <Line1><Line2><Line3> better use <Line nr="1"><Line nr="2"><Line nr="3">...

Related

Parse XML data from a column in SQL Server

I'm trying to put together a report for a badge program. Some the data for custom fields we created are stored in a single column in the table as XML. I need to return a couple of the items in there to the report and am having a hard time getting the proper syntax to get it to parse out.
Aside from the XML, the query itself is simple:
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata
FROM
Person1
the field "customdata" is the XML field that I need to pull Title, and 2 different dates out of. This is what the XML looks like:
<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>
I have tried a couple of different methods trying to follow the advice from How to query for Xml values and attributes from table in SQL Server? and then tried declaring a XML namespace with the following query:
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as X)
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata.value('(/X:person1_7:customdata/X:Title)[1]', 'varchar(100)') AS Title
FROM
Person1
So far all of my results keep returning "XQuery [Person1.customData.value()]: ")" was expected.
" I'm assuming I have a syntax issue that I'm overlooking as I've never had to manipulate XML with SQL before. Thank you in advance for any help.
Please try the following solution.
Notable points:
XQuery .nodes() method establishes a context so you can access any
XML element right away without long XPath expressions.
Use of the text() in the XPath expressions is for performance
reasons.
SQL
-- DDL and sample data population, start
DECLARE #person1 TABLE (firstname varchar(50), customdata xml);
INSERT INTO #person1(firstname, customdata) VALUES
('John', '<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>');
-- DDL and sample data population, end
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as person1_7)
SELECT firstName
, c.value('(Title/text())[1]', 'VARCHAR(100)') AS Title
, c.value('(Gaming_x0020_Issue_x0020_Date/text())[1]', 'DATETIME') GamingIssueDate
FROM #person1
CROSS APPLY customdata.nodes('/person1_7:CustomData') AS t(c);
Output
+-----------+-------------+-------------------------+
| firstName | Title | GamingIssueDate |
+-----------+-------------+-------------------------+
| John | IT Director | 2021-02-18 12:00:00.000 |
+-----------+-------------+-------------------------+

Regex comparison in Oracle between 2 varchar columns (from different tables)

I am trying to find a way to capture relevant errors from oracle alertlog. I have one table (ORA_BLACKLIST) with column values as below (these are the values which I want to ignore from
V$DIAG_ALERT_EXT)
Below are sample data in ORA_BLACKLIST table. This table can grow based on additional error to ignore from alertlog.
ORA-07445%[kkqctdrvJPPD
ORA-07445%[kxsPurgeCursor
ORA-01013%
ORA-27037%
ORA-01110
ORA-2154
V$DIAG_ALERT_EXT contains a MESSAGE_TEXT column which contains sample text like below.
ORA-01013: user requested cancel of current operation
ORA-07445: exception encountered: core dump [kxtogboh()+22] [SIGSEGV] [ADDR:0x87] [PC:0x12292A56]
ORA-07445: exception encountered: core dump [java_util_HashMap__get()] [SIGSEGV]
ORA-00600: internal error code arguments: [qercoRopRowsets:anumrows]
I want to write a query something like below to ignore the black listed errors and only capture relevant info like below.
select
dae.instance_id,
dae.container_name,
err_count,
dae.message_level
from
ORA_BLACKLIST ob,
V$DIAG_ALERT_EXT dae
where
group by .....;
Can someone suggest a way or sample code to achieve it?
I should have provided the exact contents of blacklist table. It currently contains some regex (perl) and I want to convert it to oracle like regex and compare with v$diag_alert_ext message_text column. Below are sample perl regex in my blacklist table.
ORA-0(,|$| )
ORA-48913
ORA-00060
ORA-609(,|$| )
ORA-65011
ORA-65020 ORA-31(,|$| )
ORA-7452 ORA-959(,|$| )
ORA-3136(,|)|$| )
ORA-07445.[kkqctdrvJPPD
ORA-07445.[kxsPurgeCursor –
Your blacklist table looks like like patterns, not regular expressions.
You can write a query like this:
select dae.* -- or whatever columns you want
from V$DIAG_ALERT_EXT dae
where not exists (select 1
from ORA_BLACKLIST ob
where dae.message_text like ob.<column name>
);
This will not have particularly good performance if the tables are large.

'local-name()' requires a singleton (or empty sequence) T-SQL Xquery

This is a variation/combination of these two questions, but I think I've done as they described:
Getting element's name in XPATH
XQuery [value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'
Is there a way to get the name of a node in a SQL Server 2016 T-SQL query?
I want to do this:
select
trcid,
trcXML.value(
'local-name(//*[local-name()="after"][1]/*[1])',
'varchar(32)'
) as XmlTableName,
trcXml,
trcGUID,
trcCorrID
from trace
order by trcId desc
I have explicitly included the [1] after each item in the hierarchy.
Error: XQuery [trace.trcXML.value()]: 'local-name()' requires a singleton (or empty sequence), found operand of type 'element(*,xdt:untyped) *'
Sample data:
<s1:DB2Request xmlns:s1="XYZDB2">
<s1:sync>
<s1:after identityInsert="false">
<s1:PG204AT5>
<s1:ISA06>1975111</s1:ISA06>
etc...
</s1:PG204AT5>
</s1:after>
</s1:sync>
</s1:DB2Request>
I want the table name (this is a BizTalk "update-gram") which is between the "after" and the ISA06 elements, in this case: PG204AT5.
Tested in xpathtester.com: local-name(//*[local-name()="after"][1]/* [1])
- for some reason when I save it the XPATH gets changed: http://www.xpathtester.com/xpath/2bd602d8fc7aee0484de14bf93f71ef2
One more set of parenthesis gives you a singleton as required:
DECLARE #t TABLE(s XML);
INSERT INTO #t(s)VALUES(N'<s1:DB2Request xmlns:s1="XYZDB2">
<s1:sync>
<s1:after identityInsert="false">
<s1:PG204AT5>
<s1:ISA06>1975111</s1:ISA06>
<somethingsomething/>
</s1:PG204AT5>
</s1:after>
</s1:sync>
</s1:DB2Request>');
select
s.value(
'local-name((//*[local-name()="after"][1]/*)[1])',
'varchar(32)'
) as XmlTableName
from #t;
Results in:
XmlTableName
PG204AT5
Remember that the SQL Server engine, unlike most XQuery implementations, does "pessimistic static typing" - it assumes that if things can go wrong, they will go wrong.
The expression //*[local-name()="after"][1]/* [1] is capable of returning more than one element - for example it will do so with this input
<x>
<y>
<after>
<z/>
</after>
</y>
<y>
<after>
<z/>
</after>
</y>
</x>
so the XQuery compiler infers a static type of element()*, which doesn't meet the requirement.
It's probably enough to simply do (//*[local-name()="after"]/*)[1].
Personally, I always thought pessimistic static typing was a really poor design choice, but some of the early XQuery implementors including Microsoft were very keen on it.
Or even simpler solution by asking explicitly first child node name of the <s1:after> element.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE(xmldata XML);
INSERT INTO #tbl(xmldata)
VALUES(N'<s1:DB2Request xmlns:s1="XYZDB2">
<s1:sync>
<s1:after identityInsert="false">
<s1:PG204AT5>
<s1:ISA06>1975111</s1:ISA06>
<somethingsomething/>
</s1:PG204AT5>
</s1:after>
</s1:sync>
</s1:DB2Request>');
-- DDL and sample data population, end
;WITH XMLNAMESPACES (DEFAULT 'XYZDB2')
SELECT xmldata.value('local-name((/DB2Request/sync/after/child::node())[1])','VARCHAR(50)') as XmlTableName
FROM #tbl;
Output
+--------------+
| XmlTableName |
+--------------+
| PG204AT5 |
+--------------+

U-sql call data in json array

I have browsed the web and forum to download the data from the file json, but my script does not work.
I have a problem with downloading the list of objects of rates. Can someone please help? I can not find fault.
{"table":"C","no":"195/C/NBP/2016","tradingDate":"2016-10-06","effectiveDate":"2016-10-07","rates":
[
{"currency":"dolar amerykański","code":"USD","bid":3.8011,"ask":3.8779},
{"currency":"dolar australijski","code":"AUD","bid":2.8768,"ask":2.935},
{"currency":"dolar kanadyjski","code":"CAD","bid":2.8759,"ask":2.9339},
{"currency":"euro","code":"EUR","bid":4.2493,"ask":4.3351},
{"currency":"forint (Węgry)","code":"HUF","bid":0.013927,"ask":0.014209},
{"currency":"frank szwajcarski","code":"CHF","bid":3.8822,"ask":3.9606},
{"currency":"funt szterling","code":"GBP","bid":4.8053,"ask":4.9023},
{"currency":"jen (Japonia)","code":"JPY","bid":0.036558,"ask":0.037296},
{"currency":"korona czeska","code":"CZK","bid":0.1573,"ask":0.1605},
{"currency":"korona duńska","code":"DKK","bid":0.571,"ask":0.5826},
{"currency":"korona norweska","code":"NOK","bid":0.473,"ask":0.4826},
{"currency":"korona szwedzka","code":"SEK","bid":0.4408,"ask":0.4498},
{"currency":"SDR (MFW)","code":"XDR","bid":5.3142,"ask":5.4216}
],
"EventProcessedUtcTime":"2016-10-09T10:48:41.6338718Z","PartitionId":1,"EventEnqueuedUtcTime":"2016-10-09T10:48:42.6170000Z"}
This is my script in sql.
#trial =
EXTRACT jsonString string
FROM #"adl://kamilsepin.azuredatalakestore.net/ExchangeRates/2016/10/09/10_0_c60d8b8895b047c896ce67d19df3cdb2.json"
USING Extractors.Text(delimiter:'\b', quoting:false);
#json =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS rec
FROM #trial;
#columnized =
SELECT
rec["table"]AS table,
rec["no"]AS no,
rec["tradingDate"]AS tradingDate,
rec["effectiveDate"]AS effectiveDate,
rec["rates"]AS rates
FROM #json;
#rateslist =
SELECT
table, no, tradingDate, effectiveDate,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(rates) AS recl
FROM #columnized;
#selectrates =
SELECT
recl["currency"]AS currency,
recl["code"]AS code,
recl["bid"]AS bid,
recl["ask"]AS ask
FROM #rateslist;
OUTPUT #selectrates
TO "adl://kamilsepin.azuredatalakestore.net/datastreamanalitics/ExchangeRates.tsv"
USING Outputters.Tsv();
You need to look at the structure of your JSON and identify, what constitutes your first path inside your JSON that you want to map to correlated rows. In your case, you are really only interested in the array in rates where you want one row per array item.
Thus, you use the JSONExtractor with a JSONPath that gives you one row per array element (e.g., rates[*]) and then project each of its fields.
Here is the code (with slightly changed paths):
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];
#selectrates =
EXTRACT currency string, code string, bid decimal, ask decimal
FROM #"/Temp/rates.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("rates[*]");
OUTPUT #selectrates
TO "/Temp/ExchangeRates.tsv"
USING Outputters.Tsv();

Converting xml node string to strip out nodes

I have a table that has a column called RAW DATA of type NVARCHAR MAX, which is a dump from a web service. Here is a sample of 1 data line:
<CourtRecordEventCaseHist>
<eventDate>2008-02-11T06:00:00Z</eventDate>
<eventDate_TZ>-0600</eventDate_TZ>
<histSeqNo>4</histSeqNo>
<countyNo>1</countyNo>
<caseNo>xxxxxx</caseNo>
<eventType>WCCS</eventType>
<descr>Warrant/Capias/Commitment served</descr>
<tag/>
<ctofcNameL/>
<ctofcNameF/>
<ctofcNameM/>
<ctofcSuffix/>
<sealCtofcNameL/>
<sealCtofcNameF/>
<sealCtofcNameM/>
<sealCtofcSuffix/>
<sealCtofcTypeCodeDescr/>
<courtRptrNameL/>
<courtRptrNameF/>
<courtRptrNameM/>
<courtRptrSuffix/>
<dktTxt>Signature bond set</dktTxt>
<eventAmt>0.00</eventAmt>
<isMoneyEnabled>false</isMoneyEnabled>
<courtRecordEventPartyList>
<partyNameF>Name</partyNameF>
<partyNameM>A.</partyNameM>
<partyNameL>xxxx</partyNameL>
<partySuffix/>
<isAddrSealed>false</isAddrSealed>
<isSeal>false</isSeal>
</courtRecordEventPartyList>
</CourtRecordEventCaseHist>
It was suppose to go in a table, with the node names representing the column names. The table it's going to is created, I just need to exract the data from this row to the table. I have 100's of thousands records like this. I was going to copy to a xml file, then import. But there is so much data, I would rather try and do the work within the DB.
Any ideas?
First, create the table with all the required columns.
Then, use your favorite scripting language to load the table! Mine being groovy, here is what I'd do:
def sql = Sql.newInstance(/* SQL connection here*/)
sql.eachRow("select RAW_DATA from TABLE_NAME") { row ->
String xmlData = row."RAW_DATA"
def root = new XmlSlurper().parseText(xmlData)
def date = root.eventDate
def histSeqNo = root.histSeqNo
//Pull out all the data and insert into new table!
}
I did find an answer to this, I'm sure there is more than one way of doing this. But this is what I got to work. Thanks for everyone's help.
SELECT
pref.value('(caseNo/text())[1]', 'varchar(20)') as CaseNumber,
pref.value('(countyNo/text())[1]', 'int') as CountyNumber
FROM
dbo.CaseHistoryRawData_10 CROSS APPLY
RawData.nodes('//CourtRecordEventCaseHist') AS CourtRec(pref)