SQL Server XML Node Parsing

Fairly new to XML parsing in SQL Server. Here's what I have and what I'm trying to do.
I have a table with many rows similar to this:
+-------------------+-----------------------------+
| EDI_Assessment_ID | XML_TEXT                    |
+-------------------+-----------------------------+
| 12345             | text column containing XML |
| 12346             | text column containing XML |
+-------------------+-----------------------------+
The XML_Text column holds a large XML document similar to this structure (I've simplified it and pasted only the relevant portions):
<Assessment>
<ADLs>
<ADL_Group>
<ADL>bathing</ADL>
<Mapped_ADL Source="Calypso">Bathing</Mapped_ADL>
<ADL_Level>Requires only equipment to complete ADL</ADL_Level>
<Mapped_ADL_Level Source="Calypso">Independent</Mapped_ADL_Level>
<ADL_Equipment>HH shower</ADL_Equipment>
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
<ADL_Group>
<ADL>Continence-Bowel</ADL>
<Mapped_ADL Source="Calypso">Continence</Mapped_ADL>
<ADL_Level>Independent</ADL_Level>
<Mapped_ADL_Level Source="B/A">Independent</Mapped_ADL_Level>
<ADL_Equipment />
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
</Assessment>
How can I parse the XML for each row in the table to return:
the ADL (bathing, Continence-Bowel) and
the ADL_Assisted_By_Info?
I'm looking for the result set to come back similar to this:
+-------------------+-------------+----------------------+------------------+----------------------+
| EDI_Assessment_ID | Bathing     | ADL_Assisted_By_Info | Continence-Bowel | ADL_Assisted_By_Info |
+-------------------+-------------+----------------------+------------------+----------------------+
| 12345             | Independent | No one               | Independent      | No one               |
+-------------------+-------------+----------------------+------------------+----------------------+

These solutions rely on something which, going by the OP's sample data, is not true: that the column of datatype text contains valid XML. The sample data is not valid XML, so this solution will not work against the sample data they have provided.
In fact, if all of the OP's data is malformed XML, then SQL Server is completely the wrong choice here. They should, ideally, be fixing their data first, and then changing the datatype to xml so that more bad XML can't be inserted into the database.
If, for whatever reason, they can't do that, then they will need to find a different solution; SQL Server isn't it. You're going to need something that is very good at string manipulation to work out the values that way, and if you're doing this over a (large) dataset the process is probably going to slow to a crawl.
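Before going further, it can be worth finding out how many rows would even convert. A minimal sketch of that check, using the same TRY_CONVERT approach as the queries below (dbo.EDI_Assessments is a hypothetical name for the table in the question):
-- Rows whose text column cannot be converted to well-formed XML
SELECT EDI_Assessment_ID
FROM dbo.EDI_Assessments                                -- hypothetical table name
WHERE XML_TEXT IS NOT NULL
  AND TRY_CONVERT(xml, CONVERT(varchar(MAX), XML_TEXT)) IS NULL;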
Anyway, on to the point. Note the comment in the code. There are two solutions. The first, aside from the validity issue, assumes that the bathing node is always in the first ADL_Group element and that Continence-Bowel is always in the second:
WITH VTE AS(
SELECT 12345 AS ID,
CONVERT(text,
'<Assessment>
<ADLs>
<ADL_Group>
<ADL>bathing</ADL>
<Mapped_ADL Source="Calypso">Bathing</Mapped_ADL>
<ADL_Level>Requires only equipment to complete ADL</ADL_Level>
<Mapped_ADL_Level Source="Calypso">Independent</Mapped_ADL_Level>
<ADL_Equipment>HH shower</ADL_Equipment>
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
<ADL_Group>
<ADL>Continence-Bowel</ADL>
<Mapped_ADL Source="Calypso">Continence</Mapped_ADL>
<ADL_Level>Independent</ADL_Level>
<Mapped_ADL_Level Source="B/A">Independent</Mapped_ADL_Level>
<ADL_Equipment />
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
</ADLs>' + --I have added this line to make the XML valid. The sample you have will NOT work, as it is not valid XML
'</Assessment>') AS XML_Text
)
SELECT V.ID,
       X.XML_Type,
       T.AA.value('(ADL_Group/Mapped_ADL_Level/text())[1]','varchar(30)') AS Bathing,
       T.AA.value('(ADL_Group/ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info,
       T.AA.value('(ADL_Group/Mapped_ADL_Level/text())[2]','varchar(30)') AS ContinenceBowel,
       T.AA.value('(ADL_Group[2]/ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info
FROM VTE V
     CROSS APPLY (VALUES(TRY_CONVERT(xml, V.XML_Text))) X(XML_Type)
     CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs') T(AA);
If, however, that isn't true and there could be other nodes in play, with different values, then you could do the following for the SELECT (CTE not included):
SELECT V.ID,
       X.XML_Type,
       B.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS Bathing,
       B.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info,
       CB.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS ContinenceBowel,
       CB.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info
FROM VTE V
     CROSS APPLY (VALUES(TRY_CONVERT(xml, V.XML_Text))) X(XML_Type)
     CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') B(AG)
     CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') CB(AG)
WHERE B.AG.value('(ADL/text())[1]','varchar(30)') = 'bathing'
  AND CB.AG.value('(ADL/text())[1]','varchar(30)') = 'Continence-Bowel';
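To run this against the real table from the question, swap the CTE for that table; a sketch under the assumption that the table is called dbo.EDI_Assessments (the column names come from the question):
SELECT E.EDI_Assessment_ID,
       B.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS Bathing,
       B.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info,
       CB.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS ContinenceBowel,
       CB.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info
FROM dbo.EDI_Assessments E             -- hypothetical table name
     CROSS APPLY (VALUES(TRY_CONVERT(xml, CONVERT(varchar(MAX), E.XML_TEXT)))) X(XML_Type)
     CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') B(AG)
     CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') CB(AG)
WHERE B.AG.value('(ADL/text())[1]','varchar(30)') = 'bathing'
  AND CB.AG.value('(ADL/text())[1]','varchar(30)') = 'Continence-Bowel';
Rows whose XML_TEXT fails TRY_CONVERT simply produce no output here, so they can be listed separately with the validity check above.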

Related

Redshift - How to use column in one table as pattern in SIMILAR TO

I have a problem where I have two tables. One table contains URLs and their information, and the other defines groups of URLs that should be matched by a pattern.
Urls table:
------------------------------------------------
| url                                  | files |
| https://myurl1/test/one/es/main.html | 530   |
| https://myurl1/test/one/en/main.html | 530   |
| https://myurl1/test/one/ar/main.html | 530   |
------------------------------------------------
Urls patterns table:
----------------------------------------------
| group  | url_pattern                       |
| group1 | https://myurl1/test/one/(es|en)/% |
| group2 | https://myurl1/test/one/(ar)/%    |
----------------------------------------------
I have tried something like this bearing in mind that url_patterns will only have one row per group.
SELECT * FROM urls_table
WHERE url SIMILAR TO (SELECT MAX (url_pattern) FROM url_patterns WHERE group='group1')
LIMIT 10
The main problem here is that SIMILAR TO does not seem to work when the pattern comes from a column.
Could anyone give me some advice?
Thanks in advance.
You are running into the requirement that regexp patterns be compiled, and SIMILAR TO is a layer on top of regexp. So what you are trying to do won't work. I believe there are a number of other ways to do this.
I) Change to LIKE pattern matching: LIKE patterns aren't precompiled, so they can use dynamic patterns. The downside is that they are more limited, but I think you can still do what you want. Just change your patterns to be a set of pattern columns (if the number of patterns is limited) and test for all the patterns. Unneeded patterns can just be a value that can never match. Definitely a brute-force hack.
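A minimal sketch of this option, assuming hypothetical extra columns url_pattern1 and url_pattern2 on url_patterns, with unused slots holding a value that can never match:
SELECT u.*
FROM urls_table u
JOIN url_patterns p
  ON p."group" = 'group1'
WHERE u.url LIKE p.url_pattern1        -- e.g. 'https://myurl1/test/one/es/%'
   OR u.url LIKE p.url_pattern2        -- e.g. 'https://myurl1/test/one/en/%'
LIMIT 10;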
II) Change to LIKE pattern matching w/ SQL to provide OR behavior: have multiple LIKE patterns in the url_pattern column separated by '|' (for example). Then use split_part to match each sub-pattern - a bit complex and possibly slow, but it works. Like this:
SELECT u.url
FROM urls_table u
LEFT JOIN (SELECT split_part(url_pattern, '|', part_no::int) AS pattern
           FROM url_patterns
           CROSS JOIN (SELECT row_number() OVER () AS part_no FROM urls_table)
           WHERE "group" = 'group1'
          ) p
  ON u.url LIKE p.pattern
WHERE p.pattern IS NOT NULL;
You will also need to change your pattern strings to use the simpler LIKE format and use '|' for multiple possibilities - Ex: Group1 pattern becomes 'https://myurl1/test/one/es/%|https://myurl1/test/one/en/%'
III) Use some front-end query modification to find the pattern for the group and apply it to the query BEFORE it is sent to the compiler. This could be an external tool or a stored procedure on Redshift. Get the pattern in one query and use it to issue the second query.
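A minimal sketch of this option as two separate statements, where the literal in the second query is just the group1 pattern from the table above:
-- query 1: fetch the pattern for the group
SELECT url_pattern FROM url_patterns WHERE "group" = 'group1';
-- query 2: issued afterwards, with the fetched pattern inlined as a literal
SELECT *
FROM urls_table
WHERE url SIMILAR TO 'https://myurl1/test/one/(es|en)/%'
LIMIT 10;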
Do you want EXISTS?
SELECT u.*
FROM urls_table u
WHERE EXISTS (SELECT 1
              FROM url_patterns p
              WHERE u.url SIMILAR TO p.url_pattern AND
                    p."group" = 'group1'
             )
LIMIT 10;

Need Column data to be the ROW header for my query

I am trying to use a LATERAL JOIN on a particular data set, however I cannot seem to get the syntax correct for the query.
What am I trying to achieve:
Take the first column in the dataset (see picture) and use its values as the column headers, then populate the row with the data from the stringvalue column.
Currently it appears like this:
cfname              | stringvalue       |
-----------------------------------------
customerrequesttype | newformsubmission |
Assignmentgroup     | ITDEPT            |
and I would like to have it appear as this:
customerrequesttype | Assignmentgroup |
---------------------------------------
newformsubmission   | ITDEPT          |
As mentioned, I am very new to SQL and only know limited basics.
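A minimal sketch of that kind of pivot, using conditional aggregation (the table name source_table and the grouping key requestid are assumptions; the cfname/stringvalue columns come from the sample above):
-- one output row per request, with each cfname value promoted to its own column
SELECT
    MAX(CASE WHEN cfname = 'customerrequesttype' THEN stringvalue END) AS customerrequesttype,
    MAX(CASE WHEN cfname = 'Assignmentgroup'     THEN stringvalue END) AS Assignmentgroup
FROM source_table
GROUP BY requestid;                    -- assumed key identifying one request's rows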

How to sort string data that represents numbers

My client has a set of numeric data stored in a string field in a database. So of course it doesn't sort correctly. These rows sort like this:
105
3
44
When they should sort like this:
3
44
105
This is very much a legacy database and I can't change it at all. I also can't change the software that uses the database. The client doesn't own it or have the source code. It has never worked the way they want. However, there is an unused string field that I could use to sort on (only a small number of fields can be sorted on.)
What I would like to do is take the input data, derive a string from it, and store the new string in the unused field, such that when the data is sorted on this new data, the original data sorts correctly, i.e., numerically.
So, for an overly simplistic example, if the algorithm produced the following new data:
105 -> c
3 -> a
44 -> b
Then when the second column was sorted, the first column would look 'correct'.
The tricky bit is that when new rows are added to the database, they must also sort correctly, without having to regenerate the sort data for all rows. This is the part of the problem that has my brain in a twist. I'm not sure it's actually possible.
You can assume that the number will never be more than 5 'digits'.
I realize this is a total kludge, but since I can't change the system, I have to find a work around, rather than a quality solution. Welcome to the real world.
~~~~~~~~~~~~~~~~~~~~~~ S O L U T I O N ~~~~~~~~~~~~~~~~~~
I don't think this is an uncommon problem, so here are the results of Gordon's solution:
mysql> select * from t order by new;
+------+------------+
| orig | new        |
+------+------------+
| 3    | 0000000003 |
| 44   | 0000000044 |
| 105  | 0000000105 |
+------+------------+
In most databases, you can just do:
order by cast(col as int)
This will convert the string representation to a number and use that for ordering. There is no need for an additional column. If you add one, I would recommend adding a numeric column to contain the actual value.
If you really want to store something in the unused field, then you can left pad the number. How to do this depends on the database, but here is one typical method:
update t
set unused = right(concat('0000000000', col), 10);
Not all databases support these two specific functions, but all offer this basic functionality in some form.
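For reference, the same left-padding could look roughly like this in two other common systems (a sketch; t, col, and unused are the placeholder names from above):
-- SQL Server: REPLICATE builds the padding, RIGHT keeps the last 10 characters
UPDATE t
SET unused = RIGHT(REPLICATE('0', 10) + col, 10);
-- MySQL: LPAD pads directly to a fixed width
UPDATE t
SET unused = LPAD(col, 10, '0');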
Try something like
SELECT column1 FROM table1 ORDER BY LENGTH(column1) ASC, column1 ASC
(Adjust the column and table name for your environment.)
This is a bit of a hack but works as long as the "numbers" in your string column are natural, non-negative numbers only.
If you are looking for a more sophisticated approach or algorithm, try searching for natural sort together with your DBMS.

SQL Server 2014 equivalent to mysql's find_in_set()

I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1          | 0
2          | 1
3          | 1,2
4          | 1
5          | 1,4
6          | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a CSV string of the locationIDs of all its ancestors (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree, given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a csv string to look for a value. It's nice because I can just say "find in set the value 4" which would give all locations that are descendants of locationID of 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014 and it has no such function. The CSV string is a variable length (virtually unlimited levels allowed) and I need a way to find all ancestors of a location.
A lot of what I've found on the internet to mimic the find_in_set function in SQL Server assumes a fixed depth of hierarchy (such as a 4-level maximum), which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table to use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationid},% would work but be pretty slow.
I think you want like -- in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like @LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like @LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).
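A minimal sketch of that recursive option, assuming (as in the sample data) that each row's direct parent is the last ID in its locationHierarchy string, and using a placeholder variable @startID for the ancestor:
DECLARE @startID int = 4;              -- the ancestor whose descendants we want

WITH descendants AS (
    -- anchor: the starting location itself
    SELECT locationID, locationHierarchy
    FROM locations
    WHERE locationID = @startID
    UNION ALL
    -- recursive step: rows whose last path element is a location already found
    SELECT l.locationID, l.locationHierarchy
    FROM locations l
    JOIN descendants d
      ON RIGHT(l.locationHierarchy,
               CHARINDEX(',', REVERSE(l.locationHierarchy) + ',') - 1)
         = CAST(d.locationID AS varchar(10))
)
SELECT *
FROM descendants;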
Finally, it worked for me:
SELECT * FROM locations
WHERE locationHierarchy LIKE CONCAT(@param, ',%')
   OR locationHierarchy LIKE CONCAT('%,', @param, ',%')
   OR locationHierarchy LIKE CONCAT('%,', @param);

Flat File Import: Remove Data

(Posted a similar question earlier but HR department changed conditions today)
Our HR department has an automated export from our SAP system in the form of a flat file. The information in the flat file looks like so.
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 01/01/2013 | 406.25 |
| 02/01/2013 | 283.33 |
| 03/21/2013 |1,517.18 |
--------------------------
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 05/01/2013 | 406.25 |
| 06/01/2013 | 283.33 |
| 07/21/2013 |1,517.18 |
--------------------------
When I look at the data in the SSIS Flat File Source Connection all of the information is in a single column. I have tried to use the Delimiter set to Pipe but it will not separate the data, I assume due to the nonessential information at the top and middle of the file.
I need to remove the data at the top and middle and then have the Date and Total split into two separate columns.
The goal of this is to separate the data so that I can get a single SUM for the running year.
Year Total
2013 $5123.25
I have tried to do this in SSIS but I can't seem to separate the columns or remove the data. I want to avoid a script task, as I am not familiar with the code or operation of that component.
Any assistance would be appreciated.
I would create a temp table that can import the whole flat file, and after that do the filtering at the SQL level.
An example:
CREATE TABLE tmp (txtline VARCHAR(MAX));
BCP or SSIS the file into the tmp table.
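For instance, a minimal BULK INSERT sketch could load it (the file path and row terminator are assumptions about your environment):
-- load each raw line of the export into the single-column staging table
BULK INSERT tmp
FROM 'C:\exports\hr_export.txt'        -- hypothetical path to the SAP export
WITH (ROWTERMINATOR = '\n');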
Run a query like this to get the result (you may need to adjust the string offsets and lengths to fit your flat file):
WITH cte AS (
    SELECT
        CAST(SUBSTRING(txtline, 2, 10) AS DATE) AS PostingDate,
        CAST(REPLACE(REPLACE(SUBSTRING(txtline, 15, 100), '|', ''), ',', '') AS NUMERIC(19,4)) AS LCAmount
    FROM tmp
    WHERE ISDATE(SUBSTRING(txtline, 2, 10)) = 1
)
SELECT
    YEAR(PostingDate),
    SUM(LCAmount)
FROM cte
GROUP BY YEAR(PostingDate)
Maybe you could use MS Excel to open the flat file, using the pipe character as the delimiter, and then create a CSV from that, if needed.
Short of a script task/component (or a full-blown custom SSIS component), I don't think you'll be able to parse that specific format in SSIS. The Flat File Connection Manager does allow you to select how many rows of your text file are headers to be skipped, but the format you're showing has multiple sections (and thus multiple headers). There's also the issue of the horizontal lines, which the Flat File Connection won't be able to properly handle.
I'd first see if there's any way to get a normal CSV file with this data out of SAP. If that turns out to be impossible, then you'll need some sort of custom code to strip out the excess text.