SQL XML parsing split the string on letters 'TH'


I want to split a string into multiple values based on a special symbol. For example, here is the string:
JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN
I want it to be:
Jdwnrh THIMPHU OTH 10/1991 02/02/2011 BHUTAN
I am using the following SQL:
DECLARE @delimiter VARCHAR(50)
SET @delimiter='þ'
;WITH CTE AS
(
SELECT
CAST('<M>' + REPLACE(REPLACE(CAST(DATA as nvarchar(MAX)), @delimiter , '</M><M>'), '&', '&amp;') + '</M>' AS XML)
AS BDWCREGPREVADDR_XML
FROM [JACS_RAVEN_DATA_OLD].dbo.BDWCREGPREVADDR
)
SELECT
BDWCREGPREVADDR_XML.value('/M[1]', 'varchar(50)') As streetNo,
BDWCREGPREVADDR_XML.value('/M[2]', 'varchar(50)') As suburb,
BDWCREGPREVADDR_XML.value('/M[3]', 'varchar(3)') As stateCode,
BDWCREGPREVADDR_XML.value('/M[4]', 'varchar(10)') As postalCode,
BDWCREGPREVADDR_XML.value('/M[7]', 'varchar(50)') As country,
BDWCREGPREVADDR_XML.value('/M[5]', 'varchar(50)') As dateFrom,
BDWCREGPREVADDR_XML.value('/M[6]', 'varchar(50)') As dateTo
FROM CTE
GO
The query works well on all strings other than the one given as an example. For the string above, the query returns the following:
'Jdwnrh' ' ' 'IMPHU' 'O' ' ' '10/1991' '02/02/2011' 'BHUTAN'
It seems the code treats the letters 'TH' as a delimiter and splits the string on them. Does anyone know how to resolve this issue?

This seems to be related to your collation. In Latin1_General_CS_AS, the þ character is considered equivalent to th (because it's an Old English letter that sounds like "th" when pronounced).
print replace('thornþ' collate Latin1_General_CS_AS,'þ','1')
-- output: 1orn1
This is not the case for all collations; for example, in Latin1_General_BIN they are separate:
print replace('thornþ' collate Latin1_General_BIN,'þ','1')
-- output: thorn1
So perhaps you could look at changing the collation of the column which contains the þ characters.
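If changing the column's collation is not an option, another possibility is to force a binary collation only inside the REPLACE call. The following is a minimal sketch, not a tested fix for the original table: the variable @data is a stand-in for the DATA column and holds the sample string from the question.

```sql
-- @data is a hypothetical stand-in for the DATA column from the question.
DECLARE @delimiter NVARCHAR(50) = N'þ';
DECLARE @data NVARCHAR(200) = N'JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN';

WITH CTE AS
(
    SELECT CAST(
               N'<M>'
             -- COLLATE forces a binary comparison, so only the literal þ matches
             + REPLACE(
                   REPLACE(@data COLLATE Latin1_General_BIN, @delimiter, N'</M><M>'),
                   N'&', N'&amp;')
             + N'</M>' AS XML) AS x
)
SELECT
    x.value('/M[1]', 'varchar(50)') AS streetNo,
    x.value('/M[2]', 'varchar(50)') AS suburb,
    x.value('/M[3]', 'varchar(3)')  AS stateCode,
    x.value('/M[4]', 'varchar(10)') AS postalCode,
    x.value('/M[5]', 'varchar(50)') AS dateFrom,
    x.value('/M[6]', 'varchar(50)') AS dateTo,
    x.value('/M[7]', 'varchar(50)') AS country
FROM CTE;
```

With a binary collation, 'TH' inside THIMPHU and OTH should no longer be treated as the delimiter. Note that a binary comparison is also case-sensitive, so an uppercase Þ would not match a lowercase þ delimiter.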

The delimiter (þ) is the problem; if you change it to another character, the query works. Running the query with þ and with z as the delimiter gives two different results (the screenshots are not preserved here).
I think þ has some special meaning. Hope this helps.

Related

Escape XML special characters upon convert

I have working csv splitter for my needs.
You can just grab and run it as is:
declare @t table(data varchar(max))
insert into @t select 'a,b,c,d'
insert into @t select 'e,,,h'
;with cte(xm) as
(
select convert(xml,'<f><e>' + replace(data,',', '</e><e>') + '</e></f>') as xm
from @t
)
select
xm.value('/f[1]/e[1]','varchar(32)'),
xm.value('/f[1]/e[2]','varchar(32)'),
xm.value('/f[1]/e[3]','varchar(32)'),
xm.value('/f[1]/e[4]','varchar(32)')
from cte
The only issue is that if I introduce an XML-sensitive character in the data, like &:
insert into @t select 'i,j,&,k'
It fails with error: character 24, illegal character
One solution is to replace the & character with &amp; on the fly, like this:
select convert(xml,'<f><e>' + replace(replace(data,'&','&amp;'),',', '</e><e>') + '</e></f>') as xm
but there are several dozen special XML characters which I need to escape upon convert, and I can't really nest dozens of replace(replace(replace(... calls in there. That's what I did and it is messy.
How can the above code be modified to escape XML-sensitive characters and produce the same result?
Thanks!

You have already got your answer from Martin Smith, but I think it is worth placing an answer here for followers. I want to provide some explanation, and furthermore the rextester link might not be reachable in the future...
If you think of a string in a table like this ...
DECLARE @mockup TABLE(SomeXMLstring VARCHAR(100));
INSERT INTO @mockup VALUES('This is a string with forbidden characters like "<", ">" or "&"');
-- ... you can easily add XML-tags:
SELECT '<root>' + SomeXMLstring + '</root>'
FROM @mockup ;
--The result would look like XML
<root>This is a string with forbidden characters like "<", ">" or "&"</root>
--But it is not! You can test this, the CAST( AS XML) will fail:
SELECT CAST('<root>This is a string with forbidden characters like "<", ">" or "&"</root>' AS XML);
--Sometimes people try to do their own replaces and start to replace <, > and & with the corresponding entities &lt;, &gt; and &amp;. But this will need a lot of replacements in order to be safe.
--But XML is doing all this for us implicitly
SELECT SomeXMLstring
FROM @mockup
FOR XML PATH('')
--This is the result
<SomeXMLstring>This is a string with forbidden characters like "&lt;", "&gt;" or "&amp;"</SomeXMLstring>
--And the funny thing is: We can easily create a nameless element with AS [*]:
SELECT SomeXMLstring AS [*]
FROM @mockup
FOR XML PATH('')
--The result is the same, but without the tags:
This is a string with forbidden characters like "&lt;", "&gt;" or "&amp;"
--Although this looks like XML in SSMS, it will be implicitly cast to NVARCHAR(MAX) when used as a string.
--You can use this for implicit escaping of a string wherever you feel the need to build an XML with string concatenation:
SELECT CAST('<root>' + (SELECT SomeXMLstring AS [*] FOR XML PATH('')) + '</root>' AS XML)
FROM @mockup ;
To finally answer your question
This line must use the trick:
select convert(xml,'<f><e>' + replace((SELECT data AS [*] FOR XML PATH('')),',', '</e><e>') + '</e></f>') as xm
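Put together with the splitter from the question, the trick could look like the sketch below. The last two sample rows are added here for illustration and are not from the original post; the column aliases c1 to c4 are likewise hypothetical.

```sql
DECLARE @t TABLE (data VARCHAR(MAX));
INSERT INTO @t VALUES
    ('a,b,c,d'),
    ('e,,,h'),
    ('i,j,&,k'),     -- the row that failed in the question
    ('x<y,1>0,q,z'); -- extra illustrative row with < and >

WITH cte(xm) AS
(
    SELECT CONVERT(XML,
               '<f><e>'
             -- FOR XML PATH('') escapes &, < and > before the commas become tags
             + REPLACE((SELECT data AS [*] FOR XML PATH('')), ',', '</e><e>')
             + '</e></f>')
    FROM @t
)
SELECT
    xm.value('/f[1]/e[1]', 'varchar(32)') AS c1,
    xm.value('/f[1]/e[2]', 'varchar(32)') AS c2,
    xm.value('/f[1]/e[3]', 'varchar(32)') AS c3,
    xm.value('/f[1]/e[4]', 'varchar(32)') AS c4
FROM cte;
```

Because the entities produced by the escaping (such as &amp;) contain no commas, replacing the commas afterwards should not corrupt them, and .value() returns the unescaped text again.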

Remove white spaces from string in sql

SELECT CONVERT(DECIMAL(18,2)
,ROUND( REPLACE
(
REPLACE(
SUBSTRING([ColumnName],0, CHARINDEX(' ',[ColumnName],1) )
,'$',''
)
,',',''
)
,2
)
) AS 'ColumnName'
,[ColumnName]
,*
FROM TABLENAME
CHARINDEX returns the index of the space, but when there is no space in the data it returns 0. What I want is this: whenever there is white space after the value, SUBSTRING should stop there, and when there is no white space it should use the length of the string.

It seems you are working with SQL Server, so I would use a case expression:
case when CHARINDEX(' ',[ColumnName],1) > 0
then CHARINDEX(' ',[ColumnName],1)
else len([ColumnName]) end
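As a sketch of how such a CASE expression might be wired into the original query, the example below uses LEFT instead of SUBSTRING(..., 0, n), because SUBSTRING with a start of 0 returns one character fewer than the requested length. The table variable and its sample rows are assumptions for illustration only.

```sql
-- Hypothetical sample data; ColumnName mirrors the placeholder in the question.
DECLARE @t TABLE (ColumnName VARCHAR(50));
INSERT INTO @t VALUES ('$1,234.50 USD'), ('$99.99');

SELECT ColumnName,
       CONVERT(DECIMAL(18,2),
           REPLACE(REPLACE(
               LEFT(ColumnName,
                    CASE WHEN CHARINDEX(' ', ColumnName) > 0
                         THEN CHARINDEX(' ', ColumnName) - 1  -- stop before the space
                         ELSE LEN(ColumnName)                 -- no space: take it all
                    END),
               '$', ''), ',', '')) AS Amount
FROM @t;
```

A value that begins with a space would leave an empty string for CONVERT and raise an error, so real data may need an LTRIM first.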

Apparently you're working with sql-server and want to convert a string like $123,456,789.99 to a DECIMAL.
TRY_PARSE should get you the result you want:
SELECT
TRY_PARSE(REPLACE('$123,456,789.99 ', '$', '')
AS DECIMAL(18,2)
USING 'en-US') AS myDecimalNumber
Or if it's an option for you to work with the MONEY data-type you can omit the REPLACE:
SELECT
TRY_PARSE('$123,456,789.99 '
AS MONEY
USING 'en-US') AS myDecimalNumber

Using Upper to Capitalize the first letter of City name

I am doing some data clean-up and need to capitalize the first letter of city names. How do I capitalize the second word in a city like Terra Bella?
SELECT UPPER(LEFT([MAIL CITY],1))+
LOWER(SUBSTRING([MAIL CITY],2,LEN([MAIL CITY])))
FROM masterfeelisting
My result is 'Terra bella' and I need 'Terra Bella'. Thanks in advance.

Ok, I know I answered this before, but it bugged me that we couldn't write something efficient to handle an unknown amount of 'text segments'.
So re-thinking it and researching, I discovered a way to change the [MAILCITY] field into XML nodes where each 'text segment' is assigned its own node within the xml field. Then those xml fields can be processed node by node, concatenated together, and then changed back to a SQL varchar. It's convoluted, but it works. :)
Here's the code:
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(max) not null
);
INSERT INTO #masterfeelisting VALUES
('terra bellA')
,(' terrA novA ')
,('chicagO ')
,('bostoN')
,('porT dE sanTo')
,(' porT dE sanTo pallo ');
SELECT
RTRIM
(
(SELECT
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, '')) + ' '
FROM [xmlNodeRecordSet].[nodeField].nodes('/N') as [xmlField]([xmlNode]) FOR
xml path(''), type
).value('.', 'varchar(max)')
) as [MAILCITY]
FROM
(SELECT
CAST('<N>' + REPLACE([MAILCITY],' ','</N><N>')+'</N>' as xml) as [nodeField]
FROM #masterfeelisting
) as [xmlNodeRecordSet];
Drop table #masterfeelisting;
First I create a table and fill it with dummy values.
Now here is the beauty of the code:
For each record in #masterfeelisting, we are going to create an xml field with a node for each 'text segment'.
ie. '<N></N><N>terrA</N><N>novA</N><N></N>'
(This is built from the varchar ' terrA novA ')
1) The way this is done is by using the REPLACE function.
The string starts with a '<N>' to designate the beginning of the node. Then:
REPLACE([MAILCITY],' ','</N><N>')
This effectively goes through the whole [MAILCITY] string and replaces each
' ' with '</N><N>'
and then the string ends with a '</N>'. Where '</N>' designates the end of each node.
So now we have a beautiful XML string with a couple of empty nodes and the 'text segments' nicely nestled in their own node. All the 'spaces' have been removed.
2) Then we have to CAST the string into xml. And we will name that field [nodeField]. Now we can use xml functions on our newly created record set. (Conveniently named [xmlNodeRecordSet].)
3) Now we can read the [xmlNodeRecordSet] into the main sub-Select by stating:
FROM [xmlNodeRecordSet].[nodeField].nodes('/N')
This tells us we are reading the [nodeField] as nodes with a '/N' delimiter.
This table of node fields is then parsed by stating:
as [xmlField]([xmlNode]) FOR xml path(''), type
This means each [xmlField] will be parsed for each [xmlNode] in the xml string.
4) So in the main sub-select:
Each blank node '<N></N>' is discarded. (Or not processed.)
Each node with a 'text segment' in it will be parsed. ie <N>terrA</N>
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
This code will grab each node out of the field, take its contents '.' and only grab the first character 'char(1)'. Then it will upper case that character. The plus sign at the end means it will concatenate this letter with the next bit of code:
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, ''))
Now here is the beauty... STUFF is a function that will take a string, from a position, for a length, and substitute another string.
STUFF(string, start position, length, replacement string)
So our string is:
[xmlField].[xmlNode].value('.', 'varchar(max)')
Which grabs the whole string inside the current node since it is 'varchar(max)'.
The start position is 1. The length is 1. And the replacement string is ''. This effectively strips off the first character by replacing it with nothing. So the remaining string is all the other characters that we want to have lower case. So that's what we do... we use LOWER to make them all lower case. And this result is concatenated to our first letter that we already upper cased.
But wait... we are not done yet... we still have to append a + ' '. Which adds a blank space after our nicely capitalized 'text segment'. Just in case there is another 'text segment' after this node is done.
This main sub-Select will now parse each node in our [xmlField] and concatenate them all nicely together.
5) But now that we have one big happy concatenation, we still have to change it back from an xml field to a SQL varchar field. So after the main sub-select we need:
.value('.', 'varchar(max)')
This changes our [MAILCITY] back to a SQL varchar.
6) But hold on... we still are not done. Remember we put an extra space at the end of each 'text segment'? Well, the last 'text segment' still has that extra space after it. So we need to right-trim that space off by using RTRIM.
7) And don't forget to rename the final field back to [MAILCITY].
8) And that's it. This code will take an unknown number of 'text segments' and format each one of them. All using the fun of XML and its node parsers.
Hope that helps :)
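The per-node step described above can be tried on its own. A minimal sketch, using a hypothetical variable @word in place of the node value:

```sql
-- @word is a hypothetical stand-in for one node's text.
DECLARE @word VARCHAR(20) = 'terrA';

SELECT UPPER(LEFT(@word, 1))          AS FirstLetter,  -- 'T'
       STUFF(@word, 1, 1, '')         AS Remainder,    -- 'errA'
       UPPER(LEFT(@word, 1))
     + LOWER(STUFF(@word, 1, 1, ''))  AS Capitalized;  -- 'Terra'
```

STUFF returns NULL when the start position is beyond the end of the string, which is why an empty node contributes nothing to the concatenated result.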

Here's one way to handle this using APPLY. Note that this solution supports up to 3 substrings (e.g. "Phoenix", "New York", "New York City") but can easily be updated to handle more.
DECLARE @string varchar(100) = 'nEW yoRk ciTY';
WITH DELIMCOUNT(String, DC) AS
(
SELECT @string, LEN(RTRIM(LTRIM(@string)))-LEN(REPLACE(RTRIM(LTRIM(@string)),' ',''))
),
CIPOS AS
(
SELECT *
FROM DELIMCOUNT
CROSS APPLY (SELECT CHARINDEX(char(32), string, 1)) CI1(CI1)
CROSS APPLY (SELECT CHARINDEX(char(32), string, CI1.CI1+1)) CI2(CI2)
)
SELECT
OldString = @string,
NewString =
CASE DC
WHEN 0 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,8000))
WHEN 1 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,100))
WHEN 2 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,CI2-(CI1+1))) +
UPPER(SUBSTRING(string,CI2+1,1))+LOWER(SUBSTRING(string,CI2+2,100))
END
FROM CIPOS;
Results:
OldString NewString
--------------- --------------
nEW yoRk ciTY New York City

This will only capitalize the first letter of the second word. It is a shorter but less flexible approach. Replace @str with [Mail City].
DECLARE @str AS VARCHAR(50) = 'Los angelas'
SELECT STUFF(@str, CHARINDEX(' ', @str) + 1, 1, UPPER(SUBSTRING(@str, CHARINDEX(' ', @str) + 1, 1)));

This is a way to use nested Selects for three city name parts.
It uses CHARINDEX to find the location of your separator character. (ie a space)
I put an 'if' structure around the Select to test if you have any records with more than 3 parts to the city name. If you ever get the warning message, you could add another sub-Select to handle another city part.
Although... just to be clear... SQL is not the best language to do complicated formatting. It was written as a data retrieval engine with the idea that another program will take that data and massage it into a friendlier look and feel. It may be easier to handle the formatting in another program. But if you insist on using SQL and you need to account for city names with 5 or more parts... you may want to consider using Cursors so you can loop through the variable possibilities. (But Cursors are not a good habit to get into. So don't do that unless you've exhausted all other options.)
Anyway, the following code creates and populates a table so you can test the code and see how it works. Enjoy!
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(30) not null
);
Insert into #masterfeelisting select 'terra bella';
Insert into #masterfeelisting select ' terrA novA ';
Insert into #masterfeelisting select 'chicagO ';
Insert into #masterfeelisting select 'bostoN';
Insert into #masterfeelisting select 'porT dE sanTo';
--Insert into #masterfeelisting select ' porT dE sanTo pallo ';
Declare @intSpaceCount as integer;
SELECT @intSpaceCount = max (len(RTRIM(LTRIM([MAILCITY]))) - len(replace([MAILCITY],' ',''))) FROM #masterfeelisting;
if @intSpaceCount > 2
SELECT 'You need to account for more than 3 city name parts ' as Warning, @intSpaceCount as SpacesFound;
else
SELECT
cThird.[MAILCITY1] + cThird.[MAILCITY2] + cThird.[MAILCITY3] as [MAILCITY]
FROM
(SELECT
bSecond.[MAILCITY1] as [MAILCITY1]
,SUBSTRING(bSecond.[MAILCITY2],1,bSecond.[intCol2]) as [MAILCITY2]
,UPPER(SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 1, 1)) +
SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 2,LEN(bSecond.[MAILCITY2]) - bSecond.[intCol2]) as [MAILCITY3]
FROM
(SELECT
SUBSTRING(aFirst.[MAILCITY],1,aFirst.[intCol1]) as [MAILCITY1]
,UPPER(SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, 1)) +
SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 2,LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) as [MAILCITY2]
,CHARINDEX ( ' ', SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) ) as intCol2
FROM
(SELECT
UPPER (LEFT(RTRIM(LTRIM(mstr.[MAILCITY])),1)) +
LOWER(SUBSTRING(RTRIM(LTRIM(mstr.[MAILCITY])),2,LEN(RTRIM(LTRIM(mstr.[MAILCITY])))-1)) as [MAILCITY]
,CHARINDEX ( ' ', RTRIM(LTRIM(mstr.[MAILCITY]))) as intCol1
FROM
#masterfeelisting as mstr -- Initial Master Table
) as aFirst -- First Select Shell
) as bSecond -- Second Select Shell
) as cThird; -- Third Select Shell
Drop table #masterfeelisting;

SQL Server - contains an invalid XML identifier as required by FOR XML;

I'm running this query and getting below mentioned error. Can anyone help?
Column name 'Middle Name' contains an invalid XML identifier as required by
FOR XML; ' '(0x0020) is the first character at fault.
SELECT
Username as [LastName],
'' AS [Middle Name],
'' AS Birthdate,
'' AS [SSN],
0 AS [Wage Amount]
FROM
Employee
FOR XML PATH

You can't have spaces in XML element or attribute names. Use
SELECT Username AS [LastName],
'' AS [MiddleName],
'' AS Birthdate,
'' AS [SSN],
0 AS [WageAmount]
FROM Employee
FOR XML PATH

For the simplest case, Smith's solution works all right.
Since I have a constraint to keep characters such as space, #, ', /, etc. in my XML, I finally solved this by encoding the identifier using Base64. (Just be careful that the length of the name cannot exceed 128 characters.) Then, outside, where the XML is read as input data, another small piece of code translates the Base64 back to the original string.
CREATE FUNCTION [dbo].[ufn_string_To_BASE64]
(
@inputString VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN (
SELECT
CAST(N'' AS XML).value(
'xs:base64Binary(xs:hexBinary(sql:column("bin")))'
, 'VARCHAR(MAX)'
) Base64Encoding
FROM (
SELECT CAST(@inputString AS VARBINARY(MAX)) AS bin
) AS bin_sql_server_temp
)
END
GO
It's important to use VARCHAR, as that gives a shorter Base64 string.
You could include char(10) and char(13) in the identifier as well.
Dynamic SQL can help to build a temporary table to store intermediate data.
In my case, C# decodes the Base64 to string
if (value.StartsWith("_"))
{
var base64Encoded = value.Substring(1).Replace('_','=');
try
{
var data = System.Convert.FromBase64String(base64Encoded);
value = Encoding.GetEncoding(1252).GetString(data);
}
catch (Exception e)
{
log.LogInformation(e.Message);
}
}
Watch out for the following:
An XML identifier cannot start with a digit, so prefix it with _ and remove the prefix in C#.
= is not accepted in an XML identifier. It should be replaced by something else, such as _.
When decoding back to a string, use the code page that matches the original string's encoding, such as 1252 for French characters.
In practice this is more complex than what is described here.
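The answer decodes in C#, but for checking values inside SQL Server the reverse direction can be sketched with the same XQuery cast. The variable @encoded and its sample value are assumptions for illustration; the sketch does not handle the _ prefix or the = replacement described above.

```sql
-- @encoded is a hypothetical sample: the Base64 encoding of 'Middle Name'.
DECLARE @encoded VARCHAR(MAX) = 'TWlkZGxlIE5hbWU=';

SELECT CAST(
           CAST(N'' AS XML).value(
               'xs:base64Binary(sql:variable("@encoded"))',
               'VARBINARY(MAX)')
       AS VARCHAR(MAX)) AS DecodedName;
```

The final cast to VARCHAR interprets the bytes in the database's default code page, which matches the VARCHAR input used by the encoding function.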

SQL: how to select a substring between special characters

My string looks something like this:
\\\abcde\fghijl\akjfljadf\\
\\xyz\123
I want to select everything between the 1st set and next set of slashes
Desired result:
abcde
xyz
EDITED: To clarify, the special character is always slashes - but the leading characters are not constant, sometimes there are 3 slashes and other times there are only 2 slashes, followed by texts, and then followed by 1 or more slashes, some more texts, 1 or more slash, so on and so forth. I'm not using any adapter at all, just looking for a way to select this substring in my SQL query
Please advise.
Thanks in advance.

You could do a cross join to find the second position of the backslash. And then, use substring function to get the string between 2nd and 3rd backslash of the text like this:
SELECT substring(string, 3, (P2.Pos - 2)) AS new_string
FROM strings
CROSS APPLY (
SELECT (charindex('\', replace(string, '\\', '\')))
) AS P1(Pos)
CROSS APPLY (
SELECT (charindex('\', replace(string, '\\', '\'), P1.Pos + 1))
) AS P2(Pos)
UPDATE
In case, when you have unknown number of backslashes in your string, you could just do something like this:
DECLARE @string VARCHAR(255) = '\\\abcde\fghijl\akjfljadf\\'
SELECT left(ltrim(replace(@string, '\', ' ')),
charindex(' ',ltrim(replace(@string, '\', ' ')))-1) AS new_string
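The same idea can be applied to a whole table. The sketch below is an assumption-laden variant: the table variable and the third sample row are hypothetical, and a trailing space is appended before CHARINDEX so that a value with no slash after the first segment does not produce a negative length.

```sql
-- Hypothetical sample table; the third row has no trailing slash.
DECLARE @strings TABLE (string VARCHAR(255));
INSERT INTO @strings VALUES
    ('\\\abcde\fghijl\akjfljadf\\'),
    ('\\xyz\123'),
    ('\\solo');

SELECT s.string,
       -- appending ' ' guarantees CHARINDEX finds a terminator
       LEFT(t.trimmed, CHARINDEX(' ', t.trimmed + ' ') - 1) AS new_string
FROM @strings AS s
CROSS APPLY (SELECT LTRIM(REPLACE(s.string, '\', ' '))) AS t(trimmed);
```

For these rows the query should return abcde, xyz and solo. The approach assumes the text between slashes never contains a space, since spaces are used as the temporary separator.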

Use substring, like this (only works for the specified pattern of two slashes, characters, then another slash):
declare @str varchar(100) = '\\abcde\cc\xxx'
select substring(@str, 3, charindex('\', @str, 3) - 3)
Replace @str with the column you actually want to search, of course.
The charindex returns the location of the first slash, starting from the 3rd character (i.e. skipping the first two slashes). Then the substring returns the part of your string starting from the 3rd character (again, skipping the first two slashes), and continuing until just before the next slash, as determined by charindex.
Edit: To make this work with different numbers of slashes at the beginning, use patindex with a wildcard pattern to find the first alphanumeric character, instead of hardcoding that it should be the third character. Example:
declare @str varchar(100) = '\\\1Abcde\cc\xxx'
select substring(@str, patindex('%[a-zA-Z0-9]%', @str), charindex('\', @str, patindex('%[a-zA-Z0-9]%', @str)) - patindex('%[a-zA-Z0-9]%', @str))

APH's solution works better if your string always follows the pattern as described. However this will get the text despite the pattern.
declare @str varchar(100) = '\\abcde\fghijl\akjfljadf\\'
declare @srch char(1) = '\'
select
SUBSTRING(@str,
(CHARINDEX(@srch,@str,(CHARINDEX(@srch,@str,1)+1))+1),
CHARINDEX(@srch,@str,(CHARINDEX(@srch,@str,(CHARINDEX(@srch,@str,1)+1))+1))
- (CHARINDEX(@srch,@str,(CHARINDEX(@srch,@str,1)+1))+1)
)
Sorry for the formatting.
Edited to correct user paste error. :)
