Trimming spaces out of a string based on a pattern in SQL Server - sql

I have a varchar(max) field with XML data in it. I need to clean it by removing the spaces between the tags. For eg:
</tns:time_changed> <tns:changed_properties>
should be cleaned as
</tns:time_changed><tns:changed_properties>
I need to do this in a single query and I cannot use replace all white spaces as there are other relevant spaces in the content.

Try like this:
UPDATE table
SET xmlColumnName = REPLACE ( xmlColumnName , '> <' , '><' );

Converted to XML type and it automatically took care of the spaces.
Replaced the
<?xml version="1.0"?>
from the field with a blank and it got rid of the error that I was getting "text/xmldecl not at the beginning of input".

Related

Why does Replace '&' with '&' not work for XML data?

I need to download a XML file and its data is retrieved from stored procedure.
My problem is if the data contains any '&' symbol, in XML file it is showing as
'&'
I have used REPLACE function in my Procedure as shown below but...
SELECT #V_NAME = REPLACE(#V_NAME, ' & ', ' & ');
UPDATE #TMP_RS_XML
SET OBJECT_ID=#V_ID,
FNAME=#V_FILE,
DOCUMENT=(SELECT #V_NAME as 'Description',
...
Now, the output is:
&amp;
This is not the way this is supposed to work...
XML is not just some text with fancy extras but with very strict rules. As any text-based container you will need either magic words or special characters to tell the consumer what is the content and what is the markup.
The most important markup characters in XML are < and > - of course. If you want these characters to be part of your content, you'll have to replace them. That is done with xml entities.
Within the content, any XML entity will start with an ampersand (< comes out as <), therefore the ampersand is the third most important special character. If you want an ampersand within the content you must use an entitiy (&) as a code for in this place we want an ampersand.
You must distinguish between the text you see, when you look at the XML and the actual content taken out of the XML.
Try this:
DECLARE #SomeStringWithSpecialCharacters NVARCHAR(200)=N'This & that -> let''s see, why how some foreign characters behave: அரிச். And what about a line break?' + CHAR(13) + CHAR(10) + 'Here is the second line. And an unprintable?' + CHAR(2);
--Here we use FOR XML, all the escaping is done implicitly
SELECT #SomeStringWithSpecialCharacters AS TestIt FOR XML PATH('test');
The result
<test>
<TestIt>This & that -> let's see, why how some foreign characters behave: அரிச். And what about a line break?
Here is the second line. And an unprintable?</TestIt>
</test>
Now I take the XML as it came out of the first part and place it into a XML-typed variable.
Attention: I had to remove the  entity, check it out...
DECLARE #SomeXML XML=
N'<test>
<TestIt>This & that -> let''s see, why how some foreign characters behave: அரிச். And what about a line break?
Here is the second line. And an unprintable?</TestIt>
</test>';
--Now we do the magic using .value() against a native XML:
SELECT #SomeXML.value('(/test/TestIt/text())[1]','nvarchar(max)');
The result comes out with all entities re-espaced:
This & -> let's see, why how some foreign characters behave: அரிச். And what about a line break?
Here is the second line. And an unprintable?
The general hint is: Never do the replacements yourself. Pushing content into the XML will need escaping and reading content out of XML will need the opposite. All this is done for you implicitly, when you use the proper tools.
'&' is a special character that is being rendered out of ' &amp ; '
The best practice here would be to decode the XML, adding a reference below:
https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?redirectedfrom=MSDN&view=netframework-4.8#overloads

Getting Unescaped JSON from SQL

I've created a stored procedure to pull data as a JSON object from my SQL Server database. All my data is relational and I'm trying to get it out as a JSON string.
Currently, I am able to get out a JSON string from SQL Server just fine, however this object ALWAYS includes escape characters (e.g. "{\"field\":\"value\"}). I'd like to pull the same JSON but without escaped characters. To test this I'm using some simple queries and getting them into .NET with a SqlDataAdapter using my stored procedure.
The thing that puzzles me is that when I run my query within SSMS, I never see any escape characters, but as soon as it's pulled a .NET application, the escape characters appear. I'd like to prevent this from happening and have my applications get only the unescaped JSON string.
I've tried several suggestions I've found during my research but nothing has produced my desired results. The changes I've seen (documented in MSDN and in other SO posts) have dealt with getting unescaped results, but only within SSMS and not within other applications.
What I've tried:
Simple Json query set to param and then using JSON_QUERY to select the param:
DECLARE #JSON varchar(max)
SET #JSON = (SELECT '{"Field":"Value"}' AS myJson FOR JSON PATH)
SELECT JSON_QUERY(#JSON) AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":[{"myJson":"{\"Field\":\"Value\"}"}]}]
Simple Json query without param using JSON_QUERY:
SELECT JSON_QUERY('{"Field":"Value"}') AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS
[{"JsonResponse":{"Field":"Value"}}]
Simple Json query with temp tables using JSON_QUERY:
CREATE TABLE #temp(
jsoncol varchar(255)
)
INSERT INTO #temp VALUES ('{"Field":"Value"}')
SELECT JSON_QUERY(jsoncol) AS 'JsonResponse' FROM #temp FOR JSON PATH
DROP TABLE #temp
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":{"Field":"Value"}}]
I'm lead to believe that there is no way to get out a JSON string from SQL Server without having the escaped characters. In case the examples above weren't enough, I've included my stored procedure here. Hopefully someone can point me in the right direction.
This depends where you look at the string...
In SSMS a string is marked with single quotes. The double quote can exist within a string without problems:
DECLARE #SomeString = 'This can include "double quotes" but you have to double ''single quote''';
In a C# application the double quote is the string marker. So the above example would look like this:
string SomeString = "This must escape \"double quotes\" but you can use 'single quote' without problems";
Within your IDE (is it VS?) you can look at the string as is or as you'd need to be used in code. Your example shows " at the beginning and at the end of your string. That is a clear hint, that this is the option as in code. You could use this string and place it into your code. The real string, which is used and processed will not contain escape characters.
Hint: Escape characters are only needed in human-readable formats, where there are characters with special meaning (a ; in a CSV, a < in HTML and so on)...
UPDATE Some more explanation
Escape characters are needed to place a string within a string. Somehow you have to mark the beginning and the end of the string, but there is nothing else you can use then some magic characters.
In order to use these characters within the embedded string you have to go one the following ways:
escaping (e.g. XML will replace & with & and JSON will replace a " with \" as JSON uses the " to mark its labels) or
Magic borders (e.g. a CDATA-section in XML, which allows to place unescaped characters as is: <![CDATA[forbidden characters &<> allowed here]]>)
Whatever you do, you must distinguish between the visible string in an editor or in a text-based container like XML or JSON and the value the application will pick out of this.
An example:
<root><a>this & that</a></root>
visible string: "this & that"
real value: "this & that"

How to check my data in SQL Server have carriage return and line feed? [duplicate]

This question already has answers here:
SQL query for a carriage return in a string and ultimately removing carriage return
(10 answers)
Closed 8 years ago.
Facing a problem, it seems my data stored in SQL Server does not stored correctly, simply put, how to verify that a varchar data has carriage return and line feed in it? I try to print them out, does not show the special characters.
Thanks
You can use SQL Server's char(n) and contains() functions to match on field contents in your WHERE clause.
carriage return: char(13)
line feed: char(10)
The following SQL will find all rows in some_table where the values of some_field contain newline and/or carriage return characters:
SELECT * FROM some_table
WHERE CONTAINS(some_field, char(13)) OR CONTAINS(some_field, char(10))
To remove carriage returns in your some_field values you could use the replace() function long with char() and contains(). The SQL would look something like:
UPDATE some_table
SET some_field = REPLACE(some_field, char(13), '')
WHERE CONTAINS(some_field, char(13))
To remove new lines you can follow up the last statement with the char(10) version of that update. How you do it all depends on what types of newlines your text contains. But, depending on where the text was inserted/pasted from the new lines may be \r\n or \n so running updates against both \r and \n characters would be safer than assuming that you're getting one version of newline or the other.
Note that if the newlines were removed and you want to retain them then you have to fix the issue at the point of entry. You can't replace or fix what has been removed so you should save the original text data in a new column that holds the original, unmodified text.
To add to what others have said; when I need to embed newlines in T-SQL, I tend to do;
DECLARE #nl CHAR(2) = CHAR(13) + CHAR(10);
..then use #nl as required. That's for Windows line-endings, naturally.
Take a look at the Char function. See MSDN. This will help look for the special characters.

DB2/iSeries SQL clean up CR/LF, tabs etc

I need to find and clean up line breaks, carriage returns, tabs and "SUB"-characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.
Thought I could do some search and replacing with the REPLACE() and CHR() functions, but it seems CHR() is not available on this system (Error: CHR in *LIBL type *N not found). Working with \t, \r, \n etc doesn't seem to be working either. The chars can be in the middle of strings or at the end of them.
DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC
Any hints on what I can do with this?
I used this SQL to find x'25' and x'0D':
SELECT
<field>
, LOCATE(x'0D', <field>) AS "0D"
, LOCATE(x'25', <field>) AS "25"
, length(trim(<field>)) AS "Length"
FROM <file>
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
And I used this SQL to replace them:
UPDATE <file>
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
If you want to clear up specific characters like carriage return (EBCDIC x'0d') and line feed (EBCDIC x'25') you should find the translated character in EBCDIC then use the TRANSLATE() function to replace them with space.
If you just want to remove undisplayable characters then look for anything under x'40'.
Here is an sample script that replaces X'41' by X'40'. Something that was creating issues at our shop:
UPDATE [yourfile] SET [yourfield] = TRANSLATE([yourfield], X'40',
X'41') WHERE [yourfield] like '%' concat X'41' concat '%'
If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.
Try TRANSLATE or REPLACE.
The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table got loaded in the first place, the CR may be x'0d' and the LF x'15', x'25'. An easy way to find out is to get to a green screen and do a DSPPFM against the table. Press F10 then F11 to view the table is raw, hexadecimal (over/under) format.
For details on the available functions see the
DB2 for i5/OS SQL Reference.
Perhaps the TRANSLATE() function will serve your needs.
TRANSLATE( data, tochars, fromchars )
...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with.
Using the DSPFFD command on your table should show the CCSID of your fields.
we struggled a lot to replace the new line char and carriage return from flat file.
Finally we used below sql to sort the issue.
REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')
Try it out
CR = CHR(13)
LF = CHR(10)

Replace() on a field with line breaks in it?

So I have a field that's basically storing an entire XML file per row, complete with line breaks, and I need to remove some text from close to three hundred rows. The replace() function doesn't find the offending text no matter what I do, and all I can find by searching is a bunchy of people trying to remove the line breaks themselves. I don't see any reason that replace() just wouldn't work, so I must just be formatting it wrong somehow. Help?
Edit: Here's an example of what I mean in broad terms:
<script>...</script><dependencies>...</dependencies><bunch of other stuff></bunch of other stuff><labels><label description="Field2" languagecode="1033" /></labels><events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events><a bunch more stuff></a bunch more stuff>
I need to just remove everything between the events tags. So my sql code is this:
replace(fieldname, '<events><event name="onchange" application="false" active="true"><script><![field2.DataValue = (some equation);
</script><dependencies /></event></events>', '')
I've tried it like that, and I've tried it all on one line, and I've tried using char(10) where the line breaks are supposed to be, and nothing.
Nathan's answer was close. Since this question is the first thing that came up from a search I wanted to add a solution for my problem.
select replace(field,CHAR(13)+CHAR(10),' ')
I replaced the line break with a space incase there was no break. It may be that you want to always replace it with nothing in which case '' should be used instead of ' '.
Hope this helps someone else and they don't have to click the second link in the results from the search engine.
Worked for me on SQL2012-
UPDATE YourTable
SET YourCol = REPLACE(YourCol, CHAR(13) + CHAR(10), '')
If your column is an xml typed column, you can use the delete method on the column to remove the events nodes. See http://msdn.microsoft.com/en-us/library/ms190254(v=SQL.90).aspx for more info.
try two simple tests.
try the replace on an xml string that has no double quotes (or single quotes) but does have CRLFs. Does it work? If yes, you need to escape the quote marks.
try the replace on an xml string that has no CRLFs. Does it work? Great. If yes use two nested replace() one for the CRLFs only, then a second outter replace for the string in question.
A lot of people do not remember that line breaks are two characters
(Char 10 \n, and Char 13 \r)
replace both, and you should be good.
SELECT
REPLACE(field , CHR(10)+CHR(13), '' )
FROM Blah..