How to preserve an ampersand (&) while using FOR XML PATH on SQL 2005 - sql

Are there any tricks for preventing SQL Server from entitizing chars like &, <, and >? I'm trying to output a URL in my XML file but SQL wants to replace any '&' with '&'
Take the following query:
SELECT 'http://foosite.com/' + RTRIM(li.imageStore)
+ '/ImageStore.dll?id=' + RTRIM(li.imageID)
+ '&raw=1&rev=' + RTRIM(li.imageVersion) AS imageUrl
FROM ListingImages li
FOR XML PATH ('image'), ROOT ('images'), TYPE
The output I get is like this (&s are entitized):
<images>
<image>
<imageUrl>http://foosite.com/pics4/ImageStore.dll?id=7E92BA08829F6847&raw=1&rev=0</imageUrl>
</image>
</images>
What I'd like is this (&s are not entitized):
<images>
<image>
<imageUrl>http://foosite.com/pics4/ImageStore.dll?id=7E92BA08829F6847&raw=1&rev=0</imageUrl>
</image>
</images>
How does one prevent SQL server from entitizing the '&'s into '&'?

There are situations where a person may not want well formed XML - the one I (and perhaps the original poster) encountered was using the For XML Path technique to return a single field list of 'child' items via a recursive query. More information on this technique is here (specifically in the 'The blackbox XML methods' section):
Concatenating Row Values in Transact-SQL
For my situation, seeing 'H&E' (a pathology stain) transformed into 'well formed XML' was a real disappointment. Fortunately, I found a solution... the following page helped me solve this issue relatively easily and without having re-architect my recursive query or add additional parsing at the presentation level (for this as well for as other/future situations where my child-rows data fields contain reserved XML characters): Handling Special Characters with FOR XML PATH
EDIT: code below from the referenced blog post.
select
stuff(
(select ', <' + name + '>'
from sys.databases
where database_id > 4
order by name
for xml path(''), root('MyString'), type
).value('/MyString[1]','varchar(max)')
, 1, 2, '') as namelist;

What SQL Server generates is correct. What you expect to see is not well-formed XML. The reason is that & character signifies the start of an entity reference, such as &. See the XML specification for more information.
When your XML parser parses this string out of XML, it will understand the & entity references and return the text back in the form you want. So the internal format in the XML file should not cause a problem to you unless you're using a buggy XML parser, or trying to parse it manually (in which case your current parser code is effectively buggy at the moment with respect to the XML specification).

Try this....
select
stuff(
(select ', <' + name + '>'
from sys.databases
where database_id > 4
order by name
for xml path(''), root('MyString'), type
).value('/MyString[1]','varchar(max)')
, 1, 2, '') as namelist;

Related

T-SQL XML node value

I'm trying to extract the values from the following xml document
<response>
<entry>
<title>the tales</title>
<subject-area code="1" abbrev="XX1">Test1</subject-area>
<subject-area code="2" abbrev="XX2">Test2</subject-area>
</entry>
</response>
but I'm having problem getting the subject-area text values i.e. "Test1"
I'm using the below T-SQL to extract the rest of the values, I'm using a cross appy on the node as I required this to loop to get all values so can't use [1] etc to extract it that way as I'm not sure how many subject area there will be.
Any ideas
SELECT
,a.APIXMLResponse.value('(response[1]/entry[1]/title[1])','VARCHAR(250)') AS Title
,sa.value('(./#code)','varchar(10)') AS SubjectAreaCode
,sa.value('(./#abbrev)','varchar(10)') AS SubjectAreaAbbrev
FROM [dbo].[APIXML] a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)
Although there is a working solution in a comment already, I'd like to point out some things:
Using just '.' as path can lead to very annoying effects, if there are nested elements.
Looking for performance it is recommended to use text()[1] to read the needed value at its actual place (Here are some details with examples).
As the internal values are NVARCHAR(x) it is slightly faster to use NVARCHAR as target type (if you don't have a reason to do otherwise...
That's my query:
SELECT
a.APIXMLResponse.value('(response/entry/title)[1]','NVARCHAR(250)') AS Title
,sa.value('#code','nvarchar(10)') AS SubjectAreaCode
,sa.value('#abbrev','nvarchar(10)') AS SubjectAreaAbbrev
,sa.value('text()[1]','nvarchar(10)') AS SubjectAreaContent
FROM #mockup a
CROSS APPLY APIXMLResponse.nodes('response/entry/subject-area') AS SubjectArea(sa)

How to make LIKE in SQL look for specific string instead of just a wildcard

My SQL Query:
SELECT
[content_id] AS [LinkID]
, dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name'))) AS [Physician Name]
FROM
[DB].[dbo].[table1]
WHERE
[id] = '188'
AND
(content LIKE '%Urology%')
AND
(contentS = 'A')
ORDER BY
--[content_title]
dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name')))
The issue I am having is, if the content is Neurology or Urology it appears in the result.
Is there any way to make it so that if it's Urology, it will only give Urology result and if it's Neurology, it will only give Neurology result.
It can be Urology, Neurology, Internal Medicine, etc. etc... So the two above used are what is causing the issue.
The content is a ntext column with XML tag inside, for example:
<root><Location><location>Office</location>
<office>Office</office>
<Address><image><img src="Rd.jpg?n=7513" /></image>
<Address1>1 Road</Address1>
<Address2></Address2>
<City>Qns</City>
<State>NY</State>
<zip>14404</zip>
<phone>324-324-2342</phone>
<fax></fax>
<general></general>
<from_north></from_north>
<from_south></from_south>
<from_west></from_west>
<from_east></from_east>
<from_connecticut></from_connecticut>
<public_trans></public_trans>
</Address>
</Location>
</root>
With the update this content column has the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Physicians>
<name>Doctor #1</name>
<picture>
<img src="phys_lab coat_gradation2.jpg?n=7529" />
</picture>
<gender>M</gender>
<langF1>
English
</langF1>
<specialty>
<a title="Neurology" href="neu.aspx">Neurology</a>
</specialty>
</Physicians>
</root>
If I search for Lab the result appears because there is the text lab in the column.
This is what I would do if you're not into making a CLR proc to use Regexes (SQL Server doesn't have regex capabilities natively)
SELECT
[...]
WHERE
(content LIKE #strService OR
content LIKE '%[^a-z]' + #strService + '[^a-z]%' OR
content LIKE #strService + '[^a-z]%' OR
content LIKE '%[^a-z]' + #strService)
This way you check to see if content is equal to #strService OR if the word exists somewhere within content with non-letters around it OR if it's at the very beginning or very end of content with a non-letter either following or preceding respectively.
[^...] means "a character that is none of these". If there are other characters you don't want to accept before or after the search query, put them in every 4 of the square brackets (after the ^!). For instance [^a-zA-Z_].
As I see it, your options are to either:
Create a function that processes a string and finds a whole match inside it
Create a CLR extension that allows you to call .NET code and leverage the REGEX capabilities of .NET
Aaron's suggestion is a good one IF you can know up front all the terms that could be used for searching. The problem I could see is if someone searches for a specific word combination.
Databases are notoriously bad at semantics (i.e. they don't understand the concept of neurology or urology - everything is just a string of characters).
The best solution would be to create a table which defines the terms (two columns, PK and the name of the term).
The query is then a join:
join table1.term_id = terms.term_id and terms.term = 'Urology'
That way, you can avoid the LIKE and search for specific results.
If you can't do this, then SQL is probably the wrong tool. Use LIKE to get a set of results which match and then, in an imperative programming language, clean those results from unwanted ones.
Judging from your content, can you not leverage the fact that there are quotes in the string you're searching for?
SELECT
[...]
WHERE
(content LIKE '%""Urology""%')

FOR XML could not serialize the data for node because it contains a character (0x000E)

I am trying to use FOR XML in SSRS, but when the report runs it sometimes gives me this error:
FOR XML could not serialize the data for node 'NoName' because it contains a character (0x000E) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.
I'm using FOR XML to concatenate a comments column into one cell within SSRS. Since multiple comments can exist for one user, this would solve the issue with duplicates. Does anyone have an idea why I would get this error in SSRS?
It appears to be a special character that looks like a musical note. You can find the row that is causing your problem like this:
SELECT Notes FROM MyTable WHERE Notes like '%' + char(0x000E) + '%'
You can fix the problem by removing the offending character.
UPDATE MyTable SET Notes = REPLACE(Notes, char(0x000E) , '')
WHERE Notes like '%' + char(0x000E) + '%'

SQL Server : remove duplicated text within a string

I have a SQL Server 2008 table with a column containing lengthy HTML text. Near the top there is a link provided for an associated MP3 file which is unique to each record. The links are are all formatted as follows:
<div class="MediaSaveAs">Download Audio </div>
Unfortunately many records contain two or three sequential and identical instances of this link where there should be only one. Is there a relatively simple script I can run to find and eliminate the redundant links?
I'm not entirely sure - because your explanation wasn't very clear - but this appears to do what you want, although whether or not you consider this to be a "simple script", I don't know.
declare #Link nvarchar(200) = N'<div class="MediaSaveAs">Download Audio </div>'
declare #BadData nvarchar(max) = N'cbjahcgfhjasgfzhjaucv' + replicate(#Link, 3) + N'cabhjcsghagj',
#StartPattern nvarchar(34) = N'<div class="MediaSaveAs"><a href="',
#EndPattern nvarchar(27) = N'">Download Audio </a></div>'
select #BadData
select replace (
#BadData,
substring(#BadData, charindex(#StartPattern, #BadData), len(#BadData)-charindex(reverse(#EndPattern), reverse(#BadData))-charindex(#StartPattern, #BadData) + 2),
substring(#BadData, charindex(#StartPattern, #BadData), charindex(#EndPattern, #BadData) + len(#EndPattern) - charindex(#StartPattern, #BadData))
)
Personally I would not like to have to maintain this code; I would far rather use a script in another language that can actually parse HTML. You said this is "just a repeated text issue", but that doesn't mean it's an easy problem and especially not in a language like TSQL that has such limited support for string operations.
For future reference, please put all relevant information into the question - you can edit it if you need to - instead of leaving them in the comments where they are difficult to read and may be overlooked. And please post sample data and results instead of describing things in words.
First we need to identify the file names, which we can do with PATINDEX:
select
substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
from files
And then secondly identify and the duplicates, check it out:
delete
from files
where id not in (
select max(id)
from files
group by substring(html, PATINDEX('%filename%.mp3%', html), PATINDEX('%.mp3%', html)-PATINDEX('%filename%.mp3%', html)+4)
)
http://www.sqlfiddle.com/#!3/887a3/5

Parsing XML Element value entirely in SQL form arbitrary string

We have log/audits we have compiled over some time that we would like to run some brief reports on.
One of the columns in the logs is JSON, but contains XML. We want to be able to parse out the value of a certain XML tag for each of the rows. So given an arbitrary string such as the following:
{ "XmlData" :"<tag1><tag2><TagToParse>234</TagToParse></tag2><tag1>".....}
I would like to run a sql query that return 234 when I give it the tag name TagToParse
What is the easiest way to do this ENTIRELY in SQL?
Give your container will always be tag1, then something like this should do it:
DECLARE #MyXML XML
SET #MyXML = '<tag1><tag2><TagToParse>234</TagToParse></tag2></tag1>'
SELECT
a.b.value('(/tag1//TagToParse/node())[1]', 'nvarchar(max)') AS Tag
FROM #MyXML.nodes('tag1') a(b)
Good luck.