I am trying to write an efficient function for use in a calculated field. It should do the following:
Replace all non alpha numeric characters with space
Replace multiple white spaces with a space
Trim and lower the results
Example input
A B##%$$C &^%D
Example output
a b c d
A normal regex pattern would match like so
[\W_]+
The following works, but I am not sure whether there is a more efficient approach than two loops (at least O(n²) complexity) using PATINDEX with STUFF and CHARINDEX with REPLACE.
Create Function [dbo].[Clean](@Temp nvarchar(1000))
Returns nvarchar(1000)
AS
Begin
    Declare @Pattern as varchar(50) = '%[^a-z0-9 ]%'
    -- Replace each non-alphanumeric character with a space, one at a time
    While PatIndex(@Pattern, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@Pattern, @Temp), 1, ' ')
    -- Collapse runs of spaces by replacing double spaces until none remain
    while charindex('  ', @Temp) > 0
        set @Temp = replace(@Temp, '  ', ' ')
    Return LOWER(TRIM(@Temp))
End
Usage
Select dbo.Clean(' A B##%$$C &^%D ')
Result
a b c d
Is there potentially a single pass approach I can use, or a sneaky method I am not aware of?
I'm not able to test the performance, but the following approach (without loops and based on some string manipulations) is an additional option.
Note that you'll need at least SQL Server 2017 (for the TRANSLATE() call).
-- Input text and patterns
DECLARE @text varchar(1000) = ' A B##%$$C &^%D'
DECLARE @alphanumericpattern varchar(36) = 'abcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @notalphanumericpattern varchar(1000)
-- Trim and lower the input text
SELECT @text = RTRIM(LTRIM(LOWER(@text)))
-- Get not alpha-numeric characters
SELECT @notalphanumericpattern =
    REPLACE(
        TRANSLATE(@text, @alphanumericpattern, REPLICATE('a', LEN(@alphanumericpattern))),
        'a',
        ''
    )
-- Replace all not alpha-numeric characters with a space
SELECT @text =
    REPLACE(
        TRANSLATE(@text, @notalphanumericpattern, REPLICATE('$', LEN(@notalphanumericpattern))),
        '$',
        ' '
    )
-- Replace multiple spaces with a single space
SELECT @text =
    REPLACE(
        REPLACE(
            REPLACE(
                @text,
                ' ',
                '<>'
            ),
            '><',
            ''
        ),
        '<>',
        ' '
    )
Result:
a b c d
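To see why the three nested REPLACE calls collapse runs of spaces, it can help to look at each stage on its own. The following is only a sketch: the variable @s and its sample value are made up for illustration, and it assumes the string contains no < or > characters (which holds here, because every non-alphanumeric character has already been turned into a space).

```sql
-- Hypothetical sample value: three spaces, then two spaces
DECLARE @s varchar(100) = 'a   b  c';

SELECT
    -- every space becomes the pair '<>'
    REPLACE(@s, ' ', '<>') AS step1,                                              -- a<><><>b<><>c
    -- adjacent pairs meet as '><'; removing those leaves one '<>' per run
    REPLACE(REPLACE(@s, ' ', '<>'), '><', '') AS step2,                           -- a<>b<>c
    -- each remaining '<>' turns back into a single space
    REPLACE(REPLACE(REPLACE(@s, ' ', '<>'), '><', ''), '<>', ' ') AS step3;       -- a b c
```

Each run of spaces, whatever its length, ends up as exactly one space, and no loop is needed.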
I have a string of data
'["Dog",,,1,"Person","2020-03-17",,4,"Todd]'
I am trying to use the replace function to replace double commas with NULL values
Solution
'["Dog",NULL,NULL,1,"Person","2020-03-17",NULL,4,"Todd]'
But I keep ending up with
'"Dog",NULL,,1,"Person","2020-03-17",NULL,4,"Todd'
(The ,,, needs to become ,NULL,NULL, but only becomes ,NULL,,)
Here is my sample code I'm using
REPLACE(FileData, ',,' , ',NULL,')
WHERE FileData LIKE '%,,%'
If you do the same replacement twice, any number of sequential commas will get handled.
REPLACE(REPLACE(FileData, ',,' , ',NULL,'), ',,' , ',NULL,')
The first REPLACE deals with all the odd positions...
',,,,,,,,' => ',NULL,,NULL,,NULL,,NULL,'
Doing it again will deal with all of the remaining positions.
=> ',NULL,NULL,NULL,NULL,NULL,NULL,NULL,'
Note that specifically handling the special case of three consecutive commas (as in another answer here) won't handle four, five, six, etc. The above solution generalises to any number of consecutive commas.
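The reason one pass is not enough is that REPLACE scans left to right and never re-examines text it has already consumed, so in ',,,' the middle comma is used up by the first match. A small sketch using the string from the question (the variable @FileData is just a stand-in for the column) shows the difference:

```sql
-- Stand-in for the FileData column, using the sample string from the question
DECLARE @FileData varchar(200) = '["Dog",,,1,"Person","2020-03-17",,4,"Todd]';

SELECT
    -- one pass leaves a ',,' behind wherever there were three commas
    REPLACE(@FileData, ',,', ',NULL,') AS one_pass,
    -- ["Dog",NULL,,1,"Person","2020-03-17",NULL,4,"Todd]
    -- the second pass fills in the positions the first pass skipped
    REPLACE(REPLACE(@FileData, ',,', ',NULL,'), ',,', ',NULL,') AS two_passes;
    -- ["Dog",NULL,NULL,1,"Person","2020-03-17",NULL,4,"Todd]
```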
To be fully robust, you may also need to consider when there is a missing NULL at the first or last place in the string.
[,ThatOneToMyLeft,and,ThatOneToMyRight,]
A laborious but robust approach could be to replace [, and ,] with [,, and ,,] respectively, then do the double-replacement, then undo the first steps...
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
FileData,
'[,',
'[,,'
),
',]',
',,]'
),
',,',
',NULL,'
),
',,',
',NULL,'
),
',]',
']'
),
'[,',
'['
)
There are ways to make even that less verbose, but I have to run right now :)
You can try the following:
REPLACE(REPLACE(FileData, ',,,' , ',NULL,,'), ',,' , ',NULL,')
Where FileData LIKE '%,,%'
You can create a function that wraps the string-replacement logic and call it like this:
update table1
set column1 = dbo.ReplaceEx(column1, ',', 'NULL')
where column1 like '%,,%'
create function dbo.ReplaceEx(@string varchar(2000), @separator varchar(4), @nullValue varchar(10))
returns varchar(4000)
with execute as caller
as
begin
    -- @result starts as NULL; concat_ws skips NULLs, so no leading separator is added
    declare @result varchar(4000);
    -- empty items become @nullValue; a lone '[' or ']' gets @nullValue attached
    select @result = concat_ws(@separator, @result,
        case when rtrim(value) = '' then @nullValue
        else case when ltrim(rtrim(value)) = '[' then '[' + @nullValue
        else case when ltrim(rtrim(value)) = ']' then @nullValue + ']'
        else value end end end
    )
    from string_split(@string, @separator);
    return (@result);
end;
I have a Column content like this:
CustomTags
<CustomTagsSerialiser>
<custom-tags>
<tag>Visas and travel</tag>
<tag>Explore Options</tag>
<tag>Consider – feasibility</tag>
</custom-tags>
</CustomTagsSerialiser>
I can query g.[CustomTags].value('(/CustomTagsSerialiser//custom-tags)[1]', 'nvarchar(500)') as Custom_Tag to get result like
Visas and travelExplore OptionsConsider – feasibility
But I want the result to have a tag separated by comma (in the same column), like the following:
Visas and travel,Explore Options,Consider – feasibility
Ideally, I would like this to be implemented by using XML functionality/node
instead of breaking it into + ',' + or coalesce
You may want to refer to the answer "How Stuff and 'For Xml Path' work in SQL Server".
Try the following:
SELECT
STUFF((SELECT
',' + CTS.tag.value('(.)[1]', 'nvarchar(500)')
FROM
Temp12345
CROSS APPLY
col1.nodes('/CustomTagsSerialiser/custom-tags/tag') AS CTS(tag)
FOR XML PATH('')
), 1, 1, '')
Without CROSS APPLY, it would look like this:
STUFF((SELECT ',' + x.t.value('.', 'varchar(50)') FROM
[g].CustomTags.nodes('//tag') x(t) FOR XML PATH('')), 1, 1, '') AS 'Custom Tags'
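One caveat with FOR XML PATH('') concatenation: without the TYPE directive, characters such as & or < inside a tag come back as entities. A common variation, sketched here against a local variable @x standing in for the CustomTags column (the variable is an assumption for the sake of a self-contained example), adds TYPE and reads the result back with .value():

```sql
-- @x stands in for the CustomTags column from the question
DECLARE @x xml = N'<CustomTagsSerialiser>
  <custom-tags>
    <tag>Visas and travel</tag>
    <tag>Explore Options</tag>
    <tag>Consider – feasibility</tag>
  </custom-tags>
</CustomTagsSerialiser>';

SELECT STUFF((
        -- one ',value' fragment per <tag> node
        SELECT ',' + t.c.value('.', 'nvarchar(500)')
        FROM @x.nodes('/CustomTagsSerialiser/custom-tags/tag') AS t(c)
        FOR XML PATH(''), TYPE          -- TYPE keeps the result as xml
    ).value('.', 'nvarchar(max)'),      -- .value() decodes any entities
    1, 1, '') AS Custom_Tag;            -- STUFF drops the leading comma
-- Visas and travel,Explore Options,Consider – feasibility
```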
I'm trying to convert a string to rows using T-SQL. I've found some people using XML for this but I'm running into troubles.
The original record:
A newline-separated string of data
New In Progress Left Message On Hold Researching Researching (2nd Level) Researching (3rd Level) Resolved Positive False Positive Security Respond
Using the following statement converts this string into XML:
select
cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from
field
where
column_name = 'state' and table_name = 'sv_inquiry'
XML string:
<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>
Now I would like to convert every 'i' node into a separate row. I've constructed the query below, but I can't get it working in the way that it returns all the rows...
select x.i.value('i[1]', 'varchar(30)')
from (
select cast('<i>'+REPLACE(convert(varchar(max), list_items), CHAR(13) + CHAR(10),'</i><i>')+'</i>' as xml)
from field
where column_name='state' and table_name='sv_inquiry'
) x(i)
This will return
Unassigned
To be clear, when I change 'i[1]' into 'i[2]' it will return 'Assigned'. I've tried '.', but this returns the whole string in a single record...
How about using the nodes() method of the XML data type?
declare @xml xml
set @xml = '<i>Unassigned</i><i>Assigned</i><i>Transferred</i><i>Accepted</i><i>Closed</i><i>Reactivated</i>'
select
    t.c.value('.', 'nvarchar(100)') as [Word]
from
    @xml.nodes('/i') as t(c)
You can split a string into rows without XML, see for example the fnSplitString function at SQL Server Central.
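If the server is SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function is another XML-free option. This is only a sketch under that assumption; the variable @list and its contents are made up to stand in for the list_items column, and STRING_SPLIT does not guarantee the order of the returned rows.

```sql
-- Made-up stand-in for list_items: values separated by CR+LF
DECLARE @list varchar(max) =
    'Unassigned' + CHAR(13) + CHAR(10) +
    'Assigned'   + CHAR(13) + CHAR(10) +
    'Transferred';

-- STRING_SPLIT takes a single-character separator, so strip the CR
-- and split on the LF that remains
SELECT value
FROM STRING_SPLIT(REPLACE(@list, CHAR(13), ''), CHAR(10));
```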
Here's an example using the nodes() function of the xml type. I'm using a space as the delimiter because SQL Fiddle doesn't play well with line feeds:
select node_column.value('.', 'varchar(max)')
from (
select cast('<i>' + replace(list_items, ' ', '</i><i>') +
'</i>' as xml) xml_value
from field
) f
cross apply
xml_value.nodes('/i') node_table(node_column);
Live example at SQL Fiddle.
The following code snippet on SQL server 2005 fails on the ampersand '&':
select cast('<name>Spolsky & Atwood</name>' as xml)
Does anyone know a workaround?
Longer explanation, I need to update some data in an XML column, and I'm using a search & replace type hack by casting the XML value to a varchar, doing the replace and updating the XML column with this cast.
select cast('<name>Spolsky &amp; Atwood</name>' as xml)
A literal ampersand inside an XML tag is not allowed by the XML standard, and such a document will fail to parse by any XML parser.
An XMLSerializer() will output the ampersand HTML-encoded.
The following code:
using System.Xml.Serialization;
namespace xml
{
public class MyData
{
public string name = "Spolsky & Atwood";
}
class Program
{
static void Main(string[] args)
{
new XmlSerializer(typeof(MyData)).Serialize(System.Console.Out, new MyData());
}
}
}
will output the following:
<?xml version="1.0" encoding="utf-8"?>
<MyData
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<name>Spolsky &amp; Atwood</name>
</MyData>
with an &amp; in place of the bare &.
It's not valid XML. Use &amp;:
select cast('<name>Spolsky &amp; Atwood</name>' as xml)
You'd need to XML escape the text, too.
So let's backtrack and assume you're building that string as:
SELECT '<name>' + MyColumn + '</name>' FROM MyTable
you'd want to do something more like:
SELECT '<name>' + REPLACE( MyColumn, '&', '&amp;' ) + '</name>' FROM MyTable
Of course, you probably should cater for the other entities too:
SELECT '<name>' + REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( MyColumn, '&', '&amp;' ), '''', '&apos;' ), '"', '&quot;' ), '<', '&lt;' ), '>', '&gt;' ) + '</name>' FROM MyTable
When working with XML in SQL you're a lot safer using built-in functions instead of converting it manually.
The following code will build a proper SQL XML variable that looks like your desired output based on a raw string:
DECLARE @ExampleString nvarchar(40)
, @ExampleXml xml
SELECT @ExampleString = N'Spolsky & Atwood'
SELECT @ExampleXml =
(
SELECT 'Spolsky & Atwood' AS 'name'
FOR XML PATH (''), TYPE
)
SELECT @ExampleString , @ExampleXml
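In the same spirit, the search-and-replace update described in the question can often be done with the xml type's own modify() method, which takes care of the escaping. The following is a minimal sketch rather than a drop-in fix: the variables @doc and @newName are made up, and for a table column the same modify() call would go in an UPDATE ... SET statement.

```sql
-- Hypothetical document and replacement text
DECLARE @doc xml
DECLARE @newName nvarchar(40)
SET @doc = N'<name>Spolsky</name>'
SET @newName = N'Spolsky & Atwood'

-- Replace the text of the first <name> element; sql:variable() passes the
-- raw string in, and the ampersand is stored as an entity automatically
SET @doc.modify('replace value of (/name/text())[1] with sql:variable("@newName")')

SELECT @doc AS updated_xml
-- <name>Spolsky &amp; Atwood</name>
```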
As John and Quassnoi state, & on its own is not valid. This is because the ampersand character is the start of a character entity, which is used to specify characters that cannot be represented literally. There are two forms of entity: one specifies the character by name (e.g., &amp; or &quot;), and one specifies the character by its code (I believe it's the code position within the Unicode character set, but I'm not sure; e.g., &#34; should represent a double quote).
Thus, to include a literal & in an HTML document, you must specify its entity: &amp;. Other common ones you may encounter are &lt; for <, &gt; for >, and &quot; for ".
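The entity only exists in the serialized text. Once the string is parsed, reading the value back gives the plain character again, as this small sketch shows (the variable @x is just for illustration):

```sql
-- The escaped form parses without error
DECLARE @x xml
SET @x = CAST('<name>Spolsky &amp; Atwood</name>' AS xml)

-- .value() returns the decoded text, with a literal ampersand
SELECT @x.value('(/name)[1]', 'nvarchar(40)') AS name_text
-- Spolsky & Atwood
```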