Update or replace a special character text in my database field - sql

I am facing to an issue that I cannot update or replace some characters in my database.
Here is how this text look like in my column when I retrieve it:
As you can see, there is an unknown characters between 'master' and 'degree' which I cannot even paste it here.
I also tried to update and replace it with below code (I cannot paste that two vertical lines here since they are not supported in any browser and I am not sure what they are, Please see the picture above to see what is in my SQL statement).
begin transaction
update gm_desc set projdesc=replace(projdesc,'%â s%','') where projdesc like '%âs%' and proposalno = '15-149-01'
You can see the real SQL Statement here:
I tried to update, or replace it but I cannot do it. The update statement successfully works but I still see that weird special charters. I would be appreciate to help me.

Here's a scalar-valued function which removes all non-alphanumeric characters (preserves spaces) from a string.
Hopefully it helps!
dbfiddle
create function dbo.get_alphanumeric_str
(
#string varchar(max)
)
returns varchar(max)
as
begin
declare #ret varchar(max);
with nums as (
select 1 as n
union all select n+1 from nums
where n < 256
)
select #ret = replace(stuff(
(
select '' + substring(#string, nums.n, 1)
from nums
where patindex('%[^0-9A-Za-z ]%', substring(#string, nums.n,1)) = 0
for xml path('')
), 1, 0, ''
), ' ', ' ')
option (MAXRECURSION 256)
return #ret;
end
Usage
select dbo.get_alphanumeric_str('Helloᶄ âWorld 1234⅊⅐')
Returns: Hello World 1234
How it works
The nums CTE is just to get a list of numbers (you can set the maximum of 256 to a higher value if your strings are longer; n.b. option (MAXRECURSION n) is for this CTE but has to be placed at the query)
The stuff essentially iterates through the string, using the list of numbers above and extracts a substring of length 1; each of these chars are checked if they match the [^0-9A-Za-z ] regex group (0-9 all digits, A-Za-z all letters both lower and upper case, and a single space character)
If they match, patindex() should return 0; i.e. index zero.
Use replace(string, ' ', ' ') for the space character as the xml path returns a special encoding, see this question.
Use a binary collation for accented characters; see this answer

Related

Identify Hidden Characters

In my SQL tables I have text which has hidden characters which is only visible when I copy and paste it in notepad++.
How to find those rows which has hidden characters using SQL Server queries?
I have tried comparing the lengths using datalength and len
it did not work.
DATALENGTH(name) AS BinaryLength != LEN(name)
I want the row which has hidden characters.
On the assumption that this is being caused by control characters. Some of which are invisible. But also include tabs, newlines and spaces. An example to illustrate and how to get them to appear.
--DROP TABLE #SillyTemp
DECLARE #InvisibleChar1 NCHAR(1) = NCHAR(28), #InvisibleChar2 NCHAR(1) = NCHAR(30), #NonControlChar NCHAR(1) = NCHAR(33);
DECLARE #InputString NVARCHAR(500) = N'Some |' + #InvisibleChar1 +'| random string |' + #InvisibleChar2 + '|' + '; Thank god Finally a normal character |' + #NonControlChar + '|';
SELECT #InputString AS OhNoInvisibleCharacters
DECLARE #ControlCharRange NVARCHAR(50) = N'%[' + NCHAR(1) + '-' + NCHAR(31) + ']%';
CREATE TABLE #SillyTemp
(
input nvarchar(500)
)
INSERT INTO #SillyTemp(input)
VALUES (#InputString),(N'A normal string')
SELECT #ControlCharRange;
SELECT input FROM #SillyTemp AS #SI WHERE input LIKE #ControlCharRange;
This produces 3 results. A string with invisiblechars within them like such:
Some || random string ||; Thank god Finally a normal character |!|
Note, the are actually invisible inside SQL. But stackoverflow shows them as such. The output in SQL Server is simply.
Some || random string ||; Thank god Finally a normal character |!|
But these characters still have a corresponding (N)CHAR(X) value. (N)CHAR(0) is a NULL character and is highly unlikely to be in a string, in my setup to detect them it also provides some problems in building a range. (N)CHAR(32) is the ' ' space character.
The way the [X-Y] string operator works is also based on the (N)CHAR numbers. Therefore we can make a range of [NCHAR(1)-NCHAR(31)]
The last select goes through the temporary table, one which has invisible characters. Since we're looking for any NCHARS between 1 and 31, only those with invisible characters (and often invalid characters or tabs/newlines) satisfy the where condition. Thus only they get returned. In this case only the 'faulty' string gets returned in my select statement.

SQL - How to remove a space character from a string

I have a table where the max length of a column (varchar) is 12, someone has loaded some value with a space, so rather than 'SPACE' it's 'SPACE '
I want to remove the space using a script, I was positive RTRIM or REPLACE(myValue, ' ', '') would work but LEN(myValue) shows there is still and extra character?
As mentioned by a couple folks, it may not be a space. Grab a copy of ngrams8k and you use it to identify the issue. For example, here we have the text, " SPACE" with a preceding space and trailing CHAR(160) (HTML BR tag). CHAR(160) looks like a space in SSMS but isn't "trimable". For example consider this query:
DECLARE #string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT '"'+#string+'"'
Using ngrams8k you could do this:
DECLARE #string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT
ng.position,
ng.token,
asciival = ASCII(ng.token)
FROM dbo.ngrams8k(#string,1) AS ng;
Returns:
position token asciival
---------- ------- -----------
1 32
2 S 83
3 P 80
4 A 65
5 C 67
6 E 69
7   160
As you can see, the first character (position 1) is CHAR(32), that's a space. The last character (postion 7) is not a space.
Knowing that CHAR(160) is the issue you could fix it like so:
SET #string = REPLACE(LTRIM(#string),CHAR(160),'')
If you are using SQL Server 2017+ you can also use TRIM which does a whole lot more than just LTRIM-and-RTRIM-ing. For example, this will remove
leading and trailing tabs, spaces, carriage returns, line returns and HTML BR tags.
SET #string = SELECT TRIM(CHAR(32)+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160) FROM #string)
Odds are it is some other non-printing character, carriage return is a big one when going from between *nix and other OS. One way to tell is to use the DUMP function. So you could start with a query like:
SELECT dump(column_name)
FROM your_table
WHERE column_name LIKE 'SPACE%'
That should help you find the offending character, however, that doesn't fix your problem. Instead, I would use something like REGEXP_REPLACE:
SELECT REGEXP_REPLACE(column_name, '[^A-z]')
FROM your_table
That should take care of any non-printing characters. You may need to play with the regular expression if you expect numbers or symbols in your string. You could switch to a character class like:
SELECT REGEXP_REPLACE(column_name, '[:cntrl:]')
FROM your_table

SQL Server TRIM character

I have the following string: 'BOB*', how do I trim the * so it shows up as 'BOB'
I tried the RTRIM('BOB*','*') but does not work as says needs only 1 parameter.
Another pretty good way to implement Oracle's TRIM char FROM string in MS SQL Server is the following:
First, you need to identify a char that will never be used in your string, for example ~
You replace all spaces with that character
You replace the character * you want to trim with a space
You LTrim + RTrim the obtained string
You replace back all spaces with the trimmed character *
You replace back all never-used characters with a space
For example:
REPLACE(REPLACE(LTrim(RTrim(REPLACE(REPLACE(string,' ','~'),'*',' '))),' ','*'),'~',' ')
CREATE FUNCTION dbo.TrimCharacter
(
#Value NVARCHAR(4000),
#CharacterToTrim NVARCHAR(1)
)
RETURNS NVARCHAR(4000)
AS
BEGIN
SET #Value = LTRIM(RTRIM(#Value))
SET #Value = REVERSE(SUBSTRING(#Value, PATINDEX('%[^'+#CharacterToTrim+']%', #Value), LEN(#Value)))
SET #Value = REVERSE(SUBSTRING(#Value, PATINDEX('%[^'+#CharacterToTrim+']%', #Value), LEN(#Value)))
RETURN #Value
END
GO
--- Example
----- SELECT dbo.TrimCharacter('***BOB*********', '*')
----- returns 'BOB'
If you want to remove all asterisks then it's obvious:
SELECT REPLACE('Hello*', '*', '')
However, If you have more than one asterisk at the end and multiple throughout, but are only interested in trimming the trailing ones, then I'd use this:
DECLARE #String VarChar(50) = '**H*i****'
SELECT LEFT(#String, LEN(REPLACE(#String, '*', ' '))) --Returns: **H*i
I updated this answer to include show how to remove leading characters:
SELECT RIGHT(#String, LEN(REPLACE(REVERSE(#String), '*', ' '))) --Returns: H*i****
LEN() has a "feature" (that looks a lot like a bug) where it does not count trailing spaces.
LEFT('BOB*', LEN('BOB*')-1)
should do it.
If you wanted behavior similar to how RTRIM handles spaces i.e. that "B*O*B**" would turn into "B*O*B" without losing the embedded ones then something like -
REVERSE(SUBSTRING(REVERSE('B*O*B**'), PATINDEX('%[^*]%',REVERSE('B*O*B**')), LEN('B*O*B**') - PATINDEX('%[^*]%', REVERSE('B*O*B**')) + 1))
Should do it.
If you only want to remove a single '*' character from the value when the value ends with a '*', a simple CASE expression will do that for you:
SELECT CASE WHEN RIGHT(foo,1) = '*' THEN LEFT(foo,LEN(foo)-1) ELSE foo END AS foo
FROM (SELECT 'BOB*' AS foo)
To remove all trailing '*' characters, then you'd need a more complex expression, making use of the REVERSE, PATINDEX, LEN and LEFT functions.
NOTE: Be careful with the REPLACE function, as that will replace all occurrences of the specified character within the string, not just the trailing ones.
How about.. (in this case to trim off trailing comma or period)
For a variable:
-- Trim commas and full stops from end of City
WHILE RIGHT(#CITY, 1) IN (',', '.'))
SET #CITY = LEFT(#CITY, LEN(#CITY)-1)
For table values:
-- Trim commas and full stops from end of City
WHILE EXISTS (SELECT 1 FROM [sap_out_address] WHERE RIGHT([CITY], 1) IN (',', '.'))
UPDATE [sap_out_address]
SET [CITY] = LEFT([CITY], LEN([CITY])-1)
WHERE RIGHT([CITY], 1) IN (',', '.')
An other approach ONLY if you want to remove leading and trailing characters is the use of TRIM function.
By default removes white spaces but have te avility of remove other characters if you specify its.
SELECT TRIM('=' FROM '=SPECIALS=') AS Result;
Result
--------
SPECIALS
Unfortunately LTRIM and RTRIM does not work in the same way and only removes white spaces instead of specified characters like TRIM does if you specify its.
Reference and more examples:
https://database.guide/how-to-remove-leading-and-trailing-characters-in-sql-server/
RRIM() LTRIM() only remove spaces try http://msdn.microsoft.com/en-us/library/ms186862.aspx
Basically just replace the * with empty space
REPLACE('TextWithCharacterToReplace','CharacterToReplace','CharacterToReplaceWith')
So you want
REPLACE ('BOB*','*','')
I really like Teejay's answer, and almost stopped there. It's clever, but I got the "almost too clever" feeling, as, somehow, your string at some point will actually have a ~ (or whatever) in it on purpose. So that's not defensive enough for me to put into production.
I like Chris' too, but the PATINDEX call seems like overkill.
Though it's probably a micro-optimization, here's one without PATINDEX:
CREATE FUNCTION dbo.TRIMMIT(#stringToTrim NVARCHAR(MAX), #charToTrim NCHAR(1))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE #retVal NVARCHAR(MAX)
SET #retVal = #stringToTrim
WHILE 1 = charindex(#charToTrim, reverse(#retVal))
SET #retVal = SUBSTRING(#retVal,0,LEN(#retVal))
WHILE 1 = charindex(#charToTrim, #retVal)
SET #retVal = SUBSTRING(#retVal,2,LEN(#retVal))
RETURN #retVal
END
--select dbo.TRIMMIT('\\trim\asdfds\\\', '\')
--trim\asdfds
Returning a MAX nvarchar bugs me a little, but that's the most flexible way to do this..
I've used a similar approach to some of the above answers of using pattern matching and reversing the string to find the first non-trimmable character, then cutting that off. The difference is this version does less work than those above, so should be a little more efficient.
This creates RTRIM functionality for any specified character.
It includes an additional step set #charToFind = case... to escape the chosen character.
There is currently an issue if #charToReplace is a right crotchet (]) as there appears to be no way to escape this.
.
declare #stringToSearch nvarchar(max) = '****this is****a ** demo*****'
, #charToFind nvarchar(5) = '*'
--escape #charToFind so it doesn't break our pattern matching
set #charToFind = case #charToFind
when ']' then '[]]' --*this does not work / can't find any info on escaping right crotchet*
when '^' then '\^'
--when '%' then '%' --doesn't require escaping in this context
--when '[' then '[' --doesn't require escaping in this context
--when '_' then '_' --doesn't require escaping in this context
else #charToFind
end
select #stringToSearch
, left
(
#stringToSearch
,1
+ len(#stringToSearch)
- patindex('%[^' + #charToFind + ']%',reverse(#stringToSearch))
)
SqlServer2017 has a new way to do it: https://learn.microsoft.com/en-us/sql/t-sql/functions/trim-transact-sql?view=sql-server-2017
SELECT TRIM('0' FROM '00001900'); -> 19
SELECT TRIM( '.,! ' FROM '# test .'); -> # test
SELECT TRIM('*' FROM 'BOB*'); --> BOB
Unfortunately, RTRIM does not support trimming a specific character.
SELECT REPLACE('BOB*', '*', '')
SELECT REPLACE('B*OB*', '*', '')
-------------------------------------
Result : BOB
-------------------------------------
this will replace all asterisk* from the text
Trim with many cases
--id = 100 101 102 103 104 105 106 107 108 109 110 111
select right(id,2)+1 from ordertbl -- 1 2 3 4 5 6 7 8 9 10 11 -- last two positions are taken
select LEFT('BOB', LEN('BOB')-1) -- BO
select LEFT('BOB*',1) --B
select LEFT('BOB*',2) --BO
Try this:
Original
select replace('BOB*','*','')
Fixed to be an exact replacement
select replace('BOB*','BOB*','BOB')
Solution for one char parameter:
rtrim('0000100','0') ->
select left('0000100',len(rtrim(replace('0000100','0',' '))))
ltrim('0000100','0') ->
select right('0000100',len(replace(ltrim(replace('0000100','0',' ')),' ','.')))
#teejay solution is great. But the code below can be more understandable:
declare #X nvarchar(max)='BOB *'
set #X=replace(#X,' ','^')
set #X=replace(#X,'*',' ')
set #X= ltrim(rtrim(#X))
set #X=replace(#X,'^',' ')
Here's a function I used in the past. Note that while you can make it more general purpose by having extra parameters like the character(s) you wish to remove and what you will be replacing the space character(s) with, this greatly increases execution time. Here, I used a pipe to replace spaces AFTER pre-trimming the input. Change varchar to nvarchar if required.
CREATE FUNCTION [dbo].[TrimColons]
(
#strToTrim varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|'),':',' '))),' ',':'),'|',' ')
/*
Here's a breakdown of this fancy, schmancy, trimmer
LTRIM(RTRIM(#strToTrim)) trims leading & trailing spaces first
REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|') replaces inside spaces with pipe char
REPLACE(REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|'),':',' ') replaces demarc character, the colon, with spaces
LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|'),':',' '))) trims the leading & trailing converted-to-space demarc char (colon)
REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|'),':',' '))),' ',':') replaces the inner space characters back to demar char (colon)
REPLACE(REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(#strToTrim)),' ','|'),':',' '))),' ',':'),'|',' ') replaces the pipe characters back to original space characters
*/
END
DECLARE #String VarChar(50) = '**H*i****', #String2 VarChar(50)
--Assign to new variable #String2
;WITH X AS (
SELECT LEFT(#String, LEN(REPLACE(#String, '*', ' '))) [V1]
)
SELECT TOP 1 #String2 = RIGHT(V1, LEN(REPLACE(REVERSE(V1), '*', ' '))) FROM X
SELECT #String [#String], #String2 [#String2]
--See the intermediate values, v0 original, v1 triming end, and v2 trim the v1 leading
;WITH X AS (
SELECT #String V0, LEFT(#String, LEN(REPLACE(#String, '*', ' '))) [V1]
)
SELECT [V0], [V1], RIGHT([V1], LEN(REPLACE(REVERSE([V1]), '*', ' '))) [v2] FROM X

How can I remove leading and trailing quotes in SQL Server?

I have a table in a SQL Server database with an NTEXT column. This column may contain data that is enclosed with double quotes. When I query for this column, I want to remove these leading and trailing quotes.
For example:
"this is a test message"
should become
this is a test message
I know of the LTRIM and RTRIM functions but these workl only for spaces. Any suggestions on which functions I can use to achieve this.
I have just tested this code in MS SQL 2008 and validated it.
Remove left-most quote:
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 2, LEN(FieldName))
WHERE LEFT(FieldName, 1) = '"'
Remove right-most quote: (Revised to avoid error from implicit type conversion to int)
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 1, LEN(FieldName)-1)
WHERE RIGHT(FieldName, 1) = '"'
I thought this is a simpler script if you want to remove all quotes
UPDATE Table_Name
SET col_name = REPLACE(col_name, '"', '')
You can simply use the "Replace" function in SQL Server.
like this ::
select REPLACE('this is a test message','"','')
note: second parameter here is "double quotes" inside two single quotes and third parameter is simply a combination of two single quotes. The idea here is to replace the double quotes with a blank.
Very simple and easy to execute !
My solution is to use the difference in the the column values length compared the same column length but with the double quotes replaced with spaces and trimmed in order to calculate the start and length values as parameters in a SUBSTRING function.
The advantage of doing it this way is that you can remove any leading or trailing character even if it occurs multiple times whilst leaving any characters that are contained within the text.
Here is my answer with some test data:
SELECT
x AS before
,SUBSTRING(x
,LEN(x) - (LEN(LTRIM(REPLACE(x, '"', ' ')) + '|') - 1) + 1 --start_pos
,LEN(LTRIM(REPLACE(x, '"', ' '))) --length
) AS after
FROM
(
SELECT 'test' AS x UNION ALL
SELECT '"' AS x UNION ALL
SELECT '"test' AS x UNION ALL
SELECT 'test"' AS x UNION ALL
SELECT '"test"' AS x UNION ALL
SELECT '""test' AS x UNION ALL
SELECT 'test""' AS x UNION ALL
SELECT '""test""' AS x UNION ALL
SELECT '"te"st"' AS x UNION ALL
SELECT 'te"st' AS x
) a
Which produces the following results:
before after
-----------------
test test
"
"test test
test" test
"test" test
""test test
test"" test
""test"" test
"te"st" te"st
te"st te"st
One thing to note that when getting the length I only need to use LTRIM and not LTRIM and RTRIM combined, this is because the LEN function does not count trailing spaces.
I know this is an older question post, but my daughter came to me with the question, and referenced this page as having possible answers. Given that she's hunting an answer for this, it's a safe assumption others might still be as well.
All are great approaches, and as with everything there's about as many way to skin a cat as there are cats to skin.
If you're looking for a left trim and a right trim of a character or string, and your trailing character/string is uniform in length, here's my suggestion:
SELECT SUBSTRING(ColName,VAR, LEN(ColName)-VAR)
Or in this question...
SELECT SUBSTRING('"this is a test message"',2, LEN('"this is a test message"')-2)
With this, you simply adjust the SUBSTRING starting point (2), and LEN position (-2) to whatever value you need to remove from your string.
It's non-iterative and doesn't require explicit case testing and above all it's inline all of which make for a cleaner execution plan.
The following script removes quotation marks only from around the column value if table is called [Messages] and the column is called [Description].
-- If the content is in the form of "anything" (LIKE '"%"')
-- Then take the whole text without the first and last characters
-- (from the 2nd character and the LEN([Description]) - 2th character)
UPDATE [Messages]
SET [Description] = SUBSTRING([Description], 2, LEN([Description]) - 2)
WHERE [Description] LIKE '"%"'
You can use following query which worked for me-
For updating-
UPDATE table SET colName= REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') WHERE...
For selecting-
SELECT REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') FROM TableName
you could replace the quotes with an empty string...
SELECT AllRemoved = REPLACE(CAST(MyColumn AS varchar(max)), '"', ''),
LeadingAndTrailingRemoved = CASE
WHEN MyTest like '"%"' THEN SUBSTRING(Mytest, 2, LEN(CAST(MyTest AS nvarchar(max)))-2)
ELSE MyTest
END
FROM MyTable
Some UDFs for re-usability.
Left Trimming by character (any number)
CREATE FUNCTION [dbo].[LTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Right Trimming by character (any number)
CREATE FUNCTION [dbo].[RTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(RTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Note the dummy character '¦' (Alt+0166) cannot be present in the data (you may wish to test your input string, first, if unsure or use a different character).
To remove both quotes you could do this
SUBSTRING(fieldName, 2, lEN(fieldName) - 2)
you can either assign or project the resulting value
You can use TRIM('"' FROM '"this "is" a test"') which returns: this "is" a test
CREATE FUNCTION dbo.TRIM(#String VARCHAR(MAX), #Char varchar(5))
RETURNS VARCHAR(MAX)
BEGIN
RETURN SUBSTRING(#String,PATINDEX('%[^' + #Char + ' ]%',#String)
,(DATALENGTH(#String)+2 - (PATINDEX('%[^' + #Char + ' ]%'
,REVERSE(#String)) + PATINDEX('%[^' + #Char + ' ]%',#String)
)))
END
GO
Select dbo.TRIM('"this is a test message"','"')
Reference : http://raresql.com/2013/05/20/sql-server-trim-how-to-remove-leading-and-trailing-charactersspaces-from-string/
I use this:
UPDATE DataImport
SET PRIO =
CASE WHEN LEN(PRIO) < 2
THEN
(CASE PRIO WHEN '""' THEN '' ELSE PRIO END)
ELSE REPLACE(PRIO, '"' + SUBSTRING(PRIO, 2, LEN(PRIO) - 2) + '"',
SUBSTRING(PRIO, 2, LEN(PRIO) - 2))
END
Try this:
SELECT left(right(cast(SampleText as nVarchar),LEN(cast(sampleText as nVarchar))-1),LEN(cast(sampleText as nVarchar))-2)
FROM TableName

How can I find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table?

I have a table with a couple thousand rows. The description and summary fields are NTEXT, and sometimes have non-ASCII chars in them. How can I locate all of the rows with non ASCII characters?
I have sometimes been using this "cast" statement to find "strange" chars
select
*
from
<Table>
where
<Field> != cast(<Field> as varchar(1000))
First build a string with all the characters you're not interested in (the example uses the 0x20 - 0x7F range, or 7 bits without the control characters.) Each character is prefixed with |, for use in the escape clause later.
-- Start with tab, line feed, carriage return
declare #str varchar(1024)
set #str = '|' + char(9) + '|' + char(10) + '|' + char(13)
-- Add all normal ASCII characters (32 -> 127)
declare #i int
set #i = 32
while #i <= 127
begin
-- Uses | to escape, could be any character
set #str = #str + '|' + char(#i)
set #i = #i + 1
end
The next snippet searches for any character that is not in the list. The % matches 0 or more characters. The [] matches one of the characters inside the [], for example [abc] would match either a, b or c. The ^ negates the list, for example [^abc] would match anything that's not a, b, or c.
select *
from yourtable
where yourfield like '%[^' + #str + ']%' escape '|'
The escape character is required because otherwise searching for characters like ], % or _ would mess up the LIKE expression.
Hope this is useful, and thanks to JohnFX's comment on the other answer.
Here ya go:
SELECT *
FROM Objects
WHERE
ObjectKey LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?#\[\^_`{|}~\]\\]%' ESCAPE '\'
It's probably not the best solution, but maybe a query like:
SELECT *
FROM yourTable
WHERE yourTable.yourColumn LIKE '%[^0-9a-zA-Z]%'
Replace the "0-9a-zA-Z" expression with something that captures the full ASCII set (or a subset that your data contains).
Technically, I believe that an NCHAR(1) is a valid ASCII character IF & Only IF UNICODE(#NChar) < 256 and ASCII(#NChar) = UNICODE(#NChar) though that may not be exactly what you intended. Therefore this would be a correct solution:
;With cteNumbers as
(
Select ROW_NUMBER() Over(Order By c1.object_id) as N
From sys.system_columns c1, sys.system_columns c2
)
Select Distinct RowID
From YourTable t
Join cteNumbers n ON n <= Len(CAST(TXT As NVarchar(MAX)))
Where UNICODE(Substring(TXT, n.N, 1)) > 255
OR UNICODE(Substring(TXT, n.N, 1)) <> ASCII(Substring(TXT, n.N, 1))
This should also be very fast.
I started with #CC1960's solution but found an interesting use case that caused it to fail. It seems that SQL Server will equate certain Unicode characters to their non-Unicode approximations. For example, SQL Server considers the Unicode character "fullwidth comma" (http://www.fileformat.info/info/unicode/char/ff0c/index.htm) the same as a standard ASCII comma when compared in a WHERE clause.
To get around this, have SQL Server compare the strings as binary. But remember, nvarchar and varchar binaries don't match up (16-bit vs 8-bit), so you need to convert your varchar back up to nvarchar again before doing the binary comparison:
select *
from my_table
where CONVERT(binary(5000),my_table.my_column) != CONVERT(binary(5000),CONVERT(nvarchar(1000),CONVERT(varchar(1000),my_table.my_column)))
If you are looking for a specific unicode character, you might use something like below.
select Fieldname from
(
select Fieldname,
REPLACE(Fieldname COLLATE Latin1_General_BIN,
NCHAR(65533) COLLATE Latin1_General_BIN,
'CustomText123') replacedcol
from table
) results where results.replacedcol like '%CustomText123%'
My previous answer was confusing UNICODE/non-UNICODE data. Here is a solution that should work for all situations, although I'm still running into some anomalies. It seems like certain non-ASCII unicode characters for superscript characters are being confused with the actual number character. You might be able to play around with collations to get around that.
Hopefully you already have a numbers table in your database (they can be very useful), but just in case I've included the code to partially fill that as well.
You also might need to play around with the numeric range, since unicode characters can go beyond 255.
CREATE TABLE dbo.Numbers
(
number INT NOT NULL,
CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
)
GO
DECLARE #i INT
SET #i = 0
WHILE #i < 1000
BEGIN
INSERT INTO dbo.Numbers (number) VALUES (#i)
SET #i = #i + 1
END
GO
SELECT *,
T.ID, N.number, N'%' + NCHAR(N.number) + N'%'
FROM
dbo.Numbers N
INNER JOIN dbo.My_Table T ON
T.description LIKE N'%' + NCHAR(N.number) + N'%' OR
T.summary LIKE N'%' + NCHAR(N.number) + N'%'
and t.id = 1
WHERE
N.number BETWEEN 127 AND 255
ORDER BY
T.id, N.number
GO
-- This is a very, very inefficient way of doing it but should be OK for
-- small tables. It uses an auxiliary table of numbers as per Itzik Ben-Gan and simply
-- looks for characters with bit 7 set.
SELECT *
FROM yourTable as t
WHERE EXISTS ( SELECT *
FROM msdb..Nums as NaturalNumbers
WHERE NaturalNumbers.n < LEN(t.string_column)
AND ASCII(SUBSTRING(t.string_column, NaturalNumbers.n, 1)) > 127)