� IN SQL Server database - sql

In my database I have this character: �. I want to locate the rows that contain it with a query:
Select *
from Sometable
where somecolumn like '%�%'
This returns no rows.
I think the encoding is ANSI.

Use the N prefix, as below:
where col like N'%�%'
Why you need the N prefix:
Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
Thanks to Martin Smith: I had tested with only one character and it worked, but as Martin pointed out, that query returns all characters.
The query below works and returns only the intended row:
select * from #demo where id like N'%�%'
COLLATE Latin1_General_100_BIN
Demo:
create table #demo
(
id nvarchar(max)
)
insert into #demo
values
(N'ﬗ'),
( N'�')
To learn more about Unicode, see the links below:
http://kunststube.net/encoding/
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

This is the Unicode replacement character symbol.
It could match any of 2,048 invalid code points in the UCS-2 encoding (or the single character U+FFFD for the symbol itself).
You can use a range and a binary collate clause to match them all (demo).
WITH T(N)
AS
(
SELECT TOP 65536 NCHAR(ROW_NUMBER() OVER (ORDER BY @@SPID))
FROM master..spt_values v1,
master..spt_values v2
)
SELECT N
FROM T
WHERE N LIKE '%[' + NCHAR(65533) + NCHAR(55296) + '-' + NCHAR(57343) + ']%' COLLATE Latin1_General_100_BIN
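Applied to the table from the question, the same pattern might look like the sketch below. It assumes somecolumn is an nvarchar column (a varchar column on a legacy code page cannot hold U+FFFD or surrogate code units in the first place); Sometable and somecolumn are simply the names used in the question.

```sql
-- Sketch: rows whose nvarchar column contains U+FFFD or any surrogate code unit.
-- Assumes somecolumn is nvarchar. The binary collation makes the range compare
-- by code value rather than by linguistic sort order.
SELECT *
FROM Sometable
WHERE somecolumn LIKE N'%[' + NCHAR(65533) + NCHAR(55296) + N'-' + NCHAR(57343) + N']%'
      COLLATE Latin1_General_100_BIN;
```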

You can use ASCII to find the ASCII code for that character:
Select ascii('�')
Then use CHAR to turn that code back into a character and combine it in a LIKE expression:
Select * from Sometable
where somecolumn like '%'+CHAR(63)+'%'
Note that the collation you use can affect the result. It also depends on the encoding your application uses to send the data (UTF-8, Unicode, etc.), and whether you store it as VARCHAR or NVARCHAR has the final say on what you see.
There's more here in this similar question
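To see the VARCHAR versus NVARCHAR effect in isolation, a small sketch like the following may help. The variable names are made up for illustration, and the exact value stored in the varchar variable depends on the database's default code page.

```sql
-- Sketch: the same N-literal assigned to a varchar and an nvarchar variable.
DECLARE @as_varchar  varchar(10)  = N'12�45';  -- converted to the database code page
DECLARE @as_nvarchar nvarchar(10) = N'12�45';  -- kept as Unicode

SELECT UNICODE(SUBSTRING(@as_varchar, 3, 1))  AS varchar_code,
       UNICODE(SUBSTRING(@as_nvarchar, 3, 1)) AS nvarchar_code;
-- On a Latin1 code page the first value is typically 63 ('?');
-- the second is 65533 (U+FFFD).
```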
EDIT
@Mark
Try this simple test:
create table sometable(somecolumn nvarchar(100) not null)
GO
insert into sometable
values
('12345')
,('123�45')
,('12345')
GO
select * from sometable
where somecolumn like '%'+CHAR(63)+'%'
GO
This only means that the character was stored as a "?" in this test.
When you see a �, it means the application displaying it isn't sure what to print.
It also means the OP probably needs to find out which character it is using a query.
Also note that a string displayed as ��� can be made up of three different characters.
CHAR(63) was just an example, but you are right: in the ASCII table it is the ordinary question mark.
EDIT
@Bridge
I don't have time to dig into this right now, but the test below didn't work:
Select ascii('�'), CHAR(ascii('�')), UNICODE(N'�'), CHAR(UNICODE(N'�'))
GO
create table sometable(somecolumn nvarchar(100) not null)
GO
insert into sometable
values
('12345')
,('123�45')
,('12345')
,('12'+NCHAR(UNICODE(N'�'))+'345')
GO
select * from sometable
where somecolumn like '%'+CHAR(63)+'%'
select * from sometable
where somecolumn like '%'+NCHAR(UNICODE(N'�'))+'%'
GO

Related

while generating insert script as dynamic query, this N is not getting prefixed even though the column is nvarchar

SELECT 'INSERT INTO test ( name )
VALUES ( '''+S1.name+''' )' from SourceTable S1
When generating an insert script as a dynamic query, the Unicode prefix N is not added even though the column is nvarchar.
An insert without a dynamic query worked well with the N prefix.
Since a non-Latin1 character is included, you need a Unicode N-literal.
insert into [test] select N'― menu1;'
Add N in the generated statement just after the equals sign. A silly solution, but it took me a while because I was overthinking it:
SELECT 'UPDATE [test] SET name=N'''+name+'updated' + ''''
FROM [test]
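The same idea applied to the INSERT generator from the question could look like this. It is a sketch using the question's test and SourceTable names; the REPLACE call that doubles embedded single quotes is an addition not present in the original query, and a NULL name would make the whole generated line NULL.

```sql
-- Sketch: generate INSERT statements that keep Unicode data intact.
-- The N before the opening quote is part of the generated text;
-- REPLACE doubles any single quotes inside the value so the script stays valid.
SELECT N'INSERT INTO test ( name ) VALUES ( N'''
       + REPLACE(S1.name, N'''', N'''''')
       + N''' )'
FROM SourceTable S1;
```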

Optimization of a substring query with charindex to trim the left part of a string

I need to get the substring 1234 out of xyzdf/1234 (i.e. trim everything up to and including the slash /). I have used:
substring('xyzdf/1234',charindex('/','xyzdf/1234')+1,len('xyzdf/1234')-charindex('/','xyzdf/1234'))
which works but is repetitive.
Then I tried this:
stuff('xyzdf/1234',1,charindex('/','xyzdf/1234'),'')
This works too and is more compact, but it still repeats the same argument 'xyzdf/1234' twice.
I wonder what the fastest way to trim the left part would be. I need to clean the data in one column for a million records, and I am not sure whether STUFF is fast enough (mind you, it is a bulk operation). Thanks!
You could select the string from a VALUES clause.
That way you can reference the value repeatedly without hardcoding it twice.
Then take the right-hand part, which contains the number.
For example, using RIGHT, CHARINDEX, REVERSE and VALUES:
select right(val, charindex('/',reverse(val))-1) as nr
from (values ('xyzdf/1234')) q(val);
Or use SUBSTRING, CHARINDEX, LEN and VALUES:
select substring(val,charindex('/',val)+1,len(val)) as nr
from (values ('xyzdf/1234')) q(val);
Or abuse PARSENAME:
select parsename(replace('xyzdf/1234','/','.'),1) as nr;
Or use variables:
declare @value varchar(30) = 'xyzdf/1234';
declare @nr int = right(@value, charindex('/',reverse(@value))-1);
select @nr as nr;
But what if the intention is to update a column so that only the number remains?
Then the SUBSTRING method is probably still the safest,
because it leaves values without a / untouched and does not fail with an "Invalid length parameter passed" error.
Example:
declare @Table table (id int identity(1,1) primary key, col1 varchar(30));
insert into @Table (col1) values
('xyzdf/1234'),
('12345');
update @Table
set col1 = substring(col1,charindex('/',col1)+1,len(col1))
where col1 like '%/[0-9]%';
select * from @Table;
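The difference between the approaches for a value without a slash can be checked with a short sketch like this one. The variable name @val is made up for illustration, and the RIGHT call is left commented out because it is expected to fail.

```sql
-- Sketch: how each expression behaves when the value contains no slash.
DECLARE @val varchar(30) = '12345';

-- CHARINDEX returns 0, so SUBSTRING starts at position 1 and returns the whole value.
SELECT SUBSTRING(@val, CHARINDEX('/', @val) + 1, LEN(@val)) AS via_substring;

-- STUFF deletes 0 characters at position 1, so the value is also unchanged.
SELECT STUFF(@val, 1, CHARINDEX('/', @val), '') AS via_stuff;

-- RIGHT would receive a length of -1 and raise an "Invalid length parameter" error.
-- SELECT RIGHT(@val, CHARINDEX('/', REVERSE(@val)) - 1) AS via_right;
```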

Cannot find letter 'ș' or 'Ș' inserted from Romanian (Standard) keyboard

I have a table in SQL Server 2012 where one column is nvarchar. It contains Romanian characters. We've noticed that some of the 'Ș' letters do not show in reports at all, and I found that it depends on the keyboard settings.
There are two different keyboard layouts for Romanian: Standard and Legacy. The letter 'Ș' inserted from the Romanian (Standard) keyboard has ASCII code 63; from Legacy it is 170.
The letter 'Ş' with CHAR(170) shows in reports, but the one with CHAR(63) doesn't, even though it should be the same letter.
It would be simple if I could replace char(63) with char(170), but I cannot detect rows with character 63. The next select doesn't return rows:
select * from table1 where columnname like '%'+CHAR(63)+'%'
even though select ASCII(SUBSTRING(columnname , 1, 1)) returns 63.
Even select charindex(char(63), columnname) returns 0.
I also tried specifying a collation:
select * from table1 where columnname COLLATE Latin1_general_CI_AI like N'%s%'
It doesn't help: it returns only rows with 's' and char(170).
Please help me find the rows with the wrong 'Ş'.
First, as I said in my comments, CHAR(63) is misleading, as it represents a character that SQL Server is unable to display:
Unable to replace Char(63) by SQL query
The issue is possibly down to your selected collation: if I run this sample, I get the 2 rows containing the special characters:
CREATE TABLE #temp ( val NVARCHAR(50) )
INSERT INTO #temp
( val )
VALUES ( N'Șome val 1' ),
( N'some val 2' ),
( N'șome other val 3' )
SELECT *
FROM #temp
WHERE val COLLATE Latin1_General_BIN LIKE N'%ș%'
OR val COLLATE Latin1_General_BIN LIKE N'%Ș%'
DROP TABLE #temp
Output
val
=================
Șome val 1
șome other val 3
The specified collation is: Latin1_General_BIN, as found in this post:
replace only matches the beginning of the string
WHERE columnname LIKE N'%'+NCHAR(536)+'%'
This should help you find the character even if it was inserted as an unknown character as in the first insert below.
DECLARE @Table TABLE (text nvarchar(50))
INSERT INTO @Table(text)
SELECT 'Ș'
UNION ALL
SELECT N'Ș'
SELECT UNICODE(text) UNICODE
FROM @Table
Results:
UNICODE
63
536
'Ș' is NCHAR(536) and 'ș' is NCHAR(537).
If you then do:
SELECT * FROM @Table WHERE text LIKE N'%'+NCHAR(536)+'%'
Results:
text
?
Ș
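If the reports can only render the cedilla forms, one possible follow-up is to rewrite the comma-below letters once they have been located. The sketch below assumes the table1 and columnname names from the question, and that the cedilla letters Ş/ş are NCHAR(350)/NCHAR(351). Rows where the letter was already stored as a literal '?' (code 63) have lost the original character and cannot be repaired this way.

```sql
-- Sketch: rewrite comma-below Ș/ș (NCHAR(536)/NCHAR(537))
-- to cedilla Ş/ş (NCHAR(350)/NCHAR(351)).
-- The binary collation keeps every comparison to exact code values.
UPDATE table1
SET columnname = REPLACE(
                     REPLACE(columnname COLLATE Latin1_General_BIN, NCHAR(536), NCHAR(350))
                         COLLATE Latin1_General_BIN,
                     NCHAR(537), NCHAR(351))
WHERE columnname COLLATE Latin1_General_BIN LIKE N'%[' + NCHAR(536) + NCHAR(537) + N']%';
```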

Create rule to restrict special characters in table in sql server

I want to create a rule that prevents special characters from being entered into a column.
I have tried the following, but it didn't work.
CREATE RULE rule_spchar
AS
@make LIKE '%[^[^*|\":<>[]{}`\( );#&$]+$]%'
I don't know what I am doing wrong here. Any help would be appreciated.
You can create a check constraint on this column and only allow numbers and letters to be inserted into it, see below:
Check Constraint to only Allow Numbers & Alphabets
ALTER TABLE Table_Name
ADD CONSTRAINT ck_No_Special_Characters
CHECK (Column_Name NOT LIKE '%[^A-Z0-9]%')
Check Constraint to only Allow Numbers
ALTER TABLE Table_Name
ADD CONSTRAINT ck_Only_Numbers
CHECK (Column_Name NOT LIKE '%[^0-9]%')
Check Constraint to only Allow Alphabets
ALTER TABLE Table_Name
ADD CONSTRAINT ck_Only_Alphabets
CHECK (Column_Name NOT LIKE '%[^A-Z]%')
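A quick way to try the first constraint is on a throwaway temp table, as in the sketch below. The table and column names are made up for illustration, and whether lowercase letters pass depends on the column's collation.

```sql
-- Sketch: try the alphanumeric-only constraint on a throwaway temp table.
CREATE TABLE #codes
(
    code varchar(20) NOT NULL
        CHECK (code NOT LIKE '%[^A-Z0-9]%')
);

INSERT INTO #codes (code) VALUES ('ABC123');   -- accepted
INSERT INTO #codes (code) VALUES ('ABC#123');  -- rejected: '#' is outside A-Z and 0-9

DROP TABLE #codes;
```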
It's important to remember Microsoft's plans for the features you're using or intending to use. CREATE RULE is a deprecated feature that won't be around for long. Consider using CHECK CONSTRAINT instead.
Also, since the character exclusion class doesn't actually operate like a RegEx, trying to exclude brackets [] is impossible this way without multiple calls to LIKE. So collating to an accent-insensitive collation and using an alphanumeric inclusive filter will be more successful. More work required for non-latin alphabets.
M.Ali's NOT LIKE '%[^A-Z0-9 ]%' Should serve well.
M.Ali's answer represents the best practice for the solution you describe. That being said, I read your question differently (i.e. what is wrong with the way you're implementing the LIKE comparison).
You are not properly escaping wildcard characters.
The expression 'AB' LIKE '%[AB]%' is true. The expression 'ZB' LIKE '%[^AB]%' is also true, since that statement is the equivalent of 'Z' LIKE '[^AB]' OR 'B' LIKE '[^AB]'. Instead, you should use 'YZ' NOT LIKE '%[^AB]%', which is the equivalent of 'Y' NOT LIKE '%[^AB]%' AND 'Z' NOT LIKE '%[^AB]%'.
You didn't escape the single quote or invisible characters. Take a look at the ASCII characters. You would be better served implementing a solution like M.Ali's and adding any characters you do not wish to exclude.
The following script demonstrates the formation of a complex wildcard statement that consists of special characters.
-- Create sample data
-- Experiment testing various characters
DECLARE @temp TABLE (id INT NOT NULL, string1 varchar(10) NOT NULL)
INSERT INTO @temp
(id,string1)
SELECT 1, '12]34'
UNION
SELECT 2, '12[34'
UNION
SELECT 3, '12_34'
UNION
SELECT 4, '12%34'
UNION
SELECT 5, '12]34'
SET NOCOUNT ON
DECLARE @SQL_Wildcard_Characters VARCHAR(512),
@Count_SQL_Wildcard_Characters INT,
@Other_Special_Characters VARCHAR(255),
@Character_Position INT,
@Escape_Character CHAR(1),
@Complete_Wildcard_Expression VARCHAR(1024)
SET @Character_Position = 1
-- Note these need to be escaped:
SET @SQL_Wildcard_Characters = '[]^%_'
-- Choose an escape character.
SET @Escape_Character = '~'
-- I added the single quote (') ASCII 39 and the space ( ) ASCII 32.
-- You could also add the actual characters, but this approach may make it easier to read.
SET @Other_Special_Characters = '*|\":<>{}`\();#&$' + CHAR(39) + CHAR(32)
-- Quick loop to escape the @SQL_Wildcard_Characters
SET @Count_SQL_Wildcard_Characters = LEN(@SQL_Wildcard_Characters)
WHILE @Character_Position < 2*@Count_SQL_Wildcard_Characters
BEGIN
SET @SQL_Wildcard_Characters = STUFF(@SQL_Wildcard_Characters,@Character_Position,0,@Escape_Character)
SET @Character_Position = @Character_Position + 2
END
-- Concatenate the respective strings
SET @Complete_Wildcard_Expression = @SQL_Wildcard_Characters+@Other_Special_Characters
-- Shows how the statement works for a match
SELECT ID, string1, @Complete_Wildcard_Expression AS [expression]
FROM @temp
WHERE string1 LIKE '%['+@Complete_Wildcard_Expression+']%' ESCAPE @Escape_Character
-- Shows how the statement works for a non-match
SELECT ID, string1, @Complete_Wildcard_Expression AS [expression]
FROM @temp
WHERE string1 NOT LIKE '%[^'+@Complete_Wildcard_Expression+']%' ESCAPE @Escape_Character
CREATE FUNCTION udf_checkspecial_characters(@String varchar(MAX))
RETURNS INT AS
BEGIN
DECLARE @Result INT;
SELECT @Result=(CASE WHEN @String COLLATE Latin1_General_BIN LIKE '%[(<~!#/#$%^&>)]%' THEN 1 ELSE 0 END);
RETURN @Result;
END

SQL Server 2008 query to find rows containing non-alphanumeric characters in a column

I was actually asked this myself a few weeks ago. I know exactly how to do this with an SP or UDF, but I was wondering whether there is a quick and easy way of doing it without these methods. I'm assuming that there is and I just can't find it.
A point I need to make is that although we know what characters are allowed (a-z, A-Z, 0-9) we don't want to specify what is not allowed (##!$ etc...). Also, we want to pull the rows which have the illegal characters so that it can be listed to the user to fix (as we have no control over the input process we can't do anything at that point).
I have looked through SO and Google previously, but was unable to find anything that did what I wanted. I have seen many examples which can tell you if it contains alphanumeric characters, or doesn't, but something that is able to pull out an apostrophe in a sentence I have not found in query form.
Please note also that values can be null or '' (empty) in this varchar column.
Won't this do it?
SELECT * FROM TABLE
WHERE COLUMN_NAME LIKE '%[^a-zA-Z0-9]%'
Setup
use tempdb
create table mytable ( mycol varchar(40) NULL)
insert into mytable VALUES ('abcd')
insert into mytable VALUES ('ABCD')
insert into mytable VALUES ('1234')
insert into mytable VALUES ('efg%^&hji')
insert into mytable VALUES (NULL)
insert into mytable VALUES ('')
insert into mytable VALUES ('apostrophe '' in a sentence')
SELECT * FROM mytable
WHERE mycol LIKE '%[^a-zA-Z0-9]%'
drop table mytable
Results
mycol
----------------------------------------
efg%^&hji
apostrophe ' in a sentence
SQL Server has very limited regex support. You can use PATINDEX with something like this:
PATINDEX('%[a-zA-Z0-9]%',Col)
Have a look at PATINDEX (Transact-SQL)
and Pattern Matching in Search Conditions
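For the question as asked (finding non-alphanumeric characters), the character class would need to be negated. A minimal sketch, using an inline VALUES list that mirrors the sample data from the earlier answer, could be:

```sql
-- Sketch: report the position of the first non-alphanumeric character.
-- PATINDEX returns 0 when there is no match and NULL for a NULL input,
-- so NULL and empty values are filtered out by the WHERE clause.
SELECT v.mycol,
       PATINDEX('%[^a-zA-Z0-9]%', v.mycol) AS first_bad_position
FROM (VALUES ('abcd'),
             ('efg%^&hji'),
             (NULL),
             (''),
             ('apostrophe '' in a sentence')) AS v(mycol)
WHERE PATINDEX('%[^a-zA-Z0-9]%', v.mycol) > 0;
```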
I found this page with quite a neat solution. What makes it great is that you get an indication of what the character is and where it is. Then it gives a super simple way to fix it (which can be combined and built into a piece of driver code to scale up its application).
DECLARE @tablename VARCHAR(1000) ='Schema.Table'
DECLARE @columnname VARCHAR(100)='ColumnName'
DECLARE @counter INT = 0
DECLARE @sql VARCHAR(MAX)
WHILE @counter <=255
BEGIN
SET @sql=
'SELECT TOP 10 '+@columnname+','+CAST(@counter AS VARCHAR(3))+' as CharacterSet, CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') as LocationOfChar
FROM '+@tablename+'
WHERE CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') <> 0'
PRINT (@sql)
EXEC (@sql)
SET @counter = @counter + 1
END
END
and then...
UPDATE Schema.Table
SET ColumnName= REPLACE(Columnname,CHAR(13),'')
Credit to Ayman El-Ghazali.
SELECT * FROM TABLE_NAME WHERE COL_NAME LIKE '%[^0-9a-zA-Z $#$.$-$''''$,]%'
This works best for me when I'm trying to find any special characters in a string