Repeating characters in T-SQL LIKE condition - sql

Problem:
Limit the value of a VARCHAR variable (or a column) to digits and ASCII characters, but allow variable length.
This script will not yield required result:
declare @var as VARCHAR (150)
select @var = '123ABC'
if (@var LIKE '[a-zA-Z0-9]{0,150}')
print 'OK'
else
print 'Not OK'
Does anyone have an idea how to do this?

You can do this with the negating caret (^) and a NOT LIKE expression.
In effect you say: where the value is not like anything containing a non-alphanumeric character ;) This works for standard numbers and characters:
declare @var as VARCHAR (150)
select @var = '123ABC'
if (@var NOT LIKE '%[^a-zA-Z0-9]%')
print 'OK'
else
print 'Not OK'
Edit: Thanks to Martin for the collation hint. If you want characters like ý to be treated as non-alphanumeric, add COLLATE as below:
declare @var as VARCHAR (150)
select @var = '123ABCý'
if (@var NOT LIKE '%[^a-zA-Z0-9]%' COLLATE Latin1_General_BIN )
print 'OK'
else
print 'Not OK'
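Since the question also mentions a column, the same pattern can be sketched as a CHECK constraint. The temp table #AlnumDemo and its column name are made up for illustration; note that an empty string passes this check, so add a length test if that matters.

```sql
-- Hypothetical demo table: the CHECK rejects any value containing a
-- character outside a-z, A-Z, 0-9 (binary collation, so ý is rejected too).
CREATE TABLE #AlnumDemo
(
    Code VARCHAR(150) NOT NULL
        CHECK (Code NOT LIKE '%[^a-zA-Z0-9]%' COLLATE Latin1_General_BIN)
);

INSERT INTO #AlnumDemo (Code) VALUES ('123ABC');    -- accepted
INSERT INTO #AlnumDemo (Code) VALUES ('123-ABC');   -- fails: violates the CHECK constraint

DROP TABLE #AlnumDemo;
```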

Will this help?
Declare @t table (Alphanumeric VARCHAR(100))
Insert Into @t
Select '123ABCD' Union All Select 'ABC' Union All
Select '123' Union All Select '123ABCý' Union All
Select 'a-z123' Union All Select 'abc123' Union All
Select 'a1b2c3d4'
SELECT Alphanumeric
FROM @t
WHERE Alphanumeric LIKE '%[a-zA-Z0-9]%'
AND ( Alphanumeric NOT LIKE '%[^0-9a-zA-Z]%' COLLATE Latin1_General_BIN)
AND LEN(Alphanumeric)> 6 -- display records having more than a length of 6
//Result
Alphanumeric
123ABCD
a1b2c3d4
N.B. This uses Martin's collation hint. Thanks.

T-SQL doesn't support RegEx.
You can use SQL CLR to run such expression though.
Also try the LEN function:
if (LEN(@var) <= 150)
print 'OK'
else
print 'Not OK'

T-SQL doesn’t support regex. The closest you can get is the PATINDEX function, which you can use to match specific characters, but you can’t specify a repetition count.
You can try combining it with the LEN function to check the length.
See this page for a few examples of PATINDEX.
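A minimal sketch of that combination, reusing the variable from the question. PATINDEX returns 0 when no character outside the allowed set is found; the COLLATE clause is applied to the input on the assumption that accented letters should be rejected.

```sql
DECLARE @var VARCHAR(150) = '123ABC';

-- LEN checks the length bounds; PATINDEX = 0 means no disallowed character exists.
IF LEN(@var) BETWEEN 1 AND 150
   AND PATINDEX('%[^a-zA-Z0-9]%', @var COLLATE Latin1_General_BIN) = 0
    PRINT 'OK';
ELSE
    PRINT 'Not OK';
```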

Related

SQL Server : Nvarchar to Varchar

I have a table with two columns: one is of type Varchar and the other is NVarchar.
I want to update all the rows so VarcharField = NVarcharField.
It won't let me because some of the rows contain chars that are not allowed in varchar column with the current code page.
How can I find these rows?
Is it possible to remove any char that doesn't fit the specific code page I'm using?
SQL Server 2012.
You can find the rows by attempting to convert the nvarchar() col to varchar():
select nvarcharcol
from t
where try_convert(varchar(max), nvarcharcol) is null;
Try this to find the rows with values that are not supported by varchar:
declare @strText nvarchar(max)
set @strText = N'Keep calm and say தமிழன்டா'
select cast(@strText as varchar(max)) col1 , N'Keep calm and say தமிழன்டா' col2
Here @strText contains non-English characters. When you cast it to varchar, those characters turn into ????, so col1 and col2 are not equal.
select nvar_col
from tabl_name
where nvar_col != cast(nvar_col as varchar(max))
Is it possible to remove any char that doesn't fit the specific code page I'm using?
update tabl_name
set nvar_col = replace(cast(nvar_col as varchar(max)),'?','')
where nvar_col != cast(nvar_col as varchar(max))
This replaces each ? with an empty string and updates the rows. Note that it also removes any genuine question marks in those rows.
If Gordon's approach doesn't work because you get question marks from TRY_CONVERT instead of the expected NULL, try this approach:
SELECT IsConvertible = CASE WHEN NULLIF(REPLACE(TRY_CONVERT(varchar(max), N'人物'), '?',''), '') IS NULL
THEN 'No' ELSE 'Yes' END
If you need it as filter for the rows that can't be converted:
SELECT t.*
FROM dbo.TableName t
WHERE NULLIF(REPLACE(TRY_CONVERT(varchar(max), t.NVarcharField), '?',''), '') IS NULL
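If genuine question marks must survive, one possible alternative is to test each character on its own and keep only those that round-trip through varchar unchanged. This is a rough sketch under assumptions: the sample string and variable names are invented, the conversion uses the database's default code page, and it is a row-by-row loop, so it would be slow on large tables.

```sql
-- Sketch: keep only the characters of @src that survive nvarchar -> varchar -> nvarchar.
DECLARE @src NVARCHAR(100) = N'Why? 人物 ok';   -- sample input (assumption)
DECLARE @out VARCHAR(100) = '';
DECLARE @i INT = 1;
DECLARE @n INT = LEN(@src + N'.') - 1;          -- character count, including trailing spaces
DECLARE @c NCHAR(1);

WHILE @i <= @n
BEGIN
    SET @c = SUBSTRING(@src, @i, 1);
    -- A character that the code page cannot represent comes back as something else
    -- (typically '?'), so the binary comparison fails and it is skipped.
    IF CAST(CAST(@c AS VARCHAR(2)) AS NCHAR(1)) = @c COLLATE Latin1_General_BIN2
        SET @out = @out + CAST(@c AS VARCHAR(2));
    SET @i = @i + 1;
END

SELECT @out AS cleaned;
```

On a code page that cannot represent the two CJK characters, they are dropped while the real question mark after "Why" is kept.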

Replace ; with blank in SQL

I have a table like this:
DECLARE @T TABLE
(note VARCHAR (50))
INSERT @T
SELECT 'Amplifier'
UNION ALL SELECT ';'
UNION ALL SELECT 'Regulator'
How can I replace the semicolon (';') with a blank ('')?
Expected Output:
Amplifier
'' -- here semicolon replace with blank
Regulator
If you want to replace ALL semicolons from any outputted cell you can use REPLACE like this:
SELECT REPLACE(note,';','') AS [note] FROM @T
Fetching from the given table, use a CASE statement:
SELECT CASE WHEN note = ';' THEN '' ELSE note END AS note FROM @T;
replace() would replace all occurrences of the character. Doesn't seem like you'd want that. This expression only replaces exact matches of the whole string.
It looks like you need to REPLACE all your semicolons:
DECLARE @T TABLE
(note VARCHAR (50))
INSERT INTO @T
SELECT REPLACE(SourceColumn, ';', '')
FROM SourceTable
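SQL Server 2017 introduced the function TRANSLATE, where you can specify a list of characters to be replaced. The replacement list must have the same length as the character list, so four characters need four spaces here:
SELECT TRANSLATE('MAX(0,MIN(h-36,8))', '(,-)', '    ') --> 'MAX 0 MIN h 36 8  '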
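TRANSLATE maps characters one-to-one and cannot delete them. To remove several different characters, one option is to translate them all to a single marker character and then REPLACE that marker with an empty string. A small sketch with a made-up input string (SQL Server 2017 or later):

```sql
-- Map both ';' and ',' to ';', then strip every ';'.
SELECT REPLACE(TRANSLATE('a;b,c', ';,', ';;'), ';', '') AS cleaned;  -- 'abc'
```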

Cannot find letter 'ș' or 'Ș' inserted from Romanian (Standard) keyboard

I have a table in SQL Server 2012 where one column is nvarchar. It contains Romanian characters. We've noticed that some of the letters 'Ș' do not show in reports at all, and I found that this depends on the keyboard settings.
There are two different keyboard settings for Romanian: Standard and Legacy. The letter 'Ș' inserted from the Romanian (Standard) keyboard has ASCII code 63; from Legacy it is 170.
The letter 'Ş' with CHAR(170) shows in reports, but CHAR(63) doesn't, even though it should be the same letter.
It would be simple if I could replace CHAR(63) with CHAR(170), but I cannot detect rows with character 63. The following select doesn't return any rows:
select * from table1 where columnname like '%'+CHAR(63)+'%'
even though select ASCII(SUBSTRING(columnname , 1, 1)) returns 63.
Even select charindex(char(63), columnname) returns 0.
I also tried specifying a collation:
select * from table1 where columnname COLLATE Latin1_general_CI_AI like N'%s%'
It doesn't help: it returns only rows with 's' and CHAR(170).
Please help me find the rows with the wrong 'Ş'.
First, as noted in my comments, CHAR(63) is misleading: it is the question mark that SQL Server shows for a character it is unable to represent:
Unable to replace Char(63) by SQL query
The issue is possibly down to your selected collation. If I run this sample, I get the two rows containing the special characters:
CREATE TABLE #temp ( val NVARCHAR(50) )
INSERT INTO #temp
( val )
VALUES ( N'Șome val 1' ),
( N'some val 2' ),
( N'șome other val 3' )
SELECT *
FROM #temp
WHERE val COLLATE Latin1_General_BIN LIKE N'%ș%'
OR val COLLATE Latin1_General_BIN LIKE N'%Ș%'
DROP TABLE #temp
Output
val
=================
Șome val 1
șome other val 3
The specified collation is: Latin1_General_BIN, as found in this post:
replace only matches the beginning of the string
WHERE columnname LIKE N'%'+NCHAR(536)+'%'
This should help you find the character even if it was inserted as an unknown character as in the first insert below.
DECLARE @Table TABLE (text nvarchar(50))
INSERT INTO @Table(text)
SELECT 'Ș'
UNION ALL
SELECT N'Ș'
SELECT UNICODE(text) UNICODE
FROM @Table
Results:
UNICODE
63
536
'Ș' is NCHAR(536) and 'ș' is NCHAR(537).
If you then do:
SELECT * FROM @Table WHERE text LIKE N'%'+NCHAR(536)+'%'
Results:
text
?
Ș
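Putting the two answers together, here is a hedged sketch that locates the comma-below letters with a binary collation and swaps them for the cedilla forms the reports display. It assumes the cedilla letters are NCHAR(350) and NCHAR(351) (U+015E and U+015F); the table variable and sample rows are invented. Values already stored as a literal '?' have lost the original character and cannot be recovered this way.

```sql
DECLARE @t TABLE (val NVARCHAR(50));
INSERT INTO @t (val) VALUES (N'Șome val 1'), (N'some val 2'), (N'șome other val 3');

-- Find rows containing Ș (536) or ș (537), compared by code point.
SELECT val, UNICODE(val) AS first_code_point
FROM @t
WHERE val COLLATE Latin1_General_BIN2 LIKE N'%' + NCHAR(536) + N'%'
   OR val COLLATE Latin1_General_BIN2 LIKE N'%' + NCHAR(537) + N'%';

-- Preview the replacement with the cedilla forms Ş (350) and ş (351).
SELECT REPLACE(REPLACE(val COLLATE Latin1_General_BIN2,
                       NCHAR(536), NCHAR(350)),
               NCHAR(537), NCHAR(351)) AS fixed
FROM @t;
```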

VARCHAR vs VARCHAR(X)

This results in 'B':
DECLARE @NAME VARCHAR(20)=' s'
IF (@NAME IS NULL OR @NAME='')
SELECT 'A'
ELSE
SELECT 'B'
Whereas, this results in 'A'.
DECLARE @NAME VARCHAR=' s'
IF (@NAME IS NULL OR @NAME='')
SELECT 'A'
ELSE
SELECT 'B'
The only difference is VARCHAR(20) vs VARCHAR.
What is the reason for this odd behaviour?
SQL Server defaults a VARCHAR of unspecified length to a length of 1 in a variable declaration. Combined with Microsoft's interpretation of ANSI/ISO SQL-92 (ref here), which pads compared strings to equal length during equality comparisons, this makes ' ' equal to '', hence the non-intuitive 'A' in the second test.
VARCHAR needs a size to be specified. When no size is specified, SQL Server assumes the declaration is one character long.
To confirm this, you can check the length of the variables.
Because ' s' is truncated to ' ', which compares equal to '', your IF condition is satisfied.
DECLARE @NAME VARCHAR=' s'
select datalength(@name)
returns 1
DECLARE @NAME VARCHAR(20)=' s'
select datalength(@name)
returns 2
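Both effects can be seen side by side in a short sketch. It also shows a related default that is easy to confuse with this one: in CAST and CONVERT, an unspecified VARCHAR length means 30, not 1. The variable name is illustrative.

```sql
DECLARE @one VARCHAR = ' s';   -- silently truncated to ' '

SELECT DATALENGTH(@one)                                  AS declared_len,   -- 1
       DATALENGTH(CAST(REPLICATE('x', 40) AS VARCHAR))   AS cast_len,       -- 30
       CASE WHEN ' ' = '' THEN 'equal' ELSE 'different' END AS padded_cmp;  -- equal
```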

What is the simplest/best way to remove substrings at the end of a string?

I have a function that normalizes addresses. What I would like to do now is remove any of the strings in a limited, specified list if they occur at the end of the string. Let's say the strings I want to remove are 'st', 'ave', 'rd', 'dr', 'ct'... If the string ends with any of these strings, I want to remove them. What is the best way to accomplish this, using T-SQL (this will not be part of a select statement)?
Edit:
This is a function that accepts one address and formats it. I would like to inline the code, and the list, but in the simplest way possible. For example, some code that I've been playing with is:
if @address LIKE '%st'
SET @address = substring(@address, 1, PatIndex('%st', @address) - 1)
Is this a good method? How can I put it in some sort of loop so I can repeat this code with different values (other than st)?
Adding the values to be trimmed to a new table allows you to easily add new values and to use this table to clean up addresses.
SQL Statement
DECLARE @Input VARCHAR(32)
SET @Input = 'Streetsstaverddrad'
DECLARE @Trim TABLE (Value VARCHAR(32))
INSERT INTO @Trim
SELECT 'st'
UNION ALL SELECT 'ave'
UNION ALL SELECT 'rd'
UNION ALL SELECT 'dr'
UNION ALL SELECT 'ad'
-- Keep stripping while the input still ends with any value in @Trim
WHILE EXISTS (
SELECT *
FROM (
SELECT [Adres] = @Input
) i
INNER JOIN @Trim t ON i.Adres LIKE '%' + t.Value
)
BEGIN
SELECT @Input = SUBSTRING(Adres, 1, LEN(Adres) - LEN(t.Value))
FROM (
SELECT [Adres] = @Input
) i
INNER JOIN @Trim t ON i.Adres LIKE '%' + t.Value
END
SELECT @Input
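If the list should live inline in the function rather than in a table, a table value constructor is one way to sketch it. This variant removes at most one suffix and only when it is preceded by a space, which is an assumption about the address format rather than something the question states. The sample address is made up, and the VALUES syntax needs SQL Server 2008 or later.

```sql
DECLARE @address VARCHAR(100) = '123 Main st';

-- Strip one trailing suffix from the inline list, preferring the longest match.
-- If nothing matches, the SELECT returns no row and @address is left unchanged.
SELECT TOP (1) @address = RTRIM(LEFT(@address, LEN(@address) - LEN(s.suffix)))
FROM (VALUES ('st'), ('ave'), ('rd'), ('dr'), ('ct')) AS s(suffix)
WHERE @address LIKE '% ' + s.suffix
ORDER BY LEN(s.suffix) DESC;

SELECT @address AS trimmed;   -- '123 Main'
```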
In SQL Server 2005 and later it is possible to define a user-defined function that enables regular expression matching (for example through SQL CLR). You will need to define a function that strips the trailing strings. A regex to match the scenarios you mention would be something like:
\s+(ave|rd|dr|ct)\s*$