Regex number and dashes - sql

I would like a pattern for my LIKE clause that enforces these criteria:
The value should be exactly 13 characters long.
The first 8 characters should be digits.
The 9th character should be a dash (-).
The remaining 4 characters should be digits.
Example:
12345678-9123
98765432-1234
I currently have this function:
CREATE FUNCTION [dbo].[FN_VALIDATE_ID](@TX_INPUT VARCHAR(50)) RETURNS BIT AS
BEGIN
    DECLARE @bitInputVal AS BIT = 1
    DECLARE @InputText VARCHAR(50)
    SET @InputText = LTRIM(RTRIM(ISNULL(@TX_INPUT,'')))
    IF @InputText <> '' AND LEN(@InputText) = 13
    BEGIN
        SET @bitInputVal = CASE
            WHEN @InputText LIKE '%^[0-9]%' THEN 1
            ELSE 0
        END
    END
    RETURN @bitInputVal
END

You don't even need a UDF for this, as SQL Server's enhanced LIKE operator can handle this requirement:
SELECT *
FROM yourTable
WHERE col LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]';
--              \               8 digits               /-\     4 digits     /
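To sanity-check the rule outside the database, the same shape can be written as a regular expression. This is only a Python sketch for illustration; the name `is_valid_id` is made up here and is not part of the answer above.

```python
import re

# Eight digits, a dash, then four digits: the same shape as the LIKE pattern.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
ID_PATTERN = re.compile(r"[0-9]{8}-[0-9]{4}")


def is_valid_id(value: str) -> bool:
    """Return True only if the whole string has the 13-character format."""
    return ID_PATTERN.fullmatch(value) is not None


print(is_valid_id("12345678-9123"))  # True
print(is_valid_id("1234567-89123"))  # False (dash in the wrong position)
```

Because `fullmatch` anchors both ends, the length check comes for free, just as it does with a LIKE pattern that contains no `%`.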

Related

Regex in LIKE Clause that accepts only Alphanumeric and dashes

Below is my SQL function script that will help identify the alphanumeric value and dashes (-):
CREATE FUNCTION [dbo].[FN_VALIDATE_ALPHANUMERIC_AND_DASHES](@TX_INPUT VARCHAR(1000)) RETURNS BIT AS
BEGIN
    DECLARE @bitInputVal AS BIT = 0
    DECLARE @InputText VARCHAR(1000)
    SET @InputText = LTRIM(RTRIM(ISNULL(@TX_INPUT,'')))
    IF @InputText <> ''
    BEGIN
        SET @bitInputVal = CASE
            WHEN @InputText LIKE '%[A-Za-z0-9-]%' THEN 1
            ELSE 0
        END
    END
    RETURN @bitInputVal
END
I have a problem when I run this query:
SELECT dbo.FN_VALIDATE_CLAIMANT_REF_NO('AbcdefgH-1234*')
It returns 1, even though the character * is not in the pattern, so it should return 0 instead.
What I want is to verify that the string consists only of alphanumeric characters (letters and digits) and dashes.
Please note that there is no limit on the length of the string; the only check is that it consists of alphanumeric characters and dashes.
You are testing whether any letter, digit, or dash is present in the string.
You should instead test whether any character other than a letter, digit, or dash is present:
SET @bitInputVal = CASE WHEN @InputText LIKE '%[^A-Za-z0-9-]%' THEN 0 ELSE 1 END
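The inversion is easier to see in a small standalone model. The following is a Python sketch of the same idea (search for a disallowed character and negate the result); the function name `only_alnum_and_dashes` is hypothetical, and the empty-string behaviour is assumed to follow the question's function, which returns 0 for blank input.

```python
import re

# Matches any single character that is NOT a letter, digit, or dash.
DISALLOWED = re.compile(r"[^A-Za-z0-9-]")


def only_alnum_and_dashes(text: str) -> bool:
    text = text.strip(" ")  # mirror LTRIM/RTRIM, which remove spaces only
    if text == "":
        return False  # assumption: blank input is rejected, as in the question
    # Valid only when no disallowed character can be found anywhere.
    return DISALLOWED.search(text) is None


print(only_alnum_and_dashes("AbcdefgH-1234"))   # True
print(only_alnum_and_dashes("AbcdefgH-1234*"))  # False
```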

Pattern Matching with SQL Like for a range of characters

Is there a way to use Pattern Matching with SQL LIKE, to match a variable number of characters with an upper limit?
For example, I have one column which can have "correct values" of 2-10 numbers, anything more than 10 and less than 2 is incorrect.
If you really want to use like you can do:
where col like '__' or col like '___' or . . .
or:
where col like '__%' and           -- at least 2 characters
      col not like '___________%'  -- not 11 or more characters
The more typical method would be:
where len(col) between 2 and 10
If you want to ensure they are numbers:
where len(col) between 2 and 10 and
col not like '%[^0-9]%'
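The combined check (length between 2 and 10, and no non-digit character) can be modelled in a few lines. This is a Python sketch for illustration only; `is_valid_number` is a made-up name.

```python
def is_valid_number(value: str) -> bool:
    """True when value is 2 to 10 characters long and every character is 0-9."""
    has_valid_length = 2 <= len(value) <= 10
    # Same idea as NOT LIKE '%[^0-9]%': fail if any character is not a digit.
    has_only_digits = all(ch in "0123456789" for ch in value)
    return has_valid_length and has_only_digits


print(is_valid_number("1234567890"))   # True  (10 digits)
print(is_valid_number("12345678901"))  # False (11 digits)
print(is_valid_number("12a"))          # False (contains a letter)
```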
You could make a function to do this the long way of inspecting each character like below:
DECLARE @TempField varchar(max) = '1v3Abc567'
DECLARE @FieldLength int = LEN(@TempField)
DECLARE @CountNumeric int = 0
WHILE @FieldLength > 0
BEGIN
    SELECT @CountNumeric = @CountNumeric + ISNUMERIC(SUBSTRING(@TempField, @FieldLength, 1))
    SET @FieldLength = @FieldLength - 1
END
SELECT @CountNumeric NumericCount
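You can use a regex if your database supports one. LIKE itself has no quantifiers such as {2,10}, so this needs a regex operator; in MySQL, for example:
SELECT * FROM mytable WHERE mycol REGEXP '^[0-9]{2,10}$';
should do it (that's off the top of my head, so double-check!).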

Replace every alpha character with itself + wildcard in string SQL Server

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of whether the user types C a t, C.A.T. or Cat, I want to find a result related to the search. All that matters is that the alphanumeric characters are in the correct sequence.
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if you have cat in the correct order, the first condition evaluates to true, and if there are no other alphanumeric characters between c, a, and t, the second condition evaluates to true as well.
Here is a test script, where you can see for yourself what I mean:
DECLARE @T AS TABLE
(
    a varchar(20)
)
INSERT INTO @T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM @T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM @T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
You can also see it in action in this link.
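The difference between the two WHERE clauses can also be checked outside SQL Server with a small LIKE emulator. This is a Python sketch, not part of the answer: `like_to_regex`, `naive_rows`, and `correct_rows` are hypothetical helper names, the translation only covers `%`, `_`, and `[...]` classes, and case-insensitive matching is assumed (as under a typical case-insensitive collation).

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple T-SQL LIKE pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")            # any run of characters
        elif ch == "_":
            parts.append(".")             # exactly one character
        elif ch == "[":
            end = pattern.index("]", i + 1)
            parts.append(pattern[i:end + 1])  # [..] and [^..] carry over as-is
            i = end
        else:
            parts.append(re.escape(ch))   # literal character
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


ROWS = ["cat", "c/a/t", "c.a.t ", "c. at", "c u a s e t "]

NAIVE = like_to_regex("C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%")
LIKE = like_to_regex("%C%A%T%")
NOT_LIKE = like_to_regex("%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%")


def naive_rows(rows):
    return [r for r in rows if NAIVE.fullmatch(r)]


def correct_rows(rows):
    return [r for r in rows if LIKE.fullmatch(r) and not NOT_LIKE.fullmatch(r)]


print(naive_rows(ROWS))    # ['c.a.t ', 'c u a s e t ']
print(correct_rows(ROWS))  # ['cat', 'c/a/t', 'c.a.t ', 'c. at']
```

The naive pattern demands at least one non-alphanumeric character after each letter, which is why it misses 'cat' and still lets 'c u a s e t ' through.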
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE @Input varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE @Index int = 1,
        @CurrentChar char(1),
        @Like varchar(100),
        @NotLike varchar(100) = '%'
-- Use <= so that the last character of the input is inspected as well
WHILE @Index <= LEN(@Input)
BEGIN
    SET @CurrentChar = SUBSTRING(@Input, @Index, 1)
    -- Keep only alphanumeric characters
    IF PATINDEX('%[^a-zA-Z0-9]%', @CurrentChar) = 0
    BEGIN
        SET @NotLike = @NotLike + @CurrentChar + '%[a-zA-Z0-9]%'
    END
    SET @Index = @Index + 1
END
-- Cut the trailing '[a-zA-Z0-9]%' (12 characters), then derive the LIKE pattern
SELECT @NotLike = LEFT(@NotLike, LEN(@NotLike) - 12),
       @Like = REPLACE(@NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM @T
WHERE a LIKE @Like
AND a NOT LIKE @NotLike
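The pattern-building step can be prototyped separately from the query. Below is a Python sketch of the same construction; `build_patterns` is a hypothetical name, and returning `None` for the NOT LIKE pattern when fewer than two characters survive is an assumption made here (with a single character there is nothing "in between" to exclude).

```python
def build_patterns(raw: str):
    """Return (like, not_like) patterns built from the alphanumerics in raw."""
    chars = [c for c in raw if c.isascii() and c.isalnum()]
    if not chars:
        raise ValueError("search text contains no alphanumeric characters")
    # LIKE: the characters in order, with anything allowed around them.
    like = "%" + "%".join(chars) + "%"
    if len(chars) < 2:
        return like, None
    # NOT LIKE: the same characters with an extra alphanumeric between them.
    not_like = "%" + "%[a-zA-Z0-9]%".join(chars) + "%"
    return like, not_like


print(build_patterns("C#$*(&A T"))
# ('%C%A%T%', '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%')
```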
You can loop through your (cleaned) search string and append the expression you want to each letter. In my example @builtString should be what you would like to use further on, if I understood correctly.
declare @cleanSearch as nvarchar(10) = 'CAT'
declare @builtString as nvarchar(100) = ''
WHILE LEN(@cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
    SET @builtString = @builtString + substring(@cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus the pattern
    SET @cleanSearch = right(@cleanSearch, len(@cleanSearch) - 1) -- remove first letter of the search string
END
SELECT @builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT @cleanSearch --@cleanSearch is now empty

create sql view from comma separated values

T-sql question:
I need help to build a join from 2 tables, where on one of the tables I have aggregated data (comma separated values).
I have a table - Users where I have 3 columns: UserId, DefaultLanguage and OtherLanguages.
The table looks like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | en | NULL
2 | en | it, fr
3 | fr | en, it
4 | en | sp
and so on.
I have another table where I have the association between language code (en, fr, ro, it, sp) and language name:
LangCode | LanguageName
-------------------------
en | English
fr | French
it | Italian
sp | Spanish
and so on.
I want to create a view like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | English | NULL
2 | English | Italian, French
3 | French | English, Italian
4 | English | Spanish
and so on.
In short, I need a view where the language code is replaced by language name.
Any help, please?
There are several solutions. Of course, you could also recreate the tables and change the data structure.
1. If all the language codes are 2 characters long:
select t1.UserId, t2.LanguageName,
    ISNULL(t3.LanguageName, '') + ISNULL(', ' + t4.LanguageName, '') + ISNULL(', ' + t5.LanguageName, '') OtherLanguages
from Table1 t1
inner join Table2 t2 on t1.DefaultLanguage = t2.LangCode
left join Table2 t3 on Left(t1.OtherLanguages, 2) = t3.LangCode
left join Table2 t4 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 3 THEN
    SUBSTRING(Replace(t1.OtherLanguages, ' ', ''), 4, 2) ELSE null END = t4.LangCode
left join Table2 t5 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 6 THEN
    SUBSTRING(Replace(t1.OtherLanguages, ' ', ''), 7, 2) ELSE null END = t5.LangCode
2. Use a user-defined function:
CREATE FUNCTION [dbo].[func_GetLanguageName] (@pLanguageList varchar(max))
RETURNS varchar(max) AS
BEGIN
    Declare @aLanguageList varchar(max) = @pLanguageList
    Declare @aLangCode varchar(max) = null
    Declare @aReturnName varchar(max) = null
    WHILE LEN(@aLanguageList) > 0
    BEGIN
        IF PATINDEX('%,%', @aLanguageList) > 0
        BEGIN
            SET @aLangCode = RTRIM(LTRIM(SUBSTRING(@aLanguageList, 0, PATINDEX('%,%', @aLanguageList))))
            SET @aLanguageList = LTRIM(SUBSTRING(@aLanguageList, LEN(@aLangCode + ',') + 1, LEN(@aLanguageList)))
        END
        ELSE
        BEGIN
            SET @aLangCode = @aLanguageList
            SET @aLanguageList = NULL
        END
        Select @aReturnName = ISNULL(@aReturnName + ', ', '') + LanguageName from Table2 where LangCode = @aLangCode
    END
    RETURN(@aReturnName)
END
and use it in a select:
select UserId, dbo.func_GetLanguageName(DefaultLanguage) DefaultLanguage, dbo.func_GetLanguageName(OtherLanguages) OtherLanguages from Table1
Best practice would dictate not to have this type of comma delimited
data in a column...
Since you stated in comments that the schema cannot be changed, the next best thing is a function. This can be used in a select query in-line.
SQL is notoriously slow with string manipulation. Here is an interesting article on the topic. There are many SQL "string split" functions out there. They all generally split a comma delimited string and return a table.
For this specific use-case, you actually need a scalar-valued
function (a function which returns one value) rather than a
table-valued function (one which returns a table of values).
Below is a modified such function, which returns a scalar value in place of the original comma delimited string of language codes.
The comments explain what is happening line by line.
The gist is that you must loop through the input string keeping track of the last comma location, extract each code, lookup the full language from the languages table, and then return the output as a comma-delimited string.
Language codes to languages function:
Create Function [dbo].fn_languageCodeToFull
( @Input Varchar(100) )
Returns Varchar(1000)
As
Begin
    -- To address null input, based on the example you provided, we set the output to NULL if there is no input
    If @Input = '' Or @Input Is Null
        Return Null
    Declare
        @CodeLength int,              -- constant for code length to avoid hardcoded "magic numbers"
        @Output varchar(1000),        -- will contain the final comma delimited string of full languages
        @LastIndex int,               -- tracks the location of the input we are searching as we loop over the string
        @CurrentCode varchar(2),      -- for code readability, we extract each language code to this variable
        @CurrentLanguage varchar(50), -- for code readability, we store the full language in this variable
        @IndexIncrement int           -- constant to increment the search index by 1 at each iteration,
                                      -- ensuring the loop moves forward
    Set @LastIndex = 0      -- seed the index, so we begin to search at 0 index
    Set @CodeLength = 2     -- ISO language codes are always 2 characters in length
    Set @Output = ''        -- seed with empty string to avoid NULL when concatenating
    Set @IndexIncrement = 1 -- again avoiding hardcoded values...
    -- We will loop until we have gone to or beyond the length of the input string
    While @LastIndex < len(@Input)
    Begin
        -- Set the index of each comma (charindex is 1-based)
        Set @LastIndex = CHARINDEX(',', @Input, @LastIndex)
        -- When we get to the last item, CharIndex will return 0 when it does not find a comma.
        -- To pull the last item, we will artificially set @LastIndex to be 1 greater than the input string
        -- This will allow the code following this line to be unaltered for this scenario
        If @LastIndex = 0 set @LastIndex = len(@Input) + 1 -- account for 1-based index of substring
        -- Extract the code prior to the current comma that charindex has identified
        Set @CurrentCode = substring(@Input, @LastIndex - @CodeLength, @CodeLength)
        -- Do a lookup to get the language for the current code
        Set @CurrentLanguage = (Select LanguageName From languages Where code = @CurrentCode)
        -- Only add comma after first language to ensure no extra comma will be present in Output
        If @LastIndex > 3 Set @Output = @Output + ','
        -- Here we build the Output string with the language
        Set @Output = @Output + @CurrentLanguage
        -- Finally, we increment @LastIndex by 1 to avoid loop on first instance of comma
        Set @LastIndex = @LastIndex + @IndexIncrement
    End
    Return @Output
End
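The loop above boils down to "split on commas, look each code up, join the names again". As a language-neutral illustration, here is a Python sketch of that logic. The `LANGUAGES` dictionary stands in for the lookup table and is filled from the question's sample data; `codes_to_names` is a made-up name, and joining with ", " (rather than a bare comma) is an assumption chosen to match the desired view output.

```python
from typing import Optional

# Stand-in for the language lookup table (sample rows from the question).
LANGUAGES = {"en": "English", "fr": "French", "it": "Italian", "sp": "Spanish"}


def codes_to_names(codes: Optional[str]) -> Optional[str]:
    """Map 'it, fr' to 'Italian, French'; None or empty input gives None."""
    if codes is None or codes.strip() == "":
        return None
    # Assumes every code exists in LANGUAGES; an unknown code raises KeyError.
    names = [LANGUAGES[code.strip()] for code in codes.split(",")]
    return ", ".join(names)


print(codes_to_names("it, fr"))  # Italian, French
print(codes_to_names(None))      # None
```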
Then your view would simply do something like:
Sample view using the function:
Create View vw_UserLanguages
As
Select
    UserId,
    dbo.fn_languageCodeToFull(DefaultLanguage) as DefaultLanguage,
    dbo.fn_languageCodeToFull(OtherLanguages) as OtherLanguages
From UserLanguageCodes -- you do not provide a name so I made one up
Note that the function will work whether there are commas or not, so there is no need to join the Languages table here as you can just have the function do all the work in this case.
One quick and dirty solution would be to use nested REPLACE calls, but that could result in a complex, long-winded statement, especially if you have more than five languages.
As an example:
SELECT [UserId],[DefaultLanguage],
    CASE
        WHEN [OtherLanguages] IS NULL THEN ''
        ELSE REPLACE(
             REPLACE(
             REPLACE(
             REPLACE(
             REPLACE([OtherLanguages],
                 'en','English'),
                 'fr','French'),
                 'it','Italian'),
                 'ro','Romulan'), --Probably not the intended language ;-)
                 'sp','Spanish')
    END as [OtherLanguages]
FROM YourTable
Personally, I'd create a scalar function, again using the REPLACE command, but you can then check the number of languages present and add a counter so that you're not doing unnecessary lookups.
SELECT [UserId],[DefaultLanguage],
CASE
WHEN [OtherLanguages] IS NULL THEN ''
WHEN [OtherLanguages] = '' THEN ''
ELSE do_function_name([OtherLanguages])
END as [OtherLanguages]
FROM YourTable
It might not be good practice but there are times when it is more efficient to store multiple values in a single field but accept that when you do, it will slow down the way you handle that data.

Trim Special Char from SQL String

I am using SQL Server 2008
I have a string column holding ;-separated values. How can I trim the value below?
Current string:
;145615;1676288;178829;
Output:
145615;1676288;178829;
Please help with a SQL query to trim the first ; from the string.
Note: the first character may or may not be ;, and it should be trimmed only if it is.
Edit: What I had tried before, although it hardly matters after so many good responses:
DECLARE
    @VAL VARCHAR(1000)
BEGIN
    SET @VAL = ';13342762;1334273;'
    IF (CHARINDEX(';', @VAL, 1) = 1)
    BEGIN
        SELECT SUBSTRING(@VAL, 2, LEN(@VAL))
    END
    ELSE
    BEGIN
        SELECT @VAL
    END
END
SELECT CASE WHEN col LIKE ';%'
THEN STUFF(col,1,1,'') ELSE col END
FROM dbo.table;
Just check the first character, and if it matches, start from the second character:
SELECT CASE WHEN SUBSTRING(col,1,1) = ';'
THEN SUBSTRING(col,2,LEN(col))
ELSE col
END AS col
Here's an example:
DECLARE @v varchar(10)
SET @v = ';1234'
SELECT
    CASE
        WHEN LEFT(@v,1) = ';' THEN RIGHT(@v, LEN(@v) - 1)
        ELSE @v
    END
A further development on @Aaron Bertrand's answer:
SELECT
    STUFF(col, 1, PATINDEX(';%', col), '')
FROM ...
PATINDEX is similar to LIKE in that it uses a pattern search, but being a function it also returns the position of the first match. In this case, since we are looking for a ; specifically at the beginning of a string, the position returned is going to be either 1 (if found) or 0 (if not found). If it is 1, the STUFF function will delete 1 character at the beginning of the string, and if the position is 0, STUFF will delete 0 characters.
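All of the answers above implement the same rule: drop exactly one leading ; if present, otherwise return the value unchanged. A Python sketch of that rule follows, for illustration only (`strip_leading_semicolon` is a hypothetical name). Note that it deliberately avoids `lstrip(';')`, which would remove every leading semicolon rather than just the first.

```python
def strip_leading_semicolon(value: str) -> str:
    """Remove a single leading ';' if there is one; otherwise return value as-is."""
    if value.startswith(";"):
        return value[1:]
    return value


print(strip_leading_semicolon(";145615;1676288;178829;"))  # 145615;1676288;178829;
print(strip_leading_semicolon("145615;1676288;178829;"))   # 145615;1676288;178829;
```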