LIKE operator as a replacement for RegEx - sql

My task is to select entries that match the mask SNNN000, where:
"N" is any digit;
"S" is any digit or Latin letter;
"0" is any digit or Latin letter, and may be omitted.
Here's what I have so far: "[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]".
The problem is with "0": how can I make the mask treat these symbols as optional? All rows of the sample table below except the one with id 5 should be returned.
DECLARE @table TABLE (
id INT
,Txt NVARCHAR(100)
);
INSERT INTO @table (id, Txt)
VALUES (1, N'S123AB1')
,(2, N'S123')
,(3, N'S123A')
,(4, N'S123AB')
,(5, N'S123.#!');
SELECT *
FROM @table AS t
WHERE t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]'
I understand that I could add conditions with the OR operator, but I would like to do it in a single expression, as I could with a regular expression: "[0-9A-Za-z]\d{3}[0-9A-Za-z]?[0-9A-Za-z]?[0-9A-Za-z]?". As far as I understand, SQL has no full regular expressions; if I am wrong, I would appreciate an explanation.
SELECT *
FROM @table AS t
WHERE t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]'
OR t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9]';

Unfortunately, using OR is probably the best you can do using SQL Server's enhanced LIKE operator:
SELECT *
FROM @table AS t
WHERE
t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9]' OR
t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z]' OR
t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z]' OR
t.Txt LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]';

The simplest method I can think of is:
SELECT t.*
FROM @table AS t
WHERE (t.Txt + 'AAA') LIKE '[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]%' AND
LEN(t.Txt) BETWEEN 4 AND 7;
This adds three extra characters and checks that the first 7 characters match. It then validates the length of the column.
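To see the padding trick row by row, here is a minimal sketch that reuses the sample data from the question. The padded and is_match column names are only illustrative, and the character ranges are assumed to behave as plain Latin ranges under the database collation.

```sql
-- Sample data from the question, held in a table variable.
DECLARE @table TABLE (id INT, Txt NVARCHAR(100));
INSERT INTO @table (id, Txt)
VALUES (1, N'S123AB1'), (2, N'S123'), (3, N'S123A'), (4, N'S123AB'), (5, N'S123.#!');

-- Show the padded value next to the outcome of the two checks.
SELECT t.id,
       t.Txt,
       t.Txt + N'AAA' AS padded,   -- padding fills in any omitted optional characters
       CASE WHEN t.Txt + N'AAA' LIKE N'[0-9A-Za-z][0-9][0-9][0-9][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]%'
             AND LEN(t.Txt) BETWEEN 4 AND 7
            THEN 1 ELSE 0 END AS is_match
FROM @table AS t;
-- ids 1 to 4 give is_match = 1; id 5 gives 0 because its 5th character is '.'
```

The length check matters because the trailing % would otherwise accept values longer than seven characters.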

Related

Concatenate Strings with Spaces into a varchar(255) column

I am writing ETL logic to insert four source columns, each at a fixed position and with a fixed length, into a target varchar(255) column. I have tried several approaches but have not found a solution. Any help is much appreciated.
Ex:
Source:
Column_id at Column 14, len 8
+
name at Column 43, len 27
+
term at Column 133, len 1
Target:
Description varchar(255)
You could convert the data to char like this:
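select REPLICATE(' ', 14-1) + convert(char(8), column_id) + REPLICATE(' ', 43-8-14) + convert(char(27), name) + REPLICATE(' ', 133-43-27) + convert(char(1), term)
from <whatever table not provided>
I left the arithmetic such as '133-43-27' in as an example. Positions are assumed to be 1-based, so column_id is preceded by 13 blanks; test it to make sure everything lands in the right position...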
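One way to test the positions is to build a single line from known values and read the slots back with SUBSTRING. This is only a sketch: the variables and their sample values are hypothetical, and positions are assumed to be 1-based.

```sql
-- Hypothetical sample values; the real ones come from the source table.
DECLARE @column_id VARCHAR(8) = 'ID01',
        @name      VARCHAR(27) = 'Some name',
        @term      CHAR(1) = 'T';

-- 13 leading blanks put column_id at position 14; the later gaps follow the same arithmetic.
DECLARE @line VARCHAR(255) =
      REPLICATE(' ', 14 - 1)        + CONVERT(CHAR(8),  @column_id)
    + REPLICATE(' ', 43 - 14 - 8)   + CONVERT(CHAR(27), @name)
    + REPLICATE(' ', 133 - 43 - 27) + CONVERT(CHAR(1),  @term);

-- Read each slot back from its expected position.
SELECT SUBSTRING(@line, 14, 8)  AS id_slot,
       SUBSTRING(@line, 43, 27) AS name_slot,
       SUBSTRING(@line, 133, 1) AS term_slot,
       DATALENGTH(@line)        AS total_length;  -- 13 + 8 + 21 + 27 + 63 + 1 = 133
```

If a slot comes back shifted by one character, the leading REPLICATE count is the first thing to check.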
You can try something along these lines:
--a declared table to simulate your issue
DECLARE @tbl TABLE(id INT IDENTITY, [name] VARCHAR(100), term VARCHAR(100));
INSERT INTO @tbl VALUES('Name One','first term')
,('One more name','One more term');
-- some variables for a generic approach
DECLARE @posId INT=1
,@posName INT=10
,@posTerm INT=50;
--the query
SELECT t.*
,STUFF(
STUFF(
STUFF(trg,@posId, LEN(t.id), t.id)
,@posName, LEN(t.[name]), t.[name])
,@posTerm, LEN(t.term), t.term)
FROM @tbl t
CROSS APPLY(SELECT REPLICATE(' ',255)) A(trg)
--the result
1 Name One first term
2 One more name One more term
The idea in short:
First we use CROSS APPLY(SELECT ...) to add a column to our result set. This column is a string made of 255 blanks.
Now we can use STUFF(). This function stuffs the given characters into an existing string. Because we replace exactly as many characters as we insert, the total length does not change.
Hint 1: If your data might have trailing blanks, LEN() can mislead you because it ignores them. You can either use TRIM() (in older versions, LTRIM() and RTRIM()) or DATALENGTH() (be aware that NVARCHAR uses 2 bytes per character!).
Hint 2: If you have to cut your data to a maximum length, you can use LEFT().
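The two hints can be checked with a few literals. This is a minimal sketch; the column aliases are only illustrative.

```sql
-- 'ab' followed by three trailing blanks.
SELECT LEN('ab   ')         AS len_varchar,         -- 2: LEN ignores trailing blanks
       DATALENGTH('ab   ')  AS datalength_varchar,  -- 5: one byte per character
       DATALENGTH(N'ab   ') AS datalength_nvarchar, -- 10: two bytes per character
       LEFT('abcdefgh', 3)  AS cut_to_three;        -- 'abc'
```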
STUFF() does what you want. But you want to be really careful about overwriting all the data that is there. For that, I would suggest casting to a char() type:
SELECT t.*,
STUFF(STUFF(STUFF(target, 14, 8, CONVERT(CHAR(8), t.id)
), 43, 27, CONVERT(CHAR(27), t.name)
), 133, 1, CONVERT(CHAR(1), t.term)
)
FROM t;
The CHAR() type pads the values with spaces, which means that this code will overwrite any existing data in those positions (and only in those positions).
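The padding effect is easy to see on a small string. In this sketch a run of x characters stands in for existing data in the target; the values are made up for illustration.

```sql
-- 20 'x' characters stand in for data already present in the target column.
SELECT STUFF(REPLICATE('x', 20), 5, 8, CONVERT(CHAR(8), 'abc')) AS padded_overwrite,
       -- CHAR(8) pads 'abc' to 8 characters, so positions 5 to 12 are fully replaced:
       -- 'xxxxabc     xxxxxxxx'
       STUFF(REPLICATE('x', 20), 5, 8, 'abc') AS unpadded_overwrite;
       -- without padding the 8 removed characters are replaced by only 3,
       -- so the string shrinks to 15 characters and later positions shift
```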

When concatenating using COALESCE, number more than 9 displays as asterisk *

I want to concatenate values from multiple rows into one, and I am using COALESCE for this purpose. One of the columns is an ID column. When I concatenate it, values up to 9 are displayed correctly, but for anything above nine an asterisk is displayed instead. Does anyone know why? See my code below, which uses COALESCE to concatenate all rows into one:
CREATE TABLE #test
(id int, name varchar(50))
insert into #test
values(1, 'ana'),
(2, 'bob'),
(3, 'steph'),
(4, 'bill'),
(5, 'john'),
(6, 'jose'),
(7, 'kerry'),
(8, 'frank'),
(9, 'noah'),
(10, 'melissa')
--SELECT * FROM #test
DECLARE @NameAndID VARCHAR(1000)
SELECT @NameAndID = COALESCE(@NameAndID +'; ', '') + CAST(ID AS VARCHAR(1))+'. ' + name
FROM #test
SELECT @NameAndID
You are casting the number to varchar(1). Any number with more than one digit overflows the single character and is therefore turned into an asterisk (*).
When casting ints, I find it best to use varchar(11), since this covers the maximum number of characters that might be needed to display an int.
The int minimum value is -2,147,483,648; without the thousands separators that is 10 digits and a minus sign:
-2147483648
123456789 1 (the 10 is left out of the character count to keep it readable)
By the way, there are better ways of doing string aggregation in T-SQL.
For versions prior to 2017, use a combination of stuff and for xml path, like this:
SELECT STUFF(
(
SELECT '; ' + CAST(id as varchar(11)) + '. ' + name
FROM #test
FOR XML PATH('')
),1 ,2, '')
For version 2017 or higher, use the built-in STRING_AGG function, like this:
SELECT STRING_AGG(CAST(id as varchar(11)) + '. '+ name, '; ')
FROM #Test
For more information, check out this SO post.
The * indicates that the target length was too short to hold the result. In your example you are trying to fit a two-digit number into VARCHAR(1). In this particular case the result is * rather than an error.
The behavior is described in the docs.
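The truncation behavior described in these answers can be reproduced with literals. This is a minimal sketch; the column aliases are only illustrative.

```sql
SELECT CAST(9  AS VARCHAR(1))  AS one_digit_fits,       -- '9'
       CAST(10 AS VARCHAR(1))  AS two_digits_truncated, -- '*'
       CAST(10 AS VARCHAR(11)) AS wide_enough,          -- '10'
       CAST(-2147483648 AS VARCHAR(11)) AS int_minimum; -- '-2147483648', 11 characters
```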

T-SQL Wildcard search - namespace values

I need to match strings that contain "ns[0-9]:", where [0-9] can be any number, including 10 and above.
Example:
DECLARE @test TABLE ( value VARCHAR(20))
INSERT INTO @test VALUES
( 'ns1:'),
( 'NOT OK'),
( 'ns7:'),
( 'ns8:'),
( 'ns9:'),
( 'ns10:'),
( 'ns11:' )
SELECT *, PATINDEX( '%ns[0-9]:%', value ) passes
FROM @test
This only works for 1 to 9, not for 10 and above. I can use [0-9][0-9], but then it only works for 10 and above. I don't want a wildcard between the number and the colon either.
I only want values in the following format to return 1 from PATINDEX:
ns1:, ns2:, ns10:, ns11:, etc.
I also need a non-function solution. For performance reasons I want to use LIKE-style string matching.
Thanks
You can use:
select (case when value like 'ns[0-9]%:' and
value not like 'ns[0-9]%[^0-9]%:'
then 1 else 0
end) as passes_flag
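To try the two-pattern check end to end, here is a sketch against a small table variable. The extra value 'ns1a:' is an assumption added to show a rejected case. Note that, unlike the PATINDEX pattern in the question, these LIKE patterns have to match the whole value.

```sql
DECLARE @test TABLE (value VARCHAR(20));
INSERT INTO @test VALUES ('ns1:'), ('NOT OK'), ('ns10:'), ('ns11:'), ('ns1a:');

SELECT value,
       CASE WHEN value LIKE 'ns[0-9]%:'                -- starts with ns + digit, ends with a colon
             AND value NOT LIKE 'ns[0-9]%[^0-9]%:'     -- no non-digit between that digit and the final colon
            THEN 1 ELSE 0 END AS passes_flag
FROM @test;
-- 'ns1:', 'ns10:' and 'ns11:' give 1; 'NOT OK' and 'ns1a:' give 0
```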

How to trim/replace any letters in the value?

I have a few columns in my old database whose values combine numbers and letters. I have to clean these values and import them into the new table. Most of the values that need to be converted look like this:
40M or 85M or NR or 5NR ...
Since the old system did not validate what users could enter, there can also be values like 40A or 3R and so on. I want to import only the numeric part into my new table, so if a value contains any letters I want to strip them. What is the best way to do that in SQL Server? I have tried this:
CASE WHEN CHARINDEX('M',hs_ptr1) <> 0 THEN 1 ELSE 0 END AS hs_ptr1
but this only detects whether one particular letter is in the value. If anyone can help, please let me know. Thanks!
You can use PATINDEX to search for the pattern. Try this code:
Code:
CREATE TABLE #temp
(
TXT NVARCHAR(50)
)
INSERT INTO #temp (TXT)
VALUES
('40M'),
('85M'),
('NR'),
('5NR')
SELECT LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)
FROM (
SELECT subsrt = SUBSTRING(TXT, pos, LEN(TXT))
FROM (
SELECT TXT, pos = PATINDEX('%[0-9]%', TXT)
FROM #temp
) d
) t
DROP TABLE #temp
Here's a way without a function....
declare @table table (c varchar(256))
insert into @table
values
('40M'),
('30'),
('5NR'),
('3(-4_')
select
replace(LEFT(SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000) + 'X') -1),'.','')
from @table
You can use the PATINDEX function to search for the first character that is not a digit. If there is one, take everything to the left of it. Something like this:
SELECT LEFT(your_field_name, PATINDEX('%[^0-9]%', your_field_name) - 1)
FROM your_table_name
UPDATE
You do need to take care of the edge cases. For example, if the value contains no non-digit character, PATINDEX returns 0, so the calculation yields -1, which is an invalid length for LEFT.
I would suggest using a Common Table Expression to calculate the index of the first non-digit character and then an IIF expression to select the correct characters. For example:
WITH cte AS
(
SELECT *, PATINDEX('%[^0-9]%', your_field_name) AS NumLength
FROM your_table_name
)
SELECT any_other_field, IIF(NumLength = 0,
your_field_name,
LEFT(your_field_name, NumLength - 1)
)
FROM cte
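Here is a runnable sketch of the same CTE idea on values like those in the question. The @samples table and the raw_value, first_non_digit and leading_digits names are hypothetical, and IIF assumes SQL Server 2012 or later.

```sql
DECLARE @samples TABLE (raw_value VARCHAR(20));  -- hypothetical stand-in for the old table
INSERT INTO @samples VALUES ('40M'), ('85M'), ('NR'), ('5NR'), ('30');

WITH cte AS
(
    -- Position of the first non-digit character, or 0 when the value is all digits.
    SELECT raw_value, PATINDEX('%[^0-9]%', raw_value) AS first_non_digit
    FROM @samples
)
SELECT raw_value,
       IIF(first_non_digit = 0,
           raw_value,                             -- all digits: keep the value as is
           LEFT(raw_value, first_non_digit - 1)   -- otherwise keep only the leading digits
       ) AS leading_digits
FROM cte;
-- '40M' gives '40', '85M' gives '85', 'NR' gives '', '5NR' gives '5', '30' gives '30'
```

This keeps only leading digits; a value that starts with a letter, such as 'NR', comes back as an empty string.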

Is it possible to search for multiple terms in a column by using a LIKE statement?

I'm trying to understand whether the above is possible. I've been thinking about it conceptually, and basically what I'm looking to do is:
Specify keywords that may appear in a title. Let's use the two terms "Portfolio" and "Mike".
I'm hoping to write a query that finds titles containing either Portfolio or Mike. The two terms need not appear together.
For instance, if I have one title "Portfolio A" and another title "Mike's favorite", I'd like both of these titles to be returned.
The issue I've encountered with using a LIKE statement is the following:
WHERE 1=1
and rpt_title LIKE ''%'+@report_title+'%'''
If I were to input 'Portfolio,Mike', it would search for an occurrence of exactly that string within a title.
EDIT: I should have been a bit clearer. I believe I need to pass my variable as 'Portfolio, Mike' in order to find the multiple values. Is this possible?
I'm assuming you could maybe use CHARINDEX with SUBSTRING and REPLACE?
Yep, multiple LIKE conditions combined with OR will work just fine. Just make sure you place the parentheses correctly:
SELECT ...
FROM ...
WHERE 1=1
and (rpt_title LIKE '%Portfolio%'
or rpt_title LIKE '%Mike%')
However, I might suggest you look into using a full-text search.
http://msdn.microsoft.com/en-us/library/ms142571.aspx
I can propose a solution that lets you specify any number of masks without writing multiple LIKE conditions:
DECLARE @temp TABLE (st VARCHAR(100))
INSERT INTO @temp (st)
VALUES ('Portfolio photo'),('- Mike'),('blank'),('else'),('est')
DECLARE @delims VARCHAR(30)
SELECT @delims = '|Portfolio|Mike|' -- %Portfolio% OR %Mike% OR etc.
SELECT t.st
FROM @temp t
CROSS JOIN (
SELECT substr =
SUBSTRING(
@delims,
number + 1,
CHARINDEX('|', @delims, number + 1) - number - 1)
FROM [master].dbo.spt_values n
WHERE [type] = N'P'
AND number <= LEN(@delims) - 1
AND SUBSTRING(@delims, number, 1) = '|'
) s
WHERE t.st LIKE '%' + s.substr + '%'
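As a different technique from the answers above, a comma-separated variable can also be split with the built-in STRING_SPLIT function. This is only a sketch: it assumes SQL Server 2016 or later with database compatibility level 130 or higher, and the @reports table is a hypothetical stand-in for the real one.

```sql
DECLARE @report_title VARCHAR(200) = 'Portfolio, Mike';

DECLARE @reports TABLE (rpt_title VARCHAR(100));  -- hypothetical stand-in for the real table
INSERT INTO @reports VALUES ('Portfolio A'), ('Mike''s favorite'), ('Quarterly summary');

SELECT r.rpt_title
FROM @reports AS r
WHERE EXISTS (
    -- One row per search term; LTRIM/RTRIM drop the blank after the comma.
    SELECT 1
    FROM STRING_SPLIT(@report_title, ',') AS s
    WHERE r.rpt_title LIKE '%' + LTRIM(RTRIM(s.value)) + '%'
);
-- returns 'Portfolio A' and 'Mike's favorite'
```

EXISTS is used rather than a join so that a title matching several terms is returned only once. Terms containing %, _ or [ would be interpreted as LIKE wildcards and would need escaping.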