Check a word starting with specific string [SQL Server] - sql

I want to search a string such as Dhaka is the capital of Bangladesh, which contains six words. If my search text is cap (the start of the word capital), I want the starting index of the search text in the string (14 here). If the search text occurs in the string but not at the start of any word, I want 0. Please see the test cases below for a better understanding.
What I tried
DECLARE @SearchText VARCHAR(20),
@Str VARCHAR(MAX),
@Result INT
SET @Str = 'Dhaka is the capital of Bangladesh'
SET @SearchText = 'cap'
SET @Result = CASE WHEN @Str LIKE @SearchText + '%'
OR @Str LIKE '% ' + @SearchText + '%'
THEN CHARINDEX(@SearchText, @Str)
ELSE 0 END
PRINT @Result -- print 14 here
In my case, I need to generate @Str with another SQL function. Here @Str would be generated three times, which I think is costly. Is there a way to generate @Str only once? (Is that possible using PATINDEX?)
Note: the CASE expression appears in the WHERE clause of my original query, so I cannot assign the @Str value to a variable and then use it in the WHERE clause.
Test Case
Search Text: Dhaka, Result: 1
Search Text: tal, Result: 0
Search Text: Mirpur, Result: 0
Search Text: isthe, Result: 0
Search Text: is the, Result: 7
Search Text: Dhaka Capital, Result: 0

Simply add a leading space to the strings to ensure that you always find only the beginning of a word:
DECLARE @SearchText VARCHAR(20),
@Str VARCHAR(MAX),
@Result INT
SET @Str = 'Dhaka is the capital of Bangladesh'
SET @SearchText = 'Dhaka Capital'
SET @Result = CHARINDEX(' ' + @SearchText, ' ' + @Str)
PRINT @Result -- prints 0 for 'Dhaka Capital'; with 'cap' it prints 14
I have tested the above query against your test cases and it seems to work.
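As a quick check of the leading-space trick, the sketch below runs every search text from the question's test cases in one statement. The derived table t and its column SearchText are names made up for this illustration; the string and the search texts are taken from the question.

```sql
DECLARE @Str VARCHAR(MAX) = 'Dhaka is the capital of Bangladesh';

-- Prefix both the search text and the string with a space so that
-- a match can only begin at the start of a word.
SELECT t.SearchText,
       CHARINDEX(' ' + t.SearchText, ' ' + @Str) AS Result
FROM (VALUES ('Dhaka'), ('tal'), ('Mirpur'), ('isthe'), ('is the'), ('Dhaka Capital')) AS t(SearchText);
-- Expected results, in order: 1, 0, 0, 0, 7, 0
```

Because both sides are shifted by the same single space, the position returned is already the position of the search text in the original string.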

To compute the function only once per row in a SELECT, make it a table-valued function. If that is not possible for some reason, use CROSS APPLY:
SELECT .. a, b
FROM ..
CROSS APPLY (SELECT my_scalar_fn(a,b) as Str) arg
WHERE CASE WHEN arg.Str LIKE SearchText + '%'
OR arg.Str LIKE '% ' + SearchText + '%'
THEN CHARINDEX(SearchText, arg.Str)
ELSE 0 END > 0 -- "> 0" is an assumed comparison; the original condition is not shown
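A self-contained sketch of the CROSS APPLY pattern follows. It assumes a plain concatenation in place of the real scalar function, and the derived table src with columns Id, a and b is invented sample data. It also uses the leading-space form of the check so the expression appears only once in the WHERE clause. Whether the engine really evaluates the expression once per row is up to the optimizer; the point here is only that it is written once.

```sql
SELECT src.Id,
       arg.Str,
       CHARINDEX(' ' + 'cap', ' ' + arg.Str) AS Pos
FROM (VALUES (1, 'Dhaka is the', 'capital of Bangladesh'),
             (2, 'The tallest', 'escape route')) AS src(Id, a, b)
-- The computed string gets a name here and can be reused below.
CROSS APPLY (SELECT src.a + ' ' + src.b AS Str) AS arg
WHERE CHARINDEX(' ' + 'cap', ' ' + arg.Str) > 0;
-- Returns only row 1 (Pos = 14); 'escape' contains 'cap' but not at the start of a word.
```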

Related

SSMS replace all commas outside of quotation marks in string

I've written the following function in SSMS to replace any commas that are outside of quotation marks with ||||:
CREATE FUNCTION dbo.fixqualifier (@string nvarchar(max))
returns nvarchar(max)
as begin
DECLARE @STRINGTOPAD NVARCHAR(MAX)
DECLARE @position int = 1,@newstring nvarchar(max) ='',@QUOTATIONMODE INT = 0
WHILE(LEN(@string)>0)
BEGIN
SET @STRINGTOPAD = SUBSTRING(@string,0,IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)))
SET @newstring = @newstring + IIF(@QUOTATIONMODE = 1, REPLACE(@STRINGTOPAD,',','||||'),@STRINGTOPAD)
SET @QUOTATIONMODE = IIF(@QUOTATIONMODE = 1,0,1)
set @string = SUBSTRING(@string,1+IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)),LEN(@string))
END
return @newstring
end
The idea is for the function to find the first ", replace all ',' before that then switch to quotation mode 1 so it knows to not replace the , until it changes back to quotation mode 0 when it hits the 2nd " and so on.
so for example the string:
qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl
would become:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
It works as expected, but it is really inefficient when run over several thousand rows.
Is there a better way of doing this, or at least a way to speed the function up?
You can do this with a simple modulus trick:
DECLARE @VAR VARCHAR(100) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'
,@OUTPUT VARCHAR(100) = '';
SELECT @OUTPUT = @OUTPUT + CASE WHEN (LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', ''))) % 2 = 0
THEN REPLACE(VAL, ',', '||||') ELSE VAL END
FROM (
SELECT SUBSTRING(@VAR, NUMBER, 1) VAL
FROM master.dbo.spt_values
WHERE type = 'P'
AND NUMBER BETWEEN 1 AND LEN(@VAR)
) A
PRINT @OUTPUT
Result:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
The expression LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', '')) gives the number of " characters seen so far. If that count modulo 2 is zero (an even count), the current character is outside quotes and a comma is replaced; otherwise it is kept.
This uses DelimitedSplit8K and completely avoids any RBAR methods (such as a WHILE loop or @Variable = @Variable + ..., which is a hidden form of RBAR).
It first splits on the quotation marks, and then on the commas where the string isn't quoted. Finally it puts the strings back together again, using the "old" STUFF and FOR XML PATH method:
USE Sandbox;
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';
WITH Splits AS(
SELECT QS.ItemNumber AS QuoteNumber, CS.ItemNumber AS CommaNumber, ISNULL(CS.Item, '"' + QS.Item + '"') AS DelimitedItem
FROM dbo.DelimitedSplit8K(@String,'"') QS
OUTER APPLY (SELECT *
FROM dbo.DelimitedSplit8K(QS.Item,',')
WHERE QS.ItemNumber % 2 = 1) CS
WHERE QS.Item <> ',')
-- STUFF removes the leading 4-character '||||' delimiter
SELECT STUFF((SELECT '||||' + S.DelimitedItem
FROM Splits S
ORDER BY S.QuoteNumber, S.CommaNumber
FOR XML PATH('')),1,4,'') AS DelimitedList;
(Note, DelimitedSplit8K does not accept more than 8,000 characters. If you have more than that, SQL Server is really not the right tool. STRING_SPLIT does not provide the ordinal position, so you would be unable to guarantee the rebuild order with it.)

Replace every alpha character with itself + wildcard in string SQL Server

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of whether the user types C a t, C.A.T. or Cat, I want to find a result related to the search. All that matters is that the alphanumeric characters are in the correct sequence.
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if c, a and t appear in the correct order, the first condition resolves to true, and if there are no other alphanumeric characters between c, a and t, the second condition resolves to true as well.
Here is a test script, where you can see for yourself what I mean:
DECLARE @T AS TABLE
(
a varchar(20)
)
INSERT INTO @T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM @T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM @T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE @INPUT varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE @Index int = 1,
@CurrentChar char(1),
@Like varchar(100),
@NotLike varchar(100) = '%'
WHILE @Index <= LEN(@Input) -- <= so that the last character is examined too
BEGIN
SET @CurrentChar = SUBSTRING(@INPUT, @Index, 1)
IF PATINDEX('%[^a-zA-Z0-9]%', @CurrentChar) = 0
BEGIN
SET @NotLike = @NotLike + @CurrentChar + '%[a-zA-Z0-9]%'
END
SET @Index = @Index + 1
END
-- drop the trailing '[a-zA-Z0-9]%' (12 characters), keeping the final '%'
SELECT @NotLike = LEFT(@NotLike, LEN(@NotLike) - 12),
@Like = REPLACE(@NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM @T
WHERE a LIKE @Like
AND a NOT LIKE @NotLike
You can loop through your (cleaned) search string and append the expression you would like to each letter. In my example @builtString should be what you would like to use further on, if I understood correctly.
declare @cleanSearch as nvarchar(10) = 'CAT'
declare @builtString as nvarchar(100) = ''
WHILE LEN(@cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
SET @builtString = @builtString + substring(@cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus the LIKE pattern
SET @cleanSearch = right(@cleanSearch, len(@cleanSearch) - 1) -- remove first letter of the search string
END
SELECT @builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT @cleanSearch --@cleanSearch is now empty

Remove only leading or trailing carriage returns

I'm dumbfounded that this question has not been asked meaningfully already. How does one go about creating a function in SQL equivalent to LTRIM or RTRIM, but for carriage returns and line feeds ONLY at the start or end of a string?
Obviously REPLACE(REPLACE(@MyString,char(10),''),char(13),'') removes ALL carriage returns and line feeds, which is NOT what I'm looking for. I just want to remove leading or trailing ones.
Find the first character that is not CHAR(13) or CHAR(10) and subtract its position from the string's length.
LTRIM()
SELECT RIGHT(@MyString,LEN(@MyString)-PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@MyString)+1)
RTRIM()
SELECT LEFT(@MyString,LEN(@MyString)-PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',REVERSE(@MyString))+1)
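The two expressions above can be combined to trim both ends at once. The following is a minimal sketch, not a definitive implementation: the variable names @Pattern, @First and @LastFromEnd are made up here, and it assumes a single-byte VARCHAR value so that DATALENGTH equals the character count (DATALENGTH is used instead of LEN because LEN ignores trailing spaces).

```sql
DECLARE @MyString VARCHAR(100) =
    CHAR(13) + CHAR(10) + 'line 1' + CHAR(13) + CHAR(10) + 'line 2' + CHAR(13) + CHAR(10);
DECLARE @Pattern VARCHAR(10) = '%[^' + CHAR(13) + CHAR(10) + ']%';

-- Position of the first character that is not CR/LF, counted from each end.
DECLARE @First INT = PATINDEX(@Pattern, @MyString);
DECLARE @LastFromEnd INT = PATINDEX(@Pattern, REVERSE(@MyString));

SELECT CASE WHEN @First = 0 THEN ''   -- the string holds only CR/LF characters
            ELSE SUBSTRING(@MyString, @First,
                           DATALENGTH(@MyString) - @First - @LastFromEnd + 2)
       END AS Trimmed;
-- Keeps 'line 1' + CR/LF + 'line 2': the inner line break survives.
```

The length works out because the last kept character sits at DATALENGTH - @LastFromEnd + 1, so the kept span is that position minus @First plus 1.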
The following functions are enhanced versions of the trim functions that you can use, copied from sqlauthority.com.
They remove leading and trailing spaces, tabs, carriage returns, line feeds and so on.
Trim Left
CREATE FUNCTION dbo.LTrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @trimchars VARCHAR(10)
SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
IF @str LIKE '[' + @trimchars + ']%' SET @str = SUBSTRING(@str, PATINDEX('%[^' + @trimchars + ']%', @str), LEN(@str))
RETURN @str
END
Trim Right
CREATE FUNCTION dbo.RTrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @trimchars VARCHAR(10)
SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
IF @str LIKE '%[' + @trimchars + ']'
SET @str = REVERSE(dbo.LTrimX(REVERSE(@str)))
RETURN @str
END
Trim both Left and Right
CREATE FUNCTION dbo.TrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN dbo.LTrimX(dbo.RTrimX(@str))
END
Using function
SELECT dbo.TRIMX(@MyString)
If you do use these functions you might also consider changing from varchar to nvarchar to support more encodings.
In SQL Server 2017 you can use the TRIM function to remove specific characters from beginning and end, in one go:
WITH testdata(str) AS (
SELECT CHAR(13) + CHAR(10) + ' test ' + CHAR(13) + CHAR(10)
)
SELECT
str,
TRIM(CHAR(13) + CHAR(10) + CHAR(9) + ' ' FROM str) AS [trim cr/lf/tab/space],
TRIM(CHAR(13) + CHAR(10) FROM str) AS [trim cr/lf],
TRIM(' ' FROM str) AS [trim space]
FROM testdata
Note that the last example (trim space) leaves the value unchanged, as expected: the string begins and ends with CR/LF, so the spaces are in the middle rather than at the ends.
Here's an example you may run:
I decided to cast the results as an Xml value, so when you click on it, you will be able to view the Carriage Returns.
DECLARE @CRLF Char(2) = (CHAR(0x0D) + CHAR(0x0A))
DECLARE @String VarChar(MAX) = @CRLF + @CRLF + ' Hello' + @CRLF + 'World ' + @CRLF + @CRLF
--Unmodified String:
SELECT CAST(@String as Xml)[Unmodified]
--Remove Trailing Whitespace (including Spaces).
SELECT CAST(LEFT(@String, LEN(REPLACE(@String, @CRLF, ' '))) as Xml)[RemoveTrailingWhitespace]
--Remove Leading Whitespace (including Spaces).
SELECT CAST(RIGHT(@String, LEN(REVERSE(REPLACE(@String, @CRLF, ' ')))) as Xml)[RemoveLeadingWhitespace]
--Remove Leading & Trailing Whitespace (including Spaces).
SELECT CAST(SUBSTRING(@String, LEN(REPLACE(@String, ' ', '_')) - LEN(REVERSE(REPLACE(@String, @CRLF, ' '))) + 1, LEN(LTRIM(RTRIM(REPLACE(@String, @CRLF, ' '))))) as Xml)[RemoveAllWhitespace]
--Remove Only Leading and Trailing CR/LF's (while still preserving all other Whitespace - including Spaces). - 04/06/2016 - MCR.
SELECT CAST(SUBSTRING(@String, PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@String), LEN(REPLACE(@String, ' ', '_')) - PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@String) + 1 - PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%', REVERSE(@String)) + 1) as Xml)[RemoveLeadingAndTrailingCRLFsOnly]
Remember to remove the Cast-to-Xml, as this was done just as a Proof-of-Concept to show it works.
How is this better than the currently Accepted Answer?
At first glance this may appear to use more Functions than the Accepted Answer.
However, this is not the case.
If you combine both approaches listed in the Accepted Answer (to remove both Trailing and Leading whitespace), you will either have to make two passes updating the Record, or copy all of one Logic into the other (everywhere @String is listed), which would cause way more function calls and become even more difficult to read.
I was stuck using Microsoft SQL Server 2008 R2, so basing my functions on @sqluser's answer I came up with the function below. It returns an empty string if the string contains only the characters to be trimmed.
The bit that threw me was the pattern for PATINDEX must be included between % characters, which for a while I was thinking of as the same wildcard in a LIKE statement but which I now believe is just the syntax to denote a pattern, though I may be wrong!
CREATE FUNCTION [dbo].[ExtendedLTRIM](@string_to_trim VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @tab CHAR(1) = CHAR(9);
DECLARE @line_feed CHAR(1) = CHAR(10);
DECLARE @carriage_return CHAR(1) = CHAR(13);
DECLARE @space CHAR(1) = CHAR(32);
DECLARE @characters_to_trim VARCHAR(10)
SET @characters_to_trim = @tab + @line_feed + @carriage_return + @space
IF @string_to_trim LIKE '[' + @characters_to_trim + ']%'
BEGIN
DECLARE @first_non_trim_character INT = PATINDEX('%[^' + @characters_to_trim + ']%', @string_to_trim);
IF @first_non_trim_character = 0 RETURN '';
RETURN SUBSTRING(@string_to_trim, @first_non_trim_character, 8000)
END
RETURN @string_to_trim
END
GO
To trim characters from a pre-defined list you'll want to create the following UDF (should work in 2008R2 and above).
It handles both sides in a single pass and doesn't care if it's a CRLF, LFCR (yep, seen that abomination more than once), bare LF or a bunch of spaces.
It is easy to extend, e.g. by adding parameters to do LTRIM/RTRIM only, or a full purge (that last bit is simpler to do in 2017 by incorporating STRING_AGG, but perfectly doable in 2008R2). As a matter of fact, this is a simplified version of something I use to do all those things. If anybody is interested then let me know and I'll update:
CREATE FUNCTION fnTrimHarder
(
@String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE
-- Plain + concatenation is used because CONCAT only exists from SQL Server 2012 onwards
@Start INT,
@Len INT,
@Chars CHAR(5) =
CHAR(9) -- TAB
+ CHAR(10) -- LF
+ CHAR(13) -- CR
+ ' '
, -- List of invalid characters
@Return VARCHAR(MAX) = '';
IF @String NOT LIKE '%[^' + @Chars + ']%' -- If string contains only invalid characters
OR COALESCE(@String, '') = '' -- Optional addition for NULL handling
RETURN @Return
ELSE
BEGIN -- Create a "table" of characters with ordinals, calculate the start of string and its length, then return the substring
-- Note: the recursive CTE is subject to the default recursion limit of 100 levels
WITH CTE AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM CTE
WHERE n < LEN(@String)
)
SELECT
@Start = MIN(n),
@Len = 1 + MAX(n) - MIN(n)
FROM CTE
WHERE SUBSTRING(@String, n, 1) NOT LIKE '[' + @Chars + ']';
SET @Return = SUBSTRING(@String, @Start, @Len)
END
RETURN @Return
END
GO

Trim String After Keyword

I have a column that contains status changes, but I don't want to return the whole string. Is there any way to return just a part of a string after a certain keyword? Every value of the column is in the format of From X to Y where X and Y could be a single word or multiple words. I've looked at the substring and trim functions, but those seem to require knowledge of how many spaces you want to keep.
Edit: I want to keep part Y from the status and get rid of 'From X to'.
You can use a combination of Charindex and Substring and Len to do it.
Try this:
select SUBSTRING(field,charindex('keyword',field), LEN('keyword'))
So this will find Flop and extract it wherever it is in the field
select SUBSTRING('bullflop',charindex('flop','bullflop'), LEN('flop'))
EDIT:
To get the remainder of the string from the keyword onwards, pass LEN(field) as the length:
declare @field varchar(200)
set @field = 'this is bullflop and other such junk'
select SUBSTRING(@field,charindex('flop',@field), LEN(@field) )
EDIT 2:
Now I understand, here is a quick and dirty version...
declare @field varchar(200)
set @field = 'From X to Y'
select Replace(SUBSTRING(@field,charindex('to ',@field), LEN(@field) ), 'to ','')
Returns:
Y
EDIT 3:
Cory is right, this is cleaner.
declare @field varchar(200) = 'From X to Y'
declare @keyword varchar(200) = 'to '
-- DATALENGTH counts the keyword's trailing space, which LEN would ignore
select SUBSTRING(@field,charindex(@keyword,@field) + DATALENGTH(@keyword), LEN(@field) )
Other answers are fine, but I like the STUFF() function and it doesn't seem to be well-known, so here's another option:
DECLARE @field VARCHAR(50) = 'From Authorized to Auth Not Needed'
,@keyword VARCHAR(50) = ' to '
SELECT STUFF(@field,1,CHARINDEX(@keyword,@field)+LEN(@keyword),'')
STUFF() is like SUBSTRING() and REPLACE() combined: you feed it a string, a start position and a length, and can replace that span with anything or, in your case, with nothing ('').
From MSDN:
STUFF ( character_expression , start , length , replaceWith_expression )
You can combine a few string functions to do what you want:
DECLARE @Field varchar(100) = 'From A to Z'
DECLARE @Keyword varchar(100) = 'to'
-- Method 1 (Find the keyword, then take the remainder of the string)
SELECT LTRIM(SUBSTRING(@Field,
CHARINDEX(@Keyword, @Field, 0) + LEN(@Keyword), LEN(@Field)))
EDIT:
-- Method 2 (Take from the right the characters up to the keyword)
SELECT RIGHT(@Field, LEN(@Field) - CHARINDEX(@Keyword, @Field, 0) - LEN(@Keyword))
Produces:
'Z'
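Applied to a column rather than a variable, the same idea might look like the sketch below. The derived table s and its Status column are invented sample data, and the value 'Pending' is a made-up example of a row without the keyword, which is returned unchanged. The keyword ' to ' is four characters long, hence the + 4.

```sql
SELECT s.Status,
       CASE WHEN CHARINDEX(' to ', s.Status) > 0
            -- skip past the 4-character keyword ' to ' and keep the rest
            THEN SUBSTRING(s.Status, CHARINDEX(' to ', s.Status) + 4, LEN(s.Status))
            ELSE s.Status
       END AS NewStatus
FROM (VALUES ('From Authorized to Auth Not Needed'),
             ('From X to Y'),
             ('Pending')) AS s(Status);
-- NewStatus: 'Auth Not Needed', 'Y', 'Pending'
```

Note that CHARINDEX finds the first occurrence, so a value of X that itself contains ' to ' would be cut at the wrong place.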

Extract a number from String in SQL

I have the following string:
"FLEETWOOD DESIGNS 535353110XXXXX" (The X's are actually numbers I just wanted to hide them here)
Does anyone know how I can search through strings in SQL and extract numbers that are longer than, let's say, 10 characters?
This is quite an old post, but this might help someone else. I was searching for a user-defined function in SQL Server to extract only the numbers from a given string and, surprisingly, I could not find exactly what I was looking for.
Let me put here the code of a function to "Extract a number from string in SQL" (valid for SQL Server). This is taken from the fantastic blog of Pinal Dave; I've modified it just to return NULL if a NULL value is passed to the function.
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 1 -- SUBSTRING positions are 1-based
SET @IntNumbers = ''
IF @String IS NULL
RETURN NULL;
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0' AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
Tests
select '"' + dbo.ExtractInteger('1a2b3c4d5e6f7g8h9i') + '"'
GO
select '"' + dbo.ExtractInteger('abcdefghi') + '"'
GO
select '"' + dbo.ExtractInteger(NULL) + '"'
GO
select '"' + dbo.ExtractInteger('') + '"'
GO
Results
"123456789"
""
NULL
""
You don't mention the DB engine, so we don't know what features are available...
If regular expressions are available, then a pattern like \d{10,} would match numbers with 10 or more digits.
In MySQL, REGEXP can only return true or false (1 or 0), so you'd have to use some ugly hack like
SELECT
LEAST(
INSTR(field,'0'),
INSTR(field,'1'),
INSTR(field,'2'),
INSTR(field,'3'),
INSTR(field,'4'),
INSTR(field,'5'),
INSTR(field,'6'),
INSTR(field,'7'),
INSTR(field,'8'),
INSTR(field,'9')
) AS startPos,
REVERSE(field) AS backward,
LEAST(
INSTR(backward,'0'),
INSTR(backward,'1'),
INSTR(backward,'2'),
INSTR(backward,'3'),
INSTR(backward,'4'),
INSTR(backward,'5'),
INSTR(backward,'6'),
INSTR(backward,'7'),
INSTR(backward,'8'),
INSTR(backward,'9')
) AS endPos,
SUBSTRING(field, startPos, endPos - startPos + 1)
FROM tab
WHERE(field REGEXP '[0-9]{10,}')
but this isn't perfect: it would extract the wrong substring for a string like "ABC 9 A 1234567891", not to mention that it is probably so slow that it would be faster to go through the data by hand.
SUBSTRING('FLEETWOOD DESIGNS 535353110XXXXX', 19, 32)
You could also use LEN() to get the length of the string itself. If you know the serial number length, you can just subtract that from the end index to get your start index of the substring.
It could be done like this
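Declare @X varchar(100)
Select @X= 'Here is where15234Numbers'
--
-- drop everything before the first digit
Select @X= SubString(@X,PATINDEX('%[0-9]%',@X),Len(@X))
-- keep the leading digits; the appended 'x' guarantees a non-digit is found even if the string ends in digits
Select @X= SubString(@X,1,PATINDEX('%[^0-9]%',@X + 'x') - 1)
--// show result
Select @X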
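For the original requirement (a number at least 10 digits long), the PATINDEX approach can be extended by repeating the [0-9] class ten times in the pattern. This is only a sketch for SQL Server: the variable names @S, @Start and @Tail are made up, and the digits standing in for the masked XXXXX part of the sample string are invented.

```sql
DECLARE @S VARCHAR(100) = 'FLEETWOOD DESIGNS 53535311012345';

-- Start of the first run of at least 10 consecutive digits (0 if there is none).
DECLARE @Start INT = PATINDEX('%' + REPLICATE('[0-9]', 10) + '%', @S);

-- Everything from that position onwards.
DECLARE @Tail VARCHAR(100) = CASE WHEN @Start > 0 THEN SUBSTRING(@S, @Start, LEN(@S)) END;

-- Cut the tail at its first non-digit; the appended 'x' ensures one always exists.
SELECT CASE WHEN @Start > 0
            THEN LEFT(@Tail, PATINDEX('%[^0-9]%', @Tail + 'x') - 1)
       END AS LongNumber;
-- Returns '53535311012345'; NULL when no run of 10 or more digits is present.
```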