SQL Server search using LIKE while ignoring blank spaces

I have a phone column in the database, and the records contain unwanted spaces on the right. I tried to use trim and replace, but it didn't return the correct results.
If I use
phone like '%2581254%'
it returns
customerid
-----------
33470
33472
33473
33474
but I need the percent sign (wildcard) at the beginning only, so that the pattern matches the end of the phone number.
So if I use it like this
phone like '%2581254'
I get nothing, because of the spaces on the right!
So I tried to use trim and replace, and I get one result only
LTRIM(RTRIM(phone)) LIKE '%2581254'
returns
customerid
-----------
33474
Note that these four ids have same phone number!
Table data
customerid phone
-------------------------------------
33470 96506217601532388254
33472 96506217601532388254
33473 96506217601532388254
33474 96506217601532388254
33475 966508307940
I added many numbers for test purposes.
The PHP function takes the last 7 digits and compares them.
For example
01532388254 will be 2581254
and I want to search for all users that have these 7 digits in their phone number
2581254
I can't figure out where's the problem!
It should return 4 ids instead of 1 id

Given the sample data, I suspect you have control characters in your data, for example CHAR(13) or CHAR(10).
To confirm this, just run the following
Select customerid,phone
From YourTable
Where CharIndex(CHAR(0),[phone])+CharIndex(CHAR(1),[phone])+CharIndex(CHAR(2),[phone])+CharIndex(CHAR(3),[phone])
+CharIndex(CHAR(4),[phone])+CharIndex(CHAR(5),[phone])+CharIndex(CHAR(6),[phone])+CharIndex(CHAR(7),[phone])
+CharIndex(CHAR(8),[phone])+CharIndex(CHAR(9),[phone])+CharIndex(CHAR(10),[phone])+CharIndex(CHAR(11),[phone])
+CharIndex(CHAR(12),[phone])+CharIndex(CHAR(13),[phone])+CharIndex(CHAR(14),[phone])+CharIndex(CHAR(15),[phone])
+CharIndex(CHAR(16),[phone])+CharIndex(CHAR(17),[phone])+CharIndex(CHAR(18),[phone])+CharIndex(CHAR(19),[phone])
+CharIndex(CHAR(20),[phone])+CharIndex(CHAR(21),[phone])+CharIndex(CHAR(22),[phone])+CharIndex(CHAR(23),[phone])
+CharIndex(CHAR(24),[phone])+CharIndex(CHAR(25),[phone])+CharIndex(CHAR(26),[phone])+CharIndex(CHAR(27),[phone])
+CharIndex(CHAR(28),[phone])+CharIndex(CHAR(29),[phone])+CharIndex(CHAR(30),[phone])+CharIndex(CHAR(31),[phone])
+CharIndex(CHAR(127),[phone]) >0
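A lighter-weight spot check, assuming the stray characters sit at the end of the value, is to look at the code of the last character. The sketch below uses two inline sample rows; the trailing CHAR(13) + CHAR(10) on the first row is an assumption for illustration, not something taken from the real table.

```sql
-- Sample rows are hypothetical; swap the VALUES list for your own table.
SELECT v.customerid,
       ASCII(RIGHT(v.phone, 1)) AS LastCharCode, -- 48-57 means the value ends in a digit
       LEN(v.phone)             AS CharCount     -- LEN ignores trailing spaces only
FROM (VALUES (33470, '96506217601532388254' + CHAR(13) + CHAR(10)),
             (33474, '96506217601532388254')) AS v(customerid, phone);
```

A last-character code outside 48-57 (here 10, a line feed, for the first row) points at the kind of hidden character that LTRIM/RTRIM will not remove.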
If the Test Results are Positive
The following UDF can be used to strip the control characters from your data via an update
Update YourTable Set Phone=[dbo].[udf-Str-Strip-Control](Phone)
The UDF if Interested
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2
Return LTrim(RTrim(Replace(Replace(Replace(@S,' ','><'),'<>',''),'><',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
As promised (and nudged by Bill), the following is a little commentary on the UDF.
We pass a string that we want stripped of control characters.
We create an ad-hoc tally table of ASCII characters 0 - 31.
We then run a global search-and-replace for each character in the tally table. Each character found is replaced with a space.
The final string is stripped of repeating spaces (a little trick Gordon demonstrated several weeks ago; I don't have the original link).
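The space-collapsing trick in the UDF's Return line may be easier to follow one step at a time. This is only a sketch with a made-up input, and it assumes the string contains no '<' or '>' characters of its own.

```sql
-- Assumes the input contains no '<' or '>' characters of its own.
DECLARE @S varchar(100) = 'Michael    LastName';  -- four spaces in the middle

SELECT REPLACE(@S, ' ', '><') AS Step1,                            -- Michael><><><><LastName
       REPLACE(REPLACE(@S, ' ', '><'), '<>', '') AS Step2,         -- Michael><LastName
       REPLACE(REPLACE(REPLACE(@S, ' ', '><'), '<>', ''), '><', ' ') AS Step3;  -- Michael LastName
```

Every space becomes '><', so a run of spaces produces adjacent '<>' pairs in the middle; deleting those pairs leaves a single '><', which is turned back into one space.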

Related

Need Help Using CHARINDEX to Create a New Variable

I'm trying to take the second number in the variable description and make a new variable. However, I'm not too familiar with string manipulation. Below is a small example of the variable description, I'm having a hard time formatting it because the first number changes in size.
Description
4 Matching notifications 11 Updates
32 Matching notifications 12 Updates
1211 Matching notifications 1 Updates
Below this is a rough idea of the code I thought would originally work.
SELECT
LEFT(Description, CHARINDEX('Updates', Description)-1) AS Second_variable
FROM X
This is a bit tricky in SQL Server. One method is:
select s.value as second_variable
from t cross apply
(select top (1) s.value
from string_split(description, ' ') s
where description like concat('% ', s.value, ' Updates')
) s;
This extracts the individual "words" from the string and then chooses the one that matches the value before the last ' Updates'.
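STRING_SPLIT requires SQL Server 2016 or later (database compatibility level 130). On older versions, a plain CHARINDEX/SUBSTRING approach may be enough. The sketch below assumes every description contains the literal word 'notifications' followed by one space, then the number, then ' Updates'; the inline VALUES list stands in for the real table.

```sql
-- Assumption: the text between 'notifications ' and ' Updates' is the second number.
SELECT v.Description,
       SUBSTRING(v.Description, p.StartPos, p.EndPos - p.StartPos) AS Second_variable
FROM (VALUES ('4 Matching notifications 11 Updates'),
             ('32 Matching notifications 12 Updates'),
             ('1211 Matching notifications 1 Updates')) AS v(Description)
CROSS APPLY (SELECT CHARINDEX('notifications', v.Description) + 14 AS StartPos, -- 13 letters + 1 space
                    CHARINDEX(' Updates', v.Description)           AS EndPos) AS p;
-- Second_variable: 11, 12, 1
```

Because both positions are found by searching for fixed words, the varying width of the first number does not matter.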

T-SQL CONTAINS with numbers and dots (.)

Let's consider User.Note = 'Version:3.7.21.1'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7.2*"')
=> returns something
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
If User.Note = 'Version:3.7.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns something
If User.Note = 'Version:3.72.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
I can't figure out how it works. It should always return something when I search for "3.7*".
Do you know what the logic behind this is?
PS: if I replace the numbers by letters, there's no problem.
I think your problem is being caused by the unpredictability of the word breaker interacting with the punctuation marks within the data. Full text search is based on the concept of strings of characters, not including spaces and punctuation. When the engine is building the index it sees the periods and breaks the word in weird ways.
As an example, I made a small table with the three values you provided...
VALUES (1,'3.7.21.1'),(2,'3.7.21'),(3,'3.72.21')
Now when I do your selects, I get results on all four... not the results I expect, though.
For me, this returns all three values
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7.2*"')
and this returns only 3.7.21
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7*"')
So let's run this and take a look at the contents of the full text index
SELECT * FROM sys.dm_fts_index_keywords(db_id('{databasename}'), object_id('{tablename}'))
For my results (yours are quite probably different) I've got the following display_term values
display_term document_count
21 3
3 3
3.7.21 1
7 2
72 1
So let's look at the first search criterion '"3.7.2*"'
If I shove that into sys.dm_fts_parser...
select * from sys.dm_fts_parser('"3.7.2*"', 1033, NULL, 0)
...it's showing me that it's breaking with matches on
3
7
2
But if I do...
select * from sys.dm_fts_parser('"3.7*"', 1033, NULL, 0)
I'm getting a single exact match on the term 3.7 and sys.dm_fts_index_keywords told me earlier that I only have one document/row that contains 3.7
You might also experience additional weirdness because numbers 0-9 are usually in the system stopwords and can be left out of an index because they're considered to be useless. This might be why it works when you change to letters.
Also, I know you've decided to replace LIKE, but Microsoft has suggested that you only use alphanumeric characters in your full text indexes and, if you need to use non-alphanumeric characters in search criteria, you should use LIKE. Perhaps changing the periods to some alphanumeric replacement that won't be used in normal values?
CONTAINS will only work if the column is in a full-text index. If it is not indexed, you will need to use LIKE:
SELECT * FROM [USER] WHERE NOTE LIKE '3.7%' --or '%3.7%'
Do you want to use CONTAINS because you think it will be faster? (It generally is.)
The Microsoft document lists all the ways you can format and use CONTAINS (11 examples).
Here is the Microsoft doc on CONTAINS
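As a sketch of the LIKE fallback mentioned in the answers above, the query below runs against the three sample notes from the question plus one made-up non-matching row. The inline VALUES list is only a stand-in for the real [USER] table.

```sql
SELECT v.Note
FROM (VALUES ('Version:3.7.21.1'),
             ('Version:3.7.21'),
             ('Version:3.72.21'),
             ('Version:4.0.1')) AS v(Note)   -- last row is a hypothetical non-match
WHERE v.Note LIKE '%3.7%';
-- Returns the first three rows
```

LIKE compares characters literally, so the periods are not subject to word breaking. A pattern such as '%3.7%' would also match a value like 'Version:13.7', so anchoring on the colon ('%:3.7%') may be safer if the notes always use that format.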

Pull 3 digits from middle of id and sort by even odd

I have file ids in my database that start with:
a single character prefix
a period
a three digit client id
a hyphen
a three digit file number.
Example F.129-123
We have several ids for each client.
I need to be able to strip out the three digit file number and then pull them based on even or odd so that I can assign specific data to each result population.
One added issue. Some of the ids have characters added at the end.
Example: F.129-123A or F.129-123.NF
So I need to be able to just use the three digit file number without any other characters, because the added characters create errors while conversion.
If you are using SQL SERVER,
you can use CHARINDEX() to find the index of - and then
get 3 digits after - using SUBSTRING()
SELECT substring('F.123-234',charindex('-','F.123-234')+1, 3)
If you are using MySQL,
you can use POSITION() to find the index of - and then get 3 digits after - using SUBSTRING()
SELECT SUBSTRING('F.123-234',POSITION( '-' IN 'F.123-234' )+1,3);
If you are using Oracle,
you can use INSTR() to find the index of - and then get 3 digits after - using SUBSTR()
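For completeness, the Oracle variant described above might look like the following one-liner, using the same sample id as the other two examples:

```sql
-- Oracle: INSTR finds the hyphen, SUBSTR takes the 3 characters after it
SELECT SUBSTR('F.123-234', INSTR('F.123-234', '-') + 1, 3) FROM dual;
-- Returns 234
```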
UPDATES:
Based on the requirements in the comments, you can use a query like the one below to achieve what you need.
SELECT
SUBSTRING(MatterID,CHARINDEX('-',MatterID)+1, 3) as FileNo
FROM
Matters
WHERE
MatterID LIKE 'f.129%'
AND MatterID NOT LIKE '%col%'
AND substring( MatterID, CHARINDEX('-',MatterID)+1, 3) % 2 = 0
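To see both populations at once rather than filtering for even numbers only, a variation on the query above can label each row. This is a sketch against inline sample ids (the VALUES list stands in for the Matters table), and it assumes the three characters after the hyphen are always digits.

```sql
SELECT v.MatterID,
       f.FileNo,
       CASE WHEN f.FileNo % 2 = 0 THEN 'even' ELSE 'odd' END AS Parity
FROM (VALUES ('F.129-123'), ('F.129-124'),
             ('F.129-123A'), ('F.129-124.NF')) AS v(MatterID)
-- Take exactly 3 characters after the hyphen, so suffixes like A or .NF are ignored
CROSS APPLY (SELECT CAST(SUBSTRING(v.MatterID, CHARINDEX('-', v.MatterID) + 1, 3) AS int) AS FileNo) AS f
ORDER BY Parity, f.FileNo;
```

The explicit CAST makes the conversion visible; limiting SUBSTRING to three characters is what keeps the trailing letters from causing conversion errors.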
If you are working with Microsoft SQL Server, you could use the patindex() function together with substring() to get just the 3-digit file number
select left(substring(string, PATINDEX('%[0-9][-]%', string)+2, LEN(string)), 3)
Note that if your ids use a different separator (e.g. /), you will need to modify the pattern accordingly, e.g. PATINDEX('%[0-9][/]%', string)
In Postgres you can use split_part() to get the part after the hyphen, keep only its first three characters with left() so that suffixes such as A or .NF are dropped, then cast it to an integer:
select *
from the_table
order by left(split_part(file_id, '-', 2), 3)::int;
This assumes that there is always exactly one - in the string. As I understand your question, that is the case because the format is fixed.
Is this helpful?
Create table #tmpFileNames(id int, FileName VARCHAR(50))
insert into #tmpFileNames values(1,'F.129-123')
insert into #tmpFileNames values(2,'F.129-125')
insert into #tmpFileNames values(3,'F.129-124')
insert into #tmpFileNames values(4,'F.129-123A')
insert into #tmpFileNames values(5,'F.129-124B')
insert into #tmpFileNames values(6,'F.129-125.PQ')
insert into #tmpFileNames values(7,'F.129-123.NF')
select SUBSTRING(STUFF(FileName, 1, CHARINDEX('-',FileName), ''),1,3), * from #tmpFileNames
Order by SUBSTRING(STUFF(FileName, 1, CHARINDEX('-',FileName), ''),1,3),id
Drop table #tmpFileNames

Query for blank white space before AND after a number string

How would I go about constructing a query that returns all material numbers that have a blank space either BEFORE or AFTER the number string? We are exporting straight from SSMS to Excel and we see the problem in the spreadsheet. If I could return all of the material numbers with spaces, I could go in and edit them or do a replace to fix this issue prior to exporting. (The material numbers are imported via a Windows application to which users upload an Excel template. This template has all of this data, and sometimes users put spaces before or after the material number.) The query we have used to work, but now it does not return anything. Upon export we identify the problems you see highlighted in the screenshot (left screenshot) and then query to find that material number in the table (right screenshot). And indeed, it has a space before the 1.
Currently the query we use looks like:
SELECT Mtrl
FROM dbo.Source
WHERE Mtrl LIKE '% %'
Since you are getting the data from a query, you should just have that query remove any potential spaces using LTRIM and RTRIM:
LTRIM(RTRIM([MTRL]))
Keep in mind that these two commands remove only spaces, not tabs or returns or other white-space characters.
Doing the above will make sure that the data for the entire set of data is fine, whether or not you find it and/or fix it.
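The limitation noted above (only spaces are trimmed) is easy to confirm with a quick check. The literals here are made up for illustration:

```sql
SELECT DATALENGTH(LTRIM(RTRIM(' 12 ')))          AS SpacesOnly,   -- 2: both spaces removed
       DATALENGTH(LTRIM(RTRIM(' 12' + CHAR(9)))) AS TrailingTab;  -- 3: the tab survives
```

DATALENGTH is used instead of LEN so that nothing is hidden by LEN's habit of ignoring trailing spaces.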
Or, since you are copying-and-pasting from the Results Grid into Excel, you can just CONVERT the value to a number which will naturally remove any spaces:
SELECT CONVERT(INT, ' 12 ');
Returns:
12
So you would just use:
CONVERT(INT, [MTRL])
Now, if you want to find the data that has anything that is not a digit in it, you would use this:
SELECT Mtrl
FROM dbo.Source
WHERE [Mtrl] LIKE '%[^0-9]%'; -- any single non-digit character
If the issue is with non-space white-space characters, you can find out which ones they are via the following (to find them at the beginning instead of at the end, change the RIGHT to be LEFT):
;WITH cte AS
(
SELECT UNICODE(RIGHT([MTRL], 1)) AS [CharVal]
FROM dbo.Source
)
SELECT *
FROM cte
WHERE cte.[CharVal] NOT BETWEEN 48 AND 57 -- digits 0 - 9
AND cte.[CharVal] <> 32; -- space
And you can fix in one shot using the following, which removes regular spaces (char 32 via LTRIM/RTRIM), tabs (char 9), and non-breaking spaces (char 160):
UPDATE src
SET src.[Mtrl] = REPLACE(
REPLACE(
LTRIM(RTRIM(src.[Mtrl])),
CHAR(160),
''),
CHAR(9),
'')
FROM dbo.Source src
WHERE src.[Mtrl] LIKE '%[' -- find rows with any of the following characters
+ CHAR(9) -- tab
+ CHAR(32) -- space
+ CHAR(160) -- non-breaking space
+ ']%';
Here I used the same WHERE condition that you have since if there can't be any spaces then it doesn't matter if you check both ends or for any at all (and maybe it is faster to have a single LIKE instead of two).

How do I modify these group of functions to get everything left of #?

I have a column that has values stored in the following format:
name#URL
All data is stored with this and a second # is never present.
I've got the following statement that strips the URL from this column:
SELECT SUBSTRING ( wf_name ,PATINDEX ( '%#%' , wf_name )+1 , LEN(wf_name)-(PATINDEX ( '%#%' , wf_name )) )
However, I want to take the name also (everything left of the #). Unfortunately I don't understand the functions I'm using above (having read the documentation, I'm still confused). Could somebody please help me understand the flow and how I can adjust this to get everything left of the #?
Have a look at the following example
; WITH Table1 AS (
SELECT 'TADA#TEST' AS NameURL
)
SELECT *,
LEFT(NameURL,PATINDEX('%#%',NameURL) - 1) LeftText,
RIGHT(NameURL,LEN(NameURL) - PATINDEX('%#%',NameURL)) RightText
FROM Table1
SQL Fiddle DEMO
Using functions
PATINDEX (Transact-SQL)
Returns the starting position of the first occurrence of a pattern in
a specified expression, or zeros if the pattern is not found, on all
valid text and character data types.
LEFT (Transact-SQL)
Returns the left part of a character string with the specified number
of characters.
RIGHT (Transact-SQL)
Returns the right part of a character string with the specified number
of characters.
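Putting the pieces together, the flow is: find the position of the #, take everything before that position as the name, and take everything after it as the URL. The sketch below uses made-up sample values whose two parts differ in length, and CHARINDEX in place of PATINDEX since the search is for a single literal character. It assumes every value contains a #; without one, LEFT would be called with a length of -1 and raise an error.

```sql
-- Sample values are hypothetical; wf_name is the column name from the question.
SELECT v.wf_name,
       CHARINDEX('#', v.wf_name)                                        AS HashPos,
       LEFT(v.wf_name, CHARINDEX('#', v.wf_name) - 1)                   AS NamePart, -- chars before '#'
       SUBSTRING(v.wf_name, CHARINDEX('#', v.wf_name) + 1, LEN(v.wf_name)) AS UrlPart  -- chars after '#'
FROM (VALUES ('Home#http://example.com/home'),
             ('Docs#http://example.com/docs/index')) AS v(wf_name);
```

Passing LEN(wf_name) as the SUBSTRING length is deliberately generous: SUBSTRING simply stops at the end of the string, so there is no need to compute the exact remaining length.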