SQL Server Partial Match - sql

I have two columns and want to check whether the strings in them partially match. Column A holds 0C000702AA-G and Column B holds S0C000702AB-DI. I tried:
CASE WHEN ColumnA LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
but it returns 0. Is there a better way to see whether the two values almost match?
Column A = 0C000702AA-G and Column B = S0C000702AB-DI. As you can see, Column B is almost the same as A: B has a prefix of 'S' and ends with 'AB-DI'. The result should be 1 because the part in the middle, '0C000702A', is the same on both sides.
I just tested:
CASE WHEN '%' + ColumnA + '%' LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
Still returns 0

You can use the DIFFERENCE function, which compares the SOUNDEX values of the two strings and returns an integer from 0 to 4. A result of 0 means little or no similarity; a result of 4 means the SOUNDEX values are very similar or identical. Read more about SOUNDEX and DIFFERENCE.
CAVEAT: The comparison is based on how the strings sound, so it is not well suited to your case, where the values are identifier-like strings that include digits.
DECLARE @table TABLE (ColumnA CHAR(100), ColumnB CHAR(100))
INSERT INTO @table VALUES
('0C000702AA-G','S0C000702AB-DI')
SELECT SOUNDEX(ColumnA) AS columnASoundex, SOUNDEX(ColumnB) AS ColumnBSoundex,
DIFFERENCE(ColumnA, ColumnB) AS Similarity FROM @table
columnASoundex    ColumnBSoundex    Similarity
0000              S000              3
If you want a more detailed comparison, you can use a CLR stored procedure that leverages a C# fuzzy matching library such as fuzzystring. Also refer to the SO post on fuzzy matching in C#.
UPDATE: As the OP confirmed, the above approach works only in some cases, so the OP still has to find an approach that covers all of their needs.
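To illustrate why the LIKE attempts in the question return 0, and what a simple substring check could look like, here is a minimal sketch. It assumes the shared part always starts at the beginning of Column A and that 9 characters are enough to identify it; both the length and the table variable are assumptions for illustration, not something the question confirms.

```sql
-- Table variable standing in for the real table (hypothetical).
DECLARE @t TABLE (ColumnA VARCHAR(100), ColumnB VARCHAR(100));
INSERT INTO @t VALUES ('0C000702AA-G', 'S0C000702AB-DI');

SELECT ColumnA,
       ColumnB,
       -- Original attempt: is all of ColumnB contained in ColumnA? It is not,
       -- because ColumnB is longer and differs at both ends.
       CASE WHEN ColumnA LIKE '%' + ColumnB + '%' THEN 1 ELSE 0 END AS FullContainment,
       -- Sketch: do the first 9 characters of ColumnA ('0C000702A')
       -- appear anywhere in ColumnB? The length 9 is an assumption.
       CASE WHEN CHARINDEX(LEFT(ColumnA, 9), ColumnB) > 0 THEN 1 ELSE 0 END AS PrefixMatch
FROM @t;
-- FullContainment = 0, PrefixMatch = 1
```

LIKE with surrounding wildcards only tests whether one whole string is contained in the other, so any extra characters on the pattern side make it fail. A fixed-length prefix check is crude; if the shared part can vary in position or length, a fuzzy matching approach is more appropriate.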

Related

Use a "case when in" expression where the list values are in a single field?

I'm using SQL Server and would like to check if todays day name is in a list of values in a single field/column.
An example of the column "start_days" contents is:
'Monday','Tuesday','Sunday'
'Thursday'
'Friday','Sunday'
'Tuesday','Sunday'
'Tuesday','Wednesday','Thursday','Friday'
The code I am trying to run on this is:
case
when datename(weekday,getdate()) in (start_days) then 1
else 0
end as today_flag
And the result is 0 for every row.
Am I doing something wrong here or is it just not possible to use a single field as a list of values in the statement?
As a starter: you should fix your data model and not store multiple values in a single column. Storing a list of values in a database column basically defeats the purpose of a relational database. Here is some related reading on that topic.
That said, here is one option using pattern matching:
case
when ',' + start_days + ',' like '%,' + datename(weekday,getdate()) + ',%' then 1
else 0
end as today_flag
If you really have single quotes around values within the list, then we need to include them in the match:
case
when ',' + start_days + ',' like '%,''' + datename(weekday,getdate()) + ''',%' then 1
else 0
end as today_flag
If the values always are weekdays, this can be simplified since there is no risk of overlapping values:
case
when start_days like '%''' + datename(weekday,getdate()) + '''%' then 1
else 0
end as today_flag
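For comparison, here is a rough sketch of what the "fix your data model" advice could look like: one row per schedule and day instead of a delimited list. The table and column names (@schedule_days, schedule_id, start_day) are hypothetical and not taken from the question.

```sql
-- Hypothetical normalized layout: one row per (schedule, day).
DECLARE @schedule_days TABLE (schedule_id INT, start_day VARCHAR(9));
INSERT INTO @schedule_days VALUES
    (1, 'Monday'), (1, 'Tuesday'), (1, 'Sunday'),
    (2, 'Thursday'),
    (3, 'Friday'), (3, 'Sunday');

-- today_flag is 1 when any of the schedule's days equals today's day name.
SELECT s.schedule_id,
       MAX(CASE WHEN s.start_day = DATENAME(weekday, GETDATE()) THEN 1 ELSE 0 END) AS today_flag
FROM @schedule_days s
GROUP BY s.schedule_id;
```

With this layout the comparison is a plain equality test, with no pattern matching or quoting involved. Note that DATENAME returns the day name in the session's language, so the stored names must match that language.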
The right answer to the question is to fix your data model. Storing multiple values like that will lead to many issues, and you are stuck on one of them right now.
Until then, you could use the LIKE operator to get the desired results:
SELECT *, CASE WHEN
Value LIKE CONCAT('%', QUOTENAME(DATENAME(WEEKDAY,GETDATE()), ''''), '%')
THEN 1
ELSE 0
END
FROM
(
VALUES
('''Monday'',''Tuesday'',''Sunday'''),
('''Thursday'''),
('''Friday'',''Sunday'''),
('''Tuesday'',''Sunday'''),
('''Tuesday'',''Wednesday'',''Thursday'',''Friday''')
) T(Value)
Here is a db<>fiddle where you can see how it's working online.
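As for why the original IN attempt returns 0 for every row: IN (start_days) compares the day name against the column's whole value as one string, not against the items inside it. A small sketch with a fixed day name, so the result does not depend on the current date (the table variable is hypothetical):

```sql
DECLARE @t TABLE (start_days VARCHAR(100));
INSERT INTO @t VALUES
    ('''Monday'',''Tuesday'''),  -- stored text: 'Monday','Tuesday'
    ('Monday');                  -- stored text: Monday

SELECT start_days,
       -- IN sees a one-element list holding the entire column value.
       CASE WHEN 'Monday' IN (start_days) THEN 1 ELSE 0 END AS in_flag,
       -- LIKE searches inside the stored text for the quoted day name.
       CASE WHEN start_days LIKE '%''Monday''%' THEN 1 ELSE 0 END AS like_flag
FROM @t;
-- Row 1: in_flag = 0, like_flag = 1
-- Row 2: in_flag = 1, like_flag = 0
```

The second row only matches IN because its entire value is exactly the day name, which is never the case for the quoted lists in the question.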

SQL Server CASE statement with multiple THEN clauses

I have seen several similar questions, but none cover what I need. I need to put another THEN clause after the first one. My column contains ints. When it is NULL I need it to display a blank, but when I try the code below, I just get '0'.
CASE
WHEN Column1 IS NULL
THEN ''
ELSE Column1
END
If I try to put a string after THEN, it tells me that it cannot convert it to int. I need to convert the value to varchar and then change its output to a blank afterwards, such as:
e.g.
CASE
WHEN Column1 IS NULL
THEN CONVERT(varchar(10), Column1)
THEN ''
ELSE Column1
END
Is there a way of doing this?
Thanks
Rob
A case expression returns a single value -- with a given type. If you want a string result, then you need to be sure that all paths in the case return strings:
CASE WHEN Column1 IS NULL
THEN ''
ELSE CAST(Column1 AS VARCHAR(255))
END
This is more simply written using COALESCE():
COALESCE(CAST(Column1 as VARCHAR(255)), '')
You cannot display an integer as a "blank" (other than using a NULL value).
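A quick way to see the type rule in action is the sketch below. The table variable is hypothetical; the behaviour shown follows from int having higher data type precedence than varchar, so a CASE that mixes the two returns int and the empty string is converted to 0.

```sql
DECLARE @t TABLE (Column1 INT);
INSERT INTO @t VALUES (42), (NULL);

SELECT
    -- Mixed types: the result type is int, so '' is implicitly converted to 0.
    CASE WHEN Column1 IS NULL THEN '' ELSE Column1 END AS as_int,
    -- All branches are strings: NULL becomes an empty string.
    COALESCE(CAST(Column1 AS VARCHAR(10)), '') AS as_text
FROM @t;
-- as_int: 42, 0
-- as_text: '42', ''
```

This is why the original query shows 0 instead of a blank: the blank was there, but it was converted back to an integer.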

Removing Leading Zeros Part 2

SQL Server 2012
I have 3 columns in my table that use a function, [usr].[Udf_OverPunch], together with SUBSTRING.
Here is my code:
[usr].[Udf_OverPunch](SUBSTRING(col001, 184, 11)) as REPORTED_GAP_DISCOUNT_PREVIOUS_AMOUNT
This function works appropriately for what I need it to do. It is basically converting symbols or letters to a designated number based on a data dictionary.
The problem I am having is that there are leading zeros. I just asked a question about leading zeros, but that approach does not work with the function columns because the symbols cannot be converted to int.
This is what I am using to get rid of leading zeros (but leave one zero) in my code for the other columns:
cast(cast(SUBSTRING(col001, 217, 6) as int) as varchar(25)) as PREVIOUS_REPORTING_PERIOD
This works well at turning a value of '000000' to just one '0' or a value of '000060' to '60' but will not work with the function because of the symbol or letter (when trying to convert to int).
As I mentioned, I have 3 columns which produce values that look something like this when the function is not being used:
'0000019753{'
'0000019748G'
'0000019763H'
My goal here is to use the function while also removing the leading zeros (unless they are all zeros then keep one zero).
This is what I attempted that isn't working because the value contains a character that isn't an integer:
[usr].[Udf_OverPunch]cast(cast(SUBSTRING(col001, 184, 6) as int) as varchar(25)) as REPORTED_GAP_DISCOUNT_PREVIOUS_AMOUNT,
Please let me know if you have any ideas or need more information. :)
select case when col like '%[^0]%' then substring(col,patindex('%[^0]%',col),len(col)) when col like '%0%' then '0' else col end
from tab
or
select case when col like '%[^0]%' then right(col,len(ltrim(replace(col,'0',' ')))) when col like '%0%' then '0' else col end
from tab
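To make the PATINDEX variant concrete, here is a small sketch run against values like the ones in the question. The table variable @tab is hypothetical. Whether the trimming should be applied before or after calling [usr].[Udf_OverPunch] depends on what that function accepts and returns, which the question does not show.

```sql
DECLARE @tab TABLE (col VARCHAR(20));
INSERT INTO @tab VALUES ('0000019753{'), ('000000'), ('000060'), ('');

SELECT col,
       CASE
           -- At least one non-zero character: keep everything from the first one onward.
           WHEN col LIKE '%[^0]%' THEN SUBSTRING(col, PATINDEX('%[^0]%', col), LEN(col))
           -- Only zeros: keep a single zero.
           WHEN col LIKE '%0%' THEN '0'
           -- Empty string: leave unchanged.
           ELSE col
       END AS trimmed
FROM @tab;
-- '0000019753{' -> '19753{'
-- '000000'      -> '0'
-- '000060'      -> '60'
-- ''            -> ''
```

Because this works purely on characters, it never tries to convert the value to int, so the overpunch symbol at the end is not a problem.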
I handle this kind of replacement with a SQL CLR function that performs replacement using regular expressions. The solution looks like this:
[dbo].[fn_Utils_RegexReplace] ([value], '^0{1,}(?=.)', '')
You need to create such a function yourself because T-SQL has no built-in regular expression support.
How to create regex replace function in T-SQL?
For example:
try this,
declare @i varchar(50)='0000019753}'--'0000019753'
select case when ISNUMERIC(@i)=1 then
cast(cast(@i as bigint) as varchar(50)) else @i end
or
[usr].[Udf_OverPunch]( case when ISNUMERIC(col001)=1 then
cast(cast(col001 as bigint) as varchar(50)) else col001 end)

sql, using a subquery in the like operator

Why does this work,
select *
from some_table
WHERE some_column_name like '%i%'
and not this?
select *
from some_table
WHERE
some_column_name like (select ''''+'%' +value +'%' + '''' as val
from [dbo].[fn_Split](' i this is a test testing Chinese undefined',' ')
where idx = 0)
I am trying to search for individual words instead of the whole phrase. The split function above splits the string on space characters and puts the results into a table with two columns, idx and value.
The LIKE operator takes a string as its argument. It cannot be used on a table, which I assume is what your function returns.
I think what you want to do is JOIN to the function, and then check where LIKE fn.Value:
select *
from some_table t
INNER JOIN (select value as val
from [dbo].[fn_Split](' i this is a test testing Chinese undefined',' ')
where idx = 0) f
ON t.some_column_name like '%'+f.val+'%'
If your subquery is guaranteed to return only one result, you could try putting the percent signs (the wildcards) around it instead of inside it:
LIKE '%' + (YourSubQuery) + '%'
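A minimal sketch of that form, using a table variable in place of fn_Split (whose output is not shown in the question, so its shape here is an assumption). It also shows what happens when single quotes are concatenated into the pattern, as in the original query:

```sql
-- Stand-in for the rows fn_Split would return (assumed shape: idx, value).
DECLARE @words TABLE (idx INT, value VARCHAR(50));
INSERT INTO @words VALUES (0, 'i'), (1, 'this');

SELECT
    -- Wildcards around a scalar subquery: the pattern becomes '%i%'.
    CASE WHEN 'this is a test'
              LIKE '%' + (SELECT value FROM @words WHERE idx = 0) + '%'
         THEN 1 ELSE 0 END AS plain_pattern,
    -- Quotes concatenated into the pattern: it becomes '%i%' wrapped in literal
    -- single quotes, so the text would have to start and end with a quote.
    CASE WHEN 'this is a test'
              LIKE (SELECT '''' + '%' + value + '%' + '''' FROM @words WHERE idx = 0)
         THEN 1 ELSE 0 END AS quoted_pattern;
-- plain_pattern = 1, quoted_pattern = 0
```

The scalar subquery must return at most one row; the idx = 0 filter guarantees that here.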
One possible reason is because you are appending single quotes onto the beginning and end of the string, and none of the values actually store single quotes in the string.
Another reason it might not work is that the subquery returns more than one row or zero rows. The function fn_Split() is your own function, so I don't know what it returns. You have a subquery in a context where it can return at most one row and one column; that is called a scalar subquery. If the subquery returns more than one row, you will get an error. If the subquery returns no rows (for instance, if idx starts counting at 1 rather than 0), then it will return NULL, which fails the test.
If you want to find a match this way, I would recommend exists:
select t.*
from some_table t
where exists (select 1 as val
from [dbo].[fn_Split](' i this is a test testing Chinese undefined',' ') s
where s.idx = 0 and
t.some_column_name like '%' + value + '%'
);
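The same EXISTS pattern extends naturally to matching any of several words, which seems to be the underlying goal. Here is a self-contained sketch; the table variables stand in for some_table and for the output of fn_Split, and their contents are made up for illustration.

```sql
DECLARE @some_table TABLE (some_column_name VARCHAR(100));
INSERT INTO @some_table VALUES ('this is a test'), ('nothing here'), ('Chinese food');

-- Stand-in for the rows fn_Split would return (assumed shape: idx, value).
DECLARE @words TABLE (idx INT, value VARCHAR(50));
INSERT INTO @words VALUES (0, 'test'), (1, 'Chinese');

-- Return each row that contains at least one of the words.
SELECT t.*
FROM @some_table t
WHERE EXISTS (SELECT 1
              FROM @words w
              WHERE t.some_column_name LIKE '%' + w.value + '%');
-- Returns 'this is a test' and 'Chinese food'
```

Unlike a join, EXISTS returns each matching row once even when several words match it.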
The result of your subquery is a literal string. The % symbol isn't seen as a wildcard. Also, does your function return multiple rows? If so, note that the LIKE operator can only evaluate a single value.
If your function does return a single value, I would suggest looking into dynamic SQL. Something like the following:
DECLARE @SQL VARCHAR(MAX), @WildCard VARCHAR(MAX)
SELECT @WildCard = '%' + value + '%'
FROM [dbo].[fn_Split](' i this is a test testing Chinese undefined',' ')
WHERE idx = 0
SET @SQL = 'SELECT * FROM some_table WHERE some_column_name like ''' + @WildCard + ''''
EXEC(@SQL)

Fastest way to compare substring property between two strings in sql server

Given two strings A and B, what is the fastest way to compare whether A is a substring of B or B is a substring of A?
A LIKE '%' + B + '%' OR B LIKE '%' + A + '%'
or
CHARINDEX(A,B) <> 0 OR CHARINDEX(B,A) <> 0
I believe it's the former because it doesn't calculate the location.
Question 1: Is there a faster way to do it? I want to minimize the number of times B has to be evaluated, because B is a string I get by processing another column value.
As an additional note,
Basically I want to do something as follows with a column, C
SELECT
CASE WHEN A LIKE Processing(C) THEN 0
WHEN A LIKE '%' + PROCESSING(C) + '%' OR PROCESSING(C) LIKE '%' + A + '%' THEN LEN(A) - LEN(PROCESSING(C))
END AS Score
FROM #table
where A and C are columns in the table #table. As can be seen, Processing(C) is called a huge number of times, because it is evaluated repeatedly for each record.
Question 2: Should I put the result of Processing(C) in a separate temp table and run the substring check against that column, or continue with the same approach?
My guess is that charindex() and like would have similar performance in this case. Don't hesitate to test which is faster (and report back on the results so we can all learn).
However, this particular optimization probably won't make a difference to the overall query. Your question may be an example of premature optimization.
Once upon a time, I thought that like performed worse than the comparable string operation. However, like is optimized in many databases, including SQL Server. As an example of the optimization, like is able to use indexes (when there is no wildcard or the wildcard is at the end). charindex() does not use indexes. If you are looking for matches at the beginning of the respective strings, then your query could possibly take advantage of indexes.
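One thing to keep in mind when choosing between the two forms: they are not strictly interchangeable. LIKE interprets %, _ and [ in the pattern as wildcards, while CHARINDEX searches for the literal text. A small sketch with made-up values:

```sql
DECLARE @pairs TABLE (A VARCHAR(50), B VARCHAR(50));
INSERT INTO @pairs VALUES
    ('abc', 'xxabcxx'),  -- A is a substring of B
    ('abc', 'a%c'),      -- B contains a LIKE wildcard character
    ('abc', 'xyz');      -- no relation

SELECT A, B,
       CASE WHEN A LIKE '%' + B + '%' OR B LIKE '%' + A + '%'
            THEN 1 ELSE 0 END AS like_match,
       CASE WHEN CHARINDEX(A, B) <> 0 OR CHARINDEX(B, A) <> 0
            THEN 1 ELSE 0 END AS charindex_match
FROM @pairs;
-- ('abc', 'xxabcxx'): like_match = 1, charindex_match = 1
-- ('abc', 'a%c'):     like_match = 1, charindex_match = 0
-- ('abc', 'xyz'):     like_match = 0, charindex_match = 0
```

If the data can contain wildcard characters, CHARINDEX gives the literal substring test; with LIKE the wildcard characters would need to be escaped first.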
EDIT:
For your concern about PROCESSING(c), you might consider a subquery:
SELECT (CASE WHEN A LIKE Processing_C THEN 0
WHEN A LIKE '%' + Processing_C + '%' OR Processing_C LIKE '%' + A + '%'
THEN LEN(A) - LEN(Processing_C)
END) AS Score
FROM (select t.*, PROCESSING(C) as Processing_C
from #table t
) t
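An alternative way to write the expression only once is CROSS APPLY. In the sketch below, UPPER(C) is just a stand-in for the PROCESSING function, which is not shown in the question, and the table variable replaces #table. This keeps the expression in a single place in the query text; how often the engine actually evaluates it is up to the optimizer.

```sql
DECLARE @t TABLE (A VARCHAR(50), C VARCHAR(50));
INSERT INTO @t VALUES ('ABC', 'abc'), ('ABCDEF', 'cd'), ('XYZ', 'q');

SELECT t.A, t.C,
       CASE WHEN t.A LIKE p.Processing_C THEN 0
            WHEN t.A LIKE '%' + p.Processing_C + '%'
              OR p.Processing_C LIKE '%' + t.A + '%'
            THEN LEN(t.A) - LEN(p.Processing_C)
       END AS Score
FROM @t t
-- UPPER() is a placeholder for the real PROCESSING(C) call.
CROSS APPLY (SELECT UPPER(t.C) AS Processing_C) p;
-- ('ABC', 'abc')   -> Score = 0
-- ('ABCDEF', 'cd') -> Score = 4
-- ('XYZ', 'q')     -> Score = NULL
```

If PROCESSING is expensive and the table is large, materializing the result into a temp table column, as Question 2 suggests, is the more predictable way to ensure it is computed once per row.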