Fuzzy logic matching - sql

So, I'm looking at implementing Fuzzy logic matching in my company and having trouble getting good results. For starters, I'm trying to match up Company names with those on a list supplied by other companies.
My first attempt was to use SOUNDEX, but it looks like SOUNDEX only compares the first few sounds in the company name, so longer company names were too easily confused with one another.
I'm now working on my second attempt, using a Levenshtein distance comparison. It looks promising, especially if I remove the punctuation first. However, I'm still having trouble finding duplicates without too many false positives.
One of the issues I have is companies such as widgetsco vs widgets inc. If I compare only a substring the length of the shorter name, I also pick up things like BBC University and CBC University campus. I suspect that a score combining the distance with the longest common substring may be the solution.
Has anyone managed to build an algorithm that does such a matching with limited false positives?

We have had good results on name and address matching using a Metaphone function created by Lawrence Philips. It works in a similar way to Soundex, but creates a sound/consonant pattern for the whole value. You may find this useful in conjunction with some other techniques, especially if you can strip some of the fluff like 'co.' and 'inc.' as mentioned in other comments:
create function [dbo].[Metaphone](#str as nvarchar(70), #KeepNumeric as bit = 0)
returns nvarchar(25)
/*
Metaphone Algorithm
Created by Lawrence Philips.
Metaphone presented in article in "Computer Language" December 1990 issue.
*********** BEGIN METAPHONE RULES ***********
Lawrence Philips' RULES follow:
The 16 consonant sounds:
|--- ZERO represents "th"
|
B X S K J T F H L M N P R 0 W Y
Drop vowels
Exceptions:
Beginning of word: "ae-", "gn", "kn-", "pn-", "wr-" ----> drop first letter
Beginning of word: "wh-" ----> change to "w"
Beginning of word: "x" ----> change to "s"
Beginning of word: vowel or "H" + vowel ----> Keep it
Transformations:
B ----> B unless at the end of word after "m", as in "dumb", "McComb"
C ----> X (sh) if "-cia-" or "-ch-"
S if "-ci-", "-ce-", or "-cy-"
SILENT if "-sci-", "-sce-", or "-scy-"
K otherwise
K "-sch-"
D ----> J if in "-dge-", "-dgy-", or "-dgi-"
T otherwise
F ----> F
G ----> SILENT if "-gh-" and not at end or before a vowel
"-gn" or "-gned"
"-dge-" etc., as in above rule
J if "gi", "ge", "gy" if not double "gg"
K otherwise
H ----> SILENT if after vowel and no vowel follows
or "-ch-", "-sh-", "-ph-", "-th-", "-gh-"
H otherwise
J ----> J
K ----> SILENT if after "c"
K otherwise
L ----> L
M ----> M
N ----> N
P ----> F if before "h"
P otherwise
Q ----> K
R ----> R
S ----> X (sh) if "sh" or "-sio-" or "-sia-"
S otherwise
T ----> X (sh) if "-tia-" or "-tio-"
0 (th) if "th"
SILENT if "-tch-"
T otherwise
V ----> F
W ----> SILENT if not followed by a vowel
W if followed by a vowel
X ----> KS
Y ----> SILENT if not followed by a vowel
Y if followed by a vowel
Z ----> S
*/
as
begin
declare #Result varchar(25)
,#str3 char(3)
,#str2 char(2)
,#str1 char(1)
,#strp char(1)
,#strLen tinyint
,#cnt tinyint
set #strLen = len(#str)
set #cnt = 0
set #Result = ''
-- Preserve first 5 numeric values when required
if #KeepNumeric = 1
begin
set #Result = case when isnumeric(substring(#str,1,1)) = 1
then case when isnumeric(substring(#str,2,1)) = 1
then case when isnumeric(substring(#str,3,1)) = 1
then case when isnumeric(substring(#str,4,1)) = 1
then case when isnumeric(substring(#str,5,1)) = 1
then left(#str,5)
else left(#str,4)
end
else left(#str,3)
end
else left(#str,2)
end
else left(#str,1)
end
else ''
end
set #str = right(#str,len(#str)-len(#Result))
end
--Process beginning exceptions
set #str2 = left(#str,2)
if #str2 = 'wh'
begin
set #str = 'w' + right(#str , #strLen - 2)
set #strLen = #strLen - 1
end
else
if #str2 in('ae', 'gn', 'kn', 'pn', 'wr')
begin
set #str = right(#str , #strLen - 1)
set #strLen = #strLen - 1
end
set #str1 = left(#str,1)
if #str1 = 'x'
set #str = 's' + right(#str , #strLen - 1)
else
if #str1 in ('a','e','i','o','u')
begin
set #str = right(#str, #strLen - 1)
set #strLen = #strLen - 1
set #Result = #Result + #str1
end
while #cnt <= #strLen
begin
set #cnt = #cnt + 1
set #str1 = substring(#str,#cnt,1)
set #strp = case when #cnt <> 0
then substring(#str,(#cnt-1),1)
else ' '
end
-- Check if the current character is the same as the previous character.
-- If we are keeping numbers, only compare non-numeric characters.
if case when #KeepNumeric = 1 and #strp = #str1 and isnumeric(#str1) = 0 then 1
when #KeepNumeric = 0 and #strp = #str1 then 1
else 0
end = 1
continue -- Skip this loop
set #str2 = substring(#str,#cnt,2)
set #Result = case when #KeepNumeric = 1 and isnumeric(#str1) = 1
then #Result + #str1
when #str1 in('f','j','l','m','n','r')
then #Result + #str1
when #str1 = 'q'
then #Result + 'k'
when #str1 = 'v'
then #Result + 'f'
when #str1 = 'x'
then #Result + 'ks'
when #str1 = 'z'
then #Result + 's'
when #str1 = 'b'
then case when #cnt = #strLen
then case when substring(#str,(#cnt - 1),1) <> 'm'
then #Result + 'b'
else #Result
end
else #Result + 'b'
end
when #str1 = 'c'
then case when #str2 = 'ch' or substring(#str,#cnt,3) = 'cia'
then #Result + 'x'
else case when #str2 in('ci','ce','cy') and #strp <> 's'
then #Result + 's'
else #Result + 'k'
end
end
when #str1 = 'd'
then case when substring(#str,#cnt,3) in ('dge','dgy','dgi')
then #Result + 'j'
else #Result + 't'
end
when #str1 = 'g'
then case when substring(#str,(#cnt - 1),3) not in ('dge','dgy','dgi','dha','dhe','dhi','dho','dhu')
then case when #str2 in('gi', 'ge','gy')
then #Result + 'j'
else case when #str2 <> 'gn' or (#str2 <> 'gh' and #cnt+1 <> #strLen)
then #Result + 'k'
else #Result
end
end
else #Result
end
when #str1 = 'h'
then case when #strp not in ('a','e','i','o','u') and #str2 not in ('ha','he','hi','ho','hu')
then case when #strp not in ('c','s','p','t','g')
then #Result + 'h'
else #Result
end
else #Result
end
when #str1 = 'k'
then case when #strp <> 'c'
then #Result + 'k'
else #Result
end
when #str1 = 'p'
then case when #str2 = 'ph'
then #Result + 'f'
else #Result + 'p'
end
when #str1 = 's'
then case when substring(#str,#cnt,3) in ('sia','sio') or #str2 = 'sh'
then #Result + 'x'
else #Result + 's'
end
when #str1 = 't'
then case when substring(#str,#cnt,3) in ('tia','tio')
then #Result + 'x'
else case when #str2 = 'th'
then #Result + '0'
else case when substring(#str,#cnt,3) <> 'tch'
then #Result + 't'
else #Result
end
end
end
when #str1 = 'w'
then case when #str2 not in('wa','we','wi','wo','wu')
then #Result + 'w'
else #Result
end
when #str1 = 'y'
then case when #str2 not in('ya','ye','yi','yo','yu')
then #Result + 'y'
else #Result
end
else #Result
end
end
return #Result
end

You want to use something like Levenshtein Distance or another string comparison algorithm. You may want to take a look at this project on Codeplex.
http://fuzzystring.codeplex.com/

Are you using Access? If so, consider the '*' character, without the quotes. If you're using SQL Server, use the '%' character. However, this really isn't fuzzy logic, it's really the Like operator. If you really need fuzzy logic, export your data-set to Excel and load the AddIn from the URL below.
https://www.microsoft.com/en-us/download/details.aspx?id=15011
Read the instructions very carefully. It definitely works, and it works great, but you need to follow the instructions, and it's not completely intuitive. The first time I tried it, I didn't follow the instructions, and I wasted a lot of time trying to get it to work. Eventually I figured it out, and it worked great!!

I found success implementing a function I found here on Stack Overflow that would find the percentage of strings that match. You can then adjust tolerance till you get an appropriate amount of matches/mismatches. The function implementation will be listed below, but the gist is including something like this in your query.
DECLARE @tolerance DEC(18, 2) = 50;
WHERE dbo.GetPercentageOfTwoStringMatching(first_table.name, second_table.name) > @tolerance
Credit for the following percent matching function goes to Dragos Durlut, Dec 15 '11.
The credit for the LEVENSHTEIN function was included in the code by Dragos Durlut.
T-SQL Get percentage of character match of 2 strings
CREATE FUNCTION [dbo].[GetPercentageOfTwoStringMatching]
(
    @string1 NVARCHAR(100)
    ,@string2 NVARCHAR(100)
)
RETURNS INT
AS
BEGIN
    DECLARE @levenShteinNumber INT
    DECLARE @string1Length INT = LEN(@string1)
    , @string2Length INT = LEN(@string2)
    DECLARE @maxLengthNumber INT = CASE WHEN @string1Length > @string2Length THEN @string1Length ELSE @string2Length END
    SELECT @levenShteinNumber = [dbo].[LEVENSHTEIN] ( @string1 ,@string2)
    -- NULLIF avoids a divide-by-zero when both strings are empty; the function then returns NULL
    DECLARE @percentageOfBadCharacters INT = @levenShteinNumber * 100 / NULLIF(@maxLengthNumber, 0)
    DECLARE @percentageOfGoodCharacters INT = 100 - @percentageOfBadCharacters
    -- Return the result of the function
    RETURN @percentageOfGoodCharacters
END
-- =============================================
-- Create date: 2011.12.14
-- Description: http://blog.sendreallybigfiles.com/2009/06/improved-t-sql-levenshtein-distance.html
-- =============================================
CREATE FUNCTION [dbo].[LEVENSHTEIN](@left VARCHAR(100),
@right VARCHAR(100))
returns INT
AS
BEGIN
    DECLARE @difference INT,
    @lenRight INT,
    @lenLeft INT,
    @leftIndex INT,
    @rightIndex INT,
    @left_char CHAR(1),
    @right_char CHAR(1),
    @compareLength INT
    SET @lenLeft = LEN(@left)
    SET @lenRight = LEN(@right)
    SET @difference = 0
    IF @lenLeft = 0
    BEGIN
        SET @difference = @lenRight
        GOTO done
    END
    IF @lenRight = 0
    BEGIN
        SET @difference = @lenLeft
        GOTO done
    END
    GOTO comparison
    COMPARISON:
    IF ( @lenLeft >= @lenRight )
        SET @compareLength = @lenLeft
    ELSE
        SET @compareLength = @lenRight
    SET @rightIndex = 1
    SET @leftIndex = 1
    WHILE @leftIndex <= @compareLength
    BEGIN
        SET @left_char = substring(@left, @leftIndex, 1)
        SET @right_char = substring(@right, @rightIndex, 1)
        IF @left_char <> @right_char
        BEGIN -- Would an insertion make them re-align?
            IF( @left_char = substring(@right, @rightIndex + 1, 1) )
                SET @rightIndex = @rightIndex + 1
            -- Would a deletion make them re-align?
            ELSE IF( substring(@left, @leftIndex + 1, 1) = @right_char )
                SET @leftIndex = @leftIndex + 1
            SET @difference = @difference + 1
        END
        SET @leftIndex = @leftIndex + 1
        SET @rightIndex = @rightIndex + 1
    END
    GOTO done
    DONE:
    RETURN @difference
END
Note: If you need to compare two or more fields (which I don't think you do) you can add another call to the function in the WHERE clause with a minimum tolerance. I also found success averaging the percentMatching and comparing it against a tolerance.
DECLARE @tolerance DEC(18, 2) = 25;
--could have multiple different tolerances for each field (weighting some fields as more important to be matching)
DECLARE @avg_tolerance DEC(18, 2) = 50;
WHERE dbo.GetPercentageOfTwoStringMatching(first_table.name, second_table.name) > @tolerance
AND dbo.GetPercentageOfTwoStringMatching(first_table.address, second_table.address) > @tolerance
AND (dbo.GetPercentageOfTwoStringMatching(first_table.name, second_table.name)
+ dbo.GetPercentageOfTwoStringMatching(first_table.address, second_table.address)
) / 2 > @avg_tolerance
The benefit of this solution is that the tolerance variables can be specific to each field (weighting the importance of certain fields matching), and the average can ensure general matching across all fields.
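For comparison, here is a minimal Python sketch of the same percentage idea. The LEVENSHTEIN function quoted above appears to be a greedy single-pass comparison rather than the full dynamic-programming edit distance, so its results may differ from the sketch below. The names `levenshtein` and `percent_match` are hypothetical, and treating two empty strings as a 100% match is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def percent_match(a: str, b: str) -> int:
    """100 minus the share of differing characters, using integer division as in the T-SQL."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # assumption: two empty strings count as a full match
    return 100 - levenshtein(a, b) * 100 // longest


print(percent_match("kitten", "sitting"))
```

A tolerance check then becomes a plain comparison, such as `percent_match(name_a, name_b) > 50`.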

First, I suggest you make sure that you can't match on any other attribute and that company names are all you have, because fuzzy matching is bound to give you some false positives. If you want to go ahead with fuzzy matching, you could use the following steps:
1. Remove all stop words from the text, for example Co, Inc, etc.
2. If your database is very large, use an indexing method such as blocking or sorted neighbourhood indexing.
3. Finally, compute the fuzzy score using the Levenshtein distance. You could use the token_set_ratio or partial_ratio functions in Fuzzywuzzy.
Also, I found the following video which aims to solve the same problem: https://www.youtube.com/watch?v=NRAqIjXaZvw
The Nanonets blog also contains several resources on the subject that could potentially be helpful.
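The three steps above can be sketched in Python using only the standard library. This is an illustration under assumptions, not a tuned matcher: the stop-word list is made up, the blocking key (first letter of the normalised name) is deliberately crude, and `difflib.SequenceMatcher.ratio()` stands in for a Levenshtein-based score such as the ones in Fuzzywuzzy. All function names are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Assumption: an illustrative stop-word list; extend it for your own data.
STOP_WORDS = {"co", "inc", "ltd", "llc", "corp", "company"}


def normalise(name: str) -> str:
    """Lower-case, strip punctuation and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def candidate_pairs(left, right):
    """Blocking: only compare names whose normalised forms start with the same letter."""
    blocks = defaultdict(list)
    for name in right:
        blocks[normalise(name)[:1]].append(name)
    for name in left:
        for other in blocks.get(normalise(name)[:1], []):
            yield name, other


def score(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is a stand-in for a Levenshtein score."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match(left, right, threshold=0.85):
    """Return the candidate pairs whose score reaches the threshold."""
    return [(a, b) for a, b in candidate_pairs(left, right) if score(a, b) >= threshold]


print(match(["Widgets Inc."], ["Widgets Co", "BBC University"]))
```

Blocking trades recall for speed: names that differ in their first letter are never compared, so a real deployment would want a more forgiving key.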

Related

Replace the alternate occurrences of a substring

My input strings are like:
A or B OR C or D OR E or F
A OR B OR C OR D OR E OR F
Expected Output: 'A or B' OR 'C or D' OR 'E or F'
outputString = '''' + REPLACE(@inputValue COLLATE Latin1_General_CS_AS, ' OR ' COLLATE Latin1_General_CS_AS, ''' OR ''') + ''''
I tried using SQL Replace function and the above statement works properly for the first string and I get the desired output but for the second string since we have all the ORs in the uppercase it fails and returns 'A' OR 'B' OR 'C' OR 'D' OR 'E' OR 'F'
I'm using SSMS 15.0.
How can I solve this problem? Any help will be appreciated.
Here's a solution that uses a UDF.
The function splits a string on a pattern and returns the parts as a result set
(similar to the STRING_SPLIT function, but with a pattern).
The FOR XML trick is then used to construct a string from the split parts, and to add the quotes.
DECLARE #vchNewValue VARCHAR(100), #result VARCHAR(100);
SET #vchNewValue = 'A OR B or C OR D or E OR F';
SET #result = LTRIM(RTRIM((
SELECT
CASE WHEN match = 1
THEN ' '+quotename(ltrim(rtrim(replace(value,' OR ',' or ') )),'''')+' '
ELSE UPPER(value)
END
FROM dbo.fnPattern_Split(' '+#vchNewValue+' ', ' % OR % ') AS spl
ORDER BY ordinal
FOR XML PATH(''), TYPE).value(N'./text()[1]', N'nvarchar(max)')
));
SELECT #result AS result;
result
'A or B' OR 'C or D' OR 'E or F'
The UDF
Uses PATINDEX to find each next start position of the given pattern in the string.
Then finds the nearest end position where the pattern is still valid.
So it's kinda like a lazy search in regex.
The positions are then used to insert the parts into the returned table.
CREATE FUNCTION dbo.fnPattern_Split
(
#str VARCHAR(MAX),
#pattern VARCHAR(100)
)
RETURNS #tbl TABLE (
ordinal INT,
value VARCHAR(MAX),
match BIT
)
WITH SCHEMABINDING
AS
BEGIN
DECLARE #value NVARCHAR(MAX)
, #splitvalue NVARCHAR(MAX)
, #startpos INT = 0
, #endpos INT = 0
, #ordinal INT = 0
, #foundend BIT = 0
, #patminlen INT = ISNULL(NULLIF(LEN(REPLACE(#pattern,'%','')),0),1);
WHILE (LEN(#str) > 0)
BEGIN
SET #startpos = ISNULL(NULLIF(PATINDEX('%'+#pattern+'%', #str),0), LEN(#str)+1);
IF #startpos < LEN(#str)
BEGIN
SET #foundend = 0;
SET #endpos = #startpos+#patminlen-1;
WHILE #endpos < LEN(#str) AND #foundend = 0
BEGIN
IF SUBSTRING(#str, #startpos, 1+#endpos-#startpos) LIKE #pattern
SET #foundend = 1;
ELSE
SET #endpos += 1;
END
END
ELSE SET #endpos = LEN(#str);
IF #startpos > 1
BEGIN
SET #ordinal += 1;
SET #value = LEFT(#str, #startpos-1);
INSERT INTO #tbl (ordinal, value, match)
VALUES (#ordinal, #value, 0);
END
IF #endpos >= #startpos
BEGIN
SET #ordinal += 1;
SET #splitvalue = SUBSTRING(#str, #startpos, 1+#endpos-#startpos);
INSERT INTO #tbl (ordinal, value, match)
VALUES (#ordinal, #splitvalue, 1);
END
SET #str = SUBSTRING(#str, #endpos+1, LEN(#str));
END;
RETURN;
END;
A recursive solution that stuffs the quotes.
The recursive CTE loops through the string while finding the start positions of the ' or ' patterns.
Since ' or ' has 4 characters, having a start position means you also have the end position.
The STUFF function can insert characters into a string at a given position.
So the positions are used to stuff the quotes where needed,
which is at every even occurrence (lvl modulo 2 is 0).
declare #input varchar(100)
, #result varchar(100);
set #input = 'A OR B or C OR D or E OR F';
set #result = #input;
with rcte as (
select 1 as lvl
, charindex(' or ', #input) as pos
, len(#input) as max_pos
union all
select lvl+1
, isnull(nullif(charindex(' or ', #input, pos+4), 0), max_pos)
, max_pos
from rcte
where pos < max_pos
)
select #result = stuff(stuff(#result,pos+4,0,''''),pos,0,'''')
from rcte
where lvl%2 = 0 and pos+4 < max_pos
order by lvl desc;
SET #result = ''''+#result+'''';
SET #result = REPLACE(REPLACE(#result,' OR ',' or '),''' or ''',''' OR ''');
select #result as result;
result
'A or B' OR 'C or D' OR 'E or F'
This solution puts a high load on the server because it uses a WHILE loop.
declare #input varchar(100)
set #input = 'A or B or C or D or E or F or G or'
declare #inc int = 1, #end int = 1
,#final varchar(100) = '', #part varchar(100)
,#nextposition varchar(100), #or varchar(10)= ''
,#last varchar(10), #ifendsOR varchar(10)
select #nextposition = case when #input like '%or' then substring(#input,1,len(#input)-2) else #input end
select #ifendsOR = case when #input like '%or' then ' or' else '' end
select #last = ltrim(rtrim(right(#nextposition,2)))
while #end <> 0
begin
select #part = substring(#nextposition,1,charindex('or',#nextposition)-2)
select #nextposition = replace(#nextposition,concat(#part,' or '),'')
set #end = charindex('or',#nextposition)
select #or = case when #inc%2 = 0 then ' OR ' else ' or ' end
set #inc = #inc+1
set #final = concat(#final,#part,#or)
end
select #ifendsOR = case when #inc%2 = 0 then upper(#ifendsOR) else #ifendsOR end
select concat(#final,#last,#ifendsOR)
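For readers who want to see the grouping logic outside T-SQL, here is a minimal Python sketch. It assumes the operands never contain the word "or" surrounded by whitespace, and the function name `quote_pairs` is hypothetical. It splits on ' or ' case-insensitively, groups the operands two at a time, and joins the quoted groups with an upper-case OR.

```python
import re


def quote_pairs(expression: str) -> str:
    """Group operands two at a time: 'A or B' OR 'C or D' ..."""
    # Split on "or" in any letter case, surrounded by whitespace.
    operands = re.split(r"\s+or\s+", expression.strip(), flags=re.IGNORECASE)
    pairs = [operands[i:i + 2] for i in range(0, len(operands), 2)]
    return " OR ".join("'" + " or ".join(pair) + "'" for pair in pairs)


print(quote_pairs("A OR B OR C OR D OR E OR F"))
```

With an odd number of operands the last one ends up in a quoted group of its own.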

SQL Server stored procedure: finding the values of Odd and even

I'm using subno as the input, and I have to find its odd and even numbers. The length of 16 is not fixed; it can be any other number. My question is: how do I find the odd and even numbers of my input?
Also, subno includes a dot ( . ) at any position, e.g. 123456.789123. I need to find the odd and even numbers of "123456" and the odd and even numbers of "789123"; the dot ( . ) is the separator.
Once you find the odd numbers on the left side, sum them up. Once you find the even numbers on the left side, sum them up as well, then add the odd total to the even total. The same goes for the right side, e.g. "789123".
Please help me; this is my second week of trying to find a solution. Once you have the total for each side, multiply them together, for example the odd-plus-even total of "123456" * the odd-plus-even total of "789123".
It is for check digit validation of the subscriber number. After the calculation, the calculated reference number should match the valid check digit number. It's a business rule, a kind of algorithm.
create procedure ProcedureName
(#subno VARCHAR(16), --Input the 16 subscriber number
#result INT OUT,
)
as
begin
IF(LEN(#subno) <> 16)
SET #result = 1 -- INVALID RESULT
ELSE
IF(#subno % 2 = 0)
SET #result = #subno - even numbers
ELSE
SET #result = #subno --odd numbers
end
Please see below my sample work
-- this is the sample
create procedure ProcedureName
(
#subno VARCHAR(20), --Subscriber no
#result INT OUT, --result is invalid for 1, valid for 0
#payamt int
)
as
DECLARE #WA VARCHAR(2)
DECLARE #Weights varchar(9)
DECLARE #I INT
DECLARE #WD INT
DECLARE #WP INT
DECLARE #A INT
DECLARE #B INT
DECLARE #R INT
DECLARE #WR INT
SET #WR = 0
SET #R = 0
SET #A = 0
SET #B = 0
SET #WP = 0
SET #I = 0
BEGIN
IF (LEN(#subNo) = 7) AND (SUBSTRING(#subno,1,1) = '2') OR (SUBSTRING(#subno,1,1) = '9')
BEGIN
SET #result = 0 --VALID
END
ELSE IF(LEN(#subno) = 8) AND (SUBSTRING(#subno,1,1) = '2') OR
(SUBSTRING(#subno,1,1) = '9')
BEGIN
SET #result = 0 --VALID
END
ELSE IF(LEN(#subno) = 9)
BEGIN
SET #WA = SUBSTRING(#subno,1,2)
IF(#WA = '65')
set #result = 1 -- INVALID
else
BEGIN
SET #Weights = '12121212'
SET #WA = SUBSTRING(#subno,9,1)
SET #WD = 0
SET #I = 1
WHILE #I<9
BEGIN
SET #WP = cast(SUBSTRING(#Weights, #I,1)as int) * cast(SUBSTRING(#subno, #I, 1) as int)
IF(#WP > 9)
BEGIN
SET #A = SUBSTRING(CAST(#WP AS VARCHAR),1,1)
SET #B = SUBSTRING(CAST(#WP AS VARCHAR),2,1)
SET #WP = CAST(#A AS INT) + CAST(#B AS INT)
END
SET #WD = #WP + #WD
SET #I = #I + 1
END
SET #R = #WD % 10
IF(#R <> 0)
SET #WR = 10 - #R
ELSE
SET #WR = #R
IF(#WR <> CAST(#WA AS INT))
BEGIN
SET #result = 1 -- INVALID
END
ELSE
BEGIN
SET #result = 0 -- VALID
END
END
END
ELSE IF (LEN(#subno) = 10)
BEGIN
SET #I =1
SET #WD = 0
SET #Weights = '121212121'
SET #WA = SUBSTRING(#subno,10,1)
WHILE(#I < 10)
BEGIN
SET #WP = CAST(SUBSTRING(#Weights, #I, 1)AS INT) * CAST(SUBSTRING(#subno, #I, 1) AS INT)
IF(#WP > 9)
BEGIN
SET #A = SUBSTRING(CAST(#WP AS VARCHAR),1,1)
SET #B = SUBSTRING(CAST(#WP AS VARCHAR),2,1)
SET #WP = CAST(#A AS INT) + CAST(#B AS INT)
END
SET #WD = #WP + #WD
SET #I = #I + 1
END
SET #R = #WD % 10
IF(#R <> 0)
SET #WR = 10 - #R
ELSE
SET #WR = #R
IF (#WR<> #WA)
BEGIN
SET #result = 1 -- INVALID
END
ELSE
BEGIN
SET #result = 0 -- VALID
END
END
ELSE
SET #result = 1 -- INVALID
END
Split the value you get on the dot, then iterate over each side and add up the digits. Please see the sample below.
declare #v varchar (16) , #num1 varchar(20) , #num2 varchar(20)
set #v = '1234567.78906656'
select #num1 = substring(#v,0,charindex('.',#v))
select #num2 = substring(#v,charindex('.',#v)+1,len(#v))
--select #num1 = convert(int, substring(#v,0,charindex('.',#v)))
--select #num2 = substring(#v,charindex('.',#v)+1,len(#v))
declare #index int = 1 ,#len INT , #char CHAR
declare #TotalOddL int = 0
declare #TotalEvenL int = 0
DECLARE #FullTotL INT = 0
declare #TotalOddR int = 0
declare #TotalEvenR int = 0
DECLARE #FullTotR INT = 0
DECLARE #TEMP INT
set #len= LEN(#num1)
WHILE #index <= #len
BEGIN
set #char = SUBSTRING(#num1, #index, 1)
SET #TEMP = cast(#char as int)
IF(#TEMP % 2 = 0)
SET #TotalEvenL = #TotalEvenL + #char
else
SET #TotalOddL = #TotalOddL + #char
SET #FullTotL = #TotalEvenL + #TotalOddL
SET #index= #index+ 1
END
Select 'LeftSide total' , #FullTotL
Select 'Left Side odd' , #TotalOddL
Select 'Left Side Even' , #TotalEvenL
SET #index = 1
set #len= LEN(#num2)
WHILE #index <= #len
BEGIN
set #char = SUBSTRING(#num2, #index, 1)
SET #TEMP = cast(#char as int)
IF(#TEMP % 2 = 0)
SET #TotalEvenR= #TotalEvenR + #char
else
SET #TotalOddR = #TotalOddR + #char
SET #FullTotR = #TotalEvenR + #TotalOddR
SET #index= #index+ 1
END
select 'TotalRSide' , #FullTotR
select 'RsideOdd' , #TotalOddR
select 'RSideEven' , #TotalEvenR
select 'Multiplied value' , #FullTotR * #FullTotL
create or replace procedure prc_even_odd(i_number in number, o_result out varchar2)
as
begin
    if (mod(i_number,2) = 0) then
        o_result := 'EVEN';
    else
        o_result := 'ODD';
    end if;
end prc_even_odd;
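As a compact illustration of the calculation described in the question, here is a Python sketch. It assumes, like the T-SQL sample above, that "odd" and "even" refer to the value of each digit rather than its position, and that the input contains only digits and a single dot. The function names are hypothetical.

```python
def side_total(digits: str) -> tuple:
    """Return (sum of odd digits, sum of even digits, combined total) for one side."""
    values = [int(ch) for ch in digits]
    odd = sum(v for v in values if v % 2 == 1)
    even = sum(v for v in values if v % 2 == 0)
    return odd, even, odd + even


def check_value(subno: str) -> int:
    """Multiply the left-side total by the right-side total."""
    left, _, right = subno.partition(".")  # the dot is the separator
    return side_total(left)[2] * side_total(right)[2]


print(side_total("123456"), side_total("789123"), check_value("123456.789123"))
```

If the input has no dot, the right side is empty and the product is 0; a real validation routine would probably reject that input instead.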

Function to check the input number of words

I need help creating a function that returns TRUE or FALSE: TRUE if the input contains 1 or 3 words (like '__hello_', '_hello', '_hello my frend'; leading and trailing spaces should be trimmed), and FALSE if the condition is not fulfilled.
CREATE FUNCTION dbo.nazvFac(#f nvarchar(30))
RETURNS bit
AS
BEGIN
DECLARE #l int = 1, #s nvarchar(30), #i int = 0, #b bit
WHILE LTRIM(RTRIM(LEN(#f))) >= #l --error here, but I do not know how to fix it
BEGIN
SET #s = SUBSTRING(#f, #l, 1)
IF #s BETWEEN 'А' AND 'я'
SET #l += 1
ELSE IF #s = ' '
BEGIN
SET #l -= 1
SET #s = SUBSTRING(#f, #l, 1)
SET #s = RTRIM(#s)
SET #l += 2
SET #i += 1
END
ELSE
BREAK
END
IF #i = 0 OR #i = 2
SET #b = 'TRUE'
ELSE
SET #b = 'FALSE'
RETURN #b
END
GO
WHILE LTRIM(RTRIM(LEN(#f))) >= #l --error here, but I do not know how to fix it
LEN() returns an int, which you are then passing to a string function: RTRIM().
You want to return TRUE only if there are one or three words? This should do it:
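CREATE FUNCTION dbo.nazvFac(@f NVARCHAR(30))
RETURNS BIT
AS
BEGIN
    DECLARE @l INT, @b BIT
    -- Trim first so leading spaces are not counted; assumes words are separated by single spaces
    SET @f = LTRIM(RTRIM(@f))
    SET @l = LEN(@f) - LEN(REPLACE(@f, ' ', '')) + 1
    IF @l = 1 OR @l = 3
        SET @b = 'TRUE'
    ELSE
        SET @b = 'FALSE'
    RETURN @b
END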
Also, JC. is right about the len() error.
You should trim the string and then check the length.
CREATE FUNCTION dbo.nazvFac(#f NVARCHAR(30))
RETURNS BIT
AS
BEGIN
DECLARE #l INT = 1, #s NVARCHAR(30), #i INT = 0, #b BIT
WHILE LEN(LTRIM(RTRIM(#f))) >= #l --I think youn need to trim the string and then check length
BEGIN
SET #s = SUBSTRING(#f, #l, 1)
IF #s BETWEEN 'А' AND 'я'
SET #l += 1
ELSE IF #s = ' '
BEGIN
SET #l -= 1
SET #s = SUBSTRING(#f, #l, 1)
SET #s = RTRIM(#s)
SET #l += 2
SET #i += 1
END
ELSE
BREAK
END
IF #i = 0 OR #i = 2
SET #b = 'TRUE'
ELSE
SET #b = 'FALSE'
RETURN #b
END
GO
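The word-count rule itself is small enough to sketch in a few lines of Python. This sketch only counts whitespace-separated words; it does not reproduce the Cyrillic-letter check (`BETWEEN 'А' AND 'я'`) from the T-SQL above, and the function name is hypothetical.

```python
def has_one_or_three_words(text: str) -> bool:
    """True when the text holds exactly 1 or exactly 3 whitespace-separated words."""
    # str.split() with no argument trims both ends and collapses repeated whitespace.
    return len(text.split()) in (1, 3)


print(has_one_or_three_words("  hello "), has_one_or_three_words("hello world"))
```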

search in a string creditcard numeric value

I want to find a credit card numeric value in a sql string.
for example;
DECLARE #value1 NVARCHAR(MAX) = 'The payment is the place 1234567812345678'
DECLARE #value2 NVARCHAR(MAX) = 'The payment is the place 123456aa7812345678'
DECLARE #value3 NVARCHAR(MAX) = 'The payment1234567812345678is the place'
The result should be :
#value1Result 1234567812345678
#value2Result NULL
#value3Result 1234567812345678
16 digits must be together without space.
How to do this in a sql script or a function?
edit :
if I want to find these 2 credit card value.
#value4 = 'card 1 is : 4034349183539301 and the other one is 3456123485697865'
how should I implement the scripts?
You can use PATINDEX as
PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', yourStr)
If the result is 0, the string doesn't contain 16 consecutive digits; otherwise it does.
It can be used within a WHERE clause or a SELECT statement, depending on your needs.
You can write as:
SELECT case when Len(LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)) = 16
then LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)
else ''
end
FROM (
SELECT subsrt = SUBSTRING(string, pos, LEN(string))
FROM (
SELECT string, pos = PATINDEX('%[0-9]%', string)
FROM table1
) d
) t
DECLARE @value1 NVARCHAR(MAX) = 'card 1 is : 4034349183539301 and the other one is 3456123485697865'
DECLARE @Length INT
    ,@Count INT
    ,@Candidate NCHAR(1)
    ,@cNum INT
    ,@result VARCHAR(16)
SELECT @Count = 1
SELECT @cNum = 0
SELECT @result = ''
SELECT @Length = LEN(@value1)
WHILE @Count <= @Length
BEGIN
    SELECT @Candidate = SUBSTRING(@value1, @Count, 1)
    -- LIKE '[0-9]' is stricter than ISNUMERIC, which also accepts characters such as '.', '+' and '$'
    IF @Candidate LIKE '[0-9]'
    BEGIN
        SET @cNum = @cNum + 1
        SET @result = @result + @Candidate
    END
    ELSE
    BEGIN
        SET @cNum = 0 -- restart the run of consecutive digits
        SET @result = ''
    END
    IF @cNum = 16 -- a run of 16 consecutive digits has just been completed
    BEGIN
        SELECT @result 'Credit Number'
    END
    SET @Count = @Count + 1
END
There you go kind sir.
DECLARE
#value3 NVARCHAR(MAX) = 'The payment1234567812345678is the place',
#MaxCount int,
#Count int,
#Numbers NVARCHAR(100)
SELECT #Count = 1
SELECT #Numbers = ''
SELECT #MaxCount = LEN(#value3)
WHILE #Count <= #MaxCount
BEGIN
IF (UNICODE(SUBSTRING(#value3,#Count,1)) >= 48 AND UNICODE(SUBSTRING(#value3,#Count,1)) <=57)
SELECT #Numbers = #Numbers + SUBSTRING(#value3,#Count,1)
SELECT #Count = #Count + 1
END
PRINT #Numbers
You can make this as a function if you are planning to use it a lot.
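Where a regular-expression engine is available, the "exactly 16 digits together" rule can be stated directly. The following Python sketch is an illustration only; the pattern and function name are not taken from any answer above. The lookarounds reject runs that are longer than 16 digits.

```python
import re

# Exactly 16 ASCII digits, not preceded or followed by another digit.
CARD_PATTERN = re.compile(r"(?<![0-9])[0-9]{16}(?![0-9])")


def find_card_numbers(text: str) -> list:
    """Return every run of exactly 16 consecutive digits found in the text."""
    return CARD_PATTERN.findall(text)


print(find_card_numbers("card 1 is : 4034349183539301 and the other one is 3456123485697865"))
```

An empty list plays the role of the NULL result in the question.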

Calculate Count of true bits in binary type with t-sql

I need to find how many true bit exists in my binary value.
example:
input: 0001101 output:3
input: 1111001 output:5
While both answers work, both have issues. A loop is not optimal and destroys the value, and neither solution can be used in a SELECT statement.
A possibly better solution is to mask the bits one by one, as follows:
select @counter = 0
+ case when @BinaryVariable2 & 1 = 1 then 1 else 0 end
+ case when @BinaryVariable2 & 2 = 2 then 1 else 0 end
+ case when @BinaryVariable2 & 4 = 4 then 1 else 0 end
+ case when @BinaryVariable2 & 8 = 8 then 1 else 0 end
+ case when @BinaryVariable2 & 16 = 16 then 1 else 0 end
+ case when @BinaryVariable2 & 32 = 32 then 1 else 0 end
+ case when @BinaryVariable2 & 64 = 64 then 1 else 0 end
+ case when @BinaryVariable2 & 128 = 128 then 1 else 0 end
+ case when @BinaryVariable2 & 256 = 256 then 1 else 0 end
+ case when @BinaryVariable2 & 512 = 512 then 1 else 0 end
This can be used in a SELECT or UPDATE statement. It is also an order of magnitude faster (about 50 times on my server).
To help, you might want to use the following generator code:
declare @x int = 1, @c int = 0
print ' @counter = 0 ' /* CHANGE field/parameter name */
while @c < 10 /* change to how many bits you want to see */
begin
    print ' + case when @BinaryVariable2 & ' + cast(@x as varchar) + ' = ' + cast(@x as varchar) + ' then 1 else 0 end ' /* CHANGE the variable/field name */
    select @x *=2, @c +=1
end
As a further note: if you use a bigint or go beyond 32 bits, it is necessary to cast as follows:
print ' + case when @Missing & cast(' + cast(@x as varchar) + ' as bigint) = ' + cast(@x as varchar) + ' then 1 else 0 end '
Enjoy
DECLARE #BinaryVariable2 VARBINARY(10);
SET #BinaryVariable2 = 60; -- binary value is 111100
DECLARE #counter int = 0
WHILE #BinaryVariable2 > 0
SELECT #counter +=#BinaryVariable2 % 2, #BinaryVariable2 /= 2
SELECT #counter
Result:
4
I've left various debug selects in.
begin
declare #bin as varbinary(20);
declare #bitsSet as int;
set #bitsSet = 0;
set #bin = convert(varbinary(20), 876876876876);
declare #i as int;
set #i = 0
select LEN(#bin), 'Len';
while #i < LEN(#bin)
begin
declare #bit as varbinary(1);
set #bit = SUBSTRING(#bin, #i, 1);
select #bit, 'Bit';
declare #power as int
set #power = 0;
while #power < 8
begin
declare #powerOf2 as int;
set #powerOf2 = POWER(2, #power);
if #powerOf2 <> 0
set #bitsSet = #bitsSet + (#bit & #powerOf2) / #powerOf2; -- edited to add the divisor
select #power, #powerOf2;
set #power = #power + 1;
end;
select #bitsSet;
set #i = #i + 1;
end;
select #bitsSet, 'End'
end;
Cheers -
You can handle an arbitrary length binary value by using a recursive CTE to split the data into a table of 1-byte values and counting all of the bits that are true in each byte of that table...
DECLARE #data Varbinary(MAX) = Convert(Varbinary(MAX), N'We can count bits of very large varbinary values without a loop or number table if you like...');
WITH each ( byte, pos ) AS (
SELECT Substring(#data, Len(#data), 1), Len(#data)-1 WHERE Len(#data) > 0
UNION ALL
SELECT Substring(#data, pos, 1), pos-1 FROM each WHERE pos > 0
)
SELECT Count(*) AS [True Bits]
FROM each
CROSS JOIN (VALUES (1),(2),(4),(8), (16),(32),(64),(128)) [bit](flag)
WHERE each.byte & [bit].flag = [bit].flag
OPTION (MAXRECURSION 0);
From SQL Server 2022 you can just use SELECT BIT_COUNT(input)
expression_value can be
Any integer or binary expression that isn't a large object (LOB).
For integer expressions the result can depend on the datatype. e.g. -1 as smallint has a binary representation of 1111111111111111 (two's complement) and will have more bits set for int datatype.
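The dependence on the datatype can be illustrated with a short Python sketch. It is not SQL Server's implementation; the function name and the `width` parameter are hypothetical. The value is masked to the chosen width first, which yields the two's-complement bit pattern for negative numbers, and then the set bits are counted.

```python
def bit_count(value: int, width: int = 32) -> int:
    """Count the set bits of value as stored in a two's-complement integer of `width` bits."""
    masked = value & ((1 << width) - 1)  # negative numbers become their stored bit pattern
    return bin(masked).count("1")


print(bit_count(0b0001101), bit_count(-1, 16), bit_count(-1, 32))
```

Here -1 has 16 set bits at a width of 16 (smallint) and 32 at a width of 32 (int), matching the note above.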