Using Levenshtein on parts of a string in SQL

I am trying to work some fuzzy searching into our storefront search field using the Levenshtein distance, but I'm running into a problem with how to search for only part of a product name.
For example, a customer searches for "scisors", but we have a product called "electric scissor". levenshtein("scisors", "electric scissor") returns 11, because the "electric " part is counted as differences.
What I am looking for is a way to compare against the individual words of the product name, so it would compute levenshtein("scisors", "electric") and also levenshtein("scisors", "scissor"), see that the second gives only 2, and therefore include that product in the search results.
Non-working example to give you an idea of what I'm after:
SELECT * FROM products p WHERE levenshtein("scisors", p.name) < 5
Question: Is there a way to write an SQL statement that checks parts of the string? Would I need to create more functions in my database to handle it, or modify my existing function? If so, what would it look like?
I am currently using this MySQL implementation of the Levenshtein distance:
DELIMITER $$
CREATE FUNCTION levenshtein(s1 VARCHAR(255), s2 VARCHAR(255))
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR;
    -- max strlen = 255 (each distance is stored as a single byte in cv0/cv1)
    DECLARE cv0, cv1 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
        RETURN 0;
    ELSEIF s1_len = 0 THEN
        RETURN s2_len;
    ELSEIF s2_len = 0 THEN
        RETURN s1_len;
    ELSE
        -- cv1 holds the previous DP row: 0, 1, ..., s2_len
        WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;
        WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
                -- insertion
                SET c = c + 1;
                IF s1_char = SUBSTRING(s2, j, 1) THEN
                    SET cost = 0;
                ELSE
                    SET cost = 1;
                END IF;
                -- match / substitution
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                IF c > c_temp THEN SET c = c_temp; END IF;
                -- deletion
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
                IF c > c_temp THEN SET c = c_temp; END IF;
                SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;
            SET cv1 = cv0, i = i + 1;
        END WHILE;
    END IF;
    RETURN c;
END$$
DELIMITER ;
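To sanity-check the numbers in the question outside the database, here is a minimal Python sketch of the same two-row dynamic-programming recurrence the stored function uses (its `prev`/`cur` lists play the role of `cv1`/`cv0`). It is an illustration for testing, not a replacement for the stored function.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Two-row Levenshtein distance, mirroring the cv1 (previous row) /
    cv0 (current row) buffers of the MySQL function."""
    if s1 == s2:
        return 0
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)
    prev = list(range(len(s2) + 1))  # row 0: distance from "" to each prefix of s2
    for i, ch in enumerate(s1, start=1):
        cur = [i]  # distance from s1[:i] to ""
        for j, other in enumerate(s2, start=1):
            cost = 0 if ch == other else 1
            cur.append(min(cur[j - 1] + 1,       # insertion (c = c + 1)
                           prev[j - 1] + cost,   # match / substitution
                           prev[j] + 1))         # deletion
        prev = cur
    return prev[-1]


print(levenshtein("scisors", "electric scissor"))  # 11
print(levenshtein("scisors", "scissor"))           # 2
```

The whole-name comparison pays 9 insertions for "electric " plus 2 more edits, which is why a per-word comparison is needed.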

This is a bit long for a comment.
First, I would suggest using a full-text search with a synonyms list. That said, some of your users may spell really badly, so the synonyms list might be difficult to maintain.
If you use Levenshtein distance, I suggest applying it on a per-word basis: for each word in the user's input, find the closest word in the name field, then add those minimum distances together. The product with the lowest total is the best match.
In your example, you would have these comparisons:
levenshtein('scisors', 'electric')
levenshtein('scisors', 'scissor')
The minimum would be the second. If the user types multiple words, such as 'electrk scisors', you would compute:
levenshtein('electrk', 'electric') <-- minimum
levenshtein('electrk', 'scissor')
levenshtein('scisors', 'electric')
levenshtein('scisors', 'scissor') <-- minimum
This is likely to be an intuitive way to approach the search.
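A minimal Python sketch of this per-word scoring, assuming product names are split on whitespace, compared case-insensitively, and non-empty. `match_score`, `rank_products` and the sample product list are hypothetical illustrations, not part of any existing schema. In SQL the same idea would need an extra helper that splits `p.name` into words, which the thread does not show.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with a rolling row (same recurrence as the SQL function)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match / substitution
                           prev[j] + 1,               # deletion
                           cur[j - 1] + 1))           # insertion
        prev = cur
    return prev[-1]


def match_score(query: str, product_name: str) -> int:
    """Hypothetical scoring: for each query word, take the distance to the
    closest word in the product name, then add those minimums together."""
    name_words = product_name.lower().split()  # assumes a non-empty name
    return sum(min(levenshtein(q, w) for w in name_words)
               for q in query.lower().split())


def rank_products(query: str, products: list) -> list:
    """Best match first (lowest total distance)."""
    return sorted(products, key=lambda name: match_score(query, name))


products = ["paper stapler", "electric scissor"]
print(match_score("scisors", "electric scissor"))          # 2
print(match_score("electrk scisors", "electric scissor"))  # 4 (2 + 2)
print(rank_products("scisors", products)[0])                # electric scissor
```

Instead of ranking, the scores could also be filtered with a threshold, as in the question's `< 5`.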

Related

VBScript: how to write a For loop with two variables

I am using VB to write a For loop:
i = 1 to 3
j = 1 to 3
How do I make i + j produce 1+1, 2+2, 3+3?
I know that in C# it can be written as for (int i = 0, j = 0; i < 10 && j < 10; i++, j++)
I just don't know how to write it in VB.
Unfortunately, as of March 2022, Microsoft seems to have finally removed its 25+-year-old VBScript language documentation from its documentation website; however, third-party mirrors of the documentation for VBScript's For...Next statement still exist.
As per the VBScript documentation...
TL;DR: You can't. VBScript's For...Next statement supports only a single loop counter.
You can still declare and manually increment your own variables in the loop though, e.g.
Using a single variable:
Option Explicit
Dim i
For i = 0 To 5 Step 1
Call DoStuff
Next
Using multiple variables requires separate Dim declarations and incrementing them inside the For loop body:
Option Explicit
Dim i, j, k
Let j = 0
Let k = 0
For i = 0 To 5 Step 1
Call DoStuff
Let j = j + 1
Let k = k + 1
Next
Dim result As Integer
For i As Integer = 1 To 3
    For j As Integer = i To i
        result = i + j
        Console.WriteLine(i.ToString() & "+" & j.ToString() & "=" & result.ToString())
    Next
Next
Or you can also write
Dim i, j, result As Integer
i = 0
j = 0
While i < 10 AndAlso j < 10
    result = i + j
    Console.WriteLine(i.ToString() & "+" & j.ToString() & "=" & result.ToString())
    i += 1
    j += 1
End While
If you want to achieve the effect you described, I generally prefer two For loops over a While loop (personal preference).
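The underlying pattern, one loop counter plus a second variable advanced by hand, is the same in any language. Here it is as a small Python sketch for comparison (the function name `lockstep_sums` is hypothetical); `zip` is Python's way of stepping two ranges together, which VBScript's For...Next has no equivalent for.

```python
def lockstep_sums(n: int) -> list:
    """Drive the loop with i and advance j by hand, as in the VBScript answer."""
    results = []
    j = 1
    for i in range(1, n + 1):
        results.append(f"{i}+{j}={i + j}")
        j += 1  # manual increment, like `Let j = j + 1`
    return results


print(lockstep_sums(3))  # ['1+1=2', '2+2=4', '3+3=6']

# zip steps two sequences together, the closest analogue of C#'s i++, j++
pairs = [f"{i}+{j}={i + j}" for i, j in zip(range(1, 4), range(1, 4))]
print(pairs)  # ['1+1=2', '2+2=4', '3+3=6']
```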

Trim a specific string

I am manipulating strings.
I want the output string to contain only the text between two specific characters (= and o).
I can do this by repeating the following loop twice:
For f = 1 To Len(line5)
    If Mid(line5, f, 1) = "=" Then
        line5_out = Mid(line5, f, Len(line5) - f + 1)
        Exit For
    End If
Next
once for = and once for o.
Is there a quicker way to do this?
There are multiple ways to do it; the "best" way depends on exactly what you need.
Here are two of them:
'Delete everything after the last o and before the first =
YourString = YourString.Remove(YourString.LastIndexOf("o") + 1, YourString.Length - YourString.LastIndexOf("o") - 1).Remove(0, YourString.IndexOf("="))
'Get the part of the string from the first = to the last o (both included)
YourString = YourString.Substring(YourString.IndexOf("="), YourString.LastIndexOf("o") + 1 - YourString.IndexOf("="))
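For comparison, a Python sketch of the same first-"="-to-last-"o" extraction. The function name `between` and its `inclusive` flag are hypothetical, and single-character delimiters are assumed. Unlike the .NET one-liners, it also handles a missing delimiter: there, IndexOf/LastIndexOf return -1 and the call would throw.

```python
def between(s: str, start: str = "=", end: str = "o", inclusive: bool = True):
    """Return s from the first `start` to the last `end` (single-character
    delimiters assumed), or None if either is missing or out of order."""
    first = s.find(start)
    last = s.rfind(end)
    if first == -1 or last == -1 or last < first:
        return None
    if inclusive:
        return s[first:last + 1]   # keep both delimiters, like the .NET code
    return s[first + 1:last]       # only what lies strictly between them


print(between("ab=cdoef"))                   # =cdo
print(between("ab=cdoef", inclusive=False))  # cd
print(between("no delimiters here"))         # None (there is no "=")
```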

Informix 4GL: split a string or char

I wanted to know the Informix 4GL command to split a variable
such as
lv_var = variable01;variable02
into
lv_var01 = variable01
lv_var02 = variable02
Is there something in Informix 4GL that can do this?
In Python I could do
lv_array = lv_var.split(";")
and use the variables from the array
It's possible in classic Informix 4GL with something like this:
define
p_list dynamic array of char(10)
main
define
i smallint,
cnt smallint,
p_str char(500)
let p_str = "a;b;c;d"
let cnt = toarray(p_str, ";")
for i = 1 to cnt
display p_list[i]
end for
end main
function toarray(p_str, p_sep)
define
p_str char(2000),
p_sep char(1),
i smallint,
last smallint,
ix smallint,
p_len smallint
let ix = 0
let p_len = length(p_str)
# -- get size of array needed
for i = 1 to p_len
if p_str[i] = p_sep then
let ix = ix + 1
end if
end for
if ix > 0 then
# -- we have more than one
allocate array p_list[ix + 1]
let ix = 1
let last = 1
for i = 1 to p_len
if p_str[i] = p_sep then
let p_list[ix] = p_str[last,i-1]
let ix = ix + 1
let last = i + 1
end if
end for
# -- set the last one
let p_list[ix] = p_str[last, p_len]
else
# -- only has one
allocate array p_list[1]
let ix = 1
let p_list[ix] = p_str
end if
return ix
end function
Out:
a
b
c
d
Dynamic array support requires IBM Informix 4GL 7.32.UC1 or higher
There isn't a standard function to do that. One major problem is returning the array. I'd probably write a C function to do the job, but in I4GL, it would look like:
FUNCTION nth_split_field(str, c, n)
DEFINE str VARCHAR(255)
DEFINE c CHAR(1)
DEFINE n INTEGER
...code to find nth field delimited by c in str...
END FUNCTION
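The body of nth_split_field is elided above and is left that way. To illustrate what such a body would do, here is a Python sketch of the scan. Using a 1-based `n` and returning `None` where a 4GL version might return NULL are assumptions. Calling it with n = 1, 2, ... until it returns None walks every field without ever returning an array, which sidesteps the array-returning problem mentioned above.

```python
def nth_split_field(s: str, sep: str, n: int):
    """Return field n (1-based) of s split on the single character sep,
    or None if s has fewer than n fields."""
    field = 1
    start = 0
    for pos, ch in enumerate(s):
        if ch == sep:
            if field == n:
                return s[start:pos]
            field += 1
            start = pos + 1
    return s[start:] if field == n else None


# Walk every field without building an array
n = 1
while (value := nth_split_field("variable01;variable02", ";", n)) is not None:
    print(value)
    n += 1
```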
What you'll find is that the products that have grown to supersede Informix 4GL over the years, such as FourJs Genero, have built-in methods that simplify the Informix 4GL developer's life.
So something like this would do what you are looking for if you upgraded to Genero
-- Example showing how string can be parsed using string tokenizer
-- New features added in Genero since Informix 4GL include:
-- STRING - like a CHAR but length does not need to be specified - http://www.4js.com/online_documentation/fjs-fgl-manual-html/?path=fjs-fgl-manual#c_fgl_datatypes_STRING.html
-- DYNAMIC ARRAY like an ARRAY but does not need to have length specified. Is also passed by reference to functions - http://www.4js.com/online_documentation/fjs-fgl-manual-html/?path=fjs-fgl-manual#c_fgl_Arrays_010.html
-- base.StringTokenizer - methods to split a string - http://www.4js.com/online_documentation/fjs-fgl-manual-html/?path=fjs-fgl-manual#c_fgl_ClassStringTokenizer.html
MAIN
DEFINE arr DYNAMIC ARRAY OF STRING
DEFINE i INTEGER
CALL string2array("abc;def;ghi",arr,";")
-- display result
FOR i = 1 TO arr.getLength()
DISPLAY arr[i]
END FOR
-- Should display
--abc
--def
--ghi
END MAIN
FUNCTION string2array(s,a,delimiter)
DEFINE s STRING
DEFINE a DYNAMIC ARRAY OF STRING
DEFINE delimiter STRING
DEFINE tok base.StringTokenizer
CALL a.clear()
LET tok = base.StringTokenizer.create(s,delimiter)
WHILE tok.hasMoreTokens()
LET a[a.getLength()+1] = tok.nextToken()
END WHILE
-- a is a DYNAMIC ARRAY, so it has been passed by reference and does not need to be explicitly returned
END FUNCTION

For loop with variable upper bound

I'd like to write a for loop with a variable upper limit in Mathematica 9. So, instead of
j = 0;
For[n = 1, n <= 3, n++, j = j + n];
j
(*6*)
I'd like to do
N = 3;
j = 0;
For[n = 1, n <= N, n++, j = j + n];
j
n
(*
0
1
*)
But, as shown, this does not give the right result at all; judging from the value of n, the body of the loop was not evaluated at all.
I've looked through the Mathematica docs both on For loops and on loops and control structures more generally (and also done some DuckDuckGo searches), but there's still something fundamental I'm missing. What is it?
For completeness, I should note that my ultimate goal is to put this in a function:
foo[N] =
Module[{j = 0},
For[n = 1, n <= N, n++, j = j + n;];
j]
foo[3]
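Your code shows several problems that are common among new users. For example:
N is a built-in, protected symbol (the numerical-evaluation function), so N = 3 fails and n <= N never evaluates to True, which is why the loop body never ran
You shouldn't start your own identifiers with uppercase letters, since built-in symbols do
The function foo[] should be defined with SetDelayed (:=), not with Set (=)
You need to use patterns (_) for the arguments in the function definition
For[] loops, and iteration in general, should be avoided in Mathematica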
I think you could carefully read all the answers to this post to get a better grip on Mathematica.
Anyway, your code may be rewritten as
foo[k_] := Module[{j = 0, n}, For[n = 1, n <= k, n++, j = j + n]; j]
foo[3]
(*6*)
But this is horrible Mathematica coding.
The following are much better ways in Mathematica:
foo[j_, k_] := Fold[Plus, j, Range@k]
foo[j_, k_] := j + Total@Range@k
foo[j_, k_] := j + Tr@Range@k

Verify Gamefield VB.NET

So I'm developing a Minesweeper game and I'm placing the mines, but now I have to check where the mines are in order to generate the numbers. The problem is that when I check the neighboring rows and columns, the program must not go outside the game field.
Here's what my code looks like now:
Public Sub avisinhos(ByVal line, ByVal column)
If mat(line, column) = 0 Then
mat(line, column) = -1
numbandeiras = numbandeiras + 1
End If
For auxlinha = -1 To 1
For auxcolumn = -1 To 1
Next
Next
End Sub
How do I write an If check so that I don't go outside the game field?
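Something like this (VB.NET):
' Assuming a 10 x 10 grid (zero-based indices 0 to 9)
Dim lineStart As Integer = -1
Dim lineEnd As Integer = 1
Dim colStart As Integer = -1
Dim colEnd As Integer = 1
If line = 0 Then lineStart = 0     ' no row above
If line = 9 Then lineEnd = 0       ' no row below
If column = 0 Then colStart = 0    ' no column to the left
If column = 9 Then colEnd = 0      ' no column to the right
For auxlinha As Integer = lineStart To lineEnd
    For auxcolumn As Integer = colStart To colEnd
        ' check mat(line + auxlinha, column + auxcolumn) here
    Next
Next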
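Personally, though, I wouldn't bother with the loops; they add little or nothing.
HasMineAbove = (line > 0) AndAlso (gamefield(line - 1, column) = MinePresentValue)
would be my approach: do each neighbor check in one expression. AndAlso short-circuits, so the array is only indexed when the row above exists.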
Not to mention the huge potential confusion when auxlinha and auxcolumn are both zero...
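The bounds-checking idea can be sketched compactly in Python. The grid layout and the use of -1 for a mine follow the question's `mat(line, column) = -1`; the function name `count_adjacent_mines` is hypothetical. The explicit `0 <= r` check matters in Python in particular, because a negative index would silently wrap to the other side of the grid instead of failing.

```python
def count_adjacent_mines(grid, row, col, mine=-1):
    """Count mines in the up-to-8 neighbors of (row, col), skipping any
    neighbor that would fall outside the grid. -1 marks a mine, as in mat()."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # the cell itself is not its own neighbor
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == mine:
                count += 1
    return count


grid = [[-1, 0, 0],
        [0, 0, 0],
        [0, 0, -1]]
print(count_adjacent_mines(grid, 1, 1))  # 2
print(count_adjacent_mines(grid, 0, 0))  # 0 (corner: out-of-range cells are skipped)
```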
I'm not sure exactly what your code is saying. It's a bit cryptic since you're using abbreviations and all-lowercase names. You might want to try camelCasing and spelling out the words more completely; IntelliSense is your friend. =)
But coding style aside, if you are trying to loop through a limited range of values, you can keep your values bounded with the modulus operator (Mod in VB.NET, % in C-style languages). For example, if you need to keep your values between 0 and 7 and you end up with a value of 12, taking it modulo 8 wraps it back into range as 4:
12 % 8 = 4
9 % 8 = 1
15 % 8 = 7
24 % 8 = 0
I realize this doesn't answer your specific question, but it's a handy technique you might find useful.