I'm working on a VBA script that is to work through an extensive list of email addresses and flag the ones that are suspected of being wrong.
I'd like to refine the routine by adding a function that would spot typos in common domain names such as gmail, hotmail, msn, skynet, etc. I'll have a list of these common display names in an array.
The function would check whether the input string looks similar to, but is not the same as, an element in the array, and return True (as a Boolean) if that is the case.
The idea is to spot erroneous entries such as: homtail, mns, slynet, hotmal, yahooo, etc.
Not looking for a script per se, looking for inspiration of how to tackle this problem...
A fuzzy comparison is what you need. There is code here that will compare two strings and give you a score from 0 to 1 depending on how close they are. It will be up to you to decide how close the score has to be before you do an automatic substitution.
Example results:
server   text          fuzzy score
-------  ------------  -----------
hotmail  hotmale       0.7619048
hotmail  hot           0.4285714
hotmail  notmail       0.8571429
hotmail  NotEvenClose  0.1944444
hotmail  hotmail       1
hotmail  yellow        0.0952381
hotmail  homtail       0.7142857
The source code has been released under the GNU Lesser GPL.
In case of link rot, here's the code:
Public Function Fuzzy(ByVal s1 As String, ByVal s2 As String) As Single
Dim i As Integer, j As Integer, k As Integer, d1 As Integer, d2 As Integer, p As Integer
Dim c As String, a1 As String, a2 As String, f As Single, o As Single, w As Single
'
' ******* INPUT STRINGS CLEANSING *******
'
s1 = UCase(s1) 'input strings are converted to uppercase
d1 = Len(s1)
j = 1
For i = 1 To d1
c = Mid(s1, i, 1)
Select Case c
Case "0" To "9", "A" To "Z" 'filter the allowable characters
a1 = a1 & c 'a1 is what remains from s1 after filtering
j = j + 1
End Select
Next
If j = 1 Then Exit Function 'if s1 is empty after filtering
d1 = j - 1
s2 = UCase(s2)
d2 = Len(s2)
j = 1
For i = 1 To d2
c = Mid(s2, i, 1)
Select Case c
Case "0" To "9", "A" To "Z"
a2 = a2 & c
j = j + 1
End Select
Next
If j = 1 Then Exit Function
d2 = j - 1
k = d1
If d2 < d1 Then 'to prevent doubling the code below s1 must be made the shortest string,
'so we swap the variables
k = d2
d2 = d1
d1 = k
s1 = a2
s2 = a1
a1 = s1
a2 = s2
Else
s1 = a1
s2 = a2
End If
If k = 1 Then 'degenerate case, where the shortest string is just one character
If InStr(1, s2, s1, vbBinaryCompare) > 0 Then
Fuzzy = 1 / d2
Else
Fuzzy = 0
End If
Else '******* MAIN LOGIC HERE *******
i = 1
f = 0
o = 0
Do 'count the identical characters in s1 and s2 ("frequency analysis")
p = InStr(1, s2, Mid(s1, i, 1), vbBinaryCompare)
'search the character at position i from s1 in s2
If p > 0 Then 'found a matching character, at position p in s2
f = f + 1 'increment the frequency counter
s2 = Left(s2, p - 1) & "~" & Mid(s2, p + 1)
'replace the found character with one outside the allowable list
'(I used tilde here), to prevent re-finding
Do 'check the order of characters
If i >= k Then Exit Do 'no more characters to search
If Mid(s2, p + 1, 1) = Mid(s1, i + 1, 1) Then
'test if the next character is the same in the two strings
f = f + 1 'increment the frequency counter
o = o + 1 'increment the order counter
i = i + 1
p = p + 1
Else
Exit Do
End If
Loop
End If
If i >= k Then Exit Do
i = i + 1
Loop
If o > 0 Then o = o + 1 'if we got at least one match, adjust the order counter
'because two characters are required to define "order"
finish:
w = 2 'Weight of characters order match against characters frequency match;
'feel free to experiment, to get best matching results with your data.
'If only frequency is important, you can get rid of the second Do...Loop
'to significantly accelerate the code.
'By altering a bit the code above and the equation below you may get rid
'of the frequency parameter, since the order counter increments only for
'identical characters which are in the same order.
'However, I usually keep both parameters, since they offer maximum flexibility
'with a variety of data, and both should be maintained for this project
Fuzzy = (w * o + f) / (w + 1) / d2
End If
End Function
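To turn the score into the True/False flag asked for in the question, one option is a small wrapper around Fuzzy. This is only a sketch: the name IsSuspectDomain, the knownDomains array and the 0.7 cutoff are assumptions made for illustration (0.7 was picked because homtail scores about 0.71 in the table above), and none of them come from the linked code.

```vba
' Hypothetical helper; assumes the Fuzzy function above is in the same project.
Public Function IsSuspectDomain(ByVal candidate As String, _
                                knownDomains As Variant, _
                                Optional ByVal cutoff As Single = 0.7) As Boolean
    Dim i As Long
    Dim nearMiss As Boolean

    For i = LBound(knownDomains) To UBound(knownDomains)
        ' An exact (case-insensitive) match is a valid domain, never a typo.
        If StrComp(candidate, knownDomains(i), vbTextCompare) = 0 Then
            IsSuspectDomain = False
            Exit Function
        End If
        ' Close to a known name without being equal to it.
        If Fuzzy(candidate, knownDomains(i)) >= cutoff Then nearMiss = True
    Next i

    IsSuspectDomain = nearMiss
End Function
```

Calling IsSuspectDomain("homtail", Array("gmail", "hotmail", "msn")) would flag the entry, while an exact "hotmail" would not. The cutoff needs tuning against real data, especially for short names such as msn.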
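What you want to measure is called the Hamming distance (the idea behind Hamming codes). Note that it is only defined for strings of equal length, so typos that add or drop a character need an edit distance instead -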
try this
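For reference, here is a minimal sketch of a distance-based check. It uses the Levenshtein edit distance rather than the Hamming distance, because Levenshtein also handles inserted or missing characters (hotmal, yahooo). The names EditDistance and IsLikelyTypo and the default limit of 2 edits are assumptions for illustration; a swap of two adjacent letters (homtail) counts as 2 edits under this measure.

```vba
Option Explicit

' Levenshtein distance: the minimum number of single-character insertions,
' deletions or substitutions needed to turn s1 into s2.
Public Function EditDistance(ByVal s1 As String, ByVal s2 As String) As Long
    Dim n As Long, m As Long, i As Long, j As Long, cost As Long
    Dim d() As Long

    n = Len(s1)
    m = Len(s2)
    ReDim d(0 To n, 0 To m)

    For i = 0 To n
        d(i, 0) = i                      ' delete i characters
    Next i
    For j = 0 To m
        d(0, j) = j                      ' insert j characters
    Next j

    For i = 1 To n
        For j = 1 To m
            If Mid$(s1, i, 1) = Mid$(s2, j, 1) Then cost = 0 Else cost = 1
            d(i, j) = d(i - 1, j) + 1                                   ' deletion
            If d(i, j - 1) + 1 < d(i, j) Then d(i, j) = d(i, j - 1) + 1 ' insertion
            If d(i - 1, j - 1) + cost < d(i, j) Then d(i, j) = d(i - 1, j - 1) + cost
        Next j
    Next i

    EditDistance = d(n, m)
End Function

' True when candidate is within maxEdits of a known name but equal to none of them.
Public Function IsLikelyTypo(ByVal candidate As String, knownNames As Variant, _
                             Optional ByVal maxEdits As Long = 2) As Boolean
    Dim i As Long, dist As Long, nearMiss As Boolean

    candidate = LCase$(candidate)
    For i = LBound(knownNames) To UBound(knownNames)
        dist = EditDistance(candidate, LCase$(knownNames(i)))
        If dist = 0 Then Exit Function   ' exact match: returns False
        If dist <= maxEdits Then nearMiss = True
    Next i
    IsLikelyTypo = nearMiss
End Function
```

A fixed limit of 2 is generous for three-letter names such as msn, so scaling maxEdits with the length of the known name may work better in practice.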
I am trying to write code that has multiple For loops and If blocks. I will explain the problem first. The dataset I have looks like the following in column 'AH':
0,0,0,0,1,1,2,2,2,2,2,2,1,1,1,0,0,0,0,0,2,2,2,2,2,0,0,..... where the number of 0s, 1s and 2s in a stretch is unknown. I am trying to find the number of cycles, where a cycle is defined as a stretch of at least 3 consecutive 0s followed later by at least 4 consecutive 2s. To do that, I wrote the following code:
Dim M As Single: Dim Count As Integer: Dim A As Integer: Dim B As Integer
M = 2: Count = 0: A = 3: B = 4
Dim temp As Integer: Dim temp1 As Integer: temp = 0
For L = M To 50
Sheets("Sheet1").Range("AJ" & M) = M
temp = 0
For L1 = L To L + A
temp = temp + Sheets("Sheet1").Range("AH" & L1)
Next L1
If temp = 0 Then
N = L + A
For N1 = N To 60
If Sheets("Sheet1").Range("AH" & N1) = 2 Then
temp1 = 0
For I1 = N1 To N1 + B
temp1 = temp1 + Sheets("Sheet1").Range("AH" & I1)
Next I1
If temp1 = 2 * B Then
flg = True
Exit For
End If
End If
Next N1
Count = Count + 1: M = I1
Sheets("Sheet1").Range("AJ2") = Count
If flg = True Then Exit For
End If
M = M + 1
Next L
Basically, what I am trying to do is find the first 0 and sum 3 consecutive values. If the sum is 0, I then search for a 2. When the first 2 is found, the code adds up the next 4 terms, and if the sum is equal to 2*4, I update the count and the code should start looking for 0 again. However, using 'Exit For' takes me out of all the loops, and if I leave the Exit out, the code keeps counting the same 2s multiple times. I am new to VBA and have been stuck on this problem for a long time. Any help will be greatly appreciated. Thank you in advance.
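One way to avoid the nested loops and the Exit For problem altogether is a single pass that tracks the length of the current run of equal values. The sketch below is an assumption about the intended rule, not a tested answer: a run of at least minZeros 0s "arms" the counter, and the next run of at least minTwos 2s completes one cycle, with any other values allowed in between. The name CountCycles and its parameters are hypothetical.

```vba
' Hypothetical single-pass cycle counter.
' values is a 1-D array of numbers (for example the contents of column AH).
Public Function CountCycles(values As Variant, _
                            ByVal minZeros As Long, _
                            ByVal minTwos As Long) As Long
    Dim i As Long
    Dim runLen As Long      ' length of the current run of identical values
    Dim prev As Long        ' value of the previous element
    Dim armed As Boolean    ' True once a long enough run of 0s has been seen

    For i = LBound(values) To UBound(values)
        If runLen > 0 And values(i) = prev Then
            runLen = runLen + 1
        Else
            runLen = 1
        End If
        prev = values(i)

        If prev = 0 And runLen >= minZeros Then
            armed = True
        ElseIf prev = 2 And armed And runLen >= minTwos Then
            CountCycles = CountCycles + 1
            armed = False   ' wait for the next run of 0s before counting again
        End If
    Next i
End Function
```

With the sample data from the question and minZeros = 3, minTwos = 4, this counts two cycles. To feed it from the sheet, a 1-D array can be built with something like Application.Transpose(Sheets("Sheet1").Range("AH2:AH60").Value).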
I have a set which has an unknown number of objects. I want to associate a label to each one of these objects. Instead of labeling each object with a number I want to label them with letters.
For example the first object would be labeled A the second B and so on.
When I get to Z, the next object would be labeled AA
AZ? then BA, BB, BC.
ZZ? then AAA, AAB, AAC and so on.
I'm working using Mapbasic (similar to VBA), but I can't seem to wrap my head around a dynamic solution. My solution assumes that there will be a max number of objects that the set may or may not exceed.
label = pos1 & pos2
Once pos2 reaches ASCII "Z" then pos1 will be "A" and pos2 will be "A". However, if there is another object after "ZZ" this will fail.
How do I overcome this static solution?
Basically what I needed was a Base 26 Counter. The function takes a parameter like "A" or "AAA" and determines the next letter in the sequence.
Function IncrementAlpha(ByVal alpha As String) As String
Dim N As Long
Dim num As Long
Dim str As String
Do While Len(alpha) > 0
num = num * 26 + (Asc(alpha) - Asc("A") + 1)
alpha = Mid$(alpha, 2) 'drop the first character and keep the rest
Loop
N = num + 1
Do While N > 0
str = Chr$(Asc("A") + (N - 1) Mod 26) & str
N = (N - 1) \ 26
Loop
IncrementAlpha = str
End Function
If we need to convert numbers to a "letter format" where:
1 = A
26 = Z
27 = AA
702 = ZZ
703 = AAA etc
...and it needs to be in Excel VBA, then we're in luck. Excel's columns are "numbered" the same way!
Function numToLetters(num As Integer) As String
numToLetters = Split(Cells(1, num).Address(, 0), "$")(0)
End Function
Pass this function a number between 1 and 16384 and it will return a string between A and XFD.
Edit:
I guess I misread; you're not using Excel. If you're using VBA you should still be able to do this with the help of a reference to an Excel Object Library.
This should get you going in terms of the logic. Haven't tested it completely, but you should be able to work from here.
Public Function GenerateLabel(ByVal Number As Long) As String
Const TOKENS As String = "ZABCDEFGHIJKLMNOPQRSTUVWXY"
Dim i As Long
Do While Number > 0
i = (Number Mod 26) + 1 'a remainder of 0 maps to "Z"
GenerateLabel = Mid(TOKENS, i, 1) & GenerateLabel 'prepend the next letter
Number = (Number - 1) \ 26 'move on to the next, more significant letter
Loop
End Function
I have a routing sequence for a set of machines on an assembly line. Each route has to go through the entire line (that is, if you only run the first and second machine, you still account for the distance from the second to the end of the line).
I have six different machines (720 possible combinations of machines) with fixed distances between each location on the line. The distance between the first and second machine is 100', the distance between second and third is 75', third and fourth is 75', fourth and fifth is 25', and fifth and sixth is 25'.
I have 4 different products that have to run down the line, and each of them has a fixed routing.
My problem is: how do I set up VBA code or a Solver model that will run through all possible combinations of the line setup and determine the optimal setup for this line? Any machine can be placed at any location, as long as it optimizes the result!
The four product routes are :
A - B - C - D - F
A - C - B - D - E - F
A - F - E - D - C - B - A - F
A - C - E - B - D - F
Running through all possible combinations - if you really need to do that - is a job for something like Heap's algorithm, although I prefer the plain changes method:
Sub Evaluate(Lineup() As String)
' dummy evaluation, just output the permutation
Dim OffCell As Long
For OffCell = LBound(Lineup, 1) To UBound(Lineup, 1)
ActiveCell.Offset(0, OffCell).Value = Lineup(OffCell)
Next OffCell
ActiveCell.Offset(1, 0).Activate
End Sub
Sub AllPerms(Lineup() As String)
' Lineup is a 1-D array indexed at 1
Dim LSize As Long
Dim Shift() As Long
Dim Tot As Long
Dim Idx As Long
Dim Level As Long
Dim Change As Long
Dim Offset As Long
Dim TempStr As String
LSize = UBound(Lineup)
ReDim Shift(LSize)
'count of permutations, set initial changes
Tot = 1
For Idx = 2 To LSize
Tot = Tot * Idx
Shift(Idx) = 1 - Idx
Next Idx
Shift(1) = 2 ' end condition
' go through permutations
For Idx = 1 To Tot
' check this one
Call Evaluate(Lineup)
' switch for the next
Level = LSize
Offset = 0
Change = Abs(Shift(Level))
Do While Change = 0 Or Change = Level
If Change = 0 Then Shift(Level) = 1: Offset = Offset + 1
If Change = Level Then Shift(Level) = 1 - Level
Level = Level - 1
Change = Abs(Shift(Level))
Loop
Shift(Level) = Shift(Level) + 1
Change = Change + Offset
TempStr = Lineup(Change)
Lineup(Change) = Lineup(Change + 1)
Lineup(Change + 1) = TempStr
Next Idx
End Sub
Sub ABCDEF_case()
Dim LU(6) As String
LU(1) = "A"
LU(2) = "B"
LU(3) = "C"
LU(4) = "D"
LU(5) = "E"
LU(6) = "F"
Call AllPerms(LU)
End Sub
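The Evaluate stub above only prints each permutation. A possible cost function to call from it instead is sketched below. It rests on assumptions that the question does not fully spell out: every product enters at the start of the line and leaves at the end, travel is measured as the absolute distance between slots, and the slot positions follow from the stated gaps (100', 75', 75', 25', 25'). LineupCost and SlotOf are hypothetical names, and routes are passed as letter strings such as "ABCDF".

```vba
' Hypothetical cost function: total feet travelled by all routes for one lineup.
' Lineup(1..6) holds the machine letter placed in each slot of the line.
Public Function LineupCost(Lineup() As String, routes As Variant) As Long
    Dim slotPos(1 To 6) As Long      ' feet from the start of the line
    Dim r As Long, i As Long
    Dim here As Long, nxt As Long

    slotPos(1) = 0
    slotPos(2) = 100
    slotPos(3) = 175
    slotPos(4) = 250
    slotPos(5) = 275
    slotPos(6) = 300

    For r = LBound(routes) To UBound(routes)
        here = slotPos(1)            ' assumption: product enters at the start
        For i = 1 To Len(routes(r))
            nxt = slotPos(SlotOf(Mid$(routes(r), i, 1), Lineup))
            LineupCost = LineupCost + Abs(nxt - here)
            here = nxt
        Next i
        ' assumption: product always travels on to the end of the line
        LineupCost = LineupCost + (slotPos(6) - here)
    Next r
End Function

' Returns the slot number (1..6) that currently holds the given machine.
Private Function SlotOf(ByVal machine As String, Lineup() As String) As Long
    Dim s As Long
    For s = 1 To 6
        If Lineup(s) = machine Then SlotOf = s: Exit Function
    Next s
    Err.Raise vbObjectError + 1, , "Machine " & machine & " is not in the lineup"
End Function
```

Inside Evaluate, the result of LineupCost(Lineup, Array("ABCDF", "ACBDEF", "AFEDCBAF", "ACEBDF")) could be compared against the best cost seen so far, keeping a copy of the lineup whenever it improves.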
Two days of continual failure. I am using a barcode system which has a barcode scanner which scans a barcode of alpha-numeric text and places it into an ActiveX textbox. It enters the text one letter at a time, and upon the completion of the entire barcode, it matches up to a Case selection, which then deletes the text in the box to get ready for the next scan.
The issue I am facing is inside the textbox. For whatever reason, the text goes into the textbox and occasionally (anywhere from once an hour to not at all in 8 hours) the case does not complete. The text inside the textbox exactly matches one of the cases, yet it is not counted and stays inside the box. From that point on, any future scans are appended to the end of the text already in the box.
Below is a sample of the variables, a case, and one of the events occurring based on case selection.
Variables
Private Sub TextBox1_Change()
Dim ws As Worksheet, v, n, t, b, c, e, f, h, j, k, i1, i2, i3, i4
Set ws = Worksheets("Sheet1")
v = TextBox1.Value
n = 0
t = 0
b = 0
c = 0
e = 0
f = 0
h = 0
j = 0
k = 0
i1 = 0
i2 = 0
i3 = 0
i4 = 0
Case
Select Case v
Case "2 in x 16 ft R -1": n = 9
t = 1
b = 10
c = 1
e = 11
f = 6
g = "2 in x 16 ft"
h = 40
j = 0.296
k = 1
Stuff that is done based on case type
'n = Sets the column reference for waste - not used?
't = Sets the cutting station column to be used (1,2,3) for the sq yards, row, and column of last scanned item for each station
'b = Sets the row reference for adding cut rolls waste + regular row reference for cut rolls
'c = Sets the column reference for adding cut rolls waste + regular column reference for cut rolls
'e = Sets the column reference for taking 1 master roll out
'f = Sets the row reference for taking 1 master roll out
'g = name of the item being used for the time stamp
'h = Number of rolls coming out of the master roll
'j = The amount of Sq yards in the cut roll (to be used for waste)
'k = Case Selection
'i1 = Count for Cutting Station 1 timestamp, row reference
'i2 = Count for Cutting Station 2 timestamp, row reference
'i3 = Count for Cutting Station 3 timestamp, row reference
'i4 = Count for Cutting Station 1 timestamp, row reference - not used in this worksheet
If k = 1 And t = 1 Then
'Cutter 1 items
ws.Cells(1, t) = b
ws.Cells(2, t) = c
ws.Cells(3, t) = j
ws.Cells(4, t) = b
ws.Cells(5, t) = c
ws.Cells(6, t) = f
ws.Cells(7, t) = h
ws.Cells(b, c) = ws.Cells(b, c) + h
' adding different number based on case
ws.Cells(f, e) = ws.Cells(f, e) - 1
' always subtracts 1 from certain range based on case
i1 = ws.Cells(1, 30)
Cells(i1, 19).Value = Format(Now, "mm/dd/yyyy AM/PM h:mm:ss")
Cells(i1, 20).Value = g
TextBox1.Activate
TextBox1.Value = ""
Remember that the text enters one character at a time until the entire barcode's information has been passed into the ActiveX textbox.
I can set a max length, but once the max length is reached the text stays in the textbox. If I set the textbox to "", the next character in the barcode starts a new entry and the append issue continues.
Is there a way to keep the case selection from starting on the entry of each single character? Is there a way to have the textbox delete the extra information? If I set it to delete anything that does not match a case, it will delete everything entered, since the scanner puts in one character at a time.
Best regards,
Ford
This function goes through all integers and picks out binary values with only five ones and writes them to the spreadsheet.
To run this For x = 1 To 134217728 would take 2.5 days!!!! Help!
How could I speed this up?
Function D2B(ByVal n As Long) As String
n = Abs(n)
D2B = ""
Do While n > 0
If n = (n \ 2) * 2 Then
D2B = "0" & D2B
Else
D2B = "1" & D2B
n = n - 1
End If
n = n / 2
Loop
End Function
Sub mixtures()
Dim x As Long
Dim y As Integer
Dim fill As String
Dim mask As String
Dim RowOffset As Integer
Dim t As Date
t = Now
fill = ""
For x = 1 To 134217728
mask = Right(fill & CStr(D2B(x)), Len(fill & CStr(D2B(x))))
Debug.Print mask
If x > 100000 Then Exit For
If Len(mask) - Len(WorksheetFunction.Substitute(mask, "1", "")) = 5 Then _
RowOffset = RowOffset + 1
For y = 1 To Len(mask)
If Len(mask) - Len(WorksheetFunction.Substitute(mask, "1", "")) = 5 Then _
Range("mix").Offset(RowOffset).Cells(y) = Mid(mask, y, 1)
Next
Next
Debug.Print DateDiff("s", Now, t)
End Sub
At first glance, I think the problem is that you work cell by cell, which causes many read and write accesses.
You should work range by range instead, like
vArr = Range("A1:C1000").Value
' vArr is an array now, so do the work on it efficiently here
Range("A1:C1000").Value = vArr
You want to find all 28-bit numbers with exactly five 1s.
There are 28*27*26*25*24/5/4/3/2 = 98280 such numbers.
The following code took ~10 seconds on my PC:
lineno = 1
For b1 = 0 To 27
For b2 = b1 + 1 To 27
For b3 = b2 + 1 To 27
For b4 = b3 + 1 To 27
For b5 = b4 + 1 To 27
Cells(lineno, 1) = 2 ^ b1 + 2 ^ b2 + 2 ^ b3 + 2 ^ b4 + 2 ^ b5
lineno = lineno + 1
Next
Next
Next
Next
Next
mask = Right(fill & CStr(D2B(x)), Len(fill & CStr(D2B(x))))
The above line of code does the same thing (CStr(D2B(x))) twice.
Store the result of CStr(D2B(x)) in a variable & use that variable in the above line of code.
I've got 2 suggestions:
Get rid of the substitution command by counting the ones/zeroes in D2B and return an empty string if the count does not equal 5
Write these pre-filtered bitstrings to an array first and copy the array directly to the cells when finished.
Something like
ws.Range(ws.cells(1, 1), ws.cells(UBound(dstArr, 1) + 1, UBound(dstArr, 2) + 1)) = dstArr
The array-copy-trick greatly improves performance!
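The two ideas in this thread (enumerating the bit positions directly, and writing an array to the sheet in one step) can be combined. The following is a sketch under assumptions: FiveBitValues and WriteFiveBitValues are hypothetical names, the output goes to column A of the active sheet, and the values are written as decimal numbers rather than as 0/1 strings.

```vba
' Hypothetical helper: every value below 2^bitCount with exactly five 1 bits,
' returned as an N x 1 array that can be written to a column in one step.
Public Function FiveBitValues(ByVal bitCount As Long) As Variant
    Dim result() As Long
    Dim total As Long, r As Long
    Dim b1 As Long, b2 As Long, b3 As Long, b4 As Long, b5 As Long

    ' Outside this range there is nothing to choose, or a Long would overflow.
    If bitCount < 5 Or bitCount > 30 Then Exit Function   ' returns Empty

    ' Number of ways to choose 5 bit positions out of bitCount.
    total = bitCount * (bitCount - 1) * (bitCount - 2)
    total = total * (bitCount - 3) * (bitCount - 4) \ 120
    ReDim result(1 To total, 1 To 1)

    For b1 = 0 To bitCount - 1
        For b2 = b1 + 1 To bitCount - 1
            For b3 = b2 + 1 To bitCount - 1
                For b4 = b3 + 1 To bitCount - 1
                    For b5 = b4 + 1 To bitCount - 1
                        r = r + 1
                        result(r, 1) = 2 ^ b1 + 2 ^ b2 + 2 ^ b3 + 2 ^ b4 + 2 ^ b5
                    Next b5
                Next b4
            Next b3
        Next b2
    Next b1

    FiveBitValues = result
End Function

' Writes all values to column A with a single assignment.
Public Sub WriteFiveBitValues()
    Dim v As Variant
    v = FiveBitValues(28)
    Range("A1").Resize(UBound(v, 1), 1).Value = v
End Sub
```

Because the sheet is touched only once, most of the remaining run time is the loop itself, which visits only the 98280 valid combinations instead of every integer.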