Generate list from strings and numbers vba - vba

This question is based on this puzzle that I am trying to do in vba: https://codegolf.stackexchange.com/questions/166765/fun-with-strings-and-numbers
Basically we have strings in col A and numbers in column B and in column C we have to generate a list so that:
The total count of any string should be exactly equal to its
corresponding number in the input data.
No string should be repeated adjacently in the sequence, and every
string should appear in the output list.
The selection of the next string should be done randomly as long as
they don't break above two rules. Each solution should have a
non-zero probability of being chosen.
If no combination is possible, the output should be just 0.
I tried this but I don't how to solve the problem so that it doesn't break rule #2. Any input would be appreciated thanks.
Sub generateList()
Application.ScreenUpdating = False
Dim fI As Long, totTimes As Long, i As Long, j As Long, fO As Long, tryCount As Long
Dim myArr()
Dim randNum As Long
OUT.Range("A1:A" & OUT.Rows.Count).Clear
fO = 1
With DATA
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
.Sort.SortFields.Clear
.Sort.SortFields.Add Key:=Range("B2:B" & fI), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With DATA.Sort
.SetRange DATA.Range("A1:B" & fI)
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
totTimes = 0: j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then j = j + 1
Next i
If j < 1 Then MsgBox "No valid data present. Make sure column B has numbers and column A some string.": Exit Sub
ReDim Preserve myArr(1 To j, 1 To 2)
j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then
totTimes = totTimes + CLng(.Range("B" & i).Value)
j = j + 1
myArr(j, 1) = .Range("A" & i)
myArr(j, 2) = .Range("B" & i)
End If
Next i
Do While totTimes > 0
randNum = WorksheetFunction.RandBetween(1, j)
If myArr(randNum, 2) > 0 Then
totTimes = totTimes - 1
OUT.Range("A" & fO) = myArr(randNum, 1)
myArr(randNum, 2) = myArr(randNum, 2) - 1
fO = fO + 1
End If
tryAgain:
Loop
End With
Application.ScreenUpdating = True
OUT.Activate
MsgBox "Process Completed"
End Sub

I have a solution (that isn't based on yours, unfortunately) that gives correct results... some of the time. I think I know why it falls short, I just have given up on fixing it.
It's also terrible for golfing, since it's a rather large amount of code, and it's an unholy mishmash of different approaches and implementation ideas that I made up as I went (and I never cleaned it up properly)... but maybe some of this will inspire you to get further.
As per rule #3, I select each letter at random. It was hit and miss using only that approach so I moved to weighted probabilities, which is what the code further down uses - and it seems to work somewhat well. Occasionally there will be 1 letter too many for one of the elements, or there will be adjacent equal elements, so it doesn't actually solve the puzzle all the time.
Ideas to remedy this problem:
Adjust the probability weights based on the frequency each letter has already been used. If you set dbg to true, you'll see that I implemented some calculations with that in mind, but never got around to figuring out how to actually adjust the weights themselves.
Hardcode a check or two for how many letters have been used early in the result, for the largest element group
Change the rand section to make more than 1 pass (maybe best out of 3) - the weights are sorted by "size", so doing 3 (or n) passes should increasingly favor the larger element groups
Maybe a combination of the first and the last suggestion.
Here's the code:
Sub NonRepeatSort(v() As String)
Dim lElementCount As Long
Dim lElement As Element ' Largest
Dim tElement As Long ' Total element count
Dim tEleGroups As Long ' Number of groups of elements
Dim tEle As Element
Dim e As Element
Dim EleCol As New Collection
Dim dbg As Boolean
dbg = False
Dim s As String, res As String, previousRes As String, inputString As String
Dim lCounter As Long
For i = 1 To UBound(v)
' Check if element already exists
On Error Resume Next
s = ""
s = EleCol.Item(v(i, 1))
On Error GoTo 0
' If not, create new
If s = "" Then
Set tEle = New Element
With tEle
.SetName = v(i, 1)
.SetTotal = CLng(v(i, 2))
End With
EleCol.Add Item:=tEle, Key:=tEle.Name
End If
Next i
For Each e In EleCol
' Find the largest element
If e.Total > lElementCount Then
lElementCount = e.Total
Set lElement = e
End If
' Count total elements
tElement = tElement + e.Total
' And groups
tEleGroups = tEleGroups + 1
' Generate inputstring
For k = 1 To e.Total
inputString = inputString + e.Name
Next k
Next e
' If the largest element is larger than the total remaining elements, we'll break rule 4
If lElement.Total - (tElement - lElement.Total) > 1 Then
Debug.Print "0"
GoTo EndForSomeReason
End If
' Bubble sort - lowest to highest
' Adapted from https://stackoverflow.com/a/3588073/4604845
Dim tmpE As Element
For x = 1 To EleCol.Count - 1
For y = 1 To EleCol.Count
If EleCol.Item(x).Total > EleCol.Item(y).Total Then
Set tmpE = EleCol.Item(y)
EleCol.Remove y
EleCol.Add tmpE, tmpE.Name, x
End If
Next y
Next x
' Weighted probability array
Dim pArr() As Variant, tmpProb As Double
ReDim Preserve pArr(1 To 2, 1 To EleCol.Count)
For u = 1 To UBound(pArr, 2)
Set pArr(2, u) = EleCol.Item(u)
tmpProb = tmpProb + pArr(2, u).Freq(tElement)
pArr(1, u) = tmpProb
Next u
' The meat of it
Dim r As Long, lBool As Boolean, sLen As Long, o As Double, t As Long
For j = 1 To tElement
Do
' Reset loop control
lBool = False
' Generate a random number between 1 and 100 _
to decide which group we pick a letter from
r = Rand1To100
For i = 1 To UBound(pArr, 2)
If r <= pArr(1, i) And Not r > pArr(1, i) Then
If dbg Then Debug.Print "Probability match: " & pArr(2, t).Name
t = i
Exit For
End If
Next i
Set tEle = EleCol.Item(t)
If dbg Then Debug.Print "Name: " & tEle.Name
' If the random group is different from the previous result, proceed
If tEle.Name <> previousRes Then
lBool = True
Else
If dbg Then Debug.Print "This was also the previous result - skipping"
End If
' If the use-frequency for the random group is lower than _
how many times it appears in the string, proceed
If lBool Then
o = Round((tEle.Used / tElement) * 100, 5)
If dbg Then Debug.Print "Freq: " & tEle.Freq(tElement)
If dbg Then Debug.Print "Used: " & tEle.UsedFreqI()
If dbg Then Debug.Print "res%: " & Round((Len(res) / tElement) * 100, 1)
If dbg Then Debug.Print "o : " & o
' check use-frequency against modeled frequency
If o < tEle.Freq(tElement) Then
If dbg Then Debug.Print "Proceed with " & tEle.Name
lBool = True
Else
lBool = False
End If
End If
If dbg Then Debug.Print "----------"
lCounter = lCounter + 1
Loop While (Not lBool And lCounter < 1000)
tEle.IncrementUsed
res = res + tEle.Name
previousRes = tEle.Name
Next j
' Generate results
Debug.Print "INPUT : " & inputString
Debug.Print "RESULT: " & res
EndForSomeReason:
End Sub
Function Rand1To100() As Long
Dim r As Long
Randomize
r = ((100 - 1) * Rnd + 1)
r = Round(r, 0)
Rand1To100 = r
End Function
Private Sub TestSort()
Dim v(1 To 4, 1 To 2) As String
v(1, 1) = "A"
v(1, 2) = "6"
v(2, 1) = "B"
v(2, 2) = "2"
v(3, 1) = "C"
v(3, 2) = "2"
v(4, 1) = "D"
v(4, 2) = "4"
Call NonRepeatSort(v)
End Sub
And you'll need this class module:
' * Class module named Element
Private pName As String
Private pTotal As Long
Private pUsed As Long
Private FrequencyCoefficient As Long ' Obsolete?
' Name
Public Property Get Name() As String
Name = pName
End Property
Public Property Let SetName(s As String)
pName = s
End Property
' Total
Public Property Get Total() As Long
Total = pTotal
End Property
Public Property Let SetTotal(t As Long)
pTotal = t
End Property
' Used
Public Property Get Used() As Long
Used = pUsed
End Property
Public Sub IncrementUsed()
pUsed = pUsed + 1
End Sub
' Freq coefficient
Public Property Get Freq(f As Long) As Double
' Where f is the total number of elements
'Freq = FrequencyCoefficient
Freq = Round((Me.Total / f) * 100, 5)
End Property
Private Property Let SetFreq(f As Long)
' Obsolete?
' Where f is the total number of elements
FrequencyCoefficient = Round((Me.Total / f) * 100)
End Property
' Used freq - internal
Public Property Get UsedFreqI() As Long
If Me.Used > 0 Then
UsedFreqI = Round((Me.Used / Me.Total) * 100)
'Debug.Print "UF: " & UsedFreqI
Else
UsedFreqI = 0
End If
End Property
' Used freq - external
Public Property Get UsedFreqE(f As Long) As Long
If Me.Used > 0 Then
UsedFreq = Round((Me.Used / f) * 100)
Else
UsedFreq = 0
End If
End Property

Related

Search in a two-dimensional array

I'm trying to find the values in different points of the array. When I run the code, it always goes to The value doesn't exists, also I do not know how to count the values that are same r.
r = 0
c = txtbbus.Text
For i = 0 To n - 1
For j = 0 To n - 1
If a(i, j) = c Then
txtbres.Text = "The value exists " & r & " and it's in the position (" & i & ", " & j & ") "
Else
txtbres.Text = "The value doesn't exists"
End If
Next j
Next i
And this is how I initialize a:
txtbmatriz.Text = ""
For i = 0 To n - 1
For j = 0 To n - 1
a(i, j) = CInt((100 * Rnd()) + 1)
txtbmatriz.Text += a(i, j) & " "
m += a(i, j)
l += 1
Next j
txtbmatriz.Text += vbCrLf
Next i
The problem is almost certainly that you don't break out of the loop when you find a match. Your code will only ever show you the result of the last element in the array because you always keep searching to the last element. Once you find a match, there's no point to looking further and, in fact, doing so is detrimental. Once you find a match, stop looking.
Finding a single/first match:
Dim rng As New Random
Dim matrix = New Integer(9, 9) {}
For i = 0 To matrix.GetUpperBound(0)
For j = 0 To matrix.GetUpperBound(1)
matrix(i, j) = rng.Next(1, 101)
Next
Next
Dim target = rng.Next(1, 101)
Dim message As String
For i = 0 To matrix.GetUpperBound(0)
For j = 0 To matrix.GetUpperBound(1)
If matrix(i, j) = target Then
message = $"{target} found at ({i},{j})"
Exit For
End If
Next
If message IsNot Nothing Then
Exit For
End If
Next
Console.WriteLine(If(message, $"{target} not found"))
Finding all matches:
Dim rng As New Random
Dim matrix = New Integer(9, 9) {}
For i = 0 To matrix.GetUpperBound(0)
For j = 0 To matrix.GetUpperBound(1)
matrix(i, j) = rng.Next(1, 101)
Next
Next
Dim target = rng.Next(1, 101)
Dim matches As New List(Of String)
For i = 0 To matrix.GetUpperBound(0)
For j = 0 To matrix.GetUpperBound(1)
If matrix(i, j) = target Then
matches.Add($"({i},{j})")
End If
Next
Next
Console.WriteLine(If(matches.Any(),
$"{target} found at {String.Join(", ", matches)}",
$"{target} not found"))
Try this:
r = 0
c = txtbbus.Text
Dim i As Integer
Dim j As Integer
Dim FoundMatch As Boolean = False
For i = 0 To n - 1
For j = 0 To n - 1
If a(i, j) = c Then
FoundMatch = True
Exit For
End If
Next j
If FoundMatch = True Then
Exit For
End If
Next i
If FoundMatch = True Then
txtbres.Text = "The value exists " & r & " and it's in the position (" & i & ", " & j & ") "
Else
txtbres.Text = "The value doesn't exists"
End If
I'm going to assume c = txtbbus.Text is from some form input. Meaning a string. For the equality check you'd be testing against an Int type. Try casting the input from txtbbus.Text as an integer. Also, like the other poster said breaking from the loop on finding your match would also be a good decision.

VBA Reference Designators group by Prefix, using a filter but running into problem

Here is my code. I am taking the following designators and shorthanding grouping them to AR2-AR4,AR15,AT3-AT4,C68,C76,C316,C319,FL14-FL18,J1-J6,L2-5,etc. This is all working good except when the filter applies "L" in which it returns FL14,FL15,FL16,FL17,FL8,L2,L3,L4,L5. I need a way to do an exact character match or something.
Sub FormatAsRanges()
Dim Lne As String, arr, s
Dim n As Long, v As Long, prev As Long, inRange As Boolean
Dim test As String
Dim x As Variant
Dim filterarray As Variant
inRange = False
Lne = "AR15,AR2,AR3,AR4,AT3,AT4,C316,C319,C68,C76,FL14,FL15,FL16,FL17,FL18,FL6,J1,J2,J3,J4,J5,J6,L2,L3,L4,L5,T4,T5,T6,U38"
arr = Split(Lne, ",") 'Break apart references into array items
x = Prefix(arr) 'Get the Prefix's (AR,AT,C,FL,J,L,T,U)
x = Split(x, ",") ' Split them in an array
For j = 0 To UBound(x)
inRange = False 'Initialize to False
arr = Split(Lne, ",") ' Redifine arr since it is being filtered and use in the j loop for each prefix
filterarray = Filter(arr, x(j)) ' Apply filter
For i = 0 To UBound(filterarray)
filterarray(i) = Replace(filterarray(i), x(j), "")
Next i
arr = ArraySort(filterarray)
prev = -999 'dummy value
For n = LBound(filterarray) To UBound(filterarray)
v = CLng(filterarray(n))
If v - prev = 1 Then 'starting or continuing a range?
inRange = True 'wait until range ends before adding anything
Else
If inRange Then 'ending a range ?
s = s & "-" & x(j) & prev 'close out current range with previous item
inRange = False
End If
s = s & IIf(Len(s) > 0, ",", "") & x(j) & v 'add the current item
End If
prev = v
Next n
If inRange Then s = s & "-" & x(j) & prev 'close out last item if in a range
Debug.Print s
s = Empty
filterarray = Empty
Next j
End Sub
Function ArraySort(MyArray As Variant)
Dim First As Long, last As Long
Dim i As Long, j As Long, Temp
First = LBound(MyArray)
last = UBound(MyArray)
For i = First To last - 1
For j = i + 1 To last
If CLng(MyArray(i)) > CLng(MyArray(j)) Then
Temp = MyArray(j)
MyArray(j) = MyArray(i)
MyArray(i) = Temp
End If
Next j
Next i
ArraySort = MyArray
End Function
'get the character prefix (up to the first digit)
Public Function Prefix(a As Variant)
Dim rv As String, c As String, i As Long, j As Long, k As Integer, Prf As String
Dim flt(10) As String
Prf = "*" 'Initialize string
k = 0 'initialize
For j = 0 To UBound(a)
If InStr(a(j), Prf) Then
'Debug.Print "Yes"
Else
Prf = Empty
For i = 0 To Len(a(j))
c = Mid(a(j), i + 1, 1)
If c Like "#" Then
Exit For
Else
rv = rv & c
End If
Next i
Prf = rv
flt(k) = Prf
k = k + 1
rv = Empty
End If
Next j
For l = 0 To UBound(flt) 'Output as string so to define an array that is the correct size in the main program
If flt(l) Like "?" Then
rtn = rtn + flt(l) + ","
ElseIf flt(l) Like "??" Then
rtn = rtn + flt(l) + ","
ElseIf flt(l) Like "???" Then
rtn = rtn + flt(l) + ","
End If
Next l
rtn = Left(rtn, Len(rtn) - 1)
Prefix = rtn
End Function
You can move more of the code into separate methods:
Sub Tester()
Dim Lne As String, arr, allPrefixes, arrFilt, arrSorted, s, prefix
Lne = "AR15,AR2,AR3,AR4,AT3,AT4,C316,C319,C68,C76,FL14,FL15," & _
"FL16,FL17,FL18,FL6,J1,J2,J3,J4,J5,J6,L2,L3,L4,L5,T4,T5,T6,U38"
arr = Split(Lne, ",") 'split to an array
allPrefixes = UniquePrefixes(arr) 'All unique character prefixes
Debug.Print "All prefixes: " & Join(allPrefixes, ",")
'process each prefix in turn
For Each prefix In allPrefixes
arrFilt = FilterPrefixNumbers(arr, prefix) 'items for this prefix (numbers only)
Debug.Print , "'" & prefix & "' items:", Join(arrFilt, ",")
arrSorted = ArraySort(arrFilt) 'numeric parts, sorted ascending
Debug.Print , "Sorted:", Join(arrSorted, ",")
s = s & iif(s<>"", ",", "") & FormatAsRanges(arrSorted, prefix)
'Debug.Print FormatAsRanges(arrSorted, prefix)
Next prefix
Debug.Print s 'the whole thing
End Sub
Function FormatAsRanges(arr, prefix) As String
Dim s As String, n As Long, v As Long, prev As Long, inRange As Boolean
prev = -999 'dummy value
For n = LBound(arr) To UBound(arr)
v = CLng(arr(n))
If v - prev = 1 Then 'starting or continuing a range?
inRange = True 'wait until range ends before adding anything
Else
If inRange Then 'ending a range ?
s = s & "-" & prefix & prev 'close out current range with previous item
inRange = False
End If
s = s & IIf(Len(s) > 0, ",", "") & prefix & v 'add the current item
End If
prev = v
Next n
If inRange Then s = s & "-" & prefix & prev 'close out last item if in a range
FormatAsRanges = s
End Function
Function ArraySort(MyArray As Variant)
Dim First As Long, last As Long
Dim i As Long, j As Long, Temp
First = LBound(MyArray)
last = UBound(MyArray)
For i = First To last - 1
For j = i + 1 To last
If CLng(MyArray(i)) > CLng(MyArray(j)) Then
Temp = MyArray(j)
MyArray(j) = MyArray(i)
MyArray(i) = Temp
End If
Next j
Next i
ArraySort = MyArray
End Function
'return an array *of numbers* from all items in "arr" with the given prefix
Function FilterPrefixNumbers(arr, prefix)
Dim rv(), e, n As Long
ReDim rv(LBound(arr) To UBound(arr))
n = LBound(arr)
For Each e In arr
If GetPrefix(CStr(e)) = prefix Then
rv(n) = Replace(e, prefix, "") 'return just the numeric parts...
n = n + 1
End If
Next e
ReDim Preserve rv(LBound(arr) To n - 1) 'shrink to remove any empty slots
FilterPrefixNumbers = rv
End Function
'all unique character prefixes
Function UniquePrefixes(arr)
Dim dict, e
Set dict = CreateObject("scripting.dictionary")
For Each e In arr
dict(GetPrefix(CStr(e))) = True
Next e
UniquePrefixes = dict.keys
End Function
'get the character prefix (all non-digit characters preceding the first digit)
Function GetPrefix(v As String) As String
Dim rv As String, c As String, i As Long
For i = 1 To Len(v)
c = Mid(v, i, 1)
If c Like "#" Then
Exit For
Else
rv = rv & c
End If
Next i
GetPrefix = rv
End Function
Eliminate Filter() and Replace() functions. Given that input data is already sorted alphabetically by prefix, following revised procedure works:
Sub FormatAsRanges()
Dim Lne As String, arr, s
Dim n As Long, v As Long, prev As Long
Dim inRange As Boolean
Dim j As Integer, i As Integer
Dim x As Variant
Dim filterarray As Variant
Lne = "AR15,AR2,AR3,AR4,AT3,AT4,C316,C319,C68,C76,FL14,FL15,FL16,FL17,FL18,FL6,J1,J2,J3,J4,J5,J6,L2,L3,L4,L5,T4,T5,T6,U38"
arr = Split(Lne, ",") 'Break apart references into array items
x = Split(Prefix(arr), ",") 'Get the Prefix's (AR,AT,C,FL,J,L,T,U)
For j = 0 To UBound(x)
inRange = False 'Initialize to False
Do While arr(i) Like x(j) & "*" And i <= UBound(arr)
If arr(i) Like x(j) & "*" Then
s = s & Mid(arr(i), Len(x(j)) + 1) & ","
If i = UBound(arr) Then
Exit Do
Else
i = i + 1
End If
End If
Loop
If Right(s, 1) = "," Then s = Left(s, Len(s) - 1)
filterarray = ArraySort(Split(s, ","))
prev = -999 'dummy value
s = ""
For n = LBound(filterarray) To UBound(filterarray)
v = CLng(filterarray(n))
If v - prev = 1 Then 'starting or continuing a range?
inRange = True 'wait until range ends before adding anything
Else
If inRange Then 'ending a range ?
s = s & "-" & x(j) & prev 'close out current range with previous item
inRange = False
End If
s = s & IIf(Len(s) > 0, ",", "") & x(j) & v 'add the current item
End If
prev = v
Next n
If inRange Then s = s & "-" & x(j) & prev 'close out last item if in a range
Debug.Print s
s = Empty
filterarray = Empty
Next j
End Sub

Excel macro to sort data by word count in one column and expanding to other data columns

I have a data worksheet of 11 columns and over 6000 rows. I need to sort column F by the number of words in each cell, smallest to largest, expanding the sort to the entire data in 11 columns.
Can this be achieved? Appreciating any assistance in this matter.
The simplest solution would be to use worksheet formula as suggested in Microsoft support artice kb213889. If you are prepared to assume the deliminator is a ' ' for example and that the column will only contain valid words, in row L you could add a sort-index column with the following formula:
=IF(F1=TRIM(""),-1,LEN(TRIM(F1))-LEN(SUBSTITUTE(TRIM(F1)," ","")))
There are several options for sorting on the basis of the indexed column e.g. VBA (e.g. Steve Bullen's QuickSort()) or using standard Excel sorting functionality.
If you insist on doing the entire calculation in VBA a possible solution would be to read the entire array into a variant and append an addition column onto the array and again use the QuickSort() algorithm above to sort the array. A simple function to count the number of words could be:
'count words in input string - assume only words passed in
Public Function CountWords(vInput As Variant, Optional sDelim As String = " ") As Integer
Dim iWordCount as integer
Dim sString as string
sString = Trim$(vInput)
If Len(sString) = 0 Then
iWordCount = 0
ElseIf InStr(1, sString, sDelim) = 0 Then
iWordCount = 1
Else
'return array 0 based
iWordCount = UBound(VBA.Split(sString, sDelim)) + 1
End If
CountWords = iWordCount
End
below variant based on assumption that your data table in range "A:K" (as you mentioned 11 columns). Replace "Sheet1" to the sheet name with your data
Sub sort_by_words_count()
Dim Cl As Range, S() As String, i&
Application.ScreenUpdating = 0
With ThisWorkbook.Worksheets("Sheet1")
i = .Cells.Find("*", , , , xlByRows, xlPrevious).Row
For Each Cl In .Range("F2:F" & i)
S = Split(WorksheetFunction.Trim(Cl.Value))
Cl.Value = Right("0000" & UBound(S()) + 1, 4) & "|" & Cl.Value
Next Cl
If .AutoFilterMode Then .AutoFilterMode = False
.Range("A1:K" & i).AutoFilter
.AutoFilter.Sort.SortFields.Add _
Key:=Range("F1:F" & i), _
SortOn:=xlSortOnValues, _
Order:=xlAscending, _
DataOption:=xlSortNormal
With .AutoFilter.Sort
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
For Each Cl In .Range("F2:F" & i)
Cl.Value = Split(Cl.Value, "|")(1)
Next Cl
End With
Application.ScreenUpdating = 1
End Sub
sheet before sort
sheet after sort
Assuming names are in column B, this macro puts all cells including both "Jon" and "Smith" into an array. Another array is created to show how many words are in each array element. The sort2 sub is then called to sort myArr by wordCountarr
I found Sort2 sub HERE , it was written by member Gary's Student and appears to work flawlessly.
Sub create2arr()
Dim myArr() As Variant, name1 As String, name2 As String, firstMarker As Boolean, myArrayCounter As Long, myArray2Counter As Long
Dim splitArr() As String, wordCountArr() As Variant
name1 = "Jon"
name2 = "Smith"
ReDim myArr(1 To 1)
ReDim myArr2(1 To 1)
ReDim wordCountArr(1 To 1)
myArrayCounter = 1
myArray2Counter = 1
For I = 1 To 3
splitArr = Split(Sheet6.Range("B" & I))
For J = LBound(splitArr) To UBound(splitArr)
If splitArr(J) = name1 Or splitArr(J) = name2 Then
If firstMarker = True Then
myArr(myArrayCounter) = Sheet6.Range("B" & I)
wordCountArr(myArrayCounter) = UBound(splitArr) + 1
myArrayCounter = myArrayCounter + 1
ReDim Preserve myArr(1 To myArrayCounter)
ReDim Preserve wordCountArr(1 To myArrayCounter)
firstMarker = False
Else
firstMarker = True
End If
End If
Next J
Next I
For I = 1 To UBound(myArr)
Debug.Print myArr(I)
Next I
Call sort2(wordCountArr, myArr)
For I = 1 To UBound(myArr)
Debug.Print myArr(I)
Next I
End Sub
Sub sort2(key() As Variant, other() As Variant)
Dim I As Long, J As Long, Low As Long
Dim Hi As Long, Temp As Variant
Low = LBound(key)
Hi = UBound(key)
J = (Hi - Low + 1) \ 2
Do While J > 0
For I = Low To Hi - J
If key(I) > key(I + J) Then
Temp = key(I)
key(I) = key(I + J)
key(I + J) = Temp
Temp = other(I)
other(I) = other(I + J)
other(I + J) = Temp
End If
Next I
For I = Hi - J To Low Step -1
If key(I) > key(I + J) Then
Temp = key(I)
key(I) = key(I + J)
key(I + J) = Temp
Temp = other(I)
other(I) = other(I + J)
other(I + J) = Temp
End If
Next I
J = J \ 2
Loop
End Sub

Easy generic print Dictionary to Excel Worksheet using VBA

I'm working with really intricate data. Because of this I wrote this really nice function to print data to the debug area - the imediate window you can reach with Ctrl + G on the VBA, inside Excel. I need a similar function to print this generic data (that has numbers, strings, dictionarys and arrays) to a worksheet.
'call using: Call PrintDict(data)
' Where data can be a number, a string, a dictionary or an Array,
' with any of these inside.
Sub PrintDict(ByVal dicttoprint As Variant, Optional indent As Integer = 0, Optional wasdict As Boolean = False)
Dim i As Long
Dim j As Long
Dim indentStr As String
indentStr = ""
i = 0
Do While i < indent
indentStr = indentStr + " "
i = i + 1
Loop
Dim key
If (TypeName(dicttoprint) = "Dictionary") Then
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For Each key In dicttoprint.Keys:
Debug.Print indentStr & key & " ";
Call PrintDict(dicttoprint.Item(key), indent + 2, True)
Next
ElseIf (TypeName(dicttoprint) = "Variant()") Then
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For j = LBound(dicttoprint) To UBound(dicttoprint)
Call PrintDict(dicttoprint(j), indent + 2)
Next j
Else
Debug.Print indentStr & dicttoprint & " "
End If
End Sub
Edit1:
Ok, been thinking about, I have an idea, but can't solve some corner cases...
Example expected output below:
key1:____|__________|__________|__________|_________|
_________|key1.1:___|_numvalue_|__________|_________|
_________|__________|_numvalue_|__________|_________|
_________|__________|_arr1Indx1|_numvalue_|_________|
_________|__________|_arr1Indx2|_numvalue_|_________|
_________|__________|_arr1Indx3|_numvalue_|_________|
_________|key1.2:___|_numvalue_|__________|_________|
_________|__________|_numvalue_|__________|_________|
key2:____|_numvalue_|__________|__________|_________|
key3:____|__________|__________|__________|_________|
_________|_arr2Indx1|keyA.1:___|_numvalue_|_________|
_________|__________|keyA.2:___|_strvalue_|_________|
_________|_arr2Indx2|_numvalue_|__________|_________|
Ok, I think now this output solves some corner cases. Any ideas on how to implement it?
I'm thinking on having the function be able to pass X,Y parameters, that are optional and to return last Y. When working with text, the cursor naturally goes down, I don't know how to do this through recursion in a worksheet.
Edit 2:
Ok, this is pseudo code idea - is almost VBA, but I don't know how to make this work...
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Integer = 0, _
Optional coli As Integer = 0) As Integer
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys:
Cells(rowi, coli).Value = key
coli = coli + PrintToWS(data.Item(key), rowi+1, coli)
Next
ElseIf (TypeName(data) = "Variant()") Then
For j = LBound(data) To UBound(data)
coli = coli + PrintToWS(data(j), rowi+1, coli)
Next j
Else
Cells(rowi, coli).Value = data
coli = coli + 1
End If
PrintToWS = coli
End Function
Edit2:
Added it in a gist here
Solved. Code is below:
'usage: PrintToWS(yourdata)
' Optional parameters are to be used internally by the function,
'leave optional parameters blank.
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Integer = 1, _
Optional coli As Integer = 1, _
Optional wasdict As Integer = 0) As Integer
Dim key
Dim j As Integer
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys:
Cells(rowi + wasdict, coli).Value = key
rowi = PrintToWS(data.Item(key), rowi + wasdict, coli + 1, 1)
wasdict = 0
Next
ElseIf (TypeName(data) = "Variant()") Then
For j = LBound(data) To UBound(data)
rowi = PrintToWS(data(j), rowi, coli + 1)
Next j
Else
Cells(rowi, coli).Value = data
rowi = rowi + 1
End If
PrintToWS = rowi
End Function

Highlight (not delete) repeat sentences or phrases

I am getting the impression that this is not possible in word but I figure if you are looking for any 3-4 words that come in the same sequence anywhere in a very long paper I could find duplicates of the same phrases.
I copy and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document there is a lot of different formatting but I would be willing to temporarily get rid of formatting in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
SNAPSHOTS
BEFORE
AFTER
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
I chose a rather lame theory, but it seems to work (at least if I got the question right cuz sometimes I'm a slow understander).
I load the entire text into a string, load the individual words into an array, loop through the array and concatenate the string, containing each time three consecutive words.
Because the results are already included in 3 word groups, 4 word groups or more will automatically be recognized.
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = sFull_Text
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
errHandler:
Stop
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.