I need analog of Application.Evaluate method in MS Word - vba

I've an mathematical expression calculator. For instance: (2+3)*5. The cod works correctly in excel. But in MS Word it doesn't work, becouse there are no Application.Evaluate method in MS Word
I need help with finding an analog of this method in MS Word..
Private Sub TextBox1_Change()
Dim strExpr As String, p As Long, ss As String, qwer As String, i As Integer, a
On Error Resume Next
strExpr = TextBox1.Value
Do
p = InStr(strExpr, ",")
If p = 0 Then Exit Do Else strExpr = Left(strExpr, p - 1) & "." & Mid(strExpr, p + 1)
Loop
If strExpr = Empty Then
Label3.Caption = "": Label1.Caption = ""
Else
Label3.Caption = Application.Evaluate(strExpr)
a = Split(Label3.Caption, ",")
qwer = StrReverse(a(0))
For i = 1 To Len(qwer)
If i Mod 3 = 0 Then ss = ss & Mid(qwer, i, 1) & " " _
Else ss = ss & Mid(qwer, i, 1)
Next
Label1.Caption = StrReverse(ss): ss = ""
If UBound(a) > 0 Then Label1.Caption = Label1.Caption & "," & a(1)
End If
End Sub
But this line of cod doesn't work:
Label3.Caption = Application.Evaluate(strExpr)
I've changed this line of cod:
Label3.Caption = Application.Evaluate(strExpr)
to this line:
Selection=strExpr
Label3.Caption = Selection.Calculate(strExpr)
but, at the same time, the mathematical expression written inside TextBox1 is written / copied into an MS Word document ...
and it returns value with type single...

Related

VBA PPT error while reading CR/LF character

I have encountered an odd problem while writing VBA macro for Ppt document.
In the code I would like to extract text in frames at the very top of each slide of my presentation.
The text is divided by newline character, so first I would like to search for newline character.
I am using Instr function to search for the position of a newline in a string.
The code is the following:
Sub SetTitle()
Dim sl As Slide
Dim sh As Shape
Dim trng As TextRange
Dim asptext() As Variant
For Each sl In ActivePresentation.Slides
For Each sh In sl.Shapes
If sh.Top = 5.403701 Then
Set trng = sh.TextFrame.TextRange
txt = trng.Text
Debug.Print txt
pos = InStr(1, Chr(13), txt)
Debug.Print pos
Debug.Print Asc(Mid(txt, 7, 1))
If pos <> 0 Then
Debug.Print pos
End If
End If
Next sh
Next sl
End Sub
In Immediate window I am getting the following results : https://i.stack.imgur.com/Ig41i.png
The debugger gives an error:
Runtime error '5': Invalid procedure call or argument
for the line
Debug.Print Asc(Mid(txt, 7, 1))
So I suppose that there is a problem in recognizing this NL character.
Do you have any idea why it is like this?
The error comes likely not from any NewLine character but from a string that is too short. Mid(txt, 7, 1) will return an empty string, and that is not a valid parameter for the Asc-function.
To check for Newline-characters, you can use the constants vbCr (same as Chr(13)) and vbLf (same as Chr(10)) and vbCrLf.
If you are unsure about the content of a string, you can use the following function:
Function DumpString(s As String) As String
Const Separator = ", "
' Write all chars of String as ASCII-Value
Dim i As Long
For i = 1 To Len(s)
Dim a As Long, c As String
a = AscW(Mid(s, i, 1))
If a = AscW(vbCr) Then
c = "<CR>"
ElseIf a = AscW(vbLf) Then
c = "<LF>"
ElseIf a = AscW(vbTab) Then
c = "<TAB>"
ElseIf a = 0 Then
c = "<NUL>"
Else
c = Mid(s, i, 1)
End If
DumpString = DumpString & IIf(DumpString = "", "", Separator) & a & "(" & c & ")"
Next i
End Function

Generate list from strings and numbers vba

This question is based on this puzzle that I am trying to do in vba: https://codegolf.stackexchange.com/questions/166765/fun-with-strings-and-numbers
Basically we have strings in col A and numbers in column B and in column C we have to generate a list so that:
The total count of any string should be exactly equal to its
corresponding number in the input data.
No string should be repeated adjacently in the sequence, and every
string should appear in the output list.
The selection of the next string should be done randomly as long as
they don't break above two rules. Each solution should have a
non-zero probability of being chosen.
If no combination is possible, the output should be just 0.
I tried this but I don't how to solve the problem so that it doesn't break rule #2. Any input would be appreciated thanks.
Sub generateList()
Application.ScreenUpdating = False
Dim fI As Long, totTimes As Long, i As Long, j As Long, fO As Long, tryCount As Long
Dim myArr()
Dim randNum As Long
OUT.Range("A1:A" & OUT.Rows.Count).Clear
fO = 1
With DATA
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
.Sort.SortFields.Clear
.Sort.SortFields.Add Key:=Range("B2:B" & fI), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With DATA.Sort
.SetRange DATA.Range("A1:B" & fI)
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
totTimes = 0: j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then j = j + 1
Next i
If j < 1 Then MsgBox "No valid data present. Make sure column B has numbers and column A some string.": Exit Sub
ReDim Preserve myArr(1 To j, 1 To 2)
j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then
totTimes = totTimes + CLng(.Range("B" & i).Value)
j = j + 1
myArr(j, 1) = .Range("A" & i)
myArr(j, 2) = .Range("B" & i)
End If
Next i
Do While totTimes > 0
randNum = WorksheetFunction.RandBetween(1, j)
If myArr(randNum, 2) > 0 Then
totTimes = totTimes - 1
OUT.Range("A" & fO) = myArr(randNum, 1)
myArr(randNum, 2) = myArr(randNum, 2) - 1
fO = fO + 1
End If
tryAgain:
Loop
End With
Application.ScreenUpdating = True
OUT.Activate
MsgBox "Process Completed"
End Sub
I have a solution (that isn't based on yours, unfortunately) that gives correct results... some of the time. I think I know why it falls short, I just have given up on fixing it.
It's also terrible for golfing, since it's a rather large amount of code, and it's an unholy mishmash of different approaches and implementation ideas that I made up as I went (and I never cleaned it up properly)... but maybe some of this will inspire you to get further.
As per rule #3, I select each letter at random. It was hit and miss using only that approach so I moved to weighted probabilities, which is what the code further down uses - and it seems to work somewhat well. Occasionally there will be 1 letter too many for one of the elements, or there will be adjacent equal elements, so it doesn't actually solve the puzzle all the time.
Ideas to remedy this problem:
Adjust the probability weights based on the frequency each letter has already been used. If you set dbg to true, you'll see that I implemented some calculations with that in mind, but never got around to figuring out how to actually adjust the weights themselves.
Hardcode a check or two for how many letters have been used early in the result, for the largest element group
Change the rand section to make more than 1 pass (maybe best out of 3) - the weights are sorted by "size", so doing 3 (or n) passes should increasingly favor the larger element groups
Maybe a combination of the first and the last suggestion.
Here's the code:
Sub NonRepeatSort(v() As String)
Dim lElementCount As Long
Dim lElement As Element ' Largest
Dim tElement As Long ' Total element count
Dim tEleGroups As Long ' Number of groups of elements
Dim tEle As Element
Dim e As Element
Dim EleCol As New Collection
Dim dbg As Boolean
dbg = False
Dim s As String, res As String, previousRes As String, inputString As String
Dim lCounter As Long
For i = 1 To UBound(v)
' Check if element already exists
On Error Resume Next
s = ""
s = EleCol.Item(v(i, 1))
On Error GoTo 0
' If not, create new
If s = "" Then
Set tEle = New Element
With tEle
.SetName = v(i, 1)
.SetTotal = CLng(v(i, 2))
End With
EleCol.Add Item:=tEle, Key:=tEle.Name
End If
Next i
For Each e In EleCol
' Find the largest element
If e.Total > lElementCount Then
lElementCount = e.Total
Set lElement = e
End If
' Count total elements
tElement = tElement + e.Total
' And groups
tEleGroups = tEleGroups + 1
' Generate inputstring
For k = 1 To e.Total
inputString = inputString + e.Name
Next k
Next e
' If the largest element is larger than the total remaining elements, we'll break rule 4
If lElement.Total - (tElement - lElement.Total) > 1 Then
Debug.Print "0"
GoTo EndForSomeReason
End If
' Bubble sort - lowest to highest
' Adapted from https://stackoverflow.com/a/3588073/4604845
Dim tmpE As Element
For x = 1 To EleCol.Count - 1
For y = 1 To EleCol.Count
If EleCol.Item(x).Total > EleCol.Item(y).Total Then
Set tmpE = EleCol.Item(y)
EleCol.Remove y
EleCol.Add tmpE, tmpE.Name, x
End If
Next y
Next x
' Weighted probability array
Dim pArr() As Variant, tmpProb As Double
ReDim Preserve pArr(1 To 2, 1 To EleCol.Count)
For u = 1 To UBound(pArr, 2)
Set pArr(2, u) = EleCol.Item(u)
tmpProb = tmpProb + pArr(2, u).Freq(tElement)
pArr(1, u) = tmpProb
Next u
' The meat of it
Dim r As Long, lBool As Boolean, sLen As Long, o As Double, t As Long
For j = 1 To tElement
Do
' Reset loop control
lBool = False
' Generate a random number between 1 and 100 _
to decide which group we pick a letter from
r = Rand1To100
For i = 1 To UBound(pArr, 2)
If r <= pArr(1, i) And Not r > pArr(1, i) Then
If dbg Then Debug.Print "Probability match: " & pArr(2, t).Name
t = i
Exit For
End If
Next i
Set tEle = EleCol.Item(t)
If dbg Then Debug.Print "Name: " & tEle.Name
' If the random group is different from the previous result, proceed
If tEle.Name <> previousRes Then
lBool = True
Else
If dbg Then Debug.Print "This was also the previous result - skipping"
End If
' If the use-frequency for the random group is lower than _
how many times it appears in the string, proceed
If lBool Then
o = Round((tEle.Used / tElement) * 100, 5)
If dbg Then Debug.Print "Freq: " & tEle.Freq(tElement)
If dbg Then Debug.Print "Used: " & tEle.UsedFreqI()
If dbg Then Debug.Print "res%: " & Round((Len(res) / tElement) * 100, 1)
If dbg Then Debug.Print "o : " & o
' check use-frequency against modeled frequency
If o < tEle.Freq(tElement) Then
If dbg Then Debug.Print "Proceed with " & tEle.Name
lBool = True
Else
lBool = False
End If
End If
If dbg Then Debug.Print "----------"
lCounter = lCounter + 1
Loop While (Not lBool And lCounter < 1000)
tEle.IncrementUsed
res = res + tEle.Name
previousRes = tEle.Name
Next j
' Generate results
Debug.Print "INPUT : " & inputString
Debug.Print "RESULT: " & res
EndForSomeReason:
End Sub
Function Rand1To100() As Long
Dim r As Long
Randomize
r = ((100 - 1) * Rnd + 1)
r = Round(r, 0)
Rand1To100 = r
End Function
Private Sub TestSort()
Dim v(1 To 4, 1 To 2) As String
v(1, 1) = "A"
v(1, 2) = "6"
v(2, 1) = "B"
v(2, 2) = "2"
v(3, 1) = "C"
v(3, 2) = "2"
v(4, 1) = "D"
v(4, 2) = "4"
Call NonRepeatSort(v)
End Sub
And you'll need this class module:
' * Class module named Element
Private pName As String
Private pTotal As Long
Private pUsed As Long
Private FrequencyCoefficient As Long ' Obsolete?
' Name
Public Property Get Name() As String
Name = pName
End Property
Public Property Let SetName(s As String)
pName = s
End Property
' Total
Public Property Get Total() As Long
Total = pTotal
End Property
Public Property Let SetTotal(t As Long)
pTotal = t
End Property
' Used
Public Property Get Used() As Long
Used = pUsed
End Property
Public Sub IncrementUsed()
pUsed = pUsed + 1
End Sub
' Freq coefficient
Public Property Get Freq(f As Long) As Double
' Where f is the total number of elements
'Freq = FrequencyCoefficient
Freq = Round((Me.Total / f) * 100, 5)
End Property
Private Property Let SetFreq(f As Long)
' Obsolete?
' Where f is the total number of elements
FrequencyCoefficient = Round((Me.Total / f) * 100)
End Property
' Used freq - internal
Public Property Get UsedFreqI() As Long
If Me.Used > 0 Then
UsedFreqI = Round((Me.Used / Me.Total) * 100)
'Debug.Print "UF: " & UsedFreqI
Else
UsedFreqI = 0
End If
End Property
' Used freq - external
Public Property Get UsedFreqE(f As Long) As Long
If Me.Used > 0 Then
UsedFreq = Round((Me.Used / f) * 100)
Else
UsedFreq = 0
End If
End Property

Easy generic print Dictionary to Excel Worksheet using VBA

I'm working with really intricate data. Because of this I wrote this really nice function to print data to the debug area - the imediate window you can reach with Ctrl + G on the VBA, inside Excel. I need a similar function to print this generic data (that has numbers, strings, dictionarys and arrays) to a worksheet.
'call using: Call PrintDict(data)
' Where data can be a number, a string, a dictionary or an Array,
' with any of these inside.
Sub PrintDict(ByVal dicttoprint As Variant, Optional indent As Integer = 0, Optional wasdict As Boolean = False)
Dim i As Long
Dim j As Long
Dim indentStr As String
indentStr = ""
i = 0
Do While i < indent
indentStr = indentStr + " "
i = i + 1
Loop
Dim key
If (TypeName(dicttoprint) = "Dictionary") Then
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For Each key In dicttoprint.Keys:
Debug.Print indentStr & key & " ";
Call PrintDict(dicttoprint.Item(key), indent + 2, True)
Next
ElseIf (TypeName(dicttoprint) = "Variant()") Then
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For j = LBound(dicttoprint) To UBound(dicttoprint)
Call PrintDict(dicttoprint(j), indent + 2)
Next j
Else
Debug.Print indentStr & dicttoprint & " "
End If
End Sub
Edit1:
Ok, been thinking about, I have an idea, but can't solve some corner cases...
Example expected output below:
key1:____|__________|__________|__________|_________|
_________|key1.1:___|_numvalue_|__________|_________|
_________|__________|_numvalue_|__________|_________|
_________|__________|_arr1Indx1|_numvalue_|_________|
_________|__________|_arr1Indx2|_numvalue_|_________|
_________|__________|_arr1Indx3|_numvalue_|_________|
_________|key1.2:___|_numvalue_|__________|_________|
_________|__________|_numvalue_|__________|_________|
key2:____|_numvalue_|__________|__________|_________|
key3:____|__________|__________|__________|_________|
_________|_arr2Indx1|keyA.1:___|_numvalue_|_________|
_________|__________|keyA.2:___|_strvalue_|_________|
_________|_arr2Indx2|_numvalue_|__________|_________|
Ok, I think now this output solves some corner cases. Any ideas on how to implement it?
I'm thinking on having the function be able to pass X,Y parameters, that are optional and to return last Y. When working with text, the cursor naturally goes down, I don't know how to do this through recursion in a worksheet.
Edit 2:
Ok, this is pseudo code idea - is almost VBA, but I don't know how to make this work...
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Integer = 0, _
Optional coli As Integer = 0) As Integer
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys:
Cells(rowi, coli).Value = key
coli = coli + PrintToWS(data.Item(key), rowi+1, coli)
Next
ElseIf (TypeName(data) = "Variant()") Then
For j = LBound(data) To UBound(data)
coli = coli + PrintToWS(data(j), rowi+1, coli)
Next j
Else
Cells(rowi, coli).Value = data
coli = coli + 1
End If
PrintToWS = coli
End Function
Edit2:
Added it in a gist here
Solved. Code is below:
'usage: PrintToWS(yourdata)
' Optional parameters are to be used internally by the function,
'leave optional parameters blank.
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Integer = 1, _
Optional coli As Integer = 1, _
Optional wasdict As Integer = 0) As Integer
Dim key
Dim j As Integer
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys:
Cells(rowi + wasdict, coli).Value = key
rowi = PrintToWS(data.Item(key), rowi + wasdict, coli + 1, 1)
wasdict = 0
Next
ElseIf (TypeName(data) = "Variant()") Then
For j = LBound(data) To UBound(data)
rowi = PrintToWS(data(j), rowi, coli + 1)
Next j
Else
Cells(rowi, coli).Value = data
rowi = rowi + 1
End If
PrintToWS = rowi
End Function

MS Word 2010 VBA - how to parse non-latin characters?

I have a VBA script that prints out the word frequency in a document.
Sub WordFrequency()
Const maxwords = 9000 'Maximum unique words allowed
Dim SingleWord As String 'Raw word pulled from doc
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for unique words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim ans As String 'How user wants to sort results
Dim tword As String '
' Set up excluded words
Excludes = "[the][a][of][is][to][for][by][be][and][are]"
' Find out how to sort
ByFreq = True
ans = InputBox("Sort by WORD or by FREQ?", "Sort order", "WORD")
If ans = "" Then End
If UCase(ans) = "WORD" Then
ByFreq = False
End If
Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
'Out of range?
If SingleWord < "a" Or SingleWord > "z" Then
SingleWord = ""
End If
'On exclude list?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = ""
End If
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("Too many words.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & ", Unique: " & WordNum
Next aword
' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) _
Or (ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j
' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Trim(Str(Freq(j))) _
& vbTab & Words(j) & vbCrLf
Next j
End With
System.Cursor = wdCursorNormal
j = MsgBox("There were " & Trim(Str(WordNum)) & _
" different words ", vbOKOnly, "Finished")
End Sub
However, this VBA script is not picking up non-latin characters in the document.
I am using Arial Unicode MS and another unicode font
What do I need in the macro to pick up these words?
-Thanks
It can't work with this...
If SingleWord < "a" Or SingleWord > "z" Then
SingleWord = ""
End If
Remove that, and you should at least see non-Latin words. Beyond that, it's more a question of what you are trying to filter out. If it's stuff like "'words' starting with a digit or a punctuation mark" then you probably need to have a look through the Unicode tables and work out a better set of things to filter out (bearing in mind that some transliterated scripts may even use digits at the beginnings of words).

Highlight (not delete) repeat sentences or phrases

I am getting the impression that this is not possible in word but I figure if you are looking for any 3-4 words that come in the same sequence anywhere in a very long paper I could find duplicates of the same phrases.
I copy and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document there is a lot of different formatting but I would be willing to temporarily get rid of formatting in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
SNAPSHOTS
BEFORE
AFTER
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
I chose a rather lame theory, but it seems to work (at least if I got the question right cuz sometimes I'm a slow understander).
I load the entire text into a string, load the individual words into an array, loop through the array and concatenate the string, containing each time three consecutive words.
Because the results are already included in 3 word groups, 4 word groups or more will automatically be recognized.
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = sFull_Text
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
errHandler:
Stop
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.