So... I have this form where people select different controls (We call safety measures controls, these are not content controls) from a listbox and add them to a list. This is in a repeating table. Each control has a heading label (either "engineering" "administrative" or "PPE" that I want to make bold and underlined but I want the options selected in the listboxes to be in normal formatting.
the portion of code that is printing this to the document looks like this:
Set tableSequence = ActiveDocument.Tables(1)
Set NewRow = tableSequence.Rows.Add
NewRow.Cells(5).Range.Text = "Engineering: " & MyString3 _
& vbCrLf & "Administrative: " _
& MyString4 & vbCrLf _
& "PPE: " & MyString5
I want the words Engineering, Administrative, and PPE to be bold and underlined, and the items represented by the MyString objects to appear in standard formatting. Thank you.
The string portion is as follows:
Private Sub CommandButton6_Click()
Dim tableSequence As Table
Dim NewRow As Row
Dim MyString5 As String
Dim v As Variant
Dim var3
Dim p As String
Dim M As Long
For var3 = 0 To ListBox7.ListCount - 1
If ListBox7.Selected(var3) = True Then
MyString5 = MyString5 & ListBox7.List(var3)
v = Split(MyString5, ",")
p = ""
For M = LBound(v) To UBound(v)
p = p + v(M)
If M Mod 3 = 2 Then
p = p + vbCr
Else
p = p + ","
End If
Next M
p = Left(p, Len(p) - 1)
Debug.Print p
End If
sorry for leaving that out
How to format a part (or multiple parts) of a Cell's Value in a Word table:
I have to admit i am not very fond of Word VBA, but i stitched this Sub together for you and it works in my test document. Adjust it to your needing.
Option Explicit
Sub asd()
Dim tableSequence As Table
Set tableSequence = ActiveDocument.Tables(1)
Dim NewRow As Row
Set NewRow = tableSequence.Rows.Add
NewRow.Cells(5).Range.Text = "Engineering: asd" & vbCrLf & "Administrative: vvv" & vbCrLf & "test" & vbCrLf & "PPE: blabla"
NewRow.Cells(5).Range.Bold = False
NewRow.Cells(5).Range.Underline = False
Dim keywordArr As Variant
keywordArr = Array("Engineering:", "Administrative:", "PPE:")
Dim keyword As Variant
Dim myRange As Variant
Dim startPos As Integer
Dim endPos As Integer
Dim length As Integer
Dim i As Integer
i = 1
For Each keyword In keywordArr
Do While InStr(1, myRange, keyword) = 0
Set myRange = NewRow.Cells(5).Range.Paragraphs(i).Range
i = i + 1
Loop
startPos = InStr(1, myRange, keyword)
startPos = myRange.Characters(startPos).Start
length = Len(keyword)
endPos = startPos + length
Set myRange = ActiveDocument.Range(startPos, endPos)
With myRange.Font
.Bold = True
.Underline = True
End With
Next keyword
End Sub
Below is a solution for the same thing in Excel:
First off you would have to write the text into the cell just like you already do.
Next would be to find the position of your keywords in the cell's value + the length of your keywords like so
startPos = Instr(1, NewRow.Cells(5), "Engineering:")
length = len("Engineering:")
Then you can set up the Font of the found substring via Range.Characters.Font
NewRow.Cells(5).Characters(startPos, Length).Font.Bold = True
NewRow.Cells(5).Characters(startPos, Length).Font.Underline = True
Now the elegant way would be to have an array of keywords and iterate through them to change the font for all them
Dim keywordArr As Variant
keywordArr = Array("Engineering:", "Administrative:", "PPE:")
Dim keyword As Variant
Dim startPos as Integer
Dim length as Integer
For Each keyword In keywordArr
startPos = InStr(1, NewRow.Cells(5), keyword)
length = Len(keyword)
With NewRow.Cells(5).Characters(startPos, Length).Font
.Bold = True
.Underline = True
End With
Next keyword
Related
Any help here would be appreciated please.
The included VBA code almost meets the intended purpose, however, I need a solution that enables the use of wildcards and highlights all parameters contained between "##", "%%" or potentially other special characters (special characters included).
For instance, lets say in the cell range B2:B10 we would find something like:
Checked at ##date1## and ##hour1##
But I want to be able to do a search and highlight using # * # or % * % within a selected determined cell range with the end result (bold being color):
Checked at ##date1## and ##hour1##
Sub HighlightStrings()
Application.ScreenUpdating = False
Dim Rng As Range
Dim cFnd As String
Dim xTmp As String
Dim x As Long
Dim m As Long
Dim y As Long
Dim xFNum As Integer
Dim xArrFnd As Variant
Dim xStr As String
cFnd = InputBox("Please enter the text, separate them by comma:")
If Len(cFnd) < 1 Then Exit Sub
xArrFnd = Split(cFnd, ",")
For Each Rng In Selection
With Rng
For xFNum = 0 To UBound(xArrFnd)
xStr = xArrFnd(xFNum)
y = Len(xStr)
m = UBound(Split(Rng.Value, xStr))
If m > 0 Then
xTmp = ""
For x = 0 To m - 1
xTmp = xTmp & Split(Rng.Value, xStr)(x)
.Characters(Start:=Len(xTmp) + 1, Length:=y).Font.ColorIndex = 3
xTmp = xTmp & xStr
Next
End If
Next xFNum
End With
Next Rng
Application.ScreenUpdating = True
End Sub
Thank you
Okay This seems to work for me. There is a limitation we can work on if required: the phrase to highlight must be padded with spaces on both sides.
Option Explicit
Option Base 0
Sub testreplace()
Dim I As Integer 'Iteration
Dim FlagNum As Integer 'Flag Number
Dim RG As Range 'Whole range
Dim CL As Range 'Each Cell
Dim FlagChar As String 'Flag characters
Dim ArrFlag 'Flag Char Array
Dim TextTemp As String 'Cell Contents
Set RG = Selection
FlagChar = "##"
FlagChar = InputBox("Enter 'Flag Characters' separated by a comma." & vbCrLf & vbCrLf & _
"Example:" & vbCrLf & vbCrLf & _
"##,%%,&&" & vbCrLf & _
"$$,XX", "Flag Characters to Highlight", "##,%%")
ArrFlag = Split(FlagChar, ",")
For Each CL In RG.Cells
TextTemp = CL.Value
For FlagNum = 0 To UBound(ArrFlag)
For I = 1 To Len(TextTemp)
'Debug.Print "<<" & Mid(TextTemp, I, Len(ArrFlag(Flagnum)) + 1) & _
"=" & " " & ArrFlag(Flagnum) & ">>"
If Mid(TextTemp, I, Len(ArrFlag(FlagNum)) + 1) = " " & ArrFlag(FlagNum) Then
CL.Characters(I + 1, InStr(I, TextTemp, ArrFlag(FlagNum) & " ") + _
Len(ArrFlag(FlagNum)) - I).Font.ColorIndex = 3
End If
Next I
Next FlagNum
Next CL
End Sub
Here's an example of it working:
This question already has answers here:
Excel UDF for capturing numbers within characters
(4 answers)
Closed 4 years ago.
I need to extract the numbers from a string of text and I'm not quite sure how to do it. The code I've attached below is very preliminary and most likely can be done more elegantly. A sample of the string I'm trying to parse is as follows:
"ID CSys ID Set ID Set Value Set Title 7026..Plate Top MajorPrn Stress 7027..Plate Top MinorPrn Stress 7033..Plate Top VonMises Stress"
I need to pull the numbers 7026, 7027, and 7033. The string will vary in length and the number of values that I'll need to pull will also vary. Any help would be much appreciated. Thanks!
Dim WrdArray() As String
Dim txtstrm As TextStream
Dim line As String
Dim clm As Long
Dim Rw As Long
'------------------------------------------------------------
Dim i As Long
Dim strPath As String
Dim strLine As String
Dim count, count1 As Integer
Dim holder As String
Dim smallSample As String
count = 0
count1 = 1
holder = ""
'Remove Filters and Add Custom Filters
Call Application.FileDialog(msoFileDialogOpen).Filters.Clear
Call Application.FileDialog(msoFileDialogOpen).Filters.Add("Text Files", "*.txt")
Call Application.FileDialog(msoFileDialogOpen).Filters.Add("Dat Files", "*.dat")
'only allow the user to select one file
Application.FileDialog(msoFileDialogOpen).AllowMultiSelect = False
'make the file dialog visible to the user
intChoice = Application.FileDialog(msoFileDialogOpen).Show
'determine what choice the user made
If intChoice <> 0 Then
'get the file path selected by the user
strPath = Application.FileDialog(msoFileDialogOpen).SelectedItems(1)
End If
'------------------------------------------------------------
If strPath <> "" Then
Set txtstrm = FSO.OpenTextFile(strPath)
Else
MsgBox "No file selected.", vbExclamation
Exit Sub
End If
Rw = 1
Do Until txtstrm.AtEndOfStream
line = txtstrm.ReadLine
clm = 1
WrdArray() = Split(line, " ") 'Change with ; if required
For Each wrd In WrdArray()
If Rw = 1 Then
Do While count <> Len(wrd)
smallSample = Left(wrd, 1)
If smallSample = "0" Or smallSample = "1" Or smallSample = "2" Or smallSample = "3" Or smallSample = "4" _
Or smallSample = "5" Or smallSample = "6" Or smallSample = "7" Or smallSample = "8" _
Or smallSample = "9" Then
holder = holder & smallSample
Else
If holder <> "" Then
Cells(count1, 1) = holder
count1 = count1 + 1
End If
holder = ""
End If
wrd = Right(wrd, Len(wrd) - 1)
clm = clm + 4
ActiveSheet.Cells(Rw, clm) = holder
Loop
Else
ActiveSheet.Cells(Rw, clm) = wrd
clm = clm + 1
End If
Next wrd
Rw = Rw + 1
Loop
txtstrm.Close
End Sub
You can use Regular Expressions.
Sub ExtractNumbers()
Dim str As String, regex As regExp, matches As MatchCollection, match As match
str = "ID CSys ID Set ID Set Value Set Title 7026..Plate Top MajorPrn Stress 7027..Plate Top MinorPrn Stress 7033..Plate Top VonMises Stress"
Set regex = New regExp
regex.Pattern = "\d+" '~~~> Look for variable length numbers only
regex.Global = True
If (regex.Test(str) = True) Then
Set matches = regex.Execute(str) '~~~> Execute search
For Each match In matches
Debug.Print match.Value '~~~> Prints: 7026, 7027, 7033
Next
End If
End Sub
Make sure you reference the VBA regex library:
Open VBA editor
Tools > References...
Check Microsoft VBScript Regular Expression 5.5
To exact numbers in the form you want, try something like:
Sub dural()
Dim s As String, i As Long, L As Long, c As String, temp As String
s = [A1]
L = Len(s)
temp = ""
For i = 1 To L
c = Mid(s, i, 1)
If c Like "[0-9]" Then
temp = temp & c
Else
temp = temp & " "
End If
Next i
temp = "'" & Application.WorksheetFunction.Trim(temp)
temp = Replace(temp, " ", ",")
[B1] = temp
End Sub
You can use this function that splits the "words and test for numeric:
Function numfromstring(str As String) As String
Dim strarr() As String
str = Replace(str, ".", " ")
strarr = Split(str)
Dim i As Long
For i = 0 To UBound(strarr)
If IsNumeric(strarr(i)) Then
numfromstring = numfromstring & "," & strarr(i)
End If
Next i
numfromstring = Mid(numfromstring, 2)
End Function
You would call it from the worksheet with a formula:
=numfromstring(A1)
Or from vba like this:
Sub try()
Dim str As String
str = "ID CSys ID Set ID Set Value Set Title 7026..Plate Top MajorPrn Stress 7027..Plate Top MinorPrn Stress 7033..Plate Top VonMises Stress"
Dim out As String
out = numfromstring(str)
Debug.Print out
End Sub
If you have Office 365 Excel you can use this array formula:
=TEXTJOIN(",",TRUE,IF(ISNUMBER(--TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",99)),(ROW($1:$100)-1)*99+1,99))),TRIM(MID(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",99)),(ROW($1:$100)-1)*99+1,99)),""))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode:
Alright,
I've been cobbling together code to automate a task. I have a word document that has something like 300 lines that each have an identifier number, a title, and a website. I would like to search the document by identifier pull the title and website separately and enter them into an excel sheet separately. The identifiers are already listed in excel and I would like them to match up with the appropriate information.
I know its really, really messy -
Public Sub ParseDoc()
Dim list As Workbook
Dim doc As Document
Set doc = "C:\network\path\importantlist.doc"
Dim paras As Paragraphs
Set paras = doc.Paragraphs
Dim para As Paragraph
Dim sents As Sentences
Dim sent As Range
Set list = ActiveSheet
Dim i As Integer
Dim mystring As String
Dim length As Integer
Dim space As String
Dim dot As String
Dim space1 As String
Dim space2 As String
Dim XYZ As Range
dot = "."
space = " "
i = 1
While i < 300 'This loops for the duration of the identifier list in excel
mystring = Cells(i, 1) ' this pulls the unique identifier from the cell
For Each para In paras
Set sents = para.Range.Sentences ' this searches the document by paragraphs to sentences
For Each sent In sents
If InStr(1, sent, mystring) <> 0 Then 'If a the identifier is found
space1 = InStr(1, sent, space, vbTextCompare) 'measure the length to the first blank space (this indicates the title is about to begin)
space2 = InStr(1, sent, dot, vbTextCompare) ' This dot is the ".doc" and indicates the title has concluded, I want the text between these two characters
Set XYZ =
Start:= space1.range.start
End:= space2.range.start
'Here is where I am stuck, I have never used range or selection before and after looking around, I still feel very much at a loss on how to proceed forward...
Next
Next
End Sub
Updated:
Updates values for matching IDs
Appends records that have no matching ID
General Instructions
Insert this into a Excel code module
Set the correct values for the constants in ParseWordDocument()
Cross you finger
Run ParseWordDocument()
Let me know how it went
Option Explicit
Sub ParseWordDocument()
Const WordPath As String = "C:\Users\best buy\Downloads\stackoverflow\Sample Files\A203 Paralegal.docx"
Const iID = 1
Const iTitle = 2
Const iHyperLink = 3
Const TargetSheetName As String = "Sheet1"
Dim k As String, id As String, title As String, hAddress As String, hScreenTip As String, hTextToDisplay As String
Dim lastRow As Long, x As Long, y As Long
Dim arData, h
arData = getWordDocArray(WordPath, False)
With Worksheets(TargetSheetName)
lastRow = .Cells(Rows.Count, iID).End(xlUp).Row + 1
For x = 2 To lastRow
For y = 0 To UBound(arData, 2)
id = Trim(.Cells(x, iID))
If Len(id) And (id = arData(0, y)) Then
id = Trim(.Cells(x, iID))
title = arData(1, y)
hAddress = arData(2, y)
hScreenTip = arData(3, y)
hTextToDisplay = arData(4, y)
.Cells(x, iTitle) = title
.Hyperlinks.Add .Cells(x, iHyperLink), Address:=hAddress, ScreenTip:=hScreenTip, TextToDisplay:=hTextToDisplay
arData(0, y) = ""
Exit For
End If
Next
Next
For y = 0 To UBound(arData, 2)
id = arData(0, y)
If Len(id) Then
title = arData(1, y)
hAddress = arData(2, y)
hScreenTip = arData(3, y)
hTextToDisplay = arData(4, y)
.Cells(lastRow, iID) = id
.Cells(lastRow, iTitle) = title
.Hyperlinks.Add .Cells(lastRow, iHyperLink), Address:=hAddress, ScreenTip:=hScreenTip, TextToDisplay:=hTextToDisplay
arData(0, y) = ""
lastRow = lastRow + 1
End If
Next
End With
End Sub
Function getWordDocArray(WordPath As String, Optional ShowWord As Boolean = False) As Variant
Dim i As Integer, iStart As Integer, iEnd As Integer
Dim id As String, title As String
Dim arData, s
Dim wdApp, wdDoc, h
Set wdApp = CreateObject("Word.Application")
Set wdDoc = wdApp.Documents.Open(Filename:=WordPath, ReadOnly:=True)
wdApp.Visible = ShowWord
ReDim arData(4, 0)
For Each s In wdDoc.Sentences
On Error GoTo SkipSentence
iStart = InStr(s.Text, s.Words(2))
iEnd = InStr(s.Text, "(") - iStart
id = Trim(s.Words(1))
title = Mid(s.Text, iStart, iEnd)
Set h = s.Hyperlinks(1)
ReDim Preserve arData(4, i)
arData(0, i) = id
arData(1, i) = title
arData(2, i) = h.Address
arData(3, i) = h.ScreenTip
arData(4, i) = h.TextToDisplay
i = i + 1
SkipSentence:
On Error GoTo 0
Next
getWordDocArray = arData
If Not ShowWord Then
wdDoc.Close False
wdApp.QUIT
End If
Set wdDoc = Nothing
Set wdApp = Nothing
End Function
I'm trying to extract text content from Excel using a macro. This is my code:
Dim i As Integer, j As Integer
Dim v1 As Variant
Dim Txt As String
v1 = Range("A2:C15")
For i = 1 To UBound(v1)
For j = 1 To UBound(v1, 2)
Txt = Txt & v1(i, j)
Next j
Txt = Txt & vbCrLf
Next i
MsgBox Txt
But it is showing the raw characters only meaning that it doesn't show any formatting information like bold, italic, underline, etc..
I want to extract the text along with the formatting information.
Example: This is sample text
Expected output: This is sample text
Actual output: This is sample text
Can someone explain what's wrong with the code and tell if anything is wrong?
A messagebox does not permit formatiing without changing system defaults, which is not a starightforward approach. If you want to display formatted text in a prompt then you are probably easiest to create a userform and format the label appropriately.
For example, you can determine if a cell has bold fomatting using:
Dim isBold As Boolean
isBold = v1(i, j).Font.Bold
And apply this to a userform label font using:
label.Font.Bold = isBold
If you want to output to a text (ie .txt) file then this cannot store any formatting information. The best you could hope to achieve is to create a markup style output where:
If isBold Then
txt = "<b >mytext< /b>" 'Ignore the spaces
Else
txt = "mytext"
End If
The range.Font.Bold property has three return options:
v1(i, j).Font.Bold = True 'if the entire cell IS bold
v1(i, j).Font.Bold = False 'if the entire cell IS NOT bold
v1(i, j).Font.Bold = Null 'if the cell is PARTIALLY bold
Calling IsNull(v1(i, j).Font.Bold) will tell you whether you have partial fomatting in a cell. Unfortunately you must then assess each character in the string individually to determine the bold characters. This function should determine where the bold formatting is switched on or off in a string contained in the Range object passed and add the appropriate markup tag:
Function markup(rng As Range) As String
Dim chr As Integer
Dim isCharBold As Boolean
Dim str As String
Dim tempChar As Characters
isCharBold = False
str = ""
If IsNull(rng.Font.Bold) Then
For chr = 1 To rng.Characters.Count
Set tempChar = rng.Characters(chr, 1)
If isCharBold Then
If tempChar.Font.Bold Then
str = str + tempChar.Text
Else
isCharBold = False
str = str & "</b>" & tempChar.Text
End If
Else
If tempChar.Font.Bold Then
isCharBold = True
str = str + "<b>" & tempChar.Text
Else
str = str & tempChar.Text
End If
End If
Next chr
Else
str = rng.Value
End If
markup = str
End Function
Notice that the Else case just returns the default string values. You can modify this approach to work for any of the .Font properties e.g. strikethrough, underline, italic....
The framework in the OP suggests that you are assigning the contents of a range of cells into an array of type Variant. This essentially leaves you with an unformatted string of characters in each array index. In this case you won't be able to extract any formatting from the array strings. To access the Characters().Font.Bold property you must be operating on a Range object so it might be best to iterate through each cell in Range("A2:C15") directly. This could be achieved by modifying your initial code as such, so it now calls the markup function:
Sub OutputText()
Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String
Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
For j = 1 To rng.Columns.Count
Txt = Txt & markup(rng(i, j)) & " "
Next j
Txt = Txt & vbCrLf
Next i
Debug.Print Txt
End Sub
OK, let's have the algorithm from #stucharo a little bit simpler to extend.
Public Function getHTMLFormattedString(r As Range) As String
isBold = False
isItalic = False
isUnderlined = False
s = ""
cCount = 0
On Error Resume Next
cCount = r.Characters.Count
On Error GoTo 0
If cCount > 0 Then
For i = 1 To cCount
Set c = r.Characters(i, 1)
If isUnderlined And c.Font.Underline = xlUnderlineStyleNone Then
isUnderlined = False
s = s & "</u>"
End If
If isItalic And Not c.Font.Italic Then
isItalic = False
s = s & "</i>"
End If
If isBold And Not c.Font.Bold Then
isBold = False
s = s & "</b>"
End If
If c.Font.Bold And Not isBold Then
isBold = True
s = s + "<b>"
End If
If c.Font.Italic And Not isItalic Then
isItalic = True
s = s + "<i>"
End If
If Not (c.Font.Underline = xlUnderlineStyleNone) And Not isUnderlined Then
isUnderlined = True
s = s + "<u>"
End If
s = s & c.Text
If i = cCount Then
If isUnderlined Then s = s & "</u>"
If isItalic Then s = s & "</i>"
If isBold Then s = s & "</b>"
End If
Next i
Else
s = r.Text
If r.Font.Bold Then s = "<b>" & s & "</b>"
If r.Font.Italic Then s = "<i>" & s & "</i>"
If Not (r.Font.Underline = xlUnderlineStyleNone) Then s = "<u>" & s & "</u>"
End If
getHTMLFormattedString = s
End Function
To be clear, this function works only with a range containing a single cell. But it should be easy calling this function for each cell in a bigger range and concatenating the returned strings into one.
Edit by the OP:
I called the function by the below code:
Sub ReplaceFormattingTags()
Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String
Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
For j = 1 To rng.Columns.Count
Txt = Txt & getHTMLFormattedString(rng(i, j)) & " "
Next j
Txt = Txt & vbCrLf
Next i
Debug.Print Txt
End Sub
VBA string does not support formatting like that. It will purely take the string from the range. No formatting at all. If you want to format the string, you can not see this through msgbox.
Only way to do it would be to store it in a cell then format the cell. But then that does not give you the output in a messagebox as a formatted string.
If you are planning to then put the string in a cell with formatting, you will need to save the formatting somewhere, or copy it from the cell you got the text from. And then apply the formatting to the cell
I am getting the impression that this is not possible in word but I figure if you are looking for any 3-4 words that come in the same sequence anywhere in a very long paper I could find duplicates of the same phrases.
I copy and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document there is a lot of different formatting but I would be willing to temporarily get rid of formatting in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
SNAPSHOTS
BEFORE
AFTER
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
I chose a rather lame theory, but it seems to work (at least if I got the question right cuz sometimes I'm a slow understander).
I load the entire text into a string, load the individual words into an array, loop through the array and concatenate the string, containing each time three consecutive words.
Because the results are already included in 3 word groups, 4 word groups or more will automatically be recognized.
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = sFull_Text
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
errHandler:
Stop
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.