Adding tables in word with vba - vba

I have been experimenting with adding tables in word from database.
So far I made a table in word document as a template
Then what I do is that I copy it into a new word document, search for DT and Dokumenttype and then replace it with the value I want. This is properly slow(however it seems to go extremly fast)and it would propberly be better to create it directly in word.
After creating the table i start adding rows to it, where the first column is to be hyperlinked. This is what seems to take time, it is only 235 rows total split on 11 tables but it takes almost a minute to create the 11 tables. So my question is how does you folks normally creates tables?
Do you create the header of the table, then keep adding rows?
Do you double loop, to find number of rows needed then create the whole table at one go?
Do you copy an array into the table to fill the rows? Then reloop to hyperlink the first column?
Output looks like this:
Below is my current code:
Option Explicit
Public Sub createDocOut(projectNumber As String, Optional reloadDatabase As Boolean = False)
Dim docOutArray() As String
Dim previousDokType As String
Dim doc_ As Document
Dim i As Long
Dim j As Integer
Dim k As Integer
Dim m As Integer
Dim sPercentage As Double
Dim numOfRows As Long
'Application.ScreenUpdating = False
docOutArray = Post.helpRequest("http://proto-ls/wordin.asp?Dok4=" & projectNumber)
If CPearson.IsArrayEmpty(docOutArray) Then
MsgBox "No document registered in database!"
End If
numOfRows = UBound(docOutArray, 1)
' creates a new document if needed otherwise it opens it
Set doc_ = NEwDocOut.createDocOutDocument(projectNumber)
If CustomProperties.getValueFromProperty(doc_, "_DocumentCount") = numOfRows And reloadDatabase = False Then
Application.ScreenUpdating = True
Exit Sub
Else
Selection.WholeStory
Selection.Delete
End If
'We add number of rows to document
Call CustomProperties.createCustomDocumentProperty(doc_, "_DocumentCount", numOfRows)
j = 0
previousDokType = ""
For i = LBound(docOutArray, 1) To numOfRows
'new table
If docOutArray(i, 1) <> previousDokType Then
If j > 0 Then
doc_.Tables(j).Select
Selection.Collapse WdCollapseDirection.wdCollapseEnd
Selection.MoveDown Unit:=wdLine, Count:=1
End If
j = j + 1
m = 2
Call NEwDocOut.addTable(doc_, docOutArray(i, 1), docOutArray(i, 2))
End If
'new row
With doc_.Tables(j)
.Rows(m).Select
Selection.InsertRowsBelow (1)
m = m + 1
' Hyper link the file
ActiveDocument.Hyperlinks.Add Anchor:=.Cell(m, 1).Range, _
Address:="z:\Prosjekt\" & projectNumber & docOutArray(i, 3), ScreenTip:="HyperLink To document", _
TextToDisplay:=FileHandling.GetFilenameFromPath(docOutArray(i, 3))
'loop through cells
For k = 3 To UBound(docOutArray, 2)
' .Cell(m, k - 2).Range.Font.Bold = False
' .Cell(m, k - 2).Range.Font.name = "Times New Roman"
' .Cell(m, k - 2).Range.Font.Size = 10
If k > 3 And k <> 8 Then
.Cell(m, k - 2).Range.Text = docOutArray(i, k)
End If
If k = 8 Then
.Cell(m, k - 2).Range.Text = Format(replace(docOutArray(i, k), ".", "/"), "mm.dd.yyyy")
End If
Next k
End With
previousDokType = docOutArray(i, 1)
Next i
'Application.ScreenUpdating = True
End Sub
'**********************************************************************
' ****** CREATE NEW DOCUMENT OUT **************************************
'**********************************************************************
Function createDocOutDocument(prosjektnumber As String) As Document
Dim dirName As String
Dim docOutname As String
Set createDocOutDocument = Nothing
' Hvis directory \Dokumentstyring\PFK ikke eksisterer, lag dette
dirName = "z:\Prosjekt\" & prosjektnumber
'change permision if needed
If Dir(dirName, vbDirectory) = "" And Not Settings.debugMy Then
MkDir dirName
End If
'filename of docOut
docOutname = dirName & "\" & prosjektnumber & "-Dokut.docx"
If FileHandling.doesFileExist(docOutname) Then
If FileHandling.openDocument(docOutname, True, True) Then
Set createDocOutDocument = ActiveDocument
Exit Function
End If
End If
'
' Add the tamplate for DocOut and save it to Doclist
'
Set createDocOutDocument = Documents.Add(Template:="Z:\Dokumentstyring\Config\DocOut.dotm", NewTemplate:=False)
createDocOutDocument.SaveAs filename:=docOutname
'Final check if document was created
If Not FileHandling.doesFileExist(docOutname) Then
Set createDocOutDocument = Nothing
End If
End Function
Function addTable(doc_ As Document, category As String, description As String)
doc_.Activate
'Insert out table
Selection.InsertFile filename:="Z:\Dokumentstyring\Config\Doklistut.docx", Range:="", _
ConfirmConversions:=False, link:=False, Attachment:=False
'Replace the DT with the category
If Not searchAll(doc_, "DT", category) Then
MsgBox "Failed to replace category in table"
End If
'Replace the Dokumenttype with the category
If Not searchAll(doc_, "Dokumenttype", description) Then
MsgBox "Failed to replace document type in table"
End If
End Function

Related

Add filenames to an array and pass it to a sorting function as a string argument

The goal is to provide a folder choosing dialogue to read file names and paste them into the open Word document with the file names being the title (above the picture). This is to ease step by step documentations in Word with a style of "1. Do this", "2. Do that" .... "10. And then that", "11. And then this" (with it being sorted wrong, i.e. 1, 10, 11, 13, 2, 3, 4, 5, 6, 7, 8, 9 without the sorting function).
I can't overcome the type mismatch error, that the following VBA code generates (it seems to be the error of String vs. Array type):
Function:
Function QuickSortNaturalNum(strArray() As String, intBottom As Integer, intTop As Integer)
Dim strPivot As String, strTemp As String
Dim intBottomTemp As Integer, intTopTemp As Integer
intBottomTemp = intBottom
intTopTemp = intTop
strPivot = strArray((intBottom + intTop) \ 2)
Do While (intBottomTemp <= intTopTemp)
' < comparison of the values is a descending sort
Do While (CompareNaturalNum(strArray(intBottomTemp), strPivot) < 0 And intBottomTemp < intTop)
intBottomTemp = intBottomTemp + 1
Loop
Do While (CompareNaturalNum(strPivot, strArray(intTopTemp)) < 0 And intTopTemp > intBottom)
intTopTemp = intTopTemp - 1
Loop
If intBottomTemp < intTopTemp Then
strTemp = strArray(intBottomTemp)
strArray(intBottomTemp) = strArray(intTopTemp)
strArray(intTopTemp) = strTemp
End If
If intBottomTemp <= intTopTemp Then
intBottomTemp = intBottomTemp + 1
intTopTemp = intTopTemp - 1
End If
Loop
'the function calls itself until everything is in good order
If (intBottom < intTopTemp) Then QuickSortNaturalNum strArray, intBottom, intTopTemp
If (intBottomTemp < intTop) Then QuickSortNaturalNum strArray, intBottomTemp, intTop
End Function
Sub:
Sub PicWithCaption()
Dim xFileDialog As FileDialog
Dim xPath, xFile, xFileNameOnly As String
Dim xFileNameOnlySorted, xFileNameOnlyUnsorted As Variant
Dim xFileNameOnlyUnsortedAsString As String
Dim i, k, l As Integer
l = 1
m = 100
On Error Resume Next
Set xFileDialog = Application.FileDialog(msoFileDialogFolderPicker)
If xFileDialog.Show = -1 Then
xPath = xFileDialog.SelectedItems.Item(i)
If xPath <> "" Then
xFile = Dir(xPath & "\*.*")
For i = 0 To 100
Do While xFile <> ""
xFileNameOnly = Left(xFile, Len(xFile) - 4)
xFileNameOnlyUnsorted(i) = xFileNameOnly
ReDim Preserve xFileNameOnlyUnsorted(0 To i) As Variant
xFileNameOnlyUnsorted(i) = xFileNameOnlyUnsorted(i).Value
Loop
Next i
xFileNameOnlySorted = Module1.QuickSortNaturalNum(xFileNameOnlyUnsorted, l, m)
For xFileNameOnlySorted(k) = 1 To 100
If UCase(Right(xFileNameOnlySorted(k), 3)) = "PNG" Or _
UCase(Right(xFileNameOnlySorted(k), 3)) = "TIF" Or _
UCase(Right(xFileNameOnlySorted(k), 3)) = "JPG" Or _
UCase(Right(xFileNameOnlySorted(k), 3)) = "GIF" Or _
UCase(Right(xFileNameOnlySorted(k), 3)) = "BMP" Then
With Selection
.Text = xFileNameOnlySorted(k)
.MoveDown wdLine
.InlineShapes.AddPicture xPath & "\" & xFile, False, True
.InsertAfter vbCrLf
.MoveDown wdLine
End With
End If
Next xFileNameOnlySorted(k)
xFile = Dir()
End If
End If
End Sub
Here's a slightly different approach:
Sub PicWithCaption()
Dim xPath As String, colImages As Collection, arrFiles, f
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select a folder with files to insert"
.AllowMultiSelect = False
If .Show = -1 Then xPath = .SelectedItems(1) & "\"
End With
If Len(xPath) = 0 Then Exit Sub
Set colImages = ImageFiles(xPath) 'get a Collection of image file names
If colImages.Count > 0 Then 'found some files ?
arrFiles = CollectionToArray(colImages) 'get array from Collection
SortSpecial arrFiles, "SortVal" 'sort files using `Val()`
For Each f In arrFiles 'loop the sorted array
With Selection
.Text = f
.MoveDown wdLine
.InlineShapes.AddPicture xPath & f, False, True
.InsertAfter vbCrLf
.MoveDown wdLine
End With
Next f
Else
MsgBox "No image files found in selected folder"
End If
End Sub
'return a Collection of image files given a folder location
Function ImageFiles(srcFolder As String) As Collection
Dim col As New Collection, f As String
f = Dir(srcFolder & "*.*")
Do While f <> ""
Select Case UCase(Right(f, 3))
Case "PNG", "TIF", "JPG", "GIF", "BMP"
col.Add f
End Select
f = Dir()
Loop
Set ImageFiles = col
End Function
'create and return a string array from a Collection
Function CollectionToArray(col As Collection) As String()
Dim arr() As String, i As Long
ReDim arr(1 To col.Count)
For i = 1 To col.Count
arr(i) = col(i)
Next i
CollectionToArray = arr
End Function
'Sorts an array using some specific translation defined in `func`
Sub SortSpecial(list, func As String)
Dim First As Long, Last As Long, i As Long, j As Long, tmp, arrComp()
First = LBound(list)
Last = UBound(list)
'fill the "compare array...
ReDim arrComp(First To Last)
For i = First To Last
arrComp(i) = Application.Run(func, list(i))
Next i
'now sort by comparing on `arrComp` not `list`
For i = First To Last - 1
For j = i + 1 To Last
If arrComp(i) > arrComp(j) Then
tmp = arrComp(j) 'swap positions in the "comparison" array
arrComp(j) = arrComp(i)
arrComp(i) = tmp
tmp = list(j) '...and in the original array
list(j) = list(i)
list(i) = tmp
End If
Next j
Next i
End Sub
'a function to allow comparing values based on the initial numeric part...
Function SortVal(v)
SortVal = Val(v) ' "1 day" --> 1, "11 days" --> 11 etc
End Function

VBA merge cells and **Bold** text of second cell

I am coding a VBA function to merge two cells and then highlight the text of cell 2 with bold formatting
The merging goes well
The call to Sub goes well
But the text format is not applied
I believe it might be caused by the sub executing before the cell is populated with the string - but that's pure guessing - this is my first VBA script
Function boldIt(navn As String, ekstra As String)
Dim ln1 As Integer
Dim ln2 As Integer
Dim st1 As String
ln1 = Len(navn)
ln2 = Len(navn) + Len(ekstra)
If (ln1 = ln2) Then
boldIt = navn
Else
boldIt = navn & " - " & ekstra
boldTxt ln1, ln2
End If
End Function
Public Sub boldTxt(startPos As Integer, charCount As Integer)
With ActiveCell.Characters(Start:=startPos, Length:=charCount).Font
.FontStyle = "Bold"
End With
End Sub
I think I would just run two subs; the function is not returning anything.
Option Explicit
Sub boldIt()
Dim secondOne As String
With Selection
secondOne = .Cells(2).Value2
Application.DisplayAlerts = False
.Merge
Application.DisplayAlerts = True
.Cells(1) = .Cells(1).Value2 & secondOne
boldTxt .Cells(1), Len(.Cells(1).Value2) - Len(secondOne) + 1, Len(secondOne)
End With
End Sub
Public Sub boldTxt(rng As Range, startPos As Integer, charCount As Integer)
With rng.Characters(Start:=startPos, Length:=charCount).Font
.FontStyle = "Bold"
End With
End Sub
This Sub loops through the columns, takes the strings of the two cells, combines the strings and add them to the target cell, while bolding the text of the second cell
Thanks to #Jeeped for the pointers!
Sub boldIt()
Dim pos_bold As Integer
Dim celltxt As String
For i = 2 To 200000
' first cell will always be populated - if not - exit
If (Range("Plan!E" & i).Value = "") Then Exit For
' if second cell is empty - take only first cell as normal txt
If (Range("Plan!F" & i).Value = "") Then
Range("Kalender!F" & i).Value = Range("Plan!E" & i).Value
Else
' calculate start of bold text
pos_bold = Len(Range("Plan!E" & i).Value) + 1
' create the string
celltxt = Range("Plan!E" & i).Value & " - " & Range("Plan!F" & i).Value
' add string to field and add bold to last part
With Worksheets("Kalender").Range("F" & i)
.Value = celltxt
.Characters(pos_bold).Font.Bold = True
End With
End If
Next i
End Sub

Identify it then Move It (Macro)

I had this project in Chemistry to supply a list of Compound elements
now I had found a website where it gives me a very long list of elements:
I had made this Code but it Doesn't Work
Sub move()
Dim list As Range
Set list = Range("A1:A2651")
For Each Row In list.Rows
If (Row.Font.Regular) Then
Row.Cells(1).Offset(-2, 1) = Row.Cells(1)
End If
Next Row
End Sub
Can you make it run for me? you can have your own algorithm ofc.
Assuming the list is constantly in the same format (i.e. Compound name, empty line, Compound Symbols, empty line) this quick code will work:
Sub move()
Dim x As Integer
x = 3
With ActiveSheet
Do Until x > 2651
.Cells(x - 2, 2).Value = .Cells(x, 1).Value
.Cells(x, 1).ClearContents
x = x + 4
Loop
End With
End Sub
After running you can then just sort columns A:B to remove the blanks.
After trying your original code I realised the problem was with the .regular property value. I've not seen .regular before, so swapped it to NOT .bold instead, and to ignore blank entries, then added the line for clearing the contents of the cell copied. This is most like the original code for reference:
Sub get_a_move_on()
Dim list As Range
Set list = ActiveSheet.Range("A1:A2561")
For Each Row In list.Rows
If Row.Font.Bold = False And Row.Value <> "" Then
Row.Cells(1).Offset(-2, 1) = Row.Cells(1)
Row.Cells(1).ClearContents
End If
Next Row
End Sub
P.S it's a list of compounds, not elements, there's only about 120 elements in the periodic table! ;)
Another way to retrieve the data you need via XHR and RegEx:
Sub GetChemicalCompoundsNames()
Dim sRespText As String
Dim aResult() As String
Dim i As Long
' retrieve HTML content
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://quizlet.com/18087424", False
.Send
sRespText = .responseText
End With
' regular expression for rows
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "qWord[^>]*?>([\s\S]*?)<[\s\S]*?qDef[^>]*?>([\s\S]*?)<"
With .Execute(sRespText)
ReDim aResult(1 To .Count, 1 To 2)
For i = 1 To .Count
With .Item(i - 1)
aResult(i, 1) = .SubMatches(0)
aResult(i, 2) = .SubMatches(1)
End With
Next
End With
End With
' output to the 1st sheet
With Sheets(1)
.Cells.Delete
Output .Range("A1"), aResult
End With
End Sub
Sub Output(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1 _
)
.NumberFormat = "#"
.Value = aCells
.Columns.AutoFit
End With
End With
End Sub
Gives output (663 rows total):

Improve redline comparison of cells

I am using Excel 2010.
I have some working VBA code that compares two cells (from text, to text) and generates the redlined text into a third cell with strikethroughs on removed words, underlines on added words. This is not a straight combination of the contents of the cells.
The code works, but I think it can be more efficient with the use of multidimensional arrays to store things instead of using additional cells and recombining. But I am stuck on how to implement it. I would also like to determine where the breaking point is, especially for newer versions of Excel that I don't have yet, since the number of characters allowed in a cell seems to continually grow with every new release.
Comments are also welcome.
The working code:
Sub main()
Cells(3, 3).Clear
Call Redline(3)
End Sub
Sub Redline(ByVal r As Long)
Dim t As String
Dim t1() As String
Dim t2() As String
Dim i As Integer
Dim j As Integer
Dim f As Boolean
Dim c As Integer
Dim wf As Integer
Dim ss As Integer
Application.ScreenUpdating = False
t1 = Split(Range("A" + CStr(r)).Value, " ", -1, vbTextCompare)
t2 = Split(Range("B" + CStr(r)).Value, " ", -1, vbTextCompare)
t = ""
f = False
c = 4
ss = 0
If (Range("A" + CStr(r)).Value <> "") Then
If (Range("B" + CStr(r)).Value <> "") Then
j = 1
For i = LBound(t1) To UBound(t1)
f = False
For j = ss To UBound(t2)
If (t1(i) = t2(j)) Then
f = True
wf = j
Exit For
End If
Next j
If (Not f) Then
Cells(r, c).Value = t1(i)
Cells(r, c).Font.Strikethrough = True ' strikethrough this cell
c = c + 1
Else
If (wf = i) Then
Cells(r, c).Value = t1(i) ' aka t2(wf)
c = c + 1
ss = i + 1
ElseIf (wf > i) Then
For j = ss To wf - 1
Cells(r, c).Value = t2(j)
Cells(r, c).Font.Underline = xlUnderlineStyleSingle ' underline this cell
c = c + 1
Next j
Cells(r, c).Value = t1(i)
c = c + 1
ss = wf + 1
End If
End If
Next i
If (UBound(t2) > UBound(t1)) Then
For i = ss To UBound(t2)
Cells(r, c).Value = t2(i)
Cells(r, c).Font.Underline = xlUnderlineStyleSingle ' underline this cell
c = c + 1
Next i
End If
Else
t = Range("A" + CStr(r)).Value
End If
Else
t = Range("B" + CStr(r)).Value
End If
lc = Range("XFD" + CStr(r)).End(xlToLeft).Column
Call Merge_Cells(r, 4, lc)
Application.ScreenUpdating = True
End Sub
Sub Merge_Cells(ByVal r As Long, ByVal fc As Integer, ByVal lc As Long)
Dim i As Integer, c As Integer, j As Integer
Dim rngFrom As Range
Dim rngTo As Range
Dim lenFrom As Integer
Dim lenTo As Integer
Set rngTo = Cells(r, 3)
' copy the text over
For c = fc To lc
lenTo = rngTo.Characters.Count
Set rngFrom = Cells(r, c)
lenFrom = rngFrom.Characters.Count
If (c = lc) Then
rngTo.Value = rngTo.Text & rngFrom.Text
Else
rngTo.Value = rngTo.Text & rngFrom.Text & " "
End If
Next c
' now copy the formatting
j = 0
For c = fc To lc
Set rngFrom = Cells(r, c)
lenFrom = rngFrom.Characters.Count + 1 ' add one for the space after each word
For i = 1 To lenFrom - 1
With rngTo.Characters(j + i, 1).Font
.Name = rngFrom.Characters(i, 1).Font.Name
.Underline = rngFrom.Characters(i, 1).Font.Underline
.Strikethrough = rngFrom.Characters(i, 1).Font.Strikethrough
.Bold = rngFrom.Characters(i, 1).Font.Bold
.Size = rngFrom.Characters(i, 1).Font.Size
.ColorIndex = rngFrom.Characters(i, 1).Font.ColorIndex
End With
Next i
j = j + lenFrom
Next c
' wipe out the temporary columns
For c = fc To lc
Cells(r, c).Clear
Next c
End Sub
You can directly assign Excel Range object to VBA 2d-array and perform all that business logic operations on that array. It will provide substantial performance boost vs range iteration. The result values then can be inserted back into Excel worksheet column from that 2d-array.
Sample code snippet follows:
Sub Range2Array()
Dim arr As Variant
arr = Range("A:B").Value
'alternatively
'arr = Range("A:B")
'test
Debug.Print (arr(1, 1))
End Sub
Another useful technique is to assign Excel's UsedRange to VBA Array:
arr = ActiveSheet.UsedRange
Hope this may help. Best regards,
Sample code not quite right
I've got a spreadsheet with the following "original" and "changed" content:
Tesla to Begin Trial for Allowing Other Vehicles from Other Electric Vehicle Automakers to Use Tesla Superchargers
Tesla to Begin Trial for Allowing Other Vehicles from Other EV Auto Makers to Use Tesla Superchargers
Running your code, I got not-quite-right results.
The "original" text that is missing from the "changed" version is correctly shown with strikethrough, but the new text in the "changed" version is just ... missing.
Alternative approach
Poking around, it looks like you're trying to re-create MS Word's Track Changes formatting.
Why not just leverage Word?
The following VBA code does just that. This requires that your Excel VBA project has a reference to the Word object library. You can add this from within the VBA editor by clicking Tools → References, and selecting Microsoft Word XX.Y Object Library, where XX.Y is whatever version you have installed.
Public Sub CompareCells()
' ####################
' Basic Flow
'
' 1. Get the text content of the two cells to compare.
' 2. Get an open instance of MS Word, or spin up a new one.
' 3. Use Word's text-comparison features to generate the tracked-changes markup.
' 4. Copy that markup to the clipboard.
' 5. Then just paste that into our target cell.
' ####################
Const Src As String = "A" ' Column containing the original source text
Const Tgt As String = "B" ' Column containing the targeted text to compare
Const Cmp As String = "C" ' Column where we will put the marked-up comparison
Const RowToUse As Integer = 8 ' Rejigger as appropriate to your use case.
' 1.
Dim ThisSheet As Excel.Worksheet: Set ThisSheet = Excel.ActiveSheet
Dim StrSrc As String, StrTgt As String
StrSrc = ThisSheet.Range(Src & RowToUse).Value
StrTgt = ThisSheet.Range(Tgt & RowToUse).Value
' 2.
Dim Wd As Word.Application: Set Wd = GetApp("Word")
' 3.
Dim DocOrig As Word.Document, DocChgd As Word.Document, DocMarkup As Word.Document
Set DocOrig = Wd.Documents.Add(Visible:=False)
DocOrig.Content = StrSrc
Set DocChgd = Wd.Documents.Add(Visible:=False)
DocChgd.Content = StrTgt
Set DocMarkup = Wd.CompareDocuments(DocOrig, DocChgd, wdCompareDestinationNew)
' 4.
DocMarkup.Content.Copy
' 5.
ThisSheet.Range(Cmp & RowToUse).Select
ThisSheet.Paste
' Cleanup
DocOrig.Close savechanges:=False
DocChgd.Close savechanges:=False
DocMarkup.Close savechanges:=False
End Sub
Public Function GetApp(AppName As String) As Object
Dim app As Object
On Error GoTo Handler
Set app = GetObject(, AppName & ".Application")
Set GetApp = app
Exit Function
On Error GoTo 0
Handler:
If Err.Number > 0 And Err.Number <> 429 Then ' Unknown error, so error out
Err.Raise Err.Number, Err.Source, Err.Description, Err.HelpFile, Err.HelpContext
Exit Function
End If
DoEvents
' If we get here, there's no open app by that name, so start a new instance.
Set app = CreateObject(AppName & ".Application")
Set GetApp = app
End Function
When run using the same sample texts, I get the following:
This time, we get both the removed text in strikethrough, and the added text in underlining, with color coding as well.

Highlight (not delete) repeat sentences or phrases

I am getting the impression that this is not possible in word but I figure if you are looking for any 3-4 words that come in the same sequence anywhere in a very long paper I could find duplicates of the same phrases.
I copy and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document there is a lot of different formatting but I would be willing to temporarily get rid of formatting in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
SNAPSHOTS
BEFORE
AFTER
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
I chose a rather lame theory, but it seems to work (at least if I got the question right cuz sometimes I'm a slow understander).
I load the entire text into a string, load the individual words into an array, loop through the array and concatenate the string, containing each time three consecutive words.
Because the results are already included in 3 word groups, 4 word groups or more will automatically be recognized.
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = sFull_Text
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
errHandler:
Stop
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.