Identify Paragraph content with certain outline level - vba

I have a macro which pass through paragraphs of a word document. This code is intended to pass the paragraph, identify its outline level and retrieve the content when the desired paragraph outline level is found. With this information, I'm populating a listbox that will allow users to choose from what point they want to export some text in a document.
This functionality is working, however, I'm looking for a way to improve its speed. Right now I'm handling a document with 5678 paragraphs, and it is taking over 30 minutes to process all the information. Do you have any suggestion?
I had tried to approaches without having success:
1 - I've tried to use the object TableOfContents, however I was not able to have a clean information and differentiate outline levels from the paragraphs.
2 - I've tried to adapt the code from here Getting the headings from a Word document, specially because of the use of the command _docSource.GetCrossReferenceItems(wdRefTypeHeading), also with no success
Here there is the image of the form, and the code I'm using.
Sub ProcessHeaders()
Dim j As Long
Dim Paragraph_Number() As Variant
Dim Paragraph_Content() As Variant
Dim Paragraph_Mapping() As Variant
j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
For i = 1 To wordDoc.Paragraphs.Count
If wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel1 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel2 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel3 _
Or wordDoc.Paragraphs.Item(i).OutlineLevel = wdOutlineLevel4 Then
If wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString <> "" Then
ReDim Preserve Paragraph_Number(j)
ReDim Preserve Paragraph_Content(j)
Paragraph_Content(j) = wordDoc.Paragraphs.Item(i).Range.ListFormat.ListString & " " & Trim(Left(wordDoc.Paragraphs.Item(i).Range.Text, (Len(wordDoc.Paragraphs.Item(i).Range.Text) - 1)))
Paragraph_Number(j) = i
j = j + 1
End If
End If
Next i
ReDim Preserve Paragraph_Mapping(1 To UBound(Paragraph_Content), 1)
For i = 1 To UBound(Paragraph_Number)
Paragraph_Mapping(i, 0) = Paragraph_Content(i)
Paragraph_Mapping(i, 1) = Paragraph_Number(i)
Next i
.ComboBox4.List = Paragraph_Mapping
End With
End Sub
Edit 1 - I Achieve to reduce the time from 32 minutes to 8 minutes of execution with the code below. Any suggestions to improve even more? Thanks in advance
Sub ProcessHeaders()
Dim j As Long
Dim thisOutlineLevel As WdOutlineLevel
Dim thisHeader As String
Dim thisList As String
Dim ParagraphCount As Long
Dim Paragraph_Number_Base() As Variant
Dim Paragraph_Content_Base() As Variant
Dim Paragraph_ListItem_Base() As Variant
Dim ParagraphContent() As Variant
Dim ParagraphNumber() As Variant
Dim Paragraph_Mapping() As Variant
Dim StartTime As Double
Dim MinutesElapsed As String
j = 1
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
ParagraphCount = wordDoc.Paragraphs.Count
ReDim Paragraph_Content_Base(ParagraphCount + 1)
ReDim Paragraph_ListItem_Base(ParagraphCount + 1)
ReDim Paragraph_Number_Base(ParagraphCount + 1)
StartTime = Timer
For i = 1 To ParagraphCount
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Reading Paragraphs. " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & " | Time Elapsed: " _
& MinutesElapsed & " Minutes"
With wordDoc.Paragraphs.Item(i)
Select Case .OutlineLevel
Case wdOutlineLevelBodyText
GoTo ResumeNext
Case wdOutlineLevel1, wdOutlineLevel2, wdOutlineLevel3, wdOutlineLevel4
Paragraph_Content_Base(i) = .Range.Text
Paragraph_ListItem_Base(i) = .Range.ListFormat.ListString
Paragraph_Number_Base(i) = i
End Select
End With
ResumeNext:
Next i
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = ParagraphCount & " read on " & MinutesElapsed & " Minutes. Now, identifying the Headers"
For i = 0 To UBound(Paragraph_Content_Base)
If Paragraph_Content_Base(i) <> "" And Paragraph_ListItem_Base(i) <> "" Then
ReDim Preserve ParagraphContent(j)
ReDim Preserve ParagraphNumber(j)
ParagraphContent(j) = Trim(Paragraph_ListItem_Base(i)) & " " & Trim(Left(Paragraph_Content_Base(i), Len(Paragraph_Content_Base(i)) - 1))
ParagraphNumber(j) = Paragraph_Number_Base(i)
j = j + 1
End If
Next i
Erase Paragraph_Content_Base
Erase Paragraph_ListItem_Base
Erase Paragraph_Number_Base
ReDim Preserve Paragraph_Mapping(1 To UBound(ParagraphContent), 1)
For i = 1 To UBound(ParagraphNumber)
Paragraph_Mapping(i, 0) = ParagraphContent(i)
Paragraph_Mapping(i, 1) = ParagraphNumber(i)
Next i
.ComboBox4.List = Paragraph_Mapping
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
UserForm1.Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
End With
Edit 2 - With the Help of Cindy, the code which was initially running in 32 minutes right now is running on 32 seconds. Here is the final Code.
Sub ProcessHeaders()
Dim rng As Word.Range
Dim para As Word.Paragraph
Dim lstFormat As Word.ListFormat
Dim paraNr() As Variant
Dim paraContent() As Variant
Dim counter As Long, paraIndex As Long
Dim Paragraph_Mapping() As Variant
Dim ParagraphCount As Long
Dim i, j As Long
Dim StartTime As Double
Dim StartRealTime As Date
Dim MinutesElapsed As String
With UserForm1
If .ComboBox4.ListCount > 0 Then
.ComboBox4.Clear
End If
counter = 1
paraIndex = 1
i = 0
j = 1
StartTime = Timer
StartRealTime = Now
Set rng = wordDoc.Content
ParagraphCount = rng.ListParagraphs.Count
For Each para In rng.ListParagraphs
i = i + 1
Set lstFormat = para.Range.ListFormat
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
.Label7.Caption = "Reading Paragraphs. " & Format(i / ParagraphCount, "0%") & " | Total of Paragraphs Found: " & ParagraphCount & _
" | Start Time: " & StartRealTime & " | Time Elapsed: " & MinutesElapsed & " Minutes"
'CheckOutLine = rng.ListParagraphs.Item(1).OutlineLevel
If lstFormat.ListString <> "" And Len(lstFormat.ListString) >= 2 Then
ReDim Preserve paraNr(counter)
ReDim Preserve paraContent(counter)
paraContent(counter) = lstFormat.ListString & " " _
& Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
paraNr(counter) = i
wordDoc.Bookmarks.Add Name:="ExpContent" & i, Range:=para.Range
counter = counter + 1
End If
paraIndex = paraIndex + 1
Next
j = 1
ReDim Preserve Paragraph_Mapping(1 To UBound(paraNr), 1)
For i = UBound(paraNr) To 1 Step -1
Paragraph_Mapping(j, 0) = paraContent(i)
Paragraph_Mapping(j, 1) = paraNr(i)
j = j + 1
Next i
.ComboBox4.List = Paragraph_Mapping
MinutesElapsed = Format((Timer - StartTime) / 86400, "hh:mm:ss")
.Label7.Caption = "Identifying Headers: " & j & " identified. Total Time: " & MinutesElapsed & " minutes"
End With
'
' For counter = 1 To UBound(paraNr)
' Debug.Print paraNr(counter) & vbTab & paraContent(counter)
' Next
End Sub
And After the user choose the paragraph, the bookmarks are being managed by this call
With objWord.Selection
BookmarkID = "ExpContent" & PositionReference
wordDoc.Bookmarks(BookmarkID).Select
.InsertParagraphBefore
End With
Once again, thank you

I think the fastest approach is going to be looping only the numbered paragraphs, rather than all paragraphs. This can be done using the ListParagraphs object. For example:
Sub IdOutlineLevels()
Dim rng As word.Range
Dim para As word.Paragraph
Dim lstFormat As word.ListFormat
Dim paraNr() As Variant
Dim paraContent() As Variant
Dim counter As Long, paraIndex As Long
counter = 1
paraIndex = 1
Set rng = ActiveDocument.content
For Each para In rng.ListParagraphs
Set lstFormat = para.Range.ListFormat
Select Case lstFormat.ListLevelNumber
Case 1, 2, 3, 4
If lstFormat.ListString <> "" Then
ReDim Preserve paraNr(counter)
ReDim Preserve paraContent(counter)
paraContent(counter) = lstFormat.ListString & " " _
& Trim(Left(para.Range.Text, (Len(para.Range.Text) - 1)))
paraNr(counter) = paraIndex
counter = counter + 1
ActiveDocument.Bookmarks.Add Name:="ExpContent" & counter, Range:=para.Range
End If
Case Else
End Select
paraIndex = paraIndex + 1
Next
For counter = 1 To UBound(paraNr)
Debug.Print paraNr(counter) & vbTab & paraContent(counter)
Next
End Sub
Rather than relying on the index number of the paragraph in the document to locate the paragraph again I've added bookmarks to each of the paragraphs using the same "counter" as the paragraph number. This is the technique Word, itself, uses for cross-referencing.

Related

How to find the first incident of any signature in a list/array within an email?

I want to give credit to an agent, if they're the one that sent the message, but only if their signature is at the top of the email.
Here is what I have. The search order is off. The code searches for one name at a time, and clear through the document. I need it to search for All names, the first one that hits in the body of the email.
Sub CountOccurences_SpecificText_In_Folder()
Dim MailItem As Outlook.MailItem
Dim strSpecificText As String
Dim tmpStr As String
Dim x As Integer
Dim Count As Integer
Dim HunterCnt As Integer
Dim SunmolaCnt As Integer
Dim RodriguezCnt As Integer
Dim MammedatyCnt As Integer
Dim MitchellCnt As Integer
Dim TannerCnt As Integer
Dim TAYLORCnt As Integer
Dim WilsonCnt As Integer
Dim WilliamsCnt As Integer
Dim GrooverCnt As Integer
Dim TyreeCnt As Integer
Dim ChapmanCnt As Integer
Dim LukerCnt As Integer
Dim KlinedinstCnt As Integer
Dim HicksCnt As Integer
Dim NATHANIALCnt As Integer
Dim SkinnerCnt As Integer
Dim SimonsCnt As Integer
Dim AgentNames(14) As Variant
AgentNames(0) = "Simons"
AgentNames(1) = "Skinner"
AgentNames(2) = "Mammedaty"
AgentNames(3) = "Hunter"
AgentNames(4) = "Sunmola"
AgentNames(5) = "Rodriguez"
AgentNames(6) = "Mitchell"
AgentNames(7) = "Tanner"
AgentNames(8) = "Taylor"
AgentNames(9) = "Wilson"
AgentNames(10) = "Williams"
AgentNames(11) = "Groover"
AgentNames(12) = "Tyree"
AgentNames(13) = "Chapman"
AgentNames(14) = "Luker"
x = 0
While x < ActiveExplorer.Selection.Count
x = x + 1
Set MailItem = ActiveExplorer.Selection.item(x)
tmpStr = MailItem.Body
For Each Agent In AgentNames
If InStr(tmpStr, Agent) <> 0 Then
If Agent = "Assunta" Then
HunterCnt = HunterCnt + 1
GoTo skip
End If
If Agent = "Sunmola" Then
SunmolaCnt = SunmolaCnt + 1
GoTo skip
End If
If Agent = "Rodriguez" Then
RodriguezCnt = RodriguezCnt + 1
GoTo skip
End If
If Agent = "Mammedaty" Then
MammedatyCnt = MammedatyCnt + 1
GoTo skip
End If
If Agent = "Mitchell" Then
MitchellCnt = MitchellCnt + 1
GoTo skip
End If
If Agent = "Tanner" Then
TannerCnt = TannerCnt + 1
GoTo skip
End If
If Agent = "Taylor" Then
TAYLORCnt = TAYLORCnt + 1
GoTo skip
End If
If Agent = "Wilson" Then
WilsonCnt = WilsonCnt + 1
GoTo skip
End If
If Agent = "Williams" Then
WilliamsCnt = WilliamsCnt + 1
GoTo skip
End If
If Agent = "Groover" Then
GrooverCnt = GrooverCnt + 1
GoTo skip
End If
If Agent = "Tyree" Then
TyreeCnt = TyreeCnt + 1
GoTo skip
End If
If Agent = "Chapman" Then
ChapmanCnt = ChapmanCnt + 1
GoTo skip
End If
If Agent = "Luker" Then
LukerCnt = LukerCnt + 1
GoTo skip
End If
If Agent = "Hicks" Then
HicksCnt = HicksCnt + 1
GoTo skip
End If
End If
Next
skip:
Count = Count + 1
Wend
MsgBox "Found " & vbCrLf & "Hunter Count: " & HunterCnt & vbCrLf & "Sunmola Count: " & SunmolaCnt & vbCrLf & "Rodriguez Count: " & RodriguezCnt & vbCrLf & "Mammedaty Count: " & MammedatyCnt & vbCrLf & "Mitchell Count: " & MitchellCnt & vbCrLf & "Tanner Count: " & TannerCnt & vbCrLf & "Taylor Count: " & TAYLORCnt & vbCrLf & "Wilson Count: " & WilsonCnt & vbCrLf & "Williams Count: " & WilliamsCnt & vbCrLf & "Groover Count: " & GrooverCnt & vbCrLf & "Tyree Count: " & TyreeCnt & vbCrLf & "Chapman Count: " & ChapmanCnt & vbCrLf & "Luker Count: " & LukerCnt & vbCrLf & " in: " & Count & " emails"
End Sub
InStr returns positional information. While it is difficult to find the first occurrence of an array member within the text (you would need to build and compare matches), you can find the first position of each name then find which came first.
For example (untested)
Sub CountOccurences_SpecificText_In_Folder()
Dim MailItem As Outlook.MailItem
Dim i As Long, x As Long, position As Long, First As Long
Dim AgentNames() As String
AgentNames = Split("Simons,Skinner,Mammedaty,Hunter,Sunmola,Rodriguez,Mitchell,Tanner,Taylor,Wilson,Williams,Groover,Tyree,Chapman,Luker", ",")
Dim AgentCount(LBound(AgentNames) To UBound(AgentNames)) As Long
For i = LBound(AgentCount) To UBound(AgentCount)
AgentCount(i) = 0
Next i
For Each MailItem In ActiveExplorer.Selection
x = 0
For i = LBound(AgentNames) To UBound(AgentNames)
position = InStr(MailItem.Body, AgentNames(i))
If x > 0 Then
If position < x Then
x = position
First = i
End If
Else
If position > 0 Then
x = position
First = i
End If
End If
Next i
AgentCount(First) = AgentCount(First) + 1
Next MailItem
For i = LBound(AgentNames) To UBound(AgentNames)
Debug.Print AgentNames(i) & " Count: " & AgentCount(i)
Next i
End Sub
The idea in the previous answer may be better implemented like this:
Option Explicit
Sub CountOccurences_SpecificText_SelectedItems()
Dim objItem As Object
Dim objMail As MailItem
Dim i As Long
Dim j As Long
Dim x As Long
Dim position As Long
Dim First As Long
Dim AgentNames() As String
AgentNames = Split("Simons,Skinner,Mammedaty,Hunter,Sunmola,Rodriguez,Mitchell,Tanner,Taylor,Wilson,Williams,Groover,Tyree,Chapman,Luker", ",")
ReDim AgentCount(LBound(AgentNames) To UBound(AgentNames)) As Long
For j = 1 To ActiveExplorer.Selection.Count
Set objItem = ActiveExplorer.Selection(j)
' Verify before attempting to return mailitem poroperties
If TypeOf objItem Is MailItem Then
Set objMail = objItem
Debug.Print
Debug.Print "objMail.Subject: " & objMail.Subject
x = Len(objMail.Body)
For i = LBound(AgentNames) To UBound(AgentNames)
Debug.Print
Debug.Print "AgentNames(i): " & AgentNames(i)
position = InStr(objMail.Body, AgentNames(i))
Debug.Print " position: " & position
If position > 0 Then
If position < x Then
x = position
First = i
End If
End If
Debug.Print "Lowest position: " & x
Debug.Print " Current first: " & AgentNames(First)
Next i
If x < Len(objMail.Body) Then
AgentCount(First) = AgentCount(First) + 1
Debug.Print
Debug.Print AgentNames(First) & " was found first"
Else
Debug.Print "No agent found."
End If
End If
Next
For i = LBound(AgentNames) To UBound(AgentNames)
Debug.Print AgentNames(i) & " Count: " & AgentCount(i)
Next i
End Sub

How can I prevent VBA code, used to draw up a list of unique words from a word document, from slowing down as the document get's longer

I have used some code from the internet, modified slightly for my specific use case, to draw up a list of unique words from a word document, the code works without a problem, but the time to execute the code seems to grow exponentially as the document length increases. Can anyone give me any suggestions to speed up the code when working with very long documents?
Sub UniqueWordList()
Dim wList As New Collection
Dim wrd
Dim chkwrd
Dim sTemp As String
Dim k As Long
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 0
For Each wrd In ActiveDocument.Range.Words
cWrd = cWrd + 1
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating: " & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
k = 0
For Each chkwrd In wList
k = k + 1
If chkwrd = sTemp Then GoTo nw
If chkwrd > sTemp Then
wList.Add Item:=sTemp, Before:=k
GoTo nw
End If
Next chkwrd
wList.Add Item:=sTemp
End If
nw:
Next wrd
sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & wList.Count & " unique words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
For Each chkwrd In wList
Selection.TypeText chkwrd & vbCrLf
Next chkwrd
End Sub
After some suggestions I modified my code to use a scripting dictionary, this however does not seem to have solved the problem. Also to answer the concern regarding my message at the end, I understand that the wording is off, what I want is a list of words from the document but each word only once.
Sub UniqueWordListMi()
Dim wList() As String
Dim sTemp As String
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Dim IsInArray As Boolean
Dim arrsize As Long
Dim rra2 As Variant
arrsize = 0
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 1
ReDim Preserve wList(0 To arrsize)
wList(arrsize) = "UNQ"
For Each wrd In ActiveDocument.Range.Words
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating" & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
ReDim Preserve wList(0 To arrsize)
wList(arrsize) = sTemp
arrsize = arrsize + 1
End If
nw:
cWrd = cWrd + 1
Next wrd
Set Dict = CreateObject("scripting.dictionary")
For i = 0 To UBound(wList)
If (Not Dict.Exists(CStr(wList(i)))) Then Dict.Add CStr(wList(i)), wList(i) 'Next i
Next i
rra2 = Dict.Items
sTemp = "There are " & ActiveDocument.Range.Words.Count & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & UBound(wList) & " unique words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
For u = 0 To UBound(rra2)
Selection.TypeText vbCrLf & rra2(u) & vbCrLf
Next u
End Sub
#AlexK beat me to it with a comment on using a Scripting.Dictionary.
Something like this might help
Option Explicit
Public Function CountUniqueWords(ByRef ipRange As Word.Range) As Scripting.Dictionary
Dim myUniqueWords As Scripting.Dictionary
Set myUniqueWords = New Scripting.Dictionary
Dim myPara As Variant
For Each myPara In ipRange.Paragraphs
Dim myWord As Variant
For Each myWord In Split(myPara.Range.Text)
If myUniqueWords.Exists(myWord) Then
myUniqueWords.Item(myWord) = myUniqueWords.Item(myWord) + 1
Else
myUniqueWords.Add myWord, 1
End If
Next
Next
Set CountUniqueWords = myUniqueWords
End Function
Some polishing might be required to meet your specific requirements.
You can't help some increase in processing time as the document gets longer but as the access to the document is limited to paragraphs rather than words is should proceed somewhat faster.
Try the following code. It uses the dictionary directly with the rules of your code.
Note that this will only improve your code. But still, the longer the document will get, the more words need to be checked and the more time it will need. That fact will not change, you can just optimize it by using the dictionary directly but more words need more time to check.
Option Explicit
Sub UniqueWordListMi()
Dim wList As Object
Set wList = CreateObject("scripting.dictionary")
Dim sTemp As String
Dim cWrd As Long
Dim tWrd As Long
Dim nWrd As String
Dim Flag As Boolean
Dim IsInArray As Boolean
Dim arrsize As Long
Dim rra2 As Variant
arrsize = 0
Flag = False
tWrd = ActiveDocument.Range.Words.Count
cWrd = 1
Dim wrd As Variant
For Each wrd In ActiveDocument.Range.Words
If cWrd Mod 100 = 0 Then
Application.StatusBar = "Updating" & (cWrd)
End If
If Flag Then
Flag = False
GoTo nw
End If
If cWrd < tWrd Then
nWrd = ActiveDocument.Words(cWrd + 1)
nWrd = Trim(LCase(nWrd))
End If
sTemp = Trim(LCase(wrd))
If sTemp = "‘" Then
sTemp = sTemp & nWrd
Flag = True
End If
If sTemp Like "*[a-zA-Z]*" Then
If Not wList.Exists(sTemp) Then
wList.Add sTemp, 1
Else
wList.Item(sTemp) = wList.Item(sTemp) + 1
End If
cWrd = cWrd + 1
End If
nw:
Next wrd
sTemp = "There are " & (cWrd - 1) & " words "
sTemp = sTemp & "in the document, before this summary, but there "
sTemp = sTemp & "are only " & wList.Count & " distinct words."
ActiveDocument.Range.Select
Selection.Collapse Direction:=wdCollapseEnd
Selection.TypeText vbCrLf & sTemp & vbCrLf
Dim chkwrd As Variant
For Each chkwrd In wList
Selection.TypeText chkwrd & vbTab & wList.Item(chkwrd) & " times" & vbCrLf
Next chkwrd
End Sub
The following example:
This is an example test where every word is unique except one.
There are 12 words in the document, before this summary, but there are only 11 distinct words.
this 1 times
is 2 times
an 1 times
example 1 times
test 1 times
where 1 times
every 1 times
word 1 times
unique 1 times
except 1 times
one 1 times
With everyone's help and some additional reading, as well as some help from a reddit user this code work's perfectly:
Sub UniqueWordListFast()
Dim WordDictionary As Object
Dim SourceText As Document
Dim objWord As Object
Dim sTemp As String, strWord As String, nxtWord As String
Dim count As Long
count = 0
Set WordDictionary = CreateObject("Scripting.Dictionary")
Set SourceText = Application.ActiveDocument
For Each objWord In SourceText.Range.Words
count = count + 1
strWord = Trim(objWord.Text)
If strWord = nxtWord Then GoTo nw
If strWord Like "*[a-z]*" Then WordDictionary(strWord) = strWord
If strWord Like "‘" Then
nxtWord = Trim(SourceText.Words(count + 1))
strWord = strWord & nxtWord
WordDictionary(strWord) = strWord
End If
nw:
Next
sTemp = "[DOCUMENT] " & vbTab & SourceText.Name & vbCrLf & vbCrLf & _
"There are " & SourceText.Range.Words.count & " words in the document, " & _
"before this summary, but there are only " & WordDictionary.count & " unique words."
Dim NewDocument As Document
Set NewDocument = Documents.Add
NewDocument.Range.Text = sTemp & vbCrLf & Join(WordDictionary.Keys, vbCrLf)
End Sub
Extremely fast and efficient. Thank you everyone!

Rename PDF files with VBA

I am trying to rename some pdf files with this kind of name: "2020-01-24-GOOGLE.NY-JPM-XXXXXXXXX.pdf"
into: "2020 01 24 - GOOGLE - JPM - 30p.pdf" with 30p meaning 30 pages (the number of pages in the pdf file).
The structure of the name is always the same, only the letters / numbers change.
I have already prepared some code (that you can find below), yet I am struggling with two things:
How can I "extract" the Broker name, (here JPM)
How can I get the number of pages in the pdf ? I have seen some solutions on the forum requiring Adobe Pro, yet I do not have access to it
Do you have any ideas to solve this problem ?
Here is the code:
Sub FetchName()
Dim nameArray() As Variant
Dim renameArray() As Variant
Dim myPath As String
Dim myFile As String
Dim r As Integer
Dim Year As String
Dim Month As String
Dim Day As String
Dim Company As String
Dim Broker As String
Dim NPage As String
Dim numElements As Integer
Dim s As Integer
Dim t As Integer
Dim AcroDoc As Object
Dim StartNum As Integer
Dim numCar As Integer
'get two inputs
myPath = Worksheets("Cover").Cells(3, 4)
Company = Worksheets("Cover").Cells(3, 2)
'get names in an array
myFile = Dir(myPath & "*.pdf")
r = 1
Do While myFile <> ""
ReDim Preserve nameArray(r)
nameArray(UBound(nameArray)) = myFile
r = r + 1
myFile = Dir
Loop
numElements = UBound(nameArray) - LBound(nameArray) + 1
'prepare array with new names
s = 1
For s = 1 To numElements
Year = Left(nameArray(s), 4)
Month = Mid(nameArray(s), 6, 2)
Day = Mid(nameArray(s), 9, 2)
StartNum = InStr(1, Replace(nameArray(s), "-", "~", 4), "~")
numCar = InStr(1, Replace(nameArray(s), "-", "~", 5), "~") - InStr(1, Replace(nameArray(s), "-", "~", 4), "~") + 1
Broker = Mid(nameArray(s), StartNum, numCar)
'numpage
'ReDim Preserve renameArray(r)
'renameArray(UBound(renameArray)+1) = Year & " " & Month & " " & Day & " - " & Company & " - " & Broker & " - " & NPage & "p"
s = s + 1
Next s
'rename files with renameArray
t = 1
For t = 1 To numElements
Name myPath & nameArray(1) As myPath & renameArray(1)
t = t + 1
Next t
End Sub
enter code here
For the Broker name, you can use InStrRev to search for the position of the last and second last dashes:
namePDF = "2020-01-24-GOOGLE.NY-JPM-XXXXXXXXX.pdf"
lastDashAt = InStrRev(namePDF, "-")
secondLastDashAt = InStrRev(namePDF, "-", lastDashAt - 1)
Broker = Mid(namePDF, secondLastDashAt + 1, lastDashAt - secondLastDashAt - 1)

Excel VBA - Split a cell into 1000 pieces and copy them into different cells

I was wondering if there is a way to split a cell with for example 6000 words into 1000 word pieces. So for example, 1000 words in cell C1, then the next 1000 words in C2 and so on.
Here is the code I have so far.
The output of that code (Cell C1) should be split, with C6 with 1000 words, C7 with 1000 words and so on until no more words are available.
Thank you in advance!
Option Explicit
Option Base 1
Dim dStr As String, aCell As Range
Dim cet, i As Long
Sub countWords()
Application.ScreenUpdating = False
Dim iniWords As Long, lWords As Long
Dim wK As Worksheet
On Error GoTo Err
Set wK = ActiveSheet
dStr = Join(WorksheetFunction.Transpose(wK.Range("A1:A" & wK.Range("A" & Rows.Count).End(xlUp).Row).Value), " ")
'iniWords = WorksheetFunction.CountA(wK.Range("A1:A" & wK.Rows.Count))
cet = Split(dStr, " ")
iniWords = UBound(cet)
wK.Range("A1:A" & wK.Range("A" & Rows.Count).End(xlUp).Row).RemoveDuplicates Columns:=1, Header:=xlNo
'lWords = WorksheetFunction.CountA(wK.Range("A1:A" & wK.Rows.Count))
dStr = Join(WorksheetFunction.Transpose(wK.Range("A1:A" & wK.Range("A" & Rows.Count).End(xlUp).Row).Value), " ")
cet = Split(dStr, " ")
dStr = ""
For i = LBound(cet) To UBound(cet)
If Trim(cet(i)) <> "" And InStr(dStr, Trim(cet(i))) = 0 Then
dStr = Trim(dStr) & " " & Trim(cet(i))
End If
Next i
dStr = Trim(dStr)
cet = Split(dStr, " ")
lWords = UBound(cet)
wK.Range("C1") = dStr
Application.ScreenUpdating = True
MsgBox "Words: " & iniWords & vbNewLine & _
"Removed duplicates " & iniWords - lWords & vbNewLine & _
"Remaining Words " & lWords
Exit Sub
Err:
MsgBox "There is no data in row A"
End Sub
you could use this:
Option Explicit
Sub main()
Const NWORDS As Long = 100 '<--| it's the number of words you want each cell to be written with. change it to your needs
Dim strng As String
Dim rowOffset As Long
With Range("C1")
strng = .Value
rowOffset = 5 '<--| point to C7 at the first iteration
Do
strng = Replace(strng, " ", "|", , NWORDS) '<--| "mark" the first NWORDS with a different separator (be sure pipe ("|") is not a character you can have in your words)
.Offset(rowOffset).Value = Replace(Left(strng, InStrRev(strng, "|") - 1), "|", " ") '<--| write first NWORDS words in current 'rowoffset' cell
strng = Right(strng, Len(strng) - InStrRev(strng, "|"))
rowOffset = rowOffset + 1 '<--| update row offset
Loop While UBound(Split(strng, " ")) > NWORDS - 1
.Offset(rowOffset).Value = strng
End With
End Sub

Improve VBA flexibility to convert VLOOKUP to INDEX/MATCH

After all my searching for code to read in a VLOOKUP formula and converting it to INDEX/MATCH came up empty, I wrote some myself.
However, the code (below) is lacking some of the flexibility I would like, but I can't seem to figure out how to make it work. Specifically, I would like to test each range criterion in the VLOOKUP formula for being an absolute reference or not, i.e. preceded by $, and carry that through to the INDEX/MATCH formula that results. For example, the formula =VLOOKUP(A2,$A$1:B$11,2,FALSE) should convert to =INDEX(B$1:B$11,MATCH(A2,$A1:$A11,0)).
NOTE: This sub depends on two functions (ColumnLetterToNumber and ColumnNumberToLetter). As their names imply they take column letters or numbers and interconvert them. Both these functions are short, simple, and work without problems. However, if anyone believes that the code to one or both of them would be helpful, I would be happy to provide them.
Additionally, any ideas on improving code readability and/or execution efficiency would also be appreciated.
Option Explicit
Public Sub ConvertToIndex()
Dim booLookupType As Boolean
Dim booLeftOfColon As Boolean
Dim booHasRowRef As Boolean
Dim lngStartCol As Long
Dim lngRefCol As Long
Dim lngStart As Long
Dim lngEnd As Long
Dim lngMatchType As Long
Dim lngInt As Long
Dim lngRowRef As Long
Dim strRefCol As String
Dim strOldFormula As String
Dim strNewFormula As String
Dim strLookupCell As String
Dim strValueCol As String
Dim strMatchCol As String
Dim strStartRow As String
Dim strEndRow As String
Dim strCheck As String
Dim strLookupRange As String
Dim strTabRef As String
Dim strSheetRef As String
Dim rngToMod As Range
Dim rngModCell As Range
Set rngToMod = Selection
For Each rngModCell In rngToMod
strOldFormula = rngModCell.Formula
lngStart = InStrRev(strOldFormula, "VLOOKUP(")
If lngStart > 0 Then
lngStart = InStr(lngStart, strOldFormula, "(") + 1
lngEnd = InStr(lngStart, strOldFormula, ",")
strLookupCell = Mid(strOldFormula, lngStart, lngEnd - lngStart)
lngStart = lngEnd + 1
lngEnd = InStr(lngStart, strOldFormula, ",")
strLookupRange = Mid(strOldFormula, lngStart, lngEnd - lngStart)
lngStart = lngEnd + 1
lngEnd = InStr(lngStart, strOldFormula, ",")
lngRefCol = CInt(Mid(strOldFormula, lngStart, lngEnd - lngStart))
lngStart = lngEnd + 1
lngEnd = InStr(lngStart, strOldFormula, ")")
booLookupType = (Mid(strOldFormula, lngStart, lngEnd - lngStart) = "TRUE")
If booLookupType Then
lngMatchType = 1
Else
lngMatchType = 0
End If
booLeftOfColon = True
lngEnd = InStr(1, strLookupRange, "]")
If lngEnd > 0 Then
strSheetRef = Left(strLookupRange, lngEnd)
strLookupRange = Right(strLookupRange, Len(strLookupRange) - lngEnd)
Else
strSheetRef = ""
End If
lngEnd = InStr(1, strLookupRange, "!")
If lngEnd > 0 Then
strTabRef = Left(strLookupRange, lngEnd)
strLookupRange = Right(strLookupRange, Len(strLookupRange) - lngEnd)
Else
strTabRef = ""
End If
For lngInt = 1 To Len(strLookupRange)
strCheck = Mid(strLookupRange, lngInt, 1)
Select Case True
Case strCheck = ":"
booLeftOfColon = False
Case booLeftOfColon
If IsNumeric(strCheck) Then
strStartRow = strStartRow & strCheck
Else
strMatchCol = strMatchCol & strCheck
End If
Case Else
If IsNumeric(strCheck) Then strEndRow = strEndRow & strCheck
End Select
Next lngInt
strMatchCol = Replace(strMatchCol, "$", "")
lngStartCol = ColumnLetterToNumber(strMatchCol)
strValueCol = ColumnNumberToLetter(lngStartCol + lngRefCol - 1)
If Len(strStartRow) > 0 Then strStartRow = "$" & strStartRow
If Len(strEndRow) > 0 Then strEndRow = "$" & strEndRow
strValueCol = strSheetRef & strTabRef & strValueCol & strStartRow & ":" & strValueCol & strEndRow
strMatchCol = strSheetRef & strTabRef & strMatchCol & strStartRow & ":" & strMatchCol & strEndRow
strNewFormula = "=INDEX(" & strValueCol & ",MATCH(" & "$" & strLookupCell & "," & strMatchCol & "," & lngMatchType & "))"
rngModCell.Formula = strNewFormula
End If
Next rngModCell
End Sub
At this time I am not looking for help to take this to the next step of enabling it to process VLOOKUP/HLOOKUP or VLOOKUP/MATCH combination formulas.
To avoid all errors I can think of, you would need to change it to a not so good looking way like this:
Sub changeToIndex()
Dim xText As Boolean
Dim xBrac As Long
Dim VLSep As New Collection
Dim i As Long, t As String
With Selection.Cells(1, 1) 'just for now
'it assumes that there is NEVER a text string which has VLOOKUP like =A1&"mean text with VLOOKUP inside it"
While InStr(1, .Formula, "VLOOKUP", vbTextCompare)
Set VLSep = New Collection
VLSep.Add " " & InStr(1, .Formula, "VLOOKUP", vbTextCompare) + 7
'get the parts
For i = VLSep(1) + 1 To Len(.Formula)
t = Mid(.Formula, i, 1)
If t = """" Then
xText = Not xText
ElseIf Not xText Then 'avoid "(", ")" and "," inside of the string to be count
If t = "(" Then
xBrac = xBrac + 1
ElseIf xBrac Then 'cover up if inside of other functions
If t = ")" Then xBrac = xBrac - 1
ElseIf t = ")" Then
VLSep.Add " " & i
Exit For
ElseIf t = "," Then
VLSep.Add " " & i 'the space is to avoid errors with index and item if both are numbers
End If
End If
Next
Dim xFind As String 'get all the parts
Dim xRng As String
Dim xCol As String
Dim xType As String
xFind = Mid(.Formula, VLSep(1) + 1, VLSep(2) - VLSep(1) - 1)
xRng = Mid(.Formula, VLSep(2) + 1, VLSep(3) - VLSep(2) - 1)
xCol = Mid(.Formula, VLSep(3) + 1, VLSep(4) - VLSep(3) - 1)
If VLSep.Count = 5 Then
xType = Mid(.Formula, VLSep(4) + 1, VLSep(5) - VLSep(4) - 1)
Else
xType = "0"
End If
Dim fullFormulaNew As String 'get the whole formulas
Dim fullFormulaOld As String
fullFormulaNew = "INDEX(" & xRng & ",MATCH(" & xFind & ",INDEX(" & xRng & ",,1)," & xType & ")," & xCol & ")"
fullFormulaOld = Mid(Selection.Cells(1, 1).Formula, VLSep(1) - 7, VLSep(VLSep.Count) - VLSep(1) + 8)
.Formula = Replace(.Formula, fullFormulaOld, fullFormulaNew) 'simply replace the old one with the new one
Wend
End With
End Sub
It also should work for very complex formulas. Still you would need some special checks to cut everything so it looks like you want. I just assumed that the range for the vlookup may be something like IF(A1=1,B1:C10,L5:N30) and this said, you would need additional subs to also clear something like this up. :(
A formula like
=VLOOKUP(VLOOKUP(IF(TRUE,A2,"aaa"),$A$1:B$11,2),$B$1:$C$11,2,FALSE)
will be changed (messed up) this way to
=INDEX($B$1:$C$11,MATCH(INDEX($A$1:B$11,MATCH(IF(TRUE,A2,"aaa"),INDEX($A$1:B$11,,1),0),2),INDEX($B$1:$C$11,,1),FALSE),2)
EDIT
Assuming your formulas are "normal" you can replace the the last part with:
Dim xFind As String 'get all the parts
Dim xRngI As String, xRngM As String
Dim xCol As String
Dim xType As String
xFind = Mid(.Formula, VLSep(1) + 1, VLSep(2) - VLSep(1) - 1)
xRngI = Mid(.Formula, VLSep(2) + 1, VLSep(3) - VLSep(2) - 1)
xCol = Mid(.Formula, VLSep(3) + 1, VLSep(4) - VLSep(3) - 1)
If VLSep.Count = 5 Then
xType = Mid(.Formula, VLSep(4) + 1, VLSep(5) - VLSep(4) - 1)
Else
xType = "0"
End If
If xType = "FALSE" Then xType = 0
Do While Not IsNumeric(xCol)
Select Case MsgBox("Error: The Column to pick from is not numerical! Do you want to manually set the column (Yes) or directly use the last column of the input range (No)?", vbYesNoCancel)
Case vbYes
xCol = Application.InputBox("Input the column number for the input range (" & xRngI & "). '1' will be the range " & Range(xRngI).Columns(1).Address(0, 0) & ".", "Column to pick from", 1, , , , , 2)
Case vbNo
xCol = Range(xRngI).Columns.Count
Case vbCancel
xCol = " "
Exit Do
End Select
If xCol <> CInt(xCol) Or xCol > Range(xRngI).Columns.Count Or xCol = 0 Then xCol = " "
Loop
If IsNumeric(xCol) Then
Dim absRs As Boolean, absRe As Boolean, absCs As Boolean, absCe As Boolean
absCs = (Left(xRngI, 1) = "$")
absCe = (Mid(xRngI, InStr(xRngI, ":") + 1, 1) = "$")
absRs = (InStr(2, Left(xRngI, InStr(xRngI, ":") - 1), "$") > 0)
absRe = (InStr(Mid(xRngI, InStr(xRngI, ":") + 2), "$") > 0)
xRngM = Range(xRngI).Columns(1).Cells(1, 1).Address(absRs, absCs) & ":" & Range(xRngI).Columns(1).Cells(Range(xRngI).Rows.Count, 1).Address(absRe, absCs) 'for MATCH
xRngI = Range(xRngI).Cells(1, CLng(xCol)).Address(absRs, absCe) & ":" & Range(xRngI).Cells(Range(xRngI).Rows.Count, CLng(xCol)).Address(absRe, absCe) 'for INDEX
Dim fullFormulaNew As String, fullFormulaOld As String
fullFormulaNew = "INDEX(" & xRngI & ",MATCH(" & xFind & "," & xRngM & "," & xType & "))"
fullFormulaOld = Mid(Selection.Cells(1, 1).Formula, VLSep(1) - 7, VLSep(VLSep.Count) - VLSep(1) + 8)
.Formula = Replace(.Formula, fullFormulaOld, fullFormulaNew) 'simply replace the old one with the new one
End If
Wend
End With
End Sub
As you can see: the "simpler" the outcome, the more code you need. If the lookup_range is not just a address, this will fail.
If you still have any questions, just ask ;)