Word VBA: How to get the current section number - vba

I'm building a macro that loops through each word of a document and checks via a regex whether it matches a pattern and if so, writes the found word to an excel sheet. It goes like this:
For Each sentence In ActiveDocument.StoryRanges
For Each w In sentence.Words
myWord = w
If TestRegExp(myPattern, myWord) Then
WKS.Cells(myCount, 1).Value = myWord
myCount = myCount + 1
End If
Next
Next
This part works fine. Now I would also like to get the section per found word (aka "in what section did the found word appear"). I found the command "selection.Information" but no matter what I try, I only get "Section = 1". Even if I just check the whole document for sections ("ActiveDocument.Sections.Count") I only get 1. So there must be something off with the sections, but this document definitely has sections. Has anybody an idea what I do wrong?

Page and section numbers in Word are difficult since the document's logical structure may not match the displayed text. I can, for example, reset page numbering in the middle of a document.
Similarly, a "section" to word is the separation of parts of the document by virtue of the insertion of a section break, whether next page, continuous, next odd, next even, etc. However, the reader often thinks of the "section" as the number that appears before a "heading 1" style paragraph. Again, I can reset those numbers mid-document. So, a document that has 3 sections (logical) might have only two headings: none in section 1, one in section 2 that says "1. Introduction", and another in section 3 that says "Appendix A. Glossary". The (logical) sections are still 1, 2, 3......
w = ActiveDocument.Sentences(10).Words(1) ' given some word in the document
MsgBox w.Information(wdActiveEndPageNumber) ' logical page number
MsgBox w.Information(wdActiveEndAdjustedPageNumber) ' displayed page number
MsgBox w.Information(wdActiveEndSectionNumber) ' logical section number
As for the displayed section number, which is there by virtue of being "Heading 1" style and that style having been set to a multi-level numbered format... getting the displayed number of that list item appears to be very difficult, in the general case.
Solutions I've seen involve searching for the previous paragraph that matches a heading style then getting the .ListFormat.ListString from that.
MsgBox w.GoTo(What:=wdGoToHeading, Which:=wdGoToPrevious).ListFormat.ListString
but that gets the previous heading of any level, not just "Heading 1".

Sections are different from StoryRanges, and are part of the StoryRanges(wdMainTextStory) range. You can use For loop instead of For Each loop to get the WdStoryType number:
For i = 1 To ActiveDocument.StoryRanges.Count ' or 1 To 17
For Each w In ActiveDocument.StoryRanges(i).Words
If TestRegExp(myPattern, w) Then
WKS.Cells(myCount, 1).Value = myWord
myCount = myCount + 1
Debug.Print i, myCount, w ' i has the WdStoryType number
End If
Next
Next
Also, RegExp is probably not needed, as Word has wildcard Find and Replace https://superuser.com/questions/86397/wildcards-in-word, and VBA has the Like Operator

Related

Unwanted Character When Looping Word Tables

I have a simple table on a Word document.
When I run the code to loop thru the rows in the table and print the contents of the first column out to the Immediate Window you get a dot below each word.
I tried stripping this dot out using code but because the dot is below the characters on each line I don’t seem to be able to get this to work.
I need to get rid of this dot because what I eventually want to do is compare the contents of each cell to a search string and if they match update another field.
If you have a table on a Word document even without anything entered in the cells and click the Show/Hide Character Marks button – Home toolbar Paragraph section you can see what I believe to be the cause.
Dim tbl As Table, r As Long
Set tbl = ActiveDocument.Tables(1)
Debug.Print tbl.Cell(r, 1).Range.Text
Next r
Input
This is the output I get
As mentioned in a comment on your previous question, you need to remove the last two characters (the "end of cell marker", chr(13) + chr(7) )
txt = tbl.Cell(r, 1).Range.Text
Debug.Print Left(txt, Len(txt)-2)

Cross reference to custom reference type

I'm trying to add a cross reference into a SEQ field.
My document contains "point headings" which means that between two heading elements, the user can add an extension (between 1.1 and 1.2 may be 1.1A, 1.1B, ...)
Here is how the point heading code looks like:
{STYLEREF "HEADING 2" \N}{SEQ "HEADING 2 POINT" \* ALPHABETIC \S 2}
Which results with: 1.1A
I want to be able to do a cross reference into the point heading.
While I can set the reference type into 'Heading' I can't find out how to reference it to a custom element.
Searching through the web did not reveal any solution but some clues that it might be possible:
This website which explains cross-reference formatting, contains an image with custom type (My New Caption).
Microsoft DOC's description for ReferenceType is: The type of item for which a cross-reference is to be inserted. Can be any WdReferenceType or WdCaptionLabelID constant or a user defined caption label.
My client is used to work with the cross reference dialog box hence I prefer this approach, but VBA script will also be appreciated.
Thanks!
Update:
I'll try to describe my constraints and environment.
Headings 1-9 are used inside Multi-Level list item, hence they have custom styling.
They cannot be changed.
For a specific task, which is described and answered here, I've created what I call 'Point Headings'.
'Point Headings' are basically an extension that the user can add in between the Multi-Level numbering with a VBA macro.
Let's say that I have two Heading 2 items (1.1, 1.2), the user can add 1.1A, followed by 1.1B and so on.
The user can add point headings from level 2 up to level 5.
Their style is 'Heading 2 Point', 'Heading 3 Point' and so on, and each one is based on its relevant Heading.
As described above, eventually in the document, the word field has the following structure: {STYLEREF "HEADING 2" \N}{SEQ "HEADING 2 POINT" \* ALPHABETIC \S 2}.
My goal is to be able to cross reference into these items, but they do not appear in the Heading type, well because they are not of style Heading.
I wish to be able to create a custom reference type, which will show these items.
After some research, here is my answer. Hopefully it will help some future viewers.
Private Sub createPointHeader(pointLevel As Integer, Optional appendixPrefix As String = "")
Dim sQuote As String, referencedStyle As String, captionName As String
sQuote = Chr(34)
referencedStyle = appendixPrefix & "Heading " & pointLevel
captionName = referencedStyle & " Point"
With Selection
.Fields.Add .Range, wdFieldEmpty, "StyleRef " & sQuote & referencedStyle & sQuote & " \n", False
.Collapse wdCollapseEnd
CaptionLabels.Add (captionName)
.InsertCaption Label:=captionName, ExcludeLabel:=True
' Select the created field
.MoveLeft Count:=1, Extend:=True
' Replace the syntax from Arabic to Alphabetic
.Fields.ToggleShowCodes
With .find
.Text = "ARABIC"
.Forward = False
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchKashida = False
.MatchDiacritics = False
.MatchAlefHamza = False
.MatchControl = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute
If .Found = True Then
Selection.Range.Text = "ALPHABETIC \s " & pointLevel
End If
End With
.Fields.ToggleShowCodes
.Fields.Update
.MoveRight Count:=1
.InsertAfter vbTab
.Collapse wdCollapseEnd
' Apply style after .InsertCaption, because it changes the style to Caption
.Style = ActiveDocument.Styles(referencedStyle & " Point")
End With
End Sub
Few remarks
I have two styles to base upon: Heading (2-5), and Appendix Heading (2-5). This is the reason for the optional appendixPrefix as a sub variable.
CaptionLabels.Add as I've checked can get the same value. No need to check in advance if it already exists.
Selection.InsertCaption automatically changes the style into Caption. This is why I apply the style change at the end.
The result
Here is how Point Heading 2 looks like:
{STYLEREF "HEADING 2" \N"}{SEQ HEADING_2_POINT \* ALPHABETIC \S 2}
Snapshot of the document with the point headings
And finally, as requested, cross reference to the Point headings from the Cross reference box
The question asks how to create a cross reference to a custom reference type. I suspect this answer may actually respond to what the original asker might have been getting at.
The idea is to use custom caption labels. A custom caption label appears (ideally) in the Insert/Cross Reference dialog.
A custom caption label is created when you say Insert/Caption and then ask to add a new custom label.
If you have added a custom caption label yourself in a given document, then it automatically appears as a choice when you say Insert/Cross Reference ...
However a difficulty arises when you are given a document where someone else has already added the cross reference type and you want to edit it (by adding additional cross references to the given type of caption). The secret here is to add the custom caption label yourself (even though it already exists), by creating a new temporary caption with the custom label type. You can then go ahead and delete the temporary caption, but you will from then on be able to add cross references to that caption type.
I use this when I want to make reference to 'Code Snippets' or 'Boxes' or 'Algorithms'.
I'm taking the chance of responding as an answer rather than as a comment as the reply is longish but hopefully should get you going in the right direction.
I think you have been led down the wrong path by the point pages article you have referenced.
I'm assuming that we can't modify the styles 'Heading 1' to 'Heading 9'. If you can then you will be able to adapt the suggestion below to use with only 'Heading 1' to 'Heading 9' styles.
You will need to create some new styles. I've used the following styles
Name Based on Style Outline level
Heading Point 1 Heading 1 1
Heading Point 2 Heading 2 2
Heading Point 2 Ext Heading 2 3
Heading Point 3 Heading 3 4
Heading Point 3 Ext Heading 3 5
Heading Point 4 Heading 4 6
Heading Point 4 Ext Heading 4 7
Heading Point 5 Heading 5 8
Heading Point 5 Ext Heading 5 9
Please note that getting the outline level correct is important for Heading numbering.
Next create a new Multilevel list. Call the list 'PointNumbering' (Because if you do this you can identify the list by the name in VBA should you need this facility). Link the styles 'Heading Point 1' to 'Heading Point 5 Ext' to levels 1 to 9 of the numbering sequence (e.g. Outline level 1 matches level 1 in the numbering sequence etc).
Turn off the legal style numbering for each level otherwise we won't be able to use Alphabetic numbering. Set the numbering scheme as indicated below.
Level Number style format levels* Final Appearance
1 1,2,3, 1 1
2 1,2,3 1.2 1.1
3 A,B,C 1.23 1.1A
4 1,2,3 1.2.4 1.1.1
5 A,B,C 1.2.45 1.1.1A
6 1,2,3 1.2.4.6 1.1.1.1
7 A,B,C 1.2.4.67 1.1.1.1A
8 1,2,3 1.2.4.6.8 1.1.1.1.1
9 A,B,C 1.2.3.6.89 1.1.1.1.1A
The actual levels are picked from a drop down list and appear as '1' in the number format box. This makes getting the numbering wrong quite easy so take care. The last number in each level is obtained by selecting the number format in the 'Number style for this level' box.
Once you have set up your styles and ensured that they are linked to the above numbering scheme you need to adjust the styles used for the headings in you current document.
Do a search and replace to do the following style replacements
Current Style New Style
Heading 1 Heading Point 1
Heading 2 Heading Point 2
Heading 3 Heading Point 3
Heading 4 Heading Point 4
Heading 5 Heading Point 5
Then for each of your extension headings where you are currently creating the numbering using style ref and seq field delete the fields and apply the relevant Ext Heading.
Thus for A,B,C numbering after 'Heading Point 2', apply the 'Heading Point 2 Ext' style.
This should now mean that all Heading Point styles can be accessed through the cross reference dialog.
If you document headings at 'Heading 6' Level 6 and below the after 'Heading Point 5 Ext you can use the Heading styles (Heading 6 to Heading 9) as normal. However, each time you use a Heading 6 you will need to manually reset the number. I think this is an easier task than asking users to insert multiple styleref and seq fields because you just select then right click on the heading number and then tick buttons to enable 'Advanced value (skip number)' which allows you to reset any level within your current Heading Number.
If you subsequently need to create a TOC field for your document you will now have to use the \t switch and provide a list of styles and the level number to use for the style in the TOC. e.g. {toc \t "Heading Point 1,1,Heading Point 2,2,Heading Point 2 Ext,2,Heading Point 3,3,Heading Point 3 Ext,3.....etc}.
I have created and tested all of the above in a Word document.

VBA code not exiting list (MS Word)

Writing a macro to automatically fix paraphrase spacing issues in MS Word docs generated from software we use.
Goal:
All standard paragraphs have 0pt before and after spacing.
All bullet lists have 3pt before and after spacing.
Progress:
Currently I have a function that sets the entire document to 0pt, then looks through for all lists and changes them to 3pt. (currently also have a highlight on so I can easily see what is being treated as a list).
It works great on some parts, but on other parts (I assume based on how the software we use generates the document), the list doesn't exist and it will continue to format blocks of text and heading to 3pt when it is not wanted (see attached images).
Current code is:
Sub Paragraph()
ActiveDocument.Range.ParagraphFormat.SpaceAfter = 0
ActiveDocument.Range.ParagraphFormat.SpaceBefore = 0
Dim li As Word.list
For Each li In ActiveDocument.lists
li.Range.ParagraphFormat.SpaceBefore = 3
li.Range.ParagraphFormat.SpaceAfter = 3
li.Range.HighlightColorIndex = wdYellow
Next li
End Sub
Working:
Not working:
According to the MSDN:
List Object: Represents a single list format that's been applied to specified paragraphs in a document.
So if you have more than one list with some non-bulleted paragraph(s) in the middle, the Range will start with the first item of the first list and end with the last item of the last list including all non-bulleted paragraph(s) in the middle.
To fix this issue, you need to separate the lists (right-click on the bullet and select Separate List). However, you mentioned that the document was generated by some software, so that is probably not an option. In that case, you will have to iterate though the paragraphs of the Range of each List and check if it has a ListFormat.ListTemplate which indicates that it is a list item, otherwise it is a non-bulleted paragraph:
Sub Paragraph()
ActiveDocument.Range.ParagraphFormat.SpaceAfter = 0
ActiveDocument.Range.ParagraphFormat.SpaceBefore = 0
Dim li As Word.List
Dim p As Paragraph
For Each li In ActiveDocument.Lists
For Each p In li.Range.Paragraphs
If Not p.Range.ListFormat.ListTemplate Is Nothing Then
p.Range.ParagraphFormat.SpaceBefore = 3
p.Range.ParagraphFormat.SpaceAfter = 3
p.Range.HighlightColorIndex = wdYellow
End If
Next p
Next li
End Sub
Even before touching VBA:
Use Styles in the document.
Limit the use of Styles in the document to only those that are in the
template.
Set your spacing in the Styles.
If, at some stage, you change your mind and want to use 6pt spacing, you can adjust the template and re-apply it, rather than finding all the VBA code and re-writing it. Not only that, but by using Styles, you can avoid having VBA code, or having VBA-enabled documents which may interfere with some corporate security settings.
Oh, and set up your corporate structure to limit the use of templates to only the approved ones.

Word Macro to convert Bullets into simple Text

I am looking a way to convert the Bullets in Word document to simple text. E.g.
I have these kind of Bullets:
a)-> Apple
b)-> Orange
c)-> Mangoes
I want them to be like this:
a)Apple
b)Oranges
c)Mangoes
I am using this code but it removes the Bullets entirely:
Dim oPara As Paragraph
For Each oPara In ActiveDocument.Paragraphs()
Set r = oPara.Range
If r.ListFormat.RemoveNumbers = wdListBullet Then
r.ListFormat.ApplyListTemplate _
ListTemplate:=ListGalleries(wdNumberGallery) _
.ListTemplates(1)
End If
Set r = Nothing
Next
Is ActiveDocument.ConvertNumbersToText what you're after?
It can also be run on a specific list if you're not doing this globally.
ETA: It seems like ConvertNumbersToText takes a NumberType argument (this isn't documented by the 2010 spec that F1 brings up, but it is valid). Perhaps the default doesn't apply to all the bullets in your document. A combination of the three possibilities might work.
ActiveDocument.ConvertNumbersToText(wdNumberParagraph) 'Preset numbers you can add to paragraphs by selecting a template in the Bullets and Numbering dialog box.
ActiveDocument.ConvertNumbersToText(wdNumberListNum) 'Default value for LISTNUM fields.
ActiveDocument.ConvertNumbersToText(wdNumberAllNumbers) 'Default value for all other cases.
I tend to use the first one, but your case might be different.

VBA- how to remove only SECOND occurrence of a character (in each cell of a Word doc table)

(I have avidly searched the forum, but the only similar 'replace' questions I could find were related to Python, Java etc, and not VBA)
I have a table within a MS Word (2010) document (it has two columns but only the second column has text in)
Some cells in the second column have one line of text and NO paragraph mark
Other cells in the second column have two lines of text and TWO paragraph marks (^p)
There is no regular pattern between these two types of cells
Where there is a second paragraph mark, this always occurs directly before the end-of-cell marker
I really need a macro to remove (replace with " ") this second (=final) paragraph mark from each cell where it occurs, but to ignore any first paragraph marks within the cells.
I would be extremely grateful if someone out there has the time and inclination to help me with this macro. As you might have guessed I have little experience in VBA despite attempting to give myself a crash course: I am more used to recording macros which is not an option in this more intricate case.
My hopes were raised when Google found me this- which looks like it could be adaptable http://www.vbaexpress.com/kb/getarticle.php?kb_id=334; but I have no idea how to tweak it to only replace a SECOND (/final) occurrence of ^p.
Thank you in advance.
How about:
Dim tbl As Table
Dim c As Cell
For Each tbl In ActiveDocument.Tables
For Each c In tbl.Columns(2).Cells
'The end of a cell without a carriage return is vbCr & Chr(7)
If Right(c.Range.Text, 3) = vbCr & vbCr & Chr(7) Then
c.Range.Text = Mid(c.Range.Text, 1, Len(c.Range.Text) - 3) & Chr(7)
End If
Next
Next