word.object.model how to detect if range's word is hyphenated - vb.net

I'm writing a spellchecker in VB.NET. To highlight the words, I use the GetPoint() method to get the coordinates/bounding box of that word. A problem occurs when the misspelled word is hyphenated as GetPoint() returns a huge box (containing both lines of the hyphenated word from start to end of the line).
I have not yet found a way to detect whether a range is hyphenated and, if so, to get the coordinates of both boxes: one for the first part of the word at the end of the line, and one for the second part at the start of the next line.
I have tried the TextRetrievalMode options IncludeFieldCodes and IncludeHiddenText and checked whether the .Text of the range contains any hidden characters, but it does not. Is there a way to do this? Ideally, I would like to have two boxes, one for each part of the hyphenated word.
As you can see in the image, when the misspelled word ανθρπος is hyphenated, the orange box is huge: it covers both document lines across their full width. The word καλλός is also misspelled; it is there only as a reference, as a second misspelled word.
This is the function that gets the current word's (range) rectangle:
Public Function getRangeRectangle(w As Word.Window, r As Word.Range) As System.Drawing.Rectangle
    Dim left As Integer
    Dim top As Integer
    Dim width As Integer
    Dim height As Integer
    Try
        'r.TextRetrievalMode.IncludeFieldCodes = True
        'r.TextRetrievalMode.IncludeHiddenText = True
        w.GetPoint(left, top, width, height, r)
        'getRangeRectangle = New System.Drawing.Rectangle(left, top, width, (wa.PointsToPixels(r.Font.Size) + 3) * (wa.ActiveDocument.ActiveWindow.View.Zoom.Percentage / 100))
        getRangeRectangle = New System.Drawing.Rectangle(left, top, width, height)
        gfrm_Drawing.DrawRectangle(getRangeRectangle, System.Drawing.Color.Orange)
        'If (i > 0) Then r.MoveEnd(Word.WdUnits.wdCharacter, i)
    Catch ex As Exception
        getRangeRectangle = New System.Drawing.Rectangle()
        'MsgBox(getRangeRectangle.X & ":" & getRangeRectangle.Y & ":" & getRangeRectangle.Height & ":" & getRangeRectangle.Width)
        'MsgBox("getRangeRectangle: " & r.Text & r.Start & ":" & ex.ToString())
    Finally
    End Try
End Function
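One possible way to detect the wrap, sketched here in VBA rather than VB.NET and not verified against automatic hyphenation: compare the line number of the word's first and last characters with Range.Information(wdFirstCharacterLineNumber) and, if they differ, walk the characters to find where the second line starts. The names RangeSpansLines and SplitAtLineBreak are hypothetical helpers, not part of the Word object model, and the sketch assumes a paginated view (Print Layout).

```vba
' Assumption: Print Layout view; in other views Information may return -1.
Function RangeSpansLines(ByVal rng As Range) As Boolean
    RangeSpansLines = _
        rng.Characters.First.Information(wdFirstCharacterLineNumber) <> _
        rng.Characters.Last.Information(wdFirstCharacterLineNumber)
End Function

' Splits rng into the part on its first line and the part on the next line.
' secondPart is Nothing when the range does not wrap.
Sub SplitAtLineBreak(ByVal rng As Range, _
                     ByRef firstPart As Range, ByRef secondPart As Range)
    Dim k As Long
    Dim firstLine As Long

    firstLine = rng.Characters.First.Information(wdFirstCharacterLineNumber)
    Set firstPart = rng.Duplicate
    Set secondPart = Nothing

    For k = 2 To rng.Characters.Count
        If rng.Characters(k).Information(wdFirstCharacterLineNumber) <> firstLine Then
            ' Character k is the first one on the new line.
            firstPart.SetRange rng.Start, rng.Characters(k).Start
            Set secondPart = rng.Duplicate
            secondPart.SetRange rng.Characters(k).Start, rng.End
            Exit For
        End If
    Next k
End Sub
```

Each of the two resulting ranges could then be passed to GetPoint separately to obtain one box per line. The line number is relative to the page, so a word that wraps across a page break would need an extra check, for example on wdActiveEndPageNumber.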

Related

Count words in a Microsoft Word document by document style?

In analogy to this question, I would like to script a VBA script that counts words formatted with a certain document style, more precisely: a certain paragraph style.
Sub CountTypeface()
    Dim lngWord As Long
    Dim lngCountIt As Long
    Const Typeface As String = "Calibri"
    Const MyFormat As String = "My Paragraph Style"
    'Ignore any document "Words" that aren't real words (CR, LF etc)
    For lngWord = 1 To ActiveDocument.Words.Count
        If Len(Trim(ActiveDocument.Words(lngWord))) > 1 Then
            If ActiveDocument.Styles(lngWord) = MyFormat Then
                lngCountIt = lngCountIt + 1
            End If
        End If
    Next lngWord
    MsgBox "Number of " & Typeface & " words: " & lngCountIt
End Sub
But running this code results in the run-time error:
Run-time error '5941': The requested member of the collection does not exist.
Why does this happen, and how can I fix it?
You're using your word counter as the index into the Styles collection. The document has more words than Styles has members, which raises the error. Even without the error, the If could be true at most once, because it never checks the word's style.
Replace:
If ActiveDocument.Styles(lngWord) = MyFormat Then
With:
If ActiveDocument.Words(lngWord).Style = MyFormat Then
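Putting the fix into the whole macro, a minimal sketch could look like the following. It deviates from the one-line replacement in two ways that are my assumptions, not part of the answer: it iterates with For Each instead of indexing Words repeatedly, and it reads the style from the word's paragraph, since Range.Style can report a character style when one is applied to the word.

```vba
Sub CountWordsInParagraphStyle()
    Const MyFormat As String = "My Paragraph Style"
    Dim wrd As Range
    Dim lngCountIt As Long

    For Each wrd In ActiveDocument.Words
        ' Same filter as the original: skips paragraph marks and punctuation
        ' and, as a side effect, one-letter words.
        If Len(Trim(wrd.Text)) > 1 Then
            ' Compare the style of the paragraph that contains the word.
            If wrd.Paragraphs(1).Style = MyFormat Then
                lngCountIt = lngCountIt + 1
            End If
        End If
    Next wrd

    MsgBox "Number of words in style '" & MyFormat & "': " & lngCountIt
End Sub
```

The message reports the style name rather than the unused Typeface constant from the question.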

How to count how many tabs are in a selection using Macro for formatting tables

I have to format a large document for a file that has been created from a PDF which is editable, so I know all the text is there.
The document is a series of tables. In Word the tables look pretty OK, but in some cases where there should be various cells there is just 1 and tabs have been used to align the text. So, it looks good, but if any of the text gets changed then the formatting will get messed up. I would like to have a macro that looks for cells with a tab, selects the cell, counts the number of tabs, divides the cell into the right number of cells and puts the text into the right cell. For example, a cell that contains "text 1 [tab]text 2[tab]text 3" would become 3 cells "text 1", "text 2" and "text 3".
I thought Word would be able to convert the text to a table, but when the text is already in a table it doesn't work.
If anyone has any suggestions as to how I might achieve this, then they would be much appreciated!
My main issue is not knowing how to count how many tabs are in a selection.
This function returns the number of tabs in a string (a hard-coded sample string here).
Function CountTabs() As Integer
    Dim Txt As String
    Txt = "This is" & vbTab & "a test" & vbTab & "to count Tabs"
    CountTabs = Len(Txt) - Len(Replace(Txt, vbTab, ""))
End Function
The tab character - Chr(9) - is replaced with nothing, and the number of tabs is the difference in character count before and after the replacement. Here is an implementation of the idea in a snippet.
Private Sub Snippet()
    Dim Txt As String
    Dim Count As Integer
    Txt = "This is" & vbTab & "a test" & vbTab & "to count Tabs"
    Count = Len(Txt) - Len(Replace(Txt, vbTab, ""))
    MsgBox "There are " & Count & " tabs."
End Sub
Of course, how you get the text for the variable Txt is another story and, in the context of this forum, another question. Prophylactically, I advise against using the Selection object, however. Try to use the Range object instead.
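As a sketch of how the same idea might be applied to a table cell through a Range rather than the Selection: the function below takes the text as a parameter, and the calling procedure reads it from the first cell of the first table. CountTabsIn and ReportTabsInFirstCell are hypothetical names, and the choice of Tables(1).Cell(1, 1) is only for illustration.

```vba
' Number of tab characters in Txt.
Function CountTabsIn(ByVal Txt As String) As Long
    CountTabsIn = Len(Txt) - Len(Replace(Txt, vbTab, ""))
End Function

Sub ReportTabsInFirstCell()
    Dim cellRange As Range

    ' Assumption: the document has at least one table.
    If ActiveDocument.Tables.Count = 0 Then Exit Sub

    Set cellRange = ActiveDocument.Tables(1).Cell(1, 1).Range
    MsgBox "Tabs in cell: " & CountTabsIn(cellRange.Text) & _
           ", so the cell would split into " & _
           CountTabsIn(cellRange.Text) + 1 & " cells."
End Sub
```

The end-of-cell marker that Word appends to a cell's text contains no tab, so it does not affect the count.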

Need to preserve initial paragraph and character styles in text selection after adding hyperlinks

I am adding hyperlinks to selected text in Word. The selected text may have formatted text in it, such as italics, small caps, bold, etc.
I am able to preserve the CHARACTER styles, i.e., small caps, if the ENTIRE selection is the same character style (e.g., small caps). However, if the text includes unformatted text along with the formatted text, then the character style is replaced with the hyperlink style.
I inserted screenshots of before and after adding the hyperlinks.
BEFORE (no hyperlinks added):
AFTER (each line has a hyperlink added):
What I'm trying to end up with is something like the "SMALL CAPS TEST" as it:
preserves the initial small caps style
does not get overwritten with the HYPERLINK style (blue underlined text)
Any help is much appreciated. I'm at a loss on how to step through the style applied to each word or even character in the text selection.
UPDATE:
I managed to stop the Hyperlink character style from SHOWING up as blue and underlined. However, the Hyperlink character style still overwrites the previous character styles (i.e., Bold, Small Caps, Italics). My code below determines the correct character style up to the point where the hyperlink is attached. If the entire selection is all ONE character style, then that style is still preserved, probably due to this line:
Selection.style = MyStyle
I need to be able to apply the correct character style to each character.
Here is my code:
Sub AddHLINK()
    Dim myText, myISBN, myCurrISBN, myURL, myCharStyle, ochr As String
    Dim NoCharSty As Boolean
    Dim MyStyle, MyFont, mychar, mycharstyleb As String
    Dim charcount, j, k As Integer
    myURL = "http://www.google.com"
    myText = Selection.Text
    charcount = Len(myText)
    For k = 1 To charcount
        mychar = Selection.Characters(k).Text
        MyStyle = Selection.Characters(k).Style
        If ActiveDocument.Characters(k).Style = wdStyleTypeCharacter Then
            myCharStyle = MyStyle
            NoCharSty = False
        Else
            NoCharSty = True
        End If
    Next k
    MsgBox "the current url is " & myURL
    Stop
    ActiveDocument.Hyperlinks.Add Anchor:=Selection.Range, Address:=myURL, TextToDisplay:=myText
End Sub
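One way to step through the formatting of each character, offered as an untested sketch: record the Bold, Italic and SmallCaps settings of every character before adding the hyperlink, then write them back onto the hyperlink's range afterwards. It assumes the formatting in question is direct font formatting rather than named character styles, and that the hyperlink's range has the same number of characters as the original selection; if it does not, the sketch leaves the link untouched. AddHyperlinkKeepingFormatting is a hypothetical name.

```vba
Sub AddHyperlinkKeepingFormatting()
    Const myURL As String = "http://www.google.com"
    Dim rng As Range
    Dim hl As Hyperlink
    Dim n As Long, k As Long
    Dim wasBold() As Boolean, wasItalic() As Boolean, wasSmallCaps() As Boolean

    If Selection.Type = wdSelectionIP Then Exit Sub   ' nothing selected

    Set rng = Selection.Range
    n = rng.Characters.Count
    ReDim wasBold(1 To n)
    ReDim wasItalic(1 To n)
    ReDim wasSmallCaps(1 To n)

    ' Remember the formatting of every character.
    For k = 1 To n
        With rng.Characters(k).Font
            wasBold(k) = (.Bold = True)
            wasItalic(k) = (.Italic = True)
            wasSmallCaps(k) = (.SmallCaps = True)
        End With
    Next k

    ' No TextToDisplay, so the existing text is kept as the link text.
    Set hl = ActiveDocument.Hyperlinks.Add(Anchor:=rng, Address:=myURL)

    ' Assumption: the link text matches the original character for character.
    If hl.Range.Characters.Count <> n Then Exit Sub

    ' Put the remembered formatting back.
    For k = 1 To n
        With hl.Range.Characters(k).Font
            .Bold = wasBold(k)
            .Italic = wasItalic(k)
            .SmallCaps = wasSmallCaps(k)
        End With
    Next k
End Sub
```

For long selections the per-character loop will be slow; it is meant to show the stepping, not to be efficient.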

What does a hyperlink range.start and range.end refer to?

I'm trying to manipulate some text from a MS Word document that includes hyperlinks. However, I'm tripping up at understanding exactly what Range.Start and Range.End are returning.
I banged a few random words into an empty document, and added some hyperlinks. Then wrote the following macro...
Sub ExtractHyperlinks()
    Dim rHyperlink As Range
    Dim rEverything As Range
    Dim wdHyperlink As Hyperlink
    For Each wdHyperlink In ActiveDocument.Hyperlinks
        Set rHyperlink = wdHyperlink.Range
        Set rEverything = ActiveDocument.Range
        rEverything.TextRetrievalMode.IncludeFieldCodes = True
        Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start) & "#" & vbCrLf
    Next
End Sub
However, the output between the #s does not quite match up with the hyperlinks, and is more than a character or two out. So if the .Start and .End do not return char positions, what do they return?
This is a bit of a simplification but it's because rEverything counts everything before the hyperlink, then all the characters in the hyperlink field code (including 1 character for each of the opening and closing field code braces), then all the characters in the hyperlink field result, then all the characters after the field.
However, the character count in the range (e.g. rEverything.Characters.Count or len(rEverything)) only includes the field result if TextRetrievalMode.IncludeFieldCodes is set to False and only includes the field code if TextRetrievalMode.IncludeFieldCodes is set to True.
So the character count is always smaller than the range.End-range.Start.
In this case if you change your Debug expression to something like
Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start - (rEverything.End - rEverything.Start - 1 - Len(rEverything))) & "#" & vbCrLf
you may see results more along the lines you expect.
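The gap between a range's span and the length of its text can be observed directly. The following sketch, with the hypothetical name CompareSpanWithTextLength, prints End - Start for the whole document next to Len(Text) with field codes excluded and then included. In a document that contains a hyperlink field, both lengths should come out smaller than the span.

```vba
Sub CompareSpanWithTextLength()
    Dim rng As Range
    Set rng = ActiveDocument.Content

    ' Span in document positions: counts field code, field result and
    ' the field's control characters.
    Debug.Print "End - Start:               "; rng.End - rng.Start

    rng.TextRetrievalMode.IncludeFieldCodes = False
    Debug.Print "Len(Text), field results:  "; Len(rng.Text)

    rng.TextRetrievalMode.IncludeFieldCodes = True
    Debug.Print "Len(Text), field codes:    "; Len(rng.Text)
End Sub
```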
Another way to visualise what is going on is as follows:
Create a very short document with a piece of text followed by a short hyperlink field with short result, followed by a piece of text. Put the following code in a module:
Sub Select1()
    Dim i As Long
    With ActiveDocument
        For i = .Range.Start To .Range.End
            .Range(i, i).Select
        Next
    End With
End Sub
Insert a breakpoint on the "Next" line.
Then run the code once with the field codes displayed and once with the field results displayed. You should see the progress of the selection "pause" either at the beginning or the end of the field, as the Select keeps "selecting" something that you cannot actually see.
Range.Start returns the character position, counted from the beginning of the document, of the start of the range; Range.End returns the position of the end of the range.
BUT visible characters are not the only things that get counted, and therein lies the problem.
Examples of "hidden" things that are counted, but not visible:
"control characters" associated with content controls
"control characters" associated with fields (which also means hyperlinks), which can be seen if field result is toggled to field code display using Alt+F9
table structures (ANSI 07 and ANSI 13)
text with the font formatting "hidden"
For this reason, using Range.Start and Range.End to get a "real" position in the document is neither reliable nor recommended. The properties are useful, for example, to set the position of one range relative to the position of another.
You can get a somewhat more accurate result using the Range.TextRetrievalMode boolean properties IncludeHiddenText and IncludeFieldCodes. But these don't affect the structural elements involved with content controls and tables.
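To illustrate the difference between absolute and relative use, here is a small sketch under the assumption that the hyperlinks are ordinary text hyperlinks. Instead of slicing the document text with Mid, it reads the text from the hyperlink's own range, and it uses .End only to build a second range relative to the first. PrintHyperlinkText is a hypothetical name.

```vba
Sub PrintHyperlinkText()
    Dim hl As Hyperlink
    Dim rest As Range

    For Each hl In ActiveDocument.Hyperlinks
        ' Skip hyperlinks attached to shapes; they have no text range.
        If hl.Type = msoHyperlinkRange Then
            ' The hyperlink's own range gives its text directly.
            Debug.Print "#" & hl.Range.Text & "#", hl.Address

            ' Relative use of .End: everything from the end of the link
            ' to the end of the paragraph that contains it.
            Set rest = ActiveDocument.Range(hl.Range.End, _
                                            hl.Range.Paragraphs(1).Range.End)
            Debug.Print "  followed by: " & rest.Text
        End If
    Next hl
End Sub
```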
Thank you both so much for pointing out this approach was doomed but that I could still use .Start/.End for relative positions. What I was ultimately trying to do was turn a passed paragraph into HTML, with the hyperlinks.
I'll post what worked here in case anyone else has a use for it.
Function ExtractHyperlinks(rParagraph As Range) As String
    Dim rHyperlink As Range
    Dim wdHyperlink As Hyperlink
    Dim iCaretHold As Integer, iCaretMove As Integer, rCaret As Range
    Dim s As String
    iCaretHold = 1
    iCaretMove = 1
    For Each wdHyperlink In rParagraph.Hyperlinks
        Set rHyperlink = wdHyperlink.Range
        Do
            Set rCaret = ActiveDocument.Range(rParagraph.Characters(iCaretMove).Start, rParagraph.Characters(iCaretMove).End)
            If RangeContains(rHyperlink, rCaret) Then
                'Text before the link, then the link itself wrapped in an anchor tag
                s = s & Mid(rParagraph.Text, iCaretHold, iCaretMove - iCaretHold) & "<a href=""" & wdHyperlink.Address & """>" & IIf(wdHyperlink.TextToDisplay <> "", wdHyperlink.TextToDisplay, wdHyperlink.Address) & "</a>"
                iCaretHold = iCaretMove + Len(wdHyperlink.TextToDisplay)
                iCaretMove = iCaretHold
                Exit Do
            Else
                iCaretMove = iCaretMove + 1
            End If
        Loop Until iCaretMove > Len(rParagraph.Text)
    Next
    'Append whatever text follows the last hyperlink
    If iCaretMove < Len(rParagraph.Text) Then
        s = s & Mid(rParagraph.Text, iCaretMove)
    End If
    ExtractHyperlinks = "<p>" & s & "</p>"
End Function
Function RangeContains(rParent As Range, rChild As Range) As Boolean
    'True when rChild lies entirely within rParent
    If rChild.Start >= rParent.Start And rChild.End <= rParent.End Then
        RangeContains = True
    Else
        RangeContains = False
    End If
End Function

vba Word: Why can Word use special chars in bookmark creation, and I can't?

Here's a head-splitter:
I'm trying to programmatically create hidden bookmarks for existing headings in a doc, so that I can create hyperlinks elsewhere in the doc that point to these bookmarks. (I want to use hyperlinks instead of cross-references so I can specify my own 'Display Text' for the links, which isn't possible using cross-refs).
I want my bookmarks to be named after the headings they relate to, with a custom prefix.
Example:
style: Heading1
heading text: Entrance & Hallway
bookmark name: _Hd1_Entrance_&_Hallway
I'm specifying a custom prefix to make each bookmark unique to its style, so I can then have 2 matching headings in the doc, so long as they are in different heading styles. (example: _Hd1_Entrance_&_Hallway and _Hd3_Entrance_&_Hallway)
The catch is: if my heading contains special chars like '&', I get a 'Bad Bookmark Name' error, which I understand, and this is documented on the web. I'm only allowed to use a limited character set.
So how come if I manually create a hyperlink using Word's own dialog, selecting a 'Place In This Document' such as a heading like "Entrance & Hallway", Word manages this no problem? Once the Hlink is created, I can now see the hidden bookmark associated with this Hlink in Word's 'Bookmarks' dialog - and it's quite happily named "_Entrance _&_Hallway". This confounds me!
Anyone have an explanation? I'd really like to be able to leverage this same functionality, but cannot fathom how. Any help is greatly valued!
Thanks,
Sub ScratchPad_Bookmarks()
    Dim doc As Document
    Dim rng As Range
    Dim sHdName As String
    Dim sBmName As String
    Set doc = ActiveDocument
    'Insert a heading at start of document
    sHdName = "Entrance & Hallway"
    doc.Range.InsertBefore sHdName & vbCr
    doc.Paragraphs(1).Range.Style = doc.Styles("Heading 1")
    'Find the above heading in the active document
    Set rng = doc.Range
    With rng.Find
        .ClearFormatting
        .Text = sHdName
        .Style = "Heading 1"
        If Not .Execute Then
            'Heading not found, so quit
            Exit Sub
        End If
    End With
    'rng has collapsed to the found heading, so create a bookmark
    'rng.Select 'debug
    sBmName = Replace(rng.Text, " ", "_")
    rng.Collapse wdCollapseStart
    rng.Bookmarks.Add sBmName
    'sBmName contains '&' so this throws a Runtime error:
    '5828: Bad Bookmark Name (as expected)
End Sub
The above doesn't work. However, testing the manual operation yourself is easy. Just create a heading that includes a '&' character and style it as Heading 1.
In the next paragraph, insert a hyperlink using Word's own dialog. Select Place In This Doc and select the heading you just created. Shouldn't be a problem.
Now open Word's Bookmark dialog, enable the Hidden Bookmarks view, and voila: a hidden bookmark with a '&' character. (Wd 2010) Say what?!
IMO Word is "cheating", either by breaking its own rules for bookmark names, or by denying you the full range of names that is allowed by the Specification.
If you go back to the early ECMA standard for .docx, a bookmark's name is defined as an ST_String, which is a restriction of xsd:string with a maximum length of 255 characters. I do not think this has changed since in any of the ECMA or ISO versions of the standards.
However, Microsoft's implementation notes [MS-OE376].pdf, which pertain to the ECMA version of the specs, and [MS-OI29500].pdf, which pertain to the 2012 ISO version, specify that the name can be no longer than 40 characters, but do not describe any other limitations.
My (previous) understanding was that bookmark names were limited to 40 characters, had to consist of "letters", "digits" and "underscores" and had to start with an underscore (for hidden bookmarks) or a letter. (Not sure about, e.g. "_1"). And I believe that VBA still imposes those rules, although I have never checked what its understanding of a "letter" or a "digit" is - are non-Latin letters/digits allowed?
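If the goal is simply a name that Bookmarks.Add accepts, one workaround is to sanitise the heading text under those rules. The sketch below is an assumption-laden illustration: MakeBookmarkName is a hypothetical helper, it conservatively treats only ASCII letters and digits as valid (VBA may well accept more), replaces everything else with an underscore, and truncates to 40 characters.

```vba
' Builds a bookmark name from a prefix and heading text.
' Assumption: only A-Z, a-z, 0-9 and "_" are safe; maximum length is 40.
Function MakeBookmarkName(ByVal prefix As String, _
                          ByVal headingText As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    result = prefix
    For i = 1 To Len(headingText)
        ch = Mid$(headingText, i, 1)
        If ch Like "[A-Za-z0-9]" Then
            result = result & ch
        Else
            result = result & "_"      ' spaces, "&", punctuation, etc.
        End If
    Next i

    ' The name must start with a letter or an underscore.
    If Not (Left$(result, 1) Like "[A-Za-z_]") Then result = "_" & result

    MakeBookmarkName = Left$(result, 40)
End Function
```

With the heading from the question, MakeBookmarkName("_Hd1_", "Entrance & Hallway") gives "_Hd1_Entrance___Hallway". Two headings that differ only in punctuation map to the same name, so a uniqueness check against ActiveDocument.Bookmarks.Exists would still be needed.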
However, if you save a document as a .xml or edit the document.xml within a .docx, you can modify the bookmark name so that it contains, e.g. "&". Further, Word will re-save such characters. But it will truncate names to 40 characters when you open, and it doesn't retain the original name when you re-save. I don't think the use of initial "_" to denote "hidden" is in the standard, either.
So, given the specifications, I would say that Word is "cheating" by not allowing you to use the full range of possible names, rather than "cheating" by allowing hyperlink destinations to contain "&".
You could insert bookmarks with "&" in Windows versions of Word in VBA by using InsertXML to insert a chunk of XML with a preconfigured bookmark name (see some simple sample code below), but I suspect you would have to work rather harder to move a bookmark. You would probably have to extract the existing XML of the thing you wanted to "cover", then surround it with the bookmarkStart and bookmarkEnd tags, and that sounds like a pretty nasty exercise to me.
As a final observation, AFAICR the bookmark names that you can specify for legacy form fields have a 20-character length restriction.
That code:
Sub insertbm()
    Dim x As String
    x = ""
    x = x & "<?xml version='1.0' encoding='utf-8' standalone='yes'?>"
    x = x & "<pkg:package xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'>"
    x = x & " <pkg:part pkg:name='/_rels/.rels' pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>"
    x = x & " <pkg:xmlData>"
    x = x & " <Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>"
    x = x & " <Relationship Id='rId1' Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' Target='word/document.xml' />"
    x = x & " </Relationships>"
    x = x & " </pkg:xmlData>"
    x = x & " </pkg:part>"
    x = x & " <pkg:part pkg:name='/word/document.xml' pkg:contentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>"
    x = x & " <pkg:xmlData>"
    x = x & " <w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
    x = x & " <w:body>"
    x = x & " <w:p>"
    'The ampersand must be escaped as &amp; for the XML to be well-formed
    x = x & " <w:bookmarkStart w:id='0' w:name='bookmark&amp;name' />"
    x = x & " <w:r>"
    x = x & " <w:t>"
    x = x & "bookmarkedtext</w:t>"
    x = x & " </w:r>"
    x = x & " <w:bookmarkEnd w:id='0' />"
    x = x & " </w:p>"
    x = x & " </w:body>"
    x = x & " </w:document>"
    x = x & " </pkg:xmlData>"
    x = x & " </pkg:part>"
    x = x & "</pkg:package>"
    Selection.InsertXML x
End Sub