(MS Word) Removing character style IF text has specific paragraph style applied - vba

My Google-fu must be very weak today, ’cause this seems like an obvious thing to need to do sometimes, yet I cannot find a single case of anyone ever asking about it anywhere…
I have a document that I am preparing for proper typesetting in InDesign, which includes, among other things, getting rid of local overrides to paragraph and character styles. I did a find-and-replace to replace all instances of italic text with a character style called Italic, but stupidly forgot to limit this to text with the Normal style applied.
There are hundreds of headers strewn throughout the document that are supposed to be italic; that’s part of their paragraph style definitions. Since I forgot to limit the find-and-replace, the Italic style was applied to all these headers as well. Annoyingly, since ‘italic’ behaves something like a boolean switch in Word, all these headers are now no longer italic in the document.
I didn’t notice this for a while, so I can’t simply undo it now—the file has been saved and worked on since the find-and-replace. The author (who is a cantankerous, octogenarian technophobe) also needs to see the file again before it’s set, so while ultimately it doesn’t matter whether or not the font is italic in the Word document, it will matter to him.
So what I would very much like to do is to search for all text which has both the paragraph style Header and the character style Italic applied, and remove the character style.
This is an easy task in InDesign where paragraph and character styles are separate entities, but not in Word where they’re all lumped together in one big, messy pile. It doesn’t seem like it can be done through the UI, so I’m guessing I’ll have to resort to a VBA macro… which I’m utterly incompetent at.
Is there a way to find text with a particular paragraph style and a particular character style, and then remove the character style from that text?

Here is some code to get you started.
Press F5 to run the code; it will stop at the Stop command.
Examine the Immediate window to determine the header style.
Each paragraph gets selected, so that you can tell which one you are examining.
You can then modify the code with an If/Then statement to make specific paragraphs italic.
Sub aaaa()
Dim ppp As Paragraph
Dim ccc As Range
For Each ppp In ActiveDocument.Content.Paragraphs
ppp.Range.Select ' visual aid only. not used by any other part of the program
Debug.Print "style :", ppp.Style
Debug.Print ppp.Style.Description
Stop
For Each ccc In ppp.Range.Characters ' you can probably comment out these 3 lines
Debug.Print ccc.FormattedText.Italic ' True prints as -1
Next ccc
Debug.Print "italic :", ppp.Range.Italic ' prints -1 if all are italic. 9999999 if some. 0 if none
ppp.Range.Italic = False ' this removes italic from whole paragraph
Next ppp
End Sub
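A more targeted sketch, assuming the paragraph style is literally named Header and the character style Italic, as described in the question. It uses Find to locate each run formatted with the Italic character style and, when that run sits in a Header paragraph, applies Default Paragraph Font to it, which is the usual way to strip a character style from text. The sub name is hypothetical. It only checks the first paragraph of each found run, so a run that spans a Header paragraph and the next paragraph would need extra handling. Try it on a copy of the document first.

```vba
Sub RemoveItalicCharStyleFromHeaders()
    ' Assumption: "Italic" (character style) and "Header" (paragraph style)
    ' match the style names in your document exactly.
    Dim rng As Range
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = ""                 ' match on formatting only
        .Format = True
        .Style = ActiveDocument.Styles("Italic")
        .Forward = True
        .Wrap = wdFindStop         ' stop at the end of the document
        Do While .Execute
            ' rng now covers one run that carries the Italic character style
            If rng.Paragraphs(1).Style = "Header" Then
                ' Applying Default Paragraph Font removes the character style
                rng.Style = ActiveDocument.Styles(wdStyleDefaultParagraphFont)
            End If
            rng.Collapse wdCollapseEnd   ' continue searching after this run
        Loop
    End With
End Sub
```

Because the paragraph style already defines Header text as italic, removing the character style should let the headers display as italic again without any direct formatting.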

Related

Finding and redacting text highlighted with a specific color, but while keeping the spaces and line breaks (to maintain doc layout)

I'm trying to use the VBA code from a similar question in this forum to redact text highlighted in a specific color, but I would like to keep the document layout, which means only replacing the words, but not the spaces and paragraph breaks in the document. Alternatively, I would be happy if we could identify the line breaks and put a space there.
In the end, the document would not have large sections of unbroken text where words and spaces were replaced by XXXXXXXX and highlighted black. Instead, the text would look more like XX X XXXX XXX X, but all of it should be highlighted in black.
In other words, the text "Mary had a little lamb." would be redacted to "XXXX XXX X XXXXXX XXXXX" rather than XXXXXXXXXXXXXXXXXXXXXXXX.
I've tried changing the "If flag then" section to include unicode 32 (space) instead of the carriage return (unicode 13), but that doesn't seem to work.
Many thanks.
If flag Then
If Selection.Range.HighlightColorIndex = wdTurquoise Then
' Create replacement string
' If last character is a carriage return (unicode 13), then keep that carriage return
OldText = Selection.Text
OldLastChar = Right(OldText, 1)
NewLastChar = ReplaceChar
If OldLastChar Like String(1, 13) Then NewLastChar = String(1, 13)
NewText = String(Len(OldText) - 1, ReplaceChar) & NewLastChar
' Replace text, black block
Selection.Text = NewText
Selection.Font.ColorIndex = wdBlack
Selection.Font.Underline = False
Selection.Range.HighlightColorIndex = wdBlack
Selection.Collapse wdCollapseEnd
End If
End If
@freeflow has given you an answer in a comment on your post, but if you use it you should also include in the wildcard search all potential punctuation characters, excluding blank spaces.
However, I recommend that you do not try to exclude punctuation characters or the spaces between words from the redaction. I recommend that because the purpose of redaction is to eliminate the possibility of anyone working out what the redacted portion of the document originally contained. If you give readers clues, such as how many words are in the sentence, they can guess, and sometimes be quite accurate because of the surrounding non-redacted text.
Of course, that’s just my opinion.
To maintain document formatting, I suggest that you not use letters such as “X” as replacement characters, because X is a wide character. I’ve found it better to use a symbol, and I recommend Wingdings character 127. It has an average width and does a good job of balancing out sentence length, but for added assurance I also recommend that you apply a Font.Spacing of -1 to the replacement, which tightens up each redacted sentence even more.
When redacting, be aware that maintaining the document formatting is very difficult, whatever your replacement-character strategy. I’ve spent a lot of time experimenting with this, and I’ve now shared what I do in my own redaction add-in: I don’t redact paragraph marks, I redact the entire highlighted string including spaces and punctuation, I use Wingdings character 127, I set Font.Spacing to -1, and I set the font color to the same color I’m using for the redaction highlight.
If you are interested in seeing my add-in, do a Web search for AuthorTec Redactor.
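If you still want the word-shaped redaction the question asks for, despite the caveat above, one sketch is a helper that replaces every character except spaces, tabs, manual line breaks (Chr(11)) and paragraph marks (Chr(13)). The function name is hypothetical, and it assumes ReplaceChar is a single-character string, as in the question's code.

```vba
Function RedactKeepingSpaces(ByVal oldText As String, ByVal replaceChar As String) As String
    ' Keep layout characters, replace everything else (letters, digits, punctuation)
    Dim i As Long
    Dim ch As String
    Dim result As String
    For i = 1 To Len(oldText)
        ch = Mid$(oldText, i, 1)
        Select Case ch
            Case " ", vbTab, Chr$(11), vbCr   ' space, tab, manual line break, paragraph mark
                result = result & ch
            Case Else
                result = result & replaceChar
        End Select
    Next i
    RedactKeepingSpaces = result
End Function
```

In the question's If block, the line that builds NewText could then become NewText = RedactKeepingSpaces(OldText, ReplaceChar), which makes the separate last-character check unnecessary. With "X", "Mary had a little lamb." becomes "XXXX XXX X XXXXXX XXXXX".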

Update [Style] to Match Selection option on Character Style via macro

I'm having an issue with some character styles that don't reflect the style's formatting when applied (this doesn't always happen):
Example: I have a character style for italics, and when I apply it, the word still appears upright (but the character style is applied, and when I check its properties, italics is checked).
In order to fix this, I select the word with the issue, right click on the style and use the option "Update [StyleName] to match selection", and it displays the italics correctly.
The problem here is that when I try to replicate this behavior with a VBA Macro (via recording macro), the macro that Word writes has this error:
Run-time error '5900': The property is not allowed for character styles. This is the line with error:
ActiveDocument.Styles("StyleItalic").AutomaticallyUpdate = False
Judging by the recorded code, it seems that a character style cannot be set to update automatically.
The character style is also created via macro, and I can't see anything wrong in the style:
Private Sub Creo(style As String, fontName As String, fontSize As Integer, hasItalic As Boolean)
On Error Resume Next
Selection.ClearFormatting
ActiveDocument.Find.style = ActiveDocument.Styles(style)
ActiveDocument.Find.Execute
If ActiveDocument.Find.Found = False Then
ActiveDocument.Styles.Add name:=style, Type:=wdStyleTypeCharacter
ActiveDocument.Styles(style).QuickStyle = True
ActiveDocument.Styles(style).font.Size = fontSize
ActiveDocument.Styles(style).font.name = fontName
ActiveDocument.Styles(style).font.Italic = hasItalic
End If
End Sub
Is there a way to fix this? I hope I've explained myself clearly. I am working with a Word document of 1000+ pages, so manual editing becomes too tedious. Also, the style sometimes works on some words but not on others. (All the words have both a paragraph style and a character style.)
Thanks!
Your question describes two unrelated issues.
Applying a character style with the same property as the underlying
paragraph style will cause that property to be turned off in the
text. Updating the character style to match the selection will have
the opposite effect to what you want.
To demonstrate: in a new document type a paragraph of text and apply
a style that is defined as italic, e.g. Quote or Intense Quote.
Select the whole paragraph and press Ctrl+I to turn off italics. Now
select just part of the text and apply the character style named
Emphasis. You will see that it has no apparent effect on the text.
This is because both the paragraph style and the character style are
italic, cancelling each other out and having the same effect as
turning italics off manually.
Now right click on Emphasis and select Update to Match Selection.
The selected text will now be italic but, as the text preview in the
Quick Styles gallery will show you, Emphasis is no longer italic.
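The same cancellation can be observed from a macro. This hypothetical demo skips the Ctrl+I step and shows the effect directly: it applies the italic Quote paragraph style, then the italic Emphasis character style to the first word, printing Range.Italic before and after. It assumes the built-in styles have their default English names. If Word behaves as described above, the second value should report the word as not italic.

```vba
Sub DemoItalicToggle()
    ' Assumes "Quote" is an italic paragraph style and "Emphasis" an italic
    ' character style, as in a default English Word installation.
    Dim doc As Document
    Dim para As Range
    Dim part As Range
    Set doc = Documents.Add
    Set para = doc.Paragraphs(1).Range
    para.InsertBefore "Some text in an italic paragraph style."
    para.Style = doc.Styles("Quote")
    ' First word only ("Some")
    Set part = doc.Range(para.Start, para.Start + 4)
    Debug.Print "Italic before Emphasis:", part.Italic
    part.Style = doc.Styles("Emphasis")
    Debug.Print "Italic after Emphasis:", part.Italic
End Sub
```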
Only paragraph and linked styles have the Automatically Update
property, which is why you get an error when attempting to set it on a character style.
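Separately, the question's Creo routine calls ActiveDocument.Find, but the Document object has no Find property (Find belongs to Range and Selection), and On Error Resume Next hides the resulting errors. A sketch of a version that checks for the style by name instead, assuming the goal is to create the character style only when it is missing; StyleExists is a hypothetical helper.

```vba
Function StyleExists(ByVal doc As Document, ByVal styleName As String) As Boolean
    ' Looking up a missing name in doc.Styles raises an error
    Dim st As Style
    On Error Resume Next
    Set st = doc.Styles(styleName)
    StyleExists = (Err.Number = 0)
    On Error GoTo 0
End Function

Sub Creo(ByVal styleName As String, ByVal fontName As String, _
         ByVal fontSize As Single, ByVal hasItalic As Boolean)
    Dim st As Style
    If StyleExists(ActiveDocument, styleName) Then Exit Sub   ' leave existing styles alone
    Set st = ActiveDocument.Styles.Add(Name:=styleName, Type:=wdStyleTypeCharacter)
    st.QuickStyle = True
    With st.Font
        .Name = fontName
        .Size = fontSize
        .Italic = hasItalic
    End With
End Sub
```

Per the answer above, if such a style is italic and is applied in a paragraph whose style is also italic, the text will still display upright; that is a property of the styles, not of how they were created.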

MS Word, how to change formatting of entire paragraphs automatically in whole document?

I have a 20-page word document punctuated with descriptive notes throughout, like this:
3 Input Data Requirements
Some requirement text.
NOTE: This is a descriptive note about the requirement, which is the paragraph that I would like to use find-and-replace or a VBA script to select automatically and change the formatting to italicized. The notes invariably end in a carriage-return: ¶.
If it were just a text document, not MS Word, I would just use a regex in a code editor like Sublime Text to wrap it with <I>...</I> or something along those lines.
Preferably, is there a way to do this in Word's "advanced" find-and-replace feature? Or if not, what's the best way to do it in VBA?
I've tried using a search string like this in find-and-replace: NOTE: *[a-z0-9,. A-Z)(-]{1,255}^l but the line-break part doesn't seem to work, and the 255 char max isn't enough for many of the paragraphs.
EDIT: Another slightly important detail: The doc is automatically generated from another piece of software as a .RTF, which I promptly converted to .docx.
Attempt #2: Use Notepad++ to find and replace using a regex (omit the quotation marks):
Find: "( NOTE: .*?)\r"
Replace with: " \i \1 \i0 \r "
//OLD
Sure is. No VBA or fancy tricks needed.
CTRL + H to bring up the replace dialog.
Click "More".
Select "Font" in the drop down menu called "Format".
Click italics.
Enter the same text in both the Find and Replace boxes. Make sure you set this up correctly so that you don't accidentally replace substrings (e.g. replacing "test" with "nice" would turn "testing" into "niceing").
Should work. If you need to alter entire paragraphs, consistently, then you probably should have used the styles on those paragraphs to begin with. That way, you can change all of them at once by updating the style itself.
You can use Advanced Find, yes. Find Next and then Replace makes the selection italic.
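If a macro is acceptable, a paragraph loop avoids the wildcard limits entirely. (For reference, the question's pattern ends in ^l, which matches a manual line break rather than the paragraph mark that ends each note; in wildcard searches the paragraph mark is written ^13.) This sketch assumes every note paragraph begins with "NOTE:" after any leading spaces; the sub name is hypothetical.

```vba
Sub ItalicizeNoteParagraphs()
    ' Assumption: note paragraphs start with "NOTE:" (leading spaces ignored)
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
        If Left$(LTrim$(para.Range.Text), 5) = "NOTE:" Then
            ' Italicizes the whole paragraph, including its paragraph mark
            para.Range.Font.Italic = True
        End If
    Next para
End Sub
```

In line with the advice above, replacing the Font.Italic line with an assignment of a dedicated paragraph style would let you restyle all notes later in one place.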

Word.Range : Move Range index in the formatted text that corresponds to the plain text

I need to analyze the text of my Word document and create bookmarks on the ranges of text my analyzer has detected (almost like a grammar checker).
I don't want to use Find(), because my needs are too specific.
Explanations
Here is the process:
1/ Retrieve the document's plain text
I retrieve the plain text of the main story of my document:
string plainText = activeDocument.Range().Text;
2/ Analyze the plain text and get results
I send it to my analyzer tool, which returns a collection of markers with positions.
For example, if I wanted to detect the pattern "my pattern" in the document text, the analyzer could return a marker such as { pattern: "my pattern", start: 5, end: 14 }, where "start" and "end" are the character indexes of the pattern in the plain text sent.
3/ Display results in the document
I create bookmarks from these markers.
For the previous example, it would be:
// init a new range and collapse it to the start of the document
Word.Range range = activeDocument.Range();
range.Collapse(Word.WdCollapseDirection.wdCollapseStart);
// move character by character in the "formatted" text
range.MoveStart(Word.WdUnits.wdCharacter, marker.Start); // marker.Start = 5
// set the length (end)
range.SetRange(range.Start, range.Start + (marker.End - marker.Start)); // marker.End = 14
4/ Results
4.1 Global Result
Everything is OK when the document's main story contains text, links, lists and titles:
Ranges are positioned correctly; plain-text indexes correlate with formatted-text indexes.
4.2 Table Issue
When a document contains a table, ranges are off by a few characters: plain-text indexes don't correlate exactly with formatted-text indexes.
I found the reason for this issue (it has been explained in other forums): the non-printing character Chr(7), part of the end-of-cell marker, appears in the plain text. We can account for these characters when calculating range positions, and then everything is OK!
4.3 Issue for Content Controls, Table of contents, Sections and others
When a document contains these elements, ranges are also off by a few characters.
Other non-printing characters appear in the plain text, but I don't understand what they mean or how to handle them when calculating range positions.
When displaying Word's element markers with Developer tab > Design Mode, we see two markers per element: shifting the plain-text indexes by 2 × (number of elements) resolves the issue. It seems OK.
4.4 Issue with Endpaper
I don't know what "page de garde" (French) is called in English; I think it's "endpaper" (Word's English interface calls it a cover page). This is the first page, with a specific header, footer and content controls :)
When a document contains an endpaper, ranges are also off by a few characters.
But this time, there are no non-printing markers in the plain text.
Another clue: when I display Word's element markers with Developer tab > Design Mode, I do see endpaper markers.
Questions
How can I detect an endpaper in a Word document range?
Why don't plain-text indexes always correlate with formatted-text indexes, depending on which elements the document contains?
Would manipulating XML nodes be a more reliable alternative? If so, could you give me good examples of managing bookmarks or other elements in the current document with the XML API?
Other resources
I found similar issues :
Correlate Range.Text to Range.Start and Range.End
http://www.vbaexpress.com/forum/showthread.php?36710-Strange-character-on-table-range-text
I hope my explanations are clear and that you can help me understand what is wrong, or show me a better way to do this.
Thanks, really.
It's not really pretty, but you can try to remove the unwanted characters with a regex. For example, to remove the \a characters (code 7):
string j = new string(new char[] { (char)7 });
plainText = Regex.Replace(plainText,string.Format("[{0}]", j), "");
Now you have to identify the other 'evil' characters and add them to the char array. If it works you will get a string whose length corresponds with the number of Characters in your document. Probably you have to adapt this code by experimenting. (I was not sure which language you are using - I supposed C#.)
Update
Another idea (if it is applicable to your analyzer tool):
Break your problem down to single paragraphs:
foreach(Word.Paragraph pg in activeDocument.Paragraphs)
{
Word.Range range = pg.Range; // Range is a property in the interop, not a method
string text = range.Text;
// your stuff here
}
With this paragraph range objects and the contained text strings you do the same as you tried to do with the whole document object and its text - just paragraph by paragraph. All these paragraphs are 'addressable' by ranges and Move operations as you already do it. I suppose that the problematic characters are outside or at the end of the paragraphs so they don't influence the character counting inside these paragraphs.
As I can't reproduce what you call an endpaper, I can't validate this. Besides, I don't know whether special text ranges such as page headers and tables of contents are covered by paragraphs. But at least you can reduce your problem to smaller ranges. I think it is worth trying.
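The paragraph-by-paragraph idea can be sketched in VBA (the same Range, Paragraphs and Bookmarks members are available through the C# interop). This is a sketch under the answer's assumption that offsets inside one paragraph's Range.Text line up with character positions; instead of trusting that, it compares each candidate range's text with the pattern and reports mismatches. InStr stands in for the analyzer, and the sub and bookmark names (match_1, match_2, ...) are hypothetical. ActiveDocument.Paragraphs only covers the main text story; headers and footers live in other stories.

```vba
Sub BookmarkMatchesPerParagraph(ByVal pattern As String)
    ' Stand-in for the analyzer: find each occurrence of pattern per paragraph
    Dim para As Paragraph
    Dim txt As String
    Dim pos As Long
    Dim n As Long
    Dim hit As Range
    If Len(pattern) = 0 Then Exit Sub
    For Each para In ActiveDocument.Paragraphs
        txt = para.Range.Text
        pos = InStr(1, txt, pattern, vbBinaryCompare)
        Do While pos > 0
            ' InStr is 1-based; Range positions are 0-based offsets
            Set hit = ActiveDocument.Range(para.Range.Start + pos - 1, _
                                           para.Range.Start + pos - 1 + Len(pattern))
            If hit.Text = pattern Then
                n = n + 1
                ActiveDocument.Bookmarks.Add Name:="match_" & n, Range:=hit
            Else
                ' Offsets drifted inside this paragraph (fields, controls, ...)
                Debug.Print "Offset mismatch in paragraph starting at"; para.Range.Start
            End If
            pos = InStr(pos + Len(pattern), txt, pattern, vbBinaryCompare)
        Loop
    Next para
End Sub
```

The self-check keeps a drifted offset from silently producing a misplaced bookmark, and the printed paragraph start shows where to look.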

automating word 2010 to generate docs

The web app was already built for Office 2007, and I need to convert it so it works with Office 2010.
I was able to convert the header generator part of the code, but I have a problem with the body of the document itself. The code copies the data from a "data" document and pastes it into the generated document.
appword.activewindow.activepane.view.seekview = 0
'set appsel1 = appword.activewindow.selection
set appsel1 = appword.window(filepath).selection -that is the original one
appdoc1.bookmarks("b1").select
appword.selection.insertafter("some text")
appsel1.endkey(6) -the code stops here
appword.selection.insertafter("some other text")
The Internet Explorer debugger says "ERROR: appsel1 object required", and when I view its data in the debugger, it shows "empty" instead of "{...}".
Can anyone tell me what I'm doing wrong?
If you need more of the code, tell me.
From MSDN
After this method is applied, the selection expands to include the new
text.
If you use this method with a selection that refers to an entire
paragraph, the text is inserted after the ending paragraph mark (the
text will appear at the beginning of the next paragraph). To insert
text at the end of a paragraph, determine the ending point and
subtract 1 from this location (the paragraph mark is one character).
However, if the selection ends with a paragraph mark that also happens
to be the end of the document, Microsoft Word inserts the text before
the final paragraph mark rather than creating a new paragraph at the
end of the document.
Also, if the selection is a bookmark, Word inserts the specified
text but does not extend the selection or the bookmark to include the
new text.
So I suspect that you still have no selected text.
I wonder if you could do a Selection.Collapse wdCollapseStart, but that's just a thought.
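Another option is to avoid the Selection altogether and work with Range objects, so nothing depends on which window or pane is active. A minimal sketch, assuming the bookmark "b1" from the question exists in the generated document; the sub name is hypothetical. In the question's VBScript, drop the As types and use numeric values for Word constants, since VBScript does not see Word's enumerations.

```vba
Sub InsertWithoutSelection(ByVal doc As Document)
    ' "b1" is the bookmark name used in the question
    Dim rng As Range
    Set rng = doc.Bookmarks("b1").Range
    rng.InsertAfter "some text"
    ' Range-based counterpart of Selection.EndKey 6 (wdStory):
    ' append at the end of the main text story
    doc.Content.InsertAfter "some other text"
End Sub
```

As the quoted documentation notes, when the range ends with the document's final paragraph mark, Word inserts the text before that mark, which matches where EndKey wdStory would leave the insertion point.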