Split a string on the Paragraph marker - vba

I have a text string that is delimited by paragraph markers. I'm trying to do a split using the paragraph markers but keep getting a type mismatch error. I've tried the following code but no luck. Any suggestions would be greatly appreciated.
vData1 = Split(vData, vbcr)
vData1 = Split(vData, vbCrLf)
vData1 = Split(vData, vbLf)
None of these lines have worked.

I'm going to go out on a limb and suggest that you are confusing a pilcrow character (e.g. ¶) with line feeds and carriage returns.
A carriage return is ASCII character 13. It is typically paired with a line feed and accomplished by tapping the Enter↵ key.
A line feed is half of a vbCRLF and is ASCII character 10. Combining Shift+Enter↵ or Alt+Enter↵ typically accomplishes a line feed.
The pilcrow character (¶) is ASCII character 182. It is used in MS Word to visually illustrate a carriage return or paragraph terminator but has no functional properties other than as a visual indicator.
My suggestion would be to split on a pilcrow.
vData1 = Split(vData, Chr(182))

In My case I Had in MS-Word
Some Text1
Some Text2
Some Text3
Using following Code just worked fine to print Some Text3
Sub Demo()
Dim StrTxt As String
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "<[A-Z]*^13*^13*^13"
.Execute
End With
StrTxt = Split(.Text, vbCr)(2)
MsgBox (StrTxt)
End With
End Sub

Related

Find/Replace an Inserted Check Box Symbol with a Check Box Content Control

I would like to find/replace all inserted check box symbols with checkbox content controls. The symbol's font is Wingdings (either 111 or 168). Below is the code I started with, but I hit a wall when I realized that Word find doesn't recognize the symbol. I appreciate any help or guidance. Thank you.
Sub ReplaceUnicode168()
Dim objContentControl As ContentControl
With ActiveDocument
Set objContentControl = ActiveDocument.ContentControls.Add(wdContentControlCheckBox)
objContentControl.Cut
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.Text = Chr(168)
.Replacement.Text = "^c"
.Execute Replace:=wdReplaceAll
End With
End With
End Sub
I suggest that you try to find/replace these two particular characters using
.Text = ChrW(61551)
for the "111" WingDings Character and
.Text = ChrW(61608)
for the "168" WingDings character.
Be aware that the way Word encodes these characters is not very helpful. As far as Find/Replace is concerned, you have to use these Unicode Private Use Area encodings.
If you actually select the character and use VBA to discover its code using e.g.
Debug.Print AscW(Selection)
the answer is always 40 (and the Font of the character will probably be the same as the Surrounding font) Pretty useless. In older versions of Word you used to be able to look for the 40 character and find these characters, but I don't think that's possible now. But if you select the character and use
Sub SymInfo()
With Dialogs(wdDialogInsertSymbol)
' You won't see .Font and .CharNum listed under the
' properties of a Word.Dialog - some older Dialogs add
' per-Dialog properties at runtime.
Debug.Print .Font
Debug.Print .CharNum
End With
End Sub
Then you get the font name (Wingdings in this case) and the private use area character number, except it's expressed as a negative number (-3928 for Wingdings 168). The character to use in the Find/Replace is 65536-3928 = 61608.
Alternatively, you can find the private use area code by selecting the character, getting its WordOpenXML code, then finding the XML element that gives the code (and the font). Ideally use MSXML to look for the element but the following gives the general idea.
Sub getSymElement
Dim finish As Long
Dim start As Long
Dim x As String
x = Selection.WordOpenXML
start = Instr(1,x,"<w:sym")
' Should check for start = 0 (not found) here.
finish = Instr(start,x,">")
Debug.Print Mid(x,start, finish + 1 - start)
and for the 168 character you should see something like
<w:sym w:font="Wingdings" w:char="F0A8"/>
(Hex F0A8 is 61608)
There may be a problem where Word could potentially map more than one font/code to the same unicode private use area codepoint. There is some further code by Dave Rado here but I do not think you will need it for this particular problem.
After some follow-up, the following seems to work reasonably well here:
Sub replaceWingdingsWithCCs()
Dim cc As Word.ContentControl
Dim charcode As Variant
Dim ccchecked As Variant
Dim i As Integer
Dim r As Word.Range
' Make sure the selection point is not in the way
' (If the selection contains one of the characters you are trying to
' replace, Word will raise an error about the selection being in a
' plain text content control.
' If the first item in the document is not a CC,
' it's enouugh to do this:
ActiveDocument.Select
Selection.Collapse WdCollapseDirection.wdCollapseStart
' Put the character codes you need to look for here. Maybe you have some checked boxes too?
charcode = Array(61551, 61608)
' FOr each code, say whether you want a checked box (True) or an unchecked one.
ccchecked = Array(False, False)
For i = LBound(charcode) To UBound(charcode)
Set r = ActiveDocument.Range
With r.Find
.ClearFormatting
With .Replacement
.ClearFormatting
.Text = ""
End With
.Forward = True
.Wrap = wdFindStop
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.Text = ChrW(charcode(i))
Do While .Execute(Replace:=True)
Set cc = r.ContentControls.Add(WdContentControlType.wdContentControlCheckBox)
cc.Checked = ccchecked(i)
r.End = r.Document.Range.End
r.Start = cc.Range.End + 1
Set cc = Nothing
Loop
End With
Next
Set r = Nothing
End Sub

How to replace with multiline string?

The code for find and replace is working when replacement string is short (in a line) but causing problem when replacement string is multiline string (may be without new line char).
With WordDoc.Content.Find
.Text = "<<audit_standard>>"
.Replacement.Text = Range("B9")
.Wrap = 1
.Execute Replace:=1
End With
This works when Cell B9 content is short and can be fit in a single line in Word file.
Neither a Find expression nor a Replace expression can include a line break. Wrapped lines in the source are of no consequence however, unless they are more than 255 characters long (the F/R string-length limit). What might be able to use is:
Range("B9").Copy
With WordDoc.Content.Find
.Text = "<<audit_standard>>"
.Replacement.Text = "^c"
.Wrap = 1
.Execute Replace:=1
End With
Alternatively, to insert a line break (not a paragraph break) but otherwise discard the source formatting:
With WordDoc.Content.Find
.Text = "<<audit_standard>>"
.Replacement.Text = Replace(Range("B9").Text, Chr(10), "^l")
.Wrap = 1
.Execute Replace:=1
End With
If you want a paragraph break instead of a line break, replace ^l with ^p.
After researching about it little more, I found an easy and elegant solution.
I do not see any problem with my new approach for now. It's working perfectly.
In word document I created a bookmark named "audit_standard" for <<audit_standard>> text.
and then wrote below code in VBA for replacement.
WordDoc.Bookmarks("audit_standard").Range.text = Range("B9").Value

Word VBA: Convert Superscripts in Footers/Headers

I'm using code like this to output all the headers/footers per section to a text file:
Word VBA - getting text file output to look right
I've been able to convert all the special characters in the sHeader/sFooter string variables using Replace() without a problem.
Example: sFooter = Replace(sFooter, ChrW(8804), "^R'\ {\uc2\u8804 <=}'")
All is great until I try to replace a superscript. Alt+x only gives me the value of the letter itself. I've tried looking up the hex value (may not even be a thing) with no success. I've even tried the different Latin character values.
I recorded a macro to see how Word would replace it and got:
Selection.Find.ClearFormatting
With Selection.Find.Font
.Superscript = True
.Subscript = False
End With
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "A"
.Replacement.Text = "^super{a}"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
The problem is I don't know how to use them together.
I tried to use oSec.Footers(wdHeaderFooterFirstPage).Range.Find... instead of Selection.Find... but that results in a null string.
Can I make a String object somehow from sHeader/sFooter so I can utilize the recorded macro code? I tried googling, converting string to object, but didn't get anything help. Is there a better approach?
Thanks in advance.
I know this isn't exactly answering your question, but the problem is that the character is the same regardless of whether it is superscript or not, i.e. superscript is formatting, just like bold or italics.
For example, a bold A, a normal A and a superscript A all have code 65.
If you are saving to a text file, you won't be able to save formatting, so you would need to invent your own representation for a superscript character.
The representation from the Find/Replace code, i.e. "^super{a}" is what Word uses to allow you to do Find/Replace on text with a specific formatting.
Note: There are some special characters such as ² which have a separate character code to 2, so these would have a different code.
Thank you M1chael for your input.
I was able to get the recorded macro above to work by setting the selection to the current header/footer section.
Example:
oSec.Footers(wdHeaderFooterPrimary).Range.Select
Selection = ActiveDocument.ActiveWindow.Selection
Thanks

VBA emulation of WinWord's file compare for strings

I'm replacing certain strings throughout a WinWord document with strings of a slightly different spelling with revision tracking being enabled.
Revision tracking will mark the whole original string as being deleted and the whole replacement string as being inserted. Anybody who is reviewing the text and wanting to know why a certain string has been replaced will have to visually compare both strings, even if they differ only in one or two characters.
I would much prefer if only the differing characters are being marked as revisions. That would probably mean that I have to emulate WinWords file compare function, albeit not for whole files but for strings within a given file. Has anybody already tried anything like that or a good idea of how to approach this task? (I know, it is possible to copy original and replacement string into 2 empty files, compare the files and use the result, but with hundreds of strings within a single file this is no practical solution.)
One way to do this is to be more selective in how you're actually replacing the words. The built in Find/Replace works by replacing the entire word, but by utilising VBA you can be more specific about what is being replaced. So we still use the built in Find function to identify the words to be replaced, but then you could iterate through each character in the word and compare against your replacement text therefore only replacing what is necessary. I've commented to below code to explain how an example of this would work.
This would produce output as in the below examples, only highlighting the actual characters that have been changed.
Sub replaceDifferencesOnly()
Dim findText As String: findText = "Analyze"
Dim replaceText As String: replaceText = "Analyse"
Dim found As Boolean: found = True
' Using the built-in 'Find' Function, loop through each instance of
' findText within the Document.
While found
With Selection.Find
.Text = findText
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
' If findText is found within the Document...
If Selection.Find.Execute = True Then
Dim match As Range
Dim char As Range
Dim position As Integer: position = 1
' Transfer the Selections Range into a seperate Range object.
Set match = Selection.Range
For Each char In match.Characters
Select Case Len(replaceText)
Case Is >= position
' If findText and replaceText are currently of an equal length...
' Simply compare and replace differing character.
If Not char.Text = Mid(replaceText, position, 1) Then
char.InsertBefore Mid(replaceText, position, 1)
' Inserting a Character will extend the size of the current 'character' or Range
' Move the start position of the range on so as to only delete the original unwanted character.
char.MoveStart
char.Delete wdCharacter
End If
Case Is < position
' If replaceText is shorter than findText...
' Simply delete the remaining unwanted characters in findText.
char.Delete
End Select
position = position + 1
Next char
If position <= Len(replaceText) Then
' Finally if replaceText is longer than findText ie. we've processed each original character in
' findText but there are still more characters in replaceText...
' Simply append the remaining characetrs within replaceText to the end of the Range
match.InsertAfter Mid(replaceText, position, (Len(replaceText) - position) + 1)
End If
Else
' No match from Find so exit the routine as there is nothing more to replace.
found = False
End If
Wend
' Clear the current Selected text.
Selection.Collapse
End Sub

writing word macro -- how do I write a Find to locate the next Korean character on a line?

I'm trying to take a large file of Korean vocabulary and set it up to import smoothly into a flashcard program. The format of the file is [Korean word/phrase] [English translation] [Korean sample sentence]. Example:
너무 피곤해서 Because I’m tired 너무 피곤해서 잤어요.
I can write a macro to look for the first English letter and replace the space before it with a tab. I identified the first English letter by searching for the range [a-Z]. After that I want to locate the beginning of the sample sentence by searching for the next Korean character encountered, but what is the range for Korean characters?
I found a unicode FAQ on Korean characters which seemed to suggest that each character is really just a combination of individual letters, and in some kinds of programming can be treated as the separate letters, but I probably misunderstood. The idea was that something like "식" is really the three letters "ㅅ" + "ㅣ" + "ㄱ". So I tried a search on just the one letter "ㅅ" (which appears in tons of characters in my input file) and got no hits. That sure had the potential to make things simple, but no dice.
Okay, got it -- found the range here:
http://en.wikipedia.org/wiki/Korean_language_and_computers#Hangul_in_Unicode
The below code in my macro finds the next Korean character in a Word document:
With Selection.Find
.Text = "[" & ChrW(44032) & "-" & ChrW(55203) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute