How can I determine if text is in Cyrillic characters? - vba

My junk mail folder has been filling up with messages composed in what appears to be the Cyrillic alphabet. If a message body or a message subject is in Cyrillic, I want to permanently delete it.
On my screen I see Cyrillic characters, but when I iterate through the messages in VBA within Outlook, the "Subject" property of the message returns question marks.
How can I determine if the subject of the message is in Cyrillic characters?
(Note: I have examined the "InternetCodepage" property - it's usually Western European.)

The String datatype in VB/VBA can handle Unicode characters, but the IDE itself has trouble displaying them (hence the question marks).
I wrote an IsCyrillic function that might help you out. The function takes a single String argument and returns True if the string contains at least one Cyrillic character. I tested this code with Outlook 2007 and it seems to work fine. To test it, I sent myself a few e-mails with Cyrillic text in the subject line and verified that my test code could correctly pick out those e-mails from among everything else in my Inbox.
So, I actually have two code snippets:
The code that contains the IsCyrillic function. This can be copy-pasted
into a new VBA module or added to
the code you already have.
The Test routine I wrote (in Outlook VBA) to test that the code actually works. It demonstrates how to use the IsCyrillic function.
The Code
Option Explicit
Public Const errInvalidArgument = 5
' Returns True if sText contains at least one Cyrillic character'
' NOTE: Assumes UTF-16 encoding'
Public Function IsCyrillic(ByVal sText As String) As Boolean
Dim i As Long
' Loop through each char. If we hit a Cryrillic char, return True.'
For i = 1 To Len(sText)
If IsCharCyrillic(Mid(sText, i, 1)) Then
IsCyrillic = True
Exit Function
End If
Next
End Function
' Returns True if the given character is part of the Cyrillic alphabet'
' NOTE: Assumes UTF-16 encoding'
Private Function IsCharCyrillic(ByVal sChar As String) As Boolean
' According to the first few Google pages I found, '
' Cyrillic is stored at U+400-U+52f '
Const CYRILLIC_START As Integer = &H400
Const CYRILLIC_END As Integer = &H52F
' A (valid) single Unicode char will be two bytes long'
If LenB(sChar) <> 2 Then
Err.Raise errInvalidArgument, _
"IsCharCyrillic", _
"sChar must be a single Unicode character"
End If
' Get Unicode value of character'
Dim nCharCode As Integer
nCharCode = AscW(sChar)
' Is char code in the range of the Cyrillic characters?'
If (nCharCode >= CYRILLIC_START And nCharCode <= CYRILLIC_END) Then
IsCharCyrillic = True
End If
End Function
Example Usage
' On my box, this code iterates through my Inbox. On your machine,'
' you may have to switch to your Inbox in Outlook before running this code.'
' I placed this code in `ThisOutlookSession` in the VBA editor. I called'
' it in the Immediate window by typing `ThisOutlookSession.TestIsCyrillic`'
Public Sub TestIsCyrillic()
Dim oItem As Object
Dim oMailItem As MailItem
For Each oItem In ThisOutlookSession.ActiveExplorer.CurrentFolder.Items
If TypeOf oItem Is MailItem Then
Set oMailItem = oItem
If IsCyrillic(oMailItem.Subject) Then
' I just printed out the offending subject line '
' (it will display as ? marks, but I just '
' wanted to see it output something) '
' In your case, you could change this line to: '
' '
' oMailItem.Delete '
' '
' to actually delete the message '
Debug.Print oMailItem.Subject
End If
End If
Next
End Sub

the "Subject" property of the message returns a bunch of question marks.
A classic string encoding problem. Sounds like that property is returning ASCII but you want UTF-8 or Unicode.

It seems to me you have an easy solution already - just look for any subject line with (say) 5 question marks in it

Related

What does a hyperlink range.start and range.end refer to?

I'm trying to manipulate some text from a MS Word document that includes hyperlinks. However, I'm tripping up at understanding exactly what Range.Start and Range.End are returning.
I banged a few random words into an empty document, and added some hyperlinks. Then wrote the following macro...
Sub ExtractHyperlinks()
Dim rHyperlink As Range
Dim rEverything As Range
Dim wdHyperlink As Hyperlink
For Each wdHyperlink In ActiveDocument.Hyperlinks
Set rHyperlink = wdHyperlink.Range
Set rEverything = ActiveDocument.Range
rEverything.TextRetrievalMode.IncludeFieldCodes = True
Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start) & "#" & vbCrLf
Next
End Sub
However, the output between the #s does not quite match up with the hyperlinks, and is more than a character or two out. So if the .Start and .End do not return char positions, what do they return?
This is a bit of a simplification but it's because rEverything counts everything before the hyperlink, then all the characters in the hyperlink field code (including 1 character for each of the opening and closing field code braces), then all the characters in the hyperlink field result, then all the characters after the field.
However, the character count in the range (e.g. rEverything.Characters.Count or len(rEverything)) only includes the field result if TextRetrievalMode.IncludeFieldCodes is set to False and only includes the field code if TextRetrievalMode.IncludeFieldCodes is set to True.
So the character count is always smaller than the range.End-range.Start.
In this case if you change your Debug expression to something like
Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start - (rEverything.End - rEverything.Start - 1 - Len(rEverything))) & "#" & vbCrLf
you may see results more along the lines you expect.
Another way to visualise what is going on is as follows:
Create a very short document with a piece of text followed by a short hyperlink field with short result, followed by a piece of text. Put the following code in a module:
Sub Select1()
Dim i as long
With ActiveDocument
For i = .Range.Start to .Range.End
.Range(i,i).Select
Next
End With
End Sub
Insert a breakpoint on the "Next" line.
Then run the code once with the field codes displayed and once with the field results displayed. You should see the progress of the selection "pause" either at the beginning or the end of the field, as the Select keeps "selecting" something that you cannot actually see.
Range.Start returns the character position from the beginning of the document to the start of the range; Range.End to the end of the range.
BUT everything visible as characters are not the only things that get counted, and therein lies the problem.
Examples of "hidden" things that are counted, but not visible:
"control characters" associated with content controls
"control characters" associated with fields (which also means hyperlinks), which can be seen if field result is toggled to field code display using Alt+F9
table structures (ANSI 07 and ANSI 13)
text with the font formatting "hidden"
For this reason, using Range.Start and Range.End to get a "real" position in the document is neither reliable nor recommended. The properties are useful, for example, to set the position of one range relative to the position of another.
You can get a somewhat more accurate result using the Range.TextRetrievalMode boolean properties IncludeHiddenText and IncludeFieldCodes. But these don't affect the structural elements involved with content controls and tables.
Thank you both so much for pointing out this approach was doomed but that I could still use .Start/.End for relative positions. What I was ultimately trying to do was turn a passed paragraph into HTML, with the hyperlinks.
I'll post what worked here in case anyone else has a use for it.
Function ExtractHyperlinks(rParagraph As Range) As String
Dim rHyperlink As Range
Dim wdHyperlink As Hyperlink
Dim iCaretHold As Integer, iCaretMove As Integer, rCaret As Range
Dim s As String
iCaretHold = 1
iCaretMove = 1
For Each wdHyperlink In rParagraph.Hyperlinks
Set rHyperlink = wdHyperlink.Range
Do
Set rCaret = ActiveDocument.Range(rParagraph.Characters(iCaretMove).Start, rParagraph.Characters(iCaretMove).End)
If RangeContains(rHyperlink, rCaret) Then
s = s & Mid(rParagraph.Text, iCaretHold, iCaretMove - iCaretHold) & "" & IIf(wdHyperlink.TextToDisplay <> "", wdHyperlink.TextToDisplay, wdHyperlink.Address) & ""
iCaretHold = iCaretMove + Len(wdHyperlink.TextToDisplay)
iCaretMove = iCaretHold
Exit Do
Else
iCaretMove = iCaretMove + 1
End If
Loop Until iCaretMove > Len(rParagraph.Text)
Next
If iCaretMove < Len(rParagraph.Text) Then
s = s & Mid(rParagraph.Text, iCaretMove)
End If
ExtractHyperlinks = "<p>" & s & "</p>"
End Function
Function RangeContains(rParent As Range, rChild As Range) As Boolean
If rChild.Start >= rParent.Start And rChild.End <= rParent.End Then
RangeContains = True
Else
RangeContains = False
End If
End Function

Insert RichText (From RichTextBox, RTF File, OR Clipboard) into Word Document (Bookmarks or Find/Replace)

To summarize what I'm attempting to do, I work for a non-profit organization that sends out acknowledgement letters when someone donates money to us (a thank you, basically). We have multiple different letters that are written every month and sent to IS to "process". I would like to make this as efficient and use as little time as possible for IS, so I've created a program in VB.NET that takes content and pastes it into a template using Word bookmarks, updates a table in SQL so that the letter can be tested with live data, and sends an e-mail to the Production department letting them know to test the letter. It works fully, except...
I cannot for the life of me figure out how to retain RTF (RichText) when I insert the content into the letter template.
I've tried saving the content of the RichTextBox as an RTF file, but I can't figure out how to insert the RTF file contents into my document template and replace the bookmark.
I've tried using the Clipboard.SetText, odoc......Paste method, but it's unreliable as I can't accurately state where I'd like the text to paste. The find/replace function isn't very helpful because all of the bookmarks I'm trying to replace are within text boxes.
I'd show some code, but most of it has been deleted out of frustration for not working. Either way, here's some code I've been working with:
Private Sub testing()
strTemplateLocation = "\\SERVER\AcknowledgementLetters\TEST\TEMPLATE.dot"
Dim Selection As Word.Selection
Dim goWord As Word.Application
Dim odoc As Word.Document
goWord = CreateObject("Word.Application")
goWord.Visible = True
odoc = goWord.Documents.Add(strTemplateLocation)
Clipboard.Clear()
Clipboard.SetText(txtPreD.Rtf, TextDataFormat.Rtf)
odoc.Content.Find.Execute(FindText:="<fp>", ReplaceWith:=My.Computer.Clipboard.GetText)
'Code for looping through all MS Word Textboxes, but didn't produce desired results
For Each oCtl As Shape In odoc.Shapes
If oCtl.Type = Microsoft.Office.Core.MsoShapeType.msoTextBox Then
oCtl.TextFrame.TextRange.Text.Replace("<fp>", "Test")
goWord.Selection.Paste()
End If
Next
'Clipboard.Clear()
'Clipboard.SetText(txtPostD.Rtf, TextDataFormat.Rtf)
'odoc.Content.Find.Execute(FindText:="<bp>", ReplaceWith:="")
'goWord.Selection.Paste()
MsgBox("Click Ok when finished checking.")
odoc.SaveAs2("\\SERVER\AcknowledgementLetters\TEST\TEST.docx")
odoc = Nothing
goWord.Quit(False)
odoc = Nothing
goWord = Nothing
End Sub
...and here is the default code for setting bookmarks. This works perfectly as long as formatting is not required:
Private Sub SetBookmark(odoc As Object, strBookmark As String, strValue As String)
Dim bookMarkRange As Object
If odoc.Bookmarks.Exists(strBookmark) = False Then
Exit Sub
End If
bookMarkRange = odoc.Bookmarks(strBookmark).Range
If ((Err.Number = 0) And (Not (bookMarkRange Is Nothing))) Then
bookMarkRange.text = strValue
odoc.Bookmarks.Add(strBookmark, bookMarkRange)
bookMarkRange = Nothing
End If
End Sub
TL;DR - Need formatted text (Example: "TEST") to be inserted into a Word document either as a bookmark or as a replacement text.
Expected results: Replace "fp" (front page) bookmark with "TEST" including bold formatting.
Actual results: "fp" is not replaced (when using clipboard and find/replace method), or is replaced as "TEST" with no formatting.
I figured it out! I had to do it a weird way, but it works.
The following code saves the RichTextBox as an .rtf file:
RichTextBoxName.SaveFile("temp .rtf file location")
I then used the following code to insert the .rtf file into the bookmark:
goWord.ActiveDocument.Bookmarks("BookmarkName").Select()
goWord.Selection.InsertFile(FileName:="temp .rtf file location")
I then deleted the temp files:
If My.Computer.FileSystem.FileExists("temp .rtf file location") Then
My.Computer.FileSystem.DeleteFile("\temp .rtf file location")
End If

How to get Selection.Text to read displaytext of Macro field code

I'm having some trouble figuring this out, and would really appreciate some help. I'm trying to write a macro that uses the selection.text property as a Case text-expression. When the macro is clicked in Microsoft Word, the selected text is automatically set to the DisplayText. This method worked great for the formatting via Selection.Font.Color for a quick and dirty formatting toggling macro, but it doesn't work for the actual text.
When debugging with MsgBox, it is showing a box (Eg: □ ) as the value.
For example,
Word Field Code:
{ MACROBUTTON Macro_name DisplayText }
VBA Code run when highlighting "DisplayText" in Word:
Sub Macro_name()
Dim Str As String
Str = Selection.Text
MsgBox Str
Select Case Str
Case "DisplayText"
MsgBox "A was selected"
Case "B"
MsgBox "B was selected"
End Select
End Sub
What is output is a Message Box that only shows □
When I run this macro with some regular text selected, it works just fine.
My question is this: Is there a way to have the macro read the displaytext part of the field code for use in the macro?
You can read the field code, directly, instead of the selection (or the Field.Result which also doesn't give the text).
It's not quite clear how this macro is to be used throughout the document, so the code sample below provides two variations.
Both check whether the selection contains fields and if so, whether the (first) field is a MacroButton field. The field code is then tested.
In the variation that's commented out (the simpler one) the code then simply checks whether the MacroButton display text is present in the field code. If it is, that text is assigned to the string variable being tested by the Select statement.
If this is insufficient because the display text is "unknown" (more than one MacroButton field, perhaps) then it's necessary to locate the part of the field code that contains the display text. In this case, the function InstrRev locates the end point of the combined field name and macro name, plus the intervening spaces, in the entire field code, searching from the end of the string. After that, the Mid function extracts the display text and assigns it to the string variable tested by the Select statement.
In both variations, if the selection does not contain a MacroButton field then the selected test is assigned to the string variable for the Select statement.
(Note that for my tests I needed to use Case Else in the Select statement. You probably want to change that back to Case "B"...)
Sub Display_Field_DisplayText()
Dim Str As String, strDisplayText As String
Dim textLoc As Long
Dim strFieldText As String, strMacroName As String
Dim strFieldName As String, strFieldCode As String
strDisplayText = "text to display"
If Selection.Fields.Count > 0 Then
If Selection.Fields(1).Type = wdFieldMacroButton Then
strFieldName = "MacroButton "
strMacroName = "Display_Field_DisplayText "
strFieldCode = strFieldName & strMacroName
Str = Selection.Fields(1).code.text
textLoc = InStrRev(Str, strFieldCode)
strFieldText = Mid(Str, textLoc + Len(strFieldCode))
MsgBox strFieldText
Str = strFieldText
'If InStr(Selection.Fields(1).code.text, strDisplayText) > 0 Then
' Str = strDisplayText
'End If
End If
Else
Str = Selection.text
End If
Select Case Str
Case strDisplayText
MsgBox "A was selected"
Case Else
MsgBox "B was selected"
End Select
End Sub

How to hardcode long strings in VBA

I'm writing a macro to save VBA modules as 64 bit strings in another self-extracting module. The self-extracting module is designed to hold several long strings (could be any length, up to the max 2GB strings I suppose), and a few short snippets of code to decompress the strings and import the modules they represent.
Anyway, when my macro builds the self extracting module it needs to save the really long strings (I'm saving as hardcoded Consts). But if they are too long (>1024 ish) to fit on a single line in the VBA editor, I get errors.
How should I format these hardcoded strings so that I can save them either as Consts or in another way in my self-extracting module? So far I've been saving each string as several Const declarations in 1000 character chunks, but it would be preferable to have one string per item only.
As suggested in the comment, you can use custom XML part to store information inside the workbook.
Here’s the code:
Option Explicit
Public Sub AddCustomPart()
Dim oXmlPart As CustomXMLPart
Dim strTest As String
strTest = "<Test_ID>123456</Test_ID>"
Set oXmlPart = ReadCustomPart("Test_ID")
'/ Check if there is already an elemnt available with same name.
'/ VBA or Excel Object Model, doesn't perevnt duplicate entries.
If oXmlPart Is Nothing Then
Set oXmlPart = ThisWorkbook.CustomXMLParts.Add(strTest)
Else
MsgBox oXmlPart.DocumentElement.Text
End If
End Sub
Function ReadCustomPart(strTest As String) As CustomXMLPart
Dim oXmlPart As CustomXMLPart
For Each oXmlPart In ThisWorkbook.CustomXMLParts
If Not oXmlPart.DocumentElement Is Nothing Then
If oXmlPart.SelectSingleNode("/*").BaseName = strTest Then
Set ReadCustomPart = oXmlPart
Exit Function
End If
End If
Next
Set ReadCustomPart = Nothing
End Function

Find each word marked as error

Is it possible to find words that MS-Word marks as errors?
My goal is to find words containing "è" instead of "é", but to use a macro I need to replace the char only into words marked as error.
I'm working on MS-Word 2013
here is some code to get you started. you need to add code that checks for the "bad" letter
' this is just demo code that shows how misspelled words could be replaced
' create document with a few words, one or two misspelled
' then single-step this code using F8 key
' while watching the text in the document
Sub aaaaaa()
Dim i As Integer
Dim badChr As String
Dim badWrd As String
Dim wrd As Object
For Each wrd In ActiveDocument.Words
If wrd.SpellingErrors.Count > 0 Then
badWrd = wrd.SpellingErrors(1).Text
Debug.Print badWrd
wrd.SpellingErrors(1).Text = string(len(badWrd),"x") ' replace whole word if you like
wrd.SpellingErrors(1).Text = badWrd ' put back original
For i = 1 To wrd.SpellingErrors(1).Characters.Count ' loop characters in misspelled word
badChr = wrd.SpellingErrors(1).Characters(i).Text
wrd.SpellingErrors(1).Characters(i).Text = "x" ' replace character
wrd.SpellingErrors(1).Characters(i).Text = badChr ' restore character
Next i
End If
Next wrd
End Sub