Identify line breaks in Excel VBA - vba

I'm trying to detect whether a cell contains line breaks (not the menu bar cell option, but actual multiple lines entered with Alt+Enter) and extract each line separately.
I have tried both
InStr(str, "\n") < 0
and
Split(str, "\n")
But neither seems to work.

VBA is not C#, so "\n" is not an escape sequence here. Line breaks are represented by the vbCr, vbLf, or vbCrLf constants.
For further information, please see:
vbCr
vbLf
vbCrLf
[EDIT]
Points to Mat's Mug answer! I forgot about vbNewLine constant.

There are no escape sequences in VBA. Use the built-in vbNewLine constant instead for the equivalent:
hasLineBreaks = InStr(str, vbNewLine) > 0
Per MSDN, vbNewline returns a Platform-specific new line character; whichever is appropriate for current platform, that is:
Chr(13) + Chr(10) [on Windows] or, on the Macintosh, Chr(13)
So you don't need to work with ASCII character codes, or even with their respective built-in constants.
Except Excel will strip CR chars from cell and shape contents, and this has nothing to do with VBA (the CR chars would be stripped all the same and "\n" wouldn't work for correctly reading that Excel data in C#, Javascript, or Python either) and everything to do with the context of where the string came from.
To read "line breaks" in a string with the CR chars stripped, you need to look for line feed chars in the string (vbLf).
But if you systematically treat the line feed character as a line ending, you'll eventually run into problems (especially cross-platform), because ASCII 10 all by itself isn't an actual line break on either platform. You'll find ASCII 13 characters in strings you thought you had stripped line breaks from, and they'll still properly line-break on a Mac, but not on Windows.

Consider either:
Split(str, Chr(10))
or
Split(str, Chr(13))
You may need to try both if the data has been imported from an external source.
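The answers above can be combined into a minimal sketch that handles all three line-ending styles at once. The function names HasLineBreaks and SplitCellLines are hypothetical, and the approach (normalizing everything to vbLf before splitting) is one possible design, not the only one:

```vba
' Hypothetical helper: True if the text contains any CR or LF character.
Public Function HasLineBreaks(ByVal cellText As String) As Boolean
    HasLineBreaks = (InStr(cellText, vbLf) > 0) Or (InStr(cellText, vbCr) > 0)
End Function

' Hypothetical helper: splits text into lines regardless of line-ending style.
Public Function SplitCellLines(ByVal cellText As String) As String()
    Dim normalized As String
    ' Collapse CRLF first so a pair does not turn into two breaks,
    ' then convert any remaining lone CR.
    normalized = Replace(cellText, vbCrLf, vbLf)
    normalized = Replace(normalized, vbCr, vbLf)
    SplitCellLines = Split(normalized, vbLf)
End Function
```

For a cell typed with Alt+Enter, something like SplitCellLines(ActiveCell.Value) should return one array element per visible line.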

Related

Change Font for Chinese Characters in VBA

I am trying to write a helper script for a colleague that will automatically open up all .doc(x) files in a directory, find any and all chinese characters, set their Font, save and close.
I already have a working version of this script. The file opening/saving/closing part is handled in Python/win32com and works fine. My big point of contention is still the VBA macro.
I know there is a regex (\p{Han}) that should be able to catch all Chinese characters, but this does not seem to work in VBA. Similarly, I have tried using Unicode ranges and ChrW(). Nothing so far has produced any output, let alone correct output.
Out of frustration, I made one last-ditch attempt and simply inverted the search parameters. This is how it is now:
Sub FindReplace_zh(Rng As Range)
    With Rng.Find
        Do While .Execute(FindText:="[!A-ZÄÖÜa-zäöü0-9><_ ^11^13§$²³%#&/\+-]", MatchWildcards:=True)
            If Rng.Font.Bold = True And Rng.Font.Name Like "Arial*" Then
                Rng.Font.Name = "SimHei"
            ElseIf Rng.Font.Bold = False And Rng.Font.Name Like "Arial*" Then
                Rng.Font.Name = "SimSun"
            End If
            Rng.Collapse 0
        Loop
    End With
End Sub
AT LEAST THIS WORKS, but it's far from elegant and still produces some undesired output.
I have yet to understand how I can substitute "[!A-ZÄÖÜa-zäöü0-9><_ ^11^13§$²³%#&/\+-]" with a variable, or most anything else. Many characters are not covered by this pattern, such as "(", ")" etc., but adding them (even escaped with \) results in runtime errors in VBA.
I found a lot of tutorials and questions dealing with removing or inserting text, but my specific case of finding text and then changing the font, while leaving everything else untouched, seems rather specific.
Fun fact:
I had to add ^11 and ^13 to the regex list, as not including them would lead to the Macro inserting new linebreaks in random positions of the .doc
EDIT:
New try with comment:
Dim searchPattern As String
searchPattern = "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]{1,}"
With Rng.Find
Do While .Execute(FindText:=searchPattern, MatchWildcards:=True)
Invalid operation on final line!
I also would not have concatenated a string like this. I am not sure how VBA parses this, but apparently not the way we hoped.
EDIT2: FIX
Removing "{1,}" from searchPattern did it. Now it works exactly as I expected it to :)
searchPattern = "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]"
It is possible to find the value of characters that cannot be represented in the VBIDE by pasting them into an empty Word document and then using VBA to print the AscW values of each character in the text you wish to investigate. You can then use ChrW in VBA to reassemble the text in a VBA friendly way.
From
pinyin.info/news/2016/…
You can use the find string "[⺀-■]{1,}" to find any Chinese character. However, as you have noted, when you paste this text into the VBA IDE you get [?-?]{1,}, because the VBA editor cannot display these Unicode characters (I think it uses the system ANSI code page rather than UTF-8).
The following code
Public Sub PrintCharacterValues()
    Dim myIndex As Long
    With ActiveDocument.Paragraphs(1).Range
        For myIndex = 1 To 8
            Debug.Print .Characters(myIndex), AscW(.Characters(myIndex)), Hex(AscW(.Characters(myIndex)))
        Next
    End With
End Sub
Gives the output of
" 34 22
[ 91 5B
? 11904 2E80
- 45 2D
? -19 FFED
] 93 5D
" 34 22
  160 A0
Thus you can get the critical section of the find string as
"[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]"
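Putting the pieces together, here is a minimal sketch of a Word macro that uses this ChrW-built pattern to change the font of each match. The Sub name SetFontForCjk and the fontName parameter are hypothetical; the range U+2E80 to U+FFED is taken from the answer above and also covers non-Chinese characters such as kana and full-width forms:

```vba
' Hypothetical sketch: applies fontName to every character in the
' U+2E80..U+FFED range found inside target (Word VBA).
Public Sub SetFontForCjk(ByVal target As Range, ByVal fontName As String)
    Dim searchPattern As String
    ' Built with ChrW because the VBA editor cannot hold these characters as literals.
    ' No "{1,}" quantifier: the question reports that it caused an error.
    searchPattern = "[" & ChrW(&H2E80) & "-" & ChrW(&HFFED) & "]"

    With target.Find
        .ClearFormatting
        .Text = searchPattern
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop              ' stop at the end instead of wrapping around
        Do While .Execute
            ' After a successful Execute, target is redefined to the match.
            target.Font.Name = fontName
            target.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```

A call such as SetFontForCjk ActiveDocument.Content, "SimSun" would process the main text of the active document; the Bold/Arial checks from the question could be added back inside the loop.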

How can I use special chars in VBA of Microsoft Word?

I've created a set of macro files in Microsoft Word's VBA as a sort of a CAT tool (CAT = https://en.wikipedia.org/wiki/Computer-assisted_translation). The problem is that there are cases where I display the text needed to be translated and the user needs to input text in his own language. That might include some special chars, like "ăîâșț/ĂÎÂȘȚ", or even quotes or brackets. Is there any way to use those in some InputBox function? Or, at least, some way to let the user input the text he needs in some TextBox or something?... Or how should I approach this?... Maybe UTF-8 support would be what I need? Or?... Any help would be appreciated!...
I've tried Microsoft Word's vba function InputBox. I'm also thinking if, maybe, I would be able to create my own InputBox, with my conditions on it, I might be able to have one that accepts those chars too, or all the chars into some string variable... Here is something someone on StackOverflow says:
Is it possible to create an 'input box' in VBA that can take a text selection with multiple lines as an input? (I'm referring to gizlmo's answer...)
Here are 3 lines of code that contain that (although it's more of a how to question, not a debugging question, so those are not really needed...)
MsgBox ("Ziua " & Str(ziua) & " - " & titlurien(ziua))
titluales = InputBox("Titlul original: " & titlurien(ziua), "Ziua: " & Str(ziua) & ", Rapsodia Realitatilor " & monthname(lunanecesara) & Str(annecesar))
titluriro(ziua) = titluales
I expect the output to be exactly what he typed, whether it's quotes, brackets or special characters (like "ăîâșț"/"ĂÎÂȘȚ")...
A VBA InputBox will take any character typed or pasted into it. The characters available to type depends on the Language version of Windows and Office that the end user has installed.
Below is a test I just made with your example character string "ăîâșț/ĂÎÂȘȚ"
Sub SpecialCharInput()
    Dim str As String
    ' Prompt the user and echo whatever was typed or pasted
    str = InputBox("Enter your text", "Special Test Input Box")
    Debug.Print str
End Sub
On my English-language system, the only trouble it had was with the upper- and lower-case "ȘȚ" Romanian characters. By trouble I mean it turned those characters into question marks "??" in the result string. I'm sure, though, that if my system supported that language those characters would be recognized and output properly.
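To check whether a character was really lost or is merely displayed as "?" in the Immediate window, one option is to print the code point of each character. This is a minimal diagnostic sketch; the Sub name InspectInputCharacters is hypothetical:

```vba
' Hypothetical diagnostic: prints position and hex code of each character entered.
Public Sub InspectInputCharacters()
    Dim entered As String
    Dim i As Long

    entered = InputBox("Enter your text", "Special Test Input Box")
    For i = 1 To Len(entered)
        ' A character that was replaced by a literal question mark shows as 3F.
        Debug.Print i, Hex$(AscW(Mid$(entered, i, 1)))
    Next i
End Sub
```

If the codes printed are the expected Unicode values rather than 3F, the string itself is intact and only the display is at fault.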

Differences Between vbLf, vbCrLf & vbCr Constants

I used the constants vbLf, vbCrLf and vbCr in a MsgBox; each produces the same output (the text "Hai" appears in the first paragraph and the word "Welcome" appears in the next paragraph).
MsgBox("Hai" & vbLf & "Welcome")
MsgBox ("Hai" & vbCrLf & "Welcome")
MsgBox("Hai" & vbCr & "Welcome")
I know vbLf, vbCrLf and vbCr are used for print and display functions.
I want to know the difference between the vbLf, vbCrLf and vbCr constants.
Constant   Value               Description
----------------------------------------------------------------
vbCr       Chr(13)             Carriage return
vbCrLf     Chr(13) & Chr(10)   Carriage return–linefeed combination
vbLf       Chr(10)             Line feed
vbCr : - return to line beginning
Represents a carriage-return character for print and display functions.
vbCrLf : - similar to pressing Enter
Represents a carriage-return character combined with a linefeed character for print and display functions.
vbLf : - go to next line
Represents a linefeed character for print and display functions.
Read More from Constants Class
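The values in the table can be confirmed from the Immediate window with a minimal sketch like the following (the Sub name ShowLineEndingConstants is hypothetical):

```vba
' Hypothetical sketch: prints the length and character codes of each constant.
Public Sub ShowLineEndingConstants()
    ' vbCr and vbLf are single characters (codes 13 and 10).
    Debug.Print "vbCr", Len(vbCr), Asc(vbCr)
    Debug.Print "vbLf", Len(vbLf), Asc(vbLf)
    ' vbCrLf is two characters: code 13 followed by code 10.
    Debug.Print "vbCrLf", Len(vbCrLf), Asc(Left$(vbCrLf, 1)), Asc(Right$(vbCrLf, 1))
    ' The combination is simply the two single constants concatenated.
    Debug.Print "vbCrLf = vbCr & vbLf:", (vbCrLf = vbCr & vbLf)
End Sub
```

MsgBox renders all three as a line break, which is why the three calls in the question look identical even though the underlying strings differ.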
The three constants have similar functions nowadays, but different historical origins, and very occasionally you may be required to use one or the other.
You need to think back to the days of old manual typewriters to get the origins of this. There are two distinct actions needed to start a new line of text:
move the typing head back to the left. In practice in a typewriter this is done by moving the roll which carries the paper (the "carriage") all the way back to the right -- the typing head is fixed. This is a carriage return.
move the paper up by the width of one line. This is a line feed.
In computers, these two actions are represented by two different characters - carriage return is CR, ASCII character 13, vbCr; line feed is LF, ASCII character 10, vbLf. In the old days of teletypes and line printers, the printer needed to be sent these two characters -- traditionally in the sequence CRLF -- to start a new line, and so the CRLF combination -- vbCrLf -- became a traditional line ending sequence, in some computing environments.
The problem was, of course, that it made just as much sense to only use one character to mark the line ending, and have the terminal or printer perform both the carriage return and line feed actions automatically. And so before you knew it, we had 3 different valid line endings: LF alone (used in Unix and Macintoshes), CR alone (apparently used in older Mac OSes) and the CRLF combination (used in DOS, and hence in Windows). This in turn led to the complications of DOS / Windows programs having the option of opening files in text mode, where any CRLF pair read from the file was converted to a single LF (and vice versa when writing).
So - to cut a (much too) long story short - there are historical reasons for the existence of the three separate line separators, which are now often irrelevant: and perhaps the best course of action in .NET is to use Environment.NewLine which means someone else has decided for you which to use, and future portability issues should be reduced.

FileSystem.WriteAllText adds non-printable characters

Here are two methods for writing text to a file in VB.Net 2012. The first one prepends the same three non-printable characters to each file. The second one works as expected and does not add the three characters. objDataReader is an OleDB datareader.
Any idea why?
Greg
My.Computer.FileSystem.WriteAllText(lblLocation.Text & "\" &
    objDataReader("MessageControlId").ToString & ".txt", objDataReader("MsgContents").ToString, False)
Using outfile As New StreamWriter(lblLocation.Text & "\" & objDataReader("MessageControlId").ToString & ".txt")
    outfile.Write(objDataReader("MsgContents").ToString)
End Using
Thanks. I found the entry below after I Googled BOM, in case anyone wants a more detailed explanation. While the BOM was not visible in a text editor, it did cause problems when I passed the file to our HL7 interface engine.
Greg
Write text files without Byte Order Mark (BOM)?

VBA: newline and carriage return character for an Excel (.xlsx) file with Korean/French

I'm having quite a bit of trouble replacing the newline and carriage return characters in an Excel file (2010 US version and .xlsx extension).
Previously, I have written a macro that does this successfully on a regular Excel file with English with the following code.
newStr = Replace(originalStr, newline/carriage return/both, replacementStr)
Newline/carriage return/both would be vbLf (Chr(10)), vbCr (Chr(13)), or vbCrLf, respectively.
I now have an Excel file with Korean and French in it, and the newline and CR characters seem to be something else. How do I go about finding what they actually are, in terms of Chr() or some VBA constant, and replace these characters? I need to remove all of the newlines and replace them with <br />.
Can you select a relevant cell and check the characters contained, like so:
s = Sheets("Sheet3").[e2]
For i = 1 To Len(s)
    If Asc(Mid(s, i, 1)) < 32 Then
        Debug.Print Asc(Mid(s, i, 1)); " -- "; i
    End If
Next
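Once the loop above shows which control characters are present, the replacement itself can be sketched as follows. The function name LineBreaksToBr is hypothetical, and the sketch assumes the breaks turn out to be some mix of Chr(13) and Chr(10); if the check reveals other codes, they would need their own Replace call:

```vba
' Hypothetical helper: replaces CRLF, lone CR and lone LF with "<br />".
Public Function LineBreaksToBr(ByVal source As String) As String
    Dim result As String
    ' Handle the two-character pair first so it yields a single <br />.
    result = Replace(source, vbCrLf, vbLf)
    result = Replace(result, vbCr, vbLf)
    LineBreaksToBr = Replace(result, vbLf, "<br />")
End Function
```

For example, LineBreaksToBr(Sheets("Sheet3").[e2]) would return the cell text with every line break converted.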