How do I check if a string has a paragraph character? - vba

I need to check whether a string from a Word document contains a paragraph character. I want to extract only the text, without the paragraph character. Is there a constant for the paragraph character? I tried checking for vbNewLine and vbCrLf, but these didn't work.

Have a look at the Wikipedia article on newlines.
In total there are three characters that indicate a new line (in some context), and sometimes they are used in combination.
I don't think it matters which ones Word uses and which it doesn't; you want them all gone.
So I'd say run through all the characters and remove every LF, CR and RS instance, or replace each with a space (while avoiding double spaces).
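As a minimal sketch of that approach in VBA: Word ends a paragraph with a single carriage return (vbCr, Chr(13)) rather than a CR+LF pair, which is likely why the vbCrLf and vbNewLine checks found nothing. The function name StripLineBreaks is made up for illustration, and Chr(11) is included on the assumption that manual line breaks should go too. RS is left out here; it could be handled with one more Replace call.

```vba
' Hypothetical helper: returns s with paragraph marks and line breaks
' replaced by single spaces.
Function StripLineBreaks(ByVal s As String) As String
    Dim result As String
    result = s

    ' Replace the CR+LF pair first so it yields one space, not two.
    result = Replace(result, vbCrLf, " ")
    result = Replace(result, vbCr, " ")     ' Word paragraph mark, Chr(13)
    result = Replace(result, vbLf, " ")     ' line feed, Chr(10)
    result = Replace(result, Chr(11), " ")  ' manual line break (assumption)

    ' Collapse any runs of spaces left behind.
    Do While InStr(result, "  ") > 0
        result = Replace(result, "  ", " ")
    Loop

    StripLineBreaks = Trim$(result)
End Function
```

To only test for a paragraph mark without changing the string, InStr(s, vbCr) > 0 should be enough.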

Related

Finding and redacting text highlighted with a specific color, but while keeping the spaces and line breaks (to maintain doc layout)

I'm trying to use the VBA code from a similar question in this forum to redact text highlighted in a specific color, but I would like to keep the document layout, which means only replacing the words, but not the spaces and paragraph breaks in the document. Alternatively, I would be happy if we could identify the line breaks and put a space there.
At the end, the document would not have large sections of unbroken text where words and spaces were replaced by XXXXXXXX and highlighted black. The text would look more like XX X XXXX XXX X, but all of it should be highlighted in black.
In other words, the text "Mary had a little lamb." would be redacted to "XXXX XXX X XXXXXX XXXXX" rather than "XXXXXXXXXXXXXXXXXXXXXXXX".
I've tried changing the "If flag Then" section to check for Unicode 32 (space) instead of the carriage return (Unicode 13), but that doesn't seem to work.
Many thanks.
If flag Then
    If Selection.Range.HighlightColorIndex = wdTurquoise Then
        ' Create replacement string
        ' If last character is a carriage return (unicode 13), then keep that carriage return
        OldText = Selection.Text
        OldLastChar = Right(OldText, 1)
        NewLastChar = ReplaceChar
        If OldLastChar Like String(1, 13) Then NewLastChar = String(1, 13)
        NewText = String(Len(OldText) - 1, ReplaceChar) & NewLastChar
        ' Replace text, black block
        Selection.Text = NewText
        Selection.Font.ColorIndex = wdBlack
        Selection.Font.Underline = False
        Selection.Range.HighlightColorIndex = wdBlack
        Selection.Collapse wdCollapseEnd
    End If
End If
@freeflow has given you an answer in his comment on your post, but if you do that, you should also include all potential punctuation characters, excluding blank spaces, in the wildcard search.
However, with that said, I recommend that you not try to exclude punctuation characters or the spaces between words from the redaction. I’m recommending that because the purpose of redaction is to eliminate the possibility of someone comprehending what the redacted portion of the document originally contained. If you provide them clues, such as how many words are in the sentence ... they can guess and sometimes be quite accurate because of the surrounding non-redacted text.
Of course, that’s just my opinion.
To maintain document formatting, I suggest that you not use as replacement characters letters such as “X” because it is a wide character. I’ve found it better to use a symbol and I recommend a Wingdings character 127. It’s an average width and does a good job of balancing out sentence length ... but for added assurance I also recommend that you include in your replacement a Font.Spacing of -1, which will tighten up each redacted sentence even more.
In redacting, just be aware that maintaining the document formatting, no matter what your replacement character strategy might be, is very difficult. I’ve spent a lot of time experimenting with this and I’ve now shared what I do in my own redaction add-in. I don’t redact paragraph marks; I redact the entire highlighted string, including spaces and punctuation, and I use Wingdings character 127, set the Font.Spacing to -1, and make the font color the same as whatever color I’m using to highlight the redaction.
If you are interested in seeing my add-in, do a Web search on AuthorTec Redactor.
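For the layout-preserving variant the question asks for (bearing in mind the caveat above that it leaks word lengths), a minimal sketch is to build the replacement string one character at a time and leave whitespace alone. MaskKeepingLayout is a made-up name; ReplaceChar and OldText are the variables from the question's code.

```vba
' Hypothetical helper: masks every character of s with maskChar but keeps
' spaces, tabs, paragraph marks and line breaks where they are.
Function MaskKeepingLayout(ByVal s As String, ByVal maskChar As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        Select Case ch
            Case " ", vbTab, vbCr, vbLf, Chr(11)
                result = result & ch        ' keep layout characters
            Case Else
                result = result & maskChar  ' mask letters, digits, punctuation
        End Select
    Next i

    MaskKeepingLayout = result
End Function
```

In the question's code this would stand in for the String(...) line, for example NewText = MaskKeepingLayout(OldText, ReplaceChar). Punctuation is masked as well, so "Mary had a little lamb." comes out in the "XXXX XXX X XXXXXX XXXXX" shape.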

Does regex not work in Excel search?

I am trying to search for trailing whitespace in text cells in Excel. Knowing that Excel search accepts regex, I expected to be able to use the full feature set, but was surprised to find that some features do not seem to work.
For example, I have some cells with strings like ELUFA\s\s\s\s\s or NATION CONFEC.\s with trailing whitespace. (Note: in my Excel sheet there is no \s, just invisible whitespace after ELUFA. I had to write \s here because otherwise Stack Overflow would strip the whitespace and the string would appear to be just ELUFA.)
I entered the expression [A-Z.]{1}\s+$ into the Excel search function expecting it to return these cells, but it does not, and just tells me that nothing is found.
However, what I find really odd is that Excel search is somehow able to interpret a regex like A *. Using this expression, Excel search finds only the ELUFA\s\s\s\s\s cells, and no cells that do not match this regex.
Are there limitations on which subset of the full regex syntax Excel search accepts? How do we get Excel search to accept the full regex feature set as described here?
Thank you.
The Excel SEARCH() function does not support full regex. It actually only supports two wildcards, ? and *. From the documentation:
You can use the wildcard characters — the question mark (?) and asterisk (*) — in the find_text argument. A question mark matches any single character; an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character.
If you want to match spaces, you will have to enter them as literals. Note that finding any number of trailing spaces could be as simple as searching for ELUFA\s (that is, ELUFA followed by one literal space), because that matches any cell with one or more spaces after ELUFA.
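If real regex matching is needed, one option on Windows is a small VBA function built on the VBScript.RegExp object. This is only a sketch and not something the built-in search offers: HasTrailingWhitespace is a made-up name, and it assumes the VBScript regular expression library is available on the machine.

```vba
' Hypothetical user-defined function: True when s ends in whitespace.
' Assumes Windows, where CreateObject("VBScript.RegExp") is available.
Function HasTrailingWhitespace(ByVal s As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Pattern = "\s+$"   ' one or more whitespace characters at the end
    HasTrailingWhitespace = re.Test(s)
End Function
```

Placed in a standard module, it can be called from a cell, for example =HasTrailingWhitespace(A1), and the results filtered on TRUE.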

How do we convert any special characters in textbox to certain format

I want to convert any special characters entered in an InfoPath text box into underscores. Using the translate function, as in the example below, I am able to replace a space with an underscore:
translate(fileName, " ","_")
Since the translate function takes only three parameters, how can we check for all the special characters? My goal is that if any special character, including a space, is entered into the text box, it is automatically replaced with an underscore ("_").
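What if you reverse it? If it is not A-Z, then convert it.
Or try this:
translate(field1, "!@#%^&", "______")
The third argument needs one underscore for each character in the second; translate deletes any character that has no counterpart there.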

Replace all non latin characters with their latin a-z counterparts and word count in VBA

I am trying to write a script in VBA that will:
1. Replace all É and other similar characters with their Latin counterparts.
2. Remove all non-alphanumeric characters.
3. Remove duplicate spacing.
4. Then word count the string.
I have worked out that I can split the string on " " and count the elements to get the word count... but I am struggling with the rest of it. Help much appreciated.
Word has a word count built in for sentences, paragraphs and the entire document:
ActiveDocument.Words.Count
As for the replacing, it is probably easiest to record a macro to see how this works. You will have to replace the accented characters one by one, or use RegEx to replace all A-type characters (Å, Ä, Â, Á, À) with A, and so on.
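If the text is held in a string rather than in the document, the four steps from the question can be sketched directly in VBA. Everything here is an assumption for illustration: the function name CleanAndCountWords is made up, the accent table covers only common Western European letters, and the default Option Compare Binary is assumed so that [A-Za-z0-9] matches plain ASCII only.

```vba
' Hypothetical helper: folds accents, drops non-alphanumerics, collapses
' spaces, and returns the number of words left in s.
Function CleanAndCountWords(ByVal s As String) As Long
    ' Position n in ACCENTED maps to position n in PLAIN.
    Const ACCENTED As String = "ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöùúûüýÿ"
    Const PLAIN As String = "AAAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuyy"

    Dim i As Long, pos As Long
    Dim ch As String, cleaned As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        ' Step 1: swap an accented letter for its plain counterpart.
        pos = InStr(1, ACCENTED, ch, vbBinaryCompare)
        If pos > 0 Then ch = Mid$(PLAIN, pos, 1)
        ' Step 2: keep letters, digits and spaces; drop everything else.
        If ch Like "[A-Za-z0-9 ]" Then cleaned = cleaned & ch
    Next i

    ' Step 3: collapse duplicate spacing.
    Do While InStr(cleaned, "  ") > 0
        cleaned = Replace(cleaned, "  ", " ")
    Loop
    cleaned = Trim$(cleaned)

    ' Step 4: count the words by splitting on single spaces.
    If Len(cleaned) = 0 Then
        CleanAndCountWords = 0
    Else
        CleanAndCountWords = UBound(Split(cleaned, " ")) + 1
    End If
End Function
```

Line breaks and tabs are dropped rather than treated as separators in this sketch, so replace them with spaces first if the string spans several lines.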

iphone sdk , apostrophe showing up as question mark

The quotation marks, single and double (the apostrophe, to be more specific), are displaying as question marks in my text view.
The problem comes up when I copy and paste something from a webpage and save it.
It does not happen when I type the sentence.
How can I replace an apostrophe with a regular single quote?
When you copy from a webpage you are not copying a plain old apostrophe. You are copying a fancy one that looks very similar but is not. Since the text view only displays plain text it cannot understand your fancy apostrophe.
When you copy from a webpage you will have to manually delete and retype the apostrophes.
You probably have to do a string replace on the Unicode characters. The following may be the characters that you want to replace:
Char  Unicode  HTML
“     8220     &ldquo;
‘     8216     &lsquo;
”     8221     &rdquo;
’     8217     &rsquo;