Replace all non-Latin characters with their Latin a-z counterparts and word count in VBA - vba

I am trying to write a script in VBA that will:
replace all É and other similar characters with their Latin counterparts,
remove all non-alphanumeric characters,
remove duplicate spacing,
then word count the string.
I have worked out that I can split the string on " " and count the elements to get the word count, but I am struggling with the rest of it. Help much appreciated.

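Word has a word count built in for sentences, paragraphs and the entire document:
ActiveDocument.Words.Count
Note that the Words collection also counts punctuation and paragraph marks as items, so ActiveDocument.ComputeStatistics(wdStatisticWords) is closer to the figure shown in the Word Count dialog.
As for the replacement, it is probably easiest to record a macro to see how this works. You will have to replace the accented characters one by one, or use RegEx to replace all A-type characters (Å, Ä, Â, Á, À) with A, and so on.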
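For a plain string (rather than a Word document), the four steps from the question could be sketched roughly as below. This is only a minimal sketch under a few assumptions: the function names StripAccents, NormalizeText and CountWords are made up for illustration, the accent table covers only the common Western European letters, and the accented literals assume the VBA editor uses a Western (Windows-1252) code page.

```vba
Option Explicit

' Hypothetical helper: replace common accented Latin letters with plain ones.
' The two strings must stay aligned position by position.
Public Function StripAccents(ByVal s As String) As String
    Dim accented As String, plain As String, i As Long
    accented = "ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ" & "àáâãäåçèéêëìíîïñòóôõöùúûüýÿ"
    plain = "AAAAAACEEEEIIIINOOOOOUUUUY" & "aaaaaaceeeeiiiinooooouuuuyy"
    For i = 1 To Len(accented)
        s = Replace(s, Mid$(accented, i, 1), Mid$(plain, i, 1), Compare:=vbBinaryCompare)
    Next i
    StripAccents = s
End Function

' Hypothetical helper: strip accents, drop everything that is not
' A-Z, a-z, 0-9 or whitespace, then collapse runs of spaces.
Public Function NormalizeText(ByVal s As String) As String
    Dim i As Long, ch As String, out As String
    s = StripAccents(s)
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        Select Case AscW(ch)
            Case 48 To 57, 65 To 90, 97 To 122   ' digits and ASCII letters
                out = out & ch
            Case 9, 10, 13, 32                   ' tab, LF, CR, space
                out = out & " "
        End Select                               ' anything else is dropped
    Next i
    Do While InStr(out, "  ") > 0                ' remove duplicate spacing
        out = Replace(out, "  ", " ")
    Loop
    NormalizeText = Trim$(out)
End Function

' Count words by splitting the cleaned string on single spaces.
Public Function CountWords(ByVal s As String) As Long
    Dim cleaned As String
    cleaned = NormalizeText(s)
    If Len(cleaned) = 0 Then
        CountWords = 0
    Else
        CountWords = UBound(Split(cleaned, " ")) + 1
    End If
End Function
```

Because punctuation is removed rather than turned into a space, something like "don't" becomes "dont" and still counts as one word; if hyphenated words should count as two, map the hyphen to a space instead.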

Related

2 spaces to 1 space after punctuation and super/subscript

I'm trying to write a macro in Word that will turn 2 spaces into 1 space after punctuation followed by a formatted (superscript) section, like this, where the 23-29 will be links to references at the end of the document.
dultricies.23-29 Purus
I would like the macro to identify the two spaces after the superscript and make it 1 space.
Thanks,
Chris
I tried creating the macro to identify 2 spaces and make it 1 space, and that worked. But when I tried to create a macro using wildcard characters or special formatting (superscript), I expected Word to locate the instance and make it one space, but it did not.
You don't even need a macro for this. All you need is a simple wildcard Find/Replace, where:
Find = ([ ^s]){2,}
Replace = \1
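If a macro is still wanted (for example as part of a larger clean-up routine), the same wildcard Find/Replace could be driven from VBA along these lines. This is a sketch that assumes the whole body of the active document should be processed; the macro name CollapseMultipleSpaces is made up.

```vba
' Hypothetical macro name; runs the wildcard Find/Replace shown above
' over the main text of the active document.
Sub CollapseMultipleSpaces()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "([ ^s]){2,}"        ' two or more spaces or non-breaking spaces
        .Replacement.Text = "\1"     ' keep a single one
        .MatchWildcards = True       ' required for ( ), [ ] and { } to work
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

On systems whose regional list separator is a semicolon, the repeat count may need to be written as {2;} instead of {2,}.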

Word VBA search for adjacent (non-space) characters with different formatting

I need to be able to find every place in my document (hundreds of pages) where there is a formatting change without a space. For example:
a bold partnext to regular text
Or red text next to black with no space. I want my macro to find each "word" (in the VBA sense) like this, and execute code based on that character location accordingly. (The loop should identify the character position where the format change occurs, although I can do that part with a loop through the characters within the found word.)
Is there a simpler way to do this than by looping character by character through the whole document and checking for a difference in formatting, which would be too resource-intensive?
Thanks for your help.

Does regex not work in Excel search?

I am trying to search for trailing whitespace in text cells in Excel. Knowing that Excel search accepts regex, I expected to be able to use the full feature set, but was surprised to find that some features do not seem to work.
For example, I have some cells with strings like ELUFA\s\s\s\s\s (note: in my Excel sheet there is no \s, just invisible whitespace after ELUFA, but I had to add these \s here because otherwise Stack Overflow would remove the whitespace and the string would appear to be just ELUFA) or NATION CONFEC.\s with trailing whitespace.
I entered the expression [A-Z.]{1}\s+$ into the Excel search function, expecting it to return these cells, but it does not; it just tells me that nothing is found.
However, what I find really odd is that Excel search is somehow able to interpret a regex like this: A *. Using this expression, Excel search finds only the ELUFA\s\s\s\s\s cells, and no cells that do not match this regex.
Is there some kind of limitation on which subset of full regex Excel search accepts? How do we get Excel search to accept the full regex feature set as described here?
Thank you.
The Excel SEARCH() function does not support full regex. It actually only supports two wildcards, ? and *. From the documentation:
You can use the wildcard characters — the question mark (?) and asterisk (*) — in the find_text argument. A question mark matches any single character; an asterisk matches any sequence of characters. If you want to find an actual question mark or asterisk, type a tilde (~) before the character.
If you want to match spaces then you will have to enter them as literals. Note that finding any number of trailing spaces could be as simple as ELUFA\s, with one space at the end, because a search for the text followed by a single space also matches when more spaces follow.
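If real regex matching is needed, one option is a small user-defined function in VBA that uses the VBScript regular expression object. The following is only a sketch: the function name HasTrailingWhitespace is made up, it reuses the pattern from the question, and it assumes Excel on Windows, where CreateObject("VBScript.RegExp") is available.

```vba
' Hypothetical UDF: True when the text ends in an uppercase letter or a dot
' followed by one or more whitespace characters.
Public Function HasTrailingWhitespace(ByVal s As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Pattern = "[A-Z.]\s+$"
    re.IgnoreCase = False
    HasTrailingWhitespace = re.Test(s)
End Function
```

Placed in a standard module, it can be called from a helper column, for example =HasTrailingWhitespace(A1), and the column can then be filtered on TRUE.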

How do we convert any special characters in textbox to certain format

I want to convert any special characters entered in an InfoPath text box into underscores. Using the translate function, as in the example below, I am able to replace a space with an underscore:
translate(fileName, " ","_")
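Since the translate function takes only three parameters, how can we check for all the special characters? My goal is that if any special character, including a space, is entered into the text box, it is automatically replaced with an underscore ("_").
What if you reverse it? If a character is not A-Z, then convert it.
Or try this:
translate(field1, "!##%^&", "______")
translate maps each character of the second argument to the character at the same position in the third argument and deletes characters that have no counterpart, so the replacement string needs one underscore per listed character.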

How do I check if a string has a paragraph character?

I need to check if a string from a Word document contains a paragraph character. I want to extract only the text, without the paragraph character. Is there a constant for the paragraph character? I tried checking for vbNewLine and vbCrLf, but these didn't work.
Have a look at the Wikipedia article on newlines.
In total there are 3 characters indicating a new line (in some context), and sometimes they are used in combinations.
I think it does not matter which ones Word uses and which ones it doesn't; you want them all gone.
So, I'd say run through all characters and remove all LF, CR and RS instances, or replace them with spaces (whilst avoiding double spaces).
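For what it's worth, Word ends a paragraph with a single carriage return (vbCr, Chr(13)), not a CR+LF pair, which is probably why the vbNewLine and vbCrLf checks found nothing. A minimal sketch of the check and the clean-up follows; both function names are made up. As far as I know, Word uses Chr(30) for its non-breaking hyphen, so this sketch leaves that character alone and only handles CR and LF.

```vba
' Hypothetical helper: True when the string contains a paragraph mark (CR).
Public Function ContainsParagraphChar(ByVal s As String) As Boolean
    ContainsParagraphChar = (InStr(s, vbCr) > 0)
End Function

' Hypothetical helper: replace CR and LF with spaces, then collapse
' any double spaces that this creates.
Public Function RemoveParagraphChars(ByVal s As String) As String
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    RemoveParagraphChars = Trim$(s)
End Function
```

A typical call would be RemoveParagraphChars(ActiveDocument.Paragraphs(1).Range.Text), since the text of a paragraph range includes its trailing paragraph mark.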