Why does Word (2007) mess up this Unicode character in a macro? - vba

NB I have looked at other questions on this topic. In particular, this answer works fine to produce the symbol there ("Rs"), and other characters, such as East Asian ideograms, also work fine.
NB this is Word 2007. If you find that the above-linked method works for you for the character I need (see below), with a newer version of Word, that would be of interest to know about.
The character I need is ⏑, Metrical Breve, Unicode 23D1 (9169 decimal), from the Miscellaneous Technical block.
I can produce this in the file, manually, using this method:
ensure Num Lock is on
hold down Alt key
using the numeric keypad, type +09169
release Alt key
... but to date I can find no way whatsoever to incorporate this into a working Word 2007 macro: ChrW(&H23D1&) gets printed as a generic "Don't recognise this" block (small rectangle).

Related

simple input of diacritical marks, and superscripts

There are times when you need to input modified variables with diacritical marks, or superscripts.
Seems like declare_index_properties allows doing it at the stage of display print.
But it is neither simple, nor very useful in formulas.
is there a simple way of adding hats, umlauts, and ', "strokes on top of a symbol, making it distinguishable from the symbol without such mark both to interpreter and to human eye?
Maxima doesn't have a notion of declaring a symbol to have diacritical marks or other combining marks on it. However, Maxima allows Unicode characters in symbol names if the underlying Lisp implementation allows Unicode; almost all of them allow Unicode. GCL is the only Lisp implementation, so far as I know, which doesn't handle Unicode correctly.
WxMaxima appears to allow Unicode characters to be input. At least, it worked that way when I tried some examples. Command-line Maxima allows Unicode if the terminal it is running in allows Unicode.
I think any Unicode character should be OK in a string. For symbols, any character which passes ALPHA-CHAR-P (a build-in Lisp function) can be part of a symbol name. Also, any character which is declared to be alphabetic (via declare("x", alphabetic) where x is the character in question) can be part of a symbol name.
I think wxMaxima has some capability to allow the user to select characters with diacritical marks from a menu; I haven't tried it. When I want to use Unicode characters, I end up just pasting them from a web page or something. I have used https://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html as a source of characters in the past.

VBA replace certain carriage

All.
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were less than 30 characters prior to the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
e.g.
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty character returns. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
You can do this without code (which RegEx requires), simply using Word's own wildcard Find/Replace tools, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space, you can try it out (and modify / tweak it) here: https://regex101.com/r/zG9GPw/4
This 'pattern' tells the RegEx engine to look for the newline character \n which occurs x2 like this \n\n (worth noting this is from your question and might be different in your files, e.g. could be \r\n) and it assumes that a valid line break will be proceeded by a full stop: \..
In RegEx the full stop symbol is a single character wild card so it needs to be escaped with the '\' (n and r are normal characters, escaping them tells the RegEx engine they represent newline and return characters).
So... the expression is looking for a group of x2 newline characters but then uses a negative look-behind to exclude any matches where the previous character was a full stop.
Anyway, it's all explained on the site:
Here is how you could do a RegEx find and replace using NotePad++ (I'm not sure if it comes with RegEx or if a plugin is needed, either way it is easy). But you can set a location, filters (to target specific file types), and other options (such as search in sub-directories).
Other than that, as #MacroPod pointed out you could also do this with MS Word, document by document, not using any code :)

Parser not recognizing a dash

My program makes calculations on physics vectors and it allows copy/pasting from websites and then tries to parse them into the x, y, and z components automatically. I've come across one website (http://mathinsight.org/cross_product_examples) that has (3,−3,1). While that looks normal, that minus is actually not recognized by VB. Visually, it is longer than the normal minus (− and -), but return the same Unicode of 45. This picture shows the Unicode for every character (I added a minus in front of the first 3 for comparison) in the Textbox. Also, from this website, I had to use Ctrl+c because right clicking shows that this is not simple HTML.
One is valid (the first), but the second gives VB fits as shown below. Either it won't compile (shown by the blue line below) or a simple assignment (the second one) wrecks havok on my form.
I have tried using
vectorString.Replace("–", "-")
and pasting in the longer dash for the target string and a normal keystroke dash as the replacement, but nothing happens. I'm guessing that since they both have the same Unicode.
Is there some way to convert the longer, invalid dash into the one recognized by VB? I tried using dash symbol that Word likes to replace the minus sign with and it comes up as Unicode 150. So, apparently there are at least three different kinds of dashes. Any thoughts?
The character from Math Insight is U+2212, minus sign. The character you tried using in your Replace call is U+2013, en dash. That's why your replace didn't work.
Beyond the standard ASCII hyphen (-, U+0045), there are two common dashes: the en dash (–, U+2013) and the em dash (—, U+2014). There is also a figure dash (‒, U+2012), but it is not as common.

ANSI escape codes in GNU Smalltalk

I'm trying to make a console-based program that makes use of ANSI escape codes with GNU Smalltalk. I can't seem to figure out how to go about printing a string object formatted with ANSI escape codes. I've tried the following.
'\x1b[31mHi' displayNl
This prints the entire string, including the escape code, without any formatting. I would have expected this to print "Hi" in red (and then everything else in the console after that, as I didn't reset the color.)
After googling a bit, I was able to find a couple issues on mailing lists where people were trying to produce things like newlines using "\n". Most of the answers were using the Transcript object's cr method, but I didn't find anything about colors in the textCollector class.
It looks like it shouldn't be all that hard to create my own module in C to achieve this functionality, but I'd like to know if there's a better way first.
I'm aware of the ncurses bindings, but I'm not sure that'd be practical for just making certain pieces of text in the program colored. So, is there a standard way of outputting colored text to the terminal in GNU Smalltalk using ANSI escape sequences?
Ended up getting an answer on the GNU Smalltalk mailing list. Looks like you can use an interpolation operator to achieve this.
For example ('%1[31mHi' % #($<16r1B>)) displayNl. would change the color to red, and ('%1[34mHi' % #($<16r1B>)) displayNl. would change the color to blue.
Basically, the % operator looks for a sequences that look like "%(number)" and replaces them with the objects in the array to the right of the operator. In our case, the array has one item, which is the ascii escape character in hexadecimal. So the "%1" in "%1[31mHi' is being replaced with the escape character, and then printed.
(This answer was stolen almost verbatim from Paolo on the GNU Smalltalk mailing list.)

jedit unwanted conversion of english words to Greek Letters, Math and Logic symbols

Problem: Some English words are translated to symbols
Greek letters as English words are translated to symbols:
example lambda is converted to the equivalent small Greek letter.
Logic and Math words are transliated to symbols.
examples: and, or, in, exists, sum, div, top, int, pm converts to symbols
or small empty square if the symbol is not recognized.
Scope: Windows XP 32-bit, WIndows 7 64-bit with jEdit 4.5.2
This problem acts like an abbreviation expansion. As I type a-l-p-h-a then a space,
jedit converts alpha to the small Greek letter alpha.
I have learned to live with this but would like to find a solution to the problem.
Any help would be appreciated. I don't know if this is a customization problem or a feature or a bug.
To turn off all abbreviations, go into Utilities > Global Options, then Abbreviations. Uncheck "Space bar expands abbrevs".
EDIT: I didn't realize you wanted to use abbreviations but not those specific ones.
To take out the abbreviations for lambda, alpha, etc., go into that same dialog, pick "global" if it isn't already selected, then select each one from the list and hit the minus button under the list. Unfortunately (at least in jEdit 4.5) you'll have to select each one and delete it individually; you can't select multiple entries.