Asc(Chr(254)) returns 116 in .Net 1.1 when language is Hungarian - vb.net

I set the culture to Hungarian language, and Chr() seems to be broken.
System.Threading.Thread.CurrentThread.CurrentCulture = "hu-US"
System.Threading.Thread.CurrentThread.CurrentUICulture = "hu-US"
Chr(254)
This returns "ţ" when it should be "þ"
However, Asc("ţ") returns 116.
This: Asc(Chr(254)) returns 116.
Why would Asc() and Chr() be different?
I checked and the 'wide' functions do work correctly: ascw(chrw(254)) = 254

Chr(254) interprets the argument in a system dependent way, by looking at the System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage property. See the MSDN article about Chr. You can check whether that value is what you expect. "hu-US" (the hungarian locale as used in the US) might do something strange there.
As a side-note, Asc() has no promise about the used codepage in its current documentation (it was there until 3.0).
Generally I would stick to the unicode variants (ending on -W) if at all possible or use the Encoding class to explicitly specify the conversions.

My best guess is that your Windows tries to represent Chr(254)="ţ" as a combined letter, where the first letter is Chr(116)="t" and the second ("¸" or something like that) cannot be returned because Chr() only returns one letter.
Unicode text should not be handled character-by-character.

It sounds like you need to set the code page for the current thread -- the current culture shouldn't have any effect on Asc and Chr.
Both the Chr docs and the Asc docs have this line:
The returned character depends on the code page for the current thread, which is contained in the ANSICodePage property of the TextInfo class. TextInfo.ANSICodePage can be obtained by specifying System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.

I have seen several problems in VBA on the Mac where characters over 127 and some control characters are not treated properly.
This includes paragraph marks (especially in text copied from the internet or scanned), "¥", and "Ω".
They cannot always be searched for, cannot be used in file names - though they could in the past, and when tested, come up as another ascii number. I have had to write algorithms to change these when files open, as they often look like they are the right character, but then crash some of my macros when they act strangely. The character will look and act right when I save the file, but may be changed when it is reopened.
I will eventually try to switch to unicode, but I am not sure if that will help this issue.
This may not be the issue that you are observing, but I would not rule out isolated problems with certain characters like this. I have sent notes to MS about this in the past but have received no joy.
If you cannot find another solution and the character looks correct when you type it in, then I recommend using a macro snippet like the one below, which I run when updating tables. You of course have to setup theRange as the area you are looking at. A whole file can take a while.
For aChar = 1 To theRange.Characters.count
theRange.Characters(aChar).Select
If Asc(Selection.Text) = 95 And Selection.Text <> "_" Then Selection.TypeText "Ω"
Next aChar

Related

How to determine Thousands Separator using Format in VBA

I would like to determine the Thousand Separator used while running a VBA Code on a target machine without resolving to calling system built-in functions such as (Separator = Application.ThousandsSeparator).
I am using the following simple code using 'Format':
ThousandSeparator = Mid(Format(1000, "#,#"), 2, 1)
The above seems to work fine, and would like to confirm if this is a safe method of doing it without resorting to system calls.
I would expect the result to be a single char string in the form of , or . or ' or a Space as applicable to the locale on the machine.
Please note that I want to only use a language statement such as Format or similar (no sys calls). Also this relates to Thousands Separator not Decimal Separator. This article Using VBA to detect which decimal sign the computer is using does not help or answer my question. Thanks
Thanks in advance.
The strict answer to whether it is safe to use Format to get the thousands separator is No.
E.g. on Windows, it is possible to enter up to three characters into the Thousands Separator field in the regional settings in the control panel.
Suppose you enter asd and click OK.
If you now call Format(1000, "#,#") it will give you 1a000. That is only the first letter of your thousands separator. You have failed to retrieve it correctly.
Reading the registry:
? CreateObject("WScript.Shell").RegRead("HKCU\Control Panel\International\sThousand")
you get back asd in full.
To be fair, the Excel international properties do not seem to be of much help either. Application.International(xlThousandsSeparator) in this situation will return the separator originally defined in your computer's locale, not the value you've overridden it to.
Having that said, the practical answer is Yes, because it would appear (and if you happen to know for sure, please post an answer here) that there is no culture with multi-char thousand separator (even in China where scary things like 1億2345万6789 or 1億2345萬6789 exist, they happen to be represented with just one UTF-16 character), and you probably are happy to ignore the people who decided to play with their locale settings in that fashion.

Alt-Code Characters in F#

Edit/Update:
Thank you all for responding. I understand I was being too vague, but wasn't sure if posting naked lines of code would be useful in this case.
In my .vb file I have a pulldown control with its validation values as:
TempUnit.DataSource = {"°C", "°F", "°R", "K"}
...which is stored in a variable:
Dim unit As String = TempUnit.SelectedItem.ToString
...which gets passed into a function along with other variables:
Function xxx(..., ByVal unitT As String) As Double
... which finally calls the .fs file and gets evaluated using:
let tempConv t u =
match u with
|"°C" -> t * 9.0 / 5.0 + 32.0
|"°R" -> t - 459.67
|"K" -> t * 9.0 / 5.0 - 459.67
|_ -> t
If any temperature unit other than Kelvin is selected, the match fails and defaults to the else case (which is Fahrenheit in this context). I ended up bypassing the degree symbol entirely by evaluating the substring instead:
Dim unit As String = TempUnit.SelectedItem.ToString.Substring(1)
The program is working again, but I have no idea what I changed, if anything, to make the string match stop working. The first thing I tried was to copy/paste from one file to other to ensure they were identical strings, in addition to trying other symbols, but to no avail. The degree symbol is what caught my attention, but then I checked the pressure units and found the exact same issue with the micro prefix.
Thank you, Hans Passant, I had unicode in mind as a possible solution, but it didn't seem like an easy fix in the heat of the moment. I appreciate your link.
Original Post:
I have a VB program referencing a function stored in an F# library file whose arguments include unit of measure strings containing special characters (e.g. "°C" "µBar").
The strings are identical in the .vb and .fs files; and there was no issue until the F# library file stopped recognizing the Alt-Code characters for reasons unbeknownst to me.
The program works as intended if I remove the offending Alt-Code character from the string definitions in the F# and VB files.
What would cause a match to fail between two identical strings that happen to contain an Alt-Code character?
What is the proper way to handle Alt-Code characters in F# (and VB for that matter)?
The µ glyph is a bit infamous. Unicode has two codepoints that look like that: U+03BC = "Greek small letter Mu" and U+00B5 = "Micro sign". One is a letter in the Greek alphabet, the other is a symbol that often appears in math and units.
Compare μ and µ. Looks almost identical in most fonts (you can see the difference with Segoe UI) and very easily fools the human eye. Typographers insist they are not the same, particularly if they are Greek I'd imagine. Nor does a computer, the problem you are surely dealing with.
Copy/paste or re-type to fix. The Charmap.exe applet in Windows is very handy to get this right.

Mid() don't extract string in accurate position

I am using VBA in Ms Access environment, to handle long string (memo field storing HTML originally).
After positioning by Instr(), I put the position into Mid(vStr,vStartPos,vEndPos-vStartPos+1) to extract the string, but the output doesn't match. I have already carefully checked this in immediate windows, as well as NotePad++. What I can say is Instr() and NotePad++ have given the same counting result, while Mid() is different. Mid()'s result are former than Instr()'s in some cases, and latter in other cases. I don't know the reason, and can just believe Mid() use different mechanism or have defeative (surprised!) in handling long string mixed with single-byte and bi-byte chars (but this is common in the world, and meet no problem before), and possibly some special characters.
I believe I need to custom-make a Mid() function. Any idea how to do it effectively and efficiently?
Thanks all for your reply. After I created a custom Mid() by RegEx and find that the problem has no change, I have found out the silly mistake I made. The Instr() and Mid() have no problem, but the string has been carelessly modified between them. So this case should be closed now.

Trailing Ampersand in VB.NET hexadecimal?

This should be an easy one for folks. Google's got nothing except content farms linking to one blurb, and that's written in broken English. So let's get this cleared up here where it'll be entombed for all time.
What's the trailing ampersand on VB hexadecimal numbers for? I've read it forces conversion to an Int32 on the chance VB wants to try and store as an Int16. That makes sense to me. But the part I didn't get from the blurb was to always use the trailing ampersand for bitmasks, flags, enums, etc. Apparantly, it has something to do with overriding VB's fetish for using signed numbers for things internally, which can lead to weird results in comparisons.
So to get easy points, what are the rules for VB.Net hexadecimal numbers, with and without the trailing ampersand? Please include the specific usage in the case of bitmasks/flags and such, and how one would also use it to force signed vs. unsigned.
No C# please :)
Vb.net will regard "&h"-notation hex constants in the range from 0x80000000-0xFFFFFFFF as negative numbers unless the type is explicitly specified as UInt32, Int64, or UInt64. Such behavior might be understandable if the numbers were written with precisely eight digits following the "&", but for some reason I cannot fathom, vb.net will behave that way even if the numbers are written with leading zeroes. In present versions of VB, one may force the number to be evaluated correctly by using a suffix of "&" suffix (Int64), "L" (Int64), "UL" (UInt64), or "UI" (UInt32). In earlier versions of VB, the "problem range" was 0x8000-0xFFFF, and the only way to force numbers in that range to be evaluated correctly (as a 32-bit integer, which was then called a "Long") was a trailing ampersand.
Visual Basic has the concept of Type Characters. These can be used to modify variable declarations and literals, although I'd not recommend using them in variable declarations - most developers are more familiar these days with As. E.g. the following declarations are equivalent:
Dim X&
Dim X As Long
But personally, I find the second more readable. If I saw the first, I'd actually have to go visit the link above, or use Intellisense, to work out what the variable is (not good if looking at the code on paper).

vb.net character set

According to MSDN vb.net uses this extended character set. In my experience it actually uses this:
What am I missing? Why does it say it uses the one and uses the other?
Am I doing something wrong?
Is there some sort of conversion tool to the original character set?
This behaviour is defined in the documentation of the Chr command:
The returned value depends on the code page for the current thread, which is contained in the ANSICodePage property of the TextInfo class in the System.Globalization namespace. You can obtain ANSICodePage by specifying System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
So, the output of Chr for values greater than 127 is system-dependent. If you want reproducible results, create the desired instance of Encoding by calling Encoding.GetEncoding(String), then use Encoding.GetChars(Byte()) to convert your numeric values into characters.
If you go up one level in the chart linked in your question, you will see that they do not claim that this chart is always the output of the Chr command:
The characters that appear in Windows above 127 depend on the selected typeface.
The charts in this section show the default character set for a console application.
Your application is a WinForm application, not a console application. Even in the console, the character set used can be changed (for example, by using the chcp command), hence the word "default".
For detailed information about the encodings used in .net, I recommend the following MSDN article: Character Encoding in the .NET Framework.
The first character set is Code Page 437 (CP437), the second looks like Code Page 1252 (CP1252) also known as Windows Latin-1.
I'd guess VB.Net is simply picking up the default encoding for the PC.
How did you write all this? Because usually, when you use a output stream function, you can specify the encoding going with it.
Edit: I know this is not C#, but you can see the idea...
You'd have to set the encoding of your filestream, by doing something like this:
Setting the encoding when creating the filestream