Where can I find a list of .NET Unicode (wide) functions? - vb.net

I would like to get a list of the VB.NET/C# "wide" functions for Unicode - e.g. AscW, ChrW, MessageBoxW, etc.
Is there a list of these somewhere?

All string methods in .NET are Unicode by default, because System.String itself is Unicode.
From the System.String documentation:
Represents text as a series of Unicode characters.
Any time you call any method that takes a String as a parameter, you're working in Unicode. There is no need for "wide character" versions of methods in .NET.
In fact, if you want to work with ANSI text, you have to explicitly tell the framework that is what you are doing.
This is usually done via a method on the Marshal class (for interop with other libraries), or via the Encoding class (Encoding.ASCII, or a different character encoding) to convert a series of bytes to or from text.
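For illustration, here is a minimal C# sketch (not from the original question) showing that strings are Unicode out of the box and that a legacy encoding has to be requested explicitly through the Encoding class:
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // System.String is always Unicode; a char is a UTF-16 code unit,
        // which is what AscW/ChrW expose in VB.NET.
        Console.WriteLine((int)'é');   // 233, same as AscW("é")

        // Working with a non-Unicode encoding is an explicit conversion.
        byte[] asciiBytes = Encoding.ASCII.GetBytes("Hello");
        string roundTrip = Encoding.ASCII.GetString(asciiBytes);
        Console.WriteLine(roundTrip);  // Hello
    }
}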

All .NET strings are already Unicode; there are no ASCII strings to worry about. So the W suffix from the Win32 names was dropped.

Strings in .NET are all Unicode. You don't need specific functions to handle Unicode because it's built in already.

Related

C# Bond: string to wstring

In the Bond C# manual, it notes the following:
These following changes will break wire compatibility and are not recommended:
Adding or removing required fields
Incompatible change of field types (any type change not covered above); e.g.: int32 to string, string to wstring
...
But it doesn't explain why. The use case here is that I'm using Bond to connect a C# application with a C++ backend. The field is currently a string. I want to change it to a wstring. The manual notes that C# strings can handle C++ strings and C++ wstrings. Therefore, why can't I just change the field type from string to wstring? Why does this break wire compatibility?
In Bond's binary formats, strings are UTF-8 encoded (no BOM) and wstrings are UTF-16LE encoded. If you were to switch a field from string to wstring, the reading side would try to interpret UTF-8 data as UTF-16LE data. These two encodings are not compatible with each other, hence a field type change from string to wstring is a breaking change.
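As a rough illustration (using plain .NET Encoding calls rather than Bond's actual readers and writers), decoding UTF-8 bytes as if they were UTF-16LE does not recover the original text:
using System;
using System.Text;

class WireCompatDemo
{
    static void Main()
    {
        // A writer that still serializes the field as Bond string emits UTF-8 bytes.
        byte[] onTheWire = Encoding.UTF8.GetBytes("name");

        // A reader that now expects Bond wstring interprets those same bytes as UTF-16LE.
        string misread = Encoding.Unicode.GetString(onTheWire);

        Console.WriteLine(misread); // two unrelated CJK characters instead of "name"
    }
}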
Note that the manual says "For example C# string can represent either Bond type string or wstring." It does not say anything about C++ types. When working with Bond across C# and C++, there are three type systems: Bond's, C#'s, and C++'s.
If, on the C++ side, you want to use something akin to std::wstring to store the field in memory, take a look at using Custom type mapping with the string concept.

Apply SQL "LIKE" to bytes

I need to create a generic DAO with Hibernate, meaning it executes queries based on property types.
My generic DAO works fine when filtering String properties of any class; it supports "contains", "starts with", and "ends with" using "like" restrictions:
Restrictions.like(propertyName, (String) value, getMatchMode());
The problem is that I also need to support "contains", "starts with", and "ends with" for byte array (byte[]) properties, and the Hibernate
SimpleExpression like(String propertyName, Object value)
API does not work for them (which is probably to be expected). So I was thinking maybe I could convert the bytes stored in the DB into a String and then, as a workaround, apply the normal string-based Restrictions.like API.
The problem is that I think there's no standard way to convert byte[] into a String, since there's no standard binary data type among DB platforms: Oracle uses RAW, HSQLDB uses VARBINARY, and so on (Oracle has its own RAWTOHEX function, for instance).
If any of you have an idea how to sort out this problem, it will be very welcome.
Cheers.
In MySQL you could use HEX to convert BINARY to a String, e.g.:
SELECT *
FROM myTable
WHERE HEX(myBinaryField) LIKE 'abc%'
In your Java code you could use a Base64 encoder to convert the bytes to a String. Then you could just persist the Base64-encoded String and use normal LIKE queries. Maybe not the most efficient way, but it should work well.

Do string literals in different files have the same memory address?

I'm using objc_[sg]etAssociatedObject(), and this function uses a memory address as a key. If I pass the same string literal - e.g. "UIImageForTexture" - into the function from two different files, will the two string literals have:
the same memory address?
different memory addresses?
it depends on some other factors.
it's undefined.
I can do this more explicitly by sticking the literal in a file somewhere and referring to it via an extern but I'd like to know if I can avoid having to do that.
This is implementation-defined behavior. The C99 draft standard, section 6.4.5 String literals, paragraph 6, says:
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Whether two string literals with the same content (e.g. char *p1 = "abc"; char *p2 = "abc";) have the same address or different addresses is an implementation detail. p1 and p2 may or may not be equal; the C standard guarantees neither.
So the answer is (3): it depends on the implementation/optimization.
Instead of passing string literals directly, define a single pointer (or constant) for the key and pass that everywhere. That way the address is always the same and you don't need to worry about (1) and (2).
This may be of interest: Addresses of two pointers are same?

Derive string from const enum

I have the following in my constants file:
typedef enum
{
    AnimalTypeBear,
    AnimalTypeCamel,
    AnimalTypeCow,
    AnimalTypeCount
} AnimalType;
If I declare an AnimalType variable somewhere in my code like the following and set it to AnimalTypeBear:
AnimalType animalType = 0;
Is there a way to somehow derive the string "Bear" from that animalType variable, or, in general, to access the name of its corresponding constant (in this case AnimalTypeBear)?
Enum values are constant expressions, much like #define. At compile time enums are "translated" into plain integer constants (whereas #define is expanded before compilation), so the name is not available at runtime and it is not possible to get the enum's string this way.
As suggested by others, you can use a string array.
You cannot do this without code in (Objective-)C. If you want to be able to use actual enumeration literals as strings in your code, or during I/O, with language support then you need to use a language with enumeration type support such as Pascal or Ada.
If you are keen to have this and don't mind the work, as long as it is reusable, then you need to learn about reading the symbol table structures from a binary and make sure that the information is not stripped from your application. You'll see the debugger can show the correct literals; also, if you use Xcode's "Product > Generate Output > Assembly File" menu item, you'll see the literals are in there as strings. It will be a lot of work for you, but it would be reusable once done.
After that give up and write some code - a simple static array of labels and an index operation. Yes, it's a maintenance headache if you ever change your enumeration.
Alternatively you can write some different code, say in Ruby... Xcode supports adding your own file "types" and running scripts to (pre-)process them. So you could define, say, a file type ".enum" and use a Ruby script to convert it into a C enumeration definition plus the code that provides the strings. Apple has examples of using Ruby to pre-process files in this way. Once you have your script, Xcode will do the rest: on each compilation it will run your script to convert your ".enum" into ".m" (or ".c") and then compile the result. This approach is usually best, though, for files which contain only one thing, e.g. localised string file processing; you don't usually write your enum declarations in their own files.

P/Invoke with [Out] StringBuilder / LPTSTR and multibyte chars: Garbled text?

I'm trying to use P/Invoke to fetch a string (among other things) from an unmanaged DLL, but the string comes out garbled, no matter what I try.
I'm not a native Windows coder, so I'm unsure about the character encoding bits. The DLL is set to use "Multi-Byte Character Set", which I can't change (because that would break other projects). I'm trying to add a wrapper function to extract some data from some existing classes. The string in question currently exists as a CString, and I'm trying to copy it to an LPTSTR, hoping to get it into a managed StringBuilder.
This is what I have done that I believe is the closest to being correct (I have removed the irrelevant bits, obviously):
// unmanaged function
DLLEXPORT void Test(LPTSTR result)
{
    // eval->result is a CString
    _tcscpy(result, (LPCTSTR)eval->result);
}
// in managed code
[DllImport("Test.dll", CharSet = CharSet.Auto)]
static extern void Test([Out] StringBuilder result);
// using it in managed code
StringBuilder result = new StringBuilder();
Test(result);
// contents in result garbled at this point
// just for comparison, this unmanaged consumer of the same function works
LPTSTR result = new TCHAR[100];
Test(result);
Really appreciate any tips! Thanks!!!
One problem is using CharSet.Auto.
On an NT-based system this will assume that the result parameter in the native DLL will be using Unicode. Change that to CharSet.Ansi and see if you get better results.
You also need to size the buffer of the StringBuilder that you're passing in:
StringBuilder result = new StringBuilder(100); // problem if more than 100 characters are returned
Also - the native C code is using TCHAR types and macros - this means that it could be built for Unicode. If this might happen, it complicates the CharSet situation in the DllImportAttribute somewhat - especially if you don't use the TestA()/TestW() naming convention for the native export.
Don't use the [Out] attribute on the parameter, as you are not allocating the string in the C function:
[DllImport("Test.dll", CharSet = CharSet.Auto)]
static extern void Test(StringBuilder result);
StringBuilder result = new StringBuilder(100);
Test(result);
This should work for you.
You didn't describe what your garbled string looks like. I suspect you are mixing up some MBCS strings and UCS-2 strings (using 2-byte wchar_ts). If every other byte is 0, then you are looking at a UCS-2 string (and possibly misusing it as an MBCS string). If every other byte is not 0, then you are probably looking at an MBCS string (and possibly misusing it as a Unicode string).
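A quick way to see the two byte patterns described above, using plain .NET encodings as stand-ins for the MBCS and UCS-2 data (the byte values in the comments apply to ASCII-range text like "Test"):
using System;
using System.Text;

class BytePatternDemo
{
    static void Main()
    {
        byte[] ansi = Encoding.ASCII.GetBytes("Test");    // 54 65 73 74 - no zero bytes
        byte[] wide = Encoding.Unicode.GetBytes("Test");  // 54 00 65 00 73 00 74 00 - every other byte is 0

        Console.WriteLine(BitConverter.ToString(ansi));   // 54-65-73-74
        Console.WriteLine(BitConverter.ToString(wide));   // 54-00-65-00-73-00-74-00
    }
}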
In general, I would recommend not using TCHARs (or LPTSTRs). They use macro magic to switch between char (1 byte) and wchar_t (2 bytes), depending on whether _UNICODE is #defined. I prefer to explicitly use char and wchar_t to make the code's intent very clear. However, you will need to call the -A or -W forms of any Win32 APIs that use TCHAR parameters: e.g. MessageBoxA() or MessageBoxW() instead of MessageBox() (which is a macro that checks whether _UNICODE is #defined).
Then you should change CharSet = CharSet.Auto to either CharSet = CharSet.Ansi (if both caller and callee are using MBCS) or CharSet = CharSet.Unicode (if both caller and callee are using UCS-2 Unicode). But it sounds like your DLL is using MBCS, not Unicode.
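Putting the advice in these answers together, here is a sketch of what the managed side might look like, assuming the DLL really is built for MBCS and exports a plain Test function (the buffer size of 260 is just an illustrative choice):
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // CharSet.Ansi matches a DLL built with "Multi-Byte Character Set";
    // no [Out] attribute, since the native code only fills the buffer we allocate.
    [DllImport("Test.dll", CharSet = CharSet.Ansi)]
    private static extern void Test(StringBuilder result);

    internal static string GetResult()
    {
        var result = new StringBuilder(260); // pre-size the buffer the native code writes into
        Test(result);
        return result.ToString();
    }
}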
pinvoke.net is a great wiki reference with many examples of P/Invoke function signatures for Win32 APIs.