P/Invoke with [Out] StringBuilder / LPTSTR and multibyte chars: Garbled text? - pinvoke

I'm trying to use P/Invoke to fetch a string (among other things) from an unmanaged DLL, but the string comes out garbled, no matter what I try.
I'm not a native Windows coder, so I'm unsure about the character encoding bits. The DLL is set to use "Multi-Byte Character Set", which I can't change (because that would break other projects). I'm trying to add a wrapper function to extract some data from some existing classes. The string in question currently exists as a CString, and I'm trying to copy it to an LPTSTR, hoping to get it into a managed StringBuilder.
This is what I have done that I believe is the closest to being correct (I have removed the irrelevant bits, obviously):
// unmanaged function
DLLEXPORT void Test(LPTSTR result)
{
// eval->result is a CString
_tcscpy(result, (LPCTSTR)eval->result);
}
// in managed code
[DllImport("Test.dll", CharSet = CharSet.Auto)]
static extern void Test([Out] StringBuilder result);
// using it in managed code
StringBuilder result = new StringBuilder();
Test(result);
// contents in result garbled at this point
// just for comparison, this unmanaged consumer of the same function works
LPTSTR result = new TCHAR[100];
Test(result);
Really appreciate any tips! Thanks!!!

One problem is using CharSet.Auto.
On an NT-based system this will assume that the result parameter in the native DLL will be using Unicode. Change that to CharSet.Ansi and see if you get better results.
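To see why the mismatch garbles the text, here is a rough Python model of what happens to the buffer. It is only a sketch: `marshal_out_string` is a hypothetical stand-in for the marshaller, and cp1252 is assumed as the system ANSI code page.

```python
def marshal_out_string(native_bytes, charset):
    """Hypothetical model of how the buffer is read back into a managed string.

    cp1252 stands in for the system ANSI code page (an assumption).
    """
    codec = {"Ansi": "cp1252", "Unicode": "utf-16-le"}[charset]
    return native_bytes.decode(codec, errors="replace")


# What an MBCS build of the DLL copies into the buffer: one byte per character.
written_by_dll = "Hello!".encode("cp1252")

# CharSet.Auto on an NT-based system behaves like CharSet.Unicode:
# every pair of ANSI bytes is read as a single UTF-16 code unit.
garbled = marshal_out_string(written_by_dll, "Unicode")

# CharSet.Ansi reads the same bytes with the encoding the DLL actually used.
correct = marshal_out_string(written_by_dll, "Ansi")
```

Here `correct` round-trips to the original text, while `garbled` is half as long and made of unrelated characters.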
You also need to size the buffer of the StringBuilder that you're passing in:
StringBuilder result = new StringBuilder(100); // problem if more than 100 characters are returned
Also - the native C code is using 'TCHAR' types and macros - this means that it could be built for Unicode. If that might happen, it complicates the CharSet situation in the DllImportAttribute somewhat - especially if you don't use the TestA()/TestW() naming convention for the native export.

Don't use the [Out] attribute on the parameter, as you are not allocating the buffer in the C function.
[DllImport("Test.dll", CharSet = CharSet.Auto)]
static extern void Test(StringBuilder result);
StringBuilder result = new StringBuilder(100);
Test(result);
This should work for you

You didn't describe what your garbled string looks like. I suspect you are mixing up some MBCS strings and UCS-2 strings (using 2-byte wchar_ts). If every other byte is 0, then you are looking at a UCS-2 string (and possibly misusing it as an MBCS string). If every other byte is not 0, then you are probably looking at an MBCS string (and possibly misusing it as a Unicode string).
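The "every other byte is 0" check is easy to try on a dump of the buffer. A minimal Python sketch follows; `looks_like_utf16le` is a hypothetical helper, and the heuristic only holds for text in the ASCII/Latin-1 range, where the high byte of each UTF-16 code unit is zero.

```python
def looks_like_utf16le(raw):
    """Heuristic: in little-endian UTF-16 text limited to code points below 256,
    every second byte (the high byte of each code unit) is zero."""
    if len(raw) < 2 or len(raw) % 2:
        return False
    return all(b == 0 for b in raw[1::2])


wide = "Test".encode("utf-16-le")    # bytes: 54 00 65 00 73 00 74 00
narrow = "Test".encode("cp1252")     # bytes: 54 65 73 74

is_wide = looks_like_utf16le(wide)
is_narrow_wide = looks_like_utf16le(narrow)
```

A buffer for which the check passes was most likely written as UCS-2/UTF-16; one for which it fails is more likely MBCS.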
In general, I would recommend not using TCHARs (or LPTSTRs). They use macro magic to switch between char (1 byte) and wchar_t (2 bytes), depending on whether _UNICODE is #defined. I prefer to explicitly use char and wchar_t to make the code's intent very clear. However, you will need to call the -A or -W forms of any Win32 APIs that use TCHAR parameters: e.g. MessageBoxA() or MessageBoxW() instead of MessageBox() (which is a macro that checks whether UNICODE is #defined).
Then you should change CharSet = CharSet.Auto to either CharSet = CharSet.Ansi (if both caller and callee are using MBCS) or CharSet = CharSet.Unicode (if both caller and callee are using UCS-2 Unicode). But it sounds like your DLL is using MBCS, not Unicode.
pinvoke.net is a great wiki reference with many examples of P/Invoke function signatures for Win32 APIs.

Related

C# Bond: string to wstring

In the Bond C# manual, it notes the following:
These following changes will break wire compatibility and are not recommended:
Adding or removing required fields
Incompatible change of field types (any type change not covered above); e.g.: int32 to string, string to wstring
...
But it doesn't explain why. The use case here is that I'm using Bond to connect a C# application with a C++ backend. The field is currently a string. I want to change it to a wstring. The manual notes that C# strings can handle C++ strings and C++ wstrings. Therefore, why can't I just change the field type from string to wstring? Why does this break wire compat?
In Bond's binary formats, strings are UTF8 encoded (no BOM) and wstrings are UTF16-LE encoded. If you were to switch a field from string to wstring, the reading side would try to interpret UTF8 data as UTF16-LE data. These two encodings are not compatible with each other, hence a field type change from string to wstring is a breaking change.
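The incompatibility can be seen with plain byte strings. The Python sketch below models only the character encoding of the field payload and ignores length prefixes and the rest of Bond's framing; `write_field` and `read_field` are hypothetical names, not Bond APIs.

```python
# Encodings as described above: string is UTF-8, wstring is UTF-16-LE.
CODECS = {"string": "utf-8", "wstring": "utf-16-le"}


def write_field(text, bond_type):
    """Hypothetical writer: encodes only the text payload of a field."""
    return text.encode(CODECS[bond_type])


def read_field(payload, bond_type):
    """Hypothetical reader: decodes the payload as the type the schema declares."""
    return payload.decode(CODECS[bond_type], errors="replace")


text = "h\u00e9llo"
old_payload = write_field(text, "string")      # written by a peer with the old schema
new_payload = write_field(text, "wstring")     # what the new schema would write

as_old_schema = read_field(old_payload, "string")
as_new_schema = read_field(old_payload, "wstring")   # new reader, old data
```

`as_old_schema` equals the original text, while `as_new_schema` does not: the same bytes mean something different under each encoding, and even the payload sizes differ.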
Note that the manual says "For example C# string can represent either Bond type string or wstring." It does not say anything about C++ types. When working with Bond across C# and C++, there are three type systems: Bond's, C#'s, and C++'s.
If, on the C++ side, you want to use something akin to std::wstring to store the field in memory, take a look at using custom type mapping with the string concept.

How do I return a string from DLL to Inno Setup Pascal Script

I have two C functions in a DLL which are defined in the definition file and are exported for using in Inno Setup.
char* __stdcall GetName()
{
return "Kishore";
}
void __stdcall getName(char* strName)
{
strcpy(strName, "Kishore");
}
The Inno Setup code will load the custom DLL and call the function/procedure to return the names
{ Inno Setup script }
[Code]
procedure getName(MacAddress: String);
external 'getName#files:MyDll.dll stdcall setuponly';
function GetName():PAnsiChar;
external 'GetName#files:MyDll.dll stdcall setuponly';
function NextButtonClick(CurPage: Integer): Boolean;
var
StrName: String;
begin
SetLength(StrName,15);
getName(StrName); { displaying only single character }
StrName := GetName(); { this call is crashing }
end
How can I retrieve the name in Inno Setup script without it crashing?
GetName should return a const char *, however it is otherwise fine as written. Note however that returning a string like this can only ever possibly work for literal string constants, as shown above. You cannot use this pattern to return a calculated string value (if you try that, it will likely crash or give corrupted data); thus getName is wrong.
Also note that while C is case sensitive, Pascal is not, so getName and GetName are the same function in the Inno script. You might be getting away with this in the above case because the parameters are different, but I wouldn't rely on that -- you should give them distinct names. (Don't use the same name on the C side either, as DLL exports are sometimes looked up case-insensitively too.)
To return a calculated string, you should use a pattern like this:
DLL code:
void __stdcall CalculateName(char *buffer, size_t size)
{
strncpy(buffer, "whatever", size);
buffer[size-1] = 0;
}
Inno code:
procedure CalculateName(Buffer: AnsiString; Max: Cardinal);
external 'CalculateName#files:my.dll stdcall';
...
Max := 16;
Buffer := StringOfChar(#0, Max);
CalculateName(Buffer, Max);
SetLength(Buffer, Pos(#0, Buffer) - 1);
...
A few acceptable variations exist, for example you can make the DLL function return the number of characters actually written to the buffer, and use that in the subsequent SetLength rather than calling Pos to find the null terminator.
But you must:
Ensure that both sides are using the same string types, either both ANSI or both Unicode.
ANSI Inno Setup supports only ANSI strings with its String type.
Unicode Inno Setup supports either ANSI strings with AnsiString or Unicode strings with String.
When using Unicode strings, ensure that both sides agree whether Max and/or the return value is specified in characters or bytes (the example code assumes it's in characters).
Prior to calling the function, use either SetLength or StringOfChar to ensure that the buffer has been sized to the required maximum possible result length.
Ensure the called function does not try to write past this maximum length (which is easier if this is provided as a parameter to the function).
If you're using Pos, ensure that the called function null-terminates the value (or be more careful than shown in the example).
Ensure that after the call you truncate the string to the actual length, either by using a returned value or by finding the null terminator.
One of the constraints in play here is that memory allocated by one side must be freed by the same side. You cannot safely release memory allocated on the "wrong" side of the DLL boundary, in either direction.

Should mid and instr be used, or indexof and substring?

Some VB string functions have similar methods in System.String, such as Mid and Substring, or InStr and IndexOf. Is there a good reason to use one or the other?
An example could explain a lot. This is the source code of Mid from Microsoft.VisualBasic
public static string Mid(string str, int Start, int Length)
{
if (Start <= 0)
{
throw new ArgumentException(Utils.GetResourceString("Argument_GTZero1", new string[] { "Start" }));
}
if (Length < 0)
{
throw new ArgumentException(Utils.GetResourceString("Argument_GEZero1", new string[] { "Length" }));
}
if ((Length == 0) || (str == null))
{
return "";
}
int length = str.Length;
if (Start > length)
{
return "";
}
if ((Start + Length) > length)
{
return str.Substring(Start - 1);
}
return str.Substring(Start - 1, Length);
}
At the end of the day they call Substring....
The story is a little more complex for InStr versus IndexOf because you can pass a compare parameter, but in that case too the internal code used in the Microsoft.VisualBasic COMPATIBILITY (bold is mine) library falls back on the base methods provided by the .NET Framework.
Of course, if you only need to maintain an old program ported from the VB6 days, then it is absolutely correct to use these methods. If instead you plan to keep evolving your program, or you are building a new one, I suggest switching to the .NET Framework core methods as soon as possible.
I'd love to say you should use Mid() or InStr() because some of us have been using those functions for years, but I'd recommend against using those throwbacks, mostly because the portable target platforms (like Xbox and Windows Phone) do not support them. To me that's a sign that they're going to be deprecated sooner rather than later. I've also read that the .NET versions seem to perform better, but I can't find any statistics to support that claim right now.
One other somewhat related note is that the Trim() function deals with line breaks differently. Sample code:
Dim strTest As String = ControlChars.NewLine ' OR Environment.NewLine OR vbNewLine
Dim oldLength As Integer = Len(Trim(strTest)) '2
Dim newLength As Integer = strTest.Trim().Length '0
So be careful if you're porting code to the .Net versions.
The reason I usually prefer to use the System.String one is that it is more compatible with other languages. If you are using C# as well as VB, it's much less confusing to stick to the System.String ones. There are also a number of System.String functions which don't AFAIK have equivalents in Microsoft.VisualBasic, such as EndsWith, and it's a bit odd to use a mixture. The VB ones are for compatibility with VB6 etc., which is ancient now.
However - I do sometimes like the fact that the VB versions are more fault tolerant. The Mid example that Steve posted shows that Mid returns "" in cases which would have thrown an exception with Substring. There are similar differences with some of the others. I have found that quite useful in the past; otherwise you can end up writing those checks yourself before calling Substring. It also means that converting code from the old VB-style commands to System.String can introduce unexpected exceptions. I work on one project which started in VB5, and I quickly learnt not to replace the old VB versions without a reason.
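The difference in fault tolerance can be modelled in a few lines. The Python sketch below emulates both behaviours; `vb_mid` follows the Mid source quoted earlier, while `net_substring` is an assumed approximation of Substring's bounds checks, with IndexError standing in for ArgumentOutOfRangeException.

```python
def vb_mid(s, start, length):
    """Emulates the tolerant behaviour of VB's Mid (1-based start)."""
    if start <= 0:
        raise ValueError("Start must be greater than zero")
    if length < 0:
        raise ValueError("Length must not be negative")
    if length == 0 or s is None or start > len(s):
        return ""
    # Python slicing clamps at the end of the string, like Mid does.
    return s[start - 1 : start - 1 + length]


def net_substring(s, start_index, length):
    """Approximates the strict bounds checks of Substring (0-based start)."""
    if start_index < 0 or length < 0 or start_index + length > len(s):
        raise IndexError("start_index and length must lie within the string")
    return s[start_index : start_index + length]


tolerant = vb_mid("hello", 4, 10)        # runs off the end, returns what is there
past_end = vb_mid("hello", 9, 2)         # start beyond the string, returns ""
strict = net_substring("hello", 3, 2)    # in range, same result as Mid(…, 4, 2)
```

Calling `net_substring("hello", 3, 10)` raises instead of clamping, which is the kind of exception a mechanical port from Mid to Substring can introduce.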

ComBSTR assignment

I'm confused about COM string assignments. Which of the following string assignments is correct, and why?
CComBSTR str;
.
.
Obj->str = L"" //Option1
OR should it be
Obj->str = CComBSTR(L"") //Option2
What is the reason?
A real BSTR is:
temporarily allocated from the COM heap (via SysAllocString() and family)
a data structure in which the string data is preceded by its length, stored in a 32-bit value.
passed as a pointer to the fifth byte of that data structure, where the string data resides.
See the documentation:
MSDN: BSTR
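The layout described in the list above can be modelled as plain bytes. This Python sketch only illustrates the memory layout (length prefix, UTF-16 data, terminator); `make_bstr_block` and `read_bstr` are hypothetical helpers and do not allocate from the COM heap.

```python
import struct

BSTR_DATA_OFFSET = 4   # a BSTR pointer refers to this offset, i.e. the fifth byte


def make_bstr_block(text):
    """Builds the bytes of a BSTR allocation: length prefix, data, terminator."""
    data = text.encode("utf-16-le")
    prefix = struct.pack("<I", len(data))   # 32-bit byte count, terminator excluded
    return prefix + data + b"\x00\x00"


def read_bstr(block):
    """Reads the string back by trusting only the length prefix."""
    (byte_len,) = struct.unpack_from("<I", block, 0)
    start = BSTR_DATA_OFFSET
    return block[start : start + byte_len].decode("utf-16-le")


block = make_bstr_block("Hi")
round_trip = read_bstr(block)
```

A plain wide-string literal has no such prefix in front of it, so code that reads the length from the four bytes before the pointer sees whatever happens to be there.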
Most functions which accept a BSTR will not crash when passed a BSTR created by simple assignment. This leads to confusion, as people observe what seems to be working code and infer from it that a BSTR can be initialized just like any WCHAR *. That inference is incorrect.
Only real BSTRs can be passed to OLE Automation interfaces.
By using the CComBSTR() constructor, which calls SysAllocString(), your code will create a real BSTR. The CComBSTR() destructor will take care of returning the allocated storage to the system via SysFreeString().
If you pass the CComBSTR() to an API which takes ownership, be sure to call the .Detach() method to ensure the BSTR is not freed. BSTRs are not reference counted (unlike COM objects, which are), and therefore an attempt to free a BSTR more than once will crash.
If you use str = CComBSTR(L"") you use the constructor:
CComBSTR(LPCOLESTR pSrc);
If you use str = L"" you use the assignment operator:
CComBSTR& operator=(LPCOLESTR pSrc);
They both would initialize the CComBSTR object correctly.
Personally, I'd prefer option 1, because that doesn't require constructing a new CComBSTR object. (Whether their code does so behind the scenes is a different story, of course.)
Option 1 is preferred because it only does one allocation for the string, whereas option 2 does two (notwithstanding the creation of a new temporary object for no particular reason). Unlike the _bstr_t type in VC++, the ATL one does not use reference-counted strings, so it will copy the entire string across.

Copy bytes in memory to an Array in VB.NET

Unfortunately I cannot resort to C# in my current project, so I'll have to solve this without the unsafe keyword.
I've got a bitmap, and I need to access the pixels and channel values directly. I'd like to go beyond Marshal.ReadByte() and Marshal.WriteByte() (and definitely beyond GetPixel and SetPixel).
Is there a way to put all the pixel data of the bitmap into a Byte array that works on both 32 and 64 bit systems? I want the exact same layout as the original bitmap, so the padding for each row (if it exists) also needs to be included.
Marshal doesn't seem to have something akin to:
byte[] ReadBytes(IntPtr start, int offset, int count)
Unless I totally missed it...
Any help greatly appreciated,
David
ps. So far all my images are in 32BppPArgb pixelformat.
Marshal does have a method that does exactly what you are asking. See Marshal.Copy():
public static void Copy(
IntPtr source,
byte[] destination,
int startIndex,
int length
)
Copies data from an unmanaged memory pointer to a managed 8-bit unsigned integer array.
And there are overloads to go the other direction as well.
Would something like this do? (untested):
Public Shared Function BytesFromBitmap(ByVal Image As Drawing.Bitmap) As Byte()
Using buffer As New IO.MemoryStream()
' Save into the stream declared above (the original passed an undefined "result")
Image.Save(buffer, Drawing.Imaging.ImageFormat.Bmp)
' ToArray returns the whole stream regardless of the current position
Return buffer.ToArray()
End Using
End Function
It won't let you manipulate the pixels in a Drawing.Bitmap object directly, but it will let you copy that bitmap to a byte array, as per the question title.
Another option is serialization via the BinaryFormatter, but I think that will still require you to pass it through a MemoryStream.
VB does not offer methods for direct memory access. You have two choices:
Use the Marshal class
Write a small unsafe C# (or C++/CLI) library that handles only these operations and reference it from your VB code.
Alright, there is a third option. VB.Net does not inherently support direct memory access, but it can be accomplished. It's just ugly and prone to errors. Nonetheless, if you're willing to put in the effort, you can try building a bitmap access library using these techniques combined with the approach referenced previously.
shf301 is right on the money, but I'd like to add a link to a comprehensive explanation/tutorial on fast pixel data access. Rather than saving the image to a stream and accessing a file-in-memory, it would be better to lock the bitmap, copy pixel data out, access it, and copy it back in. The performance of this technique is pretty good.
Code is in C#, but the approach is language-neutral and easy to read.
http://ilab.ahemm.org/tutBitmap.html
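Once the pixel data sits in a flat byte array, the row padding mentioned in the question matters for indexing. The Python sketch below shows the arithmetic only. It assumes rows padded to a multiple of four bytes, a positive (top-down) stride, and blue, green, red, alpha byte order for the 32-bit formats; in real code the stride should be read from the locked bitmap data, not computed. Both function names are hypothetical.

```python
def row_stride(width, bits_per_pixel):
    """Bytes per scan line, rounded up to a multiple of four (assumed alignment)."""
    return ((width * bits_per_pixel + 31) // 32) * 4


def channel_offset(x, y, channel, width, bits_per_pixel=32):
    """Index into the flat byte array of one channel of pixel (x, y).

    channel: 0 = blue, 1 = green, 2 = red, 3 = alpha (assumed order).
    """
    bytes_per_pixel = bits_per_pixel // 8
    return y * row_stride(width, bits_per_pixel) + x * bytes_per_pixel + channel


stride_24 = row_stride(3, 24)    # 9 bytes of pixels plus 3 bytes of padding
stride_32 = row_stride(5, 32)    # 32-bit rows are already aligned, no padding
alpha_index = channel_offset(1, 2, 3, width=5)
```

With 32 bits per pixel every row is already a multiple of four bytes, so padding only appears with narrower formats such as 24-bit.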