HSQLDB and double-byte chars

I need to store double-byte characters (Japanese Kanji) in an in-memory DB within a Java EE web app.
I'm considering using HSQL.
Will it support double-byte characters?

Related

Sending Unicode Characters greater than 0x7F through RS232

My application runs on Windows CE 6.0 using the Compact Framework and is used to issue remote commands to a device through RS-232. These commands are sent as bytes with specific hex values, e.g. sending 0x22 0x28 0x00 0x01 as a command sequence. I'm sending the bytes one at a time. The hex values are stored internally in a string for each command sequence, e.g. "22,28,00,01". I'm sending the bytes using the following code:
Dim i As Integer
Dim SendString() As String
Dim SendByte, a As String
DutCommand = "22,0A,00,02,E7,83" 'Sample command string
SendString = Split(DutCommand, ",") 'Split the string
For i = 0 To UBound(SendString) 'Send each byte after encoding
SendByte = Chr(CInt("&H" & SendString(i)))
CommPort.Write(SendByte)
Next
SendByte is encoded correctly even for values greater than 0x7F, but the last two bytes (0xE7 and 0x83) are sent as 0x3F, the ASCII code for "?", because they are greater than 0x7F.
Am I missing a setting for the Comm port to handle encoding? Is there a simple method for sending the data with values greater than 0x7F?
You simply forgot to convert the hex values to bytes. It needs to look like this:
For i = 0 To UBound(SendString) 'Send each byte after encoding
Dim b = Byte.Parse(SendString(i), Globalization.NumberStyles.HexNumber)
CommPort.BaseStream.WriteByte(b)
Next
The non-stringy way is:
Dim DutCommand As Byte() = {&H22, &H0A, &H00, &H02, &HE7, &H83}
CommPort.Write(DutCommand, 0, DutCommand.Length)
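For comparison, here is a minimal sketch of the same parse-then-write-bytes idea in Java rather than VB, so it can run standalone; the class name HexCommand is hypothetical. It turns the comma-separated hex string into a byte array without ever going through a text encoding.

```java
// Hypothetical helper mirroring the VB answer: parse "22,0A,..." into raw bytes.
class HexCommand {
    // Splits on commas and parses each token as a base-16 value.
    static byte[] parse(String command) {
        String[] parts = command.split(",");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt("E7", 16) gives 231; the cast keeps the low 8 bits,
            // so the byte holds the bit pattern 0xE7 (printed as -25 in Java).
            out[i] = (byte) Integer.parseInt(parts[i].trim(), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = parse("22,0A,00,02,E7,83");
        // A real program would pass these bytes to the port's byte-oriented
        // write method; here they are just printed as hex.
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim()); // 22 0A 00 02 E7 83
    }
}
```

One Java-specific wrinkle: Java's byte is signed, so Byte.parseByte("E7", 16) throws NumberFormatException. Parsing to int and casting keeps the 0xE7 bit pattern the device needs. VB's Byte is unsigned, which is why Byte.Parse works directly in the answer above.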
I am assuming that you are using SerialPort.Write.
If so, notice what the documentation says:
By default, SerialPort uses ASCIIEncoding to encode the characters. ASCIIEncoding encodes all characters greater than 127 as (char)63 or '?'. To support additional characters in that range, set Encoding to UTF8Encoding, UTF32Encoding, or UnicodeEncoding.
Seems like the solution is pretty clear. You'll need to set the CommPort.Encoding property to the desired value.
See SerialPort.Encoding for more info.
As per the documentation for SerialPort.Write:
By default, SerialPort uses ASCIIEncoding to encode the characters. ASCIIEncoding encodes all characters greater than 127 as (char)63 or '?'. To support additional characters in that range, set Encoding to UTF8Encoding, UTF32Encoding, or UnicodeEncoding.
You could also consider using the Write overload that actually just writes the raw bytes.
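The substitution described in the documentation is easy to reproduce. The sketch below is a Java analogue, not .NET code, and its class and method names are illustrative. It encodes the character U+00E7 (assuming that is what Chr(&HE7) yields, as it does on a Western Windows-1252 code page) three ways. Java's String.getBytes with US-ASCII also replaces unmappable characters with '?' (0x3F), while UTF-8 encodes U+00E7 as the two bytes C3 A7. So a UTF-8 encoding on the port would put two bytes on the wire instead of the single 0xE7 the device expects; writing raw bytes avoids text encoding altogether.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SerialEncodingDemo {
    // Formats bytes as space-separated two-digit hex, e.g. "C3 A7".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Encodes a one-character string with the given charset.
    static byte[] encode(char c, Charset cs) {
        return String.valueOf(c).getBytes(cs);
    }

    public static void main(String[] args) {
        char c = (char) 0xE7; // U+00E7

        // ASCII cannot represent it, so getBytes substitutes '?' (0x3F).
        System.out.println("ASCII: " + hex(encode(c, StandardCharsets.US_ASCII))); // ASCII: 3F

        // UTF-8 can represent it, but as two bytes rather than one.
        System.out.println("UTF-8: " + hex(encode(c, StandardCharsets.UTF_8)));    // UTF-8: C3 A7

        // A raw byte array involves no encoding step at all.
        byte[] raw = { (byte) 0xE7 };
        System.out.println("Raw:   " + hex(raw));                                  // Raw:   E7
    }
}
```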

Representing data types e.g. Chars, Strings, Integers etc

I am a .NET Developer and I do not believe I know enough about encoding. I have read this article: http://www.joelonsoftware.com/articles/Unicode.html.
Say I declare this string:
Dim TestString As String = "1"
I believe this will be represented as a Unicode character. Say I declare this integer:
Dim TestInt As Integer = 1
How is this represented? I assume Unicode is not used, i.e. it is only used for Strings and Chars? Is that correct? If so, I believe that on a 32-bit machine 1 would simply be represented as:
00000000 00000000 00000000 00000001
Do numeric data types have byte order marks: http://en.wikipedia.org/wiki/Byte_order_mark ?
All strings in .NET are UTF-16. From the language spec:
Visual Basic .NET defines the following primitive types:
...
The Char value type, which represents a single Unicode character and maps to System.Char...
The String reference type, which represents a sequence of Unicode characters and maps to System.String...
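A Java String uses the same model (a sequence of UTF-16 code units), so a short Java sketch can illustrate what the spec describes. The class and helper names are illustrative. In memory, "1" is the single code unit 0x0031; byte order, and optionally a byte order mark, only come into play when that text is serialized to bytes.

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {
    // Formats bytes as space-separated two-digit hex, e.g. "FE FF 00 31".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "1";
        // In memory the string holds one UTF-16 code unit: 0x0031 (decimal 49).
        System.out.println((int) s.charAt(0));                           // 49
        // Byte order only appears once the text is serialized.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 31
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 31 00
        // Java's "UTF-16" encoder writes big-endian data preceded by a BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 31
    }
}
```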
Why should an integral value type like Integer be represented with Unicode in computer memory? Unicode is (citing Wikipedia):
a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.
So yes, it's only used for Strings and Chars.
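Also note that an Integer is always a 4-byte signed integer, whether you use a 32-bit or a 64-bit machine.
Byte order marks are an entirely different topic. As already said in a comment, they are used in text files and streams. An Integer in memory does have a byte order (little-endian on x86), but nothing in the value records which order is used.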
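To make the Integer side concrete, the following Java sketch prints the value 1 as four bytes in both byte orders (Java's int, like VB's Integer, is a fixed 4-byte signed type). The helper names toBytes and bits are hypothetical. Nothing in the four bytes says which order was used; that is a property of the platform or of the file format, not of the number itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class IntBytesDemo {
    // Lays out a 4-byte int in the requested byte order.
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Renders each byte as 8 binary digits, separated by spaces.
    static String bits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            String s = Integer.toBinaryString(b & 0xFF);
            sb.append("00000000".substring(s.length())).append(s).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(Integer.BYTES);                             // 4
        System.out.println(bits(toBytes(1, ByteOrder.BIG_ENDIAN)));    // 00000000 00000000 00000000 00000001
        System.out.println(bits(toBytes(1, ByteOrder.LITTLE_ENDIAN))); // 00000001 00000000 00000000 00000000
    }
}
```

The order that the running hardware actually uses can be checked with ByteOrder.nativeOrder(); on x86 it reports LITTLE_ENDIAN.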

Silverlight UTF8 encoder produces wacky output

I've been trying to track down a bug for hours now and it has come down to this:
Dim length as Integer = 300
Dim buffer() As Byte = binaryReader.ReadBytes(length)
Dim text As String = System.Text.Encoding.UTF8.GetString(buffer, 0, buffer.Length)
The problem is the buffer contains 300 bytes but the length of the string 'text' is now 285. When I convert it back to bytes, the length is 521 bytes... WTF?
The same code in a normal WinForms app works perfectly. The data being read by the binary reader is a UTF-8 encoded string. Any ideas why Silverlight is playing funny buggers?
I bet your stream contains some characters that require more than one byte. UTF8 uses a single byte when possible, but uses more bytes when the character is outside the ASCII range.
This explains why your buffer is longer than the string (300 vs 285).
Example:
string: "testä" (length = 5; the last char takes 2 bytes)
bytes: 0x74 | 0x65 | 0x73 | 0x74 | 0xc3 0xa4 (length = 6)
As to why it becomes even longer when you convert the text back to bytes, my best guess (also looking at the 521 size you get) is that you are using Encoding.Unicode instead of Encoding.UTF8 to perform the conversion. Encoding.Unicode is UTF-16, which uses two bytes for every character (four for the rare characters outside the Basic Multilingual Plane).
(BTW, this obviously has nothing to do with Silverlight. You are probably testing the code with two different strings in WinForms vs. Silverlight. No worries, we've all made stupid mistakes like that :-) )
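The length arithmetic in this answer can be checked with a few lines of Java (used here instead of .NET so it runs standalone; the class name and the byteCount helper are illustrative). The same five-character string takes 6 bytes in UTF-8 and 10 bytes in UTF-16, and decoding then re-encoding with the same charset restores the original byte count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the string occupies in the given encoding.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String text = "test\u00E4"; // "test" followed by a-umlaut (U+00E4)

        System.out.println(text.length());                                // 5 chars
        System.out.println(byteCount(text, StandardCharsets.UTF_8));     // 6: 74 65 73 74 C3 A4
        System.out.println(byteCount(text, StandardCharsets.UTF_16LE));  // 10: two bytes per char

        // Decoding and re-encoding with the same charset gives the original size back.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String roundTrip = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(byteCount(roundTrip, StandardCharsets.UTF_8)); // 6
    }
}
```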

How to define a string literal containing non-ASCII characters?

I'm programming in VB.NET using Visual Studio 2008.
I need to define a string literal containing the character "÷" equivalent to Chr(247).
I understand that internally VS uses UTF-16 encoding, but when the source file is written to disk it contains the single byte value F7 for this character.
This source file is processed by another program that uses UTF-8 encoding by default, so it fails to interpret this character correctly, attempting to combine it with the following single-byte character.
What encoding would correctly interpret the single byte F7 as the single character ÷?
Alternatively, is there a way of expressing a non-ASCII literal that uses only ASCII characters - like using some kind of escape sequence?
Well, I always thought that VS saves files as UTF-8 by default. But ÷ is F7 in ISO 8859-1 (and in Windows-1252, the usual Western Windows code page), which matches the single byte you see on disk. If that is not enough for you, go here: how to change source file encoding in csharp project (visual studio / msbuild machine)?
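The difference between the two readings of that byte can be shown with a small Java sketch (Java, not VB; the class name is illustrative). Decoding the single byte F7 as ISO-8859-1 yields U+00F7 (÷); decoding it as UTF-8 fails, because F7 on its own is not a valid UTF-8 sequence, and Java substitutes U+FFFD. The sketch also uses a Unicode escape, which is one ASCII-only way to write the literal in Java. VB.NET string literals have no escape sequences, so the usual ASCII-only spelling there is ChrW(&HF7).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DivisionSignDemo {
    // Decodes raw file bytes with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] onDisk = { (byte) 0xF7 }; // the single byte found in the saved source file

        // ISO-8859-1 maps bytes 0x00-0xFF directly to U+0000-U+00FF, so F7 is U+00F7.
        System.out.println(decode(onDisk, StandardCharsets.ISO_8859_1).equals("\u00F7")); // true

        // A lone F7 is not valid UTF-8; Java's decoder substitutes U+FFFD.
        System.out.println(decode(onDisk, StandardCharsets.UTF_8).equals("\uFFFD"));      // true

        // Saved as UTF-8 instead, the division sign needs two bytes: C3 B7.
        byte[] utf8 = "\u00F7".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF);             // C3 B7
    }
}
```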

VBA - Read file byte by byte on system with Asian locale

I am trying to convert a file from binary to text, by simply replacing each character with the hexadecimal code. For example, character 'c' will be replaced by '63'.
I have code that works fine on normal systems, but it breaks down on the PC where I need to use it, because that PC's default locale is set to Chinese.
I am using the following statements to read a byte:
ch$ = " "
Get #f%, , ch$
I suspect the problem is in reading the file byte by byte: it skips certain bytes because they form composite characters. It's probably reading 2 bytes that form an Asian character as one character, so it produces a much smaller file than expected.
How can I read the file byte by byte?
Full code is pasted here: http://pastebin.com/kjpSnqzV
Your suspicion is correct. VB file reading automatically converts strings into Unicode from the default code page on the PC. On an Asian code page, some characters are represented as more than one byte.
I advise you to use a Byte variable rather than a string; that will stop VB from being over-helpful.
Dim ch As Byte
Get #f%, , ch
Another possible problem with the original code is that some byte sequences are illegal on Asian code pages (they don't represent valid characters). So your code could experience errors for some input files, but presumably you want it to work with any file.
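The same byte-at-a-time idea, sketched in Java rather than VBA (the class name HexDump and the command-line usage are illustrative assumptions). Reading the file as raw bytes means no code page is consulted, so the output is always exactly twice as long as the input, whatever the system locale. One detail worth carrying back into the VBA version: each byte should become two hex digits, and VBA's Hex$ drops the leading zero for values below &H10, so a common fix is Right$("0" & Hex$(ch), 2).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexDump {
    // Converts every byte to two uppercase hex digits, e.g. 'c' (0x63) -> "63".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF)); // mask to 0..255 before formatting
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        // readAllBytes returns the file's raw bytes; no charset or locale is involved.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toHex(data));
    }
}
```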