Using XOR on characters as a simple checksum; is a char just a byte? - arduino-c++

I have a string of characters and want to generate a simple checksum by accumulating XOR over each character, then appending the lowest-order byte of the result to the end of the string, formatted by sprintf(twoCharacterBuffer, "%02X", valueHoldingXOR);.
If I just XOR the characters in the string, accumulating them into an unsigned char value, the compiler warns me that "'sprintf' output between 3 and 9 bytes into a destination of size 2".
The Arduino documentation is a little vague, possibly on purpose, about the number of bytes in a character. I'd like to XOR just the lowest-order byte, whether a character is 1 or 2 or 4 bytes, but I'm not sure of the correct way to do that. Or can I assume that a char is a byte and simply cast it?
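A minimal sketch of the accumulate-and-format approach described above (written as plain C++ rather than an Arduino sketch so it compiles anywhere; the names are illustrative). Two points worth noting: in C and C++ sizeof(char) is always 1, so casting each character to unsigned char (or uint8_t) is safe, and sprintf with "%02X" writes two hex digits plus a terminating NUL, so the destination needs at least 3 bytes, which is exactly what the "between 3 and 9 bytes into a destination of size 2" warning is complaining about.

#include <cstdio>
#include <cstdint>

// Illustrative: XOR-accumulate a checksum over a NUL-terminated string.
uint8_t xorChecksum(const char *s) {
    uint8_t value = 0;
    while (*s) {
        value ^= static_cast<uint8_t>(*s++);  // keep only the low 8 bits of each character
    }
    return value;
}

int main() {
    char buf[3];  // two hex digits plus the terminating NUL, so 3 bytes rather than 2
    uint8_t check = xorChecksum("HELLO");
    std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(check));
    std::printf("checksum = %s\n", buf);
    return 0;
}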

Related

How can a 32-byte address represent more than 32 characters?

I have just started studying Solidity and coding in general, and I tend to see things like this:
[screenshot of a 32-byte hash value omitted]
I am confused as to how a "32 bytes hash" can include more than 32 characters (even after the "0x000"). I was under the impression that each byte can represent one character. I also often see references saying things like "32 bytes address (64 bytes hex address)". But how can a 64-byte hex address be represented if it is a 32-byte address - would you still need a byte per character? I know this is probably a stupid/noob question, and I'm probably missing something obvious, but I can't quite figure it out.
One byte covers the range 00000000 - 11111111 in binary, or 0x00 - 0xFF in hex. As you can see, one byte is represented in hex as a 2-character string. Therefore, a 32-byte value written out in hex is 64 characters long.
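To make that concrete, here is a minimal sketch (plain C++; the all-zero "hash" is just a stand-in for a real 32-byte value) that hex-encodes 32 bytes and shows the result is 64 characters:

#include <cstdio>
#include <cstdint>
#include <cstring>

int main() {
    uint8_t hash[32] = {0};            // a 32-byte value (all zeros here, purely for illustration)
    char hex[2 * sizeof hash + 1];     // 2 hex characters per byte, plus a terminating NUL
    for (size_t i = 0; i < sizeof hash; ++i) {
        // each byte becomes exactly two hex digits
        std::snprintf(&hex[2 * i], 3, "%02X", static_cast<unsigned>(hash[i]));
    }
    std::printf("%s (%zu characters)\n", hex, std::strlen(hex));  // prints 64 characters
    return 0;
}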
A 32-bit address points to the first byte of 32, 64, 1000 or 100 million sequential bytes. All the others follow and are stored at address + 1, +2, +3...

Processing: How to convert a char datatype into its utf-8 int representation?

How can I convert a char datatype into its utf-8 int representation in Processing?
So if I had an array ['a', 'b', 'c'] I'd like to obtain another array [61, 62, 63].
After my original answer I figured out a much easier and more direct way of getting the kind of numbers you want. What you want for 'a' is 61 instead of 97, and so forth. That is not very hard, seeing that 61 is the hexadecimal representation of the decimal 97. So all you need to do is feed your char into the appropriate method, like so:
Integer.toHexString((int)'a');
If you have an array of chars like so:
char[] c = {'a', 'b', 'c', 'd'};
Then you can use the above thusly:
Integer.toHexString((int)c[0]);
and so on and so forth.
EDIT
As per v.k.'s example in the comments below, you can do the following in Processing:
char c = 'a';
String hexString = hex(c); // "0061"
The above will give you the hex representation of the character as a String.
// to save the hex representation as an int you need to parse it since hex() returns a String
int hexNum = PApplet.parseInt(hex(c));
// OR
int hexNum = int(c);
For the benefit of the OP and the commenter below: you will get 97 for 'a' even if you use my previous suggestion in the answer, because 97 is the decimal representation of hexadecimal 61. Seeing that UTF-8 matches the first 128 ASCII code points value for value, I don't see why one would expect anything different anyway. As for the UnsupportedEncodingException, a simple fix would be to wrap the statements in a try/catch block. However, that is not necessary, seeing that the above directly answers the question in a much simpler way.
What do you mean by "utf-8 int"? UTF-8 is a multi-byte encoding scheme for characters (technically, code points) represented as Unicode numbers. In your example you use trivial letters from the ASCII set, but that set has very little to do with a real Unicode/UTF-8 question.
For simple letters, you can literally just int cast:
print((int)'a') -> 97
print((int)'A') -> 65
But you can't do that with characters outside the 16-bit char range. print((int)'二') works (giving 20108, or 4E8C in hex), but print((int)'𠄢') will give a compile error because the character code for 𠄢 does not fit in 16 bits (it's 131362, or 20122 in hex, which is encoded as the four-byte UTF-8 sequence 240, 160, 132, 162).
So for Unicode characters with a code point higher than 0xFFFF you can't use int casting, and you'll actually have to think hard about what you're decoding. If you want true Unicode code point values, you'll have to decode the bytes yourself, but the Processing IDE doesn't actually let you do that; it will tell you that "𠄢".length() is 1, when in real Java it's actually 2 (two UTF-16 code units). There is, in current Processing, no way to get the Unicode value for any character with a code point higher than 0xFFFF.
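Processing aside, the manual decoding the answer alludes to is not especially hard; here is a rough C++ sketch that pulls the code point out of a UTF-8 byte sequence (it assumes well-formed input and only handles the 1-byte and 4-byte cases relevant to this example):

#include <cstdio>
#include <cstdint>

// Decode the first UTF-8 code point of a byte string (sketch only: assumes
// well-formed UTF-8 and handles just the 1-byte and 4-byte sequence lengths).
uint32_t firstCodePoint(const char *s) {
    const uint8_t *b = reinterpret_cast<const uint8_t *>(s);
    if (b[0] < 0x80) {                       // 1-byte sequence (ASCII)
        return b[0];
    }
    if ((b[0] & 0xF8) == 0xF0) {             // 4-byte sequence: code point above 0xFFFF
        return (uint32_t)(b[0] & 0x07) << 18 |
               (uint32_t)(b[1] & 0x3F) << 12 |
               (uint32_t)(b[2] & 0x3F) << 6  |
               (uint32_t)(b[3] & 0x3F);
    }
    return 0xFFFD;                           // 2- and 3-byte sequences omitted in this sketch
}

int main() {
    const char *s = "\xF0\xA0\x84\xA2";      // the four UTF-8 bytes of U+20122, the character from the answer
    std::printf("U+%X\n", (unsigned)firstCodePoint(s));  // prints U+20122
    return 0;
}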
update
Someone mentioned you actually wanted hex strings. If so, use the built-in hex() function.
println(hex((int)'a')) -> 00000061
and if you only want 2, 4, or 6 characters, just use substring:
println(hex((int)'a').substring(4)) -> 0061

How do I perform XOR of const char in Objective-C?

I need to send hexadecimal values to a device over the UDP/IP protocol. Before I send, I have to XOR the first two bytes with the two bytes of the "message sequence number". The problem is:
When and where do I find the MSB and LSB of the message sequence number?
How do I perform XOR on the first two bytes, and if I do so, how do I put them back into the original array?
Here is my array: const char connectByteArray[] = {0x21,0x01,0x01,0x00,0xC0,0x50};
I think the point below will help to answer this:
"XOR the first byte of the encryption block with the MSB of the message sequence number, and XOR the second byte of the encryption block with the LSB of the message sequence number"
// Bitwise XOR operator is ^ . Assuming sequenceNumber holds the 16-bit message sequence number:
uint8_t msb = (uint8_t)(sequenceNumber >> 8);   // MSB: the high byte of the sequence number
uint8_t lsb = (uint8_t)(sequenceNumber & 0xFF); // LSB: the low byte of the sequence number
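For completeness, a rough C-style sketch of the whole operation the quote describes (the sequence number value is made up, and the array is declared as uint8_t here so the initializers above 0x7F compile cleanly whether built as C or C++):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    const uint8_t connectByteArray[] = {0x21, 0x01, 0x01, 0x00, 0xC0, 0x50};
    uint16_t sequenceNumber = 0x1234;              /* assumed: the current message sequence number */

    /* Copy into a mutable buffer, since the original array is const. */
    uint8_t packet[sizeof connectByteArray];
    memcpy(packet, connectByteArray, sizeof connectByteArray);

    packet[0] ^= (uint8_t)(sequenceNumber >> 8);   /* XOR the first byte with the MSB */
    packet[1] ^= (uint8_t)(sequenceNumber & 0xFF); /* XOR the second byte with the LSB */

    /* packet[0] is now 0x21 ^ 0x12 = 0x33 and packet[1] is 0x01 ^ 0x34 = 0x35. */
    for (size_t i = 0; i < sizeof packet; i++)
        printf("%02X ", (unsigned)packet[i]);
    printf("\n");
    return 0;
}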

Why does this code encode random salt first as hexadecimal digits?

I'm looking at some existing code that is generating a salt which is used as input into an authentication hash.
The salt is 16 bytes long, and is generated by first using an OS random number generator to get 8 bytes of random data.
Then each byte in the 8-byte buffer is used to place data into 2 bytes of the 16-byte buffer as follows:
out[j] = hexTable[data[i] & 0xF];
out[j-1] = hexTable[data[i] >> 4 & 0xF];
Where out is the 16-byte salt, data is the initial 8-byte buffer, j and i are just loop counters, and hexTable is just an array of the hex digits, i.e. 0 to F.
Why is all this being done? Why isn't the 16 byte salt just populated with random data to begin with? Why go through this elaborate process?
Is what is being done here a standard way of generating salts? What's the benefit and point of this over just generating 16 random bytes in the first place?
This is simply conversion of your 8 random bytes to 16 hexadecimal digits.
It seems that someone misunderstood the concept of salt, or what input your hash needs, and thought it only accepts hexadecimal digits.
Maybe also the salt is stored somewhere where it is easier to store hexadecimal digits instead of pure bytes, and the programmer thought it would be good to be able to reuse the stored salt as-is (i.e. without converting it back to bytes first).
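For reference, the loop described in the question is ordinary byte-to-hex conversion; a standalone sketch (reusing the out, data, and hexTable names from the question, with made-up bytes standing in for the OS random data) looks roughly like this:

#include <cstdio>
#include <cstdint>

int main() {
    const char hexTable[] = "0123456789ABCDEF";
    uint8_t data[8] = {0xDE, 0xAD, 0xBE, 0xEF, 0x01, 0x23, 0x45, 0x67};  // stand-in for the 8 random bytes
    char out[17];                                   // 16 hex digits plus a terminating NUL

    for (int i = 0, j = 1; i < 8; i++, j += 2) {
        out[j]     = hexTable[data[i] & 0xF];       // low nibble -> second digit of the pair
        out[j - 1] = hexTable[data[i] >> 4 & 0xF];  // high nibble -> first digit of the pair
    }
    out[16] = '\0';
    std::printf("%s\n", out);                       // prints DEADBEEF01234567
    return 0;
}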

Do certain characters take more bytes than others?

I'm not very experienced with lower-level things such as how many bytes a character takes. I tried to find out whether one character equals one byte, but without success.
I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.
The current delimiter is "#". Would choosing another delimiter decrease my bandwidth?
It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):
In ASCII or ISO 8859, each character is represented by one byte
In UTF-32, each character is represented by 4 bytes
In UTF-8, each character uses between 1 and 4 bytes
In ISO 2022, it's much more complicated
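To illustrate the UTF-8 case from the list above, here is a small sketch; the non-ASCII characters are written as escaped UTF-8 bytes so the byte counts don't depend on the source file's encoding:

#include <cstdio>
#include <cstring>

int main() {
    // strlen counts bytes, not characters, so it shows how many bytes each UTF-8 character needs.
    std::printf("U+0023 : %zu byte(s)\n", std::strlen("#"));                 // '#'  -> 1 byte (ASCII)
    std::printf("U+00E9 : %zu byte(s)\n", std::strlen("\xC3\xA9"));          // 'é'  -> 2 bytes
    std::printf("U+4E8C : %zu byte(s)\n", std::strlen("\xE4\xBA\x8C"));      // '二' -> 3 bytes
    std::printf("U+20122: %zu byte(s)\n", std::strlen("\xF0\xA0\x84\xA2"));  // '𠄢' -> 4 bytes
    return 0;
}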
US-ASCII characters (of which # is one) will take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
It depends on the encoding. In single-byte character sets such as the Windows "ANSI" code pages and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable-width, where the number of bytes needed to encode a character depends on the character being encoded.
The answer of course is that it depends. If you are in a pure ASCII environment, then yes, every char takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then chars can range from 1 to 4 bytes in size.
If you choose a char from the ASCII set, then yes, your delimiter is as small as possible.
No, all characters are 1 byte, unless you're using Unicode or wide characters (for accents and other symbols for example).
A character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. 1-byte characters are called ASCII characters. They use only 7 of the 8 available bits to form the standard alphabet and the various symbols used when teletypes and typewriters were still common.
You can find an ASCII chart showing which numbers correspond to which characters online.