Character encoding that won't change the higher bits after I set them - cocoa-touch

I'm looking for a character encoding that allows me to set a byte higher than 127. NSASCIICharacterEncoding and NSUTF8CharacterEncoding replace those higher values.

The character encoding only matters when you're trying to interpret the bytes as characters. If that's what you need to do, and if you're using data that comes from some outside source, then use whatever encoding the outside source used.
On the other hand, if you're just trying to manage a collection of bytes (i.e. not characters), then look into using NSData instead. NSData doesn't care about character encodings, doesn't change the order of your bytes, and will happily keep track of as much data as you give it. (There's a mutable version if you need to modify the data it contains.)
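A quick Python illustration of the difference. This is an analogy, not Cocoa code: a `bytes` object plays the role of NSData and keeps every value from 0 to 255. Decoding those bytes as ASCII loses the values above 127, while Latin-1 happens to map each byte to the code point of the same value.

```python
# Raw bytes keep every value 0-255 exactly as set (the role NSData plays).
raw = bytes([0x41, 0x80, 0xC3, 0xFF])

# ASCII cannot represent values above 127; with errors="replace" each one
# becomes U+FFFD, so the original byte values are lost.
as_ascii = raw.decode("ascii", errors="replace")

# Latin-1 (ISO 8859-1) maps byte N to code point N, so it round-trips,
# but it is still an interpretation of the bytes as characters.
as_latin1 = raw.decode("latin-1")

print(list(raw))                           # [65, 128, 195, 255]
print(as_latin1.encode("latin-1") == raw)  # True
```

If the data is really just bytes, keeping it as bytes (NSData) avoids the question of encodings entirely.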

Related

How to send xon/xoff in case of binary data?

In software flow control, we use the standard XON and XOFF characters (0x11 and 0x13) to pause and resume transmission. But if we want to send binary data that contains bytes matching the ASCII values of XON and XOFF, what character set should we use to send XON or XOFF?
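A simple solution is to use Base64 encoding, which Python provides in its base64 module:
base64.b64encode(yourData) - encode
base64.b64decode(yourData) - decode
It adds overhead (4 output bytes for every 3 input bytes), but the transmitted data consists only of printable characters, so it never contains XON or XOFF. Asynchronous HDLC framing (as used by PPP) solves the same problem differently, by escaping reserved bytes (byte stuffing) rather than Base64 encoding, but Base64 is one workable option for you.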
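A minimal sketch showing why this works: the payload below deliberately contains 0x11 and 0x13, yet the Base64 output never does, because its alphabet is limited to printable characters.

```python
import base64

XON, XOFF = 0x11, 0x13

# Arbitrary binary payload that happens to contain the XON and XOFF values.
payload = bytes([0x00, 0x11, 0x13, 0x7F, 0xFF])

encoded = base64.b64encode(payload)   # only A-Z, a-z, 0-9, '+', '/' and '='
decoded = base64.b64decode(encoded)

# None of the Base64 alphabet characters collide with the flow-control codes.
print(XON not in encoded and XOFF not in encoded)  # True
print(decoded == payload)                          # True
```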
Using software handshaking precludes sending raw (unencoded) binary data.
Short of doing something esoteric (sending 9 bits per byte instead of 8, which is very non-standard), there is no way to distinguish 2 of the 256 possible byte values in binary data from the 2 codes selected for use as XON/XOFF.
There are various protocols that attempt to deal with this. They all encode the binary data into a form that can be efficient but is no longer a one-to-one byte mapping. One can use escape codes, compression, data packets, etc. Of course, both ends of the communication need to know how to encode and decode, which often limits your choices. If in doubt, start with a binary-to-text encoding, as it tends to be easier to debug. http://en.wikipedia.org/wiki/Binary-to-text_encoding
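As an example of the escape-code approach, here is a minimal byte-stuffing sketch. The escape value 0x7D and the XOR mask 0x20 are borrowed from PPP's asynchronous framing, but treat them as assumptions: any escape byte works as long as both ends agree.

```python
ESC = 0x7D                      # escape byte (assumed; PPP uses the same value)
XOR_MASK = 0x20                 # escaped bytes are sent XOR'ed with this mask
RESERVED = {0x11, 0x13, ESC}    # XON, XOFF and the escape byte itself


def stuff(data: bytes) -> bytes:
    """Replace every reserved byte with ESC followed by (byte ^ XOR_MASK)."""
    out = bytearray()
    for b in data:
        if b in RESERVED:
            out += bytes([ESC, b ^ XOR_MASK])
        else:
            out.append(b)
    return bytes(out)


def unstuff(data: bytes) -> bytes:
    """Reverse stuff(): after an ESC, XOR the next byte back."""
    out = bytearray()
    escaped = False
    for b in data:
        if escaped:
            out.append(b ^ XOR_MASK)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            out.append(b)
    if escaped:
        raise ValueError("data ends with a dangling escape byte")
    return bytes(out)


print(stuff(bytes([0x01, 0x11, 0x7D])).hex())  # 017d317d5d
```

Unlike Base64, the overhead here depends on the data: bytes that need no escaping cost nothing, but a payload made entirely of reserved values doubles in size.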
To be able to use those two special characters as control characters, you have to make sure they do not occur in the payload data. One way to do that is to encode the payload with a reduced alphabet that excludes the special characters. The binary-to-text encodings mentioned in a parallel answer would do the job, but if low overhead that does not depend on the distribution of input bytes is critical, then escapeless encoding may help.

Use of byte arrays and hex values in Cryptography

When using cryptography, we always see byte arrays being used instead of String values. But when we look at how most cryptographic algorithms work, they seem to use hex values for their operations. E.g. in AES, MixColumns, SubBytes and the other steps (I suppose) all operate on hex values.
Can you explain how these byte arrays are used as hex values in these operations?
I have an assignment to develop an encryption algorithm, so any related sample code would be much appreciated.
Every four binary digits (bits) make one hexadecimal digit, so you can convert back and forth quite easily (see: http://en.wikipedia.org/wiki/Hexadecimal#Binary_conversion).
I don't think I fully understand what you're asking, though.
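A small Python sketch of that conversion: a byte splits into two 4-bit nibbles, and each nibble is exactly one hex digit. The helper name is made up for illustration.

```python
def byte_to_bits_and_hex(b: int) -> tuple[str, str]:
    """Show one byte (0-255) as 8 bits and as 2 hex digits (one per nibble)."""
    high, low = b >> 4, b & 0x0F          # split the byte into two nibbles
    bits = f"{b:08b}"
    hexdigits = f"{high:X}{low:X}"
    return bits, hexdigits


print(byte_to_bits_and_hex(0xD4))  # ('11010100', 'D4')

# A byte array is just numbers; hex is only how it gets displayed.
data = bytes([0x19, 0xA0, 0x9A, 0xE9])
print(data.hex())                          # 19a09ae9
print(bytes.fromhex("19a09ae9") == data)   # True
```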
The most important thing to understand about hexadecimal is that it is a system for representing numeric values, just like binary or decimal. It is nothing more than notation. As you may know, many computer languages allow you to specify numeric literals in a few different ways:
int a = 42;
int a = 0x2A;
These store the same value into the variable 'a', and a compiler should generate identical code for them. The difference between these two lines will be lost very early in the compilation process, because the compiler cares about the value you specified, and not so much about the representation you used to encode it in your source file.
Main takeaway: there is no such thing as "hex values" - there are just hex representations of values.
That said, you also talk about string values. Obviously the number 42 (the same value as 0x2A) is not the same as the string "42" or the string "2A". If you have a string, you'll need to parse it into a numeric value before you do any computation with it.
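The same point in Python: two literal notations give one value, while strings have to be parsed (with the right base) before they can be used as numbers.

```python
a = 42
b = 0x2A
print(a == b)              # True: same value, two notations

# Strings are text, not numbers; they must be parsed first.
print("42" == 42)          # False
print(int("42"))           # 42 (parsed as decimal)
print(int("2A", 16))       # 42 (parsed as hexadecimal)
print(format(42, "X"))     # 2A (back to a hex representation)
```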
Bytes, byte arrays and memory areas are normally displayed as hexadecimal within an IDE (integrated development environment) and debugger. This is because two hex digits represent exactly one byte, which makes hex the most compact and clear representation of a byte. An experienced programmer can easily convert hex digits to bits mentally. You can clearly see how XOR and shifts work as well, for instance. Those (and addition) are the most common operations in symmetric encryption and hashing.
So it's unlikely that the program performs any conversion to hex; what you see is probably just how your environment displays bytes. Source code (which is converted to bytes at compile time) probably also uses a lot of literals in hexadecimal notation.
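To connect this to MixColumns from the question: AES multiplies bytes in the finite field GF(2^8), and the core step (called xtime in FIPS-197) is just a shift and a conditional XOR on plain integers. Hex is only the notation for the literals. A minimal sketch, not a full MixColumns implementation:

```python
def xtime(b: int) -> int:
    """Multiply a byte by {02} in AES's GF(2^8): shift left, then reduce
    modulo x^8 + x^4 + x^3 + x + 1 (0x11B) if bit 8 became set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with repeated xtime and XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


print(hex(xtime(0x57)))         # 0xae
print(hex(gf_mul(0x57, 0x13)))  # 0xfe
```

The values 0x57 and 0x13 are the worked example from the AES specification; the same bytes could equally be written in decimal (87 and 19) without changing anything.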
Cryptography in general (hash functions aside) is a method of converting data into another form, usually called ciphertext, using a secret key. The key can then be applied to the ciphertext to recover the original data, also called plaintext. The data is processed at the byte level, although some algorithms work at the bit level. The point is that text and strings occupy only a limited part of the byte range: ASCII, for example, uses only the values 0-127 of the 256 possible byte values. In practice, when a crypto operation is performed, the characters are converted to their equivalent bytes, and those bytes are processed using the key. The resulting bytes will most probably fall outside the range of human-readable text such as ASCII. For this reason, any data to which a crypto function is applied is converted to a byte array first. For example, if the text to be enciphered is "Hello how are you doing?", the following steps are followed:
1. byte[] data = "Hello how are you doing?".getBytes(StandardCharsets.UTF_8);
2. Encipher data using the key, which is also a byte[]
3. The output blob is referred to as cipherTextBytes[]
4. Encryption is complete
5. To decrypt, process cipherTextBytes[] using key[], which returns the original data bytes
6. new String(data, StandardCharsets.UTF_8) then returns the string "Hello how are you doing?"
This is simple information that might help you understand reference code and manuals better. I am in no way trying to explain the core of cryptography here.
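The same steps as a Python sketch. The "cipher" here is a repeating-key XOR chosen only to keep the example short; it is NOT secure and merely stands in for a real algorithm such as AES. The helper name toy_xor is hypothetical.

```python
import os


def toy_xor(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher: XOR each byte with the repeating key.
    NOT secure; it only stands in for a real algorithm."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plaintext = "Hello how are you doing?"
data = plaintext.encode("utf-8")            # step 1: text -> bytes
key = os.urandom(16)                        # the key is also a byte array
cipher_text_bytes = toy_xor(data, key)      # steps 2-4: encrypt the bytes

# Ciphertext bytes are not text; display them as hex (or Base64) instead.
print(cipher_text_bytes.hex())

recovered = toy_xor(cipher_text_bytes, key)  # step 5: decrypt
print(recovered.decode("utf-8"))             # step 6: Hello how are you doing?
```

Note that the ciphertext is never turned into a string directly; only the decrypted bytes are decoded back to text.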

How Do I Convert a Byte Stream to a Text String?

I'm working on a licensing system for my application. I'd like to put all licensing information (licensee name, expiration date, and enabled features) into an object, encrypt that object with a private key, then represent the encrypted data as a single text string which I can send via email to my customers.
I've managed to get the encrypted data into a byte stream, but I don't know how to convert that byte stream into a text value -- something that contains no control characters or whitespace. Can anyone offer advice on how to do that? I've been researching the Encoding class, but I can't find a text-only encoding.
I'm using .NET 2.0 -- mostly VB, but I can do C# also.
Use Base64 encoding (in .NET, Convert.ToBase64String to encode and Convert.FromBase64String to decode). It is great for representing arbitrary binary data in a text-friendly manner: the output uses only upper- and lower-case A-Z, the digits 0-9, '+' and '/', plus '=' for padding.
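The round trip sketched in Python rather than VB/C#, assuming the encrypted license is already available as bytes (the blob below is made up). The check confirms the text contains no whitespace or control characters.

```python
import base64
import string


def license_to_text(encrypted: bytes) -> str:
    """Encode arbitrary bytes as Base64 text."""
    return base64.b64encode(encrypted).decode("ascii")


def text_to_license(text: str) -> bytes:
    """Decode the Base64 text back into the original bytes."""
    return base64.b64decode(text)


# Hypothetical encrypted blob containing control-character byte values.
blob = bytes([0x00, 0x0A, 0x0D, 0x20, 0xFF, 0x7F])
text = license_to_text(blob)

base64_alphabet = set(string.ascii_letters + string.digits + "+/=")
print(set(text) <= base64_alphabet)   # True
print(text_to_license(text) == blob)  # True
```

By default Python's b64decode discards characters outside the Base64 alphabet, so a key that picks up line breaks in an email still decodes.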
BinHex is an example of one way to do that. It may not be exactly what you want -- for example, you might want to encode your data such that it's impossible to inadvertently spell words in your string, and you may or may not care about maximizing the density of information. But it's an example that may help you come up with your own encoding.
I've found Base32 useful for license keys before. There are some C# implementations linked from this answer. My own license code is based on this implementation, which avoids ambiguous characters to make it easier to retype the keys.
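A minimal Base32 license-key sketch in Python, using the standard RFC 4648 alphabet (A-Z and 2-7). That alphabet already omits the digits 0, 1, 8 and 9, which reduces but does not eliminate look-alike characters; the implementation mentioned above goes further. Dropping the padding and grouping into blocks of five are design choices made here for readability, not part of any standard.

```python
import base64


def to_key(data: bytes) -> str:
    """Base32-encode, drop '=' padding, group into 5-character blocks."""
    text = base64.b32encode(data).decode("ascii").rstrip("=")
    return "-".join(text[i:i + 5] for i in range(0, len(text), 5))


def from_key(key: str) -> bytes:
    """Undo to_key(): remove dashes, upper-case, restore padding, decode."""
    text = key.replace("-", "").upper()
    text += "=" * (-len(text) % 8)     # Base32 works in 8-character groups
    return base64.b32decode(text)


key = to_key(b"license")
print(key)
print(from_key(key.lower()) == b"license")  # True: case-insensitive re-entry
```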

Objective-C How to get unicode character

I want to get the Unicode code point for a given Unicode character in Objective-C. The NSString documentation says it uses UTF-16 internally and states:
The NSString class has two primitive methods—length and characterAtIndex:—that provide the basis for all other methods in its interface. The length method returns the total number of Unicode characters in the string. characterAtIndex: gives access to each character in the string by index, with index values starting at 0.
That seems to assume that characterAtIndex: is Unicode-aware. However, it returns a unichar, which is a 16-bit unsigned integer type:
- (unichar)characterAtIndex:(NSUInteger)index
The questions are:
Q1: How does it represent Unicode code points above U+FFFF?
Q2: If Q1 makes sense, is there a method to get the Unicode code point for a given Unicode character in Objective-C?
Thanks.
The short answer to Q1 (how code points above U+FFFF are represented) is: you need to be UTF-16 aware and correctly handle surrogate code points. The info and links below should give you pointers and example code that allow you to do this.
The NSString documentation is correct. However, while you said "NSString said it internal use UTF-16 encoding", it's more accurate to say that the public / abstract interface for NSString is UTF16 based. The difference is that this leaves the internal representation of a string a private implementation detail, but the public methods such as characterAtIndex: and length are always in UTF16.
The reason for this is it tends to strike the best balance between older ASCII-centric and Unicode aware strings, largely due to the fact that Unicode is a strict superset of ASCII (ASCII uses 7 bits, for 128 characters, which are mapped to the first 128 Unicode Code Points).
To represent Unicode Code Points that are > U+FFFF, which obviously exceeds what can be represented in a single UTF16 Code Unit, UTF16 uses special Surrogate Code Points to form a Surrogate Pair, which when combined together form a Unicode Code Point > U+FFFF. You can find details about this at:
Unicode UTF FAQ - What are surrogates?
Unicode UTF FAQ - What’s the algorithm to convert from UTF-16 to character codes?
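The surrogate-pair arithmetic from that FAQ, sketched in Python. The input is a list of UTF-16 code units, i.e. the kind of unichar values characterAtIndex: would return one at a time; the function name is made up for illustration.

```python
def code_points_from_utf16(units: list[int]) -> list[int]:
    """Combine UTF-16 code units into code points, joining surrogate pairs."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(u)  # BMP character (an unpaired surrogate passes through)
            i += 1
    return result


# "A" followed by U+1F600, which UTF-16 stores as the pair D83D DE00.
print([hex(cp) for cp in code_points_from_utf16([0x0041, 0xD83D, 0xDE00])])
# ['0x41', '0x1f600']
```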
Although the official Unicode UTF FAQ - How do I write a UTF converter? now recommends the use of International Components for Unicode, it used to recommend some code officially sanctioned and maintained by Unicode. Although no longer directly available from Unicode.org, you can still find copies of the "no longer official" example code in various open-source projects: ConvertUTF.c and ConvertUTF.h. If you need to roll your own, I'd strongly recommend examining this code first, as it is well tested.
From the documentation of length:
The number returned includes the individual characters of composed character sequences, so you cannot use this method to determine if a string will be visible when printed or how long it will appear.
From this, I would infer that any characters above U+FFFF would be counted as two characters and would be encoded as a Surrogate Pair (see the relevant entry at http://unicode.org/glossary/).
If you have a UTF-32 encoded string with the character you wish to convert, you could create a new NSString with initWithBytesNoCopy:length:encoding:freeWhenDone: and use the result of that to determine how the character is encoded in UTF-16, but if you're going to be doing much heavy Unicode processing, your best bet is probably to get familiar with ICU (http://site.icu-project.org/).
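Why UTF-32 helps here, shown in Python: once text is encoded as UTF-32, every 4-byte unit is exactly one code point, so no surrogate handling is needed. (In Python, ord() gives the code point directly; this sketch only illustrates the UTF-32 idea.)

```python
def code_points_via_utf32(s: str) -> list[int]:
    """Encode as UTF-32 (little-endian, no BOM) and read 4-byte units."""
    raw = s.encode("utf-32-le")
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw), 4)]


print([hex(cp) for cp in code_points_via_utf32("A\U0001F600")])  # ['0x41', '0x1f600']
```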

Char.ConvertFromUtf32 not available in Silverlight

I'm converting a WinForms app to Silverlight (VB.NET). What should I use instead of Char.ConvertFromUtf32 as it's not available to use in Silverlight?
UTF-32 is currently not part of Silverlight, so you have to find a way around the limitation. I think you should stop a moment and think exactly why you need to read UTF32-encoded text.
If you are reading such text from a database or a file on the server, I would perform the conversion server-side (if possible I would convert everything to UTF-8 and get rid of the UTF-32 data in one shot).
If you are parsing a user-provided file on the client side, I would detect the UTF-32 encoding and gently tell the user that the file encoding is not supported. UTF32 is pretty rare nowadays, so I guess it should not be a very common case (but I could be wrong not knowing your exact situation).
To detect the file encoding, you have to look at the first few bytes for a byte order mark (BOM). If no BOM is present, the task becomes much harder and involves some kind of heuristics based on character frequency.
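A minimal BOM check, sketched in Python rather than VB.NET. The order of the checks matters: the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE), so the longer marks are tested first. A UTF-16 LE file whose first character is U+0000 would still be misread as UTF-32 LE; that ambiguity is inherent to BOM sniffing.

```python
import codecs

# Longest BOMs first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes):
    """Return (encoding, bom_length) from a file's first bytes,
    or (None, 0) when there is no BOM and heuristics would be needed."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0


print(detect_bom(codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")))  # ('utf-32-le', 4)
```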
From: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types
You can use a direct cast instead, like:
// Char.ConvertFromUtf32(value) is not available in Silverlight.
// Get the character corresponding to the integral value with a cast:
char charValue = (char)value;
string stringValue = charValue.ToString();
A small warning: the cast only works for values up to 0xFFFF. It will not work for supplementary code points from 0x10000 to 0x10FFFF.
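For values above 0xFFFF, the string that ConvertFromUtf32 returns holds a surrogate pair, so a replacement has to compute the two UTF-16 code units itself. The arithmetic is sketched below in Python (the function name is hypothetical); in C# or VB.NET each unit could then be cast to a char and the two concatenated, but that port is an assumption and not shown.

```python
def utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code unit(s) for one code point. Values up to
    0xFFFF fit in one unit (the case a (char) cast handles); values
    0x10000-0x10FFFF need a high/low surrogate pair."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        return [code_point]
    v = code_point - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```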
Also, if you need to parse \uXXXX, try this other question: How do I convert Unicode escape sequences to Unicode characters in a .NET string?