Is 0000 a valid EBCDIC signed value?

We have an ASCII file with numbers formatted as EBCDIC signed fields.
Sometimes the value is 0000 while I would expect 000{ or 000}.
Is 0000 a valid EBCDIC signed value within an ASCII file?

Short Answer
Yes, both '0000' and '000{' denote a positive zero. '000}' denotes a negative zero.
Detailed Answer
Packed decimal numbers are often used on IBM mainframe systems, since the processor has a set of decimal instructions. Those instructions assume that their operands follow the rules for packed decimal numbers in storage. See IBM z/Architecture Principles of Operation, Chapter 8, "Decimal Instructions".
In summary, a packed decimal number has a digit, i.e. 0x0 - 0x9, in every nibble of every byte, except for the right nibble of the rightmost byte (the rightmost nibble). The rightmost nibble holds the sign, which has the preferred values 0xC for positive and 0xD for negative values. The system also accepts 0xA, 0xE, and 0xF as positive signs, and 0xB as a negative sign.
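As a rough model of these rules, here is a Python sketch (not mainframe code; the function name is made up for illustration) that interprets a packed decimal field according to the nibble layout and sign codes described above:

def unpack_packed_decimal(data: bytes) -> int:
    """Interpret a packed decimal field: digit nibbles followed by a
    trailing sign nibble (0xA/0xC/0xE/0xF positive, 0xB/0xD negative)."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    if sign in (0xA, 0xC, 0xE, 0xF):
        negative = False
    elif sign in (0xB, 0xD):
        negative = True
    else:
        raise ValueError("invalid sign nibble")
    value = int("".join(str(d) for d in digits))
    return -value if negative else value

print(unpack_packed_decimal(b"\x12\x3C"))   # 123  (0x12 0x3C is +123)
print(unpack_packed_decimal(b"\x12\x3D"))   # -123 (0x12 0x3D is -123)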
Making Packed Decimal Human Readable
If you need to make a packed decimal number human readable, you can use the UNPK (unpack) processor instruction. This instruction transforms each byte, except for the rightmost byte, nibble by nibble, into the corresponding EBCDIC character digit, i.e.
0x0 --> '0' (= 0xF0)
0x1 --> '1' (= 0xF1)
...
0x9 --> '9' (= 0xF9)
The rightmost byte is handled differently, since it contains a digit in the left nibble and the sign in the right nibble. This byte is transformed by simply exchanging the nibbles. For decimal numbers with the preferred sign values, this is:
positive values: 0xdC --> 0xCd
negative values: 0xdD --> 0xDd
where the lowercase d denotes the digit nibble value, i.e. 0x0, 0x1, ..., 0x9.
So, positive values lead to:
0xC0, 0xC1, ..., 0xC9
and negative values lead to
0xD0, 0xD1, ..., 0xD9.
The corresponding EBCDIC characters are:
'{', 'A', 'B', ..., 'I' (positive values)
'}', 'J', 'K', ..., 'R' (negative values)
To make the numbers really human readable, programs then usually overlay the left nibble of this last character with 0xF to turn it into a real EBCDIC character digit. This is called the zoned decimal format.
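Here is a hypothetical Python model of that unpacking step (only a sketch of what UNPK does, not the actual instruction; code page 037 is used merely as one concrete EBCDIC code page):

def unpk(packed: bytes) -> bytes:
    """Model of UNPK: every digit nibble except the last becomes a zoned
    byte 0xFd; the rightmost byte has its two nibbles swapped, so the
    sign ends up in the left (zone) nibble."""
    nibbles = []
    for byte in packed:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    digits, sign = nibbles[:-1], nibbles[-1]
    out = bytes(0xF0 | d for d in digits[:-1])
    out += bytes([(sign << 4) | digits[-1]])      # sign moves into the zone nibble
    return out

unpacked = unpk(b"\x12\x3C")              # packed +123
print(unpacked.hex().upper())             # F1F2C3
print(unpacked.decode("cp037"))           # 12C  (EBCDIC code page 037)

# Overlaying the zone nibble of the last byte with 0xF gives plain digits:
fixed = unpacked[:-1] + bytes([0xF0 | (unpacked[-1] & 0x0F)])
print(fixed.decode("cp037"))              # 123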
So far, only the preferred sign codes have been used. If the alternate sign codes (as noted above) were used, all sorts of additional characters might appear. For example, variations of the number zero with alternate sign codes would show up as (in EBCDIC):
positive zero: 0x0A --> 0xA0, which is 'µ'
positive zero: 0x0E --> 0xE0, which is '\'
positive zero: 0x0F --> 0xF0, which is '0'
negative zero: 0x0B --> 0xB0, which is '^'
Handling Improperly Unpacked Numbers
If the program doing the unpacking of packed decimal numbers does not handle the sign nibble correctly for human readability, you can (a parsing sketch follows this list):
In EBCDIC, overlay the left nibble of the rightmost character byte with 0xF to make sure it is a real EBCDIC character digit.
In ASCII, overlay the left nibble of the rightmost character byte with 0x3 to make sure it is a real ASCII character digit.
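For the ASCII file from the original question, a parsing routine might look like the following Python sketch (the function name and tables are just illustrative assumptions, not a standard library facility). It accepts a plain trailing digit (sign nibble 0xF) as well as the overpunch characters, so '0000', '000{' and '000}' all parse:

# Trailing character of a signed (overpunch) field as it appears in ASCII:
# '{'..'I' carry a positive sign, '}'..'R' a negative sign, and a plain
# digit (sign nibble 0xF) is also positive -- so '0000' is a valid +0.
POSITIVE = {c: d for d, c in enumerate("{ABCDEFGHI")}
NEGATIVE = {c: d for d, c in enumerate("}JKLMNOPQR")}

def parse_signed_field(field: str) -> int:
    body, last = field[:-1], field[-1]
    if last.isdigit():                    # 0xF zone: positive
        return int(body + last)
    if last in POSITIVE:
        return int(body + str(POSITIVE[last]))
    if last in NEGATIVE:
        return -int(body + str(NEGATIVE[last]))
    raise ValueError(f"not a signed field: {field!r}")

print(parse_signed_field("0000"))   # 0
print(parse_signed_field("000{"))   # 0
print(parse_signed_field("000}"))   # 0  (a negative zero)
print(parse_signed_field("012J"))   # -121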

Related

Quoted-Printable Encoding - Counting Bits

Let's say I want to encode a word in quoted printable (with charset ISO 8859-1) and count bits afterwards. How do you count the encoded quoted printable tag ("=" and hex) in bits?
Original: hätte -> 7+8+7+7+7 = 36 Bits
Encoded: h=E4tte -> does "=E4" count for 3*7 Bits or 1*7 Bits?
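Just to make the shape of the encoded data concrete, here is a very reduced quoted-printable sketch in Python (the helper name is made up; Python's quopri module would do the real work). Note that the '=' and the two hex digits are each ordinary ASCII characters in the encoded output:

def qp_encode_latin1(text: str) -> bytes:
    """Very reduced quoted-printable: ISO 8859-1 bytes outside printable
    ASCII (and '=') are written as '=' plus two hex digits."""
    out = bytearray()
    for byte in text.encode("iso-8859-1"):
        if 33 <= byte <= 126 and byte != ord("="):
            out.append(byte)
        else:
            out += b"=%02X" % byte
    return bytes(out)

encoded = qp_encode_latin1("hätte")
print(encoded)        # b'h=E4tte'
print(len(encoded))   # 7 ASCII characters on the wire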

Scientific notation with three significant figures

Is there a way to use scientific notation in Objective-C and have it display three significant digits only? What I am currently using is:
string = [NSString stringWithFormat:@"%e", floatNumber];
// floatNumber = 1000000; string = 1.000000e+06
I just want string = 1.00e+06
Use the format specifier ".2" as follows:
string = [NSString stringWithFormat:@"%.2e", floatNumber];
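// floatNumber = 1000000; string = 1.00e+06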
From Apple's documentation:
The format specifiers supported by the NSString formatting methods and CFString formatting functions follow the IEEE printf specification...
And from the IEEE printf specification, if you read under the Description section, you will find:
e, E
The double argument shall be converted in the style "[-]d.ddde±dd", where there is one digit before the radix character (which is non-zero if the argument is non-zero) and the number of digits after it is equal to the precision; if the precision is missing, it shall be taken as 6; if the precision is zero and no '#' flag is present, no radix character shall appear. The low-order digit shall be rounded in an implementation-defined manner. The E conversion specifier shall produce a number with 'E' instead of 'e' introducing the exponent. The exponent shall always contain at least two digits. If the value is zero, the exponent shall be zero.

Hexadecimal numbers vs. hexadecimal encoding (with base64 as well)

Encoding with hexadecimal numbers seems to be different from using hexadecimals to represent numbers. For example, the hex number 0x40 should, to my mind, be equal to 64, or BA in base 64, but when I put it through this hex to base64 converter, I get the output QA==, which to me is equal to some number times 64. Why is this?
Also when I check the integer value of the hex string deadbeef I get 3735928559, but when I check it other places I get: 222 173 190 239. Why is this?
Addendum: So I guess it is because it is easier to break the number into bit chunks than treat it as a whole number when encoding? That is pretty confusing to me but I guess I get it.
You may wish to read this:
http://en.wikipedia.org/wiki/Base64
In summary, base64 specifies a specific encoding, which involves using different values for letters than their ASCII encoding.
For the second part, one source is treating the entire string as a 32 bit integer, and the other is dividing it into bytes and giving the value of each byte.
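A short Python sketch of both points, just to make them concrete (base64 works on the bytes, while int() reads the hex string as one number):

import base64

raw = bytes.fromhex("40")                  # a single byte, 0x40 ('@' in ASCII)
print(base64.b64encode(raw))               # b'QA=='

print(int("deadbeef", 16))                 # 3735928559  (one 32-bit integer)
print(list(bytes.fromhex("deadbeef")))     # [222, 173, 190, 239]  (its four bytes)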

Why do DocBook generated XHTML5 Section titles have ASCII #c2 characters in them?

I noticed my generated XHTML5 numbered section titles have a Â between the number and the title string. I thought this was a generation error. But no, the gentext file of my DocBook distribution, common/en.xml, actually specifies this.
Line 338 of common/en.xml:
<l:template name="section" text="%n. %t"/>
The dot and space following the %n are, when viewed in a hex editor, ASCII character codes C2 and A0, which are the Â and NBSP characters respectively. I can understand NBSP. But why the Â?
I understand I can change this in my customization layer. But the default seems odd.
I'm using docbook-xsl-ns-1.77.1.
That is because the encoding is UTF-8, which is the normal Unicode encoding for text these days. In UTF-8, any character above 0x7F is represented by a sequence of 2, 3, or 4 bytes depending on how many significant code bits it contains.
The 0xC2 is one of the bytes that starts a 2-byte sequence. In binary, it's 1100 0010. The two leading 1 bits denote a 2-byte sequence, and the bottom five bits are the first five bits of the encoded character. The second one, 0xA0, is 1010 0000. The single leading 1 bit (followed by a 0 bit) denotes a continuation byte, and the bottom 6 bits are the bottom bits of the encoded character.
Putting the bottom five bits from the first byte together with the bottom six bits from the second, we get 000 1010 0000, which is 0xA0, i.e. U+00A0, and that is indeed the non-breaking space.
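The same arithmetic in a few lines of Python, as a quick sanity check of the bit layout described above:

lead, cont = 0xC2, 0xA0

# 110xxxxx: keep the low 5 bits of the lead byte;
# 10xxxxxx: keep the low 6 bits of the continuation byte.
code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
print(hex(code_point))                            # 0xa0 -> U+00A0, NO-BREAK SPACE

print(b"\xc2\xa0".decode("utf-8") == "\u00a0")    # True, the codec agrees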

Do certain characters take more bytes than others?

I'm not very experienced with lower-level things such as how many bytes a character is. I tried finding out if one character equals one byte, but without success.
I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.
The current delimiter is "#". Would choosing another delimiter decrease my bandwidth?
It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):
In ASCII or ISO 8859, each character is represented by one byte
In UTF-32, each character is represented by 4 bytes
In UTF-8, each character uses between 1 and 4 bytes
In ISO 2022, it's much more complicated
US-ASCII characters (of which # is one) will take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
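One quick way to check this yourself, sketched in Python (the codec names are the standard ones; UTF-32-BE is used to avoid counting a byte-order mark):

for text in ("#", "é", "€"):
    for enc in ("ascii", "iso-8859-1", "utf-8", "utf-32-be"):
        try:
            print(f"{text!r} in {enc}: {len(text.encode(enc))} byte(s)")
        except UnicodeEncodeError:
            print(f"{text!r} cannot be encoded in {enc}")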
It depends on the encoding. In single-byte character sets such as ANSI and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable width, where the number of bytes needed to encode a character depends on the character being encoded.
The answer of course is that it depends. If you are in a pure ASCII environment, then yes, every char takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then chars can range from 1 to 4 bytes in size.
If you choose a char from the ASCII set, then yes, your delimiter is as small as possible.
No, all characters are 1 byte, unless you're using Unicode or wide characters (for accents and other symbols for example).
A character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. 1-byte characters are called ASCII characters. They only use 7 bits (even though 8 are available, you can't use the 8th bit) to form the standard alphabet and the various symbols used when teletypes and typewriters were still common.
You can find an ASCII chart and what numbers correspond to what characters here.