I'm writing a cookie authentication library that replicates that of an existing system. I'm able to create authentication tokens that work. However, when testing with a token of known value created by the existing system, I encountered the following puzzle.
The original encoded string purports to be base64url encoded. And, in fact, using any of several base64url code modules and online tools, the decoded value is the expected result.
However base64url encoding the decoded value (again using any of several tools) doesn't reproduce the original string. Both encoded strings decode to the expected results, so apparently both representations are valid.
How? What's the difference?
How can I replicate the original encoded results?
original encoded string: YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
my base64url decode: admin:55FD6CEA:[encrypted hash]
Encoding doesn't match original but the decoded strings match.
my base64url encode: YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
my base64url decode: admin:55FD6CEA:[encrypted hash]
(Sorry, SSE won't let me show the unicode representation of the hash. I assure you, they do match.)
This string:
YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
is not exactly valid Base64. Valid Base64 consists of a sequence of characters drawn from the uppercase letters, lowercase letters, digits, '/' and '+'; it must also have a length that is a multiple of 4, with one or two final '=' signs allowed as padding so that the length is indeed a multiple of 4. This string contains only Base64-valid characters, but only 47 of them, and 47 is not a multiple of 4. With an extra '=' sign at the end, it becomes valid Base64.
That string:
YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
is not valid Base64. It contains several '-' and one '_' sign, neither of which should appear in a Base64 string. If some tool decodes that string into the "same" result as the previous string, then the tool is not implementing Base64 at all, but something else (and something weird at that).
I suppose that your strings got garbled at some point through some copy&paste mishap, maybe related to a bad interpretation of bytes as characters. This is the important point: bytes are NOT characters.
It so happens that, traditionally, computers got into the habit of using so-called "code pages", which were direct mappings of characters onto bytes, with each character encoded as exactly one byte. Thus came into existence tools (such as Windows' notepad.exe) that purport to do the inverse, i.e. show the contents of a file (nominally, some bytes) as their character counterparts. This fails when the bytes are not "printable characters": while a code page such as Windows-1252 maps each character to a byte value, there can be byte values that are not the mapping of any printable character. It fails even more thoroughly once you realize that there are only 256 possible byte values and far more possible characters, especially when you consider Chinese.
Unicode is an evolving standard that maps characters to code points (i.e. numbers), with a bit more than 100,000 currently defined. Encoding rules (there are several of them, the most frequent being UTF-8) then encode the characters into bytes. Crucially, one character can be encoded over several bytes.
In any case, a hash value (or whatever you call an "encrypted hash", which is probably a confusion, because hashing and encrypting are two distinct things) is a sequence of bytes, not characters, and thus is never guaranteed to be the encoding of a sequence of characters in any code page.
Armed with this knowledge, you may try to put some order into your strings and your question.
Edit: thanks to @marfarma for pointing out the URL-safe Base64 encoding, where the '+' and '/' characters are replaced by '-' and '_'. This makes the situation clearer. When the needed '=' signs are added, the first string then decodes to:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a be |admin:55FD6CEA:.|
00000010 d4 5b 42 81 17 0f d3 ba 47 83 18 77 ca e8 da 8e |.[B.....G..w....|
00000020 91 ce b7 |...|
while the second becomes:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a ef |admin:55FD6CEA:.|
00000010 bf bd ef bf bd 5b 42 ef bf bd 17 0f d3 ba 47 ef |.....[B.......G.|
00000020 bf bd 18 77 ef bf bd ef bf bd da 8e ef bf bd ce |...w............|
00000030 b7 |.|
We now see what happened: the first string was decoded to bytes, but someone then fed these bytes to some display system or editor that expected UTF-8. Some of the byte sequences were not valid UTF-8 encodings of anything, so they were replaced with the Unicode code point U+FFFD REPLACEMENT CHARACTER. The characters were then re-encoded as UTF-8, each U+FFFD yielding the three-byte sequence EF BF BD.
Therefore, the hash value was badly mangled, but every invalid byte was replaced with the same placeholder character, so the two decoded values can look identical on screen even though the underlying bytes differ.
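To make the difference concrete, here is a minimal Java sketch (plain java.util.Base64; the class name is made up, and String.repeat needs Java 11+) of the correct byte-level round trip followed by the bytes-through-a-String mangling described above:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TokenRoundTrip {
    public static void main(String[] args) {
        // The original token from the question: base64url alphabet, no '=' padding.
        String token = "YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc";

        // Pad to a multiple of 4 and decode with the URL-safe decoder.
        String padded = token + "=".repeat((4 - token.length() % 4) % 4);
        byte[] raw = Base64.getUrlDecoder().decode(padded); // 35 bytes: 15 bytes of text, 20 bytes of raw hash

        // Correct round trip: re-encode the decoded BYTES themselves.
        String reEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        System.out.println(reEncoded.equals(token)); // true

        // The mangling described above: bytes -> String -> bytes. Invalid UTF-8
        // sequences become U+FFFD, which re-encodes as EF BF BD.
        byte[] mangled = new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println(Base64.getUrlEncoder().withoutPadding().encodeToString(mangled));
    }
}

The mangled output resembles the second (longer) string in the question; the exact result can differ slightly depending on how a given decoder groups invalid byte sequences, but the EF BF BD triples are the same.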
When downloading a PDF file, I specified the password "123456789987654abc211234567899klm7654321". When opening it, I can remove a few characters from the end, for example,
"123456789987654abc211234567899kl" - the file opens anyway! But with
"123456789987654abc211234567899k" - the file does not open.
Please help me understand what the problem is.
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

private static void encryptPdf(
        InputStream inputStream,
        OutputStream outputStream,
        String ownerPassword,
        String userPassword) throws Exception
{
    PDDocument document = PDDocument.load(inputStream);
    if (document.isEncrypted())
    {
        document.close();
        return;
    }
    AccessPermission accessPermission = new AccessPermission();
    StandardProtectionPolicy spp =
            new StandardProtectionPolicy(ownerPassword, userPassword, accessPermission);
    // 40-bit key length selects the legacy (revision <= 4) password handling.
    spp.setEncryptionKeyLength(40);
    document.protect(spp);
    document.save(outputStream);
    document.close();
}
The first step in calculating the encryption key from the password for PDF encryption up to revision 4 is:
The password string is generated from host system codepage characters (or system scripts) by first converting the string to PDFDocEncoding. If the input is Unicode, first convert to a codepage encoding, and then to PDFDocEncoding for backward compatibility. Pad or truncate the resulting password string to exactly 32 bytes. If the password string is more than 32 bytes long, use only its first 32 bytes; if it is less than 32 bytes long, pad it by appending the required number of additional bytes from the beginning of the following padding string:
<28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A>
That is, if the password string is n bytes long, append the first 32 - n bytes of the padding string to the end of the password string. If the password string is empty (zero-length), meaning there is no user password, substitute the entire padding string in its place.
(ISO 32000-2 section 7.6.4.3.2 "Algorithm 2: Computing a file encryption key in order to encrypt a document (revision 4 and earlier)")
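To see why the 40-character and 32-character passwords open the file while the 31-character one does not, here is a minimal Java sketch of just this pad-or-truncate step; the class and method names are made up for illustration, and ISO-8859-1 stands in for PDFDocEncoding (close enough for these ASCII-only passwords):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfPasswordPadding {
    // The 32-byte padding string from Algorithm 2, quoted above.
    private static final int[] PAD = {
        0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41, 0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
        0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80, 0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
    };

    // Pad or truncate the password to exactly 32 bytes, as Algorithm 2 prescribes.
    static byte[] padOrTruncate(String password) {
        byte[] pwd = password.getBytes(StandardCharsets.ISO_8859_1); // simplification; the spec uses PDFDocEncoding
        byte[] out = new byte[32];
        int n = Math.min(pwd.length, 32);
        System.arraycopy(pwd, 0, out, 0, n);
        for (int i = n; i < 32; i++) {
            out[i] = (byte) PAD[i - n];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] full = padOrTruncate("123456789987654abc211234567899klm7654321"); // 40 characters
        byte[] first32 = padOrTruncate("123456789987654abc211234567899kl");       // its first 32 characters
        byte[] first31 = padOrTruncate("123456789987654abc211234567899k");        // only 31 characters
        System.out.println(Arrays.equals(full, first32)); // true: the key is derived from the same 32 bytes
        System.out.println(Arrays.equals(full, first31)); // false: the 32nd byte now comes from the padding string
    }
}

In other words, every password that shares the same first 32 bytes produces the same encryption key, which is exactly the behavior observed in the question.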
For more modern encryption types you have a restriction, too, but generally less harsh:
The UTF-8 password string shall be generated from Unicode input by processing the input string with the SASLprep (Internet RFC 4013) profile of stringprep (Internet RFC 3454) using the Normalize and BiDi options, and then converting to a UTF-8 representation.
Truncate the UTF-8 representation to 127 bytes if it is longer than 127 bytes.
(ISO 32000-2 section 7.6.4.3.3 "Algorithm 2.A: Retrieving the file encryption key from an encrypted document in order to decrypt it (revision 6 and later)")
I am making an app in Visual Studio's VB to auto-install a printer in Windows. The problem is that the printer needs a login and password. I found the registry entry where this is stored, but the password is stored in REG_BINARY format.
Here is how it looks after manually entering the password in the printer settings - see UserPass:
Could you please tell me how to convert the password (a string) into REG_BINARY (see attachment - red square)?
The password in this case was 09882 and it has been stored as 98 09 e9 4c c3 24 26 35 14 6f 83 67 8c ec c4 90. Is there any function in VB to convert 09882 into this REG_BINARY format?
REG_BINARY means that it is binary data, and binary data in .NET is represented by a Byte array. The values you see in RegEdit are the hexadecimal values of the individual bytes, which is a common representation because every byte can be represented by two hex digits. You need to convert your String to a Byte array and then save it to the Registry like any other data.
How you do that depends on what the application expects. Maybe it simply converts the text to bytes using a specific encoding, e.g. Encoding.ASCII.GetBytes. Maybe it's a hash. You might need to research and/or experiment to find out exactly what's expected.
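As a plain-Java illustration of those two hypotheses (the idea is the same in VB.NET; the class name here is made up, and the real driver may well use a salted or otherwise proprietary scheme, so treat this only as a starting point for experiments):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class PasswordBytesDemo {
    public static void main(String[] args) throws Exception {
        String password = "09882";

        // Hypothesis 1: the stored value is simply the text encoded as bytes.
        byte[] encoded = password.getBytes(StandardCharsets.US_ASCII); // 30 39 38 38 32

        // Hypothesis 2: the stored value is a digest of the text; the value in the
        // question is 16 bytes, which happens to be the size of an MD5 hash.
        byte[] digest = MessageDigest.getInstance("MD5").digest(encoded);

        System.out.println(toHex(encoded));
        System.out.println(toHex(digest));
    }

    private static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }
}

If neither output matches what the driver writes, the value is probably salted or obfuscated in some driver-specific way, and you would need to dig into that application rather than guess the encoding.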
I am doing a database (Oracle) migration validation and I am writing scripts to make sure the target matches the source. My script is returning values that, when you look at them, look equal. However, they are not.
For instance, the target has PREAPPLICANT and the source has PREAPPLICANT. When you look at them in text, they look fine. But when I converted them to hex, it shows 50 52 45 41 50 50 4c 49 43 41 4e 54 for the target and 50 52 45 96 41 50 50 4c 49 43 41 4e 54 for the source. So there is an extra 96 in the hex.
So, my questions are:
What is the 96 char?
Would you say that the target has incorrect data because it did not bring the char over? I realize this question may be a little subjective, but I'm asking it from the standpoint of "what is this character and how did it get here?"
Is there a way to ignore this character in the SQL script so that the equality check passes? (do I want the equality to pass or fail here?)
It looks like you have Windows-1252 character set here.
https://en.wikipedia.org/wiki/Windows-1252
Character 0x96 in Windows-1252 is an en dash (U+2013), so the source value is really PRE–APPLICANT.
One user provided "PREAPPLICANT" and another provided "PRE-APPLICANT", and Windows helpfully converted the plain hyphen into an en dash.
As such, this doesn't appear to be an error in data, more an error in character sets. You should be able to filter these out without too much effort but then you are changing data. It's kind of like when one person enters "Mr Jones" and another enters "Mr. Jones"--you have to decide how much data massaging you want to do.
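If you want to see that interpretation directly, here is a small Java sketch (the class name is made up) that decodes the source bytes from the question as Windows-1252:

import java.nio.charset.Charset;

public class Byte96Demo {
    public static void main(String[] args) {
        // The source bytes from the question: 50 52 45 96 41 50 50 4c 49 43 41 4e 54
        byte[] source = {0x50, 0x52, 0x45, (byte) 0x96, 0x41, 0x50, 0x50, 0x4C, 0x49, 0x43, 0x41, 0x4E, 0x54};

        // Interpreted as Windows-1252, byte 0x96 is an en dash (U+2013).
        String decoded = new String(source, Charset.forName("windows-1252"));
        System.out.println(decoded);                              // PRE–APPLICANT
        System.out.printf("U+%04X%n", (int) decoded.charAt(3));   // U+2013
    }
}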
As you probably have already done, use the DUMP function to get the byte representation of any data you wish to inspect for weirdness.
Here's some text with plain ASCII:
select dump('Dashes-and "smart quotes"') from dual;
Typ=96 Len=25: 68,97,115,104,101,115,45,97,110,100,32,34,115,109,97,114,116,32,113,117,111,116,101,115,34
Now introduce funny characters:
select dump('Dashes—and “smart quotes”') from dual;
Typ=96 Len=31: 68,97,115,104,101,115,226,128,148,97,110,100,32,226,128,156,115,109,97,114,116,32,113,117,111,116,101,115,226,128,157
In this case, the number of bytes increased because my DB is using UTF8. Numbers outside of the valid range for ASCII stand out and can be inspected further.
Here's another way to see the special characters:
select asciistr('Dashes—and “smart quotes”') from dual;
Dashes\2014and \201Csmart quotes\201D
This one converts non-ASCII characters into backslashed Unicode hex.
In JavaCard 2.2.2 API, I can see that some symmetric ciphers are implemented with a padding mode, for example:
Cipher algorithm ALG_DES_CBC_ISO9797_M1 provides a cipher using DES in
CBC mode or triple DES in outer CBC mode, and pads input data
according to the ISO 9797 method 1 scheme.
But for the AES cipher, there is no padding mode that is available (ALG_AES_BLOCK_128_ECB_NOPAD and ALG_AES_BLOCK_128_CBC_NOPAD).
So how do you explain that it's not supported for this algorithm?
Are these padding methods vulnerable to known attacks using AES?
Whether other padding modes are available depends on the Java Card API version you are using as well as the implementation details of the specific Java Card.
Later APIs have:
a new getInstance method which can be used with PAD_PKCS5;
additional constants such as ALG_AES_CBC_PKCS5.
The special getInstance method was added because of the explosion of modes and padding methods.
Older API implementations may indeed not have these methods, but please again check availability.
AES itself is a block cipher. The different modes such as CBC use a cipher and a padding - so CBC_AES_PKCS7PADDING would be more logical in some sense. As a block cipher, AES is therefore not vulnerable to padding oracle attacks.
CBC on the other hand is vulnerable to padding oracle - and other plaintext oracle - attacks. So you should protect your IV and ciphertext with e.g. an AES-CMAC authentication tag if you need protection against these attacks.
That's however not a reason why the padding modes were not included. The different padding modes are certainly present now.
Not necessarily. It only means that this algorithm does not automatically pad input data; you have to do it yourself (pad to a multiple of 16 bytes, because that is the AES block size).
So how do you explain that it's not supported for this algorithm?
I don't know for sure, but note that there are several ways of doing this, and maybe the authors decided to let you choose the padding style most suitable for you.
In case you want to know more about padding, consider this example:
You have to encrypt a word "overflow" with AES.
First, you have to convert it to byte form, because this is what AES operates on.
ASCII encoded string "overflow" is
"6F 76 65 72 66 6C 6F 77 00"
(last byte is string terminator, AKA \0 or null byte)
Unfortunately, this is still insufficient for the pure AES algorithm, because it can only operate on whole blocks of data - 16-byte blocks in this case.
This means you need 16-9=7 more bytes of data. So you pad your encoded string to a full 16 bytes, with null bytes for example. The result is
"6F 76 65 72 66 6C 6F 77 00 00 00 00 00 00 00 00"
Now you choose your encryption key and encrypt data.
After you decrypt your data, you receive again
"6F 76 65 72 66 6C 6F 77 00 00 00 00 00 00 00 00"
And now the crux of the matter: how do you know which bytes were originally in your string, and which are padding bytes?
In the case of en/decrypting strings this is very simple, because a string (almost) always ends with a null byte and never has multiple consecutive null bytes at the end. So it's easy to determine where to cut your data.
You can find more information about styles of "crypto-padding" here: https://en.wikipedia.org/wiki/Padding_%28cryptography%29#Byte_padding
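For completeness, here is a minimal sketch of that manual zero-padding approach, written in plain Java SE with javax.crypto rather than the Java Card API; the all-zero key and IV and the class name are placeholders for illustration only:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ManualZeroPadding {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero demo key, never use in practice
        byte[] iv = new byte[16];  // all-zero demo IV, never use in practice
        byte[] plain = "overflow".getBytes(StandardCharsets.US_ASCII);

        // Pad with zero bytes up to the next multiple of the 16-byte AES block.
        int paddedLength = ((plain.length / 16) + 1) * 16;
        byte[] input = Arrays.copyOf(plain, paddedLength);

        Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(input);

        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] decrypted = cipher.doFinal(ciphertext);

        // Strip the zero padding again by dropping trailing zero bytes.
        int end = decrypted.length;
        while (end > 0 && decrypted[end - 1] == 0) {
            end--;
        }
        System.out.println(new String(decrypted, 0, end, StandardCharsets.US_ASCII)); // overflow
    }
}

As noted above, stripping zero padding only works reliably when the plaintext cannot itself end in zero bytes, which is why the linked article lists several other padding schemes.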
Suppose I have the MUSICAL SYMBOL G CLEF symbol (𝄞) that I wish to have in a string literal in my Objective-C source file.
The OS X Character Viewer says that the CLEF is UTF-8 F0 9D 84 9E and Unicode 1D11E (D834+DD1E) in its terms.
After some futzing around, and using the ICU UNICODE Demonstration Page, I did get the following code to work:
NSString *uni=@"\U0001d11e";
NSString *uni2=[[NSString alloc] initWithUTF8String:"\xF0\x9D\x84\x9E"];
NSString *uni3=@"𝄞";
NSLog(@"unicode: %@ and %@ and %@", uni, uni2, uni3);
My questions:
Is it possible to streamline the way I am doing UTF-8 literals? That seems kludgy to me.
Is the @"\U0001d11e" part UTF-32?
Why does cutting and pasting the CLEF from Character Viewer actually work? I thought Objective-C files had to be UTF-8?
I would prefer the way you did it in uni3, but sadly that is not recommended. Failing that, I would prefer the method in uni to that in uni2. Another option would be [NSString stringWithFormat:@"%C%C", (unichar)0xD834, (unichar)0xDD1E], building the character from its UTF-16 surrogate pair (the %C specifier takes a 16-bit unichar, so a single %C cannot hold a code point above U+FFFF).
It is a "universal character name", introduced in C99 (section 6.4.3) and imported into Objective-C as of OS X 10.5. Technically this doesn't have to give you UTF-8 (it's up to the compiler), but in practice UTF-8 is probably what you'll get.
The encoding of the source code file is probably UTF-8, matching what the runtime expects, so everything happens to work. It's also possible the source file is UTF-16 or UTF-32 and the compiler is doing the right thing when compiling it. Nonetheless, Apple does not recommend this.
Answers to your questions (same order):
Why choose? Xcode uses C99 in its default setup. Refer to the C99 draft specification (WG14/N1256), section 6.4.3 on Universal Character Names. See below.
More technically, @"\U0001d11e" specifies the 32-bit Unicode code point for that character in the ISO 10646 character set.
I would not count on this behavior working. You should absolutely, positively, without question have all the characters in your source file be 7 bit ASCII. For string literals, use an encoding or, preferably, a suitable external resource able to handle binary data.
Universal Character Names (from the WG14/N1256 C99 draft, which Clang follows fairly well):
Universal character names may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.
The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).
Therefore, you can produce your character or string in a natural, mixed way:
char *utf8CStr =
"May all your CLEF's \xF0\x9D\x84\x9E be left like this: \U0001d11e";
NSString *uni4=[[NSString alloc] initWithUTF8String:utf8CStr];
The \Unnnnnnnn form allows you to select any Unicode code point, and this is the same value as the "Unicode" field at the bottom left of the Character Viewer. Direct entry of \Unnnnnnnn in a C99 source file is handled appropriately by the compiler. Note that there are only two options: \unnnn, which takes the four-digit short identifier of a character in the Basic Multilingual Plane, and \Unnnnnnnn, which takes the full eight-digit short identifier of any Unicode code point. You need to pad on the left with 0s if the value does not fill all 4 or all 8 digits of \u or \U.
The form \xF0\x9D\x84\x9E in the same string literal is more interesting. This inserts the raw UTF-8 encoding of the same character. Once passed to the initWithUTF8String method, both the universal character name and the escaped bytes end up as the same UTF-8 encoding.
It may, arguably, be a violation of 130 of section 5.1.1.2 to use raw bytes in this way. Given that a raw UTF-8 string would be encoded similarly, I think you are OK.
You can write the clef character in your string literal, too:
NSString *uni2=[[NSString alloc] initWithUTF8String:"𝄞"];
The \U0001d11e matches the unicode code point for the G clef character. The UTF-32 form of a character is the same as its codepoint, so you can think of it as UTF-32 if you want to. Here's a link to the unicode tables for musical symbols.
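If it helps to see how those numbers relate, here is a small Java sketch (not Objective-C; the class name is made up) that prints the UTF-8 bytes and the UTF-16 surrogate pair for U+1D11E:

import java.nio.charset.StandardCharsets;

public class ClefEncodings {
    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF

        // UTF-8: the four bytes F0 9D 84 9E reported by Character Viewer.
        byte[] utf8 = new String(Character.toChars(clef)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("%02X ", b);
        }
        System.out.println();

        // UTF-16: the surrogate pair D834 DD1E (the "D834+DD1E" from Character Viewer).
        for (char c : Character.toChars(clef)) {
            System.out.printf("%04X ", (int) c);
        }
        System.out.println();
    }
}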
Your file probably is UTF-8. The G clef is a valid UTF-8 sequence - check out the output from hexdump for your file:
00 4e 53 53 74 72 69 6e 67 20 2a 75 6e 69 33 3d 40 |NSString *uni3=@|
10 22 f0 9d 84 9e 22 3b 0a 20 20 4e 53 4c 6f 67 28 |"....";.  NSLog(|
As you can see, the correct UTF-8 representation of that character is in the file right where you'd expect it. It's probably safer to use one of your other methods and try to keep the source file in the ASCII range.
I created some utility classes to convert easily between unicode code points, UTF-8 byte sequences and NSString. You can find the code on Github, maybe it is of some use to someone.