PDFBox encrypted file opens despite the password being wrong

When downloading the PDF file, I specified the password "123456789987654abc211234567899klm7654321". When opening it, I can remove a few characters, for example
"123456789987654abc211234567899kl", and the file will open anyway! But if I use
"123456789987654abc211234567899k", the file does not open.
Please help me understand what the problem is.
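```java
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

private static void encryptPdf(
        InputStream inputStream,
        OutputStream outputStream,
        String ownerPassword,
        String userPassword) throws Exception
{
    // PDFBox 2.x API; try-with-resources closes the document on every path,
    // including the early return below
    try (PDDocument document = PDDocument.load(inputStream))
    {
        if (document.isEncrypted())
        {
            return; // already encrypted: leave it unchanged
        }
        AccessPermission accessPermission = new AccessPermission();
        StandardProtectionPolicy spp =
                new StandardProtectionPolicy(ownerPassword, userPassword, accessPermission);
        spp.setEncryptionKeyLength(40); // 40-bit key (RC4)
        document.protect(spp);
        document.save(outputStream);
    }
}
```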

The first step in calculating the encryption key from the password, for PDF encryption up to revision 4, is:
The password string is generated from host system codepage characters (or system scripts) by first converting the string to PDFDocEncoding. If the input is Unicode, first convert to a codepage encoding, and then to PDFDocEncoding for backward compatibility. Pad or truncate the resulting password string to exactly 32 bytes. If the password string is more than 32 bytes long, use only its first 32 bytes; if it is less than 32 bytes long, pad it by appending the required number of additional bytes from the beginning of the following padding string:
<28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A>
That is, if the password string is n bytes long, append the first 32 - n bytes of the padding string to the end of the password string. If the password string is empty (zero-length), meaning there is no user password, substitute the entire padding string in its place.
(ISO 32000-2 section 7.6.4.3.2 "Algorithm 2: Computing a file encryption key in order to encrypt a document (revision 4 and earlier)")
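To see why the shortened password still works, here is a minimal Java sketch of just this pad-or-truncate step. It assumes the password is plain ASCII, so its ISO-8859-1 bytes are the same as its PDFDocEncoding bytes; the class and method names are illustrative, not PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfPasswordPadding {
    // The 32-byte padding string quoted above
    static final byte[] PAD = hex("28BF4E5E4E758A4164004E56FFFA0108"
                                + "2E2E00B6D0683E802F0CA9FE6453697A");

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    // Pad or truncate to exactly 32 bytes.
    // Assumption: ASCII-only password, so ISO-8859-1 bytes equal the PDFDocEncoding bytes.
    static byte[] padOrTruncate(String password) {
        byte[] pw = password.getBytes(StandardCharsets.ISO_8859_1);
        byte[] out = new byte[32];
        int n = Math.min(pw.length, 32);
        System.arraycopy(pw, 0, out, 0, n);       // keep at most the first 32 bytes
        System.arraycopy(PAD, 0, out, n, 32 - n); // fill the rest from the padding string
        return out;
    }

    public static void main(String[] args) {
        byte[] full  = padOrTruncate("123456789987654abc211234567899klm7654321"); // 40 chars
        byte[] cut32 = padOrTruncate("123456789987654abc211234567899kl");         // 32 chars
        byte[] cut31 = padOrTruncate("123456789987654abc211234567899k");          // 31 chars
        System.out.println(Arrays.equals(full, cut32)); // true
        System.out.println(Arrays.equals(full, cut31)); // false
    }
}
```

The 40-character password and its 32-character prefix feed identical bytes into the key derivation, so a reader cannot tell them apart; only removing a character from within the first 32 changes the result.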
For more modern encryption types there is a restriction, too, but it is generally less harsh:
The UTF-8 password string shall be generated from Unicode input by processing the input string with the SASLprep (Internet RFC 4013) profile of stringprep (Internet RFC 3454) using the Normalize and BiDi options, and then converting to a UTF-8 representation.
Truncate the UTF-8 representation to 127 bytes if it is longer than 127 bytes.
(ISO 32000-2 section 7.6.4.3.3 "Algorithm 2.A: Retrieving the file encryption key from an encrypted document in order to decrypt it (revision 6 and later)")
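For comparison, a Java sketch of the revision 6 rule. It deliberately skips SASLprep (an assumption that is only harmless for plain ASCII input) and shows only the UTF-8 conversion and the 127-byte limit. Assuming PDFBox 2.x, passing 256 to `spp.setEncryptionKeyLength` instead of 40 should select this AES-256 scheme; please check the documentation of the version you use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Rev6PasswordBytes {
    // Revision 6: UTF-8 bytes of the password, truncated to 127 bytes.
    // Assumption: SASLprep is skipped; a real implementation must apply it first.
    // Byte-based truncation may cut a multi-byte character in half.
    static byte[] passwordBytes(String password) {
        byte[] utf8 = password.getBytes(StandardCharsets.UTF_8);
        return utf8.length > 127 ? Arrays.copyOf(utf8, 127) : utf8;
    }

    public static void main(String[] args) {
        String original = "123456789987654abc211234567899klm7654321";
        System.out.println(passwordBytes(original).length); // 40: every byte now counts
        System.out.println(Arrays.equals(passwordBytes(original),
                passwordBytes("123456789987654abc211234567899kl"))); // false
    }
}
```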

Related

Converting string into REG_BINARY

I am making an app in Visual Studio's VB to automatically install the printer in Windows. The problem is that the printer needs a login and password. I found the registry entry where these are stored, but the password is stored in REG_BINARY format.
This is how it looks after I manually entered the password in the printer settings (see UserPass):
Could you please tell me how to convert the password (a string) into REG_BINARY (see the attachment, red square)?
The password in this case was 09882, and it was stored as 98 09 e9 4c c3 24 26 35 14 6f 83 67 8c ec c4 90. Is there any function in VB to convert 09882 into this REG_BINARY format?
REG_BINARY means that it is binary data, and binary data in .NET is represented by a Byte array. The values you see in RegEdit are the hexadecimal values of the individual bytes, which is a common representation because every byte can be represented by two hex digits. You need to convert your String to a Byte array and then save it to the Registry like any other data.
How you do that depends on what the application expects. Maybe it is simply converting the text to Bytes based on a specific encoding, e.g. Encoding.ASCII.GetBytes. Maybe it's a hash. You might need to research and/or experiment to find out exactly what's expected.
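A small Java sketch (rather than VB) of the experimenting step: it formats candidate byte arrays the way RegEdit displays REG_BINARY data, so they can be compared with the stored value. The ASCII bytes of 09882 are only five bytes long, while the stored value is 16 bytes, which is consistent with a hash or an encrypted block. MD5 is used purely as an example candidate; nothing here confirms what the printer software actually does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RegBinaryCandidates {
    // Format bytes as RegEdit shows REG_BINARY data: two lowercase hex digits per byte
    static String toRegEditHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String password = "09882";
        // Candidate 1: the text itself in a specific encoding
        byte[] ascii = password.getBytes(StandardCharsets.US_ASCII);
        // Candidate 2: a hash (MD5 is only a guess)
        byte[] md5 = MessageDigest.getInstance("MD5").digest(ascii);
        System.out.println("ASCII:  " + toRegEditHex(ascii)); // 30 39 38 38 32
        System.out.println("MD5:    " + toRegEditHex(md5));
        System.out.println("Stored: 98 09 e9 4c c3 24 26 35 14 6f 83 67 8c ec c4 90");
    }
}
```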

How do I find password related information of a password protected PDF?

I was trying to gather information about the password used to protect a PDF. I used PeePDF and the xxd hex dump tool to view all the objects of the password-protected PDF. I learned that the password information is stored in the trailer part of the PDF structure. When I ran this command, I got:
xxd my_encrypted.pdf | tail -n 4
00022c60: 5d0a 2f49 6e66 6f20 3220 3020 520a 2f45 ]./Info 2 0 R./E
00022c70: 6e63 7279 7074 2034 2030 2052 0a3e 3e0a ncrypt 4 0 R.>>.
00022c80: 7374 6172 7478 7265 660a 3134 3139 3934 startxref.141994
00022c90: 0a25 2545 4f46 0a .%%EOF.
So, I understood that /Encrypt dictionary is in object 4. Now using PeePDF, I tried
PPDF> object 4
<< /O ��%�}�&��
v����o
B��z���B�
/Filter /Standard
/Length 128
/V 2
/U ZM����S��3�
fmL
/R 3
/P -1 >>
/O is the owner password.
/U is the user password of the PDF.
PPDF> info 4
Offset: 699
Size: 206
MD5: 8a74ac53f9e6c1f4da44bcdbb65509e9
Object: dictionary
References: []
I got this info, but I didn't even get the hash of the password. What does that junk text represent? Is it the encrypted password? What does the MD5 hash represent?
Please tell me if there are any other tools that can analyse PDFs and get the hash of the password protecting a PDF.
Thank you
If nothing else helps, read the specification.
In the case at hand the specification to read is the PDF specification, i.e. ISO 32000-1 and ISO 32000-2.
A copy of ISO 32000-1 is published by Adobe at https://www.adobe.com/go/pdfreference/ (beware, the exact location has moved around a bit over the years) - it is complete and merely has the official ISO headers replaced by Adobe headers. Read section 7.6 Encryption therein.
In the case of your PDF, the encryption dictionary entry V with value 2 indicates the use of "Algorithm 1: Encryption of data using the RC4 or AES algorithms" in 7.6.2, "General Encryption Algorithm," but permitting encryption key lengths greater than 40 bits. The entry R with value 3 indicates the revision of the standard security handler. This algorithm/handler pair is already explained in ISO 32000-1.
In particular you can read there how the O and U values are calculated:
Algorithm 3: Computing the encryption dictionary’s O (owner password) value
a) Pad or truncate the owner password string as described in step (a) of "Algorithm 2: Computing an encryption key". If there is no owner password, use the user password instead.
b) Initialize the MD5 hash function and pass the result of step (a) as input to this function.
c) (Security handlers of revision 3 or greater) Do the following 50 times: Take the output from the previous MD5 hash and pass it as input into a new MD5 hash.
d) Create an RC4 encryption key using the first n bytes of the output from the final MD5 hash, where n shall always be 5 for security handlers of revision 2 but, for security handlers of revision 3 or greater, shall depend on the value of the encryption dictionary’s Length entry.
e) Pad or truncate the user password string as described in step (a) of "Algorithm 2: Computing an encryption key".
f) Encrypt the result of step (e), using an RC4 encryption function with the encryption key obtained in step (d).
g) (Security handlers of revision 3 or greater) Do the following 19 times: Take the output from the previous invocation of the RC4 function and pass it as input to a new invocation of the function; use an encryption key generated by taking each byte of the encryption key obtained in step (d) and performing an XOR (exclusive or) operation between that byte and the single-byte value of the iteration counter (from 1 to 19).
h) Store the output from the final invocation of the RC4 function as the value of the O entry in the encryption dictionary.
and
Algorithm 5: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 3 or greater)
a) Create an encryption key based on the user password string, as described in "Algorithm 2: Computing an encryption key".
b) Initialize the MD5 hash function and pass the 32-byte padding string shown in step (a) of "Algorithm 2: Computing an encryption key" as input to this function.
c) Pass the first element of the file’s file identifier array (the value of the ID entry in the document’s trailer dictionary; see Table 15) to the hash function and finish the hash.
d) Encrypt the 16-byte result of the hash, using an RC4 encryption function with the encryption key from step (a).
e) Do the following 19 times: Take the output from the previous invocation of the RC4 function and pass it as input to a new invocation of the function; use an encryption key generated by taking each byte of the original encryption key obtained in step (a) and performing an XOR (exclusive or) operation between that byte and the single-byte value of the iteration counter (from 1 to 19).
f) Append 16 bytes of arbitrary padding to the output from the final invocation of the RC4 function and store the 32-byte result as the value of the U entry in the encryption dictionary.
In both cases step (a) of "Algorithm 2: Computing an encryption key" is referenced:
Algorithm 2: Computing an encryption key
a) Pad or truncate the password string to exactly 32 bytes. If the password string is more than 32 bytes long, use only its first 32 bytes; if it is less than 32 bytes long, pad it by appending the required number of additional bytes from the beginning of the following padding string:
< 28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A >
That is, if the password string is n bytes long, append the first 32 - n bytes of the padding string to the end of the password string. If the password string is empty (zero-length), meaning there is no user password, substitute the entire padding string in its place.
...
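The /O and /U "junk" in the question is exactly this kind of output: 32 bytes of RC4 ciphertext, not a readable password or a plain hash. Below is a minimal Java sketch of Algorithms 3 and 5 as quoted above, assuming a security handler of revision 3 or greater and passwords already converted to bytes. `computeU` takes the file encryption key as a parameter because the rest of Algorithm 2 is elided above; the class name, method names and example inputs are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class EncryptDictValues {
    // 32-byte padding string from step (a) of Algorithm 2
    static final byte[] PAD = hex("28BF4E5E4E758A4164004E56FFFA0108"
                                + "2E2E00B6D0683E802F0CA9FE6453697A");

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // not expected on a standard JDK
        }
    }

    // Step (a) of Algorithm 2: pad or truncate to exactly 32 bytes
    static byte[] pad(byte[] pw) {
        byte[] out = new byte[32];
        int n = Math.min(pw.length, 32);
        System.arraycopy(pw, 0, out, 0, n);
        System.arraycopy(PAD, 0, out, n, 32 - n);
        return out;
    }

    // Plain RC4: encrypting and decrypting are the same operation
    static byte[] rc4(byte[] key, byte[] data) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) s[i] = i;
        for (int i = 0, j = 0; i < 256; i++) {            // key scheduling
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        byte[] out = new byte[data.length];
        for (int n = 0, i = 0, j = 0; n < data.length; n++) { // keystream XOR
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }

    // 19 extra RC4 passes, each key byte XORed with the counter 1..19
    static byte[] nineteenPasses(byte[] key, byte[] data) {
        for (int i = 1; i <= 19; i++) {
            byte[] k = new byte[key.length];
            for (int b = 0; b < key.length; b++) k[b] = (byte) (key[b] ^ i);
            data = rc4(k, data);
        }
        return data;
    }

    // Algorithm 3 (revision >= 3); lengthBits is /Length, at most 128
    static byte[] computeO(byte[] ownerPw, byte[] userPw, int lengthBits) {
        MessageDigest md = md5();
        byte[] hash = md.digest(pad(ownerPw.length > 0 ? ownerPw : userPw)); // (a), (b)
        for (int i = 0; i < 50; i++) hash = md.digest(hash);                // (c)
        byte[] key = Arrays.copyOf(hash, lengthBits / 8);                   // (d)
        return nineteenPasses(key, rc4(key, pad(userPw)));                  // (e)-(h)
    }

    // Algorithm 5 (revision >= 3). fileKey must come from the full Algorithm 2
    // (not shown); id0 is the first element of the trailer's /ID array.
    static byte[] computeU(byte[] fileKey, byte[] id0) {
        MessageDigest md = md5();
        md.update(PAD);                                          // (b)
        byte[] hash = md.digest(id0);                            // (c)
        byte[] u = nineteenPasses(fileKey, rc4(fileKey, hash));  // (d), (e)
        return Arrays.copyOf(u, 32);                             // (f) zeros as arbitrary padding
    }

    public static void main(String[] args) {
        // Hypothetical inputs for illustration only
        byte[] o = computeO("owner".getBytes(StandardCharsets.ISO_8859_1),
                            "user".getBytes(StandardCharsets.ISO_8859_1), 128);
        byte[] u = computeU(new byte[16], new byte[16]);
        System.out.println(o.length + " " + u.length); // 32 32
    }
}
```

When a viewer checks a candidate user password for revision 3 or greater, it recomputes U this way and compares only the first 16 bytes, since the last 16 are arbitrary padding.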

String Serialization in utf-8 using Node Buffer

I have an SQL database storing a blob using unhex('6BFD3D0AFDFD4E01FDFD67703A34757F').
The server retrieves the blob and stores it in a Node Buffer as <Buffer 6b 8a 3d 0a 9b eb 4e 01 96 a6 67 70 3a 34 75 7f>.
The server serializes the buffer and sends it to the client using buffer.toString(), which defaults to UTF-8 encoding.
The client receives and deserializes the buffer using Buffer.from(buffer, 'utf8'), which results in <Buffer 6b ef bf bd 3d 0a ef bf bd ef bf bd 4e 01 ef bf bd ef bf bd 67 70 3a 34 75 7f>, and then if I convert it back to hex using .toString('hex') I get 6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F.
So to sum it all up, if I do:
```javascript
let startHex = "6BFD3D0AFDFD4E01FDFD67703A34757F"
let buffer = Buffer.from(startHex, 'hex')
let endHex = Buffer.from(buffer.toString()).toString('hex').toUpperCase()
console.log(endHex)
```
The output is:
6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F
My question is: why are startHex and endHex different? They aren't just different: they look similar, except that endHex has extra characters. I know I get the correct output if I serialize the buffer between the server and the client using base64 or binary, but for my project it would be easier if the client could recover startHex from the buffer serialized as UTF-8. The reason is that I do not have access to the inner workings of the server, which calls buffer.toString() before sending to the client, so I cannot change the encoding.
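Your original input contains bytes that are not valid UTF-8. When such bytes are decoded as UTF-8, each invalid sequence is replaced by the Unicode replacement character U+FFFD, whose UTF-8 encoding is EF BF BD; you can see that sequence several times in the output. Because every invalid byte is mapped to the same character, the original bytes cannot be recovered from the UTF-8 string.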
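The same effect can be reproduced outside Node. In this Java sketch, Java's `String` decoding stands in for `buffer.toString()`; for this input it produces the same hex as the question's Node code, while a Base64 round trip preserves the bytes. Helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Utf8RoundTrip {
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    // What the server/client pair effectively does: bytes -> UTF-8 text -> bytes
    static String viaUtf8(String startHex) {
        String text = new String(fromHex(startHex), StandardCharsets.UTF_8); // invalid bytes -> U+FFFD
        return toHex(text.getBytes(StandardCharsets.UTF_8));                  // U+FFFD -> EF BF BD
    }

    // A byte-safe alternative: Base64 text round-trips any byte sequence
    static String viaBase64(String startHex) {
        String text = Base64.getEncoder().encodeToString(fromHex(startHex));
        return toHex(Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        String startHex = "6BFD3D0AFDFD4E01FDFD67703A34757F";
        System.out.println(viaUtf8(startHex));   // 6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F
        System.out.println(viaBase64(startHex)); // 6BFD3D0AFDFD4E01FDFD67703A34757F
    }
}
```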

No padding for AES cipher in Java Card

In the Java Card 2.2.2 API, I can see that some symmetric ciphers are implemented with a padding mode, for example:
Cipher algorithm ALG_DES_CBC_ISO9797_M1 provides a cipher using DES in
CBC mode or triple DES in outer CBC mode, and pads input data
according to the ISO 9797 method 1 scheme.
But for the AES cipher, there is no padding mode that is available (ALG_AES_BLOCK_128_ECB_NOPAD and ALG_AES_BLOCK_128_CBC_NOPAD).
So how can it be explained that padding is not supported for this algorithm?
Are these padding methods vulnerable to known attacks using AES?
Whether other padding modes are available depends on the Java Card API version you are using, as well as on the implementation details of the specific Java Card.
Later APIs have:
a new getInstance method which can be used with PAD_PKCS5;
additional constants such as ALG_AES_CBC_PKCS5.
The special getInstance method was added because of the explosion of modes and padding methods.
Older API implementations may indeed not have these methods, but please again check availability.
AES itself is a block cipher. The different modes such as CBC use a cipher and a padding - so CBC_AES_PKCS7PADDING would be more logical in some sense. As a block cipher, AES is therefore not vulnerable to padding oracle attacks.
CBC, on the other hand, is vulnerable to padding oracle attacks (and other plaintext oracle attacks). So you should protect your IV and ciphertext with e.g. an AES-CMAC authentication tag if you need protection against these attacks.
That is, however, not the reason the padding modes were not included. The different padding modes are certainly present now.
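A minimal encrypt-then-MAC sketch in plain Java SE (not the Java Card API). The standard JDK has no AES-CMAC, so HMAC-SHA256 is swapped in as the authentication tag; the layout IV || ciphertext || tag and all names are assumptions for illustration. The important part is the order in `open`: the tag is checked before any decryption or padding removal, so a forged ciphertext never reaches the padding check.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EncryptThenMac {
    static final SecureRandom RNG = new SecureRandom();

    // Output layout: IV (16) || ciphertext || HMAC-SHA256 tag (32) over IV || ciphertext
    static byte[] seal(byte[] encKey, byte[] macKey, byte[] plain) {
        try {
            byte[] iv = new byte[16];
            RNG.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
            byte[] ct = c.doFinal(plain);
            byte[] body = new byte[16 + ct.length];
            System.arraycopy(iv, 0, body, 0, 16);
            System.arraycopy(ct, 0, body, 16, ct.length);
            byte[] tag = hmac(macKey, body);
            byte[] out = Arrays.copyOf(body, body.length + tag.length);
            System.arraycopy(tag, 0, out, body.length, tag.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Verify the tag first; only authenticated data is decrypted and unpadded
    static byte[] open(byte[] encKey, byte[] macKey, byte[] sealed) {
        if (sealed.length < 16 + 16 + 32) throw new SecurityException("too short");
        int bodyLen = sealed.length - 32;
        byte[] body = Arrays.copyOf(sealed, bodyLen);
        byte[] tag = Arrays.copyOfRange(sealed, bodyLen, sealed.length);
        try {
            if (!MessageDigest.isEqual(hmac(macKey, body), tag))
                throw new SecurityException("authentication failed");
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(encKey, "AES"),
                   new IvParameterSpec(body, 0, 16));
            return c.doFinal(body, 16, bodyLen - 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] hmac(byte[] key, byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    public static void main(String[] args) {
        byte[] encKey = new byte[16], macKey = new byte[32];
        RNG.nextBytes(encKey);
        RNG.nextBytes(macKey);
        byte[] sealed = seal(encKey, macKey, "overflow".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(open(encKey, macKey, sealed), StandardCharsets.US_ASCII));
    }
}
```

Using separate keys for encryption and authentication, as here, keeps the two roles independent.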
Not necessarily: it only means that this algorithm does not pad input data automatically. You have to do it yourself (pad it to a multiple of 16 bytes, because that is the block size AES needs).
So how can it be explained that padding is not supported for this algorithm?
I don't know for sure, but note that there are several ways of doing this, and maybe the authors decided that you should choose the padding style most suitable for you.
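As an illustration of doing the padding yourself, here is PKCS#7-style padding for 16-byte blocks in plain Java SE (on a card, the same logic would have to be written against the Java Card API). This is the scheme that "PKCS5" padding amounts to when used with AES; the names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Pkcs7 {
    static final int BLOCK = 16; // AES block size in bytes

    // Always adds 1..16 bytes, each holding the number of bytes added,
    // so a full extra block is appended when the input is already aligned
    static byte[] pad(byte[] data) {
        int n = BLOCK - (data.length % BLOCK);
        byte[] out = Arrays.copyOf(data, data.length + n);
        Arrays.fill(out, data.length, out.length, (byte) n);
        return out;
    }

    // Reads the last byte and checks that all padding bytes agree before removing them
    static byte[] unpad(byte[] padded) {
        if (padded.length == 0 || padded.length % BLOCK != 0)
            throw new IllegalArgumentException("length is not a multiple of the block size");
        int n = padded[padded.length - 1] & 0xFF;
        if (n < 1 || n > BLOCK) throw new IllegalArgumentException("bad padding");
        for (int i = padded.length - n; i < padded.length; i++)
            if ((padded[i] & 0xFF) != n) throw new IllegalArgumentException("bad padding");
        return Arrays.copyOf(padded, padded.length - n);
    }

    public static void main(String[] args) {
        byte[] p = pad("overflow".getBytes(StandardCharsets.US_ASCII));
        System.out.println(p.length + " " + p[15]);                          // 16 8
        System.out.println(new String(unpad(p), StandardCharsets.US_ASCII)); // overflow
    }
}
```

Unlike null padding, this scheme is unambiguous for any binary data, because the last byte always says how much to remove.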
In case you want to know more about padding, consider this example:
You have to encrypt a word "overflow" with AES.
First, you have to convert it to byte form, because this is what AES operates on.
ASCII encoded string "overflow" is
"6F 76 65 72 66 6C 6F 77 00"
(last byte is string terminator, AKA \0 or null byte)
Unfortunately, this is still insufficient for the pure AES algorithm, because it can ONLY operate on whole blocks of data, i.e. 16-byte blocks.
This means you need 16 - 9 = 7 more bytes of data. So you pad your encoded string to a full 16 bytes, for example with null bytes. The result is
"6F 76 65 72 66 6C 6F 77 00 00 00 00 00 00 00 00"
Now you choose your encryption key and encrypt data.
After you decrypt your data, you receive again
"6F 76 65 72 66 6C 6F 77 00 00 00 00 00 00 00 00"
And now the crux of the matter: how do you know which bytes were originally in your string, and which are padding bytes?
In the case of en/decrypting strings this is very simple, because a string (almost) always ends with a null byte and never has multiple consecutive null bytes at the end. So it's easy to determine where to cut your data.
You can find more information about styles of "crypto-padding" here: https://en.wikipedia.org/wiki/Padding_%28cryptography%29#Byte_padding
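The "overflow" walkthrough can be reproduced with the JDK's raw AES, using `AES/ECB/NoPadding` to mirror a `*_NOPAD` cipher. The all-zero key and ECB mode are for demonstration only, and the names are illustrative. Cutting at the first null byte works only because the plaintext is a C-style string; arbitrary binary data needs an unambiguous scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class ZeroPaddingDemo {
    // Null-pad up to the next multiple of 16 bytes (nothing added if already aligned)
    static byte[] zeroPad(byte[] data) {
        return Arrays.copyOf(data, (data.length + 15) / 16 * 16);
    }

    // Keep everything before the first null byte, i.e. the C-style string terminator
    static String stripAtTerminator(byte[] data) {
        int end = 0;
        while (end < data.length && data[end] != 0) end++;
        return new String(data, 0, end, StandardCharsets.US_ASCII);
    }

    // Raw AES on whole blocks: the input length must be a multiple of 16
    static byte[] aesEcbNoPad(int mode, byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo key of zero bytes; never use a fixed key in practice
        byte[] plain = zeroPad("overflow\0".getBytes(StandardCharsets.US_ASCII)); // 9 -> 16 bytes
        byte[] encrypted = aesEcbNoPad(Cipher.ENCRYPT_MODE, key, plain);
        byte[] decrypted = aesEcbNoPad(Cipher.DECRYPT_MODE, key, encrypted);
        System.out.println(stripAtTerminator(decrypted)); // overflow
    }
}
```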

Base64url encoded representation puzzle

I'm writing a cookie authentication library that replicates that of an existing system. I'm able to create authentication tokens that work. However testing with a token with known value, created by the existing system, I encountered the following puzzle.
The original encoded string purports to be base64url encoded. And, in fact, using any of several base64url code modules and online tools, the decoded value is the expected result.
However base64url encoding the decoded value (again using any of several tools) doesn't reproduce the original string. Both encoded strings decode to the expected results, so apparently both representations are valid.
How? What's the difference?
How can I replicate the original encoded results?
original encoded string: YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
my base64url decode: admin:55FD6CEA:[encrypted hash]
Encoding doesn't match original but the decoded strings match.
my base64url encode: YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
my base64url decode: admin:55FD6CEA:[encrypted hash]
(Sorry, SSE won't let me show the unicode representation of the hash. I assure you, they do match.)
This string:
YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
is not exactly valid Base64. Valid Base64 consists of a sequence of characters among uppercase letters, lowercase letters, digits, '/' and '+'; it must also have a length which is a multiple of 4, and 1 or 2 final '=' signs may appear as padding so that the length is indeed a multiple of 4. This string contains only Base64-valid characters, but only 47 of them, and 47 is not a multiple of 4. With an extra '=' sign at the end, this becomes valid Base64.
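Restoring the missing padding is mechanical: append '=' until the length is a multiple of 4. A Java sketch (the helper name is illustrative); Java's own `Base64` decoder happens to accept unpadded input anyway, but stricter decoders do not.

```java
import java.util.Base64;

public class Base64Padding {
    // Append '=' until the length is a multiple of 4.
    // A length of the form 4k+1 cannot be repaired this way: it is not valid Base64 at all.
    static String restorePadding(String s) {
        if (s.length() % 4 == 1) throw new IllegalArgumentException("invalid Base64 length");
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() % 4 != 0) sb.append('=');
        return sb.toString();
    }

    public static void main(String[] args) {
        String token = "YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc"; // 47 chars
        String padded = restorePadding(token);                           // one '=' appended
        byte[] decoded = Base64.getDecoder().decode(padded);
        System.out.println(padded.length() + " chars -> " + decoded.length + " bytes"); // 48 chars -> 35 bytes
    }
}
```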
That string:
YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
is not valid Base64. It contains several '-' and one '_' sign, neither of which should appear in a Base64 string. If some tool is decoding that string into the "same" result as the previous string, then the tool is not implementing Base64 at all, but something else (and weird).
I suppose that your strings got garbled at some point through some copy&paste mishap, maybe related to a bad interpretation of bytes as characters. This is the important point: bytes are NOT characters.
It so happens that, traditionally, computers got into the habit of using so-called "code pages", which were direct mappings of characters onto bytes, with each character being encoded as exactly one byte. Thus came into existence some tools (such as Windows' notepad.exe) that purport to do the inverse, i.e. show the contents of a file (nominally, some bytes) as their character counterparts. This, however, fails when the bytes are not "printable characters" (while a code page such as "Windows-1252" maps each character to a byte value, there can be byte values that are not the mapping of any printable character). It began to fail even more when people finally realized that there were only 256 possible byte values and a lot more possible characters, especially when considering Chinese.
Unicode is an evolving standard that maps characters to code points (i.e. numbers), with well over 100,000 currently defined. Then some encoding rules (there are several of them, the most frequent being UTF-8) encode the characters into bytes. Crucially, one character can be encoded over several bytes.
In any case, a hash value (or whatever you call an "encrypted hash", which is probably a confusion, because hashing and encrypting are two distinct things) is a sequence of bytes, not characters, and thus is never guaranteed to be the encoding of a sequence of characters in any code page.
Armed with this knowledge, you may try to put some order into your strings and your question.
Edit: thanks to @marfarma for pointing out the URL-safe Base64 encoding, where the '+' and '/' characters are replaced by '-' and '_'. This makes the situation clearer. When adding the needed '=' signs, the first string then decodes to:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a be |admin:55FD6CEA:.|
00000010 d4 5b 42 81 17 0f d3 ba 47 83 18 77 ca e8 da 8e |.[B.....G..w....|
00000020 91 ce b7 |...|
while the second becomes:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a ef |admin:55FD6CEA:.|
00000010 bf bd ef bf bd 5b 42 ef bf bd 17 0f d3 ba 47 ef |.....[B.......G.|
00000020 bf bd 18 77 ef bf bd ef bf bd da 8e ef bf bd ce |...w............|
00000030 b7 |.|
We now see what happened: the first string was decoded to bytes, but someone fed these bytes to some display system or editor that really expected UTF-8. Some of these bytes were not a valid UTF-8 encoding of anything, so they were replaced with the Unicode code point U+FFFD REPLACEMENT CHARACTER, which is typically drawn as a generic placeholder glyph (such as �). The characters were then re-encoded as UTF-8, each U+FFFD yielding the EF BF BD sequence of three bytes.
Therefore, the hash value was badly mangled, but the bytes that were altered show up as replacement characters when (wrongly) interpreted as UTF-8, and what was put in their place (U+FFFD itself) shows up as exactly the same replacement characters. Hence no visible difference on the screen.
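The whole sequence can be reproduced, and avoided, in a few lines of Java. This sketch assumes the tools involved performed a UTF-8 decode and re-encode, as the hex dumps suggest; `mangle` reproduces the second string, while encoding the decoded bytes directly gives back the original token. The names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MangledToken {
    static final Base64.Encoder URL_ENC = Base64.getUrlEncoder().withoutPadding();

    // The suspected mistake: decode, treat the bytes as UTF-8 text, encode that text again
    static String mangle(String token) {
        byte[] raw = Base64.getUrlDecoder().decode(token);        // missing '=' is accepted
        String asText = new String(raw, StandardCharsets.UTF_8);  // invalid bytes -> U+FFFD
        return URL_ENC.encodeToString(asText.getBytes(StandardCharsets.UTF_8)); // U+FFFD -> EF BF BD
    }

    // The fix: keep the decoded value as bytes and encode those bytes directly
    static String roundTrip(String token) {
        return URL_ENC.encodeToString(Base64.getUrlDecoder().decode(token));
    }

    public static void main(String[] args) {
        String original = "YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc";
        System.out.println(mangle(original));    // the longer, mangled string
        System.out.println(roundTrip(original)); // identical to the original
    }
}
```

So to replicate the original encoded results, keep the hash as a byte array from start to finish and Base64url-encode those bytes, never a String built from them.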