String Serialization in utf-8 using Node Buffer - sql

I have a SQL database storing a blob that was inserted using unhex('6BFD3D0AFDFD4E01FDFD67703A34757F').
The server retrieves the blob and stores it in a Node Buffer as <Buffer 6b 8a 3d 0a 9b eb 4e 01 96 a6 67 70 3a 34 75 7f>.
The server serializes the buffer and sends it to the client using buffer.toString(), which defaults to utf8 encoding.
The client receives and deserializes it using Buffer.from(buffer, 'utf8'), which results in <Buffer 6b ef bf bd 3d 0a ef bf bd ef bf bd 4e 01 ef bf bd ef bf bd 67 70 3a 34 75 7f>. If I then convert that back to hex using .toString('hex'), I get 6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F.
So to sum it all up, if I do:
let startHex = "6BFD3D0AFDFD4E01FDFD67703A34757F"
let buffer = Buffer.from(startHex, 'hex')
let endHex = Buffer.from(buffer.toString()).toString('hex').toUpperCase()
console.log(endHex)
The output is:
6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F
My question is: why are startHex and endHex different? They aren't just different; they look similar, except endHex has extra bytes. I know I get the correct output if I serialize the buffer between the server and the client using base64 or binary, but for my project it is easier if the client can figure out startHex given the utf8-serialized buffer. The reason is that I do not have access to the inner workings of the server, which calls buffer.toString() before sending to the client, so I cannot change the encoding.

You have byte sequences in your original input that are not valid UTF-8. When buffer.toString() decodes them, each invalid sequence is replaced with the Unicode replacement character U+FFFD, whose UTF-8 encoding is the three bytes EF BF BD; you can see that sequence several times in the output. The original bytes are discarded in the process, so the client cannot recover startHex from the utf8-serialized string.
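A minimal Node sketch of the round trip (variable names are mine, not from the original post) shows where the information is lost:

// 0xFD on its own is not a valid UTF-8 sequence, so decoding substitutes
// U+FFFD, which re-encodes as the three bytes EF BF BD. The round trip is lossy.
const startHex = '6BFD3D0AFDFD4E01FDFD67703A34757F';
const raw = Buffer.from(startHex, 'hex');

const sent = raw.toString('utf8');          // what the server sends
const received = Buffer.from(sent, 'utf8'); // what the client reconstructs

console.log(received.toString('hex').toUpperCase());
// 6BEFBFBD3D0AEFBFBDEFBFBD4E01EFBFBDEFBFBD67703A34757F

If the server-side call can ever be changed, the 'base64', 'hex', or 'latin1' encodings round-trip arbitrary bytes; plain utf8 cannot.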


How to format/decode service logs from Docker API

I'm trying to get logs from the Docker API at this endpoint. I'm just trying to get the logs returned as a string, not using the websocket option. It mostly works, but the string contains strange characters that I'm not sure what to do with.
I'm using Axios, with Express, like so:
let result = await AXIOS.get(`http://${managerNodeIPAddress}/services/${idForLogs}/logs?stdout=true&stderr=true`);
and if I console.log(result), the data property looks like this:
data: '\x01\x00\x00\x00\x00\x00\x00#Example app listening on port 5000\n' +
'\x01\x00\x00\x00\x00\x00\x00\x1F[16/4/2022-21:05:02] GET/: 200\n' +
'\x01\x00\x00\x00\x00\x00\x00\x1F[16/4/2022-21:05:43] GET/: 200\n' +
'\x01\x00\x00\x00\x00\x00\x00\x1F[16/4/2022-21:05:44] GET/: 200\n' +
'\x01\x00\x00\x00\x00\x00\x00\x1F[16/4/2022-21:06:33] GET/: 200\n' +
// ...
and if I console.log(result.data), it looks like this:
<Buffer 01 00 00 00 00 00 00 23 45 78 61 6d 70 6c 65 20 61 70 70 20 6c 69 73 74 65 6e 69 6e 67 20 6f 6e 20 70 6f 72 74 20 35 30 30 30 0a 01 00 00 00 00 00 00 ... 972 more bytes>
If I send along this response, and try to view the response in Postman, or elsewhere, the viewer doesn't know what to do with the initial \x01-type strings:
I gather that they are escaped binary, or something along those lines, and that I need to change something about my request headers, or parse the axios response in a particular way, to deal with this. I would be happy either
decoding those characters into whatever they are supposed to be (I've tried "decoding" the buffer using toString('utf-8'), etc., but that doesn't seem to get rid of the characters, so they still show up strangely when passed along and viewed in certain contexts), OR
getting rid of those characters entirely (I tried to do the latter with the replace method, but it isn't working for some reason).
I've never dealt with this before, so the world of encoding/decoding things like this feels a bit mysterious, and I would appreciate any pointers anyone might have.
I think I was able to figure this out. As I understand things now, the 00 and 01 are hex byte values, which correspond to the ASCII control characters NUL (null) and SOH (start of heading). The Mac terminal doesn't have trouble interpreting them, but it appears some other applications do. I was able to get rid of them by filtering the buffer, like so:
let logs = result.data.filter(byte => byte !== 0x01 && byte !== 0x00).toString();
I was honestly a little surprised this worked, but it seems to have. It doesn't affect how the logs look in the terminal, and they look fine in Postman now.
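For reference, those bytes are Docker's multiplexed stream framing: when the container runs without a TTY, each log frame starts with an 8-byte header (1 byte stream type, 3 zero bytes, then a big-endian 32-bit payload length). Rather than stripping every 0x00/0x01 byte (which would also remove any legitimate occurrences inside the payload), you can walk the frames. A rough sketch, assuming result.data is a Node Buffer (demuxDockerLogs is a hypothetical helper, not part of the Docker or axios APIs):

// Demultiplex Docker's log stream: each frame is an 8-byte header followed by
// `length` payload bytes; the length is a big-endian uint32 in header bytes 4-7.
function demuxDockerLogs(buf) {
  const payloads = [];
  let offset = 0;
  while (offset + 8 <= buf.length) {
    const length = buf.readUInt32BE(offset + 4);
    payloads.push(buf.subarray(offset + 8, offset + 8 + length));
    offset += 8 + length;
  }
  return Buffer.concat(payloads).toString('utf8');
}

// Usage (hypothetical): const logs = demuxDockerLogs(result.data);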

pdfbox encrypted file opens despite the password being wrong

When downloading the pdf file, I specified the password "123456789987654abc211234567899klm7654321". When opening it, I can remove a few characters from the end, for example,
"123456789987654abc211234567899kl" - the file will open anyway! But with
"123456789987654abc211234567899k" - the file will not open.
Help me understand what the problem is.
private static void encryptPdf(
        InputStream inputStream,
        OutputStream outputStream,
        String ownerPassword,
        String userPassword) throws Exception
{
    PDDocument document = PDDocument.load(inputStream);
    if (document.isEncrypted())
    {
        return;
    }
    AccessPermission accessPermission = new AccessPermission();
    StandardProtectionPolicy spp =
            new StandardProtectionPolicy(ownerPassword, userPassword, accessPermission);
    spp.setEncryptionKeyLength(40);
    document.protect(spp);
    document.save(outputStream);
    document.close();
}
The first step in calculating the encryption key from the password, for PDF encryption up to revision 4, is:
The password string is generated from host system codepage characters (or system scripts) by first converting the string to PDFDocEncoding. If the input is Unicode, first convert to a codepage encoding, and then to PDFDocEncoding for backward compatibility. Pad or truncate the resulting password string to exactly 32 bytes. If the password string is more than 32 bytes long, use only its first 32 bytes; if it is less than 32 bytes long, pad it by appending the required number of additional bytes from the beginning of the following padding string:
<28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A>
That is, if the password string is n bytes long, append the first 32 - n bytes of the padding string to the end of the password string. If the password string is empty (zero-length), meaning there is no user password, substitute the entire padding string in its place.
(ISO 32000-2 section 7.6.4.3.2 "Algorithm 2: Computing a file encryption key in order to encrypt a document (revision 4 and earlier)")
Your full password is 40 characters long, so only its first 32 bytes are used; "123456789987654abc211234567899kl" is exactly that 32-byte prefix, which is why it opens the document anyway, whereas the 31-character variant gets one padding byte appended and therefore produces a different key. For more modern encryption types there is a restriction too, but it is generally less harsh:
The UTF-8 password string shall be generated from Unicode input by processing the input string with the SASLprep (Internet RFC 4013) profile of stringprep (Internet RFC 3454) using the Normalize and BiDi options, and then converting to a UTF-8 representation.
Truncate the UTF-8 representation to 127 bytes if it is longer than 127 bytes.
(ISO 32000-2 section 7.6.4.3.3 "Algorithm 2.A: Retrieving the file encryption key from an encrypted document in order to decrypt it (revision 6 and later)")
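To see why only the first 32 bytes of the password matter for this document, here is a small sketch of just the pad/truncate step of Algorithm 2 (written in Node for brevity; padOrTruncate is my own helper, not a PDFBox method):

// Pad or truncate the password to exactly 32 bytes, as Algorithm 2 requires.
const PAD = Buffer.from(
  '28BF4E5E4E758A4164004E56FFFA01082E2E00B6D0683E802F0CA9FE6453697A', 'hex');

function padOrTruncate(password) {
  const bytes = Buffer.from(password, 'latin1'); // assumes PDFDocEncoding-compatible characters
  return Buffer.concat([bytes.subarray(0, 32), PAD]).subarray(0, 32);
}

const full    = padOrTruncate('123456789987654abc211234567899klm7654321'); // 40 chars
const chars32 = padOrTruncate('123456789987654abc211234567899kl');         // 32 chars
const chars31 = padOrTruncate('123456789987654abc211234567899k');          // 31 chars

console.log(full.equals(chars32)); // true  -> same key input, file opens
console.log(full.equals(chars31)); // false -> one padding byte differs, file stays closed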

Base64url encoded representation puzzle

I'm writing a cookie authentication library that replicates that of an existing system. I'm able to create authentication tokens that work. However testing with a token with known value, created by the existing system, I encountered the following puzzle.
The original encoded string purports to be base64url encoded. And, in fact, using any of several base64url code modules and online tools, the decoded value is the expected result.
However base64url encoding the decoded value (again using any of several tools) doesn't reproduce the original string. Both encoded strings decode to the expected results, so apparently both representations are valid.
How? What's the difference?
How can I replicate the original encoded results?
original encoded string: YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
my base64url decode: admin:55FD6CEA:[encrypted hash]
Encoding doesn't match original but the decoded strings match.
my base64url encode: YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
my base64url decode: admin:55FD6CEA:[encrypted hash]
(Sorry, SSE won't let me show the unicode representation of the hash. I assure you, they do match.)
This string:
YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc
is not exactly valid Base64. Valid Base64 consists of a sequence of characters among uppercase letters, lowercase letters, digits, '/' and '+'; it must also have a length which is a multiple of 4; one or two final '=' signs may appear as padding so that the length is indeed a multiple of 4. This string contains only Base64-valid characters, but only 47 of them, and 47 is not a multiple of 4. With an extra '=' sign at the end, it becomes valid Base64.
That string:
YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw
is not valid Base64. It contains several '-' and one '_' sign, neither of which should appear in a Base64 string. If some tool is decoding that string into the "same" result as the previous string, then the tool is not implementing Base64 at all, but something else (and weird).
I suppose that your strings got garbled at some point through some copy&paste mishap, maybe related to a bad interpretation of bytes as characters. This is the important point: bytes are NOT characters.
It so happens that, traditionally, in older times, computers got into the habit of using so-called "code pages", which were direct mappings of characters onto bytes, with each character encoded as exactly one byte. Thus came into existence tools (such as Windows' notepad.exe) that purport to do the inverse, i.e. show the contents of a file (nominally, some bytes) as their character counterparts. This, however, fails when the bytes are not "printable characters" (while a code page such as "Windows-1252" maps each character to a byte value, there can be byte values that are not the mapping of any printable character). It began to fail even more when people finally realized that there are only 256 possible byte values and a lot more possible characters, especially when considering Chinese.
Unicode is an evolving standard that maps characters to code points (i.e. numbers), with a bit more than 100000 currently defined. Then some encoding rules (there are several of them, the most frequent being UTF-8) encode the characters into bytes. Crucially, one character can be encoded over several bytes.
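A one-line Node illustration of that last point:

console.log(Buffer.from('é', 'utf8')); // <Buffer c3 a9> -- one character, two bytes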
In any case, a hash value (or whatever you call an "encrypted hash", which is probably a confusion, because hashing and encrypting are two distinct things) is a sequence of bytes, not characters, and thus is never guaranteed to be the encoding of a sequence of characters in any code page.
Armed with this knowledge, you may try to put some order into your strings and your question.
Edit: thanks to @marfarma for pointing out the URL-safe Base64 encoding, where the '+' and '/' characters are replaced by '-' and '_'. This makes the situation clearer. When adding the needed '=' signs, the first string then decodes to:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a be |admin:55FD6CEA:.|
00000010 d4 5b 42 81 17 0f d3 ba 47 83 18 77 ca e8 da 8e |.[B.....G..w....|
00000020 91 ce b7 |...|
while the second becomes:
00000000 61 64 6d 69 6e 3a 35 35 46 44 36 43 45 41 3a ef |admin:55FD6CEA:.|
00000010 bf bd ef bf bd 5b 42 ef bf bd 17 0f d3 ba 47 ef |.....[B.......G.|
00000020 bf bd 18 77 ef bf bd ef bf bd da 8e ef bf bd ce |...w............|
00000030 b7 |.|
We now see what happened: the first string was decoded to bytes, but someone then fed those bytes to a display system or editor that expected UTF-8. Some of those bytes were not the valid UTF-8 encoding of anything, so they were replaced with the Unicode code point U+FFFD REPLACEMENT CHARACTER (the generic '�' glyph shown for undecodable input). The text was then re-encoded as UTF-8, each U+FFFD yielding the EF BF BD sequence of three bytes.
Therefore, the hash value was badly mangled, but the bytes that were altered already displayed as that same replacement glyph when (wrongly) interpreted as characters, and what was put in their place displays identically. Hence no visible difference on the screen.
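A short Node sketch (my own reconstruction, requiring a Node version that understands the 'base64url' encoding) reproduces the mangling:

// Decode the token to bytes, wrongly treat those bytes as UTF-8 text, then re-encode.
// Every invalid sequence becomes U+FFFD, i.e. EF BF BD after re-encoding to bytes.
const original = 'YWRtaW46NTVGRDZDRUE6vtRbQoEXD9O6R4MYd8ro2o6Rzrc';
const rawBytes = Buffer.from(original, 'base64url');     // correct byte-level decode
const mangled = Buffer.from(rawBytes.toString('utf8'));  // bytes -> text -> bytes (lossy)
console.log(mangled.toString('base64url'));
// YWRtaW46NTVGRDZDRUE677-977-9W0Lvv70XD9O6R--_vRh377-977-92o7vv73Otw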

What is the exact procedure to perform external authentication?

I am trying to perform external authentication on a smart card. I got the 8-byte challenge from the card, and now I need to generate the cryptogram over those 8 bytes.
But I don't know how to perform that cryptogram operation (the smart card toolkit converts the 8 bytes into 72 bytes).
The following commands are generated by the tool kit
00 A4 04 00 0C A0 00 00 02 43 00 13 00 00 00 01 04
00 22 41 A4 06 83 01 01 95 01 80
command: 80 84 00 00 08 Response: (8 bytes challenge)
command: 80 82 00 00 48 (72 bytes data)
Can anybody say what steps to follow to convert the 8-byte challenge into the 72 bytes?
Conversion is not exactly the right term: you need to apply a cryptographic algorithm, with the correct key, to the received challenge. I assume that an External Authenticate command is performed, but the unusual data field length allows no conclusion about the algorithm used. Possibly an external challenge is also provided in the command and session keys are established. Since the assumed Get Challenge command and the External Authenticate command have a class byte indicating a proprietary command, ISO 7816-4 won't help here, and you need to refer to the card specification. To get knowledge of the key you will probably have to sign a non-disclosure agreement with the card issuer.

Redirection rules with special characters

I want to use redirect 301 rules (i.e. I hope to be able to avoid rewrite rules) to redirect URLs that contain special characters (like é, à, ...), for instance
redirect 301 /éxàmple http://mydomain.com/example
However, simply adding this doesn't work. Any suggestions?
How to troubleshoot this on a Windows system
On Windows, you can use Notepad++ to enter Unicode characters correctly. After launching Notepad++, select 'Encode in UTF-8 without BOM' from the 'Encoding' menu, then type your Unicode characters and save the file.
To make sure that the characters have been saved properly, download a hex editor for Windows and check that é is saved as c3 a9 and à is saved as c3 a0.
Previous response where I assumed that you are on a Linux system
Most likely the Unicode characters have not been saved properly in the .htaccess file.
What do you get when you try this command:
grep -o .x.mple .htaccess | od -t x1 -c
You should get this if your Unicode characters are saved correctly.
0000000 c3 a9 78 c3 a0 6d 70 6c 65 0a 65 78 61 6d 70 6c
303 251 x 303 240 m p l e \n e x a m p l
0000020 65 0a
e \n
0000022
If you have xxd or hd installed, you can get a neater output to do your troubleshooting:
$ grep -o .x.mple .htaccess | xxd -g1
0000000: c3 a9 78 c3 a0 6d 70 6c 65 0a 65 78 61 6d 70 6c ..x..mple.exampl
0000010: 65 0a e.
In all the outputs you can see that é is saved as the two bytes c3 a9. You can see from http://www.fileformat.info/info/unicode/char/e9/index.htm that é, when encoded in UTF-8, is indeed two bytes: 0xC3 and 0xA9.
Similarly, à in UTF-8 format is: 0xC3 0xA0. See http://www.fileformat.info/info/unicode/char/e0/index.htm. You can see these codes in the output as well.
These should work, but it depends on a few things that you have to check as a checklist:
Do you have mod_alias enabled? If not, you should run a2enmod alias
Do you have some redirection to your example page? (Redirections are applied before aliases)
Then, instead of putting the raw UTF-8 characters in the rule, you can try the characters as browsers percent-encode them, for example %C3%A9 for é, etc.
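For example, both of these forms are worth trying (illustrative lines only, using the path from the question):

# Raw UTF-8 characters, with the .htaccess file saved as UTF-8 without BOM
Redirect 301 /éxàmple http://mydomain.com/example

# The same path with its bytes percent-encoded, as a browser would send them
Redirect 301 /%C3%A9x%C3%A0mple http://mydomain.com/example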