How does Opera Turbo compress the data (cache)? [closed] - opera

I have an Opera browser with "Opera Turbo" enabled. It is a proxy that recompresses HTML into a smaller format. I have a file from the Opera cache that Turbo compressed from 2000 KB to 500 KB. How can I decompress this file into a readable form? (The original file has almost no HTML tags, just 8-bit text, "<p>" tags, and an HTML header/footer.)
Here is an example of such file:
.opera$ hexdump -C cache/turbo/g_0000/opr00003.tmp
00000000 78 da 6c 8f bf 4e c4 30 0c c6 67 fa 14 26 48 6c |xзl▐©Nд0.фgЗ.&Hl|
00000010 a1 1c 12 d3 25 1d f8 37 82 54 f1 02 69 63 48 74 |║..с%.Ь7┌TЯ.icHt|
00000020 69 52 12 97 d2 b7 ed 88 40 80 b8 05 06 06 7a 57 |iR.≈р╥М┬#─╦...zW|
00000030 09 21 84 27 fb f3 cf 9f 6d 61 a8 71 45 26 0c 2a |.!└'ШСо÷ma╗qE&.*|
00000040 5d 64 3b a2 41 52 60 88 5a 8e 77 9d bd 97 ec 34 |]d;╒AR`┬Z▌w²╫≈Л4|
00000050 78 42 4f fc 7a 68 91 41 3d 57 92 11 3e 50 be 99 |xBOЭzh▒A=W▓.>P╬≥|
00000060 5d 42 6d 54 4c 48 b2 b7 5e 87 3e f1 c5 d1 f1 82 |]BmTLH╡╥^┤>ЯеяЯ┌|
00000070 fd 78 79 d5 a0 64 1a 53 1d 6d 4b 36 f8 5f 26 ef |Щxyу═d.S.mK6Ь_&О|
00000080 eb 71 fd f5 f8 97 5d e1 d0 87 a8 d3 ff 20 59 72 |КqЩУЬ≈]Ап┤╗сЪ Yr|
00000090 58 94 5d 4a 56 41 f0 40 06 e1 12 09 f6 1b ad 92 |X■]JVAП#.А..Ж.╜▓|
000000a0 59 c2 8c 8a 7c e6 32 91 cf 9f 09 67 fd 0a 22 3a |Yб▄┼|Ф2▒о÷.gЩ.":|
...
and here is part of the original file (I'm not sure whether it really is the original file, but very likely it is):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
<meta name="description" content="статьи">
<meta name="keywords" content="статьи">
<title>Russia on the Net — статьи</title>
</head>
<link rel="stylesheet" href="/rus/style.css">
<body bgcolor="#FFFFFF">
<center>
...
The compressed file is 3397 bytes and the original is about 8913 bytes. The original file compresses to 3281 bytes with bzip2, 3177 bytes with gzip, 2990 bytes with lzma, 3082 bytes with 7z, and 3291 bytes with zip.
Update: I have information (from the Chrome opera-mini extension http://ompd-proxy.narod.ru/distrib/opera_mini_proxy.crx; unpack it with 7-Zip) that Opera Mini uses webodf/src/core_RawInflate.js to unpack data. Can this file help me?

The first two bytes 78 DA are a valid 2 byte zLib header (see section 2.2 on CMF and FLG) that precedes deflate compressed data. So the file could be compressed using zLib/deflate.
For a first quick test, you can use my command-line tool Precomp like this:
precomp -v -c- -slow opr00003.tmp
It will report zLib compressed streams and how big they are decompressed ("... can be decompressed to ... bytes"). If this is successful (returns a decompressed size close to the original filesize you know), use your favourite programming language along with the zLib library to decompress your data.
Also note that if you're lucky, the stream (or a part of it) can be recompressed bit-to-bit identical by Precomp and the output file opr00003.pcf contains (a part of) the decompressed data preceded by a small header.
EDIT: As osgx commented and further analysis showed, the data can not be decompressed using zLib/deflate, so this is still an unsolved case.
EDIT2: The update and especially the linked JS show that it is deflate, but it seems to be some custom variant. Comparison with the original code could help as well as comparison to original zLib source code.
Additionally, the JS code could of course be used to try to decompress the data. It doesn't seem to handle the 2 byte header, though, so perhaps these have to be skipped.
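The header check and the two decompression attempts discussed above can be sketched in Python. This is only an illustration using the standard zlib module; the helper names are made up for this example, and since Opera's variant appears to be custom, both attempts may well fail on real Turbo cache files.

```python
import zlib

def looks_like_zlib_header(data: bytes) -> bool:
    """Check the two-byte zlib header (CMF, FLG) described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    method_is_deflate = (cmf & 0x0F) == 8      # CM = 8 means deflate
    window_ok = (cmf >> 4) <= 7                # CINFO above 7 is not allowed
    check_ok = (cmf * 256 + flg) % 31 == 0     # FCHECK makes the pair a multiple of 31
    return method_is_deflate and window_ok and check_ok

def try_inflate(data: bytes):
    """Try zlib-wrapped deflate first, then raw deflate with the 2 header bytes skipped."""
    attempts = (
        ("zlib", data, zlib.MAX_WBITS),
        ("raw deflate", data[2:], -zlib.MAX_WBITS),
    )
    for label, payload, wbits in attempts:
        try:
            return label, zlib.decompressobj(wbits).decompress(payload)
        except zlib.error:
            continue
    return None, b""
```

For the bytes 78 da from the question, looks_like_zlib_header returns True, which matches the observation that the header is valid even though the payload does not inflate with stock zlib. Note that a truncated stream is returned partially without an error, so the decompressed size should still be compared with the expected one.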

There are several file types in the Opera Turbo cache. The first is the one cited in the question. Some files (CSS and JS) are stored unpacked, and there is a Z-packed, tar-like multi-file archive for images (VP8, detected by the plain-text RIFF, WEBP and VP8 magics).
Example of Z-packed file header:
5a 03 01 1c 90 02 0a 22 03 18 2a (RIFF data first img) (RIFF data second img)
(RIFF data third img)
RIFF container is clearly visible and it has length field, so I suggest a description:
5a - magic of format
03 - number of files
01 - first file (riff size=0x1c90)
1c 90 - big-endian len of first file
02 - second file (riff size=0a22)
0a 22 - len of second file
03 - third file (riff size=182a)
18 2a - len of third file
52 49 46 46 == "RIFF" magic of first file
Another example of Z-file with JPGs ("JFIF" magic is visible, ffd8ff jpeg-marker is invisible; 8 files inside):
0000000: 5a08 0118 de02 1cab 0308 0804 162c 0531 Z............,.1
0000010: 4d06 080f 070a 4608 0964"ffd8 ffe0 0010 M.....F..d......
0000020: 4a46 4946 0001 0101 0060 0060 0000 ffdb JFIF.....`.`....
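A minimal sketch of a parser for the layout suggested above, in Python. The layout itself is a guess from two samples, the function name is invented for this example, and it is assumed that each 16-bit length covers the whole member (RIFF or JPEG data included).

```python
import struct

def parse_z_archive(blob: bytes):
    """Split a Z-packed archive into members, following the guessed layout:
    magic 0x5a, file count, then (index, big-endian 16-bit length) per file,
    then the file contents back to back."""
    if len(blob) < 2 or blob[0] != 0x5A:           # 0x5a is ASCII 'Z'
        raise ValueError("missing 0x5a magic byte")
    count = blob[1]
    table_end = 2 + 3 * count                       # 3 bytes per table entry
    if len(blob) < table_end:
        raise ValueError("truncated file table")
    lengths = []
    for i in range(count):
        _index, length = struct.unpack_from(">BH", blob, 2 + 3 * i)
        lengths.append(length)
    members, pos = [], table_end
    for length in lengths:
        members.append(blob[pos:pos + length])
        pos += length
    return members
```

With the second header above (8 files), the table ends at offset 2 + 3 * 8 = 0x1a, which is exactly where the ffd8 JPEG marker appears in the dump. A 16-bit length limits each member to 65535 bytes, so larger images presumably use some other encoding.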
Another type of file, detected by the file utility, is the "<000"-file, with an example header of (hex) "1f 8b 08 00 00 00 00 00 02 ff ec 52 cb 6a c3 30 10 fc 15 63".
file says it is "gzip compressed data, max compression", and any gzip tool unpacks it.

Related

Generating PDF user password hash

Currently, I am attempting to generate the hash of a user password for a PDF, given the encrypted PDF file and the plain password. I followed the instructions in this article. However, the hash I computed is different from the hash stored in the PDF file.
The hashed user password (/U entry) is simply the 32-byte padding
string above, encrypted with RC4, using the 5-byte file key. Compliant
PDF viewers will check the password given by the user (by attempting
to decrypt the /U entry using the file key, and comparing it against
the padding string) and allow or refuse certain operations based on
the permission settings.
First, I padded my password "123456" using a hardcoded 32-byte string, which gives me
31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
I tried to compute the hash with RC4 using the 5-byte file key as the key. According to the article:
The encryption key is generated as follows:
1. Pad the user password out to 32 bytes, using a hardcoded
32-byte string:
28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A
If the user password is null, just use the entire padding
string. (I.e., concatenate the user password and the padding
string and take the first 32 bytes.)
2. Append the hashed owner password (the /O entry above).
3. Append the permissions (the /P entry), treated as a four-byte
integer, LSB first.
4. Append the file identifier (the /ID entry from the trailer
dictionary). This is an arbitrary string of bytes; Adobe
recommends that it be generated by MD5 hashing various pieces
of information about the document.
5. MD5 hash this string; the first 5 bytes of output are the
encryption key. (This is a 40-bit key, presumably to meet US
export regulations.)
I appended the hashed owner key to the padded password, which gives me
31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6
Then, I appended the /P entry (-4), treated as a four-byte integer, encoded with little endian, which gives me
31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6
FC FF FF FF
Last, I appended the file identifier to it. The trailer of my PDF is:
trailer
<<
/Size 13
/Root 2 0 R
/Encrypt 1 0 R
/Info 4 0 R
/ID [<B5185D941CC0EA39ACA809F661EF36D4> <393BE725532F9158DC9E6E8EA97CFBF0>]
>>
and the result is
31 32 33 34 35 36 28 BF 4E 5E 4E 75 8A 41 64 00
4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C
C4 31 FA B9 CC 5E F7 B5 9C 24 4B 61 B7 45 F7 1A
C5 BA 42 7B 1B 91 02 DA 46 8E 77 12 7F 1E 69 D6
FC FF FF FF B5 18 5D 94 1C C0 EA 39 AC A8 09 F6
61 EF 36 D4 39 3B E7 25 53 2F 91 58 DC 9E 6E 8E
A9 7C FB F0
MD5 hashing this block of data returns 942c5e7b2020ce57ce4408f531a65019. I RC4-ed the padded password with cryptii using the first 5 bytes of the MD5 hash as the key. However, it returns
90 e2 b5 21 2a 7d 53 05 70 d9 5d 26 95 c7 c2 05
6e 2a 28 40 63 e7 4a d4 e9 05 86 71 43 d1 39 d6
while the hash in PDF is
58 81 CA 74 65 DC 2E A7 5D D2 39 D4 43 9C 0D DE
28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
Which step am I doing wrong? I suspect one of the following:
I am appending the file identifier in the wrong format.
I am using the wrong number of dropped bytes with RC4.
The hash function is not the one used for PDF 1.6.
I made a mistake somewhere in the process.
Or maybe the article is actually wrong.
Files: Original PDF dummy.pdf, dummy-protected.pdf (Password: 123456)
Please help
There are two issues in your calculation:
The article you use describes the PDF encryption algorithms available for PDF-1.3, but your document is encrypted using an algorithm introduced with PDF-1.5.
You make an error when appending the file identifier: only the first entry of the ID array shall be appended, not both (which is not really clear from the article you use).
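For reference, here is a minimal Python sketch of the old 40-bit scheme exactly as the quoted article describes it, with the correction that only the first ID entry is appended. The function names are made up for this illustration. Because of the first issue, it is not expected to reproduce the /U value of the document in question, which uses a newer algorithm; it only shows the shape of the calculation. Note that, per the article, the RC4 input is the 32-byte padding string itself, not the padded password.

```python
import hashlib
import struct

# The hardcoded 32-byte padding string quoted in the article.
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling, then keystream XOR (no bytes are dropped)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def file_key_40bit(password: bytes, o_entry: bytes, p_entry: int, first_id: bytes) -> bytes:
    """Steps 1 to 5 of the article: pad, append /O, /P (LSB first), ID[0], then MD5."""
    padded = (password + PAD)[:32]
    material = padded + o_entry + struct.pack("<i", p_entry) + first_id
    return hashlib.md5(material).digest()[:5]

def u_entry_40bit(key: bytes) -> bytes:
    """/U in the old scheme: the padding string encrypted with the file key."""
    return rc4(key, PAD)

# Values taken from the question; only the first /ID entry is used.
o_entry = bytes.fromhex(
    "C431FAB9CC5EF7B59C244B61B745F71A"
    "C5BA427B1B9102DA468E77127F1E69D6"
)
first_id = bytes.fromhex("B5185D941CC0EA39ACA809F661EF36D4")
key = file_key_40bit(b"123456", o_entry, -4, first_id)
u_value = u_entry_40bit(key)
```

A viewer checking a password under this old scheme would decrypt /U with the candidate key and compare the result with PAD, which is why rc4(key, u_value) gives back the padding string.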
In a comment you asked accordingly
where can I find the password hashing detail for >V1.3 PDF?
I would propose using the PDF specification, ISO 32000.
ISO specifications are not free, but Adobe used to provide a version of ISO 32000-1 with merely the ISO header removed on their web site. Some days ago it was removed (By design? By error? I don't know yet.), but you can still find copies of it by googling for "PDF32000".
The relevant section in ISO 32000-1 is 7.6 Encryption and in particular 7.6.3 Standard Security Handler.
Following that information you should be able to correctly calculate the value in question.
(Alternatively you can also use old Adobe PDF references, the editions for PDF 1.5, 1.6, and 1.7 should also give you the information required for decrypting your document. But these references have been characterized as not normative in nature by prominent Adobe employees, so I would go for the ISO norm.)
Beware, though: After ISO 32000-1 had been published, Adobe introduced an AES-256 encryption scheme as an extension which obviously is not included in ISO 32000-1. You can find a specification in "Adobe Supplement to ISO 32000, base version 1.7, extension level 3".
Furthermore, with ISO 32000-2 that Adobe AES-256 encryption scheme and all older schemes became deprecated, the only encryption scheme to use with PDF-2.0 is a new AES-256 encryption scheme described in ISO 32000-2 which is based on the Adobe scheme but introduces some extra hashing iterations.

Retrieve Word, PDF from archived database in BLOB column

We have ended our contract with a SaaS tool. We received our archived data as an Oracle database with Word and PDF documents in a BLOB column. We were told that the BLOB data is Base64-encoded binary and needs to be decoded before being written to a file. When looking at the data in the BLOB column, it looks like this:
í]pÅ™þgµ«•,­´–‘°
8„I–
c#Y’-Ù²ül‡àF»#íˆÝõήŒHRˆ#! qsG\¹‹I ¹ÜåçQ<ª #G.<'Áªâ«
÷È«øB‘J¼÷ýÝ3»#iW–d‡×Í/}ÛÓÓÝÿß÷ßwïÎãðç½ç›ÿƒ&ÐZ*£¹J*wS€
N$LÔmŸ;‘ËåøÔz çÑÛŠ~wß÷èüë*ýDÇêÍ÷,ßÐ#TCÃÃ÷ÔßS?ÑBˆ*ý
Ô|Ñã7I4ÔLÎã¦\®ö¤ÇÝ+>U?
The same as hexdump:
00000000 1f 8b 08 00 00 00 00 00 00 00 ed 5d 0b 70 1c c5 |...........].p..|
00000010 99 fe 67 b5 ab 95 2c ad b4 96 91 b0 8d 0d 03 38 |..g...,........8|
00000020 84 18 49 96 0d 06 63 07 23 59 92 2d d9 b2 fc 90 |..I...c.#Y.-....|
00000030 6c 87 e0 03 46 bb 23 ed 88 dd 9d f5 ce ae 8c 48 |l...F.#........H|
00000040 52 88 40 02 21 09 71 12 73 47 5c b9 8b 49 20 b9 |R.#.!.q.sG\..I .|
00000050 dc e5 12 e7 51 3c aa 20 90 23 47 2e 3c 12 27 c1 |....Q<. .#G.<.'.|
We have tried Base64 libraries in PL/SQL, Java, and Linux to decode it, but none of them work.
Java:
byte[] decoded = Base64.decodeBase64(Files.readAllBytes(Paths.get(fileName)));
byte[] decodedBytes = Base64.decodeBase64(input_file);
PL/SQL:
utl_file.put(l_output, utl_raw.cast_to_varchar2(vblob));
utl_file.put(l_output, utl_raw.cast_to_varchar2(utl_encode.base64_decode(vblob)));
raw_decoded := utl_encode.base64_decode(vblob);
utl_file.put_raw(l_output, utl_raw.cast_to_varchar2(raw_decoded));
Could you advise whether this is Base64 data, or what format it is in? The expected result is to open the BLOB contents as MS Word and PDF documents.
I have resolved this issue. It turns out the data in the BLOB column was the document itself (Word or PDF), but compressed. So when querying the BLOB column, I used utl_compress to decompress it and then wrote the result directly to a physical file with utl_file.
select UTL_COMPRESS.LZ_UNCOMPRESS(BLOB_COLUMN) into v_blob from TABLE where ID =
Thank you to all who responded.
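The hexdump in the question starts with 1f 8b 08, which is the gzip magic followed by the deflate method byte, so the same check can be done outside the database. Below is a small Python sketch under the assumption that the BLOB has already been exported to bytes; the function names and the list of document signatures are chosen for this illustration only.

```python
import gzip

def unpack_blob(blob: bytes) -> bytes:
    """Decompress the BLOB if it starts with the gzip magic, else return it unchanged."""
    if blob[:2] == b"\x1f\x8b":
        return gzip.decompress(blob)
    return blob

def guess_document_type(data: bytes) -> str:
    """Rough guess from well-known leading bytes of the decompressed document."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "docx (zip container)"
    if data.startswith(bytes.fromhex("d0cf11e0a1b11ae1")):
        return "doc (OLE2 container)"
    return "unknown"
```

Base64 text would consist only of printable ASCII characters, so a first byte of 0x1f is already a strong hint that the column is not Base64.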

What are the parts of an ECDSA entry in the 'known_hosts' file?

I'm trying to extract an ECDSA public key from my known_hosts file that ssh uses to verify a host. I have one below as an example.
This is the entry for "127.0.0.1 ecdsa-sha2-nistp256" in my known_hosts file:
AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBF3QCzKPRluwunLRHaFVEZNGCPD/rT13yFjKiCesA1qoU3rEp9syhnJgTbaJgK70OjoT71fDGkwwcnCZuJQPFfo=
I ran it through a Base64 decoder to get this:
���ecdsa-sha2-nistp256���nistp256���A]2F[rUF=wXʈ'ZSzħ2r`M::WL0rp
So I'm assuming those question marks are some kind of separator (no, those are lengths). I figured that nistp256 is the elliptic curve used, but what exactly is that last value?
From what I've been reading, the public key for ECDSA has a pair of values, x and y, which represent a point on the curve. Is there some way to extract x and y from there?
I'm trying to convert it into a Java public key object, but I need x and y in order to do so.
Not all of the characters are shown because the data is binary. Write the Base64-decoded value to a file and open it in a hex editor.
The public key for a P256 curve should be a 65-byte array, starting from the byte with value 4 (which means a non-compressed point). The next 32 bytes would be the x value, and the next 32 the y value.
Here is the result in hexadecimal:
Signature algorithm:
00 00 00 13
65 63 64 73 61 2d 73 68 61 32 2d 6e 69 73 74 70 32 35 36
(ecdsa-sha2-nistp256)
Name of domain parameters:
00 00 00 08
6e 69 73 74 70 32 35 36
(nistp256)
Public key value:
00 00 00 41
04
5d d0 0b 32 8f 46 5b b0 ba 72 d1 1d a1 55 11 93 46 08 f0 ff ad 3d 77 c8 58 ca 88 27 ac 03 5a a8
53 7a c4 a7 db 32 86 72 60 4d b6 89 80 ae f4 3a 3a 13 ef 57 c3 1a 4c 30 72 70 99 b8 94 0f 15 fa
So you first have the name of the digital signature algorithm to use, then the name of the curve and then the public component of the key, represented by an uncompressed EC point. Uncompressed points start with 04, then the X coordinate (same size as the key size) and then the Y coordinate.
As you can see, all field values are preceded by four bytes indicating the size of the field. All values and fields are using big-endian notation.
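The field layout described above (a four-byte big-endian length before each value, then an uncompressed point starting with 04) can be walked with a few lines of Python. This is a sketch with made-up function names, and it handles only uncompressed points.

```python
import base64
import struct

def read_fields(blob: bytes):
    """Split the decoded key blob into its length-prefixed fields."""
    fields, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)   # 4-byte big-endian length
        pos += 4
        fields.append(blob[pos:pos + length])
        pos += length
    return fields

def ecdsa_point_from_known_hosts(b64_key: str):
    """Return (algorithm, curve, x, y) for an ecdsa-sha2-* known_hosts key."""
    algorithm, curve, point = read_fields(base64.b64decode(b64_key))
    if point[0] != 0x04:
        raise ValueError("only uncompressed points are handled here")
    size = (len(point) - 1) // 2                           # 32 bytes for nistp256
    x = int.from_bytes(point[1:1 + size], "big")
    y = int.from_bytes(point[1 + size:], "big")
    return algorithm.decode("ascii"), curve.decode("ascii"), x, y

# The entry from the question.
ENTRY = (
    "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBF3QCzKPRluwunLRHaFV"
    "EZNGCPD/rT13yFjKiCesA1qoU3rEp9syhnJgTbaJgK70OjoT71fDGkwwcnCZuJQPFfo="
)
algorithm, curve, x, y = ecdsa_point_from_known_hosts(ENTRY)
```

On the Java side, the two integers would typically go into a java.security.spec.ECPoint, which together with the P-256 parameters forms an ECPublicKeySpec.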

Extra byte(s) at the end of SSL Packet (beyond the length of the packet)

My application is using SSL over SMTP.
But I ran into a problem with an extra byte at the end of a record.
The packet I received is as follows (hex dump of the SSL record):
17 03 01 01 00 9A 07 74 E3 4B E0 07 17 71 38 BF 29 7E 70
E9 14 CC B1 97 77 4C B9 AB A0 9F 88 7B D4 ED 14 8E 97 F2
5A BE 46 56 D4 12 BC 15 01 49 EE CE A1 ED 3F D3 6E 7F AA
DC 6B DF 41 11 74 7B 55 B8 D3 3E 8D EF 96 52 B0 BD 50 35
09 E7 2A FF 0E 39 58 C7 91 99 95 22 6F B0 73 57 28 B4 EA
C6 28 4C DC 5C DA 6C 31 FB 63 71 7D 08 F0 DD 78 C4 08 C5
27 90 04 C7 09 59 E4 83 F4 4D 9A 7B 65 E9 AF 38 44 B4 CD
9E 4D BE 80 0D 07 24 8D C3 79 99 DC 02 81 D7 97 21 16 0B
28 44 82 ED E4 5F E6 91 81 A5 28 C1 C8 92 60 36 4E DE 27
AF D0 2B EE FB 9D 12 9C 2B 4F 3F 29 F2 04 8F DC 21 39 4F
80 23 7E 78 3C A0 29 E0 67 E7 9F 90 B6 1F D4 08 63 3E CE
73 E1 17 72 8D B1 8C 3D A8 59 C0 0F 03 59 7A A6 5D F9 7A
40 57 D6 8D 94 48 93 BF D8 17 C6 70 79 36 13 D0 F1 D1 D2
69 D4 05 9D 67 86 6D E9 66 D0 83 4A D8 5E 20
The length of this packet as seen from SSL 3.1 protocol is 256 Bytes.
But there is one extra byte at the end (the final 20).
Because of this extra byte, when the next packet is read, this 20 is read as well and causes the error SSL_R_WRONG_VERSION_NUMBER (I am using the OpenSSL library for SSL).
The next packet I received looks like this (as per the packet sniffer):
17 03 01 00 18 ...
But when the next read is done, OpenSSL reads the packet as 20 17 03 01 .., which causes the error (since 17 03 is read where the version 03 01 is expected).
I would like to know if this (extra byte at the end) is a part of SSL standard.
Please suggest me how to handle this case in OpenSSL. OpenSSL version is 1.0.0.
No. The extra byte is not part of the SSL standard.
As per the standard (RFC 2246 for TLS 1.0; the latest is RFC 5246 for TLS 1.2), an SSL record is laid out as below:
struct {
ContentType type;
ProtocolVersion version;
uint16 length;
select (CipherSpec.cipher_type) {
case stream: GenericStreamCipher;
case block: GenericBlockCipher;
} fragment;
} TLSCiphertext;
The fragment will be exactly of the length as specified by uint16 length member. So, the 20 must be getting inserted either incorrectly by the Server Implementation, or some other software in the middle is inserting it when the data is in network.
Openssl reads exactly the number of bytes as specified by uint16 length member which is why it doesn't read 20.
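To make the framing concrete, here is a small Python sketch that splits a byte stream into records using only the 5-byte header (type, version, 16-bit length). It is an illustration of the framing rule, not of how OpenSSL is implemented, and the function name is invented for this example.

```python
import struct

def split_records(stream: bytes):
    """Split a byte stream into TLS records; return (records, leftover bytes)."""
    records, pos = [], 0
    while pos + 5 <= len(stream):
        content_type, major, minor, length = struct.unpack_from(">BBBH", stream, pos)
        end = pos + 5 + length
        if end > len(stream):          # incomplete record: wait for more data
            break
        records.append((content_type, (major, minor), stream[pos + 5:end]))
        pos = end
    return records, stream[pos:]
```

With the header 17 03 01 01 00 the record occupies 5 + 256 = 261 bytes, while the dump in the question holds 262. A stray 20 after it shifts the next header by one byte, so the parser sees type 0x20 and version 17 03, which is the wrong-version symptom described in the question.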
Some of the points which you can focus on are:
1. Does this happen with the first application data packet which is transferred immediately after handshake? (From the content type I assumed this packet dump is for application data)
2. Is this a random occurrence? Do all connections with that particular server exhibit the same behavior?
3. You can try to get a dump of the packet at the server to see whether the 20 is already present when the packet is sent, or whether it is added in flight.
4. Could there be a Firewall related problem? (I don't know about Firewall, so didn't give more details here)
Hope this helps!
I was bashing my head with this one today; finally resorted to this:
_sslStream.Write(merged, 0, merged.Length - 1)
Problem solved, move along!

How can I reconstitute a text file saved in a browser cache, gzipped?

I just lost a couple of days of work to a crashing editor. My file is now an empty file, and the last backup I have is from 4 days ago.
I have the CSS file saved in my Chromium's cache, but it looks like this:
http://myserver.example.com/style.css
HTTP/1.1 200 OK
Date: Mon, 04 Jul 2011 05:18:25 GMT
Last-Modified: Mon, 04 Jul 2011 01:10:47 GMT
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 7588
Content-Type: text/css
00000000: 5e 01 00 00 02 08 00 00 be 45 ba c7 cd 05 2e 00 ^........E......
00000010: 25 68 d9 c7 cd 05 2e 00 1d 01 00 00 48 54 54 50 %h..........HTTP
00000020: 2f 31 2e 31 20 32 30 30 20 4f 4b 00 44 61 74 65 /1.1 200 OK.Date
00000030: 3a 20 4d 6f 6e 2c 20 30 34 20 4a 75 6c 20 32 30 : Mon, 04 Jul 20
00000040: 31 31 20 30 35 3a 31 38 3a 32 35 20 47 4d 54 00 11 05:18:25 GMT.
(etc)
00000000: 1f 8b 08 00 00 00 00 00 00 03 cd 3d fd 8f db b6 ...........=....
00000010: 92 3f d7 7f 05 2f 8b 22 ed c2 f2 87 fc b1 6b 2f .?.../."......k/
00000020: 1a a0 09 5e 1e f0 5e 7b 57 34 c5 dd 0f 87 83 21 ...^..^{W4.....!
00000030: db f2 5a 89 6c f9 49 72 36 5b 63 ff f7 e3 b7 86 ..Z.l.Ir6[c.....
00000040: e4 50 1f 9b 4d ef 52 34 b1 65 71 66 38 1c ce 0c .P..M.R4.eqf8...
00000050: 87 c3 e1 f0 9a fc e3 9c 1e c9 3f e2 94 fc b1 8f ..........?.....
The entire file seems to be there, and I can get the text.
I'd like to get back the plain CSS file somehow. I tried extracting the data, but gzip says it isn't gzip format. But it doesn't seem to be gzip encoded (it's not binary, after all...). Is it base64 or something? I've had a hard time finding any info on this.
Try finding the gzip header by extracting the hex data into an editor and searching for the header as per gzip specification. You should be able to do this by finding the end of the response body and selecting the previous 7588 bytes (you have this info in the response headers: Content-Length: 7588) - this should be the first character of the header.
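A gzip stream is always binary; the text flag in its header only hints that the uncompressed content is probably text. Base64 data consists solely of the characters A-Z, a-z, 0-9, + and /, sometimes followed by one or two = padding characters at the end, so a dump that is mostly unprintable bytes is not base64. If you do have base64 text, you can decode it online.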
Alternatively you could try a tool such as ChromeCacheViewer.
The file looks gzip. It has the 1f8b header. Chrome stores the cached files as files, you just need to find them. Google for "location of chrome cache" and find it for your platform.
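If only the hex dump from the cache page is available, it can be turned back into bytes and decompressed. The Python sketch below assumes the dump format shown in the question (an 8-digit offset, a colon, up to 16 hex bytes, then an ASCII column) and that only the second dump, the one starting with 1f 8b, is passed in. The function names are made up for this example.

```python
import gzip
import re

OFFSET_LINE = re.compile(r"^[0-9a-fA-F]{8}:(.*)$")
HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")

def bytes_from_hexdump(text: str) -> bytes:
    """Collect the hex byte columns of each dump line, ignoring the ASCII column."""
    out = bytearray()
    for line in text.splitlines():
        match = OFFSET_LINE.match(line.strip())
        if not match:
            continue                       # skip header lines, "(etc)" and the like
        taken = 0
        for token in match.group(1).split():
            if taken == 16 or not HEX_BYTE.match(token):
                break                      # reached the ASCII column
            out.append(int(token, 16))
            taken += 1
    return bytes(out)

def recover_body(dump_text: str, content_length=None) -> bytes:
    """Rebuild the gzip stream from the dump and decompress it."""
    raw = bytes_from_hexdump(dump_text)
    if content_length is not None:
        raw = raw[:content_length]         # guards against a misread short last line
    return gzip.decompress(raw)
```

Passing the Content-Length value from the response headers (7588 in the question) trims anything picked up by mistake from the ASCII column of a short final line. If gzip.decompress reports a truncated stream, part of the dump is missing.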