I am trying to view data in a Redis database.
The data was compressed using Lettuce 6.1.1.
It uses a GZIP compression type.
I have tried several online tools to convert the GZIP text to a readable ASCII format.
The tools fail because they do not recognize the GZIP text as GZIP data. Maybe it has something to do with the compression algorithm Lettuce uses to compress the data.
Can anyone point me to a tool where I can decompress this data to readable ASCII text?
Here is an example of the compressed data:
\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x00\xABVN-\xCBLNu,JM\xF4+\xCDMJ-R\xB2R2604\xB44Q\xAA\x05\x00\x190\x9B\xD1\x1E\x00\x00\x00
This should translate to a number: 301194
Here is a second example:
1.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB04\x01\x00\x93\xC0t\xC3\x06\x00\x00\x00
2.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB0\xB0\x04\x00o\x8D\xDE\xA4\x06\x00\x00\x00
3.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB04\x07\x00)\x91}Z\x06\x00\x00\x00
4.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB04\x03\x00\xBF\xA1z-\x06\x00\x00\x00
5.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB04\x00\x00\x8A\x04\x19\xC4\x06\x00\x00\x00
6.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003602\xB04\x02\x00\xA6e\x17*\x06\x00\x00\x00
7.\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x003604\xB44\x01\x00J\x05\x03\xD0\x06\x00\x00\x00
This should be a list of 7 service area numbers.
Not sure of the order but the values should be:
302090
302092
302097
302094
302096
302089
301194
I tried using this online tool:
https://codebeautify.org/gzip-decompress-online
No output appears in the result window and no error is shown.
I also tried this website:
https://www.multiutil.com/gzip-to-text-decompress/
I get the error: Invalid compression text
UPDATE
The RedisInsight screenshot below shows the key-value information. The value is compressed as gzip, and that is what I would like to decode.
I wanted to copy the value that I have highlighted and decompress it so I can document what is stored in the database.
There is nothing wrong with your examples 1 through 7. They are all valid gzip streams that decompress to:
302094
302089
302097
302096
302090
302092
301194
The first example in your question, however, has an error in the integrity check at the end. It decodes to:
{#eviceAreaNumber":"301194"}
While the deflate compressed data in the gzip stream is valid, the CRC that follows it is not. The uncompressed length after that is incorrect as well.
The online tools you point to expect Base64-encoded data, not the partially hex-escaped text you are pasting into them.
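If it helps, the conversion can also be done locally. The following is a minimal Python sketch, assuming the value was copied in the escaped form shown in the question (`\xNN` for non-printable bytes, a few C-style escapes, everything else literal). The function names `unescape`, `gunzip_escaped` and `to_base64` are made up for this illustration.

```python
import base64
import gzip
import re

# Assumed escape set: \xNN plus a few C-style escapes; all other text is literal.
_SIMPLE = {"\\": b"\\", '"': b'"', "n": b"\n", "r": b"\r",
           "t": b"\t", "a": b"\x07", "b": b"\x08"}
_ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]{2}|[\\"nrtab])')


def unescape(text: str) -> bytes:
    # Turn the escaped dump back into the raw bytes stored in Redis.
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("latin-1")  # literal characters
        token = match.group(1)
        if token[0] == "x":
            out.append(int(token[1:], 16))  # \xNN -> one byte
        else:
            out += _SIMPLE[token]
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def gunzip_escaped(text: str) -> bytes:
    # gzip.decompress checks the CRC and length trailer, so a damaged copy
    # raises an error instead of silently returning data.
    return gzip.decompress(unescape(text))


def to_base64(text: str) -> str:
    # Base64 form of the same bytes, which is what the online tools expect.
    return base64.b64encode(unescape(text)).decode("ascii")
```

Passing one of examples 1 through 7 as a raw string (for instance `gunzip_escaped(r"...")`) should return the corresponding six-digit number, provided the bytes survived copying intact. A value damaged in transit, like the first example, would raise an error at the integrity check.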
I would like to extract the dictionary of any compression algorithm (zip would be the one I would go for since it is widely used) and dump this dictionary to a text file.
I looked at the Wikipedia page to try to find the answer in the header, but I didn't find an explicit answer to my question.
Zip can use multiple compression formats, one per compressed file.
For instance the Deflate and LZMA formats use a dictionary which is empty at the beginning and has a length of min(m,n) where m is the number of uncompressed bytes already processed and n is a preset value (32KB for Deflate).
So in those formats the dictionary is simply a portion of the uncompressed file: the most recently processed bytes.
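To make the min(m, n) rule concrete, here is a small Python sketch that returns the dictionary a Deflate-style coder could refer back to at a given position. The name `current_dictionary` is hypothetical, and the sketch assumes you already have the uncompressed bytes in hand.

```python
DEFLATE_WINDOW = 32 * 1024  # n: the preset maximum for Deflate (32 KB)


def current_dictionary(uncompressed: bytes, processed: int,
                       n: int = DEFLATE_WINDOW) -> bytes:
    # After `processed` bytes (m), the dictionary is the last min(m, n)
    # bytes of the uncompressed data seen so far.
    start = max(0, processed - n)
    return uncompressed[start:processed]
```

Dumping it to a file is then just a matter of writing the returned bytes out, for example with `open("dict.bin", "wb")`.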
I need to print an encrypted string as-is in an RDLC report. My problem is that if the string contains a plus sign, it creates a new line in the Textbox. How can I avoid this?
Encryption produces output that is binary and contains many bytes that have no displayable representation.
Because of this, if encrypted data needs to be displayed, it is generally either Base64 (best for computers) or hexadecimal (best for people) encoded.
It seems that you may have Base64-encoded encrypted data, which is generally composed of the upper- and lowercase letters, the 10 digits, "+", "/" and "=". You cannot delete any of these and expect to recover the encrypted data.
If these characters present a problem, they can often be escaped in some manner, or another encoding can be chosen, such as hexadecimal or an alternate Base64 character set (see Base64). If you choose an alternate Base64 character set, interoperability will most likely be impaired.
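As an illustration of the alternatives, here is a short sketch in Python (used here only for brevity; the report itself is presumably built in .NET, where equivalent conversions exist). It encodes the same bytes as standard Base64, as the URL-safe Base64 alphabet (which uses "-" and "_" instead of "+" and "/"), and as hexadecimal. The helper name `encodings` is made up for the example.

```python
import base64


def encodings(ciphertext: bytes) -> dict:
    # Three lossless text representations of the same binary data.
    return {
        "base64": base64.b64encode(ciphertext).decode("ascii"),
        "base64url": base64.urlsafe_b64encode(ciphertext).decode("ascii"),
        "hex": ciphertext.hex(),
    }


# These three bytes were chosen because they encode to all "+" in standard Base64.
print(encodings(bytes([0xFB, 0xEF, 0xBE])))
# {'base64': '++++', 'base64url': '----', 'hex': 'fbefbe'}
```

Whichever form is printed, the reader of the report must decode it with the matching alphabet to get the original bytes back.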
Note: More information would produce a better answer.
I had to replace the "+" with "÷".
Users don't notice it, since the PDF is just a visual representation of the CFDI, and I haven't had any issues with it.
I need to know the encoding of the values of PDF dictionaries (not the text displayed to the user but the "code behind").
I plan not to use any library for that.
Where can I find it?
the encoding of the values of PDF dictionaries
Values of PDF dictionaries are PDF objects.
You should take a look at the PDF specification ISO 32000-1, in particular chapter 7 Syntax, to find out about PDF objects. You will find:
The tokens that delimit objects and that describe the structure of a PDF file shall use the ASCII character set. In addition all the reserved words and the names used as keys in PDF standard dictionaries and certain types of arrays shall be defined using the ASCII character set.
Thus, most of the time you have to deal with ASCII values.
The situation is tricky with strings, though, because there are several types of strings which use the same string syntax options, so you have to interpret their contents according to their context.
Table 35 – String Object Types
Type: Description
text string: Shall be used for human-readable text, such as text annotations, bookmark names, article names, and document information. These strings shall be encoded using either PDFDocEncoding or UTF-16BE with a leading byte-order marker. This type is described in 7.9.2.2, "Text String Type."
PDFDocEncoded string: Shall be used for characters and glyphs that are represented in a single byte, using PDFDocEncoding. This type is described in 7.9.2.3, "PDFDocEncoded String Type."
ASCII string: Shall be used for characters that are represented in a single byte using ASCII encoding.
byte string: Shall be used for binary data represented as a series of bytes, where each byte can be any value representable in 8 bits. The string may represent characters but the encoding is not known. The bytes of the string need not represent characters. This type shall be used for data such as MD5 hash values, signature certificates, and Web Capture identification values. This type is described in 7.9.2.4, "Byte String Type."
If a string is the value e.g. of the Author metadata, it is a text string, so it is encoded using either PDFDocEncoding or UTF-16BE with a leading byte-order marker.
If, on the other hand, a string is the value e.g. of Contents in a signature dictionary, it is a byte string holding a binary object; any attempt to interpret it according to some encoding will fail.
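For the text string case, the decision can be sketched in a few lines of Python. This assumes the string syntax (literal or hexadecimal notation, escapes) has already been resolved to raw bytes. The function name is hypothetical, and Latin-1 is used only as a stand-in: Python ships no PDFDocEncoding codec, and the two are not an exact match (several code points differ, for example in the 0x80-0xA0 range).

```python
def decode_text_string(raw: bytes) -> str:
    # A text string starting with the byte-order marker FE FF is UTF-16BE.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Otherwise it is PDFDocEncoding. ASSUMPTION: approximated by Latin-1
    # here, which is not correct for every byte value.
    return raw.decode("latin-1")
```

A byte string such as the Contents of a signature dictionary should not be passed through a function like this at all; it is binary data.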
The situation is even more tricky with streams.
First of all the stream content may be somehow processed, e.g. it may be compressed. To get to the actual stream contents, you first have to undo this processing.
Then the content may either be binary, e.g. a font program, or text, e.g. JavaScript, or it may be a content stream, e.g. the page contents.
A content stream is a PDF stream object whose data consists of a sequence of instructions describing the graphical elements to be painted on a page. The instructions shall be represented in the form of PDF objects, using the same object syntax as in the rest of the PDF document.
Thus, they are mostly ASCII values. The exceptions, again, are string arguments to text drawing instructions. Their encoding depends entirely on the font currently selected when the string is drawn, and fonts may use standard encodings, but they may also use completely chaotic, ad-hoc encodings.
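Undoing the stream processing mentioned above can be sketched for two common filters. This is a simplified Python illustration, not a complete decoder: it assumes a single filter per stream, ignores decode parameters such as predictors, and the function name `decode_stream` is made up. FlateDecode data is a zlib stream, so the standard `zlib` module can inflate it.

```python
import zlib
from typing import Optional


def decode_stream(raw: bytes, filter_name: Optional[str]) -> bytes:
    if filter_name is None:
        return raw  # no filter: the bytes are the stream contents
    if filter_name == "FlateDecode":
        return zlib.decompress(raw)
    if filter_name == "ASCIIHexDecode":
        digits = b"".join(raw.split())     # whitespace is ignored
        digits = digits.split(b">", 1)[0]  # ">" marks end of data
        if len(digits) % 2:
            digits += b"0"                 # odd count: a final 0 is assumed
        return bytes.fromhex(digits.decode("ascii"))
    raise NotImplementedError(filter_name)
```

What comes out is still only the stream contents; whether they are binary, text or a content stream has to be determined from context, as described above.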
PS: If you happen to try and analyze an encrypted PDF, you will find that encryption applies to all strings and streams in the document's PDF file, with very few exceptions. In particular, encryption does not apply to dictionary and array structures, numbers and names. Thus, someone not aware of this might not recognize that the PDF is encrypted but instead assume that strings and streams are encoded in a very weird way.
You can find that in the PDF specification (http://www.adobe.com/devnet/pdf/pdf_reference.html). To elaborate a bit on the most important points in your question...
1) PDF dictionaries can contain a variety of value types (booleans, numbers, strings...). The encoding you are going to encounter depends on the type of value.
2) Mostly, the interesting and complex case is that where the type of object is a string.
3) For a string, read section 7.9.2 in the PDF specification. That explains what encodings can be used for such strings (PDFDocEncoding, Unicode encoding...) and how to recognise what encoding you have for a particular string.
To complement @mkl's and @DavidvanDriessche's excellent answers...
Here are three open-source command-line tools that can help you transform any PDF into a form with expanded, uncompressed and decoded object streams. (Note that there is no single correct way to do this, so the outputs of the tools will differ.)
pdftk
mutool
qpdf
Each of these should be available via your favorite operating system's package manager.
pdftk example usage:
pdftk in.pdf cat output out1.pdf uncompress
mutool example usage:
mutool clean -d in.pdf out2.pdf
qpdf example usage (my favorite tool for this purpose):
qpdf --qdf --object-streams=disable in.pdf out3.pdf
You should try each of these, compare their outputs for different input PDFs and then decide which one is your favorite (but remember the other tools when you encounter a case where your favorite shows unexpected results).
If I have a string that will be encoded with Base64, Md5, or some other hash or encryption function, is there a way to at least be able to make a fair guess as to what it is?
You can try to guess, but expect a lot of false results. An MD5 hash written in hexadecimal always has 32 characters, Base64 uses a limited set of possible characters, and so on.
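A rough sketch of such guessing in Python, using only length and character set. The labels and the function name `guess_encoding` are invented for this example, and the result is a list precisely because several guesses can fit the same string.

```python
import re
from typing import List

_HEX = re.compile(r"[0-9a-fA-F]+")
_B64 = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def guess_encoding(value: str) -> List[str]:
    guesses = []
    if _HEX.fullmatch(value):
        if len(value) == 32:
            guesses.append("md5-hex")  # or any other 128-bit value in hex
        elif len(value) % 2 == 0:
            guesses.append("hex")
    # Padded Base64 has a length that is a multiple of 4.
    if len(value) % 4 == 0 and _B64.fullmatch(value):
        guesses.append("base64")
    return guesses


print(guess_encoding("d41d8cd98f00b204e9800998ecf8427e"))  # ['md5-hex', 'base64']
print(guess_encoding("aGVsbG8="))                          # ['base64']
```

The first call shows the problem: a hex MD5 digest is also a syntactically valid Base64 string, so the check cannot tell the two apart.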