In IE, the length limit of the URL is about 2048 bytes, Query string may exceed 2048 bytes.
I use lz-string to compress strings, However, the link in the pdf does not work, Or maybe use another way to compress the string?
var compressed = LZString.compress(query)
https://jsfiddle.net/p9e4a8dg/11
Try reading this link.
If you want pure ASCII you can try compressToBase64
I would like to extract the dictionary of any compression algorithm (zip would be the one I would go for since it is widely used) and dump this dictionary to a text file.
I looked the wikipedia page to try and find the answer in the header, but I didn't really find an explicit answer to my question
Zip can use multiple compression formats, one per compressed file.
For instance the Deflate and LZMA formats use a dictionary which is empty at the beginning and has a length of min(m,n) where m is the number of uncompressed bytes already processed and n is a preset value (32KB for Deflate).
So the dictionary is a portion of the uncompressed file on those formats.
I need to know the encoding of the values of PDF dictionaries (not the text displayed to the user but the "code behind").
I plan not to use any library for that.
Where can I find it?
the encoding of the values of PDF dictionaries
Values of PDF dictionaries are PDF objects.
You should take a look at the PDF specification ISO 32000-1, in particular chapter 7 Syntax, to find out about PDF objects. You will find:
The tokens that delimit objects and that describe the structure of a PDF file shall use the ASCII character
set. In addition all the reserved words and the names used as keys in PDF standard dictionaries and
certain types of arrays shall be defined using the ASCII character set.
Thus, most of the time you have to deal with ASCII values.
The situation is tricky with strings, though, because there are several types of strings which use the same string syntax options, so you have to interpret their contents according to their context.
Table 35 – String Object Types
Type Description
text string Shall be used for human-readable text, such as text
annotations, bookmark names, article names, and
document information. These strings shall be encoded
using either PDFDocEncoding or UTF-16BE with a
leading byte-order marker.
This type is described in 7.9.2.2, "Text String Type."
PDFDocEncoded string Shall be used for characters and glyphs that are
represented in a single byte, using PDFDocEncoding.
This type is described in 7.9.2.3, "PDFDocEncoded String
Type."
ASCII string Shall be used for characters that are represented in a
single byte using ASCII encoding.
byte string Shall be used for binary data represented as a series of
bytes, where each byte can be any value representable in
8 bits. The string may represent characters but the
encoding is not known. The bytes of the string need not
represent characters. This type shall be used for data
such as MD5 hash values, signature certificates, and Web
Capture identification values.
This type is described in 7.9.2.4, "Byte String Type."
If a string is the value e.g. of the Author metadata, it is a text string, so it is encoded using either PDFDocEncoding or UTF-16BE with a leading byte-order marker.
If on the other hand a string is the value e.g. of Contents in a signature dictionary, it is a byte string holding a binary object, any attempt to interpret it according to some encoding will fail.
The situation is even more tricky with streams.
First of all the stream content may be somehow processed, e.g. it may be compressed. To get to the actual stream contents, you first have to undo this processing.
The the content may either be binary, e.g. a font program, or text, e.g. JavaScript, or it may be a content stream, e.g. the page contents.
A content stream is a PDF stream object whose data consists of a sequence of instructions describing the
graphical elements to be painted on a page. The instructions shall be represented in the form of PDF objects,
using the same object syntax as in the rest of the PDF document.
Thus, they are mostly ASCII values. The exception again are string arguments to text drawing instructions. Their encoding depends entirely on the font currently selected when the string is drawn, and fonts may use standard encodings, but they may also use completely chaotic, ad-hoc encodings.
PS: If you happen to try and analyze an encrypted PDF, you will find that Encryption
applies to all strings and streams in the document's PDF file, with very few exceptions. In particular encryption does not apply to dictionary and array structures, numbers and names. Thus, someone not aware of this might not recognize that the PDF is encrypted but instead assume that strings and streams are encoded in a very weird way.
You find that in the PDF specification (http://www.adobe.com/devnet/pdf/pdf_reference.html). To elaborate a bit on the most important points in your question...
1) PDF dictionaries can contain a variety of value types (booleans, numbers, strings...). The encoding you are going to encounter depends on the type of value.
2) Mostly, the interesting and complex case is that where the type of object is a string.
3) For a string, read section 7.9.2 in the PDF specification. That explains what encodings can be used for such strings (PDFDocEncoding, Unicode encoding...) and how to recognise what encoding you have for a particular string.
To complement #mkl's and #DavidvanDriessche's excellent answers...
Here are three OpenSource command line tools which can help you to transform any PDF into different forms which expand/uncompress/decode object streams (Note, there is not one single, "the-one-and-only-correct" way to do this -- so the outputs of each of the tools will be different):
pdftk
mutool
qpdf
Each of these should be available via your favorite operating systems package manager.
pdftkexample usage:
pdftk in.pdf cat output out1.pdf uncompress
mutool example usage:
mutool clean -d in.pdf out2.pdf
qpdf example usage (my favorite tool for this purpose):
qpdf --qdf --object-streams=disable in.pdf out3.pdf
You should try each of these, compare their outputs for different input PDFs and then decide which one is your favorite (but never forget to remember the other tools when you encounter a case where your favorite shows unexpected results).
Okay, the title isn't very clear.
Given a byte array (read from a database blob) that represents EITHER the sequence of bytes contained in a .dll or the sequence of bytes representing the gzip'd version of that dll, is there a (relatively) simple signature that I can look for to differentiate between the two?
I'm trying to puzzle this out on my own, but I've discovered I can save a lot of time by asking for help. Thanks in advance.
Check if it's first two bytes are the gzip magic number 0x1f8b (see RFC 1952). Or just try to gunzip it, the operation will fail if the DLL is not gzip'd.
A gzip file should be fairly straight forward to determine as it ought to consist of a header, footer and some other distinguishable elements in between.
From Wikipedia:
"gzip" is often also used to refer to
the gzip file format, which is:
a 10-byte header, containing a magic
number, a version number and a time
stamp
optional extra headers, such as
the original file name
a body,
containing a DEFLATE-compressed
payload
an 8-byte footer, containing a
CRC-32 checksum and the length of the
original uncompressed data
You might also try determining if the gzip contains any records/entries as each will also have their own header.
You can find specific information on this file format (specifically the member header which is linked) here.
I need to mix audio files of different types into a single output file through code in my iPad app.
For example, I need to merge a .m4a file with .mp3 or .wav or any other format file.
The resulting output file should be of .m4a type.
I have compiled FFMPEG for iOS with the link: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-October/076618.html
Now, I am not able to understand in which direction to proceed?
This is more of a suggestion than an answer. I don't have any experience with obj-c, though I have worked with audio formats in other languages. Unless you find some library that does this specific task, you may need to decode the files and convert their data to some common numerical representation.
The sample data of a .wav file is stored as signed integers between the range of -32768 to 32767, while mp3 sample data is stored as floating points between the range of -1 to +1. Either representation could be converted to the other through some simple calculation.
mp3ToWavSample = mp3Sample * 32767
Once the data is converted "merging" becomes very easy. You can simply add the sample values together.
mergedSample = convertedSample1 + convertedSample2
You would need to apply this to every sample in the mp3. Depending on the size of your files, this could be a significant processing task.
As for adding reverb to your track, I'd suggest that you ask for help on that in a another question.