What is the `level` argument in zlib Deflate? - pdf

I am trying to parse a PDF in C and I am using the zlib library for deflating. In the documentation, the zlibInit function takes an argument which is labeled as level. What is the purpose of that argument and how can I obtain it from the PDF stream?

If the function takes a level argument, then you are using the wrong function for parsing a PDF. level is only used for compression, and you cannot obtain it from the PDF stream because decompression does not need it. Compression in zlib is called "deflate". Decompression in zlib is called "inflate". You want to inflate (inflateInit, inflate, inflateEnd), not deflate.
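As a rough illustration of the difference, here is a minimal sketch in Python, whose zlib module wraps the same library. The sample bytes and the helper name inflate_stream are made up for the example; in a real parser the input would be the bytes between "stream" and "endstream" of an object whose /Filter is /FlateDecode.

```python
import zlib

# Hypothetical stand-in for the uncompressed content of a PDF stream.
original = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 20

# Compression (deflate) is the only place a level is chosen: 1 = fast, 9 = best.
fast = zlib.compress(original, 1)
best = zlib.compress(original, 9)


def inflate_stream(data: bytes) -> bytes:
    """Decompress zlib-format data. Note that no level is passed or needed."""
    return zlib.decompress(data)


# Both streams inflate to the same bytes, whatever level produced them.
print(inflate_stream(fast) == inflate_stream(best) == original)
```

The same holds in C: deflateInit takes a level, inflateInit does not.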

Related

How to extract the encoding dictionary from gzip archives

I am looking for a way to extract the encoding dictionary built by the DEFLATE algorithm from a gzip archive.
I need the LZ77 pointers from the whole archive, which refer to patterns in the file, as well as the Huffman tree used with those pointers.
Is there any solution in Python?
Does anyone know whether infgen (https://github.com/madler/infgen/blob/master/infgen.c) might provide the dictionary?
The "dictionary" used for compression at any point in the input is nothing more than the 32K bytes of uncompressed data that precede that point.
Yes, infgen will disassemble a deflate stream, showing all of the LZ77 references and the derived Huffman codes in a readable form. You could run infgen from Python and interpret the output in Python.
infgen also has a -b option for a non-human-readable binary format that might be faster to process for what you want to do.
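To illustrate the point that a deflate "dictionary" is only earlier uncompressed bytes, here is a small sketch using zlib's preset dictionary feature (the zdict parameter in Python), which primes the window with bytes you supply. This is not what infgen does, and the sample data and the helper names deflate and inflate are invented for the example.

```python
import zlib

# Hypothetical sample data: "history" plays the role of the preceding bytes.
history = b"the quick brown fox jumps over the lazy dog; "
payload = b"the lazy dog jumps over the quick brown fox"


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress data, optionally priming the window with zdict."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Decompress data produced by deflate() with the same zdict."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


plain = deflate(payload)             # no earlier bytes to point back into
primed = deflate(payload, history)   # LZ77 matches can point into history

print(len(plain), len(primed))
```

The primed stream comes out shorter because the compressor can emit back-references into the supplied bytes, just as it would into the previous 32K of a longer input.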

How to match zlib streams between VBA 6/VBA 7 and Java 8?

We are able to do the following.
In VBA 6 / VBA 7:
Reference a 32-bit zlibwapi.dll (VBA 6) or a 64-bit zlibwapi.dll (VBA 7).
Invoke compress() or compress2() to generate compressed streams.
Invoke uncompress() and uncompress2() to decompress compressed streams.
In Java 8 (JDK 1.8 on Tomcat 8):
Have a simple Java program that compresses data using a new Deflater() instance.
Have a simple Java program that decompresses using an Inflater() instance.
We are failing when VBA sends out the compressed stream for Java Servlet to uncompress or when Java Servlet sends out compressed response data for VBA to decompress.
We are aware of the following facts:
There are 3 formats provided by zlib (raw, zlib and gzip).
The compress() and compress2() functions in zlibwapi.dll generate compressed bytes in zlib format. This has been mentioned in a similar thread, "Java decompressing array of bytes".
An Inflater() instance on the Java side can uncompress zlib format data, as per a code sample posted at "Compression / Decompression of Strings using the deflater".
Java 8 has zlib version 1.2.5 integrated as part of the java.util.zip package.
We have ensured that we are using zlibwapi.dll version 1.2.5 on the VBA side as well.
We have tried to use Hex editors to compare byte streams of compressed data independently generated by VBA and Java as well. We notice some difference in the generated compressed data. We think it is this difference that is causing both the environments to misunderstand each other.
Additionally, we think that when communication occurs, there has to be some common charset that governs the encoding/decoding scheme between both the endpoints. We have even tried to compare the hex code of byte stream generated by VBA and communicated across to Java Servlet.
The bytes seem to be getting some additional 0 bytes inserted in between the actual compressed bytes while communication occurs. This happens on the VBA side, maybe because of some Unicode interpretation.
Whatever bytes get communicated across to Java appear entirely different in their representation.
We need to fix our independently working code so that the two sides can communicate and decompress each other's data. We think there are 2 things to address: getting the format to match, and using a charset that sends bytes as is. We are looking for any assistance from experts that can help us find the correct path to a solution. We need answers to the following:
Does compress2() or compress() really generate zlib format?
Which charset will allow us to send bytes as is (if there are 10 bytes, we want to send 10 bytes, not 20)? If it's Unicode, 0 bytes get inserted in between (10 bytes become 20 bytes because of this).
Yes, compress() and compress2() generate the zlib format.
None. Don't send characters. Send bytes.
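Both points can be checked with a short sketch. It is written in Python rather than VBA or Java, purely to show the byte-level behaviour. The helper looks_like_zlib is a made-up name; the header rule it applies comes from RFC 1950. The UTF-16 step only simulates what seems to be happening on the VBA side and is an assumption about the cause.

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Check the 2-byte zlib header described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble of CMF is the method (8 = deflate), and the two header
    # bytes read as a big-endian number must be a multiple of 31.
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0


compressed = zlib.compress(b"hello, hello, hello")

# Simulated bug: treat each byte as a character, then encode as UTF-16.
as_text = compressed.decode("latin-1")
widened = as_text.encode("utf-16-le")

print(looks_like_zlib(compressed))      # header check on the real bytes
print(len(compressed), len(widened))    # the widened copy is twice as long
```

Every second byte of the widened copy is 0, which matches the symptom described in the question. Writing the raw byte array to the request or response body, with no string conversion on either side, avoids it.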

Convert binary streams using the Ghostscript API

How do I convert a PDF held in memory as a binary blob into a JPEG written to the same memory location, using only the Ghostscript API (for example, gsdll32.dll)?
The documented interface works with files on disk by default:
args = [
"-dFirstPage=10",
"-dLastPage=10",
"-sDEVICE=jpeg",
"-r300",
"-sOutputFile=book.jpg",
"-dNOPAUSE",
"test2.pdf"
]
Basically, you can't. The Ghostscript PDF interpreter assumes that it's dealing with a file on disk (see for example the definitions of runpdf and runpdfbegin in pdf_main.ps). Possibly you could convert the data into a stream and pass that instead, but it looks like a lot of work to me, all in PostScript.
You definitely can't have the JPEG output written to the same memory location.

Is there a way to create an intermediate output from Sphinx extensions?

When Sphinx converts rst to HTML, is there a way to see an intermediate format after extensions have been processed?
I am looking for an intermediate rst file that is generated after sphinx extensions were run.
Any ideas?
Take a look at the "ReST Builder" extension: https://pythonhosted.org/sphinxcontrib-restbuilder/.
There's not much to say; the extension takes reST as input and outputs ...drumroll... reST!
Quote:
This extension is in particular useful to use in combination with the autodoc extension. In this combination, autodoc generates the documentation based on docstrings, and restbuilder outputs the result are reStructuredText (.rst) files. The resulting files can be fed to any reST parser, for example, they can be automatically uploaded to the GitHub wiki of a project.

Surefire way of determining the codec of a media file

I'm looking for a surefire way of determining the codec used in an audio or video file. The two things I am currently using are the file extension (obvious), and the mime type as determined by running `file -ib` on the file.
This doesn't seem to get me all the way there: loads of formats are 'wrapper' formats that hide the exact codec used within -- for example, '.ogg' files can internally use the Vorbis, Speex, or FLAC codecs. Their MIME type is also usually hidden under 'application/ogg' or similar.
The `file` program is apparently able to tell me which codec is used, but it returns this as human-readable prose:
kb.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~0 bps
and as such it is dodgy to use programmatically.
What I'm essentially asking is: is there a script out there (any language) that can wade through these wrapper formats and tell me what the meat of the file is made of?
ffmpeg includes a library called libavformat that can open and demux pretty much any media format. Obviously that's more than you actually need, but I don't think you can find anything else that's quite as complete. I've used it myself with great success. Take a look at this article for an introduction. There are also bindings for these libraries for some common scripting languages, such as Python.
(If you don't want to build something using the library, you can probably use the regular ffmpeg binary.)
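One way to go the command-line route is sketched below. It assumes the ffprobe tool that ships alongside the ffmpeg binary is installed and on the PATH. The answer above does not name ffprobe, so treat that choice as an assumption. The function names ffprobe_command, parse_codecs and probe_codecs are invented for the example.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list:
    """Build an ffprobe call that reports each stream's type and codec as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=codec_type,codec_name",
        "-of", "json",
        path,
    ]


def parse_codecs(ffprobe_json: str) -> list:
    """Turn ffprobe's JSON output into (codec_type, codec_name) pairs."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [(s.get("codec_type"), s.get("codec_name")) for s in streams]


def probe_codecs(path: str) -> list:
    """Run ffprobe on a file (requires ffprobe to be installed)."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_codecs(result.stdout)
```

Calling probe_codecs("kb.ogg") would return one pair per stream in the container, which is easier to consume programmatically than the prose printed by file.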
You can always use your own magic file, copied and modified from the pre-installed magic file, and change the return string so that it can be easily parsed by your program.
See:
http://linux.die.net/man/1/file
http://linux.die.net/man/5/magic