Signing with the xades4j library

Is there any working example of how to sign using the xades4j library? Here's what I want to do:
1. Create an XML document
2. Convert some binary data (PDF or DOC file) to base64
3. Put the converted data into the newly created XML document
4. Sign the XML document in XAdES-C or XAdES-T format.
The first three steps are not a big problem. I could not find any useful working example of XAdES signing (Step 4).

The xades4j wiki on GitHub has the documentation you're looking for. In addition, the unit tests on the library source code include examples for many scenarios.
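For reference, the first three steps from the question (build an XML document, base64-encode binary data, embed it) can be sketched as below. This is a minimal illustration in Python rather than Java, and it does not touch xades4j or the signing step at all. The element names `Document` and `Attachment`, the attribute names, and the helper `build_document` are hypothetical, chosen only for this example.

```python
import base64
import xml.etree.ElementTree as ET


def build_document(payload: bytes, filename: str) -> bytes:
    """Wrap binary data in a small XML document as base64 text.

    Element and attribute names here are hypothetical placeholders.
    """
    root = ET.Element("Document")
    attachment = ET.SubElement(
        root, "Attachment", {"filename": filename, "encoding": "base64"}
    )
    # base64 output is plain ASCII, so it is safe as XML text content.
    attachment.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Stand-in bytes; in practice this would be the content of a PDF or DOC file.
xml_bytes = build_document(b"%PDF-1.4 example bytes", "report.pdf")

# Round trip: parse the XML again and recover the original bytes.
decoded = base64.b64decode(ET.fromstring(xml_bytes).find("Attachment").text)
```

The resulting XML is what would then be handed to the signing library for Step 4.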

Related

Pdf generators from an xml template?

Are there any PDF generators, commercial or open source, that can be used for research purposes? I am looking for something like pdfnow.com, or a standalone desktop app, that lets me generate a PDF from an XML template. I have tried researching, but there is a lot of ambiguity going around.
Applidok generates PDFs from an original (raw) PDF, a template definition, and dynamic/user data (e.g. from a form): http://go.applidok.com/en/howitworks.gz.html
The template format there is JSON, not XML, but the approach is the same.

Is there a way to create an intermediate output from Sphinx extensions?

When sphinx processes an rst to html conversion is there a way to see an intermediate format after extensions have been processed?
I am looking for an intermediate rst file that is generated after sphinx extensions were run.
Any ideas?
Take a look at the "ReST Builder" extension: https://pythonhosted.org/sphinxcontrib-restbuilder/.
There's not much to say; the extension takes reST as input and outputs ...drumroll... reST!
Quote:
This extension is in particular useful to use in combination with the autodoc extension. In this combination, autodoc generates the documentation based on docstrings, and restbuilder outputs the result as reStructuredText (.rst) files. The resulting files can be fed to any reST parser, for example, they can be automatically uploaded to the GitHub wiki of a project.

Generating PDF documents from LISP

I want to generate a technical report from Lisp (AllegroCL in my case) and I studied various packages/projects to help me do this.
Requirements:
Need to generate a PDF
May create an intermediate format like RTF, reStructuredText, HTML, Word DOC or LaTeX
Need to be flexible to be able to add content throughout my application
Need to handle Multi-Page, Headers, Footers, Tables, inclusion of Images.
Possibilities:
cl-pdf and cl-typesetting: I checked this one out and it works for now, but is there a better alternative?
Some Latex generator, but ???
Question:
Do you know of alternatives for easily generating (PDF) reports from Lisp? What is the best workflow to go for?
we have been using cl-pdf and cl-typesetting for the last 3 years and they have numerous issues... (like confusion around encodings, or silently not rendering things that don't fit, or...) so, i don't recommend new development based on them.
currently we are in the process of moving all our export mechanisms to open document format. openoffice is all happy with it, and there's a plugin for ms office, too.
there's .fodt, the so called flat open document text format, which is a mere xml file describing a document. generating it is as easy as generating xml files.
you can also make parts of your document read-only with a password (insert a section and mark it read-only and protected by a password. when generating the xml, you can generate random hashes as password...).
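As a rough illustration of how little is needed for a flat open document text file, here is a minimal sketch in Python (not Lisp) that emits the XML for a .fodt document with plain paragraphs. The namespace URIs and the office:document / office:body / office:text / text:p nesting are those of the OpenDocument format. The helper name `build_fodt` is hypothetical, and styles, headers, footers, tables and images are all left out.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Use the conventional prefixes instead of auto-generated ns0/ns1.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def build_fodt(paragraphs):
    """Return the bytes of a flat ODF text document with one text:p per item."""
    root = ET.Element(
        f"{{{OFFICE_NS}}}document",
        {
            f"{{{OFFICE_NS}}}version": "1.2",
            f"{{{OFFICE_NS}}}mimetype": "application/vnd.oasis.opendocument.text",
        },
    )
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for content in paragraphs:
        paragraph = ET.SubElement(text, f"{{{TEXT_NS}}}p")
        paragraph.text = content
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


fodt_bytes = build_fodt(["Technical report", "This paragraph was generated."])
```

Writing `fodt_bytes` to a file with a .fodt extension should give something an office suite can open, though this sketch has not been checked against every version.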

dojo js library + jsdoc -> how to document the code?

How do the people developing dojo create the documentation?
From nightly builds you can get the uncompressed js files with all the comments, and I'm sure there is some kind of documenting script that will generate some html or xml out of it.
I guess they use jsdoc as this can be found in their utils folder, but I have no idea on how to use it. jsDoc toolkit uses different /**commenting**/ notations than the original dojo files.
Thanks for all your help
It's all done with a custom PHP parser and Drupal. If you look in util/docscripts/README and util/jsdoc/INSTALL you can get all the gory details about how to generate the docs.
It's different than jsdoc-toolkit or JSDoc (as you've discovered).
FWIW, I'm using jsdoc-toolkit as it's much easier to generate static HTML and there's lots of documentation about the tags on the google code page.
Also, just to be clear, I don't develop dojo itself. I just use it a lot at work.
There are two parts to the "dojo jsdoc" process. There is a parser, written in PHP, which generates xml and/or json of the entirety of listed namespaces (defined in util/docscripts/modules, so you can add your own namespaces. There are basic usage instructions atop the file "generate.php") and a Drupal part called "jsdoc" which installs as a drupal module/plugin/whatever.
The Drupal aspect of it is just Dojo's basic view of this data. A well-crafted XSLT or something to iterate over the json and produce html would work just the same, though neither of these are provided by default (would love a contribution!). I shy away from the Drupal bit myself, though it has been running on api.dojotoolkit.org for some time now.
The doc parser is exposed so that you may use its inspection capabilities to write your own custom output as well. I use it to generate the Komodo .cix code completion in a [rather sloppy] PHP file util/docscripts/makeCix.php, which dumps information as found into an XML doc crafted to match the spec there. This could be modified to generate any kind of output you chose with a little finagling.
The doc syntax is all defined on the style guideline page:
http://dojotoolkit.org/reference-guide/developer/styleguide.html

Structure of a PDF file? [closed]

For a small project I have to parse PDF files and extract a specific part of them (a simple string of characters). I'd like to use Python to do this and I've found several libraries that are capable of doing what I want in some ways.
But now, after some research, I'm wondering what the real structure of a PDF file is. Does anyone know if there is a spec or some explanation anywhere online? I've found a link on Adobe's site but it seems to be dead :(
Here is a link to Adobe's reference material
http://www.adobe.com/devnet/pdf/pdf_reference.html
You should know though that PDF is only about presentation, not structure. Parsing will not come easy.
I found the GNU Introduction to PDF to be helpful in understanding the structure. It includes an easily readable example PDF file that they describe in complete detail.
Other helpful links:
PDF Succinctly book is longer and has helpful pictures.
Introduction to the Insides of PDF is a presentation that isn't as in-depth but gives a quick overview and has lots of pictures.
When I first started working with PDF, I found the PDF reference very hard to navigate.
It might help you to know that the overview of the file structure is found in syntax, and what Adobe call the document structure is the object structure and not the file structure. That is also found in Syntax. The description of operators is hidden away in Appendix A - very useful for understanding what is happening in content streams. If you ever have the pain of working with colour spaces you will find that hidden in Graphics! Hopefully these pointers will help you find things more quickly than I did.
If you are using windows, pdftron CosEdit allows you to browse the object structure to understand it. There is a free demo available that allows you to examine the file but not save it.
Here's the raw reference of PDF 1.7, and here's an article describing the structure of a PDF file. If you use Vim, the pdftk plugin is a good way to explore the document in an ever-so-slightly less raw form, and the pdftk utility itself (and its GPL source) is a great way to tease documents apart.
I'm trying to do pretty much the same thing. The PDF reference is a very difficult document to read. This tutorial is a better start I think.
This may help shed a little light:
(from page 11 of PDF32000.book)
PDF syntax is best understood by considering it as four parts, as shown in Figure 1:
• Objects. A PDF document is a data structure composed from a small set of basic types of data objects. Sub-clause 7.2, "Lexical Conventions," describes the character set used to write objects and other syntactic elements. Sub-clause 7.3, "Objects," describes the syntax and essential properties of the objects. Sub-clause 7.3.8, "Stream Objects," provides complete details of the most complex data type, the stream object.
• File structure. The PDF file structure determines how objects are stored in a PDF file, how they are accessed, and how they are updated. This structure is independent of the semantics of the objects. Sub-clause 7.5, "File Structure," describes the file structure. Sub-clause 7.6, "Encryption," describes a file-level mechanism for protecting a document’s contents from unauthorized access.
• Document structure. The PDF document structure specifies how the basic object types are used to represent components of a PDF document: pages, fonts, annotations, and so forth. Sub-clause 7.7, "Document Structure," describes the overall document structure; later clauses address the detailed semantics of the components.
• Content streams. A PDF content stream contains a sequence of instructions describing the appearance of a page or other graphical entity. These instructions, while also represented as objects, are conceptually distinct from the objects that represent the document structure and are described separately. Sub-clause 7.8, "Content Streams and Resources," discusses PDF content streams and their associated resources.
Looks like navigating a PDF file will require a little more than a passing effort.
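To make the split between objects and file structure a bit more concrete, here is a deliberately naive Python sketch that scans raw PDF bytes for the header line, the "N G obj" markers, and the /Root reference in the trailer. The `SAMPLE` bytes and the `outline` helper are made up for this example. A real parser has to follow the cross-reference table and cope with object streams and binary stream data, none of which this does.

```python
import re

# Hand-written fragment for illustration; not a complete, valid PDF.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
)


def outline(pdf: bytes) -> dict:
    """Naively list the header, indirect objects and root reference."""
    header = pdf.split(b"\n", 1)[0].decode("ascii")
    # Indirect objects start with "<number> <generation> obj" on their own line.
    objects = [
        (int(number), int(generation))
        for number, generation in re.findall(rb"(?m)^(\d+) (\d+) obj\b", pdf)
    ]
    # The trailer dictionary points at the document catalog via /Root.
    root = re.search(rb"/Root (\d+) (\d+) R", pdf)
    root_ref = (int(root.group(1)), int(root.group(2))) if root else None
    return {"header": header, "objects": objects, "root": root_ref}


summary = outline(SAMPLE)
```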
If you want to parse PDFs using Python, please have a look at PDFMiner. It is the best library for parsing PDF files to date.
Didier Stevens has a tool for parsing PDFs:
http://didierstevens.com/files/software/pdf-parser_V0_4_3.zip
or here:
http://blog.didierstevens.com/programs/pdf-tools/ which catalogs several related PDF analysis tools.
Another tool is here:
http://mshahzadlatif.wordpress.com/2011/09/28/view-pdf-structure-using-adobe-acrobat-or-a-free-tool-called-pdfxplorer/
Extracting text from PDF is a hard problem because PDF has such a layout-oriented structure. You can see the docs and source code of my barely-successful attempt on CPAN (my implementation is in Perl). The PDF data structure is very cool and well designed, but it's easier to write than read.
One way to get some clues is to create a PDF file consisting of a blank page. I have CutePDF Writer on my computer, and made a blank one-page Wordpad document. I printed it to a .pdf file and then opened the .pdf file using Notepad.
Next, use a copy of this file and eliminate lines or blocks of text that might be of interest, then reload in Acrobat Reader. You'd be surprised at how little information is needed to make a working one-page PDF document.
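In the same spirit, a blank one-page PDF can be written by hand. The Python sketch below is one minimal way to do it, assuming an uncompressed PDF 1.4 file with a classic cross-reference table; the function name `build_blank_pdf` and the default US Letter page size are choices made for this example. It emits three objects (catalog, page tree, page), then the xref table, the trailer, and the startxref pointer.

```python
def build_blank_pdf(width: int = 612, height: int = 792) -> bytes:
    """Build a one-page blank PDF with a classic xref table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << >> "
        b"/MediaBox [0 0 %d %d] >>" % (width, height),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf_bytes = build_blank_pdf()
```

Writing `pdf_bytes` to a file in binary mode gives something to open in a text editor next to the reference manual.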
I'm trying to make up a spreadsheet to create a PDF form from code.
You need the PDF Reference manual to start reading about the details and structure of PDF files. I suggest starting with version 1.7.
On Windows I used a free tool, PDF Analyzer, to see the internal structure of PDF files.
This will help in your understanding when reading the reference manual.
(I'm affiliated with PDF Analyzer, no intention to promote)
To extract text from a PDF, try this on a Linux, BSD, etc. machine, or use Cygwin if on Windows:
pdftotext -layout some_pdf_file.pdf
A plain text file named some_pdf_file.txt is created. The simpler the PDF file layout, the more straightforward the .txt file output will be.
Octal escape codes (a backslash followed by three digits) are frequently present in the .txt file output and will look strange in text editors. These codes usually represent curly single and double quotes, bullet points, hyphens, etc. in the PDF.
To see the context where the octal codes appear, run this grep command, and keep the original PDF handy to see what character the codes represent in the PDF:
grep -a --color=always "\\\\[0-9][0-9][0-9]" some_pdf_file.txt
This will provide a unique list of the different octal codes in the document:
grep -ao "\\\\[0-9][0-9][0-9]" some_pdf_file.txt|sort|uniq
To convert these octal codes to ASCII equivalents, a combination of grep, sed, and bc can be used; I'll post the procedure to do that soon.
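Until that procedure is posted, here is one possible way to do the conversion, sketched in Python instead of grep, sed, and bc. It is not the procedure referred to above. It assumes the codes are single-byte values in Windows-1252 (cp1252), which is a guess that fits curly quotes and bullets but may not match every PDF. The function name `decode_octal_escapes` is made up for this example.

```python
import re

# A backslash followed by three octal digits, limited to one byte (000-377).
OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")


def decode_octal_escapes(text: str, encoding: str = "cp1252") -> str:
    """Replace \\ddd octal escapes with the character they encode.

    The cp1252 default is an assumption; pass another codec if needed.
    """

    def replace(match: re.Match) -> str:
        byte_value = int(match.group(1), 8)
        # Bytes with no mapping in the codec become U+FFFD rather than failing.
        return bytes([byte_value]).decode(encoding, errors="replace")

    return OCTAL_ESCAPE.sub(replace, text)


cleaned = decode_octal_escapes(r"It\222s a \223quoted\224 word")
```

Replacing the curly characters with plain ASCII quotes and hyphens afterwards would be a separate lookup step, left out here.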