Prevent unprocessed DOT files from being deleted when using Doxywizard - documentation

I'm using the Doxygen wizard (Doxywizard) to generate documentation for a large embedded C project. I am able to generate graphs and class diagrams using dot and Graphviz. However, I would like to edit some dependency graphs by hand, since they contain information that is not always needed, e.g. when the graph depth is too great.
I noticed while running Doxywizard that before the diagram files are generated and saved as PNG files, "raw files", for lack of a better word, are created that hold the code used to generate the graphs with Graphviz. These are DOT files that can be opened in a text editor. They are deleted once the diagram images have been generated.
I was able to access them by stopping Doxywizard mid-process, before they were deleted. Is there any way to prevent these DOT files from being deleted?

Doxygen has the possibility to retain the dot files. In the Doxygen settings file (Doxyfile) there is the setting DOT_CLEANUP; setting this to DOT_CLEANUP = NO will retain the files.
From the documentation:
DOT_CLEANUP
If the DOT_CLEANUP tag is set to YES, doxygen will remove the intermediate files that are used
to generate the various graphs.
Note: This setting is not only used for dot files but also for msc temporary files.
The default value is: YES
See also: https://www.doxygen.nl/manual/config.html and more specifically: https://www.doxygen.nl/manual/config.html#cfg_dot_cleanup
And in Doxywizard the setting can be found under:
the Expert tab
the Topics pane, last item: Dot
the last item in the list of settings: DOT_CLEANUP
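With DOT_CLEANUP disabled, the retained .dot files can be edited by hand and turned back into images with the dot tool that ships with Graphviz. A minimal sketch (the file names here are hypothetical; the actual names depend on your project):

    # In the Doxyfile (or via the Expert tab in Doxywizard):
    DOT_CLEANUP = NO

    # After running doxygen, edit the retained .dot file, then regenerate:
    dot -Tpng html/main_8c__incl.dot -o html/main_8c__incl.png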

Related

Batch annotation for similar PDF files

I have many PDF files with the same layout, in which I need to fill in some details with plain text and sign in certain places with a company stamp.
I am trying to automate things, so I am after a way to fill in the details and stamp the first file, and then "copy-paste" that operation to all the other files.
I managed to find:
1. online websites allowing me to annotate one file at a time
2. documentation on how to use pdftk and additional tools to create a script, but that can involve a lot of manual operations on the command line (e.g. scaling the signature JPEG file, positioning it, adding text in the right location, etc.), which is very tedious.
Is there any way to annotate the first file using visual tools (like #1) and then extract a script of commands that performs these annotations from the command line?
I could then use that script on multiple files.
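For reference, the kind of per-file operation I imagine the script performing looks roughly like this: a sketch assuming pdftk's stamp operation and a pre-made overlay.pdf that already contains the text and stamp image (the file names are made up):

    # Overlay a prepared stamp/text page onto every PDF in the directory:
    for f in *.pdf; do
        pdftk "$f" stamp overlay.pdf output "stamped_$f"
    done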

Preserve directory structure when unpacking attachments from PDF with pdftk?

I am trying to pack and unpack attachments including a subdirectory hierarchy to a PDF with pdftk ... attach_files and pdftk ... unpack_files. However, while attach_files is capable of representing the subdirectory information by including the / separator in file names, unpack_files puts all files into one flat directory, silently overwriting files if the same name occurs multiple times. Is it possible to get preservation of the hierarchy when unpacking?
As workarounds I have used:
Packing the attachments into a zip file and attaching the zip file. However, this way the attachment hierarchy is no longer easily accessible.
Applying a bijective transformation to the path names that maps the hierarchy to a flat structure and back. However, this way unpacking is possible only with a script that undoes the transformation.
Being directly able to preserve the hierarchy information already stored in the PDF would be preferable.
Unfortunately not with the current version of pdftk: it is hardcoded to drop path information both when attaching and when unpacking files. In fact, I would be surprised if any hierarchy information were stored in the PDF by pdftk at all.
That being said, it would not be too hard to write a patch that changes this behaviour; I suggest opening an issue with a feature request.
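In the meantime, the zip workaround mentioned in the question can at least be scripted. A sketch (all file and directory names are hypothetical):

    # Pack the hierarchy, attach the archive, and round-trip it back out:
    zip -r attachments.zip docs/
    pdftk report.pdf attach_files attachments.zip output report_with_files.pdf
    pdftk report_with_files.pdf unpack_files output extracted/
    unzip extracted/attachments.zip -d restored/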

Is it possible to reuse docs generated by Doxygen and merge with new documentation?

Since the codebase is quite large, Doxygen takes a really long time to run. If I could obtain the modified files from some version control system and run Doxygen on them, is it possible to merge the existing documentation with the newly generated pages?
If so, how can this be done?
Doxygen does not have an incremental build, though there are some mechanisms that do speed up the generation slightly:
Generated images (e.g. call graphs and inheritance graphs) are "cached": in short, an md5sum is stored, and as long as it does not change for a graph, the image is not regenerated.
For "independent" parts of a project it is possible to create "tag" files (see the documentation of TAGFILES), as sketched below.

Track and output changes/diffs between LaTeX document revisions to PDF?

We want to keep track of changes in a LaTeX document in such a way that people who can't read LaTeX can also see the changes at once. The .tex files are stored in a git repository. So detailed information about the changes is available.
For this purpose I think it might be possible to use the git diff output between two revisions to generate a PDF that somehow marks the changes relative to the selected earlier revision of the document.
Do you know of an (easy) way to achieve this?
Do you know of other ways to visualize differences between PDF files?
[Expanding on my comment, since it apparently helped :-) ]
latexdiff is a Perl script that can diff two LaTeX documents and mark up changes without the distractions of the LaTeX markup itself. The README says:
latexdiff is a Perl script, which compares two latex files and marks
up significant differences between them (i.e. a diff for latex files).
Various options are available for visual markup using standard latex
packages such as "color.sty". Changes not directly affecting visible
text, for example in formatting commands, are still marked in
the latex source. Note that only files conforming to latex syntax will
be processed correctly, not generic TeX files. Some further
minor restrictions apply, see documentation.
A rudimentary revision facility is provided by another Perl script,
latexrevise, which accepts or rejects all changes. Manual
editing of the difference file can be used to override this default
behaviour and accept or reject selected changes only.
The author is F. Tilmann.
The project is developed on GitHub, but you can get the script in a tarball from CTAN if you prefer. The link in the comment is a useful overview of how to use it.
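A minimal usage sketch (the file names are made up; latexdiff-vc is bundled with latexdiff):

    # Compare two versions of a document and mark the changes:
    latexdiff old.tex new.tex > diff.tex
    pdflatex diff.tex    # the resulting PDF shows additions and deletions marked up

    # latexdiff-vc can fetch the old revision straight from git:
    latexdiff-vc --git -r HEAD~1 paper.tex    # writes a marked-up paper-diff*.tex

Since your .tex files are already in a git repository, the latexdiff-vc route matches your workflow directly.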

Extract embedded PDF file without a full parse

I want to build a utility to extract embedded files from a PDF (see section 7.11.4 of the spec). However I want the utility to be "small" and not depend on a full PDF parsing framework. I'm wondering if the file format is such that a simple tool could scan through the document for some token or sequence, and from that know where to start extracting the embedded file(s).
Potential difficulties include the possibility that the token or sequence you scan for could validly exist elsewhere in the document, leading to spurious matches or corrupt extraction.
I'm not that familiar with the PDF spec, and so I'm looking for
confirmation that this is possible
a general approach that would work
There are at least two scenarios that are going to make your life difficult: encrypted files, and object streams (a compressed object that contains a collection of objects inside).
Regarding the second item (object streams): some PDF generation tools will take most of the objects (dictionaries) inside a PDF file, put them inside a single object, and compress that single object (usually with deflate compression). This means that you cannot just skim through a PDF file looking for some particular token in order to extract the piece of information you need while ignoring the rest. You will need to actually interpret the structure of PDF files, at least partially.
Note that the embedded files you want to extract are very likely to be compressed as well, even if an object stream is not used.
Your program will need to be able to do at least the following:
- Processing xref tables
- Processing object streams
- Applying decoding/decompression filters to a data stream.
Once you are able to get all objects from the file, you could in theory go through all of them looking for dictionaries of type EmbeddedFile. This approach has the disadvantage that you might extract files that are no longer referenced from anywhere inside the document (because a user deleted them at some point in the file's history, for example).
Another approach would be to navigate the structure of the file, looking for embedded files in the locations specified by the PDF spec. You can find embedded files in at least the following elements (this list is off the top of my head; there may well be more than these):
- Names dictionary
- Document outlines
- Page annotations
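To make the limitations concrete, here is a minimal sketch of the naive "skim for a token" approach (my own illustration, not a recommended implementation). It only works on unencrypted files whose embedded-file streams are top-level objects stored raw or with FlateDecode, which is exactly why real xref and object-stream processing is needed in the general case:

    import re
    import sys
    import zlib

    def extract_embedded_files(pdf):
        """Naively scan raw bytes for /Type /EmbeddedFile stream objects.

        Only handles unencrypted PDFs whose embedded-file streams are
        top-level objects, stored raw or with FlateDecode. Streams packed
        into object streams, other filters, and encryption all defeat it.
        """
        results = []
        for m in re.finditer(rb"/Type\s*/EmbeddedFile(?![A-Za-z])", pdf):
            # Locate the stream data that follows the object's dictionary.
            kw = re.compile(rb"stream\r?\n").search(pdf, m.end())
            if not kw:
                continue
            end = pdf.find(b"endstream", kw.end())
            if end < 0:
                continue
            data = pdf[kw.end():end].rstrip(b"\r\n")
            # Crude filter detection: look near the dictionary for /FlateDecode.
            window = pdf[max(0, m.start() - 200):kw.start()]
            if b"/FlateDecode" in window:
                try:
                    data = zlib.decompress(data)
                except zlib.error:
                    continue  # likely inside an object stream, or damaged
            results.append(data)
        return results

    if __name__ == "__main__":
        with open(sys.argv[1], "rb") as f:
            blobs = extract_embedded_files(f.read())
        for i, blob in enumerate(blobs):
            with open("embedded_%d.bin" % i, "wb") as out:
                out.write(blob)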