OpenHTMLtoPDF messes up PDF display when built natively

I am using https://github.com/danfickle/openhtmltopdf to convert HTML to PDF. (It uses Apache PDFBox internally, and that is where I guess the problem is.)
Everything works well in development mode (I am using Quarkus).
When I run mvn clean quarkus:dev and generate the PDF, it displays properly (with the HTML table and all), as expected.
However, when I build natively (mvn package -Pnative) and then generate the PDF, the display is all messed up: everything is rendered as one string, and it fails to understand the CSS too.
The hard part is that I don't see any errors, so I can't figure out what's going wrong.
The HTML-to-PDF code is really basic:
// openhtmltopdf: render the HTML string straight to the OutputStream
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withHtmlContent(htmlContent, null); // htmlContent is well-formed XHTML; null = no base URI
builder.toStream(os);                       // os is the target OutputStream
builder.run();
Pass any well-formed HTML string and you will see the difference between the two execution styles.
Why is this happening?

Hard to tell without looking into the library code.
Quarkus does its best to integrate libraries and make them work in native mode by adding the pieces that native mode requires. For the native mode limitations, please check this link: https://www.graalvm.org/reference-manual/native-image/Limitations/.
Some libraries may work out of the box without additional configuration. Others require additional configuration to comply with the native image rules.
I suspect that the library may be using some sort of reflection that is not registered in the native image. Maybe have a look into that first and register the additional classes and resources: https://quarkus.io/guides/writing-native-applications-tips
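If reflection does turn out to be the problem, a minimal sketch of registering classes in Quarkus could look like the following. Which classes actually need registering is exactly what has to be investigated; PdfRendererBuilder below is only an illustrative stand-in, and an equivalent reflection-config.json passed to the native image would also work.

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import io.quarkus.runtime.annotations.RegisterForReflection;

// Keeps reflective access to the listed classes in the native image.
// The target below is only an example, not a known-good list for
// openhtmltopdf/PDFBox; the real culprits have to be identified first.
@RegisterForReflection(targets = { PdfRendererBuilder.class })
public class PdfReflectionConfiguration {
}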

Related

How to detect images in a document

How can I detect images in a document, say DOC, XLS, PPT, or PDF?
I came across Apache Tika, and I am trying its command-line option.
http://tika.apache.org/1.2/gettingstarted.html
But I am not quite sure how it will detect images.
Any help is appreciated.
Thanks
You've said you want to use a command-line solution and not write any Java code, so it's not going to be the prettiest way to do it... If you are happy to write a little bit of Java and create a new program to call from Python, then you can do it much more nicely!
The first thing to do is to have the Tika App extract out any embedded resources within your file. Use the --extract option for this, and have the extraction occur in a special temp directory your app controls, e.g.
$ java -jar tika.jar --extract ../testWORD_embedded_pdf.doc
Extracting 'image1.emf' (application/x-emf)
Extracting '_1402837031.pdf' (application/pdf)
Grab the output of the extraction if you can, and parse that looking for images (but be aware that some images have an application/ prefix on their canonical MIME type!). You might need to run a second --detect step on a few; I'm not sure, so test how the parsers get on with the extraction.
Now, if there were images, they'll be in your temp dir. Process them as you want. Finally, zap the temp dir when you're done with the file!
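If you do need that second --detect pass on any of the extracted files, it is just the same app pointed at the file, which prints the detected MIME type (using the tika.jar and the extracted file name from the example above):
java -jar tika.jar --detect image1.emf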
Having used Tika in the past, I originally answered that I couldn't see how Tika could help with images embedded within Office documents or PDFs, but I was wrong to answer "no". You may still have to fall back to the underlying APIs such as Apache POI and Apache PDFBox; Tika uses both libraries to parse text and metadata, but it originally had no embedded image support.
Using Tika makes these APIs automatically available (a side effect of using Tika).
UPDATE:
Since Tika 0.8, look for EmbeddedResourceHandler and its examples (thanks to Gagravarr).
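If you are willing to write that little bit of Java after all, here is a rough sketch of counting embedded images with Tika's EmbeddedDocumentExtractor. It assumes a Tika version recent enough to expose that API; the class name and the image/ check are illustrative (remember that some images report an application/ type instead):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class ImageDetector {

    public static void main(String[] args) throws Exception {
        final int[] imageCount = { 0 };

        // Called once per embedded resource; we only look at the metadata.
        EmbeddedDocumentExtractor extractor = new EmbeddedDocumentExtractor() {
            public boolean shouldParseEmbedded(Metadata metadata) {
                return true; // inspect every embedded resource
            }

            public void parseEmbedded(InputStream stream, ContentHandler handler,
                    Metadata metadata, boolean outputHtml)
                    throws SAXException, IOException {
                String type = metadata.get(Metadata.CONTENT_TYPE);
                String name = metadata.get(Metadata.RESOURCE_NAME_KEY);
                // Note: some embedded images carry an application/ type instead.
                if (type != null && type.startsWith("image/")) {
                    imageCount[0]++;
                    System.out.println("Found image: " + name + " (" + type + ")");
                }
            }
        };

        ParseContext context = new ParseContext();
        context.set(EmbeddedDocumentExtractor.class, extractor);

        AutoDetectParser parser = new AutoDetectParser();
        try (InputStream in = new FileInputStream(args[0])) {
            // -1 disables the write limit on the text handler.
            parser.parse(in, new BodyContentHandler(-1), new Metadata(), context);
        }
        System.out.println("Images found: " + imageCount[0]);
    }
}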

Using Apparat dump with FDT and ant

I am totally new to Flash development; I don't even know ActionScript yet.
I have to improve an existing Flash application, so first I need to understand the code.
I want to use some tool for code analysis, something to visualize class dependencies and code structure. I googled and found the Apparat tool. Now I'm struggling with it because I cannot find documentation that describes how to use Apparat. I'm frustrated, but it seems to be the only such tool.
So I started with the example.
I've set up Apparat running on FDT following this guide:
http://www.webdevotion.be/blog/2010/06/02/how-to-get-up-and-running-with-apparat/
The example (http://blog.joa-ebert.com/2010/05/26/new-apparat-example/) builds well and creates two SWF files. (I'm using the Ant builder.)
Now I want to analyze an existing SWF and see a PNG with the class dependencies.
How should I do that?
What do I have to add and where?
Or maybe someone can explain how to use dump from the Windows command line? Something like:
dump example.swf exampleAnalysis.png
After resolving all dependencies (which was tricky), I managed to get dump running:
dump -i example.swf -uml
But it saves the UML diagram in .DOT format, which is really hard to read: Graphviz GVEdit cannot zoom and exports to PNG only what you see (a messy, impossible-to-read, zoomed-out graph), Smyrna doesn't work, and ZGRViewer fails to load some files.
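One thing worth trying (assuming the Graphviz command-line tools are installed and the dump produced, say, exampleAnalysis.dot) is to render the .DOT file with dot itself, which can emit a zoomable SVG instead of a fixed PNG:
dot -Tsvg exampleAnalysis.dot -o exampleAnalysis.svg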

Providing an embedded webkit with resources from memory

I'm working on an application that embeds WebKit (via the Gtk bindings). I'm trying to add support for viewing CHM documents (Microsoft's bundled HTML format).
HTML files in such documents have links to images, CSS, etc. of the form "/blah.gif" or "/layout.css", and I need to catch these to provide the actual data. I understand how to hook into the "resource-request-starting" signal, and one option would be to unpack parts of the document to temporary files and change the URI at this point to point at those files.
What I'd like to do, however, is provide WebKit with the relevant chunk of memory. As far as I can see, you can't do this by catching resource-request-starting, but maybe there's another way to hook in?
An alternative is to base64-encode the image into a data: URI. It's not exactly better than using a temporary file, but it may be simpler to code.
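A minimal sketch of building such a data: URI, written in plain Java just to illustrate the encoding (the real application would do this from whatever language drives its WebKit/Gtk embedding, and the file name here is only a placeholder for bytes pulled from the CHM archive in memory):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class DataUriExample {
    public static void main(String[] args) throws Exception {
        // In the real application these bytes would come straight from the CHM archive.
        byte[] bytes = Files.readAllBytes(Paths.get("blah.gif"));
        String dataUri = "data:image/gif;base64,"
                + Base64.getEncoder().encodeToString(bytes);
        // Hand dataUri to WebKit wherever the original "/blah.gif" URI was requested.
        System.out.println(dataUri.substring(0, Math.min(80, dataUri.length())));
    }
}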

Dojo vs Dijit - files to include or reference?

I've been reading the O'Reilly book "Dojo: The Definitive Guide", but some things are still not definitive to me.
They talk about "bootstrapping" and getting dojo.css from the AOL CDN.
When I'm testing on my machine, should I use the CDN? Or should I wait and use that only when I deploy?
Secondly, the book talks about a CDN for Dojo, but not for Dijit.
I'm developing on Google App Engine (GAE), so having the 2000+ Dojo/Dijit files in my JavaScript directory is a little annoying, because it slows down my upload to GAE each time.
Firebug is giving me this error:
GET http://localhost:8080/dijit/nls/dijit-all_en-us.js 404 not Found
GET http://localhost:8080/dijit/_editor/plugins/FontChoice.js 404 not Found
I downloaded the sample from here:
http://archive.dojotoolkit.org/nightly/dojotoolkit/dijit/themes/themeTester.html?theme=soria
and I'd like to "simply" get it to run on my machine under the local Google App Engine (which is the localhost:8080 that you see in the URLs above).
I see this statement which probably is causing the second 404 above:
dojo.require("dijit._editor.plugins.FontChoice");
One other error:
cannot access optimized closure
preload("en-us") dijit-all.js (line 479)
anonymous("dijit.nls.dijit-all", ["ROOT", "ar", "ca", 40 more... 0=ROOT 1=ar 2=ca 3=cs 4=da 5=de 6=de-de 7=el 8=en 9=en-gb])dijit-all.js (line 489)
dijit-all.js()
dojo.i18n._searchLocalePath(locale, true, function(loc){\n
To continue for now, I'm going to try to copy the entire dijit library, but is there a solution short of that?
My current script includes look like this:
<script type="text/javascript" src="/javascript/dijit.js"></script>
<script type="text/javascript" src="/javascript/dijit-all.js" charset="utf-8"></script>
I got the dijit.js file by copying and renaming dijit.js.uncompressed.js to dijit.js.
You have a few options actually:
You could use the CDN for everything (though using the full source locally does give you better error messages). Google has them as well; Dijit is here: http://ajax.googleapis.com/ajax/libs/dojo/1.3.2/dijit/dijit.js, FYI. This has many advantages in my opinion, user caching of the JS being the primary one.
Build a layered file. I think the O'Reilly book has a section about it but the PragProg book is better in this regard IMO. There's also this doc on dojocampus.org about building. This will trim down the files you need to upload to GAE and speed up your app loading. This is actually what I do in order to cut down on HTTP requests.
Keep doing what you are doing. :)
Regarding the errors you are seeing: the 404s for the en-us files are essentially harmless. Here's a better description.
You also might be reloading dijit files by using dijit.uncompressed.js and dijit-all.js and causing problems in the process...but I'm not sure about this one.
I just want to clarify that when using the CDN, all you need to include is the main Dojo script. The rest will be pulled in automatically when you dojo.require() them.
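In other words, something like the following is enough (a minimal sketch: the 1.3.2 cross-domain loader URL mirrors the Dijit URL quoted above, and the required module is just the one from your error):

<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/dojo/1.3.2/dojo/dojo.xd.js"></script>
<script type="text/javascript">
    // Dijit (and DojoX) modules are fetched from the CDN on demand.
    dojo.require("dijit._editor.plugins.FontChoice");
    dojo.addOnLoad(function(){
        // safe to use the required modules from here on
    });
</script>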
If for some (technical) reasons you don't want to use the X-Domain loader (CDNs use this type of loader), you can do a custom build (well-described in many places). After the build you copy only relevant files to your server. No need to copy all 2000+ tests, demos, unused DojoX projects, Dijits and so on.
During the build you will create a single minified file (or a few layers), which will include all the Dojo JavaScript code you use. If you use Dojo widgets, their templates will already be inlined, so you do not incur extra hits for them. As part of the build, CSS files are combined and minified too. So literally, in most cases you will have just two files: a Dojo layer, which includes everything plus your custom code, and a CSS file. In more complex cases you may have more files, but usually we are talking about a handful.
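For a feel of what that looks like, here is a rough sketch of a 1.x-era layer profile (the layer name and module list are purely illustrative, not a tested profile; it would be handed to the build script in util/buildscripts):

// myprofile.profile.js
dependencies = {
    layers: [
        {
            // one layer with Dojo base plus the widgets this app actually uses
            name: "../mylayer.js",
            dependencies: [
                "dijit._editor.plugins.FontChoice"
            ]
        }
    ],
    prefixes: [
        [ "dijit", "../dijit" ],
        [ "dojox", "../dojox" ]
    ]
};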
How to make sure that everything is in the build? Fire up your favorite network analyzer (Live HTTP Headers, Firebug, Fiddler2, or Charles Proxy would do fine) and see if you hit any files outside of your build. If you do — include them in the build, or try to figure out why they are requested, and eliminate these requests (some localization-related calls are fine).
Personally I would start with the CDN option — works well, no hassle, hosted by somebody else with fat pipes.
To address your first question, use the full source version locally for development, so that you can get clearer debug info which points to a legible line in source, rather than the single line the minified version is reduced to. Use the CDN for production.

dojo js library + jsdoc -> how to document the code?

I'd love to ask: how do the guys developing Dojo create the documentation?
From the nightly builds you can get the uncompressed JS files with all the comments, and I'm sure there is some kind of documenting script that will generate some HTML or XML out of them.
I guess they use jsdoc, as it can be found in their util folder, but I have no idea how to use it. jsdoc-toolkit uses different /** commenting */ notations than the original Dojo files.
Thanks for all your help
It's all done with a custom PHP parser and Drupal. If you look in util/docscripts/README and util/jsdoc/INSTALL you can get all the gory details about how to generate the docs.
It's different than jsdoc-toolkit or JSDoc (as you've discovered).
FWIW, I'm using jsdoc-toolkit, as it's much easier to generate static HTML and there's lots of documentation about the tags on the Google Code page.
Also, just to be clear, I don't develop dojo itself. I just use it a lot at work.
There are two parts to the "dojo jsdoc" process. There is a parser, written in PHP, which generates XML and/or JSON for the entirety of the listed namespaces (defined in util/docscripts/modules, so you can add your own namespaces; there are basic usage instructions atop the file "generate.php"), and a Drupal part called "jsdoc", which installs as a Drupal module/plugin/whatever.
The Drupal aspect of it is just Dojo's basic view of this data. A well-crafted XSLT or something that iterates over the JSON and produces HTML would work just the same, though neither of these is provided by default (would love a contribution!). I shy away from the Drupal bit myself, though it has been running on api.dojotoolkit.org for some time now.
The doc parser is exposed so that you may use its inspection capabilities to write your own custom output as well. I use it to generate the Komodo .cix code completion in a [rather sloppy] PHP file, util/docscripts/makeCix.php, which dumps information as found into an XML doc crafted to match the spec there. This could be modified to generate any kind of output you choose with a little finagling.
The doc syntax is all defined on the style guideline page:
http://dojotoolkit.org/reference-guide/developer/styleguide.html
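To give a feel for the difference in notation, here is a small made-up function documented both ways: first with the inline summary-style comments the style guide above describes, then with the /** ... */ tag style that jsdoc-toolkit expects:

// Dojo-style inline doc comments
function greet(/*String*/ name){
    // summary:
    //      Returns a greeting for the given name.
    // name: String
    //      The name to greet.
    return "Hello, " + name; // String
}

/**
 * jsdoc-toolkit style block comment for the same function.
 * @param {String} name The name to greet.
 * @returns {String} A greeting.
 */
function greetJsdoc(name){
    return "Hello, " + name;
}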