docx4j Differencer Showing More Differences Than Expected - docx4j

I have two documents:
Document 1 (input)
Document 2 (output)
Document 2 is the result of passing Document 1 through a transformation process which leaves any content and formatting intact (verified by side-by-side compare in Word).
However, the process removes many id numbers from the .docx files.
For example,
<w:p w:rsidP="00B600D6" w:rsidR="00F55D78" w:rsidRDefault="00B600D6">
becomes
<w:p>
according to a dump of each document via the following code:
Body body = ((Document)newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
Node node = org.docx4j.XmlUtils.marshaltoW3CDomDocument(body).getDocumentElement();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(node),
new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
Using the docx4j Differencer comparison method recommended here, everything (except the first line which has no formatting applied) is shown as a modification.
Question is: Are the diffs a result of the missing id's, the formatting or something else?
In case it's important, we're using docx4j in this context to perform automated sanity/regression tests on our round-tripping proceess (i.e. apply the "loss-less" process and expect no differences)

Disclosure: I work on docx4j
If the only difference between paragraphs is the rsid attributes, they will still be detected as different.
You could "clean" the documents before performing the comparison, so that neither docx has rsid attributes. See the Filter sample.
By the way, an easier way to see the XML for an object (eg a single paragraph, or the entire body) is to use XmlUtils.marshaltoString

Related

Problem with line breaks in PDF document generated by BIRT

I have some cell texts in a BIRT report which do not flow as nicely as I hoped.
For example,
The text is Long value resultwithaverylongname whichcannotbreak and I had hoped that it would be displayed like this:
Long value
resultwithaverylongname
whichcannotbreak
The render options are as follows:
renderOptions.setOutputFormat(IPDFRenderOption.OUTPUT_FORMAT_PDF);
renderOptions.setOption(IPDFRenderOption.PAGE_OVERFLOW, IPDFRenderOption.OUTPUT_TO_MULTIPLE_PAGES);
renderOptions.setOption(IPDFRenderOption.PDF_TEXT_WRAPPING, true);
renderOptions.setOption(IPDFRenderOption.PDF_WORDBREAK, true);
It seems to me that my desired output is physically possible but I don't know why BIRT does not break on a whitespace and breaks in the middle of the word.
I am using BIRT 4.16 (from Sourceforge). The texts contain normal whitespace (no non-breakable spaces) and are displayed via a data object.
3.Sep.21
I now have an example project which I am trying to commit to Github. In the meantime here is a screenshot showing breaks which look good and others which are not...
The git repo is here: https://github.com/pramsden/test.wordbreak
If the text "resultwithaverylongname" physically fits, then you are right:
BIRT should not break it in the middle of the word.
Your renderOptions seem right (depending of what BIRT version you are using).
At first glance this looks like a bug.
But: In German language, we often have quite long words, and I've created a lot of (complex) PDF reports with BIRT, but I never saw this issue.
So I guess it is a tiny silly detail which causes this.
Just to double-check:
Are the spaces between "Long", "value", "result..." normal spaces (0x20)? or non-breaking spaces?
Which BIRT release are you using?
Are you using a data item or a dynamic text item and if so, is it HTML or plain text?
Can you create a reproducible simple test case and post the rptdesign file somewhere?
well i don use BIRT , but try to use (\n),
in my case I use PDFFlow library to generate pdf docs, and to make a line-break i just use \n
this is a simple example code to create a pdf file and use line break
var DocumentBuilder.New()
.AddSection()
.AddParagraphToSection("Hello world! \n go to the next line")
.ToDocument()
.Build("Result.PDF");
try it and tell me if it works

how can i export DataGridView with ARABIC data from Visual Basic to PDF by using iTextSharp [duplicate]

I have a problem with inserting UNICODE characters in a PDF file in eclipse.
There is some solution for this that it is not very efficient for me.
The solution is like this.
document.add(new Paragraph("Unicode: \u0418", new Font(bfComic, 12)));
I want to retrieve data from a database and show them to the user and my characters are in Arabic script and sometimes in Farsi script.
What solution do you suggest?
thanks
You are experiencing different problems:
Encoding of the data:
Please download chapter 2 of my book and go to section 2.2.2 entitled "The Phrase object: a List of Chunks with leading". In this section, look for the title "Database encoding versus the default CharSet used by the JVM".
You will see that database values are retrieved like this:
String name1 = new String(rs.getBytes("given_name"), "UTF-8");
That’s because the database contains different names with special characters. You risk that these special characters are displayed as gibberish if you would retrieve the field like this:
String name2 = rs.getString("given_name")
Encoding of the font:
You create your font like this:
Font font = new Font(bfComic, 12);
You don't show how you create bfComic, but I assume that this object is a BaseFont object using IDENTITY_H as encoding.
Writing from right to left / making ligatures
Although your code will work to show a single character, it won't work to show a sentence correctly.
Suppose that name1 is the Arabic version of the name "Lawrence of Arabia" and that we want to write this name to a PDF. This is done three times in the following screen shot:
The first line is wrong, because the characters are in the wrong order. They are written from left to right whereas they should be written from right to left. This is what will happen when you do:
document.add(name1);
Even if the encoding is correct, you're rendering the text incorrectly.
The second line is also wrong. The characters are now in the correct order, but no ligatures are made: ل followed by و should be combined into a single glyph: لو
You can only achieve this by adding the content to a ColumnText or PdfPCell object, and by setting the run direction to PdfWriter.RUN_DIRECTION_RTL. For instance:
pdfCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Now the text will be rendered correctly.
This is explained in chapter 11 of my book. You can find a full example here: Ligatures2

Conditionally, Converting of JSON to XML using MuleSoft

I have a simple conversion of JSON to XML using MuleSoft. In "Transform Message" component, I provided JSON Schema as Input and XML Schema as Output. When I run the app, the conversion happens if the file matches with both schema but it generates an empty XML file if it doesn't match.
I want below conditions:
1) If the file matches with schema, the converted output file should be sent to converted folder and the original file should move to Success folder.
2) If the file doesn't match with schema, the original file should move to the Failure folder instead of conversion.
Hope, I explained it comprehensively as I am new to MuleSoft. Here is a sample diagram which may simplify my requirement. Provide me with a new one if I badly designed the process.
First thing you need to create a flowVar that will hold your original payload.
When your doing your evaluation, if its XML then use a simple XPath expression like //elementName[not(node())]
Lastly, on your success use scatter-gather for multi-threading write. Pull your original payload from flowVar and write to Success and Write your regular payload to your Converted folder

How use the one Template for multiple pages in a XWPFDocument with Java

I would like to know, how can i reuse one template (with one page inside and some variables) multiple times a XWPFDocument object.
My idea is:
load the template once in a XWPFDocument as an template-object
clone/create/copy the template-object with all his styles and headers etc
fill the clone with content
add this clone to the destination-XWPFDocument
I got this work for one single page only.
When i try to clone/create/copy the template-object it will lose all his style informations.
How to copy a paragraph of .docx to another .docx withJava and retain the style
How to copy some content in one .docx to another .docx , using POI without losing format?
POI probably does not support this out of the box, but I have done a similar thing in my project poi-mail-merge, it works with the underlying XML to repeatedly replace markers in a template Microsoft Word document and combine the results into one resulting document.
So it basically duplicates the template document multiple times into the resulting document.
See here for how I do it there, basically I work on the XML body text and do replacements/changes there and then append it onto the result document.
POI Mail Merge propably helps in other cases but in my case it doesn't work.
My Workaround is to update my Template-XWPFDocument to the needed structure first, save it temporarily and read it back into a XWPFDocument-object.
Here the steps:
Read the template-file into a XWPFDocument
Read the records from data-file e.g. csv
Calculate the numbers of pages related to the data-records
Get the Bodyelements-Objects from the Template-XWPFDocument
Create new Bodyelements (depending to the numbers of pages) in the Template-XWPFDocument and replace them with the same Objects that we get before
Save the updated Template-XWPFDocument temporarily
Read the temporarily saved Template into a XWPFDocument
Replace all placeholder and fill them with your CSV-Data
Hope this helps somebody

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect Project. Precisely: we have modelled various requirements and displayed the source "standards" for these requirements in our diagrams by using the "hyperlink"-element out of the common toolbox. (This allows us to capture a title, the website where the documentation is found and a description of this documentation).
Now this element is visible on the diagram, but not in the package-view of our model and it does not get generated in our word (docx) documentation.
I can see that it should be possible to get this in the documentation, because a "Model Report" which basically prints everything does print the hyperlinks. But I can't find what I have to select in my template (in the package-tree view, as a package field, element field or diagram field) in order to get this printed. I can't just use the model report since this basically dumps the whole database in the document and reverse-engineering this model report has proven too difficult for me. Actually I would expect this to be in some kind of documentation for EA, but could not find such a thing with this level of detail... is there, is there a reproducible way of finding such things out in further cases? (btw I'm using EA 11.0)
[sorry there were illustrations here, but I'm not allowed to upload them...]
As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink:
... and a template which outputs name, alias and hyperlink for each element in the diagram:
... EA will generate a document will the following contents:
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.
The hyperlink is an element that is stored in the same package as the diagram it is used on, it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the element of the package containing the diagram then you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.
[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
Ok, based on Geerts answer, using the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
I was able to get a list of the hyperlinks following the diagram, this is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by getting the attribute directly out of the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results (t_object.*) DO NOT SHOW a Note-Attribute, but it does exist an when you write it into the query, it gets generated in the document.