Docx4j unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"p") - html-table

I am trying to add an HTML table into a cell of another Word table.
I can add the HTML table into the cell of the other Word table. (OK)
I can generate the final Word document, called lastDocument.docx. (OK)
But I cannot load it again:
WordprocessingMLPackage.load(lastDocument.docx) throws this exception:
Docx4j unexpected element
(uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main",
local:"p")
This is my code:
Tr workingRow = (Tr) XmlUtils.deepCopy(templateRow);
List<?> textElements = WMLPackageUtils.getTargetElements(workingRow, Text.class);
List<Tc> tcList = WMLPackageUtils.getTargetElements(workingRow, Tc.class);
Tc tc = WMLPackageUtils.getTc(tcList, "${Replace_Tex1}");
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
XHTMLImporter.setParagraphFormatting(FormattingOption.IGNORE_CLASS);
XHTMLImporter.setTableFormatting(FormattingOption.IGNORE_CLASS);
for (Object object : textElements) {
    Text text = (Text) object;
    if (!text.getValue().equals("${Replace_Tex1}"))
        continue;
    String replacementValue = (String) replacements.get(text.getValue());
    //text.setValue(replacementValue);
    R r = (R) text.getParent();
    r.getContent().clear();
    r.getContent().addAll(XHTMLImporter.convert(replacementValue, null));
}

I guess your problem is this line:
r.getContent().addAll(XHTMLImporter.convert(replacementValue, null));
It adds w:p (paragraph-level content) inside a run (w:r), which isn't allowed.
You can unzip your docx and look at word/document.xml to confirm.

Hello, I have fixed my code like this:
R r = (R) text.getParent();
P paragraphOfText = wordMLPackage.getMainDocumentPart().createParagraphOfText("");
paragraphOfText.getContent().clear();
r.getContent().clear();
tc.getContent().addAll(XHTMLImporter.convert(replacementValue, null));
tc.getContent().add(paragraphOfText);
It is working :) Thank you @JasonPlutext

Related

Why does adding w:drawing cause corrupted file

I'm trying to add a chart to a docx file using docx4j. I generated what I wanted in Word, and with the help of the docx4j webapp I was able to get the corresponding Java code. Unfortunately, Word reports the generated docx file as corrupted. When trying to debug, I realised that if I comment out the line that adds the drawing to the run, the file becomes readable. I can't figure out what's wrong. Below is my code and the link to what I tried to reproduce: http://www.filedropper.com/graphique .
I'm using docx4j 8.2 and Office 2016.
Chart chartPart = new Chart();
Relationship chartRelationship = wordMLPackage.getMainDocumentPart().addTargetPart(chartPart);
chartPart.setJaxbElement(ChartSpace.createChartSpace());
DefaultXmlPart colorPart = new DefaultXmlPart(new PartName("/word/charts/colors1.xml"));
colorPart.setContentType(new ContentType("application/vnd.ms-office.chartcolorstyle+xml"));
colorPart.setRelationshipType("http://schemas.microsoft.com/office/2011/relationships/chartColorStyle");
chartPart.addTargetPart(colorPart);
colorPart.setDocument(new FileInputStream(new File("colors1.xml")));
DefaultXmlPart stylePart = new DefaultXmlPart(new PartName("/word/charts/style1.xml"));
stylePart.setContentType(new ContentType("application/vnd.ms-office.chartstyle+xml"));
stylePart.setRelationshipType("http://schemas.microsoft.com/office/2011/relationships/chartStyle");
chartPart.addTargetPart(stylePart);
stylePart.setDocument(new FileInputStream(new File("style1.xml")));
EmbeddedPackagePart embeddedPackagePart = new EmbeddedPackagePart(
new PartName("/word/embeddings/Microsoft_Excel_Worksheet.xlsx"));
embeddedPackagePart.setContentType(
new ContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"));
chartPart.addTargetPart(embeddedPackagePart);
embeddedPackagePart.setBinaryData(new java.io.FileInputStream(new File("data.xlsx")));
mainDocumentPart.getContent().add(addGraphic(factory, chartRelationship.getId()));
public static P addGraphic(ObjectFactory factory, String chartId) throws JAXBException {
    P p = factory.createP();
    // Create object for r
    R r = factory.createR();
    p.getContent().add(r);
    // Create object for drawing (wrapped in JAXBElement)
    Drawing drawing = factory.createDrawing();
    JAXBElement<org.docx4j.wml.Drawing> drawingWrapped = factory.createRDrawing(drawing);
    r.getContent().add(drawingWrapped);
    org.docx4j.dml.wordprocessingDrawing.ObjectFactory dmlwordprocessingDrawingObjectFactory = new org.docx4j.dml.wordprocessingDrawing.ObjectFactory();
    // Create object for inline
    Inline inline = dmlwordprocessingDrawingObjectFactory.createInline();
    drawing.getAnchorOrInline().add(inline);
    org.docx4j.dml.ObjectFactory dmlObjectFactory = new org.docx4j.dml.ObjectFactory();
    // Create object for graphic
    Graphic graphic = dmlObjectFactory.createGraphic();
    inline.setGraphic(graphic);
    // Create object for graphicData
    GraphicData graphicdata = dmlObjectFactory.createGraphicData();
    graphic.setGraphicData(graphicdata);
    graphicdata.setUri("http://schemas.openxmlformats.org/drawingml/2006/chart");
    org.docx4j.dml.chart.ObjectFactory dmlchartObjectFactory = new org.docx4j.dml.chart.ObjectFactory();
    // Create object for chart (wrapped in JAXBElement)
    CTRelId relid = dmlchartObjectFactory.createCTRelId();
    JAXBElement<org.docx4j.dml.chart.CTRelId> relidWrapped = dmlchartObjectFactory.createChart(relid);
    graphicdata.getAny().add(relidWrapped);
    relid.setId(chartId);
    // Create object for cNvGraphicFramePr
    CTNonVisualGraphicFrameProperties nonvisualgraphicframeproperties = dmlObjectFactory
            .createCTNonVisualGraphicFrameProperties();
    inline.setCNvGraphicFramePr(nonvisualgraphicframeproperties);
    // Create object for extent
    CTPositiveSize2D positivesize2d = dmlObjectFactory.createCTPositiveSize2D();
    inline.setExtent(positivesize2d);
    positivesize2d.setCx(5486400);
    positivesize2d.setCy(3200400);
    // Create object for effectExtent
    CTEffectExtent effectextent = dmlwordprocessingDrawingObjectFactory.createCTEffectExtent();
    inline.setEffectExtent(effectextent);
    effectextent.setB(0);
    effectextent.setL(0);
    effectextent.setT(0);
    effectextent.setR(0);
    // Create object for docPr
    CTNonVisualDrawingProps nonvisualdrawingprops = dmlObjectFactory.createCTNonVisualDrawingProps();
    inline.setDocPr(nonvisualdrawingprops);
    nonvisualdrawingprops.setDescr("");
    nonvisualdrawingprops.setName("Graphique 1");
    nonvisualdrawingprops.setId(1);
    inline.setDistT(Long.valueOf(0));
    inline.setDistB(Long.valueOf(0));
    inline.setDistL(Long.valueOf(0));
    inline.setDistR(Long.valueOf(0));
    // Create object for rPr
    RPr rpr = factory.createRPr();
    r.setRPr(rpr);
    // Create object for noProof
    BooleanDefaultTrue booleandefaulttrue = factory.createBooleanDefaultTrue();
    rpr.setNoProof(booleandefaulttrue);
    return p;
}

Allow Arabic text in a PDF table using iText 7 (Xamarin Android)

I have to put my list data in a table in a pdf file. My data has some Arabic words. When my pdf is generated, the Arabic words don't appear. I searched and found that I need itext7.pdfcalligraph so I installed it in my app. I found this code too https://itextpdf.com/en/blog/technical-notes/displaying-text-different-languages-single-pdf-document and tried to do something similar to allow Arabic words in my table but I couldn't figure it out.
This is trial code, written before I apply it to my real list:
var path2 = global::Android.OS.Environment.ExternalStorageDirectory.AbsolutePath;
filePath = System.IO.Path.Combine(path2.ToString(), "myfile2.pdf");
stream = new FileStream(filePath, FileMode.Create);
PdfWriter writer = new PdfWriter(stream);
PdfDocument pdf2 = new iText.Kernel.Pdf.PdfDocument(writer);
Document document = new Document(pdf2, PageSize.A4);
FontSet set = new FontSet();
set.AddFont("ARIAL.TTF");
document.SetFontProvider(new FontProvider(set));
document.SetProperty(Property.FONT, "Arial");
string[] sources = new string[] { "يوم","شهر 2020" };
iText.Layout.Element.Table table = new iText.Layout.Element.Table(2, false);
foreach (string source in sources)
{
    Paragraph paragraph = new Paragraph();
    Bidi bidi = new Bidi(source, Bidi.DirectionDefaultLeftToRight);
    if (bidi.BaseLevel != 0)
    {
        paragraph.SetTextAlignment(iText.Layout.Properties.TextAlignment.RIGHT);
    }
    paragraph.Add(source);
    table.AddCell(new Cell(1, 1).SetTextAlignment(iText.Layout.Properties.TextAlignment.CENTER).Add(paragraph));
}
document.Add(table);
document.Close();
I updated my code and added arial.ttf to my assets folder. I'm getting the following exception:
System.InvalidOperationException: 'FontProvider and FontSet are empty. Cannot resolve font family name (see ElementPropertyContainer#setFontFamily) without initialized FontProvider (see RootElement#setFontProvider).'
and I still can't figure it out. Any ideas?
Thanks in advance
I had a similar situation with Turkish characters, and I followed these steps:
Create a folder under the project's root folder: /wwwroot/Fonts
Add OpenSans-Regular.ttf to the Fonts folder
The path to the font is then ../wwwroot/Fonts/OpenSans-Regular.ttf
Create the font like below:
public static PdfFont CreateOpenSansRegularFont()
{
    var path = "{Your absolute path for FONT}";
    return PdfFontFactory.CreateFont(path, PdfEncodings.IDENTITY_H, true);
}
and use it like this:
paragraph.Add(source)
    .SetFont(FontFactory.CreateOpenSansRegularFont()); //set font in here
table.AddCell(new Cell(1, 1)
    .SetTextAlignment(iText.Layout.Properties.TextAlignment.CENTER)
    .Add(paragraph));
This is how I used font factory for Turkish characters ex: "ü,i,ç,ş,ö"
For Xamarin-Android, you could try
string documentsPath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
var path = Path.Combine(documentsPath, "Fonts/Arial.ttf");
I fixed it in Java; this may help you:
String font = "your Arabic font";
//the magic is in the next 4 lines:
PdfFontFactory.register(font);
FontProgram fontProgram = FontProgramFactory.createFont(font, true);
PdfFont f = PdfFontFactory.createFont(fontProgram, PdfEncodings.IDENTITY_H);
LanguageProcessor languageProcessor = new ArabicLigaturizer();
// Note that I use setBaseDirection here and not TextAlignment; it works without TextAlignment
com.itextpdf.kernel.pdf.PdfDocument tempPdfDoc = new com.itextpdf.kernel.pdf.PdfDocument(new PdfReader(pdfFile.getPath()), TempWriter);
com.itextpdf.layout.Document TempDoc = new com.itextpdf.layout.Document(tempPdfDoc);
com.itextpdf.layout.element.Paragraph paragraph0 = new com.itextpdf.layout.element.Paragraph(languageProcessor.process("الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية"))
.setFont(f).setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.setFontSize(15);

Undefined merge field in Google Apps Script

I have a Google Apps Script for a Google Spreadsheet based on a Google Form that clients fill out online. The script is triggered by OnFormSubmit and generates a pdf based on a Google Doc template and sends the pdf to me by email using MailApp.sendEmail.
This script has been working fine until recently. The script runs successfully but the pdf output is incorrect. It seems like fields that are left blank are now being ignored in the script and so my pdf output shows the value for the next non-blank field. Ugh!
Anybody know what's going on here?
Below is an example of my script:
var docTemplate = "1FZL4rVe0LLpvMtIsq_3-pwv5POllIsyYThjfemkbkfg";
var docName = "Travel Details";
function onFormSubmit(e) {
  var last = e.values[1];
  var first = e.values[2];
  var order = e.values[3];
  var date = e.values[4];
  var gender = e.values[5];
  var email = "example@gmail.com";
  var copyId = DocsList.getFileById(docTemplate)
    .makeCopy(docName+' for '+last + ', ' + first)
    .getId();
  var copyDoc = DocumentApp.openById(copyId);
  var copyBody = copyDoc.getActiveSection();
  copyBody.replaceText('keyLast', last);
  copyBody.replaceText('keyFirst', first);
  copyBody.replaceText('keyOrder', order);
  copyBody.replaceText('keyDate', date);
  copyBody.replaceText('keyGender', gender);
  copyDoc.saveAndClose();
  var pdf = DocsList.getFileById(copyId).getAs("application/pdf");
  MailApp.sendEmail(email, subject, "", {htmlBody: office_message, attachments: pdf,
    noReply:true});
  DocsList.getFileById(copyId).setTrashed(true);
}
Example of the problem: If client leaves date field blank on the form, the gender value in the resulting pdf is put where the date value should be and the value for gender on the pdf shows "undefined".
Any ideas out there?
You should validate the values of all your variables.
if (last === undefined) {
  last = 'No Data!'; //re-assign a different value
}
So, you are changing the value of the variable last if it somehow got set to undefined. That way, hopefully the pdf would still show the field.
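The check above can be generalised so that every field gets the same treatment. The following is a minimal sketch in plain JavaScript; `valueOrDefault` is a hypothetical helper name, and the `values` array is made-up sample data that assumes, as the question describes, that a blank answer is missing from the positional array.

```javascript
// Hypothetical helper: return `fallback` when a form value is missing or blank.
function valueOrDefault(value, fallback) {
  if (value === undefined || value === null || String(value).trim() === '') {
    return fallback;
  }
  return value;
}

// Made-up stand-in for e.values in which the optional date was dropped,
// so only five entries arrive instead of the expected six.
var values = ['2014-01-01 10:00:00', 'Smith', 'Ann', 'A-17', 'F'];

var date = valueOrDefault(values[4], 'No Data!');   // picks up the shifted gender value
var gender = valueOrDefault(values[5], 'No Data!'); // index 5 does not exist
```

This keeps the literal text "undefined" out of the PDF, but it does not by itself correct values that have shifted to an earlier position.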
If there is some bug that just showed up recently, you should report it as a bug. If everything was working fine, and now it's broken, Google may have changed something.
There might be something wrong with your code. I don't know. Have you looked under the "View" menu and the "Execution Transcript" to see if there are any errors? You should also use Logger.log statements: Logger.log('The value of last is: ' + last); to print output to the log. Then you can check what is actually going on.
I am no great coder, but I use this script all the time to send PDFs, and I have never received an undefined when a field was missing. Typically, if something is missing, the keyGender placeholder is replaced with a blank and there is no error. In a spreadsheet, this usually means the columns were changed: it used to be timestamp(0), last(1), first(2), order(3), date(4), gender(5), and now they're in a different order.
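One way to make the script independent of column order is to look answers up by question title rather than by position. The sketch below is plain JavaScript run against a made-up object shaped like `e.namedValues` (question title mapped to an array of answers); `fieldByName` and the question titles are hypothetical and would need to match the real form.

```javascript
// Hypothetical helper: read one answer by its question title from an
// e.namedValues-style object, whose values are arrays of strings.
function fieldByName(namedValues, title, fallback) {
  var answers = namedValues[title];
  if (!answers || answers.length === 0 || answers[0] === '') {
    return fallback;
  }
  return answers[0];
}

// Made-up event payload; the titles are for illustration only.
var sampleNamedValues = {
  'Last Name': ['Smith'],
  'First Name': ['Ann'],
  'Order': ['A-17'],
  'Date': [''],
  'Gender': ['F']
};

var dateValue = fieldByName(sampleNamedValues, 'Date', '');
var genderValue = fieldByName(sampleNamedValues, 'Gender', '');
```

With this lookup, a blank date stays blank and the gender value is still read from its own question, whatever order the spreadsheet columns are in.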
Try the code below; it works.
// Common errors:
// Triggers are not set
// Spaces after form questions
// e.values doesn't work when fields are not mandatory and are left blank
// e.namedValues doesn't work for sending emails; use e.values[#]
// Placeholder keys in the template don't match
// Spelling errors
// Note: expect an undefined error when debugging, as values are not defined until the form is completed and submitted. Run first with a small test form as per below.
// Get Template
//from Google Docs and name it
var docTemplate = " "; // *** replace with new template ID if new Template created***
var docName = "Test Script"; //replace with document name
// When Form Gets submitted
function onFormSubmit(e) {
//Get information from the form and set as variables
//var variablename = "static entry or form value"
//Note: var Variablename = e.namedValues["X"]; is taking the value from the spreadsheet by column name so update if spreadsheet or form questions change
//Additions to the form will be added to the end of the spreadsheet regardless of their position in the form
var Timestamp = e.namedValues["Timestamp"];
var full_name = e.namedValues["Name"];
var position = e.namedValues["Position"]
var contact_email = e.namedValues["Contact Email"];
var phone_number = e.namedValues["Telephone Number"];
// Get document template, copy it as a new doc with Name and email, and save the id
var copyId = DocsList.getFileById(docTemplate)
.makeCopy(full_name+' '+docName+' for ' +contact_email+' '+Timestamp)//Update or remove Variablename to create full doc Name
.getId();
// Open the temporary document
var copyDoc = DocumentApp.openById(copyId);
// Get the documents body section
var copyBody = copyDoc.getActiveSection();
// Replace place holder keys <<namedValues>> in template
//copyBody.replaceText('<<X>>', Variablename); Variables from above
//***Update if template is changed***
copyBody.replaceText('<<Timestamp>>', Timestamp);
copyBody.replaceText('<<Name>>', full_name);
copyBody.replaceText('<<Position>>', position);
copyBody.replaceText('<<Contact Email>>', contact_email);
copyBody.replaceText('<<Telephone Number>>', phone_number);
// Save and close the temporary document
copyDoc.saveAndClose();
// Convert temporary document to PDF by using the getAs blob conversion
var pdf = DocsList.getFileById(copyId).getAs("application/pdf");
{
  // Add the data fields to the message
  var s = SpreadsheetApp.getActiveSheet();
  var columns = s.getRange(1,1,1,s.getLastColumn()).getValues()[0];
  var message = " ";
  // Only include form fields that are not blank
  for ( var keys in columns ) {
    var key = columns[keys];
    if ( e.namedValues[key] && (e.namedValues[key] != "") ) {
      message += key+ ' : '+ e.namedValues[key] + "<br>";
    }
  }
}
// Attach PDF and send the email
//***Change the "To" email address when the form goes live***
var to = "youremail@gmail.com";
var senders_name = e.values[1]
var contact_email = e.values[3]
var subject = "Test";
var htmlbody = "text goes here"+
"<br> <br>"+message+
"<br> <br>Submitted By:"+
"<br> <br>"+full_name+
"<br>"+position+
"<br>"+contact_email+
"<br>"+phone_number+
"<br> <br>Generated by Hansmoleman for Compu-Global-Hyper-Mega-Net";
MailApp.sendEmail({
name: senders_name,
to: to,
cc: contact_email,
replyTo: contact_email,
subject: subject,
htmlBody: htmlbody,
attachments: pdf,
});
}

Split a "tagged" PDF document into multiple documents, keeping the tagging

In a project I have to split a PDF document into two documents, one containing all blank pages, and one containing all pages with content.
For this job, I use a PdfReader to read the source file, and two PdfCopy objects (one for the blank-pages document, one for the pages-with-content document) to write the files to.
I use GetImportedPage to read a PdfImportedPage, which is then added to one of the PdfCopy writers.
Now, the problem is the following: the source file uses the "tagged PDF" format. To preserve this (which is absolutely required), I call the SetTagged() method on both PdfCopy writers, and use the extra third parameter of GetImportedPage(...) to keep the tagged format. However, when calling AddPage(...) on the PdfCopy writer, I get an invalid cast exception:
"Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary' to type 'iTextSharp.text.pdf.PRIndirectReference'."
Does anyone have any ideas on how to solve this? Any hints?
Also: the project currently references version 5.1.0.0 of the iText libraries. In 5.4.4.0, the third parameter to GetImportedPage does not seem to be there anymore.
Below, you can find a code extract:
iTextSharp.text.Document targetPdf = new iTextSharp.text.Document();
iTextSharp.text.Document blankPdf = new iTextSharp.text.Document();
iTextSharp.text.pdf.PdfReader sourcePdfReader = new iTextSharp.text.pdf.PdfReader(inputFile);
iTextSharp.text.pdf.PdfCopy targetPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(targetPdf, new FileStream(outputFile, FileMode.Create));
iTextSharp.text.pdf.PdfCopy blankPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(blankPdf, new FileStream(blanksFile, FileMode.Append));
targetPdfWriter.SetTagged();
blankPdfWriter.SetTagged();
try
{
    iTextSharp.text.pdf.PdfImportedPage page = null;
    int n = sourcePdfReader.NumberOfPages;
    targetPdf.Open();
    blankPdf.Open();
    blankPdf.Add(new iTextSharp.text.Phrase("This document contains the blank pages removed from " + inputFile));
    blankPdf.NewPage();
    for (int i = 1; i <= n; i++)
    {
        byte[] pageBytes = sourcePdfReader.GetPageContent(i);
        string pageText = "";
        iTextSharp.text.pdf.PRTokeniser token = new iTextSharp.text.pdf.PRTokeniser(new iTextSharp.text.pdf.RandomAccessFileOrArray(pageBytes));
        while (token.NextToken())
        {
            if (token.TokenType == iTextSharp.text.pdf.PRTokeniser.TokType.STRING)
            {
                pageText += token.StringValue;
            }
        }
        if (pageText.Length >= 15)
        {
            page = targetPdfWriter.GetImportedPage(sourcePdfReader, i, true);
            targetPdfWriter.AddPage(page);
        }
        else
        {
            page = blankPdfWriter.GetImportedPage(sourcePdfReader, i, true);
            blankPdfWriter.AddPage(page);
            blankPageCount++;
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine("Exception at LOC1: " + ex.Message);
}
The error occurs in the call to targetPdfWriter.AddPage(page); near the end of the code sample.
Thank you very much for your help.
Koen.

Word OLE Automation - delete first page and manipulate header and footer

I am using PHP to drive Word Automation and manipulate Word documents, but I guess it can be done in any other language. What I need to do is quite simple: remove the first page and add a header and footer.
Here is my code:
$word = new COM('word.application');
$word->Documents->Open('xxx.docx');
$word->Documents[1]->SaveAs($result_file_name, 12);
Any samples?
This is the way you could do it in VBA. This can likely be ported to PHP fairly simply.
Sub RemoveFirstPageAndAddHeaderFooter()
    Dim d As Document
    Set d = ActiveDocument
    Dim pageOne As Range
    Set pageOne = d.Bookmarks("\page").Range
    pageOne.Select
    Selection.Delete
    d.Sections(1).Headers(1).Range.Text = "Some text"
    d.Sections(1).Footers(1).Range.InlineShapes.AddPicture "C:\beigeplum.jpg", False, True
End Sub
A note on ...InlineShapes.AddPicture: the onus is on you to ensure the picture is the right size. If you want more control over this, use .Footers(1).Shapes.AddPicture instead, as that lets you set the width/height, top/left, etc.
try
{
    $word = new COM("word.application") //$word = new COM("C:\x.docx");
        or die("couldnt create an instance of word");
    //bring word to the front
    $word->Visible = 1;
    //open a word document
    $word->Documents->Open("file.docx");
    // remove first page
    $range = $word->ActiveDocument->Bookmarks("\page");
    $range->Select();
    $word->Selection->Delete();
    //save the document as docx
    $word->Documents[1]->SaveAs("modified_file.docx", 12); // SaveAs('filename', format) // format: 0 - same?, 1 - doc?, 2 - text, 4 - text other encoding
}
catch(Exception $e)
{
    echo "error class.document.php - convert_to_docx: $e 20100816.01714";
}
//close word
if($word)
    $word->Quit();
//free object resources
//$word->Release();
$word = null;