how can i export DataGridView with ARABIC data from Visual Basic to PDF by using iTextSharp [duplicate] - vb.net

I have a problem with inserting UNICODE characters in a PDF file in eclipse.
There is some solution for this that it is not very efficient for me.
The solution is like this.
document.add(new Paragraph("Unicode: \u0418", new Font(bfComic, 12)));
I want to retrieve data from a database and show them to the user and my characters are in Arabic script and sometimes in Farsi script.
What solution do you suggest?
thanks

You are experiencing different problems:
Encoding of the data:
Please download chapter 2 of my book and go to section 2.2.2 entitled "The Phrase object: a List of Chunks with leading". In this section, look for the title "Database encoding versus the default CharSet used by the JVM".
You will see that database values are retrieved like this:
String name1 = new String(rs.getBytes("given_name"), "UTF-8");
That’s because the database contains different names with special characters. You risk that these special characters are displayed as gibberish if you would retrieve the field like this:
String name2 = rs.getString("given_name")
Encoding of the font:
You create your font like this:
Font font = new Font(bfComic, 12);
You don't show how you create bfComic, but I assume that this object is a BaseFont object using IDENTITY_H as encoding.
Writing from right to left / making ligatures
Although your code will work to show a single character, it won't work to show a sentence correctly.
Suppose that name1 is the Arabic version of the name "Lawrence of Arabia" and that we want to write this name to a PDF. This is done three times in the following screen shot:
The first line is wrong, because the characters are in the wrong order. They are written from left to right whereas they should be written from right to left. This is what will happen when you do:
document.add(name1);
Even if the encoding is correct, you're rendering the text incorrectly.
The second line is also wrong. The characters are now in the correct order, but no ligatures are made: ل followed by و should be combined into a single glyph: لو
You can only achieve this by adding the content to a ColumnText or PdfPCell object, and by setting the run direction to PdfWriter.RUN_DIRECTION_RTL. For instance:
pdfCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Now the text will be rendered correctly.
This is explained in chapter 11 of my book. You can find a full example here: Ligatures2

Related

tcolorbox in RMarkdown with shortcuts

I'm using the LaTex library tcolorbox in a RMarkdown document to list R code with tbclisting{...}. It works fine when I use the full command is used in the document
\begin{tcblisting}{colback=red!5!white, colframe=red!50!black,listing only,before skip=5 cm,
title=R code for finding and plotting frequencies from sorted data - Figure 3.5,hbox, enhanced, drop fuzzy shadow, listing options={language=R,keywordstyle=\color{blue}},before=\begin{center}, after=\end{center}}
some text
\end{tcblisting}
Due to the length of the command and the multiple use of similar boxes, changing each box to reflect, say, a new color, is tedious and error prone. I'd like to create a short cut using
\newtcblisting{mybox}[1]{%colback=red!5!white, colframe=red!50!black,listing only,before skip=5 cm,title={#1},hbox, enhanced, drop fuzzy shadow, listing options={language=R,keywordstyle=\color{blue}}, before=\begin{center}, after=\end{center}}
in the preamble as in the LaTex documentation, then implemented using
when I refer to the predefined box using
\begin{mybox}{my box title}
some text
\end{mybox}
but the compiler-to-pdf gives me an error message
**!Package pgfkeys Error: I do not know the key '/tcb/[' and I am going to ignore it. Perhaps you misspelled it.**
I'm thinking that 1) RMarkdown/tcolorbox doesn't support this or 2) something is wrong with my syntax. For 2) I've tried putting the newtcblisting definition in the preamble header-includes: section with the "-" preceding it (no good) and in the main body of the document w/o the "-". Also no good.
Can anyone help with this?

Problem with line breaks in PDF document generated by BIRT

I have some cell texts in a BIRT report which do not flow as nicely as I hoped.
For example,
The text is Long value resultwithaverylongname whichcannotbreak and I had hoped that it would be displayed like this:
Long value
resultwithaverylongname
whichcannotbreak
The render options are as follows:
renderOptions.setOutputFormat(IPDFRenderOption.OUTPUT_FORMAT_PDF);
renderOptions.setOption(IPDFRenderOption.PAGE_OVERFLOW, IPDFRenderOption.OUTPUT_TO_MULTIPLE_PAGES);
renderOptions.setOption(IPDFRenderOption.PDF_TEXT_WRAPPING, true);
renderOptions.setOption(IPDFRenderOption.PDF_WORDBREAK, true);
It seems to me that my desired output is physically possible but I don't know why BIRT does not break on a whitespace and breaks in the middle of the word.
I am using BIRT 4.16 (from Sourceforge). The texts contain normal whitespace (no non-breakable spaces) and are displayed via a data object.
3.Sep.21
I now have an example project which I am trying to commit to Github. In the meantime here is a screenshot showing breaks which look good and others which are not...
The git repo is here: https://github.com/pramsden/test.wordbreak
If the text "resultwithaverylongname" physically fits, then you are right:
BIRT should not break it in the middle of the word.
Your renderOptions seem right (depending of what BIRT version you are using).
At first glance this looks like a bug.
But: In German language, we often have quite long words, and I've created a lot of (complex) PDF reports with BIRT, but I never saw this issue.
So I guess it is a tiny silly detail which causes this.
Just to double-check:
Are the spaces between "Long", "value", "result..." normal spaces (0x20)? or non-breaking spaces?
Which BIRT release are you using?
Are you using a data item or a dynamic text item and if so, is it HTML or plain text?
Can you create a reproducible simple test case and post the rptdesign file somewhere?
well i don use BIRT , but try to use (\n),
in my case I use PDFFlow library to generate pdf docs, and to make a line-break i just use \n
this is a simple example code to create a pdf file and use line break
var DocumentBuilder.New()
.AddSection()
.AddParagraphToSection("Hello world! \n go to the next line")
.ToDocument()
.Build("Result.PDF");
try it and tell me if it works

Edit a Mainframe file in the RecordEditor without a copybook

How do you Edit a (binary EBCDIC) Mainframe file in the RecordEditor with out a Cobol Copybook.
How do you generate Java code to read the file using the RecordEditor.
Note: This is an attempt to split a question that is far to broad to give meaningful answer to
into a series of simpler Question and Answer's.
Try and avoid editing a binary file with a Cobol Copybook if at all possible. This should only be attempted as a last resort !!!.
Try and get
that Cobol copybook (or some field layout document) for the file !!!
Some general advise:
It is feasible when dealing with 10 / 20 fields in a record but not if there a thousands of fields in a Record.
Take your time do not rush the process. Try and get each step correct before moving on
Finally upgrade to the latest version of the RecordEditor (currently 0.98.4)
This process will also work for normal Text file as well
RecordEditor Layout Wizard
To start the wizard select option Record Layouts >>> Layout Wizard.
File Structure screen
The file structure screen has 3 purposes:
Get the File structure - It could be Fixed Width, VB, Windows/Unix Text file
Get the Record-Length (if it is a fixed width file).
Get the font (character-set / encoding)
The RecordEditor will try and work this out for you
Field Selection Screen
The RecordEditor will try and work out where fields start and end but
it is not perfect. You need to carefully check and correct its choices
On this screen, the fields are displayed in alternating colors
you create/delete a field by clicking on
use the Clear Fields button clear all the fields
you can change what field-types to search for using the various check box's (e.g. Mainframe Zones Decimal)
The Add Fields will do another field search
Field Definition screen
On this screen you define the field names and Types. You may need to go back to the **Field Selection Screen* to adjust the fields
Editing the file
Once the Record Layout has been defined, it can be used on the open file screen
Generating Java code
When editing your file, you can generate java~JRecord code to read the file
by selecting Generate >>> Java >>> ....
You can the enter a package-id + generate options:
and finally your sample java code is generated to read / write the
file.

Remove extra ASCII symbol rendering in PDF

I have a user that is storing a 'registered trademark' symbol in her name in our database and when we retrieve it when the database it renders correctly, but when we actually place it onto the website itself in HTML it renders with an extra 'A' symbol in front of it:
You can see above the database value compared to what is rendered in the PDF file. I can access the database value in the backend and edit it through vb code but I am really not sure how or what the code would be to do that as I don't want to remove all ASCII characters just the extra symbol being generated and rendered in the PDF.
Any idea how to do this would be great.
I think the Main-Problem is that you generate wrong HTML-Code by just inserting your Database-Result-Strings into your Website
You can encode your Database String to HTML by using the HtmlEncode-Function from HttpUtility in .NET
Here is an Example from vb.net
myEncodedString = HttpUtility.HtmlEncode(myString)
If you use "myEncodedString" inside your WebPage you'll get no additional Characters and a valid HTML-Code.

docx4j Differencer Showing More Differences Than Expected

I have two documents:
Document 1 (input)
Document 2 (output)
Document 2 is the result of passing Document 1 through a transformation process which leaves any content and formatting intact (verified by side-by-side compare in Word).
However, the process removes many id numbers from the .docx files.
For example,
<w:p w:rsidP="00B600D6" w:rsidR="00F55D78" w:rsidRDefault="00B600D6">
becomes
<w:p>
according to a dump of each document via the following code:
Body body = ((Document)newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
Node node = org.docx4j.XmlUtils.marshaltoW3CDomDocument(body).getDocumentElement();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(node),
new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
Using the docx4j Differencer comparison method recommended here, everything (except the first line which has no formatting applied) is shown as a modification.
Question is: Are the diffs a result of the missing id's, the formatting or something else?
In case it's important, we're using docx4j in this context to perform automated sanity/regression tests on our round-tripping proceess (i.e. apply the "loss-less" process and expect no differences)
Disclosure: I work on docx4j
If the only difference between paragraphs is the rsid attributes, they will still be detected as different.
You could "clean" the documents before performing the comparison, so that neither docx has rsid attributes. See the Filter sample.
By the way, an easier way to see the XML for an object (eg a single paragraph, or the entire body) is to use XmlUtils.marshaltoString