Traverse whole PDF and change blue color to black ( Change color of underlines as well) + iText


I am using the code below to change blue text in a PDF to black. It changes the text color correctly, but it does not change the color of the underlines.
Part of the original file: [screenshot]
Manipulated file: [screenshot]
As you can see in the manipulated file above, the underline color did not change.
I have been looking for a fix for two weeks; can anyone help? Below is my color-changing code:
public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
try (InputStream resource = getClass().getResourceAsStream(source);
PdfReader pdfReader = new PdfReader(source);
OutputStream result = new FileOutputStream(filename);
PdfWriter pdfWriter = new PdfWriter(result);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter);) {
PdfCanvasEditor editor = new PdfCanvasEditor() {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("rg"));
if (currentlyReplacedBlack == null) {
Color currentFillColor =getGraphicsState().getFillColor();
if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
currentlyReplacedBlack = currentFillColor;
super.write(processor, new PdfLiteral("rg"), listobj);
}
}
} else if (currentlyReplacedBlack != null) {
if (currentlyReplacedBlack instanceof DeviceCmyk) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("k"));
super.write(processor, new PdfLiteral("k"), listobj);
} else if (currentlyReplacedBlack instanceof DeviceGray) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("g"));
super.write(processor, new PdfLiteral("g"), listobj);
} else {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("rg"));
super.write(processor, new PdfLiteral("rg"), listobj);
}
currentlyReplacedBlack = null;
}
super.write(processor, operator, operands);
}
Color currentlyReplacedBlack = null;
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
editor.editPage(pdfDocument, i);
}
}
File file = new File(source);
file.delete();
}
Here is the original file.
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf
Related Links:
Traverse whole PDF and change some attribute with some object in it using iText
Removing Watermark from PDF iTextSharp
Maven dependency details:
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext7-core</artifactId>
<version>7.1.5</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>5.0.6</version>
</dependency>
Edit:
The accepted answer does not work for the files below:
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (Page 41)
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (Page 60).
Please help.

(The example code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in tags or question text but your example code appears to indicate that this is your combination of choice.)
Replacing blue fill colors
The test you based your original code on attempts explicitly only to change text color. The "underline" in your document, though, is (as far as PDF drawing is concerned) not part of the text but instead drawn as a simple path. Thus, the underline explicitly is not touched by the original code and it has to be adapted for your task.
But actually your task, changing everything blue to black, is easier to implement than only changing the blue text, e.g.
try ( PdfReader pdfReader = new PdfReader(SOURCE_PDF);
PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(ChangeColor test testChangeFillRgbBlueToBlack)
Beware, this is merely a proof-of-concept, not a final and complete solution. In particular:
It merely looks at the fill (non-stroking) colors. In your case that suffices as both your text (as usual) and your underline use fill colors only - the underline actually is not drawn as a stroked line but instead as a slim, filled rectangle.
Only RGB blue (and only such blue set using the rg instruction, not set using sc or scn, let alone blues combined out of other colors using funky blend modes) is considered. This might be an issue particularly in case of documents explicitly designed for printing (likely using CMYK colors).
PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
The result: [screenshot]
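To make the point about the underline concrete: in a content stream the text is shown by Tj while the underline is a separate re ... f path, and both simply pick up whatever fill color was set last. The following is a toy sketch in plain Java, not iText code. The class FillColorRewriteSketch, its methods, and the sample content stream are made up for illustration, and the whitespace tokenizer is an assumption that only holds for simple streams without spaces inside string operands.

```java
import java.util.ArrayList;
import java.util.List;

class FillColorRewriteSketch {

    // True if the token parses as a number within 0.01 of the reference value.
    static boolean isApproximately(String token, float reference) {
        try {
            return Math.abs(reference - Float.parseFloat(token)) < 0.01f;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Replaces every "0 0 1 rg" (pure RGB blue fill) by "0 g" (gray black fill).
    // Toy tokenizer (assumption): operands and operators are separated by whitespace only.
    static String blueFillToBlack(String content) {
        List<String> out = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            int n = out.size();
            if (token.equals("rg") && n >= 3
                    && isApproximately(out.get(n - 3), 0)
                    && isApproximately(out.get(n - 2), 0)
                    && isApproximately(out.get(n - 1), 1)) {
                out.subList(n - 3, n).clear(); // drop the three RGB operands
                out.add("0");
                out.add("g");
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Made-up sample: blue text followed by a slim filled rectangle acting as underline.
        String page = "0 0 1 rg BT /F1 12 Tf 72 702 Td (Link) Tj ET 72 700 50 0.5 re f";
        System.out.println(blueFillToBlack(page));
    }
}
```

Because the color setting instruction itself is rewritten, both the text and the filled rectangle drawn after it come out black, which is why this approach also catches the underline.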
Replacing blue fill and stroke colors
Testing the code above you soon found documents in which the underlines were not changed. As it turned out, these underlines are actually drawn as stroked lines, not as filled rectangles as above.
To also properly edit such documents, therefore, you must not only edit the fill colors but also the stroke colors, e.g. like this:
try ( PdfReader pdfReader = new PdfReader(SOURCE_PDF);
PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)
The results: [two screenshots, one per test document]
Replacing different shades of blue from other RGB'ish color spaces
Testing the code above you again found documents in which the blue colors were not changed. As it turned out, these blue colors were not from the standard DeviceRGB color space but instead from ICCBased color spaces, profiled RGB color spaces to be more exact. In particular, other color setting operators were used than before, sc / scn instead of rg. Furthermore, in one document not a pure blue 0 0 1 but instead a .17255 .3098 .63529 blue was used.
If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB colors as here (in general this is an oversimplification, Lab and other color spaces also come with 3 components, but your documents seem RGB oriented) and are less strict in recognizing the blue color, we can generalize the code above as follows:
class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
PdfNumber number0 = new PdfNumber(0);
operands.set(0, number0);
operands.set(1, number0);
operands.set(2, number0);
}
}
super.write(processor, operator, operands);
}
boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
float r = ((PdfNumber)red).floatValue();
float g = ((PdfNumber)green).floatValue();
float b = ((PdfNumber)blue).floatValue();
return b > .5f && r < .9f*b && g < .9f*b;
}
return false;
}
final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}
(ChangeColor helper class)
Used like this
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
we get: [two result screenshots]
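To get a feeling for which colors the relaxed isBlue test accepts, here is the same inequality isolated in plain Java and applied to a few sample values. The class name BlueHeuristicSketch and the sample colors (other than the two blues mentioned above) are only illustrative.

```java
class BlueHeuristicSketch {

    // Same inequality as in AllRgbBlueToBlackConverter.isBlue, but on plain floats:
    // the blue component must be dominant and reasonably bright.
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    public static void main(String[] args) {
        float[][] samples = {
            {0f, 0f, 1f},                 // pure blue
            {.17255f, .3098f, .63529f},   // the blue from one of the documents
            {1f, 1f, 1f},                 // white
            {0f, 1f, 1f},                 // cyan
            {0f, 0f, .4f}                 // dark blue
        };
        for (float[] c : samples) {
            System.out.println(c[0] + " " + c[1] + " " + c[2] + " -> " + isBlue(c[0], c[1], c[2]));
        }
    }
}
```

Note that a dark blue like 0 0 .4 is not recognized because of the b > .5 condition, so the thresholds may need tuning for other documents.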

Related

Text displayed in blue although PDAnnotation removed

We have a requirement to remove an annotation when a certain condition matches. The PDAnnotation is removed when I execute the statement allPageAnnotationsList.remove(annotationTobeRemoved).
But the corresponding text is still displayed in blue. How can I change the text color back to normal (black)?

Originally I thought you asked for all non-black text on a page to be changed to black. This resulted in my original answer, now the first section 'Updating All Text to Black'. Then you clarified that you only wanted the text in the areas of the removed annotations to be made black. That's shown in the second section 'Updating Text in Areas to Black'.
Updating All Text to Black
First of all, as already described by Tilman in comments, removing link annotations usually merely removes the interactivity of that link but the text in the area of the link annotation remains as is. If you want to update the text color to normal(black), therefore, you have to add a second step and manipulate the colors in the static page contents.
The static page content is defined by a stream of instructions which change the graphics state or draw something. The color used for drawing is part of the graphics state and is set by explicit color setting instructions. Thus, one could think you could simply replace all color setting instructions by instructions selecting normal(black).
Unfortunately it's not that easy because colors may be changed to draw other things, too. E.g. in your document at the start the whole page is filled with white; if you replaced the color setting instruction before that fill instruction, your whole page would be black. Not exactly what you want.
To update the text color to normal(black) but not change other colors, therefore, you have to consider the context of instructions you want to change.
The PDFBox parsing framework can help you here, iterating over a content stream and keeping track of the graphics state.
Based upon that framework, furthermore, a generic content stream editor helper class has been created in this answer, the PdfContentStreamEditor. (For details and example uses see that answer.) Now you merely have to customize it for your use case, e.g. like this:
PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
@Override
protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
String operatorString = operator.getName();
if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
if (currentlyReplacedColor == null)
{
PDColor currentFillColor = getGraphicsState().getNonStrokingColor();
if (!isBlack(currentFillColor))
{
currentlyReplacedColor = currentFillColor;
super.write(contentStreamWriter, SET_NON_STROKING_GRAY, GRAY_BLACK_VALUES);
}
}
} else if (currentlyReplacedColor != null) {
PDColorSpace replacedColorSpace = currentlyReplacedColor.getColorSpace();
List<COSBase> replacedColorValues = new ArrayList<>();
for (float f : currentlyReplacedColor.getComponents())
replacedColorValues.add(new COSFloat(f));
if (replacedColorSpace instanceof PDDeviceCMYK)
super.write(contentStreamWriter, SET_NON_STROKING_CMYK, replacedColorValues);
else if (replacedColorSpace instanceof PDDeviceGray)
super.write(contentStreamWriter, SET_NON_STROKING_GRAY, replacedColorValues);
else if (replacedColorSpace instanceof PDDeviceRGB)
super.write(contentStreamWriter, SET_NON_STROKING_RGB, replacedColorValues);
else {
//TODO
}
currentlyReplacedColor = null;
}
super.write(contentStreamWriter, operator, operands);
}
PDColor currentlyReplacedColor = null;
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
final Operator SET_NON_STROKING_CMYK = Operator.getOperator("k");
final Operator SET_NON_STROKING_RGB = Operator.getOperator("rg");
final Operator SET_NON_STROKING_GRAY = Operator.getOperator("g");
final List<COSBase> GRAY_BLACK_VALUES = Arrays.asList(COSInteger.ZERO);
};
editor.processPage(page);
}
document.save("withBlackText.pdf");
(ChangeTextColor test testMakeTextBlackTestAfterRemovingAnnotation)
Here we check whether the current instruction is a text drawing instruction. If it is and the current color is not already replaced, we check whether the current color is already black'ish. If it is not black, we store it and add an instruction to replace the current fill color by black.
Otherwise, i.e. if the current instruction is not a text drawing instruction, we check whether the current color has been replaced by black. If it has, we restore the original color.
To check whether a given color is black'ish, we use the following helper method.
static boolean isBlack(PDColor pdColor) {
PDColorSpace pdColorSpace = pdColor.getColorSpace();
float[] components = pdColor.getComponents();
if (pdColorSpace instanceof PDDeviceCMYK)
return (components[0] > .9f && components[1] > .9f && components[2] > .9f) || components[3] > .9f;
else if (pdColorSpace instanceof PDDeviceGray)
return components[0] < .1f;
else if (pdColorSpace instanceof PDDeviceRGB)
return components[0] < .1f && components[1] < .1f && components[2] < .1f;
else
return false;
}
(ChangeTextColor helper method)
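The switch-to-black / restore logic described above can be followed step by step in a toy model without PDFBox. In this sketch each instruction is a plain string whose last token is the operator. The class TextColorSwitchSketch, the string representation, and the fixed set of "black" instructions are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TextColorSwitchSketch {
    static final Set<String> TEXT_SHOWING = new HashSet<>(Arrays.asList("Tj", "'", "\"", "TJ"));
    static final Set<String> FILL_SETTERS = new HashSet<>(Arrays.asList("g", "rg", "k"));
    // Toy stand-in for isBlack: only these exact instructions count as black.
    static final Set<String> BLACK = new HashSet<>(Arrays.asList("0 g", "0 0 0 rg", "0 0 0 1 k"));

    // The operator is the last whitespace separated token of an instruction.
    static String operatorOf(String instruction) {
        return instruction.substring(instruction.lastIndexOf(' ') + 1);
    }

    static List<String> makeTextBlack(List<String> instructions) {
        List<String> out = new ArrayList<>();
        String currentFill = "0 g"; // the initial fill color in PDF is black
        String replaced = null;     // original color while black is temporarily active
        for (String instruction : instructions) {
            String op = operatorOf(instruction);
            if (TEXT_SHOWING.contains(op)) {
                if (replaced == null && !BLACK.contains(currentFill)) {
                    replaced = currentFill;
                    out.add("0 g");     // switch to black just before the text
                }
            } else if (replaced != null) {
                out.add(replaced);      // restore the original color after the text
                replaced = null;
            }
            if (FILL_SETTERS.contains(op)) {
                currentFill = instruction; // track the color of the ORIGINAL stream
            }
            out.add(instruction);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
                "0 0 1 rg", "(Link) Tj", "(more) Tj", "72 700 50 1 re", "f");
        makeTextBlack(page).forEach(System.out::println);
    }
}
```

In the sample, the color is restored to blue before the rectangle is filled, so only the text becomes black while other drawing keeps its original color.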
Updating Text in Areas to Black
In comments you clarified that you only want the text in the areas of the removed annotations to become black.
For this you have to collect the rectangles of the annotations you remove and later check the position before switching colors whether it's inside one of those rectangles.
This can be done by extending the code above as follows. Here I remove every other annotation only and collect their rectangles to check against them later. Also I override the PDFStreamEngine method showText(byte[]) to store the position of the text shown in the current text drawing instruction.
PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
List<PDRectangle> areas = new ArrayList<>();
// Remove every other annotation, collect their areas
List<PDAnnotation> annotations = new ArrayList<>();
boolean remove = true;
for (PDAnnotation annotation : page.getAnnotations()) {
if (remove)
areas.add(annotation.getRectangle());
else
annotations.add(annotation);
remove = !remove;
}
page.setAnnotations(annotations);
PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
@Override
protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
String operatorString = operator.getName();
if (TEXT_SHOWING_OPERATORS.contains(operatorString) && isInAreas()) {
if (currentlyReplacedColor == null)
{
PDColor currentFillColor = getGraphicsState().getNonStrokingColor();
if (!isBlack(currentFillColor))
{
currentlyReplacedColor = currentFillColor;
super.write(contentStreamWriter, SET_NON_STROKING_GRAY, GRAY_BLACK_VALUES);
}
}
} else if (currentlyReplacedColor != null) {
PDColorSpace replacedColorSpace = currentlyReplacedColor.getColorSpace();
List<COSBase> replacedColorValues = new ArrayList<>();
for (float f : currentlyReplacedColor.getComponents())
replacedColorValues.add(new COSFloat(f));
if (replacedColorSpace instanceof PDDeviceCMYK)
super.write(contentStreamWriter, SET_NON_STROKING_CMYK, replacedColorValues);
else if (replacedColorSpace instanceof PDDeviceGray)
super.write(contentStreamWriter, SET_NON_STROKING_GRAY, replacedColorValues);
else if (replacedColorSpace instanceof PDDeviceRGB)
super.write(contentStreamWriter, SET_NON_STROKING_RGB, replacedColorValues);
else {
//TODO
}
currentlyReplacedColor = null;
}
super.write(contentStreamWriter, operator, operands);
before = null;
after = null;
}
PDColor currentlyReplacedColor = null;
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
final Operator SET_NON_STROKING_CMYK = Operator.getOperator("k");
final Operator SET_NON_STROKING_RGB = Operator.getOperator("rg");
final Operator SET_NON_STROKING_GRAY = Operator.getOperator("g");
final List<COSBase> GRAY_BLACK_VALUES = Arrays.asList(COSInteger.ZERO);
@Override
protected void showText(byte[] string) throws IOException {
Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
if (before == null)
before = getTextMatrix().multiply(ctm);
super.showText(string);
after = getTextMatrix().multiply(ctm);
}
Matrix before = null;
Matrix after = null;
boolean isInAreas() {
return isInAreas(before) || isInAreas(after);
}
boolean isInAreas(Matrix m) {
return m != null && areas.stream().anyMatch(rect -> rect.contains(m.getTranslateX(), m.getTranslateY()));
}
};
editor.processPage(page);
}
document.save("WithoutSomeAnnotation-withBlackTextThere.pdf");
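The area test used above only looks at two points, the text position before and after the text drawing instruction. A minimal plain Java sketch of that check follows. The Rect class is a hypothetical stand-in for PDRectangle and the coordinates are made up.

```java
import java.util.Arrays;
import java.util.List;

class TextAreaCheckSketch {

    // Hypothetical minimal rectangle: lower left corner plus size.
    static final class Rect {
        final float x, y, width, height;

        Rect(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        boolean contains(float px, float py) {
            return px >= x && px <= x + width && py >= y && py <= y + height;
        }
    }

    // A text chunk counts as "in the areas" if its start or its end point is inside one.
    static boolean isInAreas(List<Rect> areas, float[] before, float[] after) {
        return isInAreas(areas, before) || isInAreas(areas, after);
    }

    static boolean isInAreas(List<Rect> areas, float[] point) {
        return point != null && areas.stream().anyMatch(r -> r.contains(point[0], point[1]));
    }

    public static void main(String[] args) {
        List<Rect> areas = Arrays.asList(new Rect(100, 700, 80, 12));
        // The chunk starts left of the annotation area but ends inside it.
        System.out.println(isInAreas(areas, new float[] {90, 702}, new float[] {130, 702}));
        // The chunk starts before and ends after the area: neither end point is inside.
        System.out.println(isInAreas(areas, new float[] {90, 702}, new float[] {200, 702}));
    }
}
```

The second call shows a limitation of the two-point test: a long text chunk that completely spans a removed annotation's rectangle is not detected, so depending on your documents a real overlap test may be needed.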

A better solution for Itext 7 Text fitting

I have a project that used iText 5 and worked as intended.
The program has to put user input into certain 'Chunks' inside paragraphs. Each paragraph has fixed words (chunks) per line, and the user input should be scaled to fit the space reserved for it inside the paragraph.
The old project had the following code (reduced to an example):
public class Oldway {
static final transient Font bold2 = FontFactory.getFont("Times-Roman", 10.0f, 1);
public static void main(String[] args) {
Document document = new Document();
document.setPageSize(PageSize.A4);
try {
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(new File("itext5.pdf")));
document.open();
Paragraph title = new Paragraph("Title of doc");
title.setAlignment(1);
document.add(title);
Paragraph dec= new Paragraph();
Chunk ch01 = new Chunk("Prev text ");
dec.add(ch01);
Chunk ch02 = new Chunk(getEmptySpace(42));
dec.add(ch02);
Chunk ch03 = new Chunk(" next Text");
dec.add(ch03);
document.add(dec);
float y = writer.getVerticalPosition(false);
float x2 = document.left() + ch01.getWidthPoint();
float x3 = x2 + ch02.getWidthPoint();
getPlainFillTest("Text to insert", document, y, x3, x2, writer, false);
document.close();
writer.flush();
} catch (FileNotFoundException | DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static Chunk getEmptySpace(int size) {
Chunk ch = new Chunk();
for(int i = 0;i<=size;i++) {
ch.append("\u00a0");
}
return new Chunk(ch);
}
public static void getPlainFillTest(String str,Document document,float y, float x1pos,
float x2pos, PdfWriter writer,boolean withTab) {
if(str.isEmpty() || str.isBlank()) {
str = "________";
}
Rectangle rec2 = null;
if(!withTab)
rec2 = new Rectangle(x2pos, y, x1pos-2,y+10);
else {
rec2 = new Rectangle(x2pos+35, y, x1pos+33,y+10);
}
BaseFont bf = bold2.getBaseFont();
PdfContentByte cb = writer.getDirectContent();
float fontSize = getMaxFontSize(bf, str,(int)rec2.getWidth(), (int)rec2.getHeight());
Phrase phrase = new Phrase(str, new Font(bf, fontSize));
ColumnText.showTextAligned(cb, Element.ALIGN_CENTER, phrase,
// center horizontally
(rec2.getLeft() + rec2.getRight()) / 2,
// shift baseline based on descent
rec2.getBottom() - bf.getDescentPoint(str, fontSize),0);
cb.saveState(); // the blue rectangle
cb.setColorStroke(Color.BLUE);
cb.rectangle(rec2.getLeft(), rec2.getBottom(), rec2.getWidth(), rec2.getHeight());
cb.stroke();
cb.restoreState();
}
//stackoverflow solution
private static float getMaxFontSize(BaseFont bf, String text, int width, int height){
// avoid infinite loop when text is empty
if(text.isEmpty()){
return 0.0f;
}
float fontSize = 0.1f;
while(bf.getWidthPoint(text, fontSize) < width){
fontSize += 0.1f;
}
float maxHeight = measureHeight(bf, text, fontSize);
while(maxHeight > height){
fontSize -= 0.1f;
maxHeight = measureHeight(bf, text, fontSize);
};
return fontSize;
}
public static float measureHeight(BaseFont baseFont, String text, float fontSize)
{
float ascend = baseFont.getAscentPoint(text, fontSize);
float descend = baseFont.getDescentPoint(text, fontSize);
return ascend - descend;
}}
Now I'm trying to do the same thing in iText 7 and ... it is not that easy!
I managed to create working code, but it's messy, and some things don't get the right coordinates. The iText 7 code (reduced to an example):
public class Newway {
public static void main(String[] args) {
PdfWriter writer;
try {
writer = new PdfWriter(new File("test2.pdf"));
PdfDocument document = new PdfDocument(writer);
document.getDocumentInfo().addCreationDate();
document.getDocumentInfo().setTitle("Title");
document.setDefaultPageSize(PageSize.A4);
Document doc = new Document(document);
doc.setFontSize(12);
Paragraph par = new Paragraph();
Text ch01 = new Text("Prev Text ");
par.add(ch01);
Paragraph space = new Paragraph();
space.setMaxWidth(40);
for(int i=0;i<40;i++) {
par.add("\u00a0");
space.add("\u00a0");
}
Text ch02 = new Text(" next text");
par.add(ch02);
doc.add(par);
Paragraph linePara = new Paragraph().add("Test from UserInput")
.setTextAlignment(TextAlignment.CENTER).setBorder(new DottedBorder(1));
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
IRenderer primul = ch01.createRendererSubTree().setParent(doc.getRenderer());
IRenderer spaceR = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult primulResult = primul.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
LayoutResult layoutResult = spaceR.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
Rectangle primulBox = ((TextRenderer) primul).getInnerAreaBBox();
Rectangle rect = ((ParagraphRenderer) spaceR).getInnerAreaBBox();
float rwidth = rect.getWidth();
float rheight = rect.getHeight();
float x = primulBox.getWidth()+ doc.getLeftMargin();
float y = rect.getY()+(rheight*2.05f);//rect.getY() is never accurate, is always below the paragraph. WHY ??
Rectangle towr = new Rectangle(x, y, rwidth, rheight*1.12f);//rheight on default is way too small
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
// from the internet
float fontSizeL = 1;
float fontSizeR = 14;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
linePara.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root renderer
IRenderer renderer = linePara.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
}
canvas.add(linePara);
new PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
doc.close();
writer.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}}
The questions are:
Is there a better, more elegant way to do this?
Why is rect.getY() always below the paragraph, and how do I get Y to match the paragraph's real Y coordinate?
Why is the default 'rheight' always too small, while (rheight*1.1f) works?
(Optional) How do I set the tab() space size in iText 7?

This way is quite good because it takes into account all the possible model element settings and implications of the layout process. Your iText 5 alternative was good enough for the basic case of Latin-based text without any modifications on the visual side. The iText 7 code you have is much more flexible and will still work if you use more complex layout settings, complex scripts etc. Also, the iText 5 code in your example is 105 lines while the iText 7 code is 80 lines.
You are adding a magic +(rheight*2.05f) here. What you are actually missing is that when you draw via Canvas, your margins are no longer applied, so instead of rect.getY()+(rheight*2.05f) you need rect.getY() + doc.getBottomMargin().
The issue comes from the fact that you calculate rheight from renderer.getInnerAreaBBox(), which does not take into account the default margins that are applied to a paragraph. Margins are included in the occupied area but not in the inner area bbox. To fix that, use renderer.getOccupiedArea().getBBox() instead. In this case there is no need to multiply rheight by a coefficient anymore.
The visual result is slightly different now, but there are no magic constants anymore. Depending on what you are really trying to achieve, you can tune the code further (add some margins here and there etc). But the code adapts well to changes in the user text.
Visual result before: [screenshot]
Visual result after: [screenshot]
Resultant code:
PdfWriter writer;
try {
writer = new PdfWriter(new File("test2.pdf"));
PdfDocument document = new PdfDocument(writer);
document.getDocumentInfo().addCreationDate();
document.getDocumentInfo().setTitle("Title");
document.setDefaultPageSize(PageSize.A4);
Document doc = new Document(document);
doc.setFontSize(12);
Paragraph par = new Paragraph();
Text ch01 = new Text("Prev Text ");
par.add(ch01);
Paragraph space = new Paragraph();
space.setMaxWidth(40);
for(int i=0;i<40;i++) {
par.add("\u00a0");
space.add("\u00a0");
}
Text ch02 = new Text(" next text");
par.add(ch02);
doc.add(par);
Paragraph linePara = new Paragraph().add("Test from UserInput")
.setTextAlignment(TextAlignment.CENTER).setBorder(new DottedBorder(1));
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
IRenderer primul = ch01.createRendererSubTree().setParent(doc.getRenderer());
IRenderer spaceR = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult primulResult = primul.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
LayoutResult layoutResult = spaceR.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
Rectangle primulBox = ((TextRenderer) primul).getInnerAreaBBox();
Rectangle rect = ((ParagraphRenderer) spaceR).getOccupiedArea().getBBox();
float rwidth = rect.getWidth();
float rheight = rect.getHeight();
float x = primulBox.getWidth()+ doc.getLeftMargin();
float y = rect.getY() + doc.getBottomMargin();
Rectangle towr = new Rectangle(x, y, rwidth, rheight);
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
// from the internet
float fontSizeL = 1;
float fontSizeR = 14;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
linePara.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root renderer
IRenderer renderer = linePara.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
}
canvas.add(linePara);
new PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
doc.close();
writer.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
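The font size loop in the code above is a plain bisection: it keeps a size known to fit (fontSizeL) and a size assumed not to fit (fontSizeR) and halves the interval until it is narrower than 0.1. Isolated from iText it can be sketched as follows. The fits predicate with its fixed glyph width and line height factors is a made-up stand-in for the renderer.layout(...) check, not real font metrics.

```java
import java.util.function.DoublePredicate;

class FontSizeBisectionSketch {

    // Returns the lower bound after bisecting [low, high] down to the tolerance.
    // Like the loop above, it assumes that the initial low size fits.
    static float maxFittingFontSize(float low, float high, float tolerance, DoublePredicate fits) {
        while (Math.abs(high - low) > tolerance) {
            float mid = (low + high) / 2;
            if (fits.test(mid)) {
                low = mid;   // mid fits, so try larger sizes
            } else {
                high = mid;  // mid does not fit, so try smaller sizes
            }
        }
        return low;
    }

    public static void main(String[] args) {
        String text = "Test from UserInput";
        float boxWidth = 100f;
        float boxHeight = 20f;
        // Hypothetical metrics (assumption): every glyph is 0.5 em wide, a line is 1.2 em high.
        DoublePredicate fits = size ->
                size * 0.5 * text.length() <= boxWidth && size * 1.2 <= boxHeight;
        System.out.println(maxFittingFontSize(1f, 14f, 0.1f, fits));
    }
}
```

In the real code the predicate is the layout attempt itself, which is what makes the approach independent of the script and of the layout settings in use.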

Traverse whole PDF and Remove underlines of hyperlinks (annotations) only + iText

I have successfully changed the color of underlines using the code from the question linked below. Can anyone help me remove those underlines from the PDF instead? I find the underlines using the code from that question.
Traverse whole PDF and change blue color to black ( Change color of underlines as well) + iText
Below is my code that finds hyperlinks and changes their color to black. I need to modify this code so that it removes the underlines.
PdfCanvasEditor editor = new PdfCanvasEditor() {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
editor.editPage(pdfDocument, i);
}
Edit:
The accepted answer does not work for the files below:
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (Page 41)
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (Page 60).
Please help.

As described in a comment in the context of the referenced question:
it is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions by instructions dropping the current path without drawing it. If only doing so in case of the applicable current color being blue, that would likely do the job in case of your example PDFs. But beware, in documents with other graphics with blue elements (e.g. logos), these would be mutilated, too.
This is what the following content editor does:
class PdfGraphicsRemoverByColor extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColor(Color color) {
        this.color = color;
    }
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();
        if (color.equals(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";
                break;
            case "b":
            case "b*":
                operatorString = "s";
                break;
            case "B":
            case "B*":
                operatorString = "S";
                break;
            }
        }
        if (color.equals(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";
                break;
            case "b":
            case "B":
                operatorString = "f";
                break;
            case "b*":
            case "B*":
                operatorString = "f*";
                break;
            }
        }
        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }
    final Color color;
}
(RemoveGraphicsByColor helper class)
Applied like this:
try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColor(ColorConstants.BLUE);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(RemoveGraphicsByColor tests)
to the example files Control_of_nitrosamine_impurities_in_sartans__rev.pdf, EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf, and originalFile.pdf from the referenced question, one gets:
[three result screenshots, one per example file]
Beware, this is merely a proof-of-concept, not a final and complete solution. In particular:
Only RGB blue is considered. This might be an issue particularly in case of documents explicitly designed for printing (likely using CMYK colors).
All path fills and strokes are dropped whenever the applicable current color is blue. Depending on your documents, this may have to be filtered further.
PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
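The operator substitution performed by the two switch blocks above can be isolated as a pure function and checked without a PDF at hand. The following is only a minimal sketch and not part of the original helper class: the class name PathPaintRewriteSketch and the method rewrite are made up for illustration, and the two boolean parameters stand in for the comparisons against getGraphicsState().getFillColor() and getGraphicsState().getStrokeColor().

```java
// Illustration only: mirrors the two switch blocks of the editor class,
// with the color checks replaced by plain boolean flags (an assumption).
final class PathPaintRewriteSketch {

    /**
     * Returns the path painting operator to write instead of op.
     * fillMatches / strokeMatches say whether the current fill / stroke
     * color is the one whose graphics should disappear.
     */
    static String rewrite(String op, boolean fillMatches, boolean strokeMatches) {
        if (fillMatches) {
            switch (op) {
                case "f": case "f*": case "F":
                    op = "n";   // fill only -> drop the path without painting
                    break;
                case "b": case "b*":
                    op = "s";   // close, fill, stroke -> close and stroke only
                    break;
                case "B": case "B*":
                    op = "S";   // fill and stroke -> stroke only
                    break;
                default:
                    break;
            }
        }
        if (strokeMatches) {
            switch (op) {
                case "s": case "S":
                    op = "n";   // stroke only -> drop the path without painting
                    break;
                case "b": case "B":
                    op = "f";   // fill and stroke -> fill only (nonzero winding rule)
                    break;
                case "b*": case "B*":
                    op = "f*";  // fill and stroke -> fill only (even-odd rule)
                    break;
                default:
                    break;
            }
        }
        return op;
    }
}
```

Because the stroke check runs on the result of the fill check, an operator that both fills and strokes ends up as "n" when both colors match: rewrite("b*", true, true) first becomes "s" and then "n". Operators that do not paint a path, such as "re", pass through unchanged.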
Different shades of blue from other RGB'ish color spaces
Testing the code above, you found documents in which the blue lines were not removed. As it turned out, these blue colors were not from the standard DeviceRGB color space but from ICCBased color spaces (profiled RGB color spaces, to be more exact). Furthermore, one document used not a pure blue 0 0 1 but a .17255 .3098 .63529 blue.
To be able to deal with these documents as well, the approach above must be generalized; for example, we can use a Predicate<Color> instead of a single, specific Color, like this:
class PdfGraphicsRemoverByColorPredicate extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColorPredicate(Predicate<Color> colorPredicate) {
        this.colorPredicate = colorPredicate;
    }
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();
        if (colorPredicate.test(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";
                break;
            case "b":
            case "b*":
                operatorString = "s";
                break;
            case "B":
            case "B*":
                operatorString = "S";
                break;
            }
        }
        if (colorPredicate.test(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";
                break;
            case "b":
            case "B":
                operatorString = "f";
                break;
            case "b*":
            case "B*":
                operatorString = "f*";
                break;
            }
        }
        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }
    final Predicate<Color> colorPredicate;
}
(RemoveGraphicsByColor helper class)
Applied like this:
try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColorPredicate(RemoveGraphicsByColor::isRgbBlue);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(RemoveGraphicsByColor testRemoveAllBlueLinesFrom* tests)
to the new example files using this predicate method
public static boolean isRgbBlue(Color color) {
    if (color instanceof CalRgb || color instanceof DeviceRgb || (color instanceof IccBased && color.getNumberOfComponents() == 3)) {
        float[] components = color.getColorValue();
        float r = components[0];
        float g = components[1];
        float b = components[2];
        return b > .5f && r < .9f*b && g < .9f*b;
    }
    return false;
}
(RemoveGraphicsByColor helper method)
one gets:
[two result screenshots, one per example file]
Beware, the warnings from above still apply.
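The thresholds used in isRgbBlue can be tried out on plain component values, without iText on the classpath. The sketch below repeats the same comparison on three floats; the class name BlueShadeSketch is made up for illustration. The method isCmykBlue is an assumption that goes beyond the code above: it addresses the CMYK caveat by converting with the naive, profile-free formula r = (1 - c)(1 - k) (and likewise for g and b), which is only a rough approximation of how a real CMYK color would render.

```java
// Illustration only: the same "blue enough" test as isRgbBlue above,
// expressed on raw component values in the range 0..1.
final class BlueShadeSketch {

    /** Blue dominates: b is above one half and clearly larger than r and g. */
    static boolean isRgbBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    /**
     * Assumption, not from the original answer: naive CMYK to RGB
     * conversion without any color profile, followed by the RGB test.
     */
    static boolean isCmykBlue(float c, float m, float y, float k) {
        float r = (1 - c) * (1 - k);
        float g = (1 - m) * (1 - k);
        float b = (1 - y) * (1 - k);
        return isRgbBlue(r, g, b);
    }
}
```

With these thresholds both the pure blue 0 0 1 and the .17255 .3098 .63529 blue from the problem document are accepted, while black, white, and cyan (0 1 1) are rejected.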

Change PDF Annotation font size using itext 7

My question is a bit similar to this one: Change PDF Annotation properties using iTextSharp C#
But I specifically want to change the font size of a PDF annotation using iText 7. I have searched a lot online but haven't been able to find any good examples or documentation on this. The following is the code I have used.
static void EditAnnot(string PDF)
{
    string OutPDF = @"C:\Users\AP037X\Desktop\test.pdf";
    iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(new iText.Kernel.Pdf.PdfReader(PDF), new iText.Kernel.Pdf.PdfWriter(OutPDF));
    iText.Kernel.Pdf.PdfDictionary pageDict = pdfDoc.GetPage(1).GetPdfObject();
    iText.Kernel.Pdf.PdfArray annots = pageDict.GetAsArray(iText.Kernel.Pdf.PdfName.Annots);
    if (annots != null)
    {
        for (int i = 0; i < annots.Size(); i++)
        {
            Console.WriteLine("Scan..");
            if (annots.GetAsDictionary(i) == null)
            {
                Console.WriteLine("1");
                //return;
            }
            iText.Kernel.Pdf.PdfString t = annots.GetAsDictionary(i).GetAsString(iText.Kernel.Pdf.PdfName.Contents);
            if (t == null)
            {
                Console.WriteLine("2");
                //return;
            }
            Console.WriteLine(t);
            if (Convert.ToString(t).Trim() == "Change")
            {
                Console.WriteLine("Found");
                Console.WriteLine(annots.Size());
                iText.Kernel.Geom.Rectangle rect = annots.GetAsDictionary(i).GetAsRectangle(iText.Kernel.Pdf.PdfName.Rect);
                iText.Kernel.Pdf.PdfString cont = new iText.Kernel.Pdf.PdfString("New String");
                iText.Kernel.Pdf.Annot.PdfFreeTextAnnotation NewAnnot = new iText.Kernel.Pdf.Annot.PdfFreeTextAnnotation(rect,cont);
                float[] color = { 1f,1f,0f};
                NewAnnot.SetColor(color);
                NewAnnot.Put(iText.Kernel.Pdf.PdfName.Contents, new iText.Kernel.Pdf.PdfString("lion"));
                NewAnnot.Put(iText.Kernel.Pdf.PdfName.Font, *What to type here?*);
                annots.Remove(i);
                annots.Add(i, NewAnnot.GetPdfObject());
            }
        }
    }
    pdfDoc.Close();
    CompressPDF(OutPDF);
}
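Regarding the *What to type here?* placeholder: in the PDF format a free text annotation does not carry its font size in a Font entry but in its default appearance string (the DA entry), a small content stream fragment such as /Helv 12 Tf 0 g in which the number before Tf is the size. The sketch below only shows how such a string could be rewritten. It is written in Java with the standard library only, the class name DefaultAppearanceSketch and the method withFontSize are made up for illustration, and reading the string from and writing it back to the annotation dictionary with iText is deliberately not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: replaces the font size operand of the Tf operator
// inside a default appearance (DA) string.
final class DefaultAppearanceSketch {

    // font name, whitespace, size operand, whitespace, Tf operator
    private static final Pattern FONT_SIZE =
            Pattern.compile("(/\\S+\\s+)(\\d*\\.?\\d+)(\\s+Tf)");

    /** Returns da with the Tf size replaced, or da unchanged if it has no Tf. */
    static String withFontSize(String da, float newSize) {
        Matcher m = FONT_SIZE.matcher(da);
        if (!m.find()) {
            return da;
        }
        // Write whole numbers without a trailing ".0"
        String size = (newSize == Math.rint(newSize))
                ? Long.toString((long) newSize)
                : Float.toString(newSize);
        return da.substring(0, m.start(2)) + size + da.substring(m.end(2));
    }
}
```

Note that a viewer may keep rendering the annotation from an existing appearance stream (the AP entry), so changing DA alone may have no visible effect until that appearance is removed or regenerated.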

Remove underlines from text in PDF file

I have a bunch of PDF files with broken links.
I need to remove those links and right now I can do the following:
Remove link actions
Change text color from blue to black
What I can't do is remove the blue underlines below the text that was a link before.
I tried several PDF libraries for .NET (because this is my primary platform):
Aspose.PDF
PDFSharp
ceTe DynamicPDF
PDFBox
You are welcome to recommend a solution in any programming language, platform, or library. I just need to get this done.

In the sample document the underlines are drawn as blue (RGB 0,0,1) filled vector graphics rectangles (long, slim ones). As blue is used only for the links, we can use that criterion to find the rectangles in question.
Here is a sample implementation using PDFBox 1.8.10:
void removeBlueRectangles(PDDocument document) throws IOException
{
    List<?> pages = document.getDocumentCatalog().getAllPages();
    for (int i = 0; i < pages.size(); i++)
    {
        PDPage page = (PDPage) pages.get(i);
        PDStream contents = page.getContents();
        PDFStreamParser parser = new PDFStreamParser(contents.getStream());
        parser.parse();
        List<Object> tokens = parser.getTokens();
        Stack<Boolean> blueState = new Stack<Boolean>();
        blueState.push(false);
        for (int j = 0; j < tokens.size(); j++)
        {
            Object next = tokens.get(j);
            if (next instanceof PDFOperator)
            {
                PDFOperator op = (PDFOperator) next;
                if (op.getOperation().equals("q"))
                {
                    blueState.push(blueState.peek());
                }
                else if (op.getOperation().equals("Q"))
                {
                    blueState.pop();
                }
                else if (op.getOperation().equals("rg"))
                {
                    if (j > 2)
                    {
                        Object r = tokens.get(j-3);
                        Object g = tokens.get(j-2);
                        Object b = tokens.get(j-1);
                        if (r instanceof COSNumber && g instanceof COSNumber && b instanceof COSNumber)
                        {
                            blueState.pop();
                            blueState.push((
                                    Math.abs(((COSNumber)r).floatValue() - 0) < 0.001 &&
                                    Math.abs(((COSNumber)g).floatValue() - 0) < 0.001 &&
                                    Math.abs(((COSNumber)b).floatValue() - 1) < 0.001));
                        }
                    }
                }
                else if (op.getOperation().equals("f"))
                {
                    if (blueState.peek() && j > 0)
                    {
                        Object re = tokens.get(j-1);
                        if (re instanceof PDFOperator && ((PDFOperator)re).getOperation().equals("re"))
                        {
                            tokens.set(j, PDFOperator.getOperator("n"));
                        }
                    }
                }
            }
        }
        PDStream updatedStream = new PDStream(document);
        OutputStream out = updatedStream.createOutputStream();
        ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
        tokenWriter.writeTokens(tokens);
        page.setContents(updatedStream);
    }
}
(RemoveUnderlines.java)
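The state tracking in removeBlueRectangles can be followed more easily on a simplified token list. The sketch below is an illustration with the standard library only, not PDFBox code: operators are modeled as String and operands as Number (both assumptions made for this example), and the class name BlueRectangleScanSketch is made up. It keeps one "fill color is blue" flag per graphics state level, duplicates it on q, discards it on Q, updates it on rg, and replaces an f that directly follows a re by n while the flag is set. Unlike the method above, it ignores a Q that has no matching q instead of failing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustration only: the scan of removeBlueRectangles on a simplified
// token model (String = operator, Number = operand).
final class BlueRectangleScanSketch {

    static List<Object> dropBlueRectangleFills(List<Object> tokens) {
        List<Object> result = new ArrayList<>(tokens);
        Deque<Boolean> blueState = new ArrayDeque<>();
        blueState.push(false);                      // initial fill color is not blue
        for (int j = 0; j < result.size(); j++) {
            Object token = result.get(j);
            if (!(token instanceof String)) {
                continue;                           // operands are inspected via their operator
            }
            switch ((String) token) {
                case "q":                           // save graphics state
                    blueState.push(blueState.peek());
                    break;
                case "Q":                           // restore graphics state
                    if (blueState.size() > 1) {
                        blueState.pop();
                    }
                    break;
                case "rg":                          // set RGB fill color
                    if (j > 2 && isNear(result.get(j - 3), 0)
                            && isNear(result.get(j - 2), 0)
                            && isNear(result.get(j - 1), 1)) {
                        blueState.pop();
                        blueState.push(true);
                    } else {
                        blueState.pop();
                        blueState.push(false);
                    }
                    break;
                case "f":                           // fill: drop it for blue rectangles
                    if (blueState.peek() && j > 0 && "re".equals(result.get(j - 1))) {
                        result.set(j, "n");
                    }
                    break;
                default:
                    break;
            }
        }
        return result;
    }

    private static boolean isNear(Object token, float reference) {
        return token instanceof Number
                && Math.abs(((Number) token).floatValue() - reference) < 0.001f;
    }
}
```

For the token sequence q 0 0 1 rg 10 10 100 1 re f Q 10 30 100 1 re f, only the first f is replaced by n, because the Q restores the non-blue state before the second rectangle is filled.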
original.pdf
Applying this to your first sample file original.pdf
public void testOriginal() throws IOException, COSVisitorException
{
    try (   InputStream resourceStream = getClass().getResourceAsStream("original.pdf") )
    {
        PDDocument document = PDDocument.loadNonSeq(resourceStream, null);
        removeBlueRectangles(document);
        document.save("original-noBlueRectangles.pdf");
        document.close();
    }
}
(RemoveUnderlines.java)
results in:
[result screenshot]
1178.pdf
You commented
After testing this on many files I have to say this solution works incorrectly in some cases. For example in for this file (dropbox.com/s/23g54bvt781lb93/1178.pdf?dl=0) it removes the entire content of the page. Keep searching..
So I applied the code to your new sample file 1178.pdf
public void test1178() throws IOException, COSVisitorException
{
    try (   InputStream resourceStream = getClass().getResourceAsStream("1178.pdf") )
    {
        PDDocument document = PDDocument.loadNonSeq(resourceStream, null);
        removeBlueRectangles(document);
        document.save(new File(RESULT_FOLDER, "1178-noBlueRectangles.pdf"));
        document.close();
    }
}
(RemoveUnderlines.java)
which resulted in:
[result screenshot]
So I cannot confirm your claim that the solution works incorrectly; in particular I see that it does not remove the entire content of the page.
As I cannot reproduce your observation, I assume there are additional issues in your setup you have not yet mentioned.
