How to identify whether the text is boxed in PDF using PDFBOX?

How to identify whether the text is boxed in PDF using PDFBOX? - pdfbox

I am trying to check whether the text is BOXED using apache PDFBOX. for few PDF the below code wont work.
public class PDFBoxReader extends PDFGraphicsStreamEngine {
private static ArrayList<Rectangle2D> recList = new ArrayList<Rectangle2D>();
public PDFBoxReader(PDPage page) {
super(page);
}
public static boolean isTextBoxed(PDDocument document, String text) {
StingBuffer boxedText = new StringBuffer();
for (PDPage page : document.getPages()) {
PDFBoxReader reader = new PDFBoxReader(page);
try {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
rectList = new ArrayList<Rectangle2D>();
reader.processPage(page);
for (Rectangle2D react : rectList) {
Double y = page.cropBox().getUpperRightY() - rect.getY() - rect.getHeight();
rect.setRect(rect.getX(), y, rect.getWidth(), rect.getWidth(), rect.getHeight());
stripper.addRegion("box", rect);
stripper.extractRegions(page);
boxedText.append(stripper.getTextForRegion("box"));
}
if (isTextMatched(text, stripper.getTextForRegion("box"))) {
return true;
}
} catch (IoException exception) {
// exception is handled here
}
}
}
// some more methods here
}
PDF dont have any acroform. it has a paragraph in a bordered box at the end of the page.

Related

Extract images of signatures contained in a pdf file with iText7

I am wondering how we can use ITEXT7 to extract image info associated to digital signatures. I know there have been similar questions asked in the past, but they were mostly around ITEXT5, which is quite different from the ITEXT7 after all the updates and modifications to the software.

You can extract the image from a signature appearance using low-level API.
Complete Java code:
private void saveImageFromSignature(PdfDocument document, String fieldName) throws IOException {
PdfAcroForm acroForm = PdfAcroForm.getAcroForm(document, false);
PdfDictionary xObject = acroForm.getField(name)
.getWidgets()
.get(0)
.getNormalAppearanceObject()
.getAsDictionary(PdfName.Resources)
.getAsDictionary(PdfName.XObject)
.getAsStream(new PdfName("FRM"))
.getAsDictionary(PdfName.Resources)
.getAsDictionary(PdfName.XObject);
PdfStream stream = xObject.getAsStream(new PdfName("Im1"));
PdfImageXObject image = new PdfImageXObject(stream);
BufferedImage result = createImageFromBytes(image.getImageBytes());
//pdf allows using masked image in the signature appearance
PdfStream maskStream = (PdfStream) stream.getAsStream(PdfName.SMask);
if (maskStream != null) {
PdfImageXObject maskImage = new PdfImageXObject(maskStream);
BufferedImage maskBimage = createImageFromBytes(maskImage.getImageBytes());
String fileMask = String.format(getOutputFolder() + "/file_mask_%d.%s",
image.getPdfObject().getIndirectReference().getObjNumber(),
image.identifyImageFileExtension());
ImageIO.write(maskBimage,
image.identifyImageFileExtension(),
new File(fileMask));
//the mask defines an alfa channel
Image transpImg = transformToTransperency(maskBimage);
result = applyTransperency(result, transpImg);
}
String filenameComp = String.format(getOutputFolder() + "/file_comp_%d.%s",
image.getPdfObject().getIndirectReference().getObjNumber(),
image.identifyImageFileExtension());
ImageIO.write(result,
image.identifyImageFileExtension(),
new File(filenameComp));
document.close();
}
private Image transformToTransperency(BufferedImage bi) {
ImageFilter filter = new RGBImageFilter() {
#Override
public int filterRGB(int x, int y, int rgb) {
return (rgb << 8) & 0xFF000000;
}
};
ImageProducer ip = new FilteredImageSource(bi.getSource(), filter);
return Toolkit.getDefaultToolkit().createImage(ip);
}
private BufferedImage applyTransperency(BufferedImage bi, Image mask) {
BufferedImage dest = new BufferedImage(
bi.getWidth(), bi.getHeight(),
BufferedImage.TYPE_INT_ARGB);
Graphics2D g2 = dest.createGraphics();
g2.drawImage(bi, 0, 0, null);
AlphaComposite ac = AlphaComposite.getInstance(AlphaComposite.DST_IN, 1.0F);
g2.setComposite(ac);
g2.drawImage(mask, 0, 0, null);
g2.dispose();
return dest;
}
Upd: This works for a very limited number of cases. Thanks for #mkl.

First of all, thank you for the proposals which personally guided me.
After several tries, here is the code that worked for me:
public void extract(String inputFilename, String fieldName) throws IOException {
try (PdfDocument document = new PdfDocument(new PdfReader(inputFilename))){
PdfAcroForm acroForm = PdfAcroForm.getAcroForm(document, false);
final PdfFormField signatorySignature1 = acroForm.getField(fieldName);
final PdfDictionary appearanceDic = signatorySignature1.getPdfObject().getAsDictionary(PdfName.AP);
final PdfStream normalAppearance = appearanceDic.getAsStream(PdfName.N);
final PdfDictionary ressourceDic = normalAppearance.getAsDictionary(PdfName.Resources);
PdfResources resources = new PdfResources(ressourceDic);
final ImageRenderInfo imageRenderInfo = extractImageRenderInfo(normalAppearance.getBytes(), resources);
Files.write(
Path.of(inputFilename + "_" + fieldName + "_" + System.currentTimeMillis() + ".png"),
imageRenderInfo.getImage().getImageBytes());
} catch (Exception e) {
e.printStackTrace();
}
}
public ImageRenderInfo extractImageRenderInfo(byte[] contentBytes, PdfResources pdfResource) {
MyLocationExtractionStrategy strategy = new MyLocationExtractionStrategy();
PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy, new HashMap<>());
parser.processContent(contentBytes, pdfResource);
return strategy.getImageRenderInfo();
}
class MyLocationExtractionStrategy implements ILocationExtractionStrategy {
private ImageRenderInfo imageRenderInfo;
#Override public Collection<IPdfTextLocation> getResultantLocations() {
return null;
}
#Override public void eventOccurred(IEventData iEventData, EventType eventType) {
if (eventType.equals(EventType.RENDER_IMAGE)) {
imageRenderInfo = (ImageRenderInfo)iEventData;
}
}
#Override public Set<EventType> getSupportedEvents() {
return null;
}
public ImageRenderInfo getImageRenderInfo() {
return this.imageRenderInfo;
}
}

Delete button stops working sporadically, only on Windows, when using our editor plugin

We are developing a markdown editor plugin for eclipse. My colleagues who are using Windows 10 encounter a bug that causes keys to stop working. The most common key is delete, other times it is ctrl + s.
Here is the code for the editor extension:
public class MarkdownEditor extends AbstractTextEditor {
private Activator activator;
private MarkdownRenderer markdownRenderer;
private IWebBrowser browser;
public MarkdownEditor() throws FileNotFoundException {
setSourceViewerConfiguration(new TextSourceViewerConfiguration());
setDocumentProvider(new TextFileDocumentProvider());
// Activator manages connections to the Workbench
activator = Activator.getDefault();
markdownRenderer = new MarkdownRenderer();
}
private IFile saveMarkdown(IEditorInput editorInput, IDocument document, IProgressMonitor progressMonitor) {
IProject project = getCurrentProject(editorInput);
String mdFileName = editorInput.getName();
String fileName = mdFileName.substring(0, mdFileName.lastIndexOf('.'));
String htmlFileName = fileName + ".html";
IFile file = project.getFile(htmlFileName);
String markdownString = "<!DOCTYPE html>\n" + "<html>" + "<head>\n" + "<meta charset=\"utf-8\">\n" + "<title>"
+ htmlFileName + "</title>\n" + "</head>" + "<body>" + markdownRenderer.render(document.get())
+ "</body>\n" + "</html>";
try {
if (!project.isOpen())
project.open(progressMonitor);
if (file.exists())
file.delete(true, progressMonitor);
if (!file.exists()) {
byte[] bytes = markdownString.getBytes();
InputStream source = new ByteArrayInputStream(bytes);
file.create(source, IResource.NONE, progressMonitor);
}
} catch (CoreException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return file;
}
private void loadFileInBrowser(IFile file) {
IWorkbench workbench = PlatformUI.getWorkbench();
try {
if (browser == null)
browser = workbench.getBrowserSupport().createBrowser(Activator.PLUGIN_ID);
URL htmlFile = FileLocator.toFileURL(file.getLocationURI().toURL());
browser.openURL(htmlFile);
IWorkbenchPartSite site = this.getSite();
IWorkbenchPart part = site.getPart();
site.getPage().activate(part);
} catch (IOException | PartInitException e) {
e.printStackTrace();
}
}
#Override
public void init(IEditorSite site, IEditorInput editorInput) throws PartInitException {
super.init(site, editorInput);
IDocumentProvider documentProvider = getDocumentProvider();
IDocument document = documentProvider.getDocument(editorInput);
IFile htmlFile = saveMarkdown(editorInput, document, null);
loadFileInBrowser(htmlFile);
}
#Override
public void doSave(IProgressMonitor progressMonitor) {
IDocumentProvider documentProvider = getDocumentProvider();
if (documentProvider == null)
return;
IEditorInput editorInput = getEditorInput();
IDocument document = documentProvider.getDocument(editorInput);
if (documentProvider.isDeleted(getEditorInput())) {
if (isSaveAsAllowed()) {
/*
* 1GEUSSR: ITPUI:ALL - User should never loose changes made in the editors.
* Changed Behavior to make sure that if called inside a regular save (because
* of deletion of input element) there is a way to report back to the caller.
*/
performSaveAs(progressMonitor);
} else {
}
} else {
// Convert document from string to string array with each instance a single line
// of the document
String[] stringArrayOfDocument = document.get().split("\n");
String[] formattedLines = PipeTableFormat.preprocess(stringArrayOfDocument);
StringBuilder builder = new StringBuilder();
for (String line : formattedLines) {
builder.append(line);
builder.append("\n");
}
String formattedDocument = builder.toString();
// Calculating the position of the cursor
ISelectionProvider selectionProvider = this.getSelectionProvider();
ISelection selection = selectionProvider.getSelection();
int cursorLength = 0;
if (selection instanceof ITextSelection) {
ITextSelection textSelection = (ITextSelection) selection;
cursorLength = textSelection.getOffset(); // etc.
activator.log(Integer.toString(cursorLength));
}
// This sets the cursor on at the start of the file
document.set(formattedDocument);
// Move the cursor
this.setHighlightRange(cursorLength, 0, true);
IFile htmlFile = saveMarkdown(editorInput, document, progressMonitor);
loadFileInBrowser(htmlFile);
performSave(false, progressMonitor);
}
}
private IProject getCurrentProject(IEditorInput editorInput) {
IProject project = editorInput.getAdapter(IProject.class);
if (project == null) {
IResource resource = editorInput.getAdapter(IResource.class);
if (resource != null) {
project = resource.getProject();
}
}
return project;
}
}
Here is the repository: https://github.com/borisv13/GitHub-Flavored-Markdown-Eclipse-Plugin
Any help is accepted

When merging PDF files using itext the result pdf show 0 bytes in android studio

My code is here; I was selecting PDF files from a SD card or internal phone memory using the showFileChooser() method then from onActivityResult(), the files go to mergePdfFiles(View view) method, and from there the files go to createPdf(String[] srcs).
Here is the complete code of my activity:
//Below is onCreate();
#Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_mergedpdf);
adapter = new ArrayAdapter<>(this,
android.R.layout.simple_list_item_1, locations);
// public void fileSelected(File file ) {
// locations.add(file.toString());
// adapter.notifyDataSetChanged();
// }
NewText = (TextView)findViewById(R.id.textView);
AdView mAdView = findViewById(R.id.adView);
AdRequest adRequest = new AdRequest.Builder().build();
mAdView.loadAd(adRequest);
listView = (ListView)findViewById(R.id.list_items);
btn = (ImageView) findViewById(R.id.imageView8);
btnconvert = (ImageView) findViewById(R.id.imageView11);
btnconvert.setOnClickListener(new View.OnClickListener() {
#Override
public void onClick(View v) {
// btTag=((Button)v).getTag().toString();
try {
createPdf( locations);
} catch (DocumentException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
//Toast.makeText(Mergedpdf.this, "button clicked", Toast.LENGTH_SHORT).show();
}
});
btn.setOnClickListener(new View.OnClickListener() {
#Override
public void onClick(View v) {
// btTag=((Button)v).getTag().toString();
showFileChooser();
NewText.setText("Below Files are Selected");
}
});
}
//this is mergedfiles();
public void mergePdfFiles(View view){
Toast.makeText(Mergedpdf.this, "merge function", Toast.LENGTH_SHORT).show();
try {String[] srcs= new String[locations.size()];
for(int i = 0;i<locations.size();i++) {
srcs[i] = locations.get(i);
}
// String[] srcs = {txt1.getText().toString(), txt2.getText().toString()};
createPdf(srcs);
}catch (Exception e){e.printStackTrace();}
}
//This method create merged pdf file
public void createPdf (String[] srcs) {
try {
// Create document object
File docsFolder = new File(Environment.getExternalStorageDirectory() + "/Merged-PDFFiles");
if (!docsFolder.exists()) {
docsFolder.mkdir();
Log.i(TAG, "Created a new directory for PDF");
}
Date date = new Date();
#SuppressLint("SimpleDateFormat") final String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(date);
Document document = new Document();
// pdfCopy = new File(docsFolder.getAbsolutePath(),pdf+"Images.pdf");
// Document document = new Document();
//PdfWriter.getInstance(document, output);
// Create pdf copy object to copy current document to the output mergedresult file
PdfCopy copy = new PdfCopy(document, new FileOutputStream(docsFolder + "/" + timeStamp +"combine.pdf"));
Toast.makeText(Mergedpdf.this, "merged pdf saved", Toast.LENGTH_SHORT).show();
// Open the document
document.open();
PdfReader pr;
int n;
for (int i = 0; i < srcs.length; i++) {
// Create pdf reader object to read each input pdf file
pr = new PdfReader(srcs[i].toString());
// Get the number of pages of the pdf file
n = pr.getNumberOfPages();
for (int page = 1; page <= n; page++) {
// Import all pages from the file to PdfCopy
copy.addPage(copy.getImportedPage(pr, page));
}
}
document.close(); // close the document
} catch (Exception e) {
e.printStackTrace();
}
}
//below is showfilechooser(); method
private void showFileChooser () {
Log.e("AA", "bttag=" + btTag);
String folderPath = Environment.getExternalStorageDirectory() + "/";
Intent intent = new Intent();
intent.setAction(Intent.ACTION_GET_CONTENT);
Uri myUri = Uri.parse(folderPath);
intent.setDataAndType(myUri, "application/pdf");
Intent intentChooser = Intent.createChooser(intent, "Select a file");
startActivityForResult(intentChooser,PICKFILE_RESULT_CODE);
}
//below is onActivityResult(); method
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
if (data != null) {
if (requestCode == PICKFILE_RESULT_CODE) {
if (resultCode == RESULT_OK) {
String FilePath = data.getData().getPath();
locations.add(FilePath);
ArrayAdapter<String> adapter = new ArrayAdapter<String>(this,R.layout.listview,locations);
listView.setAdapter(adapter);
}
}
}
}
//And my declared variables before oncreate().
public com.itextpdf.text.Document Document;
public PdfCopy Copy;
public ByteArrayOutputStream ms;
TextView NewText;
private TextView txt1;
private Button bt1, bt2,bt3;
private Handler handler;
ListView listView;
ArrayList<String> locations = new ArrayList<>();
ArrayAdapter<String> adapter;
ImageView btn, btnconvert, btn3;
private static final String TAG = "PdfCreator";
private final int PICKFILE_RESULT_CODE=10;
private String btTag = "";

Extracting text from a rectangle using iText ( .Net ) does give me the entire line

The following is the code (using iText for.Net Version 7.0.4.0) that i am using for extracting the text from a pdf. What i have observed during my testing is it works well by only extracting the content within a rectangle for most of the pdf's. But for few of them it gives the entire line from the pdf. I know
that the text snippets that intersect with the rect (so part of the text may be outside rect, iText doesn't cut text snippets in pieces).
But I want to understand what parameter in the pdf will be used in iText to split text.
var reader = new PdfReader( filePath );
PdfDocument pdfDoc = new PdfDocument( reader );
var addressRect = new Rectangle( 33, 190, 70, 42 ); //
var addressRegionFilter = new TextRegionEventFilter( addressRect );
var filterListener = new FilteredTextEventListener( new LocationTextExtractionStrategy(), addressRegionFilter );
var addressText = PdfTextExtractor.GetTextFromPage( pdfDoc.GetPage( 1 ), filterListener );
pdfDoc.Close();

This should do the trick.
class RectangleTextExtractionStrategy implements ITextExtractionStrategy
{
private ITextExtractionStrategy innerStrategy = null;
private Rectangle rectangle;
public RectangleTextExtractionStrategy(ITextExtractionStrategy strategy, Rectangle rectangle)
{
this.innerStrategy = strategy;
this.rectangle = rectangle;
}
#Override
public String getResultantText() {
return innerStrategy.getResultantText();
}
#Override
public void eventOccurred(IEventData iEventData, EventType eventType) {
if(eventType != EventType.RENDER_TEXT)
return;
TextRenderInfo tri = (TextRenderInfo) iEventData;
for(TextRenderInfo subTri : tri.getCharacterRenderInfos())
{
Rectangle r2 = new CharacterRenderInfo(subTri).getBoundingBox();
if(intersects(r2))
innerStrategy.eventOccurred(subTri, EventType.RENDER_TEXT);
}
}
private boolean intersects(Rectangle rectangle)
{
// # TODO
return true;
}
#Override
public Set<EventType> getSupportedEvents() {
return innerStrategy.getSupportedEvents();
}
}
The idea here is to split all incoming TextRenderInfo objects into the corresponding events for their characters. Then (if they are in the search region) we delegate the call to another ITextExtractionStrategy.

Pdf file is not viewing in android app

Anyone can help in this code, the pdf file is not loading in app and just showing blank white screen, Logcat showing FileNotFoundExeeption: /storage/sdcard/raw/ourpdf.pdf.
i am trying to make an app that will show information while i click buttons and every button will be active for specific pdf file reading. Any specific help please.
Thanks for help
part1
package com.code.androidpdf;
public class MainActivity extends Activity {
//Globals:
private WebView wv;
private int ViewSize = 0;
//OnCreate Method:
#Override
protected void onCreate(Bundle savedInstanceState)
{
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
//Settings
PDFImage.sShowImages = true; // show images
PDFPaint.s_doAntiAlias = true; // make text smooth
HardReference.sKeepCaches = true; // save images in cache
//Setup above
wv = (WebView)findViewById(R.id.webView1);
wv.getSettings().setBuiltInZoomControls(true);//show zoom buttons
wv.getSettings().setSupportZoom(true);//allow zoom
//get the width of the webview
wv.getViewTreeObserver().addOnGlobalLayoutListener(new ViewTreeObserver.OnGlobalLayoutListener()
{
#Override
public void onGlobalLayout()
{
ViewSize = wv.getWidth();
wv.getViewTreeObserver().removeGlobalOnLayoutListener(this);
}
});
pdfLoadImages();//load images
}
private void pdfLoadImages() {
try
{
// run async
new AsyncTask<Void, Void, Void>()
{
// create and show a progress dialog
ProgressDialog progressDialog = ProgressDialog.show(MainActivity.this, "", "Opening...");
#Override
protected void onPostExecute(Void result)
{
//after async close progress dialog
progressDialog.dismiss();
}
#Override
protected Void doInBackground(Void... params)
{
try
{
// select a document and get bytes
File file = new File(Environment.getExternalStorageDirectory().getPath()+"/randompdf.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
net.sf.andpdf.nio.ByteBuffer bb = null ;
raf.close();
// create a pdf doc
PDFFile pdf = new PDFFile(bb);
//Get the first page from the pdf doc
PDFPage PDFpage = pdf.getPage(1, true);
//create a scaling value according to the WebView Width
final float scale = ViewSize / PDFpage.getWidth() * 0.95f;
//convert the page into a bitmap with a scaling value
Bitmap page = PDFpage.getImage((int)(PDFpage.getWidth() * scale), (int)(PDFpage.getHeight() * scale), null, true, true);
//save the bitmap to a byte array
ByteArrayOutputStream stream = new ByteArrayOutputStream();
page.compress(Bitmap.CompressFormat.PNG, 100, stream);
stream.close();
byte[] byteArray = stream.toByteArray();
//convert the byte array to a base64 string
String base64 = Base64.encodeToString(byteArray, Base64.DEFAULT);
//create the html + add the first image to the html
String html = "<!DOCTYPE html><html><body bgcolor=\"#7f7f7f\"><img src=\"data:image/png;base64,"+base64+"\" hspace=10 vspace=10><br>";
//loop through the rest of the pages and repeat the above
for(int i = 2; i <= pdf.getNumPages(); i++)
{
PDFpage = pdf.getPage(i, true);
page = PDFpage.getImage((int)(PDFpage.getWidth() * scale), (int)(PDFpage.getHeight() * scale), null, true, true);
stream = new ByteArrayOutputStream();
page.compress(Bitmap.CompressFormat.PNG, 100, stream);
stream.close();
byteArray = stream.toByteArray();
base64 = Base64.encodeToString(byteArray, Base64.DEFAULT);
html += "<img src=\"data:image/png;base64,"+base64+"\" hspace=10 vspace=10><br>";
}
html += "</body></html>";
//load the html in the webview
wv.loadDataWithBaseURL("", html, "text/html","UTF-8", "");
}
catch (Exception e)
{
Log.d("CounterA", e.toString());
}
return null;
}
}.execute();
System.gc();// run GC
}
catch (Exception e)
{
Log.d("error", e.toString());
}
}
}

It is (sadly) not possible to view a PDF that is stored locally in your devices. Android L has introduced the feature. So, to display a PDF , you have two options:
See this answer for using webview
How to open local pdf file in webview in android? (note that this requires an internet connection)
Use a third party pdf Viewer.
You can also send an intent for other apps to handle your pdf.

You can get an InputStream for the file using
getResources().openRawResource(R.raw.ourpdf)
Docs: http://developer.android.com/reference/android/content/res/Resources.html#openRawResource(int)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to identify whether the text is boxed in PDF using PDFBOX? - pdfbox

Related

Extract images of signatures contained in a pdf file with iText7

Delete button stops working sporadically, only on Windows, when using our editor plugin

When merging PDF files using itext the result pdf show 0 bytes in android studio

Extracting text from a rectangle using iText ( .Net ) does give me the entire line

Pdf file is not viewing in android app

Categories

Resources