I am using the Java library PDFBox to extract text (glyphs) from a PDF along with their characteristics: page number, font, font size, left, top, height, width and absY values. Now I need to write a function that takes the row text and the column text of a table in the PDF as input and returns the value at their intersection. For example, I am looking for the revenue increase (row text) for the year 2019 (column text). The input would be "revenue increase" as row text and "year 2019" as column text, and the output would be the value, say "8.5%". The tables can be bordered or borderless, can span multiple pages, and can even appear on rotated or newspaper-style pages, which makes this difficult. How can I extract the "revenue increase" value for "Year 2019" from such a table using the text characteristics available from PDFBox?
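To make the question concrete, here is a minimal sketch of the kind of lookup being asked for. It is not PDFBox API: `TableCellLookup` and `Chunk` are hypothetical names, and the sketch assumes the glyphs have already been grouped into word or cell level chunks with left/top/width/height values. It only handles the simplest case (row label, column header and value on the same unrotated page) by picking the chunk that sits on the row label's line and horizontally overlaps the column header.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; nothing here is PDFBox API. */
final class TableCellLookup {

    /** One positioned piece of text (a word or cell), in page coordinates. */
    static final class Chunk {
        final String text;
        final int page;
        final float left, top, width, height;

        Chunk(String text, int page, float left, float top, float width, float height) {
            this.text = text;
            this.page = page;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
        }

        float right()   { return left + width; }
        float centerX() { return left + width / 2f; }
        float centerY() { return top + height / 2f; }
    }

    /**
     * Returns the text of the chunk on the row label's line that overlaps the
     * column header horizontally, or null if no such chunk is found.
     */
    static String lookup(List<Chunk> chunks, String rowText, String columnText) {
        Chunk row = find(chunks, rowText);
        Chunk col = find(chunks, columnText);
        // Assumption: label, header and value are all on one page.
        if (row == null || col == null || row.page != col.page) {
            return null;
        }
        Chunk best = null;
        float bestDistance = Float.MAX_VALUE;
        for (Chunk c : chunks) {
            if (c == row || c == col || c.page != row.page) {
                continue;
            }
            // Same visual line: vertical centers differ by at most half the label height.
            boolean sameLine = Math.abs(c.centerY() - row.centerY()) <= row.height / 2f;
            // Same column: horizontal extent overlaps the header's extent.
            boolean underHeader = c.left < col.right() && c.right() > col.left;
            if (sameLine && underHeader) {
                // If several chunks qualify, prefer the one closest to the header's center.
                float distance = Math.abs(c.centerX() - col.centerX());
                if (distance < bestDistance) {
                    best = c;
                    bestDistance = distance;
                }
            }
        }
        return best == null ? null : best.text;
    }

    /** First chunk whose text contains the needle, ignoring case. */
    private static Chunk find(List<Chunk> chunks, String needle) {
        String n = needle.toLowerCase(Locale.ROOT);
        for (Chunk c : chunks) {
            if (c.text.toLowerCase(Locale.ROOT).contains(n)) {
                return c;
            }
        }
        return null;
    }
}
```

With chunks for "Year 2019", "Revenue increase" and "8.5%" placed as in a simple table, `TableCellLookup.lookup(chunks, "revenue increase", "year 2019")` would return the value chunk. The chunks could presumably be built from the `TextPosition` objects that a `PDFTextStripper` subclass receives in `writeString`. Tables split across pages, rotated pages and newspaper-style layouts are exactly the cases this sketch does not cover.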
I'm trying to set text in my document and have the last two characters of the text contents use a different size.
For example, I have a text layer in a group and I set its contents to "100ft". I would like the last two characters, "ft", to be a different size from the rest of the text.
It's very easy:
1 Select your text layer.
2 Choose the Type tool from the tools panel on the left.
3 Click on your text with the Type tool selected.
4 Select the letters whose size you want to change.
5 Choose the size you need from the top panel.
I have a data frame of industries divided into six sectors. I imported the Excel file (xlsx format) into R with the gdata package. What I now observe is that next to some of the firm names, and also next to some dummy variables, a stray letter appears that was not in my original Excel sheet. It is random: some names are followed by it and some are not. Can somebody tell me how to get rid of these characters?
I've been trying to generate a report. I've tried quite a few things already, but there always seem to be problems. I'm currently trying iTextSharp 4.1.6.
My current strategy is to use LibreOffice to create a document with editable PDF fields, which I guess are called "AcroFields". I'm not sure, since I can't find a definition. Anyway, I assume that all of these are "AcroFields":
But if I put all of those into a form and export it as PDF, only some of them show up as AcroFields:
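using System;
using System.Collections;
using System.IO;
using iTextSharp.text.pdf;
var reader = new PdfReader(File.ReadAllBytes("abc.pdf"));
// Fields is a non-generic dictionary here, so each entry is a DictionaryEntry.
foreach (DictionaryEntry field in reader.AcroFields.Fields)
{
    Console.WriteLine(field.Key); // the key is the field name
}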
> Text Box 1
Check Box 1
Numeric Field 1
Formatted Field 1
Date Field 1
List Box 1
Combo Box 1
Push Button 1
Option Button 1
Notice how Label Field 1 is not present. If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true? How would you replace text in a pdf document using iTextSharp?
> Notice how Label Field 1 is not present.
As there is no AcroForm form field type "label", form labels usually are drawn as regular page content in PDF files.
> If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true?
Indeed, in general there is no simple text replacement in a PDF.
> How would you replace text in a pdf document using iTextSharp?
I would determine the bounding box coordinates of the text to replace, using the iText text extraction feature with an extension that returns text plus coordinates. Then I'd remove that text by redaction using iText's PdfCleanUp... classes. Finally, I'd add the replacement text as new text in the bounding box determined at the start.
Unfortunately for you, neither good text extraction nor redaction is available in your version 4.1.6; for this approach you should update to at least 5.5.x.
Alternatively, though, as you've been trying to generate a report, I assume the template design is in your hands. In that case you can put your labels into read-only text fields which you can change (they are read-only only to GUI users).
I have a text whose length I can't know beforehand, because it depends on how many entries there are in an internal table (see below). The table is passed to the Smartforms function module in my report. The text itself works fine with a dynamic text variable, but I need a horizontal line under that text. The line needs to be directly beneath the text at all times. So far I have only managed a line with a fixed position, which does not give the result I want.
If it is possible, how can I get the line to change position based on the length of the text, so that it is always directly under the text, no matter how many lines the text has?
DATA: l_string        TYPE string,
      lt_stream_lines TYPE STANDARD TABLE OF string.
LOOP AT i_tab.
* reading one line of i_tab into l_string.
  APPEND l_string TO lt_stream_lines.
  APPEND '' TO lt_stream_lines.
ENDLOOP.
CALL FUNCTION 'CONVERT_STREAM_TO_ITF_TEXT'
  EXPORTING
    stream_lines = lt_stream_lines
    lf           = 'X'
  TABLES
    itf_text     = gv_text.
* gv_text then has the full text I want to display
You must have a Main Window containing your Text element followed by a dummy Template element for the horizontal line (one empty cell with the top horizontal border in black color and other borders transparent).
Create a Template element via the context menu:
Draw the border (here I exaggerate the proportions "a little bit"!):
Preview result:
I have a report containing a currency text field. I want to separate its digits with a comma every three digits, so I changed its Text Format in its Properties.
It's OK in the normal display, but when I export the report to a PDF file, the currency numbers are not displayed correctly.
Please help me with this problem.
You can use formats.
For example you can fill your text field with:
Price is: {Format("{0:C}", MyPrice)}