Problem with size of lower indexed letters like: j,p,q,g etc. inside PDF files


Dear Stack Overflow members,
I am trying to cover (redact) words in a PDF.
I have chosen the word 'informacji' to be removed from the entire PDF file.
The problem is that I cannot get the right size for the letter 'j' in this case: the covering rectangle does not reach the part of the letter below the line.
Someone clever could guess what is written behind it.
I have implemented my own class derived from LocationTextExtractionStrategy; here is the code:
public override void RenderText(TextRenderInfo renderInfo)
{
LineSegment segment = renderInfo.GetBaseline();
if (renderInfo.GetRise() != 0)
{ // remove the rise from the baseline - we do this because the text from a super/subscript render operations should probably be considered as part of the baseline of the text the super/sub is relative to
Matrix riseOffsetTransform = new Matrix(0, -renderInfo.GetRise());
segment = segment.TransformBy(riseOffsetTransform);
}
var fnt= renderInfo.GetFont();
TextChunk tc = new TextChunk(renderInfo.GetText(), tclStrat.CreateLocation(renderInfo, segment));
Vector startLine = renderInfo.GetBaseline().GetStartPoint();
Vector endLineTopRight = renderInfo.GetAscentLine().GetEndPoint();
Rectangle textRectangle = new Rectangle(startLine[Vector.I1], startLine[Vector.I2], endLineTopRight[Vector.I1], endLineTopRight[Vector.I2]);
TextInfo textInfo = new TextInfo(tc, textRectangle);
locationalResult.Add(textInfo);
}
A few lines later, I add the values of the textRectangle object to a list:
wordList[wordList.Count-1].rectanglesToDraw.Add(new SquaresToDraw(page, text.textRectangle.Left, text.textRectangle.Bottom, text.textRectangle.Right, text.textRectangle.Top));
Some additional info (nothing special, in my opinion):
rectanglesToDraw is a list of SquaresToDraw.
SquaresToDraw is a class that looks like this:
public class SquaresToDraw
{
public int pageNumber { get; set; }
public float left { get; set; }//llx
public float bottom { get; set; } //lly
public float right { get;set;} //rux
public float top { get; set; }//ruy
public SquaresToDraw(int pageNumber,float left, float bottom, float right,float top)
{
this.pageNumber = pageNumber;
this.left = left;
this.right = right;
this.bottom = bottom;
this.top = top;
}
}
Any help will be appreciated.

You use the base line as the lower limit of your rectangle:
Vector startLine = renderInfo.GetBaseline().GetStartPoint();
If you want to cover letters with parts below the base line, too, you should use the descent line instead:
Vector startLine = renderInfo.GetDescentLine().GetStartPoint();
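To illustrate the difference, here is a minimal Java sketch with no iText dependency. The CoverBox class and its fromLines helper are hypothetical stand-ins for the rectangle built in RenderText, and the coordinates are made up; the only point is that starting from the descent line makes the box extend below the base line.

```java
// Hypothetical stand-in for the rectangle built in RenderText; not iText API.
final class CoverBox {
    final float left, bottom, right, top;

    CoverBox(float left, float bottom, float right, float top) {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }

    // Build the box from the start point of a lower line and the end point of
    // the ascent line. Passing the descent line's start point (instead of the
    // base line's) lowers the bottom edge so descenders are covered too.
    static CoverBox fromLines(float lowerStartX, float lowerStartY,
                              float ascentEndX, float ascentEndY) {
        return new CoverBox(lowerStartX, lowerStartY, ascentEndX, ascentEndY);
    }

    float height() {
        return top - bottom;
    }
}

class CoverBoxDemo {
    public static void main(String[] args) {
        // Made-up coordinates: base line at y=100, descent at y=97, ascent at y=109.
        CoverBox fromBaseline = CoverBox.fromLines(50f, 100f, 120f, 109f);
        CoverBox fromDescent = CoverBox.fromLines(50f, 97f, 120f, 109f);
        System.out.println("base line box height: " + fromBaseline.height());
        System.out.println("descent box height:   " + fromDescent.height());
    }
}
```

With these sample values the second box is taller only at the bottom, which is where the tail of 'j', 'p', 'q' or 'g' sits.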


Making a highly customizable method, or a specific method that does a task?

I'm not really sure how I would phrase the title right, so I apologize for the initial confusion.
This is just a small question I had about how to structure code and such and I have no idea on what to call it so I will explain it with this example:
Say I am writing a Call of Duty type game where the player can customize their weapons with certain attachments.
I have a class that defines each gun. It looks something like this:
class Gun {
int clip = 30;
int ammo = 100;
float reloadTime = 5f;
float damage = 10f;
Attachment[] attachments;
//Plus some not included attachments.
void shoot() {
//...
}
void reload() {
//...
}
void applyAllAttachments() {
//Apply the list of attachments' effects
}
}
class Attachment {
void effect() {
//change the gun in some way.
}
}
Now I would like to add 4 attachments, Fast Mags (increase reload speed), Hollow Point (increase damage), Grenade Launcher (Secondary Gun) and Minigun (Replace the barrel with a minigun or something).
Fast Mags and Hollow Point should be simple: all I have to do is change a number or a value. The Grenade Launcher and Minigun, however, need custom, extra functions (like Unity delegates). Would it be wiser to add a function that handles external custom firing types, or would it be better to have separate methods inside the Gun class that specifically handle the extra minigun functions?
TL;DR
If I want to add a grenade launcher attachment to a gun, should I do this:
class Gun {
int clip = 30;
int ammo = 100;
float reloadTime = 5f;
float damage = 10f;
Attachment[] attachments = new Attachment[10];
//Plus some not included attachments.
void shoot() {
//...
customShoot();
}
void customShoot() {
//Apply attachments custom attachment shoot methods.
}
void reload() {
//...
}
void applyAllAttachments() {
//Apply the list of attachments' effects
}
}
class GrenadeLauncher extends Attachment {
@Override
public void effect() {
//Spawn new grenade
}
}
Or This:
class Gun {
int clip = 30;
int ammo = 100;
float reloadTime = 5f;
float damage = 10f;
Attachment[] attachments = new Attachment[10];
//Plus some not included attachments.
void shoot() {
//...
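for (Attachment attachment : attachments) {
// type check per attachment; every new attachment type needs another branch here
if (attachment instanceof GrenadeLauncher) {
grenadeLauncherShoot();
}
}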
}
void grenadeLauncherShoot() {
}
void reload() {
//...
}
void applyAllAttachments() {
//Apply the list of attachments' effects
}
}
Sorry for my pseudo/java code, hope it's comprehensible.

The first way is better: you can create new attachments without having to modify the Gun class.
In general, you shouldn't need to check for the type, and your code will be cleaner if you don't.
Here, your Attachment class should be abstract (I suppose it already is) and force subclasses to implement certain methods.
public abstract class Attachment
{
public abstract void shoot();
}
Then the gun calls it for all attachments:
class Gun {
int clip = 30;
int ammo = 100;
float reloadTime = 5f;
float damage = 10f;
Attachment[] attachments = new Attachment[10];
//Plus some not included attachments.
void shoot() {
//...
for (int i = 0; i < attachments.length; ++i) {
attachments[i].shoot();
}
}
void reload() {
//...
}
}
class GrenadeLauncher extends Attachment {
@Override
public void shoot()
{
//Spawn new grenade
}
}
By the way, why did you tag both java and Unity? If you work with Unity, your code should be C# or JavaScript.
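As a rough illustration of the first approach, here is a self-contained Java sketch that handles both kinds of attachment from the question: ones that only change a value (Fast Mags) and ones that add firing behaviour (Grenade Launcher). The method names apply and onShoot, the 0.5 reload factor, and the string log are assumptions made for the example, not part of the question or the answer above.

```java
import java.util.ArrayList;
import java.util.List;

// Base type: an attachment may tweak stats once and may add behaviour per shot.
// No-op defaults let simple attachments override only what they need.
abstract class Attachment {
    void apply(Gun gun) { }                      // one-time stat change
    void onShoot(Gun gun, List<String> log) { }  // extra firing behaviour
}

class FastMags extends Attachment {
    @Override
    void apply(Gun gun) {
        gun.reloadTime *= 0.5f;  // assumed factor, for illustration only
    }
}

class GrenadeLauncher extends Attachment {
    @Override
    void onShoot(Gun gun, List<String> log) {
        log.add("grenade spawned");
    }
}

class Gun {
    float reloadTime = 5f;
    float damage = 10f;
    private final List<Attachment> attachments = new ArrayList<>();

    void attach(Attachment attachment) {
        attachments.add(attachment);
        attachment.apply(this);
    }

    // Gun never checks concrete types; it forwards to every attachment.
    List<String> shoot() {
        List<String> log = new ArrayList<>();
        log.add("bullet fired");
        for (Attachment attachment : attachments) {
            attachment.onShoot(this, log);
        }
        return log;
    }
}
```

Using a list instead of a fixed-size array avoids iterating over empty slots. The no-op defaults are a design choice: the answer's version with an abstract shoot() forces every attachment to implement it, which is stricter but means Fast Mags needs an empty method.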

JOptionPane cannot find symbol in module

So, the assignment for this week was all about modularization, and the code must contain 6 modules, which is why it looks like a complete mess. Anyway, I'm getting an error that says the module cannot find JOptionPane even though I declared it for the main method. I've posted the code below. Any help is appreciated.
Specifically, I'm getting it in the method shown first below; the full program follows it.
{ public static String getItemShape ()
{
String typeOfShape;
typeOfShape = JOptionpPane.showInputDialog("Please enter 'C' for a Circle, or 'S' for a Sphere"); //getting input for shape
return typeOfShape; //returning to method
}
}
//This program will find the area or volume of a circle or sphere,
respectively.
import javax.swing.JOptionPane;
public class Java_Chapter_9
{
public static void main(String args[])
{
//Declarations
String itemShape; //type of shape
String runProgram; //user control
Double itemRadius; //radius of tem
Double finalAnswer; //calculation for final answer
//End Declarations
showGreeting (); //Call greeting module
runProgram = JOptionPane.showInputDialog("Please enter 'Y' to run the
program, or 'N' to quit"); //giving user control
while (runProgram.equalsIgnoreCase("y")) //loop for continuous use
{
itemShape = getItemShape (); //calling itemShape module
itemRadius = getItemRadius (); //calling itemradius module
finalAnswer = calculateAnswer (itemRadius, itemShape); //calling the
module for calculation with paramaters
runProgram = JOptionPane.showInputDialog("Enter 'Y' to input more, or
'N' to Quit");
}
showGoodbye ();
}
////////////////////////////////////////////////// starting modules
public static void showGreeting () //greeting module
{
System.out.println("Welcome to the program");
System.out.println("This program will show you the area or volume of a
shape");
}
///////////////////////////////////////////////// seperating modules
public static String getItemShape ()
{
String typeOfShape;
typeOfShape = JOptionpPane.showInputDialog("Please enter 'C' for a
Circle, or 'S' for a Sphere"); //getting input for shape
return typeOfShape; //returning to method
}
////////////////////////////////////////////////// seperating modules
public static double getItemRadius ()
{
double radiusOfItem; //variable withing scope of module
String radiusofItemInput;
radiusOfItemInput = JOptionPane.showInputDialog("Please enter the
radius of the item in inches: ");
radiusOfItem = Double.parseDouble(radiusofItemInput);
return radiusOfItem;
}
////////////////////////////////////////////////// seperating modules
public static double calculateAnswer (double itemRadius, string itemShape);
{
double circleArea;
if (itemShape.equalsIgnoreCase("c"))
{
circleArea = 3.14159 * (itemRadius * itemRadius);
system.out.print("The area of the circle in inches is " + circleArea);
return circleArea;
}
else
{
calculateAnswerSphere (itemRadius, itemShape);
}
/////////////////////////////////////////////// seperating method
{
double sphereVolume;
sphereVolume = (4.0/3) * 3.14159 * (itemRadius * itemRadius *
itemRadius);
system.out.print("The volume of the sphere in cubic inches is "
+sphereVolume);
}
end If;
}
public static void showGoodbye ()
{
System.out.println("Thank you for using the program. Goodbye.");
}
}

Anyway, I'm getting an error that says the module cannot find
JOptionPane even though I declared it for the main method
You have a typo in this line:
typeOfShape = JOptionpPane.showInputDialog("Please enter 'C' for a Circle, or 'S' for a Sphere"); //getting input for shape
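It should be JOptionPane instead of JOptionpPane. Remove the extra p. The import is fine; the compiler reports "cannot find symbol" because no class named JOptionpPane exists.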

Eclipse plugin - ColumnLabelProvider display only image

I am developing an Eclipse plug-in and using ColumnLabelProvider to provide labels for the columns of my tree viewer.
In one of the columns, I only want to display an image and no text. However, in the final display, Eclipse reserves blank space for the text element even if I return null.
Is there any way to make it display only the image, using the full space provided?
Here is the code snippet:
column4.setLabelProvider(new ColumnLabelProvider() {
@Override
public String getText(Object element) {
return null;
}
@Override
public Image getImage(Object element) {
/* Code to Display an image follows */
.....
}
});

ColumnLabelProvider will always leave space for the text.
You can use a class derived from OwnerDrawLabelProvider to draw the column yourself.
Something like:
public abstract class CentredImageCellLabelProvider extends OwnerDrawLabelProvider
{
protected CentredImageCellLabelProvider()
{
}
@Override
protected void measure(Event event, Object element)
{
}
@Override
protected void erase(final Event event, final Object element)
{
// Don't call super.erase() to suppress non-standard selection draw
}
@Override
protected void paint(final Event event, final Object element)
{
TableItem item = (TableItem)event.item;
Rectangle itemBounds = item.getBounds(event.index);
GC gc = event.gc;
Image image = getImage(element);
Rectangle imageBounds = image.getBounds();
int x = event.x + Math.max(0, (itemBounds.width - imageBounds.width) / 2);
int y = event.y + Math.max(0, (itemBounds.height - imageBounds.height) / 2);
gc.drawImage(image, x, y);
}
protected abstract Image getImage(Object element);
}

Text extraction from table cells

I have a PDF that contains a table with many cells (>100). I know the exact position (x,y) and dimensions (w,h) of every cell of the table.
I need to extract text from the cells using iTextSharp. Using PdfReaderContentParser + FilteredTextRenderListener (with code like this http://itextpdf.com/examples/iia.php?id=279 ) I can extract the text, but I need to run the whole procedure for each cell. My PDF has many cells and the program needs too much time to run. Is there a way to extract text from a list of rectangles? I need to know the text of each rectangle. I'm looking for something like PDFTextStripperByArea from PdfBox (you can define as many regions as you need and then get the text using .getTextForRegion("region-name") ).

This option is not immediately included in the iTextSharp distribution but it is easy to realize. In the following I use the iText (Java) class, interface, and method names because I am more at home with Java. They should easily be translatable into iTextSharp (C#) names.
If you use the LocationTextExtractionStrategy, you can use its a posteriori TextChunkFilter mechanism instead of the a priori FilteredRenderListener mechanism used in the sample you linked to. This mechanism was introduced in version 5.3.3.
For this you first parse the whole page content using the LocationTextExtractionStrategy without any FilteredRenderListener filtering applied. This makes the strategy object collect TextChunk objects for all PDF text objects on the page containing the associated base line segment.
Then you call the strategy's getResultantText overload with a TextChunkFilter argument (instead of the regular no-argument overload):
public String getResultantText(TextChunkFilter chunkFilter)
You call it with a different TextChunkFilter instance for each table cell. You have to implement this filter interface which is not too difficult as it only defines one method:
public static interface TextChunkFilter
{
/**
* @param textChunk the chunk to check
* @return true if the chunk should be allowed
*/
public boolean accept(TextChunk textChunk);
}
So the accept method of the filter for a given cell must test whether the text chunk in question is inside your cell.
(Instead of separate instances for each cell you can of course also create one instance whose parameters, i.e. cell coordinates, can be changed between getResultantText calls.)
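A minimal Java sketch of such a per-cell test follows. To stay self-contained it does not use the iText types: ChunkFilter is a stand-in for TextChunkFilter, and the chunk is reduced to the start and end points of its base line segment. The containment rule (both ends inside the cell) and the convention that (x, y) is the lower-left corner are assumptions; a real filter would read the points from the TextChunk.

```java
// Stand-in for iText's TextChunkFilter; the real accept() takes a TextChunk.
// Here a chunk is reduced to the two end points of its base line segment.
interface ChunkFilter {
    boolean accept(float startX, float startY, float endX, float endY);
}

// Accepts a chunk only if both ends of its base line lie inside the cell.
class CellFilter implements ChunkFilter {
    private final float x, y, w, h;  // assumed: lower-left corner, width, height

    CellFilter(float x, float y, float w, float h) {
        this.x = x;
        this.y = y;
        this.w = w;
        this.h = h;
    }

    private boolean contains(float px, float py) {
        return px >= x && px <= x + w && py >= y && py <= y + h;
    }

    @Override
    public boolean accept(float startX, float startY, float endX, float endY) {
        return contains(startX, startY) && contains(endX, endY);
    }
}
```

If text in your table can overflow a cell border, a looser rule (for example, only the start point inside the cell) may fit better; that depends on the document.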
PS: As mentioned by the OP, this TextChunkFilter has not yet been ported to iTextSharp. It should not be hard to do so, though, only one small interface and one method to add to the strategy.
PPS: In a comment sschuberth asked
Do you then still call PdfTextExtractor.getTextFromPage() when using getResultantText(), or does it somehow replace that call? If so, how to you then specify the page to extract to?
Actually PdfTextExtractor.getTextFromPage() internally already uses the no-argument getResultantText() overload:
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
return parser.processContent(pageNumber, strategy, additionalContentOperators).getResultantText();
}
To make use of a TextChunkFilter you could simply build a similar convenience method, e.g.
public static String getTextFromPage(PdfReader reader, int pageNumber, LocationTextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators, TextChunkFilter chunkFilter) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
return parser.processContent(pageNumber, strategy, additionalContentOperators).getResultantText(chunkFilter);
}
In the context at hand, though, in which we want to parse the page content only once and apply multiple filters, one for each cell, we might generalize this to:
public static List<String> getTextFromPage(PdfReader reader, int pageNumber, LocationTextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators, Iterable<TextChunkFilter> chunkFilters) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
parser.processContent(pageNumber, strategy, additionalContentOperators);
List<String> result = new ArrayList<>();
for (TextChunkFilter chunkFilter : chunkFilters)
{
result.add(strategy.getResultantText(chunkFilter));
}
return result;
}
(You can make this look fancier by using Java 8 collection streaming instead of the old-fashioned for loop.)
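A sketch of what the streaming variant might look like, written generically so it runs without iText. The class name FilterStreams and the method textPerFilter are hypothetical; the type parameter F stands in for TextChunkFilter and the extract function for strategy::getResultantText, applied after the page has been parsed once.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class FilterStreams {
    // Maps every filter to the text it selects, preserving the input order.
    // 'extract' stands in for strategy::getResultantText.
    static <F> List<String> textPerFilter(Iterable<F> filters,
                                          Function<? super F, String> extract) {
        return StreamSupport.stream(filters.spliterator(), false)
                .map(extract)
                .collect(Collectors.toList());
    }
}
```

StreamSupport is needed only because the method above takes an Iterable; with a List parameter you could call filters.stream() directly.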

Here's my take on how to extract text from a table-like structure in a PDF using iTextSharp. It returns a collection of rows, and each row contains a collection of interpreted columns. This may work for you provided that the gap between one column and the next is greater than the average width of a single character. I also added an option to check for wrapped text within a virtual column. Your mileage may vary.
using (PdfReader pdfReader = new PdfReader(stream))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
TableExtractionStrategy tableExtractionStrategy = new TableExtractionStrategy();
string pageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, tableExtractionStrategy);
var table = tableExtractionStrategy.GetTable();
}
}
public class TableExtractionStrategy : LocationTextExtractionStrategy
{
public float NextCharacterThreshold { get; set; } = 1;
public int NextLineLookAheadDepth { get; set; } = 500;
public bool AccomodateWordWrapping { get; set; } = true;
private List<TableTextChunk> Chunks { get; set; } = new List<TableTextChunk>();
public override void RenderText(TextRenderInfo renderInfo)
{
base.RenderText(renderInfo);
string text = renderInfo.GetText();
Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
Rectangle rectangle = new Rectangle(bottomLeft[Vector.I1], bottomLeft[Vector.I2], topRight[Vector.I1], topRight[Vector.I2]);
Chunks.Add(new TableTextChunk(rectangle, text));
}
public List<List<string>> GetTable()
{
List<List<string>> lines = new List<List<string>>();
List<string> currentLine = new List<string>();
float? previousBottom = null;
float? previousRight = null;
StringBuilder currentString = new StringBuilder();
// iterate through all chunks and evaluate
for (int i = 0; i < Chunks.Count; i++)
{
TableTextChunk chunk = Chunks[i];
// determine if we are processing the same row based on defined space between subsequent chunks
if (previousBottom.HasValue && previousBottom == chunk.Rectangle.Bottom)
{
if (chunk.Rectangle.Left - previousRight > NextCharacterThreshold)
{
currentLine.Add(currentString.ToString());
currentString.Clear();
}
currentString.Append(chunk.Text);
previousRight = chunk.Rectangle.Right;
}
else
{
// if we are processing a new line let's check to see if this could be word wrapping behavior
bool isNewLine = true;
if (AccomodateWordWrapping)
{
int readAheadDepth = Math.Min(i + NextLineLookAheadDepth, Chunks.Count);
if (previousBottom.HasValue)
for (int j = i; j < readAheadDepth; j++)
{
if (previousBottom == Chunks[j].Rectangle.Bottom)
{
isNewLine = false;
break;
}
}
}
// if the text was not word wrapped let's treat this as a new table row
if (isNewLine)
{
if (currentString.Length > 0)
currentLine.Add(currentString.ToString());
currentString.Clear();
previousBottom = chunk.Rectangle.Bottom;
previousRight = chunk.Rectangle.Right;
currentString.Append(chunk.Text);
if (currentLine.Count > 0)
lines.Add(currentLine);
currentLine = new List<string>();
}
else
{
if (chunk.Rectangle.Left - previousRight > NextCharacterThreshold)
{
currentLine.Add(currentString.ToString());
currentString.Clear();
}
currentString.Append(chunk.Text);
previousRight = chunk.Rectangle.Right;
}
}
}
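// flush the text and row still being built, otherwise the last row is lost
if (currentString.Length > 0)
currentLine.Add(currentString.ToString());
if (currentLine.Count > 0)
lines.Add(currentLine);
return lines;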
}
private struct TableTextChunk
{
public Rectangle Rectangle;
public string Text;
public TableTextChunk(Rectangle rect, string text)
{
Rectangle = rect;
Text = text;
}
public override string ToString()
{
return Text + " (" + Rectangle.Left + ", " + Rectangle.Bottom + ")";
}
}
}
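The core idea of the strategy above, starting a new column whenever the horizontal gap to the previous chunk exceeds a threshold, can be shown in isolation. The following Java sketch is a simplified restatement under stated assumptions, not a port of the C# class: Chunk and RowSplitter are hypothetical names, the chunks are assumed to belong to one row and to arrive ordered left to right, and word-wrap handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

class RowSplitter {
    // Hypothetical simplified chunk: horizontal extent plus its text.
    static final class Chunk {
        final float left, right;
        final String text;

        Chunk(float left, float right, String text) {
            this.left = left;
            this.right = right;
            this.text = text;
        }
    }

    // Splits one row into columns: a gap wider than gapThreshold between the
    // previous chunk's right edge and this chunk's left edge starts a new column.
    static List<String> splitRow(List<Chunk> row, float gapThreshold) {
        List<String> columns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Float previousRight = null;
        for (Chunk chunk : row) {
            if (previousRight != null && chunk.left - previousRight > gapThreshold) {
                columns.add(current.toString());
                current.setLength(0);
            }
            current.append(chunk.text);
            previousRight = chunk.right;
        }
        // Flush the column still being built when the row ends.
        if (current.length() > 0) {
            columns.add(current.toString());
        }
        return columns;
    }
}
```

For example, chunks "N" (0 to 5), "o" (5 to 10), "4" (30 to 35) and "2" (35 to 40) with a threshold of 1 come out as the two columns "No" and "42".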

MapPoint GetPictureFromObject method

According to the MSDN documentation:
GetPictureFromObject method
Returns a picture (Visual Basic Picture object) of the current map view.
After digging around, I found that this "Picture" object apparently hasn't existed since VB6. I guess there's no way to write a class to masquerade as this type... Or is there?

It seems that this is a problem with no pretty solution. One workaround is to copy the map to the clipboard and read the bitmap back from there:
public Image GetImage()
{
Image image = null;
object save = Clipboard.GetDataObject();
try
{
Application.ActiveMap.CopyMap();
IDataObject pict = Clipboard.GetDataObject();
string[] formats = pict.GetFormats();
foreach (string s in formats)
{
if (s.EndsWith(System.Windows.Forms.DataFormats.Bitmap))
{
image = (System.Drawing.Image)pict.GetData(System.Windows.Forms.DataFormats.Bitmap);
break;
}
}
}
finally
{
Clipboard.SetDataObject(save);
}
return image;
}

The GetPictureFromObject() method returns a stdole.IPictureDisp COM object.
Here is a working example from http://www.mapping-tools.com/howto/mappoint/programming/creating-map-images-in-c/
// Find the size of the PictureBox in pixels
// Then convert it into HIMETRIC units for MapPoint
// We perform the conversion using Inch2HIMETRIC (see above) and
// the system DPI values
// Note: MapPoint/VB6's definition of "long" is the same as C#'s definition for "int".
int iWidth, iHeight;
Graphics g = myPictureBox.CreateGraphics();
iWidth = (int)((double)myPictureBox.Width * Inch2HIMETRIC / g.DpiX);
iHeight = (int)((double)myPictureBox.Height * Inch2HIMETRIC / g.DpiY);
// GetPictureFromObject() is defined as a member of the MapPointUtilities class
MapPoint.MapPointUtilities myMapUtils = new MapPoint.MapPointUtilities();
// Create the Picture for the current map, as an stdole.IPictureDisp COM object
stdole.IPictureDisp ipicMap = (stdole.IPictureDisp)myMapUtils.GetPictureFromObject(myMap, iWidth, iHeight);
// Convert it to a (metafile) Drawing.Image using the OleCreateConverter defined above
System.Drawing.Image myImage = OleCreateConverter.PictureDispToImage(ipicMap);
// Copy the Drawing.Image to the Picture box
// Refresh and stretch for good measure
myPictureBox.Image = myImage;
myPictureBox.SizeMode = PictureBoxSizeMode.StretchImage;
myPictureBox.Refresh();
// Save it as a PNG
// Although a GIF is capable of holding a MapPoint map, PNGs are generally preferred
myImage.Save(@"london2012.png", System.Drawing.Imaging.ImageFormat.Png);
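The Inch2HIMETRIC constant used above is defined in the linked article, not in this excerpt. Assuming it holds the standard value of 2540 (one HIMETRIC unit is 0.01 mm, and an inch is 25.4 mm), the pixel conversion is plain arithmetic. Here is a small Java sketch of it; the class and method names are hypothetical.

```java
class Himetric {
    // One HIMETRIC unit is 0.01 mm, so one inch (25.4 mm) is 2540 units.
    // Assumed to match the article's Inch2HIMETRIC constant.
    static final double HIMETRIC_PER_INCH = 2540.0;

    // Converts a length in pixels to HIMETRIC units for the given DPI,
    // truncating toward zero like the (int) cast in the C# example.
    static int pixelsToHimetric(int pixels, double dpi) {
        return (int) (pixels * HIMETRIC_PER_INCH / dpi);
    }

    public static void main(String[] args) {
        // At 96 DPI, 96 pixels are one inch.
        System.out.println(pixelsToHimetric(96, 96.0));
    }
}
```

The example divides by the DPI because pixels / DPI gives inches, which are then scaled to HIMETRIC units.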
Here is the metadata from stdole.dll:
using System;
using System.Reflection;
using System.Runtime.InteropServices;
namespace stdole
{
[InterfaceType(2)]
[Guid("7BF80981-BF32-101A-8BBB-00AA00300CAB")]
[ComConversionLoss]
public interface IPictureDisp
{
[DispId(0)]
[ComAliasName("stdole.OLE_HANDLE")]
int Handle { get; }
[DispId(5)]
[ComAliasName("stdole.OLE_YSIZE_HIMETRIC")]
int Height { get; }
[DispId(2)]
[ComAliasName("stdole.OLE_HANDLE")]
int hPal { get; set; }
[DispId(3)]
short Type { get; }
[DispId(4)]
[ComAliasName("stdole.OLE_XSIZE_HIMETRIC")]
int Width { get; }
[DispId(6)]
void Render(int hdc, int x, int y, int cx, int cy, int xSrc, int ySrc, int cxSrc, int cySrc, IntPtr prcWBounds);
}
}
