pdfbox extract text by area units

I need to extract the address zip code from a PDF, and I use the PDFTextStripperByArea class from PDFBox as in the ExtractTextByArea example. But what are the units of the parameters in Rectangle rect = new Rectangle( 10, 280, 275, 60 ); Where is the origin they are measured from, and in what units? If they are pixels, that is inconvenient, because the positions of PDF page components are hard to measure in pixels.

The y-coordinates are Java coordinates (y == 0 is the top), not PDF coordinates (y == 0 is the bottom).
The units are 1/72 inch (PDF points), which are identical to pixels when you render the PDF at 100%.
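If it is easier to measure the page with a ruler, the region can be worked out in millimeters and converted. The following is only a sketch of that arithmetic in Python; the helper names (mm_to_points, region_from_mm, top_y_from_pdf_y) are made up for illustration and are not part of PDFBox. The resulting four numbers are what you would pass to new Rectangle(x, y, width, height).

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimeters to PDF points (1 point = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def region_from_mm(left_mm, top_mm, width_mm, height_mm):
    """Return (x, y, width, height) in whole points, measured from the top-left corner."""
    values = (left_mm, top_mm, width_mm, height_mm)
    return tuple(round(mm_to_points(v)) for v in values)


def top_y_from_pdf_y(pdf_y, region_height, page_height):
    """Convert the bottom-origin PDF y of a region's lower edge
    into the top-origin y of its upper edge (all values in points)."""
    return page_height - pdf_y - region_height


# A region 10 mm from the left, 20 mm from the top, 100 mm wide, 20 mm high
print(region_from_mm(10, 20, 100, 20))
```

The rounding to whole points is an assumption made because java.awt.Rectangle takes integers; it can shift an edge by up to half a point.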

Related

How to pixel perfect align text-element

I want my vue-konva Text element to completely fill the given height, as I would expect of a rectangle.
The issue becomes obvious when pairing it with text images (SVG text converted to canvas) that properly match the given dimensions.
<v-text :config="{
x: 50,
y: 50,
width: 1000,
height: 60,
fontSize: 60,
fontStyle: 'bold',
fontFamily: 'Campton Book',
text: 'WELT'
}"
/>
<v-rect
:config="{ x: 50, y: 50, fill: 'black', height: 60, width: 200 }"
/>
Second part: is there any way to always align the left side pixel-perfectly with the border? The x coordinate matches the border.
Is this due to font constraints? What am I missing?
I tried to get the height of the text node to fix this positioning, but it is just the height that was passed down as props.

Text is defined as having parts above and below the baseline. The parts above are termed 'ascenders' and the parts below 'descenders', which are required for lower-case letters like j, y, and g.
Setting the text fontSize to 60 does not say 'whatever the string, make it fill a space 60px high'. Instead it says 'Make text in a 60px font', which makes space for the descenders because they will generally be required.
If you know for sure that the text will be all caps, then a solution is to measure the height used and increase the font size by a computed factor so that the font fills the line height.
To do this you'll need to get the glyph measurements as follows:
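const lineHeight = 60; // following your code
// make your text shape here...left out for brevity
// ctx is assumed to be a 2D canvas context whose font matches the shape (same family, bold, 60px)
const metrics = ctx.measureText('YOUR CAPS TEXT');
const capsHeight = Math.abs(metrics.actualBoundingBoxAscent);
// scale the current 60px font size by lineHeight / capsHeight
const fontSize = lineHeight * lineHeight / capsHeight;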
If I got that right, your 60px case should give a value around 75. That's based on the convention that ascenders are 80% of the line height. Now you set the font size of your shape to this new value and you should be filling the entire line height.
Regarding the left-alignment, this relies on what the font gurus call the a-b-c widths. The left gap is the a-space, the b is the character width (where the ink falls) and the c-space is the same as the a-space but on the right hand side.
Sadly, unless someone else can tell me I am wrong, you don't get a-b-c widths in the canvas TextMetrics. There is a workaround which is rather convoluted but viable. You would draw the text in black on an off-screen canvas with a transparent background. Then get the canvas pixel data and walk the horizontal lines from the left of the canvas, inspecting pixels and looking for the first colored pixel. Once you find it, you have the measurement by which to offset the text shape horizontally.
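The pixel walk itself is simple once you have the alpha channel. Here is a rough sketch of the scan in Python, assuming the pixel data has already been unpacked into rows of alpha values (0 meaning transparent). The function name left_ink_offset and the row layout are assumptions for illustration, not anything Konva or the canvas API provides.

```python
def left_ink_offset(alpha_rows):
    """Return the x index of the leftmost non-transparent pixel, or None if the canvas is empty.

    alpha_rows is a list of rows, each a list of alpha values (0 = transparent).
    """
    first = None
    for row in alpha_rows:
        for x, alpha in enumerate(row):
            if alpha > 0:
                if first is None or x < first:
                    first = x
                break  # pixels further right in this row cannot be the leftmost
    return first


# Tiny 3x4 example: ink starts in column 2
rows = [
    [0, 0, 0, 255],
    [0, 0, 128, 255],
    [0, 0, 0, 0],
]
print(left_ink_offset(rows))
```

Subtracting that offset from the shape's x would move the first inked column onto the border. Anti-aliased edges have low but non-zero alpha, so you may want a threshold higher than 0.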

iOS Core Graphics how to optimize incremental drawing of very large image?

I have an app written with RXSwift which processes 500+ days of HealthKit data to draw a chart for the user.
The chart image is drawn incrementally using the code below. Starting with a black screen, the previous image is drawn in the graphics context, then a new segment is drawn over this image at a certain offset. The combined image is saved and the process repeats 70+ times. The image is saved each time, so the user sees the update. The result is a single chart image which the user can export from the app.
Even with an autorelease pool, I see spikes of memory usage up to 1 GB, which prevents me from doing other resource-intensive processing.
How can I optimize incremental drawing of very large (1440 × 5000 pixels) image?
When image is displayed or saved at 3x scale, it is actually 4320 × 15360.
Is there a better way than trying to draw over an image?
autoreleasepool {
    // activeEnergyCanvas is a custom data processing class
    let newActiveEnergySegment = activeEnergyCanvas.draw(in: CGRect(x: 0, y: 0, width: 1440, height: days * 10), with: energyPalette)
    let size = CGSize(width: 1440, height: height)
    UIGraphicsBeginImageContextWithOptions(size, false, 0.0)
    // draw existing image
    self.activeEnergyImage.draw(in: CGRect(origin: CGPoint(x: 0, y: 0),
                                           size: size))
    // calculate where to draw smaller image over larger one
    let offsetRect = CGRect(origin: CGPoint(x: 0, y: offset * 10),
                            size: newActiveEnergySegment.size)
    newActiveEnergySegment.draw(in: offsetRect)
    // get the combined image
    let newImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    // assign combined image to be displayed
    if let unwrappedImage = newImage {
        self.activeEnergyImage = unwrappedImage
    }
}

It turns out my mistake was passing a drawing scale of 0.0 when creating the graphics context, which makes the context draw at the device's native screen scale.
In the case of the iPhone 8 it was 3.0. The result is that extreme amounts of memory are needed to draw, zoom, and export these images. Even though all debug logging prints that the image is 1440 pixels wide, the actual canvas ends up being 1440 * 3.0 = 4320 pixels wide.
Passing 1.0 as the drawing scale makes the image fuzzier, but reduces memory usage to less than 200 MB.
// UIGraphicsBeginImageContext() <- also uses #3x scale, even when all display size printouts show
let drawingScale: CGFloat = 1.0
UIGraphicsBeginImageContextWithOptions(size, true, drawingScale)
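A back-of-the-envelope calculation shows why the scale matters so much. This Python sketch assumes 4 bytes per pixel (32-bit RGBA), which is an assumption about the backing store rather than something measured in the app; bitmap_bytes is a made-up helper name.

```python
def bitmap_bytes(width_pt, height_pt, scale, bytes_per_pixel=4):
    """Estimate the size of one uncompressed bitmap for a canvas of the given
    point size and drawing scale. bytes_per_pixel=4 assumes 32-bit RGBA."""
    width_px = int(width_pt * scale)
    height_px = int(height_pt * scale)
    return width_px * height_px * bytes_per_pixel


at_3x = bitmap_bytes(1440, 5000, 3.0)
at_1x = bitmap_bytes(1440, 5000, 1.0)
print(at_3x, at_1x, at_3x // at_1x)
```

Under those assumptions a single buffer is roughly 259 MB at 3x against about 29 MB at 1x, a factor of 9. Each drawing pass holds more than one such buffer at a time (the previous image, the context, and the new image), so spikes near 1 GB are plausible, though that part is an estimate only.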

Can you change the bounds of a Sampler in a Metal Shader?

In the fragment function of a Metal shader file, is there a way to redefine the "bounds" of the texture with respect to what the sampler will consider its normalized coordinates to be?
By default, a value of 0,0 for the sample is the top-left "pixel" and 1,1 is the bottom right "pixel" of the texture. However, I'm re-using textures for drawing and at any given render pass there's only a portion of the texture that contains the relevant data.
For example, in a texture of width: 500 and height: 500, I might have only copied data into the region of 0,0,250,250. In my fragment function, I'd like the sampler to interpret a normalized coordinate of 1.0 to be 250 and not 500. Is that possible?
I realize I can just change the sampler to use pixel addressing, but that comes with a few restrictions as noted in the Metal Shader Specification.

No, but if you know the region you want to sample from, it's quite easy to do a little math in the shader to fix up your sampling coordinates. This is used often with texture atlases.
Suppose you have an image that's 500x500 and you want to sample the bottom-right 125x125 region (just to make things more interesting). You could pass this sampling region in as a float4, storing the bounds as (left, top, width, height) in the xyzw components. In this case, the bounds would be (375, 375, 125, 125). Your incoming texture coordinates are "normalized" with respect to this square. The shader simply scales and biases these coordinates into texel coordinates, then normalizes them to the dimensions of the whole texture:
fragment float4 fragment_main(FragmentParams in [[stage_in]],
                              texture2d<float, access::sample> tex2d [[texture(0)]],
                              sampler sampler2d [[sampler(0)]],
                              // ...
                              constant float4 &spriteBounds [[buffer(0)]])
{
    // original coordinates, normalized with respect to subimage
    float2 texCoords = in.texCoords;
    // texture dimensions
    float2 texSize = float2(tex2d.get_width(), tex2d.get_height());
    // adjusted texture coordinates, normalized with respect to full texture
    texCoords = (texCoords * spriteBounds.zw + spriteBounds.xy) / texSize;
    // sample color at modified coordinates
    float4 color = tex2d.sample(sampler2d, texCoords);
    // ...
}
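To sanity-check the scale-and-bias step outside the shader, the same arithmetic can be run on the CPU. This is only a Python sketch of the formula above using the example numbers from this answer; remap_uv is a hypothetical helper name.

```python
def remap_uv(u, v, bounds, tex_size):
    """Map coordinates normalized to a sub-region onto coordinates normalized
    to the whole texture. bounds is (left, top, width, height) in texels."""
    left, top, width, height = bounds
    tex_w, tex_h = tex_size
    return ((u * width + left) / tex_w, (v * height + top) / tex_h)


bounds = (375, 375, 125, 125)  # bottom-right 125x125 region
tex_size = (500, 500)

print(remap_uv(0.0, 0.0, bounds, tex_size))  # top-left corner of the region
print(remap_uv(1.0, 1.0, bounds, tex_size))  # bottom-right corner of the region
```

The corners of the region land at 0.75 and 1.0 in full-texture coordinates, which is 375/500 and 500/500, as expected.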

The size of PDF documents, how do I convert from millimeters to pixels using Spire.pdf?

PdfDocument doc = new PdfDocument();
doc.PageScaling = PdfPrintPageScaling.ActualSize;
doc.LoadFromFile("myDocument.pdf");
foreach (PdfPageBase page in doc.Pages)
{
    // The result appears to be in pixels, but I want to show it in millimeters
    Console.WriteLine("PageSize: {0}X{1}", page.Size.Width, page.Size.Height);
}

The size of PDF pages is not expressed in pixels but in points.
1 inch = 72 points
1 inch = 25.4 mm
That leads to:
1 point = 0.352777778 mm
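Applied to a page size, the conversion looks like the following. This is a Python sketch of the arithmetic only, not Spire.PDF code; points_to_mm is a made-up name, and the A4 dimensions in points are used purely as a check.

```python
MM_PER_POINT = 25.4 / 72.0  # about 0.352777778 mm per point


def points_to_mm(points):
    """Convert a length in PDF points to millimeters."""
    return points * MM_PER_POINT


# An A4 page is about 595.276 x 841.890 points
width_mm = points_to_mm(595.276)
height_mm = points_to_mm(841.890)
print(round(width_mm, 1), round(height_mm, 1))
```

In the loop from the question you would apply the same factor to page.Size.Width and page.Size.Height before printing.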

OpenTK OpenGL Drawing text

I am trying to learn how to do OpenGL using OpenTK and I can successfully draw polygons, circles, and triangles so far but my next question is how to draw text? I have looked at the example on their homepage which was in C# and I translated it to VB .NET.
It currently just draws a white rectangle so I was hoping that someone could spot an error in my code or suggest another way to draw text. I will just list my paint event.
Paint event:
GL.Clear(ClearBufferMask.ColorBufferBit)
GL.Clear(ClearBufferMask.DepthBufferBit)
Dim text_bmp As Bitmap
Dim text_texture As Integer
text_bmp = New Bitmap(ClientSize.Width, ClientSize.Height)
text_texture = GL.GenTexture()
GL.BindTexture(TextureTarget.Texture2D, text_texture)
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, All.Linear)
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, All.Linear)
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, text_bmp.Width, text_bmp.Height, 0 _
, PixelFormat.Bgra, PixelType.UnsignedByte, IntPtr.Zero)
Dim gfx As Graphics
gfx = Graphics.FromImage(text_bmp)
gfx.DrawString("TEST", Me.Font, Brushes.Red, 0, 0)
Dim data As Imaging.BitmapData
data = text_bmp.LockBits(New Rectangle(0, 0, text_bmp.Width, text_bmp.Height), Imaging.ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb)
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, Width, Height, 0, PixelFormat.Bgra, PixelType.UnsignedByte, data.Scan0)
text_bmp.UnlockBits(data)
GL.MatrixMode(MatrixMode.Projection)
GL.LoadIdentity()
GL.Ortho(0, width, Height, 0, -1, 1)
GL.Enable(EnableCap.Texture2D)
GL.Enable(EnableCap.Blend)
GL.BlendFunc(BlendingFactorSrc.One, BlendingFactorDest.OneMinusSrcAlpha)
GL.Begin(BeginMode.Quads)
GL.TexCoord2(0.0F, 1.0F)
GL.Vertex2(0.0F, 0.0F)
GL.TexCoord2(1.0F, 1.0F)
GL.Vertex2(1.0F, 0.0F)
GL.TexCoord2(1.0F, 0.0F)
GL.Vertex2(1.0F, 1.0F)
GL.TexCoord2(0.0F, 0.0F)
GL.Vertex2(0.0F, 1.0F)
GL.End()
GlControl1.SwapBuffers()

You'll get a white rectangle if your card doesn't support NPOT (non-power-of-two) texture sizes. Try testing by setting the bitmap size to e.g. 256x256.
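If the test confirms it, one common way around the limitation is to round each bitmap dimension up to the next power of two before creating the texture. A small Python sketch of that rounding follows; next_power_of_two is just an illustrative helper, not an OpenTK function.

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (n must be a positive integer)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


# A 640x480 client area would need a 1024x512 texture
print(next_power_of_two(640), next_power_of_two(480))
```

The texture coordinates of the quad then have to be scaled down so that only the used part of the larger texture is shown.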

That is an OK method, but if you plan to draw a lot of text, or even a medium amount, it will absolutely destroy performance. What you want to do is look into a program called BMFont:
www.angelcode.com/products/bmfont/
What this does is create a texture atlas of text, along with an XML file with the position, width, height, and offsets of every letter. You start off by reading that XML file and loading each character into a class with the various values. Then you simply make a function that takes a string, binds the atlas, and then, depending on the letters in the string, draws a quad for each one with texture coordinates taken from the XML data. So you might make a:
for each _char in string
create quad according to xml size
assign texture coordinates relative to xml position
increase position so letters don't draw on top of each other
There are tutorials in other languages on the BMFont website which can be helpful.
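The loop above could look roughly like this. It is a Python sketch of the layout step only: it assumes the XML has already been parsed into a dictionary keyed by character, with the BMFont per-character fields x, y, width, height, xoffset, yoffset, and xadvance. The function name layout_text and the quad dictionary shape are made up for illustration, and the actual drawing calls are left out.

```python
def layout_text(text, glyphs, atlas_w, atlas_h, start_x=0.0, start_y=0.0):
    """Build one quad per character: screen rectangle plus atlas texture coordinates."""
    quads = []
    pen_x = start_x
    for ch in text:
        g = glyphs.get(ch)
        if g is None:
            continue  # character not in the atlas; skip it
        x0 = pen_x + g["xoffset"]
        y0 = start_y + g["yoffset"]
        quads.append({
            # screen-space rectangle: left, top, right, bottom
            "pos": (x0, y0, x0 + g["width"], y0 + g["height"]),
            # normalized atlas rectangle: left, top, right, bottom
            "uv": (g["x"] / atlas_w, g["y"] / atlas_h,
                   (g["x"] + g["width"]) / atlas_w,
                   (g["y"] + g["height"]) / atlas_h),
        })
        pen_x += g["xadvance"]  # advance so letters don't draw on top of each other
    return quads


glyphs = {
    "A": {"x": 0, "y": 0, "width": 16, "height": 32, "xoffset": 1, "yoffset": 2, "xadvance": 18},
    "B": {"x": 64, "y": 0, "width": 16, "height": 32, "xoffset": 0, "yoffset": 2, "xadvance": 17},
}
for quad in layout_text("AB", glyphs, 256, 256):
    print(quad)
```

Each returned quad can then be drawn with the atlas texture bound once for the whole string, which is where the performance gain over one texture per string comes from.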
