How to filter out the background grid on a scanned paper in a PDF - pdf

I have a scanned PDF document. The original document is handwritten, and the paper used has a block grid in the background. I want to print the document, and I was wondering whether I could remove the background grid of the paper and print just the written part.

Every image processing job is different, so there may be many ways to do this. Can you first post a picture of the page so we can tell what could work and what wouldn't?
Is the image in color or B/W?
Are the lines a different color from the text?
What resolution was the image scanned at?
Is the image skewed?
Most likely, you will need to convert the PDF to an image format and use an image processing function called line removal (with repair) to remove the lines. Unfortunately, the best solutions are commercial and not cheap.
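To give a rough idea of what "line removal with repair" means, here is a minimal pure-Python sketch. It is not one of the commercial tools mentioned above. It assumes the page has already been converted to a grayscale pixel matrix (a list of rows of 0-255 values), that the grid lines are one pixel thick, not skewed, and longer than any pen stroke. The function names and the min_run and dark parameters are made up for this illustration.

```python
def find_line_mask(img, min_run, dark=128):
    """Mark pixels lying on a horizontal or vertical dark run of >= min_run pixels."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    # Horizontal runs: scan each row for long stretches of dark pixels.
    for y in range(h):
        start = None
        for x in range(w + 1):
            is_dark = x < w and img[y][x] < dark
            if is_dark and start is None:
                start = x
            elif not is_dark and start is not None:
                if x - start >= min_run:
                    for i in range(start, x):
                        mask[y][i] = True
                start = None
    # Vertical runs: the same scan, column by column.
    for x in range(w):
        start = None
        for y in range(h + 1):
            is_dark = y < h and img[y][x] < dark
            if is_dark and start is None:
                start = y
            elif not is_dark and start is not None:
                if y - start >= min_run:
                    for i in range(start, y):
                        mask[i][x] = True
                start = None
    return mask


def remove_grid_lines(img, min_run, dark=128, white=255):
    """Whiten detected grid lines, keeping pixels where handwriting crosses them."""
    h, w = len(img), len(img[0])
    mask = find_line_mask(img, min_run, dark)

    def is_ink(y, x):
        # A dark pixel that is not itself part of a detected grid line.
        return 0 <= y < h and 0 <= x < w and img[y][x] < dark and not mask[y][x]

    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Repair step: ink on both opposite sides means a stroke crosses here.
            crossed = (is_ink(y - 1, x) and is_ink(y + 1, x)) or \
                      (is_ink(y, x - 1) and is_ink(y, x + 1))
            if not crossed:
                out[y][x] = white
    return out
```

Real scans have thicker, slightly skewed and broken lines, so a practical tool would need deskewing and a more tolerant repair step; this only shows the principle.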

Related

Windows Form, image gridlines

Currently I want to create a multiscreen transition application.
But I have a problem: I do not know how to draw the lines that divide the image into several parts. I want to cut the image into several parts that can each be handled separately. Please give me at least some ideas.
If you want to split images up programmatically, you are first going to have to choose a file format to support. I suggest PNG to begin with because its encoding is fairly simple to understand (see HERE) and C# has a class to decode it (see HERE).
You'll want to think of the PNG file as a matrix of RGB values that you can split up and store into separate new smaller PNG files.
If you want to support other image file types you will have to do some research into their encoding formats and handle them differently.
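As an illustration of the "matrix of RGB values" idea, here is a minimal sketch in Python (not the C# the answer refers to). It assumes the PNG has already been decoded into a list of rows of (R, G, B) tuples by some library; the function name split_into_tiles is hypothetical, and writing each tile back out as a PNG is left to the decoder/encoder you choose.

```python
def split_into_tiles(pixels, rows, cols):
    """Split a pixel matrix into rows*cols tiles, returned in row-major order."""
    height, width = len(pixels), len(pixels[0])
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * tile_h, c * tile_w
            # The last row/column absorbs any leftover pixels when the
            # image size is not an exact multiple of the grid.
            bottom = height if r == rows - 1 else top + tile_h
            right = width if c == cols - 1 else left + tile_w
            tiles.append([row[left:right] for row in pixels[top:bottom]])
    return tiles
```

For a 3x3 monitor wall, split_into_tiles(pixels, 3, 3) would give nine tiles, with tiles[0] being the top-left part.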
Well, I did it, though with a different method, since I am creating a WPF application.
This program is intended to control 3, 6 or 9 monitors connected via Raspberry Pi. As you can see in the image, I want the first grid cell to have 2 buttons. The first button sets the image resolution of the first cell. The second button sends the image from the first cell to the first monitor, and likewise for all the other cells.
Thank you very much for your help. I look forward to a few suggestions.

Concatenate/Append png images into single Document (any format)

I have a routine that generates PNG images from a form (1 to 35 images).
I need to append a variable number of these images for printing, maybe in a single PDF or any other kind of document; the goal is to automate printing. I can figure out how to print them one by one, but I need to use A4 pages (4 images per page). Do you know anything about this? I have been trying with PdfSharp, but I can't figure out how to do it.
Any suggestion, link or code is welcome.
Thank you. Best regards
Diego Porras
A developer gave me this code for concatenating images (in my code I use PNG images, but you can change the image format). I have made small fixes myself, and this is the result:
See Code for concatenate images
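Independently of the linked code, the "4 images per A4 page" placement is just arithmetic. Here is a minimal sketch in Python (not PdfSharp code) that computes where each image would go. It assumes A4 rounded to 595 x 842 points, a top-left origin, and a 36-point margin; the function name layout_images and its parameters are made up for illustration.

```python
A4_WIDTH_PT, A4_HEIGHT_PT = 595.0, 842.0  # A4 at 72 points per inch, rounded


def layout_images(count, cols=2, rows=2, margin=36.0):
    """Return one (page_index, x, y, width, height) slot per image."""
    cell_w = (A4_WIDTH_PT - 2 * margin) / cols
    cell_h = (A4_HEIGHT_PT - 2 * margin) / rows
    per_page = cols * rows
    slots = []
    for i in range(count):
        page, slot = divmod(i, per_page)   # which page, which cell on it
        r, c = divmod(slot, cols)          # cell row and column
        slots.append((page, margin + c * cell_w, margin + r * cell_h,
                      cell_w, cell_h))
    return slots
```

Each tuple can then be handed to whatever draw-image call your PDF library provides, adding a new page whenever page_index increases. With 35 images this yields 9 pages, the last one holding 3 images.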

Poor image rendering with Google Docs PDF viewer

I used Word 2007 to create a PDF file with a 1526px * 900px image filling a whole page. This is not the first time it has happened, but the Google Docs PDF viewer absolutely mangles the colour rendering, making the file unusable.
I've taken screenshots at the same zoom level in Google Docs viewer and Foxit Reader.
Here's an image for comparison:
It's awful! I've tried messing about with some things, but can't find anything that can correct this issue.
In Chrome you can select "Print" and then "Save as PDF". The image quality in the saved PDF file will go up significantly, compared to the one from "Download as PDF". Google seems to be optimizing images to preserve bandwidth.
Let it be recorded here, 16 months after the present original posting by Turkeyphant and a similar posting [1] on the Docs+Drive product forum, that the problem appears to have been fixed within about the past week. Since that time, when a pdf (or Word) file is opened that resides on the Docs+Drive cloud, the file is rendered with what appears to be proper 24-bit color. The treatment whereby the color was reduced to 5 bits, which could encode 32 colors or 32 shades of gray or 16 of each, depending on the image, has been abandoned.
To the best of my knowledge the Docs+Drive staff have not announced this change, either on their Blog or on their product forum. I noticed the change a few days ago and noted it on the conversation [1].
[1] (2013-05-21) Problem in pdf-viewer with color images
https://productforums.google.com/d/msg/docs/_bdfiYgjF2s/5PDMdp9MhFQJ
It might have something to do with compression of the image in the PDF.
I mean, PDF supports JPEG2000-encoded images (JPXDecode Filter) and PDF Reference states that:
From a single JPEG2000 data stream, multiple versions of an image may
be decoded. These different versions form progressions along four
degrees of freedom: sampling resolution, color depth, band, and
location. For example, with a resolution progression, a thumbnail
version of the image may be decoded from the data, followed by a
sequence of other versions of the image, each with approximately four
times as many samples (twice the width times twice the height) as the
previous one. The last version is the full-resolution image.
The Google Docs viewer might be displaying only the first version of the image (with lower resolution or lower color depth), thus producing the "awful" output.
Perhaps the attached pair of images will help towards clarifying what is happening with color in images that are rendered through the Google Docs pdf viewer. I inserted the Wikipedia image RGB_Color_Solid_Cube (1024*1024 pixels) into an otherwise empty Google Docs text document, converted it to pdf, and viewed the resulting pdf files two ways: once through the Google Docs+Drive pdf viewer and once through the regular pdf viewer of the Chrome or Firefox browser. Then I made screenshots. Here is the RGB Color Cube via the Docs PDF Viewer and here is the RGB Color Cube via a regular browser PDF Viewer.
The color resolution in the Docs PDF Viewer version is really awful; it looks like 64 colors at most. Maybe someone else is able to recognize this kind of rendering and identify the problem better.
This is related to compression and it's something that you can't change in the default view of Google Docs Viewer. The simple solution is to upload the PDF and just serve it from the site in an iFrame. Here is an example:
Problem Embedding Google Docs PDF Solution
Mike

Is it possible to split a PDF file into pieces smaller than a page?

I found that there are a lot of tools available for breaking big PDF files into smaller ones by splitting the original PDF file page-wise. For example, if I have a 10-page PDF document, it can be split into 10 single-page pieces.
But I want a similar kind of tool that splits a PDF file into pieces smaller than a page. That is, I need to split a PDF page into different documents based on some parameter such as paragraph, section, element...
For example,
if my PDF file has 2 pages with 10 paragraphs, I would like to split it into 10 separate PDF files, one per paragraph.
Also, I strongly believe PDF does not contain any structure like Open XML does. But I am also wondering:
How are the tools able to split PDF files into smaller PDF files page-wise? What kind of mechanism do they use for page-wise splitting?
So, is there any way to do what I need? Please give me your suggestions.
PDF is a vector-based document description language. It is page-based, so in a way every page is independent of the next one. Splitting page-wise is therefore pretty easy. In contrast to a raster image, where you can extract small subsets independently, in a PDF you have to render the whole page to know what a small subset looks like.
Say you have a Page (black) which contains a complex shaped object (here it is a line but it could be any text, shape, image, etc.) and you want to extract a subset (red). You would have to first find all the objects that produce visible output in the region of interest. Then you would have to modify them so they are rendered correctly (in this case calculate the green points from the blue points while preserving the shape of the object).
An easier approach would be to include the whole page and clip the viewing area to the dimensions of the region.
You could do this with pdfjam. Check the --trim/--offset/--delta options in conjunction with a custom paper size (Examples 6 and 7 on the pdfjam website). You would still have to somehow calculate the coordinates of the region of interest, though.
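The coordinate calculation for the clipping approach can be sketched as follows. This is a minimal Python example, assuming the region of interest is given in PDF points with the origin at the bottom-left of the page, and assuming pdfjam's --trim takes its four values in the order left, bottom, right, top (as the underlying LaTeX graphics option does). The helper names are hypothetical, and the --clip, --papersize and --outfile options should be checked against your installed pdfjam version before relying on them.

```python
def trim_for_region(page_w, page_h, x0, y0, x1, y1):
    """Amounts to cut from the (left, bottom, right, top) page edges."""
    if not (0 <= x0 < x1 <= page_w and 0 <= y0 < y1 <= page_h):
        raise ValueError("region must lie inside the page")
    return (x0, y0, page_w - x1, page_h - y1)


def pdfjam_command(infile, outfile, page, page_size, region):
    """Build an argument list that clips one page to the given region."""
    page_w, page_h = page_size
    x0, y0, x1, y1 = region
    trim = trim_for_region(page_w, page_h, x0, y0, x1, y1)
    trim_arg = " ".join(f"{v:g}bp" for v in trim)
    # Output paper size equals the size of the region that remains.
    paper_arg = f"{{{x1 - x0:g}bp,{y1 - y0:g}bp}}"
    return ["pdfjam", "--trim", trim_arg, "--clip", "true",
            "--papersize", paper_arg, "--outfile", outfile,
            infile, str(page)]
```

The list can be passed to subprocess.run once you have verified the options. Note that the clipped-away content is typically still present in the output file; it is only hidden from view.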

PDF Colo(u)r Analysis (without Acrobat itself ?)

Is there a library/tool which would list all colours used in a PDF document?
I'm sure Acrobat itself would do this, but I would like an alternative (ideally something that could be scripted).
So the idea is that if you have a very simple PDF document with four colours in it, the output might say:
RGB(100,0,0)
RGB(105,0,0)
CMYK(0,0,0,1)
CMYK(1,1,1,1)
You could explore the insides with pdfbox, but you would have to write some code to find and catalog all those colors.
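The cataloguing code could look roughly like the following sketch, written here in Python rather than against the pdfbox API. It assumes you have already obtained a page's decoded (uncompressed) content stream as text, and it only recognises the basic device color operators (rg/RG for RGB, k/K for CMYK, g/G for gray). Colors set through named color spaces, shadings, images or text strings that happen to contain these tokens are not handled, and the function name list_colors is made up for illustration.

```python
# Operator -> (color model, number of operands), for device color spaces only.
COLOR_OPERATORS = {
    "rg": ("RGB", 3), "RG": ("RGB", 3),
    "k": ("CMYK", 4), "K": ("CMYK", 4),
    "g": ("Gray", 1), "G": ("Gray", 1),
}


def list_colors(content):
    """Collect the distinct colors set by device color operators."""
    colors = set()
    operands = []
    for token in content.split():
        try:
            operands.append(float(token))   # numeric operand: remember it
            continue
        except ValueError:
            pass
        # Any non-numeric token is treated as an operator.
        if token in COLOR_OPERATORS:
            model, arity = COLOR_OPERATORS[token]
            if len(operands) >= arity:
                colors.add((model, tuple(operands[-arity:])))
        operands.clear()                    # operands belong to one operator
    return sorted(colors)
```

The operands are reported as they appear in the stream, so components are in the 0 to 1 range rather than 0 to 255.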
Most PDF tools have access to this information but no API to expose it. You could take any such tool and add one.
Apago PDFspy generates an XML file containing all kinds of metadata extracted from PDF files. It reports color usage including spot colors.
We recently added a function called GetPageColorSpaces(0) to the Quick PDF Library - www.quickpdflibrary.com to retrieve much of the ColorSpace info used in the document.
Here is some sample output.
Resource,"QuickPDFCS2eb0f578",Separation,"HKS 52 E",DeviceCMYK,0.95,0,0.55,0
Resource,"QuickPDFCSb7b05308",Separation,"Black",DeviceCMYK,0,0,0,1
Resource,"QuickPDFCSd9f10810",Separation,"Pantone 117 C",DeviceCMYK,0,0.18,1,0.15
Resource,"QuickPDFCS9314518c",Separation,"All",DeviceCMYK,0,1,0,0.5
Resource,"QuickPDFCS333d463d",Separation,"noplate",DeviceCMYK,1,0,0,0
Resource,"QuickPDFCSb41cafc4",Separation,"noprint",DeviceCMYK,0,1,0,0
Resource,"Cs10",DeviceN,Black,Colorant,-1,-1,-1,-1
Resource,"Cs10",DeviceN,P1495,Colorant,-1,-1,-1,-1
Resource,"Cs10",DeviceN,CalRGB,Colorant,-1,-1,-1,-1
Resource,"Cs10",Separation,"P1495",DeviceCMYK,0,0.31,0.69,0
XObject,"R29",Image,,DeviceRGB,-1,-1,-1,-1
Disclaimer: I work at Atalasoft.
Our product, DotImage with the PDF Reader add-on, can do this. The easiest way is to rasterize the page and then just use any of our image analysis tools to get the colors.
This example shows how to do it if you want to group similar colors -- the deployed example will only work for PNG and JPEG, but if you download the code, it's trivial to include the add-on and get PDF as well (let me know if you need help)
Source here:
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/30/color-scheme-generator.aspx
Run it here:
http://www.atalasoft.com/31apps/ColorSchemeGenerator
If you are working with specific and simple PDF documents from a constrained source then you may be able to find the colors by reading through the content stream. However this cannot be a generic solution.
For example PDF documents can contain gradients or transparency. If your document contains this type of construct then you are likely to end up with a wide range of colors rather than a specific set.
Similarly many PDF documents contain bitmapped images. Given that these will need to be interpolated to be displayed at different resolutions, the set of colors in a displayed PDF may be bigger or different to (though obviously broadly similar to) the embedded bitmap.
Similarly many PDF documents contain constructs in multiple color spaces that are rendered into different color spaces. For example a PDF might contain a DeviceRGB bitmap, a line in an ICC based CMYK color and a Lab based rectangle. The displayed version might be in sRGB for display or CMYK for print. Each of these will influence the precise set of colors.
So the only 100% valid answer is going to be related to a particular render of a PDF at a particular resolution to a particular color space. From the resultant bitmap you can determine the colors that have been used.
There are a variety of PDF libraries that will do this type of render including DotImage (referenced in another answer) and ABCpdf .NET (on which I work).
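Once a page has been rendered to a bitmap by whichever library you use, determining the colors is straightforward. Here is a minimal Python sketch, assuming the render is available as rows of 8-bit (R, G, B) tuples. The function name count_colors and the bits parameter (which groups similar colors by keeping only the top bits of each channel) are illustrative assumptions, not part of any of the libraries mentioned.

```python
from collections import Counter


def count_colors(pixels, bits=8):
    """Count pixels per color, optionally grouping similar colors.

    bits is the number of high-order bits kept per channel (1-8);
    8 counts exact colors, smaller values merge neighbouring shades.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    counts = Counter()
    for row in pixels:
        for r, g, b in row:
            # Drop the low-order bits so nearby shades share one key.
            key = ((r >> shift) << shift,
                   (g >> shift) << shift,
                   (b >> shift) << shift)
            counts[key] += 1
    return counts
```

Calling count_colors(pixels).most_common(10) would list the ten dominant colors of that particular render; as explained above, a different resolution or output color space can give a different set.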