How to generate a tiff file with meta data - objective-c

I have to generate a TIFF file with many images and metadata.
I found that it's possible to convert a PNG or a JPG to TIFF here:
But how do I add metadata? Is it possible with ImageMagick for iOS?
Thanks.
Edit: I finally installed ImageMagick on the iPhone, but I can't find how to create a multipage TIFF with MagickWand. It's also possible to use libtiff directly:
I found how to create an empty single page in C code:
char buffer[25 * 144] = { /* boring hex omitted */ };
TIFF *image;
char szFileName[512];
strcpy(szFileName, getenv("HOME"));
strcat(szFileName, "/Documents/");
strcat(szFileName, "output.tif");
// Open the TIFF file
if ((image = TIFFOpen(szFileName, "w")) == NULL)
{
    printf("Could not open output.tif for writing\n");
    return; // nothing below is valid without an open file
}
// We need to set some values for basic tags before we can add any data
TIFFSetField(image, TIFFTAG_IMAGEWIDTH, 25 * 8);
TIFFSetField(image, TIFFTAG_IMAGELENGTH, 144);
TIFFSetField(image, TIFFTAG_BITSPERSAMPLE, 1);
TIFFSetField(image, TIFFTAG_SAMPLESPERPIXEL, 1);
TIFFSetField(image, TIFFTAG_ROWSPERSTRIP, 144);
TIFFSetField(image, TIFFTAG_COMPRESSION, COMPRESSION_CCITTFAX4);
TIFFSetField(image, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_MINISWHITE);
TIFFSetField(image, TIFFTAG_FILLORDER, FILLORDER_MSB2LSB);
TIFFSetField(image, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
TIFFSetField(image, TIFFTAG_XRESOLUTION, 150.0);
TIFFSetField(image, TIFFTAG_YRESOLUTION, 150.0);
TIFFSetField(image, TIFFTAG_RESOLUTIONUNIT, RESUNIT_INCH);
// Write the information to the file
TIFFWriteEncodedStrip(image, 0, buffer, 25 * 144);
// Close the file
TIFFClose(image);
So is there any C tutorial about how to insert image data into the created TIFF file? And how do I create a multipage TIFF?
Thanks.

I have used LibTIFF, but not on iOS directly. It is a plain old C library, though, so it should be fine. I do note that Apple's Image I/O framework supposedly supports image metadata (but again, I have not used this myself). See the link here; there is seemingly nothing for multi-page TIFF or bespoke tags, only standardised camera info tags.
However, in plain C, adding your own bespoke tags is usually done most simply by modifying the core library: add them to the main header file, tiff.h, along with a few wrapper functions.
See the section on "Adding New Tags".
You can then refer to and use them as you would any other TIFF tag, e.g. this is what I have done to load in some embedded XML data:
TIFFGetField(lp_tif, MYTAG, &lp_xml);
Of course, you then have to ship/maintain your modified version of libTIFF (new "public" tags have to go through the registration process with Adobe).
The example you posted is fine for writing a single-page TIFF file (i.e. set all the tags, then write the contents of buffer to the file). For a multi-page TIFF it's one step further: you need to understand the concept of an Image File Directory (IFD). I would suggest looking at this link further to understand the use of these functions:
TIFFWriteDirectory()
TIFFReadDirectory()
NB: Properly, every TIFF file should have at least one directory to associate all the tags and image data together.
Finally, you can of course go one level further: if you know the fixed structure of the TIFF file you want to create, simply write the bytes yourself without even using LibTIFF.

In case using a script to add the metadata is an option for you: use Phil Harvey's exiftool!
exiftool is a quite powerful, well-documented (and multi-platform) command-line utility that reads and writes metadata from/to lots of different file formats, including TIFF.

Related

How to replace PDF page

Given a PDF (with multiple pages), a page (stream) and an index, how can I replace the page at the target index with my stream and save the PDF?
public Stream ReplacePDFPageAtIndex(Stream pdfStream, int index, Stream replacePageStream)
{
    List<Stream> pdfPages = // fetch pdf pages
    pdfPages.RemoveAt(index);
    pdfPages.Insert(index, replacePageStream);
    Stream newPdf = // create new pdf using the edited list
    return newPdf;
}
I was wondering if there is any open source solution, or whether I can roll my own with ease, given that I just need to split the PDF into pages and replace one of them.
I have also checked iTextSharp but I do not see the functionality that I require.

PDFs: Extracting text associated with font (linux)

The general problem that I'm trying to solve is to determine how much text in a large set of PDFs is associated with different fonts. I know I can extract text from a PDF using pdftotext and fonts information with pdffonts, but I can't figure out how to link those together. I have 100,000+ PDFs to process, so will need something I can program against (and I don't mind a commercial solution).
PDFTron PDFNet SDK can extract all the graphic operations, including text objects, along with a link to the font being used.
Starting with the ElementReader sample, you can get the Font for every text element.
https://www.pdftron.com/documentation/samples?platforms=windows#elementreader
https://www.pdftron.com/api/PDFNet/?topic=html/T_pdftron_PDF_Font.htm
The Adobe PDF Library - a product my company sells - can do that.
This is part of the sample code:
// This callback function is called for each PDWord object.
ACCB1 ASBool ACCB2 WordEnumProc(PDWordFinder wfObj, PDWord pdWord, ASInt32 pgNum, void* clientData)
{
    char str[128];
    char fontname[100];
    // Get the word text
    PDWordGetString(pdWord, str, sizeof(str));
    // Get the font name
    PDStyle style = PDWordGetNthCharStyle(wfObj, pdWord, 0);
    PDFont wordFont = PDStyleGetFont(style);
    PDFontGetName(wordFont, fontname, sizeof(fontname));
    printf("%s [%s]\n", str, fontname);
    return true;
}
Example output:
...
Chapter [Arial,Bold]
2: [Arial,Bold]
Overview [Arial,Bold]
27 [Arial]
...
This [TimesNewRoman]
book [TimesNewRoman]
describes [TimesNewRoman]
the [TimesNewRoman]
Portable [TimesNewRoman]
Document [TimesNewRoman]
Format [TimesNewRoman]
...

How can I reduce size of generated PDF using iText7

I use this method to copy and scale pages, selected by page number, from the original PDF into a generated PDF that contains only the selected, scaled pages.
private static void addScaledPage(PdfDocument pdf, PdfDocument srcDoc, String pageNumber) throws IOException {
    PdfPage page = pdf.addNewPage(PageSize.A4);
    PdfCanvas canvas = new PdfCanvas(page);
    AffineTransform transformationMatrix = AffineTransform.getScaleInstance(0.86, 0.86);
    canvas.concatMatrix(transformationMatrix);
    PdfFormXObject pageCopy = srcDoc.getPage(Integer.valueOf(pageNumber)).copyAsFormXObject(pdf);
    canvas.addXObject(pageCopy, 50, 30);
}
This code works fine, but a small issue appears when I take 3 pages from an original PDF that has 140 pages and is approx. 10 MB in size: the generated PDF with the 3 selected pages is also approx. 10 MB.
Also, whether I copy 3 pages or 10 pages from the source document, I always get the same size of generated PDF; it seems like the resources are copied over from the source PDF wholesale.
I would appreciate some advice: did I do something wrong in the implementation, or is there something else I should change?
Kindest regards,
It depends a lot on the resources embedded in the document. If a large image that uses CMYK color, or a font with CJK glyphs (either of these resources could easily be several MB in size), is used on the pages you are copying, that entire resource will be copied into the PDF you're creating. The fact that you are only copying three out of 140 pages wouldn't make much difference: the bulk of the file size will be taken up by the resource, and the pages won't display properly without it.
A solution would be a workflow that optimizes your document during or after copying the pages. This could convert images to an equivalent, smaller color space, or subset the font so that you only carry the required glyphs. Both of these techniques can substantially reduce the size of the file (but this is all dependent on how the source file itself is constructed, of course).
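One thing worth trying before a full optimization workflow is a hedged sketch along these lines: use iText 7's PdfDocument.copyPagesTo() together with the writer's smart mode, which deduplicates identical streams while copying. Whether this actually shrinks the output depends entirely on how the source file shares resources between pages; the file names source.pdf and selected.pdf are placeholders.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.WriterProperties;

import java.io.IOException;
import java.util.Arrays;

public class CopySelectedPages {
    public static void main(String[] args) throws IOException {
        try (PdfDocument src = new PdfDocument(new PdfReader("source.pdf"));
             PdfDocument dest = new PdfDocument(
                     new PdfWriter("selected.pdf",
                             new WriterProperties().useSmartMode()))) {
            // Copy pages 1, 5 and 7 (1-based page numbers) with their
            // resources; smart mode avoids writing duplicate streams twice.
            src.copyPagesTo(Arrays.asList(1, 5, 7), dest);
        }
    }
}
```

Note this copies pages at their original size, unlike the scaling approach above; it is a way to test how much of the 10 MB is a single shared resource rather than a drop-in replacement.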

Best practice to compress generated PDF using iText7

I have an existing/source PDF document, and I copy selected pages from it to generate a destination PDF containing only those pages. Every page in the source document is scanned at a different resolution, so sizes vary:
generated document with 4 pages => 175 kB
generated document with 4 pages => 923 kB (I suppose this is because of the higher scan resolution of each page in the source document)
What would be best practice to compress these pages?
Is there any code sample for compressing/reducing the size of the final PDF, which consists of copied pages of the source document at different resolutions?
Kindest regards
If you are just adding scans to a PDF document, it makes sense for the size of the resulting document to go up when you use high-resolution images.
Keep in mind that iText is a PDF library, not an image-manipulation library.
You could of course use regular old Java to compress the images:
public static void writeJPG(BufferedImage bufferedImage, OutputStream outputStream, float quality) throws IOException
{
    Iterator<ImageWriter> iterator = ImageIO.getImageWritersByFormatName("jpg");
    ImageWriter imageWriter = iterator.next();
    ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
    imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    imageWriteParam.setCompressionQuality(quality);
    ImageOutputStream imageOutputStream = new MemoryCacheImageOutputStream(outputStream);
    imageWriter.setOutput(imageOutputStream);
    IIOImage iioimage = new IIOImage(bufferedImage, null, null);
    imageWriter.write(null, iioimage, imageWriteParam);
    imageOutputStream.flush();
}
But really, putting scanned images into a PDF makes life so much more difficult. Imagine the people who have to handle that document after you: they open it, see text, try to select it, and nothing happens.
Additionally, you might change the WriterProperties when creating your PdfWriter instance:
PdfWriter writer = new PdfWriter(dest,
new WriterProperties().setFullCompressionMode(true));
Full compression mode will compress certain objects into an object stream, and it will also compress the cross-reference table of the PDF. Since most of the objects in your document will be images (which are already compressed), compressing objects won't have much effect, but if you have a large number of pages, compressing the cross-reference table may result in smaller PDF files.

Renaming a pdf file (scanned document) using OCR. It should read 3 zones and rename accordingly. e.g. Streetname_LastName_Date.pdf

I am running into trouble with tons of paperwork. I want to have it digitized in order to simplify searching and thereby cut down the huge amount of time spent looking through the paperwork.
It is rather simple: I want to scan documents that share the same layout and rename each one according to 3 areas within the document. In my case that's a reference number, a last name, and the date listed on the document. It would be even better if it could also move the files to folders named after an area in the document.
Here is an image, basically this but with hundreds of pdfs in batch.
http://i.imgur.com/8vwwyEb.png
I couldn't find any solution for days, and yet the technology is there. Have you ever come across a problem like this and found a solution? I would really appreciate your help.
The closest thing I have found is a program called FileCenter, but you need to click a button for each scan, and using OCR on existing files requires going through a 3-click menu for each file. I wonder if there is an easy batch program where you just select the rectangles and it does the renaming part.
I will edit this OP if any solution can be found, for anyone googling.
You may do this with the commercial component ByteScout PDF Extractor SDK, designed specifically for this purpose. It can extract text from a given region by coordinates, with optional OCR (which also works in the selected extraction region), in a batch. Coordinates of the region to extract text from can be measured in a base document with the free PDF Multitool utility (assuming all your PDF files use the same layout).
You may extract text from given regions in C# like this, using OCR (English language):
using System;
using System.IO;
using System.Text;
using Bytescout.PDFExtractor;
using System.Drawing;
using System.Diagnostics;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.TextExtractor instance
            TextExtractor extractor = new TextExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Enable OCR auto mode; uses English by default
            extractor.OCRMode = OCRMode.Auto;

            string sourceFile = "sample.pdf";

            // Load the source PDF file
            extractor.LoadDocumentFromFile(sourceFile);

            // Extract from a given area (measured from a typical base file
            // using the PDF Multitool utility), assuming the reference string is there
            extractor.SetExtractionArea(Rectangle.FromLTRB(10, 10, 100, 100));
            string extractedReference = extractor.GetTextFromPage(0).Trim();

            extractor = null; // release the extractor and the original file

            // Copy the original file to a filename based on the extracted
            // reference, e.g. "1234-sample.pdf"
            string outputFile = extractedReference + "-" + sourceFile;
            File.Copy(sourceFile, outputFile);

            Console.WriteLine();
            Console.WriteLine(sourceFile + " has been copied to " + outputFile);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}
Disclosure: I'm connected with ByteScout