How to merge many VRT files into one - GDAL

I have many VRT files for adjacent images, originally generated using gdal_translate.
Is there a way to merge all those VRT files into one VRT file, so that when I run gdal2tiles.py I only need to give it this one composite VRT file?
At first I thought gdalwarp would do the trick, but it turns out that gdalwarp merges the images into one single image. However, I don't want to merge images; I would like to merge the VRT files.

GDAL has included the gdalbuildvrt utility since version 1.6.1. It merges multiple input files into one VRT mosaic file. See the official documentation for usage details:
http://www.gdal.org/gdalbuildvrt.html
Most likely you just need to give the output filename followed by all the individual input files.
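A minimal Python sketch of that call, assuming gdalbuildvrt is on your PATH and that the per-image VRTs all match a glob pattern such as tiles_src/*.vrt. The pattern and the helper names (build_vrt_command, merge_vrts) are illustrative, not part of GDAL; the script only shells out to gdalbuildvrt with the output file first and the inputs after it:

```python
import glob
import os
import shutil
import subprocess


def build_vrt_command(output_vrt, inputs):
    """Return the gdalbuildvrt argument list: output first, then the inputs."""
    if not inputs:
        raise ValueError("no input files given")
    # Sort so the mosaic is built in the same order on every run.
    return ["gdalbuildvrt", output_vrt, *sorted(inputs)]


def merge_vrts(output_vrt, pattern="*.vrt"):
    """Mosaic every file matching `pattern` into one VRT via gdalbuildvrt."""
    # Skip the output itself in case a previous run left it matching the pattern.
    inputs = [p for p in glob.glob(pattern)
              if os.path.abspath(p) != os.path.abspath(output_vrt)]
    cmd = build_vrt_command(output_vrt, inputs)
    if shutil.which("gdalbuildvrt") is None:
        raise RuntimeError("gdalbuildvrt not found on PATH")
    subprocess.run(cmd, check=True)
    return cmd
```

For example, merge_vrts("mosaic.vrt", "tiles_src/*.vrt") followed by gdal2tiles.py mosaic.vrt tiles/ would tile the whole mosaic in a single run.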
You have tagged your question with the "maptiler" label, which refers to the http://www.maptiler.com/ product. MapTiler is able to render multiple files out of the box and does not use VRT internally at all. It is more efficient to supply the individual input files to MapTiler directly than to create a VRT and pass it to the software. VRT introduces an artificial internal block size for reading the data, which slows down the tile rendering process, in some cases significantly.
Feel free to request a demo of MapTiler Pro and compare the speed, size and quality of the map tiles you receive - and post the results here.

Related

Load files into Photoshop layers as vector smart objects

Bridge is packaged with a script that will load multiple files, each as its own layer, into a Photoshop file. There are two problems when you do this with a vector file:
It converts the files to raster layers. And since you don't get to choose the size of the file beforehand, if they're too small, you can't scale them up without losing quality.
It doesn't preserve antialiasing, leaving ugly jagged edges on whatever art you imported.
Is there a way to import multiple files into Photoshop as vector smart objects? Then you'd have full control over the quality. Alternatively, is there a way to define the size of the vector files you're loading into layers and/or preserve their antialiasing?
I found a script that loads files into Photoshop as smart objects, but this has the same two problems the factory Bridge script has. It appears to do the exact same thing, but converts the layers to smart objects after they are imported.
The only way I currently know of to get vector smart objects into Photoshop is to do so manually one by one by copying from Illustrator or by dragging the files to an open Photoshop file. I'm looking for a way to automate the process.
I'm afraid doing it manually is the only way to get where you want to go. I've wrestled with this same issue for years and hope with every PS/Bridge update they'll add the option to load a stack of smart objects, but so far it's still old-school drag n' drop.
Hit the Adobe suggestion box... maybe with enough requests they'll finally add this as a native feature.

Write KML Extended Data in a different way

I have some raw GPS data that I want to put into a KML file.
Currently I can generate the KML file with the Extended Data using the KML format described here https://developers.google.com/kml/documentation/kmlreference#trackexample and that works, but it takes too much time.
I am collecting six different types of extended data using an Arduino and writing them to an SD card, but the whole writing process for each sample is too slow (I write the data to six different files and then append each file to the final KML, using the gx:Track element).
Is there any other way to write all six parameters at the same time, in KML format, using Extended Data? Maybe using different tags, or the same tags in a different order?
I don't have enough CPU power to rework the file after collecting the raw GPS data, so I need to write it right the first time.
Write the KML entirely yourself; do not use a library. Then it is as fast as simply writing text to a file. If the bottleneck is the file system, then KML is not the right format: use a custom binary file and convert it to KML later on the server side.
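A sketch of the binary-log approach in Python, assuming a hypothetical fixed 40-byte record (UNIX timestamp, lon/lat/alt and six float channels) and placeholder channel names ch1..ch6. On the Arduino you would write the same layout from C; pack_sample only documents it here. The server-side to_kml then emits the gx:Track in the order the KML reference example uses: all <when> elements, then all <gx:coord> elements, then one gx:SimpleArrayData per channel, which is why writing it directly on the device needed six separate files.

```python
import datetime
import struct

# Hypothetical on-card record layout (little-endian, 40 bytes):
# uint32 UNIX timestamp, float32 lon, lat, alt, then six float32 channels.
RECORD = struct.Struct("<I3f6f")
# Placeholder channel names; replace with your six real measurements.
CHANNELS = ["ch1", "ch2", "ch3", "ch4", "ch5", "ch6"]


def pack_sample(ts, lon, lat, alt, extras):
    """One fixed-size record; the device appends these one after another."""
    return RECORD.pack(ts, lon, lat, alt, *extras)


def read_samples(data):
    """Split the raw log into (ts, lon, lat, alt, c1..c6) tuples."""
    usable = len(data) - len(data) % RECORD.size  # drop a torn final record
    return list(RECORD.iter_unpack(data[:usable]))


def fmt(x):
    return f"{x:.7g}"  # float32 carries about 7 significant digits


def to_kml(samples):
    """All <when>, then all <gx:coord>, then one value array per channel."""
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<kml xmlns="http://www.opengis.net/kml/2.2" '
        'xmlns:gx="http://www.google.com/kml/ext/2.2">',
        "<Document>",
        '<Schema id="schema">',
    ]
    out += [f'<gx:SimpleArrayField name="{c}" type="float"/>' for c in CHANNELS]
    out += ["</Schema>", "<Placemark>", "<gx:Track>"]
    for s in samples:
        t = datetime.datetime.fromtimestamp(s[0], tz=datetime.timezone.utc)
        out.append(f"<when>{t:%Y-%m-%dT%H:%M:%SZ}</when>")
    for s in samples:
        out.append(f"<gx:coord>{fmt(s[1])} {fmt(s[2])} {fmt(s[3])}</gx:coord>")
    out.append('<ExtendedData><SchemaData schemaUrl="#schema">')
    for i, name in enumerate(CHANNELS):
        out.append(f'<gx:SimpleArrayData name="{name}">')
        out += [f"<gx:value>{fmt(s[4 + i])}</gx:value>" for s in samples]
        out.append("</gx:SimpleArrayData>")
    out += ["</SchemaData></ExtendedData>", "</gx:Track>", "</Placemark>",
            "</Document>", "</kml>"]
    return "\n".join(out)
```

Because every record has the same size, the device only appends bytes, and a half-written last record (e.g. after a power loss) is simply dropped by read_samples.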

Write multiple streams to a single file without knowing the length of the streams?

For performance when reading and writing a large dataset, we have multiple threads compressing and writing out separate files to a SAN. I'm making a new file spec that will instead have all these files appended together into a single file. I will refer to each of these smaller blocks of data as a subset.
Since each subset will be of unknown size after compression, there is no way to know what byte offset to write to. Without compression, each writer could write to a predictable address.
Is there a way to append files together on the file-system level without requiring a file copy?
I'll write an example here of how I would expect the result to be on disk. Although I'm not sure how helpful it is to write it this way.
single-dataset.raw
[header 512B][data1-45MB][data2-123MB][data3-4MB][data5-44MB]
I expect the SAN to be NTFS for now, in case there are any special features of particular file systems.
If I make the subsets small enough to fit into RAM, I will know the size after compression, but keeping them smaller has other performance drawbacks.
Use sparse files. Just position each subset at some offset "guaranteed" to be beyond the last subset. Your header can then contain the offset of each subset and the filesystem handles the big "empty" chunks for you.
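A minimal Python sketch of the sparse-file idea, assuming a fixed 512-byte header that holds a count plus an (offset, length) pair per subset, zlib as a stand-in compressor, and a caller-chosen slot_size as the "guaranteed" upper bound for one compressed subset. Seeking past the end of the file and writing leaves a hole; on Linux filesystems such as ext4 that hole is typically left unallocated, but on NTFS the file has to be marked sparse first (for example with fsutil sparse setflag), otherwise the gap is allocated and zero-filled.

```python
import struct
import zlib

HEADER_SIZE = 512
COUNT = struct.Struct("<I")   # number of subsets
ENTRY = struct.Struct("<QQ")  # (offset, compressed length) per subset
MAX_SUBSETS = (HEADER_SIZE - COUNT.size) // ENTRY.size  # 31 with these sizes


def create(path):
    """Start the container with an empty, zeroed header."""
    with open(path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)


def write_subset(path, index, payload, slot_size):
    """Compress one subset and write it into its own reserved slot.

    Each writer opens its own handle, so writers never share a file
    position, and slots never overlap, so the data needs no locking.
    """
    data = zlib.compress(payload)
    if len(data) > slot_size:
        raise ValueError("subset does not fit its slot; choose a larger slot_size")
    offset = HEADER_SIZE + index * slot_size
    with open(path, "r+b") as f:
        f.seek(offset)  # seeking past EOF leaves a hole (sparse if the FS allows)
        f.write(data)
    return offset, len(data)


def write_header(path, entries):
    """Record where each subset landed, once all writers have finished."""
    if len(entries) > MAX_SUBSETS:
        raise ValueError("too many subsets for the header")
    header = COUNT.pack(len(entries)) + b"".join(ENTRY.pack(o, n) for o, n in entries)
    with open(path, "r+b") as f:
        f.write(header.ljust(HEADER_SIZE, b"\0"))


def read_subset(path, index):
    """Look up a subset's offset and length in the header and decompress it."""
    with open(path, "rb") as f:
        (count,) = COUNT.unpack(f.read(COUNT.size))
        if not 0 <= index < count:
            raise IndexError(index)
        f.seek(COUNT.size + index * ENTRY.size)
        offset, length = ENTRY.unpack(f.read(ENTRY.size))
        f.seek(offset)
        return zlib.decompress(f.read(length))
```

Each worker thread calls write_subset with its own index; a single write_header call at the end publishes the offsets.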
The cooler solution is to write out each subset as a separate file and then use low-level filesystem functions to join the files by chaining the first block of the next file to the last block of the previous file (along with deleting the directory entries for all but the first file).

Is it possible to extract tiff files from PDFs without external libraries?

I was able to use Ned Batchelder's Python code, which I converted to C++, to extract JPGs from PDF files. I'm wondering if the same technique can be used to extract TIFF files, and if so, does anyone know the appropriate offsets and markers to find them?
Thanks,
David
PDF files may contain different kinds of image data (not surprisingly).
The most common cases are:
Fax data (CCITT Group 3 and 4)
raw raster data with decoding parameters and an optional palette, all compressed with Deflate or LZW
JPEG data
Recently I (as the developer of a PDF library) have started noticing more and more PDFs with JBIG2 image data. JPEG 2000 data can also sometimes be put into a PDF.
I should say that you can probably extract JPEG/JBIG2/JPEG 2000 data into corresponding *.jpg / *.jb2 / *.jp2 files without external libraries, but be prepared for all kinds of weird PDFs emitted by broken generators. Also, PDFs quite often use object streams, so you'll need to implement a sophisticated PDF parser.
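The marker-scanning technique the question refers to can be sketched in Python roughly as follows. This is a heuristic in the same spirit, not Ned Batchelder's exact code: it assumes the file is not encrypted and that each JPEG stream carries only DCTDecode (a stream filtered with [/FlateDecode /DCTDecode], for instance, will not start with FF D8).

```python
def extract_jpegs(pdf):
    """Return the bytes of every JPEG stream found by marker scanning.

    Heuristic: a JPEG stream starts with FF D8 shortly after the `stream`
    keyword and ends with FF D9 shortly before `endstream`.
    """
    jpegs = []
    pos = 0
    while True:
        s = pdf.find(b"stream", pos)
        if s < 0:
            break
        start = pdf.find(b"\xff\xd8", s, s + 20)  # SOI right after the keyword
        if start < 0:
            pos = s + len(b"stream")
            continue
        end_kw = pdf.find(b"endstream", start)
        if end_kw < 0:
            break
        # EOI should sit just before endstream; search only that short window.
        end = pdf.rfind(b"\xff\xd9", max(start, end_kw - 20), end_kw)
        if end >= 0:
            jpegs.append(pdf[start:end + 2])
        pos = end_kw + len(b"endstream")
    return jpegs
```

Each returned byte string can be written straight to a .jpg file.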
Fax data (i.e. what you probably call TIFF) must at least be wrapped in a valid TIFF file. You can borrow some code for that from the open-source libtiff, for example.
And then comes raw raster data. I don't think it makes sense to try to extract such data without the help of a library. You could do it, of course, but it would take months of work.
So, if you are trying to extract only a specific kind of image data from a set of PDFs all created with the same generator, then your task is probably feasible. In all other cases I would recommend saving time, money and hair by using a library for the task.
PDF files store JPEGs as actual JPEG data (DCT and JPX encoding), so in most cases you can rip the data out. With TIFFs, you are looking for CCITT data (but you will need to add a header to the data to make it a TIFF). I wrote two blog articles on images in PDF files at http://www.jpedal.org/PDFblog/2010/09/understanding-the-pdf-file-format-images/ and http://www.jpedal.org/PDFblog/2011/07/extract-raw-jpeg-images-from-a-pdf-file/ which might help.

How to optimize PDF file size?

I have an input PDF file (usually, but not always, generated by pdfTeX) that I want to convert to an output PDF that is visually equivalent (at any resolution), has the same metadata (Unicode text info, hyperlinks, outlines etc.), and is as small as possible.
I know about the following methods:
java -cp Multivalent.jar tool.pdf.Compress input.pdf (from http://multivalent.sourceforge.net/). This recompresses all streams, removes unused objects, unifies equivalent objects, compresses whitespace, removes default values, compresses the cross-reference table.
Recompressing suitable images with jbig2 and PNGOUT.
Re-encoding Type1 fonts as CFF fonts.
Unifying equivalent images.
Unifying subsets of the same font to a bigger subset.
Removing fillable forms.
When distilling or otherwise converting (e.g. gs -sDEVICE=pdfwrite), make sure it doesn't degrade image quality, and doesn't increase (!) the image sizes.
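For the distilling step, one way to guard against both degradation and growth is to switch off Ghostscript's image downsampling and keep the result only if it actually came out smaller. A Python sketch, assuming gs is on the PATH; the helper names are made up, and you should still check that links and outlines survive the round trip:

```python
import os
import shutil
import subprocess


def gs_pdfwrite_command(src, dst):
    """Ghostscript pdfwrite re-distill with image downsampling switched off."""
    return [
        "gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dDownsampleColorImages=false",
        "-dDownsampleGrayImages=false",
        "-dDownsampleMonoImages=false",
        f"-sOutputFile={dst}", src,
    ]


def keep_if_smaller(src, candidate):
    """Return whichever file is smaller, deleting the candidate if it lost."""
    if os.path.getsize(candidate) < os.path.getsize(src):
        return candidate
    os.remove(candidate)
    return src


def redistill(src, dst):
    """Run gs on src, writing dst, and return the path of the smaller PDF."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(gs_pdfwrite_command(src, dst), check=True)
    return keep_if_smaller(src, dst)
```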
I know about the following techniques, but they don't apply in my case, since I already have a PDF:
Use smaller and/or fewer fonts.
Use vector images instead of bitmap images.
Do you have any other ideas how to optimize PDF?
Optimize PDF Files
Avoid Refried Graphics
For graphics that must be inserted as bitmaps, prepare them for maximum compressibility and minimum dimensions. Use the best quality images that you can at the output resolution of the PDF. Inserting compressed JPEGs into PDFs and Distilling them may recompress JPEGs, which can create noticeable artifacts. Use black and white images and text instead of color images to allow the use of the newer JBIG2 standard that excels in monochromatic compression. Be sure to turn off thumbnails when saving PDFs for the Web.
Use Vector Graphics
Use vector-based graphics wherever possible for images that would normally be made into GIFs. Vector images scale perfectly, look marvelous, and their mathematical formulas usually take up less space than bitmapped graphics that describe every pixel (although there are some cases where bitmap graphics are actually smaller than vector graphics). You can also compress vector image data using ZIP compression, which is built into the PDF format. Acrobat Reader versions 5 and 6 also support the SVG standard.
Minimize Fonts
How you use fonts, especially in smaller PDFs, can have a significant impact on file size. Minimize the number of fonts you use in your documents to minimize their impact on file size. Each additional fully embedded font can easily add 40 KB to the file size, which is why most authors create "subsetted" fonts that only include the glyphs actually used.
Flatten Fat Forms
Acrobat forms can take up a lot of space in your PDFs. New in Acrobat 8 Pro, you can flatten form fields in the Advanced -> PDF Optimizer -> Discard Objects dialog. Flattening forms makes form fields unusable and form data is merged with the page. You can also use PDF Enhancer from Apago to reduce forms by 50% by removing information present in the file but never actually used. You can also combine a refried PDF with the old form pages to create a hybrid PDF in Acrobat (see the "Refried PDF" section of the article).
see article
Since PDF 1.5 there are two additional compression methods: object streams and cross-reference streams.
You mention that the Multivalent.jar compress tool compresses the cross-reference table. This usually means the cross-reference table is converted into a stream and then compressed.
The format of this cross-reference stream is not fixed. You can change the byte widths of its three "columns" of data (the /W entry). It's also possible to pre-process the stream data with a predictor function, which can improve the compression of the data. If you look inside the PDF with a text editor, you might be able to find a /Predictor entry (inside /DecodeParms) in the cross-reference stream dictionary, which tells you whether the tool you're using takes advantage of this feature.
A predictor can be handy when compressing images, too.
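To see why the predictor matters, here is a small Python sketch of the PNG "Up" predictor (/Predictor 12 in /DecodeParms) applied to type-1 cross-reference entries with /W [1 4 2], so /Columns is 7. Steadily growing offsets turn into nearly identical rows after filtering, which Deflate then compresses far better. The row layout and helper names are illustrative:

```python
import zlib


def xref_rows(offsets, w=(1, 4, 2)):
    """Type-1 xref entries (in-use object, generation 0) packed with widths /W."""
    rows = []
    for off in offsets:
        fields = (1, off, 0)
        rows.append(b"".join(v.to_bytes(n, "big") for v, n in zip(fields, w)))
    return rows


def png_up_encode(rows):
    """PNG 'Up' filter: prefix each row with filter type 2 and store the
    byte-wise difference from the row above (the first row uses zeros)."""
    if not rows:
        return b""
    prev = bytes(len(rows[0]))
    out = bytearray()
    for row in rows:
        out.append(2)
        out += bytes((a - b) & 0xFF for a, b in zip(row, prev))
        prev = row
    return bytes(out)


def png_up_decode(data, columns):
    """Inverse of png_up_encode for rows of `columns` bytes."""
    stride = columns + 1
    if len(data) % stride:
        raise ValueError("data length is not a whole number of rows")
    rows, prev = [], bytes(columns)
    for i in range(0, len(data), stride):
        if data[i] != 2:
            raise ValueError("this sketch only handles the Up filter (type 2)")
        row = bytes((a + b) & 0xFF for a, b in zip(data[i + 1:i + stride], prev))
        rows.append(row)
        prev = row
    return rows
```

Comparing len(zlib.compress(png_up_encode(rows))) with len(zlib.compress(b"".join(rows))) on your own offsets shows how much the predictor buys.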
The second type of compression offered is the use of object streams.
Often a PDF contains many similar objects. These can now be combined into a single object stream and compressed together. The documentation for the Multivalent Compress tool mentions that object streams are used, but doesn't give many details on how it chooses which objects to group together. Compression will be better if you group similar objects together in the same object stream.
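A minimal Python sketch of how an object stream is laid out: the decoded stream starts with N pairs of "object number, offset" integers, /First gives the byte offset of the first object, and the objects follow. The function name and input format are made up for illustration; the caller still has to wrap the result in "n 0 obj ... endobj" and point type-2 entries of the cross-reference stream at it.

```python
import zlib


def build_object_stream(objects):
    """Pack (object number, object body) pairs into one compressed /ObjStm.

    Only non-stream objects with generation 0 may be stored this way.
    Order matters for compression: put similar objects next to each other.
    """
    pairs, body = [], bytearray()
    for num, obj in objects:
        pairs.append(f"{num} {len(body)}")  # offset is relative to /First
        body += obj + b"\n"
    header = (" ".join(pairs) + "\n").encode("ascii")
    data = zlib.compress(header + bytes(body))
    d = (f"<< /Type /ObjStm /N {len(objects)} /First {len(header)} "
         f"/Filter /FlateDecode /Length {len(data)} >>").encode("ascii")
    return d + b"\nstream\n" + data + b"\nendstream"
```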