Retrieve sparse file from tar created without -S flag

I created a backup of a file, compressed it, and stored it in a tar archive.
At the time I didn't know it was a sparse file, so I didn't use the -S flag.
Now I am trying to retrieve the data, but when I extract the archive I get a non-sparse file.
Is there a way to get the sparseness back, or is it lost for good?
Thanks in advance.

Sparsity information is essentially redundant: you can determine which parts of a file should be sparse by checking which parts contain only zeros.
head -c $(( 1024 * 1024 )) /dev/urandom > foo
head -c $(( 1024 * 1024 )) /dev/zero >> foo
head -c $(( 1024 * 1024 )) /dev/urandom >> foo
stat foo
Size: 3145728 Blocks: 6144
fallocate --dig-holes foo
stat foo
Size: 3145728 Blocks: 4096
As you can see from the block count, making the file sparse was successful: all blocks that contained only zeros have been deallocated.
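Applied to your case, extract the file as usual and then dig the holes back in (a minimal sketch, assuming GNU tar and util-linux fallocate; the archive and file names are illustrative):
tar -xf backup.tar myfile
fallocate --dig-holes myfile
stat myfile    # block count drops if zero-filled regions were found
Alternatively, GNU cp can produce a sparse copy:
cp --sparse=always myfile myfile.sparse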


How to use -ts or -tr flags in gdalwarp for resampling

I run a complex C++ application in which GDAL (3.5.3) is used to get elevation data from EU-DEM v1.1. The EU-DEM files are quite large and I want to reduce their file size, since I don't need an accuracy of 25 m. This is one file I use and want to downsize.
```
gdalinfo E20N20.TIF
Driver: GTiff/GeoTIFF
Files: E20N20.TIF
Size is 40000, 40000
Coordinate System is:
PROJCS["ETRS89_ETRS_LAEA",
GEOGCS["ETRS89",
DATUM["European_Terrestrial_Reference_System_1989",
SPHEROID["GRS 1980",6378137,298.2572221010042,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6258"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4258"]],
PROJECTION["Lambert_Azimuthal_Equal_Area"],
PARAMETER["latitude_of_center",52],
PARAMETER["longitude_of_center",10],
PARAMETER["false_easting",4321000],
PARAMETER["false_northing",3210000],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]]]
Origin = (2000000.000000000000000,3000000.000000000000000)
Pixel Size = (25.000000000000000,-25.000000000000000)
Metadata:
  AREA_OR_POINT=Area
  DataType=Elevation
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 2000000.000, 3000000.000) ( 20d45'24.21"W, 45d41'42.74"N)
Lower Left ( 2000000.000, 2000000.000) ( 16d36'13.25"W, 37d23'21.20"N)
Upper Right ( 3000000.000, 3000000.000) ( 8d 7'39.52"W, 48d38'23.47"N)
Lower Right ( 3000000.000, 2000000.000) ( 5d28'46.07"W, 39d52'33.70"N)
Center ( 2500000.000, 2500000.000) ( 12d41'50.60"W, 43d 6' 4.82"N)
Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
  NoData Value=-3.4028234663852886e+38
  Metadata:
    BandName=Band_1
    RepresentationType=ATHEMATIC
```
I tried the following command, but without success. It's probably a matter of wrong -ts or -tr values:
```
gdalwarp -r average -tr 1024 1024 -wm 4096 -multi -wo NUM_THREADS=ALL_CPUS -co TILED=YES \
-co NUM_THREADS=ALL_CPUS E20N20.TIF dz_E20N20.TIF
Creating output file that is 977P x 977L.
Processing E20N20.TIF [1/1] : 0Using internal nodata values (e.g. -3.40282e+38) for image E20N20.TIF.
Copying nodata values from source E20N20.TIF to destination dz_E20N20.TIF.
ERROR 1: Integer overflow : nSrcXSize=19989, nSrcYSize=19989
ERROR 1: Integer overflow : nSrcXSize=20012, nSrcYSize=19989
```
I'm lost here.
Where is your GDAL coming from? This is a 32-bit overflow error, presumably because your data exceeds 4 GB.
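To see which GDAL build and version you are running, a quick check:
gdalinfo --version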
Also, unless you are reprojecting, I strongly advise you to use gdal_translate instead which has much better performance when resampling:
gdal_translate -r average -tr 1024 1024 -co TILED=YES \
-co NUM_THREADS=ALL_CPUS E20N20.TIF dz_E20N20.TIF
You can probably use the official Docker images if you need another GDAL build.
Thank you for your time. I found the problem: it was -wm 4096. I use the Windows Subsystem for Linux to do the resampling. I had used 1024, a power of 2, because I didn't really understand how -ts and -tr work. I finally used:
gdalwarp -r average -ts 8000 8000 -multi -wo NUM_THREADS=ALL_CPUS -co TILED=YES \
-co NUM_THREADS=ALL_CPUS E20N20.TIF dz_E20N20.TIF
This reduces my file size by about 75%, and the pixel dimensions go from 25x25 m to 125x125 m.
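For the record, -ts sets the output size in pixels, while -tr sets the target resolution in georeferenced units (metres for this LAEA projection). So for this 40000x40000 raster at 25 m, -ts 8000 8000 is equivalent to (same file names as above):
gdalwarp -r average -tr 125 125 -multi -wo NUM_THREADS=ALL_CPUS -co TILED=YES \
-co NUM_THREADS=ALL_CPUS E20N20.TIF dz_E20N20.TIF
The original attempt with -tr 1024 1024 therefore asked for 1024 m pixels, which is why GDAL announced a 977x977 output.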

Can anybody tell me the unit of 262144 in this command (sysctl -w vm.max_map_count=262144)?

I want to know the unit of vm.max_map_count=262144, e.g. whether it is in KB, MB, or GB.

Batch PDF Watermark [PDF -> JPG -> PDF]

I am working with thousands of PDF files for a sheet-music publisher.
Each of these PDF files needs a preview PDF. A watermark added as a PDF layer can easily be removed, so I am asking for a permanent way to watermark our PDFs in a batch operation:
PDF -> Apply Watermark -> JPG -> Back to PDF
How can I do this? Is there a good tool for these operations?
The free route
ImageMagick can do the complete process for you, especially with the composite command's -dissolve and -watermark operators.
#!/bin/sh
# ImageMagick picks the conversion formats based on the filename suffixes
InputPDF=$1
WatermarkImg=$2
OutputPDF=$3
pdfToImage=pdfToImage.png
imageWithWatermark=imageWithWatermark.png
# Convert PDF to image
convert \
-density 300 \
-trim \
"$InputPDF" \
-quality 100 \
-flatten \
-sharpen 0x1.0 \
$pdfToImage
# Add watermark to intermediate image
composite \
-dissolve 15 \
-tile \
"$WatermarkImg" \
$pdfToImage \
$imageWithWatermark
# Convert intermediate image back to PDF
convert \
$imageWithWatermark \
"$OutputPDF"
# Clean up
rm $pdfToImage $imageWithWatermark
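The script takes the input PDF, the watermark image, and the output PDF as positional arguments, so batching is a simple loop (a sketch, assuming the script above is saved as watermark.sh; file names are illustrative):
sh watermark.sh input.pdf watermark.png output.pdf

# batch: create a preview for every PDF in the current directory
for f in *.pdf; do
    sh watermark.sh "$f" watermark.png "preview_$f"
done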
I find the PDF to image conversion acceptable in terms of quality, though you can see some differences when looking at the before and after side-by-side, especially in how bolded glyphs seem less bold:
You can check this good post and its answers for a number of options for converting a PDF to an image: Convert PDF to image with high resolution.
I checked out pdftoppm, which was also mentioned frequently in that thread, and I still see some degradation of the bold fonts when converted:
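For reference, a minimal pdftoppm invocation at 300 dpi looks like this (a sketch; the output prefix "page" is illustrative):
pdftoppm -r 300 -png input.pdf page
This writes one PNG per PDF page (page-1.png, page-2.png, ...).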
Some more tiling Magick
I used this copyright symbol from Wikimedia Commons and this ImageMagick script:
#!/bin/sh
Infile="Copyright.png"
Outfile="Copyright_tiled.png"
h2=$(convert $Infile -format "%[fx:round(h/2)]" info:)
convert $Infile \
\( -clone 0 -roll +0+"$h2" \) \
+append \
-write mpr:sometile \
+delete \
-size 1224x1584 \
tile:mpr:sometile \
$Outfile
to create this staggered tiling (1224x1584 is the 8.5 in x 11 in page size multiplied by 72 px/in, times 2 for a good density of tiles):
And here it is unwatermarked again
@ZachYoung I used some different ImageMagick, also scriptable; the point is:
Although "What's done cannot be undone" (Macbeth, Act 5, Scene 1) is very true, especially within a PDF or image, we also know and expect it to apply to any PDF (de)construction. So, depending on the value of a forgery, it will always be worth engineering a partially reversed copy, fit for scrutiny or use; like the watermarked copy, it will still not be the original, but all the same it may look almost as good.
The idiom implies: don't bother yourself about it, as it's best not done in the first place.
The nearest to best is to use a watermark exactly the same as the text outlines, like this:

Increasing disk space for VM

As you can see in the command output below, I have assigned a total of 500 GB of disk space to my VM, but only 14.4 GB is actually available, and once it is used up completely I get an error that there isn't enough space. How do I extend the space for /dev/mapper/centos-root?
I am using VMware ESXi and using centOS for this VM.
[root@localhost Apr]# fdisk -l
Disk /dev/sda: 536.9 GB, 536870912000 bytes, 1048576000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00064efd
Device     Boot    Start       End    Blocks  Id  System
/dev/sda1  *        2048   2099199   1048576  83  Linux
/dev/sda2        2099200  33554431  15727616  8e  Linux LVM
Disk /dev/mapper/centos-root: 14.4 GB, 14382268416 bytes, 28090368 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-swap: 1719 MB, 1719664640 bytes, 3358720 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 byte
Execute the steps below to increase your Linux disk after adding space in VMware:
Step 1 - update partition table
fdisk /dev/sda
Press p to print the partition table to identify the number of partitions.
Press n to create a new primary partition.
Press p for primary.
Press 3 for the partition number, depending on the output of the partition table print.
Press Enter two times.
Press t to change the partition's system ID.
Press 3 to select the newly created partition.
Type 8e to change the Hex Code of the partition for Linux LVM.
Press w to write the changes to the partition table.
Step 2 - Restart the virtual machine.
Step 3 - verify that the changes were saved
fdisk -l
Step 4 - convert the new partition to a physical volume
pvcreate /dev/sda3
Step 5 - add the new physical volume to the volume group (centos is the VG name here; if yours differs, use it in place of centos)
vgextend centos /dev/sda3
Step 6 - extend the logical volume (500G is the size you want to add; if yours differs, use the right size in place of 500G)
lvextend -L+500G /dev/mapper/centos-root
Step 7 - grow the ext filesystem online (if the root filesystem is XFS, the CentOS 7 default, use xfs_growfs instead)
resize2fs /dev/mapper/centos-root
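To verify the new sizes afterwards, a quick check:
df -h /
pvs
vgs
lvs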
To extend the disk without a reboot, first make the kernel rescan the device:
echo 1 > /sys/block/sda/device/rescan
echo 1 > /sys/block/sdb/device/rescan
echo 1 > /sys/block/nvme0n1/device/rescan_controller
partprobe
Fix the GPT warning with gdisk if one appears, then change the partition size with parted:
# parted can be run non-interactively, but this is very dangerous
parted -s /dev/sdb "resizepart 2 -1" quit
parted -s /dev/sdb "resizepart 3 100%" quit
Finally grow the physical volume, the logical volume, and the filesystem (device and VG names here follow this answer's setup):
pvresize /dev/sda3
lvextend -l +100%FREE cs/root
xfs_growfs /dev/cs/root

Reducing PDF file size using Ghostscript on Linux didn't work

I have about 50-60 PDF files (images) of about 1.5 MB each. I don't want such large PDF files in my thesis, as that would make downloading, reading and printing a pain in the rear. So I tried using Ghostscript to do the following:
gs \
-dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS="/screen" \
-sOutputFile=output.pdf \
L_2lambda_max_1wl_E0_1_zg.pdf
However, my 1.4 MB PDF is now 1.5 MB.
What did I do wrong? Is there some way I can check the resolution of the PDF file? I just need 300 dpi images. Would anyone suggest using convert to change the resolution, or is there some way I could reduce the image resolution with gs? The image is very grainy when I use convert.
How I use convert:
convert \
-units PixelsPerInch \
~/Desktop/L_2lambda_max_1wl_E0_1_zg.pdf \
-density 600 \
~/Desktop/output.pdf
Example File
http://dl.dropbox.com/u/13223318/L_2lambda_max_1wl_E0_1_zg.pdf
If you run Ghostscript with -dPDFSETTINGS=/screen, this is just a sort of shortcut. In fact you implicitly get a whole bunch of settings applied, which you can query with the following command:
gs \
-dNODISPLAY \
-c ".distillersettings {exch ==only ( ) print ===} forall quit" \
| grep '/screen'
On my Ghostscript (v9.06prerelease) I get the following output (slightly edited to increase readability):
/screen
<< /DoThumbnails false
/MonoImageResolution 300
/ColorImageDownsampleType /Average
/PreserveEPSInfo false
/ColorConversionStrategy /sRGB
/GrayImageDownsampleType /Average
/EmbedAllFonts true
/CannotEmbedFontPolicy /Warning
/PreserveOPIComments false
/GrayImageResolution 72
/GrayACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2]
>>
/ColorImageResolution 72
/PreserveOverprintSettings false
/CreateJobTicket false
/AutoRotatePages /PageByPage
/MonoImageDownsampleType /Average
/NeverEmbed [/Courier
/Courier-Bold
/Courier-Oblique
/Courier-BoldOblique
/Helvetica
/Helvetica-Bold
/Helvetica-Oblique
/Helvetica-BoldOblique
/Times-Roman
/Times-Bold
/Times-Italic
/Times-BoldItalic
/Symbol
/ZapfDingbats]
/ColorACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2] >>
/CompatibilityLevel 1.3
/UCRandBGInfo /Remove
>>
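If you only care about one of the predefined settings, you can index the dictionary directly instead of grepping the whole dump (the same PostScript trick as above; this assumes your Ghostscript exposes .distillersettings in the same way):
gs \
-dNODISPLAY \
-c ".distillersettings /screen get {exch ==only ( ) print ===} forall quit"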
I'm wondering if your PDFs are image-heavy, and if this sort of conversion does unwelcome things (e.g. re-sampling images with the 'wrong' parameters) which increase the file size...
If this is the case (image-heavy PDF), tell so, and I'll update this answer with a few suggestions....
Update
I had a look at the sample file provided by DNA. Interesting...
No, it does not contain any image.
Instead, it contains one large stream (compressed using /FlateDecode) which consists of roughly 700,000+ (!!) operations, mostly single vector operations in PDF language, such as:
m (moveto),
l (lineto),
d (setdash),
w (setlinewidth),
S (stroke),
s (closepath and stroke),
W* (eoclip),
rg and RG (setrgbcolor)
and a few more.
(That PDF code is written very inefficiently AFAICS (but does its job), because it concatenates many short strokes instead of drawing long ones, nearly every stroke sets the color again (even if it is the same as before), and there is all the other per-stroke overhead (start stroke, end stroke, ...).)
Ghostscript's -dPDFSETTINGS=/screen does not have any effect here (there are no images to downsample, for example). The increased file size (+48 kByte, to be precise) is probably due to Ghostscript re-organizing some of the internal stroking etc. commands into a different order when it interprets the file.
So there is not much you can do about the PDF file size ...
...unless you convert each of these PDF pages into a REAL image such as PNG:
gs \
-o out72.png \
-sDEVICE=pngalpha \
L_2lambda_max_1wl_E0_1_zg.pdf
(I used the pngalpha output device to get a transparent background.) The image dimensions of out72.png are 259x213 px, and the file size is now 70 kByte. But I'm sure you'll not like the quality :-)
The output quality is 'bad' because Ghostscript uses a default resolution of 72 dpi.
Since you said you'd like to have 300dpi, the command becomes this:
gs \
-o out300.png \
-sDEVICE=pngalpha \
-r300 \
L_2lambda_max_1wl_E0_1_zg.pdf
The file size is now 750 kByte, and the image dimensions are 1080x889 pixels.
Update 2
Since Curiosity is en vogue these days... :-) ...I tried to bring down the file size with the help of Adobe Acrobat X Pro on Mac.
You wanna know the results?
Performing a 'Save as... (PDF with reduced file size)' -- which in the past always yielded very good results for me! -- created a 1.8+ MByte file (+29%). I guess this definitely puts Ghostscript's performance (file size increase of +3%) into a realistic perspective!
DNA decided to go for grayscale PNGs. The way he's creating them is in two steps:
Step 1: Convert a color PDF page (such as this) to a grayscale PDF page, using Ghostscript's pdfwrite device and the settings
-dColorConversionStrategy=/Gray and
-dProcessColorModel=/DeviceGray.
Step 2: Convert the grayscale PDF page to a PNG, using Ghostscript's pngalpha device at a resolution of 300 dpi (-r300 on the GS commandline).
This reduces his initial file size of 1.4 MB to 0.7 MB.
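As a concrete sketch, that two-step workflow looks like this (the flags are the ones quoted above; input and output file names are illustrative):
# Step 1: color PDF -> grayscale PDF
gs \
-o gray.pdf \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/Gray \
-dProcessColorModel=/DeviceGray \
input.pdf
# Step 2: grayscale PDF -> 300 dpi PNG with transparent background
gs \
-o gray-300.png \
-sDEVICE=pngalpha \
-r300 \
gray.pdf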
But this workflow has the following disadvantage:
It loses all color info, without saving much disk space compared to a color output written at the same resolution directly from the original PDF!
There are 2 alternatives to DNA's workflow:
A one-step conversion of (color) PDF -> (color) PNG, using Ghostscript's pngalpha device with the original PDF as input (same setting of 300 dpi resolution). This has one advantage:
It would keep the color information in the PNG output, requiring only a little more space on disk!
A one-step conversion of (color) PDF -> grayscale PNG, using Ghostscript's pnggray device with the original PDF as input (same setting of 300 dpi resolution), with this mix of advantages and disadvantages:
It would lose the color information in the PNG output.
It would lose the transparent background that was preserved in DNA's workflow.
It would save lots of disk space, because the file size would go down to about 20% of the output from DNA's workflow.
So that you can make up your own mind and see the output sizes and quality side by side, here is a shell script that demonstrates the differences:
#!/bin/bash
#
# Copyright (c) 2012 <kurt.pfeifle@gmail.com>
# License: Creative Commons (CC BY-SA 3.0)
function echo_do() {
    echo
    echo "Command: ${*}"
    echo "--------"
    echo
    # run the command given as arguments ("${@}", not "${#}", which is the argument count)
    "${@}"
}
[ -d out ] || mkdir out
echo
echo " We assume all PDF pages are 1-page PDFs!"
echo " (otherwise we'd have to include something like '%03d'"
echo " into the output filenames in order to get paged output)"
echo
echo "
# Convert color PDF to grayscale PDF.
# If the PDF has a transparent background (most do),
# it will remain transparent in the output.
# ATTENTION: since we don't use a resolution,
# pdfwrite will use its default value of -r720.
# (However, this setting only affects raster objects...)
"
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pdfwrite-devicegray-gs.pdf" \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
"${i}"
done
echo '
# Convert (previously generated) grayscale PDF to PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in out/*pdfwrite-devicegray*.pdf
do
echo_do gs \
-o "out/$(basename "${i}")---pngalpha-from-pdfwrite-devicegray-gs.png" \
-sDEVICE=pngalpha \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to grayscale PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
# Following only required for 'pdfwrite' output device, not for 'pngalpha'!
# -dProcessColorModel=/DeviceGray
echo_do gs \
-o "out/${i}---pngalphagray_gs.png" \
-sDEVICE=pngalpha \
-dColorConversionStrategy=/Gray \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to (color) PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pngalphacolor_gs.png" \
-sDEVICE=pngalpha \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to grayscale PNG
# (no Alpha channel here, therefore [mostly] white backgrounds)
'
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pnggray_gs.png" \
-sDEVICE=pnggray \
-r300 \
"${i}"
done
echo " All output to be found in ./out/ ..."
echo
Run this script and compare the different outputs side by side.
Yes, the 'direct-grayscale-PNG-from-color-PDF-using-pnggray-device' one may look a bit worse (and it doesn't sport the transparent background) than the other one -- but it is also only 20% of its file size. On the other hand, if you want to buy a bit more quality by sacrificing a bit of disk space, you could use -r400 instead of -r300...