I am looking for the best SVG to NSBezierPath parser. It's OK if the class/library only takes a string of SVG path commands. Here is an example:
M 435.722 403.542 h -232.44 v -293 h 232.44 c 0 0 0 35.108 0 81.019 c 0.049 2.551 0.079 5.135 0.154 7.748 c -0.208 15.567 12.1 13.618 19.624 28.192 c 2.584 5.005 5.875 30.5 4.875 34.5 c -7 19 -28.707 22.875 -24.653 44.854 c 0 2.667 0 5.31 0 7.923 C 435.722 364.425 435.722 403.542 435.722 403.542 z
The class/library doesn't necessarily have to parse the .svg file itself, but it should be able to handle all SVG commands and support relative as well as absolute coordinates (in other words, it should be fully compatible with the SVG 1.1 specs found here).
I have found some classes on the web, but they were all limited and failed on my SVG commands above. Which is the best SVG to NSBezierPath parser around these days?
Write a program to parse the SVG file using an XML parser.
This will give you your array of paths which you can feed one by one into the following code:
https://www.touchnoc.com/one_way_ticket_from_svg_to_nsbezierpath
Note: make sure to look for the iOS code on the above page.
However, the above code only works for absolute path coordinates, not relative ones, and Adobe Illustrator outputs only relative path coordinates. You can open your Illustrator SVG in a program called Sketch (http://www.bohemiancoding.com/sketch/) and re-export it; the exported SVG will use absolute path coordinates.
Your paths are probably failing because they use relative coordinates (lowercase c), while most of the scripts I found can only convert absolute coordinates (uppercase C).
And there you go. The whole process is not easy, but it is doable! I am speaking from experience.
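As a minimal sketch of the XML-parsing step (illustrative Python using only the standard library; the function name is mine, not from any of the linked code):

import xml.etree.ElementTree as ET

SVG_NS = '{http://www.w3.org/2000/svg}'

def path_strings(svgfile):
    """Return the `d` attribute of every <path> element in an SVG file."""
    tree = ET.parse(svgfile)
    # iter() walks the whole tree, so nested <path> elements are found too
    return [el.get('d') for el in tree.getroot().iter(SVG_NS + 'path')]

Each returned string is one set of path commands you can feed into the converter.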
I don't think something like that is available.
But you should be able to parse that yourself very easily, since the commands map directly to NSBezierPath methods.
It's also helpful that the command comes first, so you could do the following:
1. split the string into an array by spaces
2. take the first element and map it to a command
3. take the next x elements as parameters (x is known, depending on the command)
4. call the matching method on NSBezierPath
If the string is very long, you can also do the same by parsing it stream-wise.
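For illustration, here is a minimal sketch of that loop (Python rather than Objective-C, just to show the tokenizing logic; in a real implementation each command would map to the corresponding NSBezierPath call such as moveToPoint: or curveToPoint:controlPoint1:controlPoint2:). It assumes whitespace- or comma-separated tokens as in the example above, and ignores SVG 1.1's implicit command repetition; the names are mine:

import re

# Number of parameters each SVG path command consumes (SVG 1.1)
PARAM_COUNTS = {'M': 2, 'L': 2, 'H': 1, 'V': 1, 'C': 6, 'S': 4,
                'Q': 4, 'T': 2, 'A': 7, 'Z': 0}

def tokenize(d):
    # Split on whitespace and commas; both are valid separators in SVG
    tokens = [t for t in re.split(r'[\s,]+', d.strip()) if t]
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        count = PARAM_COUNTS[cmd.upper()]  # lowercase = relative coordinates
        params = [float(p) for p in tokens[i + 1:i + 1 + count]]
        yield cmd, params
        i += 1 + count

for cmd, params in tokenize('M 435.722 403.542 h -232.44 v -293 z'):
    print(cmd, params)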
I received a large number of document files, where each document has its own split archive for each page (e.g. file1.001, file1.002, file2.001, file3.001). These are meant to be TIF files that can easily be combined and converted into PDF documents.
However, some of these files will not convert through ImageMagick. Some can simply be converted using a different program, which works fine, but there are some files where this doesn't work. I tried converting them to .jpg and then to TIF, but they won't convert to .jpg. Things got weird when I converted them to .png, as some of these files would have multiple output files associated with them.
This is hard to explain, but I'll try to give an example: file1.001 and file1.002 both have the same image present on them when converted to TIF and opened. However, when either of the TIF documents is converted to a .png, two .png files are created. One has the original page, but the other has a second page of the document that I could not view previously.
What could be causing this weird behavior, and how can I convert these to pdf more reliably?
I also used Bluebeam Stapler to convert the files, if that helps at all.
Edit:
I've verified I'm on the latest imagemagick release, and I've been using it through PHP to process files. I'm running Windows 10.
Also, here are some example files to play around with. The first TIF actually shows the second page, instead of the page I normally see when I open the file.
Edit 2: Sorry, I thought uploading the image would preserve the file type. Here's a link to some test samples
When I convert your TIFF to PNG, I get two files using IM 7.1.0-10 Q16-HDRI or IM 6.9.12-25 Q16, both on macOS Sierra.
magick -quiet 294944.tif x.png
It produces two PNG files, one per page (output images omitted).
Is this not what you get or expect?
P.S.
What are the other two files, 327924.001 and 327924.002?
If those are some kind of split TIFF, then it does not look like libtiff, which ImageMagick uses to read TIFFs, can handle them. I get errors when attempting to use identify on them.
You definitely have some issue with whatever attempted to write those tiffs.
instrument 294944 page 1 of 2 = G4 199 dpi sheet 2 of 2 294944.tif (25.17 x 17.53 inches)
instrument 294944 page 2 of 2 = G4 199 dpi sheet 1 of 2 294944.tif (24.12 x 17.63 inches)
instrument 327501 page 1 of 1 = UN 72 dpi sheet 1 of 1 327924.001 (124.78 x 93.86 inches)
instrument 327924 page 1 of 2 = G4 400 dpi sheet 1 of 2 327924.002 (23.80 x 17.53 inches)
instrument 327924 page 2 of 2 = G4 400 dpi sheet 2 of 2 327924.002 (23.84 x 17.41 inches)
Two are identified as CCITT Group 4 fax encoding, which is common for TIFFs of this type.
TIFF is a multi-image format, so a multi-page fax can be viewed as one file, or four CMYK printing plates could be sent as one image file, either overlaid as one check print or printed one at a time for quality inking.
The extension .tif (or .tiff) is usually applied to files with one or more pages (even 400+ for a long novel).
Naming such as part001.tif, part002.tif is usually applied to groups of multiple pages, or, for single sequential pages, part1.001.tif, part1.002.tif.
Unfortunately, you have a mix following a convention that seems to indicate the number of pages (002 = 2 pages), but in inconsistent order, so you need to check which convention was used for each file, as there is uncertainty.
Also, the internal instrument number does NOT always match the filename (perhaps a transfer of interest?).
In addition, you have a mix of compression methods and resolutions, so you cannot be sure of the correct scale to apply.
The best way to resolve this issue is to decide how you wish the pages to be regrouped/sequenced, use the correct scale for each page or group of pages, then recombine as desired into a PDF.
For a large number of files, it would help to tabulate the pages by number, scale, size, compression, etc., and then process identical groups before reordering and merging; a sketch of such tabulation follows.
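For example, a hedged Python sketch (not part of the original workflow; assumes ImageMagick 7's magick is on the PATH) that tabulates every page of every file into a CSV for grouping:

import csv
import subprocess
import sys

def tabulate(files, out_csv='pages.csv'):
    # One CSV row per page: file, page index, pixel size, dpi, compression
    with open(out_csv, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(['file', 'page', 'width', 'height',
                         'x_dpi', 'y_dpi', 'compression'])
        for name in files:
            # identify prints one line per page of a multi-page TIFF;
            # %p=page, %w/%h=pixels, %x/%y=resolution, %C=compression
            out = subprocess.run(
                ['magick', 'identify', '-format',
                 '%p,%w,%h,%x,%y,%C\n', name],
                capture_output=True, text=True, check=True).stdout
            for line in out.splitlines():
                writer.writerow([name] + line.split(','))

tabulate(sys.argv[1:])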
I have written some Scilab code which generates a matrix. It is a function that takes a vector of two positive integers and returns a matrix whose dimensions are given by that vector, according to some algorithm. The function also exports the matrix to a figure in LaTeX style, thanks to the prettyprint function.
I would like that figure to be exported to a PDF file, for which I used the function xs2pdf. It almost works. The problem is that, in its intended use, the function generates a matrix of size around 40x40, which never fits on the page. It seems to me that the PDF document created is not even A4.
I didn't include the entire code; all you need to know is that the code generates a matrix named z, and then I have the lines:
//just for this post
z = rand(40,40)
//export to figure
A = prettyprint(z);
clf;
xstring(0,0,A);
//export to PDF
xs2pdf(0, '_path_to_pdf_file');
The matrix z is created here to simulate the matrix that my program actually generates. If you run this code, having filled in the '_path_to_pdf_file' bit, do you get a decent PDF output?
I could reproduce the same problem. Sometimes the PDF output is not even generated, and Scilab returns an error.
One workaround is to make Scilab create a new TeX file and compile it with pdflatex outside Scilab. The good part is that you can run everything from the same Scilab script. Of course, you'll need a LaTeX distribution installed.
r = 40; c = 40;
z = rand(r,c);
A = prettyprint(z) ;
texfile = "\documentclass{standalone}" + ...
"\usepackage{graphics}" + ...
"\usepackage{amsmath}" + ...
"\setcounter{MaxMatrixCols}{"+ string(c) +"}" + ...
"\begin{document}" + ...
A + ...
"\end{document}"
filename = "matrix.tex";
write(filename,texfile) //write() cannot overwrite a file
dos("pdflatex " + filename) //use unix() instead of dos() in case you're not on Windows
I don't know if you have any knowledge of LaTeX, so I should make a few notes:
The output goes to the current Scilab directory. All auxiliary files produced by LaTeX will also be created there.
It uses the standalone class, which crops the PDF output exactly to whatever is described in the .tex file. In this case, only the matrix is printed, with no margins. To use this class, you need the standalone package for LaTeX.
prettyprint() outputs the matrix using the pmatrix environment, which requires the amsmath package, so you need that one installed too.
The line \setcounter{MaxMatrixCols}{c} is needed in case you have a matrix with more than 10 columns.
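To make that concrete, here is roughly what the generated matrix.tex looks like, shown for a small 2x2 matrix (a hedged illustration: the exact string prettyprint produces may differ slightly, and the real file holds the whole 40x40 pmatrix on one line):

\documentclass{standalone}
\usepackage{graphics}
\usepackage{amsmath}
\setcounter{MaxMatrixCols}{2}
\begin{document}
${\begin{pmatrix} 0.21 & 0.76 \\ 0.33 & 0.66 \end{pmatrix}}$
\end{document}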
Here is the output (image of the rendered matrix omitted).
I have been using QGIS to display a map of the long-term precipitation average of the Netherlands. However, when QGIS opens the data, the map is shown upside down.
I noticed that the coordinates are displayed from 0 to 266 (lon) and -315 to 0 (lat). I figured that the latitude is projected upside down.
Instead of -315 to 0 it should be 0 to 315, and then the map should look fine. But I can't figure out how to invert these values.
The file is a NetCDF file. I opened the XML metadata QGIS made for me with EmEditor, and it showed the right coordinates (in lat/lon), so I think it has something to do with the way QGIS sets up the map or the way it converts the lat/lon to meters.
Has anybody encountered the same problem as me? Thank you in advance!
I'm pretty sure you can use the GDAL configuration option GDAL_NETCDF_BOTTOMUP=[YES/NO] to convert from NetCDF to a geotiff, and get the resulting raster correctly oriented north-up. Try using gdal_translate with the above option. See here for some more details.
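For example (a hedged sketch with placeholder file names; multi-variable files may need GDAL's NETCDF:"file.nc":variable subdataset syntax):

gdal_translate --config GDAL_NETCDF_BOTTOMUP NO input.nc output.tif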
Thanks to Micha (see the comments):
I was told to solve the problem using GDAL (Geospatial Data Abstraction Library), a library for inspecting and translating/processing geodata. This was quite hard to understand, as I am relatively new to programming and to powerful tools like GDAL.
To enter GDAL codes I used the OSGeo4W Shell, which comes with QGIS. The command that I used to flip my map was:
gdal_translate -of netCDF -co WRITE_BOTTOMUP=NO my_netcdf.nc output.nc
(see also this short GDAL/netCDF introduction).
In R, you can use the rotate function from the raster package:
library(raster)
library(gdalUtils)

workdir <- "your_working_dir"   # set this to your working directory
setwd(workdir)
ncfname <- "adaptor.mars.internal-1563580591.3629887-31353-13-1b665d79-17ad-44a4-90ec-12c7e371994d.nc"
# get the variables you want
dname <- c("v10","u10")
# open using raster
datasetName <- dname[1]
r <- raster(ncfname, varname = datasetName)
# rotate() shifts x coordinates from 0..360 to -180..180 longitude
r2 <- rotate(r)
writeRaster(r2, "wind.tif", format = "GTiff")
Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and that's barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
To achieve this, you will have to:
1. Split your PDF into individual pages;
2. Convert your PDF pages into SVG;
3. Edit the pages you want;
4. Reassemble the pages.
This answer will omit step 3, since that's not programmable.
Splitting the PDF
If you don't want a programmatic way to split documents, the modern approach would be to use stapler. In your favorite shell:
stapler burst file.pdf
would generate {file_1.pdf, ..., file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2, and the code for splitting a PDF file is not that complex. The following function splits a file and saves the individual pages in the current directory (shamelessly copied from the commands.py file):
import os
from PyPDF2 import PdfFileWriter, PdfFileReader

def split(filename):
    # PyPDF2 needs the file opened in binary mode
    with open(filename, 'rb') as inputfp:
        inputpdf = PdfFileReader(inputfp)
        base, ext = os.path.splitext(os.path.basename(filename))
        # Prefix the output template with zeros so that ordering is preserved
        # (page 10 after page 09)
        output_template = ''.join([
            base,
            '_',
            '%0',
            str(len(str(inputpdf.getNumPages()))),
            'd',
            ext
        ])
        for page in range(inputpdf.getNumPages()):
            outputpdf = PdfFileWriter()
            outputpdf.addPage(inputpdf.getPage(page))
            outputname = output_template % (page + 1)
            with open(outputname, 'wb') as fp:
                outputpdf.write(fp)
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
pdf2svg input.pdf output.svg
If we take a look at the pdf2svg.c file, we can see that the code in principle is not that complex (assuming the input file name is in the filename variable and the output file name is in the outputname variable). A minimal working example in Python follows. It requires the pycairo and pypoppler libraries:
import os
import cairo
import poppler

def convert(inputname, outputname):
    # Convert the input file name to a URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    # We only have one page, since we split prior to converting. Get the page
    page = pdffile.get_page(0)
    # Get the page dimensions
    width, height = page.get_size()
    # Open the SVG file to write on
    surface = cairo.SVGSurface(outputname, width, height)
    context = cairo.Context(surface)
    # Now we finally can render the PDF to SVG
    page.render_for_printing(context)
    context.show_page()
    # Flush and close the SVG file
    surface.finish()
At this point you should have an SVG in which all text has been converted to paths, and will be able to edit with Inkscape without rendering issues.
Combining steps 1 and 2
You can call pdf2svg in a for loop to do that, but you would need to know the number of pages beforehand. The code below figures out the number of pages and does the conversion in a single step. It requires only pycairo and pypoppler:
import os
import cairo
import poppler

def convert(inputname, base=None):
    '''Converts a multi-page PDF to multiple SVG files.

    :param inputname: Name of the PDF to be converted
    :param base: Base name for the SVG files (optional)
    '''
    if base is None:
        base, ext = os.path.splitext(os.path.basename(inputname))
    # Convert the input file name to a URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    pages = pdffile.get_n_pages()
    # Prefix the output template with zeros so that ordering is preserved
    # (page 10 after page 09)
    output_template = ''.join([
        base,
        '_',
        '%0',
        str(len(str(pages))),
        'd',
        '.svg'
    ])
    # Iterate over all pages
    for nthpage in range(pages):
        page = pdffile.get_page(nthpage)
        # Output file name based on template
        outputname = output_template % (nthpage + 1)
        # Get the page dimensions
        width, height = page.get_size()
        # Open the SVG file to write on
        surface = cairo.SVGSurface(outputname, width, height)
        context = cairo.Context(surface)
        # Now we finally can render the PDF to SVG
        page.render_for_printing(context)
        context.show_page()
        # Free some memory
        surface.finish()
Assembling the SVGs into a single PDF
To reassemble, you can use the pair inkscape/stapler to convert the files manually, but it is not hard to write code that does this. The code below uses rsvg and cairo to convert the SVGs and merge everything into a single PDF:
import rsvg
import cairo

def convert_merge(inputfiles, outputname):
    # We have to create a PDF surface and specify a size. The size is
    # irrelevant, though, as we will define the sizes of each page
    # individually.
    outputsurface = cairo.PDFSurface(outputname, 1, 1)
    outputcontext = cairo.Context(outputsurface)
    for inputfile in inputfiles:
        # Open the SVG
        svg = rsvg.Handle(file=inputfile)
        # Set the size of the page itself
        outputsurface.set_size(svg.props.width, svg.props.height)
        # Draw on the PDF
        svg.render_cairo(outputcontext)
        # Finish the page and start a new one
        outputcontext.show_page()
    # Free some memory
    outputsurface.finish()
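For instance (file names hypothetical), the whole pipeline could then be driven like this:

import glob

# Convert old.pdf into old_1.svg ... old_N.svg (one SVG per page)
convert('old.pdf', base='old')
# ... edit the SVGs in Inkscape, then merge them back into one PDF
convert_merge(sorted(glob.glob('old_*.svg')), 'new.pdf')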
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.
I'm afraid that to vectorize the PDFs you would still need the original fonts (or a lot of work).
Some possibilities that come to mind:
dump the uncompressed PDF with pdftk and discover what the font names are, then look for them on FontMonster or another font service.
use some online font recognition service to get a close match with your font, in order to preserve kerning (I guess kerning and alignment are what's making your text unreadable).
try replacing the fonts manually (again with pdftk, converting the PDF to a form that is editable with sed; this editing will break the PDF, but pdftk will then be able to recompress the damaged PDF to a usable one).
Here's what you really want - font substitution. You want some code/app to be able to go through the file and make appropriate changes to the embedded fonts.
This task is doable and is anywhere from easy to non-trivial. It's easy when you have a font that matches the metrics of the font in the file and the encoding used for the font is sane. You could probably do this with iText or DotPdf (the latter is not free beyond the evaluation, and is my company's product). If you modified pdf2ps, you could probably manage changing the fonts on the way through too.
If the fonts used in the file are font subsets that have creative reencoding, then you are in hell and will likely have all manner of pain doing the change. Here's why:
PostScript was designed at a point when there was no Unicode. Adobe used a single byte for characters, and whenever you rendered any string, the glyph to draw was taken from a 256-entry table called the encoding vector. If a standard encoding didn't have what you wanted, you were encouraged to make fonts on the fly, based on the standard font, that differed only in encoding.
When Adobe created Acrobat, they wanted to make the transition from PostScript as easy as possible, so that font mechanism was carried over. When the ability to embed fonts into PDFs was added, it was clear that this would bloat the files, so PDF also included the ability to have font subsets. Font subsets are made by taking an existing font, removing all the glyphs that won't be used, and re-encoding the result into the PDF. There may be no standard relationship between the encoding vector and the code points in the file; all of those may be changed. Instead, there may be an embedded PostScript function, /ToUnicode, which translates encoded characters to a Unicode representation.
So yeah, non-trivial.
For the folks who come after me:
The best solutions I found were to use Evince to print as SVG, or to use the pdf2svg program that's accessible via Synaptic on Mint. However, Inkscape wasn't able to cope with the resulting SVGs; it entered an infinite loop with the error message:
File display/nr-arena-item.cpp line 323 (?): Assertion item->state & NR_ARENA_ITEM_STATE_BBOX failed
I'm giving up this quest for now, but maybe I'll try again in a year or two. In the meantime, maybe one of these solutions will work for you.
I have a EPS file in vector format that I need to convert to PDF, retaining its vector format. I'm using a Windows 7 system, and I'm trying to find a tool that I can redistribute with my application. It can't be GUI or online based; I need my application to use it as a library or via a system call.
I have tried the following tools without success:
ghostscript 9.06 - ps2pdf - Outputs a blank pdf.
ImageMagick - Generates a pdf with the correct image, but it's a raster converter so it does not preserve the vector format.
UniConvertor - Outputs a blank pdf.
pstoedit - Outputs a blank pdf.
Of course, I'm not an expert with any of the tools listed, so it's quite possible I'm just not running them with the correct configuration; if anyone recognizes a blank PDF as a symptom of an incorrectly configured run with one of these tools, please let me know of possible fixes. Thank you for any help.
Here is the header of the eps file:
%!PS-Adobe-2.0 EPSF-1.2
%%Creator:Adobe Illustrator(TM) 1.1
%%For:OPS MANUAL FLOE
%%Title:ILLUS.MAC
%%CreationDate:7/27/87 3:40 PM
%%DocumentProcSets:Adobe_Illustrator_1.1 0 0
%%DocumentSuppliedProcSets:Adobe_Illustrator_1.1 0 0
%%DocumentFonts:Courier
%%+Helvetica
%%BoundingBox:000 -750 650 50
%%TemplateBox:288 -360 288 -360
%%EndComments
%%BeginProcSet:Adobe_Illustrator_1.1 0 0
The BoundingBox says the marks extend from 0,-750 to 650,50.
So almost the entire content (750/800) is below the page. Note that Ghostscript ignores DSC comments; they are, after all, comments.
In order to position this on the page, you must translate the origin and potentially scale the page. Please note that EPS files are intended for inclusion in other documents, not for printing on their own, and it's up to the document manager to read the BoundingBox comments and position the EPS correctly.
In the absence of a document manager, you will have to do this yourself. Note that changing the comments will have no effect at all.
I would suggest you start by prepending the line:
0 750 translate
which will move the origin 750 units vertically, so the content will then extend from 0,0 to 650,800, and see what effect that has.
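If you prefer not to edit the file, a hedged alternative (file names are placeholders) is to inject the translate from the command line with Ghostscript's pdfwrite device, setting a page size to match the shifted content:

gs -sDEVICE=pdfwrite -o output.pdf -dDEVICEWIDTHPOINTS=650 -dDEVICEHEIGHTPOINTS=800 -c "0 750 translate" -f input.eps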