With cucumber and rspec I want to determine if a certain file is updated and/or overwritten correctly. The files are images.
As a human I would simply compare the images visually. RSpec obviously cannot do this.
Comparing the original image, neither scaled nor cropped, is fine.
I have tried to download the image and then compare the contents.
require "open-uri"
# Download the uploaded image, then read the fixture it should match.
url = "http://audrey:3000/#{camping.image.url}"
online_file = URI.open(url).read
local_file = IO.binread(File.join(Rails.root, "spec", "fixtures", image))
There is too much wrong with this approach: getting the absolute URL is neither easy nor portable, fetching over the network is slow, and comparing binaries seems ugly.
How should I compare images?
I have developed a simple app for Mac which uses a browser window to display some content. The assets (images etc.) are visible to anyone who receives the app and reveals its contents in Finder using 'Show Package Contents'.
Is there a way to prevent this? Can I hide or encapsulate the assets somehow, using code or some Xcode feature?
A trivial way would be to change the extension on your files so the system doesn't recognize them as images. You'd then have to load the images as data and convert them to images in code, which would be a bit of a pain.
A more rigorous solution would be to encrypt the images in your app bundle, then write a utility function that loads and decrypts images.
Here's another option.
You can zip all the assets. Use whatever is easiest e.g. pkzip or gzip or even just tar it all. Then you hide a lot of info and, if you want to go the extra step, it is easy to encrypt the zipped file and there are lots of libraries around to include in your project and use to unzip it with.
It should be easy to read assets directly from the zipped file, but if you need them individually you could e.g. put a single file / resource inside a zip or you could unzip it. You could even unzip to temporary space and remove it all when the app quits if you have really sensitive stuff that is too big to fit in memory.
** EDIT **
Java works this way, right? A jar file is just a renamed zip, and it often contains all of the resources the app needs, and it seems to work there. So if that is any guide, performance should not be too bad.
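To illustrate reading an asset directly from the zipped file, here is a minimal sketch. It is written in Python purely for illustration (a Mac app would use an Objective-C or Swift zip library instead), and the archive layout and the member name images/logo.png are made up for the example.

```python
import io
import zipfile


def read_asset(archive, name):
    """Return the raw bytes of one archive member without extracting it to disk."""
    with zipfile.ZipFile(archive) as bundle:
        return bundle.read(name)


def build_demo_archive():
    """Build a small in-memory zip standing in for the bundled assets file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Hypothetical asset; real image bytes would go here.
        bundle.writestr("images/logo.png", b"\x89PNG fake image bytes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    data = read_asset(build_demo_archive(), "images/logo.png")
    print(len(data), "bytes loaded from the archive")
```

The bytes never touch the file system as a separate file, so nothing recognizable shows up when someone browses the package. Zipping alone only hides the assets from casual browsing; it does not protect them.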
I am trying to make a dynamic PDF generator as a .NET Core API. I want to take an existing PDF or .docx file and edit it so that the current name (John Doe) is replaced with a placeholder such as #NAME_PLACEHOLDER.
I then want to transform #NAME_PLACEHOLDER -> John Doe (or whatever is in the KeyValuePair or Dictionary<string, string>).
I am running this in a Docker environment, so I can easily execute commands, and I am willing to do that as well.
So far I have tried a few things:
1) pdf2htmlEX
Executes as pdf2htmlEX file.pdf
Does the job pretty well
Can be converted back to PDF using Google Chrome headless or similar
Problem: Only the characters used in the PDF can be used to replace. So if I only use A, B, C as characters, it will make D into Times New Roman (or default font)
2) LibreOffice ODT to PDF
This was pretty nice, because I could simply unzip the .odt file, open content.xml, search and replace, then save it as an .odt file again
Could be converted into PDF rather easily using soffice --convert-to pdf
LibreOffice is quite nice
Problem 1: Microsoft Word -> Save as ODT tends to break the formatting, so we have to use LibreOffice to go and change it back again
Problem 2: We don't want to move away from Microsoft's Office suite
3) HTML to PDF using Chrome Headless
What you see is what you get
By far the best option, if we're all developers aaand have unlimited time
Problem 1: Only our developers can make changes, since our marketing department do not know HTML
Problem 2: Our existing PDFs would have to be rewritten in HTML
As you can see, I have tried a bunch of things. None of them, except Chrome Headless, has lived up to my expectations. What I really like about #3 is what you see is what you get. I can make the whole thing in HTML, press CTRL+P and see what it looks like as a finished PDF, basically.
I am looking for a better solution, though. It can be paid. It can be free. All I need is to change out words/phrases with other words dynamically, which apparently seems like a tough thing to do.
Thanks for clearly specifying what you've already found. It helps a lot in providing a succinct answer.
The conversion is always tricky - I'm sure you know Word has trouble displaying/editing some Word documents itself.
I have experience regarding point #2 "LibreOffice ODT to PDF" and can suggest a few things to test:
Don't use Microsoft to do the docx->odt conversion. It's not good as you know. Use LibreOffice itself to do this step. The rest of your process remains the same.
For some documents, LibreOffice does doc->odt much better. So, you can instead work with the DOC format and get a better result without any other changes.
You won't be able to remove the devs from the process, but you can certainly reduce their role allowing your business/marketing teams to have more direct input simply by:
get the starting point document to the devs to run through the conversion process. The devs can "clean up" the document to make it convert nicely.
make this version of the document the "official" starting point. The business or technical teams can load it, adjust it, and put it back into the process.
if possible, expose a test-platform to the business teams so they can download, adjust, upload and render to PDF. This cycle means they will be able to achieve more and if they're good, do impressive stuff without any dev input.
the above steps simply mean: don't expect perfect conversion of arbitrarily complex documents. Starting from a working baseline (even a complex one) is great.
Some of that might show you that your #2 is actually going to get the best overall results.
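The search-and-replace step of #2 can be sketched roughly as follows. This is Python for illustration only, not .NET, and the function name fill_odt and the tiny in-memory "document" are made up; a real .odt contains more members than the two shown here. It copies the archive member by member and substitutes placeholders only in content.xml.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_odt(template, output, values):
    """Copy an .odt archive, substituting placeholders in content.xml."""
    with zipfile.ZipFile(template) as src, zipfile.ZipFile(output, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in values.items():
                    # Escape &, < and > so the result is still valid XML.
                    text = text.replace(placeholder, escape(value))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps member order and each
            # member's compression type, so "mimetype" stays first.
            dst.writestr(item, data)


if __name__ == "__main__":
    # Minimal stand-in for a template; not a complete ODT document.
    template = io.BytesIO()
    with zipfile.ZipFile(template, "w") as odt:
        odt.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        odt.writestr("content.xml", "<text:p>Dear #NAME_PLACEHOLDER</text:p>")
    template.seek(0)

    result = io.BytesIO()
    fill_odt(template, result, {"#NAME_PLACEHOLDER": "John Doe"})
    with zipfile.ZipFile(result) as odt:
        print(odt.read("content.xml").decode("utf-8"))
```

The output file can then go through soffice --convert-to pdf as before. One thing to watch for: if a placeholder was typed with mixed formatting, the editor may split it across several XML elements in content.xml, and a plain string replace will then miss it.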
I hope that helps.
I wrote a script which tests several rasterization programs by using the official W3C SVG test suite and comparing each rasterized PNG with the expected PNG pixel by pixel.
The problem is that with the v. 1.1 second edition (2011) and v. Tiny 1.2 (2008) test suites, the rasterized vectors in a lot of images don't match the expected PNG because the revision number is not the same, which produces a lot of false positives (more than 90%), like this one.
However, it's OK with the v. 1.1 first edition test suite.
I could crop the PNG to remove the area with the revision number, but it's a really derpy solution.
So which PNG should I compare the rasterized vectors against?
Thanks.
There is no non-derpy solution to this problem, for a few reasons. The test images from this time were never meant to be ref-tests (that is, they are not pixel-by-pixel matches). Also, some of the tests that appear in the later test suites were not accepted as legitimate tests, so the revision numbers were not updated.
The later SVG 1.1 2nd edition test suite should be considered canonical, but even that contains some revision-mark errors, like coords-trans-06-t.
This is actually an issue for the SVG WG to resolve, and I'll raise it with them. The revision numbers in all approved tests should match the PNG references, and we can revise the tests so that the revision numbers match.
In the future we'll be converting these tests (and writing new ones) for SVG 2 as reftests and scripted tests in the web-platform-tests project. The SVG 1.1 tests are at this point unmaintained.
If you really need up-to-date PNG reference images, you could regenerate them. They are generated using Batik's command line SVG to PNG conversion tool. In the SVG Working Group's old CVS repository, there is a script (script/generate_reference_images.pl) to do the conversions and a set of SVG files (imagePatches/) to convert instead for tests that we knew Batik didn't get right with the original markup.
I've zipped up the SVG 1.1 Second Edition test suite sources and put them here in case you want to try regenerating the images.
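If you end up ignoring the revision label after all, a masked comparison is at least less destructive than cropping the files. Here is a minimal Python sketch; it assumes the images are already decoded into rows of pixel tuples (the decoding step is not shown), and the function name and the rectangle coordinates are made up for the example.

```python
def count_mismatches(actual, expected, ignore=None):
    """Count differing pixels between two images of equal size.

    Images are lists of rows, each row a list of pixel tuples.
    ignore is an optional (left, top, right, bottom) box to skip;
    right and bottom are exclusive.
    """
    if len(actual) != len(expected) or any(
        len(row_a) != len(row_e) for row_a, row_e in zip(actual, expected)
    ):
        raise ValueError("images have different dimensions")

    mismatches = 0
    for y, (row_a, row_e) in enumerate(zip(actual, expected)):
        for x, (pixel_a, pixel_e) in enumerate(zip(row_a, row_e)):
            if ignore is not None:
                left, top, right, bottom = ignore
                if left <= x < right and top <= y < bottom:
                    continue  # inside the masked revision-label area
            if pixel_a != pixel_e:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    expected = [[white] * 4 for _ in range(3)]
    actual = [row[:] for row in expected]
    actual[2][3] = black  # stands in for a differing revision label
    actual[0][1] = black  # a genuine rendering difference
    print(count_mismatches(actual, expected))
    print(count_mismatches(actual, expected, ignore=(2, 2, 4, 3)))
```

The first call counts both differences; the second skips the masked corner and reports only the genuine one.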
This is a bit of a two-part question about working with 40 MB XML files.
• What’s a reasonable size to store in memory for a program running continually in the background?
• How to find what has changed in an XML file.
So on the first read the XML is loaded into NSData, then uploaded to the server.
Now instead of uploading a 40 MB XML every time it changes, I would prefer to upload a “delta” file containing only what has changed. The program would monitor the file for changes and activate when it’s been modified. From what I can see, I would need to parse an old version of the XML file and the modified XML file, then compare them? Is it unreasonable to store 80 MB in memory like this every time the file is modified? I’m assuming this has to be done with a DOM parser, because I can’t see how you could compare two files like that with a SAX parser, since it only holds part of the file at a time.
I'm a newbie at this so any help would be appreciated!
To compare two files:
There are several ways to do this (depending on the file, I may not be correct):
sdiff file1.xml file2.xml (a Unix command)
You can use this command with AppleScript.
-[NSFileManager contentsEqualAtPath:andPath:]
This method checks whether the two given paths refer to the same file, then compares their sizes, and finally compares their contents.
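The same two ideas (an equality check first, then a delta holding only the changes) can be sketched with Python's standard library. This is an illustration rather than Cocoa code, the function names are made up, and a line-based diff like this may be slow on a 40 MB file.

```python
import difflib
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Byte-for-byte comparison, similar in spirit to contentsEqualAtPath:andPath:."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def make_delta(old_text, new_text):
    """Return a unified diff: only the changed lines plus one line of context."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="old.xml",
            tofile="new.xml",
            n=1,
        )
    )


if __name__ == "__main__":
    old = "<items>\n  <item>1</item>\n  <item>2</item>\n</items>\n"
    new = "<items>\n  <item>1</item>\n  <item>3</item>\n</items>\n"

    with tempfile.TemporaryDirectory() as folder:
        old_path = os.path.join(folder, "old.xml")
        new_path = os.path.join(folder, "new.xml")
        with open(old_path, "w") as handle:
            handle.write(old)
        with open(new_path, "w") as handle:
            handle.write(new)
        # Only build a delta when the files actually differ.
        if not files_identical(old_path, new_path):
            print(make_delta(old, new))
```

Only the delta would then need to be uploaded, provided the server keeps the previous version and can apply the patch.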
For other part:
As for what size is reasonable for a background process: I don't think it matters much for an application. You can save these into temporary files. Even Safari uses 130+ MB, as you can easily check in Activity Monitor.
NSXMLParser ended up being the most useful for this
I've been working on an app to create various document formats for a while now, and I've had limited success.
Ideally, I'd like to dynamically create a fairly simple ODT/PDF/DOC file. I've been focusing my efforts on ODT, because it is editable, and open enough that there are several tools which will convert it to any of the other formats I need.
The problem is that the ODT XML files are NOT simple, and I couldn't find any good-quality APIs (especially in Python). So far, I've had the most success creating a template ODT file and then manipulating the DOM in Python as needed. This is generally OK, but it is quickly becoming inadequate and requires too much tweaking every single time I need to alter one of the templates.
The requirements are:
1) Produce a simple document that will have lists, paragraphs, and the ability to draw simple graphics on the page (boxes, circles, etc...)
2) The ability to specify page size, and the different formats should generally print the exact same output when sent to a printer
My questions:
1) Are there any other ways I can produce ODT/PDF/DOC files?
2) Would LaTeX be acceptable? I've never really used it, does anyone have experience converting LaTeX files into other formats?
3) Would it be possible to use HTML? There are a lot of converters online. Technically you can specify dimensions in mm/cm, etc..., but I am worried that the printed output will differ between browsers/converters....
Any other ideas?
Have you tried pandoc? I've been using it with good success for converting different formats into each other. Why reinvent the wheel?
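Since you're in Python, calling pandoc from a script could look roughly like this. The file names are made up, the helper functions are only a sketch, and pandoc itself has to be installed separately; it picks the input and output formats from the file extensions.

```python
import shutil
import subprocess


def pandoc_command(source, target):
    """Build the argument list; pandoc infers formats from the extensions."""
    return ["pandoc", source, "-o", target]


def convert(source, target):
    """Run pandoc. Returns False if pandoc is not installed; raises if it fails."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names; one source, several output formats.
    for target in ("report.odt", "report.docx"):
        print(" ".join(pandoc_command("report.rst", target)))
```

PDF output goes through a separate PDF engine (LaTeX by default), so that target needs more than pandoc alone.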
I suppose to be successful, you'd have to define how you want to input everything. Why don't you just use OpenOffice? It will save to ODT (duh...), PDF, and HTML (though it's not clean HTML, it's actually quite ugly).
In my recent experience, I've had success going from LaTeX -> XHTML via LaTeXML (I had to compile it from source). LaTeX is seeming more and more like a terminal format. It's great for PDF, but once you need some flexibility, it kind of fails. I should also note that there is no LaTeX -> DVI step in my workflow, so I can't comment on things like tex4ht that read from a DVI file (I have too many graphics that don't work with DVI to switch them now).
Shortly I'll be moving everything into DocBook 4.5. I like the docbook-utils package, which supports LaTeX and HTML, and I even saw a converter to ODT. DocBook is super-heavy on the markup, which is annoying, but it will provide me with the flexibility I need going forward.
Since you're using Python, have you considered just using reStructuredText?
I've also really enjoyed publishing from Emacs' Org mode, which is a super lightweight markup that exports to a bunch of different formats.