JPEG headers missing/corrupted - header

I have a 130 KB JPEG image that won't open in anything, and I need to fix it. All I got from the various image recovery programs I tried was "Image headers corrupted/missing". The file's properties show nothing either: no dimensions, just the file size. Is it possible to recover the image once its headers are lost? I don't want to use any more recovery software. A colleague suggested parsing the JPEG and looking for anomalies compared to a working JPEG. Any other ideas?

The only method I can think of is to open the JPEG in a hex editor and check whether its contents conform to the JPEG spec, starting with the FF D8 start-of-image marker at the very beginning of the file. Good luck.
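If clicking through a hex editor gets tedious, part of that check can be scripted. Below is a rough Python sketch, not a recovery tool: it walks the marker segments from the start of the file up to the start-of-scan marker and reports where the structure stops making sense. The names list_jpeg_segments, MARKER_NAMES and STANDALONE are made up for this example, and only a handful of common markers are given names.

```python
import struct

# Names for a few common JPEG markers (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI",
}

# Markers that carry no length field: TEM, RST0..RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def list_jpeg_segments(data):
    """Return (offset, marker_name, length) tuples for segments up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8) at start of file")
    segments = [(0, "SOI", 0)]
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(
                f"expected a marker at offset {pos}, found {data[pos]:#04x}"
            )
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker, skip it
            pos += 1
            continue
        name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        if marker in STANDALONE:
            segments.append((pos, name, 0))
            pos += 2
            continue
        if pos + 4 > len(data):
            raise ValueError(f"truncated segment header at offset {pos}")
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, name, length))
        if marker == 0xDA:  # entropy-coded image data follows SOS
            break
        pos += 2 + length
    return segments
```

To use it, read the damaged file as bytes (for example with open(path, "rb").read(), where path is your own file), run the same function on a known-good JPEG from the same camera or program, and compare the two segment lists. A ValueError tells you the offset at which to start looking in the hex editor.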

Related

Inkscape objects lose transparency when saved as a PDF

I've seen a number of issues dealing with saving Inkscape drawings as anything other than SVG, but I haven't seen a discussion specifically about transparent objects in PDFs. What's happening is that when I export a PNG, any transparent object looks fine, but if I save it as a PDF or EPS the transparency is lost.
I've created an example which you can see at this link ( http://imgur.com/a/ieVuu )
I've looked at a lot of other posts and feel like the explanation to this is layered within the responses but I'm a beginner and can't read between the lines to understand it. I wanted to just ask why this is happening and what can be done about it directly?
Use the alpha channel to set the transparency instead of the Opacity option.
Typically this is an issue with the PDF printing program that you are using. Perhaps it's a configuration issue; perhaps it's just not advanced enough to handle it.
Try out several other programs that can print to PDF and see if you can find one that works for your use case.

PDF with OCR text visible, how to hide it from existing PDF

I have several PDF files that have been OCR-processed (not by me). They contain both the scanned image and the OCR text. They seem to work fine in some viewers (iPhone/iPad), but not in others (Preview.app on macOS) which makes them somewhat awkward to read.
From googling around, it seems that the text & image may be layered incorrectly or there is a problem with the fonts used? I'm not even sure I'm using the correct vocabulary, as most hits I get are worthless.
Is it possible to use Ghostscript or something similar to batch-fix these files?
Example of "bad" rendering:
It's impossible to say what's wrong with the PDF file (or viewer) without seeing the PDF file, which also makes it hard to propose solutions!
You could certainly run the file through Ghostscript to the pdfwrite device, and use the -dFILTERTEXT switch to not process the text. The resulting document would therefore not contain the offending text, but would still contain the image.
Of course, this would then not be possible to search or highlight.
You could instead use -dFILTERIMAGE which would remove the original image leaving the text behind. But then anything in the original document which was not text would now be missing.
The usual 'best practice' is to have the text drawn in rendering mode 3, which makes no marks. This allows you to see the original image without the OCR'ed text interfering. It's possible that the viewer you are using is not honouring the text rendering mode, which would be a (fairly serious) bug in the viewer. The most recent versions of macOS seem to have some nasty bugs in the Quartz PDF rendering engine.
The other way to do this is to draw the text first, then put the original image on top of it, but that's hard to get wrong; I suspect it's more likely the text rendering mode.
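One way to check which case applies is to look for the text rendering mode operator in the page content. In a content stream the mode is set with "n Tr", so invisible text shows up as "3 Tr". The Python sketch below is only a rough illustration and rests on a big assumption: the file has already been uncompressed (for example with pdftk's uncompress option or a similar tool), because the operators are not visible as plain text in compressed streams. The function name is made up, and binary image data can produce false matches.

```python
import re
from collections import Counter

# "<mode> Tr" sets the text rendering mode; valid modes are 0 to 7.
# The lookbehind rejects digits that belong to a longer number (13, 0.3).
TR_OPERATOR = re.compile(rb"(?<![\w.])([0-7])\s+Tr\b")


def count_text_render_modes(uncompressed_pdf):
    """Count how often each text rendering mode is set in the raw bytes."""
    return Counter(
        int(match.group(1)) for match in TR_OPERATOR.finditer(uncompressed_pdf)
    )
```

If mode 3 never appears, the text is probably hidden by drawing the image over it rather than by the rendering mode.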
EDIT
The PDF file first draws the text, then draws the image on top of the text. The underlying text should not appear. mkl is quite correct in his comment.
The correct way to fix this is to fix the consumer which is rendering it incorrectly. As I mentioned above, the latest version of Quartz seems to have some fairly serious bugs, so you might choose to raise this as a bug with Apple.
The only other solution would be to run this through something which will remove the text. Ghostscript can do this, but there are implications. Firstly, it will no longer be possible to search/copy/paste text from the document. Secondly, you would need to run quite a complex command line in order to prevent the decompressed JPX images being recompressed as JPEG, which would probably result in compromised quality. Finally, the resulting file size would be larger.
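For the batch part of the question, the simple form of the Ghostscript call can be wrapped in a short script. This is a minimal Python sketch that assumes a gs executable on the PATH which supports -dFILTERTEXT. The function names and directory arguments are made up for the example. It does nothing about the image recompression issue described above, so check the output quality before discarding the originals.

```python
import subprocess
from pathlib import Path


def filtertext_command(src, dst, gs="gs"):
    """Build a Ghostscript command that rewrites src to dst without its text."""
    return [gs, "-o", str(dst), "-sDEVICE=pdfwrite", "-dFILTERTEXT", str(src)]


def strip_text_from_pdfs(in_dir, out_dir):
    """Run the command on every PDF in in_dir, writing results to out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.pdf")):
        # check=True stops the batch at the first file Ghostscript rejects.
        subprocess.run(filtertext_command(src, out_dir / src.name), check=True)
```

Writing to a separate output directory keeps the original files untouched, which matters here because the text removal cannot be undone.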

[Steganography] Hiding Data in PDF files

I'm trying to hide a file inside the code of a PDF file. I've already searched for information to help me. I uncompressed the PDF using pdftk ( pdftk pdf.pdf output uncompress.pdf uncompress ). Then I tried different things, such as:
Insert a comment: I put " %TEXT_TO_HIDE " in the uncompressed PDF's code.
Add a new object: I put " 0 0 obj << TEXT_TO_HIDE << endobj " in the uncompressed PDF's code.
Modify an existing object.
Then I compressed it using pdftk again.
In each case I get a new PDF that looks different from the original. It's not corrupted, but images have different colors and some of the original text is missing.
So, do you know any rules for changing a PDF's code without anyone noticing?
(PS: Sorry if my English is bad ^^)
You cannot modify a PDF file in a text editor and expect the file to still be compliant in general. PDF is a binary format, and you need to read the PDF specification to figure out how to modify it. For example, the cross-reference table records the byte offset of each object, so inserting text shifts those offsets unless the table is rebuilt.
That said, there are heaps of places where you can "hide" information in a PDF document. The real question is how much data you want to hide, and to what purpose. The purpose typically determines how secure exactly this needs to be.
As some examples:
1) PDF allows embedding complete files in the actual PDF file. This is not really secure as anyone with decent software can extract these files (but the file itself could still be secured of course).
2) PDF allows adding arbitrary objects anywhere (or almost anywhere) in the file. This is a great way to hide information, but someone with the right tools can browse the object tree (even if the file is compressed) and see what you did.
3) PDF allows adding for example white text on a white background or text behind other objects. Again, there are ways around this for people with the right software.
4) Adobe's PDF spec allows at least 1K of fluff after the %%EOF marker (although ISO 32000 does not). Keep in mind that this is visible to anyone opening the file with a decent text or binary editor. (Thanks Jongware).
In short, you need to define much better what exactly you want to accomplish and how "secure" secure is in your use case.
You should also consider how "robust" the method must be. Should someone be able to save your PDF file with Acrobat for example with the hidden code intact? Some of the above methods may not be robust enough to ensure that with absolute certainty.
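To make option 4 concrete, here is a minimal Python sketch of appending bytes after the final %%EOF marker and reading them back. Everything in it is an assumption for illustration: the function names are made up, the 1024-byte cap is just a reading of the "1K" figure above, and the payload must not itself contain %%EOF or start with a line break. It is not secure (the data is in plain view at the end of the file), and it may not survive the file being re-saved by another program.

```python
EOF_MARKER = b"%%EOF"
MAX_TRAILING_BYTES = 1024  # assumed limit, based on the "1K" figure above


def append_after_eof(pdf_bytes, payload):
    """Return the PDF bytes with payload appended after the end of the file."""
    if EOF_MARKER not in pdf_bytes:
        raise ValueError("no %%EOF marker found; is this a PDF?")
    if EOF_MARKER in payload:
        raise ValueError("payload must not contain %%EOF")
    if len(payload) > MAX_TRAILING_BYTES:
        raise ValueError("payload is larger than the assumed 1K allowance")
    return pdf_bytes + payload


def read_after_eof(pdf_bytes):
    """Return whatever follows the last %%EOF marker, minus line breaks."""
    end = pdf_bytes.rfind(EOF_MARKER)
    if end == -1:
        raise ValueError("no %%EOF marker found; is this a PDF?")
    return pdf_bytes[end + len(EOF_MARKER):].lstrip(b"\r\n")
```

Because nothing before the marker is touched, the byte offsets inside the file stay valid, which is why this approach avoids the breakage that comes from editing the body in a text editor.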

Is it possible to compress only a section of an image?

I am wondering if there is any way to compress specific sections of an image and preserve other sections. For example, I want the background of a large image compressed but the title and description text laid over the background to be crisp.
This would be pretty cool. Short answer: no.
Long answer:
Combine JPEG and PNG.
Do the background as a JPEG and save it as a separate file.
Then do the title and description as a PNG with transparency.
In whatever you are making (website, app), you can then overlay these images, and since the PNG has transparency it will appear to be part of the original image.
At the end of the day we only have a few technologies to work with that the end user can decode: JPG, GIF, PNG, TIFF, BMP (and SVG, although some things don't support it).
None of these technologies does what you want well on its own. PNG is awesome, but the file size will be pretty huge compared to JPG. JPG won't give you crisp text when you have an image in the background.
I wouldn't be surprised if someone has written an encoder for what you want to do, but the problem is being able to send the file to someone or something else. They won't be able to decode it easily without your encoder, which is why we stick to the standard formats.
The direct answers are:
If you are asking "can I . . . . in Photoshop", the answer to your question is NO.
If you are asking "can I . . . . programmatically," the answer to your question is YES, with some compression methods.
However, I sense there is a question behind your question.
You mention blurring. That suggests you are trying to save as JPEG because JPEG is the only major image compression technique that causes blurring.
The solution to this problem depends upon the nature of the image. Is it a photograph? Drawing? How many colors does it have?
Could you use another compression method (e.g., PNG as suggested previously)?
You might be able to get away with JPEG using a so-called "high" quality setting, at the cost of increased file size.
You can choose a format that has lossless compression, but the compression ratio is significantly lower. Having said that, save your file as JPG at the highest quality level (12), then open the new file and compare it to the original. At this level the loss of detail is relatively small, and if the text isn't on a single-color background you might find the result acceptable for your needs.

How to convert PDF to an Image without text

I would like to know if it's possible to convert a PDF to an image without fonts. My goal is to have only the image, without text.
And if so, can I do it with ImageMagick/Ghostscript?
Here is an example.
The final image: http://crocodoc_public.s3.amazonaws.com/8b8aa154-45e3-41f9-a465-628e1b2e955d/images/page-001.png
and the original PDF: http://crocodoc.com/demo/efwpa (page 2). We can see that the text is overlaid on the image; what I want is to do the same.
So if I got you right, what you want is to remove some text from your PDF (not fonts), and you want to do it programmatically. I suspect you already know that this will only be possible if the text is placed on some kind of separate layer in your PDF files. You can try to use iText for that. Beware: this means you will have to invest some days in learning how to use that library.
I too am on the lookout for something like that.
While playing with ImageMagick I tried this command and got some unexpected results.
convert input.pdf -blur 0x0 output.jpg
This removed the text layers from the PDFs I tried.
I cannot guarantee that this will work for you, or that it is the right way to achieve this, but you may try it.
You can do that with Adobe Acrobat. Select the text with the touch up tool and delete it. I don't think you can do that with Ghostscript. You could consider editing the PDF by hand (qpdf helps).