Optimize checking gzcompressed file for changes - gzip

I have a function which stores data in gzcompressed files, like this:
function savedata($fileName, &$data) {
    file_put_contents($fileName, gzcompress($data));
}
I want to optimize it so that nothing is saved if the data is the same as in the already stored gzcompressed file.
I can open the whole file, uncompress it and compare it with $data, but I think there should be some other way. Probably gzcompressed data has a CRC or something like that, so I can just compress the data, fetch the CRC from it and compare it to the CRC in the already existing file.
So I just want to optimize checking the file and $data for changes to make it quicker.

gzcompress() in PHP compresses to the zlib format, not the gzip format.
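Each zlib stream ends with a four-byte check value, though not a CRC: it is an Adler-32 checksum of the uncompressed data. You can compare those to see if two streams are different: if the check values differ, the data differs. However, if the two check values are the same, you cannot conclude that the streams are the same, so a full comparison is still needed in that case.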
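A minimal sketch of that check, written in Python rather than PHP purely for illustration. It assumes the stored file is a complete zlib stream (RFC 1950), whose last four bytes hold the Adler-32 of the uncompressed data in big-endian order. The helper names `stored_adler32`, `stored_adler32_from_file` and `definitely_changed` are hypothetical.

```python
import struct
import zlib


def stored_adler32(zlib_stream: bytes) -> int:
    """Return the check value kept in the last 4 bytes of a zlib stream."""
    return struct.unpack(">I", zlib_stream[-4:])[0]


def stored_adler32_from_file(path: str) -> int:
    """Same as above, but reads only the final 4 bytes of the file."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 means "relative to the end of the file"
        return struct.unpack(">I", f.read(4))[0]


def definitely_changed(existing_check: int, new_data: bytes) -> bool:
    """True means the data differs; False means 'possibly identical'."""
    return existing_check != zlib.adler32(new_data)
```

A `False` result only says the checksums match, so a caller that must never skip a real change would still decompress and compare in that case.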

Related

How to read a binary file with TCL

So I have a function I'm using to read data from a file. It works fine if the file is plain text, but when I try to read a binary file, like a png, it returns a different text (diff confirms that). I opened a hex editor to see what was wrong and found out it is putting some c2 bytes along with the file (I don't know if the position is random or if there are other bytes except this c2 one).
This is my function. I just want it to read and save to a variable.
proc read_file {path} {
    set channel [open $path r]
    fconfigure $channel -translation binary
    set return_string [read $channel]
    close $channel
    return $return_string
}
To actually print, I'm doing this:
puts -nonewline [read_file file.png]
When you open a file, it defaults to being in text mode. In text mode (which is really a combination of options) the IO layer translates characters from whatever encoding they are in into Tcl's internal encoding, and does the reverse operation on output. The default encoding scheme is platform specific, but in your case it sounds like it is UTF-8. (Tcl uses a complex internal system of encodings; it doesn't expose those to the outside world.)
By contrast, when you put the channel into binary mode, the bytes on the outside are directly mapped to characters in the range 0-255 (and vice versa on output). You get a perfect copy, provided you put both input and output channels in binary mode. (There are other optimisations for binary mode, but they don't matter here.)
When you only put one of the channels in binary mode, you get what looks like corruption. It isn't random though. In particular, when the input is binary but the output is UTF-8, input bytes in the range 128-255 get converted into multiple output bytes, where the first of those bytes is in the sort of range you observed. There are other combinations that mess things up; the whole range of problems is collectively known as mojibake.
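The effect can be reproduced outside Tcl. The following Python sketch only models the two steps described above, it is not what Tcl does internally: a binary read maps each byte to the character with the same code point (modelled here with Latin-1 decoding), and a UTF-8 text write then encodes each character. The function name `binary_read_then_utf8_write` is hypothetical.

```python
def binary_read_then_utf8_write(raw: bytes) -> bytes:
    # Binary-mode read: byte N becomes the character with code point N (0-255).
    chars = raw.decode("latin-1")
    # Text-mode write on a UTF-8 channel: every character is UTF-8 encoded,
    # so code points 128-255 turn into two bytes starting with c2 or c3.
    return chars.encode("utf-8")


# A PNG file starts with byte 0x89 followed by the ASCII letters "PNG".
print(binary_read_then_utf8_write(b"\x89PNG").hex())  # prints c289504e47
```

The extra c2 byte in front of 89 matches what the hex editor showed; bytes below 128 pass through unchanged.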
tl;dr Don't mix up binary and text data unless you're very careful. The results of getting it wrong are "surprising".

SAS : read in PDF file

I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality and there is very little to be found on the internet. (Not to mention that searching Google with "PDF" in your query is not easy, since it also returns links to PDF documents about other things.)
The only things that can be found are people looking for ways to import data into datasets from a PDF. For me, that is not even necessary. I would like to be able to read the contents of the PDF file into one big character variable. If possible, it would be even better to be able to read in the file's binary data.
Is this possible with SAS and how? (I got it to work in Access VBA, but can't find any similar ways in SAS.)
(In the end, the purpose is to convert this to base64 and put that base64-string into an XML document.)
You probably will not be able to read the entire file into one character variable, since the maximum length of a character variable is 32,767 bytes (about 32 KB). A simple way to read in one line at a time, though, is something like the following:
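%let pdfFileName = Test.pdf;
%let lineSize = 2000;
data base;
    /* Reserve room for one full record per observation. */
    format text_line $&lineSize..;
    /* TRUNCOVER keeps short records instead of jumping to the next line. */
    infile "&pdfFileName" lrecl=&lineSize truncover;
    /* $CHAR reads the whole record, including blanks, not just the first word. */
    input text_line $char&lineSize..;
run;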
This requires that you have a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size prior to reading in the file. In this example each line of text is read into one character variable named "text_line". From there, you could use a RETAIN statement or double trailing at signs (@@) in the INPUT statement to process multiple lines at a time. The SAS website has plenty of documentation on how to read and process text from various types of input files.

dll files compared to gzip files

Okay, the title isn't very clear.
Given a byte array (read from a database blob) that represents EITHER the sequence of bytes contained in a .dll or the sequence of bytes representing the gzip'd version of that dll, is there a (relatively) simple signature that I can look for to differentiate between the two?
I'm trying to puzzle this out on my own, but I've discovered I can save a lot of time by asking for help. Thanks in advance.
Check if its first two bytes are the gzip magic number, 0x1f 0x8b (see RFC 1952). Or just try to gunzip it; the operation will fail if the DLL is not gzip'd.
A gzip file should be fairly straightforward to identify, as it consists of a header, a footer and some other distinguishable elements in between.
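A minimal Python sketch of the magic-number test. The gzip check comes from RFC 1952; the extra check for "MZ" is an assumption added here, based on the fact that Windows PE files such as DLLs begin with those two bytes. The function name `classify_blob` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # ID1, ID2 from RFC 1952
DOS_MAGIC = b"MZ"         # first two bytes of a PE file (assumption, not in the answer)


def classify_blob(blob: bytes) -> str:
    """Guess whether a byte array is a gzip stream or a raw DLL."""
    if blob[:2] == GZIP_MAGIC:
        return "gzip"
    if blob[:2] == DOS_MAGIC:
        return "dll"
    return "unknown"


print(classify_blob(gzip.compress(b"MZ\x90\x00")))  # prints gzip
print(classify_blob(b"MZ\x90\x00"))                 # prints dll
```

Since a DLL never starts with 0x1f 0x8b, the two cases cannot be confused by this test.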
From Wikipedia:
"gzip" is often also used to refer to the gzip file format, which is:
- a 10-byte header, containing a magic number, a version number and a time stamp
- optional extra headers, such as the original file name
- a body, containing a DEFLATE-compressed payload
- an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data
You might also try determining if the gzip contains any records/entries as each will also have their own header.
You can find specific information on this file format (specifically the member header which is linked) here.
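As a rough illustration of that layout, the fixed header and footer can be unpacked in Python with `struct`. This is a sketch under the assumption that the blob holds a single gzip member; the field names follow RFC 1952 (where the byte after the magic number is the compression method), and the function name `parse_gzip_frame` is hypothetical.

```python
import gzip
import struct
import zlib


def parse_gzip_frame(blob: bytes) -> dict:
    """Unpack the 10-byte header and 8-byte footer of a gzip member."""
    if len(blob) < 18:
        raise ValueError("too short to be a gzip stream")
    # Little-endian: magic (2), method, flags, mtime (4), extra flags, OS id.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("gzip magic number not found")
    # Footer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {"method": method, "flags": flags, "mtime": mtime,
            "crc32": crc32, "isize": isize}


info = parse_gzip_frame(gzip.compress(b"hello"))
print(info["method"], info["isize"])  # prints 8 5
```

Method 8 is DEFLATE, and the footer's CRC-32 can be checked against `zlib.crc32` of the expected uncompressed bytes.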

How to mix sound files of different format using FFMPEG?

I need to mix audio files of different types into a single output file through code in my iPad app.
For example, I need to merge a .m4a file with .mp3 or .wav or any other format file.
The resulting output file should be of .m4a type.
I have compiled FFMPEG for iOS with the link: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-October/076618.html
Now, I am not able to understand in which direction to proceed?
This is more of a suggestion than an answer. I don't have any experience with obj-c, though I have worked with audio formats in other languages. Unless you find some library that does this specific task, you may need to decode the files and convert their data to some common numerical representation.
The sample data of a 16-bit PCM .wav file is stored as signed integers in the range -32768 to 32767, while decoded mp3 sample data is typically represented as floating-point values in the range -1 to +1. Either representation can be converted to the other through a simple calculation.
mp3ToWavSample = mp3Sample * 32767
Once the data is converted "merging" becomes very easy. You can simply add the sample values together.
mergedSample = convertedSample1 + convertedSample2
You would need to apply this to every sample in the mp3. Depending on the size of your files, this could be a significant processing task.
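The two formulas above can be sketched in Python as follows. It assumes both inputs have already been decoded to raw samples at the same sample rate and channel layout, which is outside what the answer covers. The clamping step is an addition: plain addition can exceed the 16-bit range, so the sum is limited to -32768..32767. The function names are hypothetical.

```python
def float_to_int16(sample: float) -> int:
    """Scale a float sample in [-1, 1] to the 16-bit integer range."""
    clamped = max(-1.0, min(1.0, sample))
    return int(round(clamped * 32767))


def mix_int16(track_a: list, track_b: list) -> list:
    """Add two tracks sample by sample, clamping to avoid integer overflow."""
    length = max(len(track_a), len(track_b))
    mixed = []
    for i in range(length):
        a = track_a[i] if i < len(track_a) else 0  # pad the shorter track with silence
        b = track_b[i] if i < len(track_b) else 0
        mixed.append(max(-32768, min(32767, a + b)))
    return mixed


print(mix_int16([30000, -30000, 100], [10000, -10000]))  # prints [32767, -32768, 100]
```

Hard clamping distorts loud passages; scaling both tracks down before adding is a common alternative.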
As for adding reverb to your track, I'd suggest that you ask for help on that in another question.

How can I add and remove bytes at the start of a file?

I'm trying to open an existing file and save some bytes at the start of it so that I can read them later.
How can I do that? The "&" operator isn't working for this type of data.
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
Help Please.
You cannot add to or remove from the beginning of a file. It just doesn’t work. Instead, you need to read the whole file, and then write a new file with the modified data. (You can, however, replace individual bytes or chunks of bytes in a file without needing to touch the whole file.)
Secondly,
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
You’re doing something wrong. Apparently you’ve read text data from the file and are now trying to convert it to bytes. This is the wrong way of doing it. Do not read text from the file, read the bytes directly (e.g. via My.Computer.FileSystem.ReadAllBytes). Raw byte data and text (i.e. String) are two fundamentally different concepts, do not confuse them. Do not convert needlessly to and fro.
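The read-everything-then-rewrite approach described in this answer can be sketched as follows, in Python rather than VB.NET purely for illustration. Writing to a temporary file and then swapping it in is a design choice added here so that a failed write does not destroy the original; the function names `prepend_bytes` and `split_prefix` are hypothetical.

```python
import os
import tempfile


def prepend_bytes(path: str, prefix: bytes) -> None:
    """Rewrite the file so that it starts with prefix, followed by the old content."""
    with open(path, "rb") as src:  # read raw bytes, never text
        original = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as dst:
        dst.write(prefix + original)
    os.replace(tmp_path, path)  # swap the new file in place of the old one


def split_prefix(path: str, count: int):
    """Return (first count bytes, remaining bytes) without modifying the file."""
    with open(path, "rb") as src:
        data = src.read()
    return data[:count], data[count:]
```

Text is converted to bytes exactly once, at the point where the prefix is created (for example `"text".encode("utf-8")`); everything read from the file stays as bytes.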