If it exists, is the ID3 chunk always the last chunk in an AIFF file?

I'm trying to write some code for reading and writing the ID3 chunk from an AIFF file.
I know this chunk is optional, but if it exists, is it always the last chunk, or could it be anywhere? If it is always the last chunk, that makes writing changes to the file easier.

No, I don't think it is. It can be one of the first chunks; there is nothing in the specification to prevent this.

I'm not sure about AIFF, but in WAV files the ID3 subchunk can be anywhere, before or after the data chunk. To find it, look at the first subchunk ID; if it is not the ID3 chunk, read the next 4 bytes, which hold the size of that subchunk, then skip ahead by that size to the header of the next subchunk and check its ID.
Again, I have only worked with .wav files so far, but plan on looking at AIFF soon.
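The chunk walk described above can be sketched in Python. This is only a sketch under some assumptions: the container starts at offset 0 of the stream, sizes are big-endian for AIFF (`FORM`) and little-endian for WAV (`RIFF`), chunks are padded to an even length, and the tag chunk ID is `ID3 ` (a common convention, not something confirmed in this thread). The helper names `iter_chunks` and `find_chunk` are made up for illustration.

```python
import io
import struct


def iter_chunks(stream, big_endian=True):
    """Yield (chunk_id, data_offset, size) for each top-level chunk.

    Assumes the FORM/RIFF container starts at offset 0 of `stream`:
    4-byte container ID, 4-byte size, 4-byte form type, then chunks.
    """
    size_fmt = ">I" if big_endian else "<I"
    stream.seek(0)
    header = stream.read(12)
    if len(header) < 12:
        return
    (container_size,) = struct.unpack(size_fmt, header[4:8])
    end = 8 + container_size  # the size field counts everything after itself
    pos = 12
    while pos + 8 <= end:
        stream.seek(pos)
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break
        chunk_id = chunk_header[:4]
        (size,) = struct.unpack(size_fmt, chunk_header[4:])
        yield chunk_id, pos + 8, size
        pos += 8 + size + (size & 1)  # skip the pad byte after odd-sized chunks


def find_chunk(stream, wanted, big_endian=True):
    """Return (data_offset, size) of the first chunk with ID `wanted`, or None."""
    for chunk_id, offset, size in iter_chunks(stream, big_endian):
        if chunk_id == wanted:
            return offset, size
    return None


def _chunk(chunk_id, payload):
    # Build one big-endian chunk for the synthetic demo below.
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic AIFF-like data with the tag chunk placed BEFORE the sound data.
body = b"AIFF" + _chunk(b"ID3 ", b"tag") + _chunk(b"SSND", b"\x00" * 8)
demo = io.BytesIO(b"FORM" + struct.pack(">I", len(body)) + body)
print(find_chunk(demo, b"ID3 "))
```

Since the walk never assumes a position for the chunk, the same loop works whether the tag comes first, last, or in the middle.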

Related

How to PREPEND text to a file in Swift or Objective C?

Please note that I'm not asking how to append text at the end of the file. I'm asking how to prepend text to the beginning of the file.
let handle = try FileHandle(forWritingTo: someFile)
//handle.seekToEndOfFile() // This is for appending
handle.seek(toFileOffset: 0) // Me trying to seek to the beginning of file
handle.write(content)
handle.closeFile()
It seems like my content is being written at the beginning of the file, but it replaces the existing content as well... Thanks!
One reasonable solution is to write the new content to a temporary file, then append the existing contents to the end of the temporary file. Then move the temporary file over the old file.
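A minimal sketch of that temporary-file approach, written in Python rather than Swift purely for illustration. The function name `prepend_bytes` is hypothetical, and the temporary file is assumed to live in the same directory as the original so that the final rename stays on one file system.

```python
import os
import shutil
import tempfile


def prepend_bytes(path, new_content):
    """Prepend `new_content` by building a temporary file, then replacing `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
            tmp.write(new_content)         # new data goes first
            shutil.copyfileobj(src, tmp)   # then the existing contents
        os.replace(tmp_path, path)         # move the temp file over the old file
    except BaseException:
        os.unlink(tmp_path)                # do not leave a stray temp file behind
        raise
```

Note that the replacement file gets the temporary file's permissions, so a real implementation may want to copy the original's mode first.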
When you seek to a point in an existing file and then perform a write, the existing contents are overwritten from that point. This is why your current approach fails.
In general, most file systems don't have built-in support for prepending data to files. Likewise, most file I/O APIs don't either.
In order to prepend data, you first have to shift all of the existing data further along the file to make room for the new data at the beginning. You typically do this by starting near the end, reading a chunk of data, writing that data to the original position plus the length of data you hope to eventually prepend, and then repeating with the next chunk closer to the beginning of the file. In this way, you gradually shift everything down. Only after you've done all of that can you safely write the new data at the beginning of the file.
Frankly, if there's any way to avoid this, you should try to. The performance is likely to be terrible if the file is large and/or you're doing it frequently.
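For completeness, the shift-from-the-end procedure described above could look roughly like this in Python (not Swift; a sketch only). `prepend_in_place` and `block_size` are illustrative names, and nothing here protects against a crash halfway through, which would leave the file corrupted.

```python
import os


def prepend_in_place(path, new_content, block_size=64 * 1024):
    """Shift the existing data towards the end, then write `new_content` at offset 0."""
    shift = len(new_content)
    if shift == 0:
        return
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # end of the region that has not been moved yet
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            block = f.read(pos - start)
            f.seek(start + shift)  # original position plus the prepend length
            f.write(block)
            pos = start            # continue with the block closer to the start
        f.seek(0)
        f.write(new_content)       # only now is it safe to write the new data
```

Working backwards matters: each write lands at or beyond the data that has already been read, so nothing is overwritten before it has been copied.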

Minimal PDF size according to specs

I'm reading PDF specs and I have a few questions about the structure it has.
First of all, the file signature is %PDF-n.m (8 bytes).
After that, the docs say there might be at least 4 bytes of binary data (but there also might not be any). The docs don't say how many binary bytes there could be, so that is my first question. If I were trying to parse a PDF file, how should I parse that part? How would I know how many binary bytes (if any) were placed there? Where should I stop parsing?
After that, there should be a body, a xref table and a trailer and an %%EOF.
What could be the minimal file size of a PDF, assuming there isn't anything at all (no objects, whatsoever) in the PDF file and assuming the file doesn't contain the optional binary bytes section at the beginning?
Third and last question: If there were more than one body+xref+trailer section, where would the offset just before the %%EOF point to? The first or the last xref table?
First of all, the file signature is %PDF-n.m (8 bytes). After that, the docs say there might be at least 4 bytes of binary data (but there also might not be any). The docs don't say how many binary bytes there could be, so that is my first question. If I were trying to parse a PDF file, how should I parse that part? How would I know how many binary bytes (if any) were placed there? Where should I stop parsing?
Which docs do you have? The PDF specification ISO 32000-1 says:
If a PDF file contains binary data, as most do (see 7.2, "Lexical Conventions"), the header line shall be
immediately followed by a comment line containing at least four binary characters—that is, characters whose
codes are 128 or greater.
Thus, those at least 4 bytes of binary data are not immediately following the file signature without any structure but they are on a comment line! This implies that they are
preceded by a % (which starts a comment, i.e. data you have to ignore while parsing anyways) and
followed by an end-of-line, i.e. CR, LF, or CR LF.
So it is easy to recognize while parsing. In particular it merely is a special case of a comment line and nothing to treat specially.
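To illustrate, here is a small Python sketch that reads the header line and then treats the optional binary marker as an ordinary comment line. The name `parse_header` and its return shape are made up for this example, and it only looks at the first two lines; a real tokenizer would skip comments everywhere.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)[ \t]*(\r\n|\r|\n)")
EOL_RE = re.compile(rb"\r\n|\r|\n")


def parse_header(data):
    """Return (version, offset_after_header, has_binary_marker)."""
    m = HEADER_RE.match(data)
    if not m:
        raise ValueError("not a PDF header")
    version = (int(m.group(1)), int(m.group(2)))
    pos = m.end()
    has_binary_marker = False
    if data[pos:pos + 1] == b"%":  # a comment line directly after the header
        eol = EOL_RE.search(data, pos)
        line_end = eol.start() if eol else len(data)
        comment = data[pos + 1:line_end]
        # "Binary characters" are bytes with codes of 128 or greater.
        has_binary_marker = sum(b >= 128 for b in comment) >= 4
        pos = eol.end() if eol else len(data)
    return version, pos, has_binary_marker


sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(parse_header(sample))
```

The parser never needs to know in advance how many binary bytes there are; it simply reads to the end of the comment line.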
(sigh, I just saw you and @Jongware cleared that up in comments while I wrote this...)
What could be the minimal file size of a PDF, assuming there isn't anything at all (no objects, whatsoever) in the PDF file and assuming the file doesn't contain the optional binary bytes section at the beginning?
If there are no objects, you don't have a PDF file as certain objects are required in a PDF file, in particular the catalog. So do you mean a minimal valid PDF file?
As you commented you indeed mean a minimal valid PDF.
Please have a look at the question "What is the smallest possible valid PDF?" on Stack Overflow; there are some attempts there to create minimal PDFs adhering more or less strictly to the specification. Reading e.g. @plinth's answer, you will see stuff that is not PDF anymore but is still accepted by Adobe Reader.
Third and last question: If there were more than one body+xref+trailer section, where would the offset just before the %%EOF point to?
Normally it would be the last cross reference table/stream as the usual use case is
you start with a PDF which has but one cross reference section;
you append an incremental update with a cross reference section pointing to the original as previous, and the new offset before %%EOF points to that new cross reference;
you append yet another incremental update with a cross reference section pointing to the cross references from the first update as previous, and the new offset before %%EOF points to that newest cross reference;
etc...
The exception is the case of linearized documents in which the offset before the %%EOF points to the initial cross references which in turn point to the section at the end of the file as previous. For details cf. Annex F of ISO 32000-1.
And as you can of course apply incremental updates to a linearized document, you can have mixed forms.
In general it is best for a parser to be able to parse any order of partial cross references. And don't forget, there are not only cross reference sections but also alternatively cross reference streams.
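As a starting point for such a parser, the offset before the final %%EOF can be located by scanning the tail of the file. The Python sketch below assumes the last `startxref` keyword sits within the final 1024 bytes (the `tail_size` default is an arbitrary choice) and it does not follow the chain of previous cross references; `last_startxref` is an illustrative name.

```python
import re

STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def last_startxref(data, tail_size=1024):
    """Return the byte offset given by the last startxref before the final %%EOF."""
    tail = data[-tail_size:]
    matches = list(STARTXREF_RE.finditer(tail))
    if not matches:
        raise ValueError("no startxref found near end of file")
    return int(matches[-1].group(1))


# Synthetic tail of a file with one incremental update appended.
sample = (b"%PDF-1.4\n...original body and xref...\n"
          b"startxref\n120\n%%EOF\n"
          b"...incremental update...\n"
          b"startxref\n560\n%%EOF\n")
print(last_startxref(sample))
```

For a plain incrementally updated file, this offset leads to the newest cross-reference section; for a linearized file, as noted above, it leads to the initial one, so the previous-pointer chain still has to be followed in either case.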

Fortran: How to skip many lines of data file efficiently

I have a formatted data file which is typically billions of lines long, with several lines of headers of variable length. The data file takes the form:
# header 1
# header 2
# headers are of variable length.
# data begins from next line.
1.23 4.56 7.89 0.12
2.34 5.67 8.90 1.23
:
:
# billions of lines of data, each row the same length, same format.
-- end of file --
I would like to extract a portion of data from this file, and my current code looks like:
<pre>
do j = 1, jmax   ! Suppose I want to extract jmax lines of data from the file.
   ! [algorithm to determine number of lines to skip, "N(j)"]
   ! This determines the number of lines to skip from the previous file
   ! position, where the data was read on the (j-1)th iteration.
   ! Skip N(j)-1 lines to reach the next data line to read:
   do i = 1, N(j) - 1
      read(unit=unit, fmt='(A)')
   end do
   ! Now read the line of data I want ('(data_format)' is a placeholder):
   read(unit=unit, fmt='(data_format)') data1, data2   ! etc.
   ! Data is stored in some arrays.
end do
</pre>
The problem is, N(j) can be anywhere between 1 and several billion, so it takes some time to run the code.
My question is, is there a more efficient way of skipping millions of lines of data? The only way I can think of, while sticking to Fortran, is to open the file with direct access and jump to the desired line upon opening the file.
As you suggest, direct access seems like the best option. But that requires all records to have the same length, which your headers violate. Also, why use formatted output? With a file of this length, it's hard to imagine a person reading it. If you use unformatted I/O, the file will be smaller and I/O will be faster. Perhaps create two files: one with the headers (metadata) in human-readable form, and the other with the data in native form. Native/binary representation means a loss of portability, which is something to consider if you want to move the files to different computer architectures or have them be usable for decades. Otherwise it's probably worth the convenience. Another option would be a more sophisticated file format that combines metadata and data, such as HDF5 or FITS, but for communication between two programs of one person, that's probably excessive.
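The idea behind direct access is plain arithmetic on byte offsets: if every data row has the same length, row k starts at (header length) + k × (row length). Here is a rough Python illustration rather than Fortran, assuming the variable-length header is measured once up front and that rows really are fixed-length including the newline. `header_length` and `read_record` are hypothetical helper names.

```python
import io


def header_length(stream, comment_prefix=b"#"):
    """Scan the leading comment lines once and return their total byte length."""
    stream.seek(0)
    size = 0
    while True:
        line = stream.readline()
        if not line.startswith(comment_prefix):
            return size
        size += len(line)


def read_record(stream, header_size, record_size, index):
    """Return record number `index` (0-based) without reading the ones before it."""
    stream.seek(header_size + index * record_size)
    record = stream.read(record_size)
    if len(record) != record_size:
        raise IndexError("record index past end of file")
    return record


# Two header lines followed by two 20-byte rows (19 characters plus newline).
demo = io.BytesIO(b"# header 1\n# header 2\n"
                  b"1.23 4.56 7.89 0.12\n"
                  b"2.34 5.67 8.90 1.23\n")
offset = header_length(demo)
row = read_record(demo, offset, 20, 1)
print([float(x) for x in row.split()])
```

The seek costs the same whether the target row is the second or the billionth, which is the whole point of avoiding the skip loop.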

reading and separating file using byte comparison

I have to read a file in .dat format and split its data at the first occurrence of two consecutive zero bytes. The first half is JSON data and the other half is binary data.
How should I go about it?
I tried using the NSData dataWithContentsOfFile method to read it, then converting it to a byte array and comparing bytes. Somehow, it's not working.
You can use the same kind of procedure that you would use to read a file by lines. There are earlier answers on SO about reading a file line by line. Just change \n to the byte sequence that is applicable.
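In Python terms (just to show the idea; the question is about NSData), splitting at the first pair of zero bytes could look like the sketch below. It assumes the whole file fits in memory and that the JSON part is UTF-8 text, which cannot itself contain a zero byte. `split_at_double_zero` is an illustrative name.

```python
import json


def split_at_double_zero(data):
    """Split `data` at the first occurrence of two consecutive zero bytes."""
    idx = data.find(b"\x00\x00")
    if idx < 0:
        raise ValueError("separator not found")
    return data[:idx], data[idx + 2:]  # the separator itself is dropped


blob = b'{"a": 1}' + b"\x00\x00" + b"\x01\x00\x00\x02"
head, tail = split_at_double_zero(blob)
print(json.loads(head.decode("utf-8")), tail)
```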

How can I add and remove bytes from the start of a file?

I'm trying to open an existing file and save some bytes at the start of it, so I can read them later.
How can I do that? The "&" operator isn't working for this type of data.
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
Help Please.
You cannot add to or remove from the beginning of a file. It just doesn’t work. Instead, you need to read the whole file, and then write a new file with the modified data. (You can, however, replace individual bytes or chunks of bytes in a file without needing to touch the whole file.)
Secondly,
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
You’re doing something wrong. Apparently you’ve read text data from the file and are now trying to convert it to bytes. This is the wrong way of doing it. Do not read text from the file, read the bytes directly (e.g. via My.Computer.FileSystem.ReadAllBytes). Raw byte data and text (i.e. String) are two fundamentally different concepts, do not confuse them. Do not convert needlessly to and fro.
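To make the byte-versus-text distinction concrete, here is a sketch in Python (not VB.NET) that keeps the file contents as raw bytes the whole time and converts only the new text. The 4-byte big-endian length prefix is an assumption of this example, added so the stored text can be found and stripped again later; `add_prefix` and `split_prefix` are made-up names.

```python
import struct


def add_prefix(original, text):
    """Return new file contents: length prefix + UTF-8 text + original bytes."""
    payload = text.encode("utf-8")  # only the new text is converted to bytes
    return struct.pack(">I", len(payload)) + payload + original


def split_prefix(data):
    """Undo add_prefix: return (text, original bytes)."""
    (length,) = struct.unpack(">I", data[:4])
    payload = data[4:4 + length]
    return payload.decode("utf-8"), data[4 + length:]


original = bytes([0, 255, 16])  # arbitrary binary data, never decoded as text
stored = add_prefix(original, "text")
print(split_prefix(stored))
```

In practice you would read the original with a binary read of the whole file, build the new contents like this, and write them out as a new file.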