I'm looking for a way to access files within a zip file without extracting the whole file. All the zip solutions I find on the internet seems to extract the whole zip. Does anyone know of a solution?
Google has an objective-c lib based on minizip. http://code.google.com/p/objective-zip/
Supports unzip of individual files
EDIT: the project has moved to GitHub
The zlib library source distribution comes with a 'contrib' directory. In it, you'll find a library called 'minizip' (same license as zlib itself), which has APIs for creating (zip.h) and navigating/extracting (unzip.h) ZIP files. Despite the filename, there are functions in unzip.h which let you list or search for files within the zip file without extracting it.
If the zip is up on the internet you can have a look at pinch which will let you extract individual files from the zip without downloading the whole file.
https://github.com/epatel/pinch-objc
Maybe you can use it as a base to extract individual files from a local zip archive.
Related
I have been reading and working on SO questions related to the Street View House Numbers (SVHN) datasets. The files are available at 2 different locations:
Stanford:
The Street View House Numbers (SVHN) Dataset
kaggle:
Street View House Numbers (SVHN) | Kaggle
My question is related to the format of the digitStruct.mat files for each image set (train, test, and extras). These define the name, label, and bounding box dimensions for each image. As I understand, the mat file is written as a Matlab structure in HDF5 format (that can be read with h5py).
I have been able to access and read the digitStruct.mat files from kaggle with h5py. I cannot open the same files from the Stanford site with h5py (or with HDFView). Some SO posts I've read indicate the Stanford files are an older Matlab format and should be read with scipy.io.loadmat.
Are the files at Stanford and kaggle the same?
If not, what are the differences?
Should I be able to open the Stanford digitStruct.mat files with h5py?
If so, what method should I use to download and extract the Standford tar.gz files? (FYI, I'm on Win-7, and have been using HTTP download and WinZip to extract.)
I am adding additional info to document different behavior observed with different .mat files. It may help with diagnosis.
I can open and operate on .mat files from kaggle with this call:
h5f = h5py.File('digitStruct.mat','r')
For files from Stanford, I get different errors depending on the file and function used to open.
The command below executes without an error message. That leads me to believe it is not a Matlab v7.3 file that can be opened with h5py.
mat = scipy.io.loadmat('./Stanford/test_32x32.mat')
Both of these calls do not work (brief error message provided):
mat = scipy.io.loadmat('./test/digitStruct.mat')
Traceback...
NotImplementedError: Please use HDF reader for matlab v7.3 files
h5f = h5py.File('./test/digitStruct.mat','r')
Traceback...
OSError: Unable to open file (file signature not found)
In addition, I cannot open test/digitStruct.mat with HDFView. My conclusion for the Stanford digitStruct.mat files: they might be Matlab v7.3 files, but were corrupted when I downloaded. However, I'm not sure what I did wrong (since I can download and read kaggle files without problems).
With some Linux detective work, I figured out the problem.
As I suspected, the digitStruct.mat files extracted from the *.tar.gz files on the Stanford site are HDF5 (Matlab v7.3) files, and were corrupted when I downloaded.
To confirm, I downloaded the 3 tar.gz files with a browser on a Linux system, then used the tar command to extract them, and successfully opened with h5py on Linux. I then transferred them to my Windows system, and each worked as expected with h5py.
This is a little surprising, as I have used WinZip to extract tarball files in the past. Apparently there's something special about these that caused the corruption.
Hopefully this saves someone the same headache in the future.
Note: the 3 xxxx_32x32.mat files are an older Matlab format that must be accessed with scipy.io.loadmat()
MS Word's .docx files contain a bunch of .xml files.
Setup.exe files spit out hundreds of files that a program uses.
Zips, rars etc also hold lots of compressed stuff.
So how are they made? What does MS Word or another program that produces these files have to do to put files inside files?
When I looked this up I just got a bunch of results about compression, but let's say I wanted to make a program that 'wraps' files inside a file without making the final result any smaller. What would I even have to write?
I'm not asking/expecting any source code that does this, I just need a pointer. Is there something you think I'm misunderstanding based on what I've asked here?
Even a simple link to an article or some documentation would be greatly appreciated.
Ok, I'll just come up with some headers for ordinary files and write them along with the bytes of the actual files into one custom-defined file. You guys were very helpful, thank you!
Historically, Windows had a number of technologies to support solutions like this. These were often called Compound Files or Structured storage. However, I don't think the newer Office documents use these technologies. I think the Office file formats are similar to ZIP files with a different extensions. If you change a file with .docx extension to .zip and open it with your favorite compression tool, you'll see a bunch of folders and XML files.
Here are some links to descriptions of different file formats that create "files within files"
Zip file format
Compound File Binary Format (CFBF)
Structured Storage
Compound Document File Format
Office Open XML I: Exploring the Office Open XML Formats
At least on POSIX systems (e.g. Linux), a file is only a stream (i.e. a sequence) of bytes. And you can only grow (or shrink, i.e. truncate) it at the end - there is no way to insert bytes in the middle (without copying the rest).
You need some conventions, and some additional software, to handle it otherwise.
You might be interested in Sqlite, which gives you a library to handle some (e.g.) *.sqlite file as an SQL database
You could also use GDBM - a library giving you some indexed file abstraction.
libtar is a library to manipulate tar archives. See also tardy, a tar file postprocessor.
Is there a way to read the ptd or zgy file outside of Petrel? I have an application that would like to read the 3d seismic data that petrel holds in these formats without opening petrel to export the data into ASCII or something else. Obviously its a better user experience to just read it from my own application.
You can use zgy access C++ library deployed with Petrel. It's named Slb.Salmon.ZgyPublic.zip and located in the Petrel root folder. The archive contains binaries (native DLLs), C++ header files and documentation.
As for ptd, it is an extension of a folder name which contains files in many formats (binary, XML etc.), belonging to one project. The project's main file has pet extension, it is stored in binary format. There is no documentation on the format, it may change without notice, so you are not supposed to read those files directly.
I noticed that Apple started using zip archives to replace document packages (folders appearing as a single file in Finder) in the iWork applications. I'm considering doing the same as I keep getting support emails related to my document packages getting corrupted when copying them to a windows fileserver.
My questions is what would be the best way to do this in a NSDocument-based application?
I guess the easiest way would be to create a directory file wrapper, create an archive of it and return it in NSDocument's
- (NSFileWrapper *)fileWrapperOfType:(NSString *)typeName error:(NSError **)outError
But I fail to understand how to create a zip archive of the NSFileWrapper.
If you just want to make a zip file your format (ie, "mydoc.myextension" is actually a zip file), there's no convenient, built-in Cocoa mechanism for creating zip archives with code. Take a look at this Google Code project: ziparchive I don't believe a file wrapper will help in that case, though.
Since you cited iWork, I don't own iWork 09, but previous versions use a package format (ie, NSFileWrapper would be ideal) but zip the XML that describes the document's structure, while keeping attachments (like embedded media, images, etc.) in a resource folder, all within the package. I assume they do this because XML can be quite large for large, complicated documents, but compresses very well because it's text. This results in an overall smaller document.
If indeed Apple has moved to making the entire document one big zip archive (which I would find odd), they'd either be extracting necessary resources to a temp folder somewhere or loading the whole thing into memory (a step backward from their package-based approach, IMO). These are considerations you'll need to take into account as well.
You’ll want to take the data from the file wrapper and feed it into something like ziparchive.
Pierre-Olivier Latour has written an extension to NSData that deals with zip compression. You can get it here: http://code.google.com/p/polkit/
I know this is a little late to the party but I thought I'd offer up another link that could help anyone that comes across this post.
Looks like the ZipBrowser sample from Apple would be a good start http://developer.apple.com/library/mac/#samplecode/ZipBrowser/Introduction/Intro.html
HTH
Is it possible to programmatically zip/unzip files in vb.net? Meaning, not that it will extract the files for the user, but take the files inside the zip and be able to use them in the application? Then, is it possible for this to create a zip?
I couldn't seem to find a compression namespace anywhere.
Thanks for the help!
We used SharpZibLib in the past with great success.
You can also have look at the System.IO.Compression namespace, it provides the functionality compress and decompress streams but unfortunately not the functionality to extract files from a Zip file :(
Update:
I wasn't aware of this namespace System.IO.Packaging, seems it can indeed deal with files 'packed' into a zip file.
For an excellent commercial solution try http://xceed.com/
We have used this and it's great for working with zip file (and for merging and creating self-extracting zips if this is required)
note: Not affiliated with Xceed in any way.