Fast-to-load cross-platform alternative to MX files (Mathematica) - serialization

In Mathematica, one can save intermediate results / the partial state of the workspace with Save (.m files) or DumpSave (.mx files).
.m files are portable, but are very slow to load (with large data).
.mx files are fast to load, but are not portable between platforms/architectures.
Is there a way to save generic Mathematica expressions in a way that loading them is fast, and they're portable between platforms? Has anyone experimented with / benchmarked different methods to do this?
One possible solution is to save .m files (cross-platform), then convert them to .mx files when starting work on a new platform (a one-time operation). Is there a fully automatic way to convert .m files to .mx files?

From the posts Alexey linked:
str=OpenWrite[file,BinaryFormat->True];
BinaryWrite[str,Compress[expr],"TerminatedString"];
Close[str];
This is not quite as fast as using an mx file, but it is still very fast.
David Bailey
Another alternative seems to be WDX (Wolfram Data eXchange) which I am
using without problems on a variety of machines and which also seems to
be portable, can be used exactly like MX files, is binary, is documented
and thus I would consider officially supported. And it is used by the
data paclet functionality, so I guess it is reasonably performant and
well tested on all systems (an assumption which my experience does
support up to now).
(excerpt from answer by Albert Retey, also from Alexey's link)
But these do not work as Save/DumpSave does, in that it does not save the FullDefinition of expr, only the explicit value of expr.

Related

Is there a way to force thrift generate separate file for each class?

I have thrift file with many structs in it. For C#, thrift generates bunch of files, but for Objective C it gives me only .h and .m files. Version of thrift is 0.9.0.
Typically not. The way in which the Thrift compiler organizes generated sources are pretty much based on what the original developer considered being consistent with the usual way to do things in particular language. The generated code is not intended to be neatly organized or maintainable in the first run, the primary goal is performance (and of course perfectly working code).
Sorry, there is no other answer, except maybe providing a patch.

How to do File I/O in Opa?

After reading (nearly) the whole ebook and taking a look at the API
i am still asking myself how to realize "traditional" web server behaviour with opa.
I understand (at least i believe that) that opa links external resources specified at
compile time into the executable, making them immutable and permament.
But what if, say, i would want to change the stylesheet of an application without recompiling it?
There seems to be a few methods in the stdlib (apidoc) but they are not covering
what i am used to from other programming languages.
A possible solution i could think of is making use of the internal database,
but that looks like a bit of an overkill for something simple like traditional File I/O.
Edit: this blog post explains more about dealing with external resources in Opa.
Long story short: you'll rarely work with external files in Opa.
Let me try to break this down. Opa will indeed embed resources. But for development mode you indeed just want to be able to tweak them (mainly CSS) and see changes immediately. If you compiled your program in a non-release mode then it will support this kind of actions (try --help, below is an excerpt)
Debugging Resources : dynamic edition:
[...]
--debug-editable-css
Export the CSS files embedded in the server to the file
system, so that they can be viewed and edited during
execution of the application
For many other editable&changing resources one would indede use the database.
And if you really need to work with files (again: with Opa you'll need it much less than with traditional web languages) then take a look at stdlib.io and, for advanced use, at BslFile module with bindings to Ocaml functions for file manipulation.
I think this module is for you :
http://opalang.org/resources/doc/index.html#file.opa.html/!/value_stdlib.io
import stdlib.io
my_css = File.content("css/file.css")
I am not seeing some way to write file, but I think if you need to write you should use the db.
But to read I think this is the solution :)

How to unrar/unzip with Objective-C, including single file retrieval

I have a need to handle various rar/zip files, in Objective-C. Ideally I'd like to be as flexible as possible in terms of rar/zip versions. I'd also like to be able to only extract certain files from the rar/zip files, after pulling out a list of the file contents.
If that wasn't enough, I'd like to be able to access and modify the zip comment.
Is this easily possible in objective-c? I've searched around a lot and found a lot of half-finished libraries that don't do everything I want, or only support rar up to version 2, or don't support extracting single files.
I know I could just use the command line unzip tool that ships with MacOS Panther and up, but this seems inelegant and doesn't help me with rar files, as no unrar application ships with MacOS by default.
Can anyone point me at a decent library that does one or the other of these two types of files, or a recommended best approach for dealing with this problem? I know that one option is to wrap the unrar source, and also wrap the zlib source, but this to me is a daunting task. If there's no other option I'll do it - any advice or guidance on this would be gratefully received.
Thanks!
Yes, doing that it's easy in objective C. For zip files just use ZLIB (it's already included in Mac OS X.
RAR is not that simple though. Look for a C library (not an Objective-C library). There will be way more C libraries for RAR handling than Objective-C ones. And you can use all C libraries you want within an Objective-C program.

Use ZIP-archives to store NSDocument data

I noticed that Apple started using zip archives to replace document packages (folders appearing as a single file in Finder) in the iWork applications. I'm considering doing the same as I keep getting support emails related to my document packages getting corrupted when copying them to a windows fileserver.
My questions is what would be the best way to do this in a NSDocument-based application?
I guess the easiest way would be to create a directory file wrapper, create an archive of it and return it in NSDocument's
- (NSFileWrapper *)fileWrapperOfType:(NSString *)typeName error:(NSError **)outError
But I fail to understand how to create a zip archive of the NSFileWrapper.
If you just want to make a zip file your format (ie, "mydoc.myextension" is actually a zip file), there's no convenient, built-in Cocoa mechanism for creating zip archives with code. Take a look at this Google Code project: ziparchive I don't believe a file wrapper will help in that case, though.
Since you cited iWork, I don't own iWork 09, but previous versions use a package format (ie, NSFileWrapper would be ideal) but zip the XML that describes the document's structure, while keeping attachments (like embedded media, images, etc.) in a resource folder, all within the package. I assume they do this because XML can be quite large for large, complicated documents, but compresses very well because it's text. This results in an overall smaller document.
If indeed Apple has moved to making the entire document one big zip archive (which I would find odd), they'd either be extracting necessary resources to a temp folder somewhere or loading the whole thing into memory (a step backward from their package-based approach, IMO). These are considerations you'll need to take into account as well.
You’ll want to take the data from the file wrapper and feed it into something like ziparchive.
Pierre-Olivier Latour has written an extension to NSData that deals with zip compression. You can get it here: http://code.google.com/p/polkit/
I know this is a little late to the party but I thought I'd offer up another link that could help anyone that comes across this post.
Looks like the ZipBrowser sample from Apple would be a good start http://developer.apple.com/library/mac/#samplecode/ZipBrowser/Introduction/Intro.html
HTH

How to decide on document file extension?

I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:
How important is it nowadays to stick to 3 characters?
Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how much a given app is popular)
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Finally, do you have other important guidelines when choosing for your own programs?
PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.
FileInfo.com lists a lot of file extensions along with their own estimation of how much it is ued.
I suggest a unique extension (rather then xml.gz) so that the OS can identify the file type to users when looking at a file listing or whatever. 'Ringing a bell' is important, especially if you will have less sophisticated users.
I don't see any need to stick to 3 characters, but I wouldn't go bigger than 5 (I don't suppose I have a real reason for this, other than personal preference).
How important is it nowadays to stick to 3 characters?
It's not unless you have to support older operating systems. All current OSes handle >3 char file extensions without any problems. Think of .html, .config, .resx, and I'm sure there are more.
Where can you check how much this file extension is already used?
check out FileExt.
Do I really need to use a file-specific extension? My save
format is gzip'ed XML, so I could name
it .xml.gz, but I fear it would
confuse beginning users (i.e. when you
see it, it does not immediately "ring
a bell").
Remember that windows (and windows users) associate files with applications by extension, so using something too generic like .xml.gz may cause problems. You are probably better coming up with something that is more specific to your file type or application. Users don't care weather your format is gzipped xml internally, they care about what is in the file. Think about abstraction layers, your users will think of it as a file containing chemistry info not gzipped xml, so .chem is far more appropriate than .xml.gz
Some suggestions of things to thing about:
Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.
Don't clash with anything common in your industry domain that your user demographic is likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, and any of their competitors and avoid them.
Make sure your app includes the options to register and unregister the extension. don't just automatically do it in the installation, make sure it's an option.
Remember unix/linux and Mac are case sensitive, so consider sticking to always all lower case by default.
Remember CD/DVD file naming rules are stricter, so don't use non alpha numeric characters.
Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
There is more info here.
Wikipedia has lists of files extensions here (by type) and here (alphabetical), and also some general information
Depends on the platform, but in general, not very important for newer Operating Systems. Check the documentation for the platforms you're targeting.
I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.
Not unless you have some reason to do so. Examples would be "I want to ensure that Windows always opens this program with my app". I'm not sure that your users need to be concerned with the extension anyway. The default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as allowing your program to easily identify files it should be able to handle, for example) then you could use the extension, or you could come up with something else.
I have only ever once written a program where I thought I needed to come up with my own extension. I used my initials. Then later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions seem to be something that seems to mean something. (.doc for documents, etc.) so something meaningful is a good idea if you do need to go this route.
It sure depends on the OSes you want to support, but people have globally moved over the 3-characters extension limit these days: .html is well used for webpages, for example.
Of course, if you go to much longer extensions, people will stop visually recognizing it as a file extension, I think...
Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.
The Wikipedia list of file formats is pretty good. Some mime mapping lists will list common extensions associated with those mappings. Ray already mentioned FileInfo.com.
It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.
I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)
Have it as document_name.app_name.xml.gz where document_name and app_name are variables, the latter some easily readable and recognisable short string of your application's title.
Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.
I agree that .xml.gz would confuse users, however keep in mind that modern systems are moving into recognizing files not based on extensions but by probing their headers and even contents instead. In fact, users do not often even see the extensions. For gzipped XML files, a system may decide to first unpack the file stream in memory, then find out it is a literal XML file, then it may take its 'xmlns' as the application identifier. However, such systems are not yet widespread use. In any case, don't make the mistake of only opening files by extension - be smart and raise the bar - do exactly the above to find out if the file can be considered a document for your application.