How to decide on document file extension?

How to decide on document file extension? - cross-platform

I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:
How important is it nowadays to stick to 3 characters?
Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how much a given app is popular)
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Finally, do you have other important guidelines when choosing for your own programs?
PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.

FileInfo.com lists a lot of file extensions along with their own estimation of how much it is ued.
I suggest a unique extension (rather then xml.gz) so that the OS can identify the file type to users when looking at a file listing or whatever. 'Ringing a bell' is important, especially if you will have less sophisticated users.
I don't see any need to stick to 3 characters, but I wouldn't go bigger than 5 (I don't suppose I have a real reason for this, other than personal preference).

How important is it nowadays to stick to 3 characters?
It's not unless you have to support older operating systems. All current OSes handle >3 char file extensions without any problems. Think of .html, .config, .resx, and I'm sure there are more.
Where can you check how much this file extension is already used?
check out FileExt.
Do I really need to use a file-specific extension? My save
format is gzip'ed XML, so I could name
it .xml.gz, but I fear it would
confuse beginning users (i.e. when you
see it, it does not immediately "ring
a bell").
Remember that windows (and windows users) associate files with applications by extension, so using something too generic like .xml.gz may cause problems. You are probably better coming up with something that is more specific to your file type or application. Users don't care weather your format is gzipped xml internally, they care about what is in the file. Think about abstraction layers, your users will think of it as a file containing chemistry info not gzipped xml, so .chem is far more appropriate than .xml.gz
Some suggestions of things to thing about:
Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.
Don't clash with anything common in your industry domain that your user demographic is likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, and any of their competitors and avoid them.
Make sure your app includes the options to register and unregister the extension. don't just automatically do it in the installation, make sure it's an option.
Remember unix/linux and Mac are case sensitive, so consider sticking to always all lower case by default.
Remember CD/DVD file naming rules are stricter, so don't use non alpha numeric characters.
Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
There is more info here.
Wikipedia has lists of files extensions here (by type) and here (alphabetical), and also some general information

Depends on the platform, but in general, not very important for newer Operating Systems. Check the documentation for the platforms you're targeting.
I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.
Not unless you have some reason to do so. Examples would be "I want to ensure that Windows always opens this program with my app". I'm not sure that your users need to be concerned with the extension anyway. The default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as allowing your program to easily identify files it should be able to handle, for example) then you could use the extension, or you could come up with something else.
I have only ever once written a program where I thought I needed to come up with my own extension. I used my initials. Then later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions seem to be something that seems to mean something. (.doc for documents, etc.) so something meaningful is a good idea if you do need to go this route.

It sure depends on the OSes you want to support, but people have globally moved over the 3-characters extension limit these days: .html is well used for webpages, for example.
Of course, if you go to much longer extensions, people will stop visually recognizing it as a file extension, I think...

Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.
The Wikipedia list of file formats is pretty good. Some mime mapping lists will list common extensions associated with those mappings. Ray already mentioned FileInfo.com.
It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.
I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)

Have it as document_name.app_name.xml.gz where document_name and app_name are variables, the latter some easily readable and recognisable short string of your application's title.
Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.
I agree that .xml.gz would confuse users, however keep in mind that modern systems are moving into recognizing files not based on extensions but by probing their headers and even contents instead. In fact, users do not often even see the extensions. For gzipped XML files, a system may decide to first unpack the file stream in memory, then find out it is a literal XML file, then it may take its 'xmlns' as the application identifier. However, such systems are not yet widespread use. In any case, don't make the mistake of only opening files by extension - be smart and raise the bar - do exactly the above to find out if the file can be considered a document for your application.

Related

How to use get all NSLocalization using genstrings while preserved current translations

Let say my iOS app already have translation localizatible.strings for Japanese. Say "Continue" = "続ける";
However, I've added new NSLocalization additions to my code but I want to use genstrings to get all new NSLocalizations without having to merge them manually.
Is there any way to do that?

There are tools that manage localization and that automatically make updates to translations based on changes to the base language (and helping the translator make the necessary changes only to whatever has been changed).
For example www.gengo.com has a free online tool called Strings (which I haven't tried yet). There are also desktop apps that look very good, such as Localization Manager as part of Localization Suite http://www.loc-suite.org/ (which I haven't tried properly yet either).
Localization agencies may have their own tools, too.
These tools are a must if you do a lot of updates and have several languages but for smaller projects, they can take a bit too much getting used to. For an occasional task or a small project with few languages, manually merging the changes of your base language localizable.strings files to your translated localizable.strings files might be quicker though.

Structure of QuickTime's 'dref' atom 'alis' element

I need to rewrite a QuickTime reference movie, making it point to another set of files.
I'm working in Windows environment, so I don't have acces to the QuickTime API, and being the referenced files unaccesible, I can't also use the COM interface to load the movie because it can't resolve the referenced paths.
The documentation in the "QuickTime File Format Specification" says that the 'dref' atom can have a list of 'alis', 'url ' and 'rsrc' data references. In this case I need to parse the 'alis' elements. According to the reference, "Data reference is a Macintosh alias".
So long, I have not been able to see a declaration of the structure or any related information. Do you know the structure of an alias record? Where can I find detailed information about it's structure?
Thank you a lot for your help!

The format is very similar to the sort of alias that you could generate in the Finder by right-clicking an item, and creating an alias to it.
Aside: When the QuickTime format was originally specified, Apple intelligently chose to incorporate a number of other standards and paradigms that were extensively already being used elsewhere in the OS. This is one of the reasons why QT is (or was) able to do really clever things like reference movies. Unfortunately, there's also now a lot of cruft leftover from OS features that are no longer relevant (ie. AppleShare). Back in its heyday, QuickTime was slick, especially compared to its competitors; today, it's vastly underappreciated due to the buggy Windows port, and the relatively low processing power of the desktop systems of its time.
Back ontopic, unfortunately, the format for alias files is not an open/published standard, and there is precious little documentation on the topic on the 'net. There's one really old doc that deconstructs the alias format used in Mac OS Classic. Although the structure used in OS X is very similar, the alias files themselves tend to be much larger, as they contain numerous extra data strings at the end of the file that are not documented in the above-linked documentation.
Also, aliases created in the finder do look a bit different from the ones contained within the dref atom, although I've never run through them bit-by-bit to deduce the actual differences. If you want to take a peek at what those files, and have the OS X Developer Tools installed, you can run
setfile -a a [filename]
on a Finder-generated alias to strip the file of its alias-ness so that you can look at its contents in a hex editor (otherwise, the OS will just redirect you to the linked file - doh!). You can re-set the file's alias attribute, or arbitrarily designate any file as an alias by running
setfile -a A [filename]
Unfortunately, during my experiments, dumping the alis portion of a QT movie's dref atom has never seemed to generate an alias that Mac OS was able to interpret.
Fortunately (or not, as it was in my case), the functions that Mac OS allegedly uses to create/handle aliases are part of a public API called the Alias Manager, which is part of the very-low-level CoreServices framework. If you've got time to delve into this further, you can write some code to experiment with Mac OS's built-in alias-generating and interpreting capabilities.
Unfortunately, if you're dealing with an old/buggy file, you have no way of knowing if the file was actually generated by CoreServices' Alias Manager, or if that framework has changed/evolved/regressed since then. Because it's a closed format, 3rd-party developers who opt to not use the Alias Manager can only take guesses as to the format's "legal" structure.

You can use this Java program to see what is in the header, and extract data (it's a bit old, but may still work). What is more useful, though, is the thorough discussion by the author about the Quicktime header.
But I think you may just be looking for the Apple documentation, currently found here.

How to do File I/O in Opa?

After reading (nearly) the whole ebook and taking a look at the API
i am still asking myself how to realize "traditional" web server behaviour with opa.
I understand (at least i believe that) that opa links external resources specified at
compile time into the executable, making them immutable and permament.
But what if, say, i would want to change the stylesheet of an application without recompiling it?
There seems to be a few methods in the stdlib (apidoc) but they are not covering
what i am used to from other programming languages.
A possible solution i could think of is making use of the internal database,
but that looks like a bit of an overkill for something simple like traditional File I/O.

Edit: this blog post explains more about dealing with external resources in Opa.
Long story short: you'll rarely work with external files in Opa.
Let me try to break this down. Opa will indeed embed resources. But for development mode you indeed just want to be able to tweak them (mainly CSS) and see changes immediately. If you compiled your program in a non-release mode then it will support this kind of actions (try --help, below is an excerpt)
Debugging Resources : dynamic edition:
[...]
--debug-editable-css
Export the CSS files embedded in the server to the file
system, so that they can be viewed and edited during
execution of the application
For many other editable&changing resources one would indede use the database.
And if you really need to work with files (again: with Opa you'll need it much less than with traditional web languages) then take a look at stdlib.io and, for advanced use, at BslFile module with bindings to Ocaml functions for file manipulation.

I think this module is for you :
http://opalang.org/resources/doc/index.html#file.opa.html/!/value_stdlib.io
import stdlib.io
my_css = File.content("css/file.css")
I am not seeing some way to write file, but I think if you need to write you should use the db.
But to read I think this is the solution :)

Fast-to-load cross-platform alternative to MX files (Mathematica)

In Mathematica, one can save intermediate results / the partial state of the workspace with Save (.m files) or DumpSave (.mx files）.
.m files are portable, but are very slow to load (with large data).
.mx files are fast to load, but are not portable between platforms/architectures.
Is there a way to save generic Mathematica expressions in a way that loading them is fast, and they're portable between platforms? Has anyone experimented with / benchmarked different methods to do this?
One possible solution is to save .m files (cross-platform), then convert them to .mx files when starting work on a new platform (a one-time operation). Is there a fully automatic way to convert .m files to .mx files?

From the posts Alexey linked:
str=OpenWrite[file,BinaryFormat->True];
BinaryWrite[str,Compress[expr],"TerminatedString"];
Close[str];
This is not quite as fast as using an mx file, but it is still very fast.
David Bailey
Another alternative seems to be WDX (Wolfram Data eXchange) which I am
using without problems on a variety of machines and which also seems to
be portable, can be used exactly like MX files, is binary, is documented
and thus I would consider officially supported. And it is used by the
data paclet functionality, so I guess it is reasonably performant and
well tested on all systems (an assumption which my experience does
support up to now).
(excerpt from answer by Albert Retey, also from Alexey's link)
But these do not work as Save/DumpSave does, in that it does not save the FullDefinition of expr, only the explicit value of expr.

Batch source-code aware spell check

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.

As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with it's spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.

This seems little old but seems to do a good job
Source Code Spell Checker

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas