Which is it: Perl or perl, TIF or TIFF, ant or Ant, ClearCase or Clear Case? - semantics

In one sentence I have managed to create 16 possible variations on how I present information. Does it matter as long as the context is clear? Do any common mistakes irritate you?

Regarding Perl: How should I capitalize Perl?
TIFF stands for Tagged Image File Format, whereas the extension of files using that format is often ".tif".
That is for the purpose of compatibility with 8.3 filenames, I believe.
I generally like the Perl way of capitalizing when used as a proper noun, but lowercasing when referring to the command itself (assuming the command is lowercase to begin with).

Well, Perl and TIFF have already been answered, so I'll add the last two:
The Apache Foundation writes "Apache Ant".
Rational ClearCase (or sometimes "IBM Rational ClearCase") is written as such at its web site.

Even though Perl was originally an acronym for Practical Extraction and Report Language, it is written Perl.
These things don't 'bother' me as much as they provide insights into the level of knowledge of the speaker/author. You see, we work in an industry that requires precision, so precision in language does matter, as it affects the understanding of the consumer.

The one that really seems to bother me is when people fully upper case JAVA as though it were an acronym.

Quick uses for scripting languages?

I feel that there are a lot of quick uses for scripting languages that you may only think of if you have the shell open at all times. I leave a terminal tab open with Python running and have solved many problems with a few lines of code typed off the top of my head. What are some of your less obvious uses for the scripting language of your choice?
Most recently, in my Windows-centric world, I have used it to rename large numbers of files, search/filter log files for a specific occurrence, perform network diagnostics, and a host of smaller things I can't think of at the moment that some of my colleagues without a UNIX background would never have thought of.
I just used a Lua script in SciTE to take a selected SVG path and do some operations on it (find min values and translate to 0, scale, round up values to avoid having a ton of decimal digits). It is just handy.
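As a rough illustration of that kind of throwaway script, here is a minimal Python sketch (not the original Lua). It assumes the path has already been reduced to a list of (x, y) points; parsing the SVG path syntax itself is left out, and the function name `normalize_points` is made up for this example.

```python
def normalize_points(points, scale=1.0, digits=2):
    """Translate points so the smallest x and y become 0, then scale and round."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [
        (round((x - min_x) * scale, digits), round((y - min_y) * scale, digits))
        for x, y in points
    ]
```

Typed into an open interpreter, a call such as `normalize_points(points, scale=2.0, digits=1)` does the translate, scale, and round steps in one go.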
Reformat text in some complicated way;
Prepare some text based on a template logic;
Rename multiple files (e.g. music collection or photos);
etc.
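For the file-renaming case, a few lines of Python are usually enough. The sketch below is only one possible approach; the function name, the `photo` prefix, and the numbering scheme are assumptions chosen for illustration.

```python
from pathlib import Path


def rename_sequentially(folder, pattern="*.jpg", prefix="photo"):
    """Rename matching files to prefix_001.ext, prefix_002.ext, ... in name order."""
    folder = Path(folder)
    renamed = []
    for index, path in enumerate(sorted(folder.glob(pattern)), start=1):
        target = path.with_name(f"{prefix}_{index:03d}{path.suffix}")
        if target != path:
            # Refuse to overwrite a file that already has the target name.
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
        renamed.append(target.name)
    return renamed
```

Running it on a copy of the folder first is a sensible precaution, since the renames are not undone if an error occurs halfway through.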
Something very similar was discussed in the Wikibooks article Ad Hoc Data Analysis From The Command Line.
This mostly discusses the use of Unix commands rather than scripting languages, but the principle is the same ... have a shell open at all times.
Use BeautifulSoup to clean up some HTML.

How to decide on document file extension?

I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:
How important is it nowadays to stick to 3 characters?
Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how popular a given app is.)
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Finally, do you have other important guidelines when choosing an extension for your own programs?
PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.
FileInfo.com lists a lot of file extensions along with their own estimation of how much each one is used.
I suggest a unique extension (rather than xml.gz) so that the OS can identify the file type to users when looking at a file listing or whatever. 'Ringing a bell' is important, especially if you will have less sophisticated users.
I don't see any need to stick to 3 characters, but I wouldn't go bigger than 5 (I don't suppose I have a real reason for this, other than personal preference).
How important is it nowadays to stick to 3 characters?
It's not unless you have to support older operating systems. All current OSes handle >3 char file extensions without any problems. Think of .html, .config, .resx, and I'm sure there are more.
Where can you check how much this file extension is already used?
Check out FileExt.
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Remember that Windows (and Windows users) associate files with applications by extension, so using something too generic like .xml.gz may cause problems. You are probably better off coming up with something that is more specific to your file type or application. Users don't care whether your format is gzipped XML internally; they care about what is in the file. Think about abstraction layers: your users will think of it as a file containing chemistry info, not gzipped XML, so .chem is far more appropriate than .xml.gz.
Some suggestions of things to think about:
Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.
Don't clash with anything common in your industry domain that your user demographic is likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, and any of their competitors and avoid them.
Make sure your app includes the options to register and unregister the extension. Don't just automatically do it in the installation; make sure it's an option.
Remember that Unix/Linux file systems are case sensitive (and Mac file systems can be, although they are case-insensitive by default), so consider sticking to all lower case by default.
Remember CD/DVD file naming rules are stricter, so don't use non-alphanumeric characters.
Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
There is more info here.
Wikipedia has lists of file extensions here (by type) and here (alphabetical), and also some general information.
Depends on the platform, but in general, not very important for newer Operating Systems. Check the documentation for the platforms you're targeting.
I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.
Not unless you have some reason to do so. Examples would be "I want to ensure that Windows always opens this program with my app". I'm not sure that your users need to be concerned with the extension anyway. The default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as allowing your program to easily identify files it should be able to handle, for example) then you could use the extension, or you could come up with something else.
I have only once written a program where I thought I needed to come up with my own extension. I used my initials. Then later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions seem to mean something (.doc for documents, etc.), so something meaningful is a good idea if you do need to go this route.
It sure depends on the OSes you want to support, but people have generally moved past the 3-character extension limit these days: .html is widely used for webpages, for example.
Of course, if you go to much longer extensions, people will stop visually recognizing it as a file extension, I think...
Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.
The Wikipedia list of file formats is pretty good. Some mime mapping lists will list common extensions associated with those mappings. Ray already mentioned FileInfo.com.
It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.
I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)
Have it as document_name.app_name.xml.gz where document_name and app_name are variables, the latter some easily readable and recognisable short string of your application's title.
Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.
I agree that .xml.gz would confuse users; however, keep in mind that modern systems are moving towards recognizing files not by their extensions but by probing their headers and even contents instead. In fact, users often do not even see the extensions. For gzipped XML files, a system may decide to first unpack the file stream in memory, then find out it is a literal XML file, then take its 'xmlns' as the application identifier. However, such systems are not yet in widespread use. In any case, don't make the mistake of only opening files by extension - be smart and raise the bar - do exactly the above to find out if the file can be considered a document for your application.
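The probing described above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the namespace URI `http://example.org/chem` and the function name are hypothetical placeholders for whatever your application actually defines.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
APP_NAMESPACE = "http://example.org/chem"  # hypothetical namespace, for illustration only


def looks_like_app_document(path, namespace=APP_NAMESPACE):
    """Return True if path is gzipped XML whose root element is in our namespace."""
    # Step 1: check the gzip header, ignoring the file extension entirely.
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            return False
    # Step 2: decompress on the fly and look only at the root element.
    try:
        with gzip.open(path, "rb") as stream:
            for _event, element in ET.iterparse(stream, events=("start",)):
                # ElementTree writes namespaced tags as "{uri}localname".
                return element.tag.startswith("{" + namespace + "}")
    except (OSError, EOFError, zlib.error, ET.ParseError):
        return False
    return False
```

Because the check stops at the first start tag, it does not need to read or parse the whole document, which matters if saved files can get large.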

Autodocumentation type functionality for Fortran?

In the past I've used Doxygen for C and C++, but now I've been thrown onto a Fortran project and I would like to get a quick, all-encompassing look at the architecture.
In the past I've found reverse engineering tools to be useful where no documentation of the architecture exists.
So, is there a tool out there that will reverse engineer Fortran code?
I tried to use Doxygen, but didn't have any luck. I will be working with two different projects - one in Fortran 90 and one that I think is in Fortran 77.
Thanks for any insights and feedback.
Tools which may help with reverse engineering:
SciTools Understand
Link with some more tools (search "fortran")
Also, maybe some of these unit testing frameworks will be helpful (I haven't used them, so I cannot comment on the pros and cons of any of them):
FUnit
FRUIT
Ftnunit
(These links point to fortranwiki, where you can find a tidbit on every one of them, and from there there are links to their home sites.)
Doxygen 1.6.1 will generate documentation, call graphs, etc. for Fortran source code in free-format (F90) format. You are out of luck for auto-documenting fixed-format (F77) code with Doxygen.
All is not lost, however. The conversion from fixed to free format is straightforward and can be automated to a great degree - change comment characters to '!', change continuation characters to '&', and append '&' to lines to be continued. In fact, if the appended continuation character is placed in column 73, it should be ignored by standard F77 compilers (which still only recognize code in columns 1 through 72) but will be recognized by F9x/F2003/F2008 compilers. This allows the same code to be recognized as both in fixed and free format, which lets you gracefully migrate from one format to the other.
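The three steps above can be sketched in a few lines of Python. This is a rough sketch under simplifying assumptions: it treats 'C', 'c', '*' or '!' in column 1 as a comment, ignores tab-formatted source, and does not handle character strings that span a continuation. The function name is made up for the example.

```python
def fixed_to_free(lines):
    """Rewrite fixed-form Fortran lines so they read as both fixed and free form."""
    out = []
    last_code = None  # index in `out` of the most recent statement line
    for line in lines:
        line = line.rstrip("\n")
        # Comment lines: replace the column-1 marker with '!'.
        if line[:1] in ("C", "c", "*", "!"):
            out.append("!" + line[1:])
            continue
        if not line.strip():
            out.append(line)
            continue
        # Continuation line: columns 1-5 blank, column 6 neither blank nor '0'.
        is_continuation = (
            len(line) > 5 and line[:5].strip() == "" and line[5] not in (" ", "0")
        )
        if is_continuation:
            if last_code is not None:
                # Pad the continued line to 72 columns and put '&' in column 73.
                # Anything previously past column 72 is dropped.
                out[last_code] = out[last_code][:72].ljust(72) + "&"
            line = line[:5] + "&" + line[6:]
        out.append(line)
        last_code = len(out) - 1
    return out
```

It would be wise to compile the converted output with both a fixed-form and a free-form compiler setting before trusting it on a whole code base.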
Conveniently, there are about a thousand small programs that will do this format adjustment to some degree or another. Realistically, if you're going to be maintaining the code, you might as well move it away from the 1928 spec for Hollerith (IBM) punched cards. :)

Converting PCL to PDF

I am looking to create (as a proof-of-concept) an OCaml (preferably) program that converts PCL code to PDF format. I am not sure where to start. Is there a standardized algorithm for doing so? Is there any other advice available for accomplishing this task?
Thanks!
Conversion of PCL to PDF can be incredibly complex (assuming you need it to be generic and not just for simple PCL). We've investigated this many times and in the end always revert to using other tools. We keep investigating as we are a development shop that uses and understands all elements of PCL in great detail. If you are not really familiar with PCL, it will be a daunting task. One of the major issues is that over time, printers have become, for the most part, tolerant of malformed PCL, and as such, creating something that follows the rules to the letter of the law is not always sufficient. If, however, you have control over the PCL, you may be able to work it out with some amount of success.
I don't mean to turn you off of this and I realize that you've come here looking for a programming answer, but I have to say, this is far from a simple task and there are no 'standardized algorithms' for this (that I'm aware of).
If this is designed to be a tool to work alongside something else you are building, I'd highly recommend looking at these guys:
PageTech
This is by far the most complete set of tools (Windows) for handling this. There are a few others but, based on our extensive use of PCL and conversion tools over the years, this is the only one that works all the time.
EDIT: Most recently we've been working with LincPDF (http://www.lincolnco.com/). This is also an excellent product which has one big benefit: deployment is simple. Some of the other tools have complex software installations. This solution is very easy for us to deploy as a feature in an application. It's also faster than any tools we've tested to date (at least with the PCL that we generate from our apps, which is quite complex as it includes specialized fonts and macros).
Ghostscript developers have recently integrated their sister products GhostXPS, GhostPCL and GhostSVG into their Ghostscript source code tree. (It's now called GhostPDL.) So all of these additional functionalities (load, render and convert XPS, PCL and SVG) are now available from there.
This means you could build their language switching binary from their sources. This, in theory, can consume PCL, PDF and PostScript and convert them to a host of other formats. While it worked for me whenever I needed it, Ghostscript developers recommend that you stop using the language switching binary (since it's 'almost non-supported' -- see KenS' comment to this answer) and instead switch to using the explicit binaries pcl6.exe (PCL input), gsvg.exe (SVG input, also 'almost non-supported') and gxps.exe (support status unclear to me).
So to 'convert PCL code to PDF format' as the request reads, you could use the pcl6 command line utility, a sister product to Ghostscript's gs/gswin32c.exe.
Sample commandline:
pcl6.exe \
-o output.pdf \
-sDEVICE=pdfwrite \
[...more parameters as required (optional)...] \
-f input.pcl
Updated as per KenS' hints in the comment....
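If you want to drive that command from a script, a thin wrapper is enough. The sketch below simply assembles the same arguments as the sample command line; the function names are invented for this example, and the executable name `pcl6` is an assumption that may differ on your installation.

```python
import subprocess


def build_pcl6_command(input_pcl, output_pdf, executable="pcl6", extra_args=()):
    """Assemble the argument list from the sample command line above."""
    return [
        executable,
        "-o", output_pdf,
        "-sDEVICE=pdfwrite",
        *extra_args,  # any further parameters, as required
        "-f", input_pcl,
    ]


def convert_pcl_to_pdf(input_pcl, output_pdf, **kwargs):
    """Run the converter and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pcl6_command(input_pcl, output_pdf, **kwargs), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.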
There is a series of reference books from HP; you could re-implement a PCL parser and output corresponding PDF.
You might start with the "PCL 5 Printer Language Technical Reference Manual" (http://h20000.www2.hp.com/bc/docs/support/SupportManual/bpl13210/bpl13210.pdf) . Search HP for more (http://search.hp.com/query.html?qt=PCL+reference).
Or you could steal code or ideas from GhostPCL (http://www.ghostscript.com/GhostPCL.html)

Batch source-code aware spell check

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.
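To make the "string literals only" requirement concrete, here is a minimal sketch of the idea for Python source, using the standard `tokenize` module. It is an illustration, not an existing tool: the function names are made up, and `known_words` stands in for a real dictionary that you would load from a word list.

```python
import ast
import io
import re
import tokenize


def string_literals(source):
    """Yield (line_number, text) for each plain string literal in Python source."""
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.STRING:
            try:
                value = ast.literal_eval(token.string)
            except (SyntaxError, ValueError):
                continue  # skip anything that is not a plain literal, e.g. f-strings
            if isinstance(value, str):
                yield token.start[0], value


def misspellings(source, known_words):
    """Yield (line_number, word) for words in string literals missing from known_words."""
    for line_number, text in string_literals(source):
        for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
            if word.lower() not in known_words:
                yield line_number, word
```

Identifiers and keywords never reach the dictionary check, because only STRING tokens are inspected. Running it over a whole code base is then a matter of walking the directory tree and calling `misspellings` on each file's contents.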
As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with its spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.
This seems a little old, but it seems to do a good job:
Source Code Spell Checker