Batch source-code aware spell check - spell-checking

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.

As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with it's spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.

This seems little old but seems to do a good job
Source Code Spell Checker

Related

Workflow / best practices for XLIFF

I am using a command line tool (ng-xi18n) to extract the i18n strings from an angular 2 app I wrote. The output of this command is a messages.xlf file. Coming from a .po background, and being not familiar with .xlf, I assumed that this file is the equivalent to the .pot file (correct me if I am wrong).
I then assumed that if I want to translate my app, I had to cp messages.xlf messages.de.xlf to have a copy (messages.de.xlf) of the template file (messages.xlf) where I can translate each message into German (hence the .de.xlf).
After translating some dummy texts and running the app, I saw that it worked as expected, so I quit translating and continued developing the app. After some time, I added more i18n strings, and eventually thought that I had to update my template. And this is where things got hardly maintainable. I updated the template messages.xlf file, and quickly was wondering how I could update the new strings to my already translated messages.de.xlf file without loosing my progress.
When I was developing using .po files, this was no problem thanks to good tools like poEdit, but I didn't find anything comparable for .xlf. After trying some tools, I thought that the best choice would be Lokalize, but I didn't find a possibility to merge the template file to already translated (but outdated) files either.
Up to now, this was rather an essay than a question, so here's a quick summary:
Is the workflow of dealing with .xlf files really comparable to .po as I initially thought (described above), or is it completely different?
How am I suppose to update my already translated files?
What are the best practices dealing with .xlf files?
What are proof of concept tools to work with .xlf?
Sidenotes:
The Lokalize handbook was not helpful at all. I see a lot of functions that sound promising, like:
"File" > "Update file from template". I did not find anything in the handbook to explain this function. If I click on this, nothing happens.
"Sync" > "Open file for sync/merge". This seems to be a function to merge two similar files (by multiple translators) rather than a tool to update the translation file from a template. Even though there is a tooltip in Lokalize's primary sync tab, notifying me about "x unmatched entries", I just couldn't find anything to append those unmatched entries to my .de.xlf file.
[Update] Turns out, I had similar issues as in this question. After downgrading my version of Lokalize to the suggested one, many issues (including the ones mentioned in the question) disappeared. However, now the "Update file from template" option is greyed out, and I don't know why.
I also tried OmegaT, which does not work at all on my platform (Ubuntu 16.04).
[Update] Virtaal works great for merging new strings from a template, but the UI in general is very poorly designed...
Googling did not help, as every hit seems to be related to XCode or something.
Thanks for any help in advance, I really appreciate it
I wrote a small npm command line tool called xliffmerge.
In principle it does the same, that Roland Oldengarm does with his gulp tasks described in his blog article.
It is free and you can have a look at it at https://github.com/martinroob/ngx-i18nsupport#readme
The best workflow automation solution I have seen described so far is from Roland Oldengarm's blog entry "Angular 2: Automated i18n workflow using gulp". To summarize, in a few dozen lines of Gulp code he created the tooling to handle some of the challenges you faced. Specifically it runs ng-xi18n to extract the messages; creates an English translation with sources copied to targets; updates existing translations by adding new trans-units, keeping existing ones, and removing missing ones; and then exposes all xlf files as TypeScript string constants. These last strings can then be imported to supply the bootstrapModule with its translation provider options.
Caveat: I have not used this exact solution (and code) myself, but I was able to expose generated xlf as TypeScript strings and use them in an app in a manner similar to what he described. As for maintaining translations, I have leveraged IntelliJ IDEA (WebStorm) file comparison features and Counterparts Lite (for Mac) for that. My own efforts are still in early stages but are working end to end for an application that is in active development.
Official Angular docs are now updated for Internationalization (i18n) at https://angular.io/docs/ts/latest/cookbook/i18n.html including a section specifically for creating a translation source file with the ng-xi18n tool.

Domain specific language IDE

I've recently developed a domain-specific language using flex and bison. I would like to create a user interface for editing script files using this language. In particular I would like to have common functionalities such as file handling, menus, buttons, syntax highlighting, error checking and so on. Do you know any tool which can be used to develop such kind of application? I would prefer one which can give me a prototype rapidly.
such as file handling, menus, buttons, syntax highlighting, error
checking
I think that file handling, menus, buttons and highlighting are your least concerns. What you call "error checking" on the other hand. That can be a tough nut. I will try to give you some pointers to how you can (in a somewhat primitive manner) detect errors on the fly as the user inputs source code in the editor.
I assume you wish for something like Eclipses (for java at least) real time analysis of the written code in the editor? I'm not familiar with how Eclipse work internally but this is probably done by some pre-compilation process that processes all source code again and again as you change it.
One way to prototype this (and indeed build a non-prototype as well) would be to use Flex and Bison, and I notice you already is familiar with these tools. You can build you grammar and create action code for all interesting parts so you can find syntax deviations fairly easy. After this you make your editor run the flex and bison generated c-code as the user writes the source code in you IDE and have some way of displaying the output. Either in a terminal-like status window or directly in the text-editing field (as Eclipse does) (the latter case is probably a pain to build but by no means impossible and would give you IDE a professional touch).
Suppose you would like to build an IDE for ADA 95 the following Flex and Bison (Actually Lex and Yacc) code could help you do exactly this (it's a decent syntax analyzer that reports errors (what and where)):
http://www.adaic.org/resources/add_content/standards/95lrm/lexer9x.l
http://www.adaic.org/resources/add_content/standards/95lrm/grammar9x.y
Hope this helps.
Edit:
to have cool error highlighting and such in the text-editor field of your IDE you could let your bison generated syntax analyser generate something thats easy to parse, like XML, that contains the type of error and where it lies (row and column for example) and then use that to display the errors... you simply embed an XML parser in the IDE (lots of free one available) and extract the data you need and change the display accordingly... That shouldn't be rocket science when I think about it.

What open source tools could help me understand a large legacy application written in C?

I need a tool that I can use to get a better understanding of a large C
project. I'd like to be able to see the relationship between the various C
modules and what calls what, most used functions, what headers are used, etc.
I've searched here and Google but all the source code analysis tools seem to give
you the number of lines of code and other metrics that I'm not interested in. I just
want to get a high level view of how things are structured and interconnected before jumping into the code.
Does anything like this exist?
I've looked at these but they do not seem to do what I want: Source Code Tools
Since posting this I've tried Doxygen and it seems to give me some of what I need. Any others?
Try GNU cflow, that will analyze the call tree of the functions - you will nicely see the call hierarchy of the functions. Or browse the code with Eclipse.
Source Navigator may be helpful for some things (I used it to see call trees). See screenshots.
cxref builds annotated source code cross reference that's easy to view and navigate (I used to create HTML reference of some of my code). See cxref's output on its own source code here. Can be used to document the code.
It is not OSS, but the tool CppDepend can certainly help when it comes to understand a large legacy application written in C or C++.

What does (filename.java.i, filename.jar.i) extension mean

I have files named xxx.java.i,xxx.java.d,xxx.jar.i. I know that these file are somehow related to Java. What does this extension mean and for what is it used? Is it same type as the .class extension?
You should look at your build system for more information. It is possible that these are intermediate files that get transformed and renamed to ".java". For example, I've seen various build systems that use the ".i" suffix to mean "input", and perform various forms of variable substitution (e.g. changing something like "{VERSION_NUMBER}" to the version number of the library being compiled).
I think they are created by someone to serve his own purpose and unless we ask the author or see the content we won't know what it the purpose is.
If you see garbled characters, it's probably java bytecode and you can use some decompiler to see the code (see: How do I "decompile" Java class files?).

How to decide on document file extension?

I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:
How important is it nowadays to stick to 3 characters?
Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how much a given app is popular)
Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
Finally, do you have other important guidelines when choosing for your own programs?
PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.
FileInfo.com lists a lot of file extensions along with their own estimation of how much it is ued.
I suggest a unique extension (rather then xml.gz) so that the OS can identify the file type to users when looking at a file listing or whatever. 'Ringing a bell' is important, especially if you will have less sophisticated users.
I don't see any need to stick to 3 characters, but I wouldn't go bigger than 5 (I don't suppose I have a real reason for this, other than personal preference).
How important is it nowadays to stick to 3 characters?
It's not unless you have to support older operating systems. All current OSes handle >3 char file extensions without any problems. Think of .html, .config, .resx, and I'm sure there are more.
Where can you check how much this file extension is already used?
check out FileExt.
Do I really need to use a file-specific extension? My save
format is gzip'ed XML, so I could name
it .xml.gz, but I fear it would
confuse beginning users (i.e. when you
see it, it does not immediately "ring
a bell").
Remember that windows (and windows users) associate files with applications by extension, so using something too generic like .xml.gz may cause problems. You are probably better coming up with something that is more specific to your file type or application. Users don't care weather your format is gzipped xml internally, they care about what is in the file. Think about abstraction layers, your users will think of it as a file containing chemistry info not gzipped xml, so .chem is far more appropriate than .xml.gz
Some suggestions of things to thing about:
Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.
Don't clash with anything common in your industry domain that your user demographic is likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, and any of their competitors and avoid them.
Make sure your app includes the options to register and unregister the extension. don't just automatically do it in the installation, make sure it's an option.
Remember unix/linux and Mac are case sensitive, so consider sticking to always all lower case by default.
Remember CD/DVD file naming rules are stricter, so don't use non alpha numeric characters.
Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
There is more info here.
Wikipedia has lists of files extensions here (by type) and here (alphabetical), and also some general information
Depends on the platform, but in general, not very important for newer Operating Systems. Check the documentation for the platforms you're targeting.
I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.
Not unless you have some reason to do so. Examples would be "I want to ensure that Windows always opens this program with my app". I'm not sure that your users need to be concerned with the extension anyway. The default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as allowing your program to easily identify files it should be able to handle, for example) then you could use the extension, or you could come up with something else.
I have only ever once written a program where I thought I needed to come up with my own extension. I used my initials. Then later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions seem to be something that seems to mean something. (.doc for documents, etc.) so something meaningful is a good idea if you do need to go this route.
It sure depends on the OSes you want to support, but people have globally moved over the 3-characters extension limit these days: .html is well used for webpages, for example.
Of course, if you go to much longer extensions, people will stop visually recognizing it as a file extension, I think...
Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.
The Wikipedia list of file formats is pretty good. Some mime mapping lists will list common extensions associated with those mappings. Ray already mentioned FileInfo.com.
It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.
I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)
Have it as document_name.app_name.xml.gz where document_name and app_name are variables, the latter some easily readable and recognisable short string of your application's title.
Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.
I agree that .xml.gz would confuse users, however keep in mind that modern systems are moving into recognizing files not based on extensions but by probing their headers and even contents instead. In fact, users do not often even see the extensions. For gzipped XML files, a system may decide to first unpack the file stream in memory, then find out it is a literal XML file, then it may take its 'xmlns' as the application identifier. However, such systems are not yet widespread use. In any case, don't make the mistake of only opening files by extension - be smart and raise the bar - do exactly the above to find out if the file can be considered a document for your application.