What does (filename.java.i, filename.jar.i) extension mean - filenames

I have files named xxx.java.i,xxx.java.d,xxx.jar.i. I know that these file are somehow related to Java. What does this extension mean and for what is it used? Is it same type as the .class extension?

You should look at your build system for more information. It is possible that these are intermediate files that get transformed and renamed to ".java". For example, I've seen various build systems that use the ".i" suffix to mean "input", and perform various forms of variable substitution (e.g. changing something like "{VERSION_NUMBER}" to the version number of the library being compiled).

I think they are created by someone to serve his own purpose and unless we ask the author or see the content we won't know what it the purpose is.
If you see garbled characters, it's probably java bytecode and you can use some decompiler to see the code (see: How do I "decompile" Java class files?).

Related

Get importlib directives from type library

How can one programmatically determine which type libraries (GUID and version) a given native, VB6-generated DLL/OCX depends on?
For background: The VB6 IDE chokes when opening a project where one of the referenced type libraries can't load one of its dependencies, but it's not so helpful as to say which dependency can't be met--or even which reference has the dependency that can't be met. This is a common occurrence out my company, so I'm trying to supplement the VB6 IDE's poor troubleshooting information.
Relevant details/attempts:
I do have the VB source code. That tells me the GUIDs and versions as of a particular revision in the repo, but when analyzing a DLL/OCX/TLB file I don't know which version of the repo (if any--could be from a branch or might never have been committed to a branch) a given DLL/OCX corresponds to.
I've tried using tlbinf32.dll, but it doesn't appear to be able to list imports.
I don't know much about PE, but I popped open one of the DLLs in a PE viewer and it only shows MSVBVM60.dll in the imports section. This appears to be a special quirk of VB6-produced type libraries: they link only to MSVBVM60 but have some sort of delay-loading mechanism for the rest of the dependencies.
Even most of the existing tools I've tried don't give the information--e.g., depends.exe only finds MSVBVM60.dll.
However: OLEView, a utility that used to ship with Visual Studio, somehow produces an IDL file, which includes the importlib directives. Given that VB doesn't use IDL files, it's clearly generating the information somehow. So it's possible--I just have no idea how.
Really, if OLEView didn't do it I'd have given it up by now as impossible. Any thoughts on how to accomplish this?
It turns out that I was conflating basic DLL functionality and COM. (Not all DLLs are COM DLLs.)
For basic DLLs, the Portable Executable format includes a section describing its imports. The Optional Header's directory 1 is about the DLL's imports. Its structure is given by IMAGE_IMPORT_DESCRIPTOR. This is a starting point for learning about that.
COM DLLs don't seem to have an equivalent as such, but you can discover which other COM components its public interface needs: for each exposed interface, list out the types of their properties and their method arguments, and then use the Registry to look up where those types come from. tlbinf32.dll provides some of the basic functionality for listing members, etc. Here's and intro to that.

YGuard obfuscate single class, package and exclude libraries

I'm trying to use YGuard to obfuscate some parts of my program which contain encryption methods and other sensitive information (which I'll further protect in other ways once I figure this out).
Because the program is quite complex and contains quite many libraries it obviously gives a series of warning and finally fails with:
WARNING: Method initialize_ffi_type is native but com/sun/jna/Native is not kept/exposed.
WARNING: Method getAPIChecksum is native but com/sun/jna/Native is not kept/exposed.
[...]
yGuard was unable to resolve a class (java.lang.ClassNotFoundException: com.sun.tools.javac.parser.Parser$Factory)
Now whatever that means I'd like to
exclude libraries which being all open source have nothing to hide so far
obfuscate just the methods and variables of some Class or some package and leave the rest untouched.
So far in YGuard it seems I have to specify what I don't want to be obfuscated, however I have far too many classes, I'd like instead to do the opposite: Specify what I'd like to obfuscate and proceed increasing the number of Classes and packages I want obfuscated.
Thanks
It is the normal practice for obfuscators to specify what should be kept and not the other way around.
However, you can define library classpaths with the externalclasses rule (link). Classes that are defined in this path are neither obfuscated nor shrinked. The second error you are getting (ClassNotFoundException) indicates that you have not specified all libraries that your project depends on.
In order to obfuscate your code now, what you could do is:
Pack the code that you want to be obfuscated in one jar and define everything else as a library
use a patternset in your keep rule (link) to define everything to be kept except the classes that you want to have obfuscated.

What's the best approach to incremental compilation when building a DSL using Eclipse?

As suggested by the Eclipse documentation, I have an org.eclipse.core.resources.IncrementalProjectBuilder that compiles each source file and separately I also have a org.eclipse.ui.editors.text.TextEditor that can edit each source file. Each source file is compiled into its own compilation unit, but it can reference types from other (already compiled) source files.
Two tasks for which this is important are:
Compiling (to make sure the types we're using actually exist)
Autocomplete (to look up the type so we can see what properties/methods are present on it)
To accomplish this, I want to store a representation of all the compiled types in memory (referred to below as my "type store").
My question is two fold:
Task one above is performed by the builder and task two by the editor. So that they both have access to this type store, should I create a static store somewhere that they both can have access to, or does Eclipse provide a neater way to deal with this problem? Note that it is eclipse, not me, that instantiates the builders and editors when they are needed.
When opening eclipse, I don't want to have to rebuild the whole project just so I can re-populate my type store. My best solution so far is to persist this data somewhere and then repopulate my store from that (perhaps upon project open). Is this how other incremental compilers typically do this? I believe Java's approach is to use a special parser that efficiently extracts this data from the class files.
Any insights would be really appreciated. This is my first DSL.
This is an interesting question and one that doesn't have a simple solution. I'll try to describe a potential solution and also describe in a little bit more detail how JDT accomplishes incremental compilation.
First, a bit about JDT:
Yes, JDT does read class files for some of its information, but only for libraries that don't have source code. And this information is really only used for editing assistance (content assist, navigation, etc).
JDT computes incremental compilation by keeping track of dependencies between compilation units as they are compiled. This state information is stored on disk and retrieved and updated after each compile.
As a more complete example, let's say that after a full build, JDT determines that A.java depends on B.java, which depends on C.java.
If there is a structural change in C.java (a structural change is a change that can affect outside files (e.g., adding/removing a non-private field or method)), then B.java will be recompiled. A.java will not be recompiled since there was no structural change in B.java.
After this bit of clarification on how JDT works, here are some possible answers to your questions:
Yes. This must be done through statically accessible global objects. JDT does this through the JavaCore and JavaModelManager objects. If you don't want to use global singletons, then you can access to your type store available through your plugin's Bundle activator instance. The e4 project does allow dependency injection, which is probably even better (but is not really a part of the core Eclipse APIs).
I think persisting the information on the file system is your best bet. The only real way to determine incremental compile dependencies is to do a full build, so you need to persist the information somewhere. Again, this is how JDT does it. The information is stored in your workspaces' .metadata directory somewhere in the org.eclipse.core.resources plugin. You can have a look at the org.eclipse.jdt.internal.core.builder.State class to see the implementation.
So, this may not be the answer you are looking for, but I think this is the most promising way to approach your problem.

Batch source-code aware spell check

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.
As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with it's spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.
This seems little old but seems to do a good job
Source Code Spell Checker

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far
So I've run 'size' on it to determine how big the hex file is.
Then 'size' again to see how big each of the object files are that link to create the hex files. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck, there's some functions which I don't call directly (e.g. _vfprintf) and I can't find what calls it so I can remove the call (as I think I don't need it).
So what are the next steps?
Response to answers:
As I can see there are functions being called which take up a lot of memory. I cannot however find what is calling it.
I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
The linker is working as desired, I think, it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC
General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
In the original question was a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if one 1 is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called the you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.
Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependecy graph are objects and functions.
On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of just using "printf()" I use a very simple custom "printf()", this function can only print numbers in hexadecimal or decimal format not more. Most data structures are preallocated at compile time.
Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.
Just to double-check and document for future reference, but do you use Thumb instructions? They're 16 bit versions of the normal instructions. Sometimes you might need 2 16 bit instructions, so it won't save 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler & linke settings to package functions for individual linking.
Ok so in the end I just reduced the project to it's simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' file. Then when I had the file I commented everything out and slowly add things back in until the function popped up again. So in the end I found out what called it and removed all those calls...Now it works as desired...sweet!
Must be a better way to do it though.
To answer this specific need:
•I want to omit those functions (if possible) but I can't find what's
calling them!! Could be called from any number of library functions I
guess.
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine library dependency tree. It allows to easily browse up and down the calling tree among other things.
They provide a limited time evaluation, then you must purchase a license.
You could look at something like executable compression.