Why should we use the keyword "-keep" in proguard.cfg?

I don't understand why we need to use the keyword "-keep" in proguard.cfg.
The purpose of ProGuard is to make it hard for people to read the source code.
In my opinion, we should try to obfuscate all of the code, so why do we use "-keep" to leave some of it unchanged?

You're right, for best optimization and obfuscation results, you should use -keep as sparingly as possible. However, if the code is performing reflection, the options may be necessary to preserve specific classes, fields, and methods with their original names. For instance, an external configuration file may contain class names, which can only work if the classes are still present.
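For illustration, a minimal sketch of such a rule (the class name here is invented). If the application loads a class by name via reflection, e.g. Class.forName("com.example.plugin.MyPlugin").newInstance(), ProGuard must not rename or strip that class:

# proguard.cfg: preserve the reflectively loaded class and its no-arg constructor
-keep public class com.example.plugin.MyPlugin {
    public <init>();
}

Without such a rule, obfuscation could rename the class to something like a.a.b, and the Class.forName() lookup would then fail at runtime.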

Related

Is defining all constants of a project in a single class acceptable?

There are some constants and enumerations in a project, and each one is used by some other classes.
As a design pattern, is it acceptable to create a class for constants and enumerations definition? Or is there a better way to define and use those constants?
It depends on the problem domain. Generally speaking, it is fairly standard practice to keep them in a Java enumeration. The question is: how would you like to use those constants? In my experience, constants held in interfaces/enumerations end up duplicated and created over and over again, because developers don't know about the constants that already exist. The result is many files such as Constants.java, BusinessLogic.java, AppConstants.java, etc. This obscures their purpose, and then you don't know whether a given constant, let's say APP_MODE, should be used from Constants.java or from AppConstants.java.
One of the solutions is to keep those constants in one (or more) properties files and inject them using Spring's @Value annotation.
You can group them using prefixes, building groups separated by dots.
One of the advantages of property files is that the Java logic for using the properties stays the same, while the property file itself can vary depending on the application. A lot of flexibility, no redundancy.
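A minimal sketch of that approach (the property names and the settings class are invented for illustration):

# app.properties
app.mode=PRODUCTION
app.payment.retry-limit=3

// PaymentSettings.java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class PaymentSettings {

    // Injected from the property file; no Constants.java needed.
    @Value("${app.mode}")
    private String appMode;

    @Value("${app.payment.retry-limit}")
    private int retryLimit;
}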
Another solution is to create one service that provides properties/constants from a database. You can then differentiate the values across different environments, but that's another story.
If I were you, I would create a constant container class package by package. Just group the logically coherent parts together; otherwise you will increase the coupling and dependency. The most general constants (the problem-domain-independent ones) belong in the utility package's constant container class.

Difference between INCLUDE and modules in Fortran

What are the practical differences between using modules with the use statement or isolated files with the include statement? I mean, if I have a subroutine that is used a lot throughout a program: when or why should I put it inside a module or just write it in a separate file and include it in every other part of the program where it needs to be used?
Also, would it be good practice to write all subroutines intended to go in a module in separate files and use include inside the module? Especially if the code in the subroutines is long, so as to keep the code better organized (that way all subroutines are packed in the module, but if I have to edit one I don't need to go through a maze of code).
The conceptual differences between the two map through to very significant practical differences.
An INCLUDE line operates at the source level: it accomplishes simple ("dumb") text inclusion. In the absence of any special processor interpretation of the "filename" in the include line (there is no requirement for it actually to be a file), the complete source could quite easily be spliced together manually by the programmer and fed to the compiler with no difference whatsoever in the semantics of the source. Included source has no real interpretation in isolation; its meaning is completely dependent on the context in which the include line that references it appears.
Modules operate at the much higher entity level of the program, i.e. at the level where the compiler is considering the things that the source actually describes. A module can be compiled in isolation from its downstream users, and once it has been compiled the compiler knows exactly what things the module can provide to the program.
Typically what someone using include lines is hoping to do is what modules were actually designed to do.
Example issues:
Because entity declarations can be spread over multiple statements the entities described by included source might not be what you expect. Consider the following source to be included:
INTEGER :: i
In isolation it looks like this declares the name i as an integer scalar (or perhaps a function? Who knows!). Now consider the following scope that includes the above:
INCLUDE "source from above"
DIMENSION :: i(10,10)
i is now a rank two array! Perhaps you want to make it a POINTER? An ALLOCATABLE? A dummy argument? Perhaps that results in an error, or perhaps it is valid source! Throw implicit typing into the mix to really compound the potential fun.
An entity defined in a module is "completely" defined by the module. Attributes that are specific to the scope of use can be changed (VOLATILE, accessibility, etc), but the fundamental entity remains the same. Name clashes are explicitly called out and can be easily worked around with a rename clause on the USE statement.
Fortran has restrictions on statement ordering (specification statements must go before executable statements, etc.). Included source is also subject to those restrictions, again in the context of the point of inclusion, not the point of source definition.
Mix well with source ambiguity between statement function definitions (specification part) and assignment statements (executable part) for some completely obtuse error messages or, worse, silent acceptance by the compiler of erroneous code.
There are requirements on where the USE statement that references a module appears, but the source for the actual module program unit is completely independent of its point of use.
Fancy having some global state to be shared across related procedures and you want to use include? Let me introduce you to common blocks and the associated underlying concept of sequence association...
Sequence association is an unfortunate bleed-through of early underlying Fortran processor implementations, and it is an error-prone, inflexible, anti-optimisation anachronism.
Module variables make common blocks and their associated evils completely unnecessary.
If you were using include lines, note that you don't actually include the source of a commonly used procedure (the suggestion in your first paragraph would just result in a morass of syntax errors from the compiler). What you would typically do is include source that describes the interface of the procedure. For any non-trivial procedure, the source that describes the interface is different from the complete source of the procedure, which implies that you now need to maintain two source representations of the same thing. That is an error-prone maintenance burden.
As mentioned, the compiler automatically gains knowledge of the interface of a module procedure (the compiler's knowledge is "explicit" because it actually saw the procedure's code, hence the term "explicit interface"). No need for the programmer to do anything more.
A consequence of the above is that external subprograms should not be used at all unless there are very good reasons to the contrary (perhaps the existence of circular or excessively extensive dependencies) - the basic starting point should be to put everything in a module or main program.
Other posters have mentioned the source code organisation benefits of modules - including the ability to group related procedures and other "stuff" into the one package, with control over accessibility of internal implementation details.
I accept there is a valid use of INCLUDE lines as per the second paragraph of the question - where large modules become unwieldy in size. F2008 has addressed this with submodules, which also bring a number of other benefits. Once they become widely supported the include line work-around should be abandoned.
A second valid use is to overcome a lack of support by the language for generic programming techniques (what templates provide in C++) - i.e. where the types of objects involved in an operation may vary, but the token sequence that describes what to do on those objects is essentially the same. It might be another decade or so before the language sorts that out.
Placing procedures into modules and using those modules makes the interface of the procedure explicit. It allows a Fortran compiler to check for consistency between the actual arguments in a call and the dummy arguments of the procedure. This guards against a variety of programmer mistakes. An explicit interface is also necessary for certain "advanced" features of Fortran >=90; for example, optional or keyword arguments. Without the explicit interface, the compiler won't generate the correct call. Merely including a file doesn't provide these advantages.
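A minimal sketch of the difference (the module, procedure and file names are invented):

! greeting_mod.f90 -- compiled once; every user gets the explicit interface
module greeting_mod
  implicit none
contains
  subroutine greet(name)
    character(*), intent(in) :: name
    print *, 'Hello, ', name
  end subroutine greet
end module greeting_mod

! main.f90
program main
  use greeting_mod      ! the compiler can now check every call to greet
  implicit none
  call greet('world')   ! call greet(42) would be rejected at compile time
end program main

With an include-based approach, getting the same argument checking would mean writing and maintaining a separate interface block in the included file by hand.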
M.S.B.'s answer is great and is probably the most important reason to prefer modules over include. I'd like to add a few more thoughts.
Using modules reduces your compiled binary size if that is something that is important to you. A module is compiled once, and when you use it you are symbolically loading that module to use the code. When you include a file, you are actually inserting the new code into your routine. If you use include a lot it can cause your binary to be large and also increase your compile time.
You can also use modules to fake OOP style coding in Fortran 90 through clever use of public and private functions and user defined types in a module. Even if you didn't want to do that, it provides a nice way to group functions that logically belong together.
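A small sketch of that style (all names invented): the module plays the role of a class, the derived type holds private state, and only the listed procedures are visible to users.

module counter_mod
  implicit none
  private                        ! hide everything by default
  public :: counter, increment, get_value

  type counter
    private
    integer :: value = 0         ! hidden state, like a private field
  end type counter
contains
  subroutine increment(c)
    type(counter), intent(inout) :: c
    c%value = c%value + 1
  end subroutine increment

  integer function get_value(c)
    type(counter), intent(in) :: c
    get_value = c%value
  end function get_value
end module counter_mod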

Why is the java.util.Scanner class declared 'final'?

I use the Scanner class for reading multiple similar files. I would like to extend it to make sure they all use the same delimiter, and I can also add methods like skipUntilYouFind(String thisHere) that are valid for all of them.
I could make a utility class that contains them, or embed the Scanner class as a variable inside another class, but this is more cumbersome.
I have found some reasons to declare a class final, but why is it done here?
Probably because extending it and overriding some of its methods would likely break it. And making it easier to override methods would expose too much of the inner workings, so if in the future they decide to change those (for performance or other reasons), it would be harder for them to change the class without breaking all the classes that extend it.
For example, consider the following method in the class:
public boolean nextBoolean() {
    clearCaches();
    return Boolean.parseBoolean(next(boolPattern()));
}
Say you want to override this because you want 'awesome' to evaluate to a 'true' boolean (for whatever reason). If you override it, you can't call super.nextBoolean(), since that would consume the next token using the default logic. But if you don't call super.nextBoolean(), clearCaches() won't be called, possibly breaking the other, non-overridden methods. And you can't call clearCaches() yourself, because it's private. If they made it protected, but then realized it was causing a performance problem and wanted a new implementation that doesn't clear caches anymore, they might break your overridden implementation, which would still be calling it.
So basically it's so they can easily change the hidden parts inside the class, which are quite complex, while protecting you from making a broken child class (or a class that could easily be broken).
I suppose it is for security reasons. This class reads user input, so someone with bad intentions could extend it, modify its behavior, and you'd be screwed. If it is final, it is not that easy for the bad guy, because a Scanner type of his own (not java.util.Scanner) could not be substituted where java.util.Scanner is expected; the principles of polymorphism would stop him. The bad guy could be smart enough to write a bot/script which does this automatically on remote servers; he could even do it via dynamic class loading in a compiled application.
I think that the link you provided explains it all.
In your case it seems like you should prefer composition instead of inheritance anyway. You are creating a utility that has some predefined behavior, and that can hide some (or all) of the details of the Scanner class.
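A minimal sketch of the composition approach (the class and method names are invented):

import java.util.Scanner;

// Wraps a Scanner instead of extending it: the shared delimiter policy
// lives in one place, and Scanner's internals stay hidden.
public class RecordScanner {
    private final Scanner scanner;

    public RecordScanner(Readable source) {
        this.scanner = new Scanner(source);
        this.scanner.useDelimiter(";");   // same delimiter for every file
    }

    // Consume tokens until the given marker has been read.
    public void skipUntilYouFind(String marker) {
        while (scanner.hasNext() && !scanner.next().equals(marker)) {
            // keep skipping
        }
    }

    public String next() {
        return scanner.next();
    }
}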
I've seen many implementations that used inheritance in order to change a behavior. The end result was usually a monolithic design, and in some cases, a broken contract, and/or broken behavior.

Best practice for naming subclasses

I am often in a situation where I have a concept represented by an interface or class, and then I have a series of subclasses/subinterfaces which extend it.
For example:
A generic "DoiGraphNode"
A "DoiGraphNode" representing a resource
A "DoiGraphNode" representing a Java resource
A "DoiGraphNode" with an associated path, etc., etc.
I can think of three naming conventions, and would appreciate comments on how to choose.
Option 1: Always start with the name of the concept.
Thus: DoiGraphNode, DoiGraphNodeResource, DoiGraphNodeJavaResource, DoiGraphNodeWithPath, etc.
Pro: It is very clear what I am dealing with, it is easy to see all the options I have
Con: Not very natural? Everything looks the same?
Option 2: Put the special stuff in the beginning.
Thus: DoiGraphNode, ResourceDoiGraphNode, JavaResourceDoiGraphNode, PathBaseDoiGraphNode,
etc., etc.
Pro: It is very clear when I see it in the code
Con: Finding it could be difficult, especially if I don't remember the name, lack of visual consistency
Option 3: Put the special stuff and remove some of the redundant text
Thus: DoiGraphNode, ResourceNode, JavaResourceNode, GraphNodeWithPath
Pro: Not that much to write and read
Con: Looks like cr*p, very inconsistent, may conflict with other names
Name them for what they are.
If naming them is hard or ambiguous, it's often a sign that the class is doing too much (see the Single Responsibility Principle).
To avoid naming conflicts, choose your namespaces appropriately.
Personally, I'd use option 3.
Use whatever you like, it's a subjective thing. The important thing is to make clear what each class represents, and the names should be such that the inheritance relationships make sense. I don't really think it's all that important to encode the relationships in the names, though; that's what documentation is for (and if your names are appropriate for the objects, people should be able to make good guesses as to what inherits from what).
For what it's worth, I usually use option 3, and from my experience looking at other people's code option 2 is probably more prevalent than option 1.
You could find some guidance in a coding standards document, for example there is the IDesign document for C# here.
Personally, I prefer option 2. This is generally the way the .NET Framework names its objects. For instance look at attribute classes. They all end in Attribute (TestMethodAttribute). The same goes for EventHandlers: OnClickEventHandler is a recommended name for an event handler that handles the Click event.
I usually try to follow this in designing my own code and interfaces. Thus an IUnitWriter produces a StringUnitWriter and a DataTableUnitWriter. This way I always know what their base class is and it reads more naturally. Self-documenting code is the end-goal for all agile developers so it seems to work well for me!
I usually name them along the lines of option 1, especially when the classes will be used polymorphically. My reasoning is that the most important bit of information is listed first (i.e. the fact that the subclass is basically what the ancestor is, with extensions usually added).
I like this option also because when sorting lists of class names, the related classes are listed together. Since I usually name the translation unit (file name) the same as the class name, related class files are naturally listed together as well. Similarly, this is useful with incremental search.
Although I tended to use option 2 earlier in my programming career, I avoid it now because, as you say, it is 'inconsistent' and does not seem very orthogonal.
I often use option 3 when the subclass provides substantial extension or specification, or if the names would be rather long.
For example, my file system name classes are derived from String, but they greatly extend the String class and have a significantly different use/meaning:
Directory_entry_name derived from String adds extensive functionality.
File_name derived from Directory_entry_name has rather specialized functions.
Directory_name derived from Directory_entry_name also has rather specialized functions.
Also along with option 1, I usually use an unqualified name for an interface class.
For example, I might have a class inheritance chain:
Text (an interface)
Text_abstract (abstract (base) generalization class)
Text_ASCII (concrete class specific to ASCII encoding)
Text_unicode (concrete class specific to Unicode encoding)
I rather like that the interface and the abstract base class automatically appear first in the sorted list.
Option three more logically follows from the concept of inheritance. Since you're specializing the interface or class, the name should show that it's no longer using the base implementation (if one exists).
There are a multitude of tools to see what a class inherits from, so a concise name indicating the real function of the class will go farther than trying to pack too much type information into the name.

What is the best way to solve an Objective-C namespace collision?

Objective-C has no namespaces; much like C, everything is within one global namespace. Common practice is to prefix classes with initials, e.g. if you are working at IBM, you could prefix them with "IBM"; if you work for Microsoft, you could use "MS"; and so on. Sometimes the initials refer to the project, e.g. Adium prefixes classes with "AI" (as there is no company behind it whose initials you could take). Apple prefixes classes with NS and says this prefix is reserved for Apple only.
So far, so good. But prepending two to four letters to a class name is a very, very limited namespace. E.g., MS or AI could have entirely different meanings (AI could be Artificial Intelligence, for example), and some other developer might decide to use them and create an equally named class. Bang, namespace collision.
Okay, if this is a collision between one of your own classes and one of an external framework you are using, you can easily change the naming of your class, no big deal. But what if you use two external frameworks, both frameworks that you don't have the source to and that you can't change? Your application links with both of them and you get name conflicts. How would you go about solving these? What is the best way to work around them in such a way that you can still use both classes?
In C you can work around these by not linking directly to the library, instead you load the library at runtime, using dlopen(), then find the symbol you are looking for using dlsym() and assign it to a global symbol (that you can name any way you like) and then access it through this global symbol. E.g. if you have a conflict because some C library has a function named open(), you could define a variable named myOpen and have it point to the open() function of the library, thus when you want to use the system open(), you just use open() and when you want to use the other one, you access it via the myOpen identifier.
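A rough sketch of that C technique (the library name is invented; error handling kept minimal):

#include <dlfcn.h>
#include <stdio.h>

// Function pointer that will stand in for the library's open().
static int (*myOpen)(const char *path, int flags);

int main(void) {
    void *lib = dlopen("libother.so", RTLD_LAZY);   // hypothetical library
    if (!lib) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    // Bind the library's open() to our own name; the system open()
    // stays reachable under its usual name.
    myOpen = (int (*)(const char *, int))dlsym(lib, "open");
    return 0;
}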
Is something similar possible in Objective-C and if not, is there any other clever, tricky solution you can use resolve namespace conflicts? Any ideas?
Update:
Just to clarify this: answers that suggest how to avoid namespace collisions in advance or how to create a better namespace are certainly welcome; however, I will not accept them as the answer, since they don't solve my problem. I have two libraries and their class names collide. I can't change them; I don't have the source of either one. The collision is already there, and tips on how it could have been avoided in advance won't help anymore. I can forward them to the developers of these frameworks and hope they choose a better namespace in the future, but for the time being I'm searching for a way to work with both frameworks right now within a single application. Any solutions to make this possible?
Prefixing your classes with a unique prefix is fundamentally the only option, but there are several ways to make this less onerous and ugly. There is a long discussion of options here. My favorite is the @compatibility_alias Objective-C compiler directive (described here). You can use @compatibility_alias to "rename" a class, allowing you to name your class using an FQDN or some such prefix:
@interface COM_WHATEVER_ClassName : NSObject
@end

@compatibility_alias ClassName COM_WHATEVER_ClassName

// now ClassName is an alias for COM_WHATEVER_ClassName
@implementation ClassName // OK
// blah
@end

ClassName *myClass; // OK
As part of a complete strategy, you could prefix all your classes with a unique prefix such as the FQDN and then create a header with all the @compatibility_alias directives (I would imagine you could auto-generate said header).
The downside of prefixing like this is that you have to enter the true class name (e.g. COM_WHATEVER_ClassName above) in anything that needs the class name from a string, besides the compiler. Notably, @compatibility_alias is a compiler directive, not a runtime function, so NSClassFromString(@"ClassName") will fail (return nil); you'll have to use NSClassFromString(@"COM_WHATEVER_ClassName"). You can use ibtool via a build phase to modify class names in an Interface Builder nib/xib so that you don't have to write the full COM_WHATEVER_... in Interface Builder.
Final caveat: because this is a compiler directive (and an obscure one at that), it may not be portable across compilers. In particular, I don't know if it works with the Clang frontend from the LLVM project, though it should work with LLVM-GCC (LLVM using the GCC frontend).
If you do not need to use classes from both frameworks at the same time, and you are targeting platforms which support NSBundle unloading (OS X 10.4 or later, no GNUStep support), and performance really isn't an issue for you, I believe that you could load one framework every time you need to use a class from it, and then unload it and load the other one when you need to use the other framework.
My initial idea was to use NSBundle to load one of the frameworks, then copy or rename the classes inside that framework, and then load the other framework. There are two problems with this. First, I couldn't find a function to rename or copy a class. Second, any other classes in that first framework which reference the renamed class would now reference the class from the other framework.
You wouldn't need to copy or rename a class if there were a way to copy the data pointed to by an IMP. You could create a new class and then copy over ivars, methods, properties and categories. Much more work, but it is possible. However, you would still have a problem with the other classes in the framework referencing the wrong class.
EDIT: The fundamental difference between the C and Objective-C runtimes is, as I understand it, that when libraries are loaded, the functions in those libraries contain pointers to any symbols they reference, whereas in Objective-C, they contain string representations of the names of those symbols. Thus, in your example, you can use dlsym to get the symbol's address in memory and attach it to another symbol. The other code in the library still works because you're not changing the address of the original symbol. Objective-C uses a lookup table to map class names to addresses, and it's a 1-1 mapping, so you can't have two classes with the same name. Thus, to load both classes, one of them must have its name changed. However, when other classes need to access one of the classes with that name, they will ask the lookup table for its address, and the lookup table will never return the address of the renamed class given the original class's name.
Several people have already shared some tricky and clever code that might help solve the problem. Some of the suggestions may work, but all of them are less than ideal, and some of them are downright nasty to implement. (Sometimes ugly hacks are unavoidable, but I try to avoid them whenever I can.) From a practical standpoint, here are my suggestions.
In any case, inform the developers of both frameworks of the conflict, and make it clear that their failure to avoid and/or deal with it is causing you real business problems, which could translate into lost business revenue if unresolved. Emphasize that while resolving existing conflicts on a per-class basis is a less intrusive fix, changing their prefix entirely (or using one if they're not currently, and shame on them!) is the best way to ensure that they won't see the same problem again.
If the naming conflicts are limited to a reasonably small set of classes, see if you can work around just those classes, especially if one of the conflicting classes isn't being used by your code, directly or indirectly. If so, see whether the vendor will provide a custom version of the framework that doesn't include the conflicting classes. If not, be frank about the fact that their inflexibility is reducing your ROI from using their framework. Don't feel bad about being pushy within reason — the customer is always right. ;-)
If one framework is more "dispensable", you might consider replacing it with another framework (or combination of code), either third-party or homebrew. (The latter is the undesirable worst-case, since it will certainly incur additional business costs, both for development and maintenance.) If you do, inform the vendor of that framework exactly why you decided to not use their framework.
If both frameworks are deemed equally indispensable to your application, explore ways to factor out usage of one of them into one or more separate processes, perhaps communicating via DO as Louis Gerbarg suggested. Depending on the degree of communication, this may not be as bad as you might expect. Several programs (including QuickTime, I believe) use this approach to gain the more granular security of Seatbelt sandbox profiles in Leopard, such that only a specific subset of the code is permitted to perform critical or sensitive operations. Performance will be a tradeoff, but it may be your only option.
I'm guessing that licensing fees, terms, and durations may prevent instant action on any of these points. Hopefully you'll be able to resolve the conflict as soon as possible. Good luck!
This is gross, but you could use distributed objects to keep one of the classes only in a subordinate program's address space, and RPC to it. That will get messy if you are passing a ton of stuff back and forth (and it may not be possible if both classes directly manipulate views, etc.).
There are other potential solutions, but a lot of them depend on the exact situation. In particular, are you using the modern or legacy runtimes, are you fat or single architecture, 32 or 64 bit, what OS releases are you targeting, are you dynamically linking, statically linking, or do you have a choice, and is it potentially okay to do something that might require maintenance for new software updates.
If you are really desperate, what you could do is:
Not link against one of the libraries directly
Implement an alternate version of the objc runtime routines that changes the name at load time (check out the objc4 project; what exactly you need to do depends on a number of the questions I asked above, but it should be possible no matter what the answers are).
Use something like mach_override to inject your new implementation
Load the new library using normal methods; it will go through the patched linker routine and get its class name changed.
The above is going to be pretty labor intensive, and if you need to implement it against multiple archs and different runtime versions it will be very unpleasant, but it can definitely be made to work.
Have you considered using the runtime functions (/usr/include/objc/runtime.h) to clone one of the conflicting classes to a non-colliding class, and then loading the framework with the colliding class? (This would require the colliding frameworks to be loaded at different times to work.)
With the runtime you can inspect a class's ivars, methods (with names and implementation addresses) and name, and you can also create your own class dynamically with the same ivar layout and the same method names/implementation addresses, differing only by name (to avoid the collision).
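A rough sketch of that idea (the class names are invented, and it ignores ivars, properties and protocols, which would also need copying):

#import <objc/runtime.h>
#include <stdlib.h>

void cloneClassUnderNewName(void) {
    Class original = objc_getClass("CollidingClass");
    Class clone = objc_allocateClassPair(class_getSuperclass(original),
                                         "MyPrefix_CollidingClass", 0);
    unsigned int count = 0;
    Method *methods = class_copyMethodList(original, &count);
    for (unsigned int i = 0; i < count; i++) {
        // Re-register each instance method under the clone.
        class_addMethod(clone, method_getName(methods[i]),
                        method_getImplementation(methods[i]),
                        method_getTypeEncoding(methods[i]));
    }
    free(methods);
    objc_registerClassPair(clone);
}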
Desperate situations call for desperate measures. Have you considered hacking the object code (or library file) of one of the libraries, changing the colliding symbol to an alternative name of the same length but a different spelling? Inherently nasty.
It isn't clear whether your code is directly calling the two functions with the same name but different implementations, or whether the conflict is indirect (nor is it clear whether it makes any difference). However, there's at least an outside chance that renaming would work. It might be an idea, too, to minimize the difference in the spellings, so that if the symbols are held in sorted order in a table, the renaming doesn't move things out of order. Things like binary search get upset if the array they're searching isn't in the sorted order they expect.
@compatibility_alias will be able to solve class namespace conflicts, e.g.
@compatibility_alias NewAliasClass OriginalClass;
However, this will not resolve any of the enum, typedef, or protocol namespace collisions. Furthermore, it does not play well with @class forward declarations of the original class. Since most frameworks come with these non-class things such as typedefs, you would likely not be able to fix the namespacing problem with just @compatibility_alias.
I looked at a similar problem to yours, but I had access to source and was building the frameworks.
The best solution I found for this was using @compatibility_alias conditionally with #defines to support the enums/typedefs/protocols/etc. You can do this conditionally on the compilation unit for the header in question, to minimize the risk of expanding things in the other, colliding framework.
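A hedged sketch of what such a conditional header can look like (all names are invented; only the pattern matters):

// ACMEShortNames.h -- include only in translation units that do NOT
// also pull in the other, colliding framework's headers.
// Assumes ACME_Parser has already been declared (framework header imported).
#ifdef USE_ACME_SHORT_NAMES
@compatibility_alias Parser ACME_Parser;        // class alias
#define ParserMode ACME_ParserMode              // typedef via #define
#define ParserModeStrict ACME_ParserModeStrict  // enum constant via #define
#endif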
It seems that the issue is that you can't reference header files from both systems in the same translation unit (source file). If you create Objective-C wrappers around the libraries (making them more usable in the process), and only #include the headers for each library in the implementation of the wrapper classes, that would effectively separate the name collisions.
I don't have enough experience with this in Objective-C (I'm just getting started), but I believe that is what I would do in C.
Prefixing the files is the simplest solution I am aware of.
Cocoadev has a namespace page, which is a community effort to avoid namespace collisions.
Feel free to add your own to the list; I believe that is what it is for.
http://www.cocoadev.com/index.pl?ChooseYourOwnPrefix
If you have a collision, I would suggest you think hard about how you might refactor one of the frameworks out of your application. Having a collision suggests that the two are doing similar things as it is, and you likely could get around using an extra framework simply by refactoring your application. Not only would this solve your namespace problem, but it would make your code more robust, easier to maintain, and more efficient.
Rather than a more technical solution, this would be my choice if I were in your position.
If the collision is only at the static link level then you can choose which library is used to resolve symbols:
cc foo.o -ldog bar.o -lcat
If foo.o and bar.o both reference the symbol rat then libdog will resolve foo.o's rat and libcat will resolve bar.o's rat.
Just a thought, not tested or proven, and it could be way off the mark, but have you considered writing an adapter for the classes you use from the simpler of the frameworks? Or at least for their interfaces?
If you were to write a wrapper around the simpler of the frameworks (or the one whose interfaces you access the least), you could compile that wrapper into a library. Given that the library is precompiled and only its headers need be distributed, you'd effectively hide the underlying framework and would be free to combine it with the second framework without clashing.
I appreciate, of course, that there are likely to be times when you need to use classes from both frameworks at the same time; however, you could provide factories for further class adapters of that framework. On the back of that point, I guess you'd need a bit of refactoring to extract the interfaces you are using from both frameworks, which should provide a nice starting point for you to build your wrapper.
You could build upon the library as and when you need further functionality from the wrapped library, and simply recompile when it changes.
Again, in no way proven, but I felt like adding a perspective. Hope it helps. :)
If you have two frameworks that have the same function name, you could try dynamically loading the frameworks. It'll be inelegant, but possible. How to do it with Objective-C classes, I don't know; I'm guessing the NSBundle class will have methods that'll load a specific class.