How does the organisation of classes in categories and packages work in different versions of Pharo? - smalltalk

Can someone explain how the organisation of classes in Pharo works in different versions of Pharo?
All Classes are part of the Smalltalk global (have always been, seem to stay like this?)
Classes can have a Category, but that's only a kind of tag? (has always been, seems to stay like this? But the categories are somehow mapped to packages sometimes?)
There are different kinds of Packages in different Versions of Pharo
MCPackages representing Monticello Packages
PackageInfo
RPackage (Pharo 1.4)?
In addition there is SystemNavigation which somehow helps navigating classes and methods based on some of the above mentioned constructs?

Classes
The fact that classes are keys in the Smalltalk global is an implementation detail. As long as there is a single global namespace for class names, it is likely that the implementation will stay the same.
Class Categories
The class category is very much like a tag. A class can only be in one category at a time. Originally the class category was used by the Browser for organizing the classes in the system.
When Monticello was created, the class category was overloaded to also indicate membership in a Monticello package. The MCPackage and PackageInfo classes were created to manage this mapping.
PackageInfo does all the heavy lifting: finding the classes and loose methods that belong to a package.
MCPackage is a Monticello-specific wrapper for PackageInfo that adds some protocol that wasn't necessarily appropriate for the more general PackageInfo.
Packages
Overloading the class category for package membership was a neat trick to ease the adoption of Monticello (existing development tools didn't need to be taught Monticello), however, it is still a trick. Not to mention the fact that the implementation of PackageInfo was not very efficient.
RPackage was created to address the performance problems of PackageInfo and to be used as part of the next generation of development tools.
Both package implementations will continue to exist until PackageInfo can be phased out.
SystemNavigation
As Frank says,
SystemNavigation is a class that, as its name suggests, permits easy
querying of a number of different things: the classes in the image,
senders-of, implementors-of, information about packages loaded in the
image and so on.

Classes are, at the moment at least, the keys in the Smalltalk dictionary.
PackageInfo contains information about a grouping of classes and extensions to other packages.
A Monticello package contains a deployable unit of code. Usually one of these will correspond to a PackageInfo instance. (Hitting the "+Package" button in a Monticello Browser will create one of these, for instance.) A Monticello package may contain pre-load and post-load scripts, so the two classes perform separate, if related, functions.
SystemNavigation is a class that, as its name suggests, permits easy querying of a number of different things: the classes in the image, senders-of, implementors-of, information about packages loaded in the image and so on.

Related

Is creating a module with interfaces only a good idea?

Creating a module (bundle, package, whatever) with only interfaces seems to me a strange idea. Yet, I don't know the other best solution to solve the following architectural requirement.
There is often a need for a set of utilities. In many projects I see a "utils" folder, or even a separate package (module) holding the frequently used ones.
Now consider the idea that you don't want to depend upon a concrete utils set. Instead you, therefore, use interfaces.
So you may create the whole project, with multiple modules, dependent only on the "Utils-Interfaces" set, which could be a separate module. Then you think you can re-use it in other projects, as these utils are frequently used.
So what do you do? Create a separate module (package, bundle...) containing interfaces that define the methods to be implemented by concrete utility classes? And re-use this "glue-interfaces-package" (possibly with other "glues", such as bridges, providers etc.) in your various other projects? Or is there a better way to design the architecture so that the utilities can easily be switched from one implementation to another?
It seems a bit odd to have an interface for utility methods, as it should be clear what they do. Also, in most languages you won't have static dispatch anymore. And you wouldn't solve a problem by having interfaces for utility methods. I think it would make more sense to look for a library doing the same thing, or to write your own if such functionality isn't already implemented. Very specific things should be tied to the project, though.
Let's look at an example in Java:
public static boolean isDigitOnly(String text) {
    // String.matches is called on the input and takes the regex as its argument
    return text.matches("\\d+");
}
Let's assume one would use an interface. That would mean that you have to have an instance of such an implementation, most likely a singleton. So what's the point of that? You would write the method head twice and you don't have any advantage; interfaces are used for loose coupling, however such generic utility methods aren't bound to your application.
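To make that cost concrete, here is a minimal sketch of both variants side by side. The names TextUtils, TextChecks and RegexTextChecks are hypothetical and exist only for this illustration.

```java
// Variant 1: a plain static utility method, callable without any instance.
final class TextUtils {
    private TextUtils() {}

    static boolean isDigitOnly(String text) {
        return text.matches("\\d+");
    }
}

// Variant 2: the same utility hidden behind an interface (hypothetical name).
interface TextChecks {
    boolean isDigitOnly(String text);
}

// The implementation must be instantiated somewhere; here it is an enum singleton.
enum RegexTextChecks implements TextChecks {
    INSTANCE;

    @Override
    public boolean isDigitOnly(String text) {
        return text.matches("\\d+");
    }
}
```

Callers of the second variant write RegexTextChecks.INSTANCE.isDigitOnly(text), or have a TextChecks handed to them, instead of TextUtils.isDigitOnly(text). The method head exists twice and the behaviour is identical.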
So maybe you just want to use a library. And actually there is one for exactly this use case: Apache Commons. Of course you may not want to include such a big library for a single method. However, if you need this many utility methods you may want to use it.
Now I've explained how to use and reuse utility methods; however, a part of your question was about using different implementations.
I can't see many cases where you would want this. If, for example, you have a method specific to a certain implementation of sockets, you may instead want
A) the utility method as a part of the API
B) an interface for different socket implementations on which you have one common utility method
If you cannot apply this to your problem, it's probably not a utility method or I didn't consider it. If you could provide me with a more specific problem I'd be happy to give you a more concrete answer.

When exactly does a class/package depend on another?

Many articles/books/.... talk about class or package dependency, few explain what it is. I did find some definitions, but they vary and probably don't cover all cases. E.g.:
"when one class uses another concrete class within its implementation" (so there exists no dependency on an interface?)
"when a class uses another as a variable" (what about inheritance?)
"if changes to the definition of one element may cause changes to the other" (so dependency is a transitive relationship not just on packages, but also on class level?)
"the degree to which each program module relies on each one of the other modules" (but how do you define "relies"?)
Further aspects to consider are method parameters, dependency injection, aspect oriented programming, generics. Any more aspects?
So, can you give a (formal) definition for dependency amongst classes and amongst packages that is fool-proof and covers all these cases and aspects?
If you are asking for dependency in the context of inversion of control or dependency injection, well, you're probably interested in classes that interact with one another directly. That means mostly constructor parameters and properties.
In the context of a UML domain diagram, you're probably interested in "real world" dependency. A dog needs food. That's a dependency. The dog's Bark() method returns a Sound object: that's not something you're interested in, in a UML domain model. The dog doesn't depend on sounds to exist.
You could go philosophical on this also: All classes depend on each other to accomplish a common goal; a (hopefully) great software.
So, all in all, dependency or coupling is not a matter of yes or no. It really depends on the context and on the degree of coupling (weak, strong). I think that explains why there are so many divergent definitions of dependency.
I wrote a blog post on that topic a while ago: Understanding Code: Static vs Dynamic Dependencies. Basically you need to make a difference between static dependencies, those that are resolved by the compiler at compile-time, and dynamic dependencies, those that are resolved by the runtime (JVM or CLR) at run-time.
static dependencies are typically caused by calls to static/final methods, reads/writes to a field, or, in the definition of a class C, the implementation of an interface I by C: all those associations between code elements that can be found explicitly in the bytecode and source code.
dynamic dependencies are typically provoked by everything that abstracts a method call at compile time, like calls to abstract/virtual methods (polymorphism), variables or parameters typed with an interface (the implementation class is abstracted at compile-time), but also delegates (.NET) or pointers to function (C++).
Most of the time, when you'll read about dependencies in the literature, they are talking about static dependencies.
A static dependency is direct (meaning not transitive). A tool like NDepend, which I mention in the blog post, can also infer indirect (or call it transitive) static dependencies from the set of direct static dependencies.
The idea I defend in the blog post is that when it comes to understanding and maintaining a program, one needs to focus mostly on the static dependencies, the ones found in the source code. Indeed, abstraction facilities are used to, well ... abstract the implementation away from callers. This makes source code much easier to develop and maintain. There are however situations, typically at debugging time, where one needs to know what's really behind an abstraction at run-time.
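A small Java sketch may help to separate the two kinds. Shape, Circle and AreaPrinter are hypothetical names made up for this illustration; they do not come from the blog post.

```java
import java.util.Locale;

// The abstraction that callers are compiled against.
interface Shape {
    double area();
}

// Static dependency Circle -> Shape: the "implements" clause is in the source and bytecode.
final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

final class AreaPrinter {
    private AreaPrinter() {}

    // Static dependency AreaPrinter -> Shape only. Which area() implementation
    // actually runs is decided at run time: that is the dynamic dependency.
    static String describe(Shape shape) {
        return String.format(Locale.ROOT, "area=%.2f", shape.area());
    }
}
```

Reading AreaPrinter alone tells you it needs Shape. Only a debugger or a call-site search tells you that a Circle is behind the abstraction in a given run.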
This post is about static dependency - for dynamic dependency and the difference, see
Patrick Smacchia's answer.
In an easy to understand way: an entity (class or package) A depends on an entity B when A cannot be used standalone without B.
Inheritance, aggregation, composition: all of them introduce a dependency between the related entities.
so there exists no dependency on an interface?
there is, but interface only serves as the glue.
what about inheritance?
see above.
so dependency is a transitive relationship not just on packages, but also on class level?
yep.
but how do you define "relies"?
see above "easy to understand" definition. also related to the 3rd definition you posted.
Update:
So if you have interface A in Package P1, and class C in Package P2 uses A as
method parameter, or
local variable woven into C via AOP, or
class C implements A, or
class C<E extends A>,
then C depends on A and P2 depends on P1.
But if interface A is implemented by class B and class C programs against the interface A and only uses B via dependency injection, then C still (statically!) only depends on A, not on B, because the point of dependency injection is that it doesn't make glued components dependent.
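As a rough sketch of that last case, with hypothetical names standing in for A (Greeter), B (EnglishGreeter) and C (WelcomeService):

```java
// Plays the role of interface A.
interface Greeter {
    String greet(String name);
}

// Plays the role of class B: depends statically on Greeter.
final class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Plays the role of class C: its source mentions Greeter only, never EnglishGreeter.
final class WelcomeService {
    private final Greeter greeter;

    WelcomeService(Greeter greeter) { // the implementation is injected
        this.greeter = greeter;
    }

    String welcome(String name) {
        return greeter.greet(name) + "!";
    }
}

// The wiring code is the one place that statically depends on both C and B.
final class Wiring {
    private Wiring() {}

    static WelcomeService create() {
        return new WelcomeService(new EnglishGreeter());
    }
}
```

WelcomeService compiles without EnglishGreeter on the classpath; only Wiring does not. In a real project a DI container would typically take over the job of the hand-written Wiring class.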

Best way to extend Pharo Smalltalk class behavior?

I want to extend the String class with a method to create a url slug out of a string. I found a link here that shows how you can move extensions to their own package:
Smalltalk Daily 07/13/10: Extending Behavior II.
However, I can't find any "move to package" option in Pharo Smalltalk. Is it ok to just extend the core class with the new method, or is there a better way?
In Pharo or Squeak put the extension methods for MyPackage in a method category called *mypackage (or if you want to be more descriptive *mypackage-slug).
The methods in these categories belong automatically to the MyPackage package (at least from the Monticello point of view)
"Is it ok to just extend the core class with the new method, or is there a better way?"
There are tradeoffs to this decision. In fact, Pharo had String>>asUrl until very recently, when it was removed as part of cleaning the system. On one hand, it is considered bad style by some (see Kent Beck's Best Practices) to have conversion methods between objects that do not have similar protocols (are semantically similar). Additionally, this leads to bloated core classes (like String and Object). However, in your own application, there may be a good reason that balances these factors, and since you are packaging it with your app, and not with the system, rock out.
In Pharo 7, the * prefix is forbidden.
A message tells you to tick the extension checkbox in the method editing pane instead.
If you do so, you can choose your package.

Naming convention and structure for utility classes and methods

Do you have any input on how to organize and name utility classes?
Whenever I run in to some code-duplication, could be just a couple of code lines, I move them to a utility class.
After a while, I tend to get a lot of small static classes, usually with only one method, which I usually put in a utility namespace that gets bloated with classes.
Examples:
ParseCommaSeparatedIntegersFromString( string )
CreateCommaSeparatedStringFromIntegers( int[] )
CleanHtmlTags( string )
GetListOfIdsFromCollectionOfX( CollectionX )
CompressByteData( byte[] )
Usually, naming conventions tell you to name your class as a noun. I often end up with a lot of classes like HtmlHelper and CompressHelper, but they aren't very informative. I've also tried being really specific, like HtmlTagCleaner, which usually ends up with one class per utility method.
Have you any ideas on how to name and group these helper methods?
I believe there is a continuum of complexity, and therefore of corresponding organizations. Examples follow; choose depending on the complexity of your project and your utilities, and adapt to other constraints:
One class (called Helper), with a few methods
One package (called helper), with a few classes (called XXXHelper), each class with a few methods.
Alternatively, the classes may be split in several non-helper packages if they fit.
One project (called helper), with a few packages (called XXX), each package with ...
Alternatively, the packages can be split in several non-helper packages if they fit.
Several helper projects (split by tier, by library in use or otherwise)...
At each grouping level (package, class) :
the common part of the meaning is the name of the grouping name
inner codes don't need that meaning anymore (so their name is shorter, more focused, and doesn't need abbreviations, it uses full names).
For projects, I usually repeat the common meaning in a superpackage name. Although it is not my preferred choice in theory, I can't see in my IDE (Eclipse) which project a class is imported from, so I need the information repeated. The project is actually only used as:
a shipping unit: some deliverables or products will have the jar, those that don't need it won't,
to express dependencies : for example, a business project have no dependency on web tier helpers ; having expressed that in projects dependencies, we made an improvement in apparent complexity, good for us ; or finding such a dependency, we know something is wrong, and start to investigate... ; also, by reducing the dependencies, we may accelerate compilation and building ....
to categorize the code, to find it faster : only when it's huge, I'm talking about thousands of classes in the project
Please note that all the above applies to dynamic methods as well, not only static ones.
It's actually our good practices for all our code.
Now that I tried to answer your question (although in a broad way), let me add another thought
(I know you didn't ask for that).
Static methods (except those using static class members) work without context, all data have to be passed as parameters. We all know that, in OO code, this is not the preferred way. In theory, we should look for the object most relevant to the method, and move that method on that object. Remember that code sharing doesn't have to be static, it only has to be public (or otherwise visible).
Examples of where to move a static method :
If there is only one parameter, to that parameter.
If there are several parameters, choose between moving the method on :
the parameter that is used most: the one with several fields or methods used, or used by conditionals (ideally, some conditionals would be removed by subclasses overriding) ...
one existing object that has already good access to several of the parameters.
build a new class for that need
Although moving methods like this may seem to be something for OO purists, we find it actually helps us in the long run (and it proves invaluable when we want to subclass, to alter an algorithm). Eclipse moves a method in less than a minute (with all verifications), and we gain so much more than a minute when we look for some code, or when we avoid coding a method again that was coded already.
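A minimal before/after sketch of such a move, using made-up names (BasketUtils, Basket, DiscountedBasket) and an arbitrary 10% discount purely for illustration:

```java
import java.util.List;

// Before: a context-free static helper; all data arrives as a parameter.
final class BasketUtils {
    private BasketUtils() {}

    static int totalCents(Basket basket) {
        int sum = 0;
        for (int price : basket.pricesInCents()) {
            sum += price;
        }
        return sum;
    }
}

// After: the method lives on the object whose data it reads.
class Basket {
    private final List<Integer> pricesInCents;

    Basket(List<Integer> pricesInCents) {
        this.pricesInCents = pricesInCents;
    }

    List<Integer> pricesInCents() {
        return pricesInCents;
    }

    int totalCents() {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }
}

// A subclass can now alter the algorithm without touching any caller.
class DiscountedBasket extends Basket {
    DiscountedBasket(List<Integer> pricesInCents) {
        super(pricesInCents);
    }

    @Override
    int totalCents() {
        return super.totalCents() * 90 / 100; // hypothetical 10% discount
    }
}
```

Code that calls basket.totalCents() picks up the discounted behaviour automatically, whereas BasketUtils.totalCents(basket) would need a conditional on the basket type.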
Limitations : some classes can't be extended, usually because they are out of control (JDK, libraries ...). I believe this is the real helper justification, when you need to put a method on a class that you can't change.
Our good practice then is to name the helper with the name of the class to extend, with Helper suffix. (StringHelper, DateHelper). This close matching between the class where we would like the code to be and the Helper helps us find those method in a few seconds, even without knowledge if someone else in our project wrote that method or not.
Helper suffix is a good convention, since it is used in other languages (at least in Java, IIRC rails use it).
The intent of your helper should be transported by the method name, and use the class only as placeholder. For example ParseCommaSeparatedIntegersFromString is a bad name for a couple of reasons:
too long, really
it is redundant, in a statically typed language you can remove FromString suffix since it is deduced from signature
What do you think about:
CSVHelper.parse(String)
CSVHelper.create(int[])
HTMLHelper.clean(String)
...
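For what it's worth, here is one way those signatures could be filled in with Java. This is only a sketch under assumptions: the whitespace handling in parse and the regex in clean are choices made for this example, and stripping tags with a regex is naive and not safe for untrusted HTML.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class CSVHelper {
    private CSVHelper() {}

    // Parses "1, 2,3" into {1, 2, 3}; null or blank input yields an empty array.
    static int[] parse(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Joins {4, 5} into "4,5".
    static String create(int[] values) {
        return Arrays.stream(values)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining(","));
    }
}

final class HTMLHelper {
    private HTMLHelper() {}

    // Naive tag removal: drops anything that looks like <...>.
    static String clean(String html) {
        return html.replaceAll("<[^>]*>", "");
    }
}
```

The class name carries the domain (CSV, HTML) and the method name carries the intent, so neither needs the FromString or FromIntegers suffix.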

What is the best way to solve an Objective-C namespace collision?

Objective-C has no namespaces; it's much like C, everything is within one global namespace. Common practice is to prefix classes with initials, e.g. if you are working at IBM, you could prefix them with "IBM"; if you work for Microsoft, you could use "MS"; and so on. Sometimes the initials refer to the project, e.g. Adium prefixes classes with "AI" (as there is no company behind it whose initials you could take). Apple prefixes classes with NS and says this prefix is reserved for Apple only.
So far so good. But prepending 2 to 4 letters to a class name gives a very, very limited namespace. E.g. MS or AI could have an entirely different meaning (AI could be Artificial Intelligence, for example), and some other developer might decide to use the same prefix and create an identically named class. Bang, namespace collision.
Okay, if this is a collision between one of your own classes and one of an external framework you are using, you can easily change the naming of your class, no big deal. But what if you use two external frameworks, both frameworks that you don't have the source to and that you can't change? Your application links with both of them and you get name conflicts. How would you go about solving these? What is the best way to work around them in such a way that you can still use both classes?
In C you can work around these by not linking directly to the library, instead you load the library at runtime, using dlopen(), then find the symbol you are looking for using dlsym() and assign it to a global symbol (that you can name any way you like) and then access it through this global symbol. E.g. if you have a conflict because some C library has a function named open(), you could define a variable named myOpen and have it point to the open() function of the library, thus when you want to use the system open(), you just use open() and when you want to use the other one, you access it via the myOpen identifier.
Is something similar possible in Objective-C and if not, is there any other clever, tricky solution you can use resolve namespace conflicts? Any ideas?
Update:
Just to clarify this: answers that suggest how to avoid namespace collisions in advance or how to create a better namespace are certainly welcome; however, I will not accept them as the answer since they don't solve my problem. I have two libraries and their class names collide. I can't change them; I don't have the source of either one. The collision is already there and tips on how it could have been avoided in advance won't help anymore. I can forward them to the developers of these frameworks and hope they choose a better namespace in the future, but for the time being I'm searching a solution to work with the frameworks right now within a single application. Any solutions to make this possible?
Prefixing your classes with a unique prefix is fundamentally the only option, but there are several ways to make this less onerous and ugly. There is a long discussion of options here. My favorite is the @compatibility_alias Objective-C compiler directive (described here). You can use @compatibility_alias to "rename" a class, allowing you to name your class using FQDN or some such prefix:
@interface COM_WHATEVER_ClassName : NSObject
@end
@compatibility_alias ClassName COM_WHATEVER_ClassName
// now ClassName is an alias for COM_WHATEVER_ClassName
@implementation ClassName //OK
//blah
@end
ClassName *myClass; //OK
As part of a complete strategy, you could prefix all your classes with a unique prefix such as the FQDN and then create a header with all the @compatibility_alias directives (I would imagine you could auto-generate said header).
The downside of prefixing like this is that you have to enter the true class name (e.g. COM_WHATEVER_ClassName above) in anything, besides the compiler, that needs the class name as a string. Notably, @compatibility_alias is a compiler directive, not a runtime function, so NSClassFromString(@"ClassName") will fail (return nil); you'll have to use NSClassFromString(@"COM_WHATEVER_ClassName"). You can use ibtool via a build phase to modify class names in an Interface Builder nib/xib so that you don't have to write the full COM_WHATEVER_... in Interface Builder.
Final caveat: because this is a compiler directive (and an obscure one at that), it may not be portable across compilers. In particular, I don't know if it works with the Clang frontend from the LLVM project, though it should work with LLVM-GCC (LLVM using the GCC frontend).
If you do not need to use classes from both frameworks at the same time, and you are targeting platforms which support NSBundle unloading (OS X 10.4 or later, no GNUStep support), and performance really isn't an issue for you, I believe that you could load one framework every time you need to use a class from it, and then unload it and load the other one when you need to use the other framework.
My initial idea was to use NSBundle to load one of the frameworks, then copy or rename the classes inside that framework, and then load the other framework. There are two problems with this. First, I couldn't find a function to rename or copy a class. Second, any other classes in that first framework which reference the renamed class would now reference the class from the other framework.
You wouldn't need to copy or rename a class if there were a way to copy the data pointed to by an IMP. You could create a new class and then copy over ivars, methods, properties and categories. Much more work, but it is possible. However, you would still have a problem with the other classes in the framework referencing the wrong class.
EDIT: The fundamental difference between the C and Objective-C runtimes is, as I understand it, that when C libraries are loaded, the functions in those libraries contain pointers to any symbols they reference, whereas in Objective-C they contain string representations of the names of those symbols. Thus, in your example, you can use dlsym to get the symbol's address in memory and attach it to another symbol. The other code in the library still works because you're not changing the address of the original symbol. Objective-C uses a lookup table to map class names to addresses, and it's a 1-1 mapping, so you can't have two classes with the same name. Thus, to load both classes, one of them must have its name changed. However, when other classes need to access one of the classes with that name, they will ask the lookup table for its address, and the lookup table will never return the address of the renamed class given the original class's name.
Several people have already shared some tricky and clever code that might help solve the problem. Some of the suggestions may work, but all of them are less than ideal, and some of them are downright nasty to implement. (Sometimes ugly hacks are unavoidable, but I try to avoid them whenever I can.) From a practical standpoint, here are my suggestions.
In any case, inform the developers of both frameworks of the conflict, and make it clear that their failure to avoid and/or deal with it is causing you real business problems, which could translate into lost business revenue if unresolved. Emphasize that while resolving existing conflicts on a per-class basis is a less intrusive fix, changing their prefix entirely (or using one if they're not currently, and shame on them!) is the best way to ensure that they won't see the same problem again.
If the naming conflicts are limited to a reasonably small set of classes, see if you can work around just those classes, especially if one of the conflicting classes isn't being used by your code, directly or indirectly. If so, see whether the vendor will provide a custom version of the framework that doesn't include the conflicting classes. If not, be frank about the fact that their inflexibility is reducing your ROI from using their framework. Don't feel bad about being pushy within reason — the customer is always right. ;-)
If one framework is more "dispensable", you might consider replacing it with another framework (or combination of code), either third-party or homebrew. (The latter is the undesirable worst-case, since it will certainly incur additional business costs, both for development and maintenance.) If you do, inform the vendor of that framework exactly why you decided to not use their framework.
If both frameworks are deemed equally indispensable to your application, explore ways to factor out usage of one of them to one or more separate processes, perhaps communicating via DO as Louis Gerbarg suggested. Depending on the degree of communication, this may not be as bad as you might expect. Several programs (including QuickTime, I believe) use this approach to get more granular security through Seatbelt sandbox profiles in Leopard, such that only a specific subset of your code is permitted to perform critical or sensitive operations. Performance will be a tradeoff, but this may be your only option.
I'm guessing that licensing fees, terms, and durations may prevent instant action on any of these points. Hopefully you'll be able to resolve the conflict as soon as possible. Good luck!
This is gross, but you could use distributed objects in order to keep one of the classes only in a subordinate program's address space and RPC to it. That will get messy if you are passing a ton of stuff back and forth (and may not be possible if both classes are directly manipulating views, etc).
There are other potential solutions, but a lot of them depend on the exact situation. In particular, are you using the modern or legacy runtimes, are you fat or single architecture, 32 or 64 bit, what OS releases are you targeting, are you dynamically linking, statically linking, or do you have a choice, and is it potentially okay to do something that might require maintenance for new software updates.
If you are really desperate, what you could do is:
Not link against one of the libraries directly
Implement an alternate version of the objc runtime routines that changes the name at load time (check out the objc4 project; what exactly you need to do depends on a number of the questions I asked above, but it should be possible no matter what the answers are).
Use something like mach_override to inject your new implementation
Load the new library using normal methods, it will go through the patched linker routine and get its className changed
The above is going to be pretty labor intensive, and if you need to implement it against multiple archs and different runtime versions it will be very unpleasant, but it can definitely be made to work.
Have you considered using the runtime functions (/usr/include/objc/runtime.h) to clone one of the conflicting classes to a non-colliding class, and then loading the colliding class framework? (this would require the colliding frameworks to be loaded at different times to work.)
You can inspect the classes ivars, methods (with names and implementation addresses) and names with the runtime, and create your own as well dynamically to have the same ivar layout, methods names/implementation addresses, and only differ by name (to avoid the collision)
Desperate situations call for desperate measures. Have you considered hacking the object code (or library file) of one of the libraries, changing the colliding symbol to an alternative name with a different spelling but, I'd recommend, the same length? Inherently nasty.
It isn't clear if your code is directly calling the two functions with the same name but different implementations or whether the conflict is indirect (nor is it clear whether it makes any difference). However, there's at least an outside chance that renaming would work. It might be an idea, too, to minimize the difference in the spellings, so that if the symbols are in a sorted order in a table, the renaming doesn't move things out of order. Things like binary search get upset if the array they're searching isn't in sorted order as expected.
@compatibility_alias will be able to solve class namespace conflicts, e.g.
@compatibility_alias NewAliasClass OriginalClass;
However, this will not resolve any of the enum, typedef, or protocol namespace collisions. Furthermore, it does not play well with @class forward declarations of the original class. Since most frameworks will come with these non-class things like typedefs, you would likely not be able to fix the namespacing problem with just @compatibility_alias.
I looked at a similar problem to yours, but I had access to source and was building the frameworks.
The best solution I found for this was using @compatibility_alias conditionally with #defines to support the enums/typedefs/protocols/etc. You can do this conditionally on the compile unit for the header in question to minimize the risk of expanding stuff in the other colliding framework.
It seems that the issue is that you can't reference header files from both systems in the same translation unit (source file). If you create Objective-C wrappers around the libraries (making them more usable in the process), and only #include the headers for each library in the implementation of the wrapper classes, that would effectively separate name collisions.
I don't have enough experience with this in Objective-C (just getting started), but I believe that is what I would do in C.
Prefixing the files is the simplest solution I am aware of.
Cocoadev has a namespace page which is a community effort to avoid namespace collisions.
Feel free to add your own to this list, I believe that is what it is for.
http://www.cocoadev.com/index.pl?ChooseYourOwnPrefix
If you have a collision, I would suggest you think hard about how you might refactor one of the frameworks out of your application. Having a collision suggests that the two are doing similar things as it is, and you likely could get around using an extra framework simply by refactoring your application. Not only would this solve your namespace problem, but it would make your code more robust, easier to maintain, and more efficient.
Over a more technical solution, if I were in your position this would be my choice.
If the collision is only at the static link level then you can choose which library is used to resolve symbols:
cc foo.o -ldog bar.o -lcat
If foo.o and bar.o both reference the symbol rat then libdog will resolve foo.o's rat and libcat will resolve bar.o's rat.
Just a thought, not tested or proven, and it could be way off the mark, but have you considered writing an adapter for the classes you use from the simpler of the frameworks, or at least their interfaces?
If you were to write a wrapper around the simpler of the frameworks (or the one whose interfaces you access the least), would it not be possible to compile that wrapper into a library? Given that the library is precompiled and only its headers need be distributed, you'd be effectively hiding the underlying framework and would be free to combine it with the second framework without clashing.
I appreciate of course that there are likely to be times when you need to use classes from both frameworks at the same time; however, you could provide factories for further class adapters of that framework. On the back of that point, I guess you'd need a bit of refactoring to extract the interfaces you are using from both frameworks, which should provide a nice starting point for you to build your wrapper.
You could build upon the library as and when you need further functionality from the wrapped library, and simply recompile when it changes.
Again, in no way proven but felt like adding a perspective. hope it helps :)
If you have two frameworks that have the same function name, you could try dynamically loading the frameworks. It'll be inelegant, but possible. How to do it with Objective-C classes, I don't know. I'm guessing the NSBundle class will have methods that'll load a specific class.