Ast representation of lambda function - smalltalk

I'm developing and Abstract Syntax Tree meta-model for a smalltalk and right now I have troubles with modeling a blocks. They are sort of literals but on the other hand they are behavioral entities like methods. Blocks are sort of lambda functions so maybe someone had better practice of working with them.
I'll be thankful for any advice.

The Refactoring Browser has a very nice AST, have a look at its implementation.
Regarding your question: The Refactoring Browser extracts the shared parts of blocks and methods into a separate node type called SequenceNode. The sequence node models the temps and the sequence of statements. The block node then wraps the sequence node, adds the arguments, and inherits the shared behaviour of value nodes. The method node wraps the sequence node and adds method name, arguments, pragmas, etc.

Related

Hacklang : why were container classes replaced with built-in types?

Just a quote from hack documentation :
Legacy Vector, Map, and Set
These container types should be avoided in new code; use dict,
keyset, and vec instead.
Early in Hack's life, the library provided mutable and immutable
generic class types called: Vector, ImmVector, Map, ImmMap, Set, and
ImmSet. However, these have been replaced by vec, dict, and keyset,
whose use is recommended in all new code. Each generic type had a
corresponding literal form. For example, a variable of type
Vector might be initialized using Vector {22, 33, $v}, where $v
is a variable of type int.
I wonder why this change was made.
I mean, one of PHP weaknesses is that it has bad oop standard library.
Ex : str_replace and array_values methods are outside of the string/array type itself. The PHP standard library is not consistent, sometimes we must pass the array as the first parameter, other times as the second...
I was glad to see that Hack introduced true OOP encapsulation for collections.
Do you know why they stepped back and wrote utility classes such as C\, Dict\, Keyset\ and Vec\ ?
Will there be in the future an addition to add methods to built-in types (ex : Str\starts_with => "toto"->startsWith("t")) ?
Based on Dwayne Reeves' blog post introducing HSL, it seems that the main advantage is the fact that arrays are native values, not objects. This has two important consequences:
For users, the semantics are different when the values cross through arguments. Objects are passed as references, and mutations affect the original object. On the other hand, values are copied on write after passing through arguments, so without references (which are finally to be completely banned in Hack) the callee can't mutate the value of the caller, with the exception of the much stricter inout parameters.
The article cites the invariance of the mutable containers (Vector, Set, etc.) and generally how shared mutable state couples functions closer together. The soundness issues as discussed in the article are somewhat moot because there were also immutable object containers (ImmVector, ImmSet, etc.), although since these interfaces were written in userland, variance boxed the function type signature into tight constraints. There are tangible differences from this: ImmMap<Tk, +Tv> is invariant in Tk solely because of the (function(Tk): Tv) getter. Meanwhile, dict<+Tk, +Tv> is covariant in both type parameters thanks to the inherent mutation protection from copy-on-write.
For the compiler, static values can be allocated quickly and persist over the lifetime of the server. Objects on the other hand have arbitrarily complicated construction routines in general, and the collection objects weren't going to be special-cased it seems.
I will also mention that for most use cases, there is minimal difference even in code style: e.g. the -> reference chains can be directly replaced with the |> pipe operator. There is also no longer a boundary between the privileged "standard functions" and custom user functions on collection types. Finally, the collection types were final of course, so their objective nature didn't offer any actual hierarchical or polymorphic advantages to the end user anyways.

OOP and object parametrization

I am supposed to develop a program, which will heavily depend on input data at runtime (data for initialization, read from XML) and I would like to ask for good OOP practice regarding object/architecture design.
Situation
I have the following objects, object_A, object_B, object_C, each of them has a specified objective.
object_A = evaluation of equations, requires input, produces output
object_B = evaluation of equations, requires input, produces output
object_C = requires data from object_A and object_B as input, produces output
Then there is object_D, which passes the data and calls functions among these objects_A/B/C.
There are 2 ways to tackle this situation that I know of :
a) Inheritance
object_D inherits from object_A, object_B, object_C. Data are passed by appointing appropriate structures in objects_A/_B/_C using "this->", virtual functions in objects_A/_B/_C can then call back to object_D.
hierarchical approach
objects are concealed
difficult to parametrize the object_A/_B/_C (parameters need to travel all the way up in the hierarchy to base classes)
b) Passing pointers
Create object_A/_B/_C, by passing parameters in constructor. Then pass pointers of these objects to constructor of object_D.
no information hiding, all objects are visible
hierarchy might be unclear, especially when there are more levels
easy to pass initialization parameters
Question
What is an appropriate way of handling software architecture, where many objects require passing initialization parameters at runtime?
I think your question is broad and can have more than one good answer. However, I think your scenario can be solved in one of two ways:
Eventing: Instead of tightly coupling your classes using inheritance, you can use events. For instance when Object A finishes processing it raise an event called 'ClassAFinished'. Then you have to create an event handler for ClassAFinish Event that will in turns pass objectA's output to other objects that rely on Object A output.
Second way is Chain of Responsibility design pattern. Since your question is related to OOP I think it's reasonable to use this design pattern. In a nutshell Chain of Responsibility is a design pattern that you use it when you have a series (chain) of objects, each of which will do specific processing (responsibility), but each one of them can't begin processing until it received data from the previous object. When it finishes processing it'll send its output to the next object and so forth.
These are 2 main ideas that I wanted to share with you.

Neo4j - Wrapping/Inheriting a Node object

I'm developing kind of a social network with neo4j, and i wanted to make my Node object a bit more specific for my own needs. Does it considered a good practice to wrap a neo4j Node object or to inherit from it?
My problem with the wrapping approach arises when indexing the nodes objects with the built in Lucene engine. For example, what benefits will i earn if i'll wrap my Node object with a "Profile" class (with methods such as "addFriend", "setFirstName", etc..), but on the other hand, whenever i will run a query against my index i'll get back raw Node objects and not my wrapped objects? I can make some dirty solution for this case, by saving a reference for the wrapped object inside my node properties, but it looks very strange for me to do it.
What would you recommend to do in such case, in order to get a clean and well designed code?
Thanks.
I have found that wrapping a Node does not lead to very maintainable code/design. As you mentioned, one thing you need to take care of is not returning a Node but translating it to your domain object.
If your object has mostly getX methods, then you can just execute Cypher queries, compose your domain object(s) and return those. You don't even need to wrap the Node in this case- all you need is some property that you can use to look up the Node.
If you have setX methods, then you can update the Node via Cypher statements either via a save that updates all properties or on each setX (not great, as you'd be updating too often the setX method now implies persistence). Either of the two approaches does not require the Node to be wrapped.
I tried in earlier projects to wrap the Node but found that it leads to much more trouble and a generally smelly design. Now I work with pure domain POJOS's and keep Neo4j code in the persistence layer only, and this works much better for me. You haven't mentioned which language you're using- if Java, then I believe Spring Data can take care of a lot of boilerplate code.
Put your search code INTO the class they belongs to.
If you need to get, I don't know, something like getFriends from a Post class, you will create the method fromPosts into the Person class, and the getFriends method into Post.
From post, you will call the query from Person class, execute the query and return an Array / List of the nodes mapped into the Person class.
So your getFriends method into the Post class will be something like:
Person.fromPosts(self).results.map { |node| Person.new(node) }
Is simple to do that doing just a map of the result with a Person.new (or new Person, depend from which language are you using) and pass the node to the Person. This means that you must have a new method that populate object from a node.
Spring Data Neo4j is the definitive solution to your need, it maps annotated entity classes to Neo4j with advanced mapping functionality and provides access to nodes and relationships at different levels of abstraction.

Preserve Whole Object VS Don't Look For Things

I was reading Fowler's Refactoring Book and saw Preserve Whole Object. A different, newer opinion says that this refactoring is the exact opposite of what you should do: The Clean Code Talks - Don't Look For Things!.
Fowler does mention that you should look to see if the method can just be moved to the class which uses the large list of arguments. I think that would be the only reasonable alternative. This refactoring seems like a band-aid for a poorly defined method.
The Fowler source material is a bit dated. Is the prevailing wisdom to let this technique go the way of the dodo or is there actually a case when you'd want to do this kind of refactoring? Or have I misunderstood the test-driven style because those examples deal with object construction, not message sending?
There are many concepts in the Object Oriented Design such as Patterns, Principles and Practices that may seem to be either similar or contradictory at first. In fact, most of them are neither similar nor contradictory. And the thing that makes them different and consistent is their intent.
The seeming contradiction between the Preserve Whole Object refactoring and the Service Locator pattern mentioned in The Clean Code Talks video occurs when they are treated as a same concept, although they are different in their intent and their essence.
The Preserve Whole Object refactoring is simply a technique used to make code easier to read, understand and maintain by reducing the number of arguments to a function. The Service Locator, on the other hand, is a design pattern that is used to manage dependencies between different components in a system using the Inversion of Control concept. Unlike the Preserve Whole Object refactoring technique which has local effect on a system, that is applied to a small part of the system (a function), the Service Locator pattern has a global effect on the system and addresses a bigger architectural problem (Dependency Management).
When to use the Preserve Whole Object refactoring?
Use the Preserve Whole Object refactoring when You have two or more arguments to a function which are basically the properties of one object, so pass the object instead.
There is a similar concept called Parameter Object (aka Argument Object) (Introduce Parameter Object refactoring) which states that if You have a group of parameters that are not the properties of one object but are conceptually related to each other or naturally go together, wrap them with a class of their own and pass the instance of that class instead. It is mainly used when sending a message to an object.
A quote from Clean Code, Chapter 3: Functions, Function Arguments, page 43 (Robert C. Martin):
Argument Objects
When a function seems to need more than two or three arguments, it is likely that some of
those arguments ought to be wrapped into a class of their own. Consider, for example, the
difference between the two following declarations:
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
Reducing the number of arguments by creating objects out of them may seem like
cheating, but it’s not. When groups of variables are passed together, the way x and
y are in the example above, they are likely part of a concept that deserves a name of its
own.
When to use the Service Locator pattern?
Use the Service Locator pattern when Your class has dependencies that are not conceptually related and You do not want Your class to depend on concrete implementations. Actually, this is when You would want to use any of the Dependency Management approaches. Another alternative is the Dependency Injection approach which explicitly specifies all the dependencies as a separate arguments to the constructor. Whereas the Service Locator pass all the dependencies in a single container object. In fact, it is this very similarity between the Service Locator pattern and the Preserve Whole Object refactoring of combining the arguments in a single object that serves as a source of confusion. The Dependency Management techniques are mainly used in object construction.
There are pros and cons to both approaches of the Dependency Management which are discussed in the Inversion of Control Containers and the Dependency Injection pattern article by Martin Fowler.
When to use both?
Sometimes there will be situations where Your class will have two or more dependencies that are conceptually related and You might want to combine them in a single object and pass it as a dependency using the Service Locator. So, as You can see these two concepts are not mutually exclusive.

Is a function an example of encapsulation?

By putting functionality into a function, does that alone constitute an example of encapsulation or do you need to use objects to have encapsulation?
I'm trying to understand the concept of encapsulation. What I thought was if I go from something like this:
n = n + 1
which is executed out in the wild as part of a big body of code and then I take that, and put it in a function such as this one, then I have encapsulated that addition logic in a method:
addOne(n)
n = n + 1
return n
Or is it more the case that it is only encapsulation if I am hiding the details of addOne from the outside world - like if it is an object method and I use an access modifier of private/protected?
I will be the first to disagree with what seems to be the answer trend. Yes, a function encapsulates some amount of implementation. You don't need an object (which I think you use to mean a class).
See Meyers too.
Perhaps you are confusing abstraction with encapsulation, which is understood in the broader context of object orientation.
Encapsulation properly includes all three of the following:
Abstraction
Implementation Hiding
Division of Responsibility
Abstraction is only one component of encapsulation. In your example you have abstracted the adding functionality from the main body of code in which it once resided. You do this by identifying some commonality in the code - recognizing a concept (addition) over a specific case (adding the number one to the variable n). Because of this ability, abstraction makes an encapsulated component - a method or an object - reusable.
Equally important to the notion of encapsulation is the idea of implementation hiding. This is why encapsulation is discussed in the arena of object orientation. Implementation hiding protects an object from its users and vice versa. In OO, you do this by presenting an interface of public methods to the users of your object, while the implementation of the object takes place inside private methods.
This serves two benefits. First, by limiting access to your object, you avoid a situation where users of the object can leave the object in an invalid state. Second, from the user's perspective, when they use your object they are only loosely coupled to it - if you change your implementation later on, they are not impacted.
Finally, division of responsility - in the broader context of an OO design - is something that must be considered to address encapsulation properly. It's no use encapsulating a random collection of functions - responsibility needs to be cleanly and logically defined so that there is as little overlap or ambiguity as possible. For example, if we have a Toilet object we will want to wall off its domain of responsibilities from our Kitchen object.
In a limited sense, though, you are correct that a function, let's say, 'modularizes' some functionality by abstracting it. But, as I've said, 'encapsulation' as a term is understood in the broader context of object orientation to apply to a form of modularization that meets the three criteria listed above.
Sure it is.
For example, a method that operates only on its parameters would be considered "better encapsulated" than a method that operates on global static data.
Encapsulation has been around long before OOP :)
A method is no more an example of encapsulation than a car is an example of good driving. Encapsulation isn't about the synax, it is a logical design issue. Both objects and methods can exhibit good and bad encapsulation.
The simplest way to think about it is whether the code hides/abstracts the details from other parts of the code that don't have a need to know/care about the implementation.
Going back to the car example:
Automatic transmission offers good encapsulation: As a driver you care about forward/back and speed.
Manual Transmission is bad encapsulation: From the driver's perspective the specific gear required for low/high speeds is generally irrelevant to the intent of the driver.
No, objects aren't required for encapsulation. In the very broadest sense, "encapsulation" just means "hiding the details from view" and in that regard a method is encapsulating its implementation details.
That doesn't really mean you can go out and say your code is well-designed just because you divided it up into methods, though. A program consisting of 500 public methods isn't much better than that same program implemented in one 1000-line method.
In building a program, regardless of whether you're using object oriented techniques or not, you need to think about encapsulation at many different places: hiding the implementation details of a method, hiding data from code that doesn't need to know about it, simplifying interfaces to modules, etc.
Update: To answer your updated question, both "putting code in a method" and "using an access modifier" are different ways of encapsulating logic, but each one acts at a different level.
Putting code in a method hides the individual lines of code that make up that method so that callers don't need to care about what those lines are; they only worry about the signature of the method.
Flagging a method on a class as (say) "private" hides that method so that a consumer of the class doesn't need to worry about it; they only worry about the public methods (or properties) of your class.
The abstract concept of encapsulation means that you hide implementation details. Object-orientation is but one example of the use of ecnapsulation. Another example is the language called module-2 that uses (or used) implementation modules and definition modules. The definition modules hid the actual implementation and therefore provided encapsulation.
Encapsulation is used when you can consider something a black box. Objects are a black box. You know the methods they provide, but not how they are implemented.
[EDIT]
As for the example in the updated question: it depends on how narrow or broad you define encapsulation. Your AddOne example does not hide anything I believe. It would be information hiding/encapsulation if your variable would be an array index and you would call your method moveNext and maybe have another function setValue and getValue. This would allow people (together maybe with some other functions) to navigate your structure and setting and getting variables with them being aware of you using an array. If you programming language would support other or richer concepts you could change the implementation of moveNext, setValue and getValue with changing the meaning and the interface. To me that is encapsulation.
It's a component-level thing
Check this out:
In computer science, Encapsulation is the hiding of the internal mechanisms and data structures of a software component behind a defined interface, in such a way that users of the component (other pieces of software) only need to know what the component does, and cannot make themselves dependent on the details of how it does it. The purpose is to achieve potential for change: the internal mechanisms of the component can be improved without impact on other components, or the component can be replaced with a different one that supports the same public interface.
(I don't quite understand your question, let me know if that link doesn't cover your doubts)
Let's simplify this somewhat with an analogy: you turn the key of your car and it starts up. You know that there's more to it than just the key, but you don't have to know what is going on in there. To you, key turn = motor start. The interface of the key (that is, e.g., the function call) hides the implementation of the starter motor spinning the engine, etc... (the implementation). That's encapsulation. You're spared from having to know what's going on under the hood, and you're happy for it.
If you created an artificial hand, say, to turn the key for you, that's not encapsulation. You're turning the key with additional middleman cruft without hiding anything. That's what your example reminds me of - it's not encapsulating implementation details, even though both are accomplished through function calls. In this example, anyone picking up your code will not thank you for it. They will, in fact, be more likely to club you with your artificial hand.
Any method you can think of to hide information (classes, functions, dynamic libraries, macros) can be used for encapsulation.
Encapsulation is a process in which attributes(data member) and behavior(member function) of a objects in combined together as a single entity refer as class.
The Reference Model of Open Distributed Processing - written by the International Organisation for Standardization - defines the following concepts:
Entity: Any concrete or abstract thing of interest.
Object: A model of an entity. An object is characterised by its behaviour and, dually, by its state.
Behaviour (of an object): A collection of actions with a set of constraints on when they may occur.
Interface: An abstraction of the behaviour of an object that consists of a subset of the interactions of that object together with a set of constraints on when they may occur.
Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.
These, you will appreciate, are quite broad. Let us see, however, whether putting functionality within a function can logically be considered to constitute towards encapsulation in these terms.
Firstly, a function is clearly a model of a, 'Thing of interest,' in that it represents an algorithm you (presumably) desire executed and that algorithm pertains to some problem you are trying to solve (and thus is a model of it).
Does a function have behaviour? It certainly does: it contains a collection of actions (which could be any number of executable statements) that are executed under the constraint that the function must be called from somewhere before it can execute. A function may not spontaneously be called at any time, without causal factor. Sounds like legalese? You betcha. But let's plough on, nonetheless.
Does a function have an interface? It certainly does: it has a name and a collection of formal parameters, which in turn map to the executable statements contained in the function in that, once a function is called, the name and parameter list are understood to uniquely identify the collection of executable statements to be run without the calling party's specifying those actual statements.
Does a function have the property that the information contained in the function is accessible only through interactions at the interfaces supported by the object? Hmm, well, it can.
As some information is accessible via its interface, some information must be hidden and inaccessible within the function. (The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change.) So what information is hidden within a function?
To see this, we should first consider scale. It's easy to claim that, for example, Java classes can be encapsulated within a package: some of the classes will be public (and hence be the package's interface) and some will be package-private (and hence information-hidden within the package). In encapsulation theory, the classes form nodes and the packages form encapsulated regions, with the entirety forming an encapsulated graph; the graph of classes and packages is called the third graph.
It's also easy to claim that functions (or methods) themselves are encapsulated within classes. Again, some functions will be public (and hence be part of the class's interface) and some will be private (and hence information-hidden within the class). The graph of functions and classes is called the second graph.
Now we come to functions. If functions are to be a means of encapsulation themselves they they should contain some information public to other functions and some information that's information-hidden within the function. What could this information be?
One candidate is given to us by McCabe. In his landmark paper on cyclomatic complexity, Thomas McCabe describes source code where, 'Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program.'
Let us take the McCabian block of sequential execution as the unit of information that may be encapsulated within a function. As the first block within the function is always the first and only guaranteed block to be executed, we can consider the first block to be public, in that it may be called by other functions. All the other blocks within the function, however, cannot be called by other functions (except in languages that allow jumping into functions mid-flow) and so these blocks may be considered information-hidden within the function.
Taking these (perhaps slightly tenuous) definitions, then we may say yes: putting functionality within a function does constitute to encapsulation. The encapsulation of blocks within functions is the first graph.
There is a caveate, however. Would you consider a package whose every class was public to be encapsulated? According to the definitions above, it does pass the test, as you can say that the interface to the package (i.e., all the public classes) do indeed offer a subset of the package's behaviour to other packages. But the subset in this case is the entire package's behaviour, as no classes are information-hidden. So despite regorously satisfying the above definitions, we feel that it does not satisfy the spirit of the definitions, as surely something must be information-hidden for true encapsulation to be claimed.
The same is true for the exampe you give. We can certainly consider n = n + 1 to be a single McCabian block, as it (and the return statement) are a single, sequential flow of executions. But the function into which you put this thus contains only one block, and that block is the only public block of the function, and therefore there are no information-hidden blocks within your proposed function. So it may satisfy the definition of encapsulation, but I would say that it does not satisfy the spirit.
All this, of course, is academic unless you can prove a benefit such encapsulation.
There are two forces that motivate encapsulation: the semantic and the logical.
Semantic encapsulation merely means encapsulation based on the meaning of the nodes (to use the general term) encapsulated. So if I tell you that I have two packages, one called, 'animal,' and one called 'mineral,' and then give you three classes Dog, Cat and Goat and ask into which packages these classes should be encapsulated, then, given no other information, you would be perfectly right to claim that the semantics of the system would suggest that the three classes be encapsulated within the, 'animal,' package, rather than the, 'mineral.'
The other motivation for encapsulation, however, is logic.
The configuration of a system is the precise and exhaustive identification of each node of the system and the encapsulated region in which it resides; a particular configuration of a Java system is - at the third graph - to identify all the classes of the system and specify the package in which each class resides.
To logically encapsulate a system means to identify some mathematical property of the system that depends on its configuration and then to configure that system so that the property is mathematically minimised.
Encapsulation theory proposes that all encapsulated graphs express a maximum potential number of edges (MPE). In a Java system of classes and packages, for example, the MPE is the maximum potential number of source code dependencies that can exist between all the classes of that system. Two classes within the same package cannot be information-hidden from one another and so both may potentially form depdencies on one another. Two package-private classes in separate packages, however, may not form dependencies on one another.
Encapsulation theory tells us how many packages we should have for a given number of classes so that the MPE is minimised. This can be useful because the weak form of the Principle of Burden states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed - in other words, the more potential source code dependencies you have between your classes, the greater the potential cost of doing any particular update. Minimising the MPE thus minimises the maximum potential cost of updates.
Given n classes and a requirement of p public classes per package, encapsulation theory shows that the number of packages, r, we should have to minimise the MPE is given by the equation: r = sqrt(n/p).
This also applies to the number of functions you should have, given the total number, n, of McCabian blocks in your system. Functions always have just one public block, as we mentioned above, and so the equation for the number of functions, r, to have in your system simplifies to: r = sqrt(n).
Admittedly, few considered the total number of blocks in their system when practicing encapsulation, but it's readily done at the class/package level. And besides, minimising MPE is almost entirely entuitive: it's done by minimising the number of public classes and trying to uniformly distribute classes over packages (or at least avoid have most packages with, say, 30 classes, and one monster pacakge with 500 classes, in which case the internal MPE of the latter can easily overwhelm the MPE of all the others).
Encapsulation thus involves striking a balance between the semantic and the logical.
All great fun.
in strict object-oriented terminology, one might be tempted to say no, a "mere" function is not sufficiently powerful to be called encapsulation...but in the real world the obvious answer is "yes, a function encapsulates some code".
for the OO purists who bristle at this blasphemy, consider a static anonymous class with no state and a single method; if the AddOne() function is not encapsulation, then neither is this class!
and just to be pedantic, encapsulation is a form of abstraction, not vice-versa. ;-)
It's not normally very meaningful to speak of encapsulation without reference to properties rather than solely methods -- you can put access controls on methods, certainly, but it's difficult to see how that's going to be other than nonsensical without any data scoped to the encapsulated method. Probably you could make some argument validating it, but I suspect it would be tortuous.
So no, you're most likely not using encapsulation just because you put a method in a class rather than having it as a global function.