OO Design Quality Metrics

I'm reading a paper about OO Design Quality Metrics written by Robert Martin.
In his paper he describes "a set of metrics that can be used to measure the quality of an object-oriented design in terms of the interdependence between the subsystems of that design"
He goes on about how there should be a good balance between abstraction and instability. Here are the metrics he writes about and how they can be calculated:
Na: The number of concrete and abstract classes (and interfaces) in the package is an indicator of the extensibility of the package.
Afferent Couplings (Ca): The number of classes outside the package that depend upon classes within the package.
Efferent Couplings (Ce): The number of classes inside the package that depend upon classes outside the package.
Abstractness (A): The ratio of the number of abstract classes (and interfaces) in the analyzed package to the total number of classes in the analyzed package. The range for this metric is 0 to 1, with A=0 indicating a completely concrete package and A=1 indicating a completely abstract package.
Instability (I): The ratio of efferent coupling (Ce) to total coupling (Ce + Ca) such that I = Ce / (Ce + Ca). This metric is an indicator of the package's resilience to change. The range for this metric is 0 to 1, with I=0 indicating a completely stable package and I=1 indicating a completely unstable package.
Distance from the Main Sequence (D): The perpendicular distance of a package from the idealized line A + I = 1. This metric is an indicator of the package's balance between abstractness and stability. A package squarely on the main sequence is optimally balanced with respect to its abstractness and stability. Ideal packages are either completely abstract and stable (x=0, y=1) or completely concrete and unstable (x=1, y=0). The range for this metric is 0 to 1, with D=0 indicating a package that is coincident with the main sequence and D=1 indicating a package that is as far from the main sequence as possible.
I made the following simple design.
I'm confused about the last metric (D). If I calculate the metric D (D' in the picture), I get a negative value of -0.5. But the description says the value should be between 0 and 1. Also, Wikipedia states that for this metric interfaces are also counted as abstract classes, but I can't find that in the paper. Is this true?
Did I do something wrong? I believe this design, although really small, isn't that bad, right?

If D is a "distance" then you should take its absolute value; the formula in the paper has an absolute-value operator as well... I'm not sure how you calculated the distance, or maybe I misunderstood you.
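As a minimal sketch of the calculation (assuming the usual normalized form D' = |A + I - 1|; the package counts in the example are hypothetical), it could look like this:

```python
def package_metrics(n_abstract, n_total, ca, ce):
    """Martin's package metrics from the counts described above.

    n_abstract: abstract classes + interfaces in the package
    n_total:    all classes (abstract + concrete) in the package
    ca, ce:     afferent / efferent couplings
    """
    a = n_abstract / n_total                    # Abstractness
    i = ce / (ce + ca) if (ce + ca) else 0.0    # Instability
    d = abs(a + i - 1)                          # normalized distance from the main sequence
    return a, i, d

# Hypothetical package: 1 abstract class out of 2, Ca = 1, Ce = 0
# -> A = 0.5, I = 0.0, A + I - 1 = -0.5, which the absolute value turns into D' = 0.5
print(package_metrics(1, 2, ca=1, ce=0))
```

Without the absolute value you get exactly the kind of negative number you describe; taking it brings the result back into the 0 to 1 range.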
About treating abstract classes and interfaces the same: I think both of them are mechanisms to provide an "interface framework", which means keeping dependencies at the interface level rather than on concrete classes... so I think it's safe to consider them the same thing despite some differences.

Related

The Theory of Object-Oriented Programming

This question concerns a generalized approach to building up a hierarchy of objects. I shall use an example to explain my questions.
Suppose I'm building a digital circuit simulator, and we do this with an object-oriented class structure. The base class is a logic gate, from which we derive binary gates and unary gates; from binary gates we derive the classes AND, OR, XOR, etc., and from unary gates we derive the class for a NOT gate. This makes sense, as at each layer you become more specialized, and you branch into subsets which have data that is unique to that derived class.
But suppose we then want to build something out of our derived logic gate classes, for example an 8-bit adder circuit. In this case, the 8-bit adder is not part of the original hierarchy: it is not a logic gate, but despite that, it can be built out of the logic gate objects. Additionally, the 8-bit adder can be broken down into 8 full adders, which can each be broken down into 2 half adders. But as opposed to the original logic gate hierarchy, each of these objects (8-bit adder, full adder and half adder) operates as an individual component, whereas the only classes which actually serve as components in the logic gate hierarchy are the final derived classes, AND, OR, XOR, NOT. In this sense, I don't understand whether or not it would be appropriate to create a class structure based on inheritance for the adder circuits.
To me, it seems as though the half adder, full adder and 8-bit adder should each be composed of individual logic gate objects, but this would be inefficient as it would require lots of repetition of code. So this begs the question: what is the best way to arrange these class structures so that the most complex 8-bit adder can be built from full adders, the full adders can be built from half adders, and the half adders can be built from individual logic gates?
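To make the idea concrete, here is a rough sketch of the composition I have in mind (class names are placeholders, and the gate hierarchy is reduced to what the adders need):

```python
class LogicGate:
    """Base class of the gate hierarchy."""
    def evaluate(self, *inputs):
        raise NotImplementedError

class AndGate(LogicGate):
    def evaluate(self, a, b):
        return a & b

class OrGate(LogicGate):
    def evaluate(self, a, b):
        return a | b

class XorGate(LogicGate):
    def evaluate(self, a, b):
        return a ^ b

class HalfAdder:
    """Composed of gates; deliberately not a LogicGate itself."""
    def __init__(self):
        self._sum = XorGate()
        self._carry = AndGate()

    def evaluate(self, a, b):
        return self._sum.evaluate(a, b), self._carry.evaluate(a, b)

class FullAdder:
    """Composed of two half adders and an OR gate."""
    def __init__(self):
        self._ha1, self._ha2, self._or = HalfAdder(), HalfAdder(), OrGate()

    def evaluate(self, a, b, carry_in):
        s1, c1 = self._ha1.evaluate(a, b)
        s2, c2 = self._ha2.evaluate(s1, carry_in)
        return s2, self._or.evaluate(c1, c2)

class EightBitAdder:
    """Composed of eight full adders chained through their carries."""
    def __init__(self):
        self._stages = [FullAdder() for _ in range(8)]

    def evaluate(self, a_bits, b_bits):   # bit lists, least significant bit first
        carry, result = 0, []
        for fa, a, b in zip(self._stages, a_bits, b_bits):
            s, carry = fa.evaluate(a, b, carry)
            result.append(s)
        return result, carry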
Many thanks to all that answer.

Strategy to tackle knapsack combined with travelling salesman

I have been assigned the following problem as a research topic for the summer. However, I have not been able to find a related problem, except that it seems to be a combination of the travelling salesman problem and the knapsack problem, even though I'm not sure that is the case. The statement is:
You are a truck driver who earned a big contract and now you must
deliver a lot of packages. There are N packages to be delivered, each
one has to be delivered at a certain address (x,y) on the city.
Additionally each package i has a weight Wi.
For simplicity, suppose the distribution area is rectangular and you
always start at the point (0,90).
You only have one truck, with limited capacity of 1000 (excluding
truck weight). The truck base weight is 10.
The distances to be travelled are far away, so the distance shall be
computed using the Haversine distance.
The company who contracted you will provide you with enough fuel, so
you can make an unlimited amount of trips.
However, you must be very careful while delivering the packages since
you must deliver every single one of them and, if you choose to pick
up a package during a trip, you must deliver it no matter what, you
can't leave them in the middle of your trip.
As you are a bit miserly, you agree on the conditions, but you know
that if you don't take a close to optimal strategy, your truck can
wear out too much, so you could end up leaving the contract incomplete,
which will get you sued and leave you without truck and money.
So, due to your experience, you know that to maximize the chances of survival
of your truck, you must minimize the following function:
http://goo.gl/jQxXzN (sorry, I can't post images because I don't have enough reputation).
where m is the number of trips, n is the number of packages in trip j, wij is the weight of the i-th gift during the j-th trip, Dist represents the Haversine distance between two points, Loc(i) is the location of the i-th gift, Loc(0) and Loc(n) are the (0,90) point (the starting point), and wnj (the last weight of the trip) is always the truck weight (base weight).
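To make my reading of the objective concrete (this is only my interpretation of the description above; the exact formula is in the linked image), here is a rough sketch of a Haversine helper and a per-trip cost in which each leg's distance is weighted by the load still on the truck. I treat the (x,y) addresses as (lat, lon) pairs:

```python
import math

EARTH_RADIUS_KM = 6371.0
BASE_WEIGHT = 10      # truck base weight from the statement
DEPOT = (0, 90)       # starting point from the statement

def haversine(p, q):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def trip_cost(stops, weights):
    """Cost of one trip: sum over legs of (current load) * (leg distance).

    stops   - delivery locations in visiting order, as (lat, lon) tuples
    weights - weight of the package dropped at each corresponding stop
    """
    route = [DEPOT] + list(stops) + [DEPOT]
    load = BASE_WEIGHT + sum(weights)      # load when leaving the depot
    cost = 0.0
    for i in range(len(route) - 1):
        cost += load * haversine(route[i], route[i + 1])
        if i < len(stops):                 # a package is dropped at each stop
            load -= weights[i]
    return cost                            # last leg carries only the base weight
```

The total objective would then be the sum of `trip_cost` over all trips, subject to each trip's load (excluding the base weight) staying within the capacity of 1000.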
So, basically, those are all the restrictions of the research topic I got.
I have been thinking maybe some metaheuristics such as ant colony optimisation or genetic algorithms could be of help, but I would have to get to know the problem a bit better. Any idea or information (paper, book, etc.) is welcome.
This sounds like a variation of the Capacitated Vehicle Routing Problem (CVRP), specifically with a single vehicle and a single depot (however with non-uniform packages). For some references on this problem, see e.g.:
On the Capacitated Vehicle Routing Problem (T.K. Ralphs, L. Kopman, W.R. Pulleyblank, and L.E. Trotter, Jr)
Modeling and Solving the Capacitated Vehicle Routing Problem on Trees (B. Chandran and S. Raghavan)
I think your idea of metaheuristics---ant colony optimisation (ACO) in particular---would be a wise approach. Note that for TSP-related problems, ACO is generally preferred over genetic algorithms (GA).
Possibly the following somewhat well-known article (1) can get you started studying the possible benefits of an ACO approach; (2) extends the results from (1). Note that these cover the regular VRP (not capacitated), but they should prove valuable as a starting point, at the least giving inspiration.
SavingsAnts for the Vehicle Routing Problem (K. Doerner et al.)
An improved ant colony optimization for vehicle routing problem (Yu Bin et al.)
There also seems to exist literature specifically on the subject of ACO and CVRP. I cannot comment on the quality of these, but I'll list them for you to inspect yourself. Note that (3) is an extension by the authors of (1).
Parallel Ant Systems for the Capacitated Vehicle Routing Problem (K. Doerner et al.)
Ant Colony Optimization for Capacitated Vehicle Routing Problem (W.F. Tan et al.)
Sounds like an interesting research topic, good luck!

Routh-Hurwitz useful when I can just calculate eigenvalues?

This is for self-study of an N-dimensional system of linear homogeneous ordinary differential equations of the form:
dx/dt=Ax
where A is the coefficient matrix of the system.
I have learned that you can check for stability by determining if the real parts of all the eigenvalues of A are negative. You can check for oscillations if there are any purely imaginary eigenvalues of A.
The author in the book I'm reading then introduces the Routh-Hurwitz criterion for detecting stability and oscillations of the system. This seems to be a more efficient computational short-cut than calculating eigenvalues.
What are the advantages of using Routh-Hurwitz criteria for stability and oscillations, when you can just find the eigenvalues quickly nowadays? For instance, will it be useful when I start to study nonlinear dynamics? Is there some additional use that I am completely missing?
The Wikipedia entry on RH stability analysis has stuff about control systems and ends up with a lot of equations in the s-domain (Laplace transforms), but for my applications I will be staying in the time domain for the most part, focusing fairly narrowly on stability and oscillations in linear (or linearized) systems.
My motivation: it seems easy to calculate eigenvalues on my computer, and the Routh-Hurwitz criterion comes off as sort of anachronistic, the sort of thing that might save me a lot of time if I were doing this by hand, but not very helpful for doing analysis of small-fry systems via Matlab.
Edit: I've asked this at Math Exchange, which seems more appropriate:
https://math.stackexchange.com/questions/690634/use-of-routh-hurwitz-if-you-have-the-eigenvalues
There is an accepted answer there.
This is just legacy educational curriculum which fell way behind the actual computational age. Routh-Hurwitz gives a very nice theoretical basis for parametrization of root positions and is linked to much more abstract math.
However, for control purposes it is just a nice trick that has no practical value, except maybe for simple transfer functions with one or two unknown parameters. It had real value when computing the roots of polynomials was expensive or even manual. Today, even root finding of polynomials is based on forming the companion matrix and computing its eigenvalues. In fact, you can basically form a meshgrid and check the stability surface by plotting the largest real part in a few minutes.
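For instance, the eigenvalue check described in the question is only a couple of lines with NumPy (a sketch with a made-up example matrix):

```python
import numpy as np

def classify(A, tol=1e-9):
    """Classify the linear system dx/dt = A x by the eigenvalues of A."""
    eig = np.linalg.eigvals(A)
    stable = np.all(eig.real < -tol)              # all real parts negative -> asymptotically stable
    oscillatory = np.any(np.abs(eig.imag) > tol)  # complex eigenvalues -> oscillatory modes
                                                  # (purely imaginary ones give sustained oscillations)
    return stable, oscillatory

# Example: a damped oscillator-like system
A = np.array([[ 0.0,  1.0],
              [-2.0, -0.5]])
print(classify(A))   # (True, True): stable, with damped oscillations
```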

Disambiguating HPCT artificial intelligence architecture

I am trying to construct a small application that will run on a robot with very limited sensory capabilities (NXT with gyroscope/ultrasonic/touch), and the actual AI implementation will be based on hierarchical perceptual control theory (HPCT). I'm just looking for some guidance regarding the implementation, as I'm confused when it comes to moving from theory to implementation.
The scenario
My candidate scenario will have 2 behaviors, one is to avoid obstacles, second is to drive in circular motion based on given diameter.
The problem
I've read several papers but could not determine how I should classify my virtual machines (layers of behavior?) and how they should communicate with lower levels and resolve internal conflicts.
This is the list of papers I went through to find my answers, but sadly I could not:
pct book
paper on multi-legged robot using hpct
pct alternative perspective
and the following ideas are the results of my brainstorming:
The avoidance layer would be part of my 'sensation layer', because it only identifies certain values, like close objects, e.g. a specific range of ultrasonic sensor values. The second layer would be part of the 'configuration layer', as it would try to detect the pattern in which the robot is driving (straight line, random, circle, or even not moving at all) using the gyroscope and motor readings. The 'intensity layer' represents all raw sensor values, so it's not something to consider as part of the design.
The second idea is to have both layers as 'configuration', because they would respond to direct sensor values from the 'intensity layer', and they would be arranged in a mesh-like design where each layer can send its reference values to the lower layer that interfaces with the actuators.
My problem here is how conflicting behavior would be handled (maneuvering around objects while keeping running in circles). Should it be similar to Subsumption, where certain layers get suppressed/inhibited and there is some sort of priority system? Forgive my short explanation, as I did not want to make this a lengthy question.
/Y
Here is an example of a robot which implements HPCT and addresses some of the issues relevant to your project, http://www.youtube.com/watch?v=xtYu53dKz2Q.
It is interesting to see a comparison of these two paradigms, as they both approach the field of AI at a similar level, that of embodied agents exhibiting simple behaviors. However, there are some fundamental differences between the two which means that any comparison will be biased towards one or the other depending upon the criteria chosen.
The main difference is of biological plausibility. Subsumption architecture, although inspired by some aspects of biological systems, is not intended to theoretically represent such systems. PCT, on the other hand, is exactly that: a theory of how living systems work.
As far as PCT is concerned then, the most important criterion is whether or not the paradigm is biologically plausible, and criteria such as accuracy and complexity are irrelevant.
The other main difference is that Subsumption concerns action selection whereas PCT concerns control of perceptions (control of output versus control of input), which makes any comparison on other criteria problematic.
I had a few specific comments about your dissertation on points that may need
clarification or may be typos.
"creatures will attempt to reach their ultimate goals through
alternating their behaviour" - do you mean altering?
"Each virtual machine's output or error signal is the reference signal of the machine below it" - A reference signal can be a function of one or more output signals from higher-level systems, so more strictly this would be, "Each virtual machine's output or error signal contributes to the reference signal of a machine at a lower level".
"The major difference here is that Subsumption does not incorporate the ideas of 'conflict' " - Well, it does as the purpose of prioritising the different layers, and sub-systems, is to avoid conflict. Conflict is implicit, as there is not a dedicated system to handle conflicts.
"'reorganization' which require considering the goals of other layers." This doesn't quite capture the meaning of reorganisation. Reorganisation happens when there is prolonged error in perceptual control systems, and is a process whereby the structure of the systems changes. So rather than just the reference signals changing the connections between systems or the gain of the systems will change.
"Design complexity: this is an essential property for both theories." Rather than an essential property, in the sense of being required, it is a characteristic, though it is an important property to consider with respect to the implementation or usability of a theory. Complexity, though, has no bearing on the validity of the theory. I would say that PCT is a very simple theory, though complexity arises in defining the transfer functions, but this applies to any theory of living systems.
"The following step was used to create avoidance behaviour:" Having multiple nodes for different speeds seem unnecessarily complex. With PCT it should only be necessary to have one such node, where the distance is controlled by varying the speed (which could be negative).
Section 4.2.1 "For example, the avoidance VM tries to respond directly to certain intensity values with specific error values." This doesn't sound like PCT at all. With PCT, systems never respond with specific error (or output) values, but change the output in order to bring the intensity (in this case) input in to line with the reference.
"Therefore, reorganisation is required to handle that conflicting behaviour. I". If there is conflict reorganisation may be necessary if the current systems are not able to resolve that conflict. However, the result of reorganisation may be a set of systems that are able to resolve conflict. So, it can be possible to design systems that resolve conflict but do not require reorganisation. That is usually done with a higher-level control system, or set of systems; and should be possible in this case.
In this section there is no description of what the controlled variables are, which is of concern. I would suggest being clear about what the goals (controlled variables) of each of the systems are.
"Therefore, the designed behaviour is based on controlling reference values." If it is only reference values that are altered then I don't think it is accurate to describe this as 'reorganisation'. Such a node would better be described as a "conflict resolution" node, which should be a higher-level control system.
Figure 4.1. The links annotated as "error signals" are actually output signals. The error signals are the links between the comparator and the output.
"the robot never managed to recover from that state of trying to reorganise the reference values back and forth." I'd suggest the way to resolve this would be to have a system at a level above the conflicted systems, and takes inputs from one or both of them. The variable that it controls could simply be something like, 'circular-motion-while-in-open-space', and the input a function of the avoidance system perception and then a function of the output used as the reference for the circular motion system, which may result in a low, or zero, reference value, essentially switching off the system, thus avoiding conflict, or interference. Remember that a reference signal may be a weighted function of a number of output signals. Those weights, or signals, could be negative so inhibiting the effect of a signal resulting in suppression in a similar way to the Subsumption architecture.
"In reality, HPCT cannot be implemented without the concept of reorganisation because conflict will occur regardless". As described above HPCT can be implemented without reorganisation.
"Looking back at the accuracy of this design, it is difficult to say that it can adapt." Provided the PCT system is designed with clear controlled variables in mind PCT is highly adaptive, or resistant to the effects of disturbances, which is the PCT way of describing adaption in the present context.
In general, it may just require clarification in the text, but since there is a lack of description of the controlled variables in the model of the PCT implementation, and since, it seems, some 'behavioural' modules used were common to both implementations, it makes me wonder whether PCT feedback systems were actually used or whether it was just the concept of the hierarchical architecture that was being contrasted with that of the Subsumption paradigm.
I am happy to provide more detail of HPCT implementation though it looks like this response is somewhat overdue and you've gone beyond that stage.
Partial answer from RM of the CSGnet list:
https://listserv.illinois.edu/wa.cgi?A2=ind1312d&L=csgnet&T=0&P=1261
Forget about the levels. They are just suggestions and are of no use in building a working robot.
A far better reference for the kind of robot you want to develop is the CROWD program, which is documented at http://www.livingcontrolsystems.com/demos/tutor_pct.html.
The agents in the CROWD program do most of what you want your robot to do. So one way to approach the design is to try to implement the control systems in the CROWD programs using the sensors and outputs available for the NXT robot.
Approach the design of the robot by thinking about what perceptions should be controlled in order to produce the behavior you want to see the robot perform. So, for example, if one behavior you want to see is "avoidance", then think about what avoidance behavior is (I presume it is maintaining a goal distance from obstacles) and then think about what perception, if kept under control, would result in you seeing the robot maintain a fixed distance from objects. I suspect it would be the perception of the time delay between sending and receiving the ultrasound pulses. Since the robot is moving in two-space (I presume), there might have to be two pulse sensors in order to sense the 2-D location of objects.
There are potential conflicts between the control systems that you will need to build; for example, I think there could be conflicts between the system controlling for moving in a circular path and the system controlling for avoiding obstacles. The agents in the CROWD program have the same problem and sometimes get into dead-end conflicts. There are various ways to deal with conflicts of this kind; for example, you could have a higher-level system monitoring the error in the two potentially conflicting systems and have it reduce the gain in one system or the other if the conflict (error) persists for some time.
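A very rough sketch of that arrangement (all names, gains and thresholds are made up, just to show the structure): two proportional control systems plus a supervisor that backs off one system's gain when both show persistent error.

```python
class ControlNode:
    """Minimal proportional control system: output = gain * (reference - perception)."""
    def __init__(self, gain, reference=0.0):
        self.gain = gain
        self.reference = reference
        self.error = 0.0

    def step(self, perception):
        self.error = self.reference - perception
        return self.gain * self.error


class Supervisor:
    """Watches two lower systems and reduces one gain if both keep large error."""
    def __init__(self, primary, secondary, threshold=5.0, patience=10):
        self.primary = primary        # e.g. obstacle-avoidance system (left intact)
        self.secondary = secondary    # e.g. circular-motion system (gets attenuated)
        self.threshold = threshold
        self.patience = patience
        self.ticks = 0

    def step(self):
        in_conflict = (abs(self.primary.error) > self.threshold
                       and abs(self.secondary.error) > self.threshold)
        self.ticks = self.ticks + 1 if in_conflict else 0
        if self.ticks > self.patience:
            self.secondary.gain *= 0.5   # conflict persists: reduce one system's gain


# Hypothetical controlled variables: obstacle clearance (cm) and turning curvature.
avoid = ControlNode(gain=1.0, reference=30.0)
circle = ControlNode(gain=1.0, reference=0.5)
boss = Supervisor(avoid, circle)

# In the robot's main loop you would feed in real perceptions, e.g.:
# motor_cmd = avoid.step(ultrasonic_distance) + circle.step(gyro_curvature)
# boss.step()
```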

Difference between Gene Expression Programming and Cartesian Genetic Programming

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to pick up dramatically different names. My latest confusion because of this is that gene expression programming seems very similar to cartesian genetic programming.
(how) Are these fundamentally different concepts?
I've read that indirect encoding of GP instructions is an effective technique (both GEP and CGP do that). Has some sort of consensus been reached that indirect encoding has made classic tree-based GP outdated?
Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.
In CGP there is no distinction between genotype and phenotype; in other words, if you're looking at the "genes" of a CGP you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.
In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GEP, Cândida Ferreira, has written a really good paper, and there are some other resources which try to give a shorter overview of the whole concept.
Ferreira says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP increases the speed at which the fitness is reached (i.e. in fewer generations), but frankly speaking you can see dramatic performance shifts in a CGP just by having a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity in the fitness, etc.
Selection:
random
roulette wheel
top-n
take half
etc.
Tournament Frequency:
once per epoch
once per every data instance
once per generation.
Tournament Structure:
Take 3, kill 1 and replace it with the child of the other two.
Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
Randomly pick individuals from the tournament to mate and kill the excess individuals.
Tribes
A population can be split into tribes that evolve independently of each other:
Migration- periodically, individual(s) from a tribe would be moved to another tribe
The tribes are logically separated so that they're like their own separate populations running in separate environments.
Diversity Fitness
Incorporate diversity into the fitness, where you count how many individuals have the same fitness value (and thus are likely to have the same phenotype) and penalize their fitness by a proportionate value: the more individuals with the same fitness value, the greater the penalty for those individuals. This way specimens with unique phenotypes will be encouraged, and there will be much less stagnation of the population.
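One simple way such a penalty could be implemented (a sketch; the exact penalty scheme is a design choice, and higher fitness is assumed to be better):

```python
from collections import Counter

def diversity_adjusted(fitnesses, penalty=0.1):
    """Penalize individuals that share the same raw fitness value.

    The more individuals with identical fitness (likely identical phenotypes),
    the larger the penalty applied to each of them.
    """
    counts = Counter(fitnesses)
    return [f - penalty * (counts[f] - 1) for f in fitnesses]

# Three identical individuals get pushed down; the unique ones are untouched.
print(diversity_adjusted([10.0, 10.0, 10.0, 12.0, 9.5]))
# [9.8, 9.8, 9.8, 12.0, 9.5]
```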
Those are just some of the things that can greatly affect the performance of a CGP, and when I say greatly I mean that it's of the same order or greater than Ferreira's performance. So if Ferreira didn't tinker with those ideas too much, then she could have seen much slower performance from her CGPs... especially if she didn't do anything to combat stagnation. So I would be careful when reading performance statistics on GEP, because sometimes people fail to account for all of the "optimizations" available out there.
There seems to be some confusion in these answers that must be clarified. Cartesian GP is different from classic GP (aka tree-based GP), and GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) varies.
In CGP the representation (the mapping between genotype and phenotype) is indirect; in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be coded in a grid or array of nodes, and the resulting program graph is the expression of active nodes only.
In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is much different from tree GP or CGP, but the genotypes are also expressed into a program tree. In my opinion GEP is a more elegant representation and easier to implement, but it also suffers from some defects: you have to find the appropriate tail and head size, which is problem specific; the multigenic version is a bit of a forced glue between expression trees; and finally it has too much bloat.
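To make the "active nodes only" idea concrete, here is a toy sketch (not from any particular CGP implementation) of a genome as an array of nodes where only the subgraph reachable from the output is evaluated:

```python
# Each node: (function_name, input_index_a, input_index_b).
# Indices below NUM_INPUTS refer to program inputs; higher indices refer to earlier nodes.
NUM_INPUTS = 2
GENOME = [
    ("add", 0, 1),   # node 2: x0 + x1
    ("mul", 0, 0),   # node 3: x0 * x0   (never referenced -> inactive)
    ("sub", 2, 1),   # node 4: node2 - x1
]
OUTPUT = 4           # index of the node wired to the program output

FUNCTIONS = {"add": lambda a, b: a + b,
             "sub": lambda a, b: a - b,
             "mul": lambda a, b: a * b}

def active_nodes(genome, output, num_inputs):
    """Return the set of node indices actually expressed in the phenotype."""
    active, stack = set(), [output]
    while stack:
        idx = stack.pop()
        if idx < num_inputs or idx in active:
            continue
        active.add(idx)
        _, a, b = genome[idx - num_inputs]
        stack.extend([a, b])
    return active

def evaluate(genome, output, inputs):
    """Evaluate only the active part of the genome."""
    values = dict(enumerate(inputs))
    for idx in sorted(active_nodes(genome, output, len(inputs))):
        fn, a, b = genome[idx - len(inputs)]
        values[idx] = FUNCTIONS[fn](values[a], values[b])
    return values[output]

print(evaluate(GENOME, OUTPUT, [3, 4]))   # (3 + 4) - 4 = 3; node 3 is never evaluated
```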
Independently of which representation may be better than the other in some specific problem domain, they are general purpose and can be applied to any domain as long as you can encode it.
In general, GEP is simpler than GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
For each such node, with GP you must implement the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well
In GEP, for each such node only one operation needs to be implemented: deserialize, which takes an array of numbers (like double in C or Java) and returns a node. It resembles object deserialization in languages like Java or Python (the difference is that deserialization in programming languages uses byte arrays, whereas here we have arrays of numbers). This 'deserialize' operation doesn't even have to be implemented by the programmer: it can be implemented by a generic algorithm, just like it's done in Java or Python deserialization.
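A toy sketch of what such a generic deserialize step could look like (purely illustrative; this is not Ferreira's actual encoding): every number in the genome maps onto an entry of a fixed node table, so one routine decodes every node type.

```python
import operator

# A hypothetical node table: each entry is (symbol, arity, implementation).
NODE_TABLE = [
    ("+", 2, operator.add),
    ("-", 2, operator.sub),
    ("*", 2, operator.mul),
    ("x", 0, None),          # terminal: the input variable
    ("1", 0, lambda: 1.0),   # terminal: a constant
]

def deserialize(gene):
    """Map one number from the genome onto a node, generically for every node type."""
    index = int(gene) % len(NODE_TABLE)   # any number deserializes to some valid node
    return NODE_TABLE[index]

# A "genome" is just an array of numbers; the same routine decodes every position.
genome = [0, 3, 4]
nodes = [deserialize(g) for g in genome]
print([symbol for symbol, _, _ in nodes])  # ['+', 'x', '1']
```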
This simplicity, from one point of view, may make the search for the best solution less successful, but on the other hand it requires less work from the programmer, and simpler algorithms may execute faster (easier to optimize, more code and data fits in the CPU cache, and so on). So I would say that GEP is slightly better, but of course the definite answer depends on the problem, and for many problems the opposite may be true.