Nowadays I have switched to Sonar reports for static code review and performance improvement. Under the Rules section I found that the cognitive complexity of my methods was high.
You can find the cognitive complexity issues in Sonar as follows:
Go to Project -> Issues Tab -> Rules Drop-down -> Cognitive Complexity
The screenshot below gives you a reference of a Sonar project:
I could not find any way to calculate and reduce the cognitive complexity of my methods. Finally I found an accurate way to calculate the complexity, and I will answer it in my post below. Please check it out.
Cognitive Complexity
After searching some blogs and chatting with the Sonar team, I found an easy definition and calculation of cognitive complexity, which is as follows:
Definition:
Cognitive Complexity, Because Testability != Understandability
Your written code must be as simple to understand as that definition. Simply put:
less Cognitive Complexity, more Readability
Let's take a method as an example and calculate its CC; here I am using the Kotlin language, see the image below:
In the image above there is a method getAppConfigData() whose cognitive complexity is being measured. Right now the CC of this method is 18. As you can see in the screenshot, there is a warning telling us that the maximum allowed complexity is 15, which is lower than the current CC of this method.
Now the actual question is: how can I calculate the CC of my method at development time?
Follow the rules below to get the CC of any method or class:
Increment when there is a break in the linear (top-to-bottom, left-to-right) flow of the code
Increment when structures that break the flow are nested
Ignore "shorthand" structures that readably condense multiple lines of code into one
So whenever one of the above rules matches, add to your CC, and remember that the increment grows with the nesting level of the code break. For example, an "if" condition gets +1 if it is the first code break, but another "if" nested inside it gets +2, as the annotated sketch below illustrates.
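To make the counting concrete, here is a small annotated sketch (illustrative only, written in Python rather than the Kotlin from the screenshot; the method and its increments are hypothetical but follow Sonar's published rules):

def load_app_config(config, retries):      # hypothetical method; CC counted in comments
    if config is None:                     # +1 (break in linear flow, nesting level 0)
        return None
    for attempt in range(retries):         # +1 (loop, nesting level 0)
        if attempt > 0:                    # +2 (+1 for the "if", +1 because it is nested)
            print("retrying")
    return config                          # total Cognitive Complexity = 4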
That's all I got in terms of Cognitive Complexity.
You can find everything related to CC on the Sonar blog.
Thank You
A more detailed answer can be found in Sonar Cognitive Complexity:
Basic criteria and methodology
A Cognitive Complexity score is assessed according to three basic rules:
Ignore structures that allow multiple statements to be readably shorthanded into one
Increment (add one) for each break in the linear flow of the code
Increment when flow-breaking structures are nested
Additionally, a complexity score is made up of four different types of increments:
Nesting - assessed for nesting control flow structures inside each other
Structural - assessed on control flow structures that are subject to a nesting increment, and that increase the nesting count
Fundamental - assessed on statements not subject to a nesting increment
Hybrid - assessed on control flow structures that are not subject to a nesting increment, but which do increase the nesting count
While the type of an increment makes no difference in the math - each increment adds one to the final score - making a distinction among the categories of features being counted makes it easier to understand where nesting increments do and do not apply. These rules and the principles behind them are further detailed in the following sections.
In my case the Cognitive Complexity was due to a large number of if conditions.
My SonarQube allowed only 15 if and else-if conditions:
if () => 1
else if () => 2
...
else => 15
Once the chain exceeded 15 conditions, it showed me the Cognitive Complexity issue.
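One way to bring such a chain back under the limit (a hypothetical sketch, not from the original code) is to replace the if/else-if ladder with a lookup table, which counts as linear code:

# Each former "else if" branch becomes one entry; the handler bodies are illustrative.
HANDLERS = {
    "create": lambda: "created",
    "update": lambda: "updated",
    "delete": lambda: "deleted",
}

def dispatch(action):
    # .get() with a default replaces the final "else", so no flow breaks remain
    return HANDLERS.get(action, lambda: "unknown")()

print(dispatch("update"))  # -> updated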
Getting started with OptaPlanner (v.23.0.Final), I am experimenting with the CloudBalancing example. Using the IncrementalScoreCalculator Java class, I notice that the score calculation speed is much higher in the construction phase (>1M/sec) than in the local search phase (~50k/sec). How can this happen? Is the algorithm outside the score calculation included? That could explain the difference, since the local search algorithm will spend much more time outside the score calculator than the construction algorithm.
Two reasons:
1) The Construction Heuristic starts with no processes assigned to a computer, so every Process.getComputer() is null. Most constraints match on processes for which computer != null, so they short-circuit and don't do any expensive joins, groupBys, accumulates, etc. So an empty or partially initialized solution evaluates much faster than a fully initialized one (which Local Search uses).
2) The CHs only do ChangeMoves. LS does more expensive moves too, including swap moves (twice as big) and pillar moves (n times as big). So the amount of delta impact to calculate per move is bigger in LS as well.
I want to use a Python min-cost flow solver to be able to construct new networks. This means that I have an initial complete graph, with the vertices being either suppliers or having a demand. Running the algorithm should tell me, based on their costs, which edges will be used to settle all demands. Unlike the existing problem formulations, the cost of an edge when used is not only described by a unit cost but also includes an investment cost for that edge which is independent of the flow. I have been looking into the source code of networkx and or-tools but cannot figure out how to adapt these to implement the investment cost of the edges. Does someone have a better idea, or can help me adapt the code?
Best regards,
Justus
You cannot solve this with a standard graph algorithm (e.g. MinCostFlow).
Instead you need to formulate it as a Mixed Integer Program.
You can start with this example:
https://developers.google.com/optimization/assignment/assignment_mip
But you need to tweak it a little bit:
You need two classes of decision variables: invest_var (binary) and flow_var (continuous).
The objective will look like this:
min: sum(flow_cost[i,j]*flow_var[i,j]) + sum(invest_cost[i,j]*invest_var[i,j])
And you need to add an additional constraint for each link:
flow_var[i,j] <= BIG_INT * invest_var[i,j]
The purpose of this constraint is to force flow_var to 0 whenever invest_var is 0.
Demand and Supply constraints will be similar as in the example.
BIG_INT is a constant. You can set it as:
BIG_INT=max(flow_upper_bound[i,j])
Where flow_upper_bound is an upper bound on your flow_var variables.
Notice that the problem now becomes a Mixed Integer Linear Program instead of just a Linear Program.
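For completeness, here is a minimal sketch of that formulation using OR-Tools' pywraplp wrapper. The arc data and node balances are made up for illustration; the costs, bounds and BIG_INT map to the symbols above:

from ortools.linear_solver import pywraplp

# arcs: (i, j) -> (per-unit flow cost, fixed investment cost, flow upper bound)
arcs = {(0, 1): (2.0, 10.0, 50), (1, 2): (1.0, 8.0, 50), (0, 2): (3.0, 5.0, 50)}
balance = {0: 30, 1: 0, 2: -30}  # node 0 supplies 30 units, node 2 demands 30

solver = pywraplp.Solver.CreateSolver("SCIP")
BIG_INT = max(ub for _, _, ub in arcs.values())  # upper bound on any flow

flow = {a: solver.NumVar(0, solver.infinity(), f"flow{a}") for a in arcs}
invest = {a: solver.IntVar(0, 1, f"invest{a}") for a in arcs}

for a in arcs:  # no flow on an edge unless we pay its investment cost
    solver.Add(flow[a] <= BIG_INT * invest[a])

for n in balance:  # flow conservation: outflow - inflow == supply/demand
    out_f = sum(flow[a] for a in arcs if a[0] == n)
    in_f = sum(flow[a] for a in arcs if a[1] == n)
    solver.Add(out_f - in_f == balance[n])

solver.Minimize(
    sum(c * flow[a] for a, (c, _, _) in arcs.items())
    + sum(ic * invest[a] for a, (_, ic, _) in arcs.items())
)

if solver.Solve() == pywraplp.Solver.OPTIMAL:
    for a in arcs:
        print(a, int(invest[a].solution_value()), flow[a].solution_value())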
I know that my question is general, but I'm new to the AI area.
I have an experiment with some parameters (about 6 of them). Each one is independent, and I want to find the optimal solution that maximizes or minimizes the output function. However, doing it with a traditional programming technique would take a long time, since I would use six nested loops.
I just want to know which AI technique to use for this problem: a genetic algorithm? A neural network? Machine learning?
Update
Actually, the problem could have more than one evaluation function.
It will have one function that we should minimize (cost)
and another function that we want to maximize (capacity).
Maybe other functions will be added.
Example:
Constructing a glass window can be done in a million ways. However, we want the strongest window with the lowest cost. There are many parameters that affect the pressure capacity of the window, such as the strength of the glass, the height and width, and the slope of the window.
Obviously, if we go to extreme cases (the strongest glass, with the smallest width and height, and zero slope) the window will be extremely strong. However, the cost will be very high.
I want to study the interaction between the parameters within a specific range.
Without knowing much about the specific problem it sounds like Genetic Algorithms would be ideal. They've been used a lot for parameter optimisation and have often given good results. Personally, I've used them to narrow parameter ranges for edge detection techniques with about 15 variables and they did a decent job.
Having multiple evaluation functions needn't be a problem if you code this into the Genetic Algorithm's fitness function. I'd look up multi objective optimisation with genetic algorithms.
I'd start here: Multi-Objective optimization using genetic algorithms: A tutorial
First of all, if you have multiple competing targets the problem is ill-posed.
You have to find a single value that you want to maximize... for example:
value = strength - k*cost
or
value = strength / (k1 + k2*cost)
In both cases, for a fixed strength the lower cost wins, and for a fixed cost the higher strength wins, but now you have a formula for deciding whether a given solution is better or worse than another. If you don't do this, how can you decide whether a solution is better than another that is cheaper but weaker?
In some cases a correctly defined value requires a more complex function. For example, the value could increase with strength only up to a certain point (i.e. having a result stronger than a prescribed amount is just pointless), or the cost could have a cap (because above a certain amount a solution is not interesting, as it would place the final price out of the market).
Once you have found the criterion, if the parameters are independent, a very simple approach that in my experience is still decent is the following (a sketch in code follows the list):
1. Pick a random solution by choosing n random values, one for each parameter, within the allowed boundaries.
2. Compute the target value for this starting point.
3. Pick a random number 1 <= k <= n and, for each of k parameters randomly chosen from the n, compute a random signed increment and change the parameter by that amount.
4. Compute the new target value for the translated solution.
5. If the new value is better, keep the new position; otherwise revert to the original one.
6. Repeat from 3 until you run out of time.
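A minimal sketch of those six steps, maximizing an arbitrary toy function (the bounds, step size and time budget are placeholders):

import random, time

def hill_climb(bounds, value, seconds=1.0, step=0.1):
    n = len(bounds)
    sol = [random.uniform(lo, hi) for lo, hi in bounds]   # step 1
    best = value(sol)                                     # step 2
    deadline = time.time() + seconds
    while time.time() < deadline:                         # step 6
        cand = sol[:]
        k = random.randint(1, n)                          # step 3
        for i in random.sample(range(n), k):
            lo, hi = bounds[i]
            cand[i] = min(hi, max(lo, cand[i] + random.uniform(-step, step) * (hi - lo)))
        v = value(cand)                                   # step 4
        if v > best:                                      # step 5: keep only improvements
            sol, best = cand, v
    return sol, best

# toy target: the maximum is at x = 1, y = -2
print(hill_climb([(-5, 5), (-5, 5)], lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2))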
Depending on the target function, some random distributions for the increments work better than others; it may also be that the optimal choice differs from parameter to parameter.
Some time ago I wrote a C++ code for solving optimization problems using Genetic Algorithms. Here it is: http://create-technology.blogspot.ro/2015/03/a-genetic-algorithm-for-solving.html
It should be very easy to follow.
I am optimizing algorithmic trading strategies. In the process of choosing from a pool of many optimized strategies, I am in the phase of evaluating the robustness of each strategy.
Following the guidelines of Dr. Pardo's book "The Evaluation and Optimization of Trading Strategies", on page 231, item 3, Dr. Pardo recommends applying the following ratio to the optimized data:
"3. The ratio of the total profit of all profitable simulations divided by the total profit of all simulations is significantly positive"
The question: from the optimization results, I cannot properly understand what Mr. Pardo means by "... is significantly positive". What does 'significantly positive' mean here?
a.) with a 95% confidence level?
b.) with a certain p-value?
c.) the average net profit of each simulation minus its standard deviation?
Even though the sentence might seem 'simple', I would REALLY like to understand what Mr. Pardo means by the statement and HOW to calculate it, in order to filter the most robust algorithmic strategies.
The aim of analyzing the optimization profile of an algorithmic simulation is to be able to filter robust strategies.
Therefore the ratio should help us to uncover if the simulation results are on the right track or not.
So we would like to impose some 'penalties' on our results, so we can separate the robust cases from the doubtful (not robust) ones.
I came up with the following penalizing measures (found in Mr. Pardo's book and other sources):
a.) we can use a market return (yearly value) as a benchmark, so all the simulations whose results are below that level can be excluded from our analysis;
b.) some other benchmark to divide the 'robust' results from the more 'doubtful' ones (for example, deducting one standard deviation from each result).
From (a) and (b), we can create the ratio:
the total sum of all profitable simulations divided by the profitable results considered robust
The ratio should be greater than or equal to 1.
If the ratio is equal to 1, then our simulation has given interesting results (we are analyzing the positive values in this ratio, but profitable results should always be compared to the negative results as well).
If the ratio is greater than 1, then we have not reached the desired scenario, and the result should be compared with the other optimization tests.
When simulating trading algorithms, no result is absolute, only partial, and its value is taken in relation to what we expect from the algorithm.
If someone has a better explanation, idea, or concept you find interesting, please share it; I would gladly read it.
Best regards to all.
Remark on the subject
With all due respect to the book (published in 2008), the term robustness has its own meaning if-and-only-if the statement also clarifies in which particular respect the robustness is measured and against what phenomena the Model-under-review's response is to be exposed & tested (against what perturbations -- type and scale -- the Model-under-test shall hold its robust behaviour, measures of which were both defined and quantified before the test).
In any case where such a context for the robustness is not defined, the material, be it printed under any bold name, sounds -- and forgive me for speaking plain English -- just like a PR story, an over-hyped e-zine headline or a paid advertorial.
Serious quantitative model evaluation, the more so if one strives to perform an optimisation (with respect to some defined quantitative goal), requires a more thorough insight into the subject than to axiomatically post a trivial "must-have" imperative of
large-average && small-HiLo-range && small-StDev.
Any serious Quant-Modelling effort, if it is not to just spoil the hundreds of thousands of consumed CPU core-hours of deep parametric-space scans, shall incorporate a serious parametrisation decision in every dimension of the main TruTrading Strategy sub-spaces --
{ aSelectPOLICY, aDetectPOLICY, anActPOLICY, anAllocatePOLICY, aTerminatePOLICY }
A failure to do so either cripples the model or leads to a blind belief, and it is hard to guess which of the two is the greater Quant-sin.
Remark on the cited hypothesis
The book states, without any effort to prove the construction, that:
The more robust trading strategy will have an optimization profile with:
1. Large average profit
2. Small maximum-minimum range
3. Small standard deviation
Is it correct?
Now kindly spend a few moments and review this 4D-animated view of a Model-under-test (the visualisation of which is reduced to just four dimensions for easier visual perception), where none of the above holds true.
Based on contemporary state-of-the-art adaptive money-management practice, the claim fails to be correct, be it due to poor parametrisation (thus artificially leading the model into a rather "flat-profits" sub-space of aParamSetVectorSPACE)
or due to a principal mis-concept, or poor practice (including the lack thereof), in the implementation of the most powerful profit-booster ever -- the money-management model sub-space.
Item 1 becomes insignificant altogether.
Item 2 works right contrary to the stated postulate.
Item 3 cannot yield anything but the opposite, due to 1 & 2 above.
Question after a BIG edit:
I need to build a ranking using a genetic algorithm. I have data like this:
P(a>b)=0.9
P(b>c)=0.7
P(c>d)=0.8
P(b>d)=0.3
Now let's interpret a, b, c, d as the names of football teams, and P(x>y) as the probability that x beats y. We want to build a ranking of the teams; some observations are missing: P(a>d) and P(a>c) are unavailable due to the lack of matches between a and d, and between a and c.
The goal is to find the ordering of team names which best describes the current situation in this four-team league.
If we have only 4 teams then the solution is straightforward: first we compute the probabilities of all 4! = 24 orderings of the four teams, ignoring the missing values:
P(abcd)=P(a>b)P(b>c)P(c>d)P(b>d)
P(abdc)=P(a>b)P(b>c)(1-P(c>d))P(b>d)
...
P(dcba)=(1-P(a>b))(1-P(b>c))(1-P(c>d))(1-P(b>d))
and we choose the ranking with highest probability. I don't want to use any other fitness function.
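For the four-team case this brute force is tiny; a sketch that enumerates all 24 orderings and scores them exactly as above:

from itertools import permutations

# observed pairwise probabilities from the question; missing pairs are ignored
P = {("a", "b"): 0.9, ("b", "c"): 0.7, ("c", "d"): 0.8, ("b", "d"): 0.3}

def likelihood(order):
    pos = {team: i for i, team in enumerate(order)}
    prob = 1.0
    for (x, y), p in P.items():
        # use P(x>y) if x is ranked above y in this ordering, else 1 - P(x>y)
        prob *= p if pos[x] < pos[y] else 1.0 - p
    return prob

best = max(permutations("abcd"), key=likelihood)
print("".join(best), likelihood(best))  # the maximum-probability ranking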
My question:
As the number of permutations of n elements is n!, computing the probabilities of all orderings is impossible for large n (my n is about 40). I want to use a genetic algorithm for this problem.
The mutation operator is simple: switch the places of two (or more) elements of the ranking.
But how do I make a crossover of two orderings?
Could P(abcd) be interpreted as the cost function of the path 'abcd' in an asymmetric TSP, where the cost of travelling from x to y differs from the cost of travelling from y to x, P(x>y) = 1 - P(y>x)? There are many crossover operators for the TSP, but I think I have to design my own, because my problem is slightly different from the TSP. Do you have any ideas for a solution, or a framework for conceptual analysis?
The easiest way, on both the conceptual and the implementation level, is to use a crossover operator which exchanges suborderings between two solutions:
CrossOver(ABcD, AcDB) = AcBD
For a random subset of elements (in this case a, b, d, shown in capital letters) we copy and paste the subordering -- the sequence of the elements a, b, d as it appears in the first parent -- into the second ordering.
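A sketch of that operator (illustrative; the child is always a valid permutation because both parents contain the same elements):

import random

def subordering_crossover(p1, p2):
    # choose a random subset of elements (the 'capital letters' above)
    subset = set(random.sample(p1, random.randint(2, len(p1) - 1)))
    sub_order = [e for e in p1 if e in subset]  # their order in parent 1
    it = iter(sub_order)
    # paste that subordering into parent 2, leaving the other elements in place
    return [next(it) if e in subset else e for e in p2]

# with subset {a, b, d} this reproduces CrossOver(ABcD, AcDB) = AcBD
print(subordering_crossover(list("abcd"), list("acdb")))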
Edit: an asymmetric TSP can be transformed into a symmetric TSP, but with forbidden suborderings, which makes that GA approach unsuitable.
It's definitely an interesting problem, and it seems most of the answers and comments have focused on the semantic aspects of the problem (i.e., the meaning of the fitness function, etc.).
I'll chip in some information about the syntactic elements -- how do you do crossover and/or mutation in ways that make sense. Obviously, as you noted with the parallel to the TSP, you have a permutation problem. So if you want to use a GA, the natural representation of candidate solutions is simply an ordered list of your points, careful to avoid repetition -- that is, a permutation.
TSP is one such permutation problem, and there are a number of crossover operators (e.g., Edge Assembly Crossover) that you can take from TSP algorithms and use directly. However, I think you'll have problems with that approach. Basically, the problem is this: in TSP, the important quality of solutions is adjacency. That is, abcd has the same fitness as cdab, because it's the same tour, just starting and ending at a different city. In your example, absolute position is much more important than this notion of relative position. abcd means in a sense that a is the best point -- it's important that it came first in the list.
The key thing you have to do to get an effective crossover operator is to account for what the properties are in the parents that make them good, and try to extract and combine exactly those properties. Nick Radcliffe called this "respectful recombination" (note that paper is quite old, and the theory is now understood a bit differently, but the principle is sound). Taking a TSP-designed operator and applying it to your problem will end up producing offspring that try to conserve irrelevant information from the parents.
You ideally need an operator that attempts to preserve absolute position in the string. The best one I know of offhand is known as Cycle Crossover (CX). I'm missing a good reference off the top of my head, but I can point you to some code where I implemented it as part of my graduate work. The basic idea of CX is fairly complicated to describe, and much easier to see in action. Take the following two points:
abcdefgh
cfhgedba
1. Pick a starting point in parent 1 at random. For simplicity, I'll just start at position 0 with the "a".
2. Now drop straight down into parent 2, and observe the value there (in this case, "c").
3. Now search for "c" in parent 1. We find it at position 2.
4. Now drop straight down again, and observe the "h" in parent 2, position 2.
5. Again, search for this "h" in parent 1; it's found at position 7.
6. Drop straight down and observe the "a" in parent 2.
7. At this point, note that if we search for "a" in parent 1, we reach a position where we've already been. Continuing past that would just cycle. In fact, we call the sequence of positions we visited (0, 2, 7) a "cycle". Note that we can simply exchange the values at these positions between the parents as a group and both parents will retain the permutation property, because we have the same three values at each position in the cycle for both parents, just in different orders.
8. Make the swap of the positions included in the cycle.
Note that this is only one cycle. You then repeat this process starting from a new (unvisited) position each time until all positions have been included in a cycle. After the one iteration described in the above steps, you get the following strings (where an "X" denotes a position in the cycle where the values were swapped between the parents).
cbhdefga
afcgedbh
X X    X
Just keep finding and swapping cycles until you're done.
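Here is a compact sketch of the procedure (my own illustrative version, not the implementation linked below); it alternates cycles between the parents, which for the strings above yields exactly the pair shown:

def cycle_crossover(p1, p2):
    index_in_p1 = {v: i for i, v in enumerate(p1)}
    c1, c2 = list(p1), list(p2)
    visited = [False] * len(p1)
    swap = False  # exchange the contents of every second cycle
    for start in range(len(p1)):
        if visited[start]:
            continue
        i = start
        while not visited[i]:  # trace one cycle
            visited[i] = True
            if swap:
                c1[i], c2[i] = p2[i], p1[i]
            i = index_in_p1[p2[i]]  # "drop down", then search in parent 1
        swap = not swap
    return c1, c2

a, b = cycle_crossover(list("abcdefgh"), list("cfhgedba"))
print("".join(a), "".join(b))  # -> afcgedbh cbhdefga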
The code I linked from my github account is going to be tightly bound to my own metaheuristics framework, but I think it's a reasonably easy task to pull the basic algorithm out from the code and adapt it for your own system.
Note that you can potentially gain quite a lot from doing something more customized to your particular domain. I think something like CX will make a better black box algorithm than something based on a TSP operator, but black boxes are usually a last resort. Other people's suggestions might lead you to a better overall algorithm.
I've worked on a somewhat similar ranking problem and followed a technique similar to what I describe below. Does this work for you?
Assume the unknown value of an object diverges from your estimate via some distribution, say the normal distribution. Interpret a ranking statement such as "a > b, 0.9" as the statement "the value of a lies at the 90th percentile of the distribution centered on b".
For every statement:
def realArrival = calculate a's location on a distribution centered on b
def arrivalGap = | realArrival - expectedArrival |
def fitness = Σ arrivalGap
Fitness function is MIN(fitness)
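A runnable version of that fitness under the stated normality assumption (SIGMA and the sample values are placeholders I made up; scipy's norm.cdf gives the percentile):

from scipy.stats import norm

# observed statements: P(x > y); e.g. ("a", "b"): 0.9 means "a > b, 0.9"
observed = {("a", "b"): 0.9, ("b", "c"): 0.7, ("c", "d"): 0.8, ("b", "d"): 0.3}
SIGMA = 1.0  # assumed spread of every object's distribution

def fitness(values):
    # values: candidate estimate of each object's unknown value; lower is better
    gap = 0.0
    for (x, y), expected_arrival in observed.items():
        real_arrival = norm.cdf(values[x], loc=values[y], scale=SIGMA)
        gap += abs(real_arrival - expected_arrival)
    return gap

print(fitness({"a": 1.3, "b": 0.4, "c": -0.2, "d": 0.3}))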
FWIW, my problem was actually a bin-packing problem, where the equivalent of your "rank" statements were user-provided rankings (1, 2, 3, etc.). So not quite TSP, but still NP-hard. OTOH, bin-packing has a pseudo-polynomial solution proportional to the accepted error, which is what I eventually used. I'm not quite sure that would work with your probabilistic ranking statements.
What an interesting problem! If I understand it, what you're really asking is:
"Given a weighted, directed graph, with each edge-weight in the graph representing the probability that the arc is drawn in the correct direction, return the complete sequence of nodes with maximum probability of being a topological sort of the graph."
So if your graph has N edges, there are 2^N graphs of varying likelihood, with some orderings appearing in more than one graph.
I don't know if this will help (very brief Google searches did not enlighten me, but maybe you'll have more success with more perseverance) but my thoughts are that looking for "topological sort" in conjunction with any of "probabilistic", "random", "noise," or "error" (because the edge weights can be considered as a reliability factor) might be helpful.
I strongly question your assertion, in your example, that P(a>c) is not needed, though. You know your application space best, but it seems to me that specifying P(a>c) = 0.99 will give a different fitness for f(abc) than specifying P(a>c) = 0.01.
You might want to throw in "Bayesian" as well, since you might be able to start to infer values for (in your example) P(a>c) given your conditions and hypothetical solutions. The problem is, "topological sort" and "bayesian" is going to give you a whole bunch of hits related to markov chains and markov decision problems, which may or may not be helpful.