What is meant by "unit" in IDEA code duplication analysis? - intellij-idea

IntelliJ IDEA has the ability to find duplicated code.
One can tune the number of "units" (according to the documentation) above which a fragment is considered a duplicate.
However, I can't find any explanation of what this "unit" is.
I'm looking for an answer that unambiguously defines such units.

The "units" measure is used in option Do not show duplicates simpler than. This option defines the minimal weight of the reported code fragments.
This weight is computed as a sum of all element weights in the fragment.
And since different elements have the different weights sum of them must be measured in abstract "units".
Element weight can be roughly approximated as:
it's a statement -> 2
it's an expression/literal/identifier -> 1
otherwise -> 0
For example, weight of x = 42; can be approximated as w(x) + w(=) + w(42) + w(;) + w(statement(x=42;)). Which is rougly 1 + 0 + 1 + 2 = 4 .
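To make the approximation concrete, here is a hypothetical sketch in Python (not IDEA's actual implementation; the node kinds and weights are assumptions based on the table above) of how such a weight could be summed over a parsed fragment:

WEIGHTS = {"statement": 2, "expression": 1, "literal": 1, "identifier": 1}

def weight(node):
    # node = (kind, children); operators and punctuation fall through to 0
    kind, children = node
    return WEIGHTS.get(kind, 0) + sum(weight(child) for child in children)

# x = 42;  ->  a statement containing the identifier x, the operator =, the literal 42 and ;
stmt = ("statement", [("identifier", []), ("operator", []), ("literal", []), ("punct", [])])
print(weight(stmt))   # 2 + 1 + 0 + 1 + 0 = 4 units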

Related

What is O(nm/8 * log(nm/8)) + O(nm/9 * log(nm/9)) + ... + O(nm/m * log(nm/m)) equal to?

I'm sorry for the question title but I can't find a simpler way to put it. Basically, my algorithm involves quicksort for O(nm/k) elements, where k ranges from 8 to m. I wonder what the total complexity for this is, and how to deduce it? Thank you!
Drop the division inside the logarithms and we get nm*log(nm) * (1/8 + ... + 1/m) = O(nm*log(nm)*log(m)) = O(nm*log(m)^2 + nm*log(m)*log(n)). [I used the fact that the partial sums of the harmonic series grow like ln(m).]
Note that because we dropped the divisions inside the logarithms, this is an upper bound rather than a tight bound (but a better one than the naive approach of multiplying the largest term by m).
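If you want to see how tight the bound is in practice, here is a small numeric sanity check (illustrative only, with arbitrary n and m) comparing the exact sum with the nm*log(nm)*(1/8 + ... + 1/m) bound used above:

from math import log

def exact_sum(n, m):
    # sum over k = 8..m of (n*m/k) * log(n*m/k)
    return sum((n * m / k) * log(n * m / k) for k in range(8, m + 1))

def upper_bound(n, m):
    # n*m*log(n*m) * (1/8 + ... + 1/m)
    harmonic_tail = sum(1.0 / k for k in range(8, m + 1))
    return n * m * log(n * m) * harmonic_tail

for n, m in [(10, 100), (100, 1000), (1000, 10000)]:
    print(n, m, exact_sum(n, m) / upper_bound(n, m))   # ratio stays below 1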

Bayesian estimation of log-normal using JAGS

I am trying to find the 95% credible interval of 50 sample means. Sample sizes range from 2 to 600, and the values in each sample are bounded between 1 and 5.
ex:
sample 1 = (1,3.5,2.8,5,4.6)
sample 2 = (1,5)
sample 3 = (4.1,1.1,5,3.5,2,2.4,...)
Samples with a size of 10 or more follow a lognormal distribution, so I used JAGS for Bayesian estimation of the log-normal parameters, adapted from John K. Kruschke, with the model specification below:
modelstring = "
model {
for( i in 1 : N ) {
y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 )
}
sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )
muOfLogY ~ dunif( 0.001*meanOfLogY , 1000*meanOfLogY )
muOfY <- exp(muOfLogY+sigmaOfLogY^2/2)
modeOfY <- exp(muOfLogY-sigmaOfLogY^2)
sigmaOfY <- sqrt(exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1))
}
"
The model works fine with sample sizes > 10. However, with 3 <= sample size < 10 I get extreme values for the upper limit (e.g., 3000), which exceed the maximum possible value of the mean (e.g., 5).
With a sample size of 2, I get the error below:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
I am new to JAGS and can't figure out how to solve these issues. I think that for samples < 10 the distribution is no longer lognormal!
Any ideas?
Thank you
First, a semantic note: you are not using JAGS to find sample means; you are using JAGS to find the means of the populations from which the samples arose. If you wanted the sample (log) means, you could just take the mean of (the logarithms of) the sample values.
Now, if the values in each sample are bounded between 1 and 5 (due to some external constraint), then the samples are NEVER drawn from a log-normal distribution, which inherently puts probability mass on values greater than five.
Let's imagine, for the sake of argument, that the samples do arise from log-normal sampling (and therefore aren't inherently bounded between 1 and 5). Then JAGS is simply telling you that the sample does not contain enough information to give a good estimate of the mean of the population it was drawn from. I wouldn't worry about understanding the error when the sample size is two, because there is literally no way to get good inference about a population mean from two observations. This is true even if you know that the population is indeed log-normally distributed. And since your populations are not actually log-normally distributed (they are bounded between 1 and 5), the entire inferential procedure is invalid anyway.
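A quick simulation (a sketch assuming NumPy, with made-up log-scale parameters) illustrates the point: even when the data really are log-normal, means computed from two observations scatter enormously, so no posterior interval can be narrow.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.8                     # hypothetical log-scale parameters
true_mean = np.exp(mu + sigma**2 / 2)    # population mean of this log-normal

for n in (2, 10, 100):
    # 10,000 independent samples of size n, one mean per sample
    means = rng.lognormal(mu, sigma, size=(10_000, n)).mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    print(f"n={n:3d}: sample means span roughly [{lo:.2f}, {hi:.2f}], true mean {true_mean:.2f}")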

What is definition of truncated polynomial?

In NTRU encryption, I have seen truncated polynomials, but I cannot understand the truncated polynomial calculation.
So, could anyone tell me how to calculate with truncated polynomials?
The polynomials are truncated in the sense that they only have coefficients up to a certain degree.
Here is how you truncate the product of two truncated polynomials (the sum is trivial):
Assume you have two truncated polynomials, i.e. two polynomials of degree no greater than n-1
a = a[0] + a[1]X + ... + a[n-1]X^(n-1)
b = b[0] + b[1]X + ... + b[n-1]X^(n-1)
Then their "truncated" product is defined as the polynomial
a * b = c[0] + c[1]X + ... +c[n-1]X^(n-1)
where the c[k] coefficients are computed as follows:
1. Reverse b[0]..b[n-1] to get b[n-1]..b[0].
2. Rotate the result of step 1 k+1 times to the right to get b[k]..b[0]b[n-1]..b[k+1].
3. Denote by b_k[0]..b_k[n-1] the array computed in step 2.
4. Now define
   c[k] = a[0]b_k[0] + a[1]b_k[1] + ... + a[n-1]b_k[n-1].
In other words, c[k] = a[0]b[k] + a[1]b[k-1] + ... with the indices of b taken modulo n. The same result can be obtained by multiplying the polynomials a and b in the usual way and then reducing the product modulo X^n - 1, i.e. folding the coefficient of X^(n+j) back onto the coefficient of X^j. The point of the algorithm above is to produce the reduced coefficients directly, without first computing the full product of degree 2(n-1).
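As an illustration (a minimal sketch, not an NTRU implementation; coefficient reduction modulo q is omitted), the product of truncated polynomials described above can be written as:

def truncated_product(a, b):
    # a, b: coefficient lists of length n; returns c with c[k] = sum_i a[i] * b[(k - i) mod n]
    n = len(a)
    assert len(b) == n
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) for k in range(n)]

# (1 + 2X + 3X^2) * (4 + 5X + 6X^2) reduced modulo X^3 - 1
print(truncated_product([1, 2, 3], [4, 5, 6]))   # [31, 31, 28]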

Solving the multiple-choice multidimensional knapsack

I am trying to solve some (relatively easy) instances of the multiple-choice multidimensional knapsack problem (there are groups of items, only one item per group can be chosen, and the item weights are multi-dimensional, as is the knapsack capacity). I have two questions regarding the formulation and solution:
1. If two groups have different numbers of items, is it possible to pad the smaller groups with items having zero profit and weight equal to the capacity, so as to express the problem in matrix form? Would this affect the solution? Specifically, assume I have optimization programs where the first group (item set) has three candidate items and the second group has two (i.e., not three), of the following form:
maximize (over x_ij)  v_11 x_11 + v_12 x_12 + v_13 x_13 + v_21 x_21 + v_22 x_22
subject to            w^i_11 x_11 + w^i_12 x_12 + w^i_13 x_13 + w^i_21 x_21 + w^i_22 x_22 <= W^i,  i = 1,2
                      x_11 + x_12 + x_13 = 1,  x_21 + x_22 = 1,  x_ij in {0,1} for all i and j.
Is it OK in this scenario to add an artificial item x_23 with value v_23 = 0 and w^1_23 = W^1, w^2_23 = W^2, so as to have the full set of products v_ij x_ij (i = 1,2; j = 1,2,3)?
2. Given that (1) is possible, has anyone tried to solve such instances with an open-source optimization package such as CVX? I know about CPLEX, but it is difficult to obtain for a non-academic, and I am not sure that GLPK supports groups of variables.
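For what it's worth, the two-group program above can be written directly in an open-source modelling layer. Here is a minimal sketch using PuLP (one option; CVXPY with boolean variables is similar), with made-up values v, weights w and capacities W; GLPK or the bundled CBC can serve as the solver:

from pulp import LpProblem, LpVariable, LpMaximize, lpSum

v = [[3, 5, 4], [2, 6]]                    # v_11..v_13, v_21, v_22 (made up)
w = [[[2, 4, 3], [1, 2]],                  # w^1_gj
     [[1, 3, 2], [2, 1]]]                  # w^2_gj
W = [5, 4]                                 # W^1, W^2

prob = LpProblem("mcmkp", LpMaximize)
x = {(g, j): LpVariable(f"x_{g+1}{j+1}", cat="Binary")
     for g in range(len(v)) for j in range(len(v[g]))}

prob += lpSum(v[g][j] * x[g, j] for (g, j) in x)                  # objective
for i in range(len(W)):                                           # capacity rows
    prob += lpSum(w[i][g][j] * x[g, j] for (g, j) in x) <= W[i]
for g in range(len(v)):                                           # one item per group
    prob += lpSum(x[g, j] for j in range(len(v[g]))) == 1

prob.solve()
print([(var.name, var.varValue) for var in prob.variables()])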

Division in double precision

I have two double variables:
a > 0
b >= 0
which could be tiny numbers. 'a' represents singular values of a matrix and 'b' represents the Tikhonov regularization constant. As part of the Tikhonov least squares solution, it is necessary to compute the quantity:
c = a*a / (a*a + b)
However, if a is really small (i.e., small singular values of the matrix), a*a may not be representable in double precision. How can I compute the quotient c in a numerically stable way for the given ranges of a and b?
The best I can come up with is:
c = 1 / (1 + b / a / a)
To derive this equivalence, note that 1/c = (a^2 + b)/a^2 and then split the fraction. This form may be more numerically stable since it never requires a^2 to be computed. It will still lose precision if both b and a are very small; if that case must be handled too, you might look at a Taylor series expansion (which may or may not work for this case).
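A quick numerical check of the two forms (a sketch in Python, whose floats are IEEE doubles, with made-up magnitudes): for very small a, a*a underflows to zero and the direct formula collapses, while the rewritten form stays close to the true value as long as b/a^2 does not overflow.

a, b = 1e-200, 1e-150

direct    = a * a / (a * a + b)      # a*a underflows to 0.0, so this gives 0.0
rewritten = 1.0 / (1.0 + b / a / a)  # b/a/a = 1e250, so this gives about 1e-250

print(direct, rewritten)             # the true value of c is about 1e-250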