java method: java.lang.Integer.numberOfLeadingZeros(int) can be optimized - jvm

the origin code is :
public static int numberOfLeadingZeros(int i) {
// HD, Figure 5-6
if (i == 0)
return 32;
int n = 1;
if (i >>> 16 == 0) { n += 16; i <<= 16; }
if (i >>> 24 == 0) { n += 8; i <<= 8; }
if (i >>> 28 == 0) { n += 4; i <<= 4; }
if (i >>> 30 == 0) { n += 2; i <<= 2; }
n -= i >>> 31;
return n;
}
I think it can be optimized ,should add following condition:
if (i < 0)
return 0;
the fully optimized code is :
public static int numberOfLeadingZeros(int i) {
if(i<=0) {
return i < 0 ? 0 : 32;
}
int n = 1;
if (i >>> 16 == 0) { n += 16; i <<= 16; }
if (i >>> 24 == 0) { n += 8; i <<= 8; }
if (i >>> 28 == 0) { n += 4; i <<= 4; }
if (i >>> 30 == 0) { n += 2; i <<= 2; }
n -= i >>> 31;
return n;
}

In theory yes, your suggestion makes sense.
In practice, unless you use an exotic JVM, it will not make any difference because the method is intrinsic, so the code that is executed is not the code you can find in the Java class.
For example on x86/64 cpus, the code is here and uses the bsrl CPU instruction, which is as fast as you can hope for.

Besides the fact that this method will likely get replaced by an intrinsic operation for hot spots, this check for negative numbers is only an improvement, if the number is negative. For positive numbers, it is just an additional condition to be evaluated.
So the worth of this optimization depends on the likelihood of negative arguments at this function. When I consider typical use cases of this function, I’d consider negative values a corner case rather than typical argument.
Note that the special handling of zero at the beginning is not an optimization, but a requirement as the algorithm wouldn’t return the correct result for zero without that special handling.
Since your bug report yield to finding an alternative (also shown in your updated question) which improves the negative number case without affecting the performance of the positive number case, as it fuses the required zero test and the test for negative numbers into a single pre-test, there is nothing preventing the suggested optimization.

Bug has been created on oracle bug database: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8189230

Related

When can an algorithm have square root(n) time complexity?

Can someone give me example of an algorithm that has square root(n) time complexity. What does square root time complexity even mean?
Square root time complexity means that the algorithm requires O(N^(1/2)) evaluations where the size of input is N.
As an example for an algorithm which takes O(sqrt(n)) time, Grover's algorithm is one which takes that much time. Grover's algorithm is a quantum algorithm for searching an unsorted database of n entries in O(sqrt(n)) time.
Let us take an example to understand how can we arrive at O(sqrt(N)) runtime complexity, given a problem. This is going to be elaborate, but is interesting to understand. (The following example, in the context for answering this question, is taken from Coding Contest Byte: The Square Root Trick , very interesting problem and interesting trick to arrive at O(sqrt(n)) complexity)
Given A, containing an n elements array, implement a data structure for point updates and range sum queries.
update(i, x)-> A[i] := x (Point Updates Query)
query(lo, hi)-> returns A[lo] + A[lo+1] + .. + A[hi]. (Range Sum Query)
The naive solution uses an array. It takes O(1) time for an update (array-index access) and O(hi - lo) = O(n) for the range sum (iterating from start index to end index and adding up).
A more efficient solution splits the array into length k slices and stores the slice sums in an array S.
The update takes constant time, because we have to update the value for A and the value for the corresponding S. In update(6, 5) we have to change A[6] to 5 which results in changing the value of S1 to keep S up to date.
The range-sum query is interesting. The elements of the first and last slice (partially contained in the queried range) have to be traversed one by one, but for slices completely contained in our range we can use the values in S directly and get a performance boost.
In query(2, 14) we get,
query(2, 14) = A[2] + A[3]+ (A[4] + A[5] + A[6] + A[7]) + (A[8] + A[9] + A[10] + A[11]) + A[12] + A[13] + A[14] ;
query(2, 14) = A[2] + A[3] + S[1] + S[2] + A[12] + A[13] + A[14] ;
query(2, 14) = 0 + 7 + 11 + 9 + 5 + 2 + 0;
query(2, 14) = 34;
The code for update and query is:
def update(S, A, i, k, x):
S[i/k] = S[i/k] - A[i] + x
A[i] = x
def query(S, A, lo, hi, k):
s = 0
i = lo
//Section 1 (Getting sum from Array A itself, starting part)
while (i + 1) % k != 0 and i <= hi:
s += A[i]
i += 1
//Section 2 (Getting sum from Slices directly, intermediary part)
while i + k <= hi:
s += S[i/k]
i += k
//Section 3 (Getting sum from Array A itself, ending part)
while i <= hi:
s += A[i]
i += 1
return s
Let us now determine the complexity.
Each query takes on average
Section 1 takes k/2 time on average. (you might iterate atmost k/2)
Section 2 takes n/k time on average, basically number of slices
Section 3 takes k/2 time on average. (you might iterate atmost k/2)
So, totally, we get k/2 + n/k + k/2 = k + n/k time.
And, this is minimized for k = sqrt(n). sqrt(n) + n/sqrt(n) = 2*sqrt(n)
So we get a O(sqrt(n)) time complexity query.
Prime numbers
As mentioned in some other answers, some basic things related to prime numbers take O(sqrt(n)) time:
Find number of divisors
Find sum of divisors
Find Euler's totient
Below I mention two advanced algorithms which also bear sqrt(n) term in their complexity.
MO's Algorithm
try this problem: Powerful array
My solution:
#include <bits/stdc++.h>
using namespace std;
const int N = 1E6 + 10, k = 500;
struct node {
int l, r, id;
bool operator<(const node &a) {
if(l / k == a.l / k) return r < a.r;
else return l < a.l;
}
} q[N];
long long a[N], cnt[N], ans[N], cur_count;
void add(int pos) {
cur_count += a[pos] * cnt[a[pos]];
++cnt[a[pos]];
cur_count += a[pos] * cnt[a[pos]];
}
void rm(int pos) {
cur_count -= a[pos] * cnt[a[pos]];
--cnt[a[pos]];
cur_count -= a[pos] * cnt[a[pos]];
}
int main() {
int n, t;
cin >> n >> t;
for(int i = 1; i <= n; i++) {
cin >> a[i];
}
for(int i = 0; i < t; i++) {
cin >> q[i].l >> q[i].r;
q[i].id = i;
}
sort(q, q + t);
memset(cnt, 0, sizeof(cnt));
memset(ans, 0, sizeof(ans));
int curl(0), curr(0), l, r;
for(int i = 0; i < t; i++) {
l = q[i].l;
r = q[i].r;
/* This part takes O(n * sqrt(n)) time */
while(curl < l)
rm(curl++);
while(curl > l)
add(--curl);
while(curr > r)
rm(curr--);
while(curr < r)
add(++curr);
ans[q[i].id] = cur_count;
}
for(int i = 0; i < t; i++) {
cout << ans[i] << '\n';
}
return 0;
}
Query Buffering
try this problem: Queries on a Tree
My solution:
#include <bits/stdc++.h>
using namespace std;
const int N = 2e5 + 10, k = 333;
vector<int> t[N], ht;
int tm_, h[N], st[N], nd[N];
inline int hei(int v, int p) {
for(int ch: t[v]) {
if(ch != p) {
h[ch] = h[v] + 1;
hei(ch, v);
}
}
}
inline void tour(int v, int p) {
st[v] = tm_++;
ht.push_back(h[v]);
for(int ch: t[v]) {
if(ch != p) {
tour(ch, v);
}
}
ht.push_back(h[v]);
nd[v] = tm_++;
}
int n, tc[N];
vector<int> loc[N];
long long balance[N];
vector<pair<long long,long long>> buf;
inline long long cbal(int v, int p) {
long long ans = balance[h[v]];
for(int ch: t[v]) {
if(ch != p) {
ans += cbal(ch, v);
}
}
tc[v] += ans;
return ans;
}
inline void bal() {
memset(balance, 0, sizeof(balance));
for(auto arg: buf) {
balance[arg.first] += arg.second;
}
buf.clear();
cbal(1,1);
}
int main() {
int q;
cin >> n >> q;
for(int i = 1; i < n; i++) {
int x, y; cin >> x >> y;
t[x].push_back(y); t[y].push_back(x);
}
hei(1,1);
tour(1,1);
for(int i = 0; i < ht.size(); i++) {
loc[ht[i]].push_back(i);
}
vector<int>::iterator lo, hi;
int x, y, type;
for(int i = 0; i < q; i++) {
cin >> type;
if(type == 1) {
cin >> x >> y;
buf.push_back(make_pair(x,y));
}
else if(type == 2) {
cin >> x;
long long ans(0);
for(auto arg: buf) {
hi = upper_bound(loc[arg.first].begin(), loc[arg.first].end(), nd[x]);
lo = lower_bound(loc[arg.first].begin(), loc[arg.first].end(), st[x]);
ans += arg.second * (hi - lo);
}
cout << tc[x] + ans/2 << '\n';
}
else assert(0);
if(i % k == 0) bal();
}
}
There are many cases.
These are the few problems which can be solved in root(n) complexity [better may be possible also].
Find if a number is prime or not.
Grover's Algorithm: allows search (in quantum context) on unsorted input in time proportional to the square root of the size of the input.link
Factorization of the number.
There are many problems that you will face which will demand use of sqrt(n) complexity algorithm.
As an answer to second part:
sqrt(n) complexity means if the input size to your algorithm is n then there approximately sqrt(n) basic operations ( like **comparison** in case of sorting). Then we can say that the algorithm has sqrt(n) time complexity.
Let's analyze the 3rd problem and it will be clear.
let's n= positive integer. Now there exists 2 positive integer x and y such that
x*y=n;
Now we know that whatever be the value of x and y one of them will be less than sqrt(n). As if both are greater than sqrt(n)
x>sqrt(n) y>sqrt(n) then x*y>sqrt(n)*sqrt(n) => n>n--->contradiction.
So if we check 2 to sqrt(n) then we will have all the factors considered ( 1 and n are trivial factors).
Code snippet:
int n;
cin>>n;
print 1,n;
for(int i=2;i<=sqrt(n);i++) // or for(int i=2;i*i<=n;i++)
if((n%i)==0)
cout<<i<<" ";
Note: You might think that not considering the duplicate we can also achieve the above behaviour by looping from 1 to n. Yes that's possible but who wants to run a program which can run in O(sqrt(n)) in O(n).. We always look for the best one.
Go through the book of Cormen Introduction to Algorithms.
I will also request you to read following stackoverflow question and answers they will clear all the doubts for sure :)
Are there any O(1/n) algorithms?
Plain english explanation Big-O
Which one is better?
How do you calculte big-O complexity?
This link provides a very basic beginner understanding of O() i.e., O(sqrt n) time complexity. It is the last example in the video, but I would suggest that you watch the whole video.
https://www.youtube.com/watch?v=9TlHvipP5yA&list=PLDN4rrl48XKpZkf03iYFl-O29szjTrs_O&index=6
The simplest example of an O() i.e., O(sqrt n) time complexity algorithm in the video is:
p = 0;
for(i = 1; p <= n; i++) {
p = p + i;
}
Mr. Abdul Bari is reknowned for his simple explanations of data structures and algorithms.
Primality test
Solution in JavaScript
const isPrime = n => {
for(let i = 2; i <= Math.sqrt(n); i++) {
if(n % i === 0) return false;
}
return true;
};
Complexity
O(N^1/2) Because, for a given value of n, you only need to find if its divisible by numbers from 2 to its root.
JS Primality Test
O(sqrt(n))
A slightly more performant version, thanks to Samme Bae, for enlightening me with this. 😉
function isPrime(n) {
if (n <= 1)
return false;
if (n <= 3)
return true;
// Skip 4, 6, 8, 9, and 10
if (n % 2 === 0 || n % 3 === 0)
return false;
for (let i = 5; i * i <= n; i += 6) {
if (n % i === 0 || n % (i + 2) === 0)
return false;
}
return true;
}
isPrime(677);

Sum of Hamming Distances

I started preparing for an interview and came across this problem:
An array of integers is given
Now calculate the sum of Hamming distances of all pairs of integers in the array in their binary representation.
Example:
given {1,2,3} or {001,010,011} (used 3 bits just to simplify)
result= HD(001,010)+HD(001,011)+HD(010,011)= 2+1+1=4;
The only optimization, from a purely brute force solution, I know I can use here, is in the individual calculation of Hamming Distance as seen here:
int hamming_distance(unsigned x, unsigned y)
{
int dist;
unsigned val;
dist = 0;
val = x ^ y; // XOR
// Count the number of bits set
while (val != 0)
{
// A bit is set, so increment the count and clear the bit
dist++;
val &= val - 1;
}
// Return the number of differing bits
return dist;
}
What's the best way to go about solving this problem?
Here is my C++ implementation, with O(n) complexity and O(1) space.
int sumOfHammingDistance(vector<unsigned>& nums) {
int n = sizeof(unsigned) * 8;
int len = nums.size();
vector<int> countOfOnes(n, 0);
for (int i = 0; i < len; i++) {
for (int j = 0; j < n; j++) {
countOfOnes[j] += (nums[i] >> j) & 1;
}
}
int sum = 0;
for (int count: countOfOnes) {
sum += count * (len - count);
}
return sum;
}
You can consider the bit-positions separately. That gives you 32 (or some other number) of easier problems, where you still have to calculate the sum of all pairs of hamming distances, except now it's over 1-bit numbers.
The hamming distance between two 1-bit numbers is their XOR.
And now it has become the easiest case of this problem - it's already split per bit.
So to reiterate the answer to that question, you take a bit position, count the number of 0's and the number of 1's, multiply those to get the contribution of this bit position. Sum those for all bit positions. It's even simpler than the linked problem, because the weight of the contribution of every bit is 1 in this problem.

Simulating a card game. degenerate suits

This might be a bit cryptic title but I have a very specific problem. First my current setup
Namely in my card simulator I deal 32 cards to 4 players in sets of 8. So 8 cards per player.
With the 4 standard suits (spades, harts , etc)
My current implementation cycles threw all combinations of 8 out of 32
witch gives me a large number of possibilities.
Namely the first player can have 10518300 different hands be dealt.
The second can then be dealt 735471 different hands.
The third player then 12870 different hands.
and finally the fourth can have only 1
giving me a grand total of 9.9561092e+16 different unique ways to deal a deck of 32 cards to 4 players. if the order of cards doesn’t matter.
On a 4 Ghz processor even with 1 tick per possibility it would take me half a year.
However I would like to simplify this dealing of cards by making the exchange of diamonds, harts and spades. Meaning that dealing of 8 harts to player 1 is equivalent to dealing 8 spades. (note that this doesn’t apply to clubs)
I am looking for a way to generate this. Because this will cut down the possibilities of the first hand by at least a factor of 6. My current implementation is in c++.
But feel free to answer in a different Languages
/** http://stackoverflow.com/a/9331125 */
unsigned cjasMain::nChoosek( unsigned n, unsigned k )
{
//assert(k < n);
if (k > n) return 0;
if (k * 2 > n) k = n-k;
if (k == 0) return 1;
int result = n;
for( int i = 2; i <= k; ++i ) {
result *= (n-i+1);
result /= i;
}
return result;
}
/** [combination c n p x]
* get the [x]th lexicographically ordered set of [r] elements in [n]
* output is in [c], and should be sizeof(int)*[r]
* http://stackoverflow.com/a/794 */
void cjasMain::Combination(int8_t* c,unsigned n,unsigned r, unsigned x){
++x;
assert(x>0);
int i,p,k = 0;
for(i=0;i<r-1;i++){
c[i] = (i != 0) ? c[i-1] : 0;
do {
c[i]++;
p = nChoosek(n-c[i],r-(i+1));
k = k + p;
} while(k < x);
k = k - p;
}
c[r-1] = c[r-2] + x - k;
}
/**http://stackoverflow.com/a/9430993 */
template <unsigned n,std::size_t r>
void cjasMain::Combinations()
{
static_assert(n>=r,"error n needs to be larger then r");
std::vector<bool> v(n);
std::fill(v.begin() + r, v.end(), true);
do
{
for (int i = 0; i < n; ++i)
{
if (!v[i])
{
COUT << (i+1) << " ";
}
}
static int j=0;
COUT <<'\t'<< j++<< "\n";
}
while (std::next_permutation(v.begin(), v.end()));
return;
}
A requirement is that from lexicographical number I can get back the original array.
Even the slightest optimization can help my monto carol simulation I hope.

Which are faster squares or roots?

for (int i = 2; i * i <= n; i++)
for (int i = 2; i <= SQRT(n); i++)
just wondering which is faster I looked at some primitive algorithms for getting roots and it would seem to me that squaring the number would be faster but I don't know for sure. These loops are for determining a numbers "primeness".
Shouldn't the comaprison be between
int sqrt = SQRT(n);
for (int i = 2; i <= sqrt; i++)
and
for (int i = 2; i * i <= n; i++)
The answer will depend on how many loop iterations you do. The sqrt method does less work per iteration, but it has a higher start-up cost. Mind you, this reeks of premature optimisation.
Compiler may 'cache' result of SQRT (n), but i * i it should compute on each step.
Square root will take longer, unless it's implemented in hardware, lookup, or a special machine code version. Newton iteration is the algorithm of choice; it converges quadratically.
Best to benchmark for yourself. I'd recommend moving the call to square root outside the loop so you only do it once rather than every time you check the exit condition.
Why not skip both of them and use some clever maths? The Following code avoid both of them using the Property that Sum of the First n odd numbers is always a perfect square.
A shameless plug for my old blogpost (from my dead blog)
int isPrime(int n)
{
int squares = 1;
int odd = 3;
if( ((n & 1) == 0) || (n < 9)) return (n == 2) || ((n > 1) && (n & 1));
else
{
for( ;squares <= n; odd += 2)
{
if( n % odd == 0)
return 0;
squares+=odd;
}
return 1;
}
}
The square will be faster.
But the square will overflow if n is larger than the square root of the largest int, and then the comparison will go wrong. The square root function could (and you would expect to) be implemented in such a way that is can be calculated on arguments all the way up to the largest representable int. That means it won't go wrong in that way.
In Java, the largest int is 2^31 - 1, which means its square root is just under 46341. If you want to look for primes larger than that, the squaring would stop you.

Number of possible combinations

How many possible combinations of the variables a,b,c,d,e are possible if I know that:
a+b+c+d+e = 500
and that they are all integers and >= 0, so I know they are finite.
#Torlack, #Jason Cohen: Recursion is a bad idea here, because there are "overlapping subproblems." I.e., If you choose a as 1 and b as 2, then you have 3 variables left that should add up to 497; you arrive at the same subproblem by choosing a as 2 and b as 1. (The number of such coincidences explodes as the numbers grow.)
The traditional way to attack such a problem is dynamic programming: build a table bottom-up of the solutions to the sub-problems (starting with "how many combinations of 1 variable add up to 0?") then building up through iteration (the solution to "how many combinations of n variables add up to k?" is the sum of the solutions to "how many combinations of n-1 variables add up to j?" with 0 <= j <= k).
public static long getCombos( int n, int sum ) {
// tab[i][j] is how many combinations of (i+1) vars add up to j
long[][] tab = new long[n][sum+1];
// # of combos of 1 var for any sum is 1
for( int j=0; j < tab[0].length; ++j ) {
tab[0][j] = 1;
}
for( int i=1; i < tab.length; ++i ) {
for( int j=0; j < tab[i].length; ++j ) {
// # combos of (i+1) vars adding up to j is the sum of the #
// of combos of i vars adding up to k, for all 0 <= k <= j
// (choosing i vars forces the choice of the (i+1)st).
tab[i][j] = 0;
for( int k=0; k <= j; ++k ) {
tab[i][j] += tab[i-1][k];
}
}
}
return tab[n-1][sum];
}
$ time java Combos
2656615626
real 0m0.151s
user 0m0.120s
sys 0m0.012s
The answer to your question is 2656615626.
Here's the code that generates the answer:
public static long getNumCombinations( int summands, int sum )
{
if ( summands <= 1 )
return 1;
long combos = 0;
for ( int a = 0 ; a <= sum ; a++ )
combos += getNumCombinations( summands-1, sum-a );
return combos;
}
In your case, summands is 5 and sum is 500.
Note that this code is slow. If you need speed, cache the results from summand,sum pairs.
I'm assuming you want numbers >=0. If you want >0, replace the loop initialization with a = 1 and the loop condition with a < sum. I'm also assuming you want permutations (e.g. 1+2+3+4+5 plus 2+1+3+4+5 etc). You could change the for-loop if you wanted a >= b >= c >= d >= e.
I solved this problem for my dad a couple months ago...extend for your use. These tend to be one time problems so I didn't go for the most reusable...
a+b+c+d = sum
i = number of combinations
for (a=0;a<=sum;a++)
{
for (b = 0; b <= (sum - a); b++)
{
for (c = 0; c <= (sum - a - b); c++)
{
//d = sum - a - b - c;
i++
}
}
}
This would actually be a good question to ask on an interview as it is simple enough that you could write up on a white board, but complex enough that it might trip someone up if they don't think carefully enough about it. Also, you can also for two different answers which cause the implementation to be quite different.
Order Matters
If the order matters then any solution needs to allow for zero to appear for any of the variables; thus, the most straight forward solution would be as follows:
public class Combos {
public static void main() {
long counter = 0;
for (int a = 0; a <= 500; a++) {
for (int b = 0; b <= (500 - a); b++) {
for (int c = 0; c <= (500 - a - b); c++) {
for (int d = 0; d <= (500 - a - b - c); d++) {
counter++;
}
}
}
}
System.out.println(counter);
}
}
Which returns 2656615626.
Order Does Not Matter
If the order does not matter then the solution is not that much harder as you just need to make sure that zero isn't possible unless sum has already been found.
public class Combos {
public static void main() {
long counter = 0;
for (int a = 1; a <= 500; a++) {
for (int b = (a != 500) ? 1 : 0; b <= (500 - a); b++) {
for (int c = (a + b != 500) ? 1 : 0; c <= (500 - a - b); c++) {
for (int d = (a + b + c != 500) ? 1 : 0; d <= (500 - a - b - c); d++) {
counter++;
}
}
}
}
System.out.println(counter);
}
}
Which returns 2573155876.
One way of looking at the problem is as follows:
First, a can be any value from 0 to 500. Then if follows that b+c+d+e = 500-a. This reduces the problem by one variable. Recurse until done.
For example, if a is 500, then b+c+d+e=0 which means that for the case of a = 500, there is only one combination of values for b,c,d and e.
If a is 300, then b+c+d+e=200, which is in fact the same problem as the original problem, just reduced by one variable.
Note: As Chris points out, this is a horrible way of actually trying to solve the problem.
link text
If they are a real numbers then infinite ... otherwise it is a bit trickier.
(OK, for any computer representation of a real number there would be a finite count ... but it would be big!)
It has general formulae, if
a + b + c + d = N
Then number of non-negative integral solution will be C(N + number_of_variable - 1, N)
#Chris Conway answer is correct. I have tested with a simple code that is suitable for smaller sums.
long counter = 0;
int sum=25;
for (int a = 0; a <= sum; a++) {
for (int b = 0; b <= sum ; b++) {
for (int c = 0; c <= sum; c++) {
for (int d = 0; d <= sum; d++) {
for (int e = 0; e <= sum; e++) {
if ((a+b+c+d+e)==sum) counter=counter+1L;
}
}
}
}
}
System.out.println("counter e "+counter);
The answer in math is 504!/(500! * 4!).
Formally, for x1+x2+...xk=n, the number of combination of nonnegative number x1,...xk is the binomial coefficient: (k-1)-combination out of a set containing (n+k-1) elements.
The intuition is to choose (k-1) points from (n+k-1) points and use the number of points between two chosen points to represent a number in x1,..xk.
Sorry about the poor math edition for my fist time answering Stack Overflow.
Just a test for code block
Just a test for code block
Just a test for code block
Including negatives? Infinite.
Including only positives? In this case they wouldn't be called "integers", but "naturals", instead. In this case... I can't really solve this, I wish I could, but my math is too rusty. There is probably some crazy integral way to solve this. I can give some pointers for the math skilled around.
being x the end result,
the range of a would be from 0 to x,
the range of b would be from 0 to (x - a),
the range of c would be from 0 to (x - a - b),
and so forth until the e.
The answer is the sum of all those possibilities.
I am trying to find some more direct formula on Google, but I am really low on my Google-Fu today...