How to determine the boundaries in binary search?


I know how binary search works, but I always make small mistakes when I need to implement one.
The following code is a solution for LeetCode 287, Find the Duplicate Number:
class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int low = 1, high = nums.size() - 1;
        while (low < high) {
            int mid = low + (high - low) * 0.5;
            int cnt = 0;
            for (auto a : nums) {
                if (a <= mid) ++cnt;
            }
            if (cnt <= mid) low = mid + 1;
            else high = mid;
        }
        return low;
    }
};
There are several places I am confused about:
1. the condition for the while loop: low < high or low <= high
2. a <= mid or a < mid (specific to this example)
3. cnt <= mid or cnt < mid
4. low = mid + 1 or low = mid
5. high = mid or high = mid - 1
6. which value do I return?
Is there a good way to remember or understand which combinations are correct?

When writing a binary search there are a couple of things to consider. The first is what interval range you are searching over and specifically, how you are defining it.
For example, it could be inclusive of both low and high, meaning [low, high], but it could also be exclusive of high, [low, high). Which of these you choose will change the rest of your algorithm.
The obvious implication is the initial values. Generally, high should be the length of the array if it is exclusive and it should be one less if it's inclusive, but it could be something entirely different depending on the problem you're solving.
For the while loop you want it to terminate when the search interval is empty, meaning there are no more candidates to check. If you are using the interval [low, high], then this will be empty when low is strictly greater than high (for example, [5,5] contains 5, but [6,5] contains nothing), so the while loop will check for the opposite, low <= high. However, if you use the interval [low, high), then this interval is empty when low is equal to high, so the while loop needs to check for low < high.
Within the while loop, after checking mid, you want to remove it from the interval so you don't check it again. If high is inclusive, then you have to use one less than mid as the new high in order to exclude it from the interval. But if high is exclusive, then setting high equal to mid is enough to exclude it.
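To make the two conventions concrete, here is a minimal sketch of the same plain "does it exist" search written once per interval type. The function names `containsClosed` and `containsHalfOpen` are made up for this illustration, and the input is assumed to be sorted in ascending order.

```cpp
#include <vector>

// Closed interval [low, high]: both ends are still candidates.
bool containsClosed(const std::vector<int>& v, int target) {
    int low = 0, high = static_cast<int>(v.size()) - 1;
    while (low <= high) {                  // empty only once low > high
        int mid = low + (high - low) / 2;
        if (v[mid] == target) return true;
        if (v[mid] < target) low = mid + 1;
        else high = mid - 1;               // mid was checked, so exclude it
    }
    return false;
}

// Half-open interval [low, high): high itself is never a candidate.
bool containsHalfOpen(const std::vector<int>& v, int target) {
    int low = 0, high = static_cast<int>(v.size());
    while (low < high) {                   // empty once low == high
        int mid = low + (high - low) / 2;
        if (v[mid] == target) return true;
        if (v[mid] < target) low = mid + 1;
        else high = mid;                   // setting high = mid already excludes mid
    }
    return false;
}
```

The only differences are the initial value of high, the loop condition, and the update of high; those three choices have to agree with each other.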
As for when to update low vs high, this depends on what you're searching for. Besides the basic binary search where you just want to know if something exists exactly in the collection, you will have to consider what to do when you are as close as you can get.
In C++ for example, the more useful versions of binary_search are called lower_bound and upper_bound. If the value being searched for doesn't exist in the container, then these both return the same position, namely the first position which is larger than the search value. This is convenient since this is the position you should insert that value if you want to keep the container sorted. However, if the value is in the container, possibly multiple times, then lower_bound will return the first occurrence of the value, whereas upper_bound will still return the first position larger than the value (or in other words, a right bound to the location of the values).
To get these different behaviors you update either the low or high bound when mid is equal to the search value. If you want the lower bound, then you want to continue searching the lower half of your search range, so you bring high down. If you want the high bound, then you bring low up. In your example, it brings low up when cnt == mid, so it will find an upper bound.
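As a sketch of that difference, the two hand-written functions below (hypothetical names `lowerBound` and `upperBound`, half-open interval, ascending input assumed) differ only in which bound moves when the middle element equals the target.

```cpp
#include <cstddef>
#include <vector>

// First position whose element is >= target.
std::size_t lowerBound(const std::vector<int>& v, int target) {
    std::size_t low = 0, high = v.size();
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (v[mid] < target) low = mid + 1;
        else high = mid;                    // on equality, bring high down
    }
    return low;
}

// First position whose element is > target.
std::size_t upperBound(const std::vector<int>& v, int target) {
    std::size_t low = 0, high = v.size();
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (v[mid] <= target) low = mid + 1; // on equality, bring low up
        else high = mid;
    }
    return low;
}
```

For a value that is absent, both functions return the same insertion position; for a value that is present, the two results bracket the run of equal elements.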
As for what to return, it depends on both your search interval and what you're looking for. In your example, the while loop is checking (low < high), so low and high will be equal when it breaks and it doesn't matter which you use, but even then you may want to return low - 1 or low + 1 depending on the problem. If the while loop is (low <= high), then when it breaks low == high + 1, so which one you return will depend on what you're looking for. When in doubt you can always think through an example.
So to put this all to use, here is a version of the solution you mentioned, but using an interval of [low, high] rather than [low, high):
class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int low = 1, high = nums.size() - 2;
        while (low <= high) {
            int mid = low + (high - low) * 0.5;
            int cnt = 0;
            for (auto a : nums) {
                if (a <= mid) ++cnt;
            }
            if (cnt <= mid) low = mid + 1;
            else high = mid - 1;
        }
        return low;
    }
};
PS: The reason I didn't mention the interval (low, high] or (low, high) is because it messes with the math around calculating the mid index. Because int math will round down, you can end up in a situation where mid is searched again. For example, if low is 7 and high is 9, then low + (high - low) * 0.5 will be 8. After updating low to 8 (since it's exclusive you wouldn't add one), low + (high - low) * 0.5 will still be 8 and your loop will never terminate. You can get around this by adding 1 to the part being divided by 2, but generally it's cleaner to go with an interval where low is inclusive.

You could just use leetcode's guide to binary search for reference. https://leetcode.com/explore/learn/card/binary-search

Related

Trade off between Linear and Binary Search

I have a list of elements to be searched for in a dataset of variable length. I have tried binary search and found that it is not always efficient when the objective is to search for a whole list of elements.
I did the following study and concluded that if the number of elements to be searched for is less than 5% of the data, binary search is efficient; otherwise linear search is better.
Below are the details
Number of elements : 100000
Number of elements to be searched: 5000
Number of Iterations (Binary Search) =
log2 (N) x SearchCount=log2 (100000) x 5000=83048
Further increases in the number of search elements lead to more iterations than the linear search.
Any thoughts on this?
I am calling the function below only if the number of elements to be searched for is less than 5%.
private int SearchIndex(ref List<long> entitylist, ref long[] DataList, int i, int len, ref int listcount)
{
    int Start = i;
    int End = len - 1;
    int mid;
    while (Start <= End)
    {
        mid = (Start + End) / 2;
        long target = DataList[mid];
        if (target == entitylist[listcount])
        {
            i = mid;
            listcount++;
            return i;
        }
        else
        {
            if (target < entitylist[listcount])
            {
                Start = mid + 1;
            }
            if (target > entitylist[listcount])
            {
                End = mid - 1;
            }
        }
    }
    listcount++;
    return -1; // the element in the list is not in the dataset
}
In the code I return the index rather than the value because I need to work with the index in the calling function. If the result is -1, the calling function resets i to its previous value and calls the function again with a new element to search for.

In your problem you are looking for M values in an N long array, N > M, but M can be quite large.
Usually this can be approached as M independent binary searches (or even with the slight optimization of using the previous result as a starting point): this costs O(M*log(N)).
However, using the fact that the M values are also sorted, you can find all of them in one pass with a linear search. In this case the cost is O(N), which is in fact better than O(M*log(N)) for large M.
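The one-pass idea can be sketched in C++ as follows. This is only an illustration under the assumption that both sequences are sorted in ascending order; the function name `findAllLinear` is made up here, and -1 marks a value that is not in the data, as in the question.

```cpp
#include <cstddef>
#include <vector>

// For every target, returns its index in data, or -1 if it is absent.
// The cursor i only ever moves forward, so the total work is O(N + M).
std::vector<long> findAllLinear(const std::vector<long>& data,
                                const std::vector<long>& targets) {
    std::vector<long> result;
    result.reserve(targets.size());
    std::size_t i = 0;
    for (long t : targets) {
        while (i < data.size() && data[i] < t) ++i;   // skip smaller elements
        bool found = i < data.size() && data[i] == t;
        result.push_back(found ? static_cast<long>(i) : -1);
    }
    return result;
}
```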
But you have a third option: since the M values are sorted, binary split M too, and every time you find a value, you can limit the subsequent searches to the ranges on the left and on the right of the found index.
The first look-up is on all N values, the next two on (on average) N/2, then four on N/4, and so on. I think that this scales as O(log(M)*log(N)). Not sure of it, comments welcome!
However here is a test code - I have slightly modified your code, but without altering its functionality.
In case you have M=100000 and N=1000000, the "M binary search approach" takes about 1.8M iterations, which is more than the 1M needed to scan the N values linearly. But with what I suggest it takes just 272K iterations.
Even in case the M values are very "collapsed" (eg, they are consecutive), and the linear search is in the best condition (100K iterations would be enough to get all of them, see the comments in the code), the algorithm performs very well.
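A possible shape for this "split M as well" approach is sketched below in C++. It is not the test code referred to above; the names `searchRange` and `findAllSplit` are hypothetical, the data is assumed to be sorted ascending, and the targets are assumed to be strictly ascending. `std::lower_bound` plays the role of the binary search over the current data window.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Looks up targets[tLo, tHi) inside the window data[dLo, dHi).
void searchRange(const std::vector<long>& data, std::size_t dLo, std::size_t dHi,
                 const std::vector<long>& targets, std::size_t tLo, std::size_t tHi,
                 std::vector<long>& out) {
    if (tLo >= tHi) return;
    std::size_t tMid = tLo + (tHi - tLo) / 2;
    auto first = data.begin() + static_cast<std::ptrdiff_t>(dLo);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(dHi);
    auto it = std::lower_bound(first, last, targets[tMid]);
    std::size_t pos = static_cast<std::size_t>(it - data.begin());
    out[tMid] = (it != last && *it == targets[tMid]) ? static_cast<long>(pos) : -1;
    // Smaller targets can only be left of pos, larger ones at or right of pos.
    searchRange(data, dLo, pos, targets, tLo, tMid, out);
    searchRange(data, pos, dHi, targets, tMid + 1, tHi, out);
}

// Returns, for every target, its index in data or -1 if it is absent.
std::vector<long> findAllSplit(const std::vector<long>& data,
                               const std::vector<long>& targets) {
    std::vector<long> out(targets.size(), -1);
    searchRange(data, 0, data.size(), targets, 0, targets.size(), out);
    return out;
}
```

Each recursive call searches a smaller data window than its parent, which is where the saving over M independent full-width searches comes from.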

How to calculate the sum of all the odd numbers less than 1000 using a for loop in Xcode (Objective C)

I'm very new to programming and I have no idea where to start, just looking for some help with this, I know its very simple but I'm clueless, thanks for the help!
So this is the code I have:
NSInteger sum = 0;
for (int a = 1; a < 500; a++) {
    sum += (a * 2 - 1);
}
NSLog(@"The sum of all the odd numbers within the range = %ld", (long)sum);
but I'm getting an answer of 249,001, but it should be 250,000
Appreciate the help!

Your immediate problem is that you're missing a term in the sum: your output differs from the actual answer by 999. This ought to motivate you to write a <= 500 instead as the stopping condition in the for loop.
But, in reality, you would not use a for loop for this as there is an alternative that's much cheaper computationally speaking.
Note that this is an arithmetic progression and there is therefore a closed-form solution to this. That is, you can get the answer out in O(1) rather than by using a loop which would be O(n); i.e. the compute time grows linearly with the number of terms that you want.
Recognising that there are 500 odd numbers in your range, you can use
n * (2 * a + (n - 1) * d) / 2
to compute this. In your case, n is 500. d (the difference between the terms) is 2. a (the first term) is 1.
See https://en.wikipedia.org/wiki/Arithmetic_progression
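Plugging the numbers in gives 500 * (2 * 1 + 499 * 2) / 2 = 500 * 1000 / 2 = 250000. As a small cross-check, here is a sketch in C++ (rather than Objective-C; both function names are invented for this example) that computes the sum once with a loop and once with the closed form.

```cpp
// Sum of all odd numbers strictly below limit, computed term by term: O(n).
long sumOddLoop(long limit) {
    long sum = 0;
    for (long k = 1; k < limit; k += 2) sum += k;
    return sum;
}

// Sum of an arithmetic progression with n terms, first term a, difference d: O(1).
long sumArithmetic(long n, long a, long d) {
    return n * (2 * a + (n - 1) * d) / 2;
}
```

Both `sumOddLoop(1000)` and `sumArithmetic(500, 1, 2)` evaluate to 250000.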

Get the most frequently occurring number among several integers without using arrays

DISCLAIMER: Rather theoretical question here, not looking for a correct answer, just asking for some inspiration!
Consider this:
A function is called repetitively and returns integers based on seeds (the same seed returns the same integer). Your task is to find out which integer is returned most often. Easy enough, right?
But: You are not allowed to use arrays or fields to store return values of said function!
Example:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
for(int i = 0; i < iterations; i++)
{
    int result = getNumberFromSeed(i);
    int occurencesOfResult = magic();
    if(occurencesOfResult > occurencesOfMostFrequentNumber)
    {
        mostFrequentNumber = result;
        occurencesOfMostFrequentNumber = occurencesOfResult;
    }
}
If getNumberFromSeed() returns 2,1,5,18,5,6 and 5 then mostFrequentNumber should be 5 and occurencesOfMostFrequentNumber should be 3 because 5 is returned 3 times.
I know this could easily be solved using a two-dimensional list to store results and occurrences. But imagine for a minute that you cannot use any kind of arrays, lists, dictionaries etc. (maybe because the system that is running the code has such limited memory that you cannot store enough integers at once, or because your prehistoric programming language has no concept of collections).
How would you find mostFrequentNumber and occurencesOfMostFrequentNumber? What does magic() do? (Of course you do not have to stick to the example code. Any ideas are welcome!)
EDIT: I should add that the integers returned by getNumber() should be calculated using a seed, so the same seed returns the same integer (i.e. int result = getNumber(5); this would always assign the same value to result)

Make a hypothesis: assume that the distribution of integers is, e.g., Normal.
Start simple. Have two variables:
- N, the number of elements read so far
- M1, the average of said elements.
Initialize both variables to 0.
Every time you read a new value x update N to be N + 1 and M1 to be M1 + (x - M1)/N.
At the end M1 will equal the average of all values. If the distribution was Normal this value will have a high frequency.
Now improve the above. Add a third variable:
M2, the sum of the squared deviations (x - M1)^2 over all values of x read so far.
Initialize M2 to 0. Now get a small memory of say 10 elements or so. For every new value x that you read, update N as above, then update M2 using the value of M1 from before this step, and finally update M1 as above:
M2 := M2 + (x - M1)^2 * (N - 1) / N
At every step M2 / N is the variance of the distribution and sqrt(M2 / N) its standard deviation.
As you proceed remember the frequencies of only the values read so far whose distances to M1 are less than sqrt(M2 / N). This requires the use of some additional array, however, the array will be very short compared to the high number of iterations you will run. This modification will allow you to guess better the most frequent value instead of simply answering the mean (or average) as above.
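A minimal C++ sketch of the running update follows. The struct name `RunningStats` and its members are invented for this illustration; in this sketch `m2` accumulates the sum of squared deviations, so the variance is read off as `m2 / n`, and the deviation is taken against the mean from before the current value is folded in.

```cpp
#include <cmath>

// Keeps count, mean and spread of a stream using three scalars and no array.
struct RunningStats {
    long long n = 0;
    double mean = 0.0;  // M1 in the text
    double m2 = 0.0;    // running sum of squared deviations

    void add(double x) {
        ++n;
        double delta = x - mean;            // deviation from the old mean
        m2 += delta * delta * (n - 1) / n;  // M2 update
        mean += delta / n;                  // M1 update
    }
    double variance() const { return n > 0 ? m2 / n : 0.0; }
    double stddev() const { return std::sqrt(variance()); }
};
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 into `add` yields a mean of 5 and a standard deviation of 2 (up to floating-point rounding).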
UPDATE
Given that this is about insights for inspiration there is plenty of room for considering and adapting the approach I've proposed to any particular situation. Here are some thoughts
When I say assume that the distribution is Normal you should think of it as: given that the problem has no solution, let's see if there is some qualitative information I can use to decide what kind of distribution the data would have. Given that the algorithm is intended to find the most frequent number, it should be fine to assume that the distribution is not uniform. Let's try with Normal, LogNormal, etc. to see what can be found out (more on this below).
If the game completely disallows the use of any array, then fine, keep track of only, say 10 numbers. This would allow you to count the occurrences of the 10 best candidates, which will give more confidence to your answer. In doing this choose your candidates around the theoretical most likely value according to the distribution of your hypothesis.
You cannot use arrays, but perhaps you can read the sequence of numbers two or three times, not just once. In that case you can read it once to check whether your hypothesis about its distribution is good or bad. For instance, if you compute not just the variance but also the skewness and the kurtosis, you will have more elements to check your hypothesis. If the first reading indicates that there is some bias, you could use a LogNormal distribution instead, etc.
Finally, in addition to providing the approximate answer you would be able to use the information collected during the reading to estimate an interval of confidence around your answer.

Alright, I found a decent solution myself:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
int maxNumber = -2147483647;
int minNumber = 2147483647;
//Step 1: Find the largest and smallest number that _can_ occur
for(int i = 0; i < iterations; i++)
{
    int result = getNumberFromSeed(i);
    if(result > maxNumber)
    {
        maxNumber = result;
    }
    if(result < minNumber)
    {
        minNumber = result;
    }
}
//Step 2: for each possible number between minNumber and maxNumber, count occurences
for(int thisNumber = minNumber; thisNumber <= maxNumber; thisNumber++)
{
    int occurenceOfThisNumber = 0;
    for(int i = 0; i < iterations; i++)
    {
        int result = getNumberFromSeed(i);
        if(result == thisNumber)
        {
            occurenceOfThisNumber++;
        }
    }
    if(occurenceOfThisNumber > occurencesOfMostFrequentNumber)
    {
        occurencesOfMostFrequentNumber = occurenceOfThisNumber;
        mostFrequentNumber = thisNumber;
    }
}
I must admit, this may take a long time, depending on the smallest and largest possible. But it will work without using arrays.

Fast FFT Bit Reversal, Can I Count Down Backwards Bit Reversed?

I'm using FFTs for audio processing, and I've come up with some potentially very fast ways of doing the bit reversal needed which might be of use to others, but because of the size of my FFTs (8192), I'm trying to reduce memory usage / cache flushing due to the size of lookup tables or code, and increase performance. I've seen lots of clever bit reversal routines; they all let you feed them any arbitrary value and get a bit-reversed output, but FFTs don't need that flexibility since they go in a predictable sequence. First let me state what I have tried and/or figured out, since it may be the fastest to date and you can see the problem, then I'll ask the question.
1) I've written a program to generate straight-through, unlooped x86 source code that can be pasted into my FFT code, which reads an audio sample, multiplies it by a window value (that's a lookup table itself) and then just places the resulting value in its proper bit-reversed sorted position by absolute values within the x86 addressing modes like: movlps [edi+1876],xmm0. This is the absolute fastest way to do this for smaller FFT sizes. The problem is that when I write straight-through code to handle 8192 values, the code grows beyond the L1 instruction cache size and performance drops way down. Of course in contrast, a 32K bit reversal lookup table mixed with a 32K window table, plus other stuff, is also too big to fit the L1 data cache, and performance drops way down, but that's the way I'm currently doing it.
2) I've found patterns in the bit reversal sequence that can be exploited to reduce lookup table size. Using 4-bit numbers (0..15) as an example, the bit reversal sequence looks like: 0,8,4,12,2,10,6,14|1,9,5,13,3,11,7,15. The first thing that can be seen is that the last 8 numbers are the same as the first 8 plus 1, so I can chop my LUT in half. If I look at the difference between the numbers there is more redundancy, so if I start with a zero in a register and want to add values to it to get the next bit-reversed number they would be: +0,+8,-4,+8,-10,+8,-4,+8 and the same for the second half. As can be seen, I could have a lookup table of just 0 and -10 because the +8's and -4's always show up in a predictable way. The code would be unrolled to handle 4 values per loop: one would be a lookup table read, and the other 3 would be straight code for +8, -4, +8, before looping around again. Then a second loop could handle the 1,9,5,13,3,11,7,15 sequence. This is great, because I can now chop down my lookup table by another factor of 4. This scales up the same way for an 8192 size FFT. I can now get by with a 4K size LUT instead of 32K. I can exploit the same pattern and double the size of my code and chop down the LUT by another half yet again, however far I want to go. But in order to eliminate the LUT altogether, I'm back to the prohibitive code size.
For large FFT sizes, I believe that this #2 solution is the absolute fastest to date, since a relatively small percentage of lookup table reads need to be done, and every algorithm I currently find on the web requires too many serial/dependency calculations which can't be vectorized.
The question is, is there an algorithm that can increment numbers so the MSB acts like the LSB, and so on? In other words (in binary): 0000, 1000, 0100, 1100, 0010, etc… I've tried to think up some way, and so far, short of a bunch of nested loops, I can't seem to find a way for a fast and simple algorithm that is a mirror image of simply adding 1 to the LSB of a number. Yet it seems like there should be a way.

One other approach to consider: take a well known bit reversal algorithm - typically a few masks, shifts, and ORs - then implement this with SSE, so you get e.g. 8 x 16 bit bit reversals for the price of one. For 16 bits you need 5*log2(N) = 20 instructions, so the aggregate throughput would be 2.5 instructions per bit reversal.
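For reference, the scalar form of such a mask-and-shift reversal for one 16-bit value might look like the sketch below (plain C++, no SSE; the function name `reverseBits16` is chosen here for illustration). A vectorized version would apply the same four steps to every 16-bit lane of a register at once.

```cpp
#include <cstdint>

// Reverses the bit order of a 16-bit value by swapping ever larger groups.
std::uint16_t reverseBits16(std::uint16_t value) {
    unsigned v = value;
    v = ((v >> 1) & 0x5555u) | ((v & 0x5555u) << 1);  // swap neighbouring bits
    v = ((v >> 2) & 0x3333u) | ((v & 0x3333u) << 2);  // swap bit pairs
    v = ((v >> 4) & 0x0F0Fu) | ((v & 0x0F0Fu) << 4);  // swap nibbles
    v = ((v >> 8) & 0x00FFu) | ((v & 0x00FFu) << 8);  // swap bytes
    return static_cast<std::uint16_t>(v);
}
```

For an 8192-point FFT the index has 13 bits, so one way to use this is `reverseBits16(i) >> 3`, which discards the three unused low bits of the reversed result.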

This is the most trivial and straightforward solution (in C):
void BitReversedIncrement(unsigned *var, int bit)
{
    unsigned c, one = 1u << bit;
    do {
        c = *var & one;
        (*var) ^= one;
        one >>= 1;
    } while (one && c);
}
The main problem with this is the conditional branches, which are often costly on modern CPUs. You have one conditional branch per bit.
You can do reversed increments by working on several bits at a time, e.g. 3 if ints are 32-bit:
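void BitReversedIncrement2(unsigned *var, int bit)
{
    /* Octal table 01327654: digit i (counted from the least significant
       digit) is the bit-reversed increment of the 3-bit value i.
       A result of 0 means the group overflowed and the carry moves on. */
    unsigned r = *var, t = 0;
    while (bit >= 2 && !t)
    {
        unsigned tt = (r >> (bit - 2)) & 7;
        t = (01327654 >> (tt * 3)) & 7;
        r ^= ((tt ^ t) << (bit - 2));
        bit -= 3;
    }
    if (bit >= 0 && !t)
    {
        /* fewer than 3 bits left: align them to the top of a 3-bit group */
        t = r & ((1 << (bit + 1)) - 1);
        r ^= t;
        t <<= 2 - bit;
        t = (01327654 >> (t * 3)) & 7;
        t >>= 2 - bit;
        r |= t;
    }
    *var = r;
}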
This is better, you only have 1 conditional branch per 3 bits.
If your CPU supports 64-bit ints, you can work on 4 bits at a time:
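void BitReversedIncrement3(unsigned *var, int bit)
{
    /* Hex table 0x01327654FEDCBA98: digit i (counted from the least
       significant digit) is the bit-reversed increment of the 4-bit value i.
       A result of 0 means the group overflowed and the carry moves on. */
    unsigned r = *var, t = 0;
    while (bit >= 3 && !t)
    {
        unsigned tt = (r >> (bit - 3)) & 0xF;
        t = (0x01327654FEDCBA98ULL >> (tt * 4)) & 0xF;
        r ^= ((tt ^ t) << (bit - 3));
        bit -= 4;
    }
    if (bit >= 0 && !t)
    {
        /* fewer than 4 bits left: align them to the top of a 4-bit group */
        t = r & ((1 << (bit + 1)) - 1);
        r ^= t;
        t <<= 3 - bit;
        t = (0x01327654FEDCBA98ULL >> (t * 4)) & 0xF;
        t >>= 3 - bit;
        r |= t;
    }
    *var = r;
}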
Which is even better. And the only look-up table (the single octal or hexadecimal constant) is tiny and likely encoded as an immediate instruction operand.
You can further improve the code if the bit position for the reversed "1" is a known constant. Just unroll the while loop into nested ifs, substitute the reversed one bit position constant.

For larger FFTs, paying attention to cache blocking (minimizing total uncovered cache miss cycles) can have a far larger effect on performance than optimization of the cycle count taken by indexing bit reversal. Make sure not to de-optimize a bigger effect by a larger cycle count while optimizing the smaller effect. For small FFTs, where everything fits in cache, LUTs can be a good solution as long as you pay attention to any load-use hazards by making sure things are or can be pipelined appropriately.

What's the fastest way to divide an integer by 3?

int x = n / 3; // <-- make this faster
// for instance
int a = n * 3; // <-- normal integer multiplication
int b = (n << 1) + n; // <-- potentially faster multiplication

The guy who said "leave it to the compiler" was right, but I don't have the "reputation" to mod him up or comment. I asked gcc to compile int test(int a) { return a / 3; } for an ix86 and then disassembled the output. Just for academic interest, what it's doing is roughly multiplying by 0x55555556 and then taking the top 32 bits of the 64 bit result of that. You can demonstrate this to yourself with eg:
$ ruby -e 'puts(60000 * 0x55555556 >> 32)'
20000
$ ruby -e 'puts(72 * 0x55555556 >> 32)'
24
$
The wikipedia page on Montgomery division is hard to read but fortunately the compiler guys have done it so you don't have to.
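The same multiply-and-keep-the-top-half step can be written out in C++. This is only a sketch for non-negative inputs, matching the Ruby one-liners above; the function name `divideBy3NonNegative` is made up for illustration, and the extra sign correction a compiler emits for negative values is not shown.

```cpp
#include <cstdint>

// Assumes n >= 0. The constant 0x55555556 equals (2^32 + 2) / 3.
std::uint32_t divideBy3NonNegative(std::int32_t n) {
    std::uint64_t product = static_cast<std::uint64_t>(n) * 0x55555556ULL;
    return static_cast<std::uint32_t>(product >> 32);  // top 32 bits of the 64-bit product
}
```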

This is the fastest as the compiler will optimize it if it can depending on the output processor.
int a;
int b;
a = some value;
b = a / 3;

There is a faster way to do it if you know the range of the values. For example, if you are dividing a signed integer by 3 and you know the value to be divided lies between 0 and 768, then you can multiply it by a factor and shift the result to the right by a power of 2, where the factor is that power of 2 divided by 3.
e.g.
Range 0 -> 768
You could use a shift of 10 bits, which corresponds to multiplying by 1024. You want to divide by 3, so your multiplier should be 1024 / 3 = 341,
so you can now use (x * 341) >> 10
(Make sure the shift is a signed shift if using signed integers, and also make sure the shift is an actual shift and not a bit ROLL.)
This approximately divides the value by 3, and will run at about 1.6 times the speed of a natural divide by 3 on a standard x86 / x64 CPU. Note that because 341 is 1024 / 3 rounded down, the result is one too small whenever x is a nonzero exact multiple of 3 (for example, (3 * 341) >> 10 is 0). Rounding the multiplier up and using a wider shift, as in (x * 683) >> 11, gives the exact quotient for every x in this range.
Of course the only reason you can make this optimization when the compiler can't is that the compiler does not know the maximum range of x and therefore cannot make this determination, but you as the programmer can.
Sometimes it may even be more beneficial to move the value into a larger type and then do the same thing, i.e. if you have an int of full range you could make it a 64-bit value and then do the multiply and shift instead of dividing by 3.
I had to do this recently to speed up image processing: I needed to find the average of 3 color channels (red, green and blue), each with a byte range (0 - 255).
At first I simply used:
avg = (r + g + b) / 3;
(So r + g + b has a maximum of 765 and a minimum of 0, because each channel is a byte 0 - 255)
After millions of iterations the entire operation took 36 milliseconds.
I changed the line to:
avg = (r + g + b) * 341 >> 10;
And that took it down to 22 milliseconds; it's amazing what can be done with a little ingenuity.
This speed-up occurred in C# even though I had optimisations turned on and was running the program natively without debugging info and not through the IDE.
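One caveat is worth checking before relying on this for channel averages: 341 is 1024 / 3 rounded down, so the shifted product is one too small for exact multiples of 3. The C++ sketch below (function names invented for this illustration) puts that multiplier next to a rounded-up one with a wider shift, which matches integer division over the whole 0 to 765 range.

```cpp
// Multiplier rounded down: cheap, but off by one for nonzero multiples of 3.
int div3Approx(int x) { return (x * 341) >> 10; }

// Multiplier rounded up with one more bit of shift: 3 * 683 = 2048 + 1,
// which is exact for every x below 2048.
int div3Exact(int x) { return (x * 683) >> 11; }
```

For example, `div3Approx(765)` gives 254 while `div3Exact(765)` gives 255, so three fully saturated channels only average to 255 with the second form.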

See How To Divide By 3 for an extended discussion of more efficiently dividing by 3, focused on doing FPGA arithmetic operations.
Also relevant:
Optimizing integer divisions with Multiply Shift in C#

Depending on your platform and depending on your C compiler, a naive solution like just using
y = x / 3
Can be fast or it can be awfully slow (even if division is done entirely in hardware, if it is done using a DIV instruction, this instruction is about 3 to 4 times slower than a multiplication on modern CPUs). Very good C compilers with optimization flags turned on may optimize this operation, but if you want to be sure, you are better off optimizing it yourself.
For optimization it is important to have integer numbers of a known size. In C int has no known size (it can vary by platform and compiler!), so you are better off using C99 fixed-size integers. The code below assumes that you want to divide an unsigned 32-bit integer by three and that your C compiler knows about 64-bit integer numbers (NOTE: Even on a 32-bit CPU architecture most C compilers can handle 64-bit integers just fine):
static inline uint32_t divby3(uint32_t divideMe)
{
    return (uint32_t)(((uint64_t)0xAAAAAAABULL * divideMe) >> 33);
}
As crazy as this might sound, but the method above indeed does divide by 3. All it needs for doing so is a single 64 bit multiplication and a shift (like I said, multiplications might be 3 to 4 times faster than divisions on your CPU). In a 64 bit application this code will be a lot faster than in a 32 bit application (in a 32 bit application multiplying two 64 bit numbers take 3 multiplications and 3 additions on 32 bit values) - however, it might be still faster than a division on a 32 bit machine.
On the other hand, if your compiler is a very good one and knows the trick how to optimize integer division by a constant (latest GCC does, I just checked), it will generate the code above anyway (GCC will create exactly this code for "/3" if you enable at least optimization level 1). For other compilers... you cannot rely or expect that it will use tricks like that, even though this method is very well documented and mentioned everywhere on the Internet.
The problem is that it only works for constant divisors, not for variable ones. You always need to know the magic number (here 0xAAAAAAAB) and the correct operations after the multiplication (shifts and/or additions in most cases); both differ depending on the number you want to divide by, and both take too much CPU time to calculate on the fly (that would be slower than hardware division). However, it's easy for a compiler to calculate these at compile time (where one second more or less of compile time hardly matters).

For 64 bit numbers:
uint64_t divBy3(uint64_t x)
{
    return x * 12297829382473034411ULL;
}
However this isn't the truncating integer division you might expect.
It works correctly if the number is already divisible by 3, but it returns a huge number if it isn't.
For example, if you run it on 11, it returns 6148914691236517209. This looks like garbage but it is in fact the correct answer: multiply it by 3 and you get back the 11!
If you are looking for the truncating division, then just use the / operator. I highly doubt you can get much faster than that.
Theory:
64 bit unsigned arithmetic is a modulo 2^64 arithmetic.
This means for each integer which is coprime with the 2^64 modulus (essentially all odd numbers) there exists a multiplicative inverse which you can use to multiply with instead of division. This magic number can be obtained by solving the 3*x + 2^64*y = 1 equation using the Extended Euclidean Algorithm.
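As an illustration of where such a constant comes from, here is a C++ sketch that computes the inverse of any odd number modulo 2^64. Note that it uses Newton's iteration rather than the Extended Euclidean Algorithm mentioned above, simply because it is shorter to write; the function name `inverseMod2_64` is made up for this example.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd number a modulo 2^64.
// The starting guess x = a is correct in the lowest 3 bits (a * a == 1 mod 8),
// and every Newton step doubles the number of correct bits: 6, 12, 24, 48, 96.
std::uint64_t inverseMod2_64(std::uint64_t a) {
    std::uint64_t x = a;
    for (int i = 0; i < 5; ++i) {
        x *= 2 - a * x;  // unsigned arithmetic wraps modulo 2^64
    }
    return x;
}
```

Calling `inverseMod2_64(3)` reproduces the constant 12297829382473034411 used in divBy3 above.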

What if you really don't want to multiply or divide? Here is an approximation I just invented. It works because (x/3) = (x/4) + (x/12). But since (x/12) = (x/4) / 3 we just have to repeat the process until it's good enough.
#include <stdio.h>
int main(void)
{
    int n = 1000;
    int a, b;
    a = n >> 2;      /* n/4 */
    b = (a >> 2);    /* n/16 */
    a += b;
    b = (b >> 2);    /* n/64 */
    a += b;
    b = (b >> 2);    /* n/256 */
    a += b;
    b = (b >> 2);    /* n/1024 */
    a += b;
    printf("a=%d\n", a);
    return 0;
}
The result is 330. It could be made more accurate using b = ((b+2)>>2); to account for rounding.
If you are allowed to multiply, just pick a suitable approximation for (1/3), with a power-of-2 divisor. For example, n * (1/3) ~= n * 43 / 128 = (n * 43) >> 7.
This technique is most useful in Indiana.

I don't know if it's faster but if you want to use a bitwise operator to perform binary division you can use the shift and subtract method described at this page:
Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat:
    If that portion of the dividend above the divisor is greater than or equal to the divisor:
        Then subtract divisor from that portion of the dividend and
        Concatenate 1 to the right hand end of the quotient
    Else concatenate 0 to the right hand end of the quotient
    Shift the divisor one place right
Until dividend is less than the divisor:
quotient is correct, dividend is remainder
STOP
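A compact C++ sketch of shift-and-subtract division is given below. It is the common "bring down one dividend bit at a time" variant rather than a literal transcription of the steps above (which shift the divisor instead), but it produces the same quotient and remainder. The names `DivResult` and `shiftSubtractDivide` are invented for this example, and the divisor is assumed to be nonzero.

```cpp
#include <cstdint>

struct DivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Binary long division using only shifts, comparisons and subtraction.
DivResult shiftSubtractDivide(std::uint32_t dividend, std::uint32_t divisor) {
    std::uint32_t quotient = 0;
    std::uint64_t remainder = 0;
    for (int bit = 31; bit >= 0; --bit) {
        // Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        quotient <<= 1;
        if (remainder >= divisor) {   // the divisor fits: emit a 1 bit
            remainder -= divisor;
            quotient |= 1u;
        }
    }
    return {quotient, static_cast<std::uint32_t>(remainder)};
}
```

For example, `shiftSubtractDivide(1000, 3)` yields quotient 333 and remainder 1.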

For really large integer division (e.g. numbers bigger than 64 bits) you can represent your number as an int[] of decimal digits and perform the division quite fast by working on a two-digit value at a time and dividing it by 3. The remainder becomes the leading digit of the next two-digit value, and so forth.
e.g. for 11004 / 3 you say
11/3 = 3, remainder = 2 (from 11-3*3)
20/3 = 6, remainder = 2 (from 20-6*3)
20/3 = 6, remainder = 2 (from 20-6*3)
24/3 = 8, remainder = 0
hence the result 3668
internal static List<int> Div3(int[] a)
{
    int remainder = 0;
    var res = new List<int>();
    for (int i = 0; i < a.Length; i++)
    {
        // remainder already holds the carry from the previous digit, times 10
        var val = remainder + a[i];
        var div = val / 3;
        remainder = 10 * (val % 3);
        if (div > 9)
        {
            res.Add(div / 10);
            res.Add(div % 10);
        }
        else
            res.Add(div);
    }
    // drop a leading zero, but keep a lone 0 as the result
    if (res.Count > 1 && res[0] == 0) res.RemoveAt(0);
    return res;
}

If you really want to, see this article on integer division, but it only has academic merit ... it would be an interesting application that actually needed to perform that and benefited from that kind of trick.

Easy computation ... at most n iterations where n is your number of bits:
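uint8_t divideby3(uint8_t x)
{
    /* x/3 = x/4 + x/16 + x/64 + ...; every term is truncated,
       so the running sum can only come out too small */
    uint8_t answer = 0;
    uint8_t term = x;
    while ((term >>= 2) != 0)
        answer += term;
    /* add back what the truncation lost: rem = x - 3 * answer */
    uint8_t rem = (uint8_t)(x - answer - (answer << 1));
    while (rem >= 3) {
        rem -= 3;
        ++answer;
    }
    return answer;
}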

A lookup table approach would also be faster in some architectures.
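uint8_t DivBy3LU(uint8_t u8Operand)
{
    /* one entry per possible operand value (256 in total); only the first few are shown */
    static const uint8_t ai8Div3[256] = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4 /* , .... */};
    return ai8Div3[u8Operand];
}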
