When to use "=" in binary search condition?

I am quite confused about the scenarios in which to use = in binary search. For example, this is what I found on Wikipedia, which uses while (imin <= imax):
int binary_search(int A[], int key, int imin, int imax)
{
    // continue searching while [imin,imax] is not empty
    while (imin <= imax)
    {
        int imid = midpoint(imin, imax);
        if (A[imid] == key)
            return imid;
        else if (A[imid] < key)
            imin = imid + 1;
        else
            imax = imid - 1;
    }
    return KEY_NOT_FOUND;
}
However, I have also found a lot of code using something like
while (imin < imax)
My question is: what are the considerations behind using = or not? Is there a reason to prefer one over the other?
Thanks so much!

Note these two algorithms from Wikipedia:
Iterative binary search:
int binary_search(int A[], int key, int imin, int imax)
{
    // continue searching while [imin,imax] is not empty
    while (imin <= imax)
    {
        // calculate the midpoint for roughly equal partition
        int imid = midpoint(imin, imax);
        if (A[imid] == key)
            // key found at index imid
            return imid;
        // determine which subarray to search
        else if (A[imid] < key)
            // change min index to search upper subarray
            imin = imid + 1;
        else
            // change max index to search lower subarray
            imax = imid - 1;
    }
    // key was not found
    return KEY_NOT_FOUND;
}
Iterative binary search with deferred detection of equality:
int binary_search(int A[], int key, int imin, int imax)
{
    // continually narrow search until just one element remains
    while (imin < imax)
    {
        int imid = midpoint(imin, imax);
        // code must guarantee the interval is reduced at each iteration
        assert(imid < imax);
        // note: 0 <= imin < imax implies imid will always be less than imax
        // reduce the search
        if (A[imid] < key)
            imin = imid + 1;
        else
            imax = imid;
    }
    // At exit of while:
    //   if A[] is empty, then imax < imin
    //   otherwise imax == imin
    // deferred test for equality
    if ((imax == imin) && (A[imin] == key))
        return imin;
    else
        return KEY_NOT_FOUND;
}
You have three cases to consider: imin < imax, imin == imax, and imin > imax. The first algorithm deals with less-than and equality within the while loop, whereas the second algorithm defers the equality case to the if statement after the loop. As Wikipedia states:
The iterative and recursive versions take three paths based on the key comparison: one path for less than, one path for greater than, and one path for equality. (There are two conditional branches.) The path for equality is taken only when the record is finally matched, so it is rarely taken. That branch path can be moved outside the search loop in the deferred test for equality version of the algorithm.
The deferred detection approach foregoes the possibility of early termination on discovery of a match, so the search will take about log2(N) iterations. On average, a successful early termination search will not save many iterations. For large arrays that are a power of 2, the savings is about two iterations. Half the time, a match is found with one iteration left to go; one quarter the time with two iterations left, one eighth with three iterations, and so forth. The infinite series sum is 2.
The deferred detection algorithm has the advantage that if the keys are not unique, it returns the smallest index (the starting index) of the region where elements have the search key. The early termination version would return the first match it found, and that match might be anywhere in region of equal keys.
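(For reference, the "infinite series sum is 2" claim is the standard sum $\sum_{k=1}^{\infty} k/2^k = 1/2 + 2/4 + 3/8 + \cdots = 2$; my arithmetic, not part of the quote.)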
So the use of either <= in a while loop, or simply <, will depend on your choice of implementation.
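To make the duplicate-key behaviour concrete, here is a minimal, self-contained sketch (my own demo, with midpoint and KEY_NOT_FOUND filled in) showing the deferred version returning the leftmost match where the early-termination version would return the midpoint hit:
#include <stdio.h>

#define KEY_NOT_FOUND -1

static int midpoint(int imin, int imax) {
    return imin + (imax - imin) / 2;  /* avoids overflow of imin + imax */
}

/* deferred detection of equality, as in the second algorithm above */
int bsearch_deferred(int A[], int key, int imin, int imax)
{
    while (imin < imax) {
        int imid = midpoint(imin, imax);
        if (A[imid] < key)
            imin = imid + 1;
        else
            imax = imid;
    }
    return (imax == imin && A[imin] == key) ? imin : KEY_NOT_FOUND;
}

int main(void)
{
    int A[] = {1, 2, 2, 2, 3};
    /* prints 1, the first index holding a 2; the early-termination
       version would return 2, the midpoint where it happens to match */
    printf("%d\n", bsearch_deferred(A, 2, 0, 4));
    return 0;
}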

When using binary search, it sometimes matters what low < high and low <= high each give you.
For example:
Say you're at an iteration where you have an array like [50, 10], where low and mid are at 50 (index 0) and high is at 10 (index 1).
Now, if you were using while (low < high), the update would set low = mid + 1, since arr[mid] > arr[high]. So now low would equal high and the loop would break, leaving mid at index 0.
If you need mid after the loop, the answer would simply be wrong: mid is at index 0, but low and high are both at index 1, indicating that the smaller number was 10 instead of 50.
So in this case, we need while (low <= high), so that mid is still computed when low == high, giving us low = mid = high.
Reference:
https://medium.com/@hazemu/useful-insights-into-binary-search-problems-8769d388b9c
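For what it's worth, the scenario above looks like finding the minimum of a rotated sorted array. A common way to handle it (a sketch of my own, not from the linked post) keeps while (low < high) but sets high = mid rather than reading mid after the loop, so low itself lands on the answer:
/* Sketch: index of the minimum in a rotated sorted array, e.g. [50, 10].
   With while (low < high) and high = mid (not mid - 1), the loop ends
   with low == high at the minimum, so mid is never needed afterwards. */
int find_min_index(const int arr[], int n) {
    int low = 0, high = n - 1;
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] > arr[high])
            low = mid + 1;   /* minimum lies to the right of mid */
        else
            high = mid;      /* minimum is at mid or to its left */
    }
    return low;              /* for [50, 10] this returns index 1 */
}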

If we want to determine whether a specific value exists in the sorted array or not, we would want to use <=. Here's a visual walkthrough I made that really drills in why <= should be used:
https://youtu.be/7jci-yQhGho

Related

Big-O of fixed loops

I was discussing some code during an interview and don't think I did a good job articulating one of my blocks of code.
I know that (at a high level) we are taught two nested for loops == O(n^2), but what happens when you make some assertions as part of the work that limit the work done to a constant amount?
The code I came up with was something like:
String[] someVal = new String[]{"a", "b", "c", "d"}; // this was really some other computation
if (someVal.length != 4) {
    return false;
}
for (int i = 0; i < someVal.length; i++) {
    String subString = someVal[i];
    if (subString.length() != 8) {
        return false;
    }
    for (int j = 0; j < subString.length(); j++) {
        // do some other stuff
    }
}
So there are two for loops, but the number of iterations becomes fixed because of the length checks before proceeding:
for (int i = 0; i < 4; i++) {  // bound fixed by the earlier length check
    String subString = someVal[i];
    if (subString.length() != 8) { return false; }
    for (int j = 0; j < 8; j++) {  // bound fixed by the string-length check
        // do some other stuff
    }
}
I tried to argue this made it constant, but didn't do a great job.
Was I completely wrong or off-base?
Your early exit condition inside the first for loop is if (subString.length() != 8), so the second for loop only executes when the length is exactly 8. That makes the complexity of the second for loop constant, as it does not depend on the input size. And before your first for loop you have another early exit condition, if (someVal.length != 4), making the first for loop constant as well.
So yes, I would follow your argumentation that the complete function has constant big-O time complexity. It may be worth repeating in the explanation that big-O describes an upper bound on complexity, which is never exceeded, and that constant factors can be reduced to 1.
But keep in mind that a constant-complexity function could still take longer in execution time than an O(n) one, depending on the size of n. If it were a known precondition that n never grows beyond a (low) given number, I would argue not about big-O complexity but about overall expected runtime, where the second, constant for loop could have a larger impact than big-O analysis suggests.

NSString constrainedToSize method?

Not to be confused with the NSString sizeWithFont method that returns a CGSize, what I'm looking for is a method that returns an NSString constrained to a certain CGSize. The reason I want to do this is so that when drawing text with Core Text, I can append an ellipsis (...) to the end of the string. I know NSString's drawInRect method does this for me, but I'm using Core Text, and kCTLineBreakByTruncatingTail truncates the end of each line rather than the end of the string.
There's a method I found that truncates a string to a certain width, and it's not that hard to change it to work with a CGSize, but the algorithm is unbelievably slow for long strings, and is practically unusable. (It took over 10 seconds to truncate a long string.) There has to be a more "computer science"/mathematical way to do this faster. Anyone daring enough to try to come up with a faster implementation?
Edit: I've managed to make this into a binary search algorithm:
-(NSString*)getStringByTruncatingToSize:(CGSize)size string:(NSString*)string withFont:(UIFont*)font
{
    int min = 0, max = string.length, mid;
    while (min < max) {
        mid = (min + max) / 2;
        NSString *currentString = [string substringWithRange:NSMakeRange(min, mid - min)];
        CGSize currentSize = [currentString sizeWithFont:font constrainedToSize:CGSizeMake(size.width, MAXFLOAT)];
        if (currentSize.height < size.height) {
            min = mid + 1;
        } else if (currentSize.height > size.height) {
            max = mid - 1;
        } else {
            break;
        }
    }
    NSMutableString *finalString = [[string substringWithRange:NSMakeRange(0, min)] mutableCopy];
    if (finalString.length < string.length)
        [finalString replaceCharactersInRange:NSMakeRange(finalString.length - 3, 3) withString:@"..."];
    return finalString;
}
The problem is that this sometimes cuts the string too short when it has room to spare. I think this is where that last condition comes into play. How do I make sure it doesn't cut off too much?
Good news! There is a "computer science/mathematical way" to do this faster.
The example you link to does a linear search: it just chops one character at a time from the end of the string until it's short enough. So, the amount of time it takes will scale linearly with the length of the string, and with long strings it will be really slow, as you've discovered.
However, you can easily apply a binary search technique to the string. Instead of starting at the end and dropping off one character at a time, you start in the middle:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
                      ^
You compute the width of "THIS IS THE STRING THAT". If it is too wide, you move your test point to the midpoint of the space on the left. Like this:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
           ^          |
On the other hand, if it isn't wide enough, you move the test point to the midpoint of the other half:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
                      |          ^
You repeat this until you find the point that is just under your width limit. Because you're dividing your search area in half each time, you'll never need to compute the width more than log2 N times (where N is the length of the string) which doesn't grow very fast, even for very long strings.
To put it another way, if you double the length of your input string, that's only one additional width computation.
Starting with Wikipedia's binary search sample, here's an example. Note that since we're not looking for an exact match (you want largest that will fit) the logic is slightly different.
int binary_search(NSString *A, float max_width, int imin, int imax)
{
    // continue searching while [imin,imax] is not empty
    while (imax >= imin)
    {
        /* calculate the midpoint for roughly equal partition */
        int imid = (imin + imax) / 2;
        // determine which subarray to search
        float width = ComputeWidthOfString([A substringToIndex:imid]);
        if (width < max_width)
            // change min index to search upper subarray
            imin = imid + 1;
        else if (width > max_width)
            // change max index to search lower subarray
            imax = imid - 1;
        else
            // exact match found at index imid
            return imid;
    }
    // Normally, this is the "not found" case, but we're just looking for
    // the best fit, so we return something here.
    return imin;
}
You need to do some math or testing to figure out the right index at the bottom, but it's definitely imin or imax, plus or minus one. Since prefix width is nondecreasing in prefix length, the loop maintains the invariant that every prefix shorter than imin fits and every prefix longer than imax is too wide; at exit imin == imax + 1, so the longest prefix that fits has imin - 1 characters.
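To see the invariant in action, here is a self-contained sketch with a stub width function standing in for ComputeWidthOfString (the monotone 7.5-points-per-character measure and the 44-character length are made up for the demo):
#include <stdio.h>

/* Stand-in for ComputeWidthOfString: any nondecreasing width
   measure will do for illustration. */
static float width_of_prefix(int nchars) {
    return 7.5f * nchars;
}

/* Same search as above, but over prefix lengths 0..imax. */
static int best_fit(float max_width, int imin, int imax) {
    while (imax >= imin) {
        int imid = imin + (imax - imin) / 2;
        float width = width_of_prefix(imid);
        if (width < max_width)
            imin = imid + 1;
        else if (width > max_width)
            imax = imid - 1;
        else
            return imid;
    }
    /* invariant at exit: prefixes shorter than imin fit, prefixes of
       imin or more characters are too wide */
    return imin;
}

int main(void) {
    int r = best_fit(100.0f, 0, 44);  /* 100 / 7.5 = 13.33, so 13 chars fit */
    printf("longest fitting prefix: %d chars\n", r - 1);  /* prints 13 */
    return 0;
}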

Which are faster squares or roots?

for (int i = 2; i * i <= n; i++)
for (int i = 2; i <= SQRT(n); i++)
Just wondering which is faster. I looked at some primitive algorithms for getting roots, and it would seem to me that squaring the number would be faster, but I don't know for sure. These loops are for determining a number's primality.
Shouldn't the comparison be between
int sqrt = SQRT(n);
for (int i = 2; i <= sqrt; i++)
and
for (int i = 2; i * i <= n; i++)
The answer will depend on how many loop iterations you do. The sqrt method does less work per iteration, but it has a higher start-up cost. Mind you, this reeks of premature optimisation.
The compiler may 'cache' the result of SQRT(n), but it must compute i * i on each step.
Square root will take longer, unless it's implemented in hardware, lookup, or a special machine code version. Newton iteration is the algorithm of choice; it converges quadratically.
Best to benchmark for yourself. I'd recommend moving the call to square root outside the loop so you only do it once rather than every time you check the exit condition.
Why not skip both of them and use some clever maths? The following code avoids both, using the property that the sum of the first n odd numbers is always a perfect square.
A shameless plug for my old blog post (from my dead blog):
int isPrime(int n)
{
    int squares = 1;
    int odd = 3;
    if (((n & 1) == 0) || (n < 9))
        return (n == 2) || ((n > 1) && (n & 1));
    else
    {
        for ( ; squares <= n; odd += 2)
        {
            if (n % odd == 0)
                return 0;
            squares += odd;
        }
        return 1;
    }
}
The square will be faster.
But the square will overflow if n is larger than the square root of the largest int, and then the comparison will go wrong. The square root function could (and you would expect it to) be implemented so that it can be calculated on arguments all the way up to the largest representable int. That means it won't go wrong in that way.
In Java, the largest int is 2^31 - 1, which means its square root is just under 46341. If you want to look for primes larger than that, the squaring would stop you.
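One common way to sidestep the overflow while keeping the cheap loop condition (my sketch, not from the answers above) is to test i <= n / i, which is equivalent to i * i <= n for positive integers but can never overflow; the trade-off is one integer division per iteration:
#include <stdio.h>
#include <limits.h>

int is_prime(int n) {
    if (n < 2) return 0;
    /* i <= n / i is equivalent to i * i <= n but cannot overflow,
       so it stays correct even when n is close to INT_MAX */
    for (int i = 2; i <= n / i; i++)
        if (n % i == 0) return 0;
    return 1;
}

int main(void) {
    printf("%d\n", is_prime(INT_MAX));  /* 2^31 - 1 is a Mersenne prime */
    return 0;
}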

Non repeating random numbers in Objective-C

I'm using
for (int i = 1; i < 100; i++) {
    int r = arc4random() % [array count];
}
but I'm getting repeats every time. How can I remove the chosen value from the range, so that when the program loops I won't get any duplicates?
It sounds like you want shuffling of a set rather than "true" randomness. Simply create an array where all the positions match the numbers and initialize a counter:
num[ 0] = 0
num[ 1] = 1
: :
num[99] = 99
numNums = 100
Then, whenever you want a random number, use the following method:
idx = rnd (numNums);        // return value 0 through numNums-1.
val = num[idx];             // get the number at that position.
num[idx] = num[numNums-1];  // remove it from pool by overwriting with highest.
numNums--;                  // and remove the highest position from pool.
return val;                 // give it back to caller.
This will return a random value from an ever-decreasing pool, guaranteeing no repeats. You will have to beware of the pool running down to zero size of course, and intelligently re-initialize the pool.
This is a more deterministic solution than keeping a list of used numbers and continuing to loop until you find one not in that list. The performance of that sort of algorithm will degrade as the pool gets smaller.
A C function using static values, something like this, should do the trick. Call it with
int i = myRandom (200);
to set the pool up (with any number zero or greater specifying the size) or
int i = myRandom (-1);
to get the next number from the pool (any negative number will suffice). If the function can't allocate enough memory, it will return -2. If there are no numbers left in the pool, it will return -1 (at which point you could re-initialize the pool if you wish). Here's the function, with a unit-testing main for you to try out:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   // for time(), used to seed the generator

#define ERR_NO_NUM -1
#define ERR_NO_MEM -2

int myRandom (int size) {
    int i, n;
    static int numNums = 0;
    static int *numArr = NULL;

    // Initialize with a specific size.
    if (size >= 0) {
        if (numArr != NULL)
            free (numArr);
        if ((numArr = malloc (sizeof(int) * size)) == NULL)
            return ERR_NO_MEM;
        for (i = 0; i < size; i++)
            numArr[i] = i;
        numNums = size;
    }

    // Error if no numbers left in pool.
    if (numNums == 0)
        return ERR_NO_NUM;

    // Get random number from pool and remove it (rand in this
    // case returns a number between 0 and numNums-1 inclusive).
    n = rand() % numNums;
    i = numArr[n];
    numArr[n] = numArr[numNums-1];
    numNums--;
    if (numNums == 0) {
        free (numArr);
        numArr = NULL;
    }

    return i;
}

int main (void) {
    int i;
    srand (time (NULL));
    i = myRandom (20);
    while (i >= 0) {
        printf ("Number = %3d\n", i);
        i = myRandom (-1);
    }
    printf ("Final = %3d\n", i);
    return 0;
}
And here's the output from one run:
Number = 19
Number = 10
Number = 2
Number = 15
Number = 0
Number = 6
Number = 1
Number = 3
Number = 17
Number = 14
Number = 12
Number = 18
Number = 4
Number = 9
Number = 7
Number = 8
Number = 16
Number = 5
Number = 11
Number = 13
Final = -1
Keep in mind that, because it uses statics, it's not safe for calling from two different places if they want to maintain their own separate pools. If that were the case, the statics would be replaced with a buffer (holding count and pool) that would "belong" to the caller (a double-pointer could be passed in for this purpose).
And, if you're looking for the "multiple pool" version, I include it here for completeness.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   // for time(), used to seed the generator

#define ERR_NO_NUM -1
#define ERR_NO_MEM -2

int myRandom (int size, int *ppPool[]) {
    int i, n;

    // Initialize with a specific size.
    if (size >= 0) {
        if (*ppPool != NULL)
            free (*ppPool);
        if ((*ppPool = malloc (sizeof(int) * (size + 1))) == NULL)
            return ERR_NO_MEM;
        (*ppPool)[0] = size;
        for (i = 0; i < size; i++) {
            (*ppPool)[i+1] = i;
        }
    }

    // Error if no numbers left in pool.
    if (*ppPool == NULL)
        return ERR_NO_NUM;

    // Get random number from pool and remove it (rand in this
    // case returns a number between 0 and count-1 inclusive,
    // where the count is held in the first slot of the pool).
    n = rand() % (*ppPool)[0];
    i = (*ppPool)[n+1];
    (*ppPool)[n+1] = (*ppPool)[(*ppPool)[0]];
    (*ppPool)[0]--;
    if ((*ppPool)[0] == 0) {
        free (*ppPool);
        *ppPool = NULL;
    }

    return i;
}

int main (void) {
    int i;
    int *pPool;

    srand (time (NULL));
    pPool = NULL;
    i = myRandom (20, &pPool);
    while (i >= 0) {
        printf ("Number = %3d\n", i);
        i = myRandom (-1, &pPool);
    }
    printf ("Final = %3d\n", i);
    return 0;
}
As you can see from the modified main(), you need to first initialise an int pointer to NULL, then pass its address to the myRandom() function. This allows each client (location in the code) to have its own pool, which is automatically allocated and freed, although you could still share pools if you wish.
You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.
Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 2, width 16), with an algorithm which is still cryptographically robust.
It is guaranteed to never have collisions (because cryptographic algorithms create a 1:1 mapping). It is also reversible (a 2-way mapping), so you can take the resulting number and get back to the counter value you started with.
AES-FFX is one proposed standard method to achieve this. I've experimented with some basic Python code which is based on the AES-FFX idea, although not fully conformant--see Python code here. It can e.g. encrypt a counter to a random-looking 7-digit decimal number, or a 16-bit number.
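To illustrate the underlying idea only (a toy of my own, emphatically not AES-FFX and not cryptographically strong): any Feistel network is a bijection on its domain, so encrypting the counter 0, 1, 2, ... gives a repeat-free, reversible sequence. Here the domain is 16-bit values and the round function is an arbitrary mixer:
#include <stdint.h>
#include <stdio.h>

/* arbitrary 8-bit mixing function, chosen only for the demo */
static uint8_t round_fn(uint8_t x, uint8_t key) {
    x = (uint8_t)((x ^ key) * 167u + 13u);
    return (uint8_t)((x >> 3) | (x << 5));   /* rotate right by 3 */
}

/* 4-round Feistel permutation of a 16-bit value: a bijection
   regardless of what round_fn does, hence no collisions */
uint16_t feistel16(uint16_t v, const uint8_t keys[4]) {
    uint8_t left = (uint8_t)(v >> 8), right = (uint8_t)v;
    for (int r = 0; r < 4; r++) {
        uint8_t tmp = (uint8_t)(left ^ round_fn(right, keys[r]));
        left = right;
        right = tmp;
    }
    return (uint16_t)(((uint16_t)left << 8) | right);
}

int main(void) {
    const uint8_t keys[4] = {0x3a, 0x91, 0x5c, 0xe7};
    for (unsigned i = 0; i < 10; i++)   /* distinct outputs, no repeats */
        printf("%u -> %u\n", i, (unsigned)feistel16((uint16_t)i, keys));
    return 0;
}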
You need to keep track of the numbers you have already used (for instance, in an array). Get a random number, and discard it if it has already been used.
Without relying on external stochastic processes, like radioactive decay or user input, computers will always generate pseudorandom numbers - that is, numbers which have many of the statistical properties of random numbers, but repeat in sequences.
This explains the suggestions to randomise the computer's output by shuffling.
Discarding previously used numbers may lengthen the sequence artificially, but at a cost to the statistics which give the impression of randomness.
The best way to do this is to create an array for numbers already used. After a random number has been created, add it to the array. Then, when you go to create another random number, ensure that it is not in the array of used numbers.
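A minimal sketch of that approach (my own; note the caller must not ask for more than range values, or the retry loop below would never find an unused one):
#include <stdlib.h>

/* "Track used numbers" idea: used[] has range entries, all initially 0.
   Each call returns a value in [0, range) not returned before. As noted
   above, this degrades as the pool of unused values shrinks. */
int next_unused(int used[], int range) {
    int r;
    do {
        r = rand() % range;   /* retry until an unused value turns up */
    } while (used[r]);
    used[r] = 1;
    return r;
}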
In addition to using a secondary array to store already-generated random numbers, seeding the random number generator (e.g., with srand and the current time) at the start of each run helps produce a different sequence of random numbers on every run.

Generate combinations ordered by an attribute

I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for... I'll try to give an example. Let's say I have a list of objects A,B,C,D with the attribute values I want to order by being 3,3,2,1. This gives A3, B3, C2, D1 objects. Now I want to generate combinations of 2 objects, but they need to be ordered in a descending way:
A3 B3
A3 C2
B3 C2
A3 D1
B3 D1
C2 D1
Generating all combinations and sorting them is not acceptable, because the real-world scenario involves large sets and millions of combinations (a set of 40, order of 8), and I need only combinations above a certain threshold.
Actually, I need the count of combinations above a threshold, grouped by the sum of a given attribute, but I think that is far more difficult to do - so I'd settle for developing all combinations above a threshold and counting them, if that's possible at all.
EDIT - My original question wasn't very precise... I don't actually need these combinations ordered; I just thought it would help to isolate combinations above a threshold. To be more precise: in the above example, given a threshold of 5, I'm looking for the information that the given set produces 1 combination with a sum of 6 (A3 B3) and 2 with a sum of 5 (A3 C2, B3 C2). I don't actually need the combinations themselves.
I was looking into the subset-sum problem, but if I understood the dynamic-programming solution correctly, it only tells you whether a given sum exists, not the count of subsets with that sum.
Thanks
Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:
It's not clear to me from your description that A, B, ..., D play any role in your answer (except possibly as containers for the values).
I think your question example is simply: "For each integer at least 5, up to the maximum possible total of two values, how many distinct pairs from the set {3, 3, 2, 1} have sums of that integer?"
The interesting part is the early bail-out, once no possible solution can be reached (the remaining achievable sums are too small).
I'll post sample code later.
Here's the sample code I promised, with a few remarks following:
public class Combos {
    /* permanent state for instance */
    private int values[];
    private int length;

    /* transient state during single "count" computation */
    private int n;
    private int limit;
    private Tally<Integer> tally;
    private int best[][]; // used for early-bail-out

    private void initializeForCount(int n, int limit) {
        this.n = n;
        this.limit = limit;
        best = new int[n+1][length+1];
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i-1][j+1];
            }
        }
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {
            tally.inc(sum);
        } else {
            for (
                int i = start;
                i <= length - left
                    && limit <= sum + best[left][i]; // bail-out-check
                ++i
            ) {
                countAt(left - 1, i + 1, sum + values[i]);
            }
        }
    }

    public Tally<Integer> count(int n, int limit) {
        tally = new Tally<Integer>();
        if (n <= length) {
            initializeForCount(n, limit);
            countAt(n, 0, 0);
        }
        return tally;
    }

    public Combos(int[] values) {
        this.values = values;
        this.length = values.length;
    }
}
Preface remarks:
This uses a little helper class called Tally, that just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.
To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:
This doesn't check for a null value array, etc.
I assume that the value array is already sorted into descending order, required for the early-bail-out technique. (Good production code would include the sorting.)
I put transient data into instance variables instead of passing them as arguments among the private methods that support count. That makes this class non-thread-safe.
Explanation:
An instance of Combos is created with the (descending ordered) array of integers to combine. The value array is set up once per instance, but multiple calls to count can be made with varying population sizes and limits.
The count method triggers a (mostly) standard recursive traversal of unique combinations of n integers from values. The limit argument gives the lower bound on sums of interest.
The countAt method examines combinations of integers from values. The left argument is how many integers remain to make up n integers in a sum, start is the position in values from which to search, and sum is the partial sum.
The early-bail-out mechanism is based on computing best, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in best[n][p] is the largest sum of n values beginning in position p of the original values.
The recursion of countAt bottoms out when the correct population has been accumulated; this adds the current sum (of n values) to the tally. If countAt has not bottomed out, it sweeps the values from the start position to increase the current partial sum, as long as:
enough positions remain in values to achieve the specified population, and
the best (largest) subtotal remaining is big enough to make the limit.
A sample run with your question's data:
int[] values = {3, 3, 2, 1};
Combos mine = new Combos(values);
Tally<Integer> tally = mine.count(2, 5);
for (int i = 5; i < 9; ++i) {
    int n = tally.get(i);
    if (0 < n) {
        System.out.println("found " + tally.get(i) + " sums of " + i);
    }
}
produces the results you specified:
found 2 sums of 5
found 1 sums of 6
Here's the Tally code:
// requires: import java.util.Collection; import java.util.HashMap; import java.util.Map;
public static class Tally<T> {
    private Map<T,Integer> tally = new HashMap<T,Integer>();

    public Tally() {/* nothing */}

    public void inc(T key) {
        Integer value = tally.get(key);
        if (value == null) {
            value = Integer.valueOf(0);
        }
        tally.put(key, (value + 1));
    }

    public int get(T key) {
        Integer result = tally.get(key);
        return result == null ? 0 : result;
    }

    public Collection<T> keys() {
        return tally.keySet();
    }
}
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers. (A sketch of an overflow-resistant approach follows after this description.)
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coefficient.
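For illustration, here is a common overflow-resistant way to compute a binomial coefficient (a generic sketch in C, not the class's actual code): multiply and divide incrementally, so the running value never exceeds the final answer times the current factor.
#include <stdio.h>

/* After step i the running value equals C(n-k+i, i), which is an
   integer, so the division below is always exact. */
unsigned long long nchoosek(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;                /* exploit symmetry */
    unsigned long long result = 1;
    for (unsigned i = 1; i <= k; i++)
        result = result * (n - k + i) / i;   /* exact at every step */
    return result;
}

int main(void) {
    printf("%llu\n", nchoosek(40, 8));       /* prints 76904685 */
    return 0;
}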
Check out this question in stackoverflow: Algorithm to return all combinations
I also just used the Java code below to generate all permutations, but it could easily be used to generate unique combinations given an index.
public static <E> E[] permutation(E[] s, int num) { // s is the input elements array and num is the number which represents the permutation
    int factorial = 1;
    for (int i = 2; i < s.length; i++)
        factorial *= i; // calculates the factorial of (s.length - 1)
    if (num / s.length >= factorial) // optional: checks the number is in the range [0, s.length! - 1]
        return null;
    for (int i = 0; i < s.length - 1; i++) { // go over the array
        int tempi = (num / factorial) % (s.length - i); // calculates the next cell from the cells left (those in the range [i, s.length - 1])
        E temp = s[i + tempi]; // temporarily saves the value of the cell needed to add to the permutation this time
        for (int j = i + tempi; j > i; j--) // shift all elements to "cover" the "missing" cell
            s[j] = s[j-1];
        s[i] = temp; // put the chosen cell in the correct spot
        factorial /= (s.length - (i + 1)); // updates the factorial
    }
    return s;
}
I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.
The reason (I think) is that this problem is very similar to problems like the travelling salesman problem. Unless you try all the combinations, there is no way to know which attributes will add up to the threshold.
There seems to be no clever trick that can solve this class of problems.
Still there are many optimizations that you can do to the actual code.
Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).
If you're using C# there is a fairly good generics library here. Note though that the generation of some permutations is not in lexicographic order
Here's a recursive approach to count the number of these subsets: we define a function count(minIndex, numElements, minSum) that returns the number of subsets of size numElements whose sum is at least minSum, using only elements with indices minIndex or greater.
As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], and call the first index zero, and the total number of elements N. We assume all elements are nonnegative. To find all 2-subsets whose sum is at least 5, we call count(0,2,5).
Sample Code (Java):
// Assumes: a[] is the element array sorted in descending order,
// N is its length, and nchoosek computes the binomial coefficient.
int count(int minIndex, int numElements, int minSum)
{
    int total = 0;
    if (numElements == 1)
    {
        // just count number of elements >= minSum
        for (int i = minIndex; i <= N-1; i++)
            if (a[i] >= minSum) total++; else break;
    }
    else
    {
        if (minSum <= 0)
        {
            // any subset will do (n-choose-k of them)
            if (numElements <= (N-minIndex))
                total = nchoosek(N-minIndex, numElements);
        }
        else
        {
            // add element a[i] to the set, and then consider the count
            // for all elements to its right
            for (int i = minIndex; i <= (N-numElements); i++)
                total += count(i+1, numElements-1, minSum-a[i]);
        }
    }
    return total;
}
Btw, I've run the above with an array of 40 elements and size-8 subsets, and consistently got back results in less than a second.