What are the practical uses of modulus (%) in programming? [duplicate]

Possible Duplicate:
Recognizing when to use the mod operator
What are the practical uses of modulus? I know what modulo division is. The first scenarios that come to my mind are finding odd and even numbers, and clock arithmetic. But where else could I use it?

The most common use I've found is for "wrapping round" your array indices.
For example, if you just want to cycle through an array repeatedly, you could use:
int a[10];
for (int i = 0; true; i = (i + 1) % 10)
{
    // ... use a[i] ...
}
The modulo ensures that i stays in the [0, 10) range.

I usually use it in tight loops, when I have to do something every X iterations as opposed to on every iteration.
Example:
int i;
for (i = 1; i <= 1000000; i++)
{
    do_something(i);
    if (i % 1000 == 0)
        printf("%d processed\n", i);
}

One use for the modulus operation is when making a hash table. It's used to convert the value out of the hash function into an index into the array. (If the hash table size is a power of two, the modulus could be done with a bit-mask, but it's still a modulus operation.)
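To make that concrete, a minimal sketch (the table size and the use of std::hash here are illustrative assumptions, not from the original answer):
#include <functional>
#include <string>

// Map a key into one of table_size buckets: hash it, then use the modulus to
// fold the (large) hash value into a valid index in [0, table_size).
std::size_t bucket_for(const std::string& key, std::size_t table_size) {
    std::size_t h = std::hash<std::string>{}(key); // any hash function will do
    return h % table_size;
}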

To print a number as a string, you need the modulus to find the value of each digit.
string number_to_string(uint number) {
    if (number == 0)
        return "0"; // without this, the loop below would return an empty string
    string result = "";
    while (number != 0) {
        // the modulus extracts the least significant digit
        result = cast(char)((number % 10) + '0') ~ result;
        number /= 10;
    }
    return result;
}

The check digits of International Bank Account Numbers (IBANs) are validated with the mod-97 technique.
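For illustration only (this sketch is not from the original answer), the digit-by-digit mod-97 reduction applied to an already rearranged, letters-expanded IBAN string; a valid IBAN reduces to 1:
#include <string>

// Compute (huge decimal number represented as a string) mod 97 without big
// integers, by folding in one digit at a time.
int mod97(const std::string& digits) {
    int remainder = 0;
    for (char c : digits) {
        remainder = (remainder * 10 + (c - '0')) % 97;
    }
    return remainder;
}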
Also, in large batches, to do something after every n iterations. Here is an example for NHibernate:
ISession session = sessionFactory.OpenSession();
ITransaction tx = session.BeginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.Save(customer);
    if (i % 20 == 0) { // 20, same as the ADO batch size
        // Flush a batch of inserts and release memory:
        session.Flush();
        session.Clear();
    }
}
tx.Commit();
session.Close();

The usual implementation of buffered communications uses circular buffers, and you manage them with modulus arithmetic.
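A minimal sketch of that idea (the fixed capacity of 16 and the struct layout are just for illustration):
#include <cstddef>

// A tiny ring buffer: head and tail only ever grow, and the modulus wraps
// them back into the storage array.
struct RingBuffer {
    static constexpr std::size_t capacity = 16;
    int data[capacity];
    std::size_t head = 0; // next slot to read
    std::size_t tail = 0; // next slot to write

    bool push(int value) {
        if (tail - head == capacity) return false; // full
        data[tail++ % capacity] = value;
        return true;
    }
    bool pop(int& value) {
        if (head == tail) return false;            // empty
        value = data[head++ % capacity];
        return true;
    }
};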

For languages that don't have bitwise operators, modulus can be used to get the lowest n bits of a number. For example, to get the lowest 8 bits of a non-negative x:
x % 256
which is equivalent to:
x & 255

Cryptography. That alone would account for an obscene percentage of modulus usage (I exaggerate, but you get the point).
Try the Wikipedia page too:
Modular arithmetic is referenced in number theory, group theory, ring theory, knot theory, abstract algebra, cryptography, computer science, chemistry and the visual and musical arts.
In my experience, any sufficiently advanced algorithm is probably going to touch on one or more of the above topics.
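As a concrete example of the cryptographic use (a generic square-and-multiply sketch, not tied to any particular library): modular exponentiation, the core of RSA and Diffie-Hellman, is little more than repeated multiplication reduced with % at every step.
#include <cstdint>

// Compute (base^exp) mod m by square-and-multiply, reducing with % after every
// multiplication so intermediate values stay small.
// Note: the 64-bit products only stay overflow-free for m < 2^32.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}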

Well, there are many perspectives from which you can look at it. Viewed as a mathematical operation, it's just modulo division. Strictly speaking we don't even need it, since whatever % does could be achieved with repeated subtraction, but every programming language implements it in a very optimized way.
And modulo division is not limited to finding odd and even numbers or clock arithmetic. Hundreds of algorithms need the modulo operation, for example cryptographic algorithms. So it's a general mathematical operation like +, -, *, /, etc.
Beyond the mathematical perspective, different languages use the symbol for other things, such as built-in data structures: in Perl, %hash declares a hash. So it all varies with the design of the programming language.
So there are still plenty of other perspectives one could add to the list of uses of %.

Related

Chessprogramming: how to get a single move out of a bitboard attack-mask most efficiently

How can I efficiently get a single move out of an attack mask that looks like this:
....1...
1...1...
.1..1..1
..1.1.1.
...111..
11111111
..1.11..
.1..1.1.
for a queen.
What I've done in the past is to get the square indices of every single possible move for the queen by counting the trailing zeros (bitScanForward),
and after I generated the new move I removed this square from the attack mask and continued with the next attack square. Is there any technique to get the single attack bits directly?
I think what you are describing is already the most efficient way: loop over the bitboard until it is zero and pick one move at a time.
To sketch the idea with some code, it could look like this:
using Bitboard = uint64_t; // 64 bit unsigned integer
Move* createAllMoves(Bitboard mask, int from_sq, Move* pMoves) {
    while (mask != 0) {
        int to_sq = findAndClearSetBit(mask);
        *pMoves++ = createMove(from_sq, to_sq);
    }
    return pMoves;
}
The findAndClearSetBit function can choose any set bit, but commonly on today's hardware, finding the least significant bit is most efficient. If you are using GCC or Clang, you can use __builtin_ctzll which should be optimized to the specific hardware:
int findAndClearSetBit(Bitboard& mask) {
    int sq = __builtin_ctzll(mask); // find least significant bit
    mask &= mask - 1;               // clear least significant bit
    return sq;
}
If I am not mistaken, your existing function bitScanForward is already an implementation to find the least significant bit. So, you can use it to get a portable version.

Efficient random permutation of n-set-bits

For the problem of producing a bit-pattern with exactly n set bits, I know of two practical methods, but they both have limitations I'm not happy with.
First, you can enumerate all of the possible word values which have that many bits set in a pre-computed table, and then generate a random index into that table to pick out a possible result. This has the problem that as the output size grows the list of candidate outputs eventually becomes impractically large.
Alternatively, you can pick n non-overlapping bit positions at random (for example, by using a partial Fisher-Yates shuffle) and set those bits only. This approach, however, computes a random state in a much larger space than the number of possible results. For example, it may choose the first and second bits out of three, or it might, separately, choose the second and first bits.
This second approach must consume more bits from the random number source than are strictly required. Since it is choosing n bits in a specific order when their order is unimportant, this means that it is making an arbitrary distinction between n! different ways of producing the same result, and consuming at least floor(log_2(n!)) more bits than are necessary.
Can this be avoided?
There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
clarification
The first approach requires picking a single random number between zero and w! / ((w-n)! * n!) (where w is the output size), as this is the number of possible solutions.
The second approach requires picking n random values between zero and w-1, zero and w-2, etc., and these have a product of w! / (w-n)!, which is n! times larger than the first approach.
This means that the random number source has been forced to produce about log_2(n!) extra bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness, perhaps by using an algorithm which produces an unordered list of bit positions, or by directly computing the nth unique permutation of bits.
Seems like you want a variant of Floyd's algorithm:
Algorithm to select a single, random combination of values?
Should be especially useful in your case, because the containment test is a simple bitmask operation. This will require only k calls to the RNG. In the code below, I assume you have randint(limit) which produces a uniform random from 0 to limit-1, and that you want k bits set in a 32-bit int:
mask = 0;
for (j = 32 - k; j < 32; ++j) {
    r = randint(j + 1);
    b = 1 << r;
    if (mask & b) mask |= (1 << j);
    else mask |= b;
}
How many bits of entropy you need here depends on how randint() is implemented. If k > 16, set k to 32 - k and bitwise-negate the resulting mask.
Your alternative suggestion of generating a single random number representing one combination among the set (mathematicians would call this a rank of the combination) is simpler if you use colex order rather than lexicographic rank. This code, for example:
for (i = k; i >= 1; --i) {
    while ((b = binomial(n, i)) > r) --n;
    buf[i-1] = n;
    r -= b;
}
will fill the array buf[] with indices from 0 to n-1 for the k-combination at colex rank r. In your case, you'd replace buf[i-1] = n with mask |= (1 << n). The binomial() function computes the binomial coefficient, which I do with a lookup table (see this). That would make the most efficient use of entropy, but I still think Floyd's algorithm would be a better compromise.
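For what it's worth, such a lookup table is a few lines of Pascal's rule (a generic sketch, not the answerer's actual code; the 33x33 size just covers the 32-bit case discussed here):
#include <cstdint>

// Precompute binomial coefficients with Pascal's rule:
// C(n, k) = C(n-1, k-1) + C(n-1, k).
constexpr int MAX_N = 33;
static std::uint64_t binom[MAX_N][MAX_N]; // zero-initialized

void init_binomial() {
    for (int n = 0; n < MAX_N; ++n) {
        binom[n][0] = 1;
        for (int k = 1; k <= n; ++k)
            binom[n][k] = binom[n-1][k-1] + binom[n-1][k];
    }
}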
[Expanding my comment:] If you only have a little raw entropy available, then use a PRNG to stretch it further. You only need enough raw entropy to seed a PRNG. Use the PRNG to do the actual shuffle, not the raw entropy. For the next shuffle reseed the PRNG with some more raw entropy. That spreads out the raw entropy and makes less of a demand on your entropy source.
If you know exactly the range of numbers you need out of the PRNG, then you can, carefully, set up your own LCG PRNG to cover the appropriate range while needing the minimum entropy to seed it.
ETA: In C++ there is a std::next_permutation() function. Try using that. See std::next_permutation Implementation Explanation for more.
Is this a theory problem or a practical problem?
You could still do the partial shuffle, but keep track of the order of the ones and forget the zeroes. There are log(k!) bits of unused entropy in their final order for your future consumption.
You could also just use the recurrence (n choose k) = (n-1 choose k-1) + (n-1 choose k) directly. Generate a random number between 0 and (n choose k)-1; call it r. Iterate over the bits from the nth down to the first. If j of the i remaining bits still have to be set, set the ith bit if r < (i-1 choose j-1); otherwise clear it and subtract (i-1 choose j-1) from r.
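A sketch of that recurrence-based construction (purely illustrative; binomial() is assumed to be a helper such as the lookup table sketched above):
#include <cstdint>

std::uint64_t binomial(int n, int k); // assumed helper, e.g. the lookup table above

// Build the k-combination of rank r from C(n,k) = C(n-1,k-1) + C(n-1,k):
// with j bits left to set among i positions, the first C(i-1, j-1) ranks
// are exactly the ones that set the current bit.
std::uint32_t nth_combination(int n, int k, std::uint64_t r) {
    std::uint32_t mask = 0;
    int j = k;                              // bits still to set
    for (int i = n; i >= 1 && j > 0; --i) { // i = bits still to decide
        std::uint64_t with_bit = binomial(i - 1, j - 1);
        if (r < with_bit) {
            mask |= (1u << (i - 1));        // set this bit
            --j;
        } else {
            r -= with_bit;                  // skip all combinations that set it
        }
    }
    return mask;
}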
Practically, I wouldn't worry about the couple of words of wasted entropy from the partial shuffle; generating a random 32-bit word with 16 bits set costs somewhere between 64 and 80 bits of entropy, and that's entirely acceptable. The growth rate of the required entropy is asymptotically worse than the theoretical bound, so I'd do something different for really big words.
For really big words, you might generate n independent bits that are 1 with probability k/n. This immediately blows your entropy budget (and then some), but it only uses linearly many bits. The number of set bits is tightly concentrated around k, though. For a further expected linear entropy cost, I can fix it up. This approach has much better memory locality than the partial shuffle approach, so I'd probably prefer it in practice.
I would use solution number 3: generate the i-th permutation.
But do you need to generate the first i-1 ones?
You can do it a bit faster than that with the kind of divide-and-conquer method proposed here: Returning the i-th combination of a bit array; maybe you can improve the solution a bit.
Background
From the formula you have given, w! / ((w-n)! * n!), it looks like your problem has to do with the binomial coefficient, which counts unique combinations, and not with permutations, which count the same items in different orders.
You said:
"There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
...
This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits."
So, there is a way to efficiently compute the nth unique combination, or rank, from the k-indexes. The k-indexes refer to a unique combination. For example, let's say that the n choose k case of 4 choose 3 is taken. This means that there are a total of 4 numbers that can be selected (0, 1, 2, 3), which is represented by n, and they are taken in groups of 3, which is represented by k. The total number of unique combinations can be calculated as n! / (k! * (n-k)!). The rank of zero corresponds to the k-index of (2, 1, 0). Rank one is represented by the k-index group of (3, 1, 0), and so forth.
Solution
There is a formula that can be used to very efficiently translate between a k-index group and the corresponding rank without iteration. Likewise, there is a formula for translating between the rank and corresponding k-index group.
I have written a paper on this formula and how it can be seen from Pascal's Triangle. The paper is called Tablizing The Binomial Coefficient.
I have written a C# class which is in the public domain that implements the formula described in the paper. It uses very little memory and can be downloaded from the site. It performs the following tasks:
Outputs all the k-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the k-index to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the entire set.
Converts the index in a sorted binomial coefficient table to the corresponding k-index. The technique used is also much faster than older iterative solutions.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers. This version returns a long value; there is at least one other method that returns an int. Make sure that you use the method that returns a long value.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with at least 2 cases and there are no known bugs.
The following tested example code demonstrates how to use the class and will iterate through each unique combination:
public void Test10Choose5()
{
    String S;
    int Loop;
    int N = 10;  // Total number of elements in the set.
    int K = 5;   // Total number of elements in each group.
    // Create the bin coeff object required to get all
    // the combos for this N choose K combination.
    BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
    int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
    // The KIndexes array specifies the indexes for a lexicographic element.
    int[] KIndexes = new int[K];
    StringBuilder SB = new StringBuilder();
    // Loop thru all the combinations for this N choose K case.
    for (int Combo = 0; Combo < NumCombos; Combo++)
    {
        // Get the k-indexes for this combination.
        BC.GetKIndexes(Combo, KIndexes);
        // Verify that the KIndexes returned can be used to retrieve the
        // rank or lexicographic order of the KIndexes in the table.
        int Val = BC.GetIndex(true, KIndexes);
        if (Val != Combo)
        {
            S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
            Console.WriteLine(S);
        }
        SB.Remove(0, SB.Length);
        for (Loop = 0; Loop < K; Loop++)
        {
            SB.Append(KIndexes[Loop].ToString());
            if (Loop < K - 1)
                SB.Append(" ");
        }
        S = "KIndexes = " + SB.ToString();
        Console.WriteLine(S);
    }
}
So, the way to apply the class to your problem is by considering each bit in the word size as the total number of items. This would be n in the n! / (k! * (n-k)!) formula. To obtain k, or the group size, simply count the number of bits set to 1. You would have to create a list or array of the class objects for each possible k, which in this case would be 32. Note that the class does not handle N choose N, N choose 0, or N choose 1, so the code would have to check for those cases and return 1 for both the 32 choose 0 case and the 32 choose 32 case. For 32 choose 1, it would need to return 32.
If you do not need values much larger than 32 choose 16 (the worst case for 32 items - it yields 601,080,390 unique combinations), then you can use 32-bit integers, which is how the class is currently implemented. If you need to handle 64-bit words, then you will have to convert the class to use 64-bit longs. The largest value a long can hold is 9,223,372,036,854,775,807 (2^63 - 1). The worst case for n choose k when n is 64 is 64 choose 32, which is 1,832,624,140,942,590,534, so a long value will work for all 64 choose k cases. If you need numbers bigger than that, then you will probably want to look into using some sort of big integer class. In C#, the .NET framework has a BigInteger class. If you are working in a different language, it should not be hard to port.
If you are looking for a very good PRNG, one of the fastest, most lightweight, and highest quality is the Tiny Mersenne Twister, or TinyMT for short. I ported the code over to C++ and C#. It can be found here, along with a link to the original author's C code.
Rather than using a shuffling algorithm like Fisher-Yates, you might consider doing something like the following example instead:
// Get 7 random cards.
// (Declarations added here for completeness; the original snippet assumed a
// standard 52-card deck and some RNG object.)
const int CardsInDeck = 52;
Random RandObj = new Random();
ulong Card;
ulong SevenCardHand = 0;
for (int CardLoop = 0; CardLoop < 7; CardLoop++)
{
    do
    {
        // The card has a value of between 0 and 51. So, get a random value and
        // left shift it into the proper bit position.
        Card = (1UL << RandObj.Next(CardsInDeck));
    } while ((SevenCardHand & Card) != 0);
    SevenCardHand |= Card;
}
The above code is faster than any shuffling algorithm (at least for obtaining a subset of random cards) since it only works on 7 cards instead of 52. It also packs the cards into individual bits within a single 64 bit word. It makes evaluating poker hands much more efficient as well.
As a side note, the best binomial coefficient calculator I have found that works with very large numbers (it accurately calculated a case that yielded over 15,000 digits in the result) can be found here.

Constrained Single-Objective Optimization

Introduction
I need to split an array filled with items of a certain type (let's take water buckets for example), each with two values set (in this case weight and volume), into two groups, keeping the difference between the totals of the weights to a minimum (preferred) and the difference between the totals of the volumes less than 1000 (required). This doesn't need to be a full-fledged genetic algorithm or something similar, but it should be better than what I currently have...
Current Implementation
Not knowing how to do it better, I started by splitting the array into two arrays of the same length (the original array can hold an uneven number of items), filling a possibly empty spot with an item whose values are both 0. The sides don't need to have the same number of items; I just didn't know how to handle it otherwise.
After having these distributed, I'm trying to optimize them like this:
func (main *Main) Optimize() {
	for {
		difference := main.Difference(WEIGHT)
		for i := 0; i < len(main.left); i++ {
			for j := 0; j < len(main.right); j++ {
				if main.DifferenceAfter(i, j, WEIGHT) < main.Difference(WEIGHT) {
					main.left[i], main.right[j] = main.right[j], main.left[i]
				}
			}
		}
		if difference == main.Difference(WEIGHT) {
			break
		}
	}
	for main.Difference(CAPACITY) > 1000 {
		leftIndex := 0
		rightIndex := 0
		liters := 0
		weight := 100
		for i := 0; i < len(main.left); i++ {
			for j := 0; j < len(main.right); j++ {
				if main.DifferenceAfter(i, j, CAPACITY) < main.Difference(CAPACITY) {
					newLiters := main.Difference(CAPACITY) - main.DifferenceAfter(i, j, CAPACITY)
					newWeight := main.Difference(WEIGHT) - main.DifferenceAfter(i, j, WEIGHT)
					if newLiters > liters && newWeight <= weight || newLiters == liters && newWeight < weight {
						leftIndex = i
						rightIndex = j
						liters = newLiters
						weight = newWeight
					}
				}
			}
		}
		main.left[leftIndex], main.right[rightIndex] = main.right[rightIndex], main.left[leftIndex]
	}
}
Functions:
main.Difference(const) calculates the absolute difference between the two sides; the constant passed as an argument decides which value the difference is calculated for
main.DifferenceAfter(i, j, const) simulates a swap between the two buckets, i being the left one and j the right one, and then calculates the resulting absolute difference; the constant again determines the value to check
Explanation:
Basically this starts by optimizing the weight, which is what the first for-loop does. On every iteration, it tries every possible combination of buckets that can be switched and, if the difference after that is less than the current difference (resulting in a better distribution), it switches them. If the weight doesn't change any more, it breaks out of the for-loop. While not perfect, this works quite well, and I consider it acceptable for what I'm trying to accomplish.
Then it's supposed to optimize the distribution based on the volume, so that the total difference is less than 1000. Here I tried to be more careful and search for the best combination in a pass before switching it. It therefore searches for the bucket switch resulting in the biggest capacity change, and is also supposed to look for a tradeoff between the two, though I see the flaw that the first bucket combination tried sets the liters and weight variables, which greatly restricts the combinations considered afterwards.
Conclusion
I think I need to include some more math here, but I'm honestly stuck and don't know how to continue, so I'd like to get some help from you; basically anything that can help me here is welcome.
As previously said, your problem is actually a constrained optimisation problem with a constraint on your difference of volumes.
Mathematically, this would be: minimise the difference of weights under the constraint that the difference of volumes is less than 1000. The simplest way to express it as a linear optimisation problem would be:
min weights . x
subject to volumes . x < 1000.0
for all i, x[i] = +1 or -1
Where a . b is the vector dot product. Once this problem is solved, all indices where x = +1 correspond to your first array, all indices where x = -1 correspond to your second array.
Unfortunately, 0-1 integer programming is known to be NP-hard. The simplest way of solving it is to perform an exhaustive brute-force exploration of the space, but that requires testing all 2^n possible vectors x (where n is the length of your original weights and volumes vectors), which can quickly get out of hand. There is a lot of literature on this topic, with more efficient algorithms, but they are often highly specific to a particular set of problems and/or constraints. You can google "linear integer programming" to see what has been done on this topic.
I think the simplest might be to perform a heuristic-based brute-force search, where you prune your search tree early when it would take you out of your volume constraint, and stay close to your constraint (as a general rule, the solutions of linear optimisation problems lie on the edge of the feasible space).
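To make the pruning idea concrete, here is a rough sketch (in C++ rather than the question's Go, purely as an illustration; the names are invented and the hard-coded 1000 limit is mirrored from the question):
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Bucket { double weight, volume; };

// Exhaustive +1/-1 assignment with pruning: abandon a branch as soon as the
// remaining items can no longer bring the volume difference under the limit.
// Exponential in the worst case, so only practical for small inputs.
void search(const std::vector<Bucket>& items, std::size_t i,
            double wDiff, double vDiff, double volumeLeft,
            std::vector<int>& signs,
            double& bestWeight, std::vector<int>& best) {
    if (std::abs(vDiff) - volumeLeft >= 1000.0) return; // constraint unreachable
    if (i == items.size()) {
        if (std::abs(wDiff) < bestWeight) {
            bestWeight = std::abs(wDiff);
            best = signs;                                // +1 -> left, -1 -> right
        }
        return;
    }
    for (int s : {+1, -1}) {
        signs[i] = s;
        search(items, i + 1,
               wDiff + s * items[i].weight,
               vDiff + s * items[i].volume,
               volumeLeft - items[i].volume,
               signs, bestWeight, best);
    }
}
The caller would start with wDiff = vDiff = 0, volumeLeft equal to the sum of all volumes, bestWeight set to a huge value, and signs/best sized to items.size(); whatever ends up in best is the assignment.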
Here are a couple of articles you might want to read on this kind of optimisations:
UCLA Linear integer programming
MIT course on Integer programming
Carleton course on Binary programming
Articles on combinatorial optimisation & linear integer programming
If you are not familiar with optimisation articles or math in general, the Wikipedia articles provide a good introduction, but most articles on this topic quickly show some (pseudo)code you can adapt right away.
If your n is large, I think at some point you will have to make a trade off between how optimal your solution is and how fast it can be computed. Your solution is probably suboptimal, but it is much faster than the exhaustive search. There might be a better trade off, depending on the exact configuration of your problem.
It seems that in your case the difference of weight is the objective, while the difference of volume is just a constraint, which means that you are seeking solutions that optimize the difference-of-weight attribute (as small as possible) and satisfy the condition on the difference-of-volume attribute (total < 1000). In this case, it's a single-objective constrained optimization problem.
Whereas, if you are interested in multi-objective optimization, you may want to look at the concept of the Pareto frontier: http://en.wikipedia.org/wiki/Pareto_efficiency . It's good for keeping multiple good solutions with advantages in different objectives, i.e., not losing diversity.

How to 'checksum' an array of noisy floating point numbers?

What is a quick and easy way to 'checksum' an array of floating point numbers, while allowing for a specified small amount of inaccuracy?
e.g. I have two algorithms which should (in theory, with infinite precision) output the same array. But they work differently, and so floating point errors will accumulate differently, though the array lengths should be exactly the same. I'd like a quick and easy way to test if the arrays seem to be the same. I could of course compare the numbers pairwise, and report the maximum error; but one algorithm is in C++ and the other is in Mathematica and I don't want the bother of writing out the numbers to a file or pasting them from one system to another. That's why I want a simple checksum.
I could simply add up all the numbers in the array. If the array length is N, and I can tolerate an error of 0.0001 in each number, then I would check if abs(sum1-sum2)<0.0001*N. But this simplistic 'checksum' is not robust, e.g. to an error of +10 in one entry and -10 in another. (And anyway, probability theory says that the error probably grows like sqrt(N), not like N.) Of course, any checksum is a low-dimensional summary of a chunk of data so it will miss some errors, if not most... but simple checksums are nonetheless useful for finding non-malicious bug-type errors.
Or I could create a two-dimensional checksum, [sum(x[n]), sum(abs(x[n]))]. But is that the best I can do, i.e. is there a different function I might use that would be "more orthogonal" to sum(x[n])? And if I used some arbitrary functions, e.g. [sum(f1(x[n])), sum(f2(x[n]))], how should my 'raw error tolerance' translate into 'checksum error tolerance'?
I'm programming in C++, but I'm happy to see answers in any language.
I have a feeling that what you want may be possible via something like Gray codes. If you could translate your values into Gray codes and use some kind of checksum that was able to correct n bits, you could detect whether or not the two arrays were the same except for n-1 bits of error, right? (Each bit of error means a number is "off by one", where the mapping would be such that this was a variation in the least significant digit.)
But the exact details are beyond me - particularly for floating point values.
I don't know if it helps, but what Gray codes solve is the problem of pathological rounding. Rounding sounds like it will solve the problem - a naive solution might round and then checksum. But simple rounding always has pathological cases - for example, if we use floor, then 0.9999999 and 1 are distinct. A Gray code approach seems to address that, since neighbouring values are always a single bit away, so a bit-based checksum will accurately reflect "distance".
[update:] More exactly, what you want is a checksum that gives an estimate of the Hamming distance between your Gray-encoded sequences (and the Gray-encoded part is easy if you just care about 0.0001, since you can multiply everything by 10000 and use integers).
And it seems like such checksums do exist: "Any error-correcting code can be used for error detection. A code with minimum Hamming distance, d, can detect up to d − 1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired."
So, just in case it's not clear:
multiply by the minimum error to get integers
convert to the Gray code equivalent (see the sketch after this list)
use an error-detecting code with a minimum Hamming distance larger than the error you can tolerate.
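For the Gray code step specifically, the binary-reflected conversion is only a couple of lines (a generic sketch, not part of the original answer):
#include <cstdint>

// Binary-reflected Gray code: adjacent integers differ in exactly one bit,
// so an off-by-one rounding of the quantized value flips only a single bit.
std::uint32_t to_gray(std::uint32_t x)   { return x ^ (x >> 1); }

std::uint32_t from_gray(std::uint32_t g) {
    for (std::uint32_t shift = 1; shift < 32; shift <<= 1)
        g ^= g >> shift;
    return g;
}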
But I am still not sure that's right. You still get the pathological rounding in the conversion from float to integer, so it seems like you need a minimum Hamming distance of 1 + len(data) (worst case, with a rounding error on each value). Is that feasible? Probably not for large arrays.
Maybe ask again with better tags/description now that a general direction is possible? Or just add tags now? We need someone who does this for a living. [I added a couple of tags]
I've spent a while looking for a deterministic answer, and been unable to find one. If there is a good answer, it's likely to require heavy-duty mathematical skills (functional analysis).
I'm pretty sure there is no solution based on "discretize in some cunning way, then apply a discrete checksum", e.g. "discretize into strings of 0/1/?, where ? means wildcard". Any discretization will have the property that two floating-point numbers very close to each other can end up with different discrete codes, and then the discrete checksum won't tell us what we want to know.
However, a very simple randomized scheme should work fine. Generate a pseudorandom string S from the alphabet {+1,-1}, and compute csx=sum(X_i*S_i) and csy=sum(Y_i*S_i), where X and Y are my original arrays of floating point numbers. If we model the errors as independent Normal random variables with mean 0, then it's easy to compute the distribution of csx-csy. We could do this for several strings S, and then do a hypothesis test that the mean error is 0. The number of strings S needed for the test is fixed, it doesn't grow linearly in the size of the arrays, so it satisfies my need for a "low-dimensional summary". This method also gives an estimate of the standard deviation of the error, which may be handy.
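As a rough sketch of that scheme (the generator and seeding choices here are arbitrary, and for a C++/Mathematica comparison both sides would of course need to reproduce the same ±1 sequence):
#include <cstdint>
#include <random>
#include <vector>

// Randomized "checksum": project the array onto a pseudorandom +1/-1 vector.
// Both sides must use the same seed so they generate the same sign sequence.
double signed_sum(const std::vector<double>& x, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    double sum = 0.0;
    for (double v : x) {
        double sign = (rng() & 1) ? 1.0 : -1.0;
        sum += sign * v;
    }
    return sum;
}
// Repeat with a handful of seeds; if |signed_sum(X, s) - signed_sum(Y, s)| is
// consistent with the assumed per-element error model, accept the arrays as equal.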
Try this:
#include <complex>
#include <cmath>
#include <iostream>

// PARAMETERS
const size_t no_freqs = 3;
const double freqs[no_freqs] = {0.05, 0.16, 0.39}; // (for example)

int main() {
    std::complex<double> spectral_amplitude[no_freqs];
    for (size_t i = 0; i < no_freqs; ++i) spectral_amplitude[i] = 0.0;

    size_t n_data = 0;
    {
        std::complex<double> datum;
        while (std::cin >> datum) {
            for (size_t i = 0; i < no_freqs; ++i) {
                spectral_amplitude[i] += datum * std::exp(
                    std::complex<double>(0.0, 1.0) * freqs[i] * double(n_data)
                );
            }
            ++n_data;
        }
    }

    std::cout << "Fuzzy checksum:\n";
    for (size_t i = 0; i < no_freqs; ++i) {
        std::cout << real(spectral_amplitude[i]) << "\n";
        std::cout << imag(spectral_amplitude[i]) << "\n";
    }
    std::cout << "\n";
    return 0;
}
It returns just a few, arbitrary points of a Fourier transform of the entire data set. These make a fuzzy checksum, so to speak.
How about computing a standard integer checksum on the data obtained by zeroing the least significant digits, the ones you don't care about?
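A sketch of that approach (the FNV-1a hash is used here merely as a placeholder integer checksum; values that straddle a quantization boundary can still flip buckets, so this remains a heuristic):
#include <cstdint>
#include <vector>

// Quantize each value to the tolerance you care about, then run an ordinary
// integer checksum over the quantized values.
std::uint64_t quantized_checksum(const std::vector<double>& x, double tolerance) {
    std::uint64_t h = 0xcbf29ce484222325ULL;            // FNV-1a offset basis
    for (double v : x) {
        // Truncation toward zero effectively zeroes the digits below tolerance.
        std::uint64_t q = static_cast<std::uint64_t>(
            static_cast<std::int64_t>(v / tolerance));
        for (int i = 0; i < 8; ++i) {                   // hash the 8 bytes of q
            h ^= static_cast<std::uint8_t>(q >> (8 * i));
            h *= 0x100000001b3ULL;                      // FNV-1a prime
        }
    }
    return h;
}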

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
I use this technique in several web applications, where I'd usually store the value in a field of type MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// Sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array.
// Result is 0 on success, non-zero on failure (offset out of bounds).
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid (offset is unsigned, so no need to test for < 0)
    if (offset > (num_bytes << 3) - 1) { return -1; }

    // set the right bit
    bytes[offset >> 3] |= (1 << (offset & 0x7));
    return 0; // success
}

// Gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array.
// Returns (-1) on error, 0 if the bit is "off", a positive number if "on".
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid
    if (offset > (num_bytes << 3) - 1) { return -1; }

    // get the right bit
    return bytes[offset >> 3] & (1 << (offset & 0x7));
}
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. Think of it like an "array of booleans"
(journalling masks in flash memory, if you want to know).
Many compilers know how to do this for you. Add a bit of OO code to have types that operate sensibly, and then your code starts looking like its intent, not some bit-banging.
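For instance, in C++ a std::bitset already behaves like that typed "array of booleans" (the 256-flag size here is arbitrary):
#include <bitset>

// A 256-flag mask; the compiler stores it as a handful of machine words, but
// the code reads in terms of intent, not bit twiddling.
std::bitset<256> flags;

void example() {
    flags.set(200);              // turn flag 200 on
    flags.reset(3);              // turn flag 3 off
    bool on = flags.test(200);   // query a flag
    (void)on;
}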
My 2 cents.
With a 64-bit integer you can store values up to 2^64 - 1, and 64 is only 2^6. So yes, there is a limit, but if you need more than 64 bits' worth of flags, I'd be very interested to know what they were all doing :)
How many states do you potentially need to think about? If you have 64 potential flags, the number of combinations they can exist in is the full range of a 64-bit integer.
If you need to worry about 128 flags, then a pair of 64-bit vectors would suffice (2 x 64 = 128 bits).
Addition: in Programming Pearls there is an extended discussion of using a bit array of length 10^7, implemented in integers (for recording which toll-free 800 numbers are in use) - it's very fast, and very appropriate for the task described in that chapter.
Some languages (I believe Perl does, but I'm not sure) permit bitwise arithmetic on strings, giving you a much greater effective range ((string length * 8-bit chars) combinations).
However, I wouldn't use a single value for the superimposition of more than one type of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space-efficiency reasons, but for practical development reasons.
(PHP uses this system to control its error messages, and I have already found that it's a bit over the top when you have to define values where PHP's constants are not available and you have to generate the integer by hand; and to be honest, if chmod didn't support the 'ugo+rwx' style syntax, I'd never want to use it because I can never remember the magic numbers.)
The instant you have to crack open a constants table to debug code, you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays that we have packed into 32 bigint fields (SQL Server not supporting UInt32). Bitwise operations work fine - until your table starts to grow and you notice the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example, .NET uses an array of integers as the internal storage for its BitArray class.
Practically, there is no other way around it.
That being said, in SQL you will need more than one column (or use BLOBs) to store all the states.
You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.