Octave minimization for a many-body Hamiltonian with non-linear constraint

I work in theoretical physics, and I have come upon a problem that requires minimizing a particular Hamiltonian operator for a system of 8 particles, subject to one non-linear constraint. Due to the complexity of the system, I cannot define the entire Hamiltonian "in one go", nor the constraint. By this I mean that the quantity I am minimizing is defined recursively: it depends on summations over quantities calculated for systems of 7 particles, which in turn depend on quantities calculated for systems of 6, and so on, until it reaches one- or two-particle systems, for which those quantities are given as initial values that depend on the elements of a column vector (the argument/minimization parameters). The constraint has the same recursive form: it requires the "overlap" of the 8-particle state with itself to be exactly 1 (i.e. the state must be normalized). I have been thinking of a way to use fmincon for this, but I've come up short, since my function depends on the parameters only implicitly and I can't write the whole thing out explicitly. For a better understanding, here is some of the code:
for m = 3:npairs+1
  for n = 3:npairs+1
    for i = 1:nsps
      for j = 1:nsps
        overlap(m,n) = overlap(m,n)+x(i)*x(j)*(delta(i,j)*(overlap(m-1,n-1)-N(m-1,n-1,i))+p0p(m-1,n-1,j,i));
        p(m,n,i) = (n-1)*x(i)*overlap(m,n-1)-(n-2)*(n-1)*x(i)*x(i)*((m-1)*x(i)*overlap(m-1,n-1)-(m-2)*(m-1)*x(i)*x(i)*p(m-1,n-1,i));
        N(m,n,i) = 2*(n-1)*x(i)*p(n-1,m,i);
        p0p(m,n,i,j) = (m-1)*(n-1)*x(i)*x(j)*overlap(m-1,n-1)-(m-1)*(n-1)*(m-2)*x(i)*x(i)*x(j)*p(m-2,n-1,i)-(m-1)*(n-1)*(n-2)*x(i)*x(j)*x(j)*p0(m-1,n-2,j)-(m-1)*(n-1)*(m-2)*(n-2)*x(i)*x(i)*x(j)*x(j)*(delta(i,j)*(overlap(m-2,n-2)-N(m-2,n-2,i))+p0p(m-2,n-2,j,i));
      endfor
    endfor
  endfor
endfor
function E = H(x)
  E = summation over all i and j of N and p0p for m=n=8   % not actual code
endfunction

overlap(9,9) = 1   % constraint: the 8-particle state must be normalized

It's hard to give a specific answer, but I would advise the following to get you started.
First, note that the inner two levels of the nested loop can be vectorised, since i and j only ever appear as indices (whereas m and n back-reference earlier entries, so those loops cannot be vectorised). Your 4-level loop can therefore be reduced to a 2-level loop whose body is a handful of operations on i-by-j vectors and matrices; a sketch follows below.
Second, note that the whole construct can be expressed as a recursive function. If you have suitable base cases for the smallest m and n, you can iteratively obtain the i,j matrices for all cases up to m = 9, n = 9. In particular, you can 'memoize' the early steps and plug them into the higher ones, rather than relying on actual recursion.
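For illustration, here is a minimal sketch of what the vectorised inner step could look like for the overlap update alone (the other three recurrences would be rewritten in the same spirit). It assumes x is a column vector of length nsps and that overlap, N and p0p are allocated as in your code; it is a pattern to adapt, not a drop-in replacement:
% inside the m, n loops only -- the explicit i and j loops are gone
xi = x(:);                                    % parameter vector, nsps-by-1
Nv = squeeze(N(m-1,n-1,:));                   % N(m-1,n-1,i) as a column vector over i
P  = squeeze(p0p(m-1,n-1,:,:));               % p0p(m-1,n-1,:,:) as an nsps-by-nsps matrix
diag_term  = sum(xi.^2 .* (overlap(m-1,n-1) - Nv));   % delta(i,j) collapses the double sum
cross_term = xi.' * P.' * xi;                 % sum over i,j of x(i)*x(j)*p0p(m-1,n-1,j,i)
overlap(m,n) = overlap(m,n) + diag_term + cross_term;
The p, N and p0p recurrences vectorise the same way, with i-indexed quantities becoming vectors and (i,j)-indexed ones becoming matrices.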

Assuming you need to sum with the first two indices fixed to 8 (if I understood correctly), you can do this easily with anonymous functions:
https://octave.org/doc/v6.1.0/Anonymous-Functions.html#Anonymous-Functions
# create some sample data
A = ones(8,8,4,4);
B = 2*ones(8,8,4,4);
# define 2 versions of the sums
f = @(A,B) [sum(sum(A(8,8,:,:))), sum(sum(B(8,8,:,:)))];
g = @(A,B) sum(sum(A(8,8,:,:))) + sum(sum(B(8,8,:,:)));
E1 = f(A,B)
E2 = g(A,B)
the output will be:
octave:21> E1=f(A,B)
E1 =
16 32
octave:22> E2=g(A,B)
E2 = 48
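Putting the pieces together: once the recurrences and these final sums are wrapped in ordinary functions of the parameter vector x, the constrained minimisation can be handed to a solver directly. Below is a minimal, hypothetical sketch using Octave's built-in sqp (fmincon itself is MATLAB / Optimization Toolbox); hamiltonian() and final_overlap() are placeholder names for functions of your own that run the recursion above and return E and overlap(9,9) for a given x:
% hypothetical wrappers -- each one runs the full recurrence internally
phi = @(x) hamiltonian(x);                % objective: E = H(x)
g   = @(x) final_overlap(x) - 1;          % equality constraint: overlap(9,9) == 1
x0  = rand(nsps, 1);                      % starting guess; nsps = length of the parameter vector
[xopt, Emin, info] = sqp(x0, phi, g, []); % minimise E subject to g(x) == 0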

Re. optimization: Does x += y at the center of a loop always cause a read after write data dependency and so prevent vectorization?

See https://cvw.cac.cornell.edu/vector/coding_dependencies
Read after write ("flow" or "RAW") dependency
This kind of dependency is not vectorizable. It occurs when the values
of variables involved in a particular loop iteration (the "read") are
determined in a previous loop iteration (the "write"). In other words,
a variable is read (used as an operand to a mathematical operation)
after its value has been modified by a previous loop iteration.
This question is very general in that it is basically asking whether using the += operator in the center of a loop precludes vectorization by causing a read after write ("flow" or "RAW") data dependency.
E.g.:
for (i ...) {
    for (j ...) {
        x(i,j) += y(i,j)
    }
}
See
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Example 14: Double reduction:
In your example there is no issue with data dependencies, assuming the arrays do not alias each other: the loop can be easily and safely vectorized, since each x(i,j) value depends only on y(i,j), and y(i,j) is only read in the loop. If x and y overlap, then it is much harder to vectorize the loop, because there is an implicit dependency between the computations of the x(i,j) values (y(i,j) can alias x(i-1,j), which is computed just before).
In general, automatic vectorization is very hard, if not impossible, when there is a dependency chain (since the sequential order is essentially the only valid one). The linked article mentions such a case (where each operation requires the previous one to be computed, due to a data dependency). This has nothing to do with the += operator specifically. It is all about data dependencies (or any loop-carried dependency).
Notes
Regarding floating-point (FP) numbers, operations are not associative by default, so there is no way to execute something like x(0) + x(1) + x(2) + ... in a vectorized way (if x is an array of FP numbers): the only correct order under the IEEE-754 standard is ((x(0) + x(1)) + x(2)) + .... Compiler options can be used to treat FP operations as associative so that this kind of computation can be vectorized, but that departs from IEEE-754 and breaks code that relies on the ordering being preserved (e.g. Kahan summation). Also note that while the above expression can be computed in a data-parallel way (see parallel scan), it is generally not very efficient on most mainstream CPUs yet; there is hardware that does benefit from parallel scans, though, notably mainstream discrete GPUs.
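A quick Octave illustration of that non-associativity in standard double precision (the particular values are just chosen to make the rounding visible):
a = 1e16; b = -1e16; c = 1;
(a + b) + c    % = 1
a + (b + c)    % = 0, because b + c rounds back to -1e16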

Optimizing Portfolio With Bounds on Weights and Costs

I wish to create efficient frontiers for portfolios with bounds on both weights and costs. The following code provides the frontiers for portfolios in which the underlying assets are bounded with minimum and maximum weights. How do I add to this a secondary constraint in which the combined annual charges of the underlying assets do not exceed a maximum? Assume each asset has an annual cost which is applied as a percentage. As such the combined weights*charges should not exceed x%.
lb=Bounds(:,1);
ub=Bounds(:,2);
P = Portfolio('AssetList', AssetList,'LowerBound', lb, 'UpperBound', ub, 'Budget', 1);
P = P.estimateAssetMoments(AssetReturns);
[Passetmean, Passetcovar] = P.getAssetMoments;
Correlations=corrcoef(AssetReturns);
% Estimate Frontier
pwgt = P.estimateFrontier(20);
[prsk, pret] = P.estimatePortMoments(pwgt);
Mary,
once another set of constraints is entered into the model, kindly notice that the modified efficient-frontier problem is no longer guaranteed to be a convex-optimisation problem.
Thus one may forget about the comfort of all the popular fmincg(), L-BFGS et al. solvers.
This will not be a one-line SLOC that gets the answer(s) out of the box.
Non-linear problems (the wilder, the more so) will require you to assemble another optimisation function, be it either
a brute-force based scanner,
which sweeps a fully orthogonal mesh of candidate weights, with a "utility function" defined so that, as the given requirement states, it also accounts for the add-on cost of holding each Portfolio item (a toy sketch follows below),
or
a genetic-algorithm based approach,
in the belief that the brute-force scan might become so time-extensive as to cease to be feasible, while a GA evolution may still yield acceptable sub-optimal (locally optimal) outputs.
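As a toy illustration of the brute-force variant in plain Octave (mu, S, charges and costMax below are made-up stand-ins for your estimated moments, the per-asset annual charges and the cost cap; here the cost cap is enforced as a feasibility filter rather than folded into the utility):
mu      = [0.06; 0.08; 0.10];            % hypothetical expected returns
S       = 0.01 * eye(3);                 % hypothetical covariance matrix
charges = [0.002; 0.005; 0.010];         % hypothetical annual charges per asset
costMax = 0.006;                         % cap on combined weights' * charges
lb = zeros(3,1);  ub = ones(3,1);
best = struct('util', -Inf, 'w', []);
for w1 = lb(1):0.05:ub(1)
  for w2 = lb(2):0.05:ub(2)
    w = [w1; w2; 1 - w1 - w2];                        % budget: weights sum to 1
    if any(w < lb) || any(w > ub) || dot(w, charges) > costMax
      continue;                                       % outside bounds or over the cost cap
    endif
    util = dot(w, mu) - 3 * (w' * S * w);             % hypothetical mean-variance utility
    if util > best.util
      best.util = util;  best.w = w;
    endif
  endfor
endfor
best.w    % best weights found on this coarse mesh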

Can I run a GA to optimize wavelet transform?

I am running a wavelet transform (cmor) to estimate the damping and the frequencies present in a signal. cmor has 2 parameters that I can change to get more accurate results: the center frequency (Fc) and the bandwidth frequency (Fb). If I construct a signal with a few known frequencies and dampings, then I can measure the error of my estimation (fig 2). But in the actual case I have a signal and I don't know its frequencies and dampings, so I can't measure the error. So a friend here suggested that I reconstruct the signal and find the error by measuring the difference between the original and reconstructed signal, e(t) = |x(t) − x^(t)|.
So my questions are:
Does anyone know a better function for the error between the reconstructed and original signal than e(t) = |x(t) − x^(t)|?
Can I use a GA to search for Fb and Fc, or do you know a better search method?
Hope this picture shows what I mean; the actual case is the last one, the others are for explanation.
Thanks in advance
You say you don't know the error until after running the wavelet transform, but that's fine. You just run a wavelet transform for every individual the GA produces. Those individuals with lower errors are considered fitter and survive with greater probability. This may be very slow, but conceptually at least, that's the idea.
Let's define a Chromosome datatype containing an encoded pair of values, one for the frequency and another for the damping parameter. Don't worry too much about how they're encoded for now; just assume it's an array of two doubles if you like. All that's important is that you have a way to get the values out of the chromosome. For now, I'll just refer to them by name, but you could represent them in binary, as an array of doubles, etc. The other member of the Chromosome type is a double storing its fitness.
We can obviously generate random frequency and damping values, so let's create say 100 random Chromosomes. We don't know how to set their fitness yet, but that's fine. Just set it to zero at first. To set the real fitness value, we're going to have to run the wavelet transform once for each of our 100 parameter settings.
for Chromosome chr in population
chr.fitness = run_wavelet_transform(chr.frequency, chr.damping)
end
Now we have 100 possible wavelet transforms, each with a computed error, stored in our set called population. What's left is to select fitter members of the population, breed them, and allow the fitter members of the population and offspring to survive into the next generation.
while not done
    offspring = new_population()
    while count(offspring) < N
        parent1, parent2 = select_parents(population)
        child1, child2 = do_crossover(parent1, parent2)
        mutate(child1)
        mutate(child2)
        child1.fitness = run_wavelet_transform(child1.frequency, child1.damping)
        child2.fitness = run_wavelet_transform(child2.frequency, child2.damping)
        offspring.add(child1)
        offspring.add(child2)
    end while
    population = merge(population, offspring)
end while
There are a bunch of different ways to do the individual steps like select_parents, do_crossover, mutate, and merge here, but the basic structure of the GA stays pretty much the same. You just have to run a brand new wavelet decomposition for every new offspring.
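In Octave, the fitness-evaluation part of that loop might look something like the sketch below. run_wavelet_transform() is a placeholder for your own cmor-based reconstruction, x is the measured signal, and the parameter ranges are arbitrary; only the structure matters:
npop = 100;
pop  = [10 * rand(npop, 1), 5 * rand(npop, 1)];         % random candidate [Fb, Fc] pairs
fitness = zeros(npop, 1);
for k = 1:npop
  xhat = run_wavelet_transform(x, pop(k,1), pop(k,2));  % reconstruct with these parameters
  fitness(k) = -sqrt(mean((x - xhat).^2));              % higher fitness = lower RMS error
endfor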

Efficient random permutation of n-set-bits

For the problem of producing a bit-pattern with exactly n set bits, I know of two practical methods, but they both have limitations I'm not happy with.
First, you can enumerate all of the possible word values which have that many bits set in a pre-computed table, and then generate a random index into that table to pick out a possible result. This has the problem that as the output size grows the list of candidate outputs eventually becomes impractically large.
Alternatively, you can pick n non-overlapping bit positions at random (for example, by using a partial Fisher-Yates shuffle) and set those bits only. This approach, however, computes a random state in a much larger space than the number of possible results. For example, it may choose the first and second bits out of three, or it might, separately, choose the second and first bits.
This second approach must consume more bits from the random number source than are strictly required. Since it is choosing n bits in a specific order when their order is unimportant, this means that it is making an arbitrary distinction between n! different ways of producing the same result, and consuming at least floor(log_2(n!)) more bits than are necessary.
Can this be avoided?
There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
Clarification
The first approach requires picking a single random number between zero and w!/((w-n)!*n!) (where w is the output size), as this is the number of possible solutions.
The second approach requires picking n random values between zero and w-1, zero and w-2, etc., and these have a product of w!/(w-n)!, which is n! times larger than the first approach.
This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits.
Seems like you want a variant of Floyd's algorithm:
Algorithm to select a single, random combination of values?
Should be especially useful in your case, because the containment test is a simple bitmask operation. This will require only k calls to the RNG. In the code below, I assume you have randint(limit) which produces a uniform random from 0 to limit-1, and that you want k bits set in a 32-bit int:
mask = 0;
for (j = 32 - k; j < 32; ++j) {
    r = randint(j+1);
    b = 1 << r;
    if (mask & b) mask |= (1 << j);
    else mask |= b;
}
How many bits of entropy you need here depends on how randint() is implemented. If k > 16, set it to 32 - k and negate the result.
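If it helps, the same Floyd-style sampler transcribes almost line-for-line into Octave (sketch, assuming 1 <= k <= 32):
mask = uint32(0);
for j = (32 - k):31
  r = randi(j + 1) - 1;                         % uniform integer in 0 .. j
  b = bitshift(uint32(1), r);
  if bitand(mask, b)                            % bit r already taken: take bit j instead
    mask = bitor(mask, bitshift(uint32(1), j));
  else
    mask = bitor(mask, b);
  endif
endfor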
Your alternative suggestion of generating a single random number representing one combination among the set (mathematicians would call this a rank of the combination) is simpler if you use colex order rather than lexicographic rank. This code, for example:
for (i = k; i >= 1; --i) {
    while ((b = binomial(n, i)) > r) --n;
    buf[i-1] = n;
    r -= b;
}
will fill the array buf[] with indices from 0 to n-1 for the k-combination at colex rank r. In your case, you'd replace buf[i-1] = n with mask |= (1 << n). The binomial() function is the binomial coefficient, which I compute with a lookup table (see this). That would make the most efficient use of entropy, but I still think Floyd's algorithm is the better compromise.
[Expanding my comment:] If you only have a little raw entropy available, then use a PRNG to stretch it further. You only need enough raw entropy to seed a PRNG. Use the PRNG to do the actual shuffle, not the raw entropy. For the next shuffle reseed the PRNG with some more raw entropy. That spreads out the raw entropy and makes less of a demand on your entropy source.
If you know exactly the range of numbers you need out of the PRNG, then you can, carefully, set up your own LCG PRNG to cover the appropriate range while needing the minimum entropy to seed it.
ETA: In C++ there is a next_permutation() function. Try using that. See std::next_permutation Implementation Explanation for more.
Is this a theory problem or a practical problem?
You could still do the partial shuffle, but keep track of the order of the ones and forget the zeroes. There are log(k!) bits of unused entropy in their final order for your future consumption.
You could also just use the recurrence (n choose k) = (n-1 choose k-1) + (n-1 choose k) directly. Generate a random number between 0 and (n choose k)-1. Call it r. Iterate over all of the bits from the nth to the first. If we have to set j of the i remaining bits, set the ith if r < (i-1 choose j-1) and clear it, subtracting (i-1 choose j-1), otherwise.
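A small Octave sketch of that recurrence-based draw (the use of nchoosek and the variable names are only illustrative; for large n you would precompute a binomial table and draw r as an exact integer):
n = 32; k = 16;
r = floor(rand() * nchoosek(n, k));   % ideally an exact uniform integer in [0, C(n,k)-1]
mask = zeros(1, n);                   % mask(i) will hold the i-th bit
j = k;                                % bits still to set
for i = n:-1:1
  if j == 0
    break;                            % no bits left to set
  elseif r < nchoosek(i - 1, j - 1)
    mask(i) = 1;                      % set this bit
    j = j - 1;
  else
    r = r - nchoosek(i - 1, j - 1);   % skip the combinations that set this bit
  endif
endfor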
Practically, I wouldn't worry about the couple of words of wasted entropy from the partial shuffle; generating a random 32-bit word with 16 bits set costs somewhere between 64 and 80 bits of entropy, and that's entirely acceptable. The growth rate of the required entropy is asymptotically worse than the theoretical bound, so I'd do something different for really big words.
For really big words, you might generate n independent bits that are 1 with probability k/n. This immediately blows your entropy budget (and then some), but it only uses linearly many bits. The number of set bits is tightly concentrated around k, though. For a further expected linear entropy cost, I can fix it up. This approach has much better memory locality than the partial shuffle approach, so I'd probably prefer it in practice.
I would use solution number 3: generate the i-th permutation directly.
But do you really need to generate the first i-1 ones?
You can do it a bit faster than that with the kind of divide-and-conquer method proposed here: Returning i-th combination of a bit array - and maybe you can improve that solution a bit.
Background
From the formula you have given - w! / ((w-n)! * n!) - it looks like your problem has to do with the binomial coefficient, which counts the number of unique combinations, as opposed to permutations, which count the same elements arranged in different positions.
You said:
"There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
...
This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits."
So, there is a way to efficiently compute the nth unique combination, or rank, from the k-indexes. The k-indexes refer to a unique combination. For example, let's say that the n choose k case of 4 choose 3 is taken. This means that there are a total of 4 numbers that can be selected (0, 1, 2, 3), which is represented by n, and they are taken in groups of 3, which is represented by k. The total number of unique combinations can be calculated as n! / (k! * (n-k)!). The rank of zero corresponds to the k-index of (2, 1, 0). Rank one is represented by the k-index group of (3, 1, 0), and so forth.
Solution
There is a formula that can be used to very efficiently translate between a k-index group and the corresponding rank without iteration. Likewise, there is a formula for translating between the rank and corresponding k-index group.
I have written a paper on this formula and how it can be seen from Pascal's Triangle. The paper is called Tablizing The Binomial Coefficient.
I have written a C# class which is in the public domain that implements the formula described in the paper. It uses very little memory and can be downloaded from the site. It performs the following tasks:
Outputs all the k-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the k-index to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the entire set.
Converts the index in a sorted binomial coefficient table to the corresponding k-index. The technique used is also much faster than older iterative solutions.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers. This version returns a long value. There is at least one other method that returns an int. Make sure that you use the method that returns a long value.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with at least 2 cases and there are no known bugs.
The following tested example code demonstrates how to use the class and will iterate through each unique combination:
public void Test10Choose5()
{
String S;
int Loop;
int N = 10; // Total number of elements in the set.
int K = 5; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The Kindexes array specifies the indexes for a lexigraphic element.
int[] KIndexes = new int[K];
StringBuilder SB = new StringBuilder();
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination.
BC.GetKIndexes(Combo, KIndexes);
// Verify that the Kindexes returned can be used to retrive the
// rank or lexigraphic order of the KIndexes in the table.
int Val = BC.GetIndex(true, KIndexes);
if (Val != Combo)
{
S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
Console.WriteLine(S);
}
SB.Remove(0, SB.Length);
for (Loop = 0; Loop < K; Loop++)
{
SB.Append(KIndexes[Loop].ToString());
if (Loop < K - 1)
SB.Append(" ");
}
S = "KIndexes = " + SB.ToString();
Console.WriteLine(S);
}
}
So, the way to apply the class to your problem is by considering each bit in the word size as the total number of items. This would be n in the n! / (k! * (n-k)!) formula. To obtain k, or the group size, simply count the number of bits set to 1. You would have to create a list or array of the class objects for each possible k, which in this case would be 32. Note that the class does not handle N choose N, N choose 0, or N choose 1, so the code would have to check for those cases and return 1 for both the 32 choose 0 case and the 32 choose 32 case. For 32 choose 1, it would need to return 32.
If you need to use values not much larger than 32 choose 16 (the worst case for 32 items - it yields 601,080,390 unique combinations), then you can use 32-bit integers, which is how the class is currently implemented. If you need to use 64-bit integers, then you will have to convert the class to use 64-bit longs. The largest value that a signed long can hold is 9,223,372,036,854,775,807, which is 2^63 - 1. The worst case for n choose k when n is 64 is 64 choose 32, which is 1,832,624,140,942,590,534 - so a long value will work for all 64 choose k cases. If you need numbers bigger than that, then you will probably want to look into using some sort of big-integer class. In C#, the .NET framework has a BigInteger class. If you are working in a different language, it should not be hard to port.
If you are looking for a very good PRNG, one of the fastest and most lightweight, with high-quality output, is the Tiny Mersenne Twister, or TinyMT for short. I ported the code over to C++ and C#; it can be found here, along with a link to the original author's C code.
Rather than using a shuffling algorithm like Fisher-Yates, you might consider doing something like the following example instead:
// Get 7 random cards.
ulong Card;
ulong SevenCardHand = 0;
for (int CardLoop = 0; CardLoop < 7; CardLoop++)
{
    do
    {
        // The card has a value of between 0 and 51. So, get a random value and
        // left shift it into the proper bit position.
        Card = (1UL << RandObj.Next(CardsInDeck));
    } while ((SevenCardHand & Card) != 0);
    SevenCardHand |= Card;
}
The above code is faster than any shuffling algorithm (at least for obtaining a subset of random cards) since it only works on 7 cards instead of 52. It also packs the cards into individual bits within a single 64 bit word. It makes evaluating poker hands much more efficient as well.
As a side note, the best binomial coefficient calculator I have found that works with very large numbers (it accurately calculated a case that yielded over 15,000 digits in the result) can be found here.

Why are we using i as a counter in loops? [closed]

I know this might seem like an absolutely silly question to ask, yet I am too curious not to ask...
Why did "i" and "j" become THE variables to use as counters in most control structures?
Although common sense tells me they are just like X, which is used for representing unknown values, I can't help thinking that there must be a reason why everyone gets taught the same way over and over again.
Is it because it is actually recommended for best practices, or a convention, or does it have some obscure reason behind it?
Just in case, I know I can give them whatever name I want and that variables names are not relevant.
It comes ultimately from mathematics: the summation notation traditionally uses i for the first index, j for the second, and so on, as in Σ_{i=1}^{n} Σ_{j=1}^{m} a_{ij} (see http://en.wikipedia.org/wiki/Summation).
It's also used that way for collections of things, like if you have a bunch of variables x1, x2, ... xn, then an arbitrary one will be known as xi.
As for why it's that way, I imagine SLaks is correct and it's because I is the first letter in Index.
I believe it dates back to Fortran. Variables starting with I through N were integer by default, the others were real. This meant that I was the first integer variable and J the second, etc., so they fell naturally into use in loops.
Mathematicians were using i, j, k to designate integers in algebra (subscripts, series, summations etc.) long before computers were around (e.g. 1836 or 1816) - this is the origin of the FORTRAN variable-type defaults. The habit of using letters from the end of the alphabet (..., x, y, z) for unknown variables and from the beginning (a, b, c, ...) for constants is generally attributed to René Descartes (see also here), so I assume i, j, k, ..., n (in the middle of the alphabet) for integers is likely due to him too.
i = integer
Comes from Fortran where integer variables had to start with the letters I through N and real variables started with the other letters. Thus I was the first and shortest integer variable name. Fortran was one of the earliest programming languages in widespread use and the habits developed by programmers using it carried over to other languages.
EDIT: I have no problem with the answer that it derives from mathematics. Undoubtedly that is where the Fortran designers got their inspiration. The fact is, for me anyway, when I started to program in Fortran we used I, J, K, ... for loop counters because they were short and the first legally allowed variable names for integers. As a sophomore in H.S. I had probably heard of Descartes (and a very few others), but made very little connection to mathematics when programming. In fact, the first course I took was called "Fortran for Business" and was taught not by the math faculty, but the business/econ faculty.
For me, at least, the naming of variables had little to do with mathematics, but everything due to the habits I picked up writing Fortran code that I carried into other languages.
i stands for Index.
j comes after i.
These symbols were used as matrix indexes in mathematics long before electronic computers were invented.
I think it's most likely derived from index (in the mathematical sense) - it's used commonly as an index in sums or other set-based operations, and most likely has been used that way since before there were programming languages.
There's a preference in maths for using consecutive letters in the alphabet for "anonymous" variables used in a similar way. Hence, not just "i, j, k", but also "f, g, h", "p, q, r", "x, y, z" (rarely with "u, v, w" prepended), and "α, β, γ".
Now "f, g, h" and "x, y, z" are not used freely: the former is for functions, the latter for dimensions. "p, q, r" are also often used for functions.
Then there are other constraints on available sequences: "l" and "o" are avoided, because they look too much like "1" and "0" in many fonts. "t" is often used for time, "d & δ" for differentials, and "a, s, m, v" for the physical measures of acceleration, displacement, mass, and velocity. That leaves not so many gaps of three consecutive letters without unwanted associations in mathematics for indices.
Then, as several others have noticed, conventions from mathematics had a strong influence on early programming conventions, and "α, β, γ" weren't available in many early character sets.
I found another possible answer: i, j, and k could come from Hamilton's quaternions.
Euler picked i for the imaginary unit.
Hamilton needed two more square roots of -1:
i^2 = j^2 = k^2 = ijk = -1
Hamilton was really influential, and quaternions were the standard way to do 3D analysis before 1900. By then, mathematicians were used to thinking of (ijk) as a matched set.
Vector calculus replaced quaternionic analysis in the 1890s because it was a better way to write Maxwell's equations. But people tended to write vector quantities like this: (3i - 2j + k) instead of (3, -2, 1). So (ijk) became the standard basis vectors in R^3.
Finally, physicists started using group theory to describe symmetries in systems of differential equations. So (ijk) started to connote "vectors that get swapped around by permutation groups," then drifted towards "index-like things that take on all possible values in some specified set," which is basically what they mean in a for loop.
by discarding (a little biased)
a seems an array
b seems another array
c seems a language name
d seems another language name
e seems exception
f looks bad in combination with "for" (for f, a pickup?)
g seems g force
h seems height
i seems an index
j seems i (another index)
k seems a constant k
l seems a number one (1)
m seems a matrix
n seems a node
o seems an output
p sounds like a pointer
q seems a queue
r seems a return value
s seems a string
t looks like time
u reserved for UVW mapping or electric phase
v reserved for UVW mapping or electric phase or a vector
w reserved for UVW mapping or electric phase or a weight
x seems an axis (or an unknown variable)
y seems an axis
z seems a third axis
One sunny afternoon, Archimedes was pondering (as was usual for sunny afternoons) and ran into his buddy Eratosthenes.
Archimedes said, "Archimedes to Eratosthenes greeting! I'm trying to come up with a solution to the ratio of several spherical rigid bodies in equilibrium. I wish to iterate over these bodies multiple times, but I'm having a frightful time keeping track of how many iterations I've done!"
Eratosthenes said, "Why Archimedes, you ripe plum of a kidder, you could merely mark successive rows of lines in the sand, each keeping track of the number of iterations you've done within iteration!"
Archimedes cried out to the world that his great friend was undeniably a shining beacon of intelligence for coming up with such a simple solution. But Archimedes remarked that he likes to walk in circles around his sand pit while he ponders. Thus, there was risk of losing track of which row was on top, and which was on bottom.
"Perhaps I should mark these rows with a letter of the alphabet just off to the side so that I will always know which row is which! What think you of that?" he asked, then added, "But Eratosthenes... whatever letters shall I use?"
Eratosthenes was sure he didn't know which letters would be best, and said as much to Archimedes. But Archimedes was unsatisfied and continued to prod the poor librarian to choose, at least, the two letters that he would require for his current sphere equilibrium solution.
Eratosthenes, finally tired of the incessant request for two letters, yelled, "I JUST DON'T KNOW!!!"
So Archimedes chose the first two letters in Eratosthenes' exclamatory sentence, and thanked his friend for the contribution.
These symbols were quickly adopted by ancient Greek Java developers, and the rest is, well... history.
I think it's because a lot of loops use an int type variable to do the counting, like
for (int i = 0; etc
and when you type, you actually speak it out in your head (like when you read), so in your mind you say 'int...',
and when you have to make up a letter right after that 'int...', you say/type 'i', because that is the first letter you think of when you've just said 'int',
just like when you spell out a word for kids who are learning to read, you spell it for them using names, like this:
WORD spells William W, Ok O, Ruby R, Done D
So you say Int I, Double d, Float f, string s etc. based on the first letter.
And j is used because when you have done int I, J follows right after it.
I think it's a combination of the other mentioned reasons :
For starters, 'i' was commonly used by mathematicians in their notation, and in the early days of computing with languages that weren't binary (i.e. that had to be parsed and lexed in some fashion), the vast majority of computer users were also mathematicians (and scientists and engineers), so the notation fell into use in computer languages for programming loops and has kind of just stuck around ever since.
Combine this with the fact that screen space in those very early days was very limited, as was memory, it made sense to keep shorter variable names.
Possibly historical?
FORTRAN, arguably the first high-level language, defined i, j, k, l, m, n as integer datatypes by default, and loops could only be controlled by integer variables; perhaps the convention continues?
e.g.:
do 100 i = j, 100, 5
....
100 continue
....
i = iterator, i = index, i = integer
Whichever you figure "i" stands for, it still "fits the bill".
Also, unless you have only a single line of code within that loop, you should probably be naming the iterator/index/integer variable to something more meaningful. Like: employeeIndex
BTW, I usually use "i" in my simple iterator loops; unless of course it contains multiple lines of code.
i = iota, j = jot; both small changes.
iota is the smallest letter in the Greek alphabet; in English its meaning is linked to small changes, as in "not one iota" (from a phrase in the New Testament: "until heaven and earth pass away, not an iota, not a dot, will pass from the Law" (Mt 5:18)).
A counter represents a small change in a value.
And from iota comes jot (iot), which is also a synonym for a small change.
cf. http://en.wikipedia.org/wiki/Iota
Well from Mathematics: (for latin letters)
a,b: used as constants or as integers for a rational number
c: a constant
d: derivative
e: Euler's number
f,g,h: functions
i,j,k: are indexes (also unit vectors and the quaternions)
l: generally not used (looks like 1)
m,n: rows and columns of matrices, or integers for rational numbers
o: also not used (unless you're using little-o notation)
p,q: often used as primes
r: sometimes a spatial change of variable, other times related to prime numbers
s,t: spatial and temporal variables, or s is used as a change of variable for t
u,v,w: change of variable
x,y,z: variables
Many possible main reasons, I guess:
mathematicians use i and j for natural numbers in formulas (at least the ones who rarely use complex numbers do), so this carried over to programming
in C, i hints at int. And if you need another int then i2 is just way too long, so you decide to use j.
there are languages where the first letter decides the type, and i is then an integer.
It comes from Fortran, where i,j,k,l,m,n are implicitly integers.
It definitely comes from mathematics, which long preceded computer programming.
So, where did it come from in math? My completely uneducated guess is that it's as one fellow said: mathematicians like to use alphabetic clusters for similar things -- f, g, h for functions; x, y, z for numeric variables; p, q, r for logical variables; u, v, w for other sets of variables, especially in calculus; a, b, c for a lot of things. i, j, k comes in handy for iterative variables, and that about exhausts the possibilities. Why not m, n? Well, they are used for integers, but more often for the end points of iterations rather than the iteration variables themselves.
Someone should ask a historian of mathematics.
Counters are so common in programs, and in the early days of computing, everything was at a premium...
Programmers naturally tried to conserve pixels, and the 'i' required fewer pixels than any other letter to represent. (Mathematicians, being lazy, picked it for the same reason - as the smallest glyph).
As stated previously, 'j' just naturally followed...
:)
I use it for a number of reasons.
Usually my loops are int based, so you make a complete triangle on the keyboard typing "int i", with the exception of the space, which I handle with my thumb. This is a very fast sequence to type.
The "i" could stand for iterator, integer, increment, or index, each of which makes logical sense.
With my personal uses set aside, the theory of it being derived from FORTRAN is correct, where integer vars used letters I - N.
I learned FORTRAN on a Control Data Corp. 3100 in 1965. Variables starting with 'I' through 'N' were implied to be integers. Ex: 'IGGY' and 'NORB' were integers, 'XMAX' and 'ALPHA' were floating-point. However, you could override this through explicit declaration.