Sum of a two-dimensional array

I have a 2D array L(i,j). How can I sum its elements over i so that the result is a function of j?
I did:
do j=1,10
do i =1,30
T(j) = Sum( L(:,j)
end do
end do
Is that ok?

Almost... you don't use i (and you don't need to), and you are missing a closing parenthesis:
do j=1,10
T(j) = Sum( L(:,j) )
enddo ! j
You could also use the dim argument of sum to do this operation in one line:
T = sum( L, dim=1 )
However, I find that very difficult to read and would stick with the loop - it shouldn't make a difference in terms of performance.
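For comparison, here is the same column-wise reduction as a small Python sketch. column_sums is a hypothetical helper, and L(i,j) is represented as a list of rows L[i][j], so T[j] is the sum over i, just like T = sum(L, dim=1):

```python
def column_sums(L):
    """T[j] = sum over i of L[i][j], the analogue of Fortran's sum(L, dim=1).

    L is a list of rows of equal length; zip(*L) yields its columns.
    """
    return [sum(column) for column in zip(*L)]
```

For example, column_sums([[1, 2, 3], [4, 5, 6]]) sums each column over the two rows.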


summing a square array

Hi, I am trying to sum a 2-dimensional square array. Assume the array F(i,j) is well defined for all points i,j. I can sum the whole array with sum(F), but I want to sum the array starting from a small square and then growing up to the full-size array, which will require a DO loop. If I were to write out the logic tediously, I want to sum the array F(i,j) as follows:
DO i = -1,1
DO j = -1,1
value1 = sum(F)
END DO
END DO
DO i = -2,2
DO j = -2,2
value2 = sum(F)
END DO
END DO
DO i = -3,3
DO j = -3,3
value3 = sum(F)
END DO
END DO
and so on up to i = -30,30, j = -30,30. I tried to implement this in a single DO loop:
DO i = -30,30
DO j = -30,30
value4 = sum(F(i:i+1,j:j+1))
END DO
END DO
but this gives me incorrect results. How can I fix this so that it all works in a single DO loop? Thanks.
If I am deciphering correctly what you are trying to do, you have a 2-D array with custom bounds from -30 to 30 in both dimensions. You want to start with the 3x3 submatrix at its center and get its sum, then enlarge it to 5x5 and get that sum, and keep going until you get the sum of the whole matrix.
You will then have 30 sums. Yes, you can do this in a single DO loop. Putting the answers in a 1-D array Sums(30) would look like this:
do i = 1, 30
Sums(i) = Sum(F(-i:i,-i:i))
end do
The index notation in F carves out growing square matrices to feed into the Sum function.
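To make the index arithmetic concrete, here is a Python sketch of the same loop (growing_square_sums is a hypothetical helper). Python lists start at 0, so the sketch assumes F(-n:n,-n:n) is stored as a (2n+1) x (2n+1) list of lists with an offset of n, i.e. Fortran index r maps to Python index r + n:

```python
def growing_square_sums(F, n):
    """Return [sum of F(-i:i,-i:i) for i = 1..n].

    F is a (2n+1) x (2n+1) list of lists standing in for a Fortran array
    F(-n:n,-n:n); Python index n is the centre element F(0,0).
    """
    c = n  # Python index of the centre element
    sums = []
    for i in range(1, n + 1):
        lo, hi = c - i, c + i + 1  # Python slice covering Fortran -i:i
        sums.append(sum(sum(row[lo:hi]) for row in F[lo:hi]))
    return sums
```

With n = 30 this produces the same 30 sums as the Fortran loop over Sum(F(-i:i,-i:i)).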

vectorize join condition in pandas

This code works correctly, as expected, but it takes a lot of time for large DataFrames.
from difflib import SequenceMatcher

for i in excel_df['name_of_college_school']:
    for y in mysql_df['college_name']:
        if SequenceMatcher(None, i.lower(), y.lower()).ratio() > 0.8:
            excel_df.loc[excel_df['name_of_college_school'] == i, 'dupmark4'] = y
I guess I cannot use a function in a join condition to compare values like this.
How do I vectorize this?
Update:
Is it possible to keep the match with the highest score? This loop overwrites earlier matches, and an earlier match may have been more relevant than the current one.
What you are looking for is fuzzy merging.
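from difflib import SequenceMatcher

a = excel_df.to_numpy()  # as_matrix() was removed in pandas 1.0
b = mysql_df.to_numpy()
for i in a:
    for j in b:
        if SequenceMatcher(None, i[college_index_a].lower(),
                           j[college_index_b].lower()).ratio() > 0.8:
            i[dupmark_index] = j[college_index_b]  # 'dupmark4' column must exist
# a is typically a copy of the data, so write it back afterwards, e.g.
# excel_df = pd.DataFrame(a, columns=excel_df.columns)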
Never use .loc in a loop; it has a huge overhead. Also, get the numerical position of each column you need (college_index_a, college_index_b and dupmark_index above) with get_loc, e.g.:
college_index_b = mysql_df.columns.get_loc("college_name")
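You could also avoid one of the loops by using apply: instead of up to M*N .loc operations (M and N being the row counts of the two frames), there will be only one per row of mysql_df. The number of SequenceMatcher comparisons stays the same.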
for y in mysql_df['college_name']:
    match = excel_df['name_of_college_school'].apply(
        lambda x: SequenceMatcher(None, x.lower(), y.lower()).ratio() > 0.8)
    excel_df.loc[match, 'dupmark4'] = y
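Neither snippet covers the update (keeping the best match rather than the last one). A minimal sketch using only difflib from the standard library: for each name, score every candidate and keep the one with the highest ratio above 0.8. best_matches is a hypothetical helper, not part of pandas, and it still performs one comparison per pair of names, so it fixes the overwriting problem rather than the speed:

```python
from difflib import SequenceMatcher


def best_matches(names, candidates, threshold=0.8):
    """For each name, return the candidate with the highest similarity ratio
    strictly above threshold, or None if no candidate qualifies."""
    lowered = [(c, c.lower()) for c in candidates]  # lowercase each candidate once
    result = []
    for name in names:
        key = name.lower()
        best, best_score = None, threshold
        for original, low in lowered:
            score = SequenceMatcher(None, key, low).ratio()
            if score > best_score:  # only a strictly better match replaces the current one
                best, best_score = original, score
        result.append(best)
    return result
```

Assuming both columns contain only strings (no NaN), it could be used as excel_df['dupmark4'] = best_matches(excel_df['name_of_college_school'], mysql_df['college_name']).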

Fortran: efficient matrix-vector multiplication

I have a piece of code which is a significant bottleneck:
do s = 1,ns
msum = 0.d0
do k = 1,ns
msum = msum + tm(k,s)*f(:,:,k)
end do
m(:,:,s) = msum
end do
This is a simple matrix-vector product for every (x,y): m(x,y,s) = sum over k of tm(k,s)*f(x,y,k), i.e. m = tm^T f, where f(x,y,:) is a vector of length ns.
I thought about using a BLAS routine, but I am not sure whether any of them allows multiplying along a specific dimension (k). Does anyone have good advice?
Unfortunately, you do not mention the actual shape of f, i.e. the number of x and y values. Since you say this piece of code is a bottleneck, you can and should drop msum, accumulate directly into m(:,:,s), and initialize it with the first term of the sum instead of zeros, e.g.
do s = 1,ns
  m(:,:,s) = tm(1,s)*f(:,:,1)   ! first term replaces the zero initialization
  do k = 2, ns
    m(:,:,s) = m(:,:,s) + tm(k,s)*f(:,:,k)
  end do
end do
Second, a more general approach:
There are ns weighted sums of the nK 2D matrices f(:,:,1:nK), with scalar weights stored in tm(:,1:ns); the goal is to store these sums in m(:,:,1:ns). Why not compute the sums element-wise with respect to x and y, so that the operations run over contiguous memory? You already mentioned that you can redesign the data so that k is the first dimension of f, i.e. f(k,:,:).
Considering only the desired outcome, you need ns 2D matrices m(:,:,1:ns) that are independent of each other (the outer loop remains as it is). Let's drop this dimension for a moment. The problem then becomes:
m(:,:) = \sum_{k=1}^{ns} tm_k * f_k(:,:)
We should thus sum over k, with f stored as f(k,:,:), to determine m(:,:) as follows (note that I am adding the outer loop over s again):
nK = size(f, 1) ! the "k"s
nX = size(f, 2) ! the "x"s
nY = size(f, 3) ! the "y"s
! shapes: f(nK,nX,nY), tm(nK,ns), m(nX,nY,ns)
m = 0.d0
do s = 1, ns
  do ii = 1, nY
    ! m(:,ii,s) = 1*transpose(f(:,:,ii))*tm(:,s) + 1*m(:,ii,s)
    call DGEMV('T', nK, nX, &
               1.d0, f(:,:,ii), nK, tm(:,s), 1, &
               1.d0, m(:,ii,s), 1)
  end do !ii
end do !s
See the documentation of DGEMV for more details on its usage.
Of course, the earlier advice of skipping the zero initialization applies here as well: pass beta = 0.d0 instead of 1.d0 to DGEMV, so that m(:,ii,s) is simply overwritten and the m = 0.d0 line can be dropped.
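One further observation, offered as an untested sketch rather than part of the answer above: with the question's original layout f(nX,nY,nK) and m(nX,nY,ns), column-major storage means the first two dimensions already form one contiguous row index of length nX*nY, so the whole loop is a single matrix-matrix product m = f * tm. With the Fortran 77 BLAS interface that would be a call such as DGEMM('N', 'N', nX*nY, ns, nK, 1.d0, f, nX*nY, tm, nK, 0.d0, m, nX*nY), assuming contiguous arrays and tm dimensioned (nK,ns). The Python below (hypothetical helper names) only checks the index algebra with plain lists, comparing the loop against the flattened product:

```python
def contract_loop(f, tm):
    """Reference version of the question's loop:
    m[x][y][s] = sum over k of tm[k][s] * f[x][y][k]."""
    nX, nY, nK = len(f), len(f[0]), len(f[0][0])
    ns = len(tm[0])
    return [[[sum(tm[k][s] * f[x][y][k] for k in range(nK))
              for s in range(ns)]
             for y in range(nY)]
            for x in range(nX)]


def contract_as_matmul(f, tm):
    """Same result as ONE matrix-matrix product: merge (x, y) into a single
    row index r = x*nY + y, so F[r][k] = f[x][y][k] and M = F * tm."""
    nX, nY = len(f), len(f[0])
    ns = len(tm[0])
    F = [f[x][y] for x in range(nX) for y in range(nY)]  # (nX*nY) x nK
    M = [[sum(v * tm[k][s] for k, v in enumerate(row)) for s in range(ns)]
         for row in F]                                    # (nX*nY) x ns
    # split r back into (x, y)
    return [[M[x * nY + y] for y in range(nY)] for x in range(nX)]
```

The flattening order does not matter as long as f and m use the same one; the Python nesting is row-major, whereas Fortran's column-major order corresponds to r = x + nX*(y-1) with 1-based indices.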

Correct loop invariant?

I am trying to find the loop invariant in the following code:
Find Closest Pair Iter(A):
    # Precondition: A is a non-empty list of 2D points and len(A) > 1.
    # Postcondition: Returns a pair of points which are the two closest points in A.
    min = infinity
    p = -1
    q = -1
    for i = 0,...,len(A) - 1:
        for j = i + 1,...,len(A) - 1:
            if Distance(A[i],A[j]) < min:
                min = Distance(A[i],A[j])
                p = i
                q = j
    return (A[p],A[q])
I think the loop invariant is min = Distance(A[i],A[j]), so the closest points in A are A[p] and A[q].
I'm trying to show program correctness. I want to prove the inner loop correct by treating i as a constant, then, once I've proven the inner loop, replace it with its loop invariant and prove the outer loop. By the way, this is homework. Any help will be much appreciated.
I'm not sure I fully understand what you mean by replacing the inner loop by its loop invariant. A loop invariant is a condition that holds before the loop and after every iteration of the loop (including the last one).
That being said, I wouldn't like to spoil your homework, so I'll try my best to help you without giving too much of the answer away. Let me try:
There are three variables in your algorithm that hold very important values (min, p and q). Ask yourself: what is true about these values as the algorithm goes through each pair of points (A[i], A[j])?
As a simpler example: if you were designing an algorithm to sum the values in a list, you would create a variable called sum before the loop and assign 0 to it. You would then add the elements to it one by one in a loop, and finally return the variable sum.
Since it is true that this variable holds the sum of every single element "seen" in the loop, and since after the main loop the algorithm will have "seen" every element in the list, the sum variable necessarily holds the sum of all values in the list. In this case the loop invariant would be: The sum variable holds the sum of every element "seen" so far.
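The sum example can be written out with the invariant checked explicitly. This is a Python sketch (sum_with_invariant is a hypothetical name) in which the assert statements state the invariant before and after each iteration:

```python
def sum_with_invariant(values):
    total = 0
    for i, v in enumerate(values):
        # Invariant: total holds the sum of every element "seen" so far
        assert total == sum(values[:i])
        total += v
        assert total == sum(values[:i + 1])  # still true after this iteration
    # After the loop every element has been seen, so total == sum(values)
    return total
```

The asserts are only there to make the invariant visible; a proof shows they can never fail, rather than running them.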
Good luck with your homework!

Algorithm - find the minimal difference between the sums of two arrays

I am job hunting now and doing many algorithm exercises. Here is my problem:
Given two arrays a and b of the same length, the task is to make |sum(a)-sum(b)| minimal by swapping elements between a and b.
Here is my thought process:
Assume we swap a[i] and b[j], and set Delt = sum(a) - sum(b) and x = a[i] - b[j].
Then Delt2 = (sum(a) - a[i] + b[j]) - (sum(b) - b[j] + a[i]) = Delt - 2*x,
so the improvement |Delt| - |Delt2| has the same sign as |Delt|^2 - |Delt2|^2 = 4*x*(Delt - x); a swap helps exactly when x*(Delt - x) > 0.
Based on this, I wrote the following code:
Delt = sum(a) - sum(b);
done = false;
while(!done)
{
done = true;
for i = [0, n)
{
for j = [0,n)
{
x = a[i]-b[j];
change = x*(Delt-x);
if(change >0)
{
swap(a[i], b[j]);
Delt = Delt - 2*x;
done = false;
}
}
}
}
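For anyone who wants to run the idea above, here is a minimal Python sketch of the same swap loop (greedy_swap is a hypothetical name). Each accepted swap strictly decreases |Delt|, so the loop terminates, but it only ever considers single swaps, i.e. it is a local search:

```python
def greedy_swap(a, b):
    """Swap a[i] and b[j] whenever that strictly reduces |sum(a) - sum(b)|.

    Returns the rearranged copies of a and b and the final |difference|.
    """
    a, b = list(a), list(b)
    delt = sum(a) - sum(b)
    done = False
    while not done:
        done = True
        for i in range(len(a)):
            for j in range(len(b)):
                x = a[i] - b[j]
                if x * (delt - x) > 0:  # the swap strictly improves |delt|
                    a[i], b[j] = b[j], a[i]
                    delt -= 2 * x
                    done = False
    return a, b, abs(delt)
```

For example, greedy_swap([1, 2], [3, 4]) swaps 1 and 3 on the first step and ends with two arrays summing to 5 each.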
Does anybody have a better solution? If so, I would be very grateful if you could share it!
This problem is basically the optimization version of the Partition Problem with the extra constraint that both parts have the same number of elements. I'll prove that adding this constraint doesn't make the problem easier.
NP-Hardness proof:
Assume there is an algorithm A that solves this problem in polynomial time. Then we can solve the Partition Problem in polynomial time:
Partition(S):
    n = |S|
    for k in range(n + 1):                 // k = number of zeros added, 0..n
        S' = S + {0,...,0}                 // k zeros
        if |S'| is even:
            result <- A(S' split arbitrarily into 2 equal-size parts)
            if result is a partition:      // simple to check, since Partition is in NP
                return true
    return false                           // no partition
Correctness:
If there is a partition (S1,S2) of S [assume S2 has at least as many elements as S1], consider the iteration in which |S2|-|S1| zeros have been added. The input to A then contains enough zeros to form two equal-length arrays S2 and S1+{0,0,...,0} with equal sums, so A returns a split with difference 0, which is a partition, and the algorithm returns true.
If the algorithm returns true at iteration k, A returned two arrays S1, S2 with the same number of elements and equal sums. Removing the k added zeros yields a partition of the original S, so S had a partition.
Polynomial:
Assume A takes P(n) time. The algorithm above calls A O(n) times on inputs of at most 2n elements, so it takes O(n*P(2n)) time, which is also polynomial.
Conclusion:
If this problem is solvable in polynomial time, then so is the Partition Problem, which is NP-complete, and thus P = NP. Hence this problem is NP-hard.
Because this problem is NP-hard, an exact solution will probably require an exponential-time algorithm. One such algorithm is simple backtracking [I leave implementing a backtracking solution as an exercise for the reader].
EDIT: as @jpalecek pointed out, one can prove NP-hardness directly with the reduction S -> S+(0,0,...,0) [|S| zeros, enough to balance any partition]. Polynomiality is trivial, and correctness is very similar to the Partition correctness proof above: [if there is a partition, adding "balancing" zeros is possible; the other direction is simply trimming those zeros].
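Since the answer leaves backtracking as an exercise, here is one possible sketch (min_diff_exact is a hypothetical name). It assumes len(a) == len(b), tries every way to choose n of the 2n values for a, and prunes branches that cannot be completed or once a perfect split is found; the worst case is still exponential:

```python
def min_diff_exact(a, b):
    """Exact minimum of |sum(a) - sum(b)| over all ways to redistribute the
    values of a and b into two arrays of the original length, by backtracking."""
    items = sorted(a + b, reverse=True)
    n, total = len(a), sum(items)
    best = abs(sum(a) - sum(b))  # the starting split is always feasible

    def search(idx, picked, s):
        # s: sum of the `picked` values chosen so far for the new array a;
        # every value not chosen goes to the new array b.
        nonlocal best
        if picked == n:
            best = min(best, abs(total - 2 * s))
            return
        if best == 0 or len(items) - idx < n - picked:
            return  # already optimal, or too few values left to fill a
        search(idx + 1, picked + 1, s + items[idx])  # put items[idx] in a
        search(idx + 1, picked, s)                   # put items[idx] in b

    search(0, 0, 0)
    return best
```

For small inputs it can serve as a reference to check heuristics such as the swap loop in the question.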
Just a comment: through all this swapping you can arrange the contents of both arrays however you like, so it does not matter which array each value starts in.
I can't work it out in my head, but I'm pretty sure there is a constructive solution. I think you could sort the values first and then deal them out according to some rule, something along the lines of: if value > 0 and sum(a) > sum(b), insert it into b, otherwise into a.
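A minimal sketch of that "sort, then deal" idea (greedy_deal is a hypothetical helper). Giving each value, largest first, to the array with the smaller running sum is one reading of the suggestion; the sketch adds a check that keeps both arrays at length n and assumes non-negative values. It runs in O(n log n) time, so given the NP-hardness argument above it cannot be exact on every input unless P = NP; treat it as a heuristic, for example as a starting point for the swap loop in the question:

```python
def greedy_deal(a, b):
    """Deal values largest-first, each to the array with the smaller running
    sum, while keeping both arrays at the original length n.
    Assumes non-negative values; no optimality guarantee."""
    n = len(a)
    new_a, new_b = [], []
    sum_a = sum_b = 0
    for v in sorted(a + b, reverse=True):
        a_has_room = len(new_a) < n
        b_has_room = len(new_b) < n
        if a_has_room and (sum_a <= sum_b or not b_has_room):
            new_a.append(v)
            sum_a += v
        else:  # a is full, or b has the smaller sum and still has room
            new_b.append(v)
            sum_b += v
    return new_a, new_b
```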