summing a square array - sum

Hi, I am trying to sum a 2-dimensional square array. Assume the array F(i,j) is well defined for all points i,j. I can sum the whole array with just sum(F), but I want to sum the array starting from a small square and growing up to the full-size array, which will require a DO loop. If I were to write out the logic tediously, I want to sum the array F(i,j) as follows:
DO i = -1,1
   DO j = -1,1
      value1 = sum(F)
   END DO
END DO
DO i = -2,2
   DO j = -2,2
      value2 = sum(F)
   END DO
END DO
DO i = -3,3
   DO j = -3,3
      value3 = sum(F)
   END DO
END DO
and proceed up to i=-30,30, j=-30,30. I tried to implement this in one DO loop by
DO i = -30,30
   DO j = -30,30
      value4 = sum(F(i:i+1,j:j+1))
   END DO
END DO
but this gives me incorrect results. How can I fix this so that I can implement it all in a single DO loop? Thanks.

If I am deciphering this correctly, you have a 2-D array with custom extents from -30 to 30 in both dimensions. You want to start with a 3x3 matrix in the center of it and get a sum, then enlarge it to 5x5 and get a sum, and keep going until you get the sum of the whole matrix.
You will then have 30 sums. Yes, you can do this in a single do loop. Putting the answers in a 1-D array Sums(30) would look like this:
do i = 1, 30
   Sums(i) = Sum(F(-i:i,-i:i))
end do
The index notation in F carves out growing square matrices to feed into the Sum function.
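For readers who think more easily outside Fortran, the same growing-window idea can be sketched in Python. This is only an illustration under assumptions: the grid is a plain list of lists with an odd side length, and the helper name `centered_square_sums` is hypothetical, not something from the thread. Because Python lists are zero-based, the centre element plays the role of Fortran index 0.

```python
def centered_square_sums(grid, max_half_width):
    """Return sums of the (2h+1) x (2h+1) squares centred in grid, for h = 1..max_half_width."""
    n = len(grid)
    c = n // 2  # zero-based index of the centre; stands in for Fortran index 0
    sums = []
    for h in range(1, max_half_width + 1):
        total = 0
        # Slice out rows c-h..c+h, then the same range of columns in each row.
        for row in grid[c - h : c + h + 1]:
            total += sum(row[c - h : c + h + 1])
        sums.append(total)
    return sums


# A 7 x 7 grid of ones: each sum equals the number of cells in the square.
grid = [[1] * 7 for _ in range(7)]
sums = centered_square_sums(grid, 3)
```

As in the Fortran answer, one loop over the half-width is enough; the slicing does the work of the two inner index ranges.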

Related

Finding the index of max value of columns in numpy array but removing the previous max

I have an array with N rows and M columns.
I would like to run through all the columns, finding for each column the index of the row that contains its max value. However, each row should be selected only once.
For instance, let's consider a matrix
1 1
2 2
The output should be [1, 0], because row 1 (value 2) holds the max of column 0; then we move to column 1, where row 1 is out of consideration, so row 0 holds the highest remaining cell.
Indeed, this can be solved easily with a nested for loop, something like:
removed_rows = []
for i in range(nb_columns):
    # start with no candidate so an already-removed row 0 is never picked by default
    index_max = None
    value_max = None
    for j in range(nb_rows):
        if j in removed_rows:
            continue
        if value_max is None or value_max < A[j, i]:
            index_max = j
            value_max = A[j, i]
    removed_rows.append(index_max)
However, it seems slow for a huge matrix. Is there a faster way to do this (with numpy?)?
Many thanks
This might not be very fast, as it still loops through the columns, which I think is unavoidable due to the constraint, but it should be faster than your solution because it finds the index of the maximum with argmax:
import numpy as np

out = []
mm = A.min() - 1
# note: A is modified in place; use a copy if the original is still needed
for j in range(A.shape[1]):
    idx = np.argmax(A[:,j])
    # replace the entire row with mm
    # so next `argmax` will ignore this row
    A[idx] = mm
    out.append(idx)
The above takes about 640 us on 100 x 100 arrays, and 18ms on 1k x 1k arrays. Your code refuses to run on 1k x 1k array within reasonable time on my system.
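For comparison, here is a dependency-free sketch of the same greedy selection that tracks used rows in a set instead of overwriting the matrix. The function name `greedy_column_argmax` and the list-of-lists input are assumptions made for illustration; it will be far slower than the argmax version on large inputs and is only meant to make the rule explicit.

```python
def greedy_column_argmax(matrix):
    """For each column, pick the row with the largest value among rows not yet chosen."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    used = set()
    picks = []
    for col in range(n_cols):
        candidates = [r for r in range(n_rows) if r not in used]
        if not candidates:
            break  # more columns than rows: nothing left to pick
        # max() returns the first maximal candidate, so ties go to the lowest row index
        best = max(candidates, key=lambda r: matrix[r][col])
        used.add(best)
        picks.append(best)
    return picks


# The 2 x 2 example from the question.
picks = greedy_column_argmax([[1, 1], [2, 2]])
```

Unlike the in-place approach, this leaves the input untouched, at the cost of rebuilding the candidate list for every column.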

SPSS: DO REPEAT with different numbers of matched variables

I have a dataset where each case has the following set of variables:
VarA1.1 to VarA25.185 (total of 4625 variables)
VarB.1 to VarB.185 (total of 185 variables)
For each case, VarA1.1, VarA2.1, VarA3.1, etc. are all linked to the same VarB.1.
I want to use a DO REPEAT function to search through each .1 instance using both VarA and VarB.
Example code:
DO REPEAT VarA = VarA1.1 to VarA25.185
/ VarB = VarB.1 to VarB.185.
if (VarA = X) AND ((VarB-Y)<0)
VarC = Z.
END REPEAT.
EXE.
However, it seems that because there are different numbers of variables in the repeat list of VarA and VarB, they don't pair up. I want to associate each VarA#(1-25).1 with VarB.1, each VarA#(1-25).2 with each VarB.2, etc. up to VarB.185 so that in the repeat function the correct pairing of variables is used.
Thanks!
Another way to do this is to use a LOOP on the outside and a DO REPEAT on the inside. Here is some example data, with just three A variables whose suffixes go from 1 to 10.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 100.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
*Making random data.
VECTOR A1.(10).
VECTOR A2.(10).
VECTOR A3.(10).
VECTOR B.(10).
NUMERIC X Y.
DO REPEAT a = A1.1 TO Y.
COMPUTE a = RV.BERNOULLI(0.5).
END REPEAT.
EXECUTE.
So here is the part you want to pay attention to. Your DO REPEAT currently tries to step through both variable lists at once. This switches it: the LOOP goes over the index suffix (1 to 10 here, 1 to 185 in your data), and the DO REPEAT goes over each of your A vectors (3 here, 25 in your data).
VECTOR A1 = A1.1 TO A1.10.
VECTOR A2 = A2.1 TO A2.10.
VECTOR A3 = A3.1 TO A3.10.
VECTOR B = B.1 TO B.10.
VECTOR C.(10).
LOOP #i = 1 TO 10.
DO REPEAT A = A1 A2 A3.
IF (A(#i) = X) AND (B(#i)-Y<0) C.(#i) = B(#i).
END REPEAT.
END LOOP.
EXECUTE.
In code-golf terms it is probably not going to beat the macro approach, since you have to define all of those VECTOR statements. But I think it is a conceptually clear way to write the program.
It looks like what you are trying to do is loop over 25 variables and repeat this for each of 185 variables.
It would be more intuitive to use SPSS macros to achieve this. Stepping through the code below will demonstrate the building blocks for solving your data problem.
DEFINE !MyMacroName ()
SET MPRINT ON.
/* Generate some example data to match desired data format*/.
set seed = 10.
input program.
loop #i = 1 to 50.
compute case = #i.
end case.
end loop.
end file.
end input program.
dataset name sim.
execute.
!do !i =1 !to 25
vector !concat('VarA',!i,'.(185, F1.0).').
do repeat v = !concat('VarA',!i,'.1') to !concat('VarA',!i,'.185').
compute v = TRUNC(RV.UNIFORM(1,6)).
end repeat.
!doend
vector VarB.(185, F1.0).
do repeat v = VarB.1 to VarB.185.
compute v = TRUNC(RV.UNIFORM(1,6)).
end repeat.
execute.
/* Solve actual problem */.
!do !i =1 !to 185
!do !j = 1 !to 25
if (!concat('VarA',!j,'.',!i) = !concat('VarB.',!i)) !concat('VarC', !j)=1.
!doend
!doend
SET MPRINT OFF.
!ENDDEFINE.
/* Run macro */.
!MyMacroName.
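To make the pairing explicit, the nested macro loops can be mirrored in a short Python sketch that only generates the variable names, not SPSS syntax. The function name `paired_variable_names` is hypothetical; the counts 25 and 185 and the naming pattern come from the question.

```python
def paired_variable_names(n_a=25, n_b=185):
    """Yield (VarA name, VarB name) pairs in the same order as the nested macro loops."""
    for i in range(1, n_b + 1):      # outer loop: the .1 ... .185 suffix
        for j in range(1, n_a + 1):  # inner loop: VarA1 ... VarA25
            yield f"VarA{j}.{i}", f"VarB.{i}"


pairs = list(paired_variable_names())
```

Every VarA variable appears exactly once, and each VarB.i is reused for all 25 VarA variables that share suffix i, which is the association a single flat DO REPEAT list cannot express.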

Sum of a two dimensional array

I have this 2D array L(i,j). How can I sum all the elements over i and get the result as a function of j?
I did :
do j=1,10
do i =1,30
T(j) = Sum( L(:,j)
end do
end do
Is that ok?
Almost... you don't use i (and you don't need to), and you are missing one bracket:
do j=1,10
   T(j) = Sum( L(:,j) )
enddo ! j
You could also use the dimension parameter in sum to do this operation in one line:
T = sum( L, dim=1 )
However, I find that very difficult to read and would stick with the loop - it shouldn't make a difference in terms of performance.
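The same reduction over the first index can be sketched in plain Python, assuming the data is stored as a list of rows so that `table[i][j]` corresponds to L(i,j). The helper name `sum_over_first_index` is made up for this example.

```python
def sum_over_first_index(table):
    """Given table[i][j], return a list t with t[j] = sum over i of table[i][j]."""
    # zip(*table) regroups the rows into columns, one tuple per j.
    return [sum(column) for column in zip(*table)]


L = [[1, 2, 3],
     [4, 5, 6]]
T = sum_over_first_index(L)
```

Here the explicit loop over j has become the list comprehension, and the inner `sum` plays the role of Sum( L(:,j) ).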

torch logical indexing of tensor

I am looking for an elegant way to select a subset of a torch tensor that satisfies some constraints.
For example, say I have:
A = torch.rand(10,2)-1
and S is a 10x1 tensor,
sel = torch.ge(S,5) -- this is a ByteTensor
I would like to be able to do logical indexing, as follows:
A1 = A[sel]
But that doesn't work.
So there's the index function which accepts a LongTensor but I could not find a simple way to convert S to a LongTensor, except the following:
sel = torch.nonzero(sel)
which returns a K x 2 tensor (K being the number of values of S >= 5). So then I have to convert it to a 1 dimensional array, which finally allows me to index A:
A:index(1,torch.squeeze(sel:select(2,1)))
This is very cumbersome; in e.g. Matlab all I'd have to do is
A(S>=5,:)
Can anyone suggest a better way?
One possible alternative is:
sel = S:ge(5):expandAs(A) -- now you can use this mask with the [] operator
A1 = A[sel]:unfold(1, 2, 2) -- unfold to get back a 2D tensor
Example:
> A = torch.rand(3,2)-1
-0.0047 -0.7976
-0.2653 -0.4582
-0.9713 -0.9660
[torch.DoubleTensor of size 3x2]
> S = torch.Tensor{{6}, {1}, {5}}
6
1
5
[torch.DoubleTensor of size 3x1]
> sel = S:ge(5):expandAs(A)
1 1
0 0
1 1
[torch.ByteTensor of size 3x2]
> A[sel]
-0.0047
-0.7976
-0.9713
-0.9660
[torch.DoubleTensor of size 4]
> A[sel]:unfold(1, 2, 2)
-0.0047 -0.7976
-0.9713 -0.9660
[torch.DoubleTensor of size 2x2]
There are two simpler alternatives:
Use maskedSelect:
result=A:maskedSelect(your_byte_tensor)
Use a simple element-wise multiplication, for example
result=torch.cmul(A,S:gt(0))
The second one is very useful if you need to keep the shape of the original matrix (i.e. A), for example to select neurons in a layer at backprop. However, since it puts zeros in the resulting matrix wherever the condition dictated by the ByteTensor doesn't hold, you can't use it to compute the product (or median, etc.). The first one returns only the elements that satisfy the condition, so this is what I'd use to compute products, medians, or anything else where I don't want zeros.
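The difference between the two alternatives can be illustrated without Torch. The sketch below uses plain Python lists and two hypothetical helpers, `select_rows` and `zero_rows`; it is not the Torch API, only a picture of what "select" versus "multiply by the mask" does to the shape of the result.

```python
def select_rows(rows, mask):
    """Keep only the rows whose mask entry is true (the result has fewer rows)."""
    return [row for row, keep in zip(rows, mask) if keep]


def zero_rows(rows, mask):
    """Keep the shape, replacing rows whose mask entry is false with zeros."""
    return [[x if keep else 0 for x in row] for row, keep in zip(rows, mask)]


A = [[1, 2], [3, 4], [5, 6]]
S = [6, 1, 5]
mask = [s >= 5 for s in S]

selected = select_rows(A, mask)  # two rows survive
zeroed = zero_rows(A, mask)      # still three rows, the middle one zeroed
```

A product or median taken over `zeroed` would be distorted by the inserted zeros, which is why selection is the safer choice for such statistics.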

Fortran: efficient matrix-vector multiplication

I have a piece of code which is a significant bottleneck:
do s = 1,ns
msum = 0.d0
do k = 1,ns
msum = msum + tm(k,s)*f(:,:,k)
end do
m(:,:,s) = msum
end do
This is a simple matrix-vector product m=tm*f (where f is length k) for every x,y.
I thought about using a BLAS routine, but I am not sure whether any of them allows multiplying along a specific dimension (k). Do any of you have any good advice?
Unfortunately you do not mention the actual shape of f, i.e. the number of x and y values. Since you mention that this piece of code is a bottleneck, you can and should replace msum, accumulate directly into the memory of m(:,:,s), and spare the first step in your loop, e.g.
do s = 1,ns
   ! initialize with the k = 1 term instead of zeroing a temporary
   m(:,:,s) = tm(1,s)*f(:,:,1)
   do k = 2, ns
      m(:,:,s) = m(:,:,s) + tm(k,s)*f(:,:,k)
   end do
end do
Secondly, a more general approach:
There are ns summations of nK 2D matrices f(:,:,1:nK) weighted by scalar factors that are stored in tm(:,1:ns). The goal is to store these sums in m(:,:,1:ns). Why not sum up element-wise with respect to x and y, so that the result is written to contiguous memory sections? You already mentioned that you can redesign such that k is the first dimension in f, i.e. f(k,:,:).
Considering only the desired outcome, you ought to have ns 2D matrices m(:,:,1:ns) that are independent of each other (the outer loop remains as it is). Let's drop this dimension for a moment. The problem then becomes:
m(:,:) = \sum_{k=1}^{ns} tm_k * f_k(:,:)
We should thus sum over k, e.g. have f(k,:,:) to determine m(:,:) as follows (note that I am adding the outer loop for s again):
nK = size(f, 1) ! the "k"s
nX = size(f, 2) ! the "x"s
nY = size(f, 3) ! the "y"s
m = 0.d0
do s = 1, ns
   do ii = 1, nY
      ! m(:,ii,s) = m(:,ii,s) + transpose(f(:,:,ii)) * tm(:,s)
      call DGEMV('T', nK, nX, &
                 1.d0, f(:,:,ii), nK, tm(:,s), 1, &
                 1.d0, m(:,ii,s), 1)
   end do !ii
end do !s
See the documentation of DGEMV for more details on its usage.
Of course, the above advice of excluding the first step of the loop, to spare the initialization with zeros, may be applied as well.
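As a reference for checking any optimized version, the contraction itself can be written out naively. The Python sketch below assumes k is the first index of f, as in the redesigned layout, and uses nested lists; the function name `contract_over_k` is hypothetical, and the index order `m[s][x][y]` is a Python convenience rather than the Fortran layout m(:,:,s).

```python
def contract_over_k(tm, f):
    """Return m with m[s][x][y] = sum over k of tm[k][s] * f[k][x][y]."""
    n_k = len(f)
    n_x = len(f[0])
    n_y = len(f[0][0])
    n_s = len(tm[0])
    m = [[[0.0] * n_y for _ in range(n_x)] for _ in range(n_s)]
    for s in range(n_s):
        for k in range(n_k):
            weight = tm[k][s]
            for x in range(n_x):
                for y in range(n_y):
                    m[s][x][y] += weight * f[k][x][y]
    return m


# Two k-slices of a 1 x 2 field, combined with a 2 x 2 weight matrix.
tm = [[1.0, 0.0],
      [2.0, 1.0]]
f = [[[1.0, 2.0]],
     [[3.0, 4.0]]]
m = contract_over_k(tm, f)
```

Comparing the output of a BLAS-based routine against a slow reference like this on small inputs is a cheap way to catch transposition or leading-dimension mistakes.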