How to understand this: `db = np.sum(dscores, axis=0, keepdims=True)` - numpy

In the cs231n 2017 class, when we backpropagate the gradient we compute the gradient for the biases like this:
db = np.sum(dscores, axis=0, keepdims=True)
What's the basic idea behind the sum operation? Thanks

This is the formula for the derivative (more precisely, the gradient) of the loss function with respect to the bias (see this question and this post for derivation details).
The numpy.sum call computes the per-column sums along axis 0. Example:
dscores = np.array([[1, 2, 3],[2, 3, 4]]) # a 2D matrix
db = np.sum(dscores, axis=0, keepdims=True) # result: [[3 5 7]]
The result is exactly the element-wise sum [1, 2, 3] + [2, 3, 4] = [3 5 7]. In addition, keepdims=True preserves the rank of the original matrix, which is why the result is [[3 5 7]] instead of just [3 5 7].
By the way, if we were to compute np.sum(dscores, axis=1, keepdims=True), the result would be [[6], [9]] (the per-row sums, kept as a column).
[Update]
Apparently, the focus of this question is the formula itself. I'd rather not go too far off-topic here, so I'll just give the main idea. The sum appears in the formula because of broadcasting over the mini-batch in the forward pass. If you take just one example at a time, the bias derivative is just the error signal, i.e. dscores (the links above explain it in detail). But for a batch of examples the gradients add up due to linearity. That's why we take the sum along the batch axis, axis=0.
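To make the batch argument concrete, here is a minimal sketch of the forward and backward pass around the bias (the shapes and variable names are illustrative, not taken from the course code):
import numpy as np

N, D, C = 4, 3, 2                  # batch size, input dimension, number of classes
X = np.random.randn(N, D)
W = np.random.randn(D, C)
b = np.zeros((1, C))

scores = X @ W + b                 # forward: b is broadcast over the N rows of X @ W

dscores = np.random.randn(N, C)    # pretend this is dL/dscores coming from above
db = np.sum(dscores, axis=0, keepdims=True)   # each example adds its own row; shape (1, C) matches b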

(figure omitted: NumPy axis visual description)

Related

type hint npt.NDArray number of axes

Given that I know the number of axes, can I specify it in the type hint npt.NDArray (with import numpy.typing as npt)?
I.e. if I know it is a 3D array, how can I write something like npt.NDArray[3, np.float64]?
On Python 3.9 and 3.10 the following does the job for me:
from typing import Literal, Tuple
import numpy as np
data = [[1, 2, 3], [4, 5, 6]]
arr: np.ndarray[Tuple[Literal[2], Literal[3]], np.dtype[np.int_]] = np.array(data)
It is a bit cumbersome, but you might follow numpy issue #16544 for future development on easier specification.
In particular, for now you must declare the full shape and can't only declare the rank of the array.
In the future something like ndarray[Shape[:, :, :], dtype] should be available.
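In the meantime, if you only need the element type checked, a common fallback (a minimal sketch, not from the answer above; it encodes the dtype but not the rank) is to annotate with npt.NDArray:
import numpy as np
import numpy.typing as npt

def normalize(a: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    # type checkers verify the dtype here; the shape/rank is left unchecked
    return a / np.linalg.norm(a)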

Strange behaviour of numpy eigenvector: bug or no bug

NumPy's eigenvector solution differs from Wolfram Alpha's and from my own calculation by hand.
>>> import numpy.linalg
>>> import numpy as np
>>> numpy.linalg.eig(np.array([[-2, 1], [2, -1]]))
(array([-3.,  0.]), array([[-0.70710678, -0.4472136 ],
       [ 0.70710678, -0.89442719]]))
Wolfram Alpha https://www.wolframalpha.com/input/?i=eigenvectors+%7B%7B-2,1%7D,%7B%2B2,-1%7D%7D and my personal calculation give the eigenvectors (-1, 1) and (1, 2). The NumPy solution however differs.
NumPy's calculated eigenvalues, however, are confirmed by Wolfram Alpha and by my own calculation.
So, is this a bug in NumPy, or is my understanding of the math too simple? A similar thread, Numpy seems to produce incorrect eigenvectors, puts the main difference down to rounding/scaling of the eigenvectors, but here the deviation between the solutions would be massive.
Regards
numpy.linalg.eig normalizes the eigenvectors (each has unit length) and returns them as the columns of the second output:
import numpy as np

eig_vectors = np.linalg.eig(np.array([[-2, 1], [2, -1]]))[1]
vec_1 = eig_vectors[:, 0]  # eigenvector for eigenvalue -3
vec_2 = eig_vectors[:, 1]  # eigenvector for eigenvalue 0
Now these two vectors are just normalized versions of the vectors you calculated, i.e.
print(vec_1 * np.sqrt(2))  # sqrt(2) is the magnitude of (-1, 1); prints [-1.  1.]
print(vec_2 * np.sqrt(5))  # sqrt(5) is the magnitude of (1, 2); prints [-1. -2.], i.e. (1, 2) up to sign
So, bottom line: both sets of calculations are equivalent; NumPy just likes to normalize the results (and the overall sign of an eigenvector is arbitrary).
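If you want to convince yourself numerically, here is a small sketch (my addition, not part of the original answer): an eigenvector only has to satisfy A @ v = lambda * v, so any nonzero rescaling of NumPy's columns is still an eigenvector:
import numpy as np

A = np.array([[-2, 1], [2, -1]])
vals, vecs = np.linalg.eig(A)

for lam, v in zip(vals, vecs.T):   # the columns of vecs are the eigenvectors
    print(np.allclose(A @ v, lam * v))               # True
    print(np.allclose(A @ (3 * v), lam * (3 * v)))   # still True after rescaling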

What does tf.gather_nd intuitively do?

Can you intuitively explain or give more examples about tf.gather_nd for indexing and slicing into high-dimensional tensors in Tensorflow?
I read the API docs, but they are kept so concise that I find it hard to follow the function's concept.
Ok, so think about it like this:
You are providing a list of index values with which to index the provided tensor, and you get back the slices at those indices. The first dimension of the indices you provide enumerates the lookups you will perform. Let's pretend that the tensor is just a list of lists.
[[0]] means you want to get one specific slice (list) at index 0 in the provided tensor. Just like this:
[tensor[0]]
[[0], [1]] means you want to get two specific slices, at indices 0 and 1, like this:
[tensor[0], tensor[1]]
Now what if the tensor has more than one dimension? We do the same thing:
[[0, 0]] means you want to get the single element at index [0, 0], i.e. index 0 of the 0-th list. Like this:
[tensor[0][0]]
[[0, 1], [2, 3]] means you want to return the two elements at the provided indices. Like this:
[tensor[0][1], tensor[2][3]]
I hope that makes sense. I used Python indexing on a list of lists to show what the equivalent operation would look like in plain Python.
In short: you provide a tensor and indices representing locations in that tensor, and it returns the elements of the tensor corresponding to the indices you provide.
EDIT: An example
import tensorflow as tf

sess = tf.Session()
x = [[1, 2, 3], [4, 5, 6]]
y = tf.gather_nd(x, [[1, 1], [1, 2]])  # pick x[1][1] and x[1][2]
print(sess.run(y))  # [5 6]
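To highlight the slices-vs-elements distinction, here is one more example in the same TF 1.x style (my own addition, not from the original answer): when each index row has fewer components than the tensor's rank, gather_nd returns whole slices instead of scalars.
import tensorflow as tf

sess = tf.Session()
x = [[1, 2, 3], [4, 5, 6]]
rows = tf.gather_nd(x, [[1], [0], [1]])  # length-1 indices select whole rows
print(sess.run(rows))
# [[4 5 6]
#  [1 2 3]
#  [4 5 6]]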

repmat with interlace or Kronecker product in Tensorflow

Suppose I have a tensor:
A=[[1,2,3],[4,5,6]]
Which is a matrix with 2 rows and 3 columns.
I would like to replicate it, say, twice, to get the following tensor:
A2 = [[1,2,3],
[1,2,3],
[4,5,6],
[4,5,6]]
A plain tile (TensorFlow's analogue of Matlab's repmat) will clearly arrange the copies differently, so I tried the following code (which works):
A_tiled = tf.reshape(tf.tile(A, [1, 2]), [4, 3])
Unfortunately, it seems to work very slowly when the number of columns becomes large. Doing the equivalent in Matlab with a Kronecker product against a vector of ones (Matlab's kron) seems to be much faster.
Can anyone help?
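For reference, here is a minimal self-contained version of the tile-and-reshape construction described in the question (TF 1.x style to match the rest of this page; the output shape [4, 3] is hard-coded for this example):
import tensorflow as tf

A = tf.constant([[1, 2, 3], [4, 5, 6]])

# tile along the columns -> [[1,2,3,1,2,3],[4,5,6,4,5,6]], then reshape to 4 x 3,
# which interleaves the copies so each row appears twice in a row
A_tiled = tf.reshape(tf.tile(A, [1, 2]), [4, 3])

with tf.Session() as sess:
    print(sess.run(A_tiled))
    # [[1 2 3]
    #  [1 2 3]
    #  [4 5 6]
    #  [4 5 6]]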

Jacobian in Tensorflow

I see many people asking this question here, but I didn't see code that I can execute. I am trying to build two operations, one to get dOutput/dInput and one to get dOutput/dParameters. I tried
# gradient method 1: Jacobian of the action output w.r.t. the learnable parameters
jac_Action_wrt_Param = tf.pack(
    [tf.concat(1, [tf.reshape(tf.gradients(action_output[:, idx], param)[0], [1, -1])
                   for param in learnable_param_list])
     for idx in range(action_dim)],
    axis=1, name='jac_Action_wrt_Param')

# gradient method 2: Jacobian of the action output w.r.t. the state input
jac_Action_wrt_State = tf.pack(
    [tf.gradients(action_output[:, idx], state_input)[0] for idx in range(action_dim)],
    axis=1, name='jac_Action_wrt_State')
Here state is the input and action is the output. Both methods give None... What did I do wrong?