Another follow-up to numpy.linalg.eig and loading - numpy

Anyway, I'm still struggling with the task of 1) calculating the eigenvalues and eigenvectors of a matrix, 2) saving them to a file, and 3) loading the data back. I can do steps 1 and 2, but no matter what I try, step 3 always throws an error. See "np.savetxt triggers ValueError. Why?" and "Writing and Reading Eigenvalues & Eigenvectors, follow up".
This time I tried saving the eigenvalues and eigenvectors separately, so they're both arrays. Unfortunately, I still get the pickle error, even when loading just the eigenvalues.
eigs = np.linalg.eig(P @ K @ P)
eigvals = np.real(eigs[0])
eigvecs = np.real(eigs[1])
np.savetxt('eigvals.txt', eigvals)
np.savetxt('eigvecs.txt', eigvecs)
Sure enough, eigvals and eigvecs show up in the Variable Explorer as arrays of sizes 10000 and 10000x10000, respectively. And when I manually open eigvals.txt, I see a long list of floats, as expected. But when I then try np.load('eigvals.txt', 'r'), I still get the pickle error (ValueError: Cannot load file containing pickled data when allow_pickle=False). What's wrong now?
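The mismatch is between writer and reader: np.savetxt writes plain text, while np.load only reads NumPy's binary .npy/.npz files (or pickles), so it cannot parse eigvals.txt. Note also that the second positional argument of np.load is mmap_mode, not a file-open mode like 'r'. A minimal sketch of two consistent round trips, using a small random symmetric matrix as an assumed stand-in for P @ K @ P:

```python
import numpy as np

# Small stand-in for the 10000x10000 matrix in the question (assumption).
rng = np.random.default_rng(0)
M = rng.random((5, 5))
M = M + M.T  # symmetric, so the eigenpairs are real

eigvals, eigvecs = np.linalg.eig(M)
eigvals, eigvecs = np.real(eigvals), np.real(eigvecs)

# Option 1: text files. Pair np.savetxt with np.loadtxt, not np.load.
np.savetxt('eigvals.txt', eigvals)
np.savetxt('eigvecs.txt', eigvecs)
vals_txt = np.loadtxt('eigvals.txt')
vecs_txt = np.loadtxt('eigvecs.txt')

# Option 2: NumPy's binary .npy format, which is what np.load understands.
np.save('eigvals.npy', eigvals)
np.save('eigvecs.npy', eigvecs)
vals_npy = np.load('eigvals.npy')
vecs_npy = np.load('eigvecs.npy')
```

For a 10000 x 10000 eigenvector matrix the binary route is also far more compact and faster than text, and np.savez can store both arrays in a single file.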
Thanks

In MediaPipe, is it possible to see augmented landmarks rendered in real time?

So I am using MediaPipe Holistic Solutions to extract keypoints from the body, hands and face, and I am using the data from this extraction for my calculations just fine. The problem is that I want to check whether my data augmentation works, but I am unable to see it in real time. An example of how the keypoints are extracted:
lh_arr = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten()
If I then do, let's say, lh_arr[10:15]*2, I can't use this new data in the draw_landmarks function, because lh_arr is not of class 'mediapipe.python.solution_base.SolutionOutputs'. Is there a way to get draw_landmarks() to use an np array instead, or can I convert the np array back into the correct format? I have tried to get the flattened array back into a dictionary with the same format as results, but it did not work. Nor can I augment the results directly, as they are unsupported operand types.

How does np.array() work internally?

I've written my own tensor library and a corresponding Python binding. I've made sure that iterating through my tensor implementation works exactly as it does in NumPy. I also made sure that important methods such as __len__, __getitem__, __setitem__, etc. all behave the way NumPy expects. So I expect
t = my_tensor.ones((4, 4))
print(t) # works
a = np.array(t)
print(a) # becomes a 32 dimension array
to give me a 4x4 matrix. Instead, it gives me a 4x4x1x1... array (32 dims in total). I've run out of ways to debug this problem without knowing how NumPy performs the conversion internally. How does np.array work internally? I can't locate the function in NumPy's source code, nor can I find useful information on the web.
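Roughly, np.array() first asks the object for an array directly (via the buffer protocol, __array_struct__, __array_interface__, or an __array__ method). Only if none of these exists does it treat the object as a nested sequence, calling len() and indexing repeatedly until it reaches items that are not sequences. If indexing your tensor down to a single element still returns a tensor-like object with __len__/__getitem__, NumPy keeps descending until it hits its maximum number of dimensions (32 in NumPy 1.x), which would explain the 4x4x1x1... shape. This discovery logic lives in NumPy's C sources (array_coercion.c under src/multiarray), not in a Python-level function. A minimal sketch of the usual fix, assuming a hypothetical MyTensor that stores a flat list plus a shape, is to implement __array__:

```python
import numpy as np


class MyTensor:
    """Hypothetical stand-in for the custom tensor: flat row-major storage plus a shape."""

    def __init__(self, values, shape):
        self.values = [float(v) for v in values]
        self.shape = tuple(shape)

    @classmethod
    def ones(cls, shape):
        size = 1
        for dim in shape:
            size *= dim
        return cls([1.0] * size, shape)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this before falling back to __len__/__getitem__ probing.
        # copy=False means "no copy allowed" (NumPy 2 convention); our data
        # lives in a Python list, so a copy is unavoidable and we refuse.
        if copy is False:
            raise ValueError("MyTensor cannot be converted without copying")
        return np.array(self.values, dtype=dtype).reshape(self.shape)


t = MyTensor.ones((4, 4))
a = np.array(t)  # shape (4, 4), not a 32-dimensional array
```

In a real binding, exposing __array_interface__ or the buffer protocol instead would let NumPy view the tensor's memory without copying; __array__ is simply the easiest hook to add.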
Have you tried looking at the official NumPy documentation? https://numpy.org/doc/stable/contents.html
Questions as specific as this one are usually answered by looking at the original library documentation (e.g. https://numpy.org/doc/stable/user/quickstart.html#array-creation).

Tensorflow word2vec InvalidArgumentError: Assign requires shapes of both tensors to match

I am using this code to train a word2vec model. I am trying to train it incrementally by using saver.restore(), feeding new data after restoring the model. Since the vocabulary sizes of the old data and the new data are not the same, I got an exception like this:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [28908,200] rhs shape= [71291,200]
Here, 71291 is the vocabulary size of the old data and 28908 that of the new data.
The code reads the vocabulary words from the train_data file and constructs the network model using the size of that vocabulary. I thought that if I could set the same vocabulary size for my old and new data, I could solve this problem.
So, my question is: can I do that in this code? As far as I understand, I cannot reach the skipgram_word2vec() function.
Or is there any other way of solving this issue in this code besides what I had in mind? If it is not possible using this code, I will try other ways for my purpose.
Any help is appreciated.
Having taken a look at the source of word2vec_optimized.py, I'd say you will need to change the code there. It works by opening a text file right up front as "training data". For your purposes, you have to change the build_graph method so that it accepts an option to set all of that data (words, counts, words_per_epoch, current_epoch, total_words_processed, examples, labels, opts.vocab_words, opts.vocab_counts, opts.words_per_epoch) at initialization, rather than reading it from a text file.
Then you need to merge the two text files and load them once to produce the vocabulary. Then save all the data above and use it to restore the network on each subsequent run.
If you use more than two texts, however, you need to include all the text you plan to use in that first pass that produces the vocabulary.
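The idea of building one vocabulary up front and reusing it on every run can be sketched in plain Python. This is not word2vec_optimized.py's API: build_vocab, save_vocab, load_vocab and encode are hypothetical helpers, and mapping unseen words to an "UNK" id is an assumption borrowed from common word2vec setups. Since the embedding matrix has one row per vocabulary entry, fixing the vocabulary also fixes the [vocab_size, 200] shape that the restore step checks.

```python
import collections
import json

UNK = "UNK"  # hypothetical id 0 reserved for out-of-vocabulary words


def build_vocab(corpora, min_count=1):
    """Count words over *all* corpora at once, so every later run shares one id space."""
    counts = collections.Counter()
    for text in corpora:
        counts.update(text.split())
    words = [UNK] + sorted(w for w, c in counts.items() if c >= min_count and w != UNK)
    return {w: i for i, w in enumerate(words)}


def save_vocab(vocab, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f)


def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(text, vocab):
    """Map tokens to ids; words missing from the saved vocabulary become UNK."""
    unk_id = vocab[UNK]
    return [vocab.get(w, unk_id) for w in text.split()]


old_text = "the cat sat on the mat"
new_text = "the dog sat on the log"
vocab = build_vocab([old_text, new_text])  # built once from the merged texts
save_vocab(vocab, "vocab.json")
vocab = load_vocab("vocab.json")           # reloaded on each later run
old_ids = encode(old_text, vocab)
new_ids = encode(new_text, vocab)
```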

Errors on mean and sigma differ depending on whether I fit with or without normalization. Why?

When I fitted a histogram with a Gaussian, the errors on the mean and sigma seemed fine. You can look at it here.
But when I first normalized the histogram and then fitted it with a Gaussian, the parameter values were exactly the same as in the previous case, while the errors on the mean and sigma were about as large as the values themselves, or larger.
One possible reason is that the fit takes the error as 1/sqrt(n); after normalizing, n decreases, and hence the error increases.
Please let me know what is happening and how I can fix it.
You probably want to call
hist->Sumw2()
before rescaling the histogram. Otherwise, the uncertainty on each bin content is just the square root of that bin content, which is a huge relative error for bin contents smaller than 1, as is the case after rescaling. Sumw2() makes the histogram store the sum of squared weights in each bin, not only the bin contents (i.e. the sum of weights in each bin).
See also the documentation of Sumw2() for further details (and also the explanation of weights on the top of the TH1 documentation page).
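A short worked check of the answer's point, assuming a bin filled with N unit-weight entries that is then scaled by a factor c (for example c = 1/N_tot when normalizing; N and c are notation introduced here):

```latex
\begin{aligned}
y &= cN \\
\text{with Sumw2:}\quad \sigma_y &= \sqrt{\sum_{i=1}^{N} c^2} = c\sqrt{N},
  &\quad \frac{\sigma_y}{y} &= \frac{1}{\sqrt{N}} \\
\text{without Sumw2:}\quad \sigma_y &= \sqrt{y} = \sqrt{cN},
  &\quad \frac{\sigma_y}{y} &= \frac{1}{\sqrt{cN}} = \frac{1}{\sqrt{c}}\cdot\frac{1}{\sqrt{N}}
\end{aligned}
```

With Sumw2 the relative error is unchanged by the scaling, as it should be. Without it, normalizing with c = 1/1000 inflates every relative bin error by 1/\sqrt{c} \approx 31.6, which propagates into the fitted errors on the mean and sigma while leaving the fitted values themselves unchanged.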

Error when computing eigenvalues of a scipy LinearOperator: "gmres did not converge"

I'm trying to solve a large eigenvalue problem with SciPy where the matrix A is dense, but I can compute its action on a vector without having to assemble A explicitly. So, in order to avoid memory issues when the matrix A gets big, I'd like to use the sparse solver scipy.sparse.linalg.eigs with a LinearOperator that implements this action.
Applying eigs to an explicit numpy array A works fine. However, if I apply eigs to a LinearOperator instead then the iterative solver fails to converge. This is true even if the matvec method of the LinearOperator is simply matrix-vector multiplication with the given matrix A.
A minimal example illustrating the failure is attached below (I'm using shift-invert mode because I am interested in the smallest few eigenvalues). This computes the eigenvalues of a random matrix A just fine, but fails when applied to a LinearOperator that is directly converted from A. I tried to fiddle with the parameters for the iterative solver (v0, ncv, maxiter) but to no avail.
Am I missing something obvious? Is there a way to make this work? Any suggestions would be highly appreciated. Many thanks!
Edit: I should clarify what I mean by "make this work" (thanks, Dietrich). The example below uses a random matrix for illustration. However, in my application I know that the eigenvalues are almost purely imaginary (or almost purely real if I multiply the matrix by 1j). I'm interested in the 10-20 smallest-magnitude eigenvalues, but the algorithm doesn't behave well (i.e., never stops even for small-ish matrix sizes) if I specify which='SM'. Therefore I'm using shift-invert mode by passing the parameters sigma=0.0, which='LM'. I'm happy to try a different approach so long as it allows me to compute a bunch of smallest-magnitude eigenvalues.
from scipy.sparse.linalg import eigs, LinearOperator, aslinearoperator
import numpy as np
# Set a seed for reproducibility
np.random.seed(0)
# Size of the matrix
N = 100
# Generate a random matrix of size N x N
# and compute its eigenvalues
A = np.random.random_sample((N, N))
eigvals = eigs(A, sigma=0.0, which='LM', return_eigenvectors=False)
print(eigvals)
# Convert the matrix to a LinearOperator
A_op = aslinearoperator(A)
# Try to solve the same eigenproblem again.
# This time it produces an error:
#
# ValueError: Error in inverting M: function gmres did not converge (info = 1000).
eigvals2 = eigs(A_op, sigma=0.0, which='LM', return_eigenvectors=False)
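In shift-invert mode, eigs has to apply (A - sigma*I)^{-1} to vectors. When A is a plain ndarray, SciPy can factorize A - sigma*I directly; for a general LinearOperator it cannot, so it falls back to solving each system iteratively with GMRES, which evidently does not converge here and produces the "gmres did not converge" message. eigs accepts an OPinv argument for exactly this operator. A sketch that supplies it, assuming you have some accurate solver for (A - sigma*I)x = b; the LU factorization below only works because this toy A is explicit, and apply_shift_inverse is a hypothetical name:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, aslinearoperator, eigs

np.random.seed(0)
N = 100
A = np.random.random_sample((N, N))
A_op = aslinearoperator(A)
sigma = 0.0

# Factor (A - sigma*I) once. A matrix-free code would instead wrap its own
# (e.g. preconditioned) solver for (A - sigma*I) x = b here.
lu_piv = lu_factor(A - sigma * np.eye(N))


def apply_shift_inverse(b):
    """Return x solving (A - sigma*I) x = b."""
    return lu_solve(lu_piv, b)


OPinv = LinearOperator((N, N), matvec=apply_shift_inverse, dtype=A.dtype)

# With OPinv supplied, eigs uses it instead of building a GMRES-based inverse.
eigvals_op = eigs(A_op, k=6, sigma=sigma, which='LM', OPinv=OPinv,
                  return_eigenvectors=False)
print(eigvals_op)
```

The accuracy of the returned eigenvalues depends on how accurately OPinv solves the shifted systems, so a loose iterative solver in its place may need tighter tolerances than GMRES's defaults.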
I tried running your code without passing the sigma parameter to eigs(), and it ran without problems (see the eigs() docs for what sigma means). I didn't see the benefit of it in your example.
eigs can already find the smallest eigenvalues directly: set which='SM'.