Scikit-learn NMF returns NaN values - numpy

I am working with a 6650254x5650 sparse matrix whose values are in numpy.float64 format.
I am using the NMF implementation from scikit-learn as follows:
from sklearn.decomposition import NMF

model = NMF(n_components=12, init='random', random_state=0, max_iter=20, l1_ratio=0.01)
W = model.fit_transform(X_all_sparse)  # X_all_sparse is factorized as W @ H
H = model.components_
W
It seems that for larger values of n_components I get W matrices where all elements are NaN. For example, it happens whenever n_components is larger than 7, and yet it works when n_components is 19! I wonder what can cause this, and what other libraries can handle such big matrices efficiently so that I can benchmark against them.
Update
If others have a similar problem: in the meantime I am using the implicit library; a minimal sketch of how I call it is below.
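This is only a sketch (it assumes implicit >= 0.5, where fit() expects a user-items CSR matrix, and that X_all_sparse can be interpreted as user-item interactions; older versions expected an item-user matrix):
import implicit
from scipy.sparse import csr_matrix

# implicit works on a sparse user-item interaction matrix in CSR format
user_items = csr_matrix(X_all_sparse)

# ALS factorization with 12 latent factors, roughly matching n_components=12 above
als = implicit.als.AlternatingLeastSquares(factors=12, regularization=0.01, iterations=20)
als.fit(user_items)

W = als.user_factors  # shape: (n_users, 12)
H = als.item_factors  # shape: (n_items, 12)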

Related

TensorFlow Probability (tfp) equivalent of np.quantile()

I am trying to find a TensorFlow equivalent of np.quantile(). I have found tfp.stats.quantiles() (tfp stands for TensorFlow Probability). However, its interface is a bit different from that of np.quantile().
Consider the following example:
import tensorflow_probability as tfp
import tensorflow as tf
import numpy as np
inputs = tf.random.normal((1, 4096, 4))
print("NumPy")
print(np.quantile(inputs.numpy(), q=0.9, axis=1, keepdims=False))
I am not sure from the TFP docs how I could write the above using tfp.stats.quantiles(). I tried checking the source code of both methods, but it didn't help.
Let me try to be more helpful here than I was on GitHub.
There is a difference in behavior between np.quantile and tfp.stats.quantiles. The key difference here is that numpy.quantile will
Compute the q-th quantile of the data along the specified axis.
where q is the
Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.
and tfp.stats.quantiles
Given a vector x of samples, this function estimates the cut points by returning num_quantiles + 1 cut points
So you need to tell tfp.stats.quantiles how many quantiles you want and then select out the qth quantile. If it isn't clear how to do this from the API alone, the source for tfp.stats.quantiles (for v0.19.0) shows how to get a return structure similar to NumPy's.
For completeness, setting up a virtual environment with
$ cat requirements.txt
numpy==1.24.2
tensorflow==2.11.0
tensorflow-probability==0.19.0
allows us to run
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
inputs = tf.random.normal((1, 4096, 4), dtype=tf.float64)
q = 0.9
numpy_quantiles = np.quantile(inputs.numpy(), q=q, axis=1, keepdims=False)
tfp_quantiles = tfp.stats.quantiles(
    inputs, num_quantiles=100, axis=1, interpolation="linear"
)[int(q * 100)]
assert np.allclose(numpy_quantiles, tfp_quantiles.numpy())
print(f"{numpy_quantiles=}")
# numpy_quantiles=array([[1.31727661, 1.2699167 , 1.28735237, 1.27137588]])
print(f"{tfp_quantiles=}")
# tfp_quantiles=<tf.Tensor: shape=(1, 4), dtype=float64, numpy=array([[1.31727661, 1.2699167 , 1.28735237, 1.27137588]])>
You could also use tfp.stats.percentile(inputs, 90., axis=1, keepdims=False) -- the only difference from the quantile call is the 90. replacing the 0.90 (a quick check is below).
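For completeness, a sketch of that check under the same environment as above; I pass interpolation="linear" explicitly so it matches np.quantile's default linear interpolation:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

inputs = tf.random.normal((1, 4096, 4), dtype=tf.float64)

numpy_quantiles = np.quantile(inputs.numpy(), q=0.9, axis=1, keepdims=False)
tfp_percentile = tfp.stats.percentile(
    inputs, 90.0, axis=1, interpolation="linear", keepdims=False
)
assert np.allclose(numpy_quantiles, tfp_percentile.numpy())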

Why does keras (SGD) optimizer.minimize() not reach global minimum in this example?

I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem:
I'm running the following snippet of code and am not able to arrive at the same result that I am generating within the tutorial, which I have confirmed are the correct values via a connected scatterplot of x vs. loss_function(x) as seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras

def loss_function(x):
    import math
    return 4.0*math.cos(x-1)+np.divide(math.cos(2.0*math.pi*x),x)

# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)

# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I drew a quick connected scatterplot to confirm (successfully) that the loss function I am using gives me back the same graph provided by the example (seen in the screenshot above):
# Generate loss_function(x) values for given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
    losses.append(loss_function(p))

# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses

# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment:
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minimum on the graph's RHS (when starting from x_2 = 0.3).
And here are the results from my environment, both of which move in the opposite direction from where they should be heading when minimizing the loss:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console, and why the optimizer fails to minimize this loss for this simple toy example.
I appreciate any suggestions that you might have after you've run the provided code in your own environments, many thanks in advance!!!
As it turned out, the difference in outputs arose from the precision of tf.divide() and tf.cos() versus the np.divide() and math.cos() calls used in my transcribed, "custom" definition of loss_function().
The loss_function() had been predefined in the body of the tutorial, and when I "inspected" it using the inspect package (via inspect.getsourcelines(loss_function)) in order to redefine it in my own environment, the output of that inspection didn't clearly indicate that tf.divide and tf.cos had been used instead of their NumPy/math counterparts (which my version of the code used).
The actual difference is quite small, but is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
After swapping in tf.divide() and tf.cos() (as seen below), I was able to arrive at the same results as in the DC console.
Here is the code for the loss_function that gets back to the same results as seen in the console (screenshot):
def loss_function(x):
    import math
    return 4.0*tf.cos(x-1) + tf.divide(tf.cos(2.0*math.pi*x), x)
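To see how small the discrepancy actually is, here is a quick comparison of the two definitions at a single point (loss_np and loss_tf are my own illustrative names, not from the tutorial):
import math
import numpy as np
import tensorflow as tf

def loss_np(x):
    # my original transcription: plain math/NumPy ops
    return 4.0*math.cos(x-1) + np.divide(math.cos(2.0*math.pi*x), x)

def loss_tf(x):
    # tutorial version: float32 TensorFlow ops
    return 4.0*tf.cos(x-1) + tf.divide(tf.cos(2.0*math.pi*x), x)

x0 = 6.0
print(loss_np(x0))                                  # plain-Python/NumPy result
print(float(loss_tf(tf.Variable(x0, np.float32))))  # TensorFlow float32 result; differs slightly in the trailing digits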

How to use torch to speed up some common computations?

I am trying to perform some common computations, like matrix multiplication, but without gradient computation. An example of my computation is like this:
import numpy as np
from scipy.special import logsumexp
var = 1e-8
a = np.random.randint(0,10,(128,20))
result = logsumexp(a, axis=1) / 2. + np.log(np.pi * var)
I want to use torch (gpu) to speed up the computation. Here is the code
import numpy as np
import torch
var = 1e-8
a = np.random.randint(0,10,(128,20))
a = torch.from_numpy(a).cuda()
result = torch.logsumexp(a, dim=1)/ 2. + np.log(np.pi*var)
But I have some questions:
Could the above code speed up the computation? I don't know if it works.
Do I need to convert all values into torch.tensor, like from var to torch.tensor(var).cuda() and from np.log(np.pi*var) to a torch.tensor?
Do I need to convert all tensors into gpu by myself, especially for some intermediate variable?
If the above code doesn't work, how can I speed up the computation with gpu?
You could use torch only to do the computations.
import torch
# Passing the device argument creates the tensor directly on the GPU, so a separate move operation is saved
# Convert to float so it can be used with logsumexp
a = torch.randint(0, 10, (128, 20), device="cuda").float()
result = torch.logsumexp(a, dim=1) / 2.
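If you need to start from an existing NumPy array (as in your snippet) rather than generate data directly on the GPU, something along these lines should work (a sketch; the scalar offset is added as a plain Python float):
import numpy as np
import torch

var = 1e-8
a_np = np.random.randint(0, 10, (128, 20))

# Move the existing array to the GPU and cast to float for logsumexp
a = torch.from_numpy(a_np).float().cuda()

# The scalar offset broadcasts over the result
result = torch.logsumexp(a, dim=1) / 2. + float(np.log(np.pi * var))

# Bring the result back to the CPU / NumPy if needed
result_np = result.cpu().numpy()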
Answers to some of your questions:
Could the above code speed up the computation?
It depends. If you have many matrix multiplications, using the GPU can give a speedup.
Do I need to convert all values into torch.tensor, like from var to torch.tensor(var).cuda() and from np.log(np.pi*var) to a torch.tensor?
Yes
Do I need to convert all tensors into gpu by myself, especially for some intermediate variable?
Only leaf variables need to be converted; intermediate variables will be placed on the device where the operations are performed. For example, if a and b are on the GPU, then as a result of the operation c = a + b, c will also be on the GPU, as the short check below illustrates.
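A minimal check of that device-placement behavior (sketch; requires a CUDA-capable GPU):
import torch

# Leaf tensors created explicitly on the GPU
a = torch.randn(128, 20, device="cuda")
b = torch.randn(128, 20, device="cuda")

# The intermediate result inherits the device of its inputs
c = a + b
print(c.device)  # cuda:0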

Why does Tensorflow output these simple results?

I'm brand new to Tensorflow, but I'm trying to figure out why these results end in ...001, ...002, etc.
I'm following the tutorial here: https://www.tensorflow.org/get_started/get_started
Code:
"""This is a Tensorflow learning script."""
import tensorflow as tf
sess = tf.Session()
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x + b
sess.run(tf.global_variables_initializer())  # same as creating an init op and then calling sess.run(init) on two separate lines
print(sess.run(linear_model, {x: [1, 2, 3, 4]}))
It looks like a simple math function where, if I were using 2 as an input, it would be (0.3 * 2) + -0.3 = 0.3.
Output:
[ 0. 0.30000001 0.60000002 0.90000004]
I would expect:
[ 0. 0.3 0.6 0.9]
That's probably floating point imprecision, because you declared your variables with the tf.float32 dtype. You could use tf.round (https://www.tensorflow.org/api_docs/python/tf/round), but it doesn't seem to have a round-to-the-nearest-decimal-place capability yet. For that, check out the response in: tf.round() to a specified precision. A common workaround is sketched below.
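A sketch of that workaround (my own illustration of the scale-round-rescale trick, not copied from the linked answer):
import tensorflow as tf

def round_to_decimals(t, decimals=1):
    # Scale up, round to the nearest integer, then scale back down
    multiplier = tf.constant(10.0 ** decimals, dtype=t.dtype)
    return tf.round(t * multiplier) / multiplier

# Applied to the linear_model above:
# print(sess.run(round_to_decimals(linear_model), {x: [1, 2, 3, 4]}))
# should display as [ 0.   0.3  0.6  0.9]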
The issue is that a floating point variable (like tf.float32) simply cannot store exactly 0.3 because it is stored in binary. It's like trying to store exactly 1/3 in decimal: it'd be 0.33..., but you'd have to go out to infinity to get the exact number (which isn't possible in our mortal realm!).
See the python docs for more in depth review of the subject.
Tensorflow doesn't have a way to deal with decimal numbers yet (as far as I know)! But once the numbers are returned to Python you could round and then convert to a Decimal, as sketched below.
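For instance, a small sketch of that Python-side cleanup:
from decimal import Decimal
import numpy as np

output = np.array([0.0, 0.30000001, 0.60000002, 0.90000004], dtype=np.float32)

# Round on the Python side, then convert each value to an exact Decimal
cleaned = [Decimal(str(round(float(v), 1))) for v in output]
print(cleaned)  # [Decimal('0.0'), Decimal('0.3'), Decimal('0.6'), Decimal('0.9')]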

How to change 0.0 - 1.0 class predictions to integer [duplicate]

I was doing multi-class classification using Keras. It contained 5 output classes. I converted the single class vector to a matrix using one-hot encoding and built a model. Now, to evaluate the model, I want to convert the 5-class probabilistic result back to a single column.
I am getting this as output in numpy array format:
# columns: 0, 1, 2, 3, 4
[[5.35433665e-02 1.72592481e-05 1.49291719e-03 9.44392741e-01 5.53713820e-04]
 [1.97096306e-05 2.08907949e-08 3.11666554e-07 1.40611945e-07 9.99979794e-01]
 [9.99999225e-01 2.42999278e-07 1.58917388e-07 7.84497018e-08 2.85837785e-07]
 [7.09977685e-05 1.02068476e-09 1.38186664e-07 9.99928594e-01 2.73126261e-07]
 [1.29937407e-05 2.49388819e-07 9.99986231e-01 4.76015231e-07 7.39421040e-08]]
I want to convert this matrix to:
[3, 4, 0, 3, 2]
It seems like you are looking for np.argmax:
import numpy as np
class_labels = np.argmax(class_prob, axis=1) # assuming you have n-by-5 class_prob
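Applied to the matrix in the question, this recovers exactly the labels you listed:
import numpy as np

class_prob = np.array([
    [5.35433665e-02, 1.72592481e-05, 1.49291719e-03, 9.44392741e-01, 5.53713820e-04],
    [1.97096306e-05, 2.08907949e-08, 3.11666554e-07, 1.40611945e-07, 9.99979794e-01],
    [9.99999225e-01, 2.42999278e-07, 1.58917388e-07, 7.84497018e-08, 2.85837785e-07],
    [7.09977685e-05, 1.02068476e-09, 1.38186664e-07, 9.99928594e-01, 2.73126261e-07],
    [1.29937407e-05, 2.49388819e-07, 9.99986231e-01, 4.76015231e-07, 7.39421040e-08],
])

class_labels = np.argmax(class_prob, axis=1)
print(class_labels)  # [3 4 0 3 2]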