I am trying to understand why the following code crashes my Colab session.
import numpy as np
import tensorflow as tf
x1 = np.random.rand(90000)
x2 = tf.random.uniform((90000,1)).numpy()
print(x1.shape, type(x1))
print(x2.shape, type(x2))
x1 - x2
I can see that memory is exploding which causes the crash but I was hoping someone can explain exactly why this is happening. I also understand that this has to do with broadcasting arrays in numpy and I am just wondering if this is expected behavior so I can avoid it in the future.
The fix is to np.squeze(x2, axis=1) so the vectors have the same shape but clearly there's something I don't understand about what numpy is doing under the hood. Any suggestions and clarifications welcome.
x1 has shape (90000,). x2 has shape (90000, 1). In the expression x1 - x2, broadcasting occurs (as you suspected), giving a result that has shape (90000, 90000). Such an array of floating point values requires 90000*90000*8 = 64800000000 bytes.
Related
I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem :
I'm running the following snippet of code and am not able to arrive at the same result that I am generating within the tutorial, which I have confirmed are the correct values via a connected scatterplot of x vs. loss_function(x) as seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras
def loss_function(x):
import math
return 4.0*math.cos(x-1)+np.divide(math.cos(2.0*math.pi*x),x)
# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)
# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)
for j in range(100):
# Perform minimization using the loss function and x_1
opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
# Perform minimization using the loss function and x_2
opt.minimize(lambda: loss_function(x_2), var_list=[x_2])
# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I draw a quick connected scatterplot to confirm (successfully) that the loss function that I using gets me back to the same graph provided by the example (seen in screenshot above)
# Generate loss_function(x) values for given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
losses.append(loss_function(p))
# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses
# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment :
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minima on the graphs RHS (when starting from x_2 = 0.3)
And here are the results from my environment, both of which move opposite the direction that they should be moving towards when seeking to minimize the loss value:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console / why the optimizer fails to minimize this loss for this simple toy example...?
I appreciate any suggestions that you might have after you've run the provided code in your own environments, many thanks in advance!!!
As it turned out, the difference in outputs arose from the default precision of tf.division() (vs np.division()) and tf.cos() (vs math.cos()) -- operations which were specified in (my transcribed, "custom") definition of the loss_function().
The loss_function() had been predefined in the body of the tutorial and when I "inspected" it using the inspect package ( using inspect.getsourcelines(loss_function) ) in order to redefine it in my own environment, the output of said inspection didn't clearly indicate that tf.division & tf.cos had been used instead of their NumPy counterparts (which my version of the code had used).
The actual difference is quite small, but is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
After swapping in tf.division() and tf.cos (as seen below) I was able to arrive at the same results as seen in the DC console.
Here is the code for the loss_function that will back in to the same results as seen in the console (screenshot) :
def loss_function(x):
import math
return 4.0*tf.cos(x-1)+tf.divide(tf.cos(2.0*math.pi*x),x)
I am using Numba to speed up a function and I bumped into the following problem.
When using the decorator #njit (or #jit) the behaviour of some numpy function is changed.
For example if I use the following function to calculate tanh
from numba import njit
import numpy as np
#njit
def check_tanh(z):
return np.tanh(z)
and I run it for real values of z I get the same as np.tanh(z) as it should be.
If I move instead parallel to the real axis but with an imaginary part, for example z = x+ 1.j, and increase x, the numpy tanh will converge to 1.+0.j, while check_tanh(z) will return a nan (on my computer this is happening when x>360).
Does anyone have an idea of what is going on and how can be fixed?
Thanks in advance!
Using tanh from cmath fixes the issue.
Seems that is still a problem in Numba
https://github.com/numba/numba/issues/2919
I got something wrong going on :
import numpy as np
import matplotlib.pyplot as plt
x = np.concatenate((np.linspace(0,1,100),np.linspace(1,2,50)));
f = np.power(x,2);
df = 2*x;
Df = np.gradient(f,x);
plt.plot(x,df,'r', x,Df,'b');plt.show()
This is what I get :
Otherwise things work ok if using linearly spaced array and not using argument x.
Any suggestions?
I think this is because numpy versions before 1.13 expect the "x" argument to be the constant grid spacing (see https://docs.scipy.org/doc/numpy-1.11.0/reference/generated/numpy.gradient.html#numpy.gradient). Even though the earlier versions expect a scalar dx, they do not check for this, and the result is np.gradient(f) / x, which is a valid division. This is pretty annoying since code written for numpy 1.13 may run on earlier versions with incorrect output and no errors.
I am trying to create a t-distribution by taking the mean of many samples from a normal distribution (and then estimating the shape with kernel density estimation).
For some reason, I am getting pretty different results when I compare what I get with a proper t-distribution. I don't understand what is going wrong, so I think I am confused about something.
Here is the code:
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import seaborn
inner_sample_size = 10
X = np.arange(-3, 3, 0.01)
results = [
np.mean(np.random.normal(size=inner_sample_size))
for _ in range(10000)
]
estimation = gaussian_kde(results)
plt.plot(X, estimation.evaluate(X))
t_samples = np.random.standard_t(inner_sample_size, 10000)
t_estimator = gaussian_kde(t_samples)
plt.plot(X, t_estimator.evaluate(X))
plt.ylabel("Probability density")
plt.show()
And here is the plot I get:
Where the orange line is numpy's own t-distribution, and the blue line is the one estimated by sampling.
Your assumption that the mean of Standard Normals has T distribution is incorrect. In fact, the mean of Standard Normals has Normal Distribution, which explains the shape of your blue graph. To generate one random variable T from a T distribution with k degrees of freedom, you first generate k+1 independent Standard Normals Z_i, i=0,...,k. You then compute
T = Z_0 / sqrt( sum(Z_i^2, i=1 to k)/k ).
The sum of squared Standard Normals sum(Z_i^2, i=1 to k) has Chi-Squared Distribution with k degrees of freedom, so if there is a pre-canned method to generate this, you should use it, since it's likely more efficient.
I'm having a bit of trouble with fitting a curve to some data, but can't work out where I am going wrong.
In the past I have done this with numpy.linalg.lstsq for exponential functions and scipy.optimize.curve_fit for sigmoid functions. This time I wished to create a script that would let me specify various functions, determine parameters and test their fit against the data. While doing this I noticed that Scipy leastsq and Numpy lstsq seem to provide different answers for the same set of data and the same function. The function is simply y = e^(l*x) and is constrained such that y=1 at x=0.
Excel trend line agrees with the Numpy lstsq result, but as Scipy leastsq is able to take any function, it would be good to work out what the problem is.
import scipy.optimize as optimize
import numpy as np
import matplotlib.pyplot as plt
## Sampled data
x = np.array([0, 14, 37, 975, 2013, 2095, 2147])
y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763, 0.003630962, 0.001485394, 0.000495131])
# function
fp = lambda p, x: np.exp(p*x)
# error function
e = lambda p, x, y: (fp(p, x) - y)
# using scipy least squares
l1, s = optimize.leastsq(e, -0.004, args=(x,y))
print l1
# [-0.0132281]
# using numpy least squares
l2 = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T,np.log(y))[0][0]
print l2
# -0.00313461628963 (same answer as Excel trend line)
# smooth x for plotting
x_ = np.arange(0, x[-1], 0.2)
plt.figure()
plt.plot(x, y, 'rx', x_, fp(l1, x_), 'b-', x_, fp(l2, x_), 'g-')
plt.show()
Edit - additional information
The MWE above includes a small sample of the dataset. When fitting the actual data the scipy.optimize.curve_fit curve presents an R^2 of 0.82, while the numpy.linalg.lstsq curve, which is the same as that calculated by Excel, has an R^2 of 0.41.
You are minimizing different error functions.
When you use numpy.linalg.lstsq, the error function being minimized is
np.sum((np.log(y) - p * x)**2)
while scipy.optimize.leastsq minimizes the function
np.sum((y - np.exp(p * x))**2)
The first case requires a linear dependency between the dependent and independent variables, but the solution is known analitically, while the second can handle any dependency, but relies on an iterative method.
On a separate note, I cannot test it right now, but when using numpy.linalg.lstsq, I you don't need to vstack a row of zeros, the following works as well:
l2 = np.linalg.lstsq(x[:, None], np.log(y))[0][0]
To expound a bit on Jaime's point, any non-linear transformation of the data will lead to a different error function and hence to different solutions. These will lead to different confidence intervals for the fitting parameters. So you have three possible criteria to use to make a decision: which error you want to minimize, which parameters you want more confidence in, and finally, if you are using the fitting to predict some value, which method yields less error in the interesting predicted value. Playing around a bit analytically and in Excel suggests that different kinds of noise in the data (e.g. if the noise function scales the amplitude, affects the time-constant or is additive) leads to different choices of solution.
I'll also add that while this trick "works" for exponential decay to 0, it can't be used in the more general (and common) case of damped exponentials (rising or falling) to values that cannot be assumed to be 0.