Question about the asterisk in curve-fitting code - numpy

In the following example, which fits a sigmoid function to data, I don't understand what the * in *popt on line 11 means:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(x, Beta_1, Beta_2):
    y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
    return y
popt, pcov = curve_fit(sigmoid, xdata, ydata)
x = np.linspace(1960, 2015, 55)
x = x/max(x)
plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data')
plt.plot(x,y, linewidth=3.0, label='fit')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
thank you in advance.

curve_fit returns popt as a NumPy array of the optimal parameter values, in this case two values (one for Beta_1 and one for Beta_2).
Putting * before a list or array in a function call unpacks it, so that each element is passed as a separate positional argument. Here, sigmoid(x, *popt) is equivalent to sigmoid(x, popt[0], popt[1]).
Example:
>>> # Sample list
>>> lst = [1, 2, 3]
>>> lst
[1, 2, 3]
>>> # Creating a function that requires 3 parameters
>>> def add(x, y, z):
...     return x + y + z
...
>>> add(*lst)
6
>>> add(lst)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: add() missing 2 required positional arguments: 'y' and 'z'
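Applied to the sigmoid from the question, the unpacking can be sketched as follows. This is a minimal illustration, not the original fit: the values in popt are made up to stand in for what curve_fit would return, and math.exp replaces np.exp so the snippet needs no third-party packages.

```python
import math

def sigmoid(x, beta_1, beta_2):
    """Logistic curve with steepness beta_1 and midpoint beta_2."""
    return 1 / (1 + math.exp(-beta_1 * (x - beta_2)))

# Hypothetical fitted parameters standing in for curve_fit's popt.
popt = [2.0, 0.5]

# The * spreads popt over the remaining positional parameters.
unpacked = sigmoid(0.5, *popt)
# Writing the arguments out by hand gives the same call.
explicit = sigmoid(0.5, popt[0], popt[1])

print(unpacked == explicit)
```

The same call works when popt is a NumPy array, since * accepts any iterable.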

Numpy.polyfit Not Returning Polynomial

I am trying to write a Python program in which the user enters a set of data and the program draws a graph with the line or polynomial that best fits the data.
This is the code:
from matplotlib import pyplot as plt
import numpy as np
x = []
y = []
x_num = 0
while True:
    sequence = int(input("Input 1 number in the sequence, type 9040321 to stop"))
    if sequence == 9040321:
        poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
        plt.plot(poly)
        plt.scatter(x, y, c="blue", label="data")
        plt.legend()
        plt.show()
        break
    else:
        y.append(sequence)
        x.append(x_num)
        x_num += 1
I tested it by entering 1, 2, 4, 8, each as a separate input. Matplotlib graphed the points properly; however, for a degree of 2, the output was the following image:
This is clearly not correct, but I am unsure what the problem is. I thought it had something to do with the degree, but when I change the degree to 3, it still does not fit. I am looking for a curve like y=sqrt(x) that goes through each of the points and, when that is not possible, the curve that fits best.
Edit: I added a print(poly) call, and for the input above it prints [0.75 0.05 1.05]. I do not know what to make of this.
Approximation by a second degree polynomial
np.polyfit returns the coefficients of a polynomial that passes close to the given points, highest power first. plt.plot(poly) therefore plots those three coefficients against their indices 0, 1, 2 rather than the fitted curve. To plot the polynomial as a smooth curve with matplotlib, you need to calculate many x,y pairs. Using np.linspace(start, stop, numsteps) for the xs, numpy's vectorization allows calculating all the corresponding ys in one go, e.g. ys = a * xs**2 + b * xs + c.
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='crimson', label='given points')
poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
xs = np.linspace(min(x), max(x), 100)
ys = poly[0] * xs ** 2 + poly[1] * xs + poly[2]
plt.plot(xs, ys, color='dodgerblue', label=f'$({poly[0]:.2f})x^2+({poly[1]:.2f})x + ({poly[2]:.2f})$')
plt.legend()
plt.show()
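As a possible shortcut, np.polyval can evaluate the coefficients instead of writing the powers out by hand. The sketch below uses the question's data (x = 0..3, y = 1, 2, 4, 8) and leaves out the plotting so it runs without a display.

```python
import numpy as np

x = [0, 1, 2, 3]
y = [1, 2, 4, 8]

# Coefficients come back highest power first: [a, b, c].
coeffs = np.polyfit(x, y, deg=2)

xs = np.linspace(min(x), max(x), 100)
# Manual evaluation, as in the answer above.
ys_manual = coeffs[0] * xs**2 + coeffs[1] * xs + coeffs[2]
# np.polyval evaluates the same polynomial for any degree.
ys_polyval = np.polyval(coeffs, xs)

print(np.round(coeffs, 2))
print(np.allclose(ys_manual, ys_polyval))
```

The printed coefficients match the [0.75 0.05 1.05] reported in the question, and xs with ys_polyval are what would be passed to plt.plot.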
Higher degree approximating polynomials
Given N points with distinct x values, a polynomial of degree N-1 can pass exactly through each of them. Here is an example with 7 points and polynomials of up to degree 6:
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='black', zorder=3, label='given points')
for degree in range(0, len(x)):
    poly = np.polyfit(x, y, deg=degree, rcond=None, full=False, w=None, cov=False)
    xs = np.linspace(min(x) - 0.5, max(x) + 0.5, 100)
    ys = sum(poly_i * xs**i for i, poly_i in enumerate(poly[::-1]))
    plt.plot(xs, ys, label=f'degree {degree}')
plt.legend()
plt.show()
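The claim about exact fits can also be checked numerically rather than by eye. The following sketch reuses the same seven points and records, for each degree, the largest distance between the fitted polynomial and the data. The name max_errors is introduced here for illustration only.

```python
import numpy as np

x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]

# Largest miss between the fitted polynomial and the data, per degree.
max_errors = {}
for degree in range(len(x)):
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = np.polyval(coeffs, x) - np.asarray(y)
    max_errors[degree] = float(np.max(np.abs(residuals)))

for degree, err in max_errors.items():
    print(f"degree {degree}: max error {err:.3g}")
```

Only the degree 6 fit should report an error that is zero up to floating-point rounding; lower degrees leave a visible gap at some of the points.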
Another example
x = [0, 1, 2, 3, 4]
y = [1, 1, 6, 5, 5]
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 4, 8]
coeffs = np.polyfit(x, y, 2)
print(coeffs)
poly = np.poly1d(coeffs)
print(poly)
x_cont = np.linspace(0, 4, 81)
y_cont = poly(x_cont)
plt.scatter(x, y)
plt.plot(x_cont, y_cont)
plt.grid(1)
plt.show()
Executing the code, you have the graph above and this is printed in the terminal:
[ 0.75 -1.45  1.75]
      2
0.75 x - 1.45 x + 1.75
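It seems to me that you had false expectations about the output of polyfit: it returns the polynomial's coefficients (highest power first), not points on the fitted curve, so the polynomial still has to be evaluated before plotting.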

'tuple' object has no attribute 'reshape'

I am using the dataset "ex1data1.txt", but when I run the code that converts it into a numpy array, I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-52-7c523f7ba9e1> in <module>()
1 # Converting loaded dataset into numpy array
2
----> 3 X = np.concatenate((np.ones(len(population)).reshape(len(population), 1), population.reshape(len(population),1)), axis=1)
4
5
AttributeError: 'tuple' object has no attribute 'reshape'
The code is given below:
import csv
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd
import numpy as np
# Loading Dataset
with open('ex1data1.txt') as csvfile:
    population, profit = zip(*[(float(row['Population']), float(row['Profit'])) for row in csv.DictReader(csvfile)])
# Creating DataFrame
df = pd.DataFrame()
df['Population'] = population
df['Profit'] = profit
# Plotting using Seaborn
sns.lmplot(x="Population", y="Profit", data=df, fit_reg=False, scatter_kws={'s':45})
# Converting loaded dataset into numpy array
X = np.concatenate((np.ones(len(population)).reshape(len(population), 1), population.reshape(len(population),1)), axis=1)
y = np.array(profit).reshape(len(profit), 1)
# Creating theta matrix , theta = [[0], [0]]
theta = np.zeros((2, 1))
# Learning rate
alpha = 0.1
# Iterations to be taken
iterations = 1500
# Updated theta and calculated cost
theta, cost = gradientDescent(X, y, theta, alpha, iterations)
I don't know how to solve this reshape problem. Can anyone tell how can I solve this problem?
From your definition, population is a tuple (zip(*...) yields tuples), and tuples have no reshape method. I'd suggest two options. The first is converting it to an array, i.e.
population = np.asarray(population)
Alternatively, you can use the DataFrame column .values attribute, which is essentially a numpy array:
X = np.concatenate((np.ones(len(population)).reshape(len(population), 1), df['Population'].values.reshape(len(population),1)), axis=1)
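A third possibility, sketched here under the assumption that only the two-column design matrix is needed, is np.column_stack, which avoids the explicit reshape calls. The numbers below are placeholders standing in for the Population column, not values read from the file.

```python
import numpy as np

# Placeholder values standing in for the Population column.
# zip(*...) in the question produces a tuple like this one.
population = (6.1101, 5.5277, 8.5186)

# Converting to an array gives access to reshape and other array methods.
pop = np.asarray(population, dtype=float)

# Stack a column of ones (the intercept term) next to the feature column.
X = np.column_stack((np.ones(len(pop)), pop))

print(X.shape)
```

column_stack treats each 1-D input as a column, so the result has one row per sample and two columns.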

In TensorFlow, what happens to a shared variable if get_variable is called within map_fn

I am trying to replace a for loop with map_fn, since the latter seems to improve loop efficiency.
My question: if the fn in map_fn calls get_variable() to create a new variable, how can I set reuse to True for the rest of the loop? Or is get_variable() only called once in map_fn?
def fn(x):
    y = tf.get_variable('y', [])
    return x * x
squares = tf.map_fn(fn, np.array([1, 2, 3, 4 ,5 ,6]))
# Out: [array([ 1, 4, 9, 16, 25, 36])]
sess.run([squares])
In [2]: def fn(x):
   ...:     y = tf.get_variable('y', [])
   ...:     print(y.name)
   ...:     return x * x
In [4]: import numpy as np
In [5]: squares = tf.map_fn(fn, np.array([1, 2, 3, 4 ,5 ,6]))
y:0
As we can see, if a print is inserted in fn, it prints only once when map_fn is called. fn is run a single time to build the graph, so get_variable() is only called once.

TypeError: float() argument must be a string or a number, array = np.array(array, dtype=dtype, order=order, copy=copy)

I'm applying K-means clustering to a data frame loaded from CSV and Excel files.
ref: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_iris.html#example-cluster-plot-cluster-iris-py
I try to run the code with my data from the csv file, data looks like:
DataFile
However, I receive the following error:
Traceback (most recent call last):
File "", line 1, in
runfile('/Users/nadiastraton/Documents/workspacePython/02450Toolbox_Python/Thesis/Scripts/Clustering/cluster3.py', wdir='/Users/nadiastraton/Documents/workspacePython/02450Toolbox_Python/Thesis/Scripts/Clustering')
File "/Applications/anaconda2/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/Applications/anaconda2/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
builtins.execfile(filename, *where)
File "/Users/cluster3.py", line 46, in
est.fit(x.as_matrix)
File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 812, in fit
X = self._check_fit_data(X)
File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 786, in _check_fit_data
X = check_array(X, accept_sparse='csr', dtype=np.float64)
File "/Applications/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 373, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
from sklearn.cluster import KMeans
np.random.seed(5)
centers = [[1, 1], [-1, -1], [1, -1]]
data=pd.read_csv('/DataVisualisationSample.csv')
print(data.head())
x = pd.DataFrame(data,columns = ['Post_Share_Count','Post_Like_Count','Comment_Count'])
y = pd.DataFrame(data,columns = ['Comment_Like_Count'])
print(x.info())
estimators = {'k_means_data_3': KMeans(n_clusters=3),
              'k_means_data_8': KMeans(n_clusters=12),
              'k_means_data_bad_init': KMeans(n_clusters=3, n_init=1,
                                              init='random')}
fignum = 1
for name, est in estimators.items():
    fig = plt.figure(fignum, figsize=(4, 3))
    plt.clf()
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)
    plt.cla()
    est.fit(x.as_matrix)
    labels = est.labels_
    ax.scatter(x[:, 2], x[:, 0], x[:, 1], c=labels.astype(np.int))
    ax.w_xaxis.set_ticklabels([])
    ax.w_yaxis.set_ticklabels([])
    ax.w_zaxis.set_ticklabels([])
    ax.set_xlabel('Post_Share_Count')
    ax.set_ylabel('Post_Like_Count')
    ax.set_zlabel('Comment_Count')
    fignum = fignum + 1
# Plot the ground truth
fig = plt.figure(fignum, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)
plt.cla()
for name, label in [('Popular', 0),
                    ('Not Popular', 1),
                    ('Least Popular', 2)]:
    ax.text3D(x[y == label, 2].mean(),
              x[y == label, 0].mean() + 1.5,
              x[y == label, 1].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.int)
ax.scatter(x[:, 2], x[:, 0], x[:, 1], c=y).astype(np.int)
ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Post_Share_Count')
ax.set_ylabel('Post_Like_Count')
ax.set_zlabel('Comment_Count')
plt.show()
I tried to fix the error by using est.fit(x.as_matrix) instead of est.fit(x), and c=labels.astype(np.int) instead of c=labels.astype(np.float) (all values in my file are integers).
However, changing from np.float to np.int does not fix it.
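One likely cause, judging only from the traceback, is that est.fit(x.as_matrix) passes the method itself rather than the array it returns, because the parentheses are missing. The sketch below reproduces that kind of failure on a small made-up DataFrame. It assumes a pandas version that provides DataFrame.to_numpy (as_matrix was removed in later pandas releases), and the data values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV data; column names follow the question.
df = pd.DataFrame({
    "Post_Share_Count": [1, 4, 2],
    "Post_Like_Count": [10, 3, 7],
    "Comment_Count": [0, 5, 2],
})

# Passing the bound method (no parentheses) cannot be converted to floats.
try:
    np.array(df.to_numpy, dtype=np.float64)
    method_failed = False
except TypeError:
    method_failed = True

# Calling the method yields a 2-D array that supports x[:, 2] indexing.
x = df.to_numpy(dtype=np.float64)

print(method_failed, x.shape, x[:, 2])
```

Under this reading, calling the conversion method and fitting on the resulting array would also make the later x[:, 2] style indexing valid, which a DataFrame does not support directly.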

Unsupported operand type(s) for ** or pow(): 'generator' and 'int'

What I want:
I am trying to read a list of 6,000 coordinates (ra and dec); for each of those coordinates there are 78 points around it. I am applying an angle (ang) and then trying to find the new RA and DEC. There seems to be a problem with z_sq=(x**2 + y**2), because I get the error Unsupported operand type(s) for ** or pow(): 'generator' and 'int'.
import matplotlib.pyplot as plt
import numpy as np
import pylab as py
coords=np.genfromtxt('HETDEX_reg.txt',dtype=None,usecols=(0,1,2),names= ('ra','dec','ang'))
ra=coords['ra']
dec=coords['dec']
ang=coords['ang']
coords=np.genfromtxt('Ifus_78_base.txt', dtype=None, usecols=(0,1), names=('xx', 'yy'))
xx=coords['xx']
yy=coords['yy']
for i in range(len(ang)):
    ang_new=360-ang[i]
    x= (xx[j] for j in range(len(xx)))
    y= (yy[j] for j in range(len(yy)))
    z_sq=(x**2 + y**2)
    z=np.sqrt(z_sq)
    x_new=(np.deg2rad(x))
    y_new=(np.deg2rad(y))
    Theta=py.arctan(x_new/y_new)
    Tau=90-ang[i]-Theta
    Tau_rad=np.deg2rad(Tau)
    Delta_Dec=z*py.sin(Tau_rad)
    DEC=dec[i]+Delta_Dec
    Delta_ra=z*py.cos(Tau_rad)
    RA=ra[i]+Delta_ra/(py.cos(DEC/206205))
    print DEC
    print RA
I should be getting a new set of RA and DEC 78 times (because there are 78 x and y points) for each original ra and dec.
So the problem is in:
x= (xx[j] for j in range(len(xx)))
y= (yy[j] for j in range(len(yy)))
z_sq=(x**2 + y**2)
Try:
In [29]: x=(j for j in range(10))
In [30]: z=x**2+x
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-c8ea7ceba90f> in <module>()
----> 1 z=x**2+x
TypeError: unsupported operand type(s) for ** or pow(): 'generator' and 'int'
In [31]: x
Out[31]: <generator object <genexpr> at 0xb2b411e4>
See the error: x is a generator, created by the (...) expression.
x=[j for j in range(10)] is a list (list comprehension), but that still does not work, because lists do not support ** either. x needs to be an array, something that implements the ** operator.
In [34]: x=np.array([j for j in range(10)])
In [35]: x**2
Out[35]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
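For the code in the question, this suggests dropping the generator expressions altogether: the fields read with np.genfromtxt (coords['xx'], coords['yy']) are already arrays. A minimal sketch, with made-up offsets in place of the file contents:

```python
import numpy as np

# Made-up offsets standing in for the 'xx' and 'yy' columns of the file.
xx = np.array([3.0, 0.0, -5.0])
yy = np.array([4.0, 2.0, 12.0])

# Arrays support ** elementwise, so no generator or inner loop is needed.
z = np.sqrt(xx**2 + yy**2)

# np.hypot computes the same distance directly.
print(z, np.allclose(z, np.hypot(xx, yy)))
```

With the real data, z would hold one distance per point (78 values), and the remaining steps in the loop can operate on whole arrays in the same way.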