I am trying to create a Python program in which the user inputs a set of data and the program outputs a graph with the line or polynomial that best fits the data.
This is the code:
from matplotlib import pyplot as plt
import numpy as np
x = []
y = []
x_num = 0
while True:
    sequence = int(input("Input 1 number in the sequence, type 9040321 to stop"))
    if sequence == 9040321:
        poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
        plt.plot(poly)
        plt.scatter(x, y, c="blue", label="data")
        plt.legend()
        plt.show()
        break
    else:
        y.append(sequence)
        x.append(x_num)
        x_num += 1
I tested it by entering 1, 2, 4, 8, each as a separate input. Matplotlib plotted the points properly; however, for degree 2, the output was the following image:
This is clearly not correct, but I am unsure what the problem is. I think it has something to do with the degree, but when I change the degree to 3, it still does not fit. I am looking for a curve like y=sqrt(x) that goes through each of the points and, when that is not possible, the line that fits best.
Edit: I added print(poly), and for the input above it gives [0.75 0.05 1.05]. I do not know what to make of this.
Approximation by a second degree polynomial
np.polyfit gives the coefficients of a polynomial close to the given points. To plot the polynomial as a smooth curve with matplotlib, you need to calculate a lot of x,y pairs. Using np.linspace(start, stop, numsteps) for the xs, numpy's vectorization allows calculating all the corresponding ys in one go. E.g. ys = a * x**2 + b * x + c.
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='crimson', label='given points')
poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
xs = np.linspace(min(x), max(x), 100)
ys = poly[0] * xs ** 2 + poly[1] * xs + poly[2]
plt.plot(xs, ys, color='dodgerblue', label=f'$({poly[0]:.2f})x^2+({poly[1]:.2f})x + ({poly[2]:.2f})$')
plt.legend()
plt.show()
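Writing out the terms by hand only works for one fixed degree. As a small sketch (the variable names are only for illustration), np.polyval evaluates the coefficient array returned by np.polyfit for any degree. It uses the four values from the question, 1, 2, 4, 8 at x = 0, 1, 2, 3, for which the question reports [0.75 0.05 1.05]; these are the coefficients a, b, c of a*x**2 + b*x + c, highest power first.

```python
import numpy as np

# Data from the question: the inputs 1, 2, 4, 8 get the x-values 0, 1, 2, 3
x = [0, 1, 2, 3]
y = [1, 2, 4, 8]

coeffs = np.polyfit(x, y, deg=2)  # highest power first: [a, b, c]
print(np.round(coeffs, 2))

# Dense x-values for a smooth curve
xs = np.linspace(min(x), max(x), 100)

# Two equivalent ways to evaluate a*x**2 + b*x + c on the dense grid
ys_manual = coeffs[0] * xs**2 + coeffs[1] * xs + coeffs[2]
ys_polyval = np.polyval(coeffs, xs)  # works unchanged for any degree

print(np.allclose(ys_manual, ys_polyval))
```

Passing xs and ys_polyval to plt.plot then draws the fitted curve, exactly as in the code above.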
Higher degree approximating polynomials
Given N points with distinct x-values, a polynomial of degree N-1 can pass exactly through each of them. Here is an example with 7 points and polynomials of up to degree 6:
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='black', zorder=3, label='given points')
for degree in range(0, len(x)):
    poly = np.polyfit(x, y, deg=degree, rcond=None, full=False, w=None, cov=False)
    xs = np.linspace(min(x) - 0.5, max(x) + 0.5, 100)
    ys = sum(poly_i * xs**i for i, poly_i in enumerate(poly[::-1]))
    plt.plot(xs, ys, label=f'degree {degree}')
plt.legend()
plt.show()
Another example
x = [0, 1, 2, 3, 4]
y = [1, 1, 6, 5, 5]
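The statement that N points can be matched exactly by a polynomial of degree N-1 can be checked numerically. The following is only a sketch using the five points listed just above; the names exact, approx and max_error are made up for the illustration.

```python
import numpy as np

x = [0, 1, 2, 3, 4]
y = [1, 1, 6, 5, 5]

# Degree N-1 = 4 for N = 5 points: the fitted polynomial interpolates the data
exact = np.polyfit(x, y, deg=len(x) - 1)
print(np.allclose(np.polyval(exact, x), y))

# A lower degree only approximates the points in the least-squares sense
approx = np.polyfit(x, y, deg=2)
max_error = np.max(np.abs(np.polyval(approx, x) - y))
print(max_error)
```

The first check should report that the degree-4 curve reproduces every y-value, while the degree-2 curve leaves a visible gap at some of the points.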
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 4, 8]
coeffs = np.polyfit(x, y, 2)
print(coeffs)
poly = np.poly1d(coeffs)
print(poly)
x_cont = np.linspace(0, 4, 81)
y_cont = poly(x_cont)
plt.scatter(x, y)
plt.plot(x_cont, y_cont)
plt.grid(1)
plt.show()
Executing the code, you have the graph above and this is printed in the terminal:
[ 0.75 -1.45 1.75]
      2
0.75 x - 1.45 x + 1.75
It seems to me that you had false expectations about the output of polyfit.
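To make that concrete, here is a small sketch (the names fitted and residuals are mine, not from the question). polyfit returns least-squares coefficients, so a degree 2 curve through four points generally does not hit the points; it only minimizes the sum of the squared gaps.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 2, 4, 8])

coeffs = np.polyfit(x, y, 2)   # the coefficients printed above
poly = np.poly1d(coeffs)       # callable polynomial built from them

fitted = poly(x)               # values of the parabola at the data x-values
residuals = y - fitted         # nonzero: the curve misses the points slightly

for xi, yi, fi in zip(x, y, fitted):
    print(f"x={xi}: data={yi}, fit={fi:.2f}")
```

Plotting only the three coefficients, as plt.plot(poly) does in the question, draws those three numbers as a line and not the parabola they describe.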
I would like to obtain a smooth curve going through specific points with integer coordinates. Instead of that I get straight line segments between the points. I tried interp1d(x,y,kind='cubic') and also CubicSpline, nothing works. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d,CubicSpline
x = np.arange(34)
y = [8,3,0,1,6,2,1,7,6,2,0,2,6,0,1,6,2,2,0,2,7,0,2,8,6,3,6,2,0,1,6,2,7,2]
f = CubicSpline(x, y)
plt.figure(figsize=(10,3))
plt.plot(x, y, 'o', x, f(x))
plt.show()
and here is the result:
Can you tell me how to get smooth curves instead?
Right now you are using only the original x-values to draw the curve. You need a new array with many more intermediate x-values. NumPy's np.linspace() creates such an array between a given minimum and maximum.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d, CubicSpline
y = [8, 3, 0, 1, 6, 2, 1, 7, 6, 2, 0, 2, 6, 0, 1, 6, 2, 2, 0, 2, 7, 0, 2, 8, 6, 3, 6, 2, 0, 1, 6, 2, 7, 2]
x = np.arange(len(y))
f = CubicSpline(x, y)
plt.figure(figsize=(10, 3))
xs = np.linspace(x.min(), x.max(), 500)
plt.plot(x, y, 'o', xs, f(xs))
plt.tight_layout()
plt.show()
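The reason the original plot looked like straight segments can be checked without plotting. This is a minimal sketch using only the first few y-values from the question: at the original x-values the spline returns the original data, so all the smoothness lives between the points.

```python
import numpy as np
from scipy.interpolate import CubicSpline

y = [8, 3, 0, 1, 6, 2, 1, 7]   # first few values from the question
x = np.arange(len(y))
f = CubicSpline(x, y)

# At the original x-values the spline just returns the original y-values,
# so plotting f(x) against x draws straight segments between the points.
print(np.allclose(f(x), y))

# Between the points the spline takes new values; these give the smooth curve.
xs = np.linspace(x.min(), x.max(), 200)
ys = f(xs)
print(xs.shape, ys.shape)
```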
Consider the following code:
import matplotlib.pyplot as plt
import numpy as np
from pylab import *
graph_data = [[0, 1, 2, 3], [5, 8, 7, 9]]
x = range(len(graph_data[0]))
y = graph_data[1]
fig, ax = plt.subplots()
alpha = 0.5
plt.plot(x, y, '-o',markersize=3, color=[1., alpha, alpha], markeredgewidth=0.0)
ax.fill_between(x, 0, y, facecolor=[1., alpha, alpha], interpolate=False)
plt.show()
filename = 'test1.pdf'
fig.savefig(filename, bbox_inches='tight')
It works fine. However, when I zoom in on the generated PDF, I can see two thin gray/black boundaries that separate the line:
I can see this when viewing in both Edge and Chrome. My question is, how can I get rid of the boundaries?
UPDATE: I forgot to mention that I was using Sage to generate the graph. It now seems to be a problem specific to Sage (and not to Python in general). This time I used native Python and got the correct result.
I could not reproduce it, but maybe you can try not plotting the line.
import matplotlib.pyplot as plt
import numpy as np
from pylab import *
graph_data = [[0, 1, 2, 3], [5, 8, 7, 9]]
x = range(len(graph_data[0]))
y = graph_data[1]
fig, ax = plt.subplots()
alpha = 0.5
plt.plot(x, y, 'o',markersize=3, color=[1., alpha, alpha])
ax.fill_between(x, 0, y, facecolor=[1., alpha, alpha], interpolate=False)
plt.show()
filename = 'test1.pdf'
fig.savefig(filename, bbox_inches='tight')
Consider the following MWE.
from pandas import DataFrame
from bokeh.plotting import figure
data = dict(x = [0,1,2,0,1,2],
            y = [0,1,2,4,5,6],
            g = [1,1,1,2,2,2])
df = DataFrame(data)
p = figure()
p.line( 'x', 'y', source=df[ df.g == 1 ] )
p.line( 'x', 'y', source=df[ df.g == 2 ] )
Ideally, I would like to compress the last two lines into one:
p.line( 'x', 'y', source=df.groupby('g') )
(Real life examples have a large and variable number of groups.) Is there any concise way to do this?
I just found out that the following works
gby = df.groupby('g')
gby.apply( lambda d: p.line( 'x', 'y', source=d ) )
(it has some drawbacks, though).
Any better idea?
I couldn't come up with a solution using df.groupby, so I used df.loc, but maybe multi_line is what you are after:
from pandas import DataFrame
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
data = dict(x = [0, 1, 2, 0, 1, 2],
            y = [0, 1, 2, 4, 5, 6],
            g = [1, 1, 1, 2, 2, 2])
df = DataFrame(data, index = data['g'])
dfs = [DataFrame(df.loc[i].values, columns = df.columns) for i in df['g'].unique()]
source = ColumnDataSource(dict(x = [df['x'].values for df in dfs], y = [df['y'].values for df in dfs]))
p = figure()
p.multi_line('x', 'y', source = source)
show(p)
Result:
This is Tony's solution slightly simplified.
import pandas as pd
from bokeh.plotting import figure
data = dict(x = [0, 1, 2, 0, 1, 2],
            y = [0, 1, 2, 4, 5, 6],
            g = [1, 1, 1, 2, 2, 2])
df = pd.DataFrame(data)
####################### So far as in the OP
gby = df.groupby('g')
p = figure()
x = [list( sdf['x'] ) for i,sdf in gby]
y = [list( sdf['y'] ) for i,sdf in gby]
p.multi_line( x, y )
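The two list comprehensions can also be written with pandas alone. This is just a sketch of building the nested lists; it deliberately does not import Bokeh, and the names xs and ys are chosen for the illustration. The result has the same shape as the x and y lists passed to p.multi_line above.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 1, 2, 0, 1, 2],
                       y=[0, 1, 2, 4, 5, 6],
                       g=[1, 1, 1, 2, 2, 2]))

gby = df.groupby('g')

# One inner list per group, in the order of the sorted group keys
xs = gby['x'].apply(list).tolist()
ys = gby['y'].apply(list).tolist()

print(xs)
print(ys)
```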
from pandas import DataFrame
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
data = dict(x = [0, 1, 2, 0, 1, 2],
            y = [0, 1, 2, 4, 5, 6],
            g = [1, 1, 1, 2, 2, 2])
df = DataFrame(data)
plt = figure()
for i, group in df.groupby(['g']):
    source = ColumnDataSource(group)
    plt.line('x','y', source=source, legend_group='g')
show(plt)
I use bar3d() to plot a 3D bar chart, and I'd like to flip the y axis. I've tried invert_yaxis(), but it seems to have no effect. I've also tried manually reversing the values in the list with [::-1], but that didn't help either. The 3D bar chart is still displayed in exactly the same way.
Any idea how I can flip the y axis?
Here's an example of how it's not working for me (not even with 3D line plots):
from matplotlib.pyplot import *
from mpl_toolkits.mplot3d.axes3d import Axes3D
fig1 = figure(1)
ax11 = subplot(2, 2, 1, projection='3d')
ax11.plot([1, 2, 3, 4], [1, 2, 3, 4])
ax12 = subplot(2, 2, 2, projection='3d')
ax12.invert_xaxis()
ax12.plot([1, 2, 3, 4], [1, 2, 3, 4])
ax21 = subplot(2, 2, 3)
ax21.plot([1, 2, 3, 4])
ax22 = subplot(2, 2, 4)
ax22.invert_xaxis()
ax22.plot([1, 2, 3, 4])
show()
And the plot looks like this: http://we.tl/cqSsecVy6P
Thanks,
Daniel
If I understand the question correctly, I think the problem is that matplotlib rotates the 3D plot. To remedy this, just set the initial viewing angle using ax.view_init(elev, azim). Taking the matplotlib hist3d demo, we just have:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x, y = np.random.rand(2, 100) * 4
hist, xedges, yedges = np.histogram2d(x, y, bins=4)
elements = (len(xedges) - 1) * (len(yedges) - 1)
xpos, ypos = np.meshgrid(xedges[:-1]+0.25, yedges[:-1]+0.25)
xpos = xpos.flatten()
ypos = ypos.flatten()
zpos = np.zeros(elements)
dx = 0.5 * np.ones_like(zpos)
dy = dx.copy()
dz = hist.flatten()
ypos_inv = ypos
ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b', zsort='average')
ax.view_init(ax.elev, ax.azim+90)
plt.show()
Here I have rotated the view by 90 degrees, which flips one of the axes but not the other.
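Stripped of the histogram, the rotation itself comes down to reading the current viewing angles and passing modified ones back to view_init. The sketch below assumes a headless Agg backend so it can run without a display; the names start_elev and start_azim are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Current viewing angles in degrees
start_elev, start_azim = ax.elev, ax.azim

# Keep the elevation and rotate the camera 90 degrees around the vertical axis
ax.view_init(start_elev, start_azim + 90)

print(start_azim, ax.azim)
```

Other multiples of 90 can be tried the same way to find the orientation that shows the bars from the desired side.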