Assuming I have the following array: [1,1,1,2,2,40,60,70,75,80,85,87,95] and I want to create a histogram out of it based on the following bins - x<=2, [3<=x<=80], [x>=81].
If I do the following: arr.hist(bins=(0,2,80,100)) I get the bins to be at different widths (based on their x range). I want them to represent different size ranges but appear in the histogram at the same width. Is it possible in an elegant way?
I can think of adding a new column for this (holding the bin id that will be calculated based on the boundaries I want) but don't really like this solution..
Thanks!
Sounds like you want a bar graph; You could use bar:
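import numpy as np
import matplotlib.pyplot as plt
arr = np.array([1,1,1,2,2,40,60,70,75,80,85,87,95])
# bins are half-open [a, b) except the last, so for integer data the edges 3 and 81 give x<=2, 3<=x<=80, x>=81
h = np.histogram(arr, bins=(0,3,81,100))
plt.bar(range(3), h[0], width=1)  # bars are centred on 0, 1, 2
xlab = ['x<=2', '3<=x<=80', 'x>=81']
plt.xticks(range(3), xlab)
plt.show()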
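One detail worth checking before trusting the bar heights: np.histogram treats every bin as half-open, [a, b), except the last one, which also includes its right edge. A minimal sketch comparing two sets of edges on the integer data from the question (the variable names are only for illustration):

```python
import numpy as np

arr = np.array([1, 1, 1, 2, 2, 40, 60, 70, 75, 80, 85, 87, 95])

# Edges (0, 2, 80, 100): the value 2 falls into the second bin and 80 into the third.
counts_a, _ = np.histogram(arr, bins=(0, 2, 80, 100))

# Edges (0, 3, 81, 100): for integer data this matches x<=2, 3<=x<=80, x>=81.
counts_b, _ = np.histogram(arr, bins=(0, 3, 81, 100))

print(counts_a.tolist())  # [3, 6, 4]
print(counts_b.tolist())  # [5, 5, 3]
```

For non-integer data the edges would need more care, since a value such as 2.5 belongs to none of the three ranges as stated in the question.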
I've used pandas.read_csv to generate a 1000-row dataframe with 32 columns. I'm looking to plot a histogram or bar chart (depending on data type) of each column. For columns of type 'int64', I've tried doing matplotlib.pyplot.hist(df['column']) and df.hist(column='column'), as well as calling matplotlib.pyplot.hist on df['column'].values and df['column'].to_numpy(). Weirdly, they all take a really long time (>30s) and when I've allowed them to complete, I get unit-height bars in multiple colors, as if there's some sort of implicit grouping and they're all being separated into different groups. Any ideas about what I can do to get a normal histogram? Unfortunately I closed the charts so I can't show you an example right now.
Edit - this seems to be a much bigger problem with Int columns, and casting them to float fixes the problem.
Follow these two steps:
import pyplot from the Matplotlib library
call its "hist" function, which accepts a dataframe column as its argument
import matplotlib.pyplot as plt
plt.hist(df['column'], color='blue', edgecolor='black', bins=int(45/1))
Here's the source.
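Since the question asks for a histogram or a bar chart depending on the column type, here is a minimal sketch of that loop. The function name plot_columns and the demo frame are made up for illustration. The cast to float mirrors the workaround mentioned in the question's edit; why the integer columns misbehaved is not established here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.api.types import is_numeric_dtype


def plot_columns(df, bins=20):
    """Draw one figure per column; return {column name: (kind, figure)}."""
    result = {}
    for name in df.columns:
        col = df[name].dropna()
        fig, ax = plt.subplots()
        if is_numeric_dtype(col):
            # Hand hist a plain 1-D float array.
            ax.hist(col.to_numpy(dtype=float), bins=bins)
            kind = "hist"
        else:
            # Non-numeric columns: one bar per distinct value.
            col.value_counts().plot.bar(ax=ax)
            kind = "bar"
        ax.set_title(name)
        result[name] = (kind, fig)
    return result


# Made-up demo data, not the asker's dataframe.
demo = pd.DataFrame({"visits": [1, 2, 2, 3, 3, 3], "city": list("aabbbc")})
plots = plot_columns(demo)
kinds = {name: kind for name, (kind, _) in plots.items()}
print(kinds)
```

Each returned figure can then be shown or saved with fig.savefig(...) as needed.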
I am trying to plot a multi-axes line graph in Plotly and my data is based on the percentage (y-axis) v/s date (x-axis).
X and Y-axis coming from the database via pandas
Since Plotly doesn't understand the order of the string dates on the x-axis, it rearranges them automatically.
I am looking for a way to keep the x-axis static, with the dates in order, and have each trace plotted on top of it at the positions matching its dates.
static_x_axis = ['02-11-2021', '03-11-2021', '04-11-2021', '05-11-2021', '06-11-2021', '07-11-2021', '08-11-2021', '09-11-2021', '10-11-2021', '11-11-2021', '12-11-2021', '13-11-2021', '14-11-2021', '15-11-2021', '16-11-2021', '17-11-2021', '18-11-2021', '19-11-2021', '20-11-2021', '21-11-2021', '22-11-2021', '23-11-2021']
and the above list determines the x-axis mapping.
I tried using range, but it seems that it either does not support static mapping or maps all graphs from the 0th point.
Overall, I am looking for a way that either follows a static date range or at least does not break the current order of dates, as happened in the above graph.
Thanks in advance for your help.
From your question, your data is:
x: date as a string representation (i.e. categorical)
y: a number between 0 and 1 (a percentage)
three traces
You describe that x is unordered at the source and that it must be sorted on the x-axis.
The code below simulates a figure in this way,
then applies categorical axis sorting.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
s = pd.Series(pd.date_range("2-nov-2021", periods=40).strftime("%d-%m-%Y"))
fig = go.Figure(
    [
        go.Scatter(
            x=s.sample(10).sort_index().values,
            y=np.linspace(n/4, n/3, 10),
            mode="lines+markers+text",
        )
        for n in range(1,4)
    ]
).update_traces(texttemplate="%{y:.2f}", textposition="top center")
fig.show()
fig.update_layout(xaxis={"categoryorder": "array", "categoryarray": s.values})
fig.show()
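If there is no ready-made list of dates, the category array can be built from the data itself. A small sketch, assuming the strings are in day-month-year form as in the question (the helper name ordered_categories and the sample values are hypothetical):

```python
from datetime import datetime


def ordered_categories(date_strings, fmt="%d-%m-%Y"):
    """Return the unique date strings sorted chronologically."""
    return sorted(set(date_strings), key=lambda s: datetime.strptime(s, fmt))


# Made-up sample: unordered, with a duplicate and a date in the next month.
raw = ["10-11-2021", "02-11-2021", "23-11-2021", "02-11-2021", "03-12-2021"]
category_array = ordered_categories(raw)
print(category_array)
# A plain string sort would wrongly put "03-12-2021" before "10-11-2021".
```

The resulting list can be passed as "categoryarray" in the update_layout call shown above.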
I have the following dataset and I want to create a plot that compares two columns with each other.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
ds=pd.read_csv('http://bit.ly/uforeports') # My dataset
ds.head(5) # Only the first 5 rows to show
ds1= ds.head(4).drop(['Colors Reported','State'],axis=1) # Dropping unnecessary columns
print(ds1)
Now I wanna compare "City" and "Shape Reported" with help of plotting. I found something with Pandas but this is not so elegant!
x=ds.loc[0:100,['State']]
y=ds.loc[0:100,['Shape Reported']]
x.apply(pd.value_counts).plot(kind='bar', subplots=True)
y.apply(pd.value_counts).plot(kind='bar', subplots=True)
Do you know a better solution with Matplotlib to this problem?
This is what I want
It's not exactly clear how you want to compare them.
The simplest way of drawing a bar chart is:
df['State'].value_counts().plot.bar()
df['Shape Reported'].value_counts().plot.bar()
If you just want to do it for the first 100 rows as in your example, just add head(100):
df['State'].head(100).value_counts().plot.bar()
df['Shape Reported'].head(100).value_counts().plot.bar()
EDIT:
To compare the two values you can plot a bivariate distribution plot. This is easily done with seaborn:
import seaborn as sns
sns.displot(df,x='State', y='Shape Reported', height=6, aspect=1.33)
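Another way to compare two categorical columns, if a table of counts is enough, is a cross-tabulation. This is only a sketch on made-up rows that reuse the two column names from the question, not the real UFO dataset:

```python
import pandas as pd

# Made-up rows that only mimic the two column names from the question.
df = pd.DataFrame({
    "State": ["NY", "NY", "CA", "CA", "CA", "TX"],
    "Shape Reported": ["DISK", "LIGHT", "DISK", "DISK", "LIGHT", "OTHER"],
})

# Rows are states, columns are shapes, cells are the number of reports.
table = pd.crosstab(df["State"], df["Shape Reported"])
print(table)
```

With matplotlib installed, table.plot.bar(stacked=True) would draw one stacked bar per state from this table.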
I have this array with 10 values.
I understand that the values in my array have many digits after the decimal point.
But I noticed there's a value in the top left corner of the plot.
Does anyone know what it is and how to remove it?
thank you in advance.
the array:
0.00409960926442099
0.00409960926442083
0.004099609264420652
0.004099609264420653
0.004099609264420585
0.0040996092644205884
0.004099609264420545
0.004099609264420517
0.004099609264420514
0.004099609264420513
As your values are all very close together, the usual ticks would all be the same. For example, if you use '%.6f' as the tick format, you'd get '0.004100' for each of the ticks. That would not be very helpful. Therefore, matplotlib labels the yticks with a base number '4.099609264420e-3' together with a multiplier '1e-16'. So, every real ytick is the base plus the multiplier times the displayed tick value.
To get rid of these strange numbers, you have to re-evaluate what exactly you want to achieve with your plot. If you'd set some y-limits (e.g. plt.ylim(0.004099, 0.004100)), you'd get a quite dull horizontal line. Note that 1e-16 is very close to the maximum precision you can get using standard floating-point math.
Here is some demo code to show how it would look with the '%.6f' format:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.plot([0.00409960926442099, 0.00409960926442083, 0.004099609264420652, 0.004099609264420653,
          0.004099609264420585, 0.0040996092644205884, 0.004099609264420545, 0.004099609264420517,
          0.004099609264420514, 0.004099609264420513])
plt.gca().yaxis.set_major_formatter(mtick.FormatStrFormatter('%.6f'))
plt.tight_layout()
plt.show()
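If the aim is only to make the extra text in the corner disappear, the default tick formatter can be told not to use an offset. A minimal sketch, assuming the axis still uses matplotlib's default ScalarFormatter; with differences around 1e-16 the remaining tick labels will probably all look alike, so this hides the text rather than making the plot more informative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [0.00409960926442099, 0.00409960926442083, 0.004099609264420652,
          0.004099609264420653, 0.004099609264420585, 0.0040996092644205884,
          0.004099609264420545, 0.004099609264420517, 0.004099609264420514,
          0.004099609264420513]

fig, ax = plt.subplots()
ax.plot(values)

# Switch off the additive offset on the y-axis labels.
ax.ticklabel_format(axis="y", useOffset=False)

offset_enabled = ax.yaxis.get_major_formatter().get_useOffset()
print(offset_enabled)
```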
The method plt.hist() in pyplot has a way to create a 'step-like' plot style when calling
plt.hist(data, histtype='step')
but the 'ordinary' methods that plot raw data without processing (plt.plot(), plt.scatter(), etc.) apparently do not have style options to obtain the same result. My goal is to plot a given set of points using that style, without making histogram of these points.
Is that achievable with standard library methods for plotting a given 2-D set of points?
I also think that there is at least one hack (generating a fake distribution which would have histogram equal to our data) and a 'low-level' solution to draw each segment manually, but none of these ways seems favorable.
Maybe you are looking for drawstyle="steps".
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.cumsum(np.random.randn(10))
plt.plot(data, drawstyle="steps")
plt.show()
Note that this is slightly different from histograms, because the lines do not go to zero at the ends.
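To control where each step changes level, matplotlib also offers the step function with a where argument ("pre", "post" or "mid"), which corresponds to the drawstyles "steps-pre", "steps-post" and "steps-mid". A short sketch on random data, with the raw points overlaid for comparison:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(10)
y = np.cumsum(rng.standard_normal(10))

fig, ax = plt.subplots()
# where="mid" centres each horizontal segment on its x value.
(line,) = ax.step(x, y, where="mid")
# Overlay the raw points to see how the steps relate to them.
ax.plot(x, y, "o", alpha=0.5)

drawstyle = line.get_drawstyle()
print(drawstyle)  # steps-mid
```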