Pandas time series plot xticklabel [duplicate] - pandas

I am trying to plot time series data and label the x-axis by month.
These are the first few rows of my dataframe:
Time Value
2012-01-01 00:00:00 1.223
2012-01-01 00:00:30 2.132
2012-01-01 00:01:00 1.417
2012-01-01 00:01:30 1.767
And my code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import MonthLocator
%matplotlib inline
df = pd.read_csv('data.csv', index_col=0)
value = df.value
plt.figure(figsize=(15,10))
ax = value.plot(x_compat=True)
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
plt.show()
As suggested in other answers, I already added the x_compat=True argument. However, the plot shows no ticks or tick labels on the x-axis at all. How can I fix this?
I also tried a second approach, plotting with matplotlib directly:
plt.figure(figsize=(15,10))
plt.plot(df.index, df.value)
plt.gca().xaxis.set_major_locator(MonthLocator(bymonthday=15))
plt.gca().xaxis.set_major_formatter(DateFormatter("%M"))
But I still can't see month tick labels. Nothing is shown on the x-axis.
Answer: I needed to convert df.index with pd.to_datetime first.
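The fix can be sketched as follows. This is a minimal example under stated assumptions, not the asker's actual script: the data is made up (daily values over six months, so that monthly ticks have something to land on), the CSV is read from an in-memory string instead of data.csv, and the column is accessed as df["Value"] to match the header shown in the question. Two further details in the question's code would also hide month labels even with a datetime index: NullFormatter suppresses all tick labels, and "%M" is the strftime code for minutes, while "%m" and "%b" are the month codes.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter, MonthLocator, date2num

# Made-up stand-in for data.csv: one row per day, header "Time,Value"
dates = pd.date_range("2012-01-01", "2012-06-30", freq="D")
csv_text = pd.DataFrame({"Time": dates, "Value": range(len(dates))}).to_csv(index=False)

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without this step the index holds plain strings, and date locators find nothing to tick
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots(figsize=(15, 10))
ax.plot(df.index, df["Value"])
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))  # one tick mid-month
month_fmt = DateFormatter("%Y-%m")                       # %m is month; %M would be minutes
ax.xaxis.set_major_formatter(month_fmt)

# Format one date by hand to see what a tick label will look like
sample_label = month_fmt(date2num(datetime(2012, 1, 15)))
plt.close(fig)
```

Passing parse_dates=True to read_csv together with index_col=0 is an alternative to the explicit pd.to_datetime call.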

Related

Seaborn boxplot for classification with pandas wide to long [duplicate]

I have data that I would like to train an ML classifier on. The data is in wide format. I'd like to draw a boxplot with seaborn using sns.boxplot(x='variable', y='value', hue='target', data=df_train). How do I reshape the data so that I can pass it to sns.boxplot?
Sample data
import pandas as pd
from sklearn import datasets
X, y = datasets.make_classification(n_samples=100, n_features=5, random_state=1)
df_train = pd.DataFrame(X)
df_train['y']=y
pd.melt is what you want to use.
import seaborn as sns
dfg_train = df_train.melt(id_vars='y')
sns.boxplot(x='variable', y='value', hue='y', data=dfg_train)
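To see what melt does to the shape, here is a small sketch on a made-up two-row frame (the values are illustrative only, not taken from the question). Each feature column is stacked into a 'variable'/'value' pair while the target column y is repeated alongside it.

```python
import pandas as pd

# Tiny made-up wide frame: two feature columns (0 and 1) plus the target 'y'
wide = pd.DataFrame({0: [0.1, 0.2], 1: [1.1, 1.2], "y": [0, 1]})

# id_vars stays as-is; every other column becomes rows of (variable, value)
long = wide.melt(id_vars="y")
print(long)
```

The long frame has one row per (sample, feature) pair, which is the layout sns.boxplot expects when x, y and hue are given as column names.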

Plot a barchart after pandas.value_counts() [duplicate]

I have a dataframe with several columns and I need to plot a graph based on the counts of the values in the 'Total' column.
I ran the following code:
df['Total'].value_counts()
The output is as follows:
2 10
20 15
4 8
8 20
This means that the number 2 appears in the 'Total' column 10 times, the number 20 appears 15 times, and so on.
How do I plot a bar chart with the numbers themselves on the x-axis, in ascending order, and the number of occurrences on the y-axis? The x-axis should read 2 -> 4 -> 8 -> 20.
What are the next steps after:
%matplotlib inline
import matplotlib.pyplot as plt
Consider this as an example, where the list stands in for your 'Total' column: [2,2,2,2,2,20,20,20,20,4,4,4,8,8,8,8,8]
import pandas as pd
import matplotlib.pyplot as plt
total = [2,2,2,2,2,20,20,20,20,4,4,4,8,8,8,8,8]
df = pd.DataFrame(total, columns=['total'])
fig, ax = plt.subplots()
# value_counts() orders by frequency; sort_index() puts the x-axis in ascending order (2, 4, 8, 20)
df['total'].value_counts().sort_index().plot(ax=ax, kind='bar', ylabel='frequency')
plt.show()
This gives a bar chart with the values 2, 4, 8 and 20 on the x-axis in ascending order and their frequencies on the y-axis.

Show values of hbar values in matplotlib [duplicate]

I want to show the values in a horizontal bar chart, but I can't get them to render to the right of the bars.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
index = ['average', '50th', '95th', '99.9th', 'max']
aaa = [3180, 2153, 9172, 9368, 9432]
bbb = [3857, 3367, 11638, 14555, 14731]
ccc = [740, 716, 1326, 1927, 2591]
df = pd.DataFrame({'aaa': aaa, 'bbb': bbb, 'ccc': ccc}, index=index)
ax = df.plot.barh()
How can I display the value next to each bar, and is it also possible to save the figure to a file, e.g. a .png?
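One possible approach, sketched here under assumptions: it relies on Axes.bar_label, which requires matplotlib 3.4 or newer, and the output file name hbar_values.png is a hypothetical choice. The data is the frame from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

index = ['average', '50th', '95th', '99.9th', 'max']
df = pd.DataFrame(
    {
        'aaa': [3180, 2153, 9172, 9368, 9432],
        'bbb': [3857, 3367, 11638, 14555, 14731],
        'ccc': [740, 716, 1326, 1927, 2591],
    },
    index=index,
)

ax = df.plot.barh()

# pandas creates one BarContainer per column; bar_label annotates every bar in it
labels = []
for container in ax.containers:
    labels.extend(ax.bar_label(container, padding=3))

ax.margins(x=0.1)  # extra room on the right so the longest labels are not clipped

# Hypothetical output location; any path with a .png suffix works
out_path = os.path.join(tempfile.gettempdir(), "hbar_values.png")
ax.figure.savefig(out_path, bbox_inches="tight")
plt.close(ax.figure)
```

On older matplotlib versions without bar_label, looping over ax.patches and calling ax.text at each bar's width is a common fallback.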

pandas.groupby --> DatetimeIndex --> groupby year

I come from JavaScript and am struggling with pandas. I need to sort data by its DatetimeIndex and then group it by year.
The CSV looks like this (I shortened it because it has more than 1300 entries):
date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539
I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')
n = len(df)  # rows are sorted by value, so positions correspond to percentiles
df = df.iloc[int(round(n * 0.025)):int(round(n * 0.975))]  # decimal point, not comma
df = df.sort_index()
Now I need to group my DatetimeIndex by year so that I can plot it properly with matplotlib. This is where I am stuck:
def draw_bar_plot():
    df_bar = df
    fig, ax = plt.subplots()
    fig.savefig('bar_plot.png')
    return fig
I really don't know how to group by year.
Doing something like:
print(df_bar.groupby(df_bar.index).first())
leads to:
value
date
2016-05-19 19736
2016-05-20 17491
2016-05-26 18060
2016-05-27 19997
2016-05-28 19044
... ...
2019-11-23 146658
2019-11-24 138875
2019-11-30 141161
2019-12-01 142918
2019-12-03 158549
How do I group this by year? Ideally, please also explain how to plot the grouped data correctly as a bar chart with matplotlib.
This will group the data by year
df_year_wise_sum = df.groupby([df.index.year]).sum()
This line of code will give a bar plot
df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
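As a self-contained check of the grouping step, here is a small sketch with made-up rows (not the asker's page-view data). Grouping by df.index.year collapses every row that shares a calendar year into one group.

```python
import pandas as pd

# Made-up sample rows for illustration only
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=pd.to_datetime(
        ["2016-05-09", "2016-12-31", "2017-05-10", "2018-05-11", "2018-06-01"]
    ),
)
df.index.name = "date"

# DatetimeIndex exposes .year, which can be used directly as the group key
yearly = df.groupby(df.index.year)["value"].sum()
yearly_totals = {int(year): int(total) for year, total in yearly.items()}
print(yearly_totals)
```

The same pattern works with .mean() instead of .sum(), or with a list of keys such as [df.index.year, df.index.month] for a year-by-month breakdown.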

How can I convert pandas date time xticks to readable format?

I am plotting a time series with a datetime index. The plot needs to be a particular size for the journal format. Consequently, the xticks are not readable, since they span many years.
Here is a data sample
2013-02-10 0.7714492098202259
2013-02-11 0.7709101833765016
2013-02-12 0.7704911332770049
2013-02-13 0.7694975914173087
2013-02-14 0.7692108921323576
The data is a series with a datetime index and spans from 2013 to 2016. I use
data.plot(ax = ax)
to plot the data.
How can I format my xticks to read like '13 instead of 2013?
There seems to be some incompatibility between pandas and matplotlib formatters/locators when it comes to dates. See, for example, these questions:
Pandas plot - modify major and minor xticks for dates
Pandas Dataframe line plot display date on xaxis
I'm not entirely sure why matplotlib formatters work in some cases and not in others. Because of those issues, however, the most robust solution is to plot the graph with matplotlib directly instead of using the pandas plotting function.
This allows you to use locators and formatters just as in the matplotlib example.
The solution to the question would then look as follows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2013-01-01", "2017-06-20" )
y = np.cumsum(np.random.normal(size=len(dates)))
s = pd.Series(y, index=dates)
fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
yearFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearFmt)
plt.show()
According to this example, you can do the following
import matplotlib.dates as mdates
yearsFmt = mdates.DateFormatter("'%y")
years = mdates.YearLocator()
ax = df.plot()
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
Full worked example below.
Add the word value as a header line so that pd.read_clipboard puts the dates into the index:
value
2013-02-10 0.7714492098202259
2014-02-11 0.7709101833765016
2015-02-12 0.7704911332770049
2016-02-13 0.7694975914173087
2017-02-14 0.7692108921323576
Then read in data and convert index
df = pd.read_clipboard(sep=r'\s+')  # raw string avoids an invalid-escape warning
df.index = pd.to_datetime(df.index)
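For readers who cannot use the clipboard, the same steps can be sketched with an in-memory string. This is an illustrative variant, not the answerer's exact code: it swaps pd.read_clipboard for pd.read_csv on a StringIO buffer and plots with matplotlib directly. Because the header line has one field fewer than the data rows, pandas uses the first column as the index.

```python
import io
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

raw = """value
2013-02-10 0.7714492098202259
2014-02-11 0.7709101833765016
2015-02-12 0.7704911332770049
2016-02-13 0.7694975914173087
2017-02-14 0.7692108921323576
"""

# The header has one field, the rows have two, so the dates become the index
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots()
ax.plot(df.index, df["value"])
years_fmt = mdates.DateFormatter("'%y")  # two-digit year with a leading apostrophe
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(years_fmt)

# Format one date by hand to see what a tick label will look like
sample_label = years_fmt(mdates.date2num(datetime(2013, 2, 10)))
plt.close(fig)
```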