Getting the xlabels to reflect the DataFrame column - pandas

I have the attached DataFrame that I am trying to plot. However, the x values are the DataFrame's index, when I want them to be the actual episode names. Any recommendations?

By first selecting the column from the DataFrame and then plotting it, you get a Series that no longer knows about EpisodeTitle, so you end up not "carrying over" those labels with you. You can fix this in two different ways:
Plot from the DataFrame, not from the Series:
top_episodes.plot(x="EpisodeTitle", y="Viewship(MM)", kind="bar")
Alternatively, set the index to be the EpisodeTitle, and then perform your column selection/plotting.
top_episodes.set_index("EpisodeTitle")["Viewship(MM)"].plot(kind="bar")
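To see the difference between the two fixes above and the original behavior, here is a minimal sketch. The episode titles and viewership numbers are made up for illustration, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question.
top_episodes = pd.DataFrame({
    "EpisodeTitle": ["Pilot", "The Reunion", "Finale"],
    "Viewship(MM)": [11.2, 9.8, 13.5],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Selecting the column first: only the default 0, 1, 2 index is left as labels.
top_episodes["Viewship(MM)"].plot(kind="bar", ax=axes[0])
series_labels = [t.get_text() for t in axes[0].get_xticklabels()]

# Fix 1: plot from the DataFrame and name the x column.
top_episodes.plot(x="EpisodeTitle", y="Viewship(MM)", kind="bar", ax=axes[1])
frame_labels = [t.get_text() for t in axes[1].get_xticklabels()]

# Fix 2: make the titles the index, then select and plot.
top_episodes.set_index("EpisodeTitle")["Viewship(MM)"].plot(kind="bar", ax=axes[2])
indexed_labels = [t.get_text() for t in axes[2].get_xticklabels()]
```

Comparing series_labels with frame_labels shows that only the last two plots carry the episode names.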

Related

Pandas plot with only two dates as index

I have this simple pandas DataFrame with dates as the index:
df=pd.DataFrame({'a':[20,30,12],'b':[15,18,18]},index=['2021-10-7','2021-10-8','2021-10-9'])
df.index=pd.to_datetime(df.index)
When I try to make a simple plot with only two dates on the x-axis:
df.iloc[-2:].plot()
it gives me a plot with a lot of numbers on the axis.
The plot works fine if I plot the entire DataFrame: df.plot()
Thank you for your support.
You can add the line below after setting your index to make it work:
df.index.freq = 'D'
So your entire code looks like this:
df=pd.DataFrame({'a':[20,30,12],'b':[15,18,18]},index=['2021-10-7','2021-10-8','2021-10-9'])
df.index = pd.to_datetime(df.index)
df.index.freq = 'D'
Alternatively, you can use date_range as shown below.
Please note that this works only if your data, like the sample provided, has a daily frequency. You need to adjust it when the frequency is different.
df=pd.DataFrame({'a':[20,30,12],'b':[15,18,18]},index=['2021-10-7','2021-10-8','2021-10-9'])
df.index=pd.date_range(start = '2021-10-07', end='2021-10-09')
Both approaches give you the same plot you mentioned in the question (similar to the bottom one there).
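As a variation on the first approach, here is a small sketch that infers the frequency from the index instead of hard-coding 'D'. It uses the question's dates, written zero-padded, and assumes the index is regularly spaced with at least three entries, which pd.infer_freq needs.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [20, 30, 12], "b": [15, 18, 18]},
    index=pd.to_datetime(["2021-10-07", "2021-10-08", "2021-10-09"]),
)

# An index built with to_datetime carries no frequency information.
freq_before = df.index.freq

# Infer the spacing from the data; for these dates this is daily ("D").
df.index.freq = pd.infer_freq(df.index)

# The two-row slice keeps the frequency of the index it was cut from.
last_two = df.iloc[-2:]
```

Calling last_two.plot() afterwards is the step the answer above describes as giving the expected date axis.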

Export row-wise colored dataframe to excel using pandas styler

I am having some trouble using the pandas Styler. I am trying to color the rows of a DataFrame when a certain column contains a certain value, and at the end I want to write the colored DataFrame to an Excel file.
Thanks.
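One possible approach, sketched under assumptions: the column name "status", the value "late", and the yellow fill are all hypothetical, since the question does not give them. Styler.apply with axis=1 calls the function once per row, and Styler.to_excel writes the styled result (this needs jinja2 and an Excel engine such as openpyxl installed).

```python
import pandas as pd

# Hypothetical data; the question does not show its DataFrame.
df = pd.DataFrame({"task": ["a", "b", "c"], "status": ["ok", "late", "ok"]})


def highlight_row(row, column="status", value="late", color="yellow"):
    """Return one CSS string per cell so the whole row gets the same fill."""
    css = f"background-color: {color}" if row[column] == value else ""
    return [css] * len(row)


def export_colored(frame, path):
    """Apply the row-wise style and write the result to an Excel file."""
    styled = frame.style.apply(highlight_row, axis=1)
    styled.to_excel(path, engine="openpyxl", index=False)


# The styles each row would receive, computed without touching Excel.
row_styles = [highlight_row(row) for _, row in df.iterrows()]
```

Calling export_colored(df, "colored.xlsx") would then write the file, with the file name again being just an example.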

Applying functions to DataFrame columns in plots

I'd like to apply functions to columns of a DataFrame when plotting them.
I understand that the standard way to plot when using Pandas is the .plot method.
How can I do math operations within this method, say for example multiply two columns in the plot?
Thanks!
Series have a plot method as well, so you can compute the result first and plot it directly:
(df['col1'] * df['col2']).plot()
Otherwise, if you need to do this more than once, the usual approach is to make a new column in your DataFrame:
df['newcol'] = df['col1'] * df['col2']
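Both suggestions can be tried on a toy DataFrame. The sketch below uses made-up numbers, and it swaps in DataFrame.assign as a non-mutating alternative to adding the column in place; the log column is only there to show that any elementwise function works the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs anywhere
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# One-off: the product is a Series, which has its own plot method.
product = df["col1"] * df["col2"]
product.plot()

# Reusable: assign returns a copy with the derived columns, leaving df untouched.
derived = df.assign(newcol=product, log_col1=np.log(df["col1"]))
ax = derived.plot(y=["newcol", "log_col1"])
```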

Stacking multiple plots together with a single x-axis

Suppose I have multiple time-dependent variables and I want to plot them all together, stacked one on top of another like the image below. How would I do so in matplotlib? Currently, when I try plotting them, they appear as multiple independent plots.
EDIT:
I have a Pandas dataframe with K columns corresponding to dependent variables and N rows corresponding to observed values for those K variables.
Sample code:
df = get_representation(mat)  # df is the pandas DataFrame
for i in range(len(df.columns)):
    plt.plot(df.iloc[:, i])
plt.show()
I would like to plot them all one on top of another.
You could just stack all the curves by shifting each curve vertically:
df = get_representation(mat)  # df is the pandas DataFrame
for i in range(len(df.columns)):
    plt.plot(df.iloc[:, i] + shift * i)  # offset the i-th curve by i * shift
plt.show()
Here shift denotes the average distance between the curves.
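Here is a self-contained sketch of the shifting idea, with synthetic sine and cosine data standing in for get_representation(mat). The way shift is chosen (the largest peak-to-peak range) is an assumption that suits curves sharing a similar baseline, as these do. A second option, not mentioned in the answer, is pandas' own subplots=True with sharex=True, which draws one panel per column over a single shared x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: N=200 rows, K=3 columns.
t = np.linspace(0, 10, 200)
df = pd.DataFrame({"x": np.sin(t), "y": np.cos(t), "z": np.sin(2 * t)}, index=t)

# Option 1: one axes, each curve lifted by a multiple of shift.
shift = (df.max() - df.min()).max()  # largest peak-to-peak range of any column
fig, ax = plt.subplots()
for i, name in enumerate(df.columns):
    ax.plot(df.index, df[name] + shift * i, label=name)
ax.legend()

# Option 2: one panel per column, all sharing the same x-axis.
axes = df.plot(subplots=True, sharex=True)
```

With option 1 the y-axis values no longer mean much, so option 2 may be preferable when the actual magnitudes matter.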

Pandas Timeseries plotting

I have a pandas time series object with dates and corresponding values. But when I try to plot it, the plot is an L-shaped plot (the dates and values are automatically arranged in such a way that the highest value comes first...).
This is what I did to generate the plot:
df = pd.read_csv(r'C:\data\test1.csv')  # two-column DataFrame; raw string so \t is not read as a tab
data_list = df['values'].tolist()
dates_list = df['date'].tolist()
df_ts = pd.Series(data_list, index=dates_list)
df_ts.plot()
I am not sure where I am making a mistake. I am reading in a CSV file, converting it to a time series object, and plotting it. Any suggestions are very much appreciated.
Thanks!
PD
Don't bother creating the unnecessary intermediate data structures; just organize your DataFrame better.
df['date'] = pd.to_datetime(df['date'])  # make sure you're actually dealing with timestamps
df.set_index('date', inplace=True)
df.sort_index(inplace=True)  # DataFrame.sort was removed from pandas; sort_index orders the rows by date
df.plot()
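The same steps can be folded into the read itself. This is a minimal sketch with a made-up, deliberately unordered CSV in place of test1.csv; the column names "date" and "values" are taken from the question.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for test1.csv, out of date order.
csv_text = """date,values
2014-03-01,5
2014-01-01,9
2014-02-01,7
"""

ts = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])  # parse timestamps on read
    .set_index("date")   # dates become the index
    .sort_index()        # chronological order
    ["values"]           # keep the single value column as a Series
)
```

Calling ts.plot() on the result then draws the points in chronological order.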