Line graphs in matplotlib - pandas

I am plotting the below data frame using google charts.
Hour S1 S2 S3
1 174 0 811
2 166 0 221
3 213 1 1061
But with google charts, I am not able to save it to a file. Just wondering whether I can plot the dataframe in matplotlib line charts.
Any help would be appreciated.

pandas has a built-in plotting method; just do:
df.plot()
where df is your pandas dataframe

matplotlib 1.5 and above supports a data kwarg:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot('Hour', 'S1', data=df)
or just directly pass in the columns as input
ax.plot(df['S1'])
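Neither snippet above covers the saving part of the question. A minimal sketch of plotting and writing to a file, assuming the data frame is rebuilt by hand from the question; the output file name and the temp-directory location are placeholders, not anything from the original post:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "Hour": [1, 2, 3],
    "S1": [174, 166, 213],
    "S2": [0, 0, 1],
    "S3": [811, 221, 1061],
})

# One line per column, with Hour on the x axis
ax = df.plot(x="Hour", y=["S1", "S2", "S3"], marker="o")

# Write the figure to disk (hypothetical path; any writable path works)
out_path = os.path.join(tempfile.gettempdir(), "line_chart.png")
ax.figure.savefig(out_path, dpi=150)
```

`savefig` picks the format from the file extension, so changing the name to end in `.pdf` or `.svg` should produce a vector file instead.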

Related

Spaces in file and matplotlib pyplot only plotting first 10 values

I am using jupyter notebook to plot some data from a file that has a whole lot of space values between the two floats (x and y). For some reason, the plot that I output only outputs the first 10 values of the data set with lots of spaces.
The code works just fine for another data file which does not have many spaces.
See below, the code and plot output for the file that works just fine (Ca_sr.World.avg_stddev.dat) and the code and plot output for the file that does not work well (Rep01_Cys239SG_ligandF1_dist.dat)
For the second (space-filled file) I have resorted to naming the spaces s1, s2, sn... and think that this may be the problem. Still not sure why this would only plot the first 10 lines of the data, though. To note, I have also tried to put an xlim value and it didn't work either.
Would greatly appreciate any help or suggestions!
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import os
import glob
import gc
prefix_file='~/Desktop/'
file_name='Ca_sr.World.avg_stddev.dat'
# plot a file (standalone from dataset)
filename=prefix_file + file_name
readfile= pd.read_csv(
filename,
delimiter=' ',
names=['x', 'y', 's'],
dtype={'x': np.float64, 'y': np.float64, 's': np.float64}
)
plt.plot(readfile['x'], readfile['y'] ,label='Hake RyR efflux',color='red')
plt.show()
Successful plot
prefix_file='~/Desktop/'
file_name='Rep01_Cys239SG_ligandF1_dist.dat'
# plot a file (standalone from dataset)
filename=prefix_file + file_name
readfile= pd.read_csv(
filename,
delimiter=' ',
names=['s1','s2','s3','s4','s5','s6','s7','x','s8','s9','s10','s11','s12', 'y'],
dtype={'x': np.float64, 'y': np.float64}
)
plt.plot(readfile['x'], readfile['y'] ,label='Hake RyR efflux',color='blue')
plt.show()
Unsuccessful plot
Example of first 20 lines of Ca_sr.World.avg_stddev.dat:
0 11795 0
1e-05 11941.56 73.400861030372
2e-05 12063 60.216276869298
3e-05 12089.76 80.550247671872
4e-05 12117.32 84.542401196086
5e-05 12138.96 81.17018171718
6e-05 12182.64 63.665299810807
7e-05 12148.24 73.5615551766
8e-05 12168.32 74.333690881053
9e-05 12161.12 85.144733248745
0.0001 12165.88 64.499500773262
0.00011 12190.88 76.538784939402
0.00012 12180.44 68.184502638063
0.00013 12190.72 62.885623158239
0.00014 12174.16 85.336829095063
0.00015 12175.36 58.05024030958
0.00016 12187.4 70.30163582734
0.00017 12209.36 69.206866711331
0.00018 12186.84 70.809423101731
0.00019 12212.04 82.26857480229
Example of first 20 lines of Rep01_Cys239SG_ligandF1_dist.dat:
#Frame Dis_00003
1 12.0948
2 11.8884
3 11.8573
4 11.8988
5 12.0257
6 10.8092
7 10.6126
8 10.6221
9 10.6896
10 10.5544
11 10.0383
12 10.5199
13 10.0731
14 10.6336
15 10.6044
16 9.9472
17 10.2276
18 9.9793
19 10.4104
UPDATE
This definitely has something to do with the way I am hardcoding the lines into the names value. In short, I need a way to ignore all the NaN (space) values and process only the floats.
[Image: data description]
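A possible way to avoid naming the spaces at all is to let pandas treat any run of whitespace as one delimiter. The sketch below is an assumption-laden stand-in: the padding in `raw` is invented (the real file's spacing is not visible here), and only a few rows are reproduced. A plausible explanation for the truncated plot is that the frame number is right-aligned, so rows from 10 onward have one fewer leading space and their values land in different `s` columns.

```python
import io

import pandas as pd

# Stand-in for Rep01_Cys239SG_ligandF1_dist.dat (padding is assumed)
raw = (
    "#Frame    Dis_00003\n"
    "       1      12.0948\n"
    "       2      11.8884\n"
    "       9      10.6896\n"
    "      10      10.5544\n"
    "      11      10.0383\n"
)

df = pd.read_csv(
    io.StringIO(raw),   # replace with the real file path
    sep=r"\s+",         # any run of whitespace counts as a single delimiter
    comment="#",        # skip the '#Frame ...' header line
    names=["x", "y"],
)
print(df)
```

With this, `plt.plot(df['x'], df['y'])` should see every row, since no column is filled with NaN placeholders.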

Graph plotting in pandas and seaborn

I have a table with 5 columns and 8000 rows:
Market DeliveryWindowID #Orders #UniqueShoppersAvailable #UniqueShoppersFulfilled
NY 296 2 2 5
MA 365 3 4 8
How do I plot a graph in pandas or seaborn that shows #Orders, #UniqueShoppersAvailable and #UniqueShoppersFulfilled versus the market and delivery window?
Using Seaborn, reshape your dataframe with melt first:
df_chart = df.melt(['Market','DeliveryWindowID'])
sns.barplot(x='Market', y='value', hue='variable', data=df_chart)
Output:
One way is to set Market as index forcing it onto the x axis and do a bar graph if you wanted a quick visualization. This can be stacked or not.
Not Stacked
import matplotlib.pyplot as plt
df.drop(columns=['DeliveryWindowID']).set_index(df.Market).plot(kind='bar')
Stacked
df.drop(columns=['DeliveryWindowID']).set_index(df.Market).plot(kind='bar', stacked=True)
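To see what the `melt` step in the seaborn answer hands to the plotting call, it may help to print the reshaped frame. A small sketch using only the two sample rows from the question:

```python
import pandas as pd

# The two sample rows from the question
df = pd.DataFrame({
    "Market": ["NY", "MA"],
    "DeliveryWindowID": [296, 365],
    "#Orders": [2, 3],
    "#UniqueShoppersAvailable": [2, 4],
    "#UniqueShoppersFulfilled": [5, 8],
})

# Keep Market and DeliveryWindowID as identifiers; stack the other columns
# into a 'variable' (column name) / 'value' (cell content) pair
df_chart = df.melt(["Market", "DeliveryWindowID"])
print(df_chart)
```

Each original row becomes three rows, one per metric, which is the long format that `hue='variable'` relies on.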

how to plot a dataframe with two different axes in pandas matplotlib

So my data frame is like this:
6month final-formula numPatients6month
160243.0 1 0.401193 417
172110.0 2 0.458548 323
157638.0 3 0.369403 268
180306.0 4 0.338761 238
175324.0 5 0.247011 237
170709.0 6 0.328555 218
195762.0 7 0.232895 190
172571.0 8 0.319588 194
172055.0 9 0.415517 145
174609.0 10 0.344697 132
174089.0 11 0.402965 106
196130.0 12 0.375000 80
and I am plotting the 6month and final-formula columns:
import matplotlib.pyplot as plt
dffinal.plot(kind='bar', x='6month', y='final-formula')
plt.show()
So far this works: it shows 6month on the x axis and final-formula on the y axis.
What I want is to show numPatients6month in the same plot, but on a second y axis.
As in the diagram below, I want to show numPatients6month at position 1, or simply show that number above each bar.
I tried to do that with twinx, but it seems to be meant for the case where we have two plots and want to draw them in the same figure.
fig = plt.figure()
ax = fig.add_subplot(111)
ax2 = ax.twinx()
ax.set_ylabel('numPatients6month')
I appreciate your help :)
This is the solution that resolved it. I am sharing it here in case it helps someone :)
import matplotlib.pyplot as plt
ax = dffinal.plot(kind='bar', x='6month', y='final-formula')
ax2 = ax.twinx()
ax2.spines['right'].set_position(('axes', 1.0))
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()
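One thing to watch with the snippet above: pandas draws bars at integer positions 0, 1, 2, ... regardless of the values in 6month, while a pandas line plot uses the actual 6month values, so the two layers may end up shifted against each other. A minimal sketch of one way around this, using only the first four rows of the question's data and plain `Axes.plot` for the line (the colour and marker are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# First four rows of the question's data
dffinal = pd.DataFrame({
    "6month": [1, 2, 3, 4],
    "final-formula": [0.401193, 0.458548, 0.369403, 0.338761],
    "numPatients6month": [417, 323, 268, 238],
})

ax = dffinal.plot(kind="bar", x="6month", y="final-formula", legend=False)
ax.set_ylabel("final-formula")

# Second y axis sharing the same x axis
ax2 = ax.twinx()
# Reuse the bar positions so each point sits over its bar
ax2.plot(ax.get_xticks(), dffinal["numPatients6month"], color="red", marker="o")
ax2.set_ylabel("numPatients6month")
```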
Store the AxesSubplot in a variable called ax
ax = dffinal.plot(kind='bar',x='6month', y='final-formula')
and then
ax.tick_params(labeltop=False, labelright=True)
This will bring the tick labels to the right side as well.
Is this enough, or would you also like to know how to add values on top of the bars? Your question suggested that either of the two would satisfy you.
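For the second option, writing the patient count above each bar, a rough sketch could look like this. It assumes the bars in `ax.patches` come back in the same order as the rows of the frame, and it again uses only the first four rows of the question's data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

dffinal = pd.DataFrame({
    "6month": [1, 2, 3, 4],
    "final-formula": [0.401193, 0.458548, 0.369403, 0.338761],
    "numPatients6month": [417, 323, 268, 238],
})

ax = dffinal.plot(kind="bar", x="6month", y="final-formula", legend=False)

# One text label per bar, centred horizontally and sitting on the bar top
labels = []
for bar, n in zip(ax.patches, dffinal["numPatients6month"]):
    x_center = bar.get_x() + bar.get_width() / 2
    labels.append(
        ax.annotate(str(n), (x_center, bar.get_height()), ha="center", va="bottom")
    )
```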

Tick labels overlap in pandas bar chart

TL;DR: In pandas how do I plot a bar chart so that its x axis tick labels look like those of a line chart?
I made a time series with evenly spaced intervals (one item each day) and can plot it like such just fine:
intensity[350:450].plot()
plt.show()
But switching to a bar chart created this mess:
intensity[350:450].plot(kind = 'bar')
plt.show()
I then created a bar chart using matplotlib directly but it lacks the nice date time series tick label formatter of pandas:
def bar_chart(series):
    fig, ax = plt.subplots(1)
    ax.bar(series.index, series)
    fig.autofmt_xdate()
    plt.show()

bar_chart(intensity[350:450])
Here's an excerpt from the intensity Series:
intensity[390:400]
2017-03-07 3
2017-03-08 0
2017-03-09 3
2017-03-10 0
2017-03-11 0
2017-03-12 0
2017-03-13 2
2017-03-14 0
2017-03-15 3
2017-03-16 0
Freq: D, dtype: int64
I could go all out on this and create the tick labels completely by hand, but I'd rather not have to baby matplotlib. I'd like to let pandas do its job and do what it did in the very first figure, but with a bar plot. So how do I do that?
Pandas bar plots are categorical plots. They create one tick (plus label) for each category. If the categories are dates and those dates are continuous, one option is to leave some of the labels out, e.g. to label only every fifth category:
ax = series.plot(kind="bar")
ax.set_xticklabels([t if not i%5 else "" for i,t in enumerate(ax.get_xticklabels())])
In contrast, matplotlib bar charts are numerical plots. Here a suitable locator can be applied, which ticks the dates weekly, monthly or at whatever interval is needed.
In addition, matplotlib gives you full control over the tick positions and their labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
index = pd.date_range("2018-01-26", "2018-05-05")
series = pd.Series(np.random.rayleigh(size=100), index=index)
plt.bar(series.index, series.values)
plt.gca().xaxis.set_major_locator(dates.MonthLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter("%b\n%Y"))
plt.show()
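The "every fifth category" idea from the start of this answer can also be written so the label strings come straight from the index rather than from the existing tick labels. A sketch under that assumption, with an invented 20-day series and an arbitrary date format:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Invented daily series, 20 days long
index = pd.date_range("2017-03-01", periods=20, freq="D")
series = pd.Series(range(20), index=index)

ax = series.plot(kind="bar")

# Keep a date string on every fifth bar and blank out the rest
labels = [
    d.strftime("%Y-%m-%d") if i % 5 == 0 else ""
    for i, d in enumerate(series.index)
]
ax.set_xticklabels(labels)
```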

Legend and title to line charts using matplotlib

I am plotting the below data frame using google charts.
Group G1 G2
Hour
6 19 1
8 1 2
I have plotted the above dataframe as a line chart, but I am not able to add a legend and title to it. I am also trying to increase the size of the chart, as it appears very small. I am not sure whether matplotlib has these options. Any help would be appreciated.
import matplotlib.pyplot as plt
plt.plot(dft2)
plt.xlabel('Hour')
plt.ylabel('Count')
plt.show()
dfg2.plot(legend=True, figsize=(8,8))
plt.legend(ncol=3, bbox_to_anchor=[1.35, 1], handlelength=2, handletextpad=1, columnspacing=1, title='Legend')
plt.title('Title here!', color='black', fontsize=17)
plt.xlabel('Hour', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.show()
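The same result can be sketched with an explicit figure and axes, which keeps the size, title and legend settings in one place. The frame below is rebuilt from the question's sample; the title text and figure size are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, indexed by Hour
dfg2 = pd.DataFrame(
    {"G1": [19, 1], "G2": [1, 2]},
    index=pd.Index([6, 8], name="Hour"),
)

# figsize is in inches (width, height)
fig, ax = plt.subplots(figsize=(8, 6))
dfg2.plot(ax=ax)

ax.set_title("Title here!", fontsize=17)
ax.set_xlabel("Hour", fontsize=15)
ax.set_ylabel("Count", fontsize=15)
ax.legend(title="Group")
```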