Best way of showing dates in a Bar plot with Pandas - pandas

I create a bar plot like this:
But since each x-axis label is one day of January (for example 1, 3, 4, 5, 7, 8, etc.), I think the best way of showing this is something like
__________________________________________________ x axis
Jan 1 3 4 5 7 8 ...
2019
But I don't know how to do this with Pandas.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('solved.xlsx', sheet_name="Jan")
fig, ax = plt.subplots()
df_plot = df.plot(x='Date', y=['Solved', 'POT'], ax=ax, kind='bar',
                  width=0.9, color=('#ffc114','#0098c9'), label=('Solved','POT'))
def line_format(label):
    """
    Convert time label to the format of pandas line plot
    """
    month = label.month_name()[:3]
    if month == 'Jan':
        month += f'\n{label.year}'
    return month
ax.set_xticklabels([line_format(x) for x in df.Date])
The function was a solution provided here: Pandas bar plot changes date format
I don't know how to modify it to get the axis I want.
My data example solved.xlsx:
Date Q A B Solved POT
2019-01-04 Q4 11 9 14 5
2019-01-05 Q4 9 11 14 5
2019-01-08 Q4 11 18 10 6
2019-01-09 Q4 18 19 18 5

I have found a solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('solved.xlsx', sheet_name="Jan")
fig, ax = plt.subplots()
df_plot = df.plot(x='Date', y=['Solved', 'POT'], ax=ax, kind='bar',
                  width=0.9, color=('#ffc114','#0098c9'), label=('Solved','POT'))
def line_format(label):
    """
    Show the day number, with the month name under the first day (the 2nd here)
    """
    day = str(label.day)
    if label.day == 2:
        day += f'\n{label.month_name()[:3]}'
    return day
ax.set_xticklabels([line_format(x) for x in df.Date])
plt.show()
In my particular case I didn't have the date 2019-01-01, so the first day for me was Jan 2.
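The solution above hard-codes day 2 as the first day. A minimal sketch of a variant that does not depend on which day comes first is shown below; it assumes the Date column is already parsed as datetimes, and the helper name day_labels and the inline sample data (taken from the question's table) are only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows from the question, built inline instead of reading solved.xlsx
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-04", "2019-01-05", "2019-01-08", "2019-01-09"]),
    "Solved": [14, 14, 10, 18],
    "POT": [5, 5, 6, 5],
})

def day_labels(dates):
    """Hypothetical helper: day numbers, with month and year under the
    first tick of each month."""
    labels = []
    previous = None
    for date in dates:
        label = str(date.day)
        is_new_month = previous is None or (
            (date.year, date.month) != (previous.year, previous.month)
        )
        if is_new_month:
            label += f"\n{date.month_name()[:3]}\n{date.year}"
        labels.append(label)
        previous = date
    return labels

labels = day_labels(df["Date"])

fig, ax = plt.subplots()
df.plot(x="Date", y=["Solved", "POT"], ax=ax, kind="bar",
        width=0.9, color=["#ffc114", "#0098c9"])
# One label per bar group; rotation=0 keeps the multi-line label readable
ax.set_xticklabels(labels, rotation=0)
```

Call plt.show() afterwards to display the figure. Because the month and year are attached to the first date of each month, the same helper should also work when the data spans several months.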

Related

Matplotlib plot for loop issue

I'm trying to plot time series data in matplotlib using a for loop. The goal is to dynamically plot 'n' years' worth of daily closing price data: if I load 7 years of data, I should get 7 unique plots. I have created a summary of the start and end dates for the data set, yearly_date_ranges (Date is the index), and I use it to populate the start and end dates. The code I've written so far produces 7 plots of all the daily data instead of 7 unique plots, one for each year. Any help is appreciated. Thanks in advance!
yearly_date_ranges
start end
Date
2014 2014-04-01 2014-12-31
2015 2015-01-01 2015-12-31
2016 2016-01-01 2016-12-31
2017 2017-01-01 2017-12-31
2018 2018-01-01 2018-12-31
2019 2019-01-01 2019-12-31
2020 2020-01-01 2020-05-28
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(12,20))
for i in range(len(yearly_date_ranges)):
    ax = fig.add_subplot(len(yearly_date_ranges),1,i + 1)
    for row in yearly_date_ranges.itertuples(index=False):
        start = row.start
        end = row.end
        subset = data[start:end]
        ax.plot(subset['Close'])
plt.show()
Dynamically you should do something like this:
fig, axes = plt.subplots(7,1, figsize=(12,20))
years = data.index.year
for ax, (k,d) in zip(axes.ravel(), data.groupby(years)):
    d.plot(y='Close', ax=ax)
This worked! Thank you for the help.
fig, axes = plt.subplots(7,1, figsize=(12,20))
years = data.index.year
for ax, (k,d) in zip(axes.ravel(), data['Close'].groupby(years)):
    # d is already the 'Close' Series, so no x or y argument is needed
    d.plot(ax=ax)
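Both snippets above still hard-code 7 subplots. A sketch of a fully dynamic version, which derives the number of subplots from the number of years in the index, could look like this; the synthetic data frame below is an assumption standing in for the real price data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real price data (assumption): daily 'Close'
# values on a DatetimeIndex covering parts of three calendar years
dates = pd.date_range("2018-01-01", "2020-05-28", freq="D")
data = pd.DataFrame({"Close": np.linspace(100, 200, len(dates))}, index=dates)

groups = data.groupby(data.index.year)
n_years = groups.ngroups

# squeeze=False keeps `axes` a 2-D array even when there is only one year
fig, axes = plt.subplots(n_years, 1, figsize=(12, 4 * n_years), squeeze=False)
for ax, (year, yearly) in zip(axes.ravel(), groups):
    yearly["Close"].plot(ax=ax)
    ax.set_title(str(year))
fig.tight_layout()
```

Grouping by data.index.year also makes the separate yearly_date_ranges summary unnecessary for plotting, since each group already contains exactly one year of rows.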

Can't get y-axis on matplotlib histogram to display the right numbers

I have this simple DataFrame that I am trying to plot as a histogram:
Hour Count Average Count
2 6 4 0.129032
4 7 1 0.032258
1 12 9 0.290323
3 16 3 0.096774
0 20 2022 65.225806
What I want is the Hour on the x-axis and the Average Count on the y-axis. But when I tried this:
fig, hour = plt.subplots(1, 1)
hour.hist(test.Hour)
hour.set_xlabel('Time in 24 Hours')
hour.set_ylabel('Frequency')
plt.show()
I got this instead. I have tried using test.Count and test['Average Count'], but both only affect the x-axis.
Are you looking for something like this?
Here 'df' is the name of the DataFrame.
df.plot(x='Hour', y='Average Count', kind='bar')
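The reason hist only changes the x-axis is that a histogram counts how often each value occurs, so its y-axis is always a frequency. When the heights are already computed, a bar chart plots them directly. A minimal sketch with plain matplotlib, using the question's data and its frame name test, with the rows sorted by hour (the sorting is an addition, not part of the answer above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# The DataFrame from the question, rebuilt inline
test = pd.DataFrame(
    {
        "Hour": [6, 7, 12, 16, 20],
        "Count": [4, 1, 9, 3, 2022],
        "Average Count": [0.129032, 0.032258, 0.290323, 0.096774, 65.225806],
    },
    index=[2, 4, 1, 3, 0],
)

ordered = test.sort_values("Hour")

fig, ax = plt.subplots()
# bar() takes the x positions and the precomputed bar heights
ax.bar(ordered["Hour"], ordered["Average Count"])
ax.set_xlabel("Time in 24 Hours")
ax.set_ylabel("Average Count")
```

Since the 20:00 value is far larger than the others, calling ax.set_yscale('log') may make the smaller bars easier to see.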

Plotting 2 columns as 2 lines and 1 column as x axis on Dataframes

I'm new to pandas and DataFrames. I would like to know how I could transform my current code to use plt.figure instead. I would like to plot 2 columns (Tourism Receipts, Visitors) as lines while using another column (Quarters) as the x axis.
This code seems to work, but I would like to know whether there is a better way to do it, such as plt.plot, that still lets me set the x-axis to Quarters and draw the other 2 columns as lines.
df1= df.set_index('Quarters').plot(figsize=(10,5), grid=True)
Dataframe (from my csv file):
| Quarters | Tourism Receipts | Visitors |
| 2019 Q1 | 10 | 1 |
| 2019 Q2 | 20 | 2 |
| 2019 Q3 | 30 | 3 |
| 2019 Q4 | 40 | 4 |
I understand this following method
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(20,10))
plt.plot(x,y)
plt.title
plt.xlabel
plt.ylabel
Is there a way to translate the 'df.set_index' method into plt calls instead?
You can actually combine both: use the pandas .plot method, which saves a lot of effort, and use matplotlib features side by side to customize the output.
Here is some sample code showing how to do this:
from matplotlib import pyplot as plt
import pandas as pd
fig, ax = plt.subplots(1, figsize=(10, 10))
df.set_index('Quarters')[['Tourism Receipts', 'Visitors']].plot(figsize=(10,5), grid=True, ax=ax)
ax.set_yticks(range(-10, 41, 5))
# ax.set_yticklabels( ('{}%'.format(x) for x in range(0, 101, 10)), fontsize=15)
ax.set_xticks(df.Quarters)
ax.set_xticklabels(["{} Q{}".format('2019', x) for x in df.Quarters])
ax.legend(loc='lower left')
You can do the same for the yticks as well.
PS: This assumes df.Quarters holds only the quarter number, without the year, so I am assuming 2019.
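If the Quarters column already holds strings such as '2019 Q1', as in the table from the question, no tick relabelling is needed: matplotlib treats string x values as categories. A minimal sketch of the pure plt approach the question asks about, with the data rebuilt inline:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The table from the question, rebuilt inline
df = pd.DataFrame({
    "Quarters": ["2019 Q1", "2019 Q2", "2019 Q3", "2019 Q4"],
    "Tourism Receipts": [10, 20, 30, 40],
    "Visitors": [1, 2, 3, 4],
})

fig, ax = plt.subplots(figsize=(10, 5))
# One line per column, with the string quarters used directly as x values
for column in ["Tourism Receipts", "Visitors"]:
    ax.plot(df["Quarters"], df[column], label=column)
ax.set_xlabel("Quarters")
ax.grid(True)
ax.legend(loc="upper left")
```

Call plt.show() to display the figure. The loop is equivalent to two ax.plot calls, and it extends naturally if more columns are added later.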

How do I use grouped data to plot rainfall averages in specific hourly ranges

I extracted the following data from a DataFrame.
https://i.imgur.com/rCLfV83.jpg
The question is, how do I plot a graph, probably a histogram-style bar chart, where the horizontal axis shows the hours as bins [16:00 17:00 18:00 ... 24:00] and the bars show the average rainfall during each of those hours?
I just don't know enough pandas yet to get this off the ground so I need some help. Sample data below as requested.
Date Hours `Precip`
1996-07-30 21 1
1996-08-17 16 1
18 1
1996-08-30 16 1
17 1
19 5
22 1
1996-09-30 19 5
20 5
1996-10-06 20 1
21 1
1996-10-19 18 4
1996-10-30 19 1
1996-11-05 20 3
1996-11-16 16 1
19 1
1996-11-17 16 1
1996-11-29 16 1
1996-12-04 16 9
17 27
19 1
1996-12-12 19 1
1996-12-30 19 10
22 1
1997-01-18 20 1
It seems df is a multi-index DataFrame after a groupby.
Transform the index to a DatetimeIndex
date_hour_idx = df.reset_index()[['Date', 'Hours']] \
    .apply(lambda x: '{} {}:00'.format(x['Date'], x['Hours']), axis=1)
precip_series = df.reset_index()['Precip']
precip_series.index = pd.to_datetime(date_hour_idx)
Resample to hours using 'H'
# This will show NaN for hours without an entry
resampled_nan = precip_series.resample('H').asfreq()
# This will fill NaN with 0s
resampled_fillna = precip_series.resample('H').asfreq().fillna(0)
If you want this to be the mean per hour, change your groupby(...).sum() to groupby(...).mean()
You can resample to other intervals too -> pandas resample documentation
More about resampling the DatetimeIndex -> https://pandas.pydata.org/pandas-docs/stable/reference/resampling.html
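Resampling gives a continuous hourly time series. If the goal is instead one bar per hour of the day, averaged over all dates, grouping on the Hours level of the index may be simpler. The sketch below assumes the frame has a (Date, Hours) MultiIndex as displayed in the question, and rebuilds only the first few rows of the sample for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First rows of the sample data from the question, rebuilt inline
records = [
    ("1996-07-30", 21, 1), ("1996-08-17", 16, 1), ("1996-08-17", 18, 1),
    ("1996-08-30", 16, 1), ("1996-08-30", 17, 1), ("1996-08-30", 19, 5),
    ("1996-08-30", 22, 1), ("1996-09-30", 19, 5), ("1996-09-30", 20, 5),
]
df = pd.DataFrame(records, columns=["Date", "Hours", "Precip"])
df = df.set_index(["Date", "Hours"])

# Average precipitation for each hour of the day, across all dates
hourly_mean = df.groupby(level="Hours")["Precip"].mean()

ax = hourly_mean.plot.bar()
ax.set_xlabel("Hour of day")
ax.set_ylabel("Mean precipitation")
```

Note that this averages only over the dates that have an entry for a given hour. To treat missing hours as zero rainfall, the frame would first need to be filled out, for example with the resample and fillna(0) approach above.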
This is easy once you have the data.
I generate artificial data with pandas for this example:
import random
import pandas as pd
import radar  # third-party package that generates random datetimes
# Dates: 50 random timestamps, sorted
r2 = ()
for a in range(1, 51):
    t = (str(radar.random_datetime(start='1985-05-01', stop='1985-05-04')),)
    r2 = r2 + t
r3 = list(r2)
r3.sort()
#print(r3)
# Variable: 50 random measurements
x = [random.randint(0, 16) for _ in range(50)]
df = pd.DataFrame({'date': r3, 'measurement': x})
print(df)
# Split 'date' into a date column ('daty', dates) and a time column
# ('godziny', hours), and rename 'measurement' to 'pomiary' (measurements)
col1 = df.join(df['date'].str.partition(' ')[[0,2]]).rename({0: 'daty', 2: 'godziny'}, axis=1)
col2 = df['measurement'].rename('pomiary')
p3 = pd.concat([col1, col2], axis=1, sort=False)
p3 = p3.drop(['measurement'], axis=1)
p3 = p3.drop(['date'], axis=1)
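Now compute the mean per date and plot it:
# Select the numeric column so the string column 'godziny' is not averaged
dx = p3.groupby('daty')['pomiary'].mean()
print(dx)
import matplotlib.pyplot as plt
dx.plot.bar()
plt.show()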
Plot of the mean measurements

How to show label as it is in the data?

I am plotting a graph of value against time. I want the time labels to be shown as 2018-06-30 18:35:45, as in the main CSV data.
Instead, the x axis of the graph shows the time as 06:30 20.
How can the x-axis labels be shown exactly as they appear in the main data?
Code I used is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Mlogi_ALL_idle_day.csv')
df['time']=pd.to_datetime(df['time'], unit='ns')
x = df['time']
y = df['d']
fig, ax = plt.subplots()
ax.plot_date(x, y, linestyle='-')
plt.title('Idle for 1 July - 2 July 2018')
plt.legend()
plt.ylabel('duration')
plt.xlabel('Time')
fig.autofmt_xdate()
plt.show()
And the data look like:
In[10]:df.head()
Out[10]:
time d
0 2018-06-30 18:35:45 41000000000
1 2018-06-30 18:36:47 44000000000
2 2018-06-30 18:37:46 43000000000
3 2018-06-30 18:38:46 40000000000
4 2018-06-30 18:39:47 43000000000
To change the date label of the major x-axis ticks, import matplotlib.dates as mdates and set a DateFormatter:
import matplotlib.dates as mdates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))
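Put together with the question's plotting code, a minimal sketch might look like the following. The three rows are copied from the df.head() output above, and ax.plot is used here in place of plot_date, which is a substitution on my part.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's df.head(), rebuilt inline
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2018-06-30 18:35:45",
        "2018-06-30 18:36:47",
        "2018-06-30 18:37:46",
    ]),
    "d": [41000000000, 44000000000, 43000000000],
})

fig, ax = plt.subplots()
ax.plot(df["time"], df["d"], linestyle="-", marker="o")

# Format every major tick as a full date and time, as in the CSV
formatter = mdates.DateFormatter("%Y-%m-%d %H:%M:%S")
ax.xaxis.set_major_formatter(formatter)

ax.set_xlabel("Time")
ax.set_ylabel("duration")
fig.autofmt_xdate()  # rotate the long labels so they do not overlap
```

The formatter only controls how each tick is written; which timestamps get a tick is still decided by the axis locator, so the ticks will not necessarily fall on the exact data points.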