I have plotted a graph from data in an Excel file, so I have the values for both the x and y axes. However, I want the x-axis to show only specific values that mark the key days. Is that possible?
Here is the code I have written:
import pandas as pd
from matplotlib import pyplot as plt #download matplot library
#create a graph of the cryptocurrencies in excel
btc = pd.read_excel('/Users/User/Desktop/bitcoin_prices.xlsx')
btc.set_index('Date', inplace=True) #Chart Fit
btc.plot()
plt.xlabel('Date', fontsize= 12)
plt.ylabel('Price ($)', fontsize= 12)
plt.title('Cryptocurrency Prices', fontsize=15)
plt.figure(figsize=(60,40))
plt.show() #plot then show the file
Thank you.
I guess you want pandas to recognize the datetime format of the 'Date' column. Pass parse_dates=['Date'] to the loading call. Then you can select the rows for specific days. For example:
import pandas as pd
# read the spreadsheet, parsing 'Date' as datetimes and using it as the index
btc = pd.read_excel('my_excel_data.xlsx', parse_dates=['Date'], index_col='Date')
# one date per week across 2015
selected_time = pd.date_range('2015-01-01', '2015-12-31', freq='7D')
# keep only the selected dates that actually occur in the data
btc_2015 = btc.loc[btc.index.isin(selected_time)]
If you want specific labels at specific dates, you have to look into matplotlib's axis tick methods and its date formatters (matplotlib.dates).
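A minimal sketch of that idea, assuming made-up daily prices and three hypothetical "key days" (neither comes from the question). It plots with matplotlib directly, places ticks only at the key days with set_xticks, and formats them with a DateFormatter:

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily prices standing in for the spreadsheet (assumption)
dates = pd.date_range("2015-01-01", "2015-12-31", freq="D")
btc = pd.DataFrame({"Price": np.linspace(200, 430, len(dates))}, index=dates)

# Hypothetical key days: the only positions that get a tick and a label
key_days = [dt.datetime(2015, 1, 14), dt.datetime(2015, 6, 1), dt.datetime(2015, 11, 4)]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(btc.index, btc["Price"])       # matplotlib plot instead of btc.plot()
ax.set_xticks(key_days)                # ticks only at the key days
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
ax.set_xlabel("Date", fontsize=12)
ax.set_ylabel("Price ($)", fontsize=12)
ax.set_title("Cryptocurrency Prices", fontsize=15)
fig.autofmt_xdate()                    # rotate the date labels
```

Call plt.show() afterwards to display the figure. The figure size is passed to plt.subplots before plotting, since calling plt.figure(figsize=...) after the plot opens a new, empty figure.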
I have a pandas dataframe with lots of time intervals of varying start times and lengths. I am interested in the distribution of start times over 24 hours, so I have another column, Hour, that holds just the start hour. I have plotted a histogram using seaborn to look at the distribution, but the x-axis starts at 0 and runs to 24. Is there a way to make it run from 8 to 8, wrapping around from 23 to 0, so that it gives a better visualisation of my data from a time perspective? Thanks in advance.
sns.distplot(df2['Hour'], bins = 24, kde = False).set(xlim=(0,23))
If you want a custom order of x-values on your plot, I'd suggest using matplotlib directly and plotting the histogram as a bar plot with width=1 to remove the gaps between bars.
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# prepare sample data
dates = pd.date_range(
start=datetime(2020, 1, 1),
end=datetime(2020, 1, 7),
freq="h")
random_dates = np.random.choice(dates, 1000)
df = pd.DataFrame(data={"date":random_dates})
df["hour"] = df["date"].dt.hour
# set your preferred order of hours
hour_order = [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7]
# calculate frequencies of each hour and sort them
plot_df = (
df["hour"]
.value_counts()
.rename_axis("hour", axis=0)
.reset_index(name="freq")
.set_index("hour")
.loc[hour_order]
.reset_index())
# day / night colour split
day_mask = ((8 <= plot_df["hour"]) & (plot_df["hour"] <= 20))
plot_df["color"] = np.where(day_mask, "skyblue", "midnightblue")
# actual plotting - note that you have to cast hours as strings
fig = plt.figure(figsize=(8,4))
ax = fig.add_subplot(111)
ax.bar(
x=plot_df["hour"].astype(str),
height=plot_df["freq"],
color=plot_df["color"], width=1)
ax.set_xlabel('Hour')
ax.set_ylabel('Frequency')
plt.show()
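An alternative sketch that keeps a numeric axis instead of string categories: shift each hour with (hour - start) % 24 so that 08:00 lands at position 0, then relabel the ticks. The random hours and the names start, shifted and positions are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=1000)   # made-up start hours (assumption)

start = 8                                 # hour that should sit at the left edge
shifted = (hours - start) % 24            # 8 -> 0, 23 -> 15, 0 -> 16, 7 -> 23
counts = np.bincount(shifted, minlength=24)

positions = np.arange(24)
labels = (positions + start) % 24         # 8, 9, ..., 23, 0, ..., 7

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(positions, counts, width=1, edgecolor="white")
ax.set_xticks(positions)
ax.set_xticklabels([str(h) for h in labels])
ax.set_xlabel("Hour")
ax.set_ylabel("Frequency")
```

Because the positions stay numeric, hours with no observations still get an empty slot, which the value_counts approach would drop.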
My DataFrame's structure
trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')
I'm trying to plot transcode (transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.
trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]
The result I get from doing plt.scatter(trx['transcode'], trx['amount']) is
Scatter plot
While the above plot is not entirely wrong, I would like the X axis to contain just the three possible values of transcode [1, 17, 99] instead of the entire [1, 100] range.
Thanks!
In matplotlib 2.1 and later you can plot categorical variables by passing strings. That is, if you provide the x values as strings, matplotlib treats them as categories.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
"y" : np.random.rand(100)*100})
plt.scatter(df["x"].astype(str), df["y"])
plt.margins(x=0.5)
plt.show()
To obtain the same result in matplotlib <=2.0, you would plot against an integer index instead and relabel the ticks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
"y" : np.random.rand(100)*100})
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
plt.xticks(range(len(u)),u)
plt.margins(x=0.5)
plt.show()
The same plot can be obtained using seaborn's stripplot:
import seaborn as sns
sns.stripplot(x="x", y="y", data=df)
And a potentially nicer representation can be done via seaborn's swarmplot:
sns.swarmplot(x="x", y="y", data=df)
I'm new to matplotlib and want to change the format of the values on the x-axis, like this:
x-axis: 250000, 500000, etc. should be shown as 0.250, 0.500, etc.
How can I do this?
Thanks
You would usually just scale the data prior to plotting. So instead of plt.plot(x,y), you'd use plt.plot(x,y/1e6).
To format the values with 3 decimal places, use a matplotlib.ticker.StrMethodFormatter and supply a format with 3 decimals, in this case "{x:.3f}".
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np; np.random.seed(42)
x = np.arange(5)
y = np.array([5e5,2e5,0,3e5,4e5])
plt.plot(x,y/1e6)
plt.gca().yaxis.set_major_formatter(matplotlib.ticker.StrMethodFormatter("{x:.3f}"))
plt.show()
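The question asks about the x-axis, while the example above scales y. If you would rather leave the data untouched, one option is to divide inside the tick formatter. This is a sketch with made-up data; the helper name millions is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

x = np.array([0, 250_000, 500_000, 750_000, 1_000_000])  # made-up x values
y = np.array([1, 3, 2, 5, 4])

def millions(value, pos=None):
    """Format a tick value in millions with three decimals."""
    return f"{value / 1e6:.3f}"

fig, ax = plt.subplots()
ax.plot(x, y)                                        # data stays unscaled
ax.xaxis.set_major_formatter(FuncFormatter(millions))
```

With this approach only the tick labels change, so any later calls that use data coordinates (limits, annotations) still work in the original units.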
I am plotting a time series with a datetime index. The plot needs to be a particular size for the journal format. Consequently, the ticks are not readable, since they span many years.
Here is a data sample
2013-02-10 0.7714492098202259
2013-02-11 0.7709101833765016
2013-02-12 0.7704911332770049
2013-02-13 0.7694975914173087
2013-02-14 0.7692108921323576
The data is a series with a datetime index and spans from 2013 to 2016. I use
data.plot(ax = ax)
to plot the data.
How can I format my xticks to read like '13 instead of 2013?
There seems to be some incompatibility between pandas and matplotlib formatters/locators when it comes to dates. See, for example, these questions:
Pandas plot - modify major and minor xticks for dates
Pandas Dataframe line plot display date on xaxis
I'm not entirely sure why matplotlib formatters work in some cases and not in others. Because of those issues, however, the most reliable solution is to plot the graph with matplotlib itself instead of the pandas plotting function.
This lets you use locators and formatters just as in the matplotlib example.
The solution to the question would then look as follows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2013-01-01", "2017-06-20" )
y = np.cumsum(np.random.normal(size=len(dates)))
s = pd.Series(y, index=dates)
fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
yearFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearFmt)
plt.show()
According to this example, you can do the following
import matplotlib.dates as mdates
yearsFmt = mdates.DateFormatter("'%y")
years = mdates.YearLocator()
ax = df.plot()
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
Full working example below.
Add the word value as a header line so that pd.read_clipboard puts the dates into the index:
value
2013-02-10 0.7714492098202259
2014-02-11 0.7709101833765016
2015-02-12 0.7704911332770049
2016-02-13 0.7694975914173087
2017-02-14 0.7692108921323576
Then read in data and convert index
df = pd.read_clipboard(sep=r'\s+')
df.index = pd.to_datetime(df.index)
I have a 2D variable XVAR from a netCDF file, with dimensions [year, month]. I want to plot the flattened XVAR (a 1D array of length nyear*nmonth) and set up the x-axis with years on the major ticks and months on the minor ticks. The difficulty is that I do not know how to create a 1D array of dates at a monthly step. There is no monthdelta method that I could use (I understand that this is because months have different numbers of days).
In the delta=? step below, I tried delta=relativedelta.relativedelta(months=1), but got the error "object has no attribute 'total_seconds'", which I do not completely understand.
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_toolkits.basemap import Basemap
from datetime import date, timedelta
ncfile = Dataset('filepath',mode='r')
XVAR4d = ncfile.variables['XVAR'][:]
XVAR2d = np.nanmean(XVAR4d,axis=(2,3)).flatten()
yrs = ncfile.variables['YEAR']
stt = date(np.min(yrs),1,1)
end = date(np.max(yrs)+1,1,1)
delta = ?
dates = mdates.drange(stt,end,delta)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(yearsFmt)
ax1.xaxis.set_minor_locator(months)
ax1.set_xlim(stt,end)
ax1.plot(dates,XVAR2d,c='r')
I decided I did like the idea of my second comment, so I am turning it into an actual proposed answer.
Instead of using drange, create the dates yourself:
totalMonths = 12*(np.max(yrs) - np.min(yrs)+1)
dates = mdates.date2num([date(np.min(yrs)+(i//12),i%12+1,1) for i in range(totalMonths)])
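The error message suggests that drange calls delta.total_seconds(), which a timedelta has and a relativedelta does not, so a calendar-month step cannot be passed to it. Below is a self-contained sketch of the same list-comprehension idea using only the standard library. The helper name month_starts and the years 2013 to 2016 are assumptions for illustration.

```python
from datetime import date

def month_starts(first_year, last_year):
    """Return the first day of every month from first_year to last_year inclusive."""
    total_months = 12 * (last_year - first_year + 1)
    # i // 12 advances the year, i % 12 + 1 cycles the month through 1..12
    return [date(first_year + i // 12, i % 12 + 1, 1) for i in range(total_months)]

dates = month_starts(2013, 2016)
print(len(dates), dates[0], dates[-1])
```

The resulting list can be passed to ax.plot as the x values, or converted with mdates.date2num first as in the answer above.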