Matplotlib/Pandas Using Time as X Axis - pandas

I am working on a project where I would like to read sensor data from a CSV file and do a live graph.
I am using Matplotlib for the graphing and Pandas for the data handling.
For the CSV I am using:
Column 0= Pass/Fail Boolean
Column 1= Time in %h:%m:%s format.
Column 3= Error Code (int64)
When I run the script I get a "ValueError: values must be a 1D array".
I believe it's coming from the time data, but when I check the dtype it is datetime64 as expected. My program is below:
from itertools import count
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
x_vals = []
y_vals = []
index = count()
def animate(i):
    # Use Pandas to read the CSV and create a DataFrame. Use kwargs to choose columns,
    # and then specify the name and type of the data.
    data = pd.read_csv('C:/Python/20220124.csv', usecols=[0, 1, 3],
                       names=["Pass", "Time", "Error Code"], header=None,
                       parse_dates=[1], dtype={"Pass": 'boolean', "Error Code": 'Int64'})
    pd.to_datetime(data['Time'])
    y1 = data['Pass']
    x1 = data['Time']
    y2 = data['Error Code']
    # Pyplot Clear Axes
    plt.cla()
    # Pyplot Plot data in line graphs
    plt.plot(x1, y1, label='Pass/Fail', lw=3, c='c', marker='o', markersize=4, mfc='k')
    plt.plot(x1, y2, label='Error Code', lw=2, ls='--', c='k')
    plt.legend(loc='upper left')
    plt.tight_layout()
    # Pyplot Get Current Axes
    ax = plt.gca()
ani = FuncAnimation(plt.gcf(), animate, interval=1000)
plt.tight_layout()
plt.show()

You can do this using only pandas, which has built-in plotting (it calls Matplotlib under the hood):
df = data[['Error Code', 'Time']] # create a DataFrame containing the data to be plotted
df = df.set_index('Time') # set the time as the index (it will serve as the x-axis)
df.plot() # plot the graph
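As a rough, self-contained sketch of the same idea: the CSV text below is made-up sample data in the layout the question describes, and the %H:%M:%S format string is an assumption about how the times are written. Setting the time column as the index lets pandas use it for the x-axis:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows: column 0 = pass/fail, 1 = time, 2 = unused, 3 = error code
csv_text = (
    "True,10:00:01,x,0\n"
    "True,10:00:02,x,0\n"
    "False,10:00:03,x,7\n"
    "True,10:00:04,x,0\n"
)

data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[0, 1, 3],
    names=["Pass", "Time", "Error Code"],
    header=None,
)
# to_datetime returns a new Series, so the result has to be assigned back
data["Time"] = pd.to_datetime(data["Time"], format="%H:%M:%S")
data["Pass"] = data["Pass"].astype(int)  # plot the boolean as 0/1

# With the time as the index, DataFrame.plot uses it as the x-axis
ax = data.set_index("Time")[["Pass", "Error Code"]].plot(marker="o")
ax.set_xlabel("Time")
plt.tight_layout()
```

Note that a bare pd.to_datetime(data['Time']) call, without assigning the result, leaves the column unchanged.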

Related

Barplot per each ax in matplotlib

I have the following dataset, ratings in stars for two fictitious places:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
'rating':[1,2,4,5,5,5,3,1,3,3,3,5,2]})
Since the rating is categorical (not continuous data), I convert it to a category:
df['rating_cat'] = pd.Categorical(df['rating'])
What I want is to create a bar plot for each fictitious place ('A' or 'B') showing the count for each rating. This is the intended plot:
I guess a for loop over each value in id could work, but I have some trouble deciding the size:
fig, ax = plt.subplots(1,2,figsize=(6,6))
axs = ax.flatten()
cats = df['rating_cat'].cat.categories.tolist()
ids_uniques = df.id.unique()
for i in range(len(ids_uniques)):
    ax[i].bar(df[df['id']==ids_uniques[i]], df['rating'].size())
But it returns an error: TypeError: 'int' object is not callable
Perhaps what I am doing is overly complicated. Could you please guide me with this code?
The pure matplotlib way:
from math import ceil
# Prepare the data for plotting
df_plot = df.groupby(["id", "rating"]).size()
unique_ids = df_plot.index.get_level_values("id").unique()
# Calculate the grid spec. This will be an n x 2 grid
# to fit one chart per id
ncols = 2
nrows = ceil(len(unique_ids) / ncols)
fig = plt.figure(figsize=(6,6))
for i, id_ in enumerate(unique_ids):
    # In a figure grid spanning nrows x ncols, plot into the
    # axes at position i + 1
    ax = fig.add_subplot(nrows, ncols, i+1)
    # pandas takes the target Axes through the ax keyword
    df_plot.xs(id_).plot(ax=ax, kind="bar")
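For comparison, here is a minimal sketch that stays closer to the loop in the question, calling Axes.bar directly. The names ratings and counts_by_id are introduced only for this illustration. It also shows where the TypeError came from: Series.size is an attribute holding an int, so df['rating'].size() tries to call an int.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'id': ['A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'rating': [1,2,4,5,5,5,3,1,3,3,3,5,2]})

# Series.size is a plain int (number of elements), not a method
n_rows = df['rating'].size

ratings = sorted(df['rating'].unique())   # every rating seen in the data
ids = df['id'].unique()
counts_by_id = {}

fig, axes = plt.subplots(1, len(ids), figsize=(6, 3), sharey=True)
for ax, id_ in zip(axes, ids):
    # Count each rating for this id; ratings that never occur get a count of 0
    counts = (df.loc[df['id'] == id_, 'rating']
                .value_counts()
                .reindex(ratings, fill_value=0))
    counts_by_id[id_] = counts
    ax.bar([str(r) for r in counts.index], counts.to_numpy())
    ax.set_title(f"id = {id_}")
    ax.set_xlabel("rating")
axes[0].set_ylabel("count")
fig.tight_layout()
```

Reindexing against the full list of ratings keeps the x-axis identical across panels, so a rating that one place never received still shows up as an empty slot.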
You can simplify things a lot with Seaborn:
import seaborn as sns
sns.catplot(data=df, x="rating", col="id", col_wrap=2, kind="count")
If you're ok with installing a new library, seaborn has a very helpful countplot. Seaborn uses matplotlib under the hood and makes certain plots easier.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
'rating':[1,2,4,5,5,5,3,1,3,3,3,5,2]})
sns.countplot(
    data=df,
    x='rating',
    hue='id',
)
plt.show()
plt.close()

Building a histogram

How can a distribution histogram similar to this one be constructed based on the data from the table?
Code python:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Data.xlsx')
print(df)
df.plot.hist(df)
plt.show()
It isn't clear exactly what the x and y axes of your desired plot are. Hopefully this will get you started. Sometimes trying to come up with an MRE (minimal reproducible example) will help you solve your own problem.
import random
import pandas as pd
import matplotlib.pyplot as plt
#######################################
# generate some random data for a MWE #
#######################################
random.seed(22)
data = [random.randint(0, 100) for _ in range(0, 10)]
data = pd.Series(sorted(data))
freqs = [random.uniform(0, 1) for _ in range(0, 10)]
freqs = sorted(freqs)
freqs = pd.Series(freqs)
df = pd.DataFrame()
df['data'] = data
df['frequencies'] = freqs
###############################################
# Desired bar plot using pandas built in plot #
###############################################
df.plot(x='data', y='frequencies', kind='bar')
plt.show()
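The bar plot above assumes the frequencies are already computed. If the table instead holds raw values and the counts per bin still need to be computed, a histogram does the binning for you. This is only a sketch under that assumption: the contents of Data.xlsx are unknown, so random values stand in for it, and the choice of 10 bins is arbitrary.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw values standing in for one column of the spreadsheet
random.seed(22)
values = pd.Series([random.randint(0, 100) for _ in range(200)], name="data")

# Series.plot.hist bins the raw values and draws one bar per bin
ax = values.plot.hist(bins=10, edgecolor="black")
ax.set_xlabel("data")
ax.set_ylabel("count")
plt.tight_layout()
```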

Matplotlib to Create histogram by Row

I have three arrays that essentially correspond to a matrix of gene expression values and then column labels specifying condition IDs and row values specifying a specific gene. I'm trying to define a function that will plot a histogram by just providing the gene name.
Basically I need to specify YAL001C and create a histogram of the values across the row. I'm very new to matplotlib and I'm not sure how to do this. Would it have something to do with using something like an np.where(gene = YAL001C) argument? I guess I'm just not sure where that would fit into code for matplotlib.
I currently have the following code, but it doesn't work:
def histogram(gene):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    x = np.where(geneList == gene, exprMat)
    bins = 50
    ax.hist(x, bins, color = 'green', edgecolor = 'black', alpha = 0.8 )
    plt.show()
In case you want to avoid using pandas, you can still accomplish what you want using numpy, but you need to add some code to figure out which row corresponds to a given gene. Here is one way you could code it:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
def plot_hist(gene):
    list_genes = ['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W']
    if gene in list_genes:
        sn_gene = list_genes.index(gene)
    else:
        print(f'{gene} is not in the list of genes')
        return
    fig, ax = plt.subplots(figsize=(6,4))
    plt.hist(data[sn_gene,:])
    plt.title(f'gene: {gene}')
    plt.show()
plot_hist('YAL001C')
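If the gene names are already in a numpy array, as geneList appears to be in the question, a boolean mask can replace the list lookup. The sketch below reuses the question's names geneList and exprMat with made-up values; gene_row is a hypothetical helper. As a side note, np.where(condition, x) with exactly two arguments raises a ValueError, because np.where expects either the condition alone or the condition plus both x and y.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the arrays described in the question
geneList = np.array(['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W'])
exprMat = np.array([[0.15, -0.22, 0.07],
                    [-0.07, -0.76, -0.12],
                    [-1.22, -0.27, -0.1],
                    [-0.09, 1.2, 0.16]])

def gene_row(gene):
    """Return the 1-D array of expression values for one gene."""
    mask = geneList == gene            # boolean array, one entry per row
    if not mask.any():
        raise KeyError(f'{gene} is not in geneList')
    return exprMat[mask][0]            # first (normally only) matching row

def histogram(gene, bins=50):
    fig, ax = plt.subplots()
    ax.hist(gene_row(gene), bins=bins, color='green', edgecolor='black', alpha=0.8)
    ax.set_title(f'gene: {gene}')
    return fig, ax

fig, ax = histogram('YAL001C')
```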
Here is one of the ways you could accomplish that (passing the data related to the corresponding row to the method):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
df = pd.DataFrame(data=data,
                  index=['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W'],
                  columns=['cln3-1', 'cln3-2', 'clb'])
print(df)
def plot_hist(gene):
    fig, ax = plt.subplots(1,2, figsize=(9,4))
    ax[0].bar(df.columns, df.loc[gene])
    ax[1].hist(df.loc[gene])
    plt.show()
plot_hist('YAL001C')
Left: bar-plot, Right: histogram

How to plot y-axis with specific scientific format?

I want to plot my graph with scientific notation on the y-axis. I have used ticklabel_format from matplotlib, but I am not getting my desired output for the y-axis labels. I have attached my script with the output image (image_1); image_2 is the desired one.
Code:
import numpy as np
import matplotlib.pyplot as plt
x1, y1 = [], []
label_added = False
with open("50kev_vacancy.txt") as f:
    for line in f:
        cols = line.split()
        x1.append(float(cols[0]))
        y1.append(float(cols[3]))
        if not label_added:
            plt.plot(x1,y1,'b-', label="50kev")
            label_added = True
        else:
            plt.plot(x1,y1,'b-')
plt.title('Different PKA energy')
plt.xlabel('time_ps')
plt.ylabel('Number of vacancy')
plt.ticklabel_format(style='sci', axis='y', scilimits=(0,0))
legend = plt.legend(loc='upper center', shadow=True, fontsize='x-large')
plt.tight_layout()
plt.savefig("Different_PKA_energy_vacancy_vs_time.jpeg", dpi=50)
Output:
desired output:
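The desired image is not reproduced here, so the exact target format is a guess. If the goal is a multiplier shown as ×10^n above the axis instead of 1e3, one possible sketch is to pass useMathText=True to ticklabel_format. The data below is a made-up stand-in for the columns read from 50kev_vacancy.txt.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data in place of the values read from the text file
x1 = np.linspace(0, 10, 50)
y1 = 4000 * (1 - np.exp(-x1))

fig, ax = plt.subplots()
ax.plot(x1, y1, 'b-', label="50kev")
ax.set_title('Different PKA energy')
ax.set_xlabel('time_ps')
ax.set_ylabel('Number of vacancy')
# style='sci' with scilimits=(0, 0) applies scientific notation at every magnitude;
# useMathText=True renders the multiplier with math text instead of "1e3"
ax.ticklabel_format(style='sci', axis='y', scilimits=(0, 0), useMathText=True)
ax.legend(loc='upper center', shadow=True, fontsize='x-large')
fig.tight_layout()

fig.canvas.draw()  # the offset text is only filled in once the figure is drawn
offset_text = ax.yaxis.get_offset_text().get_text()
```

Collecting the data first and calling plot once, as done here, also avoids drawing a new line for every row of the file.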

Pandas histogram df.hist() group by

How to plot a histogram with pandas DataFrame.hist() using group by?
I have a data frame with 5 columns: "A", "B", "C", "D" and "Group"
There are two group classes: "yes" and "no"
Using:
df.hist()
I get the hist for each of the 4 columns.
Now I would like to get the same 4 graphs but with blue bars (group="yes") and red bars (group = "no").
I tried this without success:
df.hist(by = "group")
Using Seaborn
If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made using seaborn.FacetGrid.
import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)
df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')
bins=np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
This is not the most flexible workaround but will work for your question specifically.
def sephist(col):
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no
# subplot indices start at 1, and the columns in the question are named "A" to "D"
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    plt.hist(sephist(alpha)[0], bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(sephist(alpha)[1], bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
You could make this more generic by:
adding a df and by parameter to sephist: def sephist(df, by, col)
making the subplots loop more flexible: for num, alpha in enumerate(df.columns)
Because the first argument to matplotlib.pyplot.hist can take
either a single array or a sequence of arrays which are not required
to be of the same length
...an alternative would be:
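# subplot indices start at 1, and the columns in the question are named "A" to "D"
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    # colors follow the order of the labels: blue for 'yes', red for 'no'
    plt.hist((sephist(alpha)[0], sephist(alpha)[1]), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)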
I generalized one of the other answers' solutions. Hope it helps someone out there. I added a line to ensure the binning (number and range) is the same for each column, regardless of group. The code should work for both "binary" and "categorical" groupings, i.e. "by" can specify a column containing any number of unique groups. Columns beyond the available subplot space are skipped.
import numpy as np
import matplotlib.pyplot as plt
def composite_histplot(df, columns, by, nbins=25, alpha=0.5):
    def _sephist(df, col, by):
        unique_vals = df[by].unique()
        df_by = dict()
        for uv in unique_vals:
            df_by[uv] = df[df[by] == uv][col]
        return df_by
    subplt_c = 4
    subplt_r = 5
    fig = plt.figure()
    for num, col in enumerate(columns):
        if num + 1 > subplt_c * subplt_r:
            continue
        plt.subplot(subplt_c, subplt_r, num+1)
        bins = np.linspace(df[col].min(), df[col].max(), nbins)
        for lbl, sepcol in _sephist(df, col, by).items():
            plt.hist(sepcol, bins=bins, alpha=alpha, label=lbl)
        plt.legend(loc='upper right', title=by)
        plt.title(col)
    plt.tight_layout()
    return fig
TL;DR one-liner:
It won't create subplots, but it will create 4 separate plots:
[df.groupby('group')[i].plot(kind='hist',title=i)[0] and plt.legend() and plt.show() for i in 'ABCD']
Full working example below
import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)
[df.groupby('group')[i].plot(kind='hist',title=i)[0] and plt.legend() and plt.show() for i in 'ABCD']
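If you do want the four histograms as subplots in one figure, here is a sketch of the same groupby idea written as an explicit loop over a 2 x 2 grid. The colors dict and the choice of 25 bins are assumptions made for the illustration; the colors follow the question (blue for "yes", red for "no").

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randn(300, 4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32, 0.68], size=300)

colors = {"yes": "b", "no": "r"}  # blue for "yes", red for "no"

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, col in zip(axes.flat, "ABCD"):
    # Shared bin edges so both groups are binned identically within a column
    bins = np.linspace(df[col].min(), df[col].max(), 26)
    # Iterating a grouped column yields (group name, values) pairs
    for name, values in df.groupby("group")[col]:
        ax.hist(values, bins=bins, alpha=0.5, color=colors[name], label=name)
    ax.set_title(col)
    ax.legend(loc="upper right")
fig.tight_layout()
```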