KeyError while plotting a graph in matplotlib - pandas

I am trying to plot a simple graph for the dataframe below
indeces Zeitstempel Ergebnis
0 382 16.04.2020 16:12:07 PASS
1 383 16.04.2020 16:13:07 PASS
2 392 16.04.2020 16:13:20 FAIL
3 382 16.04.2020 16:13:22 PASS
4 383 16.04.2020 16:14:22 PASS
It has three columns. The x-axis should be Zeitstempel and the y-axis should be indeces. I would also like to show the values of the Ergebnis column (maybe color-coded: green for PASS, red for FAIL and grey for BLOCKED) so that it is clear which index is passing, failing or blocked at what time. The actual dataframe has 1172 rows × 3 columns, but above I have only listed a few.
The code I am trying is below, but I cannot figure out how to plot all three columns as required.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

times = pd.date_range('2020-04-16 04:12 AM', '2020-04-16 11:00 PM', freq='1H')
fig, ax = plt.subplots(1)
fig.autofmt_xdate()
df.plot(kind='line',x='times',y='Index',ax=ax)
xfmt = mdates.DateFormatter('%d-%m-%y %H:%M')
ax.xaxis.set_major_formatter(xfmt)
ax = plt.gca()
plt.show()
times holds the Zeitstempel values and Index holds the indeces values. This gives me a KeyError. Is there a simpler way to do this? I am new to matplotlib and I am running out of ideas. Please suggest.
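The KeyError comes from df.plot(x='times', y='Index', ...): pandas looks x and y up as column names of df, and the frame shown above only has the columns indeces, Zeitstempel and Ergebnis. Below is a minimal sketch that plots the real columns and colors each point by Ergebnis. It assumes the timestamps are strings in the day.month.year format shown in the sample and uses the PASS/FAIL/BLOCKED colors suggested in the question; the black fallback color is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Sample rows copied from the question; the real frame has 1172 rows.
df = pd.DataFrame({
    "indeces": [382, 383, 392, 382, 383],
    "Zeitstempel": ["16.04.2020 16:12:07", "16.04.2020 16:13:07",
                    "16.04.2020 16:13:20", "16.04.2020 16:13:22",
                    "16.04.2020 16:14:22"],
    "Ergebnis": ["PASS", "PASS", "FAIL", "PASS", "PASS"],
})

# Parse the day.month.year strings into real datetimes.
df["Zeitstempel"] = pd.to_datetime(df["Zeitstempel"], format="%d.%m.%Y %H:%M:%S")

# Color per result; any unexpected value falls back to black.
colors = {"PASS": "green", "FAIL": "red", "BLOCKED": "grey"}

fig, ax = plt.subplots()
# One scatter call per result value, so each gets its own color and legend entry.
for result, group in df.groupby("Ergebnis"):
    ax.scatter(group["Zeitstempel"], group["indeces"],
               color=colors.get(result, "black"), label=result)

ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y %H:%M"))
fig.autofmt_xdate()
ax.set_xlabel("Zeitstempel")
ax.set_ylabel("indeces")
ax.legend(title="Ergebnis")
plt.show()
```

Using points instead of a line keeps each test result visible as a separate, colored marker rather than connecting unrelated indeces.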

Take a look at the answer I posted at: How to read a dataframe in np.genfromtxt instead of a file in matplotlib. It shows how to load data from a CSV file with np.genfromtxt() and then generate a color-coded plot similar to the one you want. If you can map your Pandas data to a NumPy recarray, the rest of the process works the same.
I'm not familiar with Pandas, so I can only supply pseudo-code. It will look something like this. Replace this call to np.genfromtxt() (which reads the CSV data from a file):
csv_arr = np.genfromtxt(csv, # Data to be read
...)
With the following lines to create the recarray. (I reused the csv_arr name to simplify. Feel free to use any name you like):
nrows = len(df)
csv_dt = np.dtype([('indeces', '<i4'), ('Zeitstempel', 'O'), ('Ergebnis', '<U7')])
csv_arr = np.empty(shape=(nrows,), dtype=csv_dt)
# Zeitstempel: parse the strings (format taken from the question's sample) and
# store them as datetime-like objects in the 'O' (object) field
csv_arr['Zeitstempel'] = pd.to_datetime(df['Zeitstempel'], format='%d.%m.%Y %H:%M:%S').to_numpy(dtype=object)
csv_arr['indeces'] = df['indeces'].to_numpy()
csv_arr['Ergebnis'] = df['Ergebnis'].to_numpy()
After you add your Pandas data to csv_arr, the rest of the code should work the same and create the same plot shown in the referenced answer. Good luck.
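As an alternative to filling the array field by field, pandas can build a record array directly: DataFrame.to_records(index=False) returns a NumPy recarray whose field names are the column names, and its column_dtypes argument can mirror the '<i4' and '<U7' choices above. This is a sketch only; whether the referenced answer's plotting code accepts a datetime64 Zeitstempel field (instead of Python datetime objects) is an untested assumption.

```python
import numpy as np
import pandas as pd

# First rows from the question, with Zeitstempel parsed into datetimes.
df = pd.DataFrame({
    "indeces": [382, 383, 392],
    "Zeitstempel": pd.to_datetime(
        ["16.04.2020 16:12:07", "16.04.2020 16:13:07", "16.04.2020 16:13:20"],
        format="%d.%m.%Y %H:%M:%S"),
    "Ergebnis": ["PASS", "PASS", "FAIL"],
})

# index=False drops the DataFrame's row index, so only the three columns remain.
# column_dtypes mirrors the '<i4' and '<U7' choices of the hand-built dtype.
csv_arr = df.to_records(index=False,
                        column_dtypes={"indeces": "<i4", "Ergebnis": "<U7"})
print(csv_arr.dtype.names)
```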

Related

Generating a mouse heatmap with X, Y coordinates

I'm trying to use Python to generate a mouse heatmap using a large set of X, Y coordinates. I've imported the CSV using Pandas, here's the first few rows to get an idea of what it looks like:
X Y
0 2537 638
1 2516 637
2 2451 644
3 2317 652
4 2147 658
5 1999 647
I've tried using Matplotlib without much success, so I switched to Seaborn to try to generate the heatmap that way. For reference, this is what I'm hoping to generate (with a different image in the background):
https://imgur.com/s5qiBsB
This is what my current code looks like:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
df = pd.read_csv(r'C:\Users\Jen\Desktop\mp.csv')
df[["x", "y"]] = pd.DataFrame.to_numpy(df)
matrix = np.zeros((df.x.max()+1, df.y.max()+1))
matrix[df.x, df.y] = df.index
sns.heatmap(matrix, cmap='jet')
plt.show()
With the following as a result:
https://imgur.com/12dMBsk
Obviously, this isn't exactly what I'm going for. First off, my x and y axes are swapped. What do I need to do to make my result look more like the example I provided? How do I create that blob effect around the different points?
More than happy to try anything at this point. This dataset is about 13,000 rows but I anticipate it will be even larger in the future.
(For reference, these were captured on 2 monitors, each at a resolution of 1650x1050, hence the large x values.)
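Two things seem to be going on in the code above: matrix[df.x, df.y] puts x on the rows, and seaborn draws rows vertically, so the axes come out swapped; and each cell holds a row index rather than a count, so there is nothing to blur. A minimal sketch of another route, binning the clicks with np.histogram2d (y first, so rows follow the screen's vertical direction) and then smoothing with a Gaussian kernel to get the "blob" effect. The 3300x1050 canvas assumes the two 1650x1050 monitors sit side by side; bin_size and sigma are arbitrary tuning knobs, and the blur is done with plain NumPy instead of SciPy only to keep the example dependency-free.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def mouse_heatmap(x, y, width=3300, height=1050, bin_size=10, sigma=3.0):
    """Return a blurred 2-D count grid with rows = screen y and columns = screen x."""
    # histogram2d puts its first argument on axis 0 (rows). Passing y first makes
    # rows follow the vertical screen direction, which fixes the swapped axes.
    counts, _, _ = np.histogram2d(
        np.asarray(y), np.asarray(x),
        bins=[height // bin_size, width // bin_size],
        range=[[0, height], [0, width]],
    )
    # Separable Gaussian blur: a normalized 1-D kernel applied along every row,
    # then along every column. This spreads each click into a soft blob.
    radius = int(3 * sigma)
    kernel = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    kernel /= kernel.sum()

    def blur(v):
        return np.convolve(v, kernel, mode="same")

    blurred = np.apply_along_axis(blur, 1, counts)
    return np.apply_along_axis(blur, 0, blurred)

# First rows from the question.
df = pd.DataFrame({"X": [2537, 2516, 2451, 2317, 2147, 1999],
                   "Y": [638, 637, 644, 652, 658, 647]})
heat = mouse_heatmap(df["X"], df["Y"])

fig, ax = plt.subplots(figsize=(11, 3.5))
# A background screenshot could be drawn first with ax.imshow(img, extent=...).
# extent maps the grid back to pixels; top=0 keeps y growing downwards like a screen.
ax.imshow(heat, cmap="jet", extent=[0, 3300, 1050, 0], alpha=0.6)
plt.show()
```

Because the work is a single histogram plus two 1-D convolutions, it should scale to far more than 13,000 rows; a larger sigma gives bigger, smoother blobs.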

one-line dataframe df.to_csv fails, flipping all the data

I'm using a dataframe to calculate a bunch of stuff, with results winding up in the SECOND-TO-LAST LINE of the df.
I need to append JUST THAT ONE LINE to a CSV file.
Instead of putting the labels across the top and the data beneath, it always puts the labels in the first column and the data in the second column.
Subsequent writes keep appending data DOWN - under the first column.
I'm using code like this:
if not os.path.isfile(csvFilePath):
    df.iloc[-2].to_csv(csvFilePath, mode='w', index=True, sep=';', header=True)
else:
    df.iloc[-2].to_csv(csvFilePath, mode='a', index=False, sep=';', header=False)
The "csv" file it produces looks like this (two iterations):
;2021-04-29 07:00:00
open;54408.26
high;54529.67
low;54300.0
close;54500.0
volume;180.44990968
ATR;648.08
RSI;41.2556049907123
ticker;54228.51
BidTarget_1;53012.42
Bdistance_1;1216.0
BidTarget_2;54031.94
BCOGdistance_2;197.0
AskTarget_1;54934.18
ACOGdistance_1;705.67
AskTarget_2;55494.92
ACOGdistance_2;1266.41
TotBid;207.34781091999974
TotAsk;199.80037382000046
AskBidRatio;0.96
54408.26
54529.67
54300.0
54500.0
180.44990968
648.08
41.2556049907123
54071.49
53011.46
1060.0
53665.5
406.0
54620.97
549.48
54398.77
327.28
208.08094453999973
186.65960602000038
0.9
I'm at a complete loss ...
Say I start with a .csv file that contains:
hello, from, this, file
another, amazing, line, csv
save, this, line, data
last, line, of, file
where the second-to-last line is the desired output.
I think you can get what you want by using
import pandas
df = pandas.read_csv("myfile.csv", header=None)
df.iloc[-2].to_frame().transpose()
The trick is that df.iloc[-2] returns a Pandas Series. You can determine the datatype using
type(df.iloc[-2])
which returns pandas.core.series.Series. A Series is one-dimensional: the original column names become its index, so when it is written with to_csv each label/value pair ends up on its own line.
The Pandas Series can be converted back to a dataframe using df.iloc[-2].to_frame(), but the orientation is flipped 90 degrees (matching the Series orientation). To get back to the desired orientation, the transformation called transpose (flip about the diagonal) is needed.
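Another way to get the same one-row orientation is to pass a list to iloc: df.iloc[[-2]] returns a one-row DataFrame instead of a Series, so to_csv writes the labels across and the values beneath. The sketch below applies that to the append loop from the question; append_row is a hypothetical helper name, the demo values are made up, and it keeps index=True on every write (the question used index=False when appending, which drops the timestamp column and shifts the values against the header).

```python
import os
import tempfile
import pandas as pd

def append_row(df, path, position=-2, sep=";"):
    """Append a single row of df to a CSV file, writing the header only once."""
    # iloc with a list ([position]) returns a one-row DataFrame, so the column
    # labels go across the top; iloc[position] would return a Series instead.
    row = df.iloc[[position]]
    write_header = not os.path.isfile(path)
    # Same index setting on every write so later rows stay aligned with the header.
    row.to_csv(path, mode="a", sep=sep, header=write_header, index=True)

# Illustrative data only: three timestamped rows with made-up values.
demo = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0], "close": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2021-04-29 07:00", "2021-04-29 08:00", "2021-04-29 09:00"]),
)
path = os.path.join(tempfile.mkdtemp(), "results.csv")
append_row(demo, path)  # first call creates the file and writes the header
append_row(demo, path)  # second call appends only the data row
with open(path) as fh:
    print(fh.read())
```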

How can I specify multiple variables for the hue parameters when plotting with seaborn?

When using seaborn, is there a way I can include multiple variables (columns) for the hue parameter? Another way to ask this question would be how can I group my data by multiple variables before plotting them on a single x,y axis plot?
I want to do something like the code below. However, currently I am not able to specify two variables for the hue parameter:
sns.relplot(x='#', y='Attack', hue=['Legendary', 'Stage'], data=df)
For example, assume I have a pandas DataFrame like the one below, containing a Pokemon database obtained via this tutorial.
I want to plot the pokedex # on the x-axis and Attack on the y-axis. However, I want the data to be grouped by both Stage and Legendary. Using matplotlib, I wrote a custom function that groups the dataframe by ['Legendary','Stage'] and then iterates through each group for plotting (see results below). Although my custom function works as intended, I was hoping this could be achieved simply with seaborn. Surely other people have attempted to visualize more than 3 variables in a single plot using seaborn?
fig, ax = plt.subplots()
grouping_variables = ['Stage', 'Legendary']
group_1 = df.groupby(grouping_variables)
for group_1_label, group_1_df in group_1:
    ax.scatter(group_1_df['#'], group_1_df['Attack'], label=group_1_label)
ax_legend = ax.legend(title=grouping_variables)
Edit 1:
Note: In the example I provided, I grouped the data by only two variables (e.g. Legendary and Stage). However, other situations may require an arbitrary number of variables (e.g. 5 variables).
You can leverage the fact that hue accepts either a column name, or a sequence of the same length as your data, listing the color categories to assign each data point to. So...
sns.relplot(x='#', y='Attack', hue='Stage', data=df)
... is basically the same as:
sns.relplot(x='#', y='Attack', hue=df['Stage'], data=df)
You typically wouldn't use the latter, it's just more typing to achieve the same thing -- unless you want to construct a custom sequence on the fly:
sns.relplot(x='#', y='Attack', data=df,
hue=df[['Legendary', 'Stage']].apply(tuple, axis=1))
The way you build the sequence that you pass via hue is entirely up to you. The only requirements are that it must have the same length as your data and, if it is array-like, it must be one-dimensional. So you can't just pass hue=df[['Legendary', 'Stage']]; you have to somehow concatenate the columns into one. I chose tuple as the simplest and most versatile way, but if you want more control over the formatting, build a Series of strings. I'll save it into a separate variable here for better readability and so that I can assign it a name (which will be used as the legend title), but you don't have to:
hue = df[['Legendary', 'Stage']].apply(
lambda row: f"{row.Legendary}, {row.Stage}", axis=1)
hue.name = 'Legendary, Stage'
sns.relplot(x='#', y='Attack', hue=hue, data=df)
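For the arbitrary-number-of-variables case from Edit 1, the same string-building idea can be wrapped in a small function that accepts any list of columns. combined_hue is a hypothetical helper name, and the three demo rows are illustrative only; relying on the Series name as the legend title follows the answer above.

```python
import pandas as pd

def combined_hue(df, cols, sep=", "):
    """Join any number of columns into one string per row, for seaborn's hue=."""
    # astype(str) makes booleans and ints joinable; apply(..., axis=1) hands each
    # row to sep.join, which concatenates that row's values.
    hue = df[cols].astype(str).apply(sep.join, axis=1)
    # The name doubles as the legend title.
    hue.name = sep.join(cols)
    return hue

# Illustrative rows with the question's column names.
pokemon = pd.DataFrame({"#": [1, 2, 3],
                        "Attack": [49, 62, 82],
                        "Legendary": [False, False, True],
                        "Stage": [1, 2, 3],
                        "Name": ["Bulbasaur", "Ivysaur", "Venusaur"]})
hue = combined_hue(pokemon, ["Legendary", "Stage", "Name"])
print(hue.tolist())
```

Usage with seaborn would then be sns.relplot(x='#', y='Attack', hue=combined_hue(df, ['Legendary', 'Stage', 'Name']), data=df), with as many columns in the list as needed.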
To use the hue parameter of seaborn.relplot, consider concatenating the needed grouping columns into a single column and then plotting with that new column as hue:
def run_plot(df, flds):
    # CREATE NEW COLUMN OF CONCATENATED VALUES
    # (index=df.index keeps the new Series aligned with df's rows)
    df['_'.join(flds)] = pd.Series(df.reindex(flds, axis='columns')
                                     .astype('str')
                                     .values.tolist(),
                                   index=df.index
                                  ).str.join('_')
    # PLOT WITH hue (use the df argument, not the global random_df)
    sns.relplot(x='#', y='Attack', hue='_'.join(flds), data=df, aspect=1.5)
    plt.show()
    plt.clf()
    plt.close()
To demonstrate with random data
Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
### DATA
np.random.seed(22320)
random_df = pd.DataFrame({'#': np.arange(1,501),
'Name': np.random.choice(['Bulbasaur', 'Ivysaur', 'Venusaur',
'Charmander', 'Charmeleon'], 500),
'HP': np.random.randint(1, 100, 500),
'Attack': np.random.randint(1, 100, 500),
'Defense': np.random.randint(1, 100, 500),
'Sp. Atk': np.random.randint(1, 100, 500),
'Sp. Def': np.random.randint(1, 100, 500),
'Speed': np.random.randint(1, 100, 500),
'Stage': np.random.randint(1, 3, 500),
'Legend': np.random.choice([True, False], 500)
})
Plots
run_plot(random_df, ['Legend', 'Stage'])
run_plot(random_df, ['Legend', 'Stage', 'Name'])
In seaborn's scatterplot(), you can combine a hue= and a style= parameter to produce different markers and different colors for each combination.
Example (taken verbatim from the documentation):
tips = sns.load_dataset("tips")
ax = sns.scatterplot(x="total_bill", y="tip", data=tips)
ax = sns.scatterplot(x="total_bill", y="tip",
hue="day", style="time", data=tips)

Seaborn time series plotting: a different problem for each function

I'm trying to use seaborn dataframe functionality (e.g. passing column names to x, y and hue plot parameters) for my timeseries (in pandas datetime format) plots.
x should come from a timeseries column (converted from a pd.Series of strings with pd.to_datetime)
y should come from a float column
hue comes from a categorical column that I calculated.
There are multiple streams in the same series that I am trying to separate (using hue to distinguish them visually), and therefore they should not be connected by a line (as in a scatterplot).
I have tried the following plot types, each with a different problem:
sns.scatterplot: gets the plotting and the labels right but has problems with the x limits, and I could not set them correctly with plt.xlim() using data.Dates.min() and data.Dates.max()
sns.lineplot: gets the limits and the labels right, but I could not find a setting to disable the lines between the individual data points as in matplotlib. I tried setting the markers and the dashes parameters, to no avail.
sns.stripplot: my last try; it plotted the data points correctly and got the x limits right, but messed up the tick labels
Example input data for easy reproduction:
import numpy as np
import pandas as pd

dates = pd.to_datetime(('2017-11-15',
'2017-11-29',
'2017-12-15',
'2017-12-28',
'2018-01-15',
'2018-01-30',
'2018-02-15',
'2018-02-27',
'2018-03-15',
'2018-03-27',
'2018-04-13',
'2018-04-27',
'2018-05-15',
'2018-05-28',
'2018-06-15',
'2018-06-28',
'2018-07-13',
'2018-07-27'))
values = np.random.randn(len(dates))
clusters = np.random.randint(0, 3, size=len(dates))  # upper bound is exclusive: randint(1) would give only zeros
D = {'Dates': dates, 'Values': values, 'Clusters': clusters}
data = pd.DataFrame(D)
To each of the functions I am passing the same arguments:
sns.OneOfThePlottingFunctions(x='Dates',
y='Values',
hue='Clusters',
data=data)
plt.show()
So to recap, what I want is a plot that uses seaborn's pandas functionality and plots points (not lines) with correct x limits and readable x labels :)
Any help would be greatly appreciated.
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)
ax.set_xlim(data['Dates'].min(), data['Dates'].max())
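For the readable-labels part, matplotlib's AutoDateLocator combined with ConciseDateFormatter can be set on the Axes after the limits are fixed. The sketch below uses ax.scatter per cluster instead of sns.scatterplot only so that it runs with matplotlib alone; the same locator and formatter lines should apply to the Axes returned by sns.scatterplot (an assumption, not tested here). The 7-day padding on the limits is arbitrary and just keeps the first and last points from sitting on the frame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Illustrative data shaped like the question's: dates, random values, 3 clusters.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-11-15", "2018-07-27", periods=18)
data = pd.DataFrame({"Dates": dates,
                     "Values": rng.standard_normal(len(dates)),
                     "Clusters": rng.integers(0, 3, size=len(dates))})

fig, ax = plt.subplots()
# One scatter call per cluster: points only, no connecting lines.
for cluster, group in data.groupby("Clusters"):
    ax.scatter(group["Dates"], group["Values"], label=str(cluster))

# x limits taken from the data, padded so edge points are not clipped.
pad = pd.Timedelta(days=7)
ax.set_xlim(data["Dates"].min() - pad, data["Dates"].max() + pad)

# Let matplotlib pick tick positions and compact, non-overlapping date labels.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.legend(title="Clusters")
plt.show()
```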

Multiple Axes and Plots

Sorry if this post is not that good; it's my first one on Stack Overflow.
I have Datasets in the following structure:
Revolution1 Position1 Temperature1 Revolution2 Position2 Temperature2
1/min mm C 1/min m C
data ...
I plot these against time. Now I want a separate y-axis for each distinct unit. So I looked at the matplotlib example and wrote something like this. x holds the x-values and d is the pandas dataframe:
fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
uni_units = np.unique(units[1:])
# Offset the right spine of par2 so it does not sit on top of par1's axis.
par2.spines["right"].set_position(("axes", 1.2))
# make_patch_spines_invisible is the helper function from the matplotlib example
make_patch_spines_invisible(par2)
# Second, show the right spine.
par2.spines["right"].set_visible(True)
for i, v in enumerate(header[1:]):
    if d.loc[0, v] == uni_units[0]:
        y = d.loc[an:en, v].values
        host.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[1]:
        y = d.loc[an:en, v].values
        par1.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[2]:
        y = d.loc[an:en, v].values
        par2.plot(x, y, label=v)
EDIT: Okay, I forgot to actually ask the question (maybe I was nervous, because it was my first time posting here):
I actually wanted to ask why it does not work, since I only saw 2 plots. But by zooming in I saw that it actually plots every curve...
Sorry!
If I understand correctly, what you want is to get subplots from the DataFrame.
You can do this with the subplots parameter of the DataFrame's plot method.
The toy example below should give you a better idea of how this works:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"y1":[1,5,3,2],"y2":[10,12,11,15]})
df.plot(subplots=True)
plt.show()
This produces a figure with one subplot per column (y1 and y2).
See the pandas DataFrame.plot documentation for details on the subplots parameter.
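If you do want everything on one set of axes with one y-axis per unit, as in the question's code, the twinx approach can be written without hard-coding three axes. A sketch under stated assumptions: plot_by_unit is a hypothetical helper, units is a hypothetical {column: unit} dict (in the question the units live in the first row of the DataFrame instead), the 0.2 spine offset and the right-margin formula are arbitrary, and the make_patch_spines_invisible helper is skipped on the assumption that recent matplotlib versions do not need it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_by_unit(df, units, x):
    """Plot every column of df, giving each distinct unit its own y-axis."""
    distinct = list(dict.fromkeys(units.values()))  # unique units, first-seen order
    fig, host = plt.subplots()
    # Leave room on the right for the extra spines (margin formula is arbitrary).
    fig.subplots_adjust(right=max(0.5, 0.9 - 0.1 * (len(distinct) - 1)))
    axes = {distinct[0]: host}
    for i, unit in enumerate(distinct[1:]):
        ax = host.twinx()
        # The first twin keeps the normal right spine; later ones are pushed outward.
        ax.spines["right"].set_position(("axes", 1 + 0.2 * i))
        axes[unit] = ax
    for col, unit in units.items():
        axes[unit].plot(x, df[col], label=col)
        axes[unit].set_ylabel(unit)
    return fig, axes

# Illustrative data using column names and units from the question.
t = np.linspace(0, 10, 50)
demo = pd.DataFrame({"Revolution1": 1000 + 50 * np.sin(t),
                     "Position1": 5 * t,
                     "Temperature1": 20 + t,
                     "Revolution2": 900 + 40 * np.cos(t),
                     "Temperature2": 25 + 0.5 * t})
units = {"Revolution1": "1/min", "Position1": "mm", "Temperature1": "C",
         "Revolution2": "1/min", "Temperature2": "C"}
fig, axes = plot_by_unit(demo, units, t)
plt.show()
```

Curves that share a unit also share an axis, which can make them overlap closely; as the question's edit found, zooming in shows that every curve is still drawn.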