I am trying to plot five columns per iteration, but my current code is plotting everything five times. How can I make it plot five columns per iteration without repeating them?
n=4
for tag_1, tag_2, tag_3, tag_4, tag_5 in zip(df.columns[n:], df.columns[n+1:], df.columns[n+2:], df.columns[n+3:], df.columns[n+4:]):
    fig, ax = plt.subplots(ncols=5, tight_layout=True, sharey=True, figsize=(20, 3))
    sns.scatterplot(df, x=tag_1, y='variable', ax=ax[0])
    sns.scatterplot(df, x=tag_2, y='variable', ax=ax[1])
    sns.scatterplot(df, x=tag_3, y='variable', ax=ax[2])
    sns.scatterplot(df, x=tag_4, y='variable', ax=ax[3])
    sns.scatterplot(df, x=tag_5, y='variable', ax=ax[4])
    plt.show()
You are slicing the column index in the wrong way. df.columns[n:] returns every column name from index n to the last one, and the same holds for n+1, n+2, n+3 and n+4. zip then walks these five slices in lockstep, so each iteration yields five consecutive columns and the next iteration starts just one column further along. The windows overlap, which is why each column ends up being plotted up to five times. Note also that zip stops as soon as its shortest input is exhausted (in this case df.columns[n+4:]).
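To see the overlap concretely, here is a small sketch that applies the same zip pattern to a hypothetical list of eight column names instead of a real dataframe (the names and n = 0 are assumptions made only for illustration):

```python
# Hypothetical column names, used only to illustrate the slicing behaviour
columns = list("abcdefgh")
n = 0

# Same pattern as in the question: five slices, each shifted by one position
windows = list(zip(columns[n:], columns[n+1:], columns[n+2:],
                   columns[n+3:], columns[n+4:]))

for window in windows:
    print(window)
# ('a', 'b', 'c', 'd', 'e')
# ('b', 'c', 'd', 'e', 'f')
# ('c', 'd', 'e', 'f', 'g')
# ('d', 'e', 'f', 'g', 'h')
```

Each window starts one column after the previous one, so a column such as 'e' appears in every window here.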
You can achieve what you want by adapting your code as follows:
# Imports to create sample data
import string
import random
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Create some sample data and a sample dataframe
data = {string.ascii_lowercase[i]: [random.randint(0, 100) for _ in range(100)] for i in range(15)}
df = pd.DataFrame(data)

# Iterate in groups of five indexes
for start in range(0, len(df.columns), 5):
    # Get the next five columns. Pay attention to the case in which the number of columns is not a multiple of 5
    cols = [df.columns[idx] for idx in range(start, min(start + 5, len(df.columns)))]
    # Adapt your plot and take into account that the last group can be smaller than 5.
    # squeeze=False always returns a 2D array of axes, so indexing also works when a single column is left
    fig, ax = plt.subplots(ncols=len(cols), tight_layout=True, sharey=True, figsize=(20, 3), squeeze=False)
    for idx in range(len(cols)):
        # sns.scatterplot(df, x=cols[idx], y='variable', ax=ax[0, idx])
        sns.scatterplot(df, x=cols[idx], y=df[cols[idx]], ax=ax[0, idx])  # In the example the values of the column are plotted
    plt.show()
In this case, the code performs the following steps:
Iterate over groups of at most five indexes ([0->4], [5->9]...)
Retrieve the columns located at those indexes. The last group may contain fewer than 5 columns (e.g., with 18 columns, the last group consists of the columns at indexes 15, 16 and 17)
Create the plot, taking into account the corner case of a group with fewer than 5 columns
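The grouping step on its own can be sketched without any plotting. The helper name column_groups and the 18 column names below are hypothetical, chosen only to reproduce the corner case described above:

```python
def column_groups(columns, size=5):
    """Split a sequence of column names into consecutive groups of at most `size`."""
    columns = list(columns)
    return [columns[start:start + size] for start in range(0, len(columns), size)]


# 18 hypothetical column names, so the last group is shorter than five
names = [f"col_{i}" for i in range(18)]
groups = column_groups(names)

print([len(g) for g in groups])  # [5, 5, 5, 3]
print(groups[-1])                # ['col_15', 'col_16', 'col_17']
```

Slicing past the end of a list simply returns the remaining items, so no explicit min() is needed here.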
With Seaborn's object interface, available from v0.12, we can do it like this:
from numpy import random
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import seaborn.objects as so
sns.set_theme()
First, let's create a sample dataset, just as in trolloldem's answer.
random.seed(0) # To produce the same random values across multiple runs
columns = list("abcdefghij")
sample_size = 20
df_orig = pd.DataFrame(
    {c: random.randint(100, size=sample_size) for c in columns},
    index=pd.Series(range(sample_size), name="variable")
)
Then transform the data frame into long form for easier processing.
df = (df_orig
      .melt(value_vars=columns, var_name="tag", ignore_index=False)
      .reset_index()
)
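To make the reshaping concrete, here is a tiny sketch of the same melt and reset_index steps applied to a made-up two-column frame (the frame `wide` and its values are assumptions for illustration only):

```python
import pandas as pd

# Tiny hypothetical wide frame: one row per "variable", one column per tag
wide = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.Series([0, 1], name="variable"),
)

# Same reshaping steps as above: keep the index while melting, then turn it into a column
long = (
    wide
    .melt(value_vars=["a", "b"], var_name="tag", ignore_index=False)
    .reset_index()
)
print(long)
```

The result has one row per (variable, tag) pair, with the columns variable, tag and value, which is the shape that faceting by "tag" expects.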
Finally, render the plots, five per row.
(
    so.Plot(df, x="value", y="variable")  # Or you might do x="variable", y="value" instead
    .facet(col="tag", wrap=5)
    .add(so.Dot())
)
import pandas as pd
df = pd.read_excel(some data)
df2 = df.groupby(['Country', "Year"]).sum()
It looks like this:
              Sales  COGS  Profit  Month Number
Country Year
Canada  2013   3000
Canada  2014   3500
Other countries... other data
df3 = df2[[' Sales']]
I can plot it like this with the code:
df3.plot(kind="bar")
And it produces a bar chart.
But I want to turn it into a line chart, and a simple plot call does not give the result I am after.
I am stuck as to which one-liner will produce a chart with time on the x-axis and sales on the y-axis, with one line per country.
You have to unstack the Country level:
import matplotlib.pyplot as plt
df2 = df.groupby(['Country', 'Year'])['Sales'].sum().unstack('Country')
# Or df2.plot(title='Sales').set_xticks(df2.index)
ax = df2.plot(title='Sales')
ax.set_xticks(df2.index)
plt.show()
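To show what the unstack step produces, here is a minimal sketch on made-up sales figures (only the Canada numbers echo the question; the second country and its values are invented for illustration):

```python
import pandas as pd

# Made-up sales figures, only to show the reshaping
df = pd.DataFrame({
    "Country": ["Canada", "Canada", "France", "France"],
    "Year": [2013, 2014, 2013, 2014],
    "Sales": [3000, 3500, 2000, 2500],
})

# Sum per (Country, Year), then move Country from the row index to the columns
wide = df.groupby(["Country", "Year"])["Sales"].sum().unstack("Country")
print(wide)
```

The result is indexed by Year with one column per country, so a plain wide.plot() draws one line per country.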
I am new to Python.
I am analyzing a dataset and need some help plotting a barplot with error bars showing the SD.
An example data set is available at the following link: https://drive.google.com/file/d/10JDr7d_vhEocWzChg-sfBEumsWVghFS8/view?usp=sharing
Here is the code that I am using:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
df = pd.read_excel('Sample_data.xlsx')
#Adding a column 'Total' by adding all cell counts in each row
#This will give the cells counted in each sample
df['Total'] = df['Cell1'] + df['Cell2'] + df['Cell3'] + df['Cell4']
df
# Creating a pivot table based on Timepoint and cell types
phenotype = df.pivot_table(index=['Timepoint'],
                           values=['Cell1',
                                   'Cell2',
                                   'Cell3',
                                   'Cell4'],
                           aggfunc=np.sum,
                           margins=False)
phenotype
# plot different cell types grouped according to the timepoint and error bars = SD
sns.barplot(data = phenotype)
Now I am stuck on plotting the cell types grouped by the Timepoint column, with error bars showing the SD.
Any help is much appreciated.
Thanks.
If you swap the rows and columns from pivot, you get the format you want. Does this fit the intent of your question?
phenotype = df.pivot_table(index=['Time point'],
                           values=['Cell1', 'Cell2', 'Cell3', 'Cell4'],
                           aggfunc=np.sum,
                           margins=False)
phenotype.reset_index()
phenotype = phenotype.stack().unstack(level=0)
phenotype
Time point   48   72   96
Cell1        54  395   57
Cell2        33   35   39
Cell3         1    3    9
Cell4         2    6    3
sns.boxplot(data = phenotype)
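For the SD error bars themselves, one point worth noting is that a summed pivot has no spread left, so the SD has to come from the per-sample rows. Below is a minimal sketch of one possible approach, using pandas' own bar plot with yerr rather than seaborn. It assumes a raw frame with a Timepoint column and per-sample count columns named as in the question; the numbers are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts: three samples per timepoint (not the real data set)
df = pd.DataFrame({
    "Timepoint": [48, 48, 48, 72, 72, 72],
    "Cell1": [10, 12, 14, 20, 24, 28],
    "Cell2": [5, 5, 5, 7, 9, 11],
})

cells = ["Cell1", "Cell2"]
means = df.groupby("Timepoint")[cells].mean()
sds = df.groupby("Timepoint")[cells].std()  # sample SD (ddof=1)

# One group of bars per timepoint, one bar per cell type, error bars = SD
ax = means.plot(kind="bar", yerr=sds, capsize=4)
ax.set_ylabel("Mean count per sample")
```

In a notebook the figure is displayed automatically; in a script, call plt.show() afterwards.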
I am working on the Spotify dataset from Kaggle. I plotted a barplot showing the top artists with most songs in the dataframe.
But the X-axis is showing numbers and I want to show names of the Artists.
names = list(df1['artist'][0:19])
plt.figure(figsize=(8,4))
plt.xlabel("Artists")
sns.barplot(x=np.arange(1,20),
            y=df1['song_title'][0:19]);
I tried both a list and a Series, but both give an error.
How can I replace the numbers in the xticks with the artist names?
Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Data
Data from Spotify - All Time Top 2000s Mega Dataset
df = pd.read_csv('Spotify-2000.csv')
titles = pd.DataFrame(df.groupby(['Artist'])['Title'].count()).reset_index().sort_values(['Title'], ascending=False).reset_index(drop=True)
titles.rename(columns={'Title': 'Title Count'}, inplace=True)
# titles.head()
Artist              Title Count
Queen                        37
The Beatles                  36
Coldplay                     27
U2                           26
The Rolling Stones           24
Plot
plt.figure(figsize=(8, 4))
chart = sns.barplot(x=titles.Artist[0:19], y=titles['Title Count'][0:19])
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
plt.show()
OK, so I didn't know this, although in hindsight it seems silly not to have tried it!
Pass the names (or string labels) as the argument for the x-axis.
Use plt.xticks(rotation=90) so the labels don't overlap.
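A minimal sketch of those two points, using plain matplotlib bars to keep it short (the same plt.xticks call can be made after sns.barplot, since it acts on the current axes). The artist names and counts are the five rows shown in the table earlier in this thread:

```python
import matplotlib.pyplot as plt

# Artist names and counts from the table earlier in the thread
artists = ["Queen", "The Beatles", "Coldplay", "U2", "The Rolling Stones"]
counts = [37, 36, 27, 26, 24]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(artists, counts)                 # string labels go straight onto the x-axis
ax.set_xlabel("Artists")
locs, labels = plt.xticks(rotation=90)  # rotate the tick labels of the current axes
fig.tight_layout()
```

Call plt.show() afterwards when running this as a script.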
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')
import numpy as np
import pandas as pd
my_data = np.array([[ 0.110622 , 0.98174432, 0.56583323],
[ 0.61825694, 0.14166864, 0.44180003],
[ 0.02572145, 0.55764373, 0.24183103],
[ 0.98040318, 0.76171712, 0.41994361],
[ 0.49859658, 0.76637672, 0.75487683]])
pd.DataFrame(my_data).plot(kind='bar', stacked=True)
Using the above code I get:
How do I change this so that the height of every bar is the max value for that bar instead of the sum, with all the lower values shown inside the same bar in different colors?
Thanks for your help.
If I understood your question correctly, I would normalize your data by multiplying each value by the maximum of its row and then dividing by the sum of all elements in that row. That is:
df = df.apply(lambda x: x*df.max(axis=1)/df.sum(axis=1))
where:
df = pd.DataFrame(my_data)
The new plot is:
df.plot(kind='bar', stacked=True)
Hope that helps.
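As a quick sanity check of that scaling, here is a small sketch on the first two rows of the question's data. It uses DataFrame.mul with axis=0 as an equivalent way to write the row-wise scaling; the variable name `scaled` is only for illustration.

```python
import numpy as np
import pandas as pd

# First two rows of the data from the question
my_data = np.array([[0.110622, 0.98174432, 0.56583323],
                    [0.61825694, 0.14166864, 0.44180003]])
df = pd.DataFrame(my_data)

# Scale every row by (row maximum / row sum)
scaled = df.mul(df.max(axis=1) / df.sum(axis=1), axis=0)

# After scaling, the stacked height of each bar equals the original row maximum
print(np.allclose(scaled.sum(axis=1), df.max(axis=1)))  # True
```

Keep in mind that the individual segments are rescaled too, so within each bar they show every value's share of the total rather than the raw values.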