Scale_fill_manual using column of a dataframe - ggplot2

Using ggplot2, I want to draw a bar plot whose fill colors are taken from a column of the same data frame.
A reproducible example is below:
library(ggplot2)
df <- cbind.data.frame("bar_colors" = c("red", "blue", "green"), "Counts" = c(10, 20, 30), "Spp" = c("a", "a", "a"))
ggplot(data = df, aes(x = Spp, y = as.numeric(as.character(Counts)), fill = bar_colors)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = as.character(df$bar_colors))
But the colors are mixed up. What am I missing?
Thanks
EDIT:
I solved it myself by releveling the factor so that its levels follow the row order instead of the default alphabetical order:
df$bar_colors<-factor(df$bar_colors, levels = as.character(df$bar_colors))
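Why the relevel helps: with default settings the levels of bar_colors are sorted alphabetically (blue, green, red), and an unnamed values vector in scale_fill_manual is matched to those levels by position. The Python sketch below only mimics that pairing logic; it does not call ggplot2, and the variable names are illustrative.

```python
# Illustration only: mimics how an unnamed `values` vector is paired,
# by position, with the levels of the fill variable. Not ggplot2 code.
bar_colors = ["red", "blue", "green"]  # the column from the question

# Default: levels are sorted alphabetically.
default_levels = sorted(set(bar_colors))
mixed_up = dict(zip(default_levels, bar_colors))
print(mixed_up)  # {'blue': 'red', 'green': 'blue', 'red': 'green'}

# After releveling: levels follow the row order, so each level maps to itself.
releveled = list(dict.fromkeys(bar_colors))
fixed = dict(zip(releveled, bar_colors))
print(fixed)  # {'red': 'red', 'blue': 'blue', 'green': 'green'}
```

In R, passing a named vector to values, or using scale_fill_identity(), should be alternative ways to get the same pairing without touching the factor levels.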

Related

merging legends when creating a facet chart in Altair

I am trying to create a facet grid of plots in Altair. I have two different data frames that have a common x axis, and a common category for the facet, but each with a different column for determining the color. To plot, I merge these into a single data frame. The problem I am having is that the legend is being displayed on each plot individually. I want the legend just to appear once, at the side of the facet. Here is a simple example of what I am trying to do and the current results.
import pandas as pd
import altair as alt

df1 = pd.DataFrame({'x': [1,2,3,1,2,3,1,2,3],
                    'y1': [6,7,8,1,3,5,9,8,7],
                    'cat': ['A','A','A','B','B','B','C','C','C'],
                    'E1': [120,120,120,200,200,200,80,80,80]})
df2 = pd.DataFrame({'x': [1,2,3,4,1,2,3,4,1,2,3,4,5],
                    'y2': [6,8,8,9,2,4,6,8,9,7,5,4,3],
                    'cat': ['A','A','A','A','B','B','B','B','C','C','C','C','C'],
                    'E2': [2,1,3,2,1,1,3,2,3,2,2,2,3]})
merged = pd.merge(df2, df1, how='outer', on=['cat','x'])

p1 = alt.Chart(merged).mark_line().encode(
    x='x:Q',
    y='y1:Q',
    color=alt.Color('E1:Q', scale=alt.Scale(scheme='viridis'), bin=alt.Bin(maxbins=5))
)
p2 = alt.Chart(merged).mark_circle().encode(
    x='x:Q',
    y='y2:Q',
    color=alt.Color('E2:N', scale=alt.Scale(domain=[1,2,3], range=['black','red','blue']))
)
alt.layer(p1 + p2).facet('cat:N')

How Can I Add A Regression Line to pandas.plot(kind='bar')?

I'd like to add a regression line for each flavor below. How can I do that? Do I need to use subplots? Is it possible using pandas.plot or do I need to use the full matplotlib?
import pandas as pd
# initialize list of lists
data = [[1,157.842730083188,202.290991182781,244.849416438322],
[2,234.516775578511,190.104435611797,202.157088214941],
[3,198.279130213755,193.075780258345,194.112394276613],
[4,156.285653517235,198.382900113055,185.380696178104],
[5,190.653607667334,208.807038546447,202.662790911701],
[6,192.027054343382,168.768097007287,179.315293388299],
[7,144.927513854729,166.183469310198,157.338388768229],
[8,194.096584739985,177.710332802887,188.006211652239],
[9,131.613923150861,112.503607632448,128.947939049068],
[10,139.545538050778,129.935716833166,139.334073132085]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['DensityDecileRank', 'Flavor1', 'Flavor2', 'Flavor3'])
df.plot(x='DensityDecileRank',
        kind='bar',
        stacked=False)
If you don't mind using numpy to calculate the regression values explicitly,
the following code snippet can be used as a quick solution:
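import numpy as np
import matplotlib.pyplot as plt

ax = df.plot(x='DensityDecileRank', kind='bar', stacked=False)
rank, flavors = df.columns[0], df.columns[1:]
for flavor in flavors:
    # Fit a straight line (degree-1 polynomial) to this flavor's values
    reg_func = np.poly1d(np.polyfit(df[rank], df[flavor], 1))
    # Draw the fitted values on the same axes as the bars
    ax.plot(reg_func(df[rank]))
plt.show()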
The code above derives the function reg_func for each flavor, which can be used for calculating the regression values based on the rank values.
The regression lines are plotted in the order of the flavor columns to match the colors. Further formatting can be added to ax.plot.
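One detail worth spelling out: ax.plot is called without x values, so each line is drawn against positions 0..n-1, which is where pandas centers the bar groups of a bar plot. Below is a minimal numpy-only sketch of the fit for a single column. The Flavor1 values are rounded from the question, and the names x_pos and fitted are illustrative.

```python
import numpy as np

rank = np.arange(1, 11)  # DensityDecileRank 1..10
flavor1 = np.array([157.84, 234.52, 198.28, 156.29, 190.65,
                    192.03, 144.93, 194.10, 131.61, 139.55])

# Degree-1 least-squares fit; poly1d turns the coefficients into a callable.
slope, intercept = np.polyfit(rank, flavor1, 1)
reg_func = np.poly1d([slope, intercept])
fitted = reg_func(rank)

# Bar groups sit at positions 0..n-1, not at the rank values 1..n,
# so the fitted values are paired with these positions when plotted.
x_pos = np.arange(len(rank))

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With the axes from the answer, ax.plot(x_pos, fitted) would draw the same line as ax.plot(fitted).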

Add a line at z=0 to ggplot2 heatmap

I have plotted a heatmap in ggplot2. I want to add a curved line to the plot showing where z=0 (i.e. where the value of the data used for the fill is zero). How can I do this?
Thanks
Since no example data or code is provided, I'll illustrate with the volcano dataset, representing heights of a volcano in a matrix. Since the data doesn't contain a zero point, we'll draw the line at the arbitrarily chosen 125 mark.
library(ggplot2)
# Convert matrix to data.frame
df <- data.frame(
  row = as.vector(row(volcano)),
  col = as.vector(col(volcano)),
  value = as.vector(volcano)
)
# Set contour breaks at desired level
ggplot(df, aes(col, row, fill = value)) +
  geom_raster() +
  geom_contour(aes(z = value),
               breaks = 125, col = 'red')
Created on 2020-04-06 by the reprex package (v0.3.0)
If this isn't a good approximation of your problem, I'd suggest including example data and code in your question.

Scatter plot from index column against another column of a Pandas DataFrame

I have a Pandas DataFrame that looks as shown (image not included).
The column localhour is the index. How can I make a scatter plot of the localhour column (the index) against the use column? Thanks in advance.
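One possible approach, sketched under assumptions: since the original data is not available, the frame below is a hypothetical stand-in with a numeric localhour index and a use column. The idea is to turn the index into an ordinary column with reset_index() so that plot.scatter can reference it by name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question
df = pd.DataFrame(
    {"use": [0.42, 0.35, 0.51, 0.88]},
    index=pd.Index([0, 1, 2, 3], name="localhour"),
)

# reset_index() moves the index into a regular column named "localhour"
flat = df.reset_index()
ax = flat.plot.scatter(x="localhour", y="use")
```

Calling matplotlib directly with ax.scatter(df.index, df["use"]) should work as well and leaves the frame unchanged.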

Infer Series Labels and Data from pandas dataframe column for plotting

Consider a simple 2x2 dataset with Series labels prepended as the first column ("Repo"):
             Repo  AllTests  Restricted
0       Galactian    1860.0       410.0
1  Forecast-MLlib     140.0        47.0
Here are the DataFrame columns:
p(df.columns)
[u'Repo', u'AllTests', u'Restricted']
So the first column holds the string labels, and the second and third columns hold the data values. We want one series per row, corresponding to the Galactian and the Forecast-MLlib repos.
This seems like a common task, so there should be a straightforward way to simply plot the DataFrame. However, the following related question does not provide any simple way: it essentially throws away the DataFrame's structural knowledge and plots manually:
Set matplotlib plot axis to be the dataframe column name
So is there a more natural way to plot these Series, one that does not involve deconstructing the already-useful DataFrame but instead infers the first column as labels and the remaining columns as series data points?
Update: here is a self-contained snippet
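import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# npa, p and ps are shorthand helpers that the post does not define;
# they are assumed here to be np.array and print.
npa = np.array
p = ps = print

runtimes = npa([1860.,410.,140.,47.])
runtimes.shape = (2,2)
labels = npa(['Galactian','Forecast-MLlib'])
labels.shape=(2,1)
rtlabels = np.concatenate((labels,runtimes),axis=1)
rtlabels.shape = (2,3)
colnames = ['Repo','AllTests','Restricted']
df = pd.DataFrame(rtlabels, columns=colnames)
ps(df)
df.set_index('Repo').astype(float).plot()
plt.show()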
And here is the output:
             Repo  AllTests  Restricted
0       Galactian    1860.0       410.0
1  Forecast-MLlib     140.0        47.0
And with piRSquared's help it looks like this (image not included).
So the data is showing now, but the series and labels are swapped. I will look further to try to line them up properly.
Another update:
After flipping the columns and labels, the series come out as desired.
The change was:
labels = npa(['AllTests','Restricted'])
..
colnames = ['Repo','Galactian','Forecast-MLlib']
So the updated code is:
runtimes = npa([1860.,410.,140.,47.])
runtimes.shape = (2,2)
labels = npa(['AllTests','Restricted'])
labels.shape=(2,1)
rtlabels = np.concatenate((labels,runtimes),axis=1)
rtlabels.shape = (2,3)
colnames = ['Repo','Galactian','Forecast-MLlib']
df = pd.DataFrame(rtlabels, columns=colnames)
ps(df)
df.set_index('Repo').astype(float).plot()
plt.title("Restricting Long-Running Tests\nin Galactus and Forecast-ML")
plt.show()
p('df columns', df.columns)
ps(df)
Pandas assumes your label information is in the index and columns. Set the index first:
df.set_index('Repo').astype(float).plot()
Or, to get one series per repo (that is, per row), transpose first:
df.set_index('Repo').T.astype(float).plot()
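To see what the two variants do differently, recall that DataFrame.plot() draws one line per column and puts the index on the x axis. The sketch below rebuilds the frame from the question with real floats (so astype(float) is not needed here) and inspects the labels in each orientation. The names by_metric and by_repo are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Repo": ["Galactian", "Forecast-MLlib"],
    "AllTests": [1860.0, 140.0],
    "Restricted": [410.0, 47.0],
})

by_metric = df.set_index("Repo")    # one column per test category
by_repo = df.set_index("Repo").T    # one column per repo

# The column labels are the series names that .plot() would use;
# the index labels end up on the x axis.
print(list(by_metric.columns))  # ['AllTests', 'Restricted']
print(list(by_repo.columns))    # ['Galactian', 'Forecast-MLlib']
print(list(by_repo.index))      # ['AllTests', 'Restricted']
```

So the transposed form is the one that yields a Galactian series and a Forecast-MLlib series, which is what the question asks for.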