How to plot bar plot for dataframe? - pandas

I am trying to plot a bar plot, but it looks really bad.
plt.style.use('ggplot')
x = ['High School or Below', 'College', 'Bachelor', 'Master or Above']
y = [maleDataFrame["Education"].str.contains("High School or Below").sum(),
maleDataFrame["Education"].str.contains("College").sum(),
maleDataFrame["Education"].str.contains("Bachelor").sum(),
maleDataFrame["Education"].str.contains("Master or Above").sum()]
x_pos = [i for i, _ in enumerate(x)]
plt.bar(x_pos, y, color=['blue','red','green','yellow'])
plt.xlabel("Type of Education")
plt.ylabel("Level of Education Male")
customTitle = ""
plt.title(customTitle)
plt.xticks(x_pos, x)
How can I fix this?

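To reproduce this with your own data, put back the line that defines y. Tip: pass width to plt.bar to change the bar width, e.g. plt.bar(x, y, width=0.5). The width is given in x-axis units and defaults to 0.8, so with one bar per category a value such as 30 would make the bars overlap.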
Modified your code to:
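import matplotlib.pyplot as plt

plt.style.use('ggplot')
# "\n" breaks the two long labels onto a second line
x = ['High School \nor Below', 'College', 'Bachelor', 'Master or \nAbove']
y = [10, 20, 30, 40]  # placeholder counts; substitute the sums from your DataFrame
plt.bar(x, y, color=['blue','red','green','yellow'])
plt.xlabel("Type of Education")
plt.ylabel("Level of Education Male")
customTitle = ""
plt.title(customTitle)
plt.show()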
This gives a bar chart in which the two long tick labels are wrapped onto two lines.
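As a possible extension, the counting and the label wrapping can be split into small helpers. This is only a sketch: the names count_levels, wrap_labels and plot_counts and the sample values are made up for illustration. It counts exact matches instead of using str.contains, which matches substrings and treats its argument as a regular expression by default. With a pandas column you would pass maleDataFrame["Education"] as the values; plot_counts needs matplotlib installed.

```python
import textwrap
from collections import Counter

LEVELS = ['High School or Below', 'College', 'Bachelor', 'Master or Above']

def count_levels(values, levels=LEVELS):
    """Count exact matches of each level, keeping the order of `levels`."""
    counts = Counter(values)
    return [counts.get(level, 0) for level in levels]

def wrap_labels(labels, width=12):
    """Break long tick labels onto several lines at word boundaries."""
    return [textwrap.fill(label, width=width) for label in labels]

def plot_counts(labels, counts):
    """Draw the bar chart (matplotlib is imported here, so the helpers above work without it)."""
    import matplotlib.pyplot as plt
    plt.style.use('ggplot')
    plt.bar(wrap_labels(labels), counts, color=['blue', 'red', 'green', 'yellow'])
    plt.xlabel("Type of Education")
    plt.ylabel("Count (male)")
    plt.show()

# Hypothetical sample standing in for maleDataFrame["Education"]
sample = ['College', 'Bachelor', 'College', 'High School or Below', 'College']
counts = count_levels(sample)
```

Calling plot_counts(LEVELS, counts) would then draw the chart with the wrapped labels.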


geom_bar for total counts of binned continuous variable

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with a colour = "black" outline) for each bin of width 10 along x. I know that stat = "identity" is what creates the unwanted individual blocks within each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, because I want to keep the format of the axis ticks and text rather than having labels such as "0-10", "10-20", etc. Is there a way to do this in ggplot without using the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. Also a factor variable of type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df,aes(y=y, x=x))+
geom_bar(stat = "identity",color = "black", aes(fill = type))+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Or trying to just have the total count within each bin but don't want the internal lines in the bars, just the outer colour = black for each bar
ggplot(df,aes(y=y, x=x))+
geom_col(fill = "#00C3C6", color = "black")+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
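Here is one way to do it: transform the data first, then plot with geom_col:
library(dplyr)
library(ggplot2)

# assign each x to the lower edge of its 10-unit bin, then total y per bin and type
# (each bar is centred on that lower edge; add 5 to bins to centre it on the bin instead)
df <- df |>
  mutate(bins = floor(x/10) * 10) |>
  group_by(bins, type) |>
  summarise(y = sum(y), .groups = "drop")

ggplot(data = df,
       aes(y = y,
           x = bins)) +
  geom_col(aes(fill = type),
           color = "black") +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 10, 2)) +
  xlab("") +
  ylab("Total Count")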
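For readers who want to check the binning arithmetic outside R, here is a minimal Python sketch of the same idea, using the data from the question. The function name binned_totals is made up for illustration; the bin of each x is floor(x / 10) * 10, and y is summed per bin and type.

```python
import math
from collections import defaultdict

y = [1, 1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1, 1, 2, 1, 2, 3, 2, 2, 1]
x = [26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0,
     45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1]
kind = ["Type 1"] * 20 + ["Type 2"] + ["Type 1"] * 2

def binned_totals(x, y, kind, width=10):
    """Sum y per (bin lower edge, type); the bin is floor(x / width) * width."""
    totals = defaultdict(int)
    for xi, yi, ki in zip(x, y, kind):
        lower_edge = math.floor(xi / width) * width
        totals[(lower_edge, ki)] += yi
    return dict(sorted(totals.items()))

totals = binned_totals(x, y, kind)
for (lower_edge, k), total in totals.items():
    print(f"{lower_edge}-{lower_edge + 10} {k}: {total}")
```

For this data the 40-50 bin sums to 11, so the y-axis breaks seq(0, 10, 2) stop just below the tallest bar.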

How to draw an angle arc between line segments in ggplot2?

Is it possible to use ggplot2 (or plotnine or other grammar of graphics packages) to draw an angle arc between two straight line segments as shown below?
(ignore the circle etc.)
I know that this can be done with graphics programs such as Geogebra. But I am interested in drawing the angle mark (and label) programmatically in Jupyter.
(By the way, is there a word for this "angle arc"? I don't know what to call it, so I just used "angle arc".)
For R, there is the ggforce package that extends ggplot2 and defines a geom_arc() that comes pretty close. Example below:
library(ggplot2)
library(ggforce)
start <- c(x = 0, y = 0)
dat <- data.frame(
x = start[c("x", "x")],
y = start[c("y", "y")],
xend = c(1, 4),
yend = c(5, 1)
)
# geom_arc() measures angles clockwise from the positive y-axis (12 o'clock), hence atan2(dx, dy)
angles <- with(dat, atan2(xend - x, yend - y))
ggplot(dat) +
geom_segment(aes(x, y, xend = xend, yend = yend)) +
geom_arc(aes(x0 = start["x"], y0 = start["y"], r = 1,
start = angles[1], end = angles[2])) +
coord_equal()
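Since the question also mentions plotnine and Jupyter, here is a library-independent Python sketch that computes the points of such an arc, which can then be drawn with any line or path layer. The function name angle_arc is made up for illustration. Unlike ggforce, it uses the conventional mathematical angle (counter-clockwise from the positive x-axis), and it assumes the smaller of the two possible angles is the one wanted.

```python
import math

def angle_arc(origin, p1, p2, radius=1.0, n=50):
    """Return n points on the arc of the given radius between rays origin->p1 and origin->p2."""
    x0, y0 = origin
    t1 = math.atan2(p1[1] - y0, p1[0] - x0)
    t2 = math.atan2(p2[1] - y0, p2[0] - x0)
    # wrap the difference into [-pi, pi) so the arc spans the smaller angle
    delta = (t2 - t1 + math.pi) % (2 * math.pi) - math.pi
    return [
        (x0 + radius * math.cos(t1 + delta * i / (n - 1)),
         y0 + radius * math.sin(t1 + delta * i / (n - 1)))
        for i in range(n)
    ]

# Same segments as in the R example: (0, 0) -> (1, 5) and (0, 0) -> (4, 1)
arc = angle_arc((0, 0), (1, 5), (4, 1))
```

The first and last points lie on the two segments at distance radius from the vertex, so a label can be placed near the middle element of the list.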

Depth Profiling visualization

I'm trying to create a depth profile graph with the variables depth, distance and temperature. The data were collected at 9 stations with known distances between them (5 m apart, so 9 separate sets of data). At each station a sonde was dropped straight down, taking a temperature reading every 2 seconds. The maximum depth at each of the 9 stations was also measured from the boat.
So the data I have is:
Depth at each of the 9 stations (y axis)
Temperature readings at each of the 9 stations, at roughly 0.2 m vertical intervals until the bottom was reached (fill area)
Distance between the stations (x axis)
Is it possible to create a depth profile similar to this? (obviously without the greater resolution in this graph)
I've already tried messing around with ggplot2 and raster but I just can't seem to figure out how to do this.
One of the problems I've come across is how to make ggplot2 distinguish between, say, the temperature reading at 5 m depth at station 1 and the reading at 5 m depth at station 5, since they have the same depth value.
Even if you can only point me towards another program that would allow me to create a graph like this, that would be great.
[ REVISION ]
(Please leave a comment if you know a more suitable interpolation method, especially one that does not require cutting off the data below the bottom.)
ggplot() needs the data in long form.
library(ggplot2)
# example data
max.depths <- c(1.1, 4, 4.7, 7.7, 8.2, 7.8, 10.7, 12.1, 14.3)
# lapply() always returns a list; sapply() would simplify to a matrix if all stations had the same length
depth.list <- lapply(max.depths, function(x) seq(0, x, 0.2))
temp.list <- list()
set.seed(1)
for (i in 1:9) temp.list[[i]] <- sapply(depth.list[[i]], function(x) rnorm(1, 20 - x*0.5, 0.2))
set.seed(1)
dist <- c(0, sapply(seq(5, 40, 5), function(x) rnorm(1, x, 1)))
dist.list <- lapply(1:9, function(x) rep(dist[x], length(depth.list[[x]])))
# long form: one row per reading, with its station distance, (negative) depth and temperature
main.df <- data.frame(dist = unlist(dist.list), depth = unlist(depth.list) * -1, temp = unlist(temp.list))
# a raw graph
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_point(aes(colour = temp), size = 1) +
scale_colour_gradientn(colours = topo.colors(10))
# a relatively raw graph (don't run with this example data)
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
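The long form is also what lets a plotting library tell a 5 m reading at station 1 from a 5 m reading at station 5: every row carries its own distance next to its depth. A minimal Python sketch of that reshaping follows; the function name to_long_form and the readings are hypothetical.

```python
def to_long_form(distances, depth_profiles, temp_profiles):
    """Flatten per-station profiles into one list of (dist, depth, temp) rows.

    Depths are stored as negative numbers so that deeper readings plot lower.
    """
    rows = []
    for dist, depths, temps in zip(distances, depth_profiles, temp_profiles):
        if len(depths) != len(temps):
            raise ValueError("each station needs one temperature per depth")
        rows.extend((dist, -depth, temp) for depth, temp in zip(depths, temps))
    return rows

# Hypothetical readings for two stations 5 m apart
distances = [0.0, 5.0]
depth_profiles = [[0.0, 0.2, 0.4], [0.0, 0.2]]
temp_profiles = [[20.1, 19.9, 19.8], [20.0, 19.8]]
rows = to_long_form(distances, depth_profiles, temp_profiles)
```

Two rows with the same depth but different dist values are then simply two different points on the plot.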
If you want a graph like the one you showed, you have to interpolate. Several packages provide spatial interpolation methods. In this example I used the akima package, but you should think carefully about which interpolation method to use.
I used nx = 300 and ny = 300 in the code below, but these values should be chosen carefully. Large nx and ny give a high-resolution graph, but don't forget the real resolution of the data (in this example the real nx is only 9 and the real ny is at most 72, i.e. the deepest station, 14.3 m, sampled every 0.2 m).
library(akima); library(dplyr)
interp.data <- interp(main.df$dist, main.df$depth, main.df$temp, nx = 300, ny = 300)
interp.df <- interp.data %>% interp2xyz() %>% as.data.frame()
names(interp.df) <- c("dist", "depth", "temp")
# draw interp.df
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
# check whether the interpolation is appropriate (raw data drawn over the interpolated data)
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp), alpha = 0.3) + # interpolation
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, aes(colour = temp), size = 1) + # raw
scale_colour_gradientn(colours = topo.colors(10))
The bottoms don't match! I found that ?interp says "interpolation only within convex hull!", oops... I'm worried about the interpolation around that problem area; is it OK? If it is, you only need to cut the data below the bottom. If not, ... I can't answer immediately. Below is example code for the cut.
bottoms <- max.depths * -1
# calculate bottom values using linear interpolation
approx.bottoms <- approx(dist, bottoms, n = 300) # n must be the same value as interp()'s nx
# set temp values below the bottom to NA
library(dplyr)
interp.cut.df <- interp.df %>% cbind(bottoms = approx.bottoms$y) %>%
mutate(temp = ifelse(depth >= bottoms, temp, NA)) %>% select(-bottoms)
ggplot(interp.cut.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) +
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, size = 1)
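The cutting step can also be sketched with NumPy, assuming the interpolated temperatures are already on a regular grid. The function name mask_below_bottom and the three-station transect are hypothetical; like approx() above, it interpolates the bottom linearly between stations and blanks every cell that lies deeper.

```python
import numpy as np

def mask_below_bottom(grid_dist, grid_depth, temp_grid, station_dist, station_bottom):
    """Set grid cells deeper than the linearly interpolated bottom to NaN.

    temp_grid has shape (len(grid_depth), len(grid_dist)); depths are negative downwards.
    station_dist must be increasing, as np.interp requires.
    """
    bottom_at = np.interp(grid_dist, station_dist, station_bottom)  # bottom depth per column
    below = grid_depth[:, None] < bottom_at[None, :]                # True where under the bottom
    return np.where(below, np.nan, temp_grid)

# Hypothetical transect: three stations with bottoms at 1.5 m, 3.5 m and 2.5 m
station_dist = np.array([0.0, 5.0, 10.0])
station_bottom = np.array([-1.5, -3.5, -2.5])
grid_dist = np.array([0.0, 5.0, 10.0])
grid_depth = np.array([-3.0, -2.0, -1.0, 0.0])
temp_grid = np.full((4, 3), 15.0)

masked = mask_below_bottom(grid_dist, grid_depth, temp_grid, station_dist, station_bottom)
```

Most plotting functions leave NaN cells blank, which gives the same effect as the NA values in interp.cut.df.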
If you want to use stat_contour
stat_contour is harder to use than geom_raster because it needs the data on a regular grid. Judging from your graph, your data (depth and distance) don't form a regular grid, so it is much more difficult to use stat_contour with the raw data. I therefore used interp.cut.df to draw the contour plot. stat_contour also has an inherent problem (see "How to fill in the contour fully using stat_contour"), so you need to expand your data.
library(dplyr)
# 1st: replace NA with a value outside the temp range (I used 0)
interp.contour.df <- interp.cut.df
interp.contour.df[is.na(interp.contour.df)] <- 0
# 2nd: expand the df (It's a little complex, so please use this function)
contour.support.func <- function(df) {
  colname <- names(df)
  names(df) <- c("x", "y", "z")
  Range <- as.data.frame(sapply(df, range))
  Dim <- as.data.frame(t(sapply(df, function(x) length(unique(x)))))
  arb_z = Range$z[1] - diff(Range$z)/20
  df2 <- rbind(df,
               expand.grid(x = c(Range$x[1] - diff(Range$x)/20, Range$x[2] + diff(Range$x)/20),
                           y = seq(Range$y[1], Range$y[2], length = Dim$y), z = arb_z),
               expand.grid(x = seq(Range$x[1], Range$x[2], length = Dim$x),
                           y = c(Range$y[1] - diff(Range$y)/20, Range$y[2] + diff(Range$y)/20), z = arb_z))
  names(df2) <- colname
  return(df2)
}
interp.contour.df2 <- contour.support.func(interp.contour.df)
# 3rd: check the temp range (these values are used to define contour's border (breaks))
range(interp.cut.df$temp, na.rm=T) # 12.51622 20.18904
# 4th: draw (note that the bottom border looks ragged)
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = seq(12.51622, 20.18904, length = 11), aes(fill = ..level..)) +
coord_cartesian(xlim = range(dist), ylim = range(bottoms), expand = F) + # cut expanded area
scale_fill_gradientn(colours = topo.colors(10)) # breaks's length is 11, so 10 colors are needed
# [Note]
# You can define the contour's border values (breaks) and colors.
contour.breaks <- c(12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5)
# = seq(12.5, 20.5, 1) or seq(12.5, 20.5, length = 9)
contour.colors <- c("darkblue", "cyan3", "cyan1", "green3", "green", "yellow2","pink", "darkred")
# breaks's length is 9, so 8 colors are needed.
# 5th: hide the ragged bottom border by drawing the bottom line over it
approx.df <- data.frame(dist = approx.bottoms$x, depth = approx.bottoms$y, temp = 0) # 0 is dummy value
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors) +
geom_line(data = approx.df, lwd=1.5, color="gray50")
Bonus: a legend technique
library(dplyr)
interp.contour.df3 <- interp.contour.df2 %>% mutate(temp2 = cut(temp, breaks = contour.breaks))
interp.contour.df3$temp2 <- factor(interp.contour.df3$temp2, levels = rev(levels(interp.contour.df3$temp2)))
ggplot(interp.contour.df3, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors, guide = "none") + # add guide = "none" (guide = F is deprecated in recent ggplot2)
geom_line(data = approx.df, lwd=1.5, color="gray50") +
geom_point(aes(colour = temp2), pch = 15, alpha = 0) + # add
guides(colour = guide_legend(override.aes = list(colour = rev(contour.colors), alpha = 1, cex = 5))) + # add
labs(colour = "temp") # add
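The cut() call in the bonus step assigns each temperature to one interval between consecutive breaks, which is why 9 breaks need 8 colours. A small Python sketch of that mapping is given here; the function name band_color is made up, and it assumes the right-closed intervals (a, b] that cut() uses by default.

```python
from bisect import bisect_left

contour_breaks = [12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5]
contour_colors = ["darkblue", "cyan3", "cyan1", "green3", "green", "yellow2", "pink", "darkred"]

def band_color(temp, breaks=contour_breaks, colors=contour_colors):
    """Return the colour of the interval (breaks[i], breaks[i+1]] containing temp, or None outside."""
    if len(colors) != len(breaks) - 1:
        raise ValueError("need exactly one colour per interval between breaks")
    if not breaks[0] < temp <= breaks[-1]:
        return None  # outside all bands, like the NA that cut() returns
    return colors[bisect_left(breaks, temp) - 1]
```

The filler value 0 that replaced the NA cells falls outside every band, so band_color(0) returns None and those cells get no legend entry.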
You want to treat this as a 3-D surface with temperature as the z dimension. The given plot is a contour plot and it looks like ggplot2 can do that with stat_contour.
I'm not sure how the contour lines are computed (often it's linear interpolation along a Delaunay triangulation). If you want more control over how to interpolate between your x/y grid points, you can calculate a surface model first and feed those z coordinates into ggplot2.

Adding labels on x-axis

I have a String[] label = {"Dogs", "Cats", "Birds", "Pigs"};
I have a graph, and I want the labels to appear at specific (x, y) positions along the x-axis rather than at arbitrary places. For example, I have a curve, and "Dogs" should appear at x = 3 and y = 8, "Cats" at x = 5 and y = 12, etc. How can I achieve this?
Right now to add the labels, I am doing:
graphPane.XAxis.Scale.TextLabels = label;
and this adds the labels without any control over where they are placed.

Preparing data to plot contours in Matplotlib's Basemap

I'm having a hard time with plotting a basemap with Matplotlib and I'm fairly new to it so I was hoping for some help.
I have data of the format:
[ (lat1, lon1, data1),
(lat2, lon2, data2),
(lat3, lon3, data3),
...
(latN, lonN, dataN) ]
And here is some sample data:
(32.0, -128.5, 3.99)
(31.0, -128.0, 3.5027272727272734)
(31.5, -128.0, 3.7383333333333333)
(32.0, -128.0, 3.624)
(32.5, -128.0, 3.913157894736842)
(33.0, -128.0, 4.443333333333334)
Finally, here are some basic statistics about my data that I'm planning to plot:
LAT MIN: 22
LAT MAX: 50
LAT LEN: 1919
LON MIN: -128
LON MAX: -97
LON LEN: 1919
DATA MIN: 0
DATA MAX: 12
DATA LEN: 1919
I need to contour plot on a basemap of the continental United States. I can't, for the life of me, seem to figure out how to setup the data for plotting.
I read that the X-axis (LATS) needs to be an np.array, the Y-axis (LONS) needs to be an np.array, and Z (DATA) needs to be an MxN matrix where M = len(LATS) and N = len(LONS). So I see Z as a diagonal matrix whose diagonal holds the values in DATA, each at the index of its corresponding entry in LATS and LONS.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

def show_map(self, a):
    a = sorted(a, key = lambda entry: entry[0]) # sort by latitude
    a = sorted(a, key = lambda entry: entry[1]) # then sort by longitude
    lats = [ x[0] for x in a ]
    lons = [ x[1] for x in a ]
    data = [ x[2] for x in a ]
    lat_min = min(lats)
    lat_max = max(lats)
    lon_min = min(lons)
    lon_max = max(lons)
    data_min = min(data)
    data_max = max(data)
    x = np.array(lats)
    y = np.array(lons)
    z = np.diag(data)
    # the original also passed resolution = 'c' here; a keyword may only be given once
    m = Basemap(
        projection = 'merc',
        llcrnrlat=lat_min, urcrnrlat=lat_max,
        llcrnrlon=lon_min, urcrnrlon=lon_max,
        rsphere=6371200., resolution='l', area_thresh=10000,
        lat_ts = 20
    )
    fig = plt.figure()
    plt.subplot(211)
    ax = plt.gca()
    # draw parallels
    delat = 10.0
    parallels = np.arange(0., 90, delat)
    m.drawparallels(parallels, labels=[1,0,0,0], fontsize=10)
    # draw meridians
    delon = 10.
    meridians = np.arange(180.,360.,delon)
    m.drawmeridians(meridians,labels=[0,0,0,1],fontsize=10)
    # draw map features
    m.drawcoastlines(linewidth = 0.50)
    m.drawcountries(linewidth = 0.50)
    m.drawstates(linewidth = 0.25)
    ny = z.shape[0]; nx = z.shape[1] # make grid
    lo, la = m.makegrid(nx, ny)
    X, Y = m(lo, la)
    clevs = [0,1,2.5,5,7.5,10,15,20,30,40,50,70,100,150,200,250,300,400,500,600,750]
    cs = m.contour(X, Y, z, clevs)
    plt.show()
The plot I get, however, is this: http://imgur.com/li1Wg. I need something to this effect: http://matplotlib.org/basemap/_images/plotprecip.png
Can someone point out what I'm doing wrong and help me plot this? Thank You.
I figured out how to do it. This is the code that I finally wrote, and I think this can help other users. If there is a better way of doing this, please state it, since I'm new to Matplotlib.
https://gist.github.com/3789221
Your linked gist works, but it is still wrong in one respect.
In both your question and the linked gist you swapped the x and y coordinates when assigning lon and lat:
x represents lon
y represents lat
You will therefore still get wrong results with the linked gist.
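The x = lon, y = lat convention can be checked with a few lines of NumPy. This is only an illustrative sketch with made-up coordinate values; the point is the shape that np.meshgrid returns and that the data array has to share.

```python
import numpy as np

lats = np.array([31.0, 31.5, 32.0])   # y axis: 3 rows
lons = np.array([-128.5, -128.0])     # x axis: 2 columns

# meshgrid(x, y) returns two arrays of shape (len(y), len(x))
lon_grid, lat_grid = np.meshgrid(lons, lats)

# a data array for contouring needs that same (n_lat, n_lon) shape
z = np.zeros((len(lats), len(lons)))

# with Basemap, the projection call takes longitude first: X, Y = m(lon_grid, lat_grid)
```

Rows therefore follow latitude and columns follow longitude, which is the opposite of the assignment x = lats, y = lons in the question.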
Why are you writing:
z = np.diag(data)
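According to the documentation, numpy.diag(v, k=0) extracts a diagonal or constructs a diagonal array.
Given a 1-D list, it builds a square array with your data on the diagonal and zeros everywhere else, which is probably why you only get a "diagonal area" of values...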
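One possible alternative to np.diag, assuming the samples lie on a regular lat/lon grid (the sample data in the question is spaced 0.5 degrees apart), is to place each value in the cell for its own latitude and longitude. The function name triples_to_grid is made up for illustration; cells without a sample are left as NaN.

```python
import numpy as np

def triples_to_grid(triples):
    """Arrange (lat, lon, value) triples on a regular lat/lon grid.

    Returns (lats, lons, z), where z has shape (len(lats), len(lons)) and
    holds NaN wherever no sample exists for that lat/lon pair.
    """
    lat_values = sorted({lat for lat, _, _ in triples})
    lon_values = sorted({lon for _, lon, _ in triples})
    row = {lat: i for i, lat in enumerate(lat_values)}
    col = {lon: j for j, lon in enumerate(lon_values)}
    z = np.full((len(lat_values), len(lon_values)), np.nan)
    for lat, lon, value in triples:
        z[row[lat], col[lon]] = value
    return np.array(lat_values), np.array(lon_values), z

# The sample rows from the question
sample = [
    (32.0, -128.5, 3.99),
    (31.0, -128.0, 3.5027272727272734),
    (31.5, -128.0, 3.7383333333333333),
    (32.0, -128.0, 3.624),
    (32.5, -128.0, 3.913157894736842),
    (33.0, -128.0, 4.443333333333334),
]
lats, lons, z = triples_to_grid(sample)
diag_z = np.diag([value for _, _, value in sample])  # what the question builds instead
```

Here z is a 5 x 2 array with the six samples in their own cells, whereas diag_z is 6 x 6 with 30 zeros that the contouring would treat as real data. If the points are not on a regular grid, they would need to be interpolated onto one first.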