Map of New York State counties with binned colors and legend - ggplot2

I am trying to make a county-level map of New York State and color each county by its level of unionization. I need the map and legend to use four discrete shades of red rather than a red gradient, and the legend should show those four colors with non-overlapping ranges (e.g. 0-25; 26-50; 51-75; 76-100).
Here is my data:
fips unionized
1 36001 33.33333
2 36005 86.11111
3 36007 0.00000
4 36017 0.00000
5 36021 0.00000
6 36027 66.66667
7 36029 40.00000
8 36035 50.00000
9 36039 0.00000
10 36047 82.85714
11 36051 0.00000
12 36053 100.00000
13 36055 30.76923
14 36057 0.00000
15 36059 84.37500
16 36061 81.81818
17 36063 60.00000
18 36065 50.00000
19 36067 71.42857
20 36069 0.00000
21 36071 55.55556
22 36073 0.00000
23 36079 100.00000
24 36081 92.15686
25 36083 50.00000
26 36085 100.00000
27 36087 87.50000
28 36101 0.00000
29 36103 63.88889
30 36105 0.00000
31 36107 0.00000
32 36111 50.00000
33 36113 50.00000
34 36115 100.00000
35 36117 0.00000
36 36119 73.33333
37 36121 0.00000
38 36123 0.00000
I have successfully made the map with a gradient of colors, but cannot figure out how to make discrete colors in the map and legend.
Here is my code:
library(usmap)
library(ggplot2)
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "unionized") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_continuous(low = "white", high = "red", na.value="light grey", name = "Unionization") + theme(legend.position = "right")
Thanks!

This could be achieved via scale_fill_binned and guide_bins. Try this:
library(usmap)
library(ggplot2)
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "unionized") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_binned(low = "white", high = "red", na.value="light grey", name = "Unionization", guide = guide_bins(axis = FALSE, show.limits = TRUE)) +
theme(legend.position = "right")
A second option is to bin the variable manually and use scale_fill_manual to set the fill colors. This makes it easy to set the labels and has the advantage that the NA category is added automatically. For the color scale I use colorRampPalette. By default colorRampPalette interpolates in RGB color space; to get fill colors like those produced by scale_fill_binned, add the argument space = "Lab".
library(usmap)
library(ggplot2)
Z$union_bin <- cut_interval(Z$unionized, n = 4, labels = c("0-25", "26-50", "51-75", "76-100"))
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "union_bin") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_manual(values = colorRampPalette(c("white", "red"))(5)[2:5],
na.value="light grey", name = "Unionization") +
theme(legend.position = "right")
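To make the palette step concrete outside R: colorRampPalette(c("white", "red"))(5)[2:5] asks for five evenly spaced colors between white and red and drops the first one (pure white). Below is a minimal Python sketch of that idea, assuming plain linear interpolation in RGB. The helper name ramp is hypothetical, and the rounding may differ slightly from what R returns.

```python
def ramp(start, end, n):
    """Return n hex colors evenly spaced between two RGB triples (linear RGB)."""
    colors = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start color, 1.0 at the end color
        rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors

white, red = (255, 255, 255), (255, 0, 0)
# Five steps from white to red; drop pure white so the lowest bin is still tinted.
fills = ramp(white, red, 5)[1:]
print(fills)
```

Dropping the first color is what keeps the lowest bin distinguishable from a white background.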


Splitting a coordinate string into X and Y columns with a pandas data frame

So I created a pandas data frame showing the coordinates for an event and number of times those coordinates appear, and the coordinates are shown in a string like this.
Coordinates Occurrences x
0 (76.0, -8.0) 1 0
1 (-41.0, -24.0) 1 1
2 (69.0, -1.0) 1 2
3 (37.0, 30.0) 1 3
4 (-60.0, 1.0) 1 4
.. ... ... ..
63 (-45.0, -11.0) 1 63
64 (80.0, -1.0) 1 64
65 (84.0, 24.0) 1 65
66 (76.0, 7.0) 1 66
67 (-81.0, -5.0) 1 67
I want to create a new data frame that shows the x and y coordinates individually, along with their occurrences, like this:
x Occurrences y Occurrences
76 ... -8 ...
-41 ... -24 ...
69 ... -1 ...
37 ... -30 ...
60 ... 1 ...
I have tried to split the string but don't think I am doing it correctly, and I don't know how to add the result to the table. I think I'd have to use something like a for loop later in my code. I scraped the data from an API; here is the code that sets up the data frame shown.
for key in contents['liveData']['plays']['allPlays']:
    # for plays in key['result']['event']:
    #     print(key)
    if (key['result']['event'] == "Shot"):
        #print(key['result']['event'])
        scoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if scoordinates not in shots:
            shots[scoordinates] = 1
        else:
            shots[scoordinates] += 1
    if (key['result']['event'] == "Goal"):
        #print(key['result']['event'])
        gcoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if gcoordinates not in goals:
            goals[gcoordinates] = 1
        else:
            goals[gcoordinates] += 1
#create data frame using pandas
gdf = pd.DataFrame(list(goals.items()),columns = ['Coordinates','Occurences'])
print(gdf)
sdf = pd.DataFrame(list(shots.items()),columns = ['Coordinates','Occurences'])
print(sdf)
Try this:
import re
df[['x', 'y']] = df.Coordinates.apply(lambda c: pd.Series(dict(zip(['x', 'y'], re.findall(r'[-]?[0-9]+\.[0-9]+', c.strip())))))
Using the built-in string methods to achieve this should be performant:
df[["x", "y"]] = df["Coordinates"].str.strip("()").str.split(",", expand=True).astype(float)
(This also converts x and y to float values; although not requested, that is probably desired.)
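Neither snippet above produces the per-axis occurrence counts the question asks for. Here is a minimal sketch of that step using only the standard library, assuming the Coordinates values are strings such as "(76.0, -8.0)". The sample rows are taken from the question, and the names rows, x_counts and y_counts are illustrative.

```python
import ast
from collections import Counter

# (coordinate string, occurrences) pairs, as in the question's data frame
rows = [
    ("(76.0, -8.0)", 1),
    ("(-41.0, -24.0)", 1),
    ("(69.0, -1.0)", 1),
    ("(76.0, 7.0)", 1),
]

x_counts = Counter()
y_counts = Counter()
for text, occurrences in rows:
    x, y = ast.literal_eval(text)  # safely parse "(76.0, -8.0)" into a tuple of floats
    x_counts[x] += occurrences
    y_counts[y] += occurrences

print(x_counts)
print(y_counts)
```

If the coordinates are still tuples (as they are when they come straight from the dict keys), the ast.literal_eval call can be dropped and the tuple unpacked directly.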

'float' object has no attribute 'split'

I have a pandas data frame with a column of float numbers. I tried to split each item in the column at the dot '.', and then I want to add the first part to the second part. I don't know why this sample code is not working.
data=
0 28.47000
1 28.45000
2 28.16000
3 28.29000
4 28.38000
5 28.49000
6 28.21000
7 29.03000
8 29.11000
9 28.11000
new_array = []
df = list(data)
for i in np.arange(len(data)):
    df1 = df[i].split('.')
    df2 = df1[0]+df[1]/60
    new_array=np.append(new_array,df2)
Use numpy.modf with DataFrame constructor:
arr = np.modf(data.values)
df = pd.DataFrame({'a':data, 'b':arr[1] + arr[0] / 60})
print (df)
a b
0 28.47 28.007833
1 28.45 28.007500
2 28.16 28.002667
3 28.29 28.004833
4 28.38 28.006333
5 28.49 28.008167
6 28.21 28.003500
7 29.03 29.000500
8 29.11 29.001833
9 28.11 28.001833
Detail:
arr = np.modf(data.values)
print(arr)
(array([ 0.47, 0.45, 0.16, 0.29, 0.38, 0.49, 0.21, 0.03, 0.11, 0.11]),
array([ 28., 28., 28., 28., 28., 28., 28., 29., 29., 28.]))
print(arr[0] / 60)
[ 0.00783333 0.0075 0.00266667 0.00483333 0.00633333 0.00816667
0.0035 0.0005 0.00183333 0.00183333]
EDIT:
df = pd.DataFrame({'a':data, 'b':arr[1] + arr[0]*5/3 })
print (df)
a b
0 28.47 28.783333
1 28.45 28.750000
2 28.16 28.266667
3 28.29 28.483333
4 28.38 28.633333
5 28.49 28.816667
6 28.21 28.350000
7 29.03 29.050000
8 29.11 29.183333
9 28.11 28.183333
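The factor 5/3 in the EDIT equals 100/60: the fractional part is first scaled to a whole number of minutes (0.47 becomes 47) and then divided by 60. Here is a minimal standard-library sketch of that reading, assuming the two decimals encode minutes, which the question does not state explicitly. The function name minutes_to_decimal is hypothetical.

```python
import math

def minutes_to_decimal(value):
    """Treat the decimals as minutes (28.47 -> 28 units, 47 minutes) and return decimal units."""
    frac, whole = math.modf(value)  # modf returns (fractional part, integer part)
    return whole + frac * 100 / 60

for v in (28.47, 29.03):
    print(v, round(minutes_to_decimal(v), 6))
```

Dividing the raw fraction by 60 instead, as in the first version of the answer, treats 0.47 as 0.47 of a minute rather than 47 minutes, which is why the two result tables differ.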
Your data types are floats, not strings, so they cannot be .split() (this is a string method). Instead, you can use math.modf to 'split' a float into its fractional and integer parts:
https://docs.python.org/3.6/library/math.html
import math
def process(x: float, divisor: int = 60) -> float:
    """
    Split a float into its integer and fractional parts, divide the fractional
    part by the divisor, and recombine them into a 'scaled fractional' value.
    """
    b, a = math.modf(x)  # b: fractional part, a: integer part
    c = a + b/divisor
    return c
df['data'].apply(process)
Out[17]:
0 28.007833
1 28.007500
2 28.002667
3 28.004833
4 28.006333
5 28.008167
6 28.003500
7 29.000500
8 29.001833
9 28.001833
Name: data=, dtype: float64
Your other option is to convert them to strings, split, convert to ints and floats again, do some maths and then combine the floats. I'd rather keep the object as it is personally.

How to create a new column in a Pandas DataFrame using pandas.cut method?

I have a column with house prices that looks like this:
0 0.0
1 1480000.0
2 1035000.0
3 0.0
4 1465000.0
5 850000.0
6 1600000.0
7 0.0
8 0.0
9 0.0
Name: Price, dtype: float64
and I want to create a new column called data['PriceRange'] which assigns each price to a given range. This is what my code looks like:
data = pd.read_csv("Melbourne_housing_FULL.csv")
data.fillna(0, inplace=True)
for i in range(0, 12000000, 50000):
    bins = np.array(i)
    labels = np.array(str(i))
data['PriceRange'] = pd.cut(data.Price, bins=bins, labels=labels, right=True)
And I get this Error message:
TypeError: len() of unsized object
I've been trying different approaches and seem to be stuck here. I'd really appreciate some help.
Thanks,
Hugo
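The problem is that bins and labels are overwritten on every pass through the loop, so each ends up holding only the last value. A single scalar wrapped in np.array has no length, hence the error: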
for i in range(0, 12000000, 50000):
    bins = np.array(i)
    labels = np.array(str(i))
print (bins)
11950000
print (labels)
11950000
No loop is necessary. Instead of range, use the NumPy alternative arange for the bins, and build the labels as ranges from consecutive bin edges. Finally, pass include_lowest=True to cut so that the first bin edge (0) is included in the first group.
bins = np.arange(0, 12000000, 50000)
labels = ['{} - {}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
#correct first value
labels[0] = '0 - 50000'
print (labels[:10])
['0 - 50000', '50001 - 100000', '100001 - 150000', '150001 - 200000',
'200001 - 250000', '250001 - 300000', '300001 - 350000', '350001 - 400000',
'400001 - 450000', '450001 - 500000']
data['PriceRange'] = pd.cut(data.Price,
bins=bins,
labels=labels,
right=True,
include_lowest=True)
print (data)
Price PriceRange
0 0.0 0 - 50000
1 1480000.0 1450001 - 1500000
2 1035000.0 1000001 - 1050000
3 0.0 0 - 50000
4 1465000.0 1450001 - 1500000
5 850000.0 800001 - 850000
6 1600000.0 1550001 - 1600000
7 0.0 0 - 50000
8 0.0 0 - 50000
9 0.0 0 - 50000
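To see what right=True together with include_lowest=True means for a single value, here is a minimal pandas-free sketch of the same lookup using bisect. It reuses the edges and labels from the answer. The function name price_range is hypothetical, and returning None for out-of-range prices is an assumption standing in for the NaN that cut would produce.

```python
from bisect import bisect_left

edges = list(range(0, 12000000, 50000))
labels = ['{} - {}'.format(lo + 1, hi) for lo, hi in zip(edges[:-1], edges[1:])]
labels[0] = '0 - 50000'  # correct the first label, as in the answer

def price_range(price):
    """Return the label of the right-closed bin (lo, hi] that contains price."""
    idx = bisect_left(edges, price)  # index of the first edge >= price
    if idx == 0:
        # Only the lowest edge itself belongs to the first bin (include_lowest).
        return labels[0] if price == edges[0] else None
    if idx >= len(edges):
        return None  # above the last edge
    return labels[idx - 1]

for p in (0.0, 850000.0, 1480000.0):
    print(p, price_range(p))
```

Because the bins are closed on the right, 850000 falls in '800001 - 850000' rather than in the next bin up, which matches the output table above.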

Add sample size to a panel figure of boxplots

I am trying to add sample sizes to boxplots (preferably at the top or bottom of each box) that are grouped by two levels. I used facet_grid() to produce a panel plot and then tried annotate() to add the sample sizes, but this did not work because it repeated the values in the second panel. Is there a simple way to do this?
head(FeatherData, n=10)
Location Status FeatherD Species ID
## 1 TX Resident -27.41495 Carolina wren CARW (32)
## 2 TX Resident -29.17626 Carolina wren CARW (32)
## 3 TX Resident -31.08070 Carolina wren CARW (32)
## 4 TX Migrant -169.19579 Yellow-rumped warbler YRWA (28)
## 5 TX Migrant -170.42079 Yellow-rumped warbler YRWA (28)
## 6 TX Migrant -158.66925 Yellow-rumped warbler YRWA (28)
## 7 TX Migrant -165.55278 Yellow-rumped warbler YRWA (28)
## 8 TX Migrant -170.43374 Yellow-rumped warbler YRWA (28)
## 9 TX Migrant -170.21801 Yellow-rumped warbler YRWA (28)
## 10 TX Migrant -184.45871 Yellow-rumped warbler YRWA (28)
ggplot(FeatherData, aes(x = Location, y = FeatherD)) +
geom_boxplot(alpha = 0.7, fill='#A4A4A4') +
scale_y_continuous() +
scale_x_discrete(name = "Location") +
theme_bw() +
theme(plot.title = element_text(size = 20, family = "Times", face =
"bold"),
text = element_text(size = 20, family = "Times"),
axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 15)) +
ylab(expression(Feather~delta^2~H["f"]~"‰")) +
facet_grid(. ~ Status)
There are multiple ways to do this sort of task. The most flexible way is to compute your statistic outside the plotting call as a separate data frame and use it as its own layer:
library(dplyr)
library(ggplot2)
cw_summary <- ChickWeight %>%
group_by(Diet) %>%
tally()
cw_summary
# A tibble: 4 x 2
Diet n
<fctr> <int>
1 1 220
2 2 120
3 3 120
4 4 118
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
facet_grid(~Diet) +
geom_text(data = cw_summary,
aes(Diet, Inf, label = n), vjust = 1)
The other method is to use the summary functions built in, but that can be fiddly. Here's an example:
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
stat_summary(fun.y = median, fun.ymax = length,
geom = "text", aes(label = ..ymax..), vjust = -1) +
facet_grid(~Diet)
Here I used fun.y to position the summary at the median of the y values, and used fun.ymax to compute an internal variable called ..ymax.. with the function length (which just counts the number of observations). In ggplot2 3.3.0 and later, fun.y and fun.ymax are deprecated in favor of fun and fun.max.

Shiny leaflet select input code

I want to change the circle color of the selected region on the map and need your advice. Please help me write the right observeEvent for the selectInput action. I need the app to clear my circles and highlight only the selected region, with its specific popup.
My data:
Region Pop Latitude Lontitude
1 Cherkasy 1238593 49.444433 32.059767
2 Chernihiv 1040492 51.498200 31.289350
3 Chernivtsi 909081 48.292079 25.935837
4 City of Kyiv 2909491 50.450100 30.523400
5 Dnipro 3244341 48.464717 35.046183
6 Donetsk 4255450 48.015883 37.802850
7 Ivano-Frankivsk 1381014 48.922633 24.711117
8 Kharkiv 2711475 49.993500 36.230383
9 Kherson 1059481 46.635417 32.616867
10 Khmelnytskiy 1291187 49.422983 26.987133
11 Kirovohrad 969662 48.507933 32.262317
12 Kyiv 1732435 50.450100 30.523400
13 Luhansk 2200807 48.574041 39.307815
14 Lviv 2531265 49.839683 24.029717
15 Mykolayiv 1155174 46.975033 31.994583
16 Odesa 2386441 46.482526 30.723310
17 Poltava 1433804 49.588267 34.551417
18 Rivne 1161537 50.619900 26.251617
19 Sumy 1108651 50.907700 34.798100
20 Ternopil 1063264 49.553517 25.594767
21 Vinnytsya 1597683 49.233083 28.468217
22 Volyn 1042218 50.747233 25.325383
23 Zakarpattya 1258507 48.620800 22.287883
24 Zaporizhzhya 1747639 47.838800 35.139567
25 Zhytomyr 1244219 50.254650 28.658667
Code
library(shiny)
library(leaflet)
library(maps)
library(shinythemes)
library(readxl)
UkrStat <- read_excel("D:/My downloads/Downloads/R Studio/UkrStat.xlsx")
ui <- (fluidPage(theme = shinytheme("superhero"),
titlePanel("Map of Ukraine"),
sidebarLayout(
sidebarPanel(
selectInput("region", label = "Region", choices = c("", UkrStat$Region), selected = "City of Kyiv")
),
mainPanel(
leafletOutput("CountryMap", width = 1000, height = 500))
)
))
server <- function(input, output, session){
output$CountryMap <- renderLeaflet({
leaflet() %>% addTiles() %>% addProviderTiles("CartoDB.Positron") %>%
setView(lng = 31.165580, lat = 48.379433, zoom = 6) %>%
addCircles(lng = UkrStat$Lontitude, lat = UkrStat$Latitude, weight = 1, radius = sqrt(UkrStat$Pop)*30, popup = UkrStat$Region)
})
observeEvent(input$region, {
leafletProxy("CountryMap") %>% clearMarkers()
})
}
# Run the application
shinyApp(ui = ui, server = server)
To achieve what you want you need to replace your observeEvent for input$region with the following:
observeEvent(input$region, {
if(input$region != "")
{
leafletProxy("CountryMap") %>% clearShapes()
index = which(UkrStat$Region == input$region)
leafletProxy("CountryMap")%>% addCircles(lng = UkrStat$Lontitude[index], lat = UkrStat$Latitude[index],
weight = 1, radius = sqrt(UkrStat$Pop[index])*30, popup = UkrStat$Region[index])
}
})
Here I first clear all the circles with clearShapes(); circles are shapes, not markers, so clearMarkers() does not remove them. I then find the index of the selected region and add a circle at the corresponding longitude and latitude.
Hope it helps!