Creating a legend for a US cities map using basemap and matplotlib

I have a large dataframe with five columns: state, city, count, latitude and longitude. I am trying to draw a map of the 50 US states and their cities with basemap, where each city's count is shown as a red circle whose size indicates the count value.
Here is a sample of the dataframe:
city         state  count  latitude   longitude
BROOKLYN     NY     831    40.649188  -73.933724
NEW YORK     NY     646    40.734332  -74.010112
CHICAGO      IL     614    41.850100  -87.650000
HOUSTON      TX     530    29.741797  -95.309376
BRONX        NY     415    40.816461  -73.862173
MIAMI        FL     401    25.752956  -80.271061
PHOENIX      AZ     382    33.859694  -112.115872
DALLAS       TX     311    32.902156  -96.794543
SAN ANTONIO  TX     259    29.518456  -98.60973
ANCHORAGE    AK     20     61.189063  -149.886241
HONOLULU     HI     56     21.271982  -157.821362
PONCE        PR     61     17.987655  -66.623600
I am trying to add a legend in the lower part of the map that shows the count levels as red circles.
This is my code:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

plt.figure(figsize=[12, 18])
base_map = Basemap(llcrnrlon=-119, llcrnrlat=22, urcrnrlon=-64, urcrnrlat=49,
                   projection='lcc', lat_1=32, lat_2=45, lon_0=-95)
# load the shapefile, use the name 'states'
#base_map.shadedrelief()
base_map.readshapefile('st99_d00', name='states', drawbounds=True)
# Get the location of each city and plot it
state_set = set(['AK', 'HI', 'PR'])
for lat, long, name, size in zip(df['latitude'].tolist(),
                                 df['longitude'].tolist(),
                                 df['state'].tolist(),
                                 df['count'].tolist()):
    x, y = base_map(long, lat)  # Basemap takes (longitude, latitude)
    base_map.plot(x, y, marker='o', color='Red', markersize=size/20)  # label=name)
    # label each state only once
    if name not in state_set:
        state_set.add(name)
        plt.text(x, y, name, fontsize=11, color='k')
plt.title('****')
#plt.legend()
plt.show()
I added the following for the legend. The code works, but the legend markers are gray and so tightly packed that they are hard to read. How can I change their color and make them smaller or more spread out?
# make legend with dummy points
for a in [100, 300, 500, 700, 850]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' counts')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')
Is this the right way to add the legend? How can I adjust it?
I am stuck and pretty confused; I appreciate any help and feedback.
TIA!
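One possible adjustment, as a minimal sketch rather than a definitive fix: the markers look gray because c='k' combined with alpha=0.5 draws semi-transparent black, so using c='red' matches the circles on the map. Note also that markersize in plot is a diameter in points while s in scatter is an area in points squared, so s=(a/20)**2 corresponds to markersize=a/20. The helper name legend_area and the spacing values are assumptions chosen for illustration; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def legend_area(count, divisor=20):
    """Scatter area (points^2) that matches plot(markersize=count/divisor)."""
    return (count / divisor) ** 2


fig, ax = plt.subplots(figsize=(12, 6))
levels = [100, 300, 500, 700, 850]
for a in levels:
    # Empty scatter: draws nothing on the axes, only creates a legend entry.
    ax.scatter([], [], c='red', alpha=0.5, s=legend_area(a),
               label=str(a) + ' counts')

# Larger labelspacing/handletextpad/borderpad spread the entries out;
# ncol lays them out in a single row along the bottom of the map.
legend = ax.legend(scatterpoints=1, frameon=False, loc='lower left',
                   ncol=len(levels), labelspacing=2, handletextpad=2,
                   borderpad=1, title='Count')
```

In the original script the same calls can be made through plt instead of ax, placed just before plt.show().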

Related

adehabitatHR home range estimation is too small

I have lat/long data of two animals tracked in Western Australia and I'd like to find their home ranges using adehabitatHR.
library(sp)
library(rgdal)
library(raster)
library(adehabitatHR)
library(sf)
quolls<-read.csv("quolls.csv")
head(quolls)
   Latitude Longitude animal_ID
1 -22.62271  117.1247         1
2 -22.62286  117.1246         1
3 -22.62192  117.1223         1
4 -22.62021  117.1224         1
5 -22.61989  117.1244         1
6 -22.62022  117.1260         1
But the home range estimates of each animal are obviously too small.
I think the EPSG must be wrong but after a very long time looking I still can't find the right one.
Can anyone point me in the right direction please?
# make a SpatialPoints dataframe without a CRS
quolls2 <- quolls
quoll.latlong<-data.frame(x=quolls2$Longitude,y=quolls2$Latitude)
coordinates(quolls2) <- quoll.latlong
# add crs
proj4string(quolls2) <- CRS(SRS_string = "EPSG:4283")
mcp<-mcp(quolls2[,7],percent=95,unout = c("ha"))
mcp
The home range for animal 1 is 1.217428e-08 ha and for animal 2 is 6.253689e-08 ha.
And likewise with kernel density estimation:
quoll_ud <- adehabitatHR::kernelUD(quolls2[7],grid = 450)
quoll_hr <- adehabitatHR::getverticeshr(quoll_ud, 99)
print(quoll_hr)
which estimates animal 1 at 2.36917592701502e-08 and animal 2 at 1.16018636413173e-07.
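Just stumbled across the answer: it's EPSG:28350 (GDA94 / MGA zone 50). EPSG:4283 is plain GDA94 latitude/longitude, so the coordinates were in degrees while mcp treats its input as metres, which is why the areas came out so small.
I got it to work in the end by abandoning the raw lats and longs and instead importing a shapefile I had of the animal data with st_read.
Then I used st_transform to reproject it to EPSG:28350.
Then, as mcp accepts only SpatialPoints, I converted the object with
as(obj, "Spatial").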
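The zone number in EPSG:28350 follows from the longitude of the data. A minimal Python sketch of that lookup, assuming the standard 6-degree UTM zone layout that MGA94 uses and the EPSG numbering 28348 to 28358 for MGA zones 48 to 58; mga94_epsg is a hypothetical helper written for illustration, not a function from any library:

```python
import math


def mga94_epsg(longitude):
    """Return the EPSG code of the GDA94 / MGA zone containing a longitude."""
    # Standard UTM zones are 6 degrees wide, with zone 1 starting at 180 W.
    zone = math.floor((longitude + 180) / 6) + 1
    if not 48 <= zone <= 58:
        raise ValueError("longitude %s is outside MGA94 zones 48-58" % longitude)
    # GDA94 / MGA zone N is registered as EPSG:283NN.
    return 28300 + zone


# Longitude of the first quoll fix from the question.
print(mga94_epsg(117.1247))
```

For data that straddles a zone boundary a single projected CRS still has to be chosen, so this lookup is only a starting point.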

How can I change a single bar in a bar plot to a different colour?

I have a dataframe that looks as follows
ACQUISITION_CHANNEL  RIDER_ID
Organic              2735
Referral             1216
Digital              751
Offline              296
Unknown              108
Job Platforms        67
Job Platforms
67
And I am making a bar plot as below using:
channel_rider_count.plot(kind='bar',
                         legend=None)
plt.title('Count of Riders by Acquisition Channel')
plt.xlabel('Acquisition Channel')
plt.ylabel('Count')
How can I change the colour of the 'Referral' bar but leave the others the same?
Pass a list of colours with one entry per bar:
colors = ['b','r','b','b','b','b']
channel_rider_count.plot(kind='bar',
                         legend=None,
                         color=colors)
plt.title('Count of Riders by Acquisition Channel')
plt.xlabel('Acquisition Channel')
plt.ylabel('Count')
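Rather than typing the colour list by hand, it can be derived from the bar labels so it stays correct if the order changes. A small sketch, assuming the labels are available in the same order as the bars; highlight_colors is a hypothetical helper name:

```python
# Channel labels in the order they appear in the question's data.
channels = ['Organic', 'Referral', 'Digital', 'Offline', 'Unknown', 'Job Platforms']


def highlight_colors(labels, target, highlight='r', default='b'):
    """Return one colour per label, using `highlight` only for `target`."""
    return [highlight if label == target else default for label in labels]


colors = highlight_colors(channels, 'Referral')
print(colors)
```

The resulting list can then be passed as color=colors in the plot call.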

stacked bar chart from grouped object

I get the expected counts from the following group-by query, but when I add the .plot.bar() method I get a separate bar for each record.
How do I get a stacked bar chart?
df.groupby(['department', 'status'])['c_name'].count()
department                            status
Agriculture                           Accepted      3
                                      Pending       2
                                      Rejected     13
Department of Education and Training  Accepted    290
                                      Rejected     65
Higher Education                      Accepted    424
                                      Pending      24
                                      Rejected     92
Medical Education and Research        Accepted     34
                                      Pending       3
                                      Rejected      1
Calling plot on this result creates a bar chart, but not a stacked one:
.plot(kind='bar', stacked=True)
For each department there should be three colours (for Accepted, Pending and Rejected).
Update:
I managed using pivot.
gdf=df.groupby(['department', 'status'])['c_name'].count().reset_index()
gdf.pivot(index='department', columns='status').plot(kind='bar', stacked=True)
But is it possible to improve the chart quality?
You are close; you just need unstack:
df.groupby(['department','status'])['c_name'].count().unstack().plot(kind='bar', stacked=True)
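To see why unstack helps, here is a small sketch that rebuilds the grouped counts from the question as a Series and reshapes it. The Series is constructed by hand only for illustration (in the real code it comes from the groupby), and fill_value=0 is an assumption added so that a missing status shows up as zero instead of NaN:

```python
import pandas as pd

# Grouped counts copied from the question, keyed by (department, status).
counts = pd.Series({
    ('Agriculture', 'Accepted'): 3,
    ('Agriculture', 'Pending'): 2,
    ('Agriculture', 'Rejected'): 13,
    ('Department of Education and Training', 'Accepted'): 290,
    ('Department of Education and Training', 'Rejected'): 65,
    ('Higher Education', 'Accepted'): 424,
    ('Higher Education', 'Pending'): 24,
    ('Higher Education', 'Rejected'): 92,
    ('Medical Education and Research', 'Accepted'): 34,
    ('Medical Education and Research', 'Pending'): 3,
    ('Medical Education and Research', 'Rejected'): 1,
})
counts.index.names = ['department', 'status']

# unstack moves the inner index level (status) into columns:
# one row per department, one column per status.
wide = counts.unstack(fill_value=0)
print(wide)
```

With one column per status, plot(kind='bar', stacked=True) has a separate series for each status to stack within every department.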

vba loop through all the pivot fields of a pivot table and return specified values

I have a dataset whose entries have 5 different attributes and one value. For example, I have the heights of 5000 people. For each person I have their hair color, eye color, nationality, the city they were born in and their mother's name (the 5 dimensions).
No  Eye Color  Hair Color  Nationality  Hometown  Mother's Name  Height
1   Blue       Blond       Swiss        Zürich    Nicole         184
2   Blue       Brown       English      York      Ruby           164
3   Brown      Brown       French       Paris     Sophie         154
etc.
So there are 5 dimensions. The data is set dynamically, so the number of categories in each dimension can vary. I want to compute the average height of people depending on which dimensions I include (from 1 to 5). For example, I want to retrieve:
The average height of French, blue-eyed people. The next day, only the people born in London. And the week after, the Swiss, blue-eyed, red-haired people born in Geneva whose mother is called Nicole.
So I created a pivot table with Eye Color as Row Labels, Hair Color as Column Labels, the average height as the Data and the last 3 dimensions as Report Filters. This lets me see all the possible and desired combinations of average height that my data implies.
Now my goal is:
I want to create a macro that goes through all the possible combinations of my dimensions (i.e. 2^5-1=31) and stores in a vector every combination whose average height is above a certain value, e.g. 190. It could then print them on a worksheet.
I was thinking of using some boolean arrays and a For Each...Next structure, but I must say that I fail to picture how to implement it.
Any ideas?
Thanks for the time and help!
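The enumeration itself is independent of VBA and pivot tables. As a language-neutral illustration (written in Python, not as a macro), the sketch below walks all 2^5-1=31 non-empty subsets of the dimensions, groups the rows by each subset and keeps the groups whose average height exceeds a threshold. The three rows are the sample rows from the question, the dictionary keys and function names are assumptions made for the example, and the threshold of 170 is chosen only so the tiny sample produces some hits:

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ['eye', 'hair', 'nationality', 'hometown', 'mother']

rows = [
    {'eye': 'Blue', 'hair': 'Blond', 'nationality': 'Swiss',
     'hometown': 'Zürich', 'mother': 'Nicole', 'height': 184},
    {'eye': 'Blue', 'hair': 'Brown', 'nationality': 'English',
     'hometown': 'York', 'mother': 'Ruby', 'height': 164},
    {'eye': 'Brown', 'hair': 'Brown', 'nationality': 'French',
     'hometown': 'Paris', 'mother': 'Sophie', 'height': 154},
]


def dimension_subsets(dimensions):
    """Yield every non-empty subset of the dimensions (2^n - 1 of them)."""
    for size in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, size)


def averages_above(rows, dimensions, threshold):
    """Return (subset, category values, mean height) for groups above threshold."""
    results = []
    for subset in dimension_subsets(dimensions):
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[d] for d in subset)  # e.g. ('Blue', 'Swiss')
            groups[key].append(row['height'])
        for key, heights in groups.items():
            mean = sum(heights) / len(heights)
            if mean > threshold:
                results.append((subset, key, mean))
    return results


subset_count = sum(1 for _ in dimension_subsets(DIMENSIONS))
hits = averages_above(rows, DIMENSIONS, threshold=170)
```

A macro would follow the same two nested loops: an outer loop over the subsets of fields and an inner pass that averages the heights per combination of categories.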

What does the A represent in a GPS coordinate point?

I am getting GPS information from a device like this
052340.000,A
32.46275,N
75.310415,E
I know N is for north and E is for east, but what does A represent?
Looking at the value, and some of the other comments, it is unlikely to be an altitude in meters. If this has been extracted from a GPGLL NMEA sentence, 052340.000 is the time of the fix (05:23:40 UTC) and A is the status flag meaning the data is valid, as per the following:
$GPGLL
Geographic Position, Latitude / Longitude and time.
eg2. $GPGLL,4916.45,N,12311.12,W,225444,A
     4916.45,N    Latitude 49 deg. 16.45 min. North
     12311.12,W   Longitude 123 deg. 11.12 min. West
     225444       Fix taken at 22:54:44 UTC
     A            Data valid
eg3. $GPGLL,5133.81,N,00042.25,W*75
               1    2     3    4 5
     1    5133.81    Current latitude
     2    N          North/South
     3    00042.25   Current longitude
     4    W          East/West
     5    *75        checksum
$--GLL,llll.ll,a,yyyyy.yy,a,hhmmss.ss,A
     llll.ll   = Latitude of position
     a         = N or S
     yyyyy.yy  = Longitude of position
     a         = E or W
     hhmmss.ss = UTC of position
     A         = status: A = valid data
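The field layout above can be turned into a small parser. This is a minimal sketch that assumes a standard comma-separated GLL sentence like the examples quoted here (the three-line output in the question may be device-specific), does not verify the checksum, and uses hypothetical function names:

```python
def _to_degrees(value, hemisphere):
    """Convert NMEA (d)ddmm.mm plus a hemisphere letter to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # the remainder is minutes
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in ('S', 'W') else decimal


def parse_gll(sentence):
    """Parse a $--GLL sentence into latitude, longitude, UTC time and validity."""
    body = sentence.strip().lstrip('$').split('*')[0]  # drop '$' and any checksum
    fields = body.split(',')
    if not fields[0].endswith('GLL'):
        raise ValueError('not a GLL sentence')
    utc = fields[5] if len(fields) > 5 and fields[5] else None
    status = fields[6] if len(fields) > 6 and fields[6] else None
    return {
        'latitude': _to_degrees(fields[1], fields[2]),
        'longitude': _to_degrees(fields[3], fields[4]),
        # hhmmss(.ss) -> hh:mm:ss, or None if the sentence carries no time
        'utc': None if utc is None else utc[0:2] + ':' + utc[2:4] + ':' + utc[4:6],
        # 'A' means valid data; None if the sentence carries no status field
        'valid': None if status is None else status == 'A',
    }


fix = parse_gll('$GPGLL,4916.45,N,12311.12,W,225444,A')
print(fix)
```

Sentences without the time and status fields, such as eg3, yield None for 'utc' and 'valid'.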
Altitude. It would depend on the device as to what units it is in. From the number shown in your example I would doubt it is meters, unless you are in an aeroplane.