Plot bubbles on a world map using geopandas and pandas in Python (simplest solution)

How can I plot the dataframe info below onto the geopandas map? Bubble size should depend on the case numbers.
import geopandas
import geoplot
import pandas
d = {"Germany": 5, "United Kingdom" : 3, "Finland" : 1, "United States of America" : 4}
df = pandas.DataFrame.from_dict(d,orient='index')
df.columns = ["Cases"]
def WorldCaseMap():
    world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
    ex = geoplot.polyplot(world)
WorldCaseMap()

Make a second df containing centroid geometry and plot it over the first one. Working example below.
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
centroids = world.copy()
centroids.geometry = world.centroid
centroids['size'] = centroids['pop_est'] / 1000000 # to get reasonable plotable number
ax = world.plot(facecolor='w', edgecolor='k')
centroids.plot(markersize='size', ax=ax)
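The example above sizes the bubbles by population. A minimal sketch of sizing them by the question's case numbers instead might look like the following. The helper names `bubble_sizes` and `plot_case_bubbles` are hypothetical, and the code assumes that `world` is the GeoDataFrame loaded above, that its country names live in a `name` column, and that those names match the keys of the case dictionary exactly.

```python
def bubble_sizes(cases, max_area=400.0):
    """Map {country: cases} to {country: marker area}, linear in the case count."""
    largest = max(cases.values())
    return {name: max_area * n / largest for name, n in cases.items()}


def plot_case_bubbles(world, cases, name_col="name"):
    """Draw country outlines and one bubble per country listed in `cases`.

    world: GeoDataFrame of country polygons (assumed to have a `name_col` column).
    cases: plain dict such as {"Germany": 5, "Finland": 1}.
    """
    sizes = bubble_sizes(cases)
    # Keep only the countries that have a case count.
    hit = world[world[name_col].isin(list(sizes))].copy()
    hit["size"] = hit[name_col].map(sizes)
    # Replace each polygon by its centroid so it is drawn as a point.
    hit.geometry = hit.centroid
    ax = world.plot(facecolor="w", edgecolor="k")
    hit.plot(ax=ax, markersize="size", color="red", alpha=0.6)
    return ax
```

With the dictionary `d` from the question, `plot_case_bubbles(world, d)` would draw four bubbles. Marker size is an area, so scaling it linearly keeps bubble area proportional to the number of cases.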

I'm not sure geopandas gives you a bubble-map that easily. Their best example is a choropleth:
import geoplot
import mapclassify
gpd_per_person = world['gdp_md_est'] / world['pop_est']
scheme = mapclassify.Quantiles(gpd_per_person, k=5)
# Note: this code sample requires geoplot>=0.4.0.
geoplot.choropleth(
    world, hue=gpd_per_person, scheme=scheme,
    cmap='Greens', figsize=(8, 4)
)
I found another example of bubble maps using geopandas here: https://residentmario.github.io/geoplot/gallery/plot_usa_city_elevations.html. However, I prefer the look of the plotly example (see below).
Otherwise, have a look at plotly's examples; they include a bubble map: https://plot.ly/python/bubble-maps/

Related

Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, the plots are laid out in a single horizontal row, and to my knowledge there is no equivalent layout parameter to stack them vertically. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
x_vars = X_train.select_dtypes(exclude=['object']).columns,
y_vars = ["SalePrice"])
This is what I would like it to look like:
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()
What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()

Is there any way to show gray color to states which are not having any data in Plotly map?

I need to show the states that do not have any data in gray on a Plotly map.
The sample CSV file is (these states have data):
The states that have no data are (I have filled the missing values with -1):
The current plots are (I need the states with missing data shown in gray):
Thanks!
Your solution is to use a custom colorscale in combination with px.choropleth_mapbox:
import plotly.express as px
px.choropleth_mapbox
The following is an example of how to use a custom colorscale:
import plotly.graph_objs as go
import numpy as np
import pandas as pd
# Read data from a csv
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')
z=z_data.values.copy()
# Compute surface color with nan's
surfacecolor = z.copy()
surfacecolor[-10:, -10:] = np.nan
# Replace nans with -100
surfacecolor[np.isnan(surfacecolor)] = -100
# Build surface trace
data = [
    go.Surface(
        z=z,
        surfacecolor=surfacecolor,
        cmin=-5,
        cmax=350,
        colorscale=[[0, 'gray'],
                    [0.01, 'gray'],
                    [0.01, 'blue'],
                    [1, 'red']]
    )
]
# Build layout
layout = go.Layout(
    title='Mt Bruno Elevation',
    autosize=False,
    width=500,
    height=500,
    margin=dict(
        l=65,
        r=50,
        b=65,
        t=90
    )
)
fig = go.FigureWidget(data=data, layout=layout)
fig
A similar question has been solved on the plotly community forum.
See the plotly documentation on how to define custom colorscales.
Hope this solves your issue!
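The surface example carries over to a choropleth by reserving the very bottom of the colorscale for gray and mapping every missing value onto it. Below is a minimal sketch of that idea in plain Python. The helper name `gray_missing_color_args` is hypothetical, and it assumes missing values are marked with a sentinel such as the -1 used in the question. It returns the cleaned values plus the `color_continuous_scale` and `range_color` keyword arguments that plotly express functions accept.

```python
def gray_missing_color_args(values, missing=-1, eps=0.01,
                            low="blue", high="red", gray="gray"):
    """Return (filled_values, kwargs) so that missing entries render gray.

    The colour range is extended slightly below the smallest real value, and
    every missing entry is moved to that new lower bound. Real values then
    start at position 2*eps on the scale, safely above the gray band [0, eps].
    """
    real = [v for v in values if v != missing]
    lo, hi = min(real), max(real)
    span = (hi - lo) or 1.0  # avoid a zero-width range
    cmin = lo - 2 * eps * span / (1 - 2 * eps)
    filled = [cmin if v == missing else v for v in values]
    scale = [[0.0, gray], [eps, gray], [eps, low], [1.0, high]]
    kwargs = {"color_continuous_scale": scale, "range_color": (cmin, hi)}
    return filled, kwargs
```

A possible use, assuming a dataframe `df` with a `value` column: `df["value"], kw = gray_missing_color_args(df["value"].tolist())`, then pass `color="value", **kw` to `px.choropleth_mapbox` together with your own location arguments.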

plotting stations on map

I have three stations named A, B, and C. I want to plot them on a map using matplotlib Basemap.
station A: latitude=17.8 longitude=74.48
station B: latitude=-25.02 longitude=25.60
station C: latitude=44.58 longitude=-123.30
As I am new to Python and to matplotlib, I am confused about how I should plot them.
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);
But it does not plot any of my stations, so I hope experts can help me. Thanks in advance.
Here's an example based on your data that you can play around with:
import pygmt
import pandas as pd
# create df with station information
data = {'lon':[74.48, 25.60, -123.30],
'lat':[17.8, -25.02, 44.58],
'station':['A', 'B', 'C']}
df = pd.DataFrame(data)
fig = pygmt.Figure()
pygmt.config(MAP_GRID_PEN = '0.25p,gray')
fig.grdimage("@earth_day_10m", region="g", projection="G130/50/12c", frame="g")
# recent PyGMT releases use `fill`; older releases call this parameter `color`
fig.plot(x = df.lon, y = df.lat, style = "t1c", fill = "red3", pen = "1p,white")
fig.show()
The tutorials in the User Guide and the gallery examples are a good starting point when working with PyGMT for the first time!
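For the Basemap attempt in the question, two things are worth checking: the stations are never passed to a plotting call, and an orthographic projection only shows the hemisphere facing the viewer. A minimal sketch of both follows. The helper names `on_visible_hemisphere` and `plot_stations` are hypothetical, and the visibility test is the standard spherical law of cosines rather than anything Basemap-specific.

```python
import math


def on_visible_hemisphere(lat, lon, lat_0, lon_0):
    """True if (lat, lon) lies on the hemisphere centred on (lat_0, lon_0)."""
    p, p0 = math.radians(lat), math.radians(lat_0)
    dlon = math.radians(lon - lon_0)
    # Cosine of the angular distance between the point and the map centre.
    cos_c = math.sin(p0) * math.sin(p) + math.cos(p0) * math.cos(p) * math.cos(dlon)
    return cos_c > 0


def plot_stations(stations, lat_0=50, lon_0=-100):
    """stations: {name: (lat, lon)}. Plots the ones visible on the ortho map."""
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    plt.figure(figsize=(8, 8))
    m = Basemap(projection="ortho", resolution=None, lat_0=lat_0, lon_0=lon_0)
    m.bluemarble(scale=0.5)
    for name, (lat, lon) in stations.items():
        if on_visible_hemisphere(lat, lon, lat_0, lon_0):
            x, y = m(lon, lat)  # Basemap takes (lon, lat), returns map coordinates
            m.plot(x, y, "r^", markersize=10)
            plt.text(x, y, " " + name, color="white")
    plt.show()


stations = {"A": (17.8, 74.48), "B": (-25.02, 25.60), "C": (44.58, -123.30)}
visible = [on_visible_hemisphere(lat, lon, 50, -100) for lat, lon in stations.values()]
# -> [False, False, True]
```

With the centre used in the question (lat_0=50, lon_0=-100), only station C faces the viewer, so a different centre or a global projection would be needed to see all three.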

set_xlim() does not work with text labels

I am trying to zoom in on a geopandas map with labels using set_xlim() in matplotlib. I basically adapted this SO question to add labels to a map.
However, set_xlim() does not seem to work and did not zoom in on the given extent. (By the way, I've also tried to use text() instead of annotate(), to no avail.)
What I did was the following:
I used the same US county data as in the question linked above, extracted the files, and then executed the following in Jupyter notebook:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
shpfile='shp/cb_2015_us_county_20m.shp'
gdf=gpd.read_file(shpfile)
gdf.plot()
This gives a map of all US counties, as expected.
Adding labels as in one of the answers also works:
ax = gdf.plot()
gdf.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1);
However, when trying to zoom in to a particular geographic extent with set_xlim() and set_ylim() as follows:
ax = gdf.plot()
gdf.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1);
ax.set_xlim(-84.2, -83.4)
ax.set_ylim(42, 42.55)
the two functions do not seem to work. Instead of zooming in, they just trim everything outside the given extent.
If the labeling code (gdf.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1);) is dropped, set_xlim() works as expected.
My question is:
What is the correct way to zoom in to an area when labels are present in a plot?
You need some coordinate transformation.
import cartopy.crs as ccrs
# relevant code follows
# set numbers in degrees of longitude
ax.set_xlim(-84.2, -83.4, ccrs.PlateCarree())
# set numbers in degrees of latitude
ax.set_ylim(42, 42.55, ccrs.PlateCarree())
plt.show()
With the option ccrs.PlateCarree(), the input values are transformed to proper data coordinates.
When I tried it, I could not get matplotlib to draw correctly with the axes restricted. A workaround is to extract only the data inside the extent before plotting:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
fig,ax = plt.subplots(1,1, figsize=(4,4), dpi=144)
shpfile = './cb_2015_us_county_20m/cb_2015_us_county_20m.shp'
gdf = gpd.read_file(shpfile)
# gdf = gdf.loc[gdf['STATEFP'] == '27']
gdf['coords'] = gdf['geometry'].apply(lambda x: x.representative_point().coords[:])
gdf['coords'] = [coords[0] for coords in gdf['coords']]
gdf = (gdf[(gdf['coords'].str[0] >= -84.2) & (gdf['coords'].str[0] <= -83.4)
& (gdf['coords'].str[1] >= 42) & (gdf['coords'].str[1] <= 42.55)])
gdf.plot(ax=ax)
gdf.apply(lambda x: ax.annotate(text=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1)
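A variation on the same idea is to keep every county polygon on the map and only skip the labels that fall outside the extent. The sketch below assumes `gdf` has a `NAME` column, as in the question. The helper names `inside` and `label_within_extent` are hypothetical. Passing the label text positionally avoids depending on whether the matplotlib version in use names that argument `s` or `text`.

```python
def inside(x, y, xlim, ylim):
    """True if the point (x, y) lies within the rectangular extent."""
    return xlim[0] <= x <= xlim[1] and ylim[0] <= y <= ylim[1]


def label_within_extent(ax, gdf, xlim, ylim, name_col="NAME"):
    """Annotate only features whose representative point is inside the extent."""
    for name, geom in zip(gdf[name_col], gdf.geometry):
        pt = geom.representative_point()
        if inside(pt.x, pt.y, xlim, ylim):
            ax.annotate(name, xy=(pt.x, pt.y), ha="center")
    # Zoom after labelling; no text remains outside the visible area.
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
    return ax
```

A possible call would be `label_within_extent(gdf.plot(), gdf, (-84.2, -83.4), (42, 42.55))`.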

Time series plot of categorical or binary variables in pandas or matplotlib

I have data that represent a time series of categorical variables. I want to display the transitions in categories below a traditional line plot of related continuous time series to show off context as time evolves. I'd like to know the best way to do this. My attempt was in terms of Rectangles. The appearance is a bit weird, and importantly the axis labels for the x axis don't render as dates.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
from pandas.plotting import register_matplotlib_converters
import matplotlib.dates as mdates
register_matplotlib_converters()
t0 = pd.DatetimeIndex(["2017-06-01 00:00","2017-06-17 00:00","2017-07-03 00:00","2017-08-02 00:00","2017-08-09 00:00","2017-09-01 00:00"])
t1 = pd.DatetimeIndex(["2017-06-01 00:00","2017-08-15 00:00","2017-09-01 00:00"])
df0 = pd.DataFrame({"cat":[0,2,1,2,0,1]},index = t0)
df1 = pd.DataFrame({"op":[0,1,0]},index=t1)
# Create new plot
fig,ax = plt.subplots(1,figsize=(8,3))
data_layout = {
"cat" : {0: ('bisque','Low'),
1: ('lightseagreen','Medium'),
2: ('rebeccapurple','High')},
"op" : {0: ('darkturquoise','Open'),
1: ('tomato','Close')}
}
vars =("cat","op")
dfs = [df0,df1]
all_ticks = []
leg = []
for j,(v,d) in enumerate(zip(vars,dfs)):
    dvals = d[v][:].astype("d")
    normal = mpl.colors.Normalize(vmin=0, vmax=2.)
    colors = plt.cm.Set1(0.75*normal(dvals.to_numpy()))
    handles = []
    # one rectangle per interval between consecutive timestamps
    for i in range(len(d)-1):
        s = d[v].index.to_pydatetime()
        level = d[v].iloc[i]
        base = d[v].index[i]
        w = s[i+1] - s[i]
        patch=mpl.patches.Rectangle((base,float(j)),width=w,color=data_layout[v][level][0],height=1,fill=True)
        ax.add_patch(patch)
    # one legend entry per category level
    for lev in data_layout[v]:
        print(data_layout[v][level])
        handles.append(mpl.patches.Patch(color=data_layout[v][lev][0],label=data_layout[v][lev][1]))
    all_ticks.append(j+0.5)
    leg.append( plt.legend(handles=handles,loc = (3-3*j+1)))
plt.axhline(y=1.,linewidth=3,color="gray")
plt.xlim(pd.Timestamp(2017,6,1).to_pydatetime(),pd.Timestamp(2017,9,1).to_pydatetime())
plt.ylim(0,2)
ax.add_artist(leg[0]) # two legends on one axis
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d') # This fails
plt.yticks(all_ticks,vars)
plt.show()
This produces a plot with no dates on the x-axis and with jittery lines. How do I fix this? Is there a better way entirely?
Here is a way to display dates on the x-axis.
In your code, replace the line that fails with this one:
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
But I don't remember what it should look like. Can you show us the end result again?
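As for a different approach altogether, one option is to turn each series into explicit (start, end, level) spans and draw them with matplotlib's `broken_barh`, converting the dates to matplotlib date numbers first. This is only a sketch. The helper names `to_spans` and `plot_spans` are hypothetical, and `colors` is assumed to be a dict mapping each level to a colour name.

```python
from datetime import datetime


def to_spans(times, levels):
    """Pair each level with the interval that lasts until the next timestamp.

    The last timestamp only closes the final span, so its level is not drawn.
    """
    return [(times[i], times[i + 1], levels[i]) for i in range(len(times) - 1)]


def plot_spans(ax, spans, colors, row):
    """Draw the spans as one horizontal band occupying y in [row, row + 1]."""
    import matplotlib.dates as mdates

    for start, end, level in spans:
        x0, x1 = mdates.date2num(start), mdates.date2num(end)
        ax.broken_barh([(x0, x1 - x0)], (row, 1), facecolors=colors[level])
    # Tell the axis that x values are dates and format the tick labels.
    ax.xaxis_date()
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))


times = [datetime(2017, 6, 1), datetime(2017, 6, 17), datetime(2017, 7, 3)]
spans = to_spans(times, [0, 2, 1])
```

For the dataframes in the question, `to_spans(list(df0.index.to_pydatetime()), list(df0["cat"]))` would give the spans for the "cat" row, and calling `plot_spans` once per variable with `row=0` and `row=1` stacks the two bands.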