Assign Linestring to Polygon based on Max length - pandas

I have two GeoPandas dataframes: one contains LineStrings and the other contains Polygons. I need to assign each LineString to the Polygon in which it has the greatest length. The plot of them looks like the one below. The two polygons are separated by the black edge.
I am using the following code to assign each LineString to a Polygon:
well_segments = gpd.overlay(Polygons,Linestring, how='intersection')
well_segments['segment_length'] = well_segments.length
well_segments["geometry"] = well_segments.geometry.to_wkt()
well_segments_df = spark.createDataFrame(well_segments)
windowSpec = Window.partitionBy("api12").orderBy(col("segment_length").desc())
well_segments_valid_df = well_segments_df.select("API", "ID", f.row_number().over(windowSpec).alias("rn"), "segment_length", "geometry").filter(f.col("rn") == 1)
Is there a more efficient way of doing this in GeoPandas or pandas?

You have not provided any sample data, so I have used some polygons from the Natural Earth dataset and generated 5 lines of different lengths for each of these polygons.
The actual solution is:
use sjoin() instead of overlay()
filter down to the line with the greatest length for each polygon (index_right)
longest = (
gpd.sjoin(linestrings, polygons, predicate="intersects")
.assign(len=lambda d: d["geometry"].length)
.sort_values(["index_right", "len"])
.groupby("index_right")["geometry"]
.last()
)
### full working code ###
import geopandas as gpd
from shapely.geometry import LineString
import numpy as np
import folium
import warnings
r = np.random.RandomState(22)
polygons = (
gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
.loc[lambda d: d["geometry"].type.eq("Polygon") & d["continent"].eq("Africa")]
.sample(5, random_state=r)
.loc[:, ["geometry"]]
)
lss = (
polygons.exterior.apply(
lambda g: np.array(g.coords)[r.choice(len(g.coords), [5, 2])]
)
.explode()
.apply(LineString)
)
linestrings = gpd.GeoDataFrame(geometry=lss, crs=polygons.crs).reset_index(drop=True)
# find the longest line in each polygon
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    longest = (
        gpd.sjoin(linestrings, polygons, predicate="intersects")
        .assign(len=lambda d: d["geometry"].length)
        .sort_values(["index_right", "len"])
        .groupby("index_right")["geometry"]
        .last()
    )
longest = gpd.GeoSeries(longest, crs=polygons.crs)
# visualise it...
m = polygons.explore(height=300, width=600, color="cyan", name="polys")
m = linestrings.explore(m=m, name="all lines", color="blue", style_kwds={"weight":.8})
m = longest.explore(m=m, name="longest", color="red")
folium.LayerControl().add_to(m)
m
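If you want to keep the overlay() route from the question, the Spark window step can probably be replaced with a plain pandas groupby. This is a minimal sketch, not a tested drop-in: it assumes the overlay result has the columns "API", "ID" and "segment_length" used in the question, and that "API" identifies a well. The rows below are made-up stand-ins for real overlay output.

```python
import pandas as pd

# Hypothetical overlay result: one row per (well, polygon) intersection.
# Column names follow the question; the values are invented.
well_segments = pd.DataFrame(
    {
        "API": ["w1", "w1", "w2", "w2", "w2"],
        "ID": ["A", "B", "A", "B", "C"],
        "segment_length": [120.0, 340.0, 75.0, 10.0, 60.0],
    }
)

# idxmax returns the row label of the longest segment within each well
longest_idx = well_segments.groupby("API")["segment_length"].idxmax()

# Keep only those rows: one polygon ID per well
assigned = well_segments.loc[longest_idx].reset_index(drop=True)
print(assigned)
```

Note that segment_length from an intersection overlay is the length of the piece inside each polygon, whereas the sjoin() version above ranks by the length of the whole line.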

Related

Merge countries using Cartopy

I am using the following code to make a map of Sweden, Norway and Finland together as one area. However, I am struggling with it. I'm following this example: Python Mapping in Matplotlib Cartopy Color One Country.
from shapely.geometry import Polygon
from cartopy.io import shapereader
import cartopy.io.img_tiles as cimgt
import cartopy.crs as ccrs
import geopandas
import matplotlib.pyplot as plt
def rect_from_bound(xmin, xmax, ymin, ymax):
    """Returns list of (x,y)'s for a rectangle"""
    xs = [xmax, xmin, xmin, xmax, xmax]
    ys = [ymax, ymax, ymin, ymin, ymax]
    return [(x, y) for x, y in zip(xs, ys)]
# request data for use by geopandas
resolution = '10m'
category = 'cultural'
name = 'admin_0_countries'
countries = ['Norway', 'Sweden', 'Finland']
shpfilename = shapereader.natural_earth(resolution, category, name)
df = geopandas.read_file(shpfilename)
extent = [2, 32, 55, 72]
# get geometry of a country
for country in (countries):
    poly = [df.loc[df['ADMIN'] == country]['geometry'].values[0]]
    stamen_terrain = cimgt.StamenTerrain()
    # projections that involved
    st_proj = stamen_terrain.crs #projection used by Stamen images
    ll_proj = ccrs.PlateCarree() #CRS for raw long/lat
    # create fig and axes using intended projection
    fig = plt.figure(figsize=(8,9))
    ax = fig.add_subplot(122, projection=st_proj)
    ax.add_geometries(poly, crs=ll_proj, facecolor='none', edgecolor='black')
    pad1 = 0.5 #padding, degrees unit
    exts = [poly[0].bounds[0] - pad1, poly[0].bounds[2] + pad1, poly[0].bounds[1] - pad1, poly[0].bounds[3] + pad1];
    ax.set_extent(exts, crs=ll_proj)
    # make a mask polygon by polygon's difference operation
    # base polygon is a rectangle, another polygon is simplified switzerland
    msk = Polygon(rect_from_bound(*exts)).difference( poly[0].simplify(0.01) )
    msk_stm = st_proj.project_geometry (msk, ll_proj) # project geometry to the projection used by stamen
    # get and plot Stamen images
    ax.add_image(stamen_terrain, 8) # this requests image, and plot
    # plot the mask using semi-transparency (alpha=0.65) on the masked-out portion
    ax.add_geometries( msk_stm, st_proj, zorder=12, facecolor='white', edgecolor='none', alpha=0.65)
    ax.gridlines(draw_labels=True)
    plt.show()
What I get is separate maps, though I need a single map of all three countries.
Can you please help?
Thank you.
The code that you adapted works well for a single country. If the new target is several contiguous countries, you need to select all of them and dissolve them into a single geometry. Only a few lines of code need to be modified.
Example: new target countries: ['Norway','Sweden', 'Finland']
The line of code that needs to be replaced:
poly = [df.loc[df['ADMIN'] == 'Switzerland']['geometry'].values[0]]
Replace it with these lines of code:
scan3 = df[ df['ADMIN'].isin(['Norway','Sweden', 'Finland']) ]
scan3_dissolved = scan3.dissolve(by='LEVEL')
poly = [scan3_dissolved['geometry'].values[0]]
With that change you should get a single plot covering all three countries.
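To see what the dissolve step does in isolation: it merges the geometries of each group into one, which is essentially a union. Here is a rough sketch of that idea using shapely directly, with three adjacent squares standing in for the three countries (the squares are made up, and this is not the Natural Earth data):

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Three adjacent unit squares stand in for three neighbouring countries
parts = [box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)]

# Union them so the shared borders disappear
merged = unary_union(parts)

print(merged.geom_type)  # one geometry instead of three
print(merged.area)
```

Because the squares touch, the result is one outline with no internal borders, which is what the mask in the plotting code needs.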

Stacked Bar Graph with Errorbars in Pandas / Matplotlib

I want to show my data in two (or more) stacked bar graphs including error bars. My code is based on a working example, but uses DataFrames as input instead of arrays.
I tried to convert the DataFrame output to an array, but that did not work.
from uncertain_panda import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
raw_data = {'': ['Error', 'Value'],'Stars': [3, 18],'Cats': [2,15],'Planets': [1,12],'Dogs': [2,16]}
df = pd.DataFrame(raw_data)
df.set_index('', inplace=True)
print(df)
N = 2
ind = np.arange(N)
width = 0.35
first_Value = df.loc[['Value'],['Cats','Dogs']]
second_Value = df.loc[['Value'],['Stars','Planets']]
first_Error = df.loc[['Error'],['Cats','Dogs']]
second_Error = df.loc[['Error'],['Stars','Planets']]
p1 = plt.bar(ind, first_Value, width, yerr=first_Error)
p2 = plt.bar(ind, second_Value, width, yerr=second_Error, bottom=first_Value)
plt.xticks(ind, ('Pets', 'Universe'))
plt.legend((p1[0], p2[0]), ('Cats', 'Dogs', 'Stars', 'Planets'))
plt.show()
I expect output like this:
https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/bar_stacked.html#sphx-glr-gallery-lines-bars-and-markers-bar-stacked-py
Instead I get this error:
TypeError: only size-1 arrays can be converted to Python scalars
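One plausible cause, not confirmed anywhere in this thread, is that selecting rows with a list (df.loc[['Value'], ...]) returns a two-dimensional DataFrame, while plt.bar expects flat sequences for the heights and errors. The sketch below only shows the shape difference. It uses plain pandas rather than uncertain_panda, which is an assumption on my part.

```python
import pandas as pd

raw_data = {"": ["Error", "Value"], "Stars": [3, 18], "Cats": [2, 15],
            "Planets": [1, 12], "Dogs": [2, 16]}
df = pd.DataFrame(raw_data).set_index("")

# A list of row labels keeps two dimensions: a 1x2 DataFrame
as_frame = df.loc[["Value"], ["Cats", "Dogs"]]

# A scalar row label returns a Series, which converts to a flat array
as_array = df.loc["Value", ["Cats", "Dogs"]].to_numpy()

print(as_frame.shape)
print(as_array.shape)
```

Passing flat arrays like as_array for the heights, yerr and bottom arguments may be enough to get past the TypeError.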

Adding Labels to a Shapefile Map

I have a shapefile that maps the world to sales territories. The shapefile records list the sales territory code and name. I would like to be able to add the territory code in the center of the region, but to do so using ax.text, I need the center point of the region. Any ideas how to do this?
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import shapefile
from mpl_toolkits.basemap import Basemap as Basemap
from matplotlib.colors import rgb2hex, Normalize
from matplotlib.patches import Polygon
from matplotlib.colorbar import ColorbarBase
from matplotlib.collections import PatchCollection
plt.rcParams['figure.figsize'] = [16,12]
fig = plt.figure()
m = Basemap(llcrnrlon=-121,llcrnrlat=20,urcrnrlon=-62,urcrnrlat=51,
projection='lcc',lat_1=32,lat_2=45,lon_0=-95)
shp_info = m.readshapefile('world_countries_boundary_file_world_2002','countries',drawbounds=True)
sf = shapefile.Reader('territory_map') # Custom file mapping out territories
recs = sf.records()
shapes = sf.shapes()
Nshp = len(shapes)
colors={}
territory_codes=[]
cmap = plt.cm.RdYlGn
# details is a pandas datafile with column "DELTA" that has data to plot
vmin = details.DELTA.min()
vmax = details.DELTA.max()
norm = Normalize(vmin=vmin, vmax=vmax)
for index,row in details.iterrows():
    colors[row['TERRITORY_CODE']] = cmap((row['DELTA']-vmin)/(vmax-vmin))[:3]
    territory_codes.append(row['TERRITORY_CODE'])
ax = fig.add_subplot(111)
for nshp in range(Nshp):
    ptchs = []
    pts = np.array((shapes[nshp].points))
    prt = shapes[nshp].parts
    par = list(prt) + [pts.shape[0]]
    for pij in range(len(prt)):
        ptchs.append(Polygon(pts[par[pij]:par[pij+1]]))
    try:
        color = rgb2hex(colors[recs[nshp][0]])
    except:
        color = 'w' # If no data, leave white (blank)
    ax.add_collection(PatchCollection(ptchs, facecolor=color, edgecolor='b', linewidths=.7))
    x, y = # Coordinates that are center of region
    ax.text(x, y, recs[nshp][0]) # <---- this would be the text to add
# Add colorbar
ax_c = fig.add_axes([0.1, 0.1, 0.8, 0.02])
cb = ColorbarBase(ax_c,cmap=cmap,norm=norm,orientation='horizontal')
cb.ax.set_xlabel('Daily Change, USD')
#Set view to United States
ax.set_xlim(-150,-40)
ax.set_ylim(15,70)
plt.show()
Resulting Map of Code without Territory Names
You're probably looking to take the mean of all the x coordinates and the mean of all the y coordinates of your polygon shape.
I can't test this, but it could look something like this:
x,y = pts[0].mean(), pts[1].mean()
or this:
x,y = pts[:,0].mean(), pts[:,1].mean()
depending on the dimensions of your numpy array.
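In the question, pts is built with np.array(shapes[nshp].points), which should give an (N, 2) array, so the second form is the likely fit. Here is a small self-contained sketch of that idea. The helper name vertex_mean and the rectangle are made up for illustration, and the mean of the vertices is only an approximation of the visual center, not a true area centroid.

```python
import numpy as np

def vertex_mean(pts):
    """Mean x and y of an (N, 2) array of ring vertices."""
    pts = np.asarray(pts, dtype=float)
    # Closed rings repeat the first vertex at the end; drop the
    # duplicate so it is not counted twice.
    if len(pts) > 1 and np.array_equal(pts[0], pts[-1]):
        pts = pts[:-1]
    return pts[:, 0].mean(), pts[:, 1].mean()

# Closed rectangular ring with the first vertex repeated at the end
ring = [[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]
x, y = vertex_mean(ring)
print(x, y)
```

For shapes with several parts or very uneven vertex spacing the label can land off-center, so treat this as a starting point.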

Plotting Lat/Long Points Using Basemap

I am trying to plot points on a map using matplotlib and Basemap, where the points represent the lat/long for specific buildings. My map does indeed plot the points, but puts them in the wrong location. When I use the same data and do the same thing using Bokeh, instead of matplotlib and basemap, I get the correct plot.
Here is the CORRECT result in Bokeh:
Bokeh Version
And here is the INCORRECT result in Basemap:
Basemap Version
I have seen discussion elsewhere on StackOverflow that suggested this might be related to the fact that plot() "shifts" the longitude somehow. I've tried the suggestion from there, which was to include the line:
lons, lats = m.shiftdata(long, lat)
and then use the shifted data. That didn't have any visible impact.
My full sample code which generates both of the plots in Basemap and Bokeh is here:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.sampledata.us_states import data as states
from bokeh.models import ColumnDataSource, Range1d
# read in data to use for plotted points
buildingdf = pd.read_csv('buildingdata.csv')
lat = buildingdf['latitude'].values
long = buildingdf['longitude'].values
# determine range to print based on min, max lat and long of the data
margin = .2 # buffer to add to the range
lat_min = min(lat) - margin
lat_max = max(lat) + margin
long_min = min(long) - margin
long_max = max(long) + margin
# create map using BASEMAP
m = Basemap(llcrnrlon=long_min,
llcrnrlat=lat_min,
urcrnrlon=long_max,
urcrnrlat=lat_max,
lat_0=(lat_max - lat_min)/2,
lon_0=(long_max-long_min)/2,
projection='merc',
resolution = 'h',
area_thresh=10000.,
)
m.drawcoastlines()
m.drawcountries()
m.drawstates()
m.drawmapboundary(fill_color='#46bcec')
m.fillcontinents(color = 'white',lake_color='#46bcec')
# convert lat and long to map projection coordinates
lons, lats = m(long, lat)
# plot points as red dots
m.scatter(lons, lats, marker = 'o', color='r')
plt.show()
# create map using Bokeh
source = ColumnDataSource(data = dict(lat = lat,lon = long))
# get state boundaries
state_lats = [states[code]["lats"] for code in states]
state_longs = [states[code]["lons"] for code in states]
p = figure(
toolbar_location="left",
plot_width=1100,
plot_height=700,
)
# limit the view to the min and max of the building data
p.y_range = Range1d(lat_min, lat_max)
p.x_range = Range1d(long_min, long_max)
p.xaxis.visible = False
p.yaxis.visible = False
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches(state_longs, state_lats, fill_alpha=0.0,
line_color="black", line_width=2, line_alpha=0.3)
p.circle(x="lon", y="lat", source = source, size=4.5,
fill_color='red',
line_color='grey',
line_alpha=.25
)
show(p)
I don't have enough reputation points to post a link to the data or to include it here.
In the basemap plot the scatter points are hidden behind the fillcontinents. Removing the two lines
#m.drawmapboundary(fill_color='#46bcec')
#m.fillcontinents(color = 'white',lake_color='#46bcec')
would show you the points. Because this might be undesired, the best solution would be to place the scatter on top of the rest of the map by using the zorder argument.
m.scatter(lons, lats, marker = 'o', color='r', zorder=5)
Here is the complete code (and I would like to ask you to include this kind of runnable minimal example with hardcoded data next time asking a question, as it saves everyone a lot of work inventing the data oneself):
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import pandas as pd
import io
u = u"""latitude,longitude
42.357778,-71.059444
39.952222,-75.163889
25.787778,-80.224167
30.267222, -97.763889"""
# read in data to use for plotted points
buildingdf = pd.read_csv(io.StringIO(u), delimiter=",")
lat = buildingdf['latitude'].values
lon = buildingdf['longitude'].values
# determine range to print based on min, max lat and lon of the data
margin = 2 # buffer to add to the range
lat_min = min(lat) - margin
lat_max = max(lat) + margin
lon_min = min(lon) - margin
lon_max = max(lon) + margin
# create map using BASEMAP
m = Basemap(llcrnrlon=lon_min,
llcrnrlat=lat_min,
urcrnrlon=lon_max,
urcrnrlat=lat_max,
lat_0=(lat_max + lat_min)/2,
lon_0=(lon_max + lon_min)/2,
projection='merc',
resolution = 'h',
area_thresh=10000.,
)
m.drawcoastlines()
m.drawcountries()
m.drawstates()
m.drawmapboundary(fill_color='#46bcec')
m.fillcontinents(color = 'white',lake_color='#46bcec')
# convert lat and lon to map projection coordinates
lons, lats = m(lon, lat)
# plot points as red dots
m.scatter(lons, lats, marker = 'o', color='r', zorder=5)
plt.show()
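The zorder behaviour is not specific to Basemap. Here is a minimal matplotlib-only sketch, where a filled polygon stands in for the land drawn by fillcontinents (the coordinates are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# A filled polygon stands in for the land drawn by fillcontinents
(land,) = ax.fill([0, 10, 10, 0], [0, 0, 10, 10], color="white")

# zorder=5 puts the markers above the patch, whatever order they were added in
points = ax.scatter([2, 5, 8], [3, 6, 4], color="r", zorder=5)

print(land.get_zorder(), points.get_zorder())
```

Artists with a higher zorder are drawn later, so the markers end up on top of the fill.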

Removing numpy meshgrid points outside of a Shapely polygon

I have a 10 x 10 grid from which I would like to remove the points that lie outside a shapely Polygon:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Polygon, Point
from descartes import PolygonPatch
gridX, gridY = np.mgrid[0.0:10.0, 0.0:10.0]
poly = Polygon([[1,1],[1,7],[7,7],[7,1]])
#plot original figure
fig = plt.figure()
ax = fig.add_subplot(111)
polyp = PolygonPatch(poly)
ax.add_patch(polyp)
ax.scatter(gridX,gridY)
plt.show()
Here is the resulting figure:
And what I want the end result to look like:
I know that I can reshape the array to a 100 x 2 array of grid points:
stacked = np.dstack([gridX,gridY])
reshaped = stacked.reshape(100,2)
I can see if the point lies within the polygon easily:
for i in reshaped:
    if Point(i).within(poly):
        print(True)
But I am having trouble taking this information and modifying the original grid.
You're pretty close already; instead of printing True, you could just append the points to a list.
output = []
for i in reshaped:
    if Point(i).within(poly):
        output.append(i)
output = np.array(output)
x, y = output[:, 0], output[:, 1]
It seems that Point.within doesn't consider points that lie on the edge of the polygon to be "within" it though.
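If the edge points should be kept, one option is to test with the polygon's covers() method, which also accepts points on the boundary. The sketch below additionally builds a boolean mask with the same shape as the grid, which may be easier to work with than a list. The names inside, x_in, y_in and strict are made up for illustration.

```python
import numpy as np
from shapely.geometry import Point, Polygon

gridX, gridY = np.mgrid[0.0:10.0, 0.0:10.0]
poly = Polygon([[1, 1], [1, 7], [7, 7], [7, 1]])

# Boolean mask with the same shape as the grid; covers() also accepts
# points on the boundary, while within() rejects them
inside = np.array(
    [poly.covers(Point(x, y)) for x, y in zip(gridX.ravel(), gridY.ravel())]
).reshape(gridX.shape)

# Coordinates of the grid points that survive the mask
x_in, y_in = gridX[inside], gridY[inside]

# For comparison: how many points pass the stricter within() test
strict = sum(Point(x, y).within(poly) for x, y in zip(gridX.ravel(), gridY.ravel()))

print(inside.sum(), strict)
```

x_in and y_in can be passed straight to ax.scatter, and the mask can be reused on any other array defined on the same grid.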