Create polygons from points with GeoPandas - pandas

I have a geopandas dataframe containing a list of shapely POINT geometries. There is another column with a list of IDs that specifies which unique polygon each point belongs to. Simplified input code is:
import pandas as pd
from shapely.geometry import Point, LineString, Polygon
from geopandas import GeoDataFrame
data = [[1,10,10],[1,15,20],[1,20,10],[2,30,30],[2,35,40],[2,40,30]]
df_poly = pd.DataFrame(data, columns = ['poly_ID','lon', 'lat'])
geometry = [Point(xy) for xy in zip(df_poly.lon, df_poly.lat)]
geodf_poly = GeoDataFrame(df_poly, geometry=geometry)
geodf_poly.head()
I would like to groupby the poly_ID in order to convert the geometry from POINT to POLYGON. This output would essentially look like:
poly_ID geometry
1 POLYGON ((10 10, 15 20, 20 10))
2 POLYGON ((30 30, 35 40, 40 30))
I imagine this is quite simple, but I'm having trouble getting it to work. I found the following code that allowed me to convert it to open ended polylines, but have not been able to figure it out for polygons. Can anyone suggest how to adapt this?
geodf_poly = geodf_poly.groupby(['poly_ID'])['geometry'].apply(lambda x: LineString(x.tolist()))
Simply replacing LineString with Polygon results in TypeError: object of type 'Point' has no len()
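A minimal sketch of the adaptation, assuming the points in each group are already stored in ring order: `Polygon` expects a sequence of coordinate tuples rather than `Point` objects, so each `Point` is unpacked to `(x, y)` inside the groupby before the polygon is built. The names `polys` and `gdf_polys` are only illustrative.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon

data = [[1, 10, 10], [1, 15, 20], [1, 20, 10], [2, 30, 30], [2, 35, 40], [2, 40, 30]]
df_poly = pd.DataFrame(data, columns=['poly_ID', 'lon', 'lat'])
geodf_poly = gpd.GeoDataFrame(
    df_poly, geometry=[Point(xy) for xy in zip(df_poly.lon, df_poly.lat)]
)

# Polygon() wants (x, y) tuples, so unpack each Point before building the ring;
# shapely closes the ring automatically.
polys = geodf_poly.groupby('poly_ID')['geometry'].apply(
    lambda pts: Polygon([(p.x, p.y) for p in pts])
)

# Turn the resulting Series back into a GeoDataFrame with poly_ID as a column.
gdf_polys = gpd.GeoDataFrame(polys.reset_index(), geometry='geometry')
print(gdf_polys)
```

If the points are not stored in ring order, they need to be sorted first (for example by angle around the group's centroid), otherwise the ring can self-intersect.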

Your request is a bit tricky to accomplish in pandas alone, because your output mixes the text 'POLYGON' with numbers inside the brackets.
See whether the options below work for you:
from itertools import chain
df_poly.groupby('poly_ID').agg(list).apply(lambda x: tuple(chain.from_iterable(zip(x['lon'], x['lat']))), axis=1).reset_index(name='geometry')
output
poly_ID geometry
0 1 (10, 10, 15, 20, 20, 10)
1 2 (30, 30, 35, 40, 40, 30)
or
from itertools import chain
df_new =df_poly.groupby('poly_ID').agg(list).apply(lambda x: tuple(chain.from_iterable(zip(x['lon'], x['lat']))), axis=1).reset_index(name='geometry')
df_new['geometry']=df_new.apply(lambda x: 'POLYGON ('+str(x['geometry'])+')',axis=1 )
df_new
Output
poly_ID geometry
0 1 POLYGON ((10, 10, 15, 20, 20, 10))
1 2 POLYGON ((30, 30, 35, 40, 40, 30))
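Note: the geometry column holds strings, not shapely objects, and the text is not valid WKT (WKT separates x and y with a space and coordinate pairs with commas), so it cannot be passed to shapely.wkt.loads as-is.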
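If you do want to go through text, here is a sketch assuming you start from the flat `(x1, y1, x2, y2, ...)` tuples of the first option: pair the numbers up, write them as valid WKT with the ring closed (the WKT reader, unlike the `Polygon` constructor, does not close rings for you), and parse with `shapely.wkt.loads`. The helper `tuple_to_polygon` is hypothetical.

```python
from itertools import chain

import pandas as pd
from shapely import wkt

data = [[1, 10, 10], [1, 15, 20], [1, 20, 10], [2, 30, 30], [2, 35, 40], [2, 40, 30]]
df_poly = pd.DataFrame(data, columns=['poly_ID', 'lon', 'lat'])

# Same flat tuples as the first option above.
df_new = (
    df_poly.groupby('poly_ID').agg(list)
    .apply(lambda r: tuple(chain.from_iterable(zip(r['lon'], r['lat']))), axis=1)
    .reset_index(name='geometry')
)

def tuple_to_polygon(flat):
    """Hypothetical helper: (x1, y1, x2, y2, ...) -> shapely Polygon via WKT."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.append(pairs[0])  # close the ring explicitly for the WKT reader
    ring = ', '.join(f'{x} {y}' for x, y in pairs)
    return wkt.loads(f'POLYGON (({ring}))')

df_new['geometry'] = df_new['geometry'].apply(tuple_to_polygon)
print(df_new)
```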

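This solution scales to large data by using .dissolve, which merges each group's points into a single MultiPoint, and .convex_hull. Note that the result is each group's convex hull, not a ring through the points in their original order.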
>>> import pandas as pd
>>> import geopandas as gpd
>>> df = pd.DataFrame(
... {
... "x": [0, 1, 0.1, 0.5, 0, 0, -1, 0],
... "y": [0, 0, 0.1, 0.5, 1, 0, 0, -1],
... "label": ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
... }
... )
>>> gdf = gpd.GeoDataFrame(
... df,
... geometry=gpd.points_from_xy(df["x"], df["y"]),
... )
>>> gdf
x y label geometry
0 0.0 0.0 a POINT (0.00000 0.00000)
1 1.0 0.0 a POINT (1.00000 0.00000)
2 0.1 0.1 a POINT (0.10000 0.10000)
3 0.5 0.5 a POINT (0.50000 0.50000)
4 0.0 1.0 a POINT (0.00000 1.00000)
5 0.0 0.0 b POINT (0.00000 0.00000)
6 -1.0 0.0 b POINT (-1.00000 0.00000)
7 0.0 -1.0 b POINT (0.00000 -1.00000)
>>> res = gdf.dissolve("label").convex_hull
>>> res.to_wkt()
label
a POLYGON ((0 0, 0 1, 1 0, 0 0))
b POLYGON ((0 -1, -1 0, 0 0, 0 -1))
dtype: object
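Because `convex_hull` ignores point order and fills in concave notches, it only matches the polygon you want when that polygon is convex (as the triangles in the question are). A small sketch with a hypothetical L-shaped set of six points, listed in ring order, makes the difference visible: connecting the points in order gives the L-shape with area 3, while the dissolved convex hull has area 3.5.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical L-shaped outline, listed in ring order.
xs = [0, 2, 2, 1, 1, 0]
ys = [0, 0, 1, 1, 2, 2]
gdf = gpd.GeoDataFrame(
    {"label": ["L"] * len(xs)},
    geometry=gpd.points_from_xy(xs, ys),
)

# Convex hull of the dissolved points: order is ignored, the notch is filled in.
hull = gdf.dissolve("label").convex_hull

# Ordered ring: connects the points in the order they appear in each group.
ring = gdf.groupby("label")["geometry"].apply(
    lambda pts: Polygon([(p.x, p.y) for p in pts])
)

print(hull["L"].area, ring["L"].area)  # 3.5 3.0
```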

Related

plotting 2 dictionaries in matplotlib

I have 2 dictionaries: dict1 = {'Beef':10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11} and dict2 = {'Beef':3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}
I want to plot a double bar chart using the dictionary keys as the x-axis values and the associated values on the y-axis. I am using matplotlib for this. Does anyone have any information?
This part of the matplotlib documentation may be what you are looking for. To plot your data, the x and y values need to be extracted from the dicts as lists, for example via dict.keys() and dict.values(); build both y lists in the same key order so the bars line up.
import matplotlib.pyplot as plt
import numpy as np
dict1 = {'Beef':10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11}
dict2 = {'Beef':3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}
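x = list(dict1.keys())  # plain lists: bar() does not accept dict views
y1 = [dict1[k] for k in x]
y2 = [dict2[k] for k in x]  # align dict2 to dict1's key order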
N = len(x)
fig, ax = plt.subplots()
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars
p1 = ax.bar(ind, y1, width)
p2 = ax.bar(ind + width, y2, width)
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(x)
ax.legend((p1[0], p2[0]), ('dict1', 'dict2'))
plt.show()
I'd like to propose a more general approach: instead of just two dicts, what happens if we have a list of dictionaries?
In [89]: from random import randint, seed, shuffle
...: seed(20201213)
...: cats = 'a b c d e f g h i'.split() # categories
...: # List Of Dictionaries
...: lod = [{k:randint(5, 15) for k in shuffle(cats) or cats[:-2]} for _ in range(5)]
...: lod
Out[89]:
[{'d': 14, 'h': 10, 'i': 13, 'f': 13, 'c': 5, 'b': 5, 'a': 14},
{'h': 12, 'd': 5, 'c': 5, 'i': 11, 'b': 14, 'g': 8, 'e': 13},
{'d': 8, 'a': 12, 'f': 7, 'h': 10, 'g': 10, 'c': 11, 'i': 12},
{'g': 11, 'f': 8, 'i': 14, 'h': 11, 'a': 5, 'c': 7, 'b': 8},
{'e': 11, 'h': 13, 'c': 5, 'i': 8, 'd': 12, 'a': 11, 'g': 11}]
As you can see, the keys are not ordered in the same way and the dictionaries do not contain all the possible keys...
Our first step is to find a sorted list of keys (lok) by collecting every key into a set (yes, we already know the keys, but here we are looking for a general solution…)
In [90]: lok = sorted(set(k for d in lod for k in d))
The lengths of the two lists are
In [91]: nk, nd = len(lok), len(lod)
At this point we can compute the width of a single bar: if the bar groups are 1 unit apart (hence x = range(nk)) and we leave 1/3 unit of space between groups, then
In [92]: x, w = range(nk), 0.67/nd
We are ready to go with the plot
In [93]: import matplotlib.pyplot as plt
...: for n, d in enumerate(lod):
...:     plt.bar([ξ+n*w for ξ in x], [d.get(k, 0) for k in lok], w,
...:             label='dict %d'%(n+1))
...: plt.xticks([ξ+w*(nd-1)/2 for ξ in x], lok)  # center each tick under its group
...: plt.legend();
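The centering arithmetic can also be written with NumPy. With `nd` bars of width `w` per group and matplotlib's default `align='center'`, placing bar `n` at `x + (n - (nd - 1)/2) * w` puts the middle of each group exactly on the integer tick, so `set_xticks(x)` needs no correction. A sketch assuming the same list-of-dicts input; `bar_offsets` is a hypothetical helper, and the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

def bar_offsets(nd, ws=0.33):
    """Hypothetical helper: per-dict x offsets that center nd bars on each tick."""
    w = (1.0 - ws) / nd
    # offsets are symmetric around 0, so each group's middle lands on its tick
    return (np.arange(nd) - (nd - 1) / 2) * w, w

lod = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
lok = sorted({k for d in lod for k in d})
x = np.arange(len(lok))
offsets, w = bar_offsets(len(lod))

fig, ax = plt.subplots()
for n, d in enumerate(lod):
    ax.bar(x + offsets[n], [d.get(k, 0) for k in lok], w, label=f'dict {n + 1}')
ax.set_xticks(x)
ax.set_xticklabels(lok)
ax.legend()
```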
Let's write a small function
def plot_lod(lod, ws=0.33, ax=None, legend=True):
    """bar plot from the values in a list of dictionaries.
    lod: list of dictionaries,
    ws: optional, white space between groups of bars as a fraction of unity,
    ax: optional, the Axes object to draw into,
    legend: are we going to draw a legend?
    Return: the Axes used to plot and a list of BarContainer objects."""
    from matplotlib.pyplot import subplot
    from numpy import arange, nan
    if ax is None : ax = subplot()
    lok = sorted({k for d in lod for k in d})
    nk, nd = len(lok), len(lod)
    x, w = arange(nk), (1.0-ws)/nd
    lobars = [
        ax.bar(x+n*w, [d.get(k, nan) for k in lok], w, label='%02d'%(n+1))
        for n, d in enumerate(lod)
    ]
    ax.set_xticks(x+w*nd/2-w/2)
    ax.set_xticklabels(lok)
    if legend : ax.legend()
    return ax, lobars
Using the data of the previous example, we get a slightly different graph…
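A different technique from the one above, offered as an alternative sketch: pandas can do the key alignment for you. `pd.DataFrame(list_of_dicts)` builds one row per dictionary and one column per key, filling missing keys with NaN; transposing gives one row per key, which `DataFrame.plot.bar` draws as grouped bars. This assumes pandas is installed alongside matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import pandas as pd

dict1 = {'Beef': 10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11}
dict2 = {'Beef': 3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}

# rows = dictionaries, columns = keys; transpose so each key becomes a bar group
table = pd.DataFrame([dict1, dict2], index=['dict1', 'dict2']).T.sort_index()
ax = table.plot.bar(rot=0)
ax.set_ylabel('value')

# Missing keys simply become NaN, so uneven dictionaries need no special handling.
sparse = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]).T.sort_index()
```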

Expected a bytes object, got a 'int' object error with cudf

I have a pandas dataframe whose columns are all of object dtype. I am trying to convert it to cudf with cudf.from_pandas(df), but I get this error:
ArrowTypeError: Expected a bytes object, got a 'int' object
I don't understand why, since the columns are strings and not ints.
My second question is how to append a new element to a cudf DataFrame (like pandas df.append()).
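cudf does have an append() method for Series. Note that pandas removed append in 2.0 in favor of pd.concat, and newer cudf releases likewise point users to cudf.concat, so depending on your version cudf.concat([df1.a, df2.a]) may be the spelling that works.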
import cudf
df1 = cudf.DataFrame({'a': [0, 1, 2, 3],'b': [0.1, 0.2, 0.3, 0.4]})
df2 = cudf.DataFrame({'a': [4, 5, 6, 7],'b': [0.1, 0.2, None, 0.3]}) #your new elements
df3= df1.a.append(df2.a)
df3
Output:
0 0
1 1
2 2
3 3
0 4
1 5
2 6
3 7
Name: a, dtype: int64

How to define variables, constrains to Pandas Dataframe when using CVXPY for optimization?

import pandas as pd
import numpy as np
import re
import cvxpy as cvx
data = pd.read_excel('Optimality_V3.xlsx', encoding='latin-1')
As you can see, I imported an Excel file as a dataframe. Now I want to solve a maximization problem with the CVXPY library to find the optimal values of column data['D'] such that the sum of the values of data['B'] is maximized.
My objective function is quadratic in my decision variable data['D'], and the function is something like this:
data['B'] = data['C'] * data['D']**2 / data['E'].
The constraint I want on every row is that the new value of data['D'] stays within 20% of the original value:
data['D'] * 0.8 <= new_D <= data['D'] * 1.2
decision_variables = []
variable_constraints = []
for rownum, row in data.iterrows():
    var_ind = str('x' + str(rownum))
    var_ind = cvx.Variable()
    con_ind = var_ind * 0.8 <= var_ind <= var_ind * 1.2
    decision_variables.append(str(var_ind))
    variable_constraints.append(str(con_ind))
The above code is my attempt at doing this. I am new to CVXPY and trying to figure out how I can create variables named var_ind with constraints con_ind.
Look at the documentation for many examples: https://www.cvxpy.org/index.html
data = pd.DataFrame(data={
'A': [1, 2, 3, 4, 5],
'B': [0, 50, 40, 80, 20],
'C': [1200, 600, 900, 6500, 200],
'D': [0.4, 1.2, 0.8, 1.6, 1.1],
'E': [0.4, 0.5, 0.6, 0.4, 0.5],
'F': [0.8, 0.4, 1.2, 1.6, 1],
})
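x = cvx.Variable(data.index.size)
# bound each new value by +/-20% of the original column D
# (writing x * 0.8 <= x would only force x >= 0)
constraints = [
    x >= 0.8 * data['D'].values,
    x <= 1.2 * data['D'].values,
]
objective = cvx.Minimize(
    cvx.sum(
        cvx.multiply((data['C']/data['E']).tolist(), x**2)
    )
)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(x.value)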
The goal of my optimizer is to calculate new values for column D such that the new values always satisfy D*0.8 <= new_D (x below) <= D*1.2; let's call these the bounds of x. Apart from these:
The maximization function is:
cvx.sum(cvx.multiply((data['C']*data['F']/data['D']).tolist(), x))
I have a further constraint:
cvx.sum(cvx.multiply((data['F']*data['E']*data['C']/data['D']).tolist(), x**2)) == data['C'].sum()
import pandas as pd
import numpy as np
import re
import cvxpy as cvx
data = pd.DataFrame(data={
'A': [1, 2, 3, 4, 5],
'B': [100, 50, 40, 80, 20],
'C': [1200, 600, 900, 6500, 200],
'D': [0.4, 1.2, 0.8, 1.6, 1.1],
'E': [0.4, 0.5, 0.6, 0.4, 0.5],
'F': [0.8, 0.4, 1.2, 1.6, 1],
})
x = cvx.Variable(data.index.size)
Now I want to add a third, quadratic constraint that says the total sum of column C is always constant.
constraints = [
    x >= 0.8 * data['D'].values,
    x <= 1.2 * data['D'].values,
    cvx.sum(
        cvx.multiply((data['F']*data['E']*data['C']/data['D']).tolist(), x**2)
    ) == data['C'].sum()
]
The minimization function, as you can see, is pretty simple and linear. How do I convert it to a maximization function, though?
objective = cvx.Minimize(
    cvx.sum(
        cvx.multiply((data['C']*data['F']/data['D']).tolist(), x)
    )
)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(x.value)
I am going through the CVXPY documentation and it's helping me a lot! But I don't see any examples with a third constraint like mine, and I am getting the error 'DCPError: Problem does not follow DCP rules.'

Geopandas plots as subfigures

Say I have the following geodataframe that contains 3 polygon objects.
import geopandas as gpd
from shapely.geometry import Polygon
p1=Polygon([(0,0),(0,1),(1,1),(1,0)])
p2=Polygon([(3,3),(3,6),(6,6),(6,3)])
p3=Polygon([(3,.5),(4,2),(5,.5)])
gdf=gpd.GeoDataFrame(geometry=[p1,p2,p3])
gdf['Value1']=[1,10,20]
gdf['Value2']=[300,200,100]
gdf content:
>>> gdf
geometry Value1 Value2
0 POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0)) 1 300
1 POLYGON ((3 3, 3 6, 6 6, 6 3, 3 3)) 10 200
2 POLYGON ((3 0.5, 4 2, 5 0.5, 3 0.5)) 20 100
>>>
I can make a separate figure for each map by calling GeoDataFrame.plot() twice. However, is there a way to plot both maps next to each other in the same figure, as subfigures?
Always always always create your matplotlib objects ahead of time and pass them to the plotting methods (or use them directly). Doing so, your code becomes:
from matplotlib import pyplot
import geopandas
from shapely import geometry
p1 = geometry.Polygon([(0,0),(0,1),(1,1),(1,0)])
p2 = geometry.Polygon([(3,3),(3,6),(6,6),(6,3)])
p3 = geometry.Polygon([(3,.5),(4,2),(5,.5)])
gdf = geopandas.GeoDataFrame(dict(
geometry=[p1, p2, p3],
Value1=[1, 10, 20],
Value2=[300, 200, 100],
))
fig, (ax1, ax2) = pyplot.subplots(ncols=2, sharex=True, sharey=True)
gdf.plot(ax=ax1, column='Value1')
gdf.plot(ax=ax2, column='Value2')
# for plotting multiple GeoDataFrames
import geopandas as gpd
import matplotlib.pyplot as plt
gdf = gpd.read_file(geojson)  # geojson: path to your GeoJSON file
fig, axes = plt.subplots(1, 4, figsize=(40, 10))
# the column and cmap values below are placeholders
axes[0].set_title('Some Title')
gdf.plot(ax=axes[0], column='Some column for coloring', cmap='coloring option')
axes[1].set_title('Some Title')
gdf.plot(ax=axes[1], column='Some column for coloring', cmap='coloring option')
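Generalizing the pattern to any number of columns, a sketch that zips the Axes returned by `subplots` with the columns to map and titles each panel. It reuses the three polygons from the question; `squeeze=False` keeps `axes` two-dimensional even for a single column, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Polygon

gdf = gpd.GeoDataFrame(
    {"Value1": [1, 10, 20], "Value2": [300, 200, 100]},
    geometry=[
        Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]),
        Polygon([(3, 3), (3, 6), (6, 6), (6, 3)]),
        Polygon([(3, .5), (4, 2), (5, .5)]),
    ],
)

columns = ["Value1", "Value2"]
fig, axes = plt.subplots(1, len(columns), figsize=(10, 4),
                         sharex=True, sharey=True, squeeze=False)
for ax, col in zip(axes[0], columns):
    gdf.plot(ax=ax, column=col)  # one choropleth per column
    ax.set_title(col)
fig.tight_layout()
```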

Extract triangles from delaunay filter in mayavi

How can I extract triangles from delaunay filter in mayavi?
I want to extract the triangles just like matplotlib does
import numpy as np
import matplotlib.delaunay as triang
from enthought.mayavi import mlab
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
#matplotlib
centers, edges, triangles_index, neig = triang.delaunay(x,y)
#mayavi
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
I want to extract the triangles from the mayavi delaunay filter to obtain the equivalents of triangles_index and centers (just like matplotlib).
The only thing I've found is this
http://docs.enthought.com/mayavi/mayavi/auto/example_delaunay_graph.html
but it only gives the edges, and they are encoded differently than in matplotlib.
To get the triangles index:
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
poly is a PolyData object, poly.polys is a CellArray object that stores the index information.
For detail about CellArray: http://www.vtk.org/doc/nightly/html/classvtkCellArray.html
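To see why `reshape(-1, 4)[:, 1:]` works: VTK's legacy cell-array layout stores each cell as its point count followed by its point ids, i.e. `[3, i0, i1, i2, 3, j0, j1, j2, ...]` for a mesh made only of triangles. Reshaping into rows of 4 puts one triangle per row, and dropping the first column removes the count; this only holds when every cell is a triangle. A NumPy-only illustration with a hypothetical flat array built from the first two triangles of the output:

```python
import numpy as np

# Hypothetical flat connectivity for two triangles: [count, ids..., count, ids...]
flat = np.array([3, 5, 4, 2, 3, 4, 1, 2])

cells = flat.reshape(-1, 4)  # one row per triangle: [3, i0, i1, i2]
assert np.all(cells[:, 0] == 3), "layout assumes every cell is a triangle"
tindex = cells[:, 1:]        # drop the leading count
print(tindex)
```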
To get the center of every circumcircle, you need to loop over every triangle and calculate the center:
centers = []
for i in range(poly.number_of_cells):
    cell = poly.get_cell(i)
    points = cell.points.to_array()[:, :-1].tolist()  # drop z, keep (x, y)
    center = [0, 0]
    points.append(center)
    cell.circumcircle(*points)  # fills center in place
    centers.append(center)
centers = np.array(centers)
cell.circumcircle() is a static method, so you need to pass all the points of the triangle as arguments; the center is returned by modifying the fourth argument in place (the method's return value is the squared radius).
Here is the full code:
import numpy as np
from mayavi import mlab  # older installs: from enthought.mayavi import mlab
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
centers = []
for i in range(poly.number_of_cells):
    cell = poly.get_cell(i)
    points = cell.points.to_array()[:, :-1].tolist()  # drop z, keep (x, y)
    center = [0, 0]
    points.append(center)
    cell.circumcircle(*points)  # fills center in place
    centers.append(center)
centers = np.array(centers)
print(centers)
print(tindex)
The output is:
[[ 1.5 0.5]
[ 1.5 0.5]
[ 0.5 1.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 1.5]
[ 1.5 1.5]
[ 1.5 1.5]]
[[5 4 2]
[4 1 2]
[7 6 4]
[4 3 1]
[3 0 1]
[6 3 4]
[8 7 4]
[8 4 5]]
The result may not be the same as matplotlib.delaunay, because there are many possible solutions.
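If you only need the circumcenters, they can also be computed without looping over VTK cells: given the point coordinates and `tindex`, the standard closed-form circumcenter formula vectorizes directly in NumPy. A sketch with `circumcenters` as a hypothetical helper; fed the `tindex` printed above, it reproduces the `centers` array.

```python
import numpy as np

def circumcenters(x, y, tri):
    """Hypothetical helper: circumcenter of each triangle (rows of point indices)."""
    ax, ay = x[tri[:, 0]], y[tri[:, 0]]
    bx, by = x[tri[:, 1]], y[tri[:, 1]]
    cx, cy = x[tri[:, 2]], y[tri[:, 2]]
    # d is zero for collinear (degenerate) triangles
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return np.column_stack([ux, uy])

x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
tindex = np.array([[5, 4, 2], [4, 1, 2], [7, 6, 4], [4, 3, 1],
                   [3, 0, 1], [6, 3, 4], [8, 7, 4], [8, 4, 5]])
print(circumcenters(x, y, tindex))
```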