I am trying to merge two GeoDataFrames (I want to see which polygon each point is in).
The following code first gives me a warning ("CRS does not match!")
and then an error ("RTreeError: Coordinates must not have minimums more than maximums").
What exactly is wrong here? Are CRSs coordinate systems? If so, why are they not loaded the same way?
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from geopandas.tools import sjoin

print('Reading points...')
points = pd.read_csv(points_csv)
points['geometry'] = points.apply(lambda z: Point(z.Latitude, z.Longitude), axis=1)
PointsGeodataframe = gpd.GeoDataFrame(points)
print(PointsGeodataframe.head())

print('Reading polygons...')
PolygonsGeodataframe = gpd.GeoDataFrame.from_file(china_shapefile + ".shp")
print(PolygonsGeodataframe.head())

print('Merging GeoDataFrames...')
merged = sjoin(PointsGeodataframe, PolygonsGeodataframe, how='left', op='intersects')
print(merged.head(5))
Link to data for reproduction:
Shapefile,
GPS points
As noted in the comments on the question, you can eliminate the CRS does not match! warning by manually setting PointsGeodataframe.crs = PolygonsGeodataframe.crs (assuming the CRSs are indeed the same for both datasets).
However, that doesn't address the RTreeError. It's possible that you have missing lat/lon data in points_csv - in that case you would end up creating Point objects containing NaN values (i.e. Point(nan nan)), which go on to cause issues in rtree. I had a similar problem and the fix was just to filter out rows with missing coordinate data when loading in the CSV:
points=pd.read_csv(points_csv).dropna(subset=["Latitude", "Longitude"])
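If missing coordinates are the culprit, a minimal sketch of the fix looks like this (with made-up coordinates; note also that shapely's Point takes (x, y), i.e. (Longitude, Latitude), not the other way around as in the question):

```python
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-in for the CSV in the question: two valid rows
# plus one with a missing coordinate.
points = pd.DataFrame({
    "Latitude": [39.9, 31.2, None],
    "Longitude": [116.4, 121.5, 103.8],
})

# Drop rows with missing coordinates *before* building Point objects,
# so no Point(nan nan) geometries ever reach the rtree index.
points = points.dropna(subset=["Latitude", "Longitude"])

# shapely Points are (x, y) = (Longitude, Latitude).
points["geometry"] = points.apply(
    lambda z: Point(z.Longitude, z.Latitude), axis=1
)
print(len(points))  # 2
```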
I'll add an answer here since I was recently struggling with this and couldn't find a great answer here. The Geopandas documentation has a good example for how to solve the "CRS does not match" issue.
I copied the entire code chunk from the documentation below, but the most relevant line is this one, where the to_crs() method is used to reproject a geodataframe. You can call mygeodataframe.crs to find the CRS of each dataframe, and then to_crs() to reproject one to match the other, like so:
world = world.to_crs({'init': 'epsg:3395'})
Simply setting PointsGeodataframe.crs = PolygonsGeodataframe.crs will silence the warning, but will not actually reproject the geometry.
Full documentation code for reference:
# load example data
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Check original projection
# (it's Plate Carree! x-y are long and lat)
In [2]: world.crs
Out[2]: {'init': 'epsg:4326'}
# Visualize
In [3]: ax = world.plot()
In [4]: ax.set_title("WGS84 (lat/lon)");
# Reproject to Mercator (after dropping Antarctica)
In [5]: world = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")]
In [6]: world = world.to_crs({'init': 'epsg:3395'}) # world.to_crs(epsg=3395) would also work
In [7]: ax = world.plot()
In [8]: ax.set_title("Mercator");
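Applied to this question, the same idea looks like the sketch below (toy data; newer geopandas versions accept a plain "EPSG:xxxx" string where older ones used the {'init': ...} dict shown in the docs):

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy layers: a point layer in WGS84 lat/lon and a polygon layer
# in Web Mercator (EPSG:3857).
pts = gpd.GeoDataFrame(geometry=[Point(116.4, 39.9)], crs="EPSG:4326")
polys = gpd.GeoDataFrame(geometry=[Point(0, 0).buffer(1)], crs="EPSG:3857")

# Reproject the points into the polygons' CRS before the spatial join:
pts = pts.to_crs(polys.crs)
print(pts.crs == polys.crs)  # True
```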
I am trying to run some simple code that reads a CSV file and plots the data as a line graph. The query runs fine and gives me the output below, but for some reason the x-axis shows a very odd date format, which leads to a very odd line with several apparent outliers (not actually present in the data). Could someone help?
Date,Value
01/11/2020,4.5202
01/12/2020,4.6555
01/01/2021,4.7194
01/02/2021,4.7317
01/03/2021,4.6655
01/04/2021,4.4641
01/05/2021,4.3875
01/06/2021,4.3560
01/07/2021,4.3318
01/08/2021,4.3607
01/09/2021,4.4853
01/10/2021,4.6456
01/11/2021,5.2262
01/12/2021,5.3259
01/01/2022,5.3820
01/02/2022,5.3855
01/03/2022,5.2673
01/04/2022,4.9346
01/05/2022,4.7287
01/06/2022,4.6274
01/07/2022,4.6632
01/08/2022,4.6929
01/09/2022,4.7841
01/10/2022,4.9572
01/11/2022,5.4293
01/12/2022,5.5214
01/01/2023,5.5697
01/02/2023,5.5738
01/03/2023,5.4550
01/04/2023,5.1962
01/05/2023,4.9534
01/06/2023,4.8514
01/07/2023,4.8112
01/08/2023,4.8415
01/09/2023,4.9338
01/10/2023,5.1461
01/11/2023,5.6022
01/12/2023,5.6960
01/01/2024,5.7451
01/02/2024,5.7499
01/03/2024,5.6308
01/04/2024,5.2752
01/05/2024,5.0306
01/06/2024,4.9282
01/07/2024,4.8877
01/08/2024,4.9188
01/09/2024,5.0127
01/10/2024,5.2100
01/11/2024,5.6716
01/12/2024,5.7669
01/01/2025,5.8176
01/02/2025,5.8229
01/03/2025,5.7031
01/04/2025,5.2633
01/05/2025,5.0164
01/06/2025,4.9133
01/07/2025,4.8730
01/08/2025,4.9053
01/09/2025,5.0005
01/10/2025,5.3274
01/11/2025,5.6325
01/12/2025,5.7293
import pandas as pd
# Import figure from bokeh.plotting
from bokeh.plotting import figure, output_file, show

# Read in the CSV file: df
df = pd.read_csv('TTFcurve.csv', parse_dates=['Date'])

output_file("lines.html")

# Create the figure: p
p = figure(x_axis_label='Date', y_axis_label='Value')

# Plot Value against Date as a red line
p.line(df['Date'], df['Value'], line_color="red")

# Show the result
show(p)
You have to tell Bokeh that your X axis is datetime:
p = figure(..., x_axis_type='datetime')
Regarding the outliers - check the data. I'm almost certain that Bokeh cannot "invent" any new points here. If you make sure that your data is absolutely fine, please post it so the above plot could be reproduced and checked.
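A sketch of both fixes together (using an inline, hypothetical subset of the CSV; dayfirst=True is my assumption based on the dd/mm/yyyy dates shown, and mis-parsing them month-first would itself scramble the ordering into what looks like outliers):

```python
from io import StringIO

import pandas as pd
from bokeh.plotting import figure

# Hypothetical subset of the question's CSV. dayfirst=True because the
# dates are dd/mm/yyyy -- without it pandas reads 01/11/2020 as
# January 11th instead of November 1st.
csv = StringIO(
    "Date,Value\n"
    "01/11/2020,4.5202\n"
    "01/12/2020,4.6555\n"
    "01/01/2021,4.7194\n"
)
df = pd.read_csv(csv, parse_dates=["Date"], dayfirst=True)

# Tell Bokeh the x axis holds datetimes so it formats the ticks sensibly.
p = figure(x_axis_label="Date", y_axis_label="Value", x_axis_type="datetime")
p.line(df["Date"], df["Value"], line_color="red")

print(df["Date"].is_monotonic_increasing)  # True
```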
I have the following problem: I have a list of shapely points and a list of shapely polygons.
Now I want to check in which polygon a given point is.
At the moment I am using the following code, which seems not very clever:
# polygons_df is a pandas dataframe that contains the geometry of the polygons and the usage of the polygons (landuses in this case, e.g. residential)
# point_df is a pandas dataframe that contains the geometry of the points and the usage of the point (landuses in this case, e.g. residential)
# polylist is my list of shapely polygons
# pointlist is my list of shapely points
from shapely.geometry import Point, Polygon
import pandas as pd
import geopandas as gpd
i = 0
while i < len(polygons_df.index):
    j = 0
    while j < len(point_df.index):
        if polylist[i].contains(pointlist[j]):
            point_df.at[j, 'tags.landuse'] = polygons_df.iloc[i]['tags.landuse']
        j += 1
    i += 1
Can I somehow speed this up? I have more than 100,000 points and more than 10,000 polygons, and these loops take a while. Thanks!
I know a solution was found in the comments for the particular problem, but to answer the related question of how to check whether an array of points is inside a shapely Polygon, I found the following solution (assuming numpy is imported as np and Point/Polygon come from shapely.geometry):
>>> poly = Polygon([(0,0), (1,0), (0,1)])
>>> contains = np.vectorize(lambda p: poly.contains(Point(p)), signature='(n)->()')
>>> contains(np.array([[0.5,0.49],[0.5,0.51],[0.5,0.52]]))
array([ True, False, False])
I don't know that this necessarily speeds up the calculation, but at least you can avoid the explicit for-loop.
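Another option, since geopandas is already imported in the question, is a spatial join, which uses an R-tree spatial index under the hood instead of the nested loops. A minimal sketch with toy layers (column names are made up; newer geopandas takes predicate=, older versions op=):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy layers standing in for point_df / polygons_df.
points = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
)
polygons = gpd.GeoDataFrame(
    {"tags.landuse": ["residential"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
)

# One indexed spatial join instead of the 100,000 x 10,000 loop:
# points inside a polygon pick up its attributes, others get NaN.
joined = gpd.sjoin(points, polygons, how="left", predicate="within")
print(joined["tags.landuse"].tolist())
```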
When a time series data set already uses a native date/time dtype for its index, you seem to be able to plot the index cleanly, as here.
But my data files have date & time columns in their own string format, such as [2009-01-01T00:00]. Is there a way to convert these into an object the plot can read? Currently my plot looks like the following.
Code:
files = sorted(glob.glob("bsrn_txt_0100/*.txt"))
gen_raw = (pd.read_csv(file, sep='\t', encoding="utf-8") for file in files)
gen = pd.concat(gen_raw, ignore_index=True)
gen.drop(gen.columns[[1, 2]], axis=1, inplace=True)
#gen['Date/Time'] = gen['Date/Time'][11:]  # caused an error, didn't work

filtered = gen[gen['Date/Time'].str.endswith('00') | gen['Date/Time'].str.endswith('30')]
filtered['rad_tot'] = filtered['Direct radiation [W/m**2]'] + filtered['Diffuse radiation [W/m**2]']

lis = np.arange(35040)  # the number of rows, checked by printing; this is for 2009-2010
plt.xticks(lis, filtered['Date/Time'])
plt.plot(lis, filtered['rad_tot'], '.')
plt.title('test of generation 2009')
plt.xlabel('Date/Time')
plt.ylabel('radiation total [W/m**2]')
plt.show()
My other thought was to use Plotly, but its main purpose seems to be feeding data to plots on the web. It would be best if I were familiar with all the modules and could try for myself, but I am still learning pandas and matplotlib as I go.
So I would like to ask whether anyone has experienced similar issues.
I think you need to set the labels to not visible in a loop:
ax = df.plot(...)
spacing = 10
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)
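Alternatively, converting the 'Date/Time' strings to real datetimes lets matplotlib choose tick locations itself, which avoids setting 35040 manual xticks entirely. A small sketch with made-up values in the question's [2009-01-01T00:00] format:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in the same string format as the question's data.
df = pd.DataFrame({
    "Date/Time": ["2009-01-01T00:00", "2009-01-01T00:30", "2009-01-01T01:00"],
    "rad_tot": [0.0, 1.5, 3.0],
})

# Parse the strings into a datetime64 column; matplotlib then handles
# tick spacing and formatting on its own.
df["Date/Time"] = pd.to_datetime(df["Date/Time"])
plt.plot(df["Date/Time"], df["rad_tot"], ".")
print(df["Date/Time"].dt.minute.tolist())  # [0, 30, 0]
```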
I want to plot data from a global cube, but only for a list of countries. So I select a subcube according to the countries' "bounding box".
So far so good. What I'm looking for is an easy way to mask out all points of a cube which do not fall in any of my countries (which are represented as features), so that only those points of the cube which lie within any of my features are plotted.
Any idea is greatly appreciated =)
You can achieve this directly at the plotting stage rather than masking the cube within iris. I've approached this by setting the clip path of the artist returned by pcolor. The method is to create a list of geometries from features (in this case countries from Natural Earth, they could be from a shapefile) then transform these geometries into a matplotlib path which the image can be clipped to. I'll detail this method, and hopefully this will be enough to get you started:
I first defined a function to retrieve the Shapely geometries corresponding to given country names; the geometries come from the Natural Earth 110m administrative boundaries shapefile, accessed through the cartopy interface.
I then defined a second function which is a wrapper around the iris.plot.pcolor function which makes the plot and clips it to the given geometries.
Now all I need to do is set up the plot as normal, but use the plotting wrapper instead of directly calling the iris.plot.pcolor function.
Here is a complete example:
import cartopy.crs as ccrs
from cartopy.io.shapereader import natural_earth, Reader
from cartopy.mpl.patch import geos_to_path
import iris
import iris.plot as iplt
import matplotlib.pyplot as plt
from matplotlib.path import Path
def get_geometries(country_names):
    """
    Get an iterable of Shapely geometries corresponding to given countries.
    """
    # Using the Natural Earth feature interface provided by cartopy.
    # You could use a different source, all you need is the geometries.
    shape_records = Reader(natural_earth(resolution='110m',
                                         category='cultural',
                                         name='admin_0_countries')).records()
    geoms = []
    for country in shape_records:
        if country.attributes['name_long'] in country_names:
            try:
                geoms += country.geometry
            except TypeError:
                geoms.append(country.geometry)
    return geoms, ccrs.PlateCarree()._as_mpl_transform

def pcolor_mask_geoms(cube, geoms, transform):
    path = Path.make_compound_path(*geos_to_path(geoms))
    im = iplt.pcolor(cube)
    im.set_clip_path(path, transform=transform)
# First plot the full map:
cube = iris.load_cube(iris.sample_data_path('air_temp.pp'))
plt.figure(figsize=(12, 6))
ax1 = plt.axes(projection=ccrs.PlateCarree())
ax1.coastlines()
iplt.pcolor(cube)
# Now plot just the required countries:
plt.figure(figsize=(12, 6))
ax2 = plt.axes(projection=ccrs.PlateCarree())
ax2.coastlines()
countries = [
    'United States',
    'United Kingdom',
    'Saudi Arabia',
    'South Africa',
    'Nigeria']
geoms, transform = get_geometries(countries)
pcolor_mask_geoms(cube, geoms, transform(ax2))
plt.show()
The results of which look like this:
If you want to use iris.plot.pcolormesh instead, you will need to modify the plotting function a little. This is due to a workaround for a matplotlib issue that is currently included in cartopy. The modified version would look like this:
def pcolor_mask_geoms(cube, geoms, transform):
    path = Path.make_compound_path(*geos_to_path(geoms))
    im = iplt.pcolormesh(cube)
    im.set_clip_path(path, transform=transform)
    try:
        im._wrapped_collection_fix.set_clip_path(path, transform)
    except AttributeError:
        pass
I have been stuck on this question for a while now...
import numpy as np
import matplotlib.pyplot as plt
plt.ion()

fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.set_title("linear realtime")
line, = ax.plot([], [])

i = 0
while i < 1000:
    # EDIT:
    # this is just sample data, but I would eventually like to set data
    # where it can be floating-point numbers...
    line.set_data(i, i)
    fig.canvas.draw()
    i += 1
I am trying to draw a line in real time, but nothing is being drawn on the canvas; so far only an empty figure comes up. Thanks in advance.
EDIT:
Interesting... I can now plot the dots, but I cannot show the connectivity between the points. I also noticed that if you remove 'ko-' from the plot call, nothing appears at all. Does anybody know why?
import numpy as np
import pylab as p
import time

x = 0
y = 0

p.ion()
fig = p.figure(1)
ax = fig.add_subplot(111)
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
line, = ax.plot(x, y, 'ko-')

for i in range(10):
    x = i
    y = i
    line.set_data(x, y)
    p.draw()
Add a p.pause(.001) in your loop. You need to allow time for the GUI event loop to trigger and update the display.
This is related to issue #1646.
The other issue is that set_data replaces the plotted data with the x and y passed in; it does not append to the data that is already there. (To see this clearly, use p.pause(1).) When you remove 'ko-', the format defaults to no marker with a line connecting points; since you are plotting a single point each time, nothing shows up.
I think you intended to write this:
x = 0
y = 0

fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
line, = ax.plot(x, y, 'ko-')

for i in range(10):
    x = np.concatenate((line.get_xdata(), [i]))
    y = np.concatenate((line.get_ydata(), [i]))
    line.set_data(x, y)
    plt.pause(1)