I'm trying to load an Excel file and convert it to a GeoDataFrame:
import pandas as pd
import geopandas as gpd
df = pd.read_excel('Centroids.xlsx')
df.head()
servicename servicecentroid
0 Mönchengladbach, Kreisfreie Stadt POINT (4070115.425463234 3123463.773862813)
1 Mettmann, Kreis POINT (4109488.971501033 3131686.7549837814)
2 Düsseldorf, Kreisfreie Stadt POINT (4098292.026333667 3129901.416880203)
Then I try to convert it to a GeoDataFrame, but the following error occurs:
gdf = gpd.GeoDataFrame(df, geometry='servicecentroid')
TypeError: Input must be valid geometry objects: POINT (4070115.425463234 3123463.773862813)
Can you help me work out what is wrong with my data?
Thank you.
Are the values in servicecentroid actual Points? If you want to create a GeoDataFrame, you have to make sure you have a column 'geometry' with actual Point objects. For example:
from shapely.geometry import Point  # needed to build the Point objects
df = pd.DataFrame({'servicename':['Mönchengladbach, Kreisfreie Stadt', 'Mettmann, Kreis', 'Düsseldorf, Kreisfreie Stadt'], 'geometry':[Point(4070115.425463234, 3123463.773862813), Point(4109488.971501033, 3131686.7549837814), Point(4098292.026333667, 3129901.416880203)]})
gdf = gpd.GeoDataFrame(df)
print(gdf.dtypes)
This will output (notice the geometry dtype):
servicename object
geometry geometry
dtype: object
Note that there is a comma separating the Point values, so:
Point(4070115.425463234, 3123463.773862813)
... instead of:
Point(4070115.425463234 3123463.773862813)
Edit:
To make your life even easier, you can simply run the following code to transform the points in your original dataframe to actual Point objects. This will take the original values, split them, and re-build them as Points.
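import re
from shapely.geometry import Point

def my_func(x):
    # Take the text between the parentheses and split it into the x and y values
    l = re.search(r'\((.*?)\)', x).group(1).split(' ')
    return Point(float(l[0]), float(l[1]))

# The original dataframe stores the strings in 'servicecentroid'
df['geometry'] = df['servicecentroid'].transform(my_func)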
It appears that servicecentroid holds WKT strings rather than geometry objects.
The geometry argument of GeoDataFrame() accepts a column name or a list/array/Series, but either way the values must already be geometry objects.
Converting a Series of WKT strings to a Series of geometry objects is simple with shapely:
import pandas as pd
import io
import shapely.wkt
import geopandas as gpd
df = pd.read_csv(
    io.StringIO(
        """servicename  servicecentroid
0  Mönchengladbach, Kreisfreie Stadt  POINT (4070115.425463234 3123463.773862813)
1  Mettmann, Kreis  POINT (4109488.971501033 3131686.7549837814)
2  Düsseldorf, Kreisfreie Stadt  POINT (4098292.026333667 3129901.416880203)"""
    ),
    sep=r"\s\s+",  # columns are separated by two or more spaces
    engine="python",
)
# NB CRS is missing, looks like it is a UTM CRS....
gpd.GeoDataFrame(df, geometry=df["servicecentroid"].apply(shapely.wkt.loads))
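As a variation on the same idea, recent geopandas versions can parse the WKT strings themselves with GeoSeries.from_wkt, so shapely does not have to be called directly. The snippet below is a minimal sketch, assuming geopandas 0.9 or newer; the two-row dataframe is a stand-in for the Excel file, and no CRS is set because the question does not state one.

```python
import pandas as pd
import geopandas as gpd

# Stand-in for pd.read_excel('Centroids.xlsx'): geometries arrive as WKT strings
df = pd.DataFrame({
    "servicename": ["Mönchengladbach, Kreisfreie Stadt", "Mettmann, Kreis"],
    "servicecentroid": [
        "POINT (4070115.425463234 3123463.773862813)",
        "POINT (4109488.971501033 3131686.7549837814)",
    ],
})

# Parse the WKT strings into geometry objects (CRS deliberately left unset)
geometry = gpd.GeoSeries.from_wkt(df["servicecentroid"])

# Pass the parsed geometries, not the column of strings
gdf = gpd.GeoDataFrame(df, geometry=geometry)
print(gdf.dtypes)
```

The original servicecentroid column stays in the frame as plain text; drop it afterwards if it is no longer needed.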
Related
I have tried to plot polygons on a map with Geopandas and Folium using the official Geopandas tutorial and this dataset. I followed the tutorial as literally as I could, but Folium still doesn't draw the polygons. The Matplotlib plot works and I can create a Folium map too. Code:
import pandas as pd
import geopandas as gdp
import folium
import matplotlib.pyplot as plt
df = pd.read_csv('https://geo.stat.fi/geoserver/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=postialue:pno_tilasto&outputFormat=csv')
df.to_csv('coordinates.csv')
#limit to Helsinki and drop unnecessary columns
df['population_2019'] = df['he_vakiy']
df['zipcode'] = df['postinumeroalue'].astype(int)
df['population_2019'] = df['population_2019'].astype(int)
df = df[df['zipcode'] < 1000]
df = df[['zipcode', 'nimi', 'geom', 'population_2019']]
df.to_csv('coordinates_hki.csv')
df.head()
#this is from there: https://gis.stackexchange.com/questions/387225/set-geometry-in-geodataframe-to-another-column-fails-typeerror-input-must-be
from shapely.wkt import loads
df = gdp.read_file('coordinates_hki.csv')
df.geometry = df['geom'].apply(loads)
df.plot(figsize=(6, 6))
plt.show()
df = df.set_crs(epsg=4326)
print(df.crs)
df.plot(figsize=(6, 6))
plt.show()
m = folium.Map(location=[60.1674881,24.9427473], zoom_start=10, tiles='CartoDB positron')
m
for _, r in df.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gdp.GeoSeries(r['geometry']).simplify(tolerance=0.00001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['nimi']).add_to(geo_j)
    geo_j.add_to(folium.Popup(r['nimi']))
m
The trick here is to realize that your data is not in units of degrees. You can determine this by looking at the centroid of your polygons:
>>> print(df.geometry.centroid)
0 POINT (381147.564 6673464.230)
1 POINT (381878.124 6676471.194)
2 POINT (381245.290 6677483.758)
3 POINT (381050.952 6678206.603)
4 POINT (382129.741 6677505.464)
...
79 POINT (397465.125 6676003.926)
80 POINT (393716.203 6675794.166)
81 POINT (393436.954 6679515.888)
82 POINT (395196.736 6677776.331)
83 POINT (398338.591 6675428.040)
Length: 84, dtype: geometry
These values are far outside the valid range for coordinates in degrees, which is -180 to 180 for longitude and -90 to 90 for latitude. The next step is to figure out what CRS the data is actually in. If you take your dataset URL and strip off the &outputFormat=csv part, you get this URL:
https://geo.stat.fi/geoserver/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=postialue:pno_tilasto
Search for CRS in that document, and you'll find this:
<gml:Envelope srsName="urn:ogc:def:crs:EPSG::3067" srsDimension="2">
So it turns out your data is in EPSG:3067 (ETRS89 / TM35FIN), a projected coordinate system used for Finland, with coordinates in meters.
You need to tell geopandas about this, and convert into WGS84 (the most common coordinate system) to make it compatible with folium.
df.geometry = df['geom'].apply(loads)
df = df.set_crs('EPSG:3067')
df = df.to_crs('WGS84')
The function set_crs() changes the coordinate system that GeoPandas expects the data to be in, but does not change any of the coordinates. The function to_crs() takes the points in the dataset and re-projects them into a new coordinate system. The effect of these two calls is to convert from EPSG:3067 to WGS84.
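The difference between the two calls can be checked on a single point. The snippet below is a small sketch rather than part of the fix itself: it takes the first centroid printed above, and uses 'EPSG:4326' as the identifier for WGS84.

```python
import geopandas as gpd
from shapely.geometry import Point

# First centroid from the output above; the numbers are meters in EPSG:3067
points = gpd.GeoSeries([Point(381147.564, 6673464.230)])

# set_crs() only attaches a CRS label, the coordinates stay exactly the same
labelled = points.set_crs("EPSG:3067")

# to_crs() re-projects, so the coordinates become longitude/latitude in degrees
reprojected = labelled.to_crs("EPSG:4326")

print(labelled.iloc[0])
print(reprojected.iloc[0])
```

After to_crs() the values fall inside the usual longitude/latitude ranges, which is what folium expects.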
With the set_crs() and to_crs() calls added to the original code, I get the following result:
I want to convert multidimensional climate data into a pandas data frame. My numpy array has temperature.shape -> (365,100,200) -> ["time", "longitude", "latitude"]. I would like to have the following columns in my pandas dataframe: columns=["time", "lon", "lat", "temp"].
I tried this code:
df = pd.DataFrame(temperature, columns=['time', 'lat', 'lon', 'temp'])
I got this error:
ValueError: Must pass 2-d input
How can I solve it? I could not find any hint in suggested topics. Thanks.
Pandas expects a 2D array whose rows and columns correspond to those of the final data frame.
It looks like you're trying to unravel the (365,100,200) array into 365*100*200=7,300,000 individual records. This can be done by flattening the array, provided you have the values of each independent quantity along each axis.
For example, unravelling a (3,4,5) shaped 3D array with X, Y and Z dimensions given by the lists/arrays x_index, y_index, z_index, rather than time, longitude, latitude and M replacing temperature:
import numpy as np
import pandas as pd
nx = 3
ny = 4
nz = 5
M = np.empty((nx, ny, nz))
for i in range(nx):
    for j in range(ny):
        for k in range(nz):
            M[i,j,k] = (i+j)*k
# constructed nx by ny by nz matrix from function f(x,y,z) = (x+y)*z
x_index = list(range(nx))
y_index = list(range(ny))
z_index = list(range(nz))
# Get arrays/list giving the values of x/y/z
X, Y, Z = np.meshgrid(x_index, y_index, z_index, indexing="ij")
# Make (3,4,5) arrays of each independent variable; indexing="ij" keeps the same axis order as M
pd.DataFrame({"M=(X+Y)*Z":M.flatten(), "X":X.flatten(), "Y":Y.flatten(), "Z":Z.flatten()})
# Flatten the data and independent variables to make 3*4*5=60 individual records
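Applied to the temperature case from the question, the same approach might look like the sketch below. The array sizes are shrunk so the result is easy to inspect, and the coordinate values for time, lon and lat are made up for illustration; in practice they would come from the climate dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in for the (365, 100, 200) array, axes = (time, lon, lat)
n_time, n_lon, n_lat = 4, 3, 2
temperature = np.arange(n_time * n_lon * n_lat, dtype=float).reshape(n_time, n_lon, n_lat)

# Assumed coordinate values along each axis
time = np.arange(n_time)
lon = np.linspace(5.0, 7.0, n_lon)
lat = np.linspace(50.0, 51.0, n_lat)

# indexing="ij" makes the grids follow the same (time, lon, lat) axis order as the data
T, LON, LAT = np.meshgrid(time, lon, lat, indexing="ij")

# One row per (time, lon, lat) combination
df = pd.DataFrame({
    "time": T.ravel(),
    "lon": LON.ravel(),
    "lat": LAT.ravel(),
    "temp": temperature.ravel(),
})
print(df.head())
```

With the real array the frame has 365*100*200 rows, so memory use is worth keeping in mind.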
I am a beginner in Python and I am stuck with data that is an array of 32763 numbers separated by commas. Please find the data here.
I want to convert this into two columns, the first from (0:16382) and the second from (2:32763). In the end I want to plot column 1 on the x axis and column 2 on the y axis. I tried the following code, but I am not able to extract the columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.genfromtxt('oscilloscope.txt',delimiter=',')
df = pd.DataFrame(data.flatten())
print(df)
Then I want to write the data to a file, say data1, in the format shown in the attached picture.
It is hard to answer without seeing the format of your data, but you can try
data = np.genfromtxt('oscilloscope.txt',delimiter=',')
print(data.shape) # here we check we got something useful
# this should split data into x,y at position 16381
x = data[:16381]
y = data[16381:]
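# now you can create a dataframe and print to file
# 32763 values do not split evenly; pd.Series pads the shorter column with NaN
df = pd.DataFrame({'x': pd.Series(x), 'y': pd.Series(y)})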
df.to_csv('data1.csv', index=False)
Try this.
# Input: a dataframe df and a chunk_size of your choice; output: a list of chunks.
def split_dataframe(df, chunk_size=16382):
    chunks = list()
    num_chunks = len(df) // chunk_size + 1
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
or
np.array_split
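For the np.array_split route, a minimal sketch could look like this. The array below is a made-up stand-in for the 32763 values read from oscilloscope.txt; np.array_split allows an uneven split, so the odd length is not a problem.

```python
import numpy as np

# Hypothetical stand-in for the values loaded with np.genfromtxt
data = np.arange(32763, dtype=float)

# Split into two parts; when the length is odd the first part gets the extra element
first, second = np.array_split(data, 2)

print(len(first), len(second))
```

Because the two parts differ in length by one, they still need padding or trimming before they can sit side by side as dataframe columns.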
I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
>>> 59.6
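np.interp does piecewise linear interpolation and expects the x values to be increasing. The sketch below wraps the call in a small helper (interp_rff is a made-up name, not a pandas or numpy function) that sorts by Angle first, and compares the result with the straight-line formula between the two neighbouring rows.

```python
import numpy as np
import pandas as pd

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]  # sample data from the question
s = pd.DataFrame(data, columns=["Angle", "rff"])

def interp_rff(angle, frame):
    """Hypothetical helper: linearly interpolate 'rff' at the given angle."""
    ordered = frame.sort_values("Angle")  # np.interp needs increasing x values
    return float(np.interp(angle, ordered["Angle"], ordered["rff"]))

value = interp_rff(3.4, s)

# Same result by hand, using the rows at Angle 3 and Angle 4
x0, y0, x1, y1 = 3.0, 58.0, 4.0, 62.0
manual = y0 + (3.4 - x0) / (x1 - x0) * (y1 - y0)

print(value, manual)
```

Angles outside the range of the table are clamped by np.interp to the first or last rff value rather than extrapolated.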
I'd like to plot products, ratios, etc of columns in a Pandas Data Frame without first creating a new column containing that product, ratio, etc. E.g.,
[df['A']/df['A']].plot()
doesn't work. For the following code:
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
[df['A']/df['B']].plot()
I get the following error message: "AttributeError: 'list' object has no attribute 'plot' "
The square brackets in this line:
[df['A']/df['B']].plot()
wrap the result of the division (a pandas Series) in a Python list, and a list has no plot() method.
If you want to plot a particular column first without adding it to the dataframe, you can try this:
import pandas as pd
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
df['A'].div(df['B']).plot()
which returns a <matplotlib.axes._subplots.AxesSubplot> object
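Ordinary parentheses work as well, since arithmetic between two columns already produces a Series. The sketch below assumes matplotlib is installed; it selects the non-interactive Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(x, columns=["A", "B", "C"])

# Parentheses group the expression without turning it into a list
ratio = df["A"] / df["B"]   # a pandas Series; df itself gets no new column
ax = ratio.plot()

# A product can be drawn on the same axes in the same way
(df["A"] * df["C"]).plot(ax=ax)
```

Passing ax=ax puts both curves on one set of axes; leave it out to get a separate plot for each expression.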