How to convert the results of a for loop into pandas data frame? - pandas

Using the Haversine formula for distance calculation on a great circle, I use the following code to calculate the coordinates of any point between a known start location (with lat1/lon1) and a known destination (with lat2/lon2):
Here's the complete code:
from math import radians, sin, cos, acos, atan2, sqrt, pi
#enter the following numbers in the corresponding input fields:
#lat1 = starting latitude = 33.95
#lon1 = starting longitude = -118.40
#lat2 = destination latitude = 40.6333
#lon2= destination longitude = -73.7833
lat1 = radians(float(input("Starting latitude: ")))
lon1 = radians(float(input("Starting longitude: ")))
lat2 = radians(float(input("Destination latitude: ")))
lon2 = radians(float(input("Destination longitude: ")))
#Haversine formula to calculate the distance, in radians, between starting point and destination:
d = ((6371.01 * acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(lon1 - lon2)))/1.852)/(180*60/pi)
import numpy as np
x = np.arange(0, 1, 0.2)
for f in x:
    A=sin((1-f)*d)/sin(d)
    B=sin(f*d)/sin(d)
    x = A*cos(lat1)*cos(lon1) + B*cos(lat2)*cos(lon2)
    y = A*cos(lat1)*sin(lon1) + B*cos(lat2)*sin(lon2)
    z = A*sin(lat1) + B*sin(lat2)
    lat_rad=atan2(z,sqrt(x**2+y**2))
    lon_rad=atan2(y,x)
    lat_deg = lat_rad*180/pi
    lon_deg = lon_rad*180/pi
    print('%.2f' %f, '%.4f' %lat_deg, '%.4f' %lon_deg)
I use the np.arange() function to do a fractional iteration, f, between 0 (the starting point) and 1 (the destination).
The output of the for loop is:
0.00 33.9500 -118.4000
0.20 36.6040 -110.2685
0.40 38.6695 -101.6259
0.60 40.0658 -92.5570
0.80 40.7311 -83.2103
Where, the first number is the fraction (f); the second number is the latitude (lat_deg) and the third number is the longitude (lon_deg).
My question is: how do I convert the output of my code into a pandas (3x6) data frame with the data arranged in 3 columns with header Fraction (col1), Latitude (col2), Longitude (col3)?
Once the output is in a pandas data frame I can then easily write the data into a CSV file.

You're almost there. With the following modifications, you will be able to get your CSV:
Append your values to a list instead of printing them.
Convert the result to a dataframe
Below is your code with the required updates. I have now tested this and it works all the way to the final CSV.
import numpy as np
import pandas as pd
from math import radians, sin, cos, acos, atan2, sqrt, pi
# Numbers per your instructions
lat1 = radians(float(33.95))
lon1 = radians(float(-118.40))
lat2 = radians(float(40.6333))
lon2 = radians(float(-73.7833))
# Great-circle distance, in radians, between starting point and destination
# (note: this expression is the spherical law of cosines, not the haversine form):
d = ((6371.01 * acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(lon1 - lon2)))/1.852)/(180*60/pi)
# Named "fractions" so the loop body's x coordinate does not shadow it
fractions = np.arange(0, 1, 0.2)
# An empty list into which we'll append each list of values
res = []
for f in fractions:
    A=sin((1-f)*d)/sin(d)
    B=sin(f*d)/sin(d)
    x = A*cos(lat1)*cos(lon1) + B*cos(lat2)*cos(lon2)
    y = A*cos(lat1)*sin(lon1) + B*cos(lat2)*sin(lon2)
    z = A*sin(lat1) + B*sin(lat2)
    lat_rad=atan2(z,sqrt(x**2+y**2))
    lon_rad=atan2(y,x)
    lat_deg = lat_rad*180/pi
    lon_deg = lon_rad*180/pi
    # Add the desired values, creating a list of lists
    res.append([f, lat_deg, lon_deg])
# Convert the result to a dataframe
res_df= pd.DataFrame(res, columns=['Fraction', 'Latitude', 'Longitude'])
# Voila! You can now save to CSV
res_df.to_csv('coordinates.csv', index=False)
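As an alternative to appending inside a loop, the same calculation can be sketched with NumPy arrays so that every fraction is handled at once and the columns go straight into a DataFrame. This is only a sketch: the function name `intermediate_points` and its parameters are made up for illustration, and the distance expression is copied unchanged from the code above.

```python
import numpy as np
import pandas as pd


def intermediate_points(lat1_deg, lon1_deg, lat2_deg, lon2_deg, fractions):
    """Return a DataFrame of points along the great circle at the given fractions."""
    lat1, lon1, lat2, lon2 = np.radians([lat1_deg, lon1_deg, lat2_deg, lon2_deg])
    f = np.asarray(fractions, dtype=float)

    # Same distance expression as in the code above.
    central = np.arccos(np.sin(lat1) * np.sin(lat2)
                        + np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2))
    d = ((6371.01 * central) / 1.852) / (180 * 60 / np.pi)

    # Weights of the start and destination points for each fraction.
    a = np.sin((1 - f) * d) / np.sin(d)
    b = np.sin(f * d) / np.sin(d)

    x = a * np.cos(lat1) * np.cos(lon1) + b * np.cos(lat2) * np.cos(lon2)
    y = a * np.cos(lat1) * np.sin(lon1) + b * np.cos(lat2) * np.sin(lon2)
    z = a * np.sin(lat1) + b * np.sin(lat2)

    return pd.DataFrame({
        "Fraction": f,
        "Latitude": np.degrees(np.arctan2(z, np.hypot(x, y))),
        "Longitude": np.degrees(np.arctan2(y, x)),
    })


points = intermediate_points(33.95, -118.40, 40.6333, -73.7833, np.arange(0, 1, 0.2))
print(points.round(4))
```

Calling `points.to_csv('coordinates.csv', index=False)` afterwards would write the same three columns to a CSV file.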

Related

Csv file search speedup

I need to build a relief (elevation) profile graph from coordinates. I have a CSV file with 12,000,000 lines, and looking up a single height in it takes about 2-2.5 seconds. I converted the CSV to Parquet, which saved some time: finding one height now takes about 1-1.7 seconds. However, I need to build a profile from 500-2000 values, which makes the total time very long. In the future the CSV dataset may grow, which will slow this process down even more. So my question is: is it possible to reduce the time needed to look up these values?
Code example:
import dask.dataframe as dk
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
filename = 'n46_e032_1arc_v3.csv'
df = dk.read_csv(filename)
df.to_parquet('n46_e032_1arc_v3_parquet')
Latitude1y, Longitude1x = 46.6276, 32.5942
Latitude2y, Longitude2x = 46.6451, 32.6781
sec, steps, k = 0.00027778, 1, 11.73
Latitude, Longitude = [Latitude1y], [Longitude1x]
sin, cos = Latitude2y - Latitude1y, Longitude2x - Longitude1x
y, x = Latitude1y, Longitude1x
while Latitude[-1] < Latitude2y and Longitude[-1] < Longitude2x:
    y, x, steps = y + sec * k * sin, x + sec * k * cos, steps + 1
    Latitude.append(y)
    Longitude.append(x)
time_start = time.time()
long, elevation_data = [], []
df2 = dk.read_parquet('n46_e032_1arc_v3_parquet')
# Latitude and Longitude each hold exactly `steps` points
for i in range(steps):
    elevation_line = df2[(Longitude[i] <= df2['x']) & (df2['x'] <= Longitude[i] + sec) &
                         (Latitude[i] <= df2['y']) & (df2['y'] <= Latitude[i] + sec)].compute()
    elevation = np.asarray(elevation_line.z.tolist())
    if elevation[-1] < 0:
        elevation_data.append(0)
    else:
        elevation_data.append(elevation[-1])
    long.append(30 * i)
plt.bar(long, elevation_data, width = 30)
plt.show()
print(time.time() - time_start)
Here's one way to solve this problem using KD trees. A KD tree is a data structure for doing fast nearest-neighbor searches.
import scipy.spatial
# df is assumed to be an in-memory pandas DataFrame here
tree = scipy.spatial.KDTree(df[['x', 'y']].values)
elevations = df['z'].values
long, elevation_data = [], []
for i in range(steps):
    lon, lat = Longitude[i], Latitude[i]
    dist, idx = tree.query([lon, lat])
    elevation = elevations[idx]
    if elevation < 0:
        elevation = 0
    elevation_data.append(elevation)
    long.append(30 * i)
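The loop above queries the tree one point at a time. `KDTree.query` also accepts a whole array of points, so all profile samples can be looked up in a single call. The sketch below uses a small synthetic grid with made-up elevation values in place of the real file, and the variable names (`points`, `profile`, `n_samples`) are illustrative only.

```python
import numpy as np
from scipy.spatial import KDTree

# Synthetic stand-in for the elevation table: a small regular grid of (x, y, z).
xs = np.linspace(32.0, 33.0, 101)
ys = np.linspace(46.0, 47.0, 101)
grid_x, grid_y = np.meshgrid(xs, ys)
points = np.column_stack([grid_x.ravel(), grid_y.ravel()])
elevations = (grid_x + grid_y).ravel()  # made-up elevation values

# Build the tree once; it can be reused for every profile.
tree = KDTree(points)

# Sample points along the profile line from the question.
n_samples = 500
profile = np.column_stack([
    np.linspace(32.5942, 32.6781, n_samples),  # longitude (x)
    np.linspace(46.6276, 46.6451, n_samples),  # latitude (y)
])

# One call returns the nearest neighbour index for every sample.
_, idx = tree.query(profile)
profile_elevation = np.clip(elevations[idx], 0, None)  # negative heights become 0
```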
Note: if you can make assumptions about the data, like "all of the points in the CSV are equally spaced," faster algorithms are possible.
It looks like your data might be on a regular grid. If (and only if) every combination of x and y exist in your data, then it probably makes sense to turn this into a labeled 2D array of points, after which querying the correct position will be very fast.
For this, I'll use xarray, which is essentially pandas for N-dimensional data, and integrates well with dask:
import dask.dataframe as dk
import xarray as xr
# bring the dataframe into memory
df = dk.read_parquet('n46_e032_1arc_v3_parquet').compute()
da = df.set_index(["y", "x"]).z.to_xarray()
# now you can query the nearest points:
desired_lats = xr.DataArray([46.6276, 46.6451], dims=["point"])
desired_lons = xr.DataArray([32.5942, 32.6781], dims=["point"])
subset = da.sel(y=desired_lats, x=desired_lons, method="nearest")
# if you'd like, you can return to pandas:
subset_s = subset.to_series()
# you could do this only once, and save the reshaped array as a zarr store:
ds = da.to_dataset(name="elevation")
ds.to_zarr("n46_e032_1arc_v3.zarr")
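Under the same regular-grid assumption, the lookup can also be sketched with plain NumPy: when the origin and spacing of the grid are known, the row and column of the nearest cell follow from simple arithmetic, with no search at all. Everything below is illustrative. The grid origin, the spacing, the synthetic elevation values, and the helper name `lookup` are assumptions, not properties of the real file.

```python
import numpy as np

# Synthetic regular grid standing in for the elevation table (values are made up).
x0, y0, step = 32.0, 46.0, 0.01
nx, ny = 101, 101
xs = x0 + step * np.arange(nx)
ys = y0 + step * np.arange(ny)
elevation_grid = ys[:, None] * 100.0 + xs[None, :]  # shape (ny, nx)


def lookup(lons, lats):
    """Nearest-grid-cell elevation for arrays of longitudes and latitudes."""
    cols = np.rint((np.asarray(lons, dtype=float) - x0) / step).astype(int)
    rows = np.rint((np.asarray(lats, dtype=float) - y0) / step).astype(int)
    # Keep indices inside the grid for points on or just beyond the edge.
    cols = np.clip(cols, 0, nx - 1)
    rows = np.clip(rows, 0, ny - 1)
    return elevation_grid[rows, cols]


lons = np.linspace(32.5942, 32.6781, 1000)
lats = np.linspace(46.6276, 46.6451, 1000)
profile = lookup(lons, lats)
```

The cost of each lookup is constant, so the profile length matters far more than the size of the grid.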

Adding new column with values from two other columns - added conditionally

I have a data frame like this (short sample of the data):
import pandas as pd
df = pd.DataFrame({'longitude':(-122.05, -118.30, -117.81), 'latitude':(37.37, 34.26, 33.78)})
I need to add one more column "coordinates" where cell value is equal to:
[lon]-122.05[lon] \n [lat] 37.37 [lat]
if there is longitude and latitude (sometimes there are None or "empty" values)
[lon]-122.05[lon]
if there is no latitude value
[B] No coordinates [B]
if there are no longitude and latitude values.
All new cells must be strings.
My code is here:
def prepare_coords(df):
    def custom_edit(long, lat):
        if not long.empty:
            long = "<lon>"+str(long.astype(str))+"</lon>"
        if not lat.empty:
            lat = str(lat.astype(str))
            if lat.endswith("\n"):
                lat.rstrip()
            lat = "<lat>"+lat+"</lat>"
        if len(long) > 1 and len(lat) > 1: # Both: lon and lat
            return long + "\n" + lat
        elif len(long) > 1: # Only longitude
            return long
        else:
            return np.nan # No longitude
    df["coordinates"] = ""
    df["coordinates"] = df["coordinates"].apply(custom_edit(df["longitude"], df["latitude"])).astype(str)
    return df
df = prepare_coords(df)
But it gives me an AttributeError and an "is not a valid function for 'Series' object" error.
How can I fix it?
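One possible fix, as a sketch. In the code above, `custom_edit(...)` is called immediately with two whole Series, so `apply` receives its return value rather than a function, and `.empty` tests the entire column instead of a single cell. Formatting one row at a time avoids both problems. The helper name `format_coords` is made up, the bracketed tags follow the description in the question, and treating a missing longitude as "no coordinates" is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "longitude": (-122.05, -118.30, None),
    "latitude": (37.37, None, None),
})


def format_coords(lon, lat):
    """Build the coordinates string for one row; missing values are None/NaN."""
    if pd.isna(lon):
        # Assumption: without a longitude the row counts as "no coordinates".
        return "[B] No coordinates [B]"
    text = f"[lon]{float(lon)}[lon]"
    if pd.notna(lat):
        text += f"\n[lat]{float(lat)}[lat]"
    return text


# Feed the helper one (longitude, latitude) pair per row.
df["coordinates"] = [
    format_coords(lon, lat) for lon, lat in zip(df["longitude"], df["latitude"])
]
print(df["coordinates"].tolist())
```

Every cell in the new column is a plain string, which matches the requirement that all new cells must be strings.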

input must be an array, list, tuple or scalar pyproj

I have a DataFrame in which I am trying to convert the eastings/northings to longitudes/latitudes. My DataFrame looks like this:
import pandas as pd
import numpy as np
import pyproj
Postcode Eastings Northings
0 AB101AB 394235 806529
1 AB101AF 394181 806429
2 AB101AG 394230 806469
3 AB101AH 394371 806359
4 AB101AL 394296 806581
I am using a well-known code block to convert the eastings and northings to longitudes/latitudes and add them as new columns to the DataFrame:
def proj_transform(df):
    bng = pyproj.Proj("+init=EPSG:27700")
    wgs84 = pyproj.Proj("+init=EPSG:4326")
    lats = pd.Series()
    lons = pd.Series()
    for idx, val in enumerate(df['Eastings']):
        lon, lat = pyproj.transform(bng, wgs84, df['Eastings'][idx], df['Northings'][idx])
        lats.set_value(idx, lat)
        lons.set_value(idx, lon)
    df['lat'] = lats
    df['lon'] = lons
    return df
df_transform = proj_transform(my_df)
However, I keep getting the following error, "input must be an array, list, tuple or scalar". Does anyone have any insight into where I am going wrong here?
This is the fastest method:
https://gis.stackexchange.com/a/334307/144357
from pyproj import Transformer
trans = Transformer.from_crs(
    "EPSG:27700",
    "EPSG:4326",
    always_xy=True,
)
xx, yy = trans.transform(my_df["Eastings"].values, my_df["Northings"].values)
my_df["X"] = xx
my_df["Y"] = yy
Also helpful for reference:
https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1
https://pyproj4.github.io/pyproj/stable/gotchas.html#init-auth-auth-code-should-be-replaced-with-auth-auth-code
https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
You can use DataFrame.apply with axis=1 and change function like:
def proj_transform(x):
    e = x['Eastings']
    n = x['Northings']
    bng = pyproj.Proj("+init=EPSG:27700")
    wgs84 = pyproj.Proj("+init=EPSG:4326")
    lon, lat = pyproj.transform(bng, wgs84, e, n)
    # Return the values in the same order as the target columns below
    return pd.Series([lat, lon])
my_df[['lat','lon']] = my_df.apply(proj_transform, axis=1)

Preparing data to plot contours in Matplotlib's Basemap

I'm having a hard time with plotting a basemap with Matplotlib and I'm fairly new to it so I was hoping for some help.
I have data of the format:
[ (lat1, lon1, data1),
(lat2, lon2, data2),
(lat3, lon3, data3),
...
(latN, lonN, dataN) ]
And here is some sample data:
(32.0, -128.5, 3.99)
(31.0, -128.0, 3.5027272727272734)
(31.5, -128.0, 3.7383333333333333)
(32.0, -128.0, 3.624)
(32.5, -128.0, 3.913157894736842)
(33.0, -128.0, 4.443333333333334)
Finally, here are some basic statistics about my data that I'm planning to plot:
LAT MIN: 22
LAT MAX: 50
LAT LEN: 1919
LON MIN: -128
LON MAX: -97
LON LEN: 1919
DATA MIN: 0
DATA MAX: 12
DATA LEN: 1919
I need to contour plot on a basemap of the continental United States. I can't, for the life of me, seem to figure out how to setup the data for plotting.
I read that the X-axis (LATS) needs to be an np.array, the Y-axis (LONS) needs to be an np.array, and Z (DATA) needs to be an MxN matrix where M = len(LATS) and N = len(LONS). So I see Z as a diagonal matrix whose diagonal holds the values in DATA at the indices corresponding to LATS and LONS.
Here is my code:
def show_map(self, a):
    a = sorted(a, key = lambda entry: entry[0]) # sort by latitude
    a = sorted(a, key = lambda entry: entry[1]) # then sort by longitude
    lats = [ x[0] for x in a ]
    lons = [ x[1] for x in a ]
    data = [ x[2] for x in a ]
    lat_min = min(lats)
    lat_max = max(lats)
    lon_min = min(lons)
    lon_max = max(lons)
    data_min = min(data)
    data_max = max(data)
    x = np.array(lats)
    y = np.array(lons)
    z = np.diag(data)
    m = Basemap(
        projection = 'merc',
        llcrnrlat=lat_min, urcrnrlat=lat_max,
        llcrnrlon=lon_min, urcrnrlon=lon_max,
        rsphere=6371200., resolution='l', area_thresh=10000,
        lat_ts = 20
    )
    fig = plt.figure()
    plt.subplot(211)
    ax = plt.gca()
    # draw parallels
    delat = 10.0
    parallels = np.arange(0., 90, delat)
    m.drawparallels(parallels, labels=[1,0,0,0], fontsize=10)
    # draw meridians
    delon = 10.
    meridians = np.arange(180.,360.,delon)
    m.drawmeridians(meridians,labels=[0,0,0,1],fontsize=10)
    # draw map features
    m.drawcoastlines(linewidth = 0.50)
    m.drawcountries(linewidth = 0.50)
    m.drawstates(linewidth = 0.25)
    ny = z.shape[0]; nx = z.shape[1] # make grid
    lo, la = m.makegrid(nx, ny)
    X, Y = m(lo, la)
    clevs = [0,1,2.5,5,7.5,10,15,20,30,40,50,70,100,150,200,250,300,400,500,600,750]
    cs = m.contour(X, Y, z, clevs)
    plt.show()
The plot I get, however, is this: http://imgur.com/li1Wg. I need something to this effect: http://matplotlib.org/basemap/_images/plotprecip.png
Can someone point out what I'm doing wrong and help me plot this? Thank You.
I figured out how to do it. This is the code that I finally wrote, and I think this can help other users. If there is a better way of doing this, please state it, since I'm new to Matplotlib.
https://gist.github.com/3789221
Your linked gist is a solution but still wrong in another place.
In your question and in your linked gist you switched x and y coordinates with lon and lat.
x represents lon
y represents lat
Therefore you still get wrong results with your linked gist.
Why are you writing:
z = np.diag(data)
From the documentation, numpy.diag(v, k=0) extracts a diagonal or constructs a diagonal array.
That is likely why you only get a "diagonal area" of values...
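Instead of np.diag, scattered (lat, lon, value) samples are usually interpolated onto a regular 2D grid before contouring. The sketch below is one way to do that with `scipy.interpolate.griddata`. It uses made-up sample data over the ranges given in the question, and the grid sizes (60 by 50) are arbitrary choices, so treat it as an illustration rather than the asker's final solution.

```python
import numpy as np
from scipy.interpolate import griddata

# Made-up scattered samples in the (lat, lon, value) layout from the question.
rng = np.random.default_rng(0)
lats = rng.uniform(22.0, 50.0, 500)
lons = rng.uniform(-128.0, -97.0, 500)
values = 0.1 * lats + 0.05 * lons + 10.0  # smooth synthetic field

# Regular grid: x is longitude, y is latitude.
grid_lon = np.linspace(lons.min(), lons.max(), 60)
grid_lat = np.linspace(lats.min(), lats.max(), 50)
lon2d, lat2d = np.meshgrid(grid_lon, grid_lat)

# Interpolate onto the grid; cells outside the convex hull of the samples become NaN.
z = griddata((lons, lats), values, (lon2d, lat2d), method="linear")

print(z.shape)  # rows follow latitude, columns follow longitude
```

With a Basemap instance `m`, the grid would then be projected with `X, Y = m(lon2d, lat2d)` (longitude first) and drawn with `m.contour(X, Y, z, clevs)`.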

N-D interpolation for equally-spaced data

I'm trying to copy the Scipy Cookbook function:
from scipy import ogrid, sin, mgrid, ndimage, array
x,y = ogrid[-1:1:5j,-1:1:5j]
fvals = sin(x)*sin(y)
newx,newy = mgrid[-1:1:100j,-1:1:100j]
x0 = x[0,0]
y0 = y[0,0]
dx = x[1,0] - x0
dy = y[0,1] - y0
ivals = (newx - x0)/dx
jvals = (newy - y0)/dy
coords = array([ivals, jvals])
newf = ndimage.map_coordinates(fvals, coords)
by using my own function that has to work for many scenarios
import scipy
import numpy as np
"""N-D interpolation for equally-spaced data"""
x = np.c_[plist['modx']]
y = np.transpose(np.c_[plist['mody']])
pdb.set_trace()
#newx,newy = np.meshgrid(plist['newx'],plist['newy'])
newx,newy = scipy.mgrid[plist['modx'][0]:plist['modx'][-1]:-plist['remapto'],
                        plist['mody'][0]:plist['mody'][-1]:-plist['remapto']]
x0 = x[0,0]
y0 = y[0,0]
dx = x[1,0] - x0
dy = y[0,1] - y0
ivals = (newx - x0)/dx
jvals = (newy - y0)/dy
coords = scipy.array([ivals, jvals])
for i in np.arange(ivals.shape[0]):
    nvals[i] = scipy.ndimage.map_coordinates(ivals[i], coords)
I'm having difficulty getting this code to work properly. The problem areas are:
1.) Recreating this line: newx,newy = mgrid[-1:1:100j,-1:1:100j]. In my case I have a dictionary with the grid in vector form. I've tried to recreate this line using np.meshgrid, but then I get an error on the line coords = scipy.array([ivals, jvals]). I'm looking for some help in recreating this Cookbook function and making it more dynamic.
Any help is greatly appreciated.
/M
You should have a look at the documentation for map_coordinates. I don't see where the actual data you are trying to interpolate is in your code. What I mean is, presumably you have some data input which is a function of x and y; i.e. input = f(x,y) that you want to interpolate. In the first example you show, this is the array fvals. This should be your first argument to map_coordinates.
For example, if the data you are trying to interpolate is input, which should be a 2-dimensional array of shape (len(x),len(y)), then the interpolated data would be:
interpolated_data = map_coordinates(input, coords)
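Putting that together, here is a minimal sketch of the Cookbook recipe driven by a dictionary of grid vectors. The contents of `plist` are hypothetical: `modx` and `mody` are assumed to be equally spaced coordinate vectors and `remapto` the number of output points per axis. The `fvals` array stands in for the real data, and `mode="nearest"` is a choice made here for edge handling, not something taken from the question.

```python
import numpy as np
from scipy import ndimage

# Hypothetical inputs mirroring the question's dictionary.
plist = {
    "modx": np.linspace(-1.0, 1.0, 5),
    "mody": np.linspace(-1.0, 1.0, 5),
    "remapto": 100,
}

x = plist["modx"]
y = plist["mody"]

# The data to interpolate, sampled on the coarse grid: shape (len(x), len(y)).
fvals = np.sin(x)[:, None] * np.sin(y)[None, :]

# Fine grid covering the same extent, indexed the same way as fvals.
newx, newy = np.meshgrid(
    np.linspace(x[0], x[-1], plist["remapto"]),
    np.linspace(y[0], y[-1], plist["remapto"]),
    indexing="ij",
)

# Convert physical coordinates into fractional array indices.
ivals = (newx - x[0]) / (x[1] - x[0])
jvals = (newy - y[0]) / (y[1] - y[0])
coords = np.array([ivals, jvals])

# The first argument is the data itself, not the coordinates.
newf = ndimage.map_coordinates(fvals, coords, mode="nearest")
```

No loop over rows is needed: `coords` has shape (2, 100, 100), and the result has the shape of the fine grid.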