Convert a pandas DataFrame to GeoTIFF (Python) - pandas

I have a pandas DataFrame with X/Y and lat/lon coordinate columns.
I want to convert the DataFrame to a raster using the lat/lon columns and store the resulting TIFF image in the WGS84 CRS.
Thanks

A couple package recommendations for you: xarray and rioxarray.
xarray is pydata's solution to labeled N-dimensional arrays (think pandas but in 3+ dimensions, or think numpy ND-arrays but with pandas indices rather than just positional indices).
rioxarray is an extension package combining xarray with rasterio, giving the ability to read and write raster files including GeoTIFFs. rioxarray has docs on converting xarray DataArrays to rasters. See also the API docs for converting RasterArray and RasterDataset objects to rasters.
In your case, assuming your orthogonal dimensions are (lat, lon), that model_lat and model_lon are indexed by both lat and lon (i.e. they are 2D arrays in the model's own projection), and that res is the band you'd like to encode, the result would look something like this:
import xarray as xr
import rioxarray  # registers the .rio accessor on xarray objects

# pivot the long-format frame onto a 2D (lat, lon) grid; note this
# returns a Dataset, not a DataArray
ds = df.set_index(['lat', 'lon']).to_xarray()
# promote the model lat/lon data variables to 2D coordinates
ds = ds.set_coords(['model_lat', 'model_lon'])
# tag the grid with WGS84 (EPSG:4326), as the question asks, then
# write the band of interest
ds = ds.rio.write_crs('EPSG:4326')
ds.res.rio.to_raster(filepath)
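As a sanity check (not part of the original answer), you can open the written file with rasterio and confirm the CRS; filepath is the same output path as above:
import rasterio

with rasterio.open(filepath) as src:
    print(src.crs)    # expected: EPSG:4326
    print(src.shape)  # (number of distinct lats, number of distinct lons)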

Related

Cannot plot a histogram from a Pandas dataframe

I've used pandas.read_csv to generate a 1000-row dataframe with 32 columns. I'm looking to plot a histogram or bar chart (depending on data type) of each column. For columns of type 'int64', I've tried matplotlib.pyplot.hist(df['column']) and df.hist(column='column'), as well as calling matplotlib.pyplot.hist on df['column'].values and df['column'].to_numpy(). Weirdly, they all take a really long time (>30 s), and when I've let them complete, I get unit-height bars in multiple colors, as if there's some sort of implicit grouping and they're all being separated into different groups. Any ideas about what I can do to get a normal histogram? Unfortunately I closed the charts, so I can't show you an example right now.
Edit - this seems to be a much bigger problem with Int columns, and casting them to float fixes the problem.
Follow these two steps:
import matplotlib's pyplot module
call the hist function with the column (a pandas Series) as its argument
import matplotlib.pyplot as plt
plt.hist(df['column'], color='blue', edgecolor='black', bins=45)
Here's the source.
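Following the asker's own edit, pandas' nullable Int columns can trip up plt.hist; a minimal sketch of the cast-to-float workaround (the column name is a placeholder):
import matplotlib.pyplot as plt

# drop missing values, then cast the nullable Int column to plain floats
plt.hist(df['column'].dropna().astype(float), bins=45)
plt.show()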

How to resize data point in streamlit st.map()?

Summary
I am building a real-time dashboard that plots latitude and longitude using st.map(df).
Steps to reproduce
I was following this example here.
Code snippet:
import streamlit as st
import pandas as pd
import numpy as np
data = [[-33.71205471, 29.19017682], [-33.81205471, 29.11017682], [-34.71205471, 29.49017682]]
df = pd.DataFrame(data, columns=['Latitude', 'Longitude'])
st.map(df)
Actual behavior:
The plot works as intended, but the data points are too large to distinguish between movements in latitude and longitude. Movements in lat and long have a 7-decimal-place granularity (e.g. 166.1577634).
Expected behavior:
Here is an example in AWS QuickSight of how the points should look.
Any ideas on how to reduce the size of the map circles for each respective data point?
Thanks!
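One way to get finer control over marker size (a sketch of my own, not from this thread) is to swap st.map for st.pydeck_chart, whose ScatterplotLayer takes an explicit radius in meters:
import pydeck as pdk
import streamlit as st
import pandas as pd

data = [[-33.71205471, 29.19017682], [-33.81205471, 29.11017682], [-34.71205471, 29.49017682]]
df = pd.DataFrame(data, columns=['Latitude', 'Longitude'])

# ScatterplotLayer exposes the point radius directly (in meters)
layer = pdk.Layer(
    'ScatterplotLayer',
    data=df,
    get_position='[Longitude, Latitude]',
    get_radius=200,  # shrink this for finer-grained points
    get_fill_color=[255, 0, 0],
)
view = pdk.ViewState(latitude=-34.0, longitude=29.2, zoom=6)
st.pydeck_chart(pdk.Deck(layers=[layer], initial_view_state=view))
Newer Streamlit versions also document a size argument on st.map itself (e.g. st.map(df, size=20)), which may be enough if upgrading is an option.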

Compute 2D FFT in NumPy

I load an image in NumPy using imageio, and it loads in the format H x W x 3, where H and W are the spatial axes and 3 refers to the RGB channels.
I wish to compute the image's 2D FFT and use np.fft.fft2(img). The documentation for np.fft.fft2 says it defaults to axes=(-2, -1). Does this mean that my FFT is being computed over the W and the 3 axes?
Would the correct way be np.fft.fft2(img, axes=(0,1))?
I am not sure about the solution you suggest, but another option may be to calculate the transform for each color channel separately, do your processing, and merge them again afterwards if needed.
Something like this:
# split the H x W x 3 image into its color channels
image_red, image_green, image_blue = img[:, :, 0], img[:, :, 1], img[:, :, 2]
# use fft2 for a 2D transform (np.fft.fft would only transform the last axis)
fft_red = np.fft.fft2(image_red)
etc...
You can plot the proposed solution and this method to check the difference, for example:
import matplotlib.pyplot as plt
plt.figure('onlyRed')
plt.imshow(np.abs(fft_red))
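As for the question's actual proposal: yes, the default axes=(-2, -1) would transform the W and channel axes, so axes=(0, 1) is the correct way to transform the spatial axes. It also agrees with the per-channel approach above; a quick check, assuming img is the H x W x 3 array from the question:
import numpy as np

# 2D FFT over the spatial axes; the 3 color channels are transformed independently
fft_all = np.fft.fft2(img, axes=(0, 1))
print(np.allclose(fft_all[:, :, 0], np.fft.fft2(img[:, :, 0])))  # True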

Saving an imshow-like image while preserving resolution

I have an (n, m) array that I've been visualizing with matplotlib.pyplot.imshow. I'd like to save this data in some type of raster graphics file (e.g. a png) so that:
The colors are the ones shown with imshow
Each element of the underlying array is exactly one pixel in the saved image -- meaning that if the underlying array is (n, m) elements, the image is n x m pixels. (I'm not interested in interpolation='nearest' in imshow.)
There is nothing in the saved image except for the pixels corresponding to the data in the array. (I.e. there's no white space around the edges, axes, etc.)
How can I do this?
I've seen some code that can kind of do this by using interpolation='nearest' and forcing matplotlib to (grudgingly) turn off axes, whitespace, etc. However, there must be some way to do this more directly -- maybe with PIL? After all, I have the underlying data. If I can get an RGB value for each element of the underlying array, then I can save it with PIL. Is there some way to extract the RGB data from imshow? I can write my own code to map the array values to RGB values, but I don't want to reinvent the wheel, since that functionality already exists in matplotlib.
As you already guessed there is no need to create a figure. You basically need three steps. Normalize your data, apply the colormap, save the image. matplotlib provides all the necessary functionality:
import numpy as np
import matplotlib.pyplot as plt
# some example data (512x512); scipy.misc.lena() has been removed from
# modern SciPy, so use scipy.datasets.ascent() (SciPy >= 1.10) instead
import scipy.datasets
data = scipy.datasets.ascent()
# a colormap and a normalization instance
cmap = plt.cm.jet
norm = plt.Normalize(vmin=data.min(), vmax=data.max())
# map the normalized data to colors
# image is now RGBA (512x512x4)
image = cmap(norm(data))
# save the image
plt.imsave('test.png', image)
While the code above explains the single steps, you can also let imsave do all three steps (similar to imshow):
plt.imsave('test.png', data, cmap=cmap)
Result (test.png):
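Since the question raises PIL as an alternative, here is a minimal sketch of the same three steps with PIL doing the saving (reusing cmap, norm, and data from above):
import numpy as np
from PIL import Image

# colormap output is RGBA floats in [0, 1]; scale to uint8 for PIL
rgba = (cmap(norm(data)) * 255).astype(np.uint8)
# one array element per pixel, no axes or surrounding whitespace
Image.fromarray(rgba).save('test_pil.png')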

Figures with lots of data points in matplotlib

I generated the attached image using matplotlib (png format). I would like to use eps or pdf, but I find that with all the data points, the figure is really slow to render on the screen. Other than just plotting less of the data, is there anyway to optimize it so that it loads faster?
I think you have three options:
As you mentioned yourself, you can plot fewer points. For the plot you showed in your question I think it would be fine to only plot every other point.
As @tcaswell stated in his comment, you can use a line instead of points, which will be rendered much more efficiently (see the sketch after this list).
You could rasterize the blue dots. Matplotlib allows you to selectively rasterize single artists, so if you pass rasterized=True to the plotting command you will get a bitmapped version of the points in the output file. This will be much faster to load, at the price of limited zooming due to the resolution of the bitmap. (Note that the axes and all other elements of the plot remain as vector graphics and fonts.)
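A minimal sketch of the line option (my own illustration; x and y stand for the arrays being plotted):
import matplotlib.pyplot as plt

# a single Line2D artist renders far faster than hundreds of thousands of markers
plt.plot(x, y, linewidth=0.5, alpha=0.7)
plt.savefig('line.pdf', format='pdf')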
First, if you want to show a "trend" in your plot, and the x, y arrays you are plotting are huge, you could apply random sub-sampling, keeping only a fraction of your data:
import numpy as np
import matplotlib.pyplot as plt

# keep each point independently with probability `fraction`
fraction = 0.50
x_resampled = []
y_resampled = []
for k in range(len(x)):
    if np.random.rand() < fraction:
        x_resampled.append(x[k])
        y_resampled.append(y[k])
plt.scatter(x_resampled, y_resampled, s=6)
plt.show()
Second, have you considered using a log scale on the x-axis to increase visibility?
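For instance, a one-line illustration (assumes all x values are positive, since a log scale cannot show zero or negative values):
plt.xscale('log')  # compresses large x values, spreading out the dense low end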
In this example, only the plotting area is rasterized; the axes are still in vector format:
import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
# rasterized=True turns just these markers into a bitmap in the PDF;
# the dpi argument to savefig controls that bitmap's resolution
plt.scatter(x, y, marker='x', rasterized=True)
plt.savefig("norm.pdf", format='pdf', dpi=200)