For an ML project I'm currently working on, I need to verify whether the trained data are good or not.
Let's say that I'm "splitting" the sky into several altitude grids (3 altitude values for the moment) for a given region (say, Europe).
One grid could hold the signal reception strength (RSSI), another one the signal quality (RSRQ).
Each cell of a grid is therefore a rectangle, and it holds the mean value of each measurement (i.e. RSSI or RSRQ) performed in that area.
I have hundreds of millions of data points.
I know how to draw a coloured mesh with xarray for each altitude (see the code below): I just use xr.plot.pcolormesh(lat,lon, the_data_set); that's fine.
But this will only give me a "flat" figure like this:
RSSI value at 3 different altitudes
I need to draw all the pcolormesh() plots of a dataset, one per altitude, in such a way that:
1: The map is at the bottom
2: Each pcolormesh() is stacked and "displayed" at its altitude
3: I can add a 3D scatter plot for testing my trained data
4: The figure is interactive, as I have to zoom in on areas
For 2 and 3 above, I managed to do something using plt and cartopy:
But the plt/cartopy combination is not as interactive as plotly, and plotly doesn't have the pcolormesh functionality.
And in any case, I still don't know how to "stack" the pcolormesh results that I got above.
I've been searching the Internet for a few days, but I didn't find anything that satisfies all my criteria.
What I did to get my pcolormesh:
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
class super_data():
    def __init__(self, lon_bound,lat_bound,alt_bound,x_points,y_points,z_points):
        self.lon_bound = lon_bound
        self.lat_bound = lat_bound
        self.alt_bound = alt_bound
        self.x_points = x_points
        self.y_points = y_points
        self.z_points = z_points
        self.lon, self.lat, self.alt = np.meshgrid(np.linspace(self.lon_bound[0], self.lon_bound[1], self.x_points),
                                                   np.linspace(self.lat_bound[0], self.lat_bound[1], self.y_points),
                                                   np.linspace(self.alt_bound[0], self.alt_bound[1], self.z_points))
        self.this_xr = xr.Dataset(
            coords={'lat': (('latitude', 'longitude','altitude'), self.lat),
                    'lon': (('latitude', 'longitude','altitude'), self.lon),
                    'alt': (('latitude', 'longitude','altitude'), self.alt)})
    def add_data_array(self,ds_name,ds_min,ds_max):
        def create_temp_data(ds_min,ds_max):
            data = np.random.randint(ds_min,ds_max,size=self.y_points * self.x_points)
            return data
        temp_data = []
        # Create "z_points" number of layers in the z axis
        for i in range(self.z_points):
            temp_data.append(create_temp_data(ds_min,ds_max))
        data = np.concatenate(temp_data)
        data = data.reshape(self.z_points,self.x_points, self.y_points)
        self.this_xr[ds_name] = (("altitude","longitude","latitude"),data)
    def plot(self,dataset, extent=None, plot_center=False):
        # I want t
        # Use the smallest square grid of subplots that holds all altitude layers
        if np.sqrt(self.z_points) == np.floor(np.sqrt(self.z_points)):
            side_size = int(np.sqrt(self.z_points))
        else:
            side_size = int(np.floor(np.sqrt(self.z_points) + 1))
        fig = plt.figure()
        i_ax = 1  # subplot indices start at 1
        for i in range(side_size):
            for j in range(side_size):
                if i_ax < self.z_points+1:
                    this_dataset = self.this_xr[dataset].sel(altitude=i_ax-1)
                    # Initialize figure with subplots
                    ax = fig.add_subplot(side_size, side_size, i_ax, projection=ccrs.PlateCarree())
                    i_ax += 1
                    this_dataset.plot.pcolormesh('lon', 'lat', ax=ax, infer_intervals=True, alpha=0.5)
if __name__ == "__main__":
    # Wanted coverage :
    lons = [-15, 30]
    lats = [35, 65]
    alts = [1000, 5000]
    xarr = super_data(lons,lats,alts,10,8,3)
    # Add some fake data
Thanks for your help


Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is oriented horizontally and, to my knowledge, there is no equivalent layout parameter to make them vertical. 4 plots per row would be great.
This is my current sns.pairplot():
x_vars = X_train.select_dtypes(exclude=['object']).columns,
y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
facet_kws={'sharex': False, 'sharey': False})

Plotting audio data properties over long time periods

Using Python matplotlib, I would like to plot sensor data over a period of several hours. The signal arrives via an audio card and gets sampled in short chunks of data. In the example below, amplitude and RMS are plotted.
In order to plot RMS and other properties over much longer time periods than shown here, downsampling is perhaps needed. I am not sure how to accomplish that and would appreciate any further advice. The intention is to run the code on a Raspberry Pi.
Update 1. A very minimal example is shown for getting a longer time view of RMS.
Noticeable is a considerable delay in the response to audio signals, in particular when adding more plots to the figure.
I also tried using FuncAnimation without blitting, because I would like to show a real-time axis, and this is equally slow. Using PyQt should give better results.
import datetime
import pyaudio
import struct
import matplotlib.pyplot as plt
import numpy as np
mic = pyaudio.PyAudio()
FORMAT = pyaudio.paInt16
CHANNELS = 1  # assumed: mono input (CHANNELS is not defined in the original snippet)
RATE = 44100
CHUNK = int(RATE/20)
# the original call is cut off after input=True; frames_per_buffer=CHUNK is an assumed completion
stream = mic.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True,
                  frames_per_buffer=CHUNK)
fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)
ax1.set_xlabel("Samples = 2*Chunk length ")
ax1.set_title('Audio example')
x = np.arange(0, 2 * CHUNK, 2)
ax1.set_ylim(-10e3, 10e3)
ax1.set_xlim(0, CHUNK)
line1, = ax1.plot(x, np.random.rand(CHUNK))
line2, = ax2.plot(x, np.random.rand(CHUNK))
ts = []
rs = []
while True:
    data = stream.read(CHUNK)
    # np.float was removed in NumPy 1.24; the builtin float is equivalent
    d = np.frombuffer(data, np.int16).astype(float)
    rms2 = np.sqrt( np.mean(d**2) )
    # Add x and y to lists (assumed completion: UTC timestamp and RMS value)
    ts.append(datetime.datetime.now(datetime.timezone.utc))
    rs.append(rms2)
    #Draw x and y lists
    ax2.plot(ts,rs,color= 'black')
    # Format plot
    ax2.set_xlabel("Time in UTC")
    ax2.set_ylabel("RMS values")
    plt.setp(ax2.get_xticklabels(), ha="right", rotation=45)
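Regarding the downsampling mentioned in the question, one possible approach (a sketch, not a tested solution for the Raspberry Pi) is to keep only the per-chunk RMS values and merge them in fixed-size blocks before plotting. For chunks of equal length, the root of the mean of the squared RMS values equals the RMS over the whole block. The helper name downsample_rms and the synthetic input are assumptions made for illustration.

```python
import numpy as np

def downsample_rms(rms_values, factor):
    """Merge every `factor` per-chunk RMS values into a single RMS value."""
    rms = np.asarray(rms_values, dtype=float)
    n_blocks = len(rms) // factor             # an incomplete trailing block is dropped
    blocks = rms[:n_blocks * factor].reshape(n_blocks, factor)
    # RMS of equal-length chunks combines as the root of the mean square
    return np.sqrt(np.mean(blocks ** 2, axis=1))

# CHUNK = RATE/20 means 20 RMS values per second, so factor=20 gives one per second
per_chunk = np.array([3.0, 4.0] * 20)         # synthetic stand-in for measured values
per_second = downsample_rms(per_chunk, factor=20)
print(per_second)
```

At 20 values per second, one hour of audio yields 72000 RMS values; with factor=20 only 3600 points remain to be drawn, and a larger factor reduces this further.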

Merge countries using Cartopy

I am using the following code to make a map of Sweden, Norway and Finland together as one area; however, I am struggling with it. I'm following this example: Python Mapping in Matplotlib Cartopy Color One Country.
from shapely.geometry import Polygon
from cartopy.io import shapereader
import cartopy.io.img_tiles as cimgt
import cartopy.crs as ccrs
import geopandas
import matplotlib.pyplot as plt
def rect_from_bound(xmin, xmax, ymin, ymax):
    """Returns list of (x,y)'s for a rectangle"""
    xs = [xmax, xmin, xmin, xmax, xmax]
    ys = [ymax, ymax, ymin, ymin, ymax]
    return [(x, y) for x, y in zip(xs, ys)]
# request data for use by geopandas
resolution = '10m'
category = 'cultural'
name = 'admin_0_countries'
countries = ['Norway', 'Sweden', 'Finland']
shpfilename = shapereader.natural_earth(resolution, category, name)
df = geopandas.read_file(shpfilename)
extent = [2, 32, 55, 72]
# get geometry of a country
for country in (countries):
poly = [df.loc[df['ADMIN'] == country]['geometry'].values[0]]
stamen_terrain = cimgt.StamenTerrain()
# projections that involved
st_proj = stamen_terrain.crs #projection used by Stamen images
ll_proj = ccrs.PlateCarree() #CRS for raw long/lat
# create fig and axes using intended projection
fig = plt.figure(figsize=(8,9))
ax = fig.add_subplot(122, projection=st_proj)
ax.add_geometries(poly, crs=ll_proj, facecolor='none', edgecolor='black')
pad1 = 0.5 #padding, degrees unit
exts = [poly[0].bounds[0] - pad1, poly[0].bounds[2] + pad1, poly[0].bounds[1] - pad1, poly[0].bounds[3] + pad1];
ax.set_extent(exts, crs=ll_proj)
# make a mask polygon by polygon's difference operation
# base polygon is a rectangle, another polygon is simplified switzerland
msk = Polygon(rect_from_bound(*exts)).difference( poly[0].simplify(0.01) )
msk_stm = st_proj.project_geometry (msk, ll_proj) # project geometry to the projection used by stamen
# get and plot Stamen images
ax.add_image(stamen_terrain, 8) # this requests image, and plot
# plot the mask using semi-transparency (alpha=0.65) on the masked-out portion
ax.add_geometries( msk_stm, st_proj, zorder=12, facecolor='white', edgecolor='none', alpha=0.65)
What I get is separate maps, though I need only a single map of all of them.
Can you please help?
Thank you.
The code that you adapted works well for a single country. If the new target is several contiguous countries, you need to select all of them and dissolve them into a single geometry. Only a few lines of code need to be modified.
Example: new target countries: ['Norway','Sweden', 'Finland']
The line of code that needs to be replaced:
poly = [df.loc[df['ADMIN'] == 'Switzerland']['geometry'].values[0]]
Replace it with these lines of code:
scan3 = df[ df['ADMIN'].isin(['Norway','Sweden', 'Finland']) ]
scan3_dissolved = scan3.dissolve(by='LEVEL')
poly = [scan3_dissolved['geometry'].values[0]]
And you should get a plot similar to this:

Discrete Color Bar with Tick labels in between colors

I am trying to plot some data with a discrete color bar. I was following the example given (https://gist.github.com/jakevdp/91077b0cae40f8f8244a), but the issue is that this example does not carry over one-to-one to a different spacing. For example, the values in the linked example increase by 1, whereas my data increase by 0.5. You can see the output from the code I have. Any help with this would be appreciated. I know I am missing something key here but can't figure it out.
import matplotlib.pylab as plt
import numpy as np
def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # Note that if base_cmap is a string or None, you can simply do
    #    return plt.cm.get_cmap(base_cmap, N)
    # The following works for string, None, or a colormap instance:
    base = plt.cm.get_cmap(base_cmap)
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)
x = np.random.randn(40)
y = np.random.randn(40)
c = np.random.randint(num, size=40)
plt.scatter(x, y, c=c, s=50, cmap=discrete_cmap(num, 'jet'))
plt.clim(-0.5, num - 0.5)
Not sure what version of matplotlib/pyplot introduced this, but plt.get_cmap now supports an int argument specifying the number of colors you want to get, for discrete colormaps.
This automatically results in the colorbar being discrete.
By the way, pandas has an even better handling of the colorbar.
import numpy as np
from matplotlib import pyplot as plt
# remove if not using Jupyter/IPython
%matplotlib inline
# choose number of clusters and number of points in each cluster
n_clusters = 5
n_samples = 20
# there are fancier ways to do this
clusters = np.array([k for k in range(n_clusters) for i in range(n_samples)])
# generate the coordinates of the center
# of each cluster by shuffling a range of values
clusters_x = np.arange(n_clusters)
clusters_y = np.arange(n_clusters)
# get dicts like cluster -> center coordinate
x_dict = dict(enumerate(clusters_x))
y_dict = dict(enumerate(clusters_y))
# get coordinates of cluster center for each point
x = np.array(list(x_dict[k] for k in clusters)).astype(float)
y = np.array(list(y_dict[k] for k in clusters)).astype(float)
# add noise
x += np.random.normal(scale=0.5, size=n_clusters*n_samples)
y += np.random.normal(scale=0.5, size=n_clusters*n_samples)
### Finally, plot
fig, ax = plt.subplots(figsize=(12,8))
# get discrete colormap
cmap = plt.get_cmap('viridis', n_clusters)
# scatter points
scatter = ax.scatter(x, y, c=clusters, cmap=cmap)
# scatter cluster centers
ax.scatter(clusters_x, clusters_y, c='red')
# add colorbar
cbar = plt.colorbar(scatter)
# set ticks locations (not very elegant, but it works):
# - shift by 0.5
# - scale so that the last value is at the center of the last color
tick_locs = (np.arange(n_clusters) + 0.5)*(n_clusters-1)/n_clusters
cbar.set_ticks(tick_locs)
# set tick labels (as before)
cbar.set_ticklabels(np.arange(n_clusters))
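To see why the shift-and-scale formula for tick_locs lands in the middle of each colour, here is a small numeric check (a sketch that assumes integer labels 0..n_clusters-1, so the colour scale spans from 0 to n_clusters-1 and is split into n_clusters equal bands). The names edges and centres are introduced only for this illustration.

```python
import numpy as np

n_clusters = 5
# With integer labels 0..n_clusters-1, the colour scale spans [0, n_clusters - 1]
vmin, vmax = 0, n_clusters - 1
edges = np.linspace(vmin, vmax, n_clusters + 1)   # boundaries between the colour bands
centres = (edges[:-1] + edges[1:]) / 2            # midpoint of each colour band

tick_locs = (np.arange(n_clusters) + 0.5) * (n_clusters - 1) / n_clusters
print(np.allclose(tick_locs, centres))
```

Each band is (n_clusters-1)/n_clusters wide, which is where the scale factor in the formula comes from.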
Ok so this is the hack I found for my own question. I am sure there is a better way to do this but this works for what I am doing. Feel free to suggest a better way to do this.
import numpy as np
import matplotlib.pylab as plt
def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # Note that if base_cmap is a string or None, you can simply do
    #    return plt.cm.get_cmap(base_cmap, N)
    # The following works for string, None, or a colormap instance:
    base = plt.cm.get_cmap(base_cmap)
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)
x = np.random.randn(40)
y = np.random.randn(40)
c = np.random.randint(num, size=40)
plt.scatter(x, y, c=c, s=50, cmap=discrete_cmap(num, 'jet'))
plt.clim(-0.5, num - 0.5)
For some reason I cannot upload the image associated with the code above. I get an error when uploading so not sure how to show the final example. But simply I set the color bar axes for tick labels for a vertical color bar and passed in the labels I want and it produced the correct output.
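A possible alternative to placing the tick labels by hand, sketched under assumptions: matplotlib's BoundaryNorm can map values spaced by 0.5 onto one colour each if the bin edges are put halfway between neighbouring levels. The levels 0.0 to 2.5, the variable names and the random data are made up for this illustration and are not the data from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

step = 0.5                                   # assumed spacing between data levels
levels = np.arange(0.0, 3.0, step)           # assumed levels: 0.0, 0.5, ..., 2.5
# Bin edges sit halfway between neighbouring levels
boundaries = np.append(levels - step / 2, levels[-1] + step / 2)

cmap = plt.get_cmap("jet", len(levels))      # one colour per level
norm = BoundaryNorm(boundaries, ncolors=cmap.N)

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
y = rng.standard_normal(40)
c = rng.choice(levels, size=40)              # synthetic values on the 0.5 grid

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=c, s=50, cmap=cmap, norm=norm)
cbar = fig.colorbar(sc, ticks=levels)        # each tick falls at the centre of its colour
```

Because the norm carries the bin edges, no clim call or manual tick shifting is needed in this variant.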

How can I draw a line in matplotlib so that the edge (not the center) of the drawn line follows the plotted data?

I'm working on a figure to show traffic levels on a highway map. The idea is that for each
highway segment, I would plot two lines - one for each direction. The thickness of each
would correspond to the traffic volume in that direction. I need to plot the lines
so that the left edge (relative to driving direction) of the drawn line follows
the shape of the highway segment. I would like to specify the shape in data coordinates,
but I would like to specify the thickness of the line in points.
My data is like this:
where, for example, ((5,10),(-7,2),(8,9)) is a sequence of x,y values giving the shape of a highway segment, and (210,320) is traffic volumes in the forward and reverse direction, respectively
Looks matter: the result should be pretty.
I figured out a solution using matplotlib.transforms.Transform and shapely.geometry.LineString.parallel_offset.
Note that shapely's parallel_offset method can sometimes return a MultiLineString, which
is not handled by this code. I've changed the second shape so it does not cross over itself to avoid this problem. I think this problem would rarely happen in my application.
Another note: the documentation for matplotlib.transforms.Transform seems to imply that the
array returned by the transform method must be the same shape as the array passed
as an argument, but adding additional points to plot in the transform method seems
to work here.
#matplotlib version 1.1.0
#shapely version 1.2.14
#Python 2.7.3
import matplotlib.pyplot as plt
import shapely.geometry
import numpy
import matplotlib.transforms
def get_my_transform(offset_points, fig):
    # convert the offset from points (1/72 inch) to display dots
    offset_inches = offset_points / 72.0
    offset_dots = offset_inches * fig.dpi
    class my_transform(matplotlib.transforms.Transform):
        input_dims = 2
        output_dims = 2
        is_separable = False
        has_inverse = False
        def transform(self, values):
            l = shapely.geometry.LineString(values)
            l = l.parallel_offset(offset_dots,'right')
            return numpy.array(l.xy).T
    return my_transform()
def plot_to_right(ax, x,y,linewidth, **args):
    # shift the line by half its width so that its left edge follows the data
    t = ax.transData + get_my_transform(linewidth/2.0,ax.figure)
    ax.plot(x,y, transform = t,
            linewidth = linewidth,
            solid_capstyle = 'butt',
            **args)
data = [[((5,10),(-7,2),(8,9)),(210,320)],
        # remaining segments are truncated in the source
        ]
fig = plt.figure()
ax = fig.add_subplot(111)
for shape, volumes in data:
    x,y = zip(*shape)
    plot_to_right(ax, x,y, volumes[0]/100., c = 'blue')
    plot_to_right(ax, x[-1::-1],y[-1::-1], volumes[1]/100., c = 'green')
    ax.plot(x,y, c = 'grey', linewidth = 1)
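The two pieces of arithmetic behind the transform above can be checked in isolation. The sketch below uses plain numpy instead of shapely and handles a single straight segment only, so it says nothing about how parallel_offset joins corners; the helper names points_to_pixels and offset_segment_right are made up for this illustration.

```python
import numpy as np

def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to display pixels."""
    return points / 72.0 * dpi

def offset_segment_right(p0, p1, distance):
    """Shift the segment p0 -> p1 by `distance` to the right of its direction."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    d = p1 - p0
    # Rotating the direction (dx, dy) by -90 degrees gives the right-hand normal (dy, -dx)
    normal = np.array([d[1], -d[0]]) / np.hypot(d[0], d[1])
    return p0 + distance * normal, p1 + distance * normal

# Half the width of a 3 pt line on a 100 dpi figure, in pixels
half_width = points_to_pixels(3.0 / 2.0, dpi=100)
a, b = offset_segment_right((0, 0), (10, 0), half_width)
print(a, b)
```

A segment running in the +x direction is moved towards -y, which is its right-hand side in matplotlib's display coordinates (origin at the bottom left).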