I'm having trouble with a seemingly easy image processing task. I need to extract a curve from a medical image (the attached image was fetched from an open-source database).
Raw image
I also have about 50 images like the attached one, but they have small differences and vary a bit. I'm afraid the dataset is too small for an ML approach.
I tried the following approach, a modification of which works well for blood vessels:
import os
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('./img.png')                                  # loaded by OpenCV as a 3-channel BGR array
img_gray = img.copy()
img_gray[img_gray < 70] = 0                                   # zero out dark background pixels
img_gray = cv.blur(img_gray * 1.5, (5, 5)).astype(np.uint8)   # scale intensities by 1.5 and smooth with a 5x5 box filter
img_gray = 255 - img_gray                                     # invert
_, img_gray = cv.threshold(img_gray, 150, 255, cv.THRESH_BINARY_INV)   # binarize
plt.axis('off')
plt.imshow(img_gray, cmap='gray')
plt.savefig('output.png', dpi=600, pad_inches=0, bbox_inches='tight')
Then we have the following result:
Processed curve
Any ideas on how to improve the filtering, please?
I am running an XGBoost regressor on a dataset and finally plotting the feature importance.
Because my feature names are long (20-60 characters), they get trimmed in the saved image (on the left side of the image).
code:
from xgboost import plot_importance
import matplotlib.pyplot as plt
...
plot_importance(search.best_estimator_)
plt.savefig(f'{os.path.join(self.model_save_directory, "feature_importance.png")}')
I tried to set a wide image with:
plt.rcParams["figure.figsize"] = (28, 7)
Now more characters of each feature name fit in the image, but still not all of them. On the other hand, the height of the image has shrunk a lot.
Thanks
I was able to show the full feature names with:
plot_importance(search.best_estimator_, max_num_features=None)
plt.xticks(rotation=90)
plt.savefig(f'{os.path.join(self.model_save_directory, "feature_importance.png")}', bbox_inches='tight')
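If the shrunken height from the rcParams approach is still a problem, another option (a sketch, not tested on your data, reusing the search and self.model_save_directory names from the question) is to create the figure explicitly and hand its Axes to plot_importance, so width and height are both under your control:
import os
import matplotlib.pyplot as plt
from xgboost import plot_importance

# Size the figure explicitly instead of relying on rcParams.
fig, ax = plt.subplots(figsize=(12, 10))
plot_importance(search.best_estimator_, ax=ax)   # draw onto the pre-sized Axes
fig.savefig(os.path.join(self.model_save_directory, "feature_importance.png"),
            bbox_inches='tight')                 # keep long labels from being cut off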
I am trying to create a skyplot (using astropy) containing mean and standard deviation values from an HDF5 file. The link to the data is https://wwwmpa.mpa-garching.mpg.de/~ensslin/research/data/faraday2020.html (Faraday Sky 2020).
I have written the following code so far, where data is read from the HDF5 file into ggl and ggb, after which the values are converted to galactic coordinates (l and b) in gl and gb. I need these values to be plotted in the skyplot.
from astropy import units as u
from astropy.coordinates import SkyCoord
import matplotlib.pyplot as plt
import numpy as np
import h5py
dat = []
ggl=[]
ggb=[]
with h5py.File('faraday2020.hdf5', 'r') as hdf:
    print(list(hdf.keys()))
    faraday_sky_mean = hdf['faraday_sky_mean'][:]
    faraday_sky_std = hdf['faraday_sky_std'][:]
    print(faraday_sky_mean.shape, faraday_sky_mean.dtype)
    print(f'Max Mean={max(faraday_sky_mean)}, Min Mean={min(faraday_sky_mean)}')
    print(faraday_sky_std.shape, faraday_sky_std.dtype)
    print(f'Max StdDev={max(faraday_sky_std)}, Min StdDev={min(faraday_sky_std)}')
ggl = faraday_sky_mean.tolist()
print(len(ggl), type(ggl[0]))
ggb = faraday_sky_std.tolist()
print(len(ggb), type(ggb[0]))
gl = ggl * u.degree
gb = ggb * u.degree
c = SkyCoord(l=gl, b=gb, frame='galactic', unit=(u.deg, u.deg))
l_rad = c.l.wrap_at(180 * u.deg).radian
b_rad = c.b.radian
###
plt.figure(figsize=(8,4.2))
plt.subplot(111, projection="aitoff")
plt.title("Mean and standard dev", y=1.08, fontsize=20)
plt.grid(True)
P1 = plt.plot(l_rad, b_rad, c="blue", s=220, marker="h", alpha=0.7)
plt.subplots_adjust(top=0.95, bottom=0.0)
plt.xlabel('l (deg)', fontsize=20)
plt.ylabel('b (deg)', fontsize=20)
plt.subplots_adjust(top=0.95, bottom=0.0)
plt.show()
However, I am getting the following error:
'got {}'.format(angles.to(u.degree)))
ValueError: Latitude angle(s) must be within -90 deg <= angle <= 90 deg, got [1.12490771 0.95323024 0.99124631 ... 4.23648627 4.28821608 5.14498169] deg
Please help me on how to fix this.
This is an extension to my previous answer. The original post wanted to plot Mean and Standard Deviation of the Faraday Sky 2020 data on an astropy skyplot. The referenced data source (from Radboud University) only included the mean and standard deviation. The associated longitude and latitude coordinates were obtained from the NASA Goddard LAMBDA-Tools site. The code below shows how to merge the data from both files into a single HDF5 file. For convenience, links to the data sources are repeated here:
Link to the Faraday Sky 2020 data
Link to the HEALPix Pixel Coordinates
The resulting file is named: "faraday2020_with_coords.h5".
from astropy.io import fits
import h5py
fits_file = 'pixel_coords_map_ring_galactic_res9.fits'
faraday_file = 'faraday2020.hdf5'
with fits.open(fits_file) as hdul, \
     h5py.File(faraday_file, 'r') as h5r, \
     h5py.File('faraday2020_with_coords.h5', 'w') as h5w:
    arr = hdul[1].data
    dt = [('LONGITUDE', float), ('LATITUDE', float),
          ('faraday_sky_mean', float), ('faraday_sky_std', float)]
    ds = h5w.create_dataset('skyplotdata', shape=(arr.shape[0],), dtype=dt)
    ds['LONGITUDE'] = arr['LONGITUDE'][:]
    ds['LATITUDE'] = arr['LATITUDE'][:]
    ds['faraday_sky_mean'] = h5r['faraday_sky_mean'][:]
    ds['faraday_sky_std'] = h5r['faraday_sky_std'][:]
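As a quick sanity check (a minimal sketch, assuming only the file created above), the merged compound dataset can be read back and inspected like this:
import h5py

# Re-open the merged file and look at the field names and a few values.
with h5py.File('faraday2020_with_coords.h5', 'r') as h5f:
    sky = h5f['skyplotdata'][:]
    print(sky.dtype.names)            # ('LONGITUDE', 'LATITUDE', 'faraday_sky_mean', 'faraday_sky_std')
    print(sky['LONGITUDE'][:5])
    print(sky['faraday_sky_mean'][:5])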
I see why you are having problems plotting this data. The data in the linked file (faraday2020.hdf5) is only the mean and standard deviation of the reconstructed Faraday sky. See this note on the linked page: "All maps are presented in Galactic at a HEALPix resolution of Nside=512 and are stored in RING ordering scheme. The units are rad/m2." In other words, you need to get the skyplot coordinates from another source.
A little Googling found the coordinates on the NASA Goddard LAMBDA-Tools site: HEALPix Pixel Coordinates. Specifically, you want this file for NSide=512 / Galactic / Ring Pixel Ordering: pixel_coords_map_ring_galactic_res9.fits
So, first problem solved. Next you need to read the FITS formatted file to get the coordinates. Astropy has the 'fits' module to do that. See code below.
from astropy.io import fits
from astropy import units as u
from astropy.coordinates import SkyCoord
import matplotlib.pyplot as plt
import h5py
filename='pixel_coords_map_ring_galactic_res9.fits'
with fits.open(filename) as hdul:
    print(hdul.info())
    arr = hdul[1].data
    print(arr.shape)
    # Returns:
    # (3145728,)
    print(arr.dtype)
    # Returns:
    # dtype((numpy.record, [('LONGITUDE', '>f4'), ('LATITUDE', '>f4')]))
    ggl = arr['LONGITUDE'][:].tolist()
    ggb = arr['LATITUDE'][:].tolist()

gl = ggl * u.degree
gb = ggb * u.degree
c = SkyCoord(l=gl, b=gb, frame='galactic', unit=(u.deg, u.deg))
l_rad = c.l.wrap_at(180 * u.deg).radian
b_rad = c.b.radian
The code above gives you l_rad and b_rad for your skyplot coordinates. Next, you need to merge in the code I gave you earlier to read the Faraday Sky Mean and StdDev.
with h5py.File('faraday2020.hdf5', 'r') as hdf:
    faraday_sky_mean = hdf['faraday_sky_mean'][:]
    faraday_sky_std = hdf['faraday_sky_std'][:]
Finally, plot both sets of data with matplotlib. I changed the plot to use a scatter plot, color coding the markers with c=faraday_sky_mean (the mean values). You can do the same with faraday_sky_std to get the standard deviation values (a sketch of that variant follows the code below).
plt.figure(figsize=(8,4.2))
plt.subplot(111, projection="aitoff")
plt.title("Mean", y=1.08, fontsize=20)
plt.grid(True)
# P1=plt.plot(l_rad, b_rad,c="blue", marker="h", alpha=0.7) #, s=220)
P2 = plt.scatter(l_rad, b_rad, s=20, c=faraday_sky_mean, cmap='hsv')
plt.subplots_adjust(top=0.95, bottom=0.0)
plt.xlabel('l (deg)', fontsize=20)
plt.ylabel('b (deg)', fontsize=20)
plt.subplots_adjust(top=0.95, bottom=0.0)
plt.show()
print('DONE')
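For reference, the standard deviation variant mentioned above is the same scatter plot colored by the other column (a sketch reusing l_rad, b_rad and faraday_sky_std from the code above):
plt.figure(figsize=(8, 4.2))
plt.subplot(111, projection="aitoff")
plt.title("Standard dev", y=1.08, fontsize=20)
plt.grid(True)
P3 = plt.scatter(l_rad, b_rad, s=20, c=faraday_sky_std, cmap='hsv')   # color by std dev instead of mean
plt.subplots_adjust(top=0.95, bottom=0.0)
plt.xlabel('l (deg)', fontsize=20)
plt.ylabel('b (deg)', fontsize=20)
plt.show()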
Put it all together, and you will get the images below. I think this is accurate (but I know nothing about astrophysics, so I'm not 100% sure). This should get you pointed in the right direction. Good luck.
I'm trying to migrate from Basemap to Cartopy by looking at demo examples. I have a simple piece of code using both coastlines() and contourf(). I can get each one separately but not both simultaneously. The data set is a NetCDF file containing sea surface temperature data for the western Mediterranean. The code is:
import numpy as np
from netCDF4 import Dataset
import cartopy
import matplotlib.pyplot as plt
# DATA
data = Dataset('20190715.0504.n19.nc','r')
lon = data.variables['lon'][:]
lat = data.variables['lat'][:]
sst = data.variables['mcsst'][0,:,:].squeeze()
xxT,yyT = np.meshgrid(lon,lat)
# PLOT
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_axes([0.01,0.01,0.98,0.98],projection=cartopy.crs.Mercator())
ax1.coastlines()
#ax1.contourf(xxT,yyT,sst)
ax1.set_extent([16.5, -15.0, 35.0, 46.5])
plt.show()
With this code I get:
If I use:
#ax1.coastlines()
ax1.contourf(xxT,yyT,sst)
ax1.set_extent([16.5, -15.0, 35.0, 46.5])
I get a white rectangle.
If I use:
#ax1.coastlines()
ax1.contourf(xxT,yyT,sst)
ax1.set_extent([16.5,-15.0,35.0,46.5],crs=cartopy.crs.Mercator())
I get the contoured data.
But with both:
ax1.coastlines()
ax1.contourf(xxT,yyT,sst)
ax1.set_extent([16.5,-15.0,35.0,46.5],crs=cartopy.crs.Mercator())
the contour is OK, but without coastlines. And if, finally, I use
ax1.coastlines()
ax1.contourf(xxT,yyT,sst)
ax1.set_extent([16.5,-15.0,35.0,46.5])
only coastlines are shown, not the contour! I'm trying to understand how to proceed, because these problems arose when trying to include this in a GUI with show/hide options for coastlines, features, etc. Just in case: I'm using Python 3.7.4, Cartopy 0.17, proj4 5.2, matplotlib 3.1.1. Thanks!
Thanks to swatchai's suggestion the code now works fine, although I still don't understand why I need to use the transform keyword with the specific PlateCarree projection:
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_axes([0.01, 0.01, 0.98, 0.98],projection=cartopy.crs.Mercator())
ax1.coastlines('10m')
ax1.set_extent([16.5, -15.0, 35.0, 46.5])
ax1.contourf(xxT,yyT,sst,transform=cartopy.crs.PlateCarree())
Here is the result:
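For what it's worth, my understanding (an assumption on my part, not a statement quoted from the Cartopy docs) is that projection= sets the map projection the axes are drawn in, while transform= declares the coordinate reference system the data coordinates are expressed in; lon/lat values in plain degrees correspond to PlateCarree regardless of the axes projection. A self-contained sketch with dummy data standing in for xxT, yyT and sst:
import cartopy
import matplotlib.pyplot as plt
import numpy as np

# Dummy lon/lat grid and field standing in for the NetCDF variables from the question.
lon = np.linspace(-15.0, 16.5, 100)
lat = np.linspace(35.0, 46.5, 80)
xxT, yyT = np.meshgrid(lon, lat)
sst = np.hypot(xxT, yyT - 40.0)

fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_axes([0.01, 0.01, 0.98, 0.98],
                   projection=cartopy.crs.Mercator())        # how the map is drawn
ax1.coastlines('10m')
ax1.set_extent([16.5, -15.0, 35.0, 46.5])                    # lon/lat extent in degrees
ax1.contourf(xxT, yyT, sst,
             transform=cartopy.crs.PlateCarree())            # CRS the data coordinates are in
plt.show()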
I need to encode an image in 16UC1 format, but I receive the error:
cv_bridge.core.CvBridgeError:encoding specified as 16UC1, but image has incompatible type 32FC1
I tried to use the skimage function img_as_uint, but since my image values are not between -1 and 1 it doesn't work. I also tried to "normalize" my values by dividing all of them by the value obtained from np.amax, but the skimage function then only returns a blank image.
Is there a way of achieving this conversion?
This is the original 32FC1 image
With numpy you should be able to:
import numpy as np
img = np.random.normal(0, 1, (300, 300, 3)).astype(np.float32) # simulated image
uimg = img.astype(np.uint16)
You will probably first want to do some kind of normalization if the data isn't already in an unsigned range. Probably something like:
img_normalized = (img - img.min()) / (img.max() - img.min()) * 65535   # scale into the full uint16 range (256**2 would overflow by one)
But your normalization strategy will depend on what you want to accomplish.
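If OpenCV is already part of the pipeline (it usually is when cv_bridge is involved), a possible alternative sketch, assuming img is the 32FC1 array, is to let cv.normalize do the min-max scaling into the uint16 range:
import cv2 as cv
import numpy as np

img = np.random.normal(0, 5, (300, 300)).astype(np.float32)   # simulated 32FC1 image
# Min-max scale into [0, 65535], then cast to unsigned 16-bit.
img_u16 = cv.normalize(img, None, 0, 65535, cv.NORM_MINMAX).astype(np.uint16)
print(img_u16.dtype, img_u16.min(), img_u16.max())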
Thanks for sharing an image. I can visualize it as follows:
import numpy as np
import matplotlib.pyplot as plt
arr = np.load('32FC1_image.npz')
img = arr['arr_0']
# Squeeze out the extra singleton dimensions: they keep matplotlib from recognizing
# the array as an image and may be causing your conversion problems as well.
img = np.squeeze(img)
img_normalized = (img - img.min()) / (img.max() - img.min()) * 65535   # full uint16 range
img_normalized = img_normalized.astype(np.uint16)
plt.imshow(img_normalized)
Try using the normalized 16 bit image.
While saving some experimental data to file, I noticed that saving NxN heatmaps would never complete. Investigating further, it appears to be due to the .pdf extension. If I use, for example, .png it's extremely fast.
Minimum reproducible example:
import matplotlib.pylab as plt
import numpy as np
import seaborn as sbn
N=200
THE_FIGURE = plt.figure(figsize=(8.27, 6), dpi=300)
ax = plt.subplot(1, 1, 1)
sbn.heatmap(np.random.uniform(1, 20, (N, N)), ax=ax)
THE_FIGURE.savefig('image.pdf', bbox_inches='tight', pad_inches=0.1)
This slowdown becomes noticeable even when N = 100.
N = 1000 doesn't finish at all.
Is this normal, and how can I fix it?
Thanks!
It makes sense that for larger grids saving the pdf takes longer than saving the png. This can be seen in the following graph, where the time for saving the pdf and the png is shown as a function of the number of tiles along one axis, N (solid lines). We can also look at the file sizes of the pdf and png, where similar behaviour is observed (dashed lines).
The code to reproduce this is below. Running it on my computer takes about 1:10 minutes.
import matplotlib.pylab as plt
import numpy as np
import seaborn as sns
import time
import os
def f(N, form="pdf"):
    t0 = time.time()
    fig = plt.figure(figsize=(8.27, 6), dpi=300)
    ax = plt.subplot(1, 1, 1)
    sns.heatmap(np.random.uniform(1, 20, (N, N)), ax=ax)
    fig.savefig('image.' + form, bbox_inches='tight', pad_inches=0.1)
    t1 = time.time()
    plt.close(fig)
    s = os.path.getsize('image.' + form)
    return t1 - t0, s

ns = [5, 10, 15, 20, 25, 30] + list(range(40, 210, 20))
pdf = []
png = []
for i, n in enumerate(ns):
    pdf.append(f(n, form="pdf"))
    png.append(f(n, form="png"))
    # print(i, n)
pdf = np.array(pdf);png = np.array(png)
plt.figure()
plt.plot(ns, pdf[:,0], label="pdf")
plt.plot(ns, png[:,0], label="png")
plt.xlabel("N")
plt.ylabel("time [s]")
ax2 = plt.gca().twinx()
ax2.plot(ns, pdf[:,1]/1000., label="pdf (filesize)", ls="--")
ax2.plot(ns, png[:,1]/1000., label="png (filesize)", ls="--")
ax2.set_ylabel("filesize [kByte]")
plt.gcf().legend(ncol=2, loc="upper left", bbox_to_anchor=(0.125,0.98))
plt.subplots_adjust(top=0.85)
plt.show()
The reason also seems intuitive. PNG is a bitmap format: it saves the image as pixels. PDF is a vector format: it saves the image as vector shapes.
Since the PNG always has to store the same number of pixels (~2000x1500 in this case), saving the PNG is the slower of the two for small N (here up to N = 30, i.e. NxN = 900 tiles). But the more tiles there are in the figure, the more shapes need to be stored in the PDF, so saving as PDF eventually takes longer. Assuming that the time it takes to save the pdf file is roughly proportional to the number of tiles to store, time grows quadratically with N, time ~ N**2. Fitting a quadratic polynomial to the data and evaluating the polynomial at N = 1000:
fit = np.polyfit(ns, pdf[:,0], 2)
print( np.poly1d(fit)(1000) )
This gives about 340 seconds, which is 5:40 minutes: the estimated time it would take to save the 1000x1000 matrix.
Note: all data here was produced on an Intel i5 3.5 GHz Windows computer running Python 2.7 and matplotlib 2.1. Using a different computer will of course change the timings.