How to read multiple Nifti files from the same folder with Nibabel? - medical-imaging

I did see a couple of tuts related to Nibabel, that work fine when you are reading only one nii image, but I need to read 167 files from the same folder, and I don't understand how to do it.
I tried using glob as we use it for OpenCV, but it doesn't work similarly with Nibabel.
data = glob.glob('path to my data' + '*.nii.gz')
print(len(data))
print(data)
data = np.asarray(data)
print(data)

As one of the comments above mentions, the data you describe only contains the image filenames not their pixels.
Once you have a list of filenames (or some containers where each filename string is an element) data you can do this:
import numpy as np;
import nibabel as nb;
if "imdata" not in locals():
for i in range ( len ( data ) ):
imname = data [ i ];
if "imdata" not in locals():
imdata = nb.load ( imname ).get_fdata().flatten();
imlen = len ( imdata );
imdata = imdata.reshape ( 1, imlen );
else:
imdata = np.append ( imdata, nb.load ( imname ).get_fdata().flatten().reshape ( 1, imlen ), axis=0 );
This will work if the images are the same size -- you will get a matrix of #images x #pixelsperimage. You can also do this in 3D or 4D of course, depending on what you need to do with the image data.
The use of locals() is a bit much, for me it was handy when I wrote this for not having to re-load images inside the same session.

Related

Using string output from pytesseract to do a vlookup in pandas dataframe

I'm very new to Python, and I'm trying to make a simple image to song title to BPM program. My approach is using pytesseract to generate a string output; and then, using that string output, I wish to vlookup in a dataframe created by pandas. However, it always return zero value even though that song does exist in the data.
import PIL.ImageGrab
from PIL import ImageGrab
import numpy as np
import pytesseract
import pandas as pd
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
def getTitleImage(left, top, width, height):
printscreen_pil = ImageGrab.grab((left, top, left + width, top + height))
printscreen_numpy = np.array(printscreen_pil.getdata(), dtype='uint8') \
.reshape((printscreen_pil.size[1], printscreen_pil.size[0], 3))
return printscreen_numpy
# Printscreen:
titleImage = getTitleImage(x, y, w, h)
# pytesseract to string:
songTitle = pytesseract.image_to_string(titleImage)
print('Name of the song: ', songTitle)
# Importing the csv data via pandas.
songTable = pd.read_csv(r'C:\Users\leech\Desktop\songList.csv')
# A simple vlookup formula that return the BPM of the song by taking data from the same row.
bpmSong = songTable[songTable['Song Title'] == songTitle]['BPM'].sum()
print('The BPM of the song is: ', bpmSong)
Output:
Name of the song: Macarena
The BPM of the song is: 0
However, when I tried to forcefully provide the string to the songTitle variable, it works:
songTitle = 'Macarena'
print('Name of the song: ', songTitle)
songTable = pd.read_csv(r'C:\Users\leech\Desktop\songList.csv')
bpmSong = songTable[songTable['Song Title'] == songTitle]['BPM'].sum()
print('The BPM of the song is: ', bpmSong)
Output:
Name of the song: Macarena
The BPM of the song is: 103
I have checked the string generated from pytesseract: It has no extra space in the front or the back, totally identical to the forced string, but they still produce different results. What could be the problem?
I found the answer.
It is because the songTitle coming from:
songTitle = pytesseract.image_to_string(titleImage)
...is actually 'Macarena\n' instead of 'Macarena'.
They might look the same after print out, except the former will create a new line after it.
A great lesson learn for me.

Getting an IndexError when trying to run pcolormesh from a pandas DataFrame

I'm trying to generate a pcolormesh plot from a large dataset, where the rows are in units of hertz, the rows are individual files, and the body is an array of magnitude values per file for each frequency. My DataFrame gets constructed correctly with correct labels, but when I pass it in to pcolormesh, it throws the exception "arrays used as indices must be of integer (or boolean) type". The code I am attaching reflects a conversion of the frequency array to an integer array using .astype(int). Note, if I convert the PSD_array (magnitudes) to integers, it DOES work (but isn't helpful), but it doesn't like it otherwise. I also played around with other pcolormesh generations using decimals as the body of the DataFrame and it worked fine.
Ideas would be lovely, I'll keep working on it.
Code: (Note: specific call file paths redacted).
'''def file_List():
files = [file for file in os.listdir('###)]
file_list = []
for file in files:
file_list += [file]
#print(natsorted(file_list))
#print(len(file_list))
return(natsorted(file_list))
###Using Fast Fourier Transform, take in file list generated from
###file_List program and perform FFT on each file.
###We read the files by adding them to the file directory. Could improve
###by making an overarching program that runs everything with an input
###file directory.
def FFT():
'''
runs through FFT for the files in a file list as determined by the
file_List() program.
'''
file_list = file_List() #runs file_List() program and saves
#the list of files as a variable.
df = pd.DataFrame()
freq_array = np.empty((0,204800))
PSD_array = np.empty((0,204800))
print(len(file_list))
count = 0
for file in file_list:
while count < 10:
file_read = pd.read_csv('###'+file,skiprows=22,sep = '\t')
df = pd.DataFrame(file_read, columns = ['X_Value','Acceleration'])
#print(df.head())
q = df['Acceleration'] #data set input
n = len(df['Acceleration']) #number of data points
dt = 2/len(df['X_Value']) #
f_hat = np.fft.fft(q,n) #Runs FFT
PSD = f_hat * np.conj(f_hat) / n #Power Spectral Density
freq = (1/(dt*n)) * np.arange(n) #Creates x axis of frequencies
freq_array = np.append(freq_array,np.array([freq]),axis=0)
PSD_array = np.append(PSD_array,np.array([PSD]),axis=0)
count += 1
#trans_freq = np.transpose(freq_array)
#trans_PSD = np.transpose(PSD_array)
print(freq_array)
int_freq = freq_array.astype(int)
print(int_freq)
#PSD_int = PSD_array.astype(int)
PSD_df = pd.DataFrame(PSD_array, index = np.arange(len(PSD_array)), columns = int_freq[0])
#print(np.arange(len(PSD_array)))
print(PSD_df)
return(PSD_df)
def heatmap(df):
'''
Constructs a heatmap given an input dataframe
'''
plt.pcolormesh(df)
'''

Getting data from odo.resource(source) to odo.resource(target)

I'm trying to extend the odo library with functionality to convert a GDAL dataset (raster with spatial information) to a NetCDF file.
Reading in the gdal dataset goes fine. But in the creation stage of the netcdf I need some metadata of the gdal dataset (metadata that is not know yet when calling odo.odo(source,target) ). How could I achieve this?
a short version of my code so far:
import odo
from odo import resource, append
import gdal
import netCDF4 as nc4
import numpy as np
#resource.register('.+\.tif')
def resource_gdal(uri, **kwargs):
ds = gdal.Open(uri)
# metadata I need to transfer to netcdf
b = ds.GetGeoTransform() #bbox, interval
return ds
#resource.register('.+\.nc')
def resource_netcdf(uri, dshape=None, **kwargs):
ds = nc4.Dataset(uri,'w')
# create lat lon dimensions and variables
ds.createDimension(lat, dshape[0].val)
ds.createDimension(lon, dshape[1].val)
lat = ds.createVariable('lat','f4', ('lat',))
lon = ds.createVariable('lon','f4', ('lon',))
# create a range from the **gdal metadata**
lat_array = np.arange(dshape[0].val)*b[1]+b[0]
lon_array = np.arange(dshape[1].val)*b[5]+b[3]
# assign the range to the netcdf variable
lat[:] = lat_array
lon[:] = lon_array
# create the variable which will hold the gdal data
data = ds.createVariable('data', 'f4', ('lat', 'lon',))
return data
#append.register(nc4.Variable, gdal.Dataset)
def append_gdal_to_nc4(tgt, src, **kwargs):
arr = src.ReadAsArray()
tgt[:] = arr
return tgt
Thanks!
I don't have much experience with odo, but from browsing the source code and docs it looks like resource_netcdf() should not be involved in translating gdal data to netcdf. Translating should be the job of a gdal_to_netcdf() function decorated by convert.register. In such a case, the gdal.Dataset object returned by resource_gdal would have all sufficient information (georeferencing, pixel size) to make a netcdf.

Exporting a 3D numpy to a VTK file for viewing in Paraview/Mayavi

For those that want to export a simple 3D numpy array (along with axes) to a .vtk (or .vtr) file for post-processing and display in Paraview or Mayavi there's a little module called PyEVTK that does exactly that. The module supports structured and unstructured data etc..
Unfortunately, even though the code works fine in unix-based systems I couldn't make it work (keeps crashing) on any windows installation which simply makes things complicated. Ive contacted the developer but his suggestions did not work
Therefore my question is:
How can one use the from vtk.util import numpy_support function to export a 3D array (the function itself doesn't support 3D arrays) to a .vtk file? Is there a simple way to do it without creating vtkDatasets etc etc?
Thanks a lot!
It's been forever and I had entirely forgotten asking this question but I ended up figuring it out. I've written a post about it in my blog (PyScience) providing a tutorial on how to convert between NumPy and VTK. Do take a look if interested:
pyscience.wordpress.com/2014/09/06/numpy-to-vtk-converting-your-numpy-arrays-to-vtk-arrays-and-files/
It's not a direct answer to your question, but if you have tvtk (if you have mayavi, you should have it), you can use it to write your data to vtk format. (See: http://code.enthought.com/projects/files/ETS3_API/enthought.tvtk.misc.html )
It doesn't use PyEVTK, and it supports a broad range of data sources (more than just structured and unstructured grids), so it will probably work where other things aren't.
As a quick example (Mayavi's mlab interface can make this much less verbose, especially if you're already using it.):
import numpy as np
from enthought.tvtk.api import tvtk, write_data
data = np.random.random((10,10,10))
grid = tvtk.ImageData(spacing=(10, 5, -10), origin=(100, 350, 200),
dimensions=data.shape)
grid.point_data.scalars = np.ravel(order='F')
grid.point_data.scalars.name = 'Test Data'
# Writes legacy ".vtk" format if filename ends with "vtk", otherwise
# this will write data using the newer xml-based format.
write_data(grid, 'test.vtk')
And a portion of the output file:
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 10 10 10
SPACING 10 5 -10
ORIGIN 100 350 200
POINT_DATA 1000
SCALARS Test%20Data double
LOOKUP_TABLE default
0.598189 0.228948 0.346975 0.948916 0.0109774 0.30281 0.643976 0.17398 0.374673
0.295613 0.664072 0.307974 0.802966 0.836823 0.827732 0.895217 0.104437 0.292796
0.604939 0.96141 0.0837524 0.498616 0.608173 0.446545 0.364019 0.222914 0.514992
...
...
TVTK of Mayavi has a beautiful way of writing vtk files. Here is a test example I have written for myself following #Joe and tvtk documentation. The advantage it has over evtk, is the support for both ascii and html.Hope it will help other people.
from tvtk.api import tvtk, write_data
import numpy as np
#data = np.random.random((3, 3, 3))
#
#i = tvtk.ImageData(spacing=(1, 1, 1), origin=(0, 0, 0))
#i.point_data.scalars = data.ravel()
#i.point_data.scalars.name = 'scalars'
#i.dimensions = data.shape
#
#w = tvtk.XMLImageDataWriter(input=i, file_name='spoints3d.vti')
#w.write()
points = np.array([[0,0,0], [1,0,0], [1,1,0], [0,1,0]], 'f')
(n1, n2) = points.shape
poly_edge = np.array([[0,1,2,3]])
print n1, n2
## Scalar Data
#temperature = np.array([10., 20., 30., 40.])
#pressure = np.random.rand(n1)
#
## Vector Data
#velocity = np.random.rand(n1,n2)
#force = np.random.rand(n1,n2)
#
##Tensor Data with
comp = 5
stress = np.random.rand(n1,comp)
#
#print stress.shape
## The TVTK dataset.
mesh = tvtk.PolyData(points=points, polys=poly_edge)
#
## Data 0 # scalar data
#mesh.point_data.scalars = temperature
#mesh.point_data.scalars.name = 'Temperature'
#
## Data 1 # additional scalar data
#mesh.point_data.add_array(pressure)
#mesh.point_data.get_array(1).name = 'Pressure'
#mesh.update()
#
## Data 2 # Vector data
#mesh.point_data.vectors = velocity
#mesh.point_data.vectors.name = 'Velocity'
#mesh.update()
#
## Data 3 additional vector data
#mesh.point_data.add_array( force)
#mesh.point_data.get_array(3).name = 'Force'
#mesh.update()
mesh.point_data.tensors = stress
mesh.point_data.tensors.name = 'Stress'
# Data 4 additional tensor Data
#mesh.point_data.add_array(stress)
#mesh.point_data.get_array(4).name = 'Stress'
#mesh.update()
write_data(mesh, 'polydata.vtk')
# XML format
# Method 1
#write_data(mesh, 'polydata')
# Method 2
#w = tvtk.XMLPolyDataWriter(input=mesh, file_name='polydata.vtk')
#w.write()
I know it is a bit late and I do love your tutorials #somada141. This should work too.
def numpy2VTK(img, spacing=[1.0, 1.0, 1.0]):
# evolved from code from Stou S.,
# on http://www.siafoo.net/snippet/314
# This function, as the name suggests, converts numpy array to VTK
importer = vtk.vtkImageImport()
img_data = img.astype('uint8')
img_string = img_data.tostring() # type short
dim = img.shape
importer.CopyImportVoidPointer(img_string, len(img_string))
importer.SetDataScalarType(VTK_UNSIGNED_CHAR)
importer.SetNumberOfScalarComponents(1)
extent = importer.GetDataExtent()
importer.SetDataExtent(extent[0], extent[0] + dim[2] - 1,
extent[2], extent[2] + dim[1] - 1,
extent[4], extent[4] + dim[0] - 1)
importer.SetWholeExtent(extent[0], extent[0] + dim[2] - 1,
extent[2], extent[2] + dim[1] - 1,
extent[4], extent[4] + dim[0] - 1)
importer.SetDataSpacing(spacing[0], spacing[1], spacing[2])
importer.SetDataOrigin(0, 0, 0)
return importer
Hope it helps!
Here's a SimpleITK version with the function load_itk taken from here:
import SimpleITK as sitk
import numpy as np
if len(sys.argv)<3:
print('Wrong number of arguments.', file=sys.stderr)
print('Usage: ' + __file__ + ' input_sitk_file' + ' output_sitk_file', file=sys.stderr)
sys.exit(1)
def quick_read(filename):
# Read image information without reading the bulk data.
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(filename)
file_reader.ReadImageInformation()
print('image size: {0}\nimage spacing: {1}'.format(file_reader.GetSize(), file_reader.GetSpacing()))
# Some files have a rich meta-data dictionary (e.g. DICOM)
for key in file_reader.GetMetaDataKeys():
print(key + ': ' + file_reader.GetMetaData(key))
def load_itk(filename):
# Reads the image using SimpleITK
itkimage = sitk.ReadImage(filename)
# Convert the image to a numpy array first and then shuffle the dimensions to get axis in the order z,y,x
data = sitk.GetArrayFromImage(itkimage)
# Read the origin of the ct_scan, will be used to convert the coordinates from world to voxel and vice versa.
origin = np.array(list(reversed(itkimage.GetOrigin())))
# Read the spacing along each dimension
spacing = np.array(list(reversed(itkimage.GetSpacing())))
return data, origin, spacing
def convert(data, output_filename):
image = sitk.GetImageFromArray(data)
writer = sitk.ImageFileWriter()
writer.SetFileName(output_filename)
writer.Execute(image)
def wait():
print('Press Enter to load & convert or exit using Ctrl+C')
input()
quick_read(sys.argv[1])
print('-'*20)
wait()
data, origin, spacing = load_itk(sys.argv[1])
convert(sys.argv[2])

Empty outputs with python GDAL

Hello im new to Gdal and im struggling a with my codes. Everything seems to go well in my code mut the output bands at the end is empty. The no data value is set to 256 when i specify 255, so I don't really know whats wrong. Thanks any help will be appreciated!!!
Here is my code
from osgeo import gdal
from osgeo import gdalconst
from osgeo import osr
from osgeo import ogr
import numpy
#graticule
src_ds = gdal.Open("E:\\NFI_photo_plot\\photoplotdownloadAllCanada\\provincial_merge\\Aggregate\\graticule1.tif")
band = src_ds.GetRasterBand(1)
band.SetNoDataValue(0)
graticule = band.ReadAsArray()
print('graticule done')
band="none"
#Biomass
dataset1 = gdal.Open("E:\\NFI_photo_plot\\photoplotdownloadAllCanada\provincial_merge\\Aggregate\\Biomass_NFI.tif")
band1 = dataset1.GetRasterBand(1)
band1.SetNoDataValue(-1)
Biomass = band1.ReadAsArray()
maskbiomass = numpy.greater(Biomass, -1).astype(int)
print("biomass done")
Biomass="none"
band1="none"
dataset1="none"
#Baseline
dataset2 = gdal.Open("E:\\NFI_photo_plot\\Baseline\\TOTBM_250.tif")
band2 = dataset2.GetRasterBand(1)
band2.SetNoDataValue(0)
baseline = band2.ReadAsArray()
maskbaseline = numpy.greater(baseline, 0).astype(int)
print('baseline done')
baseline="none"
band2="none"
dataset2="none"
#sommation
biosource=(graticule+maskbiomass+maskbaseline)
biosource1=numpy.uint8(biosource)
biosource="none"
#Écriture
dst_file="E:\\NFI_photo_plot\\photoplotdownloadAllCanada\\provincial_merge\\Aggregate\\Biosource.tif"
dst_driver = gdal.GetDriverByName('GTiff')
dst_ds = dst_driver.Create(dst_file, src_ds.RasterXSize,
src_ds.RasterYSize, 1, gdal.GDT_Byte)
#projection
dst_ds.SetProjection( src_ds.GetProjection() )
dst_ds.SetGeoTransform( src_ds.GetGeoTransform() )
outband=dst_ds.GetRasterBand(1)
outband.WriteArray(biosource1,0,0)
outband.SetNoDataValue(255)
biosource="none"
graticule="none"
A few pointers:
Where you have ="none", these need to be = None to close/cleanup the objects, otherwise you are setting the objects to an array of characters: n o n e, which is not what you intend to do.
Why do you have band1.SetNoDataValue(-1), while other NoData values are 0? Is this data source signed or unsigned? If unsigned, then -1 doesn't exist.
When you open rasters with gdal.Open without the access option, it defaults to gdal.GA_ReadOnly, which means your subsequent SetNoDataValue calls do nothing. If you want to modify the dataset, you need to use gdal.GA_Update as your second parameter to gdal.Open.
Another strategy to create a new raster is to use driver.CreateCopy; see the tutorial for details.