I have a 3D NumPy array with the shape (128, 128, 384). Let's call this array "S". This array contains only binary values, either 0s or 1s.
Now I want to get a 3D plot of this array such that I have a grid of indices (x, y, z), and for every entry of S that is 1, a point is plotted at the corresponding indices in the 3D grid. For example, if S[120,50,36] is 1, I should get a dot at that point in the grid.
So far I have tried many methods, but I have only been able to implement one that works, and it is extremely slow and hence useless in my case. That method is to iterate over the entire array and use a scatter plot. Here is a snippet of my code:
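import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib
import matplotlib.pyplot as plt

# S is the (128, 128, 384) binary array described above
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# One scatter call per nonzero entry: this is what makes it slow
for i in range(0,128):
    for j in range(0,128):
        for k in range(0,384):
            if S[i,j,k]==1:
                ax.scatter(i,j,k,zdir='z', marker='o')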
Please suggest any method that would be faster than this.
Also, please note that I am not trying to plot the entries of my array. The entries only act as a condition that tells me whether to plot a point at the corresponding indices.
Thank you very much
You can use numpy.where.
In your example, remove the for loops and just use:
i, j, k = np.where(S==1)
ax.scatter(i,j,k,zdir='z', marker='o')
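For reference, a self-contained sketch of the same idea, assuming a randomly generated sparse array in place of the real S (the 0.1% fill density is arbitrary). np.where returns one index array per axis, and a single scatter call then receives all coordinates at once instead of one call per point:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  registers "3d" on older Matplotlib

rng = np.random.default_rng(0)
# Hypothetical stand-in for S: a sparse random 0/1 array with the same shape
S = (rng.random((128, 128, 384)) < 0.001).astype(np.uint8)

# np.where on a 3-D condition returns one index array per axis
i, j, k = np.where(S == 1)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(i, j, k, marker="o", s=1)  # a single call draws every point
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
# plt.show()  # uncomment to display the figure interactively
```

For a 0/1 array, np.nonzero(S) returns the same three index arrays.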
I am using an array (51x51x181) to do a 3D interpolation in Python (and I can calculate any point in between if needed).
I need to reduce the size of the array and would like to do this with the least amount of error possible.
Attached is an example, with the error function I would like to improve on. The number of values in the array should stay the same; however, the Angles and Shifts in the example do not have to be equally spaced.
import numpy as np
from scipy.interpolate import RegularGridInterpolator
import itertools

# Coarse 10 x 10 grid that should represent the function with as little error as possible
Angles=np.linspace(0,360,10)
Shifts=np.linspace(0,100,10)
Data=np.sin(np.deg2rad(Angles[:,None]+Shifts[None,:]))
interp = RegularGridInterpolator((Angles, Shifts),Data, bounds_error=False, fill_value=None)

def errorfunc():
    # RMS error of the interpolator against the exact function on a finer 50 x 50 grid
    Angles=np.linspace(0,360,50)
    Shifts=np.linspace(0,100,50)
    Function_Results=np.sin(np.deg2rad(Angles[:,None]+Shifts[None,:]).flatten())
    Data_interp=interp(np.array(list(itertools.product(Angles,Shifts))))
    Error=np.sqrt(np.mean(np.square(Function_Results-Data_interp)))
    return(Error)
I could not find a suitable optimizer in scipy (I tried some, but they performed poorly). Is there a standard way to do this?
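One way to frame this as an optimization problem is sketched below; it is a hedged illustration, not a standard method. It keeps 10 nodes per axis (so the number of values stays the same), pins the first and last node of each axis to the ends of the range (an assumption), treats the interior node positions as free parameters, and minimizes the same RMS error as errorfunc with scipy.optimize.minimize. The choice of Powell, the evaluation budget, and the penalty for non-increasing axes are all arbitrary:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import minimize

N = 10  # nodes per axis, kept fixed; only their positions are optimized

def target(a, s):
    # The tabulated function from the example: sin(angle + shift), in degrees
    return np.sin(np.deg2rad(a + s))

# Fine 50 x 50 evaluation grid, mirroring errorfunc()
ang_f, sh_f = np.meshgrid(np.linspace(0, 360, 50), np.linspace(0, 100, 50), indexing="ij")
eval_pts = np.column_stack([ang_f.ravel(), sh_f.ravel()])
eval_vals = target(ang_f, sh_f).ravel()

def unpack(params):
    # Endpoints stay pinned; the N - 2 interior nodes per axis are free
    angles = np.concatenate(([0.0], np.sort(params[: N - 2]), [360.0]))
    shifts = np.concatenate(([0.0], np.sort(params[N - 2 :]), [100.0]))
    return angles, shifts

def rms_error(params):
    angles, shifts = unpack(params)
    if np.any(np.diff(angles) <= 0) or np.any(np.diff(shifts) <= 0):
        return 1e6  # penalty: the interpolator needs strictly increasing axes
    data = target(angles[:, None], shifts[None, :])
    interp = RegularGridInterpolator((angles, shifts), data, bounds_error=False, fill_value=None)
    return float(np.sqrt(np.mean((interp(eval_pts) - eval_vals) ** 2)))

# Start from the equally spaced grid used in the example
x0 = np.concatenate((np.linspace(0, 360, N)[1:-1], np.linspace(0, 100, N)[1:-1]))
initial = rms_error(x0)
res = minimize(rms_error, x0, method="Powell", options={"maxfev": 2000})
print(f"equally spaced: {initial:.4g}, optimized: {res.fun:.4g}")
```

The result depends on the starting grid and the optimizer settings, so treat the printed numbers as illustrative only.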
I have a set of simulation data on which I want to perform an FFT. I am using Python (scipy.fftpack and matplotlib) to do this. However, the FFT looks strange, so I don't know if I am missing something in my code. I would appreciate any help.
Original data: [figure: time-varying data]
FFT: [figure: FFT]
Code for the FFT calculation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data[['mz ()']].to_numpy()
time=data[['# t (s)']].to_numpy()
ax.plot(time[:300],mz_res[:300])
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res[:300])
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
fig2, ax_fft=plt.subplots()
ax_fft.plot(frequencies[:150],power[:150]) # taking just half of the frequency range
I am only plotting the first 300 data points because the rest is not important.
Am I doing something wrong here? I was expecting single-frequency peaks, not what I got. Thanks!
Link for the input file:
Pastebin
EDIT
It turns out the mistake was in the conversion of the DataFrame to a NumPy array. For a reason I have yet to understand, if I convert a DataFrame to a NumPy array, it is converted to an array of arrays, i.e., each element of the resulting array is itself an array of a single element (a 2-D array of shape (n, 1)). When I change the code to:
mz_res=data['mz ()'].to_numpy()
so that a pandas Series is converted to a NumPy array, the FFT behaves as expected and I get single-frequency peaks.
I am posting this here in case someone else finds it useful. Lesson learned: converting a pandas Series to a NumPy array yields a different result (a 1-D array) than converting a pandas DataFrame (a 2-D array, even with a single column).
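The difference can be reproduced with a small synthetic signal (a hypothetical 0.05 cycles/sample sine standing in for the 'mz ()' column). Double brackets select a one-column DataFrame, whose to_numpy() result is 2-D with shape (n, 1); fftpack.fft transforms along the last axis by default, which here has length 1, so each "transform" is just the sample itself:

```python
import numpy as np
import pandas as pd
import scipy.fftpack as fftpack

# Hypothetical single-frequency signal standing in for the 'mz ()' column
t = np.arange(300)
df = pd.DataFrame({"mz ()": np.sin(2 * np.pi * 0.05 * t)})

as_frame = df[["mz ()"]].to_numpy()  # double brackets: one-column DataFrame
as_series = df["mz ()"].to_numpy()   # single brackets: Series

print(as_frame.shape, as_series.shape)  # (300, 1) (300,)

# fftpack.fft works along the last axis by default; for shape (300, 1) that
# axis has length 1, so the "FFT" of each row is the sample value itself
fft_frame = fftpack.fft(as_frame)
fft_series = fftpack.fft(as_series)
print(np.allclose(fft_frame[:, 0], as_series))  # True

# The 1-D transform peaks at the signal frequency (in cycles per sample)
freqs = fftpack.fftfreq(as_series.size)
peak = freqs[np.argmax(np.abs(fft_series[: as_series.size // 2]))]
print(peak)
```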
Solution:
Convert a pandas Series to a NumPy array instead of a pandas DataFrame.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data['mz ()'].to_numpy() #series to array
time=data[['# t (s)']].to_numpy() #dataframe to array
ax.plot(time,mz_res)
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res)
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size, d=time[1,0]-time[0,0]) # sample spacing in s (assumes uniform sampling), so frequencies are in Hz
indices=np.where(frequencies>0)
freq_pos=frequencies[indices]
power_pos=power[indices]
fig2, ax_fft=plt.subplots()
ax_fft.plot(freq_pos,power_pos) # taking just half of the frequency range
ax_fft.set_title("FFT")
ax_fft.set_xlabel('Frequency (Hz)')
ax_fft.set_ylabel('FFT Amplitude')
ax_fft.set_yscale('linear')
Yields:
[figure: time dependence]
[figure: FFT]
The method plt.hist() in pyplot has a way to create a 'step-like' plot style when calling
plt.hist(data, histtype='step')
but the 'ordinary' methods that plot raw data without processing (plt.plot(), plt.scatter(), etc.) apparently do not have style options to obtain the same result. My goal is to plot a given set of points using that style, without making a histogram of these points.
Is that achievable with standard library methods for plotting a given 2-D set of points?
I also think there is at least one hack (generating a fake distribution whose histogram equals our data) and a 'low-level' solution (drawing each segment manually), but neither of these seems favorable.
Maybe you are looking for drawstyle="steps".
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.cumsum(np.random.randn(10))
plt.plot(data, drawstyle="steps")
plt.show()
Note that this is slightly different from histograms, because the lines do not go to zero at the ends.
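As a sketch of two related options (not part of the answer above): plt.step is a thin wrapper around plot that sets drawstyle to "steps-pre", "steps-mid" or "steps-post" depending on where, and plt.stairs (Matplotlib 3.4 and later) draws values between explicit edges and, with a scalar baseline, drops to the baseline at both ends, which is closer to hist(..., histtype='step'). The data and edges below are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
values = np.cumsum(rng.standard_normal(10))  # arbitrary data, one value per step
edges = np.arange(len(values) + 1)           # hypothetical edges around each value

fig, (ax1, ax2) = plt.subplots(2)

# step() with where="mid" is plot(..., drawstyle="steps-mid")
(line,) = ax1.step(np.arange(len(values)), values, where="mid")

# stairs() closes the outline down to the baseline at both ends,
# like a histogram drawn with histtype="step"
patch = ax2.stairs(values, edges, baseline=0)
# plt.show()  # uncomment to display the figure interactively
```

Note that drawstyle="steps" in plot is equivalent to "steps-pre".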
Why is it printing the bins from the histogram?
Shouldn't the semicolon suppress it?
In [1]
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all";
In [2]
%matplotlib inline
data ={'first':np.random.rand(100),
'second':np.random.rand(100)}
fig, axes = plt.subplots(2)
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);
You've set InteractiveShell.ast_node_interactivity = "all", so all nodes have AST interactivity enabled. That is why you get the values of data = {...}.
Also, ; only works for the last top-level expression. axes[idx].hist(data[k], bins=20); is not a top-level expression, because it is nested in the for loop; the last top-level node is the for loop itself, which is a statement.
Simply add a final no-op statement, ending it with ;
%matplotlib inline
data ={'first':np.random.rand(100),
'second':np.random.rand(100)};
fig, axes = plt.subplots(2);
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20)
pass; # or None; 0; "foo"; ...
Then you won't get any output.
Use the %%ast magic from codetransformer to quickly see the AST of an expression.
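If codetransformer is not available, the standard library ast module shows the same structure. In this sketch (the cell source is copied from the question with the magic line removed; it is only parsed, not executed), the last top-level node is the For statement, and the hist call is an expression nested inside it:

```python
import ast

cell = """
data ={'first':np.random.rand(100),
'second':np.random.rand(100)}
fig, axes = plt.subplots(2)
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);
"""
tree = ast.parse(cell)
print([type(node).__name__ for node in tree.body])  # ['Assign', 'Assign', 'For']

# The hist call lives inside the loop body, not at the top level
inner = tree.body[-1].body[0]
print(type(inner).__name__)  # Expr
```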
If you read the documentation, you will see exactly what it returns: a three-item tuple, described below. You can display the documentation in the notebook by placing a ? at the end of the call to hist. It looks like your InteractiveShell setting is what makes the output display. Normally, yes, a semicolon would suppress the output, although inside a loop it would be unnecessary.
Returns
n : array or list of arrays
    The values of the histogram bins. See normed and weights
    for a description of the possible semantics. If input x is an
    array, then this is an array of length nbins. If input is a
    sequence of arrays [data1, data2, ...], then this is a list of
    arrays with the values of the histograms for each of the arrays
    in the same order.
bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin). Always a single array even when multiple data
    sets are passed in.
patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.
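A short sketch showing the three return values from the question's loop; the data comes from a seeded generator rather than np.random.rand, and printing inside the loop is only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = {"first": rng.random(100), "second": rng.random(100)}

fig, axes = plt.subplots(2)
for idx, k in enumerate(data):
    # hist returns the (n, bins, patches) tuple documented above
    n, bins, patches = axes[idx].hist(data[k], bins=20)
    print(k, len(n), len(bins), len(patches))  # 20 counts, 21 edges, 20 patches
```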
I generated the attached image using matplotlib (PNG format). I would like to use EPS or PDF, but I find that with all the data points, the figure is really slow to render on screen. Other than plotting less of the data, is there any way to optimize it so that it loads faster?
I think you have three options:
As you mentioned yourself, you can plot fewer points. For the plot you showed in your question I think it would be fine to only plot every other point.
As @tcaswell stated in his comment, you can use a line instead of points, which will be rendered more efficiently.
You could rasterize the blue dots. Matplotlib allows you to selectively rasterize single artists, so if you pass rasterized=True to the plotting command, you will get a bitmapped version of the points in the output file. This will be way faster to load, at the price of limited zooming due to the resolution of the bitmap. (Note that the axes and all the other elements of the plot will remain as vector graphics and font elements.)
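The three options can be compared by saving each variant to an in-memory PDF and checking its size. This is a sketch with hypothetical data (50,000 noisy points along a sine); exact sizes depend on the Matplotlib version, marker size and dpi:

```python
import io
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical dense data standing in for the plotted points
x = np.linspace(0, 10, 50_000)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

def pdf_size(draw):
    """Draw onto fresh axes with `draw` and return the saved PDF size in bytes."""
    fig, ax = plt.subplots()
    draw(ax)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    plt.close(fig)
    return buf.getbuffer().nbytes

sizes = {
    "all points": pdf_size(lambda ax: ax.scatter(x, y, s=2)),
    "every other point": pdf_size(lambda ax: ax.scatter(x[::2], y[::2], s=2)),
    "line": pdf_size(lambda ax: ax.plot(x, y, lw=0.5)),
    "rasterized points": pdf_size(lambda ax: ax.scatter(x, y, s=2, rasterized=True)),
}
for name, nbytes in sizes.items():
    print(f"{name:>18}: {nbytes / 1024:.0f} KiB")
```

The resolution of the rasterized part is controlled by the dpi argument of savefig.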
First, if you want to show a "trend" in your plot and the x, y arrays you are plotting are "huge", you could randomly subsample your x, y arrays, keeping a fraction of your data:
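import numpy as np
import matplotlib.pyplot as plt

# x, y are the (large) arrays being plotted; random data here only for illustration
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)

fraction = 0.50  # keep roughly this fraction of the points
keep = np.random.rand(len(x)) < fraction  # random boolean mask, one draw per point
x_resampled = x[keep]
y_resampled = y[keep]
plt.scatter(x_resampled, y_resampled, s=6)
plt.show()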
Second, have you considered using a log scale on the x-axis to increase visibility?
In this example, only the plotting area is rasterized; the axes are still in vector format:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
plt.scatter(x, y, marker='x', rasterized=True)  # only the scatter points become a bitmap
plt.savefig("rasterized.pdf", format='pdf')