Can anyone explain this code output? (numpy)

1. I have tried to understand this code but I couldn't. Would you help me?
import numpy as np

a = np.arange(5)
hist, bin_edges = np.histogram(a, density=True)
hist
2. Why is the output like this?
array([0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5])

The default for the bins argument to np.histogram is 10, so the histogram counts how many of your array elements fall into each of 10 bins. In this case a = np.array([0, 1, 2, 3, 4]). If we are creating a histogram with 10 bins, we break the interval 0-4 (inclusive) into 10 equal bins. This gives us (note that 11 end points give us 10 bins):
np.linspace(0, 4, 11) = array([0. , 0.4, 0.8, 1.2, 1.6, 2. , 2.4, 2.8, 3.2, 3.6, 4. ])
We now just need to see which bins your elements in the array a fall into. We can count them as follows:
[1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
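You can verify these raw counts by calling np.histogram without density=True (the default returns plain counts):
import numpy as np

a = np.arange(5)
counts, edges = np.histogram(a)  # 10 bins by default, raw counts
print(counts)  # [1 0 1 0 0 1 0 1 0 1]
print(edges)   # [0.  0.4 0.8 1.2 1.6 2.  2.4 2.8 3.2 3.6 4. ]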
Now this is still not exactly what the output is. The density=True argument states (from the docs): "If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1."
Each non-empty bin (of height 0.5) has a width of 0.4, so 5 × 0.5 × 0.4 = 1, as this argument requires.
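As a quick check, reusing hist and bin_edges from the snippet above, the density values do integrate to 1 over the bins:
np.sum(hist * np.diff(bin_edges))  # 5 bins of height 0.5 times width 0.4 = 1.0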

np.arange(5) generates a numpy array of 5 evenly spaced elements: array([0, 1, 2, 3, 4]).
np.histogram(a, density=True) returns the histogram values and the bin edges computed from your array a using 10 bins (the default).
bin_edges gives the edges of the bins, while hist gives the number of occurrences in each bin. Since you set density=True, the occurrences are normalized (the integral over the range is 1).
See the np.histogram documentation for more information.

Hint: when you call np.histogram, the default number of bins is 10, which is why your output has 10 elements.

Related

Numpy Random Choice with Non-regular Array Size

I'm making an array of sums of random choices from a negative binomial distribution (nbd), with each sum being of non-regular length. Right now I implement it as follows:
import numpy as np
from numpy.random import default_rng

rng = default_rng()
nbd = rng.negative_binomial(1, 0.5, int(1e6))
gmc = [12, 35, 4, 67, 2]
n_pp = np.empty(len(gmc))
for i in range(len(gmc)):
    n_pp[i] = np.sum(rng.choice(nbd, gmc[i]))
This works, but it's very slow when I run it over my actual data (gmc has about 1e6 elements), and I would like to vary it for multiple values of n and p in the nbd (in this example they're set to 1 and 0.5, respectively).
I'd like to work out a Pythonic way to do this that eliminates the loop, but I'm not sure it's possible. I want to keep default_rng for its better random generation compared with the older np.random.choice, if possible.
The distribution of the sum of m samples from the negative binomial distribution with parameters (n, p) is the negative binomial distribution with parameters (m*n, p). So instead of summing random selections from a large, precomputed sample of negative_binomial(1, 0.5), you can generate your result directly with negative_binomial(gmc, 0.5):
In [68]: gmc = [12, 35, 4, 67, 2]
In [69]: npp = rng.negative_binomial(gmc, 0.5)
In [70]: npp
Out[70]: array([ 9, 34, 1, 72, 7])
(The negative_binomial method will broadcast its inputs, so we can pass gmc as an argument to generate all the samples with one call.)
More generally, if you want to vary the n that is used to generate nbd, you would multiply that n by the corresponding element in gmc and pass the product to rng.negative_binomial.
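For example, here is a minimal sketch of that more general case; n_per_sum and p are hypothetical names for whatever parameters you choose for the underlying nbd:
import numpy as np
from numpy.random import default_rng

rng = default_rng()
gmc = np.array([12, 35, 4, 67, 2])
n_per_sum = 2   # hypothetical: the n of each individual nbd draw
p = 0.5
# summing gmc[i] draws from NB(n_per_sum, p) is one draw from NB(gmc[i] * n_per_sum, p)
n_pp = rng.negative_binomial(gmc * n_per_sum, p)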

Count number of unique colours in image [duplicate]

This question already has answers here: Most dominant color in RGB image - OpenCV / NumPy / Python (3 answers). Closed 3 years ago.
I am trying to count the number of unique colours in an image. I have some code that I think should work, but when I run it on an image it says I have 252 different colours out of a possible 16,777,216. That seems wrong: given the image is BGR, shouldn't there be many more distinct colours (thousands, not hundreds)?
import cv2
import imutils
import numpy as np

def count_colours(src):
    unique, counts = np.unique(src, return_counts=True)
    print(counts.size)
    return counts.size

src = cv2.imread('../../images/di8.jpg')
src = imutils.resize(src, height=300)
count_colours(src)  # outputs 252 different colours!? only?
Is that value correct? And if not, how can I fix my function count_colours()?
Source image:
Edit: is this correct?
def count_colours(src):
    unique, counts = np.unique(src.reshape(-1, src.shape[-1]), axis=0, return_counts=True)
    return counts.size
If you look at the uniques you are getting back, I'm pretty sure you'll find they are scalars.
You need to use the axis keyword:
>>> import numpy as np
>>> from scipy.misc import face
>>>
>>> img = face()
>>> np.unique(img.reshape(-1, img.shape[-1]), axis=0, return_counts=True)
(array([[  0,   0,   5],
        [  0,   0,   7],
        [  0,   0,   9],
        ...,
        [255, 248, 255],
        [255, 249, 255],
        [255, 252, 255]], dtype=uint8),
 array([1, 2, 2, ..., 1, 1, 1]))
The comment by @Edeki Okoh is correct: you need to find a way to take the color channels into account. There is probably a much cleaner solution, but a hacky way to do this would be something like the following. Each color channel has values from 0 to 255, and multiplying by 1000 keeps the channels in separate digit groups (we add 1 so that the green and red terms always get multiplied in). Blue occupies the last three digits, green the middle three, and red the first three, so every value represents a unique color.
b, g, r = cv2.split(src)
# cast to a wider integer type first, so the arithmetic does not overflow uint8
b, g, r = np.int64(b), np.int64(g), np.int64(r)
shifted_im = b + 1000 * (g + 1) + 1000 * 1000 * (r + 1)
The resulting image should have one channel with each value representing a unique color combination.
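Counting the unique colours is then just (continuing from the code above):
n_colours = np.unique(shifted_im).size  # each packed value stands for one distinct colour
print(n_colours)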
I think you only counted a single channel, e.g. the R values, out of the full RGB image; that's why you have only 252 discrete values.
In theory R, G, and B can each have 256 discrete states, and
256 * 256 * 256 = 16777216
means in total you can have 16,777,216 possible colors.
My suggestion is to convert the RGB uchar CV_8UC3 image into a single 32-bit integer structure like CV_32SC1 (the code below uses np.int32).
Given an image as input:
import cv2
import numpy as np

# my small test image of handwritten text, for which I can count the number of colours by hand
image = cv2.imread('/home/usr/naneDownloads/vuQ9y.png')  # change here
b, g, r = cv2.split(image)
# pack the three 8-bit channels into one 32-bit integer per pixel;
# the parentheses are needed because + binds more tightly than << in Python
out_in_32U_2D = (np.int32(b) << 16) + (np.int32(g) << 8) + np.int32(r)
out_in_32U_1D = out_in_32U_2D.reshape(-1)  # convert to 1D
len(np.unique(out_in_32U_1D))
37  # correct for my test image, compared with my manual count
The code here should provide what you need.

How to invert a numpy histogram back to intensities

I'm wondering if there is a numpythonic way of inverting a histogram back to an intensity signal.
For example:
>>> A = np.array([7, 2, 1, 4, 0, 7, 8, 10])
>>> H, edge = np.histogram(A, bins=10, range=(0,10))
>>> np.sort(A)
[ 0 1 2 4 7 7 8 10]
>>> H
[1 1 1 0 1 0 0 2 1 1]
>>> edge
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
Is there a way to reconstruct the original A intensities using the H and edge? Of course, positional information will have been lost, but I'd just like to recover the intensities and relative number of occurrences.
I have this loopy way of doing it:
>>> reco = []
>>> for i, h in enumerate(H):
...     for _ in range(h):
...         reco.append(edge[i])
...
>>> reco
[0.0, 1.0, 2.0, 4.0, 7.0, 7.0, 8.0, 9.0]
# I've done something wrong with the right-most histogram bin, but we can ignore that for now
For large histograms, the loopy way is inefficient. Is there a vectorized equivalent of what I did in the loop? (my gut says that numpy.digitize will be involved..)
Sure, you can use np.repeat for this:
import numpy as np
A = np.array([7, 2, 1, 4, 0, 7, 8, 10])
counts, edges = np.histogram(A, bins=10, range=(0,10))
print(np.repeat(edges[:-1], counts))
# [ 0. 1. 2. 4. 7. 7. 8. 9.]
Obviously it's impossible to recover the exact position of a value within a bin, since you lose that information in the process of generating the histogram. You could either use the lower or upper bin edge (as in the example above), or you could use the center value, e.g.:
print(np.repeat((edges[:-1] + edges[1:]) / 2., counts))
# [ 0.5 1.5 2.5 4.5 7.5 7.5 8.5 9.5]

Find subgroups of a numpy array

I have a numpy array like this one:
A = ([249, 250, 3016, 3017, 5679, 5680, 8257, 8258,
10756, 10757, 13178, 13179, 15531, 15532, 17824, 17825,
20058, 20059, 22239, 22240, 24373, 24374, 26455, 26456,
28491, 28492, 30493, 30494, 32452, 32453, 34377, 34378,
36264, 36265, 38118, 38119, 39939, 39940, 41736, 41737,
43501, 43502, 45237, 45238, 46950, 46951, 48637, 48638])
I would like to write a small script that finds the subgroups of values in the array whose differences are smaller than a certain threshold, let's say 3, and that returns the highest value of each subgroup. For the array A the output should be:
A_out = ([250, 3017, 5680, 8258, 10757, 13179, ...])
Is there a numpy function for that?
Here's a vectorized Numpy approach.
First, the data (in a numpy array) and the threshold:
In [41]: A = np.array([249, 250, 3016, 3017, 5679, 5680, 8257, 8258,
10756, 10757, 13178, 13179, 15531, 15532, 17824, 17825,
20058, 20059, 22239, 22240, 24373, 24374, 26455, 26456,
28491, 28492, 30493, 30494, 32452, 32453, 34377, 34378,
36264, 36265, 38118, 38119, 39939, 39940, 41736, 41737,
43501, 43502, 45237, 45238, 46950, 46951, 48637, 48638])
In [42]: threshold = 3
The following produces the array delta. It is almost the same as delta = np.diff(A), but I want to include one more value that is greater than the threshold at the end of delta.
In [43]: delta = np.hstack((np.diff(A), threshold + 1))
Now the group maxima are simply A[delta > threshold]:
In [46]: A[delta > threshold]
Out[46]:
array([ 250, 3017, 5680, 8258, 10757, 13179, 15532, 17825, 20059,
22240, 24374, 26456, 28492, 30494, 32453, 34378, 36265, 38119,
39940, 41737, 43502, 45238, 46951, 48638])
Or, if you want, A[delta >= threshold]. That gives the same result for this example:
In [47]: A[delta >= threshold]
Out[47]:
array([ 250, 3017, 5680, 8258, 10757, 13179, 15532, 17825, 20059,
22240, 24374, 26456, 28492, 30494, 32453, 34378, 36265, 38119,
39940, 41737, 43502, 45238, 46951, 48638])
There is a case where this answer differs from @DrV's answer. From your description, it isn't clear to me how a set of values such as 1, 2, 3, 4, 5, 6 should be handled. The consecutive differences are all 1, but the difference between the first and last is 5. The numpy calculation above will treat these as a single group. @DrV's answer will create two groups.
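To make that difference concrete, here is a quick check of how the calculation above treats such a sequence (reusing the padded-diff idea):
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6])
threshold = 3
delta = np.hstack((np.diff(A), threshold + 1))
print(A[delta > threshold])  # [6] -- the consecutive gaps are all 1, so one group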
Interpretation 1: The value of an item in a group must not differ more than 3 units from that of the first item of the group
This is one of the things where NumPy's capabilities are at their limits. As you will have to iterate through the list, I suggest a pure Python approach:
first_of_group = A[0]
previous = A[0]
group_lasts = []
for a in A[1:]:
    # if this item no longer belongs to the group
    if abs(a - first_of_group) > 3:
        group_lasts.append(previous)
        first_of_group = a
    previous = a
# add the last item separately, because it is always the last of its group
group_lasts.append(a)
Now you have the group lasts in group_lasts.
Using any NumPy array functionality here does not seem to provide much help.
Interpretation 2: The value of an item in a group must not differ more than 3 units from the previous item
This is easier, as we can easily form a list of group breaks as in Warren Weckesser's answer. Here NumPy is of a lot of help.
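A minimal sketch of that approach, reusing the padded np.diff idea from the answer above:
import numpy as np

A = np.array([249, 250, 3016, 3017, 5679, 5680])  # shortened sample
threshold = 3
delta = np.hstack((np.diff(A), threshold + 1))  # pad so the last element closes a group
group_lasts = A[delta > threshold]              # array([ 250, 3017, 5680])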

How are the pyplot histogram bins interpreted?

I am confused about the matplotlib hist function.
The documentation explains:
If a sequence of values, the values of the lower bound of the bins to be used.
But when I have two values in the sequence, i.e. [0, 1], I only get one bin.
And when I have three like so:
plt.hist(votes, bins=[0,1,2], normed=True)
I only get two bins. My guess is that the last value is just an upper bound for the last bin.
Is there a way to have "the rest" of the values in the last bin, other than to put a very big value there? (Or, in other words, without making that bin much bigger than the others.)
It seems that the last bin edge is included in the last bin:
votes = [0,0,1,2]
plt.hist(votes, bins=[0,1])
This gives me one bin of height 3, i.e. it contains 0, 0, and 1.
While:
votes = [0,0,1,2]
plt.hist(votes, bins=[0,1,2])
Gives me two bins with two in each.
I find it counterintuitive that adding a new bin changes the width limits of the others.
votes = [0, 0, 1]
plt.hist(votes, bins=2)
yields two bins of sizes 2 and 1. They seem to have been split at 0.5, since the x-axis goes from 0 to 1.
How should the bins array be interpreted? How is the data split?
votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0,1])
this gives you one bin of height 3, because it splits the data into a single bin with the interval [0, 1]. It puts into that bin the values 0, 0, and 1.
votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0, 1, 2])
this gives you a histogram with bins covering the intervals [0, 1[ and [1, 2];
so you have 2 items in the 1st bin (the 0 and 0), and 2 items in the 2nd bin (the 1 and 2).
If you try to plot:
plt.hist(votes, bins=[0, 1, 2, 3])
the idea behind the data splitting into bins is the same:
you will get three intervals, [0, 1[, [1, 2[, and [2, 3], and you will notice that the value 2 changes its bin, moving to the bin with interval [2, 3] (instead of staying in the bin [1, 2] as in the previous example).
In conclusion, if you have an ordered array in the bins argument like:
[i_0, i_1, i_2, i_3, i_4, ..., i_n]
that will create the bins:
[i_0, i_1[
[i_1, i_2[
[i_2, i_3[
[i_3, i_4[
...
[i_(n-1), i_n]
with the boundaries of each open or closed according to the brackets.
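A quick numpy check of this behavior (np.histogram follows the same bin convention that plt.hist uses):
import numpy as np

votes = [0, 0, 1, 2]
# the last bin is closed on the right, so both 1 and 2 land in [1, 2]
print(np.histogram(votes, bins=[0, 1, 2])[0])     # [2 2]
# with one more edge, [1, 2[ becomes half-open and 2 moves to [2, 3]
print(np.histogram(votes, bins=[0, 1, 2, 3])[0])  # [2 1 1]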