NUMPY: Is there a more readable way to index numpy arrays?


I have a NumPy array with about 7 columns and I need to index certain values a lot, but the current way I do this is not easily readable. For example, I would like to write rates[-1][high] or something similar. I thought that maybe I could just define variables such as high = 2, but I use the same rates data in many different functions, so I would have to set these variables in every single function or pass them as parameters, which is not very useful either. Is there a better way to do this? Thanks in advance!
if (
    rates[-1][4] > rates[-1][1]
    and rates[-2][4] > rates[-2][1]
    and rates[-1][4] > rates[-2][4]
):

If I understand the question correctly, you could set 'column' 4 to be its own array.
import numpy as np
rates = 1 + .01 * np.arange(36).reshape(4, 9)
rates
# array([[1. , 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08],
# [1.09, 1.1 , 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17],
# [1.18, 1.19, 1.2 , 1.21, 1.22, 1.23, 1.24, 1.25, 1.26],
# [1.27, 1.28, 1.29, 1.3 , 1.31, 1.32, 1.33, 1.34, 1.35]])
high = rates[:, 4]
high
# array([1.04, 1.13, 1.22, 1.31])
Your formula becomes:
if (
    high[-1] > rates[-1][1]
    and high[-2] > rates[-2][1]
    and high[-1] > rates[-2][4] # or > high[-2]
):
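If the same column positions are needed in many functions, one option is to define the column indices once at module level (or in a small shared module) and import them wherever they are used. The sketch below assumes the hypothetical names OPEN and HIGH for columns 1 and 4; the question does not say what the columns actually hold, and is_rising is an illustrative helper name.

```python
import numpy as np

# Hypothetical column names: the question does not say what columns 1 and 4 hold.
OPEN = 1
HIGH = 4

def is_rising(rates):
    """Return True when the last two rows satisfy the condition from the question."""
    last, prev = rates[-1], rates[-2]
    return bool(
        last[HIGH] > last[OPEN]
        and prev[HIGH] > prev[OPEN]
        and last[HIGH] > prev[HIGH]
    )

# Same example data as in the answer above.
rates = 1 + 0.01 * np.arange(36).reshape(4, 9)
print(is_rising(rates))
```

Because the names are defined in one place, changing the column layout later only requires editing the two constants.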

Related

Algorithm on how to Sample from a Multinomial Distribution

If we are given a multinomial distribution
p=[0.2,0.4,0.1,0.3]
and we have to sample from this distribution a number of times and return the result, how do I write the algorithm for this?
E.g., if I have a fair die and I roll it 20 times, I want the total number of times it landed on each side:
[4, 1, 7, 5, 2, 1]
This should be the (randomized) result: it landed 4 times on 1, once on 2, etc.
NumPy has a function to do this:
numpy.random.multinomial()
>>> np.random.multinomial(20, [1/6.]*6, size=1)
array([[4, 1, 7, 5, 2, 1]]) # random
I want to understand how the algorithm behind this function works.
I've tried this approach in Python:
import numpy as np
import random
probs = [0.2, 0.4, 0.1, 0.3]
def sample(count:int)->list:
    output = [0,0,0,0]
    for i in range(count):
        num = random.random()
        if(0 < num <= 0.15):
            output[2]+=1
        elif(0.15 < num <= 0.25):
            output[0]+=1
        elif(0.25 < num <= 0.35):
            output[3]+=1
        elif(0.35 < num <= 0.45):
            output[1]+=1
    return output
final_output = sample(10)
print(final_output)
np.random.multinomial(10, probs, size=1)
But I don't think this is the optimal way, maybe I'm lacking some concepts in Probability?
The actual Code written in Numpy in CPython:
Link to the Numpy file where the code for numpy.random.multinomial() is written starting from line 4176
Possible Duplicate:
How to sample from a multinomial distribution?
References:
https://numpy.org/doc/stable/reference/random/generated/numpy.random.multinomial.html
Random number generation from multinomial distribution in R using rmultinom() function

If you care about sampling from this distribution many times, it is worth looking at the alias method: https://en.wikipedia.org/wiki/Alias_method
You can sample in $O(1)$ time per draw after an initial $O(K\log K)$ preprocessing step, where $K$ is the size of the support of the distribution.
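For a simpler starting point than the alias method, the counts can be produced by inverse-CDF sampling: draw uniform numbers, locate each one in the cumulative distribution, and tally the outcomes. The sketch below is one possible approach, not a description of how NumPy implements numpy.random.multinomial internally; sample_multinomial is a hypothetical helper name.

```python
import numpy as np

def sample_multinomial(n_trials, probs, rng=None):
    """Return per-outcome counts for n_trials draws from a categorical distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    cdf = np.cumsum(probs)
    cdf[-1] = 1.0  # guard against floating-point round-off in the last bin
    u = rng.random(n_trials)  # uniform draws in [0, 1)
    # Outcome i is chosen when cdf[i-1] <= u < cdf[i].
    outcomes = np.searchsorted(cdf, u, side="right")
    return np.bincount(outcomes, minlength=len(probs))

counts = sample_multinomial(20, [0.2, 0.4, 0.1, 0.3])
print(counts, counts.sum())
```

Each draw costs a binary search over the cumulative probabilities, so this is $O(\log K)$ per draw rather than the $O(1)$ of the alias method.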

Numpy Random Choice with Non-regular Array Size

I'm making an array of sums of random choices from a negative binomial distribution (nbd), with each sum being of non-regular length. Right now I implement it as follows:
import numpy as np
from numpy.random import default_rng
rng = default_rng()
nbd = rng.negative_binomial(1, 0.5, int(1e6))
gmc = [12, 35, 4, 67, 2]
n_pp = np.empty(len(gmc))
for i in range(len(gmc)):
    n_pp[i] = np.sum(rng.choice(nbd, gmc[i]))
This works, but when I perform it over my actual data it's very slow (gmc is of dimension 1e6), and I would like to vary this for multiple values of n and p in the nbd (in this example they're set to 1 and 0.5, respectively).
I'd like to work out a pythonic way to do this which eliminates the loop, but I'm not sure it's possible. I want to keep default_rng for the better random generation than the older way of doing it (np.random.choice), if possible.

The distribution of the sum of m samples from the negative binomial distribution with parameters (n, p) is the negative binomial distribution with parameters (m*n, p). So instead of summing random selections from a large, precomputed sample of negative_binomial(1, 0.5), you can generate your result directly with negative_binomial(gmc, 0.5):
In [68]: gmc = [12, 35, 4, 67, 2]
In [69]: npp = rng.negative_binomial(gmc, 0.5)
In [70]: npp
Out[70]: array([ 9, 34, 1, 72, 7])
(The negative_binomial method will broadcast its inputs, so we can pass gmc as an argument to generate all the samples with one call.)
More generally, if you want to vary the n that is used to generate nbd, you would multiply that n by the corresponding element in gmc and pass the product to rng.negative_binomial.
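As a rough illustration of the general case, the sketch below uses illustrative values n = 3 and p = 0.5 (not taken from the question) and compares the mean of explicitly summed draws with the mean of direct draws. Both should be close to m*n*(1-p)/p, which is the mean of a negative binomial with parameters (m*n, p).

```python
import numpy as np

rng = np.random.default_rng()

gmc = np.array([12, 35, 4, 67, 2])
n, p = 3, 0.5  # illustrative parameter values, not from the question

# One vectorized call: element i is a draw from negative_binomial(n * gmc[i], p).
n_pp = rng.negative_binomial(n * gmc, p)
print(n_pp)

# Sanity check for a single group size m: summed draws vs. direct draws.
m = 12
summed = rng.negative_binomial(n, p, size=(100_000, m)).sum(axis=1)
direct = rng.negative_binomial(m * n, p, size=100_000)
print(summed.mean(), direct.mean())  # both should be near m * n * (1 - p) / p
```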

Count number of unique colours in image [duplicate]

I am trying to count the number of unique colours in an image. I have some code that I think should work; however, when I run it on an image it says I have 252 different colours out of a possible 16,777,216. That seems wrong given the image is BGR, so shouldn't there be many more different colours (thousands, not hundreds)?
import cv2
import imutils
import numpy as np
def count_colours(src):
    unique, counts = np.unique(src, return_counts=True)
    print(counts.size)
    return counts.size
src = cv2.imread('../../images/di8.jpg')
src = imutils.resize(src, height=300)
count_colours(src) # outputs 252 different colours!? only?
Is that value correct? And if not how can I fix my function count_colours()?
Edit: is this correct?
def count_colours(src):
    unique, counts = np.unique(src.reshape(-1, src.shape[-1]), axis=0, return_counts=True)
    return counts.size

If you look at the uniques you are getting back, I'm pretty sure you'll find they are scalars.
You need to use the axis keyword:
>>> import numpy as np
>>> from scipy.datasets import face  # formerly scipy.misc.face, which recent SciPy releases no longer provide
>>>
>>> img = face()
>>> np.unique(img.reshape(-1, img.shape[-1]), axis=0, return_counts=True)
(array([[ 0, 0, 5],
[ 0, 0, 7],
[ 0, 0, 9],
...,
[255, 248, 255],
[255, 249, 255],
[255, 252, 255]], dtype=uint8), array([1, 2, 2, ..., 1, 1, 1]))
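To see the difference the axis keyword makes without needing an image file, here is a small sketch on a hand-made 2x2 image (the pixel values are made up for illustration). Without axis, np.unique flattens the array and counts distinct channel values; with axis=0 on the reshaped pixel list, it counts distinct colour triples.

```python
import numpy as np

def count_colours(img):
    """Count distinct colour triples in an H x W x C image."""
    pixels = img.reshape(-1, img.shape[-1])  # one row per pixel
    return np.unique(pixels, axis=0).shape[0]

# Made-up 2x2 test image: two identical pixels plus two other colours.
img = np.array(
    [[[0, 0, 255], [0, 0, 255]],
     [[0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)

print(np.unique(img).size)  # distinct scalar values across all channels
print(count_colours(img))   # distinct colours
```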

The comment by @Edeki Okoh is correct. You need to find a way to take the color channels into account. There is probably a much cleaner solution, but a hacky way to do this is to pack the three channels into one decimal number. Each color channel has values from 0 to 255, so three decimal digits per channel are enough; we add 1 to green and red to make sure a zero value still gets multiplied. Blue will represent the last three digits, green the middle three and red the first three. The channels are cast to a wider integer type first, because this arithmetic does not fit in uint8. Now every value represents a unique color.
b, g, r = (c.astype(np.int64) for c in cv2.split(src))
shifted_im = b + 1000 * (g + 1) + 1000 * 1000 * (r + 1)
The resulting image has one channel, with each value representing a unique color combination.

I think you only counted a single channel, e.g. the R value, out of the full RGB channels. That's why you have only 252 discrete values.
In theory R, G and B can each have 256 discrete states.
256*256*256 = 16777216
which means that in total there are 16777216 possible colours.
My suggestion is to convert the 8-bit, 3-channel image (CV_8UC3) into a single 32-bit integer per pixel (like CV_32SC1).
Given an image as input:
# my small test image of text, for which I can count the number of states by hand
import cv2
import numpy as np
image = cv2.imread('/home/usr/naneDownloads/vuQ9y.png')  # change here
b, g, r = cv2.split(image)
# shift each channel into its own byte; << binds more loosely than +, so each shift needs parentheses
out_in_32U_2D = (np.int32(b) << 16) | (np.int32(g) << 8) | np.int32(r)
out_in_32U_1D = out_in_32U_2D.reshape(-1)  # convert to 1D
np.unique(out_in_32U_1D)  # one packed integer per distinct colour
len(np.unique(out_in_32U_1D))
37 # correct for my test image of writing on paper, compared with my manual count
The code here should give you what you need.
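The channel-packing idea can be checked without OpenCV on a synthetic image. The sketch below assumes an H x W x 3 uint8 array and a hypothetical helper name count_colours_packed; it compares the packed count against np.unique with axis=0. Note that in Python `<<` has lower precedence than `+`, so each shift is parenthesised and the pieces are combined with `|`.

```python
import numpy as np

def count_colours_packed(img):
    """Count distinct colours by packing three 8-bit channels into one integer."""
    c0 = img[..., 0].astype(np.uint32)
    c1 = img[..., 1].astype(np.uint32)
    c2 = img[..., 2].astype(np.uint32)
    packed = (c0 << 16) | (c1 << 8) | c2  # each channel gets its own byte
    return np.unique(packed).size

# Synthetic image with few possible values per channel, so colours repeat.
rng = np.random.default_rng(0)
img = rng.integers(0, 4, size=(32, 32, 3), dtype=np.uint8)

by_rows = np.unique(img.reshape(-1, 3), axis=0).shape[0]
print(count_colours_packed(img), by_rows)  # the two counts should agree
```

Packing into one integer per pixel lets np.unique work on a flat 1D array, which is typically faster than the row-wise axis=0 comparison.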

Find subgroups of a numpy array

I have a numpy array like this one:
A = ([249, 250, 3016, 3017, 5679, 5680, 8257, 8258,
10756, 10757, 13178, 13179, 15531, 15532, 17824, 17825,
20058, 20059, 22239, 22240, 24373, 24374, 26455, 26456,
28491, 28492, 30493, 30494, 32452, 32453, 34377, 34378,
36264, 36265, 38118, 38119, 39939, 39940, 41736, 41737,
43501, 43502, 45237, 45238, 46950, 46951, 48637, 48638])
I would like to write a small script that finds subgroups of values in the array for which the difference is smaller than a certain threshold, let's say 3, and that returns the highest value of each subgroup. For the array A, the output should be:
A_out =([250,3017,5680,8258,10757,13179,...])
Is there a numpy function for that?

Here's a vectorized Numpy approach.
First, the data (in a numpy array) and the threshold:
In [41]: A = np.array([249, 250, 3016, 3017, 5679, 5680, 8257, 8258,
10756, 10757, 13178, 13179, 15531, 15532, 17824, 17825,
20058, 20059, 22239, 22240, 24373, 24374, 26455, 26456,
28491, 28492, 30493, 30494, 32452, 32453, 34377, 34378,
36264, 36265, 38118, 38119, 39939, 39940, 41736, 41737,
43501, 43502, 45237, 45238, 46950, 46951, 48637, 48638])
In [42]: threshold = 3
The following produces the array delta. It is almost the same as delta = np.diff(A), but I want to include one more value that is greater than the threshold at the end of delta.
In [43]: delta = np.hstack((np.diff(A), threshold + 1))
Now the group maxima are simply A[delta > threshold]:
In [46]: A[delta > threshold]
Out[46]:
array([ 250, 3017, 5680, 8258, 10757, 13179, 15532, 17825, 20059,
22240, 24374, 26456, 28492, 30494, 32453, 34378, 36265, 38119,
39940, 41737, 43502, 45238, 46951, 48638])
Or, if you want, A[delta >= threshold]. That gives the same result for this example:
In [47]: A[delta >= threshold]
Out[47]:
array([ 250, 3017, 5680, 8258, 10757, 13179, 15532, 17825, 20059,
22240, 24374, 26456, 28492, 30494, 32453, 34378, 36265, 38119,
39940, 41737, 43502, 45238, 46951, 48638])
There is a case where this answer differs from @DrV's answer. From your description, it isn't clear to me how a set of values such as 1, 2, 3, 4, 5, 6 should be handled. The consecutive differences are all 1, but the difference between the first and last is 5. The numpy calculation above will treat these as a single group. @DrV's answer will create two groups.
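The steps above can be wrapped in a small function. This is a sketch that assumes the input is sorted in ascending order; group_maxima is a hypothetical name.

```python
import numpy as np

def group_maxima(values, threshold):
    """Return the last (largest) value of each run whose consecutive gaps are <= threshold."""
    values = np.asarray(values)
    # Append a gap larger than the threshold so the final value always closes a group.
    delta = np.append(np.diff(values), threshold + 1)
    return values[delta > threshold]

print(group_maxima([249, 250, 3016, 3017, 5679, 5680], 3))
print(group_maxima([1, 2, 3, 4, 5, 6], 3))  # consecutive gaps of 1: a single group
```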

Interpretation 1: The value of an item in a group must not differ more than 3 units from that of the first item of the group
This is one of the things where NumPy's capabilities are at their limits. As you will have to iterate through the list, I suggest a pure Python approach:
first_of_group = A[0]
previous = A[0]
group_lasts = []
for a in A[1:]:
    # if this item no longer belongs to the group
    if abs(a - first_of_group) > 3:
        group_lasts.append(previous)
        first_of_group = a
    previous = a
# add the last item separately, because it is always a last of the group
group_lasts.append(previous)
Now you have the group lasts in group_lasts.
Using any NumPy array functionality here does not seem to provide much help.
Interpretation 2: The value of an item in a group must not differ more than 3 units from the previous item
This is easier, as we can easily form a list of group breaks as in Warren Weckesser's answer. Here NumPy is of a lot of help.
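To make the difference between the two interpretations concrete, here is Interpretation 1 as a function, run on the sequence 1, 2, 3, 4, 5, 6. The function name and the threshold parameter are illustrative additions. Under Interpretation 1 the sequence splits once the distance from the first item of the group exceeds 3, whereas under Interpretation 2 it would stay a single group because every consecutive gap is 1.

```python
def group_lasts_from_first(values, threshold=3):
    """Last item of each group, where a group ends once an item is more than
    `threshold` away from the group's first item (Interpretation 1)."""
    values = list(values)
    if not values:
        return []
    first_of_group = previous = values[0]
    lasts = []
    for v in values[1:]:
        if abs(v - first_of_group) > threshold:
            lasts.append(previous)   # close the current group
            first_of_group = v       # start a new group at v
        previous = v
    lasts.append(previous)           # the final item always closes a group
    return lasts

print(group_lasts_from_first([1, 2, 3, 4, 5, 6]))
print(group_lasts_from_first([249, 250, 3016, 3017]))
```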

Numpy/Scipy convert raw values to indexed timeseries?

I've looked at the numpy/scipy documentation, but I can't find any builtin function to do this.
I'd like to convert raw numbers (temperatures, as it happens) representing a time series from their raw state to an indexed series (i.e. first value is 100, subsequent values are scaled against the first raw value). So, if the raw values are (15,7.5,5), the indexed values would be (100,50,33) (mental calculation, hence int values).
This is moderately easy to code oneself, but I'd like to use a builtin if possible. A homebrew is:
def indexise(seq,base=0,scale=100):
    if not base:
        base=seq[0]
    return (i*scale/base for i in seq)

If seq is a numpy array, then instead of (i*scale/base for i in seq), you can use a numpy vectorized operation scale*seq/base.
Here's how I might modify your function:
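import numpy as np
def indexise(seq, base=None, scale=100):
    # np.asfarray is not available in NumPy 2.0 and later; asarray with dtype=float is equivalent
    seq = np.asarray(seq, dtype=float)
    if base is None:
        base = seq[0]
    result = scale*seq/base
    return result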
For example,
In [14]: indexise([15, 7.5, 5, 3, 10, 12])
Out[14]:
array([ 100. , 50. , 33.33333333, 20. ,
66.66666667, 80. ])
In [15]: indexise([15, 7.5, 5, 3, 10, 12], base=10)
Out[15]: array([ 150., 75., 50., 30., 100., 120.])
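The same vectorized expression extends to several series at once through broadcasting. The sketch below assumes a 2D array with one series per column and indexes each column against its own first value; indexise_columns is a hypothetical name and the data is made up.

```python
import numpy as np

def indexise_columns(data, scale=100):
    """Index each column of a 2D array against that column's first value."""
    data = np.asarray(data, dtype=float)
    # data[0] has one base value per column and broadcasts across all rows.
    return scale * data / data[0]

series = [[15, 2],
          [7.5, 4],
          [5, 1]]
print(indexise_columns(series))
```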
