Utilizing Redis for setting and retrieving bits - redis

The main idea is to store random bits in Redis and retrieve and drop them when required.
Below is pseudocode that sets a random bit at the next position of the bit collection, so the collection of bits keeps growing.
Setter:
boolean = rand(0 or 1)
SETBIT bitcollection (BITCOUNT bitcollection + 1) boolean
Getter:
GETBIT bitcollection 0
Questions:
How can I drop the retrieved bit from position 0?
Is it possible to retrieve more than just the first bit like (0..n)?
Ruby code to better illustrate what I'm trying to achieve:
bitcollection = [0, 1, 0, 1, 0]
# set
bitcollection.push 1 #=> [0, 1, 0, 1, 0, 1]
# get
bitcollection.shift #=> 0
p bitcollection #=> [1, 0, 1, 0, 1]
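Roughly the same idea sketched in Python with the redis-py client (the client setup and the helper length key are just illustrative assumptions, mirroring the Setter/Getter pseudocode above):
import random
import redis  # redis-py client, assumed to be available

r = redis.Redis()                  # connection details are illustrative
KEY = 'bitcollection'              # key name taken from the pseudocode
LEN_KEY = 'bitcollection:length'   # hypothetical helper key tracking how many bits were written

# Setter: append a random bit at the next free offset.
# A separate length counter is used instead of BITCOUNT, because BITCOUNT
# only counts the bits that are set to 1, not the size of the collection.
def set_next_bit():
    bit = random.randint(0, 1)
    offset = r.incr(LEN_KEY) - 1
    r.setbit(KEY, offset, bit)

# Getter: read the bit at position 0. GETBIT only reads the bit;
# it does not remove it from the collection.
def get_first_bit():
    return r.getbit(KEY, 0)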

Related

Unexpected behavior when trying to normalize a column in numpy.array (version 1.17.4)

So, I was trying to normalize a specific column within a numpy array, i.e. scale the column so its maximum becomes 1 by dividing each value by that maximum.
I hoped this piece of code would do the trick:
bar = np.arange(12).reshape(6,2)
bar
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
bar[:,1] = bar[:,1] / bar[:,1].max()
bar
array([[ 0,  0],
       [ 2,  0],
       [ 4,  0],
       [ 6,  0],
       [ 8,  0],
       [10,  1]])
It works as expected if the type of each value is 'float':
foo = np.array([[1.1, 2.2],
                [3.3, 4.4],
                [5.5, 6.6]])
foo[:,1] = foo[:,1] / foo[:,1].max()
foo
array([[1.1       , 0.33333333],
       [3.3       , 0.66666667],
       [5.5       , 1.        ]])
I guess what I'm asking is where is this default 'int' I'm missing here?
(I'm taking this as a 'learning opportunity')
If you simply execute:
out = bar[:,1] / bar[:,1].max()
print(out)
>>> [0.09090909 0.27272727 0.45454545 0.63636364 0.81818182 1. ]
It works just fine, since out is a newly created float array made to store these float values. But np.arange(12) gives you an int array by default. bar[:,1] = bar[:,1] / bar[:,1].max() tries to store the float values inside the integer array, so all the values are truncated to integers and you get [0 0 0 0 0 1].
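A quick way to see the truncation in action (a minimal sketch; the exact integer dtype, e.g. int64, is platform dependent):
import numpy as np

bar = np.arange(12).reshape(6, 2)
print(bar.dtype)                             # the array's fixed integer dtype, e.g. int64
print((bar[:, 1] / bar[:, 1].max()).dtype)   # float64: division creates a new float array
bar[:, 1] = bar[:, 1] / bar[:, 1].max()      # float results are cast back to int on assignment
print(bar[:, 1])                             # [0 0 0 0 0 1]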
To set the array as a float by default:
bar = np.arange(12, dtype='float').reshape(6,2)
Alternatively, you can also use:
bar = np.arange(12).reshape(6,2).astype('float')
It isn't uncommon to need to change an array's data type over the course of a program, since you may not always want the dtype it was created with. So .astype() is actually pretty handy in all kinds of scenarios.
From the np.arange documentation:
dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.
Since you passed int values, it infers that the values in the array are ints, so they won't change to float. You can do it like this if you want:
bar = np.arange(12.0).reshape(6,2)

numpy: Cleanly retrieve coordinates (indices) for highest k values - along a specific axis - in ndarray

I would like to be able to:
select k highest values along (or across?) the first dimension
find indices for those k values
assign those values to a new ndarray of equal shape at their respective positions.
I'm wondering if there is a quicker way to achieve the result exemplified below. In particular, I would like to avoid making the batch indices "manually".
Here's my solution:
# Create unordered array (instrumental to the example)
arr = np.arange(24).reshape(2, 3, 4)
arr_1 = arr[0,::2].copy()
arr_2 = arr[1,1::].copy()
arr[0,::2] = arr_2[:,::-1]
arr[1,1:] = arr_1[:,::-1]
# reshape array to: (batch_size, H*W)
arr_batched = arr.reshape(arr.shape[0], -1)
# find indices for the k greatest values along all but the 1st dimension.
k = 2  # number of top values per batch element (inferred from the expected output below)
gr_ind = np.argpartition(arr_batched, -k)[:, -k:]
# flatten and unravel indices.
maxk_ind_flat = gr_ind.flatten()
maxk_ind_shape = np.unravel_index(maxk_ind_flat, arr.shape)
# maxk_ind_shape prints: (array([0, 0, 0, 0]), array([2, 2, 0, 0]), array([1, 0, 2, 3]))
# note: unraveling indices obtained by partitioning an array of shape (2, n) will not take the first dimension into account (here [0, 0, 0, 0])
# Craft batch indices...
batch_indices = np.repeat(np.arange(arr.shape[0]), k)
# ...and join
maxk_indices = tuple([batch_indices]+[ind for ind in maxk_ind_shape[1:]])
# The result is used to re-assign k-highest values for each batch element to a destination matrix:
arr2 = np.zeros_like(arr)
arr2[maxk_indices] = arr[maxk_indices]
# arr2 prints:
# array([[[ 0,  0,  0,  0],
#         [ 0,  0,  0,  0],
#         [23, 22,  0,  0]],
#
#        [[ 0,  0, 14, 15],
#         [ 0,  0,  0,  0],
#         [ 0,  0,  0,  0]]])
Any help would be appreciated.
One way would be to use np.[put/take]_along_axis:
gr_ind = np.argpartition(arr_batched, -k, axis=-1)[:, -k:]
arr_2 = np.zeros_like(arr)
np.put_along_axis(arr_2.reshape(arr_batched.shape), gr_ind, np.take_along_axis(arr_batched, gr_ind, -1), -1)
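put_along_axis writes through the reshaped view, so arr_2 itself ends up holding the k largest values per batch element at their original positions, with no manually crafted batch indices. A quick sanity check against the question's result (assuming arr, arr_batched, k and the question's arr2 are defined as above):
print(np.array_equal(arr_2, arr2))  # should print True: both approaches keep the same entries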

Generate list of random number with condition - numpy [duplicate]

This question already has answers here:
Is there an efficient way to generate N random integers in a range that have a given sum or average?
(6 answers)
Closed 2 years ago.
I would like to generate a list of 15 integers with a sum of 12, where the minimum value is 0 and the maximum is 6.
I tried the following code:
def generate(low, high, total, entity):
    while sum(entity) != total:
        entity = np.random.randint(low, high, size=15)
    return entity
But the above function is not working properly; it is far too time-consuming. Please let me know an efficient way to generate such numbers.
The above will, strictly speaking, work. But for 15 numbers between 0 and 6, the odds of hitting a sum of 12 are not that high. In fact we can calculate the number of possibilities with:
F(s, 1) = 1 for 0≤s≤6
and
F(s, n) = Σi=0..6 F(s-i, n-1).
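For instance, F(2, 2) = F(2, 1) + F(1, 1) + F(0, 1) = 3, corresponding to the pairs (0, 2), (1, 1) and (2, 0).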
We can calculate that with a memoized function:
from functools import lru_cache
@lru_cache()
def f(s, n, mn, mx):
    if n < 1:
        return 0
    if n == 1:
        return int(mn <= s <= mx)
    else:
        if s < mn:
            return 0
        return sum(f(s-i, n-1, mn, mx) for i in range(mn, mx+1))
That means that there are 9'483'280 possibilities to generate a sum of 12, out of 4'747'561'509'943 total possibilities, or 0.00019975%. It will thus take approximately 500'624 iterations to come up with such a solution.
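For reference, both counts can be reproduced with the helper above:
print(f(12, 15, 0, 6))  # 9483280: sequences of length 15 over 0..6 that sum to 12
print(7 ** 15)          # 4747561509943: all sequences of length 15 over 0..6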
We should thus aim for a more direct way to generate such a sequence. We can do that by calculating, at each step, the probability of each candidate number: the probability of generating i as the first number in a sequence of n numbers that sums up to s is F(s-i, n-1, 0, 6)/F(s, n, 0, 6). This guarantees that we sample uniformly over the set of valid sequences; if we instead drew each number uniformly at every step, the result would not be uniformly distributed over the sequences that satisfy the condition.
We can do that recursively with:
from numpy.random import choice
def sumseq(n, s, mn, mx):
    if n > 1:
        den = f(s, n, mn, mx)
        val, = choice(
            range(mn, mx+1),
            1,
            p=[f(s-i, n-1, mn, mx)/den for i in range(mn, mx+1)]
        )
        yield val
        yield from sumseq(n-1, s-val, mn, mx)
    elif n > 0:
        yield s
With the above function, we can generate numpy arrays:
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 0, 0, 0, 4, 0, 3, 0, 1, 0, 0, 1, 2, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 1, 0, 0, 1, 4, 1, 0, 0, 2, 1, 0, 0, 2])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 1, 0, 0, 2, 0, 3, 1, 3, 0, 1, 0, 0, 0, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([5, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 0, 0, 4, 2, 3, 0, 0, 0, 0, 0, 3, 0, 0])
You could try implementing it a little bit differently.
import random

def generate(low, high, goal_sum, size=15):
    output = []
    for i in range(size):
        new_int = random.randint(low, high)
        if sum(output) + new_int <= goal_sum:
            output.append(new_int)
        else:
            output.append(0)
    random.shuffle(output)
    return output
Also, if you use np.random.randint, your high will actually be high - 1.
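For example, np.random.randint(0, 7, size=15) is what draws values from 0 to 6 inclusive, since the upper bound is exclusive.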
Well, there is a simple and natural solution: use a distribution which by definition gives you an array of values with a fixed sum. The simplest one is the multinomial distribution. The only code to add is a check that rejects (and repeats the sampling) if some sampled value is above the maximum.
Along the lines of:
import numpy as np
def sample_sum_interval(n, p, maxv):
    while True:
        q = np.random.multinomial(n, p)
        v = np.where(q > maxv)
        if len(v[0]) == 0:  # if len(v[0]) > 0, some values are outside the range; reject and resample
            return q
    return None
np.random.seed(32345)
k = 15
n = 12
maxv = 6
p = np.full((k), np.float64(1.0)/np.float64(k), dtype=np.float64) # probabilities
q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))
q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))
q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))
UPDATE
I quickly looked at @WillemVanOnsem's proposed method, and I believe it is different from the multinomial approach I used.
If we look at the multinomial PMF, and assume equal probabilities for all k numbers, p1 = ... = pk = 1/k, then we can write the PMF as
PMF(x1, ..., xk) = n!/(x1! ... xk!) · p1^x1 ... pk^xk = n!/(x1! ... xk!) · k^(-x1) ... k^(-xk)
= n!/(x1! ... xk!) · k^(-Σi xi) = n!/(x1! ... xk!) · k^(-n)
Obviously, the probabilities of particular x1...xk combinations differ from each other due to the factorials in the denominator (modulo permutations, of course), which is different from @WillemVanOnsem's approach, where I believe all combinations have an equal probability of appearing.
Moral of the story - those methods produce different distributions.

How to detect constant absolute delta in integer series?

I have integer series as follows:
data1 = [1, 2, 3, 4, 3, 2, 1, 2, 1, 1]
data2 = [4, 0, 0, 0, 8, 0, 0, 0]
We can see that data1 seems to be "continuous" while data2 is not, as the absolute delta between consecutive values in data1 is at most 1.
How can I decide using Pandas that data1 is "continuous", and data2 is not?
Similar to Andrey's solution, but this one takes advantage of pandas' rolling-window method (wrapping the lists in a Series first):
pd.Series(data1).rolling(2).apply(lambda x: abs(np.diff(x)) <= 1).all()
>>> True
pd.Series(data2).rolling(2).apply(lambda x: abs(np.diff(x)) <= 1).all()
>>> False
Define continuous to mean "consecutive differences are at most 1 in absolute value". To detect this, you can use .diff():
In [1]: series1, series2 = pd.Series(data1), pd.Series(data2)
In [2]: series1.diff().fillna(0).abs().max()
Out[2]: 1.0
In [3]: series2.diff().fillna(0).abs().max()
Out[3]: 8.0
So series1.diff().fillna(0).abs().max() <= 1 will evaluate to True, and series2.diff().fillna(0).abs().max() <= 1 will evaluate to False.
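Wrapped up as a small helper along those lines (a sketch; the function name and the max_delta parameter are just illustrative, with pandas imported as pd):
def is_continuous(values, max_delta=1):
    # True if consecutive differences never exceed max_delta in absolute value
    return pd.Series(values).diff().fillna(0).abs().max() <= max_delta

is_continuous(data1)  # True
is_continuous(data2)  # False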

Delete rows from a ndarray in python

I have a 2D array A, which contains the x and y coordinates of points:
array([[ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [ 3,  4],
       [ 4,  1],
       [ 5, 10],
       [ 9,  7]])
As you can see, the point (0, 0) appears multiple times.
I want to delete this point so that the array looks like this:
array([[ 3,  4],
       [ 4,  1],
       [ 5, 10],
       [ 9,  7]])
Since the real array is very large, it is important to do this without for loops, otherwise it takes very long.
I'm new to Python but I'm used to MATLAB, where I can solve it very easily with:
A(A(:,1) == 0 & A(:,2) == 0, :) = []
I thought it would be almost the same, or very similar, in Python, but I can't figure it out and am totally stuck. Errors like "use a.any()/all()" or "ufunc 'bitwise_and' not supported for the input types" appear and I don't know what I should change.
Technically, what you are doing in MATLAB is not deleting elements from A. What you are actually doing is creating a new array that omits those elements of A. It is equivalent to:
>> A = A (A(:,1) ~= 0 | A(:,2) ~= 0, :);
You can do exactly the same thing in numpy:
>>> a = a[(a[:,0] != 0) | (a[:,1] != 0), :]
However, thanks to numpy's automatic broadcasting, you can make this simpler:
>>> a = a[(a != [0, 0]).any(1)]
This will work for any target array so long as it has the same number of columns as a.
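A quick check with the example array, using the broadcasting form above:
import numpy as np

a = np.array([[0, 0], [0, 0], [0, 0], [3, 4], [4, 1], [5, 10], [9, 7]])
a = a[(a != [0, 0]).any(1)]
print(a)
# [[ 3  4]
#  [ 4  1]
#  [ 5 10]
#  [ 9  7]]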