"Updating" the RNG in Python - numpy

I have to iterate an operation over several sets of partially randomized copies of an initial array of 0s and 1s.
I would like the copies within a set to differ, of course, but the sets should also differ from one another.
For now I use this code (I omitted certain parts that should not affect the problem):
def randomizer(b):
    """randomizes a fraction 'rate' (global variable) of b"""
    c = np.copy(b)
    num_elem = len(c)
    idx = np.random.choice(range(num_elem), int(num_elem*rate), replace=False)
    c[idx] = f(c[idx])
    return c

def randomizePatterns(pattern, randomizer):
    """Return nbTrials partially randomized copies of the given input pattern"""
    outputs = np.tile(pattern,(nbTrials,1))
    for line in xrange(nbTrials):
        outputs[line] = randomizer(outputs[line,:])
    return outputs

def test(pattern,randomizer):
    randomPatterns = randomizePatterns(pattern, randomizer)
    """test the dynamics of a neural network with a fixed but initially random
    connection scheme, and returns a boolean corresponding to if
    the original pattern that was randomized is retrieved from at least
    90% of the randomized patterns"""
    return boolean

def metaTest(pattern):
    successNumber = 0
    for plop in xrange(10):
        if test(pattern, randomizer):
            successNumber += 1
    return successNumber
The other input to test is a random matrix of floats in [0,4] that I visualize and that has the expected behavior.
When I run metaTest, I always get either 0 or 10, never an intermediate value.
Given the nature of test, I expect a result of about 5. Instead, I get each of the two results for approximately half of the inputs.
To be more precise, printing the random patterns after each increment of plop shows the same patterns ten times, and I would like them to change.
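The code shown does not reveal why every call returns the same patterns. One common cause, which is only an assumption here because the omitted parts are not visible, is that the generator is reseeded with a fixed value somewhere inside the loop (for example in the omitted body of test). A minimal sketch of the difference, with a hypothetical flip standing in for f and a hypothetical rate of 0.3:

```python
import numpy as np

rate = 0.3  # hypothetical fraction of entries to randomize


def flip(values):
    """Hypothetical stand-in for f: flip 0 <-> 1."""
    return 1 - values


def randomizer(b, rng):
    """Randomize a fraction `rate` of b using the generator passed in."""
    c = np.copy(b)
    idx = rng.choice(len(c), int(len(c) * rate), replace=False)
    c[idx] = flip(c[idx])
    return c


pattern = np.zeros(20, dtype=int)

# A freshly seeded generator on every call reproduces the same draw each time.
same = [randomizer(pattern, np.random.default_rng(42)) for _ in range(3)]

# One generator created once and reused gives a new draw on every call.
rng = np.random.default_rng(42)
different = [randomizer(pattern, rng) for _ in range(3)]
```

If the real code behaves like the first case, removing the repeated seeding (or seeding once at program start) would be the thing to try.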


To determine the two optimal cutpoints based on a U-shaped restricted cubic spline curve

I am trying to determine the two optimal cutpoints based on a restricted cubic spline, where a U-shaped association was found between a risk factor (X) and all-cause mortality. I used the optimal equal-HR method from the "CutpointsOEHR" package.
result <-coxph(Surv(survival.death,endpoint.death)~pspline(X,df=0,caic=TRUE)+X1+X2,data=indf)
termplot(result,se=TRUE,col.term=1,ylab='log relative hazard')
# the two lines above run fine.
cuts <- findcutpoints(cox_pspline_fit = result, data = dataset,nquantile = 100, exclude = 0.05,eps = 0.01,shape='U')
# but running the cuts line produces this error:
# Error in if (missing(data) | class(data) != "data.frame") { :
#   the condition has length > 1

Plotting an exponential function given one parameter

I'm fairly new to Python, so bear with me. I have generated some data with a very large number of points, stored in the variable vals. I have plotted a histogram of these values, restricted so that only values between 104 and 155 are taken into account. This was done as follows:
bin_heights, bin_edges = np.histogram(vals, range=[104, 155], bins=30)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(bin_centres, bin_heights, np.sqrt(bin_heights), fmt=',', capsize=2)
plt.xlabel(r"$m_{\gamma\gamma} (GeV)$")  # raw string so \g is not read as an escape
plt.ylabel("Number of entries")
plt.show()
Giving the above plot:
My next step is to take into account values from vals which are less than 120. I have done this as follows:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 GeV set
I need to plot a curve on the same plot as the histogram, which follows the form B(x) = Ae^(-x/λ)
I then estimated a value of λ using the maximum likelihood estimator formula:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 GeV set
#print(background_data)
N_background=len(background_data)
print(N_background)
sigma_background_data=sum(background_data)
print(sigma_background_data)
lamb = (sigma_background_data)/(N_background) #maximum likelihood estimator for lambda
print('lambda estimate is', lamb)
where lamb = λ. I got a value of roughly lamb = 27.75, which I know is correct. I now need to get an estimate for A.
I have been advised to do this as follows:
Given a value of λ, find A by scaling the PDF to the data such that the area beneath
the scaled PDF has equal area to the data
I'm not quite sure what this means, or how I'd go about trying to do this. PDF means probability density function. I assume an integration will have to take place, so to get the area under the data (vals), I have done this:
data_area= integrate.cumtrapz(background_data, x=None, dx=1.0)
print(data_area)
plt.plot(background_data, data_area)
However, this gives me an error
ValueError: x and y must have same first dimension, but have shapes (981555,) and (981554,)
I'm not sure how to fix it. The end result should be something like:
See the cumtrapz docs:
Returns: ... If initial is None, the shape is such that the axis of integration has one less value than y. If initial is given, the shape is equal to that of y.
So either pass an initial value, like
data_area = integrate.cumtrapz(background_data, x=None, dx=1.0, initial = 0.0)
or discard the first value of background_data when plotting:
plt.plot(background_data[1:], data_area)
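The advice quoted in the question (scale the PDF so that its area equals the area of the data) can be read as: pick A so that the integral of A*exp(-x/λ) over the histogram range equals the summed bin heights times the bin width. The following is a minimal sketch of that reading, not a confirmed solution. It assumes synthetic data as a stand-in for vals, the λ value quoted in the question, and that the areas are matched over the full 104 to 155 range; restricting both areas to the background window below 120 would work the same way.

```python
import numpy as np

# Synthetic stand-in for `vals` (assumption: the real data is not available here).
rng = np.random.default_rng(0)
lamb = 27.75                      # value quoted in the question
lo, hi, n_bins = 104.0, 155.0, 30
vals = lo + rng.exponential(lamb, size=200_000)
vals = vals[vals <= hi]

bin_heights, bin_edges = np.histogram(vals, range=[lo, hi], bins=n_bins)
bin_width = bin_edges[1] - bin_edges[0]

# Area under the histogram: sum of (height * width) over all bins.
data_area = bin_heights.sum() * bin_width

# Area under exp(-x/lamb) between lo and hi, from the analytic integral.
shape_area = lamb * (np.exp(-lo / lamb) - np.exp(-hi / lamb))

# Choose A so that A * shape_area equals the histogram area.
A = data_area / shape_area

# Curve to draw on top of the histogram, e.g. with plt.plot(x, B).
x = np.linspace(lo, hi, 200)
B = A * np.exp(-x / lamb)
```

Because the curve is scaled to counts per bin, it can be drawn on the same axes as the errorbar plot of bin_heights.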

Finding n-tuple that minimizes expensive cost function

Suppose there are three variables that take on discrete integer values, say w1 = {1,2,3,4,5,6,7,8,9,10,11,12}, w2 = {1,2,3,4,5,6,7,8,9,10,11,12}, and w3 = {1,2,3,4,5,6,7,8,9,10,11,12}. The task is to pick one value from each set such that the resulting triplet minimizes some (black box, computationally expensive) cost function.
I've tried the surrogate optimization in Matlab but I'm not sure it is appropriate. I've also heard about simulated annealing but found no implementation applied to this instance.
Which algorithm, apart from exhaustive search, can solve this combinatorial optimization problem?
Any help would be much appreciated.
Simulated Annealing (SA) requires, and benefits from, an objective surface that is somewhat smooth, so that a state near a good solution also has a reasonably good cost.
For a completely random, spiky surface you might as well do a random search.
If the surface is smooth, even if only in places, it makes sense to try SA.
The idea is that changing only 1 of the 3 values will (sometimes) have little effect on our black-box function.
Here is a basic example that does this with Simulated Annealing, using frigidum in Python:
import numpy as np
w1 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w2 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w3 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
W = np.array([w1,w2,w3])
LENGTH = 12
I define a black-box using the Rastrigin function.
def rastrigin_function_n( x ):
    """
    N-dimensional Rastrigin
    https://en.wikipedia.org/wiki/Rastrigin_function
    x_i is in [-5.12, 5.12]
    """
    A = 10
    n = x.shape[0]
    return A*n + np.sum( x**2- A*np.cos(2*np.pi * x) )

def black_box( x ):
    """
    Transform from domain [1,12] to [-5,5]
    to be able to push to rastrigin
    """
    x = (x - 6.5) * (5/5.5)
    return rastrigin_function_n(x)
Simulated Annealing needs to modify state X. Instead of taking/modifying values directly, we keep track of indices. This simplifies creating new proposals, because an index is always an integer to which we can simply add or subtract 1 modulo LENGTH.
def random_start():
    """
    returns 3 random indices
    """
    return np.random.randint(0, LENGTH, size=3)

def random_small_step(x):
    """
    change only 1 index
    """
    d = np.array( [1,0,0] )
    if np.random.random() < .5:
        d = np.array( [-1,0,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def random_big_step(x):
    """
    change 2 indices
    """
    d = np.array( [1,-1,0] )
    np.random.shuffle(d)
    return (x+d) % LENGTH

def obj(x):
    """
    We have a triplet of indices,
    1. Calculate corresponding values in W = [w1,w2,w3]
    2. Push the values in our black-box function
    """
    indices = x
    values = W[np.array([0,1,2]), indices]
    return black_box(values)
And throw an SA scheme at it:
import frigidum
local_opt = frigidum.sa(random_start=random_start,
                        neighbours=[random_small_step, random_big_step],
                        objective_function=obj,
                        T_start=10**4,
                        T_stop=0.000001,
                        repeats=10**3,
                        copy_state=frigidum.annealing.naked)
I am not sure what the minimum for this function should be, but it found an objective value of 47.9095 at indices np.array([9, 2, 2]).
Edit:
To change the cooling schedule in frigidum, use alpha=.9. In my experience, the effort of experimenting to find which cooling scheme works best rarely outweighs simply letting it run a little longer. The multiplication you proposed (sometimes called geometric cooling) is the standard one, and it is also what frigidum implements. So to get T_(n+1) = 0.9*T_n you need alpha=.9. Be aware that this cooling step is done after every batch of repeats, so with repeats=100 it will first do 100 proposals before lowering the temperature by the factor alpha.
Simple variations on the current state often work best. Since it is best practice to set the initial temperature high enough that most proposals (>90%) are accepted, it doesn't matter that the steps are small. But if you fear they are too small, try 2 or 3 variations. Frigidum accepts a list of proposal functions, and combinations can reinforce each other.
I have no experience with MINLP. Even so, experiments often surprise us, so if the time/cost of bringing another competitor to the table is small, then yes!
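The geometric schedule described in the edit (do a batch of proposals, then multiply the temperature by alpha) can be sketched without any library. This is not frigidum's implementation; the function name anneal, its defaults, the single small_step proposal and the start state are all assumptions made for illustration, and the objective is the same Rastrigin stand-in used in the answer.

```python
import math
import random

LENGTH = 12


def objective(idx):
    """Rastrigin on values 1..12 rescaled to [-5, 5], evaluated on an index triplet."""
    total = 0.0
    for i in idx:
        x = ((i + 1) - 6.5) * (5 / 5.5)
        total += 10 + x * x - 10 * math.cos(2 * math.pi * x)
    return total


def small_step(idx, rng):
    """Move one randomly chosen index up or down by 1, modulo LENGTH."""
    new = list(idx)
    pos = rng.randrange(len(new))
    new[pos] = (new[pos] + rng.choice((-1, 1))) % LENGTH
    return tuple(new)


def anneal(objective, start, neighbour, T_start=100.0, T_stop=1e-3,
           alpha=0.9, repeats=100, seed=0):
    """Simulated annealing with geometric cooling: T <- alpha * T after each batch."""
    rng = random.Random(seed)
    state, cost = start, objective(start)
    best_state, best_cost = state, cost
    T = T_start
    while T > T_stop:
        for _ in range(repeats):
            cand = neighbour(state, rng)
            cand_cost = objective(cand)
            delta = cand_cost - cost
            # Always accept improvements; accept worse moves with probability exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                state, cost = cand, cand_cost
                if cost < best_cost:
                    best_state, best_cost = state, cost
        T *= alpha  # the cooling step happens once per batch of `repeats` proposals
    return best_state, best_cost


best_state, best_cost = anneal(objective, (0, 0, 0), small_step)
```

With these defaults the loop makes roughly 11,000 objective calls, which is far more than the 1728 possible triplets; for a genuinely expensive black box the batch size and temperature range would have to be much smaller.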
Try every possible combination of the three values and see which has the lowest cost.
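With 12 values per variable there are only 12^3 = 1728 triplets, so the exhaustive search suggested here fits in a few lines; the cost is 1728 calls of the expensive function. A minimal sketch, using the Rastrigin stand-in from the other answer as a hypothetical black box (the real cost function is not known):

```python
import itertools

import numpy as np

w1 = w2 = w3 = range(1, 13)


def black_box(values):
    """Hypothetical cost: Rastrigin on values 1..12 rescaled to [-5, 5]."""
    x = (np.asarray(values, dtype=float) - 6.5) * (5 / 5.5)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))


# Evaluate every triplet once and keep the cheapest one.
costs = {t: black_box(t) for t in itertools.product(w1, w2, w3)}
best_triplet = min(costs, key=costs.get)
best_cost = costs[best_triplet]
```

On this stand-in the minimum is about 47.9095, the same value the annealing run reported, and several triplets tie for it because the function is symmetric.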

Numpy returning False even though both arrays are the same?

From my understanding of numpy, the np.equal([x, prod]) command compares the arrays element by element and returns True for each if they are equal. But every time I execute the command, it returns False for the first comparison. On the other hand, if I copy-paste the two arrays into the command, it returns True for both, as you can see in the screenshot. So, why is there a difference between the two?
You cannot reliably compare floating-point numbers for exact equality, as they are only approximations. When you compare hardcoded values, they will be equal because they are approximated in exactly the same way. But once you apply mathematical operations to them, rounding errors can accumulate, and two results that should be equal mathematically may differ in their last digits.
For example, this
a = 0
for i in range(10):
    a += 1/10
print(a)
print(a == 1)
will give you 0.9999999999999999 and False, even though (1/10) * 10 = 1.
To compare floating-point values, you need to compare the two values against a small delta value. In other words, check if they're just a really small value apart. For example
a = 0
for i in range(10):
    a += 1/10
delta = 0.00000001
print(a)
print(abs(a - 1) < delta)
will give you True.
For numpy, you can use numpy.isclose to get a mask or numpy.allclose if you only want a True or False value.
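A small sketch of the difference between the exact and the tolerant comparison. The arrays here are made up for illustration (the question's x and prod are only visible in the screenshot), and the default tolerances of isclose and allclose are used:

```python
import numpy as np

x = np.array([0.1 * 3, 1.0])   # 0.1 * 3 is stored as 0.30000000000000004
prod = np.array([0.3, 1.0])

exact = np.equal(x, prod)          # element-wise exact comparison
mask = np.isclose(x, prod)         # element-wise comparison within a tolerance
all_close = np.allclose(x, prod)   # a single True/False for the whole array
```

Both functions accept rtol and atol arguments if the default tolerances are too loose or too strict for the data at hand.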

knapsack algorithm not returning optimal value

I am trying to write an algorithm in Python for the knapsack problem. After quite a few iterations I arrived at the following solution. It looks correct to me, but when I run it on test sets:
it does not output the optimal value,
and it sometimes fails with a maximum recursion depth error.
After changing the maxValue function it outputs the optimal value, but it takes a very long time on data sets with more items. How can I refine it?
For the second problem, I inspected the data for which it gives the error. The data set is huge, and only a couple of the weights exceed the knapsack capacity, so the function unnecessarily goes through the entire list.
So what I plan to do is, at the start of the recursive function, look through the entire weights list for the weights that are less than the current capacity and prune the rest. The following is the code I am planning to implement.
#~ weights_jump = find_indices(temp_w, lambda e: e < capacity)
#~ if(len(weights_jump)>0):
#~ temp_w[0:weights_jump[0]-1] = []
#~ temp_v[0:weights_jump[0]-1] = []
My main problem remains why it is not outputting the optimal value. Please help me with this, and also with integrating the above code into the current algorithm.
The following is the main function. Its input is as follows:
A knapsack input contains n + 1 lines. The first line contains two integers, the first is the number
of items in the problem, n. The second number is the capacity of the knapsack, K. The remaining
lines present the data for each of the items. Each line i ∈ 0 ... n − 1 contains two integers, the
item's value v_i followed by its weight w_i.
Example input:
n K
v_0 w_0
v_1 w_1
...
v_n-1 w_n-1
def solveIt(inputData):
    # parse the input
    lines = inputData.split('\n')
    firstLine = lines[0].split()
    items = int(firstLine[0])
    capacity = int(firstLine[1])
    K = capacity
    values = []
    weights = []
    for i in range(1, items+1):
        line = lines[i]
        parts = line.split()
        values.append(int(parts[0]))
        weights.append(int(parts[1]))
    items = len(values)
    # end of parsing
    value = 0
    weight = 0
    print(weights)
    print(values)
    taken = []  # assumption: undefined in the original; start with no decisions made
    v = node(value,weights,values,K,0,taken)
    # prepare the solution in the specified output format
    outputData = str(v[0]) + ' ' + str(0) + '\n'
    outputData += ' '.join(map(str, v[1]))
    return outputData
The following is the recursive function.
I will try to explain it. Say I start at the root node; I have two decisions available: take the first element or not.
Before deciding, I call the maxValue function to get the maximum value that can be obtained by following this branch. If it does not exceed existing_max there is no need to search, so the branch is pruned.
I follow the left branch if the weight of the first element does not exceed the capacity, so I append 1.
I then update the values and weights lists, etc., and call node again.
So it first traverses the entire left branch and then the right branch.
In the right branch I just update the values and weights lists and call node.
The inputs to this function are:
value -- the current value of the partial solution; it is initially zero and increases as items are taken
weights list
values list
current capacity
existing_max -- the current maximum value found by the algorithm. If existing_max is greater than the maximum value that can be obtained by following a branch,
there is no need to search that branch, so the entire branch is pruned
existing_nodes -- the list that records whether each item is taken (1) or not (0)
def node(value,weights,values,capacity,existing_max,existing_nodes):
    v1=[];e1=[]  #values we get from left branch
    v2=[];e2=[]  #values we get from right branch
    e=[]
    e = existing_nodes[:]
    temp_w = weights[:]
    temp_v = values[:]
    #first check if the list is empty
    if(len(values)==0):
        r = [value,existing_nodes[:]]
        return r
    #check whether this entire branch can be pruned: the best value obtainable
    #from here must be more than the max value passed in
    max_value = value+maxValue(weights,values,capacity)
    print('existing _max is '+str(existing_max))
    print('weight in concern '+str(weights[0])+' value is '+str(value))
    if(max_value<=existing_max):
        return [0,[]]
    #Traversing the left branch
    #Traverse only if the weight does not exceed the capacity
    print(colored('leftbranch','red'))
    #search for indices of weights where weight < capacity
    #~ weights_jump = find_indices(temp_w, lambda e: e < capacity)
    #~ if(len(weights_jump)>0):
    #~ temp_w[0:weights_jump[0]-1] = []
    #~ temp_v[0:weights_jump[0]-1] = []
    if(temp_w[0]<=capacity):
        updated_value = temp_v[0]+value
        k = capacity-temp_w[0]
        temp_w.pop(0)
        temp_v.pop(0)
        e1 =e[:]
        e1.append(1)
        print(str(updated_value)+' '+str(k)+' ')
        raw_input('press ')
        v1= node(updated_value,temp_w,temp_v,k,existing_max,e1)
        #after traversing the left node update existing_max
        if(v1[0]>existing_max):
            existing_max = v1[0]
    else:
        v1 = [0,[]]
    #Traverse the right branch
    #it implies we are not including the current value so remove that from weights and values.
    print('rightbranch')
    #~ print(str(value)+' '+str(capacity)+' ')
    raw_input("Press Enter to continue...")
    weights.pop(0)
    values.pop(0)
    e2 =e[:]
    e2.append(0)
    v2 = node(value,weights,values,capacity,existing_max,e2)
    if(v1[0]>v2[0]):
        return v1
    else:
        return v2
The following is the helper function maxValue, which is called from the recursive function
import operator

def maxValue(weights,values,K):
    a=[]
    l = 0
    #~ print('hello');
    items = len(weights)
    #~ print(items);
    max = 0
    k = K
    # pair each item index with its value-to-weight ratio
    for i in range(0,items):
        t = (i,float(values[i])/weights[i])
        a.append(t)
        #~ print(i);
    a = sorted(a,key=operator.itemgetter(1),reverse=True)
    #~ print(a);
    length = len(a)
    for (i,v) in a:
        #~ print('this is loop'+str(l)+'and k is '+str(k)+'weight is '+str(weights[i]) );
        if weights[i]<=k:
            max = max+values[i]
            w = weights[i]
            #~ print('this'+str(w));
            k = k-w
            if(k==0):
                break
        else:
            # item does not fit: take the fraction that fills the remaining capacity
            max = max+ (float(k)/weights[i])*values[i]
            w = k
            k = k-w
            #~ print('this is w '+str(max));
            if(k==0):
                break
        l= l+1
    return max
I spent days on this and could not do anything.
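One possible way to address both the recursion-depth error and the running time is to keep the same take/skip search with the fractional bound, but sort the items once by value density and replace the recursion with an explicit stack. The following is a sketch of that standard branch-and-bound variant, not a corrected version of the code above; the function name is hypothetical and all weights are assumed to be positive.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Return (best_value, taken) for the 0/1 knapsack problem."""
    n = len(values)
    # Visit items in order of decreasing value density so the bound is tight.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(pos, value, room):
        # Fractional relaxation: fill greedily, allow a fraction of one item.
        for i in order[pos:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    best_value, best_taken = 0, []
    # Explicit stack instead of recursion: (position, value, room, chosen items).
    stack = [(0, 0, capacity, [])]
    while stack:
        pos, value, room, chosen = stack.pop()
        if value > best_value:
            best_value, best_taken = value, chosen
        # Stop at a leaf, or when even the optimistic bound cannot beat the best so far.
        if pos == n or bound(pos, value, room) <= best_value:
            continue
        i = order[pos]
        # Push "skip" first so that "take" is explored first (the stack is LIFO).
        stack.append((pos + 1, value, room, chosen))
        if weights[i] <= room:
            stack.append((pos + 1, value + values[i], room - weights[i], chosen + [i]))

    taken = [0] * n
    for i in best_taken:
        taken[i] = 1
    return best_value, taken


best_value, taken = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

Sorting once means the bound no longer has to re-sort the remaining items at every node, which is where the maxValue helper above spends much of its time.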