Meaning of last values in test sampling - dataframe

data.test_1$pred <- ifelse(data.test_1$score<=cut_off, 0, 1)
I am trying to get the predictions from my test sample. Since I am running this code, I would like to know what the last values in the code, "[...], 0, 1)", stand for.

data.test_1$pred <- ifelse(data.test_1$score<=cut_off, 0, 1)
ifelse(test, yes, no) is applied element by element: for each row, if data.test_1$score is less than or equal to the cut_off value, pred is set to 0; otherwise it is set to 1.
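To make the element-wise rule concrete, here is a rough Python analogue, used only for illustration. The function name predict and the sample scores are hypothetical. The second argument of ifelse is the value used where the test is TRUE, and the third is the value used where it is FALSE.

```python
def predict(scores, cut_off):
    """Rough analogue of R's ifelse(score <= cut_off, 0, 1), applied per element.

    0 is returned where the test is TRUE (score at or below cut_off),
    1 where it is FALSE (score above cut_off).
    """
    return [0 if s <= cut_off else 1 for s in scores]


print(predict([0.2, 0.5, 0.7], 0.5))  # [0, 0, 1]: a score equal to cut_off gives 0
```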


How to add sequential (time series) constraint to optimization problem using python PuLP?

A simple optimization problem: find the optimal control sequence for a refrigerator based on the cost of energy. The only constraint is to stay below a temperature threshold, and the objective function tries to minimize the cost of the energy used. The problem is simplified so that the control is just a binary array, e.g. [0, 1, 0, 1, 0], where 1 means using electricity to cool the fridge and 0 means turning off the cooling mechanism (so there is no cost for that period, but the temperature will increase). We can assume each period is a fixed length of time and has a constant temperature change based on its on/off status.
Here are the example values:
Cost of energy (for our example 5 periods): [466, 426, 423, 442, 494]
Minimum cooling periods (just as a test): 3
Starting temperature: 0
Temperature threshold (temperature must be less than or equal to it): 1
Temperature change per period of cooling: -1
Temperature change per period of warming (when control input is 0): 2
And here is the code in PuLP:
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus, value
from itertools import accumulate
l = list(range(5))
costy = [466, 426, 423, 442, 494]
cost = dict(zip(l, costy))
min_cooling_periods = 3
prob = LpProblem("Fridge", LpMinimize)
si = LpVariable.dicts("time_step", l, lowBound=0, upBound=1, cat='Integer')
prob += lpSum([cost[i]*si[i] for i in l]) # cost function to minimize
prob += lpSum([si[i] for i in l]) >= min_cooling_periods # how many values must be positive
prob.solve()
The optimization seems to work before I try to account for the temperature threshold. With just the cost function, it returns an array of 0s, which does indeed minimize the cost (duh). With the first constraint (how many values must be positive) it picks the cheapest 3 cooling periods, and calculates the total cost correctly.
obj = value(prob.objective)
print(f'Solution is {LpStatus[prob.status]}\nThe total cost of this regime is: {obj}\n')
for v in prob.variables():
    print(f'{v.name} = {v.varValue}')
output:
Solution is Optimal
The total cost of this regime is: 1291.0
time_step_0 = 0.0
time_step_1 = 1.0
time_step_2 = 1.0
time_step_3 = 1.0
time_step_4 = 0.0
So, if our control sequence is [0, 1, 1, 1, 0], the temperature at the end of each cooling/warming period will be [2, 1, 0, -1, 1]. The temperature goes up 2 whenever the control input is 0, and down 1 whenever the control input is 1. This example sequence is a valid answer so far, but it will have to change once we add a maximum temperature threshold of 1: the first value must then be a 1, or else the fridge will warm to a temperature of 2.
However, I get incorrect results when I try to specify the sequential constraint of staying within the temperature threshold with this condition:
up_temp_thresh = 1
down = -1
up = 2
# here is where I try to ensure that the control sequence would never cause the temperature to
# surpass the threshold. In practice I would like a lower and upper threshold but for now
# let us focus only on the upper threshold.
prob += lpSum([e <= up_temp_thresh for e in accumulate([down if si[i] == 1. else up for i in l])]) >= len(l)
In this case the answer comes out the same as before. I am clearly not formulating it correctly, because the sequence [0, 1, 1, 1, 0] would exceed the threshold.
I am trying to encode "the temperature at the end of each period must be at or below the threshold". I do this by turning the control sequence into an array of temperature changes, so the control sequence [0, 1, 1, 1, 0] gives the temperature changes [2, -1, -1, -1, 2]. The accumulate function then computes a cumulative sum, equal to the fridge temperature after each step, which is [2, 1, 0, -1, 1]. I would like to simply check that the max of this array is less than or equal to the threshold, but using lpSum I instead check that the number of values in the array that are at or below the threshold is at least the length of the array, which should amount to the same thing.
However I'm clearly formulating this step incorrectly. As written this last constraint has no effect on the output, and small changes give other wrong answers. It seems the answer should be [1, 1, 1, 0, 0], which gives an acceptable temperature series of [-1, -2, -3, -1, 1]. How can I specify the sequential nature of the control input using PuLP, or another free python optimization library?
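With concrete 0/1 values, the check described above works fine in plain Python, as the sketch below shows (the helper names temp_series and within_threshold are hypothetical). A likely reason the PuLP version has no effect is that inside the model si[i] is a PuLP variable, not a number: a Python conditional such as `down if si[i] == 1. else up` is evaluated once, while the model is being built, so the solver never sees the temperature depending on si.

```python
from itertools import accumulate

DOWN, UP = -1, 2      # temperature change per period: cooling on / cooling off
UP_TEMP_THRESH = 1    # upper threshold from the question


def temp_series(controls):
    """Fridge temperature after each period, starting from 0 (plain numbers only)."""
    return list(accumulate(DOWN if s == 1 else UP for s in controls))


def within_threshold(controls, thresh=UP_TEMP_THRESH):
    """True if the temperature never goes above the threshold."""
    return max(temp_series(controls)) <= thresh


print(temp_series([0, 1, 1, 1, 0]), within_threshold([0, 1, 1, 1, 0]))  # [2, 1, 0, -1, 1] False
print(temp_series([1, 1, 1, 0, 0]), within_threshold([1, 1, 1, 0, 0]))  # [-1, -2, -3, -1, 1] True
```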
The easiest and least error-prone approach would be to create a new set of auxiliary variables in your problem that track the temperature of the fridge in each interval. These are not 'primary decision variables' because you cannot choose them directly; rather, their values are constrained by the on/off decision variables for the fridge.
You would then add constraints on these temperature state variables to represent the dynamics. So in untested code:
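init_temp = 0  # initial temperature of fridge - a known value (0 in the question)
l_plus_1 = list(range(len(l) + 1))
fridge_temp = LpVariable.dicts("fridge_temp", l_plus_1, cat='Continuous')
fridge_temp[0] = init_temp  # replace the first variable with the known constant
for i in l:
    # off (si[i] = 0): +2; on (si[i] = 1): 2 - 3 = -1
    prob += fridge_temp[i+1] == fridge_temp[i] + 2 - 3*si[i]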
You can then set the min/max temperature constraints on these new fridge_temp variables, for example prob += fridge_temp[i+1] <= up_temp_thresh for each i in l.
Note that in the above I've assumed that the fridge temperature variables are defined for one more interval than the on/off decisions for the fridge. The fridge temperature variables represent the temperature at the start of an interval, and having one extra one means we can ensure the final temperature of the fridge is acceptable.
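For a problem this small (2^5 = 32 possible control sequences), the formulation can be cross-checked by brute force before trusting the solver. The sketch below is plain Python rather than PuLP; it uses the same recurrence as the answer, T[i+1] = T[i] + 2 - 3*s[i], the question's costs, a minimum of 3 cooling periods, a starting temperature of 0 and an upper threshold of 1. The helper names are hypothetical.

```python
from itertools import product

COSTS = [466, 426, 423, 442, 494]
MIN_COOLING = 3
INIT_TEMP = 0
UPPER = 1


def temperatures(controls, init_temp=INIT_TEMP):
    """Temperature at the start of each interval plus the final one (len(controls) + 1 values)."""
    temps = [init_temp]
    for s in controls:
        temps.append(temps[-1] + 2 - 3 * s)  # +2 when off, -1 when on
    return temps


def brute_force(costs, min_cooling, upper):
    """Cheapest feasible on/off sequence, found by enumerating every sequence."""
    best = None
    for controls in product((0, 1), repeat=len(costs)):
        if sum(controls) < min_cooling:
            continue
        if max(temperatures(controls)) > upper:
            continue
        total = sum(c * s for c, s in zip(costs, controls))
        if best is None or total < best[0]:
            best = (total, list(controls))
    return best


print(brute_force(COSTS, MIN_COOLING, UPPER))  # (1315, [1, 1, 1, 0, 0])
```

A PuLP model built with the fridge_temp variables and the threshold constraint should agree with this result; for realistic horizons, only the solver approach scales.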

Check a row for ascension in Numpy, but ignoring elements = 0

I have the code snippet below that searches each row/column in an array to see if all values are either ascending or descending. Ideally, this code would ignore zeros. For example, a row with (5, 0, 3, 1) would come up True for descending. The code below still looks at the zeros. If the masked technique is a dead end, maybe I could create a copy without zeros? I'm very new to Numpy so I would appreciate specific directions. Thanks!
np.ma.masked_equal(grid, 0)
for row in grid:
    if np.all(np.diff(row) <= 0) or np.all(np.diff(row) >= 0):
        monoScore += .5
for col in np.transpose(grid):
    if np.all(np.diff(col) <= 0) or np.all(np.diff(col) >= 0):
        monoScore += .5
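A minimal sketch of the "copy without zeros" idea, assuming grid is a 2-D NumPy array (the helper names are hypothetical). In the snippet above, the result of np.ma.masked_equal(grid, 0) is never assigned, so grid itself is unchanged; and masking alone would only hide the differences next to each zero rather than compare the values on either side of it. Boolean indexing, line[line != 0], instead builds a copy of the row without zeros, so the neighbours across a zero are compared directly.

```python
import numpy as np


def is_monotonic_ignoring_zeros(line):
    """True if the non-zero entries of a 1-D array are all non-increasing or all non-decreasing."""
    nonzero = line[line != 0]  # boolean indexing returns a copy without the zeros
    d = np.diff(nonzero)
    return bool(np.all(d <= 0) or np.all(d >= 0))


def mono_score(grid):
    """Adds 0.5 for every row and every column that is monotonic once zeros are ignored."""
    score = 0.0
    for row in grid:
        if is_monotonic_ignoring_zeros(row):
            score += 0.5
    for col in grid.T:
        if is_monotonic_ignoring_zeros(col):
            score += 0.5
    return score


print(is_monotonic_ignoring_zeros(np.array([5, 0, 3, 1])))  # True (5, 3, 1 is descending)
print(is_monotonic_ignoring_zeros(np.array([5, 0, 7, 1])))  # False (5, 7, 1 is neither)
```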

Optimizing specific numbers to reach value

I'm trying to write a program that, given specific values (let's say 1, 4 and 10), works out how many of each value are needed to reach a certain amount, say 19.
It should always try to use as many high values as possible, so in this case the result should be 10*1, 4*2, 1*1.
I tried thinking about it, but couldn't come up with an algorithm that works...
Any help or hints would be welcome!
Here is a Python solution that tries all the choices until one is found. If you pass the values in descending order, the first solution found will be the one that uses as many high values as possible:
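def solve(left, idx, nums, used):
    # success: the remaining amount has been covered exactly
    if left == 0:
        return True
    # try each remaining value in turn; moving on to i + 1 lets a value be skipped
    for i in range(idx, len(nums)):
        j = left // nums[i]  # largest count of nums[i] that still fits
        while j > 0:
            used.append((nums[i], j))
            if solve(left - j * nums[i], i + 1, nums, used):
                return True
            used.pop()  # backtrack and try one fewer
            j -= 1
    return False

solution = []
solve(19, 0, [10, 4, 1], solution)
print(solution)  # will print [(10, 1), (4, 2), (1, 1)]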
If anyone needs a simple algorithm, one way I found was:
sort the values in descending order
keep track of how many values are kept
for each value, do:
    if the sum is equal to the target, stop
    if it isn't the first value, remove one of the previous values
    while the total sum of values is smaller than the objective:
        add the current value once
Have a nice day!
(As juviant mentioned, this won't work if the solution skips larger numbers and only uses smaller ones! I'll try to improve it and post a new version when I get it to work.)
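A minimal sketch of the steps above, assuming "remove one of the previous values" means removing the most recently added value (the one that overshot the target). The function name greedy_fill is hypothetical. It returns None when the target is not hit exactly, which is exactly the limitation noted above: for a target of 8 with values 5 and 4, it never tries skipping the 5.

```python
def greedy_fill(values, target):
    """Greedy fill as described above; returns the kept values, or None on failure."""
    kept = []
    for k, v in enumerate(sorted(values, reverse=True)):
        if sum(kept) == target:
            break
        # not the first value: drop the last kept value, which made the sum overshoot
        if k > 0 and kept:
            kept.pop()
        while sum(kept) < target:
            kept.append(v)
    return kept if sum(kept) == target else None


print(greedy_fill([1, 4, 10], 19))  # [10, 4, 4, 1], i.e. 10*1, 4*2, 1*1
print(greedy_fill([5, 4], 8))       # None, although 4*2 would work
```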

Octave: summing indexed elements

The easiest way to describe this is via example:
data = [1, 5, 3, 6, 10];
indices = [1, 2, 2, 2, 4];
result = zeros(1, 5);
I want result(1) to be the sum of all the elements in data whose index is 1, result(2) to be the sum of all the elements in data whose index is 2, etc.
This works but is really slow when applied (changing 5 to 65535) to 64K element vectors:
result = result + arrayfun(@(x) sum(data(indices==x)), 1:5);
I think it's creating 64K vectors with 64K elements that's taking up the time. Is there a faster way to do this? Or do I need to figure out a completely different approach? An explicit loop also works:
for i = [1:5]
  idx = indices(i);
  result(idx) = result(idx) + data(i);
endfor
But that's a very non-octave-y way to do it.
Since MATLAB is very similar to Octave, I will provide an answer that was tested on MATLAB R2016b. According to the documentation of Octave 4.2.1, the syntax should be the same.
All you need to do is this:
result = accumarray(indices(:), data(:), [5 1]).'
Which gives:
result =
1 14 0 10 0
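Reshaping to a column vector (arrayName(:)) is necessary because of the inputs accumarray expects: each row of the subscript argument is treated as one subscript, so a row vector would be read as a single 5-D index. Specifying the size as [5 1] and then transposing the result was done to avoid a MATLAB error, since with a column of subscripts the size argument is expected in column form. The explicit size also pads the output to 5 elements, even though the largest index is 4.
accumarray is also described in depth in the MATLAB documentation.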
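For comparison, NumPy's np.bincount performs the same "sum the data per index" operation; it is shown here only as an analogue, not as part of the Octave answer. bincount uses 0-based bins, so the 1-based indices are shifted down by one; weights= sums the matching data values, and minlength= plays the role of the [5 1] size argument.

```python
import numpy as np

data = np.array([1, 5, 3, 6, 10])
indices = np.array([1, 2, 2, 2, 4])  # 1-based, as in the Octave example

# bin k receives the sum of data where indices - 1 == k; output padded to 5 bins
result = np.bincount(indices - 1, weights=data, minlength=5)
print(result.tolist())  # [1.0, 14.0, 0.0, 10.0, 0.0]
```

The output is floating point because weights= is used; a plain loop over 64K elements is avoided in both versions.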

Using steganography to embed data in DWT subband coefficients

I have been doing more research on the topic of DWT steganography, and I have come across the code below on the web. This is the first time I have come across subband coefficients being specified. I have an idea of what the code does, but I would like someone to verify it!
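% assumes bit_count, message and HH (the HH subband coefficients) are defined before this snippet
steg_coeffs = [4, 4.75, 5.5, 6.25, 7];
for jj=1:size(message,2)+1
    if jj > size(message,2)
        charbits = [0,0,0,0,0,0,0,0];   % terminating zero byte after the message
    else
        charbits = dec2bin(message(jj),8)';
        charbits = charbits(:)'-'0';    % characters '0'/'1' -> numbers 0/1
    end
    for ii=1:8
        bit_count = bit_count + 1;
        if charbits(ii) == 1
            if HH(bit_count) <= 0
                HH(bit_count) = steg_coeffs(randi(numel(steg_coeffs)));
            end
        else
            if HH(bit_count) >= 0
                HH(bit_count) = -1 * steg_coeffs(randi(numel(steg_coeffs)));
            end
        end
    end
end   % closes the jj loop (missing from the snippet as found)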
I think steg_coeffs are selected coefficients of the HH subband, and that the bits will be embedded in these selected coefficients. I have googled randi and believe that it will randomize these specified coefficients on each iteration of the loop and embed the bits in a random selection of coefficients. Am I correct? Thank you
If you type help randi, you will find that randi(IMAX) returns a scalar: an integer uniformly distributed (based on a PRNG) in the range 1:IMAX. Put simply, it chooses a random integer between 1 and IMAX.
numel(matrix) returns the total number of elements in the matrix.
So, steg_coeffs(randi(numel(steg_coeffs))) chooses a random element from steg_coeffs, by choosing a random index between 1 and 5.
The embedding algorithm is implemented in the following block.
if charbits(ii) == 1
    ...
else
    ...
end
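Basically, if you're embedding a 1, the HH coefficient has to be positive. If it isn't, substitute it with one from steg_coeffs. Similarly, if you're embedding a 0, the HH coefficient has to be negative. If it isn't, substitute it with the negative of one from steg_coeffs. So steg_coeffs are not coefficients selected from the HH subband: they are a pool of replacement magnitudes, while the bits themselves are carried by consecutive HH coefficients as bit_count increments.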
The idea is that when you extract the secret, all you have to check is whether the HH coefficient is positive or negative, to know whether the bit has to be 1 or 0.
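A minimal Python sketch of this sign-based scheme, to make embedding and extraction concrete. It assumes the HH coefficients are already available as a flat list (no wavelet transform is performed here), that the message is ASCII, and that there are at least 8 * (len(message) + 1) coefficients; the function names are hypothetical. As in the MATLAB loop, each character becomes 8 bits (most significant bit first, like dec2bin(c, 8)) followed by a terminating zero byte.

```python
import random

STEG_COEFFS = [4, 4.75, 5.5, 6.25, 7]  # replacement magnitudes, as in the MATLAB snippet


def message_bits(message):
    """8 bits per ASCII character, MSB first, plus a terminating all-zero byte."""
    bits = []
    for ch in message:
        bits.extend(int(b) for b in format(ord(ch), "08b"))
    bits.extend([0] * 8)
    return bits


def embed(coeffs, message, rng=random):
    """Force coefficient k positive for a 1 bit, negative for a 0 bit (IndexError if too few)."""
    out = list(coeffs)
    for k, bit in enumerate(message_bits(message)):
        if bit == 1 and out[k] <= 0:
            out[k] = rng.choice(STEG_COEFFS)
        elif bit == 0 and out[k] >= 0:
            out[k] = -rng.choice(STEG_COEFFS)
    return out


def extract(coeffs):
    """Read bytes from coefficient signs (positive -> 1) until the zero byte."""
    chars = []
    for start in range(0, len(coeffs) - 7, 8):
        bits = [1 if c > 0 else 0 for c in coeffs[start:start + 8]]
        value = int("".join(map(str, bits)), 2)
        if value == 0:
            break
        chars.append(chr(value))
    return "".join(chars)


stego = embed([(-1) ** k * 0.5 for k in range(40)], "Hi", random.Random(0))
print(extract(stego))  # Hi
```

Coefficients whose sign already matches the bit are left untouched, which mirrors the if HH(bit_count) <= 0 / >= 0 checks in the original code.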