I have been trying to implement the square non-linearity activation function as a custom activation function for a Keras model. It's the 10th function on this list: https://en.wikipedia.org/wiki/Activation_function.
I tried using the Keras backend, but I got nowhere with the multiple if/else statements I require, so I also tried the following:
import tensorflow as tf
def square_nonlin(x):
    orig = x
    x = tf.where(orig > 2.0, (tf.ones_like(x)), x)
    x = tf.where(0.0 <= orig <= 2.0, (x - tf.math.square(x)/4), x)
    x = tf.where(-2.0 <= orig < 0, (x + tf.math.square(x)/4), x)
    return tf.where(orig < -2.0, -1, x)
As you can see, there are 4 different clauses I need to evaluate. But when I try to compile the Keras model I still get the error:
Using a `tf.Tensor` as a Python `bool` is not allowed
Could anyone help me to get this working in Keras? Thanks a lot.
I've just started digging into TensorFlow a week ago and am actively playing around with different activation functions. I think I know what two of your problems are. In your second and third assignments you have compound conditionals; you need to express them with tf.logical_and. The other problem is that the last tf.where on the return line returns -1, which is a scalar rather than the vector TensorFlow expects. I haven't tried the function with Keras, but in my "activation function" tester this code works.
def square_nonlin(x):
    orig = x
    x = tf.where(orig > 2.0, (tf.ones_like(x)), x)
    x = tf.where(tf.logical_and(0.0 <= orig, orig <= 2.0), (x - tf.math.square(x)/4.), x)
    x = tf.where(tf.logical_and(-2.0 <= orig, orig < 0), (x + tf.math.square(x)/4.), x)
    return tf.where(orig < -2.0, 0*x - 1.0, x)
As I said, I'm new at this, so to "vectorize" -1 I multiplied the x vector by 0 and subtracted 1.0, which produces an array filled with -1 of the right shape. Perhaps one of the more seasoned TensorFlow practitioners can suggest the proper way to do that.
Hope this helps.
BTW, tf.greater is equivalent to the tensor __gt__ operator, which means that orig > 2.0 expands under the covers in Python to tf.greater(orig, 2.0).
Just a follow up. I tried it with the MNIST demo in Keras and the activation function works as coded above.
UPDATE:
The less hacky way to "vectorize" -1 is to use the tf.ones_like function
so replace the last line with
return tf.where(orig < -2.0, -tf.ones_like(x), x)
for a cleaner solution
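For completeness, here is a minimal sketch of how the custom activation could be plugged into a Keras model; the layer sizes and the MNIST-style input shape are illustrative assumptions, not taken from the question:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),          # MNIST-style input, assumed for illustration
    keras.layers.Dense(128, activation=square_nonlin),   # pass the custom callable directly
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
Passing the Python callable directly works because Keras accepts any callable as an activation.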
I'm struggling a bit finding a fast algorithm that's suitable.
I just want to minimize:
norm2(x-s)
st
G.x <= h
x >= 0
sum(x) = R
G is sparse and contains only 1s (and zeros obviously).
In the case of iterative algorithms, it would be nice to get the interim solutions to show to the user.
The context is that s is a vector of current results, and the user is saying "well, the sum of these few entries (the entries indicated by a few 1.0's in a row of G) should be less than this value (a row in h)". So we have to remove quantities from the entries the user specified (indicated by 1.0 entries in G) in a least-squares-optimal way, but since we have a global constraint on the total (R), the values removed need to be allocated in a least-squares-optimal way amongst the other entries. The entries can't go negative.
All the algorithms I'm looking at are much more general, and as a result are much more complex. Also, they seem quite slow. I don't see this as a complex problem, although mixes of equality and inequality constraints always seem to make things more complex.
This has to be called from Python, so I'm looking at Python libraries like qpsolvers and scipy.optimize. But I suppose Java or C++ libraries could be used and called from Python, which might be good since multithreading is better in Java and C++.
Any thoughts on what library/package/approach to use to best solve this problem?
The size of the problem is about 150,000 rows in s, and a few dozen rows in G.
Thanks!
Your problem is a linear least squares:
minimize_x norm2(x-s)
such that G x <= h
x >= 0
1^T x = R
Thus it fits the bill of the solve_ls function in qpsolvers.
Here is an example of how I imagine your problem matrices might look, given what you specified. Since the problem is sparse, we should use SciPy CSC matrices for matrices and regular NumPy arrays for vectors:
import numpy as np
import scipy.sparse as spa
n = 150_000
# minimize 1/2 || x - s ||^2
R = spa.eye(n, format="csc")
s = np.array(range(n), dtype=float)
# such that G * x <= h
G = spa.diags(
    diagonals=[
        [1.0 if i % 2 == 0 else 0.0 for i in range(n)],
        [1.0 if i % 3 == 0 else 0.0 for i in range(n - 1)],
        [1.0 if i % 5 == 0 else 0.0 for i in range(n - 1)],
    ],
    offsets=[0, 1, -1],
    format="csc",  # CSC so we can slice out rows below
)
a_dozen_rows = np.linspace(0, n - 1, 12, dtype=int)
G = G[a_dozen_rows]
h = np.ones(12)
# such that sum(x) == 42
A = spa.csc_matrix(np.ones((1, n)))
b = np.array([42.0]).reshape((1,))
# such that x >= 0
lb = np.zeros(n)
Next, we can solve this problem with:
from qpsolvers import solve_ls
x = solve_ls(R, s, G, h, A, b, lb, solver="cvxopt", verbose=True)
Here I picked CVXOPT but there are other open-source solvers you can install such as ProxQP, OSQP or SCS. You can install a set of open-source solvers by: pip install qpsolvers[open_source_solvers]. After some solvers are installed, you can list those for sparse matrices by:
import qpsolvers

print(qpsolvers.sparse_solvers)
Finally, here is some code to check that the solution returned by the solver satisfies our constraints:
tol = 1e-6 # tolerance for checks
print(f"- Objective: {0.5 * (x - s).dot(x - s):.1f}")
print(f"- G * x <= h: {(G.dot(x) <= h + tol).all()}")
print(f"- x >= 0: {(x + tol >= 0.0).all()}")
print(f"- sum(x) = {x.sum():.1f}")
I just tried it with OSQP (adding the eps_rel=1e-5 keyword argument when calling solve_ls, otherwise the returned solution would be less accurate than the tol = 1e-6 tolerance) and it found a solution in 737 milliseconds on my (rather old) CPU with:
- Objective: 562494373088866.8
- G * x <= h: True
- x >= 0: True
- sum(x) = 42.0
Hoping this helps. Happy solving!
I would like to vectorize a function with a condition, meaning to calculate its values with array arithmetic. np.vectorize handles vectorization, but it does not work with array arithmetic, so it is not a complete solution.
An answer was given as the solution in the question "How to vectorize a function which contains an if statement?" but did not prevent errors here; see the MWE below.
import numpy as np

x = np.arange(3, dtype=float)  # x was not shown in the question; this example array reproduces both warnings

def myfx(x):
    return np.where(x < 1.1, 1, np.arcsin(1 / x))

y = myfx(x)
This runs but raises the following warnings:
<stdin>:2: RuntimeWarning: divide by zero encountered in true_divide
<stdin>:2: RuntimeWarning: invalid value encountered in arcsin
What is the problem, or is there a better way to do this?
I think this could be done by
Getting the indices ks of x for which x[k] > 1.1 for each k in ks.
Applying np.arcsin(1 / x[ks]) to the slice x[ks], and using 1 for the rest of the elements.
Recombining the arrays.
I am not sure about the efficiency, though.
The statement np.where(x < 1.1, 1, np.arcsin(1 / x)) is equivalent to
mask = x < 1.1
a = 1
b = np.arcsin(1 / x)
np.where(mask, a, b)
Notice that you're calling np.arcsin on all the elements of x, regardless of whether 1 / x <= 1 or not. Your basic plan is correct. You can do the operations in-place on an output array using the where keyword of np.arcsin and np.reciprocal, without having to recombine anything:
def myfx(x):
    mask = (x >= 1.1)
    out = np.ones(x.shape)
    np.reciprocal(x, where=mask, out=out)  # >= 1.1 implies != 0
    return np.arcsin(out, where=mask, out=out)
Using np.ones ensures that the unmasked elements of out are initialized correctly. An equivalent method would be
out = np.empty(x.shape)
out[~mask] = 1
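As a quick sanity check (the sample values here are just an assumption for illustration), the masked myfx above evaluates without emitting either runtime warning:
x = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
print(myfx(x))  # [1. 1. 1. 0.52359878 0.10016742] -- no divide-by-zero or arcsin warnings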
You can always find an arithmetic expression that prevents the "divide by zero".
Example:
def myfx(x):
    return np.where(x < 1.1, 1, np.arcsin(1 / np.maximum(x, 1.1)))
The values computed in the else branch where x < 1.1 are never used, so it does no harm to compute np.arcsin(1/1.1) there.
I am trying to use PyTorch for non-convex optimisation, trying to maximise my objective (so minimise in SGD). I would like to bound my dependent variable x > 0, and also have the sum of my x values be less than 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but am struggling with the bounding of the x variable. In PyTorch you can set bounds using clamp, but it doesn't seem appropriate in this case; I think this is because optim needs the gradients free under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np
def objective(x, a, b, c):  # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * (x)))
    # Checking constraint
    exceeded_limit = constraint(x).item()
    #print(exceeded_limit)
    obj = torch.sum(d * (b * c - x))
    # If overlimit add ramp penalty
    if exceeded_limit < 0:
        obj = obj - (exceeded_limit * 10)
        print("Exceeded limit")
    return - obj

def constraint(x, limit = 1000):  # Must be > 0
    return limit - x.sum()
N = 1000
# x is variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
# Would like to include the clamp
# x = torch.clamp(x, min=0)
# Non-convex method
opt = torch.optim.SGD([x], lr=.01)
for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()
    # Evaluating the objective
    obj = objective(x, a, b, c)
    # Calculate gradients
    obj.backward()
    opt.step()
    if i%1000==0: print("Objective: %.1f" % -obj.item())
print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x<0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions on how to impose the bounds would be appreciated, either using clamp or otherwise, as would general advice on non-convex optimisation with PyTorch. This is a much simpler, scaled-down version of the problem I'm working on, so I'm trying to find a lightweight solution if possible. I am considering a workaround such as transforming the x variable with an exponential function, but then you'd have to scale the function to avoid the positive values becoming infinite, and I want some flexibility in being able to set the constraint.
I ran into the same problem as you: I also want to apply bounds to a variable in PyTorch, and I solved it with Way 3 below.
Your example is a little complex, and I am still learning English, so I give a simpler example below.
For example, suppose there is a trainable variable v whose bounds are (-1, 1):
v = torch.tensor((0.5, ), requires_grad=True)
v_loss = xxxx
optimizer.zero_grad()
v_loss.backward()
optimizer.step()
Way1. RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
v.clamp_(-1, 1)
Way2. RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.
v = torch.clamp(v, -1, +1) # equal to v = v.clamp(-1, +1)
Way3. No error. This is how I solved the problem.
with torch.no_grad():
    v[:] = v.clamp(-1, +1)  # You must use v[:] = xxx instead of v = xxx
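Applied to the loop from the question, the same idea would look roughly like this (a sketch assuming x, opt, a, b, c and objective are defined as above); the clamp runs outside of autograd, after each optimizer step:
for i in range(10000):
    opt.zero_grad()
    obj = objective(x, a, b, c)
    obj.backward()
    opt.step()
    with torch.no_grad():
        x[:] = x.clamp(min=0)  # enforce x >= 0 without touching the gradient graph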
I am trying to graph two functions, but I want to graph one function when one condition is met and a different function when another condition is met.
A simple example would be:
if x > 0
then sin(x)
else cos(x)
It would then graph cos and sin depending on the x value, there being an obvious gap at x = 0, as cos(0) = 1 and sin(0) = 0.
EDIT: There is a built-in way. I'll leave my original answer below for posterity, but try using the piecewise() function:
plot(piecewise(((cos(x),x<0), (sin(x), 0<x))))
See it here.
I would guess that there's a built-in way to do this, but I don't know it. You can multiply your functions by the Heaviside Step Function to accomplish this task. The step function is 1 if x > 0 and 0 if x < 0, so multiplying this into your functions and then summing them together will select only one of them based on the sign of x, that is to say:
f(x) := heaviside(x) * sin(x) + heaviside(-x) * cos(x)
If x > 0, heaviside(x) = 1 and heaviside(-x) = 0, so f(x) = sin(x).
If x < 0, heaviside(x) = 0 and heaviside(-x) = 1, so f(x) = cos(x).
See it in action here. In general, note that if you want the transition to be at x = a, you can use heaviside(x-a) and heaviside(-x+a), respectively. If you want N functions, you'll need (N-1) step-function factors multiplied onto each term, each with its own (x - a_i) argument. I hope someone else can contribute a cleaner solution.
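Purely as an illustration of the selection trick (in NumPy rather than the graphing tool used above, so treat it as a sketch), the same construction can be checked numerically:
import numpy as np

x = np.linspace(-np.pi, np.pi, 201)
f = np.heaviside(x, 0) * np.sin(x) + np.heaviside(-x, 0) * np.cos(x)
# For x > 0 this picks sin(x); for x < 0 it picks cos(x); both terms are zeroed out at x = 0 here.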
I'm currently playing with the maximization API for Z3 (opt branch), and I've stumbled upon the following bug:
Whenever I give it any unbounded problem, it simply returns OPT and gives zero in the resulting model (e.g., maximizing Real('x') with no constraints on the model).
Python example:
from z3 import *
context = main_ctx()
x = Real('x')
optimize_context = Z3_mk_optimize(context.ctx)
Z3_optimize_assert(context.ctx, optimize_context, (x >= 0).ast)
Z3_optimize_maximize(context.ctx, optimize_context, x.ast)
out = Z3_optimize_check(context.ctx, optimize_context)
print out
And I get the value of out to be 1 (OPT), while it seems like it should be -1.
Thanks for trying out this experimental branch.
Development is still churning quite a bit these days, but most of the features are reasonably stable and you are invited to try them out.
To answer your question. There is a native way to use the optimization features from Z3.
To paraphrase your example, here is what is relevant:
from z3 import *
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
When running it, you will see the following output:
sat
oo
[x = 0]
The first line says that the assertions are satisfiable.
The second line prints the value of the handle "h" under the satisfiability call.
The value of the handle holds an expression that meets the maximization/minimization criteria declared by the call to opt.maximize/opt.minimize.
In this case the expression is "oo". It is somewhat of a "hack" because it is going to be up to you to guess that "oo" means infinity. If you interpret this value back to Z3, you will not get infinity.
(I am here restricting the use of Z3 where we don't expose non-standard numbers, there is another part of Z3 that includes non-standard numbers, but that is another story).
Note that the opt.maximize call returns the handle "h",
which is later used to query what was the optimal value.
The last line is some model satisfying the constraints.
When the objective is bounded, the model will be what
you expect, but in this case the objective is unbounded.
There is no finite best value.
Try for example instead:
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
opt.add(x <= 10)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
This time you get a model that sets x = 10, and this is also the maximal value.
You could also try:
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
opt.add(x < 10)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
The output is now:
sat
10 + -1*epsilon
[x = 9]
epsilon refers to a non-standard number (infinitesimal). You can set it arbitrarily small.
Again the model uses only standard numbers, so it picks some number, in this case 9.