HAProxy general purpose counters and stick tables - load-balancing

I'm trying to use HAProxy for rate-limiting.
I need to keep track of several endpoints and limit them individually.
So far I have been using general purpose counters. However, there are only three of them, sc0 to sc2.
The documentation mentions that all the operations on these counters take an optional table parameter, but it is not clear whether I can track different things in different tables while using the same counters.
In other words: is the limit of 3 general purpose counters global, or per stick table?
If, after proper table definition and track instructions, I do
sc1_inc_gpc0(table1)
(and, under different conditions)
sc1_inc_gpc0(table2)
And then have 2 acl rules like
acl X sc1_get_gpc0(table1) gt 1
acl Y sc1_get_gpc0(table2) gt 1
Will the two acls work independently, or would they in effect track the same counter?
Thanks for all help!
(In case you are wondering: for a number of reasons, at the moment I cannot use anything other than HAProxy for rate limiting.)

Self-answered after looking at the source code and testing.
Yes, it is possible to use the same counter on different tables: each table keeps its own gpc0 value for the tracked key, so the two acls above track independent counters.
Moreover, you can also increase the number of available counters at build time. The default is 3, but it can be raised (at least up to 10). The generic versions of the fetches, such as sc_gpc0_rate(<ctr>[,<table>]), can then be used, passing the index of the extra counter as the first argument.
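For reference, here is a minimal configuration sketch of the two-table setup. The backend names, paths and thresholds are made up, and exact directive availability depends on your HAProxy version, so treat it as an illustration rather than a working config:

backend table1
    stick-table type ip size 100k expire 10m store gpc0
backend table2
    stick-table type ip size 100k expire 10m store gpc0

frontend fe_main
    bind :80
    # illustrative names only: track the client address in a different table per endpoint
    http-request track-sc1 src table table1 if { path_beg /endpoint1 }
    http-request track-sc1 src table table2 if { path_beg /endpoint2 }
    # each fetch increments/reads gpc0 in the table it names
    acl over1 sc1_inc_gpc0(table1) gt 1
    acl over2 sc1_inc_gpc0(table2) gt 1
    http-request deny if { path_beg /endpoint1 } over1
    http-request deny if { path_beg /endpoint2 } over2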

Related

sdiff - limit the result set to X items

I want to get the diff of two sets in redis, but I don't need to return the entire array, just 10 items for example. Is there any way to limit the results?
I was thinking something like this:
SDIFF set1 set2 LIMIT 10
If not, are there any other options to achieve this in a performant way, considering that set1 can contain millions of objects and set2 is much smaller (hundreds)?
More info on what you want to achieve would be helpful. Something like this might require you to duplicate your data, though I don't know if that's something you want.
An option is chunking them.
1. Create a set with a uniquely generated id that can hold a max of 10 items.
2. Create a sorted set like so:
zadd(key, timestamp, chunkid)
where the timestamp is a Unix time and the chunkid is the key that connects to the set. The key can be whatever name you would like, or it could also be a uniquely generated id.
3. Use zrange to grab a specific one.
(Repeat steps 1-3 for the second set.)
Once you have one result from each of your sorted sets (zsets), you can do your SDIFF using the chunk ids, as sketched below.
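For illustration, the whole scheme might look roughly like this (key names, members and timestamps are made up; the # parts are just annotations, not part of the commands):

SADD chunk:A1 a b c                 # a chunk of set1, max 10 members
ZADD set1:chunks 1390500000 chunk:A1
SADD chunk:B1 b c d                 # a chunk of set2
ZADD set2:chunks 1390500000 chunk:B1
ZRANGE set1:chunks 0 0              # -> chunk:A1
ZRANGE set2:chunks 0 0              # -> chunk:B1
SDIFF chunk:A1 chunk:B1             # -> a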
Note that there are advantages and disadvantages to doing this, like more connection consumption (if calling from a client) and, obviously, a little more processing. It will help immensely if you put this in a Lua script, though.
Hope this helps, or at least gives you an idea of how to model your data. If this is critical data, you might need an automated script of some sort to move your data around to meet the modeling requirement.

Additional PlanningEntity in CloudBalancing - bounded-space situation

I successfully amended the nice CloudBalancing example to include the fact that I may only have a limited number of computers open at any given time (thanks OptaPlanner team - easy to do). I believe this is referred to as a bounded-space problem. It works dandy.
The processes come in groups, say 20 processes in a given order per group. I would like to amend the example to have OptaPlanner also change the order of these groups (not the processes within one group). I have therefore added a class ProcessGroup to the domain with a member List<Process>, the instances of ProcessGroup being stored in a List<ProcessGroup>. The desired optimisation would shuffle the members of this list, placing the instances of ProcessGroup at different indices of the List<ProcessGroup>. The index of a ProcessGroup should be ProcessGroup.index.
The documentation states that "if in doubt, the planning entity is the many side of the many-to-one relationship." This would mean that ProcessGroup is the planning entity, the member index being a planning variable, getting assigned to (hopefully) different integers. After every new assignment of indices, I would have to re-sort the List<ProcessGroup> in ascending order of ProcessGroup.index. This seems very odd and cumbersome. Any better ideas?
Thank you in advance!
Philip.
The current design has a few disadvantages:
It requires 2 (genuine) entity classes (each with 1 planning variable): this probably increases the search space (= longer to solve, more difficult to find a good or even feasible solution) and it increases configuration complexity. Don't use multiple genuine entity classes if you can reasonably avoid it.
The Integer variables of ProcessGroup need to be all different and somehow sequential. That smells like a chained planning variable (see the docs about chained variables and the Vehicle Routing example), in which case the entire problem could be represented as a simple VRP-like model with just 1 variable - but does that really apply here?
Train of thought: there's something off in this model:
ProcessGroup has an Integer variable: what does that Integer represent? Shouldn't that Integer variable be on Process instead? Are you ordering Processes or ProcessGroups? If it should be on Process instead, then both of Process's variables can be replaced by a chained variable (like in VRP), which will be far more efficient.
ProcessGroup has a list of Processes, but that is a problem property, which means it doesn't change during planning. I suspect that's correct for your use case, but do assert it.
If none of the reasoning above applies (which would surprise me), then the original model might be valid nonetheless :)
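If the chained-variable route mentioned above does fit, a rough sketch of Process as the planning entity could look like this. "Step", the range names and the annotation details are illustrative only and depend on your OptaPlanner version; it is a sketch in the spirit of the Vehicle Routing example, not the definitive model:

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;
import org.optaplanner.core.api.domain.variable.PlanningVariableGraphType;

// "Step" is a hypothetical interface for anything a Process can come after:
// either an anchor (the start of the sequence) or another Process, much like
// Standstill in the Vehicle Routing example.
@PlanningEntity
public class Process implements Step {

    // Chained variable: the element this Process is scheduled directly after.
    @PlanningVariable(graphType = PlanningVariableGraphType.CHAINED,
            valueRangeProviderRefs = {"anchorRange", "processRange"})
    private Step previousStep;

    public Step getPreviousStep() { return previousStep; }
    public void setPreviousStep(Step previousStep) { this.previousStep = previousStep; }
}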

Alphabetical index with millions of rows in redis

For my application, I need an alphabetical index on a set with millions of rows.
When I use a sorted set, and give all members the same score, the result looks perfect.
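For illustration, the same-score trick looks like this (member names are just examples):

ZADD myset 0 goo          # all members get score 0
ZADD myset 0 goof
ZADD myset 0 goons
ZADD myset 0 goozer
ZRANGE myset 0 -1         # members with equal scores come back in lexicographical order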
Performance is also great: with a test set of 2 million rows, the last third does not perform noticeably worse than the first third of the set.
However, I need to query those results. For example, get the first (max) 100 items that start with "goo". I played around with ZSCAN and SORT, but they do not give me a working and performant result.
Since redis is very fast when inserting a new member into the sorted set, it must be technically possible to immediately (well, very quickly) go to the right memory location. I suppose redis uses some kind of skip list or similar sorted structure to accomplish this.
But I don't seem to be able to get at that ordering when I just want to query the data, and not write to it.
We use replicated slaves for read actions, and we prefer the (default) read-only config switch. So creating a dummy key and deleting it afterwards (however inelegant) is not really an option.
I'm stuck a bit, and I'm thinking about writing a ZLEX command in redis-server itself. Which I could use like this:
HELP "ZLEX" -> (ZLEX set score startswith)
-- Query the lexicographical index of a sorted set, supplying a 'startswith' string.
127.0.0.1:12345> ZLEX myset 0 goo LIMIT 0 100
1) goo
2) goof
3) goons
4) goozer
What are your thoughts? Am I missing something in the standard redis commands?
We're using Redis 2.8.4 x64 on Debian.
Kind regards, TW
Edit:
Related issue: indexing-using-redis-sorted-sets -> At least the name I gave to ZLEX seems to conform to Antirez' (Salvatore's) standards. As of 24-1-2014, I'm working on implementing ZLEX. It seems to be the easiest and most straightforward solution for this use case, and Antirez could merge it into the main branch for everyone's benefit.
I've implemented ZLEX.
Here are the full specs.
You can grab the new functionality from here: github tw-bert
I also posted a pull request to Antirez here.
Kind regards, TW
Have you had a look at this?
It can be useful depending on the length of the field by which you sort: this method requires b*(a^2) keys, where a is the length of the field and b is the number of rows for this field.

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in Java. I want to initialize solutions with some heuristics and give these initial solutions to CPLEX to find a better solution based on the model, but I also want to limit the search to a specific neighborhood. Any idea about how to do it?
Also, how can I limit the range of all variables? And what is best: implementing these heuristics and the local search in OPL itself, in Java, or even in C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: We have had a lot of success with approach b. In particular, it is simple to add constraints that fix some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints to limit the values to be similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i]) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i]) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;
forall(i in indexSet)
  if (previousValue[i] <= 0.5)
    changed[i] == (var[i] >= 0.5); // was zero before
  else
    changed[i] == (var[i] <= 0.5); // was one before
sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
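For example, reusing the changed[] variables from the Hamming-distance sketch above, the diversification version simply flips the direction of the limit:

sum(i in indexSet) changed[i] >= 2; // force at least two variables to take a different value than before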
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. Not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax - we just found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed fine-grained control of what we are doing, and find we need to create many more arrays and index sets in OPL to achieve what we want, while we can achieve the same effect with more intelligent loops etc without creating so many data structures in a language like C#.
Those are several questions. So here are some pointers and suggestions:
In Cplex, you give your model an initial solution with the use of IloOplCplexVectors()
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.)
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.

optimizing a function to find global and local peaks with R

I have 6 parameters for which I know maximum and minimum values. I have a complex function that takes the 6 parameters and returns a 7th value (say Y). I say complex because Y is not directly related to the 6 parameters; there are many embedded functions in between.
I would like to find the combination of the 6 parameters which returns the highest Y value. I first tried to calculate Y for every combination by constructing a hypercube, but I do not have enough memory in my computer. So I am looking for something like a Markov chain which progresses through the delimited parameter space and is able to get past local peaks.
When I give one combination of the 6 parameters, I would like to know the highest local Y value. I tried to write code with an iterative, Markov-like chain, but I am not sure how to proceed when the chain reaches an edge of the parameter space. Obviously, algorithms should already exist for this.
Question: does anybody know what the best functions in R are to do these two things? I read that optim() could be appropriate for finding the global peak, but I am not sure that it can deal with complex functions (I prefer asking before engaging in a long (for me) process of code writing). And for the local peaks? optim() does not seem able to do this.
Thank you in advance for any lead.
Julien from France
Take a look at the Optimization and Mathematical Programming Task View on CRAN. I've personally found the differential evolution algorithm to be very fast and robust. It's implemented in the DEoptim package. The rgenoud package is another good candidate.
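As a rough sketch of the DEoptim route (myY() is a stand-in for your complex function, assumed here to take the 6 parameters as a vector, and the bounds are placeholders):

library(DEoptim)

# myY() stands in for your complex 6-parameter function (not defined here)
lower <- c(0, 0, 0, 0, 0, 0)   # your known minimum values
upper <- c(1, 1, 1, 1, 1, 1)   # your known maximum values

# DEoptim minimises, so return -Y to search for the highest Y
res <- DEoptim(function(p) -myY(p), lower, upper)
res$optim$bestmem   # parameter combination giving the highest Y found
res$optim$bestval   # the corresponding -Y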
I like to use the Metropolis-Hastings algorithm. Since you are limiting each parameter to a range, the simple thing to do is let your proposal distribution simply be uniform over the range. That way, you won't run off the edges. It won't be fast, but if you let it run long enough, it will do a good job of sampling your space. The samples will congregate at each peak, and will spread out around them in a way that reflects the local curvature.
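A minimal sketch of that idea, again assuming a hypothetical myY() that takes the 6 parameters as a vector:

# Independence sampler: proposals are uniform over the box, so the chain never
# leaves the parameter range; the target density is proportional to exp(Y).
# myY() is a placeholder for your own function.
metropolis <- function(n, lower, upper) {
  cur <- runif(6, lower, upper)
  curY <- myY(cur)
  best <- cur; bestY <- curY
  for (i in seq_len(n)) {
    prop <- runif(6, lower, upper)
    propY <- myY(prop)
    # the uniform proposal density cancels in the Metropolis-Hastings ratio
    if (runif(1) < min(1, exp(propY - curY))) {
      cur <- prop; curY <- propY
    }
    if (curY > bestY) { best <- cur; bestY <- curY }
  }
  list(best = best, bestY = bestY)
}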