Using memtier_benchmark, every key is missed - Redis

My misses/sec are populated and there are no hits.
The data contains keys ranging from 1 to 300K, and the values stored are strings. The command I use is:
memtier_benchmark -s xx.xxx.xxx.xxx -p xxxxx -P redis -t 1 -n 1 --ratio 0:1 -c 1 -x 2 --key-pattern S:S --authenticate=xxxxxxx --key-prefix=

memtier_benchmark is pretty poorly documented in this regard. If you use it out of the box on a first run, it won't simulate any cache hits, which is pretty useless for a tool designed to test cache performance.
The two key parameters here are:
--key-pattern=[SET:GET]
--ratio=[SET:GET]
--key-pattern defines the names given to the keys that are set and the names of the keys requested. For instance, if you use S:S (sequential Sets and sequential Gets), the software will set the first key as memtier-0 and then immediately request memtier-1, then set memtier-1 and request memtier-2 (go figure...). That's why you get a 100% miss result.
If you set R:R, the software will randomize the numeric suffix of the key name for both the Sets and the Gets. This will typically result in a miss ratio of > 90%, depending on how many clients and threads you set. If you're operating a cache with a miss ratio of > 90%, it's questionable whether you should be running a cache at all, so again, this is pretty useless.
To simulate what a real-world cache should be doing, you want a miss ratio of < 50%. To achieve this, you need to increase the number of Gets relative to Sets. memtier_benchmark's default ratio is 1:10, but again, on a first run (or if you're persistently running against a cold cache) with the default --key-pattern=S:S, you're still going to get a very high miss ratio. If you keep repeating the test against the same cache that you are continually populating, you should see your miss ratio begin to fall, but that may not be something you can rely on if you're testing in an ephemeral environment.
To get a lower miss ratio on a first run, I use:
--key-pattern=S:R --ratio=1:20
This should result in a miss ratio < 50%. That's as good as I've been able to simulate. My actual cache will have a miss ratio of < 5%. I'm still trying to figure out a way to test that with memtier_benchmark.
Also, use --hide-histogram to get rid of the annoying dump of test results.
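Putting those together, a complete first-run invocation might look like this (host, port, and password are placeholders):
memtier_benchmark -s <host> -p <port> -P redis --authenticate=<password> --key-pattern=S:R --ratio=1:20 --hide-histogram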
EDIT:
For a full blown 100% hit / 0% miss ratio, do the following:
Start with a cold, empty cache
Run a test that uses the --ratio= parameter so that only Sets are included in the test, over a very narrow key range:
--hide-histogram --key-pattern=S:S --key-minimum=1 --key-maximum=50 --ratio=1:0
Now, run the test again, and this time, flip the ratio so that only Gets are included:
--hide-histogram --key-pattern=S:S --key-minimum=1 --key-maximum=50 --ratio=0:1
You can then tune the hit/miss ratio by re-running both parts and expanding the --key-maximum= value (for example, keeping the Set pass at --key-maximum=50 but raising the Get pass to --key-maximum=100 should land you at roughly a 50% hit ratio).

Your ratio is 0 Sets and 1 Get; change it to 1:1 or something similar.

You need to populate your cache before trying a read-only benchmark.
To populate it, you can run a write-only workload to write all the keys at least once.
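For example, a write-only pass over the question's key range might look like this (connection details are placeholders; P:P is memtier's parallel key pattern, which walks the key range so that each key gets written):
memtier_benchmark -s <host> -p <port> -P redis --authenticate=<password> --ratio=1:0 --key-pattern=P:P --key-minimum=1 --key-maximum=300000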

Redis time jumps forward and goes backwards

The script below shows that the timestamp Redis returns seems to jump forward a lot and then go backwards from time to time. This happens regularly throughout the run, on every run. What's going on? Is this documented anywhere? I stumbled on this because it messes up my sliding-window rate limiter.
import asyncio
import time

import redis.asyncio as aioredis  # assuming redis-py's asyncio client

script = """
local time = redis.call("TIME")
local current_time = time[1] .. "." .. time[2]
return current_time
"""

async def main():
    redis = aioredis.Redis()
    res_0, res_1 = 0, 0
    for _ in range(200):
        res_0 = res_1
        res_1 = float(await redis.eval(script, numkeys=0))
        print(res_1 - res_0)
        time.sleep(0.01)

asyncio.run(main())
1667745169.747809
0.011765003204345703
0.01197195053100586
0.011564016342163086
0.011634111404418945
0.012428998947143555
0.011847972869873047
0.011600971221923828
0.011788129806518555
0.012033939361572266
0.012130022048950195
0.01160883903503418
0.011954069137573242
0.012022972106933594
0.011958122253417969
0.011713981628417969
0.011844873428344727
0.012138128280639648
0.011618852615356445
0.011570215225219727
0.011890888214111328
0.011478900909423828
0.7928261756896973
-0.5926899909973145
0.11812996864318848
0.11584997177124023
0.12353992462158203
0.1199800968170166
0.11719989776611328
0.12331008911132812
-0.8117339611053467
0.011723995208740234
0.01131582260131836
The most likely reason for this behavior is floating-point arithmetic on the calling side: parsing the reply into a float inevitably rounds the value (to what extent depends on the platform you are coding against, which you didn't mention), so the original precision of the result is lost. I would therefore suggest reworking your logic to process the two components returned by TIME independently, as a pair of integers/longs, instead of concatenating them into a single float.
Apart from that, and from the obvious possibility of an issue with the clock of the Redis server, there is also the chance that you are contacting a different Redis host on each iteration - this may hold true if you are using a multi-node Redis topology (replication or cluster).
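A minimal sketch of the integer-based approach, using redis-py's synchronous client for brevity (adapt to your async client as needed):
import redis

r = redis.Redis()

# redis-py already returns TIME as a (seconds, microseconds) pair of ints;
# keep the components separate instead of gluing them into one float.
sec_prev, usec_prev = 0, 0
for _ in range(200):
    sec, usec = r.time()
    # elapsed time in whole microseconds, computed with exact integer math
    print((sec - sec_prev) * 1_000_000 + (usec - usec_prev))
    sec_prev, usec_prev = sec, usec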

Optimize "1D" bin packing/sheet cutting

Our use case could be described as a variant of 1D bin packing or sheet cutting.
Imagine a drywall with a beam framing.
We want to optimize the number and size of gypsum boards that would be needed to cover the wall.
Boards must start and end on a beam.
Boards must not overlap (hard constraint).
The fewer (i.e. bigger) boards, the better (soft constraint).
What we currently do:
Pre-generate all possible boards and pass them as problem facts.
Let the solver pick the best subset of those (nullable planning variable).
First Fit Decreasing + Simulated Annealing
Even relatively small walls (~6 m, fewer than 20 possible boards to pick from) sometimes take minutes, and while we mostly get a feasible solution, it's rarely optimal.
Is there a better way to model that?
EDIT
Our current domain model looks like the following. Please note that the planning entity only holds the selected/picked material but nothing else. I.e. currently our planning entities are all equal, which kind of prevents any optimization that depends on planning entity difficulty.
data class Assignment(
    @PlanningId
    private val id: Long? = null,
    @PlanningVariable(
        valueRangeProviderRefs = ["materials"],
        strengthComparatorClass = MaterialStrengthComparator::class,
        nullable = true
    )
    var material: Material? = null
)

data class Material(
    val start: Double,
    val stop: Double,
)
Activate (sub)pillar change and swap move selectors. See the OptaPlanner docs section about move selectors (move neighborhoods). The default moves (single swap and single change) are probably getting stuck in local optima (and even though SA helps them escape those, the escapes are probably not efficient enough).
That should help, but a custom move that swaps two subpillars of almost the same size might improve efficiency further.
Also, as you're using SA (Simulated Annealing), know that SA is parameter-sensitive. Use optaplanner-benchmark to try multiple SA starting-temperature parameters with different dataset sizes. Also compare it to plain LA (Late Acceptance) in those benchmarks. LA isn't as fickle as SA can be. (By fickle I don't mean unstable; I mean potentially dataset-size-sensitive parameter tweaking.)

SUMO - simulating traffic scenario

How can I simulate continuous traffic flow from historical data which consists of:
1. Vehicle ID;
2. Speed;
3. Coordinates
without knowing the routes of each vehicle ID.
This is a commonly asked question, but it probably hasn't been answered here before. Unfortunately, the answer largely depends on the quality of your input data, mainly on the frequency/distance of your location updates (it would also help if there were a timestamp for each datum) and on how precisely the locations fit your street network. In the best case, there is a location update on each edge of the route in the street network, and you can simply read off the route by mapping each location to the street. This mapping can be done using the Python sumolib that comes with SUMO:
import sumolib

net = sumolib.net.readNet("myNet.net.xml")
route = []
radius = 1
for x, y in coordinates:  # the (x, y) location updates, in network coordinates
    # pick the edge closest to this location update
    minDist, minEdge = min([(dist, edge) for edge, dist in net.getNeighboringEdges(x, y, radius)])
    if len(route) == 0 or route[-1] != minEdge.getID():
        route.append(minEdge.getID())
See also http://sumo.dlr.de/wiki/Tools/Sumolib#locate_nearby_edges_based_on_the_geo-coordinate for additional geo conversion.
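By the way, to later hand the reconstructed route to duarouter or SUMO itself, you can dump the edge list to a minimal route file; a rough sketch, where the vehicle id and file name are just placeholders:
# write the edge list from the loop above as a one-vehicle route file
with open("myRoutesWithGaps.rou.xml", "w") as f:
    f.write('<routes>\n')
    f.write('    <vehicle id="veh0" depart="0">\n')
    f.write('        <route edges="%s"/>\n' % " ".join(route))
    f.write('    </vehicle>\n')
    f.write('</routes>\n')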
This will fail when there is an edge in the route which did not get hit by a data point, or if you have a mismatch (for instance, matching an edge which goes in the "wrong" direction). In the former case you can easily repair the route using SUMO's duarouter:
> duarouter -n myNet.net.xml -r myRoutesWithGaps.rou.xml -o myRepairedRoutes.rou.xml --repair
The latter case is considerably harder both to detect and to repair, because it largely depends on your definition of a wrong edge. There are almost-clear cases, like suddenly hitting an edge in the opposite direction (which can still happen in real traffic), and a lot of small detours which are hard to decide and deserve a separate answer.
Since you are asking for continuous input you may also be interested in doing this live with TraCI and in this FAQ on constant input flow.
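For the live variant, a bare-bones TraCI loop might look like this (the network, edge ids, and injection rate are placeholders; see the linked FAQ for the real pattern):
import traci

# start SUMO headless on the same network and inject vehicles while stepping
traci.start(["sumo", "-n", "myNet.net.xml"])
traci.route.add("r0", ["edge1", "edge2"])  # e.g. a route reconstructed as above
for step in range(1000):
    if step % 10 == 0:  # one new vehicle every 10 simulation steps
        traci.vehicle.add("veh%d" % step, "r0")
    traci.simulationStep()
traci.close()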

Eigen: Computation and incrementation of only Upper part with selfAdjointView?

To increment bMRes by bM * bM.transpose(), I do something like this:
bMRes += MatrixXd(n, n).setZero()
.selfadjointView<Eigen::Upper>().rankUpdate(bM);
This increments bMRes by bM * bM.transpose(), but about twice as fast as the plain product.
Note that bMRes and bM are of type Map<MatrixXd>.
To optimize things further, I would like to skip the copy (and incrementation) of the Lower part.
In other words, I would like to compute and write only the Upper part.
Again, in other words, I would like to have my result in the Upper part and 0's in the Lower part.
If it is not clear enough, feel free to ask questions.
Thanks in advance.
Florian
If your bMRes is self-adjoint originally, you could use the following code, which only updates the upper half of bMRes.
bMRes.selfadjointView<Eigen::Upper>().rankUpdate(bM);
If not, I think you have to accept that .selfadjointView<>() will always copy the other half when assigned to a MatrixXd.
Compared to A*A.transpose() or .rankUpdate(A), the cost of copying half of the matrix can be ignored when A is reasonably large, so I guess you don't need to optimize your code further.
If you just want to evaluate the difference, you could use the low-level BLAS APIs: A*A.transpose() is equivalent to gemm(), and .rankUpdate(A) is equivalent to syrk(), but syrk() doesn't copy the other half automatically.
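If you just want to see that gemm/syrk difference in behavior without writing C, the same BLAS routines are exposed in SciPy; a quick illustrative sketch (NumPy/SciPy, not Eigen):
import numpy as np
from scipy.linalg.blas import dgemm, dsyrk

a = np.random.rand(4, 3)
full = dgemm(1.0, a, a, trans_b=True)  # gemm(): writes all of A*A^T
upper = dsyrk(1.0, a)                  # syrk(): writes only the upper triangle
print(np.allclose(np.triu(full), np.triu(upper)))  # True: upper parts agree
print(upper)  # the lower part is left at zero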

Hyperopt set timeouts and modify space during execution

Could someone help with the following:
How do I set a timeout for each individual trial? And a timeout for the total experiment?
How do I set up a progressive strategy that would eliminate/prune a percentage of the worst-scoring branches of the search space at different stages of the experiment (while still using the current optimization algorithms)? E.g., at 30% of the total experiment it could remove the 50% worst-scoring classifiers and their whole branches of hyperparameters from upcoming tests, then repeat the process at 60%, and so on.
Thanks a lot!
Following my exchange on hyperopt's GitHub:
There is no per-trial timeout, but hyperopt-sklearn implements its own solution by simply wrapping the objective function. Look for "fn_with_timeout" at https://github.com/hyperopt/hyperopt-sklearn/ .
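For reference, the gist of that wrapping approach looks something like this (a simplified sketch in the spirit of fn_with_timeout, not hyperopt-sklearn's exact code):
import multiprocessing
from hyperopt import STATUS_FAIL

def fn_with_timeout(fn, trial_timeout):
    """Wrap a hyperopt objective so each trial is killed after trial_timeout seconds."""
    def wrapped(*args, **kwargs):
        # note: using a closure as the process target assumes the 'fork' start method
        def target(conn):
            conn.send(fn(*args, **kwargs))
        parent_conn, child_conn = multiprocessing.Pipe()
        p = multiprocessing.Process(target=target, args=(child_conn,))
        p.start()
        if parent_conn.poll(trial_timeout):  # wait up to trial_timeout seconds
            result = parent_conn.recv()
            p.join()
            return result
        p.terminate()  # overran the budget: kill the trial and mark it failed
        p.join()
        return {"status": STATUS_FAIL, "failure": "timeout"}
    return wrapped
You would then pass, e.g., fn_with_timeout(objective, 60) to fmin instead of the raw objective.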
from issue 210: "the optimizers are stateless, and fmin stores all state of the experiment in the trials object. So if you remove some experiments from the trials object, it's as if they never happened. use fmin's "max_evals" parameter to interrupt search as often as you need to make these sorts of modifications. It should be fine to use repeated calls with e.g. max_evals increasing by 1 every time if you want really fine grained control."
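In code, that incremental pattern might look like the following sketch, where fn and space stand for whatever objective and search space your experiment uses:
from hyperopt import Trials, fmin, tpe

trials = Trials()
for max_evals in range(10, 101, 10):
    # resume the same experiment, 10 more evaluations at a time
    best = fmin(fn, space, algo=tpe.suggest, trials=trials, max_evals=max_evals)
    # between calls you can inspect trials.trials and prune poorly
    # scoring regions of the space, per the quote above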
Thanks for looking into this, @doxav. I've written some code that addresses question 1, taking part of fn_with_timeout from hyperopt-sklearn and adapting it for standard Hyperopt cost functions.
You can find it here:
https://gist.github.com/hunse/247d91d14aaa8f32b24533767353e35d