Effective Access Time - page-tables

I am given the following question:
Now assume the system has no page faults. We are considering adding a TLB that will take 1 nanosecond to look up an address translation. What hit rate (to the nearest 5%) in the TLB is required to reduce the effective access time to memory by a factor of 2.5?
I am told an average memory access takes 100 ns. Since there are 4 memory accesses for a 3-level page table (3 for the page table, 1 for physical memory), I deduced that it takes 400 ns.
I am then asked to reduce that by a factor of 2.5, so (2/5) * 400 = 160 ns.
My goal EAT is 160 ns. I started setting up the problem and I can't figure out where to go from here.
I am given the following solution but I am just unable to follow it:
Access time = 100x + (1 - x) * 400, with 100 ns for a hit (read memory) and 400 ns for a miss
160 = 100x + 400 - 400x -> x = 0.8 -> 80% TLB hit rate
Can someone explain to me how they got to this step? I thought EAT was p * (time it takes for a hit) + (1 - p) * (time it takes for a miss), where p is the hit rate. Isn't the time it takes for a hit 300 ns, and the time it takes for a miss 400 ns?
From that logic I tried p * 300 + (1 - p) * 400, but when I went to compute it, I did not end up with the same setup as the solution. Can someone explain where my logic is going wrong? Am I wrong about how many memory accesses a hit takes?
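For what it's worth, the given solution works out if a TLB hit is priced at a single 100 ns memory access, i.e. it apparently assumes a hit skips the page-table walk entirely and neglects the 1 ns lookup, matching its note "100 ns for hit (read memory)". Worked through:

$$\mathrm{EAT} = p \cdot T_{\mathrm{hit}} + (1 - p) \cdot T_{\mathrm{miss}} = 100p + 400(1 - p) = 400 - 300p$$

$$160 = 400 - 300p \;\Rightarrow\; p = \frac{240}{300} = 0.8 \;\Rightarrow\; 80\%$$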

Related

CPLEX won't display decision variables after successful solution

I set up a model for a course scheduling optimization problem using IBM CPLEX.
The decision variable is dvar boolean x[course][roomtype][timeslot];, where x is 1 if the course takes place in a room of type r during timeslot t.
The model has worked perfectly fine and is feasible for all instances and scenarios I tried it on. Now, for a new scenario, I increased the number of timeslots from 46 to 240, which increased the overall number of decision variables to over 2 million instead of around 300,000.
Now, I can still run the model and after slightly longer run time I get an optimal solution. Yet, the process I had for analysis before was displaying the decision variables, sorting for the ones with the value of 1 and copying and pasting them into Excel for further analysis.
This is no longer possible, as CPLEX won't respond for a very long time and then does not let me do anything from that point onwards (limited extent of decision variable display). I have to close the program and start again.
I assumed the problem was the RAM or overall memory, so I opted for the cloud services of my university. But even having 128 GB of RAM, 12 cores and 500 GB of storage at hand was not sufficient, and the performance is exactly the same as with my own private laptop.
Any suggestions on what could be the problem or how to export the solution anyway?
Are there variable limits with CPLEX that would make this impossible to solve?
Thanks a lot in advance!
Indeed, displaying huge matrices can freeze the IDE.
You wrote:
"Now, I can still run the model and after slightly longer run time I get an optimal solution. Yet, the process I had for analysis before was displaying the decision variables, sorting for the ones with the value of 1 and copying and pasting them into Excel for further analysis."
You should do that with SheetWrite: first build a set of the tuples with value 1, then export that set with SheetWrite.
See Excel, Rocket science and optimization:
.mod
// Toy model: fill a 3-D array of decision variables, then collect
// the cells whose value is 1 into a tuple set for export.
range A=1..2;
range B=1..3;
range C=1..4;
dvar int X[A][B][C];
subject to
{
  forall(a in A,b in B,c in C) X[a][b][c]==a*b*c;
}
tuple someTuple{
  int a;
  int b;
  int c;
  int value;
};
// Keep only the cells whose value is 1, tagged with their indices.
{someTuple} someSet = {<i,j,k,X[i][j][k]> | i in A, j in B, k in C:X[i][j][k]==1};
.dat
// Write the filtered tuple set to an Excel range.
SheetConnection sheet("write3Darray.xlsx");
someSet to SheetWrite(sheet,"A1:D24");

How to determine when to accelerate and decelerate to reach a given destination?

I'm creating a computer game in which there is a computer-controlled car that needs to travel along a straight line (thus making this problem effectively 1-dimensional) and reach a destination coming to a rest at 0 velocity. This car "thinks" every second and decides whether (and by how much) to accelerate or decelerate.
To summarize, I want the car to accelerate as strongly as possible and then stop as rapidly as possible.
Here are the variables that the car must respect:
- RemainingDistance = Our current remaining distance to the destination in meters.
- Velocity = Our current velocity towards the destination in meters/second.
- MaxVelocity = The maximum speed the car can reach.
- Acceleration = The change in velocity per second. The car can change its acceleration every second to any number in the range [0, Acceleration].
- Deceleration = The change in velocity per second. The car can change its deceleration every second to any number in the range [0, Deceleration].
So to be as clear as I can, here's the math that runs every second to update the car simulation:
Acceleration = some amount in the range [-Deceleration, Acceleration], as chosen by the computer
Velocity = Velocity + Acceleration
RemainingDistance = RemainingDistance - Velocity
So my question is: Every time the car "thinks", what formula(s) should it use to determine the ideal value of Acceleration in order to reach its destination (with a 0 final velocity) in as little time as possible?
(In the event that the car's initial velocity is too high and it can't decelerate fast enough to achieve 0 velocity by the time it reaches its destination, then it should come to a rest as close as possible to the destination.)
Please let me know if you have any questions.
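For illustration, here is a minimal TypeScript sketch of the per-second update described above, paired with a common stopping-distance heuristic: keep accelerating while the distance needed to brake to zero, Velocity^2 / (2 * Deceleration), still fits within RemainingDistance; otherwise brake at full Deceleration. The constants and names below are assumptions for the sketch, not a definitive solution.

interface Car {
  remainingDistance: number; // meters to the destination
  velocity: number;          // current velocity in meters/second
}

const MAX_VELOCITY = 30;     // illustrative values, not from the question
const MAX_ACCELERATION = 5;
const MAX_DECELERATION = 8;

// Returns the acceleration to apply this second, in [-MAX_DECELERATION, MAX_ACCELERATION].
function chooseAcceleration(car: Car): number {
  // Distance needed to brake from the current velocity to 0 at full deceleration.
  const brakingDistance = (car.velocity * car.velocity) / (2 * MAX_DECELERATION);
  if (brakingDistance >= car.remainingDistance) {
    return -MAX_DECELERATION; // can't wait any longer: brake as hard as allowed
  }
  // Otherwise speed up, but never past the velocity cap.
  return Math.min(MAX_ACCELERATION, MAX_VELOCITY - car.velocity);
}

// The update rule from the question, applied once per simulated second.
function step(car: Car): void {
  car.velocity += chooseAcceleration(car);
  car.remainingDistance -= car.velocity;
}

Because the car only adjusts once per second, the continuous-time braking formula can overshoot slightly, so in practice the comparison may need a one-step safety margin.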

Playing a simulation in fast-forward

I am working on a simple game where users create structures in a sector, which is a 10x10 grid. Some structures generate resources and some consume resources. The sector itself might contain some resources outside of any structure. The generators and consumers are related. For example, a well might be generating water, then a splitter consuming water and making hydrogen and oxygen, while a refinery is consuming hydrogen and oxygen and making rocket fuel, etc.
The rate at which they generate or consume resources can vary by structure - I call this the tick rate. Each time a consumer ticks, it will first attempt to extract those resources from the structures that surround it in the sector. If there are not enough, it will try to get them from the sector's storage. If it is still not enough, the structure will stop. Structures hold the resources that they generate up to some maximum. Once they are full, they will not generate more until some are consumed. If a structure is stopped, it will also not generate more resources, but the resources it already has can still be used by another adjacent structure.
It is not uncommon that there are patterns. For example, if the well is very slow, the splitter will turn off when the well runs out of water, and then the refinery will turn off when the splitter runs out of gases. Then when the well generates again, everything will turn back on.
When the user is playing a sector, I tick the sector continuously at the resolution of the shortest tick rate among the sector's structures. This works fine. The pseudo-code looks like this:
const numTicks = Math.floor((Date.now() - lastTickTime) / shortestTickTime);
let currentTickTime = lastTickTime;
for (i = 0; i < numTicks; i++) {
  currentTickTime += shortestTickTime;
  // check the consumers - all structures that are consumers
  for (curConsumer of consumers) {
    if (curConsumer.isRunning &&
        (currentTickTime - curConsumer.lastTickTime >= curConsumer.tickRate)) {
      curConsumer.lastTickTime = currentTickTime;
      ... check surrounding structures for resources
      if (curConsumer.stillNeedsResources) {
        ... check sector storage for resources
      }
      if (curConsumer.stillNeedsResources) {
        ... no resources available
        curConsumer.isRunning = false;
      }
    }
  }
  // check the generators - all structures that are generators
  for (curGenerator of generators) {
    if (curGenerator.isRunning &&
        (currentTickTime - curGenerator.lastTickTime >= curGenerator.tickRate)) {
      curGenerator.lastTickTime = currentTickTime;
      ... add the generated resources
    }
  }
}
Now I am dealing with the case of a user coming back to a sector after a long absence - say, a few days - when hundreds or thousands of ticks have passed. If I just naively try to play all of the ticks, it can take several seconds or several minutes to complete.
I am wondering if there are any tips or tricks for simulations of this kind for computing the net change without playing each tick. Or, alternately, if there are changes I can make to the simulation to make this easier to compute. Thanks!
Step 1: Convert the raw data into "graph of nodes" form, where each node represents a machine, and producers are at the bottom and consumers are at the top. For example it might look like:
|
(fuel)
|
Refinery
/ \
/ \
(hydrogen) (oxygen)
\ /
\ /
Splitter
|
(water)
|
well
Note: If a machine has an output buffer (or input buffer/s); then those buffers should be separate nodes. For example, if everything has output buffers it might look like this:
|
(fuel)
|
Refinery output buffer
|
(fuel)
|
Refinery
/ \
/ \
(hydrogen) (oxygen)
| |
Hydrogen Oxygen
output output
buffer buffer
| |
(hydrogen) (oxygen)
\ /
\ /
Splitter
|
(water)
|
Well output buffer
|
(water)
|
well
Step 2: Determine "current steady state average rates" by (initially) working from the bottom up (producers to consumers). For example, if a well produces 1 unit of water every 4 ticks, assume it produces an average of 0.25 water per tick; and if a splitter can convert 1 unit of water into 2 units of hydrogen and 1 unit of oxygen every 3 ticks, then its max rate is 0.333 water converted to 0.666 hydrogen and 0.333 oxygen, but you already know the well isn't producing water fast enough and can determine that the splitter will actually consume 0.25 water to produce 0.5 hydrogen and 0.25 oxygen.
Note that if a producer overproduces you will need to backtrack. For example, if a well produces 1 unit of water every 2 ticks, you'd assume it produces an average of 0.5 water per tick; and if a splitter can convert 1 unit of water into 2 units of hydrogen and 1 unit of oxygen every 3 ticks, then you know the well is producing more water than the splitter can consume, and you have to go back to the well and clamp its output to 0.333 water per tick.
Step 3: Determine when (how many ticks) until the next thing happens that will change the "current steady state average rates". If a resource can become depleted (e.g. the well runs dry) you need to know when that will happen. In the same way, if a "storage vessel" (an output buffer, input buffer, water tank, fuel storage chest, ...) becomes full or empty then you need to know when that will happen too. This is all based on the "current steady state average rates" you have - e.g. if a well's output buffer is empty and gains water (from well) at a rate of 0.5 water per tick and loses water (to splitter) at a rate of 0.33 water per tick; then you can calculate that the quantity it is storing increases at a rate of "0.5 - 0.33 = 0.17 per tick" and (combined with the buffer's capacity) calculate when the output buffer will become full.
Note that "how many ticks until the next thing happens" must also be limited to when you want to stop simulating.
Step 4: Advance time up until the next thing that happens. This mostly means updating the amount of stuff stored in "storage vessels" using the "current steady state average rates" you have; and then modifying any information (e.g. setting a well to "not functioning, ran dry").
Step 5: Repeat the previous steps until you reach the time you want. This is just "if(current_time < stop_time) goto step 2".
Step 6: Update the world with the final state of everything. This is mostly the reverse of step 1 - setting quantities in "storage vessels", marking resources as depleted, etc.
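Putting Steps 2 through 5 together, the fast-forward becomes an event-driven loop that jumps straight from one full/empty event to the next instead of ticking. Below is a minimal TypeScript sketch under simplifying assumptions; the Vessel record and all names are illustrative (not from the question's code), and the Step 2 rate propagation is left as a stub.

interface Vessel {
  stored: number;          // current contents of the buffer/vessel
  capacity: number;        // maximum it can hold
  netRatePerTick: number;  // production minus consumption at the current steady state
}

// Step 3: ticks until the next vessel becomes full or empty,
// capped at the number of ticks we still want to simulate.
function ticksUntilNextEvent(vessels: Vessel[], ticksRemaining: number): number {
  let next = ticksRemaining;
  for (const v of vessels) {
    if (v.netRatePerTick > 0) {
      next = Math.min(next, Math.ceil((v.capacity - v.stored) / v.netRatePerTick));
    } else if (v.netRatePerTick < 0) {
      next = Math.min(next, Math.ceil(v.stored / -v.netRatePerTick));
    }
  }
  return next;
}

function fastForward(vessels: Vessel[], totalTicks: number): void {
  let ticksLeft = totalTicks;
  while (ticksLeft > 0) {
    // Step 2 goes here: recompute each vessel's netRatePerTick from the
    // producer/consumer graph, clamping rates at full or empty vessels.
    const dt = ticksUntilNextEvent(vessels, ticksLeft);
    if (dt <= 0) break; // a full/empty vessel's rate was not re-clamped
    // Step 4: advance all vessels dt ticks in a single jump.
    for (const v of vessels) {
      v.stored = Math.max(0, Math.min(v.capacity, v.stored + v.netRatePerTick * dt));
    }
    ticksLeft -= dt; // Step 5: repeat until the target time is reached
  }
}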
Notes:
You might need to add "transport" as a type of node. For example, if you have conveyor belts taking water from the well to the splitter, then that can be simulated like a "storage vessel" but with lag time (e.g. if the belt was empty and items start being put on the belt, then it's going to take "length / speed" time before items arrive at the other end).
If you want; you could add "breakdown" to the game (e.g. maybe there's a small chance that a splitter malfunctions and needs to be repaired). This is just an extra thing to take into account at step 3.
Don't forget that you can modify the game design to make it easier. For a start, I would avoid feedback loops (e.g. if the fuel from the reactor is fed into a generator to create power that is consumed by a splitter that ...) because this makes everything significantly harder. I'd also be tempted to avoid "variable speed" things (e.g. miners travelling between an ore field and a drop off point, where the distance the miners travel increases as closer parts of the ore field are consumed so the "average ore per tick" is increasing and never constant).
Don't forget that it's a game - it doesn't have to be 100% perfectly accurate, and only needs to be convincing enough to fool the player. If it's "slightly wrong" (e.g. an output buffer should have 23 items but only has 21 items) it's likely that nobody will ever notice.
Depending on other details; you might (or might not) consider stopping early and switching to "simulate one tick at a time". For example, if you need to simulate 12345600 ticks, then you could use the approach I've described for the first 12345500 ticks, then use the approach you already have to do the last 100 ticks. This can help to make some things (e.g. position of items on a conveyor belt) seem a lot more realistic.

Using memtier_benchmark; every key is missed

My misses/sec are populated and there are no hits.
Data contains keys ranging from 1 to 300 K, and the data stored is of string type.
memtier_benchmark -s xx.xxx.xxx.xxx -p xxxxx -P redis -t 1 -n 1 --ratio 0:1 -c 1 -x 2 --key-pattern S:S --authenticate=xxxxxxx --key-prefix=
memtier_benchmark is pretty poorly documented in this regard. If you use it out of the box on a first run, it won't simulate any cache hits, which is pretty useless for a tool designed to test cache performance.
The 2 key parameters here are:
--key-pattern=[SET:GET]
--ratio=[SET:GET]
--key-pattern defines the names given to keys that are set and the names of keys requested. For instance, if you use S:S, that means that the software will set the first key as memtier-0 and then immediately request memtier-1, then set memtier-1, then request memtier-2 (Go figure...). That's why you get a 100% miss result.
If you set R:R, the software will randomly set the digit in the key name in both the SET and the GET. This will typically result in a miss ratio of > 90%, depending on how many clients and threads you set. If you're operating a cache with a miss ratio of > 90%, it's questionable whether you should be running a cache at all, so again, this is pretty useless.
To simulate what a real-world cache should be doing, you want a miss ratio of < 50%. To achieve this, you need to expand the number of GETs over SETs. The default ratio for memtier_benchmark is 1:10, but again, on a first run (or if you're persistently running this against a cold cache) with --key-pattern=S:S as the default, you're still going to get a very high miss ratio. If you keep repeating the test against the same cache you are continually populating, you should see your miss ratio begin to fall, but that may not be something you can rely on if you're testing in an ephemeral environment.
To get a lower miss ratio on a first run, I use:
--key-pattern=S:R --ratio=1:20
This should result in a miss ratio < 50%. That's as good as I've been able to simulate. My actual cache will have a miss ratio of < 5%. I'm still trying to figure out a way to test that with memtier_benchmark.
Also, use --hide-histogram to get rid of the annoying dump of test results.
EDIT:
For a full blown 100% hit / 0% miss ratio, do the following:
Start with a cold, empty cache
Run a test that uses the --ratio= parameter so that only SETs are included in the test, in a very narrow key range:
--hide-histogram --key-pattern=S:S --key-minimum=1 --key-maximum=50 --ratio=1:0
Now, run the test again, and this time flip the ratio so that only GETs are included:
--hide-histogram --key-pattern=S:S --key-minimum=1 --key-maximum=50 --ratio=0:1
You can then tune the hit/miss ratio by re-running both parts and expanding the --key-maximum= value.
Your ratio is 0 SETs to 1 GET; change it to 1:1 or something similar.
You need to populate your cache before trying a read-only benchmark.
To populate it, you can run a write-only workload to write all the keys at least once.

Software memory testing for bus failures

I have a board with quite a few flash chips; some of them are showing intermittent failures. Standard memory tests are not showing any specific problem addresses, other than that certain chips fail intermittently under mechanical and thermal stress.
Suspecting the actual connections and not the flash cells themselves, I'm looking for a way to test the parallel bus for address or data pin errors.
There are some memory tests, but they apply better to RAM than to flash memory (http://www.ganssle.com/testingram.htm). Specifically, programming the parallel flash takes a sequence of bus writes for each value; a write/verify failure could easily be in the write operation, which could involve any pin on the bus.
Ideas welcome...
The typical memory tests are there to do that. I prefer a pseudo-randomizer (deterministic, using an LFSR) to the 0xAA, 0x55, 0xFF, 0x00 tests. This allows for an address bus test as well as a data bus test in two passes (repeat inverted). I say typical in the sense of wiggling the data bits and address bits through both states each and varying the states of signals and their neighbors. As for pounding on a RAM to create thermal or other stresses: you can't write very fast to a flash, so you can't really do fast write/read cycles.
Flash creates another problem: writing and then reading back right away isn't that interesting. You want to write, then read back later (hours, days, weeks) to determine if the part is actually holding data.
When you say thermal stress, do you mean the part fails only while it is above X degrees, or that the thermal stress has broken it permanently after the event? Likewise with mechanical: does the part fail while vibrating or under mechanical stress but recover when the stress is relieved, or has the mechanical stress done permanent damage that can be detected whether under stress or not?
Now, although you can't do fast write/read cycles, you can punish a flash by reading heavily. I have seen read-disturb problems from constant reading of one block or location. It is not necessarily something you have time to do for every location, but you might fill the part with a pseudo-random pattern and concentrate on one location for a while (minutes, tens of minutes); if you have a part that you know is bad, see if this accelerates the detection of the problem, and whether any location will work or only certain ones. Then another thing is to read all the locations repetitively for hours/days, or leave it sitting for hours/days/weeks, and then do a read pass without an erase or write and see if it has lost anything.
Unfortunately, as you probably know, each new failure case takes its own research project and the development of a new test.
The first step in testing a memory is a data bus test. In this test, the data bus wiring is tested to confirm that the value placed on the data bus by the processor is correctly received by the memory device at the other end. An obvious way to test is to write all possible data values and verify them; each bit can be tested independently. To perform a walking-1s test, write the first data value in the table below, verify it by reading it back, write the second value, verify it, and so on. When you reach the end of the table, the test is complete.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
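As a sketch of that walking-1s loop (in TypeScript for illustration; writeByte and readByte are assumed stand-ins for the board's actual bus-access routines):

function walkingOnesTest(
  address: number,
  writeByte: (addr: number, value: number) => void,
  readByte: (addr: number) => number,
): number | null {
  for (let bit = 0; bit < 8; bit++) {
    const pattern = 1 << bit; // a single 1 walks across the data bus
    writeByte(address, pattern);
    if (readByte(address) !== pattern) {
      return bit; // this data line is suspect
    }
  }
  return null; // all data lines passed
}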
In the linked article Jack Ganssle says: "Critical to this [test], and every other RAM test algorithm, is that you write the pattern to all of RAM before doing the read test."
Since reading should be isolated from writing, testing the flash is easier. Perform the writing portion of the tests while the system is not under stress. Then perform the reading portion with the system under stress. By recording the address, expected value, and actual value in enough error cases, you should be able to determine the source of the errors.
If the system never fails when doing the above, you can then perform the whole test while under stress. Any errors that appear are most likely write errors.
I've decided to design a memory pattern that I think lets me deduce both data and address errors. The concept is to use values significantly different from one another as key indicators of possible read errors, and to detect a failure on one pin at a time.
The test will read alternately from only the bottom and top addresses (0x000000 and 0x3FFFFF; my chip has 22 address lines). In those locations I put 0xFF and 0x00 respectively (byte-wide). The idea is to flip all address and data lines and see what happens. (All other values in the flash differ by at least 3 bits from 0x00 and 0xFF.)
There are 44 addresses that a single pin failure could send me to in error. In each of those addresses I put one of 22 values to represent which of the 22 address pins was flipped. Each differs by 2 bits from the others, and by 3 bits from 00 and FF. (I tried for 3 bits different from each other, but 8 bits could only yield 14 such values.)
07,0B,0D,0E,16,1A,1C,1F,25,29,2C,
2F,34,38,3D,3E,43,49,4A,4F,52,58
In the remaining addresses I put a nice pattern of six values, 33, 55, 66, 99, AA, CC (each 3 bits different from all the other values): value(address) = nicePattern[(number of bits set in address) % 6].
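To make that construction concrete, here is a TypeScript sketch of how the fill value for each address could be computed. The marker table and the nicePattern formula are copied from the description above; the names and structure are otherwise illustrative:

const ADDRESS_BITS = 22;
const TOP = (1 << ADDRESS_BITS) - 1; // 0x3FFFFF

// 22 marker values, one per address pin; each pair differs by 2 bits,
// and each differs from 0x00 and 0xFF by at least 3 bits.
const markers = [
  0x07, 0x0b, 0x0d, 0x0e, 0x16, 0x1a, 0x1c, 0x1f, 0x25, 0x29, 0x2c,
  0x2f, 0x34, 0x38, 0x3d, 0x3e, 0x43, 0x49, 0x4a, 0x4f, 0x52, 0x58,
];
const nicePattern = [0x33, 0x55, 0x66, 0x99, 0xaa, 0xcc];

// Number of bits set in x.
function popcount(x: number): number {
  let n = 0;
  while (x !== 0) { n += x & 1; x >>>= 1; }
  return n;
}

function valueAt(address: number): number {
  if (address === 0) return 0xff;   // bottom address
  if (address === TOP) return 0x00; // top address
  // The 44 addresses one flipped pin away from the bottom or top address
  // get the marker that identifies the flipped pin.
  for (let pin = 0; pin < ADDRESS_BITS; pin++) {
    if (address === (1 << pin) || address === (TOP ^ (1 << pin))) {
      return markers[pin];
    }
  }
  // Everywhere else: value(address) = nicePattern[(bits set in address) % 6]
  return nicePattern[popcount(address) % nicePattern.length];
}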
I tested this and have statistically collected 100s of intermittent failure incidents synchronized to the mechanical stress.
- Single-bit errors: detectable.
- Double-bit errors: deducible (explainable by a combination of frequent single-bit errors).
- 3-or-more-bit errors: generally inconclusive.
Even though some of the chips had 3 failing pins, 70% of the incidents were single-bit (the pins usually didn't fail at the same time).
The testing group is now using this to identify which specific connections are failing.