FAT16 - Timestamp for last modification of entire FAT table, or way to determine whether cluster chain has changed less than O(n)? - fat16

Is there a timestamp somewhere for the last time the FAT table has been changed? I need a way to flatten cluster chains (for constructing NewtonOS VBOs, which uses a single object that then has pointers to other objects/sectors). Or a particular cluster chain, other than traversing the whole thing in O(N)?

Related

Is a SortedList or SortedDictionary the best way to efficiently store and retrieve data by the nearest key?

I am working on a project where I'm storing an object inside a SortedList<float, myData> with timestamps as a key so that I can use the built in binary search to retrieve the data (or two sets if the lower and higher bounds are equally distant from the key) that was stored closest to any given timestamp. For context, the insertion and retrieval occur every 2ms on average (it's for a game replay/rewind function essentially, I'm just storing the current state of entities). And while it works, it's not super efficient with it's O(n) insertion. I know SortedDictionary has an O(log n) insertion, but no built in binary search.
Is there a better option for my situation that you more experienced folks may know of? The ultimate goal of what I'm looking for is a way to do the following:
Save data in sequence of timestamp
Get data flanking any given timestamp (I lerp between for interval values)
"Replay" data starting from given timestamp (so #2 in rapid succession)
Clear any data from before or after a given timestamp
From what I've seen and understood, SortedList wins for #2 (and #3 by proxy) and SortedDictionary is best at number 1 and I'd have to roll my own BinarySearch for it to even do #2/3.
At the moment each entity is handling it's own SortedList, though I wonder if making a sort of RecordManager with a single structure to handle all entities might be more performant.
Any help or suggestions would be great!

redis sorted set implementation

In redis documentation is said that: "Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time we add an element Redis performs an O(log(N)) operation".
How can I prove it can be calculated in order of log(n)?

Why is Hash Table insertion time complexity worst case is not N log N

Looking at the fundamental structure of hash table. We know that it resizes WRT load factor or some other deterministic parameter. I get that if the resizing limit is reached within an insertion we need to create a bigger hash table and insert everything there. Here is the thing which I don't get.
Let's consider a hash table where each bucket contains an AVL - balanced BST. If my hash function returns the same index for every key then I would store everything in the same AVL tree. I know that this hash function would be a really bad function and would not be used but I'm doing a worst case scenario here. So after some time let's say that resizing factor has been reached. So in order to resize I created a new hash table and tried to insert every old elements in my previous table. Since the hash function mapped everything back into one AVL tree, I would need to insert all the N elements into the same AVL. N insertion on an AVL tree is N logN. So why is the worst case of insertion for hash tables considered O(N)?
Here is the proof of adding N elements into Avl three is N logN:
Running time of adding N elements into an empty AVL tree
In short: it depends on how the bucket is implemented. With a linked list, it can be done in O(n) under certain conditions. For an implementation with AVL trees as buckets, this can indeed, wost case, result in O(n log n). In order to calculate the time complexity, the implementation of the buckets should be known.
Frequently a bucket is not implemented with an AVL tree, or a tree in general, but with a linked list. If there is a reference to the last entry of the list, appending can be done in O(1). Otherwise we can still reach O(1) by prepending the linked list (in that case the buckets store data in reversed insertion order).
The idea of using a linked list, is that a dictionary that uses a reasonable hashing function should result in few collisions. Frequently a bucket has zero, or one elements, and sometimes two or three, but not much more. In that case, a simple datastructure can be faster, since a simpler data structure usually requires less cycles per iteration.
Some hash tables use open addressing where buckets are not separated data structures, but in case the bucket is already taken, the next free bucket is used. In that case, a search will thus iterate over the used buckets until it has found a matching entry, or it has reached an empty bucket.
The Wikipedia article on Hash tables discusses how the buckets can be implemented.

Redis ZRANGEBYLEX command complexity

According documentation section for ZRANGEBYLEX command, there is following information. If store keys in ordered set with zero score, later keys can be retrieved with lexicographical order. And ZRANGEBYLEX operation complexity will be O(log(N)+M), where N is total elements count and M is result set size. Documentation has some information about string comparation, but tells nothing about structure, in which elements will be stored.
But after some experiments and reading source code, it's probably what ZRANGEBYLEX operation has a linear time search, when every element in ziplist will be matched against request. If so, complexity will be more larger than described above - about O(N), because every element in ziplist will be scanned.
After debugging with gdb, it's clean that ZRANGEBYLEX command is implemented in genericZrangebylexCommand function. Control flow continues at eptr = zzlFirstInLexRange(zl,&range);, so major work for element retrieving will be performed at zzlFirstInLexRange function. All namings and following control flow consider that ziplist structure is used, and all comparation with input operands are done sequentially element by element.
Inspecting memory with analysis after inserting well-known keys in redis store, it seems that ZSET elements are really stored in ziplist - byte-per-byte comparation with gauge confirm it.
So question - how can documentation be wrong and propagate logarithmic complexity where linear one appears? Or maybe ZRANGEBYLEX command works slightly different? Thanks in advance.
how can documentation be wrong and propagate logarithmic complexity where linear one appears?
The documentation has been wrong on more than a few occasions, but it is an ongoing open source effort that you can contribute to via the repository (https://github.com/antirez/redis-doc).
Or maybe ZRANGEBYLEX command works slightly different?
Your conclusion is correct in the sense that Sorted Set search operations, whether lexicographical or not, exhibit linear time complexity when Ziplists are used for encoding them.
However.
Ziplists are an optimization that prefers CPU to memory, meaning it is meant for use on small sets (i.e. low N values). It is controlled via configuration (see the zset-max-ziplist-entries and zset-max-ziplist-value directives), and once the data grows above the specified thresholds the ziplist encoding is converted to a skip list.
Because ziplists are small (little Ns), their complexity can be assumed to be constant, i.e. O(1). On the other hand, due to their nature, skip lists exhibit logarithmic search time. IMO that means that the documentation's integrity remains intact, as it provides the worst case complexity.

Redis PFADD to check a exists-in-set query

I have a requirement to process multiple records from a queue. But due to some external issues the items may sporadically occur multiple times.
I need to process items only once
What I planned to use is PFADD into redis every record ( as a md5sum) and then see if that returns success. If that shows no increment then the record is a duplicate else process the record.
This seems pretty straightforward , but I am getting too many false positives while using PFADD
Is there a better way to do this ?
Being the probabilistic data structure that it is, Redis' HyperLogLog exhibits 0.81% standard error. You can reduce (but never get rid of) the probability for false positives by using multiple HLLs, each counting a the value of a different hash function on your record.
Also note that if you're using a single HLL there's no real need to hash the record - just PFADD as is.
Alternatively, use a Redis Set to keep all the identifiers/hashes/records and have 100%-accurate membership tests with SISMEMBER. This approach requires more (RAM) resources as you're storing each processed element, but unless your queue is really huge that shouldn't be a problem for a modest Redis instance. To keep memory consumption under control, switch between Sets according to the date and set an expiry on the Set keys (another approach is to use a single Sorted Set and manually remove old items from it by keeping their timestamp in the score).
In general in distributed systems you have to choose between processing items either :
at most once
at least once
Processing something exactly-once would be convenient however this is generally impossible.
That being said there could be acceptable workarounds for your specific use case, and as you suggest storing the items already processed could be an acceptable solution.
Be aware though that PFADD uses HyperLogLog, which is fast and scales but is approximate about the count of the items, so in this case I do not think this is what you want.
However if you are fine with having a small probability of errors, the most appropriate data structure here would be a Bloom filter (as described here for Redis), which can be implemented in a very memory-efficient way.
A simple, efficient, and recommended solution would be to use a simple redis key (for instance a hash) storing a boolean-like value ("0", "1" or "true", "false") for instance with the HSET or SET with the NX option instruction. You could also put it under a namespace if you wish to. It has the added benefit of being able to expire keys also.
It would avoid you to use a set (not the SET command, but rather the SINTER, SUNION commands), which doesn't necessarily work well with Redis cluster if you want to scale to more than one node. SISMEMBER is still fine though (but lacks some features from hashes such as time to live).
If you use a hash, I would also advise you to pick a hash function that has fewer chances of collisions than md5 (a collision means that two different objects end up with the same hash).
An alternative approach to the hash would be to assign an uuid to every item when putting it in the queue (or a squuid if you want to have some time information).