Best way to serialize a byte array key to Redis with Booksleeve

I need to find the best way to send a byte array as a key to a Redis server with Booksleeve.
I have tried different implementations, such as UTF-8 encoding, but I don't know which one is the most memory-efficient on the Redis server (I will be working with millions of keys like this, so I really need the shortest possible key in memory).
Has anyone already had this requirement?
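For reference, here is a rough size comparison (written in Node/TypeScript purely for illustration; Booksleeve itself is .NET) of how large an 8-byte key becomes under the usual text encodings. Raw binary is the most compact, Base64 adds roughly a third, and hex doubles the size; pushing arbitrary bytes through UTF-8 can both inflate and corrupt them.

    // Size comparison of common ways to represent a byte-array key as text.
    // Node/TypeScript is used only for illustration; the arithmetic is the same in .NET.
    const raw = Buffer.from([0x00, 0xff, 0x10, 0x80, 0x7f, 0x3a, 0xde, 0xad]); // 8 bytes

    console.log(raw.length);                    // 8  - raw binary key, most compact
    console.log(raw.toString("base64").length); // 12 - Base64 is ~4/3 the size
    console.log(raw.toString("hex").length);    // 16 - hex doubles the size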

In the current build for simplicity I've stuck to string keys, however the code would handle binary fine - it uses the binary API. IIRC I received a patch in my inbox just this week that adds binary key support.
Since it seems to be in demand I'll look at that this week.
Edit: a week came and went; the reason being that I'm also doing some work on redis-cluster support, which is going to need some new interfaces anyway, because:
not all operations are supported
parallel (numbered) databases aren't supported
So basically my plan is to roll both pieces of work into the same branch, giving:
a new set of interfaces
which use a struct for the key parameter with an implicit conversion operator from string and byte[], allowing either to be used interchangeably (see the rough sketch below)
with the redis-cluster and redis-server commands on separate APIs
and a new method on the old connection to get one of the new APIs on a per-DB basis, i.e. Database(3).Keys.Remove(key); or something like that
ETA is still imaginary, but I wanted to explain why I hadn't simply thrown in the existing patch - I think the advent of redis-cluster makes it a good time to revisit the entire API (but obviously in a way that doesn't break existing code).
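To make the "string or byte[] key" idea concrete, here is a rough TypeScript sketch of the shape of such an API (the names are hypothetical; in the actual C# API this would be a struct with implicit conversion operators rather than a union type):

    type RedisKey = string | Uint8Array;

    // Either form can be passed wherever a key is expected; strings are encoded to
    // bytes at the wire boundary, and byte arrays pass through untouched.
    function toWireBytes(key: RedisKey): Uint8Array {
      return typeof key === "string" ? new TextEncoder().encode(key) : key;
    }

    toWireBytes("user:42");
    toWireBytes(new Uint8Array([0x01, 0x02, 0x03]));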

Related

cloudflare Durable Objects update object value

Hello! I've recently been diving into Cloudflare Workers, especially Durable Objects. I was able to make a simple request that puts a JS object into an assigned key. Let's say the key is key0, and the value put is {"fieldA": "val0", "fieldB": "val1"}. In this case, how can I update the value of fieldA without removing fieldB? I've tried simply executing put("key0", {"fieldA": "newVal0"}), but it keeps removing {"fieldB": "val1"}.
Of course this is normal behaviour for JS operations, but I can't find anything like ["key0"]["fieldA"] = "newVal0" in the docs (maybe I'm missing something).
Hope this question reaches the gurus in the community! Thanks in advance [:
EDIT after the answers:
In theory, it would be wonderful if Cloudflare Durable Objects supported working just like a normal JS object. Such a Workers feature feels like it could be a killer app among cloud DB services, since the average CPU time is quite fast and Cloudflare's pricing is also very low compared to the other big providers. If that happens, I would be eager to migrate everything onto the Cloudflare platform [:
Durable Objects' KV storage only supports get and put operations -- it doesn't have any sort of "update". So, you have two options:
get() the key, modify it, and then write the modified version back. This may sound inefficient, but keep in mind that commonly accessed keys will likely be in the in-memory cache. In fact, this get/modify/put pattern implemented in your JavaScript is probably about as fast as any modification operation that Durable Objects itself could possibly implement built-in. That said, you probably don't want to use this approach with large objects, since the whole object has to be written to disk again after every update.
Split your object across multiple keys. E.g. instead of having the key foo map to {"fieldA": "val0", "fieldB": "val1"}, you could have separate keys foo:fieldA and foo:fieldB. Note that you can fetch all the keys at once using storage.list({prefix: "foo:"}). This approach is not as convenient but allows each field to be written separately to disk.
get and put deal with whole JS objects, so if you want to change part of the object you should get it, update it using normal JS, and then put the entire object back.
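A minimal sketch of both options inside a Durable Object class, written in TypeScript against the Workers storage API (the class, key, and field names are hypothetical, and the request-handling fetch() method is omitted):

    export class Profile {
      constructor(private state: DurableObjectState) {}

      // Option 1: read the whole object, change one field, write it back.
      async updateFieldA(newVal: string): Promise<void> {
        const obj =
          (await this.state.storage.get<Record<string, string>>("key0")) ?? {};
        obj.fieldA = newVal; // fieldB is preserved
        await this.state.storage.put("key0", obj);
      }

      // Option 2: one key per field, so each field is written independently.
      async updateFieldASplit(newVal: string): Promise<void> {
        await this.state.storage.put("key0:fieldA", newVal);
      }

      // Reassemble the split fields with a prefix listing.
      async readSplit(): Promise<Record<string, string>> {
        const entries = await this.state.storage.list<string>({ prefix: "key0:" });
        return Object.fromEntries(
          [...entries].map(([k, v]) => [k.slice("key0:".length), v])
        );
      }
    }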

API for storing binary blobs

I'm doing some moderately low-level programming of an embedded device that has some NVRAM we plan to use for retaining values between runs of a program. We'd like to abstract the operations into an API over a driver or talking to a daemon. This is lower-level than the serialization semantics I've seen here and there. Basically we want a process or function to be able to reserve some space (with some name or other identifier), store a value (arbitrary byte sequence) in that reserved space, retrieve the value later, and surrender the reservation if it no longer needs to use it. This feels a lot like malloc, write, read, and free. I'm tempted to implement nvAlloc() (or something) and so on. Or am I missing something obvious? Maybe security: another process getting a handle and accessing or corrupting the value.
It seems http://pramfs.sourceforge.net/ and normal file system access are the right answer.
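Since the accepted direction is pramfs (which exposes the NVRAM as a mounted file system) plus ordinary file access, the reserve/store/retrieve/release API maps almost one-to-one onto file operations. A hedged sketch of that mapping, written in Node/TypeScript purely to show the shape (the mount point is hypothetical; the real implementation would be C against the driver):

    import { promises as fs } from "fs";
    import * as path from "path";

    const NV_ROOT = "/mnt/pram"; // hypothetical mount point of the NVRAM file system
    const slot = (name: string) => path.join(NV_ROOT, name);

    export const nvStore = {
      // "reserve + store": create or overwrite a named slot with an arbitrary byte sequence
      async put(name: string, value: Uint8Array): Promise<void> {
        await fs.writeFile(slot(name), value);
      },
      // "retrieve": read the bytes back, or null if the slot was never reserved
      async get(name: string): Promise<Uint8Array | null> {
        try {
          return await fs.readFile(slot(name));
        } catch {
          return null;
        }
      },
      // "surrender the reservation": remove the slot
      async release(name: string): Promise<void> {
        await fs.unlink(slot(name)).catch(() => {});
      },
    };

Ordinary file permissions can then provide the isolation the question worries about, since each slot is just a file owned by the reserving process.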

What's the Point of Multiple Redis Databases?

So, I've come to a place where I wanted to segment the data I store in Redis into separate databases, as I sometimes need to make use of the KEYS command on one specific kind of data and wanted to separate it to make that faster.
If I segment into multiple databases, everything is still single threaded, and I still only get to use one core. If I just launch another instance of Redis on the same box, I get to use an extra core. On top of that, I can't name Redis databases, or give them any sort of more logical identifier. So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want? And relatedly, why doesn't Redis try to utilize an extra core for each extra database I add? What's the advantage of being single threaded across databases?
You don't want to use multiple databases in a single Redis instance. As you noted, multiple instances let you take advantage of multiple cores. If you use database selection you will have to refactor when upgrading. Monitoring and managing multiple instances is neither difficult nor painful.
Indeed, you would get far better metrics on each db by segregation based on instance. Each instance would have stats reflecting that segment of data, which can allow for better tuning and more responsive and accurate monitoring. Use a recent version and separate your data by instance.
As Jonaton said, don't use the keys command. You'll find far better performance if you simply create a key index. Whenever adding a key, add the key name to a set. The keys command is not terribly useful once you scale up since it will take significant time to return.
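For illustration, here is a minimal sketch of that key-index idea (Node/TypeScript with the ioredis client, which is an assumption; the key names are made up): every write also records the key name in a set, so enumerating the keys later is a single SMEMBERS instead of a server-wide KEYS call.

    import Redis from "ioredis";

    const redis = new Redis();

    // Record each key in an index set as it is created, instead of relying on KEYS later.
    async function saveUser(id: string, payload: string): Promise<void> {
      const key = `user:${id}`;
      await redis
        .pipeline()
        .set(key, payload)
        .sadd("index:user-keys", key) // the key index
        .exec();
    }

    // Enumerating the keys is now one set read rather than a scan of the whole keyspace.
    async function listUserKeys(): Promise<string[]> {
      return redis.smembers("index:user-keys");
    }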
Let the access pattern determine how to structure your data rather than storing it the way you think works and then working out how to access and mince it later. You will see far better performance, and you will often find that the data-consuming code is much cleaner and simpler.
Regarding single-threading, consider that Redis is designed for speed and atomicity. Sure, actions modifying data in one db need not wait on another db, but what if that action is saving to the dump file, or processing transactions on slaves? At that point you start getting into the weeds of concurrency programming.
By using multiple instances you turn multi threading complexity into a simpler message passing style system.
In principle, Redis databases on the same instance are no different from schemas in RDBMS database instances.
So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want?
There's one clear advantage of using redis databases in the same redis instance, and that's management. If you spin up a separate instance for each application, and let's say you've got 3 apps, that's 3 separate redis instances, each of which will likely need a slave for HA in production, so that's 6 total instances. From a management standpoint, this gets messy real quick because you need to monitor all of them, do upgrades/patches, etc. If you don't plan on overloading redis with high I/O, a single instance with a slave is simpler and easier to manage provided it meets your SLA.
Even Salvatore Sanfilippo (creator of Redis) thinks it's a bad idea to use multiple DBs in Redis. See his comment here:
https://groups.google.com/d/topic/redis-db/vS5wX8X4Cjg/discussion
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not. With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
I don't really know any benefits of having multiple databases on a single instance. I guess it's useful if multiple services use the same database server(s), so you can avoid key collisions.
I would not recommend building around using the KEYS command, since it's O(n) and that doesn't scale well. What are you using it for that you can accomplish in another way? Maybe redis isn't the best match for you if functionality like KEYS is vital.
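If you do have to walk keys by pattern, the usual alternative to KEYS is the cursor-based SCAN command, which iterates in small batches instead of blocking the server for one long call. A sketch with the Node ioredis client (the client choice and the pattern are assumptions):

    import Redis from "ioredis";

    const redis = new Redis();

    // Iterate keys matching a pattern with SCAN instead of a blocking KEYS call.
    async function scanKeys(pattern: string): Promise<string[]> {
      const found: string[] = [];
      let cursor = "0";
      do {
        const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
        found.push(...keys);
        cursor = next;
      } while (cursor !== "0");
      return found;
    }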
I think they mention the benefits of a single-threaded server in their FAQ, but the main thing is simplicity - you don't have to bother with concurrency in any real way. Every action is blocking, so no two things can alter the database at the same time. Ideally you would have one (or more) instances per core of each server, and use a consistent hashing algorithm (or a proxy) to divide the keys among them. Of course, you'll lose some functionality - pipelining will only work for things on the same server, sorts become harder, etc.
Redis databases can be used in the rare cases of deploying a new version of the application, where the new version requires working with different entities.
I know this question is years old, but there's another reason multiple databases may be useful.
If you use a "cloud Redis" from your favourite cloud provider, you probably have a minimum memory size and will pay for what you allocate. If however your dataset is smaller than that, then you'll be wasting a bit of the allocation, and so wasting a bit of money.
Using databases you could use the same Redis cloud-instance to provide service for (say) dev, UAT and production, or multiple instances of your application, or whatever else - thus using more of the allocated memory and so being a little more cost-effective.
A use-case I'm looking at has several instances of an application which use 200-300K each, yet the minimum allocation on my cloud provider is 1M. We can consolidate 10 instances onto a single Redis without really making a dent in any limits, and so save about 90% of the Redis hosting cost. I appreciate there are limitations and issues with this approach, but thought it worth mentioning.
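As a sketch of that consolidation (Node/TypeScript with ioredis; the endpoint and database numbers are made up), each environment or application instance simply opens the same cloud endpoint with a different numbered database:

    import Redis from "ioredis";

    // One cloud Redis endpoint, several logical databases (the numbers are arbitrary).
    const host = "redis.example.com"; // hypothetical endpoint
    const prod = new Redis({ host, db: 0 });
    const dev = new Redis({ host, db: 1 });
    const uat = new Redis({ host, db: 2 });

    async function demo(): Promise<void> {
      await dev.set("feature:flag", "on");
      console.log(await prod.get("feature:flag")); // null - the dev key is invisible here
    }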
I am using Redis to implement a blacklist of email addresses, and I have different TTL values for different levels of blacklisting, so having different DBs on the same instance helps me a lot.
Using multiple databases in a single instance may be useful in the following scenario:
Different copies of the same database could be used for production, development, or testing using real-time data. People may use replication to clone a Redis instance to achieve the same purpose. However, the former approach makes it easier for existing running programs to simply select the right database to switch to the intended mode.
Our motivation has not been mentioned above. We use multiple databases because we routinely need to delete a large set of a certain type of data, and FLUSHDB makes that easy. For example, we can clear all cached web pages, using FLUSHDB on database 0, without affecting all of our other use of Redis.
There is some discussion here but I have not found definitive information about the performance of this vs scan and delete:
https://github.com/StackExchange/StackExchange.Redis/issues/873
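A small sketch of that pattern (Node/TypeScript with ioredis; the database numbering is an assumption): the page cache lives in its own numbered database, so clearing it is a single FLUSHDB and leaves every other database untouched.

    import Redis from "ioredis";

    // Cached web pages are isolated in database 0; other data lives in other databases.
    const pageCache = new Redis({ db: 0 });

    // Clearing every cached page is one command, with no scan-and-delete loop.
    async function clearPageCache(): Promise<void> {
      await pageCache.flushdb();
    }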

Making a file format extensible

I'm writing a particular serialisation system. The first version works well. It's a hierarchical string-key, data-value system. So to get a particular value, you navigate to a particular node and say getInt("some key"), etc.
My issue with the current system is that the file size gets quite large very quickly.
I'm going to combat this by adding a string table. The issue with this is that I can't think of a way to support the old system. All I have is a file identifier which is 32 bits long.
I can change the file identifier, but every time I make another change to the format, I'll need to change the identifier again.
What's an elegant way to implement new features while still supporting the old features?
I've studied the PNG format and creating chunks seems like a good way to go.
Is there any other advice you can give me on chunk dependencies and so forth?
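As an illustration of the PNG-style approach, here is a hedged sketch of a chunk reader (the chunk names are made up): each chunk carries a 4-byte type tag and a 4-byte length, so an old reader can skip chunk types it does not recognise, such as a newly added string table, and still read everything it does understand.

    // Chunk layout: [4-byte ASCII type][4-byte big-endian length][payload].
    // Unknown chunk types are skipped, which is what keeps old readers working
    // when new chunks (e.g. a string table "STRT") are added. Names are hypothetical.
    interface Chunk {
      type: string;
      payload: Uint8Array;
    }

    function readChunks(data: Uint8Array): Chunk[] {
      const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
      const chunks: Chunk[] = [];
      let offset = 0;
      while (offset + 8 <= data.length) {
        const type = String.fromCharCode(...data.subarray(offset, offset + 4));
        const length = view.getUint32(offset + 4); // big-endian by default
        const payload = data.subarray(offset + 8, offset + 8 + length);
        offset += 8 + length;
        if (type === "NODE" || type === "STRT") {
          chunks.push({ type, payload }); // chunk types this version understands
        }
        // anything else is silently skipped
      }
      return chunks;
    }

Chunks that a reader must not silently ignore can be flagged the way PNG does it, e.g. by the case of a letter in the type tag.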
If you need a binary format, look at Protocol Buffers, which Google uses internally for RPCs as well as long-term serialization of records. Each field of a protocol buffer is identified by an integer ID. Old applications ignore (and pass through) the fields that they don't understand, so you can safely add new fields. You never reuse deprecated field IDs or change the type of a field.
Protocol buffers support primitive types (bool, int32, int64, string, byte arrays) as well as repeated and even recursively nested messages. Unfortunately they don't support maps, so you have to turn a map into a list of (key, value).
Don't spend all your time fretting about serialization and deserialization. It's not as fun as designing protobufs.

how to create a system-wide independent universal counter object primarily for Database keys?

I would like to create/use a system-wide independent universal 'counter object' that can be called via COM in a thread-safe manner.
The counter object will be passed an ID to identify which counter to return, handle the counting, 'persist' the count (occasionally), have reasonable performance (as fast as possible), perhaps capable of 1000 counts per second or better (1 ms), and be accessible cross-process/out-of-process. The current count must be persisted between object restarts/shutdowns.
The counter object is likely to be a 'singleton' type object implemented in some form of free-threaded dictionary, containing maybe 10 counters (perhaps 50 max). The counts need to be monotonic and consistent (i.e. guaranteed unique sequential values).
Each counter should have a few methods, like reset, inc, dec, set, clear, and remove. As a luxury, I would like to have a variable increment (i.e. a 'step by' value). To support thread-safety, perhaps some form of critical-section or mutex call. It just needs to return a long/4-byte signed integer.
I really want something that can be called from anywhere, including VBScript, so I figure COM is my preferred solution.
The primary use of this is for database keys. I am unable to use autoinc or guid type keys and have ruled out database-generated counting systems at this point.
I've spent days researching this and I have really struggled to find a solution. The best I can find is a free-threaded dictionary object that can be instantiated using COM+ from Motobit - it seems to offer all the 'basics' and I guess I could create some form of wrapper for this.
So, here are my questions:
1. Does such a 'general purpose counter object' already exist? Can you direct me to it? (MS did do an IIS/ASP object called 'MSWC.Counter', but it isn't a cross-process/out-of-process component and isn't thread-safe - if it were, it would do!)
2. What is the best way of creating such a component? (I'd prefer VB6 right now [don't ask!], but I can do it in VB.NET 2005 if I have to.) I don't have the skills/knowledge/tools to use anything else.
I am desperate for a workable solution. I need specific guidance! If anybody can code something up for me, I am prepared to pay for it.
Update:
What's wrong with GUIDs? a) They are 16 bytes if I'm lucky (binary storage), 32+ bytes if I'm not (ANSI without formatting), or even worse (64 bytes as Unicode); b) I have a high-volume replicated app where the GUID is just too big compared to the actual row data; c) there is the overhead of indexing and inserts; and d) I want a readable number! I only need a 4-byte integer, so why not try to get that? I know you will say that disk space is cheap, but for my application the cost is in slow inserts, and GUIDs don't help (I have tried/tested them), so I would prefer not to use them if I have a choice.
Autonumbers/autoincrements are evil: a) you don't get the value until after the insert; b) they are session-specific; c) they are easy to lose/screw up on a table alter; d) they are no good for multi-table inserts (it's not MS SQL Server); plus I have a need for counters outside my DB...
By the sound of it, what you're looking to create is an ActiveX EXE. They run in their own process but can be accessed from any other process by instantiating an object from it as though it were just another COM object. It handles all the marshalling necessary to sync its internal thread with the threads of any process calling it. Since all you're planning on using is integers, there's no need to worry about the thread safety of objects passed between the threads.
More than likely you can use the MSWC.Counter object within that ActiveX EXE and let it do the counter work.
A database engine is already very good at generating unique primary key values for a table, either by marking the column auto-increment or by using a GUID. Trying to create your own is a grave mistake. System-wide is just not wide enough; it fails miserably when your app grows and more than one machine starts using the database.
Nevertheless, you can get what you want in VB6 by creating a COM server. It's been too long and I forget the exact names of the project options - something resembling "single use".
I have implemented a similar solution implemented as a REST web service - accessible from any technology that supports http.
It has a simple C# backend implementation using a singleton pattern, and it will scale nicely under IIS.
The whole thing sounds like a twisted idea, so why shouldn't I add another twisted one? :P
Host an old-skool ASP page.
You can use Application.Lock with a counter then, just like in the sample.
Added benefit: use it from any platform/language. (e.g. other HTML pages with XMLHttpRequest. :)
If you save the value at say every 100th request to a file, you do not even have to worry about IIS resets.
Just set the starting value to last saved value + 100 in Application_OnStart. :P
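For what it's worth, the same save-every-N idea translates directly into a more modern sketch (Node/TypeScript instead of classic ASP; the file path and port are made up): the counters live in memory behind a single-threaded HTTP handler, the state file is rewritten on every 100th count, and on restart each counter is seeded with its last saved value plus 100 so handed-out values stay unique even after a crash.

    import { createServer } from "http";
    import { readFileSync, writeFileSync } from "fs";

    const STATE_FILE = "/var/lib/counters.json"; // hypothetical path
    const SAVE_EVERY = 100;

    // Seed each counter with last saved value + SAVE_EVERY so a crash between
    // saves can never hand out the same value twice.
    const counters: Record<string, number> = {};
    try {
      const saved = JSON.parse(readFileSync(STATE_FILE, "utf8")) as Record<string, number>;
      for (const [id, value] of Object.entries(saved)) counters[id] = value + SAVE_EVERY;
    } catch {
      // first run: no saved state yet
    }

    let sinceSave = 0;

    createServer((req, res) => {
      // e.g. GET /next?id=orders returns the next value of the "orders" counter
      const url = new URL(req.url ?? "/", "http://localhost");
      const id = url.searchParams.get("id") ?? "default";
      const value = (counters[id] = (counters[id] ?? 0) + 1);

      if (++sinceSave >= SAVE_EVERY) {
        writeFileSync(STATE_FILE, JSON.stringify(counters)); // occasional persistence
        sinceSave = 0;
      }
      res.end(String(value)); // the handler is synchronous, so requests cannot interleave
    }).listen(8080);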