Clarification regarding Bin.asNull vs a real null value vs a tombstone - Aerospike

I'm looking for some clarification on what happens, and how it affects replication within and between clusters (XDR), for the following operations:
def test() = {
  val key = new Key("ns", Configuration.dummySetName, 2)
  client.put(null, key, new Bin("bin0", Value.NULL)) // Is that a tombstone? Will the whole record be deleted?
  client.put(null, key, new Bin("bin2", "value1")) // set bin2's value to "value1"
  client.put(null, key, Bin.asNull("bin2")) // Is that deleted on commit? Can any zombie records remain with "value1"? Is it shipped via XDR? Does durableDelete change anything?
  val record = client.get(null, key)
  println(record)
}
1. What does putting a tombstone mean? Is it shipped via XDR? Can it create "zombie records"?
2. What is the difference between a durable delete and a non-durable delete? Are they shipped between clusters via XDR? I gather that durable delete = true means a tombstone is created, which brings me back to question 1.
3. What's the difference between putting null into a specific bin, using Bin.asNull, and putting a tombstone? Are they all shipped?
I’m a bit confused,
Thanks! 🙏

A durable delete will create a tombstone, which is a small record that stays in the system until all previous versions of the record it covers are gone (along with some other conditions). Details are in the Durable Delete documentation.
Tombstones prevent 'zombie' records, i.e. records being resurrected after a cold restart.
Both durable and non-durable deletes are shipped by XDR. There is a different mechanism for non-durable deletes, though. Durable deletes (or tombstones) are shipped like any other record.
Putting null in a specific bin will delete that bin. If it is the last remaining bin in the record, it will delete the record, and the delete will be durable if the durable delete policy is set on the write that nullifies the bin. I don't know the difference between writing null in a bin and using Bin.asNull; if I had to guess, I would expect them to be the same.
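To make the policy side concrete, here is a minimal sketch in the same style as the question's snippet, showing where the durable-delete flag lives in the Java client (the host, namespace, and set name are placeholders):

import com.aerospike.client.{AerospikeClient, Bin, Key}
import com.aerospike.client.policy.WritePolicy

// Placeholder connection, namespace and set name.
val client = new AerospikeClient("127.0.0.1", 3000)
val key    = new Key("ns", "demo-set", 2)

// durableDelete on the write policy is what turns a delete into a tombstone
// (an Enterprise Edition feature); without it the delete is non-durable.
val durable = new WritePolicy()
durable.durableDelete = true

client.put(null, key, new Bin("bin2", "value1"))  // ordinary write
client.put(durable, key, Bin.asNull("bin2"))      // nullify the only bin: the record is deleted, durably here
client.delete(durable, key)                       // an explicit delete with the same policy also leaves a tombstone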

Related

Redis: How to distinguish between client tracking invalidation of keys across multiple databases

Is there any way to distinguish which database an invalidation applies to?
example:
Tracking socket:
CLIENT ID // 77
PSUBSCRIBE __redis__:*
Main socket:
CLIENT TRACKING on REDIRECT 77 OPTIN
SELECT 1
SET MYKEY VALUE1
CLIENT CACHING YES
GET MYKEY //VALUE1
SELECT 2
SET MYKEY VALUE2
GET MYKEY //VALUE2
SELECT 1
GET MYKEY //VALUE1
The issue I have is that the tracking socket receives a redis:invalidate 1) MYKEY message when MYKEY is set in database 2. However, the key I wanted to track is in database 1.
Short of redesigning the application to avoid key collisions across databases, or creating a socket per database + tracking, how can I use tracking in a meaningful way?
Edit: Redis 6.0.8, standalone install.
Found the answer in the Redis documentation:
"There is a single keys namespace, not divided by database numbers. So if a client is caching the key foo in database 2, and some other client changes the value of the key foo in database 3, an invalidation message will still be sent. This way we can ignore database numbers reducing both the memory usage and the implementation complexity."

How to really deal with indexing in Redis and correctly implement indexes

I am moving some "live" data structures from MySQL to Redis. Using the StackExchange C# Redis client, I'm writing (due to some very project-specific restrictions) my own micro-ORM code to store and retrieve object class entities from a Redis database.
I am pushing C# objects into Redis as hash keys.
My general question is about indexing on fields other than the "primary key".
Ok, I've read all the theory of sets and sorted sets, and how to add and remove members from sets, and so on.
I've added some code to correctly create set keys which contain the entity hash keys, so that I can look up those objects by simple indexes or sorted indexes.
However I cannot find or figure out a good strategy for solving the following problems:
1. Index maintenance on expiration
I'd like to add expiration to some object (hash) keys, so that old entities get purged automatically by Redis. However, I cannot find a reliable way to update/purge the relevant indexes besides periodically running a background task that scans index set keys for expired members and removes them (notifications are not an option for me).
2. Index updating when some object fields change
In some cases I need to update only a small fraction of hash key values, not the whole entity. If the fields being updated are part of one or more index set keys, I cannot figure out the best way to properly update the set keys.
For example, let's say I need to store a "Session" entity whose primary key is its ID (simple numerical integer), and I need to add an index on the "Node" string field (Node being the reference to the server currently serving the session):
class Session {
  [RedisKey]
  public int ID { get; set; }
  public string RemoteIP { get; set; }
  [RedisSimpleIndex]
  public string Node { get; set; }
}
RedisKey and RedisSimpleIndex are attributes I use to extract via reflection which fields are used as primary key and which are used for indexing.
Let's suppose I have an instance of Session like this:
{ ID = 2, RemoteIP = "1.2.3.4", Node = "Server10" }
My routines are creating the following keys in Redis:
Hash key: "obj:Session:2"
Hash values: "ID" = "2", "RemoteIP" = "1.2.3.4", "Node" = "Server10"
Set key: "idx:Session:Node:Server10"
Set members: "obj:Session:2"
which is fine for looking up all sessions on Server10.
However, if the very same session needs to be moved to a different server (e.g. Server8) and I want to update only the Node field in the hash, how can I update the indexes too?
The only way I found so far is to SCAN all index keys with pattern idx:Session:Node:* and remove from them any member obj:Session:2, then create/update the index key for the new node (idx:Session:Node:Server8).
Moreover, the SCAN command is not available in the IDatabase or ITransaction interfaces, and in an HA clustered environment things get worse, since I need to determine which Redis server is holding the relevant keys to make this procedure work.
Is there a better way to build/represent simple indexes in Redis? Is my approach wrong?
I'd like to add expiration to some object (hash) keys, so that old entities get purged automatically by Redis. However, I cannot find a reliable way to update/purge the relevant indexes besides periodically running a background task that scans index set keys for expired members and removes them (notifications are not an option for me).
You cannot expire individual KV pairs within a hash. This was discussed in #167, and there don't appear to be any plans to change it.
I think you should be able to use keyspace notifications to subscribe to expired events. You would need a worker that subscribes to them and updates all relevant indices accordingly. However, you might get some inconsistent data: for example, your worker might crash and leave stale indices behind. Also, the indices wouldn't be updated instantaneously, so you'd end up with a bit of stale data regardless.
Probably not the best idea, but you could also hack some custom indexing logic into expire.c. The code seems fairly straightforward. The C module API, by contrast, doesn't appear to provide any way to hook into the eviction logic.
Another option is to not rely on Redis when it comes to handling expiration logic. So... you would still have a background job, but it would actually issue corresponding DEL commands for expired KV-pairs. This would also allow you to keep the index 100% up to date via transactions.
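To make that last option concrete, here is a rough sketch (using Jedis rather than StackExchange.Redis, and assuming a hypothetical sorted set expiry:Session that holds each object key scored by its expiry time instead of a Redis TTL): the job finds expired entries itself, reads the indexed field while it still exists, and removes the hash, the index member, and the expiry entry in one MULTI/EXEC.

import redis.clients.jedis.Jedis

val jedis = new Jedis("localhost", 6379)

// When writing a Session, record its expiry time ourselves instead of calling EXPIRE:
//   ZADD expiry:Session <expiresAtMillis> obj:Session:2
def cleanupExpired(): Unit = {
  val now = System.currentTimeMillis().toDouble
  val expired = jedis.zrangeByScore("expiry:Session", 0.0, now)
  expired.forEach { objKey =>
    val node = jedis.hget(objKey, "Node")   // still readable: nothing has been deleted yet
    val tx = jedis.multi()                  // MULTI/EXEC keeps data and index in step
    tx.del(objKey)
    if (node != null) tx.srem(s"idx:Session:Node:$node", objKey)
    tx.zrem("expiry:Session", objKey)
    tx.exec()
  }
}

If you need stricter consistency, you could WATCH the object key before starting the transaction, so that a concurrent change to Node aborts the cleanup and it gets retried.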
In some cases I need to update only a small fraction of hash key values, not the whole entity. If the fields being updated are part of one or more index set keys, I cannot figure out the best way to properly update the set keys.
I'm not sure which Redis client you're using, but I found the following pattern to be quite useful in the past:
You have some form of "Updater" class for each hash. It has setters for all relevant fields that could be updated (setFirstName, setLastName etc.).
When you set a field, you mark that particular field as "dirty" (e.g. via a separate boolean).
When you call "save", you update indices for fields that were marked as dirty.
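A minimal sketch of that pattern, using Jedis for brevity rather than the StackExchange client and the key layout from the question (the SessionUpdater name and shape are made up for illustration):

import redis.clients.jedis.Jedis

// Hypothetical updater for the Session hash: it is created from the currently
// loaded entity, so it already knows the old field values it may need for index upkeep.
class SessionUpdater(jedis: Jedis, id: Int, oldNode: String) {
  private val objKey = s"obj:Session:$id"
  private var newNode: Option[String] = None          // "dirty" marker for the Node field

  def setNode(node: String): Unit = newNode = Some(node)

  def save(): Unit = newNode.foreach { node =>
    val tx = jedis.multi()                            // hash and both index sets change together
    tx.hset(objKey, "Node", node)
    tx.srem(s"idx:Session:Node:$oldNode", objKey)     // drop from the old index...
    tx.sadd(s"idx:Session:Node:$node", objKey)        // ...and add to the new one
    tx.exec()
  }
}

// Usage: move session 2 from Server10 to Server8 without rewriting the whole entity.
val updater = new SessionUpdater(new Jedis("localhost", 6379), 2, oldNode = "Server10")
updater.setNode("Server8")
updater.save()

The design choice here is that the updater keeps the previously loaded value of each indexed field, which is what lets save() remove the stale index member directly.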
The only way I found so far is to SCAN all index keys with pattern idx:Session:Node:* and remove from them any member obj:Session:2, then create/update the index key for the new node (idx:Session:Node:Server8).
This is cumbersome, but seems like the way to go. Sadly, I don't think there is a better solution for this. You might want to consider maintaining a separate set listing the index keys that would have to be updated, though, as that way you'd avoid going over a bunch of keys that aren't relevant.
You might also want to check out an article about how to maintain those indices. As you already alluded to, there are basically two options: real-time using MULTI transactions or using batch jobs. Once you get into the territory of using key expiration, you are more or less forced to use the batch approach.

Redis Db - Watch if key exists or created

I'm trying a unique index implementation with Redis (ServiceStack client).
Normally:
1. Check the unique index for duplication
2. If the unique index exists, RETURN WITH WARNING
3. WATCH the unique index (for race conditions)
4. Open a transaction
5. Insert the new record and the new record's unique index
6. Close the transaction
How can I get rid of the 1st step?
I want to WATCH for existence. I'm not concerned with changes to the key; I'm concerned with its creation or existence (outside my transaction, of course).
If you are trying to use Redis just for checking duplicates, then use a hash:
http://redis.io/commands#hash
How do you use the ServiceStack client? With the native client? The typed client? (Then I can show you how to do that.)
And use this command: http://redis.io/commands/hsetnx
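The point of HSETNX is that the existence check and the write happen as one atomic command, so the WATCH/transaction dance isn't needed for the index itself. A rough sketch of the idea, written with Jedis rather than the ServiceStack client (the key and field names are made up):

import redis.clients.jedis.Jedis

val jedis = new Jedis("localhost", 6379)

// HSETNX writes the field only if it does not already exist and tells us whether it did:
// the existence check and the write are a single atomic command.
val claimed: Long = jedis.hsetnx("idx:User:Email", "alice@example.com", "42")

if (claimed == 1L) {
  // We own the unique value, so it is safe to insert the record itself now.
  jedis.hset("obj:User:42", "Email", "alice@example.com")
} else {
  // Someone else already holds it: report the duplicate instead of inserting.
  println("duplicate email, already owned by user " + jedis.hget("idx:User:Email", "alice@example.com"))
}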

Why do SQL id sequences go out of sync (specifically using Postgres)?

I've seen solutions for updating a sequence when it goes out of sync with the primary key it's generating, but I don't understand how this problem occurs in the first place.
Does anyone have insight into how a primary key field, with its default defined as the nextval of a sequence, and whose values aren't set explicitly anywhere, can go out of sync with the sequence? I'm using Postgres, and we see this occur now and then. It eventually results in a duplicate key violation when the sequence produces an id for an existing row.
Your application is probably occasionally setting the value of the primary key for a new row explicitly. In that case PostgreSQL has no need to fetch a new value from the sequence, so the sequence doesn't get advanced.
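A tiny illustration of how that plays out, sketched in Scala over JDBC (the users table, its serial id column, and the connection details are all invented for the example):

import java.sql.DriverManager

// Assumed table:  CREATE TABLE users (id serial PRIMARY KEY, name text);
val conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "me", "secret")
val stmt = conn.createStatement()

// An explicit id bypasses the column default, so the sequence is never consulted or advanced:
stmt.executeUpdate("INSERT INTO users (id, name) VALUES (42, 'alice')")

// Default inserts keep drawing ids from the sequence as usual...
stmt.executeUpdate("INSERT INTO users (name) VALUES ('bob')")

// ...until nextval('users_id_seq') eventually reaches 42, at which point the
// insert fails with a duplicate key violation on the primary key.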
When a sequence number is allocated, it remains allocated, even if the TX that requested it is rolled back. So a number can be allocated that does not appear in the stable database. Of course, rows can also be deleted after they are created, so the maximum number found in the table need not be the maximum number ever allocated. This applies to any auto-incrementing type.
Also, depending on the technology used, separate sequences can be used with multiple tables, so a value might be missing from TableA but present in TableB. That could be because of a mistake in the use of sequence names, or it might be intentional.

Concurrent insert of keys into a table

Probably a trivial question, but I want to get the best possible solution.
Problem:
I have two or more workers that insert keys into one or more tables. The problem arises when two or more workers try to insert the same key into one of those key tables at the same time.
Typical problem.
Worker A reads the table if a key exists (SELECT). There is no key.
Worker B reads the table if a key exists (SELECT). There is no key.
Worker A inserts the key.
Worker B inserts the key.
Worker A commits.
Worker B commits. An exception is thrown as the unique constraint is violated.
The key tables are simple pairs: the first column is an autoincrement integer and the second is a varchar key.
What is the best solution to such a concurrency problem? I believe it is a common problem. One way for sure is to handle the exceptions thrown, but somehow I don't believe this is the best way to tackle this.
The database I use is Firebird 2.5
EDIT:
Some additional info to make things clear.
Client side synchronization is not a good approach, because the inserts come from different processes (workers). And I could have workers across different machines someday, so even mutexes are a no-go.
The primary key, which is the first column of such a table, is an autoincrement field. No problem there. The varchar field is the problem, as it is something that the client inserts.
Typical such table is a table of users. For instance:
1 2056
2 1044
3 1896
4 5966
...
Each worker checks if user "xxxx" exists and, if not, inserts it.
EDIT 2:
Just for reference, in case somebody goes the same route: IB/FB return a pair of error codes (I am using InterBase Express components). Checking for a duplicate value violation looks like this:
except
  on E: EIBInterBaseError do
  begin
    if (E.SQLCode = -803) and (E.IBErrorCode = 335544349) then
    begin
      FKeysConnection.IBT.Rollback;
      EnteredKeys := False;
    end;
  end;
end;
With Firebird you can use the following statement:
UPDATE OR INSERT INTO MY_TABLE (MY_KEY) VALUES (:MY_KEY) MATCHING (MY_KEY) RETURNING MY_ID
assuming there is a BEFORE INSERT trigger which will generate the MY_ID if a NULL value is being inserted.
Here is the documentation.
Update: The above statement will avoid exceptions and cause every statement to succeed. However, in case of many duplicate key values it will also cause many unnecessary updates.
This can be avoided by another approach: just handle the unique constraint exception on the client and ignore it. The details depend on which Delphi library you're using to work with Firebird but it should be possible to examine the SQLCode returned by the server and ignore only the specific case of unique constraint violation.
I do not know if something like this is available in Firebird, but in SQL Server you can check when inserting the key:
insert into Table1 (KeyValue)
select 'NewKey'
where not exists (select *
                  from Table1
                  where KeyValue = 'NewKey')
First option - don't do it.
Don't do it. Unless the workers are doing extraordinary amounts of work (we're talking about computers, so requiring 1 second per record qualifies as an "extraordinary amount of work"), just use a single thread. Even better, do all the work in a stored procedure; you'd be amazed by the speedup gained by not transporting data over whatever protocol into your app.
Second option - Use a Queue
Make sure your worker threads don't all work on the same ID. Set up a queue, push all the IDs that need processing into it, and have each worker thread dequeue an ID from that queue. This way you're guaranteed that no two workers work on the same record at the same time. This might be difficult to implement if your workers are not all part of the same process.
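If the workers do all live in the same process, the queue part can be as simple as a blocking queue; a minimal sketch (the processKey body is just a stand-in for the real SELECT/INSERT work):

import java.util.concurrent.LinkedBlockingQueue

// Stand-in for the real per-key work (the SELECT/INSERT against Firebird).
def processKey(key: String): Unit = println(s"processing $key")

// All keys that need processing are queued once, up front.
val queue = new LinkedBlockingQueue[String]()
Seq("2056", "1044", "1896", "5966").foreach(k => queue.put(k))

// Each worker thread takes a distinct key, so no two workers ever handle the same one.
val workers = (1 to 4).map { _ =>
  new Thread(() => {
    var key = queue.poll()
    while (key != null) {
      processKey(key)
      key = queue.poll()
    }
  })
}
workers.foreach(_.start())
workers.foreach(_.join())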
Last resort
Set up a DB-based "reservation" system so a worker thread can mark a key as "work in progress", so that no two workers work on the same key. I'd set up a table like this:
CREATE TABLE KEY_RESERVATIONS (
  KEY INTEGER NOT NULL,              /* This is the KEY you'd be reserving */
  RESERVED_UNTIL TIMESTAMP NOT NULL  /* We don't want to keep reservations forever in case of failure */
);
Each of your workers would use short transactions to work on that table: select a candidate key, one that's not in the KEY_RESERVATIONS table, and try to INSERT it. Failed? Try another key. Periodically delete all reservations with old RESERVED_UNTIL timestamps. Make sure the transactions working with KEY_RESERVATIONS are as short as possible, so that two threads trying to reserve the same key at the same time fail quickly.
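Roughly, the worker side of that could look like the following (a sketch only, in Scala with plain JDBC; it assumes autocommit is off, that KEY carries a unique or primary key constraint so the second reservation attempt fails, and that the list of candidate keys comes from elsewhere):

import java.sql.{Connection, SQLException}

// Try to reserve one candidate key; returns the reserved key, or None if all were taken.
def reserveKey(conn: Connection, candidateKeys: Seq[Int]): Option[Int] = {
  val stmt = conn.prepareStatement(
    "INSERT INTO KEY_RESERVATIONS (KEY, RESERVED_UNTIL) VALUES (?, ?)")
  candidateKeys.iterator.map { k =>
    try {
      stmt.setInt(1, k)
      stmt.setTimestamp(2, new java.sql.Timestamp(System.currentTimeMillis() + 60000)) // hold for 1 minute
      stmt.executeUpdate()
      conn.commit()                     // short transaction: reserve and get out
      Some(k)
    } catch {
      case _: SQLException =>           // another worker got there first
        conn.rollback()
        None
    }
  }.collectFirst { case Some(k) => k }
}

// Housekeeping, run periodically in its own transaction:
//   DELETE FROM KEY_RESERVATIONS WHERE RESERVED_UNTIL < CURRENT_TIMESTAMP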
This is what you have to deal with in an optimistic (or no-) locking scheme.
One way to avoid it is to put a pessimistic lock on the table around the whole select, insert, commit sequence.
However, that means you will have to deal with not being able to access the table (handle table-locked exceptions).
If by workers you mean threads in the same application instance instead of different users (application instances), you will need thread synchronization like kubal5003 says around the select-insert-commit sequence.
A combination of the two is needed if you have multiple users/application instances each with multiple threads.
Synchronize your threads to make it impossible to insert the same value, or use a DB-side key generation method (I don't know Firebird, so I don't even know whether it has one; e.g. MS SQL Server has identity columns, and GUIDs also solve the problem because it's very unlikely to generate two identical ones).
You should not rely on the client to generate the unique key if there's a possibility of duplicates.
Use triggers and generators (maybe with the help of a stored procedure) to always create unique keys.
More information about proper autoinc implementation in Firebird here: http://www.firebirdfaq.org/faq29/