CEPH Write Acknowledgement in case a replica node is down - amazon-s3

During a Ceph write operation (a standard PUT), if the data node that holds the partition (based on the hash) is found dead, does the coordinator node still send a SUCCESS ACK back to the client for the write?
So the question is: if one of the 3 replica nodes is found unhealthy, is the WRITE operation ACKed as a failure?

It seems the write acknowledgement will fail when a replica node is down, provided the replication factor is greater than 1 (for example, 2).
Data management begins with clients writing data to pools. When a client writes data to a Ceph pool, the data is sent to the primary OSD. If the replication factor is 1, the primary OSD commits the data locally and sends an immediate acknowledgement to the client. If the replication factor is greater than 1 (as it should be in any serious deployment), the primary OSD issues write subops to each subsidiary (secondary, tertiary, etc.) OSD and awaits their responses. Since there is always exactly one primary OSD, the number of subsidiary OSDs is the replication size minus 1. Once all responses have arrived, the primary sends an acknowledgement (or a failure) back to the client, depending on the outcome.
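To illustrate the flow described above, here is a minimal Java sketch. It is not Ceph source code; the Replica interface and all method names are invented for illustration. It only models the rule that the client gets a SUCCESS ACK after the primary and every subsidiary OSD have committed.

import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the acknowledgement flow, not Ceph code.
interface Replica {
    CompletableFuture<Boolean> writeSubOp(byte[] data); // invented subsidiary write call
}

class PrimaryOsdSketch {
    // Returns true (SUCCESS ACK) only if the primary and every subsidiary committed.
    boolean handleClientWrite(byte[] data, List<Replica> subsidiaries) {
        commitLocally(data); // the primary commits its own copy first
        return subsidiaries.stream()
                .map(r -> r.writeSubOp(data))      // fan out sub-ops (replication size - 1 of them)
                .map(CompletableFuture::join)      // wait for every response before acking
                .allMatch(Boolean::booleanValue);  // any failed replica turns the ACK into a failure
    }

    private void commitLocally(byte[] data) { /* local journal/commit */ }
}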

Related

How to delete queue message without storing log info in SQL Server 2014

I have already enabled "Service Broker" in my database. But recently I noticed that my database size is increasing unexpectedly day by day. I finally found out that the growth is caused by queue messages stored in an internal table.
The C drive is almost full, which is why I cannot simply delete the queue messages from the internal table: when I run a delete query, the log file grows accordingly. That is why I want to delete the queue messages without writing to the log.
Thanks.

Ignite with backup count zero

I have set the backup count of an Ignite cache to zero. I created two server nodes (say s1 and s2) and one client node (c1), set the cache mode to Partitioned, and inserted data into the cache. When I stopped server 2 and tried to access the data, it was not found. If the backup count is 0, how can data be copied from one server node to another? Does Ignite do this automatically when a node is stopped?
The way Ignite manages this is with backups. If you set it to zero, you have no resilience and removing a node will result in data loss (unless you enable persistence). You can configure how Ignite responds to this situation with the Partition Loss Policy.
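For reference, a minimal sketch of the relevant settings in Ignite's Java API; the cache name and the chosen values are just examples:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class IgniteBackupSketch {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache"); // example name
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);
        cacheCfg.setBackups(1); // 0 means no redundancy: stopping a node loses its primary partitions
        cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE); // reaction to lost partitions

        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(cacheCfg).put(1, "value");
        }
    }
}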

Moving data from one node to other node in same cluster in Apache Ignite

In a baseline cluster of 8 nodes, we have data in the partitioned template without backups. Assume I have an average of 28K entries on each of the 8 nodes in SampleTable (a cache). Total data = 28K * 8 = 224K entries.
CREATE TABLE SampleTable(....) WITH "template=partitioned"
Now I want to shut down one node, and before shutting it down I want to move the data from the 8th node to the other nodes, so that roughly 32K entries end up on each of the remaining 7 nodes (32K * 7 = 224K). Can I move data from any node to the other nodes?
How can I move all the data from one node to the other nodes in the cluster before shutting that node down, keeping the data balanced and evenly distributed across the remaining 7 nodes?
I created the table (SampleTable) with a CREATE statement and inserted the data with INSERT statements (over a JDBC connection).
Persistence is enabled.
I think that the most straightforward way is using backups.
Anyway, if you need to avoid data loss, using backups (and/or persistence) is a must.
As a simple workaround you can try the following steps:
Scan the local data on the node you want to shut down using a ScanQuery and store that data in an external database (see the sketch after these steps).
After that, shut down the node and exclude it from the baseline.
Upload the data from the database back into the cluster.
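A minimal sketch of the first step, meant to run on the node you are about to shut down. It assumes the table was created in the default PUBLIC schema, so the underlying cache is typically named SQL_PUBLIC_SAMPLETABLE; the persistence step is left as a placeholder:

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class DrainNodeSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Assumed cache name for a SQL table named SampleTable in the PUBLIC schema.
            IgniteCache<Object, Object> cache = ignite.cache("SQL_PUBLIC_SAMPLETABLE");

            ScanQuery<Object, Object> scan = new ScanQuery<>();
            scan.setLocal(true); // only entries whose primary copy lives on this node

            try (QueryCursor<Cache.Entry<Object, Object>> cursor = cache.query(scan)) {
                for (Cache.Entry<Object, Object> entry : cursor) {
                    // Placeholder: write entry.getKey() / entry.getValue() to an external database,
                    // then re-insert them after the node has been removed from the baseline.
                }
            }
        }
    }
}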
The approach described below will work only if backups are configured in the cluster (> 0).
To remove a node from the Baseline Topology and rebalance the data across the remaining 7 nodes, you can use the Cluster Activation Tool:
Stop the node you want to remove from the topology.
Wait until the node is stopped. The message "Ignite node stopped OK" should appear in the logs.
Check that the node is offline:
$IGNITE_HOME/bin/control.sh --baseline
Cluster state: active
Current topology version: 8
Baseline nodes:
ConsistentID=<node1_id>, STATE=ONLINE
ConsistentID=<node2_id>, STATE=ONLINE
...
ConsistentID=<node8_id>, STATE=OFFLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 8
Other nodes not found.
Remove the node from baseline topology:
$IGNITE_HOME/bin/control.sh --baseline remove <node8_id>

Read Timeout Exception while deleting 1 million keys

I have a Redis HA setup with one master and two slaves, and around 10 million keys in Redis.
For a given flow I am deleting around 1 million keys with a batch size of 1000, and simultaneously, in other flows, keys are being put into Redis.
But every time I perform the delete operation I encounter java.net.SocketTimeoutException: Read timed out. I have set the timeout to 8 seconds.
Is there any issue with the Redis delete?
I am using Jedis client 2.7.
Stack Trace:
"redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out":{
"RedisInputStream.java:201":"redis.clients.util.RedisInputStream.ensureFill",
"RedisInputStream.java:40":"redis.clients.util.RedisInputStream.readByte",
"Protocol.java:141":"redis.clients.jedis.Protocol.process",
"Protocol.java:205":"redis.clients.jedis.Protocol.read",
"Connection.java:297":"redis.clients.jedis.Connection.readProtocolWithCheckingBroken",
"Connection.java:267":"redis.clients.jedis.Connection.getAll",
"Connection.java:259":"redis.clients.jedis.Connection.getAll",
Because the Redis server uses a single-threaded model to process requests, all incoming commands are queued and executed one by one.
As for the DEL operation: it does not just remove the key from the keyspace, it also blocks until all the memory held by those keys has been freed, which slows down your Redis server. You could try the UNLINK command instead.
This command is very similar to DEL: it removes the specified keys. Just like DEL a key is ignored if it does not exist. However the command performs the actual memory reclaiming in a different thread, so it is not blocking, while DEL is. This is where the command name comes from: the command just unlinks the keys from the keyspace. The actual removal will happen later asynchronously.
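As a minimal sketch of batched deletion with UNLINK via Jedis: this assumes Redis >= 4.0 and a Jedis version that exposes unlink (the 2.7 client mentioned above does not); the connection details and key list are placeholders:

import java.util.List;
import redis.clients.jedis.Jedis;

public class BatchUnlinkSketch {
    // Deletes keys in fixed-size batches so no single command ties up the server for long.
    static void unlinkInBatches(Jedis jedis, List<String> keys, int batchSize) {
        for (int i = 0; i < keys.size(); i += batchSize) {
            List<String> batch = keys.subList(i, Math.min(i + batchSize, keys.size()));
            jedis.unlink(batch.toArray(new String[0])); // memory is reclaimed asynchronously, unlike DEL
        }
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) { // placeholder connection
            unlinkInBatches(jedis, List.of("key:1", "key:2", "key:3"), 1000);
        }
    }
}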
EDITED.
Maybe you should try deleting and putting incrementally, e.g. delete 100 or 1000 keys every minute (see the sketch below).
And if some of your keys are lists, sets, or sorted sets holding huge amounts of data, you might delete them with a delay: first collect them, then delete them when Redis is not very busy.
If your put flow also bulk-loads huge amounts of data into Redis, put it incrementally as well.
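A sketch of that paced approach, deleting a small batch on a fixed schedule instead of all at once; the queue of pending keys, the interval, and the connection details are examples:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;

public class PacedDeleteSketch {
    public static void main(String[] args) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(); // keys collected for deferred deletion
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Every minute, delete at most 1000 of the collected keys.
        scheduler.scheduleAtFixedRate(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) { // placeholder connection
                for (int i = 0; i < 1000; i++) {
                    String key = pending.poll();
                    if (key == null) {
                        break; // nothing left to delete in this round
                    }
                    jedis.del(key); // works on Jedis 2.7; switch to unlink if your versions support it
                }
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}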

Is Hazelcast async write transitive?

I am doing some simple benchmarking with Hazelcast to see if it might fit our needs for a distributed data grid. The idea is to have an odd number of servers (e.g. 5) with '> n/2' replication (e.g. 3).
With all servers and the client running on my local machine (no network latency) I get the following results:
5 x H/C server (sync backup = 2, async backup = 0); 100 Client Threads : 35,196 puts/second
5 x H/C server (sync backup = 1, async backup = 1); 100 Client Threads : 41,918 puts/second
5 x H/C server (sync backup = 0, async backup = 2); 100 Client Threads : 52,007 puts/second
As expected, async backups allow higher throughput than sync backups. For our use case we would probably opt for the middle option (1x sync and 1x async), as this gives us an acceptable balance between resilience and performance.
My question is: If Hazelcast is configured with 1x sync and 1x async, and the node crashes after the sync backup is performed (server returns 'OK' to client and client thread carries on) but before the async backup is performed (so the data is only on one replica and not the second), will the node that received the sync backup pick up the task of the async backup, or will it just wait until the entire cluster re-balances and the 'missing' data from the crashed node is re-distributed from copies? And if the latter, once the cluster re-balances will there be a total of 3 copies of the data, as there would have been if the node hadn't crashed, or will there only be 2 copies because the sync'd node assumes that another node already received its copy?
The partition owner is responsible for creating all backups.
In other words: The 1st backup does NOT create a new backup request for the 2nd backup - it's all responsibility of the owner.
If a member holding a backup replica is stale, then the anti-entropy mechanism kicks in and the backup partition will be updated to match the owner.
When a member goes down, the 1st (= sync) backup is eventually promoted to be the new partition owner. It is then the new owner's responsibility to make sure the configured redundancy is honoured: a new backup is created so that there are 2 backups, as configured.
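For reference, a minimal sketch of the backup configuration discussed here using Hazelcast's Java API; the map name is a placeholder:

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class HazelcastBackupSketch {
    public static void main(String[] args) {
        MapConfig mapConfig = new MapConfig("benchmarkMap"); // placeholder map name
        mapConfig.setBackupCount(1);      // 1 synchronous backup: the put waits for it
        mapConfig.setAsyncBackupCount(1); // 1 asynchronous backup: written after the client is acked

        Config config = new Config();
        config.addMapConfig(mapConfig);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        hz.getMap("benchmarkMap").put("key", "value");
    }
}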