DataStax keyspace topology change issue

I am planning to change the keyspace strategy from SimpleStrategy to NetworkTopologyStrategy to make it network aware. I have already changed the keyspace strategy of app1, app2, and app3, which I created. Do I need to change the strategy of the keyspaces mentioned below to be network aware as well?
dse_leases, dse_system, system_schema, dse_security, system_auth, system_distributed, system, system_traces, solr_admin, dse_perf
Update: I found that I don't have to change the strategy of the system and system_schema keyspaces because they are not user-modifiable. Do the other keyspaces mentioned above need to change?

system and system_schema :
You don't have to change the strategy of the system and system_schema
keyspaces because they are not user-modifiable.
dse_leases :
1) A replication factor of 1 is suitable only for development and testing
on a single node, never for a production environment.
2) For production clusters, increase the replication factor to at least 3
for each logical datacenter that is running analytics, as shown below.
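For example (the datacenter name DC1 here is illustrative; substitute your own datacenter names and replication factors):
(a) ALTER KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};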
dse_system :
DSE uses a default replication strategy of "EverywhereStrategy" for the
dse_system keyspace. The best approach is to keep the DSE node in the
cluster and then alter the replication strategy for the dse_system
keyspace to one that is understood by all nodes:
(a) ALTER KEYSPACE dse_system WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1':1 ,'DC2':1};
dse_perf :
DataStax Enterprise uses the dse_perf keyspace for storing performance
metrics data. By default, DataStax Enterprise writes performance metrics
data with consistency level ONE, and writes are performed asynchronously.
Set the replication factor depending on your environment:
(a) ALTER KEYSPACE "dse_perf" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
system_auth and dse_security :
Cassandra uses the system_auth and dse_security keyspaces for storing
security authentication and authorization information. DataStax
Enterprise uses the system_auth keyspace when you enable any kind of
authentication. DataStax Enterprise uses the dse_security keyspace only on
analytics nodes (CFS, Hadoop, Spark) when you enable Kerberos
authentication:
(a) ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
(b) ALTER KEYSPACE "dse_security" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

Related

Redis Cluster transactions support

Is there any support for transactions in the Redis Cluster Python client? Obviously, the keys participating in the transaction are mapped to the same slot using hash tags.
For example, how do I perform the following commands atomically?
redis.delete("key1.{123}")
redis.set("key2.{123}", 1)
You can use pipelines with transaction=True (the default):
import redis

# Assumes a redis-py client; both keys carry the {123} hash tag, so they
# hash to the same slot and the commands can run in one MULTI/EXEC transaction
r = redis.Redis(host="localhost", port=6379)
pipe = r.pipeline(transaction=True)
pipe.delete("key1.{123}")
pipe.set("key2.{123}", 1)
pipe.execute()

Ignite with backup count zero

I have set the backup count of an Ignite cache to zero. I have created two server nodes (say s1 and s2) and one client node (c1). The cache mode is PARTITIONED. I inserted data into the Ignite cache, stopped server 2, and when I tried to access the data it was not there. If the backup count is 0, how can data be copied from one server node to another? Does Ignite do this automatically when a node is stopped?
The way Ignite manages this is with backups. If you set the backup count to zero, you have no resilience, and removing a node will result in data loss (unless you enable persistence). You can configure how Ignite responds to this situation with the partition loss policy, as sketched below.
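A minimal Java sketch of the two relevant settings (the cache name and values here are illustrative, not taken from the question):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BackupConfigSketch {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("sampleCache");
        cacheCfg.setBackups(1); // one backup copy, so the cluster survives losing a single node
        // Fail operations on lost partitions instead of silently serving partial data
        cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

        IgniteConfiguration cfg = new IgniteConfiguration().setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cache("sampleCache").put(1, "value");
        }
    }
}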

ERROR : FAILED: Error in acquiring locks: Error communicating with the metastore org.apache.hadoop.hive.ql.lockmgr.LockException

Getting "Error in acquiring locks" when trying to run count(*) on partitioned tables.
The table has 365 partitions; when filtered to <= 350 partitions, the queries work fine.
When more partitions are included in the query, it fails with the error.
I am working on Hive-managed ACID tables, with the following default values:
hive.support.concurrency=true // cannot be set to false; doing so throws "<table> is missing from the ValidWriteIdList config: null"; it should be true for ACID reads and writes
hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.txn.strict.locking.mode=false
hive.exec.dynamic.partition.mode=nonstrict
Tried increasing/decreasing values for the following within a Beeline session:
hive.lock.numretries
hive.unlock.numretries
hive.lock.sleep.between.retries
hive.metastore.batch.retrieve.max={default 300} // changed to 10000
hive.metastore.server.max.message.size={default 104857600} // changed to 10485760000
hive.metastore.limit.partition.request={default -1} // did not change, as -1 is unlimited
hive.lock.query.string.max.length={default 10000} // changed to a higher value
Using the HDI 4.0 Interactive Query (LLAP) cluster; the metastore is backed by the default SQL server provided with it.
The problem is NOT due to the service tier of the Hive metastore database.
It is most probably due to too many partitions in one query, based on the symptom.
I have met the same issue several times.
In hivemetastore.log, you should be able to see an error such as:
metastore.RetryingHMSHandler: MetaException(message:Unable to update transaction database com.microsoft.sqlserver.jdbc.SQLServerException: The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:578)
This is because, in the Hive metastore, each partition involved in a Hive query requires at most 8 parameters to acquire a lock, so a query touching all 365 partitions needs up to 365 x 8 = 2920 parameters, which exceeds the 2100-parameter limit above.
Some possible workarounds:
Decompose the query into multiple sub-queries that read from fewer
partitions.
Reduce the number of partitions by choosing different partition keys.
Remove partitioning if the partition keys don't have any filters.
The following parameters manage the batch size for the INSERT queries generated by direct SQL. Their default value is 1000. Set both of them to 100 (as a good starting point) in the Custom hive-site section of the Hive configs via Ambari, and restart ALL Hive-related components (including the Hive metastore):
hive.direct.sql.max.elements.values.clause=100
hive.direct.sql.max.elements.in.clause=100
We also faced the same error in HDInsight, and after making many configuration changes similar to what you have done, the only thing that worked was scaling up our Hive metastore SQL DB server.
We had to scale it all the way to a P2 tier with 250 DTUs for our workloads to run without these lock exceptions. As you may know, a higher tier and DTU count improve the SQL server's IOPS and response time, so we suspected that metastore performance was the root cause of these lock exceptions as the workloads grew.
The following link provides information about DTU-based performance variation of SQL servers in Azure:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu
Additionally, as far as I know, the default Hive metastore that gets provisioned when you opt not to provide an external DB at cluster creation is just an S1-tier DB, which is not suitable for any high-capacity workload. As a best practice, always provision your metastore external to the cluster and attach it at cluster provisioning time. This gives you the flexibility to connect the same metastore to multiple clusters (so that your Hive-layer schema can be shared across them, e.g. Hadoop for ETL and Spark for processing / machine learning), and full control to scale the metastore up or down as needed, at any time.
The only way to scale the default metastore is by engaging Microsoft support.
We faced the same issue in HDInsight and solved it by upgrading the metastore.
The default metastore had only 5 DTUs, which is not recommended for production environments. So we migrated to a custom metastore, spun up an Azure SQL server (P2, above 250 DTUs), and set the following properties:
hive.direct.sql.max.elements.values.clause=200
hive.direct.sql.max.elements.in.clause=200
The above values are set because SQL Server cannot process more than 2100 parameters. With more than 348 partitions you hit this issue, since each partition creates 8 parameters for the metastore: 8 x 348 = 2784, which exceeds the limit.

Moving data from one node to another node in the same cluster in Apache Ignite

In a baseline cluster of 8 nodes, we have data in the partitioned template without backups. Assume I have on average 28K entries on each of the 8 nodes in SampleTable (a cache). Total data = 28K * 8 = 224K entries.
CREATE TABLE SampleTable(....) WITH "template=partitioned"
Now I want to shut down one node, and before shutting it down I want to move the data from the 8th node to the other nodes, i.e. approximately 32K entries per node (32K * 7 = 224K) across the remaining 7 nodes. Can I move data from any node to the other nodes?
How can I move all the data from one node to the other nodes in the cluster before shutting that node down, keeping the data balanced and distributed across the remaining 7 nodes?
I created the table (SampleTable) using a CREATE statement and inserted data using INSERT statements (over a JDBC connection).
Persistence is enabled.
I think the most straightforward way is to use backups.
In any case, if you need to avoid data loss, using backups (and/or persistence) is a must.
As a simple workaround you can try the following steps (a sketch of the first step follows the list):
Scan the local data on the node you want to shut down, using a ScanQuery, and store the data in a database.
After that, shut down the node and exclude it from the baseline.
Upload the data back from the database.
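Here is a rough Java sketch of the local-scan step. The cache name SQL_PUBLIC_SAMPLETABLE follows Ignite's naming convention for SQL-created tables but should be verified in your cluster; the export target is left as a placeholder:

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class LocalScanSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Object, Object> cache = ignite.cache("SQL_PUBLIC_SAMPLETABLE");

            // setLocal(true) restricts the scan to partitions held by this node
            ScanQuery<Object, Object> qry = new ScanQuery<>();
            qry.setLocal(true);

            try (QueryCursor<Cache.Entry<Object, Object>> cursor = cache.query(qry)) {
                for (Cache.Entry<Object, Object> e : cursor) {
                    // Replace with writes to your external database
                    System.out.println(e.getKey() + " -> " + e.getValue());
                }
            }
        }
    }
}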
The approach described below will work only if backups are configured in the cluster (> 0).
To remove a node from the baseline topology and rebalance the data across the remaining 7 nodes, you can use the Cluster Activation Tool (control.sh):
Stop the node you want to remove from the topology.
Wait until the node is stopped; the message "Ignite node stopped OK" should appear in the logs.
Check that the node is offline:
$IGNITE_HOME/bin/control.sh --baseline
Cluster state: active
Current topology version: 8
Baseline nodes:
ConsistentID=<node1_id>, STATE=ONLINE
ConsistentID=<node2_id>, STATE=ONLINE
...
ConsistentID=<node8_id>, STATE=OFFLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 8
Other nodes not found.
Remove the node from the baseline topology:
$IGNITE_HOME/bin/control.sh --baseline remove <node8_id>

Is Hazelcast async write transitive?

I am doing some simple benchmarking with Hazelcast to see if it might fit our needs for a distributed data grid. The idea is to have an odd number of servers (e.g. 5) with > n/2 replication (e.g. 3).
With all servers and the client running on my local machine (no network latency) I get the following results:
5 x H/C server (sync backup = 2, async backup = 0); 100 Client Threads : 35,196 puts/second
5 x H/C server (sync backup = 1, async backup = 1); 100 Client Threads : 41,918 puts/second
5 x H/C server (sync backup = 0, async backup = 2); 100 Client Threads : 52,007 puts/second
As expected, async backups allow higher throughput than sync backups. For our use case we would probably opt for the middle option (1x sync and 1x async), as this gives us an acceptable balance between resilience and performance; a configuration sketch follows.
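For reference, that middle option corresponds to a map configuration along these lines (a minimal sketch; the map name is illustrative):

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class SyncAsyncBackupSketch {
    public static void main(String[] args) {
        MapConfig mapConfig = new MapConfig("benchmark-map");
        mapConfig.setBackupCount(1);      // sync backup: put() returns only after this copy is written
        mapConfig.setAsyncBackupCount(1); // async backup: written in the background after the response

        Config config = new Config();
        config.addMapConfig(mapConfig);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("benchmark-map").put("key", "value");
    }
}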
My question is: if Hazelcast is configured with 1x sync and 1x async, and the node crashes after the sync backup is performed (the server returns 'OK' to the client and the client thread carries on) but before the async backup is performed (so the data is only on one replica, not the second), will the node that received the sync backup pick up the task of the async backup? Or will it just wait until the entire cluster re-balances and the 'missing' data from the crashed node is re-distributed from copies? And if the latter, once the cluster re-balances, will there be a total of 3 copies of the data, as there would have been if the node hadn't crashed, or only 2 copies, because the sync'd node assumes that another node already received its copy?
The partition owner is responsible for creating all backups.
In other words, the 1st backup does NOT create a new backup request for the 2nd backup; it is all the responsibility of the owner.
If a member holding a backup replica is stale, the anti-entropy mechanism kicks in and the backup partition is updated to match the owner.
When a member goes down, the 1st (= sync) backup is eventually promoted to be the new partition owner. It is the responsibility of the new owner to make sure the configured redundancy is honoured: a new backup will be created to make sure there are 2 backups, as configured.