Ignite with backup count zero - ignite

I have set backup count of ignite cache to zero. I have created two server node(say s1 and s2) and one client node(c1). I have set cache mode as Partitioned. I have inserted data in ignite cache. Stopped server 2 and tried access data it is not getting data. If backup count is 0 then how to copy data from one server node other server node. Does ignite does automatically when we stop node.

The way Ignite manages this is with backups. If you set it to zero, you have no resilience and removing a node will result in data loss (unless you enable persistence). You can configure how Ignite responds to this situation with the Partition Loss Policy.

Related

ERROR : FAILED: Error in acquiring locks: Error communicating with the metastore org.apache.hadoop.hive.ql.lockmgr.LockException

Getting the Error in acquiring locks, when trying to run count(*) on partitioned tables.
The table has 365 partitions when filtered on <= 350 partitions, the queries are working fine.
when tried to include more partitions for the query, it's failing with the error.
working on Hive-managed ACID tables, with the following default values
hive.support.concurrency=true //cannot make it as false, it's throwing <table> is missing from the ValidWriteIdList config: null, should be true for ACID read and write.
hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.txn.strict.locking.mode=false
hive.exec.dynamic.partition.mode=nonstrict
Tried increasing/decreasing values for these following with a beeline session.
hive.lock.numretries
hive.unlock.numretries
hive.lock.sleep.between.retries
hive.metastore.batch.retrieve.max={default 300} //changed to 10000
hive.metastore.server.max.message.size={default 104857600} // changed to 10485760000
hive.metastore.limit.partition.request={default -1} //did not change as -1 is unlimited
hive.metastore.batch.retrieve.max={default 300} //changed to 10000.
hive.lock.query.string.max.length={default 10000} //changed to higher value
Using the HDI-4.0 interactive-query-llap cluster, the meta-store is backed by default sql-server provided along.
The problem is NOT due to service tier of the hive metastore database.
It is most probably due to too many partitions in one query based on the symptom.
I meet the same issue several times.
In the hivemetastore.log, you shall able to see such error:
metastore.RetryingHMSHandler: MetaException(message:Unable to update transaction database com.microsoft.sqlserver.jdbc.SQLServerException: The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:578)
This is due to in Hive metastore, each partition involved in the hive query requires at most 8 parameters to acquire a lock.
Some possible workarounds:
Decompose the the query into multiple sub-queries to read from fewer
partitions.
Reduce the number of partitions by setting different partition keys.
Remove partitioning if partition keys don't have any filters.
Following are the parameters which manage the batch size for INSERT query generated by the direct SQL. Their default value is 1000. Set both of them to 100 (as a good starting point) in the Custom hive-site section of Hive configs via. Ambari and restart ALL Hive related components (including Hive metastore).
hive.direct.sql.max.elements.values.clause=100
hive.direct.sql.max.elements.in.clause=100
We also faced the same error in HDInsight and after doing many configuration changes similar to what you have done, the only thing that worked is scaling our Hive Metastore SQL DB server.
We had to scale it all the way to a P2 tier with 250 DTUs for our workloads to work without these Lock Exceptions. As you may know, with the tier and DTU count, the SQL server's IOPS and response time improves thus we suspected that the Metastore performance was the root cause for these Lock Exceptions with the increase in workloads.
Following link provides information about the DTU based performance variation in SQL servers in Azure.
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu
Additionally as I know, the default Hive metastore that gets provisioned when you opt to not provide an external DB in cluster creation is just an S1 tier DB. This would not be suitable for any high capacity workloads. At the same time, as a best practice always provision your metastores external to the cluster and attach at cluster provisioning time, as this gives you the flexibility to connect the same Metastore to multiple clusters (so that your Hive layer schema can be shared across multiple clusters, e.g. Hadoop for ETLs and Spark for Processing / Machine Learning), and you have the full control to scale up or down your metastore as per your need anytime.
The only way to scale the default metastore is by engaging the Microsoft support.
We faced the same issue in HDINSIGHT. We solved it by upgrading the metastore.
The Default metastore had only 5 DTU which is not recommended for production environments. So we migrated to custom Metastore and spin the Azure SQL SERVER (P2 above 250 DTUs) and the setting the below properties:
hive.direct.sql.max.elements.values.clause=200
hive.direct.sql.max.elements.in.clause=200
Above values are set because SQL SERVER cannot process more than 2100 parameter. When you have partitions more than 348, you faced this issue as 1 partition creates 8 parameters for metastore 8 x 348

Merge two persistent caches in Apache Ignite

My application uses Apache Ignite persistent storage. For some weeks I ran the application storing the persistent data in let's say "c:\db1". Later I ran the same application with persistent data in c:\db2. The data was only stored on this one server node.
Is there a way to merge the data from db1 folder to db2 folder?
No, you can't, at least not easily.
The best way would be two start two nodes in separate clusters, one using c:\db1 and one using c:\db2 and stream data from one to the other:
Start the two clusters
Start a helper application that will load the data
In the application, start two client nodes with different configurations - one connected to the first cluster, one connected to the second
Transfer the data roughly like this (code is not tested!)
IgniteCache cache1 = client1.cache("mycache");
IgniteCache cache2 = client2.cache("mycache");
for (Cache.Entry e : cache1.query(new ScanQuery())) {
client2.put(e.getKey(), e.getValue());
}

Moving data from one node to other node in same cluster in Apache Ignite

In a baseline cluster of 8 nodes, we have data in the partitioned template without backup. Assume I have avg 28K entries in all 8 nodes in SampleTable(#Cache). Total data = 28K*8 = 224K entries.
CREATE TABLE SampleTable(....) WITH "template=partitioned"
Now I want to shut down one node and before shutting down I want to move data from 8th Node to other nodes so approx 32K (32K*7=224K) entries to 7 nodes. Can I move data from any node to other nodes?
How can I move all the data from one node to other nodes (cluster) before shutting down that node? Keeping the data balanced and distributed in rest 7 nodes.
I created the table (SampleTable) using create statement and inserted data using insert statement (using JDBC connection).
Persistence is enabled.
I think that the most straightforward way is using backups.
Anyway, if you need to avoid data loss, using backups (or/and persistence) is a must.
As a simple workaround you can try the following steps:
You can scan local data on the node, which you want to shut down, using ScanQuery and store the data in a database.
After that, shut down the node and exclude it from baseline.
Upload data from a database.
The approach described below will work only if there are backups configured in a cluster (> 0).
To remove a node from Baseline Topology and rebalance data between rest 7 nodes you are able to use Cluster Activation Tool:
Stop the node you want to remove from topology.
Wait until the node is stopped. Message Ignite node stopped OK should appear in logs.
Check that node is offline:
$IGNITE_HOME/bin/control.sh --baseline
Cluster state: active
Current topology version: 8
Baseline nodes:
ConsistentID=<node1_id>, STATE=ONLINE
ConsistentID=<node2_id>, STATE=ONLINE
...
ConsistentID=<node8_id>, STATE=OFFLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 8
Other nodes not found.
Remove the node from baseline topology:
$IGNITE_HOME/bin/control.sh --baseline remove <node8_id>

Is it possible to perform SQL Query with distributed join over a local cache and a partitioned cache?

I am currently using apache ignite 2.3.0 and the java api. I have a data grid with two nodes and two different caches. One is local and the other partitioned.
Lets say my local cache is on node #1.
I want to perform an SQL query (SqlFieldsQuery) with distributed join so that it returns data from local cache on node #1 and data from partitioned cache on node #2.
Is it possible? Do I need to specify the join in some particular order or activate a specific flag?
All my current tests are not returning any rows from partitioned cache that are not located on same node as local cache.
I tested the same query with distributed join over two different partitioned cache with no affinity and it was able to return data from different nodes properly. Is there a reason why this wouldn't work with local cache too?
Thanks
It is not possible to perform joins (both distributed an co-located) between LOCAL and PARTITIONED caches. The workaround is to use two PARTITIONED caches.

Is Hazelcast async write transitive?

I am doing some simple benchmarking with Hazelcast to see if it might fit our needs for a distributed data grid. The idea is to have an odd number of servers (eg 5) with '> n/2' replication (eg 3).
With all servers and the client running on my local machine (no network latency) I get the following results:
5 x H/C server (sync backup = 2, async backup = 0); 100 Client Threads : 35,196 puts/second
5 x H/C server (sync backup = 1, async backup = 1); 100 Client Threads : 41,918 puts/second
5 x H/C server (sync backup = 0, async backup = 2); 100 Client Threads : 52,007 puts/second
As expected, async backups allow higher throughput than sync backups. For our use case we would probably opt for the middle option (1x sync and 1x async) as this give us an acceptable balance between resilience and performance.
My question is: If Hazelcast is configured with 1x sync and 1x async, and the node crashes after the sync backup is performed (server returns 'OK' to client and client thread carries on) but before the async backup is performed (so the data is only on one replica and not the second), will the node that received the sync backup pick up the task of the async backup, or will it just wait until the entire cluster re-balances and the 'missing' data from the crashed node is re-distributed from copies? And if the latter, once the cluster re-balances will there be a total of 3 copies of the data, as there would have been if the node hadn't crashed, or will there only be 2 copies because the sync'd node assumes that another node already received its copy?
The partition owner is responsible for creating all backups.
In other words: The 1st backup does NOT create a new backup request for the 2nd backup - it's all responsibility of the owner.
If a member holding a backup replica is stale then anti-entropy mechanism kicks in and the backup partition will be updated to match the owner.
When a member goes down then the 1st (=sync) backup is eventually promoted to be a new partition owner. It's a responsibility of the new owner to make sure a configured redundancy is honoured - a new backup will be created to make sure there 2 backups as configured.