Master startup cannot progress, in holding-pattern until region onlined - apache

I've set up an HBase cluster with two nodes and I'm seeing the warning "AssignmentManager: STUCK Region-In-Transition", which is preventing the master from starting up.
Node 1: observepreserve.corp.com (Master / Zookeeper)
Node 2: knewshoe.corp.com (Region Server)
Why is this happening and how can I fix it?
In the HBase UI I can see the following message.
b94eb458bf643b46deaf6b00998d1f95 hbase:namespace,,1542792846910.b94eb458bf643b46deaf6b00998d1f95. state=OPENING, ts=Wed Nov 21 09:39:46 UTC 2018 (PT18M9.696S ago), server=knewshoe.corp.com,16020,1542792833282
Logs:
2018-11-21 09:40:45,900 INFO [ReadOnlyZKClient-observepreserve.corp.com:2181#0x4068418f] zookeeper.ZooKeeper: Session: 0x167359e5ad60006 closed
2018-11-21 09:40:45,900 INFO [ReadOnlyZKClient-observepreserve.corp.com:2181#0x4068418f-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x167359e5ad60006
2018-11-21 09:40:49,266 WARN [master/observepreserve:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1542792846910.b94eb458bf643b46deaf6b00998d1f95. is NOT online; state={b94eb458bf643b46deaf6b00998d1f95 state=OPENING, ts=1542793186164, server=knewshoe.corp.com,16020,1542792833282}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2018-11-21 09:41:46,095 WARN [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=knewshoe.corp.com,16020,1542792833282, table=hbase:namespace, region=b94eb458bf643b46deaf6b00998d1f95
2018-11-21 09:41:53,267 WARN [master/observepreserve:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1542792846910.b94eb458bf643b46deaf6b00998d1f95. is NOT online; state={b94eb458bf643b46deaf6b00998d1f95 state=OPENING, ts=1542793186164, server=knewshoe.corp.com,16020,1542792833282}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

Yes, reinstalling HBase can cause this problem.
It happens because the old metadata was not removed: ZooKeeper still holds state from the previous install, so the master waits for a region that will never come online. Delete the stale HBase metadata from ZooKeeper and restart HBase; after that everything should come up cleanly.
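For reference, a minimal sketch of that cleanup, assuming the default zookeeper.znode.parent of /hbase and that losing the old cluster's metadata is acceptable (which it is after a reinstall):

# stop HBase on all nodes first
bin/stop-hbase.sh

# open the ZooKeeper shell that ships with HBase (connects to the configured quorum)
bin/hbase zkcli

# inside the ZooKeeper shell: remove the stale HBase metadata
rmr /hbase
# (on newer ZooKeeper CLIs the equivalent is: deleteall /hbase)

# restart HBase; the master will recreate /hbase with fresh state
bin/start-hbase.sh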

Related

Janusgraph not using index in production

Problem
When performing queries in my production environment, the index is not used and a full scan is performed; my development environment works fine and uses the index.
After looking deeper at the problem in production, it also seems that the index information is being saved to the storage backend, but the data is not: the data is being stored locally instead. I have no idea why this is...
I will explain the architecture now:
Environments
The following describes my two environments. Important to note: the index in question is a composite index and as such uses the storage backend, but I have still included the index backend (Elasticsearch) in the architecture description.
Both the local and production environments use the same versions, i.e.:
Janusgraph: 0.5.2
ScyllaDB: 0.5.2
Elasticsearch: 7.13.1
Local Environment
Services are running in docker-compose, consisting of a single Janusgraph instance, a single ScyllaDB instance, and a single Elasticsearch Instance.
Production Environment
Running on AWS, in a Kubernetes cluster managed with EKS. I have multiple JanusGraph deployments, which connect to a ScyllaDB cluster in the same k8s cluster (deployed via Scylla for Kubernetes, https://operator.docs.scylladb.com/stable/) and to an Elasticsearch cluster.
Setup
The following is the simplest example I can give that reproduces the problems I describe.
I pre-create the indexes with the JanusGraph management system, for example:
# management.groovy
import org.janusgraph.graphdb.database.management.ManagementSystem

// remote connection to the Gremlin Server, used for traversals via "g"
cluster = Cluster.open("/opt/janusgraph/my_scripts/gremlin.yaml")
client = cluster.connect()

// embedded graph instance opened from the properties file, used for schema management
graph = JanusGraphFactory.open("/opt/janusgraph/my_scripts/env.properties")
g = graph.traversal().withRemote(DriverRemoteConnection.using(client, "g"))

// create the property key, the vertex label, and a composite index restricted to that label
m = graph.openManagement()
uid_property = m.makePropertyKey("uid").dataType(String).make()
user_label = m.makeVertexLabel("User").make()
m.buildIndex("index::User::uid", Vertex.class).addKey(uid_property).indexOnly(user_label).buildCompositeIndex()
m.commit()
Upon inspection with m.printSchema() I can see that the indexes are ENABLED in both my local environment and my production environment.
I then import all the data that needs to exist on the graph; both the local and production environments complete this without issue.
Performing Queries
The following outlines what happens when I run a query.
Local Environment
What we see here is a simple lookup just to check that the query is using the index:
gremlin> g.V().has("User", "uid", "00003b90-dcc2-494d-a179-ac9009029501").profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(User), uid.eq(... 1 1 1.837 100.00
\_condition=(~label = User AND uid = 00003b90-dcc2-494d-a179-ac9009029501)
\_isFitted=true
\_query=multiKSQ[1]#4000
\_index=index::User::uid
\_orders=[]
\_isOrdered=true
optimization 0.038
optimization 0.497
backend-query 1 0.901
\_query=index::User::uid:multiKSQ[1]#4000
\_limit=4000
>TOTAL - - 1.837 -
Production Environment
Again, we run the query to see if it is using the index (it is not):
g.V().has("User", "uid", "00003b90-dcc2-494d-a179-ac9009029501").profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(User), uid.eq(... 1 1 11296.568 100.00
\_condition=(~label = User AND uid = 00003b90-dcc2-494d-a179-ac9009029501)
\_isFitted=false
\_query=[]
\_orders=[]
\_isOrdered=true
optimization 0.025
optimization 0.102
scan 0.000
\_condition=VERTEX
\_query=[]
\_fullscan=true
>TOTAL - - 11296.568 -
What Happened? So far my best guess:
The storage backend is NOT being used for storing data, but is being used for storing information about the indexes
Update (Aug 16 2021): after digging around some more I found out something interesting.
It is now clear that the data is actually not being saved to the storage backend at all.
In my local environment I set the storage.directory environment variable to /var/lib/janusgraph/data, which mounts onto an empty directory; this directory remains empty. Any vertex/edge updates are saved to the ScyllaDB storage backend, and the data persists between JanusGraph instance restarts.
In my production environment, this directory (/var/lib/janusgraph/data) is populated with files:
-rw-r--r-- 1 janusgraph janusgraph 0 Aug 16 05:46 je.lck
-rw-r--r-- 1 janusgraph janusgraph 9650 Aug 16 05:46 je.config.csv
-rw-r--r-- 1 janusgraph janusgraph 450 Aug 16 05:46 je.info.0
-rw-r--r-- 1 janusgraph janusgraph 0 Aug 16 05:46 je.info.0.lck
drwxr-xr-x 2 janusgraph janusgraph 118 Aug 16 05:46 .
-rw-r--r-- 1 janusgraph janusgraph 7533 Aug 16 05:46 00000000.jdb
drwx------ 1 janusgraph janusgraph 75 Aug 16 05:53 ..
-rw-r--r-- 1 janusgraph janusgraph 19951 Aug 16 06:09 je.stat.csv
and any subsequent updates to the graph seem to be reflected here. The updates do not get written to the storage backend, and other JanusGraph instances on Kubernetes cannot see changes made by their peers, which leads me to conclude that the storage backend is not being used for storing data.
The domain names used for storage.hostname and index.hostname both resolve to IP addresses, confirmed using nslookup.
The endpoints must also be reachable, since the janusgraph keyspace is created with the replication factor I defined, and it retains the index information across JanusGraph instance restarts.
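For reference, these checks amount to roughly the following (the hostnames are placeholders for the service names in my cluster):

# resolve the storage and index hostnames from inside the cluster
nslookup scylla-client.scylla.svc.cluster.local
nslookup elasticsearch-master.es.svc.cluster.local

# confirm the janusgraph keyspace and its replication settings on the Scylla side
# (run from any host that has cqlsh and can reach the cluster)
cqlsh scylla-client.scylla.svc.cluster.local -e "DESCRIBE KEYSPACE janusgraph;"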
Idea 1 (Index is not enabled)
This was ruled out by running m.printSchema(), which shows that all the indexes are ENABLED.
Idea 2 (Storage backends have different data)
I looked at the data stored in ScyllaDB and got a summary with nodetool cfstats, which does show a difference:
# Local
Keyspace : janusgraph
Read Count: 1688328
Read Latency: 2.5682805710738673E-5 ms
Write Count: 1055210
Write Latency: 1.702409946835227E-5 ms
...
Memtable cell count: 126411
Memtable data size: 345700491
Memtable off heap memory used: 480247808
# Production
Keyspace : janusgraph
Read Count: 6367
Read Latency: 2.1203078372860058E-5 ms
Write Count: 21
Write Latency: 0.0 ms
...
Memtable cell count: 4
Memtable data size: 10092
Memtable off heap memory used: 131072
Although I don't know how to explain the difference, both backends do appear to contain all the data; I verified this with various count() queries over labels, such as g.V().hasLabel("User").count(), for which both environments report the same result.
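(For completeness, the summaries above were gathered on a Scylla node roughly like this; the keyspace name matches my configuration:)

# per-table statistics restricted to the janusgraph keyspace
nodetool cfstats janusgraph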
Idea 3 (Elasticsearch Warnings)
When launching a gremlin console session, there is a difference in that the production environment shows:
07:27:09 WARN org.elasticsearch.client.RestClient - request [PUT http://*******<i_removed_the_domain>******:9200/_cluster/settings] returned 3 warnings: [299 Elasticsearch-7.13.4-c5f60e894ca0c61cdbae4f5a686d9f08bcefc942 "[node.data] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version."],[299 Elasticsearch-7.13.4-c5f60e894ca0c61cdbae4f5a686d9f08bcefc942 "[node.master] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version."],[299 Elasticsearch-7.13.4-c5f60e894ca0c61cdbae4f5a686d9f08bcefc942 "[node.ml] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version."]
but since my problem involves composite indexes (which are served by the storage backend, not Elasticsearch), I believe we can disregard the Elasticsearch warnings.
Idea 4 (ScyllaDB cluster node resources)
Another idea I had was to increase the node resources, but even with 7 GB of RAM the problem persists.
Finally...
I don't know what to try next to solve this problem. This is my first time pushing JanusGraph into production and perhaps I have missed something important. I have been stuck on this for quite a while, hence asking the community here for help.
Thank you very much for reading this far, and hopefully helping me solve this problem.
I solved the problem myself. I realised that the K8s Deployment .yaml file I use for deploying needed all environment variables to carry the prefix janusgraph.; without it, the JanusGraph server was starting with all default settings rather than my selected ones.
Every time I created a gremlin shell session (which connected to its localhost server), even though I was specifying all the correct endpoints and configuration, the server was still saving data according to the default JanusGraph settings. Even so, I don't know why the indexes were successfully created on my specified backend.
Nonetheless, the solution was to make sure the environment variables have the prefix janusgraph.
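A quick way to confirm the running container actually sees the prefixed variables (the deployment name and values below are placeholders; the official JanusGraph image turns janusgraph.-prefixed environment variables into entries in its properties file):

# list the environment of the JanusGraph container
kubectl exec deploy/janusgraph -- env | grep '^janusgraph\.'

# expected output looks something like:
# janusgraph.storage.backend=cql
# janusgraph.storage.hostname=scylla-client.scylla.svc
# janusgraph.index.search.backend=elasticsearch
# janusgraph.index.search.hostname=elasticsearch-master.es.svc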

Having a Redis Timeout/Replica issue

We are currently having a few issues with Redis in our program that occur randomly, and we do not understand what is going on.
The code is open source as well, if anyone wants to jump in and help; these are the two repos:
https://github.com/pythonology/BeatTogether.Core
https://github.com/pythonology/BeatTogether.MasterServer/tree/feature/dedicated-servers (using the branch at the moment for dev purposes)
2020-12-10 03:22:17.829 +00:00 [ERR] An error occurred while handling message (MessageType='BroadcastServerStatusRequest').
StackExchange.Redis.RedisServerException: ERR Error running script (call to f_dce2ce9e149bc775618a47cd07eca24d80a33664): #user_script:4: #user_script: 4: -READONLY You can't write against a read only replica.
at BeatTogether.MasterServer.Data.Implementations.Repositories.ServerRepository.AddServer(Server server) in /app/BeatTogether.MasterServer.Data/Implementations/Repositories/ServerRepository.cs:line 80
at BeatTogether.MasterServer.Kernel.Implementations.UserService.BroadcastServerStatus(MasterServerSession session, BroadcastServerStatusRequest request) in /app/BeatTogether.MasterServer.Kernel/Implementations/UserService.cs:line 156
at BeatTogether.Core.Messaging.Implementations.BaseMessageHandler.<>c__DisplayClass6_0`2.<<Register>b__0>d.MoveNext()
--- End of stack trace from previous location --
and
2020-12-10 06:20:23.198 +00:00 [ERR] An error occurred while handling message (MessageType='BroadcastServerStatusRequest').
StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: EVAL, mc: 1/1/0, mgr: 10 of 10 available, clientName: master-server, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=3,Free=32764,Min=1024,Max=32767), v: 2.2.4.27433 at StackExchange.Redis.ConnectionMultiplexer.ThrowFailed[T](TaskCompletionSource`1 source, Exception unthrownException) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2760
--- End of stack trace from previous location ---
at BeatTogether.MasterServer.Data.Implementations.Repositories.ServerRepository.AddServer(Server server) in /app/BeatTogether.MasterServer.Data/Implementations/Repositories/ServerRepository.cs:line 80
at BeatTogether.MasterServer.Kernel.Implementations.UserService.BroadcastServerStatus(MasterServerSession session, BroadcastServerStatusRequest request) in /app/BeatTogether.MasterServer.Kernel/Implementations/UserService.cs:line 156
at BeatTogether.Core.Messaging.Implementations.BaseMessageHandler.<>c__DisplayClass6_0`2.<<Register>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at BeatTogether.Core.Messaging.Implementations.BaseMessageSource.<>c__DisplayClass15_0.<<Signal>b__0>d.MoveNext()
Our Redis instance doesn't have replicas; there is only a single master instance running on the same machine.
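One thing worth double-checking, since -READONLY is only returned by a server that believes it is a replica, is what role the instance itself reports (a diagnostic sketch; host/port are the defaults and may differ from our setup):

# what role does this instance report?
redis-cli -h 127.0.0.1 -p 6379 INFO replication

# is it configured to reject writes when acting as a replica? (slave-read-only on older servers)
redis-cli -h 127.0.0.1 -p 6379 CONFIG GET replica-read-only

# if the instance has unexpectedly been demoted to a replica, this promotes it back to master:
# redis-cli -h 127.0.0.1 -p 6379 REPLICAOF NO ONE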
Any help is greatly appreciated. Thanks.

Striped pool starvation in WAL writing causes cluster node failure

A moderate workload on a 3-node Ignite cluster causes one node to fail with striped pool starvation while archiving the WAL.
This happens once or twice a week.
I have already checked for all I/O problems that could hang WAL rollover, but this issue still persists.
I am using the latest Ignite 2.7 as a library inside a Spring Boot application.
: >>> Possible starvation in striped pool.
Deadlock: false
Completed: 1397
Thread [name="sys-stripe-7-#8%server.node%", id=22, state=WAITING, blockCnt=3, waitCnt=757]
Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync#b01791b, ownerName=sys-#214%server.node%, ownerId=248]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.awaitNext(FileWriteAheadLogManager.java:2871)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$2300(FileWriteAheadLogManager.java:2451)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1205)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:836)
at o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4267)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6333)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6082)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5782)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3242)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:135)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:309)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:304)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
ERROR --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G : Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=sys-stripe-1, blockedFor=10s]
WARN --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G : Thread [name="sys-stripe-1-#2%server.node%", id=16, state=WAITING, blockCnt=0, waitCnt=754]
Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync#b01791b, ownerName=sys-#214%server.node%, ownerId=248]
The failure detection feature is not well tuned by default in Apache Ignite 2.7. You can effectively turn it off (by setting the failure handler to NoOpFailureHandler) or set a large failureDetectionTimeout to avoid such messages (and the shutdown of nodes).

HANA hdbindexserver start issue after power outage

There was a power outage for our 5+1 node HANA cell cluster.
After we booted up the servers, we tried to start the HANA DB.
During HDB start with the SIDADM user we can see the following on nodes 2-3-4-5:
FAIL: process hdbindexserver HDB Indexserver not running
So of course we tried to start hdbindexserver by hand as SIDADM:
cd /usr/sap/SIDADM/HDB0x/exe; ./hdbindexserver
But this just produces an error:
/usr/sap/SIDADM/HDB0x/foobar003/trace> cat indexserver_alert_foobar003.trc
...
[14268]{-1}[-1/-1] 2017-10-09 19:55:34.593776 e TrexNet Communication.cpp(00501) : no internal interface found
[14287]{-1}[-1/-1] 2017-10-09 19:56:01.428226 e Checkpoint CheckpointMgr.cc(00244) : Skip versions garbage collection savepoint: transaction distribution work failure: snapshot timestamp synchronization failed
[14287]{-1}[-1/-1] 2017-10-09 19:56:22.467184 e Row_Engine transdtx.cc(01410) : Unexpected ltt exception thrown: transaction distribution work failure (at foobar/ptime/storage/tm/transdtx.cc:1410 )
[14287]{-1}[-1/-1] 2017-10-09 19:56:22.467427 f PersistenceLayer PersistenceController.cpp(00679) : startup failed exception 1: no.71000145 (ptime/storage/tm/transdtx.cc:1512)
snapshot timestamp synchronization failed
...
The IPs are up. There is 1 TB of RAM.
The question: what could cause hdbindexserver to fail to start?
It looks like the indexserver process wasn't able to bind to the internal network interface again:
Communication.cpp(00501) : no internal interface found
I'd look into the other trace files and the system log to check whether the configured internal network interface is up and available.
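To follow up on the "no internal interface found" line, a few checks as the <sid>adm user may help (the SID and paths below are placeholders for a typical scale-out installation):

# are the expected interfaces and IPs actually up on this node?
ip addr show

# how is inter-node communication configured? look for listeninterface (.internal / .global)
grep -A 5 '\[communication\]' /hana/shared/<SID>/global/hdb/custom/config/global.ini

# check the internal hostname mapping used for the internal network
grep -A 10 '\[internal_hostname_resolution\]' /hana/shared/<SID>/global/hdb/custom/config/global.ini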
It seems the persistence storage (the disk where the data and log volumes reside) is not responding in time, and hence the operation is timing out. Can you check whether you can access the data and log volumes from the server?
Also check whether network I/O or disk I/O is slow on that server, causing the synchronization to time out.
You can try stopping the system completely and then bringing up HDB on just that server first, to check whether the issue above still occurs.
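A couple of quick checks for the I/O angle (the mount points are placeholders for the usual data and log volumes):

# are the data/log volumes reachable, responsive and not full?
df -h /hana/data /hana/log
iostat -x 5 3

# any storage or network errors logged around the failed start?
dmesg | tail -n 50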

Authentication failures in cassandra when 1 of 16 nodes is down

I have a Cassandra cluster running:
Cassandra 2.0.11.83 | DSE 4.6.0 | CQL spec 3.1.1 | Thrift protocol 19.39.0
The cluster has 18 nodes, split among 3 datacenters, 6 in each. My system_auth keyspace has the following replication defined:
replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '4',
'DC2': '4',
'DC3': '4'}
and my authenticator/authorizer are set to:
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
This morning I brought down one of the nodes in DC1 for maintenance. Within a few seconds to a minute, client applications started logging exceptions like this:
"User my_application_user has no MODIFY permission on or any of its parents"
Running 'LIST ALL PERMISSIONS of my_application_user' on one of the other nodes shows that the user has SELECT and MODIFY on the keyspace xxxxx, so I am rather confused. Do I have a setup issue? Is this a bug of some sort?
Re-posting this as the answer, as BrianC suggested above.
So this is resolved... Here's the sequence of events that seems to have fixed it:
Add 18 more nodes
Run cleanup on original nodes (this was part of the original plan)
Run a scrub on 1 table, since it was throwing exceptions on cleanup
Run a repair on the system_auth KS on the original troubled node
Wait for repair service to complete a full pass on all keyspaces
Decom original 18 nodes.
Honestly, I don't know which step fixed it. The system_auth repair makes the most sense, but what doesn't make sense is that the repair service had already completed many passes before, so why it worked now I don't know. I hope this at least helps someone.
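For reference, the system_auth repair in step 4 boils down to something like this on the affected node (credentials are placeholders; -e requires a reasonably recent cqlsh, otherwise run the statement from an interactive session):

# repair only the auth keyspace on this node
nodetool repair system_auth

# afterwards, verify what the cluster reports for the user
cqlsh -u cassandra -p cassandra -e "LIST ALL PERMISSIONS OF my_application_user;"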