nifi - SelectHiveQL returns Cannot create PoolableConnectionFactory - hive

I'm trying to switch from ExecuteSQL to SelectHiveQL, but when a FlowFile reaches SelectHiveQL it fails with Cannot create PoolableConnectionFactory (Could not establish connection to jdbc:hive2://<server>:<port>/<db>;auth=noSasl;mapreduce.map.memory.mb=4000: null). No FlowFile to route to failure...
And it deletes my FlowFile. The DBCPConnectionPool is configured the same way and it works (but it returns corrupted data).
It's strange not only that it doesn't work, but also that it just deletes the FlowFile instead of routing it to failure.
Partial stack trace:
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1549)
......
Caused by: java.sql.SQLException: Could not establish connection to jdbc:hive2://<server>:<port>/<db>;auth=noSasl;mapreduce.map.memory.mb=4000: null
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:594)
...
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545)
....
Caused by: org.apache.thrift.transport.TTransportException: null
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
.....
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583)
NiFi Version: 1.6.0 (although it didn't work in 1.4.0 either)
Java Version: 1.8.0_121
Hive Version: 1.1.0-cdh5.7.1
Any help? Thanks.

The version of Hive in NiFi is based on 1.2.1, which is not compatible with 1.1.0. One alternative might be to use a Simba driver with the non-Hive processors like ExecuteSQL; I haven't tried that, so I'm not sure whether it's a valid workaround or not.
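If you want to confirm which Hive JDBC client a JVM is actually picking up, here is a minimal, best-effort sketch: it just loads org.apache.hive.jdbc.HiveDriver from the classpath and prints whatever version information it reports. Running it with the same jars NiFi uses is an assumption on my side, and the jar manifest may not carry an implementation version at all.

import java.sql.Driver;

public class HiveDriverVersionCheck {
    public static void main(String[] args) throws Exception {
        // Load whichever Hive JDBC driver is on the classpath
        Driver driver = (Driver) Class.forName("org.apache.hive.jdbc.HiveDriver")
                .getDeclaredConstructor().newInstance();

        // java.sql.Driver exposes major/minor version numbers
        System.out.println("Driver version: "
                + driver.getMajorVersion() + "." + driver.getMinorVersion());

        // The jar manifest may also carry an implementation version (can be null)
        Package pkg = driver.getClass().getPackage();
        System.out.println("Implementation version: "
                + (pkg != null ? pkg.getImplementationVersion() : "unknown"));
    }
}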

Related

spark-bigquery-connector VS firewall

I need some help. I'm trying to import data from BigQuery using the spark-bigquery-connector: Spark 2.4.0, Scala 2.11.12, Hadoop 2.7, spark-bigquery-with-dependencies_2.11-0.24.2.
The corporate firewall blocks access to external services. Could you please tell me which URLs need to be allowed for the spark-bigquery-connector to work?
I get this error:
Exception in thread "main" com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Error getting access token for service account: Connection refused: connect, iss:

How to connect to remote hive server using spark2.0

I'm trying to connect to a remote Hive server in a different cluster using Spark. I've tried both hive2 and thrift, but no luck:
val s = SparkSession.builder().appName("Man test").config("hive.metastore.uris", "jdbc:hive2://abc.svr.yy.xxxx.net:2171/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST#abc.AD.xxx.COM").enableHiveSupport().getOrCreate()
val s = SparkSession.builder().appName("Man test").config("hive.metastore.uris", "thrift://xxxx.svr.us.yyyy.net:2000").config("spark.sql.warehouse.dir", "/apps/hive/warehouse").enableHiveSupport().getOrCreate()
println("in method session created")
s.sql("show databases").show()
I'm getting the below error when I use jdbc:hive2:
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
and when I use thrift:
javax.security.sasl.SaslException: No common protection layer between client and server.
Please let me know if I am missing something here.
I solved the same issue by adding the following to the JVM options.
-Djavax.security.sasl.qop="auth-conf"
See: https://github.com/prestodb/presto/issues/8604
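If you can't edit the launcher scripts, here is a rough sketch of doing the same thing programmatically (shown in Java; the property name is exactly the one from the JVM option above, and treating it as a plain system property set before any Hive/Thrift classes initialize is an assumption on my side, not something I've verified against your cluster):

import org.apache.spark.sql.SparkSession;

public class HiveMetastoreQopExample {
    public static void main(String[] args) {
        // Equivalent to -Djavax.security.sasl.qop="auth-conf" on the command line;
        // must be set before any Hive/Thrift classes are initialized
        System.setProperty("javax.security.sasl.qop", "auth-conf");

        SparkSession spark = SparkSession.builder()
                .appName("Man test")
                // placeholder metastore URI; use your cluster's thrift endpoint
                .config("hive.metastore.uris", "thrift://xxxx.svr.us.yyyy.net:2000")
                .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("show databases").show();
    }
}

Alternatively, the same -D flag can be passed through spark.driver.extraJavaOptions (and spark.executor.extraJavaOptions) when submitting the job.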

PutHiveStreaming Processor in Nifi throws NPE

I'm debugging a HiveProcessor that follows the official PutHiveStreaming processor, but it writes to Hive 2.x instead of 3.x. The flow runs on a NiFi 1.7.1 cluster. Although this exception happens, data is still written to Hive.
The exception is:
java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1147)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:471)
at sun.reflect.GeneratedMethodAccessor1641.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
at com.sun.proxy.$Proxy308.isOpen(Unknown Source)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:270)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:95)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:82)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:60)
at org.apache.nifi.util.hive.HiveWriter.lambda$getRecordWriter$0(HiveWriter.java:91)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.nifi.util.hive.HiveWriter.getRecordWriter(HiveWriter.java:91)
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:75)
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
at org.apache.nifi.processors.hive.PutHive2Streaming.makeHiveWriter(PutHive2Streaming.java:1152)
at org.apache.nifi.processors.hive.PutHive2Streaming.getOrCreateWriter(PutHive2Streaming.java:1065)
at org.apache.nifi.processors.hive.PutHive2Streaming.access$1000(PutHive2Streaming.java:114)
at org.apache.nifi.processors.hive.PutHive2Streaming$1.lambda$process$2(PutHive2Streaming.java:858)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
at org.apache.nifi.processors.hive.PutHive2Streaming$1.process(PutHive2Streaming.java:855)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2211)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2179)
at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:808)
at org.apache.nifi.processors.hive.PutHive2Streaming.lambda$onTrigger$4(PutHive2Streaming.java:672)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:672)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I'd also like to reproduce the error. Would using TestRunners.newTestRunner(processor); be able to catch it? I'm referring to the test case for Hive 3.x:
https://github.com/apache/nifi/blob/ea9b0db2f620526c8dd0db595cf8b44c3ef835be/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestPutHiveStreaming.java
The other way is to run Hive 2.x and a NiFi container locally. But then I have to run docker cp to copy the nar package built by mvn, and attach a remote JVM from IntelliJ as this blog describes:
https://community.hortonworks.com/articles/106931/nifi-debugging-tutorial.html
Has anyone done something similar? Or is there an easier way to debug a custom processor?
This is a red herring error: there's some issue on the Hive side where it can't get its own IP address or hostname, and it issues this error periodically as a result. However, I don't think it causes any real problems; as you said, the data gets written to Hive.
Just for completeness, in Apache NiFi PutHiveStreaming is built to work against Hive 1.2.x, not Hive 2.x. There are currently no specific Hive 2.x processors, and we've never determined whether the Hive 1.2.x processors work against Hive 2.x.
For debugging, if you can run Hive in a container and expose the metastore port (9083 is the default, I believe), then you should be able to create an integration test using things like TestRunners and run NiFi locally from your IDE. This is how other integration tests are performed for external systems such as MongoDB or Elasticsearch.
There is a MiniHS2 class in the Hive test suite for integration testing, but it is not in a published artifact so unfortunately we're left with having to run tests against a real Hive instance.
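If it helps, here is a minimal sketch of what such a TestRunner-based integration test could look like. The processor class (PutHive2Streaming), its property names, and the metastore URI of a locally running Hive 2.x container are all assumptions taken from the stack trace above, so adjust them to your actual processor:

import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class PutHive2StreamingIT {

    @Test
    public void streamsOneRecord() {
        // The custom processor from the question (hypothetical class name)
        TestRunner runner = TestRunners.newTestRunner(new PutHive2Streaming());

        // Point it at the metastore port exposed by a local Hive 2.x container;
        // the property names mirror the stock PutHiveStreaming processor and are assumptions here
        runner.setProperty("hive-stream-metastore-uri", "thrift://localhost:9083");
        runner.setProperty("hive-stream-database-name", "default");
        runner.setProperty("hive-stream-table-name", "users");

        // Enqueue one JSON record matching the target table's schema and run a single onTrigger
        runner.enqueue("{\"name\": \"alice\", \"favorite_number\": 42}".getBytes());
        runner.run();

        // With a real metastore behind it, the NPE (or its absence) shows up here
        runner.assertTransferCount("success", 1);
    }
}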
The NPE doesn't show up after hcatalog.hive.client.cache.disabled is set to true.
Kafka Connect recommends this setting, too.
From the Kafka Connect docs: https://docs.confluent.io/3.0.0/connect/connect-hdfs/docs/hdfs_connector.html
As connector tasks are long running, the connections to Hive metastore
are kept open until tasks are stopped. In the default Hive
configuration, reconnecting to Hive metastore creates a new
connection. When the number of tasks is large, it is possible that the
retries can cause the number of open connections to exceed the max
allowed connections in the operating system. Thus it is recommended to
set hcatalog.hive.client.cache.disabled to true in hive.xml.
When Max Concurrent Tasks of PutHiveStreaming is set to more than 1, this property is automatically set to false.
Also, the documentation from NiFi already addresses the issue:
The NiFi PutHiveStreaming has a pool of connections and is therefore
multithreaded; setting hcatalog.hive.client.cache.disabled to true
would allow each connection to set its own Session without relying on
the cache.
ref:
https://community.hortonworks.com/content/supportkb/196628/hive-client-puthivestreaming-fails-against-partiti.html
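For a custom processor like the one in the question, the same flag can also be forced programmatically on the Hive configuration before the writers are created. A rough sketch, assuming you build the HiveConf yourself (the property name is the one from the references above; HiveConf inherits the generic set() from Hadoop's Configuration):

import org.apache.hadoop.hive.conf.HiveConf;

public class HiveClientCacheConfig {

    // Build a HiveConf with the HCatalog client cache disabled, so each
    // connection opens its own metastore session instead of reusing a cached one
    public static HiveConf newHiveConf(String metastoreUri) {
        HiveConf hiveConf = new HiveConf();
        hiveConf.set("hive.metastore.uris", metastoreUri);
        hiveConf.set("hcatalog.hive.client.cache.disabled", "true");
        return hiveConf;
    }
}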

JDBC connection failed "Could not initialize class org.apache.hive.jdbc.HiveConnection"

My WEBLOGIC_CLASSPATH:
${MW_HOME}/oracle_common/common/bin/CommExtEnv.sh
WEBLOGIC_CLASSPATH="${JAVA_HOME}/lib/tools.jar${CLASSPATHSEP}${PROFILE_CLASSPATH}
${CLASSPATHSEP}${ANT_CONTRIB}/ant-contrib1.0b3.jar${CLASSPATHSEP}${CAM_NODEMANAGER_JAR_PATH}${CLASSPATHSEP}/scratch/hadoop-core-1.1.2.jar${CLASSPATHSEP}/scratch/hive-jdbc-1.2.0-standalone.jar"
Stacktrace:
Could not establish a connection because of java.lang.ExceptionInInitializerError
weblogic.jdbc.common.internal.DataSourceUtil.testConnection0(DataSourceUtil.java:423)
weblogic.jdbc.common.internal.DataSourceUtil.access$000(DataSourceUtil.java:24)
weblogic.jdbc.common.internal.DataSourceUtil$1.run(DataSourceUtil.java:285)
java.security.AccessController.doPrivileged(Native Method)
weblogic.jdbc.common.internal.DataSourceUtil.testConnection(DataSourceUtil.java:282)
com.bea.console.utils.jdbc.JDBCUtils.testConnection(JDBCUtils.java:937)
com.bea.console.actions.jdbc.datasources.createjdbcdatasource.CreateJDBCDataSource.testConnectionConfiguration(CreateJDBCDataSource.java:524)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.beehive.netui.pageflow.FlowController.invokeActionMethod(FlowController.java:870)
org.apache.beehive.netui.pageflow.FlowController.getActionMethodForward(FlowController.java:809)
org.apache.beehive.netui.pageflow.FlowController.internalExecute(FlowController.java:478)
org.apache.beehive.netui.pageflow.PageFlowController.internalExecute(PageFlowController.java:306)
org.apache.beehive.netui.pageflow.FlowController.execute(FlowController.java:336)
org.apache.beehive.netui.pageflow.internal.FlowControllerAction.execute(FlowControllerAction.java:52)
org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431)
org.apache.beehive.netui.pageflow.PageFlowRequestProcessor.access$201(PageFlowRequestProcessor.java:97)
org.apache.beehive.netui.pageflow.PageFlowRequestProcessor$ActionRunner.execute(PageFlowRequestProcessor.java:2044)
...
I resolved this by using the latest drivers from https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/2.2.0
and updated my WEBLOGIC_CLASSPATH with the jars below:
WEBLOGIC_CLASSPATH="${WEBLOGIC_CLASSPATH}${CLASSPATHSEP}/hadoop-common-2.2.0.jar${CLASSPATHSEP}/hive-jdbc-2.0.0.jar${CLASSPATHSEP}/hive-jdbc-2.0.0-standalone.jar"
I am able to create the JDBC connection successfully after bouncing WebLogic.
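If it helps to rule out the driver jars before touching WebLogic again, here is a small standalone smoke test against the same HiveServer2; the host, port, database, and credentials are placeholders, and the classpath is whatever jar set you listed in WEBLOGIC_CLASSPATH:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Fails fast with NoClassDefFoundError / ExceptionInInitializerError
        // if the jar set on the classpath is incomplete
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder URL and credentials; adjust to your HiveServer2 instance
        String url = "jdbc:hive2://<server>:<port>/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If this reproduces the ExceptionInInitializerError, the problem is the jar set itself rather than the WebLogic data source configuration.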

Liferay stopped at database shutdown caused a crash

I was stopping the Liferay portal, but a few seconds later I stopped the database (db2 quiesce, which means the connections are closed), and apparently Liferay did not stop correctly.
After that, I restarted the database and Liferay, but the portal does not work now. It shows this message in the browser:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
javax.servlet.ServletException: Servlet execution threw an exception
com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:72)
...
root cause
java.lang.NoSuchMethodError: com.liferay.portal.util.PortalUtil.getCDNHostHttp()Ljava/lang/String;
com.liferay.portal.events.ServicePreActionExt.servicePre(ServicePreActionExt.java:937)
After looking in the logs, I found the following messages (they are edited):
SEVERE: Error waiting for multi-thread deployment of directories to complete
hostConfig.deployWar=Deploying web application archive {0}
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1000)
WARN [DefaultConnectionTester:203] SQL State '08001' of Exception which occurred during a Connection test (fallback DatabaseMetaData test) implies that the database is invalid, and the pool should refill itself with fresh Connections.
com.ibm.db2.jcc.am.DisconnectNonTransientConnectionException: [jcc][t4][2030][11211][3.63.75] A communication error occurred during operations on the connection's underlying socket, socket input stream, or socket output stream. Error location: Reply.fill() - insufficient data (-1). Message: Insufficient data. ERRORCODE=-4499, SQLSTATE=08001
at com.ibm.db2.jcc.am.fd.a(fd.java:321)
WARN [DefaultConnectionTester:136] SQL State '08001' of Exception tested by statusOnException() implies that the database is invalid, and the pool should refill itself with fresh Connections.
WARN [C3P0PooledConnectionPool:708] A ConnectionTest has failed, reporting that all previously acquired Connections are likely invalid. The pool will be reset.
WARN [NewPooledConnection:486] [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
WARN [NewPooledConnection:487] [c3p0] Another error has occurred [ com.ibm.db2.jcc.am.SqlNonTransientConnectionException: [jcc][t4][10335][10366][3.63.75] Invalid operation: Connection is closed. ERRORCODE=-4470, SQLSTATE=08003 ] which will not be reported to listeners!
com.ibm.db2.jcc.am.SqlNonTransientConnectionException: [jcc][t4][10335][10366][3.63.75] Invalid operation: Connection is closed. ERRORCODE=-4470, SQLSTATE=08003
WARN [BasicResourcePool:1841] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask#4fad5112 -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (3). Last acquisition attempt exception:
com.ibm.db2.jcc.am.SqlNonTransientConnectionException: DB2 SQL Error: SQLCODE=-20157, SQLSTATE=08004, SQLERRMC=FUT5MAN;QUIESCE DATABASE;;, DRIVER=3.63.75
ERROR [PortalJobStore:109] MisfireHandler: Error handling misfires: Unexpected runtime exception: null
org.quartz.JobPersistenceException: Unexpected runtime exception: null [See nested exception: java.lang.reflect.UndeclaredThrowableException]
Caused by: java.lang.reflect.UndeclaredThrowableException
at $Proxy279.prepareStatement(Unknown Source)
at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.countMisfiredTriggersInState(StdJDBCDelegate.java:413)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
Caused by: java.sql.SQLException: Connections could not be acquired from the underlying database!
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:106)
Caused by: com.mchange.v2.resourcepool.CannotAcquireResourceException: A ResourcePool could not acquire a resource from its primary factory or source.
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1319)
Now I see that it is almost impossible to start the current Liferay installation. However, I have the database (I made a full backup) and Lucene's data directory. How can I recreate a Liferay installation from these two things? I would like to recover some of this data in a new installation, but I don't know how.
This is not the best solution, but I installed Liferay with a new database. Once it was configured, I changed the database configuration in order to use the other one.
It was probably a problem with the ROOT deployment, but this is very weird.
I was able to recover all the data from Lucene and the database.
The database is still quiesced and the Liferay user doesn't have the QUIESCE_CONNECT privilege.
Unquiesce the database and restart Liferay.
As the DB2 instance owner (on Windows, any administrator), run:
db2 connect to DBNAME
db2 unquiesce database
db2 connect reset
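To double-check afterwards that the Liferay user can connect again, here is a quick hypothetical JDBC probe; the URL follows the standard DB2 JCC driver format, the host, port, database name, and credentials are placeholders, and it assumes db2jcc4.jar is on the classpath so the driver auto-registers:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Db2QuiesceCheck {
    public static void main(String[] args) {
        // Placeholder connection details for the Liferay database
        String url = "jdbc:db2://dbhost:50000/DBNAME";
        try (Connection conn = DriverManager.getConnection(url, "liferay_user", "password")) {
            System.out.println("Connection OK, the database is no longer quiesced for this user.");
        } catch (SQLException e) {
            // A quiesced database rejects the connection with SQLCODE -20157 / SQLSTATE 08004,
            // the same error that appears in the log above
            System.out.println("Connect failed: SQLSTATE=" + e.getSQLState()
                    + " - " + e.getMessage());
        }
    }
}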
Regards.