AtomikosSQLException when using Atomikos and liquibase

I'm having problems getting liquibase to work with atomikos in my spring web application. The project works fine until I try to add liquibase into the mix. When I configure liquibase for my project, I see an exception. The key piece is where it says:
Caused by: com.atomikos.jdbc.AtomikosSQLException: Connection was already closed - calling hashCode is no longer allowed!
Liquibase is trying to call hashCode() on the JDBC connection, which has already been closed. I'm trying to figure out whether there is a way to configure Atomikos (or perhaps Hibernate?) to keep the connection open long enough for liquibase to do its thing.
Has anyone had experience with liquibase + atomikos?
Here's the stack trace:
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/mycompany/myapp/config/XaDatabaseConfiguration.class]: Invocation of init method failed; nested exception is java.lang.reflect.UndeclaredThrowableException
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1566)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:351)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:108)
at org.springframework.beans.factory.support.ConstructorResolver.resolveConstructorArguments(ConstructorResolver.java:637)
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:447)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1111)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1006)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:504)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:299)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:129)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1469)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1214)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:537)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
…
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:198)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.getOrderedBeansOfType(ServletContextInitializerBeans.java:176)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAsRegistrationBean(ServletContextInitializerBeans.java:141)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAsRegistrationBean(ServletContextInitializerBeans.java:136)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAdaptableBeans(ServletContextInitializerBeans.java:119)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.<init>(ServletContextInitializerBeans.java:69)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.getServletContextInitializerBeans(EmbeddedWebApplicationContext.java:216)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext$1.onStartup(EmbeddedWebApplicationContext.java:202)
at org.springframework.boot.context.embedded.tomcat.ServletContextInitializerLifecycleListener.lifecycleEvent(ServletContextInitializerLifecycleListener.java:64)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5095)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1409)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1399)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException: null
at com.sun.proxy.$Proxy92.hashCode(Unknown Source)
at liquibase.database.jvm.JdbcConnection.hashCode(JdbcConnection.java:436)
at liquibase.database.AbstractJdbcDatabase.hashCode(AbstractJdbcDatabase.java:1143)
at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
at liquibase.executor.ExecutorService.clearExecutor(ExecutorService.java:42)
at liquibase.database.AbstractJdbcDatabase.close(AbstractJdbcDatabase.java:1161)
at liquibase.integration.spring.SpringLiquibase.afterPropertiesSet(SpringLiquibase.java:326)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1625)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1562)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
…
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:87)
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:331)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1202)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:537)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:371)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1111)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1006)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:504)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:198)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.getOrderedBeansOfType(ServletContextInitializerBeans.java:176)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAsRegistrationBean(ServletContextInitializerBeans.java:141)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAsRegistrationBean(ServletContextInitializerBeans.java:136)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.addAdaptableBeans(ServletContextInitializerBeans.java:119)
at org.springframework.boot.context.embedded.ServletContextInitializerBeans.<init>(ServletContextInitializerBeans.java:69)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.getServletContextInitializerBeans(EmbeddedWebApplicationContext.java:216)
at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext$1.onStartup(EmbeddedWebApplicationContext.java:202)
at org.springframework.boot.context.embedded.tomcat.ServletContextInitializerLifecycleListener.lifecycleEvent(ServletContextInitializerLifecycleListener.java:64)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5095)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1409)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1399)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.atomikos.jdbc.AtomikosSQLException: Connection was already closed - calling hashCode is no longer allowed!
at com.atomikos.jdbc.AtomikosSQLException.throwAtomikosSQLException(AtomikosSQLException.java:46)
at com.atomikos.jdbc.AtomikosSQLException.throwAtomikosSQLException(AtomikosSQLException.java:57)
at com.atomikos.jdbc.AtomikosConnectionProxy.invoke(AtomikosConnectionProxy.java:120)
at com.sun.proxy.$Proxy92.hashCode(Unknown Source)
at liquibase.database.jvm.JdbcConnection.hashCode(JdbcConnection.java:436)
at liquibase.database.AbstractJdbcDatabase.hashCode(AbstractJdbcDatabase.java:1143)
at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
at liquibase.executor.ExecutorService.clearExecutor(ExecutorService.java:42)
at liquibase.database.AbstractJdbcDatabase.close(AbstractJdbcDatabase.java:1161)
at liquibase.integration.spring.SpringLiquibase.afterPropertiesSet(SpringLiquibase.java:326)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1625)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1562)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)

Here are some additional details that get at the underlying cause.
After liquibase works its magic, it closes the database connection (all well and good). Then it tries to remove the associated "executor" instance from a ConcurrentHashMap by calling ConcurrentHashMap#remove(Object key), where the key of the map is the database connection object.
Of course, at that point the hash map has to compute a hash code on the key. That is where we run into an Atomikos problem.
In this case the key is an instance of AtomikosConnectionProxy. The attempt to call hashCode() results in the AtomikosConnectionProxy#invoke(Object proxy, Method method, Object[] args) method being called. The logic in that method only allows the getInvocationHandler, reap, and isClosed methods to be called after the database connection is closed; any other method call after the connection is closed results in the above exception.
If you want liquibase to work with the existing Atomikos code, liquibase would have to clear the executor from the map prior to closing the connection.
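If upgrading is not an option, a common workaround is to run Liquibase against a plain, non-XA DataSource so the migration never goes through the Atomikos connection proxy in the first place. This is only a minimal sketch, assuming Spring's DriverManagerDataSource; the driver class, URL, credentials, and changelog path are illustrative, not taken from the question:

import liquibase.integration.spring.SpringLiquibase;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class LiquibaseConfiguration {

    @Bean
    public SpringLiquibase liquibase() {
        // Plain JDBC DataSource, deliberately outside Atomikos/XA management,
        // so closing it never hits the AtomikosConnectionProxy guard.
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.postgresql.Driver"); // assumption: adjust to your database
        dataSource.setUrl("jdbc:postgresql://localhost:5432/myapp"); // illustrative URL
        dataSource.setUsername("myapp");
        dataSource.setPassword("secret");

        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:config/liquibase/master.xml"); // illustrative path
        return liquibase;
    }
}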

The issue has since been fixed (independently) in both projects:
liquibase 3.5.2: https://liquibase.jira.com/browse/CORE-2780
atomikos 4.0: http://fogbugz.atomikos.com/default.asp?community.6.3282.4
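On the Maven side, picking up the fixed Liquibase release is just a version bump (the standard liquibase-core coordinates, shown for reference):

<dependency>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-core</artifactId>
    <version>3.5.2</version>
</dependency>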

Related

Using Pandas UDF with Large Broadcast object

I am trying to use a Pandas UDF (PandasUDFType.GROUPED_MAP) to do some processing. The code is below:
from pyspark.sql.functions import pandas_udf, PandasUDFType, spark_partition_id

models_broadcast = self.spark_session.sparkContext.broadcast(models)

@pandas_udf("id string, score string", PandasUDFType.GROUPED_MAP)
def _segment_partition_score(segment_partition_pd):
    my_models = models_broadcast.value  # if this line is commented out, the code runs through
    segment_partition_pd["score"] = "aaa"
    segment_score_pd = segment_partition_pd[['id', 'score']]
    return segment_score_pd

model_score = my_data_df.withColumn("pid", spark_partition_id()).groupby('pid').apply(_segment_partition_score)
model_score.show(100)
Here is the error message. It simply states "java.net.SocketException: Connection reset".
Py4JJavaError: An error occurred while calling o1525.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 31.0 failed 4 times, most recent failure: Lost task 2.3 in stage 31.0 (TID 2466, 10.139.64.43, executor 10): java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:181)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:144)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
at org.apache.spark.scheduler.Task.run(Task.scala:113)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2362)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2350)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2349)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2349)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1102)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2582)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2529)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2517)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:897)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:270)
at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:280)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:80)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:86)
at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:57)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectResult(Dataset.scala:2905)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3517)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3501)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3496)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1$$anonfun$apply$1.apply(SQLExecution.scala:112)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:217)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:835)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:74)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:169)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3496)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2634)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2848)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:279)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:316)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:181)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:144)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
at org.apache.spark.scheduler.Task.run(Task.scala:113)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
The broadcast object is a pickled object of around 100 MB. It is a trained scikit-learn machine learning model.
Note that in the UDF _segment_partition_score I don't actually use this broadcast object; I only access it inside the UDF for debugging purposes. Even so, the PySpark program crashes.
If I don't access the broadcast object in the Pandas UDF, the program runs through and generates results.
The dataframe "my_data_df" is quite small (700 MB in Parquet). Given the existing cluster size (64 GB × 20 workers), handling it should be no problem.
I strongly suspect a serialization problem with the broadcast object, either its size or the type of serializer used, but I have no idea what I should tune.
Could anyone point me in the right direction?
Thanks in advance.
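For readers hitting the same wall: one workaround that avoids pushing the ~100 MB pickle through the broadcast path is to ship the model as a file and lazy-load it once per Python worker. This is only a sketch under stated assumptions: the model was saved with joblib as "model.pkl" and distributed with spark.sparkContext.addFile(...) on the driver; all names are illustrative.

from pyspark import SparkFiles
from pyspark.sql.functions import pandas_udf, PandasUDFType

_model = None  # cached once per Python worker process

def _get_model():
    # Lazily load the model on the executor instead of broadcasting it.
    global _model
    if _model is None:
        import joblib
        _model = joblib.load(SparkFiles.get("model.pkl"))  # assumes addFile("model.pkl") on the driver
    return _model

@pandas_udf("id string, score string", PandasUDFType.GROUPED_MAP)
def _segment_partition_score(segment_partition_pd):
    model = _get_model()  # no broadcast variable involved
    segment_partition_pd["score"] = "aaa"
    return segment_partition_pd[["id", "score"]]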

Payara DataSource not being injected

I'm using Payara 5.184 and Java 1.8.241 on Linux (Ubuntu 19.10). I'm also using Liquibase as my database schema change management. A field named myDataSource is being injected as follows:
@Resource(mappedName = "java:global/ds")
private DataSource myDataSource;
When the createDataSource() method from Liquibase is invoked, I noticed that the variable myDataSource is null, which makes me believe the resource is not being injected properly. This error seems to happen only on Linux, as my colleagues who run Windows have not had any problems so far. We're using exactly the same versions of Payara/Java. Is there any specific step that needs to be done on Linux? Here is the stack trace found in the Payara logs.
Exception during lifecycle processing
org.glassfish.deployment.common.DeploymentException: CDI deployment failure:Exception List with 2 exceptions:
Exception 0 :
org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke public void liquibase.integration.cdi.CDILiquibase.onStartup() on liquibase.integration.cdi.CDILiquibase@911a89a
at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:85)
at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:66)
at org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
at liquibase.integration.cdi.CDIBootstrap$1.create(CDIBootstrap.java:95)
at liquibase.integration.cdi.CDIBootstrap$1.create(CDIBootstrap.java:38)
at org.jboss.weld.contexts.AbstractContext.get(AbstractContext.java:96)
at org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
at org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:102)
at org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:131)
at liquibase.integration.cdi.CDILiquibase$Proxy$_$$_WeldClientProxy.toString(Unknown Source)
at liquibase.integration.cdi.CDIBootstrap.afterDeploymentValidation(CDIBootstrap.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:95)
at org.jboss.weld.injection.MethodInvocationStrategy$SpecialParamPlusBeanManagerStrategy.invoke(MethodInvocationStrategy.java:144)
at org.jboss.weld.event.ObserverMethodImpl.sendEvent(ObserverMethodImpl.java:330)
at org.jboss.weld.event.ExtensionObserverMethodImpl.sendEvent(ExtensionObserverMethodImpl.java:123)
at org.jboss.weld.event.ObserverMethodImpl.sendEvent(ObserverMethodImpl.java:308)
at org.jboss.weld.event.ObserverMethodImpl.notify(ObserverMethodImpl.java:286)
at javax.enterprise.inject.spi.ObserverMethod.notify(ObserverMethod.java:124)
at org.jboss.weld.util.Observers.notify(Observers.java:166)
at org.jboss.weld.event.ObserverNotifier.notifySyncObservers(ObserverNotifier.java:285)
at org.jboss.weld.event.ObserverNotifier.notify(ObserverNotifier.java:273)
at org.jboss.weld.event.ObserverNotifier.fireEvent(ObserverNotifier.java:177)
at org.jboss.weld.event.ObserverNotifier.fireEvent(ObserverNotifier.java:171)
at org.jboss.weld.bootstrap.events.AbstractContainerEvent.fire(AbstractContainerEvent.java:53)
at org.jboss.weld.bootstrap.events.AbstractDeploymentContainerEvent.fire(AbstractDeploymentContainerEvent.java:35)
at org.jboss.weld.bootstrap.events.AfterDeploymentValidationImpl.fire(AfterDeploymentValidationImpl.java:28)
at org.jboss.weld.bootstrap.WeldStartup.validateBeans(WeldStartup.java:499)
at org.jboss.weld.bootstrap.WeldBootstrap.validateBeans(WeldBootstrap.java:93)
at org.glassfish.weld.WeldDeployer.processApplicationLoaded(WeldDeployer.java:517)
at org.glassfish.weld.WeldDeployer.event(WeldDeployer.java:428)
at org.glassfish.kernel.event.EventsImpl.send(EventsImpl.java:131)
at org.glassfish.internal.data.ApplicationInfo.load(ApplicationInfo.java:333)
at com.sun.enterprise.v3.server.ApplicationLifecycle.prepare(ApplicationLifecycle.java:497)
at org.glassfish.deployment.admin.DeployCommand.execute(DeployCommand.java:540)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$2$1.run(CommandRunnerImpl.java:549)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$2$1.run(CommandRunnerImpl.java:545)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$2.execute(CommandRunnerImpl.java:544)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$3.run(CommandRunnerImpl.java:575)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$3.run(CommandRunnerImpl.java:567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:566)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.doCommand(CommandRunnerImpl.java:1475)
at com.sun.enterprise.v3.admin.CommandRunnerImpl.access$1300(CommandRunnerImpl.java:111)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1857)
at com.sun.enterprise.v3.admin.CommandRunnerImpl$ExecutionContext.execute(CommandRunnerImpl.java:1733)
at com.sun.enterprise.v3.admin.AdminAdapter.doCommand(AdminAdapter.java:564)
at com.sun.enterprise.v3.admin.AdminAdapter.onMissingResource(AdminAdapter.java:251)
at org.glassfish.grizzly.http.server.StaticHttpHandlerBase.service(StaticHttpHandlerBase.java:166)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:83)
... 74 more
Caused by: java.lang.NullPointerException
at liquibase.integration.cdi.CDILiquibase.performUpdate(CDILiquibase.java:126)
at liquibase.integration.cdi.CDILiquibase.onStartup(CDILiquibase.java:116)
... 79 more
EDIT: Added stacktrace
EDIT 2: Problem solved! It ended up being something not related to Liquibase at all. The queue size of the PayaraExecutorService was too small, which leads me to think that some task submissions were being rejected. Apparently this is a known issue in Payara 5.184, and a related issue had already been opened on GitHub (https://github.com/payara/Payara/issues/3495). Once I increased the queue size to a large number, everything started working as expected.
I think you need to use lookup to inject the resource.
Compare this bug report: https://github.com/payara/Payara/issues/4413
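Something like this, for instance (a minimal sketch; the JNDI name is taken from the question):

import javax.annotation.Resource;
import javax.sql.DataSource;

public class MyBean {

    // lookup resolves the portable JNDI name directly; mappedName is
    // vendor-specific and is what reportedly fails here.
    @Resource(lookup = "java:global/ds")
    private DataSource myDataSource;
}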

Byte Buddy 1.9.x throws `java.lang.IllegalStateException: Cannot resolve type description for java.lang.String` errors. Is it a known issue?

I am seeing "Cannot resolve type description" errors when trying to load classes instrumented with Byte Buddy v1.9.11 within Mule 4 (JDK 1.8). Is this a known issue with Byte Buddy, or with the particular isolated-classloader setup Mule 4 uses in combination with Byte Buddy-instrumented classes? Any suggestions are appreciated.
I am using the Elastic Java APM agent to instrument Mule 4, whose over-complicated classloader isolation mechanism seems to interfere with the Byte Buddy v1.9.11 used by the APM agent.
java.lang.IllegalStateException: Cannot resolve type description for java.lang.String
at co.elastic.apm.agent.shaded.bytebuddy.pool.TypePool$Resolution$Illegal.resolve(TypePool.java:159)
at co.elastic.apm.agent.shaded.bytebuddy.pool.TypePool$Default$LazyTypeDescription$TokenizedGenericType.toErasure(TypePool.java:6505)
at co.elastic.apm.agent.shaded.bytebuddy.pool.TypePool$Default$LazyTypeDescription$LazyTypeList.get(TypePool.java:6313)
at co.elastic.apm.agent.shaded.bytebuddy.pool.TypePool$Default$LazyTypeDescription$LazyTypeList.get(TypePool.java:6286)
at java.util.AbstractList$Itr.next(AbstractList.java:358)
at co.elastic.apm.agent.shaded.bytebuddy.matcher.ElementMatchers.takesArguments(ElementMatchers.java:1301)
at co.elastic.apm.agent.shaded.bytebuddy.dynamic.scaffold.inline.InliningImplementationMatcher.of(InliningImplementationMatcher.java:71)
at co.elastic.apm.agent.shaded.bytebuddy.dynamic.scaffold.inline.RedefinitionDynamicTypeBuilder.make(RedefinitionDynamicTypeBuilder.java:202)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$Transformation$Simple$Resolution.apply(AgentBuilder.java:10132)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doTransform(AgentBuilder.java:10551)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:10514)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.access$1500(AgentBuilder.java:10280)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:10889)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:10836)
at java.security.AccessController.doPrivileged(Native Method)
at co.elastic.apm.agent.shaded.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:10437)
at sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:428)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at org.mule.runtime.module.artifact.api.classloader.FineGrainedControlClassLoader.findLocalClass(FineGrainedControlClassLoader.java:171)
at org.mule.runtime.module.artifact.api.classloader.FineGrainedControlClassLoader.loadClass(FineGrainedControlClassLoader.java:88)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at co.elastic.apm.api.NoopTransaction.startSpan(NoopTransaction.java:132)
at co.elastic.apm.agent.utils.SpanUtils.startSpan(SpanUtils.java:34)
at co.elastic.apm.agent.internal.ApmMessageProcessorNotificationListener.onNotification(ApmMessageProcessorNotificationListener.java:27)
at co.elastic.apm.agent.internal.ApmMessageProcessorNotificationListener.onNotification(ApmMessageProcessorNotificationListener.java:10)
at org.mule.runtime.core.api.context.notification.ServerNotificationManager.lambda$fireNotification$0(ServerNotificationManager.java:193)
at org.mule.runtime.core.internal.context.notification.Sender.dispatch(Sender.java:28)
at org.mule.runtime.core.internal.context.notification.Policy.dispatchToSenders(Policy.java:146)
at org.mule.runtime.core.internal.context.notification.Policy.doDispatch(Policy.java:134)
at org.mule.runtime.core.internal.context.notification.Policy.dispatch(Policy.java:106)
at org.mule.runtime.core.api.context.notification.ServerNotificationManager.notifyListeners(ServerNotificationManager.java:211)
at org.mule.runtime.core.api.context.notification.ServerNotificationManager.fireNotification(ServerNotificationManager.java:193)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain.fireNotification(AbstractMessageProcessorChain.java:373)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain.lambda$preNotification$23(AbstractMessageProcessorChain.java:339)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:190)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$1.onNext(AbstractMessageProcessorChain.java:292)
at org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain$1.onNext(AbstractMessageProcessorChain.java:285)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:127)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:204)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:398)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:484)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.mule.service.scheduler.internal.AbstractRunnableFutureDecorator.doRun(AbstractRunnableFutureDecorator.java:111)
at org.mule.service.scheduler.internal.RunnableFutureDecorator.run(RunnableFutureDecorator.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I assume that the class file resolver used by the Elastic agent is querying the class loader directly, and this particular class loader is hiding the class files in question.
You can argue that this is an issue of the class loader hiding the class file, or an issue of the Elastic agent not attempting other class loaders or reattempting the redefinition of the loaded class. I'd contact both parties, as this can disturb other tooling as well.
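For agents you build yourself, Byte Buddy can be told to fall back to other class file sources when the loading class loader hides them. A hedged sketch only (the Elastic agent shades Byte Buddy, so this is not a switch you can flip on the APM agent itself; API usage is from memory of the 1.9.x line):

import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.dynamic.ClassFileLocator;

public class AgentSetup {

    public static AgentBuilder locateWithFallback() {
        // Ask the instrumented class's own loader for class files first, then
        // fall back to the system class loader when the former hides them.
        return new AgentBuilder.Default()
                .with(AgentBuilder.LocationStrategy.ForClassLoader.STRONG
                        .withFallbackTo(ClassFileLocator.ForClassLoader.of(
                                ClassLoader.getSystemClassLoader())));
    }
}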

Why is my RabbitMQ message impossible to serialize using Apache Beam?

I'm trying to read a RabbitMQ queue using Apache Beam.
I've written some transformation code to have messages written to Kafka.
I've even tested my scenario using simple text messages.
Now I'm trying to deploy it with the actual messages this transformer is made to run on. These are JSON messages of quite moderate size.
Strangely, when I try to read these "production" messages, I get this exception stack trace.
java.lang.IllegalArgumentException: Unable to encode element 'ValueWithRecordId{id=[], value=org.apache.beam.sdk.io.rabbitmq.RabbitMqMessage@f179a7f}' with coder 'ValueWithRecordId$ValueWithRecordIdCoder(org.apache.beam.sdk.coders.SerializableCoder@76190fb2)'.
org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300)
org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
java.util.HashMap.internalWriteEntries(HashMap.java:1785)
java.util.HashMap.writeObject(HashMap.java:1362)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:183)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:53)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:99)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:297)
org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
My understanding is that the RabbitMQ reader considers these messages big enough to require the use of LongString, which is not serializable.
Am I right on this point? And if so, how do I tell RabbitMQ to use a simple String (which would be enough for these messages)?
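For what it's worth, the non-serializable header type can be reproduced outside Beam with just the RabbitMQ Java client (a minimal sketch; the header name and value are illustrative):

import com.rabbitmq.client.impl.LongStringHelper;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class LongStringRepro {

    public static void main(String[] args) throws Exception {
        Map<String, Object> headers = new HashMap<>();
        // The RabbitMQ client delivers string header values as LongString
        // instances, whose byte-array implementation is not Serializable.
        headers.put("content-type", LongStringHelper.asLongString("application/json"));

        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(headers); // throws java.io.NotSerializableException
        }
    }
}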
This is an Apache Beam issue (https://issues.apache.org/jira/browse/BEAM-7414) for which the fix is ... not yet merged into the Apache Beam repo, out of pure laziness on my part (this is bad). If someone wants the fix immediately, it is possible to build my branch: https://github.com/Riduidel/beam/tree/fix/rabbitmq-message-not-serializable

Hive error: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

I have a Hive query which sometimes runs successfully, but most of the time it fails with the error "java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider".
Below is my error log:
java.lang.RuntimeException: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:154)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:239)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:336)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:302)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:435)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:525)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:517)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:399)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1516)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1283)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1101)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:914)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:475)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:148)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:632)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:570)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:147)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:151)
... 45 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:458)
... 53 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:242)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
at java.util.ArrayList.add(ArrayList.java:440)
at java.lang.String.split(String.java:2288)
at sun.net.util.IPAddressUtil.textToNumericFormatV4(IPAddressUtil.java:47)
at java.net.InetAddress.getAllByName(InetAddress.java:1129)
at java.net.InetAddress.getAllByName(InetAddress.java:1098)
at java.net.InetAddress.getByName(InetAddress.java:1048)
at org.apache.hadoop.security.SecurityUtil$StandardHostResolver.getByName(SecurityUtil.java:474)
at org.apache.hadoop.security.SecurityUtil.getByName(SecurityUtil.java:461)
at org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:235)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:215)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
at org.apache.hadoop.hdfs.DFSUtil.getAddressesForNameserviceId(DFSUtil.java:677)
at org.apache.hadoop.hdfs.DFSUtil.getAddressesForNsIds(DFSUtil.java:645)
at org.apache.hadoop.hdfs.DFSUtil.getAddresses(DFSUtil.java:628)
at org.apache.hadoop.hdfs.DFSUtil.getHaNnRpcAddresses(DFSUtil.java:727)
at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:88)
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:458)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:148)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:632)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:570)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:147)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
Job Submission failed with exception 'java.lang.RuntimeException(java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider)'
Could anyone tell me why this happens?
I just stumbled across a similar exception myself, and increasing the Hive client heap didn't help. I found I was able to clear up the OutOfMemoryError ("GC overhead limit exceeded") by adding a partition column to the where clause of the query, so I've concluded that having a very large number of splits causes this exception. I haven't dug into the code, but I have seen string concatenation in a loop trigger GC thrashing before, and something similar might be happening in the CombineHiveInputFormat.getSplits method.
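For example (a sketch with an illustrative table, columns, and partition scheme):

SELECT id, value
FROM my_partitioned_table
WHERE dt = '2016-01-01'  -- partition predicate prunes splits before getSplits runs
  AND value > 0;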