Trouble creating the dataframe in spark - apache-spark-sql

training = spark.read.format("libsvm").load("sample_linear_regression_data.txt")
where sample_linear_regression_data.txt is the data and the sample data is
-9.490009878824548 1:0.4551273600657362 2:0.36644694351969087 3:-0.38256108933468047 4:-0.4458430198517267 5:0.33109790358914726 6:0.8067445293443565 7:-0.2624341731773887
its in libsvm format but loading the data with the above syntax gives me error as follows
Py4JJavaError: An error occurred while calling o68.load.
: java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$35.apply(RDD.scala:1037)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$35.apply(RDD.scala:1037)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1037)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1017)
at org.apache.spark.mllib.util.MLUtils$.computeNumFeatures(MLUtils.scala:94)
at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply$mcI$sp(LibSVMRelation.scala:104)
at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply(LibSVMRelation.scala:95)
at org.apache.spark.ml.source.libsvm.LibSVMFileFormat$$anonfun$1.apply(LibSVMRelation.scala:95)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.source.libsvm.LibSVMFileFormat.inferSchema(LibSVMRelation.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)

i had the same problem probably you have a space somewhere in the name of your folder

It worked for me when I copied the dataset to the directory where my notebook is present. However, it does not work when I set a different filepath and try to load from that filepath

Related

Why do I get "File could only be replicated to 0 nodes" when writing to a partitioned table?

I create an external table in Hive with partitions and then try to populate it from the existing table, however, I get the following exceptions:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/pavel.db/browserdatapart/.hive-staging_hive_2018-12-28_13-22-45_751_6056004898772238481-1/_task_tmp.-ext-10000/cityid=1/_tmp.000001_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:814)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/pavel.db/browserdatapart/.hive-staging_hive_2018-12-28_13-22-45_751_6056004898772238481-1/_task_tmp.-ext-10000/cityid=1/_tmp.000001_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1580)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1375)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:
According to the internet these exceptions occur when datanode can't communicate with namenode or when you are running low on memory, but in my case everything is fine. I have already tried formatting my namenode and datanode as well. What else could be the issue?
https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo I've also read this. And it didn't help me.
I am running on tez. this works:
insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata limit 100;
And this fails with the exception I provided:
insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
Setting the above parameters solved it for me. I guess hive was not able to replicate data to those partitions that show up in the exception, because there were more than the maximum (which is 224 in the case of my dataset).

Hue cannot load Presto Schema dbc does not exist

I just built Hue-4.1.0 and it's able to execute Presto sql via jdbc. Here is my hue-presto config:
[notebook]
[[interpreters]]
[[[presto-jdbc]]]
name=Presto-jdbc
interface=jdbc
options='{"url": "jdbc:presto://localhost:18080/platform_data/platform_data", "driver": "com.facebook.presto.jdbc.PrestoDriver", "user": "analysis"}'
# Doesn't work either
# options='{"url": "jdbc:presto://localhost:18080/platform_data/", "driver": "com.facebook.presto.jdbc.PrestoDriver", "user": "analysis"}'
Query Presto is OK but when I click the refresh button see pic:reflash, I got this:
An error occurred while calling o237.execute. : java.sql.SQLException: Query failed (#20180605_105344_00147_ugf87): line 1:26: Schema dbc does not exist at com.facebook.presto.jdbc.PrestoResultSet.resultsException(PrestoResultSet.java:1798) at com.facebook.presto.jdbc.PrestoResultSet.getColumns(PrestoResultSet.java:1742) at com.facebook.presto.jdbc.PrestoResultSet.(PrestoResultSet.java:118) at com.facebook.presto.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:246) at com.facebook.presto.jdbc.PrestoStatement.execute(PrestoStatement.java:225) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745) Caused by: com.facebook.presto.sql.analyzer.SemanticException: line 1:26: Schema dbc does not exist at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:845) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:258) at com.facebook.presto.sql.tree.Table.accept(Table.java:53) at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:1919) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:957) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:258) at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:127) at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:280) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:676) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:258) at com.facebook.presto.sql.tree.Query.accept(Query.java:94) at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27) at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:270) at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:244) at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:72) at com.facebook.presto.sql.analyzer.Analyzer.analyze(Analyzer.java:64) at com.facebook.presto.execution.SqlQueryExecution.(SqlQueryExecution.java:169) at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:660) at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:582) at com.facebook.presto.execution.SqlQueryManager.createQuery(SqlQueryManager.java:417) at com.facebook.presto.server.protocol.Query.(Query.java:186) at com.facebook.presto.server.protocol.Query.create(Query.java:153) at com.facebook.presto.server.protocol.StatementResource.createQuery(StatementResource.java:144) at sun.reflect.GeneratedMethodAccessor674.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:183) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104) at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) at org.glassfish.jersey.internal.Errors.process(Errors.java:316) at org.glassfish.jersey.internal.Errors.process(Errors.java:298) at org.glassfish.jersey.internal.Errors.process(Errors.java:268) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655) at com.facebook.presto.server.security.AuthenticationFilter.doFilter(AuthenticationFilter.java:69) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at io.airlift.http.server.TraceTokenFilter.doFilter(TraceTokenFilter.java:64) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at io.airlift.http.server.TimingFilter.doFilter(TimingFilter.java:52) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:674) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126) at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:531) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:755) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:673) at java.lang.Thread.run(Thread.java:748)
Some person got the same problem as well:
https://community.cloudera.com/t5/Web-UI-Hue-Beeswax/HUE-and-Presto-Integration-Error-on-Page-Load/td-p/62353
I am hoping for your help!
check this issue on GitHub: JDBC connector only supports Teradata.
seems it's been fixed in version 4.3.0;
you could try using [librdbms] section also

Spark 2.2 toPandas() is returning Py4JError

Currently refactoring some PySpark code and this specific snippet is causing me issues whereas it has ran fine before:
zip_table=spark.read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").schema(schemas.zip_schema).load(path_to_file,header=True)
zip_pandas = zip_table.toPandas()
running into:
Py4JJavaError: An error occurred while calling o167.get.
: java.util.NoSuchElementException: spark.sql.execution.pandas.respectSessionTimeZone
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1175)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1175)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1175)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Other Spark Dataframes are successfully able to call toPandas() while others such as this return the same error.
The file in question is not particularly big and should be able to fit in the driver no problem. Also, I know it may be better to just load it into Pandas directly and not convert from Spark, but there is further logic in the code that requires both Spark/Pandas dataframes (and a later re-architecture of this work will address this issue).

json schema validator throwing exception when empty array is present in response

I am getting following exception if I place dummy array. eg, in Security/RelatedSecurities/RelatedSecurities. I tried to build a json schema from the response where it is empty. But if I use json validator it gives sytax error "array must have atleast one element". So I used the dummy array elements. After that I do not get syntax error but the validation fails with following error. Actual not matching with expected.
I will be happy to share the response and json schema if anyone is willing to help.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:80)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:74)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
at com.jayway.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure.validate(ResponseSpecificationImpl.groovy:598)
at com.jayway.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure$validate$1.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at com.jayway.restassured.internal.ResponseSpecificationImpl.validateResponseIfRequired(ResponseSpecificationImpl.groovy:760)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:210)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:154)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:166)
at com.jayway.restassured.internal.ResponseSpecificationImpl.content(ResponseSpecificationImpl.groovy:92)
at com.jayway.restassured.specification.ResponseSpecification$content$0.callCurrent(Unknown Source)

No X11 DISPLAY variable was set, but this program performed an operation which requires it

I have some tests that recently failed with the following reason: No X11 DISPLAY variable was set, but this program performed an operation which requires it.
Here's the complete stack trace:
testGetDialog(simple.marauroa.application.core.IAddApplicationDialogProviderTest) Time elapsed: 112 sec <<< ERROR!
java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:159)
at java.awt.Window.<init>(Window.java:431)
at java.awt.Frame.<init>(Frame.java:403)
at java.awt.Frame.<init>(Frame.java:368)
at javax.swing.SwingUtilities$SharedOwnerFrame.<init>(SwingUtilities.java:1733)
at javax.swing.SwingUtilities.getSharedOwnerFrame(SwingUtilities.java:1810)
at javax.swing.JDialog.<init>(JDialog.java:253)
at javax.swing.JDialog.<init>(JDialog.java:187)
at javax.swing.JDialog.<init>(JDialog.java:135)
at simple.marauroa.application.core.IAddApplicationDialogProviderTest$IAddApplicationDialogProviderImpl.getDialog(IAddApplicationDialogProviderTest.java:97)
at simple.marauroa.application.core.IAddApplicationDialogProviderTest.testGetDialog(IAddApplicationDialogProviderTest.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
In the mentioned line of code is just this:
return new JDialog();
I guess is related to the Jenkins environment not having something enabled. How can this get fixed?
I can disable that test, but why should I?
You need to run your build inside a virtual (headless) graphical environment, using XVNC (or XVFB). See wiki.cloudbees.com/bin/view/DEV/Testing+GUI+applications.
Edit 20/01/2016:
The link above no longer takes you to a page with relevant information. This link shows the page as it was at the time: https://web.archive.org/web/20120717015714/http://wiki.cloudbees.com/bin/view/DEV/Testing+GUI+applications
You might also consider refactoring your code a bit to use a mock service so that tests do not need to load AWT. This will likely make tests faster and more reliable, as well as minimizing the chance that they will fail in novel environments such as a CI server.