How to map filenames to RDD using sc.textFile("s3n://bucket/*.csv")? - amazon-s3

Please note, I must use sc.textFile, but I would accept any other answers.
What I want to do is simply add the filename being processed to the RDD, something like:
var rdd = sc.textFile("s3n://bucket/*.csv").map(line => filename + "," + line)
Much appreciated!
EDIT2: The solution to EDIT1 is to use Hadoop 2.4 or above (I have not tested it with slaves, etc.). Note that some of the solutions mentioned below only work for small data sets; if you have to handle big data, you will have to use HadoopRDD.
EDIT: I have tried the following, and it did not work:
:cp symjar/aws-java-sdk-1.9.29.jar
:cp symjar/aws-java-sdk-flow-build-tools-1.9.29.jar
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import com.amazonaws.auth._
def awsAccessKeyId = "AKEY"
def awsSecretAccessKey = "SKEY"
val hadoopConf = sc.hadoopConfiguration;
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
var rdd = sc.wholeTextFiles("s3n://bucket/dir/*.csv").map { case (filename, content) => filename }
rdd.count
NOTE: It is connecting to S3 without issue (I have tested that many times).
The error I get is:
INFO input.FileInputFormat: Total input paths to process : 4
java.io.FileNotFoundException: File does not exist: /RTLM-918/simple/t1-100.csv
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:267)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1511)
at org.apache.spark.rdd.RDD.collect(RDD.scala:813)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC.<init>(<console>:48)
at $iwC.<init>(<console>:50)
at <init>(<console>:52)
at .<init>(<console>:56)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The only text method that includes the file name is wholeTextFiles.
sc.wholeTextFiles(path).map { case (filename, content) => ... }
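To get per-line records that carry the filename, you can split the content afterwards. A minimal sketch (keep in mind that wholeTextFiles reads each file fully into memory, so it suits small files):
val withNames = sc.wholeTextFiles("s3n://bucket/dir/*.csv").flatMap {
  case (filename, content) => content.split("\n").map(line => filename + "," + line)
}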

Just pass the filename in as a variable during the map (or set it as an object property):
sc.textFile("s3n://bucket/" + fn + ".csv").map(line => set_filepath(line, fn))

If you are dealing with big data, HadoopRDD is the answer; the other suggestions will not work at that scale.
Code:
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

val text = sc.hadoopFile("s3n://.../", classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
// sc.hadoopFile returns an old-API RDD; cast it to HadoopRDD to get access to the input splits
val hadoopRdd = text.asInstanceOf[HadoopRDD[LongWritable, Text]]
// Each partition knows its input split, and a FileSplit carries the file path
val fileAndLine = hadoopRdd.mapPartitionsWithInputSplit { (inputSplit, iterator) =>
  val file = inputSplit.asInstanceOf[FileSplit]
  iterator.map { tpl => (file.getPath, tpl._2) }
}
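One caveat: Hadoop reuses the Text objects it hands to the iterator, so copy them to plain strings before collecting or caching. A short follow-up sketch:
// Materialize Hadoop's reused Writable buffers as plain strings.
val fileAndLineStr = fileAndLine.map { case (path, line) => (path.toString, line.toString) }
fileAndLineStr.take(5).foreach(println)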

Related

How to Fix "java.lang.NullPointerException at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getLong(OrcColumnVector.java:141)"

I'm trying to combine all columns of a dataframe into one column named value.
My code:
val df = sparkSession.sql(sql)
val dfwithValue = df.withColumn("value",df.col("topic"))
dfwithValue.selectExpr("CAST(value AS STRING)").show() // no error
import org.apache.spark.sql.functions._
val cols = df.columns.map { col =>
  df.col(col)
}.toSeq
val newdf = df.withColumn("value", struct(cols : _*))
newdf.selectExpr("CAST(value AS STRING)").show() // error
When I use the second approach, I encounter this error:
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getLong(OrcColumnVector.java:141)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Can someone help?
I do not get errors with either approach, but I find the solution overly complicated for the task. Also, this code returns a dataframe of rows of Row[WrappedArray[String]] rather than Row[String].
Try:
df.map(_.mkString("")).toDF("value").show()
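If the goal is one comma-separated string column, a sketch using the built-in SQL functions may be simpler; note that concat_ws skips null values rather than failing on them:
import org.apache.spark.sql.functions.{concat_ws, col}
// Concatenate every column into a single string column named "value".
val combined = df.select(concat_ws(",", df.columns.map(col): _*).as("value"))
combined.show()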
A null pointer exception also comes up when a value is not accessible; check the scope of the variable to fix the issue.

Back-End (JVM) internal error. Why do i get this IDE error but others with the same code do not?

I'm part of a group that is developing a program in Kotlin. I have recently pulled fresh code off the development branch. The problem is I get this strange error. I am the only person that gets it; my groupmates have the same code and it runs fine for them.
I've tried googling for the error but didn't find any help, as it is quite a specific one. Plus, as I said, my groupmates do not get this error, so it is probably not related to the code.
The error I get is this:
Error:Kotlin: [Internal Error] java.lang.IllegalStateException: Backend Internal error: Exception during code generation
Cause: Back-end (JVM) Internal error: Error type encountered: [ERROR : For SuccessOrFailure] (ErrorType).
Cause: Error type encountered: [ERROR : For SuccessOrFailure] (ErrorType).
File being compiled at position: (32,28) in C:/Users/Gebruiker/Desktop/Repo/game/src/main/kotlin/nl/han/asd/a1/network/networkstates/EndRoundState.kt
The root cause was thrown at: KotlinTypeMapper.java:116
File being compiled at position: file://C:/Users/Gebruiker/Desktop/Repo/game/src/main/kotlin/nl/han/asd/a1/network/networkstates/EndRoundState.kt
The root cause was thrown at: ExpressionCodegen.java:322
at org.jetbrains.kotlin.codegen.CompilationErrorHandler.lambda$static$0(CompilationErrorHandler.java:24)
at org.jetbrains.kotlin.codegen.PackageCodegenImpl.generate(PackageCodegenImpl.java:74)
at org.jetbrains.kotlin.codegen.DefaultCodegenFactory.generatePackage(CodegenFactory.kt:97)
at org.jetbrains.kotlin.codegen.DefaultCodegenFactory.generateModule(CodegenFactory.kt:68)
at org.jetbrains.kotlin.codegen.KotlinCodegenFacade.doGenerateFiles(KotlinCodegenFacade.java:47)
at org.jetbrains.kotlin.codegen.KotlinCodegenFacade.compileCorrectFiles(KotlinCodegenFacade.java:39)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinToJVMBytecodeCompiler.generate(KotlinToJVMBytecodeCompiler.kt:446)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinToJVMBytecodeCompiler.compileModules$cli(KotlinToJVMBytecodeCompiler.kt:142)
at org.jetbrains.kotlin.cli.jvm.K2JVMCompiler.doExecute(K2JVMCompiler.kt:161)
at org.jetbrains.kotlin.cli.jvm.K2JVMCompiler.doExecute(K2JVMCompiler.kt:57)
at org.jetbrains.kotlin.cli.common.CLICompiler.execImpl(CLICompiler.java:96)
at org.jetbrains.kotlin.cli.common.CLICompiler.execImpl(CLICompiler.java:52)
at org.jetbrains.kotlin.cli.common.CLITool.exec(CLITool.kt:93)
at org.jetbrains.kotlin.daemon.CompileServiceImpl$compile$$inlined$ifAlive$lambda$1.invoke(CompileServiceImpl.kt:402)
at org.jetbrains.kotlin.daemon.CompileServiceImpl$compile$$inlined$ifAlive$lambda$1.invoke(CompileServiceImpl.kt:101)
at org.jetbrains.kotlin.daemon.CompileServiceImpl$doCompile$$inlined$ifAlive$lambda$2.invoke(CompileServiceImpl.kt:937)
at org.jetbrains.kotlin.daemon.CompileServiceImpl$doCompile$$inlined$ifAlive$lambda$2.invoke(CompileServiceImpl.kt:101)
at org.jetbrains.kotlin.daemon.common.DummyProfiler.withMeasure(PerfUtils.kt:137)
at org.jetbrains.kotlin.daemon.CompileServiceImpl.checkedCompile(CompileServiceImpl.kt:977)
at org.jetbrains.kotlin.daemon.CompileServiceImpl.doCompile(CompileServiceImpl.kt:936)
at org.jetbrains.kotlin.daemon.CompileServiceImpl.compile(CompileServiceImpl.kt:400)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:359)
at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:562)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:796)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:677)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:676)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.jetbrains.kotlin.codegen.CompilationException: Back-end (JVM) Internal error: Error type encountered: [ERROR : For SuccessOrFailure] (ErrorType).
Cause: Error type encountered: [ERROR : For SuccessOrFailure] (ErrorType).
File being compiled at position: (32,28) in C:/Users/Gebruiker/Desktop/Repo/game/src/main/kotlin/nl/han/asd/a1/network/networkstates/EndRoundState.kt
The root cause was thrown at: KotlinTypeMapper.java:116
at org.jetbrains.kotlin.codegen.ExpressionCodegen.genQualified(ExpressionCodegen.java:322)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.genQualified(ExpressionCodegen.java:281)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.gen(ExpressionCodegen.java:354)
at org.jetbrains.kotlin.codegen.CallGenerator$DefaultCallGenerator.genValueAndPut(CallGenerator.kt:68)
at org.jetbrains.kotlin.codegen.CallBasedArgumentGenerator.generateExpression(CallBasedArgumentGenerator.java:58)
at org.jetbrains.kotlin.codegen.ArgumentGenerator.generate(ArgumentGenerator.kt:68)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.invokeMethodWithArguments(ExpressionCodegen.java:2461)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.invokeMethodWithArguments(ExpressionCodegen.java:2433)
at org.jetbrains.kotlin.codegen.Callable$invokeMethodWithArguments$1.invoke(Callable.kt:41)
at org.jetbrains.kotlin.codegen.Callable$invokeMethodWithArguments$1.invoke(Callable.kt:13)
at org.jetbrains.kotlin.codegen.OperationStackValue.putSelector(StackValue.kt:79)
at org.jetbrains.kotlin.codegen.StackValueWithLeaveTask.putSelector(StackValue.kt:67)
at org.jetbrains.kotlin.codegen.StackValue.put(StackValue.java:112)
at org.jetbrains.kotlin.codegen.StackValue.put(StackValue.java:101)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.putStackValue(ExpressionCodegen.java:378)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.gen(ExpressionCodegen.java:363)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.gen(ExpressionCodegen.java:358)
at org.jetbrains.kotlin.codegen.MemberCodegen.generateInitializers(MemberCodegen.java:493)
at org.jetbrains.kotlin.codegen.ConstructorCodegen.generatePrimaryConstructorImpl(ConstructorCodegen.java:213)
at org.jetbrains.kotlin.codegen.ConstructorCodegen.access$000(ConstructorCodegen.java:41)
at org.jetbrains.kotlin.codegen.ConstructorCodegen$1.doGenerateBody(ConstructorCodegen.java:97)
at org.jetbrains.kotlin.codegen.FunctionGenerationStrategy$CodegenBased.generateBody(FunctionGenerationStrategy.java:84)
at org.jetbrains.kotlin.codegen.FunctionCodegen.generateMethodBody(FunctionCodegen.java:674)
at org.jetbrains.kotlin.codegen.FunctionCodegen.generateMethodBody(FunctionCodegen.java:435)
at org.jetbrains.kotlin.codegen.FunctionCodegen.generateMethod(FunctionCodegen.java:266)
at org.jetbrains.kotlin.codegen.ConstructorCodegen.generatePrimaryConstructor(ConstructorCodegen.java:93)
at org.jetbrains.kotlin.codegen.ImplementationBodyCodegen.generateConstructors(ImplementationBodyCodegen.java:462)
at org.jetbrains.kotlin.codegen.ClassBodyCodegen.generateBody(ClassBodyCodegen.java:83)
at org.jetbrains.kotlin.codegen.MemberCodegen.generate(MemberCodegen.java:128)
at org.jetbrains.kotlin.codegen.MemberCodegen.genClassOrObject(MemberCodegen.java:302)
at org.jetbrains.kotlin.codegen.MemberCodegen.genClassOrObject(MemberCodegen.java:286)
at org.jetbrains.kotlin.codegen.PackageCodegenImpl.generateClassOrObject(PackageCodegenImpl.java:161)
at org.jetbrains.kotlin.codegen.PackageCodegenImpl.generateClassesAndObjectsInFile(PackageCodegenImpl.java:86)
at org.jetbrains.kotlin.codegen.PackageCodegenImpl.generateFile(PackageCodegenImpl.java:119)
at org.jetbrains.kotlin.codegen.PackageCodegenImpl.generate(PackageCodegenImpl.java:66)
... 36 more
Caused by: java.lang.IllegalStateException: Error type encountered: [ERROR : For SuccessOrFailure] (ErrorType).
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper$1.processErrorType(KotlinTypeMapper.java:116)
at org.jetbrains.kotlin.load.kotlin.TypeSignatureMappingKt.mapType(typeSignatureMapping.kt:91)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.mapType(KotlinTypeMapper.java:512)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.writeParameterType(KotlinTypeMapper.java:1518)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.writeParameter(KotlinTypeMapper.java:1488)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.writeParameter(KotlinTypeMapper.java:1477)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.lambda$mapSignatureWithCustomParameters$4(KotlinTypeMapper.java:1295)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Collections$2.tryAdvance(Collections.java:4745)
at java.base/java.util.Collections$2.forEachRemaining(Collections.java:4753)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.mapSignatureWithCustomParameters(KotlinTypeMapper.java:1293)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.mapSignature(KotlinTypeMapper.java:1212)
at org.jetbrains.kotlin.codegen.state.KotlinTypeMapper.mapSignatureWithGeneric(KotlinTypeMapper.java:1171)
at org.jetbrains.kotlin.codegen.FunctionGenerationStrategy.mapMethodSignature(FunctionGenerationStrategy.java:46)
at org.jetbrains.kotlin.codegen.FunctionCodegen.generateMethod(FunctionCodegen.java:204)
at org.jetbrains.kotlin.codegen.FunctionCodegen.generateMethod(FunctionCodegen.java:183)
at org.jetbrains.kotlin.codegen.coroutines.CoroutineCodegenForLambda.generateResumeImpl(CoroutineCodegen.kt:421)
at org.jetbrains.kotlin.codegen.coroutines.CoroutineCodegenForLambda.generateClosureBody(CoroutineCodegen.kt:234)
at org.jetbrains.kotlin.codegen.ClosureCodegen.generateBody(ClosureCodegen.java:166)
at org.jetbrains.kotlin.codegen.coroutines.CoroutineCodegenForLambda.generateBody(CoroutineCodegen.kt:242)
at org.jetbrains.kotlin.codegen.MemberCodegen.generate(MemberCodegen.java:128)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.genClosure(ExpressionCodegen.java:1022)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.genClosure(ExpressionCodegen.java:992)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.visitLambdaExpression(ExpressionCodegen.java:983)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.visitLambdaExpression(ExpressionCodegen.java:111)
at org.jetbrains.kotlin.psi.KtLambdaExpression.accept(KtLambdaExpression.java:39)
at org.jetbrains.kotlin.codegen.ExpressionCodegen.genQualified(ExpressionCodegen.java:299)
... 70 more
I'm sure this has to do with my IDE or some local setting. Again, my groupmates don't get this error. The file mentioned in the error, EndRoundState.kt, looks like this, in case it helps clarify my problem:
package nl.han.asd.a1.network.networkstates

import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import nl.han.asd.a1.network.NetworkLogic
import nl.han.asd.a1.network.Player
import nl.han.asd.a1.network.networkmessages.NetworkMessage
import nl.han.asd.a1.network.networkmessages.NetworkMessageTypes.*
import nl.han.asd.a1.network.networkmessages.messagetypes.GameData
import nl.han.asd.a1.utilities.gameclock.IClock

class EndRoundState(networkLogic: NetworkLogic, var players: MutableList<Player>, hash: String, val clock: IClock) : NetworkState(networkLogic) {
    private var rightstate = true
    private val endRoundDuration = 5000 //How long should the program be in this state? default = 5000
    private val checkTimerInterval = 500L //How often should we check if the timer has expired. default = 500L

    init {
        var hashes: MutableList<String> = mutableListOf()
        players.forEach {
            if (it.hash != null) {
                hashes.add(it.hash.toString())
            }
        }
        val rightHash = getRightHash(hashes)
        if (hash == rightHash) {
            this.networkLogic.correctGameState(hash)
        } else {
            rightstate = false
        }
        GlobalScope.launch {
            //Launch a coroutine that will run alongside the other code. Think of it as a thread-lite. This will change the state after endRoundDuration expires.
            val endTime: Long = clock.getCurrentTime() + endRoundDuration
            while (true) {
                delay(checkTimerInterval)
                if (clock.getCurrentTime() >= endTime) {
                    networkLogic.startNewRound()
                    return@launch
                }
            }
        }
    }

    override fun handleMessage(message: NetworkMessage, ip: String) {
        when (message.networkMessageType) {
            ROUND_IS_OVER -> ignore()
            CONNECT_REQUEST -> ignore()
            CONNECT_RESPONSE -> ignore()
            GAME_ANNOUNCE -> ignore()
            GAME_DATA -> {
                if (!rightstate) {
                    val gameData = message as GameData
                    this.networkLogic.setGameState(gameData.data.game)
                }
            }
            INITIATOR_MESSAGE -> ignore()
            MOVE -> TODO()
            RECONNECT_REQUEST -> TODO()
        }
    }

    private fun ignore() {
    }

    private fun getRightHash(hashes: MutableList<String>): String {
        val frequenciesByHash = hashes.groupingBy { it }.eachCount()
        var highestCount = 0
        var rightHash: String? = null
        frequenciesByHash.forEach {
            if (it.value > highestCount) {
                highestCount = it.value
                rightHash = it.key
            }
        }
        return run { if (rightHash.isNullOrEmpty()) "" else rightHash!! }
    }
}
I just want the code to compile on my machine, like it does for my colleagues. It does compile through Maven, just not through IntelliJ.
Thank you very much!
I fixed it by upgrading the Kotlin plugin. My plugin version is currently 1.3.41-release-IJ2018.2-1.
I resolved this issue by uninstalling IntelliJ including all settings/plugins and reinstalling.
Uninstalling the IDE without removing settings/plugins did not work.
I've encountered this in one of our test cases where we had a function with a long name.
Granted, this development machine was a Windows 10 machine, and Windows has a history of problems with long filenames: https://community.spiceworks.com/topic/2006950-file-path-too-long-shortening-names-is-only-the-solution.
Check whether the filename or the function name is too long and shorten it. It certainly helped me to reduce a 93-character test function name to 70 characters.
Also check whether you are using weird characters or emojis; sometimes they can mess up the file generation.
Have fun and be safe out there.
In my case, it was caused by using my custom suspend operator fun plusAssign and calling it via +=. When I replaced the += with an explicit plusAssign call, it compiled fine. I can also use the same += elsewhere just fine. No idea what's going on.

Run-time error of Spark code in Intellij

Running Spark code in IntelliJ IDEA is painful for a new Spark/IntelliJ user.
I googled many pages but didn't find a solution to this.
The code is very simple, as below.
I'm getting a run-time error on this line:
val conf = new SparkConf().setAppName("Spark Pi")
import org.apache.spark.SparkConf
import scala.math.random

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello World")
    val conf = new SparkConf().setAppName("Spark Pi")
    // val spark = new SparkContext(conf)
    // val slices = if (args.length > 0) args(0).toInt else 3
    // val n = 100000 * slices
    // val count = spark.parallelize(1 to n, slices).map { i =>
    //   val x = random * 2 - 1
    //   val y = random * 2 - 1
    //   if (x*x + y*y < 1) 1 else 0
    // }.reduce(_ + _)
    // println("Pi is roughly " + 4.0 * count / n)
    // val pi = 4.0 * count / n
    // val ppi = spark.parallelize(Seq(pi))
    // ppi.saveAsTextFile("/tmp/bryan/spark/output.pi")
    // spark.stop()
  }
}
The error message is:
"C:\Program Files\Java\jdk8\bin\java" -Didea.launcher.port=7534 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 14.1.5\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk8\jre\lib\charsets.jar;C:\Program Files\Java\jdk8\jre\lib\deploy.jar;C:\Program Files\Java\jdk8\jre\lib\javaws.jar;C:\Program Files\Java\jdk8\jre\lib\jce.jar;C:\Program Files\Java\jdk8\jre\lib\jfr.jar;C:\Program Files\Java\jdk8\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk8\jre\lib\jsse.jar;C:\Program Files\Java\jdk8\jre\lib\management-agent.jar;C:\Program Files\Java\jdk8\jre\lib\plugin.jar;C:\Program Files\Java\jdk8\jre\lib\resources.jar;C:\Program Files\Java\jdk8\jre\lib\rt.jar;C:\Program Files\Java\jdk8\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk8\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk8\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk8\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk8\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk8\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk8\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk8\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk8\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk8\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk8\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk8\jre\lib\ext\zipfs.jar;D:\code\spark-cdh\analysis-jobs\example\target\scala-2.10\classes;C:\Users\spark39\.ivy2\cache\org.scala-lang\scala-compiler\jars\scala-compiler-2.10.0.jar;C:\Users\spark39\.sbt\boot\scala-2.10.4\lib\scala-library.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 14.1.5\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain com.spark.example.test.HelloWorld
Hello World
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at com.samsungaustin.yac.spark.example.test.HelloWorld$.main(Test.scala:15)
at com.samsungaustin.yac.spark.example.test.HelloWorld.main(Test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Someone mentioned adding the classpath through the 'Edit Configurations' menu, but can anybody tell me exactly how to do this?
Thanks.
Starting with new technology can often be a bit painful, but it might lead to valuable new knowledge and experience. Do you use a Maven pom.xml file for building your project? I would advise you to use Maven or something similar to keep track of which libraries your code needs.
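For example, with sbt the whole Spark dependency is a one-liner in build.sbt. A minimal sketch; the version numbers here are assumptions, so match them to the Spark release you actually run:
// build.sbt -- a minimal sketch; adjust versions to your cluster
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"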
You can also try one of the existing examples that are available for Spark on the web site: http://spark.apache.org/examples.html. There is also a GitHub repository with even more examples: https://github.com/apache/spark/tree/master/examples. (This repository already has a Maven pom.xml file: https://github.com/apache/spark/blob/master/examples/pom.xml.)
The following page contains (hopefully) Useful Developer Tools: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
There are also several tutorials that use Spark, Scala, and IntelliJ IDEA:
https://medium.com/large-scale-data-processing/how-to-kick-start-spark-development-on-intellij-idea-in-4-steps-c7c8f5c2fe63#.y0uxk7gay
https://docs.sigmoidanalytics.com/index.php/Step_by_Step_instructions_on_how_to_build_Spark_App_with_IntelliJ_IDEA

How to enable SQL on SchemaRDD via the JDBC interface? (is it even possible ?)

UPDATING the problem statement
We are using Spark 1.2.0 (Hadoop 2.4). We have defined SchemaRDDs using data files in HDFS and would like to enable querying these as tables via HiveServer2. We are encountering runtime exceptions while trying to saveAsTable and would like guidance on how to proceed.
Source code:
package foo.bar

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark._
import org.apache.spark.sql.hive._

object HiveDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Demo")
    val sc = new SparkContext(conf)
    // sc is an existing SparkContext.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // Create an RDD
    val zipRDD = sc.textFile("/model-inputs/all_zip_state.csv")
    // The schema is encoded in a string
    val schemaString = "ODSMEMBERID,ZIPCODE,STATE,TEST_SUPPLIERID,ratio_death_readm_low,ratio_death_readm_high,regions"
    // Generate the schema based on the string of schema
    val schema =
      StructType(
        schemaString.split(",").map(fieldName => StructField(fieldName, StringType, true)))
    // Convert records of the RDD (zip) to Rows.
    val rowRDD = zipRDD.map(_.split(",")).map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), ""))
    // Apply the schema to the RDD.
    val zipSchemaRDD = hiveContext.applySchema(rowRDD, schema)
    // HiveContext's save as Table
    zipSchemaRDD.saveAsTable("allzipstable")
  }
}
spark-submit Command:
./bin/spark-submit --class foo.bar.HiveDemo --master yarn-cluster --jars /usr/lib/hive/lib/hive-metastore.jar,/usr/lib/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark-1.2.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 lib/datapipe_2.10-1.0.jar 10
Exception at runtime on Node:
15/01/29 22:35:50 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Unresolved plan found, tree:
'CreateTableAsSelect None, allzipstable, false, None
LogicalRDD [ODSMEMBERID#0,ZIPCODE#1,STATE#2,TEST_SUPPLIERID#3,ratio_death_readm_low#4,ratio_death_readm_high#5,regions#6], MappedRDD[3] at map at HiveDemo.scala:30
)
Exception in thread "Driver" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, allzipstable, false, None
LogicalRDD [ODSMEMBERID#0,ZIPCODE#1,STATE#2,TEST_SUPPLIERID#3,ratio_death_readm_low#4,ratio_death_readm_high#5,regions#6], MappedRDD[3] at map at HiveDemo.scala:30
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
at com.healthagen.datapipe.ahm.HiveDemo$.main(HiveDemo.scala:36)
at com.healthagen.datapipe.ahm.HiveDemo.main(HiveDemo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
15/01/29 22:35:50 INFO yarn.ApplicationMaster: Invoking sc stop from shutdown hook
Another attempt:
package foo.bar

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql._

case class AllZips(
  ODSMEMBERID: String,
  ZIPCODE: String,
  STATE: String,
  TEST_SUPPLIERID: String,
  ratio_death_readm_low: String,
  ratio_death_readm_high: String,
  regions: String)

object HiveDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HiveDemo")
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._
    val allZips = sc.textFile("/model-inputs/all_zip_state.csv").map(_.split(",")).map(p => AllZips(p(0), p(1), p(2), p(3), p(4), p(5), ""))
    val allZipsSchemaRDD = createSchemaRDD(allZips)
    allZipsSchemaRDD.saveAsTable("allzipstable")
  }
}
Exception on node:
15/01/30 00:28:19 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Unresolved plan found, tree:
'CreateTableAsSelect None, allzipstable, false, None
LogicalRDD [ODSMEMBERID#0,ZIPCODE#1,STATE#2,TEST_SUPPLIERID#3,ratio_death_readm_low#4,ratio_death_readm_high#5,regions#6], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36
)
Exception in thread "Driver" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, allzipstable, false, None
LogicalRDD [ODSMEMBERID#0,ZIPCODE#1,STATE#2,TEST_SUPPLIERID#3,ratio_death_readm_low#4,ratio_death_readm_high#5,regions#6], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
at com.healthagen.datapipe.ahm.HiveDemo$.main(HiveDemo.scala:22)
at com.healthagen.datapipe.ahm.HiveDemo.main(HiveDemo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
15/01/30 00:28:19 INFO yarn.ApplicationMaster: Invoking sc stop from shutdown hook
You need to use a HiveContext.
Here are the Java/Scala docs:
/**
 * Note that this currently only works with SchemaRDDs that are created from a HiveContext as
 * there is no notion of a persisted catalog in a standard SQL context.
 */
@Experimental
def saveAsTable(tableName: String): Unit =
  sqlContext.executePlan(CreateTableAsSelect(None, tableName, logicalPlan, false)).toRdd
So in your code, change the context you call saveAsTable on to a HiveContext; note that HiveContext wraps the SparkContext, not the SparkConf, and it is clearer to give it its own name:
val sqlc = new HiveContext(sc)
FYI: more info about registering tables (in SQLContext); note the tables are transient if done this way:
/**
 * Temporary tables exist only
 * during the lifetime of this instance of SQLContext.
 *
 * @group userf
 */
def registerRDDAsTable(rdd: SchemaRDD, tableName: String): Unit = {
  catalog.registerTable(Seq(tableName), rdd.queryExecution.logical)
}
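For a transient table, the public API on SchemaRDD is registerTempTable; below is a minimal sketch against the question's code (the table name is arbitrary):
// Transient: visible only for the lifetime of this HiveContext.
zipSchemaRDD.registerTempTable("allzips")
hiveContext.sql("SELECT ZIPCODE, STATE FROM allzips LIMIT 10").collect()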
UPDATE: Your new stacktrace includes the following phrase:
Unresolved plan found, tree:
That typically means you have a column that does not match the underlying table. I will look further to see if I am able to isolate it, but in the meantime you might also consider it from that perspective.
The createSchemaRDD code snippet from above works fine on Spark 1.2.1.
There was a CTAS defect in 1.2.0.

Why am I getting this error java.lang.ClassNotFoundException?

Why am I getting this error message?
Testcase: createIndexBatch_usingTheTestArchive(src.LireHandlerTest): Caused an ERROR
at/lux/imageanalysis/ColorLayoutImpl
java.lang.NoClassDefFoundError: at/lux/imageanalysis/ColorLayoutImpl
at net.semanticmetadata.lire.impl.SimpleDocumentBuilder.createDocument(Unknown Source)
at net.semanticmetadata.lire.AbstractDocumentBuilder.createDocument(Unknown Source)
at backend.storage.LireHandler.createIndexBatch(LireHandler.java:49)
at backend.storage.LireHandler.createIndexBatch(LireHandler.java:57)
at src.LireHandlerTest.createIndexBatch_usingTheTestArchive(LireHandlerTest.java:56)
Caused by: java.lang.ClassNotFoundException: at.lux.imageanalysis.ColorLayoutImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
I am trying to create a Lucene index using documents that are created with Lire. When it gets to the point where it tries to create a document using the document builder, it gives me this error message.
Input parameters for the first run:
filepath: "test_archive" (this is an image archive)
whereToStoreIndex: "test_index" (where I am going to store the index)
createNewIndex: true
Since the method is recursive (see the if statement where it checks whether the file is a directory), it will call itself multiple times, but all the recursive calls use createNewIndex = false.
Here is the code:
public static boolean createIndexBatch(String filepath, String whereToStoreIndex, boolean createNewIndex) {
    DocumentBuilder docBldr = DocumentBuilderFactory.getDefaultDocumentBuilder();
    try {
        IndexWriter indexWriter = new IndexWriter(
            FSDirectory.open(new File(whereToStoreIndex)),
            new SimpleAnalyzer(),
            createNewIndex,
            IndexWriter.MaxFieldLength.UNLIMITED
        );
        File[] files = FileHandler.getFilesFromDirectory(filepath);
        for (File f : files) {
            if (f.isFile()) {
                // Skip the Thumbs.db files...
                if (!f.getName().equals("Thumbs.db")) {
                    // Creating the document that is going to be stored in the index
                    String name = f.getName();
                    String path = f.getPath();
                    FileInputStream stream = new FileInputStream(f);
                    Document doc = docBldr.createDocument(stream, f.getName());
                    // Add document to the index
                    indexWriter.addDocument(doc);
                }
            } else if (f.isDirectory()) {
                indexWriter.optimize();
                indexWriter.close();
                LireHandler.createIndexBatch(f.getPath(), whereToStoreIndex, false);
            }
        }
        indexWriter.close();
    } catch (IOException e) {
        System.out.print("IOException in createIndexBatch:\n" + e.getMessage() + "\n");
        return false;
    }
    return true;
}
It sounds like you are missing a Java library.
I think you need to download Caliph & Emir, which may be used indirectly by Lire:
http://www.semanticmetadata.net/download/