Duplicate entry in Aspectj - apache-pig

When I try to weave the jar file using AspectJ, I am getting
java.util.zip.ZipException: duplicate entry org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/Add.class
I used the following command to weave the jar file
ajc -inpath C:\pig.jar -aspectpath C:\Aspects.jar -extdirs C:\libs -outjar C:\pig\pig.jar
Can anyone tell me why?

Well, first of all ajc does not know any -extdirs parameter, AFAIK.
I also find it rather strange that your outjar has the same name as the injar, just in another subdirectory. This makes it easy to mistake one for another when weaving the next time or just using the library.
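For example, a command along these lines (just a sketch; the pig-woven.jar name is my own suggestion, and I have left the extra library option out) would avoid the name clash:

ajc -inpath C:\pig.jar -aspectpath C:\Aspects.jar -outjar C:\pig\pig-woven.jar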
The "duplicate entry" thing maybe comes from the same class being contained in pig.jar and Aspects.jar. The "maybe" was a wrong guess, see discussion between Andy Clement and me in the comments section below. The true cause is described in update 2.
Update:
So how can it happen that the exception you mention happens while packaging the outjar?
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class ZipExceptionDemo {
    public static void main(String[] args) throws IOException {
        try (JarOutputStream stream = new JarOutputStream(new FileOutputStream("foo.jar"))) {
            stream.putNextEntry(new ZipEntry("com/foo/One.class"));
            stream.putNextEntry(new ZipEntry("com/foo/Two.class"));
            stream.putNextEntry(new ZipEntry("com/foo/UhOh.class"));
            stream.putNextEntry(new ZipEntry("com/foo/UhOh.class")); // uh-oh, duplicate entry!
        }
    }
}
Exception in thread "main" java.util.zip.ZipException: duplicate entry: com/foo/UhOh.class
at java.util.zip.ZipOutputStream.putNextEntry(ZipOutputStream.java:215)
at java.util.jar.JarOutputStream.putNextEntry(JarOutputStream.java:109)
at ZipExceptionDemo.main(ZipExceptionDemo.java:12)
Now your job is to find out why you have the same class in multiple JARs you want to mix into one. You even know which one to search for: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.
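If it helps with that search, here is a small sketch of my own (not part of the original question or of any AspectJ tooling) that lists class entries occurring in more than one of the JARs given on the command line, using only java.util.zip. You would run it e.g. as "java DuplicateEntryFinder C:\pig.jar C:\Aspects.jar".

import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DuplicateEntryFinder {
    public static void main(String[] args) throws IOException {
        // Map each .class entry name to the list of JARs that contain it.
        Map<String, List<String>> owners = new HashMap<>();
        for (String jar : args) {
            try (ZipFile zip = new ZipFile(jar)) {
                for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); ) {
                    String name = e.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar);
                    }
                }
            }
        }
        // Print every class that occurs in more than one archive.
        owners.forEach((name, jars) -> {
            if (jars.size() > 1) {
                System.out.println(name + " -> " + jars);
            }
        });
    }
}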
Update 2:
I have been able to reproduce the problem with two JARs or directories on the inpath containing the same class file. How you were able to get that error if, as shown in your question, your command line really only has a single entry on the inpath remains a mystery to me. Anyway, I have filed a bug ticket for the problem.

Related

Kotlin scripting support fails with "wrong number of arguments" whenever I try to run any script

I'm trying to run a very basic script with org.jetbrains.kotlin:kotlin-scripting-jvm, but I get two errors, when I should get none. This is my script, which contains nothing but the literal expression:
1
I expect to get back a ResultWithDiagnostics.Success with a resultValue of 1 but instead I get a Failure, with these reports:
The expression is unused
wrong number of arguments
Even if I fix the warning by modifying my script to
class Foo(val foo: String = "foo")
Foo()
I still get the wrong number of arguments error. I checked the source and it seems that in
BasicJvmScriptEvaluator:95
return try {
    ctor.newInstance(*args.toArray()) // <-- here
} finally {
    Thread.currentThread().contextClassLoader = saveClassLoader
}
args is empty. What am I doing wrong? This is how I try to run the script:
private fun evalFile(scriptFile: File): ResultWithDiagnostics<EvaluationResult> {
    val compilationConfiguration = createJvmCompilationConfigurationFromTemplate<TestScript> {
        jvm {
            dependenciesFromCurrentContext(wholeClasspath = true)
        }
    }
    return BasicJvmScriptingHost().eval(scriptFile.toScriptSource(), compilationConfiguration, null)
}
and this is the stack trace for this wrong number of arguments error I get:
java.lang.IllegalArgumentException: wrong number of arguments
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.evalWithConfigAndOtherScriptsResults(BasicJvmScriptEvaluator.kt:95)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke$suspendImpl(BasicJvmScriptEvaluator.kt:40)
at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke(BasicJvmScriptEvaluator.kt)
at kotlin.script.experimental.host.BasicScriptingHost$eval$1.invokeSuspend(BasicScriptingHost.kt:47)
at kotlin.script.experimental.host.BasicScriptingHost$eval$1.invoke(BasicScriptingHost.kt)
at kotlin.script.experimental.host.BasicScriptingHost$runInCoroutineContext$1.invokeSuspend(BasicScriptingHost.kt:35)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:238)
at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.kt:116)
at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:80)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:54)
at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:36)
at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
at kotlin.script.experimental.host.BasicScriptingHost.runInCoroutineContext(BasicScriptingHost.kt:35)
at kotlin.script.experimental.host.BasicScriptingHost.eval(BasicScriptingHost.kt:45)
This isn't a fix, just a workaround.
You can pass the source code to the Kotlin compiler in different ways:
1. From a FileScriptSource, when you pass a list of files in the configuration.
2. From a list of source code contents in memory, i.e. each file is read and its content is placed inside a StringScriptSource.
3. From a single in-memory script created by concatenating all the input source files.
As I found in my experiments:
If you have the mockk and kotest jars on the same classpath, option 1 doesn't work. In that case I suggest making one change:
// this doesn't work - scriptFile.toScriptSource()
scriptFile.readText().toScriptSource() // ok - we read source from memory, not from file
If you have a huge service with a lot of Spring jars, all the options above work. This means you may not be able to test your compilation in unit tests, but your service will work!
If you want to run the compilation from a Gradle plugin, you can hit another kind of issue, a class conflict with the coroutines library, so none of the options above work.
Finally, I changed the following in my code:
As input I always have a lot of kt/kts files.
I have three compilation options (described above), so my code calls createJvmCompilationConfigurationFromTemplate with different logic depending on my compilation mode (a simple enum).
For unit tests I have to use option 3 only.
For the service I use the first option, as it is the fastest one.
For the Gradle plugin classpath I start a separate Java process (with a fresh classpath) which executes the input kts files.

Using sun.reflect package with openjdk11

Is there a way to use sun.reflect in OpenJDK 11, maybe by adding something to "--add-exports"? Our code fails since a jide package internally uses the sun.reflect package, and I'm trying to see if there's a way to make it work.
I've already tried the option below, but it doesn't help.
"--add-exports jdk.unsupported/sun.reflect=ALL-UNNAMED"
Here's the exception, where the underlying class references sun.reflect.Reflection
java.lang.NoClassDefFoundError: sun/reflect/Reflection
I had this problem and fixed it by using a newer version of jide. Changing from jide-whatever:3.2.3 to jide-whatever:3.7.6 was enough to make it work in my case.
If you cannot migrate to a newer version, a workaround is to provide your own sun.reflect.Reflection class that derives the caller class from new Throwable().getStackTrace()[n] and put it in the WEB-INF/classes folder.
This is a simple workaround. It works in many cases.
package sun.reflect;

public class Reflection {
    // Stand-in for the no longer available sun.reflect.Reflection.getCallerClass(int):
    // load the class named at depth n of the current stack trace.
    public static Class<?> getCallerClass(int n) {
        StackTraceElement[] elements = new Throwable().getStackTrace();
        try {
            return Class.forName(elements[n].getClassName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
https://github.com/rafaljot/NoClassDefFoundError-sun-reflect-Reflection
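For illustration only (CallerDemo is a hypothetical class of mine), once the wrapper above is on the classpath, a call that previously failed with NoClassDefFoundError resolves again:

public class CallerDemo {
    public static void main(String[] args) {
        // Depth 1 is the class making the call; with the wrapper in place this
        // prints "called from CallerDemo" instead of throwing NoClassDefFoundError.
        Class<?> caller = sun.reflect.Reflection.getCallerClass(1);
        System.out.println("called from " + caller.getName());
    }
}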
It can be fixed by updating the version of the jars.

NullPointerException caught when writing to BigTable using Apache Beam's dataflow sdk

I'm using Apache's Beam sdk version 0.2.0-incubating-SNAPSHOT
and trying to write data to a Bigtable with the Dataflow runner. Unfortunately I'm getting a NullPointerException when executing my Dataflow pipeline, where I'm using BigtableIO.Write as my sink. I've already checked my BigtableOptions and the parameters are fine for my needs.
Basically, I create my pipeline, and at some point it has a step that writes the PCollection<KV<ByteString, Iterable<Mutation>>> to my desired Bigtable:
final BigtableOptions.Builder optionsBuilder =
    new BigtableOptions.Builder().setProjectId(System.getProperty("PROJECT_ID"))
                                 .setInstanceId(System.getProperty("BT_INSTANCE_ID"));

// do intermediary steps and create PCollection<KV<ByteString, Iterable<Mutation>>>
// to write to bigtable

// modifiedHits is a PCollection<KV<ByteString, Iterable<Mutation>>>
modifiedHits.apply("writting to big table", BigtableIO.write()
    .withBigtableOptions(optionsBuilder)
    .withTableId(System.getProperty("BT_TABLENAME")));

p.run();
When executing the pipeline, I got the NullPointerException pointing exactly at the BigtableIO class, in the public void processElement(ProcessContext c) method:
(6e0ccd8407eed08b): java.lang.NullPointerException at org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.processElement(BigtableIO.java:532)
I checked that this method processes all elements before writing to Bigtable, but I'm not sure why I get this exception every time I execute the pipeline. According to the code below, the method uses the bigtableWriter attribute to process each c.element(), but I can't even set a breakpoint to debug where exactly the null is. Any advice or suggestion on how to solve this issue?
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
    checkForFailures();
    Futures.addCallback(
        bigtableWriter.writeRecord(c.element()), new WriteExceptionCallback(c.element()));
    ++recordsWritten;
}
Thanks.
I looked up the job and its classpath, and if I'm not mistaken it looks like you're using version 0.3.0-incubating-SNAPSHOT of beam-sdks-java-{core,io}, but version 0.2.0-incubating-SNAPSHOT of google-cloud-dataflow-java.
I believe this is the cause of the issue: you have to use the same version (more details: BigtableIO in version 0.3.0 uses @Setup and @Teardown methods, but the 0.2.0 runner does not support them yet).
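To illustrate why such a mismatch shows up as a NullPointerException rather than a clearer error, here is a simplified sketch of my own (not the actual BigtableIO source): the writer is created in a @Setup method, so if the runner never calls @Setup, the field is still null when processElement runs.

import org.apache.beam.sdk.transforms.DoFn;

// Simplified sketch of the @Setup / @ProcessElement pattern that BigtableIO 0.3.0 relies on.
// The StringBuilder stands in for the real Bigtable writer.
class WriterFnSketch extends DoFn<String, Void> {
    private transient StringBuilder writer;

    @Setup
    public void setup() {
        writer = new StringBuilder(); // only runs if the runner supports @Setup
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        writer.append(c.element());   // NullPointerException here if setup() was never called
    }
}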

Scala with spark - "javax.servlet.ServletRegistration"'s signer information does not match signer information of other classes in the same package

I have a simple Scala application with Spark dependencies. I am just trying to create a Spark context using the following code.
def main(args: Array[String]) {
  var sparkConfig: SparkConf = new SparkConf()
  sparkConfig.setAppName("ProxySQL").setMaster("local")
  var sc = new SparkContext(sparkConfig)
}
When I try to run this code inside main, it throws a security exception at new SparkContext(sparkConfig) with the following message:
Exception in thread "main" java.lang.SecurityException: class "javax.servlet.ServletRegistration"'s signer information does not match signer information of other classes in the same package.
The Problems tab in Eclipse shows one warning:
Description Path Resource Location Type
More than one scala library found in the build path (D:/workspaces/scala/scalaEclipse/eclipse/plugins/org.scala-ide.scala210.jars_4.0.0.201503031935/target/jars/scala-library.jar, C:/Users/prems.bist/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar).This is not an optimal configuration, try to limit to one Scala library in the build path. SQLWrapper Unknown Scala Classpath Problem
I have a Scala 2.10.4 installation on my Windows machine.
The Scala compiler version set in Eclipse is 2.10.5. What is causing this security exception? Is this a version incompatibility issue or something else? How would I solve it?
The problem was more or less related to conflicting dependencies.
The following steps resolved my issue:
Go to Project Build Path -> Order and Export tab -> Change the order of the javax.servlet jar, moving it either to the bottom or to the top.
This resolved the problem.
Well, as I followed the suggestion (Go to Project Build Path -> Order and Export tab -> Change the order of the javax.servlet jar, either to the bottom or to the top),
I found my build path libraries had been changed and looked messy (too many small libs); maybe this was caused by Maven.
So I removed all of them, re-imported the libs, and chose Project -> Maven -> Update Project.
Now it works well.
The name of your object with the main method should be the same as in setAppName("ProxySQL"); you can also extend App and skip the main method, but only if you want to, I find it easier.
package spark.sample

import org.apache.spark.{ SparkContext, SparkConf }

/**
 * Created by anquegi on 18/05/15.
 */
object ProxySQL {
  def main(args: Array[String]) {
    var sparkConfig: SparkConf = new SparkConf()
    sparkConfig.setAppName("ProxySQL").setMaster("local")
    var sc = new SparkContext(sparkConfig)
  }
}
I normally use an object like this for Spark:
package spark.sample

import org.apache.spark.{ SparkContext, SparkConf }

/**
 * Created by anquegi on 18/05/15.
 */
object ProxySQL extends App {
  val sparkConfig: SparkConf = new SparkConf()
  sparkConfig.setAppName("ProxySQL").setMaster("local[4]")
  val sc = new SparkContext(sparkConfig)
}
I prefer to use val instead of var.
You can also call .setMaster("local[4]") so that Spark does not work with only one thread.
It means you did not exclude the Servlet APIs from some dependency in your app, and one of them is bringing it in every time. Look at the dependency tree and exclude whatever brings in javax.servlet.
It should already be available in Spark, and the particular javax.servlet JAR from Oracle has signing info that you would have to strip out, so it is simpler to exclude the whole thing.
Some of the libraries were mentioned here.
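Assuming a Maven build (the .m2 paths in the Eclipse warning suggest one), a quick way to see which dependency drags in the servlet classes is the dependency plugin; you can then add an exclusion for that artifact to whichever dependency pulls it in:

mvn dependency:tree -Dincludes=javax.servlet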

Hadoop: wrong classpath in map reduce job

I'm running a Cloudera cluster in 3 virtual machines and trying to execute an HBase bulk load via a MapReduce job, but I always get the error:
error: Class org.apache.hadoop.hbase.mapreduce.HFileOutputFormat not found
So it seems the map process doesn't find the class. I tried this:
1) adding the hbase.jar to the HADOOP_CLASSPATH on every node
2) adding TableMapReduceUtil.addDependencyJars(job) / TableMapReduceUtil.addDependencyJars(myConf, HFileOutputFormat.class) to my source code
Nothing worked. I have absolutely no idea why the class is not found, because the jar/class is definitely available in the classpath.
If I take a look into the job.xml I see the following entry:
name=tmpjars value=file:/C:/Users/Thomas/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5-cdh4.3.0/zookeeper-3.4.5-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/org/apache/hbase/hbase/0.94.6-cdh4.3.0/hbase-0.94.6-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.3.0/hadoop-core-2.0.0-mr1-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar,file:/C:/Users/Thomas/.m2/repository/com/google/protobuf/protobuf-java/2.4.0a/protobuf-java-2.4.0a.jar
This seems a little odd to me; these are my local jars on the Windows system. Maybe these should be the HDFS jars? If yes, how can I change the values for "tmpjars"?
Here is the Java code I try to execute:
configuration = new Configuration(false);
configuration.set("mapred.job.tracker", "192.168.2.41:8021");
configuration.set("fs.defaultFS", "hdfs://192.168.2.41:8020/");
configuration.set("hbase.zookeeper.quorum", "192.168.2.41");
configuration.set("hbase.zookeeper.property.clientPort", "2181");

Job job = new Job(configuration, "HBase Bulk Import for " + tablename);
job.setJarByClass(HBaseKVMapper.class);
job.setMapperClass(HBaseKVMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);

job.setOutputFormatClass(HFileOutputFormat.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
job.setInputFormatClass(TextInputFormat.class);

HFileOutputFormat.configureIncrementalLoad(job, hTable);

FileInputFormat.addInputPath(job, new Path("myfile1"));
FileOutputFormat.setOutputPath(job, new Path("myfile2"));

job.waitForCompletion(true);

LoadIncrementalHFiles loader = new LoadIncrementalHFiles(configuration);
loader.doBulkLoad(new Path("myFile3"), hTable);
EDIT:
I experimented a little more and it's totally strange. I added the following line to the Java code:
job.setJarByClass(HFileOutputFormat.class);
After I executed this, the error was gone, but another ClassNotFoundException appeared:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class mypackage.bulkLoad.HBaseKVMapper not found
HBaseKVMapper is the custom mapper class I want to execute. So I tried to add it with job.setJarByClass(HBaseKVMapper.class), but that doesn't work since it's only a class file and not a jar. So I generated a jar file including HBaseKVMapper.class. After that, I executed it again and got the HFileOutputFormat.class not found exception again.
After debugging a bit, I found out that the setJarByClass method only copies the local jar file to .staging/job_#number/job.jar on HDFS. So setJarByClass() only works for one jar file, because calling it again with another jar overwrites job.jar.
While searching for the error I looked at the job staging directory, and inside the libjars directory I saw the relevant jar files.
So the hbase jar is inside the libjars directory, but the JobTracker doesn't use it for executing the job. Why?
I would try using Cloudera Manager (free version) as it takes care of these issues for you. Otherwise note the following:
Both your own classes and the HBase class HFileOutputFormat need to be available on the classpath locally and remotely.
Submitting the job
Meaning getting the classpath right locally for when your driver runs:
$ env HADOOP_CLASSPATH=$(hbase classpath) hadoop jar path/to/jar class....
On the server
In your hadoop-env.sh
export HADOOP_CLASSPATH=$(hbase classpath)
or use
TableMapReduceUtil.addDependencyJars
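A minimal sketch of that approach (assuming your driver builds the job roughly as shown in the question; HBaseKVMapper is your own mapper class) could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriverSketch {
    public static void main(String[] args) throws Exception {
        // Ship the HBase dependency jars with the job (via tmpjars / the distributed
        // cache) instead of relying on the remote classpath being set up correctly.
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "HBase Bulk Import");
        job.setJarByClass(HBaseKVMapper.class);      // your own classes, from your jar
        TableMapReduceUtil.addDependencyJars(job);   // adds the hbase, zookeeper, guava, ... jars
        // ... the rest of the job setup from the question goes here ...
        job.waitForCompletion(true);
    }
}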
I found a "hacked" solution which worked for me, but I'm not happy with it because it's not really practicable.
My "hacked" solution:
Create one big jar with all necessary class files; I called it "big.jar" and added it to the local (Eclipse) classpath.
Add the line job.setJarByClass(MyMapperClass.class) ... MyMapperClass has to be in big.jar.
When I execute this, big.jar is copied to the filesystem for every job. No errors anymore. The problem is that the jar is 80 MB in size and has to be copied every time.
If anyone knows a better way, I would be thankful if they could tell me how.
EDIT:
Now I'm trying to execute jobs with Apache Pig and have exactly the same problem. My hacked solution doesn't work in this case because Pig creates the jobs automatically. Here is the Pig error:
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableSplit not found