Implementation of a custom Writable in Hadoop - serialization

I have defined a custom Writable class in Hadoop, but Hadoop gives me the following error message when running my program.
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at org.apache.hadoop.io.SortedMapWritable.readFields(SortedMapWritable.java:180)
at EquivalenceClsAggValue.readFields(EquivalenceClsAggValue.java:82)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1282)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1222)
at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:1301)
at Mondrian$Combine.reduce(Mondrian.java:119)
at Mondrian$Combine.reduce(Mondrian.java:1)
at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1442)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:332)....
EquivalenceClsAggValue is the name of the Writable class I've defined and this is my class:
public class EquivalenceClsAggValue implements WritableComparable<EquivalenceClsAggValue>{

    public ArrayList<SortedMapWritable> aggValues;

    public EquivalenceClsAggValue(){
        aggValues = new ArrayList<SortedMapWritable>();
    }

    @Override
    public void readFields(DataInput arg0) throws IOException {
        int size = arg0.readInt();
        for (int i=0;i<size;i++){
            SortedMapWritable tmp = new SortedMapWritable();
            tmp.readFields(arg0);
            aggValues.add(tmp);
        }
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        //write the size first
        arg0.write(aggValues.size());
        //write each element
        for (SortedMapWritable s:aggValues){
            s.write(arg0);
        }
    }
I'd like to know what the source of the problem is.

Looks like an error in your write(DataOutput) method:
@Override
public void write(DataOutput arg0) throws IOException {
    //write the size first
    // arg0.write(aggValues.size()); // here you're writing an int as a byte
    // try this instead:
    arg0.writeInt(aggValues.size()); // actually write int as an int
    //..
Look at the API docs for DataOutput.write(int) vs DataOutput.writeInt(int)
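To see concretely why this matters, here is a minimal sketch (my own example, not from the question): DataOutput.write(int) emits only the low-order 8 bits as a single byte, so a reader that calls readInt() consumes four bytes and goes out of sync with everything written afterwards.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteVsWriteInt {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.write(3);      // writes a single byte (the low 8 bits of 3)
        out.writeInt(3);   // writes four bytes
        out.close();
        System.out.println(buffer.size()); // prints 5, not 8
        // A reader calling readInt() for the first value would swallow the
        // single byte plus three bytes of the next field and misparse the rest.
    }
}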
I'd also amend your creation of the SortedMapWritable tmp local variable in readFields to use ReflectionUtils.newInstance():
@Override
public void readFields(DataInput arg0) throws IOException {
    int size = arg0.readInt();
    for (int i=0;i<size;i++){
        SortedMapWritable tmp = ReflectionUtils.newInstance(
                SortedMapWritable.class, getConf());
        tmp.readFields(arg0);
        aggValues.add(tmp);
    }
}
Note that for this to work, you'll also need to amend your class signature to extend Configured (which implements Configurable, so that Hadoop will inject a Configuration object when your object is initially created):
public class EquivalenceClsAggValue
extends Configured
implements WritableComparable<EquivalenceClsAggValue> {
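Putting the two fixes together, the corrected class could look roughly like the sketch below. This is only an outline based on the code in the question: the aggValues.clear() call is my addition (Hadoop reuses Writable instances between records, so stale entries should be dropped), and the compareTo body is a placeholder for whatever comparison logic your original class has.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;

public class EquivalenceClsAggValue extends Configured
        implements WritableComparable<EquivalenceClsAggValue> {

    public ArrayList<SortedMapWritable> aggValues = new ArrayList<SortedMapWritable>();

    @Override
    public void readFields(DataInput in) throws IOException {
        aggValues.clear(); // the instance may be reused for several records
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            SortedMapWritable tmp = ReflectionUtils.newInstance(
                    SortedMapWritable.class, getConf());
            tmp.readFields(in);
            aggValues.add(tmp);
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(aggValues.size()); // writeInt, not write
        for (SortedMapWritable s : aggValues) {
            s.write(out);
        }
    }

    @Override
    public int compareTo(EquivalenceClsAggValue other) {
        // your original comparison logic goes here
        return 0;
    }
}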

Related

How to achieve fault tolerance when Flink sinks data to HDFS with gzip compression?

We want to write compressed data to HDFS with Flink's BucketingSink or StreamingFileSink. I have written my own Writer, which works fine as long as no failure occurs. However, when it encounters a failure and restarts from a checkpoint, it will generate a valid-length file (Hadoop < 2.7) or truncate the file. Unluckily, gzip files are binary files with a trailer at the end, so simple truncation does not work in my case. Any ideas on how to enable exactly-once semantics for a compressed HDFS sink?
That's my writer's code:
public class HdfsCompressStringWriter extends StreamWriterBaseV2<JSONObject> {

    private static final long serialVersionUID = 2L;

    /**
     * The {@code CompressFSDataOutputStream} for the current part file.
     */
    private transient GZIPOutputStream compressionOutputStream;

    public HdfsCompressStringWriter() {}

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        super.open(fs, path);
        this.setSyncOnFlush(true);
        compressionOutputStream = new GZIPOutputStream(this.getStream(), true);
    }

    public void close() throws IOException {
        if (compressionOutputStream != null) {
            compressionOutputStream.close();
            compressionOutputStream = null;
        }
        resetStream();
    }

    @Override
    public void write(JSONObject element) throws IOException {
        if (element == null || !element.containsKey("body")) {
            return;
        }
        String content = element.getString("body") + "\n";
        compressionOutputStream.write(content.getBytes());
        compressionOutputStream.flush();
    }

    @Override
    public Writer<JSONObject> duplicate() {
        return new HdfsCompressStringWriter();
    }
}
I would recommend implementing a BulkWriter for the StreamingFileSink which compresses the elements via a GZIPOutputStream. The code could look like the following:
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);
    env.enableCheckpointing(1000);

    final DataStream<Integer> input = env.addSource(new InfinitySource());

    final StreamingFileSink<Integer> streamingFileSink = StreamingFileSink
        .<Integer>forBulkFormat(new Path("output"), new GzipBulkWriterFactory<>())
        .build();
    input.addSink(streamingFileSink);

    env.execute();
}

private static class GzipBulkWriterFactory<T> implements BulkWriter.Factory<T> {

    @Override
    public BulkWriter<T> create(FSDataOutputStream fsDataOutputStream) throws IOException {
        final GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fsDataOutputStream, true);
        return new GzipBulkWriter<>(new ObjectOutputStream(gzipOutputStream), gzipOutputStream);
    }
}

private static class GzipBulkWriter<T> implements BulkWriter<T> {

    private final GZIPOutputStream gzipOutputStream;
    private final ObjectOutputStream objectOutputStream;

    public GzipBulkWriter(ObjectOutputStream objectOutputStream, GZIPOutputStream gzipOutputStream) {
        this.gzipOutputStream = gzipOutputStream;
        this.objectOutputStream = objectOutputStream;
    }

    @Override
    public void addElement(T t) throws IOException {
        objectOutputStream.writeObject(t);
    }

    @Override
    public void flush() throws IOException {
        objectOutputStream.flush();
    }

    @Override
    public void finish() throws IOException {
        objectOutputStream.flush();
        gzipOutputStream.finish();
    }
}
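Since the question writes newline-delimited JSON strings rather than Java-serialized objects, a string-oriented variant of the same idea might look like the sketch below. The class names (GzipStringBulkWriter and its nested Factory) are mine; the BulkWriter contract is the same one used in the answer above.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;

// Writes each record as a gzip-compressed, newline-terminated UTF-8 line.
public class GzipStringBulkWriter implements BulkWriter<String> {

    public static class Factory implements BulkWriter.Factory<String> {
        @Override
        public BulkWriter<String> create(FSDataOutputStream out) throws IOException {
            // syncFlush = true so that flush() pushes compressed data downstream
            return new GzipStringBulkWriter(new GZIPOutputStream(out, true));
        }
    }

    private final GZIPOutputStream gzip;

    public GzipStringBulkWriter(GZIPOutputStream gzip) {
        this.gzip = gzip;
    }

    @Override
    public void addElement(String element) throws IOException {
        gzip.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void flush() throws IOException {
        gzip.flush();
    }

    @Override
    public void finish() throws IOException {
        // write the gzip trailer; the sink closes the underlying stream itself
        gzip.finish();
    }
}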

When attaching an agent to a running process, the Byte Buddy transformer doesn't seem to take effect

The code of the program I want to attach to is below.
public class Foo {
}

public class TestEntry {

    public TestEntry() {
    }

    public static void main(String[] args) throws Exception {
        try
        {
            while(true)
            {
                System.out.println(new Foo().toString());
                Thread.sleep(1000);
            }
        }
        catch(Exception e)
        {}
    }
}
What I am attempting to do is to make Foo.toString() return 'test' by using the following agent.
public class InjectionAgent {

    public InjectionAgent() {
    }

    public static void agentmain(String args, Instrumentation inst) throws Exception
    {
        System.out.println("agentmain Args:" + args);
        new AgentBuilder.Default()
            .type(ElementMatchers.named("Foo"))
            .transform(new AgentBuilder.Transformer() {
                @Override
                public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
                        ClassLoader arg2, JavaModule arg3) {
                    return arg0.method(ElementMatchers.named("toString"))
                               .intercept(FixedValue.value("test"));
                }
            }).installOn(inst);
    }

    public static void premain(String args, Instrumentation inst) throws Exception
    {
        System.out.println("premain Args:" + args);
        new AgentBuilder.Default()
            .type(ElementMatchers.named("Foo"))
            .transform(new AgentBuilder.Transformer() {
                @Override
                public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
                        ClassLoader arg2, JavaModule arg3) {
                    return arg0.method(ElementMatchers.named("toString"))
                               .intercept(FixedValue.value("test"));
                }
            }).installOn(inst);
    }
}
I noticed that it works when I use the -javaagent approach, whereas the attach approach fails. Here is the code for attaching:
public class Injection {

    public Injection() {
    }

    public static void main(String[] args) throws AttachNotSupportedException, IOException, AgentLoadException, AgentInitializationException, InterruptedException {
        VirtualMachine vm = null;
        String agentjarpath = args[0];
        vm = VirtualMachine.attach(args[1]);
        vm.loadAgent(agentjarpath, "This is Args to the Agent.");
        vm.detach();
    }
}
I tried adding AgentBuilder.Listener.StreamWriting.toSystemOut() to the agent; after attaching, the output of TestEntry shows:
[Byte Buddy] DISCOVERY Foo [sun.misc.Launcher$AppClassLoader@33909752, null, loaded=true]
[Byte Buddy] TRANSFORM Foo [sun.misc.Launcher$AppClassLoader@33909752, null, loaded=true]
[Byte Buddy] COMPLETE Foo [sun.misc.Launcher$AppClassLoader@33909752, null, loaded=true]
Foo@7f31245a
Foo@6d6f6e28
Foo@135fbaa4
Foo@45ee12a7
Foo@330bedb4
==================================Update=====================================
I defined a public method 'Bar' in Foo like this
public class Foo {
    public String Bar()
    {
        return "Bar";
    }
}
and then I was trying to make Foo.Bar() return "modified" in the following way:
public static void agentmain(String args, Instrumentation inst) throws Exception
{
    System.out.println("agentmain Args:" + args);
    premain(args, inst);
    new AgentBuilder.Default()
        .with(RedefinitionStrategy.RETRANSFORMATION)
        .disableClassFormatChanges()
        .with(AgentBuilder.Listener.StreamWriting.toSystemOut())
        .type(ElementMatchers.named("Foo"))
        .transform(new AgentBuilder.Transformer() {
            @Override
            public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
                    ClassLoader arg2, JavaModule arg3) {
                return arg0.visit(Advice.to(InjectionTemplate.class).on(ElementMatchers.named("Bar")));
            }
        })
        .installOn(inst);
}

static class InjectionTemplate {

    @Advice.OnMethodExit
    static void exit(@Advice.Return String self) {
        System.out.println(self.toString() + " " + self.getClass().toString());
        self = new String("modified");
    }
}
but I got this error:
java.lang.IllegalStateException: Cannot write to read-only parameter class java.lang.String at 1
any suggestions?
It does not seem like you are using redefinition for your agent. You can activate it using:
new AgentBuilder.Default()
.with(RedefinitionStrategy.RETRANSFORMATION)
.disableClassFormatChanges();
The last part is required on most JVMs (with the notable exception of the dynamic code evolution VM, a custom build of HotSpot). It tells Byte Buddy not to add fields or methods, which most VMs do not support.
In this case, it is no longer possible to invoke the original implementation of a method, which is however not required for your FixedValue. Typically, users of Byte Buddy take advantage of Advice when creating an agent that applies dynamic transformations of classes.
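As for the error in the update: the @Advice.Return parameter is read-only by default, which is exactly what the IllegalStateException complains about. Declaring it writable with readOnly = false lets the advice replace the return value. A minimal sketch of the advice class, adapted to the Bar() example from the question:

import net.bytebuddy.asm.Advice;

public class InjectionTemplate {

    // readOnly = false makes the annotated parameter writable; assigning to it
    // replaces the value the instrumented method returns.
    @Advice.OnMethodExit
    public static void exit(@Advice.Return(readOnly = false) String returned) {
        returned = "modified";
    }
}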

Groovy compiler fails with unexpected token on readObject

My Gradle project has a mix of Java and Groovy classes. All source is under src/main/groovy. One of my Groovy classes contains a Map that I have created from reading a JSON string via JsonSlurper.parseText(). This class is marked Serializable.
To avoid a NotSerializableException, I have implemented my own writeObject() and readObject() methods, but my code is not compiling. I didn't find many Groovy examples, but various Java references and tutorials told me to use these signatures:
private void writeObject(java.io.ObjectOutputStream out)
throws IOException
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException
My class looks like this:
import groovy.json.JsonBuilder
import groovy.json.JsonSlurper

class GroovyJSONMap implements Serializable {

    private static final long serialVersionUID = 20150902L

    Map myJSON = [:]

    GroovyJSONMap() {
        //no op
    }

    GroovyJSONMap(String json) {
        if (json) {
            try {
                setJSON(json)
            } catch (any) {
                println "WHOOPS! Not a JSON object...."
                myJSON = ["invalid": true]
            }
        }
    }

    GroovyJSONMap(Map json) {
        if (json) {
            setJSON(json)
        }
    }

    final void setJSON(String json) {
        myJSON = new JsonSlurper().parseText(json)
    }

    String getJSON() {
        new JsonBuilder(myJSON).toString()
    }

    @Override
    String toString() {
        getJSON()
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        setJSON((String)in.readObject())
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeObject(getJSON())
    }
}
The compiler error:
:clean
:compileJava UP-TO-DATE
:compileGroovy
startup failed:
c:\path\to\src\main\groovy\GroovyJSONMap.groovy: 44: unexpected token: ObjectInputStream @ line 110, column 29.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
                            ^
1 error
:compileGroovy FAILED
I have moved the readObject() method to various positions in the source, but it still is not compiling. The compiler does not complain about writeObject(), only readObject(). Why is my code not compiling?
The compiler points to ObjectInputStream, but the problem is really the parameter name in.
The word in is a reserved word in Groovy and cannot be used for a variable or method name.
The solution is to rename in to any word that is not reserved in Groovy, such as stream (writeObject() is also changed for consistency):
private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
    setJSON((String)stream.readObject())
}

private void writeObject(ObjectOutputStream stream) throws IOException {
    stream.writeObject(getJSON())
}

MethodDecorator.Fody and getting parameter values

I was wondering if, with MethodDecorator, it's possible to access the passed parameters during OnException. That would be great, since if I can catch an exception I can also have the passed parameter values.
Consider this piece of code
static void Main(string[] args)
{
    Worker worker = new Worker();
    worker.DoWork(6);
}
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor | AttributeTargets.Assembly | AttributeTargets.Module)]
public class LoggableAttribute : Attribute, IMethodDecorator
{
    public void OnEntry(System.Reflection.MethodBase method)
    {
        var args = method.GetParameters();
        var arguments = method.GetGenericArguments();
    }

    public void OnExit(System.Reflection.MethodBase method)
    {
    }

    public void OnException(System.Reflection.MethodBase method, Exception exception)
    {
    }
}
and
public class Worker
{
    [Loggable]
    public void DoWork(int i)
    {
    }
}
I wish to have access to the value 6 in OnEntry or OnException.
Thanks
I know this is an old question, but in case someone stumbles upon this like I did, you can add an Init method and capture the argument values there.
e.g:
public class LoggableAttribute : Attribute, IMethodDecorator
{
    private object[] arguments;

    public void Init(object instance, MethodBase method, object[] args) {
        this.arguments = args;
    }

    public void OnEntry()
    {
        // this.arguments[0] would be 6 when calling worker.DoWork(6);
    }
}
Check out the example on https://github.com/Fody/MethodDecorator

Initialize public static variable in Hadoop through arguments

I have a problem with changing public static variables in Hadoop.
I am trying to pass some values as arguments to the jar file from the command line. Here is my code:
public class MyClass {

    public static long myvariable1 = 100;

    public static class Map extends Mapper<Object, Text, Text, Text> {
        public static long myvariabl2 = 200;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
        }
    }

    public static void main(String[] args) throws Exception {
        col_no = Long.parseLong(args[0]);
        Map.myvariable1 = Long.parseLong(args[1]);
        Map.myvariable2 = Long.parseLong(args[1]);
        // other stuff here
    }
}
But it is not working: myvariable1 and myvariable2 always keep the values 100 and 200.
I use Hadoop 0.20.203 with Ubuntu 10.04
What you can do to get the same behavior is to store your variables in the Configuration you use to launch the job.
public static class Map extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        String var2String = conf.get("myvariable2");
        long myvariable2 = Long.parseLong(var2String);
        //etc.
    }
}

public static void main(String[] args) throws Exception {
    col_no = Long.parseLong(args[0]);
    String myvariable1 = args[1];
    String myvariable2 = args[1];

    // add values to configuration
    Configuration conf = new Configuration();
    conf.set("myvariable1", myvariable1);
    conf.set("myvariable2", myvariable2);

    //other stuff here
}
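For those values to actually reach the map tasks, the same Configuration object has to be passed to the Job before it is submitted; Hadoop ships that Configuration to every task, which is why context.getConfiguration() sees the values there. A minimal driver sketch follows (the driver class name, the job name, and the argument positions used for the input/output paths are placeholders of mine, not from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyClassDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("myvariable1", args[1]);
        conf.set("myvariable2", args[1]);

        // The job is built from this Configuration, so the values travel with it.
        Job job = new Job(conf, "myclass job");
        job.setJarByClass(MyClass.class);
        job.setMapperClass(MyClass.Map.class);
        job.setReducerClass(MyClass.Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[2]));
        FileOutputFormat.setOutputPath(job, new Path(args[3]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}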