How to serialize a Map to a byte array with protostuff

Is there any method to serialize a java.util.Map to a byte array when using protostuff?
I saw there is a MapSchema in the protostuff-collectionsschema.jar file, but I don't know how to use it.
Can anyone give me some sample code?
Thanks in advance.

If you need to serialize and deserialize a Map, you should wrap it as a field in a wrapper class (so that a schema can be created).
After that you can serialize/deserialize the data to a binary format (or JSON, if you want human-readable text) with RuntimeSchema.
public class Foo {
    private Map<Integer, String> map;
    // getters, setters and equals/hashCode omitted for brevity
}
Serialization and deserialization code might look like this:
private final LinkedBuffer BUFFER = LinkedBuffer.allocate();
private final Schema<Foo> SCHEMA = RuntimeSchema.getSchema(Foo.class);
@Test
public void serializeAndDeserialize() throws Exception {
Foo foo = createFooInstance();
byte[] bytes = serialize(foo);
Foo x = deserialize(bytes);
Assert.assertEquals(foo, x);
}
private byte[] serialize(Foo foo) throws java.io.IOException {
return ProtobufIOUtil.toByteArray(foo, SCHEMA, BUFFER);
}
private Foo deserialize(byte[] bytes) {
Foo tmp = SCHEMA.newMessage();
ProtobufIOUtil.mergeFrom(bytes, tmp, SCHEMA);
return tmp;
}
Full source code for this example: RuntimeSchemaUsage.java
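One caveat worth noting: if the LinkedBuffer is reused across calls (as the BUFFER field above is), it should be cleared after each write, otherwise a later serialization appends to stale data. A small variation of the serialize method following the usual protostuff pattern:
private byte[] serialize(Foo foo) throws java.io.IOException {
    try {
        return ProtobufIOUtil.toByteArray(foo, SCHEMA, BUFFER);
    } finally {
        // reset the shared buffer so the next serialization starts clean
        BUFFER.clear();
    }
}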

An example can be found in the protostuff-collectionschema test code:
public class StringMapSchema<V> extends MapSchema<String,V>
{
/**
* The schema for Map<String,String>
*/
public static final StringMapSchema<String> VALUE_STRING = new StringMapSchema<String>(null)
{
protected void putValueFrom(Input input, MapWrapper<String,String> wrapper,
String key) throws IOException
{
wrapper.put(key, input.readString());
}
protected void writeValueTo(Output output, int fieldNumber, String value,
boolean repeated) throws IOException
{
output.writeString(fieldNumber, value, repeated);
}
protected void transferValue(Pipe pipe, Input input, Output output, int number,
boolean repeated) throws IOException
{
input.transferByteRangeTo(output, true, number, repeated);
}
};
/**
* The schema of the message value.
*/
public final Schema<V> vSchema;
/**
* The pipe schema of the message value.
*/
public final Pipe.Schema<V> vPipeSchema;
public StringMapSchema(Schema<V> vSchema)
{
this(vSchema, null);
}
public StringMapSchema(Schema<V> vSchema, Pipe.Schema<V> vPipeSchema)
{
this.vSchema = vSchema;
this.vPipeSchema = vPipeSchema;
}
protected final String readKeyFrom(Input input, MapWrapper<String,V> wrapper)
throws IOException
{
return input.readString();
}
protected void putValueFrom(Input input, MapWrapper<String,V> wrapper, String key)
throws IOException
{
wrapper.put(key, input.mergeObject(null, vSchema));
}
protected final void writeKeyTo(Output output, int fieldNumber, String value,
boolean repeated) throws IOException
{
output.writeString(fieldNumber, value, repeated);
}
protected void writeValueTo(Output output, int fieldNumber, V value,
boolean repeated) throws IOException
{
output.writeObject(fieldNumber, value, vSchema, repeated);
}
protected void transferKey(Pipe pipe, Input input, Output output, int number,
boolean repeated) throws IOException
{
input.transferByteRangeTo(output, true, number, repeated);
}
protected void transferValue(Pipe pipe, Input input, Output output, int number,
boolean repeated) throws IOException
{
if(vPipeSchema == null)
{
throw new RuntimeException("No pipe schema for value: " +
vSchema.typeClass().getName());
}
output.writeObject(number, pipe, vPipeSchema, repeated);
}
}
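If you prefer to use the MapSchema family directly instead of a wrapper class, usage could look roughly like this. This is only a sketch: it assumes MapSchema implements Schema<Map<K,V>> so that the IO utilities accept it, and it merges into a pre-created HashMap rather than relying on Schema.newMessage():
Map<String, String> map = new HashMap<String, String>();
map.put("hello", "world");

LinkedBuffer buffer = LinkedBuffer.allocate(512);
byte[] bytes;
try {
    // serialize the map with the predefined String-to-String schema
    bytes = ProtostuffIOUtil.toByteArray(map, StringMapSchema.VALUE_STRING, buffer);
} finally {
    buffer.clear();
}

// deserialize into an existing map instance
Map<String, String> copy = new HashMap<String, String>();
ProtostuffIOUtil.mergeFrom(bytes, copy, StringMapSchema.VALUE_STRING);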

Related

Jackson Serialization Problems

I am having some trouble serializing/deserializing my classes below.
My Data class holds a list of other classes.
When I call the serialize/deserialize methods in the Data class, I get the following error:
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Can not construct instance of com.amazon.rancor.storage.types.ChildData: no suitable constructor found, can not deserialize from Object value (missing default constructor or creator, or perhaps need to add/enable type information?)
The error comes from the deserialize method. But I also believe the serialization is not working properly. This is what the serialized Data object looks like:
{childData:[{zipCode:{present:true},countryCode:"US"}]
The Optional field is not being serialized properly even though I have registered the module with objectMapper.registerModule(new Jdk8Module());
I can't seem to figure out what I am doing wrong. Maybe I need to change something in ChildData and ChildDataV2 class. But I am not sure what.
Any pointers would be appreciated!
public class Data {
private List<ChildData> childData;
private List<ChildDataV2> childDataV2;
private static ObjectMapper objectMapper;
static {
objectMapper = new ObjectMapper();
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
objectMapper.registerModule(new Jdk8Module());
}
public Data() { }
@JsonCreator
public Data(@JsonProperty("childData") final List<ChildData> childData,
@JsonProperty("childDataV2") final List<ChildDataV2> childDataV2) {
this.childData = childData;
this.childDataV2 = childDataV2;
}
public List<ChildData> getChildData() {
return childData;
}
public void setChildData(final List<ChildData> childData) {
this.childData = childData;
}
public List<ChildDataV2> getChildDataV2() {
return childDataV2;
}
public void setChildDataV2(final List<ChildDataV2> childDataV2) {
this.childDataV2 = childDataV2;
}
public String serialize() {
try {
return objectMapper.writeValueAsString(this);
} catch (JsonProcessingException e) {
throw new RuntimeException("Failed to serialize. Data: " + this, e);
}
}
public Data deSerialize(final String data) {
try {
return objectMapper.readValue(data, Data.class);
} catch (IOException e) {
throw new RuntimeException("Failed to deserialize. Data" + data, e);
}
}
}
public class ChildData {
private final String countryCode;
private final Optional<String> zipCode;
public ChildData(final String countryCode, final Optional<String> zipCode) {
this.countryCode = countryCode;
this.zipCode = zipCode;
}
public Optional<String> getZipCode() {
return zipCode;
}
public String getCountryCode() {
return countryCode;
}
}
public class ChildDataV2 extends ChildData {
private final Object struct;
public ChildDataV2(final String cc, final Optional<String> postalCode,
final Object struct) {
super(cc, postalCode);
this.struct = struct;
}
}
The exception is quite clear, right? You need to add a default constructor for ChildData or annotate the existing constructor like this:
@JsonCreator
public ChildData(@JsonProperty("countryCode") String countryCode, @JsonProperty("zipCode") Optional<String> zipCode) {
this.countryCode = countryCode;
this.zipCode = zipCode;
}
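For the Optional field, registering the Jdk8Module on the mapper that actually performs the writing is what makes it serialize as its value instead of {present:true}. A minimal, self-contained sketch of the annotated ChildData (the sample values are made up):
import java.util.Optional;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jdk8.Jdk8Module;

public class ChildData {
    private final String countryCode;
    private final Optional<String> zipCode;

    // Jackson can now build the object without a default constructor
    @JsonCreator
    public ChildData(@JsonProperty("countryCode") String countryCode,
                     @JsonProperty("zipCode") Optional<String> zipCode) {
        this.countryCode = countryCode;
        this.zipCode = zipCode;
    }

    public String getCountryCode() { return countryCode; }
    public Optional<String> getZipCode() { return zipCode; }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(new Jdk8Module()); // renders Optional as its value
        String json = mapper.writeValueAsString(new ChildData("US", Optional.of("98109")));
        System.out.println(json); // {"countryCode":"US","zipCode":"98109"}
        ChildData back = mapper.readValue(json, ChildData.class);
        System.out.println(back.getZipCode().orElse("none"));
    }
}
ChildDataV2 needs the same treatment (a @JsonCreator constructor with @JsonProperty parameters), since deserializing the subclass fails for the same reason.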

Data loss while inserting data to BigQuery using Apache beam BigQueryIO

I am using the code below to insert data into BQ using Apache Beam's BigQueryIO. I read data from Kafka (Beam KafkaIO), process it, create a PCollection of String, and then stream it to BQ. While writing data to BQ, not all records are written to the table, and no exception is thrown either.
public class ConvertToTableRow extends DoFn<String, TableRow> {
/**
*
*/
private static final long serialVersionUID = 1L;
private StatsDClient statsdClient;
private String statsDHost;
private int statsDPort = 9125;
public ConvertToTableRow(String statsDHost) {
this.statsDHost = statsDHost;
}
@Setup
public void startup() {
this.statsdClient = new NonBlockingStatsDClient("Metric", statsDHost, statsDPort);
}
@ProcessElement
public void processElement(#Element String record, ProcessContext context) {
try {
statsdClient.incrementCounter("bq.message");
TableRow row = new TableRow();
row.set("name", "Value");
Long timestamp = System.currentTimeMillis();
DateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
Date date = new Date(timestamp);
String insertDate = dateFormater.format(date);
row.set("insert_date", insertDate);
context.output(row);
} catch (Exception e) {
statsdClient.incrementCounter("exception.bq.message");
}
}
@Teardown
public void teardown() {
this.statsdClient.close();
}
}
private void streamWriteOutputToBQ(PCollection<TableRow> bqTableRows) {
String tableSchema = //tableSchema;
bqTableRows
.apply((BigQueryIO.writeTableRows().skipInvalidRows().withMethod(Method.STREAMING_INSERTS)
.to("myTable").withJsonSchema(tableSchema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)));
}
I am not sure if I am missing any configuration for BigQueryIO.
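One thing worth checking: skipInvalidRows() makes the streaming insert drop rejected rows silently, so the loss never surfaces as an exception. A sketch (not a confirmed fix) that captures those rows from the WriteResult; the table name and schema are the placeholders from the snippet above:
WriteResult result = bqTableRows.apply(
    BigQueryIO.writeTableRows()
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
        .to("myTable")
        .withJsonSchema(tableSchema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

// Log every row BigQuery refused so the missing records become visible.
result.getFailedInserts().apply(ParDo.of(new DoFn<TableRow, Void>() {
    @ProcessElement
    public void processElement(@Element TableRow row) {
        System.err.println("Failed insert: " + row);
    }
}));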

How to do failure tolerance for Flink to sink data to hdfs as gzip compression?

We want to write compressed data to HDFS with Flink's BucketingSink or StreamingFileSink. I have written my own Writer, which works fine if no failure occurs. However, when it encounters a failure and restarts from a checkpoint, it will generate a valid-length file (Hadoop < 2.7) or truncate the file. Unfortunately gzip files are binary files with a trailer at the end, so simple truncation does not work in my case. Any ideas how to enable exactly-once semantics for a compressed HDFS sink?
That's my writer's code:
public class HdfsCompressStringWriter extends StreamWriterBaseV2<JSONObject> {
private static final long serialVersionUID = 2L;
/**
* The {@code CompressFSDataOutputStream} for the current part file.
*/
private transient GZIPOutputStream compressionOutputStream;
public HdfsCompressStringWriter() {}
@Override
public void open(FileSystem fs, Path path) throws IOException {
super.open(fs, path);
this.setSyncOnFlush(true);
compressionOutputStream = new GZIPOutputStream(this.getStream(), true);
}
public void close() throws IOException {
if (compressionOutputStream != null) {
compressionOutputStream.close();
compressionOutputStream = null;
}
resetStream();
}
@Override
public void write(JSONObject element) throws IOException {
if (element == null || !element.containsKey("body")) {
return;
}
String content = element.getString("body") + "\n";
compressionOutputStream.write(content.getBytes());
compressionOutputStream.flush();
}
@Override
public Writer<JSONObject> duplicate() {
return new HdfsCompressStringWriter();
}
}
I would recommend implementing a BulkWriter for the StreamingFileSink which compresses the elements via a GZIPOutputStream. The code could look like the following:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.enableCheckpointing(1000);
final DataStream<Integer> input = env.addSource(new InfinitySource());
final StreamingFileSink<Integer> streamingFileSink = StreamingFileSink.<Integer>forBulkFormat(new Path("output"), new GzipBulkWriterFactory<>()).build();
input.addSink(streamingFileSink);
env.execute();
}
private static class GzipBulkWriterFactory<T> implements BulkWriter.Factory<T> {
@Override
public BulkWriter<T> create(FSDataOutputStream fsDataOutputStream) throws IOException {
final GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fsDataOutputStream, true);
return new GzipBulkWriter<>(new ObjectOutputStream(gzipOutputStream), gzipOutputStream);
}
}
private static class GzipBulkWriter<T> implements BulkWriter<T> {
private final GZIPOutputStream gzipOutputStream;
private final ObjectOutputStream objectOutputStream;
public GzipBulkWriter(ObjectOutputStream objectOutputStream, GZIPOutputStream gzipOutputStream) {
this.gzipOutputStream = gzipOutputStream;
this.objectOutputStream = objectOutputStream;
}
@Override
public void addElement(T t) throws IOException {
objectOutputStream.writeObject(t);
}
@Override
public void flush() throws IOException {
objectOutputStream.flush();
}
@Override
public void finish() throws IOException {
objectOutputStream.flush();
gzipOutputStream.finish();
}
}
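Since the original writer emits newline-delimited text rather than Java-serialized objects, a closer equivalent under the same BulkWriter contract could gzip the raw bytes of each line (again just a sketch):
private static class GzipStringBulkWriter implements BulkWriter<String> {
    private final GZIPOutputStream out;

    GzipStringBulkWriter(GZIPOutputStream out) {
        this.out = out;
    }

    @Override
    public void addElement(String element) throws IOException {
        // write one newline-terminated record, matching the original writer's format
        out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void finish() throws IOException {
        // write the gzip trailer; the sink closes the underlying stream itself
        out.finish();
    }
}
Bulk formats in the StreamingFileSink roll on every checkpoint, so each finished part file is a complete gzip stream with a valid trailer, which is what gives you the exactly-once behaviour you are after.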

When attaching agent to running process, bytebuddy transformer doesn't seem to take effect

The code of my program to be attached is as below.
public class Foo {
}
public class TestEntry {
public TestEntry() {
}
public static void main(String[] args) throws Exception {
try
{
while(true)
{
System.out.println(new Foo().toString());
Thread.sleep(1000);
}
}
catch(Exception e)
{}
}
}
What I attempt to do is to make Foo.toString() return 'test' by using the following agent.
public class InjectionAgent {
public InjectionAgent() {
}
public static void agentmain(String args, Instrumentation inst) throws Exception
{
System.out.println("agentmain Args:" + args);
new AgentBuilder.Default()
.type(ElementMatchers.named("Foo"))
.transform(new AgentBuilder.Transformer() {
@Override
public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
ClassLoader arg2, JavaModule arg3) {
return arg0.method(ElementMatchers.named("toString"))
.intercept(FixedValue.value("test"));
}
}).installOn(inst);
}
public static void premain(String args, Instrumentation inst) throws Exception
{
System.out.println("premain Args:" + args);
new AgentBuilder.Default()
.type(ElementMatchers.named("Foo"))
.transform(new AgentBuilder.Transformer() {
@Override
public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
ClassLoader arg2, JavaModule arg3) {
return arg0.method(ElementMatchers.named("toString"))
.intercept(FixedValue.value("test"));
}
}).installOn(inst);
}
}
I notice that it was successful when using the -javaagent way, whereas the attach way failed. Here is the code for the attach:
public class Injection {
public Injection() {
}
public static void main(String[] args) throws AttachNotSupportedException, IOException, AgentLoadException, AgentInitializationException, InterruptedException {
VirtualMachine vm = null;
String agentjarpath = args[0];
vm = VirtualMachine.attach(args[1]);
vm.loadAgent(agentjarpath, "This is Args to the Agent.");
vm.detach();
}
}
I tried adding AgentBuilder.Listener.StreamWriting.toSystemOut() to the agent; after attaching, the output of TestEntry shows:
[Byte Buddy] DISCOVERY Foo [sun.misc.Launcher$AppClassLoader#33909752, null, loaded=true]
[Byte Buddy] TRANSFORM Foo [sun.misc.Launcher$AppClassLoader#33909752, null, loaded=true]
[Byte Buddy] COMPLETE Foo [sun.misc.Launcher$AppClassLoader#33909752, null, loaded=true]
Foo#7f31245a
Foo#6d6f6e28
Foo#135fbaa4
Foo#45ee12a7
Foo#330bedb4
==================================Update=====================================
I defined a public method 'Bar' in Foo like this
public class Foo {
public String Bar()
{
return "Bar";
}
}
and then I was trying to make Foo.Bar() return "modified" in the following way:
public static void agentmain(String args, Instrumentation inst) throws Exception
{
System.out.println("agentmain Args:" + args);
premain(args, inst);
new AgentBuilder.Default()
.with(RedefinitionStrategy.RETRANSFORMATION)
.disableClassFormatChanges()
.with(AgentBuilder.Listener.StreamWriting.toSystemOut())
.type(ElementMatchers.named("Foo"))
.transform(new AgentBuilder.Transformer() {
@Override
public Builder<?> transform(Builder<?> arg0, TypeDescription arg1,
ClassLoader arg2, JavaModule arg3) {
return arg0.visit(Advice.to(InjectionTemplate.class).on(ElementMatchers.named("Bar")));
}
})
.installOn(inst);
}
static class InjectionTemplate {
@Advice.OnMethodExit
static void exit(@Advice.Return String self) {
System.out.println(self.toString() + " " + self.getClass().toString());
self = new String("modified");
}
}
but I got this error:
java.lang.IllegalStateException: Cannot write to read-only parameter class java.lang.String at 1
any suggestions?
It does not seem like you are using redefinition for your agent. You can activate it using:
new AgentBuilder.Default()
.with(RedefinitionStrategy.RETRANSFORMATION)
.disableClassFormatChanges();
The last part is required on most JVMs (with the notable exception of the dynamic code evolution VM, a custom build of HotSpot). It tells Byte Buddy not to add fields or methods, which most VMs do not support.
In this case, it is no longer possible to invoke the original implementation of a method, which is however not required for your FixedValue. Typically, users of Byte Buddy take advantage of Advice when creating an agent that applies dynamic transformations of classes.
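As for the IllegalStateException in the update: the advice's return parameter is read-only by default, so assigning to it is rejected. Declaring the return value writable should work, roughly like this:
static class InjectionTemplate {
    @Advice.OnMethodExit
    static void exit(@Advice.Return(readOnly = false) String returned) {
        // assigning to the writable @Advice.Return parameter replaces the return value
        returned = "modified";
    }
}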

Initialize public static variable in Hadoop through arguments

I have a problem with changing public static variables in Hadoop.
I am trying to pass some values as arguments to the jar file from the command line.
Here is my code:
public class MyClass {
public static long myvariable1 = 100;
public static class Map extends Mapper<Object, Text, Text, Text> {
public static long myvariabl2 = 200;
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
}
}
public static class Reduce extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
}
}
public static void main(String[] args) throws Exception {
col_no = Long.parseLong(args[0]);
Map.myvariable1 = Long.parseLong(args[1]);
Map.myvariable2 = Long.parseLong(args[1]);
// other stuff here
}
}
But it is not working; myvariable1 & myvariable2 always have 100 & 200.
I use Hadoop 0.20.203 with Ubuntu 10.04
Static fields set in main() are only changed in the client JVM; the map and reduce tasks run in separate JVMs (often on other machines), so they still see the initial values. What you can do to get the same behavior is to store your variables in the Configuration you use to launch the job.
public static class Map extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
String var2String = conf.get("myvariable2");
long myvariable2 = Long.parseLong(var2String);
//etc.
}
}
public static void main(String[] args) throws Exception {
col_no = Long.parseLong(args[0]);
String myvariable1 = args[1];
String myvariable2 = args[1];
// add values to configuration
Configuration conf = new Configuration();
conf.set("myvariable1", myvariable1);
conf.set("myvariable2", myvariable2);
//other stuff here
}