Hadoop Serializer Not Found Exception - serialization

I have a job whose output format is SequenceFileOutputFormat.
I set the output key and value class like this:
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(SplitInfo.class);
The SplitInfo class implements both Serializable and Writable.
I set the io.serializations property as follows:
conf.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization,"
+ "org.apache.hadoop.io.serializer.WritableSerialization");
However, on the reducer side I get this error, telling me that Hadoop couldn't find a serializer:
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:961)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:892)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:393)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:476)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:61)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:638)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
Can anyone help, please?

The problem was that I was making a stupid mistake: I was not updating a jar. So, basically SplitInfo was not implementing the Writable interface in the old (in use) jar.
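One way to catch a stale jar like this is to ask the running JVM where a class was actually loaded from and whether it implements the interface you expect. The following is only a sketch using the standard library; ClassOrigin and describe are hypothetical names, not part of Hadoop.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop): reports where a class was loaded
// from and whether it implements a given interface.
class ClassOrigin {

    static String describe(Class<?> type, Class<?> expectedInterface) {
        // getCodeSource() is null for classes loaded by the bootstrap loader
        // (for example java.lang.String), so guard against that.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "<bootstrap or unknown>"
                : source.getLocation().toString();
        boolean implementsIt = expectedInterface.isAssignableFrom(type);
        return type.getName() + " loaded from " + location
                + ", implements " + expectedInterface.getSimpleName() + ": " + implementsIt;
    }

    public static void main(String[] args) {
        // JDK types are used here only so the sketch runs without Hadoop.
        System.out.println(describe(String.class, CharSequence.class));
        System.out.println(describe(String.class, Runnable.class));
    }
}
```

In the job from the question, the equivalent call would be describe(SplitInfo.class, Writable.class), logged from the task code. An old jar shows up as an unexpected location, or as "implements Writable: false".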
As a general observation: the error in the question is ultimately caused by Hadoop being unable to find a Serializer for a type you are trying to serialize (directly or indirectly, e.g. by using that type as an output key or value). Hadoop cannot find a Serializer for one of two reasons:
Your type is not serializable (i.e. it implements neither Writable nor Serializable).
No Serializer is available to Hadoop for the kind of serialization your type implements (e.g. your type implements Writable, but for one reason or another Hadoop cannot use the org.apache.hadoop.io.serializer.WritableSerialization class).

I think you're trying to do something you don't need to. Your output value only needs to implement the Writable interface and you should just set the output format.
conf.setOutputFormatClass(SequenceFileOutputFormat.class);
You only use the "io.serializations" configuration if you want to use a different serialization framework, which it doesn't look like you need.
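For reference, here is a rough sketch of the shape a Writable value type takes. The question does not show SplitInfo, so the fields (path, offset) are invented for illustration. To keep the example runnable without Hadoop on the classpath, the class does not declare the interface. In a real job it would add "implements org.apache.hadoop.io.Writable", whose two methods have the same signatures as write and readFields below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a value type in the shape Hadoop's Writable expects.
// Assumption: the fields are hypothetical; the real SplitInfo is not shown.
class SplitInfoSketch {
    private String path = "";
    private long offset;

    // A no-argument constructor lets a framework instantiate the type
    // before calling readFields().
    SplitInfoSketch() { }

    SplitInfoSketch(String path, long offset) {
        this.path = path;
        this.offset = offset;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(offset);
    }

    // Read the fields back in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        offset = in.readLong();
    }

    String getPath() { return path; }

    long getOffset() { return offset; }

    // Writes the object to a byte array and reads it into a fresh instance.
    static SplitInfoSketch roundTrip(SplitInfoSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            SplitInfoSketch copy = new SplitInfoSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round trip such as SplitInfoSketch.roundTrip(new SplitInfoSketch("part-0", 128L)) is a cheap local check that write and readFields agree before the class goes into a job.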


Kotlin type annotation

Consider the following annotation class
@Target(AnnotationTarget.TYPE)
annotation class ML(val size: Int)
By default, the retention policy is RUNTIME, thus this annotation must be accessible through reflection.
Now I have
val a: @ML(2) List<Int> = listOf(1)
which does compile, but, if examined in the debugger, one gets
a::class.annotations.size = 0
What am I doing incorrectly and what is the correct way to annotate types without wrapping things into classes and annotating properties?
The expression you used:
a::class.annotations
can be used to obtain the annotations on the runtime class of a. List is not annotated with anything, so you get no annotations. Given where you put the annotation, you actually want the annotations on the return type of property a:
::a.returnType.annotations
EDIT: I thought a was a property. What you want to do is impossible, because annotation information isn't stored for local variables on the JVM. See this question: Can I get information about the local variables using Java reflection? (It is about Java, but the same applies.) If a had been a class property or a top-level property, then what I showed would have applied.
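The same distinction can be illustrated in plain Java, as an analogue only. This sketch does not show how Kotlin stores its own type annotations, and ML, Holder and TypeAnnotationDemo are names made up for it. The annotation lives on the declared type of a field, which reflection can reach, and not on the runtime class of the value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedType;
import java.util.Collections;
import java.util.List;

// Java counterpart of a type annotation: usable wherever a type is written.
@Target(ElementType.TYPE_USE)
@Retention(RetentionPolicy.RUNTIME)
@interface ML {
    int size();
}

class Holder {
    // The annotation is attached to the declared type of the field,
    // not to the List class and not to the list object.
    @ML(size = 2) List<Integer> a = Collections.singletonList(1);
}

class TypeAnnotationDemo {

    // Reads the annotation from the field's declared (annotated) type.
    static int sizeOnFieldType() {
        try {
            AnnotatedType type = Holder.class.getDeclaredField("a").getAnnotatedType();
            ML ml = type.getAnnotation(ML.class);
            return ml == null ? -1 : ml.size();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks for the annotation on the runtime class of the value instead,
    // which is the counterpart of a::class.annotations.
    static boolean presentOnRuntimeClass() {
        return new Holder().a.getClass().isAnnotationPresent(ML.class);
    }
}
```

Here sizeOnFieldType() finds the annotation, while presentOnRuntimeClass() does not. A local variable has no such reflective handle at all, which is the limitation described in the EDIT above.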

How to specify when giving a generic parameter that it should implement some specific creation method?

How can I specify, for a generic parameter, that the actual type must provide a specific creation procedure? LIST[G -> create make end] doesn't work :-(
In my particular case,
* SMA_INVERTER_MANAGER_CSV has inherited from CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV a list of devices declared as devices: LINKED_SET[G], with G -> MEASURING_POINT_MODBUS_DEVICE create make_from_file_path end.
I'd like devices: LINKED_SET[G] in the SMA_INVERTER_MANAGER_CSV class to be able to hold JANITZA_DEVICE, SUNSPEC_DEVICE, ABB_DEVICE, etc. Giving MEASURING_POINT_MODBUS_DEVICE as the generic parameter seems to make sense, but how do I specify that the creation procedure should be make_from_file_path?
I hope the description is sufficient to understand. I think this question is linked -> explicit creation type not conforming to type of target
The only workaround I have found so far is
class
SMA_INVERTER_MANAGER_CSV
inherit
CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV[SUNSPEC_DEVICE]
create
make
end
but I'd like it to be
class
SMA_INVERTER_MANAGER_CSV
inherit
CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV[MEASURING_POINT_MODBUS_DEVICE]
create
make
end
which would generate a conformance problem, because the generic parameter MEASURING_POINT_MODBUS_DEVICE doesn't provide make_from_file_path as a creation procedure, as it is deferred.
There is more than a conformance problem. MEASURING_POINT_MODBUS_DEVICE is deferred. Therefore, it cannot be used as an actual parameter for CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV. If it were allowed, how would CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV create an instance of a deferred class?
One possible solution — supplying an effective class — is mentioned in the question. Another solution is to add a formal generic parameter to SMA_INVERTER_MANAGER_CSV with the corresponding constraint and to use it for the actual generic of CONSUMPTION_SECTOR_MODBUS_DEVICE_CSV.

Combine JsonDeserialize#contentAs with JsonDeserialize#contentConverter or JsonDeserialize#contentUsing for custom deserialization

In the JsonDeserialize annotation documentation, the contentAs field is supposed to define the "Concrete type to deserialize content".
I tried to use this in combination with either a Converter (via the contentConverter field of the same annotation) or a JsonDeserializer (via the contentUsing field of the same annotation), by extending either StdConverter or StdDeserializer, respectively, in an attempt to create an agnostic custom deserializer.
I cannot find a way to access the JsonDeserialize#contentAs information inside any of these two classes.
I am aware that the classes I extend from have a type parameter, I just put an Object class there. Documentation states
contentAs Concrete type to deserialize content (elements of a Collection/array, values of Maps) values as, instead of type otherwise declared. Must be a subtype of declared type; otherwise an exception may be thrown by deserializer.
I am applying the @JsonDeserialize annotation to a Collection of some persistable class. I want to deserialize each such object solely by knowing its id. Well, if I could only get that very type I defined in the @JsonDeserialize#contentAs field...
Can anyone tell me if this is possible anyhow?
I managed to implement the agnostic deserializer without using @JsonDeserialize#contentAs after all.
After reading the javadocs of com.fasterxml.jackson.databind.JsonDeserializer I concluded that my custom deserializer should implement the com.fasterxml.jackson.databind.deser.ContextualDeserializer interface.
Inside the implementation of ContextualDeserializer#createContextual(DeserializationContext ctxt, BeanProperty property)
I could finally get access to the class type of the content of the collection, which I applied the #JsonDeserialize annotation on,
by calling:
ctxt.getContextualType().getRawClass()
NOTE that the same call inside the implementation of com.fasterxml.jackson.databind.JsonDeserializer#deserialize(com.fasterxml.jackson.core.JsonParser, com.fasterxml.jackson.databind.DeserializationContext) returned null, hence the need for the aforementioned interface.
All I had to do then was store the returned class in a member field (of type Class<?>) of the custom deserializer and use it in JsonDeserializer#deserialize().
The only thing that remains to check is whether an instance of this custom deserializer is shared between threads. I only did some minor checks; I used the same implementation for two different collections of different types. I observed that ContextualDeserializer#createContextual(DeserializationContext ctxt, BeanProperty property) was called once (across multiple deserialization invocations) for each distinct type that was going to be deserialized. After checking in the debugger, it seems that the same deserializer object is used for the same type. In my case, since what I store in the member field is this type itself, I don't mind if the same deserializer is reused for the same Java type, because the field would contain the same value anyway. So we're clear on this aspect as well.
EDIT: It appears all I have to do is set the com.fasterxml.jackson.databind.deser.std.StdDeserializer#_valueClass value to the now known class. Since that field is final, and since ContextualDeserializer#createContextual(DeserializationContext ctxt, BeanProperty property) returns a JsonDeserializer object, which is the one actually used,
instead of returning "this" deserializer I can create a new one, passing the discovered class to the constructor, which sets StdDeserializer#_valueClass to the class I actually want, and I'm all set!
Finally, NOTE that I didn't have to use the @JsonDeserialize#contentAs annotation field, as I get the value from the ctxt.getContextualType().getRawClass() statement inside the ContextualDeserializer#createContextual(DeserializationContext ctxt, BeanProperty property) implementation.

Using the DoctrineObjectConstructor, how are new entities created?

I am attempting to use JMSSerializerBundle to consume JSON into Doctrine entities. I need to both create new entities where they do not already exist in the database, and update existing entities when they do already exist. I am using the DoctrineObjectConstructor included in the JMSSerializer package to help with this. When I consume JSON which contains a property designated as an identifier, such as:
{
"id": 1,
"some_other_attribute": "stuff"
}
by attempting to deserialize it, JMSSerializer causes warnings and eventually dies with an exception for attempting to utilize reflection to set properties on a null value. The warnings all look like this:
PHP Warning: ReflectionProperty::setValue() expects parameter 1 to be object, null given in /Users/cdonadeo/Repos/Ubertester/vendor/jms/serializer/src/JMS/Serializer/GenericDeserializationVisitor.php on line 176
If I manually insert an entity with ID 1 in my database and make another attempt then I receive no errors and everything appears to be working correctly, but I'm now short half my functionality. I looked at the code for the DoctrineObjectConstructor class, and at the top is a comment:
/**
* Doctrine object constructor for new (or existing) objects during deserialization.
*/
But I don't see how it could possibly create a new entity, because after the construct() function has done all of its checks, at the end it calls:
$object = $objectManager->find($metadata->name, $identifierList);
And since the identifier does not exist in the database the result is null which is ultimately what gets returned from the function. This explains why inserting a row in the database with the appropriate ID makes things work: find() now returns a proper Entity object, which is what the rest of the library expects.
Am I using the library wrong or is it broken? I forked the Git repo and made an edit, and trying it out everything seems to work more or less the way I expected. That edit does have some drawbacks that make me wonder if I'm not just making this more difficult than it has to be. The biggest issue I see is that it will cause persisted and unpersisted entities to be mixed together with no way to tell which ones are which, but I don't know if that's even a big deal.
For Doctrine entities, use this configuration:
jms_serializer:
    object_constructors:
        doctrine:
            fallback_strategy: "fallback" # possible values ("null" | "exception" | "fallback")
See the configuration reference: https://jmsyst.com/bundles/JMSSerializerBundle/master/configuration

DataIntegrityViolationException when I change a List variable to ArrayList

I have a grails project that is throwing the following exception:
org.springframework.dao.DataIntegrityViolationException: could not delete: [Role#4]; SQL [delete from role where id=? and version=?]; constraint [null]; nested exception is org.hibernate.exception.ConstraintViolationException: could not delete: [Role#4]
In my Role domain class, all I did to trigger this error was change the declaration of one of the variables from
List<RoleTool> roleTools = new ArrayList<RoleTool>()
to
ArrayList<RoleTool> roleTools = new ArrayList<RoleTool>()
Why is that?
It's bad practice in general to specify a concrete class as the declaration type, both in variable declarations and in method signatures. Unless you really need it to be an ArrayList, leave it as List to allow more flexibility.
I'm not entirely sure what's happening here, but Hibernate has its own collections classes that it uses for mapped collections, the most commonly used being org.hibernate.collection.PersistentList and org.hibernate.collection.PersistentSet. These implement the List and Set interfaces respectively, but do not extend ArrayList or HashSet or any typical concrete collection. Instead they're Hibernate internal classes that monitor changes to help with dirty detection when persisting, flushing, etc.
It's fine to declare the initial collection as an ArrayList since it's only read from when saving (it's Groovy though, so it's a lot cleaner to just use List<RoleTool> roleTools = []). But Hibernate needs the flexibility of implementing the List/Set interface when loading persistent instances.
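The interface-versus-concrete-class point can be sketched in plain Java without Hibernate. TrackingList below is a hypothetical stand-in for a framework collection such as PersistentList, and setting the field through reflection is only an assumption about how a persistence layer might swap in its own collection. It is not a claim about what Hibernate or GORM does internally, nor an explanation of the exact exception in the question.

```java
import java.lang.reflect.Field;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a framework collection: it implements List
// but does not extend ArrayList.
class TrackingList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();

    @Override public E get(int index) { return delegate.get(index); }

    @Override public int size() { return delegate.size(); }

    @Override public void add(int index, E element) { delegate.add(index, element); }
}

class RoleWithInterface {
    List<String> roleTools = new ArrayList<>();      // declared by interface
}

class RoleWithConcrete {
    ArrayList<String> roleTools = new ArrayList<>(); // declared by concrete class
}

class CollectionInjection {

    // Tries to replace the roleTools field with a TrackingList, the way a
    // framework might install its own collection on a loaded entity.
    static boolean canInject(Object entity) {
        try {
            Field field = entity.getClass().getDeclaredField("roleTools");
            field.setAccessible(true);
            field.set(entity, new TrackingList<String>());
            return true;
        } catch (IllegalArgumentException e) {
            // The value is not assignable to the field's declared type.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these classes, canInject(new RoleWithInterface()) succeeds, while canInject(new RoleWithConcrete()) fails because a TrackingList is not an ArrayList. That is the flexibility a List declaration keeps open.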