Spark SQL protocol buffer support

I have been trying to use protocol buffers (v2.5.x) with Java RDDs and Datasets and have Spark infer the schema.
However, Spark fails on bean fields whose type is a protocol buffer message.
Given a proto:
message FooProto {
required string name = 1;
required string value = 2;
}
And an attempt to build a Java bean:
@Builder
@Data
@NoArgsConstructor
@AllArgsConstructor
public class TestProto implements Serializable {
private FooProto foobar;
}
And then using it in Spark (Java):
FooProto proto = FooProto.newBuilder()
.setName("foo")
.setValue("value")
.build();
TestProto testProto = TestProto.builder()
.foobar(proto)
.build();
JavaRDD<TestProto> rdd = sparkContext.parallelize(Arrays.asList(testProto));
Dataset<TestProto> dataSet = sqlc.createDataset(rdd.rdd(), Encoders.bean(TestProto.class));
dataSet.show();
throws the following error
java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class com.google.protobuf.Descriptors$Descriptor
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:123)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:117)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:133)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$1.apply(JavaTypeInference.scala:131)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:131)
at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:86)
at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
at org.apache.spark.sql.Encoders.bean(Encoders.scala)
I understand that the Javadocs suggest that inference is supported for nested Java beans.
My questions are:
1. Does native support already exist within Spark, and is there a glaring error above that, once fixed, would resolve the problem?
2. Does a library exist that adds native support for protocol buffers in Spark SQL Datasets?
The workaround would be to create serializable POJOs for every protobuf message.
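A minimal sketch of that workaround, assuming a hand-written bean per message is acceptable. FooBean and BeanProperties are hypothetical names, not part of Spark or protobuf. As far as I understand, Encoders.bean discovers fields through a class's bean getters and recurses into each property type, so a flat bean with only String properties gives it nothing circular to follow. The helper uses java.beans.Introspector to list the properties such a bean exposes.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain bean mirroring the two fields of FooProto.
class FooBean implements Serializable {
    private String name;
    private String value;

    public FooBean() {}  // no-args constructor, expected of a Java bean

    public FooBean(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

// Hypothetical helper: lists the readable bean properties of a class.
class BeanProperties {
    static List<String> of(Class<?> type) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                // Skip the implicit "class" property that every object exposes via getClass().
                if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Only the two String properties are visible; nothing leads back to a protobuf Descriptor.
        System.out.println(BeanProperties.of(FooBean.class));
    }
}
```

A FooProto instance could then be copied into the bean with new FooBean(proto.getName(), proto.getValue()) before building the Dataset with Encoders.bean(FooBean.class). That last step is an assumption and has not been run against Spark here.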

Related

Ktor - create List from Json file

I am getting two errors at object : TypeToken<List<Todo>>(): "This class does not have a constructor" and "object is not abstract and does not implement object member".
data class Todo(
val identifier: Long ,
val name: String ,
val description: String
)
class DefaultData {
private lateinit var myService: MyService
@PostConstruct
fun initializeDefault() {
val fileContent = this::class.java.classLoader.getResource("example.json").readText()
val todos: List<Todo> = Gson().fromJson(fileContent, object : TypeToken<List<Todo>>() {}.type)
myService.createTodoFromJson(todos)
}
}
How can I fix this?
The objective is to create an endpoint that gets its data from a JSON file via a service.
Is there a full-fledged example?
Also, how do I create interfaces in Ktor? I want to use dependency inversion so that data can be retrieved from different sources.
Kotlin has a built-in utility similar to TypeToken, so I suggest using it instead:
Gson().fromJson(fileContent, typeOf<List<Todo>>().javaType)
You will need to add a dependency on kotlin-reflect. The typeOf() function is marked as experimental, but I have used it for some time and never had any problems with it.
Also, you said in your comment that this is a starter project. If you don't have any existing code, I suggest using kotlinx-serialization instead of Gson. It is a de facto standard in Kotlin.
You can easily take advantage of kotlinx-serialization.
Steps:
Add the Kotlin serialization plugin to your build.gradle file:
kotlin("plugin.serialization") version "1.5.20"
plugins {
application
java
id("org.jetbrains.kotlin.jvm") version "1.5.21"
kotlin("plugin.serialization") version "1.5.20"
}
Add the dependency for the serialization library:
dependencies {
...
implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.2.2")
}
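Decode your JSON string to the corresponding object using the Json decodeFromString method (Todo must be annotated with @Serializable):
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
val JSON = Json { isLenient = true }
val mytodos = JSON.decodeFromString<List<Todo>>(message)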

Kotlin : Type mismatch when class implement Serializable

I have a project written in both Java and Kotlin (forgive my very limited Kotlin understanding).
The class I would like to serialize has implemented the Serializable interface.
Error Message:
Type mismatch: inferred type is EntityRelationship but Serializable! was expected
public class EntityRelationship implements Serializable {
@NonNull
private Relationship relationship;
@Builder.Default
private List<Relationship> childRelationship = new ArrayList<>();
}
When I tried to serialize it
val entityRelationship: EntityRelationship = relationshipList.get(0)
val serializedResponse = SerializationUtils.serialize(entityRelationship) //<--Trouble in this line
I do not understand why the type mismatch happens here, because EntityRelationship already implements the Serializable interface. Even if I force the type:
val entityRelationship: Serializable = relationshipList.get(0)
I still get the same error. (P.S.: relationshipList is a List.) My lack of Kotlin knowledge prevents me from pinpointing the issue. Any pointer is appreciated.
First of all, the Relationship class should implement Serializable as well. You may also need to make your serialize function generic and cast your class when calling SerializationUtils.
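A small sketch of the first point, using hypothetical stand-in classes (Holder, PlainRelationship, SerializableRelationship) rather than the real EntityRelationship and Relationship. It does not reproduce the compile-time type mismatch from the question. It only illustrates that Java serialization fails at runtime when a nested field's class does not implement Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a nested class that does NOT implement Serializable.
class PlainRelationship {
    String label = "parent";
}

// The same class, but marked Serializable.
class SerializableRelationship implements Serializable {
    String label = "parent";
}

// Hypothetical stand-in for the outer class; its field is serialized along with it.
class Holder implements Serializable {
    private final Object relationship;

    Holder(Object relationship) {
        this.relationship = relationship;
    }
}

class SerializationCheck {
    // Returns true if Java serialization of the whole object graph succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when any object in the graph is not Serializable.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Holder(new SerializableRelationship()))); // true
        System.out.println(canSerialize(new Holder(new PlainRelationship())));        // false
    }
}
```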

Receiving NoClassDefFoundError when invoking generated serializer() method

I’m getting a NoClassDefFoundError when trying to invoke the Foo.serializer() method on a @Serializable class.
Here's my test case:
@Serializable
data class Foo(val data: String)
val jsonString = json.stringify(
Foo.serializer(), // <= Error happens here
Foo(data = "foo")
)
Attempting to run the code results in the following stack trace:
java.lang.NoSuchMethodError: 'void kotlinx.serialization.internal.SerialClassDescImpl.<init>(java.lang.String, kotlinx.serialization.internal.GeneratedSerializer, int)'
at com.example.Foo$$serializer.<clinit>(Foo.kt:7)
at com.example.Foo$Companion.serializer(Foo.kt)
This is the result of a version mismatch between Kotlin and kotlinx.serialization, which are relatively tightly coupled. In my case I was using Kotlin 1.3.71 and kotlinx.serialization 0.14.0, so the solution was to upgrade kotlinx.serialization to 0.20.0.

spring data rest kotlin associations POST

I followed the tutorial http://www.baeldung.com/spring-data-rest-relationships.
I also observed that I can create the association directly by providing the link to the relationship.
curl -i -X POST -H "Content-Type:application/json" -d '{"name":"My Library"}' http://localhost:8080/libraries
curl -i -X POST -d '{"title":"Books", "library":"http://localhost:8080/libraries/1"}' -H "Content-Type:application/json" http://localhost:8080/books
This works fine in Java and also in Kotlin when using a regular class.
However, if I use a data class in Kotlin, I get the following error
2018-04-26 14:13:43.730 ERROR 79256 --- [nio-8080-exec-2] b.e.h.RestResponseEntityExceptionHandler : org.springframework.http.converter.HttpMessageNotReadableException: JSON parse error: Cannot construct instance of com.baeldung.models.Library (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('http://localhost:8080/libraries/1'); nested exception is com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of com.baeldung.models.Library (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('http://localhost:8080/libraries/1') at [Source: (org.apache.catalina.connector.CoyoteInputStream); line: 1, column: 29] (through reference chain: com.baeldung.models.Book["library"])
I do have the relevant kotlin-spring, kotlin-jpa and kotlin-noarg plugins in my project.
Code is here https://github.com/vijaysl/spring-data-rest
Try adding the @JsonCreator(mode = JsonCreator.Mode.DISABLED) annotation on the primary constructor. There is no need to disable com.fasterxml.jackson.module:jackson-module-kotlin.
Explanation:
The Kotlin Jackson module assumes that your primary constructor is the JSON creator (see the KotlinValueInstantiator class).
Therefore, Spring Data REST does not apply its bean deserializer modifier (which is supposed to load a bean by URI), because bean property mappings are not used for creator properties (constructor parameters).
KotlinValueInstantiator tries to deserialize the constructor parameters using standard deserializers and instantiators, and this leads to the error you mentioned.
Possible solution:
Since the kotlin-jpa plugin adds a default empty constructor for JPA, you can instruct Jackson to use that empty constructor instead of the JSON creator by explicitly disabling the creator.
Example:
@Entity
class Book @JsonCreator(mode = JsonCreator.Mode.DISABLED) constructor(
@ManyToMany
val libraries: ModifiableList<Library> = ArrayList(),
): AbstractPersistable<Long>(), Identifiable<Long>
Kotlin data classes are pretty strict. Basically, the error is telling you that Jackson can't construct your POKO, and it lists some of the ways it tried. One of them is a String constructor. Others are through private field manipulation (which is the way it has normally been done).
In a Kotlin data class, a field declared as private val name: String translates (in Java) to private final String name;. Jackson can't assign to a final field (assigning to a private field is already dirty, but it's impossible when the field is final; the JVM won't allow it), and there are no getName() or setName() functions that could be used as another method of hydration.
Some options:
Declare your variables as var instead of val. private var name: String is the Java equivalent of private String name, which will use field-based (dirty) hydration.
Include the Kotlin-specific Jackson dependency that fixes this issue: compile("com.fasterxml.jackson.module:jackson-module-kotlin").
An example Kotlin class that should work for you:
import org.springframework.hateoas.Identifiable
import java.time.LocalDate
import javax.persistence.*
import javax.validation.constraints.*
@Entity
data class Employee(@Pattern(regexp = "[A-Za-z0-9]+")
@Size(min = 6, max = 32)
val name: String,
@Email
@NotNull
val email: String?,
@PastOrPresent
val hireDate: LocalDate = LocalDate.now(),
@OneToMany(mappedBy = "employee", cascade = [CascadeType.ALL])
val forms:List<Form> = listOf(),
@OneToMany(mappedBy = "employee", cascade = [CascadeType.ALL])
val reports:List<Report> = listOf(),
@Id @GeneratedValue( strategy = GenerationType.IDENTITY) private val id: Long? = null): Identifiable<Long> {
override fun getId() = id
constructor(name:String): this(name,"$name@foo.com")
}
With Kotlin, everything works.
Just replace "data class" with "class".
Jackson doesn't find an empty constructor in a data class, so it uses a different deserializer instead of the URI-based one.

Issue when serializing Map.Entry using jackson

If I try to deserialize the type below, stored as a String:
List<Entry<String, String>> entryList;
where entryList contains:
[{"dummyKey1":"dummyValue1"}]
I get the following errors
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Can not construct instance of java.util.Map$Entry, problem: abstract types either need to be mapped to concrete types, have custom deserializer, or be instantiated with additional type information.
I get the above error while running a JUnit test case; if I remove the test case, everything runs fine after deployment.
The error occurs in the JUnit test case because Entry has no no-args constructor. So I created a DummyEntry with a no-args constructor that calls the Entry constructor with null arguments.
DummyEntry<K, V> extends SimpleEntry<K, V>
After making this change the above error went away, but I started getting the error below once the changes were deployed.
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
Unrecognized field "dummyKey1", not marked as ignorable (2 known properties: "value", "key"]).
Why does one approach fail in JUnit but work in production, while the other works in JUnit but not in production?
Also, I noticed one additional thing: In production, Map.Entry is serialized to
{'dummyKey1':'dummyValue1'}
whereas, test case in junit serializes the same string as
{'key':'dummyKey1', 'value':'dummyValue1'}
What is the reason for this weird behavior? How can I make this work in both?
I suspect you might be encountering an issue with different serialisation strategies for Map.Entry.
In v2.5.0 (IIRC) of jackson-databind Map.Entry was supported as a 'known type'. Prior to this version, the key and value attributes of Map.Entry would appear in a serialised Map.Entry. After this version, that's no longer the case.
Here are some example test cases showing what I mean:
@Test
public void mapSerialisationPreJackson2_5_0() throws IOException {
Map<String, String> aMap = Maps.newHashMap();
aMap.put("dummyKey1", "dummyValue1");
Set<Map.Entry<String, String>> incoming = aMap.entrySet();
ObjectMapper objectMapper = new ObjectMapper();
String serialised = objectMapper.writeValueAsString(incoming);
// prints: [{"key":"dummyKey1","value":"dummyValue1"}]
System.out.println(serialised);
Set<Map.Entry<String, String>> deserialised = objectMapper.readValue(serialised, Set.class);
// prints: [{key=dummyKey1, value=dummyValue1}]
System.out.println(deserialised);
}
@Test
public void mapSerialisationPostJackson2_5_0() throws IOException {
Map<String, String> aMap = Maps.newHashMap();
aMap.put("dummyKey1", "dummyValue1");
Set<Map.Entry<String, String>> incoming = aMap.entrySet();
ObjectMapper objectMapper = new ObjectMapper();
String serialised = objectMapper.writeValueAsString(incoming);
// prints: [{"dummyKey1":"dummyValue1"}]
System.out.println(serialised);
Set<Map.Entry<String, String>> deserialised = objectMapper.readValue(serialised, Set.class);
// prints: [{dummyKey1=dummyValue1}]
System.out.println(deserialised);
}
Prior to v2.5.0 a Map.Entry would be serialised to {key=dummyKey1, value=dummyValue1} (just like you posted in your question) whereas for versions > 2.5.0 the serialised form is {dummyKey1=dummyValue1}.
I think you are using a version of jackson-databind below 2.5.0 in your test context and a version above 2.5.0 in your production context.
In order to be able to deserialize [{"dummyKey1":"dummyValue1"}] into a List<Entry<String, String>> variable you can:
Use Jackson's parameter names module. It basically allows non-annotated, non-default constructors with parameters to be used for deserialization of a class, in this case the constructors of the various implementations of Map.Entry. A perfectly straightforward solution if you use Java 8 anyway.
If you can't use the parameter names module (e.g. Java 7), you can look into using mixins to annotate a constructor of a class without modifying its source code. I had a go at that and it's tricky. For HashMap, for instance, the implementation of Map.Entry is Node, which has package-private visibility.
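Another possibility, not taken from the options above and offered only as a sketch: avoid deserializing Map.Entry directly. This assumes the JSON [{"dummyKey1":"dummyValue1"}] has already been parsed into a List<Map<String, String>>, for example by Jackson with a matching type reference; that parsing step is not shown. EntryLists and fromMaps are hypothetical names. The helper then copies each map entry into a standalone Map.Entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: converts already-parsed maps into a flat list of entries.
class EntryLists {
    static List<Map.Entry<String, String>> fromMaps(List<Map<String, String>> maps) {
        List<Map.Entry<String, String>> entries = new ArrayList<>();
        for (Map<String, String> map : maps) {
            for (Map.Entry<String, String> e : map.entrySet()) {
                // Copy into a standalone entry so it no longer depends on the source map.
                entries.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Stand-in for the result of parsing [{"dummyKey1":"dummyValue1"}].
        List<Map<String, String>> parsed =
                Collections.singletonList(Collections.singletonMap("dummyKey1", "dummyValue1"));
        System.out.println(fromMaps(parsed)); // prints: [dummyKey1=dummyValue1]
    }
}
```

This only covers the single-pair-per-object form; the older {"key":..., "value":...} form would need a different conversion.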