How to query an Ignite RDD? - apache-spark-sql

I am using CacheConfiguration with setIndexedTypes(Long.class, StructType.class), where StructType is a Spark class, and using igniteRDD.saveValues(df.rdd()) to push values. But when I try to query that cache I get "Use setIndexedTypes or setTypeMetadata methods on CacheConfiguration to enable". I am aware of annotating fields with @QuerySqlField on a POJO, but the value here is a Spark object, so how can we do this?

This doesn't work because StructType class doesn't know anything about Ignite SQL. You should create your own key and value classes and convert each StructType instance to a key-value pair during loading (use savePairs method). After that you will be able to configure SQL as described here: https://apacheignite.readme.io/docs/sql-queries
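A hedged sketch of that approach (the Person class, its fields, and the cache name are illustrative, not from the original post; it assumes a JavaIgniteRDD<Long, Person> named igniteRDD):
// imports: org.apache.ignite.cache.query.annotations.QuerySqlField,
//          org.apache.ignite.configuration.CacheConfiguration,
//          org.apache.spark.api.java.JavaPairRDD, scala.Tuple2

// Value class whose annotated fields Ignite SQL can see:
public class Person implements Serializable {
    @QuerySqlField(index = true)
    private Long id;

    @QuerySqlField
    private String name;

    public Person(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Cache configuration that makes the value type queryable:
CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("persons");
cacheCfg.setIndexedTypes(Long.class, Person.class);

// Convert each DataFrame row into a key-value pair and save the pairs:
JavaPairRDD<Long, Person> pairs = df.javaRDD().mapToPair(
        row -> new Tuple2<>(row.getLong(0), new Person(row.getLong(0), row.getString(1))));
igniteRDD.savePairs(pairs);

// The cache can then be queried with SQL, e.g. via igniteRDD.sql(...) or a SqlFieldsQuery.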

Related

Object aggregations in Querydsl-jpa

I would like to ask if there is any possibility to use object aggregation functions, such as json_agg(), in JPA (which uses HQL).
I would like to achieve something like the following, the goal being to take an entity and transform it into a string:
Expressions.stringTemplate("jsonb_agg(json_build_object('entity', {0}))", qEntity.id)
When I try this I get an org.hibernate.QueryException: No data type for node: org.hibernate.hql.internal.ast.tree.MethodNode error. I've read that the problem is that I cannot use HQL here, because HQL object properties cannot be used in JSON aggregation functions.
I would like to avoid using querydsl-sql as much as I can (it complicates the Docker app deployment, it needs a direct database connection, etc.). So is there any way to aggregate objects like this using HQL? I am using spring-data-jpa, so there is also the opportunity to use that if it offers a better solution.
Your QueryDSL snippet looks just fine, but you need to register custom functions for JSONB_AGG and JSON_BUILD_OBJECT as well as custom types for the JSONB result.
For the custom JSONB type you can use the JsonBinaryType from the hibernate-types library.
For the custom functions, you need to create a MetadataBuilderInitializer that registers the SQL functions with Hibernate. You can take inspiration from my hibernate-types-querydsl-apt library (for example ArrayFunctionInitializer). Applied to JSON functions specifically, you would end up with something along the lines of:
public class ArrayFunctionInitializer implements MetadataBuilderInitializer {

    @Override
    public void contribute(MetadataBuilder metadataBuilder, StandardServiceRegistry standardServiceRegistry) {
        metadataBuilder.applySqlFunction("json_build_object",
                new StandardSQLFunction("json_build_object", JsonBinaryType.INSTANCE));
        metadataBuilder.applySqlFunction("jsonb_agg",
                new StandardSQLFunction("jsonb_agg", JsonBinaryType.INSTANCE));
    }
}
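For completeness, a hedged usage sketch: it assumes the initializer above is picked up by Hibernate (for example via a META-INF/services entry for org.hibernate.boot.spi.MetadataBuilderInitializer), that a generated QEntity metamodel and an EntityManager are available, and that the returned value has whatever Java type JsonBinaryType maps jsonb results to:
import com.querydsl.core.types.dsl.Expressions;
import com.querydsl.jpa.impl.JPAQuery;

// Illustrative query reusing the template from the question:
Object aggregated = new JPAQuery<Void>(entityManager)
        .from(qEntity)
        .select(Expressions.stringTemplate(
                "jsonb_agg(json_build_object('entity', {0}))", qEntity.id))
        .fetchOne();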

Simple Jackson Array string addition

I am trying to convert Gson code to Jackson. I have a method that returns a Gson JsonObject; however, it only creates a JsonArray and returns that, so I assume there is a simple cast involved. What would be the Jackson equivalent? The method only adds one string at a time, so I need something like this:
JsonNode node = new JsonNode();
node.add("String 1");
node.add("String 2');
but would come out like this:
["String 1","String 2"]
I could create a List and map it from there, but I want to do this raw.
This seems too simple, yet Google has given me many suggestions that are far beyond what this simple exercise requires.
And if anyone has a nice blog or tutorial on how to convert Gson to Jackson, that would be great.
It is a bit tricky: you create an array node through a JsonNode factory method:
ArrayNode arrNode = new ObjectMapper().createObjectNode().withArray("my_array"); // arg is the array property name
arrNode.add("String 1");
arrNode.add("String 2");
If you just want to create an ArrayNode, ObjectMapper has the method createArrayNode(), along with createObjectNode(). You can then add entries to it, as well as add the node itself into other arrays or as a property in an ObjectNode.
Actual construction of nodes by the mapper is done using a configurable JsonNodeFactory; the default implementation simply constructs one of the standard implementation types such as ObjectNode and ArrayNode.
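A minimal sketch of the createArrayNode() approach (the printed output is shown as a comment):
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;

ObjectMapper mapper = new ObjectMapper();
ArrayNode arrNode = mapper.createArrayNode();
arrNode.add("String 1");
arrNode.add("String 2");
System.out.println(arrNode); // ["String 1","String 2"]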

Accessing properties of a Kotlin entity

I'm new to Kotlin, so apologies if I'm not articulating concepts correctly. I have an instance of an Entity
[TestEntity(applicationId=1, timestamp=2018-01-24T18:40:30, issueState=MA, product=test, status=sold, paymentMode=VISA, premium=null)]
I am writing a service that is going to take these keys and use them to create the headers of a report. The keys may change depending on the type of report the user is trying to generate, which will have an impact on the Entity that will be instantiated.
I want to be able to iterate over this Entity so that I can create an array to use for the headers. Any thoughts on how I do this?
I think the cleanest solution is storing values in a map and delegating properties to it.
I don't think you can otherwise iterate over class fields without some verbose getter chain or ugly reflection shenanigans.
For example, here you can access map entries as if they were class fields, but you can also easily iterate over the map.
data class TestEntity(val map: Map<String, Any>) {
    val appId: Int by map
    val timeStamp: Long by map
    // ... more fields delegated to the map

    // map.keys can then be iterated to build the report headers
}

Write POJOs to a Parquet file using reflection

Hi, I am looking for APIs to write Parquet files from the POJOs that I have.
I was able to generate an Avro schema using reflection and then create a Parquet schema using AvroSchemaConverter.
However, I am not able to find a way to convert POJOs to Avro GenericRecords; otherwise I could have used AvroParquetWriter to write the POJOs out to Parquet files.
Any suggestions?
If you want to go through Avro you have two options:
1) Let Avro generate your POJOs (see the tutorial here). The generated POJOs extend SpecificRecord, which can then be used with AvroParquetWriter.
2) Write the conversion from your POJO to GenericRecord yourself. You can do this either manually, or a more generic solution would be to use reflection. However, I encountered difficulties with this approach when I tried to read the data back: based on the supplied schema, Avro found the POJO on the classpath and tried to instantiate a SpecificRecord instead of a GenericRecord. For this reason I went with option 1.
Parquet now also supports writing POJOs directly; there is a pull request for this on the Parquet GitHub page. However, I think this is not part of an official release yet; in other words, I did not find this code in Maven.
DISCLAIMER: The following code was written when I was in a hurry. It is not efficient, and future versions of Parquet will surely address this more directly. That being said, this is a lightweight (if inefficient) approach to what you need. The strategy is POJO -> AVRO -> PARQUET.
POJO -> AVRO: declare a schema via reflection, declare writers and readers based on the schema, and at conversion time write the object to a byte stream and read it back as Avro.
AVRO -> PARQUET: use the AvroParquetWriter included in the parquet-mr project (a usage sketch follows after the code below).
private static final Schema avroSchema = ReflectData.AllowNull.get().getSchema(YOURCLASS.class);
private static final ReflectDatumWriter<YOURCLASS> reflectDatumWriter = new ReflectDatumWriter<>(avroSchema);
private static final GenericDatumReader<Object> genericRecordReader = new GenericDatumReader<>(avroSchema);

public GenericRecord toAvroGenericRecord() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    reflectDatumWriter.write(this, EncoderFactory.get().directBinaryEncoder(bytes, null));
    return (GenericRecord) genericRecordReader.read(null, DecoderFactory.get().binaryDecoder(bytes.toByteArray(), null));
}
One more thing: it seems the Parquet writers are currently very strict about null fields. Make sure none of your fields are null before attempting to write to Parquet.
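For the AVRO -> PARQUET step, here is a hedged sketch using AvroParquetWriter from parquet-avro (the output path and the objects collection are illustrative, and newer Parquet versions may prefer an OutputFile over a Path):
// imports: org.apache.hadoop.fs.Path, org.apache.parquet.avro.AvroParquetWriter,
//          org.apache.parquet.hadoop.ParquetWriter
Path outputPath = new Path("/tmp/yourclass.parquet");
try (ParquetWriter<GenericRecord> writer = AvroParquetWriter.<GenericRecord>builder(outputPath)
        .withSchema(avroSchema)
        .build()) {
    for (YOURCLASS obj : objects) {
        writer.write(obj.toAvroGenericRecord());
    }
}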
I wasn't able to find an existing solution, so I implemented it myself. Here is the link to the implementation: https://gist.github.com/alexeygrigorev/eab72e40c6051e0163a6693054906d66
In short, it does the following:
uses reflection to get the Avro schema from the POJO
using the schema and reflection, it converts POJOs to GenericRecord objects (a much simplified sketch of this step is shown below)
reflection is applied recursively if the POJO contains other POJOs or lists of POJOs
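This is not the gist's actual code, but a much simplified illustration of the conversion idea for flat POJOs (the recursion in the gist handles nested POJOs and lists):
// imports: org.apache.avro.Schema, org.apache.avro.generic.GenericData,
//          org.apache.avro.generic.GenericRecord, org.apache.avro.reflect.ReflectData
public static GenericRecord toGenericRecord(Object pojo) throws ReflectiveOperationException {
    Schema schema = ReflectData.get().getSchema(pojo.getClass());
    GenericRecord record = new GenericData.Record(schema);
    for (Schema.Field field : schema.getFields()) {
        java.lang.reflect.Field pojoField = pojo.getClass().getDeclaredField(field.name());
        pojoField.setAccessible(true);
        record.put(field.name(), pojoField.get(pojo)); // nested POJOs/lists would need recursion here
    }
    return record;
}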

Hadoop Serializer Not Found Exception

I have a job whose output format is SequenceFileOuputFormat.
I set the output key and value class like this:
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(SplitInfo.class);
The SplitInfo class implements Serializable and Writable.
I set the io.serializations property as follows:
conf.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization,"
+ "org.apache.hadoop.io.serializer.WritableSerialization");
However, on the reducer side I get this error, telling me that Hadoop couldn't find a serializer:
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:961)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:892)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:393)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:476)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:61)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:638)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
Can anyone help, please?
The problem was that I was making a stupid mistake: I was not updating the jar. So basically, SplitInfo was not implementing the Writable interface in the old jar that was actually in use.
As a general observation: the underlying cause of the error in the OP is that Hadoop can't find a Serializer for a specific type you're trying to serialize (whether directly or indirectly, e.g. by using that type as an output key/value). Hadoop cannot find a Serializer for one of two reasons:
your type is not serializable (i.e. it doesn't implement Writable or Serializable; a minimal Writable sketch is shown after this list)
there is no Serializer available to Hadoop for the kind of serialization your type implements (e.g. your type implements Writable, but for one reason or another Hadoop cannot use the org.apache.hadoop.io.serializer.WritableSerialization class)
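For reference, a minimal sketch of a value class implementing Writable (the field names are illustrative, not the OP's actual SplitInfo):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class SplitInfo implements Writable {
    private int splitId;
    private String path;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(splitId);
        out.writeUTF(path);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        splitId = in.readInt();
        path = in.readUTF();
    }
}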
I think you're trying to do something you don't need to. Your output value only needs to implement the Writable interface and you should just set the output format.
conf.setOutputFormatClass(SequenceFileOutputFormat.class);
You only use the "io.serializations" configuration if you want to use a different serialization framework, which it doesn't look like you need.
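Putting that together, a hedged sketch of the suggested setup with the new mapreduce API (the job name is illustrative; the key/value classes are the ones from the question):
// imports: org.apache.hadoop.conf.Configuration, org.apache.hadoop.io.IntWritable,
//          org.apache.hadoop.mapreduce.Job, org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
Job job = Job.getInstance(new Configuration(), "split-info-job");
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(SplitInfo.class);
// No "io.serializations" override is needed: WritableSerialization is registered by default.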