Apache Ignite Continuous Queries : How to get the field names and field values in the listener updates when there are dynamic fields? - sql

I am working on a POC on whether or not we should go ahead with Apache Ignite both for commerical and enterprise use. There is a use case though that we are trying to find an answer for.
Preconditions
Dynamically creation of tables i.e. there may be new fields that come to be put into the cache. Meaning there is no precompiled POJO(Model) defining the attributes of the table/cache.
Use case
I would like to write a SELECT continuous query where it gives me the results that are modified. So I wrote that query but the problem is that when the listener gets a notification, I am not able to find all the field names that are modified from any method call. I would like to be able to get all the field names and field values in some sort of Map, which I can use and then submit to other systems.

You could track all modified field values using binary object and continuous query:
IgniteCache<Integer, BinaryObject> cache = ignite.cache("person").withKeepBinary();
ContinuousQuery<Integer, BinaryObject> query = new ContinuousQuery<>();
query.setLocalListener(events -> {
for (CacheEntryEvent<? extends Integer, ? extends BinaryObject> event : events) {
BinaryType type = ignite.binary().type("Person");
if (event.getOldValue() != null && event.getValue() != null) {
HashMap<String,Object> oldProps = new HashMap<>();
HashMap<String,Object> newProps = new HashMap<>();
for (String field : type.fieldNames()) {
oldProps.put(field,event.getOldValue().field(field));
newProps.put(field,event.getValue().field(field));
}
com.google.common.collect.MapDifference<Object, Object> diff = com.google.common.collect.Maps.difference(oldProps, newProps);
System.out.println(diff.entriesDiffering());
}
}
});
cache.query(query);
cache.put(1, ignite.binary().builder("Person").setField("name","Alice").build());
cache.put(1, ignite.binary().builder("Person").setField("name","Bob").build());

Related

Materialized view to use different Serde

Version used: Kafka 3.1.1, Confluent 7.1.0, Avro 1.11.0
I’m creating a REST controller which is “searching” for AVRO objects in a topic. The objects in the topic are serialized using SpecificAvroSerde<>. Each topic has assigned two AVRO schemas. One for the key (with several fields of various types) and one for the value (multiple fields and types).
I’ve done this several times whereby I’m consuming the topic in a KTable and then materialize it. There is only one pair of serdes involved and the serialized format is the same for both the topic and the materialized view (RocksaltDb). The REST controller then can look up the store and either perform a get with a key or do a range scan between two keys. This all works as expected.
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";
private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();
final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);
KTable<ProductXrefKey, ProductXref> productXrefTable = builder
.table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(productXrefKeySerde)
.withValueSerde(productXrefSerde));
<…>
final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));
try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.range(fromKey, toKey)) {
if (range != null) {
range.forEachRemaining(kv -> {
<…>
});
} else {
log.info("Could not find {} in local ReadOnlyKeyValueStore {}", fromKey, viewName);
}
}
I now want to change this using a prefix scan instead. Since the key contains multiple fields there is no way to only serialize first part (i.e. first few fields) of the key I need a specialized serializer. This also means I have to use a different serializer for the materialized view itself (SpecificAvroSerde puts the magic number and schema ID at the beginning of the byte array) as otherwise the serialized output for the prefix and the key in the materialized view can’t be compared. Hence I created a specialised Serde which serializes the key using the same logic as when used for serializing the prefix but omitting the fields not required for the scan (i.e. omitting the last field). Above code now looks
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";
private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXrefKey> materializedProductXrefKeySerde = new ProductXrefKeySerde();
// for the value part we can still used standard serde as no change in serialization logic needed
private final SpecificAvroSerde<ProductXref> materializedProductXrefSerde = new SpecificAvroSerde<>();
// telling the serializer to cut off last field
private final SpecificAvroSerde<ProductXref> prefixScanProductXrefSerde = new ProductXrefKeySerde(true);
final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);
KTable<ProductXrefKey, ProductXref> productXrefTable = builder
.table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(materializedProductXrefKeySerde)
.withValueSerde(materializedProductXrefSerde));
<…>
final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));
try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.prefixScan(prefixKey, prefixScanProductXrefSerde)){
if (range != null) {
range.forEachRemaining(kv -> {
<…>
});
} else {
log.info("Could not find {} in local ReadOnlyKeyValueStore {}", prefixKey, viewName);
}
}
My assumption was, that the topic gets deserialized using the SpecificAvroSerde and then gets serialized for the view using my ProductXrefKeySerde. The problem is, that the content in the materialized view is still serialized using the same logic as in the original topic. It appears that the serializer is never used during the topic being processed and stored in the materialized view. I can verify that also on the file system and see that the keys in the RocksaltDb files are serialized with the magic byte and schema ID and hence prefixScan wont be able to fine anything.
How can I change the serialization format for the materialized view?
Or is there a better way for serializing a prefix AVRO object?
It appears that there is some optimization happening which avoids deserialization/serialization if KTable is directly materialized. I've changed the logic such that it consumes it as a KStream and then creates the KTable (toTable(...))
KTable<ProductXrefKey, ProductXref> productXrefStream = builder
.stream(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde))
.toTable(Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
.withKeySerde(productXrefKeySerde)
.withValueSerde(productXrefSerde));
With this small change, data now gets deserialized (using SpecificAvroSerde<>) and serialized again using the provided ProductXrefKeySerde. Now also the prefix scan works and returns the records as expected.

HotChocolate: Dynamic schemas and how to update filters accordingly

NOTE: If you do know that the below is not possible, this information is just as valuable.
Im checking out HotChocolate and Ive looked into the dynamic schemas
I have taken the code in your Github-example This works ok. I have extended our Product-type like so:
//Same json as in your Github-sample, new properties are "color" and some more
foreach (var field in type.GetProperty("fields").EnumerateArray())
{
string name = field.GetString();
typeDefinition.Fields.Add(new ObjectFieldDefinition(field.GetString()!, type: TypeReference.Parse("String"), pureResolver: ctx =>
{
var product = ctx.Parent<Product>();
return product.SubTypeSpecific.ContainsKey(name) ? product.SubTypeSpecific[name] : null;
}));
}
now I have "moved out" dynamic properties from a Dictionary (That is a sub-object in my documentDb) to own properties. Works well.
BUT:
How can I extend my ProductFilter in the same fashion?
I would like to extend my current Product-filter so its possible to search on the new dynamic property "color"
getProducts(where : color : {eq :"blue" }}) {...}
I can create new FilterInputDefinition, but not extend existing filter (because there is no FilterInputTypeExtensions.CreateUnsafe()
If I manage to create the new filter, is there any way to update the IQueryable-generation so that the inputed color ("blue")
So the query to my CosmosDb will be created automatically?
Many thanks in advance.

Apache Ignite : Ignite Repository query with "IN" clause, returns no records

I am using Apache Ignite as the back-end data store in a SpringBoot Application.
I have a requirement where I need to get all the entities whose name matches one of the names from a set of names.
Hence i am trying to get it implemented using a #Query configuration and a method named findAllByName(Iterable<String> names)as below:
Here on the Query, I am trying to use the 'IN' clause and want to pass an array of names as an input to the 'IN' clause.
#RepositoryConfig(cacheName = "Category")
public interface CategoryRepository extends IgniteRepository<Category, Long>
{
List<Category> findByName(String name);
#Query("SELECT * FROM Category WHERE name IN ( ? )")
Iterable<Category> findAllByName(Iterable<String> names); // this method always returns empty list .
}
In this the method findAllByName always returns empty list, even when ignite has Categories for which the name field matches the data passed in the query.
I am unable to figure out if there is a problem with the Syntax or the query of the method signature or the parameters.
Please try using String[] names instead for supplying parameters.
UPDATE: I have just checked the source, and we don't have tests for such scenario. It means that you're on uncharted territory even if it is somehow possible to get to work.
Otherwise looks unsupported currently.
I know your question is more specific to Spring Data Ignite feature. However, as an alternate, you can achieve it using the SqlQuery abstraction of Ignite.
You will form your query like this. I have pasted the sample below with custom sql function inSet that you will write. Also, the below tells how this is used in your sql.
IgniteCache<String, MyRecord> cache = this.ignite
.cache(this.environment.getProperty(Cache.CACHE_NAME));
String sql = "from “my-ignite-cache”.MyRecord WHERE
MyRecord.city=? AND inSet(?, MyRecord.flight)"
SqlQuery<String, MyRecord> sqlQuery = new SqlQuery<>(MyRecord.class,
sql);
sqlQuery.setArgs(MyCity, [Flight1, Flight2 ] );
QueryCursor<Entry<String, MyRecord>> resultCursor = cache.query(sqlQuery);
You can iterate the result cursor to do something meaningful from the extracted data.
resultCursor.forEach(e -> {
MyRecord record = e.getValue();
// do something with result
});
Below is the Ignite Custom Sql function which is used in the above Query - this will help in replicating the IN clause feature.
#QuerySqlFunction
public static boolean inSet(List<String> filterParamArgIds, String id) {
return filterParamArgIds.contains(id);
}
And finally, as a reference MyRecord referred above can be defined something like this.
public class MyRecord implements Serializable {
#QuerySqlField(name = "city", index = true)
private String city;
#QuerySqlField(name = "flight", index = true)
private String flight;
}

Java - Insert a single row at a time into google Big Query ?

I am creating an application where every time a user clicks on an article, I need to capture the article data and the user data to calculate the reach of every article and be able to run analytics on the reached data.
My application is on App Engine.
When I check documentation for inserts into BQ, most of them point towards bulk inserts in the form of Jobs or Streams.
Question:
Is it even a good practice to insert into big Query one row at a time every time a user action is initiated ? If so, could you point me to some Java code to effectively do this ?
There are limits on the number of load jobs and DML queries (1,000 per day), so you'll need to use streaming inserts for this kind of application. Note that streaming inserts are different from loading data from a Java stream.
TableId tableId = TableId.of(datasetName, tableName);
// Values of the row to insert
Map<String, Object> rowContent = new HashMap<>();
rowContent.put("booleanField", true);
// Bytes are passed in base64
rowContent.put("bytesField", "Cg0NDg0="); // 0xA, 0xD, 0xD, 0xE, 0xD in base64
// Records are passed as a map
Map<String, Object> recordsContent = new HashMap<>();
recordsContent.put("stringField", "Hello, World!");
rowContent.put("recordField", recordsContent);
InsertAllResponse response =
bigquery.insertAll(
InsertAllRequest.newBuilder(tableId)
.addRow("rowId", rowContent)
// More rows can be added in the same RPC by invoking .addRow() on the builder
.build());
if (response.hasErrors()) {
// If any of the insertions failed, this lets you inspect the errors
for (Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
// inspect row error
}
}
(From the example at https://cloud.google.com/bigquery/streaming-data-into-bigquery#bigquery-stream-data-java)
Note especially that a failed insert does not always throw an exception. You must also check the response object for errors.
Is it even a good practice to insert into big Query one row at a time every time a user action is initiated ?
Yes, it's pretty typical to stream event streams to BigQuery for analytics. You'll could get better performance if you buffer multiple events into the same streaming insert request to BigQuery, but one row at a time is definitely supported.
A simplified version of Google's example.
Map<String, Object> row1Data = new HashMap<>();
row1Data.put("booleanField", true);
row1Data.put("stringField", "myString");
Map<String, Object> row2Data = new HashMap<>();
row2Data.put("booleanField", false);
row2Data.put("stringField", "myOtherString");
TableId tableId = TableId.of("myDatasetName", "myTableName");
InsertAllResponse response =
bigQuery.insertAll(
InsertAllRequest.newBuilder(tableId)
.addRow("row1Id", row1Data)
.addRow("row2Id", row2Data)
.build());
if (response.hasErrors()) {
// If any of the insertions failed, this lets you inspect the errors
for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
// inspect row error
}
}
You can use Cloud Logging API to write one row at a time.
https://cloud.google.com/logging/docs/reference/libraries
Sample code from document
public class QuickstartSample {
/** Expects a new or existing Cloud log name as the first argument. */
public static void main(String... args) throws Exception {
// Instantiates a client
Logging logging = LoggingOptions.getDefaultInstance().getService();
// The name of the log to write to
String logName = args[0]; // "my-log";
// The data to write to the log
String text = "Hello, world!";
LogEntry entry =
LogEntry.newBuilder(StringPayload.of(text))
.setSeverity(Severity.ERROR)
.setLogName(logName)
.setResource(MonitoredResource.newBuilder("global").build())
.build();
// Writes the log entry asynchronously
logging.write(Collections.singleton(entry));
System.out.printf("Logged: %s%n", text);
}
}
In this case you need to create sink from dataflow logs. Then message will be redirect to the big Query table.
https://cloud.google.com/logging/docs/export/configure_export_v2

How do I add a lazy loaded column in EntitySpaces?

If you do not have experience with or aren't currently using EntitySpaces ("ES") ORM this question is not meant for you.
I have a 10 year old application that after 4 years now needs my attention. My application uses a now defunct ORM called EntitySpaces and I'm hoping if you're reading this you have experience or maybe still use it too! Switching to another ORM is not an option at this time so I need to find a way to make this work.
Between the time I last actively worked on my application and now (ES Version 2012-09-30), EntitySpaces ("ES") has gone through a significant change in the underlying ADO.net back-end. The scenario that I'm seeking help on is when an entity collection is loaded with only a subset of the columns:
_products = new ProductCollection();
_products.Query.SelectAllExcept(_products.Query.ImageData);
_products.LoadAll();
I then override the properties that weren't loaded in the initial select so that I may lazyload them in the accessor. Here is an example of one such lazy-loaded property that used to work perfectly.
public override byte[] ImageData
{
get
{
bool rowIsDirty = base.es.RowState != DataRowState.Unchanged;
// Check if we have loaded the blob data
if(base.Row.Table != null && base.Row.Table.Columns.Contains(ProductMetadata.ColumnNames.ImageData) == false)
{
// add the column before we can save data to the entity
this.Row.Table.Columns.Add(ProductMetadata.ColumnNames.ImageData, typeof(byte[]));
}
if(base.Row[ProductMetadata.ColumnNames.ImageData] is System.DBNull)
{
// Need to load the data
Product product = new Product();
product.Query.Select(product.Query.ImageData).Where(product.Query.ProductID == base.ProductID);
if(product.Query.Load())
{
if (product.Row[ProductMetadata.ColumnNames.ImageData] is System.DBNull == false)
{
base.ImageData = product.ImageData;
if (rowIsDirty == false)
{
base.AcceptChanges();
}
}
}
}
return base.ImageData;
}
set
{
base.ImageData = value;
}
}
The interesting part is where I add the column to the underlying DataTable DataColumn collection:
this.Row.Table.Columns.Add(ProductMetadata.ColumnNames.ImageData, typeof(byte[]));
I had to comment out all the ADO.net related stuff from that accessor when I updated to the current (and open source) edition of ES (version 2012-09-30). That means that the "ImageData" column isn't properly configured and when I change it's data and attempt to save the entity I receive the following error:
Column 'ImageData' does not belong to table .
I've spent a few days looking through the ES source and experimenting and it appears that they no longer use a DataTable to back the entities, but instead are using a 'esSmartDictionary'.
My question is: Is there a known, supported way to accomplish the same lazy loaded behavior that used to work in the new version of ES? Where I can update a property (i.e. column) that wasn't included in the initial select by telling the ORM to add it to the entity backing store?
After analyzing how ES constructs the DataTable that is uses for updates it became clear that columns not included in the initial select (i.e. load) operation needed to be added to the esEntityCollectionBase.SelectedColumns dictionary. I added the following method to handle this.
/// <summary>
/// Appends the specified column to the SelectedColumns dictionary. The selected columns collection is
/// important as it serves as the basis for DataTable creation when updating an entity collection. If you've
/// lazy loaded a column (i.e. it wasn't included in the initial select) it will not be automatically
/// included in the selected columns collection. If you want to update the collection including the lazy
/// loaded column you need to use this method to add the column to the Select Columns list.
/// </summary>
/// <param name="columnName">The lazy loaded column name. Note: Use the {yourentityname}Metadata.ColumnNames
/// class to access the column names.</param>
public void AddLazyLoadedColumn(string columnName)
{
if(this.selectedColumns == null)
{
throw new Exception(
"You can only append a lazy-loaded Column to a partially selected entity collection");
}
if (this.selectedColumns.ContainsKey(columnName))
{
return;
}
else
{
// Using the count because I can't determine what the value is supposed to be or how it's used. From
// I can tell it's just the number of the column as it was selected: if 8 colums were selected the
// value would be 1 through 8 - ??
int columnValue = selectedColumns.Count;
this.selectedColumns.Add(columnName, columnValue);
}
}
You would use this method like this:
public override System.Byte[] ImageData
{
get
{
var collection = this.GetCollection();
if(collection != null)
{
collection.AddLazyLoadedColumn(ProductMetadata.ColumnNames.ImageData);
}
...
It's a shame that nobody is interested in the open source EntitySpaces. I'd be happy to work on it if I thought it had a future, but it doesn't appear so. :(
I'm still interested in any other approaches or insight from other users.