Map object query over Ignite SQL

I am a beginner with Ignite. I am building a sample app to measure its query times.
The key in the cache is a String and the value is a Map. One of the fields in the value Map is "order_item_subtotal", so the query is:
select * from Map where order_item_subtotal>400
And the sample code is:
Ignite ignite = Ignition.ignite();
IgniteCache<String, Map<String, Object>> dummyCache = ignite.getOrCreateCache(cfg);
Map<String, Map<String, Object>> bufferMap = new HashMap<>();
int i = 0;
for (String jsonStr : jsonStrs) {
    if (i % 1000 == 0) {
        dummyCache.putAll(bufferMap);
        bufferMap.clear();
    }
    Map<String, Object> data = mapper.readValue(jsonStr, Map.class);
    bufferMap.put(data.get("order_item_id").toString(), data);
    i++;
}
dummyCache.putAll(bufferMap); // flush the remaining buffered entries
SqlFieldsQuery asd = new SqlFieldsQuery("select * from Map where order_item_subtotal > 400");
List<List<?>> result = dummyCache.query(asd).getAll();
But the result is always "[]", meaning it is empty, and there are no errors or exceptions.
What am I missing here? Any ideas?
PS: sample data below
{order_item_id=99, order_item_order_id=37, order_item_product_id=365, order_item_quantity=1, order_item_subtotal=59.9900016784668, order_item_product_price=59.9900016784668, product_id=365, product_category_id=17, product_name=Perfect Fitness Perfect Rip Deck, product_description=, product_price=59.9900016784668, product_image=http://images.acmesports.sports/Perfect+Fitness+Perfect+Rip+Deck}

This is not supported. You should use a simple POJO class instead of a map to make it work.
Note that Ignite stores data in binary format and does not deserialize objects when running queries, so you still don't need to deploy class definitions on server nodes. Please refer to this page for more details: https://apacheignite.readme.io/docs/binary-marshaller
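For illustration, here is a minimal sketch of the POJO approach applied to the sample data above. The class name OrderItem, the field selection and the cache name are assumptions for the example, not taken from the original post:

public class OrderItem {
    // org.apache.ignite.cache.query.annotations.QuerySqlField makes a field visible to SQL
    @QuerySqlField(index = true)
    private double orderItemSubtotal;

    @QuerySqlField
    private String productName;

    // ... remaining fields, constructors and getters/setters omitted
}

CacheConfiguration<String, OrderItem> cfg = new CacheConfiguration<>("orderItems");
cfg.setIndexedTypes(String.class, OrderItem.class); // registers OrderItem as a SQL table

IgniteCache<String, OrderItem> cache = ignite.getOrCreateCache(cfg);

SqlFieldsQuery qry = new SqlFieldsQuery(
        "select * from OrderItem where orderItemSubtotal > ?").setArgs(400);
List<List<?>> rows = cache.query(qry).getAll();

With the value type registered via setIndexedTypes, the SQL table name defaults to the class's simple name (OrderItem) and the annotated fields become its columns.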


Materialized view to use different Serde

Version used: Kafka 3.1.1, Confluent 7.1.0, Avro 1.11.0
I’m creating a REST controller which is “searching” for Avro objects in a topic. The objects in the topic are serialized using SpecificAvroSerde<>. Each topic has two Avro schemas assigned: one for the key (several fields of various types) and one for the value (multiple fields and types).
I’ve done this several times before: I consume the topic into a KTable and then materialize it. Only one pair of serdes is involved, and the serialized format is the same for both the topic and the materialized view (RocksDB). The REST controller can then look up the store and either perform a get with a key or do a range scan between two keys. This all works as expected.
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";

private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();

final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);

KTable<ProductXrefKey, ProductXref> productXrefTable = builder
        .table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
                Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                        .withKeySerde(productXrefKeySerde)
                        .withValueSerde(productXrefSerde));

<…>

final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
        streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));

try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.range(fromKey, toKey)) {
    if (range != null) {
        range.forEachRemaining(kv -> {
            <…>
        });
    } else {
        log.info("Could not find {} in local ReadOnlyKeyValueStore {}", fromKey, viewName);
    }
}
I now want to change this to use a prefix scan instead. Since the key contains multiple fields, there is no way to serialize only the first part (i.e. the first few fields) of the key, so I need a specialized serializer. This also means I have to use a different serializer for the materialized view itself (SpecificAvroSerde puts the magic byte and schema ID at the beginning of the byte array), as otherwise the serialized prefix and the serialized key in the materialized view cannot be compared. Hence I created a specialized Serde which serializes the key using the same logic as the prefix serializer but omits the fields not required for the scan (i.e. the last field). The code above now looks like this:
private final static String TOPIC_NAME = "input-topic";
private final static String VIEW_NAME = "materialized-view";

private final SpecificAvroSerde<ProductXrefKey> productXrefKeySerde = new SpecificAvroSerde<>();
private final SpecificAvroSerde<ProductXref> productXrefSerde = new SpecificAvroSerde<>();

private final SpecificAvroSerde<ProductXrefKey> materializedProductXrefKeySerde = new ProductXrefKeySerde();
// for the value part we can still use the standard serde as no change in serialization logic is needed
private final SpecificAvroSerde<ProductXref> materializedProductXrefSerde = new SpecificAvroSerde<>();
// telling the serializer to cut off the last field
private final SpecificAvroSerde<ProductXrefKey> prefixScanProductXrefSerde = new ProductXrefKeySerde(true);

final Map<String, Object> props = this.kafkaProperties.buildStreamsProperties();
productXrefKeySerde.configure(props, true);
productXrefSerde.configure(props, false);

KTable<ProductXrefKey, ProductXref> productXrefTable = builder
        .table(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde),
                Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                        .withKeySerde(materializedProductXrefKeySerde)
                        .withValueSerde(materializedProductXrefSerde));
<…>

final ReadOnlyKeyValueStore<ProductXrefKey, ProductXref> store =
        streamsBuilderFactoryBean.getKafkaStreams().store(fromNameAndType(VIEW_NAME, keyValueStore()));

try (KeyValueIterator<ProductXrefKey, ProductXref> range = store.prefixScan(prefixKey, prefixScanProductXrefSerde)) {
    if (range != null) {
        range.forEachRemaining(kv -> {
            <…>
        });
    } else {
        log.info("Could not find {} in local ReadOnlyKeyValueStore {}", prefixKey, viewName);
    }
}
My assumption was that the topic gets deserialized using the SpecificAvroSerde and then serialized for the view using my ProductXrefKeySerde. The problem is that the content of the materialized view is still serialized with the same logic as the original topic. It appears that my serializer is never invoked while the topic is processed and stored in the materialized view. I can also verify this on the file system: the keys in the RocksDB files are serialized with the magic byte and schema ID, so prefixScan won't be able to find anything.
How can I change the serialization format for the materialized view?
Or is there a better way for serializing a prefix AVRO object?
It appears that there is an optimization that avoids deserialization/serialization when the KTable is materialized directly from the topic. I've changed the logic to consume the topic as a KStream and then create the KTable from it via toTable(...):
KTable<ProductXrefKey, ProductXref> productXrefStream = builder
        .stream(TOPIC_NAME, Consumed.with(productXrefKeySerde, productXrefSerde))
        .toTable(Materialized.<ProductXrefKey, ProductXref, KeyValueStore<Bytes, byte[]>>as(VIEW_NAME)
                .withKeySerde(productXrefKeySerde)
                .withValueSerde(productXrefSerde));
With this small change, the data now gets deserialized (using SpecificAvroSerde<>) and serialized again using the provided ProductXrefKeySerde. The prefix scan now works as well and returns the records as expected.
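For reference, a serde along the lines of the ProductXrefKeySerde described above could be sketched as follows. This is a hypothetical sketch only: the key fields (productGroup, productId) are placeholders for the real ProductXrefKey fields, the fields are written as delimiter-separated UTF-8 instead of Avro binary encoding, and the class implements Kafka's Serde/Serializer/Deserializer interfaces directly (so the fields above would be typed Serde<ProductXrefKey> rather than SpecificAvroSerde<ProductXrefKey>):

public class ProductXrefKeySerde implements Serde<ProductXrefKey>,
        Serializer<ProductXrefKey>, Deserializer<ProductXrefKey> {

    private final boolean prefixOnly;

    public ProductXrefKeySerde() { this(false); }

    // prefixOnly = true omits the last key field so the bytes can serve as a scan prefix
    public ProductXrefKeySerde(boolean prefixOnly) { this.prefixOnly = prefixOnly; }

    @Override
    public byte[] serialize(String topic, ProductXrefKey key) {
        // No magic byte / schema ID, so a serialized prefix is byte-comparable to a stored key.
        String s = prefixOnly
                ? key.getProductGroup() + "|"
                : key.getProductGroup() + "|" + key.getProductId();
        return s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public ProductXrefKey deserialize(String topic, byte[] data) {
        String[] parts = new String(data, StandardCharsets.UTF_8).split("\\|", -1);
        return ProductXrefKey.newBuilder()            // hypothetical generated Avro builder
                .setProductGroup(parts[0])
                .setProductId(parts.length > 1 ? parts[1] : "")
                .build();
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public void close() { }

    @Override
    public Serializer<ProductXrefKey> serializer() { return this; }

    @Override
    public Deserializer<ProductXrefKey> deserializer() { return this; }
}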

Apache Ignite : Ignite Repository query with "IN" clause, returns no records

I am using Apache Ignite as the back-end data store in a SpringBoot Application.
I have a requirement where I need to get all the entities whose name matches one of the names from a set of names.
Hence I am trying to implement it using an @Query configuration and a method named findAllByName(Iterable<String> names), as below.
In the query I am using the 'IN' clause and want to pass an array of names as its input.
@RepositoryConfig(cacheName = "Category")
public interface CategoryRepository extends IgniteRepository<Category, Long> {

    List<Category> findByName(String name);

    @Query("SELECT * FROM Category WHERE name IN ( ? )")
    Iterable<Category> findAllByName(Iterable<String> names); // this method always returns an empty list
}
Here the method findAllByName always returns an empty list, even when Ignite has Categories whose name field matches the data passed in the query.
I am unable to figure out whether the problem is with the query syntax, the method signature, or the parameters.
Please try using String[] names instead for supplying parameters.
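That is, something like this (an untested sketch, only the parameter type changes):

@Query("SELECT * FROM Category WHERE name IN ( ? )")
Iterable<Category> findAllByName(String[] names);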
UPDATE: I have just checked the source, and we don't have tests for such a scenario. That means you're in uncharted territory even if it is somehow possible to get it to work.
Otherwise it looks unsupported currently.
I know your question is specific to the Spring Data Ignite feature. However, as an alternative, you can achieve this using Ignite's SqlQuery abstraction.
You would form your query as shown below. The sample uses a custom SQL function, inSet, that you write yourself; the snippet also shows how it is used in the SQL.
IgniteCache<String, MyRecord> cache = this.ignite
        .cache(this.environment.getProperty(Cache.CACHE_NAME));

String sql = "from \"my-ignite-cache\".MyRecord " +
        "where MyRecord.city = ? and inSet(?, MyRecord.flight)";

SqlQuery<String, MyRecord> sqlQuery = new SqlQuery<>(MyRecord.class, sql);
sqlQuery.setArgs(myCity, Arrays.asList("Flight1", "Flight2"));

QueryCursor<Entry<String, MyRecord>> resultCursor = cache.query(sqlQuery);
You can iterate over the result cursor to do something meaningful with the extracted data.
resultCursor.forEach(e -> {
    MyRecord record = e.getValue();
    // do something with the result
});
Below is the custom Ignite SQL function used in the above query; it replicates the IN clause behaviour.
@QuerySqlFunction
public static boolean inSet(List<String> filterParamArgIds, String id) {
    return filterParamArgIds.contains(id);
}
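Note that Ignite only picks up @QuerySqlFunction methods from classes registered on the cache configuration. A brief sketch, where SqlFunctions is a placeholder for whatever class declares the static inSet method:

CacheConfiguration<String, MyRecord> cacheCfg = new CacheConfiguration<>("my-ignite-cache");
cacheCfg.setSqlFunctionClasses(SqlFunctions.class); // class containing the @QuerySqlFunction method
cacheCfg.setIndexedTypes(String.class, MyRecord.class);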
And finally, for reference, the MyRecord class referred to above can be defined something like this:
public class MyRecord implements Serializable {

    @QuerySqlField(name = "city", index = true)
    private String city;

    @QuerySqlField(name = "flight", index = true)
    private String flight;
}

Apache Ignite Continuous Queries : How to get the field names and field values in the listener updates when there are dynamic fields?

I am working on a POC to decide whether we should go ahead with Apache Ignite for both commercial and enterprise use. There is one use case, though, that we are trying to find an answer for.
Preconditions
Tables are created dynamically, i.e. new fields may be put into the cache at any time. This means there is no precompiled POJO (model) defining the attributes of the table/cache.
Use case
I would like to write a SELECT continuous query that gives me the results that were modified. I wrote such a query, but the problem is that when the listener gets a notification, I cannot find any method call that returns the field names that were modified. I would like to get all the field names and field values in some sort of Map, which I can then submit to other systems.
You can track all modified field values using binary objects and a continuous query:
IgniteCache<Integer, BinaryObject> cache = ignite.cache("person").withKeepBinary();

ContinuousQuery<Integer, BinaryObject> query = new ContinuousQuery<>();

query.setLocalListener(events -> {
    for (CacheEntryEvent<? extends Integer, ? extends BinaryObject> event : events) {
        BinaryType type = ignite.binary().type("Person");

        if (event.getOldValue() != null && event.getValue() != null) {
            HashMap<String, Object> oldProps = new HashMap<>();
            HashMap<String, Object> newProps = new HashMap<>();

            // Collect all field names and values from the binary metadata, no POJO required.
            for (String field : type.fieldNames()) {
                oldProps.put(field, event.getOldValue().field(field));
                newProps.put(field, event.getValue().field(field));
            }

            com.google.common.collect.MapDifference<Object, Object> diff =
                    com.google.common.collect.Maps.difference(oldProps, newProps);

            System.out.println(diff.entriesDiffering());
        }
    }
});

cache.query(query);

cache.put(1, ignite.binary().builder("Person").setField("name", "Alice").build());
cache.put(1, ignite.binary().builder("Person").setField("name", "Bob").build());

Ignite CacheJdbcPojoStoreFactory using Enum fields

I am using the CacheJdbcPojoStoreFactory.
I want to have a VARCHAR field in the database which maps to an enum in Java.
The way I am trying to achieve this is shown below. I want the application code to work with the enum, but the persistence layer to use the string so that it is human-readable in the database. I do not want to use int values in the database.
This seems to work fine for creating new objects, but not for reading them back out. It seems that Ignite tries to set the field directly, and the setter (setSideAsString) is never called. Of course there is no field called sideAsString. Should this work? Any suggestions?
Here is the code excerpt. In the application code I would do something like:
trade.setSide(OrderSide.Buy);
And this persists fine; I can read "Buy" in the side column as a VARCHAR.
In Trade
private OrderSide side; // OrderSide is an enum with Buy, Sell

public OrderSide getSide() {
    return side;
}

public void setSide(OrderSide side) {
    this.side = side;
}

public String getSideAsString() {
    return this.side.name();
}

public void setSideAsString(String s) {
    this.side = OrderSide.valueOf(s);
}
Now when configuring the store, I do this
Collection<JdbcTypeField> vals = new ArrayList<>();
vals.add(new JdbcTypeField(Types.VARCHAR, "side", String.class, "sideAsString"));
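For context, a field mapping like this normally sits inside a JdbcType registered on the CacheJdbcPojoStoreFactory, roughly like the sketch below; the table name, key column and Long key type here are illustrative assumptions, not the actual setup:

JdbcType tradeType = new JdbcType();
tradeType.setCacheName("Trade");                 // assumed cache name
tradeType.setDatabaseTable("TRADE");             // assumed table name
tradeType.setKeyType(Long.class.getName());
tradeType.setKeyFields(new JdbcTypeField(Types.BIGINT, "ID", Long.class, "id"));
tradeType.setValueType(Trade.class.getName());
tradeType.setValueFields(vals.toArray(new JdbcTypeField[0]));

CacheJdbcPojoStoreFactory<Long, Trade> storeFactory = new CacheJdbcPojoStoreFactory<>();
storeFactory.setTypes(tradeType);                // plus a data source, set via setDataSourceBean/Factory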
After a clean start, if I query Trade using an Ignite SQL query and call trade.getSide(), it is null. Other (directly mapped) columns are fine.
Thanks,
Gordon
BinaryMarshaller deserializes only the fields that are used in the query.
Please try using OptimizedMarshaller instead:
IgniteConfiguration cfg = new IgniteConfiguration();
...
cfg.setMarshaller(new OptimizedMarshaller());
Here's the ticket for supporting enum mapping in CacheJdbcPojoStore.

GridGain SQL Transform Query Limitations

I am running into an issue with an SQL transform query. I have a replicated cache set up with thousands of cached items of various classes. When I run a transform query that returns specific (summary) items from classes in the cache, the query appears to execute just fine and returns a Collection. However, when I iterate through the Collection, after 2,048 items the individual items (which were castable until then) are now simply 'GridCacheQueryResponseEntry' instances, which I can no longer cast...
Is 2,048 items the limit for a Transform Query Result Set in GridGain?
Here's the code I use to query/transform the cache items (simplified a bit). It works for exactly 2,048 items and then throws an exception:
GridCacheQuery<Map.Entry<UUID, Object>> typeQuery =
        queries.createSqlQuery(Object.class, "from Object where Type = ? and Ident regexp ?");

GridClosure<Map.Entry<UUID, Object>, ReturnObject> trans =
        new GridClosure<Map.Entry<UUID, Object>, ReturnObject>() {
            @Override public ReturnObject apply(Map.Entry<UUID, Object> e) {
                ReturnObject tmp = null;
                try {
                    tmp = e.getValue().getReturnObject();
                } catch (Exception ex) {
                    ex.getMessage();
                }
                return tmp;
            }
        };

Collection<ReturnObject> results = typeQuery.execute(trans, "VarA", "VarB").get();

Iterator iter = results.iterator();
while (iter.hasNext()) {
    try {
        Object item = iter.next();
        ReturnObject point = (ReturnObject) item;
    } catch (Exception ex) {}
}
There are no such limitations in GridGain. Whenever you execute a query, you have two options:
Call the GridCacheQueryFuture.get() method to get the whole result set as a collection. This works only for relatively small result sets, because all the rows in the result set have to be loaded into the client node's memory.
Use GridCacheQueryFuture.next() to iterate through the result set (see the sketch below). In this case results are fetched from remote nodes page by page. When you finish iterating through a page, it is discarded and the next one is loaded, so you hold only one page at a time, which lets you query result sets of any size.
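A rough sketch of the second option, adapted to the code from the question and assuming next() returns null once the result set is exhausted:

GridCacheQueryFuture<ReturnObject> fut = typeQuery.execute(trans, "VarA", "VarB");

ReturnObject point;
while ((point = fut.next()) != null) {
    // process one transformed result at a time; pages are fetched from remote nodes lazily
}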
As for GridCacheQueryResponseEntry, you should not cast to it: it is an internal GridGain class and simply an implementation of the Map.Entry interface representing a key-value pair from the GridGain cache.
In the case of a transform query you get Map.Entry instances only inside the transformer, while the client node receives already transformed values, so I'm not sure how it is possible to get them during iteration. Can you provide a small code example of how you execute the query?