Ignite and Kafka Integration - ignite

I am trying the Ignite and Kafka integration to bring Kafka messages into an Ignite cache.
My message key is a random string (to work with Ignite, the Kafka message key can't be null), and the value is a JSON string representation of Person (a Java class).
When Ignite receives such a message, it looks like Ignite uses the message's key (the random string in my case) as the cache key.
Is it possible to change the message key to the person's id, so that I can put the Person into the cache?
It looks like streamer.receiver(new StreamReceiver) would work:
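streamer.receiver(new StreamReceiver<String, String>() {
    @Override
    public void receive(IgniteCache<String, String> cache, Collection<Map.Entry<String, String>> entries) throws IgniteException {
        // The streamer is typed <String, String>, but Person objects are stored,
        // so view the same cache with wider generic types (unchecked cast).
        @SuppressWarnings("unchecked")
        IgniteCache<Object, Object> target = (IgniteCache<Object, Object>) (IgniteCache<?, ?>) cache;
        for (Map.Entry<String, String> entry : entries) {
            Person p = fromJson(entry.getValue());
            // Ignore the message key and use the person id as the cache key
            target.put(p.getId(), p);
        }
    }
});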
Is this the recommended way? I am also not sure whether calling cache.put in a StreamReceiver is correct, since the receiver is meant to be only a pre-processing step before writing to the cache.

The data streamer maps all your keys to cache affinity nodes, creates batches of entries and sends the batches to the affinity nodes. After that, the StreamReceiver receives your entries, gets the Person's ID and invokes cache.put(K, V). Putting an entry maps your key to the corresponding cache affinity node and sends an update request to that node.
Everything looks good. But the result of mapping your random key from Kafka and the result of mapping the Person's ID will differ (most likely they will be different nodes). As a result, you will get poor performance due to redundant network hops.
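To see where the extra hop comes from, here is a toy model in plain Java. It assumes a simple hash-modulo partitioner and a hypothetical 4-node cluster; Ignite's real affinity function (e.g. RendezvousAffinityFunction) is more elaborate, so this only illustrates the idea: the streamer routes a batch by the Kafka key, while the cache.put inside the receiver routes by the Person ID, and the two usually point at different nodes.

```java
import java.util.UUID;

// Toy model of key-to-node routing; NOT Ignite's actual affinity function.
class ToyAffinity {
    static final int PARTITIONS = 1024; // hypothetical partition count
    static final int NODES = 4;         // hypothetical cluster size

    // Map a key to a partition with a simple hash-modulo scheme
    // (floorMod keeps the result non-negative for negative hash codes).
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Assign partitions to nodes round-robin (again, a simplification).
    static int node(Object key) {
        return partition(key) % NODES;
    }

    public static void main(String[] args) {
        String kafkaKey = UUID.randomUUID().toString(); // random Kafka message key
        String personId = "person-42";                   // hypothetical Person ID
        int streamerNode = node(kafkaKey); // where the streamer sends the batch
        int putNode = node(personId);      // where cache.put(personId, ...) must go
        System.out.println("streamer -> node " + streamerNode
                + ", cache.put -> node " + putNode
                + (streamerNode == putNode ? " (same node)" : " (extra network hop)"));
    }
}
```

With 4 nodes, the two lookups agree only about a quarter of the time, so most entries travel twice.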
Unfortunately, the current KafkaStreamer implementation doesn't support stream tuple extractors (see e.g. the StreamSingleTupleExtractor class). But you can easily create your own Kafka streamer implementation using the existing one as an example.
You can also try to use KafkaStreamer's keyDecoder and valDecoder in order to extract the Person's ID from the Kafka message. I'm not sure, but it may help.
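A rough sketch of the tuple-extractor idea in plain Java, assuming the Person ID is a string stored in an "id" field of the JSON value: each Kafka (key, value) pair is re-keyed by Person ID before it reaches the streamer, so the streamer's own affinity mapping already uses the right key. The regex-based extractPersonId helper is a hypothetical stand-in for a real JSON parser, and extract only mirrors the general shape of Ignite's StreamSingleTupleExtractor (one message in, one Map.Entry out); it is not the Ignite API.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor: re-keys a Kafka record by the Person ID in its JSON value.
class PersonKeyExtractor {
    // Naive pattern for a string "id" field; a real implementation should use a JSON library.
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]*)\"");

    // Hypothetical helper: pull the Person ID out of the JSON value.
    static String extractPersonId(String json) {
        Matcher m = ID.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No \"id\" field in: " + json);
        }
        return m.group(1);
    }

    // One Kafka record in, one cache entry out; the random Kafka key is discarded.
    static Map.Entry<String, String> extract(String kafkaKey, String kafkaValue) {
        return new AbstractMap.SimpleImmutableEntry<>(extractPersonId(kafkaValue), kafkaValue);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e =
                extract("f3a9c1", "{\"id\":\"p-1\",\"name\":\"Alice\"}");
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```

In a custom streamer, the resulting entry would be passed to something like streamer.addData(entry.getKey(), ...) (with the JSON or the parsed Person as the value), so no cache.put is needed inside a receiver.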

Related

Ingest data into warp10 - Performance tip

We're looking for the best way to ingest data into Warp 10. We have a microservices architecture that mainly uses Kafka.
Two solutions:
Use Ingress endpoint as defined here: https://www.warp10.io/content/03_Documentation/03_Interacting_with_Warp_10/03_Ingesting_data/01_Ingress (This is the solution we use for now)
Use the warp10 Kafka plugin as defined here: https://blog.senx.io/introducing-the-warp-10-kafka-plugin/
As described above, we currently use the Ingress solution: we aggregate data for x seconds and then call the Ingress API to send the data as one packet (instead of calling the API each time we need to insert something).
For a few days, we have been experimenting with the Kafka plugin. We successfully set up the plugin and created an .mc2 file that consumes data from a given topic and then inserts it into Warp 10 using UPDATE.
Questions:
When using the Kafka plugin, would it be better to apply the same buffer mechanism as with the Ingress endpoint? Or is there a specific implementation in the Warp 10 Kafka plugin that allows consuming the topic message by message and calling UPDATE for each one?
Today, as both solutions work, we're trying to find their differences to get the best ingestion performance, ideally without any buffer mechanism, because we are trying to be as close to real time as possible.
MC2 file:
{
  'topics' [ 'our_topic_name' ] // List of Kafka topics to subscribe to
  'parallelism' 1 // Number of threads to start for processing the incoming messages. Each thread will handle a certain number of partitions.
  'config' { // Map of Kafka consumer parameters
    'bootstrap.servers' 'kafka-headless:9092'
    'group.id' 'senx-consumer'
    'enable.auto.commit' 'true'
  }
  'macro' <%
    // macro executed each time a kafka record is consumed
    /*
      // received record format :
      {
        'timestamp' 123 // The record timestamp
        'timestampType' 'type' // The type of timestamp, can be one of 'NoTimestampType', 'CreateTime', 'LogAppendTime'
        'topic' 'topic_name' // Name of the topic which received the message
        'offset' 123 // Offset of the message in 'topic'
        'partition' 123 // Id of the partition which received the message
        'key' ... // Byte array of the message key
        'value' ... // Byte array of the message value
        'headers' { } // Map of message headers
      }
    */
    "recordArray" STORE
    "preprod.write" "token" STORE
    // macro can be called on timeout with an empty entry map
    $recordArray SIZE 0 !=
    <%
      $recordArray 'value' GET // kafka record value is retrieved in bytes
      'UTF-8' BYTES-> // convert bytes to string (WARP10 INGRESS format)
      JSON->
      "value" STORE
      "Records received through Kafka" LOGMSG
      $value LOGMSG
      $value
      <%
        DROP
        PARSE
        // PARSE outputs a gtsList, including only one gts
        0 GET
        // GTS rename is required to use UPDATE function
        "gts" STORE
        $gts $gts NAME RENAME
      %>
      LMAP
      // Store GTS in Warp10
      $token
      UPDATE
    %>
    IFT
  %> // end macro
  'timeout' 10000 // Polling timeout (in ms), if no message is received within this delay, the macro will be called with an empty map as input
}
If you want to cache something in Warp 10 to avoid lots of UPDATE calls per second, you can use SHM (SHared Memory). This is a built-in extension you need to activate.
Once activated, use SHMSTORE and SHMLOAD to keep objects in RAM between two WarpScript executions.
In your example, you can push all the incoming GTS into a list, or a list of lists of GTS, using +! to append elements to an existing list.
The MERGE of all the GTS in the cache (by name + labels) and the UPDATE into the database can then be done in a runner (don't forget to use a MUTEX).
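The buffering trade-off (one UPDATE per record versus one UPDATE per batch) does not depend on the language. As a rough Java analogue of the "SHM list + runner" idea (this is not WarpScript and not a Warp 10 API), the sketch below accumulates ingress lines and flushes them when either a hypothetical maximum batch size or a maximum age is reached; the flusher callback stands in for whatever performs the UPDATE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical micro-batcher: flushes on size or age, whichever comes first.
class MicroBatcher {
    private final int maxSize;
    private final long maxAgeMillis;
    private final Consumer<List<String>> flusher; // e.g. one UPDATE per batch
    private final List<String> buffer = new ArrayList<>();
    private long firstAddedAt = -1;

    MicroBatcher(int maxSize, long maxAgeMillis, Consumer<List<String>> flusher) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.flusher = flusher;
    }

    // Add one ingress line; nowMillis is passed in so the logic is easy to test.
    synchronized void add(String line, long nowMillis) {
        if (buffer.isEmpty()) {
            firstAddedAt = nowMillis;
        }
        buffer.add(line);
        if (buffer.size() >= maxSize || nowMillis - firstAddedAt >= maxAgeMillis) {
            flush();
        }
    }

    // Called periodically too, so a quiet topic does not hold data forever.
    synchronized void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstAddedAt >= maxAgeMillis) {
            flush();
        }
    }

    // Hand a copy of the buffered lines to the flusher and start a new batch.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
        firstAddedAt = -1;
    }
}
```

A small maximum age keeps latency close to real time while still grouping bursts. In the MC2 setup from the question, the call made with an empty map on polling timeout is a natural place for the tick step.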
Don't forget the total operation cost:
The ingress format can be optimized for ingestion speed if you do not repeat the classname and labels, and if you gather lines per GTS. See here.
PARSE deserializes data from the Warp 10 ingress format.
UPDATE serializes data to the Warp 10 optimized ingress format (and pushes it to the update endpoint).
The update endpoint deserializes it again.
These deserialize/serialize/deserialize operations make sense if your input data is far from the optimal ingress format. They also make sense if you want to RANGECOMPACT your data to save disk space, or do any other preprocessing.
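To illustrate "gather lines per GTS and do not repeat classname and labels", here is a Java sketch that groups data points by series and writes the first line of each series in full and the following ones in the short form. It assumes the '=' continuation syntax of the Warp 10 ingress format (a line starting with '=' reuses the classname and labels of the previous line) and points without location or elevation; the Point class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: group data points per GTS and emit the compact '=' continuation form.
class IngressCompactor {
    // Hypothetical data point: timestamp, selector (classname{labels}) and value.
    static final class Point {
        final long ts;
        final String selector; // e.g. temp{room=a}
        final String value;

        Point(long ts, String selector, String value) {
            this.ts = ts;
            this.selector = selector;
            this.value = value;
        }
    }

    static List<String> compact(List<Point> points) {
        // Group points per series, keeping first-seen order of series and of points.
        Map<String, List<Point>> bySeries = new LinkedHashMap<>();
        for (Point p : points) {
            bySeries.computeIfAbsent(p.selector, k -> new ArrayList<>()).add(p);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Point>> e : bySeries.entrySet()) {
            boolean first = true;
            for (Point p : e.getValue()) {
                // "TS//" means no location and no elevation.
                String prefix = first ? p.ts + "// " + e.getKey() : "=" + p.ts + "//";
                lines.add(prefix + " " + p.value);
                first = false;
            }
        }
        return lines;
    }
}
```

For example, two temp{room=a} points and one hum{room=a} point become one full line plus one '=' line for temp, followed by one full line for hum.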

How Spring store cache and key to Redis

I followed some tutorials on the web to set up Spring Cache with Redis.
My function looks like this:
@Cacheable(value = "post-single", key = "#id", unless = "#result.shares < 500")
@GetMapping("/{id}")
public Post getPostByID(@PathVariable String id) throws PostNotFoundException {
    log.info("get post with id {}", id);
    return postService.getPostByID(id);
}
As I understand it, the value inside @Cacheable is the cache name, and key is the cache key within that cache. I also know Redis is an in-memory key/value store. But now I'm confused about how Spring stores the cache name in Redis, because it looks like Redis only manages keys and values, not cache names.
I'm looking for anyone who can explain this to me.
Thanks in advance
Spring uses the cache name as the key prefix when storing your data. For example, when you call your endpoint with id=1 you will see this key in Redis:
post-single::1
You can customize the prefix format through the CacheKeyPrefix interface.
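A plain-Java sketch of how the final Redis key is composed, assuming the default prefix is the cache name followed by "::" (which matches the post-single::1 key above). KeyPrefix here is a hypothetical stand-in shaped like Spring Data Redis's CacheKeyPrefix (a function from cache name to prefix); in a real application you would configure the prefix on RedisCacheConfiguration (for example via computePrefixWith) rather than build keys yourself.

```java
// Hypothetical stand-in shaped like Spring Data Redis's CacheKeyPrefix.
@FunctionalInterface
interface KeyPrefix {
    String compute(String cacheName);

    // Default-style prefix: "<cacheName>::"
    static KeyPrefix simple() {
        return cacheName -> cacheName + "::";
    }
}

class RedisKeyDemo {
    // Redis key = prefix(cacheName) + cache key (here the String form of "#id").
    static String redisKey(KeyPrefix prefix, String cacheName, Object key) {
        return prefix.compute(cacheName) + key;
    }

    public static void main(String[] args) {
        System.out.println(redisKey(KeyPrefix.simple(), "post-single", 1));          // post-single::1
        System.out.println(redisKey(name -> "blog:" + name + ":", "post-single", 1)); // blog:post-single:1
    }
}
```

So the cache name never needs its own Redis structure: it is simply folded into every key, which also means all entries of one cache can be found by scanning for the prefix.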

Can I allow multiple http clients to consume a Flowable stream of data with resteasy-rxjava2 / quarkus?

Currently I am able to see the streamed values exposed by the code below, but only one HTTP client receives the continuous stream of values; the others do not.
The code, a modified version of the Quarkus quickstart for Kafka reactive streaming, is:
@Path("/migrations")
public class StreamingResource {
    private volatile Map<String, String> counterBySystemDate = new ConcurrentHashMap<>();

    @Inject
    @Channel("migrations")
    Flowable<String> counters;

    @GET
    @Path("/stream")
    @Produces(MediaType.SERVER_SENT_EVENTS) // denotes that server-sent events (SSE) will be produced
    @SseElementType("text/plain") // denotes that the contained data, within this SSE, is just regular text/plain data
    public Publisher<String> stream() {
        Flowable<String> mainStream = counters.doOnNext(dateSystemToCount -> {
            String key = dateSystemToCount.substring(0, dateSystemToCount.lastIndexOf("_"));
            counterBySystemDate.put(key, dateSystemToCount);
        });
        return fromIterable(counterBySystemDate.values().stream().sorted().collect(Collectors.toList()))
                .concatWith(mainStream)
                .onBackpressureLatest();
    }
}
Is it possible to make a modification that would allow multiple clients to consume the same data, in a broadcast fashion?
I guess this implies letting go of backpressure, because backpressure would imply a state per consumer?
I saw that Observable is not accepted as a return type in resteasy-rxjava2 for the Server-Sent Events media type.
Please let me know any ideas,
Thank you
Please find the full code in the question "Why in multiple connections to PricesResource Publisher, only one gets the stream?"
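The general idea behind a broadcast is to share one upstream subscription among many subscribers instead of giving each HTTP client its own pass over the source; in RxJava 2 this is typically done with operators such as publish().autoConnect() or share(). The sketch below is not RxJava or Quarkus code: it uses the JDK's java.util.concurrent.SubmissionPublisher (Java 9+) only to show multicast behaviour, where every subscriber gets each item submitted after it subscribed and keeps its own buffer, so demand is still tracked per subscriber.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.SubmissionPublisher;

// Multicast sketch with the JDK Flow API; stands in for a shared Flowable.
class BroadcastDemo {
    // Feed the same items to two subscribers and return what each one received.
    static List<List<String>> broadcast(List<String> items) {
        List<String> clientA = new CopyOnWriteArrayList<>();
        List<String> clientB = new CopyOnWriteArrayList<>();
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        // Subscribe both "clients" before any item is published.
        CompletableFuture<Void> doneA = publisher.consume(clientA::add);
        CompletableFuture<Void> doneB = publisher.consume(clientB::add);
        items.forEach(publisher::submit); // each item goes to every subscriber
        publisher.close();                // signal completion after buffered items
        doneA.join();                     // wait until each client saw onComplete
        doneB.join();
        return List.of(clientA, clientB);
    }

    public static void main(String[] args) {
        System.out.println(broadcast(List.of("a", "b", "c")));
    }
}
```

Note that a client subscribing late only sees items published after it joined, which is why the question's code prepends the cached counterBySystemDate values for each new client.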

MobileFirst 8 : Unexpected error encountered while storing data

We are using UserAuthenticationSecurityCheck to authenticate users.
If verification is successful, the MFP server stores the user attributes.
public class AuthSecurityCheck extends UserAuthenticationSecurityCheck {
    static Logger logger = Logger.getLogger(AuthSecurityCheck.class.getName());
    private String userId, displayName;
    private JSONObject attrObject;
    private String errorMessage;

    @Override
    protected AuthenticatedUser createUser() {
        Map<String, Object> userAttrsMap = new HashMap<String, Object>();
        userAttrsMap.put("attributes", attrObject);
        return new AuthenticatedUser(userId, displayName, this.getName(), userAttrsMap);
    }
    ...
}
But if we store larger data (when userAttrsMap is large enough), we get a 500 error:
errorMsg: Unexpected error encountered while storing data
The error is shown below:
Full source is on Github: https://github.com/DannyYang/PMR_CreateUserStoredLargeData
MFP version:
cordova-plugin-mfp 8.0.2017102115
MFP DevelopKit : 8.0.0.00-20171024-064640
The issue happens owing to the size of the data you are holding within the AuthenticatedUser object, and thereby within the security check's state.
The MFP runtime saves the state of the security check, along with all the attributes, to the attribute store. This involves serializing the security check state and persisting it to the DB. With a large object (the custom map you have), this persistence operation fails and ends in a transaction rollback, because the data you are trying to persist is too big and exceeds the allocated size.
A SecurityCheck is designed to be used for a security check (validation) and for creating an identity object. Within your security check implementation, you have the following:
//Here the large data is assigned to the variable.
attrObject = JSONObject.parse(largeJSONString);
//This data is set into the AuthenticatedUser object.
Map<String, Object> userAttrsMap = new HashMap<String, Object>();
userAttrsMap.put("attributes",attrObject);
return new AuthenticatedUser(userId, displayName, this.getName(), userAttrsMap);
In this scenario, the large data becomes part of the security check itself, and the runtime serializes it and attempts to persist it into the attribute store. When this data does not fit in the column, the transaction is rolled back and the error condition is propagated to the end user. Hence the error message you see: "Unexpected error encountered while storing data". Enabling detailed tracing will show the actual cause of the issue in the server trace logs.
Either way, this approach is not recommended at all in production systems because:
a) Every request from the client reaching the server goes through security introspection, which requires the server to load, check and update the security check's state. On systems under heavy load (production) this can and will have performance costs. The process involves serializing the data and deserializing it later. In a distributed topology (cluster or farm) the request may end up on any of the nodes, and these nodes will have to load and later save the security check's state to the store. All of this will impact the performance of your system.
b) At the end of a successful authentication, the AuthenticatedUser object is propagated to the client application to indicate completion of the login flow. Even if the SecurityCheck state were stored successfully in the attribute store (with large data), transmitting large payloads over the network just to indicate a successful login is counterproductive. For the end user it may appear as if nothing has happened since they entered their credentials, while the data indicating success is still being downloaded.
c) Under heavy load, the server will be strained by both a) and b) above.
You should consider cutting down the data that is propagated to the client within the AuthenticatedUser object; keep it minimal. Instead, offload obtaining large data to resource adapters that can be accessed after a successful login.
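To make the size argument concrete, the plain-Java sketch below (no MobileFirst APIs involved) serializes two candidate attribute maps with standard Java serialization and compares their byte counts. This is only a rough proxy: the MFP runtime's actual serialization format and column size are not shown in the question. The minimalAttributes helper and its field names are hypothetical; the large JSON would instead be fetched after login, e.g. through a resource adapter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Rough proxy for how much state a security check would have to persist.
class AttributeSizeCheck {
    // Number of bytes standard Java serialization produces for the object.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Hypothetical helper: keep only small identity data in the user attributes.
    static HashMap<String, Object> minimalAttributes(String userId, String displayName) {
        HashMap<String, Object> attrs = new HashMap<>();
        attrs.put("userId", userId);
        attrs.put("displayName", displayName);
        return attrs;
    }

    public static void main(String[] args) {
        HashMap<String, Object> minimal = minimalAttributes("u-1", "Alice");
        HashMap<String, Object> large = new HashMap<>(minimal);
        large.put("attributes", "x".repeat(100_000)); // stand-in for a large JSON string
        System.out.println("minimal: " + serializedSize(minimal) + " bytes, large: "
                + serializedSize(large) + " bytes");
    }
}
```

Every byte in the large map is paid on each introspected request and again when the AuthenticatedUser is sent to the client, which is why keeping the attributes to identity data scales better.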

Redis Booksleeve - How to use Hash API properly

I am using the BookSleeve hash API for Redis. I am doing the following:
CurrentConnection.Hashes.Set(0, "item:1", "priority", task.priority.ToString());
var taskResult = CurrentConnection.Hashes.GetString(0, "item:1", "priority");
taskResult.Wait();
var priority = Int32.Parse(taskResult.Result);
However, I am getting an AggregateException:
"ERR Operation against a key holding the wrong kind of value"
I am not sure what I am doing wrong here (except for blocking on the task :)).
Note: CurrentConnection is an instance of BookSleeve.RedisConnection
Please help!
Thanks
That is not a Booksleeve issue - it is a redis error; in fact, the full error message you should be seeing is:
Redis server: ERR Operation against a key holding the wrong kind of value
(where I try to make it clear that this error has come from redis, not Booksleeve)
As for what causes this: each key in redis has a designated type; string, hash, list, etc. You cannot use hash operations on something that is not a hash.
My guess, then, is that "item:1" already exists, but as something other than a hash. I have unit tests that confirm this from Booksleeve (i.e. with/without a pre-existing non-hash value).
You can investigate this in redis using redis-cli or any other client (telnet works, at a push), with the command:
type item:1
(thanks @Sripathi)
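A toy in-memory model of the rule described above (plain Java, not Redis or BookSleeve code): every key remembers its type, and a hash command against a key that already holds a string fails, which is what happens if "item:1" was earlier written with a plain SET. Deleting the key (or choosing a different key name) before the hash operation avoids the error.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Redis' per-key typing; not a Redis client.
class ToyRedis {
    private final Map<String, Object> data = new HashMap<>();

    // Like SET: stores a plain string value, replacing whatever was there.
    void set(String key, String value) {
        data.put(key, value);
    }

    // Returns the hash stored at key, optionally creating it; rejects non-hash values.
    @SuppressWarnings("unchecked")
    private Map<String, String> hashAt(String key, boolean create) {
        Object current = create
                ? data.computeIfAbsent(key, k -> new HashMap<String, String>())
                : data.get(key);
        if (current != null && !(current instanceof Map)) {
            throw new IllegalStateException(
                    "ERR Operation against a key holding the wrong kind of value");
        }
        return (Map<String, String>) current;
    }

    // Like HSET: only valid if the key is absent or already a hash.
    void hset(String key, String field, String value) {
        hashAt(key, true).put(field, value);
    }

    // Like HGET: null for a missing key or field.
    String hget(String key, String field) {
        Map<String, String> hash = hashAt(key, false);
        return hash == null ? null : hash.get(field);
    }

    // Like TYPE: "none", "string" or "hash".
    String type(String key) {
        Object current = data.get(key);
        if (current == null) {
            return "none";
        }
        return current instanceof Map ? "hash" : "string";
    }

    // Like DEL.
    void del(String key) {
        data.remove(key);
    }

    public static void main(String[] args) {
        ToyRedis redis = new ToyRedis();
        redis.set("item:1", "legacy string value"); // pre-existing non-hash value
        System.out.println(redis.type("item:1"));   // string
        try {
            redis.hset("item:1", "priority", "3");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // the wrong-kind-of-value error
        }
        redis.del("item:1");                        // clear the old value first
        redis.hset("item:1", "priority", "3");
        System.out.println(redis.hget("item:1", "priority")); // 3
    }
}
```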