How to get the latest data from Redis in real time? - redis

I want to implement a real-time application with Redis.
Data is pushed to Redis in real time, as in the source code below, which uses the Lettuce library:
RedisClient redisClient = RedisClient.create(uri);
StatefulRedisConnection<String, String> connection = redisClient.connect();
RedisStringAsyncCommands<String, String> asyncCommands = connection.async();
List<RedisFuture<?>> futures = Lists.newArrayList();
while (true) {
    futures.add(asyncCommands.set("key", "value"));
}
If I want to watch this data from a client in real time, how can I implement it?
At first I tried the pub/sub approach, but pub/sub cannot read data that has already been stored; it only publishes data to a channel and delivers it to subscribers in real time.
For example, Kafka can continuously fetch data through a consumer, like this:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        logger.info("offset = {}, value = {}", record.offset(), record.value());
    }
}
Is there a way to do this with Redis?
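One option that maps closely to the Kafka consumer loop above is Redis Streams (Redis 5.0+), which keeps entries stored and lets a client read both existing and newly arriving data. Below is a minimal sketch with Lettuce's synchronous API; the stream key "events", the connection URI, and the blocking timeout are my own assumptions, not from the question. The producer would append with XADD instead of SET.

import io.lettuce.core.RedisClient;
import io.lettuce.core.StreamMessage;
import io.lettuce.core.XReadArgs;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import java.util.List;

public class StreamTail {
    public static void main(String[] args) {
        RedisClient redisClient = RedisClient.create("redis://localhost:6379"); // assumed URI
        StatefulRedisConnection<String, String> connection = redisClient.connect();
        RedisCommands<String, String> commands = connection.sync();

        // Producer side would append entries to the assumed stream key "events", e.g.:
        // commands.xadd("events", Map.of("key", "value"));

        // "0-0" starts from the beginning of the stream, so already-stored entries are read too.
        String lastSeenId = "0-0";
        while (true) {
            // Block for up to 5000 ms waiting for entries newer than lastSeenId.
            List<StreamMessage<String, String>> messages = commands.xread(
                    XReadArgs.Builder.block(5000),
                    XReadArgs.StreamOffset.from("events", lastSeenId));
            for (StreamMessage<String, String> message : messages) {
                System.out.println("id = " + message.getId() + ", body = " + message.getBody());
                lastSeenId = message.getId();
            }
        }
    }
}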

Related

Reading a large number of keys in high Redis traffic

I have used Redis to store the statistics of a website with relatively high traffic. I keep a Redis key for each statistical item and increment it on each call.
Now I need to read this information every minute, update it in SQL, and reset the keys to zero. Since the number of keys and the traffic are high, what is the optimal way to read the keys?
The easiest method is to scan the Redis database every minute. But since scanning is costly and it will also include new keys that have been alive for less than a minute, this method does not seem suitable.
In another approach, I first enabled keyspace notifications to receive an event when a key expires:
CONFIG SET notify-keyspace-events Ex
Then, for each key, I created an empty shadow key with a TTL of one minute; when its expiration notification arrives, I read the associated key's value and put it in an output queue. This method also has drawbacks, such as the possibility of losing notifications during a reconnect.
public void SubscribeOnStatKeyExpire()
{
    sub.Subscribe("__keyevent@0__:expired").OnMessage(async channelMessage => {
        await ExportExpiredKeys((string)channelMessage.Message);
    });
}

private async Task ExportExpiredKeys(string expiredKey)
{
    if (expiredKey.StartsWith(_notifyKeyPrefix))
    {
        var key = expiredKey.Remove(0, _notifyKeyPrefix.Length);
        var value = await _cache.StringGetDeleteAsync(key);
        if (!value.HasValue || value.IsNullOrEmpty) return;
        var count = (int)value;
        if (count <= _negligibleValue)
        {
            await _cache.StringIncrementAsync(key, count);
            return;
        }
        var listLength = await Enqueue(_exportQueue, key + "." + value);
        if (listLength > _numberOfKeysInOneMessage) // _numberOfKeysInOneMessage for now is 500
        {
            var list = await Dequeue(_exportQueue, _numberOfKeysInOneMessage);
            await SendToSqlServiceBrocker(list);
        }
    }
}
Is there a better way, for example using a stream, a sorted set, or something else?

Spring Webflux: I want to send data to Kafka after saving to the database

I'm trying to send data to Kafka after my database operation succeeds.
I have a POST endpoint which stores the data in MongoDB and returns the whole object along with the MongoDB UUID.
Now I want to perform an additional task: if the data is successfully saved in MongoDB, I should call my Kafka producer method and send the data.
I'm not sure how to do it.
Current Codebase
public Mono<?> createStock(StockDTO stockDTONBody) {
    // logger.info("Received StockDTO body: {}, ", stockDTONBody);
    Mono<StockDTO> stockDTO = mongoTemplate.save(stockDTONBody);
    // HERE I WANT TO SEND TO KAFKA IF DATA IS SAVED TO MONGO.
    return stockDTO;
}
Thanks @Alex for the help.
Adding my answer for others.
public Mono<?> createStock(StockDTO stockDTONBody) {
    // logger.info("Received StockDTO body: {}, ", stockDTONBody);
    Mono<StockDTO> stockDTO = mongoTemplate.save(stockDTONBody);
    // =============== Kafka code added ======================
    return stockDTO.flatMap(data -> sendToKafka(data, "create"));
}

public Mono<?> sendToKafka(StockDTO stockDTO, String eventName) {
    Map<String, Object> data = new HashMap<String, Object>();
    data.put("event", eventName);
    data.put("campaign", stockDTO);
    template.send(kafkaTopicName, data.toString()).log().subscribe();
    System.out.println("sending to Kafka " + eventName + data.toString());
    return Mono.just(stockDTO);
}
This can result in a dual-write problem: if the data is saved in Mongo but something goes wrong while publishing to Kafka, the record will be missing from Kafka. Instead, you should use change data capture (CDC) for this. MongoDB provides change streams, which can be used here, or there are open-source Kafka connectors that can be configured to listen to MongoDB's changelog and stream those changes to Kafka.
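As a rough sketch of the change-stream approach (not part of the original answer): with Spring Data MongoDB's reactive change stream support, the collection can be tailed and each change forwarded to Kafka. The collection name "stocks", the topic name, and the use of ReactiveKafkaProducerTemplate are assumptions for illustration; StockDTO is the type from the question.

import org.springframework.data.mongodb.core.ChangeStreamEvent;
import org.springframework.data.mongodb.core.ChangeStreamOptions;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.kafka.core.reactive.ReactiveKafkaProducerTemplate;
import reactor.core.publisher.Flux;
import reactor.kafka.sender.SenderResult;

public class StockChangePublisher {

    private final ReactiveMongoTemplate mongoTemplate;
    private final ReactiveKafkaProducerTemplate<String, String> kafkaTemplate;
    private final String kafkaTopicName = "stock-events"; // assumed topic name

    public StockChangePublisher(ReactiveMongoTemplate mongoTemplate,
                                ReactiveKafkaProducerTemplate<String, String> kafkaTemplate) {
        this.mongoTemplate = mongoTemplate;
        this.kafkaTemplate = kafkaTemplate;
    }

    public Flux<SenderResult<Void>> publishChanges() {
        // Tail the "stocks" collection (assumed name); each insert/update/replace event
        // carries the changed document, which is forwarded to Kafka.
        return mongoTemplate
                .changeStream("stocks", ChangeStreamOptions.empty(), StockDTO.class)
                .filter(event -> event.getBody() != null) // delete events carry no body
                .map(ChangeStreamEvent::getBody)
                .flatMap(stock -> kafkaTemplate.send(kafkaTopicName, stock.toString()));
    }
}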

What happens when a Flux is returned from a Spring web controller?

I am comparatively new to reactive APIs and was curious about what was happening behind the scenes when we return a Flux from a web controller.
According to the Spring Web documentation, reactive return values are handled as follows:
A single-value promise is adapted to, similar to using DeferredResult. Examples include Mono (Reactor) or Single (RxJava).
A multi-value stream with a streaming media type (such as application/stream+json or text/event-stream) is adapted to, similar to using ResponseBodyEmitter or SseEmitter. Examples include Flux (Reactor) or Observable (RxJava). Applications can also return Flux or Observable.
A multi-value stream with any other media type (such as application/json) is adapted to, similar to using DeferredResult<List<?>>.
I created two APIs as below:
@GetMapping("/async-deferredresult")
public DeferredResult<List<String>> handleReqDefResult(Model model) {
    LOGGER.info("Received async-deferredresult request");
    DeferredResult<List<String>> output = new DeferredResult<>();
    ForkJoinPool.commonPool().submit(() -> {
        LOGGER.info("Processing in separate thread");
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list.add(String.valueOf(i));
        }
        output.setResult(list);
    });
    LOGGER.info("servlet thread freed");
    return output;
}
@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    LOGGER.info("Received async-deferredresult request");
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list);
}
So my expectation was that both APIs would behave the same, since a multi-value stream (Flux) should behave similarly to returning a DeferredResult. But in the API where a DeferredResult was returned, the whole list was printed in one go in the browser, whereas in the API where a Flux was returned, the numbers were printed sequentially (one by one).
What exactly is happening when I return a Flux from a controller?
When we return a Flux from a service endpoint, many things can happen, but I assume you want to know how the Flux is observed as a stream of events by the client of this endpoint.
Scenario one: by setting 'application/json' as the content type of the endpoint, Spring tells the client to expect a single JSON body.
@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list);
}
The output at the client will be the whole set of numbers in one go, and once the response is delivered the connection is closed. Even though you used Flux as the response type, you are still bound by the way HTTP over TCP/IP works: the endpoint receives an HTTP request, executes the logic, and responds with an HTTP response containing the final result.
As a result, you do not see the real value of a reactive API.
Scenario two: by setting 'application/stream+json' as the content type of the endpoint, Spring treats the events of the Flux as individual JSON items. When an item is emitted, it gets serialized, the HTTP response buffer is flushed, and the connection from the server to the client stays open until the event sequence completes.
To get that working, we can slightly modify your original code as follows:
@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list)
            // 1 second delay per element to demonstrate the difference in behaviour.
            .delayElements(Duration.ofSeconds(1));
}
This time we can see the real value of a reactive API endpoint: it is able to deliver results to its client as data becomes available.
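To observe this from the client side, a small WebClient snippet (my own addition; the base URL is assumed) can subscribe to the streaming endpoint and print each element as it arrives:

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

public class StreamingClient {
    public static void main(String[] args) throws InterruptedException {
        WebClient client = WebClient.create("http://localhost:8080"); // assumed base URL

        client.get()
              .uri("/async-flux")
              .accept(MediaType.APPLICATION_STREAM_JSON)
              .retrieve()
              .bodyToFlux(String.class)
              // Each element is printed as soon as the server flushes it.
              .subscribe(value -> System.out.println("received: " + value));

        // Keep the JVM alive long enough to observe the first streamed elements.
        Thread.sleep(15_000);
    }
}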
You can find more details about how to build reactive REST APIs at:
https://medium.com/@senanayake.kalpa/building-reactive-rest-apis-in-java-part-1-cd2c34af55c6
https://medium.com/@senanayake.kalpa/building-reactive-rest-apis-in-java-part-2-bd270d4cdf3f

How to span a ConcurrentDictionary across load-balancer servers when using SignalR hub with Redis

I have an ASP.NET Core web application set up with SignalR, scaled out with Redis.
Using the built-in groups works fine:
Clients.Group("Group_Name");
and survives multiple load-balanced servers. I'm assuming that SignalR persists those groups in Redis automatically, so all servers know what groups we have and who is subscribed to them.
However, in my situation I can't just rely on Groups (or Users), because there is no way to map a connectionId back to its group (say, when overriding OnDisconnectedAsync, where only the connection id is known); you always need the Group_Name to identify the group. I need that mapping to identify which part of the group is online, so that when OnDisconnectedAsync is called, I know which group this connection belongs to and on which side of the conversation it is.
I've done some research, and the suggestions (including the Microsoft docs) all use something like:
static readonly ConcurrentDictionary<string, ConversationInformation> connectionMaps;
in the hub itself.
Now, this is a great solution (and thread-safe), except that it exists only in the memory of one of the load-balanced servers; the other servers have different instances of this dictionary.
The question is, do I have to persist connectionMaps manually? Using Redis for example?
Something like:
public class ChatHub : Hub
{
    static readonly ConcurrentDictionary<string, ConversationInformation> connectionMaps;

    ChatHub(IDistributedCache distributedCache)
    {
        connectionMaps = distributedCache.Get("ConnectionMaps");
        /// I think connectionMaps should not be static any more.
    }
}
And if yes, is it thread-safe? If not, can you suggest a better solution that works with load balancing?
I have been battling with the same issue on this end. What I've come up with is to persist the collections in the Redis cache, using a StackExchange.Redis.IDatabaseAsync together with locks to handle concurrency.
This unfortunately makes the entire process effectively synchronous, but I couldn't quite figure out a way around that.
Here's the core of what I'm doing; this acquires a lock and returns a deserialized collection from the cache:
private async Task<ConcurrentDictionary<int, HubMedia>> GetMediaAttributes(bool requireLock)
{
    if (requireLock)
    {
        var retryTime = 0;
        try
        {
            while (!await _redisDatabase.LockTakeAsync(_mediaAttributesLock, _lockValue, _defaultLockDuration))
            {
                // wait till we can get a lock on the data, retrying every 100 ms
                await Task.Delay(100);
                retryTime += 100;
                if (retryTime > _defaultLockDuration.TotalMilliseconds)
                {
                    _logger.LogError("Failed to get Media Attributes");
                    return null;
                }
            }
        }
        catch (TaskCanceledException e)
        {
            _logger.LogError("Failed to take lock within the default 5 second wait time " + e);
            return null;
        }
    }
    var mediaAttributes = await _redisDatabase.StringGetAsync(MEDIA_ATTRIBUTES_LIST);
    if (!mediaAttributes.HasValue)
    {
        return new ConcurrentDictionary<int, HubMedia>();
    }
    return JsonConvert.DeserializeObject<ConcurrentDictionary<int, HubMedia>>(mediaAttributes);
}
I update the collection like this after I've finished manipulating it:
private async Task<bool> UpdateCollection(string redisCollectionKey, object collection, string lockKey)
{
    var success = false;
    try
    {
        success = await _redisDatabase.StringSetAsync(redisCollectionKey, JsonConvert.SerializeObject(collection, new JsonSerializerSettings
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        }));
    }
    finally
    {
        await _redisDatabase.LockReleaseAsync(lockKey, _lockValue);
    }
    return success;
}
And when I'm done, I just make sure the lock is released so other instances can grab and use it:
private async Task ReleaseLock(string lockKey)
{
    await _redisDatabase.LockReleaseAsync(lockKey, _lockValue);
}
I'd be happy to hear if you find a better way of doing this. I struggled to find any documentation on scale-out with data retention and sharing.

Java - Insert a single row at a time into Google BigQuery?

I am creating an application where, every time a user clicks on an article, I need to capture the article data and the user data to calculate the reach of every article and to be able to run analytics on the reach data.
My application runs on App Engine.
When I check the documentation for inserts into BigQuery, most of it points towards bulk inserts in the form of jobs or streams.
Question:
Is it even good practice to insert into BigQuery one row at a time, every time a user action is initiated? If so, could you point me to some Java code that does this effectively?
There are limits on the number of load jobs and DML queries (1,000 per day), so you'll need to use streaming inserts for this kind of application. Note that streaming inserts are different from loading data from a Java stream.
TableId tableId = TableId.of(datasetName, tableName);
// Values of the row to insert
Map<String, Object> rowContent = new HashMap<>();
rowContent.put("booleanField", true);
// Bytes are passed in base64
rowContent.put("bytesField", "Cg0NDg0="); // 0xA, 0xD, 0xD, 0xE, 0xD in base64
// Records are passed as a map
Map<String, Object> recordsContent = new HashMap<>();
recordsContent.put("stringField", "Hello, World!");
rowContent.put("recordField", recordsContent);
InsertAllResponse response =
    bigquery.insertAll(
        InsertAllRequest.newBuilder(tableId)
            .addRow("rowId", rowContent)
            // More rows can be added in the same RPC by invoking .addRow() on the builder
            .build());
if (response.hasErrors()) {
    // If any of the insertions failed, this lets you inspect the errors
    for (Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
        // inspect row error
    }
}
(From the example at https://cloud.google.com/bigquery/streaming-data-into-bigquery#bigquery-stream-data-java)
Note especially that a failed insert does not always throw an exception. You must also check the response object for errors.
Is it even good practice to insert into BigQuery one row at a time, every time a user action is initiated?
Yes, it's pretty typical to stream event data to BigQuery for analytics. You could get better performance if you buffer multiple events into the same streaming insert request to BigQuery, but one row at a time is definitely supported.
A simplified version of Google's example:
Map<String, Object> row1Data = new HashMap<>();
row1Data.put("booleanField", true);
row1Data.put("stringField", "myString");

Map<String, Object> row2Data = new HashMap<>();
row2Data.put("booleanField", false);
row2Data.put("stringField", "myOtherString");

TableId tableId = TableId.of("myDatasetName", "myTableName");
InsertAllResponse response =
    bigQuery.insertAll(
        InsertAllRequest.newBuilder(tableId)
            .addRow("row1Id", row1Data)
            .addRow("row2Id", row2Data)
            .build());
if (response.hasErrors()) {
    // If any of the insertions failed, this lets you inspect the errors
    for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
        // inspect row error
    }
}
You can use the Cloud Logging API to write one row at a time.
https://cloud.google.com/logging/docs/reference/libraries
Sample code from the documentation:
public class QuickstartSample {
    /** Expects a new or existing Cloud log name as the first argument. */
    public static void main(String... args) throws Exception {
        // Instantiates a client
        Logging logging = LoggingOptions.getDefaultInstance().getService();

        // The name of the log to write to
        String logName = args[0]; // "my-log";

        // The data to write to the log
        String text = "Hello, world!";

        LogEntry entry =
            LogEntry.newBuilder(StringPayload.of(text))
                .setSeverity(Severity.ERROR)
                .setLogName(logName)
                .setResource(MonitoredResource.newBuilder("global").build())
                .build();

        // Writes the log entry asynchronously
        logging.write(Collections.singleton(entry));

        System.out.printf("Logged: %s%n", text);
    }
}
In this case you need to create a sink that exports these logs to BigQuery; the log messages will then be routed to the BigQuery table.
https://cloud.google.com/logging/docs/export/configure_export_v2
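For reference, a sink can also be created programmatically with the same google-cloud-logging client. A minimal sketch, where the sink name, dataset, and filter are placeholder values of my own choosing:

import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Sink;
import com.google.cloud.logging.SinkInfo;
import com.google.cloud.logging.SinkInfo.Destination.DatasetDestination;

public class CreateBigQuerySink {
    public static void main(String... args) throws Exception {
        try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
            // Route entries whose log name contains "my-log" (placeholder filter)
            // into the BigQuery dataset "my_dataset" (placeholder dataset).
            SinkInfo sinkInfo =
                SinkInfo.newBuilder("my-bigquery-sink", DatasetDestination.of("my_dataset"))
                    .setFilter("logName:my-log")
                    .build();
            Sink sink = logging.create(sinkInfo);
            System.out.printf("Created sink: %s%n", sink.getName());
        }
    }
}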