Aggregate by time and return the latest value in MuleSoft (Anypoint Studio)

I have an API that continuously receives key-value pairs throughout the day. We use a time-based aggregator to batch all the values received into 30-minute batches and send them to the server for processing. The API might receive values for the same key at different times, for instance:
Call 1: {key="A", value=5}
Call 2: {key="A", value=6}
Call 3: {key="A", value=7}
Call 4: {key="B", value=10}
Call 5: {key="A", value=1}
Currently, the time-based aggregator provides me with an ArrayList containing all of the above key-value pairs at the end of the 30 minutes. However, my requirement is to receive only the latest value for each key. The end result of the example above should be the following:
[{"A":1, "B":10}]
That is, the aggregator should only return the latest value for each key. I am not even sure if I need to use an aggregator in this case. What would be the best way to achieve this?
Thank you in advance.
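One way to handle this: keep the time-based aggregator for the 30-minute batching, and collapse the batch afterwards (in Mule this would typically be a DataWeave transform after the aggregator fires). The collapsing logic is just "last write wins per key". Below is a minimal Java sketch of that logic, with a hypothetical Pair type standing in for whatever shape the aggregator actually emits:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one aggregated key-value pair.
record Pair(String key, int value) {}

class LatestPerKey {
    public static void main(String[] args) {
        // The five calls from the question, in arrival order.
        List<Pair> batch = List.of(
                new Pair("A", 5), new Pair("A", 6), new Pair("A", 7),
                new Pair("B", 10), new Pair("A", 1));
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Pair p : batch) {
            latest.put(p.key(), p.value()); // later arrivals overwrite earlier ones
        }
        System.out.println(latest); // {A=1, B=10}
    }
}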

Related

How do I get records in a Redis stream by position instead of ID?

With XREAD, I can get COUNT entries starting from a specific ID (or from the first or last). With XRANGE, I can get a range from ID to ID (or from the first with - or to the last with +). Same in reverse with XREVRANGE.
How do I get records by position? E.g. "give me the last 10 records in order"? XREVRANGE can do it with XREVRANGE stream + - COUNT 10, although the results will be in reverse order, and XRANGE will give me the first 10 with XRANGE stream - + COUNT 10, but how do I:
get the last X in order?
get an arbitrary X in the middle, e.g. the 10 from position 100-109 (inclusive), when the stream has 5000 records in it?
Redis Streams currently (v6.0) do not provide an API for accessing their records via position/offset. The Stream's main data structure (a radix tree) can't provide this type of functionality efficiently.
This is doable with an additional index, for example a Sorted Set of the ids. A similar discussion is at https://groups.google.com/g/redis-db/c/bT00XGMq4DM/m/lWDzFa3zBQAJ
The real question is, as @sonus21 asked, what's the use case.
It is an existing API that allows users to retrieve records from index x to index y. It has a memory backend (just an array of records) and a file backend, and we want to add Redis as a backend as well.
You can just use a Redis List and get items at any offset using LRANGE.
Attaching consumers to a Redis LIST is easy, but it does not come with an ack mechanism like a Stream does. You can build an ack mechanism in your own code, but that can be tricky. There are many ways to build an ack mechanism in Redis; see one of them in Rqueue.
An ack mechanism is not always required. If you just want to consume from the LIST, you can use the BLPOP/LPOP commands as a simple consumer, but these commands remove entries from the LIST, so your offset becomes dynamic and no longer starts from 0. Instead, use LRANGE and track the offset in a simple key like my-consumer-offset; using this offset, your consumer can always fetch the next element from the list, as sketched below.
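A sketch of that offset-tracked consumer in Java, assuming the Jedis client; the list name mystream:index and batch size are placeholders:

import java.util.List;
import redis.clients.jedis.Jedis;

class OffsetConsumer {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Resume from the last committed position (0 on first run).
            String raw = jedis.get("my-consumer-offset");
            long offset = raw == null ? 0 : Long.parseLong(raw);
            // Read the next batch without removing anything from the list.
            List<String> batch = jedis.lrange("mystream:index", offset, offset + 9);
            for (String record : batch) {
                // ... process record ...
            }
            // Commit the new offset so the next run continues from here.
            jedis.set("my-consumer-offset", Long.toString(offset + batch.size()));
        }
    }
}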
Use a LIST and a STREAM together to get both random-offset access and the streaming features. The most important part is to use a consumer group so that the stream is not trimmed. During the push operation, add the element to the STREAM as well as to the LIST. Once elements are inserted, your consumers can work without any issue in a consumer group, and for random-offset access you read from the LIST. The stream will keep growing since it is append-only, so you will need to trim it periodically; see my other answer for deleting older entries.
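A hedged sketch of that dual write with Jedis (stream and list names are placeholders, and the exact xadd signature varies between Jedis versions):

import java.util.Map;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.StreamEntryID;

class DualWriter {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String payload = "hello";
            // Stream entry for consumer groups / acks...
            jedis.xadd("mystream", StreamEntryID.NEW_ENTRY, Map.of("payload", payload));
            // ...mirrored into a list for positional access.
            jedis.rpush("mystream:index", payload);
            // E.g. records 100-109 (0-based, inclusive), or the last 10 in order:
            System.out.println(jedis.lrange("mystream:index", 100, 109));
            System.out.println(jedis.lrange("mystream:index", -10, -1));
        }
    }
}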

Redis server: how to check a passed value directly against a value range

I need to manage the data below using Redis.
Start_range|End_range|circle|operator|operator_id|circle_id
918005000000|918005099999|UP EAST|BSNL|4|22
919967200000|919967299999|MAHARASHTRA|AIRTEL|15|20
I have an API for operator detection. A mobile number is passed to the API, checked against the configured series above, and the matching row is returned.
For example:
Mobile number: 919967288367
This number matches the second series, so we return the output below.
919967200000|919967299999|MAHARASHTRA|AIRTEL|15|20
We need this match to happen directly on the value, because for performance reasons we cannot loop over the series.
I have 10,000 series.
Any help is appreciated.
The simplest approach is to use a Sorted Set, where the members are your rows and each member's score is the range's start.
E.g.:
ZADD ops 918005000000 "918005099999|UP EAST|BSNL|4|22" 919967200000 "919967299999|MAHARASHTRA|AIRTEL|15|20"
To query a number:
ZREVRANGEBYSCORE ops 919967288367 -inf LIMIT 0 1
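The same lookup from Java with the Jedis client, as a hedged sketch. One caveat worth adding: ZREVRANGEBYSCORE only guarantees the series starts at or below the number, so if your ranges have gaps you should also verify the number does not exceed End_range (the first field of the stored member):

import redis.clients.jedis.Jedis;

class OperatorLookup {
    public static void main(String[] args) {
        long msisdn = 919967288367L;
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Highest series start <= msisdn, limited to one result.
            for (String row : jedis.zrevrangeByScore("ops", msisdn, Double.NEGATIVE_INFINITY, 0, 1)) {
                long end = Long.parseLong(row.split("\\|")[0]); // first field is End_range
                System.out.println(end >= msisdn ? row : "no matching series");
            }
        }
    }
}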
The solution is simple! Store the series as keys instead of as values.
Details:
Store these series as Redis keys and values.
Key: 918005000000 Value:918005099999|UP EAST|BSNL|4|22
Key:919967200000 Value:919967299999|MAHARASHTRA|AIRTEL|15|20
Now, given the mobile number 919967288367, form a key pattern like 9199672* and get all the matching keys. From this set of keys, select the smallest key and get its value from Redis, as sketched below.
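For completeness, a Jedis sketch of this key-pattern variant as described; be aware that KEYS scans the entire keyspace (O(N)) and is generally discouraged on busy production servers, so the Sorted Set approach above is the safer default:

import java.util.TreeSet;
import redis.clients.jedis.Jedis;

class PrefixLookup {
    public static void main(String[] args) {
        String msisdn = "919967288367";
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Match series whose start shares the number's first 7 digits.
            TreeSet<String> keys = new TreeSet<>(jedis.keys(msisdn.substring(0, 7) + "*"));
            if (!keys.isEmpty()) {
                // Keys are equal-length digit strings, so the lexicographically
                // smallest key is also the numerically smallest, per the answer above.
                System.out.println(jedis.get(keys.first()));
            }
        }
    }
}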

DynamoDB: Have sequencing within Items

I am developing forums on DynamoDB.
There is a table posts which contains all the posts in a thread.
I need to have a notion of sequence in the posts, i.e. I need to know which post came first and which came later.
My service would be running in a distributed environment.
I am not sure that using a timestamp is the best solution for deciding the sequence, as the hosts might have slightly different times and might be off by milliseconds or seconds.
Is there another way to do this?
Can I get DynamoDB to populate the date so it is consistent?
Or is there a sequence generator that I can use in a distributed environment?
You can't get DynamoDB to auto-populate dates. You can use other services to provide auto-generated numbers, or use DynamoDB's atomic increment to create your own IDs.
This can become a bottleneck if your forum is very successful (i.e., needs lots of numbers per second). I think you should start with a timestamp and later add complexity to your ID generation (concatenate timestamp+uuid or timestamp+atomic counter).
It is always a best practice to keep your servers' clocks in sync (e.g., with ntpd).
Use a dedicated sequence table. If you have only one sequence (say, PostId), then there's going to be only one row with two attributes in the table.
Yes, there's the extra cost and effort of managing another table, but this is the best solution I know of by far, and I haven't seen anyone else mention it.
The table should have a key attribute as primary partition key, and a numeric value attribute with initial value of 1 (or whatever you want the initial value to be).
Every time you want to get the next available key, you tell DynamoDB to do this:
Increment the value where key = PostId by 1, and return the value before incrementing.
Note that this is one single atomic operation. DynamoDB handles the auto-incrementing, so there are no concurrency issues.
In code, there is more than one way of implementing this. Here's one example:
import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.model.*;

// Key of the single sequence row (assumes it was seeded as described above).
Map<String, AttributeValue> key = new HashMap<>();
key.put("key", new AttributeValue("PostId"));

// ADD atomically increments the numeric "value" attribute by 1.
Map<String, AttributeValueUpdate> item = new HashMap<>();
item.put("value",
        new AttributeValueUpdate()
                .withAction(AttributeAction.ADD)
                .withValue(new AttributeValue().withN("1"))); // DynamoDB numbers are passed as strings

UpdateItemRequest request = new UpdateItemRequest("Sequences", key, item)
        .withReturnValues(ReturnValue.ALL_OLD); // return the value from before the increment
UpdateItemResult result = dynamoDBClient.updateItem(request); // dynamoDBClient: an initialized AmazonDynamoDB client
Integer postId = Integer.parseInt(result.getAttributes().get("value").getN()); // <- the sequential ID to set on your post
Another variation of Chen's suggestion is to have strict ordering of posts within a given forum thread, as opposed to globally across all threads. One way to do this is to have a Reply table with a hash key of ThreadId and a range key of ReplyId. The ReplyId would be a Number attribute starting at 0. Every time someone replies, your app does a Query on the Reply table for the single most recent reply on that thread (ScanIndexForward: false, Limit: 1, ThreadId: the thread's hash key). To insert the new reply, use the ReplyId returned by the Query plus 1. Then use PutItem with a conditional write, so that if someone else replies at the same time, an error is returned and your app can start again with the Query, as sketched below.
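A hedged sketch of that query-then-conditional-put flow with the AWS SDK for Java v1 (table and attribute names follow the answer above; the thread ID and body are placeholders):

import java.util.List;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

class ReplySequencer {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();
        // Most recent reply in the thread: descending order, limit 1.
        List<Map<String, AttributeValue>> items = ddb.query(new QueryRequest("Reply")
                .withKeyConditionExpression("ThreadId = :t")
                .withExpressionAttributeValues(Map.of(":t", new AttributeValue("thread-42")))
                .withScanIndexForward(false)
                .withLimit(1)).getItems();
        long next = items.isEmpty() ? 0
                : Long.parseLong(items.get(0).get("ReplyId").getN()) + 1;
        // Conditional write: throws ConditionalCheckFailedException if a
        // concurrent replier claimed this ReplyId first, so the app retries.
        ddb.putItem(new PutItemRequest()
                .withTableName("Reply")
                .withItem(Map.of(
                        "ThreadId", new AttributeValue("thread-42"),
                        "ReplyId", new AttributeValue().withN(Long.toString(next)),
                        "Body", new AttributeValue("hello")))
                .withConditionExpression("attribute_not_exists(ReplyId)"));
    }
}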
If you want the simplest initial solution possible, then the timestamp+uuid concatenation Chen suggests is the simplest approach. A global atomic counter item will be a scaling bottleneck, as Chen mentions, and based on what you've described, a global sequence number isn't required for your app.

Yammer API - Paging

I am trying to gather a range of messages through the REST API, and am aware that you can only retrieve 20 results at a time. I have tried incrementing a page variable, but this has no effect; I just get the same results each time, no matter the page number (https://www.yammer.com/api/v1/messages.json?page=6). I have proceeded to use the newer_than and older_than parameters to page through the results, and it works to some extent, but it appears to be excluding records. I am using the following approach:
Since setting only newer_than returns just the 20 most recent records newer than the ID sent in that parameter, I am also setting a dynamic older_than parameter.
1. Send a request with only a newer_than parameter. This returns the 20 most recent records (e.g. www.yammer.com/api/v1/messages.json?newer_than=235560157).
2. Extract the ID of the 20th (oldest) message in the JSON and use it to populate the older_than parameter. The result is 20 different records (e.g. www.yammer.com/api/v1/messages.json?newer_than=235560157&older_than=405598096).
3. Repeat step 2 until no results are returned, since the newer_than and older_than parameters will eventually overlap.
The problem is that the set of records returned by this method is smaller than the set returned for messages by the data export API. I am working under the assumption that newer message IDs are always greater than older message IDs.
Could I possibly be misunderstanding how paging through results is supposed to be implemented with the REST API?
Any help would be much appreciated!
Thanks in advance!
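For reference, the newer_than/older_than walk described above could look like the following sketch (java.net.http; TOKEN is a placeholder, and the regex scan for IDs is a crude stand-in for properly parsing the messages array as JSON):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YammerPager {
    static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        long newerThan = 235560157L; // fixed lower bound, as in the question
        Long olderThan = null;       // moves downward page by page
        while (true) {
            String url = "https://www.yammer.com/api/v1/messages.json?newer_than=" + newerThan
                    + (olderThan == null ? "" : "&older_than=" + olderThan);
            HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                    .header("Authorization", "Bearer TOKEN").build();
            String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
            long minId = Long.MAX_VALUE; // oldest ID on this page
            for (Matcher m = ID.matcher(body); m.find(); ) {
                minId = Math.min(minId, Long.parseLong(m.group(1)));
            }
            if (minId == Long.MAX_VALUE) break; // ranges overlapped: done
            olderThan = minId; // bound the next page from above
        }
    }
}

As the answers below point out, even a correct walk like this only sees the two most recent comments per message and no private messages, which explains the shortfall against the data export.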
First of all, the page parameter works only for the search API.
Secondly, the way you are fetching messages returns at most the top two comments for each message, depending on the "extended" parameter; by default it returns two comments per message. To get all the comments on a message, you have to fetch them for each message individually.
That must be what is causing the difference in the number of messages between the two methods.
I agree with Farhann - the REST API endpoint returns only the top two comments for any message by default. To get all the comments for a post, you have to make a separate request.
With the Data Export API, all the comments along with the messages (public and private) are exported, which increases the message count, while the REST API call by default returns only the two most recent comments on each message.
The data export also includes private messages, which will not be returned by that API call. Check whether the messages you are not seeing are private messages.

WCF stateful service

Or - at least I think the correct term is stateful. I have a WCF service returning lots of data to me - so much data, in fact, that I'm exceeding maxReceivedMessageSize and the program crashes.
I've come to realize that I need to split the calls to the DB. Instead of retrieving 5000 rows at once, I need to get rows 1-200, remember the ID of row 200, fetch the next 200 rows starting after that ID, and so on.
Does anyone know how to do this? Is stateful (as in "opposite of stateless") the correct way to go? And how would I proceed? Could someone point me to an example?
You do not need a stateful service in your scenario. Stateful services are better avoided, especially when they would have to hold 5000 rows per client.
The client should specify how much data it needs, e.g. a method GetRows(index, amount), where index is the starting row and amount is the number of rows to fetch beginning at that index.
The client should also be able to ask the service about the state of the data. For example, alongside GetRows(index, amount) the service could expose GetRowsState(index, amount), which simply returns the last-updated time for those rows; when the time received is newer than what the client holds, the client calls GetRows again to refresh its data.
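Not WCF-specific, but the data-access side of GetRows can be sketched as keyset paging; here in Java/JDBC for illustration (the connection string, table, and column names are placeholders, and SELECT TOP is T-SQL syntax):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class PagedReader {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:...")) {
            long lastId = 0; // in WCF, the client passes this back on each call
            while (true) {
                int rows = 0;
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT TOP 200 Id, Payload FROM Rows WHERE Id > ? ORDER BY Id")) {
                    ps.setLong(1, lastId);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastId = rs.getLong("Id"); // remember where the page ended
                            rows++;
                        }
                    }
                }
                if (rows == 0) break; // no more pages
            }
        }
    }
}

Because the client carries the last seen ID, the service itself stays stateless.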