How to make an atomic increment of a numeric field in DynamoDB using aws-java-sdk 2.0? - aws-java-sdk

I am developing a system that updates the progress of tasks,
always incrementing a progress attribute by 1 in the DynamoDB task table.
I want to do that using an atomic increment of the attribute.
How can I do that using aws-java-sdk 2.0?
I researched this subject but didn't find anything.

See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_RequestSyntax under "UpdateExpression":
SET - Adds one or more attributes and values to an item. If any of these attributes already exist, they are replaced by the new values. You can also use SET to add or subtract from an attribute that is of type Number. For example: SET myNum = myNum + :val.
In your case, :val would always equal 1.
I can't give guarantees (and have not tested it), but it seems to me this should be atomic.
PS. One of my first StackOverflow comments, constructive feedback very welcome.
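For reference, here is a minimal sketch of how the SET expression quoted above might be sent with the AWS SDK for Java 2.x. The table name task, the key attribute id, and the sample task ID are assumptions made for illustration; only the progress attribute comes from the question. It needs the SDK's dynamodb module on the classpath and valid AWS credentials, and it has not been run against the asker's table.

```java
import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ReturnValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemResponse;

class ProgressIncrement {

    // Table name ("task") and key attribute ("id") are hypothetical; adjust to your schema.
    static UpdateItemRequest incrementRequest(String taskId) {
        return UpdateItemRequest.builder()
                .tableName("task")
                .key(Map.of("id", AttributeValue.builder().s(taskId).build()))
                // The addition is evaluated by DynamoDB inside a single UpdateItem call,
                // so the client never reads the old value first.
                // Note: "progress + :val" is rejected if the attribute does not exist yet;
                // "SET progress = if_not_exists(progress, :zero) + :val" would cover that case.
                .updateExpression("SET progress = progress + :val")
                .expressionAttributeValues(Map.of(":val", AttributeValue.builder().n("1").build()))
                // Ask for the new value of the updated attribute in the response.
                .returnValues(ReturnValue.UPDATED_NEW)
                .build();
    }

    public static void main(String[] args) {
        try (DynamoDbClient client = DynamoDbClient.create()) {
            UpdateItemResponse response = client.updateItem(incrementRequest("task-123"));
            System.out.println("progress is now " + response.attributes().get("progress").n());
        }
    }
}
```

Because ReturnValue.UPDATED_NEW is requested, the caller gets the incremented value back without issuing a separate read.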

Related

How to generate an incremental item number in BOPF

I created a custom table to store reasons for modifying an Object. I'm doing a POC with BOPF in order to learn, even though it may not make sense to use it here.
This is what the persistent structure looks like (simplified):
define type zobject_modifications {
  object_id : zobject_id;
  @EndUserText.label : 'Modification Number'
  mod_num   : abap.numc(4);
  reason_id : zreason_id;
  @EndUserText.label : 'Modification Comments'
  comments  : abap.string(256);
}
The alternative key consists of object_id + mod_num. The mod_num should be an auto-generated counter, always adding 1 to the last modification for the object_id.
I created a determination before_save to generate it, checking the MAX mod_num from the database BOs and from the current instantiated BOs and increasing by 1.
But when I try to create 2 BOs for the same object in a single transaction, I get an error because of the duplicated alternative key, since the field MOD_NUM is still initial and the before_save would be triggered later. I tried to change the determination to "After Modify" but I still get the same problem.
The question is: When and how should I generate the next MOD_NUM to be able to create multiple nodes for the same object ID safely?
This must be a very common problem so there must be a best practice way to do it, but I was not able to find it.
Use a number range to produce sequential identifiers. They ensure that you won't get duplicates if there are ongoing and concurrent transactions.
If you insist on determining the next identifier on your own, use the io_read input parameter of the determination to retrieve the biggest mod_num:
The database contains only those nodes that have already been committed. Your new nodes are not committed yet, so a database read won't return them.
io_read, in contrast, accesses BOPF's temporary buffer, which also contains the nodes you just created, so it sees the more current data.

How can I fetch (via GET) all JIRA issues? Do I go to the Search node?

It looks like /api/2/project easily returns all projects in a JIRA instance in JSON format.
I'd like to do the same for issues, but this does not appear to exist.
Is /api/2/search the standard way to do a mass-dump like this? And what is the best way to regularly update this to a database? Would I do something like search (update date > [last entry in database]) and then go through the pagination? Surely I can't be the first person attempting this, though I see no similar guide anywhere online to this (I checked Jira's own docs, no mass-issue-export guide really).
EDIT: Okay, it looks like search really is the "issue dump", and not the issue node, which, contrary to their documentation, does not default to a collection but is really for creating issues or fetching one at a time. I'll probably go the route of updated > [whatever last date is in the DB].
Unless you have very few issues, you can't fetch all of them at once.
What you can do is to execute the search step by step.
For example, let's say you have 1324 JIRA issues. In order to retrieve all of them, you have to execute a search similar to this several times:
/rest/api/2/search?maxResults=100&startAt=0
This will retrieve the first 100 JIRA issues, starting from 0.
How to get the others?
When you execute the search, a field named total is returned. That field is the total number of issues matching the search (here, all 1324 issues).
The next query will be:
/rest/api/2/search?maxResults=100&startAt=100
Repeat this operation, incrementing the value of startAt by 100 every time, until startAt reaches total and all the issues have been returned.
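The paging arithmetic above can be sketched as follows. This is only an illustration: it assumes the total has already been read from the first response, and the class and method names (SearchPages, pageUrls) are made up for the example. It builds the request URLs and does not perform any HTTP calls.

```java
import java.util.ArrayList;
import java.util.List;

class SearchPages {

    // Builds the relative search URLs needed to page through `total` issues,
    // `pageSize` issues at a time (hypothetical helper, not part of any JIRA client).
    static List<String> pageUrls(int total, int pageSize) {
        List<String> urls = new ArrayList<>();
        for (int startAt = 0; startAt < total; startAt += pageSize) {
            urls.add("/rest/api/2/search?maxResults=" + pageSize + "&startAt=" + startAt);
        }
        return urls;
    }

    public static void main(String[] args) {
        // 1324 issues at 100 per page: 14 requests, the last one starting at 1300.
        pageUrls(1324, 100).forEach(System.out::println);
    }
}
```

In a real export you would send the first request, read total from its response, and only then generate the remaining pages, since the total is not known up front.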

DynamoDB: Have sequencing within Items

I am developing forums on DynamoDB.
There is a table posts which contains all the posts in a thread.
I need to have a notion of sequence in the posts, i.e. I need to know which post came first and which came later.
My service would be running in a distributed env.
I am not sure if using a timestamp is the best solution for deciding the sequence, as the hosts might have slightly different times and might be off by milliseconds or seconds.
Is there another way to do this?
Can I get DynamoDB to populate the date so it is consistent?
Or is there a sequence generator that I can use in a distributed env?
You can't use DynamoDB to auto-populate dates. You can use other services to provide you with auto-generated numbers, or use DynamoDB's atomic increment to create your own unique ID.
This can become a bottleneck if your forum is very successful (needs lots of numbers per second). I think you should start with a timestamp and later on add complexity to your ID generation (concatenate timestamp+uuid or timestamp+atomiccounter).
It is always a best practice to keep your servers' clocks synchronized (ntpd).
Use a dedicated sequence table. If you have only one sequence (say, PostId), then there's going to be only one row with two attributes in the table.
Yes, there's the extra cost and effort of managing another table, but this is by far the best solution I know, and I haven't seen anyone else mention it.
The table should have a key attribute as primary partition key, and a numeric value attribute with initial value of 1 (or whatever you want the initial value to be).
Every time you want to get the next available key, you tell DynamoDB to do this:
Increment the value where key = PostId by 1, and return the value before incrementing.
Note that this is one single atomic operation. DynamoDB handles the incrementing, so there are no concurrency issues.
In code, there is more than one way to implement this. Here's one example:
import com.amazonaws.services.dynamodbv2.model.*;
import java.util.HashMap;
import java.util.Map;

// Key of the single row that holds the PostId sequence
Map<String, AttributeValue> key = new HashMap<>();
key.put("key", new AttributeValue("PostId"));
// ADD atomically increments the numeric "value" attribute by 1
Map<String, AttributeValueUpdate> item = new HashMap<>();
item.put("value",
    new AttributeValueUpdate()
        .withAction(AttributeAction.ADD)
        .withValue(new AttributeValue().withN("1")));
// ALL_OLD returns the item as it was before the increment
UpdateItemRequest request = new UpdateItemRequest("Sequences", key, item).withReturnValues(ReturnValue.ALL_OLD);
UpdateItemResult result = dynamoDBClient.updateItem(request);
int postId = Integer.parseInt(result.getAttributes().get("value").getN()); // <- this is the sequential ID you want to set to your post
Another variation of Chen's suggestion is to have strict ordering of posts within a given Forum Thread, as opposed to globally across all Threads. One way to do this is to have a Reply table with the Hash key of ThreadId, and a range key of ReplyId. The ReplyId would be a Number type attribute starting at 0. Every time someone replies, your app does a Query on the Reply table for the one most recent reply on that thread (ScanIndexForward: false, Limit: 1, ThreadId: ). To insert your new reply use the ReplyId of the one returned in the Query, + 1. Then use PutItem, using a Conditional Write, so that if someone else replies at the same time, an error will be returned, and your app can start again with the query.
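The query-then-conditional-write loop described above can be illustrated with an in-memory stand-in. This is not DynamoDB code: a ConcurrentSkipListMap plays the role of one thread's Reply items, lastEntry() stands in for the Query with ScanIndexForward false and Limit 1, and putIfAbsent() stands in for the conditional PutItem (in DynamoDB the condition would typically be something like attribute_not_exists(ReplyId)). The class name ThreadReplies is made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class ThreadReplies {

    // In-memory stand-in for the replies of one thread, keyed by ReplyId (the range key).
    private final ConcurrentSkipListMap<Long, String> replies = new ConcurrentSkipListMap<>();

    // Appends a reply and returns the ReplyId it was stored under.
    long append(String body) {
        while (true) {
            // Stand-in for the Query: fetch the single most recent reply, if any.
            Map.Entry<Long, String> latest = replies.lastEntry();
            long nextId = (latest == null) ? 0L : latest.getKey() + 1;
            // Stand-in for the conditional write: succeeds only if nextId is still unused.
            if (replies.putIfAbsent(nextId, body) == null) {
                return nextId;
            }
            // Another writer claimed nextId first; start again with the query.
        }
    }
}
```

The retry only happens when two writers race for the same ReplyId, so contention is limited to a single thread's replies rather than the whole forum.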
If you want the simplest initial solution possible, then the timestamp+uuid concatenation Chen suggests is the simplest approach. A global atomic counter item will be a scaling bottleneck, as Chen mentions, and based on what you've described, a global sequence number isn't required for your app.

Redis Sorted Set ... store data in "member"?

I am learning Redis and using an existing app (e.g. converting pieces of it) for practice.
I'm really struggling to understand first IF and then (if applicable) HOW to use Redis in one particular use-case ... apologies if this is super basic, but I'm so new that I'm not even sure if I'm asking correctly :/
Scenario:
Images are received by a server and info like time_taken and resolution is saved in a database entry. Images are then associated (e.g. "belong_to") with one Event ... all very straightforward for an RDBMS.
I'd like to use Redis to maintain a list of the 50 most-recently-uploaded image objects for each Event, to be delivered to the client when requested. I'm thinking that a Sorted Set might be appropriate, but here are my concerns:
First, I'm not sure if a Sorted Set can/should be used in this associative manner? Can it reference other objects in Redis? Or is there just a better way to do this altogether?
Secondly, I need the ability to delete elements that are greater than X minutes old. I know about the EXPIRE command for keys, but I can't use this because not all images need to expire at the same periodicity, etc.
This second part seems more like a query on a field, which makes me think that Redis cannot be used ... but then I've read that I could maybe use the Sorted Set score to store a timestamp and find "older than X" in that way.
Can someone provide some clarity on these two issues? Thank you very much!
UPDATE
Knowing that the amount of data I need to store for each image is small and will be delivered to the client's browser, is there anything wrong with storing it in the member "field" of a sorted set?
For example Sorted Set => event:14:pictures <time_taken> "{id:3,url:/images/3.png,lat:22.8573}"
This saves the data I need and creates a rapidly-updatable list of the last X pictures for a given event with the ability to, if needed, identify pictures that are greater than X minutes old ...
First, I'm not sure if a Sorted Set can/should be used in this
associative manner? Can it reference other objects in Redis?
Why do you need to reference other objects? An event may have n image objects, each with a time_taken and image data; a sorted set is perfect for this. The image_id is the key, the score is time_taken, and the member is the image data as json/xml, whatever; you're good to go there.
Secondly, I need the ability to delete elements that are greater than
X minutes old
If you want to delete elements greater than X minutes old, use ZREMRANGEBYSCORE:
ZREMRANGEBYSCORE event:14:pictures -inf (currentTime - X minutes)
-inf stands for the lowest possible score, so you don't need to know the oldest member's time; the upper bound, however, has to be calculated from the current time before you issue the command (the line above is just an example).
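Computing that upper bound could look like the sketch below. It assumes the scores are Unix timestamps in seconds (the question only says the score would be time_taken), and the class and method names are made up for the example. It only renders the command text; a real Redis client would be given the same key and bounds as arguments.

```java
import java.time.Duration;
import java.time.Instant;

class ExpireOldPictures {

    // Score at or below which members count as too old, in Unix seconds (assumed unit).
    static long cutoffScore(Instant now, Duration maxAge) {
        return now.minus(maxAge).getEpochSecond();
    }

    // Renders the command text for the given sorted-set key.
    static String removeCommand(String key, Instant now, Duration maxAge) {
        return "ZREMRANGEBYSCORE " + key + " -inf " + cutoffScore(now, maxAge);
    }

    public static void main(String[] args) {
        // Remove pictures older than 30 minutes from one event's sorted set.
        System.out.println(removeCommand("event:14:pictures", Instant.now(), Duration.ofMinutes(30)));
    }
}
```

Since the maximum age is just a parameter, different events can use different expiry periods, which is what ruled out EXPIRE in the question.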

Rails and MongoDB, how to get the last document inserted and be sure it is thread safe?

When I add a new document to my collection X, I need to get the last document that was inserted in that same collection, because some values of that document must influence the document I am currently inserting.
Basically, as a simple example, I would need to do this:
class X
  include Mongoid::Document
  include Mongoid::Timestamps
  before_save :set_sum
  def set_sum
    self.sum = X.last.sum + self.misc
  end
  field :sum, :type => Integer
  field :misc, :type => Integer
end
How can I make sure that this type of process will never break if there are concurrent inserts? I must make sure that when self.sum = X.last.sum + self.misc is calculated, X.last.sum absolutely represents the last possible document inserted in the collection.
This is critical to my system. It needs to be thread safe.
Alex
ps: this also needs to be performant, when there are 50k documents in the collections, it can't take time to get the last value...
This kind of behavior is equivalent to having an auto-increment ID.
http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
The cleanest way is to have a side collection with one (or more) docs representing the current total values.
Then in your client, before inserting the new doc, do a findAndModify() that atomically updates the totals AND retrieves the current total doc.
Part of the current doc can be an auto increment _id, so that even if there are concurrent inserts, your document will then be correctly ordered as long as you sort by _id.
Only caveat: if your client app dies after findAndModify and before insert, you will have a gap in there.
Either that's ok or you need to add extra protections like keeping a side log.
If you want to be 100% safe you can also get inspiration from 2-phase commit
http://www.mongodb.org/display/DOCS/two-phase+commit
Basically it is the proper way to do transactions with any db that spans more than 1 server (even sql wouldn't help there)
best
AG
If you need to keep a running sum, this should probably be done on another document in a different collection. The best way to keep this running sum is to use the $inc atomic operation. There's really no need to perform any reads while calculating this sum.
You'll want to insert your X document into its collection, then also $inc a value on a different document that is meant for keeping this tally of all the misc values from the X documents.
Note: This still won't be transactional, because you're updating two documents in two different collections separately, but it will be highly performant, and fully thread safe.
For more info, check out all the MongoDB Atomic Operations.