Geode transaction to generate ID and insert object - gemfire

Let's say I have 3 PARTITIONED_REDUNDANT regions:
/Orders - keys are Longs (an ID allocated from /Sequences) and values are instances of Order
/OrderLineItems - keys are Longs (an ID allocated from /Sequences) and values are instances of OrderLineItem
/Sequences - keys are Strings (name of a sequence), values are Longs
The /Sequences region will have many entries, each of which is the ID sequence for some persistent type that is stored in another region (e.g., /Orders, /OrderLineItems, /Products, etc.)
I want to run a Geode transaction that persists one Order and a collection of OrderLineItems together.
And, I want to allocate IDs for the Order and OrderLineItems from the entries in the /Sequences region whose keys are "Orders" and "OrderLineItems", respectively. This operates like an "auto increment" column would in a relational database - the ID is allocated/assigned at insertion time as part of the transaction.
The insertion of Orders and OrderLineItems and the allocation of IDs from the /Sequences region need to be transactionally consistent - they all succeed or fail together.
I understand that Geode requires data being operated on in a transaction to be co-located if the region is partitioned.
The obvious thing is to co-locate OrderLineItems with the owning Order, which can be done with a PartitionResolver that returns the Order's ID as the routing object.
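For concreteness, I imagine the resolver looking something like this (OrderLineItem and its getOrderId() accessor are my illustrative names):

    import org.apache.geode.cache.EntryOperation;
    import org.apache.geode.cache.PartitionResolver;

    // Sketch: routes each OrderLineItem by its owning Order's ID so that line
    // items land on the same member as the Order itself. Assumes OrderLineItem
    // exposes a getOrderId() accessor (illustrative name).
    public class OrderIdPartitionResolver implements PartitionResolver<Long, OrderLineItem> {

        @Override
        public Object getRoutingObject(EntryOperation<Long, OrderLineItem> op) {
            OrderLineItem item = op.getNewValue();
            if (item != null) {
                return item.getOrderId();    // co-locate with the owning Order
            }
            return op.getCallbackArgument(); // e.g. pass the Order ID on reads
        }

        @Override
        public String getName() {
            return "OrderIdPartitionResolver";
        }

        @Override
        public void close() {
        }
    }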
However, there's still the /Sequences region that is involved in the transaction, and I'm not clear on how to co-locate that data with the Order and OrderLineItems.
The "Orders" entry of the /Sequences reqion would need to be co-located with every Order for which an ID is generated...wouldn't it? Obviously that's not possible.
Or is there another / better way to do this (e.g., change region type for /Sequences)?
Thanks for any suggestions.

Depending on how much data is in your /Sequences region, you could make it a replicated region. A replicated region is considered co-located with all other regions because it's available on all members.
https://geode.apache.org/docs/guide/15/developing/transactions/data_location_cache_transactions.html
This pattern is potentially expensive though if you are creating a lot of entries concurrently. Every create will go through these shared global sequences. You may end up with a lot of transaction conflicts, especially if you are getting the next sequence number by incrementing the last used sequence number.
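To make the conflict point concrete, here is a minimal sketch of the transactional pattern, assuming /Sequences has been made replicated (Order is the question's domain class):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.CacheTransactionManager;
    import org.apache.geode.cache.Region;

    public class OrderWriter {
        // Sketch: allocates the Order's ID from /Sequences and inserts the
        // Order in one transaction, so both succeed or fail together.
        public long insertOrder(Order order) {
            Cache cache = CacheFactory.getAnyInstance();
            Region<String, Long> sequences = cache.getRegion("Sequences");
            Region<Long, Order> orders = cache.getRegion("Orders");

            CacheTransactionManager txMgr = cache.getCacheTransactionManager();
            txMgr.begin();
            try {
                long nextId = sequences.get("Orders") + 1L; // read-increment-write:
                sequences.put("Orders", nextId);            // this is the conflict hot spot
                orders.put(nextId, order);
                // ... allocate IDs and put the OrderLineItems the same way ...
                txMgr.commit(); // throws CommitConflictException if another tx won
                return nextId;
            } catch (RuntimeException e) {
                if (txMgr.exists()) {
                    txMgr.rollback();
                }
                throw e;
            }
        }
    }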
As an alternative you might want to consider UUIDs as the keys for your Orders and OrderLineItems, etc. A UUID takes twice as much space as a long, but you can allocate a random UUID without needing any coordination between concurrent creates.
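Continuing the sketch above, the UUID variant needs no /Sequences entry and no coordination at all:

    import java.util.UUID;

    // Sketch: with UUID keys there is no shared counter to coordinate on,
    // so concurrent creates cannot conflict over ID allocation.
    Region<UUID, Order> orders = cache.getRegion("Orders");
    UUID id = UUID.randomUUID();
    orders.put(id, order);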

Related

How to avoid deadlocks in PessimisticLockScope.EXTENDED?

I’m creating a Java transfer money app that basically transfers money from one account to another.
In a nutshell I have a Transfer entity that contains 3 properties: @ManyToOne OriginAccount, @ManyToOne TargetAccount, and Amount.
Account contains a balance to be adjusted during the transfer.
I’m about to use LockModeType.PESSIMISTIC_WRITE on Account entities but I have to consider a deadlock.
One option is to select the two accounts always in the same order (sorted by id) so that locks are always acquired in the same order.
I also heard about PessimisticLockScope.EXTENDED, but what will be the order of lock acquisition on the joined records? Is there any option to enforce the order with some kind of comparator? How can I rule out the possibility of deadlock entirely?
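For reference, a minimal sketch of the id-ordering option mentioned above (Account and its getId() are the question's entity; this addresses lock ordering only, not EXTENDED scope):

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    public class TransferLocks {
        // Sketch: always acquire the two account locks in ascending-id order,
        // so two concurrent transfers between the same accounts cannot deadlock.
        void lockBoth(EntityManager em, Account origin, Account target) {
            List<Account> ordered = Stream.of(origin, target)
                    .sorted(Comparator.comparing(Account::getId))
                    .collect(Collectors.toList());
            for (Account account : ordered) {
                em.lock(account, LockModeType.PESSIMISTIC_WRITE);
            }
        }
    }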

Query on non key in Redis

I am storing objects as hashes, for example: key -> customer:123, email -> dk@gmail.com, mobile -> 828212, name -> darshan, etc.
Now, is it possible in Redis to query customers by email without storing the cross-relationship as a Set, which is more of a workaround?
For example, at customer insertion time, storing a Set with key -> email:dk@gmail.com and value -> customer:123, and so on.
Let's say I have 100 fields in a hash and I need to query 20 of them (like email); creating an entry for each of those fields as a Set as well significantly increases the number of keys in the Redis instance.
Is there any other alternative or better approach?
Redis doesn't have built-in indexing/searching by fields because it is not a traditional database but more of a data structures server (each key holds a data structure such as a set, list, hash, sorted set, HyperLogLog, etc.), but if you are using Redis 4.0 you can use the search module (RediSearch) to accomplish it.
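In case the module isn't an option, the Set-based workaround from the question looks roughly like this with Jedis (key names follow the question; a sketch, not a full solution):

    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class CustomerIndex {
        // Sketch of the reverse-index workaround described in the question:
        // one extra key per queryable field, maintained by hand at insert time.
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // insert: the hash plus one index key for the email field
                jedis.hset("customer:123", Map.of(
                        "email", "dk@gmail.com",
                        "mobile", "828212",
                        "name", "darshan"));
                jedis.set("email:dk@gmail.com", "customer:123");

                // query by email: two round trips through the index key
                String customerKey = jedis.get("email:dk@gmail.com");
                Map<String, String> customer = jedis.hgetAll(customerKey);
                System.out.println(customer);
            }
        }
    }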

designing a database schema for aws mobile backend

I am new to databases and sql and would like to design a database for a fitness app that will keep track of workouts at the gym.
In my app, I have designed a custom workout object that has a name (e.g. 'Chest day'), an ID (some number), and a date (string). Each workout object contains an array of exercises, another custom object, which has a property called 'set'. The set is also a custom object with only two numeric properties: number of reps and weight (e.g. 10 reps at 50 lbs).
What I thought of is to have one table for the workouts, another for the exercises and another for the sets. The problem is I do not know how to connect the tables (i.e. link multiple exercises to a unique workout and link multiple sets to a unique exercise) and am not sure if this is even the correct approach.
Also, I planned to set up the backend for this app using the amazon web services mobile hub which provides a noSQL database.
In NoSQL, you should keep all the attributes in a single table. You shouldn't normalize the data as you would in an RDBMS. Also, try to move away from joins. The main advantage of NoSQL is that everything is kept as one item, so you don't need a join to get the result.
Advantages of this approach are:
1) Fast response, as all the data is present as one item in a table
2) Schemaless database, i.e. you can add new attributes at any time (no need to alter the table and add new columns)
DynamoDB design for the above use case:
The combination of partition and sort key must be unique.
name - String (partition key)
id - Number (sort key)
date - String
exercise - List data type: [array of values]
custom_set - Map data type: {rep : 1, weight : 2}
Important note:
The key thing when designing a data model for DynamoDB is that all the data retrieval use cases (i.e. query access patterns) should be known up front, so that the appropriate model can be designed.
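A minimal sketch of one such item with the AWS SDK for Java document API (table name and sample values illustrative, following the model above):

    import java.util.List;
    import java.util.Map;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;
    import com.amazonaws.services.dynamodbv2.document.Table;

    public class WorkoutWriter {
        // Sketch: one workout stored as a single item, with the exercises
        // (and their sets) nested inside it as List/Map attributes.
        public static void main(String[] args) {
            DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
            Table workouts = dynamoDB.getTable("Workout"); // table name illustrative

            Item item = new Item()
                    .withPrimaryKey("name", "Chest day", "id", 1)
                    .withString("date", "2017-10-01")
                    .withList("exercise", List.of(
                            Map.of("exerciseName", "Bench press",
                                   "sets", List.of(
                                           Map.of("rep", 10, "weight", 50),
                                           Map.of("rep", 8, "weight", 60)))));
            workouts.putItem(item);
        }
    }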

Accepted methodology when using multiple Sqlite databases

Question
What is the accepted way of using multiple databases that record information about the same object that will ultimately end up living in one central database?
Example
There is one main SQL database about trees.
This database holds information about unique trees from all over the UK.
To collect the information a blank Sqlite database is created (with the same schema) and taken to the tree on a phone.
The collected information is then stored in the Sqlite database until it is brought back and transferred into the main database.
Now this works fine as long as there is only one Sqlite database out for any one tree at a time.
However, if two people wanted to collect different information for the same tree at the same time, when they both came back and attempted to transfer their data in to the main database, there would be collisions on their primary key constraints.
ID Schemes (with example data)
There is a tree table which has a unique identifier called treeID
TreeID - TreeName - Location
1001 - Teddington Field - Plymouth
Branch table
BranchID - BranchName - TreeID
1001-10001 - 1st Branch - 1001
1001-10002 - 2nd Branch - 1001
Leaf table
LeafID - LeafName - BranchId
1001-10001-1 - Bedroom - 1001-10001
1001-10001-2 - Bathroom - 1001-10001
Possible ideas
Assign each database 1000 unique IDs; then, when they come back in, since the IDs have already been assigned, the IDs from each database won't collide.
Downfall
This isn't very dynamic and could fail if one database overruns its preassigned IDs.
Is there another way to achieve the same flexibility but without the downfall mentioned above?
So, as an answer:
On the master db, store an extra id field identifying the source/collection database that the dataset was collected on, as well as the tree id.
(src01, 1001), (src02, 1001)
This also allows you to link back easily to the collection source of the information, which is likely going to be a future requirement. Now, you may or may not want to autogenerate another sequence id key value on the master db's table (I wouldn't, but that's because I am not that fond of surrogate keys), but I would definitely keep track of the source/tree id it was originally collected with in the field, separately from any master-db unique-key considerations.
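A minimal sketch of that composite key with SQLite JDBC (table and column names illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class MasterSchema {
        // Sketch: the master table keys rows on (source_db, tree_id), so the
        // same tree_id coming in from two collection databases cannot collide.
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:master.db");
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate(
                    "CREATE TABLE IF NOT EXISTS tree (" +
                    "  source_db TEXT NOT NULL,"    +  // e.g. 'src01', 'src02'
                    "  tree_id   INTEGER NOT NULL," +  // id as assigned in the field
                    "  tree_name TEXT,"             +
                    "  location  TEXT,"             +
                    "  PRIMARY KEY (source_db, tree_id))");
                stmt.executeUpdate(
                    "INSERT INTO tree VALUES ('src01', 1001, 'Teddington Field', 'Plymouth')");
                stmt.executeUpdate(
                    "INSERT INTO tree VALUES ('src02', 1001, 'Teddington Field', 'Plymouth')");
            }
        }
    }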
Apparently you are talking about auto-generated IDs for related objects, not the IDs for the trees themselves. Two different people collecting information about the same tree, starting from the same initial set, end up generating the same IDs independently. The two sets of generated IDs cannot coexist in the same DB.
Since you want to keep all the new data, one possible solution is to avoid using the field-generated IDs in the central database at all. When each set of data comes in, take the data that were added in the field and programmatically add them to the central DB in a way equivalent to how they are added in the field, letting the central DB autogenerate its own IDs.
This requires a mechanism to distinguish newly-collected data from old, but that might be as simple as a timestamp.

DynamoDB: When to use what PK type?

I am trying to read up on best practices on DynamoDB. I saw that DynamoDB has two PK types:
Hash Key
Hash and Range Key
From what I read, it appears the latter is like the former but supports sorting and indexing of a finite set of columns.
So my question is why ever use only a hash key without a range key? Is it a viable choice only when the table is not searched?
It'd also be great to have some general guidelines on when to use what key type. I've read several guides (including Amazon's own documentation on DynamoDB) but none of them appear to directly address this question.
Thanks
The choice of which key to use comes down to your use cases and data requirements for a particular scenario. For example, if you are storing user session data it might not make much sense to use the Range Key, since each record can be referenced by a GUID and accessed directly with no grouping requirements. In general terms, once you know the session id, you just get the specific item by querying on the key. Another example could be storing user account or profile data: each user has his own, and you will most likely access it directly (by user id or something else).
However, if you are storing Order Items then the Range Key makes much more sense since you probably want to retrieve the items grouped by their Order.
In terms of the Data Model, the Hash Key allows you to uniquely identify a record from your table, and the Range Key can be optionally used to group and sort several records that are usually retrieved together. Example: If you are defining an Aggregate to store Order Items, the Order Id could be your Hash Key, and the OrderItemId the Range Key. Whenever you would like to search the Order Items from a particular Order, you just query by the Hash Key (Order Id), and you will get all your order items.
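A minimal sketch of that Order Items example with the AWS SDK for Java document API (table and attribute names illustrative):

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;
    import com.amazonaws.services.dynamodbv2.document.Table;
    import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;

    public class OrderItemsQuery {
        // Sketch: with Order Id as the hash key and OrderItemId as the range
        // key, a single query by hash key returns every item of that order.
        public static void main(String[] args) {
            DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
            Table orderItems = dynamoDB.getTable("OrderItem"); // table name illustrative

            QuerySpec spec = new QuerySpec().withHashKey("orderId", "ORD-1001");
            for (Item item : orderItems.query(spec)) {
                System.out.println(item.toJSONPretty());
            }
        }
    }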
You can find below a formal definition for the use of these two keys:
"Composite Hash Key with Range Key allows the developer to create a
primary key that is the composite of two attributes, a 'hash
attribute' and a 'range attribute.' When querying against a composite
key, the hash attribute needs to be uniquely matched but a range
operation can be specified for the range attribute: e.g. all orders
from Werner in the past 24 hours, or all games played by an individual
player in the past 24 hours." [VOGELS]
So the Range Key adds a grouping capability to the Data Model; however, the use of these two keys also has an implication on the Storage Model:
"Dynamo uses consistent hashing to partition its key space across its
replicas and to ensure uniform load distribution. A uniform key
distribution can help us achieve uniform load distribution assuming
the access distribution of keys is not highly skewed."
[DDB-SOSP2007]
Not only does the Hash Key allow you to uniquely identify the record, it is also the mechanism that ensures load distribution. The Range Key (when used) helps to indicate the records that will mostly be retrieved together; therefore, the storage can also be optimized for that need.
Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application will perform and scale, and how much it will cost.
Footnotes:
The Data Model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data: the way you group your entities, the attributes you choose as primary keys, etc.
The Storage Model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.