How gemfire does colocation of replicated and partitioned regions


1. How does GemFire internally perform co-location?
2. How does co-location work with partitioned regions?
3. How does co-location work with replicated regions?
4. How does co-location work with a replicated region and a partitioned region together?
5. Is custom partitioning required to do co-location?

GemFire's co-location capability exists so that you can ensure that your transactions happen at memory speed rather than network speed. For instance, say you want to partition your problem by CustomerId. Customers have Orders, Shipments and Payments associated with them. Let's say that when a Shipment occurs, you want to insert the Shipment record, update the Order and update the Customer record. To guarantee that the Orders are co-located with the Customer record, you need to build a compound key for the Orders that contains both the OrderId and the CustomerId. This can be as simple as a String containing the OrderId and CustomerId separated by a hyphen. Then you need to implement a PartitionResolver that returns the CustomerId portion of the key. When defining the Orders region, you would need to add the following to the region configuration:
<cache>
  <region name="Orders">
    <region-attributes>
      <partition-attributes colocated-with="Customers">
        <partition-resolver name="CustomerIdPartitionResolver">
          <class-name>myPackage.CustomerIdPartitionResolver</class-name>
        </partition-resolver>
      </partition-attributes>
    </region-attributes>
  </region>
</cache>
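The key logic described above can be sketched in a few lines. This is only a language-neutral illustration in Python; in GemFire the resolver itself would be a Java class implementing the PartitionResolver interface. The helper names, the sample IDs, the bucket count and the CRC-based hash are all assumptions made for the example, not GemFire's actual hashing.

```python
import zlib

NUM_BUCKETS = 113  # assumed bucket count, chosen only for this illustration


def make_order_key(order_id: str, customer_id: str) -> str:
    """Build the compound key 'OrderId-CustomerId' described in the answer."""
    return f"{order_id}-{customer_id}"


def routing_object(order_key: str) -> str:
    """Return the CustomerId portion of the key, as the resolver would."""
    # Split on the last hyphen; this assumes CustomerIds contain no hyphen.
    return order_key.rpartition("-")[2]


def bucket_for(routing: str) -> int:
    """Toy stand-in for hash-based bucket assignment (not GemFire's hash)."""
    return zlib.crc32(routing.encode("utf-8")) % NUM_BUCKETS


customer_key = "C42"  # the Customers region is keyed by CustomerId
order_keys = [make_order_key("O1001", "C42"), make_order_key("O1002", "C42")]

customer_bucket = bucket_for(customer_key)
order_buckets = [bucket_for(routing_object(k)) for k in order_keys]
```

Because every Order key resolves to the same routing value as its Customer's key, both orders land in the customer's bucket, which is what lets the transaction stay on one member.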

Related

Geode transaction to generate ID and insert object

Let's say I have 3 PARTITIONED_REDUNDANT regions:
/Orders - keys are Longs (an ID allocated from /Sequences) and values are instances of Order
/OrderLineItems - keys are Longs (an ID allocated from /Sequences) and values are instances of OrderLineItem
/Sequences - keys are Strings (name of a sequence), values are Longs
The /Sequences region will have many entries, each of which is the ID sequence for some persistent type that is stored in another region (e.g., /Orders, /OrderLineItems, /Products, etc.)
I want to run a Geode transaction that persists one Order and a collection of OrderLineItems together.
And, I want to allocate IDs for the Order and OrderLineItems from the entries in the /Sequences region whose keys are "Orders" and "OrderLineItems", respectively. This operates like an "auto increment" column would in a relational database - the ID is allocated/assigned at insertion time as part of the transaction.
The insertion of Orders and OrderLineItems and the allocation of IDs from the /Sequences region need to be transactionally consistent - they all succeed or fail together.
I understand that Geode requires data being operated on in a transaction to be co-located if the region is partitioned.
The obvious thing is to co-locate OrderLineItems with the owning Order, which can be done with a PartitionResolver that returns the Order's ID as the routing object.
However, there's still the /Sequences region that is involved in the transaction, and I'm not clear on how to co-locate that data with the Order and OrderLineItems.
The "Orders" entry of the /Sequences region would need to be co-located with every Order for which an ID is generated...wouldn't it? Obviously that's not possible.
Or is there another / better way to do this (e.g., change region type for /Sequences)?
Thanks for any suggestions.

Depending on how much data is in your /Sequences region - you could make that region a replicated region. A replicated region is considered co-located with all other regions because it's available on all members.
https://geode.apache.org/docs/guide/15/developing/transactions/data_location_cache_transactions.html
This pattern is potentially expensive though if you are creating a lot of entries concurrently. Every create will go through these shared global sequences. You may end up with a lot of transaction conflicts, especially if you are getting the next sequence number by incrementing the last used sequence number.
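The contention can be pictured with a toy model of optimistic commits. This is not Geode code: the `SequenceStore` class, its version counter and the `CommitConflict` exception are hypothetical stand-ins that only mimic the "first committer wins" behaviour.

```python
class CommitConflict(Exception):
    """Toy analogue of a transaction commit conflict."""


class SequenceStore:
    """Holds one counter per name plus a version used for conflict checks."""

    def __init__(self):
        self._values = {}    # sequence name -> last used ID
        self._versions = {}  # sequence name -> number of committed updates

    def begin(self, name):
        """Snapshot the value and version a transaction starts from."""
        return self._values.get(name, 0), self._versions.get(name, 0)

    def commit(self, name, new_value, seen_version):
        """Apply the update only if nobody committed since the snapshot."""
        if self._versions.get(name, 0) != seen_version:
            raise CommitConflict(name)
        self._values[name] = new_value
        self._versions[name] = seen_version + 1


store = SequenceStore()

# Two concurrent transactions both read the same "Orders" sequence value.
value_a, version_a = store.begin("Orders")
value_b, version_b = store.begin("Orders")

store.commit("Orders", value_a + 1, version_a)  # first commit wins
try:
    store.commit("Orders", value_b + 1, version_b)
    second_committed = True
except CommitConflict:
    second_committed = False  # the loser has to retry its whole transaction
```

Every concurrent insert of an Order touches the same entry, so under load most transactions would be in the position of the second one here.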
As an alternative you might want to consider UUIDs as the keys for your Orders and OrderLineItems, etc. A UUID takes twice as much space as a long, but you can allocate a random UUID without needing any coordination between concurrent creates.
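A minimal sketch of the UUID approach, using Python's standard `uuid` module purely for illustration (in a Geode application the Java equivalent would be `java.util.UUID.randomUUID()`). The function name is hypothetical.

```python
import uuid


def new_key() -> uuid.UUID:
    """Allocate a key locally; no shared /Sequences entry is read or updated."""
    return uuid.uuid4()


order_key = new_key()
line_item_keys = [new_key() for _ in range(3)]

# A UUID is 128 bits (16 bytes); a Java long is 64 bits (8 bytes).
uuid_size_bytes = len(order_key.bytes)
```

Since no transaction touches a shared counter, the only data that has to be co-located is the Order and its OrderLineItems.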

How does one structure queries to amalgamate master detail records over a given period

Consider the following scenario (if it helps think Northwind Orders / OrderDetails).
I have two tables LandingHeaders and LandingDetails, that record details about commercial fishing trips. Typically, over the course of a week, a fishing vessel can make several trips to sea, and so will end up with several LandingHeader/LandingDetail records.
At the end of each week the company that purchases the results of these fishing trips needs to work out the value of each landing made by each vessel and then pay the owner of that vessel whatever money is due. To add to the fun, some vessels are owned by the same person, so the company purchasing the fish would prefer that the value of all the landings from all of the vessels owned by a given individual be amalgamated into a single payment.
Until now the information required to perform this task was spread across more than a simple master-detail table structure, and as such it has required several stored procedures (along with the judicious use of dictionaries in the main application doing the work) to achieve the desired end result. External circumstances beyond my control have forced some major database changes, and I have taken the opportunity to restructure the LandingHeader table so that it contains all the information that might be needed.
From the LandingHeaders table I need to record the following fields:
LandingHeaderId of sql type int
VesselOwnerId of sql type int
LandingDate (Just used as part of query in reality) of sql type datetime
From the LandingDetails Table I need to record the following fields;
ProductId of sql type int
Quantity of sql type decimal (10,2)
UnitPrice of sql type money
I have been thinking about creating a query that takes as parameters VesselOwnerId, StartDate and EndDate.
As output I need to know which LandingId's are associated with the owner and the total Quantity for each Distinct ProductId (along with the UnitPrice which will be the same for each ProductId over the selected period) spread over the various landingDetails associated with the LandingHeaders over the given period.
I have been thinking along the lines of output rows that might look a little like this;
Can this sort of thing be done from a standard master-detail type table relationship, or will I still need to resort to multiple stored procedures?
A longer term goal is to have a query that could be used to produce xml that could be adapted for use with a web api.
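One way the output described above could come from a plain master-detail pair is a single JOIN with GROUP BY. The sketch below uses Python's `sqlite3` as a stand-in for the real database, so the column types, the sample rows and the owner/date values are all made up for illustration; SQL Server syntax and types would differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LandingHeaders (
    LandingHeaderId INTEGER PRIMARY KEY,
    VesselOwnerId   INTEGER NOT NULL,
    LandingDate     TEXT NOT NULL   -- ISO date string stands in for datetime
);
CREATE TABLE LandingDetails (
    LandingHeaderId INTEGER NOT NULL REFERENCES LandingHeaders(LandingHeaderId),
    ProductId       INTEGER NOT NULL,
    Quantity        REAL NOT NULL,
    UnitPrice       REAL NOT NULL
);
-- Made-up sample rows: owner 7 has two landings, owner 8 has one.
INSERT INTO LandingHeaders VALUES
    (1, 7, '2024-01-01'), (2, 7, '2024-01-03'), (3, 8, '2024-01-02');
INSERT INTO LandingDetails VALUES
    (1, 100, 10.5, 2.0), (1, 200, 4.0, 5.0),
    (2, 100, 1.5, 2.0), (3, 100, 9.0, 2.0);
""")

# Totals per product for one owner over the period.
TOTALS = """
SELECT d.ProductId,
       SUM(d.Quantity)               AS TotalQuantity,
       d.UnitPrice,
       SUM(d.Quantity * d.UnitPrice) AS TotalValue
FROM LandingHeaders h
JOIN LandingDetails d ON d.LandingHeaderId = h.LandingHeaderId
WHERE h.VesselOwnerId = :owner
  AND h.LandingDate BETWEEN :start AND :end
GROUP BY d.ProductId, d.UnitPrice
ORDER BY d.ProductId
"""

# The landings that contributed to those totals.
LANDINGS = """
SELECT LandingHeaderId
FROM LandingHeaders
WHERE VesselOwnerId = :owner AND LandingDate BETWEEN :start AND :end
ORDER BY LandingHeaderId
"""

params = {"owner": 7, "start": "2024-01-01", "end": "2024-01-07"}
totals = conn.execute(TOTALS, params).fetchall()
landing_ids = [row[0] for row in conn.execute(LANDINGS, params)]
```

Grouping by both ProductId and UnitPrice relies on the stated assumption that a product's price is constant over the selected period; if it were not, the product would simply appear once per distinct price.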

Counting rows with condition

My Table looks like something below
Id | Customer_number | Customer_Name | Customer_owner
I want to insert Customer_Number as a sequence specific to Customer_owner
that is 1,2,3,.... for Customer_owner X and 1,2,3,... Customer_owner Y.
To get the Customer_number I can use following SQL
SELECT COUNT(*) FROM Customer where Customer_owner='X'
My question is whether there is any performance impact, especially for a table with 100,000 records.
Are there any better alternatives?

In terms of performance, I would suggest not adding another column to Customers, for various reasons:
Every customer row for owner A would have to be updated whenever a customer with owner A is added; the same goes for removal.
The number of customers is repeated on every row, taking up more space and thus (generally) slowing execution.
There is no real use in linking the number of customers to the owner through another column on a record that describes a single customer.
And there are many more reasons.
The correct normal form would be to have two tables:
1. Customers (Cust_id, Cust_name, Cust_Owner_id)
2.a. Owners (Owner_id, Owner_name, NumberOfCustomers)
OR
2.b. Owners (Owner_id, Owner_name), with NumberOfCustomers calculated automatically at query time.
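Option 2.b can be sketched as follows, with Python's `sqlite3` standing in for the real database. The table and column names come from the answer; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Owners (Owner_id INTEGER PRIMARY KEY, Owner_name TEXT);
CREATE TABLE Customers (
    Cust_id       INTEGER PRIMARY KEY,
    Cust_name     TEXT,
    Cust_Owner_id INTEGER REFERENCES Owners(Owner_id)
);
-- Made-up sample rows.
INSERT INTO Owners VALUES (1, 'X'), (2, 'Y');
INSERT INTO Customers VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 2);
""")

# NumberOfCustomers is derived at query time instead of being stored.
counts = conn.execute("""
SELECT o.Owner_id, o.Owner_name, COUNT(c.Cust_id) AS NumberOfCustomers
FROM Owners o
LEFT JOIN Customers c ON c.Cust_Owner_id = o.Owner_id
GROUP BY o.Owner_id, o.Owner_name
ORDER BY o.Owner_id
""").fetchall()
```

The LEFT JOIN keeps owners that have no customers yet, reporting a count of 0 for them.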
Edit:
Since you want to display all the customers for a single owner, which I assume is your main usage, you should add a clustered index on Cust_Owner_id. Queries will then perform well, since the rows are physically ordered by the column you filter on.
Read more about clustering here: Clustered Index
Edit 2:
I've just realized your intent from the latest comments, but the solution still stands. Specific to your issue, I would add that I don't recommend storing the number for each of an owner's customers. Instead, keep a SUBSCRIBED DATE column in the Customers table and derive the customer number at display time when querying.
If, however, you want that number to be permanent (in which case the sequence 1,2,3,..n will probably break, since customers can be removed), simply use the Customer_Id, since it is already unique.

You can calculate Customer_Number on the fly when you need it:
select c.*, row_number() over (partition by Customer_Owner order by id) as CustomerNumber
from Customer c
This is a much safer approach than trying to store and maintain the number, which can be affected by all sorts of updates to the system. Imagine the fun of changing the numbering when an existing record changes its ownership, for instance.
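The effect of an ownership change can be tried out with a small sketch. It uses Python's `sqlite3` as a stand-in for the real database (window functions need SQLite 3.25 or later), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Id             INTEGER PRIMARY KEY,
    Customer_Name  TEXT,
    Customer_owner TEXT
);
-- Made-up sample rows.
INSERT INTO Customer VALUES
    (1, 'Ann', 'X'), (2, 'Bob', 'Y'), (3, 'Cid', 'X'),
    (4, 'Dee', 'Y'), (5, 'Eve', 'X');
""")

NUMBERED = """
SELECT Id, Customer_owner,
       ROW_NUMBER() OVER (PARTITION BY Customer_owner ORDER BY Id) AS CustomerNumber
FROM Customer
ORDER BY Id
"""

before = conn.execute(NUMBERED).fetchall()

# Customer 3 moves from owner X to owner Y; nothing else is updated.
conn.execute("UPDATE Customer SET Customer_owner = 'Y' WHERE Id = 3")
after = conn.execute(NUMBERED).fetchall()
```

After the single UPDATE, the numbers for both owners are consistent again without any renumbering logic, because they are recomputed on every read.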

If you only need unique numbering in the UI, you could just assign the numbers in the UI. If you go that route, make sure you always retrieve customers in the same order, so add an ORDER BY Id. Or do what Gordon Linoff suggests.

Database Design - sales from multiple sources

We currently have a SQL database with a table that holds online sales for our company, which sells products using other websites (say, Amazon). The table schema has been set up to hold specific sale data/attributes provided by the website our items are currently sold on (say, Site A).
We are expanding sales to other websites that provide different attributes than Site A uses when an item is sold (e.g. Site A might provide a unique sales id number, and site B might not provide a unique sales id number, but also provide some other info that Site A doesn't provide that we still need to capture).
The question is do I add a separate table for sales on each 'site' that we sell on, as the schema will be different, or try to combine all sales into one table, no matter the platform, leaving some columns null if it doesn't pertain to the particular platform? Or maybe a hybrid approach, separating only the attributes that aren't common among the two sites into separate tables, while a "master" sales table holds attributes that are shared (sale_price, sale_date, etc)?
There are also other tables in play that hold internal information (product IDs, costs, etc.) and are linked to the sales table via a unique identifier. Whichever route I choose, I'd need to come up with a unique identifier I could use across all tables (an auto-incremented sale_id, e.g.) and store that in a table for reference/joins.
Any suggestions are welcomed!

A sale is a sale >> same data belongs to the same table. I would definitely not recommend splitting your sales to several tables as this creates lots of difficulty for all that might follow: sales statistics and so on. Try to keep all sales in one table.
If it's a very small project, it might be the best shot to integrate the different fields into one table. Otherwise you might try to create profiles for every sale platform: In this case, use an Entity-Attribute-Value model.
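A minimal sketch of the single-table-plus-EAV layout, using Python's `sqlite3` as a stand-in database. Only `sale_price` and `sale_date` come from the question; the `sale_attributes` table, the `record_sale` helper and the attribute names `site_sale_id` and `gift_wrap` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- our own identifier
    site       TEXT NOT NULL,
    sale_price REAL NOT NULL,
    sale_date  TEXT NOT NULL
);
CREATE TABLE sale_attributes (                     -- entity-attribute-value rows
    sale_id INTEGER NOT NULL REFERENCES sales(sale_id),
    name    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (sale_id, name)
);
""")


def record_sale(site, price, sale_date, **site_specific):
    """Insert the shared columns, then one EAV row per site-specific attribute."""
    cur = conn.execute(
        "INSERT INTO sales (site, sale_price, sale_date) VALUES (?, ?, ?)",
        (site, price, sale_date),
    )
    sale_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sale_attributes (sale_id, name, value) VALUES (?, ?, ?)",
        [(sale_id, name, str(value)) for name, value in site_specific.items()],
    )
    return sale_id


sale_a = record_sale("Site A", 20.0, "2024-01-05", site_sale_id="A-123")
sale_b = record_sale("Site B", 5.0, "2024-01-06", gift_wrap="yes")

# Statistics run over one table, regardless of platform.
totals = conn.execute("SELECT COUNT(*), SUM(sale_price) FROM sales").fetchone()
attrs_b = conn.execute(
    "SELECT name, value FROM sale_attributes WHERE sale_id = ?", (sale_b,)
).fetchall()
```

The trade-off is that EAV values lose their column types and constraints, so this tends to fit best when the per-site attributes are few and rarely queried.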

Do not add a table for each site. It sounds like you have a many to many relationship between sites and attributes, so set up your database that way. Also, for any unique identifier you need, create it yourself.

Modelling the Domain from Two perspectives

I'm trying to model the domain of my system, but I've come across an issue and could do with some help.
My issue is one of perspective. I'm modeling a system where I have a Customer entity which will have a number of Order entities and the system will be required to list all the Orders for a selected Customer (perspective 1). I therefore modeled a Customer class which contains a collection of Orders... simple. However I've just realised that the system will also need to list all Orders with the details of the Customer (perspective 2) which would mean that I had a single Customer reference from each Order.
The problem is that from each perspective I will spend time creating objects I am not interested in. E.g., when I display a list of Orders, a Customer instance will be created for each Order; in turn, each Customer instance will hold a collection of the Orders that customer has made (which from this perspective I'm not interested in!).
Could anybody help with suggestions? I've come across this issue before but I've never taken the time to design a proper solution.
Regards,
JLove

I have seen this before. The trick is to differentiate between Customer-Identity and Customer-Details (e.g. Orders). You can then link from all Order-Objects to the Customer-Identity-Object, and in the other view link from the Customer-Identity-Object to the Customer-Details-Object which further links to Order-Objects (you probably want this ordered chronologically).
The implementation can be held as an object system or as a relational database (in which case you would have a table "Customers" with CustomerID as key, their addresses, etc., and a table "Orders" with OrderID as key and CustomerID as another column).
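As an object system, the split could look roughly like this. It is a sketch under assumptions: the class names follow the answer's Customer-Identity / Customer-Details wording, while the fields, the `load_details` helper and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class CustomerIdentity:
    """Lightweight handle: enough to show a customer next to an order."""
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    placed_on: date
    customer: CustomerIdentity  # perspective 2: order -> customer identity


@dataclass
class CustomerDetails:
    """Heavier view, built only when one customer is selected."""
    identity: CustomerIdentity
    orders: List[Order] = field(default_factory=list)


def load_details(identity: CustomerIdentity, all_orders: List[Order]) -> CustomerDetails:
    """Perspective 1: collect one customer's orders, oldest first."""
    mine = [o for o in all_orders if o.customer == identity]
    return CustomerDetails(identity, sorted(mine, key=lambda o: o.placed_on))


alice = CustomerIdentity(1, "Alice")
bob = CustomerIdentity(2, "Bob")
orders = [
    Order(10, date(2024, 3, 1), alice),
    Order(11, date(2024, 1, 15), bob),
    Order(12, date(2024, 2, 1), alice),
]

# Listing all orders needs only identities; no per-customer order collections.
order_list = [(o.order_id, o.customer.name) for o in orders]

# Selecting one customer builds the details view on demand.
alice_details = load_details(alice, orders)
```

Listing orders never instantiates a `CustomerDetails`, so neither perspective pays for the object graph the other one needs.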
