How does Ignite affinity collocation work?

I am reading through https://apacheignite.readme.io/docs/affinity-collocation, but I didn't get a good understanding of how affinity collocation works or how it behaves.
Presume I have an Employee object (its id is 1000) whose companyId is 1; then this Employee object will be collocated with the Company object whose id is 1.
That is, they will reside on the same node but in different caches:
Employee Cache: <1000, EmployeeObjWhoseCompanyIdIs1>
Company Cache: <1, CompanyObj>
But what if there is a third cache, say Country Cache, that also has a key of 1, that is:
Country Cache: <1, CountryObj>
Then, will the Employee object and the Country object also reside on the same node?
From the Affinity class definition, it only defines the affKey to collocate with, but it doesn't specify the cache that owns this affKey.

Yes, they will be stored on one node. However, I would treat this as a coincidence, because it happens only because you used the same types and values for the keys in this particular case. Logically, company ID and country ID are not related to each other, so it's not correct to say that they are collocated.
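To make the "coincidence" concrete, here is a minimal sketch of key-to-node mapping. It is not Ignite's actual affinity function (which is more elaborate); the partition count, node names, and helper functions are assumptions made up for illustration. The point it shows is that placement is computed from the affinity key value alone, so equal affinity keys in different caches land on the same node when the caches share the same mapping.

```python
# Simplified stand-in for an affinity function (assumption, not Ignite's real algorithm).
PARTITIONS = 1024                        # assumed partition count for this sketch
NODES = ["node-a", "node-b", "node-c"]   # hypothetical cluster members


def partition_for(affinity_key: int) -> int:
    """Map an affinity key to a partition; the cache name plays no role."""
    return affinity_key % PARTITIONS


def node_for(affinity_key: int) -> str:
    """Map a partition to a node (here: naive round-robin assignment)."""
    return NODES[partition_for(affinity_key) % len(NODES)]


# (cache name, cache key, affinity key)
entries = [
    ("Employee", 1000, 1),  # collocated by companyId = 1
    ("Company", 1, 1),      # plain key 1
    ("Country", 1, 1),      # unrelated cache that happens to use key 1
]

placement = {cache: node_for(affinity_key) for cache, _key, affinity_key in entries}
print(placement)
```

All three entries end up on the same node in this sketch, even though only Employee and Company are logically related.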

Geode transaction to generate ID and insert object

Let's say I have 3 PARTITIONED_REDUNDANT regions:
/Orders - keys are Longs (an ID allocated from /Sequences) and values are instances of Order
/OrderLineItems - keys are Longs (an ID allocated from /Sequences) and values are instances of OrderLineItem
/Sequences - keys are Strings (name of a sequence), values are Longs
The /Sequences region will have many entries, each of which is the ID sequence for some persistent type that is stored in another region (e.g., /Orders, /OrderLineItems, /Products, etc.)
I want to run a Geode transaction that persists one Order and a collection of OrderLineItems together.
And, I want to allocate IDs for the Order and OrderLineItems from the entries in the /Sequences region whose keys are "Orders" and "OrderLineItems", respectively. This operates like an "auto increment" column would in a relational database - the ID is allocated/assigned at insertion time as part of the transaction.
The insertion of Orders and OrderLineItems and the allocation of IDs from the /Sequences region need to be transactionally consistent - they all succeed or fail together.
I understand that Geode requires the data being operated on in a transaction to be co-located if the region is partitioned.
The obvious thing is to co-locate OrderLineItems with the owning Order, which can be done with a PartitionResolver that returns the Order's ID as the routing object.
However, there's still the /Sequences region that is involved in the transaction, and I'm not clear on how to co-locate that data with the Order and OrderLineItems.
The "Orders" entry of the /Sequences region would need to be co-located with every Order for which an ID is generated...wouldn't it? Obviously that's not possible.
Or is there another / better way to do this (e.g., change region type for /Sequences)?
Thanks for any suggestions.
Depending on how much data is in your /Sequences region - you could make that region a replicated region. A replicated region is considered co-located with all other regions because it's available on all members.
https://geode.apache.org/docs/guide/15/developing/transactions/data_location_cache_transactions.html
This pattern is potentially expensive though if you are creating a lot of entries concurrently. Every create will go through these shared global sequences. You may end up with a lot of transaction conflicts, especially if you are getting the next sequence number by incrementing the last used sequence number.
As an alternative you might want to consider UUIDs as the keys for your Orders and OrderLineItems, etc. A UUID takes twice as much space as a long, but you can allocate a random UUID without needing any coordination between concurrent creates.
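The UUID alternative can be sketched as follows. This is plain Python rather than Geode code; the `Order` and `OrderLineItem` classes and the `new_order_with_items` helper are hypothetical names for illustration. Each line item carries its order's ID, which is the value a PartitionResolver would return as the routing object, and no shared sequence entry is touched.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Order:
    # Random UUID allocated locally; no shared sequence is read or written.
    order_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class OrderLineItem:
    order_id: uuid.UUID  # owning order's ID, usable as the routing value
    item_id: uuid.UUID = field(default_factory=uuid.uuid4)


def new_order_with_items(count: int):
    """Create an order and its line items without any coordination."""
    order = Order()
    items = [OrderLineItem(order_id=order.order_id) for _ in range(count)]
    return order, items


order, items = new_order_with_items(3)
print(len(order.order_id.bytes))  # a UUID is 16 bytes, a 64-bit long is 8
```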

Not sure if this constitutes a transitive dependency

I am a bit stuck designing part of a database.
I have a table called Staff. It has different attributes:
StaffID
First Name
Last Name
Job Title
Department Number
Telephone Number
StaffID is the primary key in this table.
My issue however, is that it is possible to find any information based on the telephone number (i.e. each staff member has a different, unique telephone number).
For example, this means that the First Name or Job Title can be found when we have the Phone Number. However, Phone Number is not a primary key, StaffID is.
I am not sure whether this is a transitive dependency that should be fixed through 3NF by splitting up the table, leaving the Staff table without the Phone Number and adding another table with just StaffID and Telephone Number.
A transitive dependency occurs only if you have an indirect relationship between more than two attributes that are not part of the key.
In your example, as you explained, the StaffID is part of your dependency, which is fine because it's the primary key.
Also you can look at this question that shows what is wrong with a transitive dependency. It could help put things into perspective.
In your table, if you delete a staff member, you delete all of their information (rightly so, because you don't need it). If you keep the phone number in a different table and, for instance, delete the entry only in Staff, you're left with an orphaned phone number. But if your Staff table allowed multiple entries for the same person (but different departments), then the situation would be different.
Other sites that helped me in the past:
https://www.thoughtco.com/transitive-dependency-1019760
https://beginnersbook.com/2015/04/transitive-dependency-in-dbms/
Funnily, they always follow the book example :)
In design-theoretical terms, keys are implied by dependencies. If PhoneNumber→StaffID and if StaffID is known to be a key then we can infer that PhoneNumber is also a key. If that is the case then there is no violation of 3NF because the determinants are all keys. Note that the choice of StaffID as primary key is irrelevant here. Normalization treats all keys as equally significant.
In practical database design however, the question arises as to whether PhoneNumber really makes sense as a key. In other words, would you actually want to enforce dependencies like PhoneNumber→StaffID? If, after consideration, you decide that dependency is not applicable then you could discard that dependency (by not making PhoneNumber a key) and the table would still satisfy 3NF with respect to the set of dependencies you have left.
Here's a reason why a dependency like PhoneNumber→StaffID might not be a realistic choice: when I joined my present company I got a staff ID on my first day; I didn't get a phone number until two days later.
It is not, because there is no dependency between the phone number and the first or last name: if you know the name, you can't know the phone number. That is not the same as, for example, Model and Manufacturer: if you know the model is a Mustang, then you know the manufacturer is Ford, and the other way around, you know that Ford makes Mustangs.
With the columns you mentioned, I would have separate tables for departments and job titles, because they do not depend on the PK StaffID. Think of it as removing potential redundancy: with five thousand people in the table, the same job title string may be repeated a thousand times, which is a signal that it needs its own table (2NF).
Transitive dependency means that you have a (set of) attribute(s) that are completely determined by going from a (set of) attribute(s) A -> B and then from B -> C, while you cannot go from B -> A.
In your case, you do indeed have (StaffId) -> (PhoneNumber) and also (PhoneNumber) -> (StaffId). This means you have A -> B and B -> A and hence at this step you can already rule out the transitive dependency.
If you like, you could say that PhoneNumber would be another candidate for PK.
As a background, the problem with transitive dependencies is this: Assume you have a table consisting of "Book Title" (primary key), "Author" and "Gender of Author". Then you certainly have a transitive dependency BT -> A, A -> GoA, hence BT -> GoA.
Now assume that one of your authors is "Andy Smith", Andy being a short name for Andreas. Andreas goes and changes gender, and is now Andrea. Obviously you do not need to change the name, "Andy" works just fine for "Andrea". But you do have to change the Gender. You have to do it for many entries in your table, i.e. for all books from that author.
In this case, you would fix the problem by creating a new table "Author", obviously, and then you'd have only one row for Andy.
Hope that clears it up. It is easy to see that in your example there is no constellation where you have to change many rows due to a phone number change. It's a simple 1:1 relationship between StaffId and PhoneNumber, no problems whatsoever. Both are candidate keys.
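The candidate-key argument above can be checked mechanically with an attribute closure. The sketch below is a minimal illustration; the attribute names follow the question's Staff table, and the two functional dependencies are the ones assumed in the answers (StaffID determines everything, TelephoneNumber determines StaffID). The `closure` and `is_key` helpers are names introduced here for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the left side, we also know the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL = frozenset({
    "StaffID", "FirstName", "LastName",
    "JobTitle", "DepartmentNumber", "TelephoneNumber",
})

# Assumed dependencies: StaffID -> all attributes, TelephoneNumber -> StaffID.
FDS = [
    (frozenset({"StaffID"}), ALL),
    (frozenset({"TelephoneNumber"}), frozenset({"StaffID"})),
]


def is_key(attrs):
    """A set of attributes is a (super)key if its closure covers the whole table."""
    return closure(attrs, FDS) == ALL


print(is_key({"StaffID"}), is_key({"TelephoneNumber"}), is_key({"JobTitle"}))
```

Under these assumptions both StaffID and TelephoneNumber determine the whole row, so every determinant is a key and there is no 3NF violation.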

Domain Driven Design Auto Incremented Entity Key

I'm just starting with Domain Driven Design, and I've learned that you should keep your model in a valid state and that, when creating a new instance of a class, it's recommended to pass all required attributes as constructor parameters.
But when working with auto-incremented keys, I only get the new ID when I call an Add method on my persistence layer. If I instantiate my objects without a key, I think they will be in an invalid state, because they need some sort of unique identifier.
How should I implement my architecture in order to have my IDs before creating a new instance of my entity?
Generated Random IDs
The pragmatic approach here is to use random IDs and generate them before instantiating an entity, e.g. in a factory. GUIDs are a common choice.
And before you ask: No, you won't run out of GUIDs :-)
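A minimal sketch of the random-ID approach, assuming a hypothetical `Customer` entity and `create_customer` factory (neither name comes from the question). The ID is generated before the entity exists, so the constructor can require it and the object is never in an ID-less state.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    id: uuid.UUID   # required at construction time
    name: str

    def __post_init__(self):
        # Keep the entity valid from the moment it is created.
        if not self.name:
            raise ValueError("name is required")


def create_customer(name: str) -> Customer:
    """Factory: allocate the ID first, then build a fully valid entity."""
    return Customer(id=uuid.uuid4(), name=name)


customer = create_customer("Ada")
print(customer.id)
```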
Sequential IDs with ID reservation
If you must use a sequential ID for some reason, then you still have options:
Query a sequence on the DB to get the next ID. (This depends on your DB product; Oracle, for example, has them.)
Create a table with an auto-increment key that you use only as key reservation table. To get an ID, insert a row into that table - the generated key is now reserved for you, so you can use it as ID for the entity.
Note that both approaches for sequential IDs require a DB round-trip before you even start creating the entity. This is why the random IDs are usually simpler. So if you can, use random IDs.
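The key reservation table can be sketched like this, using an in-memory SQLite database as a stand-in for whatever DB product is actually in use. The table names and the `reserve_id` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reservation table: its only job is to hand out auto-incremented keys.
conn.execute(
    "CREATE TABLE order_id_reservation (id INTEGER PRIMARY KEY AUTOINCREMENT)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
)


def reserve_id(connection: sqlite3.Connection) -> int:
    """Insert an empty row; the generated key is now reserved for the caller."""
    cursor = connection.execute("INSERT INTO order_id_reservation DEFAULT VALUES")
    connection.commit()
    return cursor.lastrowid


# One DB round-trip happens here, before the entity is created.
first_id = reserve_id(conn)
second_id = reserve_id(conn)

# The entity can now be built with its ID and inserted later.
conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)", (first_id, "Ada"))
conn.commit()
```

The extra `reserve_id` call per entity is the round-trip cost mentioned above.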
DB-generated IDs
Another possibility is to just live with the fact that you don't have the ID at creation time, but only when the insert operation on the DB succeeds. In my experience, this makes entity creation awkward to use, so I avoid it. But for very simple cases, it may be a valid approach.
In addition to theDmi's comments:
1) In your factory method, you can make sure your entity gets stored in the database. This might or might not be applicable to your domain, but if you are sure that the entity is going to be saved, it might be a valid approach.
2) You can separate the ID from the database primary key. I've worked on a case where something was only an order once the customer had paid, and at that point it would be identified by its invoice ID (a sequential ID). That doesn't mean the database needs an ID column that is also the primary key of the object. You could have a primary key in the database (a random GUID) and still have an ID (int?) that is sequential and null if it hasn't been filled in yet.

Insert into statement

I have a table
Moon PIS
pID pAddr cID cName leaseExp mRent oID oName oContact
pID – property id: coded to identify the specific property, chosen to be primary key.
pAddr – property address, required (ie, cannot be null)
cID – client id: coded to identify the client – null means not rented out yet
cName – client name – null means not rented out yet
leaseExp – lease Expiration date – null allowed, if not rented out yet.
mRent – monthly rent (in dollars) – null allowed.
oID – owner id: coded to identify the property owner, required (ie, cannot be null)
oName – owner name, required (ie, cannot be null)
oContact – owner’s contact address.
and I am supposed to normalize this table.
I created a table for the property, owner, and client.
The property table has pID, pAddr
The client table has cID, cName
The owner table has oID, oName, oContact
First, I was wondering if I normalized the table properly?
If so, I am then required to move the data from the Moon PIS table into the newly created tables. I have attempted:
INSERT INTO Property (PropertyID, PropertyAddr)
SELECT pID, pAddr FROM Moon PIS
I am receiving an error saying "Microsoft Access database engine could not find the input table or query 'Moon'. Make sure it exist and that its name is spelled correctly."
Do I have to set up relationships prior to transferring the data? All I have done is create the tables and columns.
You need to escape the name because it contains a space:
INSERT INTO Property (PropertyID, PropertyAddr)
SELECT pID, pAddr
FROM [Moon PIS];
This answer isn't valid for the question, as it refers to the original table, and not to the changes suggested in the question... but I'll leave it for now as I think it might be useful anyway, if someone disagrees I'll remove it.
First, I was wondering if I normalized the table properly?
No, it's not properly normalized, as it records data for multiple types of entities (properties, clients and owners). You'll want to move the client details (cName) and the owner details (oName and oContact) to their own tables (Client and Owner maybe) and just keep cID and oID as foreign keys.
Also, what is this table supposed to record? Right now you include a lot of data that doesn't depend on the primary key (pID). Take rent, for example: is the rent bound to a property, or is it something that relates to a specific contract with a client and can change depending on the client? If so, it doesn't belong in the property table. And so on...
Normalizing relational designs can be hard (and involve formal logic), but if you want your design to work, it's a subject well worth putting some time into studying.
As for the query error, don't use white spaces in table names :)
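The migration discussed above can be sketched end to end. This uses an in-memory SQLite database instead of Access (SQLite also accepts the bracket quoting), and the sample rows, the new table layouts, and the OwnerID foreign key are assumptions for illustration. Lease and rent data are left out, since where they belong is the open design question raised above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Moon PIS] (
    pID TEXT PRIMARY KEY, pAddr TEXT NOT NULL,
    cID TEXT, cName TEXT, leaseExp TEXT, mRent INTEGER,
    oID TEXT NOT NULL, oName TEXT NOT NULL, oContact TEXT
);
-- Made-up sample data: one rented property, one vacant, same owner.
INSERT INTO [Moon PIS] VALUES
    ('P1', '1 Main St', 'C1', 'Ann', '2025-01-31', 900, 'O1', 'Olga', 'olga@example.com'),
    ('P2', '2 Main St', NULL, NULL, NULL, NULL, 'O1', 'Olga', 'olga@example.com');

CREATE TABLE Owner (OwnerID TEXT PRIMARY KEY, OwnerName TEXT NOT NULL, OwnerContact TEXT);
CREATE TABLE Client (ClientID TEXT PRIMARY KEY, ClientName TEXT);
CREATE TABLE Property (
    PropertyID TEXT PRIMARY KEY,
    PropertyAddr TEXT NOT NULL,
    OwnerID TEXT NOT NULL REFERENCES Owner(OwnerID)
);

-- DISTINCT collapses the owner/client details repeated across properties.
INSERT INTO Owner SELECT DISTINCT oID, oName, oContact FROM [Moon PIS];
INSERT INTO Client SELECT DISTINCT cID, cName FROM [Moon PIS] WHERE cID IS NOT NULL;
INSERT INTO Property SELECT pID, pAddr, oID FROM [Moon PIS];
""")

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Owner", "Client", "Property")
}
print(counts)
```

No relationships need to exist before the copy in this sketch; the owner rows are inserted first only so that the foreign key on Property points at existing rows.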

Should one include ID as a property on objects persisted to a database?

I am creating the model for a web application. The tables have ID fields as primary keys. My question is whether one should define ID as a property of the class?
I am divided on the issue because it is not clear to me whether I should treat the object as a representation of the table structure or whether I should regard the table as a means to persist the object.
If I take the former route, then ID becomes a property because it is part of the structure of the database table. However, if I take the latter approach, then ID could be viewed as a piece of metadata belonging to the database, which is not strictly a part of the object's model.
And then we arrive at the middle ground. While the ID is not really a part of the object I'm trying to model, I do realise that the objects are retrieved from and persisted to the database, and that the ID of an object in the database is critical to many operations of the system, so it might be advantageous to include it to ease interactions where an ID is used.
I'm a solo developer, so I'd really like some other, probably more experienced perspectives on the issue.
Basically: yes.
All the persistence frameworks I've used (including Hibernate and iBATIS) require the ID to be on the object.
I understand your point about metadata, but an Object from a database should really derive its identity in the same way the database does - usually an int primary key. Then Object-level equality should be derived from that.
Sometimes you have composite primary keys, e.g. first name and last name (don't ever do this!), in which case the primary key isn't 'metadata' because it is part of the object's identity.
I generally reserve the ID column of an object for the database. My opinion is that to use it for any 'customer-facing' purpose, (for example, use the primary key ID as a customer number) you will always shoot yourself in the foot later.
If you ever make changes to the existing data (instead of exclusively adding new data), you need the PK. Otherwise you don't know which record to change in the DB.
You should have the ID in the object. It is essential.
The easiest use case to give as an example is testing equality:
// Entity is a placeholder for any persisted class that exposes an ID.
public bool Equals(Entity a, Entity b) { return a.ID == b.ID; }
Anything else is subject to errors, and you'll find that out when you start getting primary key violations or start overwriting existing data.
By counterargument:
Say you don't have the ID in the object. Once you change an object and don't have its ID from the database, how will you know which record to update?
At the same time, you should note that the operations I mention are really private to the object instance, so ID does not necessarily have to be a public property.
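As a rough illustration of ID-based equality with a non-public ID, here is a minimal sketch. The `Entity` class and its fields are hypothetical, and it assumes the ID is already known when the object is constructed.

```python
class Entity:
    """Hypothetical persisted object whose identity is its database ID."""

    def __init__(self, entity_id: int, name: str):
        self._id = entity_id   # kept non-public; used only for identity
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Entity):
            return NotImplemented
        # Two objects represent the same record if their IDs match,
        # even when other fields differ.
        return self._id == other._id

    def __hash__(self):
        # Must be consistent with __eq__ so sets and dicts behave sensibly.
        return hash(self._id)


original = Entity(1, "John Smith")
edited = Entity(1, "John A. Smith")   # same record, changed name
other = Entity(2, "John Smith")       # different record, same name

print(original == edited, original == other)
```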
I include the ID as a property. Having a simple unique identifier for an object is often very handy regardless of whether the object is persisted in a database or not. It also makes your database queries much more simple.
I would say that the table is just a means to persist an object, but that doesn't mean the object can't have an ID.
I'm very much of the mindset that the table is a means to persist the object, but, even so, I always expose the IDs on my objects for two primary reasons:
The database ID is the most convenient way to uniquely identify an object, either within a class (if you're using a per-table serial/autonumber ID) or universally (if you're maintaining a separate "ID-to-class" mapping). In the context of web applications, it makes everything much simpler and more efficient if your forms are able to just specify <input type=hidden name=id value=12345> instead of having to provide multiple fields which collectively contain sufficient information to identify the target object (or, worse, use some scheme to concatenate enough identifying information into a single string, then break it back down when the form is submitted).
It needs to have an ID anyhow in order to maintain a sane database structure and there's no reason not to expose it.
Should the ID in the object be read-only or not? In my mind it should be read-only, as by definition the ID will never change (it uniquely identifies a record in the database).
This creates a problem when you create a new object (ID not set yet) and save it to the database through a stored procedure that returns the newly created ID: how do you store it back in the object if the ID property is read-only?
Example:
Employee employee = new Employee();
employee.FirstName="John";
employee.LastName="Smith";
EmployeeDAL.Save(employee);
How does the Save method (which actually connects to the database to save the new employee) update the EmployeeId property on the Employee object if this property is read-only (which it should be, as the EmployeeId will never change once it's created)?
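One possible way to reconcile the two requirements, sketched under assumptions: expose the ID through a read-only property and give the persistence layer a non-public, set-once hook. The `_assign_id` method and the in-memory counter standing in for the stored procedure are invented for this illustration; in C# a comparable effect could be had with a restricted setter.

```python
class Employee:
    def __init__(self, first_name: str, last_name: str):
        self.first_name = first_name
        self.last_name = last_name
        self._employee_id = None   # unknown until the first save

    @property
    def employee_id(self):
        """Read-only for callers: there is no public setter."""
        return self._employee_id

    def _assign_id(self, new_id: int) -> None:
        """Set-once hook intended for the persistence layer only."""
        if self._employee_id is not None:
            raise ValueError("employee_id is already set and cannot change")
        self._employee_id = new_id


class EmployeeDAL:
    _next_id = 1   # stand-in for the ID returned by the stored procedure

    @classmethod
    def save(cls, employee: Employee) -> Employee:
        if employee.employee_id is None:
            employee._assign_id(cls._next_id)
            cls._next_id += 1
        return employee


employee = Employee("John", "Smith")
EmployeeDAL.save(employee)
print(employee.employee_id)
```

Assigning to `employee.employee_id` directly raises `AttributeError`, and a second `_assign_id` call raises `ValueError`, so the ID cannot change once it has been set.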