Is 'identity' an optional characteristic for real world objects? - oop

Somewhere I've read
An object has three characteristics:
state (e.g. name)
behavior (e.g. reading)
identity (e.g. unique ID number of a student)
As per this information, every object will have a unique identification, so that all objects of a class will be different from each other.
But,
In many other places I've read that objects have two characteristics:
state
behavior
Question:
Which one is true? Do objects have two characteristics or three?
Suppose there are two erasers of the same brand, look, shape, size and color.
Should these two objects be treated as 'equal objects', since there is nothing to uniquely identify them?

Explicit identity is optional.
However, the culmination of an object's states and behaviors is an implicit identity; thus,
Implicit identity is required.
Two objects can have the exact same intrinsic characteristics (e.g. color, size, shape), but differ in their extrinsic characteristics (e.g. location, owner).
In this way, two objects may be considered equivalent when compared by a selection of their properties, but would be considered distinct in terms of the culmination of all intrinsic or extrinsic states and behaviors.

In your provided analogy, you bring up two identical erasers with the same characteristics. If you show them to us, and ask us, are they different, we will say "No. They are the same."
However, if you were to ask us if these two erasers are actually the same singular eraser, we'd wonder how we got to Philosophy SE.
Identity does not have to be explicitly defined. Take the case of String for example.
If I do:
String a = "ABCDE";
String b = "ABCDEFG".substring(0, 5); //turns into ABCDE
We have two Strings storing identical information ABCDE
We can do two comparisons:
a == b //false
a.equals(b) //true
These two Strings are like your two erasers. They are equal in that they both consist of ABCDE, but they aren't actually the same singular String, but two separate sets of characters that are coincidentally the same thing.
a and b each refer to their own String object containing "ABCDE". We haven't given either String an explicit identity, but because a and b refer to two separate objects, the language knows, "Hey, these are two different Strings."
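To make the two comparisons above concrete, here is a minimal runnable sketch. The class name StringIdentityDemo and the helper sameObject are made up for illustration; the String calls are the standard ones.

```java
public class StringIdentityDemo {
    // Hypothetical helper: true only when both references point at the very same object.
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "ABCDE";
        String b = "ABCDEFG".substring(0, 5); // a new String built at run time

        System.out.println(sameObject(a, b)); // false: two separate objects
        System.out.println(a.equals(b));      // true: same characters
    }
}
```

So == plays the role of "is this the same singular eraser?", while equals() plays the role of "do these erasers look the same?".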
Now, let's return to the eraser example. In this case, we haven't been provided any sort of way to differentiate the two, but we can still differentiate the two.
One eraser is on the left, one is on the right (or however they're arranged)
These erasers have implicitly, by us, been given an identity so we recognize that they're two different erasers with identical properties.
We can explicitly define an identity by giving erasers serial numbers or names. They may look the same, but they now have explicit identities rather than the ones we have made up in our head.

Value Objects (for instance a date, a color, etc.) by their nature have no notion of identity.
Two distinct Color objects MAY be equal as long as their properties (values) are equal (checked by hashCode/equals).
It is quite possible that your design leads you to define Color as an Entity, but only in very few cases.
By contrast, each entity (user123, car456, etc.) owns a unique id, and comparisons are usually made through this id. hashCode/equals only take the id into account.
So if I want to make a rule for both, it would be:
object has two characteristics: state and behavior
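As a rough sketch of the two styles described above (the class and field names are assumptions chosen for illustration, not taken from any particular framework), a value object compares all of its values while an entity compares only its id:

```java
import java.util.Objects;

public class EqualityStyles {
    // Value object: two Colors are equal whenever all their values match.
    static final class Color {
        final int red, green, blue;

        Color(int red, int green, int blue) {
            this.red = red;
            this.green = green;
            this.blue = blue;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Color)) return false;
            Color c = (Color) o;
            return red == c.red && green == c.green && blue == c.blue;
        }

        @Override
        public int hashCode() {
            return Objects.hash(red, green, blue);
        }
    }

    // Entity: two Users are equal whenever their ids match, whatever else differs.
    static final class User {
        final long id;
        String name; // mutable state, deliberately ignored by equals/hashCode

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return id == ((User) o).id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }
}
```

With these definitions, two separately constructed Color(255, 0, 0) objects are equal, and two User objects with id 123 are equal even if their names differ.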

I would argue that you can regard Identity as being part of State... The vast majority of classes will have some form of identity stored with them, but it's not a hard and fast rule. Consider, for instance, strings...
Out in the real world though, the vast majority of what you do will involve talking to databases, and joining information together. Keys are crucial to that, and your identity is basically your key... In the old days just because it was in your database didn't mean you would have it in your class, but in this era of Object Relational Mappers I'd get used to the idea if I were you...

If two objects are completely the same in every way, there's no way to distinguish between them, so they're the same object. Distinct objects — those that are not the same object — are distinct because they differ in some way. Your two erasers may have the same brand, look, shape, size, and color, but they differ in their position: they don't physically overlap in space. You can put them on the table next to each other and see them in their distinct locations, so you know they're distinct objects.
You may find it useful to consider two distinct objects equivalent even if they differ in some of their properties — two erasers in different physical locations but the same in other characteristics, or two data structures at different memory addresses but containing the same data. This is the difference between Java's equals() method and its == operator.

You are describing the difference between Reference Equality (literally the same object) and Value Equality (have the same properties). Whether or not a given object should have Reference Equality or Value Equality is NOT dependent solely on the object itself; it depends on the context. In most contexts, a dollar bill has value equality: one is just as good as another, we interchange them freely, and we don't care at all about the "identity" of each individual bill. However, to a counterfeit specialist working for the treasury department, all dollar bills are not the same, and the identity of the individual bill matters a lot.
Another example might be an airplane. If I am getting on an American Airlines flight, whether or not the airplane is a 737 or an S80 matters to me: one has 6 seats in a row while the other has 5, one has AC power in every seat while the other does not. But I only care about the properties of the plane, not the identity; one 737 is just as good as another to me. But to the mechanics who maintain the planes, the identity matters a great deal. One plane has just been serviced, while another is approaching its service deadline; keeping track of which one is which is extremely important.
So before you decide how to model your object, consider the context in which it will be used.
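One way to express "it depends on the context" in code is to leave equals() alone and pass the comparison in explicitly. This is only a sketch: the Bill class, its fields and the two predicate names are invented for the dollar bill example above.

```java
import java.util.function.BiPredicate;

public class BillContexts {
    // Hypothetical model of a bank note: a denomination plus a printed serial number.
    static final class Bill {
        final int denomination;
        final String serial;

        Bill(int denomination, String serial) {
            this.denomination = denomination;
            this.serial = serial;
        }
    }

    // Everyday context: any bill of the same denomination is as good as another.
    static final BiPredicate<Bill, Bill> SAME_VALUE =
            (x, y) -> x.denomination == y.denomination;

    // Counterfeit-specialist context: only the individual bill counts.
    static final BiPredicate<Bill, Bill> SAME_BILL =
            (x, y) -> x.serial.equals(y.serial);
}
```

The same pair of Bill objects can then be "equal" for the shopkeeper (SAME_VALUE) and "different" for the treasury (SAME_BILL), without either answer being baked into the class.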

I endorse much of @Wyzard's answer, but it doesn't go far enough. To be clear:
YES! Identity is optional for real-world objects.
Even objects that are not truly identical in every way--two potato chips, for example--are for all practical intents and purposes identical. They are identical in use. No one cares about their individuality. There is no naming scheme, no serial numbers, no hash values to distinguish them in any way. Your two erasers or two oysters you just bought at the local raw seafood bar may have some different characteristics, and they have a physics identity (i.e. they can't occupy the same space at the same time), but there is no meaningful way to talk about them other than pointing at them ("this one is bigger than that one!") or describing them ("the blue eraser--no, that other one that looks like it's been slightly used"). Their identity is only transient, and only what individual users assign. Very few people bother naming their erasers or their oysters. ("Go to work, Fred! Erase!" or "Francine was delicious! Bring me another! I shall name her Sally Wellfleet!")
This no-real-identity is true of many manufactured products, such as nails, pieces of lumber, ball bearings, Ibuprofen caplets, or bottles of Ibuprofen. Manufacturers often track cohort identity--the batch number of which they were a part, for example. But that is the finest granularity for which true identity information is created or considered.
Now, this isn't generally true of electronic devices. Even the humblest Ethernet adaptor, Bluetooth transponder, or RFID tag has an elaborate identification system. It has a manufacturer, a model/part number, a serial number, and often a designed identity (device id). There may also be a "current address" like a MAC address or other "I am operating at/as" identity. Many of these pieces of identifying information are available via reflection. Individual chips that make up the device may use a manufacturing "batch id" system, but the operational device has a more overt individual identity.
But most real-world objects are not electronic transponders, and they have no meaningful identity other than what we assign.

Related

A valid case for a single-column ID table?

As a hobby project, I've taken on the challenge of creating a database for storing the details of monsters from a certain popular monster-collecting RPG whose name rhymes with Blokémon.
The logical place to start of course is a table called Species, to hold the basic demographic details of each species. The trouble is, 20 years of exceptions and gimmicks has meant there's not actually a single demographic left that matches 1:1 to a species in all cases. Some examples:
Name: We call it Bulbasaur but Japan calls it Fushigidane (or フシギダネ if you prefer). Other languages have different names.
Category: (Bulbasaur is a "Seed" Pokémon, for example) This would be 1:1 but recently-added species Hoopa has to be awkward and have two. And there's still the language thing anyway.
Height/Weight/Stats: Most species just have one "forme", but quite a few now have multiple, and each has different stats and appearance. Many of these stats would live at the Forme level of the hierarchy, not the Species level.
The result of all this is that all that remains is the concept of a species, and a concept is difficult to store in a database. For example, Pikachu's a little yellow electric woodland mouse thing, and that's all it ever is, so it graciously only has one set of demographics (it's even called Pikachu in most languages). If every species were like Pikachu, this would be a very simple table to design. Shaymin, on the other hand? Well, it's one species, but it has two formes - Sky Forme and Land Forme - each with different stats. The Sky Forme is a flying white dog. The Land Forme is a little green hedgehog.
Regardless, species is still a useful thing to have. It links formes together, and every species has a name even if that name differs between languages. You can count the number of species, or look at species that appear within a particular game. But the only field that can exist in such a table is an ID. It's the only thing we can consider fixed for every single species. I will probably also include a "Label" field for my own developer sanity, but it wouldn't be considered part of the dataset, just a helper for me personally.
Is this an acceptable case for a single-column ID table, or is there a better way to structure this?
Is this an acceptable case for a single-column ID table
Yes.
From a relational perspective: A table holds rows of values that are in a certain relation to each other, ie participate in a certain relationship, ie are associated in a certain way, ie satisfy a certain statement template aka predicate. Your predicate of interest is Species(ID) "ID is a species". So make that a table. You will have lots of other predicates like "ID is a species and ...". But as long as none of them has IDs in 1:1 correspondence with those in Species you can't use any of them instead of Species. (You might be able to express Species as, say, a union of projections of them, but that's a separate design issue.)
From an ERM perspective: There are some species. So there is a species entity type. Its table gets a surrogate key. You aren't interested in any attributes. So don't have any.
There's just nothing special about having a single-column table.
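To illustrate the predicate view in code rather than SQL, here is a small in-memory sketch. Everything in it is an assumption made for illustration: the class name, the three collections standing in for tables, and the idea that the other "tables" must reference an existing species id, much as a foreign key would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SpeciesSketch {
    // Species(ID): "ID is a species". A single column, nothing else.
    final Set<Integer> species = new TreeSet<>();
    // SpeciesName(ID, language, name): "species ID is called name in language".
    final Map<Integer, Map<String, String>> names = new HashMap<>();
    // Forme(ID, forme): "species ID has a forme called forme".
    final Map<Integer, List<String>> formes = new HashMap<>();

    void addSpecies(int id) {
        species.add(id);
    }

    void addName(int id, String language, String name) {
        requireSpecies(id);
        names.computeIfAbsent(id, k -> new HashMap<>()).put(language, name);
    }

    void addForme(int id, String forme) {
        requireSpecies(id);
        formes.computeIfAbsent(id, k -> new ArrayList<>()).add(forme);
    }

    // Plays the role of a foreign key check against the single-column table.
    private void requireSpecies(int id) {
        if (!species.contains(id)) {
            throw new IllegalArgumentException("unknown species id " + id);
        }
    }

    int speciesCount() {
        return species.size();
    }
}
```

Counting species only needs the single-column set, and it stays correct even for a species that has no names or formes recorded yet.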

How to Store Graph-Like Data in SQL Server?

This is a bit of a complex one, and even trying to think it over is somewhat confusing.
Basically I'm having to design a series of tables that will house information about many different pieces of electrical equipment. The arrangement of this equipment is quite complex, and can vary fairly drastically.
The different types of equipment are as follows:
RDC - Remote Distribution Center
EBD - Electrical Bus Duct
UPB - Upright Panel Board
PDU - Power Distribution Unit
Now the way these units work together is slightly confusing as well.
PDU - Powers RDC's, EBD's, and UPB's. They are often redundant, and have a secondary unit that powers the same equipment in the event of a power failure. Can also contain breakers and power equipment directly.
RDC - Powers nearly all the equipment on the data center floors, are usually redundant. They have two units side by side, being powered by a PDU. In the event of a failure, the second RDC is activated and resumes operations.
EBD - Nearly identical to the RDC, being phased out, but still needs to be tracked in a similar fashion.
UPB - Similar to an RDC, however, they are not redundant.
Now what I'm trying to do is figure out the simplest method of tracking this crazy relationship between all the different items.
I need to track the redundant sources for all possible hardware, but also what powers each unit. This can be quite complex because if two PDUs power a set of two RDCs, we need to be able to track exactly what goes where.
Any idea on exactly where to start?
EDIT Here is a visual representation of what I'm after. The objects that are touching are redundant, and must be documented as such. Also, the different hardware that is connected to each device must be cataloged.
Set up one table for equipment, one table for power supplies, then a third table that matches a piece of equipment with its power supply.
This sounds like a job for an entity-relationship model.
But, in the interest of answering your question, here's how I would set it up. I believe I understand the relationships between entities. My shorthand follows this pattern: Table [TableName] ([columns]). I tried to name them so they make the relationships obvious.
Table RDC (id)
Table EBD (id)
Table PDU (id)
Table UPB (id, PduId) // many-to-one relationship between UPBs and PDUs
Table PDU_RDC (PduId, RdcId) // represents many-to-many relationship between PDUs and RDCs
Table PDU_EBD (PduId, EbdId) // represents many-to-many relationship between PDUs and EBDs
Good luck!
Instead of focusing on "entities" focus on basic facts. Each gives a table or view.
Some of the basic facts just involve entities; others are about (ids of) entities:
RDC(id) // id identifies a remote distribution center
powers(pid,rid) // PDU pid powers RDC rid
backup(rid1,rid2) // RDC rid1 is backed up by RDC rid2
active(rid) // RDC is active
Until you supply the statements you want to make or use, we can only answer with guesses or principles; given statements and business rules, we can suggest alternatives and rearrangements.
When you get AND between two statements you already have, the table with that statement is expressible as a JOIN of the two statements' tables.
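As a sketch of that last point, here are two of the facts above held as plain in-memory lists and combined with AND. The class and method names are invented for illustration, and the nested loop merely stands in for what a SQL JOIN on the shared column would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FactJoin {
    // powers(pid, rid): "PDU pid powers RDC rid"
    static final class Powers {
        final String pid, rid;
        Powers(String pid, String rid) { this.pid = pid; this.rid = rid; }
    }

    // backup(rid1, rid2): "RDC rid1 is backed up by RDC rid2"
    static final class Backup {
        final String rid1, rid2;
        Backup(String rid1, String rid2) { this.rid1 = rid1; this.rid2 = rid2; }
    }

    // "PDU pid powers RDC rid AND RDC rid is backed up by RDC rid2"
    // Each result row is (pid, rid, rid2).
    static List<List<String>> powersAndBackup(List<Powers> powers, List<Backup> backups) {
        List<List<String>> rows = new ArrayList<>();
        for (Powers p : powers) {
            for (Backup b : backups) {
                if (p.rid.equals(b.rid1)) { // the join condition on the shared RDC id
                    rows.add(Arrays.asList(p.pid, p.rid, b.rid2));
                }
            }
        }
        return rows;
    }
}
```

Each row of the result answers "which PDU feeds this RDC, and which RDC takes over if it fails?", which is the combined statement the two simpler tables imply.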
You can introduce notions like hardware type but the tables/statements for that way will involve simpler statements (for which you may have defined tables). The former tables/statements are joins of the latter, and the latter are projections of the former. This means you can write views of either way in terms of the other. Neither is more complex; you have fewer things with more parts or more simpler things. Queries involving given statement will be simpler--but using the appropriate view neither is more complex. However, each way has corresponding versions of constraints and SQL might make certain constraints hard to express declaratively. Investigate join performance later as a non-premature optimization.
When a column is a function of a set of columns there is an FD from the set to the column. A column set forms a key when all other columns are functions of it but of no subset. FDs and keys are kinds of constraint.
There will be certain constraints that a projection of a source table is always a subset of a projection of a target table (maybe the same one). That's an IND. Informally it means something(c1,...) IMPLIES otherthing(c1,...). Formally, EXISTS x1,... t1(c1,...,x1,...) IMPLIES EXISTS y1,... t2(c1,...,y1,...). If the target projection's columns form a key in its table, there's also a FK. SQL FK [sic] declarations actually declare INDs.
There will be other constraints.
Supplying whatever-to-whateverness for a table is just one property about it. Not being 0-or-more-to-0-or-more means a corresponding FD or IND holds. People talk about "a" "1-to-n" "relationship" between entity types or tables but that's just sloppy unclear expression of some constraint. Make sure you know exactly the table(s) and constraint(s) that means.
Read about ORM2 (or NIAM or FCO-IM) because it is based on relational principles (although it could be more so).

How best to normalize and reference (FK) locations (Neighborhood/City/Region/Country/Continent)

So I have searched around but haven't found a satisfactory answer.
I have different types of locations, as stated in the title. Given a type of location (i.e. city), the less granular locations can be inferred. I.e. if you know you're in Oregon, it implies you're in the United States, which implies you're in North America.
We have Objects that reference locations, but the granularity is not all the same. Some items might point to neighborhoods, others are only known down to the city level, while some are only known to a region, etc.
There were two ways in which I thought of organizing the data, this is the way I am leaning towards:
Have a generic "Locations" table, with a location "type" and a "parent location" referencing itself. So there'd be an entry for United States of type country, and an entry for Oregon type state which references United States.
i.e.
You can then have the object reference the location off its primary key, and then other locations can be inferred. Does this make sense or is there a better way I could be organizing the data?
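A small in-memory sketch of the self-referencing idea might look like this. The class, field and method names are assumptions for illustration; in the database, parentId would be a nullable foreign key back to the same table. The sketch also assumes the parent links never form a cycle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocationTree {
    // One row of the hypothetical generic Locations table.
    static final class Location {
        final int id;
        final String name;
        final String type;      // e.g. "continent", "country", "state"
        final Integer parentId; // null for the top of the hierarchy

        Location(int id, String name, String type, Integer parentId) {
            this.id = id;
            this.name = name;
            this.type = type;
            this.parentId = parentId;
        }
    }

    private final Map<Integer, Location> byId = new HashMap<>();

    void add(Location location) {
        byId.put(location.id, location);
    }

    // Follows parent links upward, returning names from most to least granular.
    List<String> chain(int id) {
        List<String> names = new ArrayList<>();
        Integer current = id;
        while (current != null) {
            Location location = byId.get(current);
            if (location == null) {
                throw new IllegalArgumentException("unknown location id " + current);
            }
            names.add(location.name);
            current = location.parentId;
        }
        return names;
    }
}
```

An object that only knows "Oregon" stores that one id, and the country and continent fall out of the walk; an object known only to country level stores the country's id and gets a shorter chain.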
The other way I considered was with a different table for each location "type" but then the problem is having our objects referencing it, since the most granular type of location for an object isn't always the same.
If I were to slip other location types in later, for example counties in between Cities and Regions, might this present a problem? I'm thinking it would be no more a problem than with separate tables, but perhaps there's a better way I can keep track of things in a logical way.
This is a case of subclasses, often called subtypes. It's complicated by the fact that some subtypes are contained in other subtypes. The container issue is well handled by classical elementary relational database design.
The subclass issue requires a little explanation. What OOP calls "subclasses" goes by the name "ER Specialization" in ER modeling circles. This tells you how to diagram subclasses, but it doesn't tell you how to implement them.
It's worth mentioning two techniques for implementing subclasses in SQL tables. The first goes by the name "Single Table Inheritance". The second goes by the name "Class Table Inheritance". In class table inheritance, you will have one generic table for "locations" with all the attributes that are common to all locations, regardless of type. In the "Cities" table you will have attributes that pertain to cities, but not to countries, etc. You will have other subclass tables for the other types of locations.
If you go this route, you should look up another technique, called "Shared Primary Key". In this technique, the id field of the subclass tables all contain copies of the id field from the superclass table. This requires a little effort, but it's well worth it.
Shared primary key offers several advantages. It enforces the one-to-one nature of a subclass relationship. It makes joining specialized data with generalized data simple, easy, and fast. It keeps track of which items belong in which subclass, without an extra field.
In your case, there is yet another advantage. Other tables that reference a location by using a foreign key don't have to decide whether to reference the superclass table or the subclass table. A single foreign key that references the superclass table will also implicitly reference one of the subclass tables, although it isn't obvious which one.
This isn't perfect, but it's very, very good. Been there, done that.
For more information, you can google the techniques, or find relevant tags here in SO.
What about:
Countries:
Id,
Name.
Regions:
Id,
CountryId,
Name.
Cities:
Id,
RegionId,
Name.
Neighborhoods:
Id,
CityId,
Name.
This for location types. But the main problem in your case is
but the granularity is not all the same.
For this:
Object:
Id,
Name,
LocationId,
Type.
Good question.
You should definitely go with your first option. If you look at any data modeling patterns book, they all choose that way.
Is this North America only, or global?
Issues:
Cities/Towns/Hamlets/Villages are children of Divisions (generic term for state/province), though not in, say, England, where they are children of Country (or is it County)
Postal Areas (postal codes, zip codes) are children of Divisions too, not county or city. Some cities reside entirely in zips, and some zips reside entirely in cities
Counties are children of Division too. New York City contains counties, whereas most counties contain cities.
I would read Hay's Enterprise Model Patterns if you are hoping for a global solution. It's on Safari for cheap.

Is it acceptable to have multiple aggregation that can theoretically be inconsistent?

I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down in subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
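class Account {}

class Position {
    Account account;     // a Position 'has an' Account
}

class Portfolio {
    Account account;     // a Portfolio belongs to an Account
}

class Trade {
    Position position;   // the Position this Trade contributes to
    Portfolio portfolio; // the Portfolio this Trade is associated with
}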
I think the (potential) problem is clear: starting from Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder though whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.
The design is not flawed just because there are two ways to get from class A to class D, one way over B and one over C. Such "squares" will appear often in OOP class models, sometimes not so obviously, especially if more classes lie in the paths. But as Dan mentioned, the business semantics always determine whether such a square must commute (in the mathematical sense).
Personally I draw a = sign inside such a square in the UML diagram to indicate that it must commute. Also I note the precise formula in a UML comment, in my example it would be
For every object a of class A: a.B.D = a.C.D
If such a predicate holds, then you have basically two options:
Trust all programmers to not break the rule in any code, since it is very well documented
Implement some error handling (like Dan and algirdas mentioned) or, if you don't want to have such code in your model, create a Checker controller, which checks all conditions in a given model instance.
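For the second option, a separate checker could look roughly like this. It is a sketch only: the class and method names are made up, the model classes are repeated as nested classes so the example is self-contained, and "same Account" is taken to mean the same object reference.

```java
import java.util.ArrayList;
import java.util.List;

public class TradeConsistency {
    static final class Account {}

    static final class Position {
        final Account account;
        Position(Account account) { this.account = account; }
    }

    static final class Portfolio {
        final Account account;
        Portfolio(Account account) { this.account = account; }
    }

    static final class Trade {
        final Position position;
        final Portfolio portfolio;
        Trade(Position position, Portfolio portfolio) {
            this.position = position;
            this.portfolio = portfolio;
        }
    }

    // The square commutes for a Trade when both routes reach the same Account.
    static boolean commutes(Trade trade) {
        return trade.position.account == trade.portfolio.account;
    }

    // Checker: returns every Trade that violates the rule.
    static List<Trade> findInconsistent(List<Trade> trades) {
        List<Trade> broken = new ArrayList<>();
        for (Trade trade : trades) {
            if (!commutes(trade)) {
                broken.add(trade);
            }
        }
        return broken;
    }
}
```

The same commutes() test could instead be called from the Trade constructor and throw, which turns the checker into the error-handling variant.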

Is there ever a time where using a database 1:1 relationship makes sense?

I was thinking the other day on normalization, and it occurred to me, I cannot think of a time where there should be a 1:1 relationship in a database.
Name:SSN? I'd have them in the same table.
PersonID:AddressID? Again, same table.
I can come up with a zillion examples of 1:many or many:many (with appropriate intermediate tables), but never a 1:1.
Am I missing something obvious?
A 1:1 relationship typically indicates that you have partitioned a larger entity for some reason. Often it is for performance reasons in the physical schema, but it can happen on the logical side as well if a large chunk of the data is expected to be "unknown" at the same time (in which case you have a 1:0 or 1:1, but no more).
As an example of a logical partition: you have data about an employee, but there is a larger set of data that needs to be collected, if and only if they select to have health coverage. I would keep the demographic data regarding health coverage in a different table to both give easier security partitioning and to avoid hauling that data around in queries unrelated to insurance.
An example of a physical partition would be the same data being hosted on multiple servers. I may keep the health coverage demographic data in another state (where the HR office is, for example) and the primary database may only link to it via a linked server... avoiding replicating sensitive data to other locations, yet making it available for (assuming here rare) queries that need it.
Physical partitioning can be useful whenever you have queries that need consistent subsets of a larger entity.
One reason is database efficiency. Having a 1:1 relationship allows you to split up the fields which will be affected during a row/table lock. If table A has a ton of updates and table B has a ton of reads (or has a ton of updates from another application), then table A's locking won't affect what's going on in table B.
Others bring up a good point. Security can also be a good reason depending on how applications etc. are hitting the system. I would tend to take a different approach, but it can be an easy way of restricting access to certain data. It's really easy to just deny access to a certain table in a pinch.
Sparseness. The data relationship may be technically 1:1, but corresponding rows don't have to exist for every row. So if you have twenty million rows and there's some set of values that only exists for 0.5% of them, the space savings are vast if you push those columns out into a table that can be sparsely populated.
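The sparseness argument can be mimicked in memory with two maps that share a key. This is a sketch under assumptions: the class, the two "tables" and the method names are invented, and a missing entry in the second map stands for a missing row rather than a row full of NULLs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SparseOneToOne {
    // Main table: one entry for every row.
    private final Map<Integer, String> mainRows = new HashMap<>();
    // Side table with the same key: an entry only where the rare values exist.
    private final Map<Integer, String> rareDetails = new HashMap<>();

    void addRow(int id, String data) {
        mainRows.put(id, data);
    }

    // The side row may only exist for an id that is already in the main table.
    void addRareDetails(int id, String details) {
        if (!mainRows.containsKey(id)) {
            throw new IllegalArgumentException("no main row with id " + id);
        }
        rareDetails.put(id, details);
    }

    // At most one side row per main row; absence means "no such data".
    Optional<String> rareDetailsFor(int id) {
        return Optional.ofNullable(rareDetails.get(id));
    }

    int mainRowCount() { return mainRows.size(); }

    int rareRowCount() { return rareDetails.size(); }
}
```

Storage for the rare values grows with the number of rows that actually have them, not with the size of the main table.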
Most of the highly-ranked answers give very useful database tuning and optimization reasons for 1:1 relationships, but I want to focus on nothing but "in the wild" examples where 1:1 relationships naturally occur.
Please note one important characteristic of the database implementation of most of these examples: no historical information is retained about the 1:1 relationship. That is, these relationships are 1:1 at any given point in time. If the database designer wants to record changes in the relationship participants over time, then the relationships become 1:M or M:M; they lose their 1:1 nature. With that understood, here goes:
"Is-A" or supertype/subtype or inheritance/classification relationships: This category is when one entity is a specific type of another entity. For example, there could be an Employee entity with attributes that apply to all employees, and then different entities to indicate specific types of employee with attributes unique to that employee type, e.g. Doctor, Accountant, Pilot, etc. This design avoids multiple nulls since many employees would not have the specialized attributes of a specific subtype. Other examples in this category could be Product as supertype, and ManufacturingProduct and MaintenanceSupply as subtypes; Animal as supertype and Dog and Cat as subtypes; etc. Note that whenever you try to map an object-oriented inheritance hierarchy into a relational database (such as in an object-relational model), this is the kind of relationship that represents such scenarios.
"Boss" relationships, such as manager, chairperson, president, etc., where an organizational unit can have only one boss, and one person can be boss of only one organizational unit. If those rules apply, then you have a 1:1 relationship, such as one manager of a department, one CEO of a company, etc. "Boss" relationships don't only apply to people. The same kind of relationship occurs if there is only one store as the headquarters of a company, or if only one city is the capital of a country, for example.
Some kinds of scarce resource allocation, e.g. one employee can be assigned only one company car at a time (e.g. one truck per trucker, one taxi per cab driver, etc.). A colleague gave me this example recently.
Marriage (at least in legal jurisdictions where polygamy is illegal): one person can be married to only one other person at a time. I got this example from a textbook that used this as an example of a 1:1 unary relationship when a company records marriages between its employees.
Matching reservations: when a unique reservation is made and then fulfilled as two separate entities. For example, a car rental system might record a reservation in one entity, and then an actual rental in a separate entity. Although such a situation could alternatively be designed as one entity, it might make sense to separate the entities since not all reservations are fulfilled, and not all rentals require reservations, and both situations are very common.
I repeat the caveat I made earlier that most of these are 1:1 relationships only if no historical information is recorded. So, if an employee changes their role in an organization, or a manager takes responsibility of a different department, or an employee is reassigned a vehicle, or someone is widowed and remarries, then the relationship participants can change. If the database does not store any previous history about these 1:1 relationships, then they remain legitimate 1:1 relationships. But if the database records historical information (such as adding start and end dates for each relationship), then they pretty much all turn into M:M relationships.
There are two notable exceptions to the historical note: First, some relationships change so rarely that historical information would normally not be stored. For example, most IS-A relationships (e.g. product type) are immutable; that is, they can never change. Thus, the historical record point is moot; these would always be implemented as natural 1:1 relationships. Second, the reservation-rental relationship stores dates separately, since the reservation and the rental are independent events, each with their own dates. Since the entities have their own dates, rather than the 1:1 relationship itself having a start date, these would remain as 1:1 relationships even though historical information is stored.
Your question can be interpreted in several ways, because of the way you worded it. The responses show this.
There can definitely be 1:1 relationships between data items in the real world. No question about it. The "is a" relationship is generally one to one. A car is a vehicle.
One car is one vehicle. One vehicle might be one car. Some vehicles are trucks, in which case one vehicle is not a car. Several answers address this interpretation.
But I think what you really are asking is... when 1:1 relationships exist, should tables ever be split? In other words, should you ever have two tables that contain exactly the same keys? In practice, most of us analyze only primary keys, and not other candidate keys, but that question is slightly different.
Normalization rules for 1NF, 2NF, and 3NF never require decomposing (splitting) a table into two tables with the same primary key. I haven't worked out whether putting a schema in BCNF, 4NF, or 5NF can ever result in two tables with the same keys. Off the top of my head, I'm going to guess that the answer is no.
There is a level of normalization called 6NF. The normalization rule for 6NF can definitely result in two tables with the same primary key. 6NF has the advantage over 5NF that NULLS can be completely avoided. This is important to some, but not all, database designers. I've never bothered to put a schema into 6NF.
In 6NF, missing data can be represented by an omitted row, instead of a row with a NULL in some column.
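A small sketch of that difference, again with Python's sqlite3 module. The person tables are hypothetical, and this is only a 6NF-style decomposition (one non-key attribute per table), not a full treatment of 6NF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-table design: an unknown phone number has to be stored as NULL.
CREATE TABLE person_wide (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    phone TEXT
);
INSERT INTO person_wide VALUES (1, 'Ann', '555-0100'), (2, 'Bob', NULL);

-- 6NF-style design: one table per attribute, all sharing the same key.
CREATE TABLE person       (id INTEGER PRIMARY KEY);
CREATE TABLE person_name  (id INTEGER PRIMARY KEY REFERENCES person(id),
                           name TEXT NOT NULL);
CREATE TABLE person_phone (id INTEGER PRIMARY KEY REFERENCES person(id),
                           phone TEXT NOT NULL);
INSERT INTO person       VALUES (1), (2);
INSERT INTO person_name  VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO person_phone VALUES (1, '555-0100');  -- no row at all for Bob
""")

nulls_in_wide = conn.execute(
    "SELECT COUNT(*) FROM person_wide WHERE phone IS NULL"
).fetchone()[0]
phone_rows = conn.execute("SELECT COUNT(*) FROM person_phone").fetchone()[0]

# A NULL only reappears if a query chooses to outer-join the pieces back together.
rejoined = conn.execute("""
    SELECT n.name, p.phone
    FROM person_name AS n
    LEFT JOIN person_phone AS p ON p.id = n.id
    ORDER BY n.id
""").fetchall()
```

Note that person_name and person_phone share the same primary key, which is the "two tables with the same key" situation the question is about.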
There are reasons other than normalization for splitting tables. Sometimes split tables result in better performance. With some database engines, you can get the same performance benefits by partitioning the table instead of actually splitting it. This can have the advantage of keeping the logical design easy to understand, while giving the database engine the tools needed to speed things up.
I use them primarily for a few reasons. One is a significant difference in the rate of data change. Some of my tables have audit trails where I track previous versions of records; if I only care to track previous versions of 5 out of 10 columns, splitting those 5 columns onto a separate table with an audit trail mechanism on it is more efficient. Also, I may have records (say, for an accounting app) that are write-once. You cannot change the dollar amounts or the account they were for; if you made a mistake, you need to make a corresponding record to write off the incorrect record, then create a correction entry. I have constraints on the table enforcing the fact that they cannot be updated or deleted, but I may have a couple of attributes for that object that are malleable; those are kept in a separate table without the restriction on modification. Another time I do this is in medical record applications. There is data related to a visit that cannot be changed once it is signed off on, and other data related to a visit that can be changed after signoff. In that case I will split the data and put a trigger on the locked table that rejects updates once the visit is signed off, while still allowing updates to the data the doctor is not signing off on.
Another poster commented that 1:1 is not normalized. I would disagree with that in some situations, especially subtyping. Say I have an employee table and the primary key is their SSN (it's an example; let's save the debate on whether this is a good key or not for another thread). The employees can be of different types, say temporary or permanent, and if they are permanent they have more fields to be filled out, like office phone number, which should only be not null if the type = 'Permanent'. In a 3rd normal form database the column should depend only on the key, meaning the employee, but it actually depends on employee and type, so a 1:1 relationship is perfectly normal, and desirable, in this case. It also prevents overly sparse tables if I have 10 columns that are normally filled, but 20 additional columns only for certain types.
The most common scenario I can think of is when you have BLOBs. Let's say you want to store large images in a database (typically not the best way to store them, but sometimes the constraints make it more convenient). You would typically want the blob to be in a separate table to improve lookups of the non-blob data.
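Here is a rough sketch of the accounting case using Python's sqlite3 module. The ledger_entry and ledger_entry_note tables and the trigger names are invented for the example, and the triggers are just one possible way to express "cannot be updated or deleted"; a real system would likely also restrict permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Locked half: amounts and accounts are written once and never changed.
CREATE TABLE ledger_entry (
    id           INTEGER PRIMARY KEY,
    account      TEXT    NOT NULL,
    amount_cents INTEGER NOT NULL
);
CREATE TRIGGER ledger_entry_no_update BEFORE UPDATE ON ledger_entry
BEGIN
    SELECT RAISE(ABORT, 'ledger_entry rows cannot be updated');
END;
CREATE TRIGGER ledger_entry_no_delete BEFORE DELETE ON ledger_entry
BEGIN
    SELECT RAISE(ABORT, 'ledger_entry rows cannot be deleted');
END;

-- Malleable half: same key, no restriction on modification.
CREATE TABLE ledger_entry_note (
    entry_id INTEGER PRIMARY KEY REFERENCES ledger_entry(id),
    memo     TEXT
);

INSERT INTO ledger_entry      VALUES (1, 'cash', 10000);
INSERT INTO ledger_entry_note VALUES (1, 'first draft of memo');
""")

# Changing the locked table is rejected by the trigger.
try:
    conn.execute("UPDATE ledger_entry SET amount_cents = 9000 WHERE id = 1")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True

# Changing the companion table is still allowed.
conn.execute("UPDATE ledger_entry_note SET memo = 'corrected memo' WHERE entry_id = 1")

amount = conn.execute("SELECT amount_cents FROM ledger_entry WHERE id = 1").fetchone()[0]
memo = conn.execute("SELECT memo FROM ledger_entry_note WHERE entry_id = 1").fetchone()[0]
```

Because the restriction lives on a whole table rather than on individual columns, the rule stays simple: everything in ledger_entry is frozen, everything in ledger_entry_note is editable.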
In terms of pure science, yes, they are useless.
In real databases it's sometimes useful to keep a rarely used field in a separate table: to speed up queries using this and only this field; to avoid locks, etc.
Rather than using views to restrict access to fields, it sometimes makes sense to keep restricted fields in a separate table to which only certain users have access.
I can also think of situations where you have an OO model in which you use inheritance, and the inheritance tree has to be persisted to the DB.
For instance, you have a class Bird and Fish which both inherit from Animal.
In your DB you could have an 'Animal' table, which contains the common fields of the Animal class, and the Animal table has a one-to-one relationship with the Bird table, and a one-to-one relationship with the Fish table.
In this case, you don't have to have one Animal table which contains a lot of nullable columns to hold the Bird and Fish-properties, where all columns that contain Fish-data are set to NULL when the record represents a bird.
Instead, you have a record in the Birds-table that has a one-to-one relationship with the record in the Animal table.
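As a sketch of that layout (Python's sqlite3 module; the wingspan_cm and max_depth_m columns are made-up subtype attributes), each subtype table uses the Animal key as its own primary key, so there is at most one Bird row or Fish row per Animal row and no nullable "other subtype" columns anywhere:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only checks foreign keys when asked
conn.executescript("""
CREATE TABLE animal (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- The subtype key is both the primary key and a foreign key to animal.
CREATE TABLE bird (
    animal_id   INTEGER PRIMARY KEY REFERENCES animal(id),
    wingspan_cm REAL NOT NULL
);
CREATE TABLE fish (
    animal_id   INTEGER PRIMARY KEY REFERENCES animal(id),
    max_depth_m REAL NOT NULL
);

INSERT INTO animal VALUES (1, 'sparrow'), (2, 'trout');
INSERT INTO bird   VALUES (1, 21.5);
INSERT INTO fish   VALUES (2, 30.0);
""")

# Loading a Bird means joining the common part with the specialised part.
birds = conn.execute("""
    SELECT a.name, b.wingspan_cm
    FROM animal AS a
    JOIN bird AS b ON b.animal_id = a.id
""").fetchall()

# A second bird row for the same animal would break the one-to-one rule.
try:
    conn.execute("INSERT INTO bird VALUES (1, 25.0)")
    second_bird_row_rejected = False
except sqlite3.IntegrityError:
    second_bird_row_rejected = True
```

Nothing in this sketch stops the same animal from having both a bird row and a fish row; that would need an extra rule on top.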
1-1 relationships are also necessary if you have too much information. There is a record size limitation on each record in the table. Sometimes tables are split in two (with the most commonly queried information in the main table) just so that the record size will not be too large. Databases are also more efficient in querying if the tables are narrow.
In SQL it is impossible to enforce a 1:1 relationship between two tables that is mandatory on both sides (unless the tables are read-only). For most practical purposes a "1:1" relationship in SQL really means 1:0|1.
The inability to support mandatory cardinality in referential constraints is one of SQL's serious limitations. "Deferrable" constraints don't really count because they are just a way of saying the constraint is not enforced some of the time.
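To illustrate what "not enforced some of the time" looks like, here is a small sketch with Python's sqlite3 module and SQLite's deferrable foreign keys. The person/passport tables are hypothetical. Each side requires the other, so the only way to insert a pair is to let the database hold an invalid state in the middle of the transaction and check at COMMIT.

```python
import sqlite3

# isolation_level=None so that BEGIN/COMMIT/ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id          INTEGER PRIMARY KEY,
    passport_id INTEGER NOT NULL UNIQUE
        REFERENCES passport(id) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE passport (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL UNIQUE
        REFERENCES person(id) DEFERRABLE INITIALLY DEFERRED
);
""")

# Counts person rows whose passport does not exist.
ORPHANS = """
    SELECT COUNT(*) FROM person AS p
    WHERE NOT EXISTS (SELECT 1 FROM passport AS q WHERE q.id = p.passport_id)
"""

# Inserting a matching pair works, but only because the check waits for COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO person VALUES (1, 100)")
orphans_mid_transaction = conn.execute(ORPHANS).fetchone()[0]
conn.execute("INSERT INTO passport VALUES (100, 1)")
conn.execute("COMMIT")
orphans_after_commit = conn.execute(ORPHANS).fetchone()[0]

# A person without a passport is accepted at first and refused at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO person VALUES (2, 200)")
try:
    conn.execute("COMMIT")
    lone_row_committed = True
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    lone_row_committed = False

person_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Without the deferral neither insert could go first, which is why the usual non-deferred design ends up as 1:0|1: one table is the parent and the other may or may not have a matching row.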
It's also a way to extend a table which is already in production with less (perceived) risk than a "real" database change. Seeing a 1:1 relationship in a legacy system is often a good indicator that fields were added after the initial design.
Most of the time, designs are thought to be 1:1 until someone asks "well, why can't it be 1:many"? Divorcing the concepts from one another prematurely is done in anticipation of this common scenario. Person and Address don't bind so tightly. A lot of people have multiple addresses. And so on...
Usually two separate object spaces imply that one or both can be multiplied (x:many). If two objects were truly, truly 1:1, even philosophically, then it's more of an is-relationship. These two "objects" are actually parts of one whole object.
If you're using the data with one of the popular ORMs, you might want to break up a table into multiple tables to match your Object Hierarchy.
I have found that when I use a 1:1 relationship, it's totally for a systemic reason, not a relational reason.
For instance, I've found that putting the reserved aspects of a user in one table and the user-editable fields of the user in a different table makes it much easier to write the permission rules for those fields.
But you are correct: in theory, 1:1 relationships are completely contrived, and are almost a phenomenon. Logically, however, they make things easier for the programs and optimizations that abstract the database.
Extended information that is only needed in certain scenarios. This comes up in legacy applications and programming languages (such as RPG) where the programs are compiled over the tables (so if the table changes, you have to recompile the programs). Tag-along files can also be useful in cases where you have to worry about table size.
Most frequently it is more of a physical than logical construction. It is commonly used to vertically partition a table to take advantage of splitting I/O across physical devices or other query optimizations associated with segregating less frequently accessed data or data that needs to be kept more secure than the rest of the attributes on the same object (SSN, Salary, etc).
The only logical consideration that prescribes a 1-1 relationship is when certain attributes only apply to some of the entities. However, in most cases there is a better/more normalized way to model the data through entity extraction.
The best reason I can see for a 1:1 relationship is a SuperType SubType of database design. I created a Real Estate MLS data structure based on this model. There were five different data feeds; Residential, Commercial, MultiFamily, Hotels & Land.
I created a SuperType called property that contained data that was common to each of the five separate data feeds. This allowed for very fast "simple" searches across all datatypes.
I created five separate SubTypes that stored the unique data elements for each of the five data feeds. Each SuperType record had a 1:1 relationship to the appropriate SubType record.
If a customer wanted a detailed search, they had to select a Super-Sub type, for example PropertyResidential.
In my opinion a 1:1 relationship maps a class Inheritance on a RDBMS.
There is a table A that contains the common attributes, i.e. the parent class state.
Each inherited class is mapped on the RDBMS to a table B with a 1:1 relationship to table A, containing the specialized attributes.
Table A also contains a "type" field that represents the "casting" functionality.
Bye
Mario
You can create a one-to-one relationship table if there is any significant performance benefit. You can put the rarely used fields into a separate table.
1:1 relationships don't really make sense if you're into normalization as anything that would be 1:1 would be kept in the same table.
In the real world, though, it's often different. You may want to break your data up to match your application's interface.
Possibly if you have some kind of typed objects in your database.
Say in a table, T1, you have the columns C1, C2, C3… with a one to one relation. It's OK, it's in normalized form. Now say in a table T2, you have columns C1, C2, C3, … (the names may differ, but say the types and the role is the same) with a one to one relation too. It's OK for T2 for the same reasons as with T1.
In this case, however, I see a fit for a separate table T3 holding C1, C2, C3…, with a one-to-one relation from T1 to T3 and from T2 to T3. I see an even better fit if another table already exists that holds C1, C2, C3… in a one-to-many relation, say from table A to multiple rows in table B. Then, instead of T3, you use B, and have a one-to-one relation from T1 to B, the same from T2 to B, and still the same one-to-many relation from A to B.
I believe normalization does not agree with this, and it may be an idea outside of normalization: identifying object types and moving objects of the same type to their own storage pool, using a one-to-one relation from some tables and a one-to-many relation from some other tables.
It is not strictly necessary, but it is useful for security purposes, although there are better ways to perform security checks. Imagine you create a key that can only open one door. If the key can open any other door, you should ring the alarm. In essence, you can have a "CitizenTable" and a "VotingTable". Citizen One votes for Candidate One, and the vote is stored in the voting table. If Citizen One appears in the voting table again, then there should be an alarm. Be advised, this is a one-to-one relationship because we are not referring to the candidate field; we are referring to the voting table and the citizen table.
Example:
Citizen Table
id = 1, citizen_name = "EvryBod"
id = 2, citizen_name = "Lesly"
id = 3, citizen_name = "Wasserman"
Candidate Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
Then, if we see the voting table as so:
Voting Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
id = 4, citizen_id = 3, candidate_name = "Hill Arry"
id = 5, citizen_id = 3, candidate_name = "Hill Arry"
We could say that citizen number 3 is a liar pants on fire who cheated Bern Nie. Just an example.
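One way to get that alarm from the database itself, sketched with Python's sqlite3 module: a UNIQUE constraint on citizen_id lets each citizen appear in the voting table at most once, so the repeated rows are refused. The table is called vote here and the schema is an assumption layered on the example data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citizen (
    id           INTEGER PRIMARY KEY,
    citizen_name TEXT NOT NULL
);
CREATE TABLE vote (
    id             INTEGER PRIMARY KEY,
    citizen_id     INTEGER NOT NULL UNIQUE REFERENCES citizen(id),
    candidate_name TEXT NOT NULL
);
INSERT INTO citizen VALUES (1, 'EvryBod'), (2, 'Lesly'), (3, 'Wasserman');
""")

# The same five attempts as in the voting table above.
attempts = [
    (1, "Bern Nie"),
    (2, "Bern Nie"),
    (3, "Hill Arry"),
    (3, "Hill Arry"),
    (3, "Hill Arry"),
]

alarms = []  # citizen ids whose extra votes were refused
for citizen_id, candidate_name in attempts:
    try:
        conn.execute(
            "INSERT INTO vote (citizen_id, candidate_name) VALUES (?, ?)",
            (citizen_id, candidate_name),
        )
    except sqlite3.IntegrityError:
        alarms.append(citizen_id)

votes_stored = conn.execute("SELECT COUNT(*) FROM vote").fetchone()[0]
```

Citizen 3's second and third attempts raise the error, and only one vote per citizen is stored.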
When you are dealing with a database from a third-party product, you probably don't want to alter their database, so as to prevent tight coupling, but you may have data that corresponds 1:1 with their data.
Anywhere where two entirely independent entities share a one-to-one relationship. There must be lots of examples:
person <-> dentist (it's 1:N, so it's wrong!)
person <-> doctor (it's 1:N, so it's also wrong!)
person <-> spouse (it's 1:0|1, so it's mostly wrong!)
EDIT: Yes, those were pretty bad examples, particularly if I was always looking for a 1:1, not a 0 or 1 on either side. I guess my brain was mis-firing :-)
So, I'll try again. It turns out, after a bit of thought, that the only way you can have two separate entities that must (as far as the software goes) be together all of the time is for them to exist together in a higher categorization. Then, if and only if you fall into a lower decomposition, the things are and should be separate, but at the higher level they can't live without each other. Context, then, is the key.
For a medical database you may want to store different information about specific regions of the body, keeping them as a separate entity. In that case, a patient has just one head, and they need to have it, or they are not a patient. (They also have one heart, and a number of other necessary single organs). If you're interested in tracking surgeries for example, then each region should be a unique separate entity.
In a production/inventory system, if you're tracking the assembly of vehicles, then you certainly want to watch the engine progress differently from the car body, yet there is a one-to-one relationship. A car must have an engine, and only one (or it wouldn't be a 'car' anymore). An engine belongs to only one car.
In each case you could produce the separate entities as one big record, but given the level of decomposition, that would be wrong. They are, in these specific contexts, truly independent entities, although they might not appear so at a higher level.
Paul.