LDAP: is it possible to populate an attribute value with a value from another entry?

I am refactoring our LDAP directory, and all of our "people" users work in a certain building, but those buildings, in addition to having different postal addresses, also belong to different organizations, have physical coordinates, have a postal code, and a CEDEX postal code.
Instead of repeating each building's postal address in every 'person' entry, would there be a simpler way to link an attribute in a 'person' entry to the entry/DN of that building, and extract one of the latter's attributes (inherited?) in a 'person' query, without doing two (or more) lookups?
PS: I would like to avoid using the (obsolete, or soon to be) 'pilotOrganization' object class.

Related

Do two tables with the same content break data normalization?

[Assuming there is a one to many relationship between an individual and an address, and assuming there is a one to many relationship between an agency and an address.]
Given the following table structure:
Wouldn't you want to merge the two address tables together and instead of using a foreign key within each one use a tie table?
Like this:
Are they both valid for normalization or only one?
Depends what you want to do.
In your second example with the tie tables, if I want to do a mailshot to my customers then my query has to go out to the agency tie table to exclude any agency addresses.
Of course you could have an address type column to differentiate but then you have a more complex query for your insert statement.
So although "address" is a global idea, sometimes it is easier to have it segregated by context.
Secondly, your customer data would usually be changing much more than your agency data. There may also be organisational and legal requirements around storage of personal data that make it better to separate the two.
e.g. in a health records system I want to be able to easily extract / restrict client data and to keep my configuration or commissioning data separate.
Thus in all the client systems I have used, the model tends to be the first one you describe rather than the second.

How best to normalize and reference (FK) locations (Neighborhood/City/Region/Country/Continent)

So I have searched around but haven't found a satisfactory answer.
I have different types of locations, as stated in the title. Given a type of location (i.e. city), the less granular locations can be inferred. I.e. if you know you're in Oregon, it implies you're in the United States, which implies you're in North America.
We have Objects that reference locations, but the granularity is not all the same. Some items might point to neighborhoods, others are only known down to the city level, while some are only known to a region, etc.
There were two ways in which I thought of organizing the data, this is the way I am leaning towards:
Have a generic "Locations" table, with a location "type" and a "parent location" referencing itself. So there'd be an entry for United States of type country, and an entry for Oregon type state which references United States.
i.e.
You can then have the object reference the location off its primary key, and then other locations can be inferred. Does this make sense or is there a better way I could be organizing the data?
The other way I considered was with a different table for each location "type" but then the problem is having our objects referencing it, since the most granular type of location for an object isn't always the same.
If I were to slip other location types in later, for example counties in between Cities and Regions, might this present a problem? I'm thinking it would be no more a problem than with separate tables, but perhaps there's a better way I can keep track of things in a logical way.
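The self-referencing design described in the question can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table name `locations`, the column names, and the sample rows are assumptions made for the example, not part of the original post. A recursive query walks from any location up to its root, so the coarser locations can be inferred from a single reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    type      TEXT NOT NULL,                      -- e.g. 'city', 'region', 'country'
    parent_id INTEGER REFERENCES locations(id)    -- NULL for the top level
);
INSERT INTO locations VALUES
    (1, 'North America', 'continent', NULL),
    (2, 'United States', 'country',   1),
    (3, 'Oregon',        'region',    2),
    (4, 'Portland',      'city',      3);
""")

def ancestry(conn, location_id):
    """Return (type, name) pairs from the given location up to the root."""
    return conn.execute("""
        WITH RECURSIVE chain(id, name, type, parent_id, depth) AS (
            SELECT id, name, type, parent_id, 0
            FROM locations WHERE id = ?
            UNION ALL
            SELECT l.id, l.name, l.type, l.parent_id, c.depth + 1
            FROM locations l JOIN chain c ON l.id = c.parent_id
        )
        SELECT type, name FROM chain ORDER BY depth
    """, (location_id,)).fetchall()

print(ancestry(conn, 3))
```

An object that only knows "Oregon" stores location id 3 and still resolves to its country and continent. Slipping a new level (say, counties) in later only means inserting rows and re-pointing `parent_id` values; the query itself does not change.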
This is a case of subclasses, often called subtypes. It's complicated by the fact that some subtypes are contained in other subtypes. The container issue is well handled by classical elementary relational database design.
The subclass issue requires a little explanation. What OOP calls "subclasses" goes by the name "ER Specialization" in ER modeling circles. This tells you how to diagram subclasses, but it doesn't tell you how to implement them.
It's worth mentioning two techniques for implementing subclasses in SQL tables. The first goes by the name "Single Table Inheritance". The second goes by the name "Class Table Inheritance". In class table inheritance, you will have one generic table for "locations" with all the attributes that are common to all locations, regardless of type. In the "Cities" table you will have attributes that pertain to cities, but not to countries, etc. You will have other subclass tables for the other types of locations.
If you go this route, you should look up another technique, called "Shared Primary Key". In this technique, the id fields of the subclass tables all contain copies of the id field from the superclass table. This requires a little effort, but it's well worth it.
Shared primary key offers several advantages. It enforces the one-to-one nature of a subclass relationship. It makes joining specialized data with generalized data simple, easy, and fast. It keeps track of which items belong in which subclass, without an extra field.
In your case, there is yet another advantage. Other tables that reference a location by using a foreign key don't have to decide whether to reference the superclass table or the subclass table. A single foreign key that references the superclass table will also implicitly reference one of the subclass tables, although it isn't obvious which one.
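As a rough sketch of class table inheritance with a shared primary key, here is one way it might look in SQLite via Python's sqlite3 module. The subclass columns (`timezone`, `iso_code`), the `objects` table, and the sample rows are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
-- Superclass table: attributes common to every location.
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Subclass tables: the primary key is also a foreign key to locations.id.
CREATE TABLE cities (
    id       INTEGER PRIMARY KEY REFERENCES locations(id),
    timezone TEXT
);
CREATE TABLE countries (
    id       INTEGER PRIMARY KEY REFERENCES locations(id),
    iso_code TEXT
);

-- Referencing tables only ever point at the superclass table.
CREATE TABLE objects (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    location_id INTEGER NOT NULL REFERENCES locations(id)
);

INSERT INTO locations VALUES (1, 'United States'), (2, 'Portland');
INSERT INTO countries VALUES (1, 'US');
INSERT INTO cities    VALUES (2, 'America/Los_Angeles');
INSERT INTO objects   VALUES (10, 'warehouse', 2), (11, 'head office', 1);
""")

# The subtype is discovered by joining on the shared key; no type column is needed.
rows = conn.execute("""
    SELECT o.name, l.name,
           CASE WHEN ci.id IS NOT NULL THEN 'city'
                WHEN co.id IS NOT NULL THEN 'country' END AS subtype
    FROM objects o
    JOIN locations l       ON l.id  = o.location_id
    LEFT JOIN cities ci    ON ci.id = l.id
    LEFT JOIN countries co ON co.id = l.id
    ORDER BY o.id
""").fetchall()
print(rows)
```

Note that nothing here stops the same id from appearing in both `cities` and `countries`; if that matters, it needs an extra constraint or application-level check.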
This isn't perfect, but it's very, very good. Been there, done that.
For more information, you can google the techniques, or find relevant tags here in SO.
What about:
Countries:
Id,
Name.
Regions:
Id,
CountryId,
Name.
Cities:
Id,
RegionId,
Name.
Neighborhoods:
Id,
CityId,
Name.
This covers the location types. But the main problem in your case is that, as you say, the granularity is not all the same.
For this:
Object:
Id,
Name,
LocationId,
Type.
Good question.
You should definitely go with your first option. If you look at any data modeling patterns book, they all choose that way.
Is this North America only, or global?
Issues:
Cities/Towns/Hamlets/Villages are children of Divisions (generic term for state/province), though not in, say, England, where they are children of Country (or is it County)
Postal Areas (postal codes, zip codes) are children of Divisions too, not county or city. Some cities reside entirely in zips, and some zips reside entirely in cities
Counties are children of Division too. New York City contains counties, whereas most counties contain cities.
I would read Hay's Enterprise Model Patterns if you are hoping for a global solution. It's available cheaply on Safari.

Design Pattern required for database schema with two classes both composing a third class

Consider a system which has classes for both letters and people; both of these classes compose an address. When designing a database for the system it seems sensible to have a separate schema for the address, but this leads to an issue: I don't know how to have a foreign key in the address table cleanly identify what it belongs to, because the foreign key could identify either a letter or a person. Furthermore, I expect that further classes will be added which will also compose an address.
I would be grateful for design patterns addressing this kind of design point.
Best Regards
"I don't know how to have a foreign key in the address table cleanly identify what it belongs to because the foreign key could identify either a letter or a person."
It sounds like you have got that the wrong way around. There would be no foreign key in the address table; rather, the letters table would have a foreign key referencing the address table and the persons table would also have a foreign key referencing the address table.
In SQL, the REFERENTIAL_CONSTRAINTS view in the Information Schema catalog will tell you which tables are referencing the address table.
In our shop we regularly debate whether an address should be modelled as an entity in its own right. The problem with treating an address as an entity is that there is no reliable key beyond the attributes themselves (postal code, house name or number, etc.) and many variations can identify the same address (I can write my home address twenty different ways). Then you need to consider post office boxes, care-of addresses, Santa Claus, etc. And the Big Question: do you allow an address to be amended? If someone moves house, do they keep the same address entity with amended attributes (that way all linked entities get the address change), or do you lock down the address's attributes and force the creation of a new address entity, then relink all the referencing entities to the new one (and do you delete the now-orphaned address entity, and if yes, then why did you bother to replace it...?) But your application still needs to allow an address to be amended in case the post office changes it in real life, so how do you prevent this ability from being misused, i.e. used for the aforementioned illegal house moves? You can use a web service to scrub your address data so that it is correct, but an external system may have less clean data, and then you can't get your address entities to match anymore...?
I like to keep it simple: an address is a single attribute being the plaintext you must put on an item of mail for it to be delivered by the post office to the addressable entity in question; it is not an entity in its own right because it lacks an identifier. For practical reasons (e.g. being able to print it on an address label), this single attribute is usually split into subatomic elements (address_line_1, address_line_2, ... postal_code, whatever).
Because most SQL products lack support for domains, I have no problem duplicating the column names, data types, constraints, etc. for each table that models an addressable entity.
Surely the foreign keys should be from the Letter and People tables referencing the primary key on the Address table?
So, the Letter table contains an AddressId column referencing the Id on the Address table, as does the Person table, and any future classes which compose an address.
If Letters and Persons compose multiple addresses, then intermediate link tables will be required.
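To make the direction of the foreign keys concrete, here is a small sketch using Python's sqlite3 module. The column names and sample rows are assumptions for illustration. SQLite has no INFORMATION_SCHEMA, so the helper below uses SQLite's `PRAGMA foreign_key_list` as a stand-in for the REFERENTIAL_CONSTRAINTS lookup mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE address (
    id          INTEGER PRIMARY KEY,
    line1       TEXT NOT NULL,
    postal_code TEXT
);
-- The referencing tables carry the foreign key; address knows nothing about them.
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address_id INTEGER REFERENCES address(id)
);
CREATE TABLE letter (
    id         INTEGER PRIMARY KEY,
    subject    TEXT NOT NULL,
    address_id INTEGER REFERENCES address(id)
);
INSERT INTO address VALUES (1, '11 Whatever Street', 'Z1P C0D3');
INSERT INTO person  VALUES (1, 'Alice', 1);
INSERT INTO letter  VALUES (1, 'Invoice', 1);
""")

def referencing_tables(conn, target):
    """List tables that declare a foreign key to `target` (SQLite-specific)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    found = []
    for table in tables:
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == target:  # third column of the pragma row is the referenced table
                found.append(table)
    return found

print(referencing_tables(conn, "address"))
```

Adding a further class that composes an address later means adding one more table with its own `address_id` column; the `address` table itself stays untouched.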

What would be the best schema to store the 'address' for different entities?

Suppose we're making a system where we have to store the address for buildings, persons, cars, etc.
The address 'format' should be something like:
State (From a State list)
County (From a County List)
Street (free text, like '5th Avenue')
Number (free text, like 'Chrysler Building, Floor 10, Office No. 10')
(Yes I don't live in U.S.A)
What would be the best way to store that info:
Should I have a Person_Address, Car_Address, ...
Or the address info should be in columns on each entity,
Could we have just one address table and try to link each row to a different entity?
Or is there another 'better' way to handle this type of scenario?
How would you do it?
I have seen scenarios where the Address is stored in an Address table and then there are many-to-many link tables which store links to addresses from People - there is a separate table for each so that foreign keys can be enforced. Sometimes the link table stores information about the relationship, like primary, ship-to, etc.
I've also seen it where the address is stored in the row of a customer. This results in effectively arrays of addresses for bill-to, ship-to, etc, and it's fine that way. Having dealt with both, I think I prefer having them in their own entities, it allows you to keep history of old inactive addresses pretty easily.
We've used this same technique for phone numbers, where people need to store varying numbers of phone numbers.
I would highly recommend reading 'Data Model Patterns - Conventions of Thought' by David C. Hay. This issue is discussed in depth by the author.
What you have in your design are two broad entities.
Address of a geographical location
A person/object that resides/belongs to the address
In general, it is not a good practice to combine the address with a person or objects' details in the same table like below
Person(personID, name, gender, addressline1, addressline2)
You could have the following entities in your design
Address(number, street, countyID,stateID)
Party(PartyID, Type)
Person(PersonID, name, dob, gender,...,primaryPartyID)
Car(carID, make, model, ...,primaryPartyID)
The Party is a link between a person/car and an address. The primaryPartyID columns in the Person and Car tables are foreign keys to the Party table. This way, you can share an address between a car and a person. In the event you want to store multiple addresses for each person, you could add a separate m:n table between Person and Party. The Type attribute for Party can take values such as 'Person', 'Vehicle', etc.
I'd say to have an AddressType field that is a lookup from a Drop-Down list

Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?
Option 1. Customer has foreign key to Address
Customer (id, phys_address_id, mail_address_id)
Address (id, street, city, etc.)
Option 2. Customer has one-to-many relationship to Address, which contains a field
to describe the address type
Customer (id)
Address (id, customer_id, address_type, street, city, etc.)
Option 3. Address information is de-normalized and stored in Customer
Customer (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)
One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?
I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.
If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach
Customer (id, phys_address_id)
Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
Address (id, street, city, etc.)
One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.
In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.
Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)
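A minimal sketch of how the validity dates could be used, written with Python's sqlite3 module. The `street` column, the sample rows, and the convention that a NULL `valid_to` means "open-ended" are assumptions for the example. Dates are stored as ISO 8601 text so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (
    id           INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(id),
    address_type TEXT NOT NULL,
    street       TEXT NOT NULL,
    valid_from   TEXT NOT NULL,   -- ISO 8601 date
    valid_to     TEXT             -- NULL: no end date known yet (assumption)
);
INSERT INTO customer VALUES (1, 'Alice');
-- The second row was entered in advance of the move.
INSERT INTO address VALUES
    (1, 1, 'physical', '11 Whatever Street', '2020-01-01', '2024-06-30'),
    (2, 1, 'physical', '5 Elsewhere Road',   '2024-07-01', NULL);
""")

def address_on(conn, customer_id, address_type, day):
    """Return the street valid for the customer on `day`, or None."""
    row = conn.execute("""
        SELECT street FROM address
        WHERE customer_id = ? AND address_type = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to >= ?)
    """, (customer_id, address_type, day, day)).fetchone()
    return row[0] if row else None

print(address_on(conn, 1, "physical", "2024-06-30"))
print(address_on(conn, 1, "physical", "2024-07-01"))
```

The switch happens automatically on the changeover date because the query, not the data, decides which row is current. Nothing in this sketch prevents overlapping date ranges for the same type; that check would have to be added separately.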
On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then (1) is probably simpler to manage: it's much easier to maintain data integrity because there are no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.
So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.
We are moving forward with a model like this:
Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)
Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.
The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.
We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).
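The connecting-table model might be sketched like this with Python's sqlite3 module. The snake_case names, the split of activeDates into `active_from`/`active_to`, the composite primary key, and the sample rows are all assumptions made for the illustration; the culture-specific columns are omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person       (id INTEGER PRIMARY KEY, given_name TEXT, family_name TEXT);
CREATE TABLE address      (id INTEGER PRIMARY KEY, line1 TEXT, city TEXT);
CREATE TABLE address_type (id INTEGER PRIMARY KEY, descriptive_name TEXT UNIQUE);

-- The link row carries the context: which type, and when it was active.
CREATE TABLE person_address (
    person_id       INTEGER NOT NULL REFERENCES person(id),
    address_id      INTEGER NOT NULL REFERENCES address(id),
    address_type_id INTEGER NOT NULL REFERENCES address_type(id),
    active_from     TEXT NOT NULL,
    active_to       TEXT,            -- NULL while the link is still active
    PRIMARY KEY (person_id, address_id, address_type_id, active_from)
);

INSERT INTO person       VALUES (1, 'Alice', 'Example');
INSERT INTO address      VALUES (1, '11 Whatever Street', 'Whatever City'),
                                (2, 'PO Box 7', 'Whatever City');
INSERT INTO address_type VALUES (1, 'physical'), (2, 'mailing');
INSERT INTO person_address VALUES
    (1, 1, 1, '2020-01-01', NULL),
    (1, 2, 2, '2021-03-01', NULL);
""")

def current_addresses(conn, person_id):
    """Return (type, line1) for every link that has not been closed."""
    return conn.execute("""
        SELECT t.descriptive_name, a.line1
        FROM person_address pa
        JOIN address a      ON a.id = pa.address_id
        JOIN address_type t ON t.id = pa.address_type_id
        WHERE pa.person_id = ? AND pa.active_to IS NULL
        ORDER BY t.descriptive_name
    """, (person_id,)).fetchall()

print(current_addresses(conn, 1))
```

Linking addresses to organizations would follow the same shape with a second connecting table, leaving `person` and `address` unchanged.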
The second option would probably be the way I would go. It would also let users add additional addresses (if you wanted to let them do that) that they could switch between at will for shipping and such.
I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.
When answering those kinds of questions I like to use the classifications of DDD. If it's an Entity it should have a separate ID; if it's a value object it should not.
Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema.
Option 2 is clearly the most flexible and therefore the best choice.
In most code I write nowadays, every customer has one and only one physical location: the legal entity that is our business partner. Therefore I put street, city, etc. in the customer object/table. Often this is the simplest possible thing that works, and it does work.
When an additional mailing address is needed, I put it in a separate object/table so as not to clutter the customer object too much.
Earlier in my career I normalized like mad having an order referencing a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural since a customer might change his (default?) address, but the address of a shipment sent in 2007 should always stay the same - even if the customer moves in 2008.
We currently implement the VerySimpleAddressProtocol in our project to standardize the fields used.
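The point about shipments keeping their address can be shown with a short sketch using Python's sqlite3 module. The table layout, the `place_order` helper, and the sample data are hypothetical, chosen only to illustrate copying the address into the order at the time it is placed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    street TEXT NOT NULL,
    city   TEXT NOT NULL
);
-- The order stores its own copy of the shipping address.
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    ship_street TEXT NOT NULL,
    ship_city   TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Alice', '11 Whatever Street', 'Whatever City');
""")

def place_order(conn, order_id, customer_id):
    """Create an order, snapshotting the customer's address as it is right now."""
    conn.execute("""
        INSERT INTO orders (id, customer_id, ship_street, ship_city)
        SELECT ?, id, street, city FROM customer WHERE id = ?
    """, (order_id, customer_id))

place_order(conn, 100, 1)

# The customer moves afterwards; the old order is unaffected.
conn.execute("UPDATE customer SET street = '5 Elsewhere Road' WHERE id = 1")

shipped_to = conn.execute(
    "SELECT ship_street FROM orders WHERE id = 100").fetchone()[0]
print(shipped_to)
```

With a fully normalized order -> customer -> address chain, the same update would silently rewrite where the earlier shipment appears to have gone, unless the address rows are versioned.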
I'd go for the first option. In these situations I'm very mindful of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas, many years old, that have one-to-many tables "just in case". If you only need two, just use the first option; if the requirement changes in the future, change it then.
Like in many cases: It depends.
If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.
In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.
Important: Denormalize only if you encounter performance issues.
I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:
Customer (id, phys_address_id, mail_address_id)
Address (id, customer_id, start_dt, end_dt, street, city, etc.)
If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.
That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.
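Here is a rough sketch of the end-date-and-repoint step, using Python's sqlite3 module. The helper name `change_physical_address` and the sample rows are hypothetical. One extra assumption is labelled in the code: when the mailing pointer was sharing the old physical record, it is moved to the new record as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id              INTEGER PRIMARY KEY,
    phys_address_id INTEGER,
    mail_address_id INTEGER
);
CREATE TABLE address (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    start_dt    TEXT NOT NULL,
    end_dt      TEXT,             -- NULL while the address is current
    street      TEXT NOT NULL,
    city        TEXT NOT NULL
);
INSERT INTO address  VALUES (1, 1, '2020-01-01', NULL, '11 Whatever Street', 'Whatever City');
INSERT INTO customer VALUES (1, 1, 1);   -- physical and mailing share one record
""")

def change_physical_address(conn, customer_id, street, city, day):
    """End-date the current physical address and point the customer at a new one."""
    with conn:  # one transaction: commit on success, roll back on error
        old_id = conn.execute(
            "SELECT phys_address_id FROM customer WHERE id = ?",
            (customer_id,)).fetchone()[0]
        conn.execute("UPDATE address SET end_dt = ? WHERE id = ?", (day, old_id))
        new_id = conn.execute(
            "INSERT INTO address (customer_id, start_dt, street, city) "
            "VALUES (?, ?, ?, ?)",
            (customer_id, day, street, city)).lastrowid
        conn.execute("UPDATE customer SET phys_address_id = ? WHERE id = ?",
                     (new_id, customer_id))
        # Assumption: a mailing pointer that shared the old record follows the move.
        conn.execute(
            "UPDATE customer SET mail_address_id = ? "
            "WHERE id = ? AND mail_address_id = ?",
            (new_id, customer_id, old_id))
    return new_id

new_id = change_physical_address(conn, 1, "5 Elsewhere Road", "Whatever City", "2024-07-01")
history = conn.execute(
    "SELECT street, end_dt FROM address WHERE customer_id = 1 ORDER BY start_dt"
).fetchall()
print(history)
```

The old row stays in the table with its end date, so the history is a plain query on `customer_id`, while the two pointer columns keep the "current address" lookup a single join.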
Good thread. I have spent a while contemplating the most suitable schema and I have concluded that quentin-starin's solution is the best except I have added start_date and end_date fields to what would be his PersonAddress table. I have also decided to add notes, active and deleted.
deleted is for soft delete functionality as I think I do not want to lose trace of previous addresses simply by deleting the record from the junction table. I think that is quite wise and something others may want to consider. If not done this way, it could be left to revision of paper or electronic documents to try to trace address information (something best avoided).
notes I think of being something of a requirement but that might just be preference. I've spent time in backfill exercises verifying addresses in databases and some addresses can be very vague (such as rural addresses) that I think it is very useful to at least allow notes about that address to be held in the record address.
One thing I would like to hear opinions on is the unique indexing of the address table (again, referring to the table of the same name in quentin-starin's example). Do you think a unique index should be enforced (presumably as a compound index across all not-null/required fields)? This would seem sensible, but it might still be hard to stop duplicate data, as postal/zip codes are not always unique to a single property. Even if the country, province and city fields are populated from reference data (which they are in my model), spelling differences in the address lines may prevent a match. The best way to avoid this might be to run one or more DB queries against the incoming form fields to see whether a possible duplicate already exists. Another safety measure would be to give the user the option of selecting from addresses in the database already linked to that person and to use that to auto-populate. I think this might be a case where you can only be sensible and take precautions to stop duplication, but accept that it can (and probably will) happen sooner or later.
The other very important aspect of this for me is future editing of the address table records. Let's say you have 2 people both listed at:
11 Whatever Street
Whatever City
Z1P C0D3
Should it not be considered dangerous to allow the same address table record to be assigned to different entities (person, company)? Say the user then realises that one of these people actually lives at 111 Whatever Street and the record contains a typo. If you change that address, it changes for both entities. I would like to avoid that. My suggestion would be to have the model in the MVC (in my case, PHP Yii2) look for existing address records already related to that customer when a new address is being created (SELECT * FROM address INNER JOIN personaddress ON personaddress.address_id = address.id WHERE personaddress.person_id = {ID of the person currently being edited}) and give the user the option of using one of those records instead (as was essentially suggested above).
I feel linking the same address to multiple different entities is just asking for trouble, as it means either refusing later editing of the address record (impractical) or risking that a future edit corrupts data related to entities other than the one whose address record is being edited.
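The lookup described above, restricted to addresses already linked to the person being edited, could look something like this as a parameterized query. The sketch uses Python's sqlite3 module rather than Yii2, and the column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    id          INTEGER PRIMARY KEY,
    line1       TEXT NOT NULL,
    city        TEXT,
    postal_code TEXT
);
CREATE TABLE personaddress (
    person_id  INTEGER NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
-- Each person gets their own address row, even when the rows look alike.
INSERT INTO address VALUES
    (1, '11 Whatever Street',  'Whatever City', 'Z1P C0D3'),
    (2, '111 Whatever Street', 'Whatever City', 'Z1P C0D3');
INSERT INTO personaddress VALUES (1, 1), (2, 2);
""")

def existing_addresses(conn, person_id):
    """Addresses already linked to this person, to offer as choices in the form."""
    return conn.execute("""
        SELECT address.id, address.line1
        FROM address
        INNER JOIN personaddress ON personaddress.address_id = address.id
        WHERE personaddress.person_id = ?
        ORDER BY address.id
    """, (person_id,)).fetchall()

print(existing_addresses(conn, 1))
```

Because the candidates are limited to the person's own rows, correcting a typo in one of them cannot change the address shown for anyone else.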
I would love to hear people's thoughts.