SQL: one foreign key for different tables

First of all, I do realize that there are plenty of questions regarding this problem and there is a good chance that I already know one of the answers, but after I've tried a couple of them, none of them worked for me.
In my application I am handling communication between users and customers. Both users and customers can have multiple e-mail addresses, and both can be assigned to multiple conversations at the same time. After a user or customer sends an e-mail message, the system receives information about it and creates a ConversationMessage with basic info about the message and an EmailMessage with specific data (subject, FROM, TO etc.).
Right now, fields FROM and TO are just varchar fields with e-mail addresses. I would like to change it so that FROM and TO fields are foreign keys. I was planning to create an EmailAddresses table that would store basic info about the address and UserEmailAddresses and ContactPersonEmailAddresses tables with specific info about address (like host, port etc.), but I realized that EmailAddresses table would contain just an Id. Is it a good approach? Am I missing something?
Are there any better solutions to this kind of problem?
Thank you for any help!

There are numerous issues with the design, but let's limit the discussion to the email addresses. You store the addresses in two tables. This, unsurprisingly, is causing problems.
An email address is an entity. Store them in a table. Anywhere a single address is needed, place a FK to the one address table there. Where multiple addresses are or may be needed, place an intersection table there.
create table EmailAddresses(
ID int auto_generating primary key,
Addr varchar not null unique,
Active bool
);
Are such fields as Login, Port, etc. really attributes of an email address? Maybe you could clarify those if you would.
Then the tables UserEmailAddresses and ContactEmailAddresses would be intersection tables and look something like this:
create table [User|Contact]EmailAddresses(
UserID int not null references [Users|Contacts]( ID ),
AddrID int not null references EmailAddresses( ID )
);
As an email message can only have one "From" value, that would be in the EmailMessages row as it is now except it can now be a FK to the one address table. A message may have one or many "To" values so that would be implemented also as an intersection table:
create table EmailTo(
EmailID int not null references EmailMessages( ID ),
ToID int not null references EmailAddresses( ID ),
ToType enum( To, CC, BCC )
);
There are probably other requirements that would require some constraints on any or all of the tables above, but those depend on your usage. For example, even though email applications allow the same address to exist more than once in a To list, you may want to catch and restrict such occurrences. This would be implemented by a unique constraint on (EmailID, ToID).
Possible extensions would be the addition of email lists which themselves contain a group of addresses. That would require the EmailAddresses table to be split, but because all addresses are now located in one table, such a redesign would certainly be easier than having them located in two tables.
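The design in this answer can be sketched end to end as follows. This is only an illustration, run through Python's built-in sqlite3 module so it can be tried without a server. The Name, Subject and FromID columns, the composite primary key on the intersection table, and the CHECK constraint standing in for the enum are assumptions, not part of the original answer.

```python
import sqlite3

# Hypothetical SQLite rendering of the tables described above.
SCHEMA = """
CREATE TABLE EmailAddresses (
    ID     INTEGER PRIMARY KEY,            -- auto-generated by SQLite
    Addr   TEXT NOT NULL UNIQUE,
    Active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE UserEmailAddresses (          -- intersection table
    UserID INTEGER NOT NULL REFERENCES Users(ID),
    AddrID INTEGER NOT NULL REFERENCES EmailAddresses(ID),
    PRIMARY KEY (UserID, AddrID)
);
CREATE TABLE EmailMessages (
    ID      INTEGER PRIMARY KEY,
    Subject TEXT,
    FromID  INTEGER NOT NULL REFERENCES EmailAddresses(ID)  -- exactly one sender
);
CREATE TABLE EmailTo (                     -- one row per recipient
    EmailID INTEGER NOT NULL REFERENCES EmailMessages(ID),
    ToID    INTEGER NOT NULL REFERENCES EmailAddresses(ID),
    ToType  TEXT NOT NULL CHECK (ToType IN ('To', 'CC', 'BCC')),
    UNIQUE (EmailID, ToID)                 -- same address only once per message
);
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn

def duplicate_recipient_rejected():
    conn = build()
    conn.execute("INSERT INTO EmailAddresses (ID, Addr) VALUES (1, 'a@example.com'), (2, 'b@example.com')")
    conn.execute("INSERT INTO EmailMessages (ID, Subject, FromID) VALUES (1, 'Hello', 1)")
    conn.execute("INSERT INTO EmailTo VALUES (1, 2, 'To')")
    try:
        conn.execute("INSERT INTO EmailTo VALUES (1, 2, 'CC')")  # same address again
    except sqlite3.IntegrityError:
        return True
    return False
```

A ContactEmailAddresses table would follow the same pattern as UserEmailAddresses.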

Related

Can multiple relationship occur between two entities?

I have following two entities, where a person address is kept in a separate table because a person can have multiple addresses.
The mailing address however is one of the multiple addresses stored in address table, which is to be referred in person table.
Can following relationship exist?
The answer to your direct question is "Yes". However, you cannot declare the model only using create table because:
Declaring the foreign key address.person_id requires that the person table already exist.
Declaring the foreign key person.mailing_address requires that the address table already exist.
Hence, to implement the model, you need to use alter table to add one or both of the constraints after both tables are created.
Is this the model you want? One feature of an address is that multiple people can have the same address. Your model does not allow that. To handle this, you would typically have three tables:
Person
Address
PersonAddress
The third table has one row for each person/address pair. It can also have other information such as:
Type ("mailing" versus other types)
Effective and end dates.
Perhaps other information.
If you want to guarantee uniqueness of the "mailing" address in such a model, many databases support filtered unique indexes, to ensure there are no duplicate mailing addresses.
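As a rough sketch of the three-table model with a filtered unique index, here is a version using SQLite's partial indexes through Python's sqlite3 module. The column names (name, line1, type) and the index name are assumptions for illustration. Other databases spell filtered indexes differently.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE Person  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE Address (id INTEGER PRIMARY KEY, line1 TEXT NOT NULL);
    CREATE TABLE PersonAddress (
        person_id  INTEGER NOT NULL REFERENCES Person(id),
        address_id INTEGER NOT NULL REFERENCES Address(id),
        type       TEXT NOT NULL,              -- e.g. 'mailing', 'billing'
        PRIMARY KEY (person_id, address_id, type)
    );
    -- Filtered (partial) unique index: at most one 'mailing' row per person.
    CREATE UNIQUE INDEX one_mailing_per_person
        ON PersonAddress (person_id) WHERE type = 'mailing';
    """)
    return conn

def second_mailing_rejected():
    conn = build()
    conn.execute("INSERT INTO Person VALUES (1, 'Ann')")
    conn.execute("INSERT INTO Address VALUES (10, '1 Main St'), (11, '2 Side St')")
    conn.execute("INSERT INTO PersonAddress VALUES (1, 10, 'mailing')")
    conn.execute("INSERT INTO PersonAddress VALUES (1, 11, 'billing')")  # allowed
    try:
        conn.execute("INSERT INTO PersonAddress VALUES (1, 11, 'mailing')")
    except sqlite3.IntegrityError:
        return True
    return False
```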
I'm not sure why you have person_id as an fk on address but that doesn't look correct. There are lots of correct ways to model this and the best one for you will depend on your particular circumstances - but a couple of options are:
If you know all the types of addresses there can be then add multiple address fk fields to the person e.g. billing address, shipping address, etc. This makes querying quick and simple but is inflexible: adding a new address type in the future is relatively complex to implement.
Add an intersection table with fks for person and address and an address type field. This has a slight overhead when it comes to querying but has the advantage of being very flexible: adding a new address type is trivial.

How do I create a table model in SQL?

Let's say I've 3 tables: enterprise, users, houses.
Everyone needs an address. What should I do?
Create a table address for everyone?
Create one unique table address?
Assuming that the first is "more" correct (cache, fragmentation, size, ...), how should I write it in plain SQL?
How do I create a "model" of the table address, and use it in custom tables (e.g. enterprise_address)?
* So when I change one field in the model, it gets replicated.
It appears you need every user to have an address. You have multiple ways to solve it.
Handle it in UI
Your users table will hold just information about users (no address info here)
Create addresses table with addressid as primary key
Create many-to-many table/junction table called user_addresses that brings userid and addressid together
Ensure that your UI mandates address for each user
Tables
create table addresses (addressid int not null PK, line1, line2, city ...);
create table users (userid int not null PK, username, ...);
create table user_addresses (user_addressid int, userid int, addressid int, UK userid + addressid, FK to users, FK to addresses);
This allows a user to have multiple addresses. You have the flexibility of marking which one is primary.
Handle it in DB (and UI)
create table addresses (addressid int not null PK, ....);
create table users (userid int not null PK, username varchar(20), ..., addressid int not null, FK to addresses);
This allows one address per user.
and handle it also on the UI to ensure that address is supplied. Enter address into addresses first and then into users (make it an atomic process; use transactions!)
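The "address first, then user, in one transaction" step might look like the sketch below, written against SQLite via Python's sqlite3 module. The line1 column and the add_user helper are hypothetical. The point is only that a failure in the second insert rolls back the first.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE addresses (addressid INTEGER PRIMARY KEY, line1 TEXT NOT NULL);
    CREATE TABLE users (
        userid    INTEGER PRIMARY KEY,
        username  TEXT NOT NULL,
        addressid INTEGER NOT NULL REFERENCES addresses(addressid)
    );
    """)
    return conn

def add_user(conn, username, line1):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so an address row is never left behind without its user.
    with conn:
        cur = conn.execute("INSERT INTO addresses (line1) VALUES (?)", (line1,))
        conn.execute(
            "INSERT INTO users (username, addressid) VALUES (?, ?)",
            (username, cur.lastrowid),
        )
```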
You can start with one of these methods and analyze what your business needs are. Work with your DBA, if you have one, to see what their experience says about the business and previous DB designs similar to this.
Just from the three given tables I would assume that address is something you would want to put in either Houses or Users.
Depending on how much detail you want to keep track of in the Address it could be worth making a separate table, but there are a number of ways to go about this and the correct method depends solely on what you are trying to achieve.
Without knowing any of the required information all I can really advise is that you should aim to store the addresses as unique values. Once you have this you can assign each address to any other table you wish using a Foreign Key.
If you want to store multiple fields for the address then you will require a separate table; if a single field will do then it could just as easily be added to the person or houses table.
One thing that would make a major difference to which way you would want to do this is the relationship between address and your other entities. For example, if a user can have many addresses then you can't make it a part of the user table, and if an address can have multiple users assigned to it then it can't use a FK to represent its relationship to the users table.
One way you can do this is to make a relational table. So you make addresses and users separate tables, then have another table with just an id (pk) and 2 fks linking it to each of the other tables. This way you can have a many-to-many relationship if you wish.
First question: how many addresses per enterprise/user/house? If there are multiple addresses per entity, that leads you to want a separate address table, possibly with an address type.
Examples of multiple address situations:
Multiple locations
Ship-to addresses
Physical vs. PO Box
If you are going to assert that an entity can only have one address, then you probably prefer making the address part of the entity row, for simplicity. However, there can be reasons not to do that; for example, table width (number of columns in the table).
If you have multiple addresses per entity, typically you will use a foreign key. The entity should have an abstract key: some kind of ID. This will either be an integer or, in SQL Server, it could be a uniqueidentifier. The entity ID will be part of the key for the address.
Ok, I found the answer to postgresql: use INHERITS
e.g. CREATE TABLE name () INHERITS (table_model)
Thank you all for the answers.

One-To-One reference side

I'm not sure if I named this question right.
Imagine that there are two tables: one that represents Users and another that represents Addresses.
One user can have only one address:
CREATE TABLE User (
id INT NOT NULL,
full_name VARCHAR(255),
PRIMARY KEY(id)
);
CREATE TABLE Address (
id INT AUTO_INCREMENT NOT NULL,
address_line_1 VARCHAR(255),
address_line_2 VARCHAR(255),
PRIMARY KEY(id)
);
The question is: where is the right place to put foreign key reference to? Should I put a reference column into the User table by creating "address_id" column that references to Address.id? Or should I put user_id column into the Address table that references to User.id?
When you have a 1-1 association, you need to ask yourself why it is there in the first place. Most of the time it indicates that the design is incomplete or inaccurate. Sometimes 1-1 is correct, don't get me wrong, but you have to justify its existence.
Having said that, in your case, chances are you will be accessing the user as the main table. Your end-users may be looking for user profiles by userid, then drilling down or navigating to more profile information such as the address. If this is the case, the FK would belong to the address table. However, if your business is such that your end-users will primarily be browsing address information and then drilling down to user information, then the FK would belong to the user table.
Another thing you need to complete here is whether the FK is mandatory or optional (Not Null or Nullable) since this would affect RI. What happens if I delete a user? Should the corresponding address go away?
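A sketch of the "FK in the address table" variant follows, assuming SQLite (via Python's sqlite3 module) and assuming the answer to the deletion question is "yes, the address goes away". The user_id column is hypothetical. NOT NULL makes the FK mandatory, UNIQUE keeps the relationship one-to-one, and ON DELETE CASCADE removes the address with its user.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
    conn.executescript("""
    CREATE TABLE User (
        id        INTEGER PRIMARY KEY,
        full_name TEXT
    );
    CREATE TABLE Address (
        id             INTEGER PRIMARY KEY,
        address_line_1 TEXT,
        address_line_2 TEXT,
        -- mandatory, at most one address per user, deleted together with the user
        user_id INTEGER NOT NULL UNIQUE REFERENCES User(id) ON DELETE CASCADE
    );
    """)
    return conn

def demo():
    conn = build()
    conn.execute("INSERT INTO User VALUES (1, 'Ann Example')")
    conn.execute("INSERT INTO Address (address_line_1, user_id) VALUES ('1 Main St', 1)")
    try:
        conn.execute("INSERT INTO Address (address_line_1, user_id) VALUES ('2 Side St', 1)")
        second_rejected = False
    except sqlite3.IntegrityError:
        second_rejected = True          # UNIQUE(user_id) blocks a second address
    conn.execute("DELETE FROM User WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
    return second_rejected, remaining
```

A nullable user_id with ON DELETE SET NULL would be the optional-FK counterpart. Which one is right depends on the business rule.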
Here there is a chance of several users sharing the same address, hence placing the foreign key in the address table would be my choice.

Design Pattern required for database schema with two classes both composing a third class

Consider a system which has classes for both letters and people; both of these classes compose an address. When designing a database for the system it seems sensible to have a separate schema for the address, but this leads to an issue: I don't know how to have a foreign key in the address table cleanly identify what it belongs to because the foreign key could identify either a letter or a person. Furthermore, I expect that further classes will be added which will also compose an address.
I would be grateful for design patterns addressing this kind of design point.
Best Regards
"I don't know how to have a foreign key in the address table cleanly identify what it belongs to because the foreign key could identify either a letter or a person."
It sounds like you have got that the wrong way around. There would be no foreign key in the address table; rather, the letters table would have a foreign key referencing the address table and the persons table would also have a foreign key referencing the address table.
In SQL, the REFERENTIAL_CONSTRAINTS view in the Information Schema catalog will tell you which tables are referencing the address table.
In our shop we regularly debate whether an address should be modelled as an entity in its own right. The problem with treating an address as an entity is that there is no reliable key beyond the attributes themselves (postal code, house name or number, etc) and many variations can identify the same address (I can write my home address twenty different ways). Then you need to consider post office boxes, care of addresses, Santa Claus, etc. And the Big Question: do you allow an address to be amended? If someone moves house, do they keep the same address entity with amended attributes (that way all linked entities get the address change) or do you lock down addresses' attributes and force the creation of a new address entity then relink all the referencing addresses to the new one (and do you delete the now-orphaned address entity and if yes then why did you bother to replace it...?) But your application still needs to allow an address to be amended in case the post office changes it in real life, so how do you prevent this ability from being misused, i.e. using it for the aforementioned illegal house moves? You can use a webservice to scrub your address data so that it is correct, but an external system may have less clean data and therefore you can't get your address entities to match anymore...?
I like to keep it simple: an address is a single attribute being the plaintext you must put on an item of mail for it to be delivered by the post office to the addressable entity in question; it is not an entity in its own right because it lacks an identifier. For practical reasons (e.g. being able to print it on an address label), this single attribute is usually split into subatomic elements (address_line_1, address_line_2, ... postal_code, whatever).
Because most SQL products lack support for domains, I have no problem duplicating the column names, data types, constraints, etc. for each table that models an addressable entity.
Surely the foreign keys should be from the Letter and People tables referencing the primary key on the Address table?
So, the Letter table contains an AddressId column referencing the Id on the Address table, as does the Person table, and any future classes which compose an address.
If Letters and Persons compose multiple addresses, then intermediate link tables will be required.
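A small sketch of this arrangement: Letter and Person each carry an AddressId referencing Address. SQLite does not expose the Information Schema REFERENTIAL_CONSTRAINTS view, so the helper below uses SQLite's PRAGMA foreign_key_list as a stand-in to list which tables reference Address. The Line1 column and the referencing_tables helper are assumptions for illustration.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE Address (Id INTEGER PRIMARY KEY, Line1 TEXT NOT NULL);
    CREATE TABLE Letter  (Id INTEGER PRIMARY KEY,
                          AddressId INTEGER NOT NULL REFERENCES Address(Id));
    CREATE TABLE Person  (Id INTEGER PRIMARY KEY,
                          AddressId INTEGER NOT NULL REFERENCES Address(Id));
    """)
    return conn

def referencing_tables(conn, target):
    """Return sorted (table, column) pairs whose foreign key points at `target`."""
    tables = [row[0] for row in
              conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    found = []
    for table in tables:
        # Table names come from sqlite_master, not from user input.
        # Each row: (id, seq, referenced_table, from_col, to_col, ...)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            if fk[2] == target:
                found.append((table, fk[3]))
    return sorted(found)
```

Adding another class that composes an address then only means adding another table with its own AddressId column. The Address table itself does not change.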

Use email address as primary key?

Is an email address a bad candidate for a primary key when compared to auto-incrementing numbers?
Our web application needs the email address to be unique in the system. So, I thought of using email address as primary key. However my colleague suggests that string comparison will be slower than integer comparison.
Is it a valid reason to not use email as primary key?
We are using PostgreSQL.
String comparison is slower than int comparison. However, this does not matter if you simply retrieve a user from the database using the e-mail address. It does matter if you have complex queries with multiple joins.
If you store information about users in multiple tables, the foreign keys to the users table will be the e-mail address. That means that you store the e-mail address multiple times.
I will also point out that email is a bad choice for a unique field: there are people and even small businesses that share an email address. And like phone numbers, emails can get re-used. Jsmith@somecompany.com can easily belong to John Smith one year and Julia Smith two years later.
Another problem with emails is that they change frequently. If you are joining to other tables with that as the key, then you will have to update the other tables as well which can be quite a performance hit when an entire client company changes their emails (which I have seen happen.)
The primary key should be unique and constant.
Email addresses change like the seasons. Useful as a secondary key for lookup, but a poor choice for the primary key.
Disadvantages of using an email address as a primary key:
1. Slower when doing joins.
2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)
3. An email address could change, which forces all records using this as a foreign key to be updated. As email addresses don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.
Advantages of using email address as a primary key:
You may be able to completely eliminate some joins. If all you need from the "master record" is the email address, then with an abstract integer key you would have to do a join to retrieve it. If the key is the email address, then you already have it and the join is unnecessary. Whether this helps you any depends on how often this situation comes up.
When you are doing ad hoc queries, it's easy for a human being to see what master record is being referenced. This can be a big help when trying to track down data problems.
You almost certainly will need an index on the email address anyway, so making it the primary key eliminates one index, thus improving the performance of inserts as they now have only one index to update instead of two.
In my humble opinion, it's not a slam-dunk either way. I tend to prefer to use natural keys when a practical one is available because they're just easier to work with, and the disadvantages tend to not really matter much in most cases.
No one seems to have mentioned a possible problem that email addresses could be considered private. If the email address is the primary key, a profile page URL most likely will look something like ..../Users/my@email.com. What if you don't want to expose the user's email address? You'd have to find some other way of identifying the user, possibly by a unique integer value to make URLs like ..../Users/1. Then you'd end up with a unique integer value after all.
It is pretty bad. Assume some e-mail provider goes out of business. Users will then want to change their e-mail. If you have used e-mail as primary key, all foreign keys for users will duplicate that e-mail, making it pretty damn hard to change ...
... and I haven't even started talking about performance considerations.
I don't know if that might be an issue in your setup, but depending on your RDBMS the values of a column might be case sensitive. PostgreSQL docs say: "If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive". In other words, if you accept user input for a search in a table with email as primary key, and the user provides "John@Doe.com", you won't find "john@doe.com".
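The effect can be reproduced with SQLite, whose default text comparison is also case-sensitive. The sketch below (Python's sqlite3 module, hypothetical users table and index name) shows an exact lookup missing the row, and one common workaround: a unique index on lower(email) together with lookups that lower-case the input. This illustrates the idea and does not claim identical behaviour in PostgreSQL.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    -- Uniqueness is enforced on the lower-cased value, not the raw string.
    CREATE UNIQUE INDEX users_email_lower ON users (lower(email));
    INSERT INTO users (email) VALUES ('John@Doe.com');
    """)
    return conn

def find_exact(conn, email):
    # Plain equality: case-sensitive under SQLite's default BINARY collation.
    return conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)).fetchone()

def find_case_insensitive(conn, email):
    # Compare lower-cased values on both sides.
    return conn.execute(
        "SELECT id FROM users WHERE lower(email) = lower(?)", (email,)).fetchone()
```

With this index in place, inserting 'JOHN@DOE.COM' as a second row fails the uniqueness check.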
At the logical level, the email is the natural key.
At the physical level, given you are using a relational database, the natural key doesn't fit well as the primary key. The reason is mainly the performance issues mentioned by others.
For that reason, the design can be adapted. The natural key becomes the alternate key (UNIQUE, NOT NULL), and you use a surrogate/artificial/technical key as the primary key, which can be an auto-increment in your case.
systempuntoout asked,
What if someone wants to change his email address? Are you going to change all the foreign keys too?
That's what cascading is for.
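For reference, this is roughly what cascading looks like when the email really is the primary key. The sketch uses SQLite through Python's sqlite3 module, and the orders table is a made-up example of a referencing table. ON UPDATE CASCADE rewrites the foreign key values when the parent key changes.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
    conn.executescript("""
    CREATE TABLE users (email TEXT PRIMARY KEY);
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        user_email TEXT NOT NULL
                   REFERENCES users(email) ON UPDATE CASCADE
    );
    """)
    return conn

def change_email(conn, old, new):
    with conn:  # one transaction: parent row and all child rows change together
        conn.execute("UPDATE users SET email = ? WHERE email = ?", (new, old))

def demo():
    conn = build()
    conn.execute("INSERT INTO users VALUES ('old@example.com')")
    conn.execute("INSERT INTO orders (user_email) VALUES "
                 "('old@example.com'), ('old@example.com')")
    change_email(conn, "old@example.com", "new@example.com")
    return [row[0] for row in conn.execute("SELECT DISTINCT user_email FROM orders")]
```

The cascade keeps the data consistent, but every referencing row is still rewritten, which is the cost other answers point to.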
Another reason to use a numeric surrogate key as the primary key is related to how the indexing works in your platform. In MySQL's InnoDB, for example, all indexes in a table have the primary key pre-pended to them, so you want the PK to be as small as possible (for speed's and size's sakes). Also related to this, InnoDB is faster when the primary key is stored in sequence, and a string would not help there.
Another thing to take into consideration when using a string as an alternate key, is that using a hash of the actual string that you want might be faster, skipping things like upper and lower cases of some letters. (I actually landed here while looking for a reference to confirm what I just said; still looking...)
Yes, it is better if you use an integer instead. You can also put a unique constraint on your email column,
like this:
CREATE TABLE myTable(
id integer primary key,
email text UNIQUE
);
Yes, it is a bad primary key because your users will want to update their email addresses.
Another reason why an integer primary key is better shows up when you refer to the email address in a different table. If the address itself is the primary key, then you have to use it as the key in the other table too. So you store email addresses multiple times.
I am not too familiar with postgres. Primary Keys is a big topic. I've seen some excellent questions and answers on this site (stackoverflow.com).
I think you may have better performance by having a numeric primary key and use a UNIQUE INDEX on the email column. Emails tend to vary in length and may not be proper for primary key index.
Personally, I do not use any real-world information as a primary key when designing a database, because it is very likely that I will need to alter that information later. The sole reason I provide a primary key is that it makes most SQL operations from the client side convenient, and my choice for that has always been an auto-increment integer type.
I know this is a bit of a late entry but I would like to add that people abandon email accounts and service providers recover the address, allowing another person to use it.
As @HLGEM pointed out, "Jsmith@somecompany.com can easily belong to John Smith one year and Julia Smith two years later." In this case, should John Smith want your service, you either have to refuse to use his email address or delete all your records pertaining to Julia Smith.
If you have to delete records and they relate to the financial history of the business depending on local law you could find yourself in hot water.
So I would never use data like email addresses, number plates, etc. as primary keys because no matter how unique they seem they are out of your control and can provide some interesting challenges that you may not have time to deal with.
You may need to consider any applicable data regulation legislation. Email is personal information, and if your users are EU citizens, for instance, then under GDPR they can instruct you to delete their information from your records (remember this applies regardless of which country you are based in).
If you need to keep the record itself in the database for referential integrity or historical reasons such as audit, using a surrogate key would allow you to just NULL all the personal data fields. This obviously isn't as easy if their personal data is the primary key.
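A minimal sketch of that idea, using SQLite through Python's sqlite3 module. The invoices table, the column names and the erase_personal_data helper are hypothetical. Because other tables reference the integer id, the personal columns can be set to NULL without touching any foreign key.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,   -- surrogate key, carries no personal data
        email     TEXT UNIQUE,           -- nullable so it can be erased
        full_name TEXT
    );
    CREATE TABLE invoices (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(id),
        amount_cents INTEGER NOT NULL
    );
    """)
    return conn

def erase_personal_data(conn, user_id):
    # Keep the row (and everything referencing it); blank only the personal fields.
    with conn:
        conn.execute(
            "UPDATE users SET email = NULL, full_name = NULL WHERE id = ?",
            (user_id,),
        )
```

In SQLite a UNIQUE column may hold several NULLs, so erasing more than one user this way does not trip the uniqueness constraint.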
Your colleague is right: Use an autoincrementing integer for your primary key.
You can implement the email uniqueness either at the application level, or you could mark your email address column as unique and add an index on that column.
Adding the field as unique will cost you string comparison only when inserting into that table, and not when performing joins and foreign key constraint checks.
Of course, you must note that adding any constraints to your application at the database level can cause your app to become inflexible. Always give due consideration before you make any field "unique" or "not null" just because your application needs it to be unique or non-empty.
Use a GUID as a primary key... that way you can generate it from your program when you do an INSERT and you don't need to get a response from the server to find out what the primary key is. It will also be unique across tables and databases and you don't have to worry about what happens if you truncate the table some day and the auto-increment gets reset to 1.
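Generating the key on the client could look like the sketch below, using Python's uuid module and SQLite. The users table and the insert_user helper are assumptions for illustration. The id is stored as text here, while a database with a native UUID type would normally use that instead.

```python
import sqlite3
import uuid

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE users (
        id    TEXT PRIMARY KEY,      -- GUID generated by the application
        email TEXT NOT NULL UNIQUE   -- uniqueness still enforced separately
    );
    """)
    return conn

def insert_user(conn, email):
    user_id = str(uuid.uuid4())      # known before the INSERT is sent
    with conn:
        conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email))
    return user_id
```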
You can boost the performance by using an integer primary key.
You should use an integer primary key. If you need the email column to be unique, why don't you simply set a unique index on that column?
If you have a non int value as primary key then insertions and retrievals will be very slow on large data.
A primary key should be a static attribute. Email addresses are not static and can be shared by multiple people, so they are not a good choice for a primary key. Moreover, email addresses are strings that are usually longer than the unique id we would otherwise use [len(email_address) > len(unique_id)], so they require more space and, even worse, are stored multiple times as foreign keys. Consequently, performance degrades.
It depends on the table. If the rows in your table represent email addresses, then email is the best ID. If not, then email is not a good ID.
If it's simply a matter of requiring the email to be unique then you can just create a unique index with that column.
Email is a good unique index candidate, but not a good primary key: if it is the primary key, you will not be able to change a contact's email address, for example.
I think your join queries will be slower too.
Do not use the email address as the primary key. Keep email unique, but use a user id or username as the primary key.