PK for a table that has no unique data - SQL

I have tables like these:
Company( #id_company, ... )
addresses( address, *id_company*, *id_city* )
cities( #id_city, name_city, *id_country* )
countries( #id_country, name_country )
What I want to know is:
Is this a good design? (A company can have many addresses.)
The important thing is that, as you may notice, I didn't add a PK to the addresses table, because every address of a company will be different. Am I right?
And I will never have a WHERE clause in a SELECT that specifies an address.

First of all we should distinguish natural keys and technical keys. As to natural keys:
A country is uniquely identified by its name.
A city can be uniquely identified by its country and a unique name. For instance there are two cities named Frankfurt in Germany. To make sure what we are talking about, we either use the distinct names Frankfurt/Main and Frankfurt/Oder or use the city name with its zip code range.
A company gets identified by its full name usually. Or use some tax id, code, whatever.
To uniquely identify a company address we would take the company plus country, city and address in the city (street name and number usually).
You've decided to use technical keys. That's okay. But you should still make sure that names are unique. You don't want France and France in your table, it must be there just once. You don't want Frankfurt and Frankfurt without any distinction in your city table for Germany either. And you don't want to have the same address twice entered for one company.
company( #id_company, name_company, ... ) plus a unique constraint on name_company or whatever makes a company unique
countries( #id_country, name_country ) plus a unique constraint on name_country
cities( #id_city, name_city, id_country ) plus a unique constraint on name_city, id_country
addresses( address, id_company, id_city ) with a unique constraint on all three columns
From what you say, it looks like you want the addresses only for lookup. You don't want to use them in any other table, not now and not in the future. Well, then you are done. As you need a unique constraint on all three columns, you could just as well declare this as your primary key, but you don't have to.
Keep in mind, that to reference a company address in any other future table, you would have to store address + id_company + id_city in that table. At that point you would certainly like to have an address id instead. But you can add that when needed. For now you can do without.
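For illustration, a minimal DDL sketch of these tables with the unique constraints described above might look like this (column names follow the question; exact syntax varies by DBMS):
create table countries (
  id_country   integer primary key,
  name_country varchar(100) not null,
  constraint uq_countries_name unique (name_country)
);
create table cities (
  id_city    integer primary key,
  name_city  varchar(100) not null,
  id_country integer not null references countries (id_country),
  constraint uq_cities_name_country unique (name_city, id_country)
);
create table company (
  id_company   integer primary key,
  name_company varchar(100) not null,
  constraint uq_company_name unique (name_company)
);
create table addresses (
  address    varchar(200) not null,
  id_company integer not null references company (id_company),
  id_city    integer not null references cities (id_city),
  constraint uq_addresses unique (address, id_company, id_city)
);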

It's okay - you might want to add a non-unique index on id_company so company address queries are sped up. Another option would be a join table between Company and Address, but that would probably only be justified if Address stored more data (so searches would be slower).

This design is fine.
A (relational) table always has a (candidate) key. (One of these you can choose as the primary key, but candidate keys, aka keys, are what matter.) If no subset of columns smaller than the set of all columns is unique, then the key is the set of all columns.
Since every table has one, in SQL you should declare it. For example, if you want to declare a FOREIGN KEY constraint referencing the key of this table, then you have to declare that column set a key via PRIMARY KEY, KEY or UNIQUE. Also, telling the DBMS what you know helps it optimize your use of it.
What matters for determining keys are the subsets of columns that are unique and have no smaller unique subsets. Those are the keys.
A company, an address or a city alone is not unique, since you are going to have multiple rows for each.
A (city,address) is not unique normally.
A (city,company) is not unique normally.
A (company,address) is not unique normally.
So (company,address,city) is the (only) (candidate) key.
Note that if there were only ever one city, then (company, address) would be the key. And if there were only ever one company, then (address, city) would be the key. So your given reason, that "every address [+ city?] of a company will be different", isn't sound unless we're supposed to assume other things.
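As a sketch of what declaring that key buys you (the deliveries table below is purely hypothetical, added only to show a referencing table):
-- the whole column set is the key, so declare it
create table addresses (
  address    varchar(200) not null,
  id_company integer not null,
  id_city    integer not null,
  primary key (address, id_company, id_city)  -- or a UNIQUE constraint
);
-- a hypothetical referencing table has to carry all three columns
create table deliveries (
  id_delivery integer primary key,
  address     varchar(200) not null,
  id_company  integer not null,
  id_city     integer not null,
  foreign key (address, id_company, id_city)
    references addresses (address, id_company, id_city)
);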

I'm making this an answer instead of a comment because of length. As to the address table having a defined primary key, the answer is yes. There are several good reasons but just consider this one.
Suppose a company had several addresses and a move required you to delete one of them. You can't just delete where comp_id = x, as that would delete all the addresses for that company. You have to write where comp_id = x and something_else, where the something else must differentiate the one address from all the others for that company. So someone has to look at the different addresses, see how they differ, pick the one difference that correctly identifies the target address, and then write that correctly into the where clause.
That's a lot of work to do every time you want to delete (or update) an address.
It also means it's more difficult to write a parameterized delete statement that can be used to delete any address. Suppose a company has several locations in the same building: Shipping in Suite 101, Marketing in Suite 202 and IT in (of course) the basement. So the street, city, state, everything is the same, different only in Suite_No or whatever is used to refine the address.
Then consider your user. Most of the time, a user isn't going to be interested in seeing every single address you have listed for a company. They're only interested in, say, Product Testing. You should be able to give them Product Testing's address and no other. Users are not known for their patience when presented with a data dump every time they run a query, leaving it up to them to pick out the one they're looking for.
It just solves so many problems to be able to specify where addr_id = x.
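A sketch of the difference, assuming a surrogate addr_id column and illustrative column names:
-- without a surrogate key: the WHERE clause must spell out whatever
-- distinguishes this address from the company's other addresses
delete from addresses
where comp_id = 42
  and street = '100 Main St'
  and suite_no = '101';
-- with a surrogate key: one parameter identifies exactly one row
delete from addresses
where addr_id = 1001;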

An address is a thing and should have its own table.
An address can exist without a company, therefore it should not have a foreign key to company. Also, what if you start selling to/buying from individuals?
A company can have zero, one, or many addresses.
Two or more companies can have the exact same address. Your assumption is flawed.
Use a junction table:
company -< company_address >- address
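A minimal sketch of that junction table (table and column names are illustrative):
create table company (
  company_id integer primary key,
  name       varchar(100) not null
);
create table address (
  address_id integer primary key,
  street     varchar(200) not null,
  city       varchar(100) not null
);
create table company_address (
  company_id integer not null references company (company_id),
  address_id integer not null references address (address_id),
  primary key (company_id, address_id)
);
An address row can then exist with no company at all, and the same address can be linked to any number of companies.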

Related

How to invert an incorrectly modeled 1:1 relationship?

I have an incorrectly modeled 1:1 relationship between two tables:
Table: Customer
* id (bigint)
* ...
Table: Address
* id
* customer_id (bigint) <--- FOREIGN KEY
* street (varchar)
* ...
The real-world relationship is that a customer may have one address or none. However, with the current data model, it would be possible to assign multiple addresses to a customer. We do not do this at the moment, so the data could be migrated to this:
Table: Customer
* id (bigint)
* address_id (nullable bigint)
* ...
Is it possible to make this migration in one transaction, using purely SQL code? I would like to avoid an intermediate state where we have both relationships and migrate the customers one by one. That is the best idea I have come up with so far.
What I understand so far is that you want to add an address_id column to the Customer table. If there is more than one address for a customer, then you need to select only one of them. Here I am taking the last (highest id) address.
update Customer
set address_id=(select max(id) from Address a where a.customer_id=Customer.id)
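Putting the whole migration together, one possible single-transaction sketch (assuming a DBMS with transactional DDL, such as PostgreSQL) would be:
begin;
-- add the new column and populate it from the old relationship
alter table Customer add column address_id bigint;
update Customer
set address_id = (select max(id) from Address a where a.customer_id = Customer.id);
-- enforce the new relationship, then drop the old one
alter table Customer add constraint fk_customer_address
  foreign key (address_id) references Address (id);
alter table Address drop column customer_id;
commit;
Note that the update must run before customer_id is dropped, since it is the only link between the two tables.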
You can actually leave your data model as is and just add a unique constraint to address:
alter table address add constraint unq_address_customer unique (customer_id);
This is not ideal, but it does enforce the unique constraint with minimal changes to the data model.
That said, I would question why you want only one address per customer. Have you considered these situations?
Customers whose "delivery" address and whose "billing" address are different.
Customers who move.
Customers whose address changes although they do not move, say due to postal code reassignments or street name changes.

Composite primary key: Finding one attribute using another

Data fields
I am designing a database table structure. Say that we need to record employee profiles from different companies. We have the following fields:
+---------+--------------+-----+--------+-----+
| Company | EmployeeName | Age | Gender | Tel |
+---------+--------------+-----+--------+-----+
It's possible that two employees from different companies may have the same name (and assume that no two employees have the same name in the same company). In this case a composite primary key (Company, EmployeeName) would be necessary, in my opinion.
Search
Now I need to get all information by using only one of the 2 attributes in the primary key. For example,
I want to search all employees' profile of Company A:
SELECT EmployeeName, Age, Gender, Tel FROM table WHERE Company = 'Company A'
And I can also search all employees from different company named Donald:
SELECT Company, Age, Gender, Tel FROM table WHERE EmployeeName = 'Donald'
Strategy
In order to implement this requirement, my strategy would be storing all data in a single table, which is easy to read and understand. However, I noticed that searches may take a long time, as the query may need to iterate through all rows. I would like to retrieve this information as quickly as possible. Would there be a better strategy for this?
First, your rows should have a unique identifier for each row -- identity/auto-increment/serial, depending on the database. Second, you might reconsider names being unique. Why can't two people at the same company have the same name?
In any case, you have a primary key on, say, (company, name). For the opposite search you simply want another index on (name, company):
create index idx_profiles_name_company on profiles(name, company);
A note explaining Gordon's suggestion for an identity on each row. This is supplemental to his answer above.
In theory there is nothing wrong with a primary key that crosses columns and in a db like PostgreSQL I like to have identity values as secondary keys (i.e. not null unique) and specify natural primary keys. Of course on MS SQL Server or MySQL/InnoDB that would be a recipe for problems. I would also not say "all" but rather "almost all" since there are times when breaking this rule is good.
Regardless, having an identity column simplifies a couple of things, and it provides an abstraction around keys in case you get things wrong. Composite keys pose a couple of issues that end up eating time (and possibly resulting in downtime) later. These include:
Joins on composite keys are often more expensive than those on simple values, and
Adding or changing a natural primary key which crosses columns is far harder when joins are involved
So depending on your db you should either specify a unique secondary key or make your natural primary key separate (which one you should do depends on storage and implementation specifics).
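Combining both suggestions, a rough sketch (PostgreSQL-style syntax, names from the question) could be:
create table profiles (
  id      bigserial primary key,                               -- surrogate key
  company varchar(100) not null,
  name    varchar(100) not null,
  age     integer,
  gender  varchar(10),
  tel     varchar(30),
  constraint uq_profiles_company_name unique (company, name)  -- natural key
);
-- supports lookups by name across companies
create index idx_profiles_name_company on profiles (name, company);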

Design Sql Tables with common columns

I have 5 tables that have the same structure and the same columns: id, name, description. So I wonder what the best way is to design this and avoid having 5 tables with the same columns:
Create a category table that will include my three common columns and another column "enum" that will differentiate my categories, e.g. (city, country, continent, etc.)
Create a category table that will include my three common columns, and create the other five tables that will just include an id.
Note that I would have an association table that should include the ids of cities, countries, continents, etc., so I can display them in a report.
Thank you for your advice.
The decision on how many tables to have under these circumstances simply depends.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
There are other considerations. If your data is at the geographic level, then it might represent hierarchies . . . cities are in countries, countries are on continents. Some databases (such as MySQL) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy -- for providing cultural-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (and perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If city description, country description and continent description are different information, then you are already doing this the right way. The only time you would aim to reduce this data would be if you were repeating information, but for the titles of the data it's fine.
In fact, you are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id; every table should have one. Name and description won't be the same across country and city.
Also, this way, if you want a country's name you don't have to go through every country, continent and city. You only have 192 or so entries to go through. If you had all of that in one massive table you would have to use it for everything and go through every result every time you want data. You would also have to distinguish between cities, countries and continents in some other way than the separate tables.
Eg:
method 1, with 5 tables:
SELECT * FROM country
does the same as
method 2, 1 table:
SELECT * FROM table WHERE enumvalue = 'country';
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.
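A sketch of that hierarchy, setting aside the Russia/Palau edge cases mentioned above (names are illustrative):
create table continent (
  id          integer primary key,
  name        varchar(100) not null,
  description varchar(500)
);
create table country (
  id           integer primary key,
  continent_id integer not null references continent (id),
  name         varchar(100) not null,
  description  varchar(500)
);
create table city (
  id          integer primary key,
  country_id  integer not null references country (id),
  name        varchar(100) not null,
  description varchar(500)
);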

Why is this table not normalized?

I am taking a database course and I am studying table normalization.
Could anyone explain to me why the second table in the first row on the right is not normalized?
It is not normalized because
For a student who has signed up for more than one course, the entries in the table will be:
23 Jake Smith CS101 B+
23 Jake Smith B102 C+
Clearly the data is being repeated (redundant data). It leads to anomalies (insert, update, delete anomalies).
Ex: When you have to change the name of a student, say Jake Smith, you have to modify all of the rows; this is called an update anomaly.
Normalization is used to avoid these kinds of anomalies and redundant data storage.
The table on the right-hand side in the second row handles this situation in a better way: as it stores id, name and DOB in a separate table, edits can be made easily using the id attribute on a single row.
There are several normal forms like 1NF, 2NF, 3NF etc. Each normal form has some constraints associated with it, each higher form being stricter than the previous one.
I suppose it is a table for students' grades. It is not normalized because it contains students' names directly, instead of references to student records.
It's better not to include student_name in this table, but to store all student data in a separate students table and reference it by a student_id foreign key (something like the first table in the second row, except for the ids).
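For instance, a sketch of that split (column names are illustrative):
create table students (
  student_id integer primary key,
  name       varchar(100) not null,
  dob        date
);
create table grades (
  student_id  integer not null references students (student_id),
  course_code varchar(10) not null,
  grade       varchar(2),
  primary key (student_id, course_code)
);
Renaming Jake Smith then touches exactly one row in students.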
It's not normalised because neither id nor student_name is the key (both have duplicates) so the key must be one of those (probably id) together with the course code. The other one (name) then doesn't depend on that key, but just on id.
The simple rule for 3NF is that every non-key column must depend on "the key, the whole key, and nothing but the key" - to which we all solemnly intone "so help me Codd"!
The higher normal forms deal with dependencies inside the parts of a key.
Because in your first table on the right the values
23 - j.smith
appear twice; that repetition does not adhere to Codd's first normal form.

Need to write a sql query: There can only be one value for another particular value

For example, in the table ADDRESSES there is a column ZIP_CODE. The zip codes can be anything, say 90210, 45430, 45324. There can be multiple instances of 90210 or any other zip code. But for any zip code, there can only be one value for it in the STATE column. If 90210 is in the zip code column, the STATE column MUST be CA; if there is another record with 90210 that has OH or GA or anything else, it is incorrect. I am looking to find the particular zip codes that have anything other than one single value in this other column.
This is not a homework question.
select zip_Code, count(distinct state)
from address
group by zip_code
having count(distinct state) > 1;
try this:
select zip_code, count(*)
from (select distinct zip_code, state
from address) t
group by zip_code
having count(*) > 1;
The clearest way to design this is to have a table that contains every allowable combination of ZIP code and state, then use those two columns as foreign keys against the table that contains the actual address. This way, any invalid state/ZIP combination would violate the foreign key constraint. This approach will work even if a ZIP code crosses state lines, as long as you don't make the ZIP code field of the lookup table a unique key. The downside to this approach is that you would need to pre-populate the lookup table and keep it up-to-date, which would mean paying the USPS for their listings.
If you're really insistent on the one-state-per-ZIP approach and/or don't want to buy the list from the USPS, you could still use a similar approach. Again, you'll want a lookup table with the state and ZIP, but this time you'll want to make the ZIP a unique key. As records are added to the table containing addresses, you could use a trigger to populate the lookup table when the ZIP code doesn't already exist. This would add a little overhead, but not enough to worry about in most cases.
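A sketch of the lookup-table approach described above (table and column names are illustrative):
create table zip_state (
  zip_code varchar(10) not null,
  state    char(2) not null,
  primary key (zip_code, state)   -- not unique on zip alone, so a ZIP may cross state lines
);
create table addresses (
  address_id integer primary key,
  street     varchar(200),
  zip_code   varchar(10) not null,
  state      char(2) not null,
  foreign key (zip_code, state) references zip_state (zip_code, state)
);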
select count(*), zip_code
from address
group by zip_code
having count(*) > 1