Need to write a sql query: There can only be one value for another particular value - sql

For example, in the table ADDRESSES there is a column ZIP_CODE. The zip codes can be anything, say 90210, 45430, 45324. There can be multiple instances of 90210 or any other zip code. But for any zip code, there can only be one value for it in the STATE column. If 90210 is in the zip code column, the STATE column MUST be CA; if there is another record with 90210 that has OH or GA or anything else, it is incorrect. I am looking to find the zip codes that have more than one distinct value in this other column.
This is not a homework question.

select zip_code, count(distinct state)
from address
group by zip_code
having count(distinct state) > 1;
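To see what this query returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the exact table layout are made up for illustration; only the query itself comes from the answer above.

```python
import sqlite3

# Hypothetical sample data: 90210 appears with two different states.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, zip_code TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO address (zip_code, state) VALUES (?, ?)",
    [
        ("90210", "CA"),
        ("90210", "CA"),
        ("90210", "OH"),  # the conflicting row
        ("45430", "OH"),
        ("45324", "OH"),
    ],
)

# Zip codes that map to more than one distinct state.
conflicting = conn.execute(
    """
    SELECT zip_code, COUNT(DISTINCT state) AS state_count
    FROM address
    GROUP BY zip_code
    HAVING COUNT(DISTINCT state) > 1
    """
).fetchall()

print(conflicting)
```

Note that COUNT(DISTINCT state) ignores NULLs, so a zip code whose rows carry one state plus some NULL states would not be reported by this query.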

Try this:
select zip_code, count(*)
from (select distinct zip_code, state
from address) t
group by zip_code
having count(*) > 1;

The clearest way to design this is to have a table that contains every allowable combination of ZIP code and state, then use those two columns as foreign keys against the table that contains the actual address. This way, any invalid state/ZIP combination would violate the foreign key constraint. This approach will work even if a ZIP code crosses state lines, as long as you don't make the ZIP code field of the lookup table a unique key. The downside to this approach is that you would need to pre-populate the lookup table and keep it up-to-date, which would mean paying the USPS for their listings.
If you're really insistent on the one-state-per-ZIP approach and/or don't want to buy the list from the USPS, you could still use a similar approach. Again, you'll want a lookup table with the state and ZIP, but this time you'll want to make the ZIP a unique key. As records are added to the table containing addresses, you could use a trigger to populate the lookup table when the ZIP code doesn't already exist. This would add a little overhead, but not enough to worry about in most cases.
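A rough sketch of the first approach (a lookup table of allowed ZIP/state pairs, referenced by a two-column foreign key) might look like this. It uses SQLite through Python only so that it is runnable; the table name zip_state, the street column and the sample rows are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are only enforced when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Hypothetical lookup table: every allowed (zip_code, state) pair.
    -- The key is the pair, so a ZIP that crosses state lines can have two rows.
    CREATE TABLE zip_state (
        zip_code TEXT NOT NULL,
        state    TEXT NOT NULL,
        PRIMARY KEY (zip_code, state)
    );
    CREATE TABLE address (
        id       INTEGER PRIMARY KEY,
        street   TEXT NOT NULL,
        zip_code TEXT NOT NULL,
        state    TEXT NOT NULL,
        FOREIGN KEY (zip_code, state) REFERENCES zip_state (zip_code, state)
    );
    INSERT INTO zip_state VALUES ('90210', 'CA');
    """
)

insert_sql = "INSERT INTO address (street, zip_code, state) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("1 Main St", "90210", "CA"))  # allowed pair

try:
    conn.execute(insert_sql, ("2 Elm St", "90210", "OH"))  # pair not in lookup
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(rejected, stored)
```

For the one-state-per-ZIP variant, the only structural change would be making zip_code alone the key of the lookup table.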

select count(distinct state), zip_code
from address
group by zip_code
having count(distinct state) > 1

Related

Best database schema for country, region, county, town

I have country, region, county, town data and I'm currently deciding between 2 schema designs (if there's a better one, do tell).
I first thought
Country: Id, Name
Region: Id, CountryId, Name
County: Id, RegionId, Name
Town: Id, CountyId, Name
This does the job; however, to get all towns in a country you have to do 3 inner joins to do the filtering. I guess this could be OK, but potentially expensive?
The other design was:
Country: Id, Name
Region: Id, Name
County: Id, Name
Town: Id, CountryId, RegionId, CountyId, Name
This way all the hierarchical data, so to speak, is at the bottom and you can go back up. However, if you want all regions in a country you're a bit screwed, which makes me wonder whether the first design is best.
What do you think is the best schema design?
The best database design depends on how the data is being used.
If this is pretty static data that will all be updated at one time and external references are all to towns, then I would probably go for a denormalized dimension. That is, store the information all in one row:
Town Id
Town name
County name
Region name
Country name
Under the above scenario, the ids for county, region, and country are not necessary (by assumption).
If the data is being provided as separate tables with separate ids, and these tables can be updated independently or row-by-row, then a separate table for each makes sense. Putting all the ids into the towns table may or may not be a good idea. You will have to verify and maintain the hierarchies when data is inserted and updated.
If ids for each level are necessary for you, then you should have an appropriate table structure for declaring foreign key constraints. But this can get complicated. Will an external entity have a "geography" attribute that can be at any level? Will an external entity always know which level it is referring to?
In other words, you need to know how the data is going to be used in order to define an appropriate data model.
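For reference, the "all towns in a country" lookup under the first (fully normalized) design from the question can be sketched as follows. The table and column names follow the question; the rows are placeholder data, and SQLite via Python is used only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Country (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Region  (Id INTEGER PRIMARY KEY,
                          CountryId INTEGER NOT NULL REFERENCES Country(Id),
                          Name TEXT NOT NULL);
    CREATE TABLE County  (Id INTEGER PRIMARY KEY,
                          RegionId INTEGER NOT NULL REFERENCES Region(Id),
                          Name TEXT NOT NULL);
    CREATE TABLE Town    (Id INTEGER PRIMARY KEY,
                          CountyId INTEGER NOT NULL REFERENCES County(Id),
                          Name TEXT NOT NULL);

    -- Placeholder rows, purely illustrative.
    INSERT INTO Country VALUES (1, 'Country A'), (2, 'Country B');
    INSERT INTO Region  VALUES (1, 1, 'Region A1'), (2, 2, 'Region B1');
    INSERT INTO County  VALUES (1, 1, 'County A1a'), (2, 2, 'County B1a');
    INSERT INTO Town    VALUES (1, 1, 'Town X'), (2, 1, 'Town Y'), (3, 2, 'Town Z');
    """
)

# Three joins walk up the hierarchy: Town -> County -> Region -> Country.
towns_in_a = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.Name
        FROM Town t
        JOIN County  c  ON c.Id  = t.CountyId
        JOIN Region  r  ON r.Id  = c.RegionId
        JOIN Country co ON co.Id = r.CountryId
        WHERE co.Name = ?
        ORDER BY t.Name
        """,
        ("Country A",),
    )
]
print(towns_in_a)
```

With the denormalized single-row layout described above, the same lookup would be a plain filter on the country name column with no joins, at the cost of repeating the names in every town row.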

which is faster select+ update or delete+insert in sql?

I just got stuck in a problem, where there are two ways of solving this.
Let me first explain the case,
I have a DB table consisting of some columns, say id, name, address, priority. Here name and address are not unique, but name + address + priority is unique.
Input provided to me is a name and a list of addresses. What I have to do is arrange the rows for that name and those addresses in my DB table in the same order as given in the input.
There are two ways of solving it:
1. Select on the basis of name and address, build update queries for the rows whose priority has changed, and execute them.
2. Delete the rows corresponding to name and address from the table and insert them again with the new priority.
I know that one update is faster than delete + insert, but in this case there is a select query too.
My intuition is that the 1st method will be faster, but I don't have any technical details about it.
Am I missing something?
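To make the comparison concrete, the two options can be sketched like this. This is only an illustration under assumptions: the table name entries, the sample rows, and the premise that the input contains exactly the addresses already stored for that name are all made up. It shows that both routes end in the same state; it does not measure which one is faster, which would need timing on the real database.

```python
import sqlite3

def make_db():
    """Build a hypothetical table with name + address + priority unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entries (
            id       INTEGER PRIMARY KEY,
            name     TEXT    NOT NULL,
            address  TEXT    NOT NULL,
            priority INTEGER NOT NULL,
            UNIQUE (name, address, priority)
        )
        """
    )
    conn.executemany(
        "INSERT INTO entries (name, address, priority) VALUES (?, ?, ?)",
        [("alice", "addr1", 1), ("alice", "addr2", 2), ("alice", "addr3", 3)],
    )
    return conn

def reorder_with_updates(conn, name, addresses):
    """Option 1: one SELECT, then UPDATE only rows whose priority changed."""
    current = dict(
        conn.execute("SELECT address, priority FROM entries WHERE name = ?", (name,))
    )
    changes = [
        (prio, name, addr)
        for prio, addr in enumerate(addresses, start=1)
        if current.get(addr) != prio
    ]
    conn.executemany(
        "UPDATE entries SET priority = ? WHERE name = ? AND address = ?", changes
    )
    return len(changes)

def reorder_with_delete_insert(conn, name, addresses):
    """Option 2: DELETE the matching rows, then INSERT them with new priorities."""
    conn.executemany(
        "DELETE FROM entries WHERE name = ? AND address = ?",
        [(name, addr) for addr in addresses],
    )
    conn.executemany(
        "INSERT INTO entries (name, address, priority) VALUES (?, ?, ?)",
        [(name, addr, prio) for prio, addr in enumerate(addresses, start=1)],
    )

def ordered(conn, name):
    rows = conn.execute(
        "SELECT address FROM entries WHERE name = ? ORDER BY priority", (name,)
    )
    return [r[0] for r in rows]

new_order = ["addr1", "addr3", "addr2"]
db_a, db_b = make_db(), make_db()
changed = reorder_with_updates(db_a, "alice", new_order)
reorder_with_delete_insert(db_b, "alice", new_order)
print(changed, ordered(db_a, "alice"), ordered(db_b, "alice"))
```

One visible difference in the sketch: option 1 touches only the rows whose priority actually changed, while option 2 rewrites every row and gives the rows new id values.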

PK for table that have not unique data

I have tables like
Company( #id_company, ... )
addresses( address, *id_company*, *id_city* )
cities( #id_city, name_city, *id_country* )
countries( #id_country, name_country )
What I want to know is:
Is it a good design? (A company can have many addresses.)
The important thing is that you may notice that I didn't add a PK to the addresses table, because every address of a company will be different. So am I right?
And I will never have a WHERE clause in a select that specifies an address.
First of all we should distinguish natural keys and technical keys. As to natural keys:
A country is uniquely identified by its name.
A city can be uniquely identified by its country and a unique name. For instance, there are two Frankfurts in Germany. To make sure what we are talking about, we either use the distinct names Frankfurt/Main and Frankfurt/Oder or use the city name with its zip code range.
A company gets identified by its full name usually. Or use some tax id, code, whatever.
To uniquely identify a company address we would take the company plus country, city and address in the city (street name and number usually).
You've decided to use technical keys. That's okay. But you should still make sure that names are unique. You don't want France and France in your table, it must be there just once. You don't want Frankfurt and Frankfurt without any distinction in your city table for Germany either. And you don't want to have the same address twice entered for one company.
company( #id_company, name_company, ... ) plus a unique constraint on name_company or whatever makes a company unique
countries( #id_country, name_country ) plus a unique constraint on name_country
cities( #id_city, name_city, id_country ) plus a unique constraint on name_city, id_country
addresses( address, id_company, id_city ) with a unique constraint on all three columns
From what you say, it looks like you want the addresses only for lookup. You don't want to use them in any other table, not now and not in the future. Well, then you are done. As you need a unique constraint on all three columns, you could just as well declare this as your primary key, but you don't have to.
Keep in mind, that to reference a company address in any other future table, you would have to store address + id_company + id_city in that table. At that point you would certainly like to have an address id instead. But you can add that when needed. For now you can do without.
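As a small illustration of declaring the three columns as the key, here is a sketch using SQLite from Python. The column names follow the question; the sample rows are invented, and the foreign keys to the company and cities tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE addresses (
        address    TEXT    NOT NULL,
        id_company INTEGER NOT NULL,
        id_city    INTEGER NOT NULL,
        -- All three columns together form the key.
        PRIMARY KEY (id_company, id_city, address)
    )
    """
)

insert_sql = "INSERT INTO addresses (address, id_company, id_city) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("1 Main St", 1, 10))
conn.execute(insert_sql, ("2 Side St", 1, 10))  # same company, other address: fine

try:
    conn.execute(insert_sql, ("1 Main St", 1, 10))  # exact duplicate
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
print(duplicate_rejected, row_count)
```

Putting id_company first in the key also lets the key's index serve lookups of all addresses for one company.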
It's okay. You might want to add a non-unique index on id_company so that company address queries are sped up. Another option would be a joining table between Company and Address, but that would probably only be justified if Address stored more data (so searches would be slower).
This design is fine.
A (relational) table always has a (candidate) key. (One of which you can choose as the primary key, but candidate keys, aka keys, are what matter.) Because if no subset of columns smaller than set of all columns is unique then the key is the set of all columns.
Since every table has one, in SQL you should declare it. Eg in SQL if you want to declare a FOREIGN KEY constraint to the key of this table then you have to declare that column set a key via PRIMARY KEY, KEY or UNIQUE. Also, telling the DBMS what you know helps optimize your use of it.
What matters to determining keys are subsets of columns that are unique that don't have smaller subsets that are unique. Those are the keys.
A company, address or city is not unique since you are going to have multiple of each.
A (city,address) is not unique normally.
A (city,company) is not unique normally.
A (company,address) is not unique normally.
So (company,address,city) is the (only) (candidate) key.
Note that if there were only ever one city, then (company,address) would be the key. And if there were only ever one company, then (address,city) would be the key. So your stated reason, "because every address[+city?] of a company [?] will be different", isn't sound unless we're supposed to assume other things.
I'm making this an answer instead of a comment because of length. As to the address table having a defined primary key, the answer is yes. There are several good reasons but just consider this one.
Suppose a company had several addresses and a move required you to delete one of the addresses. You can't just delete where comp_id = x as that would delete all the addresses for that company. You have to have where comp_id = x and something_else where the something else must differentiate the one address from all the others for that company. So you have to have someone look at the different addresses to see how they differ and select the one difference that correctly identifies the one address and then write that correctly into the where clause.
That's a lot of work to do every time you want to delete (or update) an address.
It also means it's more difficult to write a parameterized delete statement that can be used to delete any address. Suppose a company has several locations in the same building: Shipping in Suite 101, Marketing in Suite 202 and IT in (of course) the basement. So the street, city, state, everything is the same, different only in Suite_No or whatever is used to refine the address.
Then consider your user. Most of the time, a user isn't going to be interested in seeing every single address you have listed for a company. He's only interested in Product Testing. You should be able to give them Product Testing's address and no other. Users are not known for their patience when presented with a data dump every time they do a query and it's up to them to select the one they're looking for.
It just solves so many problems to be able to specify where addr_id = x.
An address is a thing and should have its own table.
An address can exist without a company, therefore it should not have a foreign key to company. Also, what if you start selling to/buying from individuals?
A company can have zero, one, or many addresses.
Two or more companies can have the exact same address. Your assumption is flawed.
Use a junction table:
company -< company_address >- address
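A minimal sketch of that junction-table layout, assuming hypothetical column names (id_address, street) and made-up rows, run through SQLite from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (
        id_company   INTEGER PRIMARY KEY,
        name_company TEXT NOT NULL UNIQUE
    );
    CREATE TABLE address (
        id_address INTEGER PRIMARY KEY,
        street     TEXT    NOT NULL,
        id_city    INTEGER NOT NULL
    );
    -- Junction table: one row per (company, address) link.
    CREATE TABLE company_address (
        id_company INTEGER NOT NULL REFERENCES company(id_company),
        id_address INTEGER NOT NULL REFERENCES address(id_address),
        PRIMARY KEY (id_company, id_address)
    );

    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO address VALUES (10, '1 Main St', 100), (11, '2 Side St', 100);
    -- Both companies share address 10; Acme also has address 11.
    INSERT INTO company_address VALUES (1, 10), (2, 10), (1, 11);
    """
)

sharing = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.name_company
        FROM company c
        JOIN company_address ca ON ca.id_company = c.id_company
        WHERE ca.id_address = ?
        ORDER BY c.name_company
        """,
        (10,),
    )
]
print(sharing)
```

Here an address row exists on its own, and a company with zero addresses simply has no rows in company_address.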

sql - denormalize address

Customer Table - CustomerId, Street, City, State, Zipcode
ZipCode Table - ZipCodeId, ZipCode, CityId, StateId
However, what I am confused about is this: should I put CityId, StateId and ZipCodeId, or should I put CityName, StateName and ZipCode, in the Customer table? And should these be set up as foreign keys referencing the city, state and zipcode tables? Should I get rid of the City table altogether and just repeat the city's name in the ZipCode table and Customer table?
Nominally, the customer table should only contain the street address and zip code ID (not the city or state). Note that data entry for such a normalized scheme is not necessarily straight-forward; people will expect to enter city, state, zip code (or maybe just zip code) and the onus will be on the application to map that correctly and disambiguate when necessary.
I too live in a zip code used by two cities; they even happen to be in different counties, which leads to questions about which county I live in on occasion. One of the cities has multiple other zip codes; the other only has (part of) the one zip code. There's no problem: you would have two separate ZipCodeID entries for the same ZipCode, one for each city. Note that this means that there would not be a unique index on the ZipCode column in the ZipCode table.
Where would you store the +4 of the Zip+4 scheme? Good question! That belongs with the street address.
If it is possible to have more than one city with the same zip code, then a better solution is to have a City table with a zipcode column (just a varchar or int column, no foreign key). In the City table you keep a foreign key to the State table. In the end you should keep CityId in the Customer table.
Reason for this: City and State are small tables, and joining them won't cause a performance problem; just put an index on them and everything will be fine.
Your design looks ok to me - keep full address in customer table. Just make sure that you permit to enter multiple entries into ZipCode table having the same zipcode but different cities (i.e. do not make ZipCode.ZipCode unique key).
See here:
Zip codes relate only to the mail system and have absolutely nothing
to do with political boundaries. Some cities will be covered by more
than one zip code; some zip codes will cover more than one city.
It is very frustrating when you call an insurance company and, after you tell them your zip code, they give you a really bad quote, simply because you seem to live in a different city across the road that has exactly the same zip code but a bad crime situation.
You can also incorporate USPS address correction into your application to standardize and keep this information normalized. See: https://ribbs.usps.gov/index.cfm?page=aec

trying to determine unique identifier for database table

I have a database table with many columns and there is no specified primary key. There isn't a list of super keys either. Besides iteratively trying all candidate keys/columns, is there a way for me, using SQL, to try and figure out whether a subset of keys can make a unique identifier for my table?
For example, a table may have 4 columns first name, last name, address and zip and the data I see is:
John, Smith, 1 main st, 00001
Mary, Smith, 1 main st, 00001
Mary, Smith, 2 sub st, 00002
In this case, I'll need first, last and zip as my unique key.
John, Smith, 1 main st, 00001
John, Smith, 1 main st, 00001
In this case, there is no unique key.
Please don't comment on my table construction and/or normalization of databases, I'm just trying to find a practical answer. Thanks.
This is my question: Besides iteratively trying all candidate keys/columns, is there a way for me, using SQL, to try and figure out whether a subset of keys can make a unique identifier for my table?
Looking for a subset of unique values in this case seems so specific to the particular data set. What if you arrive at a subset today and find you can't insert a new row tomorrow?
Use an artificial key, like an auto-incrementing integer.
In short: no, there's no way to do this in T-SQL really.
My advice: just add an ID INT IDENTITY PRIMARY KEY column to the table. It's guaranteed to be unique, it will be filled automagically when you create it, and it's fast and easy, with no messy "is this really unique or are there any combinations of rows that violate the uniqueness" questions.
Just do it - it's the easiest way to go!!
You cannot find if a combination "can" make a primary key. You can find if one WILL make a good primary key for an existing set of data.
To find if a set of fields is candidate or not, you can count the distinct of those fields (using group-by with rollup) and compare that with count (*)
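A simple version of that check can be sketched as below. It compares the number of distinct value combinations with the total row count, using a SELECT DISTINCT subquery rather than GROUP BY with ROLLUP (a swapped-in but equivalent way to count distinct combinations). The table name people and the column names are assumptions based on the sample rows in the question; SQLite via Python is used only to keep it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (first_name TEXT, last_name TEXT, address TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "1 main st", "00001"),
        ("Mary", "Smith", "1 main st", "00001"),
        ("Mary", "Smith", "2 sub st", "00002"),
    ],
)

def is_unique(conn, columns):
    """True if the given columns identify every row in the current data."""
    # Column names are hard-coded and trusted here; identifiers cannot be
    # passed as bound parameters, so they are interpolated into the SQL.
    col_list = ", ".join(columns)
    total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM people) AS d"
    ).fetchone()[0]
    return total == distinct

print(is_unique(conn, ["first_name", "last_name"]))
print(is_unique(conn, ["first_name", "last_name", "zip"]))
```

As the answer says, a positive result only holds for the rows present today; a row inserted tomorrow can break it.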
There is a much faster method.
Enterprise dbms have had it for many years but MS SQL Server 2005 (useable in 2008) and later provided the HashBytes() function. Convert the columns to CHAR() (VARCHAR on MS), concatenate them; then hash them; then compare the hashes. You can compare the two tables in a single SELECT command. IIRC max 8000 characters per row.
(If you use this answer, please undo and redo your Answer choice.)
If you are comparing two databases, then you can see if any duplicate rows exist in the source db with a structure like this:
select a,b,c,d
from mytable
group by a,b,c,d
having count(*) > 1
Include all columns.
Then use all columns as the 'row key' to see if the row exists in the target system.
There are update anomalies in this schema:
you cannot insert a person without knowing his address.
A better approach is to separate it into three tables: one for persons, one for addresses, and one for PersonAddress:
> persons: id, firstname, lastname
> address: id, address
> personaddress: personid, addressid
You cannot find if a combination "can" make a primary key.
I actually disagree with this, I think it is possible to write a query that will SELECT all possible permutations of columns from the table and combine each permutation into a single unique value (the simplest, crudest way is to CAST them all to VARCHAR and connect them with a spacer character - a better way would be some kind of hash function).
With a single pass you would then have set of columns like P1, P12, P123, P2, P23, P3 etc (in case of three columns). Then you can do a query with COUNT(*) vs COUNT(DISTINCT) for each permutation column and you will see which permutations are unique.
Using dynamic SQL you could probably make it so that it would work on any table, although I don't know about the column limit for SQL Server.
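The idea can be sketched roughly like this: generate every combination of columns, build one query with a COUNT(DISTINCT ...) per combination over the concatenated values, and compare each against COUNT(*). This is an illustration under assumptions, not a general tool: the table people and its columns are invented from the sample rows in the question, the separator char(31) is an arbitrary choice that is assumed not to occur in the data, and NULL values are not handled (concatenation with NULL yields NULL). It uses SQLite from Python rather than T-SQL dynamic SQL.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (first_name TEXT, last_name TEXT, address TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "1 main st", "00001"),
        ("Mary", "Smith", "1 main st", "00001"),
        ("Mary", "Smith", "2 sub st", "00002"),
    ],
)

columns = ["first_name", "last_name", "address", "zip"]
# Every non-empty combination of columns, smallest first.
combos = [
    combo
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
]

def concat_expr(cols):
    # Join the columns with a separator character so that ('ab', 'c') and
    # ('a', 'bc') do not collapse to the same string.
    return " || char(31) || ".join(cols)

# One pass over the table: COUNT(*) plus one COUNT(DISTINCT ...) per combination.
counts_sql = ", ".join(f"COUNT(DISTINCT {concat_expr(c)})" for c in combos)
row = conn.execute(f"SELECT COUNT(*), {counts_sql} FROM people").fetchone()
total, distinct_counts = row[0], row[1:]

unique_sets = [c for c, n in zip(combos, distinct_counts) if n == total]
# Keep only sets that have no smaller unique subset.
minimal_keys = [
    c for c in unique_sets if not any(set(o) < set(c) for o in unique_sets)
]
print(minimal_keys)
```

For these three rows the minimal unique sets come out as (first_name, address) and (first_name, zip); (first_name, last_name, zip) is unique too, but not minimal. The number of combinations doubles with every extra column, so this only scales to fairly narrow tables.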