SQL normalization: Mathematical? [closed] - sql

I'm taking a class in database design. The curriculum focuses heavily on normalization. The steps and methods speak for themselves; however, I find the reliance on intuition frustrating. Is there a mathematical way to handle data normalization, where one can assign the properties and crunch a mathematically certain conclusion?

I'll give another real world example in a discussion regarding addresses.
You are the architect in charge of redesigning a legacy app with an old database that contains address information.
The DB currently has 1 field for all address information.
Our address looks something like this:
Address
___________________________________
Bob Robertson 123 Broad Way Springfield IL 62701
Unacceptable! Let's normalize this a bit to remove some of that redundancy, so in our second pass, we break it up into a few fields:
Address1       Address2       City         State  Zip
_____________  _____________  ___________  _____  _____
Bob Robertson  123 Broad Way  Springfield  IL     62701
Well, you say, there are multiple addresses within a state, so we need to break that out into a lookup table. We can save tons of space on that city name, and there is a one-to-many relationship between a street name and all of the numbers that exist on that street.
We also need to represent the 9-digit zip codes, and one 5-digit zip code has multiple 4-digit suffixes, so that needs its own table too in order to fully normalize this relationship. Boy, am I clever!
Also, we might have multiple Robertsons, and multiple Bobs that we ship things to, so we need a cross-reference lookup table for that as well.
At this point, our table may look like this:
FirstNameID  LastNameID  StreetID  StreetNum  CityID  StateProvinceID  ZipID  4DigitSuffix
___________  __________  ________  _________  ______  _______________  _____  ____________
3452         1257        45        234        990     32               123    1234
If you presented this to most rational developers, requiring 6 or more joins just to get address information, you might get run out of town on a rail.
In many cases, there is an optimal middle ground that satisfies business needs and also speed/ease of development.

Related

How to split a street address in SQL Server Management Studio

I have a column that contains a street address. Below are some examples of how that street address could be.
Question is: how do I break this into individual columns for str_number, str_prefix (may or may not exist), Str_Name (could be one or more words), str_type, str_suffix (may or may not exist).
I'm not sure if this is possible in SQL, with some values not being there but thought I would check. Thank you very much for any assistance.
123 N Main ST E
456 Bear Creek AVE W
789 N Rose LN
234 E Deer Run Ln
You can't do this reliably, as there are far too many variations on how addresses are formatted, abbreviated, etc. See Falsehoods programmers believe about addresses.
You would be much better off writing/finding an application which submits each address to a service (API) which can then look up against a known address database and return the constituent components in a structured format, then insert that "cleansed" data into proper fields in the database.

Ignore similar values and not treat as duplicate records

I'm writing a SELECT query in SQL Server and ran into a problem.
Suppose I have two rows like this:
ID  Address        City      Zip
1   123 Wash Ave.  New York  10035
1   123 Wash Ave   New York  10035
I have many rows with the same address, where some differ only by a dot or some other small detail.
They are almost identical, so how can I find all such cases?
Using the UPS Online APIs, our solution was not to correct the error automatically but to sort the results so that the ones most likely to be correct come first.
With the results returned by UPS, we run various filters against the original source address and each of the returned responses, then apply a weighting system to sort the results so that our CSR can select the most logical "correctly formatted" answer from UPS.
This builds a score card from the result set. One measure is the number of incorrect digits in the ZIP code (this catches fat-fingering).
Another measure removes all punctuation marks and ranks how close the address is afterwards.
Lastly, we pass the results through a matrix of standard substitutions [ ST for STREET ] and do a final ranking.
From all these scores we sort the results from most likely to least likely and present them to a person, who then selects the correct answer to save in our database.
Correcting these errors serves two purposes:
1) We look good to our customers by having the correct address information on the billing (not just close enough).
2) We save on secondary charges from UPS by not being billed for incorrect addresses.
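A rough Python sketch of a score card like the one described above. Everything here is an assumption for illustration: the substitution table, the 0.1 penalty per wrong ZIP digit, and the function names are hypothetical, and no real UPS API call is shown. The same canonical form could also be used to spot near-duplicate rows such as "123 Wash Ave." and "123 Wash Ave".

```python
import re
from difflib import SequenceMatcher

# Hypothetical substitution table; a real one would be far longer.
SUBSTITUTIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "LANE": "LN"}


def canonical(address):
    """Upper-case, drop punctuation, and apply the standard substitutions."""
    words = re.sub(r"[^\w\s]", "", address.upper()).split()
    return " ".join(SUBSTITUTIONS.get(word, word) for word in words)


def zip_digit_errors(a, b):
    """Count positions where two ZIP codes differ (catches fat-fingering)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def score(source, candidate):
    """Higher is better: text similarity minus a penalty per wrong ZIP digit."""
    similarity = SequenceMatcher(
        None, canonical(source["address"]), canonical(candidate["address"])
    ).ratio()
    # The 0.1 weight is an arbitrary assumption, not a tuned value.
    return similarity - 0.1 * zip_digit_errors(source["zip"], candidate["zip"])


def rank(source, candidates):
    """Sort candidates from most likely to least likely."""
    return sorted(candidates, key=lambda c: score(source, c), reverse=True)


source = {"address": "123 Wash Ave.", "zip": "10035"}
candidates = [
    {"address": "123 Walsh Avenue", "zip": "10053"},
    {"address": "123 Wash Avenue", "zip": "10035"},
]
for candidate in rank(source, candidates):
    print(round(score(source, candidate), 2), candidate)
```

The ranked list would still go to a person for the final choice; the scores only decide the order in which candidates are shown.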

Is this a sensible object oriented design? [closed]

---------------------
| AbstractPersister |
---------------------
          ^
         / \
         ---
          |
  ------------------
  | OrderPersister |
  ------------------
          ^
         / \
         ---
          |
      ---------      ---------
      | Offer |<>----| Item  |
      ---------      ---------
The above is ASCII art for a UML class diagram. I have a domain class called "offer" which contains items. E.g. an offer for a trip which costs 500 bucks and contains a bus ticket, a night's stay at a motel and a show.
When the customer checks out of the store, the offer is persisted to the database as an order containing order items.
Would it make sense in a pure object oriented design to make the offer inherit from an OrderPersister? The OrderPersister can get a connection to the database using its super class, the AbstractPersister. I want to add a method to OrderPersister called Book, which will create the order and its items in the database.
If this is a bad design, what alternatives are there? I want this to be as object oriented as possible please.
Before extending a class, you should ask this question: is this an "is a" relationship?
Is an Offer an OrderPersister? I say it's not.
You should take a look at the Data Mapper pattern. It will let you use domain objects that know nothing about persistence, and still be able to persist them.
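A minimal Python sketch of the Data Mapper idea, using sqlite3 only so that it runs as-is. The class name OrderMapper, the orders/order_items tables, and the item prices are assumptions made up for illustration; only Offer, Item and the Book method come from the question.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Item:
    description: str
    price: float


@dataclass
class Offer:
    """Domain object: knows nothing about the database."""
    name: str
    items: list = field(default_factory=list)

    def total(self):
        return sum(item.price for item in self.items)


class OrderMapper:
    """Persists an Offer as an order with order items (hypothetical schema)."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.executescript(
            """
            CREATE TABLE IF NOT EXISTS orders (
                id INTEGER PRIMARY KEY,
                name TEXT NOT NULL);
            CREATE TABLE IF NOT EXISTS order_items (
                id INTEGER PRIMARY KEY,
                order_id INTEGER NOT NULL REFERENCES orders(id),
                description TEXT NOT NULL,
                price REAL NOT NULL);
            """
        )

    def book(self, offer):
        """Insert the order and its items in one transaction; return the order id."""
        with self.connection:  # commits on success, rolls back on an exception
            cursor = self.connection.execute(
                "INSERT INTO orders (name) VALUES (?)", (offer.name,)
            )
            order_id = cursor.lastrowid
            self.connection.executemany(
                "INSERT INTO order_items (order_id, description, price)"
                " VALUES (?, ?, ?)",
                [(order_id, i.description, i.price) for i in offer.items],
            )
        return order_id


trip = Offer("Trip", [Item("Bus ticket", 50.0), Item("Motel night", 150.0),
                      Item("Show", 300.0)])
mapper = OrderMapper(sqlite3.connect(":memory:"))
print("booked order", mapper.book(trip), "for", trip.total())
```

Offer does not inherit from anything here; the mapper depends on the domain class, not the other way round.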

help with tree-like structure

I've got some financial data to store and manipulate. Let's say I have 2 divisions, with offices in 2 cities, in 2 currencies, and 4 bank accounts. (It's actually more complex than that.) I want to show a list like this:
Electronics
    Chicago
        Dollars
            Account 2 -> transactions in acct2 in $ in chicago/electronics
        Euros
            Account 1 -> transactions in acct1 in E in chicago/electronics
            Account 3 -> etc.
            Account 4
    Brussels
        Dollars
            Account 1
        Euros
            Account 3
            Account 4
Dessert Toppings
    Chicago
        Dollars
            Account 1
            Account 4
        Euros
            Account 2
            Account 4
    Brussels
        Dollars
            Account 2
        Euros
            Account 3
            Account 4
So at each level except the top, the category can appear in multiple places. I've been reading around about the various methods, but none of the examples seem to address my particular use case, where nodes can appear in more than one place in the hierarchy. (Maybe there's a different name for this than "tree" or "hierarchy".)
I guess my hierarchy is actually something like Division > City > Currency with 'Electronics' and 'Euros' merely instances of each level, but I'm not quite sure how that helps or hurts.
A few notes: this is for a demo site, so the dataset won't be large -- ease of set-up and maintenance is more important than query efficiency. (I'm actually considering just building a data object by hand, though I'd much rather do it the right way.) Also, FWIW, we're working in PHP with an MS Access back-end, so any libraries out there that make this easy in that environment would be helpful. (I've found a couple of implementations of the nested set pattern already.)
Are you sure you want to use a hierarchical design for this? To me, the hierarchy seems more a consequence of the desired output format than something intrinsic to your data structure.
And what if you have to display the data in a different order, like City > Currency > Division? Wouldn't that be very cumbersome?
You could use a plain structure instead, with a table for Branches, one for Cities, one for Currencies, and then one Account table with Branch_ID, City_ID, and Currency_ID as foreign keys.
I'm not sure what database platform you're using. But if you're using MS SQL Server, then you should check out recursive queries using common table expressions (CTEs). They're easy to use and are designed for exactly the type of situation you've illustrated (a bill of materials, for instance). Check out this website for more detail: http://www.mssqltips.com/tip.asp?tip=1520
Good luck!
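For reference, a recursive CTE over a simple parent/child table might look like the sketch below. It runs on SQLite through Python's sqlite3 module so it is self-contained; SQL Server uses the same idea but writes WITH without the RECURSIVE keyword. The nodes table and its rows are made up for illustration. Note that in a strict tree like this, "Euros" needs one row under each city it appears in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),  -- NULL for a root
    name      TEXT NOT NULL
);
INSERT INTO nodes VALUES
    (1, NULL, 'Electronics'),
    (2, 1,    'Chicago'),
    (3, 2,    'Dollars'),
    (4, 2,    'Euros'),
    (5, 1,    'Brussels'),
    (6, 5,    'Euros');
""")

# Anchor member selects the roots; the recursive member walks down one level
# at a time, carrying the depth and a path used for ordering.
rows = conn.execute("""
WITH RECURSIVE tree (id, name, depth, path) AS (
    SELECT id, name, 0, name FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.name, t.depth + 1, t.path || '/' || n.name
    FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT depth, name, path FROM tree ORDER BY path
""").fetchall()

# Ordering by the path string is a simplification that works for these names.
for depth, name, _path in rows:
    print("  " * depth + name)
```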

Is there common street addresses database design for all addresses of the world? [closed]

I am a programmer and need a practical approach to storing street address structures of the world in a database. So which is the best and common database design for storing street addresses? It should be simple to use, fast to query and dynamic to store all street addresses of the world.
It is possible to represent addresses from lots of different countries in a standard set of fields. The basic idea of a named access route (thoroughfare) which the named or numbered buildings are located on is fairly standard, except in China sometimes. Other near universal concepts include: naming the settlement (city/town/village), which can be generically referred to as a locality; naming the region and assigning an alphanumeric postcode. Note that postcodes, also known as zip codes, are purely numeric only in some countries. You will need lots of fields if you really want to be generic.
The Universal Postal Union (UPU) provides address data for lots of countries in a standard format. Note that the UPU format holds all addresses (down to the available field precision) for a whole country, so it is relational. If you are storing customer addresses, where only a small fraction of all possible addresses will be stored, it's better to use a single table (or flat format) containing all fields, with one address per row.
A reasonable format for storing addresses would be as follows:
Address Lines 1-4
Locality
Region
Postcode (or zipcode)
Country
Address lines 1-4 can hold components such as:
Building
Sub-Building
Premise number (house number)
Premise Range
Thoroughfare
Sub-Thoroughfare
Double-Dependent Locality
Sub-Locality
Frequently only 3 address lines are used, but this is often insufficient. It is of course possible to require more lines to represent all addresses in the official format, but commas can always be used as line separators, meaning the information can still be captured.
Usually analysis of the data would be performed by locality, region, postcode and country, and these elements are fairly easy for users to understand when entering data. This is why these elements should be stored as separate fields. However, don't force users to supply a postcode or region, as they may not be used locally.
Locality can be unclear, particularly the distinction between map locality and postal-locality. The postal locality is the one deemed by a postal authority which may sometimes be a nearby large town. However, the postcode will usually resolve any problems or discrepancies there, to allow correct delivery even if the official post-locality is not used.
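As a rough illustration of the flat format described above, here is a Python sketch of one address record. The class and field names are assumptions, and the sample address is invented. Region and postcode are optional, the postcode is kept as text because it is not numeric everywhere, and extra address lines are folded into the last line with commas instead of being dropped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Address:
    MAX_LINES = 4  # plain class attribute, not a dataclass field

    lines: list = field(default_factory=list)  # free-form address lines 1-4
    locality: str = ""
    region: Optional[str] = None    # not used everywhere, so optional
    postcode: Optional[str] = None  # alphanumeric in many countries: keep as text
    country: str = ""

    def __post_init__(self):
        # Fold any overflow into the last line, comma-separated, so the
        # information is still captured.
        if len(self.lines) > self.MAX_LINES:
            head = self.lines[: self.MAX_LINES - 1]
            tail = ", ".join(self.lines[self.MAX_LINES - 1:])
            self.lines = head + [tail]


# Invented example with five components squeezed into four lines.
address = Address(
    lines=["Flat 2", "Rose House", "12-14 High Street", "Little Lane",
           "Upper Village"],
    locality="Sometown",
    country="GB",
)
print(address.lines)
```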
Have a look at Database Answers. Specifically, this covers many cases:
(All variable length character datatype)
AddressId
Line1
Line2
Line3
City
ZipOrPostcode
StateProvinceCounty
CountryId
OtherAddressDetails
Ask yourself what is the main purpose of storing this data? Do you intend to actually send mail to the person at the address? Track demographics, populations? Be able to ask callers for their correct address as part of some basic authentication/verification? All of the above? None of the above?
Depending on your actual need, you will determine either a) it doesn't really matter, and you can go for a free-text approach, or b) structured/specific fields for all countries, or c) country specific architecture.
Sometimes the closest you can get to a street address is the city.
I once had a project to put all the Secondary Schools in India in Google Maps. I wrote a spiffy program using the Google API and thought it would be quite easy.
Then I got the data from the client. Some school addresses were things like "Across from the market, next to the barber" or "Near old bus stand".
It made my task much harder since, unfortunately, the Google API does not support that format.
For international addresses, it is remarkably hard to find a way to format the information if it is broken down into fields. For instance, an Italian address uses:
<street address>
<zip> <town> <region>
<country>
Such as
Via Eroi della Repubblica
89861 Tropea VV
Italy
The second line is ordered rather differently from the corresponding line of a US address.
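If the address is stored as separate fields, one way to cope with this is a per-country template that decides the line order at display time. The sketch below is only an illustration: the TEMPLATES table, its keys, and the field names are assumptions, and it covers just the two layouts mentioned here.

```python
# Hypothetical per-country templates; only the two layouts discussed here.
TEMPLATES = {
    "IT": ["{street}", "{postcode} {locality} {region}", "{country}"],
    "US": ["{street}", "{locality} {region} {postcode}", "{country}"],
}


def format_address(fields, country_code):
    """Render the address fields in the line order used by the given country."""
    return "\n".join(line.format(**fields) for line in TEMPLATES[country_code])


italian = {
    "street": "Via Eroi della Repubblica",
    "postcode": "89861",
    "locality": "Tropea",
    "region": "VV",
    "country": "Italy",
}
print(format_address(italian, "IT"))
```

A country without a template would raise a KeyError here; a real system would need a fallback, for example printing the free-form lines unchanged.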
See also the SO questions:
How many address fields would you use for a UK database?
Do you break up addresses into street / city / state / zip?
How do you deal with duplicate street suffixes?
Best practices for storing postal addresses in a database (RDBMS)?
Also check out tag 'postal-code'.
Edit: Reverse order of region and town - per UPU
Maybe this is useful:
https://gist.github.com/259744
For a project I collected a table of information about all the countries of the world, including ISO codes, top-level domain, phone code, car sign, and the length and regex of the zip code.
Country names and comments unfortunately only in German...
Unlike the other answers here, I believe it's possible to have a structured address database.
Off the top of my head, I can think of the following structure:
Country
Region (State / Province)
Locality (City / Municipality)
Sub-Locality (County / other sub-division of a locality)
Street
But how to query it fast enough?
One way I think this can be accomplished is to ask for the ZIP code (or postal code), whose format varies from country to country but is consistent within a country.
This way you can structure your data around the information provided by the postal offices around the world.
Depends on how free-form you are prepared to go with the fields. One free-form address field will obviously always do, but be of relatively little help narrowing down geography.
The problem you'll have is that there is too much variation in the level of geographical hierarchy across countries. Heck, some countries do not even have 'street addresses' everywhere.
I recommend you don't try to make it too clever.
Len Silverston of Universal Data Model fame recommends a separate hierarchy of GEOGRAPHIC BOUNDARIES and depending on how much free-formed-ness you're willing to accept either simple STREET ADDRESS LINEs or per-country derivatives.
No, absolutely not. If you compare the way US and Japanese addresses work, you'll see that it's not possible.
UPDATE:
On second thought, anything can be done, but there's a trade-off.
One approach is to model the problem with address and address_attribute tables, with a 1:m relationship between them. That way anything can be modeled. The address_attribute table would have a pk, a name, a value, and an fk that points back to its address parent's pk. It's almost like using a Map with name, value pairs.
The trade-off is having to do a JOIN every time you want an address. You also have to interrogate the names of the address_attributes to figure out what you're dealing with each time.
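A minimal sketch of the address / address_attribute layout, using Python's sqlite3 so it runs as-is. The column names follow the description above; the helper functions and the attribute names in the example are assumptions. The JOIN in load_address is the cost mentioned above, and the caller still has to know which attribute names to expect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY);
CREATE TABLE address_attribute (
    id         INTEGER PRIMARY KEY,
    address_id INTEGER NOT NULL REFERENCES address(id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL
);
""")


def save_address(attributes):
    """Store one address as a parent row plus one row per name/value pair."""
    with conn:  # one transaction for the parent and all attributes
        address_id = conn.execute("INSERT INTO address DEFAULT VALUES").lastrowid
        conn.executemany(
            "INSERT INTO address_attribute (address_id, name, value)"
            " VALUES (?, ?, ?)",
            [(address_id, name, value) for name, value in attributes.items()],
        )
    return address_id


def load_address(address_id):
    """Read the attributes back into a dict; this needs the JOIN every time."""
    rows = conn.execute(
        """SELECT aa.name, aa.value
           FROM address a
           JOIN address_attribute aa ON aa.address_id = a.id
           WHERE a.id = ?""",
        (address_id,),
    )
    return dict(rows)


us_id = save_address({"street1": "123 Broad Way", "city": "Springfield",
                      "state": "IL", "zip": "62701"})
it_id = save_address({"street": "Via Eroi della Repubblica", "zip": "89861",
                      "town": "Tropea", "region": "VV", "country": "Italy"})
print(load_address(us_id))
print(load_address(it_id))
```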
Another approach would be to do more comprehensive research on how addresses are modeled around the world. In an object-oriented world you might have the western Address class (street1/street2/city/state/zip) and others for Japan, China, as many as needed to tile the address space. Then you'd have a master Address table and child tables to the other types with a 1:1 relationship between them.
How does Amazon or eBay do it? They ship internationally. Do they have locale-specific UI features? I've only used the US locale.
No, there is no standard addressing scheme. It usually varies from country to country.
Even the Universal Postal Union says, in Addressing the world, an address for everyone, that there is none. The best solution for this is to use the 2- or 3-letter country codes from ISO 3166 and handle everything else according to each country's standards.
However, if you are really desperate to use easily accessible tools for your project, you can try the Google Places API.
Your design should depend strongly on your purpose. Some people have posted how to structure the data. If you simply want to send snail mail to someone, that will do. Things get complicated if you want to use this data for navigation. Car navigation will require additional structures to hold traffic info (e.g. one-way roads), while foot navigation will require a lot of additional data. Here is a small example: in my city, my neighborhood is near the park. Next to the park is a former airfield (in fact, one of the oldest in Europe) turned into an aviation museum. Next to the aviation museum is a business park. The street number for the museum is 39, while the business park numbers start with 39A. So it may seem that 39 and 39A are close, but it takes about a mile to walk from one to the other (and even longer if going by car).
This is just a small example taken from my city. I think you can probably find a lot of exceptions (especially in rural or wilder parts of every country).