What is the industry-standard way to store country / state / city in the database of a web app?

For countries and states there are ISO codes. For cities there are not.
Method 1:
Store in one column:
[Country ISO]-[State ISO]-[City Name]
Method 2:
Store in 3 separate columns.
Also, how should city names be handled if there is no unique identifier?

First and foremost, use three separate columns to store your data. If you want to create a unique identifier, the easiest way is to assign a random 3-10 digit code, depending on the size of your data set. However, if you have a small data set and want some degree of human readability, I would suggest concatenating [country-code]-[state-code]-[code]. The code part can be several things. Here are some ideas:
a random ID, or simply the database row ID
the licence plate code, if the city has one
the phone area code of the city, or the code of its center
the same logic may apply to ZIP/postal codes
a combination of the latitude and longitude of the city center, rounded to a certain precision
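As a rough illustration of the three-column approach, here is a minimal sketch in SQLite via Python's built-in sqlite3 module. It assumes ISO 3166-1 alpha-2 country codes, the subdivision part of ISO 3166-2 codes for states (e.g. "IL" from "US-IL"), and a hypothetical city_code column filled from one of the ideas above; the sample codes are illustrative only. The human-readable [country]-[state]-[code] key is built at query time, so the three columns stay the single source of truth, and the UNIQUE constraint keeps two cities with the same name apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE city (
        id           INTEGER PRIMARY KEY,  -- surrogate key (the "database row id" option)
        country_code TEXT NOT NULL,        -- ISO 3166-1 alpha-2, e.g. 'US'
        state_code   TEXT NOT NULL,        -- subdivision part of ISO 3166-2, e.g. 'IL'
        city_name    TEXT NOT NULL,        -- names alone are NOT unique
        city_code    TEXT NOT NULL,        -- hypothetical code you choose
        UNIQUE (country_code, state_code, city_code)
    )
""")

conn.executemany(
    "INSERT INTO city (country_code, state_code, city_name, city_code) VALUES (?, ?, ?, ?)",
    [
        ("US", "IL", "Chicago", "CHI"),      # codes here are illustrative only
        ("US", "IL", "Springfield", "SPI"),
        ("US", "MO", "Springfield", "SGF"),  # same name, different state
    ],
)


def city_keys(conn):
    """Build the human-readable [country]-[state]-[code] key on the fly."""
    rows = conn.execute(
        "SELECT country_code || '-' || state_code || '-' || city_code FROM city ORDER BY id"
    )
    return [r[0] for r in rows]


print(city_keys(conn))  # ['US-IL-CHI', 'US-IL-SPI', 'US-MO-SGF']
```

If you would rather store the combined key, it could be persisted as a column instead, at the cost of keeping it in sync with the three source columns.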
Here are some more references that can be used:
ISO 3166 is the standard for country codes. Its second part, ISO 3166-2, covers subdivisions, so depending on the country you can find codes for states or even cities there.
As mentioned, IATA maintains lists of both airport codes and city codes, but they are hard to obtain.
The UN/LOCODE list is also worth mentioning, but it can be difficult to tell the levels of differentiation apart: an airport code, a city code and a borough code can all appear in the same list, although each UN/LOCODE is unique. (ICAO also assigns airport codes, similar to IATA's but not the same.)
There are several data sets, such as OpenTravelData or GeoNames, that can serve as a starting point, though they may require some digging and converting. They provide unique codes for locations, and many other such data sets exist.
Bonus:
I would suggest looking at Schema.org's City type and the other Place types for a well-thought-out setup.

Related

How can my address column in my orders table reference multiple types of address formats?

I'm doing this in SQLite.
The problem is that my orders table needs an address to deliver to, but there are many different address formats around the world, so I have a table for each format.
Examples of formats:
US and similar addresses - Place name, street address, apt number, city, state, country, zip code
Certain places in Africa - Place name, country, state, neighborhood, landmark, distance from landmark, direction from landmark
There are many other valid delivery address formats.
So my ideas are:
Add nullable columns to the orders table that reference each address format, and use a CHECK constraint to confirm there is at least one address.
Have separate tables such as order_us_address and order_africa_address.
Create a table called master_address_id and reference it from each address format table, so that I can just reference that master_address_id in the orders table.
What is the best practice here? Is there another option?
A single address table should work fine. I assume you want to hold all the individual address fields, so just add them all to the table. For any individual address record, many of these columns will be null, but that's not an issue.
Then add columns for the mailing address lines (as many as the maximum number of lines a mailing address can have). Using the logic you presumably already know for constructing a mailing address in each country, populate these mailing address lines accordingly.
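A minimal sketch of the single-table idea, using SQLite through Python's sqlite3 module. The field list, the five-line limit and the per-country LINE_RULES are hypothetical placeholders, not real postal formatting rules. The point is that every format shares one nullable address table, the orders table needs only one address_id foreign key, and the mailing lines are computed once on insert.

```python
import sqlite3

# Every field any supported address format may need (all nullable; hypothetical list).
FIELDS = ["place_name", "street_address", "apt_number", "city", "state", "zip_code",
          "neighborhood", "landmark", "landmark_distance", "landmark_direction", "country"]
MAX_LINES = 5  # assumed maximum number of mailing-address lines

# Hypothetical formatting rules: each inner list becomes one mailing line.
LINE_RULES = {
    "US": [["place_name"], ["street_address", "apt_number"],
           ["city", "state", "zip_code"], ["country"]],
    "default": [["place_name"], ["landmark", "landmark_distance", "landmark_direction"],
                ["neighborhood"], ["state"], ["country"]],
}


def mailing_lines(addr):
    """Build the mailing lines for one address, skipping empty fields and lines."""
    rules = LINE_RULES.get(addr.get("country"), LINE_RULES["default"])
    lines = []
    for fields in rules:
        text = ", ".join(addr[f] for f in fields if addr.get(f))
        if text:
            lines.append(text)
    return lines[:MAX_LINES]


conn = sqlite3.connect(":memory:")
field_cols = ", ".join(f"{f} TEXT" for f in FIELDS)
line_cols = ", ".join(f"line{i} TEXT" for i in range(1, MAX_LINES + 1))
conn.executescript(f"""
    CREATE TABLE address (id INTEGER PRIMARY KEY, {field_cols}, {line_cols});
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        address_id INTEGER NOT NULL REFERENCES address(id)  -- one FK for every format
    );
""")


def add_address(conn, addr):
    """Insert the raw fields plus the precomputed mailing lines; return the new id."""
    lines = mailing_lines(addr)
    lines += [None] * (MAX_LINES - len(lines))  # pad unused lines with NULL
    cols = FIELDS + [f"line{i}" for i in range(1, MAX_LINES + 1)]
    values = [addr.get(f) for f in FIELDS] + lines
    placeholders = ", ".join("?" * len(cols))
    sql = f"INSERT INTO address ({', '.join(cols)}) VALUES ({placeholders})"
    return conn.execute(sql, values).lastrowid


us_id = add_address(conn, {"place_name": "Example Corp", "street_address": "1 Main St",
                           "apt_number": "Apt 4", "city": "Springfield", "state": "IL",
                           "zip_code": "62701", "country": "US"})
conn.execute("INSERT INTO orders (address_id) VALUES (?)", (us_id,))
```

A landmark-style address simply leaves street_address, city and zip_code as NULL and falls through to the "default" rule.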

How to design table or collection for storing destinations?

I have a task to store data about delivery destinations, i.e. the places where companies can ship postal parcels.
The trivial way is to create a table:
CompanyShippmentPlaces
id | country | city
There are some design issues:
What if parcels need to be delivered to towns or villages, not cities? Does that mean altering the table?
What if a company needs to specify part of a city, town or village?
What if the destinations have the same name?
How I plan to use this data:
When the system gets an order, the order should be distributed across all companies. I must get all companies that can deliver this product.
This pushes me towards a NoSQL database, but I am not confident about it.
What do you think about that?
What if parcels need to be delivered to towns or villages, not cities? Does that mean altering the table?
This is solved by jaimish11's answer.
Personally, I would rename "City" to "locality" (or something comparable) to make it more general.
What if a company needs to specify part of a city, town or village?
I think this is solved by the address lines.
What if the destinations have the same name?
Normally (as far as I know) each location in a country has its own PIN or ZIP code, so if the names are ambiguous, the postal service uses the code to identify the location. (To be sure, you should check with the postal service, at least in your own country.)
When the system gets an order, the order should be distributed across all companies. I must get all companies that can deliver this product.
I would get all locations where the product is available and then, from that selection, pick the location nearest to the destination city. You could also store the nearest location for each city in your "city" table.
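The two-step lookup above could be sketched like this in SQLite via Python's sqlite3 module. The location and stock tables, the sample coordinates, and the use of the haversine formula to define "nearest" are all assumptions for illustration; it also assumes you know the destination city's coordinates (e.g. from a data set such as GeoNames).

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: where each company ships from, and what is in stock there.
    CREATE TABLE location (id INTEGER PRIMARY KEY, company TEXT, lat REAL, lon REAL);
    CREATE TABLE stock (location_id INTEGER REFERENCES location(id), product TEXT);
    INSERT INTO location VALUES
        (1, 'CompanyA', 41.88, -87.63),
        (2, 'CompanyB', 40.71, -74.01),
        (3, 'CompanyC', 42.00, -87.90);
    INSERT INTO stock VALUES (1, 'widget'), (2, 'widget'), (3, 'gadget');
""")


def locations_with_product(conn, product):
    """Step 1: every location (and its company) where the product is available."""
    return conn.execute("""
        SELECT l.id, l.company, l.lat, l.lon
        FROM location l JOIN stock s ON s.location_id = l.id
        WHERE s.product = ?
    """, (product,)).fetchall()


def nearest_location(conn, product, city_lat, city_lon):
    """Step 2: of those, the one closest to the destination city (None if none)."""
    candidates = locations_with_product(conn, product)
    if not candidates:
        return None
    return min(candidates, key=lambda r: haversine_km(r[2], r[3], city_lat, city_lon))


best = nearest_location(conn, "widget", 43.04, -87.91)
print(best[1])  # CompanyA (CompanyC is closer but has no widgets)
```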
The issue you're describing isn't actually an issue. No matter what database you use (SQL or NoSQL), you can simply have all the address fields you need, such as:
Address Line 1
Address Line 2
Landmark
City
PIN code / ZIP code
State
Country
This way, it doesn't matter whether the destination is a city, a town or a village.

How to fix spelling mistakes in a database with nested records (countries, states, cities and hotels)

I have a database with country, state, city and hotel tables. The country table has multiple records for the same country: for example, Mexico is misspelled as "maxico" and "mxico" alongside the correct "mexico", and the USA appears as "usa", "united states of america" and "america". Each of these duplicate countries has its own (often misspelled) states, and each state has its own misspelled cities. The hotels, however, are unique, and I want to attach each hotel to its correct city, state and country; for example, a hotel in the city of Chicago, in the state of Illinois, in the country USA. Please help me: how can I fix this?
You could run an update if you know all the different incorrect variants:
update tbl
set country = 'Mexico'
where country in ('maxico', 'mxico');
Well, you can list all the distinct values in the country column and check whether each one is correct. If it is wrong, just use an UPDATE statement to fix it, like below:
update my_table set country = 'Mexico' where country in ('maco', 'xico');
It depends on the infrastructure you're running.
If you have access to ETL tools, they often have data-quality capabilities, frequently backed by reference databases for correcting addresses. Those are usually paid.
If you are a "private" developer, you might not want to pay for data, so you can look for open data sources, such as the Allegheny County addresses data set on https://catalog.data.gov.
You can use a multitude of algorithms and solutions, ranging from simple distances between words to neural networks pre-trained to do exactly this.
This type of data problem is hard. There is no simple built-in way to determine the "right spelling". Many databases have one of two built-in capabilities that can help: Soundex algorithms or Levenshtein distance.
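To show what Levenshtein distance does, here is a small pure-Python sketch that matches raw values against a list of correct names. The reference list and the max_distance of 2 are assumptions for illustration. Note that it catches typos like "maxico" but cannot map "usa" to "United States of America", which is why a hand-maintained table of alternative names is still useful.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def best_match(value, reference, max_distance=2):
    """Closest correct name (case-insensitive), or None if nothing is close enough."""
    distance, name = min((levenshtein(value.lower(), ref.lower()), ref) for ref in reference)
    return name if distance <= max_distance else None


REFERENCE = ["Mexico", "United States of America", "Canada"]  # illustrative list
for raw in ["maxico", "mxico", "mexico", "usa"]:
    print(raw, "->", best_match(raw, REFERENCE))
# maxico -> Mexico, mxico -> Mexico, mexico -> Mexico, usa -> None
```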
What should you do? If you really want to fix this problem, create a table that maps each misspelled name to the correct value you want. This table will need to be maintained manually, for example in a spreadsheet. Then use this table when importing data, keeping only the corrected value.
Better yet, set up a reference table containing only the correct names, plus a second table of alternative names that is maintained as above.
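A minimal sketch of the reference table plus alternative-names table, in SQLite via Python's sqlite3 module. Table and column names are hypothetical, aliases are stored lower-case so matching is case-insensitive, and only a country column is shown; states and cities would need the same treatment, with their aliases scoped to the parent country or state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference table: only the correct names.
    CREATE TABLE country (name TEXT PRIMARY KEY);
    -- Alternative names, maintained by hand; aliases are stored lower-case.
    CREATE TABLE country_alias (
        alias TEXT PRIMARY KEY,
        name  TEXT NOT NULL REFERENCES country(name)
    );
    -- Hypothetical data table with inconsistent spellings.
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, country TEXT);

    INSERT INTO country VALUES ('Mexico'), ('United States of America');
    INSERT INTO country_alias VALUES
        ('mexico', 'Mexico'), ('maxico', 'Mexico'), ('mxico', 'Mexico'),
        ('usa', 'United States of America'),
        ('america', 'United States of America'),
        ('united states of america', 'United States of America');
    INSERT INTO hotel (name, country) VALUES
        ('Hotel A', 'maxico'), ('Hotel B', 'USA'), ('Hotel C', 'Mexico');
""")

# Replace every known variant with its correct name (case-insensitive via lower()).
conn.execute("""
    UPDATE hotel
    SET country = (SELECT a.name FROM country_alias a
                   WHERE a.alias = lower(trim(hotel.country)))
    WHERE EXISTS (SELECT 1 FROM country_alias a
                  WHERE a.alias = lower(trim(hotel.country)))
""")

# Anything still not in the reference table needs a new alias row.
unmapped = conn.execute(
    "SELECT DISTINCT country FROM hotel WHERE country NOT IN (SELECT name FROM country)"
).fetchall()
print(conn.execute("SELECT name, country FROM hotel ORDER BY id").fetchall())
print(unmapped)  # []
```

The same UPDATE can run on every import, so new misspellings only require a new row in country_alias.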

Linking two separate sets of data codes without a common identifier

I have two large sets of data. Both are structured coding systems used to categorize groups of people by occupation. The two sets of data have no common identifier. Besides a column containing a unique identifier, each table has a description of that identifier, but although the descriptions may refer to similar things, they are not identical.
How do I create a table that connects the two sets of data without having to go back and manually work out the connection between the two identifiers? I am not sure whether this can be done in Access or SQL. If there is a way to do this, I would like to know what software might be out there.
Here's some example data:
Table 1:
Z Identifier | DescriptionA
162000 | Pharmacist
3123566 | Electronic Repairman
143246 | Banker
8444455 | Doctor
Table 2:
Q Identifier | DescriptionB
XX134556 | COPY/PRINT/SCAN EQUIP
666Q1224 | DRUGS
722WWYZ | Financial Svc
8456435T | Medical Services
15666PP | Health Services
Desired output:
Table 3:
Z Identifier | DescriptionA | Q Identifier | DescriptionB
162000 | Pharmacist | 666Q1224 | DRUGS
3123566 | Electronic Repairman | XX134556 | COPY/PRINT/SCAN EQUIP
143246 | Banker | 722WWYZ | Financial Svc
8444455 | Doctor | 8456435T | Medical Services
Conventional tools that you are used to (like Access, Excel, and SQL) can only go so far with comparing the meaning and usage of words.
In other words (forgive the pun), to do this you need some sort of natural language processing toolkit (NLPT). Along with that, you also need some knowledge of programming, because I don't think there are front-end interfaces that can produce the output you want from only the input you listed, just by filling out some forms.
So with that in mind, to solve your problem (I'll assume you know how to program and can pick up an NLPT in a language of your choice), you need to do the following:
Put your two datasets in some tables.
Preprocess DescriptionA and DescriptionB into something meaningful to the NLPT you are using. Toolkits won't like a string such as "COPY/PRINT/SCAN EQUIP"; they'll want the slashes removed and the words separated.
Compare each DescriptionA with each DescriptionB (every pairwise combination) using a path_similarity-type function from the library. For example, path_similarity('animal.definition1', 'dog.definition1') should return a relatively high value, say .60, while path_similarity('animal.definition1', 'book.definition1') should return a low value, like .10.
If the path_similarity is above a threshold (up to you to decide), join the two items, append them as a single row to a results table, and remove them from their respective tables. Continue until no remaining DescriptionA has a similarity above the threshold to any DescriptionB. Then decide what to do with the rows left over in Table 1 and Table 2.
This should all be fairly easy to do programmatically. You may find that this method produces poor matches in some places, because the pairs are chosen in an essentially arbitrary order: an item can be paired with a mediocre match before its best match comes up. Because of that, you may want an algorithm other than plain pairwise comparison, perhaps one that looks at the path_similarity of every item against every other item before deciding on matches.
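A rough sketch of these steps in Python. Because a real NLP toolkit (e.g. WordNet's path_similarity in NLTK) needs extra data downloads, toy_similarity here is a hypothetical stand-in built on a tiny hand-made word-to-concept map; you would swap in a proper similarity function. Following the suggestion above, it scores every pair first and then accepts the highest-scoring pairs, instead of pairing in arbitrary order.

```python
import re
from itertools import product


def normalize(text):
    """Lower-case and split on slashes, spaces and punctuation."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


# Toy stand-in for an NLP toolkit: map words to a shared concept (hypothetical data).
CONCEPTS = {"pharmacist": "drug", "drugs": "drug",
            "electronic": "equipment", "equip": "equipment",
            "banker": "finance", "financial": "finance",
            "doctor": "medical"}


def toy_similarity(a, b):
    """Jaccard overlap of concept sets; replace with e.g. a WordNet-based score."""
    ca = {CONCEPTS.get(w, w) for w in normalize(a)}
    cb = {CONCEPTS.get(w, w) for w in normalize(b)}
    return len(ca & cb) / len(ca | cb) if ca | cb else 0.0


def match(table_a, table_b, similarity, threshold):
    """Score every (A, B) pair, then accept the best-scoring pairs first."""
    scored = sorted(
        ((similarity(da, db), ia, da, ib, db)
         for (ia, da), (ib, db) in product(table_a, table_b)),
        key=lambda t: t[0], reverse=True)
    used_a, used_b, matches = set(), set(), []
    for score, ia, da, ib, db in scored:
        if score < threshold:
            break  # everything after this is below the threshold too
        if ia in used_a or ib in used_b:
            continue  # each row is paired at most once
        matches.append((ia, da, ib, db))
        used_a.add(ia)
        used_b.add(ib)
    left_a = [r for r in table_a if r[0] not in used_a]
    left_b = [r for r in table_b if r[0] not in used_b]
    return matches, left_a, left_b


TABLE_1 = [("162000", "Pharmacist"), ("3123566", "Electronic Repairman"),
           ("143246", "Banker"), ("8444455", "Doctor")]
TABLE_2 = [("XX134556", "COPY/PRINT/SCAN EQUIP"), ("666Q1224", "DRUGS"),
           ("722WWYZ", "Financial Svc"), ("8456435T", "Medical Services"),
           ("15666PP", "Health Services")]

matches, left_a, left_b = match(TABLE_1, TABLE_2, toy_similarity, threshold=0.2)
for row in matches:
    print(row)
print("unmatched:", left_a, left_b)
```

With the sample data and a threshold of 0.2, this pairs the four rows as in the desired Table 3 and leaves "Health Services" unmatched.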
Additionally, you may want to allow more than two items to be grouped together. For example, "lumberjack", "tree cutter" and "tree chopper" make more sense grouped in one row (with two additional columns) than throwing one of them out, which would likely leave it without a pair. I'm sure none of the problems in this paragraph are new, and you can search around the internet for ways to solve them. Best of luck!

Localisation of country names

As part of the addresses in my SQL database, I store country codes (e.g. US, DE, ...). I then have another table (with two columns) in my database which translates the country codes into the English names of the respective countries.
If I want to make the site multilingual, I could expand this translation table by adding country names in languages other than English.
I was wondering whether there is another method that does not involve modifying the database, e.g. using gettext to translate the English country names?
The typical way to handle this is to change the table structure to have three columns, instead of two:
Language
CountryCode
FullName
Whenever you query the database, you would provide the current language.
You then have to change your code to include the additional language key in any queries.
Depending on how you keep track of the current language, you might also use a view or a user-defined function.
You don't want to use automated translation, since the name of a country like "China" could turn into the equivalent of "porcelain".
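A minimal sketch of the three-column table (Language, CountryCode, FullName) in SQLite via Python's sqlite3 module, with snake_case column names. The fallback to English when a translation is missing is an assumption, not part of the answer above; the helper function plays the role the view or user-defined function would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country_name (
        language     TEXT NOT NULL,   -- e.g. 'en', 'de'
        country_code TEXT NOT NULL,   -- the code stored with the address, e.g. 'US'
        full_name    TEXT NOT NULL,
        PRIMARY KEY (language, country_code)
    );
    INSERT INTO country_name VALUES
        ('en', 'US', 'United States'), ('en', 'DE', 'Germany'), ('en', 'FR', 'France'),
        ('de', 'US', 'Vereinigte Staaten'), ('de', 'DE', 'Deutschland');
""")


def country_name(conn, code, language, fallback="en"):
    """Name in the requested language, else in the fallback language, else None."""
    row = conn.execute(
        """
        SELECT COALESCE(
            (SELECT full_name FROM country_name WHERE language = ? AND country_code = ?),
            (SELECT full_name FROM country_name WHERE language = ? AND country_code = ?))
        """,
        (language, code, fallback, code),
    ).fetchone()
    return row[0]


print(country_name(conn, "DE", "de"))  # Deutschland
print(country_name(conn, "FR", "de"))  # France (no German row, falls back to English)
```

Because the names are stored rather than machine-translated, you control exactly what each country is called in each language.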