How to fix spelling mistakes in a database where the same value appears under several misspelled variants - sql

I have a database with country, state, city and hotel tables. The country table contains multiple records for the same country: for example, Mexico appears as maxico, mxico and mexico, and the USA appears as usa, united states of america and america. Each of these misspelled country records has its own set of misspelled states, and the states in turn have misspelled cities. The hotels are unique, and I want to assign each one to its correct city, state and country; for example, a hotel in Chicago should belong to the city Chicago, the state Illinois and the country USA. How can I fix this?

You could do an update if you know all the incorrect variants:
update tbl
set country = 'Mexico'
where country in ('maxico', 'mxico');

Well, you can list all the values the country column has and check whether each one is right. If a value is wrong, just use an update statement to fix it, like below:
update my_table set country = 'Mexico' where country in ('maxico', 'mxico');

It depends on the infrastructure you're running.
If you have access to ETL tools, they often have data quality capabilities, often backed by reference databases used for correcting addresses. Those are often paid.
If you are a "private" developer, you might not want to pay for data, so you can look for open data sources, such as the Allegheny County addresses on https://catalog.data.gov.
You can use a multitude of algorithms and solutions, ranging from simple distances in word space to neural networks pre-trained to do just that.

This type of data problem is hard. There is no built-in simple way to determine the "right spelling". Many databases have one of two capabilities built in that can help -- either "soundex" algorithms or Levenshtein distance.
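As a rough illustration of the Levenshtein idea, here is a minimal sketch in plain Python. The function names and the CANONICAL list are made up for this example, and the threshold of 2 edits is an arbitrary assumption, not a recommendation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


# Hypothetical list of correct names, for illustration only.
CANONICAL = ["Mexico", "United States", "Canada"]


def suggest(value, canonical=CANONICAL, max_distance=2):
    """Return canonical names within max_distance edits of value, closest first."""
    value = value.strip().lower()
    scored = [(levenshtein(value, name.lower()), name) for name in canonical]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


print(suggest("maxico"))   # close to "Mexico"
print(suggest("america"))  # nothing is close: this is an alias, not a typo
```

Note that edit distance only catches typos. Something like "america" for "United States" is an alias rather than a misspelling, so it still needs a manually maintained mapping.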
What should you do? If you really want to fix this problem, create a table with the misspelled name and the correct value that you want. This table will need to be maintained manually, such as in a spreadsheet. Then use this table when importing data and use only the rectified value.
Better yet, set up a reference table with only the correct names. Create a second table with alternative names, which is maintained as above.
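A minimal sketch of the reference-table-plus-alias-table idea, using Python's sqlite3 module only so that it runs anywhere. The table and column names (country, country_alias, hotel) and the sample rows are assumptions for illustration; the same SQL should carry over to other databases with minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: only the correct names.
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
-- Manually maintained alternative spellings, stored in lower case.
CREATE TABLE country_alias (
    alias      TEXT PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES country(country_id)
);
-- Hypothetical dirty data.
CREATE TABLE hotel (
    hotel_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    country  TEXT NOT NULL
);
INSERT INTO country VALUES (1, 'Mexico'), (2, 'United States');
INSERT INTO country_alias VALUES
    ('maxico', 1), ('mxico', 1), ('mexico', 1),
    ('usa', 2), ('america', 2), ('united states of america', 2);
INSERT INTO hotel (name, country) VALUES
    ('Hotel A', 'maxico'), ('Hotel B', 'USA'),
    ('Hotel C', 'Mexico'), ('Hotel D', 'Atlantis');
""")

# Rewrite every value that has a known alias to its canonical name.
conn.execute("""
UPDATE hotel
SET country = (
    SELECT c.name
    FROM country_alias a
    JOIN country c ON c.country_id = a.country_id
    WHERE a.alias = lower(trim(hotel.country))
)
WHERE lower(trim(country)) IN (SELECT alias FROM country_alias)
""")

rows = conn.execute("SELECT name, country FROM hotel ORDER BY hotel_id").fetchall()

# Values that still match nothing need a human to add an alias.
unmatched = conn.execute("""
SELECT DISTINCT country FROM hotel
WHERE country NOT IN (SELECT name FROM country)
""").fetchall()

print(rows)
print(unmatched)
```

The unmatched query is the to-do list for whoever maintains the alias table: each new spelling gets added once and is fixed automatically from then on.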

Related

What is the industry standard way to store country / state / city in a database of web APP?

For country and state, there are ISO codes. For cities, there are none.
Method 1:
Store in one column:
[Country ISO]-[State ISO]-[City Name]
Method 2:
Store in 3 separate columns.
Also, how to handle city names if there is no unique identifier?
First and foremost, use three separate columns to keep your data. If you want to create a unique identifier, the easiest way would be to assign a random 3-10 digit code, depending on the size of your data set. However, I would suggest concatenating [country-code]-[state-code]-[code] if you have a small data set and want a degree of human readability. The code part can be several things. Here are some ideas:
a random id or even a database row id
the licence plate number/code, if the city has one
the phone area code of the city or of its center
the same logic may apply to zip codes
a combination of the latitude and longitude of the city center, up to a certain precision
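To make the three-columns-plus-derived-key idea concrete, here is a small sketch using Python's sqlite3 module. It uses the database row id as the code part, which is just one of the options listed above; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    city_id      INTEGER PRIMARY KEY,  -- row id, used as the [code] part
    country_code TEXT NOT NULL,        -- ISO 3166-1 alpha-2 country code
    state_code   TEXT NOT NULL,        -- subdivision part of the ISO 3166-2 code
    name         TEXT NOT NULL,
    UNIQUE (country_code, state_code, name)
);
INSERT INTO city (country_code, state_code, name) VALUES
    ('US', 'IL', 'Springfield'),
    ('US', 'MA', 'Springfield'),
    ('US', 'IL', 'Chicago');
""")

# The readable key is derived on demand instead of being stored in one column.
keys = conn.execute("""
SELECT country_code || '-' || state_code || '-' || city_id AS city_key, name
FROM city
ORDER BY city_id
""").fetchall()
print(keys)

# The UNIQUE constraint rejects a second identical city in the same state.
try:
    conn.execute(
        "INSERT INTO city (country_code, state_code, name) VALUES ('US', 'IL', 'Chicago')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Keeping the parts in separate columns means you can still filter or index by country or state, while the concatenated key stays available for display.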
Here are some more references that can be used:
ISO 3166 is the standard for country codes. It also defines codes for subdivisions such as states (and in some cases cities), depending on the country.
As mentioned, IATA has both airport and city code lists, but they are hard to obtain.
The UN location list (UN/LOCODE) is worth mentioning, but it can be difficult to tell the levels apart, since an airport code, a city code and a borough code can all be on the same list; still, each UN/LOCODE must be unique. (Airport codes are also issued by ICAO; they are similar to IATA codes but not the same.)
There are several data sets out there, like OpenTravelData or GeoNames, that can be used as a start but may require digging and converting. They provide unique codes for locations, and many others can be found.
Bonus:
I would suggest checking Schema.org's City Schema and other Place Schemas for a conscious setup.

How to design table or collection for storing destinations?

I have a task to store data about delivery destinations, that is, the places where companies can ship postal parcels.
The trivial way is to create a table:
CompanyShippmentPlaces
id | country | city
There are some design issues:
What if a parcel needs to be delivered to a town or village, not a city? Does that mean altering the table?
What if a company needs to specify a part of a city, town, or village?
What if destinations have the same name?
How I plan to use this data:
When the system gets an order, the order should be distributed across all companies. I must get all companies that can deliver this product.
This pushes me towards a NoSQL database, but I am not confident.
What do you think about that?
What if a parcel needs to be delivered to a town or village, not a city? Does that mean altering the table?
This would be solved by the solution of jaimish11.
Personally, I would rename "City" to "locality" (or something comparable) to generalize.
What if a company needs to specify a part of a city, town, or village?
I think this is solved by the address lines.
What if destinations have the same name?
Normally (as far as I know) each location in a country has its own PIN code or zip code, and if the name is ambiguous the postal service uses the code to identify the location. (To be sure, you should ask the postal service, at least in your country.)
When the system gets an order, the order should be distributed across all companies. I must get all companies that can deliver this product.
I would get all locations where the product is available and then pick, from that selection, the location nearest to the city. Maybe you could save the nearest location for each city in your "city table".
The issue you're describing isn't actually an issue. No matter what database you use (SQL or NoSQL) you can simply have all the address fields you need such as:
Address Line 1
Address Line 2
Landmark
City
Pincode/ Zipcode
State
Country
This way, it won't matter whether it's city, town or village.
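As a sketch of how those address fields could answer "which companies deliver here?", the example below uses Python's sqlite3 module with a generic locality column and matches on country plus postal code. All table names, company names and postal codes are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE destination (
    destination_id INTEGER PRIMARY KEY,
    company_id     INTEGER NOT NULL REFERENCES company(company_id),
    address_line1  TEXT,            -- optional: district or part of the locality
    locality       TEXT NOT NULL,   -- city, town, or village
    postal_code    TEXT NOT NULL,
    state          TEXT,
    country        TEXT NOT NULL
);
INSERT INTO company VALUES (1, 'FastShip'), (2, 'RuralPost');
-- Two localities share a name; the made-up postal codes tell them apart.
INSERT INTO destination (company_id, locality, postal_code, state, country) VALUES
    (1, 'Springfield', '11111', 'IL', 'US'),
    (2, 'Springfield', '11111', 'IL', 'US'),
    (2, 'Springfield', '22222', 'MA', 'US');
""")


def companies_for(country, postal_code):
    """Names of all companies that list the given postal code as a destination."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM company c
        JOIN destination d ON d.company_id = c.company_id
        WHERE d.country = ? AND d.postal_code = ?
        ORDER BY c.name
        """,
        (country, postal_code),
    ).fetchall()
    return [name for (name,) in rows]


print(companies_for("US", "11111"))
print(companies_for("US", "22222"))
```

Nothing here requires a schema change when a town or village is added, and nothing about the lookup needs a NoSQL store.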

What is the most performant way to build and execute a multiple where clause in SQL from a single table of identifiers?

Here's the challenge. Users want to be able to create filters based on N criteria, and the criteria used for the filter form a fluid hierarchy. To simplify it, let's use two hierarchies that the user could select from:
All Territories
  Europe
    UK
    France
  Americas
    US
    Canada
    Mexico
Media
  Music
    Downloads
    CDs
  Movies
    Streaming
    DVD
Objects would have a table of tags associated with them. The ObjectsTags table would contain an indicator as to which type of data the tag is linked to.
The issue is that the user would want to select and group the tags they want to filter by. So they might want Movies in Europe, so they would select those three tags as a single grouped filter. It's easy enough to get a filter based on those three tags that says:
Any object that has a tag of: (All Territories OR Europe OR UK OR France) AND (All Media OR Movies OR DVD OR Streaming). The challenge is that I need to support any number of new hierarchies that might be needed and any level of filters, since a user could also want a filter that returns everything from that filter as well as all of the CDs in the US.
Is there any new feature in SQL Server that would be better suited for handling this type of a where clause in a performant way?
You are either going to have to create your where clause dynamically, or you will pre-create the SQL using a where clause similar to the following:
where country = coalesce(p_country, country)
and media = coalesce(p_media, media)
and music = coalesce(p_music, music)
The really cool part of this statement? Your performance will be the worst that it can possibly be.
I recommend creating a dynamic statement with the specific conditions you need.
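A minimal sketch of what such a dynamic statement could look like, written in Python with sqlite3 purely so it can be run as-is (the question is about SQL Server, where the generated SQL would be similar but the parameter syntax differs). The object and object_tag tables, the build_filter function and the sample data are assumptions for illustration. Each filter group is a list of tag lists: an object must carry at least one tag from every list in a group, and the groups are ORed together.

```python
import sqlite3


def build_filter(groups):
    """Build a parameterized query from groups of alternative tag lists."""
    params = []
    group_sql = []
    for group in groups:
        clauses = []
        for tags in group:
            placeholders = ", ".join("?" for _ in tags)
            clauses.append(
                "EXISTS (SELECT 1 FROM object_tag t "
                "WHERE t.object_id = o.object_id "
                f"AND t.tag IN ({placeholders}))"
            )
            params.extend(tags)
        group_sql.append("(" + " AND ".join(clauses) + ")")
    where = " OR ".join(group_sql) or "1 = 1"  # no groups: no restriction
    sql = f"SELECT o.object_id FROM object o WHERE {where} ORDER BY o.object_id"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (object_id INTEGER PRIMARY KEY);
CREATE TABLE object_tag (
    object_id INTEGER NOT NULL REFERENCES object(object_id),
    tag       TEXT NOT NULL
);
INSERT INTO object VALUES (1), (2), (3), (4);
INSERT INTO object_tag VALUES
    (1, 'France'), (1, 'DVD'),
    (2, 'US'),     (2, 'CDs'),
    (3, 'US'),     (3, 'DVD'),
    (4, 'UK'),     (4, 'CDs');
""")

# "Movies in Europe" plus "CDs in the US", with each tag expanded to its
# ancestors and descendants in the hierarchy before the query is built.
FILTER = [
    [["All Territories", "Europe", "UK", "France"],
     ["All Media", "Movies", "DVD", "Streaming"]],
    [["All Territories", "Americas", "US"],
     ["All Media", "Music", "CDs"]],
]

sql, params = build_filter(FILTER)
result = [row[0] for row in conn.execute(sql, params)]
print(result)
```

Only placeholders are interpolated into the SQL text; the tag values themselves always travel as parameters, which keeps the dynamic statement safe from injection and lets the database reuse plans for filters of the same shape.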

Using LIKE in SQL Server to identify strings

I am writing a program that performs operations on a database of Football matches and data. One of the issues that I have is that my source data does not have consistent naming of each Team. So Leyton Orient could appear as L Orient. Most of the time this team is listed as L Orient. So I need to find the closest match to a team name when it does not appear in the database team name list exactly as it appears in the data that I am importing. Currently in my database I have a table 'Team' with a data sample as follows:
TeamID TeamName TeamLocation
1 Arsenal England
2 Aston Villa England
3 L Orient England
If the name 'Leyton Orient' appears in the data being imported I need to match this to L Orient and get the TeamID 3. My question is, can I use the LIKE function to achieve this in a case where the team name is longer than the name in the database?
I have figured out that if I had 'Leyton Orient' in the table and was importing 'L Orient' I could locate the correct entry with:
SELECT TeamName FROM Team WHERE TeamName LIKE '%l%orient%';
But can I do it the other way around? Also, I could have an example like Manchester United and I want to import Man Utd. I could find this by putting a % sign between every character like this:
SELECT TeamName FROM Team WHERE TeamName LIKE '%M%a%n%U%t%d%';
But is there a better way?
Finally, and this might be better put in another question, I would like not to have to search for the correct team when the way a team is named is repeated, i.e. I would like to store alternative spellings/aliases for teams in order to find the correct team entry quickly. Can anybody advise on how I might approach this? Thanks
The solution you are looking for is full-text search. It will require your DBA to create a full-text index, but once that is in place you can perform much more powerful searches than plain character pattern matching.
As the others have suggested, you could also just have an Alias table, which contains all possible forms of a team name, and reference that. Depending on how your search works, that may well be the path of least resistance.
Finally, and this might be better put in another question, I would like not to have to search for the correct team when the way a team is named is repeated, i.e. I would like to store alternative spellings/aliases for teams in order to find the correct team entry quickly. Can anybody advise on how I might approach this? Thanks
I would personally have a team table and a teamalias table. Use relationships to marry them up.
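Here is a rough sketch of that team/teamalias layout, again using Python's sqlite3 module just to keep it runnable. It looks up a stored alias first and, only if none is found, falls back to a "reversed" LIKE in which the stored name is turned into the pattern, which is one way to match a longer imported name against a shorter stored one. The TeamAlias table, the resolve_team function and the 'villa' alias are assumptions for illustration, and the fallback is a heuristic, not a general fuzzy matcher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Team (
    TeamID       INTEGER PRIMARY KEY,
    TeamName     TEXT NOT NULL,
    TeamLocation TEXT NOT NULL
);
CREATE TABLE TeamAlias (
    Alias  TEXT PRIMARY KEY,   -- stored in lower case
    TeamID INTEGER NOT NULL REFERENCES Team(TeamID)
);
INSERT INTO Team VALUES
    (1, 'Arsenal', 'England'),
    (2, 'Aston Villa', 'England'),
    (3, 'L Orient', 'England');
INSERT INTO TeamAlias VALUES ('villa', 2);  -- hypothetical alias
""")


def resolve_team(imported_name):
    """Return the TeamID for an imported name, or None if it is unknown or ambiguous."""
    # 1. Exact alias lookup: cheap and unambiguous.
    row = conn.execute(
        "SELECT TeamID FROM TeamAlias WHERE Alias = lower(trim(?))",
        (imported_name,),
    ).fetchone()
    if row:
        return row[0]
    # 2. Reversed LIKE: 'L Orient' becomes the pattern 'L%Orient', which the
    #    longer imported name 'Leyton Orient' matches.
    rows = conn.execute(
        "SELECT TeamID FROM Team WHERE ? LIKE replace(TeamName, ' ', '%')",
        (imported_name.strip(),),
    ).fetchall()
    return rows[0][0] if len(rows) == 1 else None


print(resolve_team("Leyton Orient"))
print(resolve_team("Villa"))
print(resolve_team("Unknown FC"))
```

Once the fallback (or a human) has resolved a new spelling, inserting it into TeamAlias means the next import finds it with the exact lookup.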
I believe the best way to prevent this is to display the list of team names in a dropdown list. This will also let you drop validation for the team name. Users can then only choose a fixed team name, which also makes your work in the database much easier. You can then look for the exact team name as it appears in your database, i.e.:
SELECT TeamName FROM Team WHERE TeamName = [dropdownlist_name];

Localisation of country names

As part of addresses I am storing in my SQL database country codes (e.g. US, DE,...). I then have another table (with two columns) in my database which translates the country codes to the English language names of the respective countries.
If I want to make the site multi-language, I could expand this translation table adding country names in other languages than English.
I was wondering if there is another method which does not involve modification of the database, e.g. using gettext to translate the English country names?
The typical way to handle this is to change the table structure to have three columns, instead of two:
Language
CountryCode
FullName
Whenever you query the database, you would provide the current language.
You then have to change your code to include the additional language key in any queries.
Depending on how you keep track of the current language, you could also use a view or user-defined function.
You don't want to use automated translation, since the name of a country like "China" could turn into the equivalent of "porcelain".
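A minimal sketch of the three-column translation table, using Python's sqlite3 module so it can be run directly. The table and column names and the country_name helper are assumptions for illustration, and the fallback to English when a translation is missing is an addition of this sketch rather than something the answers above prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_name (
    language     TEXT NOT NULL,   -- e.g. 'en', 'de'
    country_code TEXT NOT NULL,   -- e.g. 'US', 'DE'
    full_name    TEXT NOT NULL,
    PRIMARY KEY (language, country_code)
);
INSERT INTO country_name VALUES
    ('en', 'US', 'United States'),
    ('en', 'DE', 'Germany'),
    ('en', 'CN', 'China'),
    ('de', 'US', 'Vereinigte Staaten'),
    ('de', 'DE', 'Deutschland');
""")


def country_name(code, language, fallback="en"):
    """Name of a country in the requested language, else in the fallback language."""
    row = conn.execute(
        """
        SELECT full_name
        FROM country_name
        WHERE country_code = ? AND language IN (?, ?)
        ORDER BY language = ? DESC   -- requested language sorts first
        LIMIT 1
        """,
        (code, language, fallback, language),
    ).fetchone()
    return row[0] if row else None


print(country_name("DE", "de"))
print(country_name("CN", "de"))  # no German row, falls back to English
```

Because the names are stored per language rather than machine-translated, each one can be reviewed by a human before it reaches the site.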