I am writing a program that performs operations on a database of Football matches and data. One of the issues that I have is that my source data does not have consistent naming of each Team. So Leyton Orient could appear as L Orient. Most of the time this team is listed as L Orient. So I need to find the closest match to a team name when it does not appear in the database team name list exactly as it appears in the data that I am importing. Currently in my database I have a table 'Team' with a data sample as follows:
TeamID  TeamName     TeamLocation
---------------------------------
1       Arsenal      England
2       Aston Villa  England
3       L Orient     England
If the name 'Leyton Orient' appears in the data being imported I need to match this to L Orient and get the TeamID 3. My question is, can I use the LIKE function to achieve this in a case where the team name is longer than the name in the database?
I have figured out that if I had 'Leyton Orient' in the table and was importing 'L Orient' I could locate the correct entry with:
SELECT TeamName FROM Team WHERE TeamName LIKE '%l%orient%';
But can I do it the other way around? Also, I could have an example like Manchester United and I want to import Man Utd. I could find this by putting a % sign between every character like this:
SELECT TeamName FROM Team WHERE TeamName LIKE '%M%a%n%U%t%d%';
But is there a better way?
Finally, and this might be better put in another question, I would like not to have to search for the correct team when the way a team is named is repeated, i.e. I would like to store alternative spellings/aliases for teams in order to find the correct team entry quickly. Can anybody advise on how I might approach this? Thanks
The solution you are looking for is full-text search. It will require your DBA to create a full-text index, but once that is in place you can perform much more powerful searches than simple character pattern matching.
As the others have suggested, you could also just have an alias table that contains all possible forms of a team name, and reference that. Depending on how your search works, that may well be the path of least resistance.
I would personally have a team table and a teamalias table. Use relationships to marry them up.
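A minimal sketch of the team/alias idea, using Python's built-in sqlite3 module so it can be run as-is. The `TeamAlias` table, its columns, the sample aliases, and the `find_team_id` helper are assumptions made for illustration; they are not part of the original schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Team (
        TeamID       INTEGER PRIMARY KEY,
        TeamName     TEXT NOT NULL UNIQUE,
        TeamLocation TEXT
    );
    -- Hypothetical alias table: one row per alternative spelling.
    CREATE TABLE TeamAlias (
        AliasName TEXT PRIMARY KEY COLLATE NOCASE,
        TeamID    INTEGER NOT NULL REFERENCES Team(TeamID)
    );
    INSERT INTO Team VALUES (1, 'Arsenal', 'England'),
                            (2, 'Aston Villa', 'England'),
                            (3, 'L Orient', 'England');
    -- Sample aliases, invented for this example.
    INSERT INTO TeamAlias VALUES ('Leyton Orient', 3), ('Orient', 3);
""")

def find_team_id(name):
    """Return the TeamID for an exact team name or a stored alias, else None."""
    row = db.execute(
        """
        SELECT TeamID FROM Team WHERE TeamName = :n COLLATE NOCASE
        UNION
        SELECT TeamID FROM TeamAlias WHERE AliasName = :n
        """,
        {"n": name},
    ).fetchone()
    return row[0] if row else None
```

With this layout, an import routine would only need fuzzy matching (or a manual decision) the first time a new spelling appears; storing that spelling in `TeamAlias` makes every later lookup an exact match.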
I believe the best way to prevent this is to display the list of team names in a dropdown list. This also lets you drop validation for the team name: users can only choose one of the set team names, which makes it much easier to work with your database. You can then look for the exact team name as it appears in your database, i.e.:
SELECT TeamName FROM Team WHERE TeamName = [dropdownlist_name];
I am trying to find the most efficient way to do the 'inverse' of getting all records that match particular criteria,
i.e. find all predefined criteria from a set that a particular record matches.
I have a table of 'target' criteria that has many records - each built using a querybuilder javascript component - so each target record has its criteria stored as a json string in a field.
I also have a standard 'person' table
It is straightforward to query how many people fit a particular target.
What I am trying to do is get all targets that match a particular person.
Is there a more efficient way than just running each target's criteria against a person?
Open to suggestions beyond just SQL, e.g. caching, hashing or building up some kind of lookup table/file.
Edit:
Hopefully the tables below clarify the issue. If I parsed and ran the 'Good Eyesight' target criteria, I would expect it to return both Bob and Sue.
But I want to know that Bob matches the 'Young People' and 'Good Eyesight' targets. I will have thousands of users and probably up to 50 active targets.
Table 1: Person
ID Name Age Fav_Vegetable
---------------------------------
1 Bob 20 Carrot
2 Sue 40 Carrot
Table 2: Target
ID Name Criteria_JSON
---------------------------------
1 Young People {"rule": "young_age", "selectedOperator": "<","selectedOperand": "Age","value": "30"}
2 Old People {"rule": "old_age", "selectedOperator": ">","selectedOperand": "Age","value": "30"}
3 Good Eyesight {"rule": "vegetable","selectedOperator": "equals","selectedOperand": "Fav_Vegetable","value": "Carrot"}
The answer I have come up with is to run all targets against all people and maintain an index type table of the results.
i.e. have a table TargetIndex with columns targetId, personId
Then when I need to know the targets for a particular person I can just check against the TargetIndex table rather than rerunning queries.
Obviously these results would need to be refreshed as the target or people records change: probably whenever a target is added or edited, and periodically (hourly/nightly?) to pick up changes in people.
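The TargetIndex approach can be sketched as follows, using Python's sqlite3 module and the sample rows from the tables above. The `matches`, `rebuild_target_index` and `targets_for` helpers are hypothetical names, and the sketch assumes every Criteria_JSON holds a single rule with one of the three operators shown in the sample data.

```python
import json
import operator
import sqlite3

# Map the operator strings used in Criteria_JSON to Python functions.
OPERATORS = {"<": operator.lt, ">": operator.gt, "equals": operator.eq}

def matches(person, criteria):
    """Evaluate one single-rule criteria dict against a person (a dict)."""
    actual = person[criteria["selectedOperand"]]
    expected = criteria["value"]
    # JSON values are stored as strings; coerce when the column is numeric.
    if isinstance(actual, (int, float)):
        expected = type(actual)(expected)
    return OPERATORS[criteria["selectedOperator"]](actual, expected)

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Fav_Vegetable TEXT);
    CREATE TABLE Target (ID INTEGER PRIMARY KEY, Name TEXT, Criteria_JSON TEXT);
    CREATE TABLE TargetIndex (targetId INTEGER, personId INTEGER,
                              PRIMARY KEY (targetId, personId));
    INSERT INTO Person VALUES (1, 'Bob', 20, 'Carrot'), (2, 'Sue', 40, 'Carrot');
""")
sample_targets = [
    (1, "Young People", {"rule": "young_age", "selectedOperator": "<",
                         "selectedOperand": "Age", "value": "30"}),
    (2, "Old People", {"rule": "old_age", "selectedOperator": ">",
                       "selectedOperand": "Age", "value": "30"}),
    (3, "Good Eyesight", {"rule": "vegetable", "selectedOperator": "equals",
                          "selectedOperand": "Fav_Vegetable", "value": "Carrot"}),
]
db.executemany("INSERT INTO Target VALUES (?, ?, ?)",
               [(tid, name, json.dumps(rule)) for tid, name, rule in sample_targets])

def rebuild_target_index():
    """Run every target against every person and store the matching pairs."""
    people = [dict(row) for row in db.execute("SELECT * FROM Person")]
    rules = [(row["ID"], json.loads(row["Criteria_JSON"]))
             for row in db.execute("SELECT ID, Criteria_JSON FROM Target")]
    db.execute("DELETE FROM TargetIndex")
    db.executemany(
        "INSERT INTO TargetIndex VALUES (?, ?)",
        [(tid, p["ID"]) for tid, rule in rules for p in people if matches(p, rule)],
    )

def targets_for(person_id):
    """Look up a person's targets from the index instead of re-running criteria."""
    rows = db.execute(
        "SELECT t.Name FROM TargetIndex i JOIN Target t ON t.ID = i.targetId "
        "WHERE i.personId = ? ORDER BY t.ID",
        (person_id,),
    )
    return [row["Name"] for row in rows]

rebuild_target_index()
```

With roughly 50 targets, re-evaluating a single person's row against all targets when that person changes is also cheap, so a full rebuild may only be needed when a target is added or edited.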
Thanks for people's thoughts
I have a database with country, state, city and hotel tables. The country table has multiple records for the same country: for example, Mexico appears as maxico, mxico and mexico, and the USA appears as usa, united states of america and america. The states are likewise duplicated under these misspelled countries, and each state has multiple misspelled cities. The hotels, however, are unique, and I want to assign each one to its correct city, state and country (e.g. a hotel in the city of Chicago, in the state of Illinois, in the USA). How can I fix this?
You could do an update if you know all the different incorrect variants:
update tbl
set city = 'Mexico'
where city in ('maxico', 'mxico')
Well, you can list all the values in the country column and check whether each one is right. If a value is wrong, just use an UPDATE statement to fix it, like below:
update my_table set country = 'Mexico' where country in ('maco', 'xico');
It depends on the infrastructure you're running.
If you have access to ETL tools, they often have data-quality capabilities, frequently backed by reference databases for correcting addresses. Those are often paid.
If you are a "private" developer, you might not want to pay for data, so you can look for open data sources, such as the Allegheny County addresses on https://catalog.data.gov.
You can use a multitude of algorithms and solutions, ranging from simple distances in word space to neural networks pre-trained to do just that.
This type of data problem is hard. There is no built-in simple way to determine the "right spelling". Many databases have one of two capabilities built in that can help -- either "soundex" algorithms or Levenshtein distance.
What should you do? If you really want to fix this problem, create a table with the misspelled name and the correct value that you want. This table will need to be maintained manually, such as in a spreadsheet. Then use this table when importing data and use only the rectified value.
Better yet, set up a reference table with only the correct names. Create a second table with alternative names, which is maintained as above.
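As a rough illustration of how the two tables and an edit distance could work together, here is a sketch in plain Python. The `CORRECT_NAMES` and `ALTERNATIVES` contents, the `resolve` helper and the `max_distance` threshold are all assumptions made for the example; the distance function is the standard dynamic-programming Levenshtein distance, used here only to propose candidates for the manually maintained alternatives table.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, row by row)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character into a
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

# Stand-ins for the reference table and the alternative-names table.
CORRECT_NAMES = ["Mexico", "United States of America"]
ALTERNATIVES = {"usa": "United States of America",
                "america": "United States of America"}

def resolve(raw, max_distance=2):
    """Return (correct_name, how) for an imported value.

    how is 'exact', 'alias', 'suggested' (needs human review) or 'unknown'.
    """
    key = raw.strip().lower()
    for name in CORRECT_NAMES:
        if name.lower() == key:
            return name, "exact"
    if key in ALTERNATIVES:
        return ALTERNATIVES[key], "alias"
    best = min(CORRECT_NAMES, key=lambda n: levenshtein(key, n.lower()))
    if levenshtein(key, best.lower()) <= max_distance:
        return best, "suggested"
    return None, "unknown"
```

A "suggested" result is only a candidate: once a person confirms it, the misspelling can be added to the alternatives table so that later imports resolve it without any fuzzy matching.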
I have a table which contains company names which appear to have been a free text box entry. As such there ends up being lots of companies with 3-5 entries such as A Good Company, A Good Company LLC, AA Good Company etc.
I know if I was looking for one company I could use like (%) to get all the variations, but I would like to insert them into a new company table with just one row for all options so that I can use that as a reference table going forward. Is there a way to do this within SQL, or in an outside application for that matter?
Trying to generate a list of tracks composed by more than one person.
Name Composer Make
So it should look something like this
Name        Composer                     Make
Going home  Robert dennings / Don Bedge  Robert dennings , Don Bedge
You probably want something like this
SELECT Name, Composer, REPLACE(Composer,'/',',') AS Make
FROM tracks
But it really is impossible to tell for sure, given that you don't tell us any of the table or field names in your database and tell us very little about your database model.
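If the goal is to list only tracks with more than one composer, a filter on the separator could be added to the query. The sketch below uses Python's sqlite3 module; the `tracks` table layout, the 'Solo piece' row, and the assumption that multiple composers are always separated by '/' are guesses made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (Name TEXT, Composer TEXT);
    INSERT INTO tracks VALUES
        ('Going home', 'Robert dennings / Don Bedge'),
        ('Solo piece', 'Robert dennings');  -- hypothetical single-composer row
""")

# Keep only rows whose Composer contains the '/' separator,
# and show the composers with ',' instead of '/'.
rows = db.execute("""
    SELECT Name, Composer, REPLACE(Composer, '/', ',') AS Make
    FROM tracks
    WHERE Composer LIKE '%/%'
""").fetchall()
```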
So, I'm practicing for an exam (high school level), and although we have never been taught SQL, it is necessary to know a little when handling MS Access.
The task is to select the IDs of areas whose names do not correspond to those of the towns they belong to.
In the solution was the following example:
SELECT name
FROM area
WHERE id not in (SELECT areaid
FROM area, town, conn
WHERE town.id = conn.townid
AND area.id = conn.areaid AND
area.name like "*"+town.name+"*");
It would be the same with INNER JOINs; I mention that only because Access builds the connection between tables that way.
It works perfectly (well, it was in the solution), but what I don't get is why we need the "not in" part, and why we can't just use "not like" instead of "like" and do the query in one step.
I rewrote it that way (without the "not in" part) and it gave a totally different result. If I replaced "like" with "not like", the result wasn't the opposite of that, but just a bunch of mixed data. Why? How does that work? Please, could someone explain?
Edit (after best answer): It was more of a theoretical question about how SQL queries work; it did not need a concrete solution, but an explanation of the process. (Because of this I feel the sql tag still belongs here.)
Consider this example, which shows where the difference comes from:
areaid areaname townname
1 AA AA
1 AA BB
So your first query would exclude both records from the outcome, because the inner query would identify areaid = 1 as one of those to be excluded. Therefore, neither record will show up in the output.
Using NOT LIKE, however, would exclude the first record and return the second, because the first record fails the NOT LIKE condition (AA is like AA) while the second satisfies it (AA is not like BB).
In other words, the first query excludes any area (and its corresponding records) that has at least one townname that is like the areaname. The second approach excludes only the rows where areaname is like townname, and doesn't necessarily exclude all records for that area.
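The example above can be reproduced with Python's sqlite3 module. This is a sketch, not the Access original: SQLite uses `%` as the LIKE wildcard and `||` for concatenation where Access uses `*` and `+`, and area 2 ('CC' with town 'DD') is a hypothetical extra row added so that the NOT IN query has something to return.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE area (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE town (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conn (areaid INTEGER, townid INTEGER);
    INSERT INTO area VALUES (1, 'AA'), (2, 'CC');
    INSERT INTO town VALUES (1, 'AA'), (2, 'BB'), (3, 'DD');
    -- Area 1 contains towns AA and BB; area 2 contains town DD.
    INSERT INTO conn VALUES (1, 1), (1, 2), (2, 3);
""")

# Two-step version: drop every area that has at least one similar town name.
not_in_result = [row[0] for row in db.execute("""
    SELECT name
    FROM area
    WHERE id NOT IN (SELECT conn.areaid
                     FROM area, town, conn
                     WHERE town.id = conn.townid
                       AND area.id = conn.areaid
                       AND area.name LIKE ('%' || town.name || '%'))
    ORDER BY id
""")]

# One-step version: keep every (area, town) row whose names differ.
one_step_result = [row[0] for row in db.execute("""
    SELECT area.name
    FROM area, town, conn
    WHERE area.id = conn.areaid
      AND town.id = conn.townid
      AND area.name NOT LIKE ('%' || town.name || '%')
    ORDER BY area.id
""")]
```

Here `not_in_result` contains only 'CC', while `one_step_result` also contains 'AA', because the (AA, BB) row passes the NOT LIKE test even though area AA has a town with a matching name.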
The reason is because there can be more than one town in an area, right?
So if there is a town in an area that has a similar name, then that area will be found in the LIKE subquery.
If there is another town in the SAME AREA that does not have a similar name, then that area will ALSO be found in the NOT LIKE subquery.
So the same area can be returned whether you use LIKE or NOT LIKE, because of the one-to-many relationship to towns.
Make sense?
It depends on what the relationships between area, town and conn are. If you have many towns in an area, you will see the area duplicated in your row set. Your original query simply asks "Show me the areas that are not in the following list". Your one-step query asks a different question: "Show me the 'conns' in towns, in areas which have an area name not like the town name":
SELECT area.name
FROM area, town, conn
WHERE area.id = conn.areaid
AND town.id = conn.townid
AND area.name NOT like "*"+town.name+"*";