Fuzzy Matching Names and Joining Multiple Data Sources


I have three tables that I need to join to one another. The only identifier they share is the employee name, but the names are not normalized. I've installed a fuzzy-matching function, but I'm not sure how to use it to join the tables.
table 1 name - Matt Le-Hall
table 2 name - Matt Hal
table 3 name - Matt Hall
If I run:
select * from table1, table2, table3
where table1.name = table2.name
and table2.name = table3.name
no rows come back, because the names are written differently in each table. However, I can use the fuzzy function to get a similarity percentage between the name columns of any two tables.
My question is: how can I use a fuzzy function in order to join these tables together by non-normalized names?
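One way to picture this is to put the fuzzy function directly in the join condition and keep only pairs above a threshold. The sketch below is not the asker's setup: it uses Python's sqlite3 with a hypothetical similarity() function built on difflib as a stand-in for the installed fuzzy function, the 0.7 threshold is an assumption that would need tuning, and the 'Smith' rows are made-up data added to show that unrelated names stay apart.

```python
import sqlite3
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for the installed fuzzy function: returns a ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call similarity(x, y).
conn.create_function("similarity", 2, similarity)
conn.executescript("""
CREATE TABLE table1 (name TEXT);
CREATE TABLE table2 (name TEXT);
CREATE TABLE table3 (name TEXT);
INSERT INTO table1 VALUES ('Matt Le-Hall'), ('Anna Smith');
INSERT INTO table2 VALUES ('Matt Hal'), ('Ana Smith');
INSERT INTO table3 VALUES ('Matt Hall'), ('Anna Smyth');
""")

THRESHOLD = 0.7  # assumed cut-off, not a recommended value

rows = conn.execute(
    """
    SELECT t1.name, t2.name, t3.name
    FROM table1 t1
    JOIN table2 t2 ON similarity(t1.name, t2.name) >= :t
    JOIN table3 t3 ON similarity(t2.name, t3.name) >= :t
    ORDER BY t1.name
    """,
    {"t": THRESHOLD},
).fetchall()

for row in rows:
    print(row)
```

A join like this compares every name with every other name, so it can get slow on large tables, and a single name may match several candidates once the threshold is loose enough.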

Related

Bigquery join 2 tables with id concated from 4 columns and create a new table dynamically

I have two tables in BigQuery from two different data sources, let's say x and y. I want to join these two tables on the os_name, tracker_name, date and country columns. To do that I am using the concat function and joining like this:
full outer join x on concat(x.date,x.os_name,x.tracker_name, x.country) = concat(y.date,y.os_name,y.tracker_name,y.country_code)
In the query result the common columns are duplicated: the result contains os_name and os_name_1, country_code and country_code_1, and so on. I don't want that. The final columns should be as listed under Final Table Schema below.
I want to return all records from both sides. For example, if there is no match in table y, y_install and y_purchase should be 0, and vice versa.
X TABLE SCHEMA:
os_name,
tracker_name,
date,
country,
install,
purchase
Y TABLE SCHEMA:
os_name,
tracker_name,
date,
country,
y_install,
y_purchase
Final Table Schema required:
os_name,
tracker_name,
date,
country,
install,
purchase,
y_install,
y_purchase
I am going to schedule the query and write the results to a destination table at a given interval.
Can you help me out with this query?

Regarding the final table, I don't understand whether you want to return the first non-NULL result or, when both tables have a valid value, something like an array containing both results. In my sample table, do you want rows 1 and 2 (which follow the same rule) or row 3?
row_number|x_install|y_install|final_table_install
1|23|50|23
2|NULL|50|50
3|23|50|[23,50]

It turns out that what I wanted was UNION ALL. First, I added the non-common columns to each table so that the two schemas are identical. Then I was able to merge the tables vertically using UNION ALL. Thanks for trying to help anyway.
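For anyone landing here, the UNION ALL approach described above might look roughly like the following. This is only a sketch: it runs in SQLite through Python rather than BigQuery, the sample rows are invented, and padding the missing columns with 0 is an assumption taken from the question's "will be 0" requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (os_name TEXT, tracker_name TEXT, date TEXT, country TEXT,
                install INTEGER, purchase INTEGER);
CREATE TABLE y (os_name TEXT, tracker_name TEXT, date TEXT, country TEXT,
                y_install INTEGER, y_purchase INTEGER);
-- Made-up sample rows, only for illustration.
INSERT INTO x VALUES ('ios', 'organic', '2021-01-01', 'US', 23, 4);
INSERT INTO y VALUES ('ios', 'organic', '2021-01-01', 'US', 50, 7);
INSERT INTO y VALUES ('android', 'paid', '2021-01-01', 'DE', 11, 2);
""")

# Each SELECT pads the columns its table lacks, so both halves share the
# eight-column final schema and can be stacked with UNION ALL.
UNION_SQL = """
SELECT os_name, tracker_name, date, country,
       install, purchase, 0 AS y_install, 0 AS y_purchase
FROM x
UNION ALL
SELECT os_name, tracker_name, date, country,
       0 AS install, 0 AS purchase, y_install, y_purchase
FROM y
"""

rows = conn.execute(UNION_SQL).fetchall()
for row in rows:
    print(row)
```

Note that this stacks the rows instead of pairing them, so a key present in both tables appears twice, once with the x values filled in and once with the y values.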

Access Append Query compare with table

I am currently rebuilding a messy Access database and I encountered the following problem:
I have a table of facilities which contains a column called district. That column contains a number linked to another table, which holds just the numbers and names of the districts. I added a lookup column that displays the name of the district.
I now want to set the new column for every row based on the data in the old column.
Facilities
NAME|..|DISTRICT_OLD
A |..| 1
B |..| 2
C |..| 1
...
DISTRICTS
ID|NAME
1 |EAST
2 |WEST
...
I would like something like the following:
Facilities
NAME|..|DISTRICT_OLD|DISTRICT
A |..| 1|EAST
B |..| 2|WEST
C |..| 1|EAST
...
The DISTRICT field (lookup) gets its data as follows: SELECT [DISTRICTS].ID, [DISTRICTS].NAME FROM DISTRICTS ORDER BY [NAME];
(Thanks to Gordon Linoff) I got the query working, but I now struggle with the insert. I can get the data I want:
SELECT [DISTRICTS].NAME FROM Facilities INNER JOIN DISTRICTS ON Facilities.DISTRICT_OLD = [DISTRICTS].ID;
If I try to INSERT INTO Facilities(DISTRICT), it reports a type error.
How can I modify the data to be compatible with a lookup column?
I guess I need to select the ID as well, which isn't a problem, but then the error says there are too many columns.
I hope I haven't gotten any names wrong; my Access isn't running in English.
Can you help me?
Fabian

Lookup columns are numbers (Long Integer).
With a relational database you only need the single column containing the ID (you can always look up the district name with a query), so:
INSERT INTO Facilities(DISTRICT) SELECT 4
where 4 is the ID of the record in the lookup table that you want, or better still:
INSERT INTO Facilities(DISTRICT)
SELECT ID FROM DISTRICTS
WHERE DISTRICTS.NAME = "Name you want the ID for"
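If the goal is to fill the new column on the rows that already exist, rather than append new rows as an INSERT would, an UPDATE may be closer to what the question asks for. The sketch below assumes, as the answer states, that the lookup column simply stores the district ID. It uses SQLite through Python as a stand-in for Access, so the syntax and behaviour in Access itself may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DISTRICTS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE Facilities (NAME TEXT, DISTRICT_OLD INTEGER, DISTRICT INTEGER);
INSERT INTO DISTRICTS VALUES (1, 'EAST'), (2, 'WEST');
INSERT INTO Facilities (NAME, DISTRICT_OLD) VALUES ('A', 1), ('B', 2), ('C', 1);
""")

# The lookup column holds the ID, so the old number can be copied as-is.
conn.execute("UPDATE Facilities SET DISTRICT = DISTRICT_OLD")

# The district name is then resolved with a join whenever it is needed.
rows = conn.execute("""
SELECT f.NAME, f.DISTRICT_OLD, d.NAME
FROM Facilities f
JOIN DISTRICTS d ON f.DISTRICT = d.ID
ORDER BY f.NAME
""").fetchall()

for row in rows:
    print(row)
```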

SQL/T-SQL Substring LEFT or Right doesn't appear to resolve

I have two tables with values in a UniqueKeys column such as:
Table 1
2016_2016-2 S2_001840_30_01
2017_2017-2 D4_002213_3_01
The problem is that I am trying to match these with the unique values in table 2, where the parts are written in a different order, such as:
Table 2:
001840_2016-2_S2_30_D_179_364128_400985
002213_2017-2_D4_3_E_752_376901_422828
Table 1 and table 2 come from different source systems. What I am trying to achieve is to create a new table, TABLE 3: when the unique values match between table 1 and table 2, insert the data from certain columns of tables 1 and 2 into table 3; otherwise ignore the row.
The unique values should be composed as follows:
Year and Period: 2016-2
Cycle : S2
Unit: 001840
Group: 30
Giving the end result in Table 3 as:
001840_2016-2_S2_30
002213_2017-2_D4_3

You need to split both input values on "_" and then recombine the parts so that both end up in the same format. Then you can join the tables.
Use two functions: the first for values from table 1, the second for values from table 2.
Result:
SELECT ...
FROM table1
JOIN table2 ON splitfunction1(table1.Key1) = splitfunction2(table2.Key2);
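To make the split-and-recombine idea concrete, here is a minimal sketch in Python rather than T-SQL. The function names key_from_table1 and key_from_table2 are hypothetical, and the parsing assumes every key follows exactly the layout of the two sample values in the question.

```python
def key_from_table1(value):
    # "2016_2016-2 S2_001840_30_01": year, "period cycle", unit, group, suffix
    _year, period_cycle, unit, group, _suffix = value.split("_")
    period, cycle = period_cycle.split(" ")
    return "_".join([unit, period, cycle, group])


def key_from_table2(value):
    # "001840_2016-2_S2_30_D_179_...": the first four parts are already in order
    return "_".join(value.split("_")[:4])


table1 = ["2016_2016-2 S2_001840_30_01", "2017_2017-2 D4_002213_3_01"]
table2 = [
    "001840_2016-2_S2_30_D_179_364128_400985",
    "002213_2017-2_D4_3_E_752_376901_422828",
]

# Index table 2 by its normalized key, then look up each table 1 row.
by_key = {key_from_table2(v): v for v in table2}
matches = [
    (key_from_table1(v), v, by_key[key_from_table1(v)])
    for v in table1
    if key_from_table1(v) in by_key
]

for key, left, right in matches:
    print(key, "|", left, "|", right)
```

The same two normalizations could be written as SQL functions or computed columns, so that the join condition compares the normalized keys directly.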

SQL: Merge 2 Columns or more from different tables into one Alias new Column

I have 2 tables: DisconnectionsData and DisconnectionsVoice.
Both of them have the columns WeekNum, Month, Quarter, Year, IncomeLost and ID.
The column names differ between the 2 tables, but the data in them corresponds (Quarter and QuarterNumber hold the same thing).
I want to FULL JOIN both tables into one table with only 6 columns.
I can't figure out how to make an alias. For example, how do I merge DisconnectionsData.Quarter and DisconnectionsVoice.QuarterNumber
into one column with the alias QuarterOfDisconnection?
This is the desired result:
Thank you.

You don't specify which SQL dialect you are using. You can try the query below.
SELECT *
FROM DisconnectionsData
UNION ALL
SELECT *
FROM DisconnectionsVoice
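SELECT * only lines up if both tables list their columns in the same order, so naming the columns and aliasing them in the first SELECT is usually safer. Below is a sketch using SQLite through Python. Apart from QuarterNumber, the DisconnectionsVoice column names (VoiceID, WeekNumber, MonthNumber, YearNumber, LostIncome) are guesses, since the question does not list them, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DisconnectionsData (ID INTEGER, WeekNum INTEGER, Month INTEGER,
                                 Quarter INTEGER, Year INTEGER, IncomeLost REAL);
-- Hypothetical column names, apart from QuarterNumber.
CREATE TABLE DisconnectionsVoice (VoiceID INTEGER, WeekNumber INTEGER,
                                  MonthNumber INTEGER, QuarterNumber INTEGER,
                                  YearNumber INTEGER, LostIncome REAL);
INSERT INTO DisconnectionsData VALUES (1, 14, 4, 2, 2020, 120.5);
INSERT INTO DisconnectionsVoice VALUES (7, 40, 10, 4, 2020, 80.0);
""")

# The result takes its column names from the first SELECT, so one alias
# there names the merged column for the rows of both tables.
cursor = conn.execute("""
SELECT ID, WeekNum, Month, Quarter AS QuarterOfDisconnection, Year, IncomeLost
FROM DisconnectionsData
UNION ALL
SELECT VoiceID, WeekNumber, MonthNumber, QuarterNumber, YearNumber, LostIncome
FROM DisconnectionsVoice
""")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```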

Normalise/join SQL Server tables

I have inherited a SQL Server database which isn't normalised and is giving me headaches. I am not very experienced in SQL and may be asking stupid questions, but I would appreciate any advice on how to go forward with the scenario below.
I have three tables as follows:
A table of results:
ResId|CompId|Name|Result
1|1|Band A|2
2|1|Band B|1
3|1|Band C|3
4|2|Band A*|2
5|2|Band B|1
6|2|Band C|3
A table of Bands current names:
BandId|BandName
1|Band A
2|Band B
3|Band C
A table of names the bands were previously known as (linked on BandId):
oldBandId|BandId|oldBandName
1|1|Band A*
2|1|Band a
2|2|Band b
I am looking to consolidate the band names in the results table, replacing the band name with a BandId; however, the results table contains band names from both tables. First question: should I create some sort of join table and use its ID as the band ID in the results table? If so, what do I need in this join table? Is it just a pseudo-ID made of the BandId/oldBandId and the table name concatenated, which is then placed in the results table?
I then want a query that selects all results where the user picks the band by any name variant (new or old) and returns the results for all names linked with the band, i.e. choosing Band A would return the results for both Band A and Band A*.
Thanks in advance
Steve

I like the idea of using the band ID in the results table. I would suggest eliminating the "old band name" table and replacing it with a table of band aliases, since that sounds more like what you want. The band alias table would just have the band ID and one alias per row.
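A rough sketch of how that alias table could work, using SQLite through Python instead of SQL Server. The table name BandAliases, the choice to store the current name as an alias too, and the helper results_for are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bands (BandId INTEGER PRIMARY KEY, BandName TEXT);
-- One row per name a band has ever used, current name included (assumption).
CREATE TABLE BandAliases (BandId INTEGER, Alias TEXT);
-- Results now reference the band by ID instead of by name.
CREATE TABLE Results (ResId INTEGER PRIMARY KEY, CompId INTEGER,
                      BandId INTEGER, Result INTEGER);
INSERT INTO Bands VALUES (1, 'Band A'), (2, 'Band B'), (3, 'Band C');
INSERT INTO BandAliases VALUES
    (1, 'Band A'), (1, 'Band A*'), (1, 'Band a'),
    (2, 'Band B'), (2, 'Band b'), (3, 'Band C');
INSERT INTO Results VALUES
    (1, 1, 1, 2), (2, 1, 2, 1), (3, 1, 3, 3),
    (4, 2, 1, 2), (5, 2, 2, 1), (6, 2, 3, 3);
""")


def results_for(name):
    # Resolve any name variant to its band ID, then fetch that band's results.
    return conn.execute(
        """
        SELECT r.ResId, r.CompId, b.BandName, r.Result
        FROM Results r
        JOIN Bands b ON b.BandId = r.BandId
        WHERE r.BandId IN (SELECT BandId FROM BandAliases WHERE Alias = ?)
        ORDER BY r.ResId
        """,
        (name,),
    ).fetchall()


print(results_for("Band A*"))
```

Searching for 'Band A' or 'Band A*' resolves to the same band ID, so both return the same two results.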

I think your current DB structure is fine. I can't think of any way to improve on it without complicating it further (especially if you want to retain the old band names).
You can just write a query like this for your need:
select * from results
where Name = @bandName or
      Name in (select oldBandName
               from oldBands
               where BandId in (select BandId
                                from Bands
                                where BandName = @bandName))
