Pattern Matching or Fuzzy Matching of two tables based on one column - sql

Assuming I have the right terminology, what I am trying to write is a function or stored procedure that compares names and finds out whether they are the same value.
I think it's called fuzzy matching.
For example, table A has 2 columns and table B has 3 columns:
Table A:
Name | Number
Hello | 24
Evening | 56
Table B:
Name | Num | F
Heello | 23 | some value
GoodEvening | 15 | some value
I want a result table like this:
A | D
Hello | Heello
Morning | GoodMorning
Currently, I'm using:
SELECT A.Name, B.Name
FROM A
LEFT JOIN B
ON A.Name LIKE B.Name
OR (LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(A.Name,' ',''),'-',''),'''',''))) = LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(B.Name,' ',''),'-',''),'''',''))))
OR (A.Name LIKE '%'+B.Name+'%')
OR (B.Name LIKE '%'+A.Name+'%')
It gives me a result, but it is not very accurate and it is very slow. Is there another way I could compare these values?
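One way to make the comparison more tolerant of typos is an edit-distance (Levenshtein) check instead of, or alongside, the LIKE tests. This is a swapped-in technique, not something from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. It registers a hypothetical levenshtein() function with create_function, normalizes names the same way as the REPLACE chain above, and treats a distance of 2 or less (an arbitrary threshold) or containment as a match. It addresses accuracy, not speed: the function still runs once for every pair of rows. On SQL Server itself, the built-in SOUNDEX() and DIFFERENCE() functions are a cruder alternative.

```python
import sqlite3


def normalize(name):
    # Same clean-up as the REPLACE chain in the question, plus case folding.
    return name.replace(" ", "").replace("-", "").replace("'", "").lower()


def levenshtein(a, b):
    # Classic dynamic-programming edit distance: insert, delete, substitute each cost 1.
    if a is None or b is None:
        return None
    a, b = normalize(a), normalize(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("levenshtein", 2, levenshtein)  # hypothetical UDF name
conn.executescript("""
    CREATE TABLE A (Name TEXT, Number INTEGER);
    CREATE TABLE B (Name TEXT, Num INTEGER, F TEXT);
    INSERT INTO A VALUES ('Hello', 24), ('Evening', 56);
    INSERT INTO B VALUES ('Heello', 23, 'some value'), ('GoodEvening', 15, 'some value');
""")

rows = conn.execute("""
    SELECT A.Name, B.Name, levenshtein(A.Name, B.Name) AS dist
    FROM A
    LEFT JOIN B
      ON levenshtein(A.Name, B.Name) <= 2          -- small typos such as Hello/Heello
      OR instr(lower(B.Name), lower(A.Name)) > 0   -- containment such as Evening/GoodEvening
    ORDER BY A.Name
""").fetchall()
print(rows)  # [('Evening', 'GoodEvening', 4), ('Hello', 'Heello', 1)]
```

If speed matters more than accuracy, storing the normalized name in its own indexed column avoids recomputing the REPLACE chain on every comparison.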

Related

SQLite: Matching a column containing a single string to another column containing comma-separated values [duplicate]

I have a table with a column that has concatenated values like this:
Table CHILD:
ChildId Values
2 x123,j455
3 f456,z789
4 m333,y567
5 x123,h888
And I have a master table MASTER that has
Table MASTER:
MainValues
x123
f456
y567
I need a query that selects the following data:
ChildId MainValues
2 x123
3 f456
4 y567
5 x123
Basically, I want to match each value from MASTER against the child values and return only the master value. How can I do this? I have tried matching against the second table with IN and LIKE, but that doesn't help much since the values are comma-separated. Is there a way to split and match in SQLite?
EDIT: Table and column names are fictional and intended just to explain this question better
Use a LIKE pattern, wrapping both sides in commas so that only whole values match:
SELECT ChildId, MainValues
FROM CHILD
INNER JOIN MASTER ON ',' || [Values] || ',' LIKE '%,' || MainValues || ',%'
Also, please avoid using keywords such as VALUES as column names.
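The comma-wrapping trick can be checked quickly with Python's sqlite3 module, using the table and column names from the question. Wrapping both sides in commas means x123 matches the whole value x123 but would not match a longer value such as x1234. Keep in mind that LIKE is case-insensitive for ASCII letters in SQLite, and a % or _ inside a master value would act as a wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CHILD (ChildId INTEGER, "Values" TEXT);
    CREATE TABLE MASTER (MainValues TEXT);
    INSERT INTO CHILD VALUES (2, 'x123,j455'), (3, 'f456,z789'), (4, 'm333,y567'), (5, 'x123,h888');
    INSERT INTO MASTER VALUES ('x123'), ('f456'), ('y567');
""")

rows = conn.execute("""
    SELECT ChildId, MainValues
    FROM CHILD
    INNER JOIN MASTER
      -- ',x123,j455,' LIKE '%,x123,%' is true; the extra commas stop partial-value matches.
      ON ',' || "Values" || ',' LIKE '%,' || MainValues || ',%'
    ORDER BY ChildId
""").fetchall()
print(rows)  # [(2, 'x123'), (3, 'f456'), (4, 'y567'), (5, 'x123')]
```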
Older versions of SQLite (before 3.7.15, which added instr()) have no function for finding the position of a character in a string, so there you have to rely on something else. Idan's method is good too but can be slower. You may try this (the comma-separated column is called ivalues here):
SELECT c.childID, m.mainvalues
FROM CHILD c
JOIN MASTER m
WHERE m.mainvalues = substr(c.ivalues, -length(c.ivalues), 4)
OR m.mainvalues = substr(c.ivalues, 6);
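I have used 4 and 6 assuming each value is exactly four characters long, so the first value occupies characters 1 to 4 and the second starts at character 6, after the comma. If the length is not fixed, you can try: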
SELECT c.childID, m.mainvalues
FROM CHILD c
JOIN MASTER m
WHERE m.mainvalues = substr(c.ivalues, -length(c.ivalues), length(m.mainvalues))
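OR m.mainvalues = substr(c.ivalues, length(m.mainvalues) + 2);
Note that both queries still assume exactly two values per row. The second query only finds a match in the second position when the first value has the same length as the master value, and its first check also matches when the master value is merely a prefix of the first value (x12 would match x123).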
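To answer the "split and match" part directly: SQLite 3.8.3 and later supports recursive common table expressions, which can split a comma-separated column into one row per value without assuming fixed lengths or a fixed number of values. A minimal sketch using Python's sqlite3 module and the table names from the question; split, item and rest are hypothetical names chosen for the CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CHILD (ChildId INTEGER, "Values" TEXT);
    CREATE TABLE MASTER (MainValues TEXT);
    INSERT INTO CHILD VALUES (2, 'x123,j455'), (3, 'f456,z789'), (4, 'm333,y567'), (5, 'x123,h888');
    INSERT INTO MASTER VALUES ('x123'), ('f456'), ('y567');
""")

rows = conn.execute("""
    WITH RECURSIVE split(ChildId, item, rest) AS (
        -- Seed: empty item, whole list with a trailing comma so every value ends in ','.
        SELECT ChildId, '', "Values" || ',' FROM CHILD
        UNION ALL
        -- Step: peel off the text before the first comma, keep the remainder.
        SELECT ChildId,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM split
        WHERE rest <> ''
    )
    SELECT s.ChildId, m.MainValues
    FROM split s
    JOIN MASTER m ON m.MainValues = s.item   -- exact match, no length assumptions
    ORDER BY s.ChildId
""").fetchall()
print(rows)  # [(2, 'x123'), (3, 'f456'), (4, 'y567'), (5, 'x123')]
```

The empty seed item never matches a master value, so it drops out in the join.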

Count blanks in multiple columns, grouped by another value

OK, so this gets me the count of records for each value of A that are blank in column B:
SELECT A, Count(B)
FROM `table1`
where
B = ""
group by A
It gives me this table:
A | B
First | 564
Second | 1985
That is great, but I want the summary to count blanks in multiple columns, not just column B, like this:
A | B | C
First | 564 | 9001
Second | 1985 | 223
I have an intuition that this is done by first creating another table that would look like this:
A | Column | Value
First | "B" | B value
First | "C" | C value
Second | "B" | B value
Second | "C" | C value
for every document, so that I can count blanks, but I'm not sure how to get there. Is this the right approach, or is there a much simpler way using pivot tables or similar?
You could try using a conditional sum (the else 0 makes a group with no blanks show 0 instead of NULL):
select A,
Sum(case when b='' then 1 else 0 end) B,
Sum(case when c='' then 1 else 0 end) C
from t
group by A
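The long-format table from the question is a valid route too: unpivot with UNION ALL, then group. The sketch below uses Python's sqlite3 module with a small table t whose rows are invented purely for illustration, and shows that both approaches give the same counts. The wide conditional sum is simpler to read; the long form is easier to generate when the list of columns comes from code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (A TEXT, B TEXT, C TEXT);
    INSERT INTO t VALUES
        ('First', '', 'x'),
        ('First', '', ''),
        ('Second', 'y', ''),
        ('Second', '', '');
""")

# Wide form: one conditional sum per column, as in the answer.
wide = conn.execute("""
    SELECT A,
           SUM(CASE WHEN B = '' THEN 1 ELSE 0 END) AS B,
           SUM(CASE WHEN C = '' THEN 1 ELSE 0 END) AS C
    FROM t
    GROUP BY A
    ORDER BY A
""").fetchall()

# Long form: unpivot into (A, column, value) with UNION ALL, then count per group.
long_form = conn.execute("""
    WITH unpivoted AS (
        SELECT A, 'B' AS col, B AS val FROM t
        UNION ALL
        SELECT A, 'C', C FROM t
    )
    SELECT A, col, SUM(CASE WHEN val = '' THEN 1 ELSE 0 END) AS blanks
    FROM unpivoted
    GROUP BY A, col
    ORDER BY A, col
""").fetchall()

print(wide)       # [('First', 2, 1), ('Second', 1, 2)]
print(long_form)  # [('First', 'B', 2), ('First', 'C', 1), ('Second', 'B', 1), ('Second', 'C', 2)]
```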

SQLite3: merge rows with common columns

For some context, I have a table in SQLite3 that currently looks like this:
What I want to do is merge rows that have the same breed. The same column is never populated in both rows. So far I have tried the query below, but it doesn't do the job, because it does not deduplicate or merge the rows as desired. It also seems difficult to generalise to all columns without typing out each column name manually.
select distinct t1.breed, coalesce(t1.dog_group_1, t2.dog_group_1) from breed_merge t1 left join breed_merge t2 on t1.breed = t2.breed;
Output:
Afador|
Affenhuahua|
Affenpinscher|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|
Afghan Hound|GROUP 4 - HOUNDS
...
Desired output:
Afador|
Affenhuahua|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|GROUP 4 - HOUNDS
...
For this sample data, where there are at most 2 rows for each breed and each of those rows (if it exists) contains either a value or NULL, all you have to do is group by breed and use an aggregate function such as MAX() for each of the other columns. MAX() ignores NULLs, so it picks the populated value:
SELECT breed, MAX(imgsrc) imgsrc, MAX(dog_group_1) dog_group_1, .....
FROM breed_merge
GROUP BY breed
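The question also asks how to avoid typing out every column name. One option, sketched here with Python's sqlite3 module, is to read the column list from PRAGMA table_info and build the MAX() query from it. The merge_query helper and the imgsrc value are hypothetical. The table and key names are interpolated into the SQL, so they must come from trusted code rather than user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE breed_merge (breed TEXT, imgsrc TEXT, dog_group_1 TEXT);
    INSERT INTO breed_merge VALUES
        ('Afador', NULL, NULL),
        ('Affenhuahua', NULL, NULL),
        ('Affenpinscher', 'affenpinscher.jpg', NULL),
        ('Affenpinscher', NULL, 'GROUP 1 - TOYS'),
        ('Afghan Hound', NULL, NULL),
        ('Afghan Hound', NULL, 'GROUP 4 - HOUNDS');
""")


def merge_query(conn, table, key):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # MAX() skips NULLs, so each column keeps whichever row populated it.
    aggs = ", ".join(f'MAX("{c}") AS "{c}"' for c in cols if c != key)
    return f'SELECT "{key}", {aggs} FROM {table} GROUP BY "{key}" ORDER BY "{key}"'


rows = conn.execute(merge_query(conn, "breed_merge", "breed")).fetchall()
for row in rows:
    print(row)
```

Adding a column to breed_merge then needs no change to the query code.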

PostgreSQL Return Row if Value Exists in One of Several Columns

Ok, I am stuck on this one.
I have a PostgreSQL table customers that looks like this:
id firm1 firm2 firm3 firm4 firm5 lastname firstname
1 13 8 2 0 0 Smith John
2 3 2 0 0 0 Doe Jane
Each row corresponds to a client/customer. Each client/customer can be associated with one or more firms; the numeric value in each firm# column corresponds to a firm id in a different table.
So I am looking for a way of returning all rows of customers that are associated with a specific firm.
For example, SELECT id, lastname, firstname where 8 exists in firm1, firm2, firm3, firm4, firm5 would just return the John Smith row as he is associated with firm 8 under the firm2 column.
Any ideas on how to accomplish that?
You can use the IN operator for that:
SELECT *
FROM customers
where 8 IN (firm1, firm2, firm3, firm4, firm5);
But in the long run it would be much better to normalize your data model.
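The IN form works outside PostgreSQL too. A quick check using Python's sqlite3 module with the sample rows from the question; customers_with_firm is a hypothetical helper, and the firm id is passed as a query parameter rather than pasted into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        firm1 INTEGER, firm2 INTEGER, firm3 INTEGER, firm4 INTEGER, firm5 INTEGER,
        lastname TEXT, firstname TEXT
    );
    INSERT INTO customers VALUES
        (1, 13, 8, 2, 0, 0, 'Smith', 'John'),
        (2, 3, 2, 0, 0, 0, 'Doe', 'Jane');
""")


def customers_with_firm(conn, firm_id):
    # "value IN (col1, col2, ...)" is true if any listed column equals the value.
    return conn.execute(
        "SELECT id, lastname, firstname FROM customers "
        "WHERE ? IN (firm1, firm2, firm3, firm4, firm5) ORDER BY id",
        (firm_id,),
    ).fetchall()


print(customers_with_firm(conn, 8))  # [(1, 'Smith', 'John')]
```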
You should consider normalizing your tables. With the current schema, you have to join the firms table once for every firm column in your customers table:
select *
from customers c
left join firms f1
on f1.firm_id = c.firm1
left join firms f2
on f2.firm_id = c.firm2
left join firms f3
on f3.firm_id = c.firm3
left join firms f4
on f4.firm_id = c.firm4
left join firms f5
on f5.firm_id = c.firm5
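Here is what normalizing could look like: a junction table with one row per customer/firm pair. The sketch uses Python's sqlite3 module; customer_firms is a hypothetical table name, and it assumes a firm value of 0 means "no firm", as the sample data suggests. The INSERT ... SELECT with UNION is the same unpivot idea as array/unnest, done once when migrating instead of on every query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        firm1 INTEGER, firm2 INTEGER, firm3 INTEGER, firm4 INTEGER, firm5 INTEGER,
        lastname TEXT, firstname TEXT
    );
    INSERT INTO customers VALUES
        (1, 13, 8, 2, 0, 0, 'Smith', 'John'),
        (2, 3, 2, 0, 0, 0, 'Doe', 'Jane');

    -- Junction table: one row per (customer, firm) association.
    CREATE TABLE customer_firms (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        firm_id     INTEGER NOT NULL,
        PRIMARY KEY (customer_id, firm_id)
    );

    -- Unpivot the five firm columns once; 0 is assumed to mean "no firm".
    INSERT INTO customer_firms (customer_id, firm_id)
    SELECT id, firm1 FROM customers WHERE firm1 <> 0
    UNION SELECT id, firm2 FROM customers WHERE firm2 <> 0
    UNION SELECT id, firm3 FROM customers WHERE firm3 <> 0
    UNION SELECT id, firm4 FROM customers WHERE firm4 <> 0
    UNION SELECT id, firm5 FROM customers WHERE firm5 <> 0;
""")

# "Customers of firm X" is now a plain join, however many firms a customer has.
rows = conn.execute("""
    SELECT c.id, c.lastname, c.firstname
    FROM customers c
    JOIN customer_firms cf ON cf.customer_id = c.id
    WHERE cf.firm_id = ?
    ORDER BY c.id
""", (8,)).fetchall()
print(rows)  # [(1, 'Smith', 'John')]
```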
You can "unpivot" using a combination of array and unnest, as specified in this answer: unpivot and PostgreSQL.
In your case, I think this should work:
select lastname,
firstname,
unnest(array[firm1, firm2, firm3, firm4, firm5]) as firm_id
from customers
Now you can select from this result (using either a WITH clause or a subquery) and filter on the firm_id you care about.

Find rows that contain all words in any order

My application is built in VB.NET with SQL Server Compact as the database, so I'm unable to use a full-text index.
Here's my data...
MainTable field1
A B C
B G C
X Y Z
C P B
Search term = B C
Expected Results = any combination of the search term = Rows 1, 2, 4
Here's what I'm currently doing...
I'm permuting the search term B C into an array containing %B%C% and %C%B% and inserting those values into field1 of tempTable.
So my SQL looks like this:
SELECT * FROM MainTable INNER JOIN tempTable ON MainTable.field1 LIKE tempTable.field1
In this simple example, it returns the expected results correctly. However, my search term can contain more values. For example, 6 search terms (B C D E F G) have 720 permutations, and as more search terms are used, the number of permutations grows factorially (n!), which is not good.
Is there a better way to do this?
The following will work for your example above:
Select * from MainTable where field1 like '%[BC]%'
But it will also return strings that contain only "B" or only "C". Do you need both characters, in any order, or just one or more of them?
EDIT: Then the following would work:
Select * from test_data where col1 LIKE '%Apple%' and col1 like '%Dog%'
See the demo here: http://rextester.com/edit/LNDQ49764
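Generalizing the EDIT to any number of search terms means one LIKE predicate per term, combined with AND, so six terms need six predicates instead of 720 permutations. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server Compact (from ADO.NET the placeholders would be named parameters rather than ?). find_rows_with_all_terms is a hypothetical helper, and escaping %, _ and \ is an added precaution so that user-supplied terms are matched literally.

```python
import sqlite3


def find_rows_with_all_terms(conn, terms):
    """Return field1 values from MainTable that contain every term, in any order."""
    # Escape LIKE wildcards so each term is matched literally (backslash first).
    patterns = [
        "%" + t.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + "%"
        for t in terms
    ]
    # One predicate per term: n terms -> n predicates, not n! permutations.
    where = " AND ".join(["field1 LIKE ? ESCAPE '\\'"] * len(patterns)) or "1 = 1"
    sql = f"SELECT field1 FROM MainTable WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, patterns)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (field1 TEXT)")
conn.executemany("INSERT INTO MainTable VALUES (?)",
                 [("A B C",), ("B G C",), ("X Y Z",), ("C P B",)])

print(find_rows_with_all_terms(conn, ["B", "C"]))  # ['A B C', 'B G C', 'C P B']
```

Like the original %B%C% patterns, each term matches as a substring anywhere in field1, not as a whole word.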