Conditional joining in Postgres using levenshtein - sql

I have two tables, let's say Table A and Table B.
I want to query these two tables to check whether two columns, say col1 and col2, hold similar values, and show the matching pairs.
Something like:
SELECT A.col1, B.col2
FROM A INNER JOIN B
ON LEVENSHTEIN(A.col1, B.col2) < 2;
Ultimately, I also want to ignore all white space inside the values and compare only the characters, so
if col1 had the values {g o o d, b a d}
and col2 had {good, bad},
I would like those to be matches.

Does this work?
SELECT A.col1, B.col2
FROM A INNER JOIN
B
ON LEVENSHTEIN(replace(A.col1, ' ', ''), replace(B.col2, ' ', '')) < 2;
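
As a rough check of the idea: in PostgreSQL, levenshtein() comes from the fuzzystrmatch extension, so it has to be installed first (CREATE EXTENSION fuzzystrmatch). The sketch below runs the same join in an in-memory SQLite database from Python, with a hand-written levenshtein function registered as a stand-in. The sample rows (including 'ugly') and the Python helper are assumptions for illustration, not part of the original tables.

```python
import sqlite3


def levenshtein(s, t):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


conn = sqlite3.connect(":memory:")
# Stand-in for the PostgreSQL function of the same name (assumption).
conn.create_function("levenshtein", 2, levenshtein)
conn.executescript("""
    CREATE TABLE A (col1 TEXT);
    CREATE TABLE B (col2 TEXT);
    INSERT INTO A VALUES ('g o o d'), ('b a d');
    INSERT INTO B VALUES ('good'), ('bad'), ('ugly');
""")

# Same shape as the query in the question: strip spaces, then compare.
rows = conn.execute("""
    SELECT A.col1, B.col2
    FROM A INNER JOIN B
      ON levenshtein(replace(A.col1, ' ', ''), replace(B.col2, ' ', '')) < 2
    ORDER BY A.col1, B.col2
""").fetchall()
print(rows)
```

Note that replace(..., ' ', '') only removes plain spaces; tabs or other white space would need extra handling. The join also compares every row of A with every row of B, so it can get slow on large tables.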

Related

filtering for column pairs in SQL

Consider two tables that have the following rows:
A
1,X
2,Y
3,Z
B
1,X
1,Y
1,Z
2,X
2,Y
2,Z
3,X
3,Y
3,Z
Is it possible to select the rows in B whose column pairs also appear in A, without a join or a third column?
Something like:
select * from B where distinct columns in (select distinct columns from A)
You could use exists logic:
SELECT col1, col2
FROM TableB b
WHERE EXISTS (SELECT 1 FROM TableA a WHERE a.col1 = b.col1 AND a.col2 = b.col2);
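
A minimal sketch of the EXISTS approach, run against the sample data from the question in an in-memory SQLite database. The column names col1 and col2 follow the answer; the INTEGER/TEXT column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (col1 INTEGER, col2 TEXT);
    CREATE TABLE TableB (col1 INTEGER, col2 TEXT);
    INSERT INTO TableA VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
    INSERT INTO TableB VALUES
        (1, 'X'), (1, 'Y'), (1, 'Z'),
        (2, 'X'), (2, 'Y'), (2, 'Z'),
        (3, 'X'), (3, 'Y'), (3, 'Z');
""")

# Keep only the rows of TableB whose (col1, col2) pair also occurs in TableA.
matching_pairs = conn.execute("""
    SELECT col1, col2
    FROM TableB b
    WHERE EXISTS (SELECT 1 FROM TableA a
                  WHERE a.col1 = b.col1 AND a.col2 = b.col2)
    ORDER BY col1, col2
""").fetchall()
print(matching_pairs)
```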

How to find differences in a content table SQL

I would like to find records that differ between two datasets in the same table that were loaded on different dates.
That is, records where one or more attributes (other than the key) differ between dataset x, loaded on 1-1-2018, and dataset y, loaded on 31-12-2018.
How do I achieve this in SQL?
The key on which the compare should be made is ZIP_CODE + House_ID
Greets,
You can get the previous row's values with LAG:
SELECT ZipCode, HouseId,
LAG(ZipCode, 1, 0) OVER (ORDER BY LoadDate) AS ZipCodeMinus1,
LAG(HouseId, 1, 0) OVER (ORDER BY LoadDate) AS HouseIdMinus1
FROM Addresses;
A simple way to compare sets is
select ... a
EXCEPT
select ... b
but you need another
select ... b
EXCEPT
select ... a
and this doesn't tell you which columns are different.
Or you use a full outer join:
select
coalesce(a.ZIP_CODE, b.ZIP_CODE)
,coalesce(a.House_ID, b.House_ID)
,case when a.col1 <> b.col1 then 'a:' || a.col1 || ' b:' || b.col1 end
...
from
( select ....) as a
full join
( select ....) as b
on a.ZIP_CODE = b.ZIP_CODE
and a.House_ID = b.House_ID
where a.ZIP_CODE is null -- key exists only in b
or b.ZIP_CODE is null -- key exists only in a
or a.col1 <> b.col1
or a.col2 <> b.col2
or a.col3 <> b.col3
...
If columns are NULLable, you need additional conditions that check whether one or both columns are NULL. Of course, this comparison syntax can be generated automatically from the existing metadata.
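
The EXCEPT approach in both directions can be sketched as follows, using an in-memory SQLite database. The Addresses table name and LoadDate column are borrowed from the LAG answer; the single attribute column col1, the ISO date strings and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Addresses (
        ZIP_CODE TEXT, House_ID INTEGER, col1 TEXT, LoadDate TEXT
    )
""")
conn.executemany(
    "INSERT INTO Addresses VALUES (?, ?, ?, ?)",
    [
        ("1000AA", 1, "red",   "2018-01-01"),
        ("1000AA", 2, "blue",  "2018-01-01"),
        ("2000BB", 5, None,    "2018-01-01"),
        ("1000AA", 1, "red",   "2018-12-31"),
        ("1000AA", 2, "green", "2018-12-31"),
        ("2000BB", 5, None,    "2018-12-31"),
    ],
)

# Rows of the first load that have no identical row in the second load.
diff_sql = """
    SELECT ZIP_CODE, House_ID, col1 FROM Addresses WHERE LoadDate = ?
    EXCEPT
    SELECT ZIP_CODE, House_ID, col1 FROM Addresses WHERE LoadDate = ?
"""
only_in_x = conn.execute(diff_sql, ("2018-01-01", "2018-12-31")).fetchall()
only_in_y = conn.execute(diff_sql, ("2018-12-31", "2018-01-01")).fetchall()
print(only_in_x)
print(only_in_y)
```

EXCEPT treats two NULLs as equal when comparing rows, so the unchanged row with a NULL attribute is not reported as a difference here; a plain <> comparison would need explicit NULL handling.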

SQL how to check if a value in a col is NOT in another table

Maybe I need another coffee because this seems so simple yet I cannot get my head around it.
Let's say I have a tableA with a col1 where employee IDs are stored: ALL employee IDs. The second table, tableB, has col2, which lists the IDs of all employees who have a negative evaluation.
I need a query that returns all IDs from col1 of tableA plus a new column that shows '1' for those IDs which do NOT exist in col2 of tableB.
I am doing this in dashDB
One option uses a LEFT JOIN between the two tables:
SELECT a.col1,
CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
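Alternatively, you can use a LEFT JOIN together with the IFNULL function, as below. Note that NewCol is 1 for IDs missing from tableB, but holds the matching col2 value rather than 0 for the others.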
SELECT a.col1,
IFNULL(b.col2, 1) NewCol
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
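
To see how the two LEFT JOIN variants behave, here is a small sketch on an in-memory SQLite database (which also supports IFNULL). The employee IDs are made-up sample data, and SQLite stands in for dashDB as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INTEGER);  -- all employee IDs
    CREATE TABLE tableB (col2 INTEGER);  -- IDs with a negative evaluation
    INSERT INTO tableA VALUES (1), (2), (3), (4);
    INSERT INTO tableB VALUES (2), (4);
""")

# CASE version: 1 when the ID is absent from tableB, 0 otherwise.
case_rows = conn.execute("""
    SELECT a.col1,
           CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

# IFNULL version: 1 when absent, but the matched col2 value otherwise.
ifnull_rows = conn.execute("""
    SELECT a.col1, IFNULL(b.col2, 1) AS NewCol
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

print(case_rows)
print(ifnull_rows)
```

If col2 can contain the same ID more than once, both joins return that ID once per matching row, so a DISTINCT or an EXISTS-based test may be preferable in that case.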

Is there a way to do a multi table query and get result just from specific tables?

I am trying to query multiple tables, but I don't want to use subqueries such as:
SELECT column1
FROM table1
WHERE
EXISTS (SELECT column1 FROM table2 WHERE table1.column1 = table2.column1);
I thought of using a JOIN but so far my best result was this:
SELECT *
FROM table1
JOIN table2 ON table1.t1id = table2.t2id
WHERE table1.id = 5;
This would be good except for the fact that I get a duplicate column (the IDs in table 1 and 2 are foreign keys).
How do I remove the duplicate column if possible?
UPDATE:
Table1:
tableA_ID, TABLEB_ID
1, 1
1, 4
3, 2
4, 3
TableA: ID, COL1, COL2
1, A, B
2, A, B
3, A, B
4, A, B
TableB: ID, Col3, COL4
1, C, D
2, C, D
3, C, D
4, C, D
I want to get all or some of the columns from TableA according to a condition.
Sample: Let's say the condition is tableA_ID = 1, which matches the first two rows of Table1. I then want to get all or some of the columns of TableA for the IDs that I got from Table1.
Sample: The result from the previous step was [{1,1}{1,4}], which means I want these results from TableA:
TableA.ID, TableA.COL1, TableA.COL2
1,A,B
4,A,B
The actual result I get is:
Table1.tableA_ID, Table1.TABLEB_ID, TableA.ID, TableA.COL1, TableA.COL2
1,1,1,A,B
1,4,4,A,B
Is this what you're looking for?
select a.id, a.column1, b.column2
from table1 a
left join table2 b on a.id = b.otherid;
You can't change the column list of a query based on the values it returns. It just isn't the way that SQL is designed to operate. At best, you can return all of the columns from the second table and ignore the ones that aren't relevant based on other values in that row.
I'm not even sure how a variable column list would work. In your scenario, you're looking for two discrete values separately. But that's not the only scenario: what if the condition is tableA_ID in (1,2). Would you want different numbers of columns in different rows as part of a single result set?
Getting just the columns you want (just from specific tables, as you say) is the easy part (btw -- don't use '*' if you can help it -- topic for another discussion):
SELECT
A.ID,
A.COL1,
A.COL2
FROM
TABLE1 Bridge
LEFT JOIN TABLEA A
ON Bridge.TABLEA_ID = A.ID
LEFT JOIN TABLEB B
ON Bridge.TABLEB_ID = B.ID
Getting the rows you want will be the harder part (influenced by your choice of joins, among several other things).
I think you'll need to select only the fields of table A and use a DISTINCT clause. The rest of your query will remain as it is, i.e.
SELECT distinct table1.*
FROM table1
JOIN table2 ON table1.t1id = table2.t2id
WHERE table1.id = 5;
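
As a hedged illustration of listing only one table's columns in the SELECT, the sketch below loads the sample data from the question's UPDATE into an in-memory SQLite database. The join column (Table1.TABLEB_ID = TableA.ID) is an assumption read off the asker's sample output, where the rows {1,1} and {1,4} map to TableA IDs 1 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (tableA_ID INTEGER, TABLEB_ID INTEGER);
    CREATE TABLE TableA (ID INTEGER, COL1 TEXT, COL2 TEXT);
    INSERT INTO Table1 VALUES (1, 1), (1, 4), (3, 2), (4, 3);
    INSERT INTO TableA VALUES
        (1, 'A', 'B'), (2, 'A', 'B'), (3, 'A', 'B'), (4, 'A', 'B');
""")

# Naming the columns explicitly (instead of SELECT *) keeps the bridge
# table's columns out of the result set.
rows = conn.execute("""
    SELECT A.ID, A.COL1, A.COL2
    FROM Table1 Bridge
    JOIN TableA A ON Bridge.TABLEB_ID = A.ID  -- join column is an assumption
    WHERE Bridge.tableA_ID = 1
    ORDER BY A.ID
""").fetchall()
print(rows)
```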

Comparing substrings in same table

I need to run a query that will give me the list of all entries in one column that are NOT LIKE any of the entries in another column, i.e.:
SELECT DISTINCT columnA
FROM tableA
WHERE columnA NOT LIKE (SELECT columnB FROM tableA)
Obviously, the above query doesn't work, I'm providing it only in the hopes that it will clarify what I'm trying to achieve. So, as an example, say that my columns contain the following:
COLUMNA:
ABCD
ABCE
BCDE
BCDF
BCDEF
GHIJ
GHIK
COLUMNB:
ABC
DEF
HIJ
My desired results would be:
BCDE
BCDF
GHIK
There are a total of 396 values in column in the table, so just entering the values manually is not feasible. In addition, as noted in the example, the values in columnB would always be substrings of the values in columnA, so I also need to have my query do the comparison with that in mind.
Thanks in advance for any help anyone can offer, and also apologies if this question has already been answered elsewhere - I did a search but wasn't able to find anything that I could interpret as addressing this specific requirement.
ADDING NEW INFO **
So, as noted, I made a HUGE mistake in that the two columns are in different tables. That said, though, it was easy enough to modify califax's suggestion below as follows:
SELECT DISTINCT COLUMNA
FROM TABLE1 T1
LEFT JOIN TABLE2 T2 ON
T1.COLUMNA LIKE '%' + T2.COLUMNB + '%'
AND T2.COLUMNB IS NULL
However, it's still returning the full list of entries from COLUMNA. I've confirmed that there are entries in COLUMNB that are substrings of the entries in COLUMNA - any ideas why this isn't filtering?
Thanks.
SELECT DISTINCT columnA
FROM tableA as O
WHERE not exists ( select 42 from TableA where O.ColumnA like '%' + ColumnB + '%' )
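Perform a self join, and look for the rows that don't match. The IS NULL test belongs in the WHERE clause: inside the ON clause it can never be true together with the LIKE condition, so every row would be returned.
SELECT DISTINCT a1.ColumnA
FROM TableA a1
LEFT JOIN TableA a2
ON a1.ColumnA LIKE '%' + a2.ColumnB + '%'
WHERE a2.ColumnB IS NULL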
(I added a leading wildcard, since you clarified the desired matches in your question.)
UPDATE
If there are two distinct tables, the rows where b.ColumnB IS NULL after the join are the ones that don't match:
SELECT DISTINCT a.ColumnA
FROM TableA a
LEFT JOIN TableB b
ON a.ColumnA LIKE '%' + b.ColumnB + '%'
WHERE b.ColumnB IS NULL
I would try something like:
select distinct columnA from tableA where columnA not like '%' + columnB + '%'
or, following criticalfix's remark (as I'm not sure exactly what you want):
SELECT DISTINCT columnA FROM tableA tbA
WHERE not exists ( select 1 from TableA where tbA.ColumnA like '%' + ColumnB + '%' )
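
A small sketch of the "contains none of the substrings" filter for the two-table case, run on the sample values from the question in an in-memory SQLite database. SQLite concatenates strings with || rather than +, and the table names TableA and TableB are assumptions following the answers above. It shows the NOT EXISTS form next to a LEFT JOIN whose IS NULL test sits in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ColumnA TEXT);
    CREATE TABLE TableB (ColumnB TEXT);
    INSERT INTO TableA VALUES
        ('ABCD'), ('ABCE'), ('BCDE'), ('BCDF'), ('BCDEF'), ('GHIJ'), ('GHIK');
    INSERT INTO TableB VALUES ('ABC'), ('DEF'), ('HIJ');
""")

# Keep values of ColumnA that contain none of the ColumnB substrings.
not_exists_rows = conn.execute("""
    SELECT DISTINCT a.ColumnA
    FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b
                      WHERE a.ColumnA LIKE '%' || b.ColumnB || '%')
    ORDER BY a.ColumnA
""").fetchall()

# Anti-join form: unmatched rows are filtered in WHERE, after the join.
left_join_rows = conn.execute("""
    SELECT DISTINCT a.ColumnA
    FROM TableA a
    LEFT JOIN TableB b ON a.ColumnA LIKE '%' || b.ColumnB || '%'
    WHERE b.ColumnB IS NULL
    ORDER BY a.ColumnA
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```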