Matching two columns in MySQL - sql

I'm quite new to SQL and have a question about matching names from two columns located within a table:
Let's say I want to use the SOUNDEX() function to match two columns. If I use this query:
SELECT * FROM tablename WHERE SOUNDEX(column1)=SOUNDEX(column2);
a row is returned if the two names within that row match. Now I'd also like to get those name matches between column1 and column2 that aren't in the same row. Is there a way to automate a procedure whereby every name from column1 is compared to every name from column2?
Thanks :)
P.S.: If anyone could point me in the direction of an n-gram/bi-gram matching algorithm that is easy for a noob to implement in MySQL, that would be good as well.

If your table has a key, say id, you can try:
select A.column1, B.column2
from tablename as A, tablename as B
where (A.id != B.id) and (SOUNDEX(A.column1) = SOUNDEX(B.column2))

You can join the table to itself on that relationship as such:
SELECT * FROM tablename t1 JOIN tablename t2
ON SOUNDEX(t1.column1) = SOUNDEX(t2.column2);
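The question also asks about bi-gram matching. The following is a minimal sketch, not a MySQL implementation: it assumes the two columns have already been fetched into Python lists, and it uses the Sorensen-Dice coefficient over character bigrams as the similarity score. The function names, the sample names, and the 0.5 threshold are all hypothetical choices for illustration.

```python
from itertools import product


def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_similarity(a, b):
    """Sorensen-Dice coefficient over bigram sets: 2*|A & B| / (|A| + |B|)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0  # strings shorter than two characters have no bigrams
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def cross_match(column1, column2, threshold=0.5):
    """Compare every name in column1 with every name in column2."""
    result = []
    for x, y in product(column1, column2):
        score = dice_similarity(x, y)
        if score >= threshold:
            result.append((x, y, round(score, 2)))
    return result


# Hypothetical sample data standing in for column1 and column2.
column1 = ["Jonathan", "Smith"]
column2 = ["Smyth", "Jonathon"]
matches = cross_match(column1, column2)
print(matches)
```

Like the self-join above, this compares every value of one column with every value of the other, so the work grows with the product of the two column sizes.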

Related

Store only 1 values and remove the rest for same duplicated values in bigquery

I have duplicated values in my data. From each set of duplicated values, I want to keep only one value and remove the rest.
So far, I have only found a solution that removes ALL the duplicated values, like this:
Code:
SELECT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
Sample input:
Sample output (Store only 1 values from the duplicated values and remove the rest):
NOTE: the query might be wrong, as it removes all duplicated values.
Considering your screenshot's sample data and your explanation, I understand that you want to remove duplicates from your table, retaining only one row for each unique set of values. The query below selects only one such row and ignores the duplicates.
To select rows without any duplicates, you can use SELECT DISTINCT. According to the documentation, it discards any duplicate rows. In addition, a CREATE TABLE statement is used to create a new table (or replace the previous one) with the deduplicated data. The syntax is as follows:
CREATE OR REPLACE TABLE project_id.dataset.table AS
SELECT DISTINCT ID, a.date as date.A, b.date as date.B,
CASE WHEN a.date <> b.date THEN NULL END AS b.date
except(date.A)
FROM
table1 a LEFT JOIN table2 b
USING (ID)
WHERE date.A = 1
And the output will be exactly the same as you shared in your question.
Notice that I used CREATE OR REPLACE, which means if you set project_id.dataset.table to the same path as the table within your select, it will replace your current table (in case you have the data coming from one unique table). Otherwise, it will create a new table with the specified new table's name.
You can use aggregation. Something like this:
SELECT ANY_VALUE(a).*, ANY_VALUE(b).*
FROM table1 a LEFT JOIN
table2 b
USING (ID)
WHERE date.A = 1
GROUP BY id, a.date;
For each id/date combination, this returns an arbitrary matching row from a/b.

Oracle SQL Update one table column with the value of another table

I have a table A, where there is a column D_DATE with value in the form YYYYMMDD (I am not bothered about the date format). I also happen to have another table B, where there is a column name V_TILL. Now, I want to update the V_TILL column value of table B with the value of D_DATE column in table A which happens to have duplicates as well. Meaning, the inner query can return multiple records from where I form a query to update the table.
I currently have this query written but it throws the error:
ORA-01427: single-row subquery returns more than one row
UPDATE TAB_A t1
SET (V_TILL) = (SELECT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
WHERE EXISTS (
SELECT 1
FROM TAB_B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
PS: BK_CODE IS THE CONCATENATION OF BK_CODE and BR_CODE
Kindly help me as I am stuck in this quagmire! Any help would be appreciated.
If the subquery returns many values, which one do you want to use?
If any of them will do, you can use ROWNUM <= 1.
If you know that there is only one distinct value, use DISTINCT:
SET (V_TILL) = (SELECT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE AND ROWNUM <=1)
or
SET (V_TILL) = (SELECT DISTINCT TO_DATE(t2.D_DATE,'YYYYMMDD')
FROM B t2
WHERE t1.BR_CODE = t2.BR_CODE
AND t1.BK_CODE = t2.BK_CODE||t2.BR_CODE)
The above are workarounds. To do it right, you have to analyze why you are getting more than one value. Maybe more sophisticated logic is needed to select the right value.
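One way to make the choice explicit is to aggregate inside the correlated subquery so that it can return only one value per row. The sketch below is an assumption-laden illustration, not Oracle code: it uses SQLite through Python's sqlite3 module as a stand-in, keeps D_DATE as a YYYYMMDD string, invents the sample rows, and arbitrarily picks the latest date with MAX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_a (bk_code TEXT, br_code TEXT, v_till TEXT);
    CREATE TABLE tab_b (bk_code TEXT, br_code TEXT, d_date TEXT);
    INSERT INTO tab_a VALUES ('K101', '01', NULL), ('K202', '02', NULL);
    -- Two candidate dates for the same key: the situation behind ORA-01427.
    INSERT INTO tab_b VALUES ('K1', '01', '20200101'), ('K1', '01', '20210315');
""")

# MAX() collapses the duplicates to one deterministic value per tab_a row.
# The EXISTS clause leaves rows without a match untouched.
conn.execute("""
    UPDATE tab_a
    SET v_till = (SELECT MAX(t2.d_date)
                  FROM tab_b t2
                  WHERE tab_a.br_code = t2.br_code
                    AND tab_a.bk_code = t2.bk_code || t2.br_code)
    WHERE EXISTS (SELECT 1
                  FROM tab_b t2
                  WHERE tab_a.br_code = t2.br_code
                    AND tab_a.bk_code = t2.bk_code || t2.br_code)
""")

rows = conn.execute(
    "SELECT bk_code, v_till FROM tab_a ORDER BY bk_code"
).fetchall()
print(rows)
```

Whether the latest, the earliest, or some other value is the right one depends on the data, which is the analysis the answer recommends.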
I got it working with this command:
MERGE INTO TAB_A A
USING TAB_B B
ON (A.BK_CODE = B.BK_CODE || B.BR_CODE
AND A.BR_CODE = B.BR_CODE AND B.BR_DISP_TYPE <> '0'
AND ((B.BK_CODE, B.BR_SUFFIX) IN (SELECT BK_CODE,
MIN(BR_SUFFIX)
FROM TAB_B
GROUP BY BK_CODE)))
As mentioned earlier by many, I was missing an extra condition and got it working, otherwise the above mentioned techniques work very well.
Thanks to all!

Select values from one table depending on referenced value in another table

I have two tables in my SQLite Database (dummy names):
Table 1: FileID F_Property1 F_Property2 ...
Table 2: PointID ForeignKey(fileid) P_Property1 P_Property2 ...
The entries in Table2 all have a foreign key column that references an entry in Table1.
I now would like to select entries from Table2 where for example F_Property1 of the referenced file in Table1 has a specific value.
I tried something naive:
select * from Table2 where fileid=(select FileID from Table1 where F_Property1 > 1)
Now this actually works..kind of. It selects a correct file id from Table1 and returns entries from Table2 with this ID. But it only uses the first returned ID. What I need it to do is basically connect the returned IDs from the inner select by OR so it returns data for all the IDs.
How can I do this? I think it is some kind of cross-table query like what is asked here: What is the proper syntax for a cross-table SQL query? But those answers contain no explanation of what they are actually doing, so I'm struggling with any implementation.
They are using JOIN statements, but wouldn't this mix entries from Table1 and Table2 together while only checking matching IDs in both tables? At least that is how I understand this http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
As you may have noticed from the style, I'm very new to using databases in general, so please forgive me if not everything is clear about what I want. Please leave a comment and I will try to improve the question if necessary.
The = operator compares a single value against another, so it is assumed that the subquery returns only a single row.
To check whether a (column) value is in a set of values, use IN:
SELECT *
FROM Table2
WHERE fileid IN (SELECT FileID
FROM Table1
WHERE F_Property1 > 1)
The way joins work is not by "mixing" the data, but sort of combining them based on the key.
In your case (I am assuming the key field in Table 1 is unique), if you join those two tables on the primary key field, you will end up with all the entries in table2 plus all corresponding fields from table1. If you were doing this:
select * from table1, table2 where table1.FileID=table2.foreignkey;
then, providing your key fields are set up right, you will end up with the following:
PointID ForeignKey(fileid) P_Property1 P_Property2 FileID F_Property1 F_Property2
The field values from table1 would be from matching rows.
Now, if you do this:
select table2.* from table1, table2 where
table1.FileID=table2.foreignkey and F_Property1>1;
you would get essentially the same set of records, but it will only show the columns from the second table, and only those rows that satisfy the where condition for the first one.
Hope this helps :)
If I understood your question correctly this will get the job done.
Select t2.*
from table1 t1
inner join table2 t2 on t2.id = t1.id
where t1.Prop = 'SomeValue'
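The difference between the `=` subquery, the IN subquery, and the join can be checked on a small example. This is a sketch using Python's sqlite3 module with an in-memory database; the sample rows are made up, and the foreign key column is assumed to be named fileid as in the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (FileID INTEGER PRIMARY KEY, F_Property1 INTEGER);
    CREATE TABLE Table2 (PointID INTEGER PRIMARY KEY,
                         fileid INTEGER REFERENCES Table1(FileID));
    INSERT INTO Table1 VALUES (1, 0), (2, 2), (3, 5);
    INSERT INTO Table2 VALUES (10, 1), (11, 2), (12, 2), (13, 3);
""")

# "=" against a multi-row subquery: SQLite silently uses a single row.
eq_rows = conn.execute("""
    SELECT PointID FROM Table2
    WHERE fileid = (SELECT FileID FROM Table1 WHERE F_Property1 > 1)
    ORDER BY PointID
""").fetchall()

# IN checks membership in the whole set of returned IDs.
in_rows = conn.execute("""
    SELECT PointID FROM Table2
    WHERE fileid IN (SELECT FileID FROM Table1 WHERE F_Property1 > 1)
    ORDER BY PointID
""").fetchall()

# The inner join returns the same Table2 rows when FileID is unique.
join_rows = conn.execute("""
    SELECT t2.PointID
    FROM Table2 t2
    JOIN Table1 t1 ON t1.FileID = t2.fileid
    WHERE t1.F_Property1 > 1
    ORDER BY t2.PointID
""").fetchall()

print(eq_rows, in_rows, join_rows)
```

Files 2 and 3 both satisfy F_Property1 > 1, so the IN and join versions return the points of both files, while the `=` version returns the points of only one of them.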

how to compare two rows in one mdb table?

I have one mdb table with the following structure:
Field1 Field2 Field3 Field4
A ...
B ...
I try to use a query to list all the different fields of row A and B in a result-set:
SELECT * From Table1
WHERE Field1 = 'A'
UNION
SELECT * From Table1
WHERE Field1 = 'B';
However, this query has two problems:
It lists all the fields, including the identical cells.
With a large table, it gives an error message: too many fields defined.
How could I get around these issues?
Is it not easiest to just select all fields needed from the table, based on the Field1 value and group on the values needed?
So something like this:
SELECT field1, field2,...field195
FROM Table1
WHERE field1 = 'A' or field1 = 'B'
GROUP BY field1, field2, ....field195
This will give you all rows where field1 is A or B and there is a difference in one of the selected fields.
Oh and for the group by statement as well as the SELECT part, indeed use the previously mentioned edit mode for the query. There you can add all fields (by selecting them in the table and dragging them down) that are needed in the result, then click the 'totals' button in the ribbon to add the group by- statements for all. Then you only have to add the Where-clause and you are done.
Now that the question is more clear (you want the query to select fields instead of records based on the particular requirements), I'll have to change my answer to:
This is not possible.
(until proven otherwise) ;)
As far as I know, a query is used to select records, for example with the where clause. It is never used to determine which fields should be shown depending on a certain criterion.
One thing that MIGHT help in this case is to look at the database design. Are those tables correctly made?
Suppose you have 190 of those fields that are merely details of the main data. You could separate this in another table, so you have a main table and details table.
The details table could look something like:
ID ID_Main Det_desc Det_value
This way you can find all detail values that differ between the two main values A and B using something like:
Select A.det_desc, A.det_value, B.det_value
from (Select det_desc, det_value
from tblDetails
where id_main = 'A') as A inner join
(Select det_desc, det_value
from tblDetails
where id_main = 'B') as B
on A.det_desc = B.det_desc and A.det_value <> B.det_value
This you can join with your main table again if needed.
You can full join the table on itself, matching identical rows. Then you can filter on mismatches if one of the two join parts is null. For example:
select *
from (
select *
from Table1
where Field1 = 'A'
) A
full join
(
select *
from Table1
where Field1 = 'B'
) B
on A.Field2 = B.Field2
and A.Field3 = B.Field3
where A.Field1 is null
or B.Field1 is null
If you have 200 fields, ask Access to generate the column list by creating a query in design view. Switch to SQL view and copy/paste. An editor with column mode (like UltraEdit) will help create the query.
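If the comparison can be done in client code rather than in a single query, listing only the differing fields is straightforward. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for the mdb file, the sample values are invented, differing_fields is a hypothetical helper, and both rows are assumed to exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
    CREATE TABLE Table1 (Field1 TEXT, Field2 TEXT, Field3 TEXT, Field4 TEXT);
    INSERT INTO Table1 VALUES ('A', 'x', 'y', 'z'), ('B', 'x', 'q', 'z');
""")


def differing_fields(conn, key_a, key_b):
    """Map each column that differs between two rows to its pair of values."""
    query = "SELECT * FROM Table1 WHERE Field1 = ?"
    row_a = conn.execute(query, (key_a,)).fetchone()
    row_b = conn.execute(query, (key_b,)).fetchone()
    return {
        col: (row_a[col], row_b[col])
        for col in row_a.keys()
        if col != "Field1" and row_a[col] != row_b[col]
    }


diff = differing_fields(conn, "A", "B")
print(diff)
```

Because each row is fetched separately and compared column by column, the column list never has to be typed out, however many fields the table has.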

how to select a row where one of several columns equals a certain value?

Say I have a table that includes column A, column B and column C. How do I write a query that selects all rows where either column A OR column B OR column C equals a certain value? Thanks.
Update: I think I forgot to mention my confusion. Say there is another column (column 1) and I need to select based on the following logic:
...where Column1 = '..' AND (ColumnA='..' OR ColumnB='..' OR ColumnC='..')
Is it valid to group statements with parentheses, as I did above, to get the desired logic?
Unless I'm missing something here...
SELECT * FROM MYTABLE WHERE COLUMNA=MyValue OR COLUMNB=MyValue OR COLUMNC=MyValue
I prefer this way as it's neater:
select *
from mytable
where
myvalue in (ColumnA, ColumnB, ColumnC)
SELECT *
FROM myTable
WHERE (Column1 = MyOtherValue) AND
((ColumnA = MyValue) OR (ColumnB = MyValue) OR (ColumnC = MyValue))
Yes, it's valid to use parentheses. However, if you're searching multiple columns for the same value, you may want to consider normalizing the database.
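The effect of the parentheses can be seen on a small table. This is a sketch using Python's sqlite3 module; the table contents and the values 'keep' and 'hit' are hypothetical. It runs the grouped form, the equivalent IN form, and the same condition without parentheses, where AND binds more tightly than OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, Column1 TEXT,
                          ColumnA TEXT, ColumnB TEXT, ColumnC TEXT);
    INSERT INTO mytable VALUES
        (1, 'keep', 'x',   'hit', 'y'),
        (2, 'skip', 'hit', 'x',   'y'),
        (3, 'keep', 'x',   'y',   'z'),
        (4, 'skip', 'x',   'y',   'hit');
""")


def ids(where_clause, params):
    """Return the ids of the rows matching the given WHERE clause."""
    sql = f"SELECT id FROM mytable WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Parentheses group the three OR branches under the AND.
grouped = ids(
    "Column1 = ? AND (ColumnA = ? OR ColumnB = ? OR ColumnC = ?)",
    ("keep", "hit", "hit", "hit"),
)

# Same logic written with IN over the three columns.
in_form = ids("Column1 = ? AND ? IN (ColumnA, ColumnB, ColumnC)", ("keep", "hit"))

# Without parentheses this reads as (Column1 AND ColumnA) OR ColumnB OR ColumnC.
ungrouped = ids(
    "Column1 = ? AND ColumnA = ? OR ColumnB = ? OR ColumnC = ?",
    ("keep", "hit", "hit", "hit"),
)

print(grouped, in_form, ungrouped)
```

Row 4 has Column1 = 'skip' but still appears in the ungrouped result, which is why the parentheses are needed for the logic in the question.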