I want to merge the duplicate names in my table or at least see the names that are unique and look alike - sql

I have an employee table with the following schema:
Id Name Birthday DeathDay Startdate EndDate
The problem is that I have data as follows:
Bergh Celestin 06/09/1791 14/12/1861
Bergh Célestin 06/09/1791 14/12/1861
Bergh Francois 04/04/1958 11/12/2001
Bergh Jozef Francois 04/04/1958 11/12/2001
Now I want to merge these records into one, as they are the same person. How can I do that?
Also, if I just want to list only those people in the table whose names are possibly the same, like above, how can I do that?
I used:
select Distinct name,birthday,deathday from table
but that is not good enough.

I would use a function (.NET or SQL) to remove the accents, as per https://stackoverflow.com/a/12715102/1662973, and then group on the cleaned name together with the dates. You need the dates in the grouping because the name alone is not enough: "Bergh Célestin" could actually be a different person from "Bergh Celestin".
Sample:
select
RemoveExtraChars(name)
,birthday
,deathday
from
TABLE
group by
RemoveExtraChars(name)
,birthday
,deathday
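
To try the grouping idea on the sample rows, here is a minimal sketch using Python's sqlite3 module. The table name employee and the body of RemoveExtraChars are assumptions for illustration (the linked answer has its own implementation); this version only strips combining accents via Unicode decomposition.

```python
import sqlite3
import unicodedata


def remove_extra_chars(name):
    """Strip diacritics: decompose to NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL under the name used in the answer above.
conn.create_function("RemoveExtraChars", 1, remove_extra_chars)
conn.execute("CREATE TABLE employee (name TEXT, birthday TEXT, deathday TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Bergh Celestin", "06/09/1791", "14/12/1861"),
        ("Bergh Célestin", "06/09/1791", "14/12/1861"),
        ("Bergh Francois", "04/04/1958", "11/12/2001"),
        ("Bergh Jozef Francois", "04/04/1958", "11/12/2001"),
    ],
)

# Group on the accent-free name plus both dates; n counts the merged rows.
merged = conn.execute(
    """
    SELECT RemoveExtraChars(name) AS clean_name, birthday, deathday, COUNT(*) AS n
    FROM employee
    GROUP BY RemoveExtraChars(name), birthday, deathday
    ORDER BY clean_name
    """
).fetchall()

for row in merged:
    print(row)
```

Note that this only merges the two Celestin rows. "Bergh Francois" and "Bergh Jozef Francois" still differ after accent removal, so they need a looser match.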

For your second question, you can use the SQL LIKE operator:
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
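
One possible way to apply LIKE to the sample data is a self-join that pairs rows with the same dates and a matching first word in the name. This is only a sketch run through Python's sqlite3; the matching rule (same birthday, same deathday, same first word) and the extra "Smith Anna" row are assumptions made for illustration, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, birthday TEXT, deathday TEXT)"
)
conn.executemany(
    "INSERT INTO employee (name, birthday, deathday) VALUES (?, ?, ?)",
    [
        ("Bergh Celestin", "06/09/1791", "14/12/1861"),
        ("Bergh Célestin", "06/09/1791", "14/12/1861"),
        ("Bergh Francois", "04/04/1958", "11/12/2001"),
        ("Bergh Jozef Francois", "04/04/1958", "11/12/2001"),
        ("Smith Anna", "01/01/1900", "02/02/1980"),  # hypothetical unrelated row
    ],
)

# Pair rows that share both dates and whose names start with the same word.
# a.id < b.id reports each pair once instead of twice.
candidates = conn.execute(
    """
    SELECT a.name, b.name, a.birthday, a.deathday
    FROM employee AS a
    JOIN employee AS b
      ON a.birthday = b.birthday
     AND a.deathday = b.deathday
     AND a.id < b.id
    WHERE a.name <> b.name
      AND b.name LIKE substr(a.name, 1, instr(a.name, ' ')) || '%'
    ORDER BY a.id
    """
).fetchall()

for pair in candidates:
    print(pair)
```

The output is a list of candidate pairs to review by hand, which is safer than merging automatically.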

Related

In Snowflake, I want to count duplicates in a table based on all the columns in the table without typing out every column name

I have a table with 60 columns in it. I would like to identify how many duplicates there are in the table based on all the columns being identical.
I don't want to have to type out every field name in the SELECT or GROUP BY clauses. Is there a way to do that?
You can use an approach like this for each table:
SELECT
MD5(OBJECT_CONSTRUCT(SRC.*)::VARCHAR) DUP_MD5, SUM(1) AS TOTAL_COUNT
FROM <table> SRC
GROUP BY 1
HAVING SUM(1) > 1;
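
Outside Snowflake, OBJECT_CONSTRUCT is not available, but the same goal (no hand-typed column list) can be approximated by reading the column names from the catalog and generating the GROUP BY. The sketch below does this for SQLite from Python; it is a different technique from the MD5 approach above, and the helper name count_full_row_duplicates and the table t are hypothetical.

```python
import sqlite3


def count_full_row_duplicates(conn, table):
    """Return (all columns..., total_count) for rows that occur more than once."""
    # Column names come from SQLite's catalog, so none are typed by hand.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in cols)
    sql = (
        f'SELECT {col_list}, COUNT(*) AS total_count FROM "{table}" '
        f"GROUP BY {col_list} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", 2.0), (1, "x", 2.0), (2, "y", 3.0), (1, "x", 2.5)],
)

dups = count_full_row_duplicates(conn, "t")
print(dups)
```

The table name is interpolated into the SQL text, so it should only come from trusted code, never from user input.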

SQL - Find specific value in a specific table

Say I have a table called "Team" with the following columns:
ID, MemberName ,ManagerName,Title
And I would like to retrieve all rows where a value "John" exists.
Assume "John" exists in a row for the MemberName column, and that "John" would exist in another row under the "ManagerName" column.
Please assume the table has a large number of columns (more than 50), and that I do not know in advance which column the value will fall under.
If you need an exact match for "John" you can use following query:
Select *
From Team
Where MemberName = 'John' or ManagerName = 'John'
If you need all rows where "John" could be a part of the string then you can use like:
Select *
From Team
Where MemberName like '%John%' or ManagerName like '%John%'
Generally, you need to specify all the columns you are searching in SQL.
SELECT * FROM Team WHERE 'John' IN (col1, col2, col3, ..., colN) ;
However, that depends on the database and tooling.
If you are using MySQL, you can use Search Table Data: in MySQL Workbench, right-click the table and choose Search Table Data.
If you are using PostgreSQL take a look at the following:
https://stackoverflow.com/a/52715388/10436747
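
With 50+ columns, the IN (col1, ..., colN) list can be generated instead of typed. Here is a minimal sketch for SQLite driven from Python; the helper name find_value_anywhere is hypothetical, and other databases expose column names differently (for example through information_schema).

```python
import sqlite3


def find_value_anywhere(conn, table, value):
    """Return rows where any column equals value exactly."""
    # Read the column names from the catalog rather than listing them by hand.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in cols)
    # The searched value is bound as a parameter; only identifiers are interpolated.
    return conn.execute(
        f'SELECT * FROM "{table}" WHERE ? IN ({col_list})', (value,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Team (ID INTEGER, MemberName TEXT, ManagerName TEXT, Title TEXT)"
)
conn.executemany(
    "INSERT INTO Team VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Mary", "Developer"),
        (2, "Alice", "John", "Tester"),
        (3, "Bob", "Mary", "Analyst"),
    ],
)

matches = find_value_anywhere(conn, "Team", "John")
print(matches)
```

This covers the exact-match case only. A partial match would need one LIKE condition per column, generated the same way.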

Count only a specific subset of elements in a Postgres DB

I have a table with some identifiers that repeat themselves like
id
-------
djkfgh
kdfjhw
efkujh
dfsggs
djkfgh
djkfgh
efkujh
I also have a list of ids of interest, say ["djkfgh","dfsggs"]. I would like to count only those values that appear in the list, rather than all the distinct values of the column.
SELECT count(id) FROM table WHERE id IN ('djkfgh', 'dfsggs');
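
When the list of ids comes from application code, the IN list can be built with one placeholder per value. The following sketch uses SQLite from Python with the sample data from the question; the table name items is an assumption, and since the question does not say whether a total or a per-id count is wanted, both are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(v,) for v in
     ["djkfgh", "kdfjhw", "efkujh", "dfsggs", "djkfgh", "djkfgh", "efkujh"]],
)

subset = ["djkfgh", "dfsggs"]
# One "?" per value keeps the values out of the SQL text itself.
placeholders = ", ".join("?" for _ in subset)

# Total number of rows whose id is in the subset.
total = conn.execute(
    f"SELECT count(id) FROM items WHERE id IN ({placeholders})", subset
).fetchone()[0]

# Same filter, broken down per id.
per_id = conn.execute(
    f"SELECT id, count(*) FROM items WHERE id IN ({placeholders}) "
    "GROUP BY id ORDER BY id",
    subset,
).fetchall()

print(total, per_id)
```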

Updating table where LIKE has several criteria

I have two tables in PostgreSQL (version 9.3). The first holds id, title and the second holds schdname. I'm trying to create a select statement that will retrieve id and title where the title contains the schdname from the other table. The id, title table can hold several thousand rows. I can do this fine if I use WHERE LIKE for an individual schdname example but there are 40 plus names so this is not practical.
My original query looked like this. I know it doesn't work, but it shows what I'm trying to achieve.
SELECT
id,
title,
dname
FROM
mytable
WHERE
title LIKE (
SELECT
schdname
FROM
schedule
)
This produces an error: more than one row returned by a subquery used as an expression. So my question is, can this be achieved another way?
Here is one way to do that:
SELECT id, title, dname FROM mytable
JOIN schedule ON mytable.title like '%' || schedule.schdname || '%'
Or a slightly more readable way:
SELECT id, title, dname FROM mytable
JOIN schedule ON POSITION(schedule.schdname in mytable.title)<>0
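
The join-on-pattern idea can be checked on a few rows. This sketch runs it in SQLite from Python rather than PostgreSQL, so instr() stands in for POSITION(); the sample titles and names are made up, and schdname is selected in place of dname because the question does not say which table dname lives in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, title TEXT)")
conn.execute("CREATE TABLE schedule (schdname TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "Morning News"), (2, "Evening News Special"), (3, "Cooking Hour")],
)
conn.executemany(
    "INSERT INTO schedule VALUES (?)", [("News",), ("Special",), ("Sports",)]
)

# Join each title to every schedule name it contains.
like_rows = conn.execute(
    """
    SELECT m.id, m.title, s.schdname
    FROM mytable AS m
    JOIN schedule AS s ON m.title LIKE '%' || s.schdname || '%'
    ORDER BY m.id, s.schdname
    """
).fetchall()

# Equivalent substring test; instr() returns 0 when there is no match.
instr_rows = conn.execute(
    """
    SELECT m.id, m.title, s.schdname
    FROM mytable AS m
    JOIN schedule AS s ON instr(m.title, s.schdname) > 0
    ORDER BY m.id, s.schdname
    """
).fetchall()

print(like_rows)
```

A title that contains two schedule names appears twice in the result, once per match. Also note that a schdname containing % or _ would be treated as a wildcard by LIKE, which the substring test avoids.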
Are you actually using a wildcard with LIKE? You don't say so above. If not, you can replace LIKE with IN. If you do want a wildcard join, I'd recommend taking a substring of the columns and comparing that, e.g.
names
james
jack
janice
select substr(names,1,2) as names_abbr
from names_table where substr(names,1,2) = (select ...)

SQL using where contains to return rows based on the content of another table

I need some help:
I have a table called Countries, which has a column named Town and a column named Country.
Then I have table named Stores, which has several columns (it is a very badly set up table) but the ones that are important are the columns named Address1 and Address2.
I want to return all of the rows in Stores where Address1 or Address2 contains one of the towns in the Countries table.
I have a feeling this is a simple solution but I just can't see it.
Could WHERE CONTAINS be used with the search term taken from another table's column?
e.g.
SELECT *
FROM Stores
WHERE CONTAINS (Address1, 'Select Towns from Countries')
but obviously that is not possible. Is there a simple solution for this?
You're close:
SELECT * FROM Stores s
WHERE EXISTS (
SELECT * FROM Countries
WHERE CONTAINS(s.Address1, Town) OR CONTAINS(s.Address2, Town)
)
This would be my first attempt:
select * from stores s
where
exists
(
select 1 from countries c
where s.Address1 + s.Address2 like '%'+c.Town+'%'
)
Edit: Oops, just saw that you want the 'CONTAINS' clause. Then take Paul's solution.
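
For comparison, here is a small sketch of the LIKE-based EXISTS variant, run in SQLite from Python. It is not the CONTAINS solution: CONTAINS is a full-text predicate that SQLite does not have, || replaces + for concatenation, and each address column is tested separately so that a NULL in one column does not hide a match in the other. The sample rows and the id column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Countries (Town TEXT, Country TEXT)")
conn.execute("CREATE TABLE Stores (id INTEGER, Address1 TEXT, Address2 TEXT)")
conn.executemany(
    "INSERT INTO Countries VALUES (?, ?)",
    [("Leeds", "UK"), ("York", "UK"), ("Lyon", "France")],
)
conn.executemany(
    "INSERT INTO Stores VALUES (?, ?, ?)",
    [
        (1, "12 High St", "Leeds"),
        (2, "York Road 5", "Leeds"),  # matches two towns
        (3, "1 Rue A", "Paris"),
        (4, "Unit 3", None),
    ],
)

pattern_match = (
    "s.Address1 LIKE '%' || c.Town || '%' OR s.Address2 LIKE '%' || c.Town || '%'"
)

# EXISTS keeps each store at most once, however many towns match.
exists_ids = [r[0] for r in conn.execute(
    f"SELECT s.id FROM Stores s WHERE EXISTS "
    f"(SELECT 1 FROM Countries c WHERE {pattern_match}) ORDER BY s.id"
)]

# A plain JOIN on the same condition repeats a store once per matching town.
join_ids = [r[0] for r in conn.execute(
    f"SELECT s.id FROM Stores s JOIN Countries c ON {pattern_match} ORDER BY s.id"
)]

print(exists_ids, join_ids)
```

Store 2 matches both York and Leeds, which is why EXISTS is the better fit when the goal is a list of stores rather than a list of store-town pairs.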