Updating a database column based on its similarity to another database column - sql

I have a database table (Customers) with the following columns:
ID
FIRST_NAME
MIDDLE_INIT
LAST_NAME
FULL_NAME
I also have a database table (ENG) with the following columns:
ID
ENG_NAME
I want to replace every ENG.ENG_NAME entry with the matching FULL_NAME entry from the CUSTOMERS table.
Here is the problem.
The ENG_NAME values were typed by hand into a web form, so they have no consistency. For instance, one row might contain "Robin Hood", another "Hood, Robin L", and another "Robin L Hood".
I want to search the entries in the CUSTOMERS table, find a close match, then replace ENG.ENG_NAME with CUSTOMERS.FULL_NAME.
Example:
ENG table              CUSTOMERS table
ID  ENG_NAME           ID  FULL_NAME       FIRST_NAME  MIDDLE_INIT  LAST_NAME
================       ==================================================================
1   Hood,Robin         1   Robin L Hood    Robin       L            Hood
2   Rob Hood           2   Maid M Marion   Maid        M            Marion
3   Marion M           3   Friar F Tuck    Friar       F            Tuck
4   Rob Garza          4   Robert A Garza  Robert      A            Garza
Based on the data above, I would want ENG_NAME columns to be replaced like this:
ENG table
ID  ENG_NAME
====================
1   Robin L Hood
2   Robin L Hood
3   Maid M Marion
4   Robert A Garza
Any thoughts on how to do this?
Thanks

This is not going to be a simple task. I would start by finding a good C# (or any .NET) algorithm that detects similar string portions.
Then look at Compiling C# Code into SQL Stored Procedures and invoke that code from SQL Server. The CLR code can then write its results to a table for you to analyze and act on.
For more: CLR SQL Server User-Defined Function

I would do it in .NET using Levenshtein distance.
Start with a distance of 1. You are going to have some ties, and you need to decide how to resolve them.
Then move to 2, 3, 4...
You could do it in a CLR function, but how are you going to deal with ties? And you are going to have ties. How are you going to decide when something is not a match at all?
I would also put the result in a new column so you keep a history of the original data.
Or store an FK reference to the CUSTOMERS table instead.
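To make the tie and no-match questions concrete, here is a minimal Python sketch (the answer suggests .NET; the idea carries over). The names `levenshtein` and `closest_customers` and the `max_distance` cutoff are made up for illustration. Matching is done on lowercased raw strings, so reordered entries such as "Hood,Robin" would still need to be normalized first.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_customers(eng_name, full_names, max_distance=3):
    """Return every FULL_NAME tied for the smallest distance to eng_name.

    An empty list means "no match at all" (nothing within max_distance,
    a hypothetical cutoff). More than one entry means a tie that still
    has to be resolved by some other rule or by hand.
    """
    if not full_names:
        return []
    scored = [(levenshtein(eng_name.lower(), name.lower()), name)
              for name in full_names]
    best = min(distance for distance, _ in scored)
    if best > max_distance:
        return []
    return [name for distance, name in scored if distance == best]
```

Writing the returned list to a separate column or review table, rather than overwriting ENG_NAME directly, keeps the original data and makes the ties visible.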

Related

Strict Match Many to One on Lookup Table

This has been driving me and my team up the wall. I cannot compose a query that strictly matches a single record with a specific set of lookups.
We have a single lookup table
room_member_lookup:
room | member
---------------
A | Michael
A | Josh
A | Kyle
B | Kyle
B | Monica
C | Michael
I need to match a room against an exact list of members, but everything else I've tried from Stack Overflow still matches room A even if I ask for a room with ONLY Josh and Kyle.
I've tried queries like
SELECT room FROM room_member_lookup
WHERE member IN ('Josh', 'Michael')
GROUP BY room
HAVING COUNT(1) = 2
However, this still returns room A even though it has 3 members. I need the exact set of members to match the room, with no partial matches.
SELECT room
FROM room_member_lookup a
WHERE member IN ('Monica', 'Kyle')
  -- Make sure that the room 'a' has exactly two members
  AND (SELECT COUNT(*)
       FROM room_member_lookup b
       WHERE a.room = b.room) = 2
GROUP BY room
-- and both members are in that room
HAVING COUNT(1) = 2
Depending on the SQL dialect, you can also build a dynamic table (a CTE or select .. union all) to hold the member set (Monica and Kyle, for example), and then test for set equivalence using the MINUS/EXCEPT SQL operators: the sets are equal when the difference is empty in both directions.
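A sketch of the EXCEPT idea, run here against an in-memory SQLite database from Python so it is easy to try. The helper names `demo_connection` and `rooms_with_exact_members` are hypothetical, and other dialects may need MINUS or a different way to build the member set.

```python
import sqlite3


def demo_connection():
    """In-memory copy of the room_member_lookup table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE room_member_lookup (room TEXT, member TEXT)")
    conn.executemany(
        "INSERT INTO room_member_lookup VALUES (?, ?)",
        [("A", "Michael"), ("A", "Josh"), ("A", "Kyle"),
         ("B", "Kyle"), ("B", "Monica"), ("C", "Michael")],
    )
    return conn


def rooms_with_exact_members(conn, members):
    """Return the rooms whose member set equals `members` exactly."""
    wanted = sorted(set(members))
    if not wanted:
        return []
    values = ", ".join("(?)" for _ in wanted)  # one placeholder row per member
    sql = f"""
        WITH wanted(member) AS (VALUES {values})
        SELECT DISTINCT r.room
        FROM room_member_lookup AS r
        WHERE NOT EXISTS (   -- no wanted member is missing from the room
                SELECT member FROM wanted
                EXCEPT
                SELECT m.member FROM room_member_lookup AS m
                WHERE m.room = r.room)
          AND NOT EXISTS (   -- the room has nobody outside the wanted set
                SELECT m.member FROM room_member_lookup AS m
                WHERE m.room = r.room
                EXCEPT
                SELECT member FROM wanted)
        ORDER BY r.room
    """
    return [row[0] for row in conn.execute(sql, wanted)]
```

Asking for Josh and Kyle returns no room here, because room A also contains Michael.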

Merge SQL Rows in Subquery

I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
Select name
From `table2`
Where acc IN (Select acc
              From `table1`
              WHERE source = 'World')
Instead of getting something like this:
Acc1   Acc2  Acc3
Jeff   Jeff  Ted
Chris  Ted   Blake
Rob    Jack  Jack
I get something more like this:
row  name
1    Jeff
2    Chris
3    Rob
4    Jack
5    Jeff
6    Jack
7    Ted
8    Blake
Ultimately, I am hoping to download the data and use Python or something similar to take each name and count the number of times it shows up with each other name at a given accession number. Beyond that, I want to measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings that exist under each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction, or tell me whether this approach is wise given my end goal?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (select t1.acc
                 from `table1` t1
                 where t1.source = 'World'
                )
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
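For the pair counting described in the question, one row per accession with an array of names is a convenient starting point. A minimal Python sketch, assuming the downloaded result has already been loaded into a dict mapping each accession to its list of names (the function name `pair_counts` and the sample data are illustrative only):

```python
from collections import Counter
from itertools import combinations


def pair_counts(names_by_acc):
    """Count how many accessions each unordered pair of names shares."""
    counts = Counter()
    for names in names_by_acc.values():
        # sorted(set(...)) drops duplicates within one accession and gives
        # each pair a single canonical (alphabetical) order.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


# Sample data shaped like the desired output in the question.
sample = {
    "Acc1": ["Jeff", "Chris", "Rob"],
    "Acc2": ["Jeff", "Ted", "Jack"],
    "Acc3": ["Ted", "Blake", "Jack"],
}
counts = pair_counts(sample)
```

With this sample, `counts[("Jack", "Ted")]` is 2 because the two names appear together under both Acc2 and Acc3.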

SQL Query - More options and suggestions apart from pivoting

I'm new to SQL, so please don't mind if this is a silly question.
My table looks like this
I want to send only one email to each manager, telling them which employees in their group failed to fill in their timesheet.
Currently I have pivoted the above table so that it looks like this
and I send the emails by concatenating firstemp+secondemp+thirdemp+------
Can this be done in an easier way?
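You can use the CONCAT() function to combine the columns of a row into a single column:
SELECT M_EMAIL,
       CONCAT(FIRSTEMP, SECONDEMP, THIRDEMP, FOURTHEMP, FIFTHEMP...)
FROM your_table;
In SQL Server, CONCAT() treats NULL values as empty strings. Other engines differ: MySQL's CONCAT(), for example, returns NULL if any argument is NULL.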
Please don't pivot: the concat is really ugly to maintain, and it will break as soon as a manager has more subordinates than you have concat columns.
The syntax depends on which database engine you use. For example, in MSSQL (2017 or later, where STRING_AGG is available) you could do this:
select manager, m_email, STRING_AGG(employee, ', ') as subordinates
from Employee
group by manager, m_email
This result has only one row per manager and a fixed number of columns, regardless of how many subordinates the manager has:
manager | m_email | subordinates
--------+---------+-------------
A       | A#A.COM | b, D
D       | D#D.COM | e, h
I       | I#I.COM | j
Play with the example here: http://sqlfiddle.com/#!18/73bb5/5
Another option is to query just the relevant rows into the application layer and do the grouping there.
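If the grouping is done in the application layer, it could look roughly like the Python sketch below. It assumes the query returns one unpivoted (manager, m_email, employee) row per employee; the function name `group_by_manager` and the sample rows (taken from the result table above) are illustrative.

```python
from collections import defaultdict


def group_by_manager(rows):
    """Collapse (manager, m_email, employee) rows into one entry per manager."""
    grouped = defaultdict(list)
    for manager, m_email, employee in rows:
        grouped[(manager, m_email)].append(employee)
    # One comma-separated string per manager, ready to drop into an email body.
    return {key: ", ".join(employees) for key, employees in grouped.items()}


rows = [
    ("A", "A#A.COM", "b"),
    ("A", "A#A.COM", "D"),
    ("D", "D#D.COM", "e"),
    ("D", "D#D.COM", "h"),
    ("I", "I#I.COM", "j"),
]
subordinates = group_by_manager(rows)
```

Each key of `subordinates` then corresponds to exactly one email to send, however many employees the manager has.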

<SQL>Extract data from table which is txt format (contain 1 fields only)

In our existing system we have a table that stores the reported user info (let's say UserInfoRaw). This table contains only one field (detail); sample data looks like this:
^StartNewUser
^UserName
Simon
^EnableFacebook
Y
^EnableTwitter
N
^EndNewUser
^StartNewUser
^UserName
Vicky
^EnableFacebook
N
^EndNewUser
We now need to convert this format into a queryable table, let's say "User-info", containing the 3 fields below. The output should be:
UserName  facebook  twitter
===========================
Simon     Y         N
Vicky     N         N
The constraints are:
1. I know the tag fields I need to extract (e.g. ^EnableFacebook is a tag name I know and can use for selection).
2. We extract at user level; each user MUST have ^StartNewUser/^EndNewUser in the text. This is a precondition.
3. An attribute tag may be missing in some cases (e.g. Vicky's ^EnableTwitter tag); the field should be treated as N during extraction.
4. We can only use pure SQL here, as this is an interim solution for our MI team. They can only run SQL, and we can't do a program release to automate this process at the moment.
So far we have come up with a solution using RRN, but it needs many interim tables:
1: Produce OUT1, which includes the start/end row numbers for each user
SELECT RRN(A) ,detail from UserInfoRaw A where A.detail in ('^StartNewUser' , '^EndNewUser')
OUT1 :
1 ^StartNewUser
8 ^EndNewUser
9 ^StartNewUser
14 ^EndNewUser
2: Produce OUT2, which includes the row number of each user name
Select RRN(A) ,detail from UserInfoRaw A where RRN(A) IN
(select RRN(B)+1 from UserInfoRaw B where B.detail = '^UserName')
OUT2 :
3 Simon
11 Vicky
3: Produce JOIN12, which maps each ^StartNewUser row to its ^UserName row
Select MAX(A.row) as startRow , B.row as nameRow from OUT1 A,OUT2 B
where A.detail = '^StartNewUser' AND A.row <B.row
GROUP BY B.row
order by startRow
JOIN12 :
1 3
9 11
4: Join the 3 tables on the ^StartNewUser row to get the mapping for one field
Select C.startRow ,A.detail , C.nameRow,B.detail from OUT1 A, OUT2 B, JOIN12 C where A.row=C.startRow and B.row=C.nameRow
Result :
1 ^StartNewUser 3 Simon
9 ^StartNewUser 11 Vicky
With this approach we can produce the mapping for one field, and by repeating similar steps we can build the result table with all the fields we want.
But we have 10+ fields to extract (maybe more if the business requests it), so we're asking here whether there is a better approach for this case. Thanks!
(PS: if you're an AS400 guy and you know how to produce the result with WRKQRY, that would be the best :) you know which MI team I'm referring to... a real mess..)
By your own admission, you have a text file, not a table.
SQL wasn't designed to deal with files (text or otherwise); it was designed to deal with databases containing tables and relations.
Therefore, don't use SQL statements for this; process it as a text file. It's going to be faster and simpler. My default assumption is that a traditional RPGLE program doing READ is going to beat any attempt to do this sort of processing in SQL, simply because this is the type of workload it was designed for. Or use any other language that can process files.
(It would be easier to do this in SQL with a stored procedure and a cursor, but it would be unwieldy at best, because SQL lacks many of the features that make doing this in a 'normal' programming language so much easier, like local, private functions.)
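To show how little code the "process it as a text file" route needs, here is a rough Python sketch of a line-by-line pass (the answer's own suggestion is RPGLE; Python is used only for illustration). The names `TAG_TO_COLUMN` and `parse_users` are hypothetical, and the sketch assumes the rows arrive in their original order, with each tag followed by its value on the next line.

```python
# Tags to extract and the output column each one feeds (extend as needed).
TAG_TO_COLUMN = {
    "^UserName": "UserName",
    "^EnableFacebook": "facebook",
    "^EnableTwitter": "twitter",
}


def parse_users(lines):
    """Turn the tag/value lines into one dict per ^StartNewUser block."""
    users = []
    current = None   # record being built, or None outside a user block
    pending = None   # column that the next value line belongs to
    for raw in lines:
        line = raw.strip()
        if line == "^StartNewUser":
            # Missing flags default to "N", as the question requires.
            current = {"UserName": "", "facebook": "N", "twitter": "N"}
            pending = None
        elif line == "^EndNewUser":
            if current is not None:
                users.append(current)
            current, pending = None, None
        elif line.startswith("^"):
            pending = TAG_TO_COLUMN.get(line)  # None for tags we ignore
        elif current is not None and pending is not None:
            current[pending] = line
            pending = None
    return users
```

Adding an eleventh or twelfth field is then a one-line change to `TAG_TO_COLUMN` rather than another set of interim tables.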
tl;dr
I have a hammer, what's the best way to use it to cut a board in half?

Custom Sort Order in CAML Query

How would one go about telling a CAML query to sort the results in a thoroughly custom order?
For instance, for a given field:
-- when equal to 'Chestnut' at the top,
-- then equal to 'Zebra' next,
-- then equaling 'House'?
Finally, within those groupings, sort on a second condition (such as 'Name'), normally ascending.
So this
ID  Owns      Name
--------------------
1   Zebra     Sue
2   House     Jim
3   Chestnut  Sid
4   House     Ken
5   Zebra     Bob
6   Chestnut  Lou
becomes
ID  Owns      Name
--------------------
6   Chestnut  Lou
3   Chestnut  Sid
5   Zebra     Bob
1   Zebra     Sue
2   House     Jim
4   House     Ken
In SQL, this can be done with Case/When. But in CAML? Not so much!
To my knowledge, CAML does not have such a sort operator. One workaround is to add a calculated column to the list with a number data type and the formula
=IF(Owns="Chestnut",0,IF(Owns="Zebra",1,IF(Owns="House",3,999)))
You can then order on the calculated column, which translates the custom sort order into numbers. Another solution is to create a second list holding the items to own, plus a second column containing their sort order. You can link the two lists and order by the item list's sort order. The benefit is that changing the sort order is as easy as editing the respective list items.
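Both workarounds rest on the same idea: map each value to a number, sort on that number, then break ties on the second field. Outside CAML, the idea can be sketched in a few lines of Python (the names `SORT_RANK` and `custom_sort` are illustrative; 999 plays the same "everything else last" role as in the formula):

```python
# Custom order for the Owns field; anything unlisted sorts last.
SORT_RANK = {"Chestnut": 0, "Zebra": 1, "House": 2}


def custom_sort(items):
    """Sort by the custom Owns rank first, then by Name ascending."""
    return sorted(
        items,
        key=lambda item: (SORT_RANK.get(item["Owns"], 999), item["Name"]),
    )


items = [
    {"ID": 1, "Owns": "Zebra", "Name": "Sue"},
    {"ID": 2, "Owns": "House", "Name": "Jim"},
    {"ID": 3, "Owns": "Chestnut", "Name": "Sid"},
    {"ID": 4, "Owns": "House", "Name": "Ken"},
    {"ID": 5, "Owns": "Zebra", "Name": "Bob"},
    {"ID": 6, "Owns": "Chestnut", "Name": "Lou"},
]
ordered_ids = [item["ID"] for item in custom_sort(items)]
```

For the sample list from the question, `ordered_ids` comes out as 6, 3, 5, 1, 2, 4, matching the desired table.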