Dynamic SQL: CASE expression in HAVING clause for SSRS dataset query - sql

One of my tables contains 6 bit flags:
tblDocumentFact.useCase1
tblDocumentFact.useCase2
tblDocumentFact.useCase3
tblDocumentFact.useCase4
tblDocumentFact.useCase5
tblDocumentFact.useCase6
The bit flags are used to restrict the returned data via a HAVING clause, for example:
HAVING tblDocumentFact.useCase4 = 1 /* '1' means 'True' */
That works in a static query. The query is for a dataset for a SQL Server Reporting Services report. Rather than have 6 reports, one per bit flag, I'd like to have 1 report with an @UserChoice input parameter. I'm trying to write a dynamic query to structure the HAVING clause in accordance with the @UserChoice parameter. I'm thinking that @UserChoice could be set to an integer value (1, 2, 3, 4, 5 or 6) when the user clicks a 1-of-6 option button. I've tried to do this via CASE expressions as shown below, but it doesn't work: the query returns no rows. What's the correct approach here?
HAVING (
(CASE WHEN @UserChoice =1 THEN 'dbo.tblDocumentFact.useCase1' END) = '1'
OR (CASE WHEN @UserChoice =2 THEN 'dbo.tblDocumentFact.useCase2' END) = '1'
OR (CASE WHEN @UserChoice =3 THEN 'dbo.tblDocumentFact.useCase3' END) = '1'
OR (CASE WHEN @UserChoice =4 THEN 'dbo.tblDocumentFact.useCase4' END) = '1'
OR (CASE WHEN @UserChoice =5 THEN 'dbo.tblDocumentFact.useCase5' END) = '1'
OR (CASE WHEN @UserChoice =6 THEN 'dbo.tblDocumentFact.useCase6' END) = '1'
)

You need to rephrase your logic slightly:
HAVING
(@UserChoice = 1 AND dbo.tblDocumentFact.useCase1 = 1) OR
(@UserChoice = 2 AND dbo.tblDocumentFact.useCase2 = 1) OR
(@UserChoice = 3 AND dbo.tblDocumentFact.useCase3 = 1) OR
(@UserChoice = 4 AND dbo.tblDocumentFact.useCase4 = 1) OR
(@UserChoice = 5 AND dbo.tblDocumentFact.useCase5 = 1) OR
(@UserChoice = 6 AND dbo.tblDocumentFact.useCase6 = 1);
Note that the column names must not be quoted: 'dbo.tblDocumentFact.useCase1' is a string literal, not a column reference. Also, a CASE expression returns a value, so what follows THEN or ELSE has to be a scalar expression (a literal, a column or a calculation), not a logical condition. Conditional filtering like this is usually clearer written with AND/OR.
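The following is a minimal sketch of the AND/OR pattern. It runs against an in-memory SQLite table through Python's sqlite3 module, standing in for SQL Server. The table name docs, the helper names and the sample rows are hypothetical. The named parameter :choice plays the role of @UserChoice. The filter sits in WHERE only because the sketch does no grouping; the same shape applies inside HAVING.

```python
import sqlite3

def make_db():
    """Build a hypothetical table with six 0/1 flag columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, "
        "useCase1 INT, useCase2 INT, useCase3 INT, "
        "useCase4 INT, useCase5 INT, useCase6 INT)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 0, 0, 0, 0, 0),
            (2, 0, 0, 0, 1, 0, 0),
            (3, 0, 0, 0, 1, 0, 1),
        ],
    )
    return conn

# Each OR branch is active only when :choice selects it, and then compares
# the (unquoted) column value with 1.
QUERY = """
SELECT id FROM docs
WHERE (:choice = 1 AND useCase1 = 1)
   OR (:choice = 2 AND useCase2 = 1)
   OR (:choice = 3 AND useCase3 = 1)
   OR (:choice = 4 AND useCase4 = 1)
   OR (:choice = 5 AND useCase5 = 1)
   OR (:choice = 6 AND useCase6 = 1)
ORDER BY id
"""

def rows_for_choice(conn, choice):
    return [r[0] for r in conn.execute(QUERY, {"choice": choice})]

conn = make_db()
print(rows_for_choice(conn, 4))  # [2, 3]
print(rows_for_choice(conn, 6))  # [3]
```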

To expand a bit on the comment under Tim's post: I think the reason it doesn't work is that your CASE expressions emit strings containing column names, not the values of the columns. Without the quotes, each branch returns the column's value:
HAVING
CASE WHEN @UserChoice = 1 THEN dbo.tblDocumentFact.useCase1 END = 1
OR CASE WHEN @UserChoice = 2 THEN dbo.tblDocumentFact.useCase2 END = 1
...
It might even clean up to this:
HAVING
CASE @UserChoice
WHEN 1 THEN dbo.tblDocumentFact.useCase1
WHEN 2 THEN dbo.tblDocumentFact.useCase2
...
END = 1
The problem (I believe; in SQL Server at least, I'm not totally sure about SSRS) is that when you write:
CASE WHEN @UserChoice = 1 THEN 'dbo.tblDocumentFact.useCase1' END = '1'
your CASE expression emits the literal string dbo.tblDocumentFact.useCase1, not the value of that column on that row. And of course this literal string is never equal to the literal string '1'.
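To see the difference concretely, here is a small sketch that runs the quoted and the unquoted CASE side by side. It uses SQLite via Python's sqlite3 as a stand-in for SQL Server. The one-table schema is hypothetical and drops the dbo prefix, and :choice stands in for @UserChoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDocumentFact (id INTEGER PRIMARY KEY, useCase1 INT)")
conn.executemany("INSERT INTO tblDocumentFact VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])

# Quoted: the branch yields the text 'tblDocumentFact.useCase1', never '1'.
QUOTED = """
SELECT id FROM tblDocumentFact
WHERE (CASE WHEN :choice = 1 THEN 'tblDocumentFact.useCase1' END) = '1'
"""

# Unquoted: the branch yields the column's value for the current row.
# When :choice is not 1 the CASE yields NULL, and NULL = 1 filters the row out.
UNQUOTED = """
SELECT id FROM tblDocumentFact
WHERE (CASE :choice WHEN 1 THEN useCase1 END) = 1
ORDER BY id
"""

quoted_ids = [r[0] for r in conn.execute(QUOTED, {"choice": 1})]
unquoted_ids = [r[0] for r in conn.execute(UNQUOTED, {"choice": 1})]
print(quoted_ids)    # []
print(unquoted_ids)  # [1, 3]
```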
Overall I prefer Tim's solution; I think the query optimizer is more likely to be able to use an index on the bit columns in that form. Be aware, though, that ORs can cause SQL Server to ignore indexes. The DBAs at my old workplace frequently rewrote queries like:
SELECT * FROM Person WHERE FirstName = 'john' OR LastName = 'Smith'
Into this:
SELECT * FROM Person WHERE FirstName = 'john'
UNION
SELECT * FROM Person WHERE LastName = 'Smith'
They did this because the server wouldn't combine the index on FirstName with the index on LastName when we used OR, but in the UNION form it would execute in parallel using both indexes.
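One caveat with this rewrite is that UNION removes duplicate rows. The UNION form therefore returns the same rows as the OR form only when every output row is distinct, for example because a primary key is in the select list. The sketch below checks that with SQLite via Python's sqlite3. The Person rows are hypothetical. The sketch compares results only, not execution plans, which are engine-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1, "john", "Smith"), (2, "john", "Doe"), (3, "Ann", "Smith"), (4, "Ann", "Lee")],
)

OR_SQL = "SELECT {cols} FROM Person WHERE FirstName = 'john' OR LastName = 'Smith'"
UNION_SQL = (
    "SELECT {cols} FROM Person WHERE FirstName = 'john' "
    "UNION "
    "SELECT {cols} FROM Person WHERE LastName = 'Smith'"
)

# With the primary key in the output every row is distinct, so both forms agree.
or_rows = set(conn.execute(OR_SQL.format(cols="*")))
union_rows = set(conn.execute(UNION_SQL.format(cols="*")))
print(or_rows == union_rows)  # True

# Without a unique column, UNION also collapses rows that OR would keep.
or_names = conn.execute(OR_SQL.format(cols="FirstName")).fetchall()
union_names = conn.execute(UNION_SQL.format(cols="FirstName")).fetchall()
print(len(or_names), len(union_names))  # 3 2
```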
Consider, as an alternative, combining those bit flags into a single integer and indexing it. That integer could be a bitmask, where flag n contributes 2^(n-1). You could then find user choices 1 and 2 by searching for 3, or choices 2, 4 and 6 by searching for 42 [2^(2-1) + 2^(4-1) + 2^(6-1)]. Or it could be just a straight int you compare to @userChoice.
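A minimal sketch of the bitmask variant follows, using SQLite via Python's sqlite3 (T-SQL has the same & bitwise AND operator). The docs table, its flags column and the helper names are hypothetical. Flag n maps to bit n-1, so mask(2, 4, 6) is the 42 from the text.

```python
import sqlite3

def mask(*choices):
    """Bitmask with bit (choice - 1) set for each chosen flag: the sum of 2^(choice-1)."""
    m = 0
    for c in choices:
        m |= 1 << (c - 1)
    return m

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, flags INT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(1, mask(1)), (2, mask(1, 2)), (3, mask(2, 4, 6)), (4, mask(4))],
)

def rows_with_all(conn, *choices):
    """Rows whose flags include every requested choice."""
    m = mask(*choices)
    sql = "SELECT id FROM docs WHERE (flags & :m) = :m ORDER BY id"
    return [r[0] for r in conn.execute(sql, {"m": m})]

print(mask(1, 2), mask(2, 4, 6))  # 3 42
print(rows_with_all(conn, 1, 2))  # [2]
print(rows_with_all(conn, 4))     # [3, 4]
```

An equality test on a plain integer (the "straight int" option) is the form an index can seek on. A bitwise test like (flags & :m) = :m generally cannot use an index seek, so the bitmask trades some index friendliness for flexibility.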

Related

using case statement in a where clause

Hello, I am missing something, because my code errors.
select * from ##ScheduleDetail SD
left join ##HolidayFilterTbl HF on SD.Scheduledate = HF.Testdate
where (ScheduleDate = testdate)
and
(Case
when HF.IsHoliday = 1 then (overtime = 1 and makeup = 0)
else
(overtime = 0 and Makeup = 0)
end
)
and
DOW = 5
order by ActivityStartTime
I've attempted several combinations and each one errors at either the first equal sign or the second. What am I missing?
The branches of a CASE expression can only return values, not additional conditions to be evaluated in the WHERE clause. You could, however, simulate this behavior with the AND and OR logical operators:
select *
from ##ScheduleDetail SD
left join ##HolidayFilterTbl HF on SD.Scheduledate = HF.Testdate
where (ScheduleDate = testdate) and
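((HF.IsHoliday = 1 and overtime = 1 and makeup = 0) or
(coalesce(HF.IsHoliday, 0) <> 1 and overtime = 0 and Makeup = 0)) and -- else branch: not a holiday (0, or NULL when the left join finds no row)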
DOW = 5
order by ActivityStartTime
Note that you have makeup = 0 in both branches of the CASE expression in the question (or on both sides of the OR in the answer), so you could factor it out and simplify the condition a bit:
select *
from ##ScheduleDetail SD
left join ##HolidayFilterTbl HF on SD.Scheduledate = HF.Testdate
where ScheduleDate = testdate and
makeup = 0 and
((HF.IsHoliday = 1 and overtime = 1) or
(coalesce(HF.IsHoliday, 0) <> 1 and overtime = 0)) and
DOW = 5
order by ActivityStartTime
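To check that a CASE-based predicate and a plain AND/OR predicate accept exactly the same rows, the following sketch runs both over every combination of flag values. It uses SQLite via Python's sqlite3. The table t and the matching helper are hypothetical, and NULL in IsHoliday mimics a row the left join did not find. The sketch also shows that dropping the COALESCE guard lets holiday rows with overtime = 0 through.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (IsHoliday INT, overtime INT, makeup INT)")
# Every combination; NULL in IsHoliday mimics "no row found by the left join".
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", list(product([1, 0, None], [0, 1], [0, 1])))

def matching(where_clause):
    sql = "SELECT IsHoliday, overtime, makeup FROM t WHERE " + where_clause
    return set(conn.execute(sql).fetchall())

# CASE compared to 1, mirroring the question's intent.
case_rows = matching("""1 = CASE
    WHEN IsHoliday = 1 THEN CASE WHEN overtime = 1 AND makeup = 0 THEN 1 ELSE 0 END
    ELSE CASE WHEN overtime = 0 AND makeup = 0 THEN 1 ELSE 0 END
END""")

# Plain AND/OR; the COALESCE guard restricts the second branch to non-holidays.
bool_rows = matching("""makeup = 0
    AND ((IsHoliday = 1 AND overtime = 1)
         OR (COALESCE(IsHoliday, 0) <> 1 AND overtime = 0))""")

# Same AND/OR without the guard: holiday rows with overtime = 0 slip through.
naive_rows = matching("makeup = 0 AND ((IsHoliday = 1 AND overtime = 1) OR overtime = 0)")

print(case_rows == bool_rows)  # True
print(naive_rows - bool_rows)  # {(1, 0, 0)}
```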
If you still want to know how to use a CASE Expression in a WHERE Clause: the CASE Expression must be compared to a value, as that is the syntax understood for conditions in a WHERE Clause. See below for a mock example.
SELECT *
FROM ##ScheduleDetail SD
LEFT JOIN ##HolidayFilterTbl HF ON SD.Scheduledate = HF.Testdate
WHERE (ScheduleDate = testdate)
AND
/* If you wish to stick with using a CASE Expression within the WHERE Clause, set the CASE Expression equal to 'something'. I usually stick with 1 or 0 for true/false.
| You simply have to create your own True/False evaluation. You can add your logic checks within a CASE Expression within
| the WHERE Clause, and when your logic is TRUE, return 1. Basically you are saying: when 1 = 1, return the record.
*/
1 =
CASE
WHEN HF.IsHoliday = 1 THEN
CASE WHEN overtime = 1 AND makeup = 0 THEN 1 ELSE 0 END /* Holiday: return 1 here to evaluate to TRUE */
ELSE
CASE WHEN overtime = 0 AND makeup = 0 THEN 1 ELSE 0 END /* Not a holiday (0 or NULL): another nested CASE that returns 1 for TRUE */
END
AND
DOW = 5
ORDER BY ActivityStartTime;
There are a few reasons I've used CASE Expressions within a WHERE Clause instead of AND/ORs. One minor reason is that it lets me contain and organize logic in a WHERE Clause inside CASE Expressions rather than having multiple AND/ORs all nested together. I've also found CASE Expressions in the WHERE Clause useful with Dynamic queries that accept variables, which are inserted into the SQL before it is sent to the database for processing. With Dynamic SQL there are times when a CASE Expression MUST be used. The WHERE clause may compare data that is NOT a column value but a hardcoded value, for example a user selection or a status passed in as a static value from the application. That is how the web application I support works, which is why I bring it up.
Basically, it's good to know how to use a CASE Expression in a WHERE Clause, as there are some cases when the ONLY way to evaluate certain data is with a CASE Expression.
I have no data to test this against, and that's not the point. The point of my answer is simply to provide an alternative to the existing answer. In my opinion this logic is basic and the already provided answer is the correct one; my answer only demonstrates how you could go about using a CASE in a WHERE Clause.
If interested, see this SO post for the differences between a CASE Statement and a CASE Expression, but know that this terminology differs slightly between databases.
As an example, SQL Server calls the two forms Simple and Searched but refers to both as a CASE Expression. A CASE Expression can therefore be either a Simple or a Searched CASE, and it can be used within a statement.

Is it possible to use AND in an UPDATE SET clause in a CASE statement?

I need to check two conditions:
1. when the function returns true
2. when the function returns true AND ISP_Program has the word "IRSS" in it
What is the correct syntax? I have the following:
UPDATE [PAYROLL].[dbo].[BILL]
SET Pay_Code = CASE dbo.is_Holiday([BILL].Date)
WHEN 1 THEN holiday_code
WHEN 1 AND ISP_Program like '%IRSS%' THEN '66'
ELSE Pay_Code
END
FROM tbl_TXEX_HOLIDAY
INNER JOIN [BILL] ON [BILL].Pay_Code = tbl_HOLIDAY.regular_code
I think you want:
SET Pay_Code = (CASE WHEN dbo.is_Holiday([BILL].Date) = 1 AND ISP_Program like '%IRSS%' THEN '66'
WHEN dbo.is_Holiday([BILL].Date) = 1 THEN holiday_code
ELSE Pay_Code
END)
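Note that the ordering of these conditions is important. CASE returns the result of the first WHEN that is true, so the more specific IRSS condition must come before the plain holiday check, or it would never be reached.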
I assume that BILL is the table referenced in the UPDATE. I would recommend writing the complete logic as:
UPDATE b
SET Pay_Code = (CASE WHEN dbo.is_Holiday(b.Date) = 1 AND ISP_Program like '%IRSS%' THEN '66'
WHEN dbo.is_Holiday(b.Date) = 1 THEN holiday_code
ELSE b.Pay_Code
END)
FROM [PAYROLL].[dbo].[BILL] b JOIN
tbl_TXEX_HOLIDAY h
ON b.Pay_Code = h.regular_code;
Notes:
Define aliases for the tables so the query is easier to write and to read.
Use the alias for the update, so it is clear what you intend.
Put the table being updated first. After all, it needs to have matching rows for the update to take place.
Of course, fix the case expression.
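To see why the order of the WHEN branches matters, the sketch below runs the same UPDATE twice with the branches swapped. It uses SQLite via Python's sqlite3. Several names are hypothetical: the is_holiday function (a stand-in for dbo.is_Holiday, registered with Connection.create_function), the HOLIDAYS set, the BillDate column and the sample rows. For brevity, holiday_code sits on BILL instead of coming from the joined holiday table.

```python
import sqlite3

# Hypothetical holiday calendar standing in for dbo.is_Holiday.
HOLIDAYS = {"2023-12-25"}

def is_holiday(d):
    return 1 if d in HOLIDAYS else 0

def run_update(case_sql):
    conn = sqlite3.connect(":memory:")
    conn.create_function("is_holiday", 1, is_holiday)
    conn.execute(
        "CREATE TABLE BILL (id INTEGER PRIMARY KEY, BillDate TEXT, "
        "ISP_Program TEXT, Pay_Code TEXT, holiday_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO BILL VALUES (?, ?, ?, ?, ?)",
        [
            (1, "2023-12-25", "IRSS North", "10", "HOL"),   # holiday, IRSS
            (2, "2023-12-25", "Day Program", "10", "HOL"),  # holiday, not IRSS
            (3, "2023-12-26", "IRSS North", "10", "HOL"),   # not a holiday
        ],
    )
    conn.execute("UPDATE BILL SET Pay_Code = " + case_sql)
    return [r[0] for r in conn.execute("SELECT Pay_Code FROM BILL ORDER BY id")]

# Specific condition first: IRSS holiday rows get '66'.
SPECIFIC_FIRST = """CASE
    WHEN is_holiday(BillDate) = 1 AND ISP_Program LIKE '%IRSS%' THEN '66'
    WHEN is_holiday(BillDate) = 1 THEN holiday_code
    ELSE Pay_Code END"""

# General condition first: it matches every holiday row, so '66' is unreachable.
GENERAL_FIRST = """CASE
    WHEN is_holiday(BillDate) = 1 THEN holiday_code
    WHEN is_holiday(BillDate) = 1 AND ISP_Program LIKE '%IRSS%' THEN '66'
    ELSE Pay_Code END"""

print(run_update(SPECIFIC_FIRST))  # ['66', 'HOL', '10']
print(run_update(GENERAL_FIRST))   # ['HOL', 'HOL', '10']
```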

WHERE conditions being listed in a column if they are met

I have a file that I receive each morning which contains details of customers whose information doesn't meet certain criteria. I have built a script with many WHERE conditions that, if met, will show a customer's information and put it in the file, but I'm having trouble finding out why each customer is wrong.
As I have many conditions in the WHERE clause, is there a way to show which column has the incorrect information?
For example i could have a table like this:
NAME|ADDRESS |PHONE|COUNTRY
John|123avenue |12345|UK
My conditions could be
SELECT * FROM CUSTOMERS
WHERE NAME LIKE 'J%'
AND LEFT(PHONE,1) = '1'
so it would show in the file, as two conditions are met. But as I have over 80 rows and 40 conditions, it's hard to look at each row and find out why it's there.
Is there a way i can add a column which will tell me which WHERE condition has been met?
As worded, no. You should reverse your logic. Add fields that show what's wrong, then use those fields in a WHERE clause.
SELECT
*,
CASE WHEN LEFT(phone, 1) = '1' THEN 1 ELSE 0 END AS phone_starts_with_1,
CASE WHEN LEFT(name, 1) = 'Z' THEN 1 ELSE 0 END AS name_starts_with_z
FROM
customers
WHERE
phone_starts_with_1 = 1
OR name_starts_with_z = 1
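In most dialects (SQL Server and Oracle included), a column alias defined in the SELECT list can't be referenced in the WHERE clause of the same query. You therefore need to nest this so that the new fields are resolved before you use them in the WHERE clause...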
SELECT
*
FROM
(
SELECT
*,
CASE WHEN LEFT(phone, 1) = '1' THEN 1 ELSE 0 END AS phone_starts_with_1,
CASE WHEN LEFT(name, 1) = 'Z' THEN 1 ELSE 0 END AS name_starts_with_z
FROM
customers
)
checks
WHERE
phone_starts_with_1 = 1
OR name_starts_with_z = 1
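Below is a runnable sketch of the nested form, using SQLite via Python's sqlite3 and the customer shape from the question. The extra sample rows are hypothetical. SQLite has no LEFT function, so substr(x, 1, 1) stands in for LEFT(x, 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, phone TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("John", "123avenue", "12345", "UK"),
        ("Zoe", "9 Road", "55555", "UK"),
        ("Mary", "1 Lane", "99999", "FR"),
    ],
)

# Inner query: one 0/1 column per check. Outer query: filter on those columns,
# so each returned row also shows which checks it failed.
QUERY = """
SELECT name, phone_starts_with_1, name_starts_with_z
FROM (
    SELECT *,
           CASE WHEN substr(phone, 1, 1) = '1' THEN 1 ELSE 0 END AS phone_starts_with_1,
           CASE WHEN substr(name, 1, 1) = 'Z' THEN 1 ELSE 0 END AS name_starts_with_z
    FROM customers
) AS checks
WHERE phone_starts_with_1 = 1
   OR name_starts_with_z = 1
ORDER BY name
"""

for row in conn.execute(QUERY):
    print(row)
# ('John', 1, 0)
# ('Zoe', 0, 1)
```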

T-SQL Sum Values of Like Rows

I currently use this select statement in SSRS to report Recent Demand and Days of Inventory to end users.
select Issue.MATERIAL_NUMBER,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = @CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = @CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER,Main.QUANTITY_TOTAL_STOCK
Pretty Simple.
But it has come to my attention that they have similar material numbers whose values need to be combined.
Material | Qty
0242-55161W 1
0242-55161 3
The two Material Numbers above should be combined and reported as 0242-55161 Qty 4.
How do I combine rows like this? This is just 1 of many queries that will need to be adjusted. Is it possible?
EDIT - The similar material will always be the base number plus the "W", if that matters.
Please note I am brand new to SQL and SSRS, and this is my first time posting here.
Let me know if I need to include any other details.
Thanks in advance.
Answer:
Using just replace, it kept returning 2 unique lines even when using SUM.
I was able to get the desired result using the following. Can you see anything wrong with this method?
with Issue_Con AS
(
select replace(Issue.MATERIAL_NUMBER,'W','') As [MATERIAL_NUMBER],
Issue.SHIPPED_QTY AS [SHIPPED_QTY]
From AGS_DATAMART.dbo.GOODS_ISSUE AS Issue
Where Issue.SHIP_TO_CUSTOMER_ID = @CUSTSHIP
and Issue.SALES_ORDER_TYPE_CODE = 'ZTPC'
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
)
select Issue_Con.MATERIAL_NUMBER,
SUM(Issue_Con.SHIPPED_QTY)AS [DEMAND_QTY],
Main_Con.QUANTITY_TOTAL_STOCK / SUM(Issue_Con.SHIPPED_QTY) * 122 AS [DOI]
From Issue_Con
join Main_Con on Main_Con.MATERIAL_Number = Issue_Con.MATERIAL_Number
Group By Issue_Con.MATERIAL_NUMBER, Main_Con.QUANTITY_TOTAL_STOCK;
You need to replace Issue.MATERIAL_NUMBER in the select and group by with something else. What that something else is depends on your data.
If it's always the first 10 characters, with anything afterwards ignored, then you can use substring(Issue.MATERIAL_NUMBER, 1, 10)
If the extraneous character is always W and there are no Ws in the proper number, then you can use replace(Issue.MATERIAL_NUMBER, 'W', '')
If it's everything before the first alphabetic character, then you can use case when patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER) = 0 then Issue.MATERIAL_NUMBER else substring(Issue.MATERIAL_NUMBER, 1, patindex('%[A-Za-z]%', Issue.MATERIAL_NUMBER) - 1) end
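To sanity-check which normalization fits your data before editing the queries, here is a rough Python mirror of the three key expressions, applied to the sample numbers. The function names are hypothetical, and the code only imitates the T-SQL logic. Note the - 1 in the third expression: PATINDEX is 1-based, so without it the letter itself would be kept.

```python
import re

def first_10_chars(m):
    # Mirrors SUBSTRING(m, 1, 10)
    return m[:10]

def strip_w(m):
    # Mirrors REPLACE(m, 'W', ''); under a case-insensitive collation
    # SQL Server's REPLACE would also remove a lowercase 'w'.
    return m.replace("W", "")

def before_first_letter(m):
    # Mirrors:
    #   CASE WHEN PATINDEX('%[A-Za-z]%', m) = 0 THEN m
    #        ELSE SUBSTRING(m, 1, PATINDEX('%[A-Za-z]%', m) - 1) END
    # match.start() is 0-based, i.e. PATINDEX - 1.
    match = re.search(r"[A-Za-z]", m)
    return m if match is None else m[: match.start()]

for m in ("0242-55161W", "0242-55161"):
    print(m, first_10_chars(m), strip_w(m), before_first_letter(m))
```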
You could group your data by this expression instead of MATERIAL_NUMBER:
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END
That is, check if the last character is W. If it is, return all but the last character, otherwise return the entire value.
To avoid repeating the same expression twice (once in GROUP BY and once in SELECT) you could use a subselect, for example like this:
select Issue.MATERIAL_NUMBER_GROUP,
SUM(Issue.SHIPPED_QTY)AS DEMAND_QTY,
Main.QUANTITY_TOTAL_STOCK / SUM(Issue.SHIPPED_QTY) * 122 AS [DOI]
From (
SELECT
*,
CASE SUBSTRING(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER), 1)
WHEN 'W' THEN LEFT(MATERIAL_NUMBER, LEN(MATERIAL_NUMBER) - 1)
ELSE MATERIAL_NUMBER
END AS MATERIAL_NUMBER_GROUP
FROM AGS_DATAMART.dbo.GOODS_ISSUE
) AS Issue
join AGS_DATAMART.dbo.OPR_MATERIAL_DIM AS MAT on MAT.MATERIAL_NUMBER = Issue.MATERIAL_NUMBER
join AGS_DATAMART.dbo.SCE_ECC_MAIN_FINAL_INV_FACT AS MAIN on MAT.MATERIAL_SID = MAIN.MATERIAL_SID
join AGS_DATAMART.dbo.SCE_PLANT_DIM AS PLANT on PLANT.PLANT_SID = MAIN.PLANT_SID
Where Issue.SHIP_TO_CUSTOMER_ID = @CUSTID
and Issue.ACTUAL_PGI_DATE > GETDATE() - 122
and PLANT.PLANT_CODE = @CUSTPLANT
and MAIN.STORAGE_LOCATION = '0001'
Group by Issue.MATERIAL_NUMBER_GROUP,Main.QUANTITY_TOTAL_STOCK
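Stripped of the joins and filters, the derived-table idea looks like the sketch below. It uses SQLite via Python's sqlite3, where substr and length stand in for SUBSTRING, LEFT and LEN, and the sample rows are hypothetical. The grouping key is computed once in the inner query and grouped on in the outer one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GOODS_ISSUE (MATERIAL_NUMBER TEXT, SHIPPED_QTY INT)")
conn.executemany(
    "INSERT INTO GOODS_ISSUE VALUES (?, ?)",
    [("0242-55161W", 1), ("0242-55161", 3), ("0300-10000", 5)],
)

# Inner query: drop a trailing 'W' to build the grouping key.
# Outer query: group and sum on that key.
QUERY = """
SELECT MATERIAL_NUMBER_GROUP, SUM(SHIPPED_QTY) AS DEMAND_QTY
FROM (
    SELECT *,
           CASE substr(MATERIAL_NUMBER, length(MATERIAL_NUMBER), 1)
               WHEN 'W' THEN substr(MATERIAL_NUMBER, 1, length(MATERIAL_NUMBER) - 1)
               ELSE MATERIAL_NUMBER
           END AS MATERIAL_NUMBER_GROUP
    FROM GOODS_ISSUE
) AS Issue
GROUP BY MATERIAL_NUMBER_GROUP
ORDER BY MATERIAL_NUMBER_GROUP
"""

print(conn.execute(QUERY).fetchall())
# [('0242-55161', 4), ('0300-10000', 5)]
```

In the full query, grouping also by Main.QUANTITY_TOTAL_STOCK may still split the two numbers into separate rows if each has its own stock record, because MAT is joined on the raw material number.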

How to match people between separate systems using SQL?

I would like to know if there is a way to match people between two separate systems, using (mostly) SQL.
We have two separate Oracle databases where people are stored. There is no link between the two (i.e. cannot join on person_id); this is intentional. I would like to create a query that checks to see if a given group of people from system A exists in system B.
I am able to create tables if that makes it easier. I can also run queries and do some data manipulation in Excel when creating my final report. I am not very familiar with PL/SQL.
In system A, we have information about people (name, DOB, soc, sex, etc.). In system B we have the same types of information about people. There could be data entry errors (person enters an incorrect spelling), but I am not going to worry about this too much, other than maybe just comparing the first 4 letters. This question deals with that problem more specifically.
The way I thought about doing this is through correlated subqueries. So, roughly:
select a.lastname, a.firstname, a.soc, a.dob, a.gender,
case
when exists (select 1 from b where b.lastname = a.lastname) then 'Y' else 'N'
end last_name,
case
when exists (select 1 from b where b.firstname = a.firstname) then 'Y' else 'N'
end first_name,
case [etc.]
from a
This gives me what I want, I think...I can export the results to Excel and then find records that have 3 or more matches. I believe that this shows that a given field from A was found in B. However, I ran this query with just three of these fields and it took over 3 hours to run (I'm looking in 2 years of data). I would like to be able to match on up to 5 criteria (lastname, firstname, gender, date of birth, soc). Additionally, while soc number is the best choice for matching, it is also the piece of data that tends to be missing the most often. What is the best way to do this? Thanks.
You definitely want to weigh the different matches. If an SSN matches, that's a pretty good indication. If a firstName matches, that's basically worthless.
You could try a scoring method based on weights for the matches, combined with the phonetic string matching algorithms you linked to. Here's an example I whipped up in T-SQL. It would have to be ported to Oracle for your issue.
--Score Threshold to be returned
DECLARE @Threshold DECIMAL(6,5) = 0.60
--Weights to apply to each column match (0.00 - 1.00; DECIMAL(6,5) leaves room for 1.00)
DECLARE @Weight_FirstName DECIMAL(6,5) = 0.10
DECLARE @Weight_LastName DECIMAL(6,5) = 0.40
DECLARE @Weight_SSN DECIMAL(6,5) = 0.40
DECLARE @Weight_Gender DECIMAL(6,5) = 0.10
DECLARE @NewStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO @NewStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Sanders','234-62-3442','M' )
DECLARE @OldStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO @OldStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Stickler','234-62-3442','M' ), --3/4 Match
( 'Albert','Sanders','523-42-3441','M' ), --2/4 Match
( 'Benne','Sanders','234-53-2334','F' ), --1/4 Match (last name only)
( 'Ben','Sanders','234623442','M' ), --SSN has no dashes
( 'Ben','Sanders','234-62-3442','M' ) --perfect match
SELECT
[NewID] = ns.ID,
[OldID] = os.ID,
[Weighted Score] =
(CASE WHEN ns.FirstName = os.FirstName THEN @Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN @Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN @Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN @Weight_Gender ELSE 0 END)
,
[RAW Score] = CAST(
((CASE WHEN ns.FirstName = os.FirstName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN 1 ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN 1 ELSE 0 END) ) AS varchar(MAX))
+
' / 4',
os.FirstName ,
os.LastName ,
os.SSN ,
os.Gender
FROM @NewStuff ns
--make sure that at least one item matches exactly
INNER JOIN @OldStuff os ON
os.FirstName = ns.FirstName OR
os.LastName = ns.LastName OR
os.SSN = ns.SSN OR
os.Gender = ns.Gender
where
(CASE WHEN ns.FirstName = os.FirstName THEN @Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN @Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN @Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN @Weight_Gender ELSE 0 END)
>= @Threshold
ORDER BY ns.ID, [Weighted Score] DESC
And then, here's the output.
NewID OldID Weighted Raw First Last SSN Gender
1 5 1.00000 4 / 4 Ben Sanders 234-62-3442 M
1 1 0.60000 3 / 4 Ben Stickler 234-62-3442 M
1 4 0.60000 3 / 4 Ben Sanders 234623442 M
Then, you would have to do some post processing to evaluate the validity of each possible match. If you ever get a 1.00 for weighted score, you can assume that it's the right match, unless you get two of them. If you get a last name and SSN (a combined weight of 0.8 in my example), you can be reasonably certain that it's correct.
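As a cross-check of the scoring logic, here is a port of the same idea to SQLite via Python's sqlite3, using the answer's sample data. The weights are hypothetical whole percentage points (10, 40, 40, 10; threshold 60) rather than DECIMAL fractions, to keep the arithmetic exact. A derived table lets the score expression be written once, instead of repeated in WHERE as in the T-SQL above.

```python
import sqlite3

# Weights in whole percentage points (hypothetical integer scale).
WEIGHTS = {"first": 10, "last": 40, "ssn": 40, "gender": 10}
THRESHOLD = 60

conn = sqlite3.connect(":memory:")
for name in ("NewStuff", "OldStuff"):
    conn.execute(
        f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY, FirstName TEXT, "
        "LastName TEXT, SSN TEXT, Gender TEXT)"
    )
conn.execute(
    "INSERT INTO NewStuff (FirstName, LastName, SSN, Gender) "
    "VALUES ('Ben', 'Sanders', '234-62-3442', 'M')"
)
conn.executemany(
    "INSERT INTO OldStuff (FirstName, LastName, SSN, Gender) VALUES (?, ?, ?, ?)",
    [
        ("Ben", "Stickler", "234-62-3442", "M"),
        ("Albert", "Sanders", "523-42-3441", "M"),
        ("Benne", "Sanders", "234-53-2334", "F"),
        ("Ben", "Sanders", "234623442", "M"),
        ("Ben", "Sanders", "234-62-3442", "M"),
    ],
)

QUERY = """
SELECT NewID, OldID, Score FROM (
    SELECT ns.ID AS NewID, os.ID AS OldID,
           (CASE WHEN ns.FirstName = os.FirstName THEN :first ELSE 0 END)
         + (CASE WHEN ns.LastName = os.LastName THEN :last ELSE 0 END)
         + (CASE WHEN ns.SSN = os.SSN THEN :ssn ELSE 0 END)
         + (CASE WHEN ns.Gender = os.Gender THEN :gender ELSE 0 END) AS Score
    FROM NewStuff ns
    JOIN OldStuff os
      ON os.FirstName = ns.FirstName OR os.LastName = ns.LastName
      OR os.SSN = ns.SSN OR os.Gender = ns.Gender
) AS scored
WHERE Score >= :threshold
ORDER BY NewID, Score DESC, OldID
"""

rows = conn.execute(QUERY, {**WEIGHTS, "threshold": THRESHOLD}).fetchall()
for row in rows:
    print(row)
# (1, 5, 100)
# (1, 1, 60)
# (1, 4, 60)
```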
Example of HLGEM's JOIN suggestion:
SELECT a.lastname,
a.firstname,
a.soc,
a.dob,
a.gender
FROM TABLE a
JOIN TABLE b ON SOUNDEX(b.lastname) = SOUNDEX(a.lastname)
AND SOUNDEX(b.firstname) = SOUNDEX(a.firstname)
AND b.soc = a.soc
AND b.dob = a.dob
AND b.gender = a.gender
Reference: SOUNDEX
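SQLite has no SOUNDEX by default, so the sketch below implements the common American Soundex rules in Python and registers them with sqlite3's create_function. It then runs the same kind of join. Oracle's and SQL Server's built-in SOUNDEX may differ from this version in edge cases. The tables a and b and their rows are hypothetical.

```python
import sqlite3

# American Soundex digit codes; vowels, y, h and w have no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit

def soundex(name):
    """First letter plus three digits; None (SQL NULL) when there are no letters."""
    letters = [c for c in (name or "").lower() if "a" <= c <= "z"]
    if not letters:
        return None
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels give "" and do separate them
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")

conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
for t in ("a", "b"):
    conn.execute(f"CREATE TABLE {t} (lastname TEXT, firstname TEXT, soc TEXT, dob TEXT, gender TEXT)")
conn.execute("INSERT INTO a VALUES ('Saunders', 'Robert', '111', '1980-01-01', 'M')")
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?, ?)", [
    ("Sanders", "Rupert", "111", "1980-01-01", "M"),  # sounds alike, same soc/dob/gender
    ("Sanders", "Robert", "222", "1980-01-01", "M"),  # different soc
])

rows = conn.execute("""
SELECT a.lastname, a.firstname, b.lastname, b.firstname
FROM a JOIN b
  ON soundex(b.lastname) = soundex(a.lastname)
 AND soundex(b.firstname) = soundex(a.firstname)
 AND b.soc = a.soc AND b.dob = a.dob AND b.gender = a.gender
""").fetchall()
print(rows)  # [('Saunders', 'Robert', 'Sanders', 'Rupert')]
```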
I would probably use joins instead of correlated subqueries, but you will have to join on all the fields, so I'm not sure how much that might improve things. Still, correlated subqueries often have to be evaluated row by row and joins don't, so it could improve things a good bit if you have good indexing. But as with all performance tuning, only trying the technique will let you know for sure.
I did a similar task looking for duplicates in our SQL Server system, and I broke it out into steps. First I found everyone where the names and city/state were an exact match. Then I looked for additional possible matches (phone number, SSN, inexact name match, etc.). As I found a possible match between two profiles, I added it to a staging table with a code for the type of match that found it. Then I assigned a confidence amount to each type of match and added up the confidence for each potential match. So if the SOC matches, you might want a high confidence; the same goes if the name, gender and DOB are all exact. Less so if the last name is exact and the first name is not, etc. By adding a confidence, I was much better able to see which possible matches were more likely to be the same person. SQL Server also has a SOUNDEX function, which can help with names that are slightly different. I'll bet Oracle has something similar.
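The staging-table approach described above can be sketched as follows, using SQLite via Python's sqlite3. The match codes, their confidence values and the sample rows are all hypothetical. Each pass records which rule found a pair, and a final GROUP BY adds up the confidence per pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT, soc TEXT, dob TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT, soc TEXT, dob TEXT);
CREATE TABLE match_type (code TEXT PRIMARY KEY, confidence INT);
CREATE TABLE staging (a_id INT, b_id INT, code TEXT);
INSERT INTO match_type VALUES ('SOC', 60), ('NAME_DOB', 30), ('LAST_ONLY', 5);
INSERT INTO a VALUES (1, 'Smith', 'Ann', '111', '1970-05-01');
INSERT INTO b VALUES (10, 'Smith', 'Ann', '111', '1970-05-01'),
                     (11, 'Smith', 'Anne', NULL, '1970-05-01'),
                     (12, 'Smith', 'Bob', NULL, '1990-01-01');
""")

# One pass per match type; each pass records which rule found the pair.
passes = {
    "SOC": "a.soc = b.soc",
    "NAME_DOB": "a.lastname = b.lastname AND a.firstname = b.firstname AND a.dob = b.dob",
    "LAST_ONLY": "a.lastname = b.lastname",
}
for code, condition in passes.items():
    conn.execute(
        f"INSERT INTO staging SELECT a.id, b.id, ? FROM a JOIN b ON {condition}",
        (code,),
    )

# Add up the confidence of every rule that matched each pair.
ranked = conn.execute("""
SELECT s.a_id, s.b_id, SUM(m.confidence) AS total
FROM staging s JOIN match_type m ON m.code = s.code
GROUP BY s.a_id, s.b_id
ORDER BY total DESC, s.b_id
""").fetchall()
print(ranked)  # [(1, 10, 95), (1, 11, 5), (1, 12, 5)]
```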
After I did this, I learned how to do fuzzy grouping in SSIS and was able to generate more matches with a higher confidence level. I don't know if Oracle's ETL tools have a way to do fuzzy logic, but if they do, it can really help with this type of task. If you also happen to have SQL Server, SSIS can connect to Oracle, so you could use fuzzy grouping yourself. It can take a long time to run, though.
I will warn you that name, DOB and gender are not likely to ensure that two records are the same person, especially for common names.
Are there indexes on all of the columns in table b in the WHERE clause? If not, that will force a full scan of the table for each row in table a.
You can use SOUNDEX, but you can also use UTL_MATCH for fuzzy string comparison. UTL_MATCH makes it possible to define a threshold: http://www.psoug.org/reference/utl_match.html