How would I write this SQL query? - sql

I have the following tables:
PERSON_T    DISEASE_T            DRUG_T
=========   ==========           ========
PERSON_ID   DISEASE_ID           DRUG_ID
GENDER      PERSON_ID            PERSON_ID
NAME        DISEASE_START_DATE   DRUG_START_DATE
            DISEASE_END_DATE     DRUG_END_DATE
I want to write a query that takes an input of a disease id and returns one row for each person in the database with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies if they took the drug before contracting the disease. I.E. true would mean drug_start_date < disease_start_date. False would mean drug_start_date>disease_start_date or the person never took that particular drug.
We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible to create a query that will return the result set as I want it or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.

Based on the info provided:
SELECT p.name,
p.gender,
CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
dt.drug_id
FROM PERSON_T p
LEFT JOIN DISEASE_T d ON d.person_id = p.person_id
AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
AND dt.drug_start_date < d.disease_start_date
...but this returns one row per person per matching drug, so many rows will look like duplicates except for the drug_id column.

You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).
You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):
Use a series of CASE statements in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time
Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses case statements as described above, but along with obtaining that list of values from the data itself.
The two options essentially do the same thing, you're just trading simplicity and ease of maintenance for flexibility in the second option.
For example, using option 1:
select
p.NAME,
p.GENDER,
(case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
(case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
(case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
(case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3
from PERSON_T p
left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = @DiseaseId
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE
group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID
As you can tell, this gets a little laborious as you get outside of just a few potential values.
The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:
Gather list of potential DRUG_ID values along with names for the columns
Prepare three string values: the SQL prefix (everything before the first drug-related CASE statement), the SQL suffix (everything after the last drug-related CASE statement), and the dynamic portion
Construct the dynamic portion by combining drug CASE statements based upon the previously retrieved list
Combine them into a single (hopefully valid) SQL statement and execute
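The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it uses Python's sqlite3 module in place of a procedural SQL language, the sample rows are made up, and the helper name build_pivot_query is hypothetical. It also uses MAX(CASE ...) for each drug column, which gives the same 0/1 result as the SUM(...) > 0 form in option 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON_T (PERSON_ID INTEGER PRIMARY KEY, GENDER TEXT, NAME TEXT);
CREATE TABLE DISEASE_T (DISEASE_ID INTEGER, PERSON_ID INTEGER, DISEASE_START_DATE TEXT);
CREATE TABLE DRUG_T (DRUG_ID INTEGER, PERSON_ID INTEGER, DRUG_START_DATE TEXT);
-- Made-up sample data (end-date columns omitted for brevity)
INSERT INTO PERSON_T VALUES (1, 'F', 'Ann'), (2, 'M', 'Bob'), (3, 'F', 'Cy');
INSERT INTO DISEASE_T VALUES (10, 1, '2020-06-01'), (10, 2, '2020-03-01');
INSERT INTO DRUG_T VALUES (1, 1, '2020-01-01'), (2, 1, '2020-07-01'),
                          (1, 2, '2020-05-01'), (2, 3, '2019-01-01');
""")


def build_pivot_query(drug_ids):
    """Hypothetical helper: prefix + one CASE column per drug + suffix."""
    prefix = (
        "SELECT p.NAME, p.GENDER, "
        "CASE WHEN d.DISEASE_ID IS NULL THEN 0 ELSE 1 END AS HAD_DISEASE"
    )
    # int() guards against splicing anything but a number into the SQL text.
    dynamic = "".join(
        f", MAX(CASE WHEN dr.DRUG_ID = {int(i)} THEN 1 ELSE 0 END) AS TOOK_DRUG_{int(i)}"
        for i in drug_ids
    )
    suffix = (
        " FROM PERSON_T p"
        " LEFT JOIN DISEASE_T d ON d.PERSON_ID = p.PERSON_ID AND d.DISEASE_ID = ?"
        " LEFT JOIN DRUG_T dr ON dr.PERSON_ID = p.PERSON_ID"
        " AND dr.DRUG_START_DATE < d.DISEASE_START_DATE"
        " GROUP BY p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID"
        " ORDER BY p.PERSON_ID"
    )
    return prefix + dynamic + suffix


# Step 1: gather the list of drug ids from the data itself.
drug_ids = [r[0] for r in conn.execute("SELECT DISTINCT DRUG_ID FROM DRUG_T ORDER BY DRUG_ID")]
# Steps 2-4: assemble the statement and run it for disease 10.
rows = conn.execute(build_pivot_query(drug_ids), (10,)).fetchall()
for row in rows:
    print(row)
```

The disease id stays a bound parameter; only the drug ids, which come from the database and are forced to integers, are spliced into the SQL text.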

Related

SQL query question. Extracting data met for one of two conditions but not both

I'm extracting student data who have completed a list of courses for degree requirements. One of the courses on the list is equivalent to another course, so if a student completes both equivalent courses, it can only be counted once towards a degree. I need to extract data on students who completed the list of courses, while filtering for just one of the equivalent courses.
Where am I going wrong?
I've tried different OR and AND NOT clauses but I can't seem to get the result that I need
use coll18_live
select ENR_STUDENT_ID, ENR_TERM, CRS_NAME, ENR_GRADE
from dbo.CA320_ENROLLMENT_VIEW_N03
WHERE ENR_CENSUS_REG_FLAG = 'Y'
and ENR_TERM in ('14/FA', '15/SP')
and not (CRS_NAME = 'BUSI-105' and CRS_NAME = 'ENGL-120')
and CRS_NAME in ('ACCT-120', 'ACCT-125', 'BUSI-100', 'BUSI-103', 'BUSI-105', 'ENGL-120')
I expect the output to show students who completed ACCT-120, ACCT-125, BUSI-100, BUSI-103, and BUSI-105 or ENGL-120 (but not both BUSI-105 and ENGL-120)
I think you want aggregation with a HAVING clause. You cannot do this with a WHERE clause alone, because the information you want is (apparently) in different rows:
select ENR_STUDENT_ID
from dbo.CA320_ENROLLMENT_VIEW_N03
where ENR_CENSUS_REG_FLAG = 'Y' AND
ENR_TERM in ('14/FA', '15/SP')
group by ENR_STUDENT_ID
having sum(case when CRS_NAME in ('ACCT-120', 'ACCT-125', 'BUSI-100', 'BUSI-103') then 1 else 0 end) = 4 and
sum(case when CRS_NAME in ('BUSI-105', 'ENGL-120') then 1 else 0 end) > 0;
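To see how the HAVING filter behaves, here is a small sketch run through Python's sqlite3 module. The table name enrollment, the student ids and the rows are made up for illustration, and the census flag and term filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (ENR_STUDENT_ID TEXT, CRS_NAME TEXT)")

required = ["ACCT-120", "ACCT-125", "BUSI-100", "BUSI-103"]
courses = {
    "S1": required + ["BUSI-105"],                 # all four plus one equivalent
    "S2": required + ["BUSI-105", "ENGL-120"],     # all four plus both equivalents
    "S3": ["ACCT-120", "ACCT-125", "BUSI-100", "ENGL-120"],  # missing BUSI-103
    "S4": required,                                # neither equivalent course
}
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(sid, crs) for sid, crs_list in courses.items() for crs in crs_list],
)

# One output row per student, so a student with both equivalents appears once.
qualified = [r[0] for r in conn.execute("""
    SELECT ENR_STUDENT_ID
    FROM enrollment
    GROUP BY ENR_STUDENT_ID
    HAVING SUM(CASE WHEN CRS_NAME IN ('ACCT-120', 'ACCT-125', 'BUSI-100', 'BUSI-103')
                    THEN 1 ELSE 0 END) = 4
       AND SUM(CASE WHEN CRS_NAME IN ('BUSI-105', 'ENGL-120') THEN 1 ELSE 0 END) > 0
    ORDER BY ENR_STUDENT_ID
""")]
print(qualified)
```

Note that the `= 4` test counts rows, so it assumes a student has at most one row per required course.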

SQL SUM function doubling the amount it should using multiple tables

My query below is doubling the amount on the last record it returns. I have 3 tables - activities, bookings and tempbookings. The query needs to list the activities and attached information and pull the total number (using the SUM) of places booked (as BookingTotal) from the booking table by each activity and then it needs to calculate the same for tempbookings (as tempPlacesReserved) providing the reservedate field inside that table is in the future.
However, the first issue is that if there are no records for an activity in the tempbookings table, it does not return any records for that activity at all. To get around this I created dummy records in the past so that it still returns the record, but if I can make it so I don't have to do this I would prefer it!
The main issue I have is that on the final record of the returned results it doubles the booking total and the places reserved which of course makes the whole query useless.
I know that I am doing something wrong I just haven't been able to sort it, I have searched similar issues online but am unable to apply them to my situation correctly.
Any help would be appreciated.
P.S. I'm aware that normally you wouldn't need to fully label all the paths to the databases, tables and fields as I have but for the program I am planning to use it in I have to do it this way.
Code:
SELECT [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice],
SUM([LeisureActivities].[dbo].[bookings].[bookingPlaces]) AS 'bookingTotal',
SUM (CASE WHEN[LeisureActivities].[dbo].[tempbookings].[tempReserveDate] > GetDate() THEN [LeisureActivities].[dbo].[tempbookings].[tempPlaces] ELSE 0 end) AS 'tempPlacesReserved'
FROM [LeisureActivities].[dbo].[activities],
[LeisureActivities].[dbo].[bookings],
[LeisureActivities].[dbo].[tempbookings]
WHERE ([LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[bookings].[activityID]
AND [LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[tempbookings].[tempActivityID])
AND [LeisureActivities].[dbo].[activities].[activityDate] > GetDate ()
GROUP BY [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice];
Your current query is using an INNER JOIN between each of the tables so if the tempBookings table has no records, you will not return anything.
I would advise that you start to use JOIN syntax. You might also need to use subqueries to get the totals.
SELECT a.[activityID],
a.[activityName],
a.[activityDate],
a.[activityPlaces],
a.[activityPrice],
coalesce(b.bookingTotal, 0) bookingTotal,
coalesce(t.tempPlacesReserved, 0) tempPlacesReserved
FROM [LeisureActivities].[dbo].[activities] a
LEFT JOIN
(
select activityID,
SUM([bookingPlaces]) AS bookingTotal
from [LeisureActivities].[dbo].[bookings]
group by activityID
) b
ON a.[activityID]=b.[activityID]
LEFT JOIN
(
select tempActivityID,
SUM(CASE WHEN [tempReserveDate] > GetDate() THEN [tempPlaces] ELSE 0 end) AS tempPlacesReserved
from [LeisureActivities].[dbo].[tempbookings]
group by tempActivityID
) t
ON a.[activityID]=t.[tempActivityID]
WHERE a.[activityDate] > GetDate();
Note: I am using aliases because it is easier to read
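A small sketch of why aggregating in subqueries matters, run through Python's sqlite3 module. The rows are made up, SQLite's date('now') stands in for GetDate(), and the activityDate filter is left out for brevity. Joining bookings and tempbookings directly pairs every booking row with every tempbooking row of the same activity, which inflates both sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (activityID INTEGER PRIMARY KEY, activityName TEXT);
CREATE TABLE bookings (activityID INTEGER, bookingPlaces INTEGER);
CREATE TABLE tempbookings (tempActivityID INTEGER, tempPlaces INTEGER, tempReserveDate TEXT);
INSERT INTO activities VALUES (1, 'Kayak'), (2, 'Climb');
INSERT INTO bookings VALUES (1, 2), (1, 3), (2, 4);
-- Activity 1 has two future reservations and one expired; activity 2 has none.
INSERT INTO tempbookings VALUES (1, 1, '2999-01-01'), (1, 2, '2999-01-01'), (1, 5, '2000-01-01');
""")

# Joining all three tables multiplies rows before SUM runs.
naive = conn.execute("""
    SELECT a.activityID,
           SUM(b.bookingPlaces),
           SUM(CASE WHEN t.tempReserveDate > date('now') THEN t.tempPlaces ELSE 0 END)
    FROM activities a, bookings b, tempbookings t
    WHERE a.activityID = b.activityID AND a.activityID = t.tempActivityID
    GROUP BY a.activityID
""").fetchall()

# Aggregating each child table first yields at most one row per activity to join.
fixed = conn.execute("""
    SELECT a.activityID,
           COALESCE(b.bookingTotal, 0),
           COALESCE(t.tempPlacesReserved, 0)
    FROM activities a
    LEFT JOIN (SELECT activityID, SUM(bookingPlaces) AS bookingTotal
               FROM bookings GROUP BY activityID) b
           ON a.activityID = b.activityID
    LEFT JOIN (SELECT tempActivityID,
                      SUM(CASE WHEN tempReserveDate > date('now')
                               THEN tempPlaces ELSE 0 END) AS tempPlacesReserved
               FROM tempbookings GROUP BY tempActivityID) t
           ON a.activityID = t.tempActivityID
    ORDER BY a.activityID
""").fetchall()

print(naive)
print(fixed)
```

In this sample the naive query reports inflated totals for activity 1 and drops activity 2 entirely, while the subquery version returns one correct row per activity.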
Use the newer SQL-92 join syntax, and make the join to tempbookings an outer join. Also clean up your SQL with table aliases; it makes it easier to read. As to why the last row has doubled values, I don't know, but on the off chance that it is caused by the extra dummy records you entered, get rid of them; the outer join to tempbookings makes them unnecessary. The other possibility is that the join condition you had to the tempbookings table (t.tempActivityID = a.activityID) is insufficient to guarantee that it will match only one record in the activities table. If, for example, it matches two records in activities, then the rows from tempbookings would be repeated twice in the output, causing the sum to be doubled.
SELECT a.activityID, a.activityName, a.activityDate,
a.activityPlaces, a.activityPrice,
SUM(b.bookingPlaces) bookingTotal,
SUM (CASE WHEN t.tempReserveDate > GetDate()
THEN t.tempPlaces ELSE 0 end) tempPlacesReserved
FROM LeisureActivities.dbo.activities a
Join LeisureActivities.dbo.bookings b
On b.activityID = a.activityID
Left Join LeisureActivities.dbo.tempbookings t
On t.tempActivityID = a.activityID
WHERE a.activityDate > GetDate ()
GROUP BY a.activityID, a.activityName,
a.activityDate, a.activityPlaces,
a.activityPrice;

sql - return all rows only if there is no field equal to A?

I'm a bit stuck. I don't have my current SQL handy, but I'll try to be clear. Say I have a table of student IDs along with a course number and their grade. I want to return students only with no grades equal to A. I don't want just rows that aren't equal to A, but I am trying to find all student IDs with no associated A grades, otherwise students with at least one A will not be returned.
How might I tackle this problem? I'm fairly new to SQL. Thanks
EDIT: Okay, now that I have the actual SQL I can drop my terrible analogy.
SELECT ALL
ITEMS.QTY_ONHAND AS QTY_ONHAND,
ITEMS.ITEM_ID AS ITEM_ID_I0,
MEDORDER.MO_STAT AS MOMO_STAT,
UPPER(ITEMS.ITEM_ID) AS ITEM_ID_IC,
ITEMS.RX_DISP AS RX_DISP,
OMNIS.OMNI_ID AS OMNI_ID,
MEDORDER.ITEM_ID AS MOITEM_ID
FROM ITEMS ITEMS,
MEDORDER MEDORDER,
OMNIS OMNIS,
"PATIENTS" PATIENTS
WHERE (ITEMS.OMNI_STID = OMNIS.OMNI_STID) AND
(PATIENTS.PAT_ID = MEDORDER.PAT_ID) AND
(OMNIS.AREA = PATIENTS.AREA) AND (UPPER(ITEMS.ITEM_NAME) LIKE '%Doxycycline%'
AND MEDORDER.ITEM_ID = UPPER(ITEMS.ITEM_ID)
AND MEDORDER.MO_STAT='A')
ORDER BY ITEMS.QTY_ONHAND DESC, OMNIS.OMNI_ID, MEDORDER.ITEM_ID
What I have here is a location, omni_id. My query is looking for every omni_id that has this item_id. This will return multiple results for each omni_id, corresponding to multiple medorders for the same item_id. What I want to do is only return omni_ids that have 0 results for any medorder.mo_stat equal to 'A'. For example, omni_id 3W will have 3 returns, with medorder.mo_stat equal to A, C, and C. And another, 4S, with medorder.mo_stats of C, C, C. I only want my query to return that 4S group, because it has no medorder.mo_stats of 'A'.
Sorry about the bad explanation.. I'm multitasking a few projects right now and my SQL-fu is not very strong, I'm not sure how to implement the below solutions to get what I'm looking for.
Thanks a ton in advance
Based on your update, I assume this may help, but I'm still not sure I understand your schema, so maybe I am missing something...
SELECT
OMNIS.OMNI_ID AS OMNI_ID,
COUNT(MEDORDER.ITEM_ID) AS A_COUNT
FROM
ITEMS
JOIN
OMNIS
ON (ITEMS.OMNI_STID = OMNIS.OMNI_STID)
JOIN
PATIENTS
ON (OMNIS.AREA = PATIENTS.AREA)
LEFT JOIN
MEDORDER
ON (MEDORDER.ITEM_ID = UPPER(ITEMS.ITEM_ID))
AND (PATIENTS.PAT_ID = MEDORDER.PAT_ID)
AND (MEDORDER.MO_STAT = 'A')
WHERE
UPPER(ITEMS.ITEM_NAME) LIKE '%Doxycycline%'
GROUP BY
OMNIS.OMNI_ID
HAVING
COUNT(MEDORDER.ITEM_ID) = 0
^EDIT^
Lots of ways to solve this one. I would do this:
SELECT
StudentID,
MAX(CASE Grade WHEN 'A' THEN 1 ELSE 0 END) AS HasAnA
FROM
MyTable
GROUP BY
StudentID
HAVING
MAX(CASE Grade WHEN 'A' THEN 1 ELSE 0 END) = 0
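Or even just this, depending on your table design (it is only equivalent when each student has a single row; otherwise it also returns students who have an A alongside other grades):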
SELECT DISTINCT
StudentID
FROM
MyTable
WHERE
Grade != 'A'
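The difference between the two queries above shows up as soon as a student has more than one row. A minimal sketch using Python's sqlite3 module, with made-up rows in a table named MyTable as in the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (StudentID INTEGER, Grade TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "A"), (1, "B"),   # student 1 has an A and a B
     (2, "B"), (2, "C"),   # student 2 has no A at all
     (3, "A")],            # student 3 has only an A
)

# Group per student and keep only groups in which no row is an 'A'.
no_a_students = [r[0] for r in conn.execute("""
    SELECT StudentID
    FROM MyTable
    GROUP BY StudentID
    HAVING MAX(CASE Grade WHEN 'A' THEN 1 ELSE 0 END) = 0
    ORDER BY StudentID
""")]

# Row-level filter: keeps any student who has at least one non-A row.
has_non_a_row = [r[0] for r in conn.execute("""
    SELECT DISTINCT StudentID
    FROM MyTable
    WHERE Grade != 'A'
    ORDER BY StudentID
""")]

print(no_a_students)
print(has_non_a_row)
```

With these rows, only the grouped query excludes student 1, who has an A next to a B.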

Speed up SQLite query, can I do it without a union?

Hi everybody of the stackoverflow community! I've been visiting this site for years and here comes my first post
Let's say I have a database with three tables:
groups (GroupID,GroupType,max1,size)
candies (candyID,name,selected)
members (groupID,nameID)
Example: The candy factory.
In the candy factory 10 types of candy bags are produced out of 80 different candies.
So: There are 10 unique group types(bags) with 3 different sizes: (4,5,6); a group is combination out of 80 unique candies.
Out of this I make a database, (with some rules about which candy combinations gets into a group).
At this point I have a database with 40791 unique candy bags.
Now I want to compare a collection of candies with all the candy bags in the DB, as a result I want the bags out of the DB which are missing 3 or less candies with the compare collection.
-- restore candy status
update candies set selected = 0, blacklisted = 0;
-- set status for candies to be selected
update candies set selected = 1 where name in ('candy01','candy02','candy03','candy04');
select groupId, GroupType, max, count(*) as remainingNum, group_concat(name,', ') as remaining
from groups natural join members natural join candies
where not selected
group by groupid having count(*) <= 3
UNION -- Union with groups which dont have any remaining candies and have a 100% match
select groupid, GroupType, max, 0 as remainingNum, "" as remaining
from groups natural join members natural join candies
where selected
group by groupid having count(*) =groups.size;
The above query does this. But what I am trying to accomplish is to do this without the union, because speed is of the essence. Also, I am new to SQL and am very eager to learn/see new methods.
Greetings, Rutger
I'm not 100% sure about what you are accomplishing through these queries, so I haven't looked at a fundamentally different approach. If you can include example data to demonstrate your logic, I can have a look at that. But, in terms of simply combining your two queries, I can do that. There is a note of caution first, however...
SQL is compiled into query plans. If the query plan for each query is significantly different from the other, combining them into a single query may be a bad idea. What you may end up with is a single plan that works for both cases, but is not very efficient for either. One poor plan can be a lot worse than two good plans: shorter, more compact code does not always give faster code.
You can put selected in to your GROUP BY instead of your WHERE clause; the fact that you have two UNIONed queries shows that you are treating them as two separate groups already.
Then, the only difference between your queries is the filter on count(*), which you can accommodate with a CASE WHEN statement...
SELECT
groups.groupID,
groups.GroupType,
groups.max,
CASE WHEN Candies.Selected = 0 THEN count(*) ELSE 0 END as remainingNum,
CASE WHEN Candies.Selected = 0 THEN group_concat(candies.name,', ') ELSE '' END as remaining
FROM
groups
INNER JOIN
members
ON members.GroupID = groups.GroupID
INNER JOIN
candies
ON Candies.CandyID = members.CandyID
GROUP BY
Groups.GroupID,
Groups.GroupType,
Groups.max,
Candies.Selected
HAVING
CASE
WHEN Candies.Selected = 0 AND COUNT(*) <= 3 THEN 1
WHEN Candies.Selected = 1 AND COUNT(*) = Groups.Size THEN 1
ELSE 0
END
=
1
The layout changes are simply because I disagree with using NATURAL JOIN for maintenance reasons. It is a shortcut in the initial build and a potential disaster in later development. But that's a different issue; you can read about it online if you want to.
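Here is a small sketch of the single-query idea (grouping on selected and filtering in HAVING), run through Python's sqlite3 module. The table name bags (standing in for groups), the reduced column set and all rows are made up for illustration, and size is added to the GROUP BY so the HAVING clause only refers to grouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bags (groupID INTEGER PRIMARY KEY, size INTEGER);
CREATE TABLE candies (candyID INTEGER PRIMARY KEY, name TEXT, selected INTEGER);
CREATE TABLE members (groupID INTEGER, candyID INTEGER);
INSERT INTO candies VALUES (1, 'c1', 1), (2, 'c2', 1), (3, 'c3', 1), (4, 'c4', 1),
                           (5, 'c5', 0), (6, 'c6', 0), (7, 'c7', 0), (8, 'c8', 0);
INSERT INTO bags VALUES (1, 4), (2, 4), (3, 5);
INSERT INTO members VALUES (1, 1), (1, 2), (1, 3), (1, 4),          -- full match
                           (2, 1), (2, 2), (2, 5), (2, 6),          -- 2 missing
                           (3, 1), (3, 5), (3, 6), (3, 7), (3, 8);  -- 4 missing
""")

rows = conn.execute("""
    SELECT g.groupID,
           CASE WHEN c.selected = 0 THEN COUNT(*) ELSE 0 END AS remainingNum,
           CASE WHEN c.selected = 0 THEN group_concat(c.name, ', ') ELSE '' END AS remaining
    FROM bags g
    JOIN members m ON m.groupID = g.groupID
    JOIN candies c ON c.candyID = m.candyID
    GROUP BY g.groupID, g.size, c.selected
    HAVING (c.selected = 0 AND COUNT(*) <= 3)      -- at most 3 candies missing
        OR (c.selected = 1 AND COUNT(*) = g.size)  -- every candy in the bag is selected
    ORDER BY g.groupID
""").fetchall()

for row in rows:
    print(row)
```

Bag 3 is filtered out because four of its candies are unselected. Whether this single pass is actually faster than the UNION depends on the query plan, so it is worth timing both on the real data.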
Don't update the database when you're doing a select. Your first update, update candies set selected = 0, blacklisted = 0;, applies to the entire table and rewrites every record. You should try it without using selected, and also change your UNION to UNION ALL. Further to this, try an inner join instead of a natural join (though I don't know your schema for candies to members).
select groups.groupid, GroupType, max, count(*) as remainingNum, group_concat(name,', ') as remaining
from groups
inner join members on members.groupid = groups.groupid
inner join candies on candies.candyid = members.candyid
where name NOT in ('candy01','candy02','candy03','candy04')
group by groups.groupid
having count(*) <= 3
UNION ALL -- Union with groups which don't have any remaining candies and have a 100% match
select groups.groupid, GroupType, max, 0 as remainingNum, '' as remaining
from groups
inner join members on members.groupid = groups.groupid
inner join candies on candies.candyid = members.candyid
where name in ('candy01','candy02','candy03','candy04')
group by groups.groupid
having count(*) = groups.size;
This should at least perform better than updating all records in the table before querying it.

How to match people between separate systems using SQL?

I would like to know if there is a way to match people between two separate systems, using (mostly) SQL.
We have two separate Oracle databases where people are stored. There is no link between the two (i.e. cannot join on person_id); this is intentional. I would like to create a query that checks to see if a given group of people from system A exists in system B.
I am able to create tables if that makes it easier. I can also run queries and do some data manipulation in Excel when creating my final report. I am not very familiar with PL/SQL.
In system A, we have information about people (name, DOB, soc, sex, etc.). In system B we have the same types of information about people. There could be data entry errors (person enters an incorrect spelling), but I am not going to worry about this too much, other than maybe just comparing the first 4 letters. This question deals with that problem more specifically.
The way I thought about doing this is through correlated subqueries. So, roughly,
select a.lastname, a.firstname, a.soc, a.dob, a.gender,
case
when exists (select 1 from b where b.lastname = a.lastname) then 'Y' else 'N'
end last_name,
case
when exists (select 1 from b where b.firstname = a.firstname) then 'Y' else 'N'
end first_name,
case [etc.]
from a
This gives me what I want, I think...I can export the results to Excel and then find records that have 3 or more matches. I believe that this shows that a given field from A was found in B. However, I ran this query with just three of these fields and it took over 3 hours to run (I'm looking in 2 years of data). I would like to be able to match on up to 5 criteria (lastname, firstname, gender, date of birth, soc). Additionally, while soc number is the best choice for matching, it is also the piece of data that tends to be missing the most often. What is the best way to do this? Thanks.
You definitely want to weigh the different matches. If an SSN matches, that's a pretty good indication. If a firstName matches, that's basically worthless.
You could try a scoring method based on weights for the matches, combined with the phonetic string matching algorithms you linked to. Here's an example I whipped up in T-SQL. It would have to be ported to Oracle for your issue.
--Score Threshold to be returned
DECLARE @Threshold DECIMAL(5,5) = 0.60
--Weights to apply to each column match (0.00 - 1.00)
DECLARE @Weight_FirstName DECIMAL(5,5) = 0.10
DECLARE @Weight_LastName DECIMAL(5,5) = 0.40
DECLARE @Weight_SSN DECIMAL(5,5) = 0.40
DECLARE @Weight_Gender DECIMAL(5,5) = 0.10
DECLARE @NewStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO @NewStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Sanders','234-62-3442','M' )
DECLARE @OldStuff TABLE (ID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(MAX), LastName VARCHAR(MAX), SSN VARCHAR(11), Gender VARCHAR(1))
INSERT INTO @OldStuff
( FirstName, LastName, SSN, Gender )
VALUES
( 'Ben','Stickler','234-62-3442','M' ), --3/4 Match
( 'Albert','Sanders','523-42-3441','M' ), --2/4 Match
( 'Benne','Sanders','234-53-2334','F' ), --2/4 Match
( 'Ben','Sanders','234623442','M' ), --SSN has no dashes
( 'Ben','Sanders','234-62-3442','M' ) --perfect match
SELECT
'NewID' = ns.ID,
'OldID' = os.ID,
'Weighted Score' =
(CASE WHEN ns.FirstName = os.FirstName THEN @Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN @Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN @Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN @Weight_Gender ELSE 0 END)
,
'RAW Score' = CAST(
((CASE WHEN ns.FirstName = os.FirstName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN 1 ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN 1 ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN 1 ELSE 0 END) ) AS varchar(MAX))
+
' / 4',
os.FirstName ,
os.LastName ,
os.SSN ,
os.Gender
FROM @NewStuff ns
--make sure that at least one item matches exactly
INNER JOIN @OldStuff os ON
os.FirstName = ns.FirstName OR
os.LastName = ns.LastName OR
os.SSN = ns.SSN OR
os.Gender = ns.Gender
where
(CASE WHEN ns.FirstName = os.FirstName THEN @Weight_FirstName ELSE 0 END)
+
(CASE WHEN ns.LastName = os.LastName THEN @Weight_LastName ELSE 0 END)
+
(CASE WHEN ns.SSN = os.SSN THEN @Weight_SSN ELSE 0 END)
+
(CASE WHEN ns.Gender = os.Gender THEN @Weight_Gender ELSE 0 END)
>= @Threshold
ORDER BY ns.ID, 'Weighted Score' DESC
And then, here's the output.
NewID  OldID  Weighted  Raw    First  Last      SSN          Gender
1      5      1.00000   4 / 4  Ben    Sanders   234-62-3442  M
1      1      0.60000   3 / 4  Ben    Stickler  234-62-3442  M
1      4      0.60000   3 / 4  Ben    Sanders   234623442    M
Then, you would have to do some post processing to evaluate the validity of each possible match. If you ever get a 1.00 for weighted score, you can assume that it's the right match, unless you get two of them. If you get a last name and SSN (a combined weight of 0.8 in my example), you can be reasonably certain that it's correct.
Example of HLGEM's JOIN suggestion:
SELECT a.lastname,
a.firstname,
a.soc,
a.dob,
a.gender
FROM TABLE a
JOIN TABLE b ON SOUNDEX(b.lastname) = SOUNDEX(a.lastname)
AND SOUNDEX(b.firstname) = SOUNDEX(a.firstname)
AND b.soc = a.soc
AND b.dob = a.dob
AND b.gender = a.gender
Reference: SOUNDEX
I would probably use joins instead of correlated subqueries, but you will have to join on all the fields, so I'm not sure how much that might improve things. But since correlated subqueries often have to evaluate row by row and joins don't, it could improve things a good bit if you have good indexing. As with all performance tuning, only trying the technique will let you know for sure.
I did a similar task looking for duplicates in our SQL Server system and I broke it out into steps. So first I found everyone where the names and city/state were an exact match. Then I looked for additional possible matches (phone number, SSN, inexact name match, etc.). As I found a possible match between two profiles, I added it to a staging table with a code for what type of match found it. Then I assigned a confidence amount to each type of match and added up the confidence for each potential match. So if the SOC matches, you might want a high confidence; the same if the name is exact and the gender is exact and the DOB is exact. Less so if the last name is exact and the first name is not exact, etc. By adding a confidence, I was much better able to see which possible matches were more likely to be the same person. SQL Server also has a SOUNDEX function which can help with names that are slightly different. I'll bet Oracle has something similar.
After I did this, I learned how to do fuzzy grouping in SSIS and was able to generate more matches with a higher confidence level. I don't know if Oracle's ETL tools have a way to do fuzzy logic, but if they do it can really help with this type of task. If you happen to also have SQL Server, SSIS can be run connecting to Oracle, so you could use fuzzy grouping yourself. It can take a long time to run though.
I will warn you that name, dob and gender are not likely to ensure they are the same person especially for common names.
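The staged approach described above (one pass per match type into a staging table, then summing a confidence per candidate pair) can be sketched roughly like this. It is only an illustration in Python's sqlite3 module: the table names person_a, person_b, candidate_match and match_weight, the sample rows, and the integer confidence points are all assumptions, not the setup I actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_a (id INTEGER, firstname TEXT, lastname TEXT, soc TEXT, dob TEXT);
CREATE TABLE person_b (id INTEGER, firstname TEXT, lastname TEXT, soc TEXT, dob TEXT);
CREATE TABLE candidate_match (a_id INTEGER, b_id INTEGER, match_type TEXT);
CREATE TABLE match_weight (match_type TEXT PRIMARY KEY, confidence INTEGER);

INSERT INTO person_a VALUES (1, 'Ben', 'Sanders', '234-62-3442', '1980-01-01');
INSERT INTO person_b VALUES (1, 'Ben', 'Stickler', '234-62-3442', '1980-01-01'),
                            (2, 'Albert', 'Sanders', '523-42-3441', '1975-05-05'),
                            (3, 'Ben', 'Sanders', '234-62-3442', '1980-01-01');
-- Hypothetical confidence points per match type
INSERT INTO match_weight VALUES ('soc', 50), ('name', 30), ('dob', 20);

-- One pass per match type; each pass is a plain equi-join that can use an index.
INSERT INTO candidate_match
    SELECT a.id, b.id, 'soc' FROM person_a a JOIN person_b b ON b.soc = a.soc;
INSERT INTO candidate_match
    SELECT a.id, b.id, 'name' FROM person_a a JOIN person_b b
        ON b.lastname = a.lastname AND b.firstname = a.firstname;
INSERT INTO candidate_match
    SELECT a.id, b.id, 'dob' FROM person_a a JOIN person_b b ON b.dob = a.dob;
""")

# Add up the confidence for each candidate pair, best candidates first.
ranked = conn.execute("""
    SELECT m.a_id, m.b_id, SUM(w.confidence) AS total_confidence
    FROM candidate_match m
    JOIN match_weight w ON w.match_type = m.match_type
    GROUP BY m.a_id, m.b_id
    ORDER BY total_confidence DESC, m.b_id
""").fetchall()

for row in ranked:
    print(row)
```

Pairs that never share a field (person_b id 2 here) never reach the staging table, so the final ranking only has to consider plausible candidates.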
Are there indexes on all of the columns in table b in the WHERE clause? If not, that will force a full scan of the table for each row in table a.
You can use SOUNDEX, but you can also use utl_match for fuzzy comparison of strings; utl_match makes it possible to define a threshold: http://www.psoug.org/reference/utl_match.html