Print names of parents who have children with the same name - SQL

The PARENT table's PK is PID, and PID is an FK in the CHILDREN table. How do I print the names of parents from the PARENT table that have children who share the same name as another child in the CHILDREN table? I think a recursive join (a self-join on CHILDREN) should be used to find the matching names, but I can't get it to work. I am able to join the PARENT and CHILDREN tables using the query below:
select PARENT.NAME as ParentName
from PARENT inner join CHILDREN
on PARENT.PID=CHILDREN.PID
group by PARENT.NAME;
I have tried this query to complete the recursive join but it isn't working:
select CHILDREN.NAME
from CHILDREN e, CHILDREN m
where e.CHILDREN.PID=m.CHILDREN.PID
order by CHILDREN.PID;

Group by child name and check whether at least two different parents exist for that name.
-- TEST DATA
with parent(pid, name) as
(select 1, 'Parent1' from dual
union all
select 2, 'Parent2' from dual
union all
select 3, 'Parent3' from dual
union all
select 4, 'Parent4' from dual),
children(name, pid) as
(select 'Tom', 1 from dual
union all
select 'Tim', 1 from dual
union all
select 'Steven', 2 from dual
union all
select 'Tim', 2 from dual
union all
select 'Marta', 2 from dual
union all
select 'Jess', 3 from dual
union all
select 'Jim', 4 from dual
union all
select 'Jess', 4 from dual)
--> SELECT
select c.name, listagg(p.name, ',') within group(order by p.name)
from parent p
join children c
on c.pid = p.pid
group by c.name -- group by child name
having min (p.pid) <> max (p.pid) -- at least two different parents
--> RESULT
Jess Parent3,Parent4
Tim Parent1,Parent2
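The query above lists each shared child name with its parents. If only the parent names are wanted, as the question asks, the same grouping can feed an IN filter. Below is a rough sketch in Python using the standard sqlite3 module, so it runs without an Oracle instance; the table layout mirrors the test data above, and the function names demo_connection and parents_with_shared_child_names are made up for this illustration.

```python
import sqlite3


def demo_connection():
    """Build an in-memory copy of the test data used in the answer above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (pid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE children (name TEXT, pid INTEGER REFERENCES parent (pid));
        INSERT INTO parent VALUES
            (1, 'Parent1'), (2, 'Parent2'), (3, 'Parent3'), (4, 'Parent4');
        INSERT INTO children VALUES
            ('Tom', 1), ('Tim', 1), ('Steven', 2), ('Tim', 2),
            ('Marta', 2), ('Jess', 3), ('Jim', 4), ('Jess', 4);
        """
    )
    return conn


def parents_with_shared_child_names(conn):
    """Names of parents with a child whose name also occurs under another parent."""
    sql = """
        SELECT DISTINCT p.name
        FROM parent p
        JOIN children c ON c.pid = p.pid
        WHERE c.name IN (
            SELECT name
            FROM children
            GROUP BY name
            HAVING MIN(pid) <> MAX(pid)  -- at least two different parents
        )
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql)]


print(parents_with_shared_child_names(demo_connection()))
```

With this particular test data every parent qualifies, because Tim is shared by parents 1 and 2 and Jess by parents 3 and 4.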

AND Parent.Name = Children.Name
So wouldn't this print names of the parents where it matches with Children's names?

Related

How to get an accurate JOIN using Fuzzy matching in Oracle

I'm trying to join a set of county names from one table with county names in another table. The issue is that the county names in the two tables are not normalized. The tables do not contain the same number of counties, and the names do not always follow the same pattern. For instance, the county 'SAINT JOHNS' in "Table A" may be represented as 'ST JOHNS' in "Table B". We cannot predict a common pattern for them.
That means we cannot use an "equal to" (=) condition when joining. So I'm trying to join them using the JARO_WINKLER_SIMILARITY function in Oracle.
My Left Outer Join condition would be like:
Table_A.State = Table_B.State
AND UTL_MATCH.JARO_WINKLER_SIMILARITY(Table_A.County_Name,Table_B.County_Name)>=80
I settled on a threshold of 80 after some testing of the results; it seemed to be optimal.
The issue is that I'm getting a set of false positives when joining. For instance, if there are counties with similar names in the same state ("BARRY" and "BAY", for example), they will be matched when the score is >= 80.
This produces an inaccurate set of joined data.
Can anyone please suggest a workaround?
Thanks,
DAV
Can you please help me build a query that will look up Table_A for each record in Table B/C/D and match it against the county name in A with the highest-ranked similarity that is >= 80?
Oracle Setup:
CREATE TABLE official_words ( word ) AS
SELECT 'SAINT JOHNS' FROM DUAL UNION ALL
SELECT 'MONTGOMERY' FROM DUAL UNION ALL
SELECT 'MONROE' FROM DUAL UNION ALL
SELECT 'SAINT JAMES' FROM DUAL UNION ALL
SELECT 'BOTANY BAY' FROM DUAL;
CREATE TABLE words_to_match ( word ) AS
SELECT 'SAINT JOHN' FROM DUAL UNION ALL
SELECT 'ST JAMES' FROM DUAL UNION ALL
SELECT 'MONTGOMERY BAY' FROM DUAL UNION ALL
SELECT 'MONROE ST' FROM DUAL;
Query:
SELECT *
FROM (
SELECT wtm.word,
ow.word AS official_word,
UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word ) AS similarity,
ROW_NUMBER() OVER ( PARTITION BY wtm.word ORDER BY UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word ) DESC ) AS rn
FROM words_to_match wtm
INNER JOIN
official_words ow
ON ( UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word )>=80 )
)
WHERE rn = 1;
Output:
WORD OFFICIAL_WO SIMILARITY RN
-------------- ----------- ---------- ----------
MONROE ST MONROE 93 1
MONTGOMERY BAY MONTGOMERY 94 1
SAINT JOHN SAINT JOHNS 98 1
ST JAMES SAINT JAMES 80 1
Using some made up test data inline (you would use your own TABLE_A and TABLE_B in place of the first two with clauses, and begin at with matches as ...):
with table_a (state, county_name) as
( select 'A', 'ST JOHNS' from dual union all
select 'A', 'BARRY' from dual union all
select 'B', 'CHEESECAKE' from dual union all
select 'B', 'WAFFLES' from dual union all
select 'C', 'UMBRELLAS' from dual )
, table_b (state, county_name) as
( select 'A', 'SAINT JOHNS' from dual union all
select 'A', 'SAINT JOANS' from dual union all
select 'A', 'BARRY' from dual union all
select 'A', 'BARRIERS' from dual union all
select 'A', 'BANANA' from dual union all
select 'A', 'BANOFFEE' from dual union all
select 'B', 'CHEESE' from dual union all
select 'B', 'CHIPS' from dual union all
select 'B', 'CHICKENS' from dual union all
select 'B', 'WAFFLING' from dual union all
select 'B', 'KITTENS' from dual union all
select 'C', 'PUPPIES' from dual union all
select 'C', 'UMBRIA' from dual union all
select 'C', 'UMBRELLAS' from dual )
, matches as
( select a.state, a.county_name, b.county_name as matched_name
, utl_match.jaro_winkler_similarity(a.county_name,b.county_name) as score
from table_a a
join table_b b on b.state = a.state )
, ranked_matches as
( select m.*
, rank() over (partition by m.state, m.county_name order by m.score desc) as ranking
from matches m
where score > 50 )
select rm.state, rm.county_name, rm.matched_name, rm.score
from ranked_matches rm
where ranking = 1
order by 1,2;
Results:
STATE COUNTY_NAME MATCHED_NAME SCORE
----- ----------- ------------ ----------
A BARRY BARRY 100
A ST JOHNS SAINT JOHNS 80
B CHEESECAKE CHEESE 92
B WAFFLES WAFFLING 86
C UMBRELLAS UMBRELLAS 100
The idea is that matches computes all scores, ranked_matches ranks them within each (state, county_name), and the final query keeps only the top scorers (i.e. filters on ranking = 1).
You may still get duplicates, because rank() gives tied scores the same ranking and nothing stops two different fuzzy matches from scoring the same.
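The same "score everything, then keep only the top scorer per name" idea can be tried outside the database. The sketch below is Python and, as a loud caveat, it swaps in difflib.SequenceMatcher.ratio() from the standard library as the similarity measure. That is not Jaro-Winkler, and its scores run from 0 to 1 instead of 0 to 100, so the 0.8 threshold here is only an assumption for the demo. The function name best_matches and the trimmed sample lists are made up for illustration.

```python
from difflib import SequenceMatcher

# A trimmed copy of the made-up test data from the answer above.
TABLE_A = [("A", "ST JOHNS"), ("A", "BARRY"), ("C", "UMBRELLAS")]
TABLE_B = [
    ("A", "SAINT JOHNS"), ("A", "SAINT JOANS"), ("A", "BARRY"),
    ("A", "BARRIERS"), ("C", "UMBRIA"), ("C", "UMBRELLAS"),
]


def best_matches(table_a, table_b, threshold=0.8):
    """Map each (state, county) in table_a to its top-scoring names in table_b.

    Only names in the same state are compared. Ties are all kept, which
    mirrors the rank() behaviour described above.
    """
    results = {}
    for state, county in table_a:
        # Score every candidate in the same state.
        scored = [
            (SequenceMatcher(None, county, other).ratio(), other)
            for other_state, other in table_b
            if other_state == state
        ]
        # Drop weak candidates, then keep only the best score (with ties).
        scored = [(score, other) for score, other in scored if score >= threshold]
        if not scored:
            continue
        top = max(score for score, _ in scored)
        results[(state, county)] = sorted(o for s, o in scored if s == top)
    return results


for key, names in best_matches(TABLE_A, TABLE_B).items():
    print(key, names)
```

Because each name keeps only its best-scoring candidates, a weaker lookalike in the same state is dropped whenever a better candidate exists, which is the false-positive case the question describes.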

Oracle hierarchical query to select only the root parents

I have tree data and I am trying to select only the root parents. The data may be a subset of a larger set, so a topmost row's parent may not be empty. I would like the topmost level of each tree in the data set.
with test_data as (
select '1' ID,'100' name, null parent from dual
union
select '2' ID,'200' name, null parent from dual
union
select '3' ID,'300' name, null parent from dual
union
select '1.1' ID,'1.100' name, '1' parent from dual
union
select '2.1' ID,'2.100' name, '2' parent from dual
union
select '3.1' ID,'3.100' name, '3' parent from dual
union
select '3.1.1' ID,'3.1.100' name, '3.1' parent from dual
union
select '3.1.2' ID,'3.1.2.100' name, '3.1' parent from dual
union
select '4.1' ID,'4.100' name, '4' parent from dual
union
select '4.1.1' ID,'4.1.100' name, '4.1' parent from dual
union
select '4.1.2' ID,'4.1.2.100' name, '4.1' parent from dual )
select * from test_data
start with parent is null
connect by parent=prior id
I would like to see results as
ID NAME PAR
----- --------- ---
1 100
2 200
3 300
4.1 4.100 4
The row with ID 4 is not part of this subset, so it cannot be returned as a parent. But since 4.1 is the topmost row of its tree in this data set, I would like to return that row. So basically, I would like to see the topmost-level record of each hierarchy.
Thank You.
One method is using not exists:
select id, name, parent
from test_data td
where not exists (select 1
from test_data td2
where td.parent = td2.id
);
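To check that the not exists approach returns the rows the question asks for, here is a small sketch that loads the question's test data into an in-memory SQLite database through Python's sqlite3 module. The query is the one above with an ORDER BY added; SQLite stands in for Oracle only to make the example runnable, and the function name topmost_rows is made up.

```python
import sqlite3

# (id, name, parent) rows copied from the question's test data.
ROWS = [
    ("1", "100", None), ("2", "200", None), ("3", "300", None),
    ("1.1", "1.100", "1"), ("2.1", "2.100", "2"), ("3.1", "3.100", "3"),
    ("3.1.1", "3.1.100", "3.1"), ("3.1.2", "3.1.2.100", "3.1"),
    ("4.1", "4.100", "4"), ("4.1.1", "4.1.100", "4.1"),
    ("4.1.2", "4.1.2.100", "4.1"),
]


def topmost_rows(rows):
    """Return the rows whose parent is not present in the data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (id TEXT, name TEXT, parent TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", rows)
    sql = """
        SELECT id, name, parent
        FROM test_data td
        WHERE NOT EXISTS (
            SELECT 1
            FROM test_data td2
            WHERE td.parent = td2.id  -- never true when td.parent is NULL
        )
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


for row in topmost_rows(ROWS):
    print(row)
```

Rows with a NULL parent pass the filter because the comparison inside the subquery is never true for NULL, and 4.1 passes because no row with id 4 exists in the set.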

'SELECT IN' with item list containing duplicates

I'm doing
SELECT Name WHERE Id IN (3,4,5,3,7,8,9)
where in this case the Id '3' is duplicated.
The query automatically excludes the duplicated items, whereas for me it is important to get them all.
Is there a way to do that directly in SQL?
The query doesn't exclude duplicates; there just aren't any duplicates to exclude. There is only one record with id 3 in the table. It is included because there is a 3 in the in () set, but it isn't included twice just because 3 appears twice in the set.
To get duplicates you would have to create a table result that has duplicates, and join the table against that. For example:
select t.Name
from someTable t
inner join (
select id = 3 union all
select 4 union all
select 5 union all
select 3 union all
select 7 union all
select 8 union all
select 9
) x on x.id = t.id
Try this:
SELECT Name FROM Tbl
JOIN
(
SELECT 3 Id UNION ALL
SELECT 4 Id UNION ALL
SELECT 5 Id UNION ALL
SELECT 3 Id UNION ALL
SELECT 7 Id UNION ALL
SELECT 8 Id UNION ALL
SELECT 9 Id
) Tmp
ON tbl.Id = Tmp.Id
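When the id list comes from application code, the UNION ALL derived table can be generated instead of typed by hand. Here is a minimal sketch in Python with sqlite3, assuming a hypothetical table tbl(id, name); the function name names_for_ids is also made up. Each id is passed as a bound parameter, so only placeholders are concatenated into the SQL text.

```python
import sqlite3


def names_for_ids(conn, ids):
    """Return one name per requested id, repeating rows for repeated ids."""
    ids = list(ids)
    if not ids:
        return []
    # One "SELECT ? AS id" per requested id; UNION ALL keeps the duplicates.
    id_rows = " UNION ALL ".join("SELECT ? AS id" for _ in ids)
    sql = f"SELECT t.name FROM tbl t JOIN ({id_rows}) x ON x.id = t.id"
    return [row[0] for row in conn.execute(sql, ids)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(3, "three"), (4, "four"), (5, "five"), (7, "seven")],
)

# IN () returns 'three' once; the join returns it once per occurrence of 3.
print(sorted(names_for_ids(conn, [3, 4, 5, 3, 7])))
```

The row order of the join result is not guaranteed, so sort or add an ORDER BY if the order matters. Databases also cap the number of bound parameters and UNION terms per statement, so very long lists are probably better loaded into a temporary table.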

SQL query: need to get the latest created data in the child records

I have a requirement to get the most recently created data from the child records.
Suppose there are two tables, A and B. A is the parent and B is the child, in a 1:M relationship. Both have some columns, and table B also has a 'created date' column that holds the creation date of each record in B.
Now I need to write a query that fetches all records from table A along with the latest created child record from table B. For example, if two child records were created today in table B for one parent record, only the later of the two should be fetched.
One record in table A can have many children, so how can we achieve this?
The result should be: columns of table A, columns of table B (latest created one).
I hope the 'created date' is a DATETIME column. This would give you the most recent child record. Assuming you have a consistent ID in the parent table with the same ParentID in the child table as a foreign key....
select A.*, B.*
from A
join B on A.ParentID = B.ParentID
join (
select ParentID, max([created date]) as [created date]
from B
group by ParentID
) maxchild on A.ParentID = maxchild.ParentID
where B.ParentID = maxchild.ParentID and B.[created date] = maxchild.[created date]
Below is a query that can help you out.
select x, y from ( select a.coloumn_TAB_A x, b.coloumn_TAB_B y from TableA a ,
TableB b where a.primary_key=b.primary_key
and a.Primary_key ='XYZ' order by b.created_date desc) where rownum < 2
Here the two tables A and B are joined on their keys, restricted to a single parent row ('XYZ'), and ordered by the created date column of Table B in descending order.
That output is used as an inline view for the outer query, which selects whichever columns you want (x and y here). The where rownum < 2 filter keeps only the first row, i.e. the latest record of Table B for that parent.
This is not the most efficient but will work (SQL Only):
SELECT [Table_A].[Columns], [Table_B].[Columns]
FROM [Table_A]
LEFT OUTER JOIN [Table_B]
ON [Table_B].ForeignKey = [Table_A].PrimaryKey
AND [Table_B].PrimaryKey = (SELECT TOP 1 [Table_B].PrimaryKey
FROM [Table_B]
WHERE [Table_B].ForeignKey = [Table_A].PrimaryKey
ORDER BY [Table_B].CREATIONDATE DESC)
You can use analytic functions to avoid hitting each table (or specifically B) more than once.
Using CTEs to provide dummy data for A and B, you can do this:
with A as (
select 1 as id from dual
union all select 2 from dual
union all select 3 from dual
),
B as (
select 1 as a_id, date '2012-01-01' as created_date, 'First for 1' as value
from dual
union all select 1, date '2012-01-02', 'Second for 1' from dual
union all select 1, date '2012-01-03', 'Third for 1' from dual
union all select 2, date '2012-02-01', 'First for 2' from dual
union all select 2, date '2012-02-03', 'Second for 2' from dual
union all select 3, date '2012-02-01', 'First for 3' from dual
union all select 3, date '2012-02-03', 'Second for 3' from dual
union all select 3, date '2012-02-05', 'Third for 3' from dual
union all select 3, date '2012-02-09', 'Fourth for 3' from dual
)
select id, created_date, value from (
select a.id, b.created_date, b.value,
row_number() over (partition by a.id order by b.created_date desc) as rn
from a
join b on b.a_id = a.id
)
where rn = 1
order by id;
ID CREATED_D VALUE
---------- --------- ------------
1 03-JAN-12 Third for 1
2 03-FEB-12 Second for 2
3 09-FEB-12 Fourth for 3
You can select any columns you want from A and B, but you'll need to alias them in the subquery if there are any with the same name in both tables.
You may also need to use rank() or dense_rank() instead of row_number() to handle ties appropriately, if you can have child records with the same created date.
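The difference between row_number() and rank() only shows up when two child rows share the latest created date. The sketch below illustrates that with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). The tables a and b follow the dummy data above, with one deliberately tied date; the function names are made up.

```python
import sqlite3


def demo_connection():
    """Tables a and b as in the dummy data above, with a tie for parent 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER PRIMARY KEY);
        CREATE TABLE b (a_id INTEGER, created_date TEXT, value TEXT);
        INSERT INTO a VALUES (1), (2);
        INSERT INTO b VALUES
            (1, '2012-01-01', 'First for 1'),
            (1, '2012-01-03', 'Second for 1'),
            (1, '2012-01-03', 'Third for 1'),  -- same date as the row above
            (2, '2012-02-01', 'First for 2');
        """
    )
    return conn


def latest_children(conn, keep_ties=False):
    """Latest child per parent: one row each, or all tied rows if keep_ties."""
    ranking = "RANK()" if keep_ties else "ROW_NUMBER()"
    sql = f"""
        SELECT id, created_date, value FROM (
            SELECT a.id, b.created_date, b.value,
                   {ranking} OVER (
                       PARTITION BY a.id ORDER BY b.created_date DESC
                   ) AS rn
            FROM a
            JOIN b ON b.a_id = a.id
        )
        WHERE rn = 1
        ORDER BY id, value
    """
    return conn.execute(sql).fetchall()


conn = demo_connection()
print(latest_children(conn))                  # one row per parent
print(latest_children(conn, keep_ties=True))  # both tied rows for parent 1
```

With row_number() the tied row that survives for parent 1 is arbitrary unless the ORDER BY inside the window gets a tie-breaker column.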

Filtering SQL Tree Query

If I have a tree query like the one below and I want to filter on Name = 'Son' and also select all of its parent records, the result should yield the first 3 rows. How would I construct my query? I've read that I should use a Common Table Expression (CTE), but I'm new to CTEs. Can anyone help me? Thanks.
select 1 AS id, NULL AS parent, 'God' AS name
UNION
select 2 AS id, 1 AS parent, 'Father' AS name
UNION
select 3 AS id, 2 AS parent, 'Son' AS name
UNION
select 4 AS id, NULL AS parent, 'Godmother' AS name
UNION
select 5 AS id, 4 AS parent, 'Mother' AS name
It sounds like you could store the tree in a table (or define a view using the SQL above), and then, if you are using Oracle, you could use a CONNECT BY clause to filter the records.
Is this what you're looking for?
;with SomeCTE as
(
select *
from (
select 1 AS id, NULL AS parent, 'God' AS name
UNION
select 2 AS id, 1 AS parent, 'Father' AS name
UNION
select 3 AS id, 2 AS parent, 'Son' AS name
UNION
select 4 AS id, NULL AS parent, 'Godmother' AS name
UNION
select 5 AS id, 4 AS parent, 'Mother' AS name ) as a
)
select *
from SomeCTE a
left join SomeCTE b
on a.parent = b.id
left join SomeCTE c
on b.parent = c.id
where a.name = 'Son'
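The chained left joins above reach exactly two levels up, so a deeper tree would need more joins. A recursive CTE walks up however many levels exist. The sketch below runs one through Python's sqlite3 module; the table name tree and the function names are made up for illustration. SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH for the same construct.

```python
import sqlite3


def demo_connection():
    """Load the five sample rows from the question into a table named tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO tree VALUES (?, ?, ?)",
        [
            (1, None, "God"), (2, 1, "Father"), (3, 2, "Son"),
            (4, None, "Godmother"), (5, 4, "Mother"),
        ],
    )
    return conn


def with_ancestors(conn, name):
    """Return the row(s) with the given name plus every ancestor above them."""
    sql = """
        WITH RECURSIVE chain (id, parent, name) AS (
            -- anchor: the row(s) being filtered on
            SELECT id, parent, name FROM tree WHERE name = ?
            UNION ALL
            -- recursive step: climb from each row to its parent
            SELECT t.id, t.parent, t.name
            FROM tree t
            JOIN chain c ON t.id = c.parent
        )
        SELECT id, parent, name FROM chain ORDER BY id
    """
    return conn.execute(sql, (name,)).fetchall()


for row in with_ancestors(demo_connection(), "Son"):
    print(row)
```

This returns one row per node instead of one wide row with the ancestors side by side, which matches the "first 3 rows" the question expects.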