I have run into this a couple of times where a client is able to import data into a catalog with parent child relationships and I run into problems with said relationships. I need to find a way to prevent the following:
Object 1 has a child of Object 2
Object 2 has a child of Object 3
Object 3 has a child of Object 1
This throws the server into an infinite recursive loop and ultimately brings it to its knees. I can't seem to wrap my head around a SQL query that I could use to detect such recursive madness. The problem is prevalent enough that I need to find some solution. I've tried queries using CTE, nested selects/sub-selects and just can't seem to write one that will solve this issue. Any help would be greatly appreciated.
with recursive parents as (
select
s.id,
s.parent_id,
1 as depth
from categories s
where s.id = <passed in id>
union all
select
t.id,
t.parent_id,
c.depth + 1 as depth
from categories t
inner join parents c
on t.id = c.parent_id
where t.id <> t.parent_id)
select distinct parent_id from parents where parent_id <> 0 order by depth desc
This is what I finally came up with to "detect" a cycle condition
with recursive find_cycle as (
select
categories_id,
parent_id,
0 depth
from
categories
where categories_id = <passed in id>
union all
select
f.categories_id,
c.parent_id,
f.depth + 1
from
categories c
inner join find_cycle f
ON f.parent_id = c.categories_id
where c.parent_id <> c.categories_id
and f.parent_id <> f.categories_id
)
select
f.parent_id as categories_id,
c.parent_id
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
where exists (
select
1
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
where f.parent_id = <passed in id>)
order by depth desc;
It will return rows with the offending path and no rows if no cycle detected. Thanks for all the tips folks.
Here is the MariaDB function I came up with that will return 0 if there is not a cycle and 1 if there is a cycle for the id passed in to the function.
create function `detect_cycle`(id int, max_depth int) RETURNS tinyint(1)
begin
declare cycle_exists int default 0;
select (case when count(*) = 1 then 0 else 1 end) into cycle_exists
from
(
with recursive find_cycle as (
select
categories_id,
parent_id,
0 depth
from
categories
where categories_id = id
union all
select
f.categories_id,
c.parent_id,
f.depth + 1
from
categories c
inner join find_cycle f
ON f.parent_id = c.categories_id
where
c.parent_id <> c.categories_id
and f.parent_id <> f.categories_id
and f.depth < max_depth
)
select
c.parent_id
from find_cycle f
inner join categories c
on f.parent_id = c.categories_id
order by depth desc
limit 1
) __temp
where parent_id = 0;
return cycle_exists;
end;
This can then be called by executing
select categories_id, detect_cycle(categories_id, 5) as cycle_exists
from categories
where categories_id = <whatever id you want to check for a cycle condition>;
Here is a stored procedure that will accomplish the same thing but is generic enough to handle any table, id column, parent column combination.
CREATE PROCEDURE `detect_cycle`(table_name varchar(64), id_column varchar(32), parent_id_column varchar(32), max_depth int)
BEGIN
declare id int default 0;
declare sql_query text default '';
declare where_clause text default '';
declare done bool default false;
declare id_cursor cursor for select root_id from __temp_ids;
declare continue handler for not found set done = true;
drop temporary table if exists __temp_ids;
create temporary table __temp_ids(root_id int not null primary key);
set sql_query = concat('
insert into __temp_ids
select
`',id_column,'`
from ',table_name);
prepare statement from sql_query;
execute statement;
drop temporary table if exists __temp_cycle;
create temporary table __temp_cycle (id int not null, parent_id int not null);
open id_cursor;
id_loop: loop
fetch from id_cursor into id;
if done then
leave id_loop;
end if;
set where_clause = concat('where `',id_column,'` = ',id);
set sql_query = concat('
insert into __temp_cycle
select
t.`',id_column,'`,
t.`',parent_id_column,'`
from
(
with recursive find_cycle as (
select
`',id_column,'`,
`',parent_id_column,'`,
0 depth
from
`',table_name,'`
',where_clause,'
union all
select
f.`',id_column,'`,
c.`',parent_id_column,'`,
f.depth + 1
from
`',table_name,'` c
inner join find_cycle f
ON f.`',parent_id_column,'` = c.`',id_column,'`
where
c.`',parent_id_column,'` <> c.`',id_column,'`
and f.`',parent_id_column,'` <> f.`',id_column,'`
and f.depth < ',max_depth,'
)
select
c.`',id_column,'`,
c.`',parent_id_column,'`
from find_cycle f
inner join `',table_name,'` c
on f.`',parent_id_column,'` = c.`',id_column,'`
order by depth desc
limit 1
) t
where t.`',parent_id_column,'` > 0');
prepare statement from sql_query;
execute statement;
end loop;
close id_cursor;
deallocate prepare statement;
select distinct
*
from __temp_cycle;
drop temporary table if exists __temp_ids;
drop temporary table if exists __temp_cycle;
END
usage:
call detect_cycle(table_name, id_column, parent_id_column, max_depth);
This will return a result set of all cycle conditions within the given table.
Looks like you have this figured out to stop a cycling event but are looking for ways to identify a cycle. In that case, consider using a path:
with recursive parents as (
select
s.id,
s.parent_id,
1 as depth,
CONCAT(s.id,'>',s.parent_id) as path,
NULL as cycle_detection
from categories s
where s.id = <passed in id>
union all
select
t.id,
t.parent_id,
c.depth + 1 as depth,
CONCAT(c.path, '>', t.parent_id),
CASE WHEN c.path LIKE CONCAT('%',t.parent_id,'>%') THEN 'cycle' END
from categories t
inner join parents c
on t.id = c.parent_id
where t.id <> t.parent_id)
select distinct parent_id, cycle_detection from parents where parent_id <> 0 order by depth desc
I may be a bit off my syntax since it's been forever since I wrote mysql/mariadb syntax, but this is the basic idea. Capture the path that the recursion took and then see if your current item is already in the path.
If the depth of the resulting tree is not extremely deep then you can detect cycles by storing the bread crumbs that the recursive CTE is walking. Knowing the bread crumbs you can detect cycles easily.
For example:
with recursive
n as (
select id, parent_id, concat('/', id, '/') as path
from categories where id = 2
union all
select c.id, c.parent_id, concat(n.path, c.id, '/')
from n
join categories c on c.parent_id = n.id
where n.path not like concat('%/', c.id, '/%') -- cycle pruning here!
)
select * from n;
Result:
id parent_id path
--- ---------- -------
2 1 /2/
3 2 /2/3/
1 3 /2/3/1/
See running example at DB Fiddle.
I have below data in two different tables. I am able to retrieve required records using select query, however I am unable to update ROW_IND of these records. The update statement I used gives me error. Any pointers will be much appreciated.
Table CLP :
ID KEY EFF_DT ROW_IND
28420000000006 4599 1/1/2000 1
28420000000006 21164 10/16/2019 1
28420000000011 58429 1/1/2000 1
28420000000011 68434 10/16/2019 1
Table CPI :
KEY2 ID2
21164 28420000000006
68434 28420000000011
The Query :
SELECT p.id , p.key, i.key AS KEY2, i.id AS ID2, p.EFF_DT, p.row_ind
FROM CLP P, CLI I
WHERE p.id = i.id
AND P.KEY <> I.KEY
AND p.row_ind = 1
AND P.id IN
(
SELECT id
FROM CLP
WHERE row_ind = 1
GROUP BY id
HAVING count(*) > 1
);
ID KEY KEY2 ID2 EFF_DT ROW_IND
28420000000006 4599 21164 28420000000006 1/1/2000 1
28420000000011 58429 68434 28420000000011 1/1/2000 1
The Update Query:
UPDATE
(
< The Above SELECT Query >
) A
SET A.row_ind = 0
Error: ORA-01779: cannot modify a column which maps to a non key-preserved table
This syntax for an UPDATE statement cannot be used within Oracle while might be used within MySQL. Alternatively, you can try to use a MERGE statement :
MERGE INTO CLP t
USING
(
SELECT p.id , p.key, i.key AS KEY2, i.id AS ID2, p.EFF_DT, p.row_ind
FROM CLP p
JOIN CPI i
ON p.id = i.id
WHERE p.key <> i.key
AND p.row_ind = 1
AND P.id in
(
SELECT id
FROM CLP
WHERE row_ind = p.row_ind
GROUP BY id
HAVING count(*) > 1
)
) tt
ON (tt.key2 = t.key)
WHEN MATCHED THEN UPDATE SET t.row_ind = 0;
where
you'd better converting the old-style SELECT statement to the
syntax containing explicit JOIN keywords instead of comma-seperated
JOINs
no need to repeat the condition row_ind = 1 twice, replacing the
second one with row_ind = p.row_ind is more preferable
Demo
How to look in Table C for those inspectors who have got ParentID but not child.
Table A has both parent and child data. Parent ID 0 is for parents and child has their parent ID.
In Table C, one inspector can have many parents and many childs.
I need to run a query to look for those inspectors who have got parents but not child.
Table A Table B Table C
-------- ------- -------
DisciplineID(PK) InspectorID(PK) ID (PK)
ParentID DisciplineID(FK)
InspectorID (Fk)
Table A Table C
In above mentioned data, Inspector 7239 and 7240 only have parent but not child. So query should return those two not 7242 because he has both parent and childs.
Use EXISTS and NOT EXISTS:
SELECT c.ID, c.InspectorID, c.DisciplineID
FROM dbo.TableC c
WHERE EXISTS
(
SELECT 1 FROM dbo.TableA a
WHERE a.DisciplineID = c.DisciplineID
AND a.ParentID = 0 -- parent exists
)
AND NOT EXISTS
(
SELECT 1 FROM dbo.TableC c2
WHERE c.InspectorID = c2.InspectorID
AND c.ID <> c2.ID -- look for another record with this InspectorID
AND EXISTS
(
SELECT 1 FROM dbo.TableA a
WHERE a.DisciplineID = c2.DisciplineID
AND a.ParentID <> 0 -- no child exists
)
)
I would start with a pre-qualifying query per discipline based on those having a count of entries that HAVE a parent ID = 0 but also no records as child... Join that result to your TableC
SELECT
c.ID,
c.InspectorID,
c.DisciplineID
FROM
dbo.TableC c
JOIN ( select
a.DisciplineID
from
TableA a
group by
a.DisciplineID
having
sum( case when a.ParentID = 0 then 1 else 0 end ) > 0
AND sum( case when a.ParentID > 0 then 1 else 0 end ) = 0 ) qual
on c.DisciplineID = qual.DisciplineID
You can try this:
SELECT DISTINCT B.INSPECTORID FROM TABLEA A
LEFT JOIN TABLEC CHILD ON CHILD.DISCIPLINEID = A.DISCIPLINEID
LEFT JOIN TABLEC PARENT ON PARENT.DISCIPLINEID = A.PARENTID
JOIN TABLEB B ON A.INSPECTORID = B.INSPECTORID
WHERE (A.PARENTID = 0 AND CHILD.DISCIPLINEID IS NOT NULL)
I'm wondering if there is a simpler way to accomplish my goal than what I've come up with.
I am returning a specific attribute that applies to an object. The objects go through multiple iterations and the attributes might change slightly from iteration to iteration. The iteration will only be added to the table if the attribute changes. So the most recent iteration might not be in the table.
Each attribute is uniquely identified by a combination of the Attribute ID (AttribId) and Generation ID (GenId).
Object_Table
ObjectId | AttribId | GenId
32 | 2 | 3
33 | 3 | 1
Attribute_Table
AttribId | GenId | AttribDesc
1 | 1 | Text
2 | 1 | Some Text
2 | 2 | Some Different Text
3 | 1 | Other Text
When I query on a specific object I would like it to return an exact match if possible. For example, Object ID 33 would return "Other Text".
But if there is no exact match, I would like for the most recent generation (largest Gen ID) to be returned. For example, Object ID 32 would return "Some Different Text". Since there is no Attribute ID 2 from Gen 3, it uses the description from the most recent iteration of the Attribute which is Gen ID 2.
This is what I've come up with to accomplish that goal:
SELECT attr.AttribDesc
FROM Attribute_Table AS attr
JOIN Object_Table AS obj
ON obj.AttribId = obj.AttribId
WHERE attr.GenId = (SELECT MIN(GenId)
FROM(SELECT CASE obj2.GenId
WHEN attr2.GenId THEN attr2.GenId
ELSE(SELECT MAX(attr3.GenId)
FROM Attribute_Table AS attr3
JOIN Object_Table AS obj3
ON obj3.AttribId = attr3.AttribId
WHERE obj3.AttribId = 2
)
END AS GenId
FROM Attribute_Table AS attr2
JOIN Object_Table AS obj2
ON attr2.AttribId = obj2.AttribId
WHERE obj2.AttribId = 2
) AS ListOfGens
)
Is there a simpler way to accomplish this? I feel that there should be, but I'm relatively new to SQL and can't think of anything else.
Thanks!
The following query will return the matching value, if found, otherwise use a correlated subquery to return the value with the highest GenId and matching AttribId:
SELECT obj.Object_Id,
CASE WHEN attr1.AttribDesc IS NOT NULL THEN attr1.AttribDesc ELSE attr2.AttribDesc END AS AttribDesc
FROM Object_Table AS obj
LEFT JOIN Attribute_Table AS attr1
ON attr1.AttribId = obj.AttribId AND attr1.GenId = obj.GenId
LEFT JOIN Attribute_Table AS attr2
ON attr2.AttribId = obj.AttribId AND attr2.GenId = (
SELECT max(GenId)
FROM Attribute_Table AS attr3
WHERE attr3.AttribId = obj.AttribId)
In the case where there is no matching record at all with the given AttribId, it will return NULL. If you want to get no record at all in this case, make the second JOIN an INNER JOIN rather than a LEFT JOIN.
Try this...
Incase the logic doesn't find a match for the Object_table GENID it maps it to the next highest GENID in the ON clause of the JOIN.
SELECT AttribDesc
FROM object_TABLE A
INNER JOIN Attribute_Table B
ON A.AttrbId = B.AttrbId
AND (
CASE
WHEN A.Genid <> B.Genid
THEN (
SELECT MAX(C.Genid)
FROM Attribute_Table C
WHERE A.AttrbId = C.AttrbId
)
ELSE A.Genid
END
) -- Selecting the right GENID in the join clause should do the job
= B.Genid
This should work:
with x as (
select *, row_number() over (partition by AttribId order by GenId desc) as rn
from Attribute_Table
)
select isnull(a.attribdesc, x.attribdesc)
from Object_Table o
left join Attribute_Table a
on o.AttribId = a.AttribId and o.GenId = a.GenId
left join x on o.AttribId = x.AttribId and rn = 1
I have a very simple MS SQL table with the following data(with column name and datatype):
TableId PersonName Attribute AttributeValue
(int) (nvarchar 50) (nvarchar 50) (bit)
----------- ----------------------- ------------------- --------------
1 A IsHuman 1
2 A CanSpeak 1
3 A CanSee 1
4 A CanWalk 0
5 B IsHuman 1
6 B CanSpeak 1
7 B CanSee 0
8 B CanWalk 0
9 C IsHuman 0
10 C CanSpeak 1
11 C CanSee 1
12 C CanWalk 0
Now, What I need as a result is the unique PersonName that have both Attribute IsHuman and CanSpeak with AttributeValue = 1.
The expected result should be (Must not include C as this one has IsHuman = 0)
PersonName
------------
A
B
Please can any expert help me in writting a SQL Query for this.
SELECT PersonName
FROM MyTable
WHERE AttributeName = 'IsHuman'
AND AttributeValue = 1
INTERSECT
SELECT PersonName
FROM MyTable
WHERE AttributeName = 'CanSpeak'
AND AttributeValue = 1;
Obviously this approach doesn't 'scale' if the criteria can vary. It could be that the relational operator you require is division, popularly known as "the supplier who supplies all parts", specifically division with remainder.
SELECT PersonName
FROM MyTable
WHERE (AttributeName = 'IsHuman' AND AttributeValue = 1) OR
(AttributeName = 'CanSpeak' AND AttributeValue = 1)
GROUP BY PersonName
HAVING COUNT(*) > 1
or
SELECT PersonName
FROM MyTable
WHERE AttributeValue = 1 AND AttributeName IN ('IsHuman', 'CanSpeak')
GROUP BY PersonName
HAVING COUNT(*) > 1
SELECT PersonName FROM MyTable
WHERE PersonName IN
(SELECT T1.PersonName FROM MyTable T1 WHERE T1.Attribute = 'IsHuman' and T1.AttributeValue='1')
AND (Attribute = 'CanSpeak' AND AttributeValue='1')
I think two inner joins may give you alright performance depending on indexing and table sizes.
SELECT t.PersonName FROM table t
INNER JOIN table t2 ON t.PersonName=t2.PersonName AND t3.Attribute = 'IsHuman' AND t2.AttributeValue = 1
INNER JOIN table t3 ON t2.PersonName=t3.PersonName AND t3.Attribute = 'CanSpeak' AND t3.AttributeValue = 1
or
SELECT t.PersonName FROM table t
INNER JOIN table t2 ON t.PersonName=t2.PersonName
INNER JOIN table t3 ON t2.PersonName=t3.PersonName
WHERE t2.Attribute = 'IsHuman' AND t2.AttributeValue = 1 AND t3.Attribute = 'CanSpeak' AND t3.AttributeValue = 1
This solution could be simplified significantly however should the properties IsHuman and CanSpeak were in separate tables with an linking ID table between them. Sounds like this table could possibly benefit from some normalization.
If you cant progress that, a view may assist in performance. I am at home without SQL installed so I cant verify any performance aspects.
select personname from yourtablename where personname in ('a','b') group by personname
I actually use this as a screening question for interviews. None of you people would get the job.
OK, maybe you would, but while the strategies you use might or might not work, they aren't generalizable and they miss a basic notion of relational algebra, to wit, aliasing.
The right answer (in the sense that it would make me more likely to employ you as well as the less importance senses that the RDMS's optimizer understands it and it can be extended to other, arbitrarily complex, cases) is
SELECT t1.PersonName
FROM MyTable t1, MyTable t2
WHERE t2.AttributeName = 'CanSpeak'
AND t2.AttributeValue = 1
AND t1.AttributeName = 'IsHuman'
AND t1.AttributeValue = 1
AND t1.PersonName = t2.PersonName;