Recursive select that selects rows based on own plus children's values - sql

I need to select rows in a table like this:
Select all rows in the table where both conditions are met:
Condition 1: the value column should not match any value in table v
Condition 2: no descendant (on any level, i.e. child, sub-child, sub-sub-child, etc.) has a value that matches any value in table v
Table v looks like this:
Expected result from example table. Should [row] be selected/returned?
a1: No - condition 2
a2: No - condition 2
a3: No - condition 1
a4: No - condition 1
a5: Yes - (value does not match in v and no descendants that match in v)
a6: Yes - (value does not match in v and no descendants that match in v)
a7: Yes - (value does not match in v and no descendants that match in v)
a8: Yes - (value does not match in v and no descendants that match in v)
Here is an sqlfiddle where the tables are set up, together with a recursive function that shows all rows and their level in the tree, but I don't know how to proceed from there:
http://sqlfiddle.com/#!18/736a28/15/0

Check this solution:
--------------------------- DDL+DML
drop table if exists a
drop table if exists v
GO
CREATE TABLE a
([id] varchar(13), [parentId] varchar(57), [value] varchar(57))
;
CREATE TABLE v
([id] varchar(13), [value] varchar(57))
;
INSERT INTO a
([id], [parentId], [value])
VALUES
('a1', NULL, NULL),
('a2', 'a1', NULL),
('a3', 'a2', '1'),
('a4', NULL, '5'),
('a5', 'a1', '8'),
('a6', 'a2', NULL),
('a7', NULL, NULL),
('a8', NULL, '3'),
('a9', 'a8', '7')
;
INSERT INTO v
([id], [value])
VALUES
('v1', '1'),
('v2', '5'),
('v3', '10'),
('v4', '15'),
('v5', '20'),
('v6', '25'),
('v7', '30'),
('v8', '35'),
('v9', '40')
;
SELECT * FROM a
SELECT * FROM v
GO
-------------------- Solution
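WITH MyRecCTE AS (
    -- Anchor: rows whose own value matches a value in v (condition 1 fails)
    SELECT a.id, a.parentId, a.[value], Res = 'NO'
    FROM a
    INNER JOIN v ON a.[value] = v.[value]
    UNION ALL
    -- Recursive step: walk up to every ancestor of a matching row (condition 2 fails)
    SELECT a.id, a.parentId, a.[value], Res = 'NO'
    FROM a
    INNER JOIN MyRecCTE c ON c.parentId = a.id
)
-- Rows never reached by the CTE get 'YES'
SELECT DISTINCT a.id, a.parentId, a.[value], ISNULL(CONVERT(VARCHAR(3), c.Res), 'YES') AS Res
FROM a
LEFT JOIN MyRecCTE c ON a.id = c.id
ORDER BY a.id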
GO
Result set (fits the request):
For the sake of discussion, let's add another row, which causes the rows with id a8 and a9 to become "NO", since the new row is a descendant of both and has a value from the second table:
INSERT INTO a
([id], [parentId], [value])
VALUES
('a10', 'a9', '35')
GO
Test 2 result set (fits the expected output)
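If only the rows that pass both conditions are wanted (rather than every row labelled YES/NO), the same ancestor walk can feed a NOT IN filter. Here is a minimal sketch of that idea, run through Python's built-in sqlite3 module so it can be tried without SQL Server. The helper name untainted_ids and the CTE name tainted are made up for illustration, and SQLite wants WITH RECURSIVE where T-SQL uses plain WITH.

```python
import sqlite3

# Hypothetical helper: ids of rows in a that pass both conditions.
def untainted_ids(rows_a, values_v):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (id TEXT, parentId TEXT, value TEXT)")
    con.execute("CREATE TABLE v (value TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?)", rows_a)
    con.executemany("INSERT INTO v VALUES (?)", [(x,) for x in values_v])
    sql = """
        WITH RECURSIVE tainted(id, parentId) AS (
            -- anchor: rows whose own value appears in v
            SELECT a.id, a.parentId FROM a JOIN v ON a.value = v.value
            UNION
            -- step: climb from each tainted row to its parent
            SELECT p.id, p.parentId
            FROM a AS p JOIN tainted AS t ON t.parentId = p.id
        )
        SELECT id FROM a
        WHERE id NOT IN (SELECT id FROM tainted)
        ORDER BY id
    """
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids

# Sample data copied from the DDL+DML above.
rows_a = [
    ("a1", None, None), ("a2", "a1", None), ("a3", "a2", "1"),
    ("a4", None, "5"), ("a5", "a1", "8"), ("a6", "a2", None),
    ("a7", None, None), ("a8", None, "3"), ("a9", "a8", "7"),
]
values_v = ["1", "5", "10", "15", "20", "25", "30", "35", "40"]

print(untainted_ids(rows_a, values_v))
# After adding the a10 row, a8 and a9 drop out as well.
print(untainted_ids(rows_a + [("a10", "a9", "35")], values_v))
```

With the original data this returns a5 through a9 (a9 is not listed in the question's expected result, but it meets both conditions). With the extra a10 row only a5, a6 and a7 remain.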

This got somewhat complicated, but I created a CTE where there is a record that contains a Path for every combination of ancestor and descendant (transitive closure). Then, I create a second CTE where I extract the parent id from the beginning of Path and the descendant id from the end of Path and look up the descendant's value. Then, finally, I query the second CTE and use NOT EXISTS to filter the rows.
WITH tree
AS
(
SELECT a.id, a.parentId, a.value,
CAST('/' + a.id as varchar(1000)) as Path
FROM a
UNION ALL
SELECT a.id, a.parentId, a.value,
CAST(t.Path + '/' + a.id as varchar(1000)) as Path
FROM a
INNER JOIN tree t
ON Path LIKE '%/' + a.parentId
),
DT
AS
(
SELECT t.Path,
RIGHT(LEFT(t.Path,3),2) as parent_id,
RIGHT(t.Path,2) as descendant_id,
(SELECT q.[value]
FROM a q
WHERE q.id = RIGHT(t.Path,2)
) as [descendant_value]
FROM tree t
)
SELECT *
FROM DT dt_outer
WHERE NOT EXISTS (SELECT 1 FROM DT dt_inner WHERE dt_inner.parent_id = dt_outer.parent_id AND
dt_inner.descendant_value IN (SELECT [value] FROM v))
ORDER BY 2,3
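I left the duplicates in the result set to give a clearer picture of what's going on. You can finish up with a DISTINCT parent_id to get the unique ids. Note that RIGHT(LEFT(t.Path,3),2) and RIGHT(t.Path,2) assume every id is exactly two characters long, so an id such as a10 would need different parsing.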
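The transitive closure can also be kept as plain (ancestor, descendant) pairs instead of a path string, which avoids parsing ids out of the path. The following is only a sketch of that variation, not the query above: it runs on SQLite through Python's sqlite3 module, and the names clean_ids and closure are invented for the example.

```python
import sqlite3

# Hypothetical helper: ids with no matching value on themselves or below.
def clean_ids(rows_a, values_v):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (id TEXT, parentId TEXT, value TEXT)")
    con.execute("CREATE TABLE v (value TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?)", rows_a)
    con.executemany("INSERT INTO v VALUES (?)", [(x,) for x in values_v])
    sql = """
        WITH RECURSIVE closure(ancestor, descendant) AS (
            -- every row is paired with itself first
            SELECT id, id FROM a
            UNION ALL
            -- then each pair is extended by one more level of children
            SELECT c.ancestor, ch.id
            FROM closure AS c JOIN a AS ch ON ch.parentId = c.descendant
        )
        SELECT x.id
        FROM a AS x
        WHERE NOT EXISTS (
            SELECT 1
            FROM closure AS c
            JOIN a AS d ON d.id = c.descendant
            JOIN v ON v.value = d.value
            WHERE c.ancestor = x.id
        )
        ORDER BY x.id
    """
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids

# The question's data plus the three-character id a10 from the first answer.
tree_rows = [
    ("a1", None, None), ("a2", "a1", None), ("a3", "a2", "1"),
    ("a4", None, "5"), ("a5", "a1", "8"), ("a6", "a2", None),
    ("a7", None, None), ("a8", None, "3"), ("a9", "a8", "7"),
    ("a10", "a9", "35"),
]
v_values = ["1", "5", "10", "15", "20", "25", "30", "35", "40"]

print(clean_ids(tree_rows, v_values))
```

Because the pair of each row with itself is part of the closure, the single NOT EXISTS covers condition 1 and condition 2 together.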

Related

Select 75% of records to rename, based on column sum

I have a scenario where I need to rename a value in one column, based on another column's total. Example table below with basic math, to express concept. I'd like to change the value in 'Condition' column to "Used" for the rows that make up 70% of the 'Revenue' column (which in this example would be 7 rows). The other 30% would be renamed to "New" (the remaining 3 rows). No other specific logic required.
I found that the approach mentioned here works for selecting the percentage of rows required
Select Rows who's Sum Value = 80% of the Total
I suppose I could create two temporary tables, rename the column fields in each respective table, and then join together. Curious if there is an easier way?
Current Table:
Source  Condition  Revenue
A       Old        1
B       New        1
C       Old        1
D       New        1
E       Old        1
F       New        1
G       Old        1
H       New        1
I       Old        1
J       New        1
New Table:
Source  Condition  Revenue
A       Used       1
B       Used       1
C       Used       1
D       Used       1
E       Used       1
F       Used       1
G       Used       1
H       New        1
I       New        1
J       New        1
You could do this with two updates. The first would update the entire table. The second would update the first 70%.
First we need sample data in a table. I used a table variable here, but you would use your actual table.
declare @Something table
(
    Source char(1)
    , Condition varchar(10)
    , Revenue int
)
insert @Something values
('A', 'Old', 1)
, ('B', 'New', 1)
, ('C', 'Old', 1)
, ('D', 'New', 1)
, ('E', 'Old', 1)
, ('F', 'New', 1)
, ('G', 'Old', 1)
, ('H', 'New', 1)
, ('I', 'Old', 1)
, ('J', 'New', 1)
select *
from @Something;
Next, simply update the entire table.
update @Something
set Condition = 'New';
The last step is to update the first 70%. An easy way to do this is to use a CTE to select the first 70% and then update the CTE.
with Top70 as
(
    select top 70 percent *
    from @Something
    order by Source
)
update Top70
set Condition = 'Used';
Here is the final output.
select *
from @Something;
--EDIT--
Now that I understand we need a running total, you could do something like this.
select *
, case when sum(Revenue) over(order by Source) > (sum(Revenue) over() * .7) then 'New' else 'Used' end as NewCondition
from @Something
You can mark the 70% and 30% records using this query:
with cte as (
SELECT *, SUM(revenue) OVER(ORDER BY source) AS cumulative_revenue, SUM(revenue) OVER() as total
FROM mytable t
)
select Source, iif((cumulative_revenue + 0.0) /total <= 0.7, 'Used', 'New') as Condition, revenue, cumulative_revenue, (cumulative_revenue + 0.0) /total as perc
from cte
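To see the running-total idea in isolation, here is a small sketch that runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later, which is an assumption about the local install). The helper name label_by_running_share and its used_percent parameter are made up for the example. The comparison is done in integers (running * 100 <= total * percent), so no floating-point division is involved.

```python
import sqlite3

# Hypothetical helper: label each row by its share of the running total.
def label_by_running_share(rows, used_percent=70):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (source TEXT, condition TEXT, revenue INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    sql = """
        WITH cte AS (
            SELECT source,
                   SUM(revenue) OVER (ORDER BY source) AS running,
                   SUM(revenue) OVER () AS total
            FROM mytable
        )
        SELECT source,
               CASE WHEN running * 100 <= total * ?
                    THEN 'Used' ELSE 'New' END AS new_condition
        FROM cte
        ORDER BY source
    """
    labelled = con.execute(sql, (used_percent,)).fetchall()
    con.close()
    return labelled

# Sample rows from the question: ten sources with revenue 1 each.
sample = [(s, c, 1) for s, c in zip("ABCDEFGHIJ", ["Old", "New"] * 5)]

for source, new_condition in label_by_running_share(sample):
    print(source, new_condition)
```

On the question's data, sources A to G come back as Used and H to J as New. Rows with unequal revenue are handled the same way, since the cut is made on cumulative revenue rather than on row count.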
You could chain a couple of CTEs to run the UPDATE
DROP TABLE IF EXISTS #t
CREATE TABLE #t([Source] VARCHAR(10), [Condition] VARCHAR(10), Revenue INT)
INSERT INTO #t([Source], [Condition], [Revenue])
values
('A', 'Old', 1)
,('B', 'New', 1)
,('C', 'Old', 1)
,('D', 'New', 1)
,('E', 'Old', 1)
,('F', 'New', 1)
,('G', 'Old', 1)
,('H', 'New', 1)
,('I', 'Old', 1)
,('J', 'New', 1)
;WITH cte AS (
SELECT *, SUM( Revenue) OVER (ORDER BY Source) ACC
FROM #t
), cte2 as(
SELECT MAX(acc)*1. TotalRevenue FROM cte
)
UPDATE cte
SET Condition = CASE WHEN Acc / TotalRevenue <= .7 THEN 'Used' ELSE 'New' END
FROM cte
CROSS APPLY (SELECT TotalRevenue FROM cte2) ca
SELECT * FROM #t

Find matching first 7 chars to identify duplicates

I'm trying to identify duplicate state_num values that are failing validation. The trailing R is causing issues with validation, so I want to compare only the first 7 characters and find the duplicates, returning both the row that has an R in the string and the row that doesn't. The column type is char(15). When I run my query, it does not find the matching 7 characters. The table below shows how the results should look, not what is actually being returned. The query basically just matches on the state and returns only the non-R state_num values. It should return around 480 rows, but it returns about 20k rows instead of just the duplicates.
I've tried querying a bunch of different ways, but I've spent the last hour only being able to return the R row if I add AND state_num[8] = 'R' to the end of the query, which defeats the purpose of finding the duplicate first 7 characters. This is an Informix db.
My Query:
SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
FROM state_form sf2
WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
GROUP BY cont_ref, state_num
HAVING COUNT(state_num) > 1)
AND state = 'MT';
This is what I'd like my results to return:
id_ref  cont_ref  formatted  state_num  type  state
658311  5237      71-75011R  7175011R   Y     MT
1459    5237      71-75011   7175011    I     MT
7501    555678    99-67894   9967894    I     MT
345443  555678    99-67894R  9967894R   Y     MT
Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.
create table my_data (
id_ref integer,
cont_ref integer,
state_num varchar(20),
type varchar(5),
state varchar(5)
);
insert into my_data values
(1, 5237, '7175011R', 'Y', 'MT'),
(2, 5237, '7175011', 'I', 'MT'),
(3, 6789, '7878787', 'Y', 'CA'),
(4, 6789, '7878787R', 'I', 'CA'),
(5, 555678, '9967894', 'I', 'MT'),
(6, 555678, '9967894R', 'Y', 'MT'),
(7, 98765, '123456', 'I', 'MT');
Query #1
with dupes as (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
)
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
Query #2
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
where m.cont_ref in (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
);
id_ref  cont_ref  state_num  type  state
1       5237      7175011R   Y     MT
2       5237      7175011    I     MT
5       555678    9967894    I     MT
6       555678    9967894R   Y     MT
UPDATE
If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do it with a sub-query.
with dupes as (
    select cont_ref
    from (
        select cont_ref, left(state_num, 7) as left_seven
        from my_data
        where state = 'MT'
    ) z
    -- group on the derived column too, so only shared 7-character prefixes count
    group by cont_ref, left_seven
    having count(*) > 1
)
select m.*
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
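The queries above join back on cont_ref only, so any other row that shares a cont_ref with a duplicate pair would be returned as well. A possible refinement is to join on the 7-character prefix too. Here is a rough sketch of that, run on SQLite via Python's sqlite3 module with substr(state_num, 1, 7) standing in for left(state_num, 7). The helper name prefix_dupes is invented, and the Informix behaviour is not tested here.

```python
import sqlite3

# Hypothetical helper: id_ref of rows sharing cont_ref and first 7 characters.
def prefix_dupes(rows, state):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_data (id_ref INTEGER, cont_ref INTEGER, "
        "state_num TEXT, type TEXT, state TEXT)"
    )
    con.executemany("INSERT INTO my_data VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT m.id_ref
        FROM my_data AS m
        JOIN (
            -- one row per (cont_ref, prefix) that occurs more than once
            SELECT cont_ref, substr(state_num, 1, 7) AS prefix
            FROM my_data
            WHERE state = ?
            GROUP BY cont_ref, substr(state_num, 1, 7)
            HAVING COUNT(*) > 1
        ) AS d
          ON d.cont_ref = m.cont_ref
         AND d.prefix = substr(m.state_num, 1, 7)
        WHERE m.state = ?
        ORDER BY m.id_ref
    """
    ids = [row[0] for row in con.execute(sql, (state, state))]
    con.close()
    return ids

# Sample rows from the answer above.
my_rows = [
    (1, 5237, "7175011R", "Y", "MT"), (2, 5237, "7175011", "I", "MT"),
    (3, 6789, "7878787", "Y", "CA"), (4, 6789, "7878787R", "I", "CA"),
    (5, 555678, "9967894", "I", "MT"), (6, 555678, "9967894R", "Y", "MT"),
    (7, 98765, "123456", "I", "MT"),
]

print(prefix_dupes(my_rows, "MT"))
```

For state MT this returns id_ref 1, 2, 5 and 6, the same rows as the result table above.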

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use a GROUP BY expression with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
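The filter above returns the underlying rows; to get the grouped shape from the question, one option is to select only Foo and OtherKeyId with DISTINCT. A quick sketch of that, run on SQLite through Python's sqlite3 module (the table is named tempx1 without the # since SQLite has no such temp-table prefix, and the helper name collapse_nulls is made up):

```python
import sqlite3

# Hypothetical helper: (Foo, OtherKeyId) pairs, keeping NULL only when alone.
def collapse_nulls(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tempx1 (Id INTEGER, Foo TEXT, OtherKeyId INTEGER)")
    con.executemany("INSERT INTO tempx1 VALUES (?, ?, ?)", rows)
    sql = """
        SELECT DISTINCT t.Foo, t.OtherKeyId
        FROM tempx1 AS t
        WHERE t.OtherKeyId IS NOT NULL
           OR NOT EXISTS (
               -- keep a NULL row only if its Foo has no non-NULL key at all
               SELECT 1 FROM tempx1 AS t2
               WHERE t2.Foo = t.Foo AND t2.OtherKeyId IS NOT NULL
           )
        ORDER BY t.Foo, t.OtherKeyId
    """
    pairs = con.execute(sql).fetchall()
    con.close()
    return pairs

# Sample rows from the question.
foo_rows = [
    (1, "A", None), (2, "B", None), (3, "B", 1),
    (4, "C", None), (5, "C", 1), (6, "C", 2),
]

for foo, other_key in collapse_nulls(foo_rows):
    print(foo, other_key)
```

This gives A/NULL, B/1, C/1, C/2 for the sample rows. A Foo that has several NULL rows and nothing else still comes back once, because DISTINCT collapses them.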
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BY clauses. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Find the users having more than two elements and one of those elements must be A

I want to extract the users having more than two elements and one of those elements must be A.
This is my table:
CREATE TABLE #myTable(
ID_element nvarchar(30),
Element nvarchar(10),
ID_client nvarchar(20)
)
This is the data of my table:
INSERT INTO #myTable VALUES
(13 ,'A', 1),(14 ,'B', 1),(15 ,NULL, 1),(16 ,NULL, 1),
(17 ,NULL, 1),(18 ,NULL, 1),(19 ,NULL, 1),(7, 'A', 2),
(8, 'B', 2),(9, 'C', 2),(10 ,'D', 2),(11 ,'F', 2),
(12 ,'G', 2),(1, 'A', 3),(2, 'B', 3),(3, 'C', 3),
(4, 'D', 3),(5, 'F', 3),(6, 'G', 3),(20 ,'Z', 4),
(22 ,'R', 4),(23 ,'D', 4),(24 ,'F', 5),(25 ,'G', 5),
(21 ,'x', 5)
And this is my query:
Select Distinct ID_client
from #myTable
Group by ID_client
Having Count(Element) > 2
Add a CROSS APPLY to your query that keeps only the ID_client values that have element A:
SELECT m.ID_client
FROM #myTable m
CROSS APPLY (
SELECT ID_client
FROM #myTable
WHERE ID_client = m.ID_client
AND Element = 'A'
) s
GROUP BY m.ID_client
HAVING COUNT(DISTINCT m.Element) > 2
Output:
ID_client
2
3
I think this is what you are looking for:
SELECT * FROM
(SELECT *, RANK() OVER (PARTITION BY element ORDER by id_client) AS grouped FROM #myTable) t
wHERE grouped > 1
AND Element = 'A'
ORDER by t.element
which brings back
ID_element Element ID_client grouped
7 A 2 2
1 A 3 3
You can select the ID_client values which have an 'A' as an Element and join your table with the result of that:
SELECT m.ID_Client
FROM #myTable AS m
JOIN (
SELECT a.ID_Client FROM #myTable AS a
WHERE a.Element = 'A') AS filteredClients
ON m.ID_client = filteredClients.ID_client
GROUP BY m.ID_client
HAVING COUNT(m.Element) > 2
Outputs:
ID_Client
2
3
However, this is not necessarily the best way to do it: When should I use Cross Apply over Inner Join?
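Another way to express the same requirement, without a join or CROSS APPLY, is to put both tests in the HAVING clause using conditional aggregation. This is only a sketch of that alternative, run on SQLite via Python's sqlite3 module; the helper name clients_with_a is invented, and it counts distinct non-NULL elements as the CROSS APPLY answer does.

```python
import sqlite3

# Hypothetical helper: clients with more than two elements, one of them 'A'.
def clients_with_a(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTable (ID_element INTEGER, Element TEXT, ID_client INTEGER)"
    )
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    sql = """
        SELECT ID_client
        FROM myTable
        GROUP BY ID_client
        HAVING COUNT(DISTINCT Element) > 2            -- NULLs are not counted
           AND SUM(CASE WHEN Element = 'A' THEN 1 ELSE 0 END) > 0
        ORDER BY ID_client
    """
    clients = [row[0] for row in con.execute(sql)]
    con.close()
    return clients

# Sample rows from the question.
element_rows = [
    (13, "A", 1), (14, "B", 1), (15, None, 1), (16, None, 1),
    (17, None, 1), (18, None, 1), (19, None, 1), (7, "A", 2),
    (8, "B", 2), (9, "C", 2), (10, "D", 2), (11, "F", 2),
    (12, "G", 2), (1, "A", 3), (2, "B", 3), (3, "C", 3),
    (4, "D", 3), (5, "F", 3), (6, "G", 3), (20, "Z", 4),
    (22, "R", 4), (23, "D", 4), (24, "F", 5), (25, "G", 5),
    (21, "x", 5),
]

print(clients_with_a(element_rows))
```

On the question's data this returns clients 2 and 3: client 1 has an A but only two non-NULL elements, and clients 4 and 5 have three elements but no A.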

Update column based on IF Else Condition

I have two tables A and B
Table A
ID_number as PK
first_name,
L_Name
Table B
ID_number,
Email_id,
Flag
I have several people who have multiple email IDs and are already flagged as X in table B.
I am trying to find the list of people who have one or more email IDs but were never flagged.
E.g., John Clark might have 2 emails in table B but was never flagged.
Simply use not exists:
select a.*
from a
where not exists (select 1
from b
where b.id_number = a.id_number and b.flag = 'X'
);
You may want to perform an update, but your question seems to be only about selecting (probably to update based on select). It should be something like this:
SELECT A.L_Name
FROM A
WHERE NOT EXISTS (
SELECT 1
FROM B
WHERE B.ID_number = A.ID_number AND B.Flag = 'X'
)
Or the LEFT JOIN version:
SELECT A.L_Name
FROM A
LEFT JOIN B ON B.ID_number = A.ID_number AND B.Flag = 'X'
WHERE B.ID_number IS NULL
Usually, the first version is faster than the second one.
Forget Table A...
SELECT DISTINCT ID_number FROM table_b t1
WHERE NOT EXISTS(
SELECT NULL FROM table_b t2 WHERE t1.ID_number=t2.ID_number AND t2.flag='X'
)
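The table-B-only idea can also be written as a single GROUP BY with a conditional count, which keeps only people who have at least one email and no X flag on any of them. Below is a hedged sketch on SQLite through Python's sqlite3 module. The helper name never_flagged and the sample email rows are invented for illustration, since the question gives no data.

```python
import sqlite3

# Hypothetical helper: id_numbers that have emails but no 'X' flag anywhere.
def never_flagged(email_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE b (ID_number INTEGER, Email_id TEXT, Flag TEXT)")
    con.executemany("INSERT INTO b VALUES (?, ?, ?)", email_rows)
    sql = """
        SELECT ID_number
        FROM b
        GROUP BY ID_number
        -- zero flagged emails in the whole group
        HAVING SUM(CASE WHEN Flag = 'X' THEN 1 ELSE 0 END) = 0
        ORDER BY ID_number
    """
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids

# Made-up rows: person 1 has two unflagged emails, person 2 has one flagged
# email out of two, person 3 has a single unflagged email.
sample_emails = [
    (1, "john@example.com", None), (1, "jclark@example.com", None),
    (2, "amy@example.com", "X"), (2, "amy2@example.com", None),
    (3, "bob@example.com", None),
]

print(never_flagged(sample_emails))
```

Here persons 1 and 3 are returned. Person 2 is excluded even though one of their emails is unflagged, which matches the "never flagged" wording in the question.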
Judging by your responses in the comments, I believe this is what you are looking for:
--drop table update_test;
create table update_test
(
id_num number,
email_id number,
flag varchar2(1) default null
);
insert into update_test values (1, 1, null);
insert into update_test values (1, 2, null);
insert into update_test values (2, 3, null);
insert into update_test values (2, 7, null);
insert into update_test values (3, 2, null);
insert into update_test values (3, 3, 'X');
insert into update_test values (3, 7, null);
select * from update_test;
select id_num, min(email_id)
from update_test
group by id_num;
update update_test ut1
set flag = case
when email_id = (
select min(email_id)
from update_test ut2
where ut2.id_num = ut1.id_num
) then 'X'
else null end
where id_num not in (
select id_num
from update_test
where Flag is not null);
The last update statement sets the Flag field on the record with the lowest email_id in each id_num group. If any record in an id_num group already has the Flag field set, the whole group is left untouched.