Remove Duplicate Nodes from SQL Server Hierarchy - sql

Looking at the below data is there an easy way to find IDs that exist at a higher level under the same leaf and remove them.
E.g. ID 4,5 exists in rows 4,5 , 8,9 and 12,13.
i want to remove rows 4,5 as the same IDs exist further down the hierarchy (rows 8,9), but rows 12,13 stay as they are on a separate leaf.
Rows to be Removed
row ID Path
1 1 /1/
2 2 /1/2/
3 3 /1/2/3/
4 4 /1/2/3/4/
5 5 /1/2/3/5/
6 6 /1/2/3/6/
7 7 /1/2/3/6/7/
8 4 /1/2/3/6/4/
9 5 /1/2/3/6/5/
10 8 /1/2/8/
11 7 /1/2/8/7/
12 4 /1/2/8/4/
13 5 /1/2/8/5/

example for max ID len = 1
select *
--delete t1
from table as t1
where exists(
select *
from table as t2
where left(t2.path, len(t1.path - 2)) = left(t1.path, len(t1.path - 2))
and charindex(right(t1.path, 2), t2.path, len(t1.path)) > 0
)

This is my approach it should work for any length of ID:
with
conversed as (
select substring(reverse(path),CHARINDEX('/',reverse(path),2), len(path)-CHARINDEX('/',reverse(path),2)) inverse_path, id, t.row
from t)
select c.id, c.row keeper, c1.row deletethis
from conversed c join conversed c1
on c.id = c1.id
and c.row <> c1.row
and c.inverse_path like '%' +c1.inverse_path;
OUTPUT
id keeper deletethis
4 8 4
5 9 5

Related

SELECT COUNT of child records as subquery within UNION

I'm trying to get a count of the number of children associated with a user. The portion of this query that selects the number of children records is: (select count(*) from blitz_user as ua where ua.manager_id= u.id ) as "directsCount".
The query selects the correct number of children for all records except those with zero (0) children. For records with zero children the value of directsCount is still one (1).
The database in use is PostgreSQL. Please let me know if this is easier to understand with sample data. I tried to limit the length of the post in case that is not helpful.
WITH RECURSIVE subordinates AS (
SELECT u.id,
u.company_id as "companyId",
u.manager_id as "managerId",
1 as levels_down,
(select count(*) from blitz_user as ua where ua.manager_id= u.id ) as "directsCount"
FROM blitz_user AS u
WHERE u.manager_id = 9
UNION
SELECT u.id,
u.company_id as "companyId",
u.manager_id as "managerId",
levels_down + 1, 1 as levels_down
FROM blitz_user AS u
INNER JOIN subordinates AS s ON u.manager_id = s.id
WHERE levels_down <= 5
)
SELECT id, "companyId", "managerId", levels_down, "directsCount"
FROM subordinates;
Please see the following dbfiddle for example code along with the current output.
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=509411d5971efc4eddfccabf3c4d22a4
The current output is:
id companyId managerId levels_down directsCount
10 6 9 1 2 (correct)
11 6 9 1 1 (correct)
12 6 9 1 2 (correct)
13 6 9 1 3 (correct)
14 6 13 2 1 (should be 2)
15 6 13 2 1 (should be 2)
16 6 13 2 1 (correct)
41 6 10 2 1 (should be 3)
45 6 10 2 1 (should be 3)
49 6 11 2 1 (should be 4)
54 6 12 2 1 (should be 0)
58 6 12 2 1 (should be 3)
17 6 14 3 1 (should be 4)
<snip>
The problem just repeats at this point for all remaining rows.
The "levels_down" tracks how far down the hierarchy we are. The values for "directsCount" are correct when we are only 1 levels_down, but then default to "1" for all levels_down from 2, 3, etc.

How to select parent child till end using root parent id in sql?

Below the table :
I need to get without recursion, but use any other join and union.
Id ParentId
1 0
2 1
3 2
4 2
5 4
6 5
7 6
8 7
9 8
N N
Without recursion use any other join queries
DECLARE #Nvalue INT = 10
SELECT NUMBER AS ID,NUMBER-1 AS ParentID FROM (
SELECT DISTINCT NUMBER FROM master..spt_values WHERE number BETWEEN 1 AND #Nvalue
)T

SQL Inner Join and Partitioning To obtain RowNumbers when matching

I have 2 tables. The first table 'a' the second 'b'.
I am writing a query that grabs every row in table a (there is 33 rows defined) and inner joins table b where the EnclLocation or the BackPanLoc match the Workcell in table A.
I only want a row from table B where they match based off BackPan and EnclLocation but they are not the same records. table b has a few rows of data that is assigned to the same workcell as table a. I am just trying to retrieve those additional rows and partition it.
I attached table a and table b. I also attached the desired results for this query with respect to Workcell 10 only as an example... As you can see, table B has 4 records that has either the EnclLocation or the BackPanLoc = 10. But my results only show the same DelvNumber 4 times. any help is most appreicated.
Table a
Table b
Incorrect Results
Desired Results (showing only Workcell 10 as an example)
workcell DelvNumber RowNum
1 447910-02 1
2 445710-01 1
2 445710-01 2
3 444291-01 1
3 444291-01 2
4 447910-03 1
4 447910-03 2
5 648020-01 1
6 647800-02 1
7 646920-01 1
7 646920-01 2
8 644830-4-8 1
8 644830-4-8 2
9 443990-01 1
10 645960-01-03 1
10 445710-11 2
10 445710-02 3
10 445710-09 4
Code Used
WITH ss
AS (SELECT a.*,
Row_number()
OVER(
partition BY a.workcell
ORDER BY a.workcell) AS rownum
FROM nwcurrent a
INNER JOIN nwdeliverables b
ON b.encllocation = a.workcell
OR b.backpanloc = a.workcell
WHERE ( b.status < 9
AND ( b.encllocation <> 0
OR b.backpanloc <> 0 )
OR a.delvnumber = '123' ))
SELECT *
FROM ss
copy and paste format
1 447910-02 1
2 445710-01 1
2 445710-01 2
3 444291-01 1
3 444291-01 2
4 447910-03 1
4 447910-03 2
5 648020-01 1
6 647800-02 1
7 646920-01 1
7 646920-01 2
8 644830-4-8 1
8 644830-4-8 2
9 443990-01 1
10 645960-01-03 1
10 445710-11 2
10 445710-02 3
10 445710-09 4
SQLFiddle
http://sqlfiddle.com/#!3/a8682/4
A new try...
SELECT a.workcell
,a.DelvNumber AS A_DelvNumber
,b.DelvNumber AS B_DelvNumber
,CASE WHEN a.DelvNumber<>b.DelvNumber THEN b.DelvNumber ELSE a.DelvNumber END AS DelvNumber_Resolved
,Row_number() OVER(partition BY a.workcell ORDER BY a.workcell) AS rownum
FROM NWCurrent a
INNER JOIN NWDeliverables AS b ON b.EnclLocation=a.WorkCell OR b.BackPanLoc=a.WorkCell
WHERE (b.status <9 AND (b.EnclLocation<>0 OR b.BackPanLoc<>0)OR a.DelvNumber='123')

SQL: Assembling Non-Overlapping Sets

I have sets of consecutive integers, organized by type, in table1. All values are between 1 and 10, inclusive.
table1:
row_id set_id type min_value max_value
1 1 a 1 3
2 2 a 4 10
3 3 a 6 10
4 4 a 2 5
5 5 b 1 9
6 6 c 1 7
7 7 c 3 10
8 8 d 1 2
9 9 d 3 3
10 10 d 4 5
11 11 d 7 10
In table2, within each type, I want to assemble all possible maximal, non-overlapping sets (though gaps that cannot be filled by any sets of the correct type are okay). Desired output:
table2:
row_id type group_id set_id
1 a 1 1
2 a 1 2
3 a 2 1
4 a 2 3
5 a 3 3
6 a 3 4
7 b 4 5
8 c 5 6
9 c 6 7
10 d 7 8
11 d 7 9
12 d 7 10
13 d 7 11
My current idea is to use the fact that there is a limited number of possible values. Steps:
Find all sets in table1 containing value 1. Copy them into table2.
Find all sets in table1 containing value 2 and not already in table2.
Join the sets from (2) with table1 on type, set_id, and having min_value greater than the group's greatest max_value.
For the sets from (2) that did not join in (3), insert them into table2. These start new groups that may be extended later.
Repeat steps (2) through (4) for values 3 through 10.
I think this will work, but it has a lot of pain-in-the-butt steps, especially for (2)--finding the sets not in table2, and (4)--finding the sets that did not join.
Do you know a faster, more efficient method? My real data has millions of sets, thousands of types, and hundreds of values (though fortunately, as in the example, the values are bounded), so scalability is essential.
I'm using PLSQL Developer with Oracle 10g (not 11g as I stated before--thanks, IT department). Thanks!
For Oracle 10g you can't use recursive CTEs, but with a bit of work you can do something similar with the connect by syntax. First you need to generate a CTE or in-line view which has all the non-overlapping links, which you can do with:
select t1.type, t1.set_id, t1.min_value, t1.max_value,
t2.set_id as next_set_id, t2.min_value as next_min_value,
t2.max_value as next_max_value,
row_number() over (order by t1.type, t1.set_id, t2.set_id) as group_id
from table1 t1
left join table1 t2 on t2.type = t1.type
and t2.min_value > t1.max_value
where not exists (
select 1
from table1 t4
where t4.type = t1.type
and t4.min_value > t1.max_value
and t4.max_value < t2.min_value
)
order by t1.type, group_id, t1.set_id, t2.set_id;
This took a bit of experimentation and it's certainly possible I've missed or lost something about the rules in the process; but that gives you 12 pseudo-rows, and is in my previous answer this allows the two separate chains starting with a/1 to be followed while constraining the d values to a single chain:
TYPE SET_ID MIN_VALUE MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GROUP_ID
---- ------ ---------- ---------- ----------- -------------- -------------- --------
a 1 1 3 2 4 10 1
a 1 1 3 3 6 10 2
a 2 4 10 3
a 3 6 10 4
a 4 2 5 3 6 10 5
b 5 1 9 6
c 6 1 7 7
c 7 3 10 8
d 8 1 2 9 3 3 9
d 9 3 3 10 4 5 10
d 10 4 5 11 7 10 11
d 11 7 10 12
And that can be used as a CTE; querying that with a connect-by loop:
with t as (
... -- same as above query
)
select t1.type,
dense_rank() over (partition by null
order by connect_by_root group_id) as group_id,
t1.set_id
from t t1
connect by type = prior type
and set_id = prior next_set_id
start with not exists (
select 1 from table1 t2
where t2.type = t1.type
and t2.max_value < t1.min_value
)
and not exists (
select 1 from t t3
where t3.type = t1.type
and t3.next_max_value < t1.next_min_value
)
order by t1.type, group_id, t1.min_value;
The dense_rank() makes the group IDs contiguous; not sure if you actually need those at all, or if their sequence matters, so it's optional really. connect_by_root gives the group ID for the start of the chain, so although there were 12 rows and 12 group_id values in the initial query, they don't all appear in the final result.
The connection is via two prior values, type and the next set ID found in the initial query. That creates all the chains, but own its own would also include shorter chains - for d you'd see 8,9,10,11 but also 9,10,11 and 10,11, which you don't want as separate groups. Those are eliminated by the start with conditions, which could maybe be simplified.
That gives:
TYPE GROUP_ID SET_ID
---- -------- ------
a 1 1
a 1 2
a 2 1
a 2 3
a 3 4
a 3 3
b 4 5
c 5 6
c 6 7
d 7 8
d 7 9
d 7 10
d 7 11
SQL Fiddle demo.
If you can identify all the groups and their starting set_id then you can use a recursive approach and do this all in a single statement, rather than needing to populate a table iteratively. However you'd need to benchmark both approaches both for speed/efficiency and resource consumption - whether it will scale for your data volumes and within your system's available resources would need to be verified.
If I understand when you decide to start a new group you can identify them all at once with a query like:
with t as (
select t1.type, t1.set_id, t1.min_value, t1.max_value,
t2.set_id as next_set_id, t2.min_value as next_min_value,
t2.max_value as next_max_value
from table1 t1
left join table1 t2 on t2.type = t1.type and t2.min_value > t1.max_value
where not exists (
select 1
from table1 t3
where t3.type = t1.type
and t3.max_value < t1.min_value
)
)
select t.type, t.set_id, t.min_value, t.max_value,
t.next_set_id, t.next_min_value, t.next_max_value,
row_number() over (order by t.type, t.min_value, t.next_min_value) as grp_id
from t
where not exists (
select 1 from t t2
where t2.type = t.type
and t2.next_max_value < t.next_min_value
)
order by grp_id;
The tricky bit here is getting all three groups for a, specifically the two groups that start with set_id = 1, but only one group for d. The inner select (in the CTE) looks for sets that don't have a lower non-overlapping range via the not exists clause, and outer-joins to the same table to get the next set(s) that don't overlap, which gives you two groups that start with set_id = 1, but also four that start with set_id = 9. The outer select then ignores everything but the lowest non-overlapping with a second not exists clause - but doesn't have to hit the real table again.
So that gives you:
TYPE SET_ID MIN_VALUE MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GRP_ID
---- ------ ---------- ---------- ----------- -------------- -------------- ------
a 1 1 3 2 4 10 1
a 1 1 3 3 6 10 2
a 4 2 5 3 6 10 3
b 5 1 9 4
c 6 1 7 5
c 7 3 10 6
d 8 1 2 9 3 3 7
You can then use that as the anchor member in a recursive subquery factoring clause:
with t as (
...
),
r (type, set_id, min_value, max_value,
next_set_id, next_min_value, next_max_value, grp_id) as (
select t.type, t.set_id, t.min_value, t.max_value,
t.next_set_id, t.next_min_value, t.next_max_value,
row_number() over (order by t.type, t.min_value, t.next_min_value)
from t
where not exists (
select 1 from t t2
where t2.type = t.type
and t2.next_max_value < t.next_min_value
)
...
If you left the r CTE with that and just did sleect * from r you'd get the same seven groups.
The recursive member then uses the next set_id and its range from that query as the next member of each group, and repeats the outer join/not-exists look up to find the next set(s) again; stopping when there is no next non-overlapping set:
...
union all
select r.type, r.next_set_id, r.next_min_value, r.next_max_value,
t.set_id, t.min_value, t.max_value, r.grp_id
from r
left join table1 t
on t.type = r.type
and t.min_value > r.next_max_value
and not exists (
select 1 from table1 t2
where t2.type = r.type
and t2.min_value > r.next_max_value
and t2.max_value < t.min_value
)
where r.next_set_id is not null -- to stop looking when you reach a leaf node
)
...
Finally you have a query based on the recursive CTE to get the columns you want and to specify the order:
...
select r.type, r.grp_id, r.set_id
from r
order by r.type, r.grp_id, r.min_value;
Which gets:
TYPE GRP_ID SET_ID
---- ---------- ----------
a 1 1
a 1 2
a 2 1
a 2 3
a 3 4
a 3 3
b 4 5
c 5 6
c 6 7
d 7 8
d 7 9
d 7 10
d 7 11
SQL Fiddle demo.
If you wanted to you could show the min/max values for each set, and could track and show the min/max value for each group. I've just show then columns from the question though.

Query to serialize data

I have two tables:
Routes
ID Description
1 street1
2 street2
3 street3
4 street4
5 street5
Segments
ID RouteID, Progres, LabelStart, LabelEnd
1 1 5 1 A 21 B
2 1 10 2 A 10
3 2 15 3 25
4 2 15 2 20
5 3 20 1 11
6 3 22 4 10
7 4 30 5 11
8 4 31 2 12
I need a sequence with these rules:
table must be ordered by Progress ASC
A column Type is defined and take O if LabelStart and LabelEnd are Odd, E if Even
if two routes have the same progress then the rows are merged in one where
LabelStart is the minimum (among LabelStart Odd and LabelStart Even)
and LabelEnd is the Max, in this case Type takes the value of A (All)
as per example data above the result should be
Sequence
ID RouteID, Progres, LabelStart, LabelEnd Type
1 1 5 1 A 21 B O
2 1 10 2 A 10 E
4 2 15 2 25 A
5 3 20 1 11 O
6 3 22 4 10 E
7 4 30 5 11 O
8 4 31 2 12 E
It is for Postgres 9.2
This was an interesting query to write because you had letters in your LabelStart and LabelEnd fields. I used REGEX_REPLACE to remove those. Then I used a CTE to get the records that had more than one routeid and progress rows.
I think this should do it:
WITH CTE AS (
SELECT
RouteId, Progress
FROM Sequence
GROUP BY RouteId, Progress
HAVING COUNT(DISTINCT Id) > 1
)
SELECT MAX(S.ID) Id,
T.RouteId,
T.Progress,
MIN(regexp_replace(LabelStart, '[^0-9]', '', 'g')) LabelStart,
MAX(regexp_replace(LabelStart, '[^0-9]', '', 'g')) LabelEnd,
'A' as Type
FROM Sequence S
INNER JOIN CTE T ON S.RouteId = T.RouteId AND S.Progress = T.Progress
GROUP BY T.RouteId, T.Progress
UNION
SELECT S.Id,
S.RouteId,
S.Progress,
S.LabelStart,
S.LabelEnd,
CASE
WHEN CAST(regexp_replace(LabelStart, '[^0-9]', '', 'g') as int) % 2 = 0
THEN 'E'
ELSE 'O'
END
FROM Sequence S
LEFT JOIN CTE T ON S.RouteId = T.RouteId AND S.Progress = T.Progress
WHERE T.RouteId IS NULL
ORDER BY Progress ASC
And some sample Fiddle.