Recursive split of path with H2 DB and SQL

I have path names of the following general form (the path depth is not limited):
/a/b/c/d/e/...
Example
/a/b/c/d/e
Expected result
What I'd like to achieve now is to split the path into a table containing the folder and the respective parent:
|PARENT    |FOLDER|
|----------|------|
|/a/b/c/d/ |e     |
|/a/b/c/   |d     |
|/a/b/     |c     |
|/a/       |b     |
|/         |a     |
The capabilities of the H2 db are a bit limited when it comes to splitting strings, thus my assumption was it must be solved recursively (especially since the path depth is not limited).
Any help would be appreciated :)

You need to use a recursive query, for example:
WITH RECURSIVE CTE(S, F, T) AS (
SELECT '/a/b/c/d/e', 0, 1
UNION ALL
SELECT S, T, LOCATE('/', S, T + 1)
FROM CTE
WHERE T <> 0
)
SELECT
SUBSTRING(S FROM 1 FOR F) PARENT,
SUBSTRING(S FROM F + 1 FOR
CASE T WHEN 0 THEN CHARACTER_LENGTH(S) ELSE T - F - 1 END) FOLDER
FROM CTE WHERE F > 0;
It produces
|PARENT    |FOLDER|
|----------|------|
|/         |a     |
|/a/       |b     |
|/a/b/     |c     |
|/a/b/c/   |d     |
|/a/b/c/d/ |e     |
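For readers who want to try the recursive idea outside H2, here is a minimal sketch that runs against SQLite through Python's sqlite3 module. It is not the H2 query above: SQLite has no three-argument LOCATE, so this variant carries the not-yet-processed remainder of the path in a column named rest. The names split_path, SPLIT_SQL and rest are hypothetical, and the sketch assumes the path starts with '/' and has no trailing slash.

```python
import sqlite3

# Recursive CTE: each step moves one "folder/" from the remainder onto the parent.
SPLIT_SQL = """
WITH RECURSIVE cte(parent, rest) AS (
    SELECT '/', substr(:path, 2)              -- strip the leading slash
    UNION ALL
    SELECT parent || substr(rest, 1, instr(rest, '/')),
           substr(rest, instr(rest, '/') + 1)
    FROM cte
    WHERE instr(rest, '/') > 0                -- stop when no separator is left
)
SELECT parent,
       CASE WHEN instr(rest, '/') > 0
            THEN substr(rest, 1, instr(rest, '/') - 1)
            ELSE rest
       END AS folder
FROM cte
ORDER BY length(parent)
"""


def split_path(path):
    """Return (parent, folder) pairs for a path such as '/a/b/c'."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(SPLIT_SQL, {"path": path}).fetchall()
    finally:
        conn.close()


rows = split_path("/a/b/c/d/e")
for parent, folder in rows:
    print(parent, folder)
```

Because the recursion stops as soon as the remainder contains no further separator, the path depth is not limited by the query itself.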

Do something like this:
with recursive
p(p) as (select '/a/b/c/d/e' as p),
t(path, parent, folder, i) as (
select
p,
REGEXP_REPLACE(p, '(.*)/\w+', '$1'),
REGEXP_REPLACE(p, '.*/(\w+)', '$1'),
1
from p
union
select
t.parent,
REGEXP_REPLACE(t.parent, '(.*)/\w+', '$1'),
REGEXP_REPLACE(t.parent, '.*/(\w+)', '$1'),
t.i + 1
from t
where t.parent != ''
)
select *
from t;
resulting in
|PATH |PARENT |FOLDER|I |
|----------|--------|------|---|
|/a/b/c/d/e|/a/b/c/d|e |1 |
|/a/b/c/d |/a/b/c |d |2 |
|/a/b/c |/a/b |c |3 |
|/a/b |/a |b |4 |
|/a | |a |5 |
Not sure if you're really interested in trailing / characters, but you can easily fix the query according to your needs.

Create recursive CTE for this table [duplicate]

I have a table like this:
|id |name |parent|
+-------+----------+------+
|1 |iran | |
|2 |iraq | |
|3 |tehran |1 |
|4 |tehran |3 |
|5 |Vaiasr St |4 |
|6 |Fars |1 |
|7 |shiraz |6 |
It's about addresses from country to street. I want to create address by recursive cte like this:
with cte_address as
(
select
ID, [Name], parent
from
[Address]
where
Parent is null
union all
select
a.ID, a.[name], a.Parent
from
address a
inner join
cte_address c on a.parent = c.id
)
select *
from cte_address
But I get an error:
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Add OPTION (MAXRECURSION 0) at the end of your SELECT query. MAXRECURSION 0 removes the limit on the recursion depth:
with cte_address as
(
...
...
)
select * from cte_address
option (maxrecursion 0)
Note:
By default, SQL Server limits a query to 100 levels of recursion. The limit guards against an infinite loop caused by a poorly designed recursive CTE query, so remove it only when you are sure the recursion terminates.
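With the seven sample rows the hierarchy is only four levels deep, so hitting the limit of 100 on real data suggests either a very deep hierarchy or parent values that form a loop. One way to investigate, sketched here against SQLite through Python's sqlite3 module rather than SQL Server, is to carry a depth column and stop at an explicit cap. The names MAX_DEPTH, depth and full_address are hypothetical and not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        (1, "iran", None), (2, "iraq", None), (3, "tehran", 1), (4, "tehran", 3),
        (5, "Vaiasr St", 4), (6, "Fars", 1), (7, "shiraz", 6),
    ],
)

MAX_DEPTH = 10  # hypothetical cap; pick something above the deepest expected level

rows = conn.execute(
    """
    WITH RECURSIVE cte_address(id, full_address, depth) AS (
        SELECT id, name, 1 FROM address WHERE parent IS NULL
        UNION ALL
        SELECT a.id, c.full_address || ' / ' || a.name, c.depth + 1
        FROM address a
        JOIN cte_address c ON a.parent = c.id
        WHERE c.depth < ?          -- explicit guard instead of an unlimited recursion
    )
    SELECT id, full_address, depth FROM cte_address ORDER BY id
    """,
    (MAX_DEPTH,),
).fetchall()

for row in rows:
    print(row)
```

If rows at the cap show up in the output, the depth and full_address columns point at the records where the chain keeps repeating.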

How can I replace this correlated subquery within a function call?

Given the following tables
buckets
metric_id|start_date |bucket
------------------------------------
a |2019-12-05 00:00:00|1
a |2019-12-06 00:00:00|2
b |2021-10-31 00:00:00|1
b |2021-11-01 00:00:00|2
points
point_id|metric_id|timestamp
----------------------------
1 |a |2019-12-05 00:00:00
2 |a |2019-12-06 00:00:00
3 |b |2021-10-31 00:00:00
4 |b |2021-11-01 00:00:00
And the following query
select
p.metric_id,
bucket
from points p
left join width_bucket(p.timestamp, (select array(select start_date
from buckets b
where b.metric_id = p.metric_id -- correlated sub-query
))) as bucket on true
Output
metric_id|bucket
-----------------
a |1
a |2
b |1
b |2
How can I remove the correlated sub-query to improve the performance?
Currently ~280,000 points * ~650 buckets = ~180,000,000 loops = very slow!
Basically I want to remove the correlated sub-query and apply the width_bucket function only once per unique metric_id in buckets, so that the performance is improved and the function is still given the correct time series data.
How can this be done in Postgres 13?
You can use a CTE to aggregate the buckets once per metric_id first:
with buckets_arr as (
select metric_id, array_agg(start_date order by start_date) arrb
from buckets
group by metric_id
)
select
p.metric_id,
width_bucket(p.timestamp, ba.arrb) bucket
from points p
join buckets_arr ba on p.metric_id = ba.metric_id
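The idea of building the sorted threshold array once per metric_id and then bucketing every point against it can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the query: it assumes that width_bucket with an array argument returns the number of thresholds less than or equal to the operand, which is what bisect_right computes on a sorted list. The variable names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime

buckets = [
    ("a", datetime(2019, 12, 5)), ("a", datetime(2019, 12, 6)),
    ("b", datetime(2021, 10, 31)), ("b", datetime(2021, 11, 1)),
]
points = [
    (1, "a", datetime(2019, 12, 5)), (2, "a", datetime(2019, 12, 6)),
    (3, "b", datetime(2021, 10, 31)), (4, "b", datetime(2021, 11, 1)),
]

# Step 1: aggregate the thresholds once per metric_id and keep them sorted.
thresholds = defaultdict(list)
for metric_id, start_date in buckets:
    thresholds[metric_id].append(start_date)
for dates in thresholds.values():
    dates.sort()

# Step 2: bucket each point with a binary search against its metric's thresholds.
result = [
    (metric_id, bisect_right(thresholds[metric_id], ts))
    for _point_id, metric_id, ts in points
]
print(result)
```

The work per point drops from scanning all bucket rows to one binary search, which is the saving the aggregated CTE aims for. Unlike the inner join in the query, this sketch assigns bucket 0 to a point whose metric_id has no buckets instead of dropping it.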
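You can also rewrite your query as a plain join with aggregation. Note that width_bucket expects the thresholds array to be sorted, hence the ORDER BY inside array_agg:
select
p.metric_id,
width_bucket(p.timestamp, array_agg(b.start_date order by b.start_date)) bucket
from points p
left join buckets b on b.metric_id = p.metric_id
group by p.metric_id, p.timestamp
Adding indexes on buckets.start_date and points (metric_id, timestamp) would also help a lot.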

Build Traversal path for each BST node

Code to create the table:
create table BST
(
N Int,
P Int
)
insert into BST values
(1,3),
(3,8),
(4,6),
(6,3),
(7,6),
(8,NULL),
(10,8),
(13,14),
(14,10)
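The BST hierarchy looks like this: 8 is the root with children 3 and 10; 3 has children 1 and 6; 6 has children 4 and 7; 10 has child 14; 14 has child 13.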
I am trying to build a query such that for each node it shows the traversal path needed to reach that particular node.
I tried applying recursive CTE, but I am not sure if I applied it in the correct way or not.
WITH NodeCTE (N, P, [Level])
AS
(
SELECT N,
P,
1
FROM BST
WHERE P IS NULL
UNION ALL
SELECT BST.N,
BST.P,
NodeCTE.[Level] + 1
FROM BST
JOIN NodeCTE ON BST.P = NodeCTE.N
)
SELECT CTE1.N AS Node,
CTE1.[Level]
FROM NodeCTE CTE1
LEFT JOIN NodeCTE CTE2 ON CTE1.P = CTE2.N
From what I found by googling, I need to use STRING_AGG at the end to format the data, but I am unable to figure out how to get the data into the required shape before applying STRING_AGG.
Expected Output:
| N | TraversalPath |
|-------|----------------|
|1 |8->3->1 |
|3 |8->3 |
|4 |8->3->6->4 |
|6 |8->3->6 |
|7 |8->3->6->7 |
|8 |8 |
|10 |8->10 |
|13 |8->10->14->13 |
|14 |8->10->14 |
You did much of the dirty work, so the additions are minimal:
WITH NodeCTE (N, P, [Level], [path])
AS
(
SELECT N,
P,
1,
convert(NVARCHAR(MAX),N)
FROM BST
WHERE P IS NULL
UNION ALL
SELECT BST.N,
BST.P,
NodeCTE.[Level] + 1,
NodeCTE.[path] + '->' + convert(NVARCHAR(MAX),BST.N)
FROM BST
JOIN NodeCTE ON BST.P = NodeCTE.N
)
SELECT N,
P,
Level,
path AS Traversal
FROM NodeCTE
ORDER BY N
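To check the approach against the expected output, here is a minimal sketch of the same path-accumulating recursion run on SQLite through Python's sqlite3 module. The dialect differs from SQL Server (|| instead of + for concatenation, CAST instead of convert), and the paths dictionary is a hypothetical name used only for the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (N INTEGER, P INTEGER)")
conn.executemany(
    "INSERT INTO BST VALUES (?, ?)",
    [(1, 3), (3, 8), (4, 6), (6, 3), (7, 6), (8, None), (10, 8), (13, 14), (14, 10)],
)

# Start at the root (P IS NULL) and append each child to its parent's path.
paths = dict(
    conn.execute(
        """
        WITH RECURSIVE NodeCTE(N, path) AS (
            SELECT N, CAST(N AS TEXT) FROM BST WHERE P IS NULL
            UNION ALL
            SELECT BST.N, NodeCTE.path || '->' || BST.N
            FROM BST
            JOIN NodeCTE ON BST.P = NodeCTE.N
        )
        SELECT N, path FROM NodeCTE ORDER BY N
        """
    ).fetchall()
)

for node, path in paths.items():
    print(node, path)
```

Because every row already carries the full path from the root, no STRING_AGG step is needed afterwards.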

SQL average multiple columns for each row with nulls

I have a table like this:
|Quality|Schedule|Cost Control|
-------------------------------
|7 | 8.5 |10 |
|NULL | 9 |NULL |
and I need to calculate the average of each row in the same table so it looks like this:
|Quality|Schedule|Cost Control|AVG|
----------------------------------
|7 | 8.5 |10 |8.5|
|NULL | 9 |NULL |9 |
which I have done using the following code:
SELECT r.Quality, r.Schedule, r.CostControl,
(coalesce(r.quality,0)+
coalesce(r.schedule,0)+
coalesce(r.CostControl,0))/3 as Average
FROM dbo.Rating r
Which gives the following table:
|Quality|Schedule|Cost Control|AVG|
----------------------------------
|7 | 8.5 |10 |8.5|
|NULL | 9 |NULL |3 |
I know the problem is that the divisor is hard-coded in my select statement, but I can't figure out how to make it variable. I tried using a case statement to select an additional column:
select Count(case when(r.quality) > 0 then 1 else 0 end +
case when (r.Schedule) > 0 then 1 else 0 end +
case when (r.CostControl) > 0 then 1 else 0 end)
But that only gives me one value. I'm out of ideas and facing a pretty tight deadline, so any help would be much appreciated.
Instead of dividing by 3, divide by the number of non-NULL columns:
(CASE WHEN Quality IS NULL THEN 0 ELSE 1 END +
CASE WHEN Schedule IS NULL THEN 0 ELSE 1 END +
CASE WHEN CostControl IS NULL THEN 0 ELSE 1 END)
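Put together with the original sum, the variable divisor can be tried out as follows. This is a sketch on SQLite through Python's sqlite3 module, not SQL Server. The NULLIF guard and the third, all-NULL row are additions made here for illustration: they show that a row without any rating yields NULL instead of a divide-by-zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rating (Quality REAL, Schedule REAL, CostControl REAL)")
conn.executemany(
    "INSERT INTO Rating VALUES (?, ?, ?)",
    [
        (7, 8.5, 10),
        (None, 9, None),
        (None, None, None),  # hypothetical row: no ratings at all
    ],
)

averages = [
    row[0]
    for row in conn.execute(
        """
        SELECT (COALESCE(Quality, 0) + COALESCE(Schedule, 0) + COALESCE(CostControl, 0))
               / NULLIF(
                     CASE WHEN Quality IS NULL THEN 0 ELSE 1 END +
                     CASE WHEN Schedule IS NULL THEN 0 ELSE 1 END +
                     CASE WHEN CostControl IS NULL THEN 0 ELSE 1 END,
                     0) AS Average   -- NULLIF turns a zero divisor into NULL
        FROM Rating
        ORDER BY rowid
        """
    )
]
print(averages)
```

The columns are declared REAL here so the division is not an integer division; with integer columns in SQL Server the sum would need a cast or a multiplication by 1.0 first.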
I would use a correlated subquery over a VALUES list instead:
select *, (select sum(v) / count(v)
from ( values (quality), (Schedule), (CostControl)
) tt(v)
) as AVG
from dbo.Rating r;
I would use apply with avg():
SELECT r.Quality, r.Schedule, r.CostControl, v.average
FROM dbo.Rating r CROSS APPLY
(SELECT avg(val)
FROM (VALUES (quality), (schedule), (CostControl)) v(val)
) v(average);
This requires no subqueries, no long case expressions, generalizes easily to more columns, runs no risk of divide-by-zero . . . and the performance might even be equivalent to the case expression.

I need to find the missing Id's from the table after checking the min and max id's from another table

I need to find the missing id's from the table #a below:
id |SEQ|Text
1 |1 |AA
1 |3 |CC
1 |4 |DD
1 |5 |EE
1 |6 |FF
1 |7 |GG
1 |8 |HH
1 |10 |JJ
2 |1 |KK
2 |2 |LL
2 |3 |MM
2 |4 |NN
2 |6 |PP
2 |7 |QQ
3 |1 |TT
3 |4 |ZZ
3 |5 |XX
The max and min SEQ of the table #a is stored in another table #b:
id| mn| mx
1 | 1 | 12
2 | 1 | 9
3 | 1 | 5
My query below is giving the correct output but the execution is expensive. Is there another way to solve this?
with cte
as
(
select id, mn, mx
from #b
union all
select id, mn, mx -1
from cte
where mx-1 > 0
)
select
cte.id, cte.mx
from
cte
left join #a on cte.id = #a.id and cte.mx = #a.seq
where
#a.seq is null
order by cte.id, cte.mx
There are mainly 2 problems in this query:
1. The query is running very slow. The above records are just an example. In the real database I have 50,000 rows.
2. I tried to understand the execution plan to detect the hiccups. However, I could not understand some parts of it, which I have highlighted in red.
It would be great if someone could help me here. I am stuck.
You use a recursive CTE to generate a set of numbers. It is quite an inefficient way to do it (see charts for generating 50K numbers here). I'd recommend keeping a persisted table of numbers in the database. I, personally, have a table Numbers with 100K rows and a single column Number, which is the primary key and holds the integers from 1 to 100,000.
Once you have such table, your query is simplified to this:
SELECT
#b.id, Numbers.Number -- the missing SEQ value
FROM
#b
INNER JOIN Numbers ON
#b.mx >= Numbers.Number AND
#b.mn <= Numbers.Number -- use this clause if mn can be more than 1
LEFT JOIN #a ON
#a.id = #b.id AND
#a.seq = Numbers.Number
WHERE
#a.seq IS NULL
ORDER BY #b.id, Numbers.Number
Also, it goes without saying that you should make sure you have an index on #b on id, plus an index on #a on (id, seq).
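The numbers-table approach can be checked against the sample data with a small sketch on SQLite through Python's sqlite3 module. The tables a, b and numbers stand in for #a, #b and Numbers, the column txt stands in for Text, and the numbers table is filled only up to 100 for the demonstration; all of these names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, seq INTEGER, txt TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, mn INTEGER, mx INTEGER);
    CREATE TABLE numbers (number INTEGER PRIMARY KEY);
    CREATE INDEX ix_a_id_seq ON a (id, seq);
    """
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        (1, 1, "AA"), (1, 3, "CC"), (1, 4, "DD"), (1, 5, "EE"),
        (1, 6, "FF"), (1, 7, "GG"), (1, 8, "HH"), (1, 10, "JJ"),
        (2, 1, "KK"), (2, 2, "LL"), (2, 3, "MM"), (2, 4, "NN"),
        (2, 6, "PP"), (2, 7, "QQ"),
        (3, 1, "TT"), (3, 4, "ZZ"), (3, 5, "XX"),
    ],
)
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [(1, 1, 12), (2, 1, 9), (3, 1, 5)])
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 101)])

# Expand each id to its mn..mx range, then keep the numbers with no match in a.
missing = conn.execute(
    """
    SELECT b.id, n.number
    FROM b
    JOIN numbers n ON n.number BETWEEN b.mn AND b.mx
    LEFT JOIN a ON a.id = b.id AND a.seq = n.number
    WHERE a.seq IS NULL
    ORDER BY b.id, n.number
    """
).fetchall()
print(missing)
```

The query touches the numbers table through a simple range join instead of building the range row by row in a recursive CTE.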
Two things that come to my mind are:
1. Use a numbers / tally table, either as a normal table or as a virtual one built with a CTE, and use it to find the numbers that don't exist.
2. If there aren't a lot of missing numbers, you can use a trick with row_number() to find the ranges of numbers that have no gaps, with something like this:
select id, min(seq), max(seq)
from (
select
id,
seq,
seq - row_number () over (partition by id order by SEQ asc) GRP
from
table1
) X group by id, GRP
order by 1
This will of course need more handling once you have found the ranges of numbers that exist.
A CTE is just syntax and is most likely getting evaluated multiple times.
Materialize the CTE output into a #temp table and join to that table instead.