WITH RECURSIVE: is it possible to COUNT() the working table? - sql

I am using HSQLDB 2.6.1, and I want to COUNT() the working table of a recursive CTE.
I wrote the following test:
with recursive
nums (n, m) as
(
select 1, 1 from (values(1))
union all
select * from (
with
var (k) as
(
select count(*) from nums
)
select n+1, var.k from nums, var where n+1 <= 10
)
)
select n, m from nums;
Here is the result set:
N M
1 1
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
It seems like COUNT() does not work on the working table.
Was it not supposed to work?
And is there another way to count the working table?

This cannot be done directly in the current version (2.7.0) of HyperSQL.
The column n of your query is incremented in each round, therefore counting the identical n values in the result table gives the size of the delta table for each round.
with recursive nums (n, m) as (
...
) select count(*) from nums group by n;

Related

Snowflake table and generator functions does not give expected result

I tried to create a simple SQL to track query_history usage, but got into trouble when creating my timeslots using the table and generator functions (the CTE named x below).
I got no results at all when limiting the query_history using my timeslots, so after a while I hardcoded an SQL to give the same result (the CTE named y below) and this works fine.
Why does not x work? As far as I can see x and y produce identical result?
To test the example first run the code as it is, this produces no result.
Then comment the line x as timeslots and un-comment the line y as timeslots, this will give the desired result.
with
x as (
select
dateadd('min',seq4()*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(seq4()+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
),
y as (
select
dateadd('min',n*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(n+1)*10,dateadd('min',-60,current_timestamp())) t
from (select 0 n union all select 1 n union all select 2 union all select 3
union all select 4 union all select 5)
)
--select * from x;
--select * from y;
select distinct
user_name,
timeslots.f
from snowflake.account_usage.query_history,
x as timeslots
--y as timeslots
where start_time >= timeslots.f
and start_time < timeslots.t
order by timeslots.f desc;
(I know the code is not optimal, this is only meant to illustrate the problem)
SEQ:
Returns a sequence of monotonically increasing integers, with wrap-around. Wrap-around occurs after the largest representable integer of the integer width (1, 2, 4, or 8 byte).
If a fully ordered, gap-free sequence is required, consider using the ROW_NUMBER window function.
For:
with x as (
select
dateadd('min',seq4()*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(seq4()+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
)
SELECT * FROM x;
Should be:
with x as (
select
(ROW_NUMBER() OVER(ORDER BY seq4())) - 1 AS n,
dateadd('min',n*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(n+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
)
SELECT * FROM x;

Find missed max and min value in a sequence of numbers

For example, I have a sequence of numbers: {1, 2, 5, 7}.
I need to find the smallest and the biggest one, which are missed in this sequence (min=3 and max=6 for this example). Values can also be negative.
Here is my solution, but it doesn't pass on extra checking database (Wrong number of records (less by 1)), so I can't say what is exactly wrong. I also tried versions with LEFT OUTER JOIN and EXCEPT predicates - same problem. Please, help me to improve my solution.
WITH AA AS (SELECT MAX(Q_ID) MX
FROM UTQ),
BB AS (SELECT MIN(Q_ID) CODE
FROM UTQ
UNION ALL
SELECT CODE + 1
FROM BB
WHERE CODE < (SELECT MX
FROM AA)
)
SELECT MIN(CODE) MIN_RES, MAX(CODE) MAX_RES
FROM BB
WHERE CODE NOT IN (SELECT Q_ID
FROM UTQ)
One method is not exists:
select min(q_id + 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id + 1)
union all
select max(q_id - 1)
from utq
where not exists (select 1 from utq utq2 where utq2.q_id = utq.id - 1);
You can also use lead() and lag():
select min(case when next_q_id <> q_id + 1 then q_id + 1 end),
max(case when prev_q_id <> q_id - 1 then q_id - 1 end)
from (select utq.*,
lag(q_id) over (order by q_id) as prev_q_id,
lead(q_id) over (order by q_id) as next_q_id
from utq
) utq;
A tally based method seems like a good approach here. Especially if the sequences are large.
The first CTE summarizes the maximum and minimum q_id's in the test table. The second CTE selects the missing integers by generating the complete sequence (using the fnNumbers tvf) between the minimum and maximum q_id values and comparing WHERE NOT EXISTS to the original sequence. Something like this.
numbers function
create function [dbo].[fnNumbers](
#zero_or_one bit,
#n bigint)
returns table with schemabinding as return
with n(n) as (select null from (values (1),(2),(3),(4)) n(n))
select 0 n where #zero_or_one = 0
union all
select top(#n) row_number() over(order by (select null)) n
from n na, n nb, n nc, n nd, n ne, n nf, n ng, n nh,
n ni, n nj, n nk, n nl, n nm, n np, n nq, n nr;
data and query
drop table if exists #seq;
go
create table #seq(
q_id int unique not null);
insert #seq values (1),(2),(5),(7);
with
max_min_cte(max_q, min_q) as (
select max(q_id), min(q_id)
from #seq),
missing_cte(q_id) as (
select mm.min_q+fn.n
from max_min_cte mm
cross apply dbo.fnNumbers(0, mm.max_q-mm.min_q) fn
where not exists (select 1
from #seq s
where (mm.min_q+fn.n)=s.q_id))
select max(q_id) max_missing, min(q_id) min_missing
from missing_cte;
output
max_missing min_missing
6 3
You can try like following using LEAD
SELECT MIN(Q_ID + 1) AS MinValue
,MAX(Q_ID + 1) AS MaxValue
FROM (
SELECT *,LEAD(Q_ID) OVER (ORDER BY Q_ID) NQ_ID
FROM (VALUES (1),(2),(5),(7)) v(Q_ID)
) t
WHERE NQ_ID - Q_ID <> 1

Teradata SUBSTRING Index Out of Bounds

This query works:
SELECT
TOP 100 SUBSTRING(column_name FROM 6 FOR CHARACTER_LENGTH(column_name) - 5) AS X
FROM db_name.table_name
But the following query (with WHERE clause added) does not execute.
SELECT
TOP 100 SUBSTRING(column_name FROM 6 FOR CHARACTER_LENGTH(column_name) - 5) AS X
FROM db_name.table_name
WHERE NOT EXISTS
(
SELECT 1
FROM db_name2.lookup_name H
WHERE H.SRC_NUM1 = X
AND H.SRC_TYPE = 11
)
The query above throws
SELECT Failed. 2663: SUBSTR: string subscript out of bounds in table_name.column_name
However, this following one works (original SELECT is nested)
SELECT *
FROM (
SELECT
TOP 100 SUBSTRING(column_name FROM 6 FOR CHARACTER_LENGTH(column_name) - 5) AS X
FROM db_name.table_name
) A
WHERE NOT EXISTS
(
SELECT 1
FROM db_name2.lookup_name H
WHERE H.SRC_NUM1 = X
AND H.SRC_TYPE = 11
)
Why is that so? I am using SQL assistant to execute the queries but I doubt it is of relevance.
Try to change (maybe error is caused when column_name's lenght is less then 6):
SELECT
TOP 100 CASE WHEN CHARACTER_LENGTH(column_name)>5
THEN SUBSTRING(column_name FROM 6 FOR CHARACTER_LENGTH(column_name) - 5)
ELSE NULL END AS X
FROM db_name.table_name

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals then computes the new intervals during the iterate phase. However I can't see this approach being practical when working with large datasets. So if some bright mind out there have a query or some innate function that I'm not aware of I would greatly appreciate the help.
You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';
This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";

Max sum for the continous N rows

I've the following table (both A and B are integers):
Update 1 - Could anyone do me a favour and run the solution on a set of 1M records with B being a random decimal (to avoid overflows) residing in [0 to 1] range for N=> 10, 100 and 1000? I'd like to get a flavor of the time, required to run the solution query. Thanks a lot in advance.
Sample data:
A B
1 1
2 8
3 1
4 11
5 1
6 1
7 6
8 1
9 1
10 2
How do I get the maximum Sum of B values for any N sequential A's? The solution mustn't use cursors, usage of table vars/tem tables has to be stongly justified.
I can use SQLCLR in case if it'll give a distinct performance boost.
Some clarifications:
Max Sum for 1 element is 11 (see A = 4)
Max Sum for 2 elements is 12 (it's either A=> 1 & 2 or A=> 2 & 3),
Max Sum for 3 elements is 20 (A=>2, 3, 4),
Max Sum for 4 is 21 (A=>1,2,3,4 or A=>2,3,4,5) etc.
Since the A values are guaranteed to be consecutive integers, given N we know for any particular A which values we are interested in. So
SELECT
A,
(SELECT SUM(B) FROM Table T2 WHERE T.A <= T2.A AND T2.A <= T.A + N - 1)
AS SumOfBs
FROM Table T
WHERE A + N - 1 <= (SELECT COUNT(*) FROM Table)
gives, for each A, the sum of the B values for the N rows starting there. The WHERE restricts us to rows that do actually have N rows starting there. Put this in a subquery and we can get the maximum:
SELECT
MAX(SumOfBs) AS DesiredValue
FROM
(
SELECT
A,
(SELECT SUM(B) FROM Table T2 WHERE T.A <= T2.A AND T2.A <= T.A + N - 1)
AS SumOfBs
FROM Table T
WHERE A + N - 1 <= (SELECT COUNT(*) FROM Table)
) Intermediate
should do the job.
I've loaded your test data into a table called data.
The following SQL gives me the answer 20 for N=3:
declare #N int
set #N = 3
select max(SumB)
from data d
cross apply (select SumB = SUM(B) from data sub where sub.A between d.A - (#N-1) and d.A) x
Try:
with cte as
(select 1 window_count union all
select window_count+1 window_count from cte where window_count<#N)
select max(sum_B) from
(select T1.A,
sum(T2.B) sum_B
from MyTable T1
cross join cte
join MyTable T2 on T1.A = T2.A + cte.window_count - 1
group by T1.A) sq
I'm possibly not understanding the question fully, but it looks to me like...
SELECT SUM(B) FROM table WHERE A <= n
If not correct, can you explain a bit more?