How to lift or increase BigQuery's limit on recursive iterations - google-bigquery

It seems like Google BigQuery limits the number of iterations done for recursive queries:
A recursive CTE has reached the maximum number of iterations: 100
I cannot find any docs on how to lift or at least increase this limit. Is it possible?
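For reference, a minimal query that reproduces the error (any recursive CTE that iterates past 100 will do):
WITH RECURSIVE r AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM r WHERE n < 500
)
SELECT MAX(n) AS n FROM r
-- Error: A recursive CTE has reached the maximum number of iterations: 100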

Documentation says (emphasis mine):
If recursion does not terminate, the query fails after reaching 100 iterations, which can be customized at the project level.
However, I could not find the setting in the project.
It's not in the project's BigQuery Quota Policy or the BigQuery API quotas, and it's not mentioned in the quota documentation either. That documentation does say that one can request quota increases, albeit some of them have to be requested via Cloud Customer Care.
Maybe this is one such limit, or maybe we need to wait until GA... 🤷🏻‍♂

The workaround I came up with was to chain CTEs; that may or may not be possible depending on what you're doing.
Example:
DECLARE StartDate DATE DEFAULT '2022-03-15';
WITH RECURSIVE
  Dates1 AS (
    SELECT
      StartDate AS Day,
      1 AS RowNumber,
      1 AS Ranking
    UNION ALL
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      RowNumber + 1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates1
    WHERE Day < CURRENT_DATE()
      AND RowNumber < 100
  ),
  Dates2 AS (
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates1
    WHERE RowNumber = 100
      AND Day < CURRENT_DATE()
    UNION ALL
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      RowNumber + 1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates2
    WHERE Day < CURRENT_DATE()
      AND RowNumber < 100
  ),
  Dates AS (
    SELECT Day, Ranking FROM Dates1
    UNION ALL
    SELECT Day, Ranking FROM Dates2
  )
SELECT *
FROM Dates
ORDER BY 2 DESC

Related

Getting 3 best Posts per Month with three different queries

I am having a hard time wrapping my head around the row_number function.
This is my schema:
I am trying to build a query that would output the top value for Post_impressions within a date range (i.e. a month) when RowNumber is set to 1, the second-best value when it is set to 2, and so on.
Here is the query I came up with so far:
SELECT Post_timestamp,
       Post_impressions,
       Post_tipo
from
  (SELECT Post_timestamp,
          Post_impressions,
          Post_tipo,
          FORMAT_DATE("%Y-%m-%d", DATE_TRUNC(TIMESTAMP(Post_timestamp), DAY)) as TheDate,
          row_number() OVER
            (PARTITION BY FORMAT_DATE("%Y-%m-%d", DATE_TRUNC(TIMESTAMP(Post_timestamp), DAY)) ORDER BY Post_impressions DESC) AS RowNumber
   from `***DATABASENAME***`)
WHERE RowNumber = 1 AND TheDate BETWEEN "2021-07-01" AND "2021-07-31";
Thanks for your help!
You're getting 31 rows because you're partitioning the subquery by day, and each partition has a RowNumber = 1. You could partition your query by month, but I suspect that wouldn't address all your use cases, particularly when you want to look at a time period spanning multiple partitions.
Alternatively, if your use case is limited to month over month, you can simply partition by the month:
SELECT Post_timestamp,
       Post_impressions,
       Post_tipo
from
  (SELECT Post_timestamp,
          Post_impressions,
          Post_tipo,
          FORMAT_DATE("%Y-%m-%d", DATE_TRUNC(TIMESTAMP(Post_timestamp), day)) as TheDate,
          row_number() OVER
            (PARTITION BY FORMAT_DATE("%Y-%m-%d", DATE_TRUNC(TIMESTAMP(Post_timestamp), month)) ORDER BY Post_impressions DESC) AS RowNumber
   from `***DATABASENAME***`)
WHERE RowNumber = 1 AND TheDate BETWEEN "2021-07-01" AND "2021-07-31";
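With the month partition in place, changing the outer filter to RowNumber <= 3 returns the three best posts per month, which is what the question title asks for.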

Teradata Query for extracting data based on time interval (10 minutes)

Can someone help me with a query in Teradata/SQL that I can use to extract all users that have more than 3 transactions within a 10-minute window? Below is an extract of the table in question.
Kind regards,
You can use lag()/lead() and time comparisons. To get the rows that have 2 such transactions before them (i.e. at least 3 transactions within 10 minutes):
select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
If you only want the users:
select distinct userid
from (select t.*
from t
qualify transaction_timestamp < lag(transaction_timestamp, 2) over (partition by userid order by transaction_timestamp) + interval '10' minute
) t

Calculate Consecutive Concurrent Calls SQL Server

I just have basic SQL skills, and I'm hoping someone can help me out. I am using SQL Server, trying to come up with a query to calculate concurrent calls happening at the same time per day. My company only has a license for 300 concurrent calls, and we're trying to find the maximum point we reach per day. Basically, if 3 people are on a call at 9:00 am and all 3 calls end at 9:15 am, the count would be 3. If another call starts at 9:05 am and ends at 9:20 am, the count is now 4, but at 9:16 am the count would only be 1.
I have a table (conferencecall2) with the following columns:
CallID, UniqueCallID, Jointime, Leavetime
We get about 5,000-6,000 calls per day.
Below is a sample of the data.
The key here is to have (or generate) a table with one row for each time period. Then it's a simple APPLY or scalar subquery:
select t.minute, c.calls
from time_table_with_one_row_per_minute t
cross apply
(
select count(*) calls
from calls c
where t.Minute >= c.JoinTime
and t.Minute <= c.LeaveTime
) c
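If you don't already have such a table, here is a minimal sketch that generates one row per minute for a single day with a recursive CTE (the date is a placeholder; conferencecall2 and its columns follow the question):
with minutes as (
    -- one row per minute for a single day
    select cast('2022-01-03 00:00' as datetime) as ts
    union all
    select dateadd(minute, 1, ts)
    from minutes
    where ts < '2022-01-03 23:59'
)
select m.ts as [minute], c.calls
from minutes m
cross apply (
    select count(*) as calls
    from conferencecall2 c
    where m.ts >= c.Jointime
      and m.ts <= c.Leavetime
) c
option (maxrecursion 1440); -- a day of minutes exceeds the default limit of 100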
You can do this by unpivoting the columns, then using window functions:
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
This solution scans the table only once, so I would expect it to perform efficiently over a large dataset.
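Using the numbers from the question as a worked check: the three 9:00-9:15 calls contribute (9:00 am, +3) and (9:15 am, -3), and the 9:05-9:20 call contributes (9:05 am, +1) and (9:20 am, -1). The running sum is 3 at 9:00 am, 4 at 9:05 am, 1 at 9:15 am, and 0 at 9:20 am, which matches the counts the question describes.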
Edit: you can get the peak of concurrent calls per day with another level of subquery:
select convert(date, call_time) as call_day, max(cnt) as peak_cnt
from (
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
) c
group by convert(date, call_time)
Edit 2
If you want to filter, then you need to do that in the outer query:
select convert(date, call_time) as call_day, max(cnt) as peak_cnt
from (
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
) c
where call_time >= @starttime and call_time < @endtime
group by convert(date, call_time)

Window functions and calculating averages with tricky data manipulation

I have a SQL Server programming challenge involving some manipulations of healthcare patient pulse readings.
The goal is to do an average of readings within a certain time period and to only include the latest pulse reading of the day.
As an example, times are appt_time:
PATIENT 1                     PATIENT 2
1/1/2019          80          1/3/2019  90
1/2/2019 10 am    78          1/4/2019  85
1/2/2019 1 pm     85
1/3/2019          90
A patient may or may not have a second reading in a day. Only the latest 3 chronological readings are used for the average. If fewer than 3 readings are available, the average is computed over 2 readings, or the single reading is used as the average.
Can this be done with SQL window functions? That would be a little more efficient than using a subquery.
I have used FIRST_VALUE ... DESC statements successfully to pick the last pulse in a day. I then tried various ROW_NUMBER approaches to exclude the marked-off row (the first pulse of the day when 2 readings are present). I cannot seem to correctly calculate the average. I have used ROW_NUMBER in SELECT and FROM clauses.
with CTEBPI3
AS (
    SELECT pat_id
        ,appt_time
        ,bp_pulse
        ,FIRST_VALUE(bp_pulse) OVER (PARTITION BY appt_time ORDER BY appt_time DESC) fv
        ,ROW_NUMBER() OVER (PARTITION BY appt_time ORDER BY appt_time DESC) RN1
        ,ROUND(SUM(bp_pulse) OVER (PARTITION BY pat_id) / COUNT(appt_time) OVER (PARTITION BY pat_id), 0) AS adJAVGSYS3
    FROM pat_enc
    WHERE appt_time > '07/15/2018'
)
select *,
WHEN rn=1
Average for pat1 should be 85
Average for pat2 should be 87.5
You can do this with two window functions:
MAX(appt_time) OVER ... to get the latest time per day
DENSE_RANK() OVER ... to get the last three days
You get the date part from your datetime with CONVERT(DATE, appt_time). The average function AVG is already built in :-)
The complete query:
select pat_id, avg(bp_pulse) as average_pulse
from
(
select
pat_id, appt_time, bp_pulse,
max(appt_time) over (partition by pat_id, convert(date, appt_time)) as max_time,
dense_rank() over (partition by pat_id order by convert(date, appt_time) desc) as rn
from pat_enc
) evaluated
where appt_time = max_time -- last row per day
and rn <= 3 -- last three days
group by pat_id
order by pat_id;
If the column bp_pulse is defined as an integer, you must convert it to a decimal to avoid integer arithmetic:
select pat_id, avg(convert(decimal, bp_pulse)) as average_pulse
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3df744fcf2af89cdfd8b3cd8b6546d89
Actually, window functions are not necessarily more efficient. It is worth comparing:
select p.pat_id, avg(p.bp_pulse)
from pat_enc p
where -- appt_time > '2018-07-15' and -- don't know if this is necessary
p.appt_time >= (select distinct convert(date, appt_time)
from pat_enc p2
where p2.pat_id = p.pat_id
order by convert(date, appt_time) desc
offset 2 rows fetch first 1 row only
) and
p.appt_time = (select max(p2.appt_time)
from pat_enc p2
where p2.pat_id = p.pat_id and
convert(date, p2.appt_time) = convert(date, p.appt_time)
);
This wants an index on pat_enc(pat_id, appt_time).
In fact, there are a variety of ways to write this logic, with different mixes of subqueries and window functions (this is one extreme).
Which performs the best will depend on the nature of your data. In particular:
The number of appointments on the same day -- is this normally 1 or a large number?
The overall number of days with appointments -- is this right around three or are there hundreds?
You need to test on your data, but I think window function will work best when relatively few rows are filtered out (~1 appointment/day, ~3 days with appointments). Subqueries will be helpful when more rows are being filtered.

Postgres windowing (determine contiguous days)

Using Postgres 9.3, I'm trying to count the number of contiguous days of a certain weather type. If we assume we have a regular time series and weather report:
date|weather
"2016-02-01";"Sunny"
"2016-02-02";"Cloudy"
"2016-02-03";"Snow"
"2016-02-04";"Snow"
"2016-02-05";"Cloudy"
"2016-02-06";"Sunny"
"2016-02-07";"Sunny"
"2016-02-08";"Sunny"
"2016-02-09";"Snow"
"2016-02-10";"Snow"
I want something to count the contiguous days of the same weather. The results should look something like this:
date|weather|contiguous_days
"2016-02-01";"Sunny";1
"2016-02-02";"Cloudy";1
"2016-02-03";"Snow";1
"2016-02-04";"Snow";2
"2016-02-05";"Cloudy";1
"2016-02-06";"Sunny";1
"2016-02-07";"Sunny";2
"2016-02-08";"Sunny";3
"2016-02-09";"Snow";1
"2016-02-10";"Snow";2
I've been banging my head on this for a while trying to use window functions. At first it seemed like it should be a no-brainer, but then I found out it's much harder than expected.
Here is what I've tried...
Select date, weather, Row_Number() Over (partition by weather order by date)
from t_weather
Would it just be easier to compare the current row to the next? How would you do that while maintaining a count? Any thoughts, ideas, or even solutions would be helpful!
-Kip
You need to identify the contiguous groups where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: subtract a sequence of increasing numbers from the dates; the result is constant for contiguous dates.
Once you have the grouping, the rest is row_number():
Select date, weather,
       Row_Number() Over (partition by weather, grp order by date) as contiguous_days
from (select w.*,
             (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
      from t_weather w
     ) w;
The SQL Fiddle is here.
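To see why the subtraction works, here is the arithmetic for the Sunny rows of the sample data (a worked illustration):
date        row_number   grp = date - row_number * interval '1 day'
2016-02-01  1            2016-01-31
2016-02-06  2            2016-02-04
2016-02-07  3            2016-02-04
2016-02-08  4            2016-02-04
The three consecutive Sunny days collapse to the same grp value, while the isolated 2016-02-01 gets its own group, so partitioning by (weather, grp) numbers each run independently.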
I'm not sure what the query engine is going to do when scanning multiple times across the same data set (kinda like calculating area under a curve), but this works...
WITH v(date, weather) AS (
VALUES
('2016-02-01'::date,'Sunny'::text),
('2016-02-02','Cloudy'),
('2016-02-03','Snow'),
('2016-02-04','Snow'),
('2016-02-05','Cloudy'),
('2016-02-06','Sunny'),
('2016-02-07','Sunny'),
('2016-02-08','Sunny'),
('2016-02-09','Snow'),
('2016-02-10','Snow') ),
changes AS (
SELECT date,
weather,
CASE WHEN lag(weather) OVER (ORDER BY date) = weather THEN 1 ELSE 0 END change
FROM v)
SELECT date
, weather
,(SELECT count(weather) -- number of times the weather didn't change
FROM changes v2
WHERE v2.date <= v1.date AND v2.weather = v1.weather
AND v2.date >= ( -- bounded between changes of weather
SELECT max(date)
FROM changes v3
WHERE change = 0
AND v3.weather = v1.weather
AND v3.date <= v1.date) --<-- here's the expensive part
) curve
FROM changes v1
Here is another approach, based on this answer.
First we add a change column that is 1 or 0 depending on whether the weather differs from the previous day's.
Then we introduce a group_nr column by summing change over an order by date. This produces a unique group number for each sequence of consecutive same-weather days, since the sum is only incremented on the first day of each sequence.
Finally we do a row_number() over (partition by group_nr order by date) to produce the running count per group.
select date, weather, row_number() over (partition by group_nr order by date) as contiguous_days
from (
select *, sum(change) over (order by date) as group_nr
from (
select *, (weather != lag(weather,1,'') over (order by date))::int as change
from tmp_weather
) t1
) t2;
sqlfiddle (uses equivalent WITH syntax)
You can accomplish this with a recursive CTE as follows:
WITH RECURSIVE CTE_ConsecutiveDays AS
(
SELECT
my_date,
weather,
1 AS consecutive_days
FROM My_Table T
WHERE
NOT EXISTS (SELECT * FROM My_Table T2 WHERE T2.my_date = T.my_date - INTERVAL '1 day' AND T2.weather = T.weather)
UNION ALL
SELECT
T.my_date,
T.weather,
CD.consecutive_days + 1
FROM
CTE_ConsecutiveDays CD
INNER JOIN My_Table T ON
T.my_date = CD.my_date + INTERVAL '1 day' AND
T.weather = CD.weather
)
SELECT *
FROM CTE_ConsecutiveDays
ORDER BY my_date;
Here's the SQL Fiddle to test: http://www.sqlfiddle.com/#!15/383e5/3