LIMIT within ARRAY_AGG in BigQuery - google-bigquery

When I put a LIMIT clause in an ARRAY_AGG, I still get many items in the array. The docs suggest that this should work.
Am I doing something wrong?
SELECT
x,
ARRAY_AGG((
SELECT
AS STRUCT y
LIMIT
1)) y
FROM
`a`,
UNNEST(b) b
WHERE
x = 'abc'
GROUP BY
1
LIMIT 1
...produces a result with one row of STRING an ARRAY with 50 items, when I would have expected only 1 item.

The issue was the placement of the LIMIT clause. It was in scope for the SELECT statement, rather than the ARRAY_AGG function. This corrected it:
SELECT
x,
ARRAY_AGG((
SELECT
AS STRUCT y
) LIMIT 1) y
FROM
`a`,
UNNEST(b) b
WHERE
x = 'abc'
GROUP BY
1
LIMIT 1

Related

Snowflake table and generator functions does not give expected result

I tried to create a simple SQL to track query_history usage, but got into trouble when creating my timeslots using the table and generator functions (the CTE named x below).
I got no results at all when limiting the query_history using my timeslots, so after a while I hardcoded an SQL to give the same result (the CTE named y below) and this works fine.
Why does not x work? As far as I can see x and y produce identical result?
To test the example first run the code as it is, this produces no result.
Then comment the line x as timeslots and un-comment the line y as timeslots, this will give the desired result.
with
x as (
select
dateadd('min',seq4()*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(seq4()+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
),
y as (
select
dateadd('min',n*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(n+1)*10,dateadd('min',-60,current_timestamp())) t
from (select 0 n union all select 1 n union all select 2 union all select 3
union all select 4 union all select 5)
)
--select * from x;
--select * from y;
select distinct
user_name,
timeslots.f
from snowflake.account_usage.query_history,
x as timeslots
--y as timeslots
where start_time >= timeslots.f
and start_time < timeslots.t
order by timeslots.f desc;
(I know the code is not optimal, this is only meant to illustrate the problem)
SEQ:
Returns a sequence of monotonically increasing integers, with wrap-around. Wrap-around occurs after the largest representable integer of the integer width (1, 2, 4, or 8 byte).
If a fully ordered, gap-free sequence is required, consider using the ROW_NUMBER window function.
For:
with x as (
select
dateadd('min',seq4()*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(seq4()+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
)
SELECT * FROM x;
Should be:
with x as (
select
(ROW_NUMBER() OVER(ORDER BY seq4())) - 1 AS n,
dateadd('min',n*10,dateadd('min',-60,current_timestamp())) f,
dateadd('min',(n+1)*10,dateadd('min',-60,current_timestamp())) t
from table(generator(rowcount => 6))
)
SELECT * FROM x;

Perform loop and calculation on BigQuery Array type

My original data, B is an array of INT64:
And I want to calculate the difference between B[n+1] - B[n], hence result in a new table as follow:
I figured out I can somehow achieve this by using LOOP and IF condition:
DECLARE x INT64 DEFAULT 0;
LOOP
SET x = x + 1
IF(x < array_length(table.B))
THEN INSERT INTO newTable (SELECT A, B[OFFSET(x+1)] - B[OFFSET(x)]) from table
END IF;
END LOOP;
The problem is that the above idea doesn't work on each row of my data, cause I still need to loop through each row in my data table, but I can't find a way to integrate my scripting part into a normal query, where I can
SELECT A, [calculation script] from table
Can someone point me how can I do it? Or any better way to solve this problem?
Thank you.
Below actually works - BigQuery
select * replace(
array(select diff from (
select offset, lead(el) over(order by offset) - el as diff
from unnest(B) el with offset
) where not diff is null
order by offset
) as B
)
from `project.dataset.table` t
if to apply to sample data in your question - output is
You can use unnest() with offset for this purpose:
select id, a,
array_agg(b_el - prev_b_el order by n) as b_diffs
from (select t.*, b_el, lag(b_el) over (partition by t.id order by n) as prev_b_el
from t cross join
unnest(b) b_el with offset n
) t
where prev_b_el is not null
group by t.id, t.a

PostgreSQL use case when result in where clause

I use complex CASE WHEN for selecting values. I would like to use this result in WHERE clause, but Postgres says column 'd' does not exists.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100, OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100, OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting a 100 rows as result. Probably (I am not sure) I could use LIMIT and OFFSET outside select case statement (where WHERE statement is), but I think (I am not sure why) this would be a performance hit.
Case returns array or null. What is the best/fastest way to exclude some rows if result of case statement is null? I need 100 rows (or less if not exists - of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your limit is returning less than 100 rows if those 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without including your limiting logic (your case statements) re-written to work inside the where clause.
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.

Multiple assignments in SQL - correct syntax

Multiple assignments in SQL. The one variable form works and I have extended to use two variables: A and B. But I cannot see how I would extend the format. The following is what I am trying.
SELECT TOP 1 #MYA = A; #MYB = CB FROM WHERE X='MYX' ORDER BY X Total select #MYA #MYB
Here is my original statement:
SELECT TOP 1 #MYA = A FROM WHERE X='MYX' ORDER BY X Total select #MYA
How can I return two variables or more?
Use commas:
SELECT TOP 1 #MYA = A, #MYB = CB
FROM
WHERE X='MYX'
ORDER BY X, Total;
select #MYA, #MYB;
The ; goes at the end of the statement.

MYSQL query to get 'n' rows nearby given row

I have a MySQL table by name 'videos', where one of the column is 'cat' (INT) and 'id' is the PRIMARY KEY.
So, if 'x' is the row number,and 'n' is the category id, I need to get nearby 15 rows
Case 1: There are many rows in the category before and after 'x'.. Just get 7 each rows before and after 'x'
SELECT * FROM videos WHERE cat=n AND id<x ORDER BY id DESC LIMIT 0,7
SELECT * FROM videos WHERE cat=n AND id>x LIMIT 0,7
Case 2: If 'x' is in the beginning/end of the the table -> Print all (suppose 'y' rows) the rows before/after 'x' and later print 15-y rows after/before 'x'
Case 1 is not a problem but I am stuck with Case 2. Is there any generic method to get 'p' rows nearby a row 'x' ??
This query will always position N (exact id match) at the centre of the data, unless there are no more rows (in either direction), in which case rows will be added from the prior/next sections as required, while still preserving data from prior/next (as much as available).
set #n := 28;
SELECT * FROM
(
SELECT * FROM
(
(SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = #n)
union all
(SELECT v.*, #rn1:=#rn1+1 FROM (select #rn1:=0) x, videos v WHERE cat=1 AND id < #n ORDER BY id DESC LIMIT 15)
union all
(SELECT v.*, #rn2:=#rn2+1 FROM (select #rn2:=0) y, videos v WHERE cat=1 AND id > #n ORDER BY id LIMIT 15)
) z
ORDER BY prox
LIMIT 15
) w
order by id
For example, if you had 30 ids for cat=1, and you were looking at item #28, it will show items 16 through 30, #28 is the 3rd row from the bottom.
Some explanation:
SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = #n
v.* means to select all columns in the table/alias v. In this case, v is the alias for the table videos.
0 as prox means to create a column named prox, and it will contain just the value 0
The next query:
SELECT v.*, #rn1:=#rn1+1 FROM (select #rn1:=0) x, videos v WHERE cat=1 AND id < #n ORDER BY id DESC LIMIT 15
v.* - as above
#rn1:=#rn1+1 uses a variable to return a sequence number for each record in this subquery. It starts with 1 and for each record, following the ORDER BY id DESC, it will be numbered 2, then 3 etc.
(select #rn1:=0) x This creates a subquery aliased as x, all it does is ensures the variable #rn1 starts with the value 1 for the first row.
The end result is that the variable and 0 as prox ranks each row based on how close it is to the value #n. The clause order by prox limit 15 takes the 15 that are closest to N.