How can I use a variable on BigQuery SQL Editor? - google-bigquery

While writing SQL queries on BigQuery UI, sometimes I do a lot of JOIN over multiple tables using many WHERE clauses with the same date condition for each table.
Whenever I need to see the results for a different date, I have to replace it at multiple locations in the query. I wonder if there is a simple way to use a variable in the BQ SQL Editor and pass it just once (top/bottom)?
This is true for all the complicated queries as we have to search throughout the query for variables to change.

Parameterized queries are not available in the Console, but you can use scripting instead.
According to your need, you can use DECLARE and/or SET. It is stated in the documentation that:
DECLARE: Declares a variable of the specified type. If the DEFAULT clause is specified, the variable is initialised with the
value of the expression; if no DEFAULT clause is present, the variable
is initialised with the value NULL
The syntax is as follows:
#Declaring the variable's type and initialising the variable using DEFAULT
DECLARE variable STRING DEFAULT 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
SET: Sets a variable to have the value of the provided expression, or sets multiple variables at the same time based on the
result of multiple expressions.
And the syntax, as below:
#Declaring the variable using Declare and its type
DECLARE variable STRING;
#Initialising the variable
SET variable = 'San Francisco';
SELECT * FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_regions`
#using the variable inside the where clause
WHERE name = variable
In both examples above, I queried the public dataset bigquery-public-data.san_francisco_bikeshare.bikeshare_regions. Both produce the same output:
Row region_id name
1 3 San Francisco
In addition to the above examples, and more specifically for your case, you can declare a DATE variable as follows:
DECLARE data1 DATE DEFAULT DATE(2019,02,15);
WITH sample AS(
SELECT DATE(2019,02,15) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,16) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,17) AS date_s, "Some other field!" AS string UNION ALL
SELECT DATE(2019,02,18) AS date_s, "Some other field!" AS string
)
SELECT * FROM sample
WHERE date_s = data1
And the output,
Row date_s string1
1 2019-02-15 Some other field!
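Outside the Console, where query parameters are available, the same "set the date once" idea can be sketched with a named parameter. The snippet below is only an illustration in Python, using the standard-library sqlite3 module as a stand-in for BigQuery; the tables `orders` and `shipments`, their columns, and the parameter name `target_date` are made up for the example.

```python
import sqlite3

# In-memory database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT);
    CREATE TABLE shipments (order_id INTEGER, ship_date TEXT);
    INSERT INTO orders VALUES (1, '2019-02-15'), (2, '2019-02-16');
    INSERT INTO shipments VALUES (1, '2019-02-15'), (2, '2019-02-15');
""")

# The date appears in two WHERE conditions but is supplied only once.
query = """
    SELECT o.order_id
    FROM orders AS o
    JOIN shipments AS s ON s.order_id = o.order_id
    WHERE o.order_date = :target_date
      AND s.ship_date = :target_date
"""
rows = conn.execute(query, {"target_date": "2019-02-15"}).fetchall()
print(rows)
```

Changing the date then means changing one dictionary value rather than every condition in the query text.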

Related

How to use CASE statement with a declared array

Within BigQuery I want to declare a list of 5-digit zipcodes and then refer to the list throughout different elements of the rest of the code.
I've tried to do something like:
DECLARE monday ARRAY<int>
('98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110');
SELECT
CASE WHEN SUBSTR(o.shipping_address.zip, 0, 5) IN --ARRAY NAMED MONDAY-- THEN 'Monday' END
But can't really seem to get it to work correctly. As it stands now my code works with declaring the same list over and over but I know there's gotta be a way to declare once and then re-use the declaration wherever.
Thanks!
I think you want unnest():
case when SUBSTR(o.shipping_address.zip, 1, 5) in (select mon from unnest(monday) mon)
Note that substr() starts with 1.
Of course, if you want your code to work, you should use proper typing:
declare monday ARRAY<string>;
set monday = array['98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110'];
select *
from unnest(monday) mon;
Before providing a query, I have some remarks about your code. First, you declared an ARRAY<int> but passed a tuple of strings. Second, I assumed you want an ARRAY of STRINGs, because you use the built-in SUBSTR() function, which takes a string. Lastly, it is my understanding that you will have a static array with the zip codes for Monday, which is then used to check whether each zip code in your table belongs to Monday.
Having said that, I was able to create a query using a JavaScript UDF in BigQuery that does what you are aiming for. Below is the query in Standard SQL.
DECLARE monday ARRAY<STRING> DEFAULT ['98198', '98003', '98023', '98498',
'98499', '98402', '98403', '98405',
'98406', '98407', '98409', '98421',
'98422', '98465', '98466', '98467',
'98070', '98418', '98103', '98107',
'98117', '98110'];
CREATE TEMP FUNCTION check( monday_code ARRAY<STRING> ,x STRING)
RETURNS BOOL
LANGUAGE js AS """
return monday_code.includes(x);
""";
#Sample data within WITH
WITH post_code AS (
SELECT '123456229' as zip_code UNION ALL
SELECT '123456789' as zip_code UNION ALL
SELECT '98198' as zip_code UNION ALL
SELECT '98003' as zip_code UNION ALL
SELECT '98023' as zip_code
)
SELECT substr(zip_code,1,5) as zip_code,
CASE WHEN check(monday,SUBSTR(zip_code, 1,5)) THEN 'MONDAY'
ELSE 'Not identified'
END AS DAY
from post_code
And the output,
Row zip_code DAY
1 12345 Not identified
2 12345 Not identified
3 98198 MONDAY
4 98003 MONDAY
5 98023 MONDAY
Notice that I used DECLARE with a DEFAULT array. The JavaScript UDF receives the monday array and the current row's value, and checks whether the value is in the array using the built-in JS method includes(), which returns true if the zip code is in the array and false otherwise. Finally, in the last SELECT statement, CASE WHEN yields 'MONDAY' when the UDF returns true, meaning that the array contains the passed value.
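The "declare the list once, reuse it everywhere" pattern can also be sketched outside BigQuery. The following is a hypothetical Python illustration using sqlite3, where a temporary table `monday_zips` plays the role of the declared array; the table names, the shortened zip list, and the sample rows are assumptions for the example.

```python
import sqlite3

# Shortened version of the Monday list, defined in exactly one place.
MONDAY_ZIPS = ["98198", "98003", "98023"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE monday_zips (zip TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO monday_zips VALUES (?)", [(z,) for z in MONDAY_ZIPS])

# Sample data standing in for the real table.
conn.execute("CREATE TABLE post_code (zip_code TEXT)")
conn.executemany(
    "INSERT INTO post_code VALUES (?)",
    [("123456229",), ("98198",), ("98003-1234",)],
)

# Any number of queries can now refer to monday_zips by name.
rows = conn.execute("""
    SELECT substr(zip_code, 1, 5) AS zip5,
           CASE WHEN substr(zip_code, 1, 5) IN (SELECT zip FROM monday_zips)
                THEN 'MONDAY'
                ELSE 'Not identified'
           END AS day
    FROM post_code
    ORDER BY rowid
""").fetchall()
print(rows)
```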

Column Name = %Column Name% SQL

We have a class that replaces SQL parameters with their actual HashMap values. For example, select * from x where date = %processingDate% will substitute the value of processingDate and then retrieve the corresponding records. However, it seems to not do the replacement when the parameter name is the same name as the column; for example, select * from x where date = %date% does not substitute date and then retrieves all the records because it's acting like an always true boolean. Is this expected SQL behavior?
Thanks for the help.
Given that date is a reserved keyword in T-SQL, I would recommend wrapping the column name in [] to qualify it.
Link to the list of reserved keywords:
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/reserved-keywords-transact-sql
You mentioned a SQL parameter in your question. I would expect your parameter to look like @date if it is a SQL parameter. If it does not have the @ prefix, your query may be evaluating the column and not the parameter that you expect.
I believe something like this would work for you:
--Assuming @date is being evaluated as a string
SELECT *
FROM x
WHERE [date] LIKE '%' + @date + '%'
--Evaluates the specific value (Research differences between like and = operator)
SELECT *
FROM x
WHERE [date] = @date
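The substitution class from the question is not shown, so the following is only a hypothetical sketch of one way to avoid textual replacement altogether: rewrite each `%name%` placeholder as a bind parameter and let the driver supply the value. It uses Python's sqlite3 as a stand-in database; the helper `to_named_params` and the sample table are invented for illustration.

```python
import re
import sqlite3

# Matches placeholders of the form %name% (letters, digits, underscore).
PLACEHOLDER = re.compile(r"%(\w+)%")

def to_named_params(sql: str) -> str:
    """Rewrite %name% placeholders as :name bind parameters."""
    return PLACEHOLDER.sub(r":\1", sql)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE x (id INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-01-02")],
)

# The parameter has the same name as the column, which is not a problem
# once the value is bound instead of pasted into the SQL text.
sql = to_named_params('SELECT id FROM x WHERE "date" = %date%')
rows = conn.execute(sql, {"date": "2020-01-02"}).fetchall()
print(sql)
print(rows)
```

Note that this simple pattern would also rewrite a LIKE literal such as '%abc%', so a real implementation would need a stricter placeholder syntax.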

Why does the IN () clause in SQL Server not work this way?

Let's see if someone can help me with this question.
This is my SQL query, which loads a temporary table that I want to query later. Up to this point everything works:
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+' ,' , '') + sCodProducto
FROM dbo.Productos WHERE sCodProducto IN (80063, 80061, 80067, 80062, 80065)
INSERT INTO #IDPROD2(CODIGO)
SELECT @listStr
If I run this SELECT, it shows me the following data:
SELECT * FROM #IDPROD2
But if I query like this, it returns nothing:
SELECT * FROM dbo.Productos P WHERE P.sCodProducto IN (SELECT CODIGO FROM #IDPROD2)
whereas it does work this way:
SELECT * FROM dbo.Productos P WHERE P.sCodProducto IN (80061 ,80062 ,80063 ,80065 ,80067)
A field in a query result is considered a single VALUE. The actual contents of that field are irrelevant. Doesn't matter if you have a numbers in CSV format, or one single number - that entire chunk of data is one single VALUE, as far as the DB is concerned.
Since it's a single value, your codigo field's contents are parsed/executed as:
... WHERE foo IN (@codigo)
... WHERE foo IN ('1,2,3,4,...');
... WHERE foo = '1,2,3,4,....';
The DB will NOT parse those values, and therefore will NOT treat the string as multiple distinct values.
If you want the contents of a single field or variable to be treated as multiple distinct values, you have to use dynamic SQL:
DECLARE @sql NVARCHAR(MAX);
SET @sql = 'SELECT .... WHERE foo IN (' + @codigo + ')';
EXEC(@sql);
Note that this is basically a form of SQL injection. You remove the "context" of being a single value from that variable field, and force the DB to treat it as multiple different values.
Some DBs get around this by providing extract functions, e.g. MySQL's find_in_set, which is designed specifically for this (the list must not contain spaces after the commas, because they would become part of the values):
SELECT ... WHERE FIND_IN_SET(foo, '80063,80061,80067,80062,80065');
There is no such function in T-SQL, but it can be simulated, even with a simple LIKE query:
... WHERE foo='80063' OR foo LIKE '80063,%' OR foo LIKE '%,80063,%' OR foo LIKE '%,80063'
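A safer alternative to concatenating the list into dynamic SQL is to split it in the calling code and bind each value separately. The sketch below illustrates this in Python with sqlite3 standing in for SQL Server; the table contents and variable names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE productos (sCodProducto TEXT)")
conn.executemany(
    "INSERT INTO productos VALUES (?)",
    [("80061",), ("80063",), ("90000",)],
)

csv_codes = "80063 ,80061 ,80067"  # same shape as the concatenated list

# Bound as ONE value: the whole string is compared, so nothing matches.
single = conn.execute(
    "SELECT sCodProducto FROM productos WHERE sCodProducto IN (?)",
    (csv_codes,),
).fetchall()

# Split into distinct values and generate one placeholder per value.
codes = [c.strip() for c in csv_codes.split(",")]
placeholders = ", ".join("?" for _ in codes)
multi = conn.execute(
    f"SELECT sCodProducto FROM productos "
    f"WHERE sCodProducto IN ({placeholders}) ORDER BY sCodProducto",
    codes,
).fetchall()

print(single)
print(multi)
```

Only the placeholder string is interpolated into the SQL text; the values themselves are always bound, so they cannot change the structure of the query.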

How to replace where clause dynamically in query (BIRT)?

In my report query I have a where clause that needs to be replaced dynamically based on the data chosen in the front end.
The query is something like :
where ?=?
I already have a code to replace the value - I created report parameter and linked to the value ? in the query.
Example:
where name=?
Any value of name that comes from front end replaces the ? in the where clause - this works fine.
But now I need to replace the entire clause (where ?=?). Should I create two parameters and link them to both the '?' ?
No. Unfortunately, most database engines do not allow a query parameter to stand in for a dynamic column name. This is for security reasons.
So you need to keep an arbitrary column name in the query:
where name=?
And then in "beforeOpen" script of the dataset replace 'name' with a report parameter value:
this.queryText=this.queryText.replace("name",params["myparameter"].value);
To prevent SQL injection attacks (SQLIA), I recommend validating the parameter value in this script. There are many ways to do this, but a whitelist is the strongest test, for example:
var column=params["myparameter"].value;
if (column=="name" || column=="id" || column=="account" || column=="mycolumnname"){
this.queryText=this.queryText.replace("name",column);
}
In addition to Dominique's answer and your comment, then you'll just need a slightly more advanced logic.
For example, you could name your dynamic column-name-value pairs (column1, value1), (column2, value2) and so on. In the static text of the query, make sure to have bind variables for value1, value2 and so on, for example, with Oracle SQL, using this syntax:
with params as (
select :value1 as value1,
:value2 as value2 ...
from dual
)
select ...
from params, my_table
where 1=1
and ... static conditions....
Then, in the beforeOpen script, append conditions to the query text in a loop as needed (the loop left as an exercise to the reader, and don't forget checking the column names for security reasons!):
this.queryText += " and " + column_name[i] + "= params.value" + i;
This way you can still use bind variables for the comparison values.
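The loop with the whitelist check could look roughly like the sketch below. It is written in Python rather than BIRT's JavaScript, purely to illustrate the logic; `ALLOWED_COLUMNS`, `build_query`, and the sample table name are hypothetical and not part of BIRT.

```python
# Hypothetical whitelist of column names that may appear in the WHERE clause.
ALLOWED_COLUMNS = {"name", "id", "account"}

def build_query(base_sql, filters):
    """Append one 'column = ?' condition per filter.

    Column names are checked against the whitelist and interpolated;
    comparison values are never interpolated, only returned for binding.
    """
    conditions = []
    values = []
    for column, value in filters:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column!r}")
        conditions.append(f" AND {column} = ?")
        values.append(value)
    return base_sql + "".join(conditions), values

sql, values = build_query(
    "SELECT * FROM my_table WHERE 1=1",
    [("name", "Alice"), ("id", 7)],
)
print(sql)
print(values)
```

The static `WHERE 1=1` prefix is what lets every dynamic condition start with `AND` without special-casing the first one.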

Convert numeric to string inside a user-defined function

I am trying to call/convert a numeric variable into string inside a user-defined function. I was thinking about using to_char, but it didn't pass.
My function is like this:
create or replace function ntile_loop(x numeric)
returns setof numeric as
$$
select
max("billed") as _____(to_char($1,'99')||"%"???) from
(select "billed", "id","cm",ntile(100)
over (partition by "id","cm" order by "billed")
as "percentile" from "table_all") where "percentile"=$1
group by "id","cm","percentile";
$$
language sql;
My purpose is to define a new variable "x%" as its name, with x varying as the function input. In context, x is numeric and will be called again later in the function as a numeric (this part of code wasn't included in the sample above).
What I want to return:
I simply want to return a block of code so that every time I change the percentile number, I don't have to run this block of code again and again. I'd like to calculate 5, 10, 20, 30, ....90th percentile and display all of them in the same table for each id+cm group.
That's why I was thinking about macro or function, but didn't find any solutions I like.
Thank you for your answers. Yes, I will definitely read basics while I am learning. Today's my second day to use SQL, but have to generate some results immediately.
Converting numeric to text is the least of your problems.
My purpose is to define a new variable "x%" as its name, with x
varying as the function input.
First of all: there are no variables in an SQL function. SQL functions are just wrappers for valid SQL statements. Input and output parameters can be named, but names are static, not dynamic.
You may be thinking of a PL/pgSQL function, where you have procedural elements including variables. Parameter names are still static, though. There are no dynamic variable names in plpgsql. You can execute dynamic SQL with EXECUTE but that's something different entirely.
While it is possible to declare a static variable with a name like "123%" it is really exceptionally uncommon to do so. Maybe for deliberately obfuscating code? Other than that: Don't. Use proper, simple, legal, lower case variable names without the need to double-quote and without the potential to do something unexpected after a typo.
Since the window function ntile() returns integer and you run an equality check on the result, the input parameter should be integer, not numeric.
To assign a variable in plpgsql you can use the assignment operator := for a single variable or SELECT INTO for any number of variables. Either way, you want the query to return a single row or you have to loop.
If you want the maximum billed from the chosen percentile, you don't GROUP BY x, y. That might return multiple rows and does not do what you seem to want. Use plain max(billed) without GROUP BY to get a single row.
You don't need to double quote perfectly legal column names.
A valid function might look like this. It's not exactly what you were trying to do, which cannot be done. But it may get you closer to what you actually need.
CREATE OR REPLACE FUNCTION ntile_loop(x integer)
RETURNS SETOF numeric as
$func$
DECLARE
myvar text;
BEGIN
SELECT INTO myvar max(billed)
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE sub.tile = $1;
-- do something with myvar, depending on the value of $1 ...
END
$func$ LANGUAGE plpgsql;
Long story short, you need to study the basics before you try to create sophisticated functions.
Plain SQL
After Q update:
I'd like to calculate 5, 10, 20, 30, ....90th percentile and display
all of them in the same table for each id+cm group.
This simple query should do it all:
SELECT id, cm, tile, max(billed) AS max_billed
FROM (
SELECT billed, id, cm
,ntile(100) OVER (PARTITION BY id, cm ORDER BY billed) AS tile
FROM table_all
) sub
WHERE (tile%10 = 0 OR tile = 5)
AND tile <= 90
GROUP BY 1,2,3
ORDER BY 1,2,3;
% .. modulo operator
GROUP BY 1,2,3 .. positional references to the first three SELECT list items
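To make the logic of that query easier to follow, here is a rough pure-Python re-implementation of the same steps (bucket each id+cm group with ntile(100), keep tiles 5, 10, 20, ... 90, take the maximum per tile). It is an illustrative sketch only; the function names and the sample data are assumptions, and the real work should stay in SQL.

```python
from collections import defaultdict

def ntile(n_rows, buckets):
    """Bucket numbers (1-based) for n_rows ordered rows, like SQL ntile():
    bucket sizes differ by at most one and the larger buckets come first."""
    base, extra = divmod(n_rows, buckets)
    tiles = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        tiles.extend([b] * size)
    return tiles

def percentile_maxima(rows):
    """rows: iterable of (id, cm, billed) -> {(id, cm, tile): max_billed}."""
    groups = defaultdict(list)
    for id_, cm, billed in rows:
        groups[(id_, cm)].append(billed)

    result = {}
    for (id_, cm), values in groups.items():
        values.sort()  # ORDER BY billed within the partition
        for tile, billed in zip(ntile(len(values), 100), values):
            # Same filter as the WHERE clause of the SQL query.
            if (tile % 10 == 0 or tile == 5) and tile <= 90:
                key = (id_, cm, tile)
                result[key] = max(result.get(key, billed), billed)
    return result

# One group with billed values 1..100, so tile N holds exactly the value N.
rows = [(1, "a", b) for b in range(1, 101)]
result = percentile_maxima(rows)
print(sorted(result.items()))
```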
It looks like you're looking for return query execute, returning the result from a dynamic SQL statement:
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html
http://www.postgresql.org/docs/current/static/plpgsql-statements.html