I'm learning the syntax of Google BigQuery, and currently, I'm reading documentation regarding identifiers and case sensitivity. I'm focused on standard SQL syntax of BigQuery.
Documentation says:
BigQuery follows these rules for case sensitivity:
Category | Case Sensitive?
Function names | No
But when I run the following statements in the Console:
#standardSQL
create function cs_test.function_a (x int64, y int64) as (x*y);
create function cs_test.function_A (x int64, y int64) as (x-y);
select cs_test.function_a(5,6); -- 30
select cs_test.function_A(5,6); -- -1
two functions are created, and the select statements return different results.
At the same time if I run the following statements I get an error, which says that the function is not found:
create function cs_test.function_b (x int64, y int64) as (x+y);
select cs_test.function_B(5,6); -- NOK
Is the function name case insensitive in Google BigQuery? From the code snippets provided above it seems to be case sensitive.
Thank you.
What you've found is correct. Documentation has been updated to reflect it:
| Category | Case Sensitive? |
| Built-in Function names | No |
| User-Defined Function names | Yes |
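As a small sketch of the difference, the statements below call a built-in function with three casings and then call the user-defined function from the question. It assumes cs_test.function_b from the question still exists.

```sql
-- Built-in function names: any casing resolves to the same function.
SELECT UPPER('abc') AS a, upper('abc') AS b, Upper('abc') AS c;

-- User-defined function names: the call must match the casing used at creation.
SELECT cs_test.function_b(5, 6);     -- resolves, returns 11
-- SELECT cs_test.function_B(5, 6);  -- fails: function not found
```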
TL;DR:
Is there a way to do string manipulation in BQ only with SQL UDF?
Eg:
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
select removeExtraEqualToFromPayload(payload) from table
should give
____________________________________________________
payload
----------------------------------------------------
key1=val1&key2=val2&key3=val3&key4=val4
----------------------------------------------------
key5=val5&key6=val6
Long version:
My goal is to iterate over a string that is part of one of the columns
This is our table structure
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
As you see, key3 in first row has an = after val3 and key6 in second row has an = after val6 which is not desired for us
So the goal is to iterate over the string and remove these extra =
I went through https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions, which explains how to use custom functions in BQ. As of now, a SQL UDF only supports a SQL query, whereas with a JS UDF we can write custom logic with loops etc.
Since JS UDFs are very slow, they have been ruled out, and we have to rely on SQL UDFs only.
I thought of using BQ Scripting (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting) in combination with a SQL UDF, but that doesn't seem to work. It looks like a script has to be something altogether different.
I also explored stored procedures in BQ for the same purpose; however, that is not working either. I'm not sure if I am doing it right.
I've created a procedure like this:
CREATE PROCEDURE test.AddDelta(INOUT x INT64, delta INT64)
BEGIN
SET x = x + delta;
END;
I'm not able to use the above procedure like this:
with ta as (select 1 id union all select 2 id)
select id from ta;
call test.AddDelta(id, 1);
select id;
I'm wondering if there is a way to parse strings like this without using Javascript UDF
Disclaimer: My regex-fu is not good. Definitely have a look at the RE2 syntax.
You should be able to do it with REGEXP_REPLACE
SELECT
payload,
REGEXP_REPLACE(payload,r'=(&)|=$','\\1') AS payload_clean
FROM
`myproject.mydataset.mytable`
Example output:
| payload | payload_clean |
| key1=val1&key2=val2&key3=val3=&key4=val4= | key1=val1&key2=val2&key3=val3&key4=val4 |
Executable example:
WITH
payload_table AS (
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4" AS payload UNION ALL
SELECT "key5=val5&key6=val6=" AS payload UNION ALL
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4=" AS payload UNION ALL
SELECT "key3=val3=abc&key4=val4" AS payload
)
SELECT
payload,
REGEXP_REPLACE(payload,r'(=val\pN)=(\pL*&)|=(&)|=$','\\1\\2') AS payload_clean
FROM
payload_table
Of course (=val\pN)=(\pL*&) in the pattern won't necessarily work for you since you probably have different patterns. If there are no patterns to match then I'm not sure how you will remove the extra '=' from your strings automatically.
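Since the question asked for a SQL UDF, the simpler of the two patterns can be wrapped in one. This is only a sketch: the function name is taken from the question, the temp function and the inline sample rows are assumptions for illustration, and the pattern only handles an '=' that sits directly before '&' or at the end of the string. It would also strip the '=' from a key with an empty value, such as key1=&key2=val2.

```sql
-- Hypothetical temp SQL UDF, named after the function in the question.
CREATE TEMP FUNCTION removeExtraEqualToFromPayload(payload STRING) AS (
  -- Drop an '=' that is immediately followed by '&' or by the end of the
  -- string, putting back whatever capture group 1 matched ('&' or nothing).
  REGEXP_REPLACE(payload, r'=(&|$)', r'\1')
);

WITH payload_table AS (
  SELECT 1 AS id, 'key1=val1&key2=val2&key3=val3=&key4=val4' AS payload UNION ALL
  SELECT 2 AS id, 'key5=val5&key6=val6=' AS payload
)
SELECT id, removeExtraEqualToFromPayload(payload) AS payload
FROM payload_table;
```

No loop is needed here, because REGEXP_REPLACE already replaces every match in the string.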
Context
I just came across a table in a PostgreSQL database that only defines a triplet of coded values, which are used across the whole database as a ternary data type. At first glance this astonished me; it feels weird. Shouldn't there be some ternary data type?
I've searched the web, especially the PostgreSQL documentation, without any apparent success (my search keywords are probably wrong?!), but maybe there is no other solution.
Question
I would like to know whether there is a ternary (as opposed to binary or boolean) data type in PostgreSQL, or more generally in SQL, that can express a "ternary state" (or "ternary boolean", which is clearly an abuse of language). As a general idea, I would represent it as:
+-------+----------+--------------------+
| id | type | also expressed as |
+-------+----------+--------------------+
| 0 | false | 0 |
| 1 | true | 1 |
| 2 | unknown | 2 |
+-------+----------+--------------------+
where unknown can be whatever third state you are actually dealing with.
I would like to know whether there is a ternary (as opposed to binary or boolean) data type
Actually, the boolean data type is ternary because it can have the values true, false and null.
Consider this table:
create table data (some_number int, some_flag boolean);
And the following data:
insert into data (some_number, some_flag)
values (1, true), (2, false), (3, null);
Then the following:
select *
from data
where some_flag = false;
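will return only one row (the one with some_number = 2), because null = false evaluates to null rather than true, so the row with some_number = 3 is filtered out.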
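To reach the third state explicitly, PostgreSQL offers null-aware boolean tests. A brief sketch against the data table defined above:

```sql
-- Rows where the flag is false or unknown (null).
SELECT some_number FROM data WHERE some_flag IS NOT TRUE;            -- 2 and 3

-- Rows where the flag is unknown only.
SELECT some_number FROM data WHERE some_flag IS NULL;                -- 3

-- Null-safe inequality: treats null as an ordinary, comparable value.
SELECT some_number FROM data WHERE some_flag IS DISTINCT FROM true;  -- 2 and 3
```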
There is no specific ternary operator, but you could use CASE:
select case when operator = 0 then 'false'
when operator = 1 then 'true'
when operator = 2 then 'unknown'
else 'not managed'
end
from your_table
I second a_horse_with_no_name's solution for your specific example, but the more general approach is to use an enum data type:
CREATE TYPE ternary AS ENUM (
'never',
'sometimes',
'always'
);
Constants of such a data type are written as string constants, e.g. 'never', but the internal storage uses 4 bytes per value, regardless of the length of the label.
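A short sketch of how such a type might be used, assuming the ternary enum above has been created. The habit table and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical table using the ternary enum defined above.
CREATE TABLE habit (
    id        int PRIMARY KEY,
    frequency ternary NOT NULL
);

INSERT INTO habit (id, frequency)
VALUES (1, 'never'), (2, 'sometimes'), (3, 'always');

-- Enum values compare in the order the labels were declared.
SELECT id FROM habit WHERE frequency > 'never' ORDER BY frequency;  -- 2, 3

-- A label that is not part of the enum is rejected:
-- INSERT INTO habit (id, frequency) VALUES (4, 'maybe');
```

Compared with a lookup table of coded values, the enum keeps the allowed states in the type itself, so no join is needed to read them.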
I have float data in a BigQuery table like 5302014.2 and 5102014.4.
I'd like to run a BigQuery SQL that returns the values in String format, but the following SQL yields this result:
select a, string(a) from my_table
5302014.2 "5.30201e+06"
5102014.4 "5.10201e+06"
How can I rewrite my SQL to return:
5302014.2 "5302014.2"
5102014.4 "5102014.4"
Standard SQL doesn't have this problem:
$ bq query '#standardSQL
SELECT a, CAST(a AS STRING) AS a_str FROM UNNEST(ARRAY[530201111114.2, 5302014.4]) a'
+-------------------+----------------+
| a | a_str |
+-------------------+----------------+
| 5302014.4 | 5302014.4 |
| 5.302011111142E11 | 530201111114.2 |
+-------------------+----------------+
SELECT STRING(INTEGER(f)) + '.' + SUBSTR(STRING(f-INTEGER(f)), 3)
FROM (SELECT 5302014.5642 f)
(not a nice hack, but a better method would be a great feature request to post at https://code.google.com/p/google-bigquery/issues/list?can=2&q=label%3DFeature-Request)
Converting your legacy SQL to standard SQL is really the best way forward as far as working with GBQ is concerned. Standard SQL is much faster and has a much better implementation of features.
For your use case, going with standard SQL and CAST(a AS STRING) would be best.
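If a fixed number of decimal places is wanted rather than the default cast output, standard SQL's FORMAT function is another option. A minimal sketch, using the two values from the question as inline data; the one-decimal precision is an assumption based on those sample values.

```sql
SELECT
  a,
  CAST(a AS STRING)  AS a_cast,   -- default float-to-string conversion
  FORMAT('%.1f', a)  AS a_fixed   -- always one digit after the decimal point
FROM UNNEST([5302014.2, 5102014.4]) AS a;
```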
I'm converting some code from Oracle to MSSQL, and I was reading a guide by Oracle, "B Supported SQL Syntax and Functions".
I noticed it was stated that there is a NOT NVL function (and its MSSQL equivalent was IS NOT NULL).
I'm compiling a list for my colleagues so we can have a one-stop resource for syntax and supported functions. Am I correct in assuming that NOT NVL works like so:
There are 3 columns, name, location, loves_marmite
Andrew | UK | Yes
NOT NVL(loves_marmite, 'Nope')
So the data displayed would be:
Andrew | UK | Nope
I just don't get why it would be listed as an Oracle Function when it's just a logic issue, and what's more is that Oracle has IS NULL and IS NOT NULL.
I'm sorry I'm just looking for some clarification before I pass this document on to my colleagues.
EDIT : If possible would someone have a comprehensive list of function and syntax differences between the two platforms?
Check NVL2(param1, param2, param3) function.
If param1 is NOT (NULL or EMPTY STRING) it returns param2 else returns param3.
You could write:
NVL2(loves_marmite, 'Nope', something_else)
Also, see this answer for a list of null-related functions in Oracle
First, please see the ISNULL function. But Oracle may be trying to tell you to replace the NVL functionality with a CASE:
SELECT CASE WHEN Foo IS NOT NULL THEN bar
ELSE BLA
END
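Applied to the example in the question, the null-handling variants might look like the sketch below. The table name people is an assumption; the columns are the ones from the question. Note that NVL substitutes its second argument only when the first is NULL, so Andrew's 'Yes' would stay 'Yes' and only rows with no value would show 'Nope'.

```sql
-- Hypothetical table people(name, location, loves_marmite).
-- Oracle spelling:     NVL(loves_marmite, 'Nope')
-- SQL Server spelling: ISNULL(loves_marmite, 'Nope')
-- COALESCE is accepted by both platforms.
SELECT name, location, COALESCE(loves_marmite, 'Nope') AS loves_marmite
FROM people;

-- Oracle's NVL2(loves_marmite, 'Yes', 'Nope') written as a portable CASE.
SELECT name, location,
       CASE WHEN loves_marmite IS NOT NULL THEN 'Yes'
            ELSE 'Nope'
       END AS has_answer
FROM people;
```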
I have the following function, based on the SQL Functions Returning Sets section of the PG docs, which accepts two arrays of equal length, and unpacks them into a set of rows with two columns.
CREATE OR REPLACE FUNCTION unpack_test(
in_int INTEGER[],
in_double DOUBLE PRECISION[],
OUT out_int INTEGER,
OUT out_double DOUBLE PRECISION
) RETURNS SETOF RECORD AS $$
SELECT $1[rowx] AS out_int, $2[rowx] AS out_double
FROM generate_series(1, array_upper($1, 1)) AS rowx;
$$ LANGUAGE SQL STABLE;
I execute the function in PGAdmin3, like this:
SELECT unpack_test(int_col, double_col) FROM test_data
It basically works, but the output looks like this:
|unpack_test|
|record |
|-----------|
|(1, 1) |
|-----------|
|(2, 2) |
|-----------|
...
In other words, the result is a single record, as opposed to two columns. I found this question that seems to provide an answer, but it deals with a function that selects from a table directly, whereas mine accepts the columns as arguments, since it needs to generate the series used to iterate over them. I therefore can't call it using SELECT * FROM function, as suggested in that answer.
First, you'll need to create a type for the return value of your function. Something like this could work:
CREATE TYPE unpack_test_type AS (out_int int, out_double double precision);
Then change your function to return this type instead of record.
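A sketch of what the changed function might look like, assuming the unpack_test_type created above. The OUT parameters are dropped because the composite type now names the columns, and the old function is removed first because PostgreSQL does not allow CREATE OR REPLACE to change an existing function's return type.

```sql
-- Remove the old definition; only the input argument types identify it.
DROP FUNCTION IF EXISTS unpack_test(INTEGER[], DOUBLE PRECISION[]);

CREATE FUNCTION unpack_test(
    in_int INTEGER[],
    in_double DOUBLE PRECISION[]
) RETURNS SETOF unpack_test_type AS $$
    -- Column order and types must match unpack_test_type.
    SELECT $1[rowx], $2[rowx]
    FROM generate_series(1, array_upper($1, 1)) AS rowx;
$$ LANGUAGE SQL STABLE;
```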
Then you can use it like this:
SELECT (unpack_test).out_int, (unpack_test).out_double FROM
(SELECT unpack_test(int_col, double_col) FROM test_data) as test
It doesn't seem possible to take a function returning a generic record type and use it in this manner.
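On PostgreSQL 9.3 or later, a LATERAL join may be worth trying as an alternative, since it lets a set-returning function in the FROM clause take columns of a preceding table as arguments. This is a sketch under that version assumption; the aliases t and u are arbitrary.

```sql
-- Call the function once per row of test_data and expand its columns.
SELECT u.out_int, u.out_double
FROM test_data AS t
CROSS JOIN LATERAL unpack_test(t.int_col, t.double_col) AS u;
```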