BigQuery UDF that returns latest partition - sql

I'm trying to build a BigQuery UDF that returns the latest partition DATE for a partitioned table, given the dataset name and table name as parameters to the UDF. I cannot use BQ scripting to get the latest partition, since I need to save my final query as a view definition (and views don't support scripting).
The UDF right now returns an error message 'Not found: Dataset my-project-id:dataset_name was not found in location northamerica-northeast1'.
The error message makes sense, but I don't want to hard-code my actual dataset's name in the UDF. How do I get around this problem?
CREATE FUNCTION `my-project-id.test_dataset`.get_latest_partition(dataset_name STRING, tab_name STRING)
RETURNS DATE
AS (
  (SELECT PARSE_DATE('%Y%m%d', MAX(partition_id))
   FROM dataset_name.INFORMATION_SCHEMA.PARTITIONS
   WHERE table_name = tab_name)
)

It's not possible to pass the dataset name dynamically inside the function the way you do with tab_name, because the parameter would have to complete the ".INFORMATION_SCHEMA" table path, and table paths are resolved as literal identifiers when the function is created. You can't concatenate strings or use scripting constructs (FORMAT, DECLARE, EXECUTE IMMEDIATE) inside a SQL UDF body to build that identifier.
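For contrast, the dynamic approach does work in BigQuery scripting (a stored procedure or an ad-hoc script), where EXECUTE IMMEDIATE can splice the dataset name into the table path as a string. This is a minimal sketch with placeholder names ('my_dataset', 'my_table'); it cannot be used inside a UDF or a view, which is exactly the constraint in the question:
-- Works in a script or stored procedure, NOT in a UDF or view.
-- 'my_dataset' and 'my_table' are placeholder names.
DECLARE latest DATE;
EXECUTE IMMEDIATE FORMAT(
  "SELECT PARSE_DATE('%%Y%%m%%d', MAX(partition_id)) FROM `%s.INFORMATION_SCHEMA.PARTITIONS` WHERE table_name = '%s'",
  'my_dataset', 'my_table')
INTO latest;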

Related

How can you filter Snowflake EXPLAIN USING TABULAR syntax when it's embedded in the TABLE function? Can you filter it with anything?

I have a table named Posts that I would like to count and profile in Snowflake using the current Snowsight UI.
When I return the results via EXPLAIN USING TABULAR, I am able to return the set with the combination of the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN USING TABULAR SELECT COUNT(*) FROM Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t WHERE operation = 'GlobalStats';
-- Fails with: invalid identifier 'OPERATION'; the column does not seem to be recognized.
I tried the third example and expected the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
You need to double-quote the column name:
WHERE "operation" = 'GlobalStats'
From the documentation:
Note that because the output column names from the DESC USER command were generated in lowercase, the commands use delimited identifier notation (double quotes) around the column names in the query to ensure that the column names in the query match the column names in the output that was scanned.
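Applied to the failing query from the question, the working version looks like this (a sketch: the EXPLAIN output columns such as "operation" are generated in lowercase, so they must be delimited):
EXPLAIN USING TABULAR SELECT COUNT(*) FROM Posts;
-- The lowercase column name is quoted, so it now resolves correctly.
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t WHERE t."operation" = 'GlobalStats';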

BigQuery CREATE TABLE with dynamic table name [duplicate]

I am trying to write a BigQuery script that I can store as a procedure. I would like one of the arguments I pass to be used in the table name that is written out by the script, for example:
DECLARE id STRING;
SET id = '123';
CREATE OR REPLACE TABLE test.id AS (
  SELECT * FROM dataset.table
);
However, in this example the table is created with the name id rather than the value of the "id" variable, 123. Is there any way I can dynamically create a table using the value of a declared variable in the BigQuery UI?
Why not just use EXECUTE IMMEDIATE with CONCAT if you know the table schema?
EXECUTE IMMEDIATE CONCAT('CREATE TABLE `', id, '` (column_name STRING)');
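A slightly fuller sketch that keeps the question's CREATE OR REPLACE TABLE ... AS SELECT shape, using FORMAT instead of CONCAT (test, dataset.table, and the id value are the hypothetical names from the question):
DECLARE id STRING DEFAULT '123';
-- Splice the variable into the table name; backticks guard the generated identifier.
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `test.%s` AS
SELECT * FROM dataset.table
""", id);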
BigQuery scripting (officially announced, still in beta at the time of writing) supports dynamic parameters (variables) as placeholders for values in SQL queries. However, according to the Parameterized queries in BigQuery documentation, query parameters can't be used for SQL object identifiers:
Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
Maybe you can use a wildcard table. You would query a wildcard table covering all the subtables and use the WHERE clause to select any subtable you want. Just be careful: the tables must have the same schema.
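A minimal sketch of that approach (the table prefix is hypothetical; _TABLE_SUFFIX is BigQuery's built-in pseudo-column holding the matched part of the name):
-- Assumes tables like dataset.table_123, dataset.table_456 with identical schemas.
SELECT *
FROM `dataset.table_*`
WHERE _TABLE_SUFFIX = '123';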

PostgreSQL - How to cast dynamically?

I have a column that holds the type of the dataset as text.
So I want to do something like this:
SELECT CAST ('100' AS %INTEGER%);
SELECT CAST (100 AS %TEXT%);
SELECT CAST ('100' AS (SELECT type FROM dataset_types WHERE id = 2));
Is that possible with PostgreSQL?
SQL is strongly typed and static. Postgres demands to know the number of columns and their data types at the time of the call. So you need dynamic SQL in one of the procedural language extensions for this. And then you still face the obstacle that functions (necessarily) have a fixed return type. Related:
Dynamically define returning row types based on a passed given table in plpgsql?
Function to return dynamic set of columns for given table
Or you go with a two-step flow. First concatenate the query string (with another SELECT query). Then execute the generated query string. Two round trips to the server.
SELECT '100::' || type FROM dataset_types WHERE id = 2; -- record resulting string
Execute the result. (And make sure you didn't open any vectors for SQL injection!)
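As a one-round-trip alternative, here is a minimal sketch using a PL/pgSQL DO block (dataset_types and the value '100' come from the question; format() with %s for the type name is still an injection vector if the type column holds untrusted input):
DO $$
DECLARE
    target_type text;
    result      text;
BEGIN
    SELECT type INTO target_type FROM dataset_types WHERE id = 2;
    -- Build and run the cast dynamically; the result is coerced back to text for display.
    EXECUTE format('SELECT CAST(%L AS %s)', '100', target_type) INTO result;
    RAISE NOTICE 'cast result: %', result;
END
$$;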
About the short cast syntax:
Postgres data type cast

How to check which function uses a type?

I have a type which I'd like to change but I don't know who else is using it.
How can I check for all functions that return this type?
You can find all dependencies in the system catalog pg_depend.
This returns all functions depending on the type, i.e. not only those with the type in the RETURNS clause, but also those with the type as a function parameter:
SELECT objid::regproc AS function_name
, pg_get_functiondef(objid) AS function_definition
, pg_get_function_identity_arguments(objid) AS function_args
, pg_get_function_result(objid) AS function_returns
FROM pg_depend
WHERE refclassid = 'pg_type'::regclass
AND refobjid = 'my_type'::regtype -- insert your type name here
AND classid = 'pg_proc'::regclass; -- only find functions
This also works for table functions:
...
RETURNS TABLE (foo my_type, bar int)
The query uses system catalog information functions.
There may be other dependencies (not to functions). Remove the last WHERE condition from my query to test (and adapt the SELECT list, obviously).
And there is still the possibility of the type being used explicitly (in a cast for instance) in queries in the function body or in dynamic SQL. You can only identify such use cases by parsing the text of the function body. There are no explicit dependencies registered in the system.
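For those cases, a rough heuristic is to search the stored function bodies for the type name (a sketch; it matches the name as a plain substring, so expect false positives, and it misses bodies stored outside prosrc):
-- Scan function source text for mentions of the type name.
SELECT p.oid::regprocedure AS function_signature
FROM   pg_proc p
WHERE  p.prosrc ILIKE '%my_type%';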
Related:
How to get function parameter lists (so I can drop a function)
DROP FUNCTION without knowing the number/type of parameters?
As mentioned by Erwin Brandstetter, this only works for functions directly returning the data type.
SELECT * FROM information_schema.routines r
WHERE r.type_udt_name = 'YOUR_DATA_TYPE' ORDER BY r.routine_name;