Table Functions and FLATTEN in BigQuery

Is it possible to use the FLATTEN and TABLE_QUERY functions together?
It looks like TABLE_QUERY returns only the table name, while FLATTEN requires the dataset as well.
FROM Clause:
FROM FLATTEN(TABLE_QUERY(nbr_pcrf, 'table_id CONTAINS "dump_"'), quotas) d
Error:
Query Failed
Error: Table name cannot be resolved: dataset name is missing.
Job ID: nbr-data-storage:job_44jU_diWnh4tk27UxDxFP-I5Rbg

The error message is a little misleading. What is actually happening is that when the first argument of FLATTEN() is anything other than a plain table name, it needs an extra set of parentheses so that the field you're flattening by is not mistaken for a table. In other words, if you do
SELECT ... FROM FLATTEN(TABLE_QUERY(...), foo)
the foo field gets interpreted as another table name to union with (in legacy SQL a comma in the FROM clause means UNION ALL, as in SELECT * FROM bar, foo).
The workaround is simple: add another set of parentheses around the table function. That is:
SELECT ... FROM
FLATTEN((TABLE_QUERY(nbr_pcrf, 'table_id CONTAINS "dump_"')), quotas)
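For comparison, here is a minimal sketch of the same idea in BigQuery standard SQL, where FLATTEN and TABLE_QUERY give way to UNNEST and a wildcard table. It assumes (this is not stated in the question) that every matching table name *starts* with dump_ (a wildcard only matches a prefix, whereas CONTAINS matched dump_ anywhere in the name), that the tables have compatible schemas, and that quotas is a REPEATED field:

```sql
#standardSQL
-- Sketch: assumes the target tables in dataset nbr_pcrf all start with "dump_"
-- and that quotas is a REPEATED (array) field in each of them.
SELECT
  _TABLE_SUFFIX AS table_suffix,  -- the part of the table name after "dump_"
  q                               -- one output row per element of quotas
FROM `nbr_pcrf.dump_*` AS d
CROSS JOIN UNNEST(d.quotas) AS q;
```

Note that the CROSS JOIN drops rows whose quotas array is empty; `LEFT JOIN UNNEST(d.quotas) AS q ON TRUE` keeps them with a NULL q.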

Related

How can you filter Snowflake EXPLAIN USING TABULAR output when it's embedded in the TABLE function? Can you filter it with anything?

I have a table named Posts I would like to count and profile in Snowflake using the current Snowsight UI.
When I return the results via EXPLAIN USING TABULAR, I am able to return the set with the combination of the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN using TABULAR SELECT COUNT(*) from Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where operation = 'GlobalStats';
-- Fails with: invalid identifier 'OPERATION' (the column is not recognized)
I tried the third query expecting the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
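You need to double-quote the column name. The EXPLAIN output columns are named in lowercase, while Snowflake resolves an unquoted identifier such as operation as uppercase OPERATION, which is why it is reported as invalid:
where "operation" = 'GlobalStats'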
From the documentation:
Note that because the output column names from the DESC USER command
were generated in lowercase, the commands use delimited identifier
notation (double quotes) around the column names in the query to
ensure that the column names in the query match the column names in
the output that was scanned
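Putting it together, a minimal sketch of the full sequence, assuming both statements run in the same session so that LAST_QUERY_ID() still refers to the EXPLAIN:

```sql
-- 1. Produce the query plan as a result set.
EXPLAIN USING TABULAR SELECT COUNT(*) FROM Posts;

-- 2. Query that result set. The EXPLAIN output columns are lowercase,
--    so they must be double-quoted; unquoted, operation would be
--    resolved as OPERATION and not found.
SELECT t.*
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS t
WHERE t."operation" = 'GlobalStats';
```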

BigQuery FOR...IN not picking up table paths from a lookup table

I have a lookup table in BigQuery called all_tables that contains a list of fully qualified table paths. For example:
|table_list|
|----------|
|project_name.dataset_name1.table_1|
|project_name.dataset_name2.table_1|
|project_name.dataset_name3.table_1|
|project_name.dataset_name4.table_1|
|project_name.dataset_name5.table_1|
I am trying to iterate through these tables with BigQuery's FOR...IN syntax to pull out elements I need for another procedure. This is a simplified version of the query I am using:
```
FOR table IN (select * from my_project.my_dataset.all_tables)
DO
select * from table;
END FOR;
```
This isn't working. It picks up the list of tables correctly, but when it reaches the SELECT statement on line 3, it says:
**Invalid value: Table "table" must be qualified with a dataset (e.g. dataset.table)**
I know what the error is, but I am not sure how to make it 'see' the value of table as a table path.
All paths are correct, and I am doing it this way as I am querying multiple tables across multiple datasets for a table creation query.
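You need dynamic SQL to use a table name stored in a variable: in a static statement, table is read as a literal table name rather than as the loop variable. Each iteration gives you a row as a STRUCT, so the path is table.table_list. Consider the query below:
FOR table IN (select * from my_project.my_dataset.all_tables)
DO
  -- Backticks keep paths valid even if a project ID contains hyphens.
  EXECUTE IMMEDIATE FORMAT("""
    SELECT * FROM `%s`
  """, table.table_list);
END FOR;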
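If the goal is to feed a later table-creation step rather than return one result set per table, the dynamic statement can write into a temporary table instead. Here is a minimal sketch, assuming the whole thing runs as one multi-statement query. The temp table row_counts, the loop variable t, and the per-table row count are hypothetical stand-ins for whatever you actually need to extract:

```sql
-- Hypothetical target for the collected results.
CREATE TEMP TABLE row_counts (source_table STRING, row_count INT64);

FOR t IN (SELECT table_list FROM my_project.my_dataset.all_tables)
DO
  -- %T renders the path as a string literal; %s splices it in as a
  -- (backtick-quoted) table identifier.
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO row_counts (source_table, row_count)
    SELECT %T, COUNT(*) FROM `%s`
  """, t.table_list, t.table_list);
END FOR;

SELECT * FROM row_counts ORDER BY source_table;
```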

BigQuery - Get fields of nested Repeated Records

I am working with BigQuery tables that can have many levels of nested repeated record fields, as shown in the example.
I need to query the given table, extract only some fields and ignore others, and end up with the same structure minus the ignored fields.
I think I have to use ARRAY_AGG and UNNEST to get only the required fields from the repeated record fields, but I don't know how.
In the example, I want to keep only DatiRighe and DatiRigheDettaglio as STRUCTs, and for each of them keep everything except DatiRighe.Nota and DatiRigheDettaglio.cod_iva.
Try the following query for your requirement:
SELECT
  * REPLACE ((
    SELECT AS STRUCT * EXCEPT (Nota)
    FROM (
      SELECT AS STRUCT DatiRighe.* REPLACE ((
        SELECT AS STRUCT DatiRighe.DatiRigheDettaglio.* EXCEPT (cod_iva)
      ) AS DatiRigheDettaglio)
    )
  ) AS DatiRighe)
FROM
  `my-project_id.database.table`
The query uses EXCEPT to remove the unwanted columns and REPLACE to swap each struct for a modified copy. Let me know if it helps.
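The query above treats DatiRighe and DatiRigheDettaglio as single STRUCTs. If they are REPEATED records (ARRAY<STRUCT<...>>), as the question describes, EXCEPT and REPLACE have to be applied to each element, which means rebuilding each array with ARRAY(SELECT AS STRUCT ... FROM UNNEST(...)). The sketch below works under that assumption. The field names come from the question, and the nesting (DatiRigheDettaglio inside DatiRighe) is assumed from the query above:

```sql
SELECT
  t.* REPLACE (
    ARRAY(
      SELECT AS STRUCT
        -- drop Nota and rebuild the nested array without cod_iva
        r.* EXCEPT (Nota) REPLACE (
          ARRAY(
            SELECT AS STRUCT d.* EXCEPT (cod_iva)
            FROM UNNEST(r.DatiRigheDettaglio) AS d WITH OFFSET AS d_pos
            ORDER BY d_pos  -- preserve the original element order
          ) AS DatiRigheDettaglio
        )
      FROM UNNEST(t.DatiRighe) AS r WITH OFFSET AS r_pos
      ORDER BY r_pos
    ) AS DatiRighe
  )
FROM `my-project_id.database.table` AS t;
```

Using REPLACE rather than EXCEPT plus a re-added field keeps DatiRigheDettaglio in its original position, so the output schema matches the input apart from the dropped fields.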

Select columns from table in schema different than public

In my PostgreSQL database I have a table inside the import schema. When I want to get all data from the table I do:
select * from import.master_plan
This query works fine. But when I try, for example, to get only the values of the title column:
select import.master_plan.title from import.master_plan;
it returns:
ERROR: column master_plan.title does not exist
LINE 1: select import.master_plan.title from import.master_plan;
^
HINT: Perhaps you meant to reference the column "master_plan.  title".
I've also tried:
select title from import.master_plan;
but this doesn't work either. I'm using PostgreSQL 10. How can I fix that?
I would suggest that you use a table alias instead:
select mp.title
from import.master_plan mp;
This is much easier to read and to type.
Judging from the error message, though, the name seems to have leading spaces. Something like:
select mp." title"
from import.master_plan mp;
might work. If this is the case, alter the table and rename the column.
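Here is a sketch for confirming what the column is actually called before renaming it, followed by the rename itself. The name " title" (one leading space) is an assumption taken from the answer. The hint in the error may show more than one leading space, so adjust it to whatever the first query reports:

```sql
-- List every column name exactly as stored; quote_ident() adds double
-- quotes when a name needs them, which makes stray spaces visible.
SELECT column_name,
       quote_ident(column_name::text) AS quoted_name,
       length(column_name::text)      AS name_length
FROM information_schema.columns
WHERE table_schema = 'import'
  AND table_name   = 'master_plan'
ORDER BY ordinal_position;

-- If the column turns out to be " title", rename it once so that
-- plain references such as mp.title work again.
ALTER TABLE import.master_plan RENAME COLUMN " title" TO title;
```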

SQL: is it normal to use column name with schema name?

Let's suppose I have two schemas in one DB: public and private. Both schemas contain the same table, my_table, with the same columns. So is it normal to do the following?
SELECT public.my_table.my_col FROM public.my_table
I am trying to do it with H2 but get a "column not found" exception from the ResultSet. Is this not normal in general, or just not normal for H2?
You should write:
SELECT my_col FROM public.my_table
since unqualified column names are already resolved against the table(s) listed in the FROM clause.
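Qualifying the table with its schema is enough to pick the right my_table. When one query needs both tables at once, the usual way to tell their columns apart is a table alias rather than schema.table.column. A minimal sketch, where the join column id is hypothetical:

```sql
-- Alias each my_table and qualify columns with the alias.
SELECT pub.my_col AS public_my_col,
       prv.my_col AS private_my_col
FROM public.my_table AS pub
JOIN private.my_table AS prv
  ON prv.id = pub.id;  -- hypothetical join key
```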