I want to list the tables that can be queried with wildcards in BigQuery.
The table list in my dataset is similar to the following:
events_122022
events_122021
events_122020
...
...
events_112012
...
...
analytics_122022
analytics_122021
analytics_122020
...
...
analytics_112012
These tables are created dynamically, and I have no information about the table prefixes used.
Is there a way to find the list of prefixes that can be used for wildcard queries?
The result should be: [events_, analytics_]
My attempt:
Find the tables with similar DDL using the following SQL:
SELECT
SUBSTR(ddl, STRPOS(ddl, '(')) as commonDDL,
STRING_AGG(table_name) as table
FROM
dataset.INFORMATION_SCHEMA.TABLES
GROUP BY SUBSTR(ddl, STRPOS(ddl, '('))
This gives the following output:
|commonDDL|table|
|---------|-----|
|(ID STRING, ...)|events_122022, events_122021, ...|
|(NAME STRING, ...)|analytics_112022, analytics_112021 ...|
Now, using a longest-common-prefix ("longest common shared start") algorithm, I can find the required result.
(Longest common start code here )
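The asker's own code for this step is not shown. As a minimal sketch of what such a step might look like in Python, assuming the grouped table names have already been fetched from the query above (the `groups` list and the `wildcard_prefix` helper are hypothetical names used only for illustration):

```python
import os
import re

def wildcard_prefix(table_names):
    """Return the shared start of the names with any trailing digits removed."""
    # os.path.commonprefix compares character by character, not by path component.
    common = os.path.commonprefix(list(table_names))
    # The shared start can still include leading suffix digits (e.g. "events_1"),
    # so drop trailing digits to leave only the table prefix.
    return re.sub(r"\d+$", "", common)

# Hypothetical groups, as they might come back from the STRING_AGG query.
groups = [
    ["events_122022", "events_122021", "events_112012"],
    ["analytics_122022", "analytics_122021", "analytics_112012"],
]
prefixes = [wildcard_prefix(group) for group in groups]
print(prefixes)  # ['events_', 'analytics_']
```

Stripping trailing digits assumes the dynamic suffix is purely numeric, as in the table names listed in the question.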
What other ways are there to approach this problem?
I couldn't find anything in the BigQuery docs.
Note: I only have read-only permission for the BigQuery dataset.
What about finding your table with some regex:
select table_name
from yourds.INFORMATION_SCHEMA.TABLES
where regexp_contains(table_name, r"_[0-9]+$")
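The query above returns matching table names rather than prefixes. One way to get from names to prefixes, sketched here in Python under the assumption that every wildcard-style table ends in an underscore followed by digits (the `candidate_prefixes` helper and its `min_tables` threshold are hypothetical):

```python
import re
from collections import Counter

def candidate_prefixes(table_names, min_tables=2):
    """Collect prefixes shared by at least `min_tables` digit-suffixed tables."""
    counts = Counter()
    for name in table_names:
        # Capture everything up to and including the last underscore
        # when the remainder of the name is all digits.
        match = re.fullmatch(r"(.*_)\d+", name)
        if match:
            counts[match.group(1)] += 1
    return sorted(prefix for prefix, n in counts.items() if n >= min_tables)

# Hypothetical table names, e.g. fetched from INFORMATION_SCHEMA.TABLES.
names = ["events_122022", "events_122021", "analytics_122022",
         "analytics_112012", "users"]
print(candidate_prefixes(names))  # ['analytics_', 'events_']
```

Unlike the DDL-based grouping, this does not check that the tables share a schema, so it may be worth combining the two checks.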
Related
I have a look-up table containing a list of fully qualified table paths in a BigQuery table called all_tables. For example:
|table_list|
|----------|
|project_name.dataset_name1.table_1|
|project_name.dataset_name2.table_1|
|project_name.dataset_name3.table_1|
|project_name.dataset_name4.table_1|
|project_name.dataset_name5.table_1|
I am trying to iterate through these tables to pull out elements I need for another procedure, using the FOR...IN syntax in BigQuery. This is a simplified version of the query I am using:
```
FOR table IN (select * from my_project.my_dataset.all_tables)
DO
select * from table;
END FOR;
```
This isn't working. It picks up the list of tables correctly, but when it substitutes the dataset name in the line 3 select statement, it says
**Invalid value: Table "table" must be qualified with a dataset (e.g. dataset.table)**
I know what the error is, but I am not sure how to make it 'see' the value of table as a table path.
All paths are correct, and I am doing it this way as I am querying multiple tables across multiple datasets for a table creation query.
You should use dynamic SQL to reference the table name held in a variable. Consider the query below:
FOR table IN (select * from my_project.my_dataset.all_tables)
DO
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM %s;
""", table.table_list);
END FOR;
Is there a query that returns the source and target table names for a given mapping name or mapping ID in Informatica? This has been very hard to work out.
For example, when we run SELECT * FROM opb_mapping WHERE mapping_name LIKE '%CY0..%'
it returns some details, but I cannot find the source table names and target table names. Please help if you can.
Thanks.
You can use the view below to get the data
(assuming you have full access to the metadata tables):
select source_name, source_field_name,
target_name, target_column_name,
mapping_name, subject_name folder
from REP_FLD_MAPPING
where mapping_name like '%xx%';
The only issue I can see is that if a mapping uses a SQL override, you need to check that SQL to find the true source.
I have a dataset that contains several tables that have suffixes in their name:
table_src1_serie1
table_src1_serie2
table_src2_opt1
table_src2_opt2
table_src3_type1_v1
table_src3_type2_v1
table_src3_type2_v2
I know that I can use this type of query in BigQuery:
select * from `project.dataset.table_*`
to get all the rows from these different tables.
What I am trying to achieve is a column that contains, for instance, the type of source (src1, src2, src3).
Assuming the schema of all tables is the same, you can add the line below to your select list (for BigQuery Standard SQL):
SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] AS src
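To see which value that expression yields for each table, here is a small Python emulation. It is a sketch only: it assumes the wildcard is `table_*`, so that _TABLE_SUFFIX is whatever follows `table_` in the table name, and `source_from_table` is a hypothetical helper.

```python
def source_from_table(table_name, wildcard_prefix="table_"):
    """Mimic SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] for one table name."""
    if not table_name.startswith(wildcard_prefix):
        raise ValueError(f"{table_name!r} is not matched by {wildcard_prefix}*")
    # _TABLE_SUFFIX is the part of the name matched by the * in the wildcard.
    table_suffix = table_name[len(wildcard_prefix):]
    # Splitting on '_' and taking the first element gives the source label.
    return table_suffix.split("_")[0]

for name in ["table_src1_serie1", "table_src2_opt2", "table_src3_type2_v2"]:
    print(name, "->", source_from_table(name))
```

With the table names from the question, this maps each table to src1, src2 or src3.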
I have datasets of the same structure, named by date, and I know I can query them like this:
SELECT column
FROM [xx.ga_sessions_20141019] ,[xx.ga_sessions_20141020],[xx.ga_sessions_20141021]
WHERE column = 'condition';
However, I actually want to query several months of this data, so instead of listing them all as above, is there a syntax that looks like this:
SELECT column
FROM [xx.ga_sessions_201410*] ,[xx.ga_sessions_201411*]
WHERE column = 'condition';
Take a look at the table wildcard functions section of the BigQuery query reference. TABLE_DATE_RANGE or TABLE_QUERY will work for you here. Something like:
SELECT column
FROM TABLE_DATE_RANGE(xx.ga_sessions_,
TIMESTAMP('2014-10-19'),
TIMESTAMP('2014-10-21'))
WHERE column = 'condition';
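TABLE_DATE_RANGE is a legacy SQL function that matches daily tables named as the given prefix followed by a YYYYMMDD date. As an illustration of which table names fall inside the range used above, here is a minimal Python sketch (the `tables_in_date_range` helper is hypothetical, and it lists candidate names only; BigQuery queries just the ones that actually exist):

```python
from datetime import date, timedelta

def tables_in_date_range(prefix, start, end):
    """List prefix + YYYYMMDD for every day from start to end, inclusive."""
    names = []
    current = start
    while current <= end:
        # Daily tables are expected to be named <prefix><YYYYMMDD>.
        names.append(f"{prefix}{current:%Y%m%d}")
        current += timedelta(days=1)
    return names

print(tables_in_date_range("xx.ga_sessions_", date(2014, 10, 19), date(2014, 10, 21)))
# ['xx.ga_sessions_20141019', 'xx.ga_sessions_20141020', 'xx.ga_sessions_20141021']
```

Widening the two dates to cover whole months (for example 2014-10-01 to 2014-11-30) addresses the multi-month case in the question without listing each table.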
Is it possible to use the FLATTEN and TABLE_QUERY functions together?
It looks like TABLE_QUERY returns only the table name, and FLATTEN requires the dataset as well.
FROM Clause:
FROM FLATTEN(TABLE_QUERY(nbr_pcrf, 'table_id CONTAINS "dump_"'), quotas) d
Error:
Query Failed
Error: Table name cannot be resolved: dataset name is missing.
Job ID: nbr-data-storage:job_44jU_diWnh4tk27UxDxFP-I5Rbg
This is actually a little bit misleading: what is happening is that FLATTEN() with anything that isn't just a table name needs an extra set of parentheses to distinguish the field you're flattening by from a table. In other words, if you do
SELECT ... FROM FLATTEN(TABLE_QUERY(...), foo)
the foo field gets interpreted as a unioned table name (as in SELECT * from bar,foo).
The workaround for this issue is simple: add another set of parentheses around the TABLE_QUERY call. That is:
SELECT ... FROM
FLATTEN((TABLE_QUERY(nbr_pcrf, 'table_id CONTAINS "dump_"')), quotas)