I have a dataset that contains several tables that have suffixes in their name:
table_src1_serie1
table_src1_serie2
table_src2_opt1
table_src2_opt2
table_src3_type1_v1
table_src3_type2_v1
table_src3_type2_v2
I know that I can use this type of query in BigQuery:
select * from `project.dataset.table_*`
to get all the rows from these different tables.
What I am trying to achieve is to have a column that contains, for instance, the type of source (src1, src2, src3).
Assuming the schema of all the tables is the same, you can add the expression below to your select list (for BigQuery Standard SQL):
SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] AS src
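For instance, a minimal sketch of the full query (assuming `project.dataset` stands in for your actual project and dataset, and that all the table_* tables share the same schema):

SELECT
  SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] AS src,  -- yields src1, src2, src3
  *
FROM `project.dataset.table_*`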
I want to list the table prefixes which can be used as wildcards in BigQuery.
My dataset has a table list similar to the following:
events_122022
events_122021
events_122020
...
...
events_112012
...
...
analytics_122022
analytics_122021
analytics_122020
...
...
analytics_112012
These tables are created dynamically and I have no information on the table prefixes used.
Is there a way to find this list of prefixes dynamically?
The result should be: [events_, analytics_]
My attempt:
Find the tables with similar DDL using the following SQL:
SELECT
  SUBSTR(ddl, STRPOS(ddl, '(')) AS commonDDL,
  STRING_AGG(table_name) AS table
FROM dataset.INFORMATION_SCHEMA.TABLES
GROUP BY SUBSTR(ddl, STRPOS(ddl, '('))
This gives the output as:
commonDDL           | table
--------------------|-----------------------------------------
(ID STRING, ...)    | events_122022, events_122021, ...
(NAME STRING, ...)  | analytics_112022, analytics_112021, ...
Now, using a longest common prefix (longest common shared start) algorithm, I can find the required result.
(Longest common start code here )
What are the other ways we can approach this problem?
I couldn't find anything in the BigQuery docs.
Note: I only have read-only permission for the BigQuery dataset.
What about finding your tables with a regex:
select table_name
from yourds.INFORMATION_SCHEMA.TABLES
where regexp_contains(table_name, r"_[0-9]+$")
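To go a step further and return the distinct prefixes themselves ([events_, analytics_]), a rough sketch, assuming every wildcard-able table name ends in an underscore followed by digits:

SELECT DISTINCT REGEXP_EXTRACT(table_name, r'^(.+_)[0-9]+$') AS prefix
FROM yourds.INFORMATION_SCHEMA.TABLES
WHERE REGEXP_CONTAINS(table_name, r'_[0-9]+$')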
Is there any query that returns the source table names and target table names for a given mapping or mapping ID in Informatica? This has been very hard and challenging.
For example, when we run SELECT * FROM opb_mapping WHERE mapping_name LIKE '%CY0..%'
it returns some details, but I cannot find the source table names and target table names. Help if you can.
Thanks.
You can use the view below to get the data (assuming you have full access to the metadata tables):
select source_name, source_field_name,
       target_name, target_column_name,
       mapping_name, subject_name as folder
from REP_FLD_MAPPING
where mapping_name like '%xx%';
The only issue I can see is that if you have a SQL override, you need to check that SQL for the true source.
What I'm trying to do is pull out a subset of tables from a Google BigQuery dataset based on their names, and then add those tables to a Tableau data source without having to join or union any of them.
I want to pull out all the tables beginning with System_1 from the dataset below:
System_1_Start
System_1_Middle
System_1_End
System_2_Start
System_2_Middle
System_2_End
System_3_Start
System_3_Middle
System_3_End
I have been able to do a wildcard search using INFORMATION_SCHEMA.TABLES to get the names of all the tables that begin with System_1, but I cannot figure out a way to then get all of the tables with those names as the output of the query (SQL below):
SELECT table_name AS matchingTables
FROM dataset.INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE 'System_1%'
How do I go about extracting those tables and not just the names of those tables?
~~~~~~~~~~~EDIT~~~~~~~~~~~~~
This is my best approximation of how I could have done this, but I'm getting a "dataset not found" error, which is strange:
SELECT *
FROM dataset
WHERE (SELECT table_name FROM dataset.INFORMATION_SCHEMA.TABLES)
LIKE 'System_1%'
Error received:
Not found: Dataset dataset:dataset was not found in location EU
If I understand correctly you want the list of the table names in your dataset that start with "System_1_". You can obtain that with the following:
SELECT CONCAT("System_1_", _TABLE_SUFFIX) AS table_name
FROM `dataset.System_1_*`
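If you also want the rows from all of those tables, not just their names, in a single result, a rough sketch is the wildcard query below (again assuming `project.dataset` stands in for your actual project and dataset, and that the System_1_* tables share a schema):

SELECT CONCAT('System_1_', _TABLE_SUFFIX) AS source_table, *
FROM `project.dataset.System_1_*`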
Is there a way to compare two tables and then use the case function?
I am trying to add a new column based on the Exists transformation. In SQL I do it like this:
isnull((select 'YES' from sales where salesperson = t1.salesperson group by salesperson), 'NO') AS registeredSales
T1 is personal.
Or should I include the table in the join stream and then use the case() function to compare the two columns?
If there's another way to compare these two streams, I would be pleased to hear it.
Thanks.
Flat files in a data lake can also be compared. We can use a Derived Column in the data flow to generate a new column.
I created a data flow demo that contains two sources: CustomerSource (customer.csv stored in datalake2) and SalesSource (sales.csv stored in datalake2, containing only one column), as follows.
Then I join the two sources on the column CustomerId.
Then I use a Select transformation to give an alias to the CustomerId from SalesSource.
In the Derived Column, I select Add column and enter the expression iifNull(SalesCustomerID, 'NO', 'YES') to generate a new column named 'registeredSales', as follows:
The last column of the result shows:
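For reference, the join plus derived column above is roughly equivalent to this SQL pattern (a sketch only; personal and sales are the table names taken from the question):

SELECT p.*,
       CASE WHEN s.salesperson IS NULL THEN 'NO' ELSE 'YES' END AS registeredSales
FROM personal p
LEFT JOIN (SELECT DISTINCT salesperson FROM sales) s
  ON s.salesperson = p.salesperson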
In Hue/Hive,
Describe mytablename;
gives the list of columns, their types and comments. Is there any way to query in Hive, treating the result of describe as a table?
For example, I want to count the number of numeric/character/specific-type columns, filter column names, get the total number of columns (currently this requires scrolling down 100 rows at a time, which is a hassle with 1000+ columns), etc.
Queries such as
select count(*) from (Describe mytablename);
select count(*) from (select * from describe mytablename);
are of course invalid
Any ideas ?
You can create a SQL file, hive.sql, containing "describe dbname.tablename", and dump its output to a file:
hive -f hive.sql > /path/file.txt
Then create a table to hold that output:
create table dbname.desc
(
  name String,
  type String,
  desc String
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Then load the data from that file into the table:
load data local inpath '/path/file.txt' into table dbname.desc;
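With that in place, the counts and filters from the question become ordinary queries, for example (a sketch, assuming the describe output loaded cleanly into the three columns):

select count(*) from dbname.desc;                       -- total number of columns
select type, count(*) from dbname.desc group by type;   -- columns per data type
select name from dbname.desc where name like '%id%';    -- filter column names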