In BigQuery, which INFORMATION_SCHEMA view contains the table name, columns, and column comments?
We have Informatica connected to GCP BigQuery. In the Informatica target mapping we chose the option to create a backend table and load the data. The load succeeds, but the target table does not carry the source columns' comments, and we could not find an option in the Informatica tool to retrieve them.
The source system has 100+ tables with 200 to 400 columns each, and every column has a comment, but the Informatica tool has no feature to read the source table columns' comments.
We are looking for a way to update the column comments for all tables in the BigQuery metadata without running ALTER TABLE with column comments on each table.
Please let me know whether there is an INFORMATION_SCHEMA view that has the tables, columns, and column comments.
thanks,
PM
You would likely want to check out the TABLE_OPTIONS and COLUMN_FIELD_PATHS views. The linked docs also provide sample queries.
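For the column comments specifically, the COLUMN_FIELD_PATHS view exposes a `description` field. A minimal sketch, assuming a dataset named `mydataset` (replace with your own):

```sql
-- List every column description ("comment") in a dataset.
SELECT table_name, column_name, description
FROM mydataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE description IS NOT NULL
ORDER BY table_name, column_name;
```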
Related
Does BigQuery provide any SQL commands for retrieving table cardinality?
For example, some RDBMS providers have sql commands like:
show table_stats schemaname tablename
for getting table cardinality.
Also, what about column stats, like the number of distinct values in a column, MIN, MAX, etc.?
I saw that the BigQuery console provides both table and column stats, but I wonder whether this info is accessible through SQL statements.
Thanks!
The features you are asking about belong to the SQL dialect rather than to the tool or service itself.
To get stats about tables and columns, see the Getting table metadata documentation, which explains how to query metadata for both. Among the information returned by the queries in that doc:
For tables: the name of the dataset that contains the table, the default table lifetime in days, and the rest of the TABLE_OPTIONS view results.
For columns: the name of the project that contains the dataset, the column's standard SQL data type, and whether the value is updatable, stored, or hidden. See the COLUMNS view for the full list of results.
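As a sketch of what those two views look like in practice (assuming a dataset named `mydataset`):

```sql
-- Table-level metadata, e.g. expiration and description options:
SELECT table_name, option_name, option_value
FROM mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS;

-- Column-level metadata: type, nullability, and the updatable/hidden flags:
SELECT table_name, column_name, data_type, is_nullable, is_updatable, is_hidden
FROM mydataset.INFORMATION_SCHEMA.COLUMNS;
```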
To get stats about columns, you can use COUNT(DISTINCT ...) for an exact count of unique values, or APPROX_COUNT_DISTINCT for a statistical approximation that is cheaper on large columns.
I found this Community blog post, which shows several examples and ways to count unique values, and even explains how to increase the approximation threshold.
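A small example against one of the public datasets, showing the exact and approximate distinct counts alongside MIN/MAX:

```sql
SELECT
  COUNT(DISTINCT word)        AS exact_distinct,   -- exact, but more expensive
  APPROX_COUNT_DISTINCT(word) AS approx_distinct,  -- statistical approximation
  MIN(word_count)             AS min_count,
  MAX(word_count)             AS max_count
FROM `bigquery-public-data.samples.shakespeare`;
```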
EDIT
It seems that BigQuery does not offer a built-in per-field count of unique values in its metadata. However, you can always take a look at the Schema and Details tabs in the BigQuery UI, where the fields' names are shown along with their type and description.
Example from the Public Datasets (screenshot of the Schema tab omitted):
Hope this is helpful.
I am attempting to fix the schema of a BigQuery table in which the type of a field is wrong (but the field contains no data). I would like to copy the data from the old schema to the new one using the UI ( select * except(bad_column) from ... ).
The problem is that:
if I select into a table, BigQuery strips the REQUIRED mode from the output columns and therefore rejects the insert.
Exporting via json loses information on dates.
Is there a better solution than creating a new table with all columns being nullable/repeated or manually transforming all of the data?
Update (2018-06-20): BigQuery now supports required fields on query output in standard SQL, and has done so since mid-2017.
Specifically, if you append your query results to a table with a schema that has required fields, that schema will be preserved, and BigQuery will check as results are written that it contains no null values. If you want to write your results to a brand-new table, you can create an empty table with the desired schema and append to that table.
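A minimal sketch of that approach, with placeholder table and column names:

```sql
-- Create an empty table whose schema marks fields as REQUIRED (NOT NULL):
CREATE TABLE mydataset.new_table (
  id      INT64  NOT NULL,
  created DATE   NOT NULL,
  note    STRING
);

-- Appending query results preserves the schema; BigQuery rejects the write
-- if any REQUIRED column would receive a NULL:
INSERT INTO mydataset.new_table
SELECT * EXCEPT(bad_column) FROM mydataset.old_table;
```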
Outdated:
You have several options:
Change your field modes to NULLABLE. Standard SQL returns only nullable fields, and this is intended behavior, so going forward it may be less useful to mark fields as required.
You can use legacy SQL, which preserves REQUIRED fields. You can't use except, but you can explicitly select all of the other fields.
You can export and re-import with the desired schema.
You mention that export via JSON loses date information. Can you clarify? If you're referring to the partition date, then unfortunately I think any of the above solutions will collapse all data into today's partition, unless you explicitly insert into a named partition using the table$yyyymmdd syntax. (Which will work, but may require lots of operations if you have data spread across many dates.)
BigQuery now supports table clones. A table clone is a lightweight, writable copy of another table.
Copy tables from query in Bigquery
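In standard SQL DDL a clone is created like this (the table names here are placeholders):

```sql
-- A clone shares storage with the source at creation time; you are billed
-- only for the delta as the two tables diverge.
CREATE TABLE mydataset.my_table_clone
CLONE mydataset.my_table;
```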
Our use case for BigQuery is a little unique. I want to start using date-partitioned tables, but our data is very much eventual: it isn't inserted when it occurs, but eventually, when it's provided to the server. At times this can be days or even months after the fact. Thus, the _PARTITION_LOAD_TIME attribute is useless to us.
My question: is there a way to specify a column that would act like the _PARTITION_LOAD_TIME attribute and still get the benefits of a date-partitioned table? If I could emulate this manually and have BigQuery update accordingly, then I could start using date-partitioned tables.
Anyone have a good solution here?
You don't need to create your own column.
The _PARTITIONTIME pseudo column will still work for you!
All you need to do is insert/load each batch of data into its respective partition by referencing not just the table name but the table with a partition decorator, like yourtable$20160718.
This way you load the data into the partition it belongs to.
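For example, with the bq CLI (the bucket, file, and table names below are placeholders):

```shell
# Load a batch into the exact partition it belongs to, regardless of when
# the load runs, by appending the $YYYYMMDD decorator to the table name:
bq load --source_format=NEWLINE_DELIMITED_JSON \
  'mydataset.yourtable$20160718' \
  gs://mybucket/batch-2016-07-18.json
```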
I have a table PatientChartImages. It has a column ChartImage, which contains the binary of the images. Now, we are planning to create a separate table that will contain the image binaries, and to join the two tables to get the requisite data. We do not want to change the front end, and I cannot use triggers. So, is there any way that a query referring to the ChartImage column of PatientChartImages could pick up the data from the third table? Please suggest.
I think an inner join does this. I've only overheard its use, but this might lead you towards your answer.
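If the binaries are moved into a separate table, a join along the shared key would reassemble the original shape. A rough sketch, with hypothetical table and column names since the actual schema isn't shown:

```sql
-- PatientChartImages keeps the chart metadata; ChartImages (hypothetical
-- name) holds the image binaries, linked by PatientChartImageId.
SELECT p.PatientChartImageId, c.ChartImage
FROM PatientChartImages AS p
INNER JOIN ChartImages AS c
  ON c.PatientChartImageId = p.PatientChartImageId;
```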
Is it possible to construct a query to find out which tables have a particular column, and then, if a table has the column, query that table for an ID number?
Sure, I think you can look at the contents of the X$Field system table, described here:
http://ww1.pervasive.com/library/docs/psql/794/sqlref/sqlsystb.html
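As a sketch, X$Field can be joined back to X$File to resolve the table names (the column name 'PatientID' below is just an example):

```sql
-- Xe$File in X$Field references Xf$Id in X$File, so joining the two
-- yields every table that defines the given column:
SELECT f.Xf$Name AS table_name
FROM X$Field fld
INNER JOIN X$File f ON f.Xf$Id = fld.Xe$File
WHERE fld.Xe$Name = 'PatientID';
```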