Creating a table from a BigQuery query, with field descriptions

I am using BigQuery DDL to create tables from queries, for example:
CREATE OR REPLACE TABLE `test.bens_test_table` AS
SELECT "test data" AS my_field
When I create tables this way, the fields have no descriptions in the BigQuery UI.
I would like to add field and table descriptions when creating a table. I have been reading through the documentation and I can create an empty table with field and table descriptions:
CREATE OR REPLACE TABLE `test.bens_test_table`
(my_field STRING OPTIONS(description="A description."),
my_int INT64 OPTIONS(description="A number field.")
)
OPTIONS(
  description="Hello, this is my test table.", 
  friendly_name="My friendly table"
)
However, when I try to combine the two to create a table from a query, I get an error when adding the descriptions:
CREATE OR REPLACE TABLE `test.bens_test_table` AS
(SELECT "test data" AS my_field)
OPTIONS(
  description="Hello, this is my test table.", 
  friendly_name="Ben is cool"
)
Can this be done using the BigQuery UI?

Change the order of the OPTIONS and AS clauses: OPTIONS must come before AS. See the BigQuery CREATE TABLE documentation for the full syntax.
CREATE OR REPLACE TABLE `test.bens_test_table`
(my_field STRING OPTIONS(description="A description."),
my_int INT64 OPTIONS(description="A number field.")
)
OPTIONS(
description="Hello, this is my test table.",
friendly_name="Ben is cool"
)
AS (SELECT "test data" AS my_field, 1 AS my_int)

Related

How to add columns in BigQuery to a table with no schema without deleting its current labels in SQL?

I run BigQuery jobs from Python code that first creates an empty table in BigQuery for the results, with specific labels and a description.
Later, in the BigQuery SQL, I insert the results into that empty table. The only problem is that I can't use ALTER TABLE to add columns to a table with no schema, and I can't add the schema beforehand because the SQL query is created dynamically by the Python code.
The only way I found to solve this was to create the table with a column called 'x' and then remove it at the end of the SQL query.
Here is an idea of what the code looks like:
CREATE TEMP FUNCTION
... very_complicated_function ...;
CREATE TEMP TABLE features AS
... very_clever_code ...;
ALTER TABLE `table.created.by_python`
ADD COLUMN IF NOT EXISTS key INT64,
ADD COLUMN IF NOT EXISTS feature1 INT64;
ALTER TABLE `table.created.by_python` DROP COLUMN x;
INSERT INTO `table.created.by_python`
SELECT * except(nearest_centroids_distance)
from
ML.PREDICT(MODEL `brilliant.genius.amazing`,
(SELECT * FROM features)) M
Ideally, I would just insert the data into the empty table and let BigQuery create the schema itself if it doesn't exist.
You can add an empty column to an existing table by:
Using the Cloud Console
Using the bq command-line tool's bq update command
Calling the tables.patch API method
Using the ALTER TABLE ADD COLUMN data definition language (DDL) statement.
Using the client libraries.
Here’s some Python code you can try, to see if it helps in your case.
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# TODO(developer): Set table_id to the ID of the table
# to add an empty column.
table_id = "your-project.your_dataset.your_table_name"
table = client.get_table(table_id)  # Make an API request.
original_schema = table.schema
new_schema = original_schema[:]  # Creates a copy of the schema.
new_schema.append(bigquery.SchemaField("phone", "STRING"))
table.schema = new_schema
table = client.update_table(table, ["schema"])  # Make an API request.
if len(table.schema) == len(original_schema) + 1 == len(new_schema):
    print("A new column has been added.")
else:
    print("The column has not been added.")
Also, see the BigQuery documentation on modifying table schemas for more detail on adding a new column to a table.
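Since the query in the question is generated by Python anyway, the same code can also generate the ALTER TABLE statement, so the added columns always match what the query produces. A minimal sketch, assuming the column names and BigQuery types are known when the query is built, and that the table already has at least one column (as with the question's 'x' workaround); `build_add_columns` is a hypothetical helper:

```python
def build_add_columns(table: str, columns: dict) -> str:
    """Build one ALTER TABLE statement that adds each column if it is missing.

    columns maps column name -> BigQuery type, e.g. {"key": "INT64"}.
    """
    if not columns:
        raise ValueError("at least one column is required")
    clauses = ",\n".join(
        f"ADD COLUMN IF NOT EXISTS {name} {col_type}"
        for name, col_type in columns.items()
    )
    return f"ALTER TABLE `{table}`\n{clauses};"


print(build_add_columns("table.created.by_python", {"key": "INT64", "feature1": "INT64"}))
```

This prints the same ALTER TABLE statement that appears in the question, built from a dict instead of being hard-coded.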

BigQuery: Run queries to create table and append to the table if table exist

Is it possible to run a query that creates a table if it does not exist and appends to the table if it already exists? I'd like to write a single query to create or append. Note: I am using the Admin console for now, but will be using the API eventually.
I have the following query:
CREATE TABLE IF NOT EXISTS `project_1.dataset_1.tabe_1`
OPTIONS(
description="Some desc"
) AS
SELECT *
FROM source_table
I get the following error:
A table named project_1.dataset_1.tabe_1 already exists.
The query below creates a table named 'table_1' under 'project_1.dataset_1' if it does not exist, and appends to the table if it already exists.
IF
(
SELECT
COUNT(1)
FROM
`project_1.dataset_1.__TABLES_SUMMARY__`
WHERE
table_id='table_1') = 0
THEN
CREATE OR REPLACE TABLE
`project_1.dataset_1.table_1` AS
SELECT
'apple' AS item,
'fruit' AS category
UNION ALL
SELECT
'leek',
'vegetable';
ELSE
INSERT
`project_1.dataset_1.table_1` ( item,
category )
SELECT
'lettuce' AS item,
'vegetable' AS category
UNION ALL
SELECT
'orange',
'fruit';
END IF;
This looks like a good opportunity to use scripting to accomplish this within a single query.
See the BigQuery scripting documentation on adding control flow to a query to handle an error (e.g., if CREATE TABLE fails because the table already exists). In the exception handler, you could then run an INSERT ... SELECT statement as needed.
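BigQuery scripting supports a BEGIN ... EXCEPTION WHEN ERROR THEN ... END block, so the "try to create, otherwise insert" idea can be written as one script. A minimal Python sketch that only builds the script text, assuming the SELECT's columns line up positionally with the table's columns; `build_create_or_append_script` is a hypothetical helper:

```python
def build_create_or_append_script(table: str, select_sql: str) -> str:
    """Try CREATE TABLE ... AS first; if it raises any error, INSERT instead.

    Caveat: WHEN ERROR catches every error, not only "table already exists",
    so a failure in the SELECT itself would also fall through to the INSERT.
    """
    return (
        "BEGIN\n"
        f"  CREATE TABLE `{table}` AS\n  {select_sql};\n"
        "EXCEPTION WHEN ERROR THEN\n"
        f"  INSERT INTO `{table}`\n  {select_sql};\n"
        "END;"
    )


print(build_create_or_append_script("project_1.dataset_1.table_1", "SELECT * FROM source_table"))
```

Because the whole script runs as a single job, there is no gap between the existence check and the write.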
You can do this via the API as well if you prefer: issue the tables.get equivalent in your chosen library/language to check whether the table exists, and then run the appropriate query based on the outcome of that check.
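The client-side variant boils down to choosing one of two statements. A minimal sketch, assuming the existence check has already been done; with the google-cloud-bigquery library that check is typically `client.get_table(table_id)`, which raises `google.api_core.exceptions.NotFound` for a missing table (not shown here, to keep the sketch self-contained). `create_or_append_sql` is a hypothetical helper:

```python
def create_or_append_sql(table: str, select_sql: str, table_exists: bool) -> str:
    """Pick the statement to run after checking whether `table` exists."""
    if table_exists:
        # Append: column order in the SELECT must match the existing table.
        return f"INSERT INTO `{table}`\n{select_sql}"
    # First run: create the table from the query result.
    return f"CREATE TABLE `{table}` AS\n{select_sql}"


print(create_or_append_sql("project_1.dataset_1.table_1", "SELECT * FROM source_table", False))
```

Note that the check and the query are two separate requests, so a concurrent job could create the table in between; the scripting approach described above avoids that.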

BigQuery Equivalent of "CREATE TABLE my_table (LIKE your_table)"

I want to create a table which schema is exactly the same as another table. In other SQL engines, I think I was able to use "CREATE TABLE my_table (LIKE your_table)" or some variations.
I couldn't find the equivalent in BigQuery yet. Is this possible in some fashion?
Use this form:
CREATE TABLE dataset.new_table AS
SELECT *
FROM dataset.existing_table
LIMIT 0
This creates a new table with the same schema as the old one, and there is no cost due to the LIMIT 0.
Note, however, that this does not preserve partitioning, the table description, etc. Another option is to use the CLI (or API) to copy the table and then overwrite its contents, e.g.:
$ bq cp dataset.existing_table dataset.new_table
$ bq query --use_legacy_sql=false --replace --destination_table=dataset.new_table \
"SELECT * FROM dataset.new_table LIMIT 0;"
Now the new table has the same structure and attributes as the original did.
To create a partitioned and/or clustered table, the syntax would be:
CREATE TABLE project.dataset.clustered_table
PARTITION BY DATE(created_time)
CLUSTER BY
account_id
AS SELECT * FROM project.dataset.example_table LIMIT 0
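Since the LIMIT 0 copy does not carry over partitioning or clustering, those specs have to be restated by hand. A minimal Python sketch that assembles such a statement, assuming you already know the source table's partitioning expression and clustering columns; `build_empty_copy` is a hypothetical helper:

```python
def build_empty_copy(new_table, source_table, partition_by=None, cluster_by=None):
    """CREATE TABLE that copies source_table's schema but no rows (LIMIT 0)."""
    parts = [f"CREATE TABLE `{new_table}`"]
    if partition_by:
        parts.append(f"PARTITION BY {partition_by}")
    if cluster_by:
        parts.append("CLUSTER BY " + ", ".join(cluster_by))
    # LIMIT 0 keeps the schema while copying no data.
    parts.append(f"AS SELECT * FROM `{source_table}` LIMIT 0")
    return "\n".join(parts)


print(build_empty_copy(
    "project.dataset.clustered_table",
    "project.dataset.example_table",
    partition_by="DATE(created_time)",
    cluster_by=["account_id"],
))
```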

Hive - Create Table statement with 'select query' and 'fields terminated by' commands

I want to create a table in Hive using a SELECT statement that takes a subset of the data from another table. I used the following query to do so:
create table sample_db.out_table as
select * from sample_db.in_table where country = 'Canada';
When I look at the HDFS location of this table, there are no visible field separators (Hive's default field delimiter, \001, is a non-printing character).
But I need to create a table with filtered data from another table, along with a field separator. For example, I am trying to do something like:
create table sample_db.out_table as
select * from sample_db.in_table where country = 'Canada'
ROW FORMAT SERDE
FIELDS TERMINATED BY '|';
This doesn't work, though. I know the alternative is to create the table structure with field names and FIELDS TERMINATED BY '|' first, and then load the data.
But is there any way to combine the two into a single query that creates a table with filtered data from another table and also sets a field separator?
Put ROW FORMAT DELIMITED ... in front of AS SELECT, like this (replace the query with your own):
hive> CREATE TABLE ttt row format delimited fields terminated by '|' AS select *,count(1) from t1 group by id ,name ;
Query ID = root_20180702153737_37802c0e-525a-4b00-b8ec-9fac4a6d895b
Here is the result:
[root@hadoop1 ~]# hadoop fs -cat /user/hive/warehouse/ttt/**
2|\N|1
3|\N|1
4|\N|1
As you can see in the documentation, with a CTAS (CREATE TABLE AS SELECT) statement, the ROW FORMAT clause (in fact, every setting for the new table) goes before the SELECT statement.
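The sample output also shows `\N` in the middle column: that is how Hive's default text format writes NULL. If you read these '|'-delimited files outside Hive, a minimal Python sketch could turn them back into rows, assuming the default NULL marker and that no value contains '|' or a newline (this format does no quoting). `parse_hive_delimited` is a hypothetical helper:

```python
from typing import List, Optional

HIVE_NULL = r"\N"  # Hive's default NULL marker in text-format tables


def parse_hive_delimited(text: str, sep: str = "|") -> List[List[Optional[str]]]:
    """Split Hive text-file output into rows, turning the \\N marker into None.

    Assumes no escaping or quoting: values never contain `sep` or newlines.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        rows.append([None if value == HIVE_NULL else value for value in line.split(sep)])
    return rows


sample = "2|\\N|1\n3|\\N|1\n4|\\N|1\n"
print(parse_hive_delimited(sample))
```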

Google BigQuery: how to create a new column with SQL

I would like to add a column to an already existing table without using legacy SQL.
The basic SQL syntax for this is:
ALTER TABLE table_name
ADD column_name datatype;
I formatted the query for Google BigQuery:
ALTER TABLE `projectID.datasetID.fooTable`
ADD (barColumn date);
But then I get this syntax error:
Error: Syntax error: Expected "." or keyword SET but got identifier "ADD" at [1:63]
So how do I format the SQL properly for Google BigQuery?
Support for ALTER TABLE ADD COLUMN was released on 2020-10-14, per the BigQuery release notes.
So the statement as originally proposed should now work with minimal modification:
ALTER TABLE `projectID.datasetID.fooTable`
ADD COLUMN barColumn DATE;
Before that release, BigQuery did not support ALTER TABLE or other DDL statements, and the suggestion was to submit a feature request. In the meantime, you needed to either open the table in the BigQuery UI and add the column with the "Add New Field" button or, if you were using the API, use tables.update.
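Another option is to create a new table from the old one, adding the new columns in the SELECT. Given:
my_old_table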
a,b,c
1,2,3
2,3,4
CREATE TABLE IF NOT EXISTS my_dataset.my_new_table
AS
SELECT a,b,c,
"my_string" AS d, current_timestamp() as timestamp
FROM my_dataset.my_old_table
my_new_table
a,b,c,d,timestamp
1,2,3,my_string,2020-04-22 17:09:42.987987 UTC
2,3,4,my_string,2020-04-22 17:09:42.987987 UTC
schema:
a: integer
b: integer
c: integer
d: string
timestamp: timestamp
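I hope this is clear.
This way you can effectively add new columns to an existing table by copying it into a new table with the extra columns; each new column's data type follows from its expression (use CAST to set it explicitly). After that, you can delete the old table if necessary.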