Google BigQuery list tables - google-bigquery

I need to list all tables in my BigQuery project, but I don't know how to do it; I tried searching but didn't find anything about it.
I need to know whether a table exists: if it exists I search for a record, and if not I create the table and insert the record.

Depending on where/how you want to do this, you can use the CLI, API calls, or client libraries. The documentation on listing tables covers all of these options.
As an example, if you want to list them using the command-line interface, you can do it like this:
bq ls <project>:<dataset>
If you want to use normal SQL queries, you can use the INFORMATION_SCHEMA beta feature:
SELECT table_name from `<project>.<dataset>.INFORMATION_SCHEMA.TABLES`
(project is optional)
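If you want to handle the exists-or-create logic programmatically, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, schema, and lookup filter below are placeholders, not anything from your setup.

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

client = bigquery.Client(project="my-project")   # placeholder project
table_id = "my-project.my_dataset.my_table"      # placeholder table

try:
    client.get_table(table_id)                   # raises NotFound if the table is missing
    # Table exists: search for the record.
    rows = client.query(
        "SELECT * FROM `{}` WHERE id = 123".format(table_id)
    ).result()
except NotFound:
    # Table does not exist: create it and insert the record.
    schema = [
        bigquery.SchemaField("id", "INTEGER"),
        bigquery.SchemaField("name", "STRING"),
    ]
    client.create_table(bigquery.Table(table_id, schema=schema))
    client.insert_rows_json(table_id, [{"id": 123, "name": "example"}])

Note that streaming inserts into a freshly created table can take a short while to become queryable.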

Related

See managed tables in Databricks AWS

I need to identify and list all managed tables in a Databricks AWS workspace. I can see that manually in the table details, but I need to do this for several thousand tables across different databases, and I cannot find a way to automate it.
The only way I found to tell programmatically whether a table is managed or external is the DESCRIBE TABLE EXTENDED command, but that returns it as a value in a column, and it cannot be used with SELECT or WHERE to filter, even if I try running it as a subquery.
What is the easiest way to filter the managed tables?
spark.sql('use my_database')
df = spark.sql('show tables in my_database')
for t in df.collect():
    print('table {}'.format(t.tableName))
    display(spark.sql('describe table extended {}'.format(t.tableName))
            .where("col_name = 'Type' and data_type = 'MANAGED'"))
    # use an if condition to filter on the MANAGED data_type and collect the database and table names
    # loop over all databases using "show databases" in an outer loop
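Building on those comments, a rough sketch (untested) of the outer loop that collects only the managed tables across all databases could look like this; note that the column returned by "show databases" may be named databaseName or namespace depending on your Spark version.

managed_tables = []
for db in spark.sql('show databases').collect():
    for t in spark.sql('show tables in {}'.format(db.databaseName)).collect():
        type_rows = (spark.sql('describe table extended {}.{}'.format(db.databaseName, t.tableName))
                     .where("col_name = 'Type' and data_type = 'MANAGED'")
                     .collect())
        if type_rows:  # non-empty only when the table is managed
            managed_tables.append((db.databaseName, t.tableName))
print(managed_tables)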

BigQuery: How to structure multiple SQL codes into one and write data into tables

I have 2 SQL queries in BigQuery and I want them to run one after another. How should I build this data pipeline and automate it?
SELECT
  a,
  b
INTO Table2
FROM Table1;

SELECT
  a,
  b
INTO Table3
FROM Table2;
You can simply use a BigQuery DDL statement to create table2 and then use it in the next query to create table3:
CREATE OR REPLACE TABLE `YOUR_PROJECT.YOUR_DATASET.table2` AS
SELECT a, b FROM `YOUR_PROJECT.YOUR_DATASET.table1`;
CREATE OR REPLACE TABLE `YOUR_PROJECT.YOUR_DATASET.table3` AS
SELECT a, b FROM `YOUR_PROJECT.YOUR_DATASET.table2`;
NOTE: Change YOUR_PROJECT and YOUR_DATASET to what you are using.
It depends on the kind of automation needed. For example, you may create tables using multiple CREATE TABLE statements and then schedule them to run at a certain frequency.
A quick route to scheduling queries is to use the Google Cloud Console: select your project and open the BigQuery editor, type in the multiple SQL statements, each ending with a semicolon, and use the scheduled query option to schedule them.
More at:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#creating_a_new_table
https://cloud.google.com/bigquery/docs/scheduling-queries
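If you prefer to drive this from code instead of the console, a minimal sketch (assuming the google-cloud-bigquery Python client and the placeholder names above) would submit both statements as one multi-statement script:

from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT")

sql = """
CREATE OR REPLACE TABLE `YOUR_PROJECT.YOUR_DATASET.table2` AS
SELECT a, b FROM `YOUR_PROJECT.YOUR_DATASET.table1`;

CREATE OR REPLACE TABLE `YOUR_PROJECT.YOUR_DATASET.table3` AS
SELECT a, b FROM `YOUR_PROJECT.YOUR_DATASET.table2`;
"""

# The statements run in order as a single multi-statement query (script).
client.query(sql).result()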
First, it's probably better to describe your use case. There are a lot of automation tools that can be used. Supposing you want to automate it, you can:
Check whether the table already exists first, and then create it if necessary.
Do a create or replace.
The first option usually works if you want to do an update, for example a daily update to your table where you keep appending data. The second option works if you only want to keep the latest state of your table.

Finding Bigquery table size across all projects

We are maintaining a table in BigQuery that captures all the activity logs from the Stackdriver logs. This table lets me list all the tables present, the user who created each table, the last command run on the table, etc., across projects and datasets in our organization. Along with this information, I also want the table size for the tables I am trying to check.
I can join with the TABLES and TABLE_SUMMARY views; however, I need to explicitly specify the project and dataset I want to query, but my driving table has details of multiple projects and datasets.
Is there any other metadata table I can get the table size from, or any logs that I can load into a BigQuery table to join and get the desired results?
You can use the bq command line tool. With the command:
bq show --format=prettyjson <project>:<dataset>.<table>
This provides the numBytes, datasetId, projectId and more.
With a script you can use:
bq ls
and loop through the datasets and tables in each project to get the information needed. Keep in mind that you can also use the API or a client library.
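As a sketch of that loop with the Python client library (the size comes from the table metadata, the same numBytes field that bq show prints); the project list here is a placeholder:

from google.cloud import bigquery

for project in ["project-a", "project-b"]:        # placeholder project list
    client = bigquery.Client(project=project)
    for ds in client.list_datasets():
        for item in client.list_tables(ds):
            table = client.get_table(item.reference)  # full metadata, including numBytes
            print(project, ds.dataset_id, table.table_id, table.num_bytes)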

Append one table to another

Is there currently a way to append data from one table to another via the API and PHP?
For instance:
I have two tables;
today
all_time
At the end of every day I want to append today into all_time; both tables use the same schema.
It's possible: you just need to pass in the async query configuration writeDisposition=WRITE_APPEND and set up the destination table.
Read about writeDisposition here: https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource
Other than this, you can directly write the results of the query to a table in query mode, using the Destination Table option that is available under Show Options.
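For reference, here is the same job configuration as a Python sketch (untested); the PHP client takes the equivalent writeDisposition setting in its query job options. Project, dataset, and table names are placeholders.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my-project.my_dataset.all_time"),
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Appends today's rows onto all_time; both tables share the same schema.
client.query("SELECT * FROM `my-project.my_dataset.today`", job_config=job_config).result()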

Google Bigquery create table with no code? syntax

I was hoping to use basic SQL CREATE TABLE syntax within Google BigQuery to create a table based on columns in 2 existing tables already in BQ. The Google SQL dialect reference does not show a CREATE statement. All of the documentation seems to imply that I need to know how to code.
Is there any syntax or way to do a
CREATE TABLE XYZ AS
SELECT ABC.123, DFG.234
from ABC, DFG
?
You cannot do it entirely through a SQL statement.
However, the UI does allow you to save results to a table (the max result size is 64 MB compressed). The API and command-line clients have the same capabilities.
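As a sketch of the API route (hypothetical names; adjust the join to your actual columns and keys), you can run the query with a destination table set, which mirrors the UI's Destination Table option:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my-project.my_dataset.XYZ"),
)

# Writes the joined result into my_dataset.XYZ instead of returning it to the client.
client.query(
    "SELECT ABC.field1, DFG.field2 "
    "FROM my_dataset.ABC AS ABC JOIN my_dataset.DFG AS DFG ON ABC.key = DFG.key",
    job_config=job_config,
).result()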