BigQuery Extract With Filter

Hello, I'm very new to BigQuery. My project needs to import data from a BigQuery result, and I'm facing some problems:
If I use bq extract, I cannot use a filter or WHERE condition.
If I use bq query and then save the result into files, our server cannot process such large data. Also, some fields in the data may contain comma and vertical bar characters.
Can I export or extract with a filter in the most efficient way?
Thank you

The EXPORT DATA statement allows you to combine a query with an extract to Cloud Storage in a single job.
See more details here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
EXPORT DATA OPTIONS(
  uri='gs://mybucket/my-path-prefix/*',
  format='CSV',
  header=true) AS
SELECT field1, field2, field3 FROM mydataset.mytable WHERE some_condition

Related

From BigQuery to Google Cloud Storage

I'd like to export data from BigQuery to Google Cloud Storage using a script, looping over multiple tables, saving each one in CSV format, and overwriting the existing files.
Also, how can we schedule this script?
If anybody has an answer, that would be a great help.
Thanks in advance
A common way to approach this problem is to use Airflow and write a DAG that meets your requirements.
But if you want to iterate over tables and dump them to GCS on a regular basis using only BigQuery, the following could be another option.
1. Export Data
You can export data to GCS with the EXPORT DATA statement in a BigQuery script.
EXPORT DATA OPTIONS(
  uri='gs://bucket/folder/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';') AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
2. Loops and Dynamic SQL
If you have a list of tables you want to dump, you can loop over them with a BigQuery FOR loop.
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#loops
You then need to generate the EXPORT DATA script dynamically for each table. To do so, you can use EXECUTE IMMEDIATE dynamic SQL (a combined sketch follows the link below).
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#execute_immediate
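Putting steps 1 and 2 together, a minimal sketch could look like the following. The dataset, bucket, and folder names are placeholders, and it assumes every base table in mydataset should be exported:
-- Sketch only: replace mydataset, bucket and folder with real names.
FOR record IN (
  SELECT table_name
  FROM mydataset.INFORMATION_SCHEMA.TABLES
  WHERE table_type = 'BASE TABLE'
)
DO
  -- Build an EXPORT DATA statement for the current table and run it.
  EXECUTE IMMEDIATE FORMAT("""
    EXPORT DATA OPTIONS(
      uri='gs://bucket/folder/%s/*.csv',
      format='CSV',
      overwrite=true,
      header=true) AS
    SELECT * FROM mydataset.%s
  """, record.table_name, record.table_name);
END FOR;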
3. Scheduling
BigQuery provides a feature to schedule a query, which you can use for this purpose.
https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries

Can I use the BigQuery EXPORT DATA statement and schedule the query?

I have a similar question to the one asked in this link: BigQuery - Export query results to local file/Google storage
I need to extract data from 2 BigQuery tables using joins and WHERE conditions, and the extracted data has to be placed in a file on Cloud Storage, mostly as a CSV file. I want to go with a simple solution. Can I use the BigQuery EXPORT DATA statement in Standard SQL and schedule it? Does it have a limitation of 1 GB per export? If yes, what is the best possible way to implement this? Creating another temp table to save the results from the query and using a Dataflow job to extract the data from the temp table? Please advise.
Basically, Google Cloud now supports this; please see the code snippet in the cloud documentation:
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#exporting_data_to_csv_format
I'm thinking I can use the above statement to export data into a file, where the SELECT query has a join of the 2 tables and the other conditions.
This query would be a scheduled query in BigQuery.
Any inputs, please?
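As a rough illustration of what is being described (a sketch only; the dataset, table, and column names below are placeholders, not taken from the question), an EXPORT DATA statement wrapping a two-table join can itself be saved as a scheduled query:
-- Sketch only: mydataset.orders / mydataset.customers and the bucket path are placeholders.
EXPORT DATA OPTIONS(
  uri='gs://mybucket/exports/result-*.csv',
  format='CSV',
  overwrite=true,
  header=true) AS
SELECT o.order_id, o.order_date, c.customer_name
FROM mydataset.orders o
JOIN mydataset.customers c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= '2021-01-01';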

Write BigQuery results to BigQuery in Datalab

I ran a query abc and got the result table m in Datalab.
Is there any way I can create a new table in BigQuery and write the content of table m to it in Datalab?
%%bq query --name abc
select *
from `test`

m = abc.execute().result()
Yes, you can. Datalab supports a number of BigQuery magic commands, among them create, which allows for the creation of datasets and tables.
See the documentation here:
http://googledatalab.github.io/pydatalab/datalab.magics.html
And specifically this section, which details all the available BigQuery commands:
positional arguments:
{sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
commands
sample Display a sample of the results of a BigQuery SQL
query. The cell can optionally contain arguments for
expanding variables in the query, if -q/--query was
used, or it can contain SQL for a query.
create Create a dataset or table.
delete Delete a dataset or table.
dryrun Execute a dry run of a BigQuery query and display
approximate usage statistics
udf Create a named Javascript BigQuery UDF
execute Execute a BigQuery SQL query and optionally send the
results to a named table. The cell can optionally
contain arguments for expanding variables in the
query.
pipeline Define a deployable pipeline based on a BigQuery
query. The cell can optionally contain arguments for
expanding variables in the query.
table View a BigQuery table.
schema View a BigQuery table or view schema.
datasets List the datasets in a BigQuery project.
tables List the tables in a BigQuery project or dataset.
extract Extract BigQuery query results or table to GCS.
load Load data from GCS into a BigQuery table.

Pricing of bq query with destination table

I'm running a BigQuery command-line query with a destination table,
i.e. bq query --destination_table with some SELECT statements from a source table.
Will this be considered loading data or querying data?
I ask because loading data is free, while querying data is going to cost.
My intention is to move some data from source to destination with some manipulation of the source fields, so bq query with a destination table looks like a perfect fit for this.
If you are running a query, then you are billed for the cost of the query. It doesn't matter whether you have specified a destination table. If you want to avoid the cost of querying, you need to extract the data, perform whatever transformation you want, and then load it again.

How to load CSV data which is Control+A separated into BigQuery

I'm trying to load a Control+A-separated CSV file into BigQuery. What should I pass for the -F parameter of the bq load command? All the options I have tried result in an error while loading.
I would guess that Control+A is used in some legacy formats that the OP wants to load into BigQuery. On the other hand, Control+A can be chosen when it is hard to select any of the usually used delimiters.
My recommendation would be to load your CSV file without any delimiter, so the whole row is loaded as one field.
Assume your rows loaded into TempTable look like below, with just one column called FullRow:
'value1^Avalue2^Avalue3'
where ^A is the "invisible" Control+A character.
After you have loaded your file into BigQuery, you can parse it into separate columns and write it to the final table with something like the below:
SELECT
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){0}(\w*)') AS col1,
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){1}(\w*)') AS col2,
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){2}(\w*)') AS col3
FROM TempTable
The above is confirmed to work, as I have used this approach multiple times. It works for both Legacy and Standard SQL.
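As a follow-up note (not part of the original answer): if the field values can contain characters outside the \w word-character class, a SPLIT-based variant may be simpler. This is a sketch that assumes Standard SQL and the same TempTable / FullRow names as above; '\x01' is the hex escape for the Control+A character.
SELECT
  -- SAFE_OFFSET returns NULL instead of raising an error when a row has fewer fields.
  SPLIT(FullRow, '\x01')[SAFE_OFFSET(0)] AS col1,
  SPLIT(FullRow, '\x01')[SAFE_OFFSET(1)] AS col2,
  SPLIT(FullRow, '\x01')[SAFE_OFFSET(2)] AS col3
FROM TempTable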