Datalab does not populate BigQuery tables - google-bigquery

Hi, I have a problem while using IPython notebooks on Datalab.
I want to write the result of a query into a BigQuery table, but it does not work. Everyone says to use the insert_data(dataframe) function, but it does not populate my table.
To simplify the problem, I tried to read a table and write it to a just-created table (with the same schema), but it does not work. Can anyone tell me where I am wrong?
import gcp
import gcp.bigquery as bq
#read the data
df = bq.Query('SELECT 1 as a, 2 as b FROM [publicdata:samples.wikipedia] LIMIT 3').to_dataframe()
#creation of a dataset and extraction of the schema
dataset = bq.DataSet('prova1')
dataset.create(friendly_name='aaa', description='bbb')
schema = bq.Schema.from_dataframe(df)
#creation of the table
temptable = bq.Table('prova1.prova2').create(schema=schema, overwrite=True)
#I try to put the same data into the temptable just created
temptable.insert_data(df)

Calling insert_data will do an HTTP POST and return once that is done. However, it can take some time for the data to show up in the BQ table (up to several minutes). Try waiting a while before using the table. We may be able to address this in a future update, see this
The hacky way to block until ready right now should be something like:
import time
while True:
    info = temptable._api.tables_get(temptable._name_parts)
    if 'streamingBuffer' not in info:
        break
    if info['streamingBuffer']['estimatedRows'] > 0:
        break
    time.sleep(5)
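As an aside, a minimal sketch of an alternative that avoids the streaming buffer entirely: run a batch load job with the newer google-cloud-bigquery client (not the gcp.bigquery Datalab module used above). The table ID below is a placeholder.
from google.cloud import bigquery
client = bigquery.Client()
# Placeholder table ID -- replace with your own project.
table_id = 'your-project.prova1.prova2'
# A load job writes straight to managed storage, so the rows are queryable
# as soon as job.result() returns (no streaming-buffer delay).
job = client.load_table_from_dataframe(df, table_id)
job.result()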

Related

Bigquery - browsing table data on table of type view

I want to use the preview/head feature of BigQuery to see sample data of a table without charge,
as described here, and to do so I tried using the Python API listed here:
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# TODO(developer): Set table_id to the ID of the table to browse data rows.
# table_id = "your-project.your_dataset.your_table_name"
# Download all rows from a table.
rows_iter = client.list_rows(table_id) # Make an API request.
# Iterate over rows to make the API requests to fetch row data.
rows = list(rows_iter)
which results in:
BadRequest: 400 GET https://bigquery.googleapis.com/bigquery/v2/projects/your-project/datasets/your_dataset/tables/your_table_name/data?formatOptions.useInt64Timestamp=True&prettyPrint=false: Cannot list a table of type VIEW.
Is there a way to preview a table of type view?
Is there another free alternative?
You cannot use the TableDataList JSON API method (the API you are using here) to retrieve data from a view; this is a limitation of views. So the only way is to preview the data from the original table.
I assume you could write the contents of the view into a temp table and point to that instead. Not the cleanest solution, I'd agree.
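If that workaround fits, a minimal sketch with the google-cloud-bigquery client could look like the following. Note that materializing the view runs a query, so unlike table preview it is billed; all identifiers below are placeholders.
from google.cloud import bigquery
client = bigquery.Client()
view_id = "your-project.your_dataset.your_view"            # placeholder
dest_id = "your-project.your_dataset.your_view_snapshot"   # placeholder
# Materialize the view into a regular table (this runs a billed query).
job_config = bigquery.QueryJobConfig(destination=dest_id,
                                     write_disposition="WRITE_TRUNCATE")
client.query("SELECT * FROM `{}`".format(view_id), job_config=job_config).result()
# tabledata.list works on the materialized table, so list_rows now succeeds.
rows = list(client.list_rows(dest_id, max_results=10))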

Creating table in big query by uploading csv

I am new to BigQuery. I am trying to create a table by uploading a CSV. Its size is 290 KB. Even when I fill in all the required information, the three dots beside "Create table" keep moving (like a loading indicator), but even after waiting for a long time the table doesn't get created.
You can upload the CSV to a Cloud Storage bucket and then reference it from the BigQuery table-creation panel.
Here is the official guide from Google; it should be rather simple: https://cloud.google.com/bigquery/docs/schema-detect
In the source step, select the path to the file and the CSV format.
In the table-type step you can either keep everything as it is or select "External table" (which I recommend), so that in case of error you can delete the table without losing the CSV.
BigQuery should automatically handle the rest. Please share more detailed information in case of error.
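If you prefer to script it instead of using the console, a rough sketch with the Python client (the bucket, dataset and table names below are placeholders) would be:
from google.cloud import bigquery
client = bigquery.Client()
uri = "gs://your-bucket/your_file.csv"              # placeholder
table_id = "your-project.your_dataset.your_table"   # placeholder
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema, as in the console flow
)
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the load to finish
print(client.get_table(table_id).num_rows, "rows loaded")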
There are a couple of ways you can upload a CSV file to BigQuery, as given below:
Write an Apache Beam pipeline (Python/Java) that reads the CSV and loads the data into BigQuery; you can combine the sample code for reading and writing (a minimal sketch appears after the Python script below).
Write a Python script that is responsible for loading the data into BigQuery, for example:
import os
import pandas as pd
# Read the CSV into a dataframe
dept_dt = pd.read_csv('dept_data')
#print(dept_dt)
# Replace with your project id
project = 'xxxxx-aaaaa-de'
# Replace with your service account key path
path_service_account = 'xxxxx-aaaa-jhhhhh9874.json'
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path_service_account
# Load the dataframe into the BigQuery table test1.Emp_data1
dept_dt.to_gbq(destination_table='test1.Emp_data1', project_id=project, if_exists='fail')
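For the Apache Beam option mentioned above, a minimal sketch could look like the following; the file path, column names, schema and table name are all assumptions to be replaced with your own.
import apache_beam as beam

def parse_csv(line):
    # Assumed two-column CSV: dept_id,dept_name
    dept_id, dept_name = line.split(',')
    return {'dept_id': int(dept_id), 'dept_name': dept_name}

with beam.Pipeline() as p:
    (p
     | 'Read' >> beam.io.ReadFromText('gs://your-bucket/dept_data.csv', skip_header_lines=1)
     | 'Parse' >> beam.Map(parse_csv)
     | 'Write' >> beam.io.WriteToBigQuery(
         'your-project:test1.Emp_data1',
         schema='dept_id:INTEGER,dept_name:STRING',
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))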

Making iPython BigQuery Magic Function SQL query dynamic

I am using the BigQuery magic function in Jupyter and would like to be able to dynamically change the project and dataset. For example, instead of
%%bigquery table
SELECT * FROM `my_project.my_dataset.my_table`
I want
project = my_project
dataset = my_dataset
%%bigquery table
'SELECT * FROM `{}.{}.my_table`'.format(project,dataset)
According to the IPython Magics for BigQuery documentation, it is not possible to pass the project or the dataset as parameters; nonetheless, you can use the BigQuery client library to perform this action in a Jupyter notebook.
from google.cloud import bigquery
client = bigquery.Client()
project = 'bigquery-public-data'
dataset = 'baseball'
sql = """SELECT * FROM `{}.{}.games_wide` LIMIT 10"""
query = sql.format(project, dataset)
query_job = client.query(query)
print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("gameId={}, seasonId={}".format(row["gameId"], row["seasonId"]))
I also recommend that you take a look at the public documentation to learn how to visualize BigQuery data in a Jupyter notebook.
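As an aside, and based on my reading of the magics documentation (so treat the exact syntax as an assumption): the %%bigquery magic does accept value parameters via --params, but these can only substitute query values such as a limit, never the project, dataset or table identifiers. In one cell define the parameters:
params = {"row_limit": 10}
and in the next cell reference them with @name:
%%bigquery --params $params
SELECT gameId, seasonId
FROM `bigquery-public-data.baseball.games_wide`
LIMIT @row_limit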

Append to tables in python

I would like to simply update a table in BigQuery from Python. I have a large table of data that I need to update every hour.
The closest thing to updating tables that I could find is this link here. However, only the command line and web UI are supported for this feature.
Is it possible to do so? Or are there other alternatives? I tried searching for a similar question but did not find one. Thanks.
There is a Python example in the documentation you shared, and you can find the complete examples on GitHub.
# client, table_ref and job_config are defined earlier in the linked example.
with open(filename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)
job.result()  # Waits for table load to complete.
I’m not sure if this is what you mean by “updating” the table, but you can truncate before inserting, or append to the existing data, by changing the write disposition of your load job’s configuration (a fuller sketch follows the list). Possible values are:
WRITE_TRUNCATE,
WRITE_APPEND,
WRITE_EMPTY
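A minimal end-to-end sketch of an append (the file name and table ID below are placeholders) could look like:
from google.cloud import bigquery
client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"   # placeholder
filename = "new_rows.csv"                           # placeholder
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    # WRITE_APPEND adds the new rows to the existing table;
    # WRITE_TRUNCATE would replace its contents instead.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
with open(filename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result()  # wait for the load job to complete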

Can I issue a query rather than specify a table when using the BigQuery connector for Spark?

I have used the "Use the BigQuery connector with Spark" guide to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware, the code shared there:
conf = {
    # Input Parameters.
    'mapred.bq.project.id': project,
    'mapred.bq.gcs.bucket': bucket,
    'mapred.bq.temp.gcs.path': input_directory,
    'mapred.bq.input.project.id': 'publicdata',
    'mapred.bq.input.dataset.id': 'samples',
    'mapred.bq.input.table.id': 'shakespeare',
}
# Output Parameters.
output_dataset = 'wordcount_dataset'
output_table = 'wordcount_output'
# Load data in from BigQuery.
table_data = sc.newAPIHadoopRDD(
    'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'com.google.gson.JsonObject',
    conf=conf)
copies the entirety of the named table into input_directory. The table I need to extract data from contains >500m rows and I don't need all of those rows. Is there a way to instead issue a query (as opposed to specifying a table) so that I can copy a subset of the data from a table?
It doesn't look like BigQuery supports any kind of filtering/querying for table exports at the moment:
https://cloud.google.com/bigquery/docs/exporting-data
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract
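One workaround, sketched here under the assumption that you are willing to run a query first (the destination dataset and table names are placeholders), is to materialize the subset you need into its own table and point the connector at that smaller table:
from google.cloud import bigquery
client = bigquery.Client(project=project)
# Placeholder destination for the filtered subset.
dest = '{}.wordcount_dataset.shakespeare_subset'.format(project)
job_config = bigquery.QueryJobConfig(destination=dest,
                                     write_disposition='WRITE_TRUNCATE')
client.query(
    'SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` '
    'WHERE corpus = "hamlet"',
    job_config=job_config,
).result()
# Then read the smaller table with the Hadoop connector as before:
conf['mapred.bq.input.project.id'] = project
conf['mapred.bq.input.dataset.id'] = 'wordcount_dataset'
conf['mapred.bq.input.table.id'] = 'shakespeare_subset'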