Loading statistics (logs) of all BigQuery load jobs in my project to a BigQuery table

At the end of an Apache Beam (Google Cloud Dataflow 2.0) job, the logs include a ready-made command, bq show -j --format=prettyjson --project_id=<My_Project_Id> 00005d2469488547749b5129ce3_0ca7fde2f9d59ad7182953e94de8aa83_00001-0, which can be run from the Google Cloud SDK command prompt.
It shows all the job information: start time, end time, number of bad records, number of records inserted, and so on.
I can see this information on the Cloud SDK console, but where is it stored?
I checked the Stackdriver logs; they only had data up to the previous day, and even that was not the complete information shown on the Cloud SDK console.
If I want to export this information and load it into BigQuery, where can I get it?
Update: this is possible, and I found the information after adding the filter resource.type="bigquery_resource" in the Stackdriver logs viewer, but it shows timestamp fields like CreateTime, StartTime and EndTime as 1970-01-01T00:00:00Z.
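The figures that bq show -j prints come from the BigQuery jobs API, so, as an illustration, they can also be read programmatically; a minimal sketch with the Python client, where the project id is a placeholder and the job id is the one from the log message:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

# Fetch the job referenced in the Dataflow log message.
job = client.get_job("00005d2469488547749b5129ce3_0ca7fde2f9d59ad7182953e94de8aa83_00001-0")

print(job.job_type, job.state)
print("started:", job.started, "ended:", job.ended)
if job.job_type == "load":
    print("rows loaded:", job.output_rows)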

You can export these logs to a Google Cloud Storage bucket. In Stackdriver, click Create Export and create a sink, providing a sink name and a sink destination (the bucket path). From then on, whenever a job starts, its logs are exported and you can use them further.
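As a sketch, the same sink can be created programmatically with the google-cloud-logging Python client instead of the Stackdriver UI; the project, sink and bucket names below are placeholders:

from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project id

# Export only BigQuery-related log entries to a Cloud Storage bucket.
sink = client.sink(
    "bigquery-job-logs",  # placeholder sink name
    filter_='resource.type="bigquery_resource"',
    destination="storage.googleapis.com/my-log-bucket",  # placeholder bucket
)
sink.create()

The destination could equally be a BigQuery dataset (bigquery.googleapis.com/projects/my-project/datasets/my_dataset), which avoids a separate load step; either way, the destination has to grant write access to the sink's writer identity.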

Related

BigQuery - Data transfer "Detected that no changes will be made to the destination table"

I use a script to generate files from an API and store them on Google Cloud Storage. Following this documentation, https://cloud.google.com/bigquery/docs/cloud-storage-transfer?hl=en_US#limitations, I created a BigQuery table with the corresponding schema in advance and then created a Data Transfer from the bucket to that table.
When I run the Data Transfer the following error shows up in the logs:
Detected that no changes will be made to the destination table
I've updated some of the files, added files, deleted files, and so on, and every time I get the same message. I also have other Data Transfers that work just fine with the same BigQuery instance and Cloud Storage bucket.
The only issue I found on SO, Not able to update Big query table with Transfer from a Storage file, says you need to wait 1 hour, but even after a day I get the same error.
Any idea as to what triggers BigQuery to determine that changes have been made (or not)?
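For context, a minimal sketch of what such a Cloud Storage transfer configuration looks like when created through the BigQuery Data Transfer Python client; the project, dataset, bucket path and parameter values are placeholders, not the actual configuration from the question:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",  # placeholder dataset
    display_name="GCS to BigQuery",
    data_source_id="google_cloud_storage",
    params={
        "data_path_template": "gs://my-bucket/exports/*.csv",  # placeholder path
        "destination_table_name_template": "my_table",         # placeholder table
        "file_format": "CSV",
        "skip_leading_rows": "1",
    },
    schedule="every 24 hours",
)

transfer_config = client.create_transfer_config(
    parent=client.common_project_path("my-project"),  # placeholder project
    transfer_config=transfer_config,
)
print("Created transfer config:", transfer_config.name)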

BigQuery Data Transfer - location not supported in command line, but works in GCP UI

I was able to create a data transfer in the GCP UI from a bucket in the europe-west3 location to a BigQuery dataset which is also located in europe-west3.
When I try to do the same with the bq mk --transfer_config ... command, I get an error:
BigQuery error in mk operation: BigQuery Data Transfer Service does not yet support location: europe-west3
Where does this difference come from?
I reproduced the same problem and this is definitely not the expected behavior.
If you look in the documentation, you'll see that this feature is not available in this specific region. The only regions available in Europe for Data Transfer are europe-north1, europe-west2 and europe-west6. Also, this region was never mentioned in any release notes.
Given that, I opened a ticket for your case in Issue Tracker (Google support channel).
You can find the progress at this link.

Cannot Export a Table from BigQuery to Google Cloud Storage

I am trying to export a table from BigQuery to Google Cloud Storage from the console/command line. The console job runs for a few minutes and errors out without any error code, and the command-line job, after running for some time, gives the below error:
BigQuery error in extract operation: Error processing job 'data-flow-experiment:bqjob_r308ff0f73d1820a6_00000157f77e8ab9_1': Backend error. Job aborted.
The job id of the command-line run is given above.
Billing is enabled for the project, and the BigQuery service is also enabled.
Also, I get the below error when I try to create a bucket in Google Cloud Storage:
AccessDeniedException: 403 The account for the specified project is read only.
The IAM user I am using has Owner access, and I have created buckets with this account previously and have also extracted tables in the past.
Please guide.
For the BigQuery issue:
Do you happen to have a timestamp column with out-of-range values (say, far, far into the future)?
If so, you can just wait for two more days, as the fix is rolling out.
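One way to check is a quick range query on the suspect timestamp column; a sketch with the Python client, where the dataset, table and column names are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="data-flow-experiment")

# "my_dataset.my_table" and "event_ts" are placeholders for the exported
# table and its timestamp column.
query = """
    SELECT MIN(event_ts) AS min_ts, MAX(event_ts) AS max_ts
    FROM `data-flow-experiment.my_dataset.my_table`
"""
row = next(iter(client.query(query).result()))
print("timestamp range:", row.min_ts, "to", row.max_ts)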

Google Cloud Logging export to Big Query does not seem to work

I am using the Google Cloud Logging web UI to export Google Compute Engine logs to a BigQuery dataset. According to the docs, you can even create the BigQuery dataset from this web UI (it simply asks you to give the dataset a name). It also automatically sets up the correct permissions on the dataset.
It seems to save the export configuration without errors, but a couple of hours have passed and I don't see any tables created in the dataset. According to the docs, exporting the logs will stream the logs to BigQuery and will create the table with the following template:
my_bq_dataset.compute_googleapis_com_activity_log_YYYYMMDD
https://cloud.google.com/logging/docs/export/using_exported_logs#log_entries_in_google_bigquery
I can't think of anything else that might be wrong. I am the owner of the project and the dataset is created in the correct project (I only have one project).
I also tried exporting the logs to a google storage bucket and still no luck there. I set the permissions correctly using gsutil according to this:
https://cloud.google.com/logging/docs/export/configure_export#setting_product_name_short_permissions_for_writing_exported_logs
And finally I made sure that the 'source' I am trying to export actually has some log entries.
Thanks for the help!
Have you ingested any log entries since configuring the export? Cloud Logging only exports entries to BigQuery or Cloud Storage that arrive after the export configuration is set up. See https://cloud.google.com/logging/docs/export/using_exported_logs#exported_logs_availability.
You might not have given edit permission to 'cloud-logs@google.com' in the BigQuery console. Refer to this.
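If the console is inconvenient, here is a sketch of granting that permission with the BigQuery Python client; the dataset id is a placeholder, and on newer setups the account to grant is the sink's own writer identity rather than cloud-logs@google.com:

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my_bq_dataset")  # placeholder dataset id

# Append a WRITER access entry for the logging export account.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",
        entity_type="userByEmail",
        entity_id="cloud-logs@google.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])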

How to download all data in a Google BigQuery dataset?

Is there an easy way to directly download all the data contained in a certain dataset on Google BigQuery? I'm actually downloading it as CSV, making one query after another, but that doesn't allow me to get more than 15k rows, and the rows I need to download are over 5M.
Thank you
You can run BigQuery extraction jobs using the Web UI, the command line tool, or the BigQuery API. The data can be extracted to a Google Cloud Storage bucket.
For example, using the command line tool:
First, install and authenticate using these instructions:
https://developers.google.com/bigquery/bq-command-line-tool-quickstart
Then make sure you have an available Google Cloud Storage bucket (see Google Cloud Console for this purpose).
Then, run the following command:
bq extract my_dataset.my_table gs://mybucket/myfilename.csv
More on extracting data via API here:
https://developers.google.com/bigquery/exporting-data-from-bigquery
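For completeness, a rough equivalent of the bq extract command above using the Python API; the dataset, table and bucket names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Export as gzipped CSV; the asterisk lets BigQuery split large tables
# into multiple files.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)
extract_job = client.extract_table(
    "my_dataset.my_table",
    "gs://mybucket/myfilename_*.csv.gz",
    job_config=job_config,
)
extract_job.result()  # wait for the job to finish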
Detailed step-by-step to download large query output
enable billing
You have to give your credit card number to Google to export the output, and you might have to pay.
But the free quota (1TB of processed data) should suffice for many hobby projects.
create a project
associate billing to a project
do your query
create a new dataset
click "Show options" and enable "Allow Large Results" if the output is very large
export the query result to a table in the dataset
create a bucket on Cloud Storage.
export the table to the created bucket on Cloud Storage.
make sure to click GZIP compression
use a name like <bucket>/prefix.gz.
If the output is very large, the file name must have an asterisk * and the output will be split into multiple files.
download the table from cloud storage to your computer.
It does not seem possible to download multiple files from the web interface if the large file got split up, but you can install gsutil and run the following (a Python sketch of the download-and-unzip steps follows this list):
gsutil -m cp -r 'gs://<bucket>/prefix_*' .
See also: Download files and folders from Google Storage bucket to a local folder
There is a gsutil in Ubuntu 16.04 but it is an unrelated package.
You must install and setup as documented at: https://cloud.google.com/storage/docs/gsutil
unzip locally:
for f in *.gz; do gunzip "$f"; done
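The download-and-unzip steps above can also be scripted with the Cloud Storage Python client; a rough sketch, with the bucket name and export prefix as placeholders:

import gzip
import shutil
from google.cloud import storage

BUCKET = "mybucket"   # placeholder bucket
PREFIX = "prefix_"    # placeholder export prefix

client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    local_gz = blob.name.replace("/", "_")
    blob.download_to_filename(local_gz)
    # Decompress each downloaded shard next to the .gz file.
    with gzip.open(local_gz, "rb") as src, open(local_gz[:-3], "wb") as dst:
        shutil.copyfileobj(src, dst)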
Here is a sample project I needed this for, which motivated this answer.
For Python, you can use the following code; it will download the data as a dataframe.
from google.cloud import bigquery

def read_from_bqtable(bq_projectname, bq_query):
    client = bigquery.Client(bq_projectname)
    bq_data = client.query(bq_query).to_dataframe()
    return bq_data  # return dataframe

bigQueryTableData_df = read_from_bqtable('gcp-project-id', 'SELECT * FROM `gcp-project-id.dataset-name.table-name` ')
Yes, the steps suggested by Michael Manoochehri are correct and an easy way to export data from Google BigQuery.
I have written a bash script so that you do not need to do these steps every time; just use my bash script.
Below is the GitHub URL:
https://github.com/rajnish4dba/GoogleBigQuery_Scripts
Scope:
1. Export data based on your BigQuery SQL.
2. Export data based on your table name.
3. Transfer your export file to an SFTP server.
Try it and let me know your feedback.
For help, use ExportDataFromBigQuery.sh -h