BigQuery cost for a project - google-bigquery

How can I find out the BigQuery cost for a project programmatically? Is there an API for that?
Also, is it possible to get user-level cost details for the queries that were run?

To track individual costs for a BigQuery project, you can redirect all logs back to BigQuery - and then you can run queries over these logs.
https://cloud.google.com/bigquery/docs/reference/auditlogs
These logs include who ran the query, and how much data was scanned.
Another way is to query the INFORMATION_SCHEMA views; see this post:
https://www.pascallandau.com/bigquery-snippets/monitor-query-costs/
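As a rough sketch of the INFORMATION_SCHEMA route, the helper below builds a per-user query against the JOBS_BY_PROJECT view. The region qualifier, the 30-day window, and the rate of 6.25 USD per TiB are assumptions for illustration; substitute your own region and the on-demand price that applies to your project.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def build_user_cost_query(region="region-us", days=30, usd_per_tib=6.25):
    """Return SQL that sums billed bytes per user over the last `days` days.

    `region` and `usd_per_tib` are assumed defaults, not values from the post.
    """
    return f"""
SELECT
  user_email,
  SUM(total_bytes_billed) AS bytes_billed,
  ROUND(SUM(total_bytes_billed) / {TIB} * {usd_per_tib}, 2) AS approx_cost_usd
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY bytes_billed DESC
""".strip()
```

The resulting string can be pasted into the console or passed to whichever BigQuery client you already use.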

You can use the Cloud Billing API.
For example:
[GET] https://cloudbilling.googleapis.com/v1/projects/{projectsId}/billingInfo
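A minimal sketch of preparing that call with only the standard library is shown here. The project ID and OAuth access token are placeholders you would supply yourself, and the helper name is hypothetical.

```python
import urllib.request

BILLING_API = "https://cloudbilling.googleapis.com/v1"


def billing_info_request(project_id, access_token):
    """Build (but do not send) a GET request for a project's billing info."""
    url = f"{BILLING_API}/projects/{project_id}/billingInfo"
    return urllib.request.Request(
        url,
        # The token is assumed to be a valid OAuth 2.0 access token.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Sending the request with urllib.request.urlopen returns JSON describing the billing account linked to the project (for example billingAccountName and billingEnabled). It does not include cost figures, so per-query or per-user costs still need one of the log-based approaches.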

Related

How to get query time and space information from BigQuery API

I'm going to build a web app and use BigQuery as a part of backend database, and I want to show the query cost information (ex. 1.8 sec elapsed, 264.9 MB processed) in the app.
I know we can check a query's information inside GCP, but how do I get that information from the BigQuery API?
The information you are interested in is present in the job statistics.
See jobs.get for more details: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
The dry-run sample may also be of interest. A dry run estimates cost without executing the query, but the same statistics are available from a real invocation:
https://cloud.google.com/bigquery/docs/samples/bigquery-query-dry-run
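To illustrate, here is a small sketch that turns the statistics object of a jobs.get response into the kind of string the question asks for. It assumes the response has already been parsed into a dict and that 1 MB means 1024 * 1024 bytes; the function name is hypothetical.

```python
def summarize_job(job):
    """Format elapsed time and bytes processed from a jobs.get response dict."""
    stats = job["statistics"]
    # The REST API returns int64 values as strings; times are ms since epoch.
    elapsed_sec = (int(stats["endTime"]) - int(stats["startTime"])) / 1000
    mb_processed = int(stats["totalBytesProcessed"]) / (1024 ** 2)
    return f"{elapsed_sec:.1f} sec elapsed, {mb_processed:.1f} MB processed"


# Hand-written sample data, not real API output.
sample = {
    "statistics": {
        "startTime": "1000",
        "endTime": "2800",
        "totalBytesProcessed": "277767782",
    }
}
print(summarize_job(sample))  # 1.8 sec elapsed, 264.9 MB processed
```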

How to pre-process BigQuery data coming from Stackdriver

I am currently exporting logs from Stackdriver to BigQuery using sinks, but I am only interested in the jsonPayload. I would like to ignore pretty much everything else.
Since the table creation and data insertion happen automatically, I could not do this.
Is there a way to preprocess data coming from the sink so that only what matters is stored?
If the answer is no, is there a way to run a cron job each day to copy yesterday's data into a separate table and then remove it? (The tables are named using timestamps, which makes it possible to query them by day.)
As far as I know, neither of the options mentioned is currently possible on GCP. On my end I also tried to reproduce your request internally and found that there isn't a way to filter on the jsonPayload alone.
I would therefore suggest creating a feature request for this on the following public issue tracker link. Note that feature requests do not have an ETA for when they'll be processed or whether they'll be implemented.

How to be notified for high costs of queries in BigQuery?

I have a project in BigQuery where many people update/add Views.
Others access Views/Tables from third-party software such as Tableau.
I have no control over, for example, whether the analyst who wrote the query in Tableau used the table's partition or not.
Is it possible somehow to ask BigQuery to send an email for each query that passes a threshold, for example 20 GB? Then I can check that specific query and user to see if it's OK or not (I'm not forcing partitions, as that's not always what we need).
I know that it's possible to use the Stackdriver Logging export to download logs into BigQuery tables / storage but I don't see anything there that can tell me if query passed this specific criteria.
There are different solutions available, but the best is to use a Cloud Pub/Sub topic together with a Cloud Function:
Enable programmatic notifications to receive Cloud Pub/Sub messages with the current status of your budget
Programmatic Budgets Notification Examples
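As a sketch of the Cloud Function side, the helper below decodes a budget notification delivered through Pub/Sub. It assumes a background-function style event whose "data" field is base64-encoded JSON with costAmount, budgetAmount, and budgetDisplayName fields; the function name and the returned dict layout are hypothetical. Note that these messages describe spend against a budget as a whole, so this alone does not single out an individual 20 GB query.

```python
import base64
import json


def parse_budget_notification(event):
    """Decode a Pub/Sub budget message and report whether cost exceeds budget."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    return {
        "budget": payload.get("budgetDisplayName"),
        "cost": cost,
        "limit": budget,
        "over_budget": cost > budget,
    }
```

Inside the deployed function you could call this helper and then send an email or chat message when over_budget is true.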

How to track BigQuery usage by user/service account

I have a BigQuery account which is accessed by many internal and external users/service accounts. Recently, as the bill has grown, we started researching how to get better visibility into how much cost is attributable to each user/service account.
I know there is a way to get this info somewhere through the BigQuery API but I was wondering if there is any other easy way to get this info. Has anybody had a similar problem?
Restating the question: how to track how much data each BigQuery user/service account has processed?
Use BigQuery's audit logs (via Stackdriver) to track access and cost details as described in the docs.
A good tip is to export the logs back to BigQuery for analysis.
https://cloud.google.com/bigquery/audit-logs
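If you read the log entries as JSON rather than querying them in BigQuery, the per-user totals can be computed along these lines. This is a sketch that assumes the older AuditData layout, where a completed job appears under protoPayload.serviceData.jobCompletedEvent; field paths may differ for other log formats, and the function name is hypothetical.

```python
from collections import defaultdict


def billed_bytes_by_user(entries):
    """Sum totalBilledBytes per principalEmail across audit log entries."""
    totals = defaultdict(int)
    for entry in entries:
        payload = entry.get("protoPayload", {})
        email = payload.get("authenticationInfo", {}).get("principalEmail")
        stats = (
            payload.get("serviceData", {})
            .get("jobCompletedEvent", {})
            .get("job", {})
            .get("jobStatistics", {})
        )
        billed = stats.get("totalBilledBytes")
        # Entries that are not completed jobs have no billed bytes; skip them.
        if email and billed is not None:
            totals[email] += int(billed)
    return dict(totals)
```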

Using BigQuery for logs analysis

I'm trying to do log analysis with BigQuery. Specifically, I have an App Engine app and a JavaScript client that will be sending log data to BigQuery. In BigQuery, I'll store the full log text in one column but also extract important fields into other columns. I then want to be able to run ad hoc queries over those columns.
Two questions:
1) Is BigQuery particularly good or particularly bad at this use case?
2) How do I set up revolving logs? That is, I want to store only the last N logs or the last X GB of log data. I see delete is not supported.
Just so you know, there is an excellent demo of moving App Engine Log data to BigQuery via App Engine MapReduce called log2bq (http://code.google.com/p/log2bq/)
Re: "use case" - Stack Overflow is not a good place for judgements about best or worst, but BigQuery is used internally at Google to analyse really really big log data.
I don't see the advantage of storing full log text in a single column. If you decide that you must set up revolving "logs," you could ingest daily log dumps by creating separate BigQuery tables, perhaps one per day, and then delete the tables when they become old. See https://developers.google.com/bigquery/docs/reference/v2/tables/delete for more information on the Table.delete method.
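The one-table-per-day rotation can be sketched as follows. The helper only decides which tables are past the retention window; the table prefix "applog_" and the YYYYMMDD suffix are assumptions, and the actual deletion would go through the Table.delete method mentioned above.

```python
from datetime import date, datetime, timedelta


def tables_to_delete(table_ids, keep_days, today, prefix="applog_"):
    """Return daily tables whose date suffix is older than `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for table_id in table_ids:
        if not table_id.startswith(prefix):
            continue  # not one of the daily log tables
        try:
            day = datetime.strptime(table_id[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the table alone
        if day < cutoff:
            expired.append(table_id)
    return expired


names = ["applog_20240101", "applog_20240103", "applog_20240109", "other"]
print(tables_to_delete(names, keep_days=7, today=date(2024, 1, 10)))
# ['applog_20240101']
```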
After implementing this - we decided to open source the framework we built for it. You can see the details of the framework here: http://blog.streak.com/2012/07/export-your-google-app-engine-logs-to.html
If you want your Google App Engine (Google Cloud) project's logs to be in BigQuery, Google has added this functionality built in to the new Cloud Logging system. It is a beta feature known as "Logs Export"
https://cloud.google.com/logging/docs/install/logs_export
They summarize it as:
Export your Google Compute Engine logs and your Google App Engine logs to a Google Cloud Storage bucket, a Google BigQuery dataset, a Google Cloud Pub/Sub topic, or any combination of the three.
We use the "Stream App Engine Logs to BigQuery" feature in our Python GAE projects. This sends our app's logs directly to BigQuery as they are occurring to provide near real-time log records in a BigQuery dataset.
There is also a page describing how to use the exported logs.
https://cloud.google.com/logging/docs/export/using_exported_logs
When you want to query logs exported to BigQuery over multiple days (e.g. the last week), you can use a legacy SQL query with a FROM clause like this:
FROM
(TABLE_DATE_RANGE(my_bq_dataset.myapplog_,
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
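TABLE_DATE_RANGE is a legacy SQL function. For standard SQL, a comparable query can be built with a wildcard table and the _TABLE_SUFFIX pseudo-column, as in this sketch. The helper name is hypothetical and it assumes the daily tables end in a YYYYMMDD suffix.

```python
def last_n_days_query(dataset, table_prefix, days=7):
    """Build a standard SQL query over date-suffixed tables for recent days."""
    return f"""
SELECT *
FROM `{dataset}.{table_prefix}*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
""".strip()


print(last_n_days_query("my_bq_dataset", "myapplog_"))
```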