How to see BQ query cost before running it? - google-bigquery

Previously we had BQ Mate and SuperQuery to check the cost of a query, but they no longer work in the new UI.
Is there any free or relatively inexpensive solution that would show a cost estimate before running a query?

The Google Cloud Pricing Calculator is the best option to get a cost estimate for any Google Cloud service:
https://cloud.google.com/bigquery/docs/estimate-costs

According to the doc
To estimate costs before running a query, you can use one of the following methods:
Query validator in the Google Cloud console
--dry_run flag in the bq command-line tool
dryRun parameter when submitting a query job using the API
The Google Cloud Pricing Calculator
Client libraries
In your case, use the Query validator in the BigQuery console together with the Pricing Calculator. When you enter a query in the BigQuery console, the Query validator validates it and shows how much data it will read at runtime.
Enter those details in the Pricing Calculator and you will get the estimated cost.
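The same number is available programmatically: a dry run through one of the client libraries returns the bytes that would be scanned without running (or billing) the query. Below is a minimal Python sketch, assuming Application Default Credentials; the public-dataset query is just a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run=True asks BigQuery to validate the query and report the bytes it
# would process, without actually executing it or incurring any charges.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query_job = client.query(
    "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE state = 'TX'",
    job_config=job_config,
)

gb_processed = query_job.total_bytes_processed / 1024 ** 3
print(f"This query would process {gb_processed:.2f} GB")
```

You can then plug that number into the Pricing Calculator (or multiply it by your on-demand rate) to get the estimated cost.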

Related

Recommendations AI to Big Query : Cloud Function gives timeout exception

I am trying to get the predicted results from Recommendations AI and store those predictions back into BigQuery.
After fetching a few rows, the Cloud Function gives a timeout exception. Is there any way I can increase the timeout of the Cloud Function? Or can I push the predicted results directly to BigQuery without involving a Cloud Function at all?
As mentioned in the Answer:
The maximum run time of 540 seconds applies to all Cloud Functions, no
matter how they're triggered. If you want to run something longer you
will have to either chop it into multiple parts, or run it on a
different platform, such as Compute Engine or App Engine.
The default timeout can be changed here.
Follow this: select function > test function > edit > timeout
If you want to load your data directly into BigQuery, you can follow this blog.
You can also look at the documentation on exporting prediction data to BigQuery. As this is not the recommended method, you can go for Remote Config personalization instead.
For more information on BigQuery ML supported models, you can refer to this answer and blog.
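If you do decide to skip the Cloud Function and write the predictions yourself, a minimal sketch with the BigQuery Python client could look like the following; the table name and row fields are placeholders and the destination table is assumed to already exist:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.recommendations.predictions"  # placeholder table

# Prediction rows as plain dicts; field names must match the table schema.
rows = [
    {"user_id": "u1", "item_id": "i42", "score": 0.87},
    {"user_id": "u2", "item_id": "i7", "score": 0.64},
]

# Streaming insert; returns a list of per-row errors (empty on success).
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert failed:", errors)
```

For larger batches, a load job (client.load_table_from_json) avoids streaming-insert quotas.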

How to get query time and space information from BigQuery API

I'm going to build a web app that uses BigQuery as part of its backend database, and I want to show the query cost information (e.g. 1.8 sec elapsed, 264.9 MB processed) in the app.
I know we can check a query's information inside GCP, but how do I get that information from the BigQuery API?
The information you are interested in is present in the job statistics.
See jobs.get for more details: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
The dry-run sample may be of interest as well, though you can also get the stats from a real invocation (a dry run is for estimating costs without executing the query):
https://cloud.google.com/bigquery/docs/samples/bigquery-query-dry-run
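As a rough sketch of what that looks like with the Python client library (the query itself is a placeholder), the finished job object already carries the timing and bytes-processed statistics:

```python
from google.cloud import bigquery

client = bigquery.Client()

query_job = client.query(
    "SELECT corpus, COUNT(*) AS n FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus"
)
query_job.result()  # wait for the job to complete

# Job statistics, as exposed via jobs.get on the completed job
elapsed_sec = (query_job.ended - query_job.started).total_seconds()
mb_processed = query_job.total_bytes_processed / 1e6
print(f"{elapsed_sec:.1f} sec elapsed, {mb_processed:.1f} MB processed")
```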

BigQuery cost for a project

How do I find out the BigQuery cost for a project programmatically? Is there an API to do that?
Also, is it possible to get user-level cost details for the queries made?
To track individual costs for a BigQuery project, you can redirect all logs back to BigQuery - and then you can run queries over these logs.
https://cloud.google.com/bigquery/docs/reference/auditlogs
These logs include who ran the query, and how much data was scanned.
Another way is to use the INFORMATION_SCHEMA views; check this post:
https://www.pascallandau.com/bigquery-snippets/monitor-query-costs/
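For example, a query over INFORMATION_SCHEMA.JOBS_BY_PROJECT can break billed bytes down by user. The sketch below (run here through the Python client) assumes the US region and an example on-demand rate; adjust both for your project:

```python
from google.cloud import bigquery

client = bigquery.Client()

PRICE_PER_TIB_USD = 5.0  # assumption: check the current on-demand rate for your region

sql = """
SELECT
  user_email,
  SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_email
ORDER BY tib_billed DESC
"""

for row in client.query(sql).result():
    print(f"{row.user_email}: ~${row.tib_billed * PRICE_PER_TIB_USD:.2f} over the last 30 days")
```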
You can use the Cloud Billing API. For example:
[GET] https://cloudbilling.googleapis.com/v1/projects/{projectsId}/billingInfo

Google Dataflow instance and BigQuery cost considerations

I am planning to spin up a Dataflow instance on Google Cloud Platform to run some experiments. I want to get familiar with, and experiment with, using Apache Beam to pull data from BigQuery, run some ETL jobs (in Python) and streaming jobs, and finally store the results in BigQuery.
However, I am also concerned about sending my company's GCP bill through the roof. What are the main cost considerations, and are there any methods to estimate what the cost will be, so I don't get an earful from my boss?
Any help would be greatly appreciated, thanks!
You can use the pricing calculator to get an estimate of the price of the job.
One of the most important resources on the Dataflow side is CPU per hour. To limit CPU hours, you can cap the number of machines with the maxNumWorkers pipeline option (see the sketch below).
Here are more pipeline options that you can set when running your Dataflow job: https://cloud.google.com/dataflow/docs/guides/specifying-exec-params
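For illustration, here is a minimal sketch of capping workers from the Python SDK; in Python the option is spelled max_num_workers, and the project, region, bucket, and query are all placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders throughout: project, region, bucket, and query are examples only.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    max_num_workers=5,  # caps autoscaling, which bounds CPU-hours (and cost)
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
            query="SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare`",
            use_standard_sql=True,
        )
        | "CountRows" >> beam.combiners.Count.Globally()
        | "Print" >> beam.Map(print)
    )
```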
For BQ, you can do a similar estimate using the calculator.

How to be notified for high costs of queries in BigQuery?

I have a project in BigQuery where many people update/add Views.
Others access Views/Tables from third-party software like Tableau.
I have no control, for example, over whether the analyst who wrote the query in Tableau used the partitioning of the table or not.
Is it possible to ask BigQuery to send an email for each query that passes a threshold, for example 20 GB? Then I can check that specific query and user to see whether it's OK or not (I'm not forcing partitioning as it's not always what we need).
I know it's possible to use the Stackdriver Logging export to download logs into BigQuery tables / storage, but I don't see anything there that can tell me whether a query passed this specific criterion.
There are different solutions available, but the best one uses Cloud Pub/Sub topics and a small Cloud Function:
Enable programmatic notifications to receive Cloud Pub/Sub messages with the current status of your budget
Programmatic Budgets Notification Examples
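A minimal sketch of the Cloud Function side, assuming a Pub/Sub-triggered function subscribed to the budget topic; the threshold check and the notification mechanism (here just a log line) are placeholders you would replace with email/Slack/etc.:

```python
import base64
import json

def budget_alert(event, context):
    """Pub/Sub-triggered Cloud Function: inspect a budget notification."""
    # Budget notifications arrive as base64-encoded JSON in event["data"].
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    cost = payload.get("costAmount", 0.0)
    budget = payload.get("budgetAmount", 0.0)

    if budget and cost > budget:
        # Placeholder: replace with an email / Slack / ticketing integration.
        print(f"Budget exceeded: cost={cost}, budget={budget}")
```

Note that this alerts on overall spend rather than on a single 20 GB query; for a per-query threshold you would still need to query the audit logs exported to BigQuery.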