How can I see my accumulated query costs for BigQuery while in the free trial? - google-bigquery

Google Cloud billing is not updating during the free trial (I am on monthly payments), and I cannot switch it to a faster update cycle. According to https://cloud.google.com/free-trial/docs/billing-during-free-trial the bill should come every month.
It is therefore not easy to see how much of the $300 credit is left.
Is there any way to at least see how many TBs my queries have used? This should be by far the biggest item on the bill.
I am concerned that the credit might run out between important queries, which I could otherwise have ordered better so that at least partial results are available after the trial ends.

BigQuery analysis & storage costs should be listed under your GCP billing transactions:
https://console.cloud.google.com/billing/<INSERT_YOUR_BILLING_ID_HERE>/history?e=13803970,13803205
Another way to see how much you have queried is by enabling audit logging as described here.

Related

BQ: How to check query cost every time

Is it possible to show an alert or popup message every time I run a query in the BQ GUI? I am afraid of spending too much on queries.
I hope BQMate has this function.
Sometimes the cost of a query can only be determined once the query has finished, e.g. for federated tables and the newly released clustered tables. If you're concerned about cost, the best option is to set the Maximum Bytes Billed option; then you can be sure you'll never be charged for more than that. You can set a default value for this option in your project, but right now you have to contact support to have it set.
A fast way to get a cost estimate is to check the amount of data to be processed, which the query validator shows on the right side of the screen by performing a dry run. See here for a "query validator" example. You have two options to calculate the cost:
Manually: query pricing is described here. The first 1 TB per month is free and each extra TB costs $5. If you expect to query more than 1 TB of data per month, keep a running total of the data your queries process so that you know when charges start.
Automatically: using the online pricing calculator, which is available for all Google Cloud Platform products.
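A rough sketch of the manual calculation in Python, in case it helps. It is not an official billing formula: it assumes the $5 per TB price and the 1 TB monthly free tier quoted above, counts 1 TB as 1024^4 bytes, and ignores minimum charges and rounding. All function names are hypothetical. The two helpers that talk to BigQuery use the google-cloud-bigquery client library (QueryJobConfig with dry_run and maximum_bytes_billed) and need credentials, so they are only defined here, not called.

```python
TIB = 1024 ** 4  # assumption: 1 TB is counted as 2**40 bytes


def billable_cost_usd(query_bytes, used_this_month=0,
                      free_bytes=TIB, usd_per_tb=5.0):
    """Cost of one query once the rest of the monthly free tier is used up."""
    free_left = max(free_bytes - used_this_month, 0)
    billable = max(query_bytes - free_left, 0)
    return billable * usd_per_tb / TIB


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes a query would process, without running it."""
    from google.cloud import bigquery  # needs google-cloud-bigquery + credentials
    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


def run_capped(sql, max_bytes):
    """Run a query that fails instead of billing more than max_bytes."""
    from google.cloud import bigquery  # needs google-cloud-bigquery + credentials
    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return client.query(sql, job_config=config).result()


# A 1.5 TB query after 0.5 TB of the free tier has already been used:
print(billable_cost_usd(3 * TIB // 2, used_this_month=TIB // 2))  # 5.0
```

With credentials in place, billable_cost_usd(dry_run_bytes(sql), used_this_month=...) gives an estimate before running anything, and run_capped is the programmatic form of the Maximum Bytes Billed option.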
If you want to set custom cost controls, have a look at this page, since custom quotas are not enabled by default. Cost controls can be applied at the project level or the user level by restricting the number of bytes billed. Currently you have to submit a request from the Google Cloud Platform Console to have them set, in 10 TB increments. If usage exceeds a quota, the error message is quite clear, and it differs depending on whether the project quota or the user quota was exceeded. For the project quota:
Custom quota exceeded: Your usage exceeded the custom quota for
QueryUsagePerDay, which is set by your administrator. For more information,
see https://cloud.google.com/bigquery/cost-controls
With no remaining quota, BigQuery stops working for everyone in that project.
If you want to monitor BigQuery billing data continuously, have a look at this tutorial, which explains how to create a billing dashboard using Data Studio.
I don't know about BQMate, since it is made by Vaint Inc.

List all the queries made to BigQuery with their processed bytes

I would like to know if there is a method in the BigQuery API, or any other way, to list all the queries made along with their processed bytes. Something like what is listed in the Activity Page, but with the processedBytes field:
https://console.cloud.google.com/home/activity?project=coherent-server-125913
We are having a problem with billing. Suddenly our BigQuery Analysis costs have increased a lot, and we think we are being charged about 20 times more than expected (we check all the responses from the BigQuery API and save the processedBytes field, taking into account that the minimum charge is 10 MB).
The only way we can resolve this difference is to list all the requests and compare them with our numbers, to see whether we aren't measuring something or are doing something wrong. We have opened a billing support ticket, and they redirected me to Stack Overflow because they think it is a technical issue.
Thanks in advance!
Instead of checking totalBytesProcessed, you should try checking totalBytesBilled and billingTier (see here).
You might have jumped to a higher billing tier, but that is just a guess.
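To make the comparison concrete, here is a small Python sketch that turns saved processedBytes values into the number of bytes you would expect to be billed, so the total can be set against totalBytesBilled. The 10 MB minimum comes from the question. Rounding up to a whole MB and treating zero-byte queries as free are my assumptions about the pricing rules, and the per-table minimum and billingTier are not modeled. The function names and the input format are hypothetical.

```python
MB = 1024 ** 2


def expected_billed_bytes(processed_bytes, min_bytes=10 * MB):
    """Bytes we would expect to be billed for one query."""
    if processed_bytes <= 0:
        return 0  # assumption: cache hits / zero-byte queries are not billed
    rounded_up = -(-processed_bytes // MB) * MB  # ceiling to a whole MB (assumption)
    return max(rounded_up, min_bytes)


def compare(jobs):
    """jobs: iterable of (processed_bytes, total_bytes_billed) pairs
    saved from API responses. Returns (expected_total, actual_total)."""
    jobs = list(jobs)
    expected = sum(expected_billed_bytes(p) for p, _ in jobs)
    actual = sum(b for _, b in jobs)
    return expected, actual
```

If the actual total is far above the expected one, the per-job pairs show which queries account for the gap.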
The best place to check would be the BigQuery logs.
This is going to tell you what queries were run, who ran them, what date/time they were run, the total bytes billed etc.
Logs can be a bit tedious to look through but BigQuery allows you to stream BigQuery logs into a BigQuery table and you can then query said table to identify expensive queries.
I've done this and it works really well to give you visibility on your BQ charges. The process of how to do this is outlined in more detail here: https://www.reportsimple.com.au/post/google-bigquery

How do I find out how much of my 1TB free monthly allotment I've used in Google BigQuery?

I feel like there must be some way to figure out how much of free 1TB is left besides summing up the "bytes processed" amounts for every single query. But I haven't been able to find it anywhere in the console or elsewhere.
Unfortunately this is not super easy right now. The best way to get the answer is, as you said, to sum up the bytes processed for the queries you've run.
You can get the data via the BQ jobs.list API, or you can use BQ's audit logs if that's more convenient. The audit logs can even be queried in BQ, but of course that incurs additional usage. :-)
You can also see your unbilled usage for GCP services via the GCP console. However, this only shows BQ usage from the free 1 TB tier once you've incurred some actual BQ charges (i.e., once you've gone over the 1 TB), which makes it less useful for your particular use case.
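A minimal Python sketch of the "sum it up yourself" approach. The arithmetic is plain Python. The last function uses the google-cloud-bigquery client (list_jobs, which wraps jobs.list) and is only defined, not called, since it needs credentials and permission to list other users' jobs. It assumes the free tier is 1 TB counted as 1024^4 bytes and resets at the start of the calendar month in UTC; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timezone

TIB = 1024 ** 4  # assumption: 1 TB is counted as 2**40 bytes


def free_tier_remaining(bytes_per_query, free_bytes=TIB):
    """Bytes left in the free tier; None entries (no bytes reported) count as 0."""
    used = sum(b or 0 for b in bytes_per_query)
    return max(free_bytes - used, 0)


def start_of_month(now=None):
    """Midnight UTC on the first day of the current month (assumed reset point)."""
    now = now or datetime.now(timezone.utc)
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def bytes_processed_this_month(project=None):
    """Collect total_bytes_processed for this month's query jobs via jobs.list."""
    from google.cloud import bigquery  # needs google-cloud-bigquery + credentials
    client = bigquery.Client(project=project)
    jobs = client.list_jobs(all_users=True, min_creation_time=start_of_month())
    return [job.total_bytes_processed for job in jobs if job.job_type == "query"]
```

Usage would be free_tier_remaining(bytes_processed_this_month()) / TIB to get the remaining allotment in TB.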
To expand on Jeremy's answer: you can also get that information directly from the INFORMATION_SCHEMA.JOBS_BY_* views, as I explained in my answer to "How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?", which is based on "How to monitor query costs in Google BigQuery":
SELECT
  -- Estimated on-demand cost in USD at $5 per TB (1 TB = 1024^4 bytes)
  (SUM(total_bytes_processed) * 5) / (1024 * 1024 * 1024 * 1024) AS estimated_cost_usd
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
Please consider the caveats mentioned in the linked answers (caching, region-specific costs per TB, ...)!

Google bigquery for free?

I'm new here and am looking at this:
https://temboo.com/hardware/google-big-query-getting-started
It is about how to connect sensors to Google BigQuery,
but I don't know whether it is free or not.
My usage would be around 1 GB per month.
Please tell me what I can get for free there. I'm an absolute beginner and don't want to get a big bill.
Thanks,
Petr
BigQuery charges for query processing and storage.
Query processing will likely be free in your case, since the first 1 TB per month is free, and you're only using 1 GB per month.
You will likely have to pay a small amount to store the data you're querying, but at that scale we're talking pennies ($0.02/GB/month).
Full pricing details:
https://cloud.google.com/bigquery/pricing
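For a feel of the storage side, here is a tiny Python sketch. It assumes the "1 GB per month" in the question means 1 GB of new data added each month and never deleted, and it uses only the $0.02/GB/month rate quoted above, so any free storage allowance or cheaper long-term rate is ignored. The function name is hypothetical.

```python
def storage_cost_by_month(gb_added_per_month, months, usd_per_gb_month=0.02):
    """Monthly storage charge as data accumulates and nothing is deleted."""
    return [round(gb_added_per_month * m * usd_per_gb_month, 2)
            for m in range(1, months + 1)]


costs = storage_cost_by_month(1, 12)
print(costs[0], costs[-1])  # 0.02 0.24
```

So under these assumptions, even after a year the storage charge stays well under a dollar per month.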

Data warehouse, data update strategy with Bigquery

We have an MIS that stores all the information about customers, accounts, transactions, etc. We are building a data warehouse with BigQuery.
I am pretty new to this topic. Should we
1. extract ALL the customers' latest information every day and append it to a BigQuery table with a timestamp,
2. or extract only the customer information that was updated on that day?
The first solution uses a lot of storage, takes time to upload, and produces lots of duplicates, but it makes queries very straightforward for me. For the 2nd solution, given a specific date, how can I get the latest record as of that day?
Similar for Account data, an example of simplified Account table, only 4 fields here.
AccountId, CustomerId, AccountBalance, Date
If I need to build a report or chart of the AccountBalance of a group of customers for every day, I need to know the balance of each account on every specific date. So should I extract each account record every day, even if it's the same as the day before, or can I extract an account only when its balance has changed?
What is the best solution, or what would you suggest? I prefer the 2nd one because there are no duplicates, but how can I construct the query in BigQuery, and will performance be an issue?
What else should I consider? Any recommendation for me to read?
When designing a DWH, you need to start from the business questions and translate them into KPIs, measures, dimensions, etc.
When you have those in place...
you choose technology based on questions such as the following (and many more):
Who are your users? How often, and at what resolution, do they consume the data? What are your data sources? Are they structured? What are the data volumes? What is your data quality? How often does your data structure change? Etc.
When choosing technology you need to think about the following: ETL, DB, scheduling, backup, UI, permissions management, etc.
After you have all of those defined, the data schema design is pretty straightforward and is derived from the purpose of the DWH and the limits of your technology.
You have pointed out some of the points to consider, but the answer depends on your needs and is not tied to a specific DB technology.
I am afraid your question is too general to be answered without a deep understanding of your needs.
Referring to your comment below:
How reliable is your source data? Are you interested in analyzing trends or just snapshots? Does your source system allow "select all" operations? What are the data volumes? What resources does your source allow for extraction (locks, bandwidth, etc.)?
If you just need a daily snapshot of the current balance, and your source system imposes no limits, it would be much simpler to run a daily snapshot.
This way you don't need to manage "increments" or handle data integrity issues, system discrepancies, etc. However, this approach might have an undesired impact on your source system and on your network costs.
If you do have resource limits and you choose the incremental ETL approach, you can either
create a "changes log" table and query it, using row_number() to find the latest record per account,
or you can construct a copy of the source accounts table, merging each day's changes into the existing table.
Each approach has its own trade-offs in simplicity, cost, and resource consumption.
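To illustrate the "changes log" option, here is a small runnable sketch of the row_number() query. It uses Python's built-in sqlite3 as a stand-in for BigQuery (SQLite 3.25 or newer is needed for window functions); the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) part is standard SQL that BigQuery also supports. The table name account_changes and the column ChangeDate (in place of Date) are hypothetical, and the ? placeholder is SQLite's parameter syntax, so in BigQuery you would use a query parameter or a literal date instead.

```python
import sqlite3

# For each account, keep only the most recent change on or before the given date.
LATEST_BALANCE_SQL = """
SELECT AccountId, CustomerId, AccountBalance, ChangeDate
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY AccountId
           ORDER BY ChangeDate DESC
         ) AS rn
  FROM account_changes
  WHERE ChangeDate <= ?
)
WHERE rn = 1
ORDER BY AccountId
"""


def balances_as_of(conn, as_of_date):
    """Balance of every account as of the given ISO date string."""
    return conn.execute(LATEST_BALANCE_SQL, (as_of_date,)).fetchall()


# Tiny in-memory changes log: a row exists only when a balance changed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_changes ("
    "AccountId INTEGER, CustomerId INTEGER, AccountBalance REAL, ChangeDate TEXT)"
)
conn.executemany(
    "INSERT INTO account_changes VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2020-01-01"),
        (1, 10, 150.0, "2020-01-03"),
        (2, 11, 50.0, "2020-01-02"),
    ],
)

print(balances_as_of(conn, "2020-01-02"))
```

Account 1 did not change on 2020-01-02, so the query falls back to its 2020-01-01 row. That fallback is what lets you store only changes and still report a balance for every date.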
Hope this helps