How to set data limit per query in Big Query - sql

How can we set a data limit that applies to every query?
Ex - in a single query run, no more than 10 TB of data can be scanned.
I have been following this link, https://cloud.google.com/bigquery/docs/custom-quotas, and found how to set limits at the user and project level,
but I haven't been able to find a quota limit per query.

That is not currently a valid option for custom quotas in BigQuery.
A custom quota in BigQuery can limit the amount of query data processed per day at the project level or at the user level.
Project-level custom quotas limit the aggregate usage of all users in that project.
User-level custom quotas are applied separately to each user and service account within a project.
So, in summary, there is no way to limit the number of queries or the amount of data scanned by a single query. It is a daily, global limit on data processed.

Related

How to set a limit on data scanned by a user in Big Query

Is there a way to set a limit so that a user cannot query more than a particular amount of data in a table?
Eg - if a user runs a 'SELECT' statement, they should be limited to querying a certain amount of data irrespective of the query they write.
I have been trying to follow this link - https://cloud.google.com/bigquery/quotas.
You can set quotas at the project level or at the user level. Be careful though, since it is not possible to assign a custom quota to a specific user or service account.
You can set quotas by following these steps in the Cloud Console:
Go to the Quotas page
Select only the BigQuery service
Look for Query usage per day per user or Query usage per day and select it
Click on Edit Quotas
Set the limit on the right side, validate, and submit
More information:
https://cloud.google.com/bigquery/docs/custom-quotas
https://cloud.google.com/docs/quota#managing_your_quota_console

Is it possible to retrieve full query history and correlate its cost in google bigquery?

I am querying multiple tables and I am able to see the cost of each query for my own use. When I view the Query History, I only see the queries I ran on my account.
So my question is: is it possible to somehow see the queries that have been run by others (as well as their cost) in a project from the query history?
You can use the JOBS INFORMATION_SCHEMA view:
SELECT
  query,
  total_bytes_processed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  project_id = 'your_project_id'
  AND user_email = 'my@email.com'
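If you also want a rough cost per query on top of that view, a sketch along these lines should work; the $5 per TiB figure is just the on-demand list price assumption, so adjust it to your own pricing:
SELECT
  user_email,
  query,
  total_bytes_processed,
  -- assuming on-demand pricing of roughly $5 per TiB processed
  ROUND(total_bytes_processed / POW(2, 40) * 5, 4) AS estimated_cost_usd
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY
  estimated_cost_usd DESC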
According to the documentation, there is not a direct method of getting costs by job and user. However, there is a way of doing it.
For a detailed billing analysis, I would advise you to export the logs to BigQuery with a custom filter and from there analyse the billing for each user and query job.
So, you can create an export using the Logs Viewer or the API. While creating your sink use the following custom filter:
resource.type="bigquery_resource"
logName="projects/<your_project>/logs/cloudaudit.googleapis.com%2Fdata_access"
protoPayload.methodName="jobservice.jobcompleted"
The above filter will retrieve completed query jobs, whilst the data access logs are a comprehensive audit of every query run in BigQuery along with the total bytes scanned. I would like to point out that you have to make sure that data_access logs are enabled, link.
From the log entries you will get the fields:
protoPayload.authenticationInfo.principalEmail
protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query
protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalBilledBytes
In BigQuery, you can use a query as follows:
SELECT
protopayload_auditlog.authenticationInfo.principalEmail AS email,
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes AS total_billed_bytes,
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query AS query,
protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobName.jobId as job_id
FROM
`<myproject>.<mydataset>.cloudaudit_googleapis_com_data_access`
WHERE
protopayload_auditlog.methodName = 'jobservice.jobcompleted';
Afterwards, to estimate the price of each query, you can use totalBilledBytes together with the Pricing summary to add a new column with a price estimate for each query. You then have a final table with the user's email, the query text, total bytes billed, job id and an estimated price, as sketched below.
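For illustration, a minimal sketch of that final step over the exported table could look like this (the $5 per TiB value is just the on-demand list price assumption):
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail AS email,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobName.jobId AS job_id,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query AS query,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes AS total_billed_bytes,
  -- assuming on-demand pricing of roughly $5 per TiB billed
  ROUND(protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes / POW(2, 40) * 5, 4) AS estimated_cost_usd
FROM
  `<myproject>.<mydataset>.cloudaudit_googleapis_com_data_access`
WHERE
  protopayload_auditlog.methodName = 'jobservice.jobcompleted';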

Does querying a public BigQuery dataset count towards project quota?

I am running a Google Cloud Platform project that utilizes BigQuery in Sandbox mode (no billing enabled). In this project, I query solely public datasets.
The Quota (in IAM & admin) shows 0 MiB although I have already queried a few hundred GBs.
This raises the question of whether or not querying public BigQuery datasets counts towards project quota.
The first 1TB of data you query will be free. After that you will be billed at $5 per TB.
You can monitor your usage in the logs, but I find using Billing easier for this: you will get an exact usage figure, where the Product will be 'BigQuery' and the SKU will be 'Analysis'. If the data you were querying were not a public dataset, you would also be charged for 'Active Storage'.
Relevant quote 1:
You pay only for the queries that you perform on the data (the first 1 TB per month is free, subject to query pricing details).
And 2:
To get started using a BigQuery public dataset, you must create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you must also enable billing.
Source: https://cloud.google.com/bigquery/public-data/
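If you would rather check your month-to-date usage against the free 1 TB with a query, a sketch like this against the jobs INFORMATION_SCHEMA view should give an approximation (it treats 1 TB as 2^40 bytes and assumes your jobs run in the US multi-region):
SELECT
  ROUND(SUM(total_bytes_processed) / POW(2, 40), 3) AS tb_processed_this_month,
  ROUND(1 - SUM(total_bytes_processed) / POW(2, 40), 3) AS free_tier_tb_remaining
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = 'QUERY'
  AND creation_time >= TIMESTAMP(DATE_TRUNC(CURRENT_DATE(), MONTH));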

Where do you get Google Bigquery usage info (mainly for processed data)

I know that BigQuery offers the first "1 TB of data processed" per month for free but I can't figure out where to look on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard which had the info but for the past couple of weeks the "old dashboard" isn't accessible.
From the Google Cloud Console overview page for your project, click on the "details" section on the top right, next to the charge estimate.
You'll get an estimate of the charges for the current month for each service and item in the service, including BigQuery analysis.
If you want to track this usage, you can also export the data into CSV every day by going into the Billing settings and enabling the usage export feature. Do not worry about the fact that it only mentions Compute Engine; it actually works for other services as well.
You can also access the billing history directly by clicking on the billing account link.
You will get a detailed bill with the usage info.
Post GCP Console Redesign Answer
The GCP console was redesigned and now the other answer here no longer applies, but it is still possible to view your usage by going to IAM & Admin -> Quotas.
What you're looking for is "Big Query API: Query usage per day". It doesn't seem possible to view your usage over 30 days unfortunately, but you can see your current usage (per day) and your peak usage over the past 7 days. You can also set a daily quota. If you're just working infrequently or doing a lot in one day, you can set a quota to 1 TiB and prevent yourself from blowing your whole allocation in one day.
You can try sending feedback about these limitations, like I did, by clicking the question mark at the top right and then send feedback.
Theo is correct that there is no way to view the number of bytes processed or billed since the start of the month (inside of the free tier) in the GCP Billing Console. However, you can extract the bytes processed and bytes billed data from logs in Cloud Logging and calculate the total bytes processed/billed since the start of the month inside of BigQuery.
Here are the steps to count total bytes billed in a month:
Under Cloud Logging, go to Logs Explorer (NOT the Legacy Logs Explorer) and run the following query in the query builder frame:
resource.type="bigquery_project" AND
protoPayload.metadata.jobChange.job.jobStats.queryStats.totalBilledBytes>1 AND
timestamp>="2021-04-01T00:00:00Z"
The timestamp clause is not actually necessary, but it will speed up the query. You can set timestamp >= <value> to any valid timestamp you want as long as it returns at least one result.
In the Query Results frame, click the "Action" button, and select "Create Sink".
In the window that opens, give your sink a name, click "Next", and in the "Select sink service" dropdown menu select "BigQuery dataset".
In the "Select BigQuery dataset" dropdown menu, either select an existing dataset where you would like to create your sink (which is a table containing logs) or if you prefer, choose "Create new BigQuery dataset.
Finally, you will likely want to check the box for Partition Table, since this will help you control costs whenever you query this sink. As of the time of this answer, however, Google limits partition tables to 4000 partitions, so you may find it is necessary to clear out old logs eventually.
Click "Create Sink" (there is no need for any inclusion or exclusion filters).
Run a query in BigQuery that produces bytes billed (i.e. a query that does not return a previously cached result); this is necessary to instantiate the sink, and an example is sketched below. Moments after your query runs, you should see a table called <your_bigquery_dataset>.cloudaudit_googleapis_com_data_access
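For instance, something like this against a public sample table should incur a small amount of billed bytes (the table used here is just an example):
SELECT
  corpus,
  SUM(word_count) AS total_words
FROM
  `bigquery-public-data.samples.shakespeare`
GROUP BY
  corpus
ORDER BY
  total_words DESC;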
Enter the following Standard SQL query in the BigQuery query editor:
WITH
bytes_table AS (
SELECT
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.createTime') AS date_time,
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS billedbytes
FROM
`<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
WHERE
EXTRACT(MONTH
FROM
timestamp) = 4
AND EXTRACT(YEAR
FROM
timestamp) = 2021)
SELECT
(SUM(CAST(billedbytes AS INT64))/1073741824) AS total_GB
FROM
bytes_table;
You will want to change the month from 4 to whatever month you intend to query, and 2021 to whatever year you intend to query. Also, you may find it helpful to save this query as a view if you intend to rerun it periodically.
Be advised that your sink does not contain your past BigQuery logs, only BigQuery logs produced after you created the sink. Therefore, in the first month, the number of GB returned by this query will not be an accurate count of your bytes billed in the month unless you happen to have created the sink prior to running any queries in BigQuery during the current month.
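If you rerun this often, a condensed variant with the month as a variable may be more convenient; this is only a sketch using the same placeholder table name as above (JSON_VALUE returns a string, hence the CAST):
DECLARE target_month DATE DEFAULT DATE '2021-04-01';
SELECT
  SUM(CAST(JSON_VALUE(protopayload_auditlog.metadataJson,
      '$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS INT64)) / POW(2, 30) AS total_billed_GiB
FROM
  `<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
WHERE
  DATE_TRUNC(DATE(timestamp), MONTH) = target_month;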
Might be related to How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?
If you are fine with using BigQuery itself to get that information (instead of using a UI), you can use something like this:
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email,
FROM (
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
Explanation
Please consider the caveats I mentioned in my answer here

Teradata : Query to find CPU utilization user level (History data)

I have the query to find the current CPU and IO utilization at user level
Query
SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC
But I want to check the historical data (date and time) of CPU/IO utilization at the user level:
AccountName|UserName|CPU|DISKIO|Date/Time
Big Picture
A utility will be created that fetches data from Teradata and generates graphs from it on a daily basis. The report will provide all the utilization details for the whole day, which will help us plot the graphs. The whole utility will be scheduled to run once daily.
Restrictions:
Being developers, we are not allowed to use Teradata Manager.
Normally, data in AmpUsage will be historized, i.e. a daily job INSERTs the result of your query into a history table and then DELETEs all rows from AmpUsage.
Depending on the account strings used, there might be "account string expansion" (ASE) in place, e.g. ..._&D_&H results in one row per AMP per user per hour, like ..._131025_09, which can provide hourly usage data.
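As a minimal sketch, assuming the account strings end with the ASE suffix _&D_&H (so an expanded AccountName looks like MYACCT_131025_09), you could split out the date and hour parts and aggregate on them; adjust the SUBSTR offsets to your actual account string layout:
SELECT
  UserName,
  SUBSTR(Acct, 1, CHARACTER_LENGTH(Acct) - 10) AS BaseAccount,  -- strip the trailing _YYMMDD_HH
  SUBSTR(Acct, CHARACTER_LENGTH(Acct) - 8, 6) AS LogDate,       -- YYMMDD from &D
  SUBSTR(Acct, CHARACTER_LENGTH(Acct) - 1, 2) AS LogHour,       -- HH from &H
  SUM(CpuTime) AS CPU,
  SUM(DiskIO) AS DISKIO
FROM (
  SELECT UserName, TRIM(AccountName) AS Acct, CpuTime, DiskIO
  FROM DBC.AMPUsage
) AS dt
GROUP BY 1, 2, 3, 4
ORDER BY 3, 4;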