Is there a way to see how much data was processed by a query run by a BigQuery user?

We have a project which is accessed by multiple users. Is there a way to see how much data is being processed by queries run by each of these users?

Take a look at the Jobs: list API.
You can retrieve all jobs run in a given project by all users, including query jobs.
In the response, look for totalBytesProcessed, totalBytesBilled, and billingTier, as well as user_email.
As an option, you can consider Analyzing Audit Logs Using BigQuery.
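For example, here is a minimal sketch using the Python client library (google-cloud-bigquery); the project ID is a placeholder:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# List recent jobs for all users in the project (needs bigquery.jobs.listAll).
for job in client.list_jobs(all_users=True, max_results=100):
    if job.job_type == "query":
        print(job.user_email, job.total_bytes_processed,
              job.total_bytes_billed, job.billing_tier)

Aggregating these per user_email gives a per-user view of how much data each person's queries processed.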

Related

How do I create a BigQuery dataset out of another BigQuery dataset?

I need to understand the below:
1.) How does one BigQuery table connect to another BigQuery table, apply some logic, and create a third? For example, if I have an ETL tool like DataStage and some data has been uploaded for us to consume in the form of a BigQuery table, how do I design the job in DataStage (or any other technology) so that the source is one BQ table and the target is another BQ table?
2.) I want my input to be a BigQuery view, then run some logic on that view and load the result into another BigQuery view.
3.) What technology is used to connect one BigQuery table to another: is it HTTPS or something else?
Thanks
If you have a large amount of data to process (many GB), you should do the transformation of the data directly in the BigQuery database. It would be very slow to extract all the data, run it through something locally, and send it back. You don't need any outside technology to make one view depend on another view, besides access to the relevant data.
The ideal job design is an SQL query that BigQuery can process. If you are trying to link tables/views across different projects, then the source BQ table must be referenced in the fully-qualified form projectName.datasetName.tableName in the FROM clauses of the SQL query. Project names are globally unique in Google Cloud.
Permissions to access the data must be set up correctly. BQ provides fine-grained control over who can access what; see the BQ documentation for details. You can also enable public access for all BQ users if that is appropriate.
Once you have that SQL query, you can create a new view by sending the SQL to Google BigQuery through the command line (the bq tool), the web console, or the API.
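For illustration, here is a minimal sketch of creating such a view with the Python client library (google-cloud-bigquery); all project, dataset, table, and field names are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="target-project")  # placeholder

# Define a view whose SQL reads from a table in another project,
# using the fully-qualified project.dataset.table form.
view = bigquery.Table("target-project.target_dataset.my_view")
view.view_query = """
SELECT field_a, field_b
FROM `source-project.source_dataset.source_table`
WHERE field_a IS NOT NULL
"""
client.create_table(view)  # raises if the view already exists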
1) You can use the BigQuery Connector in DataStage to read from and write to BigQuery.
2) BigQuery uses namespaces in the format project.dataset.table to access tables across projects. This allows you to manipulate your data in GCP as if it were all in the same database.
To manipulate your data you can use DML and standard SQL.
To execute your queries you can use the GCP web console or client libraries such as Python or Java; see the sketch after this answer.
3) BigQuery is a RESTful web service and uses HTTPS.
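As a rough sketch of reading from one project and writing the result into a table in another project with the Python client (all names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="billing-project")  # placeholder

# Run a query against a table in one project and write the result
# into a table in another project.
job_config = bigquery.QueryJobConfig(
    destination="target-project.target_dataset.result_table",
    write_disposition="WRITE_TRUNCATE",
)
sql = "SELECT * FROM `source-project.source_dataset.source_table`"
client.query(sql, job_config=job_config).result()  # wait for completion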

scheduling a query to copy data from a dataset between projects in BigQuery

We want to perform a test on BigQuery with scheduled queries.
The test retrieves a table from a dataset and, basically, copies it into another dataset (for which we have owner permissions) in another project. So far we have managed to do that with a script written in R against the BigQuery API, running on a Google Compute Engine instance, but we want/need to do it with scheduled queries in BigQuery.
If I just compose a query that retrieves the initial table data and try to schedule it, I see there is a project selector, but it's disabled, so it seems I'm tied to the project of the user I'm logged in with.
Is this doable, or am I overdoing it and is using the API the only option?
The current scheduler logic doesn't allow this, and for that reason the project drop-down is disabled in the web UI.
As an example, I tried setting up this scheduled job:
CREATE TABLE IF NOT EXISTS `projectId.partitionTables.tableName` (Field0 TIMESTAMP) --AS SELECT * FROM mydataset.myothertable
and the transfer API returned an error.
You will need to ask the BigQuery team to add this option in a future version of the scheduler API.
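In the meantime, the copy itself can be done directly against the API (which is what the R script already does). A minimal sketch with the Python client, with placeholder table names:

from google.cloud import bigquery

client = bigquery.Client(project="source-project")  # placeholder

# Cross-project table copy; the credentials used need read access on the
# source table and write access on the destination dataset.
copy_job = client.copy_table(
    "source-project.dataset_a.my_table",
    "other-project.dataset_b.my_table",
)
copy_job.result()  # wait for the copy to finish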

Run specific BigQuery job from Google Scripts API

Is it possible to run a specific BigQuery job by calling it by its jobId from the Google Scripts API, instead of pasting in the entire query?
I would like to set up a trigger to run a job periodically, but I don't want to paste the entire query into the Scripts API, because the formatting is error-prone and time-consuming.
Update:
Queries should be able to use temporary functions.
No, you can't re-run a job like that using its ID. But you could use the API to get the details of the job and pull the SQL from it, i.e. https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
I'm not exactly sure why you can't just keep the SQL in the script. That said, I'd put the SQL in a view in BigQuery anyway, and call the view from your script.
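The question is about the Scripts API, but as a rough sketch of the same get-the-job-then-resubmit flow through the REST API, using the Python client (the job ID is a hypothetical placeholder):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# Fetch the finished job and pull its SQL out of the job configuration.
old_job = client.get_job("bquxjob_123abc")  # hypothetical job ID
sql = old_job.query

# "Re-running" means submitting a brand-new job with the same SQL.
new_job = client.query(sql)
new_job.result()  # wait for the new job to finish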
A job is an instance of a query, so technically you can't run the same job multiple times. You can access the job details and fetch the query, but it would be much simpler to just create a new job with the same query.
I see that you want to avoid longer queries running into formatting errors. If your job only reads data, you can create views for the complicated queries and just trigger jobs that do SELECT * FROM view instead. This way you 1) have the query definitions easy to access, straight in the BQ UI, and 2) don't run into formatting problems when assembling the job.

How can I see jobs from all users in the Web UI?

Our backend service runs BigQuery jobs using its own account, and as an administrator I would like to see those jobs in the web UI. Are there permissions I can set or query parameters I can use to do this?
You need to first set up audit logs.
Once you have them in a table (maybe the next day) you can write a query to list all jobs.
A simple query is:
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.methodName,
  protopayload_auditlog.servicedata_v1_bigquery.jobInsertRequest.resource.jobConfiguration.query.query,
  protopayload_auditlog.servicedata_v1_bigquery.jobInsertResponse.resource.jobName.jobId,
  protopayload_auditlog.servicedata_v1_bigquery.jobInsertResponse.resource.jobStatistics.createTime
FROM [wr_auditlogs.cloudaudit_googleapis_com_data_access_20171208]
WHERE protopayload_auditlog.serviceName='bigquery.googleapis.com'
which will list:
the email of whoever executed the job (a user or a service account)
the method name, e.g. jobservice.insert
the query string, if it was a query job (there are extract and cancel jobs as well)
the job ID
the creation time
You can set up this (or a more advanced) query as a view and query that view periodically; a sketch of creating such a view with the Python client follows. You can also retrieve jobs via the API; see that discussed at logging all BigQuery queries.
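A minimal sketch, assuming the google-cloud-bigquery Python library; the view name is hypothetical, and the legacy-SQL flag is needed because the query above uses the legacy [dataset.table] syntax:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# Save the audit-log query above as a view for periodic querying.
view = bigquery.Table("my-project.wr_auditlogs.all_jobs_view")  # hypothetical name
view.view_query = """
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.methodName
FROM [wr_auditlogs.cloudaudit_googleapis_com_data_access_20171208]
WHERE protopayload_auditlog.serviceName='bigquery.googleapis.com'
"""
view.view_use_legacy_sql = True  # the audit-log query is legacy SQL
client.create_table(view)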
It is now possible to view jobs and queries from other users in the web UI; you need the bigquery.jobs.list permission for this.
To list queries, go to "Query History" in the UI and be sure to check "show queries for all users". You can then filter the queries by the user you are looking for.
To list jobs, go to "Job History" in the UI. You can then filter the jobs by the user you are looking for.
You need the permission bigquery.jobs.listAll.
You can find this permission in some of the predefined roles.

When running Tableau against BigQuery, there is no Query History

When running a report on Tableau Cloud against BigQuery, there is no query history.
Even when I refresh the data source with a new day of data, and I see that the report now shows the new date, there is no query history. I need to see the history to calculate costs and to understand what Tableau is doing behind the scenes.
Make sure to use the same user for Tableau and for retrieving the query history.
There is a flag administrators can use on the jobs API to retrieve jobs for all users in the project they administer. See the allUsers flag described at:
https://cloud.google.com/bigquery/docs/reference/v2/jobs/list
This functionality is not available in the UI, but you can use the command line tool (bq) to retrieve jobs across users; it has a flag that enables this option. A sketch of the same thing with the Python client is below.
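A minimal sketch, assuming the google-cloud-bigquery Python library; the project ID and the Tableau service account email are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# List jobs from all users (the allUsers flag), then keep only those run
# by the account Tableau connects with.
TABLEAU_USER = "tableau-sa@my-project.iam.gserviceaccount.com"  # hypothetical
for job in client.list_jobs(all_users=True, max_results=500):
    if job.user_email == TABLEAU_USER:
        print(job.job_id, job.created, job.job_type)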