Who pays when using BigQuery Storage Read API across projects? - google-bigquery

Say that I have two projects in GCP, Projects A and B. Project A has a BigQuery table, and Project B needs to read that data using the BigQuery Storage Read API.
If I create a service account in Project B, then go to Project A and grant it the BigQuery Read Session User role as well as add it to the dataset ACL, it will be able to stream the table content. Which project will receive the bill for the data extracted: Project A, where the read session is created, or Project B, which is the home of the acting service account?
To be clear, I would like for Project B to pay for the load they generate.
I have tried to find a way to be explicit about this, but as far as I can tell there is no way to specify a billing project when creating a read session. I have also checked what happens when I try to create a read session with the "parent project" set to Project B while the table path points to Project A, and this just leads to the table not being found at all.

Under Storage Read API pricing, BigQuery charges for the number of bytes processed (also referred to as bytes read). In your scenario, Project A holds the BigQuery table and is where the read session is created; you only attached Project B's service account as a BigQuery Read Session User in Project A, so the billed amount goes to Project A.
To verify this, you can use Billing Reports to check the cost trends for the BigQuery Storage API, filtering the report down to the relevant projects and service.
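For reference, here is a minimal sketch in Python with the google-cloud-bigquery-storage client of where the parent project comes into play when creating a read session; the project, dataset and table names are placeholders:

    from google.cloud import bigquery_storage_v1
    from google.cloud.bigquery_storage_v1 import types

    # Authenticate as Project B's service account, which has been granted the
    # BigQuery Read Session User role on Project A and access to the dataset.
    client = bigquery_storage_v1.BigQueryReadClient()

    # Fully qualified path of the table that lives in Project A (placeholder names).
    table = "projects/project-a/datasets/my_dataset/tables/my_table"

    requested_session = types.ReadSession(
        table=table,
        data_format=types.DataFormat.AVRO,
    )

    # The read session is created under Project A, which - per the answer
    # above - is the project billed for the bytes read.
    session = client.create_read_session(
        parent="projects/project-a",
        read_session=requested_session,
        max_stream_count=1,
    )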

Related

How to query a BigQuery table in one GCP project and one location and write results to a table in another project and another location with Airflow?

I need to query a BigQuery table in one GCP project (say #1) and one location (EU) and write results to a table in another project (say #2) and another location (US) with Airflow.
Composer/Airflow instance itself runs in project #2 and location US.
Airflow is using a GCP connection configured with a service account from project #2, which also has most of the rights in project #1.
I realise that this might involve multiple extra steps such as storing data temporarily in GCS, so this is fine as long as the end result is achieved.
How should I approach this problem? I saw quite a few articles, but none suggests a strategy for dealing with this situation, which I suppose is fairly common.
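One common route, sketched below as a rough, untested outline with placeholder bucket and table names (it assumes the Google provider operators BigQueryToGCSOperator, GCSToGCSOperator and GCSToBigQueryOperator are available in your Composer environment), is to export the EU table to an EU staging bucket, copy the files to a US bucket, and load them into the US table:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator
    from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG("cross_project_copy", schedule_interval=None, start_date=datetime(2023, 1, 1)) as dag:
        # 1. Export the EU table from project #1 to an EU staging bucket.
        export = BigQueryToGCSOperator(
            task_id="export_to_gcs",
            source_project_dataset_table="project-1.eu_dataset.my_table",
            destination_cloud_storage_uris=["gs://eu-staging-bucket/my_table-*.avro"],
            export_format="AVRO",
        )
        # 2. Copy the exported files from the EU bucket to a US bucket.
        copy = GCSToGCSOperator(
            task_id="copy_between_locations",
            source_bucket="eu-staging-bucket",
            source_object="my_table-*.avro",
            destination_bucket="us-staging-bucket",
        )
        # 3. Load the files into the target table in project #2 (US).
        load = GCSToBigQueryOperator(
            task_id="load_into_bq",
            bucket="us-staging-bucket",
            source_objects=["my_table-*.avro"],
            destination_project_dataset_table="project-2.us_dataset.my_table",
            source_format="AVRO",
            write_disposition="WRITE_TRUNCATE",
        )
        export >> copy >> load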

Google Data Fusion Salesforce to Bigquery Pipeline, automatic way of managing schema updates in Salesforce

Hey, I am trying to create some batch jobs that read from a couple of Salesforce objects and push them to BQ. Every time the batch process runs, it truncates the table in BQ and pushes all the data from the SF object back into BQ. Is it possible for Google Data Fusion to automatically detect changes in a Salesforce object (like adding a new column or changing the data type of a column) so that they are registered and pushed to BQ via Google Data Fusion?
For the SF side of the puzzle you could look into https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_describeGlobal.htm and the If-Modified-Since header, which tells you whether the definition of the table(s) has changed. That URL covers all tables in the org; alternatively, you can run table-specific metadata describe calls with https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_sobject_describe.htm
But I can't tell you how to use it in your job.
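To illustrate, a minimal sketch of that describeGlobal call with the If-Modified-Since header, using Python and the requests library; the instance URL, API version and access token are placeholders:

    import requests

    instance_url = "https://yourInstance.my.salesforce.com"   # placeholder
    headers = {
        "Authorization": "Bearer <access_token>",             # placeholder
        "If-Modified-Since": "Mon, 01 May 2023 00:00:00 GMT",
    }

    # describeGlobal: metadata for all sObjects in the org.
    resp = requests.get(f"{instance_url}/services/data/v57.0/sobjects/", headers=headers)

    if resp.status_code == 304:
        print("No sObject definitions have changed since the given date")
    else:
        print("Definitions changed; trigger a schema refresh on the BigQuery side")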
You can use #eyescream's answer as the condition or trigger for the update to BigQuery. You can push changes to BigQuery using Data Fusion's pre-built Salesforce Streaming Source plugin, which, as mentioned in this documentation, tracks updates in Salesforce sObjects. Examples of sObjects are opportunities, contacts, accounts, leads, any custom object, etc.
You can use this approach to automatically track changes and push them to BigQuery. You can also find the whole Salesforce Streaming Source configuration reference in this documentation, which Google's official documentation also points to.
However, if you want a more dynamic approach for your overall use case, you can also use the integration of BigQuery with Salesforce. In that approach you will need to build your own code, in which you can also use #eyescream's answer as the primary condition/trigger and then automatically push the update to your BigQuery schema.

Is it possible to extract job from big query to GCS across project ids?

Hey guys, I'm trying to export a BigQuery table to Cloud Storage à la this example. It's not working for me at the moment, and I'm worried that the reason is that the Cloud Storage project is different from the BigQuery table's project. Is this actually doable? I can't see how using the template above.
Confirming:
You CAN have your table in Project A exported/extracted to a GCS bucket in Project B. You just need to make sure you have the proper permissions on both sides. At least:
READ for the respective dataset in Project A, and
WRITE for the respective bucket in Project B.
Please note: the data in the respective dataset of Project A and the bucket in Project B MUST be in the same location (US, EU, etc.).
Simply put: source and destination must be in the same location.
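As a rough sketch (Python BigQuery client; the project, dataset and bucket names are placeholders), an extract job from a Project A table to a Project B bucket could look like this, provided the credentials have the permissions above and both sides are in the same location:

    from google.cloud import bigquery

    # Credentials need READ on the dataset in Project A
    # and WRITE on the bucket in Project B.
    client = bigquery.Client(project="project-a")

    source_table = "project-a.my_dataset.my_table"
    destination_uri = "gs://project-b-bucket/exports/my_table-*.csv"

    extract_job = client.extract_table(
        source_table,
        destination_uri,
        location="US",  # must match the dataset's location
    )
    extract_job.result()  # wait for the export to complete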

BigQuery: Is it possible to have the query cost go to the project owner?

I know that if any user queries data from a dataset, that person gets billed for the query, while the project gets billed for the storage. Is it possible to set it up so that the project, or the billing account which created the project, gets billed for the query instead of the person who ran the query?
I guess one solution to this is to create a service account and have that service account do the querying through a web app.
Who is billed for BigQuery queries?
It is not the user per se who is billed for querying data - rather, the project from which the query is executed gets billed. Of course, if that given user happens to be the billing owner, then that user effectively gets billed.
The only way I know to change this is to:
be added (as Project Viewer at least) to the project that the queryable data is in, and
log into that project and execute the query from within it.
Note: in the Web UI, the active project becomes the billing project; if you are using the bq command line, you need to set the default or billing project using the respective flag.
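For example, here is a minimal sketch (Python BigQuery client; project, dataset and table names are placeholders) of running the query from within the data owner's project so that project picks up the query cost; with the bq command line the equivalent is the --project_id global flag:

    from google.cloud import bigquery

    # The project passed to the client is the project the query job runs in,
    # and therefore the project that is billed for the query.
    client = bigquery.Client(project="data-owner-project")

    query = "SELECT COUNT(*) FROM `data-owner-project.shared_dataset.events`"
    rows = client.query(query).result()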

Google BigQuery, unable to load data into shared datasets

I created a project on Google BigQuery and enabled billing.
Went on to create a few datasets that were shared with my team members (Can EDIT permissions).
However, my teammates are unable to load data into the respective datasets shared with them. Whenever they try, it says billing is not enabled for this project.
I am able to load data into the datasets, but my team is not.
It's been more than 24 hours.
Thanks in advance.
Note that in order to load data, they need to run a load job, and that load job needs to be run in a project. Perhaps billing is not enabled on the project they are using?
You can give your team members read access to the project (or greater) to allow them to run jobs in your own billing-enabled project.
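As an illustration (Python BigQuery client; the bucket, project and dataset names are placeholders), a team member would run the load job in your billing-enabled project like this:

    from google.cloud import bigquery

    # The load job runs in (and is billed to) the project configured on the
    # client, so point the client at the billing-enabled project.
    client = bigquery.Client(project="billing-enabled-project")

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/data.csv",
        "billing-enabled-project.shared_dataset.my_table",
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish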
You can share BigQuery access at the project level and at the dataset level.
See https://developers.google.com/bigquery/access-control.
I assume you are sharing at the dataset level. Can you try sharing the project instead with your team members? (here: https://cloud.google.com/console/project)
Please report back!