dataset was not found in location EU (ga_sessions data) - google-bigquery

I have the following problem.
I've created the BigQuery export by setting up the linking in the Google Analytics console.
As expected, a dataset was created in BigQuery and the data is flowing in on a daily basis.
The timezone and country in the GA account are set to Germany, but the location of the resulting dataset in BQ is US (although I didn't specify a location when I linked the data), which causes issues when combining the data from this property with the other data I have in storage.
My questions are:
Can someone please explain why it could have happened?
Is there any solution except copying the whole dataset to the new location?
Are there any other potential problems with having the dataset in a different location from the other datasets (other than not being able to query them together)?
Really appreciate your help!
Thanks in advance

The data is located in the US because that is the default location for the GA-to-BQ export feature. It is documented here (Step 2.1).
Consider localizing your dataset to the E.U. at this step.
Data is geolocated in the U.S. by default. Localizing your data to the EU after the initial export can cause issues with querying across BigQuery regions. Resolving those issues may require a transfer of data, which has associated costs. We recommend creating the E.U.-localized dataset at this point in order to avoid any negative side effects.
Google Analytics BigQuery Export is incompatible with GCP policies that prevent dataset creation in the US. If you have such a policy on your GCP project, you will have to remove it to export your data to the EU.
The only way to have the GA data in the EU is to copy the whole dataset to a new EU-located dataset with a different name, then delete the original one and copy the new dataset back under the original name.
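These days the first hop (US to EU) can also be scheduled with the cross-region dataset copy feature of the BigQuery Data Transfer Service, if it is available to you. A minimal Python sketch, with placeholder project and dataset names:

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder names; the EU destination dataset must already exist
# (created with location "EU") before the copy runs.
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="analytics_eu",
    display_name="Copy GA export to EU",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": "my-project",
        "source_dataset_id": "analytics_us",
    },
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("my-project"),
    transfer_config=transfer_config,
)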
Your main blocker is the one you have already mentioned: you won't be able to JOIN the GA data with any other data located in another region. In addition, there might be legal issues because of the GDPR.

Related

How to query a BigQuery table in one GCP project and one location and write results to a table in another project and another location with Airflow?

I need to query a BigQuery table in one GCP project (say #1) and one location (EU) and write results to a table in another project (say #2) and another location (US) with Airflow.
Composer/Airflow instance itself runs in project #2 and location US.
Airflow is using GCP connection configured with a service account from project #2 which also has most of the rights in project #1.
I realise that this might involve multiple extra steps such as storing data temporarily in GCS, so this is fine as long as the end result is achieved.
How should I approach this problem? I saw quite a few articles, but none suggests a strategy for dealing with this situation, which I suppose is fairly common.
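For illustration, the GCS round-trip mentioned above could look roughly like this in plain Python (all project, dataset and bucket names are placeholders; this could run inside a single Airflow PythonOperator using the project #2 service account):

from google.cloud import bigquery

bq = bigquery.Client(project="project-2")

# 1. Materialise the query result into a staging table in the EU.
staging = "project-1.eu_dataset.staging_result"  # hypothetical staging table
bq.query(
    "SELECT * FROM `project-1.eu_dataset.source_table`",
    job_config=bigquery.QueryJobConfig(
        destination=staging, write_disposition="WRITE_TRUNCATE"
    ),
    location="EU",
).result()

# 2. Extract the staging table to an EU bucket.
bq.extract_table(
    staging,
    "gs://eu-staging-bucket/result-*.avro",
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    location="EU",
).result()

# ...copy the files to a US bucket, e.g. gsutil -m cp...

# 3. Load the files into the destination table in project #2 (US).
bq.load_table_from_uri(
    "gs://us-staging-bucket/result-*.avro",
    "project-2.us_dataset.final_table",
    job_config=bigquery.LoadJobConfig(source_format="AVRO"),
    location="US",
).result()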

Backfill Google Analytics in BigQuery

I'm looking for a workaround on the following issue. Hope someone can help.
I'm unable to backfill data in the ga_sessions_ table in BigQuery through product linking in GA; e.g. partition ga_sessions_20180517 is missing.
This specific view has already been linked before, and Google documentation says that the historical load is only done once per view (hence the issue) (https://support.google.com/analytics/answer/3416092?hl=en)
Is there any way to work around it?
Kind regards,
Martijn
You can use the Google Analytics Reporting API to get the data for that view. This method has a lot of restrictions (e.g. the data is sometimes sampled, and only 7 dimensions can be exported in one call), but at least you will be able to fetch your data in a partitioned manner.
Documentation here.
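A minimal batchGet sketch for one day's worth of data, which lines up with the missing ga_sessions_YYYYMMDD partition (the key file and view ID are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical service-account key file and view ID.
credentials = service_account.Credentials.from_service_account_file(
    "key.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analyticsreporting", "v4", credentials=credentials)

# One day per request; at most 7 dimensions and 10 metrics per call.
response = analytics.reports().batchGet(
    body={
        "reportRequests": [{
            "viewId": "123456789",
            "dateRanges": [{"startDate": "2018-05-17", "endDate": "2018-05-17"}],
            "dimensions": [{"name": "ga:date"}, {"name": "ga:sourceMedium"}],
            "metrics": [{"name": "ga:sessions"}],
        }]
    }
).execute()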
If you need a lot of dimensions/metrics in hit-level format, scitylana.com has a service that can provide this data historically.
If you have a clientId set in a custom dimension, the data quality is near perfect.
It also works without a clientId set.
You can get all history as available through the API.
You can get 100+ dimensions/metrics in one batch into BQ.

Is it possible to extract job from big query to GCS across project ids?

Hey guys, I'm trying to export a BigQuery table to Cloud Storage à la this example. It's not working for me at the moment, and I'm worried that the reason is that the Cloud Storage project is different from the BigQuery table's project. Is this actually doable? I can't see how using the template above.
Confirming:
You CAN have your table in ProjectA exported/extracted to a GCS bucket in ProjectB. You just need to make sure you have the proper permissions on both sides. At least:
READ for the respective dataset in Project A, and
WRITE for the respective bucket in Project B.
Please note: the data in the respective dataset of Project A and the bucket in Project B MUST be in the same location (US, EU, etc.).
Simply put: source and destination must be in the same location.
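A minimal sketch of such a cross-project extract with the Python client (project, dataset and bucket names are placeholders):

from google.cloud import bigquery

# The account running the job needs READ on the ProjectA dataset
# and WRITE on the ProjectB bucket.
client = bigquery.Client(project="project-a")

extract_job = client.extract_table(
    "project-a.my_dataset.my_table",
    "gs://project-b-bucket/my_table-*.csv",  # bucket owned by ProjectB
    location="US",  # must match the source dataset's location
)
extract_job.result()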

Google Cloud Logging export to Big Query does not seem to work

I am using the Google Cloud Logging web UI to export Google Compute Engine logs to a BigQuery dataset. According to the docs, you can even create the BigQuery dataset from this web UI (it simply asks you to give the dataset a name). It also automatically sets up the correct permissions on the dataset.
It seems to save the export configuration without errors, but a couple of hours have passed and I don't see any tables created in the dataset. According to the docs, exporting the logs will stream the logs to BigQuery and will create the table with the following template:
my_bq_dataset.compute_googleapis_com_activity_log_YYYYMMDD
https://cloud.google.com/logging/docs/export/using_exported_logs#log_entries_in_google_bigquery
I can't think of anything else that might be wrong. I am the owner of the project and the dataset is created in the correct project (I only have one project).
I also tried exporting the logs to a Google Storage bucket, and still no luck there. I set the permissions correctly using gsutil according to this:
https://cloud.google.com/logging/docs/export/configure_export#setting_product_name_short_permissions_for_writing_exported_logs
And finally I made sure that the 'source' I am trying to export actually has some log entries.
Thanks for the help!
Have you ingested any log entries since configuring the export? Cloud Logging only exports entries to BigQuery or Cloud Storage that arrive after the export configuration is set up. See https://cloud.google.com/logging/docs/export/using_exported_logs#exported_logs_availability.
You might not have granted edit permission to 'cloud-logs@google.com' in the BigQuery console. Refer to this.
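If you'd rather set that permission programmatically, a sketch with the Python client could look like this (project and dataset names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project
dataset = client.get_dataset("my_bq_dataset")   # placeholder dataset

# Append WRITER access for the Cloud Logging group and save it back.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",
        entity_type="groupByEmail",
        entity_id="cloud-logs@google.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])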

BigQuery Data Location Setting

Is there a way to determine "BigQuery Data Location Setting", similar to "Cloud Storage Data Location Setting" or “Datastore Data Location Setting”?
Apparently there are some legal & tax issues for companies operating outside of the US when using services hosted in the US. Our legal guys have asked me to configure the BigQuery location to be in the EU, but I couldn't find where to configure this.
Thanks
There isn't currently a way to locate your BigQuery data in the EU. Right now, all of it is located in the United States.
That said, one of the reasons why this hasn't been done yet is a lack of customer interest in EU datacenters. If you have a relationship with Google Cloud support and want this feature, please let them know. Alternatively, vote up the question and we'll take that into account when we prioritize new features.
This appears to have changed now, so you can actually select the EU datacenter:
http://techcrunch.com/2015/04/16/google-opens-cloud-dataflow-to-all-developers-launches-european-zone-for-bigquery/
Another issue arises when you want to copy datasets from one region to the other, which is not currently possible (at least not directly). Here is how you can check the location of your dataset. Open up a Google Cloud Shell and enter this command:
bq show --format=prettyjson {PROJECT_ID}:{DATASET_NAME} | grep location
However, note that you cannot edit the location. You will need to backup/export all your tables, delete the dataset, and recreate the dataset with the desired location.
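For the recreation step, a minimal Python sketch (project and dataset names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

dataset = bigquery.Dataset("my-project.my_dataset_eu")  # placeholder name
dataset.location = "EU"  # immutable once the dataset exists
client.create_dataset(dataset)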