How do I connect various clients' Google Analytics properties (GA4 & UA) to one instance of BigQuery? I want to store the analytics reports in BigQuery and then visualise them on a unified dashboard in Looker.
You can set up the exports from Google Analytics to go to the same BigQuery project and transfer historical data to the same project as well.
Even if the data is spread across multiple GCP projects, you can still query all of it from a single project. I would suggest you create a query that combines the data from the multiple sources together. You can then save it as a view and add it as a source in Looker, use it as a custom query in Looker, or, for best efficiency, save the results of the query as a new reporting table that feeds into Looker.
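For example, a view along these lines (the project names, dataset IDs, and property IDs are all made up) would union the GA4 export tables of two clients into one source for Looker:

```sql
-- Hypothetical names: each client's GA4 export lands in its own
-- dataset named analytics_<property_id>, possibly in its own project.
CREATE OR REPLACE VIEW `reporting-project.reporting.all_clients_events` AS
SELECT 'client_a' AS client, event_date, event_name, user_pseudo_id
FROM `client-a-project.analytics_111111111.events_*`
UNION ALL
SELECT 'client_b' AS client, event_date, event_name, user_pseudo_id
FROM `client-b-project.analytics_222222222.events_*`;
```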
I want to know if it is possible to link multiple different Google Analytics properties to a single BigQuery project, with a separate dataset per property.
The linking was accepted and appears to be working, but I can't tell whether the data will be saved in one dataset or in separate datasets.
Yes, it is possible to link multiple GA properties to a single GCP project, each in a different BigQuery dataset. In the case of Universal Analytics, the ID of each BQ dataset will be the same as the GA View ID. In the case of GA4, the ID of each BQ dataset will be analytics_<property_id>.
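To illustrate (the project name and IDs below are made up), the two exports end up in datasets you can query like this:

```sql
-- Universal Analytics export: the dataset is named after the GA View ID,
-- with daily tables named ga_sessions_YYYYMMDD.
SELECT COUNT(*) FROM `my-project.123456789.ga_sessions_20210101`;

-- GA4 export: the dataset is named analytics_<property_id>,
-- with daily tables named events_YYYYMMDD.
SELECT COUNT(*) FROM `my-project.analytics_987654321.events_20210101`;
```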
I need to understand the below:
1.) How does one BigQuery table connect to another BigQuery table, apply some logic, and create a third? For example, if I have an ETL tool like DataStage and some data has been uploaded for us to consume in the form of a BigQuery table, how do I design the job, in DataStage or using any other technology, so that the source is one BQ table and the target is another BQ table?
2.) My input will be a BigQuery view; I need to run some logic on that view and then load the result into another BigQuery view.
3.) What is the technology used to connect one BigQuery to another: is it HTTPS or something else?
Thanks
If you have a large amount of data to process (many GB), you should do the transformation of the data directly in the BigQuery database. It would be very slow to extract all the data, run it through something locally, and send it back. You don't need any outside technology to make one view depend on another view, besides access to the relevant data.
The ideal job design is an SQL query that BigQuery can process. If you are trying to link tables/views across different projects, then each source BQ table must be listed in fully-qualified form, projectName.datasetName.tableName, in the FROM clause of the SQL query. Project names are globally unique in Google Cloud.
Permissions to access the data must be set up correctly. BQ provides fine-grained control over who can access, and it is in the BQ documentation. You can also enable public access to all BQ users if that is appropriate.
Once you have that SQL query, you can create a new view by sending the SQL to Google BigQuery through the command line (the bq tool), the web console, or an API.
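As a minimal sketch (all project, dataset, and table names are placeholders), a view in one project that applies some logic to a table in another project looks like this:

```sql
-- The view lives in the target project; the FROM clause references the
-- source table in another project in fully-qualified form.
CREATE VIEW `target-project.reporting.cleaned_orders` AS
SELECT order_id, customer_id, total_amount
FROM `source-project.raw.orders`
WHERE total_amount > 0;
```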
1) You can use the BigQuery Connector in DataStage to read from and write to BigQuery.
2) BigQuery uses namespaces in the format project.dataset.table to access tables across projects. This allows you to manipulate your data in GCP as if it were all in the same database. To manipulate your data you can use DML or standard SQL (see the example below), and to execute your queries you can use the GCP web console or client libraries such as Python or Java.
3) BigQuery is a RESTful web service and uses HTTPS.
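For example (all names below are hypothetical), a single DML statement can read from one project and write into another:

```sql
-- Cross-project DML: both tables are addressed by their fully-qualified
-- project.dataset.table names; the target table is assumed to exist.
INSERT INTO `project-b.staging.daily_totals` (day, total)
SELECT DATE(created_at) AS day, SUM(amount) AS total
FROM `project-a.sales.transactions`
GROUP BY day;
```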
We are moving our BigQuery data from a QA to a production environment.
For that we have created a new Google account for the production environment.
How can we transfer wildcard table data from one Google account to another?
You can use Google Groups to copy tables very quickly between projects/datasets belonging to different Google accounts.
Set up a Google group from the main Google account.
Invite the new Google account (as owner) to the Google group.
Accept the invitation from the new Google account's Gmail.
Share the original dataset using the shared Google group email: under the dataset name, select the arrow down and pick "Share dataset". Make sure to share with the group, not the user, and make the account an owner (or you will not be able to copy tables).
From the new Google account, create a new project and dataset in BQ. Then add the old project ID to the new Google account under "Switch to project" / "Display project" (under the arrow down under the dataset name). You can now see the old project/dataset and all its tables from the new Google account, and from there you can copy any tables from the old project to the new project/Google account. Even very large tables copy within seconds.
Edit: I think you need to use the old UI for this to work, since these options do not seem to be available in the new one yet.
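Once the new account can see the old project, the copy can also be expressed in SQL; here is a sketch with placeholder names that uses a wildcard to pick up sharded tables, as in your question. Note that, unlike the native table copy, this runs as a query and therefore scans (and bills for) the data:

```sql
-- Consolidate all shards of a wildcard table from the QA project into
-- one table in the production project (all names are placeholders).
CREATE TABLE `prod-project.analytics.events_all` AS
SELECT *, _TABLE_SUFFIX AS shard
FROM `qa-project.analytics.events_*`;
```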
You can move the BigQuery data from your source account/project by exporting the BigQuery dataset to a GCS bucket and then importing the data into the new BigQuery dataset located in your destination account/project:
1) Export the data from your source BigQuery account/project to a regional or multi-region Cloud Storage bucket in the same location as your dataset.
2) Grant the required GCS and BigQuery permissions, using the IAM console, to the account that will be used to load the data in the destination account/project.
3) Load your data into your destination BigQuery account/project based on the data format selected during the export task.
There are no charges for exporting data from BigQuery to Cloud Storage or for loading it back into BigQuery; nevertheless, you do incur charges for storing the data in Cloud Storage. I suggest you take a look at the Free Operations section of the BigQuery documentation to learn more about this.
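If you prefer to script these steps, newer versions of BigQuery SQL also provide EXPORT DATA and LOAD DATA statements that cover the export and load; a rough sketch with placeholder bucket and table names (check that these statements are available in your environment):

```sql
-- Step 1, run in the source project: export to a GCS bucket in the same
-- location as the dataset; the * lets BigQuery shard the output files.
EXPORT DATA OPTIONS (
  uri = 'gs://my-transfer-bucket/events/*.avro',
  format = 'AVRO'
) AS
SELECT * FROM `qa-project.analytics.events_20210101`;

-- Step 3, run in the destination project once the loading account has
-- read access to the bucket: load the exported files.
LOAD DATA INTO `prod-project.analytics.events_20210101`
FROM FILES (
  format = 'AVRO',
  uris = ['gs://my-transfer-bucket/events/*.avro']
);
```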
I've added an AdWords Transfer in BigQuery, which appears to be working as expected.
However, I have multiple MCC accounts in my AdWords setup and would like to see aggregated data for these, but I can only see the individual accounts.
Is it possible to see aggregated data for my MCC accounts using the data transfer service in BigQuery? And if not, how would I go about setting this up?
Each transfer config will load into separate tables.
You can directly reference the tables you are interested in, or you can use scheduled queries (currently in Alpha - https://docs.google.com/forms/d/1n4H54RLPPB0C5C9liBmVSUf-R6DRovu6eaWDhvuPYVQ/edit) to create aggregated tables on a regular basis.
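For example, a scheduled query along these lines (the dataset and table names are made up; check the names your transfer configs actually created) could maintain an aggregated table across accounts:

```sql
-- Union the per-account tables produced by the separate transfer
-- configs into one aggregated reporting table (names are hypothetical).
CREATE OR REPLACE TABLE `my-project.adwords_reporting.campaigns_all` AS
SELECT '1234567890' AS account_id, * FROM `my-project.adwords.Campaign_1234567890`
UNION ALL
SELECT '2345678901' AS account_id, * FROM `my-project.adwords.Campaign_2345678901`;
```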
We are very pleased with the combination of BigQuery <-> Tableau Server with a live connection. However, we now want to work with a data extract (500 MB) on Tableau Server (since this data source is not too big and is used very frequently). The refresh takes too much time (1.5h+), and we noticed that only 0.1% of it is query time; the rest is data export. Since the Tableau Server is on the same platform and in the same location, latency should not be a problem.
This is similar to the slow export of a BigQuery table to a single file, which can be solved by using the "daisy chain" option (wildcards). Unfortunately, we can't use similar logic for a Google BigQuery data extract refresh in Tableau...
We have identified some approaches, but are not pleased with our current ideas:
Working with incremental refresh: rows in our existing BigQuery table can change, and these changes can only be applied in Tableau with a full refresh.
Exporting the BigQuery table to GCS using the daisy chain option and making a Tableau data extract using the Tableau SDK: this would result in quite some overhead...
Writing a Dataflow job using a custom sink for Tableau Server (data extracts).
Experimenting with a Tableau web connector that communicates directly with the BigQuery API: I don't think this will be faster? I didn't see anything about parallelizing calls with the Tableau web connector, but I haven't tried this approach yet.
We would prefer a non-technical option, to limit maintenance... Is there a way to modify the Tableau connector to make use of the "daisy chain" option for BigQuery?
You've uploaded the data into BigQuery. Can't you just use the input for that load job (a CSV, perhaps) as the input for Tableau?
When we use Tableau with BigQuery we also notice that extracts are slow, but we generally don't use them because you lose BigQuery's power. We start with a live data connection, and then (if needed) convert it into a custom query that aggregates the data into a much smaller dataset, which extracts in just a few seconds.
Another way to achieve higher performance with BigQuery and Tableau is to aggregate or join tables beforehand. JOINs on huge tables can be slow, so if you use a lot of them you might consider generating a denormalised dataset that does all of the JOINing first. You will get a dataset with a lot of duplicates and a lot of columns, but if you select only what you need in Tableau (hide unused fields!) then those columns won't count towards your query cost.
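As a sketch (all table and column names are invented), the pre-joined dataset could be built like this and then used as the source of the Tableau extract:

```sql
-- Pre-join and aggregate once in BigQuery so the Tableau extract stays
-- small; every table and column name here is a placeholder.
CREATE OR REPLACE TABLE `my-project.reporting.orders_denormalised` AS
SELECT
  o.order_date,
  c.country,
  p.category,
  SUM(o.amount) AS revenue,
  COUNT(*) AS orders
FROM `my-project.sales.orders` o
JOIN `my-project.sales.customers` c ON o.customer_id = c.id
JOIN `my-project.sales.products` p ON o.product_id = p.id
GROUP BY o.order_date, c.country, p.category;
```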
One recommendation I have seen is similar to your point 2, where you export the BQ table to Google Cloud Storage and then use the Tableau Extract API to create a .tde from the flat files in GCS.
This was from an article on the Google Cloud site, so I'd assume it is best practice:
https://cloud.google.com/blog/products/gcp/the-switch-to-self-service-marketing-analytics-at-zulily-best-practices-for-using-tableau-with-bigquery
There is an article here which provides a step-by-step guide to achieving the above.
https://community.tableau.com/docs/DOC-23161
It would be nice if Tableau optimised the BQ connector for extract refreshes using the BigQuery Storage API. We too have our Tableau Server environment in the same GCP zone as our BQ datasets and experience slow refresh times.