I need to find a way to SQL-join my GA360 tables in BigQuery (BQ) with data inside Ads Data Hub (ADH).
I already know how to query tables from BQ within ADH:
SELECT *
FROM `projectname.table_name`
But I can't find any resources on what matching key to use in the JOIN statement:
SELECT
  *
FROM
  adh.*** AS adh_data
LEFT JOIN ??? AS ga360
  ON adh_data.??? = ga360.???
I read through https://developers.google.com/ads-data-hub/guides/join-your-data, but it's not really clear to me what to take from it, and I couldn't find any information on this topic anywhere else.
Thank you in advance!
AFAIK, ADH doesn't currently allow querying across Google Analytics datasets (which would already be in ADH's "clean room" if they wanted you to be able to make such queries...).
Your best option might be to (A) make sure that you're capturing first-party IDs in your Google Analytics implementation, and (B) ensure those IDs are also captured in your CRM platform as users interact with your properties (the assumption being that your CRM can capture, along with that ID, any Google Analytics related data you may find useful, though I don't think it will be log-level...).
From there, with some form of "onboarding", you may eventually be able to drop your CRM data into ADH-queryable tables, which can be joined (per the link you shared, "join your data"), and then, well... you're at Google's behest for the most part, but I think that's the path you're looking for.
PS: Google may have solutions with guides that include useful example queries on join keys across CM/DV/Google Ads tables, and they may be high quality, but they may not be EXACTLY what you're looking for... It's entirely possible they are not publicly available, though.
Related
Attached is a sample of my current table structure.
My data source is Google Campaign Manager. When I extract the different tables as indicated in the sheet, I get a difference in figures (I am taking over from the person who did the initial design). E.g. Impressions might be one figure in my “fact table” and another figure somewhere else.
The problem is, there are no primary keys, and tying the tables to one another is also difficult; the link might be between dates. The database is Google BigQuery.
Do you have any idea how to do a proper data design with this type of marketing data, and does it need to be denormalised? From my research I gathered that BigQuery data design works best denormalised.
I would also like to move away from spreadsheets; I believe they are also part of the chaos.
I believe CM360 Data Platform is my fact table.
[CM360_Tables Overview.xlsx - Google Sheets 1]
[etl_process.txt - Google Drive 1]
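For illustration, here is a minimal sketch of the denormalised layout mentioned above, using the BigQuery Python client: dimensions live as nested RECORDs inside a single date-partitioned fact table instead of in separate joined tables (all project, table, and field names are made up):

from google.cloud import bigquery

client = bigquery.Client()

# One wide, denormalised fact table: the campaign and placement dimensions
# are nested RECORDs rather than separate tables joined by keys.
schema = [
    bigquery.SchemaField("date", "DATE"),
    bigquery.SchemaField("impressions", "INTEGER"),
    bigquery.SchemaField("clicks", "INTEGER"),
    bigquery.SchemaField("campaign", "RECORD", fields=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("name", "STRING"),
    ]),
    bigquery.SchemaField("placement", "RECORD", fields=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("name", "STRING"),
    ]),
]

table = bigquery.Table("my-project.marketing.cm360_facts", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(field="date")  # one partition per day
client.create_table(table)

Partitioning by date also gives you a natural reconciliation unit: if Impressions disagree between two sources, you can compare them one day at a time.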
I'm currently running a benchmark to see if Google Cloud Datastore could suit our needs, but I've got a problem with how indexes are handled.
I know that I will never have to filter on anything except the key field, so I would like to be able to disable the built-in indexing of all the other fields. I just want to use it as a key/value store.
I'm currently looking at potentially multiple TB of indexes if I cannot disable them (~50 fields, billions of rows), and that would kill our budget.
Is there any way to remove these indexes? It seems the index.yaml file this link talks about only covers composite indexes.
Thanks for your help!
Found it! You can explicitly tell Datastore not to index a field by marking it as an excluded property, like this:
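A minimal sketch with the google-cloud-datastore Python client (the kind and property names are made up):

from google.cloud import datastore

client = datastore.Client()

key = client.key("Record", "row-123")

# Every property listed in exclude_from_indexes is stored but never indexed,
# so no single-property index is built for it.
entity = datastore.Entity(key, exclude_from_indexes=("field_a", "field_b", "payload"))
entity.update({
    "field_a": 42,
    "field_b": "never filtered on",
    "payload": "some large opaque value",
})
client.put(entity)

Lookups by key are unaffected; you only lose the ability to filter or sort on the excluded properties, which is exactly the key/value behaviour you want.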
I searched the Datastore GitHub issues for this same question; it was first raised around 2015 and the last inquiry was in 2019, but there has been no response. You can ask there whether anything has changed.
I have also searched the Google Cloud Platform Public Issue Tracker (PIT) for an existing Feature Request (FR) or issue related to this, but found none.
I think the best way to proceed is to file a FR with the proper components; that way the engineering team will have visibility into it. The PIT uses the number of "stars" (people who have indicated interest in an issue) to prioritize work on the platform. Given that there is no FR open, you should open a new one.
Is there a way, in native SQL, in a specific SQL database (e.g. PostgreSQL), or in another (NoSQL) database, to subscribe to a query and receive updates when an entry matches its criteria? For example, given the query SELECT * FROM users WHERE birthday = today(), is it possible to receive an update when an entry matches the criteria, instead of using a polling mechanism? The query can be slightly more complex, because this idea is needed for a solution that sends recurring messages based on user preferences.
The only database I know of that has built-in notifications like this is RebirthDB, with a feature called "changefeeds":
They allow clients to receive changes on a table, a single document, or even the results from a specific query as they happen. Nearly any ReQL query can be turned into a changefeed.
The only problem is that the database began life as RethinkDB, but the company behind it folded in 2016, leaving it to the open-source community. It's still alive as "RebirthDB" on GitHub with active development, but the documentation is just a copy of the old RethinkDB docs with GitHub notices. They have a website URL, but no website. I hope they can keep it alive: it's a great idea.
https://github.com/RebirthDB/docs
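As a minimal sketch of the idea with the RethinkDB Python driver (the RebirthDB fork keeps the same ReQL API; the connection details and users table mirror the question and are assumptions):

from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015, db="test")

# Turn the query from the question into a changefeed: every document that
# starts matching the filter is pushed to this cursor as it happens,
# with no polling.
feed = (
    r.table("users")
     .filter(r.row["birthday"] == r.now().date())
     .changes()
     .run(conn)
)

for change in feed:
    # new_val holds the document after the change (None if it was deleted
    # or no longer matches the filter).
    print("update:", change["new_val"])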
I'm looking for a workaround for the following issue; I hope someone can help.
I'm unable to backfill data in the ga_sessions_ table in BigQuery through product linking in GA, e.g. partition ga_sessions_20180517 is missing.
This specific view has already been linked before, and Google's documentation says that the historical load is only done once per view (hence the issue): https://support.google.com/analytics/answer/3416092?hl=en
Is there any way to work around it?
Kind regards,
Martijn
You can use the Google Analytics Reporting API to get the data for that view. This method has a lot of restrictions (sometimes the data is sampled, and only 7 dimensions can be exported in one call), but at least you will be able to fetch your data in a partitioned manner.
Documentation here.
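A minimal sketch of such an export with the v4 Reporting API Python client (the view ID, key file, dimensions, and date range are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analyticsreporting", "v4", credentials=credentials)

# One request per missing day keeps the export aligned with the daily
# ga_sessions_YYYYMMDD partitions; at most 7 dimensions and 10 metrics
# are allowed per request.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "XXXXXXXX",
        "dateRanges": [{"startDate": "2018-05-17", "endDate": "2018-05-17"}],
        "dimensions": [{"name": "ga:date"}, {"name": "ga:sourceMedium"}],
        "metrics": [{"expression": "ga:sessions"}],
        "samplingLevel": "LARGE",  # reduces, but does not rule out, sampling
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    print(row["dimensions"], row["metrics"][0]["values"])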
If you need a lot of dimensions/metrics in hit-level format, scitylana.com has a service that can provide this data historically.
If you have a clientId set in a custom dimension, the data quality is near perfect.
It also works without a clientId set.
You can get all the history that is available through the API.
You can get 100+ dimensions/metrics in one batch into BQ.
I've been trying to get my head around NoSQL, and I do see the benefits of embedding data in documents.
What I can't understand, and hope someone can clear up, is how to store data if it must be relational.
For example.
I have many users, and they are all buying products. Every time a user buys a product, we add it under the user's document in Mongo, so it's embedded and it's all great.
The problem I have is when something about that product changes.
Let's say user A buys a car called "Porsche". Then we add a reference to it under the user's profile. However, in a strange turn of events, Porsche gets purchased by Ferrari.
What do you do now? Update each and every record and change the name from Porsche to Ferrari?
Typically in SQL, we would create three tables: one for users, one for cars (description, model, etc.), and one mapping users to purchases.
Do you do the same thing in Mongo? It seems like if you go down this route, you are trying to make Mongo do things the SQL way, which is not what it's intended for.
I can understand how certain data is great for embedding (addresses, contact details, comments, etc.), but what happens when you need to reference data that can and does change on a regular basis?
I hope this question is clear.
DBRefs/manual references were made specifically to solve this issue. Instead of copying the data into each document and then needing to update it when something changes, you can store a reference to another collection. Here is the MongoDB documentation for details:
References in Mongo
Then all you would need to do is update the referenced collection, and the change would be reflected in all downstream locations.
When I used the Mongoose library for Node.js, it actually created three collections, similar to how you might do it in SQL. You can use ObjectIds as foreign keys and enrich them either on the client side or on the backend. There is still no joining, but you can do an $in query for the IDs and then enrich the objects that way; Mongoose can do this automatically by "populating".
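A minimal pymongo sketch of this three-collection pattern with manual references (all collection and field names are made up):

from pymongo import MongoClient

db = MongoClient()["shop"]

user_id = db.users.insert_one({"name": "User A"}).inserted_id
car_id = db.cars.insert_one({"name": "Porsche", "model": "911"}).inserted_id

# The purchase stores a reference to the car, not a copy of it.
db.purchases.insert_one({"user_id": user_id, "car_id": car_id})

# The rebrand is a single update; every purchase still points at the same _id.
db.cars.update_one({"_id": car_id}, {"$set": {"name": "Ferrari"}})

# Client-side enrichment: fetch the purchases, then resolve the references
# with one $in query instead of a join.
purchases = list(db.purchases.find({"user_id": user_id}))
cars = {c["_id"]: c for c in db.cars.find(
    {"_id": {"$in": [p["car_id"] for p in purchases]}})}
for p in purchases:
    p["car"] = cars[p["car_id"]]
    print(p)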