One BigQuery table in same dataset for each integration with same product in Stitch

We are using a setup of BigQuery, Stitch and Branch.io. In Branch.io we generate deep links which are shared from our app. We ETL the data from Branch.io with Stitch into our data warehouse in BigQuery.
We want two tables from Branch.io: one with data on clicks on deep links, and one with data on installs from deep links.
This works fine: We set up two webhooks in Branch.io and manage to ETL that data to BigQuery.
HOWEVER, we do not want this data to be in separate datasets. We only manage to get two separate datasets, each containing a single table called "data".
We want one dataset, let's say "deep_links", with two tables, "installs_" and "clicks_". We would also like each of those two tables to be sharded by day, so that the data for today (2017-11-20) ends up in the tables "installs_20171120" and "clicks_20171120".
Is this possible?
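
For context, tables sharded by a date suffix like "clicks_20171120" can later be queried as a group with a table wildcard; a minimal sketch, assuming a hypothetical project ID:

    -- Query all daily click shards in the desired "deep_links" dataset for a date range.
    -- `my-project` is a placeholder project ID.
    SELECT *
    FROM `my-project.deep_links.clicks_*`
    WHERE _TABLE_SUFFIX BETWEEN '20171101' AND '20171120';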

Related

How to connect Google BigQuery to Tableau for my GA4 data?

I have connected Google BigQuery to Tableau for my GA4 data. However, the data is split across several tables, each corresponding to a single day. How can I consolidate all these tables into a single, comprehensive table?
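
A minimal sketch of one common approach, assuming a hypothetical project ID and GA4 export dataset (analytics_123456789); a table wildcard unions the daily events_YYYYMMDD shards into one native table that Tableau can connect to:

    -- Materialize all daily GA4 export shards into a single partitioned table.
    -- Project and dataset IDs below are placeholders.
    CREATE OR REPLACE TABLE `my-project.reporting.ga4_events_all`
    PARTITION BY event_day AS
    SELECT
      PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS event_day,  -- date taken from the shard suffix
      t.*
    FROM `my-project.analytics_123456789.events_*` AS t;

A scheduled query running this (or an incremental INSERT variant) keeps the consolidated table up to date as new daily shards arrive.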

Connect different clients' GA4 & UA accounts to one BigQuery project

How do I connect various different clients' Google Analytics (GA4 & UA) properties to one instance of BigQuery? I want to store the analytics reports in BigQuery and then visualise them on a unified dashboard in Looker.
You can set up the exports from Google Analytics to go to the same BigQuery project and transfer historical data to the same project as well.
Even if data is spread across multiple GCP projects, you can still query it all from a single project. I would suggest you create a query that brings the data from the multiple sources together. You can then save it as a view and add it as a source in Looker, use it as a custom query in Looker, or, for best efficiency, save the results of your query as a new reporting table that feeds into Looker.
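
A minimal sketch of such a cross-project view, assuming two hypothetical client projects with GA4 export datasets; project, dataset and client names are placeholders:

    -- One view in the reporting project that unions the GA4 exports of both clients.
    CREATE OR REPLACE VIEW `reporting-project.reporting.ga4_events_all_clients` AS
    SELECT 'client_a' AS client, event_date, event_name, user_pseudo_id
    FROM `client-a-project.analytics_111111111.events_*`
    UNION ALL
    SELECT 'client_b' AS client, event_date, event_name, user_pseudo_id
    FROM `client-b-project.analytics_222222222.events_*`;

Point Looker at this view, or materialize its results into a reporting table on a schedule for better performance. The account running the queries needs BigQuery read access to all of the client projects.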

Is it possible to link multiple Google Analytics properties in a single BigQuery project?

I want to know if it is possible to link multiple different Google Analytics properties in a single BigQuery project, with each property's export kept in its own dataset.
I checked the confirmation message and the linking seems to have gone through, but I have no idea whether the data will be saved in one dataset or in different datasets.
Yes. It is possible to link multiple GA properties to a single GCP project, each in a different BigQuery dataset. In the case of Universal Analytics, the ID of each BQ dataset will be the same as the GA View ID. In the case of GA4, the ID of each BQ dataset will be analytics_<property_id>.
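
To check where the exports actually landed, you can list the datasets in the linked project; a minimal sketch, assuming the exports live in the US multi-region (adjust the region qualifier if not):

    -- Each linked property should show up as its own dataset,
    -- e.g. analytics_<property_id> for GA4 exports.
    SELECT schema_name
    FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA
    ORDER BY schema_name;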

Building OLTP DB from Datalake?

I'm confused and having trouble finding examples and reference architectures where someone wants to extract data from an existing data lake (S3/Lake Formation in my case) and build an OLTP datastore that serves as an application backend. Everything I come across is an OLAP data warehousing pattern (i.e. ETL -> S3 -> Redshift -> BI tools) where data is always coming IN to the datalake and warehouse rather than being pulled OUT. I don't necessarily have a need for 'business analytics', but I do have a need for displaying graphs in web dashboards with large amounts of time series data points underneath for my website's users.
What if I want to automate pulling extracts of a large dataset in the datalake and build a relational database with some useful data extracts from the various datasets that need to be queried by the handful instead of performing large analytical queries against a DW?
What if I just want an extract of, say, stock prices over 10 years, and just the list of unique ticker symbols for populating a drop-down on a web app? I don't want to query an OLAP data warehouse every time to get this, so I want my own OLTP store for more performant queries on smaller datasets with much higher TPS.
What if I want to build dashboards for my web app's customers that display graphs of large amounts of time series data currently sitting in the datalake/warehouse? Does my web app connect directly to the DW to display this data? Or do I pull that data out of the datalake or warehouse and into my application DB on some schedule?
My views on your 3 questions:
1. Why not just use the same ETL solution that is being used to load the datalake?
2. Presumably your DW has a Ticker dimension with a unique record for each ticker symbol? What's the issue with querying this, as it would be very fast to get the unique ticker symbols from it (see the sketch below)?
3. It depends entirely on your environment/infrastructure and what you are doing with the data, so there is no generic answer anyone could provide you with. If your webapp is showing aggregations of a large volume of data then your DW is probably better at doing those aggregations and passing the aggregated data to your webapp; if the webapp is showing unaggregated data (and only a small subset of what is held in your DW, such as just the last week's data) then loading it into your application DB might make more sense.
The pros/cons of any solution would also be heavily influenced by your infrastructure e.g. what's the network performance like between your DW and application?
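
For point 2, a minimal sketch of that lookup, assuming a hypothetical dim_ticker dimension table in the warehouse:

    -- Populate the web app's ticker drop-down straight from the DW dimension;
    -- a small dimension table like this returns almost instantly.
    SELECT DISTINCT ticker_symbol
    FROM dim_ticker
    ORDER BY ticker_symbol;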

How to append every day's data to the tables in BigQuery?

We are migrating from DigitalOcean to GCP, and to test things out we exported our data as JSON from our Mongo DB and uploaded it to a GCS bucket (userDump.json is one of the files).
Now we are fetching data from our GCS bucket and creating tables in BigQuery (e.g. a users table).
So far everything is working out.
My problem:
Every day we onboard new users and their data is saved to GCS. We want to run a cron job (or similar) to add that data to the table, so that in the morning people can run queries on yesterday's data.
How can I achieve this?
Take a look at my lazy data loading in BigQuery article:
https://medium.com/google-cloud/bigquery-lazy-data-loading-ddl-dml-partitions-and-half-a-trillion-wikipedia-pageviews-cd3eacd657b6
What you could do is:
Have BigQuery read files directly from GCS - a federated query.
Then have a scheduled query inside BigQuery materialize these federated tables into native BigQuery tables.
Your users will get fresh data daily, or even more frequently - no servers needed :).
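
A minimal sketch of those two steps, assuming hypothetical project, bucket and column names, and newline-delimited JSON exports:

    -- 1) Federated (external) table that reads the JSON files directly from GCS.
    --    No schema is given here, so schema auto-detection is used; you can also
    --    list the columns explicitly instead.
    CREATE OR REPLACE EXTERNAL TABLE `my-project.staging.users_ext`
    OPTIONS (
      format = 'NEWLINE_DELIMITED_JSON',
      uris   = ['gs://my-bucket/userDump*.json']
    );

    -- 2) Body of a daily scheduled query that appends only rows not yet present
    --    in the native table (assumes `my-project.app.users` already exists with
    --    the same schema and that `_id` is the Mongo document ID).
    INSERT INTO `my-project.app.users`
    SELECT ext.*
    FROM `my-project.staging.users_ext` AS ext
    WHERE NOT EXISTS (
      SELECT 1 FROM `my-project.app.users` AS u WHERE u._id = ext._id
    );

Schedule step 2 from the BigQuery console (Scheduled queries) to run each morning, and the users table will have yesterday's sign-ups ready to query.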