I am trying to transfer my Play Console data to BigQuery using the BigQuery Data Transfer Service. My Play Console data is stored in a GCP bucket, which has 3 folders (reviews, stats, acquisition).
When I run my BQ transfer job, only the data from the last folder is moved to BigQuery.
Is there a way to migrate the data from all three folders to BigQuery?
You can use a wildcard in your load process, as shown in the following link
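A minimal sketch of the wildcard idea, assuming the google-cloud-bigquery Python client is used for the load rather than the transfer service UI. The bucket name, the dataset argument and the one-table-per-folder layout are hypothetical; the point is only that each folder gets its own wildcard URI, so none of the three is skipped.

```python
def folder_uris(bucket, folders, pattern="*.csv"):
    """Map each folder name to a wildcard URI for the files under it."""
    return {name: f"gs://{bucket}/{name}/{pattern}" for name in folders}


def load_folders(bucket, dataset, folders):
    """Start one load job per folder, each into its own table (assumed layout)."""
    # Imported lazily so the URI helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes each file starts with a header row
        autodetect=True,      # assumes schema inference is acceptable
    )
    for name, uri in folder_uris(bucket, folders).items():
        table_id = f"{client.project}.{dataset}.{name}"
        job = client.load_table_from_uri(uri, table_id, job_config=job_config)
        job.result()  # block until this folder has been loaded


# Hypothetical bucket name; the folder names come from the question.
uris = folder_uris("my-play-console-bucket", ["reviews", "stats", "acquisition"])
```

Calling load_folders("my-play-console-bucket", "play_console", uris.keys()) would then run the three loads one after another.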
I've had a look at this SO post but it's three years old and I think GCP has changed since then.
What I'm trying to do is set up a data pipeline using DataFlow jobs to copy/transform data from one GBQ project into another GBQ project.
To create a DataFlow job, you need to choose a template and there is no template that matches my needs i.e. no BQ to BQ template.
There is an option to use a custom template (which I imagine would be a python script or something along those lines), but it seems odd that there is no BQ to BQ template. Is DataFlow not the right tool for this job? Should I just use scheduled queries?
Thanks in advance
There is a way, although it is not very straightforward. If you really want to use Dataflow templates, you can use the BigQuery to Cloud Storage template to stage the data in GCS and then the Cloud Storage to BigQuery template to bring the data into the destination project. However, make sure you grant the permissions required to access the Cloud Storage buckets from the destination project.
If the transformations you want are not possible or not practical in SQL, you can use Cloud Data Fusion -> Integration studio. Here you can choose BigQuery as both source and sink, and there are a number of options available for the transformation component. It is similar to an ETL tool. See the Data Fusion Quickstart documentation.
Otherwise, you can simply execute or schedule a query in BigQuery itself, as your requirements dictate, and save the result of the query in another table. See "Saving query results in destination table".
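The last option can be sketched with the google-cloud-bigquery Python client: run a query against the source project and write the result to a table in the destination project. All project, dataset, table and column names here are hypothetical, and the sketch assumes the credentials in use can read the source and write to the destination.

```python
def qualified_table(project, dataset, table):
    """Build a fully qualified BigQuery table ID."""
    return f"{project}.{dataset}.{table}"


def copy_with_query(sql, destination):
    """Run `sql` and overwrite `destination` with the result."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.query(sql, job_config=job_config).result()  # wait for completion


# Hypothetical names, used for illustration only.
source = qualified_table("source-project", "sales", "orders")
destination = qualified_table("dest-project", "reporting", "orders_clean")
sql = f"SELECT * FROM `{source}` WHERE amount IS NOT NULL"
```

copy_with_query(sql, destination) performs the copy once; a scheduled query in the BigQuery console would cover the recurring case without any code.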
I'm trying to set up a dashboard in Google Data Studio with Apple News analytics data as one of the sources.
I can see you can download this analytics data manually as a CSV - does anyone know a way of automating this extract? Automatically appending the data weekly to a BigQuery table would be ideal, or Google Sheets or directly into Data Studio if not.
Thanks.
You can load your CSV into BigQuery [1], or schedule a load job, and then use it in Data Studio through a BigQuery connector. Otherwise, if you do not need to append the data, you can simply import it with another connector such as "Custom JSON/CSV/XML" by Supermetrics.
[1] https://cloud.google.com/bigquery/docs/loading-data#supported_data_formats
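A rough sketch of the append step only, assuming the weekly CSV has already been downloaded by hand (this does not automate the export from Apple News itself). It uses the google-cloud-bigquery Python client; the function names and the header-row assumption are mine, not part of any official workflow.

```python
import csv


def count_data_rows(csv_path):
    """Count rows after the header, to compare with what BigQuery reports."""
    with open(csv_path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def append_csv(csv_path, table_id):
    """Append a local CSV file to an existing or new BigQuery table."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the export has a header row
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
    return job.output_rows
```

Comparing the value returned by append_csv with count_data_rows for the same file is a cheap check that nothing was dropped.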
I'm importing datasets in Google Cloud Dataprep (by Trifacta) to perform transformations on my data sources. But I can't see my Google Drive Sheets in the list after connecting them in the BigQuery console. I plan to use them as rules for my transformations.
I've already created another dataset and the problem persists.
Is it possible to import them, or is this not supported yet?
Thanks,
You are right. According to the documentation, Dataprep only supports native BigQuery tables and views as BigQuery sources.
You could try downloading your Drive sheets as CSV and then creating a BigQuery table from them, or you could run a query that copies your external table into a new native table, saving the result of the following to a destination table:
SELECT * FROM my_dataset.my_external_table
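One way to turn that query into a native table is a CREATE TABLE ... AS SELECT statement. The sketch below builds such a statement and runs it with the google-cloud-bigquery Python client; my_native_table is a hypothetical name, and the helper functions are illustrative rather than anything Dataprep provides.

```python
def materialize_sql(external_table, native_table):
    """Build a statement that copies an external table into a native one."""
    return (
        f"CREATE OR REPLACE TABLE `{native_table}` AS "
        f"SELECT * FROM `{external_table}`"
    )


def materialize(external_table, native_table):
    """Run the copy and wait for it to finish."""
    # Imported lazily; requires the google-cloud-bigquery package.
    # Note: querying a Sheets-backed table generally needs credentials
    # that include a Google Drive scope.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.query(materialize_sql(external_table, native_table)).result()


# my_native_table is a hypothetical destination name.
statement = materialize_sql(
    "my_dataset.my_external_table", "my_dataset.my_native_table"
)
```

The copy is a snapshot, so it would need to be rerun (or scheduled) whenever the sheet changes.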
I have a MySQL DB in AWS. Can I use the database as a data source in BigQuery?
At the moment I upload CSV files to a Google Cloud Storage bucket and load them into BigQuery.
I would like to keep it synchronised by pointing BigQuery directly at the data source rather than loading the data every time.
You can create a permanent external table in BigQuery that is connected to Cloud Storage. Then BQ is just the interface while the data resides in GCS. It can be connected to a single CSV file, and you are free to update or overwrite that file. But I am not sure whether you can link BQ to a directory full of CSV files or even a tree of directories.
Anyway, have a look here: https://cloud.google.com/bigquery/external-data-cloud-storage
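A minimal sketch of defining such an external table with the google-cloud-bigquery Python client. The bucket, prefix and table names are hypothetical. As far as I know, the source URI may contain a single "*" wildcard, which would cover several CSV files under one prefix, but check the linked page for the exact rules.

```python
def csv_uri(bucket, prefix):
    """Build a wildcard URI for the CSV files under a prefix."""
    return f"gs://{bucket}/{prefix.strip('/')}/*.csv"


def create_external_table(table_id, source_uris):
    """Create a permanent external table over CSV files in Cloud Storage."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = list(source_uris)
    external_config.autodetect = True              # assumes inferred schema is fine
    external_config.options.skip_leading_rows = 1  # assumes a header row

    # table_id must be fully qualified: "project.dataset.table".
    table = bigquery.Table(table_id)
    table.external_data_configuration = external_config
    return bigquery.Client().create_table(table)


# Hypothetical bucket and prefix.
uri = csv_uri("my-bucket", "mysql_exports/")
```

Queries against the resulting table read the files at query time, so overwriting a file in the bucket changes what the next query sees.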
The company where I work has some MySQL databases on AWS (Amazon RDS). We are doing a POC with BigQuery, and what I am researching now is how to replicate the databases to BigQuery (both the existing records and the new ones in the future). My questions are:
How do I replicate the MySQL tables and rows to BigQuery? Is there any tool to do that (I am reading about Amazon Database Migration Service)? Should I replicate to Google Cloud SQL and then export to BigQuery?
How do I replicate future records? Is it possible to create a job inside MySQL that sends the new records after a predefined number? For example, after 1,000 new rows are inserted (or a certain time has passed), some event is "triggered" and the new records are copied to Cloud SQL/BigQuery?
My initial idea is to dump the original database, load it into the other one, and use a script that listens for new records and sends them to the new database.
Have I explained it properly? Is it understandable?
You will need to use one of the ETL tools that integrate with both MySQL and BigQuery to perform the initial transfer of the data and to copy subsequent changes to BigQuery. Take a look at the list of available tools [1].
You can also implement your own tool by developing a process that extracts the data from MySQL to a CSV file and then loads that file into BigQuery using data import [2].
[1] https://cloud.google.com/bigquery/third-party-tools
[2] https://cloud.google.com/bigquery/loading-data-into-bigquery
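As a sketch of the "own tool" route, the extract step could keep a high-water mark on an auto-increment key and write only newer rows to CSV. This assumes the table has such a key as its first column, which the question does not state. The demo uses Python's built-in sqlite3 purely as a stand-in so the example runs anywhere; with a MySQL driver the placeholder would typically be "%s" instead of "?".

```python
import csv
import io
import sqlite3


def export_new_rows(conn, table, last_id, out, placeholder="?"):
    """Write rows with id > last_id to `out` as CSV; return the new high-water mark.

    `table` is interpolated into the SQL, so it must come from trusted config.
    """
    cur = conn.cursor()
    cur.execute(
        f"SELECT * FROM {table} WHERE id > {placeholder} ORDER BY id",
        (last_id,),
    )
    writer = csv.writer(out)
    max_id = last_id
    for row in cur:
        writer.writerow(row)
        max_id = max(max_id, row[0])  # assumes id is the first column
    return max_id


# Stand-in database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

buffer = io.StringIO()
new_mark = export_new_rows(conn, "users", 1, buffer)  # rows 2 and 3 only
```

Persisting new_mark between runs and loading each CSV with an append write disposition would give a simple batch replication. Note that this approach does not capture updates or deletes of existing rows.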
In addition to what Vadim said, you can try:
use mysqldump to write CSV files to S3 (I believe RDS allows that)
run the "gsutil" Google Cloud Storage utility to copy the data from S3 to GCS
run "bq load file.csv" to load the file into BigQuery
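The last two steps can be sketched as a small Python wrapper around the two command-line tools. The bucket, file and table names are hypothetical. The sketch assumes gsutil is configured with AWS credentials (for example in its boto config file) so that it can read s3:// URLs, and that the CSV has no header row.

```python
import subprocess


def transfer_commands(s3_uri, gcs_uri, table):
    """Build the gsutil and bq invocations as argument lists."""
    return [
        ["gsutil", "cp", s3_uri, gcs_uri],
        ["bq", "load", "--source_format=CSV", "--autodetect", table, gcs_uri],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical names, for illustration only.
commands = transfer_commands(
    "s3://my-s3-bucket/dump/users.csv",
    "gs://my-gcs-bucket/dump/users.csv",
    "my_dataset.users",
)
```

run_all(commands) executes the copy and then the load. Passing argument lists instead of a shell string avoids quoting problems with unusual file names.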
I'm interested in hearing your experience, so feel free to ping me in private.