Write Data from Google Spreadsheets to a BigQuery Table - google-bigquery

I'm trying to write data from Google Spreadsheets to a BigQuery Table.
Are there any sources which I can tap into to learn how to do this?
(Something like this would be awesome:
https://developers.google.com/apps-script/articles/bigquery_tutorial)
Thanks.

What have you tried so far?
Before writing actual code, I see two ways:
Send the data through a POST request with UrlFetchApp (https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app), targeting BigQuery's load-data endpoint (https://developers.google.com/bigquery/loading-data-into-bigquery#loaddatapostrequest).
Otherwise you could upload the data to Google Cloud Storage, and insert a job that loads it into BigQuery. Take a look at http://blog.knoldus.com/2013/01/19/google-apps-script-to-store-data-on-google-cloud-sorage/.
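If it helps, here is a minimal Python sketch of that second approach (outside Apps Script), using the google-cloud-bigquery client. It assumes the spreadsheet contents have already been written to a CSV object in Cloud Storage; all project, dataset, table, and bucket names are placeholders.

    # Load a CSV that was exported from the spreadsheet into Cloud Storage.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row exported from the sheet
        autodetect=True,       # let BigQuery infer the schema from the CSV
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/sheet_export.csv",   # placeholder GCS object
        "my-project.my_dataset.my_table",    # placeholder destination table
        job_config=job_config,
    )
    load_job.result()  # block until the load job finishes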

Related

Is it possible to use BigQuery to find when people request a website?

I'm trying to figure out at what time people request content from a website (such as 'www.netflix.com'). Do any of the available reports contain this data? If so, how would I access it?
I've had a look around and can't see a table that has this data. Would it be stored anywhere else?
If you mean reports that are available in BigQuery, I would suggest exploring the BigQuery Public Datasets. At the moment I do not see any website-access dataset there, but they might still be useful for your reference.

BigQuery to BigQuery DataFlow

I've had a look at this SO post but it's three years old and I think GCP has changed since then.
What I'm trying to do is set up a data pipeline using DataFlow jobs to copy/transform data from one GBQ project into another GBQ project.
To create a DataFlow job, you need to choose a template and there is no template that matches my needs i.e. no BQ to BQ template.
There is an option to use a custom template (which I imagine would be a python script or something along those lines), but it seems odd that there is no BQ to BQ template. Is DataFlow not the right tool for this job? Should I just use scheduled queries?
Thanks in advance
If you really want to use a Dataflow template, there is a way, although it is not very straightforward: use the BigQuery to Cloud Storage template to store the data in GCS, and then the Cloud Storage to BigQuery template to bring the data into the destination project. However, make sure you grant the permissions required to access the Cloud Storage bucket from the destination project.
If the transformations you want are not possible in SQL, or not practical to express in SQL, you can use Cloud Data Fusion -> Integration Studio. There you can choose BigQuery as both source and sink, and a number of transformation components are available; it is similar to an ETL tool. See the Data Fusion Quickstart documentation.
Otherwise, you can simply execute or schedule a query in BigQuery itself, as per your requirement, and save the result of the query in another table (see Saving query results in a destination table).
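As a rough illustration of that last option, here is a hedged Python sketch using the google-cloud-bigquery client: it runs a query against the source project and writes the result into a table in the destination project. All project, dataset, and table names are placeholders.

    # Copy/transform data from one project into another with a single query job.
    from google.cloud import bigquery

    client = bigquery.Client(project="destination-project")  # placeholder project

    job_config = bigquery.QueryJobConfig(
        destination="destination-project.analytics.daily_copy",     # placeholder table
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE, # overwrite each run
    )

    sql = """
        SELECT *  -- apply any SQL transformations here
        FROM `source-project.raw.events`  -- placeholder source table
    """

    client.query(sql, job_config=job_config).result()  # wait for the job to finish

The same query can be saved as a scheduled query in BigQuery if you want it to run on a regular basis.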

Can we import data from BigQuery into Google Sheets?

I only see how to load data into my BigQuery database, but I can't find any wiki or post that answers my question: is it possible to load data from a BigQuery database into an external destination like a Google Sheet (possibly going through the Cloud Storage platform)?
Yes, you can export data from BigQuery. One way to bring BigQuery data into Google Sheets is with the new Google Sheets data connector, which you will find in the Data menu of the Sheets interface.
Once connected, you can then write an SQL query and pull this data directly into a Google Sheet.
Here's a link to some official documentation on this data connector: https://support.google.com/docs/answer/9077536

How to refresh google drive data source - Google Big Query

I have a question about refreshing a BigQuery table whose data source is Google Drive.
Imagine you have a CSV file on Google Drive and someone updates it for you every day.
1. The filename does not change
2. The location URI stays the same
How can I refresh my BigQuery table that uses this Google Drive file?
Could you please guide me or send me related links?
Thanks
From the BigQuery docs:
Loading data into BigQuery from Google Drive is not currently supported, but you can query data in Google Drive using an external table.
The link above provides instructions on how to create an external table that references your data source stored in Drive. Since you want to query data from a Google Drive file that you keep updating in Drive, this is the solution you are looking for (as opposed to downloading your CSV locally and loading it into BQ, in which case you would then have to apply the updates directly in BQ).
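For illustration, here is a hedged Python sketch of creating such an external table with the google-cloud-bigquery client. The Drive file ID, project, dataset, and table names are placeholders, and the client's credentials also need a Google Drive scope so BigQuery can read the file.

    # Define a permanent external table whose data lives in a CSV on Google Drive.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["https://drive.google.com/open?id=FILE_ID"]  # placeholder
    external_config.autodetect = True              # infer the schema from the CSV
    external_config.options.skip_leading_rows = 1  # skip the header row

    table = bigquery.Table("my-project.my_dataset.drive_csv")  # placeholder table
    table.external_data_configuration = external_config
    client.create_table(table)

    # Each query against the table reads the current contents of the Drive file,
    # so there is nothing to "refresh" when the file is updated in place.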

Tableau data extract refresh from Google BigQuery takes very long

We are very pleased with the combination BigQuery <-> Tableau Server with live connection. However, we now want to work with a data extract (500MB) on Tableau Server (since this datasource is not too big and is used very frequently). This takes too much time to refresh (1.5h+). We noticed that only 0.1% is query time and the rest is data export. Since the Tableau Server is on the same platform and location, latency should not be a problem.
This is similar to the slow export of a BigQuery table to a single file, which can be solved by using the "daisy chain" option (wildcards). Unfortunately, we can't use similar logic with a Google BigQuery data extract refresh in Tableau...
We have identified some approaches, but are not pleased with our current ideas:
Working with incremental refreshes: rows in our existing BigQuery table can change, and these changes can only be applied in Tableau with a full refresh.
Exporting the BigQuery table to GCS using the daisy chain option and making a Tableau data extract using the Tableau SDK: this would result in quite some overhead...
Writing a Dataflow job using a custom sink for Tableau Server (data extracts).
Experimenting with a Tableau web connector that communicates directly with the BigQuery API: I don't think this would be faster. I didn't see anything about parallelizing calls with the Tableau web connector, but I haven't tried this approach yet.
We would prefer a non-technical option, to limit maintenance... Is there a way to modify the Tableau connector to make use of the "daisy chain" option for BigQuery?
You've uploaded the data into BigQuery. Can't you just use the input for that load job (a CSV perhaps) as input for Tableau?
When we use Tableau and BigQuery we also notice that extracts are slow, but we generally don't do that because you lose BigQuery's power. We start with a live data connection at first, and then (if needed) convert this into a custom query that aggregates the data into a much smaller dataset, which extracts in just a few seconds.
Another way to achieve higher performance with BigQuery and Tableau is aggregating or joining tables beforehand. JOINs on huge tables can be slow, so if you use a lot of them you might consider generating a denormalised dataset that does all of the joining first. You will get a dataset with a lot of duplicates and a lot of columns, but if you select only what you need in Tableau (hide unused fields!) those columns won't count towards your query cost.
One recommendation I have seen is similar to your point 2 where you export the BQ table to Google Cloud Storage and then use the Tableau Extract API to create a .tde from the flat files in GCS.
This was from an article on the Google Cloud site so I'd assume it would be best practice:
https://cloud.google.com/blog/products/gcp/the-switch-to-self-service-marketing-analytics-at-zulily-best-practices-for-using-tableau-with-bigquery
There is an article here which provides a step by step guide to achieving the above.
https://community.tableau.com/docs/DOC-23161
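To make the export step concrete, here is a hedged Python sketch of extracting a BigQuery table to Cloud Storage with a wildcard ("daisy chain") URI, so the export is sharded across many files in parallel. Bucket, project, dataset, and table names are placeholders; the resulting shards would then be fed to the Tableau Extract API as described in the links above.

    # Extract a table to GCS using a wildcard URI so BigQuery writes many shards.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    extract_job = client.extract_table(
        "my-project.my_dataset.my_table",           # placeholder source table
        "gs://my-bucket/extracts/my_table_*.csv",   # wildcard -> numbered shards
        job_config=bigquery.ExtractJobConfig(destination_format="CSV"),
    )
    extract_job.result()  # wait for the export to complete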
It would be nice if Tableau optimised the BQ connector for extract refresh using the BigQuery Storage API. We too have our Tableau Server environment in the same GCP zone as our BQ datasets and experience slow refresh times.