BigQuery S3 integration

Does BigQuery have a feature to import data from S3?
If not, what's the best alternative path you can suggest?

BigQuery doesn't support direct ingestion of data from S3 buckets. However, it is easy to move data from S3 to Google Cloud Storage using the gsutil command-line tool. I would suggest moving the data to Cloud Storage, then ingesting it into BigQuery from there.
https://developers.google.com/storage/docs/gsutil
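As a rough sketch, the whole path can be done from the command line; the bucket, dataset, and table names below are placeholders, and gsutil is assumed to be configured with both AWS credentials (e.g. in ~/.boto) and Google Cloud credentials:

```shell
# Copy the objects from S3 to Cloud Storage.
# -m parallelizes the transfer, which helps with large buckets.
gsutil -m cp -r s3://my-source-bucket/data/ gs://my-target-bucket/data/

# Load the copied CSV files from Cloud Storage into BigQuery.
# --autodetect infers the schema; replace with an explicit schema if needed.
bq load --autodetect --source_format=CSV \
  mydataset.mytable "gs://my-target-bucket/data/*.csv"
```

For recurring or very large transfers, gsutil rsync (instead of cp) only moves objects that changed since the last run.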

Related

How to connect AWS S3 from Spotfire

We have our data in an AWS S3 bucket, with the schema defined in the Glue Catalog.
Right now, Athena queries against the S3 bucket (with its well-defined schema) are possible.
We need to visualize this data in Spotfire.
What are the possible ways of achieving this?
I am a newbie to Spotfire.

Apache Iceberg table format to ADLS / azure data lake

I am trying to find an integration that lets me use the Iceberg table format on ADLS / Azure Data Lake to perform CRUD operations. Is it possible to use it on Azure without another computation engine like Spark? I think AWS S3 supports this use case. Any thoughts on it?
Spark can use Iceberg with the abfs connector, HDFS, or even local files. You just need to get the classpath and authentication right.
A bit late to the party, but Starburst Galaxy deploys Trino in any Azure region and has a Great Lakes connector that supports Hive (Parquet, ORC, CSV, etc.), Delta Lake, and Iceberg. https://blog.starburst.io/introducing-great-lakes-connectivity-for-starburst-galaxy
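For the Spark route, a minimal sketch of pointing an Iceberg Hadoop catalog at ADLS Gen2 through the abfss connector; the runtime version, catalog name, storage account, container, and account key are all placeholders you would substitute with your own:

```shell
# Launch spark-sql with the Iceberg runtime on the classpath and a
# Hadoop-type Iceberg catalog whose warehouse lives on ADLS Gen2.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0 \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.my_catalog.type=hadoop \
  --conf spark.sql.catalog.my_catalog.warehouse=abfss://mycontainer@myaccount.dfs.core.windows.net/warehouse \
  --conf spark.hadoop.fs.azure.account.key.myaccount.dfs.core.windows.net=<storage-account-key>
```

Inside the session, `CREATE TABLE my_catalog.db.t (...) USING iceberg` then writes Iceberg metadata and data files directly to the ADLS container.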

Pulling data from Google Spanner to BigQuery for data analysis?

What's the best way to pull data from Cloud Spanner into BigQuery for data analysis?
thanks
You can use the Google-provided Dataflow template for pulling data from Spanner to GCS, then run a load job to load it into BigQuery.
Export Spanner database
Cloud Spanner to GCS AVRO
Check this Link1. You don't need any external service to migrate the data; you can read all the Spanner data directly through BigQuery and load it.
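The Dataflow-template route above can be sketched with the gcloud and bq CLIs; the job name, region, Spanner instance and database IDs, bucket, dataset, and table names are placeholders:

```shell
# Run the Google-provided "Cloud Spanner to GCS Avro" Dataflow template,
# which exports the whole database as Avro files into a GCS directory.
gcloud dataflow jobs run spanner-export \
  --gcs-location gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro \
  --region us-central1 \
  --parameters instanceId=my-instance,databaseId=my-db,outputDir=gs://my-bucket/spanner-export

# Once the export finishes, load the Avro files into BigQuery.
# Avro carries its own schema, so no schema flag is needed.
bq load --source_format=AVRO \
  mydataset.mytable "gs://my-bucket/spanner-export/*.avro"
```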

Backup options or snapshots of Google Cloud Storage data?

I pull data into Google BigQuery tables and also generate some new datasets from this data daily.
I save the original data and the generated datasets in Google Cloud Storage for two purposes:
They are the backup copy of my Google BigQuery data.
Some of these datasets saved in Google Cloud Storage are also bulk-loaded into AWS Elasticsearch (so they are the backup copy for AWS Elasticsearch as well).
BigQuery or AWS Elasticsearch may only keep 2 months to 1 year of data, so for anything older than that, I only have one copy, on Google Cloud Storage. (I need some backup option, such as 1-month snapshots of Google Cloud Storage, that I can go back to if needed.)
My question is:
How can I keep a backup or snapshot of Google Cloud Storage data to prevent data loss, such that I can trace back at least 7 days or 1 month of the data in Google Cloud Storage?
That way, in case of data loss (accidentally deleted data, etc.), I can go back a few days and get the data back.
Thanks!
You can back up your cloud data to some local storage; CloudBerry has a "Cloud to Local" option.
I can recommend the software I am using myself: CloudBerry Backup, which can back up cloud storage to local storage or to other cloud storage. The tool supports various cloud storages, e.g. Amazon, Google, Azure, etc. You can also download and upload data with the tool, so it's better to install it on a Google VM.
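If you'd rather not add third-party software, a rough cloud-to-local sketch with gsutil alone (bucket and local path are placeholders) covers the same idea, and GCS object versioning keeps overwritten or deleted object versions inside the bucket itself:

```shell
# Mirror the bucket to local disk; run this on a schedule (e.g. via cron).
# -d would also delete local files removed from the bucket -- omit it
# so accidental bucket deletions don't propagate to the backup.
gsutil -m rsync -r gs://my-bucket /backups/my-bucket

# Turn on object versioning so deleted/overwritten objects are retained
# as noncurrent versions and can be restored later.
gsutil versioning set on gs://my-bucket
```

With versioning on, `gsutil ls -a gs://my-bucket/path` lists the archived generations of an object so a specific older version can be copied back.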

Do Google BigQuery and Google Cloud Storage share files between them?

I have created a BigQuery table by loading a CSV file from Google Cloud Storage.
In this case, does the BigQuery table reference the CSV file in Cloud Storage, or does it copy the data to its own storage?
When you load a file from Cloud Storage into BigQuery, the data is loaded into BigQuery's "own" storage, which is totally separate from Cloud Storage.
Note: BigQuery also supports querying data directly from Google Cloud Storage and Google Drive. See details at Creating and Querying Federated Data Sources.
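The difference shows up in the bq CLI; the dataset, table, bucket, and schema below are placeholders:

```shell
# A load job copies the CSV data into BigQuery managed storage;
# deleting the GCS file afterwards does not affect the table.
bq load --autodetect --source_format=CSV \
  mydataset.mytable gs://my-bucket/file.csv

# An external (federated) table instead queries the CSV in place in
# Cloud Storage; the inline definition is "schema@FORMAT=uri".
bq mk \
  --external_table_definition="col1:STRING,col2:INT64@CSV=gs://my-bucket/file.csv" \
  mydataset.my_external_table
```

Queries against `my_external_table` read the GCS file at query time, so they always see its current contents but are generally slower than queries on loaded tables.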