I have created a development Cloud Bigtable cluster and would like to disable it when I am not working on it, to avoid being billed, but the only option I see is to delete the cluster; doing that would require me to recreate the tables, which I don't want to do.
Is there a way to disable the Cloud Bigtable cluster and enable it only during the time I work on it?
Unfortunately, there is not currently a way to disable a cluster and just maintain the tables and data. You can export data if need be, or create a script using the cbt command line tool to quickly restore your cluster.
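If scripting with cbt is inconvenient, the tables can also be recreated with the Python Bigtable client. Here is a minimal sketch, assuming a rebuilt instance named my-instance and a table my-table with one column family (all names are placeholders):

from google.cloud import bigtable
from google.cloud.bigtable import column_family

# Connect with admin rights so tables can be created.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Recreate the table with a single column family that keeps one version per cell.
table = instance.table("my-table")
table.create(column_families={"cf1": column_family.MaxVersionsGCRule(1)})

You would still need to re-import the data itself, for example from an export you made before deleting the cluster.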
I would like to run the bq load command once every day at 00:00 UTC. Can I use Google Cloud Scheduler to schedule this command?
As mentioned by @Daniel, there is no direct way to use Cloud Scheduler to execute or schedule queries; however, there are options you can consider to run queries on a schedule.
Use scheduled queries directly in BQ
Since your source is GCS, you can load data from GCS to BQ and then execute scheduled queries as mentioned here
Use a scheduled Cloud Function to run queries (see the sketch after this list)
Schedule using Data Transfer
You can also try what @Graham Polley has mentioned in this blog, which requires an architecture combining Cloud Scheduler, Cloud Source Repositories and Cloud Build
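For the Cloud Function option, a minimal sketch: Cloud Scheduler publishes to a Pub/Sub topic at 00:00 UTC, and the topic triggers a function that runs a query with the BigQuery client. The project, dataset and table names are placeholders:

from google.cloud import bigquery

def run_query(event, context):
    # Pub/Sub-triggered Cloud Function; Cloud Scheduler publishes to the topic on a cron schedule.
    client = bigquery.Client()
    sql = """
        SELECT COUNT(*) AS row_count
        FROM `my-project.my_dataset.my_table`
    """
    job = client.query(sql)
    for row in job.result():
        print(f"row_count={row.row_count}")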
Assuming you have a file that is loaded into Cloud Storage every day before 7am, you may consider a more resilient design: when the file is created in Cloud Storage, trigger a notification that starts the process to load it. This design gets the data into BigQuery earlier and keeps working even if the file creation is delayed.
When the file is created in Cloud Storage, a message is published to Pub/Sub: https://cloud.google.com/storage/docs/pubsub-notifications
Then a Cloud Function is invoked that executes the bq load command.
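As a sketch of that second step, a Cloud Function triggered directly by the Cloud Storage finalize event can run the equivalent of bq load with the Python client. The dataset, table name and CSV settings below are assumptions to adapt:

from google.cloud import bigquery

def load_to_bigquery(event, context):
    # Triggered when an object is finalized in the bucket; the event carries bucket and object name.
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, "my_dataset.my_table", job_config=job_config)
    load_job.result()  # Wait for the load to finish so errors surface in the function logs.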
By the way, if you have many files or dependencies between jobs, consider using Cloud Composer as an orchestrator to keep the complexity under control.
You would not be able to do it directly with Cloud Scheduler; you would need an intermediary like a Cloud Function to execute the command. Alternatively, you could try scheduling a data transfer, depending on the requirements of your load job.
Here is an example from the documentation:
https://cloud.google.com/bigquery/docs/cloud-storage-transfer#setting_up_a_cloud_storage_transfer
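As a sketch of the data transfer route, a Cloud Storage transfer can also be created programmatically with the BigQuery Data Transfer client. The bucket, dataset, table and parameter values below are assumptions to verify against the linked documentation:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="Daily GCS load",
    data_source_id="google_cloud_storage",
    params={
        "data_path_template": "gs://my-bucket/exports/*.csv",
        "destination_table_name_template": "my_table",
        "file_format": "CSV",
        "skip_leading_rows": "1",
    },
    schedule="every day 00:00",
)

transfer_config = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print(f"Created transfer: {transfer_config.name}")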
Based on your update that you want to shard the table by date, try scheduled queries in the following manner:
Create an external table pointing to the desired path in GCS, as described here
Define your query; I recommend a query with explicit column names and appropriate casting.
SELECT *
FROM myproject.dataset_id.external_table_name
-- INCLUDE FILTERING ON _FILE_NAME IF NEEDED LIKE FOLLOWING:
-- WHERE _FILE_NAME LIKE SOME_VALUE
Create a scheduled query with a run_date parameter in the destination table name, e.g. new_table_{run_date}, as sketched below
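A minimal sketch of that last step with the BigQuery Data Transfer client, reusing the query above and the new_table_{run_date} template (the project, dataset and schedule are placeholders):

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="dataset_id",
    display_name="Daily shard from external table",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT * FROM `myproject.dataset_id.external_table_name`",
        "destination_table_name_template": "new_table_{run_date}",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every day 07:00",
)

transfer_config = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)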
We are evaluating BigQuery and Snowflake for our new cloud warehouse. Does BigQuery have a built-in cloning feature? This would enable our developers to create multiple development environments quickly, and we could also restore to a point in time. Snowflake has a zero-copy clone to minimize the storage footprint. For managing DEV/QA environments in BigQuery, do we need to manually copy the datasets from prod? Please share some insights.
You can use a pre-GA feature, the BigQuery Data Transfer Service, to create copies of datasets; you can also schedule and configure the jobs to run periodically so that the target dataset stays in sync with the source dataset. Restoring to a point in time is available via FOR SYSTEM_TIME AS OF in the FROM clause.
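For example, a point-in-time restore of a table into a new table can be done with a single query. The table names and the one-hour offset are placeholders, and BigQuery's time travel window is limited to the last seven days:

from google.cloud import bigquery

client = bigquery.Client()

# Materialize the state of the table as it was one hour ago.
sql = """
CREATE OR REPLACE TABLE `my_dataset.my_table_restored` AS
SELECT *
FROM `my_dataset.my_table`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
client.query(sql).result()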
I don't think there is an exact equivalent of Snowflake's clone in BigQuery. What does this mean?
You will be charged for the additional storage and for data transfer if it's cross-region (priced like Compute Engine network egress between regions)
Cloning is not instantaneous; for large tables (> 1 TB) you may still have to wait a while before the new copy is created
I set up a Log Analytics workspace for one of my servers and I manually added several performance counters.
Is there a fast way to copy the performance counters to future workspaces, or do I have to manually add them each time?
It isn't possible to copy performance counters between Log Analytics workspaces; however, you can automate the process using PowerShell. Here is an SO thread where one of the community members has provided a sample solution for adding performance counters using PowerShell.
Another article for your reference: Create workspace and configure data sources.
I am trying to create an event-tracking system for our website. I would like to insert the events into BigQuery directly from the consumer's browser. However, to do this, I believe that I need to share the API key with the browser so it can insert into BigQuery. This creates a security flaw, where someone could take the API key and insert large volumes of false events into our BigQuery tables. Are there security features on the BigQuery server side that can filter out such events (perhaps by detecting malicious insertion patterns)?
See the solution "How to Do Serverless Pixel Tracking":
https://cloud.google.com/solutions/serverless-pixel-tracking-tutorial
Instead of logging straight to BigQuery, you could:
Create a pixel in Google Cloud Storage.
Insert this pixel in your pages.
Configure the logs so they are routed to BigQuery in real time through Stackdriver (see the sketch after this list).
You can even add a load balancer, for the best performance around the world.
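For the log-routing step, here is a minimal sketch with the Cloud Logging client that creates a sink sending request logs to a BigQuery dataset. It assumes the load balancer variant from the linked tutorial; the project, dataset, sink name and filter are placeholders, and the sink's writer identity still needs to be granted access to the dataset:

from google.cloud import logging

client = logging.Client(project="my-project")

# Route HTTP(S) load balancer request logs to a BigQuery dataset in near real time.
destination = "bigquery.googleapis.com/projects/my-project/datasets/pixel_tracking"
log_filter = 'resource.type="http_load_balancer"'

sink = client.sink("pixel-tracking-sink", filter_=log_filter, destination=destination)
if not sink.exists():
    sink.create()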
I've deployed a single micro-instance Redis deployment on Compute Engine using the (very convenient) click-to-deploy feature.
I would now like to update this configuration to have a couple of instances, so that I can benchmark how this increases performance.
Is it possible to modify the config while it's running?
The other option would be to add a whole new Redis deployment, bleed traffic onto that over time and eventually shut down the old one. Not only does this sound like a pain in the butt, but I also can't see any way in the web UI to click-to-deploy multiple clusters.
I've got my learner's license with all this, so I would also appreciate any general good-to-knows.
I'm on the Google Cloud team working on this feature and wanted to chime in. Sorry no one replied to this for so long.
We are working on some of the features you describe that would surely make the service more useful and powerful. Stay tuned on that.
I admit that, to date, there really is not a good solution for modifying an existing deployment, other than launching a new cluster, migrating your data over, and redirecting reads and writes to it. This is a limitation we are working to fix.
As a workaround for creating two deployments using Click to Deploy with Redis, you could create a separate project.
Also, if you want to migrate to your own template using the Deployment Manager API (https://cloud.google.com/deployment-manager/overview), keep in mind that Deployment Manager does not have this limitation: you can create multiple deployments from the same template in the same project, as sketched below.
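A minimal sketch of such a Python template (the file name, zone, image and startup script are placeholders, not the click-to-deploy configuration). Each run of gcloud deployment-manager deployments create with a different deployment name and this template gives you an independent Redis VM in the same project:

# redis_instance.py -- a Deployment Manager Python template (all names are placeholders).

def GenerateConfig(context):
    """Returns one Compute Engine VM that installs Redis via a startup script."""
    props = context.properties or {}
    zone = props.get('zone', 'us-central1-f')
    resources = [{
        'name': context.env['deployment'] + '-redis',
        'type': 'compute.v1.instance',
        'properties': {
            'zone': zone,
            'machineType': 'zones/%s/machineTypes/f1-micro' % zone,
            'disks': [{
                'boot': True,
                'autoDelete': True,
                'initializeParams': {
                    'sourceImage': 'projects/debian-cloud/global/images/family/debian-11',
                },
            }],
            'networkInterfaces': [{
                'network': 'global/networks/default',
                'accessConfigs': [{'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}],
            }],
            'metadata': {
                'items': [{
                    'key': 'startup-script',
                    'value': 'apt-get update && apt-get install -y redis-server',
                }],
            },
        },
    }]
    return {'resources': resources}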
Chris