Google Cloud Bigtable backup and recovery

Google Cloud Bigtable backup and recovery - bigtable

I am new to Google Cloud Bigtable and have a very basic question as to whether the cloud offering protects my data against user error or application corruption? I see a lot of mention on the Google website that the data is safe and protected but not clear if the scenario above is covered because I did not see references to how I can go about restoring data from a previous point-in-time copy. I am sure someone on this forum knows!

Updated 7/24/2020: Bigtable now supports both backups and replication.
Currently we create backups to protect against catastrophic events and provide for disaster recovery.
As of February 2017, Cloud Bigtable does not provide backups from user errors or application bugs at this time. We hope to make this feature available in a future release - there is no planned delivery date at this time. In the meantime you may make your own snapshots using HBase or a similar process.

In addition to Google's disaster protection #Greg Dubicki mentioned, at Egnyte we backup our mission-critical Bigtable data into GCS, as Hadoop sequence files, using a couple Python wrappers for the Bigtable HBase shaded jar.
This provides for a quick recovery, fully under our control (ie. no need to wait for Google support to recover data on demand) in case our BT cluster failed or if an error on our software/admin side corrupted the data. A usefull side-effect is access to historical BT data for debugging.
Last week I wrote about that on Egnyte's engineering blog: https://medium.com/egnyte-engineering/bigtable-backup-for-disaster-recovery-9eeb5ea8e0fb. And we are thinking about open-sourcing this. We'll see how it goes.
UPDATE: On Thu Feb 20 I have published the scripts on Egnyte’s GitHub, under MIT license - https://github.com/egnyte/bigtable-backup-and-restore.

As of February 2020, Cloud Bigtable does provide backups, but only vaguely described as:
(...) we [do] create backups of your data to protect against catastrophic events and provide for disaster recovery.
Source

Related

Log Analytics retention policy and querying on logs

I would like to know how can we address this scenario in Azure Log Analytics where I need to generate Kube-audit logs of different cluster every week and also retain these logs for approx 400 days. Now storing it over Log Analytics will cost me more and its not an optimized architecture as I will not be require that so often. So I would like to know from experts whats the best way to design the architecture, where we get the kube audit logs which can be retained for 400 days and be available for querying when required without incurring too much cost.
PS: I also heard in my team that querying 400 days logs always times out in KQL.

Log analytics offerings:
Log analytics now provides the capability to manage several service tiers at table scope. Setting your data as archive, with no query capabilities at a much lower cost. offering spans for up to 7 years.
when needed, you can choose to elevate a subset of your data into the Analytics offering, providing you the capability to query it. The action of elevating your data is denoted as - "Search jobs"
Another option is to elevate an entire period in time to the Analytic offering, they call it - "Restore logs".
Table's different service tiers -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-retention-archive?tabs=api-1%2Capi-2
Search job offering -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/search-jobs?tabs=api-1%2Capi-2%2Capi-3
Restore logs -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/restore?tabs=api-1%2Capi-2
all are under public preview.
both offerings - Search jobs and Restore logs provides you the capability to engage your data on demand, can't comment or suggest regarding the actual cost.
Azure data explorer solution:
Another option is to use Azure storage to hold your data (as an example), Azure data explorer provides the capability to create an external table, that table is a logical view on top of your data, the data itself is kept outside of the ADX cluster. you can query your data by using ADX, expect degradation in query performance.
ADX external table offering -
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/externaltables

Blob storage folders bckups

we have a lot of pipelines in the synapse workspace.
using serverless sqlpool which is set to online
dedicated sql pool is paused as we do not use it to hold data...
using DevOps Repository
the support team will be making some clean-up in the environment. i.e. Running an old terraform to re-create the environment, etc.
How is it possible to make sure that
Question:
I understand in our DevOps Repository everything seems to be backed-up except the blob storage folders...
How can we make sure that if in-case something gets lost/ or goes wrong during the workspace clean-up, we will be able to get everything back...?
Thank you

ADLS Gen2 has its own tools for ensuring that DR event won’t affect you. One of the most powerful tools there is replication including Geo-Replicated Storage option.
Data Lake Storage Gen2 already handles 3x replication under the hood to guard against localized hardware failures. Additionally, other replication options, such as ZRS or GZRS, improve HA, while GRS & RA-GRS improve DR. When building a plan for HA, in the event of a service interruption the workload needs access to the latest data as quickly as possible by switching over to a separately replicated instance locally or in a new region.
In a DR strategy, to prepare for the unlikely event of a catastrophic failure of a region, it is also important to have data replicated to a different region using GRS or RA-GRS replication. You must also consider your requirements for edge cases such as data corruption where you may want to create periodic snapshots to fall back to. Depending on the importance and size of the data, consider rolling delta snapshots of 1-, 6-, and 24-hour periods, according to risk tolerances.
For data resiliency with Data Lake Storage Gen2, it is recommended to geo-replicate your data via GRS or RA-GRS that satisfies your HA/DR requirements. Additionally, you should consider ways for the application using Data Lake Storage Gen2 to automatically fail over to the secondary region through monitoring triggers or length of failed attempts, or at least send a notification to admins for manual intervention. Keep in mind that there is tradeoff of failing over versus waiting for a service to come back online.
For more details refer to Best practices for using Azure Data Lake Storage Gen2.
And also here a great article which talks about : Azure Synapse Disaster Recovery Architecture.

How to work with AWS Cognito in production environment?

I am working on an application in which I am using AWS Cognito to store users data. I am working on understanding how to manage the back-up and disaster recovery scenarios for Cognito!
Following are the main queries I have:
I wanted to know what is the availability of this stored user data?
What are the possible scenarios with Cognito, which I need to take
care before we go in production?

AWS does not have any published SLA for AWS Cognito. So, there is no official guarantee for your data stored in Cognito. As to how secure your data is, AWS Cognito uses other AWS services (for example, Dynamodb, I think). Data in these services are replicated across Availability Zones.
I guess you are asking for Disaster Recovery scenarios. There is not much you can do on your end. If you use Userpools, there is no feature to export user data, as of now. Although you can do so by writing a custom script, a built-in backup feature would be much more efficient & reliable. If you use Federated Identities, there is no way to export & re-use Identities. If you use Datasets provided by Cognito Sync, you can use Cognito Streams to capture dataset changes. Not exactly a stellar way to backup your data.
In short, there is no official word on availability, no official backup or DR feature. I have heard that there are feature requests for the same but who knows when they would be released. And there is not much you can do by writing custom code or follow any best practices. The only thing I can think of is that periodically backup your Userpool's user data by writing a custom script using AdminGetUser API. But again, there are rate limits on how many times you can call this API. So, backup using this method can take a long time.

AWS now offers a SLA for Cognito. In the event they are unable to meet their availability target (99.9% at the time of writing), you will receive service credits.

Even through there are couple of third party solutions available, when restoring a user pool users will be created using admin flow (users are not restored rather they will be created from an admin) and they will end up with "Force Change Password" status. So the users will be forced to change the password using the temporary password and that has to be facilitated from the front end of the application.
More info : https://docs.amazonaws.cn/en_us/cognito/latest/developerguide/signing-up-users-in-your-app.html
Tools available.
https://www.npmjs.com/package/cognito-backup
https://github.com/mifi/cognito-backup
https://github.com/rahulpsd18/cognito-backup-restore
https://github.com/serverless-projects/cognito-tool
Pls bear in mind that some of these tools are outdated and can not be used. I have tested "cognito-backup-restore" and it is working as expected.
Also you have to think of how to secure the user information outputted by these tools. Usually they create a json file containing all the user information (except the passwords as passwords can not be backed up) and this file is not encrypted.
The best solution so far is to prevent accidental deletion of user pools with AWS SCPs.

Reliability of Windows Azure Storage Logging

We are in the process of creating a piece of software to backup a storage account (blobs & tables, no queues) and while researching how to do this we came across the possibility storage logging. We would like to use this feature to do smart incremental backups after an initial full backup. However in the introductory post for this feature here the following caveat is mentioned:
During normal operation all requests are logged; but it is important to note that logging is provided on a best effort basis. This means we do not guarantee that every message will be logged due to the fact that the log data is buffered in memory at the storage front-ends before being written out, and if a role is restarted then its buffer of logs would be lost.
As this is a backup solution this behavior makes the features unusable, we can't miss a file. However I wonder if this has changed in the meantime as Microsoft has built a number of features on top of it like blob function triggers and very recently their new Azure Event Grid.
My question is whether this behavior has changed in the meantime or are the logs still on a best effort basis and should we stick to our 'scanning' strategy?

The behavior for Azure Storage logs is still same. For your case, you might be better off using the EventGrid notification for Blob storage: https://azure.microsoft.com/en-us/blog/introducing-azure-event-grid-an-event-service-for-modern-applications/

Application Level Replication Technologies

I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back end repositories of data, I can't easily rely on independent replication solutions for each one to maintain a consistent state. I am thus lead to implementing replication at the application layer -- by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last writer conditions are sorted out correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?

Lotus/ Domino is your answer. I've been working with it for ten years and its exactly what you need. It may not be trendy (a perception that I would challenge) but its powerful, adaptable and very secure, The latest version R8 is the best yet.

You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. The replicate in Notes/Domino is definitely a very powerful feature and enables for full replication of data between sites. Even if a server is unavailable the next time it connects it will simply replicate and get back in sync.
As far as SOA Service tier you could then use Domino Designer to write a webservice. Since Notes/Domino 7.5.x (I believe) Domino has been able to provision and consume webservices.

AS what other advised, I will recommend also Lotus Notes/Domino. 8.5 is really very powerful application development platfrom

You dont give enough specifics to be certain of your needs but I think you should check out SQL Server Merge replication. It allows for asynchronous replication of multiple databases with full conflict resolution. You will need to designate a Global master and all the other databases will replicate to that one, but all the database instances are fully functional (read/write) and so you can schedule replication at whatever intervals suit you. If any region goes offline they can catch up later with no issues - if the master goes offline everyone will work independantly until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).

I think that your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers so that you can rely on published updates being received eventually. If all of your access to the data repositories is via service you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally the master database is the only one that sends out these updates. If the master database is the only one sending the updates you can exclude routing the notifications to the node that generated them in the first place thus avoiding update loops.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas