Does anyone have recommendations or best practices for backing up a Cognito user pool so that it can be recovered later?
I'm using nodejs serverless to create a CloudFormation stack with all the assets, including the Cognito setup. I'm looking for a way to export or save a backup of the user pool data somewhere in case the stack ever needs to be redeployed from scratch.
I'd prefer to keep the assets tied together, but even if I take the Cognito setup out of the CloudFormation template, I'm still interested in knowing whether a backup to recover from is possible in case something else happens to the user pool (like a region going down).
If it matters, we aren't using Cognito in production yet; the team I'm on is evaluating it now. I'm aware of a couple of nodejs packages, cognito-backup and cognito-backup-restore, but they both require everyone to reset their passwords if the backup actually needs to be used.
It's not possible to restore a Cognito user pool without requiring users to reset their passwords. I'd recommend telling Amazon that you find this unacceptable, and migrating to Google Cloud Identity Platform (which supports this) if possible.
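That said, if you still want a point-in-time export of the user data (knowing that password hashes can never be exported, so a restore always means password resets or a sign-up/migration flow), a minimal sketch with the AWS CLI, using made-up pool IDs, could look like this:

# Export all user attributes to JSON (passwords are never included in the output).
aws cognito-idp list-users --user-pool-id us-east-1_EXAMPLE --output json > cognito-users-backup.json

# On restore into a fresh pool, users can be re-created from the backup,
# but they will have to set new passwords.
aws cognito-idp admin-create-user --user-pool-id us-east-1_NEWPOOL --username someuser@example.com

This is essentially what the cognito-backup style packages automate, so it carries the same password limitation.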
Related
I use a typical YARN/Ranger stack with atomic policies for accessing YARN queues. As a Hadoop user (not an admin), how can I get the list of queues that the user has access to? I can see how it's usually done from the admin side, but what about from the user side? I went through the YARN APIs but found nothing. In Ranger, a user usually doesn't have enough permissions to get more details about itself. Is the only way to do it to brute-force all the queues in the cluster until you find an accessible one?
Unfortunately, the user queue policy is not visible through the REST APIs for Fair Scheduler. You can double-check by running:
curl RM-ADDR:PORT/ws/v1/cluster/scheduler
but looking at the ResourceManager REST API's Cluster Scheduler API, I think you're out of luck.
If you use Ambari or Cloudera Manager, those might have APIs that will allow you to download the Fair Scheduler's XML file.
I have a running Kubernetes cluster on Google Cloud Platform.
I want to deploy a postgres image to my cluster.
When selecting the image and my cluster, I get the error:
insufficient OAuth scope
I have been reading about it for a few hours now and couldn't get it to work.
I managed to set the scope of the VM to allow APIs:
Cloud API access scopes
Allow full access to all Cloud APIs
But in the GKE cluster details, I see that everything is disabled except Stackdriver.
Why is it so difficult to deploy an image or to change the scope?
How can I modify the cluster permissions without deleting and recreating it?
The easiest way is to delete and recreate the cluster, because there is no direct way to modify the scopes of an existing cluster. However, there is a workaround: create a new node pool with the correct scopes and then delete the old node pools. The cluster's scopes will change to reflect the new node pool (a sketch follows below).
More details can be found in this post.
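For reference, a rough sketch of that node-pool swap with gcloud; the cluster, zone, and pool names are placeholders, and the scopes should be narrowed to what the workloads actually need rather than cloud-platform:

# Create a new node pool with the broader OAuth scopes (placeholder names).
gcloud container node-pools create new-pool \
    --cluster my-cluster --zone us-central1-a \
    --num-nodes 3 --scopes cloud-platform

# After workloads have drained onto the new pool, remove the old one so the
# cluster only reports the new pool's scopes.
gcloud container node-pools delete default-pool \
    --cluster my-cluster --zone us-central1-a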
To resolve a few issues we are running into with Docker and running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we are running our Kubernetes instance that houses the containers) and a Key Vault in Azure, accessing both via the well-defined APIs that Microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key store). I'm very new to Kubernetes and have no real idea how to accomplish this, and my searches so far have not turned up much that's useful.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources that you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you exactly the same functionality as Key Vault in Azure.
PersistentVolume:
Keys are stored locally on a volume mounted on a server
The volume can be encrypted
The volume moves with the pod: if the pod starts on a different server, the volume moves with it
Accessing the volume from other pods is not that easy (see the sketch after these lists)
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider)
Key Vault:
Store keys in a centralized location managed by Azure
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit by going to an external service
I assume this is not a major problem within Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to the service.
Less maintenance time, since it's already maintained by Azure.
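On the original question about PersistKeysToFileSystem: if the team does go the PersistentVolume route, every replica has to be able to mount the key directory at the same time, so on Azure that usually means an Azure Files backed claim with ReadWriteMany access rather than a disk. A minimal sketch with kubectl, assuming an AKS cluster with the built-in azurefile storage class (the claim name is made up):

# Create a shared ReadWriteMany claim backed by Azure Files (assumes the
# built-in "azurefile" storage class exists on the cluster).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataprotection-keys
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 1Gi
EOF

Each pod would then mount the claim at a fixed path and point PersistKeysToFileSystem at that directory. Note that Azure Files is itself backed by a storage account in the subscription, so this may not save as much as hoped compared to the Key Vault approach.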
I did a production release for one of my clients with Azure PaaS – Azure Web App (formerly Azure Websites). We set the site to autoscale mode with a minimum of 2 instances. We started getting complaints from users that, intermittently, they were experiencing issues which could only happen if they lost their session variables while actively using the site.
We were wondering how that could happen, since we were maintaining the sessions in-proc. We didn't go for a separate out-of-proc session manager (Redis cache / DB based) on day 1 because, according to the documentation, Azure Web App runs with a sticky-session load balancer (when running with multiple instances) by default. It injects an ARR affinity cookie into the HTTP response that helps redirect a user to the same instance with which it established the session the first time.
To debug this issue, we started printing the actual session ID on the page. After many attempts, we reproduced the issue – surprisingly, sessions were getting swapped. Let's say my session ID is “1eocgtmwwwwvs1cxksyofne4”; after the page refreshed it changed to “5p1hsxszq2mcqmt5i5ytqg12”, along with all the other information managed in session variables. Scary… isn't it?
Reached out to support with an emergency ticket – the response was:
"Azure Web Apps is a stateless platform and our recommendations is to implement a Session Management solution that best works for your environment and avoid the dependency on In-memory session state management especially in cases where your Web App is hosted on multiple servers . In your case as you are considering to implement a Caching solution for session management in near future , our recommendation is to move to a single instance as a workaround for now . We will assist you in ensuring that the app is working as expected on a single instance.”
I completely understand that – but why would the affinity cookie fail? I know that the affinity cookie can be disabled, and I didn't disable it. I thought of sharing this story; it could be useful for anyone else relying on the affinity-cookie-based sticky load balancer.
BTW: the Redis session provider is very easy to implement. It took just a few minutes.
Doing a postmortem now – what went wrong? I didn't consider browsers that don't support cookies. But then why would it fail intermittently even in my browser, where cookies are supported? PaaS resources are constantly being moved / swapped, and there is no way we can rely on in-proc session state… etc.
We have a large extended network of users that we track using badges. The total traffic is in the neighborhood of 60 Million impressions a month. We are currently considering switching from a fairly slow, database-based logging solution (custom-built on PHP—messy...) to a simple log-based alternative that relies on Amazon S3 logs and Splunk.
After using Splunk for some other analysis tasks, I really like it. But it's not clear how to set up a source like S3 with the system. It seems that remote sources require the Universal Forwarder to be installed, which is not an option there.
Any ideas on this?
Very late answer, but I was looking for the same thing and found a Splunk app that does what you want, http://apps.splunk.com/app/1137/. I have not tried it yet, though.
I would suggest logging JSON-preprocessed data to a DocumentDB database, for example using Azure Queues or similar service bus messaging technologies that fit your scenario in combination with Azure DocumentDB.
So this keeps your database-based approach but modifies it to use a schemaless, easy-to-scale, document-based DB.
I use http://www.insight4storage.com/ from AWS Marketplace to track my AWS S3 storage usage totals by prefix, bucket, or storage class over time; it also shows me the previous-version storage by prefix and per bucket. It has a setting to save the S3 data as Splunk-format logs that might work for your use case, in addition to its UI and web service API.
You can use the Splunk Add-on for AWS.
This is what I understand:
Create a Splunk instance. Use the website version or the on-premise AMI of Splunk to create an EC2 instance where Splunk is running.
Install the Splunk Add-on for AWS application on that EC2 instance (a sketch of this step follows after the list).
Based on the input log type (e.g. CloudTrail logs, Config logs, generic logs, etc.), configure the add-on and supply parameters such as the AWS account ID or IAM role.
The add-on will automatically poll the AWS S3 source and fetch the latest logs after a specified amount of time (the default is 30 seconds).
For a generic use case (like ours), you can try configuring the Generic S3 input for Splunk.
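As a rough sketch of the install step on an EC2-hosted Splunk (paths and credentials are placeholders, and the add-on package has to be downloaded from Splunkbase first):

# Install the Splunk Add-on for AWS on the instance and restart Splunk.
/opt/splunk/bin/splunk install app splunk-add-on-for-amazon-web-services_*.tgz -auth admin:changeme
/opt/splunk/bin/splunk restart

The Generic S3 input itself is then configured from the add-on's Inputs page in Splunk Web, where you point it at the bucket and the AWS account or IAM role.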