Prometheus export / import data for backup

How do you export and import data in Prometheus? How do you make sure the data is backed up if the instance goes down?
It does not seem that there is such a feature yet, so how do you handle this?

Since Prometheus version 2.1 it is possible to ask the server for a snapshot. The documentation provides more details - https://web.archive.org/web/20200101000000/https://prometheus.io/docs/prometheus/2.1/querying/api/#snapshot
Once a snapshot is created, it can be copied somewhere for safekeeping and, if required, a new server can be created using this snapshot as its database.
The documentation website frequently changes its URLs; this links to fairly recent documentation on the topic:
https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
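As a rough sketch of the snapshot call described above: the code assumes a server reachable at a hypothetical `http://localhost:9090` that was started with the `--web.enable-admin-api` flag (the admin endpoints are disabled otherwise). The helper names are illustrative and not part of Prometheus.

```python
import json
import urllib.parse
import urllib.request


def snapshot_url(base_url, skip_head=False):
    """Build the URL of the TSDB snapshot admin endpoint."""
    url = base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"
    if skip_head:
        # skip_head=true leaves out data that is only in the head block
        url += "?" + urllib.parse.urlencode({"skip_head": "true"})
    return url


def snapshot_name(response_body):
    """Extract the snapshot directory name from the JSON response body."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("snapshot failed: %r" % (payload,))
    return payload["data"]["name"]


def create_snapshot(base_url="http://localhost:9090"):
    """POST to the admin API and return the name of the new snapshot."""
    # base_url is an assumed address; adjust it to your own server
    request = urllib.request.Request(snapshot_url(base_url), data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return snapshot_name(response.read())
```

The returned name identifies a directory under `snapshots/` inside the server's data directory; copying that directory elsewhere is the actual backup step.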

There is no export and especially no import feature for Prometheus.
If you need to keep data collected by Prometheus for some reason, consider using the remote write interface to write it somewhere suitable for archival, such as InfluxDB (configured as a time-series database).
Prometheus isn't long-term storage: if the database is lost, the user is expected to shrug, mumble "oh well", and restart Prometheus.
Credits and many thanks to amorken from IRC #prometheus.

There is an option to enable Prometheus data replication to a remote storage backend. The data collected from multiple Prometheus instances can then be backed up in one place on that remote storage backend. See, for example, how VictoriaMetrics remote storage can save time and network bandwidth when creating backups to S3 or GCS with the vmbackup utility.

Related

Is backup service automatically region-redundant?

I'm learning about Azure redundancy policies, which consist of locally redundant storage, zone redundant storage and region redundant storage.
However, when I tried to set up the automatic backup service on my virtual machine, it didn't give me any options to choose from. Is backing up always done in a region redundant way? Thanks.
The short answer is no. Azure Backup does not have to use geo-redundant storage (GRS); you can configure a different redundancy option. However, it does use GRS by default.
Please see the Azure Backup Architecture Overview for details: https://learn.microsoft.com/en-us/azure/backup/backup-architecture
Specifically, the 'Where is data backed up?' section outlines the redundancy options.

Snapshot IBM Cloud Object Storage

I am trying to figure out whether it is possible, also with the help of other software (like MinIO, Portworx, Veeam, etc.), to take a snapshot (and eventually restore it later) of the content stored on an IBM Cloud Object Storage instance used as the persistence layer for an OpenShift cluster through its S3-compatible API endpoints.
Please check out this link and see if it provides what you are looking for.
Thanks. https://cloud.ibm.com/docs/openshift?topic=openshift-object_storage#cos_backup_restore
In the end I found this in the official IBM Cloud documentation, which more or less achieves what I asked about here: it explains how to sync two buckets with each other, even across different regions, so that both the data and the backups are kept on S3 storage.
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-region-copy

AWS S3 alternatives for private cloud

Right now we have a requirement to migrate from AWS to a private data center. We need to find a potential alternative to AWS S3 for storage.
Currently S3 is used in the following way:
Overall storage size is 10 TB;
Min/Avg/Max object size is 0.5/2/100 MB;
We have N app instances that simultaneously write/read objects, at approximately 50 writes/sec and 30 reads/sec;
This storage should be redundant (Highly Available), Fault Tolerant, Scalable;
A naive implementation could store this data on:
Simple NFS storage with some added replication functionality;
A NoSQL DB (for example, Cassandra). However, Cassandra will require a number of instances to support this storage (it's not recommended to store > 1 TB on one Cassandra node; see Cassandra capacity planning).
What solution would you recommend for such a scenario?
Using MinIO is your best bet if you want to have private cloud storage. It is AWS S3 compatible, meaning that applications that use AWS S3 can be migrated to MinIO seamlessly. They have a tutorial on how to connect a MinIO server with the AWS CLI. You can test it against the publicly hosted MinIO server https://play.min.io:9000. Please refer to AWS CLI with MinIO Server.
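To illustrate what S3 compatibility means in practice, here is a minimal sketch that points an ordinary S3 client at a MinIO endpoint instead of AWS. It assumes the third-party `boto3` package is installed. The helper names and the credentials are placeholders, not values from MinIO or from this answer.

```python
def minio_client_kwargs(endpoint_url, access_key, secret_key):
    """Collect the arguments that point an S3 client at a MinIO endpoint."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,  # the MinIO server instead of AWS
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 client that talks to MinIO (needs `pip install boto3`)."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client(**minio_client_kwargs(endpoint_url, access_key, secret_key))


def list_bucket_names(client):
    """Return the names of all buckets visible to the client."""
    return [bucket["Name"] for bucket in client.list_buckets()["Buckets"]]
```

For example, `list_bucket_names(make_client("https://play.min.io:9000", "YOUR-ACCESS-KEY", "YOUR-SECRET-KEY"))` would list the buckets on the public test server once real credentials are substituted. The rest of the application code that already uses S3 calls stays the same.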
You can have a highly available storage system using a MinIO distributed setup. Beware that dynamic expansion is not a feature of the MinIO distributed setup. If you want to expand your cluster, you end up spinning up a new cluster with your desired number of servers/disks and then migrating your data from the old one to the new one.
I find it much easier to use than HDFS. In addition, many technologies outside the Hadoop ecosystem lack HDFS integration. For example, Docker Registry lacks a built-in HDFS storage driver. However, it has an S3 driver, so you can use MinIO as its object storage.
There are a number of options for an S3-compatible private cloud service. If you like open-source solutions, OpenStack and Cassandra, mentioned above, are good ones. Note that whatever you use, you will probably end up setting up a cloud with multiple nodes; that is the unavoidable price of redundancy and availability. There are also some good, economical commercial products, such as the one from Cloudian.
If you need an object store, I can recommend Elliptics (in English).
As far as I know, it has no limits on disk storage.
In the case of Cassandra, we use SSD disks (for better performance) of < 200-500 GB. The ring size would depend on your requirements (read/write latency, replication rate, time to live).
50 writes/sec, 30 reads/sec
This is quite easy for Cassandra, judging by our own setup.
In that case it depends more on the time to live of your objects.
More generally, for distributed network storage you could also look at GlusterFS.
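For a rough sense of the load described in the question, the figures can be turned into a back-of-the-envelope estimate. This sketch assumes "Mb" in the question means megabytes, uses decimal units (1 TB = 1,000,000 MB), and treats every request as an average-sized object; the function name is made up for illustration.

```python
def workload_estimate(total_tb, avg_object_mb, writes_per_sec, reads_per_sec):
    """Estimate object count and average throughput from the question's figures."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return {
        "object_count": total_mb / avg_object_mb,
        "write_mb_per_sec": writes_per_sec * avg_object_mb,
        "read_mb_per_sec": reads_per_sec * avg_object_mb,
    }


# 10 TB total, 2 MB average object, 50 writes/sec, 30 reads/sec
estimate = workload_estimate(10, 2, 50, 30)
print(estimate)
```

Under these assumptions the store holds about 5 million objects and sees roughly 100 MB/s of writes and 60 MB/s of reads on average, before any replication overhead.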
You can use OpenStack Swift.
Swift is a highly available, distributed, eventually consistent object/blob store. Organizations can use Swift to store lots of data efficiently, safely, and cheaply.
Learn more at: https://docs.openstack.org/swift/latest/
And https://oldhenhut.com/2016/05/31/s3-vs-swift/

How to migrate Redis database to Aerospike?

We have a large Redis database. The number of keys exploded recently: we now have ~160M keys, which take 50GB+ of RAM.
What would be the best migration strategy to move all this data from Redis to Aerospike? We are planning to use Jedis later so hopefully after the migration it will be as simple as pointing our services to a new port.
Ideally we can somehow import the dump.rdb file into Aerospike.
You need to put in a little extra work. Aerospike now supports Redis-like list and map APIs, so the migration will not be painful. However, you need to migrate both your data and your application.
To migrate the data, you can export the Redis data in CSV format using the redis-cli utility and load it into Aerospike using the Aerospike CSV loader utility. You can parallelize the loading if you split the data into multiple CSV files.
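The splitting step can be sketched as follows. This only illustrates chunking key/value rows into several CSV files. The two-column `key,value` layout, the function names, and the file prefix are assumptions. Reading the rows out of Redis and the loader's own column mapping are not shown. With ~160M keys you would stream rows rather than hold them all in one list.

```python
import csv
import io


def rows_to_csv_chunks(rows, chunk_size):
    """Split (key, value) rows into CSV documents of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    for start in range(0, len(rows), chunk_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["key", "value"])  # header row; column names are assumed
        writer.writerows(rows[start:start + chunk_size])
        chunks.append(buffer.getvalue())
    return chunks


def write_chunks(chunks, prefix="redis_export"):
    """Write each chunk to its own file and return the file names."""
    names = []
    for index, chunk in enumerate(chunks):
        name = "%s_%04d.csv" % (prefix, index)
        with open(name, "w", newline="") as handle:
            handle.write(chunk)
        names.append(name)
    return names
```

Each resulting file can then be handed to a separate loader run, which is what makes the parallel import possible.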
To migrate the application, it's best to use the native Aerospike client library for better integration. You can pick the language of your choice, and you should find an equivalent API for most of your needs. If you have already abstracted the basic calls in your application, the migration should be even smoother, as there will be only a few places where you need to change the calls.

Planning the development of a scalable web application

We have created a product that will potentially generate tons of requests for a data file that resides on our server. Currently we have a shared hosting server that runs a PHP script to query the DB and generate the data file for each user request. This is not efficient; it has not been a problem so far, but we want to move to a more scalable system, so we're looking into EC2. Our main concerns are being able to handle high amounts of traffic when they occur, and providing low latency to users downloading the data files.
I'm not 100% sure on how this is all going to work yet but this is the idea:
We use an EC2 instance to host our admin panel and to generate the files that are being served to app users. When any admin makes a change that affects these data files (which are downloaded by users), we make a copy over to S3 using CloudFront. The idea here is to get data cached and waiting on S3 so we can keep our compute times low, and to use CloudFront to get low latency for all users requesting the files.
I am still learning the system and wanted to know if anyone had any feedback on this idea or insight into how it all might work. I'm also curious about the purpose of projects like Cassandra. My understanding is that simply putting our application on EC2 servers makes it scalable by the nature of the servers. Is Cassandra just about keeping resource usage low, or is there a reason to use a system like this even when on EC2?
CloudFront: http://aws.amazon.com/cloudfront/
EC2: http://aws.amazon.com/ec2/
Cassandra: http://cassandra.apache.org/
Cassandra is a non-relational database engine, and if this is what you need, you should first evaluate Amazon's SimpleDB: a non-relational database engine built on top of S3.
If the file only needs to be updated based on time (daily, hourly, ...) then this seems like a reasonable solution. But you may consider placing a load balancer in front of 2 EC2 images, each running a copy of your application. This would make it easier to scale later and safer if one instance fails.
Some other services you should read up on:
http://aws.amazon.com/elasticloadbalancing/ -- Amazon's load balancer solution.
http://aws.amazon.com/sqs/ -- Used to pass messages between systems, in your DA (distributed architecture). For example if you wanted the systems that create the data file to be different than the ones hosting the site.
http://aws.amazon.com/autoscaling/ -- Allows you to adjust the number of instances online based on traffic
Make sure to have a good backup process with EC2: snapshot your OS drive often and place any volatile data (e.g. database files) on an EBS volume. EC2 doesn't fail often, but when it does you don't have access to the hardware, and if you have an up-to-date snapshot you can just bring a new instance online.
Depending on the datasets, Cassandra can also significantly improve response times for queries.
There is an excellent explanation of the data structure used in NoSQL solutions that may help you decide whether this is an appropriate solution:
WTF is a Super Column