Scheduled copying of data from a Yandex bucket to an S3 bucket - amazon-s3

I need to copy 1 TB of data from a Yandex bucket to an S3 bucket: a first run for full replication, and then a job that runs twice daily (every 12 hours) so that all new files are also synced to the S3 bucket. I have explored solutions like rclone and Flexify, but I am unsure which to proceed with. What would be the best and most cost-effective solution to this problem?
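A minimal sketch of the incremental step, assuming Yandex Object Storage's S3-compatible endpoint and hypothetical bucket names and credentials; tools like rclone do essentially this, but with parallel transfers and retries, which matters at 1 TB:

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical names; replace with your real buckets and credentials.
SRC_BUCKET = "my-yandex-bucket"
DST_BUCKET = "my-aws-bucket"

# Yandex Object Storage exposes an S3-compatible API at storage.yandexcloud.net.
src = boto3.client(
    "s3",
    endpoint_url="https://storage.yandexcloud.net",
    aws_access_key_id="YANDEX_KEY_ID",
    aws_secret_access_key="YANDEX_SECRET",
)
dst = boto3.client("s3")  # regular AWS credentials from the environment

def sync_once():
    """Copy any object that is missing from, or newer than, the destination."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            try:
                head = dst.head_object(Bucket=DST_BUCKET, Key=key)
                if head["LastModified"] >= obj["LastModified"]:
                    continue  # destination copy is already up to date
            except ClientError:
                pass  # object does not exist in the destination yet
            body = src.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
            dst.upload_fileobj(body, DST_BUCKET, key)

if __name__ == "__main__":
    sync_once()
```

The scheduled part is then just a cron job or an EventBridge rule that runs the sync every 12 hours.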

Related

AWS S3 folder-wise metrics

We are using Grafana's CloudWatch data source for AWS metrics. We would like to differentiate folders in an S3 bucket by size and show them as graphs. We know that CloudWatch gives bucket-level rather than object-level metrics. Is there any possible solution for monitoring the size of the folders in the bucket?
Any suggestions are appreciated.
Thanks in advance.
Amazon CloudWatch provides daily storage metrics for Amazon S3 buckets, but as you mention, these metrics are for the whole bucket rather than per folder.
Amazon S3 Inventory can provide a daily CSV file listing all objects. You could load this information into a database or use Amazon Athena to query the contents.
If you require storage metrics at a higher resolution than daily, then you would need to track this information yourself. This could be done with:
An Amazon S3 Event that triggers an AWS Lambda function whenever an object is created or deleted
An AWS Lambda function that receives this information and updates a database
Your application could then retrieve the storage metrics from the database
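A minimal sketch of the Lambda piece, assuming a hypothetical DynamoDB table named s3-folder-sizes keyed by the object's top-level prefix:

```python
import boto3
from urllib.parse import unquote_plus

# Hypothetical table: partition key "prefix" (string), numeric attribute "bytes".
table = boto3.resource("dynamodb").Table("s3-folder-sizes")

def handler(event, context):
    """Add each created object's size to a per-top-level-folder running total."""
    for record in event["Records"]:
        if not record["eventName"].startswith("ObjectCreated"):
            continue  # ObjectRemoved events carry no size; handle deletions separately
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"]["size"]
        prefix = key.split("/", 1)[0] if "/" in key else "(root)"
        table.update_item(
            Key={"prefix": prefix},
            UpdateExpression="ADD #b :delta",
            ExpressionAttributeNames={"#b": "bytes"},
            ExpressionAttributeValues={":delta": size},
        )
```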
Thanks for the reply, John.
However, I found a solution for this using an s3_exporter. It gives metrics for the sizes of the folders and sub-folders inside an S3 bucket.

How can I search the changes made on an `s3` bucket between two timestamps?

I am using an S3 bucket to store my data, and I keep pushing data to this bucket every single day. I wonder whether there is a feature that lets me compare how the files in my bucket differ between two dates. If not, is there a way for me to build one via the AWS CLI or SDK?
The reason I want to check this is that I have an S3 bucket and my clients keep pushing data to it. I want to see how much data they have pushed since the last time I loaded it. Is there a pattern in AWS that supports this query, or do I have to create rules on the S3 bucket to analyse it?
Listing from Amazon S3
You can activate Amazon S3 Inventory, which can provide a daily file listing the contents of an Amazon S3 bucket. You could then compare differences between two inventory files.
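For example, once you have downloaded two inventory data files, a rough diff of their keys could look like this (the file names are placeholders, and the key column position depends on the fields configured for the inventory):

```python
import csv
import gzip

def keys(inventory_csv_gz):
    """Collect object keys from one S3 Inventory CSV data file."""
    with gzip.open(inventory_csv_gz, "rt", newline="") as f:
        return {row[1] for row in csv.reader(f)}  # column 0 is bucket, column 1 is key

old = keys("inventory-2024-01-01.csv.gz")
new = keys("inventory-2024-01-02.csv.gz")
print("added:", sorted(new - old))
print("deleted:", sorted(old - new))
```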
List it yourself and store it
Alternatively, you could list the contents of the bucket and look for objects dated since the last listing. However, if objects are deleted, you will only know this if you keep a list of objects that were previously in the bucket. It's probably easier to use S3 Inventory.
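If you do list it yourself, here is a sketch with boto3 that totals up everything modified since your last load (the bucket name and cutoff time are placeholders you would persist between runs):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")
BUCKET = "my-client-uploads"                           # placeholder bucket name
last_load = datetime(2024, 1, 1, tzinfo=timezone.utc)  # placeholder cutoff

new_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if obj["LastModified"] > last_load:
            new_bytes += obj["Size"]
            print(obj["Key"], obj["Size"], obj["LastModified"])

print(f"pushed since last load: {new_bytes} bytes")
```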
Process it in real-time
Instead of thinking about files in batches, you could configure Amazon S3 Events to trigger something whenever a new file is uploaded to the Amazon S3 bucket. The event can:
Trigger a notification via Amazon Simple Notification Service (SNS), such as an email
Invoke an AWS Lambda function to run some code you provide. For example, the code could process the file and send it somewhere.
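As a sketch of the Lambda option, a handler that publishes a message to a hypothetical SNS topic for every created object might look like this (note that S3 can also notify SNS directly, without Lambda in between):

```python
import json
import boto3
from urllib.parse import unquote_plus

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:new-uploads"  # hypothetical topic

def handler(event, context):
    """Publish one SNS message per object reported by the S3 event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="New object uploaded",
            Message=json.dumps({"bucket": bucket, "key": key}),
        )
```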

AWS backup of DynamoDB to S3, but with only the records added today

Is there any way to make a daily DynamoDB backup to an S3 bucket where the backup contains only the records added to DynamoDB that day?
In other words, I want to take a daily DynamoDB backup to S3, but the daily backup should contain only the records added to DynamoDB today.
Please help if there is any way.
Thanks,
If you enable Streams on your DynamoDB table, you get visibility into every item change in the table. You can write a Lambda function that processes the stream events and dumps the items added to the table into an S3 bucket.
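A rough sketch of such a stream handler, writing each inserted item to a hypothetical backup bucket as its own JSON object (the stream must include new images; for high write rates you would batch these, for example via Kinesis Data Firehose):

```python
import json
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")
BUCKET = "my-dynamodb-daily-backup"  # hypothetical bucket

def handler(event, context):
    """Write every INSERT from the DynamoDB stream to S3, keyed by date."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # only newly added records, per the question
        item = record["dynamodb"]["NewImage"]  # requires NEW_IMAGE in the stream view
        now = datetime.now(timezone.utc)
        key = f"{now:%Y/%m/%d}/{record['eventID']}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(item).encode())
```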

How to copy a bucket in Amazon S3?

I'm really frustrated with Amazon's lack of a way to clone a bucket. They don't offer bucket renaming, and they don't offer a good way to copy a bucket with tens of thousands of files in it. Is there a way that takes seconds to minutes instead of hours?
You can easily clone a bucket by using sync. (First create the bucket you want to clone it to):
aws s3 sync --quiet s3://[bucket-old] s3://[bucket-new]
Tip: use the --dryrun flag first to see what it will do (and break it off once it looks good, or else you'll have to wait for your thousands of files to finish listing).

How does S3 assign a timestamp upon upload?

We have a process uploading files to S3. In fact, it's indirect. We use Amazon Elastic MapReduce (EMR), and Hadoop commits the files to S3, from many different task nodes. Then, after that Hadoop job has completed successfully, another part of the process uses Hadoop's FileSystem.createNewFile() to create some files from the master node.
The files that are created from these various machines have timestamps in S3. We assume the timestamps of the files committed from the task nodes are earlier than those of the files created from the master node.
I believe that is sometimes untrue, but why?
What assigns the timestamp to an S3 file? Is it the Amazon EMR Hadoop client, or some S3 machine?
If I have two machines uploading to S3 whose local clock differs by 30 minutes, will the timestamps be 30 minutes apart?
You are unable to set the Last-Modified values yourself. S3 decides them:
https://forums.aws.amazon.com/thread.jspa?messageID=209241
The only timestamp in S3 appears to be the "Last Modified" metadata. I believe that the last-modified date/time is set by the S3 system itself, and reflects the time when the file finished uploading fully to S3 (S3 will not show incomplete transfers).
So it shouldn't matter which node you upload a file from: the "last modified" timestamp you see when you list the object is assigned by S3 when the upload completes, not by the uploading machine's local clock.
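If you want to inspect the value S3 assigned, you can read it back after the upload, for example with boto3 (the bucket and key below are placeholders):

```python
import boto3

s3 = boto3.client("s3")
# Prints the server-assigned Last-Modified timestamp (UTC).
resp = s3.head_object(Bucket="my-emr-output", Key="output/part-00000")
print(resp["LastModified"])
```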