Transfer Files from one s3 Folder to Another s3 Folder using Boto3 Python

I am trying to copy files from one S3 bucket to another, with some modifications to the destination path.
The original script is below:
import boto3
import os
old_bucket_name = 'XT01-sample-data'
old_prefix = 'Test/'
new_bucket_name = 'XT02-sample-data2'
new_prefix = old_bucket_name + '/' + old_prefix
s3 = boto3.resource('s3')
old_bucket = s3.Bucket(old_bucket_name)
new_bucket = s3.Bucket(new_bucket_name)
extra_args = {
    'ServerSideEncryption': ENCRYPTION,
    'StorageClass': STORAGE_CLASS
}
for obj in old_bucket.objects.filter(Prefix=old_prefix):
    old_source = {'Bucket': old_bucket_name,
                  'Key': obj.key}
    # replace the prefix
    new_key = obj.key.replace(old_prefix, new_prefix, 1)
    new_obj = new_bucket.Object(new_key)
    print("Object old ", obj)
    print("new_key ", new_key)
    print("new_obj ", new_obj)
    # copy the object to its new key in the destination bucket
    new_obj.copy(old_source, ExtraArgs=extra_args)
print("Starting Deletion Loop")
bucket = s3.Bucket(old_bucket_name)
The above script copies the files from bucket XT01-sample-data, folder Test/,
to the new bucket XT02-sample-data2 under the new path XT01-sample-data/Test1/.
The ask now is to modify the script to add a timestamp to the destination path, so that all files from one folder land under one timestamp.
We have the files below in the source bucket, in various folders.
Expected output: all files from one subfolder should be placed under one timestamp.
Not all files should share a single timestamp; each subfolder should get its own Unix timestamp at millisecond resolution.
For Files under folder Test1/Test1.1
For files under folder Test1/Test1.2
For files under folder Test1/Test1.3
For files under folder Test1/Test2/Test2.1

I was able to resolve this issue by recording an entry in a DynamoDB table.
As soon as the first file from a folder is encountered, an entry is made in the DynamoDB table.
When the next file from that folder is encountered, the DynamoDB table is checked before copying the data, and the same date (timestamp) already stored there is reused.
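The same "first file seen decides the timestamp" idea can be sketched without DynamoDB for a single run. This is only a minimal sketch, not the implementation described above: it keeps the folder-to-timestamp mapping in an in-memory dict, and it assumes (the expected layout is not shown in the question) that the timestamp becomes an extra path segment after the source folder. `build_timestamped_keys` and its `clock` parameter are hypothetical names.

```python
import posixpath
import time


def build_timestamped_keys(keys, new_prefix, clock=None):
    """Return {source_key: destination_key}, one timestamp per source folder."""
    if clock is None:
        # Default clock: current Unix time in milliseconds.
        clock = lambda: int(time.time() * 1000)
    folder_stamps = {}  # source folder -> timestamp chosen on first sight
    mapping = {}
    for key in keys:
        folder, filename = posixpath.split(key)
        if folder not in folder_stamps:
            stamp = clock()
            # Two folders seen within the same millisecond must not collide.
            while stamp in folder_stamps.values():
                stamp += 1
            folder_stamps[folder] = stamp
        # Assumed layout: <new_prefix>/<source folder>/<timestamp>/<file name>
        mapping[key] = posixpath.join(
            new_prefix, folder, str(folder_stamps[folder]), filename
        )
    return mapping
```

Each destination key could then be passed to `new_bucket.Object(new_key).copy(...)` as in the question's loop. Persisting the mapping (for example in DynamoDB) is only needed if the same timestamps must survive across separate runs.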


Reading multiple csv files in AWS Sagemaker from a location in Amazon S3 Bucket

I have multiple CSV files in a location in S3. The files are named in a date format, for example: 2021_09_30_Output.csv
I need to understand how to read the files in this folder while selecting only the dates that I require. An example would be reading only the files from September, i.e. "2022_09_*.csv", which would read only the files from that month.
I would appreciate the help. Thanks
You can create a function that will return all files from a particular date onwards using the datetime library based on the naming convention of your files. The following snippet can get you started:
import datetime
import posixpath
import boto3

s3 = boto3.resource('s3')
BUCKET_NAME = 'name'

def get_files_after(bucket, date):
    files = []
    for obj in s3.Bucket(bucket).objects.all():
        # The first 10 characters of the file name hold the date, e.g. '2021_09_30'
        file_date = posixpath.basename(obj.key)[:10]
        try:
            file_date = datetime.datetime.strptime(file_date, '%Y_%m_%d')
        except ValueError:
            continue  # skip keys whose name does not start with a date
        if file_date >= date:
            files.append(obj.get())
    return files

september_1 = datetime.datetime(2021, 9, 1)
files = get_files_after(BUCKET_NAME, september_1)
for file in files:
    contents = file['Body'].read()
    contents = contents.decode("utf-8")
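For the month-based selection in the question, a glob-style match on the file name may be simpler than parsing dates. A minimal sketch, assuming the keys have already been listed (for example with `objects.all()` as above); `select_keys` is a hypothetical helper name and the `data/` prefix is made up for illustration:

```python
import fnmatch
import posixpath


def select_keys(keys, pattern):
    """Keep the keys whose file name (last path segment) matches the glob pattern."""
    return [
        key for key in keys
        if fnmatch.fnmatchcase(posixpath.basename(key), pattern)
    ]


keys = [
    "data/2021_08_31_Output.csv",
    "data/2021_09_01_Output.csv",
    "data/2021_09_30_Output.csv",
]
print(select_keys(keys, "2021_09_*.csv"))
```

Matching on the base name keeps the pattern independent of whatever prefix the files sit under.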

Google Cloud Data Transfer to a GCS subfolder

I am trying to transfer data from an AWS S3 bucket (e.g. s3://mySrcBkt) to a GCS location (a folder under a bucket, such as gs://myDestBkt/myDestination). I could not find this option in the interface, which only lets me specify a bucket and not a subfolder. Nor did I find a similar provision in the storagetransfer API. Here is my code snippet:
String SOURCE_BUCKET = .... ;
String ACCESS_KEY = .....;
String SECRET_ACCESS_KEY = .....;
String DESTINATION_BUCKET = .......;
TransferJob transferJob =
new TransferJob()
new TransferSpec()
.setObjectConditions(new ObjectConditions()
.setTransferOptions(new TransferOptions()
new AwsS3Data()
new AwsAccessKey()
new GcsData()
new Schedule()
Unfortunately I could not find anywhere to specify the destination folder for this transfer. I know gsutil rsync offers something similar, but scale and data integrity are a concern. Can anyone point me to a way or workaround to achieve this?
Since only a bucket, and not a subdirectory, is available as the data transfer destination, the workaround for this scenario is to do the transfer to your bucket and then run an rsync operation between your bucket and the subdirectory. Keep in mind that you should first run gsutil -m rsync -r -d -n to verify what it will do: -n makes it a dry run, and because -d deletes objects in the destination that are not present in the source, you could otherwise delete data accidentally.
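If the rsync step is scripted, it may help to make the dry run the default so that the destructive form has to be requested explicitly. A minimal sketch in Python, assuming gsutil is installed and on the PATH; `build_rsync_command`, `run_rsync` and the staging bucket name `gs://myStagingBkt` are hypothetical, only the gsutil flags come from the answer above:

```python
import subprocess


def build_rsync_command(source, destination, dry_run=True):
    """Build a gsutil rsync command; -n (dry run) is added unless dry_run=False."""
    command = ["gsutil", "-m", "rsync", "-r", "-d"]
    if dry_run:
        command.append("-n")
    command += [source, destination]
    return command


def run_rsync(source, destination, dry_run=True):
    """Run the command and raise CalledProcessError if gsutil exits non-zero."""
    return subprocess.run(
        build_rsync_command(source, destination, dry_run), check=True
    )


# Dry run first; call run_rsync(..., dry_run=False) only after reviewing its output.
print(build_rsync_command("gs://myStagingBkt", "gs://myDestBkt/myDestination"))
```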

AWS S3 versioning - Choose which version S3 to restore

Currently, I'm using S3 versioning and I sync data to an S3 bucket daily. My question is: how can I restore a versioned bucket to a particular point in time? For example, I sync data to S3 from Monday to Saturday, and on Saturday I want to restore the whole folder as it was on Tuesday. How can I do this in the CLI?
We used this in production to clean up some files after s3-pit-restore and AWS support failed. This Python script permanently deletes all object versions last modified after a given time.
import os
from datetime import datetime
import boto3
bad_day =
s3 = boto3.resource(
key = ''
metadata = s3.meta.client.list_object_versions(Bucket=os.environ['AWS_BUCKET'], Prefix=key)
to_delete = []
for version in metadata['Versions']:
    if version['Size'] > 0:
        if version['LastModified'] > bad_day:
            to_delete.append({'Key': version['Key'], 'VersionId': version['VersionId']})
bucket = s3.Bucket(os.environ['AWS_BUCKET'])
# bucket.delete_objects(Delete={'Objects': to_delete})
Don't uncomment the last line until you are ready to delete.
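Two details are worth keeping in mind with the script above: boto3 returns `LastModified` as a timezone-aware datetime, so the cutoff has to be timezone-aware too, and both `list_object_versions` and `delete_objects` handle at most 1000 entries per call. A minimal sketch of the selection and batching logic, kept free of AWS calls so it can be checked locally; `versions_to_delete` and `chunked` are hypothetical helper names:

```python
from datetime import datetime, timezone


def versions_to_delete(versions, cutoff):
    """Pick non-empty object versions last modified after the cutoff."""
    return [
        {"Key": v["Key"], "VersionId": v["VersionId"]}
        for v in versions
        if v["Size"] > 0 and v["LastModified"] > cutoff
    ]


def chunked(items, size=1000):
    """Split items into lists of at most `size` entries (the delete_objects limit)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Example cutoff: a timezone-aware datetime, comparable with boto3's LastModified.
cutoff = datetime(2021, 9, 7, tzinfo=timezone.utc)
```

The `versions` list would come from the `Versions` entries of every page of `list_object_versions`, and each chunk would be passed to `bucket.delete_objects(Delete={'Objects': chunk})` once you are ready to delete.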

Unzip a file to s3

I am looking for a simple way to extract a zip/gzip file present in an S3 bucket to the same bucket location and delete the parent zip/gzip file after extraction.
I have been unable to achieve this with any of the APIs so far.
I have tried native boto, pyfilesystem (fs), and s3fs.
The source and destination links seem to be an issue for these functions.
(Using Python 2.x/3.x and Boto 2.x)
I see there is an API for Node.js (unzip-to-s3) to do this job, but none for Python.
A couple of implementations I can think of:
1. A simple API to extract the zip file within the same bucket.
2. Use S3 as a filesystem and manipulate the data.
3. Use a data pipeline to achieve this.
4. Transfer the zip to EC2, extract it, and copy it back to S3.
Option 4 would be the least preferred, to minimise the architecture overhead of an EC2 add-on.
I need support in implementing this feature, with integration into Lambda at a later stage. Any pointers to these implementations are greatly appreciated.
Thanks in Advance,
You could try that unzips/expands several different formats of archives from S3 into a destination in your bucket. I used it to unzip components of a digital catalog into S3.
I solved this by using an EC2 instance:
copy the S3 files to a local directory on the EC2 instance, extract them there,
and copy that directory back to the S3 bucket.
Sample to unzip to a local directory on the EC2 instance:
import os
import tarfile
import zipfile

def s3Unzip(srcBucket, dst_dir):
    '''
    function to decompress the s3 bucket contents to local machine
    srcBucket (string): source bucket name
    dst_dir (string): destination location in the local/ec2 local file system
    '''
    # s3 (a boto 2 connection) and logger_s3 are assumed to be defined at module level
    bucket = s3.lookup(srcBucket)
    try:
        os.makedirs(dst_dir)
        print ("local directories created")
    except Exception:
        logger_s3.warning ("Exception in creating local directories to extract zip file/ folder already existing")
    for key in bucket:
        path = os.path.join(dst_dir, key.name)
        key.get_contents_to_filename(path)  # download the archive before opening it
        if path.endswith('.zip'):
            opener, mode = zipfile.ZipFile, 'r'
        elif path.endswith('.tar.gz') or path.endswith('.tgz'):
            opener, mode = tarfile.open, 'r:gz'
        elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
            opener, mode = tarfile.open, 'r:bz2'
        else:
            raise ValueError ('unsupported format')
        try:
            file = opener(path, mode)
            try: file.extractall(dst_dir)
            finally: file.close()
            logger_s3.info('(%s) extracted successfully to %s'%(key ,dst_dir))
        except Exception as e:
            logger_s3.error('failed to extract (%s) to %s'%(key ,dst_dir))
sample code to upload to mysql instance
Use the "LOAD DATA LOCAL INFILE" query to upload to mysql directly
def upload(file_path, timeformat):
    '''
    function to upload a csv file data to mysql rds
    file_path (list): local file paths
    timeformat (string): str_to_date format of the datetime column
    '''
    # connect() and logger_rds are assumed to be defined at module level
    for file in file_path:
        con = connect()
        cursor = con.cursor()
        try:
            qry="""LOAD DATA LOCAL INFILE '%s' INTO TABLE xxxx FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (col1 , col2 ,col3, @datetime , col4 ) set datetime = str_to_date(@datetime,'%s');""" %(file,timeformat)
            cursor.execute(qry)
            con.commit()
            logger_rds.info ("Loading file:"+file)
        except Exception:
            logger_rds.error ("Exception in uploading "+file)
            ##Rollback in case there is any error
            con.rollback()
        finally:
            # disconnect from server
            con.close()
Lambda function:
You can use a Lambda function where you read zipped files into the buffer, gzip the individual files, and reupload them to S3. Then you can either archive the original files or delete them using boto.
You can also set an event-based trigger that runs the Lambda automatically every time there is a new zipped file in S3. Here's a full tutorial for the exact thing here:
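The Lambda approach can be sketched roughly as follows. This is a minimal sketch under several assumptions: it uses a boto3-style client (not the Boto 2.x mentioned in the question), handles only .zip archives, uploads the members uncompressed rather than gzipping them, and reads the whole archive into memory, so it only suits archives that fit in the function's memory. `extract_zip_members` and `unzip_in_bucket` are hypothetical names.

```python
import io
import zipfile


def extract_zip_members(zip_bytes):
    """Return {member_name: content_bytes} for every file inside a zip archive."""
    members = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            members[info.filename] = archive.read(info)
    return members


def unzip_in_bucket(s3_client, bucket, zip_key):
    """Extract zip_key next to itself in the bucket, then delete the archive."""
    body = s3_client.get_object(Bucket=bucket, Key=zip_key)["Body"].read()
    # Keep the extracted files under the same "folder" as the archive.
    folder = zip_key.rsplit("/", 1)[0] + "/" if "/" in zip_key else ""
    for name, content in extract_zip_members(body).items():
        s3_client.put_object(Bucket=bucket, Key=folder + name, Body=content)
    # Delete the archive only after every member has been uploaded.
    s3_client.delete_object(Bucket=bucket, Key=zip_key)
```

Inside a Lambda handler, `s3_client` would be `boto3.client('s3')`, and the bucket and key would be taken from the S3 event that triggered the function.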

What would cause a 'BotoServerError: 400 Bad Request' when calling create_application_version?

I have included some code that uploads a war file into an s3 bucket (creating the bucket first if it does not exist). It then creates an elastic beanstalk application version using the just-uploaded war file.
Assume /tmp/server_war exists and is a valid war file. The following code will fail with boto.exception.BotoServerError: BotoServerError: 400 Bad Request:
#!/usr/bin/env python
import time
import boto
BUCKET_NAME = 'foo_bar23498'
s3 = boto.connect_s3()
bucket = s3.lookup(BUCKET_NAME)
if not bucket:
    bucket = s3.create_bucket(BUCKET_NAME, location='')
version_label = 'server%s' % int(time.time())
# upload the war file
key_name = '%s.war' % version_label
s3key = bucket.new_key(key_name)
print 'uploading war file...'
s3key.set_contents_from_filename('/tmp/server_war',
    headers={'Content-Type' : 'application/x-zip'})
# uses us-east-1 by default
eb = boto.connect_beanstalk()
What would cause this?
One possible cause of this error is the bucket name. Apparently you can have s3 bucket names that contain underscores, but you cannot create application versions using keys in those buckets.
If you change the fourth line above to
BUCKET_NAME = 'foo-bar23498'
it should work.
Yes, it feels weird to be answering my own question... apparently this is the recommended approach for this situation on Stack Overflow. I hope I save someone else a whole lot of debugging time.
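To catch this kind of name before any upload happens, a small check against S3's DNS-style bucket naming rules (3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit) can be run first. A minimal sketch; `is_dns_compliant_bucket_name` is a hypothetical helper, and whether Elastic Beanstalk enforces exactly these rules is an assumption based on the behaviour described above:

```python
import re

# DNS-style bucket name: 3-63 characters, lowercase letters, digits, hyphens
# and dots, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_dns_compliant_bucket_name(name):
    """Return True if the name follows the DNS-style bucket naming rules."""
    return bool(_BUCKET_NAME.match(name)) and ".." not in name


for candidate in ("foo_bar23498", "foo-bar23498"):
    print(candidate, is_dns_compliant_bucket_name(candidate))
```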