I would like to rename all files in my Amazon S3 bucket with the extension .PDF to .pdf (lowercase).
Has anyone had to do this before? There are a lot of files (around 1500). Is s3cmd the best way to do this? How would you do it?
s3cmd --recursive ls s3://bucketname |
awk '{ print $4 }' | grep '\.PDF$' | while read -r line ; do
    s3cmd mv "$line" "${line%.PDF}.pdf"
done
A local Linux/Unix example for renaming all files with the .pdf extension to .PDF:
mkdir pdf-test
cd pdf-test
touch a{1..10}.pdf
Before
ls
a1.pdf a2.pdf a4.pdf a6.pdf a8.pdf grep.sh
a10.pdf a3.pdf a5.pdf a7.pdf a9.pdf
The script file grep.sh
#!/bin/bash
ls | grep '\.pdf$' | while read -r line ; do   # here, use ls from s3
    echo "Processing $line"
    # your s3 code goes here
    mv "$line" "${line%.*}.PDF"
done
Add execute permission and run it:
chmod u+x grep.sh
./grep.sh
After
ls
a1.PDF a2.PDF a4.PDF a6.PDF a8.PDF grep.sh
a10.PDF a3.PDF a5.PDF a7.PDF a9.PDF
You can apply the same logic; instead of mv, use s3cmd mv. A sketch of that is below.
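As a rough, untested sketch of the same thing against S3 (assuming a bucket named mybucket; the logic mirrors grep.sh above):
#!/bin/bash
# list all objects, keep the S3 URI column, pick keys ending in .pdf
s3cmd --recursive ls s3://mybucket | awk '{ print $4 }' | grep '\.pdf$' | while read -r line ; do
    echo "Processing $line"
    # rename in place: s3cmd mv copies to the new key and removes the old one
    s3cmd mv "$line" "${line%.pdf}.PDF"
done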
Is there a way to copy files to an S3 bucket while preserving the file path?
Here is an example:
1. I produce a list of files that are different in bucket1 than in bucket2 using s3cmd sync --dry-run.
The list looks like this:
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/app-store/.content.xml
2. I need to process this list to upload only the files in it to a new location in the bucket (e.g. s3://bucket/diff/), but with the full path preserved as shown in the list.
A simple loop like this:
diff_file_list=$(s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV | awk '{print $2}')
for f in $diff_file_list; do
s3cmd -c s3cfg cp $f s3://BUCKET/20150831/DIFF/
done
does not work; it produces this:
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd copied to s3://BUCKET/20150831/DIFF/nodetypes.cnd
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml copied to s3://BUCKET/20150831/DIFF/properties.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/origin-store/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
Thanks,
Short answer: no, it is not! That is because paths in S3 buckets are not actually directories/folders; S3 has no such concept of structure, even though various tools present it that way (including s3cmd, which is really confusing...).
So the "path" is actually a prefix (although s3cmd sync to local knows how to translate this prefix into a directory structure on your filesystem).
For a bash script the solution is:
1. Create a file listing all the paths from an s3cmd sync --dry-run command (basically a list of diffs) => file1
2. Copy that file and use sed to modify the paths as needed:
sed 's/\(^s3.*\)PROD/\1DIFF/' file1 > file2
3. Merge the files so that line 1 in file1 is followed by line 1 in file2, and so on:
paste file1 file2 > final.txt
4. Read final.txt line by line in a loop and use each line as a set of two parameters to a copy or sync command:
while IFS='' read -r line || [[ -n "$line" ]]; do
s3cmd -c s3cfg sync $line
done < "final.txt"
Notes:
1. $line in the s3cmd command must not be in quotes; if it is, the sync command will complain that it received only one parameter... of course!
2. the [[ -n "$line" ]] is there so that read does not fail if the last line has no trailing newline character
Unfortunately boto could not help much more, so if you need something similar in Python you would do it pretty much the same way.
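Putting those steps together, a minimal untested sketch of the whole thing (assuming the same BUCKET/PROD/DIFF layout as above):
#!/bin/bash
# 1. list the source paths that differ (the list of diffs)
s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV | awk '{ print $2 }' > file1
# 2. rewrite the prefix so each path points at the DIFF "folder"
sed 's/\(^s3.*\)PROD/\1DIFF/' file1 > file2
# 3. pair each source with its destination on the same line
paste file1 file2 > final.txt
# 4. copy each pair; $line is deliberately unquoted so it splits into two arguments
while IFS='' read -r line || [[ -n "$line" ]]; do
    s3cmd -c s3cfg sync $line
done < final.txt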
I'm using s3cmd to back up my databases to Amazon S3, but I'd also like to back up a certain folder as an archive.
I have this part of a script that successfully backs up the databases to S3:
# Loop the databases
for db in $databases; do
# Define our filenames
filename="$stamp - $db.sql.gz"
tmpfile="/tmp/$filename"
object="$bucket/$stamp/$filename"
# Feedback
echo -e "\e[1;34m$db\e[00m"
# Dump and zip
echo -e " creating \e[0;35m$tmpfile\e[00m"
mysqldump -u root -p$mysqlpass --force --opt --databases "$db" | gzip -c > "$tmpfile"
# Upload
echo -e " uploading..."
s3cmd put "$tmpfile" "$object"
# Delete
rm -f "$tmpfile"
done;
How can I add another section to archive a certain folder, upload to S3 and then delete the local archive?
Untested and basic, but this should get the job done with some minor tweaks:
# change to tmp dir - creating archives with absolute paths can be dangerous
cd /tmp
# create archive with timestamp of dir /path/to/directory/to/archive
tar -czf "$stamp-archivename.tar.gz" /path/to/directory/to/archive
# upload archive to s3 bucket 'BucketName'
s3cmd put "/tmp/$stamp-archivename.tar.gz" s3://BucketName/
# remove local archive
rm -f "/tmp/$stamp-archivename.tar.gz"
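If you want the archive to land next to the database dumps, you can reuse the $stamp and $bucket variables from the script above; a minimal untested sketch (the folder path and archive name are placeholders):
# Archive a folder
folder="/path/to/directory/to/archive"
archive="/tmp/$stamp-folder-backup.tar.gz"
echo -e " archiving \e[0;35m$folder\e[00m"
# -C avoids storing absolute paths in the archive
tar -czf "$archive" -C "$(dirname "$folder")" "$(basename "$folder")"
# Upload next to the database dumps, then delete the local copy
s3cmd put "$archive" "$bucket/$stamp/$(basename "$archive")"
rm -f "$archive"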
I was using S3Fox and ended up creating lots of _$folder$ files in multiple S3 directories. I want to clean them all up, but the files are visible neither through the command line tool nor through S3Fox; they are only visible through the AWS S3 console.
I am looking for a solution, something like:
hadoop fs -rmr s3://s3_bucket/dir1/dir2/dir3///*_\$folder\$
You can use s3cmd (http://s3tools.org/s3cmd) and the power of the shell:
s3cmd del $(s3cmd ls --recursive s3://your-bucket | grep '_\$folder\$' | awk '{ print $4 }')
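If you'd rather preview what would be deleted before actually deleting anything, you can split it into two steps (untested, same placeholder bucket name):
# preview the matching keys first
s3cmd ls --recursive s3://your-bucket | grep '_\$folder\$' | awk '{ print $4 }' > /tmp/folder-markers
cat /tmp/folder-markers
# then delete them in batches
xargs -n 100 s3cmd del < /tmp/folder-markers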
I am trying to use the s3cmd tool to invalidate my files.
It seems s3cmd automatically chooses a distribution for me, but I have more than one distribution for the same bucket. How can I choose which distribution to invalidate?
I have tried this:
s3cmd sync --cf-invalidate myfile cf://XXXXXXXXXX/mypath
but it does not work. I get this:
Invalid source/destination
Any idea?
Thanks!
I believe you'd be looking to force an invalidation via the origin (i.e. your S3 bucket or similar), like so:
s3cmd sync --cf-invalidate _site/ s3://example-origin.com/
Here is my final conclusion: after many, many tries, the solution turned out to be very strict. Follow this format:
s3cmd sync --cf-invalidate --acl-public --preserve --recursive ./[local_folder] s3://[my_bucket]/[remote_folder]/
When running this command, the local folder must be inside the directory you run the command from, the local folder must be prefixed with ./, and the remote folder must end with / (see the example below).
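For example, with a hypothetical local folder _site, bucket my_bucket, and remote folder blog, run from the directory that contains _site:
# local path starts with ./ and remote folder ends with /
cd /path/to/site/parent
s3cmd sync --cf-invalidate --acl-public --preserve --recursive ./_site/ s3://my_bucket/blog/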
I could not make s3cmd's invalidation work, so I used s3cmd to update the file and cloudfront-invalidator to do the invalidation. The script reads the AWS credentials used by s3cmd and passes them to cloudfront-invalidator.
#!/bin/bash
if [ -z "$(which s3cmd)" ]; then
echo "s3cmd is not installed or is not on the PATH"
exit -1
fi
if [ -z "$(which cloudfront-invalidator)" ]; then
echo "cloudfront-invalidator is not installed or is not on the PATH"
echo "See https://github.com/reidiculous/cloudfront-invalidator"
echo "TL;DR: sudo gem install cloudfront-invalidator"
exit -1
fi
function awsKeyId {
awk -F '=' '{if (! ($0 ~ /^;/) && $0 ~ /aws_access_key_id/) print $2}' ~/.aws/config | tr -d ' '
}
function awsSecret {
awk -F '=' '{if (! ($0 ~ /^;/) && $0 ~ /aws_secret_access_key/) print $2}' ~/.aws/config | tr -d ' '
}
export file="stylesheets/main.css"
export distributionId=blahblah
export bucket=www.blahblah
s3cmd -P -m 'text/css' put public/$file s3://$bucket/$file
cloudfront-invalidator invalidate `awsKeyId` `awsSecret` $distributionId $file
I have an S3 bucket with thousands of folders and many txt files inside those folders.
I would like to list all txt files inside the bucket so I can check if they're removable. Then remove them if they are.
Any idea how to do this with s3cmd?
This is fairly simple, but depends on how sophisticated you want the check to be. Suppose you wanted to remove every text file whose filename includes 'foo':
s3cmd --recursive ls s3://mybucket |
awk '{ print $4 }' | grep '\.txt$' | grep 'foo' | xargs s3cmd del
If you want a more sophisticated check than grep can handle, just redirect the first three commands to a file, then either manually edit the file or use awk or perl or whatever your favorite tool is, then cat the output into s3cmd (depending on the check, you could do it all with piping, too):
s3cmd --recursive ls s3://mybucket | awk '{ print $4 }' | grep '\.txt$' > /tmp/textfiles
magic-command-to-check-filenames /tmp/textfiles
cat /tmp/textfiles | xargs s3cmd del