Getting Origami-pdf to work with Amazon S3 files

Getting Origami-pdf to work with Amazon S3 files - pdf

I've implemented a local script to insert digital signatures into local pdf files recurring to Origami, but don't quite know what would be the best approach to do this within a rails server, and with amazon s3 stored files.
I am guessing i would need to download the file from s3 to my server (or capture it before uploading to amazon, which is what i am doing with paperclip) insert the signature, and sent it back to s3 again.
Here is the PDF.read method in pdf.rb file of origami solution:
class << self
#
# Reads and parses a PDF file from disk.
#
def read(filename, options = {})
filename = File.expand_path(filename) if filename.is_a?(::String)
PDF::LinearParser.new(options).parse(filename)
end
How could i adapt this so that i treat an in-memory binary file?
Do you have any suggestions?
You can find more about origami here
And my code below
require 'openssl'
begin
require 'origami'
rescue LoadError
ORIGAMIDIR = "C:\RailsInstaller\Ruby1.9.3\lib\ruby\gems\1.9.1\gems\origami-1.2.4\lib"
$: << ORIGAMIDIR
require 'origami'
end
include Origami
INPUTFILE = "Sample.pdf"
#inputfile = String.new(INPUTFILE)
OUTPUTFILE = #inputfile.insert(INPUTFILE.rindex("."),"_signed")
CERTFILE = "certificate.pem"
RSAKEYFILE = "private_key.pem"
passphrase = "your passphrase"
key4pem=File.read RSAKEYFILE
key = OpenSSL::PKey::RSA.new key4pem, passphrase
cert = OpenSSL::X509::Certificate.new(File.read CERTFILE)
pdf = PDF.read(INPUTFILE)
page = pdf.get_page(1)
# Add signature annotation (so it becomes visibles in pdf document)
sigannot = Annotation::Widget::Signature.new
sigannot.Rect = Rectangle[:llx => 89.0, :lly => 386.0, :urx => 190.0, :ury => 353.0]
page.add_annot(sigannot)
# Sign the PDF with the specified keys
pdf.sign(cert, key,
:method => 'adbe.pkcs7.sha1',
:annotation => sigannot,
:location => "Portugal",
:contact => "myemail#email.tt",
:reason => "Proof of Concept"
)
# Save the resulting file
pdf.save(OUTPUTFILE)

PDF.read and PDF.save methods both accept either a file path or a Ruby IO object.
One method to create a PDF instance from a string (which, I suppose, is what you mean when you say "in-memory") is to use a StringIO object.
For example, the following session in the Origami shell will create a PDF instance, save it to a StringIO object and reload it using its own output string.
>>> PDF.new.save(strio = StringIO.new)
...
>>> strio.string
"%PDF-1.0\r\n1 0 obj\r\n<<\r\n\t/Pages 2 0 R ..."
>>> strio.reopen(strio.string, 'r')
#<StringIO:0xffbea6cc>
>>> pdf = PDF.read(strio)
...
>>> pdf.class
Origami::PDF

After a deep further analysis into the origami code, i noticed that PDF.Read accepts a binary file, and so instead of sending the local file path, we can send the file instance as a whole.

As #MrWater wrote, Origami::PDF.read accepts a stream (more precisely, the Origami::PDF::LinearParser does, look at the source here).
Here's my simple solution:
require 'open-uri'
# pdf_url = 'http://someurl.com/somepdf.pdf'
pdf = Origami::PDF.read(URI.parse(pdf_url))
References
open-uri URI
Origami::PDF::LinearParser

Related

Use Shrine to upload video files and generate thumbnails

I want to use the Shrine gem to upload video files, transcode, & generate thumbnails from the video.
I am trying to convert Erik Dahlstrand's Shrine-Rails-example from photos to videos. I am having trouble creating the video uploader. I based this code on Video isn't of allowed type (allowed types: video/mp4), Shrine, Rails
require "streamio-ffmpeg"
class VideoUploader < Shrine
ALLOWED_TYPES = %w[video/mp4 video/quicktime video/x-msvideo video/mpeg]
plugin :processing
plugin :versions
plugin :determine_mime_type
plugin :cached_attachment_data
plugin :remove_attachment
plugin :add_metadata
plugin :validation_helpers
plugin :derivation_endpoint, prefix: "derivations/video"
add_metadata do |io, context|
movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }
{ "duration" => movie.duration,
"bitrate" => movie.bitrate,
"resolution" => movie.resolution,
"frame_rate" => movie.frame_rate }
end
movie.screenshot("video_thumb_007.jpg", seek_time: 5, resolution: '320x240')
metadata_method :duration
Attacher.validate do
validate_max_size 100.megabyte, message: "is too large (max is 100 MB)"
validate_mime_type_inclusion ALLOWED_TYPES
end
end
I get this error:
/var/folders/mm/_j8x4k2176jcv31zvbc497_c0000gp
/T/shrine20190607-24438-4f3jz2.m4v: No such file or directory
and in fact, there is no file in that location. So where is the file stored while waiting for upload??
Also, using the demo to upload photos to AWS (production env), the objects are stored in the bucket in a folder called 'photos'. Shrine uses the table name to name the folder apparently. Is possible to create alternative & nested folder names?
Thank you - seems like an amazing gem! Trying to understand it better!
Thx

Further delving into the documentation, I was able to solve this problem with only one remaining question - is it possible to control which folders in the AWS bucket the original and thumbs are uploaded to? Thx
Solution:
require "streamio-ffmpeg"
require "tempfile"
class VideoUploader < ImageUploader
plugin :add_metadata
plugin :validation_helpers
plugin :processing
plugin :versions
plugin :delete_raw
add_metadata do |io, context|
movie = Shrine.with_file(io) { |file| FFMPEG::Movie.new(file.path) }
{ "duration" => movie.duration,
"bitrate" => movie.bitrate,
"resolution" => movie.resolution,
"frame_rate" => movie.frame_rate
}
end
process(:store) do |io, context|
versions = {original: io}
io.download do |original|
screenshot1 = Tempfile.new(["screenshot1", ".jpg"], binmode: true)
screenshot2 = Tempfile.new(["screenshot2", ".jpg"], binmode: true)
screenshot3 = Tempfile.new(["screenshot3", ".jpg"], binmode: true)
screenshot4 = Tempfile.new(["screenshot4", ".jpg"], binmode: true)
movie = FFMPEG::Movie.new(original.path)
movie.screenshot(screenshot1.path, seek_time: 5, resolution: '640x480')
movie.screenshot(screenshot2.path, seek_time: 10, resolution: '640x480')
movie.screenshot(screenshot3.path, seek_time: 15, resolution: '640x480')
movie.screenshot(screenshot4.path, seek_time: 20, resolution: '640x480')
[screenshot1, screenshot2, screenshot3, screenshot4].each(&:open) # refresh file descriptors
versions.merge!(screenshot1: screenshot1, screenshot2: screenshot2, screenshot3: screenshot3, screenshot4: screenshot4)
end
versions
end
Attacher.validate do
validate_max_size 100.megabyte, message: "is too large (max is 100 MB)"
validate_mime_type_inclusion ALLOWED_TYPES
end
end

How to convert raw emails (MIME) from AWS SES to Gmail?

I have a gmail account linked to my domain account.
AWS SES will send messages to my S3 bucket. From there, SNS will forward the message in a raw format to my gmail address.
How do I automatically convert the raw message into a standard email format?

The raw message is in the standard email format. I think what you want to know is how to parse that standard raw email into an object that you can manipulate so that you can forward it to yourself and have it look like a standard email. AWS provides a tutorial on how to forward emails with a lambda function, through SES, by first storing them in your S3 bucket: https://aws.amazon.com/blogs/messaging-and-targeting/forward-incoming-email-to-an-external-destination/
If you follow those instructions, you'll find that the email you recieve comes as an attachment, not looking like a standard email. The following code is an alteration of the Python code provided by AWS that does what you're looking for (substitute this for the code provided in the tutorial):
# Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Altered from original by Adam Winter
#
# This file is licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License. A copy of the
# License is located at
#
# http://aws.amazon.com/apache2.0/
#
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
# OF ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
import os
import boto3
import email
import re
import html
from botocore.exceptions import ClientError
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
region = os.environ['Region']
def get_message_from_s3(message_id):
incoming_email_bucket = os.environ['MailS3Bucket']
incoming_email_prefix = os.environ['MailS3Prefix']
if incoming_email_prefix:
object_path = (incoming_email_prefix + "/" + message_id)
else:
object_path = message_id
object_http_path = (f"http://s3.console.aws.amazon.com/s3/object/{incoming_email_bucket}/{object_path}?region={region}")
# Create a new S3 client.
client_s3 = boto3.client("s3")
# Get the email object from the S3 bucket.
object_s3 = client_s3.get_object(Bucket=incoming_email_bucket,
Key=object_path)
# Read the content of the message.
file = object_s3['Body'].read()
file_dict = {
"file": file,
"path": object_http_path
}
return file_dict
def create_message(file_dict):
stringMsg = file_dict['file'].decode('utf-8')
# Create a MIME container.
msg = MIMEMultipart('alternative')
sender = os.environ['MailSender']
recipient = os.environ['MailRecipient']
# Parse the email body.
mailobject = email.message_from_string(file_dict['file'].decode('utf-8'))
#print(mailobject.as_string())
# Get original sender for reply-to
from_original = mailobject['Return-Path']
from_original = from_original.replace('<', '');
from_original = from_original.replace('>', '');
print(from_original)
# Create a new subject line.
subject = mailobject['Subject']
print(subject)
if mailobject.is_multipart():
index = stringMsg.find('Content-Type: multipart/')
stringBody = stringMsg[index:]
#print(stringBody)
stringData = 'Subject: ' + subject + '\nTo: ' + sender + '\nreply-to: ' + from_original + '\n' + stringBody
message = {
"Source": sender,
"Destinations": recipient,
"Data": stringData
}
return message
for part in mailobject.walk():
ctype = part.get_content_type()
cdispo = str(part.get('Content-Disposition'))
# case for each common content type
if ctype == 'text/plain' and 'attachment' not in cdispo:
bodyPart = MIMEText(part.get_payload(decode=True), 'plain', part.get_content_charset())
msg.attach(bodyPart)
if ctype == 'text/html' and 'attachment' not in cdispo:
mt = MIMEText(part.get_payload(decode=True), 'html', part.get_content_charset())
email.encoders.encode_quopri(mt)
del mt['Content-Transfer-Encoding']
mt.add_header('Content-Transfer-Encoding', 'quoted-printable')
msg.attach(mt)
if 'attachment' in cdispo and 'image' in ctype:
mi = MIMEImage(part.get_payload(decode=True), ctype.replace('image/', ''))
del mi['Content-Type']
del mi['Content-Disposition']
mi.add_header('Content-Type', ctype)
mi.add_header('Content-Disposition', cdispo)
msg.attach(mi)
if 'attachment' in cdispo and 'application' in ctype:
ma = MIMEApplication(part.get_payload(decode=True), ctype.replace('application/', ''))
del ma['Content-Type']
del ma['Content-Disposition']
ma.add_header('Content-Type', ctype)
ma.add_header('Content-Disposition', cdispo)
msg.attach(ma)
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
body = MIMEText(mailobject.get_payload(decode=True), 'UTF-8')
msg.attach(body)
# The file name to use for the attached message. Uses regex to remove all
# non-alphanumeric characters, and appends a file extension.
filename = re.sub('[^0-9a-zA-Z]+', '_', subject_original)
# Add subject, from and to lines.
msg['Subject'] = subject
msg['From'] = sender
msg['To'] = recipient
msg['reply-to'] = mailobject['Return-Path']
# Create a new MIME object.
att = MIMEApplication(file_dict["file"], filename)
att.add_header("Content-Disposition", 'attachment', filename=filename)
# Attach the file object to the message.
msg.attach(att)
message = {
"Source": sender,
"Destinations": recipient,
"Data": msg.as_string()
}
return message
def send_email(message):
aws_region = os.environ['Region']
# Create a new SES client.
client_ses = boto3.client('ses', region)
# Send the email.
try:
#Provide the contents of the email.
response = client_ses.send_raw_email(
Source=message['Source'],
Destinations=[
message['Destinations']
],
RawMessage={
'Data':message['Data']
}
)
# Display an error if something goes wrong.
except ClientError as e:
print('send email ClientError Exception')
output = e.response['Error']['Message']
else:
output = "Email sent! Message ID: " + response['MessageId']
return output
def lambda_handler(event, context):
# Get the unique ID of the message. This corresponds to the name of the file
# in S3.
message_id = event['Records'][0]['ses']['mail']['messageId']
print(f"Received message ID {message_id}")
# Retrieve the file from the S3 bucket.
file_dict = get_message_from_s3(message_id)
# Create the message.
message = create_message(file_dict)
# Send the email and print the result.
result = send_email(message)
print(result)

For those getting this error:
'bytes' object has no attribute 'encode'
In this line:
body = MIMEText(mailobject.get_payload(decode=True), 'UTF-8')
I could make it work. I am not an expert on this so the code might need some improvement. Also the email body includes html tags. But at least it got delivered.
If decoding the email still fails the error message will appear in your CloudWatch log. Also you will receive an email with the error message.
payload = mailobject.get_payload(decode=True)
try:
decodedPayload = payload.decode()
body = MIMEText(decodedPayload, 'UTF-8')
msg.attach(body)
except Exception as error:
errorMsg = "An error occured when decoding the email payload:\n" + str(error)
print(errorMsg)
body = errorMsg + "\nPlease download it manually from the S3 bucket."
msg.attach(MIMEText(body, 'plain'))
It is up to you which information you want to add to the error email like the subject or the from address.
Just another hint: With the above code you will get an error because subject_original is undefinded. Just delete the following lines.
# The file name to use for the attached message. Uses regex to remove all
# non-alphanumeric characters, and appends a file extension.
filename = re.sub('[^0-9a-zA-Z]+', '_', subject_original)
# Create a new MIME object.
att = MIMEApplication(file_dict["file"], filename)
att.add_header("Content-Disposition", 'attachment', filename=filename)
# Attach the file object to the message.
msg.attach(att)
As far as I understand this code is supposed to add the original email as an attachment which was not what I wanted.

create a (Prawn) PDF inside custom DelayedJob and upload it to S3?

Using: Rails 4.2, Prawn, Paperclip, DelayedJobs via ActiveJobs, Heroku.
I have a PDF that is very large and needs to be handled in the background. Inside a custom Job I want to create it, upload it to S3, and then email the user with a url when its ready. I facilitate this via a PdfUpload model.
Is there anything wrong with my approach/code? Im using File.open() as outlined in examples I found, but this seems to be the root of my error ( TypeError: no implicit conversion of FlightsWithGradesReport into String ).
class PdfUpload < ActiveRecord::Base
has_attached_file :report,
path: "schools/:school/pdf_reports/:id_:style.:extension"
end
/pages_controller.rb
def flights_with_grades_report
flash[:success] = "The report you requested is being generated. An email will be sent to '#{ current_user.email }' when it is ready."
GenerateFlightsWithGradesReportJob.perform_later(current_user.id, #rating.id)
redirect_to :back
authorize #rating, :reports?
end
/ the job
class GenerateFlightsWithGradesReportJob < ActiveJob::Base
queue_as :generate_pdf
def perform(recipient_user_id, rating_id)
rating = Rating.find(rating_id)
pdf = FlightsWithGradesReport.new( rating.id )
pdf_upload = PdfUpload.new
pdf_upload.report = File.open( pdf )
pdf_upload.report_processing = true
pdf_upload.report_file_name = "report.pdf"
pdf_upload.report_content_type = "application/pdf"
pdf_upload.save!
PdfMailer.pdf_ready(recipient_user_id, pdf_upload.id)
end
end
This results in an error:
TypeError: no implicit conversion of FlightsWithGradesReport into String

Changing this:
pdf_upload.report = File.open( pdf )
to this:
pdf_upload.report = StringIO.new(pdf.render)
fixed my problem.

How do I export PSDs as PNGs with py-appscript?

I wrote a script to export PSDs as PNGs with rb-appscript. That's fine and dandy, but I can't seem to pull it off in py-appscript.
Here's the ruby code:
#!/usr/bin/env ruby
require 'rubygems'
require 'optparse'
require 'appscript'
ps = Appscript.app('Adobe Photoshop CS5.app')
finder = Appscript.app("Finder.app")
path = "/Users/nbaker/Desktop/"
ps.activate
ps.open(MacTypes::Alias.path("#{path}guy.psd"))
layerSets = ps.current_document.layer_sets.get
# iterate through all layers and hide them
ps.current_document.layers.get.each do |layer|
layer.visible.set false
end
layerSets.each do |layerSet|
layerSet.visible.set false
end
# iterate through all layerSets, make them visible, and create a PNG for them
layerSets.each do |layerSet|
name = layerSet.name.get
layerSet.visible.set true
ps.current_document.get.export(:in => "#{path}#{name}.png", :as => :save_for_web,
:with_options => {:web_format => :PNG, :png_eight => false})
layerSet.visible.set false
end
And here's the apparently nonequivalent python code:
from appscript import *
from mactypes import *
# fire up photoshop
ps = app("Adobe Photoshop CS5.app")
ps.activate()
# open the file for editing
path = "/Users/nbaker/Desktop/"
f = Alias(path + "guy.psd")
ps.open(f)
layerSets = ps.current_document.layer_sets()
# iterate through all layers and hide them
for layer in ps.current_document.layers():
layer.visible.set(False)
#... and layerSets
for layerSet in layerSets:
layerSet.visible.set(False)
# iterate through all layerSets, make them visible, and create a PNG for them
for layerSet in layerSets:
name = layerSet.name()
layerSet.Visible = True
ps.current_document.get().export(in_=Alias(path + name + ".png"), as_=k.save_for_web,
with_options={"web_format":k.PNG, "png_eight":False})
The only part of the python script that doesn't work is the saving. I've gotten all sorts of errors trying different export options and stuff:
Connection is invalid... General Photoshop error occurred. This functionality may not be available in this version of Photoshop... Can't make some data into the expected type...
I can live with just the ruby script, but all of our other scripts are in python, so it would be nice to be able to pull this off in python. Thanks in advance internet.

Bert,
I think I have a solution for you – albeit several months later. Note that I'm a Python noob, so this is by no means elegant.
When I tried out your script, it kept crashing for me too. However by removing the "Alias" in the export call, it seems to be OK:
from appscript import *
from mactypes import *
# fire up photoshop
ps = app("Adobe Photoshop CS5.1.app")
ps.activate()
# open the file for editing
path = "/Users/ian/Desktop/"
f = Alias(path + "Screen Shot 2011-10-13 at 11.48.51 AM.psd")
ps.open(f)
layerSets = ps.current_document.layer_sets()
# no need to iterate
ps.current_document.layers.visible.set(False)
ps.current_document.layer_sets.visible.set(False)
# iterate through all layerSets, make them visible, and create a PNG for them
for l in layerSets:
name = l.name()
# print l.name()
l.visible.set(True)
# make its layers visible too
l.layers.visible.set(True)
# f = open(path + name + ".png", "w").close()
ps.current_document.get().export(in_=(path + name + ".png"), as_=k.save_for_web,
with_options={"web_format":k.PNG, "png_eight":False})
I noticed, too, that you iterate through all the layers to toggle their visibility. You can actually just issue them all a command at once – my understanding is that it's faster, since it doesn't have to keep issuing Apple Events.
The logic of my script isn't correct (with regards to turning layers on and off), but you get the idea.
Ian

Delete Amazon S3 buckets? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I've been interacting with Amazon S3 through S3Fox and I can't seem to delete my buckets. I select a bucket, hit delete, confirm the delete in a popup, and... nothing happens. Is there another tool that I should use?

It is finally possible to delete all the files in one go using the new Lifecycle (expiration) rules feature. You can even do it from the AWS console.
Simply right click on the bucket name in AWS console, select "Properties" and then in the row of tabs at the bottom of the page select "lifecycle" and "add rule". Create a lifecycle rule with the "Prefix" field set blank (blank means all files in the bucket, or you could set it to "a" to delete all files whose names begin with "a"). Set the "Days" field to "1". That's it. Done. Assuming the files are more than one day old they should all get deleted, then you can delete the bucket.
I only just tried this for the first time so I'm still waiting to see how quickly the files get deleted (it wasn't instant but presumably should happen within 24 hours) and whether I get billed for one delete command or 50 million delete commands... fingers crossed!

Remeber that S3 Buckets need to be empty before they can be deleted. The good news is that most 3rd party tools automate this process. If you are running into problems with S3Fox, I recommend trying S3FM for GUI or S3Sync for command line. Amazon has a great article describing how to use S3Sync. After setting up your variables, the key command is
./s3cmd.rb deleteall <your bucket name>
Deleting buckets with lots of individual files tends to crash a lot of S3 tools because they try to display a list of all files in the directory. You need to find a way to delete in batches. The best GUI tool I've found for this purpose is Bucket Explorer. It deletes files in a S3 bucket in 1000 file chunks and does not crash when trying to open large buckets like s3Fox and S3FM.
I've also found a few scripts that you can use for this purpose. I haven't tried these scripts yet but they look pretty straightforward.
RUBY
require 'aws/s3'
AWS::S3::Base.establish_connection!(
:access_key_id => 'your access key',
:secret_access_key => 'your secret key'
)
bucket = AWS::S3::Bucket.find('the bucket name')
while(!bucket.empty?)
begin
puts "Deleting objects in bucket"
bucket.objects.each do |object|
object.delete
puts "There are #{bucket.objects.size} objects left in the bucket"
end
puts "Done deleting objects"
rescue SocketError
puts "Had socket error"
end
end
PERL
#!/usr/bin/perl
use Net::Amazon::S3;
my $aws_access_key_id = 'your access key';
my $aws_secret_access_key = 'your secret access key';
my $increment = 50; # 50 at a time
my $bucket_name = 'bucket_name';
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id, aws_secret_access_key => $aws_secret_access_key, retry => 1, });
my $bucket = $s3->bucket($bucket_name);
print "Incrementally deleting the contents of $bucket_name\n";
my $deleted = 1;
my $total_deleted = 0;
while ($deleted > 0) {
print "Loading up to $increment keys...\n";
$response = $bucket->list({'max-keys' => $increment, }) or die $s3->err . ": " . $s3->errstr . "\n";
$deleted = scalar(#{ $response->{keys} }) ;
$total_deleted += $deleted;
print "Deleting $deleted keys($total_deleted total)...\n";
foreach my $key ( #{ $response->{keys} } ) {
my $key_name = $key->{key};
$bucket->delete_key($key->{key}) or die $s3->err . ": " . $s3->errstr . "\n";
}
}
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
SOURCE: Tarkblog
Hope this helps!

recent versions of s3cmd have --recursive
e.g.,
~/$ s3cmd rb --recursive s3://bucketwithfiles
http://s3tools.org/kb/item5.htm

With s3cmd:
Create a new empty directory
s3cmd sync --delete-removed empty_directory s3://yourbucket

This may be a bug in S3Fox, because it is generally able to delete items recursively. However, I'm not sure if I've ever tried to delete a whole bucket and its contents at once.
The JetS3t project, as mentioned by Stu, includes a Java GUI applet you can easily run in a browser to manage your S3 buckets: Cockpit. It has both strengths and weaknesses compared to S3Fox, but there's a good chance it will help you deal with your troublesome bucket. Though it will require you to delete the objects first, then the bucket.
Disclaimer: I'm the author of JetS3t and Cockpit

SpaceBlock also makes it simple to delete s3 buckets - right click bucket, delete, wait for job to complete in transfers view, done.
This is the free and open source windows s3 front-end that I maintain, so shameless plug alert etc.

I've implemented bucket-destroy, a multi threaded utility that does everything it takes to delete a bucket. I handle non empty buckets, as well as version enabled bucket keys.
You can read the blog post here http://bytecoded.blogspot.com/2011/01/recursive-delete-utility-for-version.html and the instructions here http://code.google.com/p/bucket-destroy/
I've successfully deleted with it a bucket that contains double '//' in the key name, versioned key and DeleteMarker keys. Currently I'm running it on a bucket that contains ~40,000,000 so far I've been able to delete 1,200,000 in several hours on m1.large. Note that the utility is multi threaded but does not (yet) implemented shuffling (which will horizontal scaling, launching the utility on several machines).

If you use amazon's console and on a one-time basis need to clear out a bucket: You can browse to your bucket then select the top key then scroll to the bottom and then press shift on your keyboard then click on the bottom one. It will select all in between then you can right click and delete.

If you have ruby (and rubygems) installed, install aws-s3 gem with
gem install aws-s3
or
sudo gem install aws-s3
create a file delete_bucket.rb:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
AWS::S3::Bucket.delete("bucket_name", :force => true)
and run it:
ruby delete_bucket.rb
Since Bucket#delete returned timeout exceptions a lot for me, I have expanded the script:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
while AWS::S3::Bucket.find("bucket_name")
begin
AWS::S3::Bucket.delete("bucket_name", :force => true)
rescue
end
end

I guess the easiest way would be to use S3fm, a free online file manager for Amazon S3. No applications to install, no 3rd party web sites registrations. Runs directly from Amazon S3, secure and convenient.
Just select your bucket and hit delete.

One technique that can be used to avoid this problem is putting all objects in a "folder" in the bucket, allowing you to just delete the folder then go along and delete the bucket. Additionally, the s3cmd tool available from http://s3tools.org can be used to delete a bucket with files in it:
s3cmd rb --force s3://bucket-name

I hacked together a script for doing it from Python, it successfully removed my 9000 objects. See this page:
https://efod.se/blog/archive/2009/08/09/delete-s3-bucket

One more shameless plug: I got tired of waiting for individual HTTP delete requests when I had to delete 250,000 items, so I wrote a Ruby script that does it multithreaded and completes in a fraction of the time:
http://github.com/sfeley/s3nuke/
This is one that works much faster in Ruby 1.9 because of the way threads are handled.

This is a hard problem. My solution is at http://stuff.mit.edu/~jik/software/delete-s3-bucket.pl.txt. It describes all of the things I've determined can go wrong in a comment at the top. Here's the current version of the script (if I change it, I'll put a new version at the URL but probably not here).
#!/usr/bin/perl
# Copyright (c) 2010 Jonathan Kamens.
# Released under the GNU General Public License, Version 3.
# See <http://www.gnu.org/licenses/>.
# $Id: delete-s3-bucket.pl,v 1.3 2010/10/17 03:21:33 jik Exp $
# Deleting an Amazon S3 bucket is hard.
#
# * You can't delete the bucket unless it is empty.
#
# * There is no API for telling Amazon to empty the bucket, so you have to
# delete all of the objects one by one yourself.
#
# * If you've recently added a lot of large objects to the bucket, then they
# may not all be visible yet on all S3 servers. This means that even after the
# server you're talking to thinks all the objects are all deleted and lets you
# delete the bucket, additional objects can continue to propagate around the S3
# server network. If you then recreate the bucket with the same name, those
# additional objects will magically appear in it!
#
# It is not clear to me whether the bucket delete will eventually propagate to
# all of the S3 servers and cause all the objects in the bucket to go away, but
# I suspect it won't. I also suspect that you may end up continuing to be
# charged for these phantom objects even though the bucket they're in is no
# longer even visible in your S3 account.
#
# * If there's a CR, LF, or CRLF in an object name, then it's sent just that
# way in the XML that gets sent from the S3 server to the client when the
# client asks for a list of objects in the bucket. Unfortunately, the XML
# parser on the client will probably convert it to the local line ending
# character, and if it's different from the character that's actually in the
# object name, you then won't be able to delete it. Ugh! This is a bug in the
# S3 protocol; it should be enclosing the object names in CDATA tags or
# something to protect them from being munged by the XML parser.
#
# Note that this bug even affects the AWS Web Console provided by Amazon!
#
# * If you've got a whole lot of objects and you serialize the delete process,
# it'll take a long, long time to delete them all.
use threads;
use strict;
use warnings;
# Keys can have newlines in them, which screws up the communication
# between the parent and child processes, so use URL encoding to deal
# with that.
use CGI qw(escape unescape); # Easiest place to get this functionality.
use File::Basename;
use Getopt::Long;
use Net::Amazon::S3;
my $whoami = basename $0;
my $usage = "Usage: $whoami [--help] --access-key-id=id --secret-access-key=key
--bucket=name [--processes=#] [--wait=#] [--nodelete]
Specify --processes to indicate how many deletes to perform in
parallel. You're limited by RAM (to hold the parallel threads) and
bandwidth for the S3 delete requests.
Specify --wait to indicate seconds to require the bucket to be verified
empty. This is necessary if you create a huge number of objects and then
try to delete the bucket before they've all propagated to all the S3
servers (I've seen a huge backlog of newly created objects take *hours* to
propagate everywhere). See the comment at the top of the script for more
information about this issue.
Specify --nodelete to empty the bucket without actually deleting it.\n";
my($aws_access_key_id, $aws_secret_access_key, $bucket_name, $wait);
my $procs = 1;
my $delete = 1;
die if (! GetOptions(
"help" => sub { print $usage; exit; },
"access-key-id=s" => \$aws_access_key_id,
"secret-access-key=s" => \$aws_secret_access_key,
"bucket=s" => \$bucket_name,
"processess=i" => \$procs,
"wait=i" => \$wait,
"delete!" => \$delete,
));
die if (! ($aws_access_key_id && $aws_secret_access_key && $bucket_name));
my $increment = 0;
print "Incrementally deleting the contents of $bucket_name\n";
$| = 1;
my(#procs, $current);
for (1..$procs) {
my($read_from_parent, $write_to_child);
my($read_from_child, $write_to_parent);
pipe($read_from_parent, $write_to_child) or die;
pipe($read_from_child, $write_to_parent) or die;
threads->create(sub {
close($read_from_child);
close($write_to_child);
my $old_select = select $write_to_parent;
$| = 1;
select $old_select;
&child($read_from_parent, $write_to_parent);
}) or die;
close($read_from_parent);
close($write_to_parent);
my $old_select = select $write_to_child;
$| = 1;
select $old_select;
push(#procs, [$read_from_child, $write_to_child]);
}
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
my $deleted = 1;
my $total_deleted = 0;
my $last_start = time;
my($start, $waited);
while ($deleted > 0) {
$start = time;
print "\nLoading ", ($increment ? "up to $increment" :
"as many as possible")," keys...\n";
my $response = $bucket->list({$increment ? ('max-keys' => $increment) : ()})
or die $s3->err . ": " . $s3->errstr . "\n";
$deleted = scalar(#{ $response->{keys} }) ;
if (! $deleted) {
if ($wait and ! $waited) {
my $delta = $wait - ($start - $last_start);
if ($delta > 0) {
print "Waiting $delta second(s) to confirm bucket is empty\n";
sleep($delta);
$waited = 1;
$deleted = 1;
next;
}
else {
last;
}
}
else {
last;
}
}
else {
$waited = undef;
}
$total_deleted += $deleted;
print "\nDeleting $deleted keys($total_deleted total)...\n";
$current = 0;
foreach my $key ( #{ $response->{keys} } ) {
my $key_name = $key->{key};
while (! &send(escape($key_name) . "\n")) {
print "Thread $current died\n";
die "No threads left\n" if (#procs == 1);
if ($current == #procs-1) {
pop #procs;
$current = 0;
}
else {
$procs[$current] = pop #procs;
}
}
$current = ($current + 1) % #procs;
threads->yield();
}
print "Sending sync message\n";
for ($current = 0; $current < #procs; $current++) {
if (! &send("\n")) {
print "Thread $current died sending sync\n";
if ($current = #procs-1) {
pop #procs;
last;
}
$procs[$current] = pop #procs;
$current--;
}
threads->yield();
}
print "Reading sync response\n";
for ($current = 0; $current < #procs; $current++) {
if (! &receive()) {
print "Thread $current died reading sync\n";
if ($current = #procs-1) {
pop #procs;
last;
}
$procs[$current] = pop #procs;
$current--;
}
threads->yield();
}
}
continue {
$last_start = $start;
}
if ($delete) {
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
}
sub send {
my($str) = #_;
my $fh = $procs[$current]->[1];
print($fh $str);
}
sub receive {
my $fh = $procs[$current]->[0];
scalar <$fh>;
}
sub child {
my($read, $write) = #_;
threads->detach();
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
while (my $key = <$read>) {
if ($key eq "\n") {
print($write "\n") or die;
next;
}
chomp $key;
$key = unescape($key);
if ($key =~ /[\r\n]/) {
my(#parts) = split(/\r\n|\r|\n/, $key, -1);
my(#guesses) = shift #parts;
foreach my $part (#parts) {
#guesses = (map(($_ . "\r\n" . $part,
$_ . "\r" . $part,
$_ . "\n" . $part), #guesses));
}
foreach my $guess (#guesses) {
if ($bucket->get_key($guess)) {
$key = $guess;
last;
}
}
}
$bucket->delete_key($key) or
die $s3->err . ": " . $s3->errstr . "\n";
print ".";
threads->yield();
}
return;
}

I am one of the Developer Team member of Bucket Explorer Team, We will provide different option to delete Bucket as per the users choice...
1) Quick Delete -This option will delete you data from bucket in chunks of 1000.
2) Permanent Delete-This option will Delete objects in queue.
How to delete Amazon S3 files and bucket?

Amazon recently added a new feature, "Multi-Object Delete", which allows up to 1,000 objects to be deleted at a time with a single API request. This should allow simplification of the process of deleting huge numbers of files from a bucket.
The documentation for the new feature is available here: http://docs.amazonwebservices.com/AmazonS3/latest/dev/DeletingMultipleObjects.html

I've always ended up using their C# API and little scripts to do this. I'm not sure why S3Fox can't do it, but that functionality appears to be broken within it at the moment. I'm sure that many of the other S3 tools can do it as well, though.

Delete all of the objects in the bucket first. Then you can delete the bucket itself.
Apparently, one cannot delete a bucket with objects in it and S3Fox does not do this for you.
I've had other little issues with S3Fox myself, like this, and now use a Java based tool, jets3t which is more forthcoming about error conditions. There must be others, too.

You must make sure you have correct write permission set for the bucket, and the bucket contains no objects.
Some useful tools that can assist your deletion: CrossFTP, view and delete the buckets like the FTP client. jets3t Tool as mentioned above.

I'll have to have a look at some of these alternative file managers. I've used (and like) BucketExplorer, which you can get from - surprisingly - http://www.bucketexplorer.com/.
It's a 30 day free trial, then (currently) costing US$49.99 per licence (US$49.95 on the purchase cover page).

Try https://s3explorer.appspot.com/ to manage your S3 account.

This is what I use. Just simple ruby code.
case bucket.size
when 0
puts "Nothing left to delete"
when 1..1000
bucket.objects.each do |item|
item.delete
puts "Deleting - #{bucket.size} left"
end
end

Use the amazon web managment console. With Google chrome for speed. Deleted the objects a lot faster than firefox (about 10 times faster). Had 60 000 objects to delete.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Getting Origami-pdf to work with Amazon S3 files - pdf

After a deep further analysis into the origami code, i noticed that PDF.Read accepts a binary file, and so instead of sending the local file path, we can send the file instance as a whole.

Related

Use Shrine to upload video files and generate thumbnails

How to convert raw emails (MIME) from AWS SES to Gmail?

create a (Prawn) PDF inside custom DelayedJob and upload it to S3?

How do I export PSDs as PNGs with py-appscript?

Delete Amazon S3 buckets? [closed]

Categories

Resources