Redis DEL many keys vs pipeline - are both non-blocking?

I want to delete lots of keys from Redis. I have all the key names "on hand" - no need to search for them. I'm considering 2 options:
Use a single DEL command passing multiple keys
Use lots of DEL commands in a single pipeline
I've done some performance testing on a local machine, and it appears that DEL with multiple keys (option 1) is almost 10 times faster than a pipeline.
What I'm worried about is that because DEL is atomic, when I delete for example 10k keys with one command it'll be much faster, but it'll block Redis for the duration of that single DEL command. On the other hand, a pipeline is slower, but it executes commands one by one, so other clients of the same Redis aren't blocked.
I can't find a definite answer on whether DEL with multiple keys will block until all keys are deleted. My tests show that it doesn't block, but I don't understand why - it kinda contradicts the documentation.
Test code:
let data = await setSomeDataInRedis();
let keys = data.map(([k]) => k);
const startDel = Date.now();
console.log('\n== del[] ==')
// Sending 3 requests using 2 connections to Redis in parallel.
// Well, almost in parallel 'cause it's Node.js and also the network is serial.
await Promise.all([
  redisClient1.hmget(['always-there', 'name']) // I expect this one to return 1st as it comes before DEL
    .then((r) => console.log(`get-before-del[] - ${Date.now() - startDel}ms - ${JSON.stringify(r)}`)),
  redisClient2.del(keys) // I expect this one to return 2nd and block the 3rd on the Redis side
    .then(() => console.log(`del[] - ${Date.now() - startDel}ms`)),
  redisClient1.hmget(['always-there', 'name']) // I expect this one to return 3rd because the 2nd blocks Redis
    .then((r) => console.log(`get-after-del[] - ${Date.now() - startDel}ms - ${JSON.stringify(r)}`)),
])

console.log('\n== pipeline ==')
// Same here - sending 3 requests using 2 connections to Redis in parallel.
data = await setSomeDataInRedis();
const pipeline = redisClient2.pipeline();
data.forEach(([k]) => pipeline.del(k));
const startPipe = Date.now();
await Promise.all([
  redisClient1.hmget(['always-there', 'name']) // I expect this one to return 1st as it comes before the pipeline
    .then((r) => console.log(`get-before-pipeline - ${Date.now() - startPipe}ms - ${JSON.stringify(r)}`)),
  pipeline.exec() // I expect this one to return 3rd as it has lots of commands, but it's non-blocking
    .then(() => console.log(`pipeline - ${Date.now() - startPipe}ms`)),
  redisClient1.hmget(['always-there', 'name']) // I expect this one to return 2nd as the pipeline is non-blocking
    .then((r) => console.log(`get-after-pipeline - ${Date.now() - startPipe}ms - ${JSON.stringify(r)}`)),
])

console.log('\n== unlink[] ==')
data = await setSomeDataInRedis();
keys = data.map(([k]) => k);
const startUnlink = Date.now();
await Promise.all([
  redisClient1.hmget('always-there', 'name')
    .then((r) => console.log(`get before unlink[] - ${Date.now() - startUnlink}ms - ${JSON.stringify(r)}`)),
  redisClient2.unlink(keys)
    .then(() => console.log(`unlink[] - ${Date.now() - startUnlink}ms`)),
  redisClient1.hmget(['always-there', 'name'])
    .then((r) => console.log(`get after unlink[] - ${Date.now() - startUnlink}ms - ${JSON.stringify(r)}`)),
])
Test output:
== del[] ==
get-before-del[] - 62ms - ["name0.06551250522960261"]
get-after-del[] - 64ms - ["name0.06551250522960261"]
del[] - 183ms
== pipeline ==
get-before-pipeline - 127ms - ["name0.9763696909778301"]
get-after-pipeline - 131ms - ["name0.9763696909778301"]
pipeline - 1325ms
== unlink[] ==
get before unlink[] - 41ms - ["name0.25533683439953236"]
get after unlink[] - 47ms - ["name0.25533683439953236"]
unlink[] - 153ms
The fact that get-after-del[] comes back before del[] kinda shows that DEL with multiple keys is non-blocking (non-atomic), but that kinda contradicts the Redis docs.
Edit: I intentionally didn't include unlink in the original version of the question. Tests show it's faster (10-20%), but unlink doesn't solve the initial problem - it'll still block Redis for a significant amount of time. But given that the first answer is an unlink recommendation, I'm adding this note and the unlink test.

whether DEL with multiple keys will block until all keys are deleted
Yes, it will block, unless your Redis is configured with lazyfree-lazy-user-del yes (lazy freeing was introduced in Redis 4.0; this particular option arrived in Redis 6.0). Check this for more info.
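If you want to check or flip that option from code rather than in redis.conf, here is a minimal sketch assuming ioredis (which the question's client appears to be); treat the exact client calls as an assumption - the underlying CONFIG commands are what matter:
const Redis = require('ioredis');
const redis = new Redis();

async function enableLazyUserDel() {
  // CONFIG GET returns a flat [name, value] array
  const [, current] = await redis.config('GET', 'lazyfree-lazy-user-del');
  console.log('lazyfree-lazy-user-del =', current); // "no" by default
  // With this set to yes, DEL reclaims memory in a background thread, like UNLINK
  await redis.config('SET', 'lazyfree-lazy-user-del', 'yes');
}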
My tests show that it doesn't block, but I don't understand why - it kinda contradicts the documentation.
First of all, check the config mentioned above. Secondly, your test is not accurate. Since these are async calls, it depends on how and when the then part is scheduled. It also depends on whether the client has a connection pool for sending commands to Redis. For example, the second hmget and the del might be sent over different connections, and the second hmget might reach Redis earlier. I'm not familiar with your client - correct me if I'm wrong.
Also, you'd better use unlink instead of del when you have many keys to delete. Check this for details.

Related

How to mock current time in Perl 6?

In Perl 5 one can easily simulate a script running at a specific timestamp:
BEGIN {
    *CORE::GLOBAL::time = sub () { $::time_mock // CORE::time };
}
use Test;
$::time_mock = 1545667200;
ok is-xmas, 'yay!';
$::time_mock = undef; # back to current time
And this works globally - every package, every method, everything that uses time() will see the 1545667200 timestamp. Which is very convenient for testing time-sensitive logic.
Is there a way to reproduce such behavior in Perl 6?
Here's a way to change how the "now" term works, and you may want to do the same thing to "time" (though time returns an Int, not an Instant object).
&term:<now>.wrap(
    -> |, :$set-mock-time {
        state $mock-time;
        $mock-time = Instant.from-posix($_) with $set-mock-time;
        $mock-time // callsame
    });
say now; # Instant:1542374103.625357
sleep 0.5;
say now; # Instant:1542374104.127774
term:<now>(set-mock-time => 12345);
say now; # Instant:12355
sleep 0.5;
say now; # Instant:12355
On top of that, depending on what exactly you want to do, perhaps Test::Scheduler could be interesting: https://github.com/jnthn/p6-test-scheduler
There is also Test::Time in the ecosystem (https://github.com/FCO/test-time).

Redis - Reliable queue pattern with multiple pops in one request

I have implemented Redis's reliable queue pattern using BRPOPLPUSH because I want to avoid polling.
However, this results in a network request for each item. How can I augment this so that a worker BRPOPLPUSHes multiple entries at once?
BRPOPLPUSH is the blocking version of RPOPLPUSH; it doesn't support transactions, and you can't handle multiple entries with it. You also can't use a Lua script for this purpose because of the nature of Lua execution: the server would be blocked for new requests until the Lua script has finished.
You can use application-side logic to implement the queue pattern you need. In pseudo language:
func MyBRPOPLPUSH(source, dest, maxItems = 1, timeOutTime = 0) {
    items = []
    timeOut = time() + timeOutTime
    // Stop once we have maxItems; if timeOutTime > 0, also stop at the deadline.
    while (items.count < maxItems && (timeOutTime == 0 || time() < timeOut)) {
        item = redis.RPOPLPUSH(source, dest)
        if (item == nil) {
            sleep(someTimeHere)
            continue
        }
        items.add(item)
    }
    return items
}
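For reference, a rough runnable version of that loop in Node.js with ioredis (the function name popBatch, the maxItems/timeoutMs defaults and the 50ms sleep are illustrative choices, not anything prescribed by Redis):
const Redis = require('ioredis');
const redis = new Redis();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pop up to maxItems entries from `source` into `dest`, giving up after timeoutMs.
async function popBatch(source, dest, maxItems = 10, timeoutMs = 1000) {
  const items = [];
  const deadline = Date.now() + timeoutMs;
  while (items.length < maxItems && Date.now() < deadline) {
    const item = await redis.rpoplpush(source, dest); // non-blocking variant
    if (item === null) {
      await sleep(50); // nothing queued right now; back off briefly
      continue;
    }
    items.push(item);
  }
  return items;
}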

Why Redis keys are not expiring?

I have checked these questions but they did not help me fix my issue. I am using Redis as a key-value store for rate limiting in my Spring REST application, using the spring-data-redis library. I test with a huge load. I use the following code to store a key, and I set the expire time as well. Most of the time the key expires as expected, but sometimes the key does not expire!
code snippet
RedisAtomicInteger counter = new RedisAtomicInteger("mykey");
counter.expire(1, TimeUnit.MINUTES);
I checked the availability of the keys using the redis-cli tool:
keys *
and
ttl keyname
redis.conf has the default values.
Any suggestions?
Edit 1:
Full code:
The function is in an Aspect
public synchronized Object checkLimit(ProceedingJoinPoint joinPoint) throws Exception, Throwable {
    boolean isKeyAvailable = false;
    List<String> keysList = new ArrayList<>();
    Object[] obj = joinPoint.getArgs();
    String randomKey = (String) obj[1];
    int randomLimit = (Integer) obj[2];
    // for RedisTemplate it is already loaded as
    // @Autowired
    // private RedisTemplate template;
    // in this class
    Set<String> redisKeys = template.keys(randomKey + "_" + randomLimit + "*");
    Iterator<String> it = redisKeys.iterator();
    while (it.hasNext()) {
        String data = it.next();
        keysList.add(data);
    }
    if (keysList.size() > 0) {
        isKeyAvailable = keysList.get(0).contains(randomKey + "_" + randomLimit);
    }
    RedisAtomicInteger counter = null;
    // if the key is not there
    if (!isKeyAvailable) {
        long expiryTimeStamp = 0;
        int timePeriodInMintes = 1;
        expiryTimeStamp = new Date(System.currentTimeMillis() + timePeriodInMintes * 60 * 1000).getTime();
        counter = new RedisAtomicInteger(randomKey + "_" + randomLimit + "_" + expiryTimeStamp, template.getConnectionFactory());
        counter.incrementAndGet();
        counter.expire(timePeriodInMintes, TimeUnit.MINUTES);
    } else {
        String[] keys = keysList.get(0).split("_");
        String rLimit = keys[1];
        counter = new RedisAtomicInteger(keysList.get(0), template.getConnectionFactory());
        int count = counter.get();
        // If count exceeds the limit, throw an error
        if (count != 0 && count >= Integer.parseInt(rLimit)) {
            throw new Exception("Error");
        } else {
            counter.incrementAndGet();
        }
    }
    return joinPoint.proceed();
}
When these lines run:
RedisAtomicInteger counter = new RedisAtomicInteger("mykey");
counter.expire(1, TimeUnit.MINUTES);
I can see
75672562.380127 [0 10.0.3.133:65462] "KEYS" "mykey_1000*"
75672562.384267 [0 10.0.3.133:65462] "GET" "mykey_1000_1475672621787"
75672562.388856 [0 10.0.3.133:65462] "SET" "mykey_1000_1475672621787" "0"
75672562.391867 [0 10.0.3.133:65462] "INCRBY" "mykey_1000_1475672621787" "1"
75672562.395922 [0 10.0.3.133:65462] "PEXPIRE" "mykey_1000_1475672621787" "60000"
...
75672562.691723 [0 10.0.3.133:65462] "KEYS" "mykey_1000*"
75672562.695562 [0 10.0.3.133:65462] "GET" "mykey_1000_1475672621787"
75672562.695855 [0 10.0.3.133:65462] "GET" "mykey_1000_1475672621787"
75672562.696139 [0 10.0.3.133:65462] "INCRBY" "mykey_1000_1475672621787" "1"
in the Redis log when I MONITOR it.
Edit:
Now, with the updated code, I believe your methodology is fundamentally flawed aside from what you're reporting.
The way you've implemented it, you require running KEYS in production - this is bad. As you scale out, you will be putting a growing, and unnecessary, blocking load on the server. As every bit of documentation on it says, do not use KEYS in production. Note that encoding the expiration time in the key name gives you no benefit. If you made that part of the key name the creation timestamp, or even a random number, nothing would change. Indeed, if you removed that bit, nothing would change.
A more sane route would instead be to use a keyname which is not time-dependent. The use of expiration handles that function for you. Let us call your rate-limited thing a "session". Your key name sans the timestamp is the "session ID". By setting an expiration of 60s on it, it will no longer be available at the 61s mark. So you can safely increment and compare the result to your limit without needing to know the current time or expiry time. All you need is a static key name and an appropriate expiration set on it.
If you INCR a non-existing key, Redis will return "1", meaning it created the key and incremented it in a single step/call. So basically the logic goes like this:
1. create "session" ID
2. increment counter using ID
3. compare result to limit
3.1 if count == 1, set expiration to 60s
3.2 if count > limit, reject
Step 3.1 is important. A count of 1 means this is a new key in Redis, and you want to set your expiration on it. Anything else means the expiration should already have been set. If you set it in 3.2 you will break the process because it will preserve the counter for more than 60s.
With this you don't need dynamic key names based on expiration time, and thus don't need to use KEYS to find out whether there is an existing "session" for the rate-limited object. It also makes your code much simpler and more predictable, and it reduces round trips to Redis - meaning lower load on Redis and better performance. As to how to do that with the client library you're using, I can't say because I'm not that familiar with it. But the basic sequence should be translatable to it, as it is fairly basic and simple.
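For what it's worth, here is a minimal sketch of that sequence. It uses Node.js with ioredis purely to illustrate the command flow (the question itself uses spring-data-redis; the key prefix, limit and window values here are placeholders):
const Redis = require('ioredis');
const redis = new Redis();

// Returns true if the request is allowed, false if the limit was exceeded.
async function allowRequest(sessionId, limit = 1000, windowSeconds = 60) {
  const key = `session:${sessionId}`;       // static name, no timestamp in it
  const count = await redis.incr(key);      // creates the key with value 1 if it doesn't exist
  if (count === 1) {
    await redis.expire(key, windowSeconds); // set the window only on the first hit
  }
  return count <= limit;
}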
What you haven't shown, however, is anything to support the assertion that the expiration isn't happening. All you've done is show that Redis is indeed being told to set an expiration, and is setting one. In order to support your claim you need to show that the key does not expire - which means showing retrieval of the key after the expiration time, and showing that the counter was not "reset" by being recreated after the expiration. One way you can see that expiration is happening is to use keyspace notifications. With those you will be able to see Redis report that a key was expired.
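A minimal sketch of that, again with ioredis just for illustration (the notify-keyspace-events flags and the __keyevent channel are standard Redis; everything else here is an assumption):
const Redis = require('ioredis');

async function watchExpirations() {
  const sub = new Redis();
  // E = keyevent events, x = expired events (see the notify-keyspace-events docs)
  await sub.config('SET', 'notify-keyspace-events', 'Ex');
  await sub.psubscribe('__keyevent@*__:expired');
  sub.on('pmessage', (pattern, channel, key) => {
    console.log(`Redis reported expiration of key: ${key}`);
  });
}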
Where this process will fail a bit is if you do multiple windows for rate-limiting, or if you have a much larger window (ie. 10 minutes) in which case sorted sets might be a more sane option to prevent front-loading of requests - if desired. But as your example is written, the above will work just fine.

Ethereum private network mining

1) I set up a private Ethereum network using the following command:
$ geth --genesis <genesis json file path> --datadir <some path to an empty folder> --networkid 123 --nodiscover --maxpeers 0 console
2) Created an account
3) Then, started the miner using miner.start() command.
After a while, ethers were being added to my account automatically, but I don't have any pending transactions in my private network. So where could my miner be getting the ethers from?
Even though I didn't initiate any transactions in my network, I could see some transactions being recorded in the logs once I started the miner.
The log is as follows:
I0118 11:59:11.696523 9427 backend.go:584] Automatic pregeneration of ethash
DAG ON (ethash dir: /Users/minisha/.ethash)
I0118 11:59:11.696590 9427 backend.go:591] checking DAG (ethash dir:
/Users/minisha/.ethash)
I0118 11:59:11.696728 9427 miner.go:119] Starting mining operation (CPU=4
TOT=5)
true
> I0118 11:59:11.703907 9427 worker.go:570] commit new work on block 1 with 0
txs & 0 uncles. Took 7.109111ms
I0118 11:59:11.704083 9427 ethash.go:220] Generating DAG for epoch 0 (size
1073739904) (0000000000000000000000000000000000000000000000000000000000000000)
I0118 11:59:12.698679 9427 ethash.go:237] Done generating DAG for epoch 0, it
took 994.61107ms
I0118 11:59:15.163864 9427 worker.go:349]
And my genesis block code is as follows:
{
  "nonce": "0xdeadbeefdeadbeef",
  "timestamp": "0x0",
  "parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "extraData": "0x0",
  "gasLimit": "0x8000000",
  "difficulty": "0x400",
  "mixhash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "coinbase": "0x3333333333333333333333333333333333333333",
  "alloc": {
  }
}
Since my network is isolated and has only one node (no peers), I am quite confused by this behaviour. Any insights would be greatly appreciated.
Your client is mining empty blocks (containing no transactions) and getting the reward for each mined block, which is 5 ETH per block.
If you want to prevent empty blocks in your private blockchain, you should consider using the eth client (the C++ implementation).
In the case of the geth client, you can use a JavaScript script that modifies the client's behaviour. Any such script can be loaded with the js command: geth js script.js.
var mining_threads = 1

function checkWork() {
  if (eth.getBlock("pending").transactions.length > 0) {
    if (eth.mining) return;
    console.log("== Pending transactions! Mining...");
    miner.start(mining_threads);
  } else {
    miner.stop(0); // This param means nothing
    console.log("== No transactions! Mining stopped.");
  }
}

eth.filter("latest", function(err, block) { checkWork(); });
eth.filter("pending", function(err, block) { checkWork(); });
checkWork();
You can try EmbarkJS, which can run the geth client with the mineWhenNeeded option on a private network. It will only mine when new transactions come in.

Delete Amazon S3 buckets? [closed]

I've been interacting with Amazon S3 through S3Fox and I can't seem to delete my buckets. I select a bucket, hit delete, confirm the delete in a popup, and... nothing happens. Is there another tool that I should use?
It is finally possible to delete all the files in one go using the new Lifecycle (expiration) rules feature. You can even do it from the AWS console.
Simply right click on the bucket name in AWS console, select "Properties" and then in the row of tabs at the bottom of the page select "lifecycle" and "add rule". Create a lifecycle rule with the "Prefix" field set blank (blank means all files in the bucket, or you could set it to "a" to delete all files whose names begin with "a"). Set the "Days" field to "1". That's it. Done. Assuming the files are more than one day old they should all get deleted, then you can delete the bucket.
I only just tried this for the first time so I'm still waiting to see how quickly the files get deleted (it wasn't instant but presumably should happen within 24 hours) and whether I get billed for one delete command or 50 million delete commands... fingers crossed!
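If you'd rather set that rule from code than click through the console, here is a hedged sketch with the AWS SDK for JavaScript - the rule ID is arbitrary, the empty prefix means "every object", and the 1-day expiration mirrors the console steps above:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function expireEverything(bucket) {
  return s3.putBucketLifecycleConfiguration({
    Bucket: bucket,
    LifecycleConfiguration: {
      Rules: [{
        ID: 'expire-all-objects',   // arbitrary rule name
        Status: 'Enabled',
        Filter: { Prefix: '' },     // empty prefix = every object in the bucket
        Expiration: { Days: 1 },    // 1 day is the minimum
      }],
    },
  }).promise();
}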
Remember that S3 buckets need to be empty before they can be deleted. The good news is that most 3rd-party tools automate this process. If you are running into problems with S3Fox, I recommend trying S3FM for a GUI or S3Sync for the command line. Amazon has a great article describing how to use S3Sync. After setting up your variables, the key command is
./s3cmd.rb deleteall <your bucket name>
Deleting buckets with lots of individual files tends to crash a lot of S3 tools because they try to display a list of all the files in the bucket. You need to find a way to delete in batches. The best GUI tool I've found for this purpose is Bucket Explorer. It deletes files in an S3 bucket in 1000-file chunks and does not crash when trying to open large buckets the way S3Fox and S3FM do.
I've also found a few scripts that you can use for this purpose. I haven't tried these scripts yet but they look pretty straightforward.
RUBY
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id => 'your access key',
  :secret_access_key => 'your secret key'
)

bucket = AWS::S3::Bucket.find('the bucket name')
while(!bucket.empty?)
  begin
    puts "Deleting objects in bucket"
    bucket.objects.each do |object|
      object.delete
      puts "There are #{bucket.objects.size} objects left in the bucket"
    end
    puts "Done deleting objects"
  rescue SocketError
    puts "Had socket error"
  end
end
PERL
#!/usr/bin/perl
use Net::Amazon::S3;

my $aws_access_key_id = 'your access key';
my $aws_secret_access_key = 'your secret access key';
my $increment = 50; # 50 at a time
my $bucket_name = 'bucket_name';

my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id, aws_secret_access_key => $aws_secret_access_key, retry => 1, });
my $bucket = $s3->bucket($bucket_name);

print "Incrementally deleting the contents of $bucket_name\n";

my $deleted = 1;
my $total_deleted = 0;
while ($deleted > 0) {
    print "Loading up to $increment keys...\n";
    $response = $bucket->list({'max-keys' => $increment, }) or die $s3->err . ": " . $s3->errstr . "\n";
    $deleted = scalar(@{ $response->{keys} });
    $total_deleted += $deleted;
    print "Deleting $deleted keys($total_deleted total)...\n";
    foreach my $key ( @{ $response->{keys} } ) {
        my $key_name = $key->{key};
        $bucket->delete_key($key->{key}) or die $s3->err . ": " . $s3->errstr . "\n";
    }
}
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
SOURCE: Tarkblog
Hope this helps!
recent versions of s3cmd have --recursive
e.g.,
~/$ s3cmd rb --recursive s3://bucketwithfiles
http://s3tools.org/kb/item5.htm
With s3cmd:
Create a new empty directory
s3cmd sync --delete-removed empty_directory s3://yourbucket
This may be a bug in S3Fox, because it is generally able to delete items recursively. However, I'm not sure if I've ever tried to delete a whole bucket and its contents at once.
The JetS3t project, as mentioned by Stu, includes a Java GUI applet you can easily run in a browser to manage your S3 buckets: Cockpit. It has both strengths and weaknesses compared to S3Fox, but there's a good chance it will help you deal with your troublesome bucket. Though it will require you to delete the objects first, then the bucket.
Disclaimer: I'm the author of JetS3t and Cockpit
SpaceBlock also makes it simple to delete S3 buckets - right click the bucket, delete, wait for the job to complete in the transfers view, done.
This is the free and open source Windows S3 front-end that I maintain, so shameless plug alert etc.
I've implemented bucket-destroy, a multi-threaded utility that does everything it takes to delete a bucket. I handle non-empty buckets, as well as version-enabled bucket keys.
You can read the blog post here http://bytecoded.blogspot.com/2011/01/recursive-delete-utility-for-version.html and the instructions here http://code.google.com/p/bucket-destroy/
I've successfully used it to delete a bucket that contains double '//' in key names, versioned keys and DeleteMarker keys. Currently I'm running it on a bucket containing ~40,000,000 objects; so far I've been able to delete 1,200,000 in several hours on an m1.large. Note that the utility is multi-threaded but does not (yet) implement shuffling (which would allow horizontal scaling, launching the utility on several machines).
If you use Amazon's console and need to clear out a bucket on a one-time basis: browse to your bucket, select the top key, scroll to the bottom, hold Shift on your keyboard and click on the bottom one. It will select everything in between, and then you can right-click and delete.
If you have ruby (and rubygems) installed, install aws-s3 gem with
gem install aws-s3
or
sudo gem install aws-s3
create a file delete_bucket.rb:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
AWS::S3::Bucket.delete("bucket_name", :force => true)
and run it:
ruby delete_bucket.rb
Since Bucket#delete returned timeout exceptions a lot for me, I have expanded the script:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
while AWS::S3::Bucket.find("bucket_name")
  begin
    AWS::S3::Bucket.delete("bucket_name", :force => true)
  rescue
  end
end
I guess the easiest way would be to use S3fm, a free online file manager for Amazon S3. No applications to install, no 3rd-party web site registrations. It runs directly from Amazon S3, and is secure and convenient.
Just select your bucket and hit delete.
One technique that can be used to avoid this problem is putting all the objects in a "folder" in the bucket, allowing you to just delete the folder and then go along and delete the bucket. Additionally, the s3cmd tool available from http://s3tools.org can be used to delete a bucket with files in it:
s3cmd rb --force s3://bucket-name
I hacked together a script for doing it from Python; it successfully removed my 9000 objects. See this page:
https://efod.se/blog/archive/2009/08/09/delete-s3-bucket
One more shameless plug: I got tired of waiting for individual HTTP delete requests when I had to delete 250,000 items, so I wrote a Ruby script that does it multithreaded and completes in a fraction of the time:
http://github.com/sfeley/s3nuke/
This is one that works much faster in Ruby 1.9 because of the way threads are handled.
This is a hard problem. My solution is at http://stuff.mit.edu/~jik/software/delete-s3-bucket.pl.txt. It describes all of the things I've determined can go wrong in a comment at the top. Here's the current version of the script (if I change it, I'll put a new version at the URL but probably not here).
#!/usr/bin/perl
# Copyright (c) 2010 Jonathan Kamens.
# Released under the GNU General Public License, Version 3.
# See <http://www.gnu.org/licenses/>.
# $Id: delete-s3-bucket.pl,v 1.3 2010/10/17 03:21:33 jik Exp $
# Deleting an Amazon S3 bucket is hard.
#
# * You can't delete the bucket unless it is empty.
#
# * There is no API for telling Amazon to empty the bucket, so you have to
# delete all of the objects one by one yourself.
#
# * If you've recently added a lot of large objects to the bucket, then they
# may not all be visible yet on all S3 servers. This means that even after the
# server you're talking to thinks all the objects are all deleted and lets you
# delete the bucket, additional objects can continue to propagate around the S3
# server network. If you then recreate the bucket with the same name, those
# additional objects will magically appear in it!
#
# It is not clear to me whether the bucket delete will eventually propagate to
# all of the S3 servers and cause all the objects in the bucket to go away, but
# I suspect it won't. I also suspect that you may end up continuing to be
# charged for these phantom objects even though the bucket they're in is no
# longer even visible in your S3 account.
#
# * If there's a CR, LF, or CRLF in an object name, then it's sent just that
# way in the XML that gets sent from the S3 server to the client when the
# client asks for a list of objects in the bucket. Unfortunately, the XML
# parser on the client will probably convert it to the local line ending
# character, and if it's different from the character that's actually in the
# object name, you then won't be able to delete it. Ugh! This is a bug in the
# S3 protocol; it should be enclosing the object names in CDATA tags or
# something to protect them from being munged by the XML parser.
#
# Note that this bug even affects the AWS Web Console provided by Amazon!
#
# * If you've got a whole lot of objects and you serialize the delete process,
# it'll take a long, long time to delete them all.
use threads;
use strict;
use warnings;
# Keys can have newlines in them, which screws up the communication
# between the parent and child processes, so use URL encoding to deal
# with that.
use CGI qw(escape unescape); # Easiest place to get this functionality.
use File::Basename;
use Getopt::Long;
use Net::Amazon::S3;
my $whoami = basename $0;
my $usage = "Usage: $whoami [--help] --access-key-id=id --secret-access-key=key
--bucket=name [--processes=#] [--wait=#] [--nodelete]
Specify --processes to indicate how many deletes to perform in
parallel. You're limited by RAM (to hold the parallel threads) and
bandwidth for the S3 delete requests.
Specify --wait to indicate seconds to require the bucket to be verified
empty. This is necessary if you create a huge number of objects and then
try to delete the bucket before they've all propagated to all the S3
servers (I've seen a huge backlog of newly created objects take *hours* to
propagate everywhere). See the comment at the top of the script for more
information about this issue.
Specify --nodelete to empty the bucket without actually deleting it.\n";
my($aws_access_key_id, $aws_secret_access_key, $bucket_name, $wait);
my $procs = 1;
my $delete = 1;
die if (! GetOptions(
"help" => sub { print $usage; exit; },
"access-key-id=s" => \$aws_access_key_id,
"secret-access-key=s" => \$aws_secret_access_key,
"bucket=s" => \$bucket_name,
"processess=i" => \$procs,
"wait=i" => \$wait,
"delete!" => \$delete,
));
die if (! ($aws_access_key_id && $aws_secret_access_key && $bucket_name));
my $increment = 0;
print "Incrementally deleting the contents of $bucket_name\n";
$| = 1;
my(@procs, $current);
for (1..$procs) {
my($read_from_parent, $write_to_child);
my($read_from_child, $write_to_parent);
pipe($read_from_parent, $write_to_child) or die;
pipe($read_from_child, $write_to_parent) or die;
threads->create(sub {
close($read_from_child);
close($write_to_child);
my $old_select = select $write_to_parent;
$| = 1;
select $old_select;
&child($read_from_parent, $write_to_parent);
}) or die;
close($read_from_parent);
close($write_to_parent);
my $old_select = select $write_to_child;
$| = 1;
select $old_select;
push(@procs, [$read_from_child, $write_to_child]);
}
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
my $deleted = 1;
my $total_deleted = 0;
my $last_start = time;
my($start, $waited);
while ($deleted > 0) {
$start = time;
print "\nLoading ", ($increment ? "up to $increment" :
"as many as possible")," keys...\n";
my $response = $bucket->list({$increment ? ('max-keys' => $increment) : ()})
or die $s3->err . ": " . $s3->errstr . "\n";
$deleted = scalar(@{ $response->{keys} });
if (! $deleted) {
if ($wait and ! $waited) {
my $delta = $wait - ($start - $last_start);
if ($delta > 0) {
print "Waiting $delta second(s) to confirm bucket is empty\n";
sleep($delta);
$waited = 1;
$deleted = 1;
next;
}
else {
last;
}
}
else {
last;
}
}
else {
$waited = undef;
}
$total_deleted += $deleted;
print "\nDeleting $deleted keys($total_deleted total)...\n";
$current = 0;
foreach my $key ( @{ $response->{keys} } ) {
my $key_name = $key->{key};
while (! &send(escape($key_name) . "\n")) {
print "Thread $current died\n";
die "No threads left\n" if (@procs == 1);
if ($current == @procs-1) {
pop @procs;
$current = 0;
}
else {
$procs[$current] = pop @procs;
}
}
$current = ($current + 1) % @procs;
threads->yield();
}
print "Sending sync message\n";
for ($current = 0; $current < @procs; $current++) {
if (! &send("\n")) {
print "Thread $current died sending sync\n";
if ($current = @procs-1) {
pop @procs;
last;
}
$procs[$current] = pop @procs;
$current--;
}
threads->yield();
}
print "Reading sync response\n";
for ($current = 0; $current < @procs; $current++) {
if (! &receive()) {
print "Thread $current died reading sync\n";
if ($current = @procs-1) {
pop @procs;
last;
}
$procs[$current] = pop @procs;
$current--;
}
threads->yield();
}
}
continue {
$last_start = $start;
}
if ($delete) {
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
}
sub send {
my($str) = @_;
my $fh = $procs[$current]->[1];
print($fh $str);
}
sub receive {
my $fh = $procs[$current]->[0];
scalar <$fh>;
}
sub child {
my($read, $write) = @_;
threads->detach();
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
while (my $key = <$read>) {
if ($key eq "\n") {
print($write "\n") or die;
next;
}
chomp $key;
$key = unescape($key);
if ($key =~ /[\r\n]/) {
my(@parts) = split(/\r\n|\r|\n/, $key, -1);
my(@guesses) = shift @parts;
foreach my $part (@parts) {
@guesses = (map(($_ . "\r\n" . $part,
$_ . "\r" . $part,
$_ . "\n" . $part), @guesses));
}
foreach my $guess (@guesses) {
if ($bucket->get_key($guess)) {
$key = $guess;
last;
}
}
}
$bucket->delete_key($key) or
die $s3->err . ": " . $s3->errstr . "\n";
print ".";
threads->yield();
}
return;
}
I am one of the developers on the Bucket Explorer team. We provide different options to delete a bucket, as per the user's choice:
1) Quick Delete - this option deletes your data from the bucket in chunks of 1000.
2) Permanent Delete - this option deletes objects in a queue.
How to delete Amazon S3 files and bucket?
Amazon recently added a new feature, "Multi-Object Delete", which allows up to 1,000 objects to be deleted at a time with a single API request. This should simplify the process of deleting huge numbers of files from a bucket.
The documentation for the new feature is available here: http://docs.amazonwebservices.com/AmazonS3/latest/dev/DeletingMultipleObjects.html
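As a rough illustration of the new API, a sketch assuming the AWS SDK for JavaScript (the bucket name and keys are placeholders, and error handling is omitted):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function deleteBatch(bucket, keys) {
  // Multi-Object Delete accepts up to 1,000 keys per request
  return s3.deleteObjects({
    Bucket: bucket,
    Delete: { Objects: keys.map((Key) => ({ Key })) },
  }).promise();
}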
I've always ended up using their C# API and little scripts to do this. I'm not sure why S3Fox can't do it, but that functionality appears to be broken within it at the moment. I'm sure that many of the other S3 tools can do it as well, though.
Delete all of the objects in the bucket first. Then you can delete the bucket itself.
Apparently, one cannot delete a bucket with objects in it and S3Fox does not do this for you.
I've had other little issues with S3Fox myself, like this, and now use a Java based tool, jets3t which is more forthcoming about error conditions. There must be others, too.
You must make sure you have the correct write permissions set for the bucket, and that the bucket contains no objects.
Some useful tools that can assist your deletion: CrossFTP, which lets you view and delete buckets like an FTP client, and the jets3t tool mentioned above.
I'll have to have a look at some of these alternative file managers. I've used (and like) BucketExplorer, which you can get from - surprisingly - http://www.bucketexplorer.com/.
It's a 30 day free trial, then (currently) costing US$49.99 per licence (US$49.95 on the purchase cover page).
Try https://s3explorer.appspot.com/ to manage your S3 account.
This is what I use. Just simple ruby code.
case bucket.size
when 0
  puts "Nothing left to delete"
when 1..1000
  bucket.objects.each do |item|
    item.delete
    puts "Deleting - #{bucket.size} left"
  end
end
Use the Amazon web management console, with Google Chrome for speed. It deleted the objects a lot faster than Firefox (about 10 times faster). I had 60,000 objects to delete.