How to ignore merge commits using libgit2sharp?

I need to get a list of commits without the auto-merged commits done by Git.
How can this be done using the LibGit2Sharp package?

A merge commit is a commit with more than one parent.
In order to do this with Git, one would issue, for instance, the following command, which lists all the commits reachable from HEAD that are not merge commits.
git log --no-merges HEAD
Where --no-merges is documented as "Do not print commits with more than one parent. This is exactly the same as --max-parents=1.".
One can do the same thing with LibGit2Sharp with the following piece of code:
using (var repo = new Repository(path))
{
    var allCommitsReachableByHead = repo.Commits;
    const string RFC2822Format = "ddd dd MMM HH:mm:ss yyyy K";

    foreach (var c in allCommitsReachableByHead)
    {
        if (c.Parents.Count() > 1)
        {
            continue;
        }

        Console.WriteLine("Author: {0} <{1}>", c.Author.Name, c.Author.Email);
        Console.WriteLine("Date: {0}", c.Author.When.ToString(RFC2822Format, CultureInfo.InvariantCulture));
        Console.WriteLine();
        Console.WriteLine(c.Message);
        Console.WriteLine();
    }
}
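If you prefer LINQ over the explicit continue, the same filter can be expressed with a Where clause. A minimal sketch, assuming the same path variable as above and a using System.Linq directive:
// Sketch only: equivalent filtering with LINQ; Parents.Count() < 2 mirrors git's --max-parents=1
// (it keeps root commits, which have no parent, as well as ordinary single-parent commits).
using (var repo = new Repository(path))
{
    var nonMergeCommits = repo.Commits.Where(c => c.Parents.Count() < 2);

    foreach (var c in nonMergeCommits)
    {
        Console.WriteLine("{0} {1}", c.Sha, c.MessageShort);
    }
}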

Related

PLC4X: Exception during scraping of Job

I'm developing a project that reads data from 19 Siemens S1500 PLCs and 1 Modicon. I have used the scraper tool following this tutorial:
PLC4x scraper tutorial
but when the scraper has been working for a short amount of time I get the following exception:
I have changed the scheduled time between 1 and 100 and I always get the same exception when the scraper reaches the same number of received messages.
I have tested whether using PlcDriverManager instead of PooledPlcDriverManager could be a solution, but the same problem persists.
In my pom.xml I use the following dependency:
<dependency>
<groupId>org.apache.plc4x</groupId>
<artifactId>plc4j-scraper</artifactId>
<version>0.7.0</version>
</dependency>
I have tried changing the version to an older one like 0.6.0 or 0.5.0, but the problem still persists.
If I use the Modicon (Modbus TCP) I also get this exception after a short amount of time.
Does anyone know why this error is happening? Thanks in advance.
Edit: With the scraper version 0.8.0-SNAPSHOT I continue to have this problem.
Edit 2: This is my code. I think the problem may be that my scraper is opening a lot of connections and fails when it reaches 65526 messages. But since all the processing is happening inside the lambda function and I'm using a PooledPlcDriverManager, I think the scraper is using only one connection, so I don't know where the mistake is.
try {
// Create a new PooledPlcDriverManager
PlcDriverManager S7_plcDriverManager = new PooledPlcDriverManager();
// Trigger Collector
TriggerCollector S7_triggerCollector = new TriggerCollectorImpl(S7_plcDriverManager);
// Messages counter
AtomicInteger messagesCounter = new AtomicInteger();
// Configure the scraper, by binding a Scraper Configuration, a ResultHandler and a TriggerCollector together
TriggeredScraperImpl S7_scraper = new TriggeredScraperImpl(S7_scraperConfig, (jobName, sourceName, results) -> {
LinkedList<Object> S7_results = new LinkedList<>();
messagesCounter.getAndIncrement();
S7_results.add(jobName);
S7_results.add(sourceName);
S7_results.add(results);
logger.info("Array: " + String.valueOf(S7_results));
logger.info("MESSAGE number: " + messagesCounter);
// Producer topics routing
String topic = "s7" + S7_results.get(1).toString().substring(S7_results.get(1).toString().indexOf("S7_SourcePLC") + 9 , S7_results.get(1).toString().length());
String key = parseKey_S7("s7");
String value = parseValue_S7(S7_results.getLast().toString(),S7_results.get(1).toString());
logger.info("------- PARSED VALUE -------------------------------- " + value);
// Create my own Kafka Producer
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
// Send Data to Kafka - asynchronous
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
// executes every time a record is successfully sent or an exception is thrown
if (e == null) {
// the record was successfully sent
logger.info("Received new metadata. \n" +
"Topic:" + recordMetadata.topic() + "\n" +
"Partition: " + recordMetadata.partition() + "\n" +
"Offset: " + recordMetadata.offset() + "\n" +
"Timestamp: " + recordMetadata.timestamp());
} else {
logger.error("Error while producing", e);
}
}
});
}, S7_triggerCollector);
S7_scraper.start();
S7_triggerCollector.start();
} catch (ScraperException e) {
logger.error("Error starting the scraper (S7_scrapper)", e);
}
So in the end it was indeed the PLC that was simply hanging up the connection randomly. However, the NiFi integration should have handled this situation more gracefully. I implemented a fix for this particular error ... could you please give version 0.8.0-SNAPSHOT a try (or use 0.8.0 if we happen to have released it already)?

Git In-Memory repository update target

I'm still fairly new to the whole libgit2 and libgit2sharp codebases, but I've been trying to tackle the issue of In-Memory repositories and Refdb storage of references.
I've gotten everything almost working in my new branch:
Create the In-Memory repository and attach Refdb and Odb instances to it.
Create initial commit and set the reference for refs/heads/master.
Create the second commit...
The problem I run into now is updating refs/heads/master to the second commit. The UpdateTarget call to refs/heads/master runs into an error in git_reference_set_target:
LibGit2Sharp.NameConflictException : config value 'user.name' was not found
at LibGit2Sharp.Core.Ensure.HandleError(Int32 result) in \libgit2sharp\LibGit2Sharp\Core\Ensure.cs:line 154
at LibGit2Sharp.Core.Ensure.ZeroResult(Int32 result) in \libgit2sharp\LibGit2Sharp\Core\Ensure.cs:line 172
at LibGit2Sharp.Core.Proxy.git_reference_set_target(ReferenceHandle reference, ObjectId id, String logMessage) in \libgit2sharp\LibGit2Sharp\Core\Proxy.cs:line 2042
at LibGit2Sharp.ReferenceCollection.UpdateDirectReferenceTarget(Reference directRef, ObjectId targetId, String logMessage) in \libgit2sharp\LibGit2Sharp\ReferenceCollection.cs:line 476
at LibGit2Sharp.ReferenceCollection.UpdateTarget(Reference directRef, ObjectId targetId, String logMessage) in \libgit2sharp\LibGit2Sharp\ReferenceCollection.cs:line 470
at LibGit2Sharp.ReferenceCollection.UpdateTarget(Reference directRef, String objectish, String logMessage) in \libgit2sharp\LibGit2Sharp\ReferenceCollection.cs:line 498
at LibGit2Sharp.ReferenceCollection.UpdateTarget(String name, String canonicalRefNameOrObjectish, String logMessage) in \libgit2sharp\LibGit2Sharp\ReferenceCollection.cs:line 534
at LibGit2Sharp.ReferenceCollection.UpdateTarget(String name, String canonicalRefNameOrObjectish) in \libgit2sharp\LibGit2Sharp\ReferenceCollection.cs:line 565
at LibGit2Sharp.Tests.RepositoryFixture.CanCreateInMemoryRepositoryWithBackends() in \libgit2sharp\LibGit2Sharp.Tests\RepositoryFixture.cs:line 791
I have not been able to debug down into the libgit2 level, but from what I can tell the issue is in git_reference_create_matching's call to git_reference__log_signature.
Is there a call I am missing that can update a reference in a bare repo without requiring a signature? If any of the libgit2 folks know how to do this in libgit2, I can implement it on the C# side.
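For what it's worth, that error suggests libgit2 is trying to build the reflog signature from the user.name / user.email configuration. A hedged workaround sketch (not from the original post, and assuming the in-memory repository still resolves a writable configuration) would be to set those values before calling UpdateTarget, using the repository and commit names from the test below:
// Sketch only: provide an identity so git_reference__log_signature has something to read.
// Assumes `repository` and `commit2` as in the failing test shown below.
repository.Config.Set("user.name", "Auser");
repository.Config.Set("user.email", "auser@example.com");
repository.Refs.UpdateTarget("refs/heads/master", commit2.Sha);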
As a test, I created two unit tests that perform the same actions In-Memory and on disk, and the In-Memory fails when calling UpdateTarget after creating the second commit. This follows the code on the wiki:
private Commit CreateCommit(Repository repository, string fileName, string content, string message = null)
{
if (message == null)
{
message = "i'm a commit message :)";
}
Blob newBlob = repository.ObjectDatabase.CreateBlobFromContent(content);
// Put the blob in a tree
TreeDefinition td = new TreeDefinition();
td.Add(fileName, newBlob, Mode.NonExecutableFile);
Tree tree = repository.ObjectDatabase.CreateTree(td);
// Committer and author
Signature committer = new Signature("Auser", "auser@example.com", DateTime.Now);
Signature author = committer;
// Create binary stream from the text
return repository.ObjectDatabase.CreateCommit(
author,
committer,
message,
tree,
repository.Commits,
true);
}
[Fact]
public void CanCreateRepositoryWithoutBackends()
{
SelfCleaningDirectory scd = BuildSelfCleaningDirectory();
Repository.Init(scd.RootedDirectoryPath, true);
ObjectId commit1Id;
using (var repository = new Repository(scd.RootedDirectoryPath))
{
Commit commit1 = CreateCommit(repository, "filePath.txt", "Hello commit 1!");
commit1Id = commit1.Id;
repository.Refs.Add("refs/heads/master", commit1.Id);
Assert.Equal(1, repository.Commits.Count());
Assert.NotNull(repository.Refs.Head);
Assert.Equal(1, repository.Refs.Count());
}
using (var repository = new Repository(scd.RootedDirectoryPath))
{
Commit commit2 = CreateCommit(repository, "filePath.txt", "Hello commit 2!");
Assert.Equal(commit1Id, commit2.Parents.First().Id);
repository.Refs.UpdateTarget("refs/heads/master", commit2.Sha);
Assert.Equal(2, repository.Commits.Count());
Assert.Equal(1, repository.Refs.Count());
Assert.NotNull(repository.Refs.Head);
Assert.Equal(commit2.Sha, repository.Refs.Head.ResolveToDirectReference().TargetIdentifier);
}
}
[Fact]
public void CanCreateInMemoryRepositoryWithBackends()
{
OdbBackendFixture.MockOdbBackend odbBackend = new OdbBackendFixture.MockOdbBackend();
RefdbBackendFixture.MockRefdbBackend refdbBackend = new RefdbBackendFixture.MockRefdbBackend();
ObjectId commit1Id;
using (var repository = new Repository())
{
repository.Refs.SetBackend(refdbBackend);
repository.ObjectDatabase.AddBackend(odbBackend, 5);
Commit commit1 = CreateCommit(repository, "filePath.txt", "Hello commit 1!");
commit1Id = commit1.Id;
repository.Refs.Add("refs/heads/master", commit1.Id);
Assert.Equal(1, repository.Commits.Count());
Assert.NotNull(repository.Refs.Head);
Assert.Equal(commit1.Sha, repository.Refs.Head.ResolveToDirectReference().TargetIdentifier);
// Emulating Git, repository.Refs enumerable does not include the HEAD.
// Thus, repository.Refs.Count will be 1 and refdbBackend.References.Count will be 2.
Assert.Equal(1, repository.Refs.Count());
Assert.Equal(2, refdbBackend.References.Count);
}
using (var repository = new Repository())
{
repository.Refs.SetBackend(refdbBackend);
repository.ObjectDatabase.AddBackend(odbBackend, 5);
Commit commit2 = CreateCommit(repository, "filePath.txt", "Hello commit 2!");
Assert.Equal(commit1Id, commit2.Parents.First().Id);
//repository.Refs.UpdateTarget(repository.Refs["refs/heads/master"], commit2.Id);
//var master = repository.Refs["refs/heads/master"];
//Assert.Equal(commit1Id.Sha, master.TargetIdentifier);
repository.Refs.UpdateTarget("refs/heads/master", commit2.Sha); // fails at LibGit2Sharp.Core.Proxy.git_reference_set_target(ReferenceHandle reference, ObjectId id, String logMessage)
//repository.Refs.Add("refs/heads/master", commit2.Id); // fails at LibGit2Sharp.Core.Proxy.git_reference_create(RepositoryHandle repo, String name, ObjectId targetId, Boolean allowOverwrite, String logMessage)
Assert.Equal(2, repository.Commits.Count());
Assert.Equal(1, repository.Refs.Count());
Assert.NotNull(repository.Refs.Head);
Assert.Equal(commit2.Sha, repository.Refs.Head.ResolveToDirectReference().TargetIdentifier);
}
}

JGit log strange behavior after merge

I found a strange behavior (bug?) in the log command.
The test below creates a repo, creates a branch, does some commits both to the created branch and to master, then merges master into the created branch. After the merge it tries to calculate the number of commits between the branch and master. Because master has already been merged, the branch is not behind master, i.e. the corresponding commit count should be 0.
public class JGitBugTest {
@Rule
public TemporaryFolder tempFolder = new TemporaryFolder();
@Test
public void testJGitLogBug() throws Exception {
final String BRANCH_NAME = "TST-2";
final String MASTER_BRANCH_NAME = Constants.MASTER;
File folder = tempFolder.newFolder();
// Create a Git repository
Git api = Git.init().setBare( false ).setDirectory( folder ).call();
Repository repository = api.getRepository();
// Add an initial commit
api.commit().setMessage( "Initial commit" ).call();
// Create a new branch and add some commits to it
api.checkout().setCreateBranch( true ).setName( BRANCH_NAME ).call();
api.commit().setMessage( "TST-2 Added files 1" ).call();
// Add some commits to master branch too
api.checkout().setName( MASTER_BRANCH_NAME ).call();
api.commit().setMessage( "TST-1 Added files 1" ).call();
api.commit().setMessage( "TST-1 Added files 2" ).call();
// If this delay is commented out -- test fails and
// 'behind' is equal to "the number of commits to master - 1".
// Thread.sleep(1000);
// Checkout the branch and merge master to it
api.checkout().setName( BRANCH_NAME ).call();
api.merge()
.include( repository.resolve( MASTER_BRANCH_NAME ) )
.setStrategy( MergeStrategy.RECURSIVE )
.call()
.getNewHead()
.name();
// Calculate the number of commits the branch behind of the master
// It should be zero because we have merged master into the branch already.
Iterable<RevCommit> iterable = api.log()
.add( repository.resolve( MASTER_BRANCH_NAME ) )
.not( repository.resolve( BRANCH_NAME ) )
.call();
int behind = 0;
for( RevCommit commit : iterable ) {
behind++;
}
Assert.assertEquals( 0, behind );
}
}
The above test fails: behind yields the number of commits in master minus 1.
Moreover, if the 'sleep' in line 43 is uncommented, the bug goes away and 'behind' equals 0.
What am I doing wrong? Is it a bug in the JGit library or in my code?
Running the code on Windows, I can reproduce what you describe.
This looks like a bug in JGit to me. I recommend opening a Bugzilla entry or posting your findings to the mailing list.

Switch / Checkout Branch is not working

I'm using version 0.19.
I have a remote branch named 'dev'.
After cloning I want to switch to this branch.
I found some code which performs an update to the branch, but for me it doesn't work.
I also tried to run a checkout after this, which also doesn't work.
When viewing the git log after running the code I see the changesets of the master branch, but the local branch carries the name given for the created branch (e.g. "dev").
What am I doing wrong?
private static Branch SwitchBranch(Repository repo, RepositoryProperties properties)
{
string branchname = properties.Branch;
Branch result = null;
if (!string.IsNullOrWhiteSpace(properties.Branch))
{
Branch remote = null;
foreach (var branch in repo.Branches)
{
if (string.Equals(branch.Name, "origin/" + branchname))
{
remote = branch;
break;
}
}
string localBranchName = properties.Branch;
Branch localbranch = repo.CreateBranch(localBranchName);
Branch updatedBranch = repo.Branches.Update(localbranch,
b =>
{
b.TrackedBranch = remote.CanonicalName;
});
repo.Checkout(updatedBranch);
result = updatedBranch;
}
return result;
}
The XML documentation of the CreateBranch() overload you're using states "Creates a branch with the specified name. This branch will point at the commit pointed at by Repository.Head".
From your question, it looks like you'd like this branch to also point at the same commit as the remote tracking one.
As such, I'd suggest changing your code as follows:
Branch localbranch = repo.CreateBranch(localBranchName, remote.Tip);
Be aware that you can only create the local branch once, so you're going to get an error the second time. At least, I did.
Branch localbranch = repo.Branches.FirstOrDefault(x => !x.IsRemote && x.FriendlyName.Equals(localBranchName));
if (localbranch == null)
{
localbranch = repo.CreateBranch(localBranchName, remote.Tip);
}
Branch updatedBranch = repo.Branches.Update(localbranch,
b =>
{
b.TrackedBranch = remote.CanonicalName;
});
repo.Checkout(updatedBranch);

Delete Amazon S3 buckets? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I've been interacting with Amazon S3 through S3Fox and I can't seem to delete my buckets. I select a bucket, hit delete, confirm the delete in a popup, and... nothing happens. Is there another tool that I should use?
It is finally possible to delete all the files in one go using the new Lifecycle (expiration) rules feature. You can even do it from the AWS console.
Simply right-click on the bucket name in the AWS console, select "Properties", then in the row of tabs at the bottom of the page select "Lifecycle" and "Add rule". Create a lifecycle rule with the "Prefix" field left blank (blank means all files in the bucket, or you could set it to "a" to delete all files whose names begin with "a"). Set the "Days" field to "1". That's it. Done. Assuming the files are more than one day old, they should all get deleted; then you can delete the bucket.
I only just tried this for the first time, so I'm still waiting to see how quickly the files get deleted (it wasn't instant but presumably should happen within 24 hours) and whether I get billed for one delete command or 50 million delete commands... fingers crossed!
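For completeness, the same expiration rule can be applied programmatically. This is only a hedged sketch using the AWS SDK for .NET (another answer below mentions the C# API); the bucket name is a placeholder and the exact class shapes should be checked against the current SDK documentation:
// Sketch: apply a 1-day expiration rule to the whole bucket, mirroring the console steps above.
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;

var client = new AmazonS3Client(); // credentials/region come from the usual SDK configuration sources

var request = new PutLifecycleConfigurationRequest
{
    BucketName = "your-bucket-name", // placeholder
    Configuration = new LifecycleConfiguration
    {
        Rules = new List<LifecycleRule>
        {
            new LifecycleRule
            {
                Id = "expire-everything",
                Status = LifecycleRuleStatus.Enabled,
                // An empty prefix matches every object, like the blank "Prefix" field in the console.
                Filter = new LifecycleFilter
                {
                    LifecycleFilterPredicate = new LifecyclePrefixPredicate { Prefix = "" }
                },
                Expiration = new LifecycleRuleExpiration { Days = 1 }
            }
        }
    }
};

await client.PutLifecycleConfigurationAsync(request);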
Remember that S3 buckets need to be empty before they can be deleted. The good news is that most 3rd-party tools automate this process. If you are running into problems with S3Fox, I recommend trying S3FM for a GUI or S3Sync for the command line. Amazon has a great article describing how to use S3Sync. After setting up your variables, the key command is
./s3cmd.rb deleteall <your bucket name>
Deleting buckets with lots of individual files tends to crash a lot of S3 tools because they try to display a list of all files in the directory. You need to find a way to delete in batches. The best GUI tool I've found for this purpose is Bucket Explorer. It deletes files in an S3 bucket in 1000-file chunks and does not crash when trying to open large buckets the way S3Fox and S3FM do.
I've also found a few scripts that you can use for this purpose. I haven't tried these scripts yet but they look pretty straightforward.
RUBY
require 'aws/s3'
AWS::S3::Base.establish_connection!(
:access_key_id => 'your access key',
:secret_access_key => 'your secret key'
)
bucket = AWS::S3::Bucket.find('the bucket name')
while(!bucket.empty?)
begin
puts "Deleting objects in bucket"
bucket.objects.each do |object|
object.delete
puts "There are #{bucket.objects.size} objects left in the bucket"
end
puts "Done deleting objects"
rescue SocketError
puts "Had socket error"
end
end
PERL
#!/usr/bin/perl
use Net::Amazon::S3;
my $aws_access_key_id = 'your access key';
my $aws_secret_access_key = 'your secret access key';
my $increment = 50; # 50 at a time
my $bucket_name = 'bucket_name';
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id, aws_secret_access_key => $aws_secret_access_key, retry => 1, });
my $bucket = $s3->bucket($bucket_name);
print "Incrementally deleting the contents of $bucket_name\n";
my $deleted = 1;
my $total_deleted = 0;
while ($deleted > 0) {
print "Loading up to $increment keys...\n";
$response = $bucket->list({'max-keys' => $increment, }) or die $s3->err . ": " . $s3->errstr . "\n";
$deleted = scalar(@{ $response->{keys} }) ;
$total_deleted += $deleted;
print "Deleting $deleted keys($total_deleted total)...\n";
foreach my $key ( @{ $response->{keys} } ) {
my $key_name = $key->{key};
$bucket->delete_key($key->{key}) or die $s3->err . ": " . $s3->errstr . "\n";
}
}
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
SOURCE: Tarkblog
Hope this helps!
recent versions of s3cmd have --recursive
e.g.,
~/$ s3cmd rb --recursive s3://bucketwithfiles
http://s3tools.org/kb/item5.htm
With s3cmd:
Create a new empty directory
s3cmd sync --delete-removed empty_directory s3://yourbucket
This may be a bug in S3Fox, because it is generally able to delete items recursively. However, I'm not sure if I've ever tried to delete a whole bucket and its contents at once.
The JetS3t project, as mentioned by Stu, includes a Java GUI applet you can easily run in a browser to manage your S3 buckets: Cockpit. It has both strengths and weaknesses compared to S3Fox, but there's a good chance it will help you deal with your troublesome bucket. Though it will require you to delete the objects first, then the bucket.
Disclaimer: I'm the author of JetS3t and Cockpit
SpaceBlock also makes it simple to delete S3 buckets - right-click the bucket, delete, wait for the job to complete in the transfers view, done.
This is the free and open source Windows S3 front-end that I maintain, so shameless plug alert etc.
I've implemented bucket-destroy, a multi-threaded utility that does everything it takes to delete a bucket. I handle non-empty buckets, as well as version-enabled bucket keys.
You can read the blog post here http://bytecoded.blogspot.com/2011/01/recursive-delete-utility-for-version.html and the instructions here http://code.google.com/p/bucket-destroy/
I've successfully deleted with it a bucket that contains a double '//' in the key name, versioned keys and DeleteMarker keys. Currently I'm running it on a bucket that contains ~40,000,000 objects; so far I've been able to delete 1,200,000 in several hours on an m1.large. Note that the utility is multi-threaded but does not (yet) implement shuffling (which would allow horizontal scaling by launching the utility on several machines).
If you use Amazon's console and need to clear out a bucket on a one-time basis: browse to your bucket, select the top key, scroll to the bottom, press Shift on your keyboard, then click on the bottom one. It will select everything in between, and then you can right-click and delete.
If you have ruby (and rubygems) installed, install aws-s3 gem with
gem install aws-s3
or
sudo gem install aws-s3
create a file delete_bucket.rb:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
AWS::S3::Bucket.delete("bucket_name", :force => true)
and run it:
ruby delete_bucket.rb
Since Bucket#delete returned timeout exceptions a lot for me, I have expanded the script:
require "rubygems" # optional
require "aws/s3"
AWS::S3::Base.establish_connection!(
:access_key_id => 'access_key_id',
:secret_access_key => 'secret_access_key')
while AWS::S3::Bucket.find("bucket_name")
begin
AWS::S3::Bucket.delete("bucket_name", :force => true)
rescue
end
end
I guess the easiest way would be to use S3fm, a free online file manager for Amazon S3. No applications to install, no 3rd-party web site registrations. It runs directly from Amazon S3, secure and convenient.
Just select your bucket and hit delete.
One technique that can be used to avoid this problem is putting all objects in a "folder" in the bucket, allowing you to just delete the folder then go along and delete the bucket. Additionally, the s3cmd tool available from http://s3tools.org can be used to delete a bucket with files in it:
s3cmd rb --force s3://bucket-name
I hacked together a script to do it from Python; it successfully removed my 9000 objects. See this page:
https://efod.se/blog/archive/2009/08/09/delete-s3-bucket
One more shameless plug: I got tired of waiting for individual HTTP delete requests when I had to delete 250,000 items, so I wrote a Ruby script that does it multithreaded and completes in a fraction of the time:
http://github.com/sfeley/s3nuke/
It works much faster in Ruby 1.9 because of the way threads are handled.
This is a hard problem. My solution is at http://stuff.mit.edu/~jik/software/delete-s3-bucket.pl.txt. It describes all of the things I've determined can go wrong in a comment at the top. Here's the current version of the script (if I change it, I'll put a new version at the URL but probably not here).
#!/usr/bin/perl
# Copyright (c) 2010 Jonathan Kamens.
# Released under the GNU General Public License, Version 3.
# See <http://www.gnu.org/licenses/>.
# $Id: delete-s3-bucket.pl,v 1.3 2010/10/17 03:21:33 jik Exp $
# Deleting an Amazon S3 bucket is hard.
#
# * You can't delete the bucket unless it is empty.
#
# * There is no API for telling Amazon to empty the bucket, so you have to
# delete all of the objects one by one yourself.
#
# * If you've recently added a lot of large objects to the bucket, then they
# may not all be visible yet on all S3 servers. This means that even after the
# server you're talking to thinks all the objects are all deleted and lets you
# delete the bucket, additional objects can continue to propagate around the S3
# server network. If you then recreate the bucket with the same name, those
# additional objects will magically appear in it!
#
# It is not clear to me whether the bucket delete will eventually propagate to
# all of the S3 servers and cause all the objects in the bucket to go away, but
# I suspect it won't. I also suspect that you may end up continuing to be
# charged for these phantom objects even though the bucket they're in is no
# longer even visible in your S3 account.
#
# * If there's a CR, LF, or CRLF in an object name, then it's sent just that
# way in the XML that gets sent from the S3 server to the client when the
# client asks for a list of objects in the bucket. Unfortunately, the XML
# parser on the client will probably convert it to the local line ending
# character, and if it's different from the character that's actually in the
# object name, you then won't be able to delete it. Ugh! This is a bug in the
# S3 protocol; it should be enclosing the object names in CDATA tags or
# something to protect them from being munged by the XML parser.
#
# Note that this bug even affects the AWS Web Console provided by Amazon!
#
# * If you've got a whole lot of objects and you serialize the delete process,
# it'll take a long, long time to delete them all.
use threads;
use strict;
use warnings;
# Keys can have newlines in them, which screws up the communication
# between the parent and child processes, so use URL encoding to deal
# with that.
use CGI qw(escape unescape); # Easiest place to get this functionality.
use File::Basename;
use Getopt::Long;
use Net::Amazon::S3;
my $whoami = basename $0;
my $usage = "Usage: $whoami [--help] --access-key-id=id --secret-access-key=key
--bucket=name [--processes=#] [--wait=#] [--nodelete]
Specify --processes to indicate how many deletes to perform in
parallel. You're limited by RAM (to hold the parallel threads) and
bandwidth for the S3 delete requests.
Specify --wait to indicate seconds to require the bucket to be verified
empty. This is necessary if you create a huge number of objects and then
try to delete the bucket before they've all propagated to all the S3
servers (I've seen a huge backlog of newly created objects take *hours* to
propagate everywhere). See the comment at the top of the script for more
information about this issue.
Specify --nodelete to empty the bucket without actually deleting it.\n";
my($aws_access_key_id, $aws_secret_access_key, $bucket_name, $wait);
my $procs = 1;
my $delete = 1;
die if (! GetOptions(
"help" => sub { print $usage; exit; },
"access-key-id=s" => \$aws_access_key_id,
"secret-access-key=s" => \$aws_secret_access_key,
"bucket=s" => \$bucket_name,
"processess=i" => \$procs,
"wait=i" => \$wait,
"delete!" => \$delete,
));
die if (! ($aws_access_key_id && $aws_secret_access_key && $bucket_name));
my $increment = 0;
print "Incrementally deleting the contents of $bucket_name\n";
$| = 1;
my(@procs, $current);
for (1..$procs) {
my($read_from_parent, $write_to_child);
my($read_from_child, $write_to_parent);
pipe($read_from_parent, $write_to_child) or die;
pipe($read_from_child, $write_to_parent) or die;
threads->create(sub {
close($read_from_child);
close($write_to_child);
my $old_select = select $write_to_parent;
$| = 1;
select $old_select;
&child($read_from_parent, $write_to_parent);
}) or die;
close($read_from_parent);
close($write_to_parent);
my $old_select = select $write_to_child;
$| = 1;
select $old_select;
push(@procs, [$read_from_child, $write_to_child]);
}
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
my $deleted = 1;
my $total_deleted = 0;
my $last_start = time;
my($start, $waited);
while ($deleted > 0) {
$start = time;
print "\nLoading ", ($increment ? "up to $increment" :
"as many as possible")," keys...\n";
my $response = $bucket->list({$increment ? ('max-keys' => $increment) : ()})
or die $s3->err . ": " . $s3->errstr . "\n";
$deleted = scalar(@{ $response->{keys} }) ;
if (! $deleted) {
if ($wait and ! $waited) {
my $delta = $wait - ($start - $last_start);
if ($delta > 0) {
print "Waiting $delta second(s) to confirm bucket is empty\n";
sleep($delta);
$waited = 1;
$deleted = 1;
next;
}
else {
last;
}
}
else {
last;
}
}
else {
$waited = undef;
}
$total_deleted += $deleted;
print "\nDeleting $deleted keys($total_deleted total)...\n";
$current = 0;
foreach my $key ( @{ $response->{keys} } ) {
my $key_name = $key->{key};
while (! &send(escape($key_name) . "\n")) {
print "Thread $current died\n";
die "No threads left\n" if (#procs == 1);
if ($current == #procs-1) {
pop #procs;
$current = 0;
}
else {
$procs[$current] = pop @procs;
}
}
$current = ($current + 1) % @procs;
threads->yield();
}
print "Sending sync message\n";
for ($current = 0; $current < @procs; $current++) {
if (! &send("\n")) {
print "Thread $current died sending sync\n";
if ($current == @procs-1) {
pop @procs;
last;
}
$procs[$current] = pop @procs;
$current--;
}
threads->yield();
}
print "Reading sync response\n";
for ($current = 0; $current < @procs; $current++) {
if (! &receive()) {
print "Thread $current died reading sync\n";
if ($current == @procs-1) {
pop @procs;
last;
}
$procs[$current] = pop @procs;
$current--;
}
threads->yield();
}
}
continue {
$last_start = $start;
}
if ($delete) {
print "Deleting bucket...\n";
$bucket->delete_bucket or die $s3->err . ": " . $s3->errstr;
print "Done.\n";
}
sub send {
my($str) = @_;
my $fh = $procs[$current]->[1];
print($fh $str);
}
sub receive {
my $fh = $procs[$current]->[0];
scalar <$fh>;
}
sub child {
my($read, $write) = @_;
threads->detach();
my $s3 = Net::Amazon::S3->new({aws_access_key_id => $aws_access_key_id,
aws_secret_access_key => $aws_secret_access_key,
retry => 1,
});
my $bucket = $s3->bucket($bucket_name);
while (my $key = <$read>) {
if ($key eq "\n") {
print($write "\n") or die;
next;
}
chomp $key;
$key = unescape($key);
if ($key =~ /[\r\n]/) {
my(@parts) = split(/\r\n|\r|\n/, $key, -1);
my(@guesses) = shift @parts;
foreach my $part (@parts) {
@guesses = (map(($_ . "\r\n" . $part,
$_ . "\r" . $part,
$_ . "\n" . $part), @guesses));
}
foreach my $guess (@guesses) {
if ($bucket->get_key($guess)) {
$key = $guess;
last;
}
}
}
$bucket->delete_key($key) or
die $s3->err . ": " . $s3->errstr . "\n";
print ".";
threads->yield();
}
return;
}
I am one of the developers on the Bucket Explorer team. We provide different options to delete a bucket, as per the user's choice:
1) Quick Delete - this option deletes your data from the bucket in chunks of 1000.
2) Permanent Delete - this option deletes objects in a queue.
How to delete Amazon S3 files and bucket?
Amazon recently added a new feature, "Multi-Object Delete", which allows up to 1,000 objects to be deleted at a time with a single API request. This should simplify the process of deleting huge numbers of files from a bucket.
The documentation for the new feature is available here: http://docs.amazonwebservices.com/AmazonS3/latest/dev/DeletingMultipleObjects.html
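As a rough illustration of what that looks like from code, here is a hedged sketch using the AWS SDK for .NET (one of the answers below mentions using the C# API); the bucket name is a placeholder, the calls should be checked against the current SDK docs, and it does not handle versioned objects:
// Sketch: list keys page by page and remove them 1,000 at a time with Multi-Object Delete,
// then delete the (now empty) bucket. "your-bucket-name" is a placeholder.
using Amazon.S3;
using Amazon.S3.Model;

var client = new AmazonS3Client();
var bucketName = "your-bucket-name";
var listRequest = new ListObjectsV2Request { BucketName = bucketName, MaxKeys = 1000 };

ListObjectsV2Response listResponse;
do
{
    listResponse = await client.ListObjectsV2Async(listRequest);

    if (listResponse.S3Objects != null && listResponse.S3Objects.Count > 0)
    {
        var deleteRequest = new DeleteObjectsRequest { BucketName = bucketName };
        foreach (var obj in listResponse.S3Objects)
        {
            deleteRequest.AddKey(obj.Key); // up to 1,000 keys per DeleteObjects request
        }
        await client.DeleteObjectsAsync(deleteRequest);
    }

    listRequest.ContinuationToken = listResponse.NextContinuationToken;
} while (listResponse.IsTruncated == true);

await client.DeleteBucketAsync(new DeleteBucketRequest { BucketName = bucketName });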
I've always ended up using their C# API and little scripts to do this. I'm not sure why S3Fox can't do it, but that functionality appears to be broken within it at the moment. I'm sure that many of the other S3 tools can do it as well, though.
Delete all of the objects in the bucket first. Then you can delete the bucket itself.
Apparently, one cannot delete a bucket with objects in it and S3Fox does not do this for you.
I've had other little issues with S3Fox myself, like this, and now use a Java-based tool, jets3t, which is more forthcoming about error conditions. There must be others, too.
You must make sure you have the correct write permission set for the bucket, and the bucket contains no objects.
Some useful tools that can assist your deletion: CrossFTP, to view and delete the buckets like an FTP client, and the jets3t tool as mentioned above.
I'll have to have a look at some of these alternative file managers. I've used (and like) BucketExplorer, which you can get from - surprisingly - http://www.bucketexplorer.com/.
It's a 30-day free trial, then (currently) costing US$49.99 per licence (US$49.95 on the purchase cover page).
Try https://s3explorer.appspot.com/ to manage your S3 account.
This is what I use - just simple Ruby code.
case bucket.size
when 0
puts "Nothing left to delete"
when 1..1000
bucket.objects.each do |item|
item.delete
puts "Deleting - #{bucket.size} left"
end
end
Use the Amazon web management console, with Google Chrome for speed. It deleted the objects a lot faster than Firefox (about 10 times faster). I had 60,000 objects to delete.