How to export file names and urls from s3 to csv? - amazon-s3

Is it possible to export all the files and their URLs from an S3 bucket to csv file?
I tried using this tool but it doesn't export URLs.

It is certainly possible using the Get Bucket command from the REST API. But you'll need to do something programmatically to parse the response for your asset names formatted into your CSV as you like. Since there is an API available most tools out there (like the one you've found) are not super rich with features.
GET / HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: Wed, 12 Oct 2009 17:50:00 GMT
Authorization: AWS AKIAIOSFODNN7EXAMPLE:xQE0diMbLRepdf3YB+FIEXAMPLE=
Content-Type: text/plain
Response example:
<?xml version="1.0" encoding="UTF-8"?> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>bucket</Name>
<Prefix/>
<Marker/>
<MaxKeys>1000</MaxKeys>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>my-image.jpg</Key>
<LastModified>2009-10-12T17:50:30.000Z</LastModified>
<ETag>"fba9dede5f27731c9771645a39863328"</ETag>
<Size>434234</Size>
<StorageClass>STANDARD</StorageClass>
<Owner>
<ID>75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a</ID>
<DisplayName>mtd#amazon.com</DisplayName>
</Owner>
</Contents>
<Contents>
<Key>my-third-image.jpg</Key>
<LastModified>2009-10-12T17:50:30.000Z</LastModified>
<ETag>"1b2cf535f27731c974343645a3985328"</ETag>
<Size>64994</Size>
<StorageClass>STANDARD</StorageClass>
<Owner>
<ID>75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a</ID>
<DisplayName>mtd#amazon.com</DisplayName>
</Owner>
</Contents> </ListBucketResult>

Thought this might help. Wrote a quick JS script to export the objects in an S3 Bucket to a JSON file instead of XML. Hope it helps!
https://github.com/springerkc/s3-bucket-exporter

Related

Unable to get the location of the bucket in s3

I am sending the following request
GET http://bucketname.s3.amazonaws.com/?location
Host: bucketname.s3.amazonaws.com
x-amz-date: 20170531T082529Z
Authorization: AWS
I am getting following error
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>?location</Key>
<RequestId>7E63980AAE</RequestId>
<HostId>HostID</HostId>
</Error>
Seems like it is trying to get the object with ?location name. but As per Amazon API page http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGETlocation.html
it should return the location of the bucket

AWS signature v4 authentication succeeds for EU bucket but fails for US bucket?

I recently implemented AWS Signature version 4 using the REST API. This is verified by an extensive regression test working perfectly.
The problem I'm experiencing is that the regression test succeeds when run against a bucket residing in the eu-central-1 region, but consistently fails with the Accessed Denied error message for buckets residing in us-east-1 or us-west-2.
Here are snippets from successful and failed attempts.
eu-central-1 : successful
HTTP request:
GET./
host:s3.eu-central-1.amazonaws.com.x-amz-content-sha256:e3b0...b855.x-amz-date:Wed, 25 May 2016 03:13:21 +0000
host;x-amz-content-sha256;x-amz-date.e3b0...b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/eu-central-1/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=cf5f...4dc8
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult
xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Owner>
<ID>100a...a575</ID>
</Owner>
<Buckets>
<Bucket>
. . .
</Bucket>
</Buckets>
</ListAllMyBucketsResult>
us-east-1 : failed
HTTP request:
GET./
host:s3.us-east-1.amazonaws.com.x-amz-content-sha256:e3b0...b855.x-amz-date:Wed, 25 May 2016 03:02:27 +0000
host;x-amz-content-sha256;x-amz-date.e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/us-east-1/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=01e97...4d00
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>92EEF2A86ECA88EF</RequestId>
<HostId>i3wTU6OzBrlX89xR4KnnezBx1Tb2IGN2wtgPJMRtKLjHxF/B6VdCQqPz1279J7e5</HostId>
</Error>
us-west-2 : failed
HTTP request:
GET./
host:s3.us-west-2.amazonaws.com.x-amz-content-sha256:e3b0...b855.x-amz-date:Wed, 25 May 2016 07:04:47 +0000
host;x-amz-content-sha256;x-amz-date.e3b0...b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/us-west-2/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=cf70...36b9
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>DB143DBF0F316EB8</RequestId>
<HostId>5hWJ0AHM466QcT+BK4UaEFpqXFNaJFEuAPlN/ZZPBhL+NDYBoGaySRkXQ3BRdyfy9PBDuSb0oHA=</HostId>
</Error>
Attempts made to date include:
I found references (like here) where when using US Standard (i.e., us-east-1) the REST endpoint should not include "us-east-1". I have not yet found this written officially. I therefore created a us-west-2 bucket, in the hope that the REST endpoint needs to contain "us-west-2", but that also fails.
I searched on Google and StackOverflow for possible reasons for "Access Denied", which led me to adding a bucket policy that gives permissions to all -- to no avail.
The permissions of the EU and US accounts in the AWS console look the same, so no hint there, yet.
I added logging to the buckets in the hope of seeing a failure entry, but nothing is logged until authentication is completed.
Does anyone have an idea why AWS v4 authentication will consistently succeed for an eu-central-1 bucket, but equally fail for us-east-1 and us-east-2 buckets?
Here's your issue.
For unknown reasons,¹ eu-central-1 is an oddball in S3. The REST endpoint works with two variations in hostname: bucket.s3.eu-central-1.amazonaws.com or bucket.s3-eu-central-1.amazonaws.com.
The difference is the dot or dash after s3.
All other regions (as of now) except us-east-1 and ap-northeast-2 (which is just like eu-central-1) work only with the dash after s3, e.g. bucket.s3-us-west-2.amazonaws.com... not with a dot.
And us-east-1 expects either bucket.s3.amazonaws.com or bucket.s3-external-1.amazonaws.com.
And finally, any region will work with just bucket.s3.amazonaws.com within a few minutes after the original creation of a bucket, because the DNS is integrated with the bucket location database and automatically routes requests to the right place, for each bucket.
But note that when you sign the requests, you always use the actual region name in the signing algorithm itself -- not the endpoint -- as you appear to already be doing.
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
¹I'll speculate that this convention is actually the "new normal" for new regions -- it's more consistent with other AWS services. S3 is one of the oldest, so it makes sense that legacy design decisions are more likely to exist, as seems to be the case, here.

Does Amazon S3 need time to update CORS settings? How long?

Recently I enabled Amazon S3 + CloudFront to serve as CDN for my rails application. In order to use font assets and display them in Firefox or IE, I have to enable CORS on my S3 bucket.
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
<AllowedOrigin>*</AllowedOrigin>
<AllowedMethod>GET</AllowedMethod>
<AllowedMethod>POST</AllowedMethod>
<AllowedMethod>PUT</AllowedMethod>
<MaxAgeSeconds>3000</MaxAgeSeconds>
<AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
Then I used curl -I https://small-read-staging-assets.s3.amazonaws.com/staging/assets/settings_settings-312b7230872a71a534812e770ec299bb.js.gz, I got:
HTTP/1.1 200 OK
x-amz-id-2: Ovs0D578kzW1J72ej0duCi17lnw+wZryGeTw722V2XOteXOC4RoThU8t+NcXksCb
x-amz-request-id: 52E934392E32679A
Date: Tue, 04 Jun 2013 02:34:50 GMT
Cache-Control: public, max-age=31557600
Content-Encoding: gzip
Expires: Wed, 04 Jun 2014 08:16:26 GMT
Last-Modified: Tue, 04 Jun 2013 02:16:26 GMT
ETag: "723791e0c993b691c442970e9718d001"
Accept-Ranges: bytes
Content-Type: text/javascript
Content-Length: 39140
Server: AmazonS3
Should I see 'Access-Control-Allow-Origin' some where? Does S3 take time to update CORS settings? Can I force expiring headers if its caching them?
Try sending the Origin header:
$ curl -v -H "Origin: http://example.com" -X GET https://small-read-staging-assets.s3.amazonaws.com/staging/assets/settings_settings-312b7230872a71a534812e770ec299bb.js.gz > /dev/null
The output should then show the CORS response headers you are looking for:
< Access-Control-Allow-Origin: http://example.com
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Credentials: true
< Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
Additional information about how to debug CORS requests with cURL can be found here:
How can you debug a CORS request with cURL?
Note that there are different types of CORS requests (simple and preflight), a nice tutorial about the differences can be found here:
http://www.html5rocks.com/en/tutorials/cors/
Hope this helps!
Try these:
Try to scope-down the domain names you want to allow access to. S3 doesn't like *.
CloudFront + S3 doesn't handle the CORS configuration correctly out of the box. A kludge is to append a query string containing the name of the referring domain, and explicitly enable support for query strings in your CloudFront distribution settings.
To answer the actual question in the title:
No, S3 does not seem to take any time to propagate the CORS settings. (as of 2019)
However, if you're using Chrome (and maybe others), then CORS settings may be cached by the browser so you won't necessarily see the changes you expect if you just do an ordinary browser refresh. Instead right click on the refresh button and choose "Empty Cache and Hard Reload" (as of Chrome 73). Then the new CORS settings will take effect within <~5 seconds of making the change in the AWS console. (It may be much faster than that. Haven't tested.) This applies to a plain S3 bucket. I don't know how CloudFront affects things.
(I realize this question is 6 years old and may have involved additional technical issues that other people have long since answered, but when you search for the simple question of propagation times for CORS changes, this question is what pops up first, so I think it deserves an answer that addresses that.)
You have a few problems with the way you test CORS.
Your CORS configuration does not have a HEAD method.
Your curl command does not have -H header.
I am able to get your data by using curl like following. However they dumped garbage on my screen because your data is compressed binary.
curl --request GET https://small-read-staging-assets.s3.amazonaws.com/staging/assets/settings_settings-312b7230872a71a534812e770ec299bb.js.gz -H "http://google.com"

Making a SOAP request in Drupal?

I am trying to implement a SOAP call with Drupal 6 with the following format:
POST /0_5/ClassService.asmx HTTP/1.1
Host: api.mindbodyonline.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://clients.mindbodyonline.com/api/0_5/AddClientsToClasses"
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<AddClientsToClasses xmlns="http://clients.mindbodyonline.com/api/0_5">
<Request>
<ClientIDs>
<string>string</string>
<string>string</string>
</ClientIDs>
<ClassIDs>
<int>int</int>
<int>int</int>
</ClassIDs>
<Test>boolean</Test>
<RequirePayment>boolean</RequirePayment>
</Request>
</AddClientsToClasses>
</soap:Body>
</soap:Envelope>
I am new to SOAP and all the web documentation doesn't work for Drupal. Also, I have to make this call in SOAP (not HTTP GET or POST).
How would I make a SOAP call in Drupal? Can you provide a working code example using the above example request format?
Drupal doesnt have any specific soap functionality - you can use the built in PHP client. There should be a WSDL file you can use to generate your soap client. Something like this:
<?php
$client = new SoapClient("http://localhost/code/soap.wsdl");
$something = $client->HelloWorld(array());
echo $something->HelloWorldResult;
die();
Refer to PHP's standard documentation http://php.net/manual/en/book.soap.php
Dude just use the module service 3 it contains all you need . you'll make a (REST, XMLRPC, JSON, JSON-RPC, SOAP, AMF) call also in order to do this in drupal pragmatically you must install soap server to drupal too ...
Follow this link to know more about service module .
http://drupal.org/project/services
this one of drupal amazing modules

What exactly will Amazon return if asked to list all the buckets?

As I have to parse it, I must know what the returning data will be structured into?
The GET operation on the Service endpoint (s3.amazonaws.com) returns a list of all of the buckets owned by the authenticated sender of the request.
Sample Request:
GET / HTTP/1.1
Host: s3.amazonaws.com
Date: Wed, 01 Mar 2009 12:00:00 GMT
Authorization: AWS 15B4D3461F177624206A:xQE0diMbLRepdf3YB+FIEXAMPLE=
Sample Response:
<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
<Owner>
<ID>bcaf1ffd86f461ca5fb16fd081034f</ID>
<DisplayName>webfile</DisplayName>
</Owner>
<Buckets>
<Bucket>
<Name>quotes;/Name>
<CreationDate>2006-02-03T16:45:09.000Z</CreationDate>
</Bucket>
<Bucket>
<Name>samples</Name>
<CreationDate>2006-02-03T16:41:58.000Z</CreationDate>
</Bucket>
</Buckets>
</ListAllMyBucketsResult>
Source: S3 REST API » Operations on the Service » GET Service
The S3 API is described here.