s3.exe: S3 PUT not working when bucket contains hyphens

I'm trying to use s3.exe, a windows CLI for S3 from s3.codeplex.com, to PUT an object.
Here is the command I'm running:
c:\>s3 put My-Bucket file.txt /key:MYKEY /secret:MYSECRET
It returns: <403> Forbidden.
But when I try to PUT the file into a bucket without a hyphen, it works.
c:\>s3 put MyNoHyphenBucket file.txt /key:MYKEY /secret:MYSECRET
Can someone else try it and see if they have the same issue? Any help on how to get it working with hyphenated bucket names would be greatly appreciated.
I'd be open to trying alternative s3 CLI for Windows.

Are you using an EU or NA bucket?
I found this:
"European Bucket allows only lower case letters. Although Buckets created in the US may contain lower case and upper case both, Amazon recommends that you use all lower case letters when creating a bucket."
Apparently whatever's behind that also impacts hyphens.
With an EU bucket, I get the same behaviour (403) as you. Repeating the experiment with an NA bucket, it succeeds.

I saw this error with non-US buckets.
So I created a US bucket (select region US Standard when creating it) and everything works fine!
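If you are open to a different Windows CLI, one option (my suggestion, not something from this thread) is the official AWS CLI, which signs requests for the bucket's actual region and handles hyphenated bucket names. A minimal sketch, assuming the bucket lives in eu-west-1 (adjust the region to match yours):
c:\>aws configure
c:\>aws s3 cp file.txt s3://my-bucket/file.txt --region eu-west-1
aws configure prompts once for the access key and secret and stores them, so they don't need to be passed on every command.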

Related

Azure Synapse Lookup UserErrorFileNotFound with Wildcard path

I am facing an odd issue where my lookup returns a FileNotFound error when I use a wildcard path. If I specify an exact file path, the lookup runs without error. However, if I replace the filename with a *, I get a FileNotFound error.
The file is Data_643.json, located in my Azure Data Lake Storage Gen2, under the labournavigatorfile system. The exact file path is:
labournavigatorfile/raw_data/Scraped/HeadHunter/Saudi_Arabia/Data_643.json.
If I put this exact path into the Integration dataset configuration, the pipeline runs without issue. However, as soon as I replace 'Data_643.json' with a '*', the pipeline fails with a FileNotFound error.
What am I doing wrong? Many thanks for any support. This must be something very simple that I am missing.
Exact path works:
Wildcard path throws error:
I have 3 files in my container, file1.json, file2.json, and file3.json, as shown below:
The following is how I configured my dataset to read using a wildcard, with the same configuration as in the image provided in the question.
When I used this in a lookup, I got the same error:
To overcome this, go to your lookup activity. When you want to use wildcards to read a file or files, check the Wildcard file path option, then specify the folder structure and use a wildcard where required. The following is an image for reference.
The following is the debug output when I run the pipeline (each of my files had 10 rows):
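For reference, the same wildcard settings show up in the pipeline JSON under the lookup source's store settings. The fragment below is only an assumption about how that might look for the folder from the question, not the exact definition:
"source": {
    "type": "JsonSource",
    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "wildcardFolderPath": "raw_data/Scraped/HeadHunter/Saudi_Arabia",
        "wildcardFileName": "*.json"
    }
}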

Fluentd - S3 output plugin - Impossible to use environment variable as part of S3 path

Here is my use case:
We have hundreds of Kubernetes pods that generate logs and send them to S3. Because of performance issues with the current solution, we are trying to run Fluentd as a sidecar on those pods. Fluentd will send the logs to S3 under a path that includes some variables.
Here is the problematic line from the helm chart (in the match section):
path logs/label/{{ $logName }}/env/dt=%Y-%m-%d/hr=%H/host="#{ENV['POD_NAME']}"/
This line almost works as expected, except for the last part with the pod name, even though the POD_NAME variable is defined in the container.
The S3 output creates a folder literally named host="#{ENV['POD_NAME']}" rather than using the value of the environment variable.
Any help is welcome.
Thank you.
To enable embedded code evaluation, you need to enclose the value in double quotes.
Syntax:
parameter "#{ <embedded-code-here> }"
So, your required configuration could be:
path "logs/label/{{ $logName }}/env/dt=%Y-%m-%d/hr=%H/host=#{ENV['POD_NAME']}/"
or this, using string concatenation:
path "#{ 'logs/label/{{ $logName }}/env/dt=%Y-%m-%d/hr=%H/host=' + ENV['POD_NAME'] + '/' }"
For more on this, see:
https://docs.fluentd.org/configuration/config-file#embedded-ruby-code
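For context, the corrected path sits inside the S3 <match> block of the Fluentd config. A minimal sketch, with placeholder values for the bucket, region, and buffer settings (none of these are from the question):
<match **>
  @type s3
  # Placeholder bucket and region - replace with your own
  s3_bucket my-log-bucket
  s3_region eu-west-1
  # Double quotes make Fluentd evaluate the #{...} Ruby snippet when the config is parsed
  path "logs/label/{{ $logName }}/env/dt=%Y-%m-%d/hr=%H/host=#{ENV['POD_NAME']}/"
  <buffer time>
    timekey 3600
  </buffer>
</match>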

Exporting large file from BigQuery to Google Cloud Storage using wildcard

I have an 8 GB table in BigQuery that I'm trying to export to Google Cloud Storage (GCS). If I specify the URI as it is, I get an error:
Errors:
Table gs://***.large_file.json too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data. (error code: invalid)
Okay... I'm specifying * in the file name, but it exports it in 2 files: one of 7.13 GB and one of ~150 MB.
UPD: I thought I should get about 8 files of 1 GB each? Am I wrong? Or what am I doing wrong?
P.S. I tried this in the web UI as well as using the Java library.
For tables above a certain size, BigQuery will export to multiple GCS files - that's why it asks for the "*" glob.
Once you have multiple files in GCS, you can join them into one with the compose operation:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
https://cloud.google.com/storage/docs/gsutil/commands/compose
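As a sketch of the command-line equivalent (dataset, table, and bucket names below are placeholders, not from the question): export with a wildcard URI, then stitch the resulting shards back together with compose:
bq extract --destination_format NEWLINE_DELIMITED_JSON mydataset.mytable gs://my-bucket/large_file_*.json
gsutil compose gs://my-bucket/large_file_000000000000.json gs://my-bucket/large_file_000000000001.json gs://my-bucket/large_file.json
Keep in mind that a single compose call accepts a limited number of source objects (32), so a very large export may need to be composed in stages.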
To export it to GCS, you have to go to the table and click EXPORT > Export to GCS.
This opens the following screen:
In Select GCS location you define the bucket, the folder, and the file.
For instance, say you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and you want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location):
daria_bucket/test.csv
Because the file is too big, you're getting an error. To fix it, you'll have to break it down into more files using a wildcard. So you'll need to add *, just like this:
daria_bucket/test*.csv
That is going to store, inside the bucket daria_bucket, all the data extracted from the table in more than one file, named test000000000000, test000000000001, test000000000002, ... testX.
In my case (more than a year after the question was asked), using a random table of 1.25 GB, I got 16 files of 80.3 MB each.

cfdirectory replacing spaces with + characters when action=list on an S3 folder

I am uploading a file that contains spaces in its name to Amazon S3 using cffile action="upload". The file name is burger+beans n beetroot.jpg.
As you can see, the name contains spaces and a plus sign.
When I read the directory to list its contents, the file name returned by ColdFusion in the query is: burger+beans+n+beetroot.jpg. However, when viewing the file using the Amazon S3 Browser, it is correctly listed as: burger+beans n beetroot.jpg. So it appears ColdFusion is replacing the spaces with + signs.
Does anyone know why this happens and if there is a way to disable this? I tried using both the DirectoryList() method as well as the <cfdirectory action="list"> tag, and both do this.
Please note: I am aware that the file name could be cleaned up before processing - that's a workaround, but not the solution I am looking for. Thanks!
I believe this is not a CF problem; it's an S3 problem. They send out their file names escaped. Which makes this a non-answer.
I created a folder in an S3 bucket. Then I uploaded a file named burger+beans n beetroot.jpg. I can see the file properly named in the AWS console. I select it, then in the Actions menu select Download. I get the modal window. Take a look at the URL in the browser footer - the file name is escaped.
I right-click their link and choose "Save Link As..." - the file name is escaped as well.
So I don't think there is anything you can do once the file is up there. You'll need to clean it before uploading. I know it's not what you want to hear.
Try URL-encoding the filename, so the + sign will be converted to its URL-encoded form (%2B) and can no longer be confused with a space. You could use URLEncodedFormat, but make sure that the path to the file isn't URL-encoded as well.
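A minimal CFML sketch of that idea (whether it fits your upload flow is an assumption on my part):
<!--- Encode the name before uploading: the literal + becomes %2B, so it can no longer be confused with a space --->
<cfset rawName = "burger+beans n beetroot.jpg">
<cfset safeName = URLEncodedFormat(rawName)>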

How to get the first 100 lines of a file on S3?

I have a huge (~6 GB) file on Amazon S3 and want to get the first 100 lines of it without having to download the whole thing. Is this possible?
Here's what I'm doing now:
aws s3 cp s3://foo/bar - | head -n 100
But this takes a while to execute. I'm confused -- shouldn't head close the pipe once it has read enough lines, causing aws s3 cp to crash with a BrokenPipeError before it has time to download the entire file?
Using the Range HTTP header in a GET request, you can retrieve a specific range of bytes in an object stored in Amazon S3. (see http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html)
If you use the AWS CLI, you can use aws s3api get-object --range bytes=0-xxx; see http://docs.aws.amazon.com/cli/latest/reference/s3api/get-object.html
It is not exactly a number of lines, but it should allow you to retrieve part of your file and so avoid downloading the full object.
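For example, a rough sketch using the question's bucket and key (the 1 MB byte range is an arbitrary guess at how much covers 100 lines):
aws s3api get-object --bucket foo --key bar --range bytes=0-1048575 /tmp/bar-head
head -n 100 /tmp/bar-head
If the first 100 lines turn out to be longer than the requested range, increase the range and rerun.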