Let's say I have the following code in boto3:
import logging
import os

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

s3_client = boto3.client("s3")


def upload_file(file_name, bucket, object_name=None):
    """
    Upload a file to an S3 bucket.
    """
    try:
        if object_name is None:
            object_name = os.path.basename(file_name)
        response = s3_client.upload_file(
            file_name, bucket, object_name)
    except ClientError:
        logger.exception('Could not upload file to S3 bucket.')
        raise
    else:
        return response
This works fine against the actual AWS environment. Now I'm introducing LocalStack as a testing framework before doing the actual AWS upload.
My question is how to add LocalStack to this script without changing the code.
I know that if I add an endpoint_url to the boto3 client, it will work, but only for LocalStack.
Is there any way I can use the same script for both, so that LocalStack is used when testing locally and the actual AWS is used everywhere else?
You can easily create a boto3 client that interacts with your LocalStack instance. Here is how you can modify your script for that purpose:
endpoint_url = "http://localhost.localstack.cloud:4566"
# alternatively, to use HTTPS endpoint on port 443:
# endpoint_url = "https://localhost.localstack.cloud"
s3_client = boto3.client("s3", endpoint_url=endpoint_url)
def upload_file(file_name, bucket, object_name=None):
    """
    Upload a file to an S3 bucket.
    """
    try:
        if object_name is None:
            object_name = os.path.basename(file_name)
        response = s3_client.upload_file(
            file_name, bucket, object_name)
    except ClientError:
        logger.exception('Could not upload file to S3 bucket.')
        raise
    else:
        return response
Alternatively, if you prefer to (or need to) set the endpoints directly, you can use the $LOCALSTACK_HOSTNAME environment variable which is available when executing user code in LocalStack:
import os
endpoint_url = f"http://{os.getenv("LOCALSTACK_HOSTNAME")}:{os.getenv("EDGE_PORT")}"
client = boto3.client("s3", endpoint_url=endpoint_url)
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in:
# https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import bigquery_datatransfer_v1


def sample_start_manual_transfer_runs():
    # Create a client
    client = bigquery_datatransfer_v1.DataTransferServiceClient()

    # Initialize request argument(s)
    request = bigquery_datatransfer_v1.StartManualTransferRunsRequest(
    )

    # Make the request
    response = client.start_manual_transfer_runs(request=request)

    # Handle the response
    print(response)
I am trying to write a script that manually triggers an S3-to-BigQuery transfer. However, I am unsure how to integrate the other types and functions provided by DataTransferServiceClient.
For example, how do I integrate transfer_config into the script above? Also, I am not quite sure how to get the config_id from a transfer_config once I have it.
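As a rough sketch rather than a definitive answer: StartManualTransferRunsRequest takes the transfer config's resource name as its parent field, and that name already ends in the config_id, so listing the configs gives you both. The project and location values below are placeholders you would need to replace:

import time

from google.cloud import bigquery_datatransfer_v1
from google.protobuf import timestamp_pb2

client = bigquery_datatransfer_v1.DataTransferServiceClient()

# List the transfer configs under a project/location (placeholder values).
parent = "projects/my-project/locations/us"
for transfer_config in client.list_transfer_configs(parent=parent):
    # transfer_config.name has the form
    # "projects/<project>/locations/<location>/transferConfigs/<config_id>",
    # so the config_id is simply the last path segment.
    config_id = transfer_config.name.split("/")[-1]
    print(transfer_config.display_name, config_id)

# Trigger a manual run for the last config seen, passing its name as parent.
run_time = timestamp_pb2.Timestamp()
run_time.FromSeconds(int(time.time()))
request = bigquery_datatransfer_v1.StartManualTransferRunsRequest(
    parent=transfer_config.name,
    requested_run_time=run_time,
)
response = client.start_manual_transfer_runs(request=request)
print(response)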
Quick summary, now that I think I see the problem:
rclone seems to always send an ACL with a copy request, with a default value of "private". This fails against a (2022) default AWS bucket, which (correctly) has ACLs disabled. I need a way to suppress sending the ACL in rclone.
Detail
I assume an IAM role and attempt to do an rclone copy from a data center Linux box to a default options private no-ACL bucket in the same account as the role I assume. It succeeds.
I then configure a default options private no-ACL bucket in another account than the role I assume. I attach a bucket policy to the cross-account bucket that trusts the role I assume. The role I assume has global permissions to write S3 buckets anywhere.
I test the cross-account bucket policy by using the AWS CLI to copy the same Linux box source file to the cross-account bucket. The copy works fine with the AWS CLI, suggesting that the connection and access permissions to the cross-account bucket are fine. DataSync (another AWS service) works fine too.
Problem: an rclone copy fails with the AccessControlListNotSupported error below.
status code: 400, request id: XXXX, host id: ZZZZ
2022/08/26 16:47:29 ERROR : bigmovie: Failed to copy: AccessControlListNotSupported: The bucket does not allow ACLs
status code: 400, request id: XXXX, host id: YYYY
And of course it is true that the bucket does not support ACL ... which is the desired best practice and AWS default for new buckets. However the bucket does support a bucket policy that trusts my assumed role, and that role and bucket policy pair works just fine with the AWS CLI copy across account, but not with the rclone copy.
Given that AWS CLI copies just fine cross account to this bucket, am I missing one of rclone's numerous flags to get the same behaviour? Anyone think of another possible cause?
I tested older, current, and beta rclone versions; all behave the same.
Version Info
os/version: centos 7.9.2009 (64 bit)
os/kernel: 3.10.0-1160.71.1.el7.x86_64 (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.18.5
go/linking: static
go/tags: none
Failing Command
$ rclone copy bigmovie s3-standard:SOMEBUCKET/bigmovie -vv
Failing RClone Config
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
#server_side_encryption = AES256
storage_class = STANDARD
#bucket_acl = private
#acl = private
Note that I've tested all permutations of the commented-out lines with similar results.
Note that I have tested with and without the private endpoint listed, with the same results for both the AWS CLI and rclone, i.e. the CLI works, rclone fails.
A log from the command with the -vv flag
2022/08/25 17:25:55 DEBUG : Using config file from "PERSONALSTUFF/rclone.conf"
2022/08/25 17:25:55 DEBUG : rclone: Version "v1.55.1" starting with parameters ["/usr/local/rclone/1.55/bin/rclone" "copy" "bigmovie" "s3-standard:SOMEBUCKET" "-vv"]
2022/08/25 17:25:55 DEBUG : Creating backend with remote "bigmovie"
2022/08/25 17:25:55 DEBUG : fs cache: adding new entry for parent of "bigmovie", "MYDIRECTORY/testbed"
2022/08/25 17:25:55 DEBUG : Creating backend with remote "s3-standard:SOMEBUCKET/bigmovie"
2022/08/25 17:25:55 DEBUG : bigmovie: Need to transfer - File not found at Destination
2022/08/25 17:25:55 ERROR : bigmovie: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessControlListNotSupported</Code><Message>The bucket does not allow ACLs</Message><RequestId>8DW1MQSHEN6A0CFA</RequestId><HostId>d3Rlnx/XezTB7OC79qr4QQuwjgR+h2VYj4LCZWLGTny9YAy985be5HsFgHcqX4azSDhDXefLE+U=</HostId></Error>
2022/08/25 17:25:55 ERROR : Attempt 1/3 failed with 1 errors and: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
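For what it's worth, one direction that might be worth testing (an assumption based on the documentation of newer rclone releases, not verified here): recent rclone versions document that leaving acl empty means no X-Amz-Acl header is sent at all, which is what a bucket with ACLs disabled expects. In the remote config that would look roughly like:

type = s3
provider = AWS
env_auth = true
region = us-east-1
storage_class = STANDARD
# empty value = do not send any X-Amz-Acl header (behaviour documented for newer rclone; verify for your version)
acl =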
I have a service deployed into a Kubernetes cluster, with fluentd set up as a DaemonSet, and I need to split the logs it receives so they end up in different S3 buckets.
One bucket would be for all logs generated by Kubernetes and our debug/error-handling code, and another bucket would be for a subset of logs generated by the service, parsed by a structured logger and identified by a specific field in the JSON. Think of it as: one bucket is for machine state and errors, the other is for "user_id created resource image_id at ts" descriptions of user actions.
The service itself is unaware of fluentd, so I cannot manually set the tag for logs based on which S3 bucket I want them to end up in.
Right now, the fluentd.conf I use sets up S3 like this:
<match **>
  # docs: https://docs.fluentd.org/v0.12/articles/out_s3
  # note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
  type copy
  <store>
    type s3
    log_level info
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
    format json
    time_slice_format %Y/%m/%d
    time_slice_wait 1m
    flush_interval 10m
    utc
    include_time_key true
    include_tag_key true
    buffer_chunk_limit 128m
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </store>
  <store>
    ...
  </store>
</match>
So what I would like to do is to have something like a grep plugin:

<store>
  type grep
  <regexp>
    key type
    pattern client-action
  </regexp>
</store>

which would send those logs into a separate S3 bucket from the one defined for all logs.
I am assuming that the user-action logs are generated by your service, and the system logs include the Docker, Kubernetes and systemd logs from the nodes.
I found your example YAML file in the official fluent GitHub repo.
If you check out the folder in that link, you'll see two more files called kubernetes.conf and systemd.conf. These files have source sections where they tag their data.
The match section in fluent.conf matches **, i.e. all logs, and sends them to S3. You want to split your log types here.
Your container logs are tagged kubernetes.* in kubernetes.conf.
So your config above turns into:

<match kubernetes.*>
  type s3
  # user log S3 bucket
  ...

and for system logs, match every other tag except kubernetes.*.
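If the split really has to happen on a JSON field rather than on a tag (since the service cannot tag its own logs), one possible sketch, written in the newer @type/@label syntax, is to copy the stream into two labelled pipelines and grep the user-action pipeline on that field. This is an illustration only: the label names and S3_USER_ACTION_BUCKET_NAME are placeholders of mine, the key type and pattern client-action come from your question, and the ... stands for the same out_s3 options as in your existing store:

<match **>
  @type copy
  <store>
    @type relabel
    @label @ALL_LOGS
  </store>
  <store>
    @type relabel
    @label @USER_ACTIONS
  </store>
</match>

# everything, unchanged, to the existing bucket
<label @ALL_LOGS>
  <match **>
    @type s3
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    ...
  </match>
</label>

# only records whose "type" field is "client-action" reach the second bucket
<label @USER_ACTIONS>
  <filter **>
    @type grep
    <regexp>
      key type
      pattern /client-action/
    </regexp>
  </filter>
  <match **>
    @type s3
    s3_bucket "#{ENV['S3_USER_ACTION_BUCKET_NAME']}"
    ...
  </match>
</label>

The copy output duplicates every record, the relabel stores route the copies into separate label sections, and the grep filter drops everything that is not a user action before the second S3 output sees it.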
Running terraform plan complains that there's no such S3 key in my bucket. Note: this key does not exist; however, I'm pretty sure Terraform is supposed to create it if it doesn't. The log is:
[DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>my-key</Key>
and the Terraform config is:
terraform {
  backend "s3" {
    bucket     = "<bucket>"
    key        = "my-key"
    region     = "eu-west-2"
    acl        = "private"
    kms_key_id = "<key>"
  }
}
Any suggestions?
You need to run terraform init before terraform plan to initialise the backend you have configured.
At least in recent versions of Terraform (I'm using 0.11.13) the S3 backend is automatically created if it doesn't already exist.
I spent several hours on this, only to find out that the remote state file will only be created after terraform apply (of course, after terraform init && terraform plan).
$ terraform version
Terraform v1.0.4
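Putting the two answers together, a minimal sketch of the expected workflow; the NoSuchKey entry in the debug log is normal on the first run because the state object does not exist yet:

# one-time (or after backend changes): initialise the S3 backend
terraform init

# preview changes; on the very first run the backend key is not there yet,
# so a NoSuchKey lookup in the debug log is expected
terraform plan

# apply writes the first state object to s3://<bucket>/my-key
terraform apply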
I'm writing a Python script to find out whether an S3 object is encrypted. I tried the following code, but key.encrypted always returns None even though I can see that the object on S3 is encrypted.
keys = bucket.list()
for k in keys:
    print k.name, k.size, k.last_modified, k.encrypted, "\n"
k.encrypted always returns None.
For what it's worth, you can do this using boto3 (which can be used side-by-side with boto).
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')

for obj in bucket.objects.all():
    key = s3.Object(bucket.name, obj.key)
    print(key.server_side_encryption)
See the boto3 docs for a list of available key attributes.
Expanding on #mfisherca's response, you can do this with the AWS CLI:
aws s3api head-object --bucket <bucket> --key <key>
# or query the value directly
aws s3api head-object --bucket <bucket> --key <key> \
--query ServerSideEncryption --output text
You can also check the encryption state of a specific object using the head_object call. Here's an example in Python/boto3:

#!/usr/bin/env python
import boto3

s3_client = boto3.client('s3')

head = s3_client.head_object(
    Bucket="<S3 bucket name>",
    Key="<S3 object key>"
)

if 'ServerSideEncryption' in head:
    print(head['ServerSideEncryption'])
See: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.head_object