How to check content of a Noobaa bucket - amazon-s3

I am able to check the status of a NooBaa bucket using the noobaa bucket status <bucket> command.
$ noobaa bucket status XYZ
INFO[0005] ✅ Exists: NooBaa "noobaa"
INFO[0005] ✅ Exists: Service "noobaa-mgmt"
INFO[0006] ✅ Exists: Secret "noobaa-operator"
INFO[0006] ✅ Exists: Secret "noobaa-admin"
INFO[0008] ✈️ RPC: bucket.read_bucket() Request: {Name:XYZ}
INFO[0010] ✅ RPC: bucket.read_bucket() Response OK: took 14.3ms
Bucket status:
Bucket : XYZ
OBC Namespace : xyz-namespace
OBC BucketClass : default-bucket-class
Type : REGULAR
Mode : OPTIMAL
ResiliencyStatus : OPTIMAL
QuotaStatus : QUOTA_NOT_SET
Num Objects : 1
Data Size : 3.000 B
Data Size Reduced : 5.000 B
Data Space Avail : 1.000 PB
But I am not able to check the content inside the NooBaa bucket.
How can I check the content of a NooBaa bucket, using the NooBaa CLI or any other way?

Your question made me realize that the noobaa CLI should have a noobaa object list command, so I opened a new issue for this enhancement on the operator GitHub repo. Thanks :)
Until this is added, there are several ways to list objects:
Run noobaa ui. It opens the browser right away, and it also prints the credentials to the terminal for you to use at login. You can probably find the buckets and then drill down to the objects in the UI on your own, and you can also check out some recorded videos that navigate the UI, for example this video.
Take the admin S3 credentials and endpoint from noobaa status and then use your favorite S3 client. I currently use aws-cli or rclone:
alias s3='AWS_ACCESS_KEY_ID=$NOOBAA_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$NOOBAA_SECRET_KEY aws --endpoint $NOOBAA_S3_ENDPOINT --no-verify-ssl s3'
and then:
s3 ls XYZ
Not many have noticed, but the NooBaa system CR contains a useful Readme text in its status, with commands to "Test S3 client" that are ready to copy-paste to set up your aws-cli, including kubectl port-forward to support secure networks and reading the credentials from secrets. Check it out with kubectl describe noobaa. This 40-second YouTube video shows this briefly. BTW, the Readme text is generated for the system, but it does not contain actual secrets, only kubectl commands to read those secrets if you are permitted to.
$ kubectl describe noobaa
...
Phase: Ready
Readme:
Welcome to NooBaa!
-----------------
NooBaa Core Version: 5.3.0-9f579d9
NooBaa Operator Version: 2.1.0
Lets get started:
1. Connect to Management console:
Read your mgmt console login information (email & password) from secret: "noobaa-admin".
kubectl get secret noobaa-admin -n backup-service -o json | jq '.data|map_values(@base64d)'
Open the management console service - take External IP/DNS or Node Port or use port forwarding:
kubectl port-forward -n backup-service service/noobaa-mgmt 11443:443 &
open https://localhost:11443
2. Test S3 client:
kubectl port-forward -n backup-service service/s3 10443:443 &
NOOBAA_ACCESS_KEY=$(kubectl get secret noobaa-admin -n backup-service -o json | jq -r '.data.AWS_ACCESS_KEY_ID|@base64d')
NOOBAA_SECRET_KEY=$(kubectl get secret noobaa-admin -n backup-service -o json | jq -r '.data.AWS_SECRET_ACCESS_KEY|@base64d')
alias s3='AWS_ACCESS_KEY_ID=$NOOBAA_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$NOOBAA_SECRET_KEY aws --endpoint https://localhost:10443 --no-verify-ssl s3'
s3 ls
...
Last option, which should have been mentioned first, but unfortunately I just saw it is broken in the current version v2.1.0 (opened new issue), is to use the generic noobaa api command in order to call the object_api list_objects method like so:
noobaa api object list_objects '{ "bucket": "first.bucket" }'
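For anyone scripting the S3-client option instead of using the shell alias, here is a minimal Python sketch of the same two steps: decoding the base64 values of a Kubernetes secret (what the jq filter in the Readme does) and assembling the aws-cli arguments. The helper names and the sample secret values are made up for illustration; nothing here calls kubectl or aws, it only shows the data handling.

```python
import base64
import json


def decode_secret_data(secret_json: str) -> dict:
    """Decode the base64 values under .data of a Kubernetes Secret JSON document."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


def s3_ls_command(endpoint: str, bucket: str) -> list:
    """Build the aws-cli argument list for listing a bucket (not executed here)."""
    return ["aws", "--endpoint-url", endpoint, "--no-verify-ssl", "s3", "ls", bucket]


# Made-up sample standing in for the output of
# `kubectl get secret noobaa-admin -o json`; the key values are placeholders.
sample = json.dumps({
    "data": {
        "AWS_ACCESS_KEY_ID": base64.b64encode(b"example-access-key").decode("ascii"),
        "AWS_SECRET_ACCESS_KEY": base64.b64encode(b"example-secret-key").decode("ascii"),
    }
})

creds = decode_secret_data(sample)
command = s3_ls_command("https://localhost:10443", "XYZ")
print(sorted(creds))
print(" ".join(command))
```

To actually run the listing, the decoded values would be passed as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment of the aws process, as the alias does.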
I hope that helps, feel free to open github issues with suggestions/issues.
Thanks!
(NooBaa CTO)

Related

Setting up S3 compatible service for blob storage on Google Cloud Storage

PS: cross posted on drone forums here.
I'm trying to set up an S3-like service for Drone logs. I've tested that my AWS_* values are set correctly in the container, and running aws-cli from inside the container gives the correct output for:
aws s3api list-objects --bucket drone-logs --endpoint-url=https://storage.googleapis.com
However, the Drone server itself is unable to upload logs to the bucket and fails with the following error:
{"error":"InvalidArgument: Invalid argument.\n\tstatus code: 400, request id: , host id: ","level":"warning","msg":"manager: cannot upload complete logs","step-id":7,"time":"2023-02-09T12:26:16Z"}
On startup, the Drone server shows that the S3-related configuration was picked up correctly:
rpc:
server: ""
secret: my-secret
debug: false
host: drone.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
proto: https
s3:
bucket: drone-logs
prefix: ""
endpoint: https://storage.googleapis.com
pathstyle: true
The environment variables inside the Drone server container are:
# env | grep -E 'DRONE|AWS' | sort
AWS_ACCESS_KEY_ID=GOOGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_DEFAULT_REGION=us-east-1
AWS_REGION=us-east-1
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_COOKIE_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_DATABASE_DATASOURCE=postgres://drone:XXXXXXXXXXXXXXXXXXXXXXXXXXXXX@35.XXXXXX.XXXX:5432/drone?sslmode=disable
DRONE_DATABASE_DRIVER=postgres
DRONE_DATABASE_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_GITHUB_CLIENT_ID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_GITHUB_CLIENT_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_JSONNET_ENABLED=true
DRONE_LOGS_DEBUG=true
DRONE_LOGS_TRACE=true
DRONE_RPC_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_S3_BUCKET=drone-logs
DRONE_S3_ENDPOINT=https://storage.googleapis.com
DRONE_S3_PATH_STYLE=true
DRONE_SERVER_HOST=drone.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DRONE_SERVER_PROTO=https
DRONE_STARLARK_ENABLED=true
The .drone.yaml that is being used is available here, on GitHub.
The server is running a build made with the nolimit tag:
go build -tags "nolimit" github.com/drone/drone/cmd/drone-server

rclone failing with "AccessControlListNotSupported" on cross-account copy -- AWS CLI Works

Quick Summary now that I think I see the problem:
rclone seems to always send ACL with a copy request, with a default value of "private". This will fail in a (2022) default AWS bucket which (correctly) assumes "No ACL". Need a way to suppress ACL send in rclone.
Detail
I assume an IAM role and attempt to do an rclone copy from a data center Linux box to a default options private no-ACL bucket in the same account as the role I assume. It succeeds.
I then configure a default options private no-ACL bucket in another account than the role I assume. I attach a bucket policy to the cross-account bucket that trusts the role I assume. The role I assume has global permissions to write S3 buckets anywhere.
I test the cross-account bucket policy by using the AWS CLI to copy the same linux box source file to the cross-account bucket. Copy works fine with AWS CLI, suggesting that the connection and access permissions to the cross account bucket are fine. DataSync (another AWS service) works fine too.
Problem: an rclone copy fails with the AccessControlListNotSupported error below.
status code: 400, request id: XXXX, host id: ZZZZ
2022/08/26 16:47:29 ERROR : bigmovie: Failed to copy: AccessControlListNotSupported: The bucket does not allow ACLs
status code: 400, request id: XXXX, host id: YYYY
And of course it is true that the bucket does not support ACL ... which is the desired best practice and AWS default for new buckets. However the bucket does support a bucket policy that trusts my assumed role, and that role and bucket policy pair works just fine with the AWS CLI copy across account, but not with the rclone copy.
Given that AWS CLI copies just fine cross account to this bucket, am I missing one of rclone's numerous flags to get the same behaviour? Anyone think of another possible cause?
Tested older, current and beta rclone versions, all behave the same
Version Info
os/version: centos 7.9.2009 (64 bit)
os/kernel: 3.10.0-1160.71.1.el7.x86_64 (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.18.5
go/linking: static
go/tags: none
Failing Command
$ rclone copy bigmovie s3-standard:SOMEBUCKET/bigmovie -vv
Failing RClone Config
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
#server_side_encryption = AES256
storage_class = STANDARD
#bucket_acl = private
#acl = private
Note that I've tested all permutations of the commented out lines with similar result
Note that I have tested with and without the private endpoint listed with same results for both AWS CLI and rclone, e.g. CLI works, rclone fails.
A log from the command with the -vv flag
2022/08/25 17:25:55 DEBUG : Using config file from "PERSONALSTUFF/rclone.conf"
2022/08/25 17:25:55 DEBUG : rclone: Version "v1.55.1" starting with parameters ["/usr/local/rclone/1.55/bin/rclone" "copy" "bigmovie" "s3-standard:SOMEBUCKET" "-vv"]
2022/08/25 17:25:55 DEBUG : Creating backend with remote "bigmovie"
2022/08/25 17:25:55 DEBUG : fs cache: adding new entry for parent of "bigmovie", "MYDIRECTORY/testbed"
2022/08/25 17:25:55 DEBUG : Creating backend with remote "s3-standard:SOMEBUCKET/bigmovie"
2022/08/25 17:25:55 DEBUG : bigmovie: Need to transfer - File not found at Destination
2022/08/25 17:25:55 ERROR : bigmovie: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessControlListNotSupported</Code><Message>The bucket does not allow ACLs</Message><RequestId>8DW1MQSHEN6A0CFA</RequestId><HostId>d3Rlnx/XezTB7OC79qr4QQuwjgR+h2VYj4LCZWLGTny9YAy985be5HsFgHcqX4azSDhDXefLE+U=</HostId></Error>
2022/08/25 17:25:55 ERROR : Attempt 1/3 failed with 1 errors and: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>

Getting error while AWS EKS cluster backup using Velero tool

Please let me know what is my mistake!
I used this command to back up an AWS EKS cluster with Velero, but it is not working:
./velero.exe install --provider aws --bucket backup-archive/eks-cluster-backup/prod-eks-cluster/ --secret-file ./minio.credentials --use-restic --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=s3Url=s3://backup-archive/eks-cluster-backup/prod-eks-cluster/ --kubeconfig ../kubeconfig-prod-eks --plugins velero/velero-plugin-for-aws:v1.0.0
cat minio.credentials
[default]
aws_access_key_id=xxxx
aws_secret_access_key=yyyyy/zzzzzzzz
region=ap-southeast-1
Getting Error:
../kubectl.exe --kubeconfig=../kubeconfig-prod-eks.txt logs deployment/velero -n velero
time="2020-12-09T09:07:12Z" level=error msg="Error getting backup store for this location" backupLocation=default controller=backup-sync error="backup storage location's bucket name \"backup-archive/eks-cluster-backup/\" must not contain a '/' (if using a prefix, put it in the 'Prefix' field instead)" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:110" error.function=github.com/vmware-tanzu/velero/pkg/persistence.NewObjectBackupStore logSource="pkg/controller/backup_sync_controller.go:168"
Note: I have also tried --bucket backup-archive, but it did not help.
This is the source of your problem: --bucket backup-archive/eks-cluster-backup/prod-eks-cluster/.
The error says: must not contain a '/' .
This means it cannot contain a slash in the middle of the bucket name (leading/trailing slashes are trimmed, so that's not a problem). Source: https://github.com/vmware-tanzu/velero/blob/3867d1f434c0b1dd786eb8f9349819b4cc873048/pkg/persistence/object_store.go#L102-L111.
If you want to namespace your backups within a bucket, you may use the --prefix parameter. Like so:
--bucket backup-archive --prefix /eks-cluster-backup/prod-eks-cluster/
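As a small illustration of the rule above, the following Python sketch splits a combined "bucket/path" string into the bucket name and the prefix. The function name is hypothetical, and the slash trimming only mirrors the behaviour described in this answer; it is not Velero's own code.

```python
def split_bucket_and_prefix(location: str) -> tuple:
    """Split 'bucket/some/prefix/' into ('bucket', 'some/prefix').

    Leading and trailing slashes are dropped first, then everything after
    the first remaining slash is treated as the prefix.
    """
    trimmed = location.strip("/")
    bucket, _, prefix = trimmed.partition("/")
    return bucket, prefix


bucket, prefix = split_bucket_and_prefix(
    "backup-archive/eks-cluster-backup/prod-eks-cluster/"
)
# Arguments in the shape suggested above: the bucket name alone, the rest as prefix.
install_args = ["--bucket", bucket, "--prefix", prefix]
print(install_args)
```

A bucket name with no slash in it comes back unchanged with an empty prefix, in which case the --prefix argument can simply be left out.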

Why can I read ksqldb streams but not topics within ksql client?

I am testing ksqlDB on AWS EC2 instances in the latest release (Confluent 5.5.1) and have an access problem that I can't solve.
I have a secured Kafka server (SASL_SSL, SASL mode PLAIN), an unsecured Schema Registry (another issue with Avro Serializers, but OK for the moment), and a secured KSQL Server and Client.
Topics are filled properly with AVRO data (value only, no key) from a JDBC source connector.
I can access the KSQL Server with ksql without issues
I can access KSQL REST API without issues
When I list topics within ksql, I get the correct list.
When I select a push stream, I get messages when I push something into the topic (with Kafka Connect, in my case).
BUT: When I call "print topic" I get a ~60 sec block in the client, followed by a 'Timeout expired while fetching topic metadata'.
The ksql-kafka.log goes wild with repeated entries like
[2020-09-02 18:52:46,246] WARN [Consumer clientId=consumer-2, groupId=null] Bootstrap broker ip-10-1-2-10.eu-central-1.compute.internal:9093 (id: -3 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1037)
The corresponding broker log shows
Sep 2 18:52:44 ip-10-1-6-11 kafka-server-start: [2020-09-02 18:52:44,704] INFO [SocketServer brokerId=1002] Failed authentication with ip-10-1-2-231.eu-central-1.compute.internal/10.1.2.231 (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)
This is my ksql-server.properties file:
ksql.service.id= hf_kafka_ksql_001
bootstrap.servers=ip-10-1-11-229.eu-central-1.compute.internal:9093,ip-10-1-6-11.eu-central-1.compute.internal:9093,ip-10-1-2-10.eu-central-1.compute.internal:9093
ksql.streams.state.dir=/var/data/ksqldb
ksql.schema.registry.url=http://ip-10-1-1-22.eu-central-1.compute.internal:8081
ksql.output.topic.name.prefix=ksql-interactive-
ksql.internal.topic.replicas=3
confluent.support.metrics.enable=false
# currently the keystore contains only the ksql server and the certificate chain to the CA
ssl.keystore.location=/var/kafka-ssl/ksql.keystore.jks
ssl.keystore.password=kspassword
ssl.key.password=kspassword
ssl.client.auth=true
# Need to set this to empty, otherwise the REST API is not accessible with the client key.
ssl.endpoint.identification.algorithm=
# currently the truststore contains only the CA certificate
ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
ssl.truststore.password=ctpassword
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
listeners=https://0.0.0.0:8088
advertised.listener=https://ip-10-1-2-231.eu-central-1.compute.internal:8088
authentication.method=BASIC
authentication.roles=admin,ksql,cli
authentication.realm=KsqlServerProps
# authentication for producers, needed for ksql commands like "Create Stream"
producer.ssl.endpoint.identification.algorithm=HTTPS
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
producer.ssl.truststore.password=ctpassword
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
# authentication for consumers, needed for ksql commands like "Create Stream"
consumer.ssl.endpoint.identification.algorithm=HTTPS
consumer.security.protocol=SASL_SSL
consumer.ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
consumer.ssl.truststore.password=ctpassword
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
I call ksql with
ksql --user cli --password test --config-file /var/kafka-ssl/ksql_cli.properties https://ip-10-1-2-231.eu-central-1.compute.internal:8088
This is my ksql client configuration ksql_cli.properties:
security.protocol=SSL
#ssl.client.auth=true
ssl.truststore.location=/var/kafka-ssl/client.truststore.jks
ssl.truststore.password=ctpassword
ssl.keystore.location=/var/kafka-ssl/ksql.keystore.jks
ssl.keystore.password=kspassword
ssl.key.password=kspassword
JAAS config, included as Parameter on service start
KsqlServerProps {
org.eclipse.jetty.jaas.spi.PropertyFileLoginModule required
file="/var/kafka-ssl/cli.password"
debug="false";
};
with cli.password containing the authentication users and passwords for the ksql client.
I have probably tried every permutation of keys, settings, etc., but to no avail. Obviously there is something wrong in the key management. What surprises me is that using streams works but accessing the low-level topics does not.
Has anyone found a solution for this issue? I am really running out of ideas here. Thanks.
Found it! It was easy to overlook: the client's configuration of course needs a SASL setting as well...
security.protocol=SASL_SSL
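A mismatch like this is easy to catch with a quick comparison of the two files. Below is a minimal Python sketch that parses simple key=value properties (including values continued with a trailing backslash) and reports when security.protocol differs between server and client. The parser is a simplified assumption, not a full Java properties implementation, and the sample strings are shortened from the configuration in the question.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value properties, joining lines that end with a backslash."""
    props = {}
    logical = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            logical += line[:-1]  # keep collecting a continued value
            continue
        logical += line
        if logical and not logical.startswith("#"):
            key, sep, value = logical.partition("=")
            if sep:
                props[key.strip()] = value.strip()
        logical = ""
    return props


def protocol_mismatch(server: dict, client: dict, key: str = "security.protocol"):
    """Return (server_value, client_value) when the setting differs, else None."""
    if server.get(key) != client.get(key):
        return server.get(key), client.get(key)
    return None


# Shortened samples based on the files in the question.
SERVER_PROPS = r"""
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="ksql" \
password="ksqlsecret";
"""
CLIENT_PROPS = "security.protocol=SSL\n"

server = parse_properties(SERVER_PROPS)
client = parse_properties(CLIENT_PROPS)
print(protocol_mismatch(server, client))
```

With the original client file the check reports the differing pair; once the client also uses SASL_SSL it returns None.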

Is there a way to add additional configurable settings in OpsCenter 6.0.2 Lifecycle Manager config profiles?

I would really like to add the following settings to our spark-defaults.conf using OpsCenter 6.0.2 in order to avoid configuration drift. Is there a way to add these config items to the config profile template?
spark.cores.max 4
spark.driver.memory 2g
spark.executor.memory 4g
spark.python.worker.memory 2g
NOTE: As Mike Lococo has pointed out in the comments for this answer -- this answer may work to update the config profile values but will not result in those values being written to spark-defaults.conf.
The following is not a solution!
You can; you have to update the config profile via the LCM Config Profile API (https://docs.datastax.com/en/opscenter/6.0/api/docs/lcm_config_profile.html#lcm-config-profile).
First, identify the config profile that needs updating:
$ curl http://localhost:8888/api/v1/lcm/config_profiles
Get the href for the specific config profile that needs updating, request it, and save the response body to a file:
$ curl http://localhost:8888/api/v1/lcm/config_profiles/026fe8e3-0bb8-49c1-9888-8187b1624375 > profile.json
Now, in the profile.json file you just saved, add or edit the key at json > spark-defaults-conf to include the following keys:
"spark-defaults-conf": {
"spark-cores-max": 4,
"spark-python-worker-memory": "2g",
"spark-ssl-enabled": false,
"spark-drivers-memory": "2g",
"spark-executor-memory": "4g"
}
Save the updated profile.json. Finally, execute an HTTP PUT to the same config profile URL, using the edited file as the request data:
$ curl -X PUT http://localhost:8888/api/v1/lcm/config_profiles/026fe8e3-0bb8-49c1-9888-8187b1624375 -d @profile.json
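The edit-and-PUT steps can also be sketched in Python. This is only an illustration of the JSON handling under the same caveat as the note at the top of this answer: updating the profile this way may not result in the values being written to spark-defaults.conf. The function names, the "name" field in the sample profile, and the Content-Type header are assumptions made for the example; the request object is built but never sent.

```python
import copy
import json
import urllib.request


def with_spark_defaults(profile: dict, settings: dict) -> dict:
    """Return a copy of the profile with settings merged into json > spark-defaults-conf."""
    updated = copy.deepcopy(profile)
    section = updated.setdefault("json", {}).setdefault("spark-defaults-conf", {})
    section.update(settings)
    return updated


def build_put_request(url: str, profile: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the edited profile as JSON."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed header
    )


# Made-up minimal profile; a real one comes from the GET request shown above.
profile = {"name": "example-profile", "json": {"spark-defaults-conf": {"spark-ssl-enabled": False}}}
updated = with_spark_defaults(
    profile,
    {"spark-cores-max": 4, "spark-executor-memory": "4g", "spark-python-worker-memory": "2g"},
)
request = build_put_request(
    "http://localhost:8888/api/v1/lcm/config_profiles/026fe8e3-0bb8-49c1-9888-8187b1624375",
    updated,
)
print(json.dumps(updated["json"]["spark-defaults-conf"], sort_keys=True))
```

Sending the request would be a matter of passing it to urllib.request.urlopen, which is left out here on purpose.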