How to efficiently use S3 remote with DVC among multiple developers with different AWS configs?

The DVC remote configuration allows you to define a profile for the AWS CLI to use. However, some developers' local AWS CLI configuration may use different profiles whose names they prefer.
Is there a way to override the profile used by DVC on a dvc push / dvc pull without modifying the remote configuration of an s3 repo?

There are a few options to achieve this.
Use the --local remote config option to set remote storage config parameters that are specific to a user:
$ dvc remote modify --local myremote profile myprofile
It will create a file .dvc/config.local that is Git-ignored, and options from this file will be merged with the main config when users run DVC commands.
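For reference, the resulting .dvc/config.local might then contain something along these lines (the remote name and profile value are whatever you passed above):
['remote "myremote"']
    profile = myprofile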
Alternatively, users can use the AWS_PROFILE environment variable to specify their local profile name. In this case, remember not to include a profile name in the DVC remote config.
$ export AWS_PROFILE=myprofile
$ dvc push

Related

How to explicitly define the AWS credentials for MLFlow when using AWS S3 as artifact store

So I'm using an MLflow tracking server where I define an S3 bucket to be the artifact store. Right now, MLflow by default is getting the credentials to write/read the bucket via my default profile in ~/.aws/credentials, but I do have a staging and a dev profile as well. So my question is: is there a way to explicitly tell MLflow to use the staging or dev profile credentials instead of default? I can't seem to find this info anywhere. Thanks!
To allow the server and clients to access the artifact location, you should configure your cloud provider credentials as normal. For example, for S3, you can set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, use an IAM role, or configure a default profile in ~/.aws/credentials. See Set up AWS Credentials and Region for Development for more info.
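A minimal sketch of the environment-variable route, run before launching whatever logs the artifacts (the profile name and script are hypothetical; AWS_PROFILE works here because MLflow's S3 artifact store goes through boto3's standard credential chain):
$ export AWS_PROFILE=staging   # or export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY directly
$ python train.py              # hypothetical script that calls mlflow.log_artifact(...)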
Apparently there is no option to set another profile. I use aws-vault, so it is easy to change profiles.

Can I pass a key file as parameter to connect to Google Cloud Storage using gsutil?

When I use gsutil to connect to my bucket on Google Cloud Storage, I usually use the following command:
gcloud auth activate-service-account --key-file="pathKeyFile"
What should I do if two scripts that are running on the same machine at the same time need two different Service Accounts?
I would like to use a command such as:
gsutil ls mybucket --key-file="mykeyspath"
I ask because, if my script is running and another script changes which Service Account is currently active, my script would no longer have permission to access the bucket.
You can do this with a BOTO file. You can create one as explained in the documentation.
Then you can specify which file to use when you run your gsutil command (here is an example on Linux):
# if you have several GSUTIL command to run
export BOTO_CONFIG=/path/to/.botoMyBucket
gsutil ls myBucket
# For only one command, you can define an env var inline like this
BOTO_CONFIG=/path/to/.botoMyBucket2 gsutil ls myBucket2
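For reference, such a BOTO file can point gsutil at a specific key; a minimal sketch for a JSON service-account key might look like this (the path is hypothetical, and the exact option depends on your key type):
[Credentials]
gs_service_key_file = /path/to/serviceAccountKey.json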

How to download a file from S3 into an EC2 instance using Packer to build a custom AMI

I am trying to create a custom AMI using Packer.
I want to install some specific software on the custom AMI, and my setup files are present in an S3 bucket. But it seems there is no direct way to download an S3 file in Packer the way cfn-init does.
So is there any way to download a file onto the EC2 instance using Packer?
Install the awscli in the instance and use iam_instance_profile to give the instance permissions to get the files from S3.
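A minimal sketch of the shell provisioner script this implies (the package manager, bucket, and paths are assumptions; the role attached via iam_instance_profile must allow s3:GetObject on that bucket):
#!/bin/bash
# Install the AWS CLI on the build instance, then pull the setup files from S3.
sudo yum install -y awscli
aws s3 cp s3://my-setup-bucket/installers/ /tmp/installers/ --recursive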
I can envisage a case where this is ineffective.
When building the image on AWS you use your local creds, but while the image is building, the build instance runs as a Packer-created user that is not you, so it does not have your creds and cannot access the S3 bucket (if it is private).
Option one: https://github.com/enmand/packer-provisioner-s3
Option two: use the shell-local provisioner to pull the S3 files down to your machine with aws s3 cp, then the file provisioner to upload them to the correct folder in the builder image; you can then use a remote shell provisioner to do any other work on the files (see the sketch after this list). I chose this because, although it is more code, it is more universal when I share my build: others have no need to install anything extra.
Option three: wait and wait. There is an enhancement discussed in 2019 on the Packer GitHub to offer an S3 passthrough using local creds, but it isn't on the official roadmap.
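A sketch of option two's local step (bucket and paths are hypothetical): run something like this from a shell-local provisioner, then point a file provisioner at the downloaded directory so it is uploaded into the builder image.
aws s3 cp s3://my-setup-bucket/installers/ ./installers/ --recursive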
Assuming the awscli is already installed on the EC2 instance, use the sample command below in a shell provisioner.
sudo aws s3 cp s3://bucket-name/path_to_folder/file_name /home/ec2-user/temp

How to authenticate google APIs with different service account credentials?

As anyone who has ever had the misfortune of having to interact with the panoply of Google CLI binaries programmatically will have realised, authenticating with the likes of gcloud, gsutil, bq, etc. is far from intuitive or trivial, especially when you need to work across different projects.
I am running various cron jobs that interact with Google Cloud Storage and BigQuery for different projects. Since the cron jobs may overlap, renaming config files is clearly not an option, and nor would any sane person take that approach.
There must surely be some sort of method of passing a path to a service account's key pair file to these CLI binaries, but bq help yields nothing.
The Google documentation, while verbose, is largely useless, taking one on a tour of how OAuth2 works, etc, instead of explaining what must surely be a very common requirement, vis-a-vis, how to actually authenticate a service account without running commands that modify central config files.
Can any enlightened being tell me whether the engineers at Google decided to add a feature as simple as passing the path to a service account's key pair file to the likes of gsutil and bq? Or perhaps I could simply export some variable so they know which key pair file to use for authentication?
I realise these simplistic approaches may be an insult to the intelligence, but we aren't concerning ourselves with harnessing nuclear fusion, so we needn't even consider what Amazon got so right with their approach to authentication in comparison...
Configuration in the Cloud SDK is global for the user, but you can specify what aspects of that config to use on a per command basis. To accomplish what you are trying to do you can:
gcloud auth activate-service-account foo@developer.gserviceaccount.com --key-file ...
gcloud auth activate-service-account bar@developer.gserviceaccount.com --key-file ...
At this point, both sets of credentials are in your global credentials store.
Now you can run:
gcloud --account foo@developer.gserviceaccount.com some-command
gcloud --account bar@developer.gserviceaccount.com some-command
in parallel, and each will use the given account without interfering.
A larger extension of this is 'configurations' which do the same thing, but for your entire set of config (including settings like account and project).
# Create first configuration
gcloud config configurations create myconfig
gcloud config configurations activate myconfig
gcloud config set account foo@developer.gserviceaccount.com
gcloud config set project foo
# Create second configuration
gcloud config configurations create anotherconfig
gcloud config configurations activate anotherconfig
gcloud config set account bar@developer.gserviceaccount.com
gcloud config set project bar
And you can say which configuration to use on a per command basis.
gcloud --configuration myconfig some-command
gcloud --configuration anotherconfig some-command
You can read more about configurations by running: gcloud topic configurations
All properties have corresponding environment variables that allow you to set that particular property for a single command invocation or for a terminal session. They take the form:
CLOUDSDK_<SECTION>_<PROPERTY>
for example: CLOUDSDK_CORE_ACCOUNT
You can see all the available config settings by running: gcloud help config
The equivalent of the --configuration flag is: CLOUDSDK_ACTIVE_CONFIG_NAME
If you really want complete isolation, you can also change the Cloud SDK's config directory by setting CLOUDSDK_CONFIG to a directory of your choosing. Note that if you do this, the config is completely separate including the credential store, all configurations, logs, etc.
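For example, all three can be set per invocation (the account, configuration name, and directory below are hypothetical):
CLOUDSDK_CORE_ACCOUNT=foo@developer.gserviceaccount.com gcloud some-command
CLOUDSDK_ACTIVE_CONFIG_NAME=myconfig gcloud some-command
CLOUDSDK_CONFIG=/tmp/isolated-gcloud gcloud auth activate-service-account --key-file=/path/to/key.json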

SSH aws ec2 elastic beanstalk without keypair

I have a running instance that was created without a keypair. As I understand it, it is not possible to apply a keypair to a running instance, yet I need to SSH into the instance to get some logs. How can I do that?
Right-clicking on the instance -> Connect shows a message saying that the instance is not associated with a key pair and that "you will need to log into this instance using a valid username and password combination".
Our app runs on Elastic Beanstalk, so the user should be ec2-user, but what about the password? How can I retrieve that?
PS: re-launching the instance with a keypair is not an option....
Thanks!
You can download the logs using the Tail Logs or Full Logs option in the console:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.loggingS3.title.html
The above will get you the default set of log files from the instance. If you want to get your files from one of the non-default locations you will need to update your environment with the following ebextension. Create a file custom-logs.config in a folder named .ebextensions in your app root. In the contents of your file create a log configuration file that points to your custom location.
Example contents:
files:
  "/opt/elasticbeanstalk/tasks/systemtaillogs.d/my-cool-logs.conf":
    mode: "000777"
    owner: root
    group: root
    content: |
      /my-framework/my-logs/my-cool-log.log
This file is in YAML format, so be careful with the indentation. After creating this file, you can deploy the new app version to your environment. Then, when you snapshot logs using the instructions above, you will get your custom logs.
If there's any way to access the command line on your instance, then you could edit
/etc/ssh/sshd_config
and change the line to:
PasswordAuthentication yes
SSH user/password access defaults to no on launch.
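A minimal sketch of what that could look like once you have some shell access (the sed pattern assumes the stock "PasswordAuthentication no" line, the service restart command varies by distribution, and you also need to set a password for ec2-user):
sudo sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo passwd ec2-user
sudo systemctl restart sshd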