Configure Public Key Auth with Amazon Data Pipeline, SFTP, and S3

From the Data Pipeline ShellCommandWith (S)FTP Sample:
The sample relies on having public key authentication configured to access the SFTP server.
How do I configure public key authentication so that my Amazon Data Pipeline ShellCommandActivity can access an on-premises server via SFTP and upload files to S3?
What command do I put in my ShellCommandActivity to test whether it can talk to my on-premises SFTP server?

Since you don't want to expose a password in your pipeline definition, the sample assumes you have set up passwordless SSH (public key) login for your SFTP server.
You can learn more about that here: http://www.linuxproblem.org/art_9.html
Please make sure that is allowed by your organization's security rules, or check with someone in your IT department if you are unsure.
If you are allowed and able to set that up, you can test it by running the sample; it executes an sftp command as part of the ShellCommandActivity.
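For example, a minimal test for the ShellCommandActivity might look like the sketch below. The host, user, remote path, and bucket names are placeholders, and it assumes the private key is already present on the resource that runs the activity and that the AWS CLI is installed there.

# Verify that key-based SFTP login works (BatchMode fails fast instead of prompting for a password)
echo "ls" | sftp -b - -oBatchMode=yes user@sftp.example.com
# Fetch a file over SFTP and push it to S3
sftp -oBatchMode=yes user@sftp.example.com:/remote/path/sample.txt /tmp/sample.txt
aws s3 cp /tmp/sample.txt s3://my-bucket/sample.txt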

Related

Reverse Proxy Setup for Database Access through API

Hello: I have a Django-based application deployed on an AWS EC2 instance. The application queries and updates a hospital database hosted on premises through a set of APIs written in the Django application. However, we can't keep the production IP, database user ID, and database password in our EC2 application for security reasons. What options are available for this problem? One option we have considered is to set up a reverse proxy server on a separate machine in the on-premises environment and access the production database through it. I wanted to know:
whether there are other, better-practiced solutions available;
a pointer to an example reverse proxy configuration that keeps the database user ID, database password, etc. in the proxy and runs the API through it.
We are still trying to set up an Ubuntu-based Nginx server, but we are still not sure where to maintain the DB user ID, DB password, etc.
Any help or guiding pointer will be greatly appreciated. Thanks.
I haven't tried this in Django, but here is what I consider a neat solution:
Assign an IAM role to the EC2 instance, and give the role permission to read from the Systems Manager Parameter Store.
Store the credentials (username, password, host, etc.) in the AWS Systems Manager (SSM) Parameter Store.
Add a boot script to the EC2 instance that queries Systems Manager for the parameters. The script runs once when the instance boots, reads the values from SSM, and sets them as environment variables (see the sketch after the references below).
In the Django settings file where the DB credentials are configured, read them from the environment variables, e.g. os.environ['DATABASE_PASSWORD'].
References:
EC2 Boot Script
Systems Manager Parameter Store
How to set environment variables in Python
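As a minimal sketch of the boot script from step 3, assuming the AWS CLI is available on the instance and the credentials were stored under hypothetical parameter names like /myapp/db_host (add --region if the CLI has no region configured):

#!/bin/bash
# Read DB credentials from the SSM Parameter Store (parameter names are examples only)
DB_HOST=$(aws ssm get-parameter --name /myapp/db_host --query Parameter.Value --output text)
DB_USER=$(aws ssm get-parameter --name /myapp/db_user --query Parameter.Value --output text)
DB_PASSWORD=$(aws ssm get-parameter --name /myapp/db_password --with-decryption --query Parameter.Value --output text)
# Expose them as environment variables for the Django process
cat > /etc/profile.d/myapp_db.sh <<EOF
export DATABASE_HOST="$DB_HOST"
export DATABASE_USER="$DB_USER"
export DATABASE_PASSWORD="$DB_PASSWORD"
EOF

The Django settings file can then read os.environ['DATABASE_PASSWORD'] and friends, so no secret ever lives in the source tree.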

How to dynamically create Airflow S3 connection using IAM service

My Airflow application is running on an AWS EC2 instance that has an IAM role attached. Currently I am creating the Airflow S3 connection using a hardcoded access key and secret key, but I want my application to pick up the AWS credentials from the instance itself.
How can I achieve this?
We have a similar setup: our Airflow instance runs inside containers deployed on an EC2 machine. We set up the policies for S3 access on the EC2 machine's instance profile. You don't need to pick up credentials on the EC2 machine, because the instance profile already grants all the permissions you need. On the Airflow side, we only use the aws_default connection; in its extra parameter we only set the default region, and there are no credentials at all.
Here is a detailed article about instance profiles: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
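For reference, a minimal sketch of such a credential-less aws_default connection defined through an environment variable, carrying only a region in its extras (the region value is an assumption, and the exact extra field names depend on your Airflow version):

AIRFLOW_CONN_AWS_DEFAULT="aws://?region_name=us-east-1"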
The question is already answered, but for future reference it is also possible to do this without relying on aws_default, using only environment variables. Here is an example that writes logs to S3 through an AWS connection, so that it benefits from the IAM role:
AIRFLOW_CONN_AWS_LOG="aws://"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=aws_log
AIRFLOW__CORE__REMOTE_LOGGING=true
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="s3://path to bucket"

How to secure HDFS on DC/OS without Enterprise

I'm trying to secure an HDFS cluster on open source DC/OS, but it seems that's not an easy thing.
The problem I see with HDFS is that it uses the username of the current system user, so without any form of authentication anyone can create a user with a certain username and get superuser permissions on the cluster.
So I need some form of authentication. IP-based auth would be fine (only clients with certain IPs can connect to HDFS), but I couldn't find an option to enable it.
Setting up Kerberos just for HDFS is not an option, because running another service just to run another service to run another service, etc., would only create tons of work.
If enabling any viable form of security is impossible, is there any other HDFS-like service on DC/OS I can use? I need some HA storage to fetch config files, and sometimes jars, from artifact URIs to run services. I also need a place to store Parquet files from Spark Streaming.
The DC/OS HDFS version is 2.6.x.
Unfortunately, it seems that Kerberos is the only real form of authentication in HDFS. Without it, HDFS will trust every user.
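To illustrate the problem from the question: with HDFS simple authentication, any client that can reach the NameNode can claim an arbitrary identity just by setting an environment variable (this example assumes an unsecured cluster whose superuser is named hdfs, and a made-up directory path):

# Impersonate the superuser on an unsecured (simple-auth) cluster
export HADOOP_USER_NAME=hdfs
hadoop fs -ls /
hadoop fs -chmod -R 777 /protected/dir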

FTP through FileZilla to a Google Cloud machine, can't achieve it

Before asking this question I looked through Google and tried different alternatives, none of which were successful for me, sadly. I'm a little above the noob level. What I want is basically to host a WordPress site on a Google Cloud Debian machine.
I was doing fine installing services through their SSH access until I installed an FTP service and wanted to access it from a remote computer (my own). I only got as far as:
Status: Waiting to retry...
Status: Connecting to 104.197.183.19...
Response: fzSftp started
Command: open "root#104.197.183.19" 22
Error: Connection timed out
Error: Could not connect to server
I kept looking and trying new approaches until I found the gcloud documentation for FTP, but it is not aimed at newcomers, so my questions are:
Where do I input the gcloud commands, on my computer or in the SSH console (the Google Cloud machine)?
Do I need to use gcloud for remote FTP access, or can I do it entirely from my computer and their SSH machine?
Do I really need to add an SSH authorization file to FileZilla, or is there a way to disable that check on my VPS so it lets me sign in with just a username and a password?
What I already tried that didn't work for me:
gcloud documentation for SSH and FTP
Google Cloud documentation for setting up a WordPress site
Many others
Basically, what I need in short is to access the VPS through FTP so I can continue with my learning. I've been stuck there for two days.
To get access to a user's public area, i.e. public_html:
Go to the account's cPanel area and, under Security > SSH Access, you can import a key file.
You can use PuTTYgen to make one; you will need both a private and a public key.
Paste the keys into the boxes.
You may get a warning message about the private key; this is OK.
Go to Manage under the public key and authorize it.
Or
Make one using the interface in cPanel and download both keys.
Then in FileZilla:
Host: IP of the server
Protocol: SFTP
Logon Type: Key File
Key File: the .ppk you made
(If you asked cPanel to make the files, select the one that does not end in .pub and FileZilla will convert it to a .ppk file for you.)
After clicking Connect you should be in.
If you still get an error, make sure the SSH port (22) is open in your firewalls, both in Google Cloud (cloud.google.com > Networks) and in the WHM > LFD/CSF plugin.
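If you prefer the command line over PuTTYgen, a key pair can also be generated with ssh-keygen (the file names below are just examples); the .pub file is what gets pasted or imported into cPanel, and FileZilla can convert the private key to .ppk for you:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/cpanel_sftp -C "filezilla-sftp"
cat ~/.ssh/cpanel_sftp.pub   # paste this into the cPanel public key box, then authorize it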
Use the SSH File Transfer Protocol (SFTP).
There is no need to install an FTP service.
Use WinSCP to connect with SFTP.
The recommended way of transferring files to a Unix-based Google Compute Engine VM is via the gcloud compute copy-files command. For this, please install the Google Cloud SDK. Then, run a command such as the following:
gcloud compute copy-files --zone=<Compute Engine zone>/path/to/local/file.txt <Compute Engine instance name>:/path/to/destination/file.txt
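For instance, with made-up instance, zone, and file names, it could look like this:

gcloud compute copy-files --zone=us-central1-a ./wp-content.zip my-wordpress-vm:/home/myuser/wp-content.zip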
If you'd like to use FileZilla, you'll have to configure it for access. The SSH daemon on Compute Engine VMs is set up for key-based authentication. This forum post indicates how this is possible in FileZilla. The catch is that you need to put your public key on the VM, which can be a little tricky. gcloud compute copy-files and gcloud compute ssh take care of this for you, which is why they are the recommended method.
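As a rough sketch of that setup (paths and names here are assumptions): running gcloud compute ssh once will generate a key pair for you and push the public key to the VM, and FileZilla can then reuse the private key:

# One-time: creates ~/.ssh/google_compute_engine and uploads the public key
gcloud compute ssh my-instance --zone=us-central1-a
# In FileZilla: Protocol = SFTP, Host = the VM's external IP, Logon Type = Key file,
# User = the username gcloud used for the SSH session, Key file = ~/.ssh/google_compute_engine
# (on Windows, FileZilla offers to convert the key to .ppk)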

Gsutil - Installing and configuring on a remote server. How to automate it?

I have currently installed gsutil on a server to access my GCS buckets. I followed the instructions in the section 'How to convert gsutil to use OAuth 2.0' from https://cloud.google.com/storage/docs/gsutil_install
The intermediate steps in the instructions require copy-pasting a URL into a browser to generate a code that you then have to enter back in the terminal. You also need to enter proxy server details (if any).
I am looking for ways to automate this setup and configuration process for gsutil.
Any ideas/references/suggestions/comments are welcome.
Thanks.
Can you say more about what you're trying to do? Are you looking to create distinct credentials for each of a set of users, or are you trying to set up gsutil running on multiple machines all as part of an application that authenticates as that application to Google Cloud Storage?
For the former, you need users to set up their own credentials. The web-based dialog for approving the creation of OAuth2 credentials was designed to make it unlikely that a customer could grant long-lasting credentials without being aware that they are doing so (for security reasons).
For the latter you should use a service account (see https://cloud.google.com/storage/docs/authentication#service_accounts). You create those credentials once and then deploy them on your production machines along with gsutil - which is a valid security approach because all instances of those machines are authenticating on behalf of an application, not distinct users.
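As a sketch of that non-interactive route, assuming gsutil is used through the Google Cloud SDK and the service account's JSON key has been copied to the server (the key path and bucket name are placeholders):

# One-time, scriptable, no browser step required
gcloud auth activate-service-account --key-file=/path/to/service-account-key.json
gsutil ls gs://my-bucket   # should now work without any prompts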