Get entire bucket or more than one object from AWS S3 bucket through Ansible - amazon-s3

As far as I know, the Ansible S3 module can only get one object at a time.
My question is: what if I want to download an entire bucket or more than one object from an S3 bucket at once? Is there any hack?

I was able to achieve it like so:
- name: get s3_bucket_items
  s3:
    mode: list
    bucket: MY_BUCKET
    prefix: MY_PREFIX/
  register: s3_bucket_items

- name: download s3_bucket_items
  s3:
    mode: get
    bucket: MY_BUCKET
    object: "{{ item }}"
    dest: /tmp/
  with_items: "{{ s3_bucket_items.s3_keys }}"
Notes:
- Your prefix should not have a leading slash.
- The {{ item }} value will already include the prefix.

You have to first list the files into a variable, then copy the files using that variable.
- name: List files
  aws_s3:
    aws_access_key: 'YOUR_KEY'
    aws_secret_key: 'YOUR_SECRET'
    mode: list
    bucket: 'YOUR_BUCKET'
    prefix: 'YOUR_BUCKET_FOLDER/'   # remember to add a trailing slash
    marker: 'YOUR_BUCKET_FOLDER/'   # remember to add a trailing slash
  register: s3BucketItems

- name: Copy files
  aws_s3:
    aws_access_key: 'YOUR_KEY'
    aws_secret_key: 'YOUR_SECRET'
    bucket: 'YOUR_BUCKET'
    object: '{{ item }}'
    dest: 'YOUR_DESTINATION_FOLDER/{{ item | basename }}'
    mode: get
  with_items: '{{ s3BucketItems.s3_keys }}'

The Ansible S3 module currently has no built-in way to synchronize buckets to disk recursively.
In theory, you could try to collect the keys to download with a list task and then fetch them one by one:
- name: register keys for synchronization
  s3:
    mode: list
    bucket: hosts
    object: /data/*
  register: s3_bucket_items

- name: sync s3 bucket to disk
  s3:
    mode: get
    bucket: hosts
    object: "{{ item }}"
    dest: /etc/data/conf/
  with_items: "{{ s3_bucket_items.s3_keys }}"
While I often see this solution, it does not seem to work with current ansible/boto versions, due to a bug with nested S3 'directories' (see this bug report for more information) and the Ansible S3 module not creating subdirectories for keys.
I believe it is also possible that you would run into memory issues when syncing very large buckets with this method.
I would also like to add that you most likely do not want to use credentials hard-coded into your playbooks; I suggest you use IAM EC2 instance profiles instead, which are much more secure and more convenient.
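For example, a minimal sketch (assuming the play runs on an EC2 instance whose instance profile grants S3 access): the credential parameters are simply left out, and boto picks up the instance profile automatically:
- name: list bucket objects via the instance profile
  s3:
    mode: list
    bucket: MY_BUCKET       # placeholder bucket name
    prefix: MY_PREFIX/
  register: s3_bucket_items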
A solution that works for me would be this:
- name: Sync directory from S3 to disk
  command: "s3cmd sync -q --no-preserve s3://hosts/{{ item }}/ /etc/data/conf/"
  with_items:
    - data

This approach lists the objects, creates the needed subdirectories, and then downloads every object:
- name: Get s3 objects
  s3:
    bucket: your-s3-bucket
    prefix: your-object-directory-path
    mode: list
  register: s3_object_list

- name: Create download directory
  file:
    path: "/your/destination/directory/path/{{ item | dirname }}"
    state: directory
  with_items:
    - "{{ s3_object_list.s3_keys }}"

- name: Download s3 objects
  s3:
    bucket: your-s3-bucket
    object: "{{ item }}"
    mode: get
    dest: "/your/destination/directory/path/{{ item }}"
  with_items:
    - "{{ s3_object_list.s3_keys }}"

As of Ansible 2.0 the S3 module includes the list action, which lets you list the keys in a bucket.
If you're not ready to upgrade to Ansible 2.0 yet then another approach might be to use a tool like s3cmd and invoke it via the command module:
- name: Get objects
  command: s3cmd ls s3://my-bucket/path/to/objects
  register: s3objects
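From there, the registered output could be fed back into s3cmd to fetch each key. A rough sketch (it assumes each line of s3cmd ls output ends with the s3:// URL, so the URL is taken as the last whitespace-separated field):
- name: Download each listed object
  command: "s3cmd get {{ item.split() | last }} /tmp/"
  with_items: "{{ s3objects.stdout_lines }}"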

This is a non-Ansible solution, but I finally got it working on an instance running with an assumed role that has S3 bucket access (or with the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set):
---
- name: download fs s3 bucket
  command: aws s3 sync s3://{{ s3_backup_bucket }} {{ dst_path }}
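If you need to pass explicit credentials instead of relying on the assumed role, a sketch using the task-level environment keyword (the variable names here are placeholders):
- name: download fs s3 bucket with explicit credentials
  command: aws s3 sync s3://{{ s3_backup_bucket }} {{ dst_path }}
  environment:
    AWS_ACCESS_KEY_ID: "{{ my_access_key }}"       # hypothetical vault variable
    AWS_SECRET_ACCESS_KEY: "{{ my_secret_key }}"   # hypothetical vault variable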

The following code will list every file in every S3 bucket in the account. It is run as a role, with a group_vars/localhost/vault.yml containing the AWS keys.
I still haven't found out why the second, more straightforward Method II doesn't work, but maybe someone can enlighten us.
- name: List S3 Buckets
  aws_s3_bucket_facts:
    aws_access_key: "{{ aws_access_key_id }}"
    aws_secret_key: "{{ aws_secret_access_key }}"
    # region: "eu-west-2"
  register: s3_buckets
#- debug: var=s3_buckets

- name: Iterate buckets
  set_fact:
    app_item: "{{ item.name }}"
  with_items: "{{ s3_buckets.ansible_facts.buckets }}"
  register: app_result
#- debug: var=app_result.results  # .item.name <= does not work??

- name: Create Fact List
  set_fact:
    s3_bucketlist: "{{ app_result.results | map(attribute='item.name') | list }}"
#- debug: var=s3_bucketlist

- name: List S3 Bucket files - Method I - works
  local_action:
    module: aws_s3
    bucket: "{{ item }}"
    aws_access_key: "{{ aws_access_key_id }}"
    aws_secret_key: "{{ aws_secret_access_key }}"
    mode: list
  with_items:
    - "{{ s3_bucketlist }}"
  register: s3_list_I
#- debug: var=s3_list_I

- name: List S3 Bucket files - Method II - does not work
  aws_s3:
    aws_access_key: "{{ aws_access_key_id }}"
    aws_secret_key: "{{ aws_secret_access_key }}"
    bucket: "{{ item }}"
    mode: list
  with_items: "{{ s3_bucketlist }}"
  register: s3_list_II
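To inspect what either method returned per bucket, a debug over the registered loop results can help (a sketch; each entry in .results carries the bucket name in .item and the module's s3_keys return value):
- name: Show files per bucket
  debug:
    msg: "{{ item.item }}: {{ item.s3_keys | default([]) }}"
  with_items: "{{ s3_list_I.results }}"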

Maybe you could change your with_items; then it should work:
- name: get list to download
  aws_s3:
    region: "{{ region }}"
    bucket: "{{ item }}"
    mode: list
  with_items: "{{ s3_bucketlist }}"
  register: s3_bucket_items
But it would probably be faster to use:
- name: Sync directory from S3 to disk
  command: "aws --region {{ region }} s3 sync s3://{{ bucket }}/ /tmp/test"

Related

Create multiple buckets in S3 using ANSIBLE-PLAYBOOK

I want to create multiple S3 buckets using Ansible, upload an object/directory into the created buckets (which works in Terraform), and upload a file (which I think does not work in Terraform with more than one bucket).
Is there any chance of doing this in Ansible?
I'm new to Ansible; I've just been reading the documentation and watching some videos.
Here is the basic playbook I have put together so far:
---
- name: create s3 bucket
  hosts: localhost
  connection: local
  vars:
    aws_access_key: "{{ aws_access_key }}"
    aws_secret_key: "{{ aws_secret_key }}"
  vars_files:
    - creds.yml
  tasks:
    - name: create a simple s3 bucket
      amazon.aws.s3_bucket:
        name: kevs-task2-ansible
        state: present
        region: ap-southeast-1
        acl: public-read
        versioning: yes
    - name: create folder in the bucket
      amazon.aws.aws_s3:
        bucket: kevs-task2-ansible
        object: /public
        mode: create
        acl: public-read
    - name: create file in the folder
      amazon.aws.aws_s3:
        bucket: kevs-task2-ansible
        object: /public/info.txt
        src: info.txt
        mode: put
In a nutshell, and super simple:
- name: create a simple s3 bucket
  amazon.aws.s3_bucket:
    name: "{{ item }}"
    state: present
    region: ap-southeast-1
    acl: public-read
    versioning: yes
  loop:
    - kevs-task2-ansible
    - an_other_name
    - yet_an_other_one
To build on that, please read up on Ansible loops.
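The same loop idea extends to the folder and file tasks from the question; a sketch that uploads info.txt into each bucket:
- name: create file in the folder of each bucket
  amazon.aws.aws_s3:
    bucket: "{{ item }}"
    object: /public/info.txt
    src: info.txt
    mode: put
  loop:
    - kevs-task2-ansible
    - an_other_name
    - yet_an_other_one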

Ansible variables and tags

I have a playbook that calls 2 roles with shared variables; I'm using the roles to create some level of abstraction.
The problem happens when I call a role with tags and with variables that belong to another role: I get an error. I also tried using dependencies, which didn't work.
Let me paste the code here to explain.
I have a role, KEYS, where I keep my API calls to my 2 different platforms. As listed, I'm registering the results to user_result1 and user_result2.
First role, my_key.yml:
# tasks file for list_users
- name: List Users platform 1
  uri:
    url: 'http://myhttpage.example.platform1'
    method: GET
    headers:
      API-KEY: 'SOME_API_KEY'
  register: user_result1

- name: List Users platform 2
  uri:
    url: 'http://myhttpage.example.platform2'
    method: GET
    headers:
      API-KEY: 'SOME_API_KEY'
  register: user_result2
Second role: list_users
- name: List users platform1
  set_fact:
    user: '{{ user | default([]) + [ item.email ] }}'
  loop: "{{ user_result1.json }}"

- debug:
    msg: "{{ user }}"
  tags:
    - user_1

- name: List users Cloudflare
  set_fact:
    name: "{{ name | default([]) + [item.user.email] }}"
  loop: "{{ user_result2.result }}"

- debug:
    msg: "{{ name }}"
  tags:
    - user_2
Playbook.yml:
---
- name: Users
  gather_facts: no
  hosts: localhost
  roles:
    - my_key
    - list_users
When I run the playbook without --tags user_1 or user_2, it works fine.
However, when I run it using the tags, I get an error saying that the variable user_result1 or user_result2 doesn't exist.
Any ideas, please?
Thanks, Joe.
(@U880D basically answered the question, but the OCD in me wants to mark this as fixed, so I'm typing this.)
This is working as expected: --tags basically lets you skip every task except those with the specified tag. See the official docs for more info on tags:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_tags.html
Echoing what @zeitounator said: if you want something to run unconditionally when --tags is used, tag it with always.
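For example, a sketch of the first my_key task: tagging it always means user_result1 is registered even when the play runs with --tags user_1:
- name: List Users platform 1
  uri:
    url: 'http://myhttpage.example.platform1'
    method: GET
    headers:
      API-KEY: 'SOME_API_KEY'
  register: user_result1
  tags:
    - always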

How to dynamically set the hosts field in Ansible playbooks with a variable generated during execution?

I am trying to test something at home with the variables mechanism Ansible offers, which I am about to implement in one of my projects at work. I've been searching for a while now, but I can't get it working that easily, even with others' solutions here and there.
I will represent my project logic from work by demonstrating with my test directory and file structure at home. Here's the case; I have the following playbooks:
main.yaml
pl1.yaml
pl2.yaml
Contents of ./main.yaml:
- import_playbook: /home/martin/ansible/pl1.yaml
- import_playbook: /home/martin/ansible/pl2.yaml
Contents of ./pl1.yaml:
- name: Test playbook 1
  hosts: localhost
  tasks:
    - name: Discovering the secret host
      shell: cat /home/martin/secret
      register: whichHostAd
    - debug:
        msg: "{{ whichHostAd.stdout }}"
    - name: Discovering my hostname
      shell: hostname
      register: myHostnameAd
    - set_fact:
        whichHost: "{{ whichHostAd.stdout }}"
        myHostname: "{{ myHostnameAd.stdout }}"
        cacheable: yes

- name: Test playbook 1 part 2
  hosts: "{{ hostvars['localhost']['ansible_facts']['whichHost'] }}"
  tasks:
    - name: Structuring info
      shell: hostname
      register: secretHostname
    - name: Showing the secret hostname
      debug:
        msg: "{{ secretHostname.stdout }}"
Contents of ./pl2.yaml:
- name: Test Playbook 2
  hosts: "{{ whichHost }}"
  tasks:
    - name: Finishing up
      shell: echo "And here am i again.." && hostname
    - name: Showing var myHostname
      debug:
        msg: "{{ myHostname.stdout }}"
The whole idea is to have a working variable in the hosts field across the plays. How do we do that?
The playbook does not run at all unless I define the whichHost variable as an extra arg, and that's OK; I can do it each time. But during execution I would like that variable to be manageable and changeable. In the test case above, I want whichHost to be usable everywhere across the plays/playbooks included in main.yaml, specifically to reflect the output of the first task in pl1.yaml (the whichHostAd.stdout variable), so I can determine the host I am about to target in pl2.yaml.
According to the docs, I should be able to at least access it with hostvars (as in my playbook), but this is the output I get when I try the above example:
ERROR! The field 'hosts' has an invalid value, which includes an undefined variable. The error was: 'dict object' has no attribute 'whichHost'
The error appears to have been in '/home/martin/ansible/pl1.yaml': line 22, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Test playbook 1 part 2
^ here
set_fact also does not seem to be very helpful here. Any help will be appreciated!
OK, I've actually figured it out pretty quickly.
We definitely need a set_fact task holding the actual data/output:
- hosts: localhost
  tasks:
    - name: Saving variable
      set_fact:
        whichHost: "{{ whichHostAd.stdout }}"
After that, when you want to use the variable in other hosts and plays, you have to reference both the host and the fact:
"{{ hostvars['localhost']['whichHost'] }}"
Like in my test above, but without ['ansible_facts'].
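Applied to ./pl2.yaml from the question, that gives something like this sketch (note that myHostname is a plain fact, so it is referenced without .stdout):
- name: Test Playbook 2
  hosts: "{{ hostvars['localhost']['whichHost'] }}"
  tasks:
    - name: Finishing up
      shell: echo "And here am i again.." && hostname
    - name: Showing var myHostname
      debug:
        msg: "{{ hostvars['localhost']['myHostname'] }}"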

Using variable from set_fact in roles within same playbook

I'm stuck using a variable from tasks within roles in an Ansible playbook. My playbook is the following:
- hosts: server.com
  gather_facts: yes
  tasks:
    - set_fact:
        private_ip: "{{ item }}"
      with_items: "{{ ansible_all_ipv4_addresses }}"
      when: "item.startswith('10.')"
    - debug: var=private_ip
  roles:
    - role: check-server
      server_ip: 10.10.0.1
      client_ip: "{{ private_ip }}"
When the playbook is run, debug shows the correct IP inside the variable private_ip, but I can't make client_ip (from the roles block) pick up the content of private_ip; client_ip always remains undefined.
What sorcery can I apply here to get client_ip=$private_ip?
Tasks are executed after roles are applied. Change tasks to pre_tasks.
Besides, using set_fact in a loop is not best practice. If you get the value you want, that's OK; I believe you verified it. But you should rather use (ansible_all_ipv4_addresses | select("match", "10\..*") | list)[0].
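Putting both suggestions together, a minimal sketch:
- hosts: server.com
  gather_facts: yes
  pre_tasks:
    - set_fact:
        private_ip: "{{ (ansible_all_ipv4_addresses | select('match', '10\\..*') | list)[0] }}"
  roles:
    - role: check-server
      server_ip: 10.10.0.1
      client_ip: "{{ private_ip }}"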

Ansible include roles that have been defined in hostvars

I am trying to do the following:
- define the appropriate roles for each host in host_vars
- create a play that calls ONLY the roles that relate to a specific host and have been defined in a variable in its host_vars
Is there a way to do this?
e.g. host_vars/hostname_one/main.yml:
roles_to_install:
  - role_one
  - role_two
  - ...
run_all_roles.yml:
---
- hosts: '{{ TARGET }}'
  become: yes
  ...
  roles:
    - { role: "roles_to_install" }
Obviously this does not work.
Is there a way to make ansible-playbook -i <hosts_file> run_all_roles.yml -e "TARGET=hostname_one" run?
This is not how you should be approaching your roles and inventories.
Instead, if you put your hosts into appropriate groups in the inventory, you can use the hosts parameter of each play to drive what gets installed where.
For example, I might have a typical web application that runs on NGINX with some application-specific things (such as a Python environment), fronted by some NGINX servers that may serve static content, plus a typical database.
My inventory might then look like this:
[frontend-web-nodes]
web-1.example.org
web-2.example.org
[application-nodes]
app-1.example.org
app-2.example.org
[database-nodes]
database.example.org
Now I can create a play for my database role that installs and configures the database, and set hosts: database-nodes to make sure the play (and so the role(s) it runs) only targets the database.example.org box.
So, something like this:
- name: database
  hosts: database-nodes
  roles:
    - database
For my frontend and application web nodes, I have a shared dependency on installing and configuring NGINX, but my application servers also need some other things. So my frontend web nodes can be configured with a simple play like this:
- name: frontend-web
  hosts: frontend-web-nodes
  roles:
    - nginx
While for my application nodes I might have either something like this:
- name: application
  hosts: application-nodes
  roles:
    - nginx
    - application
Or I could just do this:
- name: application
  hosts: application-nodes
  roles:
    - application
And in my roles/application/meta/main.yml define a dependency on the nginx role:
dependencies:
  - role: nginx
As I commented, the solution was easier than expected:
---
- hosts: '{{ TARGET }}'
  become: yes
  vars_files:
    - ./vars/main.yml
  roles:
    - { role: "roleA", when: "'roleA' in roles_to_install" }
    - { role: "roleB", when: "'roleB' in roles_to_install" }
    - ...
Assuming that a correct roles_to_install var is defined inside host_vars/$fqdn/main.yml like so:
---
roles_to_install:
  - roleA
  - roleB
  - ...
Thank you for your assistance, guys.
What about this:
playfile.yml:
- hosts: all
  tasks:
    - when: host_roles is defined
      include_role:
        name: "{{ role_item.name }}"
      loop: "{{ host_roles }}"
      loop_control:
        loop_var: role_item
hostvars_file.yml:
host_roles:
  - name: myrole1
    myrole1_var1: "myrole1_value1"
    myrole1_var2: "myrole1_value2"
  - name: myrole2
    myrole2_var1: "myrole2_value1"
    myrole2_var2: "myrole2_value2"
But then your host roles would be run during task execution, while normally roles are executed before tasks.
Alternatively, why not have a role for this:
roles/ansible.hostroles/tasks/main.yml:
---
# tasks file for ansible.hostroles
- when: host_roles is defined
  include_role:
    name: "{{ role_item.name }}"
  loop: "{{ host_roles }}"
  loop_control:
    loop_var: role_item
playfile.yml:
- hosts: all
  roles:
    - ansible.hostroles