Snapshot vs. Volume Size - amazon-s3

I am using a public dataset snapshot on Amazon EC2. The data in the snapshot is roughly 150GB and the snapshot itself is 180GB. I knew that by performing operations on the dataset I would need more than 30GB of free space, so I put the snapshot in a 300GB volume. When I look at my stats, though (unfortunately while a process is running, so I think I am about to run out of room), it appears that the snapshot is still limited to 180GB.
Is there a way to expand its size to the size of the volume without losing my work?
Is there a possibility that the snapshot is actually continuous with another drive (e.g. /dev/sdb)? (A girl can hope, right?)
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 1.1G 8.4G 11% /
none 34G 120K 34G 1% /dev
none 35G 0 35G 0% /dev/shm
none 35G 56K 35G 1% /var/run
none 35G 0 35G 0% /var/lock
none 35G 0 35G 0% /lib/init/rw
/dev/sdb 827G 201M 785G 1% /mnt
/dev/sdf 174G 162G 2.6G 99% /var/lib/couchdb/0.10.0
My instance is running Ubuntu 10.

Is there a way to expand its size to the size of the volume without
losing my work?
That depends on whether you can live with a few minutes of downtime for the computation, i.e. whether stopping the instance (and hence the computation process) is a problem or not. Eric Hammond has written a detailed article about Resizing the Root Disk on a Running EBS Boot EC2 Instance, which addresses a different but closely related problem:
[...] what if you have an EC2 instance already running and you need to
increase the size of its root disk without running a different
instance?
As long as you are ok with a little down time on the EC2 instance (few
minutes), it is possible to change out the root EBS volume with a
larger copy, without needing to start a new instance.
You have already done most of the steps he describes and created a new 300GB volume from the 180GB snapshot, but apparently you have missed the last required step, namely resizing the file system on the volume - here are the instructions from Eric's article:
Connect to the instance with ssh (not shown) and resize the root file
system to fill the new EBS volume. This step is done automatically at
boot time on modern Ubuntu AMIs:
# ext3 root file system (most common)
sudo resize2fs /dev/sda1
#(OR)
sudo resize2fs /dev/xvda1
# XFS root file system (less common):
sudo apt-get update && sudo apt-get install -y xfsprogs
sudo xfs_growfs /
So the details depend on the file system in use on that volume, but there should be a corresponding resize command available for all but the most esoteric or outdated ones, none of which I'd expect in a regular Ubuntu 10 installation.
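In your case the 300GB volume appears to be /dev/sdf, mounted at /var/lib/couchdb/0.10.0, rather than the root disk. A minimal sketch of what that would look like, assuming an ext3 file system that supports online growing (verify the type first, and stop the process and unmount if online resizing isn't supported on your setup):
# check which file system is on the data volume
df -T /var/lib/couchdb/0.10.0
# ext3/ext4: grow the file system to fill the 300GB volume
sudo resize2fs /dev/sdf
# XFS instead: grow via the mount point
sudo xfs_growfs /var/lib/couchdb/0.10.0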
Good luck!
Appendix
Is there a possibility that the snapshot is actually continuous with
another drive (e.g. /dev/sdb)?
Not just like that - this would require a RAID setup of sorts, which is unlikely to be available on a stock Ubuntu 10 installation unless somebody provided you with a correspondingly customized AMI. The size of /dev/sdb does actually hint towards this being your Amazon EC2 Instance Storage:
When an instance is created from an Amazon Machine Image (AMI), in
most cases it comes with a preconfigured block of pre-attached disk
storage. Within this document, it is referred to as an instance store;
it is also known as an ephemeral store. An instance store provides
temporary block-level storage for Amazon EC2 instances. The data on
the instance store volumes persists only during the life of the
associated Amazon EC2 instance. The amount of this storage ranges from
160GiB to up to 3.3TiB and varies by Amazon EC2 instance type. [...] [emphasis mine]
Given this storage is not persisted on instance termination (in contrast to the EBS storage we have all got used to enjoying - the different behavior is detailed in Root Device Storage), it should be treated with appropriate care (i.e. never store anything on instance storage that you couldn't afford to lose).

Related

Proposal to Migrate OpenNebula Datastore from Local FS to NFS

I have an instance of OpenNebula with 2 nodes running KVM and a local file store. VM images are scp'd to each node, so there is no option of failover or live migration.
I would like to implement NFS shared storage and move the VM's from the local FS datastore to the NFS shared storage datastore. OpenNebula supports migrating VM's between datastores, but only datastores of the same type i.e. 'ssh' to 'ssh' but not 'ssh' to 'shared'.
I am working on a method of achieving this, and would love some feedback as to why this is a good or a bad idea.
Thanks
OpenNebula doesn't currently support migrating VMs from one type of datastore to a different type of datastore. I have been working on a method that works, and I want to document it here to get some feedback and opinions on it.
A datastore type is identified primarily by the Transfer Manager driver 'TM_MAD' setting. This setting cannot be changed through Sunstone or through the CLI, so we need a method to do just that. This is what I did. I started with a fresh install of OpenNebula 5.4.13 in one VM and 2 VM nodes, all running Debian 9 within VMware virtual machines (don't forget to enable virtualisation in the VM CPU options).
NOTE: This is an experimental process so make sure you Backup everything first!
Steps
To migrate to a different store, there are a few steps we need to do. They are as follows:
Set up the NFS share exports,
Move the VM images to the NFS share and mount the datastore,
Change the datastore types,
Configure the nodes for NFS share.
Setup NFS Server
The first thing we want to do is set up the NFS shares we want to use. I'm using a single share for the base datastore folder, but you could use separate shares for each datastore ID from different NFS servers.
On the NFS Server create the datastore folder i.e. mkdir /share/one_datastore,
Add the datastore path to exports and export the new share exportfs -rav,
Confirm the share is available with showmount -e localhost (see the sketch below).
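A minimal sketch of the server side (the export options here are an assumption on my part - one common choice - so adjust them to your environment):
# on the NFS server
mkdir -p /share/one_datastore
# example /etc/exports line (options are an assumption):
#   /share/one_datastore  *(rw,sync,no_subtree_check,no_root_squash)
exportfs -rav
showmount -e localhost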
Prepare to Migrate
Before we modify the datastores there are a few things to do first:
Shut down any running VMs and undeploy them. This saves the machines' states and copies the images back to the image store,
Stop the OpenNebula and Sunstone services: systemctl stop opennebula && systemctl stop opennebula-sunstone (both steps are sketched below).
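A minimal sketch of those two steps (the VM ID is a placeholder):
# undeploy every running VM (repeat for each ID shown by onevm list)
onevm list
onevm undeploy <vm-id>
# stop the services on the frontend
systemctl stop opennebula opennebula-sunstone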
Migrate Data
Shared storage shares the VM disk images so that all the nodes can access the same data, so copy the VM data to the NFS share ready for mounting.
From the Sunstone frontend server confirm the NFS shares showmount -e [nfs-server],
Create a temp folder to mount the share in mkdir /mnt/datastore,
Temporarily mount the NFS folder mount [nfs-server]:/share/one_datastore /mnt/datastore,
Move the datastore folders to the share mv /var/lib/one/datastores/* /mnt/datastore/
OpenNebula datastore folders now live on the NFS server: ls /mnt/datastore should list folders 0, 1 and 2,
Mount the NFS share to replace the OpenNebula datastore folder mount [nfs-server]:/share/one_datastore /var/lib/one/datastores,
Confirm the folders are available ls /var/lib/one/datastores should list our 3 folders 0, 1 and 2,
Add the mount to /etc/fstab to persist it across reboots (see the sketch below).
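Putting those steps together, a minimal sketch (the NFS server name and the fstab options are placeholders/assumptions):
# on the OpenNebula frontend
showmount -e nfs-server
mkdir /mnt/datastore
mount nfs-server:/share/one_datastore /mnt/datastore
mv /var/lib/one/datastores/* /mnt/datastore/
umount /mnt/datastore
mount nfs-server:/share/one_datastore /var/lib/one/datastores
ls /var/lib/one/datastores   # should list 0, 1 and 2
# example /etc/fstab line (options are an assumption):
#   nfs-server:/share/one_datastore  /var/lib/one/datastores  nfs  defaults,_netdev  0  0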
The OpenNebula frontend is now configured to access the datastore folders from the NFS share. Next we want to change the datastore type from ssh to shared.
Change Datastore Types
The datastore configuration is stored in the OpenNebula database /var/lib/one/one.db. We can change the driver type by editing the datastore configuration data, which tells OpenNebula which drivers to use and how to handle the datastore data. By default OpenNebula uses an SQLite database, with MySQL as an option. I'm using SQLite, but the same works for MySQL.
Open the OpenNebula database sqlite3 /var/lib/one/one.db,
View all tables with .tables. datastore_pool is the table we want to modify,
List all the records in the table: select * from datastore_pool; will result in a screen-full of configuration data. Each record has an identifier oid which matches the datastore ID, like this (the first 0 is the datastore ID for the default SYSTEM datastore):
0|system|<DATASTORE><ID>0</ID><UID>0</UID><GID>0</GID><UNAME>oneadmin</UNAME><GNAME>oneadmin</GNAME><NAME>system</NAME><PERMISSIONS><OWNER_U>1</OWNER_U><OWNER_M>1</OWNER_M><OWNER_A>0</OWNER_A><GROUP_U>1</GROUP_U><GROUP_M>0</GROUP_M><GROUP_A>0</GROUP_A><OTHER_U>0</OTHER_U><OTHER_M>0</OTHER_M><OTHER_A>0</OTHER_A></PERMISSIONS><DS_MAD><![CDATA[-]]></DS_MAD><TM_MAD><![CDATA[ssh]]></TM_MAD><BASE_PATH><![CDATA[/var/lib/one//datastores/0]]></BASE_PATH><TYPE>1</TYPE><DISK_TYPE>0</DISK_TYPE><STATE>0</STATE><CLUSTERS><ID>0</ID></CLUSTERS><TOTAL_MB>0</TOTAL_MB><FREE_MB>0</FREE_MB><USED_MB>0</USED_MB><IMAGES></IMAGES><TEMPLATE><ALLOW_ORPHANS><![CDATA[NO]]></ALLOW_ORPHANS><DISK_TYPE><![CDATA[FILE]]></DISK_TYPE><DS_MIGRATE><![CDATA[YES]]></DS_MIGRATE><RESTRICTED_DIRS><![CDATA[/]]></RESTRICTED_DIRS><SAFE_DIRS><![CDATA[/var/tmp]]></SAFE_DIRS><SHARED><![CDATA[NO]]></SHARED><TM_MAD><![CDATA[ssh]]></TM_MAD><TYPE><![CDATA[SYSTEM_DS]]></TYPE></TEMPLATE></DATASTORE>|0|0|1|1|0
Now to change the datastore type. Grab the data from the 3rd column, body (you can run select body from datastore_pool where oid=0;), and copy it into your favourite text editor (that's the chunk starting with <DATASTORE> and ending with </DATASTORE>). Find and replace:
Find: <TM_MAD><![CDATA[ssh]]></TM_MAD>
Replace with: <TM_MAD><![CDATA[shared]]></TM_MAD>
Find: <SHARED><![CDATA[NO]]></SHARED>
Replace with: <SHARED><![CDATA[YES]]></SHARED>
Now update the SYSTEM datastore record. Run the following command on the database, replacing [datastore-config] with the text block you just modified: update datastore_pool set body='[datastore-config]' where oid=0;
Updating the IMAGE datastore is a little different. There is no SHARED option, but we want to use either the shared or the qcow2 driver. I used qcow2. So: select body from datastore_pool where oid=1;:
Find: <TM_MAD><![CDATA[ssh]]></TM_MAD>
Replace: <TM_MAD><![CDATA[qcow2]]></TM_MAD>
Update the record: update datastore_pool set body='[datastore-config]' where oid=1;,
Update the FILES datastore (oid=2) by replacing <TM_MAD><![CDATA[ssh]]></TM_MAD> with <TM_MAD><![CDATA[shared]]></TM_MAD> and update it using the method above. (A scripted alternative to the manual copy-and-edit is sketched below.)
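If you would rather script the substitutions than copy and paste the XML by hand, here is a minimal sketch using SQLite's replace() function (same effect as the manual edits above; it assumes the default SQLite backend, so back up one.db first):
cp /var/lib/one/one.db /var/lib/one/one.db.bak
sqlite3 /var/lib/one/one.db <<'SQL'
-- SYSTEM datastore (oid=0): ssh -> shared, SHARED NO -> YES
UPDATE datastore_pool SET body = replace(body, '<TM_MAD><![CDATA[ssh]]></TM_MAD>', '<TM_MAD><![CDATA[shared]]></TM_MAD>') WHERE oid = 0;
UPDATE datastore_pool SET body = replace(body, '<SHARED><![CDATA[NO]]></SHARED>', '<SHARED><![CDATA[YES]]></SHARED>') WHERE oid = 0;
-- IMAGE datastore (oid=1): ssh -> qcow2
UPDATE datastore_pool SET body = replace(body, '<TM_MAD><![CDATA[ssh]]></TM_MAD>', '<TM_MAD><![CDATA[qcow2]]></TM_MAD>') WHERE oid = 1;
-- FILES datastore (oid=2): ssh -> shared
UPDATE datastore_pool SET body = replace(body, '<TM_MAD><![CDATA[ssh]]></TM_MAD>', '<TM_MAD><![CDATA[shared]]></TM_MAD>') WHERE oid = 2;
SQL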
Now that the datastores have been updated to use the shared driver, let's start Sunstone and check that the datastores show up.
systemctl start opennebula && systemctl start opennebula-sunstone
Jump into the Sunstone web UI and go to Datastores. Open each datastore to check whether SHARED is enabled and the correct driver shows, i.e. shared or qcow2.
DON'T DO ANYTHING YET - we still need to configure the nodes!
Configure the Nodes
Because we stopped and undeployed the VMs, there shouldn't be any data in the node datastores, so we can just mount the NFS share over the datastores folder. Confirm the folders are empty first and make sure to take backups! This is an experimental process, so be warned! Right, let's get on to it:
Check the contents of /var/lib/one/datastores. If you are mounting each datastore-ID folder to its own NFS share, you can do that instead of mounting the entire datastores folder; in that case just empty the 0, 1 and 2 folders. Otherwise, remove all folders from the datastores folder,
If not already installed: apt-get install nfs-common,
Check for NFS shares: showmount -e [nfs-server],
Mount the nfs share to the datastore folder: mount [nfs-server]:/share/one_datastore /var/lib/one/datastores,
Confirm the mount i.e. df,
Edit /etc/fstab, adding the mount so it's mounted on the next boot.
Restart the node to confirm the datastore NFS mount persists across reboots.
Repeat with all host nodes.
Test it Out
In Sunstone, go to the Hosts tab and check that the hosts are up and running. Next, grab a VM and deploy it. It should deploy without any issues and start booting.
Once it is up and running, I like to constantly ping the VM while testing live migration. So start ping (ping [vm-ip] -t on Windows) and then, in Sunstone, open the VM and do a 'Live Migrate' to another node. Watch the ping and check the logs to make sure it succeeded. I found I had to refresh the display and go to the Hosts tab to check the VM had migrated. After that it showed correctly, but I think it's a caching issue in my browser. After the live migration you should still see the ping rolling along, with maybe one failed ping in the results.
Conclusion
So that's the process I used to migrate from ssh local storage to shared storage. I've tested it and it is working without any issues. However, if you do have any issues or have an opinion on this process, please let me know. If there are any pitfalls I have overlooked, please also let me know.
OK, have fun with it. I'm off to try moving the shared storage over to some kind of clustered storage like Ceph or GlusterFS!

Host Disk Usage: Warning message regarding disk usage

I've downloaded version HDF_3.0.2.0_vmware of the Hortonworks Sandbox. I am using VMWare Player version 6.0.7 on my laptop. Shortly after startup/logging into Ambari, I see this alert:
The message that is cut off reads: "Capacity Used: [60.11%, 32.3 GB], Capacity Total: [53.7 GB], path=/usr/hdp". I'd hoped that I would be able to focus on NiFi/Storm development rather than administering the sandbox itself; however, it looks like the VM is undersized. Here are the VM settings I have for storage. How do I go about correcting the underlying issue prompting the alert?
I had a similar issue; it's about node partitioning and the directories mounted for data under HDFS -> Configs -> Settings -> DataNode.
You can check your node partitioning using the command below:
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
Usually the HDFS NameNode or DataNode directories point to the root partition. We can change the alert threshold values temporarily, and for a permanent solution we can add additional data directories (sketched after the links below).
The links below can be helpful for doing the same.
https://community.hortonworks.com/questions/21212/configure-storage-capacity-of-hadoop-cluster.html
From the above link: I think your partitioning is wrong - you are not using "/" for the HDFS directory. If you want to use the full disk capacity, you can create a folder under "/", for example /data/1, on every data node using mkdir -p /data/1, add it to dfs.datanode.data.dir, and restart the HDFS service.
https://hadooptips.wordpress.com/2015/10/16/fixing-ambari-agent-disk-usage-alert-critical/
https://community.hortonworks.com/questions/21687/how-to-increase-the-capacity-of-hdfs.html
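To make the permanent fix concrete, a minimal sketch (the directory name /data/1 is just an example, and the hdfs:hadoop ownership is an assumption - match it to your installation):
# on every DataNode: create the new data directory
mkdir -p /data/1
chown -R hdfs:hadoop /data/1
# then, in Ambari, append /data/1 to dfs.datanode.data.dir
# (HDFS -> Configs -> Settings -> DataNode directories) and restart HDFS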
I am not currently able to replicate this, but based on the screenshots the warning is just that there is less space available than recommended. If this is the case, everything should still work.
Given that this is a sandbox that should never be used for production, feel free to ignore the warning.
If you want to get rid of the warning sign, it may be possible to do a quick fix by changing the warning threshold via the alert definition.
If this is still not sufficient, or you want to leverage more storage, please follow the steps outlined by @manohar.

How to stop Adobe Experience Manager from storing binaries into local file system?

We are using Amazon S3 for storing our binaries, and we just want to keep references to those binaries in our local file system. Currently, binaries are getting stored both in S3 and in the local file system.
Assuming that you are referring to the repository/datastore folder when using the S3 data store, that is your S3 cache. You can change the size of the cache, and in theory reduce it to some small number, but you cannot completely disable it:
cacheSize=<size in bytes>
in your S3 config file.
Note that there is a practical lower limit to this number based on the purge factor parameters. Setting it below 10% of your S3 bucket size will trigger frequent cache purges, which will slow down your system. Changing it to zero will give a configuration error on startup.
Just for some background, the path property in your S3 config is the path to the data store on the file system. This is because the S3 datastore is implemented as a write-through cache: all the S3 data is written to the file system and then asynchronously uploaded to the S3 bucket. The asynchronous uploads are controlled via other settings in the same file (number of retries, threads, etc.).
This write-through cache gives your application a significant performance boost, as write operations won't suffer from S3 network latency. Ideally, you should configure the cache size and purge ratio according to your disk capacity and performance requirements rather than reducing them to the bare minimum.
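For reference, a minimal sketch of the relevant entries in the S3 data store config file (typically org.apache.jackrabbit.oak.plugins.blob.datastore.S3DataStore.config under crx-quickstart/install on AEM 6.x; the path and the 50GB value below are purely illustrative assumptions):
path="/mnt/aem/repository/datastore"
cacheSize="53687091200"
The other properties in the same file (bucket, credentials, upload threads, retries) stay as they are.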
UPDATED 28 March 2017
Improved and updated the answer to reflect latest understanding.

Where should a dockerized web application store uploaded files?

I'm building a web application that needs to allow users to upload profile pictures. I want the application to be self-contained, so that people don't need an S3 or other cloud storage account.
It's best to keep Docker containers as disposable as possible, so I guess I should use a volume. I want the volume to be created automatically, so people don't have to specify one when running the container, but the documentation for the VOLUME instruction in Dockerfiles confuses me.
The VOLUME instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers.
What does it mean to be marked as such? The data is to be written by the application; it's not coming from an external source.
When you mark a volume in the Dockerfile, say VOLUME /site/uploads, it makes it very easy to later run another container with --volumes-from <container-name> and have /site/uploads available in the new container, with all the data that has been written and that will be written (if the first container is still running).
Also, you'll be able to see that volume with docker volume ls after you start the container the first time.
The only problem you might have if you delete the container is that you will lose the mapping provided by docker inspect <container-name> that tells you which volume your container created. To see the volume your container created really clearly and quickly, try docker inspect <container-name> | jq '.[].Mounts' if you have jq installed. Otherwise, docker inspect <container-name> | grep Mounts -A 10 might be enough when you only have one volume. (You can also just wade through all the JSON yourself.)
Even if you remove the container that created the volume, the volume will remain on your system, viewable with docker volume ls unless you run docker volume rm <volume-name>
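To make this concrete, a minimal sketch of the lifecycle (the image and container names are hypothetical):
# Dockerfile contains: VOLUME /site/uploads
docker run -d --name webapp my-webapp-image
# see the anonymous volume Docker created for /site/uploads
docker volume ls
docker inspect webapp | jq '.[].Mounts'
# reuse the same uploads data from another container
docker run --rm --volumes-from webapp busybox ls /site/uploads
# the volume survives container removal until you delete it explicitly
docker rm -f webapp
docker volume rm <volume-name>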
Note: I'm using docker version 1.10.3
You will not have problems with that; the images will be uploaded to the mounted filesystem without issues.
You may have to loosen the permissions on the uploads folder so that the application can write to it.

Resize bound NFS volume in OpenShift Origin

I have a Jenkins server running on OpenShift Origin 1.1
The pod is using persistent storage backed by NFS. We have a PV of 3GB and a PVC bound to this volume, and Jenkins is using it. But when we perform:
sudo du -sh /folder we see our folder is 15GB. So we want to resize our persistent volume while it's still in use. How can we do this?
EDIT: or is the best way to recreate the PV and PVC on the same folder as before, so all the data will remain in that folder?
This will be a manual process and is entirely dependent on the storage provider and the filesystem with which the volume is formatted.
Suppose your NFS export has enough space, say 15GB, and your PV is only 3GB; then you can simply edit the PV to increase its size.
oc edit pv [name] works, and you can edit the size of the volume (see the sketch below).