Proposal to Migrate OpenNebula Datastore from Local FS to NFS

I have an instance of OpenNebula with 2 nodes running KVM and a local file store. Because VM images are scp'd to each node, there is no option of failover or live migration.
I would like to implement NFS shared storage and move the VMs from the local FS datastore to the NFS shared storage datastore. OpenNebula supports migrating VMs between datastores, but only between datastores of the same type, i.e. 'ssh' to 'ssh' but not 'ssh' to 'shared'.
I am working on a method of achieving this, and would love some feedback as to why this is a good or a bad idea.
Thanks

OpenNebula doesn't currently support migrating VMs from one type of datastore to a different type of datastore. I have been working on a method that works and want to document it here to get some feedback and opinions on it.
A datastore type is identified primarily by the Transfer Manager driver 'TM_MAD' setting. This setting cannot be changed, either through Sunstone or through the CLI, so we need a method to do just this. This is what I did. I started with a fresh install of OpenNebula 5.4.13 in one VM, and 2 VM nodes, all running Debian 9 within VMware virtual machines (don't forget to enable virtualisation in the VM CPU options).
NOTE: This is an experimental process, so make sure you back up everything first!
Steps
To migrate to a different store, there are a few steps we need to do. They are as follows:
Set up the NFS share exports,
Move the VM images to the NFS share and mount the datastore,
Change the datastore types,
Configure the nodes for NFS share.
Setup NFS Server
The first thing we want to do is set up the NFS shares that we want to use. I'm using a single share for the base datastore folder, but you could use separate shares for each datastore ID from different NFS servers.
On the NFS Server create the datastore folder i.e. mkdir /share/one_datastore,
Add the datastore path to exports and export the new share exportfs -rav,
Confirm the share is available showmount -e localhost
Prepare to Migrate
Before we modify the datastores there are a few things to do first:
Shut down any running VMs and undeploy them. This saves the machines' state and copies the images back to the image store,
Stop Sunstone and OpenNebula services systemctl stop opennebula && systemctl stop opennebula-sunstone.
Migrate Data
Shared storage shares the VM disk images so all the nodes can access the same data. So copy the VM data to the NFS share ready for mounting.
From the Sunstone frontend server confirm the NFS shares showmount -e [nfs-server],
Create a temp folder to mount the share in mkdir /mnt/datastore,
Temporarily mount the NFS folder mount [nfs-server]:/share/one_datastore /mnt/datastore,
Move the datastore folders to the share mv /var/lib/one/datastores/* /mnt/datastore/
OpenNebula datastore folders now live on the NFS server: ls /mnt/datastore should list folders 0, 1 and 2,
Mount the NFS share to replace the OpenNebula datastore folder mount [nfs-server]:/share/one_datastore /var/lib/one/datastores,
Confirm the folders are available ls /var/lib/one/datastores should list our 3 folders 0, 1 and 2,
Add the mount into /etc/fstab to persist the mount on boot.
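As a rough illustration of the last step, the fstab line could be generated and added idempotently along these lines. This is only a sketch: the helper names are hypothetical, "nfs-server" is a placeholder host name, and the "defaults" mount options are an assumption you may want to tune. It works on the fstab text as a string, so nothing is written to disk.

```python
def fstab_entry(server, export, mountpoint, options="defaults"):
    """Build one /etc/fstab line for an NFS mount (hypothetical helper)."""
    # Fields: device, mount point, filesystem type, options, dump, fsck pass
    return f"{server}:{export} {mountpoint} nfs {options} 0 0"


def add_entry(fstab_text, entry):
    """Return fstab_text with entry appended, unless the mount point is already listed."""
    mountpoint = entry.split()[1]
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and not fields[0].startswith("#") and fields[1] == mountpoint:
            return fstab_text  # already present, leave the file unchanged
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"


# "nfs-server" is a placeholder for your NFS host
entry = fstab_entry("nfs-server", "/share/one_datastore", "/var/lib/one/datastores")
print(add_entry("# /etc/fstab: static file system information\n", entry))
```

Reading /etc/fstab, passing its contents through add_entry and writing the result back (as root, after taking a copy) would be the remaining step.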
OpenNebula frontend is now configured to access the datastore folders from the NFS share. Next we want to change the datastores type from ssh to shared.
Change Datastore Types
The data for the datastore configuration is stored in the OpenNebula database /var/lib/one/one.db. We can change the driver type by editing the datastore configuration data, which then tells OpenNebula which drivers to use and how to handle the datastore data. By default OpenNebula uses an SQLite database, with the option of MySQL. I'm using SQLite, but the same works for MySQL.
Open the OpenNebula database sqlite3 /var/lib/one/one.db,
View all tables with .tables. datastore_pool is the table we want to modify,
List all the records in the table: select * from datastore_pool; will result in a screen-full of configuration data. Each record has an identifier oid which matches the datastore ID, like this (the first 0 is the datastore ID for the default SYSTEM datastore):
0|system|<DATASTORE><ID>0</ID><UID>0</UID><GID>0</GID><UNAME>oneadmin</UNAME><GNAME>oneadmin</GNAME><NAME>system</NAME><PERMISSIONS><OWNER_U>1</OWNER_U><OWNER_M>1</OWNER_M><OWNER_A>0</OWNER_A><GROUP_U>1</GROUP_U><GROUP_M>0</GROUP_M><GROUP_A>0</GROUP_A><OTHER_U>0</OTHER_U><OTHER_M>0</OTHER_M><OTHER_A>0</OTHER_A></PERMISSIONS><DS_MAD><![CDATA[-]]></DS_MAD><TM_MAD><![CDATA[ssh]]></TM_MAD><BASE_PATH><![CDATA[/var/lib/one//datastores/0]]></BASE_PATH><TYPE>1</TYPE><DISK_TYPE>0</DISK_TYPE><STATE>0</STATE><CLUSTERS><ID>0</ID></CLUSTERS><TOTAL_MB>0</TOTAL_MB><FREE_MB>0</FREE_MB><USED_MB>0</USED_MB><IMAGES></IMAGES><TEMPLATE><ALLOW_ORPHANS><![CDATA[NO]]></ALLOW_ORPHANS><DISK_TYPE><![CDATA[FILE]]></DISK_TYPE><DS_MIGRATE><![CDATA[YES]]></DS_MIGRATE><RESTRICTED_DIRS><![CDATA[/]]></RESTRICTED_DIRS><SAFE_DIRS><![CDATA[/var/tmp]]></SAFE_DIRS><SHARED><![CDATA[NO]]></SHARED><TM_MAD><![CDATA[ssh]]></TM_MAD><TYPE><![CDATA[SYSTEM_DS]]></TYPE></TEMPLATE></DATASTORE>|0|0|1|1|0
Now to change the datastore type. Grab the data from the 3rd column, body (you can run select body from datastore_pool where oid=0;), and copy it to your favourite text editor (that's the chunk starting with <DATASTORE> and ending with </DATASTORE>). Find and replace:
Find: <TM_MAD><![CDATA[ssh]]></TM_MAD>
Replace with: <TM_MAD><![CDATA[shared]]></TM_MAD>
Find: <SHARED><![CDATA[NO]]></SHARED>
Replace with: <SHARED><![CDATA[YES]]></SHARED>
Now to update the SYSTEM datastore record. Run the following command on the database, replacing [datastore-config] with the text block you just modified: update datastore_pool set body='[datastore-config]' where oid=0;
Updating the IMAGE datastore is a little different. There is no SHARED option, but we want to use either the shared or the qcow2 driver. I used qcow2. So: select body from datastore_pool where oid=1;:
Find: <TM_MAD><![CDATA[ssh]]></TM_MAD>
Replace: <TM_MAD><![CDATA[qcow2]]></TM_MAD>
Update the record: update datastore_pool set body='[datastore-config]' where oid=1;,
Update the FILES datastore (oid=2, matching the third datastore folder) by replacing <TM_MAD><![CDATA[ssh]]></TM_MAD> with <TM_MAD><![CDATA[shared]]></TM_MAD> and update using the method above.
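The copy, edit and paste steps above could also be scripted, which avoids quoting problems when pasting a long XML string into an update statement. The following is a minimal sketch using Python's built-in sqlite3 module; the function name is hypothetical, and the demo runs against a cut-down in-memory table rather than the real one.db schema.

```python
import sqlite3

SSH_TM = "<TM_MAD><![CDATA[ssh]]></TM_MAD>"
SHARED_NO = "<SHARED><![CDATA[NO]]></SHARED>"
SHARED_YES = "<SHARED><![CDATA[YES]]></SHARED>"


def switch_driver(conn, oid, new_driver, mark_shared=False):
    """Rewrite TM_MAD (and optionally SHARED) in one datastore_pool body."""
    row = conn.execute(
        "SELECT body FROM datastore_pool WHERE oid = ?", (oid,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no datastore with oid {oid}")
    # TM_MAD appears twice in the body; str.replace changes both occurrences.
    body = row[0].replace(SSH_TM, f"<TM_MAD><![CDATA[{new_driver}]]></TM_MAD>")
    if mark_shared:
        body = body.replace(SHARED_NO, SHARED_YES)
    # A bound parameter avoids quoting problems with the XML text.
    conn.execute("UPDATE datastore_pool SET body = ? WHERE oid = ?", (body, oid))
    return body


# Demo against a cut-down, in-memory stand-in for one.db (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datastore_pool (oid INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.execute(
    "INSERT INTO datastore_pool VALUES (?, ?, ?)",
    (0, "system", "<DATASTORE>" + SSH_TM + "<TEMPLATE>" + SHARED_NO + SSH_TM + "</TEMPLATE></DATASTORE>"),
)
with conn:  # commits on success, rolls back on error
    print(switch_driver(conn, 0, "shared", mark_shared=True))
conn.close()
```

Against the real database you would connect to /var/lib/one/one.db instead (with the OpenNebula services stopped and a backup taken), then call switch_driver once per datastore: "shared" with mark_shared=True for the SYSTEM datastore, "qcow2" for the IMAGE datastore, and "shared" for the FILES datastore with its own oid.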
Now that the datastores have been updated to use the shared drivers, let's start Sunstone and check that the datastores show up.
systemctl start opennebula && systemctl start opennebula-sunstone
Jump into the Sunstone web UI and go to Datastores. Open each datastore to check that SHARED is enabled and that the correct driver shows, i.e. shared or qcow2.
DON'T DO ANYTHING YET! We still need to configure the nodes.
Configure the Nodes
Because we stopped and undeployed the VMs, there shouldn't be any data in the node datastores, so we can just mount the NFS share over the datastores folder. Confirm the folders are empty first and make sure to take backups! This is an experimental process, so be warned! Right, let's get onto it:
Check the contents of /var/lib/one/datastores. If you are mounting each datastore-ID folder from its own NFS share, you can do that instead of mounting the entire datastores folder; in that case empty the 0, 1 and 2 folders. Otherwise remove all folders from the datastores folder,
If not already installed: apt-get install nfs-common,
Check for NFS shares: showmount -e [nfs-server],
Mount the nfs share to the datastore folder: mount [nfs-server]:/share/one_datastore /var/lib/one/datastores,
Confirm the mount i.e. df,
Edit /etc/fstab, adding the mount so it's mounted on next boot.
Restart your node to confirm the NFS datastore mount persists across a reboot (and to give the node a fresh start!).
Repeat with all host nodes.
Test it Out
In Sunstone go to the Hosts tab and check that the hosts are up and running. Next, grab a VM and deploy it. It should deploy without any issues and start booting.
Once it is up and running, I like to constantly ping the VM while testing live migration. So start a ping (ping [vm-ip] -t on Windows) and then in Sunstone open the VM and do a 'Live Migrate' to another node. Watch the ping and check the logs to make sure it succeeded. I found I had to refresh the display and go to the Hosts tab to check that the VM had migrated. After that it showed correctly, but I think it's a caching issue in my browser. After the live migration you should still see the ping rolling along, with maybe one failed ping in the results.
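If you would rather count the dropped pings than watch the terminal, a small script along these lines could do it. This is a sketch under assumptions: the function names are hypothetical, and ping_once uses the Linux iputils flags (-c for count, -W for the timeout in seconds), which differ on other platforms.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send a single ICMP echo with the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(probe, attempts, interval_s=1.0, sleep=time.sleep):
    """Call probe() `attempts` times and return the indexes of failed attempts."""
    failures = []
    for i in range(attempts):
        if not probe():
            failures.append(i)
        sleep(interval_s)
    return failures
```

For example, starting watch(lambda: ping_once("[vm-ip]"), 120) just before triggering the live migration would return the list of missed pings over roughly two minutes; an empty list or a single entry matches the behaviour described above.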
Conclusion
So that's the process I used to migrate from ssh local storage to shared storage. I've tested it and it is working without any issues. However, if you do have any issues or have an opinion on this process, please let me know. If there are any pitfalls I have overlooked, please also let me know.
Ok, have fun with it. I'm off to try moving the shared storage over to some kind of shared cluster like Ceph or GlusterFS!

Related

AzureFileShareConfiguration mount drive disconnected

I am trying to create a Pool using Azure Batch . I have uploaded content to Azure Storage using File Shares.
I would like my Pool to mount this Azure File Share as virtual file system (ref: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#mount-a-virtual-file-system-on-a-pool ).
I am creating AzureFileShareConfiguration object using code:
import azure.batch.models as batchmodels

mount_configuration = batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S"
    )
)
Using this, I get "CMDKEY: Credentials added successfully" in fsmounts. But when I RDP to the node in the pool, the S drive appears "Disconnected".
My Azure batch package versions are:
azure-batch==8.0.0
azure-common==1.1.24
Can you please help diagnose the issue or suggest the right usage?
Thanks in Advance!

I think this is a Windows VM you are trying, judging by the drive letter :).
The key issue is that the permissions of your RDP session differ from those of the Batch-level identity under which your code runs and mounts the drive.
If the drive is mounted at the Batch level and you can see it from your start task, then it is working. That is the Batch-level permission model; when you RDP into the node, you act as the user you logged in as. If you want to see the drive as the RDP user, re-run the command from your RDP session so that this user also has the key for that drive.
Having said that, try /persistent:Yes as mount_options.
The best test is this: mount the drive and, from your start task, access the mounted directory, e.g. read S:\\Whatever_file.txt or just dir the drive, so that the result ends up in the stdout.txt of the Batch node.
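A check like that could look roughly as follows if Python happens to be available on the node (an assumption; a plain dir from cmd does the same job). The function name is hypothetical. Whatever it prints would land in the task's stdout.txt.

```python
import os


def report(path):
    """Print the entries under path; return True if the path was readable."""
    try:
        entries = sorted(os.listdir(path))
    except OSError as exc:
        # A disconnected or unmounted drive typically ends up here.
        print(f"cannot read {path}: {exc}")
        return False
    print(f"{path} is readable, {len(entries)} entries")
    for name in entries:
        print(name)
    return True
```

Calling report("S:\\") from the start task would then show whether the Batch-level identity can see the share, independently of what the RDP user sees.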
Some extra notes follow, including the mount_options value to try.
This page also helps with support for the various SMB versions: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows and I think you already know this one: https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount#azure-files-share
In order to use an Azure file share outside of the Azure region it is
hosted in, such as on-premises or in a different Azure region, the OS
must support SMB 3.0.
So add this to your API and give it a try:
MountOptions = "/persistent:Yes" i.e. mount_options = "/persistent:Yes"
Also: the key needs to be the storage account key, i.e. it should not start with mystorage/key :) but it could be that you are just hiding it, so this is only a mention and FYI.
Sample code (I assume the SDK you are using is Python):
mount_configuration = batchmodels.MountConfiguration(
    azure_file_share_configuration=batchmodels.AzureFileShareConfiguration(
        account_name="mystorage",
        azure_file_url="https://mystorage.file.core.windows.net/my-share1",
        account_key="mystorage/key==",
        relative_mount_path="S",
        mount_options="/persistent:Yes"
    )
)
hope this helps!

relative_mount_path: The relative path on the compute node where the file system will be mounted. All file systems are mounted relative to the Batch mounts directory, accessible via the AZ_BATCH_NODE_MOUNTS_DIR environment variable.
Azure Files is the standard Azure cloud file system offering. To learn more about how to get any of the parameters in the mount configuration code sample, see Use an Azure Files share.

Accessing external hard drive after logging into a remote machine using ssh command

I am doing an intensive computing project with a super old C program. The program requires a library called Sun Performance Library which is a commercial ware. Instead of purchasing the library by myself, I am running the program by logging onto a Solaris machine in our computer lab with the ssh command, while the working directory to store output data is still on my local Mac.
Now, a problem just occurred: the program uses large amount of disk space to save some intermediate results and the space on my local Mac is quickly filled (50 GB for each user prescribed by the administrator). These results are necessary for the next stage of computing and I cannot delete any of them before it finally produce the output data. Therefore, I have to move the working directory to an external hard drive in order to continue. Obviously,
cd /Volumes/VOLNAME
is not the correct way to do it because the remote machine will give me a prompt saying
/Volumes/VOLNAME: No such file or directory.
So, what is the correct way to do it?

sshfs recently added support for "slave mode" which allows you to do this. Assuming you have sshfs on Solaris (I'm not sure about this), the following command (run from your Mac) will do what you want: dpipe /usr/lib/openssh/sftp-server = ssh SOLARISHOSTNAME sshfs MACHOSTNAME:/Volumes/VOLNAME MOUNTPOINT -o slave
This will result in the MOUNTPOINT directory on the server being mounted to your local external drive. Note that I'm not sure whether macOS has dpipe. If it doesn't, you can replace it with one of the equivalent solutions at How to make bidirectional pipe between two programs?. Also, if your SFTP server binary is somewhere else, substitute its path.

The common way to mount a remote volume in Solaris is via NFS, but that usually requires root permissions.
Another approach would be to make your application read its data from stdin and output its results to stdout, without using the file system directly. Then you could just redirect the data from/to your local machine through ssh. For instance:
ssh user@host </Volumes/VOLNAME/input.data >/Volumes/VOLNAME/output.data

Where should a dockerized web application store uploaded files?

I'm building a web application that needs to allow users to upload profile pictures. I want the application to be self-contained, so that people don't need to have an s3 or other cloud storage service account.
It's best to keep docker containers as disposable as possible, so I guess I should create a volume. So I want the volume to be created automatically, so people don't have to specify a volume when running the container, but the documentation for the VOLUME instruction in dockerfiles confuses me.
The VOLUME instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers.
What does it mean to be marked as such? The data is to be written by the application; it's not coming from an external source.

When you mark a volume in the Dockerfile, say VOLUME /site/uploads, it makes it very easy to later run another container with --volumes-from <container-name> and have /site/uploads available in the new container with all the data that has been written and that will be written (if the first container is still running).
Also, you'll be able to see that volume with docker volume ls after you start the container the first time.
The only problem that you might have if you delete the container, is that you will lose the mapping provided by docker inspect <container-name> that tells you which volume your container created. To see the volume your container created really clearly and quickly, try docker inspect <container-name> | jq '.[].Mounts' if you have jq installed. Otherwise, docker inspect <container-name> | grep Mounts -A 10 might be enough when you only have one volume. (you can also just wade through all the json yourself)
Even if you remove the container that created the volume, the volume will remain on your system, viewable with docker volume ls unless you run docker volume rm <volume-name>
Note: I'm using docker version 1.10.3
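If jq is not installed, the same Mounts lookup can be sketched in a few lines of Python working on the JSON that docker inspect prints. The function name is hypothetical, and the sketch assumes that named or anonymous volumes carry a "Name" key in their Mounts entry while bind mounts do not.

```python
import json


def volume_mounts(inspect_json):
    """Return (volume name, container path) pairs from `docker inspect` output."""
    found = []
    # docker inspect prints a JSON array with one object per container
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            if "Name" in mount:  # assumption: bind mounts have no Name
                found.append((mount["Name"], mount.get("Destination")))
    return found


sample = json.dumps([{
    "Mounts": [
        {"Name": "abc123", "Source": "/var/lib/docker/volumes/abc123/_data",
         "Destination": "/site/uploads", "Driver": "local", "RW": True},
        {"Source": "/home/me/conf", "Destination": "/etc/app", "RW": True},
    ]
}])
print(volume_mounts(sample))  # [('abc123', '/site/uploads')]
```

Piping the real output in, for instance by reading sys.stdin after docker inspect <container-name>, would give you the volume name to keep a note of before deleting the container.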

You will not have problems with that; the images will be uploaded to the mounted filesystem without issue.
You may have to grant write permissions on the uploads folder so that the application can write to it.

Cannot connect to Compute Engine CentOS Virtual Machine

I am new to Virtual Machines and CLI so please bear with me.
I have a CentOS 6.5 running on Compute Engine.
I ran yum update (without creating a snapshot of the previous disk - yes, I am an idiot) and now I cannot connect to the machine using the IP address.
I tried the following steps.
Tried to connect through Filezilla - didn't work.
Tried through Putty - didn't work
Tried through the browser option given by the CE console - didn't work.
I even tried creating a snapshot and starting up another VM with the snapshot - didn't work.
If anyone knows how I can get the files and folders out from the previous disk, I can start up a new VM and transfer everything again.
I do not have the latest database and this is important.
Please help!
Thanks
Warren

The way to recover is to delete your VM without deleting the disk, then create another VM with its own boot disk, attach and mount the original disk, and recover any data that you need from it.
First things first: on the VM instances page, click on the instance name that is currently running with that disk, and uncheck the box "Delete boot disk when instance is deleted". Then delete the instance.
Now, create a new instance with its own boot disk. To tell this new disk apart from the original boot disk, either:
use a different OS (or version of the OS) for the new disk, e.g., if using Ubuntu, try a different version or use Debian; if using RHEL, try CentOS, or vice versa
check which disk is mounted at /; this should be the new disk
Mount the original disk as read-only and recover any information you need. Once you have a backup of your data, you can remount it with read-write access and try to fix it (but back up the data first!).

I finally solved this problem thanks to Misha for sending me in the right direction.
The steps are below for anyone who has the same issue.
Problem:
After updating the CentOS server using yum update, I was unable to connect back to the server.
I tried all possible combinations but no luck. This seems to be a known issue as there was some material on the Compute Engine site regarding this.
Solution:
I followed the steps as Misha suggested. I started up another VM with its own boot disk and then attached the original disk with read write access.
Note: I was unable to mount the disk as just read only.
The commands were
mkdir /mnt/sdb1
mount /dev/sdb1 /mnt/sdb1
Once I mounted the disk, I copied the files from the html folder on the sdb1 disk to the html folder on sda1 (the new boot disk).
The database was a bit more challenging.
I tried quite a few times, but copying the files from /mnt/sdb1/var/lib/mysql into the new disk's mysql folder was not working.
I found some tutorials but nothing helped.
Finally I downloaded the files from within /mnt/sdb1/var/lib/mysql and put them in the data folder of my local Windows MySQL installation.
Remember you have to download everything, which includes ib_logfile0, ib_logfile1 and ibdata1, as well as the folder which has the *.frm files.
Then I opened localhost/phpmyadmin and voila... the files were there.
The rest was pretty simple... Exporting and uploading the SQL scripts back to the server.
This took me about 12 hours to figure out.
Thanks again Misha.

Recovering Apache from a mounted, unavailable NFS Mount

I have several web applications in production that utilize NFS mounts to share resources (usually static asset files) among web heads. In the event that an NFS mount becomes unavailable, Apache will hang requesting files that cannot be accessed, the kernel will log:
Nov 2 14:21:20 server2 kernel: nfs: server server1 not responding, still trying
I reproduced the behavior in RHEL5 running NFS v3 and Apache 2.2.3:
Create an NFS Mount on Server1 (contents of my /etc/exports)
/srv/test_share server2(rw)
Mount the NFS share on Server2 (contents of my /etc/fstab)
server1:/srv/test_share /mnt/test_share nfs defaults 0 0
Set up a virtual host in Apache with a simple HTML file referencing image files stored on the NFS share
Load the site, the html and image files all return 200
Unmount the NFS Share, loading the page returns 404s for the images referenced
Remount the NFS Share
Simulate an NFS crash by turning NFS off on Server1 - reloading the site hangs retrieving the referenced files.
Internet searches so far have not turned up a good solution. Basically the desired behavior would be for the web server to return 404s and not hang until the NFS mount recovers.
Cheers,
Ben

A couple of options:
Get your NFS mount options right: you need a soft mount so that NFS access can be interrupted. Try soft,intr,timeo=10 instead of defaults.
Sync your document roots with something else, like rsync, or script yourself a semi-automatic checkout/export from your SCM, if you use one. Using an SCM is recommended anyway, since it lets you revert to the last working version, for instance.
Use a real distributed filesystem (preferably a fault-tolerant one like Coda) or even a distributed block device system like DRBD.
Options 2 and 3 give you disconnected operation and are therefore much more robust than NFS. DRBD is sexy, but my advice would be option 2 with something like git or svn: simple and robust.

I would not directly serve from the NFS mount, but instead from your local filesystem.
It wouldn't be too hard to setup a cron job that synced the NFS mount to the local file system every few minutes. Apache would serve its content from there, not depending on the NFS mount. If the mount goes down, Apache would still be able to serve the assets, although they might be out of date until the NFS mount comes back up.
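A minimal sketch of such a sync job is shown below. The function name and the paths are illustrative assumptions, and in practice rsync is the usual tool for this. The sketch copies only new or changed files, and it does not delete local files that have disappeared from the share.

```python
import os
import shutil


def sync_tree(src, dst):
    """Copy new or changed files from src to dst; return the relative paths copied."""
    copied = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            s_stat = os.stat(s)
            if os.path.exists(d):
                d_stat = os.stat(d)
                # Skip files whose size matches and whose local copy is not older.
                if d_stat.st_size == s_stat.st_size and int(d_stat.st_mtime) >= int(s_stat.st_mtime):
                    continue
            shutil.copy2(s, d)  # copy2 keeps the modification time
            copied.append(os.path.relpath(d, dst))
    return copied
```

Run from cron every few minutes, something like sync_tree("/mnt/test_share", "/var/www/assets") would keep a local copy for Apache to serve (the second path is a made-up example). Note that the script itself can still block while the NFS server is unresponsive unless the share is soft-mounted, but Apache would no longer be affected.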
