MLflow UI is taking forever to install and still not done - google-colaboratory

I am running my code on Google Colab to bring up the MLflow dashboard, and whenever I run `!mlflow ui` it takes forever to execute. The last text on my screen is "Booting worker with pid". This is my first time working with MLflow; can anyone tell me why this is happening and what I can do to fix it?

You can use `mlflow ui` to view logs; it doesn't install anything. In fact, it hosts a server using gunicorn, which is why the command never "finishes" on its own. To connect to the tracking server created in Colab, reading this thread could be useful. (Also this doc.)
I recommend running the `mlflow ui` command on your local host and then opening the listening address to see what happens. (Things tracked in Colab won't show up there!)
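For example, a minimal local sketch (the port and a local Python environment are assumptions):

```bash
# In a local Python environment, install MLflow and start the UI server.
pip install mlflow

# This command does not "finish": it keeps serving the dashboard (via
# gunicorn workers, hence "Booting worker with pid") until you stop it
# with Ctrl+C.
mlflow ui --port 5000

# While it runs, open http://127.0.0.1:5000 in a browser.
```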

aimlflow might be helpful; it runs a nice UI on top of MLflow logs.
The code: https://github.com/aimhubio/aimlflow
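A hedged usage sketch, based on the project README (the package name, flags, and paths are assumptions worth verifying against the repo):

```bash
# Install the MLflow-to-Aim bridge and sync existing MLflow runs.
pip install aim-mlflow
aimlflow sync --mlflow-tracking-uri=./mlruns --aim-repo=./.aim

# Serve the Aim UI on top of the synced runs.
aim up
```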

Colab: How to disconnect from session without closing the tab?

Some background
My computer fan goes crazy when I am using Google Colab, so it definitely uses local resources somehow. I am running very long processes (over 4 hours). Yesterday it occurred to me that I had been disconnected; I thought my session had crashed, since I stopped receiving the status updates from my task's progress bar. But then, after clicking on Connect to a hosted runtime, I was able to reconnect to that session and interact with it just fine. Given that Google Colab uses some of my local resources, I am looking for a way to put the client application on hold for a little while.
Question
How do I manually disconnect from my remote session without crashing/terminating it? Is that even possible?
Note:
There is an answer to "Does Google Colab stay connected when I close my browser?" that says:
The current cell will continue executing once you close your browser, but the outputs will not end up in the notebook in Drive.
I would be fine with leaving the session running remotely without being able to access the outputs in the notebook, given that I save the results to Google Drive when the process is done. So not being able to see the output in the notebook would not be an issue for me.

Reconnect to Google Cloud Platform Terminal

I am running a Python machine learning script on Google Cloud Platform. I have connected through SSH in the browser. When I run the code it works, but when I close the browser it seems to stop running.
I believe I can make it run in the background with nohup, but I want to be able to check back in on it as it prints outputs on its progress.
Basically I want to be able to start the script, close the terminal and then reconnect from any machine to check on its progress. Any help would be really appreciated.
I am new to Google Cloud Platform; if any of this was unclear, please ask and I'll try to provide more detail.
You may use a utility called screen. Just install it using `sudo apt-get install screen` (on Debian/Ubuntu).
In some cases it might already be installed on your instance, so check first.
Once installed, enter the following command into the terminal:
screen
and press Enter. Now you can start your job in the terminal.
The moment you need to disconnect, press Ctrl+A and then d.
The session will be detached. Note the session id that is displayed (e.g. detached from 1498.pts-1.server).
You may now close the terminal.
When you come back, use the following command to get back into the old session:
screen -r <screen_id> (e.g. screen -r 1498.pts-1.server)
This process has been tested on Google Cloud with SSH through the browser; it really works.
Check this site for more details.
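Alternatively, if you prefer the `nohup` route mentioned in the question, a minimal sketch (the script and log names are placeholders):

```bash
# Start the script immune to hangups, capturing its output to a log file.
nohup python3 train.py > train.log 2>&1 &

# Later, from any new SSH session, follow its progress.
tail -f train.log
```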
It sounds like you're referring to the Google Cloud Shell feature. If so, then what you want is not possible; Cloud Shell is not intended for non-interactive operation. From Usage limits:
Cloud Shell is intended for interactive use only. Non-interactive sessions will be ended automatically after a warning.
Cloud Shell runs on a temporary Compute Engine virtual machine, which is running only while the Cloud Shell session is active in the browser.
Apart from the obvious approach of keeping the browser session active while your application is running, you could also provision yourself a non-temporary Compute Engine instance (a free-tier one is available), to which you can connect and on which you can run non-interactive applications as you wish.
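A hedged sketch of that approach with the gcloud CLI (the instance name, machine type, and zone are assumptions; check the current free-tier terms):

```bash
# Create a small persistent VM that keeps running after your browser closes.
gcloud compute instances create long-job-vm \
    --machine-type=e2-micro \
    --zone=us-west1-b

# SSH in; run long tasks under screen/nohup so they survive disconnects.
gcloud compute ssh long-job-vm --zone=us-west1-b
```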

SSH timeout when running importDump.php on a Bitnami Mediawiki instance on Google Cloud server

The import seems to start out OK, showing the contents of the MediaWiki in the terminal window. At some point (often around the same point in the content), the SSH terminal freezes up. The Opera browser returns an 'out of memory' message.
Two questions:
Can I just start the import and ask the server to run it regardless of the status of the terminal window on my machine (or the internet connection)?
If no to #1, what can I modify to prevent the terminal from timing out?
It could be that the cause of the problem is not the Google Cloud server or the network, but the client browser being used.
A good test would be to perform the same operation in another browser, if possible, and see how it goes. If the operation succeeds, then the problem lies with the Opera browser itself.
Also check the memory configuration on the client machine to see if it can handle the request.
There have been reports of out-of-memory errors in Opera:
https://forums.opera.com/topic/17877/new-version-out-of-memory-issue
If you have tried other browsers and the issue is the same, then it is not caused by Opera's 'out of memory' error.
Have you provided your Bitnami MediaWiki deployment with instance specs adequate to handle every request?
At the Google Cloud Platform console, click on Products & Services, the icon with the four bars at the top left-hand corner.
In the menu, go to the Compute section, hover over 'Compute Engine', and then click on 'VM Instances' to view all your instances.
Click on your Bitnami instance to see more details.
Go to 'Machine type', where you can see the CPU and memory allocated to the instance.
Ensure the Bitnami MediaWiki instance has a profile good enough to handle the request.
You can also check the instance performance while you’re doing the import and see how it behaves.
As per the documentation, running importDump.php can take quite a long time. For a large Wikipedia dump with millions of pages, it may take days, even on a fast server.
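Regarding question 1: yes. As with the `screen` workflow in the previous answer, you can run the import in a detachable session so that the state of your local terminal or browser no longer matters. A hedged sketch (the session name and paths are assumptions):

```bash
# Start a named screen session and launch the import inside it.
screen -S wiki-import
php maintenance/importDump.php /path/to/dump.xml

# Detach with Ctrl+A then d; later, reattach from any SSH session:
screen -r wiki-import
```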

Should I run Jenkins by command line instead of as a service?

I have been running Jenkins as a service on EC2 for a while. The problem is that since it runs as a service, the Chrome browser size is smaller than what we need. We are now running it from the command line (not as a service), so it has a bigger browser size now. The only issue I've observed so far is that performance decreased: a 50-minute job took 1:30h.
Should I keep running it from the command line? Are there any other concerns I need to worry about (besides the performance issue)? Thank you.
Try the steps below; they might help you out:
Stop the service (Jenkins.exe).
Right Click >> Properties >> Log On tab >> Local System Account >> check "Allow service to interact with desktop".
Then it will use the resolution of the monitor you are currently using.
Make sure you put `browser.driver.manage().window().maximize();` in your script.
Let me know if this does not work out.
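For reference, running Jenkins "by command line" usually means launching the war file directly; a minimal sketch (the war path and port are assumptions):

```bash
# Start Jenkins in the current session instead of as a Windows service;
# it then inherits the logged-in user's desktop and screen resolution.
java -jar jenkins.war --httpPort=8080
```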

DCOS: not able to start any service, it always shows "deploying"

Environment:
DCOS 1.7 running on Vagrant
There are many reasons this could fail, but you did not provide enough information to narrow it down. However, the most common problem when getting started is missing the bit about having a private vs. public agent available and the role being specified in the app JSON. I can't tell from the single screenshot what environment you are running or what the JSON for the apps you were trying to run looks like. If you are really stuck, try visiting https://dcos.io/docs/1.7/administration/installing/custom/troubleshooting/ and check out the community Slack channel for assistance: https://dcos-community.slack.com/
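To illustrate the role point, a hedged sketch of a Marathon app definition pinned to public agents (the app itself is made up; `slave_public` is the conventional role name for public agents):

```bash
# Write a minimal Marathon app definition that requests the public-agent role.
cat > hello.json <<'EOF'
{
  "id": "/hello",
  "cmd": "python3 -m http.server $PORT0",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1,
  "acceptedResourceRoles": ["slave_public"]
}
EOF

# Deploy it with the DC/OS CLI; omit acceptedResourceRoles to run on
# private agents instead.
dcos marathon app add hello.json
```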
I got the same problem when installing DCOS from the Azure template; when I logged in, it appeared that I had 0 nodes connected in my cluster (which is obviously not good). I reinstalled it from another template and that fixed my problem. Hope it helps.