Azure Container Instances stuck in "Creating" state

Whether I let the Azure agent plugin for Jenkins create my container or create it manually, it never enters a running state.
az container create \
--os-type Windows \
--location eastus \
--registry-login-server SERVER.azurecr.io \
--registry-password PASSWORD \
--registry-username USERNAME \
--image namespace/image \
--name jenkins-permanent \
--resource-group devops-aci \
--cpu 2 \
--memory 3.5 \
--restart-policy Always \
--command-line "-jnlpUrl http://host:8080/computer/NAME/slave-agent.jnlp -secret SECRET -workDir \"C:\\jenkins\""
I've gone through all the troubleshooting steps that apply and tried a different region, to no avail.
Here's a recent event, which seems to be the most progress I've made yet:
{
"count": 1,
"firstTimestamp": "2017-12-07T03:02:56+00:00",
"lastTimestamp": "2017-12-07T03:02:56+00:00",
"message": "Failed to pull image \"MYREPO.azurecr.io/my-company/windows-agent:latest\": Error response from daemon: {\"message\":\"Get https://MYREPO.azurecr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\"}",
"name": "Failed",
"type": "Warning"
}
The funny thing is, this event occurred both before and after the one time the instance actually worked (unfortunately my entrypoint command was wrong that time, so the container never started).
I really feel like Azure is punting on this, and I have no way to change the order in which I do anything. It's simply one command.

Alexander, here's a lead for checking what could be causing the delay, or whether the deployment has failed in the background. This information is critical for narrowing down the issue: https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-troubleshoot-tips#determine-error-code
From the article above check on deployment logs:
Enable debug logging:
PowerShell
In PowerShell, set the DeploymentDebugLogLevel parameter to All, ResponseContent, or RequestContent.
New-AzureRmResourceGroupDeployment -ResourceGroupName examplegroup -TemplateFile c:\Azure\Templates\storage.json -DeploymentDebugLogLevel All
or Azure CLI:
az group deployment operation list --resource-group ExampleGroup --name vmlinux
Also check the deployment sequence:
Many deployment errors happen when resources are deployed in an unexpected sequence. These errors arise when dependencies are not correctly set. When you are missing a needed dependency, one resource attempts to use a value for another resource but the other does not yet exist.
The above link contains more details. Let me know if this helps.

Figured it out: the backslashes in the executable path in my command were not having their escapes honoured, either because I was calling az from bash, or because something on the Azure side isn't handling the escaping correctly or isn't escaping them itself.
My solution has been to just use forward slashes in the paths. Windows seems to be handling them correctly, and I prefer to not be bothered with its odd preference for backslashes.
Related to my issue is that the speed of the service makes troubleshooting very difficult. It takes a long time to go round trip with any fixes. So if you're using Azure Container Instances and want better performance, go upvote this feedback item that I've created.
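To see what bash hands to az for the original --command-line value, the quoting can be replayed with Python's shlex module, which follows POSIX shell rules. This is only a sketch of the bash side; what Azure does with the value afterwards is not shown, and the helper name to_forward_slashes is made up for illustration.

```python
import shlex
from pathlib import PureWindowsPath

# The tail of the --command-line argument exactly as typed in bash.
typed = r'"-workDir \"C:\\jenkins\""'

# shlex.split applies POSIX quoting rules, so this is the single argument
# bash passes on to az: the \\ has already collapsed to one backslash.
passed_to_az = shlex.split(typed)[0]
print(passed_to_az)  # -workDir "C:\jenkins"


def to_forward_slashes(windows_path):
    """Hypothetical helper: rewrite a Windows path with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


print(to_forward_slashes(r"C:\jenkins"))  # C:/jenkins
```

With forward slashes there is nothing left for any layer to unescape, which is presumably why that form survives the round trip.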

How big is your image? You can always debug in two steps.
Run az container show -g devops-aci -n jenkins-permanent. The output should contain a list of container events in the container JSON object. The event messages should give you a hint about what's going on.
Run az container logs -g devops-aci -n jenkins-permanent. This should give you the logs of your container. If the problem is inside your image, you should see some error output.
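If the event list gets long, a few lines of Python can pull out just the warnings. This is a rough sketch: it assumes the events sit under containers[].instanceView.events in the az container show output, and the sample below is hand-written in that assumed shape, not real output.

```python
def warning_events(container_group):
    """Collect "name: message" for every Warning event in a container group."""
    messages = []
    for container in container_group.get("containers", []):
        # Assumption: events live under containers[].instanceView.events.
        view = container.get("instanceView") or {}
        for event in view.get("events") or []:
            if event.get("type") == "Warning":
                messages.append(f'{event.get("name")}: {event.get("message")}')
    return messages


# Hand-written sample in the assumed shape (not real output).
sample = {
    "containers": [
        {
            "name": "jenkins-permanent",
            "instanceView": {
                "events": [
                    {"name": "Pulling", "type": "Normal", "message": "pulling image"},
                    {"name": "Failed", "type": "Warning", "message": "Failed to pull image"},
                ]
            },
        }
    ]
}

for line in warning_events(sample):
    print(line)
```

To run it on real data, save the command output to a file (az container show -g devops-aci -n jenkins-permanent > group.json) and pass json.load(open("group.json")) to the function.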


How to send notification to Telegram from GitLab pipeline?

In our small startup we use GitLab for development and Telegram for internal communication between the developers and the PO. Since the PO would like to see progress immediately, we have set up the GitLab pipeline so that a preview version is deployed to the web server after each commit. Now we want to extend the pipeline so that a notification is sent to the Telegram group after the deployment.
So the question - is that possible, and if so, how?
EDIT: since I've already implemented that, that's not a real question. I wanted to post the answer here so that others can use it as well.
So, we'll go through it step by step:
Create a Telegram bot
Add bot to Telegram group
Find out Telegram group Id
Send message via GitLab Pipeline
1. Create a Telegram bot
Telegram itself provides good instructions for this:
https://core.telegram.org/bots#6-botfather
The instructions do not say so explicitly, but to generate the bot you have to open a chat with the BotFather.
At the end you get a bot token, something like 110201543:AAHdqTcvCH1vGWJxfSeofSAs0K5PALDsaw
2. Add bot to Telegram group
Switch to the Telegram group, and add the created bot as a member (look for the bot by name).
3. Find out Telegram group Id
Get the update status for the bot in browser:
https://api.telegram.org/bot<YourBOTToken>/getUpdates
Find the chat-id in the response:
... "chat": {"id": <YourGroupID>, ...
For more details, see: Telegram Bot - how to get a group chat id?
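When the getUpdates response is long, the chat ids can be picked out programmatically. A minimal sketch, assuming the response has been saved or fetched as JSON; the function name chat_ids and the sample values are made up for illustration.

```python
def chat_ids(get_updates_response):
    """Map chat id -> chat title (or chat type) for every chat in the updates."""
    found = {}
    for update in get_updates_response.get("result", []):
        # A chat can appear under several update kinds; check the common ones.
        for kind in ("message", "my_chat_member", "channel_post"):
            chat = (update.get(kind) or {}).get("chat")
            if chat:
                found[chat["id"]] = chat.get("title") or chat.get("type")
    return found


# Hand-written sample in the shape returned by getUpdates (values invented).
sample = {
    "ok": True,
    "result": [
        {
            "update_id": 1,
            "message": {
                "message_id": 5,
                "chat": {"id": -1001234567890, "title": "Dev Team", "type": "supergroup"},
                "text": "hello",
            },
        }
    ],
}

print(chat_ids(sample))
```

Group ids are usually negative numbers, so keep the leading minus sign when copying the id into the pipeline. If the result list is empty, post a message in the group first and call getUpdates again.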
4. Send message via GitLab Pipeline
Send message with a curl command. For example, an existing stage in gitlab pipeline can be extended for this purpose:
upload:
  stage: deploy
  image: alpine:latest
  script:
    - 'apk --no-cache add curl'
    - 'curl -X POST -H "Content-Type: application/json" -d "{\"chat_id\": \"<YourGroupID>\", \"text\": \"CI: new version was uploaded, see: https://preview.startup.com\"}" https://api.telegram.org/bot<YourBOTToken>/sendMessage'
  only:
    - main
Remember to adapt the YourBOTToken and YourGroupID, and the text for the message.
Note: we use the alpine Docker image here, so curl has to be installed first ('apk --no-cache add curl'). Other images may need a different install step.
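For pipelines that already run on a Python image, the same sendMessage call can be made without curl. The sketch below only builds the request so it can be inspected; the function name build_send_message is hypothetical, and the token and group id are the same placeholders as in the curl example.

```python
import json
import urllib.request


def build_send_message(bot_token, chat_id, text):
    """Build (but do not send) a Telegram sendMessage request."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_send_message(
    "<YourBOTToken>",
    "<YourGroupID>",
    "CI: new version was uploaded, see: https://preview.startup.com",
)
print(request.get_method(), request.full_url)

# To actually send it from the pipeline:
# urllib.request.urlopen(request)
```

In a real pipeline the token would more sensibly come from a masked CI/CD variable than from the file itself.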
One easy way to send notifications (particularly if you're using multiple services or chats) is to use apprise.
To send to one Telegram chat:
apprise -vv --body="Notify telegram chat" \
  tgram://bottoken/ChatID1
This makes it easy to notify many services from your pipeline all at once without needing to write code against the API of each service (apprise handles this for you).
image: python:3.9-slim # or :3.9-alpine if you prefer a smaller image
before_script:
  - pip install apprise # consider caching PIP_CACHE_DIR for performance
script: |
  # Send a notification to multiple Telegram chats, a Yahoo email account,
  # Slack, and a Kodi server with a bit of added verbosity:
  apprise -vv --body="Notify more than one service" \
    tgram://bottoken/ChatID1/ChatID2/ChatIDN \
    mailto://user:password@yahoo.com \
    slack://token_a/token_b/token_c \
    kodi://example.com

Azure Container Instance is immediately killed on Startup

I am trying to run an azure container instance but it appears to be getting killed off the second I run it. This works fine in 2 other resource groups but not my production resource group where I see the following:
In events I see 'Successfully pulled image
selenium/standalone-chrome:latest' with count 1 and then 'Started
container' and then 'Killing container' with count 31. The times for
started and killed are the same.
In logs, it just says 'No logs available'
The metrics for CPU and memory on the container never show any change from zero.
I looked at this article, but the proposed solution didn't work: Azure Container Group Instance. I have also tried adding both an empty directory volume and 2 GB of RAM, as advised here: https://github.com/SeleniumHQ/docker-selenium, but nothing works.
This is the code I am using to create the container:
containerGroup = await azure.ContainerGroups.Define(containerName)
    .WithRegion("West Europe")
    .WithExistingResourceGroup(configuration.ContainerResourceGroup)
    .WithLinux()
    .WithPublicImageRegistryOnly()
    .WithEmptyDirectoryVolume("devshm")
    .DefineContainerInstance(containerName)
        .WithImage("selenium/standalone-chrome")
        .WithExternalTcpPorts(4444)
        .WithVolumeMountSetting("devshm", "/dev/shm")
        .WithMemorySizeInGB(2)
        .Attach()
    .WithDnsPrefix(configuration.AppServiceName + "container")
    .WithRestartPolicy(ContainerGroupRestartPolicy.OnFailure)
    .CreateAsync(cancellationToken);
How do I debug what is going wrong?
What is wrong with the container?
In case this helps someone: I changed the value of the "containerName" parameter in the above example from myinstance to myinstance1 and changed the region from West Europe to UK South. This fixed the issue. I can only think that Azure caches instances somehow to reduce startup times, and that the cached image I was using was poisoned somehow.
One issue could be the restart policy. Have a look at the restart policy section of Microsoft's ACI troubleshooting page. Under the header "Container continually exits and restarts (no long-running process)", it says:
Container groups default to a restart policy of Always, so containers
in the container group always restart after they run to completion.
You may need to change this to OnFailure or Never if you intend to run
task-based containers. If you specify OnFailure and still see
continual restarts, there might be an issue with the application or
script executed in your container.
In your case you may need to adjust the code as follows, using WithStartingCommandLine:
containerGroup = await azure.ContainerGroups.Define(containerName)
    .WithRegion("West Europe")
    .WithExistingResourceGroup(configuration.ContainerResourceGroup)
    .WithLinux()
    .WithPublicImageRegistryOnly()
    .WithEmptyDirectoryVolume("devshm")
    .DefineContainerInstance(containerName)
        .WithImage("selenium/standalone-chrome")
        .WithExternalTcpPorts(4444)
        .WithVolumeMountSetting("devshm", "/dev/shm")
        .WithMemorySizeInGB(2)
        .WithStartingCommandLine("tail")
        .WithStartingCommandLine("-f")
        .WithStartingCommandLine("/dev/null")
        .Attach()
    .WithDnsPrefix(configuration.AppServiceName + "container")
    .WithRestartPolicy(ContainerGroupRestartPolicy.OnFailure)
    .CreateAsync(cancellationToken);
This link is helpful for this issue.
--command-line
linux => "tail -f /dev/null"
windows => "ping -t localhost"
# .yml
command: tail -f /dev/null
This keeps your Azure instance running, because Azure now has a long-running process to connect to and inspect.

Running AWS Log Agent from inside a Fargate container

Trying to run the AWS Logs Agent inside a docker container running on AWS ECS Fargate.
This has been working fine under EC2 for several years. In the Fargate context, it does not seem to be able to resolve the task role being passed to it.
Permissions on the Task Role should be good... I've even tried giving it full CloudWatch permissions to eliminate that as a reason.
I've managed to hack the Python-based launcher script to add a --debug flag, which gave me this in the log:
Caught retryable HTTP exception while making metadata service request to
http://169.254.169.254/latest/meta-data/iam/security-credentials
It does not appear to be properly resolving the credentials that are passed into the task as the 'Task Role'.
I managed to find a workaround that may illustrate what I believe to be a bug or inadequacy in the agent. I had to patch the launcher script using sed as follows:
sed -i "s|HTTPS_PROXY|AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI HTTPS_PROXY|" \
  /var/awslogs/bin/awslogs-agent-launcher.sh
This essentially de-references the ENV variable holding the URI for retrieving the task role and passes it to the agent's launcher.
It results in something like this:
/usr/bin/env -i AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/f4ca7e30-b73f-4919-ae14-567b1262b27b (etc...)
With this in place, I restart the log agent and it works as expected.
Note that you can use a similar sed edit to add the --debug flag to the launcher, which was very helpful in figuring out where it went astray.
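The reason the variable matters can be sketched in a few lines. The /usr/bin/env -i in the launcher line quoted above starts the agent with an empty environment, so AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is gone unless it is passed explicitly, and the agent then appears to fall back to the EC2 metadata address seen in the debug log. The sketch assumes the standard ECS task credentials address 169.254.170.2; the function name and the example id are made up.

```python
import os

# Standard ECS task credentials address (assumed here, not taken from the agent).
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def task_role_credentials_url(environ=None):
    """Return the task role credentials URL, or None if the variable is missing."""
    environ = os.environ if environ is None else environ
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    return ECS_CREDENTIALS_HOST + relative_uri


# Environment as the container sees it (example id is invented).
inherited = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example-id"}
# Environment after "env -i": everything has been cleared.
cleared = {}

print(task_role_credentials_url(inherited))
print(task_role_credentials_url(cleared))
```

Running the same check inside the container, before and after the launcher's env -i, is a quick way to confirm whether the variable reaches the agent.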

Cannot start renderd service for mod_tile

I am building an OSM tile server following the directions available here: https://switch2osm.org/manually-building-a-tile-server-16-04-2-lts/ on an Amazon EC2 instance with Ubuntu 16.04 LTS.
Everything is working well until the step of starting renderd as a service:
sudo /etc/init.d/renderd start
This returns an error of: "Job for renderd.service failed because the control process exited with error code. See "systemctl status renderd.service" and "journalctl -xe" for details."
Checking the details mentioned gives messages like:
"renderd.service: Control process exited, code=exited status=203"
"The error number returned by this process is 8."
I can, however, run renderd directly without problems as shown below, and can even (slowly) load tiles into a Leaflet map. I just cannot run it as a service.
sudo -u username renderd -f -c /usr/local/etc/renderd.conf
I have also tried changing to my rendering user and starting the service from there, but then I get a password prompt for user ubuntu (there isn't one).
What else can I test out or investigate to find out what the problem is?
I decided to start building my server again from scratch, this time also using information from other tutorials: https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04 and https://ircama.github.io/osm-carto-tutorials/tile-server-ubuntu
Following those instructions, renderd now runs as a service. The main difference I noticed is that those tutorials use https://github.com/openstreetmap/mod_tile.git rather than the https://github.com/SomeoneElseOSM/mod_tile.git source I used before, so perhaps the settings of the forked mod_tile were not compatible with my server.

Invoking bamboo plan remotely

I know it is possible to fetch a Bamboo artifact remotely. Is it also possible to start a Bamboo plan from a remote box by sending an appropriate HTTP request?
Thanks.
Here is an example:
curl --user un:pwd -X POST -d "stage&executeAllStages" -d "bamboo.variable.TEST=WORKS" http://10.0.0.0/rest/api/latest/queue/CAP-BR.json
As you can see, I am also passing an optional variable to Bamboo: bamboo.variable.TEST=WORKS
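The same request can be built from Python when curl is not available on the remote box. This is a sketch that mirrors the curl call above and only constructs the request; the function name build_queue_request is hypothetical, and the host, plan key and credentials are the placeholders from the example.

```python
import base64
import urllib.parse
import urllib.request


def build_queue_request(base_url, plan_key, user, password, variables=None):
    """Build (but do not send) a POST that queues a Bamboo plan."""
    url = f"{base_url}/rest/api/latest/queue/{plan_key}.json"
    # Same form fields as the two -d options in the curl example.
    parts = ["stage", "executeAllStages"]
    for name, value in (variables or {}).items():
        parts.append(f"bamboo.variable.{name}={urllib.parse.quote(str(value))}")
    body = "&".join(parts).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_queue_request("http://10.0.0.0", "CAP-BR", "un", "pwd", {"TEST": "WORKS"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually trigger the build:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check the URL and form body before anything is queued.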
If I understood your question correctly, you want to trigger a build via the Bamboo REST API.
This may help:
https://answers.atlassian.com/questions/65517/trigger-bamboo-plan-via-rest-call