ECS Fargate - No Space left on Device - aws-fargate

I had deployed my asp.net core application on AWS Fargate and all was working fine. I am using awslogs driver and logs were correctly sent to the cloudwatch. But after few days of correctly working, I am now seeing only one kind of log as shown below:
So no application logs are showing up due to no space. If I update the ECS service, logging starts working again, suggesting that the disk has been cleaned up.
This link suggests that awslogs driver does not take up space and sends log to cloudwatch instead.
https://docs.aws.amazon.com/AmazonECS/latest/userguide/task_cannot_pull_image.html
Did anyone also faced this issue and knows how to resolve the same?

You need to set the "LibraryLogFileName" parameter in your AWS Logging configuration to null.
So in the appsettings.json file of a .Net Core application, it would look like this:
"AWS.Logging": {
"Region": "eu-west-1",
"LogGroup": "my-log-group",
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information"
},
"LibraryLogFileName": null
}

It depends on how you have logging configured in your application. The AWSlogs driver is just grabbing all the output sent to the console and saving it to CloudWatch, .NET doesn't necessarily know about this and is going to keep writing logs like it would have otherwise.
Likely .NET is still writing logs to whatever location it otherwise would be.
Advice for how to troubleshoot and resolve:
First, run the application locally and check if log files are being saved anywhere
Second, optinally run a container test to see if log files are being saved there too
Make sure you have docker installed on your machine
Download the container image from ECR which fargate is running.
docker pull {Image URI from ECR}
Run this locally
Do some task you know will genereate some logs
Use docker exec -it to connect up to your container
Check if log files are being written to the location you identified when you did the first test
Finally, once you have identified that logs are being written to files somewhere pick one of these options
Add some flag which can be optionally specified to disable logging to a file. Use this when running your application inside of the container.
Implement some logic to clean up log files periodically or once they reach a certain size. (Keep in mind ECS containers have up to 20GB local storage)
Disable all file logging(not a good option in my opinion)
Best of luck!

Related

.NET Core 2.2 API on IIS returns error if it attempts to connect to Db but otherwise works fine

When I publish my .NET core app to a development IIS server, I make a call to the API and the method works fine locally. This method makes a Db call using a connection string stored in
appSettings.json as well as appSettings.Development.json
Observations:
- Yes, I have ASPNETCORE_ENVIRONMENT = Development and I have both an appSettings.json file AND appSettings.Development.json file
- So I started looking at the published files and I had BOTH of these json files in the published folder, even though appSettings.Development.json properties is set to "Build Action" = content and Copy to Output Directory = Do Not Copy
- If I comment out the code that his the Db, and return dummy data, and republish the api, i get the results fine with no complaints about "development mode"
Error I get when calling the API trying to hit the Db
Error.
An error occurred while processing your request.
Request ID:
0HLL1GOCEHH73:00000001
Development Mode
Swapping to the
Development environment displays detailed information about the error that occurred.
The Development environment shouldn't be enabled for deployed applications.
It can result in displaying sensitive information from exceptions to end users.
For local debugging, enable the
Development environment by setting the
ASPNETCORE_ENVIRONMENT environment variable to
Development
and restarting the app.
Questions:
- The Db connection strings are in both
[ update ]
Brain-fart! The db connection strings were using 'trusted', so no wonder they worked locally! Once I put in the credentials, and re-published, things worked like I expected. However, the error message threw me off.
Im still not sure why I have both of those appSettings files published? Which one will it use?
I am assuming your appsetting files are named as following:
appSettings.json
appSettings.dev.json
typically, you have to explicitly set the environment to dev. if you use Visual Studio for development, it sets an environment variable that tells the application to put it in dev mode.
Without seeing the initializing logic, I would say in prod it will use the appSettings.json.
Take a look at this article, it explains configuration in more details.

NServiceBus endpoint is not starting on Azure Service Fabric local cluster

I have a .NetCore stateless WebAPI service running inside Service Fabric local cluster.
return Endpoint.Start(endpointConfiguration).GetAwaiter().GetResult();
When I'm trying to start NServiceBus endpoint, I'm getting this exception :
Access to the path 'C:\SfDevCluster\Data_App_Node_0\AppType_App10\App.APIPkg.Code.1.0.0.diagnostics' is denied.
How can it be solved ? VS is running under administrator.
The issue you are having is because the folder you are trying to write to is not supposed to be written by your application.
The package folder is used to store you application binaries and can be recreated dynamically whenever an application is hosted in the node.
Also, the binaries are reused by multiple service instances running on same node, so it might compete to use the files by different instances.
You should instead instruct your application to write to the WorkFolder,
public Stateless1(StatelessServiceContext context): base(context)
{
string workdir = context.CodePackageActivationContext.WorkDirectory;
}
The code above will give you a path like this:
'C:\SfDevCluster\Data_App_Node_0\AppType_App10\App.APIPkg.Code.1.0.0.diagnostics\work'
This folder is dynamic, will change depending on the node or instance of your application is running, when created, your application should already have permission to write to it.
For more info, see:
how-do-i-get-files-into-the-work-directory-of-a-stateless-service?forum=AzureServiceFabric
Open folder properties Security tab
Select ServiceFabricAllowedUsers
Add Write permission

Stackdriver Node.js Logging not showing up

I have a Node.js application, running inside of a Docker container and logging events using Stackdriver.
It is a Node.Js app, running with Express.js and Winston for logging and using a StackDriverTransport.
When I run this container locally, everything is logged correctly and shows up in the Cloud console. When I run this same container, with the same environment variables, in a GCE VM, the logs don't show up.
What do you mean exactly by locally? Are you running the container on the Cloud Shell vs running it on an instance? Keep in mind that if you create a container or instance that has to do something that needs privileges (like the Stackdriver logging client library) and run it, if that instance doesn't have a service account with that role/privileges set up it won't work.
Yu mentioned that you use the same environment variables, I take that one of the env vars points to your json key file. Is the key file present in that path on the instance?
From Winston documentation it looks like you need to specify the key file location for the service account:
const winston = require('winston');
const Stackdriver = require('#google-cloud/logging-winston');
winston.add(Stackdriver, {
projectId: 'your-project-id',
keyFilename: '/path/to/keyfile.json'
});
Have you checked if this is defined with the key for the service account with a logging role?

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload deploy image onto S3 from CircleCI, but unable to deploy S3 to EC2 instance. Here's the error.
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error was provided from CodeDeploy. I can't figure out why and how.
I'd appreciate if you give some advise.
If you are running on Ubuntu there might be plenty of reasons, here is a checklist can verify
Check code-deploy agent is installed on your EC2 Instance. Please refer this document to install code deploy agent.
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
In case if you are running Ubuntu release 20.x and you get this error
./install:22:in block in method_missing': undefined method path' for
#<IO:> (NoMethodError)
try running the install file via this script
sudo ./install auto > /tmp/logfile
Check you have EC2 Instance Code Deploy Role -> Create a code deployment role and assign it to the Instance, https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
In case if you assign the EC2 Role after initiate, restart the server.
Check your appsec.yml file placement as per the top answer, try to avoid any long timeout in it.
Log into your instance check your error log
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
This error is commonly due to problems in the configuration of the appSpec.yml or appSpec.json file (It depends on the format you are using).
"If you have any Hook I recommend that you remove them, check if it works, then you can add one by one (the Hooks) and so you can identify the error"
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
│-- install_dependencies
│-- start_server
└-- stop_server
In the scripts folder you will have to place the processes that you want to be executed according to the Hook
Here is an example of the appspec.yml file
version: 0.0
os: linux
files:
- source: /index.html
destination: /var/www/html/
hooks:
BeforeInstall:
- location: scripts/install_dependencies
timeout: 300
runas: root
- location: scripts/start_server
timeout: 300
runas: root
ApplicationStop:
- location: scripts/stop_server
timeout: 300
runas: root
I hope I can help you 😃👻🕺🏾
Make sure the CodeDeploy Host Agent Service is running in your target EC2 instance.
The error you are facing is a generic error message thrown on any of the event failure which could be beforeblockTraffic, blockTraffic, ApplicationStop etc.
The first step in this case would be check whether code deploy agent is running or not if first event i.e. BeforeBlockTraffic event is failed.
As you can see in the screenshot below, the event failure message would tell you the exact error behind.
From the failed deployments, I can see all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in Stopped state but I can see the IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile. 1
For such failures, you can always begin with a general troubleshooting checklist for a failed deployment 2 and then look for troubleshooting guides on Deployment Issues and Instance issues3.
1[http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html]1
2 [http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html]2
3 [http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html]3
Check the status of the Code Deploy Agent. In my case, the agent wasn't up.
Please check the role given to the ec2 machine(where the agent is running). It should have s3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
Please place your appspec.yml file in your root folder to solve this error
To access your after script and before script
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.

Windows Azure Console for Worker Role Cloud Service

I have a worker role cloud service that I have recently developed on my local machine. The service exposes a WCF interface that receives a file as a byte array, recompiles the file, converts it to the appropriate format, then stores it in Azure Storage. I managed to get everything working using the Azure Compute Emulator on my machine and published the service to Azure and... nothing. Running it on my machine again, it works as expected. When I was working on it on my computer, the Azure Compute Emulator's console output was essential in getting the application running.
Is there a similar functionality that can be tapped into on the Cloud Service via RDP? Such as starting/restarting the role at the command prompt or in power shell? If not, what is the best way to debug/log what the worker role is doing (without using Intellitrace)? I have diagnostics enabled in the project, but it doesn't seem to be giving me the same level of detail as the Computer Emulator console. I've rerun the role and corresponding .NET application again on localhost and was unable to find any possible errors in the console.
Edit: The Next Best Thing
Falling back to manual logging, I implemented a class that would feed text files into my Azure Storage account. Here's the code:
public class EventLogger
{
public static void Log(string message)
{
CloudBlobContainer cbc;
cbc = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageClientAccount"))
.CreateCloudBlobClient()
.GetContainerReference("errors");
cbc.CreateIfNotExist();
cbc.GetBlobReference(string.Format("event-{0}-{1}.txt", RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow.Ticks)).UploadText(message);
}
}
Calling ErrorLogger.Log() will create a new text file and record whatever message you put in there. I found an example in the answer below.
There is no console for worker roles that I'm aware of. If diagnostics isn't giving you any help, then you need to get a little hacky. Try tracing out messages and errors to blob storage yourself. Steve Marx has a good example of this here http://blog.smarx.com/posts/printf-here-in-the-cloud
As he notes in the article, this is not for production, just to help you find your problem.