Mesos / Marathon Roles and Applications

I want to run a Mesos/Marathon cluster with a mix of public and private apps. It seems like it should be possible to leverage roles for this, but I'm not having success.
As a proof-of-concept, I have a Mesos cluster with public and private roles:
# curl --silent localhost:5050/roles | jq '.' | grep name
"name": "*",
"name": "public",
"name": "private",
The problem arises with Marathon and my Marathon apps. I'm unable to set default_accepted_resource_roles to a value that's different from mesos_role. For example, if I set --mesos_role=public and --default_accepted_resource_roles=public, I'm able to run apps, but only if they include "acceptedResourceRoles": [ "public" ] in the application description. Conversely, I can set everything to private and run private apps. (Although, ironically, if I set everything to * I'm unable to run my apps. I don't understand that at all.)
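For example, with --mesos_role=public and --default_accepted_resource_roles=public, an app only launches if its definition explicitly names the role, roughly like this (the app id, command, and sizes are just placeholders):
{
  "id": "/my-public-app",
  "cmd": "python3 -m http.server $PORT0",
  "cpus": 0.1,
  "mem": 64,
  "instances": 1,
  "acceptedResourceRoles": ["public"]
}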
Ideally, I'd like to run this way:
mesos_role=*
default_accepted_resource_roles=private
I would expect this to give Marathon access to all my roles but start apps with the private role unless they specifically request the public role. Unfortunately, this results in this error:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: --default_accepted_resource_roles contains roles for which we will not receive offers: private
and Marathon refuses to start.
Is it possible to run Marathon with --mesos_role=* and --default_accepted_resource_roles=private? If not, I don't understand why the roles exist. I'd prefer to run as much as possible in a single cluster rather than spinning up separate clusters for public and private apps.

@aayore - as a workaround, we deploy multiple Marathons: a main Marathon plus a couple of other Marathons that run on the main Marathon as apps, typically one per role.
Say you have 2 roles, "kafka" and "hdfs":
e.g. the main Marathon runs with the default role (which is *)
Marathon 2 runs on the main Marathon with --mesos_role kafka (command-line option)
Marathon 3 also runs on the main Marathon, with --mesos_role hdfs
For an app to use resources allocated to the kafka role, you launch the job in the kafka Marathon instead of the main Marathon.
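Roughly, each child Marathon is just an app on the main Marathon, something like this (the marathon binary, ZooKeeper connection strings, and sizes are placeholders for our setup):
{
  "id": "/marathon-kafka",
  "cmd": "marathon --master zk://zk-1:2181/mesos --zk zk://zk-1:2181/marathon-kafka --mesos_role kafka --http_port $PORT0",
  "cpus": 1.0,
  "mem": 1024,
  "instances": 1,
  "ports": [0]
}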

Related

AWS EC2 Instance Profile for S3 permissions inconsistent

Background: I'm writing an automated deployment script to deploy a Ruby on Rails application to AWS on an EC2 instance, using S3 as the storage for ActiveStorage. My script creates an instance profile/role and attaches it to the EC2 instance on creation. The script uses the Ruby SDK for AWS.
Sometimes when my script runs, it works great (which tells me my configuration is correct). Sometimes it throws the following exception:
/home/ubuntu/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/aws-sigv4-1.2.1/lib/aws-sigv4/signer.rb:613:in `extract_credentials_provider': missing credentials, provide credentials with one of the following options: (Aws::Sigv4::Errors::MissingCredentialsError)
- :access_key_id and :secret_access_key
- :credentials
- :credentials_provider
I generally have success about 9 times out of 10 using a t3a.micro or t3.micro instance. I usually have a failure 9 times out of 10 using a t3a.nano or t3.nano instance.
It sure seems like there is something eventually consistent about these instance profiles, but I can't find anything in the documentation. What's going on, and what can I do to make this succeed consistently?
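For what it's worth, I'm considering wrapping the first S3 call in a retry while the instance profile propagates. A rough sketch, not what my script currently does (the bucket name, timings, and the deliberately broad rescue are all placeholders):
require "aws-sdk-s3"

# Retry the first S3 call until the instance profile credentials have propagated.
def wait_for_s3_access(bucket, max_attempts: 30, delay: 5)
  attempts = 0
  begin
    Aws::S3::Client.new.head_bucket(bucket: bucket)
  rescue Aws::Sigv4::Errors::MissingCredentialsError,
         Aws::Errors::MissingCredentialsError,
         Aws::S3::Errors::ServiceError => e
    # Deliberately broad: permission errors can also occur while IAM settles.
    attempts += 1
    raise e if attempts >= max_attempts
    sleep delay
    retry
  end
end

wait_for_s3_access("my-activestorage-bucket")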
Thank you.

ECS Fargate - No Space left on Device

I deployed my ASP.NET Core application on AWS Fargate and everything was working fine. I'm using the awslogs driver, and logs were correctly sent to CloudWatch. But after a few days of working correctly, I'm now seeing only one kind of log, as shown below:
So no application logs are showing up because there's no space left. If I update the ECS service, logging starts working again, which suggests the disk has been cleaned up.
This link suggests that the awslogs driver does not take up space and sends logs to CloudWatch instead:
https://docs.aws.amazon.com/AmazonECS/latest/userguide/task_cannot_pull_image.html
Has anyone else faced this issue, and do you know how to resolve it?
You need to set the "LibraryLogFileName" parameter in your AWS Logging configuration to null.
So in the appsettings.json file of a .NET Core application, it would look like this:
"AWS.Logging": {
"Region": "eu-west-1",
"LogGroup": "my-log-group",
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information"
},
"LibraryLogFileName": null
}
It depends on how you have logging configured in your application. The awslogs driver just grabs all output sent to the console and saves it to CloudWatch; .NET doesn't necessarily know about this and will keep writing logs like it would have otherwise.
Most likely .NET is still writing log files to whatever location it otherwise would.
Advice for how to troubleshoot and resolve:
First, run the application locally and check whether log files are being saved anywhere.
Second, optionally run a container test to see if log files are being saved there too (see the commands sketched after these steps):
Make sure you have Docker installed on your machine.
Download the container image from ECR that Fargate is running:
docker pull {Image URI from ECR}
Run this locally.
Do some task you know will generate some logs.
Use docker exec -it to connect to your container.
Check whether log files are being written to the location you identified in the first test.
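For example, something along these lines (the image URI, container name, port mapping, and paths are placeholders for your own):
# pull and run the same image Fargate runs (placeholder URI)
docker pull 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest
docker run -d --name my-app-test -p 8080:80 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest
# generate some activity, then poke around inside the container
docker exec -it my-app-test /bin/bash
# inside the container: check the locations you identified in the first test
du -sh /app/logs /tmp 2>/dev/null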
Finally, once you have identified that logs are being written to files somewhere, pick one of these options:
Add a flag that can optionally be specified to disable logging to a file, and use it when running your application inside the container (a sketch follows this list).
Implement some logic to clean up log files periodically or once they reach a certain size. (Keep in mind ECS containers have up to 20 GB of local storage.)
Disable all file logging (not a good option in my opinion).
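A rough sketch of the first option for an ASP.NET Core app (the DISABLE_FILE_LOG variable name and the Startup class are placeholders for however your app is wired up):
using System;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public class Program
{
    public static void Main(string[] args) => CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureLogging(logging =>
            {
                // Console output is what the awslogs driver forwards to CloudWatch,
                // so inside the container console logging is all that's needed.
                logging.ClearProviders();
                logging.AddConsole();

                // Hypothetical flag: only register a file sink when NOT running
                // in the container (e.g. during local development).
                if (Environment.GetEnvironmentVariable("DISABLE_FILE_LOG") != "true")
                {
                    // register the file logger (Serilog, NLog, log4net, ...) here
                }
            })
            .ConfigureWebHostDefaults(webBuilder => webBuilder.UseStartup<Startup>());
}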
Best of luck!

NServiceBus endpoint is not starting on Azure Service Fabric local cluster

I have a .NET Core stateless Web API service running inside a Service Fabric local cluster.
return Endpoint.Start(endpointConfiguration).GetAwaiter().GetResult();
When I try to start the NServiceBus endpoint, I get this exception:
Access to the path 'C:\SfDevCluster\Data_App_Node_0\AppType_App10\App.APIPkg.Code.1.0.0.diagnostics' is denied.
How can this be solved? VS is running as administrator.
The issue you are having is because the folder you are trying to write to is not supposed to be written to by your application.
The package folder is used to store your application binaries and can be recreated dynamically whenever an application is hosted on the node.
Also, the binaries are reused by multiple service instances running on the same node, so different instances might end up competing for the same files.
You should instead instruct your application to write to the work directory:
public Stateless1(StatelessServiceContext context) : base(context)
{
    // Service Fabric provides a writable, per-instance work directory.
    string workdir = context.CodePackageActivationContext.WorkDirectory;
}
The code above will give you a path like this:
'C:\SfDevCluster\Data_App_Node_0\AppType_App10\App.APIPkg.Code.1.0.0.diagnostics\work'
This folder is dynamic and will change depending on the node or instance your application is running on. When it is created, your application should already have permission to write to it.
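For example, a minimal sketch of writing into that folder (the class and file name are just examples):
using System;
using System.Fabric;
using System.IO;

public static class WorkFolderLog
{
    // Append diagnostics to a file under the Service Fabric work directory
    // instead of the read-only code package folder.
    public static void Write(StatelessServiceContext context, string message)
    {
        string workdir = context.CodePackageActivationContext.WorkDirectory;
        string path = Path.Combine(workdir, "diagnostics.log");
        File.AppendAllText(path, $"{DateTime.UtcNow:o} {message}{Environment.NewLine}");
    }
}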
For more info, see:
how-do-i-get-files-into-the-work-directory-of-a-stateless-service?forum=AzureServiceFabric
Open the folder properties, Security tab
Select ServiceFabricAllowedUsers
Add Write permission

Can I make Storm's UI expect a different service principal from "HTTP"

I have Storm working on my own Kerberos realm, but in order to connect to my corporate realm, I'm beholden to their rules. There is only one production AD, so I need to make non-prod and prod versions of all principals. Thus far, I have changed my Storm cluster config to use "storm_np" in non-prod and everything works (i.e. storm_np@REALM and storm_np/HOST@REALM).
However, the UI seems to always try to run as HTTP/HOST@REALM, so I can't get in. Before I changed the principals, I would kinit as HTTP@REALM and I could see the UI just fine. Now, I'm not sure how to (1) tell the UI to run as a different principal and (2) kinit locally so I can get to it.
I'm hoping for a property of some kind. To make the "storm_np" changes, I was able to change startup properties and jaas configs. I'd love to find the same for this.
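For reference, I'm guessing the relevant bit would be the AuthenticationFilter setup in storm.yaml, something along these lines (all values are placeholders; I haven't confirmed this works, or whether SPNEGO clients will accept a non-HTTP principal):
ui.filter: "org.apache.hadoop.security.authentication.server.AuthenticationFilter"
ui.filter.params:
  "type": "kerberos"
  "kerberos.principal": "storm_np/my-ui-host@CORP.REALM"
  "kerberos.keytab": "/etc/security/keytabs/storm_np.service.keytab"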

Windows Azure Console for Worker Role Cloud Service

I have a worker role cloud service that I have recently developed on my local machine. The service exposes a WCF interface that receives a file as a byte array, recompiles the file, converts it to the appropriate format, then stores it in Azure Storage. I managed to get everything working using the Azure Compute Emulator on my machine and published the service to Azure and... nothing. Running it on my machine again, it works as expected. When I was working on it on my computer, the Azure Compute Emulator's console output was essential in getting the application running.
Is there similar functionality that can be tapped into on the Cloud Service via RDP? Such as starting/restarting the role at the command prompt or in PowerShell? If not, what is the best way to debug/log what the worker role is doing (without using IntelliTrace)? I have diagnostics enabled in the project, but it doesn't seem to be giving me the same level of detail as the Compute Emulator console. I've rerun the role and corresponding .NET application again on localhost and was unable to find any possible errors in the console.
Edit: The Next Best Thing
Falling back to manual logging, I implemented a class that would feed text files into my Azure Storage account. Here's the code:
public class EventLogger
{
    public static void Log(string message)
    {
        // Resolve the "errors" container in the storage account named in the role configuration.
        CloudBlobContainer cbc = CloudStorageAccount
            .Parse(RoleEnvironment.GetConfigurationSettingValue("StorageClientAccount"))
            .CreateCloudBlobClient()
            .GetContainerReference("errors");
        cbc.CreateIfNotExist();

        // Write the message to a blob named uniquely per role instance and timestamp.
        cbc.GetBlobReference(string.Format("event-{0}-{1}.txt", RoleEnvironment.CurrentRoleInstance.Id, DateTime.UtcNow.Ticks))
           .UploadText(message);
    }
}
Calling EventLogger.Log() will create a new text file and record whatever message you put in there. I found an example in the answer below.
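For example, I call it from the worker role's Run loop roughly like this (DoWork is a stand-in for the real processing):
public override void Run()
{
    while (true)
    {
        try
        {
            // Normal worker role processing goes here.
            DoWork();
        }
        catch (Exception ex)
        {
            // Troubleshooting only: push the failure into blob storage so it can be read later.
            EventLogger.Log(ex.ToString());
        }

        Thread.Sleep(TimeSpan.FromSeconds(10));
    }
}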
There is no console for worker roles that I'm aware of. If diagnostics isn't giving you any help, then you need to get a little hacky. Try tracing out messages and errors to blob storage yourself. Steve Marx has a good example of this here http://blog.smarx.com/posts/printf-here-in-the-cloud
As he notes in the article, this is not for production, just to help you find your problem.