I have a logging infrastructure set up with AWS OpenSearch, Fluent Bit (DaemonSet on EKS), FluentD (Deployment on EKS), and OpenSearch Dashboards.
I am working on the ILM (ISM) policy and am having trouble with rollover. What I want the policy to do: once a condition is met (the index is 1 day old or reaches 2 GB), the index should roll over and the old index should move to cold storage. Once the old index is 7 days old in total, it should be deleted.
My FluentD config:
<match *.**>
#type copy
<store>
type elasticsearch
include_tag_key true
host "#{ENV.fetch('ELASTICSEARCH_HOST')}"
port "#{ENV.fetch('ELASTICSEARCH_PORT')}"
user "#{ENV.fetch('ELASTICSEARCH_USER')}"
password "#{ENV.fetch('ELASTICSEARCH_PASSWORD')}"
log_es_400_reason true
ca_file /certs/ca.pem
scheme https
ssl_verify true
logstash_format true
logstash_prefix rollover-sbx
</store>
</match>
ILM (ISM) policy:
{
"id": "sbx-ism-policy",
"seqNo": 379851,
"primaryTerm": 2,
"policy": {
"policy_id": "sbx-ism-policy",
"description": "A simple default policy that changes the replica count between hot and cold states.",
"last_updated_time": 1650595677042,
"schema_version": 12,
"error_notification": null,
"default_state": "hot",
"states": [
{
"name": "hot",
"actions": [
{
"rollover": {
"min_size": "200mb",
"min_doc_count": 200,
"min_index_age": "1h"
}
}
],
"transitions": [
{
"state_name": "cold",
"conditions": {
"min_index_age": "1h"
}
}
]
},
{
"name": "cold",
"actions": [
{
"close": {}
}
],
"transitions": []
}
],
"ism_template": [
{
"index_patterns": [
"rollover-sbx*"
],
"priority": 70,
"last_updated_time": 1650583513796
}
]
}
}
The minimum size and age values in the policy above are for testing purposes only.
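For comparison, here is a minimal sketch of a policy body matching the goal described above (roll over at 1 day or 2 GB, move to cold, delete at 7 days), built as a Python dict so it can be dumped to JSON. The function name and its defaults are made up for illustration, and the "read_only" action in the cold state is only a placeholder assumption for whatever "cold storage" should mean in your cluster. As far as I know, the ISM rollover action also expects the managed index to have a rollover alias (the plugins.index_state_management.rollover_alias index setting) and a numeric suffix in its name, which date-stamped logstash_format indices do not provide on their own, so treat this as a starting point rather than a drop-in fix.

```python
import json


def build_ism_policy(index_pattern, rollover_age="1d", rollover_size="2gb", delete_age="7d"):
    """Return an ISM policy body with three states: hot (rollover) -> cold -> delete."""
    return {
        "policy": {
            "description": (
                f"Roll over at {rollover_age} or {rollover_size}, "
                f"delete once the index is {delete_age} old."
            ),
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # The rollover action fires once any one of its conditions is met.
                    "actions": [
                        {"rollover": {"min_index_age": rollover_age, "min_size": rollover_size}}
                    ],
                    # No conditions here: leave "hot" as soon as the rollover has completed.
                    "transitions": [{"state_name": "cold"}],
                },
                {
                    "name": "cold",
                    # Placeholder action (assumption); swap in whatever "cold" means for you.
                    "actions": [{"read_only": {}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            "ism_template": [{"index_patterns": [index_pattern], "priority": 70}],
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ism_policy("rollover-sbx*"), indent=2))
```

The age in the delete transition is measured from index creation, so "7d" corresponds to the "7 days in total" requirement rather than 7 days after the rollover.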
I've been using the Index Management policy below for a little less than a year with no issues, but a few months ago it apparently stopped working: all indices that should have moved to the "Delete" state are now stuck at "Evaluating transition conditions...". I have searched for possible syntax changes but have not found any. I am also not aware of any updates to either the host machine or Kibana/Elastic. What could be the issue?
{
"policy_id": "delete_14d",
"description": "Deletes old indices after 14 days",
"last_updated_time": 1661536875977,
"schema_version": 1,
"error_notification": null,
"default_state": "hot",
"states": [
{
"name": "hot",
"actions": [
{
"read_write": {}
}
],
"transitions": [
{
"state_name": "Delete",
"conditions": {
"min_index_age": "14d"
}
}
]
},
{
"name": "Delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
],
"ism_template": {
"index_patterns": [
"staging*"
],
"priority": 100,
"last_updated_time": 1632510094716
}
}
As of November 2019, AWS Step Functions has native support for orchestrating EMR clusters, so we are trying to configure a cluster and run some jobs on it.
We could not find any documentation on how to set the SubnetId or the key name used for the EC2 instances in the cluster. Is this possible?
Currently our create-cluster step looks as follows:
"States": {
"Create an EMR cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
"Parameters": {
"Name": "TestCluster",
"VisibleToAllUsers": true,
"ReleaseLabel": "emr-5.26.0",
"Applications": [
{ "Name": "spark" }
],
"ServiceRole": "SomeRole",
"JobFlowRole": "SomeInstanceProfile",
"LogUri": "s3://some-logs-bucket/logs",
"Instances": {
"KeepJobFlowAliveWhenNoSteps": true,
"InstanceFleets": [
{
"Name": "MasterFleet",
"InstanceFleetType": "MASTER",
"TargetOnDemandCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m3.2xlarge"
}
]
},
{
"Name": "CoreFleet",
"InstanceFleetType": "CORE",
"TargetSpotCapacity": 2,
"InstanceTypeConfigs": [
{
"InstanceType": "m3.2xlarge",
"BidPriceAsPercentageOfOnDemandPrice": 100
}
]
}
]
}
},
"ResultPath": "$.cluster",
"End": "true"
}
}
As soon as we try to add a "SubnetId" key to any of the sub-objects in Parameters, or to Parameters itself, we get the error:
Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: The field "SubnetId" is not supported by Step Functions at /States/Create an EMR cluster/Parameters' (Service: AWSStepFunctions; Status Code: 400; Error Code: InvalidDefinition;
According to the Step Functions docs on the EMR integration, createCluster.sync uses the EMR API RunJobFlow. In RunJobFlow you can specify Ec2KeyName and Ec2SubnetId at the paths $.Instances.Ec2KeyName and $.Instances.Ec2SubnetId.
With that said, I managed to create a state machine with the following definition. (On a side note, your definition had a syntax error: "End": "true" should be "End": true.)
{
"Comment": "Creates an EMR cluster with an EC2 key name and subnet ID set under Instances",
"StartAt": "Create an EMR cluster",
"States": {
"Create an EMR cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
"Parameters": {
"Name": "TestCluster",
"VisibleToAllUsers": true,
"ReleaseLabel": "emr-5.26.0",
"Applications": [
{
"Name": "spark"
}
],
"ServiceRole": "SomeRole",
"JobFlowRole": "SomeInstanceProfile",
"LogUri": "s3://some-logs-bucket/logs",
"Instances": {
"Ec2KeyName": "ENTER_EC2KEYNAME_HERE",
"Ec2SubnetId": "ENTER_EC2SUBNETID_HERE",
"KeepJobFlowAliveWhenNoSteps": true,
"InstanceFleets": [
{
"Name": "MasterFleet",
"InstanceFleetType": "MASTER",
"TargetOnDemandCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m3.2xlarge"
}
]
},
{
"Name": "CoreFleet",
"InstanceFleetType": "CORE",
"TargetSpotCapacity": 2,
"InstanceTypeConfigs": [
{
"InstanceType": "m3.2xlarge",
"BidPriceAsPercentageOfOnDemandPrice": 100
}
]
}
]
}
},
"ResultPath": "$.cluster",
"End": true
}
}
}
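If the definition is generated by a script rather than edited by hand, the same placement can be sketched in Python. This is only an illustration: the helper name with_ec2_attributes and the sample key and subnet values are made up, and the only thing taken from the answer above is that both fields live under Parameters.Instances.

```python
import copy
import json


def with_ec2_attributes(definition, state_name, key_name, subnet_id):
    """Return a copy of the definition with Ec2KeyName and Ec2SubnetId set under Parameters.Instances."""
    updated = copy.deepcopy(definition)  # leave the caller's dict untouched
    parameters = updated["States"][state_name]["Parameters"]
    instances = parameters.setdefault("Instances", {})
    instances["Ec2KeyName"] = key_name
    instances["Ec2SubnetId"] = subnet_id
    return updated


if __name__ == "__main__":
    # Trimmed-down definition; the real one carries the full cluster configuration.
    definition = {
        "StartAt": "Create an EMR cluster",
        "States": {
            "Create an EMR cluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "TestCluster",
                    "Instances": {"KeepJobFlowAliveWhenNoSteps": True},
                },
                "End": True,
            }
        },
    }
    patched = with_ec2_attributes(
        definition, "Create an EMR cluster", "my-key-pair", "subnet-00000000"
    )
    print(json.dumps(patched, indent=2))
```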
I am trying to improve the build speed of one of my projects on CodeBuild. The project uses a GitHub source provider, and local source caching is enabled.
The first time I ran the build it took 103 seconds. I ran it again immediately after the first one finished, expecting it to finish in a few seconds thanks to source caching, but it took 60 seconds.
What am I missing here? Is the cache not working? If it is working, why does the second run take that long?
Thanks
Project Details:
{
"projectsNotFound": [],
"projects": [
{
"environment": {
"computeType": "BUILD_GENERAL1_LARGE",
"imagePullCredentialsType": "SERVICE_ROLE",
"privilegedMode": true,
"image": "111669150171.dkr.ecr.us-east-1.amazonaws.com/***********/ep-build-env:latest",
"environmentVariables": [
{
"type": "PLAINTEXT",
"name": "NEXUS_URI",
"value": "http://***************"
},
{
"type": "PLAINTEXT",
"name": "REGISTRY",
"value": "111669150171.dkr.ecr.us-east-1.amazonaws.com/*********"
}
],
"type": "LINUX_CONTAINER"
},
"timeoutInMinutes": 60,
"name": "StorefrontApi",
"serviceRole": "arn:aws:iam::111669150171:role/CodeBuild-ECRReadOnly",
"tags": [],
"artifacts": {
"type": "NO_ARTIFACTS"
},
"lastModified": 1571227097.581,
"cache": {
"type": "LOCAL",
"modes": [
"LOCAL_DOCKER_LAYER_CACHE",
"LOCAL_SOURCE_CACHE",
"LOCAL_CUSTOM_CACHE"
]
},
"vpcConfig": {
"subnets": [
"subnet-fd7f958b"
],
"vpcId": "vpc-71e3f414",
"securityGroupIds": [
"sg-19b65e6c",
"sg-9e28e9f9"
]
},
"created": 1571082681.262,
"sourceVersion": "refs/heads/ep-mysql",
"source": {
"buildspec": "version: 0.2\n\nphases:\n build:\n commands:\n - env\n - cd extensions\n - mvn --settings $CODEBUILD_SRC_DIR_DEVOPS_WINE/pipelines/storefront/build-war/settings.xml --projects storefront/ext-storefront-webapp -am -DskipAllTests clean install\n\nartifacts:\n secondary-artifacts:\n storefront-war:\n base-directory: $CODEBUILD_SRC_DIR/extensions/storefront/ext-storefront-webapp/target\n files:\n - \"*.war\"\n\ncache:\n paths:\n - '/root/.m2/**/*'\n - '/root/.npm/**/*'",
"insecureSsl": false,
"gitSubmodulesConfig": {
"fetchSubmodules": false
},
"location": "https://github.com/*****************.git",
"gitCloneDepth": 1,
"type": "GITHUB",
"reportBuildStatus": false
},
"badge": {
"badgeEnabled": false
},
"queuedTimeoutInMinutes": 480,
"secondaryArtifacts": [],
"logsConfig": {
"s3Logs": {
"status": "DISABLED",
"encryptionDisabled": false
},
"cloudWatchLogs": {
"status": "ENABLED"
}
},
"secondarySources": [
{
"insecureSsl": false,
"gitSubmodulesConfig": {
"fetchSubmodules": false
},
"location": "https://github.com/*****************.git",
"sourceIdentifier": "DEVOPS_WINE",
"gitCloneDepth": 1,
"type": "GITHUB",
"reportBuildStatus": false
}
],
"encryptionKey": "arn:aws:kms:us-east-1:111669150171:alias/aws/s3",
"arn": "arn:aws:codebuild:us-east-1:111669150171:project/StorefrontApi",
"secondarySourceVersions": [
{
"sourceVersion": "refs/heads/staging",
"sourceIdentifier": "DEVOPS_WINE"
}
]
}
]
}
Apparently, at the time of writing, CodeBuild does not use the native git client to fetch the source from GitHub. I understand that the CodeBuild team has an internal feature request to move from whatever they're using to the native git client to improve performance.
Does your repository, by chance, have lots of large files in its history? You can use this answer for a command to analyze your repository.
If you have lots of large files in your history and you're able to remove them, you can use a tool like BFG Repo Cleaner to rewrite history. That should speed up the DOWNLOAD_SOURCE phase.
Also, if you have a dedicated support plan with AWS, you should reach out to your TAM to upvote the feature request to move to native git for GitHub source downloads.
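One way to check for large files in history, as suggested above, is to feed the output of git rev-list --objects --all into git cat-file --batch-check and sort the blobs by size. The Python sketch below wraps that pipeline; it is not necessarily the command from the linked answer, and the function names (parse_blobs, largest_blobs, read_history) are my own.

```python
import subprocess


def parse_blobs(lines):
    """Parse 'type sha size path' lines and keep only blobs as (size, path) pairs."""
    blobs = []
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)  # maxsplit keeps spaces inside paths
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return blobs


def largest_blobs(lines, limit=10):
    """Return the `limit` largest blobs, biggest first."""
    return sorted(parse_blobs(lines), reverse=True)[:limit]


def read_history(repo="."):
    """List every object reachable from any ref, with its type and size in bytes."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return sizes.splitlines()
```

Calling largest_blobs(read_history("path/to/clone")) from inside a full (non-shallow) clone returns the biggest files ever committed, which are the candidates for removal with BFG.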
If I deploy a static website with S3 and API Gateway, is there any way for a Step Functions state machine to wait for some activity and then redirect the user on that static website to another one?
WeCanBeFriends,
This is possible using the Job Status Poller pattern, tweaked slightly. If the "job" is to deploy the website, then the condition for "Complete Job" is to see some activity come in (ideally through CloudWatch metrics).
Once you have seen enough metrics to be comfortable with your deployment, you can either send a push notification to the webapp telling it to redirect (using a Lambda function that calls SNS, as in the wait timer sample) or have the webapp poll the execution status until it is complete.
Below I've posted a very simple variation of the Job Status Poller to illustrate my example:
{
"Comment": "A state machine that publishes to SNS after a deployment completes.",
"StartAt": "StartDeployment",
"States": {
"StartDeployment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:KickOffDeployment",
"ResultPath": "$.guid",
"Next": "CheckIfDeploymentComplete"
},
"CheckIfDeploymentComplete": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:CheckIfDeploymentComplete",
"Next": "TriggerWebAppRefresh",
"InputPath": "$.guid",
"ResultPath": "$.status",
"Retry": [ {
"ErrorEquals": [ "INPROGRESS" ],
"IntervalSeconds": 5,
"MaxAttempts": 240,
"BackoffRate": 1.0
} ],
"Catch": [ {
"ErrorEquals": ["FAILED"],
"Next": "DeploymentFailed"
}]
},
"DeploymentFailed": {
"Type": "Fail",
"Cause": "Deployment failed",
"Error": "Deployment FAILED"
},
"TriggerWebAppRefresh": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:SendSNSToWebapp",
"InputPath": "$.guid",
"End": true
}
}
}
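The Retry and Catch blocks above match on error names, so the CheckIfDeploymentComplete function has to fail with an error called INPROGRESS or FAILED. For a Python Lambda the error name is the exception class name, so a rough sketch of that function could look like the following. The lookup_status stub is hypothetical and stands in for whatever check you use (CloudWatch metrics, for example); the status strings it returns are also made up.

```python
class INPROGRESS(Exception):
    """Raised while the deployment is still running; matches the Retry block."""


class FAILED(Exception):
    """Raised when the deployment failed; matches the Catch block."""


def lookup_status(guid):
    """Hypothetical stub: replace with a real check of the deployment identified by guid."""
    return "SUCCEEDED"


def handler(event, context, lookup=lookup_status):
    # InputPath "$.guid" means the event is just the value stored by StartDeployment.
    status = lookup(event)
    if status == "IN_PROGRESS":
        # Step Functions retries every 5 seconds, up to 240 times.
        raise INPROGRESS(f"deployment {event} is still running")
    if status == "FAILED":
        # Step Functions routes this to the DeploymentFailed state.
        raise FAILED(f"deployment {event} failed")
    # The return value is written to $.status by ResultPath.
    return status
```

With this shape the state machine itself does the waiting, and TriggerWebAppRefresh only runs once the handler returns normally.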
I want to deploy software onto nodes with a DaemonSet, but it is not a Docker app. I created a DaemonSet JSON like this:
"template": {
"metadata": {
"creationTimestamp": null,
"labels": {
"app": "uniagent"
},
"annotations": {
"scheduler.alpha.kubernetes.io/tolerations": "[{\"key\":\"beta.k8s.io/accepted-app\",\"operator\":\"Exists\", \"effect\":\"NoSchedule\"}]"
},
"enable": true
},
"spec": {
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 30,
"dnsPolicy": "ClusterFirst",
"securityContext": {},
"processes": [
{
"name": "foundation",
"package": "xxxxx",
"resources": {
"limits": {
"cpu": "100m",
"memory": "1Gi"
}
},
"lifecyclePlan": {
"kind": "ProcessLifecycle",
"namespace": "engb",
"name": "app-plc"
},
"env": [
{
"name": "SECRET_USERNAME",
"valueFrom": {
"secretKeyRef": {
"name": "key-secret",
"key": "uniagentuser"
}
}
},
{
"name": "SECRET_PASSWORD",
"valueFrom": {
"secretKeyRef": {
"name": "key-secret",
"key": "uniagenthash"
}
}
}
]
},
When the app deploys successfully, the env variables do not exist at all.
What should I do to solve this problem?
Thanks
DaemonSets have to run containers. You can't run non-containerized programs as a DaemonSet, because Kubernetes only launches containers. https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
Also, I see a "processes" key in your manifest. I have reason to believe it is not a valid manifest, so I doubt you deployed it successfully.
You have not pasted the full file, but I'm guessing the "template" key at the beginning is the spec.template key of the DaemonSet.
Run kubectl explain daemonset.spec.template.spec and you'll see that there is no "processes" field.
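For reference, here is a rough sketch of what a standard DaemonSet would look like once the software is packaged as a container image, with the two secret-backed env variables moved under a "containers" entry. It is built as a Python dict and printed as JSON to match your manifest format. The image name is a placeholder I made up, and the custom fields from your manifest ("processes", "package", "lifecyclePlan") are left out because they are not part of the standard pod spec.

```python
import json


def secret_env(var_name, secret_name, secret_key):
    """Build one env entry whose value comes from a key in a Secret."""
    return {
        "name": var_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
    }


def build_daemonset(name, image):
    """Return a minimal apps/v1 DaemonSet manifest with a single container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "foundation",
                            "image": image,  # placeholder image (assumption)
                            "resources": {"limits": {"cpu": "100m", "memory": "1Gi"}},
                            "env": [
                                secret_env("SECRET_USERNAME", "key-secret", "uniagentuser"),
                                secret_env("SECRET_PASSWORD", "key-secret", "uniagenthash"),
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_daemonset("uniagent", "registry.example.com/uniagent:latest"), indent=2))
```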