AWS CodeBuild - cached DOWNLOAD_SOURCE taking too long

I am trying to improve the build speed of one of my projects on CodeBuild. The project uses a GitHub source provider, and local source caching is enabled.
The first time I ran the build it took 103 seconds. I ran it again immediately after the first one finished, expecting it to complete in a few seconds thanks to source caching, but it took 60 seconds.
What am I missing here? Is the cache not working? If it is working, why does the second run still take that long?
Thanks
Project Details:
{
"projectsNotFound": [],
"projects": [
{
"environment": {
"computeType": "BUILD_GENERAL1_LARGE",
"imagePullCredentialsType": "SERVICE_ROLE",
"privilegedMode": true,
"image": "111669150171.dkr.ecr.us-east-1.amazonaws.com/***********/ep-build-env:latest",
"environmentVariables": [
{
"type": "PLAINTEXT",
"name": "NEXUS_URI",
"value": "http://***************"
},
{
"type": "PLAINTEXT",
"name": "REGISTRY",
"value": "111669150171.dkr.ecr.us-east-1.amazonaws.com/*********"
}
],
"type": "LINUX_CONTAINER"
},
"timeoutInMinutes": 60,
"name": "StorefrontApi",
"serviceRole": "arn:aws:iam::111669150171:role/CodeBuild-ECRReadOnly",
"tags": [],
"artifacts": {
"type": "NO_ARTIFACTS"
},
"lastModified": 1571227097.581,
"cache": {
"type": "LOCAL",
"modes": [
"LOCAL_DOCKER_LAYER_CACHE",
"LOCAL_SOURCE_CACHE",
"LOCAL_CUSTOM_CACHE"
]
},
"vpcConfig": {
"subnets": [
"subnet-fd7f958b"
],
"vpcId": "vpc-71e3f414",
"securityGroupIds": [
"sg-19b65e6c",
"sg-9e28e9f9"
]
},
"created": 1571082681.262,
"sourceVersion": "refs/heads/ep-mysql",
"source": {
"buildspec": "version: 0.2\n\nphases:\n build:\n commands:\n - env\n - cd extensions\n - mvn --settings $CODEBUILD_SRC_DIR_DEVOPS_WINE/pipelines/storefront/build-war/settings.xml --projects storefront/ext-storefront-webapp -am -DskipAllTests clean install\n\nartifacts:\n secondary-artifacts:\n storefront-war:\n base-directory: $CODEBUILD_SRC_DIR/extensions/storefront/ext-storefront-webapp/target\n files:\n - \"*.war\"\n\ncache:\n paths:\n - '/root/.m2/**/*'\n - '/root/.npm/**/*'",
"insecureSsl": false,
"gitSubmodulesConfig": {
"fetchSubmodules": false
},
"location": "https://github.com/*****************.git",
"gitCloneDepth": 1,
"type": "GITHUB",
"reportBuildStatus": false
},
"badge": {
"badgeEnabled": false
},
"queuedTimeoutInMinutes": 480,
"secondaryArtifacts": [],
"logsConfig": {
"s3Logs": {
"status": "DISABLED",
"encryptionDisabled": false
},
"cloudWatchLogs": {
"status": "ENABLED"
}
},
"secondarySources": [
{
"insecureSsl": false,
"gitSubmodulesConfig": {
"fetchSubmodules": false
},
"location": "https://github.com/*****************.git",
"sourceIdentifier": "DEVOPS_WINE",
"gitCloneDepth": 1,
"type": "GITHUB",
"reportBuildStatus": false
}
],
"encryptionKey": "arn:aws:kms:us-east-1:111669150171:alias/aws/s3",
"arn": "arn:aws:codebuild:us-east-1:111669150171:project/StorefrontApi",
"secondarySourceVersions": [
{
"sourceVersion": "refs/heads/staging",
"sourceIdentifier": "DEVOPS_WINE"
}
]
}
]
}

Apparently, at the time of writing, CodeBuild does not use the native git client to fetch the source from GitHub. I understand that the CodeBuild team has an internal feature request to move from whatever they're using to the native git client to improve performance.
Does your repository, by chance, have lots of large files in its history? You can use this answer for a command to analyze your repository.
If you have lots of large files in your history and you're able to remove them, you can then use a tool like BFG Repo Cleaner to rewrite history. That should speed up the DOWNLOAD_SOURCE phase.
Also, if you have a dedicated support plan with AWS, you should reach out to your TAM to upvote the feature request to move to native git for GitHub source downloads.

Related

semantic-release in a GitLab pipeline with multiple users

I'm running a semantic-release job in a GitLab pipeline; it works great, but only for my user (I configured it). No one else seems able to trigger a release, even if I merge their code. There are no errors, and everything seems to run smoothly. I'm assuming there's some kind of authentication issue and/or everyone needs their own token or something like that? (I've only configured a token via my account, and I'm not sure how I'd instruct someone to do that for multiple accounts in GitLab.)
The pipeline looks like this:
variables:
  GL_TOKEN: $GL_TOKEN
stages:
  - release
publish:
  image: node:lts-alpine
  stage: release
  before_script:
    - apk update
    - apk add zip unzip git
    - npm ci
  script:
    - npm run build
    - npx semantic-release
  only:
    refs:
      - main
and the config (in the package.json) is:
"release": {
"branches": [
"main"
],
"plugins": [
"#semantic-release/commit-analyzer",
"#semantic-release/release-notes-generator",
[
"#google/semantic-release-replace-plugin",
{
"replacements": [
{
"files": [
"style.css"
],
"from": "Version: .*",
"to": "Version: ${nextRelease.version}",
"results": [
{
"file": "style.css",
"hasChanged": true,
"numMatches": 1,
"numReplacements": 1
}
],
"countMatches": true
},
{
"files": [
"package.json"
],
"from": "\"version\": \".*\",",
"to": "\"version\": \"${nextRelease.version}\",",
"results": [
{
"file": "package.json",
"hasChanged": true,
"numMatches": 1,
"numReplacements": 1
}
],
"countMatches": true
}
]
}
],
[
"#semantic-release/git",
{
"assets": [
"style.css",
"package.json"
],
"message": "chore(release): ${nextRelease.version} [skip ci]"
}
],
[
"#semantic-release/exec",
{
"prepareCmd": "node bin/makezip.js"
}
],
[
"#semantic-release/gitlab",
{
"assets": [
{
"path": "file.zip",
"label": "compiled release"
}
]
}
]
]
}
I would suggest creating a token per project and using that instead of your personal token.
You can create a project access token under the project's Settings > Access Tokens.
This should solve your issue.

REST dataset for Copy Activity Source gives me error Invalid PaginationRule

My Copy Activity is set up to use a REST GET API call as my source. I keep getting Error Code 2200, Invalid PaginationRule RuleKey=supportRFC5988.
I can call the GET REST URL using the Web Activity, but this isn't optimal, as I then have to pass the output to a stored procedure to load the data into the table. I would much rather use the Copy Activity.
Any ideas why I would get an Invalid PaginationRule error on a call?
I'm using a REST Linked Service with the following properties:
Name: Workday
Connect via integration runtime: link-unknown-self-hosted-ir
Base URL: https://wd2-impl-services1.workday.com/ccx/service
Authentication type: Basic
User name: Not telling
Azure Key Vault for password
Server Certificate Validation is enabled
Parameters: Name: format, Type: String, Default value: json
Datasource:
"name": "Workday_Test_REST_Report",
"properties": {
"linkedServiceName": {
"referenceName": "Workday",
"type": "LinkedServiceReference",
"parameters": {
"format": "json"
}
},
"folder": {
"name": "Workday"
},
"annotations": [],
"type": "RestResource",
"typeProperties": {
"relativeUrl": "/customreport2/company1/person%40company.com/HIDDEN_BI_RaaS_Test_Outbound"
},
"schema": []
}
}
Copy Activity
{
"name": "Copy Test Workday REST API output to a table",
"properties": {
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "RestSource",
"httpRequestTimeout": "00:01:40",
"requestInterval": "00.00:00:00.010",
"requestMethod": "GET",
"paginationRules": {
"supportRFC5988": "true"
}
},
"sink": {
"type": "SqlMISink",
"tableOption": "autoCreate"
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "Workday_Test_REST_Report",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "Destination_db",
"type": "DatasetReference",
"parameters": {
"schema": "ELT",
"tableName": "WorkdayTestReportData"
}
}
]
}
],
"folder": {
"name": "Workday"
},
"annotations": []
}
}
Well, after posting this, I noticed that the Copy Activity code contains a nugget about "supportRFC5988": "true". I switched the true to false, and everything just worked for me. I don't see a way to change this in the Copy Activity GUI.
Editing the source code and setting this option to false helped!
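For reference, a rough sketch of what the edited source block in the Copy Activity JSON above ends up looking like; only the supportRFC5988 value differs from the original:
"source": {
  "type": "RestSource",
  "httpRequestTimeout": "00:01:40",
  "requestInterval": "00.00:00:00.010",
  "requestMethod": "GET",
  "paginationRules": {
    "supportRFC5988": "false"
  }
}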

Azure Data Factory v2 If activity always fails

I'm currently struggling with the Azure Data Factory v2 If activity, which always fails with the error message quoted below.
I've designed two separate pipelines: one takes a full snapshot of the data (1333 records) from the on-premises SQL Server and loads it into the Azure SQL Database, and the other just takes the delta from the same source.
Both pipelines work fine when executed independently.
I then decided to wrap these two pipelines in one parent pipeline that would do the following:
1. Execute a Lookup activity to check whether the target table in the Azure SQL Database has any records, a basic Select Count(Request_ID) As record_count From target_table - the activity works fine and I can preview the returned record count.
2. Pass the output of the Lookup activity to the If activity, with the condition that if record_count = 0 the parent pipeline invokes the full load pipeline, otherwise it invokes the delta load pipeline.
This is the actual expression:
{@activity('lookup_sites_record_count').output.firstRow.record_count}==0
Whenever I try to execute this parent pipeline, it fails with the message "Activity failed: Activity failed because an inner activity failed."
Both inner activities, that is, full load and delta load pipelines, work just fine when triggered independently.
What am I missing?
Many thanks in advance :).
mikhailg
Pipeline's JSON definition below:
{
"name": "pl_remedyreports_load_rs_sites",
"properties": {
"activities": [
{
"name": "lookup_sites_record_count",
"type": "Lookup",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false
},
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "Select Count(Request_ID) As record_count From mdp.RS_Sites;"
},
"dataset": {
"referenceName": "ds_azure_sql_db_sites",
"type": "DatasetReference"
}
}
},
{
"name": "If_check_site_record_count",
"type": "IfCondition",
"dependsOn": [
{
"activity": "lookup_sites_record_count",
"dependencyConditions": [
"Succeeded"
]
}
],
"typeProperties": {
"expression": {
"value": "{#activity('lookup_sites_record_count').output.firstRow.record_count}==0",
"type": "Expression"
},
"ifFalseActivities": [
{
"name": "pl_remedyreports_invoke_load_sites_inc",
"type": "ExecutePipeline",
"typeProperties": {
"pipeline": {
"referenceName": "pl_remedyreports_load_sites_inc",
"type": "PipelineReference"
}
}
}
],
"ifTrueActivities": [
{
"name": "pl_remedyreports_invoke_load_sites_full",
"type": "ExecutePipeline",
"typeProperties": {
"pipeline": {
"referenceName": "pl_remedyreports_load_sites_full",
"type": "PipelineReference"
}
}
}
]
}
}
],
"folder": {
"name": "Load Remedy Reference Data"
}
}
}
Your expression should be:
@equals(activity('lookup_sites_record_count').output.firstRow.record_count,0)
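Applied to the pipeline's JSON definition above, the If activity's expression block would then look roughly like this sketch; only the value string changes:
"expression": {
  "value": "@equals(activity('lookup_sites_record_count').output.firstRow.record_count, 0)",
  "type": "Expression"
}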

Can step functions wait on a static website?

If I deploy a static website with S3 and API Gateway, is there any way for a Step Function to wait for some activity, then redirect the user on that static website to another page?
WeCanBeFriends,
This is possible using the Job Status Poller pattern, but tweaked slightly. If the "job" is to deploy the website, then the condition to "complete the job" is to see some activity come in (ideally through CloudWatch metrics).
Once you see enough metrics to be comfortable with your deployment, you can either push a notification to the web app to tell it to redirect (using a Lambda function that calls SNS, as in the wait timer sample) or have the web app poll the execution status until it's complete.
Below I've posted a very simple variation to the Job Status Poller to illustrate my example:
{
"Comment": "A state machine that publishes to SNS after a deployment completes.",
"StartAt": "StartDeployment",
"States": {
"StartDeployment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:KickOffDeployment",
"ResultPath": "$.guid",
"Next": "CheckIfDeploymentComplete"
},
"CheckIfDeploymentComplete": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:CheckIfDeploymentComplete",
"Next": "TriggerWebAppRefresh",
"InputPath": "$.guid",
"ResultPath": "$.status",
"Retry": [ {
"ErrorEquals": [ "INPROGRESS" ],
"IntervalSeconds": 5,
"MaxAttempts": 240,
"BackoffRate": 1.0
} ],
"Catch": [ {
"ErrorEquals": ["FAILED"],
"Next": "DeploymentFailed"
}]
},
"DeploymentFailed": {
"Type": "Fail",
"Cause": "Deployment failed",
"Error": "Deployment FAILED"
},
"TriggerWebAppRefresh": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:012345678912:function:SendSNSToWebapp",
"InputPath": "$.guid",
"End": true
}
}
}

Create env fails when using a daemonset to create processes in Kubernetes

I want to deploy software to nodes with a DaemonSet, but it is not a Docker app. I created a DaemonSet JSON like this:
"template": {
"metadata": {
"creationTimestamp": null,
"labels": {
"app": "uniagent"
},
"annotations": {
"scheduler.alpha.kubernetes.io/tolerations": "[{\"key\":\"beta.k8s.io/accepted-app\",\"operator\":\"Exists\", \"effect\":\"NoSchedule\"}]"
},
"enable": true
},
"spec": {
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 30,
"dnsPolicy": "ClusterFirst",
"securityContext": {},
"processes": [
{
"name": "foundation",
"package": "xxxxx",
"resources": {
"limits": {
"cpu": "100m",
"memory": "1Gi"
}
},
"lifecyclePlan": {
"kind": "ProcessLifecycle",
"namespace": "engb",
"name": "app-plc"
},
"env": [
{
"name": "SECRET_USERNAME",
"valueFrom": {
"secretKeyRef": {
"name": "key-secret",
"key": "uniagentuser"
}
}
},
{
"name": "SECRET_PASSWORD",
"valueFrom": {
"secretKeyRef": {
"name": "key-secret",
"key": "uniagenthash"
}
}
}
]
},
When the app deploys successfully, the env variables do not exist at all.
What should I do to solve this problem?
Thanks
DaemonSets have to run Docker containers. You can't run non-containerized programs as a DaemonSet. https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ Kubernetes only launches containers.
Also, in your manifest I see a "processes" key, and I have reason to believe that makes it an invalid manifest, so I doubt you deployed it successfully.
You have not pasted the full manifest, but I'm guessing the "template" key at the beginning is the spec.template key of the file.
Run kubectl explain daemonset.spec.template.spec and you'll see that there is no "processes" field.
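For comparison, here is a minimal sketch of what a valid spec.template could look like once the agent is packaged as a container. The image name is a placeholder, and the labels, resources, and secret references are carried over from the snippet in the question:
"template": {
  "metadata": {
    "labels": {
      "app": "uniagent"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "uniagent",
        "image": "<your-agent-image>",
        "resources": {
          "limits": {
            "cpu": "100m",
            "memory": "1Gi"
          }
        },
        "env": [
          {
            "name": "SECRET_USERNAME",
            "valueFrom": {
              "secretKeyRef": {
                "name": "key-secret",
                "key": "uniagentuser"
              }
            }
          },
          {
            "name": "SECRET_PASSWORD",
            "valueFrom": {
              "secretKeyRef": {
                "name": "key-secret",
                "key": "uniagenthash"
              }
            }
          }
        ]
      }
    ],
    "restartPolicy": "Always",
    "terminationGracePeriodSeconds": 30
  }
}
The env entries keep the secretKeyRef form from the question; the key difference is that they live under an entry in spec.containers rather than under a non-standard "processes" key.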