I'm using GitLab CI 8.0 with gitlab-ci-multi-runner 0.6.0. I have a .gitlab-ci.yml file similar to the following:
before_script:
  - npm install

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js
This works, but it means the dependencies are installed independently before each test job. For a large project with many dependencies this adds considerable overhead.
In Jenkins I would use one job to install the dependencies, tar them up into a build artifact, and copy that to downstream jobs. Would something similar work with GitLab CI? Is there a recommended approach?
Update: I now recommend using artifacts with a short expire_in. This is superior to cache because an artifact only has to be written once per pipeline, whereas the cache is updated after every job. The cache is also per runner, so if your jobs run in parallel on multiple runners it's not guaranteed to be populated, unlike artifacts, which are stored centrally.
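A minimal sketch of that approach (the job name and the exact expiry are placeholders):

install:
  stage: install
  script:
    - npm install
  artifacts:
    # short-lived: only needed by later jobs in this pipeline
    expire_in: 30 minutes
    paths:
      - node_modules/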
GitLab CI 8.2 adds runner caching, which lets you reuse files between builds. However, I've found this to be very slow.
Instead, I've implemented my own caching system using a bit of shell scripting:
before_script:
  # unique hash of required dependencies
  - PACKAGE_HASH=($(md5sum package.json))
  # path to cache file
  - DEPS_CACHE=/tmp/dependencies_${PACKAGE_HASH}.tar.gz
  # Check if cache file exists and if not, create it
  - if [ -f $DEPS_CACHE ];
    then
      tar zxf $DEPS_CACHE;
    else
      npm install --quiet;
      tar zcf - ./node_modules > $DEPS_CACHE;
    fi
This runs before every job in your .gitlab-ci.yml and only installs your dependencies if package.json has changed or the cache file is missing (e.g. on the first run, or after the file was manually deleted). Note that if you have several runners on different servers, each will have its own cache file.
You may want to clear out the cache file on a regular basis in order to get the latest dependencies. We do this with the following cron entry:
@daily find /tmp/dependencies_* -mtime +1 -type f -delete
EDIT: This solution was recommended in 2016. In 2021, you might consider the caching docs instead.
A better approach these days is to make use of artifacts.
In the following example, the node_modules/ directory is immediately available to the lint job once the build stage has completed successfully.
build:
  stage: build
  script:
    - npm install -q
    - npm run build
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 week

lint:
  stage: test
  script:
    - npm run lint
From the docs:
cache: Use for temporary storage for project dependencies. Not useful for keeping intermediate build results, like jar or apk files. Cache was designed to be used to speed up invocations of subsequent runs of a given job, by keeping things like dependencies (e.g., npm packages, Go vendor packages, etc.) so they don’t have to be re-fetched from the public internet. While the cache can be abused to pass intermediate build results between stages, there may be cases where artifacts are a better fit.
artifacts: Use for stage results that will be passed between stages. Artifacts were designed to upload some compiled/generated bits of the build, and they can be fetched by any number of concurrent Runners. They are guaranteed to be available and are there to pass data between jobs. They are also exposed to be downloaded from the UI. Artifacts can only exist in directories relative to the build directory and specifying paths which don’t comply to this rule trigger an unintuitive and illogical error message (an enhancement is discussed at https://gitlab.com/gitlab-org/gitlab-ce/issues/15530 ). Artifacts need to be uploaded to the GitLab instance (not only the GitLab runner) before the next stage job(s) can start, so you need to evaluate carefully whether your bandwidth allows you to profit from parallelization with stages and shared artifacts before investing time in changes to the setup.
So, I use cache. When I don't need to update the cache (e.g. the build folder in a test job), I use policy: pull (see here).
I prefer to use cache because it removes the files when the pipeline has finished.
Example
image: node

stages:
  - install
  - test
  - compile

cache:
  key: modules
  paths:
    - node_modules/

install:modules:
  stage: install
  cache:
    key: modules
    paths:
      - node_modules/
  after_script:
    - node -v && npm -v
  script:
    - npm i

test:
  stage: test
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  before_script:
    - node -v && npm -v
  script:
    - npm run test

compile:
  stage: compile
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  script:
    - npm run build
I think it's not recommended, because all jobs of the same stage can be executed in parallel.
First, all jobs of build are executed in parallel.
If all jobs of build succeed, the test jobs are executed in parallel.
If all jobs of test succeed, the deploy jobs are executed in parallel.
If all jobs of deploy succeed, the commit is marked as success.
If any of the previous jobs fails, the commit is marked as failed and no jobs of further stages are executed.
I have read that here:
http://doc.gitlab.com/ci/yaml/README.html
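For illustration, a minimal sketch (hypothetical job names) of how jobs in the same stage run in parallel while stages run sequentially:

stages:
  - build
  - test

# build_a and build_b run in parallel;
# test_a starts only once both have succeeded
build_a:
  stage: build
  script:
    - npm install

build_b:
  stage: build
  script:
    - npm install

test_a:
  stage: test
  script:
    - npm test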
I solved the problem with a symbolic link to a folder outside the working directory. The solution looks like this:
# .gitlab-ci.yml
before_script:
  - New-Item -ItemType SymbolicLink -Path ".\node_modules" -Target "C:\GitLab-Runner\cache\node_modules"
  - yarn
after_script:
  - (Get-Item ".\node_modules").Delete()
I know this is a rather dirty solution, but it saves a lot of build time and extends the storage life.
GitLab introduced caching to avoid redownloading dependencies for each job.
The following Node.js example is inspired by the caching documentation.
image: node:latest

# Cache modules in between jobs
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js
Note that the example uses npm ci. This command is like npm install, but designed for automated environments. You can read more about npm ci, and the command-line arguments you can pass, in the documentation.
For further information, check Caching in GitLab CI/CD and the cache keyword reference.
So, I am in the process of integrating Turborepo into our Node.js (React, Next, Node) monorepo, which uses GitLab CI. The thing is, the example in the docs falls short of what I want.
For reference, here is what they have in their docs:
image: node:latest

# To use Remote Caching, uncomment the next lines and follow the steps below.
# variables:
#   TURBO_TOKEN: $TURBO_TOKEN
#   TURBO_TEAM: $TURBO_TEAM

stages:
  - build

build:
  stage: build
  script:
    - npm install
    - npm run build
    - npm run test
We have a few stages beyond the build stage in their example:
install
build
package
What I would ideally like is to use Turborepo and GitLab downstream pipelines as follows:
The install stage should run when the root package.json has changed.
The build and package stages should run just for the affected packages (i.e., if shared-lib is changed, then shared-lib should run, as well as its two consumers app-a and app-b, in parallel).
I read the docs, and I can somehow make the downstream job run, but not just for the affected packages; instead it runs for all of them. The main problem is how to read the affected packages and their consumers, and run only those.
I read that with the latest version I can use the --dry commands to read those. But even assuming that works reliably (which, from my testing, it doesn't), how can I run those packages as downstream jobs in GitLab?
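One possible direction, sketched under assumptions (that turbo's --dry=json output lists affected packages under "packages", and that jq is available in the image; file, stage, and job names are hypothetical), is to generate a dynamic child pipeline from the dry run:

generate-pipeline:
  stage: build
  script:
    # packages changed since the base branch, plus their dependents
    - turbo run build --filter="...[origin/main]" --dry=json > affected.json
    # emit one child job per affected package
    - |
      jq -r '.packages[]' affected.json | while read pkg; do
        printf 'build:%s:\n  script:\n    - turbo run build --filter=%s\n' "$pkg" "$pkg" >> child-pipeline.yml
      done
  artifacts:
    paths:
      - child-pipeline.yml

run-affected:
  stage: package
  trigger:
    include:
      - artifact: child-pipeline.yml
        job: generate-pipeline
    strategy: depend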
The internet is full of complaints about GitLab not caching, but in my case I think GitLab CI does cache correctly. The thing is, npm seems to install everything again anyway.
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - vendor/
    - bootstrap/
    - node_modules/

build-dependencies:
  image: ...
  stage: build
  script:
    - cp .env.gitlab-testing .env
    - composer install --no-progress --no-interaction
    - php artisan key:generate
    - npm install
    - npm run prod
  artifacts:
    paths:
      - vendor/
      - bootstrap/
      - node_modules/
      - .env
      - public/mix-manifest.json
  tags:
    - docker
This is my gitlab-ci.yml file (well, the relevant part). While the cached composer dependencies are used, the node_modules aren't. Out of desperation I even added everything to both cache and artifacts.
Updated Answer (Jul 30, 2022, GitLab >= 13.12)
Just like the comments received say, the use of artifacts in the original answer is not ideal, but it was the cleanest approach that worked reliably. Now that GitLab's documentation on the use of cache has been updated, and cache has been expanded to support multiple cache keys per job (4 maximum, unfortunately), there is a better way to handle node_modules across a pipeline.
The rationale for the implementation is based on understanding the quirks of both GitLab and npm. These are the fundamentals:
NPM recommends the use of npm ci instead of npm install when running in a CI/CD environment. FYI, this will require the existence of package-lock.json, which is used to ensure 0 packages are automatically version bumped while running in a CI environment (npm i by default will not create the same deterministic build every time, such as on a job re-run).
npm ci deliberately removes the entirety of node_modules first before re-installing all packages listed in package-lock.json. Therefore, it is best to only configure GitLab to run npm ci once and ensure the resulting node_modules is passed to other jobs.
npm has its own cache that it stores at ~/.npm/ for offline builds and overall speed. You can specify a different cache location with the --cache <dir> option (you will need this). (A variation of @Amityo's answer.)
GitLab cannot cache any directory outside of the repository! This means the default cache directory ~/.npm cannot be cached.
GitLab's global cache configuration is applied to every job by default. Jobs need to explicitly override the cache config if they don't need the globally cached files. Using YAML anchors, the global cache config can be copied and modified, but this doesn't seem to work if you want to override a global setting when the cache uses a list (resolution still under investigation).
To run additional npx or npm run <script> without re-running an install, you should cache node_modules/ folder(s) across the pipeline.
GitLab's expectation is for users to use the cache feature to handle dependencies, and to use artifacts only for dynamically generated build results. This answer now supports that intent better than was possible before. There is the restriction that artifacts should be less than the maximum artifact size, or 1GB (compressed) on GitLab.com. Artifacts also count toward your storage usage quota.
The use of the needs or dependencies directives influences whether artifacts from a previous job are downloaded (or deleted) automatically in the next job.
GitLab caches can use the hash value of a file as the key, so it's possible to have the cache update only when package-lock.json changes. You could use package.json instead, but you would invalidate your deterministic builds, since it does not change when minor or patch versions become available.
If you have a mono-repo with more than 2 separate packages, you will hit the cache entry limit of 4 during the install job. You will not have the ideal setup, but you can combine some cache definitions. It is also worth noting that GitLab's cache.key.files supports a maximum of 2 files for the key hash, so you will likely need another method to determine a useful key. One likely solution is to use a non-file-related key and cache all node_modules/ folders under that key. That way you have only 2 cache entries for the install job and 1 for each subsequent job.
Solution
Run a single install job in the .pre stage, using cached downloaded packages (tar.gz's) across the entire repository.
Cache all node_modules/ folders for the following jobs in that pipeline execution. Do not allow any job except install to upload the cache (this lowers pipeline run time and prevents unintended consequences).
Pass the build/ directory on to other jobs via artifacts only when needed.
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

# global cache settings for all jobs
# Ensure compatibility with the install job
# goal: the install job loads the cache and
#       all other jobs can only use it
cache:
  # most npm libraries will only have 1 entry for the base project deps
  - key: &global_cache_node_mods
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull # prevent subsequent jobs from modifying cache
  # # ATTN mono-repo users: with only additional node_modules,
  # # add up to 2 additional cache entries.
  # # See limitations in #10.
  # - key:
  #     files:
  #       - core/pkg1/package-lock.json
  #   paths:
  #     - core/pkg1/node_modules/
  #   policy: pull # prevent jobs from modifying cache

install:
  image: ...
  stage: .pre # always first, no matter if it is listed in stages
  cache:
    # store npm cache for all branches (stores download pkg.tar.gz's)
    # will not be necessary for any other job
    - key: ${CI_JOB_NAME}
      # must be inside $CI_PROJECT_DIR for gitlab-runner caching (#3)
      paths:
        - .npm/
      when: on_success
      policy: pull-push
    # Mimic &global_cache_node_mods config but override policy
    # to allow this job to update the cache at the end of the job
    # and only update if it was a successful job
    # NOTE: I would use yaml anchors here but overriding the policy
    # in a yaml list is not as easy as a dictionary entry (#5)
    - key:
        files:
          - package-lock.json
      paths:
        - node_modules/
      when: on_success
      policy: pull-push
    # # ATTN Monorepo Users: add additional key entries from
    # # the global cache and override the policy as above but
    # # realize the limitations (read #10).
    # - key:
    #     files:
    #       - core/pkg1/package-lock.json
    #   paths:
    #     - core/client/node_modules/
    #   when: on_success
    #   policy: pull-push
  # before_script:
  #   - ...
  script:
    # define cache dir & use it npm!
    - npm ci --cache .npm --prefer-offline
    # # monorepo users: run secondary install actions
    # - npx lerna bootstrap -- --cache .npm/ --prefer-offline

build:
  stage: build
  # global cache settings are inherited to grab `node_modules`
  script:
    - npm run build
  artifacts:
    paths:
      - dist/ # wherever your build results are stored

test:
  stage: test
  # global cache settings are inherited to grab `node_modules`
  needs:
    # install job is not "needed" unless it creates artifacts
    # install job also occurs in the previous stage `.pre` so it
    # is implicitly required since `when: on_success` is the default
    # for subsequent jobs in subsequent stages
    - job: build
      artifacts: true # grabs built files
  # dependencies: could also be used instead of needs
  script:
    - npm test

deploy:
  stage: deploy
  when: on_success # only if previous stages' jobs all succeeded
  # override inherited cache settings since node_modules is not needed
  cache: {}
  needs:
    - job: build
      artifacts: true # grabs dist/
  script:
    - npm publish
GitLab's recommendation for npm can be found in the GitLab Docs.
[DEPRECATED] Original Answer (Oct 27, 2021, GitLab<13.12)
All the answers I see so far give only half answers, and don't actually fully accomplish the task of caching, IMO.
In order to fully cache with npm & GitLab, you must be aware of the following:
See #1 above
npm ci deliberately removes the entirety of node_modules first before re-installing all packages listed in package-lock.json. Therefore, configuring GitLab to cache the node_modules directory between build jobs is useless. The point is to ensure no preparation hooks or anything else modified node_modules from a previous run. IMO, this is not really valid for a CI environment but you can't change it and maintain the fully deterministic builds.
See #3-#4 above
If you have multiple stages, global cache will be downloaded every job. This likely is not what you want!
To run additional npx commands without re-running an install, you should pass the node_modules/ folder as an artifact to other jobs.
[DEPRECATED] Solution
Run a single install job as .pre stage, using cached downloaded packages (tar.gz's) across entire repository.
Pass node_modules & the build directory on to other jobs only when needed
stages:
  - build
  - test
  - deploy

install:
  image: ...
  stage: .pre # always first, no matter if it is listed in stages
  cache:
    key: NPM_DOWNLOAD_CACHE # a single-key-4-all-branches for install jobs
    paths:
      - .npm/
  before_script:
    - cp .env.gitlab-testing .env
    - composer install --no-progress --no-interaction
    - php artisan key:generate
  script:
    # define cache dir & use it npm!
    - npm ci --cache .npm --prefer-offline
  artifacts:
    paths:
      - vendor/
      - bootstrap/
      - node_modules/
      - .env
      - public/mix-manifest.json

build:
  stage: build
  needs:
    - job: install
      artifacts: true # true by default, grabs `node_modules`
  script:
    - npm run build
  artifacts:
    paths:
      - dist/ # wherever your build results are stored

test:
  stage: test
  needs:
    - job: install
      artifacts: true # grabs node_modules
    - job: build
      artifacts: true # grabs built files
  script:
    - npm test

deploy:
  stage: deploy
  needs:
    # does not need node_modules so don't state install as a need
    - job: build
      artifacts: true # grabs dist/
    - job: test # must succeed
      artifacts: false # not needed
  script:
    - npm publish
Actually it should work: your cache is set globally, and your key refers to the current branch (${CI_COMMIT_REF_SLUG})...
This is my build, and it seems to cache node_modules between the stages.
image: node:latest

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .next/

stages:
  - install
  - test
  - build
  - deploy

install_dependencies:
  stage: install
  script:
    - npm install

test:
  stage: test
  script:
    - npm run test

build:
  stage: build
  script:
    - npm run build
I had the same issue. For me the problem was down to the cache settings: by default the cache does not keep untracked git files, and since we don't store node_modules in git, the npm files were not cached at all.
So all I had to do was insert one line, untracked: true, like below:
cache:
  untracked: true
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - vendor/
    - bootstrap/
    - node_modules/
Now npm is faster, although it still needs to check whether things have changed, which for me still takes a couple of minutes. So I'm considering a dedicated job to do the npm install (see the sketch below), but this has already sped things up a lot.
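A minimal sketch of what that dedicated job could look like (stage and job names are assumptions):

stages:
  - install
  - test

cache:
  untracked: true
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/

# only this job pays the cost of npm install;
# later jobs reuse the cached node_modules
install_dependencies:
  stage: install
  script:
    - npm install

test:
  stage: test
  script:
    - npm test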
The default cache path is ~/.npm
To set the npm cache directory:
npm config set cache <path> --global
see here for more information
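For example (the directory is a placeholder):

# point npm's download cache at a custom location
npm config set cache /builds/npm-cache --global
# verify the setting
npm config get cache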
I just recently started using GitLab CI to automate some build/deploy steps. It works perfectly for building Docker images etc., but I was wondering: is it possible to create a folder in the repository during a build step? For example, I'm now making an npm utility package, which I import in my other projects via a private GitLab repo (using a deploy token), but the code of the util package is written in ES6 and needs to be transpiled to CommonJS to be used in the other packages. Manually I can run npm run build and it will output a dist folder with the transpiled code.
I was trying (and researching) to find out whether it's possible to automate this build process using .gitlab-ci, but so far I couldn't find anything.
Does anyone know how I can achieve this and/or whether it is possible?
Thanks in advance!
Not sure if I got your question correctly, so add more details if not.
When your CI build creates new folders or files, they are written to the task runner's file system (no surprise here, I assume).
If you want to access these files from GitLab's web UI, you can define them as artifacts in your build job (see https://docs.gitlab.com/ee/user/project/pipelines/job_artifacts.html).
Your build job would look something like this (pseudo-code written from memory, not tested on GitLab):
build:
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
UPDATE: If you want to upload the build artifact to an npm registry, you could just build and publish together.
build:
  script:
    - npm run build
    - npm publish <PARAMETERS>
I have configured a private agent in VSTS and have installed npm there globally. When I install npm packages through my build task, they are still installed from scratch for every build, which takes an awful lot of time: approximately 12 minutes.
How can I cache the npm installations so that the build time is reduced?
We use npm-cache. npm-cache is a node module that calculates a hash of your package.json file; for every hash it creates a zip folder on your build server with the contents of node_modules. npm install is then reduced to extracting a zip on every build (only, of course, if you didn't actually change package.json).
The idea is: the first time, the tool downloads the npm packages and saves them locally; the second time, if package.json has not changed, it takes the packages from the local disk and copies them to the build agent folder. Only if package.json has changed does it download the packages from the internet.
Install npm-cache on the build machine:
npm install npm-cache -g
In the build definition, add a Command Line task (Tool: C:\Windows\User\AppData\Roaming\npm\npm-cache, or just npm-cache if you add the tool to the PATH environment variable; Arguments: install npm; Working folder: $(Build.SourcesDirectory), or wherever package.json is located).
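In other words, the task ends up running something like this from the folder containing package.json (a sketch of npm-cache's behavior as described above):

REM hashes package.json; on a cache hit it extracts the stored
REM node_modules zip, otherwise it runs npm install and saves a new zip
npm-cache install npm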
MS has finally implemented this feature (currently in beta): https://learn.microsoft.com/en-us/azure/devops/pipelines/caching/index?view=azure-devops#nodejsnpm
From there:
variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
  - task: CacheBeta@0
    inputs:
      key: $(Build.SourcesDirectory)/package-lock.json
      path: $(npm_config_cache)
    displayName: Cache npm
  - script: npm ci
Unfortunately we cannot cache the npm installations, as there is no such built-in feature for now.
However, there is already a UserVoice suggestion for this feature: Improve hosted build agent performance with build caches, and it seems the VSTS team is actively working on it...
For now, you can try Speed Up NPM INSTALL on Visual Studio Team Services.
Use the Cache task
Caching is added to a pipeline using the Cache pipeline task. This task works like any other task and is added to the steps section of a job.
With the following configuration:
pool:
  name: Azure Pipelines
steps:
  - task: Cache@2
    inputs:
      key: 'YOUR_WEB_DIR/package.json'
      path: 'YOUR_WEB_DIR/node_modules/'
  - task: Npm@1
    inputs:
      command: 'install'
      workingDir: 'YOUR_WEB_DIR/frontend'
You can use the key YOUR_WEB_DIR/package-lock.json too, but be aware that this file might be changed by a later step such as npm install, in which case the hash will also change.
Does the build have to run on the drone.io server? Can I run the build locally? Since developers need to pass the build before pushing code to GitHub, I am looking for a way to run the build on a developer's local machine. Below is my .drone.yml file:
pipeline:
  build:
    image: node:latest
    commands:
      - npm install
      - npm test
      - npm run eslint
  integration:
    image: mongo-test
    commands:
      - mvn test
It includes two Docker containers. How do I run the build against this file in drone? I looked at the drone CLI, but it doesn't work the way I expected.
@BradRydzewski's comment is the right answer.
To run builds locally you use drone exec. You can check the docs.
Extending on his answer: you must execute the command in the root of your local repo, exactly where your .drone.yml file is. If your build relies on secrets, you need to feed them through the command line using the --secret or --secrets-file option, as shown below.
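For example (a sketch; the secrets file name is a placeholder):

# from the root of the repo, next to .drone.yml
drone exec

# if the build relies on secrets
drone exec --secrets-file secrets.txt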
When running a local build there is no cloning step; Drone uses your local git workspace and mounts it in the step containers. So if you check out some other commit/branch during the execution of the local build, you will mess things up, because Drone will see those changes. Don't update your local repo while the build is running.