mercure.rocks database size grows indefinitely - mercure

I have a problem with Mercure: its database grows indefinitely.
I see there is a size setting, but I don't see how to apply it in docker-compose.
What size should I put? I don't want any history, or only a really small one.
mercury:
  image: dunglas/mercure
  restart: always
  volumes:
    - caddy_data:/data
    - caddy_config:/config
  environment:
    - HEARTBEAT_INTERVAL=30s
    - SERVER_NAME=:80
    - MERCURE_EXTRA_DIRECTIVES=cors_origins *
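For reference, the size of the history Mercure keeps in its Bolt database is capped through the transport DSN rather than a standalone size option, so it can be set from MERCURE_EXTRA_DIRECTIVES. A minimal sketch, assuming the default Bolt transport and the caddy_data volume above; the size=10 and cleanup_frequency=0.5 values are only examples, and depending on the Mercure version the directive is transport_url or transport:

mercury:
  image: dunglas/mercure
  restart: always
  volumes:
    - caddy_data:/data
    - caddy_config:/config
  environment:
    HEARTBEAT_INTERVAL: 30s
    SERVER_NAME: ':80'
    # size caps how many updates the Bolt history keeps;
    # cleanup_frequency is the probability that a publish triggers a cleanup pass
    MERCURE_EXTRA_DIRECTIVES: |
      cors_origins *
      transport_url "bolt:///data/mercure.db?size=10&cleanup_frequency=0.5"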

Related

In a GitLab CI pipeline, how can I conditionally run jobs in parallel?

Say, for example, my pipeline contains the following job:
sast-container:
  <<: *branches
  allow_failure: true
  parallel:
    matrix:
      - CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/address-check-stub
      - CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/manual-service-stub
      - CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/employment-record-stub
and I want each of the jobs in the matrix to run if and only if there has been a change to the code that affects it.
I was thinking along the lines of something like this:
sast-container:
  <<: *branches
  allow_failure: true
  parallel:
    matrix:
      - only:
          changes:
            - stub-services/address-check-stub
        CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/address-check-stub
      - only:
          changes:
            - stub-services/address-check-stub
        CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/manual-service-stub
      - only:
          changes:
            - stub-services/address-check-stub
        CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/employment-record-stub
which, of course, to nobody's surprise (including my own), doesn't work.
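For what it's worth, per-entry conditions are not something parallel:matrix supports, so one common workaround is to drop the matrix and declare one job per image from a shared template, letting each job carry its own only:changes rule. A rough sketch (not from the original post), assuming each stub lives under its own stub-services/<name> directory and reusing the *branches anchor from above:

.sast-container-base:
  <<: *branches
  allow_failure: true

sast-address-check-stub:
  extends: .sast-container-base
  variables:
    # mirrors the matrix entry above
    CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/address-check-stub
  only:
    changes:
      - stub-services/address-check-stub/**/*

sast-manual-service-stub:
  extends: .sast-container-base
  variables:
    CI_REGISTRY_IMAGE: $CI_REGISTRY_IMAGE/manual-service-stub
  only:
    changes:
      - stub-services/manual-service-stub/**/*

# ...and likewise for employment-record-stub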

How to fix AWS S3 error: socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed

We have a long running Pentaho job in Rundeck that has recently started to experience failures.
The Rundeck step definition is:
echo "Using AMI $(aws ec2 describe-images --owners 660354881677 --filters 'Name=name,Values="ds-pentaho-v1.15.0"' --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text)"
/usr/local/bin/pentaho-cli run-ec2 --environment prd \
--ami-id $(aws ec2 describe-images --owners 660354881677 --filters 'Name=name,Values="ds-pentaho-v1.15.0"' --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text) \
--arg BOOL_ARG_1=true \
--instance-type "m5.2xlarge" \
--on-demand \
--max-fleet-retries "2" \
--emr-host "some_emr_host.net" \
--dir "/some/job/path/" \
--job "some_job_name"
Specifically, the job has gone from consistently completing to not completing and just hangs in a long-running status. Digging through the logs, we find this:
2022/12/02 06:21:31 - tfo: task_read_table_2 - linenr 13900000
2022/12/02 06:21:11 - tfo: task_read_table_1 - linenr 900000 <- Last Line output for table 1 here
2022/12/02 06:21:59 - tfo: task_read_table_2 - linenr 15150000
2022/12/02 06:22:00 - tfo: task_read_table_2 - linenr 15200000
2022/12/02 06:22:01 - tfo: task_read_table_2 - linenr 15250000
com.amazonaws.services.s3.model.AmazonS3Exception: Your socket
connection to the server was not read from or written to within the
timeout period. Idle connections will be closed. (Service: Amazon S3;
Status Code: 400; Error Code: RequestTimeout;
2022/12/02 06:22:02 - tfo: task_read_table_2 - linenr 15300000
2022/12/02 06:22:03 - tfo: task_read_table_2 - linenr 15350000
2022/12/02 06:24:52 - in: task_read_table_2 - Finished reading query, closing connection.
2022/12/02 06:24:54 - tfo: task_read_table_3 - linenr 50000
...
2022/12/02 06:35:53 - tfo: task_read_table_3 - linenr 37500000
2022/12/02 06:35:54 Finished reading query, closing connection.
<The job hangs here!>
The AWS S3 exception above occurs anywhere between 10 minutes and 2 hours into the run.
The job is on a parallel step that pulls data from multiple PostgreSQL tables and loads it into S3 with a text output task. It seems like S3 hangs up after the read on task_read_table_1: the data is not written to S3. The job continues to pull records from the other source tables until completion, but it never writes those table outputs to S3 either. From there, the job just hangs. The site engineers are not sure what is going on here. I think this may be an issue with how either AWS or Rundeck is set up. Note: we use Terraform to manage timeouts, and those are currently set to 24 hours, so the job is well within its allowed runtime.
The number of records between successful and unsuccessful runs appears to be the same. There do not appear to be many recent, reliable search results on the internet; most results are 5-10 years old and do not seem relevant.
I do not think this is a problem with the Pentaho job itself, because it has completed without fail in the past and the overall record counts of what is being pulled/loaded are stable.
Does anyone know what is potentially causing this issue or how it can be diagnosed?
Note: This is my first engagement working with AWS, Rundeck, Terraform, and Pentaho. I am more of an ETL developer than a site engineer. Any help is appreciated.

gitlab only runs one job in child-pipeline

I have a gitlab-ci.yml that creates and triggers a child .yml:
stages:
  - child-pipeline-generator
  - child-pipeline-trigger

generate-child-pipeline:
  stage: child-pipeline-generator
  tags:
    - GroupRunner
  script:
    - $(./generate-build.ps1) *>&1 > child-pipeline-gitlab-ci.yml
    - (Get-Content child-pipeline-gitlab-ci.yml) | Set-Content child-pipeline-gitlab-ci.yml -Encoding UTF8
  artifacts:
    paths:
      - child-pipeline-gitlab-ci.yml

trigger-child-pipeline:
  stage: child-pipeline-trigger
  trigger:
    include:
      - artifact: child-pipeline-gitlab-ci.yml
        job: generate-child-pipeline
    strategy: depend
The resulting YAML looks like this:
build_1:
  tags:
    - GroupRunner
  script:
    - echo 'build_1'

build_2:
  tags:
    - GroupRunner
  script:
    - echo 'build_2'
But when executed, only the first job (build_1) shows up in the downstream list.
It turned out the problem was the encoding of the PowerShell output. The default encoding in PowerShell 5 is UTF-16 with a BOM, and my re-encoding to UTF8 produced UTF-8 with a BOM, which GitLab can't handle properly. My solution was to encode in ASCII.
What I can't explain is why it was able to interpret the first job correctly; I thought an encoding problem would be all or nothing. Maybe the CRLF-CRLF after the first job caused the error.
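In other words, the fix amounts to changing the re-encoding step of the generator job to target ASCII; a sketch of the adjusted job, based on the definition above:

generate-child-pipeline:
  stage: child-pipeline-generator
  tags:
    - GroupRunner
  script:
    - $(./generate-build.ps1) *>&1 > child-pipeline-gitlab-ci.yml
    # ASCII avoids the BOM that Windows PowerShell 5 writes for UTF8/UTF-16
    - (Get-Content child-pipeline-gitlab-ci.yml) | Set-Content child-pipeline-gitlab-ci.yml -Encoding ASCII
  artifacts:
    paths:
      - child-pipeline-gitlab-ci.yml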

Multiple extends or multiple stages?

I want a CI pipeline that deploys two commands ("bash X" and "bash Y") on different production servers (server 1, server 2, server 3, etc.).
I looked into multiple stages, but that doesn't seem to answer my question.
I don't really care whether it runs in parallel or B after A (the manual section is for debugging).
I don't know how to do it: I tried multiple extends, but the pipeline only takes the last one (bashB).
stages:
  - get_password
  - bashA
  - bashB

get_password:
  stage: get_password
  # Steps

.bashA:
  stage: bashA
  script:
    - lorem ipsum
  when: manual
  only:
    changes:
      - script/bashA.sh

.bashB:
  stage: bashB
  script:
    - ipsum loreem
  when: manual
  only:
    changes:
      - script/bashB.sh

# SRV1
deploy-srv1:
  extends:
    - .bashA
    - .bashB
  variables:
    SRV_1: urlsrv1

# SRV2
deploy-srv2:
  extends:
    - .bashA
    - .bashB
  variables:
    SRV_1: urlsrv2
I just want to be able to deploy bashA and bashB on N servers (I used two servers as an example).
When using multiple extends in GitLab, some of the values will not be merged but overwritten. See the documentation here:
https://docs.gitlab.com/ee/ci/yaml/#extends
They write:
The algorithm used for merge is “closest scope wins”, so keys from the last member will always shadow anything defined on other levels
You are not alone in wanting a feature to be able to merge scripts instead of overwriting them. Here's an open issue on GitLab to do what you described:
https://gitlab.com/gitlab-org/gitlab/issues/16376
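To make the shadowing concrete: with the example above, deploy-srv1 effectively resolves to something like the following, because .bashB is the closest scope and its stage, script, when, and only keys shadow the ones from .bashA (a sketch of the merged result, not actual GitLab output):

deploy-srv1:
  stage: bashB
  script:
    - ipsum loreem
  when: manual
  only:
    changes:
      - script/bashB.sh
  variables:
    SRV_1: urlsrv1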
In the meantime, and only looking at the example you provided, you can get something like what you want by manually merging bashA and bashB into one job:
stages:
  - get_password
  - bash

get_password:
  stage: get_password
  # Steps

.bash_both:
  stage: bash
  script:
    - lorem ipsum
    - ipsum loreem
  when: manual
  only:
    changes:
      - script/bashA.sh
      - script/bashB.sh

# SRV1
deploy-srv1:
  extends:
    - .bash_both
  variables:
    SRV_1: urlsrv1

# SRV2
deploy-srv2:
  extends:
    - .bash_both
  variables:
    SRV_1: urlsrv2

Way to use anchors/references in job.<spec>.script for DRYness

I'm fairly new to using gitlab-ci and as such, I've run into a problem where the following fails ci-lint because of my use of anchors/references:
image: docker:latest

services:
  - docker:dind

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_HOST: tcp://localhost:2375

.install_thing1: &install_thing1
  - do things
  - to install
  - thing1

.install_thing2: &install_thing2
  - do things to
  - install thing2

.setup_thing1: &setup_thing1
  variables:
    VAR: var
    FOO: bar
  script:
    - all
    - the
    - things
  before_script:
    ...

stages:
  - deploy-test
  - deploy-stage
  - deploy-prod

test:
  stage: deploy-test
  variables:
    RUN_ENV: "test"
    ...
  only:
    - tags
    - branches
  script:
    - *install_thing1
    - *install_thing2
    - *setup_thing1
    - other stuff
    ...

test:
  stage: deploy-stage
  variables:
    RUN_ENV: "stage"
    ...
  only:
    - master
  script:
    - *install_thing1
    - *install_thing2
    - *setup_thing1
    - other stuff
When I attempt to lint the gitlab-ci.yml, I get the following error:
Status: syntax is incorrect
Error: jobs:test:script config should be a string or an array of strings
The error alludes to just needing an array for the script piece, which I believe I have. Using the <<: *anchor merge syntax causes an error as well.
So, how can one accomplish what I'm trying to do here without having to repeat the code in every script block?
You can fix it and even make it more DRY; take a look at the Auto DevOps template GitLab created.
It can fix your issue and further improve your CI file: define a template job like their auto_devops job that declares shell functions, load it in a before_script, and then you can combine and call multiple functions in a script block.
The anchors only give you limited flexibility.
(This concept made it possible for me to have one CI file for 20+ projects and a centralized functions file I wget and load in my before_script.)
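A rough illustration of that pattern follows; the function names and echo bodies are placeholders standing in for the question's install/setup anchors, not part of the Auto DevOps template itself:

.functions: &functions |
  function install_thing1() {
    echo "do things to install thing1"
  }
  function install_thing2() {
    echo "do things to install thing2"
  }
  function setup_thing1() {
    echo "all the things"
  }

before_script:
  # defines the functions in every job's shell before the script runs
  - *functions

test:
  stage: deploy-test
  script:
    - install_thing1
    - install_thing2
    - setup_thing1
    - echo "other stuff"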