Spring Boot Flyway migrations - SQL

I am using Flyway for migrations in my Spring Boot application. I have around 5 migration scripts, named in the following fashion:
V1__initialmigrations.sql
V2__alter_message_table.sql
When the migrations run and I look at the data in the 'flyway_schema_history' table, the data looks good for all migration scripts except the very first one: under the 'script' column, its value is '<< Flyway Baseline >>' rather than the name of the script, unlike the other rows. Also, the 'installed_by' column is 'null' for this row while the others have the user name from my Spring Boot YAML file, and the 'checksum' is null as well.
The only Flyway-related properties in the Spring environment YAML file are:
spring:
  flyway:
    baseline-on-migrate: true
    enabled: true
I am not sure if this is the right behavior. Any inputs would be appreciated.

You only use the baseline-on-migrate property if you've created a new baseline after the initial application of the migrations. Flyway ignores all scripts with a version below what is set in baseline-version (default 1).
For example, in applications I've worked on, we had years of migration script backlog. We replaced all of them with a single file containing the current DB structure and enabled baseline-on-migrate with baseline-version: X.1, where our new baseline script was VX_0_0.
See also the official documentation on baseline: https://flywaydb.org/documentation/concepts/baselinemigrations
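For illustration, a minimal application.yml sketch along those lines; the version numbers and the baseline script name V5_0_0__baseline.sql are hypothetical, not taken from the question:

spring:
  flyway:
    enabled: true
    baseline-on-migrate: true
    # Hypothetical example: the consolidated baseline script is V5_0_0__baseline.sql,
    # so scripts with a version below 5.1 are ignored on an already-baselined database.
    baseline-version: "5.1"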

Related

How to specify model schema when referencing another dbt project as a package? (dbt multi-repo setup)

We're using a dbt multi-repo setup with different projects for different business areas. We have several projects, something like this:
dbt_dwh
dbt_project1
dbt_project2
The dbt_dwh project contains models which we plan to reference in projects 1 and 2 (we have ~10 projects that would reference the dbt_dwh project) by way of installing git packages. Ideally, we'd like to be able to just reference the models in the dbt_dwh project, e.g.
SELECT * FROM {{ ref('dbt_dwh', 'model_1') }}
However, each of our projects sits in its own database schema, and this causes issues upon dbt run because dbt uses the target schema from dbt_project_x, where these objects don't exist. I've included example set-up info below, for clarity.
packages.yml file for dbt_project1:
packages:
  - git: https://git/repo/url/here/dbt_dwh.git
    revision: master
profiles.yml for dbt_dwh:
dbt_dwh:
  target: dwh_dev
  outputs:
    dwh_dev:
      <config rows here>
    dwh_prod:
      <config rows here>
profiles.yml for dbt_project1:
dbt_project1:
  target: project1_dev
  outputs:
    project1_dev:
      <config rows here>
    project1_prod:
      <config rows here>
sf_orders.sql in dbt_dwh:
{{
  config(
    materialized = 'table',
    alias = 'sf_orders'
  )
}}
SELECT * FROM {{ source('salesforce', 'orders') }} WHERE uid IS NOT NULL
revenue_model1.sql in dbt_project1:
{{
  config(
    materialized = 'table',
    alias = 'revenue_model1'
  )
}}
SELECT * FROM {{ ref('dbt_dwh', 'sf_orders') }}
My expectation here was that dbt would examine the sf_orders model and see that the default schema for the project it sits in (dbt_dwh) is dwh_dev, so it would construct the object reference as dwh_dev.sf_orders.
However, if you use the command dbt run -m revenue_model1, the default dbt behaviour is to assume all models are located in the default target for dbt_project1, so you get something like:
11:05:03 1 of 1 START sql table model project1_dev.revenue_model1 .................... [RUN]
11:05:04 1 of 1 ERROR creating sql table model project1_dev.revenue_model1 ........... [ERROR in 0.89s]
11:05:05
11:05:05 Completed with 1 error and 0 warnings:
11:05:05
11:05:05 Runtime Error in model revenue_model1 (folder\directory\revenue_model1.sql)
11:05:05 404 Not found: Table database_name.project1_dev.sf_orders was not found
I've got several questions here:
How do you force dbt to use a specific schema at runtime when using the dbt ref function?
Is it possible to force dbt to use the default parameters/settings for models inside the dbt_dwh project when this Git repo is installed as a package in another project?
Some points to note:
All objects & schemas listed above sit in the same database
I know that many people recommend mono-repo set-up to avoid exactly this type of scenario, but switching to a mono-repo structure is not feasible right now, as we are already fully invested in multi-repo setup
Although it would be feasible to create source.yml files in each of the dbt projects to reference the output objects of the dbt_dwh project, this feels like duplication of effort and could result in different versions of the same sources.yml file across projects
I appreciate it is possible to hard-code the output schema in the dbt config block, but this removes our ability to test in dev environment/schema for dbt_dwh project
I managed to find a solution so I'll answer my own question in case anybody else runs up against the same issue. Unfortunately this is not documented anywhere that I can find, however, a throw-away comment in the dbt Slack workspace sparked an idea that allowed me to find the/a solution (I'll post the message if I manage to find it again, to give credit where it's due).
The fix is actually very simple: you just need to add the project being imported to your profiles.yml file and specify the schema. For our use case this is fine, as we only have one schema we use.
profiles.yml for dbt_project1:
models:
  db_project_1:
    outputs:
      project1_dev:
        <configs here>
      project1_prod:
        <configs here>
  dbt_dwh:
    +schema: [[schema you want these models to run into]]
    <configs here>
The advantages with this approach are:
When you generate/serve dbt docs it allows you to see the upstream lineage from the upstream project
If there are any upstream dependencies in your upstream project you can run this using dbt run -m +model_name (this can be super handy)
If you don't want this behaviour then you can use dbt run -m +model_name --exclude dbt_dwh (for example) to prevent models in your upstream project from running.
I haven't yet figured out if it is possible to use the default parameters/settings for models inside the upstream project (in this case dbt_dwh) but I will edit this answer if I find a way.

Use `needs` keyword in GitLab CI with variable stage name

My pipeline has two different contexts, so to speak. If a developer is working on a branch other than main, a job called scan_sandbox is created in the merge request pipeline to scan the Dockerfile the developer is currently working on.
When the branch is merged into main, a scan_production job is created instead, meaning that the image is going to be pushed to the registry and later used in the production environment.
My problem is dealing with this variable job name, either scan_sandbox or scan_production, in the needs statement, in order to fetch and publish the scan results. I've tried...
needs: ["scan_production", "scan_sandbox"]
But it returns an error, since the two jobs are never declared in the same context. I also tried...
needs: ["container_scan"]
Which is the name of the stage where both scans will run, but GitLab CI doesn't interpret it this way either.
Does anyone have any ideas?
You can specify a dependency as optional in GitLab CI YAML. If it's optional, GitLab won't fail if the job was not executed. This was specifically added to handle jobs with rules, only, or except conditions.
So you can specify
needs:
  - job: scan_sandbox
    optional: true
  - job: scan_production
    optional: true
Notes:
This will work for GitLab version >= 13.9
Doc link: https://docs.gitlab.com/ee/ci/yaml/#needsoptional
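For illustration, here is a minimal sketch of how this could fit into a pipeline. The downstream job name publish_results, the stage names, the rules, and the script commands are assumptions for the example, not taken from the original pipeline:

stages:
  - container_scan
  - report

scan_sandbox:
  stage: container_scan
  script: ./scan.sh   # hypothetical scan command
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

scan_production:
  stage: container_scan
  script: ./scan.sh   # hypothetical scan command
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

publish_results:
  stage: report
  # Only one of the two scan jobs exists in any given pipeline,
  # so both needs entries are marked optional (GitLab >= 13.9).
  needs:
    - job: scan_sandbox
      optional: true
    - job: scan_production
      optional: true
  script: ./publish.sh   # hypothetical publish command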

DBT - [WARNING]: Did not find matching node for patch

I keep getting the warning below when I use dbt run. I can't find anything in the dbt documentation on why this occurs or how to fix it.
[WARNING]: Did not find matching node for patch with name 'vGenericView' in the 'models' section of file 'models\generic_schema\schema.sql'
Did you by chance recently upgrade to dbt 1.0.0? If so, this means that you have a model, vGenericView, defined in a schema.yml but you don't have a vGenericView.sql model file to which it corresponds.
If all views and tables defined in the schema are 1-to-1 with model files, then try running dbt clean, and test or run afterward.
Not sure what happened to my project, but I got frustrated looking for missing and/or misspelled files when it was just leftovers from compiled files that hadn't been cleaned out. I had previously moved views around to different schemas and renamed others.
So the mistake is here in the naming:
The model name in the models.yml file should for example be: employees
And the sql file should be named: employees.sql
So your models.yml will look like:
version: 2
models:
  - name: employees
    description: "View of employees"
And there must be a model with file name: employees.sql
One case where this will happen is if you have the same data source defined in two different schema.yml files (or whatever you call them).

How to apply connection settings to all JDBC test steps in SoapUI?

tl;dr
After (manually) having updated the JDBC connection properties of a single SoapUI test step,
how can I copy them to the other test steps in the project (without resorting to ${property} expansion)?
I suppose Groovy is the key?
Background
I have a SoapUI project containing many JDBC test steps pointing to my development database, like this:
The Open source version of JDBC TestStep has fields for setting the
connection properties and the SQL query manually.
Getting Started | JDBC (SoapUI.org)
Constraint: I am currently working without having the Connections feature from Smartbear's Pro version available.
Goal
Before deploying, I want to run the same tests in our staging environment, i.e. I have to change the JDBC connection settings throughout the test suite(s).
Preliminary considerations:
In order to re-direct all JDBC steps to the staging database, I could edit my tests so that the connection string and driver fields rely on property expansion, as described in "SOAPUI ability to switch between database connections for test suite".
Specific approach:
However, in this case I need to see the connection strings and drivers directly on the test steps (in contrast to seeing just the ${expansion} variables). Rationale: it gives more useful screenshots with the real values ...
The connection properties can be copied from one test step to other JDBC test steps in the project using the following Groovy script:
// Select "correctly configured" JDBC TestStep/Case/Suite to be used as reference
def s = testRunner.testCase
                  .testSuite
                  .project
                  .testSuites["Reference TestSuite"]
                  .testCases["Reference TestCase"].getTestStepAt(1)
log.info "${s.getConnectionString()}, ${s.getDriver()}"

// Use s to configure all JDBC TestSteps in current TestSuite
testRunner.testCase
          .testSuite
          .testCases
          .each { iTC, testCase ->
              //log.debug "${iTC}: ${testCase}"
              testCase.getTestStepsOfType(com.eviware.soapui.impl.wsdl.teststeps.JdbcRequestTestStep)
                      .each { testStep ->
                          testStep.setConnectionString(s.getConnectionString())
                          testStep.setDriver(s.getDriver())
                          log.info "${testStep.getConnectionString()}, ${testStep.getDriver()}"
                      }
          }
To run this, I introduced an additional test suite, internal TS, and a test case, internal TC. I added a Groovy test step copyJdbcSettings with the above script to internal TC and executed it once.
I then disabled internal TS until I need it again someday.

DBD::Oracle, Cursors and Environment under mod_perl

I need some help, because I can't find any solution for my problems with DBD::Oracle.
First, this is the current situation:
We are running Apache2 with mod_perl 2.0.4 at our company
Apache web server was set up with a startup script which is setting some environment variables (LD_LIBRARY_PATH, ORACLE_HOME, NLS_LANG)
In httpd.conf there are also environment variables for LD_LIBRARY_PATH and ORACLE_HOME (via SetEnv)
We are generally using the Perl module DBI with the driver DBD::Oracle to connect to our main database
Before we create a new instance of DBI, we also set some Perl environment variables (%ENV): ORACLE_HOME and NLS_LANG.
So far, this works fine. But now we are extending our system and need to connect to a remote database. Again, we are using DBI and DBD::Oracle. But for now there are some new conditions:
New connection must run in parallel to the existing one
TNSNAMES.ORA for the new connection is placed at a different location (not at $ORACLE_HOME.'/network/admin')
New database contents are provided by stored procedures, which we are fetching with DBD::Oracle and cursors (like explained here: https://metacpan.org/pod/DBD::Oracle#Binding-Cursors)
The stored procedures are returning object types and collection types, containing attributes of oracle type DATE
To get these dates in a readable format, we set a new env variable $ENV{NLS_DATE_FORMAT}
To ensure the date format we additionally alter the session by alter session set nls_date_format ...
Okay, this works fine, too, but only if we make the new connection from the console. The new TNS location is found by the script, the connection can be established, and fetching data from the procedures by cursor also works. All DATE types are formatted as specified.
Now, if we try to make this connection in the Apache environment, it fails. First, the data source name could not be resolved by DBI/DBD::Oracle. I think this is because our new TNSNAMES.ORA file, or rather its location (published via $ENV{TNS_ADMIN}), is not found by DBI/DBD::Oracle in the Apache context. But I don't know why.
The second problem (if I create a dirty workaround for the first one) is that the date format published by $ENV{NLS_DATE_FORMAT} only works on the first level of our cursor select.
BEGIN OPEN :cursor FOR SELECT * FROM TABLE(stored_procedure); END;
The example above returns collection types of objects which contain date attributes. In the Apache context, the format published by NLS_DATE_FORMAT is not recognized. If I use a simple form of the example like this
BEGIN OPEN :cursor FOR SELECT SYSDATE FROM TABLE(stored_procedure); END;
the result (a single date field) is formatted well. So I think the subordinate structures are not formatted because $ENV{NLS_DATE_FORMAT} works only in the console context and not in the Apache context.
So there must be a problem with the Perl environment variables (%ENV) under Apache and mod_perl. Maybe a problem with mod_perl itself?
I am at my wit's end. Maybe someone out there has a solution ... and excuse my English :-) If you need further explanation, I will try to describe it more precisely.
If your problem is that changes to %ENV made while processing a request don't seem to be honoured, this is because mod_perl assumes you might be running multiple threads and doesn't actually change the process environment when you change %ENV, so external libraries (like the Oracle client) or child processes don't see the change.
You can work around it by first using the prefork MPM, so there aren't any threading issues, and then making changes to the environment using Env::C instead of the %ENV hash.
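For illustration, a minimal sketch of that workaround under the prefork MPM using Env::C; the TNS_ADMIN path, the NLS values, and the connection details are placeholders, not values from the question:

use strict;
use warnings;
use Env::C ();
use DBI;

# Set the variables at the C level so the Oracle client library sees them;
# under mod_perl, writing to %ENV alone does not change the real process environment.
Env::C::setenv('TNS_ADMIN',       '/path/to/remote/tns/dir', 1);   # placeholder path
Env::C::setenv('NLS_DATE_FORMAT', 'YYYY-MM-DD HH24:MI:SS',   1);
Env::C::setenv('NLS_LANG',        'GERMAN_GERMANY.AL32UTF8', 1);   # placeholder value

# Connect as usual; the TNS alias and credentials are placeholders.
my $dbh = DBI->connect('dbi:Oracle:REMOTE_DB', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });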