FlywayDB ignore sub-folder in migration

I have a situation where I would like to ignore specific folders inside of where Flyway is looking for the migration files.
Example:
/db/Migration
    2.0-newBase.sql
    /oldScripts
        1.1-base.sql
        1.2-foo.sql
I want to ignore everything inside the 'oldScripts' subfolder. Is there a flag I can set in the Flyway config, like ignoreFolder=SOME_FOLDER or scanRecursive=false?
An example of why I would do this: say I have 1000 scripts in my migration folder. If we onboard a new member, instead of having them run the migration on 1000 files, they could just run the one script (the new base) and proceed from there. The alternative would be to never sync those files in the first place, but then people would need to remember to check source control for prior migrations instead of just looking on their local drive.

This is not currently supported directly. You could put both directories at the same level in the hierarchy (without nesting them) and selectively configure flyway.locations to achieve the same thing.
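For example, assuming the old scripts are moved out to a sibling folder such as /db/oldScripts, a flyway.conf could list only the folder you do want scanned:
flyway.locations=filesystem:db/Migration
Anything not listed in flyway.locations (here, the oldScripts folder) is simply never scanned for migrations.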

Since Flyway 6.4.0 wildcards are supported in flyway.locations. Examples:
db/**/test
db/release1.*
db/release1.?
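In flyway.conf such a pattern is used like any other locations entry, for example (path made up):
flyway.locations=filesystem:db/release1.*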
More info at https://flywaydb.org/blog/organising-your-migrations

I am going through project configuration in craftcms

I am going through the documentation for project configuration in craftcms and I'm not sure what the difference is between project-config/write and project-config/modify?
project-config/write takes the config currently stored in the database and writes it out as YAML files in the config/ folder. You usually don't need this since Craft does this automatically whenever you change something in the backend (unless you have turned that off).
project-config/rebuild attempts to rebuild the entire project-config based on the state of the entire database. This is only required in rare edge cases.
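Both are console commands; from the project root (assuming the default craft executable) they can be run as:
php craft project-config/write
php craft project-config/rebuild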
For the rest, you can check the official documentation here:
https://craftcms.com/docs/4.x/project-config.html#what-s-stored-in-project-config

Multiple users executing the same workflow

Are there guidelines regarding how to share a Snakemake workflow among multiple users on the same data under Linux, or is the whole thing considered bad practice?
Let me explain in case it's not clear:
Suppose user A executes a workflow in directory dir/. Assume the workflow terminates successfully, and he/she then properly sets file/directory permissions recursively on all output and intermediate files and the .snakemake/ subdirectory for other users to read/write, of course.
User B subsequently navigates to dir/, adds input files to the workflow, then executes it. Can anything go wrong?
TL;DR: I'm asking about non-concurrent execution of the same workflow by distinct users on the same system, and on the same data on disk. Is Snakemake designed for such use cases?
It's possible to run snakemake --nolock, which will prevent locking of the directory, so multiple runs can be made from inside the same directory. However, without the lock there's now an opening for errors due to concurrent runs trying to modify the same files. It's probably OK if you are certain that this will be avoided, e.g. if you are in constant communication with the other user about which files will be modified.
An alternative option is to create a third directory/path and put all the data there. This way you can work from separate directories/paths and avoid costly recomputes.
I would say that from the point of view of snakemake, and workflow management in general, it's ok for user B to add or update input files and re-run the pipeline. After all, one of the advantages of a workflow management system is to update results according to new input. The problem is that user A could find her results updated without being aware of it.
Off the top of my head and without more detail, this is what I would suggest. Make snakemake read the list of input files from a table (pandas comes in handy for this) or from some configuration file. Keep this sample sheet under version control (with git/github) together with the Snakefile and other source code.
When users update the working directory with new files, they will also need to update the sample sheet in order for snakemake to "see" the new input and other users will know about it via version control. I prefer this setup over dumping files in a directory and letting snakemake process whatever is in there.
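A minimal sketch of that setup, where the sample sheet name, column name and rule are all just assumptions:
import pandas as pd

# samples.tsv is the version-controlled sample sheet; the "sample" column is assumed
samples = pd.read_csv("samples.tsv", sep="\t")["sample"].tolist()

rule all:
    input:
        expand("results/{sample}.txt", sample=samples)

rule count_lines:
    input:
        "data/{sample}.fastq"
    output:
        "results/{sample}.txt"
    shell:
        "wc -l {input} > {output}"
Adding a new input then means adding a row to samples.tsv and committing it, so the change is visible to everyone via version control.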

Play framework ignores evolution script

I'm currently covering the basics of SQL databases and using them in the Play framework. I have created a postgres database and successfully configured it in my application.conf:
db.default.driver=org.postgresql.Driver
db.default.url="jdbc:postgresql://database.example.com/playdb"
db.default.user=postgres
db.default.password=qwerty
I have also created a 1.sql file in the conf/evolutions/evolutions/default directory and wrote some example SQL code there to create a simple table. The problem is that Play seems to ignore the existence of this file. When I run my server and connect to localhost, I'm supposed to be asked by Play whether I would like to have my script applied to my database or not. Unfortunately I'm not, and the only thing Play does is load my home page (the CREATE TABLE in 1.sql is not executed and I don't have any tables created). Any ideas what I am doing wrong?
Make sure you have the following line in your build.sbt file:
libraryDependencies += evolutions
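It's also worth double-checking that the script follows the format Play's evolutions expect, roughly like the sketch below (table and columns are just placeholders):
# --- !Ups

CREATE TABLE users (
    id bigint NOT NULL PRIMARY KEY,
    name varchar(255)
);

# --- !Downs

DROP TABLE users;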
In my case, evolution ignored 2.sql (and 1.sql).
To resolve this I had to remove those comments from 1.sql:
#--- Created by Ebean DDL
# To stop Ebean DDL generation, remove this comment and start using Evolutions
and add my own comment such as:
# Initial version
Also, viewing 1.sql file in VI revealed it contains ^M characters which had to be removed.
After those two steps, evolution stopped overwriting 1.sql and finally used the provided 2.sql file.
Evolution seems to overwrite 1.sql even if you do not perform the db update, so make sure you are editing the original version.
I am using play 2.4.1.
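Regarding the ^M (carriage return) characters, one way to strip them, assuming dos2unix is available and the default evolutions path, is:
dos2unix conf/evolutions/default/1.sql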

How to automate source control with Oracle database

I work in an Oracle instance that has hundreds of schemas and multiple developers. We have a development instance where developers can integrate their work before test or production.
We want to have source control for all the DDL run in this integrated development database. Currently this is done through a Red Gate product which we run manually after we make a change to the database. Red Gate finds the changes between what is in the schema and what was last checked into source control, makes a script of the differences, and puts it into source control.
The problem, however, is of course that running Red Gate can take some time, and people run it infrequently or not at all for small changes. Also, Red Gate will only look in one schema at a time, and it would be VERY time consuming to manually run it against all schemas to guarantee that they are up to date. However, if the source-controlled code cannot be relied upon, it becomes less useful...
What would seem to be ideal would be to have some software that could periodically (even once a day), or when triggered by DDL being run, update the source control (preferably github as this is used by other teams) from all the schemas.
I cannot seem to see any existing software which can be simply used to do this.
Is there a problem with doing this? (there is no need to address multiple developers overwriting each others work on the same day as we have this covered in a separate process) Is anyone doing this? Can anyone recommend a way to do this?
We do this with the help of a PL/SQL function, a Python script and a shell script:
The PL/SQL function can generate the DDL of a whole schema and returns it as a CLOB
The Python script connects to the database, fetches the DDL and stores it in files
The shell script runs the source control tool to add the modifications (we use Bazaar here).
You can see the scripts on PasteBin:
The PL/SQL function is here: http://pastebin.com/AG2Fa9zL
The python program (schema_exporter.py): http://pastebin.com/nd8Lf0gK
The shell script:
python schema_exporter.py
d=$(date +%Y-%m-%d__%H_%M_%S)
bzr add
# Commit only if bzr reports added or modified files
bzr st | grep -q -E 'added|modified' && bzr commit -m "Database objects on $d"
exit 0
This shell script is configured to run from cron every day.
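For reference, the core of such a PL/SQL function is typically built around DBMS_METADATA; a rough sketch (not the PasteBin version — the function name and the list of object types are assumptions):
CREATE OR REPLACE FUNCTION get_schema_ddl(p_schema IN VARCHAR2) RETURN CLOB IS
  l_ddl CLOB;
BEGIN
  DBMS_LOB.CREATETEMPORARY(l_ddl, TRUE);
  FOR obj IN (SELECT object_type, object_name
                FROM all_objects
               WHERE owner = p_schema
                 AND object_type IN ('TABLE', 'VIEW', 'SEQUENCE', 'PROCEDURE', 'FUNCTION')
                 AND object_name NOT LIKE 'BIN$%') LOOP
    -- DBMS_METADATA reconstructs the DDL for each object
    DBMS_LOB.APPEND(l_ddl, DBMS_METADATA.GET_DDL(obj.object_type, obj.object_name, p_schema));
    DBMS_LOB.APPEND(l_ddl, TO_CLOB(CHR(10) || '/' || CHR(10)));
  END LOOP;
  RETURN l_ddl;
END;
/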
Being in the database version control space for 5 years (as director of product management at DBmaestro) and having worked as a DBA for over two decades, I can tell you the simple fact that you cannot treat the database objects as you treat your Java, C# or other files and save the changes in simple DDL scripts.
There are many reasons and I'll name a few:
Files are stored locally on the developer's PC and the changes s/he makes do not affect other developers. Likewise, the developer is not affected by changes made by colleagues. In a database this is (usually) not the case: developers share the same database environment, so any change committed to the database affects others.
Publishing code changes is done using Check-In / Submit Changes / etc. (depending on which source control tool you use). At that point, the code from the developer's local directory is inserted into the source control repository. A developer who wants to get the latest code needs to request it from the source control tool. In a database, the change already exists and impacts others even if it was not checked in to the repository.
During the file check-in, the source control tool performs a conflict check to see if the same file was modified and checked in by another developer while you modified your local copy. Again, there is no such check in the database. If you alter a procedure from your local PC and at the same time I modify the same procedure with code from my local PC, we overwrite each other's changes.
The build process is done by getting the label / latest version of the code into an empty directory and then performing a build (compile). The output is binaries, which we copy over the existing ones; we don't care what was there before. With a database we cannot recreate it from scratch, as we need to maintain the data! Instead, the deployment executes SQL scripts which were generated in the build process.
When executing the SQL scripts (with the DDL, DCL and DML (for static content) commands), you assume the current structure of the environment matches the structure at the time you created the scripts. If not, your scripts can fail, for example when you try to add a new column that already exists.
Treating SQL scripts as code and manually generating them will cause syntax errors, database dependency errors and scripts that are not reusable, which complicates the task of developing, maintaining and testing those scripts. In addition, those scripts may run on an environment which is different from the one you thought they would run on.
Sometimes the script in the version control repository does not match the structure of the object that was tested, and then errors will happen in production!
There are many more, but I think you got the picture.
What I found that works is the following:
Use an enforced version control system that enforces check-out/check-in operations on the database objects. This will make sure the version control repository matches the code that was checked in, as it reads the metadata of the object during the check-in operation rather than as a separate manual step. This also allows several developers to work in parallel on the same database while preventing them from accidentally overwriting each other's code.
Use impact analysis that utilizes baselines as part of the comparison to identify conflicts and to determine whether a difference (when comparing the object's structure between the source control repository and the database) is a real change originating from development, or a difference that originated from a different path and should therefore be skipped, such as a different branch or an emergency fix.
Use a solution that knows how to perform impact analysis for many schemas at once, via UI or API, in order to eventually automate the build & deploy process.
An article I wrote on this was published here, you are welcome to read it.
To me it seems like your way of working is backwards: developers run DDL against the DB in an unordered fashion and then you need an automated tool for inferring the changes (and the DDL) that were run.
The process would be under better control if you did the following instead:
Developers write DDL as SQL scripts, preferably using a migration tool such as Flyway (http://flywaydb.org/documentation/migration/sql.html).
Migration scripts are checked into version control
Migration scripts are periodically run against the DB (e.g. by the migration tool)
In this workflow, the DB would only get altered through automated migration scripts and no-one is allowed to do changes manually. Could this work for you?
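Assuming the Flyway command line tool, step 3 can then be a scheduled run of something like the following (URL, credentials and script location are made up):
flyway -url=jdbc:oracle:thin:@//dbhost:1521/ORCL -user=scott -password=secret -locations=filesystem:sql/migrations migrate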
(I develop the Oracle tools for Redgate)
Actually, using the tools you can already do what I think you're asking for, using Schema Compare for Oracle.
You can compare multiple schemas either in the UI or via the command line - I think what you're after is automating the command line tool which can create difference scripts, sync between source and destination (live, snapshot or scripts) and generate reports.
You can automate the command line to sync to a scripts folder which is your source code checkout and then subsequently run a command to commit the changes.
I think that's all good :)
We built a commercial tool that bridges Oracle with Git. It helps you manage your database objects with Git. Basically, the database becomes the working directory for the developer. You can perform Git operations in the database such as reset, commit, branch, merge etc... and the database code is updated automatically. It might be worth taking a look: https://www.gitora.com

Choosing multiple Hibernate import.sql based on conditions

How can I specify which import file I want Hibernate to run? Is there any configuration option I can set (I think I have seen something like this somewhere) to point at a custom .sql file that Hibernate will then run?
I want to split my creation script into multiple files. I also want to run different scripts that will generate data based on the Hibernate config I am using. So if I am using the local config it should use one set of .sql files, and if I am testing in QA it should use another.
I have multiple config files that I can run depending on what I want, so now I need to figure out how to put which script should run in which configuration.
cheers
'hibernate.hbm2ddl.import_files' is the setting you want (org.hibernate.cfg.AvailableSettings#HBM2DDL_IMPORT_FILES).
http://docs.jboss.org/hibernate/orm/4.1/javadocs/org/hibernate/cfg/AvailableSettings.html#HBM2DDL_IMPORT_FILES
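For example, a local hibernate.cfg.xml might point at local scripts (file names are placeholders; the files are resolved from the classpath and the import only runs when hbm2ddl.auto is create or create-drop):
<property name="hibernate.hbm2ddl.import_files">/import-local-schema.sql, /import-local-data.sql</property>
A QA configuration would then simply list a different set of files.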