Handling data loss in Liquibase

I am doing some research on how to prevent data loss during migration and stumbled upon Liquibase.
How does Liquibase handle data loss?
Is there any loss of data when we use Liquibase for a data migration (for example, when dropping an index or a column)?
Thanks

That's not the goal of Liquibase. It is designed to manage the schema lifecycle of an application: creating tables, indexes, and columns, dropping tables, and so on (DDL).
Liquibase deals with data only for initialization or configuration (as a best practice).
If you want to migrate data from one database to another, you can use your database's export/import tools (if the target schema is the same).
Otherwise, you can use an ETL tool such as Talend.
AWS also offers tools to do this in its cloud environment.
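To make that concrete, here is a minimal sketch of a Liquibase formatted SQL changelog (the author, changeset IDs, and table names are hypothetical). Liquibase executes the DDL/DML you author and records each changeset in DATABASECHANGELOG; it does not back up or migrate existing row data for you:

--liquibase formatted sql

--changeset alice:1-create-app-config
create table app_config (id int primary key, cfg_key varchar(100), cfg_value varchar(255));
--rollback drop table app_config;

--changeset alice:2-seed-app-config context:init
insert into app_config (id, cfg_key, cfg_value) values (1, 'default.locale', 'en_US');
--rollback delete from app_config where id = 1;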

Related

Best way to replicate MongoDB NoSQL into SQL tables

How can I replicate (incremental load) MongoDB (NoSQL) data into SQL tables?
We have a web-based solution that loads data into MongoDB. The data size is almost 1 TB. We need to do BI reporting in the Looker BI tool, but Looker doesn't support MongoDB directly, so we have to replicate our data into SQL form; we have Redshift as the target database.
Main requirements for parsing NoSQL to SQL:
The parent node should become the main table.
Nested nodes/arrays should become separate tables with the parent key (foreign key).
Whenever a new field is introduced in a MongoDB source document, it should automatically start being replicated to the target database.
Incremental refresh from source to target.
I've seen Stitch Data ETL, which fits my requirements, but I'm looking for an open-source ETL/DB tool or library.
Please help.
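To make requirements 1 and 2 concrete, a sketch of the target layout in Redshift might look like this (the collection and field names are made up for illustration):

-- parent document collection "orders" becomes the main table
create table orders (
    _id        varchar(24) primary key,  -- MongoDB ObjectId rendered as text
    customer   varchar(256),
    created_at timestamp
);

-- nested array "orders.items" becomes a child table keyed back to the parent
create table orders_items (
    order_id varchar(24) references orders (_id),  -- parent key (foreign key)
    sku      varchar(64),
    qty      integer
);

(Redshift treats primary/foreign keys as informational only, but the parent-key relationship is what the replication needs to maintain.)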
Posting an answer to help out others with the same requirements.
I was not able to find any open-source ETL tool that can fulfill all four of the requirements above.
I tried writing Python code to do it myself, but a paid tool named Precog ended up fulfilling all of the requirements above, and it is a little cheaper than Stitch Data ETL.
Thanks

Liquibase - Contexts by data centers

I am starting to use the Liquibase SQL migration tool. I read the documentation but I'm not sure how the context could apply in my current situation.
I have 4 data centers with 2 environments each: one for production and another for testing.
Liquibase creates two tracking tables, DATABASECHANGELOG (the log of applied changes) and DATABASECHANGELOGLOCK, and all the scripts are recorded in the same log table, differentiated by the context.
Is it possible to separate the list of production changes from the test changes, or is it recommended to differentiate them with the context field?
For example:
Contexts: DC1_PROD, DC1_TEST, DC2_PROD, DC2_TEST, DC3_PROD, DC3_TEST, DC4_PROD, DC4_TEST
Currently we use SQL Server Integration Services (SSIS) to replicate the tables with the changes, so we do not use a centralized system.
Is Liquibase centralized, and is it only necessary to have it in a single database?
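For reference, this is roughly how the context approach I'm asking about would look: each changeset carries a context label, and each data center/environment runs the update with only its own context enabled (the changeset author, ID, and table below are hypothetical):

--liquibase formatted sql

--changeset bob:10-add-audited-flag context:DC1_PROD
alter table orders add audited char(1);
--rollback alter table orders drop column audited;

and then each environment would run only its own context, e.g.:

liquibase --contexts=DC1_PROD update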

wso2cep: Data storage in addition to display

I was wondering whether, in addition to processing and displaying data on a dashboard in wso2cep, I can store the data somewhere for a long period of time so I can extract further information from it later. I have read that there are two types of tables used in wso2cep: in-memory and RDBMS tables.
Which one should I choose?
There is one more option, which is to switch to wso2das. Is that a good approach?
Is the default database fine for that purpose, or should I move to one of the other supported databases (MySQL, Oracle, etc.)?
In-memory or RDBMS?
In-memory tables internally use Java collection structures, so their contents are destroyed once the JVM terminates (after a server restart, the data won't be available). RDBMS tables, on the other hand, persist data permanently. For your scenario, you should proceed with RDBMS tables.
CEP or DAS?
CEP only provides real-time analytics, whereas DAS provides batch analytics (with Spark SQL) in addition to real-time analytics. If you have a scenario that requires batch processing, incremental processing, etc., you can go ahead with DAS. Note that migrating from CEP to DAS is quite simple (since the artifacts are identical).
Default (H2) DB or other DB?
By default, WSO2 products use the embedded H2 DB as the data source. However, it's recommended to use MySQL or Oracle in production environments.

How to use Liquibase diffChangeLog with the current changelog as reference (to generate an incremental changeset)

I have an existing database and have used the generateChangeLog command line to create the initial changelog. This works fine :-)
But now I want the developers to use all the tools and processes they already know to develop the database and code, and to use a script to generate any incremental changesets as appropriate.
That is: do a diff against the current state of the developer's database (url/username/password in the properties file) using the current changelog (changeLogFile in the properties file) as the base reference.
There seems to be no easy way to do this; the best I've come up with is:
Create a new temporary database.
Use liquibase to initialise the temp database (to what is currently in the changelog) by overriding the connection url: liquibase --url=jdbc:mysql://localhost:3306/tempbase update
Use liquibase to generate a changeset in the changelog by diff'ing the two databases:
liquibase --referenceUrl=jdbc:mysql://localhost:3306/tempbase --referenceUsername=foo --referencePassword=baz diffChangeLog
Drop the temporary database.
Synchronise the changeset: liquibase changelogSync
but there must be a better way...
You are right that Liquibase cannot compare a changelog file with a database. The only real option is to compare your developer database with an actual Liquibase-managed database, or at least one temporarily created.
What I would suggest as the better way is to consider shifting the developers to author Liquibase changesets in the first place. It is different tooling than they may be used to, but it has the huge advantage that they will know that the change they wanted to make is the one that will make it all the way to production. Any diff-based process (such as using diffChangeLog) will usually guess right about what changed, but not always, and those differences are often not noticed until they reach production.
Liquibase has various features, such as formatted SQL changelogs, that are designed to ease the transition from developers working directly against their database to tracking changes through Liquibase, because once that transition is made many things get much easier.
With Liquibase Pro you can create a snapshot file that accomplishes the same thing, and then use the snapshot file as the comparison baseline for your database updates.
https://www.liquibase.org/documentation/snapshot.html
I mention Pro because it takes care of stored logic comparisons as well.
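As a rough sketch of that snapshot-based flow, using the same old-style CLI as above (the snapshot file name is a placeholder; the target connection details come from liquibase.properties): capture the current state once, then diff the developer database against the saved snapshot through an offline reference URL instead of a temporary database:

liquibase --outputFile=base-snapshot.json snapshot --snapshotFormat=json
liquibase --referenceUrl=offline:mysql?snapshot=base-snapshot.json diffChangeLog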

How to avoid manually writing/managing SQL

My team and I are rapidly developing a webapp backed by an Oracle DB. We use Maven's Flyway plugin to manage our DB creation and population from SQL INSERT scripts. Typically we add 3-4 tables per sprint and/or modify the existing table structures.
We model the schema in an external tool that generates it, including the constraints; we run that in first, followed by the SQL INSERTs, to ensure the integrity of all the data.
We spend too much time managing changes to the SQL to cover the new tables - by this I mean adding the extra column data to the existing SQL INSERT statements, not to mention manually creating the new SQL INSERT data, particularly when it references a foreign key.
Surely there is another way, maybe maintaining the raw data in Excel and passing it through a parser into the DB. Does anyone have any ideas?
We have 10 tables so far and up to 1000 SQL statements; the DB is not live, so we tear it down on every build.
Thanks
Edit: The inserted data is static reference data that the platform depends on to function (menus, etc.).
The architecture is Tomcat, JSF, Spring, JPA, and Oracle.
Please store your raw data in tables in the database - hey, why on earth would you want to use Excel for this? You have Oracle Database - the best tool for the job!
Load your unpolished data using SQL*Loader or external tables into regular tables in the database.
From there you have SQL - the most powerful RDBMS tool for manipulating your data.
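For example, a raw CSV file can be exposed to SQL through an external table along these lines (the directory object, file name, and columns are hypothetical; this is only a sketch):

-- assumes a directory object exists, e.g.: create or replace directory data_dir as '/path/to/files';
create table raw_data (
    x number,
    y varchar2(50),
    z varchar2(50)
)
organization external (
    type oracle_loader
    default directory data_dir
    access parameters (
        records delimited by newline
        fields terminated by ','
    )
    location ('raw_data.csv')
)
reject limit unlimited;

The CTAS/INSERT examples below can then select straight from raw_data.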
NEVER do slow-by-slow inserts (1,000 individual SQL statements). Do a CTAS (create table as select) instead.
Add/enable the constraints AFTER you have loaded all the data.
create table t as select * from raw_data;
or
insert into t (x,y,z) select x,y,z from raw_data;
Using this method, you can do direct-path inserts (bypassing the conventional, buffered insert path). This can even be done in parallel to make your data go into the database superfast!
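A sketch of such a direct-path, parallel load (using the same hypothetical t and raw_data tables as above):

alter session enable parallel dml;

insert /*+ append parallel(t) */ into t (x, y, z)
select x, y, z
from raw_data;

commit;  -- a direct-path insert must be committed before the session can query the table again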
Do all of your data manipulation in SQL or PL/SQL (not in the application).
Please invest time in learning the Oracle Database. It is full of features for you to use!
Don't just use it as a data dump (a place where you merely store your data). Create packages - interfaces to your application - your API to the database.
Don't just throw around thousands of statements compiled into your application. It will get messy.
Build your business logic inside the database in PL/SQL - use your application for presentation.
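A minimal sketch of what such a package API could look like (the package, procedure, and table names are purely illustrative):

create or replace package ref_data_api as
    procedure load_menus;
end ref_data_api;
/

create or replace package body ref_data_api as
    procedure load_menus is
    begin
        -- set-based load from a staging/external table into the live reference table
        insert into menus (id, label)
        select id, label
        from raw_menus;
    end load_menus;
end ref_data_api;
/

The application then calls ref_data_api.load_menus rather than issuing its own INSERT statements.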
Best of luck!
Alternatively, you also have the option to implement a Flyway Java migration. It could read whatever input data you have (Excel, CSV, ...) and do the proper inserts.