Custom Liquibase executor combining JdbcExecutor and LoggingExecutor - liquibase

I'm looking for a way to record and write to an output file all the SQL statements that get executed while running a Liquibase migration against an empty target database.
The idea behind this is to speed up the initialization phase of integration tests against a test database: subsequent test runs simply read and execute the SQL statements from the generated file.
I had no luck using updateSQL because changesets with preconditions are handled differently (e.g. changeSetExecuted resolves to true for "update" but false for "updateSQL").
Another approach was to run the Liquibase migration first, then write a temporary changelog file using GenerateChangeLogCommand, which is finally used by another Liquibase instance to produce an SQL update file.
While this approach works, it a) feels a bit hackish and b) the end result is not quite the same as running the migration directly.
Anyway, what I've come up with is a custom implementation of JdbcExecutor which incorporates
a LoggingExecutor. The implementation looks as follows:
@LiquibaseService(skip = true)
public class LoggingJdbcExecutor extends JdbcExecutor {

    private LoggingExecutor loggingExecutor;

    public LoggingJdbcExecutor(Database database, Writer writer) {
        loggingExecutor = new LoggingExecutor(this, writer, database);
        setDatabase(database);
    }

    @Override
    public void execute(SqlStatement sql, List<SqlVisitor> sqlVisitors) throws DatabaseException {
        super.execute(sql, sqlVisitors);
        loggingExecutor.execute(sql, sqlVisitors);
    }

    @Override
    public int update(SqlStatement sql, List<SqlVisitor> sqlVisitors) throws DatabaseException {
        final int result = super.update(sql, sqlVisitors);
        loggingExecutor.update(sql, sqlVisitors);
        return result;
    }

    @Override
    public void comment(String message) throws DatabaseException {
        super.comment(message);
        loggingExecutor.comment(message);
    }
}
This executor gets injected into Liquibase before update() is invoked as follows:
final String path = configuration.getUpdateSqlExportFile();
ExecutorService.getInstance().setExecutor(liquibase.getDatabase(), new LoggingJdbcExecutor(
liquibase.getDatabase(), new FileWriter(path)
));
Is this approach reasonable and future-proof? While it seems to work, I'm not sure whether I'm missing something and there's a better way.
Thanks

Related

How to catch exceptions thrown by BigQueryIO.Write and rescue the data that failed to be written?

I want to read data from Cloud Pub/Sub and write it to BigQuery with Cloud Dataflow. Each element contains a table ID that says where the data itself should be saved.
There are various reasons why writing to BigQuery can fail:
The table ID format is wrong.
The dataset does not exist.
The dataset does not allow the pipeline access.
Network failure.
When one of these failures occurs, the streaming job retries the task and stalls. I tried using WriteResult.getFailedInserts() to rescue the bad data and avoid stalling, but it did not work well. Is there any good way?
Here is my code:
public class StarterPipeline {
    private static final Logger LOG = LoggerFactory.getLogger(StarterPipeline.class);

    public class MyData implements Serializable {
        String table_id;
    }

    public interface MyOptions extends PipelineOptions {
        @Description("PubSub topic to read from, specified as projects/<project_id>/topics/<topic_id>")
        @Validation.Required
        ValueProvider<String> getInputTopic();
        void setInputTopic(ValueProvider<String> value);
    }

    public static void main(String[] args) {
        MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
        Pipeline p = Pipeline.create(options);

        PCollection<MyData> input = p
            .apply("ReadFromPubSub", PubsubIO.readStrings().fromTopic(options.getInputTopic()))
            .apply("ParseJSON", MapElements.into(TypeDescriptor.of(MyData.class))
                .via((String text) -> new Gson().fromJson(text, MyData.class)));
        WriteResult writeResult = input
            .apply("WriteToBigQuery", BigQueryIO.<MyData>write()
                .to(new SerializableFunction<ValueInSingleWindow<MyData>, TableDestination>() {
                    @Override
                    public TableDestination apply(ValueInSingleWindow<MyData> input) {
                        MyData myData = input.getValue();
                        return new TableDestination(myData.table_id, null);
                    }
                })
                .withSchema(new TableSchema().setFields(new ArrayList<TableFieldSchema>() {{
                    add(new TableFieldSchema().setName("table_id").setType("STRING"));
                }}))
                .withFormatFunction(new SerializableFunction<MyData, TableRow>() {
                    @Override
                    public TableRow apply(MyData myData) {
                        return new TableRow().set("table_id", myData.table_id);
                    }
                })
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry()));
        writeResult.getFailedInserts()
            .apply("LogFailedData", ParDo.of(new DoFn<TableRow, TableRow>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    TableRow row = c.element();
                    LOG.info(row.get("table_id").toString());
                }
            }));
        p.run();
    }
}
There is no easy way to catch exceptions when writing to an output in a pipeline definition. I suppose you could do it by writing a custom PTransform for BigQuery, but there is no way to do it natively in Apache Beam. I also recommend against this because it undermines Cloud Dataflow's automatic retry functionality.
In your code example, the failed insert retry policy is set to never retry. You can set the policy to always retry. This only helps with transient problems such as an intermittent network failure (the 4th bullet point):
.withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry())
If the table ID format is incorrect (1st bullet point), then the CREATE_IF_NEEDED create disposition should let the Dataflow job create a new table automatically without an error, even though the table ID is incorrect.
If the dataset does not exist or the pipeline does not have access to it (2nd and 3rd bullet points), then my opinion is that the streaming job should stall and ultimately fail. There is no way to proceed in those cases without manual intervention.

What is the best strategy for setting up test data in integration tests?

I'm trying to implement integration tests for my repositories, and I'm stuck on choosing the right strategy for inserting test data before each integration test.
This is what my repository test class looks like:
[TestFixture]
public class RealtyTest : ITest<IRealtyRepository>
{
    public IUnitOfWork Uow { get; set; }
    public IRealtyRepository Repository { get; set; }
    public ITestEnvironment TestEnvironment { get; set; }

    [OneTimeSetUp]
    public void OneTimeSetup()
    {
        TestEnvironment = new RealtyTestEnvironment();
        TestEnvironment.Prepare();
    }

    [SetUp]
    public void Setup()
    {
        Uow = AppCore.Instance.RealtyUow;
        Uow.BeginTransaction();
        Repository = ((IRealtyUow)Uow).RealtyRepository;
    }

    [Test]
    public void should_get_realty_detail_by_id()
    {
        Realty realty = Repository.GetDetail(1);
        Assert.IsNotNull(realty.Firm);
        Assert.IsNotNull(realty.FirmUser);
        Assert.IsNotNull(realty.Category);
        Assert.IsNotNull(realty.SubCategory);
        Assert.IsNotNull(realty.Publish);
        Assert.IsNotNull(realty.ResIdence);
        Assert.IsNotNull(realty.Star);
        Assert.IsNotNull(realty.Floor);
        Assert.IsNotNull(realty.Heating);
        Assert.IsNotNull(realty.Fuel);
        Assert.IsNotNull(realty.BuildState);
        Assert.IsNotNull(realty.Usage);
        Assert.IsNotNull(realty.Credit);
        Assert.IsNotNull(realty.Register);
        Assert.IsNotNull(realty.Activate);
        Assert.IsNotNull(realty.District);
        Assert.IsNotNull(realty.District.County);
        Assert.IsNotNull(realty.District.County.City);
        Assert.IsNotNull(realty.District.County.City.Country);
    }

    [Test]
    public void should_get_realty_detail_all_by_id()
    {
        Realty realty = Repository.GetWithChilds(1);
        Assert.IsNotNull(realty.Firm);
        Assert.IsNotNull(realty.FirmUser);
        Assert.IsNotNull(realty.Category);
        Assert.IsNotNull(realty.SubCategory);
        Assert.IsNotNull(realty.Publish);
        Assert.IsNotNull(realty.ResIdence);
        Assert.IsNotNull(realty.Star);
        Assert.IsNotNull(realty.Floor);
        Assert.IsNotNull(realty.Heating);
        Assert.IsNotNull(realty.Fuel);
        Assert.IsNotNull(realty.BuildState);
        Assert.IsNotNull(realty.Usage);
        Assert.IsNotNull(realty.Credit);
        Assert.IsNotNull(realty.Register);
        Assert.IsNotNull(realty.Activate);
        Assert.IsNotNull(realty.District);
        Assert.IsNotNull(realty.District.County);
        Assert.IsNotNull(realty.District.County.City);
        Assert.IsNotNull(realty.District.County.City.Country);
        Assert.Greater(realty.Files.Count, 0);
        Assert.Greater(realty.Attributes.Count, 0);
    }

    [TearDown]
    public void TearDown()
    {
        if (Uow != null)
        {
            Uow.Rollback();
            Uow.Dispose();
        }
    }

    [OneTimeTearDown]
    public void OneTimeTearDown()
    {
        TestEnvironment.Rollback();
    }
}
As you can see, I insert the test data in OneTimeSetup() and delete it in OneTimeTearDown() once all the integration test methods in the class are done.
For test data creation I've used the NDbUnit library.
Here is my RealtyTestEnvironment class implementation:
public class RealtyTestEnvironment : ITestEnvironment
{
    private NDbUnitTest _database = null;

    public int TestRowCount { get; set; }

    public RealtyTestEnvironment()
    {
        TestRowCount = 1;
    }

    public void Prepare()
    {
        _database = new SqlDbUnitTest(TestSettings.HemlakTestDbConnection);
        _database.ReadXmlSchema(string.Format(@"{0}\HemlakDb\Hemlak.xsd", TestSettings.AppRootPath));
        _database.PerformDbOperation(DbOperationFlag.DeleteAll);

        GeneralTestDataBuilder generalDataBuilder = new GeneralTestDataBuilder();
        _database.ReadXml(generalDataBuilder.GetTestTypeStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);

        LocationTestDataBuilder locationDataBuilder = new LocationTestDataBuilder();
        _database.ReadXml(locationDataBuilder.GetTestCountryStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(locationDataBuilder.GetTestCityStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(locationDataBuilder.GetTestCountyStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(locationDataBuilder.GetTestDistrictStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);

        FirmTestDataBuilder firmDataBuilder = new FirmTestDataBuilder();
        _database.ReadXml(firmDataBuilder.GetTestFirmStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(firmDataBuilder.GetTestFirmUserStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);

        RealtyTestDataBuilder realtyDataBuilder = new RealtyTestDataBuilder();
        _database.ReadXml(realtyDataBuilder.GetTestRealtyStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(realtyDataBuilder.GetTestRealtyFileStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
        _database.ReadXml(realtyDataBuilder.GetTestRealtyAttributeStream(TestRowCount));
        _database.PerformDbOperation(DbOperationFlag.InsertIdentity);
    }

    public void Rollback()
    {
        if (_database != null)
        {
            _database.PerformDbOperation(DbOperationFlag.DeleteAll);
        }
    }
}
So I've chosen NDbUnit, but I don't feel comfortable with it because the project's last commit was three years ago and preparing the test environment takes too much effort.
Some people use the repository class itself to insert test data, but then we are using the repository class to insert test data for testing that same repository class?? It doesn't make sense to me.
- What are your approaches, how do you insert test data for testing your repositories?
- And what about my implementation, should I continue with NDbUnit?
There are a few options for writing integration tests for logic that needs a data store:
mock the data storage dependency and use the mock instead of a real database; something like NSubstitute or Moq is often used for this;
use another, faster database as a replacement; EF Core provides an in-memory database out of the box, or you could use something like SQLite (a minimal sketch follows below);
use a real database.
I personally usually choose the latter, to avoid the behavior discrepancies and missing features you run into with something like an in-memory database compared to a real one.
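To illustrate the in-memory option, here is a minimal sketch of an EF Core-based fixture; MyDbContext is a placeholder for your own context type and the Microsoft.EntityFrameworkCore.InMemory package is assumed:
using System;
using Microsoft.EntityFrameworkCore;
using NUnit.Framework;

[TestFixture]
public class InMemoryRepositoryTests
{
    private MyDbContext _context;

    [SetUp]
    public void Setup()
    {
        // Each test gets a throwaway in-memory store instead of a real database.
        var options = new DbContextOptionsBuilder<MyDbContext>()
            .UseInMemoryDatabase(databaseName: Guid.NewGuid().ToString())
            .Options;
        _context = new MyDbContext(options);
        // seed whatever entities the test needs here
    }

    [TearDown]
    public void TearDown()
    {
        _context.Database.EnsureDeleted(); // drop the in-memory store between tests
        _context.Dispose();
    }
}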
Then you need to initialize and clean up your database, and there are a few options here as well.
Data seeding could be implemented in one of these ways:
each test creates the entities it needs;
a reusable piece of code creates all the entities for all the tests at once;
a manually written SQL script;
use a library. I know a few: NDbUnit, which you mentioned - it's somewhat outdated, but there are a few more-or-less fresh forks - and Reseed, which I'm currently developing because I wasn't that happy with NDbUnit.
And there are also a few approaches to clean the data up afterwards:
create the database from scratch for every test;
restore the database from a backup;
restore the database from a snapshot;
wrap each test in a transaction and then roll it back (a minimal sketch follows this list);
manually revert the changes made by each test within the test itself;
use insert and delete SQL scripts to restore the data to its initial state;
use a library to insert and delete the data for you. Reseed is also able to handle that, and there is also Respawn.
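As an illustration of the transaction-wrapping option, a minimal NUnit base class using only System.Transactions; no third-party library is assumed:
using System.Transactions;
using NUnit.Framework;

// Minimal sketch: every test runs inside an ambient transaction that is
// rolled back afterwards, so the database is left untouched.
public abstract class TransactionalTestBase
{
    private TransactionScope _scope;

    [SetUp]
    public void BeginAmbientTransaction()
    {
        _scope = new TransactionScope();
    }

    [TearDown]
    public void RollbackAmbientTransaction()
    {
        // Disposing without calling Complete() rolls the transaction back.
        _scope.Dispose();
    }
}
Test fixtures inherit from this base and open their connections inside the scope, so the rollback covers everything they change.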
All of these can be combined to get the best performance. And instead of using a dedicated SQL Server instance, you could host your databases in Docker with something like Testcontainers and have the tests start them on demand, as sketched below.
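A rough sketch of the Docker-hosted approach, assuming the Testcontainers.MsSql package (the builder API may differ between releases, so treat the names as illustrative):
using System.Threading.Tasks;
using NUnit.Framework;
using Testcontainers.MsSql;

[SetUpFixture]
public class SqlServerContainerFixture
{
    public static MsSqlContainer Container { get; private set; }

    [OneTimeSetUp]
    public async Task StartContainer()
    {
        // Spins up a disposable SQL Server instance in Docker for the whole test run.
        Container = new MsSqlBuilder().Build();
        await Container.StartAsync();
        // Container.GetConnectionString() is then handed to the repositories under test.
    }

    [OneTimeTearDown]
    public async Task StopContainer()
    {
        await Container.DisposeAsync();
    }
}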
As for the test framework integration, the approach you've chosen fits the problem well. Attributes like [OneTimeSetUp], [OneTimeTearDown], [FixtureSetUp], [FixtureTearDown], [SetUp] and [TearDown] are a good way to run database initialization and cleanup before/after your tests.

ASP.NET Core 2.0: Detect whether the Startup class is invoked from a migration or another EF operation

At the moment the whole default Startup.cs flow is executed on every database-related operation, like dropping the database, adding a migration, updating the database to a migration, etc.
I have heavy app-specific code in Startup which should only be invoked when the application runs for real. So how can I detect that the Startup class is being run from a migration or another database-related dotnet command?
Well, as already noted in a comment on the question, there is an IDesignTimeDbContextFactory interface which needs to be implemented to resolve the DbContext at design time.
It could look something like this:
public static class Program
{
    ...

    public static IWebHost BuildWebHostDuringGen(string[] args)
    {
        return WebHost.CreateDefaultBuilder(args)
            .UseStartup<StartupGen>() // a different Startup subclass that skips the heavy app-specific code
            .UseDefaultServiceProvider(options => options.ValidateScopes = false)
            .Build();
    }
}

public class DbContextFactory : IDesignTimeDbContextFactory<MyDbContext>
{
    public MyDbContext CreateDbContext(string[] args)
    {
        return Program.BuildWebHostDuringGen(args).Services.GetRequiredService<MyDbContext>();
    }
}
However, for reasons that are unclear (I asked people from Microsoft, but they didn't explain it to me), dotnet currently calls Program.BuildWebHost implicitly on every operation, even if it's private - that's why the standard flow is executed each time for the question's author. The workaround is to rename Program.BuildWebHost to something else, like InitWebHost (a minimal sketch follows).
There is an issue filed for this, so maybe it will be resolved in the 2.1 release or later.
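For illustration, a minimal sketch of that rename against the standard 2.0 template; InitWebHost is just an arbitrary name the design-time tooling will not look for:
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        InitWebHost(args).Run();
    }

    // Renamed from BuildWebHost so the EF/dotnet design-time tooling
    // no longer discovers and invokes it implicitly.
    public static IWebHost InitWebHost(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .UseStartup<Startup>()
            .Build();
}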
The documentation is still a bit unclear as to why this occurs. I've yet to find any concrete answer as to why it runs Startup.Configure. In 2.0 it's recommended to move any migration/seeding code to Program.Main. Here's an example by bricelam on GitHub:
public static IWebHost MigrateDatabase(this IWebHost webHost)
{
    using (var scope = webHost.Services.CreateScope())
    {
        var services = scope.ServiceProvider;

        try
        {
            var db = services.GetRequiredService<ApplicationDbContext>();
            db.Database.Migrate();
        }
        catch (Exception ex)
        {
            var logger = services.GetRequiredService<ILogger<Program>>();
            logger.LogError(ex, "An error occurred while migrating the database.");
        }
    }

    return webHost;
}

public static void Main(string[] args)
{
    BuildWebHost(args)
        .MigrateDatabase()
        .Run();
}

Adding 'GO' statements to Entity Framework migrations

So I have an application with a ton of migrations made by Entity Framework.
We want to get a script for all the migrations at once, and using the -Script tag does work fine.
However, it does not add GO statements to the SQL, which gives us problems like "ALTER VIEW must be the first statement in a batch"...
I have been searching around, and manually adding Sql("GO"); helps with this problem, but only for the full script. When I use the Package Manager Console again it returns an exception:
System.Data.SqlClient.SqlException (0x80131904): Could not find stored procedure 'GO'.
Is there a way to add these GO tags only when using the -Script tag?
If not, what is a good approach for this?
Note: we have also tried splitting it into multiple files, but since we have so many migrations this is nearly impossible to maintain every time.
If you are trying to alter your view using Sql("Alter View dbo.Foos As etc"), then you can avoid the "must be the first statement in a batch" error without adding GO statements by putting the SQL inside an EXEC command:
Sql("EXEC('Alter View dbo.Foos As etc')")
In order to change the SQL generated by Entity Framework migrations you can create a new SqlServerMigrationSqlGenerator.
We have done this to add a GO statement before and after the migration history insert:
public class MigrationScriptBuilder : SqlServerMigrationSqlGenerator
{
    protected override void Generate(System.Data.Entity.Migrations.Model.InsertHistoryOperation insertHistoryOperation)
    {
        Statement("GO");
        base.Generate(insertHistoryOperation);
        Statement("GO");
    }
}
then add this in the Configuration constructor (in the Migrations folder of the project where your DbContext is) so that it uses the new SQL generator:
[...]
internal sealed class Configuration : DbMigrationsConfiguration<PMA.Dal.PmaContext>
{
    public Configuration()
    {
        SetSqlGenerator("System.Data.SqlClient", new MigrationScriptBuilder());
        AutomaticMigrationsEnabled = false;
    }
[...]
So now when you generate a script using the -Script tag, you can see that the INSERT INTO [__MigrationHistory] is surrounded by GO.
Alternatively, in your implementation of SqlServerMigrationSqlGenerator you can override any part of the script generation; InsertHistoryOperation was the suitable one for us.
Turns out the concept exists deep inside SqlServerMigrationSqlGenerator as an optional argument, Statement(sql, batchTerminator). Here is something based on Skyp's idea. It works both in -Script mode and not. The GOs are emitted for different operations than in Skyp's version only because our needs are a little different. You then need to register this class in the Configuration as per Skyp's instructions.
public class MigrationScriptBuilder : SqlServerMigrationSqlGenerator
{
    private string Marker = Guid.NewGuid().ToString(); // to get past the base generator's null-or-empty check

    protected override void Generate(AlterProcedureOperation alterProcedureOperation)
    {
        SqlGo();
        base.Generate(alterProcedureOperation);
        SqlGo();
    }

    protected override void Generate(CreateProcedureOperation createProcedureOperation)
    {
        SqlGo();
        base.Generate(createProcedureOperation);
        SqlGo();
    }

    protected override void Generate(SqlOperation sqlOperation)
    {
        SqlGo();
        base.Generate(sqlOperation);
    }

    private void SqlGo()
    {
        Statement(Marker, batchTerminator: "GO");
    }

    public override IEnumerable<MigrationStatement> Generate(IEnumerable<MigrationOperation> migrationOperations, string providerManifestToken)
    {
        var result = new List<MigrationStatement>();
        var statements = base.Generate(migrationOperations, providerManifestToken);
        bool pendingBatchTerminator = false;

        foreach (var item in statements)
        {
            if (item.Sql == Marker && item.BatchTerminator == "GO")
            {
                pendingBatchTerminator = true;
            }
            else
            {
                if (pendingBatchTerminator)
                {
                    item.BatchTerminator = "GO";
                    pendingBatchTerminator = false;
                }
                result.Add(item);
            }
        }

        return result;
    }
}
The easiest way is to add /**/ before the GO statement.
Just replace the current statement with a .Replace("GO", "");

NHibernate database versioning: object level schema and data upgrades

I would like to approach database versioning and automated upgrades in NHibernate from a different direction than most of the strategies proposed out there.
As each object is defined by an XML mapping, I would like to take the size and checksum of each mapping file/configuration and store them in a document database (Raven or something), along with a potential custom update script. If no script is found, use the NHibernate DDL generator to update the object schema. This way I can detect changes, and if I need to make DML changes in addition to DDL, or perform a carefully ordered transformation, I can theoretically do so in a controlled, testable manner. This should also maintain a certain level of persistence-layer agnosticism, although I'd imagine the scripts would still necessarily be database-system specific.
The trick would be generating the "old" mapping files from the database and comparing them to the current mapping files. I don't know if this is possible. I also don't know if I'm missing anything else that would make this strategy prohibitively impractical.
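For illustration, a minimal sketch of the fingerprinting step described above - computing a size/checksum pair per mapping file that could be compared against what is stored in the document database (all names here are hypothetical, and the lookup and script execution are left out):
using System;
using System.IO;
using System.Security.Cryptography;

public class MappingFingerprint
{
    public string FileName { get; set; }
    public long Size { get; set; }
    public string Checksum { get; set; }

    // Fingerprint one .hbm.xml mapping file so it can be compared with the
    // version recorded in the document database.
    public static MappingFingerprint FromFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return new MappingFingerprint
            {
                FileName = Path.GetFileName(path),
                Size = new FileInfo(path).Length,
                Checksum = Convert.ToBase64String(sha.ComputeHash(stream))
            };
        }
    }
}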
My question, then: how practical is this strategy, and why?
What I did to solve just that problem:
1. version the database in a table called SchemaVersion;
2. query the table to see if the schema is up to date (the required version is stored in the DAL); if yes, go to 6;
3. get the update script with version == versionFromDb from resources/webservices/...;
4. run the script, which also updates SchemaVersion to the new version;
5. go to 2;
6. run the app.
To generate the scripts I have used two options:
support one RDBMS: run SchemaUpdate to export the DDL into a file and add DML statements manually (a sketch of this follows);
support multiple RDBMSs: use the NHibernate Table class to generate DDL at runtime to add/alter/drop tables, plus code which issues DML through a session.
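For the single-RDBMS option, a rough sketch of exporting the DDL with NHibernate's SchemaUpdate; the Configuration instance and the output path are assumptions:
using System.IO;
using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;

public static class SchemaScriptExporter
{
    // Writes the DDL that SchemaUpdate would execute to a file instead of
    // running it, so DML statements can be appended by hand afterwards.
    public static void Export(Configuration config, string outputPath)
    {
        var update = new SchemaUpdate(config);
        using (var writer = new StreamWriter(outputPath, true)) // append to the script file
        {
            // second argument false: only script the changes, do not apply them to the database
            update.Execute(line => writer.WriteLine(line), false);
        }
    }
}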
Update:
"what method did you use to store the current version?"
A small example, something like this:
public static class Constants
{
    public static readonly Version DatabaseSchemaVersion = new Version(1, 2, 3, 4);
}

public class DBMigration
{
    private IDictionary<Version, Action> _updates = new Dictionary<Version, Action>();
    private Configuration _config;
    private Dialect _dialect;
    private IList<Action<ISession>> _actions = new List<Action<ISession>>(16);
    private string _defaultCatalog;
    private string _defaultSchema;

    private void CreateTable(string name, Action<Table> configuretable)
    {
        var table = new Table(name);
        configuretable(table);
        string createTable = table.SqlCreateString(_dialect, _config.BuildMapping(), _defaultCatalog, _defaultSchema);
        _actions.Add(session => session.CreateSQLQuery(createTable).ExecuteUpdate());
    }

    private void UpdateVersionTo(Version version)
    {
        _actions.Add(session => { session.Get<SchemaVersion>(1).Value = version; session.Flush(); });
    }

    private void WithSession(Action<ISession> action)
    {
        _actions.Add(action);
    }

    public void Execute(Configuration config)
    {
        _actions.Clear();
        _defaultCatalog = config.Properties[NH.Environment.DefaultCatalog];
        _defaultSchema = config.Properties[NH.Environment.DefaultSchema];
        _config = config;
        _dialect = Dialect.GetDialect(config.Properties);

        using (var sf = _config.BuildSessionFactory())
        using (var session = sf.OpenSession())
        {
            Version dbVersion = session.Get<SchemaVersion>(1).Value;
            while (dbVersion < Constants.DatabaseSchemaVersion)
            {
                // one transaction per migration step
                using (var tx = session.BeginTransaction())
                {
                    _actions.Clear();
                    _updates[dbVersion].Invoke(); // init migration, TODO: error handling
                    foreach (var action in _actions)
                    {
                        action.Invoke(session);
                    }
                    tx.Commit();
                }
                session.Clear();
                dbVersion = session.Get<SchemaVersion>(1).Value;
            }
        }
    }

    public DBMigration()
    {
        _updates.Add(new Version(1, 0, 0, 0), UpdateFromVersion1);
        _updates.Add(new Version(1, 0, 1, 0), UpdateFromVersion1);
        ...
    }

    private void UpdateFromVersion1()
    {
        CreateTable("Users", table => table.AddColumn(...));
        WithSession(session => session.CreateSQLQuery("INSERT INTO ...").ExecuteUpdate());
        UpdateVersionTo(new Version(1, 0, 1, 0));
    }

    ...
}