by using liquibase diff with diffTypes=data on 2 tables in mysql data changes would not detected. In one of the tables I change existing entry and insert on row on a table this changes are not detected by liquibase. Changes on structure no problem.
Here my liquibase diff call:
liquibase --diffTypes=data --driver=com.mysql.jdbc.Driver --url=jdbc:mysql://localhost:3306/magento --username=username --password=password diff --referenceUrl=jdbc:mysql://localhost:3306/marketing_magento --referenceUsername=username --referencePassword=password
changes are in the magento db
Here the result:
Product Name: EQUAL
Product Version: EQUAL
Missing Data(s): NONE
Unexpected Data(s): NONE
Changed Data(s): NONE
Liquibase 'diff' Successful
Regards, Karsten
Liquibase doesn't support this kind of data differencing. It can output the data in certain limited cases - the main one being where a table doesn't exist at all in one database.
Because the primary use case is for doing structural change management, design decisions were made to optimize the performance of that use case. Doing row-by-row data comparisons is very expensive, performance-wise, and tedious to do correctly.
Related
I have a table that I need drop, delete transaction log and recreate, but while I am trying to drop I get following error.
I have ran repair table statement on this one and could be responsible for error but not sure.
IllegalStateException: The transaction log has failed integrity checks. We recommend you contact Databricks support for assistance. To disable this check, set spark.databricks.delta.state.corruptionIsFatal to false. Failed verification of:
Table size (bytes) - Expected: 0 Computed: 63233
Number of files - Expected: 0 Computed: 1
We think this may just be related to s3 eventual consistency. Please try waiting a few extra minutes after deleting the Delta directory before writing new data to it. Also, normal MSCK REPAIR TABLE doesn't do anything for Delta, as Delta doesn't use the Hive Metastore to store the partitions. There is an FSCK REPAIR TABLE, but that is for removing the file entries from the transaction log of a Databricks Delta table that can no longer be found in the underlying file system.
We don't recommend overwriting a Delta table in place, like you might with a normal Spark table. Delta is not like a normal table - it's a table, plus a transaction log, and many versions of your data (unless fully vacuumed). If you want to overwrite parts of the table, or even the whole table, you should use Delta's delete functionality. If you want to completely change the table, consider writing to an entirely new directory, such as /table/v2/... and separately deleting the other table.
To skip the issue from occurring can use below command (PySpark notebook):
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)
I have a database here which is updated using Liquibase. If I understand correctly, Liquibase will apply a changeset and write a line to the DATABASECHANGELOG table with the execution date and an MD5 checksum. This way, Liquibase can find out when a changeset (unexpectedly) changes.
However, in this database, many (most) of the MD5SUM entries are NULL. I have no idea why that should be the case. Is this in any way a normal mode of operation?
When using the Liquibase status command, I see many ‘unapplied’ changes. How can Liquibase determine that without the MD5 sum? Or are changesets without MD5 sums considered changed.
The DATABASECHANGELOG in question was used by two different programs. This may be questionable, but used to work quite well.
Until… the programs started to use different versions of Liquibase with different formats of the MD5SUM. Upon detecting this, Liquibase writes NULL to the whole column (which is questionable in itself), thereby removing all the checksums for all the other programs. Depending on the order of Liquibase runs, different sets of checksums remained NULL in the end.
I have an ETL process that will run periodically. I was using kettle (PDI) to extract the data from the source database and copy it to a stage database. For this I use several transformations with table input and table output steps. However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data. Furthermore, I don't know if the source database would be blocked. This would be a problem if the extraction takes some minutes (and it will take them). The advantage of PDI is that I can select only the necessary columns and use timestamps to get only the new data.
By the other hand, I think mysqldump with --single-transaction allows me to get the data in a consistent way and don't block the source database (all tables are innodb). The disadventage is that I would get innecessary data.
Can I use PDI, or I need mysqldump?
PD: I need to read specific tables from specific databases, so I think xtrabackup it's not a good option.
However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data
I think "Table Input" step doesn't take into account any modifications that are happening when you are reading. Try a simple experiment:
Take a .ktr file with a single table input and table output. Try loading the data into the target table. While in the middle of data load, insert few records in the source database. You will find that those records are not read into the target table. (note i tried with postgresql db and the number of rows read is : 1000000)
Now for your question, i suggest you using PDI since it gives you more control on the data in terms of versioning, sequences, SCDs and all the DWBI related activities. PDI makes it easier to load to the stage env. rather than simply dumping the entire tables.
Hope it helps :)
Interesting point. If you do all the table inputs in one transformation then at least they all start at same time but whilst likely to be consistent it's not guaranteed.
There is no reason you can't use pdi to orchestrate the process AND use mysql dump. In fact for bulk insert or extract it's nearly always better to use the vendor provided tools.
We have a SQL server database that is very dynamic and is always creating new and dropping existing tables from a custom schema called 'temp' (we have a dbo schema and a temp schema). We also use SSDT to maintain and monitor changes in our schema but we are unable to use the update feature on a schema comparison because if a new table is created (say temp.MyTable) after the schema comparison is made and before the updated is attempted, SSDT invalidates the schema comparison because something has changed. At the moment, our only solution to this is to run the schema comparisons around midnight when system activity is practically non-existent but is not ideal for the person who has to do the schema comparison.
My question is, is there a way we can exlude tables from the schema comparison that are apart of the 'temp.' schema?
How are you doing the deployment? as I test I used sqlpackage.exe to publish a dacpac and sat there constantly creating new tables and it deployed without complaining.
However, there are a couple of things you can do, the first is to stop getting the deployment to stop when drift is detected:
/p:BlockWhenDriftDetected=False
This is set to true by default.
The second thing is to ignore the temp schema, but I don't think this will help unless you also stop the drift but you might want to use this filter to stop all changes to the temp schema:
http://agilesqlclub.codeplex.com/
Ed
We're developing a Doctrine backed website using YAML to define our schema. Our schema changes regularly (including fk relations) so we need to do a lot of:
Doctrine::generateModelsFromYaml(APPPATH . 'models/yaml', APPPATH . 'models', array('generateTableClasses' => true));
Doctrine::dropDatabases();
Doctrine::createDatabases();
Doctrine::createTablesFromModels();
We would like to keep existing data and store it back in the re-created database. So I copy the data into a temporary database before the main db is dropped.
How do I get the data from the "old-scheme DB copy" to the "new-scheme DB"? (the new scheme only contains NEW columns, NO COLUMNS ARE REMOVED)
NOTE:
This obviously doesn't work because the column count doesn't match.
SELECT * FROM copy.Table INTO newscheme.Table
This obviously does work, however this is consuming too much time to write for every table:
SELECT old.col, old.col2, old.col3,'somenewdefaultvalue' FROM copy.Table as old INTO newscheme.Table
Have you looked into Migrations? They allow you to alter your database schema in programmatical way. WIthout losing data (unless you remove colums, of course)
How about writing a script (using the Doctrine classes for example) which parses the yaml schema files (both the previous version and the "next" version) and generates the sql scripts to run? It would be a one-time job and not require that much work. The benefit of generating manual migration scripts is that you can easily store them in the version control system and replay version steps later on. If that's not something you need, you can just gather up changes in the code and do it directly through the database driver.
Of course, the more fancy your schema changes becomes, the harder the maintenance will get i.e. column name changes, null to not null etc.