We're developing a Doctrine backed website using YAML to define our schema. Our schema changes regularly (including fk relations) so we need to do a lot of:
Doctrine::generateModelsFromYaml(APPPATH . 'models/yaml', APPPATH . 'models', array('generateTableClasses' => true));
Doctrine::dropDatabases();
Doctrine::createDatabases();
Doctrine::createTablesFromModels();
We would like to keep existing data and store it back in the re-created database. So I copy the data into a temporary database before the main db is dropped.
How do I get the data from the "old-scheme DB copy" to the "new-scheme DB"? (the new scheme only contains NEW columns, NO COLUMNS ARE REMOVED)
NOTE:
This obviously doesn't work because the column count doesn't match.
SELECT * FROM copy.Table INTO newscheme.Table
This obviously does work, however this is consuming too much time to write for every table:
SELECT old.col, old.col2, old.col3,'somenewdefaultvalue' FROM copy.Table as old INTO newscheme.Table
Have you looked into Migrations? They allow you to alter your database schema in programmatical way. WIthout losing data (unless you remove colums, of course)
How about writing a script (using the Doctrine classes for example) which parses the yaml schema files (both the previous version and the "next" version) and generates the sql scripts to run? It would be a one-time job and not require that much work. The benefit of generating manual migration scripts is that you can easily store them in the version control system and replay version steps later on. If that's not something you need, you can just gather up changes in the code and do it directly through the database driver.
Of course, the more fancy your schema changes becomes, the harder the maintenance will get i.e. column name changes, null to not null etc.
Related
I have an ETL process that will run periodically. I was using kettle (PDI) to extract the data from the source database and copy it to a stage database. For this I use several transformations with table input and table output steps. However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data. Furthermore, I don't know if the source database would be blocked. This would be a problem if the extraction takes some minutes (and it will take them). The advantage of PDI is that I can select only the necessary columns and use timestamps to get only the new data.
By the other hand, I think mysqldump with --single-transaction allows me to get the data in a consistent way and don't block the source database (all tables are innodb). The disadventage is that I would get innecessary data.
Can I use PDI, or I need mysqldump?
PD: I need to read specific tables from specific databases, so I think xtrabackup it's not a good option.
However, I think I could get inconsistent data if the source database is modified during the process, since this way I don't get a snapshot of the data
I think "Table Input" step doesn't take into account any modifications that are happening when you are reading. Try a simple experiment:
Take a .ktr file with a single table input and table output. Try loading the data into the target table. While in the middle of data load, insert few records in the source database. You will find that those records are not read into the target table. (note i tried with postgresql db and the number of rows read is : 1000000)
Now for your question, i suggest you using PDI since it gives you more control on the data in terms of versioning, sequences, SCDs and all the DWBI related activities. PDI makes it easier to load to the stage env. rather than simply dumping the entire tables.
Hope it helps :)
Interesting point. If you do all the table inputs in one transformation then at least they all start at same time but whilst likely to be consistent it's not guaranteed.
There is no reason you can't use pdi to orchestrate the process AND use mysql dump. In fact for bulk insert or extract it's nearly always better to use the vendor provided tools.
We have a SQL server database that is very dynamic and is always creating new and dropping existing tables from a custom schema called 'temp' (we have a dbo schema and a temp schema). We also use SSDT to maintain and monitor changes in our schema but we are unable to use the update feature on a schema comparison because if a new table is created (say temp.MyTable) after the schema comparison is made and before the updated is attempted, SSDT invalidates the schema comparison because something has changed. At the moment, our only solution to this is to run the schema comparisons around midnight when system activity is practically non-existent but is not ideal for the person who has to do the schema comparison.
My question is, is there a way we can exlude tables from the schema comparison that are apart of the 'temp.' schema?
How are you doing the deployment? as I test I used sqlpackage.exe to publish a dacpac and sat there constantly creating new tables and it deployed without complaining.
However, there are a couple of things you can do, the first is to stop getting the deployment to stop when drift is detected:
/p:BlockWhenDriftDetected=False
This is set to true by default.
The second thing is to ignore the temp schema, but I don't think this will help unless you also stop the drift but you might want to use this filter to stop all changes to the temp schema:
http://agilesqlclub.codeplex.com/
Ed
I am not sure how to ask this question so please direct me in the right direction if I am not using the appropriate terminology, etc. but I can explain what I am currently doing. I would like to know if there is an easier way to update content in the database than the method I'm currently using.
(I'm using SQL Server 2008 BTW.)
I have a bunch of CSV files that I use to give to my client as a means to update content which gets imported into the DB (because the content is LARGE). The import works by running a python script that I wrote that makes use of a Jinja2 template that generates the SQL file needed to insert the CSV content into the database (if it is a from-scratch scenario). This is working fine.
Now when it comes to data migration (I need to migrate the data that exists in the DB to a new version thereof) I have a lot of manual work (I hand code it in the template, there is no SQL command or auto-generated code that I can run to do this for me) to do.
So lets say I have a list of Hospitals in a CSV file and I already have a set of hospitals in the database (which is imported from the previous version of the CSV file). I create a copy of the Hospitals table (without the data) and call it HospitalsTemp. The new CSV hospitals are inserted into the HospitalsTemp table (at least that part is generated via the template).
The Hospitals table now gets detached from all its foreign-keys and constraints. Now I go through all the tables surrounding the Hospitals (again manually!) and replace the hospitalId which pointed to the old hospitalId with the new hospitalId (as I can do a lookup from the Hospitals to the HospitalsTemp based on the hospital code to ensure that referential integrity is retained).
Then I delete the Hospitals table and rename the HospitalsTemp to Hospitals and put back the foreign-keys and constraints on the new Hospitals table.
I hope I explained it well enough for everyone to understand. I'm really hoping for a simpler way to do this.
How do you know which hospital becomes which, do the names stay the same? Is there an Id that stays the same?
Have you looked at SSIS, and the Slowly Changing Dimension component? You can use it to update existing rows and add new rows: http://blogs.msdn.com/b/karang/archive/2010/09/29/slowly-changing-dimension-using-ssis.aspx
Also SSIS would be a good tool for the import, as it handles reading CSV files well.
You could replace the current logic with simple SSIS package that's just a flat-file data source and the output of the SCD wizard by the sounds of it?
I have got a backup of a live database (A copy of an ACCDB format Access database) in which I've worked, added new fields to existing tables and whole new tables.
How do I get these changes and apply that fast in the running database?
In MS SQL Server, I'd right-click > Script Table As > Alter To, save the query and run it wherever I desire, is there an as easy way as that to do it in an Access Database ?
Details:
It's an ACCDB MS-Access database created on Access 2007, copied and edited in Access 2007, in which I need to get some "alter" scripts to run on the other database so that it has all the new columns and tables I've created on my copy.
For new tables, just import them from one database into the other. In the "External Data" section of the ribbon, choose the Access icon above "Import". That choice starts an import wizard to allow you to select which objects you want imported. You will have a choice to import just the table structure, or both structure and data.
Remou is right that you can use DDL ALTER TABLE statements to add new columns. However, DDL might not support every feature you want for your new columns. And if you want not just the empty columns added, but also also any data from those new columns, you will probably need to run UPDATE statements to get it into your new columns.
As far as "Script Table As", see if OmBelt's Export Table to SQL tool for MS Access can do what you want.
Edit: Allen Browne has sample ALTER TABLE statements. See CreateFieldDDL and the following one, CreateFieldDDL2.
You can run DDL in Access. I think it would be easiest to run the SQL with VBA, in this case.
There is a product called DbWeigher that can compare Access database schemas and synchronize them. You can get a free trial (30 days). DbWeigher will write a script of all schema differences and write it out as DDL. The script is thorough and includes relationships, indexes, validation rules, allow zero length, etc.
A free tool from the same developer, DBWConsole, will let you execute a DDL script against any Access database. If you wrote your own DDL scripts this would be an easy way to apply the changes to your live database. It even handles some DDL that I don't know how to process in VBA (so it must be magic). DBWConsole is included if you downloaded the trial version of DBWeigher. Be aware that you can't make schema changes to a table in a shared Access database if anyone has the table open.
DbWeigher creates a script of all differences between the two files. It can be a lot to manually parse through if you just want a few of the changes. I built a parser for DbWeigher script files so they could be filtered by table, to extract just the parts I wanted. I contacted the DbWeigher author about it but never heard back. It's safe to say that I have no affiliation with this developer.
I've been writing a Library management Java app lately, and, up until now, the main Library database is stored in a .txt file which was later converted to ArrayList in Java for creating and editing the database and saving the alterations back to the .txt file again. A very primitive method indeed. Hence, having heard on SQL later on, I'm considering to port my preexisting .txt database to mySQL. Since I've absolutely no idea how SQL and specifically mySQL works, except for the fact that it can interact with Java code. Can you suggest me any books/websites to visit/buy? Will the book Head First with SQL ever help? especially when using Java code to interact with the SQL database? It should be mentioned that I'm already comfortable with using 3rd Party APIs.
View from 30,000 feet:
First, you'll need to figure out how to represent the text file data using the appropriate SQL tables and fields. Here is a good overview of the different SQL data types. If your data represents a single Library record, then you'll only need to create 1 table. This is definitely the simplest way to do it, as conversion will be able to work line-by-line. If the records contain a LOT of data duplication, the most appropriate approach is to create multiple tables so that your database doesn't duplicate data. You would then link these tables together using IDs.
When you've decided how to split up the data, you create a MySQL database, and within that database, you create the tables (a database is just something that holds multiple tables). Connecting to your MySQL server with the console and creating a database and tables is described in this MySQL tutorial.
Once you've got the database created, you'll need to write the code to access the database. The link from OMG Ponies shows how to use JDBC in the simplest way to connect to your database. You then use that connection to create Statement object, execute a query to insert, update, select or delete data. If you're selecting data, you get a ResultSet back and can view the data. Here's a tutorial for using JDBC to select and use data from a ResultSet.
Your first code should probably be a Java utility that reads the text file and inserts all the data into the database. Once you have the data in place, you'll be able to update the main program to read from the database instead of the file.
Know that the connection between a program and a SQL database is through a 'connection program'. You write an instruction in an SQL statement, say
Select * from Customer order by name;
and then set up to retrieve data one record at a time. Or in the other direction, you write
Insert into Customer (name, addr, ...) values (x, y, ...);
and either replace x, y, ... with actual values or bind them to the connection according to the interface.
With this understanding you should be able to read pretty much any book or JDBC API description and get started.