If my source table keeps getting one column added to it at a time, how do I map the new column to my query/source?
It is different from slowly changing dimension, as it is not records that are changing, but the number of columns itself, i.e. the schema.
How do I design a job to do this? Any solution is fine, even if it requires custom functions, scripts, etc.
From my perspective it is not possible. This is a use case of sql injection (i.e. somehow you have to play with the ATL or repository metadata) which SAP I bet would never suggest. I think Pentaho and Talend Integration does indeed support this functionality.
According to me by using template tables it can be possible.
Thanks.
Related
I would like to build arbitrary queries to a database, by allowing the user to build queries "on the fly". For every object/table, being able to select its attributes, and then "building" the query (that would translate into a SQL statement) and finally launching it, all through a web interface.
The ticketing system "rt" does that, for example, and another example would be the http://gatherer.wizards.com/Pages/Advanced.aspx webpage.
I'm currently programming in rails but any existing solution that implements this (or something similar) would be welcome.
Just be careful when creating dynamically generated queries like this that will need to be executed via sp_executesql (example: ms sql server), etc..... make sure you cover all of your bases to ensure that your application isnt vulnerable to SQL injection attacks as this type of development will essentially get one in a lot of trouble if its done incorrectly.. I would recommend storing all queries in a table and only reading queries from this table to help isolate the queries that are being ran in your application. Just identify them with a label, and allow the EU to choose the label from a dropdown list control on the frontend.
Good luck and I'm not sure of any software that will help assist
Not quite sure what your use case is here but i would say check out the
Doctrine ORM ( Object Relational Mapper )
**Edit
After reading more and looking at the example. I would only suggest Doctrine for a large website.
Then use Doctrines DQL syntax with some javascript/jquery magic for the forms.
Note that the queries you're referencing aren't arbitrary: they're on a very specific problem domain, on a specific set of sql tables.
That said, if I were you I'd look into how people are building sql queries with javascript. Something like these:
http://code.google.com/p/django-querybuilder/
http://css.dzone.com/articles/sqlike-sql-querying-engine?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+zones%2Fria+(RIA+Zone)
http://thechangelog.com/post/4914956307/rel-arel-ported-to-node-js-with-some-changes
That'll at least get you a good idea of the underlying data structures.
I'm looking into using Microsoft's Entity Framework in an upcoming project which is a point release of an existing product. Our current product supports two DBMS (Oracle and SQL Server), the schema of each is maintained in separate .sql script files.
The entity framework (4.1) looks appealing because it allows various scenarios to be implemented automatically via code generation, reflection, etc. However, as far as I can tell, some of these benefits appear to be mutually exclusive of others.
For example, to support multiple DMBSes, I am inferring that I would need to use a model or code first design, in which case EF would generate the schema for each according to the model (I have seen little to no posts or documentation on this, so I may be wrong). This means that our existing schema would need to be either abandoned (model-first), or mapped (code-first). Additionally, updating the schema would require manual scripts as EF does not appear to support schema upgrades (without wiping out data).
Are model-first and code-first the only viable means of supporting multiple DBMSes in EF? I realize that technically it would be impossible to guarantee that two arbitrary schemae are the same, so I am thinking this is true.
Are there any potential pitfalls of code-first and mapping to multiple DBMS systems? For example, Oracle does not have auto-increment columns; you have to use sequences. How is this mapped in the DbContext? Do I need to create separate maps for each DBMS?
Does EF support any mechanism to upgrade an existing DBMS schema to one of which is representative of the EF model (schema recreation =/= upgrade), or am I limited to doing this manually?
I did come up with one possible way to use database first and support multiple DBMSes, however it is a maintenance nightmare. The idea was to add another layer of abstraction to the two generated data models and create converter classes for each of the EF generated models. This seems like the best way of doing it so that each DBMS could potentially have its own model, yet my code would handle the mapping. But in doing this, what am I really gaining from EF? Maybe query generation, but is that worth it?
Actually both the model-first and the database-first have same constraints. Both these approaches are using an EDMX file which contains SSDL (a description of store = a database layer) part related directly to a single database provider so if you want to have two different database providers you must have two different SSDL parts and keep them in sync. You can use single CSDL (a description of conceptual layer = your model classes) and a single or two MSLs (a description of mapping between SSDL and CSDL - a single file is possible only if tables and columns will have exactly same names in both SSDLs). As I know EDMX file can consists only from single SSDL, CSDL and MSL parts so I expect that the designer has no support for this scenario and you will have to modify second SSDL manually or use two EDMXs = model each change twice.
The code-first approach can make this much more simple but the question is how good is Oracle provider when using the code-first and the database generation. The provider is responsible for correctly interpreting needed features like sequences in case of auto increment columns.
EF itself currently has no support for upgrading existing DB. When using EDMX the process of the database generation is controlled either by T4 template or Workflow so it can be customized and there is already separate feature called Entity Designer Database Generation Power Pack which allow incremental building of the database with the model-first approach. The problem is that this feature is using VS Database tools. I think these tools works only with SQL server. I never like these automated tools so I still think that database upgrade should be controlled manually with help of some tools to get difference script between the current and the last deployed database versions. You should need diff script only when deploying new the new version to a production environment. In a testing and a development environment you can always recreate the whole database.
There should be no abstraction needed when working with two EDMX models. Models must produce the same conceptual layer. In such case you need only a single set of POCO classes which are mapped by conventions (same class name as the entity, same properties with same types and accesibility) so they will work with both models.
Edit:
Based on #Tridus answer I'm just adding that you can create databases first and use fluentAPI from EF 4.1 to map them. Your databases must have exactly the same schema (table names, column names, etc.), they can't use any specific features (I hope sequences will not be the problem because it is just the way how Oracle handles auto increment columns).
This is actually fairly doable with a database first design, but there's some caveats you won't be able to get around easily due to how the databases handle things differently.
Sequences are one (in that they're just ignored by EF entirely). You can fake that in Oracle by putting a trigger on the table that populates it on Insert, but I also found that if you have to update the model later then EF "forgets" that the column is an identity column and it'll try to stick a 0 in it again. I also found it unreliable in Oracle to try and get the new ID if you use a trigger. We just wound up selecting from the sequence and setting the ID on the object before doing the insert because that's how you usually do it in Oracle. You could also use a stored procedure that handles it.
Numbers aren't handled the same way. SQL Server uses number formats that map to Int32, Int64, etc. Oracle's number format is totally different and a full range Int32 in SQL Server is a Number(10,0) in Oracle... which is actually an Int64 in EF because it's bigger then an Int32. I also found that Oracle's EF provider likes to use Decimal a lot even when it doesn't have to, but that's probably just a beta issue.
Stored Procedures in Oracle require some values to be put in app.config/web.config in order to work in EF. I'm not sure if that's going to just be clutter in SQL Server or if it'll cause problems.
Finally, EF Code First is pretty immature and according to the docs doesn't support changing the database structure in this version. I'm not sure if Oracle's provider supports it either (it might, haven't tried it).
Most of this is stuff you can get around, but you're going to need to do some work to hide the differences from the rest of your code and it'll probably take a wrapper layer to do it.
edit - In regards to your #4 - EF 4.1 can generate partial POCO classes. Instead of writing a wrapper around each of the generated models to hide any differences, you can create another partial class code file that won't be regenerated when you update the model, and then add properties/methods that hide the differences. Your app code would just have to be aware to use those instead, and they'd handle the issue (like the number issue I mentioned, you could completely hide it with another property that can do the necessary casting for Oracle).
I think its quite usual task, but still solutions I saw look not so nice.
For example in Qt used approach based on MVC pattern -- you must assign all connections manually.
Or I remember one PHP engine where pages were creating from DB schema.
Which are other approaches? Which one you are prefer? What are the best practices?
Usually, there is not a one to one mapping from the database to the GUIThere are many subtle combinations that change between how you store the data and how that stored data is visualized and edited by the user.
however, you can automate a datamodel layer in your code with tools like Hibernate. You will still need to use the model as required to present your user interface.
Up until now, my experience with databases has always been working with an intermediate definition layer that we have where I work. i.e. SQL wasn't directly written for the table definitions, but generated from an intermediate file which wrote out SQL scripts for creating the appropriate tables, upgrade scripts between schema changes, and helper functions for doing simple queries/updates/inserts/deletes from the database.
Now I'm in a situation where I don't have access to that, for reasons I won't get into, and I find myself somewhat lost at sea regarding what to do. I need to have a small number of tables in a database, and I'm unsure what's usually done to manage the table definitions.
Do people normally just use the SQL script that does the table creation as their definition, or does everyone just use an IDE that manages the definition in a separate file and regenerates the SQL script to create the tables?
I'd really prefer not to have to introduce a dependence on a specific IDE, because as we all know, developers are whiners that are prone to religious debates over small things.
Open your favorite text editor -> Start writing CREATE scripts -> Save -> Put in Source Control
That script now becomes the basis for you database. Anytime there are schema changes, they get put back into the scripts so that they don't get lost.
These become your definition.
I find it more reliable than depending on any specific IDE/Platform generating those scripts for you.
We write the scripts ourselves and store them in source control like any other code. Then the scripts that are appropriate for a particular version are all groupd together and promoted to prod together. Make sure to use alter table when changing existing tables becasue you don't want to drop and recreate them if they have data! I use a drop and recireate for all other objects though. If you need to add records to a particular table (usually a lookup of sometype) we do that in scripts as well. Then that too gets promoted with the rest of the version code.
For me, putting the scripts in source control however they are generated is the key step. This is how you know what you have changed for the next release. This is how you can see earlier versions and revert back easily if there is a problem. Treat database code the same wayyou treat all other code.
YOu could use one of the data modelling tools that creates scripts for you if you are starting out on a database design and the eventually want to create it for you. Some tools for that are Erwin, Fabforce etc... (though not free)
If you have access to an IDE like SQL Management studio, you can create them by using an GUI thats pretty simple.
If you are writing your own code, Its always better to write your own scripts based on a good template so that you cover all the properties of the definition of the table like the file_group, Collation & stuff. Hope this helps
Once you do create a base copy generate scripts and have a base reference copy of it so that you could do "incremental" changes on them and manage them in a source control.
Though I use TOAD for Oracle, I always write the scripts to create my database objects by hand. It gives you (and your DBA's) more control and knowledge of what's being created and how.
If your schema is too difficult to describe in SQL, you probably have other issues more pressing than which IDE. Use modelling documentation if you need a graphical representation, but yeah, you don't need an IDE.
There are multiple ways out there for what you are asking.
Old traditional way is to have a script file ready with your application that has CREATE TABLE statement.
If you are a developer, and that too a Java enterprise developer, you could generate complete schema using a persistence library called Hibernate. Here is a how to
If you are a DBA level user, you could take schema export from one environment and import that in to your current environment. This is a standard practice among DBAs. But it requires admin access as you can see. Also, the methods are dependent on the database you are using (oracle, db2 etc)
I've shown up at a new job and discovered database which is in dire need of some help. There are many many things wrong with it, including
No foreign keys...anywhere. They're faked by using ints and managing the relationship in code.
Practically every field can be NULL, which isn't really true
Naming conventions for tables and columns are practically non-existent
Varchars which are storing concatenated strings of relational information
Folks can argue, "It works", which it is. But moving forward, it's a total pain to manage all of this with code and opens us up to bugs IMO. Basically, the DB is being used as a flat file since it's not doing a whole lot of work.
I want to fix this. The issues I see now are:
We have a lot of data (migration, possibly tricky)
All of the DB logic is in code (with migration comes big code changes)
I'm also tempted to do something "radical" like moving to a schema-free DB.
What are some good strategies when faced with an existing DB built upon a poorly designed schema?
Enforce Foreign Keys: If a relationship exists in the domain, then it should have a Foreign Key.
Renaming existing tables/columns is fraught with danger, especially if there are many systems accessing the Database directly. Gotchas include tasks that run only periodically; these are often missed.
Of Interest: Scott Ambler's article: Introduction To Database Refactoring
and Catalog of Database Refactorings
Views are commonly used to transition between changing data models because of the encapsulation. A view looks like a table, but does not exist as a finite object in the database - you can change what column is being returned for a given column alias as desired. This allows you to setup your codebase to use a view, so you can move from the old table structure to the new one without the application needing to be updated. But it means the view has to return the data in the existing format. For example - your current data model has:
SELECT t.column --a list of concatenated strings, assuming comma separated
FROM TABLE t
...so the first version of the view would be the query above, but once you created the new table that uses 3NF, the query for the view would use:
SELECT GROUP_CONCAT(t.column SEPARATOR ',')
FROM NEW_TABLE t
...and the application code would never know that anything changed.
The problem with MySQL is that the view support is limited - you can't use variables within it, nor can they have subqueries.
The reality to the changes you wish to make is effectively rewriting the application from the ground up. Moving logic from the codebase into the data model will drastically change how the application gets the data. Model-View-Controller (MVC) is ideal to implement with changes like these, to minimize the cost of future changes like these.
I'd say leave it alone until you really understand it. Then make sure you don't start with one of the Things You Should Never Do.
Read Scott Ambler's book on Refactoring Databases. It covers a good many techniques for how to go about improving a database - including the transitional measures needed to allow both old and new programs to work with the changing design.
Create a completely new schema and make sure that it is fully normalized and contains any unique, check and not null constraints etc that are required and that appropriate data types are used.
Prepopulate each table that fills the parent role in a foreign key relationship with a single 'Unknown' record.
Create an ETL (Extract Transform Load) process (I can recommend SSIS (SQL Server Integration Services) but there are plenty of others) that you can use to refill the new schema from the existing one on a regular basis. Use the 'Unknown' record as the parent of any orphaned records - there will be plenty ;). You will need to put some thought into how you will consolidate duplicate records - this will probably need to be on a case by case basis.
Use as many iterations as are necessary to refine your new schema (ensure that the ETL Process is maintained and run regularly).
Create views over the new schema that match the existing schema as closely as possible.
Incrementally modify any clients to use the new schema making temporary use of the views where necessary. You should be able to gradually turn off parts of the ETL process and eventually disable it completely.
First see how bad the code is related to the DB if it is all mixed in no DAO layer you shouldn't think about a rewrite but if there is a DAO layer then it would be time to rewrite that layer and DB along with it. If possible make the migration tool based on using the two DAOs.
But my guess is there is no DAO so you need to find what areas of the code you are going to be changing and what parts of the DB that relates to hopefully you can cut it up into smaller parts that can be updated as you maintain. Biggest deal is to get FKs in there and start checking for proper indexes there is a good chance they aren't being done correctly.
I wouldn't worry too much about naming until the rest of the db is under control. As for the NULLs if the program chokes on a value being NULL don't let it be NULL but if the program can handle it I wouldn't worry about it at this point in the future if it is doing a default value move that to the DB but that is way down the line from the sound of things.
Do something about the Varchars sooner rather then later. If anything make that the first pure background fix to the program.
The other thing to do is estimate the effort of each areas change and then add that price to the cost of new development on that section of code. That way you can fix the parts as you add new features.