Database Migration and model transformation languages - sql

I'm considering migrating a database from SQL server to PostgreSQL.
The proposed new Postgres database will have a slightly different design from the old SQL Server model.
I heard about model transformation languages and thought they might be worth considering for this task.
I used TEFKAT to a limited extent some years ago, but it did not strike me as mature enough for this task (although that was many years ago).
But there are other options available like ATLAS, which I have not used yet.
Are there any model transformation languages out there that would be suitable?
Or is this whole model transformation stuff the wrong way to go about this?
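For a one-off migration like this, the most common alternative to a model transformation language is a plain ETL-style script. Below is a minimal Python sketch of that alternative; the table and column names (legacy_customer, customer, full_name), the connection strings, and the pyodbc/psycopg2 driver pairing are all placeholder assumptions for illustration, not a recommendation for your schema.

import pyodbc      # SQL Server driver
import psycopg2    # PostgreSQL driver

src = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=old-host;DATABASE=legacy;UID=app;PWD=secret"
)
dst = psycopg2.connect("host=new-host dbname=target user=app password=secret")

read_cur = src.cursor()
write_cur = dst.cursor()

# Read rows from the old SQL Server schema.
read_cur.execute("SELECT id, first_name, last_name, created_at FROM legacy_customer")

for cust_id, first_name, last_name, created_at in read_cur.fetchall():
    # Transform: the redesigned PostgreSQL schema keeps a single full_name column.
    full_name = f"{first_name} {last_name}".strip()
    write_cur.execute(
        "INSERT INTO customer (id, full_name, created_at) VALUES (%s, %s, %s)",
        (cust_id, full_name, created_at),
    )

dst.commit()
src.close()
dst.close()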

Related

um.. What is AgensGraph?

I heard about AgensGraph, but I wonder exactly what it is.
If you know anything about it, please let me know.
I got "What is AgensGraph" from AgensGraph documentation, which you can find this document from a following link: http://bitnine.net/support/documents_backup/quick-start-guide-html/
AgensGraph is a new-generation multi-model graph database for the modern, complex data environment. AgensGraph is a multi-model database that supports the relational and graph data models at the same time. It enables developers to integrate the legacy relational data model and the novel graph data model in one database. AgensGraph supports ANSI SQL and openCypher (http://www.opencypher.org). SQL queries and Cypher queries can be integrated into a single query in AgensGraph.
AgensGraph is based on the powerful PostgreSQL RDBMS, so it is very robust, fully featured and ready for enterprise use. It is optimized for handling complex connected graph data but, at the same time, it provides plenty of the powerful database features essential to an enterprise database environment, such as ACID transactions, multi-version concurrency control, stored procedures, triggers, constraints, sophisticated monitoring and a flexible data model (JSON). Moreover, AgensGraph can leverage the rich ecosystem of PostgreSQL and can be extended with many outstanding external modules, like PostGIS.
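To make the "two query languages, one database" idea above a bit more concrete, here is a rough Python sketch. Because AgensGraph is built on PostgreSQL, a standard driver such as psycopg2 can connect to it; the sketch runs one SQL query and one Cypher query over the same connection rather than the combined single-query syntax, and the table name, labels, relationship name and graph setup are assumptions for illustration that should be checked against the quick-start guide linked above.

import psycopg2

conn = psycopg2.connect("host=localhost dbname=agens user=agens")
cur = conn.cursor()

# Relational side: ordinary ANSI SQL against an ordinary (hypothetical) table.
cur.execute("SELECT id, name FROM customers WHERE country = %s", ("DE",))
print(cur.fetchall())

# Graph side: an openCypher pattern match (person/knows are made-up names).
# In AgensGraph this is sent like any other statement over the same connection,
# assuming a graph has been created and selected beforehand.
cur.execute("MATCH (a:person)-[:knows]->(b:person) RETURN a.name, b.name")
print(cur.fetchall())

conn.close()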

SSAS tabular model vs multidimensional model

I am new to SSAS tabular model and DAX. We are doing a POC to check which model we should use for our system. There are currently 2 models that we are evaluating: the SSAS Tabular Model and the Multidimensional Model.
My understanding is that the SSAS Tabular model has some size limitations, i.e. it is good for data up to about 1 TB on a single server, since it is limited in terms of memory usage. Is this true?
Currently our requirements call for less than 1TB of data, but that may change in the future.
I find the SSAS Tabular Model attractive due to ease of use and faster development cycles, but I would like to get some input from the community on whether this is the right choice.
Thank you,
Atul.
If you have enough money to buy the requisite hardware, go for the Tabular model, as it is almost always faster (exceptions aside). It has the newer, faster VertiPaq engine, which does a better job of compressing data and retrieving results. But never trust the size of the data alone to decide on the model: there could be cases where the calculations are so complex that they overwhelm the RAM. Finally, there are a good number of features which are still unavailable in the Tabular model, so understand those very well before making the decision. That said, there are a lot of factors in favor of the Multidimensional model too, and in many practical cases it doesn't make much sense to ditch it in favor of Tabular. But adopting Tabular modelling surely is looking towards the future. Hope that helps. All the best.
Today, Multidimensional models perform better in scalability, security and stability, and they have many advanced features that are not available in Tabular.
For example, implementing many-to-many relationships is easier in Multidimensional (only workarounds are available in Tabular mode).
Besides the technicalities, Tabular also requires a more expensive SQL Server license.
These 3 resources give quite a comprehensive analysis of the situation:
http://richardlees.blogspot.ca/2012/05/sql-server-2012-tabular-versus.html
https://sqlserverbiblog.wordpress.com/2013/06/24/ssas-tabular-models-the-good-the-bad-the-ugly-the-beautiful/
http://blogs.technet.com/b/cansql/archive/2015/01/15/mvp-series-promoting-an-excel-tabular-model-to-ssas.aspx

NoSQL Databases [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
For the last several years I have noticed that interest in NoSQL databases is increasing.
A lot of new databases of this kind have been released:
MongoDB
CouchDB
Memcachedb
memcached
Velocity
Cassandra
Tokyo Cabinet
etc..
What do you think: is NoSQL intended to replace the relational model, and in general, how do you see the future of NoSQL?
Why NoSQL (MongoDB)?
Scalable and flexible data store: this is the primary reason for moving away from a relational database.
Schemaless: complex hierarchical relationships can be represented in a single record.
Scaling out: data is partitioned across more machines.
Amazingly fast: MongoDB uses a binary wire protocol as the primary mode of interaction with the server.
Features:
Indexing with ease
Stored JavaScript
Simple administration (automatic failover if the master goes down in a master-slave architecture)
While MongoDB is powerful and attempts to keep many features from relational systems, it is not intended to do everything that a relational database does. Whenever possible, the database server offloads processing and logic to the client side.
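Here is a minimal pymongo sketch of the points above: schemaless documents (two "friend" records with different fields) and one-call indexing. The collection and field names are made up for illustration.

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
friends = client.demo.friends

# No schema to declare: each document carries only the fields it actually has.
friends.insert_one({"name": "Alice", "reading_interests": ["sci-fi", "history"]})
friends.insert_one({"name": "Bob"})  # no reading_interests field, and no empty column either

# Indexing with ease: one call creates a secondary index on a field.
friends.create_index([("name", ASCENDING)])

for doc in friends.find({"name": "Alice"}):
    print(doc)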
NoSQL systems such as MongoDB are designed for incredibly data-intensive applications; Facebook, for example, created a NoSQL solution called Cassandra to handle the vast amounts of data it had. NoSQL is useful for those who are building highly scalable applications, and it helps to reduce the need for empty table columns by not enforcing a database schema. For instance, if you stored information about your friends, you wouldn't have to record an empty "reading interests" field for a friend whose interests you don't know just because you know the reading interests of another.
Relational databases do have their place; NoSQL isn't really meant as a replacement, just as a different way of approaching the idea of data storage on a large scale. I would say that in the future more and more companies will begin using NoSQL solutions, but at the moment most people with small websites simply don't need a system designed to deal with such quantities of information.
Hope that helps!
I think NoSQL is meant to replace 'SQL'. The title alone sort of alludes to it: NoSQL means approaching the problem differently. Any system making use of both NoSQL and SQL is not fully embracing what it means to be a key-value store.
That is not to say this mixed approach isn't the right one today (it is, since many of the NoSQL technologies do not yet have the advanced features SQL databases have had for decades, features that currently solve the problem better than NoSQL does).
As NoSQL technologies mature (and data consistency is assured), companies will feel more comfortable completely removing SQL from their technology stack; and the commercial licensing of open-source databases (MySQL being bought by Oracle) will set the pace for the speed of this migration.
There are systems like playORM that can now do joins within partitions, so it is more and more likely that NoSQL can replace SQL systems in the future. In fact, playORM uses S-SQL (Scalable SQL), and you write nearly the same SQL as before, except that you specify the partitions you are querying. As a start, you can move a relational model to NoSQL using playORM with no partitioning and it will be just as fast, and then you can add partitioning to your relational model so that it scales.

How would I implement separate databases for reading and writing operations?

I am interested in implementing an architecture that has two databases: one for read operations and the other for writes. I have never implemented something like this, and have always built single-database, highly normalised systems, so I am not quite sure where to begin. I have a few parts to this question.
1. What would be a good resource to find out more about this architecture?
2. Is it just a question of replicating between two identical schemas, or would your schemas differ depending on the operations, would normalisation vary too?
3. How do you ensure that data written to one database is immediately available for reading from the second?
Any further help, tips, resources would be appreciated. Thanks.
EDIT
After some research I have found this article, which those interested may find very informative:
http://www.codefutures.com/database-sharding/
I found this highscalability article very informative
I'm not a specialist, but the read/write master database with read-only slaves pattern is a "common" pattern, especially for big applications doing mostly read accesses, and for data warehouses:
it allows you to scale (you can add more read-only slaves if required)
it allows you to tune the databases differently (for either efficient reads or efficient writes)
What would be a good resource to find out more about this architecture?
There are good resources available on the Internet. For example:
Highscalability.com has good examples (e.g. Wikimedia architecture, the master-slave category,...)
Handling Data in Mega Scale Systems (starting from slide 29)
MySQL Scale-Out approach for better performance and scalability as a key factor for Wikipedia’s growth
Chapter 24. High Availability and Load Balancing in PostgreSQL documentation
Chapter 16. Replication in MySQL documentation
http://www.google.com/search?q=read%2Fwrite+master+database+and+read-only+slaves
Is it just a question of replicating between two identical schemas, or would your schemas differ depending on the operations, would normalisation vary too?
I'm not sure - I'm eager to read answers from experts - but I think the schemas are identical in traditional replication scenarios (the tuning may be different, though). Maybe people are doing more exotic things, but I wonder if they rely on database replication in that case; it sounds more like "real-time ETL".
How do you ensure that data written to one database is immediately available for reading from the second?
I guess you would need synchronous replication for that (which is of course slower than asynchronous). While some databases do support this mode, not all do AFAIK. But have a look at this answer or this one for SQL Server.
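As a small illustration of the pattern discussed in this answer, here is a Python sketch of application-side read/write routing over a primary/replica (master/slave) setup. It assumes PostgreSQL with replication already configured and psycopg2 as the driver; host names, the orders table, and credentials are placeholders.

import psycopg2

primary = psycopg2.connect("host=db-primary dbname=app user=app password=secret")
replica = psycopg2.connect("host=db-replica dbname=app user=app password=secret")

def save_order(customer_id, total):
    # Writes always go to the primary (read/write) database.
    with primary, primary.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (%s, %s)",
            (customer_id, total),
        )

def recent_orders(customer_id):
    # Reads go to the read-only replica; with asynchronous replication they may
    # lag slightly behind the primary (hence the synchronous mode mentioned above).
    with replica.cursor() as cur:
        cur.execute(
            "SELECT id, total FROM orders WHERE customer_id = %s ORDER BY id DESC LIMIT 10",
            (customer_id,),
        )
        return cur.fetchall()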
You might look up data warehouses.
These serve as 'normalized for reporting' type databases, while you can keep a normalized OLTP style instance for the data maintenance.
I don't think the idea of 'immediate' equivalence will be a reality. There will be some delay while the new data and changes are migrated into the other system. The schedule and scope will be your big decisions here.
In regards to questions 2:
It really depends on what you are trying to achieve by having two databases. If it is for performance reasons (which I suspect it may be), I would suggest you look into denormalizing the read-only database as needed for performance. If performance isn't an issue, then I wouldn't mess with the read-only schema.
I've worked on similar systems where there would be a read/write database that was only lightly used by administrative users. That database would then be replicated to the read only database during a nightly process.
Question 3:
How immediate are we talking here? Less than a second? 10 seconds? Minutes?

Is ORM slow? Does it matter? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I really like ORMs compared to stored procedures, but one thing I am afraid of is that an ORM could be slow because of its layers and layers of abstraction. Will using an ORM slow down my application? Or does it matter?
Yes, it matters. It is using more CPU cycles and consequently slowing your application down. Hear me out though...
But, consider this: what is more expensive? Server hardware or another programmer? Server hardware, generally, is cheaper than hiring another team of programmers. So, while ORM may be costing you CPU cycles, you need one less programmer to manage your SQL queries, often resulting in a lower net cost.
To determine if it's worth it for you, calculate or determine how many hours you saved by using an ORM. Then, figure out how much money you spent on the server to support ORM. Multiply the hours you saved by your hourly rate and compare to the server cost.
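As a toy version of that calculation (all numbers invented purely for illustration):

hours_saved = 120          # developer hours saved by using the ORM
hourly_rate = 80           # cost per developer hour
extra_server_cost = 3000   # extra hardware to absorb the ORM overhead

developer_savings = hours_saved * hourly_rate   # 9600
print(developer_savings > extra_server_cost)    # True, so the ORM is the cheaper option here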
Of course, whether an ORM actually saves you time is a whole other debate...
Is ORM slow?
Not inherently. Some heavyweight ORMs can add a general drag to things but we're not talking orders of magnitude slowdown.
What does make ORM slow is naïve usage. If you're using an ORM because it looks easy and you don't know how the underlying relational data model works, you can easily write code that seems reasonable to an OO programmer, but will murder performance.
ORM is a handy tool, but you need the lower-level understanding (that usually comes from writing SQL queries) to go with it.
Does it matter?
If you end up performing a looped query for each of thousands of entities at once, instead of a single fast join, then certainly it can.
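To make the looped-query point concrete, here is a small SQLAlchemy sketch (the Author/Post models and data are invented): the naive loop triggers one extra query per post when the lazy-loaded author is touched, the classic N+1 problem, while joinedload fetches the same data in a single joined query.

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker

Base = declarative_base()

class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Post(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship(Author)

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Author(id=1, name="Alice"), Post(id=1, title="Hello", author_id=1)])
session.commit()

# Naive usage: one query for the posts, then one more query per post
# when .author is lazily loaded (N+1 round trips).
for post in session.query(Post).all():
    print(post.title, post.author.name)

# Informed usage: a single SELECT with a JOIN brings the authors along.
for post in session.query(Post).options(joinedload(Post.author)).all():
    print(post.title, post.author.name)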
ORMs are slower and do add overhead to applications (unless you specifically know how to get around these problems, which is not very common). The database is the most critical element, and web applications should be designed around it.
In many OOP frameworks using Active Record or ORMs, developers in general treat the database as an unimportant afterthought and tend to look at it as something they don't really need to learn. But performance and scalability usually suffer as the database is heavily taxed!
Many large-scale web apps have fallen flat, wasting millions of dollars and months to years of time, because they didn't recognize the importance of the database. Hundreds of concurrent users and tables with millions of records require database tuning and optimization. But I believe the problem is noticeable even with a few users and less data.
Why are developers so afraid to learn proper SQL and tuning measures when it's the key to performance?
In a Windows Mobile 5 project against SqlCe, I went from using hand-coded objects to code-generated (CodeSmith) objects built from an ORM template. In the process, all my data access used CSLA as a base layer.
The straight conversion improved my performance by 32% in local testing, almost all of it a result of better access methods.
After that change, we adjusted the templates (after seeing some SqlCe performance material from Steve Lasker at PDC) and in less than 20 minutes our entire data layer was greatly improved; our average 'slow' calls went from 460 ms to ~20 ms. The cool part about the ORM approach is that we only had to implement (and unit test) these changes once and all the data access code got changed. It was an amazing time saver; we maybe saved 40 hours or more.
The above being said, we did lose some time by taking out a bunch of 'waiting' and 'progress' dialogs that were no longer needed.
I have used a few of the ORM tools, and I can recommend two of them:
.NET Tiers
CSLA codegen templates
Both of them have performed quite nicely and any performance loss has not been noticeable.
I've always found it doesn't matter. You should use whatever will make you the most productive, responsive to changes, and whatever is easiest to debug and maintain.
Most applications never see enough load for the difference between an ORM and SPs to be noticeable. And there are optimizations to make ORMs faster.
Finally, a well-written app will have its data access separated from everything else, so that switching from the ORM to something else would be possible in the future.
Is ORM slow?
Yes (compared with stored procedures).
Does it matter?
No (unless speed is your main concern).
I think the problem is that many people think of ORM as an object "trick" for databases, a way to code less or simplify SQL usage, while in reality it is, well, an Object-to-Relational (DB) Mapping.
ORM is used to persist your objects to a relational database management system, and not (just) to substitute for or simplify SQL (although it does a good job of that too).
If you don't have a good object model, or you're using it to build reports, or even if you're just trying to get some information out, ORM is not worth it.
If, on the other hand, you have a complex system modeled through objects, where each one has different rules and they interact dynamically, and your concern is persisting that information to the database rather than replacing some existing SQL scripts, then go for ORM.
Yes, ORM will slow down your application. By how much depends on how far the abstraction goes, how well your object model maps to the database, and other factors. The question should be, are you willing to spend more developer time and use straight data access or trade less dev time for slower runtime performance.
Overall, the good ORMs have little overhead and, by and large, are considered well worth the trade off.
Yes, ORMs affect performance, whether that matters ultimately depends on the specifics of your project.
Programmers often love ORMs because they like nice front-end coding environments like Visual Studio and dislike coding raw SQL with no IntelliSense, etc.
ORMs have other limitations besides a performance hit: they also often do not do what you need 100% of the time, they add the complexity of an additional abstraction layer that must be maintained and re-established every time changes are made, and there are caching issues to be dealt with.
Just a thought: if the database vendors made the SQL programming environment as nice as Visual Studio, and provided a more natural linkage between the database code and the front-end code, we wouldn't need the ORMs... I guess things may go in that direction eventually.
Obvious answer: It depends
ORM does a good job of insulating a programmer from SQL. This in effect substitutes mediocre, computer generated queries for the catastrophically bad queries a programmer might give.
Even in the best case, an ORM is going to do some extra work, loading fields it doesn't need to, explicitly checking constraints, and so forth.
When these become a bottleneck, most ORMs let you side-step them and inject raw SQL.
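Here is a small sketch of that escape hatch, using SQLAlchemy's text() construct to run a hand-tuned statement when the generated query becomes the bottleneck (the orders table and its columns are invented for the example):

from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory database for the demo

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)"))
    conn.execute(text("INSERT INTO orders (status, total) VALUES ('open', 19.90)"))

    # Hand-written SQL with a bound parameter, bypassing the ORM's query builder.
    rows = conn.execute(
        text("SELECT id, total FROM orders WHERE status = :status"),
        {"status": "open"},
    ).fetchall()
    print(rows)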
If your application fits well with objects, but not quite so easily with relations, then this can still be a win. If instead your app fits nicely around a relational model, then the ORM represents a coding bottleneck on top of a possible performance bottleneck.
One thing I've found to be particularly offensive about most ORMs is their handling of primary keys. Most ORMs require PKs for everything they touch, even if there is no conceivable use for them. Example: authors should have PKs, blog posts SHOULD have PKs, but the links (the join table) between authors and posts should not need one.
I have found that the difference between "too slow" and "not too much slower" depends on if you have your ORM's 2nd level (SessionFactory) cache enabled. With it off it handles fine under development load, but will crush your system under mild production load. After turning on the 2nd Level cache the server handled the expected load and scaled nicely.
An ORM can be an order of magnitude slower, not just on the grounds of wasting a lot of CPU cycles on its own, but also because it uses much more memory, which then has to be garbage-collected.
Much worse than that, however, is that there is no standard for ORMs (unlike SQL) and that, by and large, ORMs use SQL very inefficiently, so at the end of the day you still have to dig into SQL to fix performance issues every time an ORM makes a mess and you have to debug it. Meaning that you haven't gained anything at all.
It's terribly immature technology for real production-level applications. Very problematic things are the handling of indexes and foreign keys, tweaking tables to fit object hierarchies, and terribly long transactions, which mean many more deadlocks and retries - if an ORM knows how to handle that at all.
It actually makes servers less scalable, which multiplies costs, but these costs don't get mentioned at the beginning - a little inconvenient truth :-) When something uses transactions 10-100 times bigger than optimal, it becomes impossible to scale the SQL side at all. I'm talking about serious systems again, not home/toy/academic stuff.
An ORM will always add some overhead because of the layers of abstraction, but unless it is a poorly designed ORM, that overhead should be minimal. The time to actually query the database will be many times greater than the additional overhead of the ORM infrastructure if you are doing it correctly, for example by not loading the full object graph when it is not required. A good ORM (NHibernate) will also give you many options for the queries run against the database, so you can optimise as required as well.
Using an ORM is generally slower. But the boost in productivity you get will get your application up and running much faster. And the time you save can later be spent finding the portions of your application that are causing the biggest slow down - you can then spend time optimizing the areas where you get the best return on your development effort. Just because you've decided to use an ORM doesn't mean you can't use other techniques in the sections of code that can really benefit from it.
An ORM can be slower, but this is offset by their ability to cache data, therefore however fast the alternative, you can't get much faster than reading from memory.
I never really understood why people think that this is slower or that is slower... get a real machine, I say. I have had mixed results: I've seen cases where execution time for a stored procedure was much slower than with an ORM, and vice versa, but in both cases the difference was due to differences in hardware.