There seems to be a long-standing trend in web development to abstract the database to the point of being a glorified spreadsheet, a simple data dump.
I really like RDBM systems, SQL, and the associated procedural programming. I think the choice of RDBMS aught to be part of the solution. MY RDBMS of choice is PostgreSQL and I'd very much like to take advantage of the features of a RDBMS and of postgres' specific features.
I'd like to leverage the DB for things like data integrity. I'd like to use things like constraints (FK and otherwise), triggers, listen/notify, windowing, stored procedures, and so on.
Basically I want to create a data layer API using PL/PGSQL through which the database is accessed by the web application. I've already got an authentication and persistent login API, and a Calendar API with full support for iCal-style event repetitions (a pleasant challenge to implement!)
What I'd like is a framework that enables this kind of DB-oriented development, heavy on stored procedures. That doesn't get in the way of the DB, but makes it easy to proxy data to and from the client.
Ideally, I'd like something that doesn't have a lot of boilerplate code for executing queries (if at all.) Something that could take HTTP GET/POST data for query parameters, then return query results to the client in JSON or XML. Light-weight, asynchronous. I don't even need html templates - I'm hoping to build the client entirely out of "static" html and javascript.
Note that I'm a fairly versed developer; I just happen to be a fan of relational databases. I'm willing to admit that my perception of web frameworks and databases might be out of tune! But that's why I'm here - my perception is fairly entrenched so I'd like to poll the pros.
I think you need two things
1) For database access you should looks into Spring JDBC templates.
2) To do a simple Query to your service with GET, POST, PUT etc methods you should look into RESTful webservices here Jersey
Related
Coming from an SQL guy, I'm looking to use NoSQL in production. One thing I noticed is that transferring from one provider to another is not going to be as easy as importing/exporting in "normal" SQL. Correct me if I'm wrong but it appears each flavor of NoSQL has its own "scheme".
My question is, say I chose Google Datastore today, then some time in the future I decide to move my data to Amazon DynamoDB or some hosted MongoDB service, for whatever reason it may be (price/performance/etc).. Do I need to code my own transition script or is there a standard way/tool to move across different NoSQL solutions (like simple import/export in traditional SQL DBs)?
Yes, you guessed right. The databases you listed are all wildly different. Consequently, there is no automated way of moving data between them (that I heard of). Nor is there any sense.
MongoDB, for example, supports quite rich set of operations, compared to which the key-value API of DynamoDB looks primitive. So unless your app only uses "get_item/put_item" operations, you can't really switch between different NoSQL databases.
I suppose this isn't a particularly "good fit for our Q&A format" but no idea where else to ask it.
Why is SQL-querying text based? Is there a historical motivation, or just laziness?
mysqli_query("SELECT * FROM users WHERE Name=$name");
It opens up an incredible number of stupid mistakes like above, encouraging SQL injection. It's basically exec for SQL, and exec'ing code, especially dynamically generated, is widely discouraged.
Plus, to be honest, prepared statements seem like just a verbose workaround.
Why don't we have a more object-oriented, noninjectable way of doing it, like having a directly-accessible database object with certain methods:
$DB->setDB("mysite");
$table='users';
$col='username';
$DB->selectWhereEquals($table,$col,$name);
Which the database itself natively implements, completely eliminating all resemblance to exec and nullifying the entire basis of SQL injection.
Culminating in the real question, is there any database framework that does anything like this?
Why are programming languages text-based? Quite simply: because text can be used to represent powerful human-readable/editable DSLs (Domain-specific language), such as SQL.
The better (?) question is: Why do programmers refuse to use placeholders?
Modern database providers (e.g. ADO.NET, PDO) support placeholders and an appropriate range of database adapters1. Programmers who fail to take advantage of this support only have themselves to blame.
Besides ubiquitous support for placeholders in direct database providers, there are also many different APIs available including:
"Fluent database libraries"; these generally map an AST, such as LINQ, onto a dynamically generated SQL query and take care of the details such as correctly using placeholders.
A wide variety of ORMs; may or may not include a fluent API.
Primitive CRUD wrappers as found in the Android SQLite API, which looks suspiciously similar to the proposal.
I love the power of SQL and almost none of the queries I write can be expressed in such a trivial API. Even LINQ, which is a powerful provider, can sometimes get in my way (although the prudent use of Views helps).
1 Even though SQL is text-based (and such is used to pass the shape of the query to the back-end), bound data need not be included in-line, which is where the issue of SQL Injection (being able to change the shape of the query) comes in. The database adapter can, and does if it is smart enough, use the API exposed by the target database. In the case of SQL Server this amounts to sending the data separately [via RPC] and for SQLite this means creating and binding separate prepared-statement objects.
In HTML+CSS+JS world, http://jsfiddle.net/ is very helpful tool for asking / making example about web development. And I also saw several browser(javascript)-based programming language compilers and REPLs. But I can't find online / web-based test environment for database operations( especially for RDBMS ).
Is there any open/free database service with web-based interfaces for testing queries?
Added: This tool will be good for this situation; If I'm troubling with complex queries, then create a sample table via web interface and ask it on stackoverflow with the 'sample table URL'. Anyone can access to the URL and test their queries on web site. (Yes, queries are running on 'real' database system) And also the query results can be tracked, then we can even make 'ranking' for it :)
Try SQL Fiddle.
You can try your SQL query and execute/test it.
There are free "disposable" database servers like db4free and even MonoQL.
As far as the web-based interfaces and short URLs go, I don't think you'll have much luck.
To manage your data you have to stick to what is provided (usually phpMyAdmin or similar) and there is no short-URL to query mapping. One other caveat of such system is that (without the appropriate user permissions) one user could easily destroy all your test data -- and remember that (relational) database versioning is much more expensive than plain text versioning, so that's pretty much out of the question.
For non-RDBMS, I can think of try.mongodb.org -- but it suffers from the same problems.
Almost forgot, the Stack Exchange Data Explorer, lets you practice T-SQL queries (with permalinks).
PS: As a personal side-note, I think it's a cool idea and I would love to see something like that implemented, perhaps even mashed-up with SchemaBank or similar - that would be just awesome.
You can't really test a query without the right underlying dbms, schemas (or databases), tables, constraints, stored procedures, and permissions, which tend to be highly application specific. (That is, not readily reusable among multiple users.)
Instead, the database world has grown up into database management systems that you can freely download and install locally. Then you can build and populate your own tables, and test your queries however you like.
Most of these come with both a command line interface and some kind of graphical interface. It's not clear to me what a web-based interface would give you that doesn't already exist in one form or another.
I think that, to do what you want, would require commercial licenses for Oracle, DB2, SQL Server, and Sybase. That's a pretty high barrier to entry for a free web site.
Trouble with a web based query analyser is that you'd need to let it 'tunnel' on to your box to run the queries and for many making a development/test box open to the internet is not a possibility.
For a non web based tool you could look at LinqPad http://www.linqpad.net/ - it does Linq & Sql and other stuff too - very handy tool indeed
The reason why I ask this is because I need to know whether not using ORM for a social networking site makes any sense at all.
My argument why ORM does not fit into social networking sites are:
Social networking sites are not a product, thus you don't need to support multiple database. You know what database to use, and you most likely won't change it every now and then.
Social networking sites requires many-to-many relationship between users, and in the end sometimes you will need to write plain SQL to get those relations. The value of ORM is thus decreased again.
Related to the previous point, ORM sometimes do multiple queries in the backend to fetch its record, which sometimes may be inefficient and may cause bottleneck in the database. In the end you have to write down plain SQL query. If we know we are going to write plain SQL anyway, what is the point using ORM?
This is my limited understanding based on my limited experience. What are you're experience with building a social networking sites? Are my points valid? Is it lame to use bare SQL without worrying about using ORM? What are the points where ORM may help in building a social networking sites?
The value of using an ORM is to help speed up development, by automating the tedious work of assigning query results to object fields, and tracking changes to object fields so you can save them to the database. Hence the term Object-Relational Mapping.
An ORM has little value for you regarding database portability, since you only use the one database you deploy on.
The runtime performance aspect of an ORM is no better than, and typically much worse than writing plain SQL yourself. The generic methods of query generation often make naive mistakes and result in redundant queries, as you have mentioned. Again, the benefit is in development time, not runtime efficiency.
Using an ORM versus not using an ORM doesn't seem to make a huge difference for scalability. Other techniques with more bang-for-the-buck for scalability include:
Managing indexes in the RDBMS. Improve as many algorithms as possible from O(n) to O(log2n).
Intelligent caching architecture.
Horizontal scaling by database partitioning/sharding.
Database load-balancing and replication. Read from slave databases where possible, and write to a single master database. Index slaves and masters differently.
Supplement the RDBMS with complementary technology, such as Sphinx Search.
Vertical scaling by throwing hardware at the problem. Jeff Atwood has commented about this on the StackOverflow podcast.
Some people advocate moving your data management to a distributed architecture using cloud computing or distributed non-relational databases. This is probably not necessary until you get a very large number of users. Once you grow to a certain level of magnitude, all the rules change and you probably can't use an RDBMS anyway. But unless you are the data architect at Yahoo or Facebook or LinkedIn, don't worry about it -- cloud computing is over-hyped.
There's a common wisdom that the database is always the bottleneck in web apps, but there's also a case that improving efficiency on the front-end is at least as important. Cf. books by Steve Souders.
Julia Lerman in Programming Entity Framework (2009), p.503 shows that there's a 220% increase in query execution cost between using a DataReader directly and using Microsoft’s LINQ to Entities.
Also see Jeff Atwood's post on All Abstractions are Failed Abstractions, where he shows that using LINQ is at least double the cost of using plain SQL even in a naive way.
Here's my response to your points:
ORM does not need multiple database to be effective, in fact most cases of ORM usage are not due to the ability to adapt to different databases.
Most modern ORM frameworks are flexible enough to fetch 'lightweight' variants of mapped classes, it really depends on how you implement them.
If really required to, you can write native SQL queries within the ORM frameworks. Do note that caching and performance related algorithms are often part of the these frameworks.
IMO, an ORM helps you write cleaner, clearer code. If you use it sloppily you can cause excessive queries, but that isn't a rule by any means. If I were you I would start using the ORM and best practices of a framework, and only drop to SQL if you find yourself needing functionality that the ORM does not provide.
Also note that in web applications, many people are moving away from SQL databases. An ORM might help you to migrate to a non-relational database (precisely because you do not have SQL in your application code). Look at the use of JDO and JPA in Google's App Engine.
IMHO. ORM is need.
It allow you to access database in OOP way, no matter multiple database or not.
Cleaner code, you can define all method related to a particular table in the table class file, if you need raw sql join query, no problem, define there. it follows DRY and KISS. It is much better than you write similar raw sql query again and again.
The odds of your site being big enough that scaling becomes an issue are quite small so why prematurely optimize by doing everything in raw SQL instead of an ORM? You can get fairly far by throwing better hardware at a database assuming the database and application design are decent. While you may need to write raw SQL for things like creating friend graphs what about all the little things like updating the database when someone changes there email, sends a private message, uploads a photo, etc? Using an ORM can simplify all the simple database tasks you will have to do while still allowing you to hand code where absolutely necessary.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
As a web developer looking to move from hand-coded PHP sites to framework-based sites, I have seen a lot of discussion about the advantages of one ORM over another. It seems to be useful for projects of a certain (?) size, and even more important for enterprise-level applications.
What does it give me as a developer? How will my code differ from the individual SELECT statements that I use now? How will it help with DB access and security? How does it find out about the DB schema and user credentials?
Edit: #duffymo pointed out what should have been obvious to me: ORM is only useful for OOP code. My code is not OO, so I haven't run into the problems that ORM solves.
I'd say that if you aren't dealing with objects there's little point in using an ORM.
If your relational tables/columns map 1:1 with objects/attributes, there's not much point in using an ORM.
If your objects don't have any 1:1, 1:m or m:n relationships with other objects, there's not much point in using an ORM.
If you have complex, hand-tuned SQL, there's not much point in using an ORM.
If you've decided that your database will have stored procedures as its interface, there's not much point in using an ORM.
If you have a complex legacy schema that can't be refactored, there's not much point in using an ORM.
So here's the converse:
If you have a solid object model, with relationships between objects that are 1:1, 1:m, and m:n, don't have stored procedures, and like the dynamic SQL that an ORM solution will give you, by all means use an ORM.
Decisions like these are always a choice. Choose, implement, measure, evaluate.
ORMs are being hyped for being the solution to Data Access problems. Personally, after having used them in an Enterprise Project, they are far from being the solution for Enterprise Application Development. Maybe they work in small projects. Here are the problems we have experienced with them specifically nHibernate:
Configuration: ORM technologies require configuration files to map table schemas into object structures. In large enterprise systems the configuration grows very quickly and becomes extremely difficult to create and manage. Maintaining the configuration also gets tedious and unmaintainable as business requirements and models constantly change and evolve in an agile environment.
Custom Queries: The ability to map custom queries that do not fit into any defined object is either not supported or not recommended by the framework providers. Developers are forced to find work-arounds by writing adhoc objects and queries, or writing custom code to get the data they need. They may have to use Stored Procedures on a regular basis for anything more complex than a simple Select.
Proprietery binding: These frameworks require the use of proprietary libraries and proprietary object query languages that are not standardized in the computer science industry. These proprietary libraries and query languages bind the application to the specific implementation of the provider with little or no flexibility to change if required and no interoperability to collaborate with each other.
Object Query Languages: New query languages called Object Query Languages are provided to perform queries on the object model. They automatically generate SQL queries against the databse and the user is abstracted from the process. To Object Oriented developers this may seem like a benefit since they feel the problem of writing SQL is solved. The problem in practicality is that these query languages cannot support some of the intermediate to advanced SQL constructs required by most real world applications. They also prevent developers from tweaking the SQL queries if necessary.
Performance: The ORM layers use reflection and introspection to instantiate and populate the objects with data from the database. These are costly operations in terms of processing and add to the performance degradation of the mapping operations. The Object Queries that are translated to produce unoptimized queries without the option of tuning them causing significant performance losses and overloading of the database management systems. Performance tuning the SQL is almost impossible since the frameworks provide little flexiblity over controlling the SQL that gets autogenerated.
Tight coupling: This approach creates a tight dependancy between model objects and database schemas. Developers don't want a one-to-one correlation between database fields and class fields. Changing the database schema has rippling affects in the object model and mapping configuration and vice versa.
Caches: This approach also requires the use of object caches and contexts that are necessary to maintian and track the state of the object and reduce database roundtrips for the cached data. These caches if not maintained and synchrnonized in a multi-tiered implementation can have significant ramifications in terms of data-accuracy and concurrency. Often third party caches or external caches have to be plugged in to solve this problem, adding extensive burden to the data-access layer.
For more information on our analysis you can read:
http://www.orasissoftware.com/driver.aspx?topic=whitepaper
At a very high level: ORMs help to reduce the Object-Relational impedance mismatch. They allow you to store and retrieve full live objects from a relational database without doing a lot of parsing/serialization yourself.
What does it give me as a developer?
For starters it helps you stay DRY. Either you schema or you model classes are authoritative and the other is automatically generated which reduces the number of bugs and amount of boiler plate code.
It helps with marshaling. ORMs generally handle marshaling the values of individual columns into the appropriate types so that you don't have to parse/serialize them yourself. Furthermore, it allows you to retrieve fully formed object from the DB rather than simply row objects that you have to wrap your self.
How will my code differ from the individual SELECT statements that I use now?
Since your queries will return objects rather then just rows, you will be able to access related objects using attribute access rather than creating a new query. You are generally able to write SQL directly when you need to, but for most operations (CRUD) the ORM will make the code for interacting with persistent objects simpler.
How will it help with DB access and security?
Generally speaking, ORMs have their own API for building queries (eg. attribute access) and so are less vulnerable to SQL injection attacks; however, they often allow you to inject your own SQL into the generated queries so that you can do strange things if you need to. Such injected SQL you are responsible for sanitizing yourself, but, if you stay away from using such features then the ORM should take care of sanitizing user data automatically.
How does it find out about the DB schema and user credentials?
Many ORMs come with tools that will inspect a schema and build up a set of model classes that allow you to interact with the objects in the database. [Database] user credentials are generally stored in a settings file.
If you write your data access layer by hand, you are essentially writing your own feature poor ORM.
Oren Eini has a nice blog which sums up what essential features you may need in your DAL/ORM and why it writing your own becomes a bad idea after time:
http://ayende.com/Blog/archive/2006/05/12/25ReasonsNotToWriteYourOwnObjectRelationalMapper.aspx
EDIT: The OP has commented in other answers that his code base isn't very object oriented. Dealing with object mapping is only one facet of ORMs. The Active Record pattern is a good example of how ORMs are still useful in scenarios where objects map 1:1 to tables.
Top Benefits:
Database Abstraction
API-centric design mentality
High Level == Less to worry about at the fundamental level (its been thought of for you)
I have to say, working with an ORM is really the evolution of database-driven applications. You worry less about the boilerplate SQL you always write, and more on how the interfaces can work together to make a very straightforward system.
I love not having to worry about INNER JOIN and SELECT COUNT(*). I just work in my high level abstraction, and I've taken care of database abstraction at the same time.
Having said that, I never have really run into an issue where I needed to run the same code on more than one database system at a time realistically. However, that's not to say that case doesn't exist, its a very real problem for some developers.
I can't speak for other ORM's, just Hibernate (for Java).
Hibernate gives me the following:
Automatically updates schema for tables on production system at run-time. Sometimes you still have to update some things manually yourself.
Automatically creates foreign keys which keeps you from writing bad code that is creating orphaned data.
Implements connection pooling. Multiple connection pooling providers are available.
Caches data for faster access. Multiple caching providers are available. This also allows you to cluster together many servers to help you scale.
Makes database access more transparent so that you can easily port your application to another database.
Make queries easier to write. The following query that would normally require you to write 'join' three times can be written like this:
"from Invoice i where i.customer.address.city = ?" this retrieves all invoices with a specific city
a list of Invoice objects are returned. I can then call invoice.getCustomer().getCompanyName(); if the data is not already in the cache the database is queried automatically in the background
You can reverse-engineer a database to create the hibernate schema (haven't tried this myself) or you can create the schema from scratch.
There is of course a learning curve as with any new technology but I think it's well worth it.
When needed you can still drop down to the lower SQL level to write an optimized query.
Most databases used are relational databases which does not directly translate to objects. What an Object-Relational Mapper does is take the data, create a shell around it with utility functions for updating, removing, inserting, and other operations that can be performed. So instead of thinking of it as an array of rows, you now have a list of objets that you can manipulate as you would any other and simply call obj.Save() when you're done.
I suggest you take a look at some of the ORM's that are in use, a favourite of mine is the ORM used in the python framework, django. The idea is that you write a definition of how your data looks in the database and the ORM takes care of validation, checks and any mechanics that need to run before the data is inserted.
What does it give me as a developer?
Saves you time, since you don't have to code the db access portion.
How will my code differ from the individual SELECT statements that I use now?
You will use either attributes or xml files to define the class mapping to the database tables.
How will it help with DB access and security?
Most frameworks try to adhere to db best practices where applicable, such as parametrized SQL and such. Because the implementation detail is coded in the framework, you don't have to worry about it. For this reason, however, it's also important to understand the framework you're using, and be aware of any design flaws or bugs that may open unexpected holes.
How does it find out about the DB schema and user credentials?
You provide the connection string as always. The framework providers (e.g. SQL, Oracle, MySQL specific classes) provide the implementation that queries the db schema, processes the class mappings, and renders / executes the db access code as necessary.
Personally I've not had a great experience with using ORM technology to date. I'm currently working for a company that uses nHibernate and I really can't get on with it. Give me a stored proc and DAL any day! More code sure ... but also more control and code that's easier to debug - from my experience using an early version of nHibernate it has to be added.
Using an ORM will remove dependencies from your code on a particular SQL dialect. Instead of directly interacting with the database you'll be interacting with an abstraction layer that provides insulation between your code and the database implementation. Additionally, ORMs typically provide protection from SQL injection by constructing parameterized queries. Granted you could do this yourself, but it's nice to have the framework guarantee.
ORMs work in one of two ways: some discover the schema from an existing database -- the LINQToSQL designer does this --, others require you to map your class onto a table. In both cases, once the schema has been mapped, the ORM may be able to create (recreate) your database structure for you. DB permissions probably still need to be applied by hand or via custom SQL.
Typically, the credentials supplied programatically via the API or using a configuration file -- or both, defaults coming from a configuration file, but able to be override in code.
While I agree with the accepted answer almost completely, I think it can be amended with lightweight alternatives in mind.
If you have complex, hand-tuned SQL
If your objects don't have any 1:1, 1:m or m:n relationships with other objects
If you have a complex legacy schema that can't be refactored
...then you might benefit from a lightweight ORM where SQL is is not
obscured or abstracted to the point where it is easier to write your
own database integration.
These are a few of the many reasons why the developer team at my company decided that we needed to make a more flexible abstraction to reside on top of the JDBC.
There are many open source alternatives around that accomplish similar things, and jORM is our proposed solution.
I would recommend to evaluate a few of the strongest candidates before choosing a lightweight ORM. They are slightly different in their approach to abstract databases, but might look similar from a top down view.
jORM
ActiveJDBC
ORMLite
my concern with ORM frameworks is probably the very thing that makes it attractive to lots of developers.
nameley that it obviates the need to 'care' about what's going on at the DB level. Most of the problems that we see during the day to day running of our apps are related to database problems. I worry slightly about a world that is 100% ORM that people won't know about what queries are hitting the database, or if they do, they are unsure about how to change them or optimize them.
{I realize this may be a contraversial answer :) }