Is this a valid benefit of using stored procedures over embedded SQL?

Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down tick.
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs rather than embedded code is that you are more insulated from change without taking a performance hit.
For example: let's say you need to perform Query A, which returns a scalar integer.
Then, later, the requirements change and you decide that if the result of that scalar query is > x, then, and only then, you need to perform another query. If you performed the first query in an SP, you could easily check the result of the first query and conditionally execute the second query in the same SP.
How would you do this efficiently in embedded SQL w/o performing a separate query or an unnecessary query?
Here's an example:
--This SP may return one or two result sets.
DECLARE @CustCount INT
SELECT @CustCount = COUNT(*) FROM CUSTOMER
IF @CustCount > 10
    SELECT * FROM PRODUCT
Can this be done in embedded SQL, and if so, what is the best way to do it?

A very persuasive article
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.

In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.

I would generally never put business logic in SPs; I like it to be in my language of choice, outside the database. The only time I agree SPs are better is when there is a lot of data movement that doesn't need to come out of the db.
So to answer your question, I'd rather have two queries in my code than embed that in an SP; in my view I am trading a small performance hit for something a lot clearer.

How would you do this efficiently in embedded SQL w/o performing a separate query or an unnecessary query?
Depends on the database you are using. In SQL Server, this is a simple CASE statement.
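For instance, one way to fold the check into a single statement is to let a CASE gate the WHERE clause. This is only a sketch using the tables from the question, not the only way to write it:
SELECT *
FROM PRODUCT
WHERE CASE WHEN (SELECT COUNT(*) FROM CUSTOMER) > 10
           THEN 1 ELSE 0 END = 1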

Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold

Lately I prefer not to use SPs (except when uber complexity arises where a proc would just be better... or CLR would be better). I have been using the Repository pattern with LINQ to SQL, where my query is written in my data layer as a strongly typed LINQ expression. The key here is that the query is strongly typed, which means when I refactor I am refactoring properties of a class that is directly generated from the database table (which makes changes from the DB carried all the way forward super easy and accurate). While my SQL is generated for me and sent to the server, I still have the option of sticking to DRY principles, as the repository pattern allows me to break things down into their smallest components.
I do have the issue that I might make a trip to the server and, based on the results of that query, find that I need to make another trip. I don't worry about this up front. If I find later that it becomes an issue, then I may refactor that code into something more performant.
The overall key here is that there is no one magic bullet. I tend to work on greenfield applications, which allows this method of development to be most efficient for me.

Benefits of SPs:
Performance (they are precompiled)
Easy to change (without recompiling the application)
SQL's set-based features make really difficult data tasks very easy
Drawbacks:
They depend heavily on the database engine used
They make deployment of upgrades a little harder (you have to deploy the app + the scripts)
My 2 cents...
About your example, it can be done like this:
select * from products where (select count(*) from customers) > 10


Why is dynamic selection of column & table names so difficult in SQL?

I figure there has to be a specific design reason why you can't write a query like the following one:
select
(select column_name
from information_schema
where column_name not like '%rate%'
and table_name = 'Fixed_Income')
from Fixed_Income
and instead have to resort to dynamic SQL.
Anyone knows what that reason is? I tried Googling it, but all the hits were cries for help in solving the problem -- meaning it's a pretty widespread need and not well understood.
The reason is that the query optimizer needs to know the exact schema objects you are referring to at compile time. It needs them to optimize the query. You wouldn't believe how slow the RDBMS would be without having this information available to the query optimizer.
It's a little like the performance difference of static vs. dynamic typing in practice: There is usually a non-trivial difference (I'm thinking just about mainstream languages here). The compiler can exploit the static information to generate great code.
Even if this feature was present, it would be implemented by first computing the table and column names and then doing a standard "static" query planning.
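As a hedged T-SQL sketch of that two-step approach (assuming SQL Server 2017+ for STRING_AGG; the table and filter come from the question):
-- Step 1: compute the column names.
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX)
SELECT @cols = STRING_AGG(QUOTENAME(column_name), ', ')
FROM information_schema.columns
WHERE table_name = 'Fixed_Income'
  AND column_name NOT LIKE '%rate%'
-- Step 2: compile and run a now-"static" query.
SET @sql = N'SELECT ' + @cols + N' FROM Fixed_Income'
EXEC sp_executesql @sql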
You ask a very interesting question.
The "relational" in "relational algebra" refers to name-value pairs, not to relationships between tables. In relational algebra, there is no requirement that all records in a set (table) have the same columns.
My best guess is that the limitation is related to the idea of entity-relationship diagrams comes into play. A database is designed around tables, and these tables have relationships to each other. The choice of a relational database for data storage and access was specifically when the data could be stored this way. Knowing the entities and their attributes suggests a static form of the data and hence static references in queries.
In addition, SQL as a language is a declarative language rather than a procedural language. This suggests -- but does not impose -- a compilation step separate from the running of the query. In general, the SQL engine does the following (at a very high level):
Compiles the query, generally into some sort of data flow process.
Optimizes the data flow process. (Typically part of the compilation process.)
Runs the query.
The first two result in what is called "the query plan". You really cannot do optimization, though, unless you know about the objects you are operating on. So, dynamically choosing tables and columns means that optimization would be part of running the query rather than compiling it.
Finally, some databases like SQL Server support dynamic SQL. This allows you to build strings that get compiled and run at the same time. This is very useful for complex decision support queries. It is not recommended when you need fast transaction throughput, because the overhead for compilation is too high relative to the query.

Confused about the role of a query language

So, I haven't had any luck finding any articles or forum posts that explain how exactly a query language works in conjunction with a general-use programming language like C++ or VB. So I guess it won't hurt to ask >.<
Basically, I've been having a hard time understanding what the roles of the query language are (we'll use SQL as an example for the query language and VB6 for the normal language) if I'm creating a simple database query that fills a table with normal information (first name, last name, address etc). I somewhat know the steps in setting up a program like this using ADO objects for the connection and whatnot, but how do we decide which of the two languages gets used for certain things? Does VB6 specifically handle the basics like loops, if-elses and declarations of your vars, while SQL specifically handles things like connecting to the database and doing the searching, filtering and sorting? Is it possible to do certain general-use VB6 actions (loops or conditionals) in SQL syntax instead? Any help would be GREATLY appreciated.
SQL is a language to query a database. SQL is an ISO standard, and relational database vendors implement the ISO standard and then add on their own customizations. For example, SQL Server's dialect is called T-SQL and Oracle's is called PL/SQL. Both implement the ISO standard, so each will have identical queries for a simple select like
select columnname from tablename where columnname = 1
However, each have different syntax for string functions, date functions, etc....
The ISO SQL standard is, by design, not a full procedural language with looping, subroutines, etc., as in a procedural language like VB.
However, each vendor has added capabilities to their version to add some of this functionality in.
For example, both T-SQL and PL/SQL can "loop" through records using various constructs in their languages, as sketched below.
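A hedged example of such a construct, a T-SQL cursor loop (the Customers table and FirstName column are hypothetical):
-- Iterate over rows one at a time; fine for small sets, slow for large ones.
DECLARE @name NVARCHAR(100)
DECLARE cust_cursor CURSOR FOR
    SELECT FirstName FROM Customers
OPEN cust_cursor
FETCH NEXT FROM cust_cursor INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @name  -- process the current row
    FETCH NEXT FROM cust_cursor INTO @name
END
CLOSE cust_cursor
DEALLOCATE cust_cursor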
There is also a difference when working with data that many developers are not well versed in: set-based operations vs. procedural ones.
Databases can work with procedural constructs but are often more performant with set-based ones. A developer who is not versed in this concept may end up creating a very inefficient query. Here's an example of this discussion.
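To make the contrast concrete, a hedged sketch (hypothetical Orders table): the set-based statement below does in one pass what a row-by-row loop like the cursor above would do far more slowly:
UPDATE Orders
SET Status = 'Archived'
WHERE OrderDate < '2020-01-01'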
In any situation you have to weigh the pros/cons of where it is best to do the work.
I tend to favor using procedural constructs such as loops in the language I am using over SQL. I find it easier to maintain and the language I am using offers more powerful syntax for me to get the job done.
However, I keep both options as a tool in the toolbox. For example, I have written data conversion scripts in SQL and in this case I have used the looping constructs in SQL.
Usually programming languages are executed on the client side (app server too), and query languages are executed on the db server, so in the end it depends where you want to put all the work. Sometimes you can put a lot of work on the client side by doing all the calculations with the programming language, and other times you want to use the db server more, so you end up using the query language, or better yet T-SQL/PL-SQL or whatever.
Relational databases are designed to manage data. In particular, they provide an efficient mechanism for managing memory, disk, and processors for large quantities of data. In addition, relational databases can handle multiple clients, guarantee transactional integrity, security, backups, persistence, and numerous other functions.
In general, if you are using an RDBMS with another language, you want to design the data structure first and then think about the API (applications programming interface) between the two. This is particularly true when you have an app/server relationship.
For a "simple" type of application, which uses a lot of data but with minimal or batch changes to it, you want to move as much of the processing into the database as is reasonable. Here are things you do not want to do:
Use queries to load things into arrays, and then do array manipulations at the language level. SQL provides joins for this.
Load data into an array and do manipulations and summaries on the array. SQL provides aggregations for this (see the sketch after this list).
Save data into a file to have a backup. Databases provide backup mechanisms.
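To make the first two points concrete, a hedged sketch (table and column names hypothetical) of letting the database join and aggregate instead of matching arrays in client code:
SELECT c.CustomerName,
       COUNT(*)     AS OrderCount,
       SUM(o.Total) AS TotalSpent
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName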
If your data fits into an array or an Excel spreadsheet, it is often sufficient to get started with the data stored there. Only when you start to expand the needs (multiple clients, security, integration with other data) do the advantages of a database become more apparent.
These are just for guidance and to give you some ideas.
In terms of doing what where, do as much as is sensible in SQL, given that it runs on the server.
So, for instance, don't do stuff like this (pseudocode):
foreach(row in "Select * from Orders")
    if (row[CustomerID] = 876)
        Display(row)
Do:
foreach(row in "Select * from Orders where CustomerId = 876")
    Display(row)
First, it's likely Orders is indexed by CustomerID, so it will find all of 876's orders much quicker.
Second, to do the first one you just sucked every record in that table into the client's memory space, probably across your network.
What language is used is essentially irrelevant; you could invent your own DBMS with its own language.
It's where you do what processing that matters. It's a rule with exceptions, but the essential idea is: let your backend do as much as it can.

Stored Procedure VS. F#

For most SP-taught developers, there is no choice between LINQ and stored procedures/functions. That may be true.
However, there is a road junction nowadays. Before spending too much time on the syntax of F#, I would like more input about where the power (and the weaknesses) of F# lies.
How will F# perform on this topic (against SPs)?
F# has to communicate with a database in some way: through a LINQ2SQL/Entity app layer or directly through AnyDbConnection. Nothing new there. But F# has the power of parallelism and less overhead in its work (Functional Programming with C#/F#). F# also has its efficiency as a layer between data and machine, much as C# has its power as a layer between human and machine.
Would I really still let the DB server handle a request for recurring nodes, or just fetch plain data to F# and handle it there, encapsulated nicely and smoothly as an object method call from C#?
Would a stored procedure still be the best option for scanning 50 million records to find orphans, or a criterion matching 0.5% of the result?
Would an SP or function still be best for a simple task such as finding the next parent node?
Would an SP still be best to collect a million records of data and return calculated sums and/or periods?
Wouldn't a single F# dll library, fully built on the single-responsibility principle, be of more use than stored procedures hooked up inside a SQL server? There are pros and cons, of course. But what are they?
Stored procedures are not magically super-fast. Often, they're actually rather slow.
Many people will downvote this answer providing anecdotal evidence that a stored procedure once made an application faster overall. However, all of those examples that I've actually seen code for indicate that they totally rethought some bad SQL to package it as an SP. I submit that the discipline of repackaging bad SQL into a procedure helped more than the SP itself.
Most of your points can't be evaluated without a measured benchmark.
I suggest that you do the following.
Write it in F#.
Measure it.
If it's too slow for your production application, then try some stored procedures to see if it's faster. If it's fast enough for your production application, then you have your answer, F# worked for you. For your application. For your data. For your architecture.
There's no "general" answer. Although my benchmarks for some particular kinds of queries indicate that the SP engine is pretty slow compared with Java. F# will probably be faster than the SP engine also.
The important thing is to make sure that the database -- if it's going to be "pure" data -- is already optimized so that queries like your "scanning 50 millions of records for finding orphans or a criteria that matching 0,5% of the result?" would retrieve the rows as quickly as possible. This often involves tweaking buffers and array sizes and other elements of the database-to-F# connection. This usually means that you want a more direct connection so that you can adjust the sizes.
Databases are efficient for certain tasks (e.g. when they can use an index to search for a specified row), but probably won't be any faster than F# if you need to process all rows and update them (in the database) or calculate some new result based on all the data.
As S. Lott suggests, the best option is to try implementing what you need in F# and you'll find out. Parallelism can give you some performance benefits, especially if you're doing some computationally heavy calculations. However, you may still want to store the data in databases, load it and process it in F# (I believe this is how F# was used by adCenter at Microsoft).
Possibly the most important note is that databases give you various guarantees about the consistency of the data - no matter what happens, you'll still end up with consistent state. Implementing this yourself may be tricky (e.g. when updating data), but you need to consider whether you need it or not.
You ask this:
Would a stored procedure still be the best option for scanning 50 million records to find orphans, or a criterion matching 0.5% of the result?
I take your question to mean: "I have this data in SQL Server. Should I query it in SQL or in client code (F# in this case)?" Queries like this should absolutely be performed in SQL if at all possible. If you do it in F#, you're transferring those 50 million rows to the client just to do some aggregation/lookups.
I hope I understood your question correctly.
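For the orphan scan specifically, a hedged sketch of the server-side version (parent/child table names hypothetical):
-- Rows whose ParentID matches no parent; only the orphans cross the wire.
SELECT c.ChildID
FROM Child c
LEFT JOIN Parent p ON p.ParentID = c.ParentID
WHERE p.ParentID IS NULL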
As I understand it, an SP just means you call a precompiled execution plan, and you can call it through an API instead of pushing a string to the server. These two things save on the order of milliseconds, nowhere near a second. For larger queries that difference is negligible. They're good for high-frequency/high-throughput stuff (and of course for encapsulating complex logic, but that doesn't seem to apply here).
Because an SP uses a precompiled plan, it can indeed be slower than a normal query, because it no longer checks the statistics of the underlying data (the execution plan is already compiled). Since you mention a condition that applies to 0.5% of the rows, this could be important.
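If a cached plan turns out to be wrong for a rare predicate like that, one common mitigation in SQL Server is to request a fresh plan per execution (a sketch; the table and parameter are hypothetical):
DECLARE @status NVARCHAR(20) = N'Rare'
SELECT *
FROM Orders
WHERE Status = @status
OPTION (RECOMPILE)  -- plan is rebuilt for this parameter value, using current statistics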
In the discussion of SP vs F#, I would reword that to "on the server" vs "on the client". If you're talking about higher data volumes (50M rows qualifies), my first choice would always be to "put the mill where the wood is", that is, execute on the server if possible. Only if there is some very complicated logic involved might you want to consider F#, but I don't think that applies here. Even then, I'd still prefer to execute on the server rather than first drag all those rows over the network (potentially slow).
GJ

Stored procedures or inline queries?

First of all, there is a partial question regarding this, but it is not exactly what I'm asking, so bear with me and go for it.
My question is, after looking at what SubSonic does and the excellent videos from Rob Conery, I need to ask: shall we use a tool like this and do inline queries, or shall we do the queries using a call to the stored procedure?
I don't want to minimize any of Rob's work (which I think is amazing), but I just want your opinion on this because I need to start a new project and I'm on the fence: shall I use SubSonic (or another tool like it, such as NHibernate), or do I just continue my method of always calling a stored procedure, even if it's as simple as
Select this, that from myTable where myStuff = 'StackOverflow';
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. They let you define and maintain query logic once, rather than in several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to the encapsulation that made object-oriented languages "cleaner" than what came before.
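As a hedged illustration (all names hypothetical), the client codes one call while the DBAs own everything behind it:
CREATE PROCEDURE dbo.GetThatList
    @CustomerID INT
AS
BEGIN
    -- The DBA can reorganize tables or tune this query freely;
    -- the client-side call never changes.
    SELECT o.OrderID, o.OrderDate, o.Total
    FROM dbo.Orders o
    WHERE o.CustomerID = @CustomerID
    ORDER BY o.OrderDate DESC
END
-- Client side: EXEC dbo.GetThatList @CustomerID = 876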
I've done a mix of inline queries and stored procedures. I prefer more of a stored procedure/view approach, as it gives you a single spot to make a change if needed. With an inline query you always have to go and change the code and then re-roll the application, and you might have the inline query in multiple places, so you would have to change a lot more code than with one stored procedure.
Then again, if you have to add a parameter to a stored procedure, you're still changing a lot of code anyway.
Another consideration is how often the data behind the stored procedure changes. Where I work we have third-party tables that may break up into normalized tables, or a table becomes obsolete. In that case a stored procedure/view may minimize your exposure to that change.
I've also written an entire application without stored procedures. It had three classes and 10 pages; stored procedures would not have been worth it at all. I think there comes a point where they are overkill, or can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
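For instance, in SQL Server you might grant execute rights on a proc while denying direct access to the underlying table (a sketch; the proc, table and user names are hypothetical):
GRANT EXECUTE ON dbo.GetCustomerOrders TO app_user
DENY SELECT ON dbo.Orders TO app_user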
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With SubSonic you will use inline queries, views and stored procedures. SubSonic makes data access easier, but you can't do everything in a SubSonic query, though the latest version, 2.1, is getting better.
For basic CRUD operations, inline SQL will be straightforward. For more complex data needs, a view will need to be made, and then you will do a SubSonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set-based retrieval is usually faster than procedural processing.
My current SubSonic application uses all three options with great results.
I prefer inline SQL unless the stored procedure has actual logic (variables, cursors, etc.) involved. I have been using LINQ to SQL lately, taking the generated classes and adding partial classes that have some predefined, common LINQ queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedures (to my mind):
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions, you can be sure of the queries the application will make.
The best advantage of stored procs I have found is when we want to change the logic: with an inline query we need to go to every place it appears, change it, and re-roll every application, but with a stored proc the change is required in only one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.

Database Abstraction - supporting multiple syntaxes

In a PHP project I'm working on, we need to create some DAL extensions to support multiple database platforms. The main pitfall we have with this is that different platforms have different syntaxes; notably, MySQL and MSSQL are quite different.
What would be the best solution to this?
Here are a couple we've discussed:
Class-based SQL building
This would involve creating a class that allows you to build SQL queries bit by bit. For example:
$stmt = new SQL_Stmt('mysql');
$stmt->set_type('select');
$stmt->set_columns('*');
$stmt->set_where(array('id' => 4));
$stmt->set_order('id', 'desc');
$stmt->set_limit(0, 30);
$stmt->exec();
It does involve quite a lot of lines for a single query, though.
SQL syntax reformatting
This option is much cleaner: it would read SQL code and reformat it based on the input and output languages. I can see this being a much slower solution as far as parsing goes, however.
I'd recommend class-based SQL building, and recommend Doctrine, Zend_Db or MDB2. And yeah, it requires more lines to write simple selects, but at least you get to rely on a parser and don't need to re-invent the wheel.
Using any DBAL is a trade-off in speed, and not just in database execution: the first time you use any of those it will be more painful than once you are really familiar with it. Also, I'm almost 100% sure that the generated code is not the fastest possible SQL query, but that's the trade-off I meant earlier.
In the end it's up to you; even though I wouldn't do it, and it sure is not impossible, the question remains whether you can actually save time and resources (in the long run) by implementing your own DBAL.
A solution could be to have different sets of queries for different platforms, keyed by ID, something like:
MySql: GET_USERS = "SELECT * FROM users"
MsSql: GET_USERS = ...
PgSql: GET_USERS = ...
Then on startup you load the needed set of queries and refer to them:
Db::loadQueries($platform);
$users = $db->query(GET_USERS);
Such a scheme would not take account of all the richness that SQL offers, so you would be better off with code-generated stored procs for all your tables for each DB.
Even if you use parameterized stored procs that are more database-model-aware (i.e. they do joins or are user-aware, and so are optimized for each vendor), that's still a great approach. I always view the database interface layer as providing more than just simple tables to the application, because the simple-tables approach can be bandwidth-intensive and round-trip wasteful.
If you have a set of backends that support it, I would agree that generating stored procedures to form a contract is the best approach. This approach, however, doesn't work if you have a backend with limited stored-procedure capability, in which case you would build an abstraction layer to implement SQL or generate target-specific SQL based on an abstract/limited SQL syntax.