Do sqlite3 views affect performance? - sql

Trying to understand databases better in general, and sqlite3 in particular:
Are views in sqlite3 mainly an organizational feature, allowing complex queries to be broken into a series of smaller ones; or do views actually affect the performance of queries that use them?
I noticed that views are stored in the database itself as part of the schema. Are views stored on disk, updated dynamically as dependent tables are updated; or are they evaluated on demand?
Thanks.

Views will always be executed on demand (sqlite3 or otherwise), so the results they return are never persistently stored.
As for performance, while I can't speak to sqlite3 specifically, using a view usually carries slightly less overhead, as the query parser/planner doesn't have to reparse the raw SQL on each execution. It can parse it once, store its execution strategy, and then use that each time the query is actually run.
The performance boost you see from this will generally be small in the grand scheme of things. It really only helps if it's a fast query that you're executing frequently; if it's a slow query you execute infrequently, the overhead associated with parsing the query is insignificant. Views do, of course, provide a level of organization, which is nice.
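You can confirm both points from the sqlite3 shell. A minimal sketch (the table and view names are invented for illustration):
-- the view definition is stored in the schema; its results are not
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
CREATE VIEW big_orders AS SELECT id, amount FROM orders WHERE amount > 100;
-- sqlite_master holds only the CREATE VIEW statement text
SELECT type, name, sql FROM sqlite_master WHERE name = 'big_orders';
-- the plan shows the view being evaluated against the base table on demand
EXPLAIN QUERY PLAN SELECT * FROM big_orders;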

Related

SQL Views vs. Stored Procedures

I've just started joining stored procedures together using views, and it seems a simple way of building up a short query from the results of others.
Are there any disadvantages to over-relying on views before I plough on with this method? Am I better off pursuing the temporary-table option?
The main differences are that a view only stores the query, not the results (with the exception of materialised views), and that views persist after the end of your session. Views are an excellent way of hiding complexity, but they do not make queries run any more quickly than if you wrote the whole thing out as one query. Views also do not use up storage space (except for a very small amount for the metadata).
I would recommend using views if you do not have any requirement to speed the queries up further, or if you need to be able to reference the data without recreating it in subsequent sessions.
Temporary tables do store the result, but just for the current session, so if you need a base result set to speed up further queries for the duration of your session, this can be useful.
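To make the distinction concrete, here is a minimal sketch in T-SQL (the names are invented):
-- a view: stores only the query text; it is re-executed on every reference,
-- but the definition persists across sessions
CREATE VIEW dbo.ActiveCustomers AS
SELECT CustomerID, Name
FROM dbo.Customers
WHERE IsActive = 1;
-- a temp table: stores the actual rows, but only for the current session
SELECT CustomerID, Name
INTO #ActiveCustomers
FROM dbo.Customers
WHERE IsActive = 1;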
In fact, views are mostly used for security reasons, and they also make queries simpler (in some cases). So it just depends on what you are doing, and on whether it requires storing results, among other requirements.

temporary tables in SQL

I need to know whether it is standard practice to decompose complex queries into parts and create temporary tables which are dropped at the end. In OLAP applications it shouldn't be much of an issue, but in OLTP, where speed matters, is it avoided?
For simple queries which are well-optimized by your DBMS, temporary tables are usually a bad idea because they introduce overhead.
But sometimes your DBMS will have a really hard time optimizing complex queries. At that point you have at least 5 options:
change your schema or indexes to make it easier for the optimizer to choose a better query plan
tweak your SQL to get the DBMS to choose the indexes, join strategies, etc. that you want and to work around known and unknown bugs in your DBMS's optimizer.
use "hints" to get the DBMS to choose the indexes, join strategies, etc. that you want.
get the plan you want and use a "saved plan" to force its use by the DBMS
use temp tables (or table variables, etc.) to decompose complex queries into simpler intermediate queries
There's no hard-and-fast rule about which option is best for any particular query. I've used all of the above strategies. I tend to choose the temp table approach when I don't own the schema, so I can't change it, and when I don't want to depend on hints or query tuning or saved plans (often because I don't want to expose myself to changes in the underlying schema made later).
Keep in mind that using temp tables to decompose queries will give you sub-optimal performance every time. But it's usually predictably sub-optimal. The worst case using temp tables isn't nearly as bad as when your DBMS chooses a bad plan for a single large query. This happens surprisingly often, especially in the face of changes in underlying schema, DBMS version changes, dev vs. production differences, etc.
Personally, I find that if a query gets to a level of complexity where I have to bend over backwards to get the DBMS to do what I want, and if I feel that maintainability of the application is at risk, then I'll often go with decomposition and temp tables if I can't change the schema or indexes.
Of course, in theory you shouldn't be running expensive, complex queries on your OLTP database, but in practice most applications are never "pure" OLTP-- there's always a few complicated, hard-to-optimize queries in any OLTP project.
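As a sketch of the decomposition approach (option 5 in the list above), with invented table names:
-- step 1: materialize a hard-to-optimize intermediate result
SELECT o.CustomerID, SUM(o.Amount) AS Total
INTO #CustomerTotals
FROM dbo.Orders o
WHERE o.OrderDate >= '2008-01-01'
GROUP BY o.CustomerID;
-- an index on the intermediate result can help the follow-up join
CREATE INDEX IX_CustomerTotals ON #CustomerTotals (CustomerID);
-- step 2: the final query is now simple enough for the optimizer
SELECT c.Name, t.Total
FROM dbo.Customers c
JOIN #CustomerTotals t ON t.CustomerID = c.CustomerID;
DROP TABLE #CustomerTotals;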
The critical word in your question is "decompose". Temp tables and other strategies are generally discouraged and found to lead to lower overall performance. The optimizer is perfectly capable of using intermediate tables if they are useful for getting to the answer most quickly. Very rarely can you help the optimizer by coercing it with your own strategy.
The same thing goes for suggesting which indexes to use.
When you see this going on, almost always someone has more work to do refining their query statements.
The only time I've used temp tables during OLTP processing is when I am dealing with a batch of data that I need to analyze/join, and eventually do a data change operation on it (Insert/Update/Delete). I'll use temp tables for a) speed but more importantly, b) because the normal select/update or select/delete logic is either too complex or can't be done in one transactional statement.
For example, find 100k users who meet some condition, and insert them into an archive table and then delete them.
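That pattern might look roughly like this in T-SQL (table and column names are invented):
-- stage the batch once, so the insert and the delete operate on the same rows
SELECT UserID
INTO #ToArchive
FROM dbo.Users
WHERE LastLogin < '2005-01-01';
BEGIN TRANSACTION;
INSERT INTO dbo.UsersArchive (UserID, Name, LastLogin)
SELECT u.UserID, u.Name, u.LastLogin
FROM dbo.Users u
JOIN #ToArchive t ON t.UserID = u.UserID;
DELETE u
FROM dbo.Users u
JOIN #ToArchive t ON t.UserID = u.UserID;
COMMIT TRANSACTION;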
I don't recommend using temp tables in most cases for normal select statements. You can almost always get better performance with either proper indexing, better sql join/hints and/or changing the data structure to match data access paths.
In OLTP systems, if the processing is part of the online system (i.e. not batch), then I can't recall ever using a temporary table. Using some sort of procedural logic is usually the way to go - e.g. PL/SQL in Oracle and so on.
In OLAP, temporary tables are very common: you usually load the data into a table, transform it, and save the result in another table, and depending on the processing there may be a number of transform steps.
I'd go so far as to say that if you have an OLTP system and you need to use a temporary table, then something is wrong with the design: modify it, or use procedural logic. In OLAP, temporary tables are very common.
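A typical shape for that kind of OLAP flow, sketched with invented names:
-- load: pull raw feed data into a staging table
INSERT INTO stage_sales_raw (sale_date, region, amount)
SELECT sale_date, region, amount
FROM external_feed;
-- transform: aggregate the staged rows and save the result in another table
INSERT INTO sales_by_region (region, total_amount)
SELECT region, SUM(amount)
FROM stage_sales_raw
GROUP BY region;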
hth

Is a view faster than a simple query?

Is a
select * from myView
faster than running the underlying query used to create the view (in order to get the same result set):
select * from ([query to create same resultSet as myView])
?
It's not totally clear to me whether the view uses some sort of caching that makes it faster than the equivalent plain query.
Yes, views can have a clustered index assigned and, when they do, they'll store materialized results that can speed up the queries that use them.
Microsoft's own documentation makes it very clear that Views can improve performance.
First, most views that people create are simple views and do not use this feature, and are therefore no different to querying the base tables directly. Simple views are expanded in place and so do not directly contribute to performance improvements - that much is true. However, indexed views can dramatically improve performance.
Let me go directly to the documentation:
After a unique clustered index is created on the view, the view's result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing this costly operation at execution time.
Second, these indexed views can work even when they are not directly referenced by another query as the optimizer will use them in place of a table reference when appropriate.
Again, the documentation:
The indexed view can be used in a query execution in two ways. The query can reference the indexed view directly, or, more importantly, the query optimizer can select the view if it determines that the view can be substituted for some or all of the query in the lowest-cost query plan. In the second case, the indexed view is used instead of the underlying tables and their ordinary indexes. The view does not need to be referenced in the query for the query optimizer to use it during query execution. This allows existing applications to benefit from the newly created indexed views without changing those applications.
This documentation, as well as charts demonstrating performance improvements, can be found here.
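For concreteness, creating an indexed view looks roughly like this (a sketch with invented names; note that indexed views require SCHEMABINDING, schema-qualified table names, and COUNT_BIG(*) when grouping):
CREATE VIEW dbo.SalesByRegion
WITH SCHEMABINDING
AS
-- assumes dbo.Sales.Amount is declared NOT NULL, as indexed views require for SUM
SELECT Region, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY Region;
GO
-- this unique clustered index is what materializes and persists the view's result set
CREATE UNIQUE CLUSTERED INDEX IX_SalesByRegion ON dbo.SalesByRegion (Region);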
Update 2: the answer has been criticized on the basis that it is the "index" that provides the performance advantage, not the "View." However, this is easily refuted.
Let us say that we are a software company in a small country; I'll use Lithuania as an example. We sell software worldwide and keep our records in a SQL Server database. We're very successful and so, in a few years, we have 1,000,000+ records. However, we often need to report sales for tax purposes and we find that we've only sold 100 copies of our software in our home country. By creating an indexed view of just the Lithuanian records, we get to keep the records we need in an indexed cache as described in the MS documentation. When we run our reports for Lithuanian sales in 2008, our query will search through an index with a depth of just 7 (Log2(100) with some unused leaves). If we were to do the same without the VIEW and just relying on an index into the table, we'd have to traverse an index tree with a search depth of 21!
Clearly, the View itself would provide us with a performance advantage (3x) over the simple use of the index alone. I've tried to use a real-world example but you'll note that a simple list of Lithuanian sales would give us an even greater advantage.
Note that I'm just using a straight b-tree for my example. While I'm fairly certain that SQL Server uses some variant of a b-tree, I don't know the details. Nonetheless, the point holds.
Update 3: The question has come up about whether an Indexed View just uses an index placed on the underlying table. That is, to paraphrase: "an indexed view is just the equivalent of a standard index and it offers nothing new or unique to a view." If this were true, of course, then the above analysis would be incorrect! Let me provide a quote from the Microsoft documentation that demonstrates why I think this criticism is not valid:
Using indexes to improve query performance is not a new concept; however, indexed views provide additional performance benefits that cannot be achieved using standard indexes.
Together with the above quote regarding the persistence of data in physical storage and other information in the documentation about how indices are created on Views, I think it is safe to say that an Indexed View is not just a cached SQL Select that happens to use an index defined on the main table. Thus, I continue to stand by this answer.
Generally speaking, no. Views are primarily used for convenience and security, and won't (by themselves) produce any speed benefit.
That said, SQL Server 2000 and above do have a feature called Indexed Views that can greatly improve performance, with a few caveats:
Not every view can be made into an indexed view; they have to follow a specific set of guidelines, which (among other restrictions) means you can't include common query elements like COUNT, MIN, MAX, or TOP.
Indexed views use physical space in the database, just like indexes on a table.
This article describes additional benefits and limitations of indexed views:
You Can…
The view definition can reference one or more tables in the same database.
Once the unique clustered index is created, additional nonclustered indexes can be created against the view.
You can update the data in the underlying tables – including inserts, updates, deletes, and even truncates.
You Can’t…
The view definition can’t reference other views, or tables in other databases.
It can’t contain COUNT, MIN, MAX, TOP, outer joins, or a few other keywords or elements.
You can’t modify the underlying tables and columns. The view is created with the WITH SCHEMABINDING option.
You can’t always predict what the query optimizer will do. If you’re using Enterprise Edition, it will automatically consider the unique clustered index as an option for a query – but if it finds a “better” index, that will be used. You could force the optimizer to use the index through the WITH NOEXPAND hint – but be cautious when using any hint.
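For illustration, the hint is applied like this (a minimal sketch; dbo.SalesByRegion is a hypothetical indexed view):
-- force the optimizer to use the indexed view instead of expanding it
SELECT Region, TotalAmount
FROM dbo.SalesByRegion WITH (NOEXPAND);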
EDIT: I was wrong; you should see Mark's answer above.
I cannot speak from experience with SQL Server, but for most databases the answer would be no. The only potential benefit that you get, performance-wise, from using a view is that it could potentially create some access paths based on the query. But the main reason to use a view is to simplify a query or to standardize a way of accessing some data in a table. Generally speaking, you won't get a performance benefit. I may be wrong, though.
I would come up with a moderately more complicated example and time it yourself to see.
In SQL Server at least, Query plans are stored in the plan cache for both views and ordinary SQL queries, based on query/view parameters. For both, they are dropped from the cache when they have been unused for a long enough period and the space is needed for some other newly submitted query. After which, if the same query is issued, it is recompiled and the plan is put back into the cache. So no, there is no difference, given that you are reusing the same SQL query and the same view with the same frequency.
Obviously, in general, a view, by its very nature (someone thought it would be used often enough to make it into a view), is more likely to be "reused" than any arbitrary SQL statement.
Definitely a view is better than a nested query for SQL Server. Without knowing exactly why it is better (until I read Mark Brittingham's post), I had run some tests and experienced almost shocking performance improvements when using a view versus a nested query. After running each version of the query several hundred times in a row, the view version of the query completed in half the time. I'd say that's proof enough for me.
It may be faster if you create a materialized view (with schema binding). Non-materialized views execute just like the regular query.
My understanding is that a while back, a view would be faster because SQL Server could store an execution plan and then just use it instead of trying to figure one out on the fly. I think the performance gains nowadays are probably not as great as they once were, but I would have to guess there would be some marginal improvement from using the view.
I would expect the two queries to perform identically. A view is nothing more than a stored query definition, there is no caching or storing of data for a view. The optimiser will effectively turn your first query into your second query when you run it.
It all depends on the situation. MS SQL indexed views are faster than a normal view or query, but indexed views can not be used in a mirrored database environment (MS SQL).
A view in any kind of loop will cause a serious slowdown because the view is repopulated each time it is called in the loop, same as a query. In this situation a temporary table using # or ## to hold your data to loop through is faster than a view or a query.
So it all depends on the situation.
There should be some trivial gain in having the execution plan stored, but it will be negligible.
In my experience, using a view is a little bit faster than a normal query. My stored procedure was taking around 25 minutes (working with several different, larger record sets and multiple joins), and after using the view (non-clustered), the performance was only a little better, not significant at all. I had to use some other query optimization techniques to make a dramatic change.
Selecting from a view versus selecting from a table should not make much difference, provided the view does not have unnecessary joins, fields, etc. You can check the execution plan of your queries, and the joins and indexes used, to improve the view's performance.
You can even create index on views for faster search requirements. http://technet.microsoft.com/en-us/library/cc917715.aspx
But if you are searching with a pattern like '%...%' then the SQL engine will not benefit from an index on a text column. If you can force your users to make searches like '...%' then that will be fast.
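In other words (a sketch with an invented table and column):
-- a trailing wildcard allows an index seek on LastName
SELECT * FROM dbo.Customers WHERE LastName LIKE 'Smi%';
-- a leading wildcard forces a scan; an ordinary index won't help
SELECT * FROM dbo.Customers WHERE LastName LIKE '%mit%';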
Referenced from an answer on the ASP.NET forums:
https://forums.asp.net/t/1697933.aspx?Which+is+faster+when+using+SELECT+query+VIEW+or+Table+
Against all expectation, views are way slower in some circumstances.
I discovered this recently when I had problems with data pulled from Oracle which needed to be massaged into another format. Maybe 20k source rows - a small table. To do this we imported the Oracle data as unchanged as we could into a table, and then used views to extract the data.
We had secondary views based on those views. Maybe 3-4 levels of views.
One of the final queries, which extracted maybe 200 rows, would take upwards of 45 minutes! That query was based on a cascade of views, maybe 3-4 levels deep.
I could take each of the views in question, insert its SQL into one nested query, and execute it in a couple of seconds.
We even found that we could write each view into a temp table and query that in place of the view, and it was still way faster than simply using nested views.
What was even odder was that performance was fine until we hit some limit of source rows being pulled into the database; performance just dropped off a cliff over the space of a couple of days - a few more source rows was all it took.
So, using queries which pull from views which pull from views is much slower than a nested query - which makes no sense to me.
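The workaround amounted to something like this (a T-SQL-style sketch with invented names): materialize each level once instead of selecting through a view of views.
-- instead of SELECT ... FROM view_level2 (which itself selects from view_level1),
-- materialize each level into a temp table
SELECT col_a, col_b
INTO #level1
FROM source_table
WHERE col_c = 'X';
SELECT col_a, SUM(col_b) AS total
INTO #level2
FROM #level1
GROUP BY col_a;
-- the final query now reads a small, already-computed result
SELECT * FROM #level2 WHERE total > 100;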
There is no practical difference, and if you read BOL you will find that even your plain old SQL SELECT * FROM X takes advantage of plan caching, etc.
The purpose of a view is to use the query over and over again. To that end, SQL Server, Oracle, etc. will typically provide a "cached" or "compiled" version of your view, thus improving its performance. In general, this should perform better than a "simple" query, though if the query is truly very simple, the benefits may be negligible.
Now, if you're doing a complex query, create the view.
No. A view is just a short form of your actual long SQL query. But yes, you could say the actual query is faster than going through the view.
The view query will first be translated into the underlying query and then executed, so a query against a view can take slightly more time than the plain query.
You can use SQL views when you are joining multiple tables, to reuse a complicated query again and again in a simple manner.
I ran across this thread and just wanted to share this post from Brent Ozar as something to consider when using availability groups.
Brent Ozar bug report

Performance gains in stored procs for long running transactions

I have several long running report type transactions that take 5-10 minutes. Would I see any performance increase by using stored procs? Would it be significant?
Each query runs once a night.
Probably not. Stored procs give you the advantage of pre-compiled SQL. If your SQL is invoked infrequently, then this advantage will be pretty much worthless. So if you have SQL that is expensive because the queries themselves are expensive, then stored procs will gain you no meaningful performance advantage. If you have queries that are invoked very frequently and which themselves execute quickly, then it's worth having a proc.
Most likely not. The performance gains from stored procs, if any (it depends on your use case), are the kind that are unnoticeable in the micro - only in the macro.
Reporting-type queries are ones that aggregate LOTS of data, and if that's the case they'll be slow no matter what the execution method. Only indexing and/or other physical data changes can make them faster.
See:
Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's?
The short answer is: no, stored procedures aren't going to improve the performance.
For a start, if you are using parameterised queries there is no difference in performance between a stored procedure and inline SQL. The reason is that ALL queries have cached execution plans - not just stored procedures.
Have a look at http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx
If you aren't parameterising your inline queries and you're just building the query up and inserting the 'parameters' as literals then each query will look different to the database and it will need to pre-compile each one. So in this case, you would be doing yourself a favour by using parameters in your inline SQL. And you should do this anyway from a security perspective, otherwise you are opening yourself up to SQL injection attacks.
But anyway, the pre-compilation issue is a red herring here. You are talking about long-running queries - so long that the pre-compilation is going to be insignificant. So unfortunately, you aren't going to get off easily here. Your solution is going to be to optimise the actual design of your queries, or even to rethink the whole way you are approaching the task.
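To illustrate the parameterisation point (a T-SQL sketch; the query and names are invented): the parameterized form produces one cached plan reused for every value, while building queries from literals compiles a new plan per distinct value.
-- parameterized: one plan, cached and reused for any @id
EXEC sp_executesql
    N'SELECT Name FROM dbo.Users WHERE UserID = @id',
    N'@id INT',
    @id = 42;
-- literals: each distinct value looks like a different statement to the server,
-- so each one is compiled separately
SELECT Name FROM dbo.Users WHERE UserID = 42;
SELECT Name FROM dbo.Users WHERE UserID = 43;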
Yes, the query plan for stored procs can be optimized.
And even if it can't, procs are preferred over embedded SQL.
"Would you see any performance improvement?" - the only way to know for certain is to try it.
In theory, stored procedures pre-parse the SQL and store the query plan instead of figuring it out each time, so there should be some speedup just from that; however, I doubt it would be significant in a 5-10 minute process.
If speed is a concern, your best bet is to look at the query plan and see if it can be improved with different query structures and/or by adding indices, et al.
If speed is not a concern, stored procs provide better encapsulation than inline SQL.
As others have said, you won't see much performance gain from the stored procedure being pre-compiled. However, if your current transactions have multiple statements, with data going back and forth between the server, then wrapping it in a stored procedure could eliminate some of that back-and-forth, which can be a real performance killer.
Look into proper indexing, but also consider the fact that the queries themselves (or the whole process if it consists of multiple steps) might be inefficient. Without seeing your actual code it's hard to say.

Why is parameterized SQL generated by NHibernate just as fast as a stored procedure?

One of my co-workers claims that even though the execution path is cached, there is no way parameterized SQL generated from an ORM is as quick as a stored procedure. Any help with this stubborn developer?
I would start by reading this article:
http://decipherinfosys.wordpress.com/2007/03/27/using-stored-procedures-vs-dynamic-sql-generated-by-orm/
Here is a speed test between the two:
http://www.blackwasp.co.uk/SpeedTestSqlSproc.aspx
Round 1 - You can start a profiler trace and compare the execution times.
For most people, the best way to convince them is to "show them the proof." In this case, I would create a couple basic test cases to retrieve the same set of data, and then time how long it takes using stored procedures versus NHibernate. Once you have the results, hand it over to them and most skeptical people should yield to the evidence.
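One simple way to capture those timings on the server itself (a sketch; the procedure and query are hypothetical stand-ins for your real workload):
SET STATISTICS TIME ON;
-- the stored procedure version
EXEC dbo.GetCustomerOrders @CustomerID = 42;
-- the equivalent parameterized SQL an ORM might generate
EXEC sp_executesql
    N'SELECT * FROM dbo.Orders WHERE CustomerID = @p0',
    N'@p0 INT',
    @p0 = 42;
SET STATISTICS TIME OFF;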
I would only add a couple things to Rob's answer:
First, make sure the amount of data involved in the test cases is similar to production values. In other words, if your queries normally run against tables with hundreds of thousands of rows, then create such a test environment.
Second, make everything else equal except for the use of an NHibernate-generated query versus a sproc call. Hopefully you can execute the test by simply swapping out a provider.
Finally, realize that there is usually a lot more at stake than just stored procedures vs. ORM. With that in mind the test should look at all of the factors: execution time, memory consumption, scalability, debugging ability, etc.
The problem here is that you've accepted the burden of proof. You're unlikely to change someone's mind like that. Like it or not, people - even programmers - are just too emotional to be easily swayed by logic. You need to put the burden of proof back on him - get him to convince you otherwise - and that will force him to do the research and discover the answer for himself.
A better argument for using stored procedures is security. If you use only stored procedures, with no dynamic SQL, you can disable SELECT, INSERT, UPDATE, DELETE, ALTER, and CREATE permissions for the application database user. This will protect you against most second-order SQL injection, whereas parameterized queries are only effective against first-order injection.
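That lockdown looks something like this (a T-SQL sketch; the user and procedure names are invented):
-- the application login can execute procs but cannot touch the tables directly;
-- ownership chaining lets the procs themselves still read and write
DENY SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo TO app_user;
GRANT EXECUTE ON OBJECT::dbo.ArchiveInactiveUsers TO app_user;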
Measure it, but in a non-micro-benchmark, i.e. something that represents real operations in your system. Even if there would be a tiny performance benefit for a stored procedure it will be insignificant against the other costs your code is incurring: actually retrieving data, converting it, displaying it, etc. Not to mention that using stored procedures amounts to spreading your logic out over your app and your database with no significant version control, unit tests or refactoring support in the latter.
Benchmark it yourself. Write a testbed class that executes a sampled stored procedure a few hundred times, and run the NHibernate code the same amount of times. Compare the average and median execution time of each method.
It is just as fast if the query is the same each time. SQL Server 2005 caches query plans at the level of each statement in a batch, regardless of where the SQL comes from.
The long-term difference might be that stored procedures are many, many times easier for a DBA to manage and tune, whereas hundreds of different queries that have to be gleaned from profiler traces are a nightmare.
I've had this argument many times over.
Almost always I end up grabbing a really good DBA, running a proc and a piece of code with the profiler on, and getting the DBA to show that the results are so close the difference is negligible.
Measure it.
Really, any discussion on this topic is probably futile until you've measured it.
He may be correct for the specific use case he is thinking of. A stored procedure will probably execute faster for some complex set of SQL that can be arbitrarily tuned. However, something you get from tools like Hibernate is caching, which may prove much faster over the lifetime of your actual application.
The additional layer of abstraction will cause it to be slower than a pure call to a sproc. Simply because you have additional allocations on the managed heap and additional pushes and pops on the call stack, it is more efficient to call a sproc than to have an ORM build the query, regardless of how good the ORM is.
How much slower, if it's even measurable, is debatable. This is also offset by the fact that most ORMs have a caching mechanism to avoid doing the query at all.
Even if the stored procedure is 10% faster (it probably isn't), you may want to ask yourself how much it really matters. What really matters in the end, is how easy it is to write and maintain code for your system. If you are coding a web app, and your pages all return in 0.25 seconds, then the extra time saved by using stored procedures is negligible. However, there can be many added advantages of using an ORM like NHibernate, which would be extremely hard to duplicate using only stored procedures.