Views or table functions or something else - SQL

I designed 5 stored procedures which use almost the same join conditions, but the parameters or values in the WHERE clause change for each one on different runs.
Is the best solution to create a view with all the join conditions but no WHERE clause, and then query against that view? Do views update themselves automatically if the underlying data changes?
Can I use sub-queries, or a query similar to the one below? (I think I read somewhere that views do not support sub-queries, but I'm not 100% sure.)
select count(x1) as x1cnt, count(x2) as x2cnt
from (
    select x1, x2,
        (
            case when x1 = 'y' then 1 else 0 end +
            case when x2 = 'y' then 1 else 0 end
        ) per
    from vw_viewname
) v1
where v1.per = 1
Updated below:
In my queries I also use joins similar to this:
select c1, c2, c3
from [[join conditions - 5 tables]]
inner join
(
    select x1, x2, x3, some case statements
    from [[join conditions - 5 tables]]
    where t1.s1 = val1 and t2.s2 = v2 etc
) s
on s.id = id
So I'm writing the same join twice, which is why I thought I could reduce the duplication using a view.

Leaving out the WHERE clause could make the query run more slowly, or simply return more results than a specific query would. You will have to determine whether that is advantageous on your system.
You will get the common view result set to work with. Views basically run their query when you use them, so you will get the same results as if you had run the query yourself by some other mechanism. You can do subqueries on a view just as if it were another table; that should not be a problem. But if you have 5 different queries doing 5 specific things, then it is probably beneficial to leave them as they are. One or two of those may be called more often, and you would be trading off their performance against a general view, gaining nothing for doing so other than view reuse.
I would only construct the view if you have some specific benefit from doing so.
Also, I found this post that may be similar. I don't know if you will find it helpful or not.
EDIT: Well, I think it would just make it worse, really. You would just be calling the view twice, and if it's a generic view, each of those calls is going to return a lot of generic results to deal with.
I would say just focus on optimizing those queries to give you exactly what you need. That's really what you have 5 different procedures for anyway, right? :)

It's 5 different queries so leave it like that.
It's seductive to encapsulate similar JOINs in a view, but before you know it you have views on top of views and awful performance. I've seen it many times.
The "subquery in a view" thing probably refers to indexed views which have limitations.

Unless you're talking about an indexed view, the view will actually run its underlying query on demand. In that regard, it would be the same as using a subquery.
If I were you, I would leave it as it is. It may seem like you should compact your code (each of the 5 scripts has almost the same code), but it's what is different that is important here.

You can have subqueries in a view, and that approach is perfectly acceptable.

SQL Server views do support sub-queries. And, in a sense, views do auto-update themselves, because a view is not a persisted object (unless you use an indexed view). With a non-indexed view, each time you query the view it uses the underlying tables, so your view will be as up to date as the tables it is based upon.
It sounds to me like a view would be a good choice here.
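As a rough sketch of what that could look like (the table and column names here are hypothetical, since the real five-table join isn't shown; only vw_viewname and the subquery come from the question):
-- Hypothetical view over the shared joins, with no WHERE clause
CREATE VIEW vw_viewname AS
SELECT t1.id, t1.x1, t2.x2
FROM t1
INNER JOIN t2 ON t2.id = t1.id;
GO
-- The subquery from the question then works against the view like any other table
SELECT COUNT(x1) AS x1cnt, COUNT(x2) AS x2cnt
FROM (
    SELECT x1, x2,
           CASE WHEN x1 = 'y' THEN 1 ELSE 0 END +
           CASE WHEN x2 = 'y' THEN 1 ELSE 0 END AS per
    FROM vw_viewname
) v1
WHERE v1.per = 1;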

It's fine to create a view, even if it contains a subselect, and you can leave the WHERE clause out of the view.
Are you sure you want to use COUNT like that without a GROUP BY? It counts the number of rows which contain non-null values for the column.

I've done a lot of presentations recently on the simplification offered by the Query Optimiser. Essentially if you have planned your joins well enough, the system can see that they're redundant and ignore them completely.
http://msmvps.com/blogs/robfarley/archive/2008/11/09/join-simplification-in-sql-server.aspx
Stored procedures will do the same work each time (parameters having some effect), but a view (or inline TVF) will be expanded into the outer query and simplified out.
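As a hedged illustration of that simplification (the tables, columns, and foreign key here are hypothetical): if only columns from one side of the join are referenced, and a trusted, non-nullable foreign key guarantees the join cannot change the row count, the optimizer can drop the other table from the expanded view entirely.
CREATE VIEW dbo.vw_OrderDetails AS
SELECT o.OrderID, o.OrderDate, o.CustomerID, c.CustomerName
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON c.CustomerID = o.CustomerID;
GO
-- Only Orders columns are referenced; with a trusted, non-nullable FK on
-- Orders.CustomerID the Customers join can be simplified out of the plan
SELECT OrderID, OrderDate
FROM dbo.vw_OrderDetails
WHERE OrderDate >= '20200101';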

Related

SQL adding Order By clause causes the query to run significantly faster. Explanation needed

So I have this query:
SELECT *
FROM ViewTechnicianStatus
WHERE NotificationClass = 2
AND MachineID IN (SELECT ID FROM MachinesTable WHERE DepartmentID = 1 AND IsMachineActive <> 0)
--ORDER BY ResponseDate DESC
The view is huge and complex, with a lot of joins and subqueries. When I run this query it takes forever to finish, but if I add the ORDER BY it finishes instantly and returns 20 rows as intended. I don't understand how adding the ORDER BY could have such a huge positive impact on performance. I would love it if somebody could explain the phenomenon to me.
EDIT:
Here is the rundown with the SET STATISTICS TIME, IO ON; flags on. Sorry for the hidden table names, but I don't think I can expose those.
Without ORDER BY
With ORDER BY
To answer your question: the reason that your query runs faster when adding the ORDER BY is due to indexing. Probably all the clients that you tested on had indexes for those specific fields/tables, and using the ORDER BY let those indexes improve the performance.
Summary
OK.. I've thought about this for a while as I think it's an interesting issue. I believe it's very much an edge case - which is part of what makes it interesting.
I'm taking an educated guess based on the info provided - obviously, without being able to see it/play with it, I cannot be certain. But I think this explanation matches the evidence based on the info you provide and the statistics.
I think the main issue is a poor query plan. In the version without the sort, it uses an inappropriate nested loop; in the version with the sort, it does (say) a hash match or merge join.
I have found that SQL Server often has issues with query plans within complex views that reference other views, and especially if those sub-views have group bys/sorts/etc.
To demonstrate the difference that could occur, I'll simplify your complex view into 2 subgroups that I'll call 'view groups' (each of which may be one or several views and tables - not a technical term, just a term to summarise them).
The first view group contains most tables,
The second view group contains the views getting data from tables 6 and 7.
For both approaches, how SQL uses the data within each view group is probably the same (e.g., it uses the same indexes, etc.). However, there's a difference in its approach to how it does the join between the two view groups.
Example - query planner underestimates cost of view group 2 and doesn't care which method is used
I'm guessing that:
The first view group, at the point of the join, is dealing with about 3000 rows (it hasn't filtered them down yet), and
The query planner thinks view group 2 is cheap to run.
In the version without the order by, the query plan is designed with a nested loop join. That is, it gets each value in view group 1, and then for each value it runs view group 2 to get the relevant data. This means the view group 2 is run 3000-ish times (once for each value in view group 1).
In the version with the order by, it decides to do (say) a hash match between view group 1 and view group 2. This means it has to only run view group 2 once, but spend a bit more time sorting it. However, because you asked for it to be sorted anyway, it chooses the hash match.
However, because the query planner underestimated the cost of view group 2, it turns out that the hash match is a much, much better query plan for the circumstances.
Example - query planner use of cached plans
I believe (but may be wrong!) that when you reference views within views, it can often just use cached plans for the sub-views rather than trying to get the best plan possible for your current situation.
It may be that one of your views uses the "cached plan" whereas the other one tries to optimise the query plan including the sub-views.
Ironically, it may be that the query version with the order by is more complex, and in this case it uses the cached plans for view group 2. However, as it knows it hasn't optimised the plan for view group 2, it simply gets the data once for view group 2, then keeps all results in memory and uses it in a hash match.
In contrast, in the version without the order by, it takes a shot at optimising the query plan (including optimising how it uses the views), and makes a mess of it.
Possible solutions
These are all possibilities - they may make it better or may make it worse! Note that SQL is a declarative language (you tell the computer what to do/what you want, but not how to do it).
This is not a comprehensive list of possibilities, but they are things you can try:
Pre-calculate all or part(s) of the views (e.g., put the pre-calculated stuff from tables 6 and 7 into a temporary table, then use the temporary tables in the views)
Simplify the SQL and/or move all the SQL into a single view that doesn't call other views
Use join hints, e.g., instead of INNER JOIN, use INNER HASH JOIN in the appropriate position (a sketch follows this list)
Use OPTION(RECOMPILE)
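As a hedged sketch of the last two options applied to the original query (the table and column names come from the question; rewriting the IN as a join assumes MachinesTable.ID is unique, and note that a join hint also fixes the join order):
SELECT v.*
FROM ViewTechnicianStatus v
INNER HASH JOIN MachinesTable m
    ON m.ID = v.MachineID
WHERE v.NotificationClass = 2
  AND m.DepartmentID = 1
  AND m.IsMachineActive <> 0
OPTION (RECOMPILE);  -- ask for a fresh plan for these specific values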

Filtering a view of 20 thousand rows takes 30 minutes

My issue is that I have a view that returns 20,000 rows of data for a certain day:
select * from view
where date = X
but when I add a WHERE clause for the perimeter I need,
select * from view
where date = X and perimeter in ( 'A' , 'B' , 'C', 'D' )
the result is 10,000 rows more or less, but it takes up to 30 minutes to get the result.
The view is quite complicated, with dozens of joins and aggregates, etc.
How can I fix that?
Thanks for your help,
and if you have any questions, ask me :)
** UPDATE: thanks for your answers.
The real query is pretty complicated; this is only an example to explain my problem. The execution plan is quite long and not very useful, since I get the exact (100%) same execution plan (estimated and actual)
whether I use the filter on the perimeter column or not.
But the time goes from 30 seconds when I only filter on the date to 30 minutes when I add the perimeter filter,
so it must mean that the execution plan is wrong anyway.
Statistics are updated normally on those tables.
There is no index on this column on the table where I look it up. Is adding one necessary for the view?
The column we filter on is retrieved via a left join in the view.
**
First ask yourself:
Is it normal to take such time?
In some cases, when data is huge and statements are complex, it is normal for certain queries to take more time.
If this is not your case, then:
Do I need all columns? - why SELECT *? Selecting only the needed columns can sometimes optimize the view execution
Do I need all the logic used in the view? - if the view is so complex, think about whether it is possible to keep only the statements you need (for example, create a separate view with less logic, fewer data sources, fewer columns)
Is the query using the best indexes? - this is a huge topic, but you are performing filtering, so it might be possible to optimize the query using an appropriate index or even a filtered index
Can I filter the data in advance? - sometimes you can create an inline table-valued function in order to perform the filtering in advance (the function is basically a view with parameters - imagine passing parameter 'X' to this function, which is then passed on to another function referenced in the view, so it now returns only 5% of the rows generated previously - so, maybe some of the values in your WHERE clauses can be passed as parameters; see the sketch after this list)
Do I need to materialize this view? - it is a last resort, but you can create an index on the view in order to materialize it like an ordinary table and optimize performance further (note that not every view can be indexed)
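A minimal sketch of the inline table-valued function idea, using hypothetical table and column names (the real view's joins and the perimeter source table aren't shown):
-- Hypothetical: the function repeats the view's joins but applies the parameters early
CREATE FUNCTION dbo.fn_PerimeterData (@EntryDate date, @Perimeter varchar(10))
RETURNS TABLE
AS
RETURN
(
    SELECT f.EntryDate, p.Perimeter, f.Amount            -- hypothetical columns
    FROM dbo.FactTable f                                 -- hypothetical base tables
    LEFT JOIN dbo.PerimeterTable p ON p.ID = f.PerimeterID
    WHERE f.EntryDate = @EntryDate
      AND p.Perimeter = @Perimeter
);
GO
-- Usage: the parameters reach the base tables instead of filtering the finished view
SELECT * FROM dbo.fn_PerimeterData('2020-01-01', 'A');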
From what I have seen, a lot of developers use legacy views, created to solve particular issues in the past, instead of creating new ones. And many, many times these views do much more than is actually needed.

Create table from table/view?

I have a weird scenario. I tried to see if I could find any help on the topic, but I either don't know how to search for it properly, or there is nothing to find.
So here is the scenario.
I have a table T_A. From table T_A, I created a view V_B. Now, I can make UPDATEs to V_B, and it works just fine. But when I create a view V_C which is a UNION of T_A and T_D, the view V_C is not updateable. I understand the logic behind why that is the case.
But my question is, is there something I can do where I combine 2 tables and am able to update?
Maybe in a way have table T_D extend T_A?
Some extra information: T_A has items 1-10 and T_D has items 100 - 200. I want to join them so there is a table/view which is updateable that has items 1-10 and 100-200.
If you have a non-updatable view, you can always make it updatable by defining instead of triggers on the view. That means that you would need to implement the logic to determine how to translate DML against the view into DML against one or both of the base tables. In your case, it sounds like that would be the logic to figure out which of the two tables to update.
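A minimal sketch of that idea in Oracle syntax (the trigger name and the item_id/item_name columns are hypothetical, since the real column lists aren't shown; SQL Server has an equivalent INSTEAD OF trigger mechanism):
CREATE OR REPLACE TRIGGER trg_v_c_upd
  INSTEAD OF UPDATE ON v_c
  FOR EACH ROW
BEGIN
  -- Route the update to whichever base table owns the row (1-10 vs 100-200)
  IF :OLD.item_id BETWEEN 1 AND 10 THEN
    UPDATE t_a SET item_name = :NEW.item_name WHERE item_id = :OLD.item_id;
  ELSE
    UPDATE t_d SET item_name = :NEW.item_name WHERE item_id = :OLD.item_id;
  END IF;
END;
/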
A couple of points, though.
If T_A and T_D have non-overlapping data, it doesn't make sense to use a UNION, which does an implicit DISTINCT. You almost certainly want to use the less expensive UNION ALL.
If you find yourself storing data about items in two separate tables, only to UNION ALL those two tables together in a view, it is highly likely that you have an underlying data model problem. It would seem to make much more sense to have a single table of items possibly with an ITEM_TYPE that is either A or D.
It may be possible to make your view updatable if you use a UNION ALL and have (or add) non-overlapping constraints that would allow you to turn your view into a partition view. That's something that has existed in Oracle for a long time but you won't find a whole lot of documentation about it in recent versions because Oracle partitioning is a much better solution for the vast majority of use cases today. But the old 7.3.4 documentation should still work.
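A rough sketch of that non-overlapping constraint arrangement, again with hypothetical item_id/item_name columns (whether the optimizer actually treats this as a partition view, and whether it then becomes updatable, depends on the Oracle version and settings described in that old documentation):
ALTER TABLE t_a ADD CONSTRAINT chk_t_a_items CHECK (item_id BETWEEN 1 AND 10);
ALTER TABLE t_d ADD CONSTRAINT chk_t_d_items CHECK (item_id BETWEEN 100 AND 200);

CREATE OR REPLACE VIEW v_c AS
SELECT item_id, item_name FROM t_a
UNION ALL
SELECT item_id, item_name FROM t_d;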

SQL Query Performance in case of simple union

I have a simple query tuning question:
Can we improve the performance of a view whose definition is
SELECT * FROM A
UNION ALL
SELECT * FROM B
and this is performing so poorly that it is taking 12 seconds for 6.5k records.
Any help is appreciated.
The problem with a union view like that is that the original columns of the tables are not accessible to the optimizer via the view, so when you use the view like this:
select * from myview where foo = 5
the where clause is a filter on the entire rowset delivered by the view, so all the rows from both tables are processed after the union, and no indexes will be used.
To have any hope at performance, you have to somehow get the condition you want inside the view and applied to each table, but keep it variable so you can apply different criteria when you use it.
I found a work-around that does this. It's a little bit hacky, and doesn't work for concurrent use, but it works! Try this:
create table myview_criteria(val int);
insert into myview_criteria values (0); -- it should have exactly one row
create view myview as
SELECT * FROM A
WHERE foo = (select val from myview_criteria)
UNION ALL
SELECT * FROM B
WHERE foo = (select val from myview_criteria);
then to use it:
update myview_criteria set val = 5;
select * from myview;
Assuming there's an index on foo, that index will be used.
Caution here, because obviously this technique won't work when multiple executions are done concurrently.
Is there a reason you keep these two in separate tables? No matter what you do, eventually it will be horrible. If possible, consider migrating to one table. If not, you can always "materialize a view" by inserting both into the same table.
That being said, you are not going to just be selecting * on a union; if there are specific conditions on top of the view, performance can be improved by indexing that column (a short sketch follows).
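For instance, if queries on top of the view always filter on foo (as in the earlier workaround), a minimal sketch would be to index that column in both base tables:
CREATE INDEX ix_a_foo ON A (foo);
CREATE INDEX ix_b_foo ON B (foo);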
It would also help if you told everyone which specific database this is for.
My best advice is to limit the number of records returned from the source tables to only what you really need, and to make sure your data is properly indexed.
Don't use * unless it's an ad-hoc script (and even then it would still be better to list the columns); include only the columns that you really need for the task at hand.
Use proper indexes based on the most frequently used queries. Remove any unused indexes. Remove any duplicate indexes.
Pluralsight has a wonderful course on MS SQL Server query performance tuning.
http://www.pluralsight.com/training/Courses/TableOfContents/query-tuning-introduction
The course is presented by Vinod Kumar and Pinal Dave who are two very brilliant individuals.
You could also check out Pinal's article on SQL Server Performance to help point out areas that you may not have thought about.
http://blog.sqlauthority.com/sql-server-performance-tuning/

Performance when querying a View

I'm wondering if this is a bad practice or if in general this is the correct approach.
Let's say that I've created a view that combines a few attributes from a few tables.
My question, what do I need to do so I can query against this view as if it were a table without worrying about performance?
All attributes in the original tables are indexed; my concern is that the resulting view will have hundreds of thousands of records, which I will want to narrow down quite a bit based on user input.
What I'd like to avoid is having multiple versions of the code that generates this view floating around, each with a few extra WHERE conditions to facilitate the user input filtering.
For example, assume my view has this header: VIEW(Name, Type, DateEntered). It may have 100,000+ rows (possibly millions). I'd like to be able to create this view in SQL Server, and then in my application write queries like this:
SELECT Name, Type, DateEntered FROM MyView WHERE DateEntered BETWEEN #date1 and #date2;
Basically, I am denormalizing my data for a series of reports that need to be run, and I'd like to centralize where I pull the data from. Maybe I'm not looking at this problem from the right angle, though, so I'm open to alternative ways to attack this.
My question, what do I need to do so I can query against this view as if it were a table without worrying about performance?
SQL Server is very good at view unnesting.
Your queries will be as efficient as if the view's query were used in the query itself.
This means that
CREATE VIEW myview AS
SELECT *
FROM /* complex joins */

followed by

SELECT *
FROM mytable
JOIN myview
ON …
and
SELECT *
FROM mytable
JOIN (
    SELECT *
    FROM /* complex joins */
) myview
ON …
will have the same performance.
SQL Server 2005 has indexed views - these provide indexes on views. That should help with performance. If the underlying tables already have good indexes on the queried fields, these will be used - you should only add indexed views when this is not the case.
These are known in other database systems as materialized views.
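A minimal sketch of creating one, with hypothetical table names and assuming ID is unique (indexed views also require SCHEMABINDING, two-part names, and a number of other restrictions):
CREATE VIEW dbo.vw_Report
WITH SCHEMABINDING
AS
SELECT t.ID, t.Name, t.Type, t.DateEntered
FROM dbo.SomeTable t
INNER JOIN dbo.OtherTable o ON o.ID = t.OtherID;
GO
-- The unique clustered index is what materializes and persists the view's rows
CREATE UNIQUE CLUSTERED INDEX IX_vw_Report ON dbo.vw_Report (ID);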
The view will make use of the indexes on the columns in your WHERE clause to filter the results.
Views aren't stored result sets. They're stored queries, so you'll have the performance gained from your indexes each time you query the view.
Why would it perform badly? I mean, you can think of a view as a compiled select statement. It makes use of existing indexes on the underlying tables, even when you add extra where clauses. In my opinion it is a good approach. In any case it's better than having virtually the same select statement scattered all over your application (from a design and maintainability point of view, at least).
If the view is not indexed, then...
When you query a view, the view itself is essentially ignored: it is expanded into the main query.
It is the same as querying the main tables directly.
What will kill you is view on top of view on top of view, in my experience.
It should, in general, perform no worse than the inline code.
Note that it is possible to make views which hide very complex processing (joins, pivots, stacked CTEs, etc.), and you may never want anyone to be able to SELECT * FROM such a view for all time, or all products, or whatever. If you have standard filter criteria, you can use an inline table-valued function (effectively a parameterized view), which would require all users to supply the expected parameters.
In your case, for instance, they would always have to do:
SELECT Name, Type, DateEntered
FROM MyITVF(#date1, #date2);
To share the view logic between multiple ITVFs, you can build many inline table-valued functions on top of the view, but not give access to the underlying tables or views to users.
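A minimal sketch of such an inline table-valued function built over the view (MyITVF and MyView come from this thread; the parameter names, types, and dates are illustrative):
CREATE FUNCTION dbo.MyITVF (@date1 datetime, @date2 datetime)
RETURNS TABLE
AS
RETURN
(
    SELECT Name, Type, DateEntered
    FROM dbo.MyView
    WHERE DateEntered BETWEEN @date1 AND @date2
);
GO
-- Callers can be granted SELECT on the function without any access to MyView or its base tables
SELECT Name, Type, DateEntered
FROM dbo.MyITVF('2010-01-01', '2010-12-31');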