Strategies for optimizing views referencing views in SQL Server? - sql

Update:
Out of respect for your time, I am adding indices to the tables that the subviews pull from. I will come back and edit this once I have done as much as I can to minimize complexity and make my request for help more specific. I can also delete and rewrite this post if that is preferred.
The bottleneck is the subview. The estimated plan shows that most of the work is a Hash Match between the tables and the subview. link to query plan
I understand that the predicate and join columns should be indexed. What I'm not sure about is the ideal strategy for the subviews.
Should I:
Convert the subview to a table-valued function? I heard from an expert that this is an ideal solution, but they didn't cover why. I don't know whether the indexed columns from the subview carry over to the main view.
Or do I need to convert the main view into a table-valued function as well to take advantage of the indexes?
Or maybe I'm way off and don't need to convert to a table-valued function at all?
Main view:
SELECT *
FROM table1 WITH (INDEX(IX_table1))
INNER JOIN table2 WITH (INDEX(IX_table2))
    ON table1.field1 = table2.field1
    AND table1.field2 = table2.field2
LEFT JOIN SubView AS SubView1 WITH (NOLOCK)
    ON table1.field1 = SubView1.field1
    AND table1.field2 = SubView1.field2
    AND table2.field3 = SubView1.field3
    AND table2.field4 = SubView1.field4
WHERE table1.PredicateDate >= dynamicDate
    AND table1.field1 IN (3, 4)
    AND table1.field5 = 0
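For reference, here is a minimal sketch of what the subview might look like as an inline table-valued function (every name below is a placeholder, not the real definition). An inline TVF is expanded into the calling query just like a view, so the indexes on its base tables can still be used:

-- hypothetical inline table-valued function replacing the subview;
-- the body is a single SELECT, so it is inlined into the outer query
CREATE FUNCTION dbo.fn_SubView1 (@fromDate datetime)
RETURNS TABLE
AS
RETURN
(
    -- placeholder base table and columns; the date filter is pushed into the function
    SELECT s.field1, s.field2, s.field3, s.field4
    FROM dbo.SubViewBaseTable AS s
    WHERE s.PredicateDate >= @fromDate
);

The calling query could then use LEFT JOIN dbo.fn_SubView1(dynamicDate) AS SubView1 in place of the subview, keeping the same join conditions.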

Apologies, I put this in the answer section instead of the comment section. I tried to fix it, but I don't have enough reputation to add a comment.
This is for tables in a Microsoft ERP system. Microsoft has its default indexes on the tables, which shouldn't be changed or deleted; on any ERP upgrade, the indexes get recreated by Microsoft anyway.
The tables for most of the reporting are order history headers (8 million records) and lines (57 million records). These tables get populated when an order is transferred to an invoice or when an invoice is posted. In the first situation, the order goes to the history table and an invoice is created in the open table. In the second situation, an invoice is moved to the history table when the invoice is posted. For these processes, the ERP system has a thick client (which hasn't changed much since 2010 or earlier). The process is rather long, touches many tables, and does not use an explicit SQL transaction. If it is interrupted, a manual fix-up is required for any tables that were not updated.
The READONLY/READUNCOMMITTED hints were initially used for large reporting against the live database. The views that Vinh is using run against a replication server that is now in place. READONLY is normally used against information from previous months or days, so current-day changes are not a problem. The large reports were slowing down the transfer and posting processes discussed in the previous paragraph. The posting process currently takes about 1 hour to post 500 transactions, so anything that keeps it from slowing down further is good.
Why a specific index is specified: the 57 million rows are divided into order types (SOPTYPE 2 = order, 3 = invoice, 5 = backorder). Most of the Microsoft indexes use SOPTYPE as the first field in the index, so most of the queries end up using an index scan rather than an index seek. In some cases, just specifying the index reduces the query time from 2 minutes to 5 seconds. When comparing the index scores, both indexes may be at 80%, but SQL Server tends to choose the index with SOPTYPE as the first index field.
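As an illustration of the kind of hint described above (the table, index, and column names below are invented; only SOPTYPE comes from the post):

-- force the index that leads with the searched column rather than with SOPTYPE
SELECT l.OrderNumber, l.ItemNumber, l.Quantity
FROM dbo.OrderHistoryLines AS l WITH (INDEX(IX_OrderHistoryLines_ItemNumber))
WHERE l.ItemNumber = 'WIDGET-100'
  AND l.SOPTYPE = 3;   -- 3 = invoice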
We are probably one of the larger data users of the particular ERP system. I don't believe Microsoft has optimized for this data size.
I hope this information helps.

It is taking a while to optimize the subviews due to dependencies outside my control. We're not able to delete the post, so I am closing out this question for now.

Related

Oracle DB performance: view vs table

I have created a table that is a join of 3-4 other tables. The field values in the original source tables from which this table was created DO change, but rarely.
Updating or recreating the table takes about 30 mins-1 hour, and then some reports are run against it. However, this requires keeping track of any changes from the original source tables.
If, instead, I run reports off a VIEW, I know with 100% certainty that all the field values are correct - but will my SELECT performance suffer and become slower due to the view 'going back and fetching' values each time?
In this case, speed is as important as accuracy, and my ultimate question is whether to use a view or a table. Thank you to anyone who's taken the time to read this!

Create a Historical Auditing Table

Currently we have an AuditLog table that holds over 11M records. Regardless of the indexes and statistics, any query referencing this table takes a long time. Most reports don't check for audit records more than a year old, but we would still like to keep these records. What's the best way to handle this?
I was thinking of keeping the AuditLog table to hold all records less than or equal to a year old, then moving any records more than a year old to an AuditLogHistory table, perhaps by running a batch job every night to move these records over and then updating the indexes and statistics on the AuditLog table. Is this an okay way to complete this task, or how else should I be storing older records?
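For illustration, the nightly move could be done in batches with DELETE ... OUTPUT; a minimal sketch, assuming a date column named CreatedDate and that AuditLogHistory has the same columns as AuditLog:

DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- move rows older than one year to the history table, 50,000 at a time
    DELETE TOP (50000) FROM dbo.AuditLog
    OUTPUT deleted.* INTO dbo.AuditLogHistory
    WHERE CreatedDate < DATEADD(YEAR, -1, GETDATE());

    SET @rows = @@ROWCOUNT;
END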
The records brought back from the AuditLog table hit a linked server and check 6 different databases to see if a certain member exists in them based on a condition. I don't have access to make any changes to the linked-server databases, so I can only optimize what I have, which is the AuditLog. Hitting the linked-server databases uses up over 90% of the query's cost, so I'm just trying to limit what I can.
First, I find it hard to believe that you cannot optimize a query on a table with 11 million records. You should investigate the indexes that you have relative to the queries that are frequently run.
In any case, the answer to your question is "partitioning". You would partition by the date column and be sure to include this condition in all queries. That will reduce the amount of data scanned and probably speed up the processing.
The documentation is a good place to start for learning about partitioning.
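A minimal sketch of date-based partitioning, assuming a datetime column named AuditDate (note that before SQL Server 2016 SP1, table partitioning requires Enterprise Edition):

CREATE PARTITION FUNCTION pfAuditByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01', '2015-01-01');

CREATE PARTITION SCHEME psAuditByYear
AS PARTITION pfAuditByYear ALL TO ([PRIMARY]);

-- placing the clustered index on the scheme partitions the table by AuditDate;
-- an existing clustered index or primary key would have to be rebuilt onto the scheme
CREATE CLUSTERED INDEX CIX_AuditLog_AuditDate
    ON dbo.AuditLog (AuditDate)
    ON psAuditByYear (AuditDate);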

Performance Issue: SQL Views vs LINQ for Decreasing Query Execution Time

I have a system built with ASP.NET Web Forms, and there is an accounts records generation form. In some specific situations I need to fetch all records, which is close to 1 million rows.
One solution could be to reduce the number of records fetched, but when we need to fetch records for more than a year, or for 5 years, that is half a million or a million records. How can I decrease the time this takes?
What points can I use to reduce the time? I can't show the full query here; it's a big view that calls some other views inside it.
Would it take less time if I wrote it as a LINQ query? That's why I asked LINQ vs views.
I have executed a "SELECT * FROM TableName" query and it has been running for 40 minutes and is still executing; the table has 117,000 records. Can we decrease this time?
I started this as a comment but ran out of room.
Use the server to do as much filtering for you as possible and return as few rows as possible. Client side filtering is always going to be much slower than server side filtering. Eg, it does not have access to the indexes & optimisation techniques that exist on the server.
Linq uses "lazy evaluation" which means that it builds up a method for filtering but does not execute it until it is forced to. I've used it and was initially impressed with the speed ... until I started to access the data it returned. When you use the data you want from Linq, this will trigger the actual selection process, which you'll find is slow.
Use the server to return a series of small resultsets and then process those. If you need to join these resultsets on a key, save them into dictionaries with that key so you can join them quickly.
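As a small illustration of pushing the filtering to the server (the table and column names here are made up):

-- return only the columns the report needs, filtered on the server
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM dbo.AccountRecords
WHERE OrderDate >= DATEADD(YEAR, -1, GETDATE())
ORDER BY OrderDate;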
Another approach is to look at Entity Framework to create a mirror of the server database structure along with indexes so that the subset of data you retrieve can be joined quickly.

How to merge a 500 million row table with another 500 million row table

I have to merge two 500M+ row tables.
What is the best method to merge them?
I just need to display the records from these two SQL Server tables if somebody searches on my webpage.
These are fixed tables; no one will ever change data in these tables once they are live.
create a view myview as select * from table1 union select * from table2
Is there any harm using the above method?
If I start merging 500M rows it will run for days, and if the machine reboots the database will go into recovery mode, and then I have to start from the beginning again.
Why am I merging these tables?
I have a website which provides a search on the person table.
This table has columns like Name, Address, Age, etc.
We got .txt files with 500 million similar records, which we loaded into another table.
Now we want the website search page to query both tables to see if a person exists in either of them.
We get similar .txt files of 100 million or 20 million records, which we load into this huge table.
How are we currently doing it?
We import the .txt files into separate tables (some columns are different in the .txt files).
Then we arrange the columns and do the data type conversions.
Then we insert this staging table into the liveCopy huge table (in a test environment).
We have SQL Server 2008 R2.
Can we use table partitioning for performance benefits?
Is it okay to create monthly small tables and create a view on top of them?
How can indexing be done in this case?
We only load new data once a month and then do the selects.
Does replication help?
The biggest issue I am facing is managing huge tables.
I hope I explained the situation.
1) Usually, to get more performance, developers split large tables into smaller ones and call this partitioning (horizontal partitioning, to be precise, since there is also vertical partitioning). Your view is an example of such partitions joined back together. It is mostly used to split a large amount of data into ranges of values (for example, table1 contains records with column [col1] < 0, while table2 holds those with [col1] >= 0). But it is fine for unsorted data too, because you get more room for speed improvements, for example parallel reads if the tables are placed on different storage. So this is a good choice.
2) Another way is to use the MERGE statement, supported in SQL Server 2008 and higher - http://msdn.microsoft.com/en-us/library/bb510625(v=sql.100).aspx; a minimal sketch is shown after the batch example below.
3) Of course you can copy using INSERT+DELETE, but in that case, or when using the MERGE command, do it in small batches. Something like:
SET ROWCOUNT 10000;
DECLARE @Count int = 1;
WHILE @Count > 0
BEGIN
    -- ... INSERT+DELETE / MERGE transaction ...
    SET @Count = @@ROWCOUNT;
END
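For reference, a minimal sketch of the MERGE approach from point 2 (table and column names are invented for illustration):

-- insert rows from the staging table that are not already in the live table
MERGE dbo.PersonLive AS target
USING dbo.PersonStaging AS source
    ON target.PersonID = source.PersonID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PersonID, Name, Address, Age)
    VALUES (source.PersonID, source.Name, source.Address, source.Age);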
If your purpose is truly just to move the data from the two tables into one table, you will want to do it in batches - 100K records at a time, or something like that. I'd guess you crashed before because your transaction log got full, although that's just speculation. Make sure to take a log backup after each batch if you are in the FULL recovery model; in SIMPLE recovery, a CHECKPOINT after each batch lets the log space be reused.
That said, I agree with all the comments that you should provide why you are doing this - it may not be necessary at all.
You may want to have a look at an Indexed View.
In this way, you can set up indexes on your view and get the best performance out of it. The expensive part of using indexed views is the extra cost on inserts, updates, and deletes, but for read performance it would be your best solution.
http://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/
https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/
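For reference, a minimal indexed-view sketch (the table and column names here are invented). Note that indexed views come with restrictions: they require SCHEMABINDING and cannot contain UNION, for example, so the union-of-two-tables view above could not be indexed directly without restructuring:

-- hypothetical indexed view over a one-to-one join of the two tables
CREATE VIEW dbo.vPersonSearch
WITH SCHEMABINDING
AS
SELECT p.PersonID, p.Name, p.Address, t.SourceFile
FROM dbo.Person AS p
INNER JOIN dbo.PersonFromTextFiles AS t
    ON t.PersonID = p.PersonID;
GO
-- the first index on a view must be a unique clustered index
CREATE UNIQUE CLUSTERED INDEX IX_vPersonSearch
    ON dbo.vPersonSearch (PersonID);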
If the two tables are linked one to one, then you are wasting a lot of CPU time on each read, especially since you mentioned that the tables don't change at all. You should have only one table in this case.
Try creating a new table that includes (at least) the needed columns from the two tables.
You can do this by:
SELECT * INTO newTable
FROM A LEFT JOIN B ON A.x = B.y
or, if you want to exclude people who don't have the text-file information:
SELECT * INTO newTable
FROM A INNER JOIN B ON A.x = B.y
Note that you should at least have indexes on the join fields, to speed up the process.
More details about the fields may help giving more precise answer as well.

Massive table in SQL 2005 database needs better performance!

I am working on a data driven web application that uses a SQL 2005 (standard edition) database.
One of the tables is rather large (8 million+ rows with about 30 columns). The size of the table obviously affects the performance of the website, which selects items from the table through stored procs. The table is indexed, but the performance is still poor due to the sheer number of rows in the table. This is part of the problem: the table is read as often as it is updated, so we can't add or remove indexes without making one of the operations worse.
The goal I have here is to increase the performance when selecting items from the table. The table has 'current' data and old, barely touched data. The most effective solution we can think of at this stage is to separate the table into two: one for old items (before a certain date, say 1 Jan 2005) and one for newer items (on or after 1 Jan 2005).
We know of things like Distributed Partitioned Views - but all of these features require Enterprise Edition, which the client will not buy (and no, throwing hardware at it isn't going to happen either).
You can always roll your own "poor man's partitioning / DPV," even if it doesn't smell like the right way to do it. This is just a broad conceptual approach:
Create a new table for the current year's data - same structure, same indexes. Adjust the stored procedure that writes to the main, big table to write to both tables (just temporarily). I recommend making the logic in the stored procedure say IF CURRENT_TIMESTAMP >= '[some whole date without time]' - this will make it easy to backfill the data in this table which pre-dates the change to the procedure that starts logging there.
Create a new table for each year in your history by using SELECT INTO from the main table. You can do this in a different database on the same instance to avoid the overhead in the current database. Historical data isn't going to change I assume, so in this other database you could even make it read only when it is done (which will dramatically improve read performance).
Once you have a copy of the entire table, you can create views that reference just the current year, another view that references 2005 to the current year (by using UNION ALL between the current table and those in the other database that are >= 2005), and another that references all three sets of tables (those mentioned, and the tables that pre-date 2005). Of course you can break this up even more but I just wanted to keep the concept minimal.
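For illustration, the union views described above might look something like this (all object names are placeholders). If each yearly table carries a trusted CHECK constraint on the date column, the optimizer can usually skip the tables outside the requested range:

-- current data stays in the main database; older years live in the read-only history database
CREATE VIEW dbo.vOrders_2005_To_Current
AS
SELECT * FROM dbo.Orders_Current
UNION ALL
SELECT * FROM HistoryDB.dbo.Orders_2006
UNION ALL
SELECT * FROM HistoryDB.dbo.Orders_2005;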
Change your stored procedures that read the data to be "smarter" - if the date range requested falls within the current calendar year, use the smallest view that is only local; if the date range is >= 2005 then use the second view, else use the third view. You can follow similar logic with stored procedures that write, if you are doing more than just inserting new data that is relevant only to the current year.
At this point you should be able to stop inserting into the massive table and, once everything is proven to be working, drop it and reclaim some disk space (and by that I mean freeing up space in the data file(s) for reuse, not performing a shrink db - since you will use that space again).
I don't have all of the details of your situation but please follow up if you have questions or concerns. I have used this approach in several migration projects including one that is going on right now.
"performance is poor due to the sheer amount of rows in the table"
8 million rows doesn't sound all that crazy. Did you check your query plans?
"the table is as equally read as updated"
Are you actually updating an indexed column, or is it equally read and inserted into?
"(and no, throwing hardware at it isn't going to happen either)"
That's a pity, because RAM is dirt cheap.
Rebuild all your indexes. This will boost query performance. How to do it is covered here, and more on the effect of rebuilding clustered and non-clustered indexes is here.
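For example (using a made-up table name), a rebuild can be done with ALTER INDEX:

-- rebuild every index on the table
ALTER INDEX ALL ON dbo.BigTable REBUILD;

-- or, if a full rebuild is too heavy, reorganize and refresh statistics instead
ALTER INDEX ALL ON dbo.BigTable REORGANIZE;
UPDATE STATISTICS dbo.BigTable;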
Secondly, defragment the drive on which the database is stored.