Full-text index population performance on a SQL 2005 indexed view - sql-server-2005

I have created an indexed view:
CREATE VIEW LogValueTexts WITH SCHEMABINDING AS
SELECT ISNULL(LRVS_SLOG_ID*256+LRVS_IDX,0) AS ID, LRVS_VALUE AS Value
FROM dbo.LRVS_LogRecordedValues WHERE LEN(LRVS_VALUE)>4
GO
CREATE UNIQUE CLUSTERED INDEX IX_LogValueTexts ON LogValueTexts (ID)
On SQL 2005 Standard SP3 it takes forever to populate a full-text index on that view because the full-text indexing executes the following query for every row in the view:
SELECT COLUMN FULLTEXTALL FROM [dbo].[LogValueTexts] WHERE COLUMN FULLTEXTKEY = #p1
I assume that COLUMN FULLTEXTALL and COLUMN FULLTEXTKEY are actually Value and ID, but that's what SQL Server Profiler shows. The problem is that the query plan uses a clustered index scan over about 11M rows/1GB of data because it doesn't use the index on the view.
I have tried creating a plan guide for that query, but since it's not a standard T-SQL query it doesn't allow it (Incorrect syntax near the keyword 'FULLTEXTKEY').
Is there a way to get this full-text index to work other than:
upgrading to SQL 2008 (or SQL 2005 Enterprise) where it works fine.
creating a unique ID and a covering index on the underlying table.
Upgrading would require downtime on the server and probably new SQL Server licences, while creating the unique ID and a covering index would waste a lot of space, because only a subset of the 11M rows needs full-text indexing (LRVS_VALUE is often NULL or holds a very short text value).

I don't know your data, but why can't you put the full-text index on the original table? You could add the calculated column to your table structure; that way you wouldn't have the index rebuild operation (I think that is the cause of your scan).
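A rough sketch of that first suggestion, purely for illustration; the column, index, and catalog names below are assumptions, and it presumes LRVS_SLOG_ID*256+LRVS_IDX is non-null and unique for every row:
-- Add a persisted computed key column to the base table (hypothetical name).
ALTER TABLE dbo.LRVS_LogRecordedValues
    ADD LRVS_FT_ID AS ISNULL(LRVS_SLOG_ID*256+LRVS_IDX,0) PERSISTED NOT NULL
GO
-- Full-text indexing needs a single-column, unique, non-nullable key index.
CREATE UNIQUE INDEX UX_LRVS_FT_ID ON dbo.LRVS_LogRecordedValues (LRVS_FT_ID)
GO
-- Full-text index the value column on the table itself (catalog name assumed).
CREATE FULLTEXT CATALOG ftLogValues
GO
CREATE FULLTEXT INDEX ON dbo.LRVS_LogRecordedValues (LRVS_VALUE)
    KEY INDEX UX_LRVS_FT_ID ON ftLogValues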
If you can't do that, the next easiest change is probably to create a lookup table populated by a stored procedure or a trigger; that way you can change that table's indexing so it makes sense for your query.
Your final option (and one you would need to spend some time on to get right) would be partitioned tables. You could have a partition that covers the data filtered by the view, full-text index the entire table, and at run time your query would hit the partition containing the relevant data.

Related

SQL Server Query execution plan

I am trying to learn about optimizing databases and queries. I have a test table Objednavka that has a foreign key attribute ODIS. In this database, queries like
SELECT * FROM Objednavka WHERE ODIS = 123
are frequent, so I created an index like this
CREATE NONCLUSTERED INDEX Objednavka_ODIS_index ON Objednavka (ODIS)
I then looked at the plan of the query I mentioned and this is what I see:
Can someone please explain why I have the Index Seek and Key Lookup operations performed in parallel and then joined using Nested Loops? From what I have learned, I thought the Index Seek should be performed first, so that the engine finds the location of the row containing the (indexed) ODIS attribute, and then it should retrieve the whole row with a Key Lookup once it already knows the location. Or am I wrong?
A non-clustered index has the PK columns added to it automatically by SQL Server. The Index Seek walks the index b-tree for the value(s) you provided for the indexed column(s), and the result is the PK values (or RIDs if your table is a heap). If all the columns you queried are part of the indexed columns (this is called a covering index), your query is done, because SQL Server can get all the information from the b-tree. If you need to return other columns that are not in the index, SQL Server has to fetch the rest of the row using the PK, which is the Key Lookup part.
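For illustration, a sketch of a covering index using the names from the question; the INCLUDEd column names are invented, since the real columns of Objednavka aren't shown:
-- Hypothetical covering index: ODIS is the seek key, and the included columns
-- (invented names) are everything else the query returns, so no Key Lookup is needed.
CREATE NONCLUSTERED INDEX Objednavka_ODIS_covering
    ON Objednavka (ODIS)
    INCLUDE (Cena, Datum)

SELECT ODIS, Cena, Datum FROM Objednavka WHERE ODIS = 123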

Does creating a non-clustered index on a table with 1 million existing rows affect queries immediately?

I have a table with 1 million rows. If I create a non-clustered index on column 'A' and then filter by that column, should I immediately see that the query takes much less time? Or should I create the index on the empty table first, and only then load the data, in order to see the benefit of the index?
I cannot explain why you would or would not feel that a query is taking too much time.
But, once you have added an index -- and the statement completes -- then the index is available for any query that is compiled after that point in time.
As a rule, you can think of creating an index as removing the plan from the query cache. This is effectively what happens, but the actual sequence of events is that the next execution of the query replaces the plan; you can think of this as "delayed removal".
Creating the index when the table is created simply means the index is available to every query that ever runs against the table.
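A minimal sketch of the point above; the table and column names are made up for illustration:
-- Hypothetical table that already holds a million rows.
CREATE TABLE dbo.Orders (OrderID INT PRIMARY KEY, CustomerID INT, OrderDate DATETIME)
-- ... data is loaded here ...

-- Once this statement completes, the index is built over the existing rows.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID)

-- Any query compiled after that point can seek the new index immediately.
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42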

Indexing a table with duplicate records

I have a SQL Server table with around 50,000 rows. The table gets updated once a day by some upstream process.
The following query has been fired from application:
SELECT * FROM Table1 where Field1 = "somevalue"
The "Field1" column contains duplicate values. I am trying to improve performance of the above query. I cannot modify the code in the application side. So limiting column instead of "SELECT *" is not possible. I am planning to index the table. Should I define a NON-CLUSTERED index on "Field1" column in order to improve performance? Or some other kind of indexing would help? Is there any other ways to improve performance from DB side ?
Yes, a non-clustered index on Field1 should serve your purposes...
For example,
CREATE NONCLUSTERED INDEX Idx_Table1_Field1 ON Table1 (Field1)
The best thing you can do is run SP_BlitzIndex by Brent Ozar to get a better picture of your entire database index setup (including this table).
http://www.brentozar.com/blitzindex/
If your table already has a clustered index (which it should - apply one following these principles), you should first look at the execution plan to see what it is advocating.
Further, if the table is only updated every day, and presumably during off hours, you can easily compress the table and given it has repetitive data mostly, you will save over 50% IO and space on the query and incur a small CPU overhead. Table compression has no effect on the data itself, only on the space it holds. This feature is only available in SQL Server Enterprise.
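A sketch of that compression step, assuming a version and edition that support data compression (SQL Server 2008 Enterprise or later) and reusing the Idx_Table1_Field1 name suggested in this thread:
-- Rebuild the table and the suggested index with page compression.
ALTER TABLE Table1 REBUILD WITH (DATA_COMPRESSION = PAGE)
ALTER INDEX Idx_Table1_Field1 ON Table1 REBUILD WITH (DATA_COMPRESSION = PAGE)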
Last but not least, check that your data types are appropriate, i.e. are you pulling from a datetime column that could easily be a date, or from a bigint that could easily be an int?
Asking how to create an index really isn't a proper question for Stack Overflow, i.e.
CREATE NONCLUSTERED INDEX Idx_Table1_Field1 ON Table1 (Field1)
as it is already documented on MSDN and can even be done in SSMS by right-clicking the Indexes node under a given table. The question you should be asking is how to properly address performance improvements in your environment as they relate to indexing. Finally, analyze whether your end result really requires a SELECT *. This is a common oversight in data display: a table with 30 columns is selected into a dataset when the developer only plans to show 5 of them, so populating just those 5 columns would cut the IO roughly six-fold.
Please also note the well-known index maintenance scripts by Ola Hallengren.

How to get list of values stored in index?

I'm having this issue in Oracle 11g R2. A table contains a NOT NULL column that is indexed with a non-unique index, and the index contains no other columns.
I assumed that if I query the distinct values of that column, Oracle would use the index to get the different values (that sounds logical to me). However, at least the explain plan tells me it's doing a full table scan. It also took some time, so the plan probably wasn't changed at run time. An optimizer index hint didn't help.
I tried to search for an answer but had no luck. Is there a way to get the values stored in an index, or to somehow answer the query without "touching" the table at all (the way multi-column index joins can)?
Thanks!
EDIT: This was about the Oracle EBS gl_balances table and the gl_balances_n2 index. I got an answer, and this changed the explain plan:
select /*+ index_ffs(gl gl_balances_n2) */
distinct gl.period_name
from gl_balances gl;
It may not be more efficient to scan the index than to scan the table -- don't forget that the index segment also contains branch nodes, and each index entry has to contain a ROWID of about 16 bytes (if memory serves).
So a "fast full index scan", which is the plan you're looking to get, may not be as fast as a full table scan. (You'd use an index_ffs() hint for that, by the way.)
Edit: it may be possible to use a more exotic method:
Maintaining your own list by periodically querying the table using DBMS_Scheduler.
A materialized view. A complete refresh on demand might be adequate, though barely better than just periodically querying the data and maintaining your own unique list (a minimal sketch follows this list).
Making the index compressed, though that would only be of value for longish index keys.
A bitmap index -- not for a concurrently modified table though.
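A minimal sketch of the materialized view option, using the gl_balances and period_name names from the question; the materialized view name is made up:
-- Hold the distinct period names in a materialized view, refreshed on demand.
CREATE MATERIALIZED VIEW mv_gl_period_names
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    AS SELECT DISTINCT period_name FROM gl_balances;

-- Refresh from a scheduled job or manually (SQL*Plus syntax).
EXEC DBMS_MVIEW.REFRESH('MV_GL_PERIOD_NAMES', 'C');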

How to decrease response time of a simple select query?

The MarketPlan table contains more than 60 million rows.
When I need the total number of planes from a particular date onwards, I execute the following query, which takes more than 7 minutes. How can I reduce this time?
SELECT COUNT(primaryKeyColumn)
FROM MarketPlan
WHERE LaunchDate > #date
I have implemented everything mentioned in your links; I have even added WITH (NOLOCK), which reduced the response time to 5 minutes.
You will have to create an index on the table, or maybe partition the table by date.
You might also want to have a look at
SQL Server 2000/2005 Indexed View Performance Tuning and Optimization Tips
SQL Server Indexed Views
Does the table in question have an index on the LaunchDate column? Also, did you really mean to post LaunchDate>#date?
Assuming SQL-Server based on #date, although the same can be applied to most databases.
If your primary query selects a range of data (as in your sample), adding or altering the CLUSTERED INDEX will go a long way toward improving query times.
See: http://msdn.microsoft.com/en-us/library/ms190639.aspx
By default, SQL-Server creates the Primary Key as the Clustered Index which is great from a transactional point of view, but if your focus is to retrieve the data, then altering that default makes a huge difference.
CREATE CLUSTERED INDEX name ON MarketPlan (LaunchDate DESC)
Note: Assuming LaunchDate is a static date value and is primarily inserted in increasing/sequential order to minimize index fragmentation.
There are some fine suggestions here. If all else fails, consider a little denormalization: create another table with the cumulative counts and update it with a trigger. If you have more queries of this nature, think about OLAP.
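A rough sketch of that denormalization idea; the summary table name is invented, and it assumes LaunchDate holds date-only values and that a trigger (or a nightly job) keeps the counts current:
-- One row per launch date (hypothetical table, maintained by a trigger or nightly job).
CREATE TABLE MarketPlanDailyCounts (LaunchDay DATETIME PRIMARY KEY, PlanCount INT NOT NULL)

INSERT INTO MarketPlanDailyCounts (LaunchDay, PlanCount)
SELECT LaunchDate, COUNT(primaryKeyColumn) FROM MarketPlan GROUP BY LaunchDate

-- The 7-minute COUNT then becomes a sum over a small summary table.
SELECT SUM(PlanCount) FROM MarketPlanDailyCounts WHERE LaunchDay > #date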
Your particular query does not require a clustered key on the date column. It would actually run better with a nonclustered index that has the date column as its leading column: you don't need a key lookup in this query, so the nonclustered index is covering, and it is more compact than the clustered index (it implicitly includes the clustered key columns).
If you have it indexed properly and it still does not perform, the cause is most likely fragmentation. In that case, defragment the index and try again.
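As a sketch of the fragmentation check and rebuild described above, using the xLaunchDate index name from the next answer as a placeholder:
-- Check fragmentation of the indexes on MarketPlan (SQL Server 2005 and later).
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('MarketPlan'), NULL, NULL, 'LIMITED')

-- Rebuild a heavily fragmented index.
ALTER INDEX xLaunchDate ON MarketPlan REBUILD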
Create a new index like this:
CREATE INDEX xLaunchDate on MarketPlan (LaunchDate, primaryKeyColumn)
Check this nice article about how an index can improve the performance.
http://blog.sqlauthority.com/2009/10/08/sql-server-query-optimization-remove-bookmark-lookup-remove-rid-lookup-remove-key-lookup-part-2/
"WHERE LaunchDate > #date"
Is the value of the parameter #date defined in the same batch (or transaction or context)?
If not, this leads to a Clustered Index Scan (of all rows) instead of a Clustered Index Seek (of just the rows satisfying the WHERE condition), because the value comes from outside the current batch (for example, as an input parameter of a stored procedure or UDF).
The query cannot be fully optimized by the SQL Server optimizer at compile time, leading to a full table scan, since the value of the parameter is only known at run time.
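Purely to illustrate the two situations this answer distinguishes (the stored procedure name is made up):
-- Case 1: the value is defined in the same batch as the query.
DECLARE @date DATETIME
SET @date = '20120101'
SELECT COUNT(primaryKeyColumn) FROM MarketPlan WHERE LaunchDate > @date
GO
-- Case 2: the value arrives from outside the batch, e.g. as a stored
-- procedure parameter (procedure name is hypothetical).
CREATE PROCEDURE dbo.CountPlansSince @date DATETIME AS
    SELECT COUNT(primaryKeyColumn) FROM MarketPlan WHERE LaunchDate > @date
GO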
Update: a comment on the answers proposing OLAP.
OLAP is just a concept; SSAS cubes are only one possible way of implementing it.
It is a convenience, not an obligation, when adopting the OLAP concept.
You do not have to use SSAS to apply OLAP.
See, for example, Simulated OLAP.
Update 2: a comment on the question "MDX performance vs. T-SQL" raised in the comments to this answer:
MDX is an option/convenience/feature provided by SSAS (cubes/OLAP), not an obligation.
The simplest thing you can do is:
SELECT COUNT(LaunchDate)
FROM MarketPlan
WHERE LaunchDate > #date
This will guarantee you index-only retrieval for any LaunchDate index.
Also (this depends on your execution plan), I have seen instances (but not specific to SQL Server) in which > did a table scan and BETWEEN used an index. If you know the top date you might try WHERE LaunchDate BETWEEN #date AND <<Literal Date>>.
How wide is the table? If the table is wide (i.e. many columns of (n)char, (n)varchar or xml), there might be a significant amount of IO causing the query to run slowly as a result of using the clustered index.
To determine if IO is causing the long query time perform the following:
Create a non-clustered index only on the LaunchDate column.
Run the query below which counts LaunchDate and forces the use of the new index.
SELECT COUNT(LaunchDate)
FROM MarketPlan WITH (INDEX = TheNewIndexName)
WHERE LaunchDate > #date
I do not like to use index hints, and I suggest this hint only to prove whether IO is causing the long query times.
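For completeness, a sketch of step 1; the index name is an assumption and would take the place of TheNewIndexName in the query above:
CREATE NONCLUSTERED INDEX IX_MarketPlan_LaunchDate ON MarketPlan (LaunchDate)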
There are two ways to do this.
First, create a clustered index on the date column. Since the query is date-range specific, all the data will be stored in date order, which avoids scanning every record in the table.
Second, you can try horizontal partitioning. This will affect your existing table design, but it is the most effective approach; see this tutorial and the sketch below:
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
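A minimal sketch of partitioning MarketPlan by LaunchDate, assuming an edition that supports table partitioning (Enterprise at that time); the boundary dates and all object names are made up, and it assumes no conflicting clustered index already exists:
-- Partition function and scheme (boundary dates invented for illustration).
CREATE PARTITION FUNCTION pfLaunchDate (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20100101', '20110101', '20120101')

CREATE PARTITION SCHEME psLaunchDate
    AS PARTITION pfLaunchDate ALL TO ([PRIMARY])

-- Building the clustered index on the partition scheme stores the table in date ranges.
CREATE CLUSTERED INDEX IX_MarketPlan_LaunchDate_Part
    ON MarketPlan (LaunchDate)
    ON psLaunchDate (LaunchDate)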