Improve SQl query by indexing or aggregated index - sql

Assume this general SQL query takes too long to execute. Which of the 4 options below can be done to improve its performance? Why? This is just for the understanding of SQL, not a dbms specific.
CREATE INDEX IDX ON Vaccinations (COUNT(*), Site)
Wrap the query in a view and ask users to query from the view.
CREATE AGGREGATE INDEX ON Vaccinations(COUNT(*))
CREATE INDEX IDX ON Vaccinations (Site)
SELECT Site, COUNT(*) AS Number of Vaccinations
FROM Vaccinations
GROUP BY Site;
A few rows from the Vaccinations table is shown here:
Personally, I'm leaning towards 1. Since the count(*) is the only thing being used in the SELECT...FROM as a condition. However, I'm not entirely sure what the "aggregate index" option entails. Can someone explain when it's preferred over normal index? Thanks!

Related

Indexed views. Query ignoring view and uses table instead

My task is to optimize this query:
Declare #sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.ACCOUNT_DETAILS)
select #sumBalance
I've read that the best solution for aggregation functions is using indexed views instead of tables.
I've created the view with SCHEMABINDING:
CREATE VIEW dbo.CURRENT_BALANCE_VIEW
WITH SCHEMABINDING
AS
SELECT id,CURRENT_BALANCE
FROM dbo.ACCOUNT_DETAILS
After that I've created 2 indexes:
The first for ID
CREATE UNIQUE CLUSTERED INDEX index_ID_VIEW ON dbo.View(ID);
The second for CURRENT_BALANCE my second one column
CREATE NONCLUSTERED INDEX index_CURRENT_BALANCE_VIEW
ON dbo.CURRENT_BALANCE_VIEW(ID);
And here I got troubles with new query:
Declare #sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.CURRENT_BALANCE_VIEW)
select #sumBalance
New query doesn't use view
http://i.stack.imgur.com/jlPEd.png
Somehow my indexes were added to the folder Statistics
Look in another post
I don't understand why I can see index 'index_current_balance' cause there is no such an index in the table
Look in another post
P.S. Already tried create index in the table and it helped. It made query works faster from 0.2 Es.operator cost to 0.009 but anyway it must be faster.
p.s.s Sorry for making you click on the link, my reputation doesn't allow me to past images properly =\
p.s.s.s Working with SQL Server 2014
p.s.s.s.s Just realized that I don't need to sum 0-s. Expected them grom function.
Thanks in advance.
if you use Standard Edition of SQL-Server you have to use the NOEXPAND-Hint in order to use the index of a view.
For example:
SELECT *
FROM dbo.CURRENT_BALANCE_VIEW (NOEXPAND);
This query:
Declare #sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.ACCOUNT_DETAILS);
select #sumBalance;
is not easy to optimize. The only index that will help it is:
create index idx_account_details_current_balance on account_details(current_balance);
This is a covering index for the query, and can be used for the SUM(). However, the index still needs to be scanned to do the SUM(). Scanning the index should be faster than scanning the table because it is likely to be much smaller.
SQL Server 2012+ has a facility called columnstore indexes that would have the same effect.
The advice for using indexed views for aggregation functions doesn't seem like good advice. For instance, if the above query used MIN() or MAX(), then the above index should be the optimal index for the query, and it should run quite fast.
EDIT:
Your reference article is quite reasonable. If you want to create an indexed view for this purpose, then create it with aggregation.
CREATE VIEW dbo.CURRENT_BALANCE_VIEW
WITH SCHEMABINDING
AS
SELECT SUM(CURRENT_BALANCE) as bal, COUNT_BIG(CURRENT_BALANCE) as cnt
FROM dbo.ACCOUNT_DETAILS;
This is a little weird, because it returns one row. I think the following will work:
create index idx_account_details on current_balance_view(bal);
If not, you may need to introduce a dummy column for the index.
Then:
select *
from dbo.current_balance_view;
should have the precomputed value.

How to create database INDEX for SQL expression?

I am beginner with indexes. I want to create index for this SQL expression which takes too much time to execute so I would like on what exact columns should I create index?
I am using DB2 db but never mind I think that question is very general.
My SQL expression is:
select * from incident where (relatedtoglobal=1)
and globalticketid in (select ticketid from INCIDENT where status='RESOLVED')
and statusdate <='2012-10-09 12:12:12'
Should I create index with this 5 columns or how?
Thanks
Your query:
select *
from incident
where relatedtoglobal = 1
and globalticketid in ( select ticketid
from INCIDENT
where status='RESOLVED'
)
and statusdate <='2012-10-09 12:12:12' ;
And the subquery inside:
select ticketid
from INCIDENT
where status='RESOLVED'
An index on (status, ticketid) will certainly help efficiency of the subquery evaluation and thus of the query.
For the query, besides the previous index, you'll need one more index. The (relatedtoglobal, globalticketid) may be sufficient.
I'm not sure if a more complex indexing would/could be used by the DB2 engine.
Like one on (relatedtoglobal, globalticketid) INCLUDE (statusdate) or
Two indices, one on (relatedtoglobal, globalticketid) and one on (relatedtoglobal, statusdate)
The DB2 documentation is not an easy read but has many details. Start with CREATE INDEX statement and Implementing Indexes.

sql count performance [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Count(*) vs Count(1)
If I have a table, 'id' is primary key, then these two commands have different performance ?
select count(*) from t;
select count(id) from t;
thanks
These would have the same performance. In most databases, count() results in a scan of the table or available indexes. Whether or not it uses the index instead of the table depends only on the query optimizer. If the optimizer is smart enough to use the index, it should be smart enough in both cases.
Using available metadata tables, you can often get the number of rows in a table much mroe efficiently than by using a count() query.
No, Oracle takes over and takes the fastest way in case of count(*)
I think if id is primary Key both count(*) and count(id) are semantically equivalent.
But for readers count(id) means the intention to count all rows where id is not null. To avoid confusions I would rather use count(*).

Selecting from a Large Table SQL 2005

I have a SQL table it has more than 1000000 rows, and I need to select with the query as you can see below:
SELECT DISTINCT TOP (200) COUNT(1) AS COUNT, KEYWORD
FROM QUERIES WITH(NOLOCK)
WHERE KEYWORD LIKE '%Something%'
GROUP BY KEYWORD ORDER BY 'COUNT' DESC
Could you please tell me how can I optimize it to speed up the execution process? Thank you for useful answers.
I'd first look at the execution plan to see how sql server is trying to access your data. Here is a link to just one of many articles on execution plan analysis.
Asking a question about SQL Server performance without providing a schema is a complete waste of everybody time. I'm going to answer a different question, which is one you should had been ask in the first place:
What schema should I use to
efficiently satisfy a query like
SELECT DISTINCT TOP (200) COUNT(1) AS
COUNT, KEYWORD FROM QUERIES WHERE
KEYWORD LIKE '%Something%'GROUP BY
KEYWORD ORDER BY 'COUNT' DESC when QUERIES table has over 1M rows?
The proper schema depend on the selectivity of KEYWORD. One possible design would be to normalize KEYWORD into a lookup table and have a narrow non-clustered index on the lookup id:
CREATE TABLE KEYWORDS (KeywordId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
Keyword VARCHAR(...) UNIQUE);
CREATE TABLE QUERIES (...,
KeywordId INT NOT NULL,
CONSTRAINT FK_KEYWORD
FOREIGN KEY KeywordId
REFERENCES KEYWORDS (KeywordId),
...);
CREATE INDEX ndxQueriesKeyword ON Queries (KeywordId);
If the number of distinct keyword is relatively low, the original query can be satisfied quickly by a scan of the Keywqord table followed by a nexted loop range scan of the ndxQueriesKeyword index, which is very narrow and therefore generates low IO.
As the number of distinct keyword increases, this approach may start showing problems due to the high number of range scans on the Queries table, and possible even due to the full scan on the Keywords table.
You may consider using a different WHERE clause, namely one LIKE 'Something%, which is SARGable and can leverage an index on KEYWORK, benefiting from a range reduction and a narrower scan than a full table scan.
If you are on Enterprise Edition you can consider adding an indexes view with the pre-computed aggregates:
CREATE VIEW vwQueryKeywords
WITH SCHEMABINDING
AS SELECT KEYWORD, COUNT_BIG(*) as COUNT
FROM dbo.QUERIES
GROUP BY KEYWORD;
CREATE CLUSTERED INDEX cdxQueryKeywords ON vwQueryKeywords(KEYWORD);
On EE the optimizer will consider the indexed view for the original query. On non-EE you will have to change the query to run against the view with the NOEXPAND hint:
SELECT KEYWORD, COUNT
FROM vwQueryKeywords WITH(NOEXPAND)
WHERE KEYWORD LIKE '%Something%';
Another completely different approach is to ditch the LIKE '%Something%' condition altogether in favor of fullt-text search:
SELECT DISTINCT TOP (200) COUNT(1) AS
COUNT, KEYWORD FROM QUERIES WHERE
CONTAINS (Keyword, Something)
GROUP BY
KEYWORD ORDER BY 'COUNT' DESC
Because the FT search is a reverse-index lookup, it may prove optimal over a traditional WHERE. The only issue is that you'll only be able to search for full words, since FT won't let you search partial matches the way LIKE does. Again, the actual mileage will vary based on Keyword data profile (ie. its statistics and distribution).
As Jeremy stated, you need to look at the execution plan and client statistics to see what is faster. However, a couple of suggestions. First, do you really need a prefixing wildcard on your search? I.e., LIKE '%Something%' will not be able to use an index whereas LIKE 'Something%' will. Second, you might try a CTE to see if will be faster. So, something like:
;With NumberedItems As
(
Select Keyword, Count(*) As [Count]
, ROW_NUMBER() OVER ( ORDER BY Keyword, Count(*) DESC ) As ItemRank
From Queries WITH (NOLOCK)
Where Keyword LIKE '%Something%'
Group By Keyword
)
Select Keyword, [Count]
From NumberedItems
Where ItemRank <= 200
It's rather hard to guess what may be causing the performance issues with just a query and no schema or execution plan. You should definitely read-up on them as all performance tuning of SQL queries is ultimately driven by the execution plan.
If you really want to delve into it, you can also read up on the query optimizer which attempts to execute your query using the most optimal plan. Understanding the optimizer is important to ensure you are taking full advantage of the indexes, etc. you have on the database. Microsoft also has several helpful documents such as this on troubleshooting performance issue.
For your particular case, the bottleneck is most likely in the WHERE clause. LIKE comparisons tend to be inefficient, especially when surrounded by percent signs as the query tends to be unable to take advantage of indexes on the column, etc. Depending on how you've stored data, full-text indexing may be a useful option, as that can frequently outperform LIKE '%SOMEVALUE%'.
If you can't use a full-text search engine from a third party, create an inverted index from your text periodically and search that instead. A naive implementation would beat your current strategy.
http://en.wikipedia.org/wiki/Inverted_index
Your query is not optimizable (without implementing some form of full-text indexing, itself expensive) because you have a leading wildcard in your keyword match. You would need to split the keywords out into separate column values (probably in a separate, related table) and search on an exact match or, at least, a match with the wildcard not at the beginning of the text.
Additionally the results you're getting may not be accurate if you have some keywords that are nested in others (eg "cart" will match a keyword search on "car", which is not what you want).

In SQL, what’s the difference between count(*) and count('x')? [duplicate]

This question already has answers here:
In SQL, what's the difference between count(column) and count(*)?
(12 answers)
Closed 9 years ago.
I have the following code:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?
(This question is related to a previous one)
To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago but any self-respecting query optimizer will choose some fast method to count the rows in the table - there is NO performance difference between SELECT COUNT(*), COUNT(1), COUNT('this is a silly conversation')
Moreover, SELECT(1) vs SELECT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize SELECT( n ) into SELECT(*) anyway. See the ASK TOM: Oracle has been optimizing SELECT(n) into SELECT(*) for the better part of a decade, if not longer:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789
problem is in count(col) to count()
conversion
**03/23/00 05:46 pm *** one workaround is to set event 10122 to
turn off count(col) ->count()
optimization. Another work around is
to change the count(col) to count(),
it means the same, when the col has a
NOT NULL constraint. The bug number is
1215372.
One thing to note - if you are using COUNT(col) (don't!) and col is marked NULL, then it will actually have to count the number of occurrences in the table (either via index scan, histogram, etc. if they exist, or a full table scan otherwise).
Bottom line: if what you want is the count of rows in a table, use COUNT(*)
The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.
i.e. in the simple case below, the query will return immediately, without needing to examine any rows.
select count(*) from table
I'm not sure if the query optimizer in SQL Server will do so, but in the example above, if the column you are grouping on has an index the server should be able to satisfy the query without hitting the actual table at all.
To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.
This question is slightly different that the other referenced. In the referenced question, it was asked what the difference was when using count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.
To address this question, essentially there is no difference in the result. Both count(*) and count('x') and say count(1) will return the same number. The difference is that when using " * " just like in a SELECT all columns are returned, then counted. When a constant is used (e.g. 'x' or 1) then a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.
Update: The above statement about performance is probably not quite right as discussed in other answers, but does apply to subselect queries when using EXISTS and NOT EXISTS
MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count
I'm guessing with a having clause with a count in it may change things.