I'm running a query against an Azure SQL DB...
select Id
from Table1
WHERE ([Table1].[CustomFieldString2] IS NULL) AND
(N'New' = [Table1].[CustomFieldString7]) AND (0 = [Table1].[Deleted])
This query runs fast, roughly 300 ms...
As soon as I add another column (a bool) to the select, as in
Select Id, IsActive
my query is super slow (minutes)
This doesn't make any sense...
Was wondering if anyone knew what this could be
In summary, when you add columns to the select list that are not part of the index, SQL Server can't use the same execution plan.
If SQL Server estimates there are few rows, it will opt for nested loop joins with key lookups in the execution plan. This can take much more time if the estimates are wrong.
If there are more rows, or the key lookup cost crosses some threshold, SQL Server may instead decide that a scan of the table is likely to be more efficient.
If the query performance is not acceptable, try adding IsActive to the index's included-column list.
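A minimal sketch of such a covering index, assuming the existing index keys line up with the WHERE clause (the index name and key order here are hypothetical):
-- Hypothetical covering index: the keys match the WHERE clause and
-- IsActive rides along in the leaf pages, so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_Table1_String7_Deleted
    ON Table1 (CustomFieldString7, Deleted, CustomFieldString2)
    INCLUDE (IsActive);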
Your query structure is important, of course, but Azure can also simply be slow. It runs on shared cloud infrastructure, so it is not especially fast (I assume you are using a free tier). I have not seen anyone pleased with Azure's speed at the low price tiers.
I have the following query:
SELECT * FROM messages GROUP BY peer
(really it's more complicated with joins, but I omitted them here for simplicity)
The problem is that SQLite doesn't use any indexes and always performs a full scan of the table. Expectedly, it works fast on small data sets but it's noticeably slow with a big table containing thousands of rows. Here's the output of the EXPLAIN QUERY PLAN command:
0|0|0|SCAN TABLE messages USING INDEX messages_peer_mid (~1000000 rows)
Although it says "USING INDEX", it still performs a full scan. Is there any way to make SQLite use the index for this query, or is it better to give up on GROUP BY and look for some other approach?
The plan takes the amount of data into account and performs a scan because its algorithm probably concludes that this is faster.
A few other comments: your query has no WHERE condition and you are returning ALL columns, so why wouldn't you expect a table scan?
Indexes assist in selecting records from a table (using a WHERE clause or as a result of a JOIN operation). GROUP BY is performed on a set of records after they've been selected and retrieved from the table, so it usually cannot be assisted by indexes.
If you want to know more about what options are available for index use in your query, please post the entire query.
Also, you note that the SQL you gave is a symbolic representation of the code you're running, but if you're really using *, or any non-aggregated field names other than peer, in your statement, you may not be getting the results you want.
Finally, you ask whether "it's better to give up with GROUP BY and look for some other approach". GROUP BY serves a specific function in SQL: producing aggregated result sets from non-aggregated data. If that's your goal, GROUP BY is likely to be the best solution, because it defers the decision about how to retrieve and process the data to the database engine, which is highly optimized and aware of the database statistics. If that's not your goal, and you're using GROUP BY as an "approach" to some other functionality, let us know what you're actually trying to achieve.
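For example, if the goal is one aggregated row per peer, keeping the select list to the indexed columns gives SQLite a chance to walk the existing messages_peer_mid index in order instead of sorting (a sketch; that mid is the second column of that index is an assumption):
-- Hypothetical rewrite: aggregating only indexed columns lets the
-- scan of messages_peer_mid produce the groups in index order.
SELECT peer, MAX(mid) AS last_mid
FROM messages
GROUP BY peer;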
Below is my query. I use four joins to access data from three different tables. Searching for 1,000 records takes around 5.5 seconds, but when I ramp it up to 100,000 it takes what seems like an infinite amount of time (the last attempt was cancelled at 7 hours).
Does anyone have any idea of what I am doing wrong? Or what could be done to speed up the query?
This query will probably end up having to return millions of records; I've only limited it to 100,000 for the purpose of testing, and it seems to fall over even at this small amount.
For the record, I'm on Oracle 8.
CREATE TABLE co_tenancyind_batch01 AS
SELECT /*+ CHOOSE */ ou_num,
x_addr_relat,
x_mastership_flag,
x_ten_3rd_party_source
FROM s_org_ext,
s_con_addr,
s_per_org_unit,
s_contact
WHERE s_org_ext.row_id = s_con_addr.accnt_id
AND s_org_ext.row_id = s_per_org_unit.ou_id
AND s_per_org_unit.per_id = s_contact.row_id
AND x_addr_relat IS NOT NULL
AND rownum < 100000
Explain plan as a picture: http://imgur.com/Xw9x4BA (easy to read)
Your test based on 100,000 rows is not meaningful if you are then going to run it for many millions. The optimiser knows that, when it has a stopkey, it can satisfy the query faster by using nested loop joins.
When you run it for a very large data set you're likely to need a different plan, most likely with hash joins. Covering indexes might help with that, but we can't tell, because the selected columns carry no table aliases to tell us which table they come from. With large hash joins you're most likely to hit memory problems. Hash partitioning could ameliorate that, but there's no way the Siebel people would go for it, so you'll have to use manual memory management and monitor v$sql_workarea to see how much memory you really need.
(Hate the visual explain plan, by the way).
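As a sketch of what the large-data-set variant might look like with hash joins forced via hints (which table owns each selected column is a guess here, which is exactly the ambiguity just described):
-- Sketch only: the alias assigned to each x_ column is a guess.
SELECT /*+ USE_HASH(org addr unit con) */
       org.ou_num,
       addr.x_addr_relat,
       con.x_mastership_flag,
       con.x_ten_3rd_party_source
FROM   s_org_ext org,
       s_con_addr addr,
       s_per_org_unit unit,
       s_contact con
WHERE  org.row_id  = addr.accnt_id
AND    org.row_id  = unit.ou_id
AND    unit.per_id = con.row_id
AND    addr.x_addr_relat IS NOT NULL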
First of all, can you make sure there is an index on the S_CONTACT table and that it is enabled?
If so, try the select statement with the /*+ CHOOSE */ hint and have another look at the explain plan to see whether the optimizer mode is still RULE. I believe the cost-based optimizer would do better on this query.
If it is still RULE, try updating the database statistics and running it again. You can use the DBMS_STATS package for that purpose; if I am not wrong, it was introduced with version 8i. Are you using 8i?
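For example (a sketch; the schema name SIEBEL is a placeholder):
-- Gather fresh optimizer statistics so the CBO can cost the joins.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SIEBEL', tabname => 'S_CONTACT');
END;
/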
And lastly, I don't know the record counts or the cardinality between the tables. I could have been more helpful if I knew the design.
Your dataset, looking at the last execution plan, appears to be huge. You could limit access to the base table instead of limiting the number of returned rows, like this:
CREATE TABLE co_tenancyind_batch01 AS
SELECT /*+ CHOOSE */ ou_num,
x_addr_relat,
x_mastership_flag,
x_ten_3rd_party_source
FROM s_org_ext,
s_con_addr,
s_per_org_unit,
(select * from s_contact where rownum <= 100000) cont
WHERE s_org_ext.row_id = s_con_addr.accnt_id
AND s_org_ext.row_id = s_per_org_unit.ou_id
AND s_per_org_unit.per_id = cont.row_id
AND x_addr_relat IS NOT NULL
This should improve things, but it won't be extremely quick.
Well, maybe I am too old school, but I would like to understand the following.
query 1.
select count(*), gender from customer
group by gender
query 2.
select count(*), 'M' from customer
where gender ='M'
union
select count(*), 'F' from customer
where gender ='F'
The first query is simpler, but for some reason, when I execute both at the same time, the profiler says that query 2 uses 39% of the time and query 1 uses 61%.
I would like to understand the reason, maybe I have to rewrite all my queries.
Your query 2 is actually a nice trick. It works like this: You have an index on gender. The DBMS can seek into that index two times to get two ranges of rows (one for M and one for F). It doesn't need to read anything from these rows, just that they exist. It can count the number of rows that exist in the two ranges.
In the first query the DBMS needs to decode the rows to read the gender, then it needs to either sort the rows or build a hashtable to aggregate them. That is more expensive than just counting rows.
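The trick relies on a narrow index on gender existing, something like this (a sketch; the index name is hypothetical):
-- A single-column index: each seek lands on one contiguous range
-- ('M' or 'F'), and counting its entries never touches the base rows.
CREATE INDEX ix_customer_gender ON customer (gender);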
Are you sure?
Maybe the second query is just using cached resources from the first one.
Run them in two separate batches, and before each one run DBCC FREEPROCCACHE to clear the cache. Then compare the values in each execution plan.
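Concretely, something like this (the buffer-pool lines are my addition, to also start from a cold data cache; avoid running any of it on a production box):
DBCC FREEPROCCACHE;      -- throw away cached query plans
CHECKPOINT;              -- flush dirty pages so the next line can drop them
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool (my addition)
-- now run query 1, capture its plan, then repeat for query 2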
The optimization of a query depends on the database. What you are seeing is database specific.
The union, as written, would naively require two passes through the data, doing a filter and a count. Basically no other storage is necessary.
The aggregation might sort the data and then do a count. Or, it might generate a hash table. Given the performance difference, I would guess a sort is being used. Clearly, this is overkill for this type of query.
If you have an index on gender, both methods would essentially scan the index, so the performance should be similar (though the union version might scan it twice).
Does the database that you are using offer a way to calculate statistics on tables? If so, you should update the statistics and see if you still get the same results.
Also, can you post the results of "explain" or the execution plan? That would precisely explain why one is faster than the other.
I tried an equivalent query but found the opposite result: the union took 65% and the GROUP BY took 35% (using SQL Server 2008). I do not have an index on gender, so my execution plan shows a clustered index scan. Unless you examine the execution plan in detail, it really isn't possible to explain this result.
Adding an index for this query is probably not a good idea, since you are probably not going to be running this query nearly as often as you are going to insert records in the customer table. In some other database engines with bitmap indexes (Oracle, PostgreSQL), the database engine can combine multiple indexes, so that can alter the utility of single column indexes. But in SQL Server, you need to design the indexes to 'cover' the commonly used queries.
I found that a table has 50 thousand records, and it takes one minute to fetch the data from the SQL Server table just by issuing a plain SQL statement. There is a primary key, which means a clustered index is already there. I just do not understand why it takes one minute. Besides indexes, what other ways are there to optimize a table so the data comes back faster? What do I need to do in this situation to get a faster response? Also, tell me how to always write optimized SQL. Please describe all the steps for optimization in detail.
Thanks.
The fastest way to optimize the indexes in a table is to use the SQL Server Tuning Advisor. Take a look here: http://www.youtube.com/watch?v=gjT8wL92mqE
Select only the columns you need, rather than select *. If your table has some large columns e.g. OLE types or other binary data (maybe used for storing images etc) then you may be transferring vastly more data off disk and over the network than you need.
As others have said, an index is no help to you when you are selecting all rows (no where clause). Using an index would be slower in such cases because of the index read and table lookup for each row, vs full table scan.
If you are running select * from employee (as per question comment) then no amount of indexing will help you. It's an "Every column for every row" query: there is no magic for this.
Adding a WHERE clause usually won't help a SELECT * query either.
What you can check is index and statistics maintenance. Do you do any? Here's a Google search
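If you don't, a minimal maintenance pass looks something like this (a sketch; MyTable is a placeholder):
-- Rebuild the indexes and refresh the statistics; on a real system
-- schedule this off-hours.
ALTER INDEX ALL ON dbo.MyTable REBUILD;
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;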
Or change how you use the data...
Edit:
Why a WHERE clause usually won't help...
If you add a WHERE that is not on the PK:
- you'll still need to scan the table unless you add an index on the searched column
- you'll then need a key/bookmark lookup unless you make the index covering
- with SELECT *, you'd have to add all columns to the index to make it covering
- for many hits, the index will probably be ignored anyway, to avoid key/bookmark lookups
Unless there is a network issue or the like, the problem is reading all columns, not the lack of a WHERE.
If you did SELECT col13 FROM MyTable and had an index on col13, the index would probably be used.
A SELECT * FROM MyTable WHERE DateCol < '20090101' with an index on DateCol that matches 40% of the table would probably ignore the index, or you'd pay for expensive key/bookmark lookups.
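To make the col13 case concrete (a sketch; the names come from the hypothetical examples above):
-- A narrow index like this serves SELECT col13 FROM MyTable well,
-- but SELECT * would still pay a key/bookmark lookup per matching row.
CREATE INDEX IX_MyTable_col13 ON MyTable (col13);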
Irrespective of the merits of returning the whole table to your application, that does sound like an unexpectedly long time to retrieve just 50,000 rows of employee data.
Does your query have an ORDER BY or is it literally just select * from employee?
What is the definition of the employee table? Does it contain any particularly wide columns? Are you storing binary data, such as CVs or employee photos, in it?
How are you issuing the SQL and retrieving the results?
What isolation level are your select statements running at? (You can use SQL Profiler to check this.)
Are you encountering blocking? Does adding NOLOCK to the query speed things up dramatically?
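For the blocking test, a quick check (dirty reads; diagnostic use only):
-- If this returns immediately while the plain SELECT hangs,
-- the problem is blocking rather than raw read speed.
SELECT * FROM employee WITH (NOLOCK);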
What is the Big-O of a SQL select, for a table with n rows from which I want to return m results?
And what is the Big-O of an update, delete, or create operation?
I am talking about MySQL and SQLite in general.
As you don't control which algorithm is selected, there is no way to know directly. However, without indexes a SELECT should be O(n): a table scan has to inspect every record, so it scales with the size of the table.
With an index, a SELECT is probably O(log(n)), although whether that holds for any real table depends on the indexing algorithm and the properties of the data itself. To determine the behaviour for a given table or query, you have to resort to profiling real-world data.
INSERT without indexes should be very quick (close to O(1)), while UPDATE needs to find the records first and so will be (slightly) slower than the SELECT that gets you there.
INSERT with indexes will probably be in the ballpark of O(log(n)^2) when the index tree needs to be rebalanced, and closer to O(log(n)) otherwise. The same slowdown will occur with an UPDATE if it affects indexed rows, on top of the SELECT costs.
Edit: O(log(n^2)) = O(2 log(n)) = O(log(n)). Did you mean O(log(n)^2)?
All bets are off once you throw JOINs into the mix: you will have to profile and use your database's query-estimation tools to get a read on it. Also note that if this query is performance-critical, you should re-profile from time to time, since the algorithms chosen by the query optimizer change as the data load changes.
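For the two engines mentioned in the question, the estimation tools are one-liners (the table and column names here are illustrative):
-- MySQL: reports the access type, the chosen key, and estimated rows.
EXPLAIN SELECT * FROM messages WHERE peer = 42;
-- SQLite: reports whether an index or a full table scan is used.
EXPLAIN QUERY PLAN SELECT * FROM messages WHERE peer = 42;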
Another thing to keep in mind: big-O doesn't tell you about the fixed costs of each transaction. For smaller tables these are probably higher than the actual work costs. As an example, the setup, teardown, and communication costs of a cross-network query for a single row will surely exceed the cost of looking up an indexed record in a small table.
Because of this I found that being able to bundle a group of related queries in one batch can have vastly more impact on performance than any optimization I did to the database proper.
I think the real answer can only be determined on a case by case basis (database engine, table design, indices, etc.).
However, if you are a MS SQL Server user, you can familiarize yourself with the Estimated Execution Plan in Query Analyzer (2000) or Management Studio (2005+). That gives you a lot of information you can use for analysis.
It all depends on how (well) you write your SQL and how well your database is designed for the operation you are performing. Try to use the explain-plan function to see how things will be executed by the DB. Then you can calculate the big-O.