Can anyone tell me what the disadvantages of using OPTION (FAST n) in SQL queries are?
For example, I can grab 100,000 records very quickly, but does this affect other processes on SQL Server?
To get a bit closer to my actual issue:
I have to run a data process every week. The first result comes out after 5-7 seconds, and then I run my data process on these results. The result normally consists of a few thousand rows, and each row takes a few seconds to process. Currently the process waits for the whole result to be available before it starts. The result comes back in a DataSet (I am using a C# console app). I want the top 10 results to come out quickly so that I can start processing immediately, while the rest of the rows arrive, get added to the queue, and wait for their turn.
Any idea how I can do this?
Thanks
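The arrangement described in the question (start work on the first rows while the rest are still arriving) is a producer/consumer pattern. Below is a minimal Python sketch of it, with SQLite standing in for SQL Server. It is not a translation of the poster's C# code, and the items table, the fetch size of 10 and the upper-casing "process" step are all hypothetical. The producer reads the result a few rows at a time with fetchmany instead of materializing it first, and a worker thread starts on the first row as soon as it is queued.

```python
import queue
import sqlite3
import threading

DONE = object()  # sentinel: the result set is exhausted


def produce(conn, sql, q, fetch_size=10):
    """Fetch the result a few rows at a time and queue each row immediately."""
    cur = conn.execute(sql)
    while rows := cur.fetchmany(fetch_size):
        for row in rows:
            q.put(row)
    q.put(DONE)


def consume(q, process, out):
    """Process rows in arrival order; begins as soon as the first row is queued."""
    while (row := q.get()) is not DONE:
        out.append(process(row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [(f"row {i}",) for i in range(1000)])

q = queue.Queue()
processed = []
# The connection is only used by the main (producer) thread; the worker never touches it.
worker = threading.Thread(target=consume, args=(q, lambda r: r[1].upper(), processed))
worker.start()
produce(conn, "SELECT id, payload FROM items ORDER BY id", q)
worker.join()
print(len(processed), processed[0])  # 1000 ROW 0
```

Whether the server actually sends the first rows early still depends on the query plan, which is where a hint such as OPTION (FAST n) comes in.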
OPTION (FAST n) forces the query optimizer to optimize not for the total runtime of the query but for the time it takes to fetch the first n rows.
If you have two tables of 1 million rows each that you want to join, a standard query plan builds a hash table from one table (a temporary structure holding a million rows) and then probes it with each row of the other table.
A FAST 10 optimization would probably just use nested loops, because building that 1-million-row hash table takes considerably more effort than the first 10 iterations of a nested loop. If you are after all 1 million rows, the nested loop could take three times longer, but under FAST 10 you'll get the first 10 rows sooner. (This example assumes a suitable index exists.)
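To make the trade-off concrete, here is a toy Python model, not SQL Server's real cost model, that counts abstract "work units" spent before the first 10 rows and before all rows. The unit costs, including SEEK_COST = 3, are arbitrary assumptions, and index_on_a plays the part of the suitable index the answer assumes.

```python
from functools import partial
from itertools import islice

SEEK_COST = 3  # assumption: one index seek costs ~3x one hash-table operation


def hash_join(build, probe, counter):
    """Hash join: must hash the entire build input before it can emit any row."""
    table = {}
    for key, val in build:
        counter["work"] += 1
        table.setdefault(key, []).append(val)
    for key, val in probe:
        counter["work"] += 1
        for match in table.get(key, []):
            yield key, match, val


def nested_loop_join(outer, index, counter):
    """Index nested-loop join: one seek per outer row, so matches stream out immediately."""
    for key, val in outer:
        counter["work"] += SEEK_COST
        for match in index.get(key, []):
            yield key, match, val


def work_for(join, n=None):
    """Pull the first n rows (all rows if n is None); return (rows fetched, work spent)."""
    counter = {"work": 0}
    rows = list(islice(join(counter), n))
    return len(rows), counter["work"]


N = 100_000
a = [(i, f"a{i}") for i in range(N)]
b = [(i, f"b{i}") for i in range(N)]
index_on_a = {key: [val] for key, val in a}  # the pre-existing "suitable index"

hj = partial(hash_join, a, b)
nl = partial(nested_loop_join, b, index_on_a)
print("first 10:", work_for(hj, 10), work_for(nl, 10))  # (10, 100010) (10, 30)
print("all rows:", work_for(hj), work_for(nl))          # (100000, 200000) (100000, 300000)
```

In this model the nested loop delivers its first rows after a handful of seeks while the hash join pays for the whole build first, and for the complete result the ranking flips. Optimizing for the first n rows means accepting the second outcome to get the first.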
I have a table with 3 columns:
time (timestamptz)
price (numeric(8,2))
set_id (int)
The table contains 7.4M records.
I've created a simple index on time and an index on set_id.
I want to run the following query:
select * from test_prices where time BETWEEN '2015-06-05 00:00:00+00' and '2020-06-05 00:00:00+00';
Despite my indexes, the query takes 2 minutes and 30 seconds.
See the EXPLAIN ANALYZE stats: https://explain.depesz.com/s/ZwSH
The GCP Postgres DB has the following stats:
What am I missing here? Why is this query so slow, and how can I improve it?
According to your explain plan, the query is returning 1.6 million rows out of 4.5 million. That means a significant portion of the rows is being returned.
Postgres wisely decides that a full table scan is more efficient than using an index, because there is a good chance that all the data pages will need to be read anyway.
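A rough back-of-envelope check of that reasoning, as a Python sketch. It assumes matching rows are spread randomly across heap pages and that about 100 rows fit in an 8 kB page (both assumptions, not measurements). The default cost arguments mirror PostgreSQL's default seq_page_cost (1.0) and random_page_cost (4.0); the "index" estimate is deliberately pessimistic (one random fetch per match) and ignores caching, bitmap scans and CPU costs.

```python
def chance_page_has_match(selectivity, rows_per_page):
    """P(a heap page holds at least one matching row), if matches are spread at random."""
    return 1 - (1 - selectivity) ** rows_per_page


def rough_page_costs(total_rows, matching_rows, rows_per_page,
                     seq_page_cost=1.0, random_page_cost=4.0):
    """Very rough I/O cost: full sequential scan vs. one random heap fetch per match."""
    seq_scan = (total_rows / rows_per_page) * seq_page_cost
    index_scan = matching_rows * random_page_cost  # pessimistic upper bound
    return seq_scan, index_scan


ROWS_PER_PAGE = 100  # assumption: a narrow (time, price, set_id) row in an 8 kB page
selectivity = 1.6e6 / 4.5e6

print(f"{chance_page_has_match(selectivity, ROWS_PER_PAGE):.6f}")  # 1.000000
print(rough_page_costs(4.5e6, 1.6e6, ROWS_PER_PAGE))             # (45000.0, 6400000.0)
```

With roughly a third of the rows matching, essentially every page holds at least one match, so an index cannot avoid reading the pages; it only adds random I/O on top.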
It is surprising that you are reporting 00:02:30 for the query. The explain output says that the query completes in about 1.4 seconds, which seems reasonable.
I suspect that the elapsed time is caused by the volume of data being returned (perhaps the rows are very wide), a slow network connection to the database, or contention on the database/server.
Your query selects two thirds of the table. A sequential scan is the most efficient way to process such a query.
Your query executes in under 2 seconds. It must be your client that takes a long time to render the query result (pgAdmin is infamous for that). Use a different client.
I am troubleshooting a slow query. It runs in less than 100 ms 99% of the time, but about once an hour (or two; there's no pattern, I guess) it goes bad, does 6 million reads, and takes 11 seconds! I looked at the query plan, and it does a clustered index scan. I noticed that the use counts column in the cached_plans dynamic management view keeps increasing every time the query executes, so I think it's the same plan; I'm just wondering why at some point it goes out of whack. Any pointers will be helpful. I haven't tried anything yet, as it runs pretty fast most of the time.
First, something could easily be blocking the query and making it run slowly. Or there could be other things happening on the server at the same time that are consuming most of its resources.
Next, the parameters of the query might be bad for the saved execution plan.
Or the statistics might be out of date.
Or, if the query is an action query as opposed to a select, the particular parameters may be causing a problem in a trigger that makes it take longer.
Or the query might be returning significantly more results at times. If you run it at 10:00 and return 10 results, and then an import puts more records that meet the query conditions into the table, at 10:30 you might return a million results, which would clearly be slower.
One thing I like to do in such circumstances is set up logging so that the exact query is logged along with the time of execution. Then, if you have variables that might differ from run to run, you can see what the query that ran slower actually was.
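A minimal client-side sketch of that kind of logging in Python, with SQLite standing in for SQL Server. The orders table, the timed_query helper and the slow_ms threshold are hypothetical; the point is only that each run records its start time, duration, row count, SQL text and parameter values together.

```python
import logging
import sqlite3
import time
from datetime import datetime, timezone

log = logging.getLogger("query_log")


def timed_query(conn, sql, params=(), slow_ms=1000.0):
    """Run sql, fetch every row, and log start time, duration, row count, SQL and params."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - t0) * 1000
    # Runs at or above the (hypothetical) slow_ms threshold are logged as warnings.
    level = logging.WARNING if elapsed_ms >= slow_ms else logging.INFO
    log.log(level, "%s | %.1f ms | %d rows | %s | params=%r",
            started.isoformat(), elapsed_ms, len(rows), sql, params)
    return rows


logging.basicConfig(level=logging.INFO)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(i % 7,) for i in range(100)])
rows = timed_query(conn, "SELECT id FROM orders WHERE customer_id = ? ORDER BY id", (3,))
```

Because the parameter values and row counts sit next to the timings, a slow run can be lined up against the causes listed above, such as unusual parameters for the cached plan or a sudden jump in the number of rows returned.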
I have a single large denormalized table that mirrors the layout of a fixed-length flat file that is loaded yearly: 112 columns and 400,000 records. I have a unique clustered index on the 3 columns that make up the WHERE clause of the query that is run most often against this table. Index fragmentation is 0.01. Performance of that query is good (sub-second). However, returning all the records takes almost 2 minutes. The execution plan shows 100% of the cost is on a Clustered Index Scan (not a seek).
There are no queries that require a join (due to the denormalization). The table is used for reporting. All fields are of type nvarchar (with the length of the field in the data file).
Beyond normalizing the table, what else can I do to improve performance?
Try paginating the query. You can split the results into, let's say, groups of 100 rows. That way, your users will see the results pretty quickly. Also, if they don't need to see all the data every time they view the results, it will greatly cut down the amount of data retrieved.
Beyond this, adding parameters to the query that filter the data will reduce the amount of data returned.
This post is a good way to get started with pagination: SQL Pagination Query with order by
Just replace the "50" and "100" in the answer to use page variables and you're good to go.
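A minimal sketch of page-at-a-time retrieval in Python, with SQLite standing in for SQL Server. SQLite's LIMIT/OFFSET is used here for brevity, so the SQL Server syntax in the linked answer will look different; the report table and fetch_page helper are hypothetical. The ORDER BY matters: without it, page boundaries are not guaranteed to be stable between requests.

```python
import sqlite3


def fetch_page(conn, page, page_size=100):
    """Return one 0-based page of rows; ORDER BY keeps page boundaries stable."""
    return conn.execute(
        "SELECT id, name FROM report ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO report (name) VALUES (?)",
                 [(f"record {i}",) for i in range(250)])

for page in range(3):
    rows = fetch_page(conn, page)
    print(page, len(rows), rows[0][0], rows[-1][0])  # first and last id on the page
```

Here the page number and page size play the role of the "page variables" mentioned above.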
Here are three ideas. First, if you don't need nvarchar, switch these to varchar. That will halve the storage requirement and should make things go faster.
Second, be sure that the lengths of the fields are less than nvarchar(4000)/varchar(8000). Anything larger causes the values to be stored on a separate page, increasing retrieval time.
Third, you don't say how you are retrieving the data. If you are bringing it back into another tool, such as Excel, or through ODBC, there may be other performance bottlenecks.
In the end, though, you are retrieving a large amount of data, so you should expect the time to be much longer than for retrieving just a handful of rows.
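The "halve the storage" point from the first idea can be checked with a short Python sketch: nvarchar stores characters as UTF-16 (2 bytes each for common characters), while varchar with a single-byte code page uses 1 byte each. The field values and the choice of cp1252 as the code page are hypothetical, and per-row and per-column overhead is ignored.

```python
def stored_bytes(value, nvarchar):
    """Data bytes for one value: UTF-16 for nvarchar, a single-byte code page for varchar."""
    return len(value.encode("utf-16-le" if nvarchar else "cp1252"))


fields = ["SMITH", "JOHN", "123 MAIN ST", "SPRINGFIELD"]  # hypothetical flat-file values
nvarchar_bytes = sum(stored_bytes(f, nvarchar=True) for f in fields)
varchar_bytes = sum(stored_bytes(f, nvarchar=False) for f in fields)
print(nvarchar_bytes, varchar_bytes)  # 62 31
```

The saving only applies if the data really fits the single-byte code page; characters outside it cannot be stored in varchar without loss.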
When you ask for all rows, you'll always get a scan.
400,000 rows X 112 columns X 17 bytes per column is 761,600,000 bytes. (I pulled 17 out of thin air.) Taking two minutes to move 3/4 of a gig across the network isn't bad. That's roughly the throughput of my server's scheduled backup to disk.
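The same arithmetic as a small Python sketch, extended to the throughput it implies. The 17 bytes per column is the answer's own admitted guess, and treating "almost 2 minutes" as 120 seconds is a further assumption.

```python
ROWS = 400_000
COLUMNS = 112
BYTES_PER_COLUMN = 17  # the answer's guess, not a measurement
SECONDS = 120          # "almost 2 minutes", rounded to 2 minutes

total_bytes = ROWS * COLUMNS * BYTES_PER_COLUMN
bytes_per_sec = total_bytes / SECONDS
mbit_per_sec = bytes_per_sec * 8 / 1_000_000

print(f"{total_bytes:,} bytes")      # 761,600,000 bytes
print(f"{mbit_per_sec:.1f} Mbit/s")  # 50.8 Mbit/s
```

Plugging in a measured average row size instead of the guess gives a quick sense of whether the network, rather than the query, is the bottleneck.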
Do you have money for a faster network?
I have a very big table containing around 20 million rows. I have to fetch about 4 million rows from this table based on some filtering criteria.
All the columns in the filtering criteria are covered by some index, and the table statistics are up to date.
It has been suggested to me that instead of loading all rows in one go, I should use a batch size of, say, 80,000 rows at a time, and that this will be faster than loading all the rows at once.
Does this idea make sense?
If it does, what would be the optimal number of rows to load at a time?
It can be much faster than a single SQL statement.
Split the data using the primary key.
Batch size: it depends on the row length and the processing time. Start with 10,000.
Run the batches in separate threads if possible.
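Splitting by primary key is often done as keyset batching: each batch asks for the next N matching rows with a key greater than the last one seen. A minimal Python sketch follows, with SQLite standing in for the real server; the big_table table, its amount column and the filter are hypothetical, the default batch size follows the answer's "start with 10,000", and the threading step is left out. Whether batching beats a single statement in practice is something to measure.

```python
import sqlite3


def batches_by_pk(conn, batch_size=10_000, min_amount=0):
    """Yield filtered rows in primary-key order, one batch at a time (keyset batching)."""
    last_id = 0  # assumes positive integer primary keys
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM big_table"
            " WHERE id > ? AND amount >= ?"
            " ORDER BY id LIMIT ?",
            (last_id, min_amount, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the next batch starts after the last key seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO big_table (amount) VALUES (?)",
                 [(i % 5,) for i in range(25_000)])

sizes = [len(batch) for batch in batches_by_pk(conn, batch_size=4_000, min_amount=3)]
print(sizes)  # [4000, 4000, 2000]
```

Unlike OFFSET-based paging, each batch seeks straight to its starting key, so later batches do not get slower as the offset grows.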
Use SSIS to manipulate your data; it does everything you want, such as threading and optimizations for load sizing and caching.
Spin up a cube, or look into business intelligence / data warehouse tools.
I saw something from an "execution plan" article:
10 rows fetched in 0.0003s (0.7344s)
(the link: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ )
Why are two durations shown? And what if I don't have a large data set yet? For example, if I have only 20, 50, or even just 100 records, I can't really measure how two different SQL statements compare in speed in a real-life situation, can I? In other words, do there need to be at least hundreds of thousands of records, or even a million, to accurately compare the performance of those two SQL statements?
For your first question:
X row(s) fetched in Y s (Z s)
X = number of rows (of course);
Y = time it took the MySQL server to execute the query (parse, retrieve, send);
Z = time the resultset spent in transit from the server to the client;
(Source: http://forums.mysql.com/read.php?108,51989,210628#msg-210628)
For the second question, you will never ever know how the query performs unless you test with a realistic number of records. Here is a good example of how to benchmark correctly: http://www.mysqlperformanceblog.com/2010/04/21/mysql-5-5-4-in-tpcc-like-workload/
That blog in general, as well as the book "High Performance MySQL", is a goldmine.
The best way to test and compare the performance of operations is often (if not always!) to work with a realistic set of data.
If you plan on having millions of rows when your application is in production, then you should test with millions of rows right now, not just a dozen!
A couple of tips:
While benchmarking, use select SQL_NO_CACHE ... instead of select ...
This will prevent MySQL from using its query cache (which would make the first execution take a normal amount of time and repeated executions much faster).
Learn how to use EXPLAIN and understand its output.
Read the "Chapter 7. Optimization" section of the manual ;-)
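A sketch of what "test at a realistic size" can look like, in Python with an in-memory SQLite database standing in for MySQL, so the timings say nothing about MySQL itself. It compares the three anti-join forms from the linked article at a small and a larger size, and also checks that all three return the same count. SQL_NO_CACHE is MySQL-specific; SQLite has no result cache, so the sketch simply repeats each query and keeps the best time. Table names and sizes are hypothetical.

```python
import sqlite3
import timeit

QUERIES = {
    "NOT IN": "SELECT COUNT(*) FROM a WHERE val NOT IN (SELECT val FROM b)",
    "NOT EXISTS": "SELECT COUNT(*) FROM a WHERE NOT EXISTS "
                  "(SELECT 1 FROM b WHERE b.val = a.val)",
    "LEFT JOIN / IS NULL": "SELECT COUNT(*) FROM a LEFT JOIN b ON b.val = a.val "
                           "WHERE b.val IS NULL",
}


def make_db(n):
    """Table a holds 0..n-1; table b holds the even numbers (NOT NULL, indexed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (val INTEGER NOT NULL)")
    conn.execute("CREATE TABLE b (val INTEGER NOT NULL)")
    conn.executemany("INSERT INTO a VALUES (?)", ((i,) for i in range(n)))
    conn.executemany("INSERT INTO b VALUES (?)", ((i,) for i in range(0, n, 2)))
    conn.execute("CREATE INDEX b_val ON b (val)")
    return conn


def benchmark(n, repeat=3):
    """Return {query name: (result, best time in seconds)} for a table of n rows."""
    conn = make_db(n)
    results = {}
    for name, sql in QUERIES.items():
        answer = conn.execute(sql).fetchone()[0]
        best = min(timeit.repeat(lambda: conn.execute(sql).fetchone(),
                                 number=1, repeat=repeat))
        results[name] = (answer, best)
    return results


for n in (100, 100_000):
    for name, (answer, secs) in benchmark(n).items():
        print(f"n={n:>7} {name:<20} count={answer} best={secs * 1000:.2f} ms")
```

At 100 rows the differences tend to disappear into noise, which is the point the answers make about needing realistic volumes.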
Generally, when two times are shown, one is CPU time and the other is wall-clock time. I cannot recall which is which, but it appears that the first is the CPU time and the second is the elapsed time.
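To see how far CPU time and wall-clock time can diverge, here is a small Python sketch measuring both. It does not settle which of the two numbers in the client output is which; it only illustrates the distinction. Waiting (sleeping here, or waiting on a network or disk in a real client) adds elapsed time but almost no CPU time, while computation adds both.

```python
import time


def measure(fn):
    """Return (cpu_seconds, wall_seconds) spent running fn()."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - cpu0, time.perf_counter() - wall0


waiting = measure(lambda: time.sleep(0.5))                       # blocked, not computing
working = measure(lambda: sum(i * i for i in range(2_000_000)))  # busy on the CPU

print(f"waiting: cpu={waiting[0]:.3f}s wall={waiting[1]:.3f}s")
print(f"working: cpu={working[0]:.3f}s wall={working[1]:.3f}s")
```

A large gap between the two figures usually points at waiting (I/O, network, locks) rather than at the work itself.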