Teradata SQL Optimization

I hope this is concise. I am basically looking for a methodology for improving queries, after watching one of my colleagues speed up my query almost 10-fold with a quick change.
I had a query that used two tables, t_item and t_action.
t_item is basically an item with its characteristics, and t_action holds the events or actions performed on that item, with a timestamp for each action. Each action also has an id.
My query joined the two tables on that id. There were also some criteria on t_action.action_type, which is free text.
My simplified original query was like the one below:
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE ('%PURCHASE%')
AND t_item.location = 'DE'
This ran OK; it came back in roughly 8 minutes.
My colleague changed it so that the t_action.action_type condition ended up in the FROM portion of the SQL (as part of the join's ON clause). This reduced the time to 2 minutes:
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE ('%PURCHASE%')
WHERE t_item.location = 'DE'
My question is: generally, how do you know when to put conditions in the FROM clause (the join's ON clause) versus the WHERE clause?
I thought the Teradata SQL optimizer handled this automatically.
Thank you for your help

In this case, you don't actually need to understand the plan; you just need to see whether the two plans are the same. Teradata has a pretty good optimizer, so I would not expect there to be a difference between the two versions (there could be, but I would be surprised). Hence, caching is one possible explanation for the difference in performance.
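A quick way to check, using the table names from the question, is to prefix each version with EXPLAIN and compare the output line by line; if the two plans are identical, the rewrite itself cannot explain the speedup:
EXPLAIN
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE '%PURCHASE%'
AND t_item.location = 'DE';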
For this query:
SELECT *
FROM t_item JOIN
t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';
The best indexes are probably on t_item(location, pk) and t_action(action_type). However, you should try to get rid of the wildcards for a production query; they make the query harder to optimize, which in turn might have a large impact on performance.
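A minimal sketch of those suggestions as Teradata secondary indexes (the DDL below is an assumption on my part, and note that a leading-wildcard LIKE still prevents an index range scan on action_type):
-- secondary index covering the filter and join columns on t_item
CREATE INDEX (location, pk) ON t_item;
-- secondary index on the filtered column of t_action
CREATE INDEX (action_type) ON t_action;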

I tried to create a similar query but didn't see any difference in the explain plans, though the record counts were smaller: trans (15k) and accounts (10k), with indexes on Account_Number. Following what Gordon suggested, try running the query at different times and also check the explain plans of both queries for any difference.
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
where t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
and t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.

The general order of query processing on Teradata is:
Where/And + Joins
Aggregate
Having
Olap/Window
Qualify
Sample/Top
Order By
Format
An easy way to remember is WAHOQSOF - as in Wax On, Wax Off :)
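For example, in the hypothetical sketch below (the action_ts column name is assumed), the WHERE predicate is applied before the window function is computed, while QUALIFY filters on the window function's result afterwards, and ORDER BY runs last:
SELECT fk,
       action_ts,
       ROW_NUMBER() OVER (PARTITION BY fk ORDER BY action_ts DESC) AS rn  -- Olap/Window
FROM t_action
WHERE action_type LIKE '%PURCHASE%'  -- Where, applied before the OLAP step
QUALIFY ROW_NUMBER() OVER (PARTITION BY fk ORDER BY action_ts DESC) = 1  -- Qualify, applied after it
ORDER BY fk;  -- Order By, applied near the end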

Related

SQL Server 2012 Estimated Row Numbers Much Different than Actual

I have a query that cross joins two tables. TABLE_1 has 15,000 rows and TABLE_2 has 50,000 rows. A query very similar to this one has run in the past in roughly 10 minutes. Now it is running indefinitely with the same server situation (i.e. nothing else running), and the very similar query is also running indefinitely.
SELECT A.KEY_1
,A.FULL_TEXT_1
,B.FULL_TEXT_2
,B.KEY_2
,MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0) AS confidence
FROM #TABLE_1 A
CROSS JOIN #TABLE_2 B
WHERE MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0) >= 0.9
When I run the estimated execution plan for this query, the Nested Loops (Inner Join) node is estimated at 96% of the execution. The estimated number of rows is 218 million, even though cross joining the tables should result in 15,000 * 50,000 = 750 million rows. When I add INSERT INTO #temp_table to the beginning of the query, the estimated execution plan puts Insert Into at 97% and estimates the number of rows as 218 million. In reality, there should be less than 100 matches that have a similarity score above 0.9.
I have read that large differences in estimated vs. actual row counts can impact performance. What could I do to test/fix this?
Yes, this is true. It particularly affects optimizations involving join algorithms, aggregation algorithms, and indexes.
But it is not true for your query. Your query has to do a nested loops join with no indexes. All pairs of values in the two tables need to be compared. There is little algorithmic flexibility and (standard) indexes cannot really help.
For better performance, use the minScoreHint parameter. It lets the function skip the full similarity calculation for many pairs and exit early.
So this should run quicker:
SELECT A.KEY_1
,A.FULL_TEXT_1
,B.FULL_TEXT_2
,B.KEY_2
,MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0, 0.9) AS confidence
FROM #TABLE_1 A
CROSS JOIN #TABLE_2 B
WHERE MDS_DB.MDQ.SIMILARITY(A.FULL_TEXT_1,B.FULL_TEXT_2, 2, 0, 0, 0.9) >= 0.9
It is not clear from the docs whether results scoring exactly 0.9 would be included. If not, change the threshold argument to 0.89.
The link provided by scsimon will help you prove whether it's statistics or not. Have the estimates changed significantly compared to when it was running fast?
Parallelism springs to mind. If the query was going parallel but now isn't (e.g. because a server setting or the statistics have changed), that could cause significant performance degradation.

Teradata Performance issue and example

I am facing an issue in our Teradata QA environment where a simple query that ran in under 1 minute is now taking 12 minutes to complete. This select is pulling 5 fields based on a simple inner join
select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
I can run this same query in our Prod environment and it returns in less than a minute while running on approx 200k more records than QA.
Total volume is under 1.1 M records in SalesOrders.Allocation_Deliveries and 129 k records in SalesOrders.Material_Attributes. These are small datasets.
I compared the Explain plans in both environments and there is a stark difference in the estimated spool volume in the first join step. The estimate in Production is on the money, while the estimate in QA is an order of magnitude off. However, the data and tables/views are identical in both systems, we have collected stats in every conceivable manner, and the table demographics in both systems are identical.
Lastly, this query has always returned in under a minute in all environments, including QA, as it still does in Production. This slow behavior appeared within the last week or so. I discussed it with our DBA and we have had no changes to software or configuration. He is new, but he seems to know what he's doing and is still getting caught up with a new environment.
I am looking for some pointers on what to check next. I have compared the relevant table/view definitions across QA and Prod and they are identical. The table demographics in each system are also the same (I went through these with our DBA to make sure).
Any help is appreciated. Thanks in advance.
Pat
This is the Explain plan from QA. Note the very low estimate in step 5 (144 rows). In Prod, the same Explain shows > 1M rows, which is close to what I know the actual count to be.
Explain select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
1) First, we lock SalesOrders.Allocation_Deliveries in view
SalesOrders_view.Allocation_Deliveries for access, and we lock
SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur for
access.
2) Next, we do an all-AMPs SUM step to aggregate from
SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur by way
of an all-rows scan with no residual conditions
, grouping by field1 ( SalesOrders.Material_Attributes.material
,SalesOrders.Material_Attributes.season ,SalesOrders.Material_Attributes.theme
,SalesOrders.Material_Attributes.theme ,SalesOrders.Material_Attributes.af_grdval
,SalesOrders.Material_Attributes.af_stcat
,SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM). Aggregate
Intermediate Results are computed locally, then placed in Spool 4.
The size of Spool 4 is estimated with high confidence to be
129,144 rows (41,713,512 bytes). The estimated time for this step
is 0.06 seconds.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
way of an all-rows scan into Spool 2 (all_amps), which is
redistributed by the hash code of (
SalesOrders.Material_Attributes.Field_9,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Field_7, SalesOrders.Material_Attributes.Field_6,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.season, SalesOrders.Material_Attributes.material)
to all AMPs. Then we do a SORT to order Spool 2 by row hash
and the sort key in spool field1 eliminating duplicate rows.
The size of Spool 2 is estimated with low confidence to be
129,144 rows (23,504,208 bytes). The estimated time for this
step is 0.11 seconds.
2) We do an all-AMPs RETRIEVE step from SalesOrders.Material_Attributes in
view SalesOrders_view.Material_Attributes_cur by way of an all-rows scan
with no residual conditions locking for access into Spool 6
(all_amps), which is redistributed by the hash code of (
SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Material_Attributes_UPD_TS, (CASE WHEN (NOT
(SalesOrders.Material_Attributes.af_stcat IS NULL )) THEN
(SalesOrders.Material_Attributes.af_stcat) ELSE ('') END )(VARCHAR(16),
CHARACTER SET UNICODE, NOT CASESPECIFIC), (CASE WHEN (NOT
(SalesOrders.Material_Attributes.af_grdval IS NULL )) THEN
(SalesOrders.Material_Attributes.af_grdval) ELSE ('') END )(VARCHAR(8),
CHARACTER SET UNICODE, NOT CASESPECIFIC)) to all AMPs. Then
we do a SORT to order Spool 6 by row hash. The size of Spool
6 is estimated with high confidence to be 129,144 rows (
13,430,976 bytes). The estimated time for this step is 0.08
seconds.
4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 7 (all_amps), which is built locally
on the AMPs. Then we do a SORT to order Spool 7 by the hash code
of (SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.Field_6, SalesOrders.Material_Attributes.Field_7,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Field_9). The size of Spool 7 is estimated
with low confidence to be 129,144 rows (13,301,832 bytes). The
estimated time for this step is 0.05 seconds.
5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using an inclusion
merge join, with a join condition of ("(material = material) AND
((season = season) AND ((theme = theme) AND ((theme =
theme) AND (((( CASE WHEN (NOT (af_grdval IS NULL )) THEN
(af_grdval) ELSE ('') END ))= Field_6) AND (((( CASE WHEN (NOT
(AF_STCAT IS NULL )) THEN (AF_STCAT) ELSE ('') END ))= Field_7)
AND ((Material_Attributes_SRC_SYS_NM = Material_Attributes_SRC_SYS_NM) AND
(Material_Attributes_UPD_TS = Field_9 )))))))"). The result goes into Spool
8 (all_amps), which is duplicated on all AMPs. The size of Spool
8 is estimated with low confidence to be 144 rows (5,616 bytes).
The estimated time for this step is 0.04 seconds.
6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to SalesOrders.Allocation_Deliveries in view
SalesOrders_view.Allocation_Deliveries by way of an all-rows scan with no
residual conditions. Spool 8 and SalesOrders.Allocation_Deliveries are
joined using a single partition hash join, with a join condition
of ("SalesOrders.Allocation_Deliveries.material = material"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with low confidence to be 3,858
rows (146,604 bytes). The estimated time for this step is 0.44
seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.70 seconds.
Here is what the record distribution looks like and the SQL I used to generate the result set
SELECT HASHAMP(HASHBUCKET(HASHROW( MATERIAL ))) AS
"AMP#",COUNT(*)
FROM EDW_LND_SAP_VIEW.EMDMMU01_CUR
GROUP BY 1
ORDER BY 2 DESC;
Output
Highest: AMP 137 with 1093 rows
Lowest: AMP 72 with 768 rows
Total AMPs: 144
Statistics Recommendations
Run the following in PROD and QA and post the differences (obscure column names if need be):
DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN
select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
This diagnostic, when run in conjunction with the EXPLAIN command, will produce a list of recommended statistics that may be beneficial to the optimizer in producing the lowest-cost query plan. This may yield no difference, or it may point to something that is different between the environments (data or otherwise).
Views and JOIN conditions
Based on your EXPLAIN plan, one or both of the views in the SalesOrders_view database appear to be using an EXISTS clause. This EXISTS clause relies on a COALESCE condition (or explicit CASE logic) to accommodate a comparison between a column in one table that is defined as NOT NULL and a column in another table that allows NULL values. This can affect the performance of that join.
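A purely hypothetical illustration of the kind of view predicate that produces the CASE/COALESCE expressions visible in steps 3 through 5 of the plan (the history table and the exact column list are invented for the example):
REPLACE VIEW SalesOrders_view.Material_Attributes_cur AS
SELECT t.*
FROM SalesOrders.Material_Attributes t
WHERE EXISTS (
    SELECT 1
    FROM SalesOrders.Material_Attributes_Hist h  -- invented table name
    WHERE h.material = t.material
      AND COALESCE(h.af_stcat, '') = COALESCE(t.af_stcat, '')  -- nullable vs. NOT NULL comparison
);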
Data Distribution
Your distribution results appear to be from the PRODUCTION environment. (Based on the number of AMPS and the number of rows shown on the AMP with the highest and lowest rows.) What does that look like for QA?
Edit - 2013-01-09 09:21
If the data was copied from Prod two months ago, it may seem silly to ask, but were the statistics recollected afterward? Stale statistics on top of replaced data could lead to the variance in the query plan between the environments.
Are you collecting PARTITION statistics on your tables even if they are not PPI tables? This helps the optimizer with cardinality estimates.
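A hedged sketch of what recollecting those statistics might look like (the column choice is an assumption based on the join column, and the exact syntax varies by Teradata release):
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Material_Attributes;
COLLECT STATISTICS COLUMN (material) ON SalesOrders.Material_Attributes;
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Allocation_Deliveries;
COLLECT STATISTICS COLUMN (material) ON SalesOrders.Allocation_Deliveries;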
Are you the only workload running on the QA system?
Have you looked at the DBQL metrics to compare CPU and IO consumption for the query in each environment? Look at IO Skew, CPU Skew, and Unnecessary IO metrics as well.
Do you have delay throttles in place in the QA environment that may be delaying your workload? This would give you the perception that the query takes longer to run in QA when in fact the actual CPU and IO consumption are the same between QA and PROD.
Do you have access to Viewpoint?
If so, have you looked at your query using the My Queries and/or Query Spotlight portlets to observe its behavior?
Do you know which step in the Query Plan is the most expensive or time consuming? Viewpoint Rewind with either portlet I mentioned or Step logging in DBQL can show you this.
Are the DBS Control settings between the environments identical? Ask your DBA to look at this. There are settings in there that can affect the join plans that are used by the optimizer.
In the end if the data, table structures, indexes, and statistics are the same on the two systems whose hardware and TDBMS patch levels are identical you should not get two different EXPLAIN plans. If that ends up being the case I would suggest that you contact the GSC and get them involved.

Adding 'distinct' keyword to oracle query obliterates query performance for no reason

I am quite confused by something I'm seeing in an Oracle 10 database.
I have the following query.
select
t2.duplicate_num
from table1 t1, table2 t2,
(
select joincriteria_0 from intable1 it1, intable2 it2
where it2.identifier in (4496486,5911382)
and it1.joincriteria_0 = it2.joincriteria_0
and it1.filter_0 = 1
) tt
where t1.joincriteria_0 = tt.joincriteria_0
and t2.joincriteria_1 = t1.joincriteria_1
and t2.filter_0 = 3
and t2.filter_1 = 1
and t2.filter_2 not in (48020)
It doesn't really seem like anything special to me, here are the baseline performance numbers from autotrace:
CR_GETS: 318
CPU: 3
ROWS: 33173
Now if I add the 'DISTINCT' keyword to the query (e.g. 'select distinct t2.duplicate_num...') this happens
CR_GETS: 152921
CPU: 205
ROWS: 305
The query plan has not changed, but the logical IO grows by a factor of 500. I was expecting CPU only to go up and logical IO to be largely unchanged.
The net result is a query that runs 10-100x slower with the DISTINCT keyword. I can put code into the application which would make the result set distinct in a fraction of the time. How does this make any sense, particularly without the query plan changing?
This indicates a lack of an index somewhere. It also means your original query without the DISTINCT clause wasn't optimized. With DISTINCT it could not be optimized either, so the query plan remained the same. An unoptimized query varies widely in performance because of the full table scans.
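If a missing index really is the cause, here is a hedged sketch of the kind of covering index that could help, based only on the join and filter columns visible in the query (the column order and index name are guesses):
CREATE INDEX table2_dup_idx
    ON table2 (joincriteria_1, filter_0, filter_1, filter_2, duplicate_num);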

Why does changing the where clause on this criteria reduce the execution time so drastically?

I ran across a problem with a SQL statement today that I was able to fix by adding additional criteria, however I really want to know why my change fixed the problem.
The problem query:
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000'
ORDER BY actionhistory_id
)
WHERE rownum <= 30000;
The "fix"
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000' and actionhistory_id <= 'ACT100030000'
ORDER BY actionhistory_id
)
All of the _id columns are indexed sequences.
The first query's explain plan had a cost of 372 and the second was 14. This is running on an Oracle 11g database.
Additionally, if actionhistory_id in the where clause is anything less than ACT100000000, the original query returns instantly.
This is because of the index on the actionhistory_id column.
During the first query, Oracle has to read all the index blocks containing entries for records that come after 'ACT100010000', then match those index entries to the table to get all the records, and then pull 29999 records from the result set.
During the second query, Oracle only has to read the index blocks containing entries between 'ACT100010000' and 'ACT100030000', and then fetch from the table just the records represented in those index blocks. That is a lot less work in the fetch step than with the first query.
Noticing your last line about if the id is less than ACT100000000 - sounds to me that those records may all be in the same memory block (or in a contiguous set of blocks).
EDIT: Please also consider what Justin says. I was talking about actual performance, but he points out that the id being a varchar greatly increases the potential values (as opposed to a number), and that the estimated plan may reflect a greater time than reality because the optimizer doesn't know the full range until execution. To further optimize, taking his point into consideration, you could put a function-based index on the id column, or you could make it a combination key, with the varchar portion in one column and the numeric portion in another.
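A hedged sketch of the function-based index idea, assuming the id values always look like 'ACT' followed by digits (the index name is illustrative):
CREATE INDEX actionhistory_id_num_idx
    ON actionhistory (TO_NUMBER(SUBSTR(actionhistory_id, 4)));
For the index to be usable, queries would then need to filter on the same expression, e.g. WHERE TO_NUMBER(SUBSTR(actionhistory_id, 4)) BETWEEN 100010000 AND 100030000.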
What are the plans for both queries?
Are the statistics on your tables up to date?
Do the two queries return the same set of rows? It's not obvious that they do but perhaps ACT100030000 is the largest actionhistory_id in the system. It's also a bit confusing because the first query has a predicate on actionhistory_id with a value of TRA100010000 which is very different than the ACT value in the second query. I'm guessing that is a typo?
Are you measuring the time required to fetch the first row? Or the time required to fetch the last row? What are those elapsed times?
My guess without that information is that the fact that you appear to be using the wrong data type for your actionhistory_id column is affecting the Oracle optimizer's ability to generate appropriate cardinality estimates which is likely causing the optimizer to underestimate the selectivity of your predicates and to generate poorly performing plans. A human may be able to guess that actionhistory_id is a string that starts with ACT10000 and then has 30,000 sequential numeric values from 00001 to 30000 but the optimizer is not that smart. It sees a 13 character string and isn't able to figure out that the last 10 characters are always going to be numbers so there are only 10 possible values rather than 256 (assuming 8-bit characters) and that the first 8 characters are always going to be the same constant value. If, on the other hand, actionhistory_id was defined as a NUMBER and had values between 1 and 30000, it would be dramatically easier for the optimizer to make reasonable estimates about the selectivity of various predicates.

Poor DB Performance when using ORDER BY

I'm working with a non-profit that is mapping out solar potential in the US. Needless to say, we have a ridiculously large PostgreSQL 9 database. Running a query like the one shown below is speedy until the order by line is uncommented, in which case the same query takes forever to run (185 ms without sorting compared to 25 minutes with). What steps should be taken to ensure this and other queries run in a more manageable and reasonable amount of time?
select A.s_oid, A.s_id, A.area_acre, A.power_peak, A.nearby_city, A.solar_total
from global_site A cross join na_utility_line B
where (A.power_peak between 1.0 AND 100.0)
and A.area_acre >= 500
and A.solar_avg >= 5.0
AND A.pc_num <= 1000
and (A.fips_level1 = '06' AND A.fips_country = 'US' AND A.fips_level2 = '025')
and B.volt_mn_kv >= 69
and B.fips_code like '%US06%'
and B.status = 'active'
and ST_within(ST_Centroid(A.wkb_geometry), ST_Buffer((B.wkb_geometry), 1000))
--order by A.area_acre
offset 0 limit 11;
The sort is not the problem - in fact the CPU and memory cost of the sort is close to zero since Postgres has Top-N sort where the result set is scanned while keeping up to date a small sort buffer holding only the Top-N rows.
select count(*) from (1 million row table) -- 0.17 s
select * from (1 million row table) order by x limit 10; -- 0.18 s
select * from (1 million row table) order by x; -- 1.80 s
So you see the Top-10 sorting only adds 10 ms to a dumb fast count(*) versus a lot longer for a real sort. That's a very neat feature, I use it a lot.
OK, now without EXPLAIN ANALYZE it's impossible to be sure, but my feeling is that the real problem is the cross join. Basically you're filtering the rows in both tables using:
where (A.power_peak between 1.0 AND 100.0)
and A.area_acre >= 500
and A.solar_avg >= 5.0
AND A.pc_num <= 1000
and (A.fips_level1 = '06' AND A.fips_country = 'US' AND A.fips_level2 = '025')
and B.volt_mn_kv >= 69
and B.fips_code like '%US06%'
and B.status = 'active'
OK. I don't know how many rows are selected in both tables (only EXPLAIN ANALYZE would tell), but it's probably significant. Knowing those numbers would help.
Then we have the worst-case CROSS JOIN condition ever:
and ST_within(ST_Centroid(A.wkb_geometry), ST_Buffer((B.wkb_geometry), 1000))
This means all rows of A are matched against all rows of B (so, this expression is going to be evaluated a large number of times), using a bunch of pretty complex, slow, and cpu-intensive functions.
Of course it's horribly slow!
When you remove the ORDER BY, Postgres just comes up (by chance?) with a bunch of matching rows right at the start, outputs those, and stops since the LIMIT is reached.
Here's a little example:
Tables a and b are identical and contain 1000 rows, and a column of type BOX.
select * from a cross join b where (a.b && b.b) --- 0.28 s
Here 1000000 box overlap (operator &&) tests are completed in 0.28s. The test data set is generated so that the result set contains only 1000 rows.
create index a_b on a using gist(b);
create index b_b on b using gist(b);
select * from a cross join b where (a.b && b.b) --- 0.01 s
Here the index is used to optimize the cross join, and speed is ridiculous.
You need to optimize that geometry matching.
add columns which will cache:
ST_Centroid(A.wkb_geometry)
ST_Buffer((B.wkb_geometry), 1000)
There is NO POINT in recomputing those slow functions a million times during your CROSS JOIN, so store the results in a column. Use a trigger to keep them up to date.
add columns of type BOX which will cache:
Bounding Box of ST_Centroid(A.wkb_geometry)
Bounding Box of ST_Buffer((B.wkb_geometry), 1000)
add gist indexes on the BOXes
add a Box overlap test (using the && operator) which will use the index
keep your ST_Within which will act as a final filter on the rows that pass
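A hedged PostGIS sketch of that approach, with assumed cache-column names (centroid_geom, buffer_geom); a trigger or batch job would keep the cached columns current, as noted above:
ALTER TABLE global_site ADD COLUMN centroid_geom geometry;
UPDATE global_site SET centroid_geom = ST_Centroid(wkb_geometry);
CREATE INDEX global_site_centroid_gist ON global_site USING gist (centroid_geom);
ALTER TABLE na_utility_line ADD COLUMN buffer_geom geometry;
UPDATE na_utility_line SET buffer_geom = ST_Buffer(wkb_geometry, 1000);
CREATE INDEX na_utility_line_buffer_gist ON na_utility_line USING gist (buffer_geom);
-- in the query, the && bounding-box test can use the GiST indexes,
-- while ST_Within stays as the exact final filter:
--   and A.centroid_geom && B.buffer_geom
--   and ST_Within(A.centroid_geom, B.buffer_geom)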
Maybe you can just index the ST_Centroid and ST_Buffer columns... and use an (indexed) "contains" operator, see here:
http://www.postgresql.org/docs/8.2/static/functions-geometry.html
I would suggest creating an index on area_acre. You may want to take a look at the following: http://www.postgresql.org/docs/9.0/static/sql-createindex.html
I would recommend doing this sort of thing outside of peak hours, though, because it can be somewhat intensive with a large amount of data. One thing you will also have to look at with indexes is rebuilding them on a schedule to ensure performance over time. Again, this schedule should be outside of peak hours.
You may want to take a look at this article from a fellow SO'er and his experience with database slowdowns over time with indexes: Why does PostgresQL query performance drop over time, but restored when rebuilding index
If the A.area_acre field is not indexed that may slow it down. You can run the query with EXPLAIN to see what it is doing during execution.
First off I would look at creating indexes , ensure your db is being vacuumed, increase the shared buffers for your db install, work_mem settings.
First thing to look at is whether you have an index on the field you're ordering by. If not, adding one will dramatically improve performance. I don't know postgresql that well but something similar to:
CREATE INDEX area_acre ON global_site(area_acre)
As noted in other replies, the indexing process is intensive when working with a large data set, so do this during off-peak.
I am not familiar with the PostgreSQL optimizations, but it sounds like what is happening when the query is run with the ORDER BY clause is that the entire result set is created, then it is sorted, and then the top 11 rows are taken from that sorted result. Without the ORDER BY, the query engine can just generate the first 11 rows in whatever order it pleases and then it's done.
Having an index on the area_acre field very possibly may not help for the sorting (ORDER BY) depending on how the result set is built. It could, in theory, be used to generate the result set by traversing the global_site table using an index on area_acre; in that case, the results would be generated in the desired order (and it could stop after generating 11 rows in the result). If it does not generate the results in that order (and it seems like it may not be), then that index will not help in sorting the results.
One thing you might try is to remove the "CROSS JOIN" from the query. I doubt that this will make a difference, but it's worth a test. Because a WHERE clause is involved joining the two tables (via ST_WITHIN), I believe the result is the same as an inner join. It is possible that the use of the CROSS JOIN syntax is causing the optimizer to make an undesirable choice.
Otherwise (aside from making sure indexes exist for fields that are being filtered), you could play a bit of a guessing game with the query. One condition that stands out is the area_acre >= 500. This means that the query engine is considering all rows that meet that condition. But then only the first 11 rows are taken. You could try changing it to area_acre >= 500 and area_acre <= somevalue. The somevalue is the guessing part that would need adjustment to make sure you get at least 11 rows. This, however, seems like a pretty cheesy thing to do, so I mention it with some reticence.
Have you considered creating Expression based indexes for the benefit of the hairier joins and where conditions?