Materialize Common Table Expression in HANA

Is there a way to force HANA to materialize a subquery in a WITH clause, similar to what the MATERIALIZE and INLINE optimizer hints do in Oracle, as below?
WITH dept_count AS (
SELECT /*+ MATERIALIZE */ deptno, COUNT(*) AS dept_count
FROM emp
GROUP BY deptno)
SELECT ...
I can't find such a hint in HANA. Any help?

For SAP HANA, materializing an intermediate result set usually does not improve performance; on the contrary, SAP HANA strives to materialize as late as possible, so that it can process the data in its more compact internal representation.
What you probably want is that common table expressions that are used in multiple places in your query don't get re-executed for every reference.
This optimisation is called "subplan sharing" in SAP HANA and is active by default.
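To make the idea of subplan sharing concrete, here is a toy Python model. It is not HANA's implementation; `PlanNode` and `execute` are hypothetical names. The plan is a small DAG in which the CTE node is referenced twice by the main query: without sharing, the CTE (and the scan beneath it) is evaluated once per reference; with sharing, its result is cached by node identity and reused.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PlanNode:
    """A hypothetical plan operator that computes rows from its children's rows."""
    name: str
    compute: Callable[..., List[tuple]]
    children: List["PlanNode"] = field(default_factory=list)


def execute(node: PlanNode, counts: Dict[str, int],
            cache: Optional[Dict[int, List[tuple]]] = None) -> List[tuple]:
    """Evaluate a plan DAG; when a cache is given, shared subplans run only once."""
    if cache is not None and id(node) in cache:
        return cache[id(node)]  # reuse the shared subplan's result
    inputs = [execute(child, counts, cache) for child in node.children]
    counts[node.name] = counts.get(node.name, 0) + 1  # record one evaluation
    rows = node.compute(*inputs)
    if cache is not None:
        cache[id(node)] = rows
    return rows


# Toy "emp" table: (empno, deptno). Invented data.
emp = [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)]

scan_emp = PlanNode("scan_emp", lambda: list(emp))
# The CTE: SELECT deptno, COUNT(*) FROM emp GROUP BY deptno
dept_count = PlanNode(
    "dept_count",
    lambda rows: sorted(Counter(deptno for _, deptno in rows).items()),
    [scan_emp],
)


def largest_depts(dc_rows, dc_rows_again):
    """Main query referencing the CTE twice: departments with the maximum count."""
    top = max(n for _, n in dc_rows_again)
    return [(deptno, n) for deptno, n in dc_rows if n == top]


query = PlanNode("query", largest_depts, [dept_count, dept_count])

without_sharing: Dict[str, int] = {}
print(execute(query, without_sharing), without_sharing)  # dept_count runs twice
with_sharing: Dict[str, int] = {}
print(execute(query, with_sharing, cache={}), with_sharing)  # dept_count runs once
```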
Looking at the EXPLAIN PLAN output shows whether a query uses "subplan sharing":
OPERATOR_NAME OPERATOR_DETAILS
...
ROW SEARCH KF_MED.MKF1, KF_MED.MKF2, KF_DAT.MKF1,
[...]
COLUMN SEARCH KF_MED.MKF1, KF_MED.MKF2, FACT.KF1, FACT.KF2 ...
FILTER FACT.KF2 <> KF_MED.MKF2 OR KF_MED.MKF2 IS NULL OR FACT.KF2 IS NULL ...
JOIN JOIN CONDITION: (INNER many-to-one) FACT.KF1 = KF_MED.MKF1 ...
COLUMN TABLE ...
ROW SEARCH KF_MED.MKF1, KF_MED.MKF2 ...
VIEW PROJECT COLS: KF_MED.MKF1, KF_MED.MKF2, ...
===> (SHARED SUBPLAN) SUBPLAN OPERATOR_ID : 9 ...
VIEW PROJECT COLS: KF_MED.MKF1, KF_MED.MKF2, ...
===> (SHARED SUBPLAN) SUBPLAN OPERATOR_ID : 9
See the two lines marked with ===> above.
If the optimizer does not choose to use "subplan sharing", you can try a hint to indicate that this is the desired behaviour:
SELECT * FROM T1 WITH HINT( SUBPLAN_SHARING );
However, before trying to "force" the optimizer to do anything, make sure to understand what is currently happening. Hints are technical debt in your code and should be avoided if possible.

Related

Teiid not performing optimal join

For our Teiid Spring Boot project, we use a row filter in a WHERE clause to determine which results a user gets.
Example:
SELECT * FROM very_large_table WHERE id IN ('01', '03')
We want the content of the IN clause to be dynamic, like so:
SELECT * FROM very_large_table WHERE id IN (SELECT other_id from very_small_table)
The problem is that Teiid fetches all the data from very_large_table and only then applies the WHERE filter, which makes the query 10-20 times slower. very_small_table holds only about 1-10 records, based on the user context we get from Java.
very_large_table is located in an Oracle database and very_small_table is on the Teiid pod/container. I can't get Teiid to ship the data to Oracle and perform the filtering there.
Things that I have tried:
I have specified the foreign data wrappers as follows:
CREATE FOREIGN DATA WRAPPER "oracle_override" TYPE "oracle" OPTIONS (EnableDependentsJoins 'true');
CREATE SERVER server_name FOREIGN DATA WRAPPER "oracle_override";
I also tried an EXISTS clause, and a JOIN instead of the WHERE clause, to see whether pushdown happened. Join hints don't seem to matter either.
Sadly, the performance impact is currently so high that we can't reach our performance targets.
Have you set cardinalities on very_small_table and very_large_table? If not, the planner will assume a default plan.
You can also use a dependent join hint:
SELECT * FROM very_large_table WHERE id IN /*+ dj */ (SELECT other_id from very_small_table)
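Roughly speaking, a dependent join evaluates the small, independent side first and then pushes its key values down to the large source as a filter. The sketch below imitates that idea with two separate in-memory SQLite connections standing in for the Teiid-local table and the Oracle table. The `dependent_join` function and the `payload` column are hypothetical, and this is not Teiid's actual code.

```python
import sqlite3


def dependent_join(small_conn, large_conn):
    """Hypothetical sketch of a dependent join across two separate sources."""
    # 1) Evaluate the small, independent side first: only a handful of keys.
    keys = [row[0] for row in small_conn.execute(
        "SELECT DISTINCT other_id FROM very_small_table")]
    if not keys:
        return []  # nothing can match, so the large source is never queried
    # 2) Push the keys down to the large source as a bound IN list, so the
    #    filtering happens where very_large_table lives. (Very long key lists
    #    would need batching, e.g. because of SQLite's bound-parameter limit.)
    placeholders = ", ".join("?" for _ in keys)
    sql = ("SELECT id, payload FROM very_large_table "
           f"WHERE id IN ({placeholders}) ORDER BY id")
    return large_conn.execute(sql, keys).fetchall()


# Stand-in for the Oracle side: 100 rows with ids '00'..'99'.
large = sqlite3.connect(":memory:")
large.execute("CREATE TABLE very_large_table (id TEXT PRIMARY KEY, payload TEXT)")
large.executemany("INSERT INTO very_large_table VALUES (?, ?)",
                  [(f"{i:02d}", f"row {i}") for i in range(100)])

# Stand-in for the Teiid-local table holding the user-context keys.
small = sqlite3.connect(":memory:")
small.execute("CREATE TABLE very_small_table (other_id TEXT)")
small.executemany("INSERT INTO very_small_table VALUES (?)", [("01",), ("03",)])

print(dependent_join(small, large))  # [('01', 'row 1'), ('03', 'row 3')]
```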
Often, exists performs better than in:
SELECT vlt.*
FROM very_large_table vlt
WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id);
However, this might end up scanning the large table.
If id is unique in vlt and there are no duplicates in vst, then a JOIN might optimize better:
select vlt.*
from very_small_table vst join
very_large_table vlt
on vst.other_id = vlt.id;
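A small SQLite check shows why the JOIN rewrite needs duplicate-free keys in very_small_table (the sample rows are invented, and SQLite is only a stand-in here): EXISTS returns each matching vlt row once, while the JOIN returns it once per matching vst row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE very_large_table (id TEXT PRIMARY KEY, val TEXT);
CREATE TABLE very_small_table (other_id TEXT);
INSERT INTO very_large_table VALUES ('01', 'a'), ('02', 'b'), ('03', 'c');
-- '01' appears twice in the small table
INSERT INTO very_small_table VALUES ('01'), ('01'), ('03');
""")

exists_rows = conn.execute("""
SELECT vlt.* FROM very_large_table vlt
WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id)
ORDER BY vlt.id""").fetchall()

join_rows = conn.execute("""
SELECT vlt.* FROM very_small_table vst
JOIN very_large_table vlt ON vst.other_id = vlt.id
ORDER BY vlt.id""").fetchall()

print(exists_rows)  # [('01', 'a'), ('03', 'c')]
print(join_rows)    # [('01', 'a'), ('01', 'a'), ('03', 'c')]
```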

SQL Server Execute Order

As far as I know, the order of execution in SQL is
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
So I am confused by a correlated query like the code below.
Are the FROM and WHERE clauses of the outer query executed first, or is the SELECT of the inner query executed first? Can anyone give me an idea and an explanation? Thanks
SELECT
*, COUNT(1) OVER(PARTITION BY A) pt
FROM
(SELECT
tt.*,
(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS A
FROM
t tt
WHERE
data > 10) t1
As I know the order of execute in SQL is FROM-> WHERE-> GROUP BY-> HAVING -> SELECT ->ORDER BY
False. False. False. Presumably what you are referring to is this part of the documentation:
The following steps show the logical processing order, or binding
order, for a SELECT statement. This order determines when the objects
defined in one step are made available to the clauses in subsequent
steps.
As the documentation explains, this refers to the scoping rules when a query is parsed. It has nothing to do with the execution order. SQL Server -- as with almost any database -- reserves the ability to rearrange the query however it likes for processing.
In fact, the execution plan is really a directed acyclic graph (DAG), whose components generally do not have a 1-1 relationship with the clauses in a query. SQL Server is free to execute your query in whatever way it decides is best, so long as it produces the result set that you have described.
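As a sketch of that point, the snippet below runs the question's query in SQLite (a stand-in for SQL Server; the sample rows are invented) and then computes the same result in a deliberately different order: it evaluates the inner filter once, up front, instead of re-running the correlated subquery for every outer row. Both produce the same result set, which is all the query text actually specifies.

```python
import sqlite3
from bisect import bisect_left
from collections import Counter

rows = [(1, 5), (2, 15), (3, 8), (4, 20), (5, 12), (6, 3), (7, 30)]  # (id, data)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# The question's query; ORDER BY is added only to make the output deterministic.
QUERY = """
SELECT *, COUNT(1) OVER (PARTITION BY A) AS pt
FROM (SELECT tt.*,
             (SELECT COUNT(id) FROM t WHERE data <= 10 AND id < tt.id) AS A
      FROM t tt
      WHERE data > 10) t1
ORDER BY id
"""


def procedural(table_rows):
    """Same result, evaluated in a different order than the clause order suggests."""
    # 1) Run the "inner" filter once, up front, instead of once per outer row.
    small_ids = sorted(i for i, d in table_rows if d <= 10)
    # 2) Filter the outer rows; bisect_left counts small ids strictly below id.
    t1 = [(i, d, bisect_left(small_ids, i)) for i, d in sorted(table_rows) if d > 10]
    # 3) COUNT(1) OVER (PARTITION BY A) is just the size of each A group.
    sizes = Counter(a for _, _, a in t1)
    return [(i, d, a, sizes[a]) for i, d, a in t1]


# Both print [(2, 15, 1, 1), (4, 20, 2, 2), (5, 12, 2, 2), (7, 30, 3, 1)]
print(conn.execute(QUERY).fetchall())
print(procedural(rows))
```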

Is index used for 'outer' and 'inner' where clauses in nested selects?

I need to use column aliases in a WHERE condition.
I found a possible solution here.
Let's assume we have a one-to-one relationship (user-to-role) and we want to get the following results:
SELECT u.name AS u_name, r.name AS r_name
FROM users AS u
INNER JOIN roles AS r
ON u.role_id = r.role_id
WHERE u.name = 'John'
And we have a corresponding index on users.name (just as an example).
If this query is run with EXPLAIN, it shows all indexes that are used during the selection (including the index on name).
Now, since we want to use aliases in the WHERE clause, we can rewrite the query based on the proposed solution:
SELECT * FROM (
SELECT u.name AS u_name, r.name AS r_name
FROM users AS u
INNER JOIN roles AS r
ON u.role_id = r.role_id
) AS temp
WHERE u_name = 'John'
As you can see, there's no WHERE clause in the nested select. Running this query with EXPLAIN gives the same results (admittedly, I'm not an expert in analyzing EXPLAIN output, but still):
same indexes
same costs
similar time of execution
And I'm a little bit confused by this result: I was convinced that at least the index on user name wouldn't be used.
Q1: Does postgres use indexes in that way?
Q2: Are there possible performance issues?
The subquery is not needed, so it can be unrolled/collapsed.
The following query will generate a flat plan (and indexes are not relevant)
\i tmp.sql
CREATE TABLE t
(a integer not null primary key
);
insert into t(a)
select v from generate_series(1,10000) v
;
ANALYZE t;
EXPLAIN
SELECT * from (
select d AS e from (
select c as d from (
select b AS c from (
select a AS b from t
) s
) r
) q
) p
where e =95
;
Resulting plan:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 10000
ANALYZE
QUERY PLAN
---------------------------------------------------------------------
Index Only Scan using t_pkey on t (cost=0.17..2.38 rows=1 width=4)
Index Cond: (a = 95)
(2 rows)
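The same experiment can be sketched in Python with SQLite, whose query flattener stands in here for PostgreSQL's subquery pull-up and collapses the nested layers. The expected plan is a single search on t through its primary key, with no subquery steps. Exact plan wording differs between SQLite versions, so only the leading SEARCH keyword is relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO t(a) VALUES (?)", [(v,) for v in range(1, 10001)])
conn.execute("ANALYZE")

nested = """
SELECT * FROM (
  SELECT d AS e FROM (
    SELECT c AS d FROM (
      SELECT b AS c FROM (
        SELECT a AS b FROM t
      ) s
    ) r
  ) q
) p
WHERE e = 95
"""


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print(plan(nested))                     # a single SEARCH step on t
print(conn.execute(nested).fetchall())  # [(95,)]
```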
In the OP's fragment, the innermost query (table expression) is a two-table join, but the mechanism is the same: all the outer layers can be peeled off (and the result column is renamed).
And yes: the join will benefit from indexes on the joined fields, and the final where could use an index, too.
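For the OP's two-table case, a quick check in SQLite (again only a stand-in for PostgreSQL; the `user_id` column, the sample rows and the index name `users_name_idx` are invented) compares the plan of the flat query with the plan of the query wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roles (role_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT,
                    role_id INTEGER REFERENCES roles(role_id));
CREATE INDEX users_name_idx ON users(name);
INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
INSERT INTO users VALUES (1, 'John', 1), (2, 'Jane', 2);
""")

flat = """SELECT u.name AS u_name, r.name AS r_name
FROM users AS u INNER JOIN roles AS r ON u.role_id = r.role_id
WHERE u.name = 'John'"""

wrapped = """SELECT * FROM (
  SELECT u.name AS u_name, r.name AS r_name
  FROM users AS u INNER JOIN roles AS r ON u.role_id = r.role_id
) AS temp
WHERE u_name = 'John'"""


def plan(sql):
    # Keep only the human-readable detail column of EXPLAIN QUERY PLAN.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print(plan(flat))
print(plan(wrapped))
print(conn.execute(wrapped).fetchall())  # [('John', 'admin')]
```

On an engine that flattens the derived table, the two printed plans should match, both reaching users through users_name_idx.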
SQL is a descriptive language, not a procedural language. A SQL query describes the result set being produced. It does not specify how to create it -- and that is even more true in Postgres which doesn't have compiler options or hints.
What actually gets run is a directed acyclic graph of operations (DAG). The compiling step creates the DAG. Postgres is smart enough to realize that the subquery is meaningless, so the two versions are optimized to the same DAG.
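Let me add that before PostgreSQL 12, CTEs were always materialized (they acted as an optimization fence), so using a CTE could prevent the index from being used. Since PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined by default, and AS MATERIALIZED / AS NOT MATERIALIZED control this explicitly.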

More Efficient Query than the EXISTS Condition

I was reading up on the SQL EXISTS Condition and found this snippet from Techonthenet.com
Note: SQL Statements that use the SQL EXISTS Condition are very inefficient since the sub-query is RE-RUN for EVERY row in the outer query's table. There are more efficient ways to write most queries, that do not use the SQL EXISTS Condition
Unless I skipped over it, the article does not explain a more efficient query that doesn't need this condition. Anyone have an idea of what they could be referring to?
You can usually use some "clever" inner join or something like that.
However, all in all, the advice is severely outdated. Yes, there used to be a time when subqueries had a huge cost, but that isn't necessarily the case anymore - as always, profile. And examine execution plans. It's very much possible your DB engine can handle subqueries just fine - in fact, it can be much faster than the hacky inner join (and similar solutions) :)
Always make sure you understand the rationale behind the advice, and to what it actually applies. A simple example on MS SQL:
select * from [order]
where exists (select * from [user] where [order].CreatedBy = [user].Id)
What a horrible sub-query, right? Totally going to run the subquery for every row of the order table, right? Well, the execution planner is smart enough to translate this into a simple left semi join (each order is returned at most once) - involving just two table scans (or, if applicable, index seeks). In other cases, the engine might decide to build hash sets, or temporary tables, or do any other smart thing to make sure the query is fast (within the other trade-offs, like memory usage). Nowadays, you will rarely find that your query tweaks are smarter than what the execution planner does - if your DB engine is up to the task. In fact, this is the whole reason we use SQL - a declarative language - in the first place! Instead of saying how the results should be obtained, you say what relationships lead to the result set you want, giving the DB engine a massive freedom in how to actually get the data - whether it means going through every single row in a table one by one, or seeking through an index.
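To illustrate the difference between the naive reading of EXISTS and one of the "smart things" a planner may do instead, here is a Python sketch (hypothetical function names, invented rows) contrasting re-running the subquery for each outer row with a hash semi-join that reads the inner table once into a set and then probes it. Both return each qualifying order exactly once.

```python
def exists_naive(orders, users):
    """Literal reading of EXISTS: re-run the subquery for every outer row."""
    inner_reads, result = 0, []
    for order in orders:
        for user in users:  # the "subquery", executed once per order
            inner_reads += 1
            if order["CreatedBy"] == user["Id"]:
                result.append(order)
                break  # EXISTS can stop at the first match
    return result, inner_reads


def exists_hash_semi_join(orders, users):
    """Hash semi-join: read the inner table once into a set, then probe it."""
    user_ids = {user["Id"] for user in users}  # one pass over the inner table
    result = [order for order in orders if order["CreatedBy"] in user_ids]
    return result, len(users)


users = [{"Id": 1}, {"Id": 2}, {"Id": 3}]
orders = [
    {"OrderId": 10, "CreatedBy": 1},
    {"OrderId": 11, "CreatedBy": 4},  # no such user
    {"OrderId": 12, "CreatedBy": 3},
]
# Inner-table rows read by each strategy: 7 3
print(exists_naive(orders, users)[1], exists_hash_semi_join(orders, users)[1])
```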
The default should always be to write the query in a way that makes the most sense. Once you've got a nice, clean and easy to understand query, think about any performance implications, and profile the results (using realistic test data). Look at the execution plan of the query - if you care about SQL performance, you really need to understand execution plans anyway; they tell you all there is to know about the way the query is actually executed, and how to improve various parts of the query (or, more often, the indices and statistics involved).
First of all, don't trust general statements like
Note: SQL Statements that use the SQL EXISTS Condition are very inefficient since the sub-query is RE-RUN for EVERY row in the outer query's table.
This can be true for some database systems, but other database systems might be able to find a more efficient execution plan for such statements.
For example, I tried such a statement on my Oracle database and it uses a hash join to execute the statement efficiently.
Now for the alternatives:
In many cases, you can use an IN subquery. This might work out well even on database systems that would execute EXISTS inefficiently.
So, instead of
select * from foo where exists (select 1 from bar where foo.x = bar.y)
write
select * from foo where x in (select y from bar)
The same can be written with ANY
select * from foo where x = any (select y from bar)
In many cases, it's most desirable to use a join, e.g.
select foo.* from foo inner join bar on foo.x = bar.y
You might have to use DISTINCT to make sure you don't get duplicate results when a row in foo matches more than one row in bar, though.
select distinct foo.* from foo inner join bar on foo.x = bar.y
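A quick way to see these equivalences, and the duplicate problem that DISTINCT fixes, is to run the forms above in SQLite from Python. SQLite does not support `= ANY (subquery)`, so that variant is left out. The sample data is invented; `foo` has a primary key here, so DISTINCT cannot merge genuinely different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, x INTEGER);
CREATE TABLE bar (y INTEGER);
INSERT INTO foo VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO bar VALUES (10), (10), (30), (40);  -- 10 matches twice
""")

queries = {
    "exists": "select * from foo where exists (select 1 from bar where foo.x = bar.y)",
    "in": "select * from foo where x in (select y from bar)",
    "join": "select foo.* from foo inner join bar on foo.x = bar.y",
    "join_distinct": "select distinct foo.* from foo inner join bar on foo.x = bar.y",
}
# Sort each result so the comparison does not depend on the chosen plan.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)  # only "join" contains (1, 10) twice
```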
When the main query's result set is small, the subquery's result set is large, and the subquery can use appropriate indexes, EXISTS / NOT EXISTS is the better option compared with IN / NOT IN.
When the main query returns the larger result set and has an index on the join column, and the subquery returns a small result set, IN / NOT IN is the better option compared with EXISTS / NOT EXISTS.
This question is answered in the Oracle documentation:
11.5.3.4 Use of EXISTS versus IN for Subqueries
In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.
Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.
"Example 1: Using IN - Selective Filters in the Subquery" and "Example 2: Using EXISTS - Selective Predicate in the Parent" are two examples that demonstrate the benefits of IN and EXISTS. Both examples use the same schema with the following characteristics:
There is a unique index on the employees.employee_id field.
There is an index on the orders.customer_id field.
There is an index on the employees.department_id field.
The employees table has 27,000 rows.
The orders table has 10,000 rows.
The OE and HR schemas, which own these segments, were both analyzed with COMPUTE.
11.5.3.4.1 Example 1: Using IN - Selective Filters in the Subquery
This example demonstrates how rewriting a query to use IN can improve performance. This query identifies all employees who have placed orders on behalf of customer 144.
The following SQL statement uses EXISTS:
SELECT /* EXISTS example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE EXISTS (SELECT 1 FROM orders o /* Note 1 */
WHERE e.employee_id = o.sales_rep_id /* Note 2 */
AND o.customer_id = 144); /* Note 3 */
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The plan requires a full table scan of the employees table, returning many rows. Each of these rows is then filtered against the orders table (through an index).
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 FILTER
2 TABLE ACCESS FULL EMPLOYEES ANA 155
3 TABLE ACCESS BY INDEX ROWID ORDERS ANA 3
4 INDEX RANGE SCAN ORD_CUSTOMER_IX ANA 1
Rewriting the statement using IN results in significantly fewer resources used.
The SQL statement using IN:
SELECT /* IN example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.employee_id IN (SELECT o.sales_rep_id /* Note 4 */
FROM orders o
WHERE o.customer_id = 144); /* Note 3 */
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The optimizer rewrites the subquery into a view, which is then joined through a unique index to the employees table. This results in a significantly better plan, because the view (that is, subquery) has a selective predicate, thus returning only a few employee_ids. These employee_ids are then used to access the employees table through the unique index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 NESTED LOOPS 5
2 VIEW 3
3 SORT UNIQUE 3
4 TABLE ACCESS FULL ORDERS ANA 1
5 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 1
6 INDEX UNIQUE SCAN EMP_EMP_ID_PK ANA
11.5.3.4.2 Example 2: Using EXISTS - Selective Predicate in the Parent
This example demonstrates how rewriting a query to use EXISTS can improve performance. This query identifies all employees from department 80 who are sales reps who have placed orders.
The following SQL statement uses IN:
SELECT /* IN example */
e.employee_id, e.first_name, e.last_name, e.department_id, e.salary
FROM employees e
WHERE e.department_id = 80 /* Note 5 */
AND e.job_id = 'SA_REP' /* Note 6 */
AND e.employee_id IN (SELECT o.sales_rep_id FROM orders o); /* Note 4 */
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The SQL statement was rewritten by the optimizer to use a view on the orders table, which requires sorting the data to return all unique employee_ids existing in the orders table. Because there is no predicate, many employee_ids are returned. The large list of resulting employee_ids are then used to access the employees table through the unique index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 NESTED LOOPS 125
2 VIEW 116
3 SORT UNIQUE 116
4 TABLE ACCESS FULL ORDERS ANA 40
5 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 1
6 INDEX UNIQUE SCAN EMP_EMP_ID_PK ANA
The following SQL statement uses EXISTS:
SELECT /* EXISTS example */
e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.department_id = 80 /* Note 5 */
AND e.job_id = 'SA_REP' /* Note 6 */
AND EXISTS (SELECT 1 /* Note 1 */
FROM orders o
WHERE e.employee_id = o.sales_rep_id); /* Note 2 */
The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The cost of the plan is reduced by rewriting the SQL statement to use an EXISTS. This plan is more effective, because two indexes are used to satisfy the predicates in the parent query, thus returning only a few employee_ids. The employee_ids are then used to access the orders table through an index.
ID OPERATION OPTIONS OBJECT_NAME OPT COST
---- -------------------- --------------- ---------------------- --- ----------
0 SELECT STATEMENT CHO
1 FILTER
2 TABLE ACCESS BY INDEX ROWID EMPLOYEES ANA 98
3 AND-EQUAL
4 INDEX RANGE SCAN EMP_JOB_IX ANA
5 INDEX RANGE SCAN EMP_DEPARTMENT_IX ANA
6 INDEX RANGE SCAN ORD_SALES_REP_IX ANA 8
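To see why the position of the selective predicate matters, here is a toy cost model in Python. It is not Oracle's optimizer: the data is synthetic, dictionaries stand in for the indexes, and `work` simply counts rows visited. The hypothetical `drive_from_subquery` mimics the IN plans (evaluate the subquery, de-duplicate the ids, probe the unique index on employees), and `drive_from_parent` mimics the EXISTS plans (take the parent rows, probe an index on orders for each one).

```python
from collections import defaultdict

# Synthetic data, loosely shaped like the documentation's schema (values invented).
employees = [{"employee_id": i,
              "department_id": (i % 10) * 10 + 10,
              "job_id": "SA_REP" if i % 3 == 0 else "CLERK"} for i in range(1000)]
orders = [{"order_id": i, "customer_id": 100 + i % 100,
           "sales_rep_id": (i * 37) % 1000} for i in range(2000)]


def index_on(rows, *cols):
    """A dict standing in for a B-tree index: key tuple -> matching rows."""
    idx = defaultdict(list)
    for row in rows:
        idx[tuple(row[c] for c in cols)].append(row)
    return idx


emp_by_id = {e["employee_id"]: e for e in employees}  # unique index
ord_by_rep = index_on(orders, "sales_rep_id")
ord_by_cust = index_on(orders, "customer_id")
emp_by_dept_job = index_on(employees, "department_id", "job_id")


def drive_from_subquery(sub_rows, parent_pred):
    """IN-style plan: subquery rows -> distinct ids -> unique-index lookups."""
    work, ids = 0, set()
    for o in sub_rows:
        work += 1
        ids.add(o["sales_rep_id"])
    hits = []
    for emp_id in sorted(ids):
        work += 1
        e = emp_by_id.get(emp_id)
        if e is not None and parent_pred(e):
            hits.append(emp_id)
    return hits, work


def drive_from_parent(parent_rows, sub_pred):
    """EXISTS-style plan: parent rows -> index probe into orders per row."""
    work, hits = 0, []
    for e in parent_rows:
        work += 1
        for o in ord_by_rep.get((e["employee_id"],), []):
            work += 1
            if sub_pred(o):
                hits.append(e["employee_id"])
                break
    return sorted(hits), work


def always(_row):
    return True


# Example 1 shape: the selective predicate (customer_id = 144) is in the subquery.
ex1_in = drive_from_subquery(ord_by_cust[(144,)], always)
ex1_exists = drive_from_parent(employees, lambda o: o["customer_id"] == 144)

# Example 2 shape: the selective predicates (department 80, SA_REP) are in the parent.
ex2_in = drive_from_subquery(
    orders, lambda e: e["department_id"] == 80 and e["job_id"] == "SA_REP")
ex2_exists = drive_from_parent(emp_by_dept_job[(80, "SA_REP")], always)

print("example 1 work: IN", ex1_in[1], "EXISTS", ex1_exists[1])
print("example 2 work: IN", ex2_in[1], "EXISTS", ex2_exists[1])
```

With these invented numbers, the IN-style plan does less work in the Example 1 shape and the EXISTS-style plan does less work in the Example 2 shape, mirroring the documentation's rule of thumb; real costs depend on the optimizer, statistics and storage.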

Will Oracle optimizer use multiple Hints in the same SELECT?

I'm trying to optimize query performance and have had to resort to using optimizer hints. But I've never learned if the optimizer will use more than one hint at a time.
e.g.
SELECT /*+ INDEX(i dcf_vol_prospect_ids_idx)*/
/*+ LEADING(i vol) */
/*+ ALL_ROWS */
i.id_number,
...
FROM i_table i
JOIN vol_table vol on vol.id_number = i.id_number
JOIN to_a_bunch_of_other_tables...
WHERE i.solicitor_id = '123'
AND vol.solicitable_ind = 1;
The explain plan shows the same cost, but I know that's just an estimate.
Please assume that all table and index statistics have been calculated. FYI, the index dcf_vol_prospect_ids_idx is on the i.solicitor_id column.
Thanks,
Stew
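Try specifying all the hints in a single comment block: Oracle recognizes only one hint comment per statement block, the one immediately following the SELECT keyword, so hints placed in additional /*+ ... */ comments are ignored. See this example from the wonderful Oracle documentation (http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm).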
16.2.1 Specifying a Full Set of Hints
When using hints, in some cases, you might need to specify a full set of hints in order to ensure the optimal execution plan. For example, if you have a very complex query, which consists of many table joins, and if you specify only the INDEX hint for a given table, then the optimizer needs to determine the remaining access paths to be used, as well as the corresponding join methods. Therefore, even though you gave the INDEX hint, the optimizer might not necessarily use that hint, because the optimizer might have determined that the requested index cannot be used due to the join methods and access paths selected by the optimizer.
In Example 16-1, the LEADING hint specifies the exact join order to be used; the join methods to be used on the different tables are also specified.
Example 16-1 Specifying a Full Set of Hints
SELECT /*+ LEADING(e2 e1) USE_NL(e1) INDEX(e1 emp_emp_id_pk)
USE_MERGE(j) FULL(j) */
e1.first_name, e1.last_name, j.job_id, sum(e2.salary) total_sal
FROM employees e1, employees e2, job_history j
WHERE e1.employee_id = e2.manager_id
AND e1.employee_id = j.employee_id
AND e1.hire_date = j.start_date
GROUP BY e1.first_name, e1.last_name, j.job_id ORDER BY total_sal;
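Following the single-comment advice above, a statement like the one in the question needs its separate /*+ ... */ comments merged into one. The helper below is a hypothetical, naive text sketch of that merge. It assumes the statement has a single query block (subqueries can carry their own hint comments, and this helper would wrongly pull them into one block), it ignores string literals and ordinary comments, and it drops lines left blank by the removal.

```python
import re

# Matches one Oracle-style hint comment and captures its body.
HINT_COMMENT = re.compile(r"/\*\+(.*?)\*/", re.DOTALL)


def merge_hint_comments(sql: str) -> str:
    """Hypothetical helper: fold all hint comments into the first one."""
    bodies = [m.group(1).strip() for m in HINT_COMMENT.finditer(sql)]
    if len(bodies) <= 1:
        return sql  # already a single hint comment (or none at all)
    combined = "/*+ " + " ".join(bodies) + " */"
    first = True

    def replace(_match):
        nonlocal first
        if first:  # the first comment's position receives all the hints
            first = False
            return combined
        return ""  # later hint comments are removed

    merged = HINT_COMMENT.sub(replace, sql)
    # Drop lines that became empty (this also drops any pre-existing blank lines).
    return "\n".join(line for line in merged.splitlines() if line.strip())


question_sql = """SELECT /*+ INDEX(i dcf_vol_prospect_ids_idx)*/
/*+ LEADING(i vol) */
/*+ ALL_ROWS */
i.id_number
FROM i_table i"""
print(merge_hint_comments(question_sql))
```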
Oracle 19c introduced the Hint Usage Reporting feature:
EXPLAIN PLAN FOR
SELECT /*+ INDEX(i dcf_vol_prospect_ids_idx)*/
/*+ LEADING(i vol) */
/*+ ALL_ROWS */
i.id_number,
...
FROM i_table i
JOIN vol_table vol on vol.id_number = i.id_number
JOIN to_a_bunch_of_other_tables...
WHERE i.solicitor_id = '123'
AND vol.solicitable_ind = 1;
SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));
The output then includes an additional Hint Report section:
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: ...
---------------------------------------------------
...
In fact, Jonathan Lewis, author of Cost-Based Oracle Fundamentals, recommends that if the CBO fails to find the correct plan, you take over the job of the CBO and "layer in" the hints: an average of two hints per table in the query.
The reason is that one hint could lead to yet another bad and possibly even worse plan than the CBO would get unaided. If the CBO is wrong, you need to give it the whole plan, not just a nudge in the right direction.