Difference between Apache Ignite SQL and Scan Query? - ignite

I'm wondering what's the difference between Ignite SQL and Scan Query, including but not limited to:
What are strengths and weaknesses of them?
Does Scan Query support Index like SQL?
Is there any query optimization for SQL/Scan Query specifically?
Performance comparison between them?
Ignite SQL makes use of the H2 database to query the cache. Does Scan Query work this way too? If not, how does it work?
Thanks

SqlQuery
(+) Can provide better performance with proper indexes
(+) Full power of SQL: Joins, aggregations, groupings
(-) Requires QueryEntity configuration
(-) Consumes more memory for indexes and internal data structures
ScanQuery
(+) In some cases, more flexible, since filter and transformer can contain arbitrary code
(-) Always performs full cache scan (does not use indexes in any way)
(-) Requires filter and transformer code (classes) to be deployed to the server nodes
I would say that SQL is the default choice for most use cases. ScanQuery is useful when the filter/transformer logic can't be expressed in SQL and/or requires custom method calls.
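As a rough analogy only, the trade-off above can be sketched with Python's built-in sqlite3 module. This is not Ignite's API: the table `person`, the index `idx_person_age` and the helper `digit_sum_is_even` are made up for illustration. The SQL path states a declarative predicate that an index can serve, while the scan path visits every row and applies arbitrary code.

```python
import sqlite3

# In-memory SQLite database standing in for a cache (assumption: analogy only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_person_age ON person (age)")
conn.executemany(
    "INSERT INTO person (id, name, age) VALUES (?, ?, ?)",
    [(i, f"user{i}", 20 + i % 50) for i in range(1000)],
)

# SQL-style query: a declarative predicate that the engine can answer via the index.
sql_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM person WHERE age = ? ORDER BY id", (42,))
]


def digit_sum_is_even(name):
    """Arbitrary filter logic (hypothetical) that is awkward to express in SQL."""
    return sum(int(ch) for ch in name if ch.isdigit()) % 2 == 0


# Scan-style query: read every row and run custom code against it.
scan_ids = [
    pid
    for pid, name, age in conn.execute("SELECT id, name, age FROM person ORDER BY id")
    if age == 42 and digit_sum_is_even(name)
]

print(len(sql_ids), len(scan_ids))
```

The scan path is more flexible, but it touches all 1000 rows no matter how selective the filter is, which mirrors the "always performs full cache scan" point above.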

Related

High query execution time

SELECT T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME,
ROUND((SUM (T.TOT_AVAIL_TIME-T.maintenance_TIME) / SUM(T.TOT_AVAIL_TIME))*100) AVAILABILITY,
ROUND((SUM(UTIL_TIME) / nullif(SUM(T.TOT_AVAIL_TIME-T.maintenance_TIME ),0) )*100) UTILIZATION,
ROUND(SUM(T.failedcmds)/
SUM(T.total_failedcmds),2)*100 failedcmds,
AVG(MAX_QL) MAX_QL,
AVG(AVG_QL) avg_ql
FROM db1 T,
db2 mt
WHERE T.eq_model = mt.eq_model
and TOT_AVAIL_TIME != 0
AND TOT_AVAIL_TIME IS NOT NULL
GROUP BY T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME
This query returns 2550 records and takes 12-15 s to run in SQL Developer; in MyBatis it takes 30-35 s. Is there anything wrong with the existing query? Is there any way to optimize it and bring the execution time down to <5 s in SQL Developer and <15 s in the ORM?
Explain Plan
Let me bring up a few points to consider about performance.
The fact is that MyBatis, or any other ORM, is going to be slower than native SQL execution. Some of the reasons are:
R1. MyBatis adds overhead to database calls. You gain flexibility, maintainability and encapsulation, but you lose some performance.
R2. In most cases, MyBatis maps the ResultSet to Java objects, which adds overhead. SQL clients can work directly with cursors, which is faster but may be harder to maintain in the long run.
R3. In most cases, MyBatis creates a transaction for you, which adds overhead.
R4. In most cases, MyBatis creates and manages a cache for you, which adds overhead for the first select but speeds up subsequent selects.
R5. MyBatis can also help you with lazy loading and other data retrieval strategies.
R6. We should compare MyBatis executions against plain JDBC executions rather than against a SQL client tool (such as SQL Developer), because such tools introduce variables that can obscure the results. For example, they may not fetch all rows at once.
That being said, MyBatis gives you flexibility, better maintainability, an easy way to handle transactions and an easy way to map tables to Java objects, but it costs you some performance.
So, if you want to speed up your queries there are a few things you should consider. See MyBatis Documentation:
C1. Use fetchSize in your queries.
C2. Use the cache wisely: check what kind of caching is needed. In some cases it makes sense not to use the cache at all. Example: <select ... useCache="false">
C3. Be aware of the "N+1 Selects Problem". The documentation has some insights about this characteristic of many ORMs.
C4. Try to use a lightweight transactionManager such as "JDBC". Keep in mind that aspect-oriented transactions (such as Spring's @Transactional) add a little more overhead.
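To make R6 and C1 concrete, here is a minimal sketch of draining a result set in fixed-size batches, using Python's sqlite3 module as a stand-in. It is not MyBatis or JDBC: the table `result_rows` and the helper `fetch_all_in_batches` are hypothetical, and an in-process SQLite database has no network round trips, so the sketch only shows the batching mechanics. A fair timing should cover the whole loop, not just the first batch that a client tool might display.

```python
import sqlite3

# Stand-in data set with the same row count as the question (2550 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result_rows (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO result_rows (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(2550)],
)


def fetch_all_in_batches(connection, sql, batch_size):
    """Drain the full result set, pulling at most batch_size rows per fetch call."""
    cursor = connection.execute(sql)
    total_rows = 0
    batches = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        batches += 1
        total_rows += len(batch)
    return total_rows, batches


total_rows, batches = fetch_all_in_batches(
    conn, "SELECT id, payload FROM result_rows", batch_size=500
)
print(total_rows, batches)
```

With a remote database, a larger batch size generally means fewer round trips for the same number of rows, which is the effect the fetchSize setting is meant to control.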
If the tables aren't indexed properly yet, you could get the most out of indexing T on TOT_AVAIL_TIME and eq_model, and indexing mt on eq_model.
Put T.eq_model and T.TOT_AVAIL_TIME in one index (in this order).
Note: as you also check TOT_AVAIL_TIME for NULL, I assume the column is nullable by definition. If you index it on its own in a simple index, NULLs are not indexed, and a TOT_AVAIL_TIME IS NOT NULL filter won't use the index later. So either combine it with eq_model, or remove that filter and change the column to NOT NULL.
I used this workaround for nullable columns on JIRA: (PROJECT_CAN_HAS_NULL ASC, '1').
Add an index on the eq_model column and check again.
Please try this query; I only combined the WHERE conditions on TOT_AVAIL_TIME:
SELECT T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME,
ROUND((SUM (T.TOT_AVAIL_TIME-T.maintenance_TIME) / SUM(T.TOT_AVAIL_TIME))*100) AVAILABILITY,
ROUND((SUM(UTIL_TIME) / nullif(SUM(T.TOT_AVAIL_TIME-T.maintenance_TIME ),0) )*100) UTILIZATION,
ROUND(SUM(T.failedcmds)/
SUM(T.total_failedcmds),2)*100 failedcmds,
AVG(MAX_QL) MAX_QL,
AVG(AVG_QL) avg_ql
FROM db1 T,
db2 mt
WHERE T.eq_model = mt.eq_model
AND (TOT_AVAIL_TIME != 0 AND TOT_AVAIL_TIME IS NOT NULL)
GROUP BY T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME

SQL Server missing index in execution plan

We have created a view which contain 50 joins and some correlated subqueries.
When I look at the execution plan, it does not recommend any missing indexes.
Could you please let me know why SQL Server is not showing any missing index suggestions for the running statement?
Here is my understanding. SQL is a declarative language.
That means you only need to specify what data you want and where you want it from.
The rest is the server's job.
SQL Server uses a cost-based optimizer (CBO) to determine which access method to use. If you run select * from tablename without a WHERE clause, no index will be used.
While an index improves performance in certain cases, in others it is a hindrance.
SQL Server uses statistics to determine which access method to use: index seek, index scan, clustered index seek, clustered index scan, etc.
So the answer to your question is probably one of the following:
1. the statistics are not up to date
2. you use select without a WHERE clause
3. your database fits in memory, so no index is used

Does SQLite have an eval command?

I have a compound query, referenced in "Why is SQLite refusing to use available indexes when adding a JOIN?". When the segments of the query are evaluated individually, the generated query plans use the relevant indexes and run smoothly. However, when the segments are run together (via a JOIN), they do not. Therefore, I was wondering whether there is a way to write a query that runs 'eval' on the subquery and passes the result to the outer query, forcing SQLite to use the query plans that would have been generated had the segments been run individually.
The answer to your other question tells you why already: indexes are not used when they're not useful.
In essence:
If it's cheapest to hop back and forth on disk pages to fetch a handful of rows that match a query, an index gets used.
If it's cheapest to just read the entire mess and filter out unneeded rows, an index is not used.
Some databases (e.g. Postgres) offer an intermediary level between the two in the form of a bitmap index scan: it amounts to the second with a pre-flight check based on the index, to avoid visiting disk pages that contain no matching rows.
That's all there is to it, really: a few rows, index; lots of rows, no index.
Naturally, poorly written queries don't use indexes either, but that's for different reasons: they just confuse the query planner, which, while smart, is not all-knowing. Joining on a union or an aggregate, in particular, is a prime recipe for not using indexes. (And that is what you are doing.)
As usual, you should write your queries and indexes so that SQLite's query optimizer recognizes the optimal indexes and simply uses them.
But since your question is more specific, it seems you are looking for an equivalent of an index hint, such as SQL Server's WITH (INDEX(...)) table hint or MySQL's FORCE INDEX.
From what I have read, SQLite has the INDEXED BY clause, though opinions about it in the SQLite community seem split (probably because of what I mentioned in my first sentence).
link 1 sqlite.org's documentation about it
link 2 for a tutorial on that
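A minimal sketch of the INDEXED BY clause mentioned above, run through Python's sqlite3 module. The table `t` and the index `idx_t_b` are made up for illustration and are not taken from the asker's schema. EXPLAIN QUERY PLAN is used to check which index SQLite reports for the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_t_b ON t (b)")
conn.executemany(
    "INSERT INTO t (b, c) VALUES (?, ?)",
    [(i % 10, f"v{i}") for i in range(100)],
)

# INDEXED BY names the index the statement must use; if SQLite cannot use it,
# preparing the statement fails instead of silently picking another plan.
query = "SELECT c FROM t INDEXED BY idx_t_b WHERE b = ? ORDER BY a"

# The human-readable plan description is the last column of each plan row.
plan_details = [
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (5,))
]
rows = conn.execute(query, (5,)).fetchall()

print(plan_details)
print(len(rows))
```

The clause pins one index for one table reference; it does not make SQLite evaluate a subquery separately, so it only partly addresses the 'eval' idea in the question.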

MS SQL Server optimizer and varying table and field aliases

We have a lot of queries for which we append a random alias at the end of field and table names (due to a custom ORM implementation that might be hard to change). The queries are like the following (though substantially more complex, most of the time):
SELECT fooA.field1 as field1B,
fooA.field2 as field1C
FROM foo as fooA
The suffixes A, B and C are randomly generated (and longer than one character). Will this hurt performance of our queries (i.e. will the optimizer not be able to recognize repeated queries due to the random part)? We mainly use SQL Server 2005.
Yes, the optimizer will need to reparse and recompile your query each time, since the query hash will change.
The Query Optimizer Engine uses the Execution plan which goes by ObjectId - the aliases are purely for programming purposes, but are not used during execution. So I do not think that performance will be affected by using different aliases or by using small or long aliases.
Quassnoi makes a good point about the rehashing of the query. Although the performance of the query itself is not impacted, the overall performance of the environment will be.
If these are ad-hoc queries, then each will get compiled and cached as a separate query. In SQL Server, for a query to match a cached plan, the text must be identical (down to case and whitespace).
If you've got lots of queries that differ only in the table aliases, then you're going to have very poor plan reuse, lots of compiles and a very big plan cache.
Only stored procedures are matched to cached plans by object ID.
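The plan-reuse argument can be illustrated with a toy model. This is not how SQL Server implements its plan cache: `ToyPlanCache` and `build_query` are hypothetical, and the only property modelled is the one stated above, that ad-hoc queries match a cached plan only when their text is identical.

```python
import hashlib


class ToyPlanCache:
    """Toy cache keyed by a hash of the exact query text (assumption: illustration only)."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def get_plan(self, sql_text):
        key = hashlib.sha256(sql_text.encode("utf-8")).hexdigest()
        if key not in self.plans:
            # Cache miss: "compile" a new plan and store it under this text's hash.
            self.compilations += 1
            self.plans[key] = f"plan-{self.compilations}"
        return self.plans[key]


def build_query(suffix):
    """Build the query from the question with a given alias suffix."""
    return f"SELECT foo{suffix}.field1 AS field1{suffix} FROM foo AS foo{suffix}"


# Random suffixes: every execution produces a different text, so nothing is reused.
random_alias_cache = ToyPlanCache()
for suffix in ["Xk3", "Qz9", "Lm2"]:
    random_alias_cache.get_plan(build_query(suffix))

# Stable suffix: the text repeats, so one compiled plan serves all executions.
stable_alias_cache = ToyPlanCache()
for _ in range(3):
    stable_alias_cache.get_plan(build_query("A"))

print(random_alias_cache.compilations, stable_alias_cache.compilations)
```

In this model the random aliases cost one compilation and one cache entry per execution, which is the "lots of compiles and a very big plan cache" effect described above.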

How does including a SQL index hint affect query performance?

Say I have a table in a SQL 2005 database with 2,000,000+ records and a few indexes. What advantage is there to using index hints in my queries? Are there ever disadvantages to using index hints in queries?
First, try using SQL Profiler to generate a .trc file of activity in your database for a normal workload over a few hours. And then use the "Database Engine Tuning Advisor" on the SQL Server Management Studio Tools menu to see if it suggests any additional indexes, composite indexes, or covering indexes that may be beneficial.
I never use query hints and mostly work with multi-million row databases. They sometimes can affect performance negatively.
The key point that I believe everyone here is pointing to is that, with VERY careful consideration, index hints can improve the performance of your queries IF AND ONLY IF multiple indexes exist that could be used to retrieve the data AND SQL Server is not using the correct one.
In my experience it is NOT very common to need index hints; I believe I have maybe 2-3 queries in use today that use them. Proper index creation and database optimization should get you most of the way to a well-performing database.
The index hint will only come into play where your query involves joining tables, and where the columns being used to join to the other table match more than one index. In that case the database engine may choose one index to make the join, and from investigation you may know that the query will perform better if it uses another index. In that case you provide the index hint, telling the database engine which index to use.
My experience is that sometimes you know more about your dataset than SQL Server does. In that case you should use query hints. In other words: you help the optimizer decide.
I once built a data warehouse where SQL Server did not use the optimal index on a complex query. By giving an index hint in my query, I managed to make the query go about 100 times faster.
Use them only after you have analysed the query plan. If you think your query can run faster when using another index or by using the indexes in a different order, give the server a hint.