Why can't I query a subset of data from DataStax DSE 5.0.x Graph without getting an allow_scan is disabled error? - datastax

Hi, I have disabled scans in my schema.
I know that queries like these wouldn't be allowed:
g.V()
g.V().hasLabel("User")
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Could not find an index to answer query clause and graph.allow_scan is disabled:
I wonder why even these:
g.V().limit(2)
g.V().hasLabel("User").limit(2)
cause the same exception to be thrown! This is frustrating, since they are bounded queries and they certainly don't cause full Cassandra table scans.
Thanks

There's an ongoing discussion about what kinds of queries (if any at all) should be allowed with scans disabled. For now the rule is simple: if the initial step is not an index lookup, then the query is considered to be a full scan.
It's easy to say that:
g.V().hasLabel("user").limit(2)
...for example should be allowed, but if that is not considered to be a full scan, then what about these:
g.V().hasLabel("user").limit(10)
g.V().hasLabel("user").limit(100)
g.V().hasLabel("user").limit(1000)
g.V().hasLabel("user").limit(10000)
g.V().hasLabel("user").limit(100000)
Where do we draw the line? I'm not expecting you to answer that question; I just want to show that it's not as easy as it may seem at first.
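For what it's worth, the usual way out is to make the first step of the traversal an index lookup. A minimal sketch using the DSE Graph schema API, assuming a hypothetical name property on the User vertex label:
// Hypothetical materialized index on User.name (DSE Graph 5.0.x schema API)
schema.vertexLabel("User").index("userByName").materialized().by("name").add()
// A traversal whose initial step hits that index is allowed even with scans disabled:
g.V().hasLabel("User").has("name", "alice").limit(2)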

Related

The function "PROBE" on column may be causing a table scan

I used SQL Doctor from Idera against my database. It generated a report, and in the "Query Optimization" category I got the finding "The function "PROBE" on column may be causing a table scan". The tool provided the link http://sqldoctor.idera.com/query-optimization/implicit-conversion-recommendation/ but I can't find anything there related to PROBE.
Does anyone know what it stands for and where I can find the exact details for it?
I don't normally like to do all-link answers, but you asked for "what it stands for and where [you can] find the exact details for it."
Here's a nice summary explanation: Probe Residual on Hash Match
Here's a long Microsoft explanation: Interpreting Execution Plans Containing Bitmap Filters.
And here's one that I think might be the most helpful: Probe Residual when you have a Hash Match – a hidden cost in execution plans
And here's my two cents as well. Without seeing your queries, tables, or execution plan, I'm mostly guessing, but I would say that the fact that you were directed to that page in the documentation suggests that you are doing a join that requires an implicit conversion. Since PROBE is associated with hash matches, I infer your join is one of those.
So my guess is that you are joining on two or more fields that have mismatched data types, and that the conversion this requires means that the indexes on one of your tables can't be used. Without a usable index, the query engine needs to do a table scan, a very expensive operation (particularly if you have a large table).
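To make that concrete, here is a hedged sketch (table and column names are hypothetical) of the kind of join that triggers the implicit conversion:
-- Orders.CustomerCode is VARCHAR(10); Customers.Code is NVARCHAR(10).
-- Data type precedence converts the VARCHAR side (CONVERT_IMPLICIT),
-- so the index on Orders.CustomerCode may no longer be seekable.
SELECT o.OrderId, c.Name
FROM dbo.Orders o
JOIN dbo.Customers c ON o.CustomerCode = c.Code;

-- Fix: align the types so no conversion is needed.
ALTER TABLE dbo.Orders ALTER COLUMN CustomerCode NVARCHAR(10);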

Give DB2 a hint about which index to use?

Hello,
I have a join over some tables, and I want to give the DB2 database a hint about which index I want it to use. I know this may result in a slow query, but I have a production and a test database, and I want the same behaviour in both databases (even if the amount of data differs significantly between them, or the index cache is in a different state).
Is this possible (and how)? I did not find anything in the online manual, which could mean I had the wrong search criteria.
Thanks a million.
This is not something that is commonly done with DB2. However, you can use the SELECTIVITY clause. It should still be available in current versions. Adding SELECTIVITY clauses to queries will affect the decisions made by the query optimizer.
Also, what Gilbert Le Blanc noted above will work. You can UPDATE the syscat.tables columns and fool DB2 into optimizing the queries for non-existent amounts of data in the rows. The rest of your DB / DBM CFG should also match (i.e. the calculated disk and CPU speeds, memory-usage-related settings, etc.), because in some situations they can also matter to some degree.
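For illustration, a SELECTIVITY clause looks roughly like this (table and column names are hypothetical; the clause applies to predicates with parameter markers or host variables, and depending on your DB2 version it may need to be enabled via the DB2_SELECTIVITY registry variable):
-- Tells the optimizer this predicate matches roughly this fraction of rows,
-- nudging it toward an index on customer_id.
SELECT order_id, order_date
FROM orders
WHERE customer_id = ? SELECTIVITY 0.000001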
You can influence the optimizer via a Profile:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1202storedprocedure/index.html
It was recently asked here: Is it possible to replace NL join with HS join in sql
However, I hadn't heard about the SELECTIVITY clause, and I think you should try that option first, before creating a profile. But you should do either only after having tried other options. Follow the steps indicated in the developerWorks tutorial before influencing the optimizer:
Experiment with different SQL optimization classes. The default optimization class is controlled by the DFT_QUERYOPT parameter in the database configuration file.
Attempt to resolve any performance problems by ensuring that proper database statistics have been collected. The more detailed the statistics, the better the optimizer can perform. (See RUNSTATS in the DB2 Command Reference).
If the poor access plan is the result of rapidly changing characteristics of the table (i.e. it grows so quickly that statistics get out of date fast), try marking the table as VOLATILE using the ALTER TABLE command (see the sketch after this list).
Try explaining the query using literal values instead of parameter markers in your predicates. If you are getting different access plans when using parameter markers, it will help you understand the nature of the performance problem better. You may find that using literals in your application will yield a better plan (and therefore better performance) at the cost of SQL compilation overhead.
Try using DB2’s index advisor (db2advis) to see if there are any useful indexes which you may have overlooked.
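For reference, a sketch of what the RUNSTATS, VOLATILE and db2advis steps look like in practice (schema, table and file names are hypothetical):
-- Collect detailed statistics (run from the DB2 command line processor):
RUNSTATS ON TABLE myschema.orders WITH DISTRIBUTION AND DETAILED INDEXES ALL;
-- Mark a rapidly changing table as volatile so the optimizer prefers index access:
ALTER TABLE myschema.orders VOLATILE CARDINALITY;
-- The index advisor is a shell command, not SQL: db2advis -d MYDB -i myquery.sql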

Indexing does not work with the LIKE and NOT operators in SQL Server. Is it a myth?

I have seen many articles on SQL Server stating that when we are writing a query we should avoid using the NOT and LIKE operators, because indexes are not used with them.
eg.
SELECT [CityId]
,[CityName]
,[StateId]
FROM [LIMS].[dbo].[City]
WHERE CityId NOT IN(1, 2)
I executed the above query and found that an index was being used to filter the records.
The following is the execution plan, which clearly shows a Clustered Index Seek. This clearly contradicts what I used to think and read.
Was my previous understanding incorrect?
That indexing does not work with LIKE and NOT operators is just a rule of thumb. SQL Server (or any competent RDBMS, for that matter) will use the best algorithm it can in most cases. So, if you could manually seek an index, so could SQL Server.
In the particular example you provided, it is unclear whether a seek is any more efficient than a scan because most of the records are going to be returned anyhow. So, I wouldn't read much into that particular execution plan.
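To see where the rule of thumb does and doesn't hold for LIKE, consider the same table with a hypothetical index on CityName:
-- Sargable: the known prefix bounds the range, so an index seek is possible.
SELECT CityId FROM dbo.City WHERE CityName LIKE 'San%';
-- Not sargable: a leading wildcard forces every value to be examined,
-- though the engine may still scan the narrower index instead of the table.
SELECT CityId FROM dbo.City WHERE CityName LIKE '%ville';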
Bottom line: Learn and understand how your database system internally organizes data and indices so you won't have to rely on rules of thumb.

Reading SQL deadlock graph

Can someone please help me to read/understand this deadlock graph?
I don't understand why process 75 is requesting a lock on an object that it already has a lock on.
According to a blog article that I've found, the existence of an "Exchange Event" indicates that the source of your problem may be parallelism in your query.
Today's Annoyingly-Unwieldy Term: "Intra-Query Parallel Thread Deadlocks"
The above article goes into much more detail, however the punchline is:
Workaround #1: Add an index or improve the query to eliminate the need for parallelism. In most cases, the use of parallelism in a query indicates that you have a very large scan, sort, or join that isn't supported by proper indexes. If you tune the query, you will often find that you end up with a much quicker and more efficient plan that doesn't use parallelism, and therefore isn't subject to this type of problem. Of course, in some queries (DSS/OLAP-type queries, in particular) it may be difficult to eliminate all large scans.
Workaround #2: Force single-threaded execution with an "OPTION (MAXDOP 1)" query hint at the end of the query. If you can't modify the query, you can apply the hint to any query with a plan guide.
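For example, workaround #2 applied to an arbitrary (hypothetical) query looks like this:
-- The hint caps this statement at one scheduler, removing the exchange
-- operators that can deadlock among themselves.
SELECT o.OrderId, SUM(d.Quantity)
FROM dbo.Orders o
JOIN dbo.OrderDetails d ON d.OrderId = o.OrderId
GROUP BY o.OrderId
OPTION (MAXDOP 1);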
You might want to try this to see if there is any improvement.

How do you interpret a query's explain plan?

When attempting to understand how a SQL statement is executing, it is sometimes recommended to look at the explain plan. What is the process one should go through in interpreting (making sense) of an explain plan? What should stand out as, "Oh, this is working splendidly?" versus "Oh no, that's not right."
I shudder whenever I see comments that full table scans are bad and index access is good. Full table scans, index range scans, fast full index scans, nested loops, merge joins, hash joins etc. are simply access mechanisms that must be understood by the analyst and combined with a knowledge of the database structure and the purpose of a query in order to reach any meaningful conclusion.
A full scan is simply the most efficient way of reading a large proportion of the blocks of a data segment (a table or a table (sub)partition), and, while it often can indicate a performance problem, that is only in the context of whether it is an efficient mechanism for achieving the goals of the query. Speaking as a data warehouse and BI guy, my number one warning flag for performance is an index-based access method and a nested loop.
So, for the mechanism of how to read an explain plan the Oracle documentation is a good guide: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/ex_plan.htm#PFGRF009
Have a good read through the Performance Tuning Guide also.
Also have a google for "cardinality feedback", a technique in which an explain plan can be used to compare the estimations of cardinality at various stages in a query with the actual cardinalities experienced during the execution. Wolfgang Breitling is the author of the method, I believe.
So, bottom line: understand the access mechanisms. Understand the database. Understand the intention of the query. Avoid rules of thumb.
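For the bare mechanics, a session in Oracle looks something like this (the query itself is hypothetical):
EXPLAIN PLAN FOR
SELECT e.last_name FROM employees e WHERE e.department_id = 10;

-- Render the plan that was just captured in the plan table:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);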
This subject is too big to answer in a question like this. You should take some time to read Oracle's Performance Tuning Guide.
The two examples below show a FULL scan and a FAST scan using an INDEX.
It's best to concentrate on your Cost and Cardinality. Looking at the examples the use of the index reduces the Cost of running the query.
It's a bit more complicated (and I don't have a 100% handle on it), but basically the Cost is a function of CPU and I/O cost, and the Cardinality is the number of rows Oracle expects to parse. Reducing both of these is a good thing.
Don't forget that the Cost of a query can be influenced by your query and the Oracle optimiser model (e.g. COST, CHOOSE, etc.) and how often you run your statistics.
Example 1:
SCAN http://docs.google.com/a/shanghainetwork.org/File?id=dd8xj6nh_7fj3cr8dx_b
Example 2 using Indexes:
INDEX http://docs.google.com/a/fukuoka-now.com/File?id=dd8xj6nh_9fhsqvxcp_b
And as already suggested, watch out for TABLE SCAN. You can generally avoid these.
Looking for things like sequential scans can be somewhat useful, but the reality is in the numbers... except when the numbers are just estimates! What is usually far more useful than looking at a query plan is looking at the actual execution. In Postgres, this is the difference between EXPLAIN and EXPLAIN ANALYZE. EXPLAIN ANALYZE actually executes the query, and gets real timing information for every node. That lets you see what's actually happening, instead of what the planner thinks will happen. Many times you'll find that a sequential scan isn't an issue at all, instead it's something else in the query.
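For instance, the difference looks like this (the orders table is hypothetical):
-- Shows only the planner's estimates:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Actually runs the query and adds real per-node row counts and timings:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;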
The other key is identifying what the actual expensive step is. Many graphical tools will use different sized arrows to indicate how much different parts of the plan cost. In that case, just look for steps that have thin arrows coming in and a thick arrow leaving. If you're not using a GUI you'll need to eyeball the numbers and look for where they suddenly get much larger. With a little practice it becomes fairly easy to pick out the problem areas.
Really, for issues like these, the best thing to do is ASKTOM. In particular, his answer to that question contains links to the online Oracle docs, where a lot of those sorts of rules are explained.
One thing to keep in mind, is that explain plans are really best guesses.
It would be a good idea to learn to use sqlplus, and experiment with the AUTOTRACE command. With some hard numbers, you can generally make better decisions.
But you should ASKTOM. He knows all about it :)
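In sqlplus, the AUTOTRACE experiment mentioned above boils down to something like this (query is hypothetical; REM lines are SQL*Plus comments):
REM Show the plan and execution statistics after each statement
SET AUTOTRACE ON
SELECT e.last_name FROM employees e WHERE e.department_id = 10;
REM Or suppress the result rows and keep only the plan and statistics
SET AUTOTRACE TRACEONLY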
The output of the explain plan tells you how long each step has taken. The first thing is to find the steps that have taken a long time and understand what they mean. Things like a sequential scan tell you that you need better indexes - it is mostly a matter of research into your particular database and experience.
One "Oh no, that's not right" is often in the form of a table scan. Table scans don't utilize any special indexes and can contribute to purging of every useful in memory caches. In postgreSQL, for example, you will find it looks like this.
Seq Scan on my_table (cost=0.00..15558.92 rows=620092 width=78)
Sometimes table scans are ideal over, say, using an index to query the rows. However, this is one of those red-flag patterns that you seem to be looking for.
Basically, you take a look at each operation and see if the operations "make sense" given your knowledge of how it should be able to work.
For example, if you're joining two tables, A and B on their respective columns C and D (A.C=B.D), and your plan shows a clustered index scan (SQL Server term -- not sure of the oracle term) on table A, then a nested loop join to a series of clustered index seeks on table B, you might think there was a problem. In that scenario, you might expect the engine to do a pair of index scans (over the indexes on the joined columns) followed by a merge join. Further investigation might reveal bad statistics making the optimizer choose that join pattern, or an index that doesn't actually exist.
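As a rough illustration of that scenario (table and column names are hypothetical):
-- With usable indexes on both join columns and accurate statistics, one
-- would expect two index scans feeding a merge join here, rather than a
-- clustered index scan driving a nested loop of repeated seeks.
SELECT a.C, b.D
FROM dbo.A a
JOIN dbo.B b ON a.C = b.D;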
Look at the percentage of time spent in each subsection of the plan, and consider what the engine is doing. For example, if it is scanning a table, consider putting an index on the field(s) it is scanning for.
I mainly look for index or table scans. This usually tells me I'm missing an index on an important column that's in the where statement or join statement.
From http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx:
If you see any of the following in an execution plan, you should consider them warning signs and investigate them for potential performance problems. Each of them is less than ideal from a performance perspective.
* Index or table scans: May indicate a need for better or additional indexes.
* Bookmark Lookups: Consider changing the current clustered index, consider using a covering index, limit the number of columns in the SELECT statement.
* Filter: Remove any functions in the WHERE clause, don't include views in your Transact-SQL code, may need additional indexes.
* Sort: Does the data really need to be sorted? Can an index be used to avoid sorting? Can sorting be done at the client more efficiently?
It is not always possible to avoid these, but the more you can avoid them, the faster query performance will be.
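As one concrete example of the bookmark-lookup advice above, a covering index in SQL Server can be sketched like this (index, table and column names are hypothetical):
-- INCLUDE carries StateId in the index leaf level, so the lookup back to
-- the clustered index is no longer needed for this query shape:
CREATE NONCLUSTERED INDEX IX_City_CityName ON dbo.City (CityName) INCLUDE (StateId);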
Rules of Thumb
(you probably want to read up on the details too: Oracle Docs, ASKTOM, SQL Server Docs)
Bad
Table Scans of Several Large Tables
Good
Using a unique index
Index includes all required fields
Most Common Win
In about 90% of the performance problems I have seen, the easiest win is to break up a query with lots of tables (4 or more) into two smaller queries and a temporary table.
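A sketch of that pattern in T-SQL (all names are hypothetical):
-- Query 1: materialize the small, selective part into a temp table.
SELECT c.CustomerId, c.Region
INTO #active_customers
FROM dbo.Customers c
JOIN dbo.Subscriptions s ON s.CustomerId = c.CustomerId
WHERE s.Status = 'ACTIVE';

-- Query 2: join the remaining tables against the much smaller temp table.
SELECT a.Region, o.OrderId, d.Quantity
FROM #active_customers a
JOIN dbo.Orders o ON o.CustomerId = a.CustomerId
JOIN dbo.OrderDetails d ON d.OrderId = o.OrderId;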