IN vs OR in OQL (Pivotal GemFire)

When dealing with content spread across multiple tables (regions, in GemFire terms) on different nodes in a cluster, which operator gives faster results?
Say my current OQL search query looks like the following:
select * from /content_region where content_type = 'xyz' AND (shared_with.contains('john') OR shared_with.contains('michael') OR shared_with.contains('peter'))
Assume shared_with is a List.
References:
IN vs OR in the SQL WHERE Clause
SQL performance tuning for Oracle: many OR vs IN ()

As a direct answer, IN on an indexed field will be far more responsive than OR, but there are exceptions.
Some comments on your example:
select * from /content_region where content_type = 'xyz' AND (shared_with.contains('john') OR shared_with.contains('michael') OR shared_with.contains('peter'))
You want "content_type = 'xyz' AND" in front of your OR conditions, because GemFire will evaluate that condition first and then apply the contains() operations in memory to the smaller result set.
In the example that you provided, an "IN" clause cannot be applied with contains.
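A minimal Python model of that evaluation order, assuming a hash index on content_type. The region, its entries, and the build_index/query functions are hypothetical stand-ins, not GemFire's API or query engine. The point is only that the indexed equality shrinks the candidate set before the OR chain of contains() calls (an "any of these names is in the list" test) runs in memory.

```python
from collections import defaultdict

# Hypothetical region: key -> entry with a content_type and a shared_with list.
# This models the idea only; it is not the GemFire API.
REGION = {
    1: {"content_type": "xyz", "shared_with": ["john", "anna"]},
    2: {"content_type": "xyz", "shared_with": ["bob"]},
    3: {"content_type": "abc", "shared_with": ["peter"]},
    4: {"content_type": "xyz", "shared_with": ["michael", "peter"]},
}


def build_index(region, attr):
    """Hash index: attribute value -> set of keys (models an index on attr)."""
    index = defaultdict(set)
    for key, entry in region.items():
        index[entry[attr]].add(key)
    return index


def query(region, type_index, content_type, names):
    # Step 1: indexed equality lookup narrows the candidates cheaply.
    candidates = type_index.get(content_type, set())
    # Step 2: the OR chain of contains() calls, evaluated in memory only on
    # the candidates; any() stops at the first name found in the list.
    return sorted(
        key for key in candidates
        if any(name in region[key]["shared_with"] for name in names)
    )


type_index = build_index(REGION, "content_type")
print(query(REGION, type_index, "xyz", ["john", "michael", "peter"]))  # [1, 4]
```

Entry 3 shares with peter but is never examined by the contains() step, because the content_type lookup already excluded it.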
Before I leave this answer: I do use IN frequently with keys from another result set, and if the attribute is indexed, it is very fast.

Related

SQL: like v. equals performance comparison

I have a large table (100 million rows) which is properly indexed in a traditional RDBMS system (Oracle, MySQL, Postgres, SQL Server, etc.). I would like to perform a SELECT query which can be formulated with either of the following criteria options:
One that can be represented by a single criterion:
LIKE 'T40%'
which, because the only wildcard is at the end, matches only at the beginning of the string field
or
One that requires a list of, say, 200 exact values:
IN ('T40.x21', 'T40.x32', 'T40.x43')
etc.
All other things being equal, which should I expect to perform better?
Assuming that both queries return the same set of rows (i.e. the list of items that you supply in the IN expression is exhaustive), you should expect almost identical performance, perhaps with some advantage for the LIKE query.
RDBMS engines have long used index searches for begins-with LIKE queries, so LIKE 'T40%' will produce its records from an index range search.
Your IN query would be optimized for an index search as well, perhaps giving the RDBMS tighter lower and upper bounds. However, there would be an additional filtering step to eliminate records outside your IN list, which is a waste of CPU cycles under the assumption that all rows would be returned anyway.
If you parameterize your query, the second form is harder to pass to the RDBMS from your host program, because it needs one parameter per list value. All other things being equal, I would use LIKE.
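To make the parameterization point concrete, here is a sketch using Python's built-in sqlite3 module; SQLite is only a convenient stand-in for the engines named in the question, and the items table, code column, and both helper functions are hypothetical. The LIKE form needs one bind parameter, while the IN form needs SQL text generated with one placeholder per value. With an exhaustive list, both return the same rows.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for the RDBMS in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT)")
conn.execute("CREATE INDEX idx_items_code ON items (code)")
conn.executemany(
    "INSERT INTO items (code) VALUES (?)",
    [("T40.x21",), ("T40.x32",), ("T40.x43",), ("T41.x10",), ("A40.x21",)],
)


def find_by_prefix(conn, prefix):
    # A single bind parameter, however many rows match.
    # Note: % and _ inside `prefix` would act as wildcards.
    sql = "SELECT code FROM items WHERE code LIKE ? ORDER BY code"
    return [row[0] for row in conn.execute(sql, (prefix + "%",))]


def find_by_list(conn, codes):
    # An IN list needs one placeholder per value, so the SQL text itself
    # has to be generated to match the length of the list.
    if not codes:
        return []
    placeholders = ", ".join("?" for _ in codes)
    sql = f"SELECT code FROM items WHERE code IN ({placeholders}) ORDER BY code"
    return [row[0] for row in conn.execute(sql, list(codes))]


print(find_by_prefix(conn, "T40"))
print(find_by_list(conn, ["T40.x21", "T40.x32", "T40.x43"]))
```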
I would suggest going with the LIKE operator; if the pattern contains characters that must be matched literally (such as _ or %), use the ESCAPE option with the '\' symbol so the character string is matched exactly.
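A small sqlite3 sketch of the ESCAPE point, assuming codes that contain a literal underscore (the table and data are hypothetical). Without ESCAPE, _ is a single-character wildcard; with ESCAPE '\', the pattern T40\_% matches only codes that really start with T40_.

```python
import sqlite3

# Hypothetical codes: one with a literal underscore, one without.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT)")
conn.executemany("INSERT INTO items (code) VALUES (?)", [("T40_x21",), ("T40ax21",)])

# Without ESCAPE, _ matches any single character, so both rows qualify.
unescaped = [row[0] for row in conn.execute(
    "SELECT code FROM items WHERE code LIKE ? ORDER BY code", ("T40_%",))]

# With ESCAPE '\', the sequence \_ matches only a literal underscore.
# (The Python string "\\" is a single backslash in the SQL text.)
escaped = [row[0] for row in conn.execute(
    "SELECT code FROM items WHERE code LIKE ? ESCAPE '\\' ORDER BY code", ("T40\\_%",))]

print(unescaped)  # ['T40_x21', 'T40ax21']
print(escaped)    # ['T40_x21']
```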

Ordering the SQL where conditions so that the most commonly true is first

Consider a query like the one below, with several thousand records to process:
SELECT *
FROM A
WHERE A.COLUMN IN (some list) OR VARIABLE='SOMETHING'
I am confident that VARIABLE='SOMETHING' will be true in most cases. Would switching the WHERE condition to
WHERE VARIABLE='SOMETHING' OR A.COLUMN IN (some list)
so that the mostly true (and clearly less demanding to process) condition comes first give a nice performance boost? I don't have a development dataset big enough to test this myself.
As with most performance questions, you should test the two versions. However, unless the list is rather long, this probably will not make a difference in performance.
You can try adding indexes, particularly on A(column) and A(variable, column). Oracle is smart in its use of indexes, but it might not be quite smart enough in this case (you need to look at the execution plans). You could rephrase the query as:
SELECT *
FROM A
WHERE VARIABLE = 'SOMETHING'
UNION ALL
SELECT *
FROM A
WHERE A.COLUMN IN (some list) AND (VARIABLE <> 'SOMETHING' OR VARIABLE IS NULL);
Oracle should use both indexes in this case.
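A quick way to check that a UNION ALL rewrite returns the same rows as the OR form, sketched with Python's sqlite3 on hypothetical data (the answer targets Oracle, but the three-valued NULL logic is the same). If VARIABLE can be NULL, the second branch must also keep rows where VARIABLE IS NULL; using only VARIABLE <> 'SOMETHING' silently drops them.

```python
import sqlite3

# Hypothetical table a, with columns named after the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, col TEXT, variable TEXT)")
conn.executemany("INSERT INTO a (col, variable) VALUES (?, ?)", [
    ("x", "SOMETHING"), ("y", "SOMETHING"), ("x", "OTHER"),
    ("z", "OTHER"), ("y", None), ("z", None),
])

OR_SQL = """
SELECT id FROM a WHERE col IN ('x', 'y') OR variable = 'SOMETHING'
"""

# NULL-safe rewrite: the second branch excludes exactly the rows of the first.
UNION_SQL = """
SELECT id FROM a WHERE variable = 'SOMETHING'
UNION ALL
SELECT id FROM a WHERE col IN ('x', 'y')
  AND (variable <> 'SOMETHING' OR variable IS NULL)
"""

# Naive rewrite: NULL <> 'SOMETHING' is NULL, so row 5 is lost.
NAIVE_SQL = """
SELECT id FROM a WHERE variable = 'SOMETHING'
UNION ALL
SELECT id FROM a WHERE col IN ('x', 'y') AND variable <> 'SOMETHING'
"""


def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))


print(ids(OR_SQL), ids(UNION_SQL), ids(NAIVE_SQL))
```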

What is the best way to use sdo_relate in Oracle Spatial 10g?

I'm using the sdo_relate operator of Oracle Spatial in a client-server environment to query two tables, each with thousands of geometry objects. I apply a condition in the WHERE clause so that only one object is passed to the so-called query window.
Using the /*+ ORDERED */ hint and the required order of tables in the FROM clause (as documented in the Oracle Spatial reference), I get bad performance:
SELECT /*+ ORDERED */ A.someAttr FROM Polygons A,lines B WHERE
B.id=someValue AND sdo_relate(B.geom,A.geom,
'mask=anyinteract') = 'TRUE'; --6 Min!
I think it is the other way around, because without the ORDERED hint it takes 50 seconds (which still needs to be optimized).
Anyway, it seems that the Spatial documentation is wrong!
http://docs.oracle.com/cd/B19306_01/appdev.102/b14255/sdo_operat.htm#i78531
Has anyone had a similar experience? I look forward to your solutions.
The key thing is that your query is incorrectly written. In all spatial operators, the first column is that from the table you search, the second one is your query window. So rewrite your query like this:
SELECT A.someAttr
FROM Polygons A,lines B
WHERE B.id=someValue
AND sdo_relate(A.geom,B.geom,'mask=anyinteract') = 'TRUE';
or simpler:
SELECT A.someAttr
FROM Polygons A,lines B
WHERE B.id=someValue
AND sdo_anyinteract(A.geom,B.geom) = 'TRUE';
This will be much faster than the 50 seconds you indicate. And the hint is not necessary at all.
Assuming you want to do the reverse operation (= search for all LINES that intersect a given POLYGON), then you would write this:
SELECT B.someAttr
FROM Polygons A,lines B
WHERE A.id=someValue
AND sdo_anyinteract(B.geom,A.geom) = 'TRUE';
In other words you need to order the arguments to SDO_ANYINTERACT in such a way that the first is the name of the column you search and the second is your search window.
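To make the argument roles concrete, here is a toy Python model: the first argument is the collection being searched, the second is a single window geometry. It compares bounding boxes only (roughly what a primary filter does), uses no spatial index, and the BBox class, anyinteract function, and data are hypothetical; it is not Oracle Spatial's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box; a stand-in for a geometry's extent."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def interacts(self, other: "BBox") -> bool:
        # Boxes interact unless one lies entirely to one side of the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin
                    or self.ymax < other.ymin or other.ymax < self.ymin)


def anyinteract(searched: dict, window: BBox) -> list:
    """Return ids from the searched collection whose boxes touch the window."""
    return sorted(gid for gid, box in searched.items() if box.interacts(window))


polygons = {1: BBox(0, 0, 2, 2), 2: BBox(5, 5, 6, 6), 3: BBox(1, 1, 4, 4)}
lines = {10: BBox(1.5, 1.5, 3, 1.5), 11: BBox(4.5, 5.5, 7, 5.5)}

# Like "WHERE B.id = 10 AND sdo_anyinteract(A.geom, B.geom) = 'TRUE'":
print(anyinteract(polygons, lines[10]))  # [1, 3]
# Reverse search, like "WHERE A.id = 2 AND sdo_anyinteract(B.geom, A.geom) = 'TRUE'":
print(anyinteract(lines, polygons[2]))  # [11]
```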
The ordering of the tables in the from clause is not important, neither is the ordering of the predicates in the where clause: the database optimizer will produce the same query plan.
The only reason for ordering the tables in the FROM clause is if you use the /*+ ORDERED */ hint to get the optimizer to perform the join in the order in which the tables are listed. But this is unnecessary here (and may even have negative effects).
The simple rule is this: do not use any hints unless you know you have a problem and you know for a fact that some specific hint will solve it. Never use hints just because you think they are necessary. The optimizer is clever enough to produce the proper plan, and hints are needed only on the rare occasions when it does not.

SQLite FTS3 Query Slower than Standard Table

I built sqlite3 from source to include FTS3 support and then created a new table in an existing SQLite database containing 1.5 million rows of data, using
CREATE VIRTUAL TABLE data USING FTS3(codes text);
Then used
INSERT INTO data(codes) SELECT originalcodes FROM original_data;
Then queried each table with
SELECT * FROM original_data WHERE originalcodes='RH12';
This comes back instantly, as I have an index on that column.
The query on the FTS3 table
SELECT * FROM data WHERE codes='RH12';
takes almost 28 seconds.
Can someone help explain what I have done wrong? I expected this to be significantly quicker.
The documentation explains:
FTS tables can be queried efficiently using SELECT statements of two different forms:
Query by rowid. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, FTS is able to retrieve the requested row directly using the equivalent of an SQLite INTEGER PRIMARY KEY index.
Full-text query. If the WHERE clause of the SELECT statement contains a sub-clause of the form "<column> MATCH ?", FTS is able to use the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause.
If neither of these two query strategies can be used, all queries on FTS tables are implemented using a linear scan of the entire table.
For an efficient query, you should use
SELECT * FROM data WHERE codes MATCH 'RH12'
but this will find all records that contain the search string as a token, not only exact matches.
To do 'normal' queries efficiently, you have to keep a copy of the data in a normal table.
(If you want to save space, you can use a contentless or external content table.)
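A sketch with Python's sqlite3 module, assuming the linked SQLite library was compiled with FTS3 (most distribution builds are); the data values are hypothetical. It shows three forms side by side: = gives exact results but forces a linear scan, MATCH uses the full-text index but returns every row containing the token, and combining the two lets MATCH narrow the candidates before = filters them.

```python
import sqlite3

# Assumes FTS3 is compiled into the SQLite library Python links against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3(codes text)")
conn.executemany("INSERT INTO data(codes) VALUES (?)",
                 [("RH12",), ("RH12 RH13",), ("RH1",), ("GU21",)])


def codes(sql):
    return sorted(row[0] for row in conn.execute(sql))


# Equality: exact result, but FTS answers it with a linear scan.
eq = codes("SELECT codes FROM data WHERE codes = 'RH12'")
# MATCH: uses the full-text index, but matches any row containing the token.
match = codes("SELECT codes FROM data WHERE codes MATCH 'RH12'")
# MATCH narrows via the index; SQLite then applies = to those candidates.
both = codes("SELECT codes FROM data WHERE codes MATCH 'RH12' AND codes = 'RH12'")

print(eq, match, both)
```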
You should read the documentation more carefully.
Any query against a virtual FTS table using WHERE col = 'value' will be slow (except for a query against ROWID), but a query using WHERE col MATCH 'value' will use the full-text index and be fast.
I'm not an expert on this, but here are a few things to think about.
Your test is flawed (I think). You are contrasting a scenario where you have an exact text match (the index can be used on original_data - nothing is going to outperform this scenario) with an equality on the fts3 table (I'm not sure that FTS3 would even come into play in this type of query). If you want to compare apples to apples (to see the benefit of FTS3), you're going to want to compare a "like" operation on original_data against the FTS3 "match" operation on data.
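Along those lines, before timing anything it is worth checking that the two queries answer the same question. In this sqlite3 sketch (hypothetical data, FTS3 assumed available), LIKE '%RH12%' on the normal table is a substring test and also matches XRH12 and RH123, while MATCH 'RH12' on the FTS table matches only rows containing the whole token RH12.

```python
import sqlite3

# Assumes FTS3 is compiled into the SQLite library Python links against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_data (originalcodes TEXT)")
conn.execute("CREATE VIRTUAL TABLE data USING fts3(codes text)")
conn.executemany("INSERT INTO original_data VALUES (?)",
                 [("RH12",), ("RH12 RH13",), ("XRH12",), ("RH123",)])
conn.execute("INSERT INTO data(codes) SELECT originalcodes FROM original_data")

# Substring match on the ordinary table (cannot use a B-tree index).
like_rows = sorted(row[0] for row in conn.execute(
    "SELECT originalcodes FROM original_data WHERE originalcodes LIKE '%RH12%'"))
# Token match on the FTS table (uses the full-text index).
match_rows = sorted(row[0] for row in conn.execute(
    "SELECT codes FROM data WHERE codes MATCH 'RH12'"))

print(like_rows)
print(match_rows)
```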

SQL efficiency - [=] vs [in] vs [like] vs [matches]

Just out of curiosity, I was wondering if there are any speed/efficiency differences in using [=] versus [in] versus [like] versus [matches] (for only 1 value) syntax in SQL.
select field from table where field = value;
versus
select field from table where field in (value);
versus
select field from table where field like value;
versus
select field from table where field matches value;
I would also add EXISTS and subqueries to that list.
But the performance depends on the optimizer of the given SQL engine.
In Oracle there are a lot of differences between IN and EXISTS, but not necessarily in SQL Server.
The other thing that you have to consider is the selectivity of the column that you use. Some cases show that IN is better.
But remember that IN is non-sargable (not a Search ARGument ABLE predicate), so it will not use the index to resolve the query, while LIKE (when the pattern does not start with a wildcard) and = are sargable and can use the index.
Which is best? You should spend some time testing it in your environment.
It depends on the underlying SQL engine. In MS SQL Server, for example (according to the query planner output), IN clauses are converted to =, so there would be no difference.
Normally the IN operator is used when there are several values to compare. The engine walks the list for each value to see if one matches. If there is only one element, there is no difference in time compared with =.
The LIKE expression is different in that it uses pattern matching to find the correct values, and as such requires a bit more work in the back end. For a single value, there wouldn't be a significant time difference, because there is only one possible match and the comparison would be the same type of comparison as for = and IN.
Basically, no, or at least the difference is so insignificant that you wouldn't notice.
The best practice for any question about what would be faster, is to measure. SQL engines are notoriously difficult to predict. You can look at the output of EXPLAIN PLAN to get a sense of it, but in the end, only measuring the performance on real data will tell you what you need to know.
In theory, a SQL engine could implement all three of these exactly the same, but they may not.
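Following the advice to look at the plan, here is a sketch with Python's sqlite3 module that prints EXPLAIN QUERY PLAN output for =, IN, and LIKE. SQLite is just a convenient engine here and the table t and its columns are hypothetical; other engines can differ, so check your own. In SQLite, both = and IN (with one or several values) report an index SEARCH, while LIKE with a bound pattern, the default case-insensitive LIKE, and a default-collation index is reported as a SCAN.

```python
import sqlite3

# Hypothetical table t with an index on the filtered column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT, other TEXT)")
conn.execute("CREATE INDEX idx_t_field ON t (field)")


def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


QUERIES = {
    "=": ("SELECT other FROM t WHERE field = ?", ("abc",)),
    "IN (one value)": ("SELECT other FROM t WHERE field IN (?)", ("abc",)),
    "IN (three values)": ("SELECT other FROM t WHERE field IN (?, ?, ?)", ("a", "b", "c")),
    "LIKE": ("SELECT other FROM t WHERE field LIKE ?", ("abc",)),
}

plans = {label: plan(sql, params) for label, (sql, params) in QUERIES.items()}
for label, details in plans.items():
    print(f"{label:18} {details}")
```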