(Cosmos DB) System functions in composite indexes are range filters? - indexing

The Cosmos DB documentation says:
Queries with multiple range filters can also be optimized with a
composite index. However, each individual composite index can only
optimize a single range filter. Range filters include >, <, <=, >=,
and !=. The range filter should be defined last in the composite
index.
What about system functions?
For example, CONTAINS, ARRAY_CONTAINS, BETWEEN...

The official documentation does not mention that system functions can benefit from a composite index, so I tested it. Whether or not I use a composite index, I get the same RU charge. It seems that composite indexes don't work for system functions.
Tested SQL:
SELECT * FROM c WHERE c.age = 20 AND (c.partitionkey BETWEEN '1' AND '5') AND CONTAINS(c.data, 'str') ORDER BY c.name DESC

Related

Column not appearing on Seek Predicate

We have a table called "MealEvent" which has an index on EventDate, ConsumerId and some other columns. I have a poorly performing query with an ON clause like this (me2 is MealEvent):
ON me2.CONSUMER_ID=fe.CONSUMER_ID and me2.MEAL_EVENT_DATE between fe.START_DATE and fe.END_DATE
When I look at the execution plan, I see ConsumerId in the Predicate rather than the Seek Predicate. I am wondering why SQL Server opts not to use it in the Seek Predicate. Can I force it to use it in the Seek Predicate?
Any advice is highly appreciated.
You need to flip the order of the key columns in the index. The server cannot use a range predicate and then an equality predicate; the equality must come first.
It should be (CONSUMER_ID, MEAL_EVENT_DATE).
The reason is simple: When the server seeks the index for MEAL_EVENT_DATE, it does not have an exact key to look up, it wants a range of keys. It cannot then, within those keys, skip from one key to the next to get a single CONSUMER_ID.
But if CONSUMER_ID is first, then the server can seek directly to that key, then within that can look for the range of MEAL_EVENT_DATE.
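The effect is easy to observe with a small experiment. This sketch uses SQLite's EXPLAIN QUERY PLAN rather than SQL Server (the table and index names here are invented for illustration), but the same B-tree principle applies:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meal_event (consumer_id INTEGER, meal_event_date TEXT)")
# Range column first: the engine can only seek on the date range, then filter.
con.execute("CREATE INDEX ix_range_first ON meal_event (meal_event_date, consumer_id)")
# Equality column first: seek straight to the consumer, then scan its date range.
con.execute("CREATE INDEX ix_equality_first ON meal_event (consumer_id, meal_event_date)")

plan = " ".join(
    row[3] for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM meal_event "
        "WHERE consumer_id = 42 "
        "AND meal_event_date BETWEEN '2024-01-01' AND '2024-01-31'"
    )
)
# The optimizer picks the equality-first index and uses both columns as seek predicates.
print(plan)
```

Given both index definitions, the planner chooses the equality-first one because it can use both columns to narrow the seek.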

SQL - How to find a specific value from DB without going through the entire DB table

I wonder how I can find a specific value in a DB without going through the entire table.
For example:
There is a DB of students and we are looking for all students with a certain name. How do you do that without going through the whole table?
Use INDEXES
Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. ... Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
SQL Server has four options for improving performance for this type of query:
A regular index (either clustered or non-clustered).
A full text index.
Partitioning.
Hash index (for memory optimized tables).
A regular index, created using CREATE INDEX, is the "canonical" answer to this question. It is like an alphabetical list of all names with a pointer to the record. The implementation uses something called B-trees, so the analogy is not perfect. These indexes can be used for equality comparisons (e.g. =, IS NULL) and inequality comparisons (e.g. IN, <, >).
A full text index indexes all words in a text column (for some definition of "word"). This can be used for a range of full text search options -- available through CONTAINS.
Partitioning is used when you have lots and lots of data and only a handful of categories. That is highly unlikely with a name in a student database. But it physically splits the data into separate files for each name or range of names.
Hash-based indexing is only available on memory-optimized tables. Hash indexes are only useful for comparisons using = and IN (and <> and NOT IN).
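As a sketch of the first option: a regular B-tree index serves equality and inequality predicates alike. This example uses SQLite (the table and index names are invented here), but the behavior is representative of most engines:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX ix_name ON students (name)")

def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# All three predicate forms can seek the same B-tree index.
eq_plan = plan("SELECT id FROM students WHERE name = 'Smith'")
in_plan = plan("SELECT id FROM students WHERE name IN ('Smith', 'Jones')")
lt_plan = plan("SELECT id FROM students WHERE name < 'B'")
```

Each plan reports a search on `ix_name` rather than a full scan, confirming that =, IN, and < are all index-friendly.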

Why can't we use a compound index with two or more independent range conditions?

use-the-index-luke.com says:
Nevertheless there are queries where a single index cannot do a
perfect job, no matter how you define the index; e.g., queries with
two or more independent range conditions as in the following example:
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE UPPER(last_name) < ?
AND date_of_birth < ?
It is impossible to define a B-tree index that would support this
query without filter predicates.
I don't understand its explanation, especially the last sentence. Can someone help?
Isn't the explanation given there good enough?
No matter how you twist and turn the index definition, the entries are always arranged along a chain. At one end, you have the small entries and at the other end the big ones. An index can therefore only support one range condition as an access predicate. Supporting two independent range conditions requires a second axis, for example like a chessboard. The query above would then match all entries from one corner of the chessboard, but an index is not like a chessboard—it is like a chain. There is no corner.
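You can observe the limitation directly. In this SQLite sketch (the column names are stand-ins for the example above), a compound index on both columns still exposes only the first range condition as an access predicate; the second condition is applied as a filter on the entries the seek returns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (up_last TEXT, dob TEXT)")
con.execute("CREATE INDEX ix_emp ON employees (up_last, dob)")

plan = " ".join(
    row[3] for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT up_last, dob FROM employees "
        "WHERE up_last < 'M' AND dob < '1990-01-01'"
    )
)
# Only the first column appears as a seek (access) predicate;
# the dob condition never becomes part of the index seek.
print(plan)
```

The chain analogy shows up in the plan: the index can be entered at one end for `up_last`, but within that range the `dob` values are not contiguous, so they must be filtered row by row.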

Performance of SQL query with condition vs. without where clause

Which SQL query will execute faster: one with a WHERE clause or one without, when:
the WHERE clause deals with an indexed field (e.g. the primary key field)
the WHERE clause deals with a non-indexed field
I suppose that when we're working with indexed fields, the query with WHERE will be faster. Am I right?
As has been mentioned there is no fixed answer to this. It all depends on the particular context. But just for the sake of an answer. Take this simple query:
SELECT first_name FROM people WHERE last_name = 'Smith';
To process this query without an index, the last_name column must be checked for every row in the table (a full table scan).
With an index, you could just follow a B-tree data structure until 'Smith' was found.
Without an index, the worst case is linear, O(n), whereas with a B-tree it is O(log n), hence computationally less expensive.
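A quick way to see this difference is SQLite's EXPLAIN QUERY PLAN. This is only an illustrative sketch (using the table from the example above): without the index the plan is a full scan; with it, a B-tree search:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
con.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", "Smith"), ("Jane", "Doe"), ("Anna", "Smith")],
)

query = "SELECT first_name FROM people WHERE last_name = 'Smith'"

def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # full table scan: every row is checked
con.execute("CREATE INDEX ix_last_name ON people (last_name)")
after = plan(query)   # B-tree search on the new index
```

The same query flips from a scan to an index search the moment the index exists; no change to the SQL is needed.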
Not sure what you mean by 'query with WHERE-clause or without', but you're correct that most of the time a query with a WHERE clause on an indexed field will outperform one whose WHERE clause is on a non-indexed field.
One instance where the performance can be the same (i.e. indexing doesn't matter) is a range-based WHERE clause (e.g. WHERE col1 > x) that matches a large share of the table. That can force a scan of the table, and thus will be the same speed as a range query on a non-indexed column.
Really, it depends on the columns you reference in the where clause, the types of data in the columns, the types of queries your running, etc.
It may depend on the type of WHERE clause you are writing. In a simple WHERE clause, it is generally better to have an index on the field you are using (and indexes can and should be built on more than the PK). However, you have to write a sargable WHERE clause for the index to make any difference. See this question for some guidelines on sargability:
What makes a SQL statement sargable?
There are cases where a where clause on the primary key will be slower.
The simplest is a table with one row. Using the index requires loading both the index and the data page -- two reads. No index cuts the work in half.
That is a degenerate case, but it points to the issue -- the proportion of the rows selected. Or, more accurately, the proportion of pages needed to resolve the query.
When the desired data is on all pages, using an index slows things down. For a non-primary key, this can be disastrous when the table is bigger than the page cache and the accesses are random.
Since pages are ordered by a primary key, the worst case is an additional index scan -- not too bad.
Some databases use statistics on tables to decide when to use an index and when to do a full table scan. Some don't.
In short, for queries that select a small fraction of the rows, an index will improve performance. For queries that select a large fraction, using an index can result in marginally worse performance, or even dire performance, depending on various factors.
Some of my queries are quite complex, and applying a WHERE clause degraded performance. As a workaround, I used temp tables and then applied the WHERE clause to them. This significantly improved performance. It also helped where I had joins, especially LEFT OUTER JOINs.

Will adding a full text index in PostgreSQL speed up my regular queries which use LIKE?

If I add a full text index in PostgreSQL, will my LIKE and ILIKE queries use it automatically?
Do I need to use the special full text syntax in order to take advantage of a full-text index in PostgreSQL:
SELECT title, ts_headline(body, query) AS snippet, ts_rank_cd(body_tsv, query, 32) AS rank
FROM blog_entry, plainto_tsquery('hello world') AS query
WHERE body_tsv @@ query
ORDER BY rank DESC;
No, a full-text index will not be used by the LIKE operator.
With PostgreSQL 9.1 you can create a new type of index (a trigram index, provided by the pg_trgm extension) that will speed up LIKE operations even if the wildcard is not only at the end of the expression:
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
To expand on ahwnn's answer, they are different comparisons and cannot use the same index.
The full-text-search always involves tokenizing the text and typically involves stemming words. This makes it difficult to search for exact prefixes and usually impossible to e.g. match two spaces in a row (they generally just get discarded).
You might want to read up on the various index operator classes too, in particular text_pattern_ops and friends. Without these, LIKE 'cab%' can't be optimised to >= 'cab' AND < 'cac'.
http://www.postgresql.org/docs/9.1/static/indexes-opclass.html
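To illustrate that prefix rewrite: a left-anchored LIKE is equivalent to a closed-open range on the same column, which is exactly what makes it B-tree-indexable. A small sketch using SQLite (the sample data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE words (w TEXT)")
con.executemany(
    "INSERT INTO words VALUES (?)",
    [("cab",), ("cabbage",), ("cactus",), ("cac",), ("car",)],
)

# LIKE 'cab%' matches exactly the half-open range ['cab', 'cac').
like_rows = [r[0] for r in con.execute(
    "SELECT w FROM words WHERE w LIKE 'cab%' ORDER BY w")]
range_rows = [r[0] for r in con.execute(
    "SELECT w FROM words WHERE w >= 'cab' AND w < 'cac' ORDER BY w")]
# Both queries return the same rows: 'cab' and 'cabbage'.
```

Note that the upper bound is the prefix with its last character incremented ('cab' -> 'cac'), and the bound is exclusive, so 'cac' itself is correctly excluded. This rewrite only works for a pattern with no leading wildcard; that is why the operator class matters for ordinary LIKE.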