Let's assume we have a database for a todo list and want to query all items that are important and not already done. In SQL, I would use something like
SELECT * FROM todolist WHERE important = true AND state <> 'done'
How can we perform that type of request in an IndexedDB NoSQL database?
With indexes?
Another way?
Not possible?
As far as I know, to filter results on important = true:
objectstore.index('important').openCursor(IDBKeyRange.only('true'))
But I do not know how to filter on state <> 'done', since we only have IDBKeyRange.only(z).
And I do not know how to filter on both clauses.
N.B.: In MongoDB we would do:
db.userdetails.find({"date_of_join" : "16/10/2010","education":"M.C.A."})
In onupgradeneeded, create a composite index on both criteria using an array key path:
todolistStore.createIndex('importantIncomplete', ['important','state'],{unique:false});
For your query do:
var lowerBound = ['true', 'almostdone'];  // lower bound must sort before the upper bound ('almostdone' < 'started')
var upperBound = ['true', 'started'];
var range = IDBKeyRange.bound(lowerBound, upperBound);
var request = todolistStore.index('importantIncomplete').openCursor(range);
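Note that one contiguous key range cannot express state <> 'done' in general: in lexicographic key order 'done' actually sorts between 'almostdone' and 'started', so the bounded range above can still include done items. A more robust sketch (store and index names taken from the answer, everything else hypothetical) ranges over every key whose first component is 'true' and skips done records while iterating the cursor:

var results = [];
// In IndexedDB key order, arrays sort after strings, so ['true', []] is a
// valid exclusive upper bound above any ['true', <string state>] key.
var range = IDBKeyRange.bound(['true'], ['true', []], false, true);
todolistStore.index('importantIncomplete').openCursor(range).onsuccess = function (e) {
  var cursor = e.target.result;
  if (cursor) {
    if (cursor.value.state !== 'done') {
      results.push(cursor.value); // important and not yet done
    }
    cursor.continue();
  } else {
    // iteration finished; results holds all matching items
  }
};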
There are two ways to query multiple indexes in IndexedDB. Josh described the fastest way: querying multiple fields using a composite index. However, it has a storage cost and slows down writes. Additionally, the query must be known a priori, since each index has to be created in advance for the queries that require it. The second method is manual key joining, using sorted merge or other algorithms. This method requires (non-composite) indexes on the fields of interest and works for any combination of query criteria. See http://dev.yathit.com/ydn-db/nosql-query.html for sorted merge, nested loop and zigzag merge implementations in the ydn-db library. I intend to add more key-joining algorithms.
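As a rough illustration of the sorted-merge idea (a sketch of the concept only, not ydn-db's actual implementation): fetch the sorted primary keys matching each predicate from its own index, then intersect the two sorted lists with a merge:

// aKeys and bKeys are sorted arrays of primary keys, one per predicate
function sortedMerge(aKeys, bKeys) {
  var out = [], i = 0, j = 0;
  while (i < aKeys.length && j < bKeys.length) {
    var cmp = indexedDB.cmp(aKeys[i], bKeys[j]); // standard IndexedDB key comparison
    if (cmp === 0) { out.push(aKeys[i]); i++; j++; }
    else if (cmp < 0) { i++; }
    else { j++; }
  }
  return out; // primary keys of records satisfying both predicates
}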
BTW, we generally don't index boolean values. If your query field is of boolean data type, just do a table scan; there is no point in using an index.
You will need indexes for that, and you can retrieve the data using a cursor, to which you can provide one filter.
On my blog (http://www.kristofdegrave.be/2012/01/indexed-db-reading-multiple-records.html?m=1) you can find some more info about it.
We currently have an audit table of the following form:
TABLE(timestamp, type, data, id) -- all the fields are of type varchar / clob
The issue is that for some types there are actually several well-suited ids. We currently only store one of those, but it would be interesting to be able to query by any of them while still keeping only one column for it.
With Oracle's recent support for JSON we were thinking about maybe storing all the ids as a JSON object:
{ "orderId": "xyz123", "salesId": "31232131" }
This would be interesting if we could continue to make queries by id with very good performance. Is this possible? Does Oracle allow for indexing in this kind of situation, or would it always end up being an O(n) text search over all the millions of rows this table has?
Thanks
Although you can start fiddling around with features such as JSON types, nested tables and the appropriate indexing methods, I wouldn't recommend that unless you specifically want to enhance your Oracle skills.
Instead, you can store the ids in a junction table:
table_id   synonym_id
1          1
1          10
1          100
2          2
With an index on (synonym_id, table_id), looking up the synonym should be really fast. This should be simple to maintain. You can guarantee that a given synonym only applies to one id. You can hide the join in a view, if you like.
Of course, you can go down another route, but this should be quite efficient.
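A minimal sketch of that layout (all names are hypothetical, assuming the audit table is called audit_log):

CREATE TABLE audit_synonyms (
    synonym_id VARCHAR2(64) NOT NULL,  -- one of the alternative ids
    table_id   NUMBER       NOT NULL   -- id of the audit_log row
);

-- enforces "a given synonym only applies to one id" and serves the lookup
CREATE UNIQUE INDEX audit_synonyms_ux ON audit_synonyms (synonym_id);

-- hide the join in a view
CREATE VIEW audit_log_by_synonym AS
SELECT s.synonym_id, a.*
FROM audit_synonyms s
JOIN audit_log a ON a.id = s.table_id;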
1) Oracle's JSON support allows functional indexing on the JSON data (a B-tree index on a JSON_VALUE virtual column).
2) For non-indexed fields, the JSON data needs to be parsed to get to the field values. Oracle uses a streaming parser with early termination, which means that fields at the beginning of the JSON data are found faster than those at the end.
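A sketch of such a functional index (table and column names are assumptions based on the question):

-- B-tree index on one id extracted from the JSON column
CREATE INDEX audit_order_id_ix
    ON audit_log (JSON_VALUE(data, '$.orderId'));

-- a query using the same expression can use the index instead of scanning
SELECT *
FROM audit_log
WHERE JSON_VALUE(data, '$.orderId') = 'xyz123';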
I'm using the excellent Sequel ORM, and at some point I've generated an array of IDs and I'm trying to retrieve the users that have those IDs. e.g.
user_ids = [1, 2, 3]
User.where(id: user_ids).all
# (0.004223s) SELECT * FROM "users" WHERE ("id" IN (1, 2, 3))
But sometimes the list of user IDs is empty, and what I effectively get is this:
User.where(id: []).all
# (0.421287s) SELECT * FROM "users" WHERE ("id" != "id")
The result is correct (i.e. no rows returned), but the query is two orders of magnitude slower.
On tables with millions of rows, it can take a couple of seconds to return.
I'm curious why this query is generated in the first place, but I'm also curious why the Postgres query planner doesn't seem to detect this contradiction and return an empty dataset instantly.
Is there a neat solution to this, or will I have to check all arrays for emptiness?
I'm using Ruby 2.0.0, Sequel 4.24.0 & Postgres 9.3.6.
There is a closed issue on github about this behavior.
I agree with jeremyevans that there is no need for a fix, because it is obvious that the result will, and should, always be empty. One could of course argue that PostgreSQL should be clever enough to optimize queries like that and not do a whole table scan.
Therefore it makes way more sense IMHO to fix this behavior in the code and avoid calling the database entirely:
user_ids.empty? ? [] : User.where(id: user_ids).all  # present? would require ActiveSupport; plain Ruby has empty?
Note that Sequel's default behavior was changed in 4.25.0 so that such queries will use false instead of id != id. While id != id may be more correct in terms of NULL handling, the performance benefits of false make it a better default.
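For illustration, a sketch of the difference (the exact SQL string is assumed from the changelog description):

# Sequel >= 4.25.0: an empty array produces a constant-false filter,
# which PostgreSQL can answer without scanning the table
User.where(id: []).sql
# => SELECT * FROM "users" WHERE (false)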
I've got a table with close to 5 million rows. Each of them has a text column where I store my XML logs.
I am trying to find out if there's some log having
<node>value</node>
I've tried with
SELECT TOP 1 id_log FROM Table_Log WHERE log_text LIKE '%<node>value</node>%'
but it never finishes.
Is there any way to improve this search?
PS: I can't drop any log
A wildcarded query such as '%<node>value</node>%' will result in a full table scan (ignoring indexes), as the engine can't determine where within the field it will find the match. The only real way I know of to improve this query as it stands (short of things like partitioning the table, which should be considered if the table is being logged to constantly) would be to add a full-text catalog and index to the table, in order to provide a more efficient search over that field.
Here is a good reference that should walk you through it. Once this has been completed, you can use things like the CONTAINS and FREETEXT operators, which are optimised for this type of retrieval.
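A rough sketch of that setup (the key-index name is an assumption; note that full-text search tokenizes words, so you search for the value rather than the surrounding markup):

CREATE FULLTEXT CATALOG LogCatalog;

CREATE FULLTEXT INDEX ON Table_Log (log_text)
    KEY INDEX PK_Table_Log  -- the table's unique key index (hypothetical name)
    ON LogCatalog;

-- word-based search instead of the leading-wildcard LIKE
SELECT TOP 1 id_log
FROM Table_Log
WHERE CONTAINS(log_text, '"value"');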
Apart from implementing full-text search on that column and indexing the table, maybe you can narrow the results by other parameters (date, etc.).
Also, you could add a varchar field called "Tags", which you would populate with keywords/tags when inserting a row. This way, you could use this field as a condition in your query.
Unfortunately, about the only way I can see to optimize that is to implement full-text search on that column, but even that will be hard to construct to where it only returns a particular value within a particular element.
I'm currently doing some work where I'm also storing XML within one of the columns. But I'm assuming any queries needed on that data will take a long time, which is okay for our needs.
Another option has to do with storing the data in a binary column, and then SQL Server has options for specifying what type of document is stored in that field. This allows you to, for example, implement more meaningful full-text searching on that field. But it's hard for me to imagine this will efficiently do what you are asking for.
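A related option, plainly a different technique from the binary column described above, is SQL Server's xml data type, which allows element-aware queries (and can be backed by XML indexes); a sketch with hypothetical names:

-- store the log in an xml column
ALTER TABLE Table_Log ADD log_xml XML;

-- element-aware search instead of substring matching
SELECT TOP 1 id_log
FROM Table_Log
WHERE log_xml.exist('//node[. = "value"]') = 1;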
You are using a LIKE query with a leading wildcard.
No index involved = no good.
Unfortunately, there is nothing you can do with what you currently have to speed this up.
I don't think it will help, but try using the FAST x query hint, like so:
SELECT id_log
FROM Table_Log
WHERE log_text LIKE '%<node>value</node>%'
OPTION(FAST 1)
This optimises the query plan for returning the first row as quickly as possible.
I need to know whether Azure Storage Tables index the RowKey separately from the PartitionKey, in order to do a query of this kind...
Assumption: my table holds forum posts, with PartitionKey = UserEMail and RowKey = PostInstant. So, I want to query like this...
SELECT data FROM forum WHERE PartitionKey="user#company.com" AND RowKey < DateLimit;
(Note: I know that the PostInstant should be written "inverted" to take advantage of the ascending sort and thus obtain the posts in descending order; that's not the point.)
As I understand it, explicitly indicating the PartitionKey already puts the query on the path of good performance, but after that... will the RowKey be used intelligently to a) return the results sorted and b) stop the scan once the DateLimit is reached?
Or, in other words, does Azure Table storage indexing apply to the concatenation PartitionKey+RowKey, and is it thus only useful for exact row matching and full-table sorting?
Yes, the query that you have written is as efficient as you can write it against Azure tables, and it should make use of the indexes on both the PartitionKey and the RowKey.
Your results are guaranteed to come back in PartitionKey then RowKey order.
The PartitionKey and RowKey are currently the only indexed attributes you can use. However, you are only using one of them in your question, so your query will scan the entire 'user#company.com' partition without any further index. If it is a small partition, this might not be a big deal; if it is a big one, then you should also use the RowKey. Note that while you are storing an attribute called 'PostInstant', that is not the same as the RowKey: you must specifically query on RowKey (even if it is the same as another column).
So your query would be more like this:
ctx.CreateQuery<Foo>("tablename").Where(s => s.PartitionKey == "user#company.com" && s.RowKey.CompareTo(DateLimit) < 0);
I am assuming, of course, that 'DateLimit' is actually a string-formatted date (like ticks); CompareTo(...) < 0 corresponds to RowKey < DateLimit. If you invert the ordering of the ticks, you would also invert the comparison operator.
Is there going to be much benefit in indexing a boolean field in a database table?
Given a common situation, like "soft-delete" records which are flagged as inactive, and hence most queries include WHERE deleted = 0, would it help to have that field indexed on its own, or should it be combined with the other commonly-searched fields in a different index?
No.
You index fields that are searched upon and have high selectivity/cardinality. A boolean field's cardinality is obliterated in nearly any table. If anything it will make your writes slower (by an oh so tiny amount).
Maybe you would make it the first field in the clustered index if every query took into account soft deletes?
What about a deleted_at DATETIME column? There are two benefits.
If you need a unique column like name, you can create and soft-delete a record with the same name multiple times (if you use a unique index on the columns deleted_at AND name).
You can search for recently deleted records.
Your query could look like this:
SELECT * FROM xyz WHERE deleted_at IS NULL
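A sketch of the unique index (table name taken from the example above, everything else assumed). Note that most engines treat NULLs as distinct in unique indexes, so duplicate names among active rows would still be allowed; PostgreSQL, for example, can enforce that with a partial index:

CREATE UNIQUE INDEX xyz_name_deleted_at_ux ON xyz (name, deleted_at);

-- PostgreSQL: uniqueness among active (not soft-deleted) rows only
CREATE UNIQUE INDEX xyz_active_name_ux ON xyz (name) WHERE deleted_at IS NULL;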
I think it would help, especially in covering indices.
How much/little is of course dependent on your data and queries.
You can have all sorts of theories about indices, but the final answers are given by the database engine running on a database with real data. And often you are surprised by the answer (or maybe my theories are just too bad ;)
Examine the query plan of your queries and determine if the queries can be improved, or if the indices can be improved.
It's quite simple to alter indices and see what difference it makes.
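For example, to examine a plan (syntax varies by engine; MySQL/PostgreSQL shown, table name hypothetical):

-- shows whether the deleted flag is actually served by an index
EXPLAIN SELECT * FROM xyz WHERE deleted = 0;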
I think it would help if you were using a view (where deleted = 0) and regularly querying from this view.
I think if your boolean fields are such that you would be referring to them in many cases, it would make sense to have a separate table, for example DeletedPages or SpecialPages, which would have many boolean-type fields like is_deleted, is_hidden, is_really_deleted, requires_higher_user etc., and then you would use joins to get to them.
Typically the size of this table would be smaller, and you would get some advantage from the joins, especially as far as code readability and maintainability are concerned. And for this type of query:
SELECT * FROM pages WHERE is_deleted = 1
It would be faster to have it implemented like this:
SELECT p.*
FROM pages p
INNER JOIN DeletedPages d ON p.id = d.page_id
I think I read somewhere that in MySQL a field needs a cardinality of at least 3 for an index on it to be used, but please confirm this.
If you are using a database that supports bitmap indexes (such as Oracle), then such an index on a boolean column will be much more useful than it would be otherwise.
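For example (Oracle syntax; table and column names are hypothetical, and Oracle flags are typically stored as NUMBER(1) or CHAR(1)):

-- bitmap indexes are designed for low-cardinality columns like flags
CREATE BITMAP INDEX pages_deleted_bix ON pages (deleted);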