Using OrientDB 2.0.2, a LUCENE query does not seem to respect the "LIMIT n" operator - lucene

Using Lucene inside OrientDB seems to work fine, but there are many Lucene-specific query parameters that I would ordinarily pass directly to Lucene (normally through Solr). The first one I need is the result limiter, such as SELECT * FROM V WHERE field LUCENE "Value" LIMIT 10.
If I use a value that only returns a few rows, I get the performance I expect, but if it matches many rows, I need the limiter to get the result to return quickly. Otherwise I get a message in the console stating that The query would return more than 50000 records. Please consider using an index.
How do I pass additional LUCENE query parameters?

There's a known issue with the query parser that is in the process of being fixed; until then, the following workaround should help:
SELECT FROM (
SELECT * FROM V WHERE Field LUCENE 'Value'
) LIMIT 10
Alternatively, depending on which client libraries you're using you may be able to set a limit using the out-of-band query settings.
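For example, with the pyorient Python client the limit can be passed as an argument to query() rather than in the SQL text itself. This is only a rough sketch; the host, database name, credentials, and the query() limit argument are assumptions about pyorient, not something stated in the original answer:
import pyorient

# Connect and open the database (placeholder host, db name and credentials).
client = pyorient.OrientDB("localhost", 2424)
client.db_open("mydb", "admin", "admin")

# The second argument is pyorient's out-of-band record limit, applied by the
# client/driver rather than by a LIMIT clause in the Lucene-backed SQL text.
results = client.query("SELECT * FROM V WHERE field LUCENE 'Value'", 10)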

Related

Conditionally LIMIT in BigQuery

I have read that in Postgres, setting LIMIT NULL will effectively not limit the results of the SELECT. However, in BigQuery, when I set LIMIT NULL based on a condition I see Syntax error: Unexpected keyword NULL.
I'd like to figure out a way to limit or not based on a condition (it could be an argument passed into a procedure, a parameter passed in by a query job, or anything I can write a CASE or IF statement for). The mechanism for setting the condition shouldn't matter; what I'm looking for is whether there is a LIMIT value that BigQuery accepts as valid syntax but that does not actually limit the results.
The LIMIT clause works differently in BigQuery. It specifies the maximum number of rows to return in the result, and the count n must be a constant INT64.
Using the LIMIT clause, you can work around the limitation on cached result size. The general recommendations are:
Using filters to limit the result set.
Using a LIMIT clause to reduce the result set, especially if you are using an ORDER BY clause.
You can see this example:
SELECT title
FROM `my-project.mydataset.mytable`
ORDER BY title DESC
LIMIT 100
This will only return 100 rows.
The best practice is to use LIMIT when you are sorting a very large number of values. You can see this document with examples.
If you want to return all rows from a table, you need to omit the LIMIT clause.
SELECT title
FROM `my-project.mydataset.mytable`
ORDER BY title DESC
This example will return all the rows from the table. It is not recommended to omit LIMIT if your tables are very large, as the query will consume a lot of resources.
One way to optimize resource usage is to use clustered tables. This will save cost and query time. You can see this document with a detailed explanation of how it works.
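For illustration, a clustered copy of the example table above could be created with a CREATE TABLE ... CLUSTER BY statement. This is a sketch using the google-cloud-bigquery Python client; the table names simply reuse the placeholders from the earlier examples:
from google.cloud import bigquery

client = bigquery.Client()

# Cluster on the column the queries above sort and filter by, so that queries
# touching `title` scan fewer blocks.
ddl = """
CREATE TABLE `my-project.mydataset.mytable_clustered`
CLUSTER BY title
AS SELECT * FROM `my-project.mydataset.mytable`
"""
client.query(ddl).result()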
You can write a stored procedure that dynamically builds a query based on input parameters. Once your SQL string is ready, you can use EXECUTE IMMEDIATE to run it. This way, you can control what value, if any, is supplied to the LIMIT clause of your query.
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting#execute_immediate
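As a rough sketch of that approach (the scripting block, the choice of a NULL row_limit meaning "no limit", and the google-cloud-bigquery client usage are my assumptions; the table is the placeholder from the earlier examples):
from google.cloud import bigquery

client = bigquery.Client()

# row_limit would normally come from a procedure argument or a job parameter;
# NULL means "do not limit".
script = """
DECLARE row_limit INT64 DEFAULT 100;
DECLARE sql STRING DEFAULT
  'SELECT title FROM `my-project.mydataset.mytable` ORDER BY title DESC';

IF row_limit IS NOT NULL THEN
  SET sql = sql || ' LIMIT ' || CAST(row_limit AS STRING);
END IF;

EXECUTE IMMEDIATE sql;
"""

# Assumes result() surfaces the rows produced by the script's final statement.
for row in client.query(script).result():
    print(row.title)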
Hope this answers your query.

Passing reduced ES query results to SQL

This is a follow-up question to How to pass ElasticSearch query to hadoop.
Basically, I want to do a full-text-search in ElasticSearch and then pass the result set to SQL to run an aggregation query. Here's an example:
Let's say we search "Terminator" in a financials database that has 10B records. It has the following matches:
"Terminator" (1M results)
"Terminator 2" (10M results)
"XJ4-227" (1 result ==> Here "Terminator" is in the synopsis of the title)
Instead of passing back the 10+M ids, we'd pass back the following 'reduced query' --
...WHERE name in ('Terminator', 'Terminator 2', 'XJ4-227')
How could we write such an algorithm to reduce the ES result set to a smallest possible filter query that we could send back to SQL? Does ES have any sort of match-metadata that would help us in this?
If you know which "not analyzed" (keyword in 5.x) field is suitable for your use case, you can get its distinct values and the number of matches for each with a terms aggregation. sum_other_doc_count even tells you whether your search produced too many distinct values, since only the top N are returned.
Naturally you could run the terms aggregation on multiple fields and use the one with the fewest distinct values in the SQL. It could actually be more efficient to first run a cardinality aggregation to decide which field to run the terms aggregation on.
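A rough sketch with the Python Elasticsearch client (the index name, the match field, and the name.keyword sub-field are placeholders based on the example above; newer client versions may want query=/aggs= keyword arguments instead of body=):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One request: the full-text match plus a terms aggregation on the keyword
# field, and a cardinality aggregation to gauge how many distinct values exist.
resp = es.search(
    index="financials",
    body={
        "size": 0,
        "query": {"match": {"synopsis": "Terminator"}},
        "aggs": {
            "names": {"terms": {"field": "name.keyword", "size": 1000}},
            "distinct_names": {"cardinality": {"field": "name.keyword"}},
        },
    },
)

buckets = resp["aggregations"]["names"]["buckets"]
truncated = resp["aggregations"]["names"]["sum_other_doc_count"] > 0  # top-N overflow

# Build the reduced SQL filter from the bucket keys.
names = ", ".join("'%s'" % b["key"].replace("'", "''") for b in buckets)
where_clause = "WHERE name IN (%s)" % names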
If your search is a pure filter then its result should be cached, but please benchmark both solutions, as your ES cluster holds quite a lot of data.

How to cast a MongoDB query and use Index

I have a SQL query that I want to convert to MongoDB while still using an index. Here is the SQL query:
SELECT * FROM "Data1" WHERE age > Cast('12' as int)
According to this SO answer the above query can be converted to:
db.test.find("this.age > 12")
However, using this syntax will not use any indexes. I wanted to know whether this issue has been fixed in MongoDB.
To make use of indexes, you need to use the $gt operator:
db.test.find({age:{$gt:12}})
The field age should be indexed.
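For example, from Python with pymongo (the database and collection names are placeholders; explain() is used here only to confirm the index scan):
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["test"]

# Index the field used in the range predicate so $gt can use it.
coll.create_index([("age", ASCENDING)])

cursor = coll.find({"age": {"$gt": 12}})

# The winning plan should contain an IXSCAN stage when the index is used.
print(cursor.explain()["queryPlanner"]["winningPlan"])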
I wanted to know whether this issue has been fixed in MongoDB.
It still can't be achieved.
The docs say that if you need to evaluate JavaScript to select documents, you need to use the $where operator, but it will not use the index.
db.test.find( { $where: "this.age>12" } );
From the docs,
$where evaluates JavaScript and cannot take advantage of indexes;
Hence, whenever query criteria are evaluated as JavaScript, the query can never make use of indexes.

Apache Solr - Wrong facet count

Are there any known issues in Solr 3.6 where, when grouping and a filter query (fq) are applied together with faceting, the facet counts come back wrong?
If I have a query:
..indent=on&wt=json&start=0&rows=500&q=*:*&group=true&group.truncate=true&group.ngroups=true&group.field=Id&facet=true&facet.mincount=1&facet.field={!ex=filed1}field1&facet.field={!ex=filed2}field2
If the user filters on field1, then I have the following query:
...indent=on&wt=json&start=0&rows=500&q=*:*&fq={!tag=dt}field1:value&group=true&group.truncate=true&group.ngroups=true&group.field=Id&facet=true&facet.mincount=1&facet.field={!ex=dt}field1&facet.field={!ex=dt}field2
I'm noticing that the facet counts are different in the results coming back from each query.
thanks,
There seem to be two problems here:
You are misspelling field1 as filed1 in your !ex queries.
You are using the !ex local parameter without a corresponding !tag parameter.
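Applying both fixes, the unfiltered query would look something like this (a sketch; when no fq is applied you can drop the exclusions entirely, or keep them aligned with the same dt tag used in the filtered query):
..indent=on&wt=json&start=0&rows=500&q=*:*&group=true&group.truncate=true&group.ngroups=true&group.field=Id&facet=true&facet.mincount=1&facet.field={!ex=dt}field1&facet.field={!ex=dt}field2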

How are strings passed to SQLAlchemy's .like() method

I'm building an interface on Flask using SQLAlchemy, and part of it is a search API. Essentially a type-ahead input calls the server with its value (for example an email), and the server then performs a SQLAlchemy query using .like in a filter, like below:
q = session.query(User).filter(User.email.like('%'+term+'%')).all()
This query isn't really returning anything useful and after the first few characters, nothing at all. But if I perform the same query with the term hardcoded, like so:
q = session.query(User).filter(User.email.like('%mysearchterm%')).all()
It returns results perfectly fine, so there must be something wrong with how I'm putting the term into the like() method, but I really can't figure out what the issue is. The term is coming in from an AJAX POST and the value is there on the server side; .like() just isn't using it correctly.
By "nothing useful" I mean that the first sets of results coming back have nothing to do with the actual term being inputted, after a term of length higher than 3-4 no results are returned back despite matching items existing in the DB.
Any help greatly appreciated.
The issue has been resolved. This query sat inside a larger query-builder function that applied a limit and offset to the query later in the function; because the limit and offset were higher than the number of matching results, the returned result set was empty.
Can chalk this one up to human error and lack of sleep.
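For anyone hitting something similar, a minimal self-contained sketch of the pitfall (the model, in-memory database, and values are hypothetical, not from the original code):
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)

engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(email="alice@example.com"),
                     User(email="alicia@example.com")])
    session.commit()

    term = "alic"
    q = session.query(User).filter(User.email.like('%' + term + '%'))

    print(q.count())                     # 2 -- the LIKE itself works fine
    print(q.limit(25).offset(50).all())  # [] -- offset is past the matches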