Force FAST ESP to search within scope fields when query does not specify scopes explicitly - fast-esp

This is a question to any FAST ESP developers out there :) I've noticed that documents with matches within scope fields are not returned when a simple query is issued.
Say, I have a document with a scope field 'Places', that contains the value 'London, UK' within one of the subscopes. If a query 'London' is issued, the document is not returned. If the query is changed to 'places:London', then the document gets found.
Since we have multiple scope fields, rewriting the initial query to include all of the scope names would be a pain, especially with advanced search operators.
How can you force FAST ESP to return such a document even if the query does not define the scope explicitly?
Thanks.

You simply can't; the scope root is set in the index profile. Your best bet would be to have a common scope and use that instead.

Related

Custom, user-definable "wildcard" constants in SQL database search -- possible?

My client is making database searches using a Django webapp that I've written. The query sends a regex search to the database and outputs the results.
Because the regex searches can be pretty long and unintuitive, the client has asked for certain custom "wildcards" to be created for the regex searches. For example:
Ω := [^aeiou] (all non-vowels)
etc.
This could be achieved with a simple permanent string substitution in the query, something like
query = query.replace("Ω", "[^aeiou]")
for all the elements in the substitution list. This seems like it should be safe, but I'm not really sure.
He has also asked that it be possible for the user to define custom wildcards for their searches on the fly, so that there would be some other input box where a user could define
∫ := some other regex
And to store them you might create a model
class RegexWildcard(models.Model):
symbol = ...
replacement = ...
I'm personally a bit wary of this, because it does not seem to add a whole lot of functionality, but does seem to add a lot of complexity and potential problems to the code. Clients can now write their queries to a db. Can they overwrite each other's symbols?
That I haven't seen this done anywhere before also makes me kind of wary of the idea.
Is this possible? Desirable? A great idea? A terrible idea? Resources and any guidance appreciated.
Well, you're getting paid by the hour....
I don't see how involving the Greek alphabet is to anyone's advantage. If the queries are stored anywhere, everyone approaching the system would have to learn the new syntax to understand them. Plus, there's the problem of how to type the special symbols.
If the client creates complex regular expressions they'd like to be able to reuse, that's understandable. Your application could maintain a list of such expressions that the user could add to and choose from. Notionally, the user would "click on" an expression, and it would be inserted into the query.
The saved expressions could have user-defined names, to make them easier to remember and refer to. And you could define a syntax that referenced them, something otherwise invalid in SQL, such as ::name. Before submitting the query to the DBMS, you substitute the regex for the name.
You still have the problem of choosing good names, and training.
To prevent malformed SQL, I imagine you'll want to ensure the regex is valid. You wouldn't want your system to store a ; drop table CUSTOMERS; as a "regular expression"! You'll either have to validate the expression or, if you can, treat the regex as data in a parameterized query.
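Roughly, the substitution-plus-validation step could look like the sketch below (the people table, name column, and saved-expression map are all made up for illustration; also note that compiling the pattern only checks the host language's regex syntax, which may not exactly match your database's regex dialect):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedRegexSearch {
    // Expand ::name references from the user's saved expressions, validate the
    // result as a regex, then pass it as *data* in a parameterized query so it
    // can never be interpreted as SQL.
    static ResultSet search(Connection conn, String userQuery,
                            Map<String, String> savedExpressions) throws Exception {
        String expanded = userQuery;
        for (Map.Entry<String, String> e : savedExpressions.entrySet()) {
            expanded = expanded.replace("::" + e.getKey(), e.getValue());
        }
        try {
            Pattern.compile(expanded);  // reject syntactically broken expressions up front
        } catch (PatternSyntaxException ex) {
            throw new IllegalArgumentException("Not a valid regular expression: " + expanded, ex);
        }
        // '~' is PostgreSQL's regex-match operator; use your database's equivalent.
        PreparedStatement ps = conn.prepareStatement("SELECT * FROM people WHERE name ~ ?");
        ps.setString(1, expanded);
        return ps.executeQuery();
    }
}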
The real question to me, though, is why you're in the vicinity of standardized regex queries. That need suggests a database design issue: it suggests the column being queried holds composite data, and should be represented as multiple columns that can be queried directly, without using regular expressions.

Why do some SPARQL queries lack the FROM keyword?

I am using this client (http://yasgui.laurensrietveld.nl) and I hope to query BioPortal (http://bioportal.bioontology.org).
Most of my prior queries had a PREFIX and no FROM part. Can I move any FROM URL into PREFIX?
Using YASGUI client, what is the difference between FROM and the Endpoint field?
Can I rewrite any query with a FROM clause into a query that does not have it?
For example, I am not able to list the details of the Human Phenotype Ontology concept HP:0000023, because I am not sure what to put into FROM, or whether to use it at all.
There are a number of terms and mechanisms here. Let's go over them one by one.
First of all, a PREFIX clause is simply a declaration of a syntax shortcut, for use within your query. So this line:
PREFIX ex: <http://example.org/>
says that the string ex: is a shortcut for the string http://example.org/. If you have this prefix declared at the start of your query, you can use ex:someUrl (instead of <http://example.org/someUrl>) in other places in your query. It's simply there to make queries easier to read and write, but apart from that it has no influence on the meaning of your query.
A SPARQL endpoint is another term for a web service that can answer SPARQL queries.
The FROM clause of a SPARQL query determines the dataset (or more precisely, the default graph, which is part of the dataset) over which the query is executed. Any SPARQL endpoint may contain several graphs, each identified by a URI (so-called named graphs). A collection of such graphs together is a dataset. If you don't specify a FROM clause (and perhaps also one or more FROM NAMED clauses), the dataset queried is simply whatever default dataset the endpoint chooses.
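For illustration, a query with an explicit FROM clause looks like this (the graph URI below is made up; you would use whatever named graph the endpoint actually hosts):

SELECT ?s ?p ?o
FROM <http://example.org/graphs/some-ontology>
WHERE {
  ?s ?p ?o .
}
LIMIT 10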
So, what does this mean for your specific questions?
Most of my prior queries had a PREFIX and no FROM part. Can I move any FROM URL into PREFIX?
As you can see from the above explanation, that would make no sense. They are different mechanisms, for different purposes, that just both happen to use URIs.
Using YASGUI client, what is the difference between FROM and the Endpoint field?
The endpoint field defines which service YASGUI needs to send the query to. The FROM clause tells the endpoint what dataset you want to query.
Can I rewrite any query with a FROM clause into a query that does not have it?
Not generally, no. The absence of a FROM clause means that the endpoint executes the query over its default dataset. Depending on how that endpoint is configured, this may mean that you either get a lot more results (namely not just from the one dataset you want, but from a lot of others) or none at all (in case the dataset you wanted to query is not part of the endpoint's default dataset).

Expanding an arbitrary Lucene Query

I am using Lucene 3.6.1. I receive a query from a user. This query may contain + or - operators, and may also contain phrases. In certain circumstances, I would like to expand the query by adding some extra terms that I compute. These terms are optional. However, any required include/exclude constraints specified by the user must be respected.
My initial strategy was to create a BooleanQuery, add a clause to it that contains the parsed user query, and then add further clauses that contain my expansion terms. The expansion terms would all be added as Occur.SHOULD. My question is how to constrain the user's query. I can imagine three possibilities:
The user's query contains no operators, which means I can include it as an Occur.SHOULD clause.
The user's query contains a + operator, so I need to include it as an Occur.MUST clause.
The user's query contains a - operator, but also other terms: Do I still include it as an Occur.MUST clause?
The question implicit in these three choices is how do I tell which condition is appropriate? I suppose I can rewrite the query and test for BooleanQuery instances, but that seems brittle.
I suppose I can also try the tactic of creating a single string from the user's input and from my expansion terms, like this:
(fld1:userterm1 userterm2 -fld2:userterm3 +userterm4)^10 (fld1:expterm1)^8 (fld2:expterm2)^7 ...
Is this the best way to go? Or is there some elegant programmatic solution?
Okay, not sure how useful this answer will be, but I can't seem to come up with a hard and fast answer here, so I'll list a couple of possibilities that come to mind:
First, a problem:
Modifying the query to look like:
(userquery) (other) (stuff)
It makes some sense to add the + with the rules you've shown, but a '-' prohibited term will be hard to respect correctly, since (query -prohibition) (other) will allow matches on other with prohibition present as well, and +(query -prohibition) (other) will require 'query' to be matched.
The only way I see to really do that part right is to propagate the prohibited term into your automatically added terms as well, or extract it out to a parent query layer, more like (query -prohibition) --> (query) (other) -(prohibition).
And with user entered queries of arbitrary complexity, that may not be a great strategy.
If you want to tackle it by modifying the query string, then you should probably just add any terms to the end of the query. Nothing more to it.
I don't believe
(fld1:userterm1 userterm2 -fld2:userterm3 +userterm4)^10 (fld1:expterm1)^8 (fld2:expterm2)^7 ...
is satisfactory, because userterm4 is only required within its subquery, but a match only on expterm1 is still acceptable. However, a query like:
fld1:userterm1 userterm2 -fld2:userterm3 +userterm4 (fld1:expterm1)^.8 (fld2:expterm2)^.7 ...
should, I think, satisfy your needs, and it prevents you from having to worry about the internals of your query parser. I think this is the best approach.
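A rough sketch of that string-level approach against Lucene 3.6 (the field name and boosted terms are just the ones from the example; the analyzer and default field are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class QueryExpansion {
    // Append the computed expansion terms (already boosted) to the raw user
    // input, then let QueryParser handle the combined string in one pass.
    static Query expand(String rawUserQuery, String[] boostedExpansionTerms) throws ParseException {
        StringBuilder sb = new StringBuilder(rawUserQuery);
        for (String term : boostedExpansionTerms) {
            sb.append(' ').append(term);        // e.g. "(fld1:expterm1)^0.8"
        }
        QueryParser parser = new QueryParser(Version.LUCENE_36, "fld1",
                new StandardAnalyzer(Version.LUCENE_36));
        return parser.parse(sb.toString());
    }
}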
I can also see logic in a query structured like
+(parsed userquery) (other stuff)
Effectively, always requiring a match on the user query. Lucene implicitly does this, in a sense, as it won't return a result that matches no term, even if no required fields are present in the query. This would then be using your added terms to impact scoring, rather than to return a larger set of documents. This doesn't quite address what you're asking, but might be worth considering.
If, despite the aforementioned problems of applying them, you still want to detect '+' and '-' operators, I think it can be reasonably assumed that a StandardQueryParser will return a BooleanQuery at the base level for any query that you need to check for these operators on. You might have to worry about, for instance, DisjunctionMaxQueries, as well as what will happen when you have a simple query with an operator, like:
+myterm
I don't know if QueryParser would simply return a TermQuery, losing the plus (since it would be redundant without another term present). Concerns like that make me hesitant to address it in this way.
Similarly, attempting to detect these values from the query string must make assumptions about how things are parsed, and could become complicated.
To summarize, I think the best options are either to add terms to the end of the raw query string before doing any parsing, or to treat the user query as atomic and define the appropriate BooleanClause independent of its contents, wrapping it in a BooleanQuery along with whatever other queries you need to include.
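And a sketch of that second option, treating the parsed user query as one atomic clause (again Lucene 3.6; whether the user clause is MUST or SHOULD is the policy decision discussed above, and the expansion term is just a stand-in):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class AtomicUserQuery {
    static Query combine(String rawUserQuery) throws ParseException {
        QueryParser parser = new QueryParser(Version.LUCENE_36, "fld1",
                new StandardAnalyzer(Version.LUCENE_36));
        BooleanQuery combined = new BooleanQuery();
        // The user's query goes in as one opaque clause; MUST means every hit
        // satisfies it, including any +/- operators it contains.
        combined.add(parser.parse(rawUserQuery), Occur.MUST);

        // Computed expansion terms are optional and only influence scoring.
        TermQuery expansion = new TermQuery(new Term("fld1", "expterm1"));
        expansion.setBoost(0.8f);
        combined.add(expansion, Occur.SHOULD);
        return combined;
    }
}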

RESTful API Design OR Predicates

I'm designing a RESTful API and I'm trying to work out how I could represent a predicate with an OR operator when querying for a resource.
For example if I had a resource Foo with a property Name, how would you search for all Foo resources with a name matching "Name1" OR "Name2"?
This is straightforward when it's an AND operator, as I could do the following:
http://www.website.com/Foo?Name=Name1&Age=19
The other approach I've seen is to post the search in the body.
You will need to pick your own approach, but I can name a few that seem to be pretty logical (although not without disadvantages):
Option 1.: Using the | operator:
http://www.website.com/Foo?Name=Name1|Name2
Option 2.: Using a modified query param to allow selection by one of the values from a set (a list of possible comma-separated values):
http://www.website.com/Foo?Name_in=Name1,Name2
Option 3.: Using PHP-like notation to provide a list instead of a single string:
http://www.website.com/Foo?Name[]=Name1&Name[]=Name2
All of the above-mentioned options have one huge advantage: they do not interfere with other query params.
But as I mentioned, pick your own approach and be consistent about it across your API.
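For what it's worth, Option 2 is also trivial to handle on the server; a minimal sketch (the foo table and name column are hypothetical, and placeholders keep the values out of the SQL text):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FooSearch {
    // Handles ?Name_in=Name1,Name2 by turning the list into an IN (...) predicate.
    static ResultSet findByNames(Connection conn, String nameInParam) throws Exception {
        String[] names = nameInParam.split(",");
        StringBuilder sql = new StringBuilder("SELECT * FROM foo WHERE name IN (");
        for (int i = 0; i < names.length; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        sql.append(")");
        PreparedStatement ps = conn.prepareStatement(sql.toString());
        for (int i = 0; i < names.length; i++) {
            ps.setString(i + 1, names[i].trim());
        }
        return ps.executeQuery();
    }
}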
Well, one quick way to fix that is to add an additional parameter that identifies the relationship between your parameters, i.e. whether they're an AND or an OR, for example:
http://www.website.com/Foo?Name=Name1&Age=19&or=true
Or, for much more complex queries, just keep a single parameter and include your whole query in it by making up your own little query language; on the server side you would parse the whole string and extract the information and the statement.

Android Notepad Uri Explanation

In the Android Notes demo, it accepts the URIs:
sUriMatcher.addURI(NotePad.AUTHORITY, "notes", NOTES);
sUriMatcher.addURI(NotePad.AUTHORITY, "notes/#", NOTE_ID);
Where the difference between notes and notes/# is that notes/# returns the note whose ID matches #.
However, the managedQuery() method that is used to get data from the content provider has the following parameters:
Parameters:
uri: The URI of the content provider to query.
projection: List of columns to return.
selection: SQL WHERE clause.
selectionArgs: The arguments to selection, if any ?s are present.
sortOrder: SQL ORDER BY clause.
So, is there any particular reason for the design decision of providing a URI for that, rather than just using the selection parameter? Or is it just a matter of taste?
Thank you.
I think it's so you can do more complex lookups without having to complicate your selections and arguments. For example, in my project I have multiple tables but use the same selection and arguments to filter content. By using the URI I don't have to interpret the query; I can just switch on the URI. It might be personal taste to begin with, but in more complex scenarios you appreciate the URI. You can also use * to match strings in the same way you can with #.
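For illustration, the "switch on the URI" part of a provider's query() looks roughly like this (adapted from the NotePad sample; the notes table and the mOpenHelper field are assumed to exist in the provider):

@Override
public Cursor query(Uri uri, String[] projection, String selection,
                    String[] selectionArgs, String sortOrder) {
    SQLiteQueryBuilder qb = new SQLiteQueryBuilder();
    qb.setTables("notes");

    switch (sUriMatcher.match(uri)) {
        case NOTES:
            // notes: no extra constraint, the caller's selection applies as-is
            break;
        case NOTE_ID:
            // notes/#: constrain to the row named in the URI, on top of any selection
            qb.appendWhere("_id=" + uri.getPathSegments().get(1));
            break;
        default:
            throw new IllegalArgumentException("Unknown URI " + uri);
    }

    SQLiteDatabase db = mOpenHelper.getReadableDatabase();
    return qb.query(db, projection, selection, selectionArgs, null, null, sortOrder);
}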
I think it's mostly a matter of taste. IMHO, putting the id in the Uri is a little cleaner since you can make the id opaque rather than require the client to know that it actually represents a specific row id. For instance, you can pass a lookup key (like in the Contacts API) rather than a specific row id.