Using unfiltered search in MarkLogic - indexing

Consider the following query:
xquery version "1.0-ml";
declare namespace ts = "http://marklogic.com/MLU/top-songs";
let $range_query := cts:element-range-query(xs:QName("ts:week"), ">=", xs:date("2009-05-01"))
let $query := cts:search(fn:doc(), $range_query)
return $query/ts:top-song/ts:title/text()
(Not: Range index for week has been enabled)
I believe that the above query can give results based only on the indexes and with this assumption I made the following change:
let $query := cts:search(fn:doc(), $range_query, "unfiltered")
I got the same results.
However,
fn:count($query/ts:top-song/ts:title/text()) gave a result of 8
and
xdmp:estimate($query/ts:top-song/ts:title/text())
gave an error:Expression is unsearchable
I believe this means that the query cannot be searched using indexes. If so, why does the unfiltered approach work just fine ?

The unfiltered search works and the xdmp:estimate expression doesn't because they're not using the same query and expression. The query you pass cts:search is fully searchable, so it will work when you call xdmp:estimate on it:
xdmp:estimate(cts:search(fn:doc(), $range_query, "unfiltered")
xdmp:estimate requires a "partially searchable" XPath expression, which has a specific definition according to MarkLogic. There are some subtle details about what makes an expression fully or partially or not searchable, and probably the most instructive way to go about it is using xdmp:query-trace to test the expression.

Related

Amazon CloudSearch returns false results

I have a DB of articles, and i would like to search for all the articles who:
1. contain the word 'RIO' in either the title or the excerpt
2. contain the word 'BRAZIL' in the parent_post_content
3. and in a certain time range
The query I search with (structured) was:
(and (phrase field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or (phrase field=title 'RIO') (phrase field=excerpt 'RIO')))
but for some reason i get results that contain 'RIO' in the title, but do not contain 'BRAZIL' in the parent_post_content.
This is especially weird because i tried to condition only on the title (and not the excerpt) with this query:
(and (phrase field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (phrase field=name 'RIO'))
and the results seem OK.
I'm fairy new to CloudSearch, so i very likely have syntax errors, but i can't seem to find them. help?
You're using the phrase operator but not actually searching for a phrase; it would be best to use the term operator (or no operator) instead. I can't see why it should matter but using something outside of how it was intended to be used can invite unintended consequences.
Here is how I'd re-write your queries:
Using term (mainly just used if you want to boost fields):
(and (term field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or (term field=title 'RIO') (term field=excerpt 'RIO')))
Without an operator (I find this simplest):
(and parent_post_content:'BRAZIL' (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or title:'RIO' excerpt:'RIO'))
If that fails, can you post the complete query? I'd like to check that, for example, you're using the structured query parser since you mentioned you're new to CloudSearch.
Here are some relevant docs from Amazon:
Compound queries for more on the various operators
Searching text for specifics on the phrase operator
Apparently the problem was not with the query, but with the displayed content. I foolishly trusted that the content displaying in the CloudSearch site was complete, and so concluded that it does not contain Brazil. But alas, it is not the full content, and when i check the full content, Brazil was there.
Sorry for the foolery.

Product Index Using Django ORM

I have a list of Products with a field called 'Title' and I have been trying to get a list of initial letters with not much luck. The closes I have is the following that dosn't work as 'Distinct' fails to work.
atoz = Product.objects.all().only('title').extra(select={'letter': "UPPER(SUBSTR(title,1,1))"}).distinct('letter')
I must be going wrong somewhere,
I hope someone can help.
You can get it in python after the queryset got in, which is trivial:
products = Project.objects.values_list('title', flat=True).distinct()
atoz = set([i[0] for i in products])
If you are using mysql, I found another answer useful, albeit using sql(django execute sql directly):
SELECT DISTINCT LEFT(title, 1) FROM product;
The best answer I could come up with, which isn't 100% ideal as it requires post processing is this.
atoz = sorted(set(Product.objects.all().extra(select={'letter': "UPPER(SUBSTR(title,1,1))"}).values_list('letter', flat=True)))

How to set an SQL parameters in Apps Scripts and BigQuery

I am trying to avoid a sql injection. This topic has been dealt with in Java (How to prevent query injection on Google Big Query) and Php.
How is this accomplished in App Scripts? I did not find how to add a parameter to a SQL statement. Here is what I had hoped to do:
var sql = 'SELECT [row],etext,ftext FROM [hcd.hdctext] WHERE (REGEXP_MATCH(etext, esearch = ?) AND REGEXP_MATCH(ftext, fsearch = ?));';
var queryResults;
var resource = {
query: sql,
timeoutMs: 1000,
esearch='r"[^a-zA-z]comfortable"',
fsearch='r"[a-z,A-z]confortable"'
};
queryResults = BigQuery.Jobs.query(resource,projectNumber);
And then have esearch and fsearch filled in with the values (which could be set elsewhere).
That does not work, according to the doc.
Any suggestions on how to get a parameter in an SQL query? (I could not find a setString function...)
Thanks!
Unfortunately, BigQuery doesn't support this type of parameter substitution. It is on our list of features to consider, and I'll bump the priority since it seems like this is a common request.
The only suggestion that I can make in the mean time is that if you are building query strings by hand, you will need to make sure you escape them carefully (which is a non-trivial operation).

Alfresco boost results from site

I'm trying to boost results from a particular Alfresco site compared to others.
I've written the following query but it's not working properly :
(((TAG:term or cm:name:term OR cm:title:term )^8 OR (cm:description:term )^6) AND PATH:'/app:company_home/st:sites/cm:Pub/cm:documentLibrary//*')^8
OR
(((TAG:term or cm:name:term OR cm:title:term )^4 OR (cm:description:term )^2) AND -PATH:'/app:company_home/st:sites/cm:Pub/cm:documentLibrary//*')
The result is overall correct but the results from Pub aren't the first ones in the list.
Is there a way to achieve that ?
^8 on the first part query part was just not enough. Putting a huge value like 100 makes it work perfectly !

Endeca UrlENEQuery java API search

I'm currently trying to create an Endeca query using the Java API for a URLENEQuery. The current query is:
collection()/record[CONTACT_ID = "xxxxx" and SALES_OFFICE = "yyyy"]
I need it to be:
collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and
SALES_OFFICE = "yyyy"]
Currently this is being done with an ERecSearchList with CONTACT_ID and the string I'm trying to match in an ERecSearch object, but I'm having difficulty figuring out how to get the UrlENEQuery to generate the or in the correct fashion as I have above. Does anyone know how I can do this?
One of us is confused on multiple levels:
Let me try to explain why I am confused:
If Contact_ID and Sales_Office are different dimensions, where Contact_ID is a multi-or dimension, then you don't need to use EQL (the xpath like language) to do anything. Just select the appropriate dimension values and your navigation state will reflect the query you are trying to build with XPATH. IE CONTACT_IDs "ORed together" with SALES_OFFICE "ANDed".
If you do have to use EQL, then the only way to modify it (provided that you have to modify it from the returned results) is via string manipulation.
ERecSearchList gives you ability to use "Search Within" functionality which functions completely different from the EQL filtering, though you can achieve similar results by using tricks like searching only specified field (which would be separate from the generic search interface") I am still not sure what's the connection between ERecSearchList and the EQL expression above?
Having expressed my confusion, I think what you need to do is to use String manipulation to dynamically build the EQL expression and add it to the Query.
A code example of what you are doing would be extremely helpful as well.