How do you query a ravendb index containing dates using Lucene? - lucene

I am using the http api to query ravendb (so a LINQ query is not the solution to my question).
My Product document looks like this:
{
"editDate": "2012-08-29T15:00:00.846Z"
}
and I have the index:
from doc in docs.Product
select new { doc.editDate }
I want to query all documents before a certain date AND time. I can query on the DATE using this syntax:
editDate: [NULL TO 2012-09-17]
however I can't figure out how to query the time component as well.
Any ideas?

You can query that using:
editDate: [NULL TO 2012-09-17T15:00:00.846Z]
If you care for a part of that, use:
editDate: [NULL TO 2012-09-17T15:00]
Note that you might have to escape parts of the query, like so:
editDate: [NULL TO 2012\-09\-17T15\:00]
For this to work you also need to ensure that the field is analysed.
In Raven Studio - Add Field -> editDate, and set Indexing to Analyzed.

One way of working around this problem is to store the value not as a date but as the number of seconds since the unix epoch.
This results in a number that you can compare against with ease.
I am using javascript and the moment js library:
{
"editDate": "2012-08-29T15:00:00.846Z",
"editDateUnix": moment("2012-08-29T15:00:00.846Z").unix()
}
Any my index:
from doc in docs.Product
select new { doc.editDate, doc.editDateUnix }
and my lucene query:
"editDate: [NULL TO "+moment("2012-09-17").unix()+"]"

Related

Apache Solr sort based on score and fieldn values

I used the following request
http://localhost:8983/solr/test6/select?q=*:*&sort=product(score,hits)%20desc
to sort results based on their relevancy score as determined by Apache Solr multiplied by a field called hits (integers).
However, I receive the following error message:
{ "responseHeader":{
"status":400,
"QTime":0,
"params":{
"q":"*:*",
"sort":"product(score,hits) desc"}}, "error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"sort param could not be parsed as a query, and is not a field that exists in the index: product(score,hits)",
"code":400}}
Why is it that sort cannot correctly input the function value when:
http://localhost:8983/solr/test6/select?q=*:*&sort=score%20desc
http://localhost:8983/solr/test6/select?q=*:*&sort=hits%20desc
work when a function isn't applied?
NOTE: http://localhost:8983/solr/test6/select?q=*:*&sort=product(hits,2)%20desc where I added the product() function also returns the same error message.
The score value isn't really a field - so you can't use a function to manipulate it in the sort clause.
Instead you can use a multiplicative boost through boost (if you're using edismax) to achieve what you want: &boost=hits. You might want to use log(hits) or something similar (recip for example) instead to avoid large differences in score for just small changes in the number of hits.

How to change sql generated by linq-to-entities?

I am querying a MS SQL database using Linq and Entity Framework Code First. The requirement is to be able to run a WHERE SomeColumn LIKE '%sometext'clause against the table.
This, on the surface, is a simple requirement that could be accomplished using a simple Linq query like this:
var results = new List<MyTable>();
using(var context = new MyContext())
{
results = context.MyTableQueryable
.Where(x => x.SomeColumn.EndsWith("sometext"))
.ToList();
}
// use results
However, this was not effective in practice. The problem seems to be that the column SomeColumn is not varchar, rather it's a char(31). This means that if a string is saved in the column that is less than 31 characters then there will be spaces added on the end of the string to ensure a length of 31 characters, and that fouls up the .EndsWith() query.
I used SQL Profiler to lookup the exact sql that was generated from the .EndsWith() method. Here is what I found:
--previous query code removed for brevity
WHERE [Extent1].[SomeColumn] LIKE N'%sometext'
So that is interesting. I'm not sure what the N means before '%sometext'. (I'll Google it later.) But I do know that if I take the same query and run it in SSMS without the N like this:
--previous query code removed for brevity
WHERE [Extent1].[SomeColumn] LIKE '%sometext'
Then the query works fine. Is there a way to get Linq and Entity Framework to drop that N from the query?
Please try this...
.Where(x => x.SomeColumn.Trim().EndsWith("sometext"))
Just spoke to my colleague who had a similar issue, see if the following works for you:
[Column(TypeName = "varchar")]
public string SomeColumn
{
get;
set;
}
Apparently setting the type on the column mapping will force the query to recognise it as a VARCHAR, where a string is normally interpreted as an NVARCHAR.

Is getting the General ID same as getting FormattedID in rally?

I am trying to get the ID under "General" from a feature item in rally. This is my query:
body = { "find" => {"_ProjectHierarchy" => projectID, "_TypeHierarchy" => "PortfolioItem/Feature"
},
"fields" => ["FormattedID","Name","State","Release","_ItemHierarchy","_TypeHierarchy","Tags"],
"hydrate" => ["_ItemHierarchy","_TypeHierarchy","Tags"],
"fetch"=>true
}
I am not able to get any value for FormattedID, I tried using "_UnformattedID" but it pulls up an entirely different value than the FormattedID. Any help would be appreciated.
LBAPI does not have FormattedID field. You are correct using _UnformattedID. It is the FormattedID without the prefix. For example, this query:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",_ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}},sort:{_ValidFrom:-1}}&fields=["_UnformattedID","Name","State"]&hydrate=["State"]&compress=true&pagesize:200
shows _UnformattedID that correspond to FormattedID as this screenshot shows:
I noticed your are using fields and fetch . Per LBAPI's documentation, it uses fields rather than fetch. If you want to get all fields, use fields=true
As far as the missing custom fields, make sure that the custom field value was set within the dates of the query.
Compare these almost identical queries: the first query does not return a custom field, the second query does.
Query #1:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",_ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}}}&fields=["_UnformattedID","Name","State","c_PiCustomField"]&hydrate=["State","c_PiCustomField"]
Query #2:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/11111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",__At: "current"}&fields=["_UnformattedID","Name","State","c_PiCustomField"]&hydrate=["State","c_PiCustomField"]
The first query uses time period: _ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}
The second query uses __At: "current"
Let's say I just create a new custom field on PortfolioItem. It is not possible to create a custom field on PorfolioItem/Feature, so the field is created on PI, but both queries still use "_TypeHierarchy":"PortfolioItem/Feature".
After I created this custom field, called PiCustomField, I set a value of that field for a specific Feature, F4.
The first query does not have a single snapshot that includes that field because that field did not exist in the time period we lookback. We can't change the past.
The second query returns this field for F4. It does not return it for other Features because all other Features do not have this field set.
Here is the screenshot:

Apache solr - more like this score

I have a small index with ~1000 documents with only two fields:
- id (string)
- content (text_general)
I noticed that when I do MLT search by id for similar content, the original document(which id is the searched id) have a score 5.241327.
There is 1:1 duplicated document and for the duplicated content it is returning score = 1.5258181. Why? Why it is not 5.241327 when it is 100% duplicate.
Another question is can I in any way to get similarity documents by content by passing some text in the query.
Example:
/mlt/?q=content:Some encoded long text&mlt.fl=content
I am trying to check if there is similar content uploaded and the check must be performed at new content upload time.
It might be worth to try some different parameters. I also use MLT on only one field, I use the following parameters:
'mlt.boost': 'true',
'mlt.fl': 'my_field_name',
'mlt.maxqt': 1000,
'mlt.mindf': '0',
'mlt.mintf': '0',
'qt': 'mlt',
'rows': '10'
See http://wiki.apache.org/solr/MoreLikeThis for an explanation of the parameters. I think with a small index mindf might be important and I see the default mintf (term frequency) is 2, so I assume an ID is only one term, so this is probably ignored!
First, how does Solr More-Like-This works?
A regular Solr query is conducted (e.g. "?q=content:Some encoded long text&.....".
For each document returned by the above query, More-Like-This conduct More like this query...
So, the first result set "response", is just like any Solr query results set.
The More-Like-This appears below and start with something like that (Json format):
"moreLikeThis":{
"57375":{"numFound":18155,"start":0,"docs":["
For an explanation about More Like This algorithm, please read that:
http://blog.brattland.no/node/18
and: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If you didn't solved the problem yet, please let me know and I will guide you through.

jqGrid/NHibernate/SQL: navigate to selected record

I use jqGrid to display data which is retrieved using NHibernate. jqGrid does paging for me, I just tell NHibernate to get "count" rows starting from "n".
Also, I would like to highlight specific record. For example, in list of employees I'd like a specific employee (id) to be shown and pre-selected in table.
The problem is that this employee may be on non-current page. E.g. I display 20 rows from 0, but "highlighted" employee is #25 and is on second page.
It is possible to pass initial page to jqGrid, so, if I somehow use NHibernate to find what page the "highlighted" employee is on, it will just navigate to that page and then I'll use .setSelection(id) method of jqGrid.
So, the problem is narrowed down to this one: given specific search query like the one below, how do I tell NHibernate to calculate the page where the "highlighted" employee is?
A sample query (simplified):
var query = Session.CreateCriteria<T>();
foreach (var sr in request.SearchFields)
query = query.Add(Expression.Like(sr.Key, "%" + sr.Value + "%"));
query.SetFirstResult((request.Page - 1) * request.Rows)
query.SetMaxResults(request.Rows)
Here, I need to alter (calculate) request.Page so that it points to the page where request.SelectedId is.
Also, one interesting thing is, if sort order is not defined, will I get the same results when I run the search query twice? I'd say that SQL Server may optimize query because order is not defined... in which case I'll only get predictable result if I pull ALL query data once, and then will programmatically in C# slice the specified portion of query results - so that no second query occur. But it will be much slower, of course.
Or, is there another way?
Pretty sure you'd have to figure out the page with another query. This would surely require you to define the column to order by. You'll need to get the order by and restriction working together to count the rows before that particular id. Once you have the number of rows before your id, you can figure what page you need to select and perform the usual paging query.
OK, so currently I do this:
var iquery = GetPagedCriteria<T>(request, true)
.SetProjection(Projections.Property("Id"));
var ids = iquery.List<Guid>();
var index = ids.IndexOf(new Guid(request.SelectedId));
if (index >= 0)
request.Page = index / request.Rows + 1;
and in jqGrid setup options
url: "${Url.Href<MyController>(c => c.JsonIndex(null))}?_SelectedId=${Id}",
// remove _SelectedId from url once loaded because we only need to find its page once
gridComplete: function() {
$("#grid").setGridParam({url: "${Url.Href<MyController>(c => c.JsonIndex(null))}"});
},
loadComplete: function() {
$("#grid").setSelection("${Id}");
}
That is, in request I lookup for index of id and set page if found (jqGrid even understands to display the appropriate page number in the pager because I return the page number to in in json data). In grid setup, I setup url to include the lookup id first, but after grid is loaded I remove it from url so that prev/next buttons work. However I always try to highlight the selected id in the grid.
And of course I always use sorting or the method won't work.
One problem still exists is that I pull all ids from db which is a bit of performance hit. If someone can tell how to find index of the id in the filtered/sorted query I'd accept the answer (since that's the real problem); if no then I'll accept my own answer ;-)
UPDATE: hm, if I sort by id initially I'll be able to use the technique like "SELECT COUNT(*) ... WHERE id < selectedid". This will eliminate the "pull ids" problem... but I'd like to sort by name initially, anyway.
UPDATE: after implemented, I've found a neat side-effect of this technique... when sorting, the active/selected item is preserved ;-) This works if _SelectedId is reset only when page is changed, not when grid is loaded.
UPDATE: here's sources that include the above technique: http://sprokhorenko.blogspot.com/2010/01/jqgrid-mvc-new-version-sources.html