I have the following document structure:
Item: {ItemId (string), Flag (bool), Type ("Item")}
SubItem: {ItemId (string), Text (string), Type ("SubItem")}
I need to get all Items with Flag=true where at least one of their SubItems has the term "term" in its Text.
I can easily get the list of Items where any SubItem's Text contains the term by using DuplicateFilter, but how do I also filter by Flag? I tried building a BooleanQuery, but it doesn't work well because the number of Items is large.
I strongly recommend taking a look at BlockJoinQuery in Lucene.
A very good starting point: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
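To illustrate the idea behind a block join (this is not Lucene's API), here is a minimal Python sketch. It assumes each Item is indexed as one contiguous block: its SubItems first, then the Item itself as the last document, which is the layout Lucene's block join relies on. A matching child marks its block, and the parent filter (Flag=true) is applied only to the parent that closes that block, so no large BooleanQuery over Item IDs is needed. The sample documents are hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def to_parent_join(
    docs: List[Doc],
    child_matches: Callable[[Doc], bool],
    parent_filter: Callable[[Doc], bool],
) -> List[str]:
    """Return ItemIds of parents that pass parent_filter and have at least one matching child."""
    result: List[str] = []
    block_has_match = False
    for doc in docs:
        if doc["Type"] == "SubItem":
            # Remember whether any child in the current block matched.
            if child_matches(doc):
                block_has_match = True
        else:
            # A parent closes its block: keep it only if a child matched and the filter passes.
            if block_has_match and parent_filter(doc):
                result.append(str(doc["ItemId"]))
            block_has_match = False
    return result


# Hypothetical index: children of each Item are stored right before the Item.
docs: List[Doc] = [
    {"ItemId": "1", "Text": "some term here", "Type": "SubItem"},
    {"ItemId": "1", "Flag": True, "Type": "Item"},
    {"ItemId": "2", "Text": "term again", "Type": "SubItem"},
    {"ItemId": "2", "Flag": False, "Type": "Item"},
    {"ItemId": "3", "Text": "nothing", "Type": "SubItem"},
    {"ItemId": "3", "Flag": True, "Type": "Item"},
]

matches = to_parent_join(
    docs,
    child_matches=lambda d: "term" in str(d["Text"]).split(),
    parent_filter=lambda d: d["Flag"] is True,
)
print(matches)  # ['1']
```

Item 2 has a matching SubItem but Flag=false, and Item 3 has Flag=true but no matching SubItem, so only Item 1 is returned.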
I have a scenario in which I need to find the largest documents in my RavenDB database.
When I select a document in RavenDB Studio, its size is displayed in the Properties section (circled in red in my screenshot).
Is there a query I can run that orders documents by this Size property so that I can identify the largest documents?
One option is to write a method that calculates your object's size, probably using reflection.
Then create a static Map index with a 'size' field,
and set that field using your method, which you provide in the index's 'Additional Sources'.
See https://ravendb.net/docs/article-page/4.2/Csharp/studio/database/indexes/create-map-index#additional-sources
Then you can query this index and order by the 'size' field.
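A minimal Python sketch of the "compute a size, then order by it" idea, assuming the size is approximated as the UTF-8 byte length of the document's compact JSON encoding. This is only an approximation: RavenDB stores documents in its own blittable format, so the Size shown in the Studio will differ. The `approx_size` and `largest_documents` helpers and the sample documents are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def approx_size(doc: Dict[str, Any]) -> int:
    """Approximate a document's size as the byte length of its compact UTF-8 JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def largest_documents(docs: Dict[str, Dict[str, Any]], top: int) -> List[Tuple[str, int]]:
    """Return (id, size) pairs, largest first, mimicking an order-by on a 'size' index field."""
    sized = [(doc_id, approx_size(doc)) for doc_id, doc in docs.items()]
    sized.sort(key=lambda pair: pair[1], reverse=True)
    return sized[:top]


# Hypothetical documents keyed by ID.
docs = {
    "orders/1": {"Lines": [{"Product": "A", "Qty": 1}] * 3},
    "orders/2": {"Lines": []},
    "users/1": {"Name": "Alice"},
}
result = largest_documents(docs, top=2)
print(result)
```

In a real index, the size calculation would live in the additional source and be stored in the 'size' field, so the ordering happens on the server rather than in client code.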
FYI, you can get a specific document's size using the following endpoint:
{yourServerUrl}/databases/{yourDatabaseName}/docs/size?id={yourDocumentId}
You can learn more about the RavenDB REST API here:
https://ravendb.net/docs/article-page/4.2/csharp/client-api/rest-api/rest-api-intro
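A small Python helper that builds the endpoint URL shown above, assuming the document ID should be percent-encoded so that characters such as '&' or '#' do not break the query string. The server URL, database name, and `users/1` ID in the example are placeholders, and `doc_size_url` is a hypothetical helper.

```python
from urllib.parse import quote


def doc_size_url(server_url: str, database: str, doc_id: str) -> str:
    """Build the docs/size endpoint URL for one document, percent-encoding the name and ID."""
    return (
        f"{server_url.rstrip('/')}/databases/{quote(database, safe='')}"
        f"/docs/size?id={quote(doc_id, safe='')}"
    )


url = doc_size_url("http://localhost:8080", "Northwind", "users/1")
print(url)  # http://localhost:8080/databases/Northwind/docs/size?id=users%2F1
```

The resulting URL can then be requested with any HTTP client, for example `urllib.request.urlopen(url)`.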
Alternatively, here is a Map index definition that indexes each document's size directly:
from doc in docs
select new {
doc.BlittableJson.Size
}
I am trying to integrate full-text search in Django 1.10 with a PostgreSQL database.
I am following the tutorial at
https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/search/
from django.db import models

class Question(models.Model):
    text = models.TextField(max_length=500)
    ans = models.TextField(max_length=1500, blank=True)
I have several questions in the database that contain the word 'for' in their text field. For example, one question is:
text: what is best for me?
ans: this is best for you.
I am trying to run a query like this:
q = Question.objects.filter(text__search='for')
But this query does not return any results. Can anyone explain why?
It was actually my mistake. For full-text search, PostgreSQL converts both the stored text and the search term into lexemes, and with its default text search configuration (commonly english) it drops common stop words such as 'the', 'for', 'are', and 'is'. So if you search using only such words, the query returns nothing, even if many rows contain them.
I did not know this, so I assumed I had misconfigured something.
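The effect can be reproduced with a toy model in Python. This is not PostgreSQL's implementation: it assumes a tiny hypothetical stop-word list, a naive tokenizer, and no stemming, but it shows why a query made only of stop words matches nothing even when the word appears in every row.

```python
import re
from typing import List, Set

# Hypothetical subset of an English stop-word list.
STOP_WORDS: Set[str] = {"the", "for", "are", "is", "a", "this", "what", "me", "you"}


def to_lexemes(text: str) -> Set[str]:
    """Lowercase, split on non-letters, and drop stop words (a crude stand-in for to_tsvector)."""
    return {word for word in re.findall(r"[a-z]+", text.lower()) if word not in STOP_WORDS}


def search(rows: List[str], query: str) -> List[str]:
    """Return rows containing every query lexeme; a query with no lexemes matches nothing."""
    query_lexemes = to_lexemes(query)
    if not query_lexemes:
        return []
    return [row for row in rows if query_lexemes <= to_lexemes(row)]


rows = ["what is best for me?", "this is best for you."]
print(search(rows, "for"))   # []
print(search(rows, "best"))  # ['what is best for me?', 'this is best for you.']
```

In PostgreSQL itself, running `SELECT to_tsvector('english', 'what is best for me?');` shows the same thing: 'for' does not appear among the lexemes.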
I'm new to Solr and I have a very specific problem that I need to solve:
I have a CSV file that contains my Solr documents. One column (field) is not only multiValued but also contains 'subfields', for example:
"id":"0101",
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
"mainProperty":"mainproperty1",
"URL":"http://www.mySite..."
where id, addMaterials, mainProperty, and URL are my main fields, while 'name' and 'property' are my subfields. I know that Solr is designed to handle denormalized documents, but denormalizing is not an option for my application.
What I'm thinking of is to split my data set: move the fields that have subfields into separate documents and add a new field that links each one back to the original document (e.g. fromIdField).
Is there any other way to do this? At a minimum, I want to index the values of the addMaterials field (even without indexing the subfields), i.e. turn this:
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
into this:
"addMaterials":{"name":"Mat1", "property":"prop1"}
"addMaterials":{"name":"Mat2", "property":"prop2"}
"addMaterials":{"name":"Mat3", "property":"prop3"}
Thanks in advance.
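The transformation asked for above (one value per material, ignoring the subfield structure) can be sketched in Python, assuming each material object is serialized as a compact JSON string before the data is sent to Solr. The `flatten_materials` helper is hypothetical.

```python
import json
from typing import Any, Dict, List


def flatten_materials(doc: Dict[str, Any]) -> List[str]:
    """Turn each nested addMaterials object into one string value for a multiValued field."""
    return [json.dumps(material, separators=(",", ":")) for material in doc.get("addMaterials", [])]


doc = {
    "id": "0101",
    "addMaterials": [
        {"name": "Mat1", "property": "prop1"},
        {"name": "Mat2", "property": "prop2"},
        {"name": "Mat3", "property": "prop3"},
    ],
    "mainProperty": "mainproperty1",
}
values = flatten_materials(doc)
print(values[0])  # {"name":"Mat1","property":"prop1"}
```

Each string could then be sent as a separate value of the multiValued addMaterials field, for example as a JSON array of strings in a JSON update request.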
I found a solution to my problem. Instead of splitting my data set, I kept addMaterials as a multiValued field and ignored the subfields, so there is only one multiValued field to index. I used Solr's /update/csv request handler to index my CSV file, with },{ as the separator for the addMaterials field. The indexed document looks like this:
"addMaterials": ["[{\"name\":\"Mat1\", \"property\":\"prop1\"",
"\"name\":\"Mat2\", \"property\":\"prop2\"",
"\"name\":\"Mat3\", \"property\":\"prop3\"}]"]
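To see why the indexed values keep stray brackets, here is a Python sketch of what splitting the CSV cell on the literal string },{ produces, assuming the cell holds the JSON array exactly as shown in the question. The cleanup step at the end is an optional, hypothetical post-processing idea, not something Solr does.

```python
import json

raw_cell = (
    '[{"name":"Mat1", "property":"prop1"},'
    '{"name":"Mat2", "property":"prop2"},'
    '{"name":"Mat3", "property":"prop3"}]'
)

# Splitting on "},{" consumes the separator, so only the first value keeps "[{"
# and only the last value keeps "}]".
values = raw_cell.split("},{")
for value in values:
    print(value)

# Optional cleanup: strip leftover brackets and braces, then re-wrap each value as a JSON object.
cleaned = ["{" + value.strip("[]{}") + "}" for value in values]
parsed = [json.loads(value) for value in cleaned]
```

The three printed values correspond to the three entries of the indexed addMaterials field above.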
I indexed my document using this:
curl "http://localhost:8983/solr/<coreName>/update/csv?
stream.file=C:/userName/Solr/solr-5.2.0/documentFolder/myFile.csv&
f.addMaterials.split=true&
f.addMaterials.separator=\},\{&
stream.contentType=text/plain;charset=utf-8"
Note that this assumes addMaterials is defined as a multiValued field, so update your schema before indexing your document as above. Otherwise, Solr returns an error saying that the field is not multiValued.
If you need to query against the subfields, you could probably use Solr's {!join} query parser.
Currently I'm working on Apache Lucene 4.6.
Can Lucene search across all of a document's fields,
or do I have to create another field that contains all the text?
Please help.
Thanks in advance.
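You can search across all of the fields; you just have to know their names. For example, with the fields body, header and title, to find the word "hello" in any of them you can search for body:hello header:hello title:hello. Don't prefix the clauses with +: +body:hello +header:hello +title:hello only matches documents that contain "hello" in all three fields.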
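A Python sketch of building such a query string from a list of field names, assuming the fields body, header and title from the example and the classic QueryParser's default OR operator, under which unprefixed clauses are optional. The toy matcher only illustrates the "any field" semantics; it is not how Lucene evaluates queries.

```python
from typing import Dict, List


def any_field_query(fields: List[str], term: str) -> str:
    """Build a classic Lucene query string with one optional clause per field."""
    return " ".join(f"{field}:{term}" for field in fields)


def matches_any_field(doc: Dict[str, str], fields: List[str], term: str) -> bool:
    """Toy evaluation of that query: true if any listed field contains the term as a word."""
    return any(term in doc.get(field, "").lower().split() for field in fields)


fields = ["body", "header", "title"]
print(any_field_query(fields, "hello"))  # body:hello header:hello title:hello

doc = {"title": "hello world", "body": "nothing here"}
print(matches_any_field(doc, fields, "hello"))  # True
```

Listing the fields explicitly avoids a catch-all field, at the cost of having to keep the field list in sync with the index.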
I've created an index which indexes the event items in different sections of a website.
These items are on the website in a structure like this:
/Start/Section1/Events/2011/12/25/X-mas
/Start/Section2/Events/2012/01/01/New-years-day
These paths are stored in the field path in the index.
On the start page I need an overview of the events from all the different sections.
When I'm in a section I only need the events placed under that section.
I add a BooleanQuery clause like this:
QueryParser queryParser = new QueryParser("path", analyzer);
Query query = queryParser.Parse(startPath);
completeQuery.Add(query, BooleanClause.Occur.MUST);
"path" is a field that is added through a custom index script.
To retrieve the items for the start page, I search my index using:
string startPath = "/Start";
This normally gives me all items whose path starts with "/Start".
To retrieve the items for Section1, I search my index using:
string startPath = "/Start/Section1/Events";
This normally gives me all items whose path starts with "/Start/Section1/Events".
I've implemented this solution for news items and that works fine. For event items it does not.
When I search my index it returns no hits. The problem is that the last three folder names are numeric.
When I rename the folders (e.g. 2011, 12, 25) to text (two-thousand, twelve, twenty-five), it DOES return hits.
How can I get my index to return results while keeping my folder names numeric?
Use a CharTokenizer for your path field, and have IsTokenChar(char c) return false for '/'.
This way each part of your path becomes an individual token.
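A Python sketch of the difference, assuming (one plausible cause, not confirmed in the question) that the current analyzer behaves like a letter-only tokenizer such as Lucene's LetterTokenizer, which drops digits, so numeric folders never become tokens. The second call treats every character except '/' as part of a token, which is what returning false from IsTokenChar for '/' achieves. `char_tokenize` is a hypothetical stand-in, not Lucene.NET's class.

```python
from typing import Callable, List


def char_tokenize(text: str, is_token_char: Callable[[str], bool]) -> List[str]:
    """Emit maximal runs of characters for which is_token_char is true, like a CharTokenizer."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


path = "/Start/Section1/Events/2011/12/25/X-mas"
# Letter-only tokenization: the numeric folders vanish entirely.
print(char_tokenize(path, str.isalpha))           # ['Start', 'Section', 'Events', 'X', 'mas']
# Split only on '/': every path segment, numeric or not, is kept as one token.
print(char_tokenize(path, lambda ch: ch != "/"))  # ['Start', 'Section1', 'Events', '2011', '12', '25', 'X-mas']
```

The same tokenizer should be used both at index time and for the query parser, so that "/Start/Section1/Events" is split the same way in both places.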