How to optimize solr indexes


When I open the solr/admin page I get the information below. It shows optimized: true, but I have not set optimize=true in the configuration file, so how is it optimizing the index?
And how can I set it to false?
Schema Information
Unique Key: UID_PK
Default Search Field: text
numDocs: 2881
maxDoc: 2881
numTerms: 41960
version: 1309429290159
optimized: true
current: true
hasDeletions: false
directory: org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory# C:\apache-solr-1.4.0\example\example-DIH\solr\db\data\index
lastModified: 2011-06-30T10:25:04.89Z

It doesn't say "optimize = true" or that it will "optimize something". It says that your index is currently optimized. That is the difference: the value only describes the current state of your index.
The best way to see this for yourself is:
Insert a couple of rows
Look up this value; it will show "optimized"
Delete a row
Look up this value again; it will say "not optimized"
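
The experiment above can be mimicked with a toy model. This is only a sketch, assuming the "optimized" flag means "exactly one segment and no deleted documents", which is my understanding of how the Lucene versions bundled with Solr 1.4 report it. The ToyIndex and Segment classes are invented for illustration and are not part of Solr or Lucene.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    live_docs: int
    deleted_docs: int = 0


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index; not a Solr or Lucene class."""
    segments: list = field(default_factory=list)

    def add_batch(self, count):
        # Assumption: each committed batch lands in a new segment.
        self.segments.append(Segment(live_docs=count))

    def delete_one(self):
        # A delete only marks the document; the segment is not rewritten.
        seg = self.segments[0]
        seg.live_docs -= 1
        seg.deleted_docs += 1

    @property
    def has_deletions(self):
        return any(s.deleted_docs for s in self.segments)

    @property
    def optimized(self):
        # Assumed meaning of the status flag shown on the admin page.
        return len(self.segments) == 1 and not self.has_deletions

    def optimize(self):
        # Merge everything into one segment and drop deleted documents.
        live = sum(s.live_docs for s in self.segments)
        self.segments = [Segment(live_docs=live)]


index = ToyIndex()
index.add_batch(3)
after_insert = index.optimized    # one segment, nothing deleted
index.delete_one()
after_delete = index.optimized    # a deletion is now pending
index.optimize()
after_optimize = index.optimized  # merged back into a clean segment
```

In this model the flag is a read-only description of the index state, which is why there is no setting to turn it "off".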

Related

fnupdate updates wrong row datatables

I am trying to update a row by passing index.
http://live.datatables.net/raculubo/1/
But most of the time it replaces the wrong row.
The code is:
$(document).ready(function() {
var table = $('#example').DataTable();
var index = table.column(0).data().indexOf("Cedric Kelly");
console.log("index2",index);
table.row().data(["ax","by","dd"], index);
} );

This is happening because of how you are sorting your data, leading to a difference between the "sort order" index and the "internal DataTables" index.
The table.column(0).data() function will return an array of names, as currently displayed in the table, taking into account sorting. In this scenario, the index of "Cedric Kelly" is therefore 1.
However, the internal unique index value stored by DataTables is actually 3 because that is the order provided to DataTables from your HTML code when the data was loaded for the very first time (where Cedric Kelly is the 4th record listed - so the index is 3).
This initial loading happens before data is sorted, and it is during this step that data indexes are assigned. Once assigned, they never change (unless you delete data).
Your data update function uses the value of 1 - thus updating the wrong row.
The fix for this is to tell DataTables to use the original loading order in the table.column(0).data() function:
var index = table.column(0, {order:'index'} ).data().indexOf("Cedric Kelly");
That directive {order:'index'} causes DataTables to use the original loading order. Now, the correct record will be updated because this index will now return 3 instead of 1.
You can find more details about this "selector modifier" syntax in the DataTables documentation.
Bear in mind that the correct syntax for updating a row is actually this:
table.row( index ).data(["ax","by","dd"]);
Finally, bear in mind that if you filter your data, then you are OK, since the default value used is search: 'none' - which means "do not take searching/filtering into account" when selecting the column data.
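
To make the two kinds of index concrete, here is a small language-neutral sketch in Python rather than DataTables itself. The four names and their load order are made-up sample data, chosen so that the numbers match the ones discussed above (1 in sorted order, 3 in load order). The update_row helper is hypothetical.

```python
# Hypothetical rows in the order they were first loaded (the internal index).
loaded = ["Tiger Nixon", "Garrett Winters", "Ashton Cox", "Cedric Kelly"]

# What the user sees once the first column is sorted ascending.
displayed = sorted(loaded)

display_index = displayed.index("Cedric Kelly")  # position in the sorted view
internal_index = loaded.index("Cedric Kelly")    # position in load order


def update_row(rows, index, new_value):
    """Return a copy of rows with the entry at the internal index replaced."""
    updated = list(rows)
    updated[index] = new_value
    return updated


# Using the sorted-view position against the internal storage hits another row.
wrong = update_row(loaded, display_index, "ax")
# Using the load-order position replaces the intended row.
right = update_row(loaded, internal_index, "ax")
```

Here wrong still contains "Cedric Kelly" and has lost "Garrett Winters" instead, which is the same symptom as in the question.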

Apache Solr sort based on score and field values

I used the following request
http://localhost:8983/solr/test6/select?q=*:*&sort=product(score,hits)%20desc
to sort results based on their relevancy score as determined by Apache Solr multiplied by a field called hits (integers).
However, I receive the following error message:
{
  "responseHeader":{
    "status":400,
    "QTime":0,
    "params":{
      "q":"*:*",
      "sort":"product(score,hits) desc"}},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"sort param could not be parsed as a query, and is not a field that exists in the index: product(score,hits)",
    "code":400}}
Why can't sort accept the function value, when these requests:
http://localhost:8983/solr/test6/select?q=*:*&sort=score%20desc
http://localhost:8983/solr/test6/select?q=*:*&sort=hits%20desc
work when no function is applied?
NOTE: http://localhost:8983/solr/test6/select?q=*:*&sort=product(hits,2)%20desc where I added the product() function also returns the same error message.

The score value isn't really a field, so you can't use a function to manipulate it in the sort clause.
Instead, you can apply a multiplicative boost through the boost parameter (if you're using edismax) to achieve what you want: &boost=hits. You might want to use log(hits) or something similar (recip, for example) instead, to avoid large differences in score for just small changes in the number of hits.
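
As a rough sketch, the request could be assembled like this. The base URL is the one from the question; defType, boost and fl are standard request parameters, while the build_boosted_query helper and the log(sum(hits,1)) variant (adding 1 so that a document with zero hits does not end up inside log(0)) are my own assumptions, not something the answer prescribes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the question; adjust host and core name to your setup.
BASE_URL = "http://localhost:8983/solr/test6/select"


def build_boosted_query(q, boost_function):
    """Return a select URL that multiplies the edismax score by a function."""
    params = {
        "q": q,
        "defType": "edismax",     # the boost parameter belongs to edismax
        "boost": boost_function,  # multiplied into each document's score
        "fl": "*,score",          # return the score so the effect is visible
    }
    return BASE_URL + "?" + urlencode(params)


plain_url = build_boosted_query("*:*", "hits")
# Assumed damped variant: log of (hits + 1) instead of the raw hit count.
damped_url = build_boosted_query("*:*", "log(sum(hits,1))")
```

No sort parameter is needed in either URL, because results come back ordered by score descending by default and the boost is already folded into that score.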

Peoplesoft CreateRowset with related display record

According to the PeopleBooks documentation, the CreateRowset function accepts the parameters {FIELD.fieldname, RECORD.recname}, which are used to specify the related display record.
I tried to use it as follows (just as an example):
&rs1 = CreateRowset(Record.User, Field.UserId, Record.UserName);
&rs1.Fill();
For &k = 1 To &rs1.ActiveRowCount
MessageBox(0, "", 999999, 99999, &rs1(&k).UserName.Name.Value);
End-for;
(Record.User contains only UserId (key) and Password.
Record.UserName contains UserId (key) and Name.)
I cannot get the value of UserName.Name. Do I misunderstand the usage of this parameter?

Fill is the problem. From the doco:
Note: Fill reads only the primary database record. It does not read
any related records, nor any subordinate rowset records.
Having said that, it is the only way I know to bulk-populate a standalone rowset from the database, so I can't easily see a use for the field in the rowset.
Simplest solution is just to create a view, but that gets old very soon if you have to do it a lot. Alternative is to just loop through the rowset yourself loading the related fields. Something like:
For &k = 1 To &rs1.ActiveRowCount
&rs1(&k).UserName.UserId.value = &rs1(&k).User.UserId.value;
&rs1(&k).UserName.SelectByKey();
End-for;

Neo4j 2.0.1 Cypher performance difference between using start and match with a predicate

Started using Cypher about a week ago (really like it). In the 'browser' interface I'm running two queries:
1) start n=node:Node(name="foo") match (n)-[r*..4]-(m) return n,m
2) match (n{name:"foo"})-[r*..4]-(m) return n,m
The first query returns almost immediately, the second query more than an hour and counting. Naively I would think these would be equivalent, clearly they are not. I ran a 'smaller' (path just up to 1) version of both in the neo-shell so I could profile them.
profile start n=node:Node(name="foo") match (n)-[r*..1]-(m) return n,m;
ColumnFilter(symKeys=["m", "n", " UNNAMED51", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
TraversalMatcher(start={"expr": "Literal(foo)", "identifiers": ["n"], "key": "Literal(name)",
"idxName": "Node", "producer": "NodeByIndex"}, trail="(n)-[*1..1]-(m)", _rows=4, _db_hits=5)
.
profile match (n{name:"foo"})-[r*..1]-(m) return n,m
ColumnFilter(symKeys=["n", "m", " UNNAMED33", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
Filter(pred="Property(n,name(0)) == Literal(foo)", _rows=4, _db_hits=196870)
TraversalMatcher(start={"producer": "AllNodes", "identifiers": ["m"]},
trail="(m)-[*1..1]-(n)", _rows=196870, _db_hits=396980)
From other stackoverflow questions I understand db_hits is good to look at, so it looks like the second query has basically done a linear scan (my db is almost 400k nodes). This seems to be confirmed by the "producer" value of "AllNodes" instead of "NodeByIndex".
Obviously I need to specify the match (predicate) differently so that it hits the index. The index is called 'Node' on parameter 'name'. My googling, stacko search is failing me.. how do I specify the conditional in the match so that it hits the index?
Update:
After some poking around, it appears I'm using a 'legacy' index and then trying to hit it with the 'new style' (don't use start) query... (kinda extrapolating here). So I can do the following:
create index ON :label(name)
and that would provide an index for a particular label on the name property, but I really want an index (I guess non-legacy index) on ALL the node names. I have use cases where that's important (user may not know the label but does know the name).
Any suggestions or guidance is much appreciated.

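Right now there is no global schema index, so you would probably want to put a generic label like Entity or Node on all your nodes and create an index on it like this:
create index on :Entity(name);
And add that Entity label to all your nodes.
match (n) set n:Entity;
Your match then has to mention the label so that the schema index can be used, for example: match (n:Entity)-[r*..4]-(m) where n.name = "foo" return n,m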
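
The gap between the two profiles in the question (an "AllNodes" producer with hundreds of thousands of db hits versus "NodeByIndex" with a handful) comes down to scanning versus looking up. The following toy model only illustrates that difference; it is not how Neo4j is implemented, and the node data, function names and the way "hits" are counted are all invented for the example.

```python
from collections import defaultdict

# Made-up graph nodes; only one of them is named "foo".
nodes = [{"id": i, "name": f"node{i}"} for i in range(1000)]
nodes[42]["name"] = "foo"


def scan_lookup(all_nodes, name):
    """Check every node's property, like the AllNodes + Filter plan."""
    hits = 0
    found = []
    for node in all_nodes:
        hits += 1  # one property read per node
        if node["name"] == name:
            found.append(node)
    return found, hits


def build_index(all_nodes):
    """Build a name -> nodes mapping once, up front."""
    index = defaultdict(list)
    for node in all_nodes:
        index[node["name"]].append(node)
    return index


def index_lookup(index, name):
    """Go straight to the matching nodes; counted here as a single hit."""
    return index.get(name, []), 1


scan_result, scan_hits = scan_lookup(nodes, "foo")
name_index = build_index(nodes)
index_result, index_hits = index_lookup(name_index, "foo")
```

Both lookups return the same node, but the scan touches every node first, and that cost is then multiplied by the variable-length path expansion.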

Apache solr - more like this score

I have a small index with ~1000 documents with only two fields:
- id (string)
- content (text_general)
I noticed that when I do an MLT search by id for similar content, the original document (whose id is the searched id) has a score of 5.241327.
There is a 1:1 duplicate document, and for the duplicated content it returns score = 1.5258181. Why? Why is it not 5.241327 when it is a 100% duplicate?
Another question: is there any way to get similar documents by content by passing some text in the query?
Example:
/mlt/?q=content:Some encoded long text&mlt.fl=content
I am trying to check whether similar content has already been uploaded, and the check must be performed when new content is uploaded.

It might be worth trying some different parameters. I also use MLT on only one field, with the following parameters:
'mlt.boost': 'true',
'mlt.fl': 'my_field_name',
'mlt.maxqt': 1000,
'mlt.mindf': '0',
'mlt.mintf': '0',
'qt': 'mlt',
'rows': '10'
See http://wiki.apache.org/solr/MoreLikeThis for an explanation of the parameters. I think with a small index mindf might be important and I see the default mintf (term frequency) is 2, so I assume an ID is only one term, so this is probably ignored!
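
For reference, a minimal sketch of turning the parameters listed above into a request URL. The host, path and the "id:some-doc-id" query are placeholders I made up, and the build_mlt_url helper is hypothetical; only the parameter names and values come from the list above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the select URL of your own core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_mlt_url(query, field):
    """Build a More-Like-This request using the parameters listed above."""
    params = {
        "q": query,           # which document(s) to find similar content for
        "qt": "mlt",          # route the request to the MLT handler
        "mlt.fl": field,      # field to mine for interesting terms
        "mlt.boost": "true",
        "mlt.maxqt": 1000,
        "mlt.mindf": 0,       # do not skip terms that are rare in the index
        "mlt.mintf": 0,       # do not skip terms that occur once in the doc
        "rows": 10,
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_mlt_url("id:some-doc-id", "content")
```

Whether qt=mlt resolves depends on a handler with that name being registered in solrconfig.xml, so treat the routing as an assumption about the setup.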

First, how does Solr More-Like-This work?
A regular Solr query is conducted (e.g. "?q=content:Some encoded long text&.....").
For each document returned by the above query, More-Like-This runs a "more like this" query.
So the first result set, "response", is just like any Solr query result set.
The More-Like-This results appear below it and start with something like this (JSON format):
"moreLikeThis":{
"57375":{"numFound":18155,"start":0,"docs":["
For an explanation of the More Like This algorithm, please read:
http://blog.brattland.no/node/18
and: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If you haven't solved the problem yet, please let me know and I will guide you through it.
