Neo4j automatic (numeric) indexing - lucene

If I enable automatic indexing on, for example, the latitude and longitude fields (type: double), I can't run this query:
autoIndex.query(
    QueryContext.numericRange("longitude", 16.598145, 46.377254)
);
autoIndex is defined as graphDb.index().getNodeAutoIndexer().getAutoIndex().
The result is simply empty (size=0).
Other queries like
autoIndex.get(...)
or
autoIndex.query("id", new QueryContext("*").sort(sort))
work just fine.
However, if I manually add an index, locations_index, with
ValueContext latValue = new ValueContext(node.getProperty("latitude")).indexNumeric();
ValueContext lonValue = new ValueContext(node.getProperty("longitude")).indexNumeric();
locationsIndex.add(node, "latitude", latValue);
locationsIndex.add(node, "longitude", lonValue);
for each latitude/longitude in the database, the query works.
My question is: is there a way to make the auto index store numeric fields the way ValueContext.indexNumeric() does? Shouldn't that be automatic?
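The thread gives no answer here. A likely explanation, which is an assumption and not confirmed above, is that the auto indexer stores the double properties as plain string terms, while QueryContext.numericRange matches only numeric (trie-encoded) terms, which are what ValueContext.indexNumeric() produces. The stdlib-only Java sketch below is not Neo4j or Lucene code; it only shows why a separate numeric encoding is needed at all: range checks on the string form of a double disagree with numeric order, while an order-preserving double-to-long mapping (similar in spirit to the sortable encoding Lucene's NumericUtils applies to doubles) agrees with it. The class and method names are hypothetical.

```java
public class NumericRangeDemo {

    // Order-preserving double -> long mapping: for doubles a < b (NaN excluded),
    // toSortableLong(a) < toSortableLong(b).
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles: flip every bit except the sign bit so that
        // larger magnitudes become smaller longs.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Range check on the decimal string form: lexicographic, so wrong for numbers.
    static boolean inRangeAsStrings(double value, double min, double max) {
        String v = Double.toString(value);
        return v.compareTo(Double.toString(min)) >= 0
                && v.compareTo(Double.toString(max)) <= 0;
    }

    // Range check on the sortable encoding: agrees with numeric comparison.
    static boolean inRangeNumeric(double value, double min, double max) {
        long v = toSortableLong(value);
        return v >= toSortableLong(min) && v <= toSortableLong(max);
    }

    public static void main(String[] args) {
        double min = 16.598145, max = 46.377254; // bounds from the query above
        for (double lon : new double[] {2.5, 20.0, 350.0, -3.0}) {
            System.out.println(lon + ": string=" + inRangeAsStrings(lon, min, max)
                    + " numeric=" + inRangeNumeric(lon, min, max));
        }
    }
}
```

Running main reports string=true for 2.5 and 350.0 even though both lie outside the numeric range, which is the kind of mismatch a numeric index encoding exists to avoid.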

Is there a way to create column families in an external table dynamically?

I created an external table like this:
CREATE EXTERNAL TABLE IF NOT EXISTS words (word string, timest string,
url string, occs string, nos string, hiveall string, occall string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' =
':key,count:timest,count:url,count:occs,count:nos,other:hiveall,other:occall');
Is there any way to create the column families dynamically, so that I get, for example, something like this:
1397897857000 column=word:occall, timestamp=1449778100184, value=value1
1397897857000 column=otherword:occall, timestamp=1449778100184, value=value2
I thought about something like the following, but from Hive (this code is from HBase):
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";
admin.disableTable(table);
HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1); // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2); // modifying existing ColumnFamily
admin.enableTable(table);
from here:
http://hbase.apache.org/0.94/book/schema.html
Or does somebody have another idea for my problem?
I have a lot of data from a word count job. Each record contains the URL the word was read from, a timestamp of when the word was read, how often the word occurred in that URL, and some information about a category (news, social, or all) with its occurrence count. The main problem is that multiple words can occur at the same timestamp, and each one overwrites the existing row. I need the row key to be the timestamp so that I can run queries against it (like "what was the most used word in the last 2 weeks?").
Column families can't be created dynamically like this; they are part of the table schema. In your scenario, you should use different column qualifiers instead of different column families.
Keep one fixed column family and use the incoming word as the qualifier name. That way, different words arriving at the same timestamp won't overwrite each other.
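A minimal in-memory sketch of the suggested layout, assuming one fixed column family whose qualifiers are the words and whose cell values are occurrence counts. The WordCountTable class is hypothetical and is NOT the HBase client API; it only mimics two HBase properties that matter here: a second write to the same row and qualifier supersedes the first, and rows are kept sorted by row key, so "the last 2 weeks" becomes a contiguous range of timestamp row keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WordCountTable {
    // Hypothetical stand-in for the HBase table (NOT the HBase client API):
    // row key (timestamp) -> qualifier (word) -> occurrence count, all in one
    // fixed column family. A TreeMap keeps rows sorted by key, as HBase does.
    private final NavigableMap<Long, Map<String, Long>> rows = new TreeMap<>();

    // Writing the same row and qualifier again supersedes the old value;
    // a different word at the same timestamp is simply another cell.
    void put(long timestamp, String word, long occurrences) {
        rows.computeIfAbsent(timestamp, k -> new TreeMap<>()).put(word, occurrences);
    }

    int cellsInRow(long timestamp) {
        return rows.getOrDefault(timestamp, Map.of()).size();
    }

    // Range "scan" over row keys in [fromTs, toTs), summing counts per word.
    String mostUsedWord(long fromTs, long toTs) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> row : rows.subMap(fromTs, true, toTs, false).values()) {
            row.forEach((word, n) -> totals.merge(word, n, Long::sum));
        }
        return totals.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        WordCountTable table = new WordCountTable();
        long ts = 1397897857000L;
        table.put(ts, "word", 3);
        table.put(ts, "otherword", 5); // same timestamp, different qualifier
        table.put(ts + 60_000, "word", 4);
        System.out.println(table.cellsInRow(ts)); // 2: nothing was overwritten
        long twoWeeksMs = 14L * 24 * 60 * 60 * 1000;
        long now = ts + 60_001;
        System.out.println(table.mostUsedWord(now - twoWeeksMs, now)); // word (3 + 4 vs 5)
    }
}
```

In real HBase the same window would be a Scan with start and stop row keys; this sketch does not attempt to show that client code.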

Calculate percentage in query ZF2

Is there a way to calculate the percentage between two values in a table and select it as a column? I know it's possible in SQL, but I'd like to know how to do it in a ZF2 context specifically.
I have a select in my ZF2 application that fetches a bunch of data from a db (SQL Server). This query concerns a table "libraries" that I want to order by free disk space (a column in the table). But I don't want it to order by the absolute amount of free space but rather the percentage relative to the total disk space.
So I mean something like
libraries.freeSpace / libraries.totalSpace as 'percentage'
but within a ZF2 select. This is the current query:
$resultSet = $this->tableGateway->select(function (Select $select) use ($report) {
    $select->where('id = ' . $report[0]->customer)
           ->order('freeSpace ASC');
});
Edit: here is the answer.
Use Zend\Db\Sql\Expression in your model.
Add columns to your select:
$select->columns(array(
    'percentage' => new Expression('CAST(libraries.freeSpace AS float) / CAST(libraries.totalSpace AS float)'),
), false);
You can pass an Expression to columns(). Don't forget the second parameter: set it to FALSE to stop the table prefix from being added automatically when you supply the columns yourself.
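use Zend\Db\Sql\Expression;
use Zend\Db\Sql\Select;

$resultSet = $this->tableGateway->select(function (Select $select) use ($report) {
    // Second argument FALSE: don't prefix these columns with the table name.
    // CAST avoids integer division on SQL Server if both columns are integers.
    $select->columns(array(
        'percentage' => new Expression('CAST(libraries.freeSpace AS float) / CAST(libraries.totalSpace AS float)'),
    ), FALSE)
        ->where(array('id' => $report[0]->customer))
        ->order('percentage ASC'); // sort by the ratio, not the absolute free space
});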
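The CAST to float matters when both columns are integers: on SQL Server, dividing two int values performs integer division, so every partly used library would get a percentage of 0. The small stdlib-only Java sketch below (Java 16+ for the record; the Library type and its values are hypothetical) shows the same truncation and contrasts ordering by absolute free space with ordering by the free-space ratio, which is what the question asks for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FreeSpacePercentage {
    // Hypothetical row type mirroring two columns of the "libraries" table.
    record Library(String name, long freeSpace, long totalSpace) {
        // Convert before dividing: long / long truncates to 0 whenever
        // freeSpace < totalSpace (same reason for CAST ... AS float in SQL).
        double percentage() {
            return (double) freeSpace / totalSpace;
        }
    }

    static List<Library> orderByFreeSpace(List<Library> libs) {
        List<Library> sorted = new ArrayList<>(libs);
        sorted.sort(Comparator.comparingLong(Library::freeSpace));
        return sorted;
    }

    static List<Library> orderByPercentage(List<Library> libs) {
        List<Library> sorted = new ArrayList<>(libs);
        sorted.sort(Comparator.comparingDouble(Library::percentage));
        return sorted;
    }

    public static void main(String[] args) {
        List<Library> libs = List.of(
                new Library("big", 400, 2000),  // 20% free
                new Library("small", 90, 100)); // 90% free
        System.out.println(400 / 2000); // 0: integer division truncates
        System.out.println(orderByFreeSpace(libs).get(0).name());  // small
        System.out.println(orderByPercentage(libs).get(0).name()); // big
    }
}
```

The two orderings disagree as soon as the disks have different total sizes, which is why sorting on the computed percentage column is needed.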

Sphinx Search: get list of words from index by source column

I have a table (let's call it my_table) with two text fields: title and description. I also have an index (my_index) that uses the following source query:
SELECT * FROM my_table;
When I need to get all words and their frequencies from my_index, I use something like:
$ indexer my_index --buildstops word_freq.txt 1000 --buildfreqs
But now I need to get only the words that appear in the title column (and their frequencies within that column only). What is the best way to do this?
Edit:
Ideally, the solution wouldn't build any new indexes on disk.
Create a new "index" that includes only the title column. You never need to build a physical index from it; you can just use it with --buildstops :)
Index inheritance allows it to be defined with a very compact bit of the config file:
source my_index_title : my_index {
sql_query = SELECT id,title from my_table
}
index my_index_title : my_index {
source = my_index_title
path = /tmp/my_index_title
}
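For intuition about what the title-only index yields with --buildstops and --buildfreqs, here is a rough stdlib-only Java stand-in: count every token across the title values and keep the N most frequent. It is an assumption-laden sketch, not Sphinx: tokenization here is a plain lowercase split on non-letter/non-digit characters, whereas Sphinx applies its own charset_table and morphology settings, so real counts can differ. The class name and sample titles are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TitleWordFreqs {
    // Rough stand-in for "indexer --buildstops <file> N --buildfreqs" over titles only.
    // ASSUMPTION: tokens are lowercase runs of letters/digits; Sphinx itself uses
    // its charset_table, morphology, etc., so real counts can differ.
    static List<Map.Entry<String, Integer>> topWords(List<String> titles, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            for (String token : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically for a stable result.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        // Hypothetical rows of the title column (SELECT id, title FROM my_table).
        List<String> titles = List.of("Hello World", "hello Sphinx", "World of Sphinx, hello");
        for (Map.Entry<String, Integer> e : topWords(titles, 1000)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With Sphinx itself, the equivalent would presumably be running indexer against my_index_title instead of my_index, with the same --buildstops and --buildfreqs options.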

NHibernate Like with integer

I have an NHibernate search function that receives an integer and should return the results whose Id begins with that integer, e.g.
received integer: 729
returns: 729445, 7291 etc.
The database column is of type int, as is the property "Id" of Foo.
But
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.InsensitiveLike("Id", id.ToString() + "%"));
return criteria.List<Foo>();
results in an error ("Could not convert parameter string to int32"). Is there something wrong in the code, a workaround, or another solution?
How about this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
// Cast the int property to a string, then match only at the start of the value
criteria.Add(Expression.Like(Projections.Cast(NHibernateUtil.String, Projections.Property("Id")), id.ToString(), MatchMode.Start));
return criteria.List<Foo>();
Have you tried something like this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.Like(Projections.SqlFunction("to_char", NHibernate.NHibernateUtil.String, Projections.Property("Id")), id.ToString() + "%"));
return criteria.List<Foo>();
The idea is to convert the column to a string with the to_char function before comparing. Some databases do this conversion automatically.
AFAIK, you'll need to store your integer as a string in the database if you want to use the built-in NHibernate functionality for this (I would recommend this approach even without NHibernate: the minute you start doing 'like' searches, you are dealing with a string, not a number; think US ZIP codes, etc.).
You could also do it mathematically in a database-specific function (or convert to a string as described in Thiago Azevedo's answer), but I imagine these options would be significantly slower, and also have potential to tie you to a specific database.
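The "mathematical" route can be sketched without any string conversion: a non-negative integer starts with the digits of a prefix p (p > 0) exactly when it falls in one of the ranges [p * 10^k, (p + 1) * 10^k - 1] for some k >= 0. The stdlib-only Java sketch below checks that property by repeated division and lists those ranges up to an int column's maximum; each range could become a BETWEEN condition, which an index may be able to use. Translating the ranges into NHibernate criteria (for example a disjunction of Between restrictions) is an assumption and is not shown or tested here; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegerPrefixMatch {

    // True if the decimal digits of value begin with the digits of prefix.
    // Both must be non-negative; works by dropping trailing digits of value.
    static boolean startsWith(long value, long prefix) {
        if (value < 0 || prefix < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        if (prefix == 0) {
            return value == 0; // no other number starts with the digit 0
        }
        while (value > prefix) {
            value /= 10;
        }
        return value == prefix;
    }

    // Inclusive ranges [prefix*10^k, (prefix+1)*10^k - 1], clipped to max.
    // Assumes 0 < prefix and max <= Integer.MAX_VALUE (an int column),
    // so the long arithmetic below cannot overflow.
    static List<long[]> prefixRanges(long prefix, long max) {
        List<long[]> ranges = new ArrayList<>();
        long low = prefix;
        long high = prefix;
        while (low <= max) {
            ranges.add(new long[] {low, Math.min(high, max)});
            low *= 10;
            high = high * 10 + 9;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long id : new long[] {729445, 7291, 730, 72}) {
            System.out.println(id + " starts with 729: " + startsWith(id, 729));
        }
        for (long[] r : prefixRanges(729, Integer.MAX_VALUE)) {
            System.out.println("Id BETWEEN " + r[0] + " AND " + r[1]);
        }
    }
}
```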

Lucene Field Grouping

Say I have the fields stud_roll_number and date_leave.
select stud_roll_number,count(*) from some_table where date_leave > some_date group by stud_roll_number;
How can I write the same query using Lucene? After querying date_leave > some_date, I tried:
Map<String, Integer> mapGrouper = new HashMap<String, Integer>();
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document doc = search.doc(scoreDoc.doc); // loads the stored document
    String value = doc.get(fieldName);
    Integer count = mapGrouper.get(value);
    mapGrouper.put(value, count == null ? 1 : count + 1);
}
But I have a huge data set, and this takes a long time to compute. Is there a faster way to do it? Thanks in advance.
Your performance bottleneck is almost certainly the I/O it takes to perform the document and field value lookups. What you want to do in this situation is use a FieldCache for the field you want to group by. Once you have a field cache, you can look up the values by Lucene doc ID, which will be fast because all the values are in memory.
Also remember to give your HashMap an initial capacity to avoid repeated resizing (rehashing).
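Below is a stdlib-only Java model of the FieldCache approach; it is not Lucene code. The two arrays are laid out roughly like Lucene 3.x's FieldCache.StringIndex (lookup holds the distinct terms, with slot 0 meaning "no value", and order maps each doc ID to a slot); in Lucene 3.x they would presumably come from FieldCache.DEFAULT.getStringIndex(reader, "stud_roll_number"), but check that call against your version. Counting then becomes one array increment per hit with no stored-field reads, and the result map is pre-sized so it never rehashes. The class name and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldCacheGrouping {
    // Simulated field cache (NOT Lucene code), shaped roughly like Lucene 3.x's
    // FieldCache.StringIndex: lookup[ord] holds each distinct term (slot 0 = no value)
    // and order[docId] gives that document's ord.
    static Map<String, Integer> countByGroup(int[] hitDocIds, int[] order, String[] lookup) {
        int[] countsByOrd = new int[lookup.length]; // one counter per distinct value
        for (int docId : hitDocIds) {
            countsByOrd[order[docId]]++; // in-memory array read, no stored-field I/O
        }
        // Pre-sized so the map never has to resize (default load factor 0.75).
        Map<String, Integer> counts = new HashMap<>((int) (lookup.length / 0.75f) + 1);
        for (int ord = 1; ord < lookup.length; ord++) { // skip slot 0: docs without a value
            if (countsByOrd[ord] > 0) {
                counts.put(lookup[ord], countsByOrd[ord]);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lookup = {null, "R1", "R2", "R3"}; // distinct stud_roll_number values
        int[] order = {1, 2, 1, 3, 1, 0};           // doc 5 has no roll number
        int[] hits = {0, 1, 2, 4, 5};               // docs matching date_leave > some_date
        System.out.println(countByGroup(hits, order, lookup).get("R1")); // 3
    }
}
```

Counting by ord first and converting to strings only once per distinct value keeps the per-hit work to a single array increment.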
There is also a very new grouping module, available as a patch at https://issues.apache.org/jira/browse/LUCENE-1421, that does this.