Neo4j: Using index in cypher statement

I am struggling with how to use indexes in Cypher.
After creating and indexing nodes in Java,
I am fine with executing Cypher queries on those nodes.
I am also fine with querying those nodes through the created index in Java.
However, when I reference the index in a Cypher statement I get a MissingIndexException.
So why can't Cypher find the index? Do I have to create a separate index for Cypher? (I have not found anything about that.)
I am using version 1.8.2.
Here's what I did:
public class IndexTester {
String DB_PATH = "target/java-query-db";
String resultString ="";
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
ExecutionEngine engine = new ExecutionEngine( db );
IndexManager index = db.index();
Index<Node> personIndex;
Node n;
Node n1;
public static void main( String[] args )
{
IndexTester indexTester = new IndexTester();
indexTester.runIndex();
}
public void runIndex(){
Transaction tx = db.beginTx();
try
{
personIndex = index.forNodes( "person" );
n = createAndIndexNode("type", "adult", personIndex, db);
addPropertyAndIndexNode("name", "John", personIndex, n);
addPropertyAndIndexNode("id", "1", personIndex, n);
n1 = createAndIndexNode("type", "adult", personIndex, db);
addPropertyAndIndexNode("name", "Jane", personIndex, n1);
addPropertyAndIndexNode("id", "2", personIndex, n1);
//This works fine!!
Node foundNode = personIndex.get("name", "John").getSingle();
System.out.println("Found Node: " + foundNode.getProperty("name"));
//This throws a MissingIndexException
resultString = engine.execute( "start m=node:personIndex(name= 'John') return m" ).toString();
System.out.println(resultString);
n.delete();
n1.delete();
tx.success();
}
finally
{
tx.finish();
}
}
private Node createAndIndexNode(final String property, final String name, Index<Node> nodeIndex, GraphDatabaseService db ) {
Node node = db.createNode();
node.setProperty(property , name);
nodeIndex.add(node, property, name);
return node;
}
public Node addPropertyAndIndexNode(String property, String name, Index<Node> nodeIndex, Node node)
{
node.setProperty( property, name );
nodeIndex.add( node, property, node.getProperty( property ) );
return node;
}
}
Any ideas / suggestions how to solve this?
Thank you!!

I think the actual name of your index is just person (as specified here: index.forNodes( "person" );), rather than personIndex.
Try:
start m=node:person(name= 'John') return m
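For example, with the ExecutionEngine from the question, the corrected call would look roughly like this (a minimal sketch based on the question's own setup; it assumes imports for java.util.Map and org.neo4j.cypher.javacompat.ExecutionResult):
// Same call as before, but using the index's actual name, "person"
resultString = engine.execute( "start m=node:person(name = 'John') return m" ).toString();
System.out.println( resultString );
// Or iterate over the result rows instead of dumping the whole table
ExecutionResult result = engine.execute( "start m=node:person(name = 'John') return m" );
for ( Map<String, Object> row : result )
{
    Node person = (Node) row.get( "m" );
    System.out.println( "Found Node: " + person.getProperty( "name" ) );
}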

Using Lucene's highlighting, I am getting too much highlighted; is there a workaround for this?

I am using the highlighting feature of Lucene to isolate matching terms for my query, but some of the matched terms are excessive.
I have some simple test cases which are delivered in an Ant project (download details below).
Materials
You can download the test case here: mydemo_with_libs.zip
That archive includes the Lucene 8.6.3 libraries which my test uses; if you prefer a copy without the JAR files you can download that from here: mydemo_without_libs.zip
The necessary libraries are: core, analyzers, queries, queryparser, highlighter, and memory.
You can run the test case by unzipping the archive into an empty directory and running the Ant command ant synsearch
Input
I have provided a short synonym list which is used for indexing and analysing in the highlighting methods:
cope,manage
jobs,tasks
simultaneously,at once
and there is one document being indexed:
Queues are a useful way of grouping jobs together in order to manage a number of them at once. You can:
hold or release multiple jobs at the same time;
group multiple tasks (for the same event);
control the priority of jobs in the queue;
Eventually log all events that take place in a queue.
Use either job.queue or task.queue in specifications.
Process
When building the index I am storing the text field and using a custom analyzer. This is because (in the real world) the content I am indexing is technical documentation, where stripping out punctuation is inappropriate since so much of it can be significant in technical expressions. My analyzer uses a TechTokenFilter which breaks the stream up into tokens consisting of strings of word characters or digits, or individual characters which don't match that pattern.
Here's the relevant code for the analyzer:
public class MyAnalyzer extends Analyzer {
public MyAnalyzer(String synlist) {
if (synlist != "") {
this.synlist = synlist;
this.useSynonyms = true;
}
}
public MyAnalyzer() {
this.useSynonyms = false;
}
@Override
protected TokenStreamComponents createComponents(String fieldName) {
WhitespaceTokenizer src = new WhitespaceTokenizer();
TokenStream result = new TechTokenFilter(new LowerCaseFilter(src));
if (useSynonyms) {
result = new SynonymGraphFilter(result, getSynonyms(synlist), Boolean.TRUE);
result = new FlattenGraphFilter(result);
}
return new TokenStreamComponents(src, result);
}
and here's my filter:
public class TechTokenFilter extends TokenFilter {
private final CharTermAttribute termAttr;
private final PositionIncrementAttribute posIncAttr;
private final ArrayList<String> termStack;
private AttributeSource.State current;
private final TypeAttribute typeAttr;
public TechTokenFilter(TokenStream tokenStream) {
super(tokenStream);
termStack = new ArrayList<>();
termAttr = addAttribute(CharTermAttribute.class);
posIncAttr = addAttribute(PositionIncrementAttribute.class);
typeAttr = addAttribute(TypeAttribute.class);
}
@Override
public boolean incrementToken() throws IOException {
if (this.termStack.isEmpty() && input.incrementToken()) {
final String currentTerm = termAttr.toString();
final int bufferLen = termAttr.length();
if (bufferLen > 0) {
if (termStack.isEmpty()) {
termStack.addAll(Arrays.asList(techTokens(currentTerm)));
current = captureState();
}
}
}
if (!this.termStack.isEmpty()) {
String part = termStack.remove(0);
restoreState(current);
termAttr.setEmpty().append(part);
posIncAttr.setPositionIncrement(1);
return true;
} else {
return false;
}
}
public static String[] techTokens(String t) {
List<String> tokenlist = new ArrayList<String>();
String[] tokens;
StringBuilder next = new StringBuilder();
String token;
char minus = '-';
char underscore = '_';
char c, prec, subc;
// Boolean inWord = false;
for (int i = 0; i < t.length(); i++) {
prec = i > 0 ? t.charAt(i - 1) : 0;
c = t.charAt(i);
subc = i < (t.length() - 1) ? t.charAt(i + 1) : 0;
if (Character.isLetterOrDigit(c) || c == underscore) {
next.append(c);
// inWord = true;
}
else if (c == minus && Character.isLetterOrDigit(prec) && Character.isLetterOrDigit(subc)) {
next.append(c);
} else {
if (next.length() > 0) {
token = next.toString();
tokenlist.add(token);
next.setLength(0);
}
if (Character.isWhitespace(c)) {
// shouldn't be possible because the input stream has been tokenized on
// whitespace
} else {
tokenlist.add(String.valueOf(c));
}
// inWord = false;
}
}
if (next.length() > 0) {
token = next.toString();
tokenlist.add(token);
// next.setLength(0);
}
tokens = tokenlist.toArray(new String[0]);
return tokens;
}
}
Examining the index I can see that the index contains the separate terms I expect, including the synonym values. For example the text at the end of the first line has produced the terms
of
them
at , simultaneously
once
.
You
can
:
and the text at the end of the third line has produced the terms
same
event
)
;
When the application performs a search it analyzes the query without using the synonym list (because the synonyms are already in the index), but I have discovered that I need to include the synonym list when analyzing the stored text to identify the matching fragments.
Searches match the correct documents, but the code I have added to identify the matching terms over-performs. I won't show the whole search method here, but will focus on the code which lists the matched terms:
public static void doSearch(IndexReader reader, IndexSearcher searcher,
Query query, int max, String synList) throws IOException {
SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("\001", "\002");
Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
Analyzer analyzer;
if (synList != null) {
analyzer = new MyAnalyzer(synList);
} else {
analyzer = new MyAnalyzer();
}
// Collect all the docs
TopDocs results = searcher.search(query, max);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = Math.toIntExact(results.totalHits.value);
System.out.println("\nQuery: " + query.toString());
System.out.println("Matches: " + numTotalHits);
// Collect matching terms
HashSet<String> matchedWords = new HashSet<String>();
int start = 0;
int end = Math.min(numTotalHits, max);
for (int i = start; i < end; i++) {
int id = hits[i].doc;
float score = hits[i].score;
Document doc = searcher.doc(id);
String docpath = doc.get("path");
String doctext = doc.get("text");
try {
TokenStream tokens = TokenSources.getTokenStream("text", null, doctext, analyzer, -1);
TextFragment[] frag = highlighter.getBestTextFragments(tokens, doctext, false, 100);
for (int j = 0; j < frag.length; j++) {
if ((frag[j] != null) && (frag[j].getScore() > 0)) {
String match = frag[j].toString();
addMatchedWord(matchedWords, match);
}
}
} catch (InvalidTokenOffsetsException e) {
System.err.println(e.getMessage());
}
System.out.println("matched file: " + docpath);
}
if (matchedWords.size() > 0) {
System.out.println("matched terms:");
for (String word : matchedWords) {
System.out.println(word);
}
}
}
Problem
While the correct documents are selected by these queries, and the fragments chosen for highlighting do contain the query terms, the highlighted pieces in some of the selected fragments extend over too much of the input.
For example, if the query is
+text:event +text:manage
(the first example in the test case) then I would expect to see 'event' and 'manage' in the highlighted list. But what I actually see is
event);
manage
Despite the highlighting process using an analyzer which breaks terms apart and treats punctuation characters as single terms, the highlight code is "hungry" and breaks on whitespace alone.
Similarly if the query is
+text:queeu~1
(my final test case) I would expect to only see 'queue' in the list. But I get
queue.
job.queue
task.queue
queue;
It is so nearly there... but I don't understand why the highlighted pieces are inconsistent with the index, and I don't think I should have to run the list of matches through yet another filter to produce the correct list of matches.
I would really appreciate any pointers to what I am doing wrong or how I could improve my code to deliver exactly what I need.
Thanks for reading this far!
I managed to get this working by replacing the WhitespaceTokenizer and TechTokenFilter in my analyzer with a PatternTokenizer. The regular expression took a bit of work, but once I had it, all the matching terms were extracted with pinpoint accuracy.
The replacement analyzer:
public class MyAnalyzer extends Analyzer {
public MyAnalyzer(String synlist) {
if (synlist != "") {
this.synlist = synlist;
this.useSynonyms = true;
}
}
public MyAnalyzer() {
this.useSynonyms = false;
}
private static final String tokenRegex = "(([\\w]+-)*[\\w]+)|[^\\w\\s]";
@Override
protected TokenStreamComponents createComponents(String fieldName) {
PatternTokenizer src = new PatternTokenizer(Pattern.compile(tokenRegex), 0);
TokenStream result = new LowerCaseFilter(src);
if (useSynonyms) {
result = new SynonymGraphFilter(result, getSynonyms(synlist), Boolean.TRUE);
result = new FlattenGraphFilter(result);
}
return new TokenStreamComponents(src, result);
}
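To see why this produces pinpoint matches, here is a small standalone sketch (my own illustration, not part of the test case) that applies the same regular expression to one of the problem fragments:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TokenRegexDemo {
    // the same pattern the replacement analyzer compiles
    private static final Pattern TOKEN = Pattern.compile("(([\\w]+-)*[\\w]+)|[^\\w\\s]");
    public static void main(String[] args) {
        // "event);" and "job.queue" were previously highlighted as single hungry chunks
        Matcher m = TOKEN.matcher("event); job.queue");
        while (m.find()) {
            System.out.println(m.group()); // prints: event ) ; job . queue (one per line)
        }
    }
}
Each word and each punctuation character comes out as its own token, so the highlighter can mark exactly the terms the query matched.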

Redis key restrictions with scan

I have some code that stores a flag in Redis indicating whether a user is active, under a unique key per user.
class RedisProfileActiveRepo implements ProfileActiveRepo
{
/** @var Redis */
private $redis;
public function __construct(Redis $redis)
{
$this->redis = $redis;
}
public function markProfileIsActive(int $profile_id)
{
$keyname = ProfileIsActiveKey::getAbsoluteKeyName($profile_id);
// Set the user specific key for 10 minutes
$result = $this->redis->setex($keyname, 10 * 60, 'foobar');
}
public function getNumberOfActiveProfiles()
{
$count = 0;
$pattern = ProfileIsActiveKey::getWildcardKeyName();
$iterator = null;
while (($keys = $this->redis->scan($iterator, $pattern)) !== false) {
$count += count($keys);
}
return $count;
}
}
When I generate the keys from this code:
namespace ProjectName;
class ProfileIsActive
{
public static function getAbsoluteKeyName(int $profile_id) : string
{
return __CLASS__ . '_' . $profile_id;
}
public static function getWildcardKeyName() : string
{
return __CLASS__ . '_*';
}
}
which results in keys that look like ProjectName\ProfileIsActive_1234, the scan command in Redis fails to match any keys.
When I strip the backslashes out of the key name instead:
class ProfileIsActive
{
public static function getAbsoluteKeyName(int $profile_id) : string
{
return str_replace('\\', '', __CLASS__) . '_' . $profile_id;
}
public static function getWildcardKeyName() : string
{
return str_replace('\\', '', __CLASS__) . '_*';
}
}
The code works as expected.
My question is: why does doing a scan with a backslash in the key name fail to behave as expected, and are there any other characters that should be avoided in key names to avoid similar problems?
In theory, the latest Redis auto-escapes backslashes when you set keys from redis-cli:
127.0.0.1:6379> set this\test 1
OK
127.0.0.1:6379> keys this*
1) "this\\test"
Issue a MONITOR command in redis-cli before you run your PHP client code, and watch for the SCAN commands. If your collection is big enough and your COUNT parameter is absent or too low, you might not get the record back:
127.0.0.1:6379> scan 0 match this*
1) "73728"
2) (empty list or set)
127.0.0.1:6379> scan 0 match this* count 10000
1) "87704"
2) 1) "this\\test"
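In other words, SCAN is cursor based: you keep calling it with the cursor it returns (optionally with a larger COUNT) until the cursor comes back as 0, and any single iteration may legitimately return no keys. As a rough illustration of that loop, here is a sketch in Java using the Jedis client (an assumption on my part; the original code uses phpredis, and the method names below are those of Jedis 3.x):
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;
public class ScanCountDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost", 6379);
        ScanParams params = new ScanParams().match("this*").count(10000);
        String cursor = ScanParams.SCAN_POINTER_START; // "0"
        int found = 0;
        do {
            // each call returns the next cursor plus a possibly empty batch of keys
            ScanResult<String> page = jedis.scan(cursor, params);
            found += page.getResult().size();
            cursor = page.getCursor();
        } while (!"0".equals(cursor));
        System.out.println("matched keys: " + found);
        jedis.close();
    }
}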

Why is this method throwing a null pointer exception?

I'm attempting to write a class to create a prefix tree (or trie) with an addWord method that takes in a string as a parameter and stores each character in its appropriate place in the tree.
However, I keep getting a NullPointerException in the line of my first if statement (indicated below). Can anyone help me understand what's causing this? Thank you in advance!
public class PrefixTree {
private Node root;
public PrefixTree () {
root = new Node();
}
public void addWord(String word) {
int length = word.length();
char currentCharacter = word.charAt(0);
Node currentNode = root;
//Essentially this is saying "for each character in the string..."
for(int i=0; i<length; i++){
currentCharacter= word.charAt(i);
//if the children array of parent node does not contain the current character
//create a new node and add it to the parent array.
//HERE IS WHERE THE EXCEPTION IS BEING THROWN
if(currentNode.children[currentCharacter - 'a'] == null) {
Node newNode = new Node();
//set the node character value equal to the current character
newNode.c=currentCharacter;
//add the new node to the child array of its parent node
currentNode.children[currentCharacter - 'a']= newNode;
//if this is the last character in the word, change the endWord value to true
if( i == length-1) {
newNode.endWord = true;
//stores the complete string in its ending node
newNode.fullWord = word;
}
//set current node equal to the new node created and repeat the process
currentNode = newNode;
}
}
}
private class Node {
public boolean endWord;
public char c;
public Node[] children;
public String fullWord;
public Node(){
c = '0';
endWord = false;
Node[] children = new Node[26];
//Stores the complete string of a word ending w/ this node to make life easier later
String fullWord = null;
}
}
public static void main(String [] args){
PrefixTree test = new PrefixTree();
test.addWord("test");
}
}
Because you're assigning to new local variables in Node's constructor instead of to the fields, children (and fullWord) stay null. Change the constructor to this:
public Node(){
c = '0';
endWord = false;
this.children = new Node[26];
//Stores the complete string of a word ending w/ this node to make life easier later
this.fullWord = null;
}

How to use GroupFormatter with ObjectListView control

I cannot seem to find any examples anywhere on how to make use of the GroupFormatter delegate to add footers to my groups when using the ObjectListView control.
Does anyone have any examples that could demonstrate this? I want to remove the text from the group header and add a footer (different text per footer), as well as change the font, etc.
Any examples would be very helpful.
You can analyze the code for the
public void MakeGroupies<T>(T[] values, string[] descriptions, object[] images, string[] subtitles, string[] tasks)
method of the ObjectListView class. That explicitly sets the GroupKeyGetter, GroupKeyToTitleConverter and GroupFormatter property delegates.
This is C# but your VB adaptation should be straightforward. I am using this small test class as the object type to bind to the list view.
public class TestClass
{
private readonly string _s;
private readonly float _f;
public TestClass( string p1, float p2 )
{
this._s = p1;
this._f = p2;
}
[OLVColumn(DisplayIndex = 1, Name="S", Title="String")]
public string S {get {return this._s;}}
[OLVColumn( DisplayIndex = 2, Name = "F", Title = "Float" )]
public float F {get {return this._f;}}
}
To avoid manually defining column traits, I am using attributes inside the bound object and a
BrightIdeasSoftware.Generator.GenerateColumns( this.olv, typeof( TestClass ) );
call in the form/user control where I am using the list view. In fact here is the method that completely isolates ObjectListView configuration:
void SetData( TestClass[] objects )
{
// build list columns
Generator.GenerateColumns( this.olv, typeof( TestClass ) );
// use groups and make current column the primary sort column
this.olv.ShowGroups = true;
this.olv.SortGroupItemsByPrimaryColumn = false;
// loop through columns and set properties
foreach( OLVColumn col in this.olv.Columns )
{
col.Groupable = true;
col.Sortable = true;
if( col.Name == "F" )
{
col.MakeGroupies<float>( new float[] { 10f, 100f, 1000f }, new string[] { "<10", "10-100", "100-1000", ">1000" } );
}
else if( col.Name == "S" )
{
col.UseInitialLetterForGroup = false;
//
col.GroupKeyGetter = ( obj ) =>
{
TestClass tc = (TestClass)obj;
switch( char.ToLower( tc.S[0] ) )
{
case 'a':
case 'e':
case 'i':
case 'o':
case 'u': return true;
default: return false;
}
};
//
col.GroupKeyToTitleConverter = ( o ) => { bool b = (bool)o; return b ? "vowel" : "consonant"; };
//
col.GroupFormatter = ( /*OLVGroup*/ group, /*GroupingParameters*/ parms ) =>
{
string s = string.Format ("{0} {1}", group.GroupId, group.Id);
//group.BottomDescription = "BottomDescription: " + s;
//group.TopDescription = "TopDescription: " + s;
group.Footer = "Footer: " + s;
};
}
}
//
this.olv.RebuildColumns();
//
this.olv.SetObjects( objects );
}
You will end up with a different footer for each group.

NHibernate does not seem to be doing bulk inserts into PostgreSQL

I am interfacing with a PostgreSQL database with NHibernate.
Background
I made some simple tests... it seems it takes 2 seconds to persist 300 records.
I have a Perl program with identical functionality that issues direct SQL instead; it takes only 70% of the time.
I am not sure if this is expected. I thought C#/NHibernate would be faster, or at least on par.
Questions
One of my observations is that (with show_sql turned on) NHibernate is issuing a few hundred individual INSERTs, instead of a bulk INSERT that takes care of multiple rows. Note that I am assigning the primary key myself, not using the "native" generator.
Is that expected? Is there any way I could make it issue a bulk INSERT statement instead? It seems to me this could be one area where I could speed up performance.
As stachu correctly found out: NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql).
As stachu asks: Did anybody manage to force NHibernate to do batch inserts to PostgreSQL?
I wrote a Batcher that doesn't use any of the Npgsql batching stuff, but instead manipulates the SQL string "oldschool style" (INSERT INTO [..] VALUES (...),(...), ...):
using System;
using System.Collections;
using System.Data;
using System.Diagnostics;
using System.Text;
using Npgsql;
namespace NHibernate.AdoNet
{
public class PostgresClientBatchingBatcherFactory : IBatcherFactory
{
public virtual IBatcher CreateBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
{
return new PostgresClientBatchingBatcher(connectionManager, interceptor);
}
}
/// <summary>
/// Summary description for PostgresClientBatchingBatcher.
/// </summary>
public class PostgresClientBatchingBatcher : AbstractBatcher
{
private int batchSize;
private int countOfCommands = 0;
private int totalExpectedRowsAffected;
private StringBuilder sbBatchCommand;
private int m_ParameterCounter;
private IDbCommand currentBatch;
public PostgresClientBatchingBatcher(ConnectionManager connectionManager, IInterceptor interceptor)
: base(connectionManager, interceptor)
{
batchSize = Factory.Settings.AdoBatchSize;
}
private string NextParam()
{
return ":p" + m_ParameterCounter++;
}
public override void AddToBatch(IExpectation expectation)
{
if(expectation.CanBeBatched && !(CurrentCommand.CommandText.StartsWith("INSERT INTO") && CurrentCommand.CommandText.Contains("VALUES")))
{
//NonBatching behavior
IDbCommand cmd = CurrentCommand;
LogCommand(CurrentCommand);
int rowCount = ExecuteNonQuery(cmd);
expectation.VerifyOutcomeNonBatched(rowCount, cmd);
currentBatch = null;
return;
}
totalExpectedRowsAffected += expectation.ExpectedRowCount;
log.Info("Adding to batch");
int len = CurrentCommand.CommandText.Length;
int idx = CurrentCommand.CommandText.IndexOf("VALUES");
int endidx = idx + "VALUES".Length + 2;
if (currentBatch == null)
{
// begin new batch.
currentBatch = new NpgsqlCommand();
sbBatchCommand = new StringBuilder();
m_ParameterCounter = 0;
string preCommand = CurrentCommand.CommandText.Substring(0, endidx);
sbBatchCommand.Append(preCommand);
}
else
{
//only append Values
sbBatchCommand.Append(", (");
}
//append values from CurrentCommand to sbBatchCommand
string values = CurrentCommand.CommandText.Substring(endidx, len - endidx - 1);
//get all values
string[] split = values.Split(',');
ArrayList paramName = new ArrayList(split.Length);
for (int i = 0; i < split.Length; i++ )
{
if (i != 0)
sbBatchCommand.Append(", ");
string param = null;
if (split[i].StartsWith(":")) //first named parameter
{
param = NextParam();
paramName.Add(param);
}
else if(split[i].StartsWith(" :")) //other named parameter
{
param = NextParam();
paramName.Add(param);
}
else if (split[i].StartsWith(" ")) //other fix parameter
{
param = split[i].Substring(1, split[i].Length-1);
}
else
{
param = split[i]; //first fix parameter
}
sbBatchCommand.Append(param);
}
sbBatchCommand.Append(")");
//rename & copy parameters from CurrentCommand to currentBatch
int iParam = 0;
foreach (NpgsqlParameter param in CurrentCommand.Parameters)
{
param.ParameterName = (string)paramName[iParam++];
NpgsqlParameter newParam = /*Clone()*/new NpgsqlParameter(param.ParameterName, param.NpgsqlDbType, param.Size, param.SourceColumn, param.Direction, param.IsNullable, param.Precision, param.Scale, param.SourceVersion, param.Value);
currentBatch.Parameters.Add(newParam);
}
countOfCommands++;
//check for flush
if (countOfCommands >= batchSize)
{
DoExecuteBatch(currentBatch);
}
}
protected override void DoExecuteBatch(IDbCommand ps)
{
if (currentBatch != null)
{
//Batch command now needs its terminator
sbBatchCommand.Append(";");
countOfCommands = 0;
log.Info("Executing batch");
CheckReaders();
//set prepared batchCommandText
string commandText = sbBatchCommand.ToString();
currentBatch.CommandText = commandText;
LogCommand(currentBatch);
Prepare(currentBatch);
int rowsAffected = 0;
try
{
rowsAffected = currentBatch.ExecuteNonQuery();
}
catch (Exception e)
{
if(Debugger.IsAttached)
Debugger.Break();
throw;
}
Expectations.VerifyOutcomeBatched(totalExpectedRowsAffected, rowsAffected);
totalExpectedRowsAffected = 0;
currentBatch = null;
sbBatchCommand = null;
m_ParameterCounter = 0;
}
}
protected override int CountOfStatementsInCurrentBatch
{
get { return countOfCommands; }
}
public override int BatchSize
{
get { return batchSize; }
set { batchSize = value; }
}
}
}
I also found that NHibernate is not doing batch inserts into PostgreSQL.
I identified two possible reasons:
1) Npgsql driver does not support batch inserts/updates (see forum)
2) NHibernate does not have a *BatchingBatcher(Factory) for PostgreSQL (Npgsql). I tried using the Devart dotConnect driver with NHibernate (I wrote a custom driver for NHibernate) but it still did not work.
I suppose this driver should also implement the IEmbeddedBatcherFactoryProvider interface, but that does not seem trivial to me (reusing the one for Oracle did not work ;) )
Did anybody manage to force NHibernate to do batch inserts to PostgreSQL, or can you confirm my conclusion?