Converting a Gremlin query from the Gremlin Console to Bytecode - TinkerPop

I am trying to convert a Gremlin query received from the Gremlin Console into Bytecode in order to extract the StepInstructions. I am using the code below to do that, but it looks hacky and ugly to me. Is there a better way of converting a Gremlin query from the Gremlin Console to Bytecode?
String query = (String) requestMessage.getArgs().get(Tokens.ARGS_GREMLIN);
final GremlinGroovyScriptEngine engine = new GremlinGroovyScriptEngine();
CompiledScript compiledScript = engine.compile(query);
final Graph graph = EmptyGraph.instance();
final GraphTraversalSource g = graph.traversal();
final Bindings bindings = engine.createBindings();
bindings.put("g", g);
DefaultGraphTraversal graphTraversal = (DefaultGraphTraversal) compiledScript.eval(bindings);
Bytecode bytecode = graphTraversal.getBytecode();

If you need to take a Gremlin string and convert it to Bytecode, I don't think there is a much better way to do it. You must pass the string through a GremlinGroovyScriptEngine to evaluate it into an actual Traversal object that you can manipulate. The only improvement I can think of would be to call eval() more directly:
// construct all of this once and re-use it for your application
final GremlinGroovyScriptEngine engine = new GremlinGroovyScriptEngine();
final Graph graph = EmptyGraph.instance();
final GraphTraversalSource g = graph.traversal();
final Bindings bindings = engine.createBindings();
bindings.put("g", g);
//////////////
String query = (String) requestMessage.getArgs().get(Tokens.ARGS_GREMLIN);
DefaultGraphTraversal graphTraversal = (DefaultGraphTraversal) engine.eval(query, bindings);
Bytecode bytecode = graphTraversal.getBytecode();
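From there, if the goal is just the StepInstructions, a minimal sketch of reading them back out (assuming TinkerPop's Bytecode.getStepInstructions() and the nested Bytecode.Instruction type, which expose the step name and its arguments) might look like:
for (final Bytecode.Instruction instruction : bytecode.getStepInstructions()) {
    // each instruction carries the step name and its arguments, e.g. "has" -> ["name", "marko"]
    System.out.println(instruction.getOperator() + " " + java.util.Arrays.toString(instruction.getArguments()));
}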


Idiomatic way of comparing results of two CompletableFutures with AssertJ

There are two CompletableFutures, cf1 and cf2, defined as follows:
CompletableFuture<Boolean> cf1 = CompletableFuture.completedFuture(true);
CompletableFuture<Boolean> cf2 = CompletableFuture.completedFuture(true);
Technically, one could do:
var result1 = cf1.get();
var result2 = cf2.get();
assertThat(result1).isEqualTo(result2);
For example, if there were only one future, we could do the following:
assertThat(cf1)
    .succeedsWithin(Duration.ofSeconds(1))
    .isEqualTo(true);
Is there a more idiomatic way to compare the two futures against each other? Note that while the example here uses CompletableFuture<Boolean>, the Boolean can be replaced with any class.
If you are interested only in value comparison, passing cf2.get() as the argument of isEqualTo should be enough:
CompletableFuture<Boolean> cf1 = CompletableFuture.completedFuture(true);
CompletableFuture<Boolean> cf2 = CompletableFuture.completedFuture(true);
assertThat(cf1)
    .succeedsWithin(Duration.ofSeconds(1))
    .isEqualTo(cf2.get());
The only downside is that get() can potentially throw ExecutionException and InterruptedException, so they need to be declared in the test method signature.
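If the checked exceptions are a nuisance, one small variation (relying only on the standard CompletableFuture.join(), which wraps failures in an unchecked CompletionException) would be:
assertThat(cf1)
    .succeedsWithin(Duration.ofSeconds(1))
    .isEqualTo(cf2.join());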
If type-specific assertions are needed, succeedsWithin(Duration, InstanceOfAssertFactory) can help:
assertThat(cf1)
    .succeedsWithin(Duration.ofSeconds(1), InstanceOfAssertFactories.BOOLEAN)
    .isTrue(); // hardcoded check to show type-specific assertion
What about composing the two futures into one that completes successfully with a value of true if and only if both individual futures complete successfully with the same value? You could, for example, use thenCompose followed by thenApply:
CompletableFuture<Boolean> bothEqual = cf1.thenCompose(b1 -> cf2.thenApply(b2 -> b1.equals(b2))); // equals() rather than == so the comparison also works for types other than Boolean
If the sequential execution of this solution is problematic, you can parallelize it by implementing a helper function alsoApply and using that instead of thenCompose. See this answer for more details.
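If you prefer to stay with the standard API, a minimal sketch of the parallel variant using thenCombine instead of the linked alsoApply helper, asserted with the same AssertJ idiom as above:
CompletableFuture<Boolean> bothEqual = cf1.thenCombine(cf2, (v1, v2) -> v1.equals(v2));
assertThat(bothEqual)
    .succeedsWithin(Duration.ofSeconds(1))
    .isEqualTo(true);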

How to use secured queries in Apache Jena 3.10.0?

I am trying to build a secured query in Apache Jena v3.10.0.
I want to pass some query, modify it according to an existing SecurityEvaluator, and execute it later.
However, I don't understand how this should be used with certain query formats.
I tried doing it using a simplified version of SecuredQueryEngine.
public class GenSecuredEngine extends QueryEngineMain {
    private static Logger LOG = LoggerFactory
            .getLogger(SecuredQueryEngine.class);
    private SecurityEvaluator securityEvaluator;
    private Node graphIRI;

    public GenSecuredEngine(final Query query, final DatasetGraph dataset,
            final SecurityEvaluator evaluator,
            final Binding input, final Context context) {
        super(query, dataset, input, context);
        this.securityEvaluator = evaluator;
        graphIRI = NodeFactory.createURI("urn:x-arq:DefaultGraph");
    }

    @Override
    protected Op modifyOp(final Op op) {
        final OpRewriter rewriter = new OpRewriter(securityEvaluator, graphIRI);
        LOG.debug("Before: {}", op);
        op.visit(rewriter);
        Op result = rewriter.getResult();
        result = result == null ? op : result;
        LOG.debug("After: {}", result);
        result = super.modifyOp(result);
        LOG.debug("After Optimize: {}", result);
        return result;
    }
}
Then we have the following code, which turns the given query into an Op, checks the permissions, and builds the new Op object.
Op oporiginal = new AlgebraGenerator().compile(query);
Op result = securedEngine.modifyOp(oporiginal);
System.out.println(OpAsQuery.asQuery(result));
If I pass a query like this
select *
where {
  graph <forbiddenGraphUri> {?a ?b ?c}
}
then everything works fine: the <forbiddenGraphUri> is checked against the SecurityEvaluator, which throws org.apache.jena.shared.ReadDeniedException: Model permissions violation.
But what if the user wants to execute something like this:
select * where { graph ?g {?a ?b ?c}}
In this case we can only check the URI of "?g", which is not really informative. I understand that the AlgebraGenerator does not fill in the missing spots, so this is probably the wrong approach.
So, how can it be done? For example, if the user wants to run the query against many named graphs, how do we filter out the graphs that are not allowed? Is it possible at all with the existing tools?

What is an alternative for Lucene Query's extractTerms?

In Lucene 4.6.0 there was a method extractTerms that provided the extraction of terms from a query (Query 4.6.0). However, as of Lucene 6.2.1 it no longer exists (Query Lucene 6.2.1). Is there a valid alternative for it?
What I need is to extract the terms (and corresponding fields) of a Query built by QueryParser.
Maybe not the best answer, but one way is to use the same analyzer and tokenize the query string:
Analyzer anal = new StandardAnalyzer();
TokenStream ts = anal.tokenStream("title", query); // string query
CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println(termAtt.toString());
}
anal.close();
I have temporarily solved my problem with the following code. Smarter alternatives are welcome:
QueryParser qp = new QueryParser("title", a); // a is an Analyzer
Query q = qp.parse(query); // query is the query string
Set<Term> termQuerySet = new HashSet<Term>();
Weight w = searcher.createWeight(q, true, 3.4f); // searcher is an IndexSearcher
w.extractTerms(termQuerySet);
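For completeness, a rough sketch under the assumption that you can move to a newer Lucene (8.2 or later, where Query.visit and QueryVisitor exist; this is an assumption about your upgrade path, not something available in 6.2.1), since a QueryVisitor is the replacement Lucene eventually introduced for this kind of term extraction:
Set<Term> terms = new HashSet<>();
Query q2 = new QueryParser("title", a).parse(query); // same a/query as in the snippet above
q2.visit(QueryVisitor.termCollector(terms)); // collects the terms (with their fields) into the set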

What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?

Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable-length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially due to the lack of documentation), and managed to build an input pipeline using tf.train.Example just fine instead.
Are there any advantages to using tf.train.SequenceExample? Using the standard Example protocol when there is a dedicated one for variable-length sequences seems like a cheat, but does it have any consequences?
Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:
message BytesList { repeated bytes value = 1; }
message FloatList { repeated float value = 1 [packed = true]; }
message Int64List { repeated int64 value = 1 [packed = true]; }
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
};
message Features { map<string, Feature> feature = 1; };
message Example { Features features = 1; };
message FeatureList { repeated Feature feature = 1; };
message FeatureLists { map<string, FeatureList> feature_list = 1; };
message SequenceExample {
    Features context = 1;
    FeatureLists feature_lists = 2;
};
An Example contains a Features, which contains a mapping from feature name to Feature, which contains either a bytes list, a float list, or an int64 list.
A SequenceExample also contains a Features, but it also contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?
Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.
For example, if you handle text, you can represent it as one big string:
from tensorflow.train import BytesList
BytesList(value=[b"This is the first sentence. And here's another."])
Or you could represent it as a list of words and tokens:
BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
b"'s", b"another", b"."])
Or you could represent each sentence separately. That's where you would need a list of lists:
from tensorflow.train import BytesList, Feature, FeatureList
s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
Then create the SequenceExample:
from tensorflow.train import SequenceExample, FeatureLists
seq = SequenceExample(feature_lists=FeatureLists(feature_list={
    "sentences": fl
}))
And you can serialize it and perhaps save it to a TFRecord file.
data = seq.SerializeToString()
Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
The link you provided lists some benefits. You can see how parse_single_sequence_example is used here: https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py
If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.

Using ShingleFilter to build a customized analyzer in PyLucene

I am pretty new to Lucene and PyLucene. This is a problem I ran into when using PyLucene to write a customized analyzer to tokenize text into bigrams.
The code for analyzer class is:
class BiGramShingleAnalyzer(PythonAnalyzer):
    def __init__(self, outputUnigrams=False):
        PythonAnalyzer.__init__(self)
        self.outputUnigrams = outputUnigrams

    def tokenStream(self, field, reader):
        result = ShingleFilter(LowerCaseTokenizer(Version.LUCENE_35, reader))
        result.setOutputUnigrams(self.outputUnigrams)
        #print 'result is', result
        return result
I used ShingleFilter on the TokenStream produced by LowerCaseTokenizer. When I call the tokenStream function directly, it works just fine:
str = 'divide this sentence'
bi = BiGramShingleAnalyzer(False)
sf = bi.tokenStream('f', StringReader(str))
while sf.incrementToken():
    print sf
(divide this,startOffset=0,endOffset=11,positionIncrement=1,type=shingle)
(this sentence,startOffset=7,endOffset=20,positionIncrement=1,type=shingle)
But when I tried to build a query parser using this analyzer, a problem occurred:
parser = QueryParser(Version.LUCENE_35, 'f', bi)
query = parser.parse(str)
The resulting query is empty.
After I added a print statement in the tokenStream function, I found that when I call parser.parse(str), the print statement in tokenStream actually gets called 3 times (there are 3 words in my str variable). It seems to me the parser pre-processes the str I pass to it and calls the tokenStream function on each piece of that pre-processing.
Any thoughts on how I should make the analyzer work, so that when I pass it to the query parser, the parser parses a string into bigrams?
Thanks in advance!