Creating Titan indexed types for elasticsearch - indexing

I am having problems getting the elastic search indexes to work correctly with Titan Server. I currently have a local Titan/Cassandra setup using Titan Server 0.4.0 with elastic search enabled. I have a test graph 'bg' with the following properties:
Vertices have two properties, "type" and "value".
Edges have a number of other properties with names like "timestamp", "length" and so on.
I am running titan.sh with the rexster-cassandra-es.xml config, and my configuration looks like this:
storage.backend = "cassandra"
storage.hostname = "127.0.0.1"
storage.index.search.backend = "elasticsearch"
storage.index.search.directory = "db/es"
storage.index.search.client-only= "false"
storage.index.search.local-mode = "true"
This configuration is the same in the bg config in Rexter and the groovy script that loads the data.
When I load up Rexster client and type in g = rexster.getGraph("bg"), I can perform an exact search using g.V.has("type","ip_address") and get the correct vertices back. However when I run the query:
g.V.has("type",CONTAINS,"ip_")
I get the error:
Data type of key is not compatible with condition
I think this is something to do with the type "value" not being indexed. What I would like to do is make all vertex and edge attributes indexable so that I can use any of the string matching functions on them as necessary. I have already tried making an indexed key using the command
g.makeKey("type").dataType(String.class).indexed(Vertex.class).indexed("search",Vertex.class).make()
but to be honest I have no idea how this works. Can anyone help point me in the right direction with this? I am completely unfamiliar with elastic search and Titan type definitions.
Thanks,
Adam

the Wiki page Indexing Backend Overview should answer every little detail of your questions.
Cheers,
Daniel

Related

Is there a way to execute text gremlin query with PartitionStrategy

I'm looking for an implementation to run text query ex: "g.V().limit(1).toList()" while using the PatitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph paritions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using raw text query instead of a GLV. Im able to query only selected partitions using pythongremlin, but there is no reference implementation I could find to run a text query on a tenant.
Here is tenant query implementation
connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
write_partition="tenant_a",
read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V.limit(1).next() <---- query on partition only
Here is how I execute raw query on entire graph, but Im looking for implementation to run text based queries on only selected partitions.
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one() <-- runs on entire graph.
print(results)
client.close()
Any suggestions appreciated? TIA
It depends on how the backend store handles text mode queries, but for the query itself, essentially you just need to use the Groovy/Java style formulation. This will work with GremlinServer and Amazon Neptune. For other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit('
g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
writePartition: "b",
readPartitions: ["b"])).V().count()')

Elasticsearch connector and owl:sameAs on graphdb

im using ruleset OWL-RL optimized and using elasticsearch connector for search.
All i want is to recoginize the entity has same value and merge all values into one document in es.
Im doing this by:
Person - hasPhone - Phone and have InverseFunctionalProperty on relation hasPhone
Example:
http://example.com#1 http://example.com#hasPhone http://example.com#111.
http://example.com#2 http://example.com#hasPhone http://example.com#111.
=> #1 owl:sameAs #2
when i search by ES, i receive two result both #1, #2 . But when i repair connector i get only one result (that what i want).
1./ I want to ask is there a way that ES connector auto merge doc and delete previous doc ?, because i dont want to repair connector all the time. When i set manageIndex:false, it always get two results when searching.
2./ How to receive only one record, exculding the others have owl:sameAs with this record by SPARQL.
3./ Is there a better ruleset for owl:sameAs and InverseFunctionalProperty for reference ?
The connector watches for changes to property paths (as specified by the index definition) but I don't think it can detect the merging (smushing) caused by sameAs, that's why you need the rebuilding. If this is an important case, I can post an improvement issue for you, but please email graphdb-support and vladimir.alexiev at ontotext with a description of your business case (and link to this question)
If you have "sameAs optimization" enabled for the repo (which it is by default) and do NOT have "expand sameAs URLs" in the query, you should get only 1 result for queries like ?x <http://example.com#hasPhone> <http://example.com#111>
OWL-RL-Optimized is good for your case. (The rulesets supporting InverseFunctionalProperty are OWL-RL, OWL-QL, rdfsPlus and their optimized variants.)

Create subgraph query in Gremlin around single node with outgoing and incoming edges

I have a large Janusgraph database and I'd to create a subgraph centered around one node type and including incoming and outgoing nodes of specific types.
In Cypher, the query would look like this:
MATCH (a:Journal)N-[:PublishedIn]-(b:Paper{paperTitle:'My Paper Title'})<-[:AuthorOf]-(c:Author)
RETURN a,b,c
This is what I tried in Gremlin:
sg = g.V().outE('PublishedIn').subgraph('j_p_a').has('Paper','paperTitle', 'My Paper Title')
.inE('AuthorOf').subgraph('j_p_a')
.cap('j_p_a').next()
But I get a syntax error. 'AuthorOf' and 'PublishedIn' are not the only edge types ending at 'Paper' nodes.
Can someone show me how to correctly execute this query in Gremlin?
As written in your query, the outE step yields edges and the has step will check properties on those edges, following that the query processor will expect an inV not another inE. Without your data model it is hard to know exactly what you need, however, looking at the Cypher I think this is what you want.
sg = g.V().outE('PublishedIn').
subgraph('j_p_a').
inV().
has('Paper','paperTitle', 'My Paper Title').
inE('AuthorOf').
subgraph('j_p_a')
cap('j_p_a').
next()
Edited to add:
As I do not have your data I used my air-routes graph. I modeled this query on yours and used some select steps to limit the data size processed. This seems to work in my testing. Hopefully you can see the changes I made and try those in your query.
sg = g.V().outE('route').as('a').
inV().
has('code','AUS').as('b').
select('a').
subgraph('sg').
select('b').
inE('contains').
subgraph('sg').
cap('sg').
next()

Ravendb Multi Get with index

I am trying to use ravendb (build 960) multi get to get the results of several queries.
I am posting to /multi_get with:
[
{"Url":"/databases/myDb/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/databases/myDb/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
The server responds with results for each query, however it responds with EVERY document for each index. It looks like neither the query is used, or the fetch parameters.
Is there something I am doing wrong here?
Multi GET assumes all the urls are local to the current database, you can specify urls starting with /datbases/foo
You specify that in the multi get url.
Change you code to generate:
[
{"Url":"/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
And make sure that you multi get goes to
/databases/mydb/multi_get

Endeca UrlENEQuery java API search

I'm currently trying to create an Endeca query using the Java API for a URLENEQuery. The current query is:
collection()/record[CONTACT_ID = "xxxxx" and SALES_OFFICE = "yyyy"]
I need it to be:
collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and
SALES_OFFICE = "yyyy"]
Currently this is being done with an ERecSearchList with CONTACT_ID and the string I'm trying to match in an ERecSearch object, but I'm having difficulty figuring out how to get the UrlENEQuery to generate the or in the correct fashion as I have above. Does anyone know how I can do this?
One of us is confused on multiple levels:
Let me try to explain why I am confused:
If Contact_ID and Sales_Office are different dimensions, where Contact_ID is a multi-or dimension, then you don't need to use EQL (the xpath like language) to do anything. Just select the appropriate dimension values and your navigation state will reflect the query you are trying to build with XPATH. IE CONTACT_IDs "ORed together" with SALES_OFFICE "ANDed".
If you do have to use EQL, then the only way to modify it (provided that you have to modify it from the returned results) is via string manipulation.
ERecSearchList gives you ability to use "Search Within" functionality which functions completely different from the EQL filtering, though you can achieve similar results by using tricks like searching only specified field (which would be separate from the generic search interface") I am still not sure what's the connection between ERecSearchList and the EQL expression above?
Having expressed my confusion, I think what you need to do is to use String manipulation to dynamically build the EQL expression and add it to the Query.
A code example of what you are doing would be extremely helpful as well.