Need to join vertices in DSE Graph (DataStax)

I have created property keys and vertex labels like this:
schema.propertyKey('REFERENCE_ID').Int().multiple().create();
schema.propertyKey('Name').Text().single().create();
schema.propertyKey('PARENT_NAME').Text().single().create();
...
schema.propertyKey('XXX').Text().single().create();
schema.vertexLabel('VERT1').properties("REFERENCE_ID", ... "PROPERTY10" ... "PROPERTY15")  // 15 properties
schema.vertexLabel('VER2').properties("REFERENCE_ID", ... "PROPERTY20" ... "PROPERTY35")   // 35 properties
schema.vertexLabel('VERT3').properties("REFERENCE_ID", ... "PROPERTY20" ... "PROPERTY25")  // 25 properties
schema.vertexLabel('VERT4').properties("REFERENCE_ID", ... "PROPERTY20" ... "PROPERTY25")  // 25 properties
I loaded the CSV data using the DSE Graph Loader (CSV to vertex), and then created the edges:
schema.edgeLabel('ed1').single().create()
schema.edgeLabel('ed1').connection('VERT1', 'VER2').add()
schema.edgeLabel('ed1').connection('VERT1', 'VERT3').add()
schema.edgeLabel('ed2').single().create()
schema.edgeLabel('ed2').connection('VERT3', 'VERT4').add()
But I don't know how to map the data between the vertices and edges. I want to join all these 4 vertex labels. Could you please help with this?
I'm new to DSE. I ran the above code in DataStax Studio successfully and I can see the loaded data. I need to join the vertices...
SQL code (I want the same in DSE Gremlin):
select v1.REFERENCE_ID,v2.name,v3.total from VERT1 v1
join VER2 v2 on v1.REFERENCE_ID=v2.REFERENCE_ID
join VERT3 v3 on v2.sid=v3.sid

There are two "main" options in DSE for adding edge data, plus a third if you're also using DSE Analytics.
One is to use Gremlin, as documented here: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/using/insertTraversalAPI.html
This is a traversal-based approach and may not be the best/fastest choice for bulk operations.
Another solution is to use the Graph Loader; check out the example with the .asEdge code sample here: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/dgl/dglCSV.html#dglCSV
If you have DSE Analytics enabled, you can also use DataStax's DSE GraphFrame implementation, which leverages Spark, to perform this task as well. Here's an example: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameImport.html
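To make the first option concrete for your schema: once the vertices are loaded, you create the edges with a traversal that matches vertices on the shared key. A minimal sketch, assuming REFERENCE_ID holds a single matching value per vertex (labels and property names are taken from your question; the traversal itself is mine):
// Link each VERT1 vertex to the VER2 vertex with the same REFERENCE_ID
g.V().hasLabel('VERT1').as('v1').
  V().hasLabel('VER2').as('v2').
  where('v1', eq('v2')).by('REFERENCE_ID').
  addE('ed1').from('v1').to('v2')
Once the edges exist, the SQL-style join becomes a traversal across them, roughly:
// Rough equivalent of "select v1.REFERENCE_ID, v2.name from VERT1 v1 join VER2 v2 ..."
g.V().hasLabel('VERT1').as('v1').
  out('ed1').hasLabel('VER2').as('v2').
  select('v1', 'v2').
    by('REFERENCE_ID').
    by('Name')
For millions of rows, prefer the Graph Loader or GraphFrames options above, since the mid-traversal V() here effectively scans VER2 once for every VERT1 vertex.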

Related

Is there a way to execute a text Gremlin query with PartitionStrategy?

I'm looking for a way to run a text query, e.g. "g.V().limit(1).toList()", while using PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for a way to create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlin-python, but I couldn't find a reference implementation for running a text query against a single tenant.
Here is the tenant query implementation:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query runs on the selected partition only
Here is how I execute a raw query on the entire graph, but I'm looking for a way to run text-based queries on only selected partitions:
from gremlin_python.driver.client import Client

client = Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text mode queries, but for the query itself, essentially you just need to use the Groovy/Java style formulation. This will work with GremlinServer and Amazon Neptune. For other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit("""
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()
""")

Create subgraph query in Gremlin around single node with outgoing and incoming edges

I have a large JanusGraph database and I'd like to create a subgraph centered around one node type, including incoming and outgoing nodes of specific types.
In Cypher, the query would look like this:
MATCH (a:Journal)<-[:PublishedIn]-(b:Paper{paperTitle:'My Paper Title'})<-[:AuthorOf]-(c:Author)
RETURN a,b,c
This is what I tried in Gremlin:
sg = g.V().outE('PublishedIn').subgraph('j_p_a').has('Paper','paperTitle', 'My Paper Title')
.inE('AuthorOf').subgraph('j_p_a')
.cap('j_p_a').next()
But I get a syntax error. 'AuthorOf' and 'PublishedIn' are not the only edge types ending at 'Paper' nodes.
Can someone show me how to correctly execute this query in Gremlin?
As written in your query, the outE step yields edges and the has step will check properties on those edges; following that, the query processor will expect an inV, not another inE. Without your data model it is hard to know exactly what you need; however, looking at the Cypher, I think this is what you want:
sg = g.V().outE('PublishedIn').
       subgraph('j_p_a').
       inV().
       has('Paper', 'paperTitle', 'My Paper Title').
       inE('AuthorOf').
       subgraph('j_p_a').
       cap('j_p_a').
       next()
Edited to add:
As I do not have your data, I used my air-routes graph. I modeled this query on yours and used some select steps to limit the data size processed. This seems to work in my testing. Hopefully you can see the changes I made and try them in your query:
sg = g.V().outE('route').as('a').
inV().
has('code','AUS').as('b').
select('a').
subgraph('sg').
select('b').
inE('contains').
subgraph('sg').
cap('sg').
next()

Using Apache Beam JsonTimePartitioning to create time-partitioned tables in BigQuery

I have tried using the JsonTimePartitioning class in the Apache Beam Java SDK to write data to dynamic tables in BigQuery, but I get "cannot find symbol" for the class JsonTimePartitioning.
This is how I try to import the class:
import com.google.api.services.bigquery.model.JsonTimePartitioning;
and this is how I try to use it in my pipeline:
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
.withJsonTimePartitioningTo(new JsonTimePartitioning().setType("DAY")));
I can't seem to find JsonTimePartitioning anywhere. Can you point to an example that you are trying to follow? The existing methods on BigQueryIO either accept an instance of TimePartitioning, or a ValueProvider for a String that is actually a JSON-serialized instance of the same TimePartitioning. And in fact, when calling the TimePartitioning version of the method, you still end up just serializing it into a string internally. You can find an example of how it's used here:
Loading historical data into time-partitioned BigQuery tables: To load historical data into a time-partitioned BigQuery table, specify BigQueryIO.Write.withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning) with a field used for column-based partitioning. For example:
PCollection<Quote> quotes = ...;
quotes.apply(BigQueryIO.write()
    .withSchema(schema)
    .withFormatFunction(quote -> new TableRow()
        .set("timestamp", quote.getTimestamp())
        .set(..other columns..))
    .to("my-project:my_dataset.my_table")
    .withTimePartitioning(new TimePartitioning().setField("time")));

Find all the paths between a group of defined vertices in DataStax DSE Graph

According to this, the following query:
g.V(ids).as("a").repeat(bothE().otherV().simplePath()).times(5).emit(hasId(within(ids))).as("b").filter(select(last,"a","b").by(id).where("a", lt("b"))).path().by().by(label)
does not work in DataStax Graph, because the lt("b") part cannot work on a DataStax vertex id, which has a JSON-like format:
{
'~label=person',
member_id=54666,
community_id=505443455
}
How can I change the lt("b") part in order for the query to work? Please help.
You can pick any property that is comparable. E.g., if all your vertices have a name property:
g.V(ids).as("a").repeat(bothE().otherV().simplePath()).times(5).
emit(hasId(within(ids))).as("b").
filter(select(last,"a","b").by("name").where("a", lt("b"))).
path().by().by(label)

Creating Titan indexed types for elasticsearch

I am having problems getting the Elasticsearch indexes to work correctly with Titan Server. I currently have a local Titan/Cassandra setup using Titan Server 0.4.0 with Elasticsearch enabled. I have a test graph 'bg' with the following properties:
Vertices have two properties, "type" and "value".
Edges have a number of other properties with names like "timestamp", "length" and so on.
I am running titan.sh with the rexster-cassandra-es.xml config, and my configuration looks like this:
storage.backend = "cassandra"
storage.hostname = "127.0.0.1"
storage.index.search.backend = "elasticsearch"
storage.index.search.directory = "db/es"
storage.index.search.client-only= "false"
storage.index.search.local-mode = "true"
This configuration is the same in the bg config in Rexster and the Groovy script that loads the data.
When I load up Rexster client and type in g = rexster.getGraph("bg"), I can perform an exact search using g.V.has("type","ip_address") and get the correct vertices back. However when I run the query:
g.V.has("type",CONTAINS,"ip_")
I get the error:
Data type of key is not compatible with condition
I think this is something to do with the type "value" not being indexed. What I would like to do is make all vertex and edge attributes indexable so that I can use any of the string matching functions on them as necessary. I have already tried making an indexed key using the command
g.makeKey("type").dataType(String.class).indexed(Vertex.class).indexed("search",Vertex.class).make()
but to be honest I have no idea how this works. Can anyone help point me in the right direction with this? I am completely unfamiliar with Elasticsearch and Titan type definitions.
Thanks,
Adam
The Wiki page Indexing Backend Overview should answer every little detail of your questions.
Cheers,
Daniel
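For reference, here is a minimal sketch of the kind of key definition that page describes, assuming Titan 0.4's KeyMaker API, that the keys are created before any data using them is loaded, and that the Elasticsearch index is registered under the name "search" (as in the rexster-cassandra-es.xml config):
// Register the keys in both the standard index and the external "search" (Elasticsearch) index
// (Text is com.thinkaurelius.titan.core.attribute.Text)
g.makeKey("type").dataType(String.class).indexed(Vertex.class).indexed("search", Vertex.class).make()
g.makeKey("value").dataType(String.class).indexed(Vertex.class).indexed("search", Vertex.class).make()
g.commit()
// Full-text predicates such as Text.CONTAINS then run against the Elasticsearch-backed index;
// note that CONTAINS matches whole tokens, not arbitrary substrings like "ip_"
g.V.has("type", Text.CONTAINS, "ip_address")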