Create subgraph query in Gremlin around single node with outgoing and incoming edges

Create subgraph query in Gremlin around single node with outgoing and incoming edges - cypher

I have a large Janusgraph database and I'd to create a subgraph centered around one node type and including incoming and outgoing nodes of specific types.
In Cypher, the query would look like this:
MATCH (a:Journal)N-[:PublishedIn]-(b:Paper{paperTitle:'My Paper Title'})<-[:AuthorOf]-(c:Author)
RETURN a,b,c
This is what I tried in Gremlin:
sg = g.V().outE('PublishedIn').subgraph('j_p_a').has('Paper','paperTitle', 'My Paper Title')
.inE('AuthorOf').subgraph('j_p_a')
.cap('j_p_a').next()
But I get a syntax error. 'AuthorOf' and 'PublishedIn' are not the only edge types ending at 'Paper' nodes.
Can someone show me how to correctly execute this query in Gremlin?

As written in your query, the outE step yields edges and the has step will check properties on those edges, following that the query processor will expect an inV not another inE. Without your data model it is hard to know exactly what you need, however, looking at the Cypher I think this is what you want.
sg = g.V().outE('PublishedIn').
subgraph('j_p_a').
inV().
has('Paper','paperTitle', 'My Paper Title').
inE('AuthorOf').
subgraph('j_p_a')
cap('j_p_a').
next()
Edited to add:
As I do not have your data I used my air-routes graph. I modeled this query on yours and used some select steps to limit the data size processed. This seems to work in my testing. Hopefully you can see the changes I made and try those in your query.
sg = g.V().outE('route').as('a').
inV().
has('code','AUS').as('b').
select('a').
subgraph('sg').
select('b').
inE('contains').
subgraph('sg').
cap('sg').
next()

Related

PromQL: How to get cpu usage of all replicasets and containers given the cluster, namesapce, and deployment?

I am trying to write PromQL on Prometheus UI to get cpu usage of all replicasets and their containers by fixing the cluster, namespace, and deployment. My desirable outcome is to graph each {replicaset, container} pair cpu usage on the same graph. Since there are no labels within container_cpu_usage_seconds_total that allow me to group by replicaset name. I am sure that I have to retrieve replicaset name and somehow aggregate by containers using the kube_pod_info metrics. And then, join different metrics to get what I want.
Below is what I came up with right now:
avg by (replicaset, container) (
container_cpu_usage_seconds_total
* on (replicaset) group_left (created_by_kind, created_by_name)
(kube_pod_info {app_kubernetes_io_name="kube-state-metrics", cluster=~"${my_cluster}", namespace=~"${my_namespace}", created_by_kind=~"ReplicaSet"} * 0)
)
I got an error saying "many-to-many matching not allowed: matching labels must be unique on one side".
My desirable output is:
*{r, c} means certain {replicaset, container} pair

How to automatically break down a SQL-like query with many joins into discrete, independent steps?

Note: This is a learning exercise to learn how to implement a SQL-like relational database. This is just one thin slice of a question in the overall grand vision.
I have the following query, given a test database with a few hundred records:
select distinct "companies"."name"
from "companies"
inner join "projects" on "projects"."company_id" = "companies"."id"
inner join "posts" on "posts"."project_id" = "projects"."id"
inner join "comments" on "comments"."post_id" = "posts"."id"
inner join "addresses" on "addresses"."company_id" = "companies"."id"
where "addresses"."name" = 'Address Foo'
and "comments"."message" = 'Comment 3/3/2/1';
Here, the query is kind of unrealistic, but it demonstrates the point which I am trying to make. The point is to have a query with a few joins, so that I can figure out how to write this in sequential steps.
The first part of the question is (which I think I've partially figured out), is how do you write these joins as a sequence of independent steps, with the output of one fed into the input of the other? Also, is there more than one way to do it?
// step 1
let companies = select('companies')
// step 2
let projects = join(companies, select('projects'), 'id', 'company_id')
// step 3
let posts = join(projects, select('posts'), 'id', 'project_id')
// step 4
let comments = join(posts, select('comments'), 'id', 'post_id')
// step 5
let finalPosts = posts.filter(post => !!comments.find(comment => comment.post_id === post.id))
// step 6
let finalProjects = projects.filter(project => !!posts.find(post => post.project_id === project.id))
// step 7, could also be run in parallel to step 2 potentially
let addresses = join(companies, select('addresses'), 'id', 'company_id')
// step 8
let finalCompanies = companies.filter(company => {
return !!posts.find(post => post.company_id === company.id)
&& !!addresses.find(address => address.company_id === company.id)
})
These filters could probably be more optimized using indexes of some sort, but that is beside the point I think. This just demonstrates that there seem to be about 8 steps to find the companies we are looking for.
The main question is, how do you automatically figure out the steps from the SQL query?
I am not asking about how to parse the SQL query into an AST. Assume we have some sort of object structure we are dealing with, like an AST, to start.
How would you have to have the SQL query in structured object form, such that it would lead to these 8 steps? I would like to be able to specify a query (using a custom JSON-like syntax, not SQL), and then have it divide the query into these steps to divide and conquer so to speak and perform the queries in parts (for learning how to implement distributed databases). But I don't see how we go from SQL-like syntax, to 8 steps. Can you show how that might be done?
Here is the full code for the demo, which you can run with psql postgres -f test.sql. The result should be "Company 3".
Basically looking for a high level algorithm (doesn't even need to be code), which describes the key way you would break down some sort of AST-like object representation of a SQL query, into the actual planned steps of the query.
My algorithm looks like this in my head:
represent SQL query in object tree.
convert object tree to steps.
I am not really sure what (1) should be structured like, and even if we had some sort of structure, I'm not sure how to get that to complete (2). Looking for more details on the implementations of these steps, mainly step (2).
My "object structure" for step 1 would be something like this:
const query = {
select: {
distinct: true,
columns: ['companies.name'],
from: ['companies'],
},
joins: [
{
type: 'inner join',
table: 'projects',
left: 'projects.company_id',
right: 'companies.id',
},
...
],
conditions: [
{
left: 'addresses.name',
op: '=',
right: 'Address Foo'
},
...
]
}
I am not sure how useful that is, but it doesn't relate to steps at all. At a high level, what kind of code would I have to write to convert that object sort of structure into steps? Seems like one potential avenue is do a topological sort on the joins. But then you need to combine that with the select and conditions somehow, not sure how you would even begin to programmatically know what step should be before what other step, or even what the steps are. Maybe if I somehow could break it into known "chunks", then it would be simple to apply TOP sort to it after that, but then the question is still, how to get into chunks from the object structure / SQL?
Basically, I have been reading about the theory behind "query planning and optimization", but don't know how to apply it in this regard. How did this site do it?
One aspect is breaking at least the where conditions into CNF.

Implementing joins is a huge topic which is probably out of scope for a StackOverflow answer.
If you're looking for practical information about how joins are implemented, I would suggest...
The Join Operation section of Use The Index, Luke for different types of join implementation.
Section 7 of the The SQLite Query Optimizer Overview covers joins. And reading the SQLite source code. It is about as small a practical SQL implementation will get.
The output of explain in Postgresql gives very detailed information about how it has implemented the query. And they are explained in Operator Optimization Information

Is there a way to execute text gremlin query with PartitionStrategy

I'm looking for an implementation to run text query ex: "g.V().limit(1).toList()" while using the PatitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph paritions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using raw text query instead of a GLV. Im able to query only selected partitions using pythongremlin, but there is no reference implementation I could find to run a text query on a tenant.
Here is tenant query implementation
connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
write_partition="tenant_a",
read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V.limit(1).next() <---- query on partition only
Here is how I execute raw query on entire graph, but Im looking for implementation to run text based queries on only selected partitions.
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one() <-- runs on entire graph.
print(results)
client.close()
Any suggestions appreciated? TIA

It depends on how the backend store handles text mode queries, but for the query itself, essentially you just need to use the Groovy/Java style formulation. This will work with GremlinServer and Amazon Neptune. For other backends you will need to make sure that this syntax is supported. So from Python you would use something like:
client.submit('
g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
writePartition: "b",
readPartitions: ["b"])).V().count()')

How to use multiple union step in gremlin query?

I want to convert following cypher query in gremlin but facing issue while using union.
match d=(s:Group{id:123})<-[r:Member_Of*]-(p:Person) with d,
RELATIONSHIPS(d) as rels
WHERE NONE(rel in rels WHERE EXISTS(rel.`Ceased On`))
return *
UNION
match d=(s:Group{id:123})<-[r:Member_Of*]-(p:Group)-[r1:
Member_Of*]->(c:Group) with d,
RELATIONSHIPS(d) as rels
WHERE NONE(rel in rels WHERE EXISTS(rel.`Ceased On`))
return *
UNION
match d=(c:Group{id:123})-[r:Member_Of*]->(p:Group) with d,
RELATIONSHIPS(d) as rels
WHERE NONE(rel in rels WHERE EXISTS(rel.`Ceased On`))
return *
UNION
match d=(s:Group{id:123})<-[r:Member_Of*]-(p:Group)<-[r1:
Member_Of*]-(c:Person) with d,
RELATIONSHIPS(d) as rels
where NONE(rel in rels WHERE EXISTS(rel.`Ceased On`))
return *
In above cypher query, source vertex is Group, which has id '123', so for incoming and outgoing edge, I have created following gremlin query.
g.V().hasLabel('Group').has('id',123)
union(
__.inE('Member_Of').values('Name'),
__.outE('Member_Of').values('Name'))
.path()
Now I have to traverse incoming edge for the vertex, which is incoming vertex for source vertex in above query, where I am confused with union syntax.
Please help, Thanks :)

This part of the Cypher query
match d=(s:Group{id:123})<-[r:Member_Of*]-(p:Group)<-[r1:Member_Of*]-(c:Person)
In Gremlin, can be expressed as
g.V().has('Group','id',123).
repeat(inE('Member_Of').outV()).until(hasLabel('G')).
repeat(inE('Member_Of').outV()).until(hasLabel('Person')).
path().
by(elementMap())
If there are not really multiple hops involved (i.e. in Cypher if you do not really need the '*') you can remove the repeat construct and just keep the inE and outV steps. Anyway, this bit of Gremlin will get you the nodes and edges (relationships) along with all their properties etc. as the path will contain their elementMap.
Note that a straight port from Cypher to Gremlin may not take full advantage of the target database. For example, many Gremlin enabled stores allow user provided (real) ID values. This makes querying a lot more efficient as you can use something like this:
g.V('123')
to directly find a vertex.
Please add a comment below if this does not fully unblock you.
UPDATED: 2021-10-29
I used the air routes data set to test that the query pattern works, using this query:
g.V().has('airport','code','AUS').
repeat(inE('contains').outV()).until(hasLabel('country')).limit(1).
repeat(outE('contains').inV()).until(hasLabel('airport')).limit(3).
path().
by(elementMap())

Following is the total replacement of cypher query with gremlin query.
g.V().has('Group','id',123). //source vertex
union(
//Following represent incoming Member_Of edge towards source vertex from Person Node, untill the last node in the chain
repeat(inE('Member_Of').outV()).until(hasLabel('Person')),
//Following represent incoming Member_Of edge from Group to source vertex and edge outwards from same Group, untill the last node in the chain
repeat(inE('Member_Of').outV().hasLabel('Group').simplePath()).until(inE().count().is(0)).
repeat(outE('Member_Of').inV().hasLabel('Group').simplePath()).until(inE().count().is(0)),
//Following represent outgoing Member_Of edge from source vertex to another Group node, untill the last node in the chain
repeat(outE('Member_Of').inV().hasLabel('Group').simplePath()).until(outE().count().is(0)),
//Following represent incoming Member_Of edge from Group to source vertex and incoming edge from person to Group untill the last node in the chain
repeat(inE('Member_Of').outV().hasLabel('Group').simplePath()).until(inE().count().is(0)).
repeat(inE('Member_Of').outV().hasLabel('Person').simplePath()).until(inE().count().is(0))
).
path().by(elementMap())

orientdb sql update edge?

I have been messing around with orientdb sql, and I was wondering if there is a way to update an edge of a vertex, together with some data on it.
assuming I have the following data:
Vertex: Person, Room
Edge: Inside (from Person to Room)
something like:
UPDATE Persons SET phone=000000, out_Inside=(
select #rid from Rooms where room_id=5) where person_id=8
obviously, the above does not work. It throws exception:
Error: java.lang.ClassCastException: com.orientechnologies.orient.core.id.ORecordId cannot be cast to com.orientechnologies.orient.core.db.record.ridbag.ORidBag
I tried to look at the sources at github searching for a syntax for bag with 1 item,
but couldn't find any (found %, but that seems to be for serialization no for SQL).
(1) Is there any way to do that then? how do I update a connection? Is there even a way, or am I forced to create a new edge, and delete the old one?
(2) When writing this, it came to my mind that perhaps edges are not the way to go in this case. Perhaps I should use a LINK instead. I have to say i'm not sure when to use which, or what are the implications involved in using any of them. I did found this though:
https://groups.google.com/forum/#!topic/orient-database/xXlNNXHI1UE
comment 3 from the top, of Lvc#, where he says:
"The suggested way is to always create an edge for relationships"
Also, even if I should use a link, please respond to (1). I would be happy to know the answer anyway.
p.s.
In my scenario, a person can only be at one room. This will most likely not change in the future. Obviously, the edge has the advantage that in case I might want to change it (however improbable that may be), it will be very easy.
Solution (partial)
(1) The solution was simply to remove the field selection. Thanks for Lvca for pointing it out!
(2) --Still not sure--

CREATE EDGE and DELETE EDGE commands have this goal: avoid the user to fight with underlying structure.
However if you want to do it (a little "dirty"), try this one:
UPDATE Persons SET phone=000000, out_Inside=(
select from Rooms where room_id=5) where person_id=8

update EDGE Custom_Family_Of_Custom
set survey_status = '%s',
apply_source = '%s'
where #rid in (
select level1_e.#rid from (
MATCH {class: Custom, as: custom, where: (custom_uuid = '%s')}.bothE('Custom_Family_Of_Custom') {as: level1_e} .bothV('Custom') {as: level1_v, where: (custom_uuid = '%s')} return level1_e
)
)
it works well

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas