DSE Graph Loader duplicate edges - datastax

I have the following CSV files: one with the persons, one with the addresses, and one with the person-address connections (one row in each file, plus a header). For testing purposes, on the first run I have:
config create_schema: true, load_new: true, load_threads: 3
The import succeeds, creating the vertices and edges (two vertices and one edge between them).
Now when I run the same script (same data, same input script) but with a different config:
config create_schema: false, load_new: false, load_threads: 3
It seems that the vertices didn't change, but I now have a duplicate edge (two vertices and two edges between the same vertices).
This is the code that I run:
inputfiledir = 'data/'
personInput = File.csv(inputfiledir + 'sna_person_test.csv').delimiter(',')
addressInput = File.csv(inputfiledir + 'sna_address_test.csv').delimiter(',')
personAddressInput = File.csv(inputfiledir + 'san_person_address_test.csv').delimiter(',')
load(personInput).asVertices {
    label "person"
    key "id"
}
load(addressInput).asVertices {
    label "address"
    key "id"
}
load(personAddressInput).asEdges {
    label "has_address"
    outV "person_id", {
        label "person"
        key "id"
    }
    inV "address_id", {
        label "address"
        key "id"
    }
}
Is there a way to avoid this?
Thanks

This happens because edges do not have an ID, so Graph Loader has no way to determine whether an edge is in fact a duplicate. Subsequent loads therefore duplicate the edges, but not the vertices, which are matched by their key.
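To make the difference concrete, here is a toy in-memory model in plain JavaScript. It is only a sketch of the behaviour described above, not Graph Loader code; `loadVertex`, `loadEdge` and `runLoad` are hypothetical names made up for the illustration.

```javascript
// Toy in-memory model (hypothetical, not the DSE Graph Loader API).
const graph = { vertices: new Map(), edges: [] };

// Vertices are loaded with a key, so loading the same row twice
// lands on the same map entry instead of creating a second vertex.
function loadVertex(label, id) {
  graph.vertices.set(`${label}:${id}`, { label, id });
}

// Edges have no key of their own, so every load appends a new edge.
function loadEdge(label, outKey, inKey) {
  graph.edges.push({ label, outKey, inKey });
}

function runLoad() {
  loadVertex('person', 1);
  loadVertex('address', 10);
  loadEdge('has_address', 'person:1', 'address:10');
}

runLoad();
runLoad(); // second run with the same data

const vertexCount = graph.vertices.size;
const edgeCount = graph.edges.length;
console.log(vertexCount, edgeCount);
```

After two runs the model still holds two vertices, but two edges between them, which is the situation described in the question.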

Related

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

I am currently working on an application that uses MongoDB as the data repository. I am mainly concerned about the graphLookup query to establish links between different people, based on what flights they took. My document contains an array field, that in turn contains key value pairs. I need to establish the links based on one of the key:value pairs of that array.
I have already tried some aggregation pipeline queries with $graphLookup as one of the stages, and they have all worked fine. But now that I am trying to use it with an array, I am drawing a blank.
Below is the array field from the first document:
"movementSegments":[
    {
        "carrierCode":"MO269",
        "departureDateTimeMillis":1550932676000,
        "arrivalDateTimeMillis":1551019076000,
        "departurePort":"DOH",
        "arrivalPort":"LHR",
        "departurePortText":"HAMAD INTERNATIONAL AIRPORT",
        "arrivalPortText":"LONDON HEATHROW",
        "serviceNameText":"",
        "serviceKey":"BA007_1550932676000",
        "departurePortLatLong":"25.273056,51.608056",
        "arrivalPortLatLong":"51.4706,-0.461941",
        "departureWeeklyTemporalSpatialWindow":"DOH_8",
        "departureMonthlyTemporalSpatialWindow":"DOH_2",
        "arrivalWeeklyTemporalSpatialWindow":"LHR_8",
        "arrivalMonthlyTemporalSpatialWindow":"LHR_2"
    }
]
The other document has the following field:
"movementSegments":[
    {
        "carrierCode":"MO269",
        "departureDateTimeMillis":1548254276000,
        "arrivalDateTimeMillis":1548340676000,
        "departurePort":"DOH",
        "arrivalPort":"LHR",
        "departurePortText":"HAMAD INTERNATIONAL AIRPORT",
        "arrivalPortText":"LONDON HEATHROW",
        "serviceNameText":"",
        "serviceKey":"BA003_1548254276000",
        "departurePortLatLong":"25.273056,51.608056",
        "arrivalPortLatLong":"51.4706,-0.461941",
        "departureWeeklyTemporalSpatialWindow":"DOH_4",
        "departureMonthlyTemporalSpatialWindow":"DOH_1",
        "arrivalWeeklyTemporalSpatialWindow":"LHR_4",
        "arrivalMonthlyTemporalSpatialWindow":"LHR_1"
    },
    {
        "carrierCode":"MO270",
        "departureDateTimeMillis":1548254276000,
        "arrivalDateTimeMillis":1548340676000,
        "departurePort":"DOH",
        "arrivalPort":"LHR",
        "departurePortText":"HAMAD INTERNATIONAL AIRPORT",
        "arrivalPortText":"LONDON HEATHROW",
        "serviceNameText":"",
        "serviceKey":"BA003_1548254276000",
        "departurePortLatLong":"25.273056,51.608056",
        "arrivalPortLatLong":"51.4706,-0.461941",
        "departureWeeklyTemporalSpatialWindow":"DOH_4",
        "departureMonthlyTemporalSpatialWindow":"DOH_1",
        "arrivalWeeklyTemporalSpatialWindow":"LHR_4",
        "arrivalMonthlyTemporalSpatialWindow":"LHR_1"
    }
]
And I am running the following query:
db.person_events.aggregate([
    { $match: { eventId: "22446688" } },
    {
        $graphLookup: {
            from: 'person_events',
            startWith: '$movementSegments.carrierCode',
            connectFromField: 'carrierCode',
            connectToField: 'carrierCode',
            as: 'carrier_connections'
        }
    }
])
The query above adds the carrier_connections array field to the document, but the array is empty. I expected both of my documents to be linked based on the carrier code.
Just to be clear about the query: the documents contain an eventId field, and the $match stage returns a single document.
Well, I don't know how I missed it, but here is the solution to my problem, which gives me the required results. Since carrierCode is nested inside the movementSegments array rather than being a top-level field, connectFromField and connectToField need the full path:
db.person_events.aggregate([
    { $match: { eventId: "22446688" } },
    {
        $graphLookup: {
            from: 'person_events',
            startWith: '$movementSegments.carrierCode',
            connectFromField: 'movementSegments.carrierCode',
            connectToField: 'movementSegments.carrierCode',
            as: 'carrier_connections'
        }
    }
])
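For anyone wondering why the field path matters, here is a rough sketch in plain JavaScript of how a dotted path can be resolved through an array of sub-documents. It is a simplified model and not MongoDB's actual implementation; `resolvePath` is a hypothetical helper, and the two sample documents keep only the carrierCode values from the question.

```javascript
// Simplified model of resolving a dotted path against a document,
// descending into arrays along the way (not MongoDB's real code).
function resolvePath(doc, path) {
  let current = [doc];
  for (const key of path.split('.')) {
    const next = [];
    for (const value of current) {
      // When a value is an array, look inside each of its elements.
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === 'object' && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current.flat();
}

// Trimmed-down versions of the two documents from the question.
const first = { movementSegments: [{ carrierCode: 'MO269' }] };
const second = {
  movementSegments: [{ carrierCode: 'MO269' }, { carrierCode: 'MO270' }],
};

const topLevel = resolvePath(first, 'carrierCode');
const nested = resolvePath(second, 'movementSegments.carrierCode');
console.log(topLevel, nested);
```

In this model the bare path carrierCode finds nothing, because no such top-level field exists, while movementSegments.carrierCode yields the codes of every segment, which is what lets the two documents connect.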

Cytoscape inconsistent spacing in a grid layout

I'm using cytoscape to represent a DAG where nodes are grouped under compound nodes. The number of compound nodes is fixed (4). I want to make this graph entirely visible even for many nodes. This is possible in theory as the total width isn't large (4 cols) and the nodes inside the parents can be positioned vertically in rows.
For that I have been using a grid layout like this:
{
    name: 'grid',
    fit: false,
    nodeDimensionsIncludeLabels: true,
    condense: false,
    avoidOverlap: true,
    cols: 4,
    position: function(node) {
        let col, row;
        if (node.classes().includes('a')) {
            col = 0;
            row = aRow++;
        } else if (node.classes().includes('b')) {
            col = 1;
            row = bRow++;
        } else if (node.classes().includes('c')) {
            col = 2;
            row = cRow++;
        } else if (node.classes().includes('d')) {
            col = 3;
            row = 0;
        }
        return {row: row, col: col};
    },
    sort: function(l, r) {
        return l.data('id').localeCompare(r.data('id'));
    }
}
I set these attributes in the node CSS:
'height': 'label'
'width': 'label'
I'd expect the horizontal spacing between the parent components and the vertical spacing between nodes inside the same parent to be the same for different graphs. Unfortunately it doesn't seem to be the case.
The produced spacing for this larger graph looks ok:
However, for this smaller graph I don't understand why those two parent nodes end up so close together and why there is so much padding between the nodes:
Am I doing something wrong, or is the grid layout not suited for this?

Pentaho SQL to MongoDb - Array Issue

I need to update the elements of an array. When I run the transformation for the first time, the PROD array receives the right number of elements. But if I run it again, the array receives the same elements a second time.
Example:
The first time, I get the document below, which is correct:
{
    "_id" : ObjectId("58e2c81f781a75592f69f8a5"),
    "DDATA_ORC" : ISODate("2016-08-02T03:00:00.000Z"),
    "SNUMORC" : "113239",
    "PROD" : [
        {
            "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
        }
    ]
}
But if I run the transformation again, the PROD array will be updated with the same SPRODUTO:
{
    "_id" : ObjectId("58e2c81f781a75592f69f8a5"),
    "DDATA_ORC" : ISODate("2016-08-02T03:00:00.000Z"),
    "SNUMORC" : "113239",
    "PROD" : [
        {
            "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
        },
        {
            "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
        }
    ]
}
This is a problem because my queries will return wrong results.
These are my plugin configurations:
Options Tab and Document Path tab
I need the array to be updated only when it gains or loses an item.
Thanks in advance
I solved this issue.
If anyone has this problem, the solution is to create two "MongoDB Output" steps. In the first output, you set the array (the array is recreated every time the update query runs successfully). I did it using a dummy field.
First Output Document Fields
In the second "MongoDB Output", you execute a push to populate the array.
Second Output Document Fields
In the "Output Options" tab, you have to set Update, Upsert and "Modifier Update".
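The reset-then-push idea can be illustrated with plain objects. This is only a toy model in JavaScript, not Pentaho or MongoDB code: `resetProd`, `pushProd` and `runTransformation` are hypothetical names, and the sketch assumes the reset for a document completes before its pushes start.

```javascript
// Toy model of the two outputs, using plain objects instead of MongoDB.
function resetProd(doc) {
  // First output: overwrite the array, discarding earlier contents.
  return { ...doc, PROD: [] };
}

function pushProd(doc, product) {
  // Second output: append one element per incoming row.
  return { ...doc, PROD: [...doc.PROD, { SPRODUTO: product }] };
}

function runTransformation(doc, products) {
  // Reset once, then push every product from the source rows.
  return products.reduce(pushProd, resetProd(doc));
}

const rows = ['TONER HP CE411A CIANO (305A)'];
let order = { SNUMORC: '113239', PROD: [] };

order = runTransformation(order, rows); // first run
order = runTransformation(order, rows); // second run, same input

const prodLength = order.PROD.length;
console.log(prodLength);
```

Because the array is rebuilt from scratch on each run, running the same input twice leaves a single element in PROD instead of two.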

query for Time Stamp in mongo [duplicate]

I have a problem when querying mongoDB with nested objects notation:
db.messages.find( { headers : { From: "reservations#marriott.com" } } ).count()
0
db.messages.find( { 'headers.From': "reservations#marriott.com" } ).count()
5
I can't see what I am doing wrong. I am expecting nested object notation to return the same result as the dot notation query. Where am I wrong?
db.messages.find( { headers : { From: "reservations#marriott.com" } } )
This queries for documents where headers equals { From: ... }, i.e. contains no other fields.
db.messages.find( { 'headers.From': "reservations#marriott.com" } )
This only looks at the headers.From field, not affected by other fields contained in, or missing from, headers.
Dot-notation docs
Since there is a lot of confusion about querying a MongoDB collection with sub-documents, I thought it worth explaining the above answers with examples.
First, I inserted just two documents into the messages collection:
> db.messages.find().pretty()
{
    "_id" : ObjectId("5cce8e417d2e7b3fe9c93c32"),
    "headers" : {
        "From" : "reservations#marriott.com"
    }
}
{
    "_id" : ObjectId("5cce8eb97d2e7b3fe9c93c33"),
    "headers" : {
        "From" : "reservations#marriott.com",
        "To" : "kprasad.iitd#gmail.com"
    }
}
So what is the result of the query db.messages.find({headers: {From: "reservations#marriott.com"} }).count()?
It is 1, because this queries for documents where headers is exactly equal to the object {From: "reservations#marriott.com"}, i.e. contains no other fields. In other words, we must specify the entire sub-document as the value of the field.
As per the answer from @Edmondo1984:
Equality matches within sub-documents select documents if the subdocument matches exactly the specified sub-document, including the field order.
Given that, what should the result of the query below be?
> db.messages.find({headers: {To: "kprasad.iitd#gmail.com", From: "reservations#marriott.com"} }).count()
0
And what if we change the order of From and To so that it is the same as in the sub-document of the second document?
> db.messages.find({headers: {From: "reservations#marriott.com", To: "kprasad.iitd#gmail.com"} }).count()
1
So it matches exactly the specified sub-document, including the field order.
As for dot notation, I think it is clear to everyone. Let's see the result of the query below:
> db.messages.find( { 'headers.From': "reservations#marriott.com" } ).count()
2
I hope this explanation and the examples above clarify how find queries work with sub-documents.
The two query mechanisms work in different ways, as described in the Subdocuments section of the docs:
When the field holds an embedded document (i.e, subdocument), you can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using dot notation, to specify values for individual fields in the subdocument:
Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order.
In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
db.inventory.find( {
    producer: {
        company: 'ABC123',
        address: '123 Street'
    }
});
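The difference between the two matching modes can also be sketched outside the database. The following is a simplified model in plain JavaScript, not MongoDB's implementation; `matchesExactly` and `matchesDotted` are hypothetical helpers, and the two sample documents mirror the ones from the messages example in this thread.

```javascript
// Simplified model of the two matching modes (not MongoDB's implementation).
const messages = [
  { headers: { From: 'reservations#marriott.com' } },
  { headers: { From: 'reservations#marriott.com', To: 'kprasad.iitd#gmail.com' } },
];

// Whole-subdocument equality: same fields, same values, same order.
// JSON.stringify keeps insertion order for non-numeric keys,
// so the comparison is sensitive to field order.
function matchesExactly(doc, field, sub) {
  return JSON.stringify(doc[field]) === JSON.stringify(sub);
}

// Dot notation: only the addressed field is compared.
function matchesDotted(doc, path, value) {
  const found = path
    .split('.')
    .reduce((v, k) => (v == null ? undefined : v[k]), doc);
  return found === value;
}

const count = (pred) => messages.filter(pred).length;
const from = 'reservations#marriott.com';
const to = 'kprasad.iitd#gmail.com';

const exactFromOnly = count((d) => matchesExactly(d, 'headers', { From: from }));
const exactToFrom = count((d) => matchesExactly(d, 'headers', { To: to, From: from }));
const exactFromTo = count((d) => matchesExactly(d, 'headers', { From: from, To: to }));
const dottedFrom = count((d) => matchesDotted(d, 'headers.From', from));
console.log(exactFromOnly, exactToFrom, exactFromTo, dottedFrom);
```

The four counts follow the same pattern as the shell session above: the exact match is sensitive to extra fields and to field order, while the dotted path only looks at headers.From.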

Map/Reduce over sharded data with RavenDB

I'm having trouble getting a map/reduce sample to work when the data is sharded across two nodes. I'm storing documents that describe application errors, logged on two local RavenDB nodes. The error documents look like this:
Example of a document on node 1 (there are 6 in total):
errors/1/6
{
    "UniqueId": "c62c7e30-8ec7-45af-88e4-da023d796727",
    "ApplicationName": "MyAppName"
}
Example of a document on node 2 (there are 7 in total):
errors/2/6 --Error stored on shard node 2
{
    "UniqueId": "7e0b0f87-9d75-4e70-9fa0-d64a18bc88dc",
    "ApplicationName": "MyAppName"
}
When I run this query:
public class ApplicationNames : AbstractIndexCreationTask<ErrorDocument, Application>
{
    public ApplicationNames()
    {
        Map = errors => from error in errors
                        select new { error.ApplicationName, Count = 1 };
        Reduce = results => from error in results
                            group error by new { error.ApplicationName, error.Count } into g
                            select new { g.Key.ApplicationName, Count = g.Sum(x => x.Count) };
    }
}
I'm getting back 2 results: one with a Count of 6, the second with a Count of 7. I expected the results from the two shards to be combined into one result with a Count of 13. I'm not sure if I'm doing something wrong or if that's not how it's supposed to work. I followed the example at http://ravendb.net/documentation/docs-sharding to set up the sharding strategy.
Grant,
RavenDB currently doesn't handle reduce over multiple nodes.
You can do that yourself using:
session.Query<Application, ApplicationNames>()
    .ToList()
    .Select(new ApplicationNames().Reduce)
    .ToList();
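The idea of re-reducing the per-shard results on the client can be sketched in plain JavaScript. This is an illustration only, not RavenDB client code: `mergeResults` is a hypothetical helper, and the input is the two results described in the question. The sketch groups by ApplicationName alone, since a grouping key that also included Count would keep the 6 and the 7 in separate groups.

```javascript
// Per-shard reduce results as described in the question (counts 6 and 7).
const shardResults = [
  { ApplicationName: 'MyAppName', Count: 6 },
  { ApplicationName: 'MyAppName', Count: 7 },
];

// Re-reduce on the client: group by ApplicationName only and sum the counts.
function mergeResults(results) {
  const totals = new Map();
  for (const { ApplicationName, Count } of results) {
    totals.set(ApplicationName, (totals.get(ApplicationName) || 0) + Count);
  }
  return [...totals].map(([ApplicationName, Count]) => ({ ApplicationName, Count }));
}

const merged = mergeResults(shardResults);
console.log(merged);
```

With the two shard results from the question, this yields a single entry for MyAppName with a Count of 13.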