OrientDB: Information about RecordLinks - batch-processing

I am trying to run a batch command in OrientDB that updates records (which can be either edges or vertices), such that I get back the property "name" and the @rid of each record from the database (assume that "name" is guaranteed to exist).
So, I have the following batch command:
begin;
let a0 = update #44:845 merge {"name": "B4"} return after ;
let a1 = update #44:849 merge {"name": "Name4"} return after ;
let a2 = update #42:297 merge {"name": "Name2"} return after ;
let a3 = update #43:278 merge {"name": "B1"} return after ;
let a4 = update #42:298 merge {"name": "B2"} return after ;
let a5 = update #29:15698 merge {"name": "Name1"} return after ;
commit retry 100;
return [$a0,$a1,$a2,$a3,$a4,$a5]
But instead of returning objects, this returns OrientDB record links, which do not allow me to get the names of the objects.
At the end of my batch command, I want to return a dictionary like this:
{"B4": "44:845", "Name3": "44:849", ...}
Is this possible? I have tried a bunch of different commands:
let a0 = update #44:845 merge {"name": "B4"} return after ;
let a0 = update #44:845 merge {"name": "B4"} return after [$current#rid, $current.name]; #This fails entirely
let a0 = update #44:845 merge {"name": "B4"} return after $current;
but no matter what I try, it just returns an OrientDBRecordLink instead of a proper OrientDBStorageObject. As far as I can tell, there is no way to get properties from an OrientDBRecordLink, is there?
Does anyone know how I can do this?

To return a property of an object you just need to append the .field to your variable; as for the dictionary, you should look into returning a map:
return { "$a0.name" : $a0.@rid, "$a1.name" : $a1.@rid, "$a2.name" : $a2.@rid, "$a3.name" : $a3.@rid, "$a4.name" : $a4.@rid, "$a5.name" : $a5.@rid }
OrientDB Docs | SQL-Batch
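If you are driving this from a client, the same batch can also be sent through OrientDB's REST batch endpoint. A rough Python sketch (server URL, database name, and credentials are assumptions, and only two of the updates are shown):
import requests

# Hypothetical connection details; adjust to your server and database.
url = 'http://localhost:2480/batch/mydb'
script = [
    'let a0 = update #44:845 merge {"name": "B4"} return after',
    'let a1 = update #44:849 merge {"name": "Name4"} return after',
    'return { "$a0.name": $a0.@rid, "$a1.name": $a1.@rid }'
]
payload = {
    'transaction': True,
    'operations': [{'type': 'script', 'language': 'sql', 'script': script}]
}
response = requests.post(url, json=payload, auth=('admin', 'admin'))
print(response.json())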

Related

Karate json key list variable assignment

New to Karate, and JSON, for that matter, but I've got a variable like:
response:
{
  "entries": {
    "products": [
      {
        "names": [
          "Peter Parker",
          "Tony Stark",
          "Captain America"
        ]
      },
      {
        "names": [
          "Thomas Tinker",
          "Jimmy Johnson",
          "Mama Martha"
        ]
      }
    ]
  }
}
match each response.entries.products[*].names returns a list like:
["Peter Parker","Tony Stark","Captain America","Thomas Tinker","Jimmy Johnson","Mama Martha"]
But I'd like to assign that output to a variable, such as:
* def variable = response.entries.products[*].names
that would hold a similar value. When I use the above line, I get the following error:
Expected an operand but found *
Is it possible to achieve that, or something similar? If so, how?
Thanks!
Yes, there is syntax for that:
* def variable = $response.entries.products[*].names
Read the docs: https://github.com/intuit/karate#get

Do I need an index for small amounts of data returned by DynamoDB?

Say the DynamoDB table has data of the format:
{
  "id": "<id>",
  "field-1": "<field-1-value>",
  "field-2": "<field-2-value>",
  "field-3": "<field-3-value>",
  "field-4": "<field-4-value>",
  "metadata": {
    "subfield-1": "<subfield-1-value>",
    "subfield-2": "<subfield-2-value>"
  }
}
So I have a partition key on the id column and a sort key on field-1, say. Now suppose that, for the same id, we want the ability to search on the subfield-1 value: can that be done easily in DynamoDB without creating any index? The maximum number of rows for each id would be 70, so it looks like a small set of data.
Please let me know your views.
Yes, this can be achieved without an index. You can use a FilterExpression to filter the data on metadata.subfield-1.
Example:
var params = {
  TableName : 'yourTableName',
  KeyConditionExpression : 'id = :idval',
  // Each name placeholder must map to a single path element,
  // so the nested path is written as '#md.#sf1'.
  FilterExpression : '#md.#sf1 = :subField1Val',
  ExpressionAttributeNames : {
    '#md' : 'metadata',
    '#sf1' : 'subfield-1'
  },
  ExpressionAttributeValues : {
    ':idval' : '7',
    ':subField1Val' : 'somevalue'
  }
};
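For reference, the same query sketched in Python with boto3 (the table name and values are the placeholders from the example above):
import boto3
from boto3.dynamodb.conditions import Key, Attr

# Placeholder table name from the example above.
table = boto3.resource('dynamodb').Table('yourTableName')

response = table.query(
    KeyConditionExpression=Key('id').eq('7'),
    # The filter is applied server-side after the key lookup; with at
    # most ~70 items per id the extra read cost is negligible.
    FilterExpression=Attr('metadata.subfield-1').eq('somevalue')
)
items = response['Items']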

Pentaho SQL to MongoDb - Array Issue

I need to update elements in an array. When I run the transformation the first time, the array receives the right number of elements in the PROD array. But if I run it again, the array receives the same elements a second time.
Example:
At the first time, I got the document below, and It is correct:
{
  "_id" : ObjectId("58e2c81f781a75592f69f8a5"),
  "DDATA_ORC" : ISODate("2016-08-02T03:00:00.000Z"),
  "SNUMORC" : "113239",
  "PROD" : [
    {
      "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
    }
  ]
}
But if I run the transformation again, the PROD array will be updated with the same SPRODUTO:
{
  "_id" : ObjectId("58e2c81f781a75592f69f8a5"),
  "DDATA_ORC" : ISODate("2016-08-02T03:00:00.000Z"),
  "SNUMORC" : "113239",
  "PROD" : [
    {
      "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
    },
    {
      "SPRODUTO" : "TONER HP CE411A CIANO (305A)"
    }
  ]
}
This is a problem because I will get wrong results for queries.
These are my plugin configurations:
(screenshots: Options tab and Document Path tab)
I need to update the array only if it gains or loses an item.
Thanks in advance
I solved this issue.
If anyone has this problem, the solution is to create two "MongoDB Output" steps. In the first output, you need to reset the array (the array is then recreated every time the update query runs successfully); I did it using a dummy field.
(screenshot: first output's Document Fields tab)
In the second "MongoDB Output" step, you need to execute a push to populate the array.
(screenshot: second output's Document Fields tab)
In the "Output Options" tab, you have to enable Update, Upsert and "Modifier update".

updating a value in an array in mongodb from java

I have a couple of documents in MongoDB as follows:
{
  "_id" : ObjectId("54901212f315dce7077204af"),
  "Date" : ISODate("2014-10-20T04:00:00.000Z"),
  "Type" : "Twitter",
  "Entities" : [
    {
      "ID" : 4,
      "Name" : "test1",
      "Sentiment" : {
        "Value" : 20,
        "Neutral" : 1
      }
    },
    {
      "ID" : 5,
      "Name" : "test5",
      "Sentiment" : {
        "Value" : 10,
        "Neutral" : 1
      }
    }
  ]
}
Now I want to update the document that has Entities.ID = 4 by setting Sentiment.Value to (Sentiment.Value + 4) / 2; in the example above, after the update we would have (20 + 4) / 2 = 12.
I wrote the following code but I am stuck in the if statement as you can see:
DBCollection collectionG;
collectionG = db.getCollection("GraphDataCollection");
int entityID = 4;
String entityName = "test";
BasicDBObject queryingObject = new BasicDBObject();
queryingObject.put("Entities.ID", entityID);
DBCursor cursor = collectionG.find(queryingObject);
if (cursor.hasNext())
{
    BasicDBObject existingDocument = new BasicDBObject("Entities.ID", entityID);
    // not sure how to update the Sentiment.Value for entityID = 4
}
First I thought I should unwind the Entities array to get the value of Sentiment, but if I do that, how can I wind it back again and update the document with the same format as it has now, but with the new Sentiment value?
I also found this link:
MongoDB - Update objects in a document's array (nested updating)
but I could not understand it since it is not written as a Java query.
Can anyone explain how I can do this in Java?
You need to do this in a few steps:
1. Get the _id of every record which contains an Entity with ID 4.
2. During the find, project only the Entity subdocument that matched the query, so that we can process it to consume only its Sentiment.Value. Use the positional operator ($) for this purpose.
3. Instead of hitting the database for each matched record, use the Bulk API to queue up the updates and execute them at the end.
Create the Bulk operation Writer:
BulkWriteOperation bulk = col.initializeUnorderedBulkOperation();
Find all the records which contain the value 4 in the Entities.ID field. When you match documents against this query, the whole document would be returned. But we do not want the whole document; we only want the document's _id (so that we can update that same document) and the Entity element that matched the query. There may be n other Entity elements in the array, but they do not matter. So, to get only the Entity element that matches the query, we use the positional operator $.
DBObject find = new BasicDBObject("Entities.ID",4);
DBObject project = new BasicDBObject("Entities.$",1);
DBCursor cursor = col.find(find, project);
What the above code would return is, for example, the document below (since our example assumes only a single input document). Notice that it contains only the one Entity element that matched our query.
{
  "_id" : ObjectId("54901212f315dce7077204af"),
  "Entities" : [
    {
      "ID" : 4,
      "Name" : "test1",
      "Sentiment" : {
        "Value" : 20,
        "Neutral" : 1
      }
    }
  ]
}
Iterate over each record to queue up its update:
while (cursor.hasNext()) {
    BasicDBObject doc = (BasicDBObject) cursor.next();
    int curVal = ((BasicDBObject) ((BasicDBObject) ((BasicDBList) doc
            .get("Entities")).get(0)).get("Sentiment")).getInt("Value");
    int updatedValue = (curVal + 4) / 2;
    DBObject query = new BasicDBObject("_id", doc.get("_id"))
            .append("Entities.ID", 4);
    DBObject update = new BasicDBObject("$set",
            new BasicDBObject("Entities.$.Sentiment.Value", updatedValue));
    bulk.find(query).update(update);
}
Finally Update:
bulk.execute();
You need to do a find() followed by an update(), and not simply an update, because MongoDB currently does not allow you to reference a document field, modify its value, and write the computed value back in a single update query.
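For comparison, the same read-compute-bulk-write flow can be sketched in Python with pymongo (connection details and collection name are assumptions):
from pymongo import MongoClient, UpdateOne

# Hypothetical connection; adjust host, database and collection names.
coll = MongoClient()['mydb']['GraphDataCollection']

ops = []
# Project only the matching Entity element via the positional operator.
for doc in coll.find({'Entities.ID': 4}, {'Entities.$': 1}):
    cur_val = doc['Entities'][0]['Sentiment']['Value']
    ops.append(UpdateOne(
        {'_id': doc['_id'], 'Entities.ID': 4},
        {'$set': {'Entities.$.Sentiment.Value': (cur_val + 4) // 2}}
    ))

if ops:
    coll.bulk_write(ops, ordered=False)  # one round trip, like the Bulk API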

Elasticsearch bulk/batch indexing with python requests module

I have a smallish (~50,00) array of json dictionaries that I want to store/index in ES. My preference is to use python, since the data I want to index is coming from a csv file, loaded and converted to json via python. Alternatively, I would like to skip the step of converting to json, and simply use the array of python dictionaries I have. Anyway, a quick search revealed the bulk indexing functionality of ES. I want to do something like this:
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, data = acc)  # acc is a Python array of dictionaries
or
post_url = 'http://localhost:9202/_bulk'
requests.post(post_url, params = acc)  # acc is a Python array of dictionaries
Both requests give an HTTP 500 error.
My understanding is that you have to have one "command" per line (index, create, delete...) and that some of them (like index) take a row of data on the next line, like so:
{'index': ''}\n
{'your': 'data'}\n
{'index': ''}\n
{'other': 'data'}\n
NB the new-lines, even on the last row.
Empty index objects like the above work if you POST to .../index/type/_bulk; otherwise you need to specify the index and type in each action line, I think (I have not tried that).
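Putting that together, a minimal sketch of building the bulk body from the array of dictionaries with the requests module (the port is taken from the question; the index and type names are hypothetical):
import json
import requests

acc = [{'your': 'data'}, {'other': 'data'}]  # the array of dictionaries

# One action line plus one document line per record.
lines = []
for doc in acc:
    lines.append(json.dumps({'index': {}}))
    lines.append(json.dumps(doc))
body = '\n'.join(lines) + '\n'  # the trailing newline is required

response = requests.post(
    'http://localhost:9202/myindex/mytype/_bulk',
    data=body,
    headers={'Content-Type': 'application/x-ndjson'}
)
print(response.json())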
The following function will do it:
import requests

def post_request(endpoint, data):
    # e.g. endpoint = 'http://localhost:9200/_bulk'
    response = requests.post(
        endpoint,
        data=data,
        headers={'Content-Type': 'application/json; charset=UTF-8'}
    )
    return response
As data you need to pass a String such as:
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1681", "routing" : 0 }}
{ "field1" : ... , ..., "fieldN" : ... }
{ "index" : { "_index" : "test-index", "_type" : "_doc", "_id" : "1684", "routing" : 1 }}
{ "field1" : ... , ..., "fieldN" : ... }
Make sure you add a "\n" at the end of each line.
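Alternatively, if you can use the official elasticsearch-py client, its bulk helper builds this newline-delimited body for you. A sketch (the index name is hypothetical):
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(['http://localhost:9202'])

acc = [{'your': 'data'}, {'other': 'data'}]  # the array of dictionaries
actions = ({'_index': 'myindex', '_source': doc} for doc in acc)
bulk(es, actions)  # serializes and chunks the bulk requests for you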
I don't know much about Python, but did you look at Pyes?
Bulk is supported in Pyes.