How to reduce the amount of data in a Neo4j query?

When requesting data from Neo4j, as in, say,
curl -i -XPOST -d'{ "query" : "start n=node(*) return n" }' \
-H "accept:application/json;stream=true" \
-H content-type:application/json \
http://localhost:7474/db/data/cypher
I get, as documented, a response like this:
{
"columns" : [ "n" ],
"data" : [ [ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/0/relationships/out",
"data" : {
},
"traverse" : "http://localhost:7474/db/data/node/0/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/0/properties/{key}",
"self" : "http://localhost:7474/db/data/node/0",
"properties" : "http://localhost:7474/db/data/node/0/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/0/relationships/in",
"extensions" : {
},
"create_relationship" : "http://localhost:7474/db/data/node/0/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/0/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/0/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/0/relationships/in/{-list|&|types}"
} ], [ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/1/relationships/out",
"data" : {
"glyph" : "一",
"~isa" : "glyph"
},
"traverse" : "http://localhost:7474/db/data/node/1/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/1/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/1/properties/{key}",
"self" : "http://localhost:7474/db/data/node/1",
"properties" : "http://localhost:7474/db/data/node/1/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/1/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/1/relationships/in",
"extensions" : {
},
"create_relationship" : "http://localhost:7474/db/data/node/1/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/1/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/1/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/1/relationships/in/{-list|&|types}"
} ], [ {
"outgoing_relationships" : "http://localhost:7474/db/data/node/2/relationships/out",
"data" : {
"~isa" : "LPG",
"LPG" : "1"
},
"traverse" : "http://localhost:7474/db/data/node/2/traverse/{returnType}",
"all_typed_relationships" : "http://localhost:7474/db/data/node/2/relationships/all/{-list|&|types}",
"property" : "http://localhost:7474/db/data/node/2/properties/{key}",
"self" : "http://localhost:7474/db/data/node/2",
"properties" : "http://localhost:7474/db/data/node/2/properties",
"outgoing_typed_relationships" : "http://localhost:7474/db/data/node/2/relationships/out/{-list|&|types}",
"incoming_relationships" : "http://localhost:7474/db/data/node/2/relationships/in",
"extensions" : {
},
"create_relationship" : "http://localhost:7474/db/data/node/2/relationships",
"paged_traverse" : "http://localhost:7474/db/data/node/2/paged/traverse/{returnType}{?pageSize,leaseTime}",
"all_relationships" : "http://localhost:7474/db/data/node/2/relationships/all",
"incoming_typed_relationships" : "http://localhost:7474/db/data/node/2/relationships/in/{-list|&|types}"
} ], [ {
And so on and on. The URLs delivered with each node are certainly well meant, but they also occupy a major portion of the data transmitted. They're also highly redundant and not what I'm after with my query. Is there any way to drop all of that traverse,
all_typed_relationships,
property,
self,
properties,
outgoing_typed_relationships,
incoming_relationships,
extensions,
create_relationship,
paged_traverse,
all_relationships,
incoming_typed_relationships
jazz?

The only way is to specify the properties you want returned in the return statement. Like:
return id(n), n.glyph;
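As a minimal sketch of that advice, the request body from the question can be built with an explicit list of return expressions instead of the whole node. The helper name cypher_payload is made up for illustration; the query text and the /db/data/cypher endpoint are the ones from the question.

```python
import json

def cypher_payload(return_items):
    """Build the JSON body for POST /db/data/cypher.

    Only the listed expressions are returned, so the per-node
    hypermedia URLs are not part of the response.
    (Hypothetical helper, not part of any Neo4j client library.)
    """
    query = "start n=node(*) return " + ", ".join(return_items)
    return json.dumps({"query": query})

# Ask for the node id and the glyph property only.
body = cypher_payload(["id(n)", "n.glyph"])
print(body)
```

The resulting string can be passed to curl with -d in place of the original query body.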

How to use springdoc with @PostMapping(consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE) and @RequestParam

I'm upgrading a project from SpringFox to SpringDoc v1.6.12 and I struggle to make the new code work for the following method of my RestController:
@PostMapping(path = TASK_MAPPING_PATH, consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
public ResponseEntity<String> loadTask(
@RequestParam String applicationId,
@RequestParam String businessId,
@RequestParam boolean directLink
) {[...]}
What is particular about this method is that its parameters should be sent in the request body, since the Content-Type application/x-www-form-urlencoded is used.
But when I browse the URL https://localhost:8443/v3/api-docs, the generated definition is the following:
"/api/enrolment/task" : {
"post" : {
"operationId" : "loadTask",
"parameters" : [ {
"in" : "query",
"name" : "applicationId",
"required" : true,
"schema" : {
"type" : "string"
}
}, {
"in" : "query",
"name" : "businessId",
"required" : true,
"schema" : {
"type" : "string"
}
}, {
"in" : "query",
"name" : "directLink",
"required" : true,
"schema" : {
"type" : "boolean"
}
} ],
"responses" : {
[...]
},
"summary" : [...],
"tags" : [...]
}
},
All of the applicationId, businessId and directLink parameters are passed in the URL instead of the request body as expected.
I would have expected the following openApi definition instead:
"/api/enrolment/task" : {
"post" : {
"operationId" : "loadTask",
"requestBody" : {
"content" : {
"application/x-www-form-urlencoded" : {
"schema" : {
"type" : "object",
"properties" : {
"applicationId" : {
"type" : "string"
},
"businessId" : {
"type" : "string"
},
"directLink" : {
"type" : "boolean"
}
},
"required" : [ "applicationId", "businessId", "directLink" ]
}
}
}
},
"responses" : {
[...]
},
"summary" : [...],
"tags" : [...]
}
},
Has anyone had the same issue?
Does anyone know a solution to my problem?
Thanks.

Get list of Jenkins nodes by label via REST API

I need to GET a list of nodes that have a certain label.
I know how to do that by fetching the entire node list through the Jenkins REST API, then fetching each node through the REST API as well and checking its labels, but that is too many API calls.
I could also create a job that takes the label as a parameter and writes the matching node list somewhere, but that is a bad approach: a remotely triggered Jenkins job has no return value, so I cannot know when it has finished and would have to read the results from wherever the job saved them.
I need a way to get the list of nodes with a given label in a single API call.
You can make a single API call to <JENKINS_URL>/computer/api/json (or <JENKINS_URL>/computer/api/python for Python-style output), which returns a list of all nodes and their properties.
One of the properties is the list of labels, so just go over all nodes and extract the ones that contain the label you need.
Here is an example of the returned object:
{
"_class" : "hudson.model.ComputerSet",
"busyExecutors" : 0,
"computer" : [
{
"_class" : "hudson.model.Hudson$MasterComputer",
"actions" : [
],
"assignedLabels" : [
{
"name" : "built-in"
}
],
"description" : "the Jenkins controller's built-in node",
"displayName" : "Built-In Node",
"executors" : [
{
},
{
}
],
"icon" : "symbol-computer",
"iconClassName" : "symbol-computer",
"idle" : true,
"jnlpAgent" : false,
"launchSupported" : true,
"loadStatistics" : {
"_class" : "hudson.model.Label$1"
},
"manualLaunchAllowed" : true,
"monitorData" : {
"hudson.node_monitors.SwapSpaceMonitor" : {
"_class" : "hudson.node_monitors.SwapSpaceMonitor$MemoryUsage2",
"availablePhysicalMemory" : 6938730496,
"availableSwapSpace" : 6906019840,
"totalPhysicalMemory" : 16885276672,
"totalSwapSpace" : 21046026240
},
"hudson.node_monitors.TemporarySpaceMonitor" : {
"_class" : "hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace",
"timestamp" : 1653907906021,
"path" : "C:\\Windows\\Temp",
"size" : 426696622080
},
"hudson.node_monitors.DiskSpaceMonitor" : {
"_class" : "hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace",
"timestamp" : 1653907905929,
"path" : "C:\\ProgramData\\Jenkins\\.jenkins",
"size" : 426696622080
},
"hudson.node_monitors.ArchitectureMonitor" : "Windows 10 (amd64)",
"hudson.node_monitors.ResponseTimeMonitor" : {
"_class" : "hudson.node_monitors.ResponseTimeMonitor$Data",
"timestamp" : 1653907905941,
"average" : 0
},
"hudson.node_monitors.ClockMonitor" : {
"_class" : "hudson.util.ClockDifference",
"diff" : 0
}
},
"numExecutors" : 2,
"offline" : false,
"offlineCause" : null,
"offlineCauseReason" : "",
"oneOffExecutors" : [
],
"temporarilyOffline" : false
}
],
"displayName" : "Nodes",
"totalExecutors" : 2
}
You are interested in the assignedLabels array; notice that it can contain multiple labels.
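The filtering step could look roughly like the sketch below. It assumes the JSON from <JENKINS_URL>/computer/api/json has already been fetched and parsed into a dict shaped like the example above; the function name nodes_with_label and the second node ("agent-1" with a "linux" label) are invented for illustration.

```python
def nodes_with_label(computer_set, label):
    """Return the display names of nodes whose assignedLabels include `label`.

    `computer_set` is the parsed response of /computer/api/json.
    (Hypothetical helper; not part of Jenkins or any client library.)
    """
    return [
        node.get("displayName")
        for node in computer_set.get("computer", [])
        if any(entry.get("name") == label
               for entry in node.get("assignedLabels", []))
    ]

# Trimmed-down sample: the first node is from the response above,
# the second one is made up to show a node with several labels.
sample = {
    "computer": [
        {"displayName": "Built-In Node",
         "assignedLabels": [{"name": "built-in"}]},
        {"displayName": "agent-1",
         "assignedLabels": [{"name": "agent-1"}, {"name": "linux"}]},
    ]
}

print(nodes_with_label(sample, "linux"))
```

This keeps the whole lookup to one HTTP request, with the label check done on the client side.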

db.find vs db.aggregation to select nested array Object

I've tried to perform the following query:
db.getCollection('fxh').find({"username": "user1", "pf.acc.accnbr" : 915177},{userid: true, "pf.pfid": true, "pf.acc.accid":true})
and my collection contains the following document:
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"username" : "user1",
"email" : "user1@gmail.com",
"fullname" : "User 1",
"pf" : {
"acc" : [
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496513941
},
"status" : "New",
"name" : "QPT202017_M1",
"cycid" : 1496513940
}
],
"status" : "New",
"accnbr" : 915177,
"accid" : 1496513939
},
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496552643
},
"status" : "New",
"name" : "QPT202017_S8",
"cycid" : 1496552642
}
],
"status" : "New",
"accnbr" : 73497,
"accid" : 1496552641
}
],
"pfid" : 1496513935
},
"lastupdate" : ISODate("2017-06-03T18:18:55.080Z"),
"__v" : 0
}
When I execute the query the result is the following :
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"portfolio" : {
"acc" : [
{
"accid" : 1496513939
},
{
"accid" : 1496552641
}
],
"pfid" : 1496513935
}
}
My problem is that I need to see only the accid of the matching account, but the result returns all of them.
Any idea how to return only the accid for the selected accnbr?
NB: I have also tried adding the $ sign at the end of my query. It selects the right acc, but it returns all of the objects, whereas I need only ONE returned object.
On 6/5/17
I also used the aggregate command instead of find, and it returned a result with this:
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc"} , { $match : {"username":"adh1", "pf.acc.accbr": 915177 } }, {$project : {_id:0, accid: "$pf.acc.accid"}}])
But I could NOT get a lower-level result when I ran this:
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc.cyc"} , { $match : {"username":"adh1", "pf.acc.accbr": 915177, "pf.acc.cyc.name": "QPT202017_M1" } }, {$project : {_id:0, cycid: "$pf.acc.cyc.cycid"}}])
Any idea ?
You can try the aggregation pipeline below.
The idea is to $unwind one nested level at a time, from the outermost to the innermost.
After unwinding each level, you can apply a $match to limit the documents, and continue until you have the desired shape.
You can $group the documents together at the end to get back to the original shape.
db.getCollection('fxh').aggregate([
{ $match : {"username":"adh1"} },
{ $unwind : "$pf.acc"} ,
{ $match : {"pf.acc.accbr": 915177 } },
{ $unwind : "$pf.acc.cyc"},
{ $match : {"pf.acc.cyc.name": "QPT202017_M1" } },
{ $project : {_id:0, accid: "$pf.acc.accid", cycid: "$pf.acc.cyc.cycid"} }
])
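To see why unwinding one level at a time works, here is a small pure-Python emulation of the unwind, match, unwind, match sequence. It is only an illustration and does not use MongoDB or pymongo; the unwind helper is hypothetical, and the data is a trimmed copy of the sample document from the question (so it uses that document's field names, such as accnbr).

```python
import copy

def unwind(docs, path):
    """Emulate $unwind on a dotted path: one output document per array element."""
    keys = path.split(".")
    out = []
    for doc in docs:
        holder = doc
        for key in keys[:-1]:
            holder = holder[key]
        for element in holder[keys[-1]]:
            clone = copy.deepcopy(doc)
            target = clone
            for key in keys[:-1]:
                target = target[key]
            # Replace the array with a single element, as $unwind does.
            target[keys[-1]] = copy.deepcopy(element)
            out.append(clone)
    return out

doc = {
    "username": "user1",
    "pf": {"acc": [
        {"accnbr": 915177, "accid": 1496513939,
         "cyc": [{"name": "QPT202017_M1", "cycid": 1496513940}]},
        {"accnbr": 73497, "accid": 1496552641,
         "cyc": [{"name": "QPT202017_S8", "cycid": 1496552642}]},
    ]},
}

# Level 1: unwind pf.acc, then keep the account we want.
accounts = [d for d in unwind([doc], "pf.acc")
            if d["pf"]["acc"]["accnbr"] == 915177]
# Level 2: pf.acc is now a single object, so pf.acc.cyc can be unwound.
cycles = [d for d in unwind(accounts, "pf.acc.cyc")
          if d["pf"]["acc"]["cyc"]["name"] == "QPT202017_M1"]
# Final projection.
result = [{"accid": d["pf"]["acc"]["accid"],
           "cycid": d["pf"]["acc"]["cyc"]["cycid"]} for d in cycles]
print(result)
```

The second unwind only makes sense after the first one, because pf.acc has to be a single object before pf.acc.cyc refers to one array.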

Elasticsearch - Extracting PDF content and encoding with base64

I want to extract the content of a PDF file and search within that content using Elasticsearch.
I installed elasticsearch/elasticsearch-mapper-attachments/2.6.0.
I created a new index named "docs".
I created a file named "tmp.json" with this content:
{"title": "file.pdf", "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="}
I executed the following:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
"attachment": {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"title" : {"store":"yes"},
"file":{
"type":"string",
"term_vector":"with_positions_offsets",
"store":"yes"}
}
}
}
}
}'
and the following :
curl -X POST "http://localhost:9200/docs/attachment" -d @tmp.json
The problem is that the content is stored as it is in the file.
I was expecting the content to be decoded, like so :
base64.b64decode("IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==")
That gives :
b'"God Save the Queen" (alternatively "God Save the King"'
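To encode in base64, here is what I do:
import base64
import json

fname = 'file.pdf'  # name of the PDF to index
# Read the PDF as bytes and base64-encode it into an ASCII string
with open(fname, 'rb') as pdf:
    file64 = base64.b64encode(pdf.read()).decode('ascii')
# Write the document that will be posted to Elasticsearch
data = {"file": file64, "title": fname}
with open('tmp.json', 'w') as f:
    json.dump(data, f)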
I would like to be able to see the content in Kibana (but for now I see only the base64 data).
This didn't work:
curl -X PUT "http://localhost:9200/docs/attachment/_mapping" -d '{
"attachment": {
"properties" : {
"content" : {
"type" : "attachment",
"fields" : {
"title" : {"store":"yes"},
"content":{
"type":"string",
"term_vector":"with_positions_offsets",
"store":"yes"}
}
}
}
}
}'
This worked, and I can see the content of the PDF in Kibana:
curl -X PUT "http://localhost:9200/docs" -d '{
"mappings" : {
"attachment" : {
"properties" : {
"content" : {
"type" : "attachment",
"fields" : {
"content" : { "store" : "yes" },
"author" : { "store" : "yes" },
"title" : { "store" : "yes"},
"date" : { "store" : "yes" },
"keywords" : { "store" : "yes", "analyzer" : "keyword" },
"name" : { "store" : "yes" },
"content_length" : { "store" : "yes" },
"content_type" : { "store" : "yes" }
}
}
}
}
}
}'

restart jobtracker through cloudera manager API

I am trying to restart the MapReduce JobTracker through the Cloudera Manager API. The status of the JobTracker is as follows:
local-iMac-399:$ curl -u 'admin:admin' 'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roles/mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86'
{
"name" : "mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86",
"type" : "JOBTRACKER",
"serviceRef" : {
"clusterName" : "cluster",
"serviceName" : "mapreduce"
},
"hostRef" : {
"hostId" : "24259373-7e71-4089-8251-faf055e42ad7"
},
"roleUrl" : "http://hadoop-namenode.dev.com:7180/cmf/roleRedirect/mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86",
"roleState" : "STARTED",
"healthSummary" : "GOOD",
"healthChecks" : [ {
"name" : "JOB_TRACKER_FILE_DESCRIPTOR",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_GC_DURATION",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_HOST_HEALTH",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_LOG_DIRECTORY_FREE_SPACE",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_SCM_HEALTH",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_UNEXPECTED_EXITS",
"summary" : "GOOD"
}, {
"name" : "JOB_TRACKER_WEB_METRIC_COLLECTION",
"summary" : "GOOD"
} ],
"configStalenessStatus" : "STALE",
"haStatus" : "ACTIVE",
"maintenanceMode" : false,
"maintenanceOwners" : [ ],
"commissionState" : "COMMISSIONED",
"roleConfigGroupRef" : {
"roleConfigGroupName" : "mapreduce-JOBTRACKER-BASE"
}
}
local-iMac-399:$
I don't know how to use the API to restart just the JobTracker.
I tried to restart the Hive service using the following command but got an error:
local-iMac-399:$ curl -X POST -u 'admin:admin' 'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/hive/roleCommands/restart'
{
"message" : "No content to map due to end-of-input\n at [Source: org.apache.cxf.transport.http.AbstractHTTPDestination$1@4169c499; line: 1, column: 1]"
}
I would appreciate it if someone could help me understand how to use the Cloudera Manager API.
Based on the information provided, this is how you'd invoke the CM API JobTracker restart:
curl -u 'admin:admin' -X POST -H "Content-Type:application/json" -d '{"items":["mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86"]}' 'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roleCommands/restart'
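The same request can be sketched in Python with only the standard library. The host, cluster name, role name and credentials are the ones from the question; the helper name build_restart_request is made up for illustration. The sketch only builds the request, and the final urlopen call is left as a comment so nothing is sent by accident. Compared with the failing Hive attempt, the difference appears to be the JSON body listing the role names under "items".

```python
import base64
import json
import urllib.parse
import urllib.request

def build_restart_request(base_url, cluster, service, role_names, user, password):
    """Build (but do not send) a POST to <service>/roleCommands/restart.

    Hypothetical helper mirroring the curl command above.
    """
    url = "{}/api/v6/clusters/{}/services/{}/roleCommands/restart".format(
        base_url, urllib.parse.quote(cluster), urllib.parse.quote(service))
    # The body names the roles to restart under "items".
    body = json.dumps({"items": list(role_names)}).encode("utf-8")
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token})

req = build_restart_request(
    "http://hadoop-namenode.dev.com:7180", "Cluster 1", "mapreduce",
    ["mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86"],
    "admin", "admin")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```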