Minimize the size of returned JSON data from Spring Data repository - spring-data-rest

I have two microservices. At boot, one of them needs to load all operator names/codes and index them in a RadixTree.
I am loading around 36,000 records using Feign and Spring Data REST. It works, but I noticed that approximately half of the response size comes from links:
{
"_embedded" : {
"operatorcode" : [ {
"enabled" : true,
"code" : 9320,
"operatorCodeId" : 110695,
"operatorName" : "Afghanistan - Kabul/9320",
"operatorId" : 1647,
"activationDate" : "01-01-2008",
"deactivationDate" : "31-12-2099",
"countryId" : 1,
"_links" : {
"self" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695"
},
"operatorCode" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695{?projection}",
"templated" : true
},
"operator" : {
"href" : "http://10.44.0.51:8083/operatorcode/110695/operator"
}
}
}
...
]
}
Is there any way to stop sending back the _links? They are not used in my case. I tried setting use-hal-as-default-JSON-media-type: false and using projections, but neither worked.

I am not sure this is the correct way to do it, but you can try something like this:
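// Requires: org.springframework.context.annotation.Bean,
// org.springframework.http.converter.json.Jackson2ObjectMapperBuilder,
// com.fasterxml.jackson.annotation.JsonIgnoreProperties
@Bean
public Jackson2ObjectMapperBuilder jacksonBuilder() {
    Jackson2ObjectMapperBuilder b = new Jackson2ObjectMapperBuilder();
    // Register the mix-in for Object so the "_links" property is ignored on every type
    b.mixIn(Object.class, IgnorePropertiesInJackson.class);
    return b;
}

// Mix-in that tells Jackson to ignore the "_links" property
@JsonIgnoreProperties({"_links"})
private abstract class IgnorePropertiesInJackson {
}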
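Independently of the server-side configuration, it can help to measure how much of the payload the links actually account for. The following is only a rough sketch in Python, not part of the Feign client: the helper name strip_links is made up for illustration, and the sample is a single record copied from the response in the question.

```python
import json


def strip_links(value):
    """Return a copy of a parsed HAL document with every "_links" key removed."""
    if isinstance(value, dict):
        return {k: strip_links(v) for k, v in value.items() if k != "_links"}
    if isinstance(value, list):
        return [strip_links(v) for v in value]
    return value


# One record taken from the response shown in the question
sample = {
    "_embedded": {
        "operatorcode": [
            {
                "enabled": True,
                "code": 9320,
                "operatorCodeId": 110695,
                "operatorName": "Afghanistan - Kabul/9320",
                "operatorId": 1647,
                "activationDate": "01-01-2008",
                "deactivationDate": "31-12-2099",
                "countryId": 1,
                "_links": {
                    "self": {"href": "http://10.44.0.51:8083/operatorcode/110695"},
                    "operatorCode": {
                        "href": "http://10.44.0.51:8083/operatorcode/110695{?projection}",
                        "templated": True,
                    },
                    "operator": {
                        "href": "http://10.44.0.51:8083/operatorcode/110695/operator"
                    },
                },
            }
        ]
    }
}

stripped = strip_links(sample)
size_with_links = len(json.dumps(sample))
size_without_links = len(json.dumps(stripped))
print(size_with_links, size_without_links)
```

Comparing the two sizes on a real response gives a quick estimate of what removing the links on the server would save.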

Get from jenkins list of nodes by label - by REST API

I need to get the list of nodes that have a certain label.
I know I can fetch the entire node list through the Jenkins REST API and then query each node, also through the REST API, to check its labels, but that is too many API calls.
I could also create a job that takes the label as a parameter and writes the matching node list somewhere, but that is a poor approach: a remotely triggered Jenkins job has no return value, so I cannot tell when it has finished, and I would have to read the results from wherever the job saved them.
I need a way to get the list of nodes with a given label in a single API call.
You can make a single API call to <JENKINS_URL>/computer/api/json (or <JENKINS_URL>/computer/api/python for Python-parseable output), which returns a list of all nodes and their properties.
One of the properties is the label, so just iterate over all nodes and pick the ones that have the label you need.
Here is an example of the returned object:
{
"_class" : "hudson.model.ComputerSet",
"busyExecutors" : 0,
"computer" : [
{
"_class" : "hudson.model.Hudson$MasterComputer",
"actions" : [
],
"assignedLabels" : [
{
"name" : "built-in"
}
],
"description" : "the Jenkins controller's built-in node",
"displayName" : "Built-In Node",
"executors" : [
{
},
{
}
],
"icon" : "symbol-computer",
"iconClassName" : "symbol-computer",
"idle" : true,
"jnlpAgent" : false,
"launchSupported" : true,
"loadStatistics" : {
"_class" : "hudson.model.Label$1"
},
"manualLaunchAllowed" : true,
"monitorData" : {
"hudson.node_monitors.SwapSpaceMonitor" : {
"_class" : "hudson.node_monitors.SwapSpaceMonitor$MemoryUsage2",
"availablePhysicalMemory" : 6938730496,
"availableSwapSpace" : 6906019840,
"totalPhysicalMemory" : 16885276672,
"totalSwapSpace" : 21046026240
},
"hudson.node_monitors.TemporarySpaceMonitor" : {
"_class" : "hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace",
"timestamp" : 1653907906021,
"path" : "C:\\Windows\\Temp",
"size" : 426696622080
},
"hudson.node_monitors.DiskSpaceMonitor" : {
"_class" : "hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace",
"timestamp" : 1653907905929,
"path" : "C:\\ProgramData\\Jenkins\\.jenkins",
"size" : 426696622080
},
"hudson.node_monitors.ArchitectureMonitor" : "Windows 10 (amd64)",
"hudson.node_monitors.ResponseTimeMonitor" : {
"_class" : "hudson.node_monitors.ResponseTimeMonitor$Data",
"timestamp" : 1653907905941,
"average" : 0
},
"hudson.node_monitors.ClockMonitor" : {
"_class" : "hudson.util.ClockDifference",
"diff" : 0
}
},
"numExecutors" : 2,
"offline" : false,
"offlineCause" : null,
"offlineCauseReason" : "",
"oneOffExecutors" : [
],
"temporarilyOffline" : false
}
],
"displayName" : "Nodes",
"totalExecutors" : 2
}
You are interested in the assignedLabels array. Note that it can contain multiple labels.
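The filtering step could look roughly like the sketch below. It assumes the JSON response has already been fetched and parsed into a dict; the function name nodes_with_label and the second node ("agent-1" with its labels) are hypothetical and only there to make the example self-contained.

```python
def nodes_with_label(computer_set, label):
    """Return the displayName of every node whose assignedLabels include `label`."""
    matches = []
    for node in computer_set.get("computer", []):
        names = {entry.get("name") for entry in node.get("assignedLabels", [])}
        if label in names:
            matches.append(node.get("displayName"))
    return matches


# Trimmed-down response; "agent-1" is a made-up node for illustration
sample_response = {
    "_class": "hudson.model.ComputerSet",
    "computer": [
        {
            "displayName": "Built-In Node",
            "assignedLabels": [{"name": "built-in"}],
        },
        {
            "displayName": "agent-1",
            "assignedLabels": [{"name": "linux"}, {"name": "docker"}],
        },
    ],
}

print(nodes_with_label(sample_response, "docker"))
```

Because each node can carry several labels, the check is a membership test against all label names rather than a comparison with a single value.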

Mapping ElasticSearch apache module field

I am new to ES and I am struggling with a small problem.
I integrated the Metricbeat Apache module with ES and it works fine.
The problem is that the Metricbeat Apache module reports Apache web traffic in KB (field apache.status.total_kbytes). Instead, I would like to create my own field named "apache.status.total_mbytes".
I am trying to create a new mapping via the Dev Console using the following API command:
PUT /metricbeat-7.2.0/_mapping
{
"settings":{
},
"mappings" : {
"apache.status.total_mbytes" : {
"full_name" : "apache.status.total_mbytes",
"mapping" : {
"total_mbytes" : {
"type" : "long"
}
}
}
}
}
Still ES returns the following error:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [settings : {}] [mappings : {apache.status.total_mbytes={mapping={total_mbytes={type=long}}, full_name=apache.status.total_mbytes}}]"
}
],
"type" : "mapper_parsing_exception",
"reason" : "Root mapping definition has unsupported parameters: [settings : {}] [mappings : {apache.status.total_mbytes={mapping={total_mbytes={type=long}}, full_name=apache.status.total_mbytes}}]"
},
"status" : 400
}
FYI
The following may shed some light
GET /metricbeat-*/_mapping/field/apache.status.total_kbytes
Returns
{
"metricbeat-7.9.2-2020.10.06-000001" : {
"mappings" : {
"apache.status.total_kbytes" : {
"full_name" : "apache.status.total_kbytes",
"mapping" : {
"total_kbytes" : {
"type" : "long"
}
}
}
}
},
"metricbeat-7.2.0-2020.10.05-000001" : {
"mappings" : {
"apache.status.total_kbytes" : {
"full_name" : "apache.status.total_kbytes",
"mapping" : {
"total_kbytes" : {
"type" : "long"
}
}
}
}
}
}
What am I missing? Is the _mapping command wrong?
Thanks in advance,
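The _mapping endpoint expects the mapping definition itself (a top-level properties object), not the settings/mappings wrapper used when creating an index and not the output format of the get-field-mapping API. A working example: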
Create new index
PUT /metricbeat-7.2.0
{
"settings": {},
"mappings": {
"properties": {
"apache.status.total_kbytes": {
"type": "long"
}
}
}
}
Then GET metricbeat-7.2.0/_mapping/field/apache.status.total_kbytes will result in (same as your example):
{
"metricbeat-7.2.0" : {
"mappings" : {
"apache.status.total_kbytes" : {
"full_name" : "apache.status.total_kbytes",
"mapping" : {
"total_kbytes" : {
"type" : "long"
}
}
}
}
}
}
Now if you want to add a new field to an existing mapping use the API this way:
Update an existing index
PUT /metricbeat-7.2.0/_mapping
{
"properties": {
"total_mbytes": {
"type": "long"
}
}
}
Then GET metricbeat-7.2.0/_mapping will show you the updated mapping:
{
"metricbeat-7.2.0" : {
"mappings" : {
"properties" : {
"apache" : {
"properties" : {
"status" : {
"properties" : {
"total_kbytes" : {
"type" : "long"
}
}
}
}
},
"total_mbytes" : {
"type" : "long"
}
}
}
}
}
Also, take a look at the Put Mapping API documentation.
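Note that the update above adds total_mbytes at the top level. If the field should instead sit under apache.status, as in the question, the request body needs the nested properties structure that the updated mapping shows for total_kbytes. Here is a small sketch in Python that builds such a body from a dotted field path; the helper name mapping_body is made up, and sending the body to PUT /metricbeat-7.2.0/_mapping is left to whatever client you use.

```python
import json


def mapping_body(field_path, field_type):
    """Build a PUT _mapping request body for a dotted field path."""
    body = {"type": field_type}
    # Wrap from the innermost field outwards: each level becomes a "properties" object
    for name in reversed(field_path.split(".")):
        body = {"properties": {name: body}}
    return body


nested_body = mapping_body("apache.status.total_mbytes", "long")
print(json.dumps(nested_body, indent=2))
```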

How to look for more than one element in an embedded array in MongoDb

I have a MongoDB query (give me the settings where account = 'test1'):
db.collection_name.find({"account" : "test1"}, {settings : 1}).pretty();
where I get the following output:
{
"_id" : ObjectId("49830ede4bz08bc0b495f123"),
"settings" : {
"clusterData" : {
"us-south-1" : "cluster1",
"us-east-1" : "cluster2"
},
},
What I'm looking for now is the accounts where clusterData has more than one element.
I'm only interested in listing accounts with two or more elements.
I've tried this:
db.collection_name.find({'settings.clusterData.1': {$exists: true}}, {account : 1}).pretty();
It's not returning any results. Is my query correct? Is there another way to do this?
The reason it isn't working is that your clusterData is an object, not an array. I would suggest changing your data to an array of clusters with two properties each, as shown below; then your query will work.
{
"_id" : ObjectId("49830ede4bz08bc0b495f123"),
"settings" : {
"clusterData" : [
{
name : "cluster1",
location : "us-south-1"
},
{
name : "cluster2",
location : "us-east-1"
}
]
}
}
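To see why 'settings.clusterData.1' matches the array form but not the object form, here is a deliberately simplified Python emulation of a dotted-path existence check on a single document. It is not MongoDB's actual matching logic, only an illustration; path_exists and both sample documents are made up for the example.

```python
def path_exists(doc, path):
    """Simplified stand-in for {path: {$exists: true}} on one document."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            # A numeric path component is treated as an array index
            if not part.isdigit() or int(part) >= len(current):
                return False
            current = current[int(part)]
        elif isinstance(current, dict):
            # On an object, the component must be an existing key
            if part not in current:
                return False
            current = current[part]
        else:
            return False
    return True


object_form = {
    "settings": {
        "clusterData": {"us-south-1": "cluster1", "us-east-1": "cluster2"}
    }
}
array_form = {
    "settings": {
        "clusterData": [
            {"name": "cluster1", "location": "us-south-1"},
            {"name": "cluster2", "location": "us-east-1"},
        ]
    }
}

print(path_exists(object_form, "settings.clusterData.1"))
print(path_exists(array_form, "settings.clusterData.1"))
```

In the object form there is no key named "1", so the path does not exist; in the array form, index 1 is the second cluster.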

What is the default doc sequence of the result from an Elasticsearch filter request?

I recently ran the following Elasticsearch filter request:
{
"from" : 0,
"size" : 10,
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"terms" : {
"a_id" : [ 257793, 257798, 257844 ]
}
}
}
}
}
},
"explain" : false,
"fields" : "a_id"
}
This finds all docs with a_id in 257793, 257798, 257844, and the results come back as 257844, 257798, 257793. So far so good.
Then I found that whatever the order of the terms, the returned docs are always in the same a_id order. That is, even if I run
"terms" : {
"a_id" : [257798, 257844, 257793 ]
}
the result docs are still in the order 257844, 257798, 257793.
So I am curious about the mechanism behind Elasticsearch filtering. Can anyone give me a hint?
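By default, ES returns hits in descending order of _score. A filter does not compute relevance, so every hit gets the same score and the order of the values in terms has no effect on the order of the results. You can provide the sort option to specify which field the results should be ordered by and in which direction. For example, to sort on a date field: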
{
"sort": { "date": { "order": "desc" }},
"query" : {
"term" : { "user" : "kimchy" }
}
}
You can get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_sorting.html
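Applied to the request from the question, an explicit sort on a_id could be added as sketched below. This is only an illustration in Python of how the request body would change; the helper name with_sort is made up, and sorting on a_id in ascending order is an assumption about what the asker wants.

```python
import copy
import json


def with_sort(body, field, order="asc"):
    """Return a copy of a search request body with an explicit sort clause."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    sorted_body = copy.deepcopy(body)
    sorted_body["sort"] = [{field: {"order": order}}]
    return sorted_body


# The request body from the question
request_body = {
    "from": 0,
    "size": 10,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": {"terms": {"a_id": [257793, 257798, 257844]}}
                }
            }
        }
    },
    "explain": False,
    "fields": "a_id",
}

sorted_request = with_sort(request_body, "a_id", "asc")
print(json.dumps(sorted_request, indent=2))
```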

Obtaining Object IDs for Schedule States in Rally

I have set up a "checkbox group" with the five schedule states in our organization's workspace. I would like to query using the Lookback API with the selected schedule states as filters. Since the LBAPI is driven by ObjectIDs, I need to pass in the ID representations of the schedule states, rather than their names. Is there a quick way to get these IDs so I can relate them to the checkbox entries?
Lookback API will accept string-valued ScheduleStates as query arguments. Thus the following query:
{
find: {
_TypeHierarchy: "HierarchicalRequirement",
"ScheduleState": "In-Progress",
__At:"current"
}
}
works correctly for me. If you do need the OIDs, though, add &fields=true to the end of your REST query URL and you'll notice the following information coming back:
GeneratedQuery: {
{ "fields" : true,
"find" : { "$and" : [ { "_ValidFrom" : { "$lte" : "2013-04-18T20:00:25.751Z" },
"_ValidTo" : { "$gt" : "2013-04-18T20:00:25.751Z" }
} ],
"ScheduleState" : { "$in" : [ 2890498684 ] },
"_TypeHierarchy" : { "$in" : [ -51038,
2890498773,
10487547445
] },
"_ValidFrom" : { "$lte" : "2013-04-18T20:00:25.751Z" }
},
"limit" : 10,
"skip" : 0
}
}
You'll notice the ScheduleState OID here:
"ScheduleState" : { "$in" : [ 2890498684 ] }
So you could run a couple of sample queries on different ScheduleStates and find their corresponding OIDs.
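Pulling the OID out of each sample response could be scripted along these lines. This is a sketch under assumptions: it presumes the response has been parsed into a dict with GeneratedQuery as an object containing find, as in the output above, and the function name schedule_state_oids is made up.

```python
def schedule_state_oids(response):
    """Extract the ScheduleState OIDs from the GeneratedQuery of a parsed response."""
    find = response.get("GeneratedQuery", {}).get("find", {})
    clause = find.get("ScheduleState")
    if clause is None:
        return []
    if isinstance(clause, dict):
        # Shape seen above: {"$in": [oid, ...]}
        return list(clause.get("$in", []))
    return [clause]


# Trimmed-down version of the GeneratedQuery shown above
sample_response = {
    "GeneratedQuery": {
        "fields": True,
        "find": {
            "ScheduleState": {"$in": [2890498684]},
            "_TypeHierarchy": {"$in": [-51038, 2890498773, 10487547445]},
        },
        "limit": 10,
        "skip": 0,
    }
}

# Map the state name used in the query to the OID(s) that came back
state_to_oids = {"In-Progress": schedule_state_oids(sample_response)}
print(state_to_oids)
```

Repeating this once per schedule state yields a name-to-OID table that can be attached to the checkbox entries.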