MongoDB aggregation on location data: Skip document/row if no movement - mongodb-query

I have data as sampled below.
I would like to query the location data from MongoDB and skip rows (or documents, in my case) where no movement happens. I can't group by location (lon and lat), because a location can be used as a start/end point multiple times.
Basically: skip a document if its lat and lon are the same as in the document with the immediately preceding timestamp.
In the sample I would want to skip the second and fourth document.
I figured this should somehow be possible with $skip in an aggregation together with a $cond or $lookup operation.
Thanks in advance!
```json
[
  {
    "_id": {"$oid": "5ee4dbe8d30bb8f72598731a"},
    "vehicle": "CF729EE12AA98C3673266DF965FD89CB4C8D2E84",
    "lat": 52.51312,
    "lon": 13.317027,
    "timestamp": {"$date": "2020-06-10T11:10:48.040Z"}
  },
  {
    "_id": {"$oid": "5ee4dbe8d30bb8f725987c24"},
    "vehicle": "CF729EE12AA98C3673266DF965FD89CB4C8D2E84",
    "lat": 52.51312,
    "lon": 13.317027,
    "timestamp": {"$date": "2020-06-10T11:16:22.662Z"}
  },
  {
    "_id": {"$oid": "5ee4dc21d30bb8f725a33905"},
    "vehicle": "CF729EE12AA98C3673266DF965FD89CB4C8D2E84",
    "lat": 52.53564,
    "lon": 13.42169,
    "timestamp": {"$date": "2020-06-11T15:57:50.290Z"}
  },
  {
    "_id": {"$oid": "5ee4dc21d30bb8f725a3421d"},
    "vehicle": "CF729EE12AA98C3673266DF965FD89CB4C8D2E84",
    "lat": 52.53564,
    "lon": 13.42169,
    "timestamp": {"$date": "2020-06-11T16:03:23.421Z"}
  }
]
```
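One way to express the "skip if same position as the previous document" rule is a window function rather than $skip: a sketch, assuming MongoDB 5.0+ (which has $setWindowFields/$shift). The pipeline below pulls the previous document's lat/lon into each document, then $matches documents that moved. The same logic is also shown in plain Python on the sample data, purely for illustration:

```python
# Sketch: "drop if no movement", first as a MongoDB 5.0+ aggregation
# pipeline (to be passed to collection.aggregate), then as plain Python
# applied to the sample documents.

pipeline = [
    {"$setWindowFields": {
        "partitionBy": "$vehicle",
        "sortBy": {"timestamp": 1},
        "output": {
            # Copy the previous document's position into this one.
            "prevLat": {"$shift": {"output": "$lat", "by": -1}},
            "prevLon": {"$shift": {"output": "$lon", "by": -1}},
        },
    }},
    # Keep a document only if its position differs from the previous one
    # (the first document per vehicle has prevLat/prevLon == null).
    {"$match": {"$expr": {"$or": [
        {"$ne": ["$lat", "$prevLat"]},
        {"$ne": ["$lon", "$prevLon"]},
    ]}}},
]

def drop_stationary(docs):
    """Pure-Python equivalent: keep a doc unless it has the same lat/lon
    as the chronologically previous doc."""
    docs = sorted(docs, key=lambda d: d["timestamp"])
    kept, prev = [], None
    for d in docs:
        if prev is None or (d["lat"], d["lon"]) != (prev["lat"], prev["lon"]):
            kept.append(d)
        prev = d
    return kept

sample = [
    {"lat": 52.51312, "lon": 13.317027, "timestamp": "2020-06-10T11:10:48.040Z"},
    {"lat": 52.51312, "lon": 13.317027, "timestamp": "2020-06-10T11:16:22.662Z"},
    {"lat": 52.53564, "lon": 13.42169,  "timestamp": "2020-06-11T15:57:50.290Z"},
    {"lat": 52.53564, "lon": 13.42169,  "timestamp": "2020-06-11T16:03:23.421Z"},
]

result = drop_stationary(sample)
# The second and fourth documents are dropped, as the question asks.
```

On versions before 5.0 the same effect needs a $group by vehicle plus $reduce over the sorted array, which is considerably clunkier.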

Related

Adobe Analytics - how to get report API 2.0 to get multi-level breakdown data using Java

I need help getting adobe-analytics data when multiple IDs are passed for a multi-level breakdown using API 2.0.
I am getting first-level data for V124:
"metricContainer": {
"metrics": [
{
"columnId": "0",
"id": "metrics/event113",
"sort": "desc"
}
]
},
"dimension": "variables/evar124",
but I want to use the IDs returned in the above call's response in a second-level breakdown of v124, to get v125, as:
"metricContainer": {
"metrics": [
{
"columnId": "0",
"id": "metrics/event113",
"sort": "desc",
"filters": [
"0"
]
}
],
"metricFilters": [
{
"id": "0",
"type": "breakdown",
"dimension": "variables/evar124",
"itemId": "2629267831"
},
{
"id": "1",
"type": "breakdown",
"dimension": "variables/evar124",
"itemId": "2629267832"
}
]
},
"dimension": "variables/evar125",
This always returns data only for the 2629267831 ID, never for 2629267832.
I want to get data for thousands of IDs (returned from the first API call) in a single API call. What am I doing wrong?
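Note that in the posted request, metrics[0].filters only lists "0", so only the metricFilter with id "0" (itemId 2629267831) is ever applied; the second metricFilter is defined but never referenced. One hedged reading (not verified against the Adobe docs) is that each breakdown itemId needs its own metric column referencing its own filter. The helper below is hypothetical and only builds the request body:

```python
# Sketch, assuming each breakdown itemId gets its own metric column whose
# "filters" list references its own metricFilter id. build_breakdown_request
# is a hypothetical helper, not part of any Adobe SDK.

def build_breakdown_request(item_ids, metric="metrics/event113"):
    metrics, metric_filters = [], []
    for i, item_id in enumerate(item_ids):
        fid = str(i)
        metrics.append({
            "columnId": fid,
            "id": metric,
            "sort": "desc",
            "filters": [fid],  # reference this column's own filter
        })
        metric_filters.append({
            "id": fid,
            "type": "breakdown",
            "dimension": "variables/evar124",
            "itemId": item_id,
        })
    return {
        "metricContainer": {
            "metrics": metrics,
            "metricFilters": metric_filters,
        },
        "dimension": "variables/evar125",
    }

req = build_breakdown_request(["2629267831", "2629267832"])
```

For "thousands of IDs" this would mean thousands of columns in one request, so batching across several calls may still be unavoidable.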

select node value from json column type

I have a table called raw_data with three columns: ID, timestamp and payload. The payload column is of JSON type, with values such as:
{
"data": {
"author_id": "1461871206425108480",
"created_at": "2022-08-17T23:19:14.000Z",
"geo": {
"coordinates": {
"type": "Point",
"coordinates": [
-0.1094,
51.5141
]
},
"place_id": "3eb2c704fe8a50cb"
},
"id": "1560043605762392066",
"text": " ALWAYS # London, United Kingdom"
},
"matching_rules": [
{
"id": "1560042248007458817",
"tag": "london-paris"
}
]
}
From this I want to select rows where the coordinates are available, such as [-0.1094, 51.5141] in this case.
SELECT *
FROM raw_data, json_each(payload)
WHERE json_extract(json_each.value, '$.data.geo.') IS NOT NULL
LIMIT 20;
Nothing was returned.
EDIT
NOT ALL json objects have the coordinates node. For example this value:
{
"data": {
"author_id": "1556031969062010881",
"created_at": "2022-08-18T01:42:21.000Z",
"geo": {
"place_id": "006c6743642cb09c"
},
"id": "1560079621017796609",
"text": "Dear Desperate sister say husband no dey oo."
},
"matching_rules": [
{
"id": "1560077018183630848",
"tag": "kaduna-kano-katsina-dutse-zaria"
}
]
}
The correct path is '$.data.geo.coordinates.coordinates' and there is no need for json_each():
SELECT *
FROM raw_data
WHERE json_extract(payload, '$.data.geo.coordinates.coordinates') IS NOT NULL;
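The corrected query can be tried end-to-end with Python's built-in sqlite3 module (assuming SQLite was built with the JSON1 functions, as modern builds are); the table schema below is assumed from the question:

```python
import json
import sqlite3

# Tiny in-memory reproduction of raw_data, using stripped-down versions
# of the two sample payloads from the question.
with_geo = {
    "data": {
        "geo": {
            "coordinates": {"type": "Point",
                            "coordinates": [-0.1094, 51.5141]},
            "place_id": "3eb2c704fe8a50cb",
        },
        "id": "1560043605762392066",
    }
}
without_geo = {
    "data": {"geo": {"place_id": "006c6743642cb09c"},
             "id": "1560079621017796609"}
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (ID INTEGER, timestamp TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [(1, "2022-08-17", json.dumps(with_geo)),
     (2, "2022-08-18", json.dumps(without_geo))],
)

# The corrected path from the answer; rows without the node are filtered out.
rows = conn.execute(
    """SELECT ID,
              json_extract(payload, '$.data.geo.coordinates.coordinates')
       FROM raw_data
       WHERE json_extract(payload, '$.data.geo.coordinates.coordinates')
             IS NOT NULL"""
).fetchall()
```

json_extract returns the array as JSON text, so the coordinates come back as the string "[-0.1094,51.5141]".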

One to many geo points order by route travel time

I have a tricky use case.
We have a selection of custom points of interest in our database.
From a mobile app, a client should be able to ask for the list of points of interest, ordered by real travel time.
It's not a shortest-path problem: there really are several possible points, and the client will choose the point he prefers himself, depending on the travel time between his position and each destination.
So the answer must be quick and cannot be precalculated.
In the worst case we could have 50 to 100 points.
So I'm looking for a web API that can be called in France, where I can send several route queries at once.
A JSON body could look like this, with more points of course:
[
{
"from": {
"lat": "user current lat",
"lon": "user current lon"
},
"dest": {
"lat": "point1 lat",
"lon": "point1 lon"
}
},
{
"from": {
"lat": "user current lat",
"lon": "user current lon"
},
"dest": {
"lat": "point2 lat",
"lon": "point2lon"
}
},
{
"from": {
"lat": "user current lat",
"lon": "user current lon"
},
"dest": {
"lat": "point3 lat",
"lon": "point3 lon"
}
},
{
"from": {
"lat": "user current lat",
"lon": "user current lon"
},
"dest": {
"lat": "point4 lat",
"lon": "point4 lon"
}
}
]
And then the response would return the route information for each point.
Do you know any API that has this capability, or any other solution for this use case?
I finally found a solution with some help.
The Distance Matrix API plus a middleware answers my needs:
https://developers.google.com/maps/documentation/distance-matrix/overview
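The Distance Matrix API fits because it takes one origin and many pipe-separated destinations in a single request. A sketch of the middleware side (coordinates and key are placeholders; nothing is sent here, we only build the request URL and sort a response-shaped stub by travel time):

```python
from urllib.parse import urlencode

def distance_matrix_url(origin, destinations, key):
    """Build a Distance Matrix request URL: one origin, many destinations."""
    params = {
        "origins": f"{origin[0]},{origin[1]}",
        # Multiple destinations are pipe-separated in one request.
        "destinations": "|".join(f"{lat},{lon}" for lat, lon in destinations),
        "key": key,
    }
    return ("https://maps.googleapis.com/maps/api/distancematrix/json?"
            + urlencode(params))

def sort_by_travel_time(destinations, matrix_row):
    """matrix_row: the 'elements' list for the single origin, in the same
    order as destinations; each element carries duration.value (seconds)."""
    paired = zip(destinations, matrix_row)
    return [dest for dest, elem in
            sorted(paired, key=lambda p: p[1]["duration"]["value"])]

# Placeholder points of interest around Paris and a placeholder key.
dests = [(48.8606, 2.3376), (48.8530, 2.3499), (48.8867, 2.3431)]
url = distance_matrix_url((48.8566, 2.3522), dests, key="YOUR_KEY")

# A response-shaped stub (durations in seconds) to show the sorting step:
row = [{"duration": {"value": 540}},
       {"duration": {"value": 120}},
       {"duration": {"value": 900}}]
ordered = sort_by_travel_time(dests, row)
```

With 50 to 100 destinations, one origin fits in very few requests, which keeps the answer fast enough to compute on demand.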

Cumulocity measurement representation

I create measurements on reception of an event. I can get them using the API, but they are not represented graphically in the Device Management interface. Is there a specific format they have to respect to be representable automatically? If so, is there a place I can find all the formats supported by Cumulocity? I inferred c8y_TemperatureMeasurement from the examples in the docs, but I didn't find an exhaustive list of the native formats.
Here are examples of the measurements I have at the moment:
{
"time": "2016-06-29T12:10:02.000+02:00",
"id": "27006",
"self": "https://<tenant-id>/measurement/measurements/27006",
"source": {
"id": "26932",
"self": "https://<tenant-id>/inventory/managedObjects/26932"
},
"type": "c8y_BatteryMeasurement",
"c8y_BatteryMeasurement": {
"unit": "V",
"value": 80
}
},
{
"time": "2016-06-29T10:15:22.000+02:00",
"id": "27010",
"self": "https://<tenant-id>/measurement/measurements/27010",
"source": {
"id": "26932",
"self": "https://<tenant-id>/inventory/managedObjects/26932"
},
"type": "c8y_TemperatureMeasurement",
"c8y_TemperatureMeasurement": {
"T": {
"unit": "C",
"value": 24
}
}
}
The measurements have to be sent to Cumulocity in the following format:
{
"fragment": {
"series": {
"unit": "x",
"value": y
}
}
}
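This fragment → series → unit/value nesting is exactly what distinguishes the two posted measurements: c8y_TemperatureMeasurement has the series level ("T"), while c8y_BatteryMeasurement puts unit/value directly under the fragment and so is missing it. A small illustration (the helper name and arguments are illustrative, not a Cumulocity SDK API):

```python
# Build a measurement body in the fragment -> series -> unit/value shape
# the answer describes. make_measurement is a hypothetical helper.

def make_measurement(source_id, mtype, fragment, series, value, unit, time):
    return {
        "source": {"id": source_id},
        "type": mtype,
        "time": time,
        # The fragment must contain a named series holding unit/value.
        fragment: {series: {"unit": unit, "value": value}},
    }

# The battery measurement from the question, reshaped with a series level
# ("B" is an assumed series name) so it can be charted:
battery = make_measurement(
    "26932", "c8y_BatteryMeasurement",
    fragment="c8y_BatteryMeasurement", series="B",
    value=80, unit="V", time="2016-06-29T12:10:02.000+02:00",
)
```

Posting the reshaped body to /measurement/measurements should make the value show up in the Device Management graphs like the temperature one does.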

Aggregations on most recent document in group using elasticsearch

Suppose there are several documents per person that contain values:
{
"name": "John",
"value": 1,
"timestamp": "2014-06-15"
}
{
"name": "John",
"value": 2,
"timestamp": "2014-06-16"
}
{
"name": "Sam",
"value": 2,
"timestamp": "2014-06-15"
}
{
"name": "Sam",
"value": 3,
"timestamp": "2014-06-16"
}
How do I get a list of the most recent documents for each person?
How do I get an average of the values for the list of the most recent documents for each person? Given the sample data, this would be 2.5, not 2.
Is there some combination of buckets and metrics that could achieve this result? Will I need to implement a custom aggregator as part of a plugin, or must this sort of computation be performed in memory?
If you only need to find the most recent document per person, try something like this:
"aggs": {
"personName": {
"terms": {
"field": "name",
"size": 5,
"order": {"timeCreated": "desc"}
},
"aggs": {
"timeCreated": {
"max": {"field": "timestamp"}
}
}
}
}
The second operation is just an aggregation, and to get the average of the value field you could try something like:
curl -XPOST "http://DOMAIN:9200/your/data/_search" -d'
{
"size": 0,
"aggregations": {
"the_name": {
"terms": {
"field": "name",
"order": {
"value_avg": "desc"
}
},
"aggregations": {
"value_avg": {
"avg": {
"field": "value"
}
}
}
}
}
}'
To achieve a solution for your first issue, I would recommend ordering the response by date and then, in your application, ignoring a term when you have already seen another with the same name (i.e., filter the data after the response comes back from ES).
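The client-side post-processing suggested above can be sketched in Python on the sample documents: keep only the most recent document per name, then average their values, which gives 2.5 rather than 2:

```python
# Post-process the ES hits client-side: latest document per name, then
# the average of those latest values. Uses the sample data from the question.

docs = [
    {"name": "John", "value": 1, "timestamp": "2014-06-15"},
    {"name": "John", "value": 2, "timestamp": "2014-06-16"},
    {"name": "Sam",  "value": 2, "timestamp": "2014-06-15"},
    {"name": "Sam",  "value": 3, "timestamp": "2014-06-16"},
]

latest = {}
for doc in docs:
    # ISO-8601 date strings compare correctly as plain strings.
    if (doc["name"] not in latest
            or doc["timestamp"] > latest[doc["name"]]["timestamp"]):
        latest[doc["name"]] = doc

average = sum(d["value"] for d in latest.values()) / len(latest)
# John's latest value is 2, Sam's is 3, so average == 2.5
```

(Inside ES itself, a terms aggregation with a top_hits sub-aggregation sorted by timestamp desc with size 1 would return the same per-person latest documents, but the averaging across those top hits would still have to happen client-side.)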