Selecting the latest document for each "Group" - sql

I am using the Azure Cosmos DB SQL API to try to achieve the following:
We have device data stored in a collection and would like to retrieve the latest event data per device serial efficiently, without having to run a separate query for each of N devices.
SELECT *
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1') ORDER BY c.EventEnqueuedUtcTime DESC
I'm assuming I would need to use GROUP BY - https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-group-by
Any assistance would be greatly appreciated.
Rough example of the data:
[
{
"temperature": 25.22063251827873,
"humidity": 71.54208429695204,
"serial": "V55555555",
"testid": 1,
"location": {
"type": "Point",
"coordinates": [
30.843687,
-29.789895
]
},
"EventProcessedUtcTime": "2020-09-07T12:04:34.5861918Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2020-09-07T12:04:34.4700000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "V55555555",
"ConnectionDeviceGenerationId": "637323979596346475",
"EnqueuedTime": "2020-09-07T12:04:34.0000000"
},
"Name": "admin",
"id": "6dac491e-1f28-450d-bf97-3a15a0efaad8",
"_rid": "i2UhAI7ofAo3AQAAAAAAAA==",
"_self": "dbs/i2UhAA==/colls/i2UhAI7ofAo=/docs/i2UhAI7ofAo3AQAAAAAAAA==/",
"_etag": "\"430131c1-0000-0100-0000-5f5621d80000\"",
"_attachments": "attachments/",
"_ts": 1599480280
}
]
UPDATE:
Doing the following returns the correct data, but sadly you can only return properties that appear in your GROUP BY or inside an aggregate function (i.e. you can't do SELECT *):
SELECT c.serial, MAX(c.EventProcessedUtcTime)
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')
GROUP BY c.serial
[
{
"serial": "synap-aim-g1",
"$1": "2020-09-09T06:29:42.6812629Z"
},
{
"serial": "V55555555",
"$1": "2020-09-07T12:04:34.5861918Z"
}
]

Thanks for @AnuragSharma-MSFT's help:
I am afraid there is no direct way to achieve it using a query in
Cosmos DB. However, you can refer to the link below on the same topic. If
you are using an SDK, this would help in achieving the desired
functionality: https://learn.microsoft.com/en-us/answers/questions/38454/index.html
We're glad that you resolved it in this way, thanks for sharing the update:
Doing the following returns the correct data, but sadly you can only return properties that appear in your GROUP BY or inside an aggregate function (i.e. you can't do SELECT *):
SELECT c.serial, MAX(c.EventProcessedUtcTime)
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')
GROUP BY c.serial
[
{
"serial": "synap-aim-g1",
"$1": "2020-09-09T06:29:42.6812629Z"
},
{
"serial": "V55555555",
"$1": "2020-09-07T12:04:34.5861918Z"
}
]
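Since GROUP BY cannot return the whole document, one client-side option is to fetch the candidate documents with the WHERE ... IN query from the question and keep only the newest one per serial in application code. The following is a minimal sketch under that assumption; the function name latest_per_serial and the shortened sample documents are made up for illustration and are not part of any SDK.

```python
from typing import Iterable


def latest_per_serial(docs: Iterable[dict], time_field: str = "EventEnqueuedUtcTime") -> dict:
    """Return a mapping of serial -> newest full document."""
    latest = {}
    for doc in docs:
        serial = doc["serial"]
        current = latest.get(serial)
        # Assumption: all timestamps are UTC ISO 8601 strings in the same
        # fixed-width format, so plain string comparison is chronological.
        if current is None or doc[time_field] > current[time_field]:
            latest[serial] = doc
    return latest


# Illustrative documents, trimmed to the fields that matter here.
docs = [
    {"serial": "V55555555", "EventEnqueuedUtcTime": "2020-09-07T12:04:34.4700000Z", "temperature": 25.2},
    {"serial": "V55555555", "EventEnqueuedUtcTime": "2020-09-07T11:00:00.0000000Z", "temperature": 24.0},
    {"serial": "synap-aim-g1", "EventEnqueuedUtcTime": "2020-09-09T06:29:42.0000000Z", "temperature": 19.5},
]
result = latest_per_serial(docs)
```

This still reads every matching document, so it only avoids the N separate queries, not the cost of scanning each device's history.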

If the question is really about an efficient approach to this particular query scenario, we can consider denormalization in cases where the query language itself doesn't offer an efficient solution. This guide on partitioning and modeling has a relevant section on getting the latest items in a feed.
We just need to get the 100 most recent posts, without the need to
paginate through the entire data set.
So to optimize this last request, we introduce a third container to
our design, entirely dedicated to serving this request. We denormalize
our posts to that new feed container.
Following this approach, you could create a "Feed" or "LatestEvent" container dedicated to the "latest" query. It would use the device serial as the id and a single partition key, which guarantees that there is only one (the most recent) event item per device. That item can then be fetched by device serial, or the items listed at the lowest possible cost, using a simple query:
SELECT *
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')
The change feed could be used to upsert the latest event, so that the latest event is created or overwritten in the "LatestEvent" container whenever its source item is created in the main container.
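The upsert logic that a change feed handler would apply can be sketched as follows. This is only an illustration under stated assumptions: a plain dict stands in for the "LatestEvent" container, the names to_latest_item and apply_change are hypothetical, and the real handler would call the SDK's upsert operation instead of writing to a dict.

```python
def to_latest_item(event: dict) -> dict:
    """Build the item stored in the (hypothetical) LatestEvent container."""
    # Drop system-generated properties of the source item (names starting with "_").
    item = {k: v for k, v in event.items() if not k.startswith("_")}
    # Using the device serial as id means there is one item per device.
    item["id"] = event["serial"]
    return item


def apply_change(latest_store: dict, event: dict, time_field: str = "EventEnqueuedUtcTime") -> bool:
    """Upsert the event if it is newer than the stored one; return True if written."""
    existing = latest_store.get(event["serial"])
    # Assumption: timestamps share one fixed-width UTC ISO 8601 format,
    # so string comparison matches chronological order.
    if existing is not None and existing[time_field] >= event[time_field]:
        return False
    latest_store[event["serial"]] = to_latest_item(event)
    return True


store = {}
first = apply_change(store, {"serial": "V55555555", "id": "a", "_ts": 1,
                             "EventEnqueuedUtcTime": "2020-09-07T12:04:34.4700000Z"})
stale = apply_change(store, {"serial": "V55555555", "id": "b", "_ts": 2,
                             "EventEnqueuedUtcTime": "2020-09-07T11:00:00.0000000Z"})
```

The timestamp check is a design choice for out-of-order delivery; if events always arrive in order, an unconditional upsert would be enough.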

Related

How to specify 'includes' for nested properties in a document in RavenDB

I am trying to use RavenDB's REST API to make some calls to my database and I'm wondering if there's a way to use the 'includes' feature to return documents that are nested in a document.
For example, I have an object, Order, that looks similar to this:
"Order": {
"Lines": [
{
"Product": "products/11-A"
},
{
"Product": "products/42-A"
},
{
"Product": "products/72-A"
}
],
"OrderedAt": "1996-07-04T00:00:00.0000000",
"Company": "companies/85-A"
}
Company maps to another document and is simple enough to include in the query.
{ "Query": "from Orders include Company" }
My problem is Product, which is nested in Lines, an array of order lines. Since I didn't find anything in the documentation about it, I tried things like include Product or include Lines.Product, but those didn't work.
Is this kind of thing possible with the REST API? If so, how would I go about doing it?
The syntax to Query for Related Documents from the client can be found in this demo:
https://demo.ravendb.net/demos/csharp/related-documents/query-related-documents
The matching RQL to be used in the Query when using REST API is:
from 'Orders' include 'Lines[].Product'
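To make the path syntax concrete, here is a rough illustration of which document ids a path such as Lines[].Product points at, where [] means "every element of this array". This is not RavenDB's implementation, and collect_include_ids is a hypothetical helper written only for this explanation.

```python
def collect_include_ids(doc: dict, path: str) -> list:
    """Collect the values reached by a path like 'Lines[].Product'."""
    nodes = [doc]
    for part in path.split("."):
        is_array = part.endswith("[]")
        key = part[:-2] if is_array else part
        next_nodes = []
        for node in nodes:
            value = node.get(key)
            if value is None:
                continue
            if is_array:
                # Step into every element of the array.
                next_nodes.extend(value)
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes


order = {
    "Lines": [
        {"Product": "products/11-A"},
        {"Product": "products/42-A"},
        {"Product": "products/72-A"},
    ],
    "OrderedAt": "1996-07-04T00:00:00.0000000",
    "Company": "companies/85-A",
}
product_ids = collect_include_ids(order, "Lines[].Product")
company_ids = collect_include_ids(order, "Company")
```

Without the [] marker, Lines.Product would look for a Product property on the array itself, which is consistent with that attempt not working.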

Delete function for multi data

I created a collection like the one below. Can I write a query to delete a specific addon from this collection?
Here, my partition key is id.
{
"id": "00000000-0000-0000-0000-000000000000",
"Subcategory": [{
"Product": [{
"MethodOfPreparation": [{ }],
"Addons": [{ }]
}]
}]
}
Please help me
First, a point of clarification: I presume you are talking about a document, not a collection.
Deleting documents with SQL is not possible so far. You need to fetch the documents with a SELECT query and then delete them individually through the SDK's delete operation.
If you want to remove part of a document, as described in your question, you need to read the document first, remove the parts you no longer want, and then replace the document. Please see this recent blog post.
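The read, modify, replace cycle described above can be sketched like this. Only the "modify" step is shown so that the example runs offline; the "name" field on each addon is an assumption, since the question does not show what an addon looks like, and remove_addon is a hypothetical helper.

```python
import copy


def remove_addon(document: dict, addon_name: str) -> dict:
    """Return a copy of the document without the addon called addon_name."""
    updated = copy.deepcopy(document)
    for subcategory in updated.get("Subcategory", []):
        for product in subcategory.get("Product", []):
            # Assumption: each addon is an object with a "name" property.
            product["Addons"] = [
                addon for addon in product.get("Addons", [])
                if addon.get("name") != addon_name
            ]
    return updated


document = {
    "id": "00000000-0000-0000-0000-000000000000",
    "Subcategory": [{
        "Product": [{
            "MethodOfPreparation": [{}],
            "Addons": [{"name": "cheese"}, {"name": "olives"}],
        }]
    }],
}
updated = remove_addon(document, "cheese")
```

With the azure-cosmos Python SDK, the surrounding steps would typically be container.read_item(...) before and container.replace_item(...) after this call, passing the document id as the partition key value in this case.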

Cumulocity Inventory API filter by Creation Date

I'm currently trying to implement a simple date filter for the Inventory API using the query language. The filter should return a list of managed objects that were created after a given date. For some reason I always receive an empty list as the result, even though the example in the query language documentation looks the same as my query:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'
gives me
{
"managedObjects": [],
"next": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=2",
"statistics": {
"currentPage": 1,
"pageSize": 5
},
"self": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=1"
}
And if I try this structure for the timestamp I even receive an error:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.3512B1:00'
{
"error": "inventory/Invalid Data",
"info": "https://www.cumulocity.com/guides/reference-guide/#error_reporting",
"message": "Find by filter query failed : Query 'creationTime gt '2018-12-01T09:00:00'' could not be understood. Please try again."
}
Try filtering by
creationTime.date
instead of creationTime. The background is that the timestamps are stored as MongoDB dates.
You can also check the device list filter in device management which has a filter on creationTime as well.
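A small sketch of building the request URL with the suggested creationTime.date property, letting the standard library handle the URL encoding. The base URL is a placeholder and created_after_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def created_after_url(base_url: str, iso_timestamp: str) -> str:
    """Build an Inventory API URL filtering on creationTime.date."""
    query = f"creationTime.date gt '{iso_timestamp}'"
    # urlencode escapes spaces, quotes and colons in the query expression.
    return f"{base_url}/inventory/managedObjects?{urlencode({'query': query})}"


# Placeholder tenant URL, standing in for {{url}} in the question.
url = created_after_url("https://tenant.example.com", "2018-12-01T09:00:53.351Z")
# Decode the query string again to check what the server would receive.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

Encoding the expression this way also takes care of characters such as + in a timezone offset, which would otherwise be read as a space.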

Square Connect API: Retrieving all items within a category

I have been reading over the Square Connect API and messing around with the catalog portion.
I am unable to find how to retrieve all items and their data associated with a particular category. Can someone please point me in the right direction.
I thought it was the BatchRetrieveCatalogObjects endpoint.
I was using the category ID, but it was only returning the catalog's data. I need the ID of each item to retrieve its individual data.
I was looking to populate a list of all the items and their data in one request, in JSON.
JSON data to be passed to endpoint:
data = {
"object_ids": [
"category id"
],
"include_related_objects": True
}
My connection to the API:
category_item_endpoint = self.connection.post('/v2/catalog/batch-retrieve', data)
I am using python3 and the requests library.
In order to list items in a category I found it easiest to use the /v2/catalog/search endpoint. Simply follow the documentation on what parameters are accepted. Below are the search parameters that I used to list items by category id.
let sParams: JSON = [
"object_types": [
"ITEM"
],
"include_related_objects": true,
"include_deleted_objects": false,
"query": [
"exact_query": [
"attribute_name": "category_id",
"attribute_value": id
]
],
"limit": 1000
]
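Since the question uses Python, the same search parameters can be written as a Python dict. This is a direct translation of the parameters above, not a tested call against the Square API; category_search_body is a hypothetical helper and the category id is a placeholder.

```python
import json


def category_search_body(category_id: str, limit: int = 1000) -> dict:
    """Build the request body for listing items by category id."""
    return {
        "object_types": ["ITEM"],
        "include_related_objects": True,
        "include_deleted_objects": False,
        "query": {
            "exact_query": {
                "attribute_name": "category_id",
                "attribute_value": category_id,
            }
        },
        "limit": limit,
    }


body = category_search_body("CATEGORY_ID_PLACEHOLDER")
# Serialized form, as it would be sent in the POST request.
payload = json.dumps(body)
```

With the connection wrapper from the question, this would presumably be sent as self.connection.post('/v2/catalog/search', body).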
You'd probably have the most luck listing your entire catalog with GET /v2/catalog/list and then filtering (in this case by specific category IDs) after you get the data. Based on the documentation, doing what you want doesn't seem possible with a single endpoint/query combination.

Without JOINs, what is the right way to handle data in document databases?

I understand that JOINs are either not possible or frowned upon in document databases. I'm coming from a relational database background and trying to understand how to handle such scenarios.
Let's say I have an Employees collection where I store all employee related information. The following is a typical employee document:
{
"id": 1234,
"firstName": "John",
"lastName": "Smith",
"gender": "Male",
"dateOfBirth": "3/21/1967",
"emailAddresses":[
{ "email": "johnsmith@mydomain.com", "isPrimary": "true" },
{ "email": "jsmith@someotherdomain.com", "isPrimary": "false" }
]
}
Let's also say, I have a separate Projects collection where I store project data that looks something like that:
{
"id": 444,
"projectName": "My Construction Project",
"projectType": "Construction",
"projectTeam":[
{ "_id": 2345, "position": "Engineer" },
{ "_id": 1234, "position": "Project Manager" }
]
}
If I want to return a list of all my projects along with project teams, how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc?
Is it two separate queries? One for projects and the other for people whose ID's appear in the projects collection?
If so, how do I then insert the data about people i.e. full names, email addresses? Do I then do a foreach loop in my app to update the data?
If I'm relying on my application to handle populating all the pertinent data, is this not a performance hit that would offset the performance benefits of document databases such as MongoDB?
Thanks for your help.
"...how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc? Is it two separate queries?"
It is either 2 separate queries OR you denormalize into the Project document. In our applications we do the 2nd query and keep the data as normalized as possible in the documents.
It is actually NOT common to see the "_id" key anywhere but on the top-level document. Further, for collections that you are going to have millions of documents in, you save storage by keeping the keys "terse". Consider "name" rather than "projectName", "type" rather than "projectType", "pos" rather than "position". It seems trivial but it adds up. You'll also want to put an index on "team.empId" so the query "how many projects has Joe Average worked on" runs well.
{
"_id": 444,
"name": "My Construction Project",
"type": "Construction",
"team":[
{ "empId": 2345, "pos": "Engineer" },
{ "empId": 1234, "pos": "Project Manager" }
]
}
Another thing to get used to is that you don't have to write the whole document every time you want to update an individual field or, say, add a new member to the team. You can do targeted updates that uniquely identify the document but only update an individual field or array element.
db.projects.update(
{ _id : 444 },
{ $addToSet : { "team" : { "empId": 666, "pos": "Minion" } } }
);
The 2 queries to get one thing done hurts at first, but you'll get past it.
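The two-query approach from this answer can be sketched in application code as follows. In-memory lists stand in for the results of the two queries (roughly db.projects.find() and a find on employees with $in over the collected ids); the helper names and the second employee's details are made up for illustration.

```python
def team_employee_ids(projects: list) -> list:
    """Collect the distinct employee ids referenced by all project teams."""
    return sorted({member["empId"] for p in projects for member in p["team"]})


def attach_team_details(projects: list, employees: list) -> list:
    """Merge employee names into each team member, without mutating the inputs."""
    by_id = {e["id"]: e for e in employees}
    result = []
    for project in projects:
        team = []
        for member in project["team"]:
            emp = by_id.get(member["empId"], {})
            team.append({**member,
                         "firstName": emp.get("firstName"),
                         "lastName": emp.get("lastName")})
        result.append({**project, "team": team})
    return result


# Result of query 1: all projects.
projects = [{"_id": 444, "name": "My Construction Project", "type": "Construction",
             "team": [{"empId": 2345, "pos": "Engineer"},
                      {"empId": 1234, "pos": "Project Manager"}]}]
ids = team_employee_ids(projects)
# Result of query 2: employees whose id is in `ids` (second entry is invented).
employees = [{"id": 1234, "firstName": "John", "lastName": "Smith"},
             {"id": 2345, "firstName": "Jane", "lastName": "Doe"}]
merged = attach_team_details(projects, employees)
```

The merge is a single dictionary lookup per team member, so the application-side cost is small compared with the two round trips to the database.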
MongoDB is a document storage database.
It supports high availability and scalability.
To return a list of all your projects along with the project team details,
as far as I understand, you will have to run 2 queries.
Since MongoDB does not have FK constraints, we need to maintain these relationships at the application level.
Instead of FK constraints:
1) If the amount of data is small, we can embed it as a subdocument.
2) Rather than designing the database in a normalized way, in MongoDB we design according to the access pattern, i.e. the way the data is most likely to be queried. (Updates take more time, but performance at the user end depends mainly on read activity, which will be better than with an RDBMS.)
The following link provides a certificate course on MongoDB, free of cost:
MongoDB University
They also have a forum, which is pretty good.