I am new to MongoDB and have a query I am struggling with.
I have a collection of reported users which looks like this:
{
"_id": 1,
"userId": 1,
"reason": "some reason",
"date": "2017-07-22"
}
I need a query that will add to each report the number of reports for that userId.
That is, if the collection has 3 records with userId=1, the query should return those three records, and each of them should also include a field count=3, so the record above would now look like this:
{
"_id": 1
"userId": 1,
"reason": "some reason",
"date": "2017-07-22",
"count": 3
}
I tried using the $project and $addFields aggregation stages but was not able to add a field that is the result of a query over the whole collection.
Any ideas?
The answer provided by Veeram is correct; just note that if you are running a MongoDB version older than 3.6, in the last stage of the aggregation you may want to replace the $replaceRoot operator with a $project:
aggregate([
  { $group: { _id: "$userId", data: { $push: "$$ROOT" }, count: { $sum: 1 } } },
  { $unwind: "$data" },
  { $project: { _id: "$data._id", userId: "$data.userId", reason: "$data.reason", date: "$data.date", count: 1 } }
])
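For reference, on MongoDB 3.6+ the last stage can instead merge the pushed document with the count using $replaceRoot and $mergeObjects. A minimal sketch (the collection name reports is assumed here, since the question does not state it):
db.reports.aggregate([
  { $group: { _id: "$userId", data: { $push: "$$ROOT" }, count: { $sum: 1 } } },
  { $unwind: "$data" },
  { $replaceRoot: { newRoot: { $mergeObjects: ["$data", { count: "$count" }] } } }
])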
What am I trying to achieve:
I would like to have a time series chart showing the total number of members in my club at any point in time. This member count should be calculated from the fields "Eintrittsdatum" (joining date) and "Austrittsdatum" (leaving date). I think of it as a running sum: every filled joining-date field means +1 on the member count, every leaving-date entry means -1.
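As a small illustration with hypothetical dates (not taken from my data), the chart should step like this:
date       | event | member count
2020-03-01 | join  | 1
2020-04-15 | join  | 2
2020-06-30 | leave | 1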
Data structure
I’m calling the API of webling.ch with a secret key. This is my data structure with sample data per member:
[
{
"type": "member",
"meta": {
"created": "2020-03-02 11:33:00",
"createuser": {
"label": "Joana Doe",
"type": "user"
},
"lastmodified": "2022-12-06 16:32:56",
"lastmodifieduser": {
"label": "Joana Doe",
"type": "user"
}
},
"readonly": true,
"properties": {
"Mitglieder ID": 99,
"Anrede": "Dear",
"Vorname": "Jon",
"Name": "Doe",
"Strasse": "Doeington Street",
"Adresszusatz": null,
"PLZ": "9999",
"Ort": "Doetown",
"E-Mail": "jon.doe#doenet.net",
"Telefon Privat": null,
"Telefon Geschäft": null,
"Mobile": "099 877 54 54",
"Geschlecht": "m",
"Geburtstag": "1966-03-10",
"Mitgliedschaftstyp": "Aktivmitgliedschaft",
"Eintrittsdatum": "2020-03-01",
"Austrittsdatum": null,
"Passfoto": null,
"Wordpress Benutzername": null,
"Wohnhaft im Glarnerland": false,
"Lat": "43.1563379",
"Long": "6.0474622"
},
"parents": [
240
],
"children": {
},
"links": {
"debitor": [
2124,
3056,
3897
],
"attendee": [
2576
]
},
"id": 1815
}
]
Grafana data source
I am using the “JSON API” by Marcus Olsson: GitHub - grafana/grafana-json-datasource: A data source plugin for loading JSON APIs into Grafana.
Grafana v9.3.1 (89b365f8b1) on Linux
My current approach
Queries:
Query C - uses a filter on the source-API to only show entries with "Eintrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Eintrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Query D - uses a filter on the source-API to only show entries with "Austrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Austrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Here's a screenshot to clarify things
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-1.png)
Transformations:
My applied transformations
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-2.png)
What's working
I can correctly gather the number of members added/subtracted per day.
What's not working
I can't get the graph to display the way I want: I'd like to have a running sum of these numbers instead of the following two graphs.
Time series graph with merged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-3.png)
Time series graph with unmerged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-4.png)
I can't get the names to display within the tooltip of the data points (really not THAT necessary).
I have a simple string and a hash stored in Redis:
get test
"1"
hget htest first
"first hash"
I'm able to see the "table" test, but there are no columns
trino> show columns from redis.default.test;
Column | Type | Extra | Comment
--------+------+-------+---------
(0 rows)
and obviously I can't get results from a select:
trino> select * from redis.default.test;
Query 20210918_174414_00006_dmp3x failed: line 1:8: SELECT * not allowed from relation
that has no columns
I see in the documentation that I might need to create a table definition file, but I wasn't able to create one that works.
I had a few variations of this; here is one, for example:
{
"tableName": "test",
"schemaName": "default",
"value": {
"dataFormat": "json",
"fields": [
{
"name": "number",
"mapping": 0,
"type": "INT"
}
]
}
}
Any idea what I am doing wrong?
I focused on the string since it's simpler, but I also need to query the hash.
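For comparison, this is a rough sketch of the key/value shape that the connector's table definition files generally use; the field names redis_key and redis_value are placeholders of my own, and the exact dataFormat values (especially whatever is needed for the hash case) would have to be checked against the Redis connector documentation. The file also has to sit in the directory configured via redis.table-description-dir.
{
  "tableName": "test",
  "schemaName": "default",
  "key": {
    "dataFormat": "raw",
    "fields": [
      { "name": "redis_key", "type": "VARCHAR" }
    ]
  },
  "value": {
    "dataFormat": "raw",
    "fields": [
      { "name": "redis_value", "type": "VARCHAR" }
    ]
  }
}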
Background:
I wish to locate the entire JSON document that satisfies a condition where "state" = "new" and where length(Features.id) > 4:
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
This is what I have tried to do:
Since this is a nested document, my query looks like the one below. A Stack Overflow member helped me access the nested contents within the query, but is there a way to obtain the full document?
I have used:
SELECT VALUE t.id FROM t IN f.feedback.Features where t.state = 'new' and length(t.id)>4
This gives me the ids.
What I want is access to the full document that matches this condition:
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
Any help is appreciated
Try this:
SELECT *
FROM f
WHERE
f.feedback.Features[0].state = 'new'
AND length(f.feedback.Features[0].id)>4
Here is the SELECT spec for Cosmos DB, for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-select
Also, check out "Working with JSON" in the Cosmos DB docs:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-working-with-json
If the Features array has more than one value, you can use the EXISTS clause to search within it. See the EXISTS spec here, with examples:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#exists-expression
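For instance, a sketch of such an EXISTS query against the document shape above (not tested against your data) could look like:
SELECT *
FROM f
WHERE EXISTS (
    SELECT VALUE t
    FROM t IN f.feedback.Features
    WHERE t.state = 'new' AND length(t.id) > 4
)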
With the query
SELECT * FROM some_table;
I can get the following result:
{
"sku0": {
"Id": "18418",
"Desc": "yes"
},
"sku1": {
"Id": "17636",
"Desc": "no"
},
"sku2": {
"Id": "206714",
"Desc": "yes"
},
"brand": "abc",
"displayName": "something"
}
First, the number of skus is not fixed. It may be sku0, sku1, sku2, sku3, sku4 ..., but they all start with sku.
I want to find the entry whose Id is 17636 and determine whether its Desc value is yes or no. After reading the PostgreSQL JSON Functions and Operators documentation, I unfortunately didn't find a good way to do this.
I could convert the result into a Python dictionary and then easily achieve my requirement with Python's methods. But if the requirement can also be achieved with PostgreSQL statements, is that method more recommended than the Python dictionary?
I am not sure I completely understand what result you want. But if you want to filter on the Id, you need to unnest all the elements inside the JSON column:
select d.v ->> 'Desc' as description
from the_table t
cross join jsonb_each(t.data) as d(k,v)
where d.v ->> 'Id' = '17636'
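If you just need a yes/no flag, a small variation of the same query (still assuming the JSON column is named data, as above) can return a boolean directly:
select (d.v ->> 'Desc') = 'yes' as desc_is_yes
from the_table t
cross join jsonb_each(t.data) as d(k,v)
where d.v ->> 'Id' = '17636';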
You could use the new jsonpath notation of PostgreSQL v12:
SELECT data @@ '$.* ? (@.Id == "17636").Desc == "yes"'
FROM some_table;
That will start with the root of data ($), find any attribute in it (*), filter only those attributes that contain an Id with value "17636", get their Desc attribute and return TRUE only if that attribute is "yes".
Nice, isn't it?
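If you want to fetch the matching rows rather than a boolean, the same jsonpath predicate can be used as a filter, for example:
SELECT *
FROM some_table
WHERE data @@ '$.* ? (@.Id == "17636").Desc == "yes"';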
This will probably give you what you need.
select value->>'Desc' from jsonb_each('{
"sku0": {
"Id": "18418",
"Desc": "yes"
},
"sku1": {
"Id": "17636",
"Desc": "no"
},
"sku2": {
"Id": "206714",
"Desc": "yes"
},
"brand": "abc",
"displayName": "something"
}'::jsonb)
where key like 'sku%'
and value->>'Id'='17636'
Best regards,
Bjarni
I am trying to do a simple facet request over a field containing more than one word (e.g. 'Name1 Name2', sometimes with dots and commas inside), but what I get is...
"terms" : [{
"term" : "Name1",
"count" : 15
},
{
"term" : "Name2",
"count" : 15
}]
so my field value is split on spaces before the facet request runs...
Query example:
curl -XGET http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true -d '{
"query": {
"query_string": {
"fields": [
"dataset"
],
"query": "2",
"default_operator": "AND"
}
},
"facets": {
"test": {
"terms": {
"field": [
"speciesName"
],
"size": 50000
}
}
}
}'
Your field shouldn't be analyzed, or at least not tokenized. You need to update your mapping and then reindex if you want to index the field without tokenizing it.
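A minimal sketch of such a mapping for the old, facet-era Elasticsearch versions this question uses, assuming you create a fresh index (named idx_occurrence_v2 here purely as an example) and reindex into it:
curl -XPUT http://my_server:9200/idx_occurrence_v2 -d '{
  "mappings": {
    "Occurrence": {
      "properties": {
        "speciesName": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'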
First of all, javanna provided a very good answer from a practical perspective. However, for the sake of completeness, I want to mention that in some cases there is a way to do it without reindexing the data.
If the speciesName field is stored and your queries produce a relatively small number of results, you can use script_field to retrieve stored field values:
curl -XGET http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true -d '{
"query": {
"query_string": {
"fields": ["dataset"],
"query": "2",
"default_operator": "AND"
}
},
"facets": {
"test": {
"terms": {
"script_field": "_fields['\''speciesName'\''].value",
"size": 50000
}
}
}
}
'
As a result of this query, elasticsearch will retrieve the speciesName field for every record in your result set and it will construct facets from these values. Needless to say, if your result set contains millions of records, performance of this query might be sluggish.
Similarly, if the field is not stored but the record source is, you can use a script_field facet to retrieve field values from the source:
......
"script_field": "_source['\''speciesName'\'']",
......
Again, the source for each record in the result list will be retrieved and parsed, so you might need some patience when running this query on a large set of records.