Find distinct string lengths of a field using the mongo shell

Given a mongo collection like:
col1  col2
1     "mango"
2     "banana"
3     "watermelon"
4     "orange"
How do I get the distinct string lengths of the col2 field? It would probably use the $strLenCP operator, but I am not able to construct it just for the projection.
Expected output would be:
(5, 6, 10)
since banana and orange both have length 6, watermelon has length 10, and mango has length 5.

You can do this with an aggregation pipeline by using $strLenCP within a $group:
db.test.aggregate([
  // Group documents by their col2 string length
  {$group: {_id: {$strLenCP: '$col2'}}}
])
Output:
{ "_id" : 10 }
{ "_id" : 6 }
{ "_id" : 5 }

Related

How can I modify all values that match a condition inside a json array?

I have a table which has a JSON column called people like this:
Id  people
1   [{ "id": 6 }, { "id": 5 }, { "id": 3 }]
2   [{ "id": 2 }, { "id": 3 }, { "id": 1 }]
...and I need to update the people column and put a 0 in the path $[*].id where id = 3, so after executing the query, the table should end up like this:
Id  people
1   [{ "id": 6 }, { "id": 5 }, { "id": 0 }]
2   [{ "id": 2 }, { "id": 0 }, { "id": 1 }]
There may be more than one match per row.
Honestly, I haven't tried any query yet, since I cannot figure out how to loop inside a field, but my idea was something like this:
UPDATE mytable
SET people = JSON_SET(people, '$[*].id', 0)
WHERE /* ...something should go here */
This is my version:
SELECT VERSION();
+-----------------+
| version()       |
+-----------------+
| 10.4.22-MariaDB |
+-----------------+
If the id values in people are unique, you can use a combination of JSON_SEARCH and JSON_REPLACE to change the values:
UPDATE mytable
SET people = JSON_REPLACE(people, JSON_UNQUOTE(JSON_SEARCH(people, 'one', 3)), 0)
WHERE JSON_SEARCH(people, 'one', 3) IS NOT NULL
Note that the WHERE clause is necessary: when the value is not found, JSON_SEARCH returns NULL, which causes JSON_REPLACE to return NULL as well, and without the filter the query would overwrite people with NULL.
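To see what is being handed to JSON_REPLACE, you can run the search part on its own; a quick sketch against the sample table (JSON_SEARCH returns the path as a quoted JSON string, e.g. "$[2].id" for row 1, which is why it is wrapped in JSON_UNQUOTE):
SELECT Id,
       JSON_SEARCH(people, 'one', 3) AS quoted_path,
       JSON_UNQUOTE(JSON_SEARCH(people, 'one', 3)) AS path
FROM mytable;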
If the id values are not unique, you will have to rely on string replacement, preferably using REGEXP_REPLACE to deal with possible differences in spacing in the values (and also to avoid replacing the 3 in, for example, 23 or 34):
UPDATE mytable
-- \1 restores the captured "id": prefix; the trailing 0 is the new value
SET people = REGEXP_REPLACE(people, '("id"\\s*:\\s*)3\\b', '\\10')
Demo on dbfiddle
As stated in the official documentation, MariaDB stores JSON as an ordinary string (the JSON type is an alias for LONGTEXT), so you can use either the JSON_SET function or any string function.
For your specific task, applying the REPLACE string function may suit your case:
UPDATE mytable
SET people = REPLACE(people, CONCAT('"id": ', 3, ' '), CONCAT('"id": ', 0, ' '))
WHERE ....;

How to convert a JSON field to Tabular format in SQL Query?

I have a table containing 3 columns (ID, Content, Date), where the Content column has values in JSON format as shown below:
{
  "Id": "9999",
  "Name": "PETERPAN",
  "SubContent": [
    {
      "subcontent1": "ABC",
      "subcontent2": "123"
    }
  ]
}
How can I convert it into tabular format using a SQL query?
Use LATERAL FLATTEN to get the key/value pairs as separate rows:
with t as (
  select parse_json('{
    "Id": "9999",
    "Name": "PETERPAN",
    "SubContent":
    {
      "subcontent1": "ABC",
      "subcontent2": "123"
    }
  }') col
)
select col:Id as id, col:Name as name, sc.key, sc.value
from t, lateral flatten(input => col:SubContent) sc;
The result is:
ID    NAME      KEY          VALUE
9999  PETERPAN  subcontent1  ABC
9999  PETERPAN  subcontent2  123
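Note that the snippet above treats SubContent as an object. If SubContent stays an array of objects, as in the question, LATERAL FLATTEN iterates over the array elements instead and each field is reached through value; a sketch of that variant (Snowflake syntax, as above):
with t as (
  select parse_json('{"Id": "9999", "Name": "PETERPAN", "SubContent": [{"subcontent1": "ABC", "subcontent2": "123"}]}') col
)
select col:Id::string as id,
       col:Name::string as name,
       sc.value:subcontent1::string as subcontent1,
       sc.value:subcontent2::string as subcontent2
from t, lateral flatten(input => col:SubContent) sc;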

GET last element of array in json column of my Transact SQL table

Thanks for helping.
I have my table CONVERSATION structured in columns like this:
[ ID, JSON_CONTENT ]
In the ID column I have a simple id in varchar, and in the JSON_CONTENT column I have something like this:
{
  "id_conversation": "25bc8cbffa8b4223a2ed527e30d927bf",
  "exchanges": [
    { "A": "...", "B": "..." },
    { "A": "...", "B": "..." },
    { "A": "...", "Z": "..." }
  ]
}
I would like to query and get the id and the last element of exchanges :
[ ID , LAST_ELT_IN_EXCHANGE_IN_JSON_CONTENT]
I wanted to do this:
select TOP 3 ID, JSON_QUERY(JSON_CONTENT, '$.exchanges[-1]')
from CONVERSATION
But of course Transact SQL is not Python.
I saw these answers, but I don't know how to apply them to my problem:
Select last value from Json array
Thanks for helping <3
If I understand you correctly, you need an additional APPLY operator and a combination of OPENJSON() and ROW_NUMBER(). The result of the OPENJSON() call is a table with columns key, value and type; when the JSON content is an array, the key column returns the index of each element in the array:
Table:
SELECT ID, JSON_CONTENT
INTO CONVERSATION
FROM (VALUES
(1, '{"id_conversation":"25bc8cbffa8b4223a2ed527e30d927bf","exchanges":[{"A":"...","B":"..."},{"A":"...","B":"..."},{"A":"...","Z":"..."}]}')
) v (ID, JSON_CONTENT)
Statement:
SELECT c.ID, j.[value]
FROM CONVERSATION c
OUTER APPLY (
    SELECT [value], ROW_NUMBER() OVER (ORDER BY CONVERT(int, [key]) DESC) AS rn
    FROM OPENJSON(c.JSON_CONTENT, '$.exchanges')
) j
WHERE j.rn = 1
Result:
ID value
------------------------
1 {
"A" : "...",
"Z" : "..."
}
Notice that -1 is not a valid array index in your path expression, but you can access an item in a JSON array by index (e.g. '$.exchanges[2]').
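For the sample row above, whose exchanges array has three elements, that fixed-index access would be:
SELECT ID, JSON_QUERY(JSON_CONTENT, '$.exchanges[2]') AS last_exchange
FROM CONVERSATION
This only works when the array length is known up front, which is why the OPENJSON()/ROW_NUMBER() statement above is the general solution.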

Postgresql search if exists in nested jsonb

I'm new to jsonb queries and I have a problem. Inside an 'items' table, I have 'id' and a 'data' jsonb column. Here is what the data can look like:
[
  { "paramId": 3, "value": "dog" },
  { "paramId": 4, "value": "cat" },
  { "paramId": 5, "value": "fish" },
  {
    "paramId": 6,
    "value": "",
    "fields": [
      { "paramId": 3, "value": "cat" },
      { "paramId": 4, "value": "dog" }
    ]
  },
  {
    "paramId": 6,
    "value": "",
    "fields": [
      { "paramId": 5, "value": "cat" },
      { "paramId": 3, "value": "dog" }
    ]
  }
]
The value in data is always an array of objects, but sometimes an object can carry a 'fields' key with more objects inside; it is at most one level deep.
How can I select the id of the items which have, for example, an object containing "paramId": 3 and "value": "cat" and also an object with "paramId": 5 and "value" LIKE '%ish%'?
I have already found a way to do that when the objects are at level 0:
SELECT i.*
FROM items i
JOIN LATERAL jsonb_array_elements(i.data) obj3(val) ON obj3.val->>'paramId' = '3'
JOIN LATERAL jsonb_array_elements(i.data) obj5(val) ON obj5.val->>'paramId' = '5'
WHERE obj3.val->>'value' = 'cat'
AND obj5.val->>'value' LIKE '%ish%';
but I don't know how to search inside the fields array when fields exists.
Thank you in advance for your help.
EDIT:
It looks like my question was not clear, so I will try to state it better.
I want to find all the items whose 'data' column contains objects matching my search criteria, regardless of whether those objects sit at the first level or inside the 'fields' key of an object.
Again, for example, the record above should be selected if I search for:
'paramId': 3 AND 'value': 'cat'
'paramId': 4 AND 'value' LIKE '%og%'
The matching objects are inside the 'fields' key of the objects with 'paramId': 6, and I don't know how to reach them.
This can be expressed using a SQL/JSON path expression, without the need to unnest everything.
To search for paramId = 3 and value = 'cat':
select *
from items
where data @? '$[*] ? ( (@.paramId == 3 && @.value == "cat") || exists( @.fields[*] ? (@.paramId == 3 && @.value == "cat")) )'
The $[*] part iterates over all elements of the first-level array. To check the elements in the fields array, the exists() operator is used to nest the expression: @.fields[*] iterates over all elements of the fields array and applies the same condition again. I don't see a way to avoid repeating the values, though.
For a "like" condition, you can use like_regex:
select *
from items
where data @? '$[*] ? ( (@.paramId == 4 && @.value like_regex ".*og.*") || exists( @.fields[*] ? (@.paramId == 4 && @.value like_regex ".*og.*")) )'
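Since each @? test is an ordinary boolean predicate, the two criteria from the question can simply be combined with AND; a sketch following the same pattern:
select *
from items
where data @? '$[*] ? ( (@.paramId == 3 && @.value == "cat") || exists( @.fields[*] ? (@.paramId == 3 && @.value == "cat")) )'
  and data @? '$[*] ? ( (@.paramId == 4 && @.value like_regex ".*og.*") || exists( @.fields[*] ? (@.paramId == 4 && @.value like_regex ".*og.*")) )';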
For now I have found a solution, but it is not really clean and I don't know how it will perform in production with 10M records.
SELECT i.id, i.data
FROM ( -- A;
    SELECT it.id, it.data, i AS value
    FROM items it,
         jsonb_array_elements(it.data) i
    UNION
    SELECT it.id, it.data, f AS value
    FROM items it,
         jsonb_array_elements(it.data) i,
         jsonb_array_elements(i -> 'fields') f
) AS i
WHERE (i.value ->> 'paramId' = '5' -- B1;
       AND i.value ->> 'value' LIKE '%ish%')
   OR (i.value ->> 'paramId' = '3' -- B2;
       AND i.value ->> 'value' = 'cat')
GROUP BY i.id, i.data
HAVING COUNT(*) >= 2; -- C;
A: I "flatten" the first and second level (second level is in 'fields' key)
B1, B2: These are my search criteria
C: I make sure the fields have all the criteria matching. If 3 criteria --> COUNT(*) >=3
It really doesn't look clean to me. It is working for dev purpose but I think there is a better way to do it.
If somebody have an idea Big thanks to him/her!

BigQuery: Select entire repeated field with group

I'm using Legacy SQL, but am not strictly limited to it (though it does have some functions I find useful, HASH for example).
Anyhow, the simple task is that I want to group by one top level column, while still keeping the first instance of a nested+repeated set of data alongside.
So, the following "works", and produces nested output:
SELECT
  cd,
  subarray.*
FROM [magicalfairy.land]
Now I attempt to just grab the entire first subarray (honestly, I don't expect this to work, of course). The following is what doesn't work:
SELECT
  cd,
  FIRST(subarray.*)
FROM [magicalfairy.land]
GROUP BY cd
Any alternate approaches would be appreciated.
Edit, an example of the data behaviour.
If the input data were roughly:
[
  {
    "cd": "something",
    "subarray": [
      { "hello": 1, "world": 1 },
      { "hello": 2, "world": 2 }
    ]
  },
  {
    "cd": "something",
    "subarray": [
      { "hello": 1, "world": 1 },
      { "hello": 2, "world": 2 }
    ]
  }
]
I would expect to get out:
[
  {
    "cd": "something",
    "subarray": [
      { "hello": 1, "world": 1 },
      { "hello": 2, "world": 2 }
    ]
  }
]
You'll have a much better time preserving the structure using standard SQL, e.g.:
WITH T AS (
  SELECT
    cd,
    ARRAY<STRUCT<x INT64, y BOOL>>[
      STRUCT(off, MOD(off, 2) = 0),
      STRUCT(off - 1, false)] AS subarray
  FROM UNNEST([1, 2, 1, 2]) AS cd WITH OFFSET off)
SELECT
  cd,
  ANY_VALUE(subarray) AS subarray
FROM T
GROUP BY cd;
ANY_VALUE will return some value of subarray for each group. If you wanted to concatenate the arrays instead, you could use ARRAY_CONCAT_AGG.
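Reusing the same T as above, a sketch of that concatenating variant:
SELECT
  cd,
  ARRAY_CONCAT_AGG(subarray) AS subarray
FROM T
GROUP BY cd;
This returns one row per cd with all of the group's arrays appended end to end, rather than just one of them.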
To run this against your table, try the below:
SELECT
  cd,
  ANY_VALUE(subarray) AS subarray
FROM `magicalfairy.land`
GROUP BY cd
Try the below (BigQuery Standard SQL):
SELECT cd, subarray
FROM (
  SELECT cd, subarray,
         ROW_NUMBER() OVER(PARTITION BY cd) AS num
  FROM `magicalfairy.land`
)
WHERE num = 1
This gives you the expected result, the equivalent of "ANY ARRAY".
The solution can be extended to "FIRST ARRAY" by adding ORDER BY sort_col to the OVER() clause, assuming that sort_col defines the logical order; see the sketch below.
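A sketch of that "FIRST ARRAY" variant, with sort_col standing in for whatever column defines the order:
SELECT cd, subarray
FROM (
  SELECT cd, subarray,
         ROW_NUMBER() OVER(PARTITION BY cd ORDER BY sort_col) AS num
  FROM `magicalfairy.land`
)
WHERE num = 1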