Select JSON object that appears more than one time - sql

I am trying to write a query to return all trains that have more than one etapesSupervision.
My table has a column called DETAIL, in this column I can find the JSON of my train.
"nomTrain": "EVOL99",
"compositionCourtLong": "LONG",
"sillons": [{
"numeroTrain": "EVOL99"
}],
"sillonsV4": [{
"refSillon": "sillons/4289505/2"
}],
"branchesStif": [{
"data": "49",
"data": "BP",
"data": "ORIGINE"
} ],
"etapesSupervision": [{
"data": "PR/0087-758896-00",
"data": "PR/0087-758607-BV",
"superviseur": "1287",
"uoSuperviseur": "B"
},
{
"data": "PR/0087-758607-BV",
"data": "PR/0087-001479-BV",
"superviseur": "1287",
"uoSuperviseur": "B"
}],
This is the query I wrote :
select * from course where CODE_LIGNE_COMMERCIALE='B'
--and ref = 'train/2018-11-12'
and instr(count(train.detail,'"etapesSupervision":'))> 1 ;
Using this, I return trains with only one etapesSupervision.
The thing is the column DETAIL is JSON, so I feel like I can't do a lot with it.
I tried also with like, but it doesn't work either.

Thank you for your comments.
This is the query that worked:
select data,data,data
from train
where
length(DETAIL) - length(replace(DETAIL,'uoSuperviseur',null)) > 20 ;
And this way I have only trains that have more than one supervisor.
Thanks again

Related

Add computed field to Query in Grafana using JSON API als data source

What am I trying to achieve:
I would like to have a time series chart showing the total number of members in my club at any time. This member count should be calculated by using the field "Eintrittsdatum" (joining-date) and "Austrittsdatum" (leaving-date). I’m thinking of it as a running sum - every filled field with a joining-date means +1 on the member count, every leaving-date entry is a -1.
Data structure
I’m calling the API of webling.ch with a secret key. This is my data structure with sample data per member:
[
{
"type": "member",
"meta": {
"created": "2020-03-02 11:33:00",
"createuser": {
"label": "Joana Doe",
"type": "user"
},
"lastmodified": "2022-12-06 16:32:56",
"lastmodifieduser": {
"label": "Joana Doe",
"type": "user"
}
},
"readonly": true,
"properties": {
"Mitglieder ID": 99,
"Anrede": "Dear",
"Vorname": "Jon",
"Name": "Doe",
"Strasse": "Doeington Street",
"Adresszusatz": null,
"PLZ": "9999",
"Ort": "Doetown",
"E-Mail": "jon.doe#doenet.net",
"Telefon Privat": null,
"Telefon Geschäft": null,
"Mobile": "099 877 54 54",
"Geschlecht": "m",
"Geburtstag": "1966-03-10",
"Mitgliedschaftstyp": "Aktivmitgliedschaft",
"Eintrittsdatum": "2020-03-01",
"Austrittsdatum": null,
"Passfoto": null,
"Wordpress Benutzername": null,
"Wohnhaft im Glarnerland": false,
"Lat": "43.1563379",
"Long": "6.0474622"
},
"parents": [
240
],
"children": {
},
"links": {
"debitor": [
2124,
3056,
3897
],
"attendee": [
2576
]
},
"id": 1815
}
]
Grafana data source
I am using the “JSON API” by Marcus Olsson: GitHub - grafana/grafana-json-datasource: A data source plugin for loading JSON APIs into Grafana.
Grafana v9.3.1 (89b365f8b1) on Linux
My current approach
Queries:
Query C - uses a filter on the source-API to only show entries with "Eintrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Eintrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Query D - uses a filter on the source-API to only show entries with "Austrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Austrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Here's a screenshot to clarify things
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-1.png)
Transformations:
My applied transformations
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-2.png)
What's working
I can correctly gather the number of members added/subtracted per day.
What's not working
I can't get the graph to display the way i want: I'd like to have a running sum of these numbers instead of the following two graphs.
Time series graph with merged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-3.png)
Time series graph with unmerged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-4.png)
I can't get the names to display within the tooltip of the data points (really not THAT necessary).

Change JSON Keys in Nested JSON in SQL Table

I have a table with column called tableJson which contains information of the following type:
[
{
"type": "TABLE",
"content": {
"rows":
[{"Date": "2021-09-28","Monthly return": "1.44%"},
{"Date": "2021-11-24", "Yearly return": "0.62%"},
{"Date": "2021-12-03", "Monthly return": "8.57%"},
{},
]
}
}
]
I want to change "Monthly Return" to "Weekly Return" everywhere in the table column where it exists.
Thank you in advance!
I tried different approaches to Parse, read, OPENJSON, CROSS APPLY but could not make it.

How to understand the field "groups" and the agg "GROUPING" in EnumerableAggregate

I am new to Calcite and I am using Calcite to convert a SQL query to an optimized plan, where I will translate the plan to a dataflow graph in an execution engine. One challenge is the translation of different RelNodes (e.g., Filter, Project, Aggregate, Calc, etc). I found a difficulty in understanding the EnumerableAggregate RelNode. Specifically, for the following example, where I defined a table T as
create table T (src int, dst int, label int, time int);
and wrote a toy query as
select count(distinct dst), sum(distinct label), count(*)
from T
where dst > 1
group by src
having src = 0;
I will obtain an optimized plan which contains two EnumerableAggregate RelNodes and here is the first EnumerableAggregate RelNode:
{
"id": "2",
"relOp": "org.apache.calcite.adapter.enumerable.EnumerableAggregate",
"group": [ 0, 1, 2 ],
"groups": [
[ 0, 1 ], [ 0, 2 ], [ 0 ]
],
"aggs": [
{
"agg": {
"name": "COUNT",
"kind": "COUNT",
"syntax": "FUNCTION_STAR"
},
"type": {
"type": "BIGINT",
"nullable": false
},
"distinct": false,
"operands": [],
"name": "EXPR$2"
},
{
"agg": {
"name": "GROUPING",
"kind": "GROUPING",
"syntax": "FUNCTION"
},
"type": {
"type": "BIGINT",
"nullable": false
},
"distinct": false,
"operands": [ 0, 1, 2 ],
"name": "$g"
}
]
}
I think I understand the reason why there are two Aggregate RelNodes. The reason is due to the use of distinct on dst in count and the use of distinct on label in sum in the query, where the optimizer wants to first group the data by (1) the group key src and (2) the two distinct columns (dst and label), in order to remove duplications. Then in the second Aggregate we calculate count and sum,
What I do not understand is how the first Aggregate processes the input data, what does the field groups (i.e., [0, 1], [0, 2] and [0]) do, what does the agg function GROUPING do and how many columns are there in the ouput of the first Aggregate.
For example, given the following input data: [[2,3,4,0], [2,3,4,1], [3,2,4,2], [3,2,4,3], [5,6,7,4], [5,6,7,5]], I think the data will be firstly divided into three groups: [[2,3,4,0], [2,3,4,1]] and [[3,2,4,2], [3,2,4,3]] and [[5,6,7,4], [5,6,7,5]]. But what is the next step?
Any help would be appreciated. Thanks!

How to generate JSON array from multiple rows, then return with values of another table

I am trying to build a query which combines rows of one table into a JSON array, I then want that array to be part of the return.
I know how to do a simple query like
SELECT *
FROM public.template
WHERE id=1
And I have worked out how to produce the JSON array that I want
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
However, I cannot work out how to combine the two, so that the result is a number of fields from public.template with the output of the second query being one of the returned fields.
I am using PostGreSQL 9.6.6
Edit, as requested more information, a definition of field and template tables and a sample of each queries output.
Currently, I have a JSONB row on the template table which I am using to store an array of fields, but I want to move fields to their own table so that I can more easily enforce a schema on them.
Template table contains:
id
name
data
organisation_id
But I would like to remove data and replace it with the field table which contains:
id
name
format
data
template_id
At the moment the output of the first query is:
{
"id": 1,
"name": "Test Template",
"data": [
{
"id": "1",
"data": null,
"name": "Assigned User",
"format": "String"
},
{
"id": "2",
"data": null,
"name": "Office",
"format": "String"
},
{
"id": "3",
"data": null,
"name": "Department",
"format": "String"
}
],
"id_organisation": 1
}
This output is what I would like to recreate using one query and both tables. The second query outputs this, but I do not know how to merge it into a single query:
[{
"id": 1,
"name": "Assigned User",
"format": "String",
"data": null
},{
"id": 2,
"name": "Office",
"format": "String",
"data": null
},{
"id": 3,
"name": "Department",
"format": "String",
"data": null
}]
The feature you're looking for is json concatenation. You can do that by using the operator ||. It's available since PostgreSQL 9.5
SELECT to_jsonb(template.*) || jsonb_build_object('data', (SELECT to_jsonb(field) WHERE template_id = templates.id)) FROM template
Sorry for poorly phrasing what I was trying to achieve, after hours of Googling I have worked it out and it was a lot more simple than I thought in my ignorance.
SELECT id, name, data
FROM public.template, (
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
) as data
WHERE id = 1
I wanted the result of the subquery to be a column in the ouput rather than compiling the entire output table as a JSON.

How to perform a SELECT in the results returned from a GROUP BY Druid?

I am having a hard time converting this simple SQL Query below into Druid:
SELECT country, city, Count(*)
FROM people_data
WHERE name="Mary"
GROUP BY country, city;
So I came up with this query so far:
{
"queryType": "groupBy",
"dataSource" : "people_data",
"granularity": "all",
"metric" : "num_of_pages",
"dimensions": ["country", "city"],
"filter" : {
"type" : "and",
"fields" : [
{
"type": "in",
"dimension": "name",
"values": ["Mary"]
},
{
"type" : "javascript",
"dimension" : "email",
"function" : "function(value) { return (value.length !== 0) }"
}
]
},
"aggregations": [
{ "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
],
"intervals": [ "2016-07-20/2016-07-21" ]
}
The query above runs but it doesn't seem like groupBy in the Druid datasource is even being evaluated since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?
Simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking even SQL query does not make sense. If for a given combination of country, city there are many different values of name and street, then how do you squeeze that into a single row? You have to aggregate them, e.g. by using max function.
In this case you can include the same column in your data as both dimension and metric, e.g. name_dim and name_metric, and include corresponding aggregation over your metric, max(name_metric).
Please note, that if these columns, name etc, have high granularity values, then that will kill Druid's roll-up feature.