How to group by duplicate value of nested array in Postgresql? - sql

Previous question: How to group by duplicate value and nested the array Postgresql
Using this query:
SELECT json_build_object(
'nama_perusahaan',"a"."nama_perusahaan",
'proyek', json_agg(
json_build_object(
'no_izin',"b"."no_izin",
'kode',c.kode,
'judul_kode',d.judul
)
)
)
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "a"."nama_perusahaan"
The result is shown below:
{
"nama_perusahaan" : "JASA FERRIE",
"proyek" :
[{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode" : "14302",
"judul_kode" : "IND"
},
{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode" : "13121",
"judul_kode" : "IND B"
}]
}
As you can see, the proyek entries are nested under the company, so duplicate companies are grouped. Now I have to group entries with the same no_izin as well, so that the result is a doubly nested array like the expected result below.
{
"nama_perusahaan" : "JASA FERRIE",
"proyek" :
[{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode_list":[
{
"kode" : "14302",
"judul_kode" : "IND"
},
{
"kode" : "13121",
"judul_kode" : "IND B"
}]
}]
}
I tried to use this query:
SELECT json_build_object(
'nama_perusahaan',"a"."nama_perusahaan",
'proyek', json_agg(
json_build_object(
'no_izin',"b"."no_izin",
'kode_list',json_agg(
json_build_object(
'kode',c.kode,
'judul_kode',d.judul
)
)
)
)
)
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "a"."nama_perusahaan", b.no_izin
but it didn't work. It gives ERROR: aggregate function calls cannot be nested LINE 6:'kode_list',json_agg(.
What is wrong with my code?

Disclaimer: It is very hard to construct a query without knowing the input data and table structure, and while handling a language we don't know. Please keep future questions minimal (for this question, for example, it is not relevant that you need to join some tables before converting the result into JSON output), write examples in English (foreign-language identifiers make the code confusing and lead to spelling errors, so an otherwise correct idea can fail on a misspelled word), and add the input data. This would help you as well: you would get an answer faster and the chance of mistakes in the code would be much lower, because without the data we cannot create a runnable example to check our ideas.
A nested JSON structure has to be built from the innermost nested object to the outermost one, because aggregate calls cannot be nested within a single query level. So first build the no_izin object, including its kode_list array, in a subquery. The outer query then aggregates these objects into the proyek array:
SELECT
json_build_object(
'nama_perusahaan',"s"."nama_perusahaan",
'proyek', json_agg(no_izin)
)
FROM (
SELECT
"a"."nama_perusahaan",
json_build_object(
'no_izin',
"b"."no_izin",
'kode_list',
json_agg(
json_build_object(
'kode',c.kode,
'judul_kode',d.judul
)
)
) AS no_izin
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "a"."nama_perusahaan", "b"."no_izin"
) AS s
GROUP BY "s"."nama_perusahaan"
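The inside-out grouping described above can be sketched outside the database as well. The following is only an illustration in Python, not the answer's SQL: the flat rows are hypothetical stand-ins for what the four-table join might return, and the helper name nest is made up for this sketch. The inner loop plays the role of the subquery (one kode_list per company and no_izin), the outer loop the role of the outer query (one proyek list per company).

```python
import json
from collections import defaultdict

# Hypothetical flat rows, as the join might return them:
# (nama_perusahaan, no_izin, kode, judul)
rows = [
    ("JASA FERRIE", "26A/E/IU/PMA/D8FD", "14302", "IND"),
    ("JASA FERRIE", "26A/E/IU/PMA/D8FD", "13121", "IND B"),
]


def nest(flat_rows):
    # Inner level: collect one kode_list per (company, permit),
    # comparable to the subquery's aggregation.
    kode_lists = defaultdict(list)
    for company, no_izin, kode, judul in flat_rows:
        kode_lists[(company, no_izin)].append(
            {"kode": kode, "judul_kode": judul}
        )

    # Outer level: collect one proyek list per company,
    # comparable to the outer query's aggregation.
    proyek = defaultdict(list)
    for (company, no_izin), kode_list in kode_lists.items():
        proyek[company].append({"no_izin": no_izin, "kode_list": kode_list})

    return [
        {"nama_perusahaan": company, "proyek": items}
        for company, items in proyek.items()
    ]


result = nest(rows)
print(json.dumps(result, indent=2))
```

Each level is aggregated in its own pass, which is the same reason the SQL needs a subquery: one pass cannot aggregate over the result of another aggregation at the same level.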

Related

PostgreSQL Cannot order by json_build_object result (got from subquery)

I have a SQL query and I'd like to order by a JSON field:
SELECT "ReviewPacksModel"."id",
(SELECT json_build_object(
'totalIssues', COUNT(*),
'openIssues', COUNT(*) filter (where "issues".status = 'Open'),
'fixedIssues', COUNT(*) filter (where "issues".status = 'Fixed')
)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) as "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
ORDER BY "issueStatistic"->'totalIssues'
LIMIT 50;
And I get an error:
ERROR: column "issueStatistic" does not exist
If I try to order by issueStatistic without ->'totalIssues', I will get another error:
ERROR: could not identify an equality operator for type json
It seems like I cannot extract a field from the JSON.
I also tested it with this query:
SELECT "ReviewPacksModel".*,
(SELECT Count(*)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) AS "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
ORDER BY "issueStatistic"
LIMIT 50;
And it works without any problems. But I cannot use it because it's not possible to return multiple columns from a subquery. I also tried alternatives like array_agg, json_agg, etc., but they didn't help.
I know that it's possible to run multiple queries, but they aren't very fast, so I'd prefer to use json_build_object.
You can use aliases in ORDER BY, but you cannot use expressions involving aliases.
You'll have to use a subquery.
Also, you cannot order on a json. You'll have to convert it to a sortable data type. In the following I assume it is a number; you'll have to adapt the query if my assumption is wrong.
SELECT id, "issueStatistic"
FROM (SELECT "ReviewPacksModel"."id",
(SELECT json_build_object(
'totalIssues', COUNT(*),
'openIssues', COUNT(*) filter (where "issues".status = 'Open'),
'fixedIssues', COUNT(*) filter (where "issues".status = 'Fixed')
)
FROM "development"."issues" "issues"
JOIN "development"."reviewTasks" as "rt" ON "issues"."reviewTaskId" = "rt".id
WHERE "issues"."isDeleted" = false
AND "rt"."reviewPackId" = "ReviewPacksModel"."id"
) as "issueStatistic"
FROM "development"."reviewPacks" AS "ReviewPacksModel"
WHERE "ReviewPacksModel"."projectId" = '2'
AND "ReviewPacksModel"."mode" IN ('Default', 'Live')
AND "ReviewPacksModel"."status" IN ('Draft', 'Active')
) AS subq
ORDER BY CAST ("issueStatistic"->>'totalIssues' AS bigint)
LIMIT 50;
You cannot order by type json because, simply put, there is no definition of how to compare the different types a JSON object can contain. But this expression gives you a value of type json:
"issueStatistic"->'totalIssues'
However, type jsonb can be ordered. So, instead of creating a type json object, you should use jsonb_build_object() to create a type jsonb object.
Alternatively, you could cast your expression to type int. Note the ->> operator instead of your ->: it returns the value as type text, which can be cast directly to type int:
("issueStatistic"->>'totalIssues')::int
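Why the cast matters, and not just the ->> operator, can be shown with a small sketch. This is Python rather than PostgreSQL, so it is only an analogy: Python's string comparison stands in for ordering by text, and the sample documents and helper names are hypothetical. Ordering by the extracted text puts "10" before "9", while ordering by the value converted to an integer gives the numeric order.

```python
import json

# Hypothetical issueStatistic values, one JSON document per row.
stats = [
    '{"totalIssues": 9}',
    '{"totalIssues": 10}',
    '{"totalIssues": 100}',
]


def total_as_text(doc):
    # Comparable to extracting the field as text.
    return str(json.loads(doc)["totalIssues"])


def total_as_int(doc):
    # Comparable to casting the extracted text to an integer.
    return int(total_as_text(doc))


by_text = sorted(stats, key=total_as_text)
by_int = sorted(stats, key=total_as_int)

print([total_as_text(d) for d in by_text])  # text order: 10, 100, 9
print([total_as_int(d) for d in by_int])    # numeric order: 9, 10, 100
```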
Edit:
As Laurenz mentioned correctly, to use aliases you need a separate subquery:
SELECT
*
FROM (
-- <your query minus ORDER clause>
) s
ORDER BY "issueStatistic"->'totalIssues'

Jpa Specification ORA-01791: not a SELECTed expression

I have a problem with Jpa Specification.
My specification looks like this:
public static Specification<PersonView> getByFilter(PersonViewFilter filter) {
return (root, query, criteriaBuilder) -> {
List<Predicate> predicates = new ArrayList<>();
SetJoin<PersonView, PersonDetailsView> personDetailsJoin =
root.join(PersonView_.personDetails);
Path<String> namePath = personDetailsJoin.get(PersonDetailsView_.name);
Path<String> surnamePath = personDetailsJoin.get(PersonDetailsView_.surname);
predicates.add(namePredicate(personDetailsJoin, criteriaBuilder, filter.getName()));
predicates.add(surnamePredicate(personDetailsJoin, criteriaBuilder, filter.getSurname()));
predicates.add(peselPredicate(personDetailsJoin, criteriaBuilder, filter.getPesel()));
predicates.removeAll(Collections.singleton(EMPTY_PREDICATE));
query.orderBy(List.of(criteriaBuilder.asc(namePath), criteriaBuilder.asc(surnamePath))).distinct(true);
return criteriaBuilder.and(predicates.toArray(new Predicate[predicates.size()]));
};
}
When I try to search, I get this error:
ORA-01791: not a SELECTed expression
The problem is related to the SQL generated by JpaSpecificationExecutor.
It looks like this:
select
*
from
( select distinct pv.person_id
from
personview pv
inner join
persondetailsview pdv on pv.person_id=pdv.person_id
where
1=1
order by
pdv.name asc,
pdv.surname asc )
where
rownum <= ?
The name and surname should be added to the SELECT clause; then it would work, but I don't know how to do this in a Specification. Without distinct the query works, but I need distinct to avoid duplicates.
Please help.
The error ORA-01791: not a SELECTed expression means that, with DISTINCT, every ORDER BY expression must also appear in the SELECT list. Either add the fields to the select, or remove the distinct with this:
query.distinct(false);
If you want to keep the distinct, have a look here:
https://stackoverflow.com/a/53549880/2641426

Asking for help on correct way to use SQL with CTE to create JSON_OBJECT

The requested JSON needs to be in this form:
{
"header": {
"InstanceName": "US"
},
"erpReferenceData": {
"erpReferences": [
{
"ServiceID": "fb16e421-792b-4e9c-935b-3cea04a84507",
"ERPReferenceID": "J0000755"
},
{
"ServiceID": "7d13d907-0932-44c0-ad81-600c9b97b6e5",
"ERPReferenceID": "J0000756"
}
]
}
}
The program that I created looks like this:
dcl-s OutFile sqltype(dbclob_file);
exec sql
With x as (
select json_object(
'InstanceName' : trim(Cntry) ) objHeader
from xmlhdr
where cntry = 'US'),
y as (
select json_object(
'ServiceID' VALUE S.ServiceID,
'ERPReferenceID' VALUE I.RefCod) oOjRef
FROM IMH I
INNER JOIN GUIDS G ON G.REFCOD = I.REFCOD
INNER JOIN SERV S ON S.GUID = G.GUID
WHERE G.XMLTYPE = 'Service')
VALUES (
select json_object('header' : objHeader Format json ,
'erpReferenceData' : json_object(
'erpReferences' VALUE
JSON_ARRAYAGG(
ObjRef Format json)))
from x
LEFT OUTER JOIN y ON 1=1
Group by objHeader)
INTO :OutFile;
This is the compile error I get:
SQL0122: Position 41 Column OBJHEADER or expression in SELECT list not valid.
Is this the correct way to write this SQL statement, or is there a better, easier way? Any idea how to rewrite the SQL statement so that it works correctly?
The key to generating JSON, or XML for that matter, is to start from the inside and work your way out.
(I've simplified the raw data into just a test table...)
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
select * from elm;
Now add the next layer as a CTE that builds on the first CTE:
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
select * from arr;
And the next layer...
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
, erpReferences (refs) as ( select json_object
('erpReferences' value arrDta )
from arr)
select *
from erpReferences;
The nice thing about building with CTEs is that at each step you can see the results so far.
You can always go back and put a Select * from CTE; in the middle to see what you have at that point.
Note that I'm building this in Run SQL Scripts. Once you have the statement complete, you can embed it in your RPG program.
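The same inside-out layering, carried through to the outermost object the question asks for, can be sketched as follows. This is a Python illustration only, not Db2 SQL; the rows are taken from the requested JSON in the question and stand in for the result of the IMH/GUIDS/SERV join, and the variable names are made up for this sketch.

```python
import json

# Hypothetical rows standing in for the joined tables:
# (ServiceID, RefCod)
rows = [
    ("fb16e421-792b-4e9c-935b-3cea04a84507", "J0000755"),
    ("7d13d907-0932-44c0-ad81-600c9b97b6e5", "J0000756"),
]

# Layer 1 (like the elm CTE): one object per row.
elements = [
    {"ServiceID": service_id, "ERPReferenceID": ref_cod}
    for service_id, ref_cod in rows
]

# Layers 2 and 3 (like arr and erpReferences): the list is the array,
# wrapped in an object under the erpReferences key.
erp_reference_data = {"erpReferences": elements}

# Outermost layer: add the header next to the reference data.
document = {
    "header": {"InstanceName": "US"},
    "erpReferenceData": erp_reference_data,
}

print(json.dumps(document, indent=2))
```

Because every layer is a named intermediate value, each one can be printed on its own, which mirrors the point about inspecting a CTE at any step.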

bigquery joins on nested repeated

I am having trouble joining on a repeated nested field while still preserving the original row structure in BigQuery.
For my example I'll call the two tables being joined A and B.
Records in table A look something like:
{
"url":"some url",
"repeated_nested": [
{"key":"some key","property":"some property"}
]
}
and records in table B look something like:
{
"key":"some key",
"property2": "another property"
}
I am hoping to find a way to join this data together to generate a row that looks like:
{
"url":"some url",
"repeated_nested": [
{
"key":"some key",
"property":"some property",
"property2":"another property"
}
]
}
The very first query I tried was:
SELECT
url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM A
AS lefttable
LEFT OUTER JOIN B
AS righttable
ON lefttable.key=righttable.key
This doesn't work because BQ can't join on repeated nested fields. There is not a unique identifier for each row. If I were to do a FLATTEN on repeated_nested then I'm not sure how to get the original row put back together correctly.
The data is such that a url will always have the same repeated_nested field with it. Because of that, I was able to make a workaround using a UDF to sort of roll up this repeated nested object into a JSON string and then unroll it again:
SELECT url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM
JS(
(
SELECT basetable.url as url, repeated_nested
FROM A as basetable
LEFT JOIN (
SELECT url, CONCAT("[", GROUP_CONCAT_UNQUOTED(repeated_nested_json, ","), "]") as repeated_nested
FROM
(
SELECT
url,
CONCAT(
'{"key": "', repeated_nested.key, '",',
' "property": "', repeated_nested.property, '",',
' "property2": "', mapping_table.property2, '"',
'}'
)
) as repeated_nested_json
FROM (
SELECT
url, repeated_nested.key, repeated_nested.property
FROM A
GROUP BY url, repeated_nested.key, repeated_nested.property
) as urltable
LEFT OUTER JOIN [SDF.alchemy_to_ric]
AS mapping_table
ON urltable.repeated_nested.key=mapping_table.key
)
GROUP BY url
) as companytable
ON basetable.url = urltable.url
),
// input columns:
url, repeated_nested_json,
// output schema:
"[{'name': 'url', 'type': 'string'},
{'name': 'repeated_nested_json', 'type': 'RECORD', 'mode':'REPEATED', 'fields':
[ { 'name': 'key', 'type':'string' },
{ 'name': 'property', 'type':'string' },
{ 'name': 'property2', 'type':'string' }]
}]",
// UDF:
"function(row, emit) {
parsed_repeated_nested = [];
try {
if ( row.repeated_nested_json != null ) {
parsed_repeated_nested = JSON.parse(row.repeated_nested_json);
}
} catch (ex) { }
emit({
url: row.url,
repeated_nested: parsed_repeated_nested
});
}"
)
This solution works fine for small tables. But the real life tables I'm working with have many more columns than in my example above. When there are other fields in addition to url and repeated_nested_json they all have to be passed through the UDF. When I work with tables that are around the 50 gb range everything is fine. But when I apply the UDF and query to tables that are 500-1000 gb, I get an Internal Server Error from BQ.
In the end I just need all of the data in newline-delimited JSON format in GCS. As a last-ditch effort I tried concatenating all of the fields into a JSON string (so that I only had 1 column) in the hope that I could export it as CSV and have what I need. However, the export process escaped the double quotes and added double quotes around the JSON string. According to the BQ docs on jobs (https://cloud.google.com/bigquery/docs/reference/v2/jobs) there is a property configuration.query.tableDefinitions.(key).csvOptions.quote that could help me. But I can't figure out how to make it work.
Does anybody have advice on how they have dealt with this sort of situation?
I have never had to do this, but you should be able to use flatten, then join, then use nest to get repeated fields again.
The docs state that BigQuery always flattens query results, but that appears to be false: you can choose to not have results flattened if you set a destination table. You should then be able to export that table as JSON to Storage.
See also this answer for how to get nest to work.
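The flatten, join, nest sequence suggested here can be sketched on the example records from the question. This is plain Python, not BigQuery SQL, and the function names are hypothetical; it only shows the shape of the three steps, assuming the url identifies the original row.

```python
from collections import defaultdict

# Example records from the question.
table_a = [
    {
        "url": "some url",
        "repeated_nested": [
            {"key": "some key", "property": "some property"},
        ],
    },
]
table_b = [
    {"key": "some key", "property2": "another property"},
]


def flatten(records):
    # One output row per element of the repeated field.
    for record in records:
        for item in record["repeated_nested"]:
            yield {"url": record["url"], **item}


def left_join(flat_rows, right):
    # Look up property2 by key; unmatched keys get None.
    lookup = {row["key"]: row["property2"] for row in right}
    for row in flat_rows:
        yield {**row, "property2": lookup.get(row["key"])}


def nest(joined_rows):
    # Group the joined rows back into one record per url.
    grouped = defaultdict(list)
    for row in joined_rows:
        item = {k: v for k, v in row.items() if k != "url"}
        grouped[row["url"]].append(item)
    return [
        {"url": url, "repeated_nested": items}
        for url, items in grouped.items()
    ]


result = nest(left_join(flatten(table_a), table_b))
print(result)
```

Note that in this sketch a record whose repeated_nested list is empty produces no flattened rows and therefore disappears from the result; a real query would need to decide how to handle that case.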
@AndrewBackes - we rolled out some fixes for UDF memory-related issues this week; there are some details on the root cause here https://stackoverflow.com/a/36275462/5265394 and here https://stackoverflow.com/a/35563562/5265394.
The UDF version of your query is now working for me; could you verify on your side?

Entity framework join with a subquery via linq syntax

I'm trying to translate a SQL query into LINQ syntax, but I'm having a lot of trouble.
This is my query in SQL:
select * FROM dbo.ITEM item inner join
(
select SUM([QTA_PRIMARY]) QtaTotale,
TRADE_NUM,
ORDER_NUM,
ITEM_NUM
from [dbo].[LOTTI]
where FLAG_ATTIVO=1
group by [TRADE_NUM],[ORDER_NUM],[ITEM_NUM]
)
TotQtaLottiGroupByToi
on item.TRADE_NUM = TotQtaLottiGroupByToi.TRADE_NUM
and item.ORDER_NUM = TotQtaLottiGroupByToi.ORDER_NUM
and item.ITEM_NUM = TotQtaLottiGroupByToi.ITEM_NUM
where item.PRIMARY_QTA > TotQtaLottiGroupByToi.QtaTotale
and item.FLAG_ATTIVO=1
How can I translate this into LINQ syntax?
This approach doesn't work:
var res= from i in context.ITEM
join d in
(
from l in context.LOTTI
group l by new { l.TRADE_NUM, l.ORDER_NUM, l.ITEM_NUM } into g
select new TotQtaByTOI()
{
TradeNum = g.Key.TRADE_NUM,
OrderNum = g.Key.ORDER_NUM,
ItemNum = g.Key.ITEM_NUM,
QtaTotale = g.Sum(oi => oi.QTA_PRIMARY)
}
)
on new { i.TRADE_NUM, i.ORDER_NUM, i.ITEM_NUM} equals new { d.TradeNum, d.OrderNum, d.ItemNum }
I get this error:
The type of one of the expressions in the join clause is incorrect. Type inference failed in the call to 'Join'
Can you help me with this query?
Thank you!
The problem is the anonymous type comparison. The two anonymous types in the join must have matching property names (and types), in the same order (e.g. first, second, third).
I tried it out, here's an example: http://pastebin.com/hRj0CMzs