Return JSON array with Postgres functions (SQL)

I'm new to Postgres functions.
I'm trying to return part of a JSON response that looks like this:
"ids":[
"9f076580-b5f5-4e73-af08-54d5fc4b87c0",
"bd34cfad-53c7-4443-bf48-280e34d76881"
]
These ids are stored in the unit table; I fetch them in a subquery and then transform them into JSON with the following query:
SELECT coalesce(json_agg(row_to_json(wgc)), '[]'::json)
FROM (
    SELECT
        (
            SELECT COALESCE(json_agg(row_to_json(ids)), '[]'::json)
            FROM (SELECT json_agg(l.ids) AS "id"
                  FROM unit l
                 ) AS ids
        ) AS "ids",
        ......
    FROM companies c
) AS wgc;
The problem is that this query gives me an extra object that I want to omit:
"ids":[
{
"id":[
"9f076580-b5f5-4e73-af08-54d5fc4b87c0",
"bd34cfad-53c7-4443-bf48-280e34d76881"
]
}
]
How can I omit this "id" object?

It's a bit hard to tell what your table looks like, but something like this should work:
select jsonb_build_object('ids', coalesce(jsonb_agg(id), '[]'::jsonb))
from unit
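For a unit table whose id column holds the two UUIDs from the question, this returns a single row with (a sketch of the expected output; the column name id is an assumption):
{"ids": ["9f076580-b5f5-4e73-af08-54d5fc4b87c0", "bd34cfad-53c7-4443-bf48-280e34d76881"]}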
I think you are overcomplicating things. You only need a single nesting level to get the IDs as an array. There is no need to use row_to_json on the array of IDs. The outer row_to_json() will properly take care of that.
SELECT coalesce(json_agg(row_to_json(wgc)), '[]'::json)
FROM (
    SELECT (SELECT json_agg(l.ids) FROM unit l) AS ids
    ....
    FROM companies c
) AS wgc;
The fact that the select ... from unit is not a correlated subquery is a bit suspicious, though. It means you will get the same array for every row in the companies table. I would have expected something like (select .. from unit u where u.??? = c.???) as ids
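For example, if unit were linked to companies through a hypothetical company_id column, the correlated version might look like this (a sketch; the join column is an assumption):
SELECT coalesce(json_agg(row_to_json(wgc)), '[]'::json)
FROM (
    SELECT (SELECT json_agg(u.ids)
            FROM unit u
            WHERE u.company_id = c.id  -- hypothetical join column
           ) AS ids
    FROM companies c
) AS wgc;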

I don't fully understand your question. This code:
SELECT (
    SELECT COALESCE(json_agg(row_to_json(ids)), '[]'::json)
    FROM (SELECT json_agg(l.ids) AS "id"
          FROM unit l
         ) AS ids
) AS "ids"
Returns:
[{"id":["9f076580-b5f5-4e73-af08-54d5fc4b87c0", "bd34cfad-53c7-4443-bf48-280e34d76881as"]}]
which seems to be what you want.
Here is a db<>fiddle.
Something else in your query is returning a JSON object that has ids as a field; that is where the extra nesting comes from, and that is the part you need to change to construct the object you want.


Select rows according to another table with a comma-separated list of items

I have a table test.
select b from test
b is a text column and contains Apartment,Residential
The other table is a parcel table with a classification column. I'd like to use test.b to select the right classifications in the parcels table.
select * from classi where classification in(select b from test)
this returns no rows
select * from classi where classification =any(select '{'||b||'}' from test)
same story with this one
I could write a function to loop through the b column, but I'm trying to find an easier solution.
Test case:
create table classi as
select 'Residential'::text as classification
union
select 'Apartment'::text as classification
union
select 'Commercial'::text as classification;
create table test as
select 'Apartment,Residential'::text as b;
You don't actually need to unnest the array:
SELECT c.*
FROM classi c
JOIN test t ON c.classification = ANY (string_to_array(t.b, ','));
db<>fiddle here
The problem is that = ANY takes a set or an array, and IN takes a set or a list, and your ambiguous attempts resulted in Postgres picking the wrong variant. My formulation makes Postgres expect an array as it should.
For a detailed explanation see:
How to match elements in an array of composite type?
IN vs ANY operator in PostgreSQL
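To make the difference concrete, a minimal illustration:
select 'Apartment' in ('Apartment,Residential');                           -- false: compares against the whole string
select 'Apartment' = any (string_to_array('Apartment,Residential', ','));  -- true: compares against array elements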
Note that my query also works for multiple rows in table test. Your demo only shows a single row, which is a corner case for a table ...
But also note that multiple rows in test may produce (additional) duplicates. You'd have to fold duplicates or switch to a different query style to de-duplicate. Like:
SELECT c.*
FROM classi c
WHERE EXISTS (
SELECT FROM test t
WHERE c.classification = ANY (string_to_array(t.b, ','))
);
This prevents duplicates arising from elements within a single test.b, as well as from across multiple rows of test. EXISTS returns each row from classi at most once, by definition.
The most efficient query style depends on the complete picture.
You need to first split b into an array and then get the rows. A couple of alternatives:
select * from nj.parcels p where classification = any(select unnest(string_to_array(b, ',')) from test)
select p.*
from nj.parcels p
inner join (select unnest(string_to_array(b, ',')) from test) t(classification)
    on t.classification = p.classification;
Essential to both is the unnest surrounding string_to_array.
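To see what those building blocks do on their own (a quick sketch):
select string_to_array('Apartment,Residential', ',');
-- {Apartment,Residential}    (one row holding one array)
select unnest(string_to_array('Apartment,Residential', ','));
-- Apartment
-- Residential                (one row per element)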

How to select a column whose name is a value in another column in PostgreSQL?

I know this isn't valid SQL, but I'd like to do something like:
SELECT items.{SELECT items.preferred_column}
To elaborate, I could write a long CASE expression to achieve this:
SELECT
    CASE WHEN items.preferred_column = 'column_a' THEN items.column_a
         WHEN items.preferred_column = 'column_b' THEN items.column_b
         WHEN items.preferred_column = 'column_c' THEN items.column_c
         ... and so on...
    END
But that seems wrong. I would prefer to write a query that looks at the value of items.preferred_column and loads that column.
Is this possible?
My use case involves an Active Record (the Rails ORM) query, which limits me; I'm not able to use INTO, for example.
Doing this without creating a SQL function would be preferred, though if it's not possible without creating one, that would be good to know.
Thanks in advance for lending your expertise!
You can try transforming the table rows with row_to_json() and then, using jsonb_each(), joining the resultant "key" field on preferred_column:
WITH CTE AS (
    SELECT row_to_json(Z.*)::jsonb AS rcr,
           row_number() over (partition by null order by <whatever comparator clause>) AS rn,
           Z.*
    FROM items Z)
SELECT b.value, a.*
FROM CTE a, jsonb_each(a.rcr) b, CTE c
WHERE c.rn = a.rn AND b.key = c.preferred_column
Note that this essentially operates as a quasi-pivot, so you'll need to maintain an index (the row_number invocation) to self-join the table when extracting the appropriate key-value pairs from jsonb_each's set-return. Casting to jsonb will be helpful in that the binary form will alphabetize the key-value pairs by key order within the object itself.
If you need to get the resultant value as a text string instead of a json primitive, you can do
b.value #>>'{}'
instead of using jsonb_each_text(), which will preserve any json columns.
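A simpler variant of the same idea, if you only need the one value per row, is to convert each row to jsonb and index into it directly (a minimal sketch, assuming preferred_column always holds a valid column name of items):
SELECT to_jsonb(i) ->> i.preferred_column AS preferred_value
FROM items i;
The ->> operator returns the value as text; use -> instead if you want to keep the json representation.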

Unnest string array and transpose in BigQuery

I'm using BigQuery. I have a table A with a string array that I need to cast to INT64/STRING (if possible) so I can join it with table B, which is INT64/STRING.
The main ask here is:
I have a table A where string arrays are mapped to Ref IDs, as below:
I'm trying to unnest them, and my desired output should be as below.
I tried the script below:
SELECT a0_string_array,
       ref_id
FROM TableA AS t,
     t.string_array AS a0_string_array
But the challenge with the above script is that I have close to 1000 Ref IDs, yet the output contains only 100.
If I try the below, I'm able to get all 1000 rows.
SELECT string_array,
ref_id
FROM TableA
The end goal is to unnest and cast to INT64/STRING. The above script is not working for my needs. Can someone help with this?
You can use CROSS JOIN + UNNEST() to get the values from the array attributed to each ref_id:
select
ref_id,
unnested_numbers
from tablea
cross join unnest(string_array) as unnested_numbers
order by 2, 1
This should give you the desired output that you specified.
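Since the original ask was also to cast the unnested values, SAFE_CAST can be layered on top (a sketch; SAFE_CAST returns NULL for any element that doesn't parse as INT64):
select
  ref_id,
  safe_cast(unnested_numbers as int64) as unnested_int64
from tablea
cross join unnest(string_array) as unnested_numbers
order by 2, 1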

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awful long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could do here to make this faster. I tried a CROSS JOIN and got "resources exceeded" fairly quickly. I've also tried using EXISTS, but I can't reference record.name inside that subquery's WHERE clause without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The above completed in 375 seconds, while the original query was still running at 2,740 seconds and kept running, so I did not even wait for it to complete.
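Presumably the speedup comes from the word-level equi-join: both sides are split into words with REGEXP_EXTRACT_ALL(name, r'\w+'), the JOIN ... USING(item) then matches only records and fragments that share at least one whole word, and the expensive LIKE runs only on those candidate pairs rather than on the full cross product. A quick illustration of the splitting step:
SELECT item
FROM UNNEST(REGEXP_EXTRACT_ALL('mary had a little lamb', r'\w+')) AS item
-- returns one row per word: mary, had, a, little, lamb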
Mikhail's answer appears to be faster - but let's have one that doesn't need to SPLIT nor separate the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take that resulting string, and use it in a REGEX ignoring case:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
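If you'd rather not paste the generated string in by hand, the two steps can be combined by aggregating the pattern in a CTE and feeding it to REGEXP_CONTAINS via the same scalar subquery trick (a sketch along the same lines):
#standardSQL
WITH record AS (
  SELECT text AS name
  FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
  SELECT DISTINCT LOWER(name) AS name
  FROM `bigquery-public-data.usa_names.usa_1910_current`
), largestring AS (
  SELECT FORMAT('(?i)(%s)', STRING_AGG(name, '|')) AS pattern
  FROM fragment
)
SELECT record.*
FROM record
WHERE REGEXP_CONTAINS(record.name, (SELECT pattern FROM largestring))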
As alluded to in my question, I worked on a version using a JavaScript UDF which solves this, albeit more slowly than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is small enough to be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table, grouped by fragment length ... a cheap way of preventing an over-sized array, which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is because in my actual data there are more fields I want besides just the string being matched. Earlier on in my actual data I create a table of DISTINCT strings which then later need to be re-joined.
Voila! Not the most elegant but it gets the job done.

Array field in postgres, need to do self-join with results

I have a table that looks like this:
stuff
id integer
content text
score double
children[] (an array of IDs from this same table)
I'd like to run a query that selects all the children for a given id, and then right away gets the full row for all these children, sorted by score.
Any suggestions on the best way to do this? I've looked into WITH RECURSIVE but I'm not sure that's workable. Tried posting at postgresql SE with no luck.
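For reference, a minimal version of such a table might look like this (an assumption based on the description above):
CREATE TABLE stuff (
    id       integer PRIMARY KEY,
    content  text,
    score    double precision,
    children integer[]  -- ids of child rows in this same table
);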
The following query will find all rows corresponding to the children of the object with id 14:
SELECT *
FROM unnest((SELECT children FROM stuff WHERE id=14)) t(id)
JOIN stuff USING (id)
ORDER BY score;
This works by first finding the children of 14 as an array, then converting it into a table using the unnest function, and finally joining with stuff to find all rows with the given ids.
The ANY construct in the join condition would be simplest:
SELECT c.*
FROM stuff p
JOIN stuff c ON id = ANY (p.children)
WHERE p.id = 14
ORDER BY c.score;
It doesn't matter for the query whether the array of children IDs is in the same table or a different one. You just need table aliases here to be unambiguous.
Related:
Check if value exists in Postgres array
Similar solution:
With Postgres you can use a recursive common table expression:
with recursive rel_tree as (
select rel_id, rel_name, rel_parent, 1 as level, array[rel_id] as path_info
from relations
where rel_parent is null
union all
select c.rel_id, rpad(' ', p.level * 2) || c.rel_name, c.rel_parent, p.level + 1, p.path_info||c.rel_id
from relations c
join rel_tree p on c.rel_parent = p.rel_id
)
select rel_id, rel_name
from rel_tree
order by path_info;
Ref: Postgresql query for getting n-level parent-child relation stored in a single table