Here is a lateral subquery that is part of a bigger query:
lateral (
    select
        array_agg(sh.dogsfilters) filter (where sh.dogsfilters is not null) as dependencyOfFoods
    from
        shelter sh
    where
        sh.shelterid = ${shelterid}
) filtersOfAnimals,
The problem is with array_agg: it fails when the aggregated arrays have different lengths, like this ("[[7, 9], [7, 9, 8], [8]]")!
The problem is easy to solve using json_agg, but later in the query there is an any check like this:
...
where
cd.dogsid = any(filtersOfAnimals.dependencyOfFoods)
and
...
...
But any will not work on JSON data prepared with json_agg, so I can't use it in place of array_agg!
What might be a better solution to this?
Unnest the arrays and re-aggregate:
lateral (
    select array_agg(dogfilter) filter (where dogfilter is not null) as dependencyOfFoods
    from shelter sh
    cross join unnest(sh.dogsfilters) dogfilter
    where sh.shelterid = ${shelterid}
) filtersOfAnimals,
It is interesting that Postgres doesn't have a function that does this. BigQuery offers array_concat_agg() which does exactly what you want.
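For comparison, the BigQuery version would look roughly like this (note this is BigQuery, not PostgreSQL; BigQuery has no arrays of arrays, so the nested arrays are wrapped in structs in this sketch):

```
-- BigQuery: concatenate the element arrays into one flat array
SELECT ARRAY_CONCAT_AGG(v.arr) AS flat
FROM UNNEST([STRUCT([7, 9] AS arr), STRUCT([7, 9, 8] AS arr), STRUCT([8] AS arr)]) AS v;
-- flat = [7, 9, 7, 9, 8, 8]
```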
It is ugly, but it works:
regexp_split_to_array(
array_to_string(
array_agg(
array_to_string(value,',')
),','
),',')::integer[]
I don't know whether this is a valid solution from the performance point of view, though ...
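For reference, here is a self-contained version of the same expression, assuming integer arrays as in the question (the values in the `from` clause are made up for illustration):

```
-- flatten the arrays by serializing them to text and splitting again
select regexp_split_to_array(
           array_to_string(
               array_agg(array_to_string(value, ',')),
               ','),
           ',')::integer[] as flat
from (values (array[7,9]), (array[7,9,8]), (array[8])) t(value);
-- flat = {7,9,7,9,8,8}
```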
In PostgreSQL, you can define your own aggregates. I think that this one does what you want:
create function array_concat_agg_tran(anyarray, anyarray) returns anyarray
language sql as $$ select $1 || $2 $$;

create aggregate array_concat_agg(anyarray) (
    sfunc = array_concat_agg_tran,
    stype = anyarray
);
Then:
select array_concat_agg(x) from (values (ARRAY[1,2]),(ARRAY[3,4,5])) f(x);
array_concat_agg
------------------
{1,2,3,4,5}
With a bit more work, you could make it parallelizable as well.
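A sketch of that parallel-safe variant (assuming PostgreSQL 9.6+): since the transition function just concatenates two arrays, it can double as the combine function that merges partial states from parallel workers:

```
create or replace function array_concat_agg_tran(anyarray, anyarray)
    returns anyarray
    language sql immutable parallel safe
as $$ select $1 || $2 $$;

create aggregate array_concat_agg(anyarray) (
    sfunc       = array_concat_agg_tran,
    stype       = anyarray,
    combinefunc = array_concat_agg_tran,  -- merges partial arrays from workers
    parallel    = safe
);
```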
Everyone,
I am new to Trino, and I can find no function in Trino like regexp_split_to_table() in Greenplum or PostgreSQL. How can I achieve the same thing?
select regexp_split_to_table( sensor_type, E',+' ) as type from hydrology.device_info;
There is a regexp_split(string, pattern) function that returns an array; you can unnest it.
Demo:
select s.str as original_str, u.str as exploded_value
from
(select 'one,two,,,three' as str)s
cross join unnest(regexp_split(s.str,',+')) as u(str)
Result:
original_str exploded_value
one,two,,,three one
one,two,,,three two
one,two,,,three three
How can I convert XML SQL code to JSON SQL code?
Example:
SELECT XMLELEMENT(NAME "ORDER", XMLFOREST(PURCHASE_ORDER AS OD_NO)) AS "XMLELEMENT" FROM
TBL_SALES
Now, how can I convert this XMLELEMENT & XMLFOREST usage into JSON functions? Please help me. Are there equivalents of XMLELEMENT/XMLFOREST among the JSON functions?
xml:
<order><OD_NO>4524286167</OD_NO><order_date>2020-06-15</order_date><sales_office>CH</sales_office></order>
json:
{ "OD_NO": "4524286167", "order_date": "2020-06-15", "sales_office": "CH" }
Here row_to_json will do the trick.
You can write your query like below:
select row_to_json(x) from
(select purchase_order "OD_NO", order_date, sales_office from tbl_sales ) x
If you want to aggregate all the results into a single JSON array, use json_agg with row_to_json:
select json_agg(row_to_json(x)) from
(select purchase_order "OD_NO", order_date, sales_office from tbl_sales ) x
These Postgresql functions
json_build_object(VARIADIC "any") and
jsonb_build_object(VARIADIC "any")
are semantically close to XMLELEMENT and very convenient for building whatever complex JSON you may need. Your query might look like this:
select json_build_object
(
'OD_NO', order_number, -- or whatever the name of the column is
'order_date', order_date,
'sales_office', sales_office
) as json_order
from tbl_sales;
I do not think there is an XMLFOREST equivalent, however.
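That said, to_jsonb applied to a row value comes close in spirit to XMLFOREST, producing one key per column. A sketch reusing the tbl_sales columns from the question:

```
-- one JSON key per column of the subquery row
select to_jsonb(t) as json_order
from (
    select purchase_order as "OD_NO", order_date, sales_office
    from tbl_sales
) t;
```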
I'm writing a unit test for a method with the following sql
WITH temptab(
i__id , i__name, i__effective, i__expires, i__lefttag, i__righttag,
hier_id, hier_dim_id, parent_item_id, parent_hier_id, parent_dim_id,
ancestor, h__id, h__name, h__level, h__effective, h__expires, rec_lvl)
AS (
SELECT
item.id as i__id,
item.name as i__name,
item.effectivets as i__effective,
item.expirests as i__expires,
item.lefttag as i__lefttag,
item.righttag as i__righttag,
hier_id, hier_dim_id,
parent_item_id,
parent_hier_id,
parent_dim_id, 1 as ancestor,
hier.id as h__id, hier.name as h__name,
hier.level as h__level, hier.effectivets as h__effective,
hier.expirests as h__expires, 1 as rec_lvl
FROM metro.item item, metro.hierarchy hier
WHERE item.id = 'DI' AND hier_id = '69' AND hier_dim_id = '36' AND hier.id = item.hier_id
)
SELECT
i__id, i__name, i__effective, i__expires, i__lefttag,
i__righttag, hier_id, hier_dim_id, parent_item_id,
parent_hier_id, parent_dim_id, ancestor,
h__id, h__name, h__level, h__effective, h__expires
FROM temptab
This query returns an empty dataset, but I expect 1 row.
The data are correct, as a similar simple query without the WITH clause works fine.
I investigated the problem and found
Sub Query with WITH-CLAUSE in H2DB
but that solution did not help.
So, does anyone know how H2 supports the WITH clause?
Thanks in advance for your time.
According to the h2 database grammar, it looks like the WITH clause is not supported in the H2 database, except for experimental support for recursive queries: h2 recursive queries
It's supported now (http://www.h2database.com/html/grammar.html), for non-recursive queries as well.
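A minimal non-recursive WITH that recent H2 versions accept (a standalone sketch, not tied to the schema above):

```
with t(a, b) as (select 1, 2)
select a + b as total from t;
-- total = 3
```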
Is there an equivalent way to do the following SQL command with Django's QuerySet API?
select id, childid from mysite_nodetochild
where childid NOT IN (Select "Nodeid" from mysite_nodetochild)
I would prefer not to use raw SQL if possible, but I can't get a clean working version using Django's QuerySet.
Try
nodetochild.objects.exclude(childid=nodetochild.objects.values_list('Nodeid', flat=True)).only('id', 'childid')
This should evaluate to, more or less:
SELECT "mysite_nodetochild"."id", "mysite_nodetochild"."childid" FROM "mysite_nodetochild" WHERE NOT ("mysite_nodetochild"."childid" = (SELECT U0."nodeid" FROM "mysite_nodetochild" U0))
Or, if you need the IN condition:
nodetochild.objects.exclude(childid__in=nodetochild.objects.values_list('Nodeid', flat=True)).only('id', 'childid')
Would evaluate to:
SELECT "mysite_nodetochild"."id", "mysite_nodetochild"."childid" FROM "mysite_nodetochild" WHERE NOT ("mysite_nodetochild"."childid" IN (SELECT U0."nodeid" FROM "mysite_nodetochild" U0))
Is there an easy way to do URL decoding within the BigQuery query language? I'm working with a table that has a column containing URL-encoded strings in some values. For example:
http://xyz.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz
I extract the "url" parameter like so:
SELECT REGEXP_EXTRACT(column_name, "url=([^&]+)") as url
from [mydataset.mytable]
which gives me:
http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345
What I would like to do is something like:
SELECT URL_DECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) as url
from [mydataset.mytable]
thereby returning:
http://www.example.com/hello?v=12345
I would like to avoid using multiple REGEXP_REPLACE() statements (replacing %20, %3A, etc...) if possible.
Ideas?
Below is built on top of #sigpwned's answer, but slightly refactored and wrapped in a SQL UDF (which has none of the limitations that a JS UDF has, so it is safe to use).
#standardSQL
CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT SAFE_CONVERT_BYTES_TO_STRING(
ARRAY_TO_STRING(ARRAY_AGG(
IF(STARTS_WITH(y, '%'), FROM_HEX(SUBSTR(y, 2)), CAST(y AS BYTES)) ORDER BY i
), b''))
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}|[^%]+")) AS y WITH OFFSET AS i
));
SELECT
column_name,
URLDECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) AS url
FROM `project.dataset.table`
It can be tested with the example from the question:
#standardSQL
CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT SAFE_CONVERT_BYTES_TO_STRING(
ARRAY_TO_STRING(ARRAY_AGG(
IF(STARTS_WITH(y, '%'), FROM_HEX(SUBSTR(y, 2)), CAST(y AS BYTES)) ORDER BY i
), b''))
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}|[^%]+")) AS y WITH OFFSET AS i
));
WITH `project.dataset.table` AS (
SELECT 'http://example.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz' column_name
)
SELECT
URLDECODE(REGEXP_EXTRACT(column_name, "url=([^&]+)")) AS url,
column_name
FROM `project.dataset.table`
with the result:
Row url column_name
1 http://www.example.com/hello?v=12345 http://example.com/example.php?url=http%3A%2F%2Fwww.example.com%2Fhello%3Fv%3D12345&foo=bar&abc=xyz
Update: a further, quite optimized SQL UDF:
CREATE TEMP FUNCTION URLDECODE(url STRING) AS ((
SELECT STRING_AGG(
IF(REGEXP_CONTAINS(y, r'^%[0-9a-fA-F]{2}'),
SAFE_CONVERT_BYTES_TO_STRING(FROM_HEX(REPLACE(y, '%', ''))), y), ''
ORDER BY i
)
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r"%[0-9a-fA-F]{2}(?:%[0-9a-fA-F]{2})*|[^%]+")) y
WITH OFFSET AS i
));
It's a good feature request, but currently there is no built-in BigQuery function that provides URL decoding.
One more workaround is using a user-defined function.
#standardSQL
CREATE TEMPORARY FUNCTION URL_DECODE(enc STRING)
RETURNS STRING
LANGUAGE js AS """
try {
  // decodeURIComponent (rather than decodeURI) also decodes %2F, %3A, etc.
  return decodeURIComponent(enc);
} catch (e) {
  return null;
}
""";
SELECT ven_session,
URL_DECODE(REGEXP_EXTRACT(para,r'&kw=(\w|[^&]*)')) AS q
FROM raas_system.weblog_20170327
WHERE para like '%&kw=%'
LIMIT 10
I agree with everyone here that URLDECODE should be a native function. However, until that happens, it is possible to write a "native" URLDECODE:
SELECT
  id,
  SAFE_CONVERT_BYTES_TO_STRING(ARRAY_TO_STRING(ps, b'')) AS decoded
FROM (
  SELECT
    id,
    ARRAY_AGG(CASE
        WHEN REGEXP_CONTAINS(y, r"^%") THEN FROM_HEX(SUBSTR(y, 2))
        ELSE CAST(y AS bytes)
      END ORDER BY i) AS ps
  FROM (
    SELECT x AS id, REGEXP_EXTRACT_ALL(x, r"%[0-9a-fA-F]{2}|[^%]+") AS element
    FROM UNNEST(ARRAY['domodossola%e2%80%93locarno railway', 'gabu%c5%82t%c3%b3w']) AS x
  ) AS x
  CROSS JOIN UNNEST(x.element) AS y WITH OFFSET AS i
  GROUP BY id
);
In this example, I've tried and tested the implementation with a couple of percent-encoded page names from Wikipedia as the input. It should work with your input, too.
Obviously, this is extremely unwieldy! For that reason, I'd suggest building a materialized join table, or wrapping this in a view, rather than using this expression "naked" in your query. However, it does appear to get the job done, and it doesn't hit the UDF limits.
EDIT: #MikhailBerylyant's answer has wrapped this cumbersome implementation into a nice, tidy little SQL UDF. That's a much better way to handle this!