Is there a way to look up a jsonb column by its values - SQL

I have a table, test, in Postgres 12 with a jsonb column, data_col, that has many different keys and values.
My requirement is to select * from that table where a value matches a string.
For example, the table has data as below:
id some_value data_col
---------------------------------------------------------------
11 2018 {"a": "Old Farm"}
12 2019 {"b": "My house is old", "c": "See you tomorrow"}
13 2020 {"d": "The old house", "a": "Very Green", "e": "Olden days"}
As you can see, there are many different keys, so it's not practical to look things up the way the examples on the web suggest, i.e. col_name->>'Key'.
I am looking to write a SQL query with a WHERE clause that gives me all rows containing the string "old".
Something like:
select * from test where data_col ILIKE '%old%'
should give me
11, 2018, Old Farm
12, 2019, My house is old
13, 2020, Olden days

One option uses jsonb_each_text(), which returns each key/value pair with the value as text:
select t.*, x.*
from test t
cross join lateral jsonb_each_text(t.data_col) x
where x.value ilike '%old%'
Note that this multiplies the rows if an object contains "old" more than once. To avoid that, you can use exists instead:
select t.*
from test t
where exists (
    select 1
    from jsonb_each_text(t.data_col) x
    where x.value ilike '%old%'
)
Or if you want to aggregate all the matched values in one column:
select t.*, x.*
from test t
cross join lateral (
    select string_agg(x.value, ',') as vals
    from jsonb_each_text(t.data_col) x
    where x.value ilike '%old%'
) x
where x.vals is not null

As you are using Postgres 12, you can use a SQL/JSON path function:
select id, some_value,
       jsonb_path_query_first(data_col, '$.* ? (@ like_regex "old" flag "i")') #>> '{}'
from test
The #>> operator is only there to convert the scalar JSON value into text (as there is no direct cast from jsonb to text that would remove the double quotes).
If there are potentially more values with the substring, you can use jsonb_path_query_array() to get all of them as an array (obviously you then need to remove the #>> part).
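A hedged sketch of that array variant (same path expression; the result stays jsonb, so the #>> '{}' cast is dropped):
select id, some_value,
       jsonb_path_query_array(data_col, '$.* ? (@ like_regex "old" flag "i")') as matches
from test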

Related

PartiQL/SQL: JSON-SUPER array query to extract values to table on Redshift

I have a somewhat complicated SUPER array that I brought into Redshift using a REST API. The 'API_table' currently looks like the example below.
One of the sample columns "values" reads as follows:
values
[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},,{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"},...
I've queried the "value" data using:
SELECT c.values[0].value[0].value as v
FROM API_table c;
However, this only returns the first value "6.9" in each row and not all the "value" items in the row. The same approach doesn't work for extracting the "dateTime" items as it produced NULL values:
SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;
The above example only resembles one row of the table. My question is: are there ways to query the data in every row of the table so that all the values ("value" & "dateTime") of every row can be extracted into a new table?
The desired result is:
v     dt
6.9   2023-01-30T17:45:00.000-05:00
6.9   2023-01-30T18:00:00.000-05:00
6.9   2023-01-30T18:15:00.000-05:00
Many thanks.
I tried the following query but it only returned a single "value" result for each row.
SELECT c.values[0].value[0].value as v
FROM API_table c;
When applied to the "dateTime" items, it yielded NULL values:
SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;
===================================================================
@BillWeiner thanks, I worked through both the CTE and test case examples and got the desired results (especially with the CTE). The only issue that remains is knowing how to select the original table/column that contains the entire super array so that it can be inserted into test1 (or col1 in the CTE case).
There are super arrays in every row of column 'values', so the issue remains: selecting the column 'values' and extracting each of the multiple value ("6.9") and dateTime objects from each row.
================================================================
I've managed to get the query going when the json strings are explicitly stated in the insert into test1 values query.
Now I'm running this query:
SET enable_case_sensitive_identifier TO true;
create table test1 (jvalues varchar(2048));
insert into test1 select c.values from ph_api c;
create table test2 as select json_parse(jvalues) as svalues from test1;
with recursive numbers(n) as
( select 0 as n
union all
select n + 1
from numbers n
where n.n < 20
),
exp_top as
( select c.svalues[n].value
from test2 c
cross join numbers n
)
,
exp_bot as
( select c.value[n]
from exp_top c
cross join numbers n
where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;
However, when I try to insert from the source table with insert into test1 select c.values from table c; I'm getting an error: ERROR: column "jvalues" is of type character varying but expression is of type super. Hint: You will need to rewrite or cast the expression.
I would like to be able to SELECT this source data:
sourceinfo                                      variable                                             values
{"siteName":"YAN","siteCode":[{"value":"01"}]   {"variableCode":[{"value":"00600","network":"ID"}   [{"value":[{"value":"3.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"4.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"}]
{"siteName":"YAN","siteCode":[{"value":"01"}]   {"variableCode":[{"value":"00600","network":"ID"}   [{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"}]
as the jvalues so that it could be unrolled into a desired result of:
v     dt
3.9   2023-01-30T17:30:00.000-05:00
4.9   2023-01-30T17:45:00.000-05:00
5.9   2023-01-30T18:00:00.000-05:00
6.9   2023-01-30T18:15:00.000-05:00
================================================================
The following query worked to select the desired json strings:
with exp_top as
( select s.value
from <source_table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;
Yes. You need to expand each array element into its own row. A recursive CTE (or something similar) will be needed to expand the arrays into rows. This can be done based on the max array length in the super or with some fixed set of numbers. This set of numbers will need to be cross joined with your table to extract each array element.
I wrote up a similar answer previously - Extract value based on specific key from array of jsons in Amazon Redshift - take a look and see if this gets you unstuck. Let me know if you need help adapting this to your situation.
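As a minimal, hedged sketch of just that numbers piece in isolation (20 is an assumed upper bound on the array length; the tested example further below uses the same pattern):
with recursive numbers(n) as (
    select 0 as n
    union all
    select n + 1 from numbers where n < 20
)
select n from numbers;
Cross joining this number list against your table is what lets you pick out element [n] of each array, one row per n.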
==============================================================
Based on the comments it looks like a more specific example is needed. This little test case should help you understand what is needed to make this work.
I've repeated your data a few times to create multiple rows and to populate the outer array with 2 inner arrays. This hopefully shows how to unroll multiple nested arrays manually (the compact Redshift unrolling method is below, but it is hard to understand if you don't get the concepts down first).
First set up the test data:
SET enable_case_sensitive_identifier TO true;
create table test1 (jvalues varchar(2048));
insert into test1 values
('[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]'),
('[{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"8.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]');
create table test2 as select json_parse(jvalues) as svalues from test1;
Note that we have to turn on case sensitivity for the session to be able to select "dateTime" correctly.
Then unroll the arrays manually:
with recursive numbers(n) as
( select 0 as n
union all
select n + 1
from numbers n
where n.n < 20
),
exp_top as
( select row_number() over () as r, n as x, c.svalues[n].value
from test2 c
cross join numbers n
)
,
exp_bot as
( select r, x, n as y, c.value[n]
from exp_top c
cross join numbers n
where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;
This version:
- creates the numbers 0 - 19,
- expands the outer array (2 elements in each row) by cross joining with these numbers,
- expands the inner array by the same method,
- produces the desired results.
Redshift has a built in method for doing this unrolling of super arrays and it is defined in the FROM clause. You can produce the same results from:
with exp_top as (select inx1, s.value from test2 c, c.svalues s at inx1)
select inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;
Much more compact. This code has been tested and runs as is in Redshift. If you see the "dateTime" value as NULL it is likely that you don't have case sensitivity enabled.
==========================================================
To also have the original super column in the final result:
with exp_top as (select c.svalues, inx1, s.value from test2 c, c.svalues s at inx1)
select svalues, inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;
==========================================================
I think that unrolling your actual data will be simpler than the code I provided for the general question.
First, you don't need to use the test1 and test2 tables; you can query your table directly. If you still want to use test2, then use your table as the source of the "create table test2 ..." statement. But let's see if we can just use your source table.
with exp_top as (
select s.value from <your table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;
This code is untested but should work.
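If you would rather keep the test2 staging step, here is a hedged sketch (assuming the source table is ph_api, as in your earlier attempt, and that its values column is already SUPER, so test1 and json_parse() can be skipped; "values" is quoted because it is a reserved word):
create table test2 as
select c."values" as svalues
from ph_api c;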

SQL Unnest- how to use correctly?

Say I have some data in a table, t.
id, arr
--, ---
1, [1,2,3]
2, [4,5,6]
SQL
SELECT AVG(n) FROM UNNEST(
SELECT arr FROM t AS n) AS avg_arr
This returns the error "mismatched input 'SELECT'. Expecting <expression>".
What is the correct way to unnest an array and aggregate the unnested values?
unnest is normally used with a join and expands the array into a relation (i.e. for every element of the array a row is introduced). To calculate the average you will need to group the values back:
-- sample data
WITH dataset (id, arr) AS (
VALUES (1, array[1,2,3]),
(2, array[4,5,6])
)
--query
select id, avg(n)
from dataset
cross join unnest (arr) t(n)
group by id
Output:
id   _col1
1    2.0
2    5.0
But you can also use array functions. Depending on the Presto version, either array_average:
select id, array_average(arr)
from dataset
Or, for older versions, a more cumbersome approach with manual aggregation via reduce:
select id, reduce(arr, 0.0, (s, x) -> s + x, s -> s) / cardinality(arr)
from dataset
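For reference, the same cross join unnest pattern applied directly to the table t from the question (a sketch, untested):
SELECT id, AVG(n) AS avg_arr
FROM t
CROSS JOIN UNNEST(arr) AS u(n)
GROUP BY id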

Inclusion of nulls with ANY_VALUE in BigQuery

I have a 'vendors' table that looks like this...
company    itemKey  itemPriceA  itemPriceB
companyA   203913   20          10
companyA   203914   20          20
companyA   203915   25          5
companyA   203916   10          10
It has potentially millions of rows per company and I want to query it to bring back a representative delta between itemPriceA and itemPriceB for each company. I don't care which delta I bring back as long as it isn't zero/null (like row 2 or 4), so I was using ANY_VALUE like this...
SELECT company
, ANY_VALUE(CASE WHEN (itemPriceA-itemPriceB)=0 THEN null ELSE (itemPriceA-itemPriceB) END)
FROM vendors
GROUP BY 1
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
If ANY_VALUE returns null "when expression is NULL for all rows in the group" it should NEVER return null for companyA right (since only 2 of 4 rows are null)? But the second sentence sounds like it will indeed include the null rows.
P.s. you may be wondering why I don't simply add a WHERE clause saying "WHERE itemPriceA-itemPriceB>0" but in the event that a company has ONLY matching prices, I still want the company to be returned in my results.
Clarification
I'm afraid the accepted answer will have to show stronger evidence that contradicts the docs.
@Raul Saucedo suggests that the following BigQuery documentation is referring to WHERE clauses:
rows for which expression is NULL are considered and may be selected
This is not the case. WHERE clauses are not mentioned anywhere in the ANY_VALUE docs. (Nowhere on the page. Try to ctrl+f for it.) And the docs are clear, as I'll explain.
@d3wannabe is correct to wonder about this:
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
But the docs are not contradictory. The 2 sentences coexist.
"Returns NULL when expression is NULL for all rows in the group." So if all rows in a column are NULL, it will return NULL.
"ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected." So if the column has rows mixed with NULLs and actual data, it will select anything from that column, including nulls.
How to create an ANY_VALUE without nulls in BigQuery
We can use ARRAY_AGG to turn a group of values into a list. This aggregate function has the option to IGNORE NULLS. We then select 1 item from the list after ignoring nulls.
If we have a table with 2 columns: id and mixed_data, where mixed_data has some rows with nulls:
SELECT
id,
ARRAY_AGG( -- turn the mixed_data values into a list
mixed_data -- we'll create an array of values from our mixed_data column
IGNORE NULLS -- there we go!
LIMIT 1 -- only fill the array with 1 thing
)[SAFE_OFFSET(0)] -- grab the first item in the array
AS any_mixed_data_without_nulls
FROM your_table
GROUP BY id
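Applied to the vendors table from the question, a sketch of the same pattern (untested; the NULLIF mirrors the CASE expression the question uses to treat zero deltas as NULL):
SELECT
  company,
  ARRAY_AGG(
    NULLIF(itemPriceA - itemPriceB, 0)  -- zero deltas become NULL, as in the question
    IGNORE NULLS
    LIMIT 1
  )[SAFE_OFFSET(0)] AS any_nonzero_delta  -- NULL only when every delta is zero or NULL
FROM vendors
GROUP BY company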
See similar answers here:
https://stackoverflow.com/a/53508606/6305196
https://stackoverflow.com/a/62089838/6305196
Update, 2022-08-12
There is evidence that the docs may be inconsistent with the actual behavior of the function. See Samuel's latest answer to explore his methodology.
However, we cannot know if the docs are incorrect and ANY_VALUE behaves as expected or if ANY_VALUE has a bug and the docs express the intended behavior. We don't know if Google will correct the docs or the function when they address this issue.
Therefore I would continue to use ARRAY_AGG to create a safe ANY_VALUE that ignores nulls until we see a fix from Google.
Please upvote the issue in Google's Issue Tracker to see this resolved.
This is an explanation of how any_value works with null values.
any_value always returns the first value, if there is a value different from null.
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, "banana",null,null]) as fruit;
It returns null if all rows have null values. This refers to the sentence:
"Returns NULL when expression is NULL for all rows in the group"
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, null, null]) as fruit
It returns null if one value is null and you specify that in the where clause. This refers to the sentence:
"ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which
expression is NULL are considered and may be selected."
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST(["apple", "banana", null]) as fruit
where fruit is null
It always depends on which filter you are using and on the field inside the any_value.
You can see in this example that it returns two rows that are different from 0.
SELECT ANY_VALUE(e).company, (itemPriceA-itemPriceB) as value
FROM `vendor` e
where (itemPriceA-itemPriceB)!=0
group by e.company
The documentation says that "NULL are considered and may be" returned by an any_value statement. However, I am quite sure the documentation is wrong here. In the current implementation, which was tested on 13th August 2022, the any_value will return the first value of that column. However, if the table does not have an order by specified, the sorting may be random due to processing of the data on several nodes.
For testing, a large table of nulls is needed. GENERATE_ARRAY comes in handy for that. This array will have several entries, with the value zero standing in for null. The first 1 million entries with value zero are generated in the table tmp. Table tbl then places the 1 million zeros before and after the block [-100,0,-90,-80,3,4,5,6,7,8,9]. Finally, NULLIF(x,0) AS x replaces all zeros with null.
Several tests of any_value using the test table tbl are done. If the table is not sorted further, the first non-null value of that column is returned: -100.
WITH
tmp AS (SELECT ARRAY_AGG(0) AS tmp0 FROM UNNEST(GENERATE_ARRAY(1,1000*1000))),
tbl AS (
SELECT
NULLIF(x,0) AS x,
IF(x!=0,x,NULL) AS y,
rand() AS rand
FROM
tmp,
UNNEST(ARRAY_CONCAT(tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0)) AS x )
SELECT "count rows", COUNT(1) FROM tbl
UNION ALL SELECT "count items not null", COUNT(x) FROM tbl
UNION ALL SELECT "any_value(x): (returns first non null element in list: -100)", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "2nd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "3rd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "any_value(y)", ANY_VALUE(y) FROM tbl
UNION ALL SELECT "order asc", ANY_VALUE(x) FROM (Select * from tbl order by x asc)
UNION ALL SELECT "order desc (returns largest element: 9)", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order desc", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order abs(x) desc", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) desc )
UNION ALL SELECT "order abs(x) asc (smallest number: 3)", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) asc )
UNION ALL SELECT "order rand asc", ANY_VALUE(x) FROM (Select * from tbl order by rand asc )
UNION ALL SELECT "order rand desc", ANY_VALUE(x) FROM (Select * from tbl order by rand desc )
This gives the following result:
- The first non-null entry, -100, is returned.
- Sorting the table by this column causes any_value to always return the first entry.
- In the last two examples, the table is ordered by random values, so any_value returns random entries.
If the dataset is larger than 2 million rows, the table may be split internally for processing; this results in an unordered table. Without an order by, the first entry of the table, and thus the result of any_value, cannot be predicted.
For testing this, please replace the 10th line by
UNNEST(ARRAY_CONCAT(tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0,tmp0)) AS x )

How does BigQuery manage a struct field in a SELECT

The following queries a struct from a public data source:
SELECT year FROM `bigquery-public-data.words.eng_gb_1gram` LIMIT 1000
Its schema is: (screenshot omitted)
And the result set is: (screenshot omitted)
It seems BigQuery automatically translates a struct to all its (leaf) fields when accessed, is that correct? Or how does BigQuery handle directly calling a struct in a select statement?
Two things are going on. You have an array of structs (aka "records").
Each element of the array appears on a separate line in the result set.
Each field in the struct is a separate column.
So, your results are not for a struct but for an array of structs.
You can see what happens for a single struct using:
select year[safe_ordinal(1)]
from . . .
You will get a single row for each row in the data, with the first element of the year array in the row. It will have separate columns, with the names of year.year, year.term_frequency and so on. If you wanted these as "regular" columns, you can use:
select year[ordinal(1)].*
from . . .
Then the columns are year, term_frequency, and so on.
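A hedged, runnable version of that idea against the table from the question (field names taken from the answer above; adjust if the schema differs):
-- one output row per source row, with the first element of the year array
-- expanded into its own columns (year, term_frequency, ...)
SELECT year[SAFE_ORDINAL(1)].*
FROM `bigquery-public-data.words.eng_gb_1gram`
LIMIT 10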
As you might know, a RECORD can be NULLABLE, in which case it is a STRUCT, or it can be REPEATED, in which case it is an array of records.
You can use dot-star notation with a struct to select out all its fields, just as you do with a table's rows via SELECT * FROM tbl or its equivalent SELECT t.* FROM tbl t.
So, for example, the code below
with tbl as (
select struct(1 as a, 2 as b, 3 as c) as col_struct,
[ struct(11 as x, 12 as y, 13 as z),
struct(21, 22, 23),
struct(31, 32, 33)
] as col_array
)
select col_struct.*
from tbl
produces output as if those were the rows of a "mini" table called col_struct.
The same dot-star notation does not work for arrays: if you want to output the elements of an array separately, you first need to unnest that array, like in the example below.
with tbl as (
select struct(1 as a, 2 as b, 3 as c) as col_struct,
[ struct(11 as x, 12 as y, 13 as z),
struct(21, 22, 23),
struct(31, 32, 33)
] as col_array
)
select rec
from tbl, unnest(col_array) rec
which outputs
And now, because each row is a struct, you can use dot-star notation:
select rec.*
from tbl, unnest(col_array) rec
with output
And, finally, you can combine the above as
select col_struct.*, rec.*
from tbl t, t.col_array rec
with output
Note: from tbl t, t.col_array rec is a shortcut for from tbl, unnest(col_array) rec
One more note: if you reference a field name that is used in multiple places in your schema, the engine picks the outermost matching one. If by chance this matching one is within an ARRAY, you first need to unnest that array. And if it is part of a STRUCT, you need to make sure you fully qualify the path.
For example, with the above simplified data:
select a from tbl                     -- will not work
select col_struct.a from tbl          -- will work
select col_array.x from tbl           -- will not work
select x from tbl, unnest(col_array)  -- will work
There is much more that can be said on the subject depending on your exact use case, but the above covers some hopefully helpful basics.

Purposely having a query return blank entries at regular intervals

I want to write a query that returns 3 results followed by blank results followed by the next 3 results, and so on. So if my database had this data:
CREATE TABLE tbl (a integer, b integer, c integer, d integer);
INSERT INTO tbl (a,b,c,d)
VALUES (1,2,3,4),
(5,6,7,8),
(9,10,11,12),
(13,14,15,16),
(17,18,19,20),
(21,22,23,24),
(25,26,27,28);
I would want my query to return this
1,2,3,4
5,6,7,8
9,10,11,12
, , ,
13,14,15,16
17,18,19,20
21,22,23,24
, , ,
25,26,27,28
I need this to work for arbitrarily many entries that I select, with three grouped together like this.
I'm running PostgreSQL 8.3.
This should work flawlessly in PostgreSQL 8.3
SELECT a, b, c, d
FROM (
    SELECT rn, 0 AS rk, (x[rn]).*
    FROM (
        SELECT x, generate_series(1, array_upper(x, 1)) AS rn
        FROM (SELECT ARRAY(SELECT tbl FROM tbl) AS x) x
    ) y
    UNION ALL
    SELECT generate_series(3, (SELECT count(*) FROM tbl), 3), 1, (NULL::tbl).*
    ORDER BY rn, rk
) z
Major points:
- Works for a query that selects all columns of tbl.
- Works for any table.
- For selecting arbitrary columns you have to substitute (NULL::tbl).* with a matching number of NULL columns in the second query.
- Assumes that NULL values are ok for "blank" rows. If not, you'll have to cast your columns to text in the first SELECT and substitute '' for NULL in the second.
- The query will be slow with very big tables.
If I had to do it, I would write a plpgsql function that loops through the results and inserts the blank rows (a sketch of that idea follows below). But you mentioned you had no direct access to the db ...
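A hedged sketch of that plpgsql idea (assumes the tbl table from the question and 8.3-compatible syntax; untested):
CREATE OR REPLACE FUNCTION rows_with_blanks()
  RETURNS SETOF tbl AS
$$
DECLARE
    r     tbl%ROWTYPE;
    blank tbl%ROWTYPE;  -- never assigned, so every column stays NULL
    i     integer := 0;
BEGIN
    FOR r IN SELECT * FROM tbl LOOP
        RETURN NEXT r;
        i := i + 1;
        IF i % 3 = 0 THEN
            RETURN NEXT blank;  -- emit an all-NULL "blank" row after every 3rd row
        END IF;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT * FROM rows_with_blanks();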
In short, no, there's not an easy way to do this, and generally, you shouldn't try. The database is concerned with what your data actually is, not how it's going to be displayed. It's not an appropriate scope of responsibility to expect your database to return "dummy" or "extra" data so that some down-stream process produces a desired output. The generating script needs to do that.
As you can't change your down-stream process, you could (read that with a significant degree of skepticism and disdain) add things like this:
Select Top 3
a, b, c, d
From
table
Union Select Top 1
'', '', '', ''
From
table
Union Select Top 3 Skip 3
a, b, c, d
From
table
Please, don't actually try to do that.
You can do it (at least on DB2; there doesn't appear to be equivalent functionality in your version of PostgreSQL).
No looping needed, although there is a bit of trickery involved...
Please note that though this works, it's really best to change your display code.
The statement requires CTEs (although it could be re-written to use other table references) and OLAP functions (I guess you could re-write it to count() previous rows in a subquery, but...).
WITH dataList (rowNum, dataColumn) as
     (SELECT CAST(CAST(:interval as REAL) / (:interval - 1) *
                  ROW_NUMBER() OVER(ORDER BY dataColumn) as INTEGER),
             dataColumn
      FROM dataTable),
     blankIncluder (rowNum, dataColumn) as
     (SELECT rowNum, dataColumn
      FROM dataList
      UNION ALL
      SELECT rowNum - 1, :blankDataColumn
      FROM dataList
      WHERE MOD(rowNum - 1, :interval) = 0
        AND rowNum > :interval)
SELECT *
FROM blankIncluder
ORDER BY rowNum
This will generate a list of those elements from the datatable, with a 'blank' line every :interval lines, as ordered by the initial query. The result set only has 'blank' lines between existing lines; there are no 'blank' lines at the ends.
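For completeness, a hedged sketch of the same row-numbering idea on a modern PostgreSQL (9.0 or later, so not the asker's 8.3), assuming the tbl(a,b,c,d) table from the question:
SELECT a, b, c, d
FROM (
    -- number the real rows (ordering by a is an assumption; pick your own key)
    SELECT row_number() OVER (ORDER BY a) AS rn, 0 AS rk, a, b, c, d
    FROM tbl
    UNION ALL
    -- one all-NULL row slotted in after every third data row
    SELECT g * 3, 1, NULL, NULL, NULL, NULL
    FROM generate_series(1, (SELECT count(*) FROM tbl)::int / 3) g
) z
ORDER BY rn, rk;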