query column names for a single row where value is null? - sql

I'm attempting to return all of the column names for a single row where the value is null. I can parse the entire row afterward, but I'm curious if there's a function I can leverage.
I can return a JSON object containing key/value pairs where the value is not null using row_to_json() and json_strip_nulls(), where the condition references a single unique row:
SELECT json_strip_nulls(row_to_json(t))
FROM table t where t.id = 123
Is there a function or simple way to accomplish the inverse of this, returning all of the keys (column names) with null values?

You need a primary key or unique column(s). In the example id is unique:
with my_table(id, col1, col2, col3) as (
values
(1, 'a', 'b', 'c'),
(2, 'a', null, null),
(3, null, 'b', 'c')
)
select id, array_agg(key) as null_columns
from my_table t
cross join jsonb_each_text(to_jsonb(t))
where value is null
group by id
id | null_columns
----+--------------
2 | {col2,col3}
3 | {col1}
(2 rows)
key and value are default columns returned by the function jsonb_each_text(). See JSON Functions and Operators in the documentation.
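To make those key and value columns concrete, here is a minimal sketch that expands just the row with id = 2 from the sample data above:
with my_table(id, col1, col2, col3) as (
values
(2, 'a', null, null)
)
select key, value
from my_table t
cross join jsonb_each_text(to_jsonb(t));
-- returns one row per column: key is the column name (id, col1, col2, col3)
-- and value is its text value, which is NULL for col2 and col3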

Actually the JSON approach might work. First transform the row to a JSON object with row_to_json(). Then expand the JSON object back to a set using json_each_text(). You can now filter for NULL values and use aggregation to get the columns that contain NULL.
I don't know what output format you want. json_object_agg() is the "complement" to your json_strip_nulls()/row_to_json() approach. But you may also want a JSON array (json_agg), just an array (array_agg()) or a comma separated string list (string_agg()).
SELECT json_object_agg(jet.k, jet.v),
json_agg(jet.k),
array_agg(jet.k),
string_agg(jet.k, ',')
FROM elbat t
CROSS JOIN LATERAL row_to_json(t) rtj(j)
CROSS JOIN LATERAL json_each_text(rtj.j) jet(k, v)
WHERE jet.v IS NULL
GROUP BY rtj.j::text;
db<>fiddle

Related

Compare two arrays in PostgreSQL

I have a table in postgres with a value column that contains string arrays. My objective is to find all arrays that contain any of the following strings: {'cat', 'dog'}
id value
1 {'dog', 'cat', 'fish'}
2 {'elephant', 'mouse'}
3 {'lizard', 'dog', 'parrot'}
4 {'bear', 'bird', 'cat'}
The following query uses ANY() to check if 'dog' is equal to any of the items in each array and will correctly return rows 1 and 3:
select * from mytable where 'dog'=ANY(value);
I am trying to find a way to search value for any match in an array of strings. For example:
select * from mytable where ANY({'dog', 'cat'})=ANY(value);
Should return rows 1, 3, and 4. However, the above code throws an error. Is there a way to use the ANY() clause on the left side of this equation? If not, what would be the workaround to check if any of the strings in an array are in value?
You can use the && operator to find out whether two arrays overlap. It returns true if the arrays have at least one element in common.
Schema and insert statements:
create table mytable (id int, value text[]);
insert into mytable values (1,'{"dog", "cat", "fish"}');
insert into mytable values (2,'{"elephant", "mouse"}');
insert into mytable values (3,'{"lizard", "dog", "parrot"}');
insert into mytable values (4,'{"bear", "bird", "cat"}');
Query:
select * from mytable where array['dog', 'cat'] && (value);
Output:
id | value
----+---------------------
1 | {dog,cat,fish}
3 | {lizard,dog,parrot}
4 | {bear,bird,cat}
db<>fiddle here

Selecting Columns which are not null in Athena(Metabase)

I have a table of 1000+ columns in Athena (Metabase), and I want to know how I can extract only those columns which are not null for a certain group of IDs.
Typically, this would require unpivoting your columns to rows, checking which values are not null, and then pivoting back (a rough sketch of that approach follows).
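For reference, a manual unpivot in Athena (Presto) could look roughly like the sketch below; the table and column names (Your1000ColumnTable, ID, field1 .. field3) are placeholders, and it assumes the values are cast to a common type such as varchar:
SELECT ID, col_name
FROM Your1000ColumnTable
CROSS JOIN UNNEST(
    ARRAY['field1', 'field2', 'field3'],    -- the column names
    ARRAY[CAST(field1 AS varchar),          -- the column values
          CAST(field2 AS varchar),
          CAST(field3 AS varchar)]
) AS t(col_name, col_value)
WHERE col_value IS NOT NULL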
Based on the documentation, Athena may offer a simpler route.
As documented here:
SELECT filter(ARRAY [-1, NULL, 10, NULL], q -> q IS NOT NULL)
Which returns:
[-1,10]
Unfortunately, since there is no way to be dynamic until the columns are packed into an array, this looks like:
WITH dataset AS (
SELECT
ID,
ARRAY[field1, field2, field3, .....] AS fields
FROM
Your1000ColumnTable
)
SELECT ID, filter(fields, q -> q IS NOT NULL)
FROM dataset
If you need to access the column names from the array, use a mapping to field names when creating the array, as seen here.
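One way to keep the column names, in line with that mapping hint, could be to build a MAP keyed by field name and filter it with map_filter(); this is only a sketch, with the same placeholder names and the same assumption that the values share (or are cast to) one type:
WITH dataset AS (
  SELECT
    ID,
    MAP(ARRAY['field1', 'field2', 'field3'],     -- keys: the column names
        ARRAY[field1, field2, field3]) AS fields -- values: the column values
  FROM
    Your1000ColumnTable
)
SELECT ID, map_filter(fields, (k, v) -> v IS NOT NULL) AS non_null_fields
FROM dataset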

SELECT on JSON operations of Postgres array column?

I have a column of type jsonb[] (a Postgres array of jsonb objects) and I'd like to perform a SELECT on rows where a criterion is met by at least one of the objects. Something like:
-- Schema would be something like
mytable (
id UUID PRIMARY KEY,
col2 jsonb[] NOT NULL
);
-- Query I'd like to run
SELECT
id,
x->>'field1' AS field1
FROM
mytable
WHERE
x->>'field2' = 'user' -- for any x in the array stored in col2
I've looked around at ANY and UNNEST but it's not totally clear how to achieve this, since you can't run unnest in a WHERE clause. I also don't know how I'd specify that I want the field1 from the matching object.
Do I need a WITH table with the values expanded to join against? And how would I achieve that and keep the id from the other column?
Thanks!
You need to unnest the array, and then you can access each JSON value:
SELECT t.id,
c.x ->> 'field1' AS field1
FROM mytable t
cross join unnest(col2) as c(x)
WHERE c.x ->> 'field2' = 'user'
This will return one row for each json value in the array.
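If you only need to filter the rows and don't have to extract field1 from the matching element, an EXISTS subquery is a possible variant that avoids that row multiplication (a sketch against the same hypothetical schema):
SELECT t.id
FROM mytable t
WHERE EXISTS (
    SELECT 1
    FROM unnest(t.col2) AS c(x)       -- expand the jsonb[] column
    WHERE c.x ->> 'field2' = 'user'   -- match on any element
);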

Traverse a nested json object and insert the keys and values into two related tables

I'm passing the following json structure to my procedure:
{questA: [[a1, a2], [a3, a4]], questB: [[b1, b2], [b2, b4]...]}
I would like to go over all the 'quest' keys (questA, questB, ...) and insert each key name into one table and its value sets into another table as multiple rows, so each set (a1, a2) has its own row plus a foreign key field to its parent quest key.
quest
-------
id
key
questValues
-------------
id
val
val
quest_id
foreign key (quest_id) references quest(id)
I've tried something like:
FOR key, val IN SELECT * FROM jsonb_each_text(myJson) LOOP
...
END LOOP;
But it loops over everything so the val arrays are just plain text now. I thought about chaining selects with one of the json literal functions but I'm unsure about the syntax.
You can indeed do this by chaining the output of various JSON functions:
with input (parameter) as (
values ('{"questA": [["a1", "a2"], ["a3", "a4"]], "questB": [["b1", "b2"], ["b2", "b4"]]}'::jsonb)
), elements as (
select j.quest, k.answer
from input i
cross join lateral jsonb_each(i.parameter) as j(quest,vals)
cross join lateral jsonb_array_elements(j.vals) as k(answer)
), new_quests as (
insert into quest ("key")
select distinct quest
from elements
returning *
)
insert into quest_values (val1, val2, quest_id)
select e.answer ->> 0 as val1,
e.answer ->> 1 as val2,
nq.id as quest_id
from new_quests nq
join elements e on e.quest = nq.key;
The first step ("elements") turns the JSON value into rows that can be used as the source of the INSERT statements. It returns this:
quest | answer
-------+-------------
questA | ["a1", "a2"]
questA | ["a3", "a4"]
questB | ["b1", "b2"]
questB | ["b2", "b4"]
The next step inserts the unique values of the quest column into the quest table and returns the generated IDs.
And the final statement joins the generated IDs with the rows from the first step and extracts the two array elements as two values. It uses that query as the source to insert into the quest_values table.
Inside a procedure you obviously don't need the part that generates the sample data, so it would look something like this:
with elements as (
select j.quest, k.answer
from jsonb_each(the_parameter) as j(quest,vals)
cross join lateral jsonb_array_elements(j.vals) as k(answer)
), new_quests as (
insert into quest ("key")
select distinct quest
from elements
returning *
)
insert into quest_values (val1, val2, quest_id)
select e.answer ->> 0 as val1,
e.answer ->> 1 as val2,
nq.id as quest_id
from new_quests nq
join elements e on e.quest = nq.key;
Where the_parameter is the JSONB parameter passed to your procedure.
Online example: https://rextester.com/NBJIK44025
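For completeness, a minimal sketch of wrapping that statement in an actual procedure (PostgreSQL 11+; the procedure name load_quests is just an assumed placeholder):
create or replace procedure load_quests(the_parameter jsonb)
language plpgsql
as $$
begin
  with elements as (
    select j.quest, k.answer
    from jsonb_each(the_parameter) as j(quest, vals)
    cross join lateral jsonb_array_elements(j.vals) as k(answer)
  ), new_quests as (
    insert into quest ("key")
    select distinct quest
    from elements
    returning *
  )
  insert into quest_values (val1, val2, quest_id)
  select e.answer ->> 0,   -- first value of the pair
         e.answer ->> 1,   -- second value of the pair
         nq.id
  from new_quests nq
  join elements e on e.quest = nq.key;
end;
$$;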

INSERT SELECT FROM VALUES casting

It's often desirable to INSERT from a SELECT expression (e.g. to qualify it with a WHERE clause), but this can get PostgreSQL confused about the column types.
Example:
CREATE TABLE example (a uuid primary key, b numeric);
INSERT INTO example
SELECT a, b
FROM (VALUES ('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL)) as data(a,b);
=> ERROR: column "a" is of type uuid but expression is of type text
This can be fixed by explicitly casting in the values:
INSERT INTO example
SELECT a, b
FROM (VALUES ('d853b5a8-d453-11e7-9296-cec278b6b50a'::uuid, NULL::numeric)) as data(a,b);
But that's messy and a maintenance burden. Is there some way to make postgres understand that the VALUES expression has the same type as a table row, i.e. something like
VALUES('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL)::example%ROWTYPE
Edit:
The suggestion of using (data::example).* is neat, but unfortunately it seems to completely confuse the Postgres query planner when combined with a WHERE clause like so:
INSERT INTO example
SELECT (data::example).*
FROM (VALUES ('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL)) as data
WHERE NOT EXISTS (SELECT * FROM example
WHERE (data::example)
IS NOT DISTINCT FROM example);
This takes minutes with a large table.
You can cast a record to a row type of your table:
INSERT INTO example
SELECT (data::example).*
FROM (
VALUES
('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL),
('54514c89-f188-490a-abbb-268f9154ab2c', 42)
) as data;
data::example casts the complete row to a record of type example. The (...).* then turns that record into the columns defined in the table type example.
You could use VALUES directly:
INSERT INTO example(a, b)
VALUES ('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL);
DBFiddle Demo
Or just cast once:
INSERT INTO example(a, b)
SELECT a::uuid, b::numeric
FROM (VALUES ('d853b5a8-d453-11e7-9296-cec278b6b50a', NULL),
('bb53b5a8-d453-11e7-9296-cec278b6b50a',1) ) as data(a,b);
DBFiddle Demo2
Note: please always explicitly define the column list.