Count the number of attributes that are NULL for a row - sql

I want to add a new column to a table to record the number of attributes whose values are null for each tuple (row). How can I use SQL to get that number?
for example, if a tuple is like this:
Name | Age | Sex
-----+-----+-----
Blice| 100 | null
I want to update the tuple as this:
Name | Age | Sex | nNULL
-----+-----+-----+--------
Blice| 100 | null| 1
Also, because I'm writing a PL/pgSQL function and the table name is obtained from an argument, I don't know the schema of the table beforehand. That means I need to update the table given only the input table name. Anyone know how to do this?

Possible without spelling out columns. Unpivot columns to rows and count.
The aggregate function count(<expression>) only counts non-null values, while count(*) counts all rows. The shortest and fastest way to count NULL values for more than a few columns is count(*) - count(col) ...
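To illustrate that building block with the question's example columns (a sketch; note this counts NULLs per column across all rows, while the solutions below count per row):
SELECT count(*) - count(name) AS name_nulls
     , count(*) - count(age)  AS age_nulls
     , count(*) - count(sex)  AS sex_nulls
FROM   tbl;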
Works for any table with any number of columns of any data types.
In Postgres 9.3+ with built-in JSON functions:
SELECT *, (SELECT count(*) - count(v)
           FROM   json_each_text(row_to_json(t)) x(k,v)) AS ct_nulls
FROM   tbl t;
What is x(k,v)?
json_each_text() returns a set of rows with two columns. Default column names are key and value, as documented in the manual. I provided table and column aliases so we don't have to rely on default names. The second column is named v.
Or, in any Postgres version since at least 8.3 with the additional module hstore installed, even shorter and a bit faster:
SELECT *, (SELECT count(*) - count(v) FROM svals(hstore(t)) v) AS ct_nulls
FROM tbl t;
This simpler version only returns a set of single values. I only provide a simple alias v, which automatically serves as both table and column alias.
Best way to install hstore on multiple schemas in a Postgres database?
Since the additional column is functionally dependent, I would consider not persisting it in the table at all. Rather, compute it on the fly as demonstrated above, or create a tiny function with a polymorphic input type for the purpose:
CREATE OR REPLACE FUNCTION f_ct_nulls(_row anyelement)
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (count(*) - count(v))::int FROM svals(hstore(_row)) v';
(PARALLEL SAFE only for Postgres 9.6 or later.)
Then:
SELECT *, f_ct_nulls(t) AS ct_nulls
FROM tbl t;
You could wrap this into a VIEW ...
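For instance, a minimal sketch (the view name is my invention):
CREATE VIEW tbl_with_null_count AS
SELECT *, f_ct_nulls(t) AS ct_nulls
FROM   tbl t;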
db<>fiddle here - demonstrating all
Old sqlfiddle
This should also answer your second question:
... the table name is obtained from an argument, I don't know the schema of the table beforehand. That means I need to update the table given only the input table name.

In Postgres, you can express this as:
select t.*,
       ((name is null)::int +
        (age is null)::int +
        (sex is null)::int
       ) as numnulls
from tbl t;
In order to implement this on an unknown table, you will need to use dynamic SQL and obtain the list of columns (say, from information_schema.columns).
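A minimal sketch of that building step, reusing the example table's name (the aggregated expression would then be embedded in an UPDATE and run with EXECUTE):
SELECT string_agg(format('(%I is null)::int', column_name), ' + ')
FROM   information_schema.columns
WHERE  table_schema = 'public'
AND    table_name = 'tbl';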

Function to add column automatically
This is an audited version of what @winged panther posted, per request.
The function adds a column with given name to any existing table that the calling role has the necessary privileges for:
CREATE OR REPLACE FUNCTION f_add_null_count(_tbl regclass, _newcol text)
RETURNS void AS
$func$
BEGIN
-- add new col
EXECUTE format('ALTER TABLE %s ADD COLUMN %I smallint', _tbl, _newcol);
-- update new col with dynamic count of nulls
EXECUTE (
SELECT format('UPDATE %s SET %I = (', _tbl, _newcol) -- regclass used as text
|| string_agg(quote_ident(attname), ' IS NULL)::int + (')
|| ' IS NULL)::int'
FROM pg_catalog.pg_attribute
WHERE attnum > 0
AND NOT attisdropped
AND attrelid = _tbl -- regclass used as OID
AND attname <> _newcol -- no escaping here, it's the *text*!
);
END
$func$ LANGUAGE plpgsql;
SQL Fiddle demo.
How to treat identifiers properly
Sanitize identifiers with a cast to regclass, format() with %I, or quote_ident().
I am using all three techniques in the example; each happens to be the best choice where it is used. More here:
Table name as a PostgreSQL function parameter
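A compact illustration of all three techniques in one hedged sketch (table and column names are invented; assumes a table public.tbl with a column name exists):
DO
$$
DECLARE
   _tbl regclass := 'public.tbl';  -- cast to regclass validates the table name
   _col text     := 'name';
   _ct  bigint;
BEGIN
   -- format() with %I escapes the identifier
   EXECUTE format('SELECT count(%I) FROM %s', _col, _tbl) INTO _ct;
   -- quote_ident() does the same outside of format()
   RAISE NOTICE 'non-null values in %: %', quote_ident(_col), _ct;
END
$$;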
Other points
I am basing my query on pg_catalog.pg_attribute, but that's an optional decision with pros and cons. It makes my query simpler and faster because I can use the OID of the table. Related:
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
You have to exclude the newly added column from the count, or the count will be off by one.
Using data type smallint for the count, since there cannot be more than 1600 columns in a table.
I don't use a variable but execute the result of the SELECT statement directly. Assignments are comparatively expensive in plpgsql. Not a big deal, though. Also a matter of taste and style.
I make it a habit to prepend parameters and variables with an underscore (_tbl) to rule out ambiguity between variables and column names.

I just created a function to implement the OP's requirement, based on Gordon Linoff's answer, with the following table and data:
Table det:
CREATE TABLE det (
name text,
age integer,
sex text
);
Data:
insert into det (name,age,sex) values
('Blice',100,NULL),
('Glizz',NULL,NULL),
(NULL,NULL,NULL);
Function:
create or replace function fn_alter_nulls(tbl text, new_col text) returns void as
$$
declare vals text;
begin
-- dynamically build the list of (column is null)::int expressions
-- (before the new column is added, so it is not included)
select string_agg(format('(%I is null)::int', column_name), ' + ') into vals
from information_schema.columns
where table_schema = 'public' and table_name = tbl; -- information_schema only covers the current database
-- add the new column
execute format('alter table %I add column %I int', tbl, new_col);
-- update the new column
execute format('update %I set %I = (%s)', tbl, new_col, vals);
end;
$$
language plpgsql;
Function call:
select fn_alter_nulls('det','nnulls')
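Afterwards, the table should look like this (derived from the sample data above):
select * from det;

 name  | age | sex | nnulls
-------+-----+-----+--------
 Blice | 100 |     |      1
 Glizz |     |     |      2
       |     |     |      3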

Since the null count is derived data and simple/cheap to determine at query time, why not create a view:
create view MyTableWithNullCount as
select
*,
case when nullableColumn1 is null then 1 else 0 end +
case when nullableColumn2 is null then 1 else 0 end +
...
case when nullableColumnn is null then 1 else 0 end as nNull
from myTable
And just use the view instead.
This has the upside of not having to write triggers/code to maintain a physical null count column, which will be a bigger headache than this approach.

Related

Creating columns from an enum type in PostgreSQL

I have an enumerated type on PostgreSQL, and I want to create a view that has a column for each enumerated value.
My use case is similar to this question. I have a jsonb column that I want to turn into a view with columns made up of the keys of the json blob. The difference in my case is that the valid keys are defined in an enum rather than aggregated from the objects themselves.
The following SQL statement is essentially what I want to do, but doesn't work:
SELECT json_populate_record(null::activity_type_enum, activities) from some_table;
Is there a way to cast the enumerated type into what is expected by the first argument of json_populate_record?
That needs some dynamic SQL.
Assuming all the columns in the view should have text type.
1. We get the enum members as an array with enum_range() and transform them into a set using unnest().
2. To each enum member in the set we append ' text' and use string_agg() to build a comma-separated list of them. Like that, we get a column definition list.
3. We build a CREATE VIEW statement selecting from the table, with a lateral cross join to json_to_record(), aliased with the column definition list we built.
4. We execute the CREATE VIEW statement with EXECUTE.
Together we get the following DO block:
DO
$$
BEGIN
EXECUTE '
CREATE VIEW some_view
AS
SELECT x.*
FROM some_table t
CROSS JOIN LATERAL json_to_record(t.activities) x(' || (SELECT string_agg(un.m || ' text', ', ')
FROM unnest(enum_range(NULL::activity_type_enum)) un(m)) || ');
';
END;
$$
LANGUAGE plpgsql;
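To make the mechanics concrete: if the enum had, say, the two members walking and running (invented values for illustration), the statement assembled and executed above would be:
CREATE VIEW some_view
AS
SELECT x.*
FROM some_table t
CROSS JOIN LATERAL json_to_record(t.activities) x(walking text, running text);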
db<>fiddle
Replace json_to_record() with jsonb_to_record(), if the type of activities is jsonb rather than json.
If the enum changes, however, the DO block has to be rerun for the view to reflect the changes.

Pass table name used in FROM to function automatically?

Working with PostgreSQL 9.6.3. I am new to functions in databases.
Let's say there are multiple tables of item numbers. Each one has the item number, the item cost and several other columns which are factored into the "additional cost". I would like to put the calculation into a function so I can call it for any of these tables.
So instead of:
SELECT
itemnumber,
itemname,
base,
CASE
WHEN labor < 100 AND overhead < .20 THEN
WHEN .....
WHEN .....
WHEN .....
.....
END AS add_cost,
gpm
FROM items1;
I can just do:
SELECT
itemnumber,
itemname,
base,
calc_add_cost(),
gpm
FROM items1;
If I want to be able to use it on any of the item tables, I guess I would need to set a table_name parameter that the function takes since adding the table name into the function would be undesirable to say the least.
calc_add_cost(items1)
However, is there a simpler way such that when I call calc_add_cost() it will just use the table name from the FROM clause?
SELECT ....., calc_add_cost(item1) FROM item1
Just seems redundant.
I did come across a few topics with titles that sounded like they addressed what I was hoping to accomplish, but upon reviewing them it looked like they were a different issue.
You can even emulate a "computed field" or "generated column" like you had in mind. Basics here:
Store common query as column?
Simple demo for one table:
CREATE OR REPLACE FUNCTION add_cost(items1) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Call:
SELECT *, t.add_cost FROM items1 t;
Note the table-qualification in t.add_cost. I only demonstrate this syntax variant since you have been asking for it. My advice is to use the less confusing standard syntax:
SELECT *, add_cost(t) AS add_cost FROM items1 t; -- column alias is also optional
However, SQL is a strictly typed language. If you define a particular row type as input parameter, the function is bound to that particular row type. Passing various whole-table types is more sophisticated, but still possible with a polymorphic input type:
CREATE OR REPLACE FUNCTION add_cost(ANYELEMENT) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Same call for any table that has the columns labor and overhead with matching data type.
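For example, the same function works unchanged on a second hypothetical table items2, as long as it has those two columns:
SELECT *, add_cost(t) AS add_cost FROM items2 t;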
dbfiddle here
Also see the related simple case passing simple values here:
How to put part of a SELECT statement into a Postgres function
For even more complex requirements - like also returning various row types - see:
Refactor a PL/pgSQL function to return the output of various SELECT queries

Conditionally delete item inside an Array Field PostgreSQL

I'm building a kind of dictionary app and I have a table for storing words like below:
id | surface_form | examples
-----------------------------------------------------------------------
1 | sounds | {"It sounds as though you really do believe that",
| | "A different bell begins to sound midnight"}
Where surface_form is of type CHARACTER VARYING and examples is an array field of CHARACTER VARYING
Since the examples are generated automatically from another API, they might not contain the exact surface_form. Now I want to keep in examples only sentences that contain the exact surface_form. For instance, in the given example, only the first sentence should be kept, as it contains sounds; the second should be omitted, as it only contains sound.
The problem is I got stuck on how to write a query and/or PL/pgSQL stored procedure to update the examples column so that it only has the desired sentences.
This query skips unwanted array elements:
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id;
id | new_examples
----+----------------------------------------------------
1 | {"It sounds as though you really do believe that"}
(1 row)
Use it in update:
with corrected as (
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id
)
update a_table
set examples = new_examples
from corrected
where examples <> new_examples
and a_table.id = corrected.id;
Test it in rextester.
Maybe you have to change the table design. This is what PostgreSQL's documentation says about the use of arrays:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Documentation:
https://www.postgresql.org/docs/current/static/arrays.html
The most compact solution (but not necessarily the fastest) is to write a function that you pass a regular expression and an array and which then returns a new array that only contains the items matching the regex.
create function get_matching(p_values text[], p_pattern text)
returns text[]
as
$$
declare
l_result text[] := '{}'; -- make sure it's not null
l_element text;
begin
foreach l_element in array p_values loop
-- adjust this condition to whatever you want
if l_element ~ p_pattern then
l_result := l_result || l_element;
end if;
end loop;
return l_result;
end;
$$
language plpgsql;
The if condition is only an example. You need to adjust it to whatever exactly you store in the surface_form column. Maybe you need to test on word boundaries with the regex, or a simple strpos() would do - your question is unclear about that.
Cleaning up the table then becomes as simple as:
update the_table
set examples = get_matching(examples, surface_form);
But the whole approach seems flawed to me. It would be a lot more efficient if you stored the examples in a properly normalized data model.
In SQL, you have to remember two things.
1. Tuple elements are immutable, but rows are mutable via updates.
2. SQL is declarative, not procedural.
So you cannot "conditionally" "delete" a value from an array. You have to think about the question differently. You have to create a new array following a specification. That specification can conditionally include values (using CASE expressions). Then you can overwrite the tuple with the new array, as in the sketch below.
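A minimal sketch of that idea, reusing the question's table and columns (the regex word boundaries \m and \M are one way to demand an exact word match; this assumes surface_form contains no regex metacharacters):
UPDATE a_table
SET examples = (
   SELECT coalesce(array_agg(e), '{}')
   FROM   unnest(examples) e
   WHERE  e ~ ('\m' || surface_form || '\M')
);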
Looks like one way could be to update the array with the array elements that are valid, by doing a select using LIKE or some regular expression.
https://www.postgresql.org/docs/current/static/arrays.html
If you want to keep only the array elements that contain the surface_form, you have to keep the entries for which substring(....,...) is not null.
First you unnest the array, keep only the items that match, and then array_agg the remaining items.
Here is a little query you can run to test without any table.
SELECT
id,
surface_form,
(SELECT array_agg(examples_matching)
FROM unnest(surfaces.examples) AS examples_matching
WHERE substring(examples_matching, surfaces.surface_form) IS NOT NULL)
FROM
(SELECT
1 AS id,
'example' :: TEXT AS surface_form,
ARRAY ['example form', 'test test','second example form'] :: TEXT [] AS examples
) surfaces;
You can select the data into a temp table, update the temp table using an update query on the row number, then merge the values back into an array. That merged value you can then write back to the original table.
For example, suppose you create a temp table:
Temp (id int, element character varying)
Then you update the Temp table and nest it.
Finally, you update the original table.
Here is a query you can try directly in your editor:
CREATE TEMP TABLE IF NOT EXISTS temp_element (
id bigint,
element character varying) WITH (OIDS);
TRUNCATE TABLE temp_element;
insert into temp_element select row_number() over (order by p),p from (
select unnest(ARRAY['It sounds as though you really do believe that',
'A different bell begins to sound midnight']) as P)t;
update temp_element set element = 'It sounds as though you really'
where element = 'It sounds as though you really do believe that';
--update table
select array_agg(r) from (select element from temp_element) r;

Is it possible to look up a table-valued function's return columns in SAP HANA's dictionary views?

I've created a table-valued function in SAP HANA:
CREATE FUNCTION f_tables
RETURNS TABLE (
column_value INTEGER
)
LANGUAGE SQLSCRIPT
AS
BEGIN
RETURN SELECT 1 column_value FROM SYS.DUMMY;
END
Now I'd like to be able to discover the function's table type using the dictionary views. I can run this query here:
select *
from function_parameters
where schema_name = '[xxxxxxxxxx]'
and function_name = 'F_TABLES'
order by function_name, position;
Which will yield something like:
PARAMETER_NAME TABLE_TYPE_SCHEMA TABLE_TYPE_NAME
---------------------------------------------------------------------
_SYS_SS2_RETURN_VAR_ [xxxxxxxxxx] _SYS_SS_TBL_[yyyyyyy]_RET
Unfortunately, I cannot seem to look up that _SYS_SS_TBL_[yyyyyyy]_RET table in SYS.TABLES (and TABLE_COLUMNS), SYS.VIEWS (and VIEW_COLUMNS), SYS.DATA_TYPES, etc. in order to find the definitions of the individual columns.
Note that explicitly named table types created using CREATE TYPE ... do appear in SYS.TABLES...
Is there any way for me to formally look up a table-valued function's return columns? I'm not interested in parsing the source, obviously.
These kinds of tables are internal row-store tables, so you can only find your _SYS_SS_TBL_[yyyyyyy]_RET table in SYS.RS_TABLES_. This will give you some basic information, including a column ID (CID). This value is important for finding the column information.
For example, if your CID is 100, you can find column information in the RS_COLUMNS_ table with this query:
SELECT * FROM SYS.RS_COLUMNS_ WHERE CID = 100
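Putting the two lookups together (a sketch; I am assuming RS_TABLES_ exposes the table name and the CID under these column names):
-- 1. find the internal table and note its CID
SELECT * FROM SYS.RS_TABLES_ WHERE TABLE_NAME = '_SYS_SS_TBL_[yyyyyyy]_RET';
-- 2. look up the column definitions for that CID (here: 100)
SELECT * FROM SYS.RS_COLUMNS_ WHERE CID = 100;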

How to get unique values from each column based on a condition?

I have been trying to find an optimal solution to select unique values from each column. My problem is I don't know the column names in advance, since different tables have different numbers of columns. So first, I have to find the column names, and I could use the query below to do it:
select column_name from information_schema.columns
where table_name='m0301010000_ds' and column_name like 'c%'
Sample output for column names:
c1, c2a, c2b, c2c, c2d, c2e, c2f, c2g, c2h, c2i, c2j, c2k, ...
Then I would use the returned column names to get the unique/distinct values in each column, not just distinct rows.
I know a simple but lousy way is to write select distinct column_name from table where column_name = 'something' for every single column (around 20-50 times), and it's very time-consuming too. Since I can't use more than one distinct per column_name, I am stuck with this old-school solution.
I am sure there is a faster and more elegant way to achieve this; I just couldn't figure out how. I would really appreciate any help on this.
You can't just return rows, since distinct values don't go together any more.
You could return arrays, which can be had simpler than you may have expected:
SELECT array_agg(DISTINCT c1) AS c1_arr
,array_agg(DISTINCT c2a) AS c2a_arr
,array_agg(DISTINCT c2b) AS c2b_arr
, ...
FROM m0301010000_ds;
This returns distinct values per column. One array (possibly big) for each column. All connections between values in columns (what used to be in the same row) are lost in the output.
Build SQL automatically
CREATE OR REPLACE FUNCTION f_build_sql_for_dist_vals(_tbl regclass)
RETURNS text AS
$func$
SELECT 'SELECT ' || string_agg(format('array_agg(DISTINCT %1$I) AS %1$I_arr'
, attname)
, E'\n ,' ORDER BY attnum)
|| E'\nFROM ' || _tbl
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
$func$ LANGUAGE sql;
Call:
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds');
Returns an SQL string as displayed above.
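In psql 9.6 or later you can feed the generated statement straight back to the server with \gexec (a usage sketch):
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds')\gexec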
I use the system catalog pg_attribute instead of the information schema. And the object identifier type regclass for the table name. More explanation in this related answer:
PLpgSQL function to find columns with only NULL values in a given table
If you need this in "real time", you won't be able to achieve it with a query that has to do a full table scan.
I would advise you to create a separate table containing the distinct values for each column (initialized with the SQL from @Erwin Brandstetter ;) and maintain it using a trigger on the original table.
Your new table will have one column per field. The number of rows will be equal to the maximum number of distinct values for one field.
On insert: for each field to maintain, check whether the value is already there. If not, add it.
On update: for each field to maintain whose old value differs from the new value, check whether the new value is already there. If not, add it. For the old value, check whether any other row still has it, and if not, remove it from the list (set the field to null).
On delete: for each field to maintain, check whether any other row has that value, and if not, remove it from the list (set the value to null).
This way the load is mainly moved to the trigger, and queries on the value-list table will be super fast. A sketch of the insert path follows below.
P.S.: Make sure to run all your trigger SQL through EXPLAIN to make sure it uses the best possible index and execution plan. For update/delete, just check whether the old value exists (LIMIT 1).
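To illustrate just the insert path, here is a minimal hedged sketch (all names invented: main table tbl with column c1, value-list table tbl_dist with column c1):
CREATE OR REPLACE FUNCTION trg_maintain_dist()
RETURNS trigger AS
$$
BEGIN
   -- add the new value to the distinct list if it is not there yet
   IF NOT EXISTS (SELECT 1 FROM tbl_dist WHERE c1 = NEW.c1) THEN
      INSERT INTO tbl_dist (c1) VALUES (NEW.c1);
   END IF;
   RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER tbl_maintain_dist
AFTER INSERT ON tbl
FOR EACH ROW EXECUTE PROCEDURE trg_maintain_dist();

The update and delete paths follow the same pattern, with the existence checks described above.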