Compute value in SELECT based on whether a column is NOT NULL

I'm not sure of the correct terminology, but this should be clear when you see what I have so far.
SELECT
(
-- WHAT GOES HERE?
) as "Type",
COUNT(*) AS pie
FROM "people"
GROUP BY "Type"
ORDER BY COUNT(*) DESC, "Type"
I'm trying to classify people based on whether or not they have a value in any of these columns:
employee_id
student_id
with these types being possible:
Employee
Student
Both Employee & Student
(As you might have guessed from the SQL, this is going to generate a pie graph, so instead of putting anyone in 2 categories, I have a category that includes the people who are both employees and students.)

I believe a CASE expression would be suitable:
CASE
    WHEN employee_id IS NOT NULL AND student_id IS NOT NULL THEN 'Both'
    WHEN employee_id IS NOT NULL THEN 'Employee'
    WHEN student_id IS NOT NULL THEN 'Student'
    ELSE 'None'
END AS "Type"
You can say GROUP BY 1 to group by the first expression in the SELECT list.
From https://www.postgresql.org/docs/10/sql-select.html ...
The optional GROUP BY clause has the general form
GROUP BY grouping_element [, ...]
GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. An expression used inside a grouping_element can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.
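Putting the two answers together, the complete query could look like this (keeping the 'Both' label from the CASE above; rename it to 'Both Employee & Student' if you prefer the wording from the question):
SELECT
    CASE
        WHEN employee_id IS NOT NULL AND student_id IS NOT NULL THEN 'Both'
        WHEN employee_id IS NOT NULL THEN 'Employee'
        WHEN student_id IS NOT NULL THEN 'Student'
        ELSE 'None'
    END AS "Type",
    COUNT(*) AS pie
FROM "people"
GROUP BY 1
ORDER BY COUNT(*) DESC, "Type";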


Why must a column appear in the GROUP BY?

I have this:
SELECT name, value,
MIN(value) as find_min
FROM history
WHERE date_num >= 1609459200
AND date_num <= 1640995200
AND name IN('A')
GROUP BY name
Trying to get the minimum value between dates for each subject separately:
name  value
A     3
B     4
C     9
A     0
C     2
I keep getting this popular error:
column "history.value" must appear in the GROUP BY clause or be used in an aggregate function
I read this: must appear in the GROUP BY clause or be used in an aggregate function
and I still do not understand:
Why do I have to include everything in GROUP BY? What is the logic?
Why is this not working?
Is MIN() OVER (PARTITION BY name) better, and if so, how can I get only a single result per name?
EDIT:
If I try GROUP BY name, find_min, it fails as well, even though in this case it could produce a unique result (the values are all the same).
That is actually easy to understand.
When you say GROUP BY name, all rows where name is the same are grouped together to form a single result row. Now the original table could contain two rows with the same name, but different values. If you add value to the SELECT list, which of those should be output? On the other hand, determining min(value) for each group is no problem.
Even if there is only a single value for the whole group (like with your find_min), you have to add the column to GROUP BY.
There is actually one exception: if the primary key of a table is in the GROUP BY clause, other columns from that table need not be in GROUP BY, because the primary key guarantees that there can be no differing values within a group.
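As for the MIN() OVER (PARTITION BY name) variant asked about in the question: the window function attaches the per-name minimum to every row without collapsing them, so you need DISTINCT to get a single result per name. A sketch, reusing the query from the question:
SELECT DISTINCT name,
       MIN(value) OVER (PARTITION BY name) AS find_min
FROM history
WHERE date_num >= 1609459200
  AND date_num <= 1640995200
  AND name IN ('A');
For a plain per-name minimum, though, GROUP BY is usually the simpler choice.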
Try something like this:
SELECT name,
MIN(value) as find_min
FROM history
WHERE date_num >= 1609459200 AND date_num <= 1640995200
GROUP BY name
I removed name IN ('A') because you are searching for the minimum value of every name; keeping that filter would restrict the result to just 'A'.
To answer your question: GROUP BY collapses rows that share the same values in the grouped columns into a single result row.
For example this table:
A B C
a d 1
a k 2
b d 3
And you have the query:
SELECT A, B, MIN(C)
FROM t
GROUP BY A
this would not work: there is no decisive answer for what to do with the entry a k 2, because you group by column A but not by column B, and the group for a now contains two entries with different B values. Therefore you have to group by every column that is not wrapped in an aggregate such as MIN, MAX or SUM.
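For the small example table, either of the following would be valid; which one you want depends on the result you actually need (a sketch only):
-- Group by both non-aggregated columns:
SELECT A, B, MIN(C)
FROM t
GROUP BY A, B;

-- ... or aggregate column B as well, so that only A determines the groups:
SELECT A, MIN(B), MIN(C)
FROM t
GROUP BY A;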

How to execute a select with a WHERE using a not-always-existing column

Simple example: I have some (nearly) identical tables with personal data (age, name, weight, ...)
Now I have a simple, but long SELECT to find missing data:
Select ID
from personal_data_a
where
born is null
or age < 1
or weight > 500
or (name = 'John' and surname = 'Doe')
Now the problem is:
I have some personal_data tables where the column "surname" does not exist, but I want to use the same SQL statement for all of them. So I have to check (inside the WHERE clause) that the last OR condition is only used if the column surname exists.
Can it be done in a simple way?
You should have all people in the same table.
If you can't do that for some reason, consider creating a view. Something like this:
CREATE OR REPLACE VIEW v_personal_data
AS
SELECT id,
born,
name,
surname,
age,
weight
FROM personal_data_a
UNION ALL
SELECT id,
born,
name,
NULL AS surname, --> this table doesn't contain surname
age,
weight
FROM personal_data_b;
and then
SELECT id
FROM v_personal_data
WHERE born IS NULL
OR age < 1
OR ( name = 'John'
AND ( surname = 'Doe'
OR surname IS NULL))
Can it be done in a simple way?
No, SQL statements work with a fixed set of columns and will raise an error if you refer to a column that does not exist.
You will either:
need to have a different query for tables with the surname column and those without;
have to check in the data dictionary whether the table has the column or not and then use dynamic SQL to build your query (see the sketch after this list); or
have to build a VIEW of the tables which do not have that column, adding the column to the view (or add a GENERATED surname column with a NULL value to the tables that are missing it), and use that instead.
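For the second option, a rough PL/SQL sketch (assuming Oracle; the table name is hard-coded here, and simply printing the number of matching IDs is enough for illustration):
DECLARE
  v_has_surname NUMBER;
  v_sql         VARCHAR2(4000);
  v_ids         SYS.ODCINUMBERLIST;
BEGIN
  -- Check the data dictionary: does this table have a SURNAME column?
  SELECT COUNT(*)
    INTO v_has_surname
    FROM user_tab_columns
   WHERE table_name = 'PERSONAL_DATA_A'
     AND column_name = 'SURNAME';

  -- Build the statement, adding the surname predicate only when the column exists.
  v_sql := q'[select id from personal_data_a
              where born is null
                 or age < 1
                 or weight > 500
                 or (name = 'John']';
  IF v_has_surname = 1 THEN
    v_sql := v_sql || q'[ and surname = 'Doe']';
  END IF;
  v_sql := v_sql || ')';

  EXECUTE IMMEDIATE v_sql BULK COLLECT INTO v_ids;
  DBMS_OUTPUT.PUT_LINE(v_ids.COUNT || ' rows with missing data');
END;
/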
While dynamic predicates are usually best handled by the application or by custom PL/SQL objects that use dynamic SQL, you can solve this problem with a single SQL statement using DBMS_XMLGEN, XMLTABLE, and the data dictionary. The following code is not what I would call "simple", but it is simple in the sense that it does not require any schema changes.
--Get the ID column from a PERSONAL table.
--
--#4: Get the IDs from the XMLType.
select id
from
(
--#3: Convert the XML to an XMLType.
select xmltype(personal_xml) personal_xmltype
from
(
--#2: Convert the SQL to XML.
select dbms_xmlgen.getxml(v_sql) personal_xml
from
(
--#1: Use data dictionary to create SQL statement that may or may not include
-- the surname predicate.
select max(replace(replace(
q'[
Select ID
from #TABLE_NAME#
where
born is null
or age < 1
or weight > 500
or (name = 'John' #OPTIONAL_SURNAME_PREDICATE#)
]'
, '#TABLE_NAME#', table_name)
, '#OPTIONAL_SURNAME_PREDICATE#', case when column_name = 'SURNAME' then
'and surname = ''Doe''' else null end)) v_sql
from all_tab_columns
--Change this literal to the desired table.
where table_name = 'PERSONAL_DATA_A'
)
)
where personal_xml is not null
)
cross join xmltable
(
'/ROWSET/ROW'
passing personal_xmltype
columns
id number path 'ID'
);
See this db<>fiddle for a runnable example.

SQL query to get conflicting values in JSONB from a group

I have a table defined similar to the one below. location_id is a FK to another table. Reports are saved in an N+1 fashion: for a single location, N reporters are available, and there's one report used as the source of truth, if you will. Reports from reporters have a single-letter code (let's say R), the source of truth has a different code (let's say T). The keys for the JSONB column are regular strings, values are any combination of strings, integers and integral arrays.
create table report (
id integer not null primary key,
location_id integer not null,
report_type char(1),
data jsonb
)
Given all the information above, how can I get all location IDs where the data values for a given set of keys (supplied at query time) are not all the same for the report_type R?
There are at least two solid approaches, depending on how complex you want to get and how numerous and/or dynamic the keys are. The first is very straightforward:
select location_id
from report
where report_type = 'R'
group by location_id
having count(distinct data->'key1') > 1
or count(distinct data->'key2') > 1
or count(distinct data->'key3') > 1
The second construction is more complex, but has the advantage of taking the keys as a simple list:
--note that we also need distinct on location id to return one row per location
select distinct on(location_id) location_id
--jsonb_each returns the key, value pairs with value in type JSON (or JSONB) so the value field can handle integers, text, arrays, etc
from report, jsonb_each(data)
where report_type = 'R'
and key in('key1', 'key2', 'key3')
group by location_id, key
having count(distinct value) > 1
order by location_id
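For example, with some hypothetical rows (key names and values made up for illustration), both queries return only the location whose reporters disagree:
insert into report (id, location_id, report_type, data) values
  (1, 10, 'R', '{"key1": "a", "key2": 1}'),  -- location 10: the two 'R' reports
  (2, 10, 'R', '{"key1": "b", "key2": 1}'),  -- disagree on key1 -> returned
  (3, 20, 'R', '{"key1": "a", "key2": 2}'),  -- location 20: the two 'R' reports
  (4, 20, 'R', '{"key1": "a", "key2": 2}');  -- agree on every key -> not returned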

Determine the number of times a null value occurs in column B for a distinct value in column A, SQL table

I have a SQL table with "name" as one column, date as another, and location as a third. The location column supports null values.
I am trying to write a query to determine the number of times a null value occurs in the location column for each distinct value in the name column.
Can someone please assist?
One method uses conditional aggregation:
select name, sum(case when location is null then 1 else 0 end)
from t
group by name;
Another method that involves slightly less typing is:
select name, count(*) - count(location)
from t
group by name;
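If your database supports the SQL-standard FILTER clause on aggregates (PostgreSQL does, for example), the same count can also be written as:
select name, count(*) filter (where location is null) as null_count
from t
group by name;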
Use COUNT along with a filter in the WHERE clause, since you only need the NULL occurrences (note that names without any NULL location will not appear in the result):
select name, count(*) as occurrences
from mytable
where location is null
group by name
From your question, you'll want to get the distinct list of names, and then a count of how many NULLs there are for each name.
The following will achieve this:
SELECT name, count(*) as null_counts
FROM table
WHERE location IS NULL
GROUP BY name
The WHERE clause will only retrieve records whose location is NULL.
The GROUP BY will group the remaining records by NAME.
The SELECT will give you the name and the COUNT(*) of those records per name.

Select another column value from the same table if another column value is NULL

I have a table with four columns: NAME, AGE, PRIMARYWEIGHT and SECONDARYWEIGHT
Where NAME = 'Damian', I wish to select AGE and PRIMARYWEIGHT only if SECONDARYWEIGHT is NULL; otherwise I'll take SECONDARYWEIGHT.
Ideally I'd like to give its an alias 'WEIGHT' regardless of whether it was PRIMARYWEIGHT or SECONDARYWEIGHT.
SELECT NAME, AGE, ISNULL(PRIMARYWEIGHT, SECONDARYWEIGHT) AS WEIGHT
msdn reference
SELECT AGE, COALESCE(SECONDARYWEIGHT, PRIMARYWEIGHT) AS WEIGHT
You can use COALESCE (as indicated in your tag)
Evaluates the arguments in order and returns the current value of the first expression that initially does not evaluate to NULL.
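A complete statement might then look like this (the table name person_data is an assumption, since the question doesn't name the table; the COALESCE order follows the second answer above):
SELECT NAME,
       AGE,
       COALESCE(SECONDARYWEIGHT, PRIMARYWEIGHT) AS WEIGHT
FROM   person_data   -- hypothetical table name
WHERE  NAME = 'Damian';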