Majority voting of columns in SQL

I need to do something like 'majority voting' of columns in an SQL database. That means that, having columns c0, c1, ..., cn, I would like some other column to hold, for each row, the most frequent value among the mentioned columns (and null or a random value otherwise; it doesn't really matter). For example, if we have the following table:
+----+----+----+--------+
| c0 | c1 | c2 | result |
+----+----+----+--------+
|  0 |  1 |  0 |      0 |
|  0 |  1 |  1 |      1 |
|  2 |  2 |  0 |      2 |
|  0 |  3 |  1 |   null |
+----+----+----+--------+
That is what I mean by majority voting of columns c0, c1, c2: in the first row we have two columns with value 0 and one with value 1, so result = 0. In the second row we have one 0 vs. two 1's, ergo result = 1, and so on. We assume that all the columns have the same type.
It would be great if the query were concise (it can be built dynamically). Native SQL is preferred, but PL/SQL or psql will also do.
Thank you in advance.

This can easily be done by creating a table out of the three columns and using an aggregate function on that:
The following works in Postgres:
select c0,c1,c2,
(select c
from unnest(array[c0,c1,c2]) as t(c)
group by c
having count(*) > 1
order by count(*) desc
limit 1)
from the_table;
If you don't want to hard-code the column names, you can use Postgres' JSON functions as well:
select t.*,
(select t.v
from jsonb_each_text(to_jsonb(t)) as t(c,v)
group by t.v
having count(*) > 1
order by count(*) desc
limit 1) as result
from the_table t;
Note that the above takes all columns into account. If you want to remove specific columns (e.g. an id column) you need to use to_jsonb(t) - 'id' to remove that key from the JSON value.
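For example, a minimal sketch of that variant, assuming the table has an id column that should not take part in the vote:
select t.*,
       (select x.v
        from jsonb_each_text(to_jsonb(t) - 'id') as x(c,v)
        group by x.v
        having count(*) > 1
        order by count(*) desc
        limit 1) as result
from the_table t;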
Neither of those solutions deals with ties (two different values appearing the same number of times).
Online example: https://rextester.com/PJR58760
The first solution can be "adapted" somewhat to Oracle, especially if you can build the SQL on the fly:
select t.*,
(select c
from (
-- this part would need to be done dynamically
-- if you don't know the columns
select t.c0 as c from dual union all
select t.c1 from dual union all
select t.c2 from dual
) x
group by c
having count(*) > 1
order by count(*) desc
fetch first 1 rows only) as result
from the_table t;
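If the columns are not known in advance, the inner UNION ALL can be generated from the data dictionary. A rough PL/SQL sketch; the table name THE_TABLE and the C% column-name pattern are assumptions for illustration:
declare
  l_union varchar2(4000);
begin
  -- builds "select t.c0 as c from dual union all select t.c1 as c from dual ..."
  select listagg('select t.' || lower(column_name) || ' as c from dual',
                 ' union all ') within group (order by column_id)
    into l_union
    from user_tab_columns
   where table_name = 'THE_TABLE'
     and column_name like 'C%';
  dbms_output.put_line(l_union);  -- splice this into the outer query
end;
/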

In Postgres use jsonb functions. You need a primary key or unique column(s); id is unique in the example:
with my_table(id, c0, c1, c2) as (
values
(1, 0, 1, 0),
(2, 0, 1, 1),
(3, 2, 2, 0),
(4, 0, 3, 1)
)
select distinct on (id) id, value
from (
select id, value, count(*)
from my_table t
cross join jsonb_each_text(to_jsonb(t) - 'id')
group by id, value
) s
order by id, count desc
id | value
----+-------
1 | 0
2 | 1
3 | 2
4 | 1
(4 rows)
The query works well regardless of the number of columns.
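If several columns should be excluded from the vote (say id plus a hypothetical created_at timestamp), recent Postgres versions also offer the jsonb - text[] operator, which removes several keys at once; a sketch:
select distinct on (id) id, value
from (
  select id, value, count(*)
  from my_table t
  cross join jsonb_each_text(to_jsonb(t) - array['id', 'created_at'])
  group by id, value
) s
order by id, count desc;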

Here's a solution for Postgres.
SELECT t1.c0,
t1.c1,
t1.c2,
(SELECT y.c
FROM (SELECT x.c,
count(*) OVER (PARTITION BY x.rn) ct
FROM (SELECT v.c,
rank() OVER (ORDER BY count(v.c) DESC) rn
FROM (VALUES (t1.c0),
(t1.c1),
(t1.c2)) v(c)
GROUP BY v.c) x
WHERE x.rn = 1) y
WHERE y.ct = 1) result
FROM elbat t1;
db<>fiddle
In the subquery, rank() first picks all the values with the maximum count. The windowed count() is then used to check that only one value has that maximum count (i.e. there is no tie).
If you need to do this over more columns, just add them to the SELECT and the VALUES.
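For instance, a sketch assuming a fourth column c3 on the same table:
SELECT t1.c0, t1.c1, t1.c2, t1.c3,
       (SELECT y.c
        FROM (SELECT x.c,
                     count(*) OVER (PARTITION BY x.rn) ct
              FROM (SELECT v.c,
                           rank() OVER (ORDER BY count(v.c) DESC) rn
                    FROM (VALUES (t1.c0), (t1.c1), (t1.c2), (t1.c3)) v(c)
                    GROUP BY v.c) x
              WHERE x.rn = 1) y
        WHERE y.ct = 1) result
FROM elbat t1;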

THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.
You can just compare the values. For your example with two values neither of which is NULL:
select t.*,
(case when ((case when c0 = 0 then 1 else -1 end) +
(case when c1 = 0 then 1 else -1 end) +
(case when c2 = 0 then 1 else -1 end)
) > 0
then 0 else 1
end)
from t;

Related

Aggregate function of column that has at least one type of value in a different column

I have the table below:
ID | Blood_Type | Size
---+------------+-----
B2 | A          |  100
B2 | B          |  200
C2 | C          |  102
C2 | O          |   88
G4 | I          |   44
G4 | A          |  100
How can I query my table above to get the average of only the IDs that have at least one row with blood type A?
Expected output:
ID | Avg_Size
---+---------
B2 |      150
G4 |       72
Thanks!
Tim's answer is good; a simpler (albeit perhaps not how you would want to do it) other way is writing the HAVING in long form:
SELECT id,
avg_size
FROM (
SELECT id,
AVG(size) AS avg_size,
SUM(IFF(blood_type = 'A', 1, 0)) AS a_count
FROM table
GROUP BY id
)
WHERE a_count > 0;
So you can either use SUM or COUNT; they both ignore NULLs, which is the implicit result of Tim's CASE WHEN Blood_Type = 'A' THEN 1 END, as it is the same as
CASE
WHEN Blood_Type = 'A' THEN 1
ELSE NULL
END
If you use SUM it can handle NULLs or zeros, thus the IFF can be used, which I like as it's smaller and more explicit about what is happening.
Thus Tim's answer can be swapped to a SUM(IFF ...), like:
SELECT
id,
AVG(size) AS avg_size
FROM table
GROUP BY id
HAVING SUM(IFF(Blood_Type = 'A', 1, 0)) > 0;
You may try aggregation here with an assertion in the HAVING clause:
SELECT ID, AVG(Size) AS Avg_Size
FROM yourTable
GROUP BY ID
HAVING COUNT(CASE WHEN Blood_Type = 'A' THEN 1 END) > 0;
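Another way to express the same idea, as a sketch (not necessarily faster, just a different shape): restrict to the qualifying IDs first, then aggregate.
SELECT t.ID, AVG(t.Size) AS Avg_Size
FROM yourTable t
WHERE t.ID IN (SELECT ID FROM yourTable WHERE Blood_Type = 'A')
GROUP BY t.ID;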

Select distinct count after count?

I'll cut right to the chase: I have a select I'm currently writing with a rather lengthy where clause, and what I want to do is calculate percentages.
So what I need is the count of all results and then each of my distinct counts.
SELECT distinct count(*)
FROM mytable
WHERE mywhereclause
ORDER BY columnIuseInWhereClause
works fine for getting each individual value, but I want to avoid doing something like
Select (Select count(*) from mytable WHERE mywhereclause),
distinct count(*)
FROM mytable
WHERE mywhereclause
because I'd be using the same where-clause twice which just seems unnecessary.
This is for OracleDB but I'm only using standard SQL syntax, nothing database specific if I can help it.
Thanks for any ideas.
Edit:
Sample Data
ID | someValue
---+----------
 1 | A
 2 | A
 3 | B
 4 | C
I want the occurrences of A, B, C as numbers, as well as the overall count.
CountAll | ACounts | BCounts | CCounts
---------+---------+---------+--------
       4 |       2 |       1 |       1
So I can get to
100% | 50% | 25% | 25%
That last part I can probably figure out on my own. Excuse my lack of experience or even logical thinking, it's early in the morning. ;)
Edit2:
I have written a query that works, but it is clumsy and long as all holy heck; this one is about trying it with GROUP BY.
Try:
select count(*) as CountAll,
count(distinct SomeColumn) as CountDistinct -- The DISTINCT goes inside the brackets
from myTable
where SomeOtherColumn = 'Something'
Use case expressions to do conditional counting:
select count(*) as CountAll,
count(case when someValue = 'A' then 1 end) as ACounts,
count(case when someValue = 'B' then 1 end) as BCounts,
count(case when someValue = 'C' then 1 end) as CCounts
FROM mytable
WHERE mywhereclause
Wrap it up in a derived table to do the % part easily:
select 100,
ACounts * 100 / CountAll,
BCounts * 100 / CountAll,
CCounts * 100 / CountAll
from
(
select count(*) as CountAll,
count(case when someValue = 'A' then 1 end) as ACounts,
count(case when someValue = 'B' then 1 end) as BCounts,
count(case when someValue = 'C' then 1 end) as CCounts
FROM mytable
WHERE mywhereclause
) dt
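One caveat worth hedging: in some databases (e.g. SQL Server or Postgres, though not Oracle) dividing two integers truncates the result, so here is a sketch that forces decimal arithmetic and rounds the percentages:
select 100 as AllPct,
       round(ACounts * 100.0 / CountAll, 2) as APct,
       round(BCounts * 100.0 / CountAll, 2) as BPct,
       round(CCounts * 100.0 / CountAll, 2) as CPct
from
(
  select count(*) as CountAll,
         count(case when someValue = 'A' then 1 end) as ACounts,
         count(case when someValue = 'B' then 1 end) as BCounts,
         count(case when someValue = 'C' then 1 end) as CCounts
  FROM mytable
  WHERE mywhereclause
) dt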
Here's an alternative using a window function:
with data_table(ID, some_value)
AS
(SELECT 1,'A' UNION ALL
SELECT 2,'A' UNION ALL
SELECT 3,'B' UNION ALL
SELECT 4,'C'
)
SELECT DISTINCT [some_value],
COUNT([some_value]) OVER () AS Count_All,
COUNT([some_value]) OVER (PARTITION BY [some_value]) AS 'Counts' FROM [data_table]
ORDER BY [some_value]
The advantage is that you don't have to hard-code the individual [some_value] values.
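Since the question mentions OracleDB, here is a sketch of the same idea in Oracle-compatible syntax (no square brackets or quoted alias), reusing the asker's placeholder names:
SELECT DISTINCT someValue,
       COUNT(someValue) OVER ()                       AS count_all,
       COUNT(someValue) OVER (PARTITION BY someValue) AS counts
FROM mytable
WHERE mywhereclause
ORDER BY someValue;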

Case having subquery for Oracle not working

I am facing a problem with a CASE query in Oracle:
select
case
when substr(object_subtype,0,1) = '8'
then
'Planatias'
when substr(object_subtype,0,1) = '1'
then
'Licence'
when substr(object_subtype,0,1) = '4'
then
'PMA'
when substr(object_subtype,0,1) = '7'
then
'Location'
else
'no'
end objectType,
id ,substr(object_subtype,0,1)
from amatia_logtask order by 1
Now my problem is that I have 4 different tables, one for each number in the CASE:
select * from amatia_licencias ;
select * from amatia_locacion ;
select * from amatia_pma;
select id_plantilla from amatia_plantillas;
And I want a specific field from these 4 tables, with respect to their id, in the CASE statement. But a query like this
select
case
when substr(object_subtype,0,1) = '8'
then
select id_pma from amatia_plantillas where id_plantilla = substr(object_subtype,3)
when substr(object_subtype,0,1) = '1'
then
'Licence'
when substr(object_subtype,0,1) = '4'
then
'PMA'
when substr(object_subtype,0,1) = '7'
then
'Location'
else
'no'
end objectType,
id ,substr(object_subtype,0,1)
from amatia_logtask order by 1
is not working for me, giving this error:
ORA-00936: missing expression
00936. 00000 - "missing expression"
*Cause:
*Action:
Error at Line: 5 Column: 15
CASE statements return expressions: SELECT statements don't count as expressions in this context. So don't use CASE here, use outer joins.
You haven't provided enough details for us to guarantee working SQL so you'll have to pick the bones out of this:
with logtask as ( select id
, substr(object_subtype,0,1) as st_1
, substr(object_subtype,3) as st_3
from amatia_logtask
)
select logtask.id
, logtask.st_1
, coalesce ( apla.id_pma
, alic.id_blah
, aloc.id_meh
, apma.id_etc
, 'no' ) as whatever
from logtask
left join amatia_plantillas apla
on logtask.st_1 = apla.id_plantilla
left join amatia_licencias alic
on logtask.st_1 = alic.id_licencia
left join amatia_locacion aloc
on logtask.st_1 = aloc.id_locacion
left join amatia_pma apma
on logtask.st_1 = apma.id_pma
order by 3, 1
/
It is possible to use subqueries - BUT they MUST only have one row and one column. Think about the result of a query in columns/rows:
| COL1 | COL2 |
|------|------|
| a | x |
| b | y |
each "cell" holds one value (and one value only)
| COL1 | COL2 |
|------|----------------------------------------------------------------------|
| a | cannot be a "select *" subquery because that is more than one column |
| b | y |
So, IF you use subqueries in a SELECT clause, they can only return one value (one row, one column), so the subquery must be carefully written.
CREATE TABLE A_TABLE
("COL1" varchar2(1), "COL2" varchar2(1))
;
INSERT ALL
INTO A_TABLE ("COL1", "COL2")
VALUES ('a', 'x')
INTO A_TABLE ("COL1", "COL2")
VALUES ('b', 'y')
SELECT * FROM dual
;
Query 1:
select
case
when col1 = 'a' then (select 'subquery 1' from dual)
when col1 = 'b' then (select 'subquery 2' from dual)
else (select 'one value' from dual)
end as col1_case
from a_table
Results:
| COL1_CASE |
|------------|
| subquery 1 |
| subquery 2 |
http://sqlfiddle.com/#!4/76041/3
I'm NOT recommending it, I'm merely showing the method. I would use joins in preference if it is feasible.
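Applied to the question, the ORA-00936 disappears once the subquery is wrapped in parentheses and every branch returns the same data type. A sketch only: the to_char() cast is an assumption to keep the branches comparable, and it relies on id_plantilla being unique so the subquery returns at most one row:
select
  case
    when substr(object_subtype,0,1) = '8'
    then (select to_char(p.id_pma)
          from amatia_plantillas p
          where p.id_plantilla = substr(object_subtype,3))
    when substr(object_subtype,0,1) = '1' then 'Licence'
    when substr(object_subtype,0,1) = '4' then 'PMA'
    when substr(object_subtype,0,1) = '7' then 'Location'
    else 'no'
  end as objectType,
  id, substr(object_subtype,0,1)
from amatia_logtask
order by 1;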

Group by multiple criteria

Given the table like
| userid | active | anonymous |
|--------|--------|-----------|
|      1 | t      | f         |
|      2 | f      | f         |
|      3 | f      | t         |
I need to get:
number of users
number of users with 'active' = true
number of users with 'active' = false
number of users with 'anonymous' = true
number of users with 'anonymous' = false
with single query.
As for now, I have only come up with a solution using union:
SELECT count(*) FROM mytable
UNION ALL
SELECT count(*) FROM mytable where active
UNION ALL
SELECT count(*) FROM mytable where anonymous
So I can take the first number and find non-active and non-anonymous users by simple subtraction.
Is there any way to get rid of the union and calculate the number of records matching these simple conditions with some magic and an efficient query in PostgreSQL 9?
You can use an aggregate function with a CASE to get the result in separate columns:
select
count(*) TotalUsers,
sum(case when active = 't' then 1 else 0 end) TotalActiveTrue,
sum(case when active = 'f' then 1 else 0 end) TotalActiveFalse,
sum(case when anonymous = 't' then 1 else 0 end) TotalAnonTrue,
sum(case when anonymous = 'f' then 1 else 0 end) TotalAnonFalse
from mytable;
See SQL Fiddle with Demo
Assuming your columns are boolean NOT NULL, this should be a bit faster:
SELECT total_ct
,active_ct
,(total_ct - active_ct) AS not_active_ct
,anon_ct
,(total_ct - anon_ct) AS not_anon_ct
FROM (
SELECT count(*) AS total_ct
,count(active OR NULL) AS active_ct
,count(anonymous OR NULL) AS anon_ct
FROM tbl
) sub;
Find a detailed explanation for the techniques used in this closely related answer:
Compute percents from SUM() in the same SELECT sql query
Indexes are hardly going to be of any use, since the whole table has to be read anyway. A covering index might be of help if your rows are bigger than in the example. Depends on the specifics of your actual table.
-> SQLfiddle comparing to #bluefeet's version with CASE statements for each value.
SQL server folks are not used to the proper boolean type of Postgres and tend to go the long way round.
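As a side note, if you are on Postgres 9.4 or later (the question only says "PostgreSQL 9"), the aggregate FILTER clause expresses the same conditional counts even more directly; a sketch:
SELECT count(*)                              AS total_ct,
       count(*) FILTER (WHERE active)        AS active_ct,
       count(*) FILTER (WHERE NOT active)    AS not_active_ct,
       count(*) FILTER (WHERE anonymous)     AS anon_ct,
       count(*) FILTER (WHERE NOT anonymous) AS not_anon_ct
FROM tbl;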

How do I modify this query without increasing the number of rows returned?

I've got a sub-select in a query that looks something like this:
left outer join
(select distinct ID from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
It's pretty straightforward. Checks to see if a certain relation exists between the main object being queried for and the object represented by OTHER_TABLE by whether or not MYJOIN.ID is null on the row in question.
But now the requirements have changed a little. There's another column in OTHER_TABLE that can have a value of 1 or 0, and the query needs to know whether a relation exists between the primary object and OTHER_TABLE for a 1-value, and also whether it exists for a 0-value. The obvious solution is to put:
left outer join
(select distinct ID, TYPE_VALUE from OTHER_TABLE) as MYJOIN
on BASE_OBJECT.ID = MYJOIN.ID
But that would be wrong because if 0-type and 1-type objects both exist for the same ID, it will increase the number of rows returned by the query, which isn't acceptable. So what I need is some sort of subselect that will return 1 row for each distinct ID, with a "1-type exists" column and a "0-type exists" column. And I have no idea how to code that in SQL.
For example, for the following table,
ID | TYPE_VALUE
---+-----------
 1 | 1
 3 | 0
 3 | 1
 4 | 0
I'd like to see a result set like this:
ID | HAS_TYPE_0 | HAS_TYPE_1
---+------------+-----------
 1 |          0 |          1
 3 |          1 |          1
 4 |          1 |          0
Anyone know how I could set up a query to do this? Hopefully with a minimum of ugly hacks?
In the general case, you would use EXISTS:
SELECT DISTINCT ID,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 0 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_0,
CASE WHEN EXISTS (
SELECT * FROM Table1 y
WHERE y.TYPE_VALUE = 1 AND ID = x.ID)
THEN 1
ELSE 0 END AS HAS_TYPE_1
FROM Table1 x;
If you have a very large number of elements in the table, this won't perform so great - those nested subselects are often a kiss of death when it comes to performance.
For your specific case, you could also use GROUP BY and MAX() and MIN() to speed things up:
SELECT
ID,
CASE WHEN MIN(TYPE_VALUE) = 0 THEN 1 ELSE 0 END AS HAS_TYPE_0,
CASE WHEN MAX(TYPE_VALUE) = 1 THEN 1 ELSE 0 END AS HAS_TYPE_1
FROM Table1
GROUP BY ID;
Instead of select distinct ID, TYPE_VALUE from OTHER_TABLE
use
select ID,
MAX(CASE WHEN TYPE_VALUE = 0 THEN 1 ELSE 0 END) as has_type_0,
MAX(CASE WHEN TYPE_VALUE = 1 THEN 1 ELSE 0 END) as has_type_1
from OTHER_TABLE
GROUP BY ID;
You can do the same using the PIVOT operator...
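For example, a sketch using SQL Server's PIVOT syntax (Oracle 11g+ has a similar but not identical operator). PIVOT needs the aggregated column to be different from the pivoted column, hence TYPE_VALUE is duplicated as tv; the counts come out as 0/1 only if each (ID, TYPE_VALUE) pair appears at most once:
SELECT ID,
       [0] AS HAS_TYPE_0,
       [1] AS HAS_TYPE_1
FROM (SELECT ID, TYPE_VALUE, TYPE_VALUE AS tv
      FROM OTHER_TABLE) src
PIVOT (COUNT(tv) FOR TYPE_VALUE IN ([0], [1])) p;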