I have a rowset with two columns: technical_id and natural_id. The rowset is actually the result of a complex query. The mapping between the column values is assumed to be bijective (i.e. two rows with the same technical_id have the same natural_id, and distinct technical_ids have distinct natural_ids). The (technical_id, natural_id) pairs are not unique in the rowset because of joins in the original query. Example:
with t (technical_id, natural_id, val) as (values
(1, 'a', 1),
(1, 'a', 2),
(2, 'b', 3),
(2, 'b', 2),
(3, 'c', 0),
(3, 'c', 1),
(4, 'd', 1)
)
Unfortunately, the bijection is enforced only by application logic. The natural_id is actually collected from multiple tables and composed using a coalesce-based expression, so its uniqueness can hardly be enforced by a database constraint.
I need to aggregate the rows of the rowset by technical_id, assuming the natural_id is unique. If it isn't (for example, if the tuple (4, 'x', 1) were added to the sample data), the query should fail. In an ideal SQL world I would use some hypothetical aggregate function:
select technical_id, only(natural_id), sum(val)
from t
group by technical_id;
I know there is no such function in SQL. Is there some alternative or workaround? Postgres-specific solutions are also ok.
Note that group by technical_id, natural_id and select technical_id, max(natural_id) - though working well in the happy case - are both unacceptable (the first because the technical_id must be unique in the result under all circumstances, the second because the value is potentially arbitrary and masks data inconsistency).
Thanks for tips :-)
UPDATE: the expected answer is
technical_id,v,sum
1,a,3
2,b,5
3,c,1
4,d,1
or fail when 4,x,1 is also present.
You can get only the "unique" natural ids using:
select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having min(natural_id) = max(natural_id);
If you want the query to actually fail, that is a little hard to guarantee. Here is a hacky way to do it:
select technical_id, max(natural_id), sum(val)
from t
group by technical_id
having (case when min(natural_id) = max(natural_id) then 0 else 1 / (count(*) - count(*)) end) = 0;
And a db<>fiddle illustrating this.
It seems I've finally found a solution, based on the single-row cardinality requirement for a correlated subquery in the select clause:
select technical_id,
(select v from unnest(array_agg(distinct natural_id)) as u(v)) as natural_id,
sum(val)
from t
group by technical_id;
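For illustration, here is a sketch of the failure case, with the inconsistent row from the question added to the sample data (the exact error text may vary by Postgres version, but the point is that the query errors out instead of returning a row for technical_id 4):
with t (technical_id, natural_id, val) as (values
(1, 'a', 1), (1, 'a', 2),
(2, 'b', 3), (2, 'b', 2),
(3, 'c', 0), (3, 'c', 1),
(4, 'd', 1),
(4, 'x', 1)  -- the inconsistent row
)
select technical_id,
(select v from unnest(array_agg(distinct natural_id)) as u(v)) as natural_id,
sum(val)
from t
group by technical_id;
-- ERROR: more than one row returned by a subquery used as an expression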
This is the simplest solution for my situation at the moment, so I'll resort to a self-accept. If any disadvantages show up, I will describe them here and re-accept another answer. I appreciate all the other proposals and believe they will be valuable for others too.
You can use
SELECT technical_id, max(natural_id), count(distinct natural_id)
...
GROUP BY technical_id;
and throw an error whenever the count is not 1.
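Applied to the sample data from the question, a complete version of that check might look like the following (a sketch; counting distinct values so that duplicate but consistent pairs don't trigger the error, and leaving the actual error handling to the application):
select technical_id,
max(natural_id) as natural_id,
count(distinct natural_id) as id_count,  -- must be 1 for every group
sum(val)
from t
group by technical_id;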
If you want to guarantee the constraint with the database, you could do one of these:
Do away with the artificial primary key.
Do something complicated like this:
CREATE TABLE id_map (
technical_id bigint UNIQUE NOT NULL,
natural_id text UNIQUE NOT NULL,
PRIMARY KEY (technical_id, natural_id)
);
ALTER TABLE t
ADD FOREIGN KEY (technical_id, natural_id) REFERENCES id_map;
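With that schema in place, the two UNIQUE constraints on id_map make the mapping one-to-one, and the composite foreign key keeps t from ever holding an inconsistent pair. A rough sketch of the effect (assuming id_map is populated alongside t):
insert into id_map (technical_id, natural_id) values (4, 'd');  -- ok
insert into id_map (technical_id, natural_id) values (4, 'x');  -- fails: technical_id 4 is already mapped
-- rows in t must reference an existing (technical_id, natural_id) pair,
-- so the inconsistent row (4, 'x', 1) cannot be inserted into t either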
You can create your own aggregates. ONLY is a keyword, so it's best not to use it as the name of an aggregate. Not willing to put much time into deciding, I called it only2.
CREATE OR REPLACE FUNCTION public.only_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
if $1 is null then return $2; end if;
if $2 is null then return $1; end if;
if $1=$2 then return $1; end if;
raise exception 'not only';
END $function$;
create aggregate only2 (anyelement) ( sfunc = only_agg, stype = anyelement);
It might not do the thing you want with NULL inputs, but I don't know what you want in that case.
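Assuming the aggregate is created as above, usage against the sample data would look like this; it returns one row per technical_id and raises 'not only' as soon as two distinct natural_ids meet within the same group:
select technical_id, only2(natural_id), sum(val)
from t
group by technical_id;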
Related
I have a SELECT statement that has a column which is a code value, e.g. instead of Java it's JV, and instead of Python it's PY. However, instead of outputting the coded value, I would like to display the full description, i.e. Java or Python. Is there a way to do this with PL/SQL?
Any help would be greatly appreciated, thanks!
You can use a CASE expression:
select
case
when myColumn = 'JV' then 'Java'
when myColumn = 'PY' then 'Python'
end as myColumn
from yourTable
In Oracle you can use DECODE as well:
decode(col, 'JV', 'Java',
'PY', 'Python',
'No Match'
)
Given that you are using Oracle, I would recommend using the DECODE function:
SELECT
col,
DECODE(col, 'JV', 'Java',
'PY', 'Python',
'Not found') AS col_out
FROM yourTable;
You can use either the "DECODE" function, or a "CASE" construct, as follows:
select DECODE(my_column,
'JV','Java',
'PY','Python',
'no_match_found') my_column_alias
from my_table;
select
case my_column
when 'JV' then 'Java'
when 'PY' then 'Python'
else 'no_match_found'
end my_column_alias
from my_table;
While there are several ways of hard coding the values for the sample data given, none of them are the proper method for more than a very limited set. The proper method is to create a lookup table, which in this case contains the code and the corresponding name. Yes, it is quite overkill for 2 languages, but how about the TIOBE index, or the 700 listed in Wikipedia, or the roughly 9000 claimed in the HOPL list? See here and the additional links for more.
Moreover, it is a standard technique that addresses the OP's underlying question: is there a way to give detail about a given code value?
It is easily extended (just add a row to the table) and applicable across virtually all domains (albeit with slight modifications).
-- create language lookup
create table language_lookup(
code varchar2(8)
, name varchar2(200)
, description varchar2(4000)
, constraint language_lookup_pk
primary key (code)
);
insert into language_lookup(code, name)
select 'JV','Java' from dual union all
select 'PY','Python' from dual ;
-- your table
create table your_table ( id integer
, lang_code varchar2(8)
--...
, constraint your_table_pk
primary key (id)
, constraint your_table_2_language_lookup_fk
foreign key (lang_code)
references language_lookup(code)
) ;
insert into your_table (id, lang_code)
select 1,'JV' from dual union all
select 2,'PY' from dual;
select yt.lang_code, ll.name
from your_table yt
join language_lookup ll
on ll.code = yt.lang_code
where yt.lang_code in ('JV','PY')
;
-- now modify to include plsql
insert into language_lookup(code, name)
values ( 'PLSQL', 'Oracle''s Programming Language extension for SQL');
insert into your_table (id,lang_code)
values (3,'PLSQL');
select yt.lang_code, ll.name
from your_table yt
join language_lookup ll
on ll.code = yt.lang_code
;
Try that with your hard coded values. Then add 30 - 40 more ...
Here is what I'm trying to do: I have a table with lots of columns and want to create a view with one of the columns reassigned based on a certain combination of values in other columns, e.g.
Name, Age, Band, Alive ,,, <too many other fields>
And I want a query that will reassign one of the fields, e.g.
Select *, Age =
CASE When "Name" = 'BRYAN ADAMS' AND "Alive" = 1 THEN 18
ELSE "Age"
END
FROM Table
However, the schema that I now have is Name, Age, Band, Alive,,,,<too many>,, Age
I could use 'AS' in my select statement to make it
Name, Age, Band, Alive,,,,<too many>,, Age_Computed.
However, I want to reach the original schema of
Name, Age, Band, Alive,,,, where Age is actually the computed age.
Is there a selective rename where I can do SELECT * and A_1 as A, B_1 as b? (and then A_1 completely disappears)
or a selective * where I can select all but certain columns? (which would also solve the question asked in the previous statement)
I know the hacky way where I enumerate all columns and create an appropriate query, but I'm still hopeful there is a 'simpler' way to do this.
Sorry, no, there is not a way to replace an existing column name using a SELECT * construct as you desire.
It is always better to define columns explicitly, especially for views, and never use SELECT *. Just use the table's DDL as a model when you create the view. That way you can alter any column definition you want (as in your question) and eliminate columns inappropriate for the view. We use this technique to mask or eliminate columns containing sensitive data like social security numbers and passwords. The link provided by marc_s in the comments is a good read.
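As a sketch of what that looks like for the example in the question (the view and table names here are made up, and the '<too many other fields>' would have to be listed explicitly as well):
CREATE VIEW people_view AS
SELECT Name,
CASE WHEN Name = 'BRYAN ADAMS' AND Alive = 1 THEN 18 ELSE Age END AS Age,
Band,
Alive
FROM people;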
Google BigQuery supports SELECT * REPLACE:
A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement.
In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.
A SELECT * REPLACE statement does not change the names or order of columns. However, it can change the value and the value type.
Select *, Age = CASE When "Name" = 'BRYAN ADAMS' AND "Alive" = 1 THEN 18
ELSE "Age"
END
FROM tab
=>
SELECT * REPLACE(CASE WHEN Name = 'BRYAN ADAMS' AND Alive = 1 THEN 18
ELSE Age END AS Age)
FROM Tab
Actually, there is a way to do this in MySQL. You need to use a hack to select all but one column, as posted here, then add that column back separately with an alias.
Here is an example:
-- Set-up some example data
DROP TABLE IF EXISTS test;
CREATE TABLE `test` (`ID` int(2), `date` datetime, `val0` varchar(1), val1 INT(1), val2 INT(4), PRIMARY KEY(ID, `date`));
INSERT INTO `test` (`ID`, `date`, `val0`, `val1`, `val2`) VALUES
(1, '2016-03-07 12:20:00', 'a', 1, 1001),
(1, '2016-04-02 12:20:00', 'b', 2, 1004),
(1, '2016-03-01 10:09:00', 'c', 3, 1009),
(1, '2015-04-12 10:09:00', 'd', 4, 1016),
(1, '2016-03-03 12:20:00', 'e', 5, 1025);
-- Select all columns, renaming 'val0' as 'yabadabadoo':
SET @s = CONCAT('SELECT ', (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), 'val0,', '')
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'test' AND TABLE_SCHEMA =
'<database_name>'), ', val0 AS `yabadabadoo` FROM test');
PREPARE stmt1 FROM @s;
EXECUTE stmt1;
I have a component that retrieves data from a database based on the keys provided.
However, I want my Java application to get all the data for all keys in a single database hit, to speed things up.
I can use an 'in' clause when I have only one key.
For more than one key I can use the query below in Oracle:
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'CORE'
together
However, this use of the 'in' clause gives the error below in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of And and Or.
SELECT *
FROM <table_name>
WHERE
(value_type = 1 and CODE1 = 'COMM')
OR (value_type = 1 and CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations, as shown below. I just wanted to show the pattern that works like IN in Oracle with multiple fields.)
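For completeness, the shorter form for this particular case would be something like:
SELECT *
FROM <table_name>
WHERE value_type = 1
AND CODE1 IN ('COMM', 'CORE');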
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.value_type = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
If you have under 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table values constructor, and perform a join against it. You can only specify up to 1000 rows in a table values constructor, hence the 1000 tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique, otherwise you'll get duplicate values. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXISTS instead of JOIN. That may have different performance characteristics and it will avoid the problem of producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. when your user lacks the permission to create temporary tables; in that case a table values constructor may be your best choice. Use EXISTS or JOIN, depending on which gives you better performance on your database.
Normally you cannot do it, but you can use the following technique:
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.ValueType AND T.Code = DT.Code
Thus, you avoid storing data in your code (persistent table version) and you can easily modify the filters (without changing the code).
I think you can try this, combining AND and OR at the same time:
SELECT
*
FROM
<table_name>
WHERE
value_type = 1
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in the future; in my case I found this solution (not full code, just an example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have an index on Coupon and Campaign in the table for fast searching.
Compute it in MS SQL:
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');
This is with Informix 11.70.FC6GE on Linux.
Assuming a table mytable with a column value varchar(16) and a function like the following:
create function myfunc(str varchar(16)) returning varchar(16)
define result varchar(16);
while (<some-condition>)
let result = ...;
return result with resume;
end while;
end function;
When I do
select * from mytable, table(myfunc(value)) vt(result);
I get
Not implemented yet. [SQL State=IX000, DB Errorcode=-999]
... - :-S.
Doing
select * from mytable, table(myfunc('some literal')) vt(result);
works.
Is there any chance to get this going in the given environment? And if not: To which version of Informix do I need to switch?
I don't think there's a way to do what you want; I'm sure there isn't an easy way to do something similar.
Setup
Consider a database with two tables: a table of (chemical) elements — the periodic table — and a table of (US) states.
CREATE TABLE elements
(
atomic_number INTEGER NOT NULL PRIMARY KEY
CHECK (atomic_number > 0 AND atomic_number < 120),
symbol CHAR(3) NOT NULL UNIQUE,
name CHAR(20) NOT NULL UNIQUE,
atomic_weight DECIMAL(8, 4) NOT NULL,
pt_period SMALLINT NOT NULL
CHECK (pt_period BETWEEN 1 AND 7),
pt_group CHAR(2) NOT NULL
-- 'L' for Lanthanoids, 'A' for Actinoids
CHECK (pt_group IN ('1', '2', 'L', 'A', '3', '4', '5', '6',
'7', '8', '9', '10', '11', '12', '13',
'14', '15', '16', '17', '18')),
stable CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (stable IN ('Y', 'N'))
);
CREATE TABLE US_States
(
code CHAR(2) NOT NULL PRIMARY KEY,
name VARCHAR(15) NOT NULL UNIQUE
);
I'll assume you can populate the two tables with the correct data (see Web Elements for the periodic table; the Informix demo database 'stores' has a table state isomorphic to the US_States table used here).
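For reference, a minimal population of US_States that is enough for the examples below (only the eight states whose names begin with 'M'; the real table would of course hold all fifty):
INSERT INTO US_States (code, name) VALUES ('ME', 'Maine');
INSERT INTO US_States (code, name) VALUES ('MD', 'Maryland');
INSERT INTO US_States (code, name) VALUES ('MA', 'Massachusetts');
INSERT INTO US_States (code, name) VALUES ('MI', 'Michigan');
INSERT INTO US_States (code, name) VALUES ('MN', 'Minnesota');
INSERT INTO US_States (code, name) VALUES ('MS', 'Mississippi');
INSERT INTO US_States (code, name) VALUES ('MO', 'Missouri');
INSERT INTO US_States (code, name) VALUES ('MT', 'Montana');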
Now consider a procedure states_starting():
CREATE FUNCTION states_starting(initial CHAR(1)) RETURNING VARCHAR(15);
DEFINE result VARCHAR(15);
FOREACH SELECT Name
INTO result
FROM US_States
WHERE Code[1] = initial
ORDER BY Name
RETURN result WITH RESUME;
END FOREACH;
END FUNCTION;
Adapting the queries in the question
I was a little surprised that the vt(result) notation worked — but it does, designating a table alias vt and the column name result. Hence, an adaptation of the query that worked is:
SELECT *
FROM Elements, TABLE(states_starting('M')) vt(result)
This generates the Cartesian product of 118 elements and 8 states with names beginning with 'M' for 944 rows in total. A slightly more reasonable query is:
SELECT *
FROM Elements JOIN TABLE(states_starting('M')) AS vt(result)
ON Elements.Symbol[1] = vt.result[1]
ORDER BY Elements.Atomic_Number
This generates results for 6 elements (Magnesium, Manganese, Meitnerium, Mendelevium, Molybdenum, Moscovium; Mercury doesn't figure because its symbol is Hg, which does not start with an M) and 8 states as before, for 48 rows.
The adaptation of the query that you want to execute looks something like:
SELECT *
FROM Elements AS e
JOIN TABLE(states_starting(e.name[1])) AS vt(result)
ON Elements.Symbol[1] = vt.result[1]
ORDER BY Elements.Atomic_Number
This doesn't work, however, generating error:
-217: Column (name) not found in any table in the query (or SLV is undefined).
That's not identical to the error in the question; I've not been able to replicate that. But it is symptomatic of the problems this query faces.
The problem is that in the TABLE(…), the name e isn't known, but removing the e. from the argument doesn't change things.
Further, to generate the complete "correct" result, the TABLE(…) expression would have to be evaluated multiple times, once for each separate initial letter. So, you'd need a set of table results from multiple invocations of the function in the TABLE(…) expression with different arguments. But that isn't the way tables work in SQL. They're supposed to be 'fixed'. The constant argument to the function yields a single result set and looks like a table. But trying to invoke it multiple times with different arguments and dealing with the result set would be (at best) tricky — it isn't the way SQL works.
I'm not entirely satisfied with the explanation. The idea I'm trying to express is almost certainly the reason for the queries not working, but I'm not happy that I've explained it well.
I tried a number of variations. For example, assuming you create a GROUP_CONCAT aggregate, you might want to try:
SELECT group_concat(states_starting(a))
FROM (SELECT DISTINCT NAME[1] AS a FROM us_states)
GROUP BY a
but this generates:
-686: Function (states_starting) has returned more than one row.
I don't think that creating a SET or LIST from the function result helps.
Testing on Mac OS X 10.11.5 with Informix 12.10.FC6 (and ClientSDK 4.10.FC6, and SQLCMD 90.01 — unrelated to Microsoft's johnny-come-lately of the same name).