Combine two (or multiple) columns of a table - sql

I have a table
a b c
1 2
1 3
1 4 1
2 1 2
Columns a and c should be combined if their values are the same. If they are not the same, one of them is always empty.
So the result should be:
a b
1 2
1 3
1 4
2 1
Is there any function that can be applied in PostgreSQL?

According to your description:
Columns a and c should be combined if their values are the same. If
they are not the same, one of them is always empty.
all you need is an unconditional COALESCE.
SELECT COALESCE(a, c) AS a, b FROM tbl;
This assumes that by "empty" you mean NULL. If you mean an empty string ('') instead, add NULLIF:
SELECT COALESCE(NULLIF(a, ''), c) AS a, b FROM tbl;
COALESCE works for multiple parameters:
SELECT COALESCE(a, c, d, e, f, g) AS a, b FROM tbl;
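To check the COALESCE approach against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for PostgreSQL here, the helper name combine_a_and_c is made up for illustration, and the last input row (a is NULL, c is 3) is a hypothetical addition that is not in the question.

```python
import sqlite3

def combine_a_and_c(rows):
    """Load (a, b, c) rows into a scratch table and merge a and c with COALESCE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # COALESCE returns its first non-NULL argument, so c is only used when a is NULL.
    result = conn.execute(
        "SELECT COALESCE(a, c) AS a, b FROM tbl ORDER BY rowid"
    ).fetchall()
    conn.close()
    return result

# The four rows from the question, plus one hypothetical row where only c is set.
rows = [(1, 2, None), (1, 3, None), (1, 4, 1), (2, 1, 2), (None, 5, 3)]
combined = combine_a_and_c(rows)
print(combined)
```

The first four result rows match the output requested in the question; the extra row shows the fallback to c.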

Are you looking for something like this?
SELECT COALESCE(c, a), b
FROM your_table
WHERE COALESCE(c, a) = a


How do I aggregate/sort the results of one column based on data in another?

I have a table displaying three columns, A, B, C. Column A has duplicate values. How do I sort the results based on column C?
For example:
A B C
Amanda healthy
Amanda healthy
Brian healthy
Brian sick
Brian healthy
Colleen [null]
Colleen sick
Tyler healthy
Tyler [null]
Tyler fever
Daniel [null]
Daniel [null]
Daniel [null]
So that's just an example. I've left column B blank because it doesn't really matter here. What I'm trying to do is aggregate the duplicates in A based on the results in C. If all the results are null, then that should show me value 0. If the results are all healthy, or a mixture of healthy and null, then I want value 1. If there is any mention of being sick in the results, I want that to be value 2.
So for example, in the above, I want Amanda to give me value 1, Brian 2, Colleen 2, Tyler 2, Daniel 0. Any thoughts on how I may go about doing that? Thank you!
From your data and results, it looks like you want count(distinct):
select a, count(distinct c) cnt
from mytable
group by a
From the description of your question, that's a bit different. You can use a case expression:
select
a,
case
when max(case when c = 'sick' then 1 else 0 end) = 1 then 2
when max(case when c = 'healthy' then 1 else 0 end) = 1 then 1
when count(c) = 0 then 0
end as res
from mytable
group by a
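A quick way to try the CASE approach on the sample data is an in-memory SQLite database driven from Python. The sketch below is a variant, not the query above verbatim: it assumes that any non-NULL value other than 'healthy' (such as 'fever') counts as sick, because the question expects Tyler to score 2. With the literal c = 'sick' test, Tyler would score 1. The helper name health_scores is hypothetical.

```python
import sqlite3

ROWS = [
    ("Amanda", "healthy"), ("Amanda", "healthy"),
    ("Brian", "healthy"), ("Brian", "sick"), ("Brian", "healthy"),
    ("Colleen", None), ("Colleen", "sick"),
    ("Tyler", "healthy"), ("Tyler", None), ("Tyler", "fever"),
    ("Daniel", None), ("Daniel", None), ("Daniel", None),
]

# Assumption: every non-NULL value other than 'healthy' is treated as "sick".
QUERY = """
SELECT a,
       CASE
           WHEN MAX(CASE WHEN c IS NOT NULL AND c <> 'healthy' THEN 1 ELSE 0 END) = 1 THEN 2
           WHEN MAX(CASE WHEN c = 'healthy' THEN 1 ELSE 0 END) = 1 THEN 1
           ELSE 0
       END AS res
FROM mytable
GROUP BY a
ORDER BY a
"""

def health_scores(rows):
    """Return {name: score} where 0 = all NULL, 1 = healthy/NULL only, 2 = any sickness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (a TEXT, c TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result

scores = health_scores(ROWS)
print(scores)
```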
First step: register the incidences (yes/no = 1/0) of the possible outcomes in C per A:
SELECT a,BIT_OR(CASE WHEN c IS NULL THEN 1 WHEN c="healthy" THEN 2 ELSE 4 END) FROM mytable GROUP BY a;
(here I've assumed that "sick" and "fever" are to be treated the same way, since you didn't specify what e.g. a group of only "fever" rows should yield; note that the aggregate must be BIT_OR, because BIT_XOR would cancel out pairs of identical flags)
In that query the outputs are:
1 if all values of c are NULL.
2 if all values are "healthy".
3 if there is a mix of "healthy" (at least one) and NULL (at least one).
4 or more (binary 1xx) if there is at least one occurrence of "sick"/"fever".
To map those outputs to your desired values (0/1/1/2 for the respective cases), the query can be changed to e.g.
SELECT a,IF(bitmap=1,0,IF(bitmap&4,2,1)) FROM
(SELECT a,BIT_OR(CASE WHEN c IS NULL THEN 1 WHEN c="healthy" THEN 2 ELSE 4 END) bitmap FROM mytable GROUP BY a) tmp;
PS: if you have allergic reactions to this nested query (e.g. because the SQL server might not optimize it by itself), just rewrite it into a single query by repeating the inner expression, or simply translate the values of the original query at the receiving end.
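The bitmap idea can be traced in plain Python, which makes it easy to see why the flags have to be combined with a bitwise OR: an XOR cancels out whenever the same flag occurs an even number of times. This is only a sketch of the logic, not of any database's aggregate functions, and the names flag, bitmap and score are made up for illustration.

```python
from functools import reduce
from operator import or_, xor

NULL_FLAG, HEALTHY_FLAG, SICK_FLAG = 1, 2, 4

def flag(c):
    """Map one value of column c to its bit flag."""
    if c is None:
        return NULL_FLAG
    if c == "healthy":
        return HEALTHY_FLAG
    return SICK_FLAG  # assumption: anything else ("sick", "fever", ...) counts as sick

def bitmap(values, combine=or_):
    """Fold the flags of one group together; OR records which outcomes occurred."""
    return reduce(combine, (flag(c) for c in values), 0)

def score(values):
    """Translate the OR-ed bitmap of a non-empty group into the 0/1/2 result."""
    bits = bitmap(values)
    if bits == NULL_FLAG:
        return 0
    return 2 if bits & SICK_FLAG else 1

print(score(["healthy", "healthy"]))        # Amanda's rows
print(bitmap(["healthy", "healthy"], xor))  # XOR loses the information
```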
This script is for Oracle, but the SELECT itself is standard SQL-92. I hope it is useful for you.
create table anamnesis (
a varchar(20),
c varchar(20)
);
INSERT ALL
INTO anamnesis (a, c) VALUES ('Amanda','healthy')
INTO anamnesis (a, c) VALUES ('Amanda','healthy')
INTO anamnesis (a, c) VALUES ('Brian','healthy')
INTO anamnesis (a, c) VALUES ('Brian','sick')
INTO anamnesis (a, c) VALUES ('Brian','healthy')
INTO anamnesis (a, c) VALUES ('Colleen',null)
INTO anamnesis (a, c) VALUES ('Colleen','sick')
INTO anamnesis (a, c) VALUES ('Tyler','healthy')
INTO anamnesis (a, c) VALUES ('Tyler',null)
INTO anamnesis (a, c) VALUES ('Tyler','fever')
INTO anamnesis (a, c) VALUES ('Daniel',null)
INTO anamnesis (a, c) VALUES ('Daniel',null)
INTO anamnesis (a, c) VALUES ('Daniel',null)
SELECT * FROM dual;
select t.a, (case when sick>0 then 2 else
case when healthy>0 then 1 else
0
end
end
) res
from
(select
t.a,
sum(case when t.c is null then 1 else 0 end) nullable,
sum(case when t.c='healthy' then 1 else 0 end) healthy,
sum(case when coalesce(t.c,'-') not in ('-','healthy') then 1 else 0 end) sick
from
anamnesis t
group by t.a) t;

Using temporary variable when creating a Netezza VIEW

Here's a simple view I'd like to create, but I'd like only FOUR columns in the final view: a, b, c, e. I would like to define d for temporary use in determining the value of e, but then I do not want d to be part of the resulting view.
create view v as
select a, b, c, a+b+c as d,
case when d > 1000 then 1
when d > 100 then 2
when d > 10 then 3
else 4 END as e
from tbl;
Is there any way in Netezza SQL to define such a temporary value?
In this simple example, certainly I could replace each of my "when d" statements with "when a+b+c" every time, but my real-world scenario is more complex than illustrated here.
You can use a subquery:
create view v as
select a, b, c,
(case when d > 1000 then 1
when d > 100 then 2
when d > 10 then 3
else 4
end) as e
from (select tbl.*, (a + b + c) as d
from tbl
) t;
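As a rough check that the subquery keeps d out of the view, the sketch below builds the same view in an in-memory SQLite database from Python and inspects the resulting column names. SQLite is only a stand-in here (an assumption; Netezza itself is not involved), and the sample rows and the helper name build_view are hypothetical.

```python
import sqlite3

def build_view(rows):
    """Create tbl and view v, then return the view's column names and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # d exists only inside the derived table t; the outer SELECT does not expose it.
    conn.execute("""
        CREATE VIEW v AS
        SELECT a, b, c,
               CASE WHEN d > 1000 THEN 1
                    WHEN d > 100 THEN 2
                    WHEN d > 10 THEN 3
                    ELSE 4
               END AS e
        FROM (SELECT tbl.*, (a + b + c) AS d FROM tbl) t
    """)
    cur = conn.execute("SELECT * FROM v ORDER BY a")
    columns = [col[0] for col in cur.description]
    data = cur.fetchall()
    conn.close()
    return columns, data

columns, data = build_view([(1, 2, 3), (10, 20, 30), (100, 200, 300), (1000, 1, 1)])
print(columns)
print(data)
```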

Count every rows

I tried to count the values of every row in MySQL, but it only counts the first row. Can someone assist?
First Query:
SELECT A, B, C
FROM [TEST].[dbo].[TEST3]
Result:
A B C
7 8 9
1 2 NULL
1 3 4
1 NULL 1
I count every row, but only the first row appears in the result.
Query
SELECT COUNT (A + B + C)
FROM [TEST].[dbo].[TEST3]
Result:
2
It is supposed to be 7+8+9 = 24
1+2+NULL = 3
etc.
Just take the sum of the columns directly:
SELECT A, B, C,
COALESCE(A, 0) + COALESCE(B, 0) + COALESCE(C, 0) AS total
FROM [TEST].[dbo].[TEST3]
The reason why your current query using COUNT returns a single row is that COUNT is an aggregate function. In the absence of GROUP BY, you are telling SQL Server to return a count over the entire table.
This is what you want:
SELECT IFNULL(A,0)+IFNULL(B,0)+IFNULL(C,0)
FROM [TEST].[dbo].[TEST3]
You do not want to use COUNT() here. COUNT() is an aggregate function. It outputs one value per group. In your case, the whole query will output only one value.
Moreover, adding NULL to anything will be NULL and COUNT() will ignore that. Therefore the output of your query is 2.
COUNT() is an aggregate function that returns one result per group.
The result is actually correct since 1 + 2 + NULL = NULL, not 3.
SELECT COUNT (A + B + C) FROM [TEST].[dbo].[TEST3]
This returns 2 because COUNT() counts only non-null values. If you run the query without COUNT(), it returns 4 rows.
SELECT A + B + C FROM [TEST].[dbo].[TEST3]
The result is
24
NULL
8
NULL
However, if you wanted to return rows considering NULL as 0, you can use COALESCE within the columns,
SELECT COALESCE(A, 0) + COALESCE(B, 0) + COALESCE(C, 0)
FROM [TEST].[dbo].[TEST3]
will now return
24
3
8
2
And when you write it with count, it will now return 4.
SELECT COUNT(COALESCE(A, 0) + COALESCE(B, 0) + COALESCE(C, 0) )
FROM [TEST].[dbo].[TEST3]
Result:
4
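The numbers above can be reproduced with a small sketch that runs the same expressions in an in-memory SQLite database from Python. SQLite is used here only as a convenient stand-in (an assumption; the question uses SQL Server style table names), and the table is named test3 without the [TEST].[dbo] prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test3 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO test3 VALUES (?, ?, ?)",
    [(7, 8, 9), (1, 2, None), (1, 3, 4), (1, None, 1)],
)

# Any NULL operand makes the whole sum NULL (None in Python).
raw_sums = [r[0] for r in conn.execute("SELECT a + b + c FROM test3 ORDER BY rowid")]
# COUNT(expr) skips the rows where expr is NULL.
count_raw = conn.execute("SELECT COUNT(a + b + c) FROM test3").fetchone()[0]

# COALESCE(x, 0) replaces NULL with 0 before adding.
safe_expr = "COALESCE(a, 0) + COALESCE(b, 0) + COALESCE(c, 0)"
safe_sums = [r[0] for r in conn.execute(f"SELECT {safe_expr} FROM test3 ORDER BY rowid")]
count_safe = conn.execute(f"SELECT COUNT({safe_expr}) FROM test3").fetchone()[0]
conn.close()

print(raw_sums, count_raw)
print(safe_sums, count_safe)
```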
SELECT
A, B, C,
Total = A + B+ C
FROM dbo.TEST
DO NOT USE COUNT.

sql: select rows where group of elements occurs several times in the table

I am searching for an implementation of the following pseudo-code:
SELECT A, B, C
FROM X
HAVING COUNT(A,B) > 1
Here is an example of what the code should do:
Assume table X looks as follows:
A B C D
--------------
1 1 0 2
1 1 1 1
2 1 1 0
The first and the second row have the same entries in columns A and B; the third row is identical in column B but different in column A. The desired output is columns A, B, and C of rows 1 and 2:
1 1 0
1 1 1
How could this be implemented? The problem with my pseudo-code is that COUNT accepts either a single column or all columns (*), but it can't take two out of four columns. GROUP BY has the same property.
You can do this with an exists clause. This should work in all databases:
select a, b, c
from x
where exists (select 1
from x x2
where x.a = x2.a and x.b = x2.b and x.c <> x2.c
);
This assumes that the rows have different c values.
This will perform best with an index on x(a, b).
For an RDBMS that supports analytic functions, you can do
SELECT a,b,c
FROM
(
SELECT a, b, c, count(1) OVER(PARTITION BY a,b) cnt
FROM X
)t1
WHERE t1.cnt >1
If analytic/window functions are not available, a join should do the job
SELECT t1.a, t1.b, t1.c
FROM X t1
INNER JOIN
(
SELECT a,b
FROM X
GROUP BY a,b
HAVING COUNT(1) >1
)t2 ON (t2.a=t1.a AND t2.b=t1.b)
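The join variant can be tried on the example table with a short sketch that uses an in-memory SQLite database from Python. SQLite is an assumption made for convenience, and the helper name rows_with_repeated_ab is made up for illustration.

```python
import sqlite3

def rows_with_repeated_ab(rows):
    """Return (a, b, c) for every row whose (a, b) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    conn.executemany("INSERT INTO x VALUES (?, ?, ?, ?)", rows)
    # The derived table t2 lists the (a, b) pairs that appear at least twice;
    # joining back to x recovers the full rows for those pairs.
    result = conn.execute("""
        SELECT t1.a, t1.b, t1.c
        FROM x t1
        INNER JOIN (
            SELECT a, b
            FROM x
            GROUP BY a, b
            HAVING COUNT(*) > 1
        ) t2 ON t2.a = t1.a AND t2.b = t1.b
        ORDER BY t1.rowid
    """).fetchall()
    conn.close()
    return result

duplicates = rows_with_repeated_ab([(1, 1, 0, 2), (1, 1, 1, 1), (2, 1, 1, 0)])
print(duplicates)
```

Unlike the EXISTS version, this does not depend on the duplicate rows having different c values.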

Using new columns in the "where" clause

hive rejects this code:
select a, b, a+b as c
from t
where c > 0
saying Invalid table alias or column reference 'c'.
do I really need to write something like
select * from
(select a, b, a+b as c
from t)
where c > 0
EDIT:
The computation of c is complex enough that I do not want to repeat it in the where clause (as in where a + b > 0).
I need a solution that works in Hive.
Use a Common Table Expression if you want to use derived columns.
with x as
(
select a, b, a+b as c
from t
)
select * from x where c >0
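The CTE form can be exercised with a small sketch against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here (an assumption), the sample rows are hypothetical, and the helper name positive_sums is made up for illustration.

```python
import sqlite3

def positive_sums(rows):
    """Compute c = a + b once in a CTE, then filter on the derived column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Inside the outer query, c is an ordinary column of x, so WHERE can use it.
    result = conn.execute("""
        WITH x AS (
            SELECT a, b, a + b AS c
            FROM t
        )
        SELECT a, b, c FROM x WHERE c > 0 ORDER BY a
    """).fetchall()
    conn.close()
    return result

filtered = positive_sums([(1, 2), (-5, 3), (0, 0), (4, -1)])
print(filtered)
```

The expression a + b appears only once, which is the point when the real computation is more complex.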
You can run this query like this or with a Common Table Expression
select a, b, a+b as c
from t
where a+b > 0
Reference the below order of operations for logical query processing to know if you can use derived columns in another clause.
Keyed-In Order
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Logical Querying Processing Phases
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You are close, you can do this
select a, b, a+b as c
from t
where a+b > 0
It would have to look like this:
select a, b, a+b as c
from t
where a+b > 0
An easy way to explain/remember this is this: SQL cannot reference aliases assigned within its own instance. It would, however, work if you did this:
SELECT a,b,c
FROM(
select a, b, a+b as c
from t) as [calc]
WHERE c > 0
This syntax would work because the alias is assigned in a subquery.
no
just:
select a, b, a+b as c
from t
where a+b > 0
note: for mysql at least: order by and group by you can use column (or expression) positions
e.g. group by 2, order by 1 would get you one row per column 2 (whether a field name or an expression) and order it by column 1 (field or expression)
also: some RDBMS's do let you refer to the column alias as you first attempted