Using new columns in the "where" clause - sql

Hive rejects this query:
select a, b, a+b as c
from t
where c > 0
with the error: Invalid table alias or column reference 'c'.
Do I really need to write something like this?
select * from
(select a, b, a+b as c
from t)
where c > 0
EDIT:
The computation of c is complex enough that I do not want to repeat it in the where clause, as in where a + b > 0.
I need a solution that works in Hive.

Use a Common Table Expression if you want to use derived columns.
with x as
(
select a, b, a+b as c
from t
)
select * from x where c >0

You can repeat the expression in the where clause, as below, or use a Common Table Expression:
select a, b, a+b as c
from t
where a+b > 0
Refer to the order of logical query processing below to see whether a derived column can be used in another clause. WHERE is processed before SELECT, so an alias assigned in SELECT does not exist yet when WHERE is evaluated.
Keyed-In Order
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Logical Query Processing Phases
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

You are close. You can do this:
select a, b, a+b as c
from t
where a+b > 0

It would have to look like this:
select a, b, a+b as c
from t
where a+b > 0
An easy way to remember this: a query cannot reference, in its own where clause, an alias assigned in its own select list. It would, however, work if you did this:
SELECT a,b,c
FROM(
select a, b, a+b as c
from t) as calc
WHERE c > 0
This syntax would work because the alias is assigned in a subquery.
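As a rough check of the subquery and Common Table Expression forms, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, and the sample rows are made up for illustration. Both forms should return the same rows.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 2), (-5, 3), (0, 0), (4, -1)])

# The alias c is assigned in the inner query, so the outer WHERE can see it.
subquery_sql = """
SELECT a, b, c
FROM (SELECT a, b, a + b AS c FROM t) AS calc
WHERE c > 0
ORDER BY a
"""

# Same idea with a Common Table Expression.
cte_sql = """
WITH x AS (SELECT a, b, a + b AS c FROM t)
SELECT a, b, c FROM x WHERE c > 0 ORDER BY a
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(subquery_rows)
print(cte_rows)
```

Note that SQLite itself is one of the engines that tolerates the alias directly in the where clause, so this sketch only shows that the portable forms work, not that the original query fails.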

No, just repeat the expression:
select a, b, a+b as c
from t
where a+b > 0
Note: in MySQL at least, order by and group by accept column (or expression) positions.
For example, group by 2 order by 1 gives one row per distinct value of column 2 (whether a field name or an expression), ordered by column 1 (field or expression).
Also, some RDBMSs do let you refer to the column alias as you first attempted.

Related

SQL - "Case when" vs condition

I have a table Table_A with columns A, B and C whereby column C needs to be summed, but only if column B is a certain value. Otherwise column C may not always contain a value to be summed.
So the normal SQL of:
SELECT A, B, SUM(C)
FROM Table_A
WHERE B = 'value condition'
GROUP BY A,B
Works well. However, I thought I could use "CASE WHEN" to catch the conditions with zero, say, like this:
SELECT
A, B,
CASE WHEN B = 'value condition' THEN SUM(C) ELSE 0 END
FROM Table_A
GROUP BY A, B
I get an error, referring to the fact that it is still trying to SUM a value which is not in the condition. Am I missing something? Or have I misinterpreted CASE WHEN?
You want conditional aggregation. The CASE expression should appear inside SUM:
SELECT A, SUM(CASE WHEN B = 'value condition' THEN C ELSE 0 END) AS total
FROM Table_A
GROUP BY A;
Note that B probably does not belong in the GROUP BY clause, given that you want to conditionally aggregate using its values.
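To see conditional aggregation in action, here is a small sketch with Python's built-in sqlite3 module. The rows are invented for illustration, and SQLite is just a convenient engine to run the query against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (A TEXT, B TEXT, C INTEGER)")
# Hypothetical rows: group 'y' has no row matching the condition,
# and one of its C values is NULL.
conn.executemany("INSERT INTO Table_A VALUES (?, ?, ?)", [
    ("x", "value condition", 10),
    ("x", "value condition", 5),
    ("x", "other", 100),
    ("y", "other", 7),
    ("y", "other", None),
])

# The CASE runs per row, inside SUM, so non-matching rows contribute 0.
totals = conn.execute("""
SELECT A, SUM(CASE WHEN B = 'value condition' THEN C ELSE 0 END) AS total
FROM Table_A
GROUP BY A
ORDER BY A
""").fetchall()
print(totals)
```

Unlike the WHERE version, group y still appears in the result, with a total of 0.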

Using temporary variable when creating a Netezza VIEW

Here's a simple view I'd like to create, but I'd like only FOUR columns in the final view: a, b, c, e. I would like to define d for temporary use in determining the value of e, but then I do not want d to be part of the resulting view.
create view v as
select a, b, c, a+b+c as d,
case when d > 1000 then 1
when d > 100 then 2
when d > 10 then 3
else 4 END as e
from tbl;
Is there any way in Netezza SQL to define such a temporary value?
In this simple example, certainly I could replace each of my "when d" statements with "when a+b+c" every time, but my real-world scenario is more complex than illustrated here.
You can use a subquery:
create view v as
select a, b, c,
(case when d > 1000 then 1
when d > 100 then 2
when d > 10 then 3
else 4
end) as e
from (select tbl.*, (a + b + c) as d
from tbl
) t;
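A quick way to confirm that d stays out of the view is to run the same statement against a scratch database. The sketch below uses Python's built-in sqlite3 module as a stand-in for Netezza, with made-up rows, so it only illustrates the shape of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical rows, one for each branch of the CASE.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                 [(1, 2, 3), (10, 20, 30), (100, 200, 300), (1000, 1, 1)])

# d is computed in the inner query and used, but not selected, by the outer one.
conn.execute("""
CREATE VIEW v AS
SELECT a, b, c,
       (CASE WHEN d > 1000 THEN 1
             WHEN d > 100 THEN 2
             WHEN d > 10 THEN 3
             ELSE 4
        END) AS e
FROM (SELECT tbl.*, (a + b + c) AS d FROM tbl) t
""")

cursor = conn.execute("SELECT * FROM v ORDER BY a")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```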

Filter if values provided otherwise return everything

Say I have a table t with 2 columns:
a int
b int
I can do a query such as:
select b
from t
where b > a
and a in(1,2,3)
order by b
where 1,2,3 is provided from the outside.
Obviously, the query can return no rows. In that case, I'd like to select everything as if the query did not have the and a in(1,2,3) part. That is, I'd like:
if exists (
select b
from t
where b > a
and a in(1,2,3)
)
select b
from t
where b > a
and a in(1,2,3)
order by b
else
select b
from t
where b > a
order by b
Is there a way to do this:
Without running two queries (one for exists, the other one the actual query)
That is less verbose than repeating queries (real queries are quite long, so DRY and all that stuff)
Using NOT EXISTS with a Sub Query to Determine if condition exists
SELECT b
FROM
t
WHERE
b > a
AND (
NOT EXISTS (SELECT 1 FROM t WHERE b > a AND a IN (1,2,3))
OR a IN (1,2,3)
)
ORDER BY
b
This works because, if any row meets the condition, NOT EXISTS is false and only rows passing the OR a IN (1,2,3) branch are returned; if no row meets the condition, NOT EXISTS is true and ALL rows are included.
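The fallback behaviour can be tried out with a small sketch in Python's built-in sqlite3 module. It assumes the subquery checks table t with the same b > a condition as the outer query; the helper name filtered_or_all and the sample rows are hypothetical.

```python
import sqlite3

def filtered_or_all(conn, wanted):
    """Return b for rows with b > a and a in `wanted`;
    if no such row exists, return b for every row with b > a."""
    marks = ",".join("?" * len(wanted))  # e.g. "?,?,?"
    sql = f"""
    SELECT b
    FROM t
    WHERE b > a
      AND (NOT EXISTS (SELECT 1 FROM t WHERE b > a AND a IN ({marks}))
           OR a IN ({marks}))
    ORDER BY b
    """
    # The value list is bound twice, once per IN (...) clause.
    return [row[0] for row in conn.execute(sql, list(wanted) * 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 5), (2, 1), (7, 9), (8, 10)])

matched = filtered_or_all(conn, [1, 2, 3])  # a = 1 qualifies, filter applies
no_match = filtered_or_all(conn, [4, 6])    # nothing qualifies, filter dropped
only_b_le_a = filtered_or_all(conn, [2])    # a = 2 exists, but not with b > a
print(matched, no_match, only_b_le_a)
```

The third call shows why the b > a condition belongs inside the subquery as well: without it, the row (2, 1) would make NOT EXISTS false and the query would return nothing.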
Or with a Common Table Expression and a window function with conditional aggregation:
WITH cte AS (
SELECT
b
,CASE WHEN a IN (1,2,3) THEN 1 ELSE 0 END as MeetsCondition
,COUNT(CASE WHEN a IN (1,2,3) THEN a END) OVER () as ConditionCount
FROM
t
WHERE
b > a
)
SELECT
b
FROM
cte
WHERE
(ConditionCount > 0 AND MeetsCondition = 1)
OR (ConditionCount = 0)
ORDER BY
b
I find it a bit "ugly". It may be better to materialize the output of your query in a temp table and then, based on the row count of the temp table, run the first or the second query (this reduces access to the original table from 3 times to 2, and you can add a flag marking the rows that qualify for your condition so you do not have to repeat it). Other than that, read below.
Bear in mind, though, that an EXISTS query should execute quickly: it stops as soon as it finds a row that satisfies the condition.
You could achieve this by using UNION ALL to combine the result set of the constrained query with that of the full query (without the constraint on a), and then deciding which one to show with a CASE expression based on the output of the constrained query.
How the CASE expression works: when the constrained part of the query returns any row, return the result set of the constrained query; otherwise return everything, omitting the constraint.
If your database supports CTEs, use this solution:
with tmp_data as (
select *
from (
select 'constraint' as type, b
from t
where b > a
and a in (1,2,3) -- here goes your constraint
union all
select 'full query' as type, b
from t
where b > a
) foo
)
SELECT b
FROM tmp_data
WHERE
CASE WHEN (select count(*) from tmp_data where type = 'constraint') > 0
THEN type = 'constraint'
ELSE type = 'full query'
END
;

Combine two (or multiple) columns of a table

I have a table
a b c
1 2
1 3
1 4 1
2 1 2
Columns a and c should be combined if their values are the same. If they are not the same, one of them is always empty.
So the result should be:
a b
1 2
1 3
1 4
2 1
Is there any function that can be applied in PostgreSQL?
According to your description:
Columns a and c should be combined if their values are the same. If
they are not the same, one of them is always empty.
all you need is an unconditional COALESCE.
SELECT COALESCE(a, c) AS a, b FROM tbl;
This assumes that by "empty" you mean NULL. If you mean an empty string ('') instead, add NULLIF:
SELECT COALESCE(NULLIF(a, ''), c) AS a, b FROM tbl;
COALESCE works for multiple parameters:
SELECT COALESCE(a, c, d, e, f, g) AS a, b FROM tbl;
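The difference between the plain COALESCE and the NULLIF variant can be seen in a small sketch with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, the columns are assumed to be text, and the rows are made up to include both a NULL and an empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a TEXT, b INTEGER, c TEXT)")
# Hypothetical rows: a is NULL in one row and an empty string in another.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("1", 2, None),
    (None, 3, "1"),
    ("1", 4, "1"),
    ("", 1, "2"),
])

# Plain COALESCE only skips NULL, so the empty string survives.
plain = conn.execute(
    "SELECT COALESCE(a, c) AS a, b FROM tbl ORDER BY b").fetchall()

# NULLIF(a, '') turns the empty string into NULL first, so c is used instead.
with_nullif = conn.execute(
    "SELECT COALESCE(NULLIF(a, ''), c) AS a, b FROM tbl ORDER BY b").fetchall()

print(plain)
print(with_nullif)
```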
Are you looking for something like this?
SELECT COALESCE(c, a), b
FROM your_table
WHERE COALESCE(c, a) = a

How can I group by two rows in SQL?

In the result of an SQL Select command I have two rows:
A | B
B | A
A|B and B|A mean the same to me. I want only one of them to be selected by the SQL command.
How can I do that?
I have a select command that joins the table to itself (natural join), like this:
SELECT a.coloumn ,b.coloumn
FROM table a,table b
where .... (not important)
and b.coloumn IN (
SELECT coloumn
FROM table
where ... (the same like above)
)
and b.coloumn != a.coloumn ;
And after that I have multiple columns.
You neither told us your column names nor your table name, but assuming you have two columns A and B in a table named the_table then the following will do:
select distinct least(a,b), greatest(a,b)
from the_table;
If you want to group by them using standard SQL:
select (case when a < b then a else b end) as a,
(case when a < b then b else a end) as b,
count(*) as cnt
from table t
group by (case when a < b then a else b end),
(case when a < b then b else a end);
Oracle supports the greatest() and least() functions, but not all databases do.
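For engines without least() and greatest(), the CASE form can be tried as follows. This is a minimal sketch with Python's built-in sqlite3 module; the table name the_table and the sample pairs are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (a TEXT, b TEXT)")
# Hypothetical pairs: ('A','B') and ('B','A') should count as the same pair.
conn.executemany("INSERT INTO the_table VALUES (?, ?)",
                 [("A", "B"), ("B", "A"), ("C", "D"), ("A", "C")])

# Put the smaller value first in every row, then group on the normalized pair.
pair_counts = conn.execute("""
SELECT (CASE WHEN a < b THEN a ELSE b END) AS lo,
       (CASE WHEN a < b THEN b ELSE a END) AS hi,
       COUNT(*) AS cnt
FROM the_table
GROUP BY (CASE WHEN a < b THEN a ELSE b END),
         (CASE WHEN a < b THEN b ELSE a END)
ORDER BY lo, hi
""").fetchall()
print(pair_counts)
```

The cnt column shows how many original rows collapsed into each normalized pair.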
Another possible solution is:
select a, b from the_table
union
select b, a from the_table
This would work fine even if there are NULL values.