SQL Select between two fields depending on the value of one field - sql

I am using a PostgreSQL database, and in a table representing some measurements I have two columns: measurement and interpolated. The first holds the observation (measurement), and the second holds the value interpolated from nearby values. Every record with an original value also has an interpolated value. However, there are a lot of records without "original" observations (NULL), so their values are interpolated and stored in the second column. So basically there are just two cases in the database:
measurement   interpolated
Value         Value
NULL          Value
Of course, it is preferable to use the value from the first column if available, hence I need to build a query to select the data from the first column, and if not available (NULL), then the database returns the value from the second column for the record in question. I have no idea how to build the SQL query.
Please help. Thanks.

You can use Coalesce:
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.
Select Coalesce( first_value, second_value )
From your_table
This would return first_value if it is not NULL, second_value otherwise.
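A minimal sketch of this applied to the columns from the question. The table name observations and the sample values are made up for illustration, and an in-memory SQLite database (Python's built-in sqlite3 module) stands in for PostgreSQL only so the query can be run without a server; COALESCE behaves the same way in both.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical table name; the column names come from the question.
conn.execute(
    "CREATE TABLE observations (id INTEGER PRIMARY KEY, measurement REAL, interpolated REAL)"
)
conn.executemany(
    "INSERT INTO observations (id, measurement, interpolated) VALUES (?, ?, ?)",
    [(1, 10.0, 10.5), (2, None, 11.0), (3, 12.0, 11.5)],  # made-up sample data
)

# Prefer the original measurement; fall back to the interpolated value when it is NULL.
rows = conn.execute(
    "SELECT id, COALESCE(measurement, interpolated) AS value "
    "FROM observations ORDER BY id"
).fetchall()
print(rows)
```

Row 2 has no measurement, so its interpolated value is returned; the other rows keep their original measurement.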

Peter nailed it. But for the sake of completeness (the title of your question is more general than your particular issue), here are the docs for the conditional expressions available in PostgreSQL (and in several other databases): CASE, COALESCE, NULLIF, GREATEST, LEAST. The first one is the most general.
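To illustrate how the general CASE form relates to COALESCE, and what NULLIF does, here is a small sketch. The table t and its data are hypothetical, and SQLite (via Python's sqlite3) is used only as a convenient stand-in for PostgreSQL. GREATEST and LEAST are left out because SQLite does not provide them under those names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; assumption for this sketch
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (None, 3), (0, 4)])

rows = conn.execute(
    """
    SELECT
        CASE WHEN a IS NOT NULL THEN a ELSE b END AS via_case,     -- spelled-out fallback
        COALESCE(a, b)                            AS via_coalesce, -- same result, shorter
        NULLIF(a, 0)                              AS zero_as_null  -- NULL when a equals 0
    FROM t
    ORDER BY b
    """
).fetchall()
print(rows)
```

The first two columns always agree, which is why COALESCE is usually preferred for a plain NULL fallback, while CASE remains available for conditions that are not about NULL.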

Related

How to order alphabetically in SQL with two columns, but where either column may be empty

I'm using SQL Server, and I have a table with two columns, both varchar, column A and column B. I need to produce a list in alphabetical order, however only one of these columns will ever have a value in it (ie, if column A has a value, then column B will be NULL and vice versa).
How can I write an ORDER BY clause in my T-SQL query to produce a list that checks both columns to see which one has the value present, then order the rows alphabetically?
Use COALESCE, which returns the first non-null argument:
order by coalesce(columnA, columnB) asc
There are some standard options to do this. What you choose is mostly personal "taste". The most "explicit" way is using CASE WHEN:
ORDER BY CASE WHEN columnA IS NULL THEN columnB ELSE columnA END;
By explicit, I mean you clearly understand it without knowing about specific functions that check this.
COALESCE is the standard SQL function for this and is supported by virtually every database:
ORDER BY COALESCE(columnA,columnB);
This has the advantage of being much shorter, especially when several columns should replace each other when NULL.
SQL Server additionally provides the function ISNULL, which expects exactly two arguments:
ORDER BY ISNULL(columnA,columnB);
The advantage of this is that the name is a bit more descriptive than "COALESCE"; according to some performance articles and tests, it is also faster than the other two options. The disadvantage is that this function will not work on other databases.
Overall, as I said, which option you choose is mainly a matter of personal taste.
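A quick way to convince yourself that the CASE and COALESCE variants sort identically is the sketch below. The table name names and its rows are made up, and SQLite (through Python's sqlite3) stands in for SQL Server here; ISNULL is not shown because it is specific to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE names (columnA TEXT, columnB TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    # Exactly one of the two columns is filled in each row, as in the question.
    [("pear", None), (None, "apple"), ("mango", None), (None, "zebra")],
)

# Variant 1: COALESCE picks whichever column is populated.
by_coalesce = conn.execute(
    "SELECT COALESCE(columnA, columnB) FROM names "
    "ORDER BY COALESCE(columnA, columnB) ASC"
).fetchall()

# Variant 2: the explicit CASE WHEN form of the same expression.
by_case = conn.execute(
    "SELECT COALESCE(columnA, columnB) FROM names "
    "ORDER BY CASE WHEN columnA IS NULL THEN columnB ELSE columnA END"
).fetchall()

print([r[0] for r in by_coalesce])
```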

SQL Server Case expression conditions

I am currently working on a data flow and have been given a specific requirement that I am trying to complete.
In my table I have a column which is partially NULL due to a couple of reasons.
What I'm trying to do is write a case expression within my select statement that has two conditions:
When NULL use a different value from another column (which is pulled from another table using a join)
If the column is still NULL (in both cases) then use a different value from another column in the table which will ensure the column is populated.
So basically, if it's NULL do this, and if it's still NULL then do that, which will mean my column is populated as I intend.
I've been playing around but have been unable to produce the required result. Is this something that can be achieved using a CASE expression?
Any help/advice would be appreciated.
Thanks.
You are describing the coalesce() function:
coalesce(col1, col2, col3)
You can use this in either a select or update.
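Here is a rough sketch of the two-level fallback described in the question: first the column itself, then a column from a joined table, then a column that is always populated. All table and column names (orders, customers, ship_name, fallback_name) are invented for the example, and SQLite via Python's sqlite3 is used in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)

# Hypothetical schema: ship_name is partially NULL, fallback_name is always set.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, ship_name TEXT, "
    "customer_id INTEGER, fallback_name TEXT NOT NULL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, None)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Direct", 1, "Default"),  # own value present
        (2, None, 1, "Default"),      # falls back to the joined customer name
        (3, None, 2, "Default"),      # joined value is NULL too
        (4, None, None, "Default"),   # no matching customer at all
    ],
)

rows = conn.execute(
    """
    SELECT o.id, COALESCE(o.ship_name, c.name, o.fallback_name) AS resolved_name
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()
print(rows)
```

The arguments are checked left to right, so the order in which you list the columns is the order of preference.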

SQL 2 HAVING syntax

My professor is teaching SQL 2 and used a statement like the one below in a query:
HAVING SUM(column) > subselect
Where subselect is something like SELECT AVG(column) FROM ...
This subselect returns only one value, but I could not understand how it is possible to compare a function (the sum) with a subselect. The subselect should return a table, right? Then how is it possible to compare a table with a value? That did not make sense to me.
Thanks in advance.
SQL has a concept of scalar subqueries. These are subqueries that return exactly one column and at most one row. Scalar subqueries can be used in almost all cases where a single value ("scalar") can be used.
If the scalar subquery returns no rows, then the value is treated as NULL.
(I should add that some databases support tuples. A tuple is a set of scalar values that is treated as a single value. In such databases "scalar" subqueries can return more than one value, but these are converted to a tuple. This is not relevant to the question being asked; tuples are just another example of a "single" value.)
In principle you are right, if you look at it in a relational way.
But SQL is an industry standard and allows, as a shorthand, comparing a scalar value with a result table that has only a single row and column.
If the subquery in such a comparison returns more than one row, most implementations raise an error, although a few are lenient (SQLite, for example, silently uses the first row). To compare a scalar with a list of values (a column with more than one value), you should write ... value > ALL (subselect) or ... value > ANY (subselect).
This syntax is valid in both the WHERE and the HAVING clause.
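A small sketch of a scalar subquery in a HAVING clause, in the shape of the professor's statement. The sales table and its numbers are invented, and SQLite (through Python's sqlite3) is used only so the query can be executed locally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used for illustration (assumption)
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 5), ("east", 40)],
)

# The subquery yields one row with one column (the overall average, 18.75),
# so it can be used wherever a single value is expected.
groups = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > (SELECT AVG(amount) FROM sales)
    ORDER BY region
    """
).fetchall()

# A scalar subquery that matches no rows evaluates to NULL (None in Python).
empty = conn.execute(
    "SELECT (SELECT amount FROM sales WHERE region = 'west')"
).fetchone()

print(groups, empty)
```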

Which is better with respect to database performance?

We are expecting null values in a particular column, and we would like to capture them in the output as well. There are two possible values other than null: WE and EA. So, out of the two syntaxes given below, which one performs better?
…( "Src_Dtl"."REGN" not in ('WE','EA') or
"Src_Dtl"."REGN" is null)…
or
...(coalesce(CVRG_REGN, 'WE'))...
Thanks in advance.
A comparison with NULL never evaluates to true (it yields UNKNOWN), so for a NULL value the test not in ('a','b') does not match and the row is not included. We are therefore forced to add a second clause, or something is null. That makes two operations on the data.
The coalesce function says: give me the first non-null value in the list. It is one operation (check whether the value is null) with two possible outcomes. On that reasoning it does less work per row than evaluating two predicates, although in practice the difference is usually negligible, and wrapping a column in a function can prevent the optimizer from using an index on that column.
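The NULL behaviour described here can be checked with a short sketch. The table name src_dtl and the id column are assumptions made for the example (the question only shows fragments), and SQLite via Python's sqlite3 stands in for the actual database. It says nothing about performance; it only shows which rows each form touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption for this sketch)
conn.execute("CREATE TABLE src_dtl (id INTEGER PRIMARY KEY, regn TEXT)")  # hypothetical
conn.executemany(
    "INSERT INTO src_dtl VALUES (?, ?)",
    [(1, "WE"), (2, "EA"), (3, None)],  # the only values the question expects
)

# NOT IN alone: the NULL row is silently dropped.
strict = [r[0] for r in conn.execute(
    "SELECT id FROM src_dtl WHERE regn NOT IN ('WE', 'EA') ORDER BY id"
)]

# Adding IS NULL brings the NULL row back.
with_null = [r[0] for r in conn.execute(
    "SELECT id FROM src_dtl WHERE regn NOT IN ('WE', 'EA') OR regn IS NULL ORDER BY id"
)]

# COALESCE does something different: it replaces NULL with 'WE' in the output.
coalesced = conn.execute(
    "SELECT id, COALESCE(regn, 'WE') FROM src_dtl ORDER BY id"
).fetchall()

print(strict, with_null, coalesced)
```

Note that the two fragments from the question are not interchangeable: the first filters rows, the second substitutes a value, so it is worth settling which result is wanted before comparing speed.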

SQL - Using MAX in a WHERE clause

Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
Of course the desired effect can be achieved using a subquery, but my question is why SQL was designed this way: is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a value of some data type with a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in the FROM clause.
A where clause checks every row to see if it matches the conditions specified.
MAX computes a single value from a row set. If you put MAX, or any other aggregate function, into a WHERE clause, how can SQL Server figure out which rows the MAX function can use before the WHERE clause has finished filtering?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
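To make the WHERE-then-GROUP BY-then-HAVING order concrete, here is a small sketch. The readings table and its values are hypothetical, and SQLite (via Python's sqlite3) is used instead of SQL Server; the logical ordering of these clauses is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server (assumption)
conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1), ("a", 9), ("b", 3), ("b", 4), ("c", -5), ("c", 20)],
)

rows = conn.execute(
    """
    SELECT sensor, MAX(value) AS peak
    FROM readings
    WHERE value > 0          -- row-by-row filter, applied before any grouping
    GROUP BY sensor          -- groups are built from the surviving rows
    HAVING MAX(value) > 5    -- aggregate condition, applied per group
    ORDER BY sensor
    """
).fetchall()
print(rows)
```

Sensor b survives the WHERE filter but its group maximum is 4, so HAVING removes it.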
Maybe this works:
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expression) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
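The sketch below shows both sides of this: the aggregate placed directly in WHERE is rejected, while the sub-select form works. The items table is made up, and SQLite (through Python's sqlite3) is used as the engine, so the exact error text will differ from what SQL Server reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used for illustration (assumption)
conn.execute("CREATE TABLE items (blah TEXT, attribute INTEGER)")  # hypothetical
conn.executemany("INSERT INTO items VALUES (?, ?)", [("x", 1), ("y", 5), ("z", 5)])

# An aggregate directly in WHERE is refused when the statement is compiled.
error = None
try:
    conn.execute("SELECT blah FROM items WHERE attribute = MAX(attribute)")
except sqlite3.OperationalError as exc:
    error = str(exc)

# The sub-select computes the maximum over all rows first, then WHERE
# compares each individual row against that single value.
rows = conn.execute(
    "SELECT blah FROM items "
    "WHERE attribute = (SELECT MAX(attribute) FROM items) "
    "ORDER BY blah"
).fetchall()

print(error)
print(rows)
```

Both rows that share the maximum are returned, since the comparison is still evaluated one row at a time.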
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC
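One thing to keep in mind with this approach is that it returns a single row even when several rows share the maximum. The sketch below uses SQLite through Python's sqlite3, where LIMIT 1 plays the role of SQL Server's TOP 1 (that substitution is mine, not part of the answer); the items table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; LIMIT 1 replaces TOP 1 here
conn.execute("CREATE TABLE items (blah TEXT, attribute INTEGER)")  # hypothetical
conn.executemany("INSERT INTO items VALUES (?, ?)", [("x", 1), ("y", 5), ("z", 5)])

# Sort by the expression, largest first, and keep only the first row.
top_one = conn.execute(
    "SELECT blah FROM items ORDER BY attribute DESC LIMIT 1"
).fetchall()

# For comparison: the sub-select form keeps every row that ties for the maximum.
all_max = conn.execute(
    "SELECT blah FROM items "
    "WHERE attribute = (SELECT MAX(attribute) FROM items) ORDER BY blah"
).fetchall()

print(top_one, all_max)
```

Which of the tied rows comes back from the first query is not defined unless you add a tie-breaker to the ORDER BY.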