How to express "either the single resulting record or NULL", without an inner-query LIMIT? - sql

Consider the following query:
SELECT (SELECT MIN(col1) FROM table1) = 7;
Assuming col1 is non-NULLable, this yields either true or false, or NULL when table1 is empty.
But now suppose I have:
SELECT (
SELECT
FIRST_VALUE (col2) OVER (
ORDER BY col1
) AS col2_for_first_col1
FROM table1
) = 7;
(and assume col2 is also non-NULLable for simplicity.)
If there is a unique col2 value for the lowest col1 value, or the table is empty, then this works just like before. But if there are multiple col2 values for the lowest col1, I'm going to get a query runtime error.
My question: What is a short, elegant way to get NULL from this last query also in the case of multiple inner-query results? I could of course duplicate it and check the count, but I would rather avoid that.
Important caveat: I'm using MonetDB, and it doesn't seem to support ORDER BY ... LIMIT 1 on inner queries.

Without the MonetDB limitation, you would seem to want:
SELECT (SELECT col2
FROM table1
ORDER BY col1
LIMIT 1
) = 7;
With the limitation, you can use window functions differently:
SELECT (SELECT col2
FROM (SELECT col2, ROW_NUMBER() OVER (ORDER BY col1) as seqnum
FROM table1
) t
WHERE seqnum = 1
) = 7;
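Note that on ties in col1 the ROW_NUMBER() form picks one of the tied rows rather than yielding NULL. If NULL is really wanted in that case, one possible sketch (not from the answer above) compares MIN(col2) and MAX(col2) among the rows with the lowest col1, which needs neither LIMIT nor a duplicated count. It assumes "multiple" means more than one distinct col2 value; the helper name first_col2_equals_7 is hypothetical, and SQLite through Python's sqlite3 is only a stand-in engine here, so it has not been checked on MonetDB.

```python
import sqlite3

# Tie-aware variant: the CASE yields NULL when the lowest col1 has
# more than one distinct col2 value (or when the table is empty).
TIE_AWARE_SQL = """
SELECT (
    SELECT CASE WHEN MIN(col2) = MAX(col2) THEN MIN(col2) END
    FROM table1
    WHERE col1 = (SELECT MIN(col1) FROM table1)
) = 7
"""

def first_col2_equals_7(rows):
    """Load (col1, col2) pairs into a fresh table1 and run the comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER NOT NULL, col2 INTEGER NOT NULL)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    (result,) = conn.execute(TIE_AWARE_SQL).fetchone()
    conn.close()
    return result  # 1 (true), 0 (false) or None (SQL NULL)

print(first_col2_equals_7([(1, 7), (2, 9)]))  # unique lowest col1
print(first_col2_equals_7([(1, 7), (1, 8)]))  # two col2 values at the lowest col1
print(first_col2_equals_7([]))                # empty table
```

Rows that tie on col1 but carry the same col2 are treated as a single value by this sketch, so they still compare normally.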

Related

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
But how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there are multiple occurrences of a col2 value I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (e.g. DISTINCT) have arguments I could set? I could imagine something like KeepAllColumns = TRUE for this DISTINCT function. Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col2) as seqnum
from mytable t
) t
where seqnum = 1;
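row_number() enumerates the rows within each col2 partition, starting with 1. The ORDER BY inside the OVER clause controls which row is numbered first (oldest, newest, biggest, smallest, ...). Ordering by col2 itself, as above, leaves the choice within a partition arbitrary; use ORDER BY col1 to keep the row with the lowest col1.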
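As a quick check of the idea, here is a small sketch that loads the question's sample rows and keeps the first row per col2, ordering each partition by col1. SQLite via Python's sqlite3 is assumed purely as a convenient engine, and the names ranked and first_per_col2 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", "red"), (2, "A", "green"), (3, "B", "purple"), (4, "C", "blue")],
)

# Number the rows inside each col2 group by ascending col1, keep number 1.
first_per_col2 = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS seqnum
          FROM mytable t) AS ranked
    WHERE seqnum = 1
    ORDER BY col1
""").fetchall()

print(first_per_col2)
# [(1, 'A', 'red'), (3, 'B', 'purple'), (4, 'C', 'blue')]
```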
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col2) = 1 -- Get 1st row per group
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.
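A small sketch of the NOT EXISTS form on the question's sample data, again assuming SQLite via Python's sqlite3 as a stand-in engine (the helper keep_lowest_col1 and the alias o are hypothetical). It also shows one difference from row_number(): rows that tie on both col2 and col1 are all kept.

```python
import sqlite3

def keep_lowest_col1(rows):
    """Return the rows that have no same-col2 row with a smaller col1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    kept = conn.execute("""
        SELECT m.col1, m.col2, m.col3
        FROM mytable m
        WHERE NOT EXISTS (
            SELECT 1 FROM mytable o
            WHERE o.col2 = m.col2 AND o.col1 < m.col1
        )
        ORDER BY m.col1, m.col3
    """).fetchall()
    conn.close()
    return kept

sample = [(1, "A", "red"), (2, "A", "green"), (3, "B", "purple"), (4, "C", "blue")]
print(keep_lowest_col1(sample))
# With a second row tying on (col1, col2), both tied rows survive.
print(keep_lowest_col1(sample + [(4, "C", "teal")]))
```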

Select group by with a max predicate

Quite often I have to do queries like below:
select col1, max(id)
from Table
where col2 = 'value'
and col3 = ( select max(col3)
from Table
where col2 = 'value'
)
group by col1
Are there any other ways to do this that avoid subqueries and temp tables? Basically I need a GROUP BY over all the rows with a particular max value. Assume all proper indices are in place.
You can use an OLAP function to achieve this. I would say this solution is marginally better in that your predicates are not duplicated between the main query and subquery, so you don't violate DRY:
SELECT col1, MAX(id) AS max_id
FROM (
select col1, id,
RANK() OVER (ORDER BY col3 DESC) AS irow
from Table
where col2 = 'value'
) subquery
WHERE subquery.irow = 1
GROUP BY col1
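To sanity-check the window-function idea against the original subquery, here is a sketch that ranks all filtered rows by col3 first and groups afterwards, then compares the two results. It assumes SQLite via Python's sqlite3; the table name t (standing in for Table, which is a reserved word) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical stand-in for the question's Table.
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "x", "value", 10),
    (2, "x", "value", 20),
    (3, "x", "value", 20),
    (4, "y", "value", 20),
    (5, "y", "value", 5),
    (6, "z", "other", 99),
])

# Original shape: the col2 predicate appears twice.
subquery_form = conn.execute("""
    SELECT col1, MAX(id) FROM t
    WHERE col2 = 'value'
      AND col3 = (SELECT MAX(col3) FROM t WHERE col2 = 'value')
    GROUP BY col1 ORDER BY col1
""").fetchall()

# Window shape: rank once over the filtered rows, then group the rank-1 rows.
window_form = conn.execute("""
    SELECT col1, MAX(id)
    FROM (SELECT col1, id, RANK() OVER (ORDER BY col3 DESC) AS irow
          FROM t WHERE col2 = 'value') AS ranked
    WHERE irow = 1
    GROUP BY col1 ORDER BY col1
""").fetchall()

print(subquery_form, window_form)
```

RANK() rather than ROW_NUMBER() matters here, since every row sharing the maximum col3 has to get rank 1.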

Is the result from an 'OR' clause in SQL SELECT defined?

What I mean is this:
If I have SELECT statement as follows:
SELECT * FROM TAB1 WHERE COL1 = ('VAL1' OR 'VAL2') AND ...
If the table in question has entries for both 'VAL1' and 'VAL2', does the order in the WHERE clause make any difference?
If I would prefer to get back a record with 'VAL1' but would take 'VAL2' if the former is not available, will the above SELECT guarantee this? Or is the response undefined in this case? And would this behavior furthermore perhaps be vendor-specific?
No, you can't do it this way: it could return the wrong record, depending on the rest of the WHERE clause and the order of the rows.
Try:
SELECT * FROM TAB1
WHERE (COL1 = 'VAL1'
OR (COL1 = 'VAL2' AND (SELECT COUNT(*) FROM TAB1 WHERE COL1 = 'VAL1') = 0)) AND ...
Do a UNION ALL. Let the first SELECT return the VAL1 rows, and have a second SELECT return the VAL2 rows, but only if no VAL1 rows exist.
SELECT * FROM TAB1
WHERE COL1 = 'VAL1'
union all
SELECT * FROM TAB1
WHERE COL1 = 'VAL2'
and NOT EXISTS (SELECT * FROM TAB1 WHERE COL1 = 'VAL1')
Basically the same as @Claudio Biselli's answer, but easier to optimize.
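A runnable sketch of the UNION ALL fallback, assuming SQLite via Python's sqlite3 as a stand-in engine. The note column and the helper preferred_rows are hypothetical and exist only to make the result visible.

```python
import sqlite3

# VAL1 rows always qualify; VAL2 rows only when no VAL1 row exists.
FALLBACK_SQL = """
    SELECT col1, note FROM tab1 WHERE col1 = 'VAL1'
    UNION ALL
    SELECT col1, note FROM tab1
    WHERE col1 = 'VAL2'
      AND NOT EXISTS (SELECT * FROM tab1 WHERE col1 = 'VAL1')
"""

def preferred_rows(rows):
    """Load (col1, note) pairs into a fresh tab1 and apply the fallback query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (col1 TEXT, note TEXT)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?)", rows)
    result = conn.execute(FALLBACK_SQL).fetchall()
    conn.close()
    return result

print(preferred_rows([("VAL2", "b"), ("VAL1", "a")]))  # VAL1 present
print(preferred_rows([("VAL2", "b"), ("VAL3", "c")]))  # only VAL2 available
```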
I would rank the results by preference and then keep the most preferred record, as below:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY CASE WHEN col1 = 'VAL1' THEN 1
WHEN col1 = 'VAL2' THEN 2
END ASC) AS rnk
FROM TAB1
WHERE COL1 IN ('VAL1', 'VAL2')
) X
WHERE X.rnk = 1
Maybe use IN instead of OR
For example:
SELECT * FROM TAB1 WHERE COL1 IN ('VAL1','VAL2') AND ...
All conditions in a WHERE clause are applied simultaneously in SQL. There is no order guaranteed or implied at all. This is true for any SQL DBMS, not just SAP HANA.
To have a weighted selection criterion you need to create a weighting function, just as the other replies indicated.
One approach, that is rather easy to implement, is to first create SELECTs for all possible weights and then choose the entries with the highest/lowest weight.
For example:
WITH weighted AS (
SELECT 100 as SEL_WEIGHT, PK1, PK2
FROM TAB1 WHERE COL1 = 'VAL1'
UNION ALL
SELECT 60 as SEL_WEIGHT, PK1, PK2
FROM TAB1 WHERE COL1 = 'VAL2'
UNION ALL
SELECT -1 as SEL_WEIGHT, PK1, PK2
FROM TAB1 WHERE COL1 IS NULL)
SELECT
*
FROM
weighted
WHERE (PK1, PK2, SEL_WEIGHT) in
(SELECT PK1, PK2, MAX(SEL_WEIGHT) as MAX_WEIGHT
FROM weighted
GROUP BY PK1, PK2);
In this example, the sub-select finds the maximum weight and the related primary key columns (PK1, PK2) and joins them back as the final selection criterion to the main SELECT.
Records with COL1 ='VAL1' are weighted highest (100) and preferred over those with COL1='VAL2' (60) or those where COL1 is NULL (-1).
The benefit of such a construct is that it is easy to see how the weights are defined and that each weight can have its own independent combination of conditions.
As weighted selection criteria are non-standard in SQL systems I highly recommend making the implementation as obvious as possible to avoid confusion and misunderstanding later on.
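If the goal is the question's fallback behaviour (keep only the rows carrying the best weight that occurs at all), one possible variation on the weighted CTE compares each row against the overall maximum weight instead of the per-key maximum. This is a sketch under assumptions: SQLite via Python's sqlite3 as the engine, a single key column pk1 instead of (PK1, PK2), and a hypothetical helper best_weighted.

```python
import sqlite3

BEST_WEIGHT_SQL = """
    WITH weighted AS (
        SELECT 100 AS sel_weight, pk1, col1 FROM tab1 WHERE col1 = 'VAL1'
        UNION ALL
        SELECT 60 AS sel_weight, pk1, col1 FROM tab1 WHERE col1 = 'VAL2'
        UNION ALL
        SELECT -1 AS sel_weight, pk1, col1 FROM tab1 WHERE col1 IS NULL
    )
    SELECT pk1, col1, sel_weight
    FROM weighted
    WHERE sel_weight = (SELECT MAX(sel_weight) FROM weighted)
    ORDER BY pk1
"""

def best_weighted(rows):
    """Load (pk1, col1) pairs and return only the rows with the top weight."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (pk1 INTEGER PRIMARY KEY, col1 TEXT)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?)", rows)
    result = conn.execute(BEST_WEIGHT_SQL).fetchall()
    conn.close()
    return result

print(best_weighted([(1, "VAL2"), (2, "VAL1"), (3, None)]))
print(best_weighted([(1, "VAL2"), (3, None)]))
```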

BigQuery: Use COUNT as LIMIT

I want to select everything from mytable1 and combine that with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2 and I want the final table to be a 50-50 mix of data from each table. While I feel like the following code expresses what I want logically, it doesn't work syntax wise:
Syntax error: Expected "@" or integer literal or keyword CAST but got "(" at [3:1]
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM `mytable1`)
)
I'm using standard SQL in BigQuery.
The docs state that the LIMIT clause accepts only literal or parameter values. I think you can number the rows of the second table with ROW_NUMBER() and limit based on that:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
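The same shape can be tried out locally. The sketch below uses SQLite via Python's sqlite3 rather than BigQuery, with single-column tables and a hypothetical source column added only so the two halves of the result can be counted. An empty OVER () numbers the rows in no particular order, so which rows of mytable2 get through is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable1 (col1 INTEGER)")
conn.execute("CREATE TABLE mytable2 (col1 INTEGER)")
conn.executemany("INSERT INTO mytable1 VALUES (?)", [(i,) for i in range(3)])
conn.executemany("INSERT INTO mytable2 VALUES (?)", [(i,) for i in range(100, 110)])

# Number mytable2's rows, then keep as many as mytable1 has rows.
mixed = conn.execute("""
    SELECT 'mytable1' AS source, col1 FROM mytable1
    UNION ALL
    SELECT 'mytable2' AS source, col1
    FROM (SELECT col1, ROW_NUMBER() OVER () AS rn FROM mytable2) AS x
    WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
""").fetchall()

sources = [source for source, _ in mixed]
print(sources.count("mytable1"), sources.count("mytable2"))
```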
Each SELECT statement within a UNION must have the same number of columns.
The columns must also have similar data types.
The columns in each SELECT statement must also be in the same order.
If mytable1 has fewer columns than mytable2, you have to select the same number of columns from both:
select col1,col2,col3,'' as col4 from mytable1 -- pad a missing column with an aliased constant
union all
select col1,col2,col3,col4 from mytable2

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to perform a max() and a GROUP BY, which prevents me from retrieving the other columns from the row that held the max (because they are not contained in a GROUP BY or an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which all match the largest value for Col2), I have a bit of a headache.
If there were an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005+, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
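As I understand it, it could be something like this:
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT MIN(t.Col3) FROM Table t WHERE t.Col1 = tab2.Col1 AND t.Col2 = tab2.Col2)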
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).
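The question also asks for the smallest Col3 among the rows that tie on the largest Col2. One way to sketch that is to add Col3 ASC as a second sort key inside the OVER clause. The example below assumes SQLite via Python's sqlite3 instead of SQL Server, and uses a hypothetical table name tbl with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tbl" is a hypothetical stand-in for the question's Table.
conn.execute("CREATE TABLE tbl (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("a", 5, 30), ("a", 5, 10), ("a", 5, 20), ("a", 1, 0),
    ("b", 7, 4),
])

# Highest Col2 first; among equal Col2 values, lowest Col3 first.
rows = conn.execute("""
    WITH T AS (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
        FROM tbl
    )
    SELECT Col1, Col2, Col3 FROM T WHERE RowNumber = 1 ORDER BY Col1
""").fetchall()

print(rows)
# [('a', 5, 10), ('b', 7, 4)]
```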