Assuming that I have the following piece of code in the SELECT clause which is being executed on Spark:
...
MEAN(CASE
       WHEN (col1 = 'A'
             AND (col3 = 'A' OR col4 = 'B')) THEN col2
     END) AS testing,
...
What would be the output of this query when col2 is NULL? Will the rows containing col2 = NULL be ignored by the MEAN function?
Disclaimer - I don't know Apache Spark!
I've created a SQL Fiddle - http://sqlfiddle.com/#!9/6f7d5e/3.
If col2 is null, it is not included in the average. If all the matching records are null, the result will be NULL.
The result will have the type of col2 -- this might matter in some databases (or if you are saving the result to a table).
What is the MEAN() function? To calculate the average, use AVG(). This is the standard function for calculating averages in SQL.
Related
I am developing a data validation framework where I have the requirement of checking that table fields should have at least one non-null value, i.e. they shouldn't be completely empty, with all values null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0, I can tell that the column is not empty. I already have a list of columns, so I could execute this query in a loop for every column, but that would mean a lot of requests and is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count().
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
declare @sql nvarchar(max);

-- @list is assumed to hold a comma-separated list of column names
select @sql = concat('select ',
                     string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
                     ' from t'
                    )
from string_split(@list, ',') s;

exec sp_executesql @sql;
This might not quite work if your columns have special characters in them, but it illustrates the idea.
You should probably use exists, since you don't really need a count of anything.
You don't indicate how you want to consume the results of multiple counts; however, one thing you could do is use concat to return a list of the columns meeting your criteria:
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can name the result of each case expression and concatenate them; only the columns that have a non-null value are included, as Concat_ws ignores the nulls returned by the case expressions.
select Concat_ws(', ',
case when exists (select * from t where col1 is not null) then 'col1' end,
case when exists (select * from t where col2 is not null) then 'col2' end,
case when exists (select * from t where col3 is not null) then 'col3' end,
case when exists (select * from t where col4 is not null) then 'col4' end,
case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
I asked a similar question about a decade ago. The best way of doing this in my opinion would meet the following criteria.
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a not null value in every column under consideration allow it to exit early without reading the rest of the table/index as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES(CASE WHEN b IS NOT NULL THEN 'b' END),
(CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION ( HASH GROUP, MAXDOP 1, FAST 1)
Check the execution plan you get. Hash match usually reads all of its build input first, meaning that no short-circuiting of the scan will happen. If the optimiser gives you the operator in "flow distinct" mode, however, it won't do this, and query execution can potentially stop as soon as TOP receives its first two rows, signalling that a NOT NULL value has been found in both columns.
But there is no hint to request this mode for the hash aggregate, so you are dependent on the whims of the optimiser as to whether you will get it in practice. The various hints I have added to the query above are an attempt to point it in that direction, however.
I am not getting the expected result when combining a NOT NULL check with one numeric value on the same column. It's returning all the values.
Select *
from TableName
where Col1 = value
and (col2 is Not Null or col2 <> 123)
Here, col2 is a numeric column.
Expected result: exclude records where Col2 has a NULL value or 123.
Please help me with this issue.
I'm guessing that you are getting all the rows, except the NULL ones. One solution is to replace the or with and. But that logic is actually redundant:
where Col1 = value and col2 <> 123
NULL fails the <> 123 comparison, so you don't need another check.
I get the correct result from the query below. Per the requirement, I had been using OR in place of AND.
Select *
from TableName
where Col1 = value
and (col2 is Not Null and col2 <> 123)
I am trying to write a PL/SQL procedure which will have the SQL query to get the results. But the requirement is that the order by can be dynamic and is mainly for sorting the columns in the screen. I am passing 2 parameters to this procedure - in_sort_column and in_sort_order.
The requirement is such that on text columns the sorting is in ASC and for numbers it is DESC.
My query looks something like this without adding the in_sort_order -
SELECT col1, col2, col3 from table1 where col1 > 1000
ORDER BY decode(in_sort_column, 'col1', col1, 'col2', col2, 'col3', col3);
I am not able to figure out how to use the in_sort_order parameter in this case. Can someone who has done this before help out?
Thanks
When doing a dynamic sort, I recommend using separate clauses:
order by (case when in_sort_column = 'col1' then col1 end),
(case when in_sort_column = 'col2' then col2 end),
(case when in_sort_column = 'col3' then col3 end)
This guarantees that you will not have an unexpected problem with type conversion if the columns are of different types. Note that case returns NULL without an else clause.
Since the requirement is based on data type, you could just negate the numeric columns in your decode; if col1 is numeric and the others are text then:
ORDER BY decode(in_sort_column, 'col1', -col1, 'col2', col2, 'col3', col3);
But this is going to attempt to convert the text columns to numbers. You can swap the decode order around to avoid that, but then you do an implicit conversion of your numeric column to a string, and your numbers will be sorted alphabetically - so 2 comes after 10, for example.
So Gordon Linoff's use of case is better, and you can still negate the col1 value with that to make the numbers effectively sort descending.
I am wondering whether is possible to assign a value to a casted column in SQL depending on real table values.
For Example:
select *, cast(null as number) as value from table1
where if(table1.id > 10 then value = 1) else value = 0
NOTE: I understand the above example is not completely Oracle, but, it is just a demonstration on what I want to accomplish in Oracle. Also, the above example can be done multiple ways due to its simplicity. My goal here is to verify if it is possible to accomplish the example using casted columns (columns not part of table1) and some sort of if/else.
Thanks,
Y_Y
select table1.*, (case when table1.id > 10 then 1 else 0 end) as value
from table1
We'd like to write this query:
select * from table
where col1 != 'blah' and col2 = 'something'
We want the query to include rows where col1 is null (and col2 = 'something'). Currently the query won't return rows where col1 is null. Is the query below the best and fastest way?
select * from table
where (col1 != 'blah' or col1 is null) and col2 = 'something'
Alternatively, we could if needed update all the col1 null values to empty strings. Would this be a better approach? Then our first query would work.
Update: Re: using NVL: I've read on another post that this is not considered a great option from a performance perspective.
In Oracle, there is no difference between an empty string and NULL.
That is blatant disregard for the SQL standard, but there you go ...
In addition to that, you cannot compare against NULL (or not NULL) with the "normal" operators: "col1 = null" will not work, "col1 = ''" will not work, "col1 != null" will not work; you have to use "is null".
So, no, you cannot make this work any other way than "col1 is null" or some variation on that (such as using nvl).
I think that the solution you posted is one of the best options.
Regarding performance, in my opinion there is not a big difference in this case. If the clause already has a != comparison, the optimizer usually won't use an index on that column because the selectivity is not high enough, so the more discriminating filter will be the other side of the "and" condition.
If you ask me, I wouldn't use an empty string as a null, but maybe that is just a personal preference.
While not the most readable option, Oracle has an LNNVL function that is essentially the not() function, but with inverted behavior for nulls, meaning that any comparison involving null inside lnnvl will return true (I don't know what performance implications this may have).
To do what you want in a single statement:
select * from table where lnnvl(col1 = 'blah') and col2 = 'something'
Note that this will only work for comparing a nullable value against a value you can be assured is non-nullable. Otherwise you'll need to do as Thilo suggests and use an expression similar to
lnnvl(nvl(col1, -1) = nvl(col2, -1))
It depends on your data, but most optimizers are going to look at col2 before col1, since = is easier to index than !=.
Otherwise, there are various ways you can speed this query up. It's probably best to do (col1 != 'blah' or col1 is null), but some databases allow you to index a function. So you can index coalesce(col1, 0) and get good performance.
Really it depends on your data and your table.
In Oracle, use the nvl function
select * from table where nvl(col1,'notblah') <> 'blah'
If you want to speed up this sort of query, and you're on Oracle 10g or later, use a function-based index to turn those NULLs into values:
CREATE INDEX query_specific_index ON table (col2, NVL(col1,'***NULL***'));
select * from table
where NVL(col1,'***NULL***') != 'blah' and col2 = 'something';
The database will quite likely use the index in this scenario (of course, subject to the decision of the CBO, affected by row counts and the accuracy of the statistics). The query MUST use the exact expression given in the index - in this case, "NVL(col1,'***NULL***')"
Of course, pick a value for '***NULL***' that will not conflict with any data in col1!
What about this option? I think it may work if your value is never null.
where not (value = column)
which would result in the following truth table for evaluation of the where clause
                     col1
             | 'bla' | null |
value  'bla' |   F   |  T   |
       null  |   T   |  *T  |
*this is the only one that's "wrong", but that's ok since our value is never null
Update
Ok, I just tried out my idea and it failed. I'll leave the answer here to save time for others trying the same thing. Here are my results:
select 'x', 'x' from dual where not ('x' = 'x');
0 rows
select 'x', 'y' from dual where not ('x' = 'y');
1 row
select 'x', 'null' from dual where not ('x' = null);
0 rows
select 'null', 'null' from dual where not (null = null);
0 rows
Update 2
This solution works if your value is never null (matches the truth table above)
where ('blah' != col1 or col1 is null)
tests here:
select 'x', 'x' from dual where ('x' != 'x' or 'x' is null);
0 rows
select 'x', 'y' from dual where ('x' != 'y' or 'y' is null);
1 row
select 'x', 'null' from dual where ('x' != null or null is null);
1 row
select 'null', 'null' from dual where (null != null or null is null);
1 row
For Oracle
select * from table where nvl(col1, 'value') != 'blah' and col2 = 'something'
For SQL Server
select * from table where IsNull(col1, '') <> 'blah' and col2 = 'something'
I think the performance gain from changing NULL values to "" strings would be minimal. However, if 'blah' is not null, then it should include NULL values.
EDIT: I guess I'm surprised why I got voted down here. If 'blah' is not null or an empty string, then it should never matter, as you are already checking whether COL1 is not equal to 'blah', which is NOT a NULL or an empty string.