Catching null warnings in aggregate functions in SQL

How does one use the debugger in SQL Server 2008/2012 to catch null values in records?
See:
drop table abc
create table abc(
a int
)
go
insert into abc values(1)
insert into abc values(null)
insert into abc values(2)
select max(a) from abc
(1 row(s) affected)
Warning: Null value is eliminated by an aggregate or other SET operation.
Now this can be rectified by doing:
SELECT max(isNull(a,0)) FROM abc
which is fine, until I come to 200-line queries with several levels of nesting and a result set of 2,000-odd records -- and then have no clue which column is throwing the warning.
How do I add conditional breakpoints (or break on warning) in the SQL debugger? (if that is even possible)

Part 1: About aggregate warnings...
Considering your several levels of nesting, I am afraid there is no straightforward way of seeing which records trigger those warnings.
I think your best shot would be to remove each aggregate function, one at a time, from the SELECT part of the top-level statement and run the query, so you can see which aggregate is causing warnings at the top level (if any).
After that you should move on to the nested queries: move each sub-query that feeds the top-level aggregates to a separate window, run it there, and check for warnings. Repeat this for additional levels of nesting to find out what actually causes the warnings, as in the sketch below.
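For example, if the top-level SELECT computed two aggregates, you could comment each one out in turn and watch the Messages tab (a trivial sketch; someTable and the columns a and b are placeholders, not from the question):
SELECT MAX(a) /*, MIN(b) */ FROM someTable -- warning still shown? a contains the NULLs
SELECT /* MAX(a), */ MIN(b) FROM someTable -- warning gone? a was the culprit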
Alternatively, you can employ the method described in Part 2 below.
Part 2: About conditional breakpoints...
For the sake of debugging, move each of your nested queries out and put its data in a temp table. After that, check for null values in that temp table and set a breakpoint inside an IF statement. I believe this is the closest thing to a conditional breakpoint (the IF condition can be altered to express other conditions).
Here is a concrete example.
Instead of this:
SELECT myTableA.col1, myTableA.col2, SUM(myTableA.col3) as col3
FROM (SELECT X as col1, Y as col2, MAX(Z) as col3
      FROM (SELECT A as X, B as Y, MIN(C) as Z
            FROM myTableC
            GROUP BY A, B
           ) as myTableB
      GROUP BY X, Y
     ) as myTableA
GROUP BY myTableA.col1, myTableA.col2
do this:
SELECT A as X, B as Y, MIN(C) as Z
INTO #tempTableC
FROM myTableC
GROUP BY A, B

IF EXISTS (SELECT *
           FROM #tempTableC
           WHERE X IS NULL ) BEGIN
    SELECT 'A' --- Breakpoint here
END
SELECT X as col1, Y as col2, MAX(Z) as col3
INTO #tempTableB
FROM #tempTableC
GROUP BY X, Y

IF EXISTS (SELECT *
           FROM #tempTableB
           WHERE col1 IS NULL ) BEGIN
    SELECT 'B' --- Breakpoint here
END
SELECT col1, col2, SUM(col3) as col3
FROM #tempTableB as myTableA
GROUP BY col1, col2
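When you are done debugging, drop the temp tables (they also disappear automatically when the session closes):
DROP TABLE #tempTableC
DROP TABLE #tempTableB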

Aggregate functions exclude null values by definition, so you can just write
select max(a) from abc
instead of
SELECT max(isNull(a,0)) FROM abc
unless all values of a in abc are null, in which case the second query would return zero instead of null.
If you want to prevent null values from being entered in the first place, use a NOT NULL constraint on the table column.
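A minimal sketch against the question's table (adding the constraint fails if existing rows contain NULLs, so those must be removed or updated first):
DELETE FROM abc WHERE a IS NULL
ALTER TABLE abc ALTER COLUMN a int NOT NULL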

You can turn off the warning by executing:
set ansi_warnings off
This is explained in the documentation for SET ANSI_WARNINGS. It works, at least on the systems I've tested it on, to remove the warning when aggregating over NULL values.
This setting is also supposed to convert numeric overflows and divide-by-zero to NULLs rather than errors. However, I still get errors for divide by zero and arithmetic overflows.
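A minimal sketch against the question's abc table:
SET ANSI_WARNINGS OFF
SELECT MAX(a) FROM abc -- no "Null value is eliminated..." warning
SET ANSI_WARNINGS ON -- restore the default afterwards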
As an aside, when using SQL Server Management Studio, one rarely sees this message. When the query succeeds, the message appears on the "Messages" tab, but SSMS defaults to the "Results" tab, and there is usually no reason to look at the messages (although the warning is there). You only see the warning automatically when the query has an error, because SSMS then defaults to the Messages tab.

You'll have to write a second query to pull out the data that you're looking for.
SELECT * FROM abc WHERE a IS NULL
You can put that into an IF statement to write an error message, or log to a table; see the sketch below. Other than that, you're out of luck. Sorry. : /
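A minimal sketch of that idea (NullAuditLog is a hypothetical logging table, not something from the question):
IF EXISTS (SELECT 1 FROM abc WHERE a IS NULL)
BEGIN
    RAISERROR('Null values found in abc.a', 10, 1) -- severity 10 is informational, not an error
    -- or log to the hypothetical audit table instead:
    -- INSERT INTO NullAuditLog (TableName, ColumnName, LoggedAt)
    -- VALUES ('abc', 'a', GETDATE())
END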

Alternatively, you can simply exclude the rows with null values:
SELECT MAX(a) FROM abc WHERE a IS NOT NULL

Related

SQL query to find columns having at least one non null value

I am developing a data validation framework where I have a requirement to check that table fields have at least one non-null value, i.e. they shouldn't be completely empty, with every value null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0, I can tell that the column is not empty. I already have a list of columns, so I could execute this query in a loop for every column, but that would mean a lot of requests, and it is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count().
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
declare @list nvarchar(max) = N'col1,col2,col3'; -- your comma-separated column list
declare @sql nvarchar(max);

select @sql = concat('select ',
                     string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
                     ' from t')
from string_split(@list, ',') s;

exec sp_executesql @sql;
This might not quite work if your columns have special characters in them, but it illustrates the idea.
You should probably use exists, since you don't really need a count of anything.
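For instance, a minimal sketch for a single column (using the same table t as the other answers):
IF EXISTS (SELECT 1 FROM t WHERE col1 IS NOT NULL)
    SELECT 'col1 has at least one non-null value'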
You don't indicate how you want to consume the results of multiple counts; however, one thing you could do is use concat_ws to return a list of the columns meeting your criteria.
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can name the result of each case expression and concatenate them. Only the columns that have a non-null value are included, as concat_ws ignores the nulls returned by the case expressions.
select concat_ws(', ',
       case when exists (select * from t where col1 is not null) then 'col1' end,
       case when exists (select * from t where col2 is not null) then 'col2' end,
       case when exists (select * from t where col3 is not null) then 'col3' end,
       case when exists (select * from t where col4 is not null) then 'col4' end,
       case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
I asked a similar question about a decade ago. In my opinion, the best way of doing this would meet the following criteria:
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a not-null value in every column under consideration, allow it to exit early without reading the rest of the table/index, as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour:
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES (CASE WHEN b IS NOT NULL THEN 'b' END),
                    (CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION (HASH GROUP, MAXDOP 1, FAST 1)
Check the execution plan for the mode of the hash match operator. A hash match usually reads all of its build input first, meaning that no short-circuiting of the scan will happen. If the optimiser gives you the operator in "flow distinct" mode, however, it won't do this, and query execution can potentially stop as soon as TOP receives its first two rows, signalling that a NOT NULL value has been found in both columns.
But there is no hint to request that mode for the hash aggregate, so you are dependent on the whims of the optimiser as to whether you get it in practice. The various hints I added to the query above are an attempt to point it in that direction, however.

Can SELECT expressions sometimes be evaluated for rows not matching WHERE clause?

I would like to know whether expressions in the SELECT list can be evaluated for rows that do not match the WHERE clause.
From the execution order documented here, it seems that the SELECT is evaluated long after the WHERE; however, I ran into a very weird problem with a real-life query similar to the one below.
For context: in the example, SomeOtherTable has an a_varchar column which always contains numerical values for code 105, but may contain non-numerical values for other codes.
This query works:
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
The following query complains about being unable to cast a_varchar to int:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT an_id, CAST(a_varchar AS int)
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
And finally, the following query works:
SELECT 1
FROM (
SELECT an_id, an_integer FROM SomeTable
UNION ALL
SELECT
an_id,
CASE code WHEN 105 THEN CAST(a_varchar AS int) ELSE NULL END
FROM SomeOtherTable
WHERE code = 105
) i
INNER JOIN AnotherOne a
ON a.an_id = i.an_id
Therefore, the only explanation I could find is that with the JOIN, the query gets optimized differently, in a way that causes CAST(a_varchar AS int) to be executed even when code <> 105.
The queries are run against SQL Server 2008.
Absolutely.
The documentation that you reference has a section called Logical Processing Order of the SELECT statement. This is not the physical processing order; it explains how the query itself is interpreted. For instance, an alias defined in the SELECT clause cannot be referenced in the WHERE clause, because the WHERE clause is logically processed first.
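A quick sketch of that point:
select a_varchar as v
from t
where v is not null; -- fails: invalid column name 'v', because WHERE is processed before SELECT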
In fact, SQL Server has the ability to optimize queries by doing various data transformation operations when it reads the data. This is a nice performance benefit, because the data is in memory, locally, and the operations can simply be done in place. However, the following can fail with a run-time error:
select cast(a_varchar as int)
from t
where a_varchar not like '%[^0-9]%';
In the real process flow, the filter is applied after the attempted conversion. I happen to consider this a bug; presumably the folks at Microsoft do not, because they have not bothered to fix it.
Two workarounds are available. The first is try_convert(), which returns NULL for a failed conversion instead of a run-time error. The second is a case expression:
select (case when a_varchar not like '%[^0-9]%' then cast(a_varchar as int) end)
from t
where a_varchar not like '%[^0-9]%';
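For completeness, a minimal sketch of the try_convert() variant (note that try_convert() requires SQL Server 2012 or later, so it is not available on the 2008 instance from the question; the WHERE filter also becomes optional, since failed conversions simply yield NULL):
select try_convert(int, a_varchar)
from t
where a_varchar not like '%[^0-9]%';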

SQL Server Empty Result

I have a valid SQL select which returns an empty result, up until a specific transaction has taken place in the environment.
Is there something available in SQL itself that will allow me to return a 0 as opposed to an empty dataset? Similar to isNull('', 0) functionality. Obviously I tried that and it didn't work.
PS. Sadly I don't have access to the database, or the environment, I have an agent installed that is executing these queries so I'm limited to solving this problem with just SQL.
FYI: take any select and run it where the condition is not fulfilled (where LockCookie = '777777777', for example). If that condition is never met, the result is empty. But at some point the query will succeed, based on a set of operations/tasks that happen. Up until that event has occurred, though, I would like to return 0.
You can store your result in a temp table and check @@rowcount.
select ID
into #T
from YourTable
where SomeColumn = @SomeValue

if @@rowcount = 0
    select 0 as ID
else
    select ID
    from #T

drop table #T
If you want this as one query with no temp table you can wrap your query in an outer apply against a dummy table with only one row.
select isnull(T.ID, D.ID) as ID
from (values(0)) as D(ID)
outer apply
(
select ID
from YourTable
where SomeColumn = @SomeValue
) as T
An alternative way, from application code, is to check the row count of the DataSet:
DsData.Tables[0].Rows.Count > 0
Make sure that your query matches your conditions.

Insert to table from set returning function with parameters

This is probably something simple, but I couldn't figure it out.
I have a table Summary and a function GetSummary that returns rows as a set of Summary. I can query it like this:
SELECT GetSummary(arg1, arg2)
GetSummary
-----------
(val1, val2, val3)
And like this, which returns the actual columns:
SELECT * FROM GetSummary(arg1, arg2)
col1 | col2 | col3
------------------------
val1 | val2 | val3
Inserting into Summary works fine:
INSERT INTO Summary (SELECT * FROM GetSummary(arg1, arg2));
INSERT 0 1
But I can't figure out how to insert several rows at once based on columns in another table. I would like to do something like this:
INSERT INTO Summary (SELECT FROM GetSummary(OtherTable.x, OtherTable.y)
FROM OtherTable WHERE <some query>);
That fails because SELECT FROM GetSummary .. doesn't return Summary table rows. The query SELECT * FROM GetSummary .. would do that, but then I don't know how to write the query.
Edit
I happened to stumble on the solution a few minutes after posting. The right syntax is:
INSERT INTO Summary (SELECT (GetSummary(OtherTable.x, OtherTable.y)).*
FROM OtherTable WHERE <some query>);
The (X).* notation expands the composite value into its columns.
The solution appended to the question still has syntax errors. It should be:
INSERT INTO Summary
SELECT (GetSummary(o.x, o.y)).*
FROM OtherTable o
WHERE <some condition>;
Must:
- Only one FROM clause.
Optional:
- No parentheses around the SELECT needed.
- A table alias simplifies the syntax.
See the manual on Accessing Composite Types.
Also, it seems that your function is supposed to return one (or no) row. If that is the case, you should drop the SETOF from the RETURNS clause. Make that:
CREATE FUNCTION getsummary( ... )
RETURNS summary AS ...

TSQL NOT EXISTS Why is this query so slow?

Debugging an app which queries SQL Server 2005; I can't change the query, but I need to optimise things.
Running each of the selects separately is quick (<1 sec), e.g. select * from acscard, select id from employee... When they are joined together, it takes 50 seconds.
Is it better to set uninteresting accesscardid fields to null or to '' when using EXISTS?
SELECT * FROM ACSCard
WHERE NOT EXISTS
( SELECT Id FROM Employee
WHERE Employee.AccessCardId = ACSCard.acs_card_number )
AND NOT EXISTS
( SELECT Id FROM Visit
WHERE Visit.AccessCardId = ACSCard.acs_card_number )
ORDER by acs_card_id
Do you have indexes on Employee.AccessCardId, Visit.AccessCardId, and ACSCard.acs_card_number?
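If not, creating them would look something like this (the index names are hypothetical; pick whatever fits your naming convention):
CREATE INDEX IX_Employee_AccessCardId ON Employee (AccessCardId)
CREATE INDEX IX_Visit_AccessCardId ON Visit (AccessCardId)
CREATE INDEX IX_ACSCard_acs_card_number ON ACSCard (acs_card_number)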
The SELECT clause is not evaluated in an EXISTS clause. This:
WHERE EXISTS(SELECT 1/0
FROM EMPLOYEE)
...should raise a divide-by-zero error, but it won't. You do need to put something in the SELECT clause for it to be a valid query, though - it doesn't matter whether it's NULL or a zero-length string.
In SQL Server, NOT EXISTS (and NOT IN) perform better than the LEFT JOIN/IS NULL approach if the columns being compared are not nullable (the values on either side cannot be NULL). The columns being compared should be indexed, if they aren't already.
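For comparison, a LEFT JOIN/IS NULL formulation of the question's query might look like this (same tables and columns as above):
SELECT a.*
FROM ACSCard a
LEFT JOIN Employee e ON e.AccessCardId = a.acs_card_number
LEFT JOIN Visit v ON v.AccessCardId = a.acs_card_number
WHERE e.Id IS NULL
  AND v.Id IS NULL
ORDER BY a.acs_card_id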