I have zero experience with SQL but am trying to learn how to validate tables. I am trying to see within a table if any of the columns are null.
Currently I have been going with a script that is just counting the number of nulls. I am doing this for each column. Is there a better script that I can use to check all the columns in a table?
select count(id) from schema.table where id is not null
If there are 100 records I would expect all columns to come back with 100 but if one column is null it will show a 0.
You can count each column in a single query by using sum and case:
select
sum(case when Column1 is null then 1 else 0 end) Column1NullCount
, sum(case when Column2 is null then 1 else 0 end) Column2NullCount
-- ...
, sum(case when ColumnN is null then 1 else 0 end) ColumnNNullCount
from MyScheme.MyTable
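To try this pattern without a database server, here is a minimal sketch that runs the same sum/case query against an in-memory SQLite table from Python. The table name my_table, its columns, and the sample rows are made up for illustration; only the sum(case ...) technique comes from the answer above.

```python
import sqlite3

# Hypothetical sample table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (id integer, name text, email text)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(1, "a", None), (2, None, None), (3, "c", "c@example.com")],
)

# One pass over the table: each sum(case ...) counts the nulls in one column.
null_counts = conn.execute(
    """
    select
        sum(case when id is null then 1 else 0 end)    as id_null_count
      , sum(case when name is null then 1 else 0 end)  as name_null_count
      , sum(case when email is null then 1 else 0 end) as email_null_count
    from my_table
    """
).fetchone()

print(null_counts)
```

A column with no nulls comes back as 0, so any non-zero value points at a column that needs attention.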
I have one query that extracts some data from a JSON document, and another that displays an overall count based on the number of values returned. I can't work out how to combine these into a single query. I assume that I need to use a subquery, but I'm not sure where to go from here.
SELECT
JSON_EXTRACT_SCALAR(data, '$.cat.name') as cat_name,
JSON_EXTRACT_SCALAR(data, '$.dog.name') as dog_name
FROM table
SELECT
CASE WHEN cat_name IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN dog_name IS NOT NULL THEN 1 ELSE 0 END AS cat_dog_total
FROM table
You can use a subquery to maintain readability:
SELECT (CASE WHEN cat_name IS NOT NULL THEN 1 ELSE 0 END +
CASE WHEN dog_name IS NOT NULL THEN 1 ELSE 0 END
) AS cat_dog_total
from (select JSON_EXTRACT_SCALAR(data, '$.cat.name') as cat_name,
JSON_EXTRACT_SCALAR(data, '$.dog.name') as dog_name
from table
) t
Of course, you can substitute in the JSON_EXTRACT_SCALAR() expressions as well, but this is more readable.
I have a table where the id field (not a primary key) contains either 1 or null. Over the past several years, any given part could have been entered multiple times with one, or both of these possible options.
I'm trying to write a statement that will return some value if there is ever a 1 associated with the select statement. There are lots of semi-duplicate rows, some with 1 and some with null, but if there is ever a 1, I want to return true, and if there are only null values, I want to return false. I'm not sure how to code this though.
Suppose this is the result of my statement SELECT part, id FROM table WHERE part = "ABC1234":
part id
ABC1234 1
ABC1234 null
ABC1234 null
ABC1234 null
ABC1234 1
I want to write a statement that returns true, because 1 exists in at least one of these rows.
The closest I've come to this is by using a CASE statement, but I'm not quite there yet:
SELECT
a1.part part,
CASE WHEN a2.id is not null
THEN
'true'
ELSE
'false'
END AS id
from table.parts a1, table.ids a2 where a1.part = "ABC1234" and a1.key = a2.key;
I also tried the following case:
CASE WHEN exists
(SELECT id from table.ids where id = 1)
THEN
but I got the error subqueries are not supported in the select list
For the above SELECT statement, how do I return 1 single line that reads:
part id
ABC1234 true
You can use conditional aggregation to check whether a part has at least one row with id = 1.
SELECT part,'True' id
from parts
group by part
having count(case when id = 1 then 1 end) >= 1
To return false when the ids are all null, use:
select part, case when id_true>=1 then 'True'
when id_false>=1 and id_true=0 then 'False' end id
from (
SELECT part,
count(case when id = 1 then 1 end) id_true,
count(case when id is null then 1 end) id_false
from parts
group by part) t
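The conditional aggregation can be checked quickly with SQLite from Python. This is a sketch under assumptions: the parts table and its rows are invented to mirror the question, and the query is a slightly more compact variant that folds the count into a single CASE instead of using a subquery.

```python
import sqlite3

# Hypothetical data mirroring the question: one part with some 1s, one with only nulls.
conn = sqlite3.connect(":memory:")
conn.execute("create table parts (part text, id integer)")
conn.executemany(
    "insert into parts values (?, ?)",
    [
        ("ABC1234", 1), ("ABC1234", None), ("ABC1234", None), ("ABC1234", 1),
        ("XYZ0001", None), ("XYZ0001", None),
    ],
)

# count(case ...) ignores the nulls produced when id is not 1,
# so it is the number of rows per part where id = 1.
rows = conn.execute(
    """
    select part,
           case when count(case when id = 1 then 1 end) >= 1
                then 'True' else 'False' end as id
    from parts
    group by part
    order by part
    """
).fetchall()

print(rows)
```

Each part collapses to exactly one row, which is the single-line result the question asks for.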
I have a SQL query which displays a list of results. Every row in my database has about 20 columns and not every column is mandatory. I would like the result of the SQL query to be sorted by the number of filled-in columns: the rows with the fewest empty columns at the top, the ones with the most empty columns at the bottom. Do any of you guys have an idea how to do this?
I thought about adding an extra column to the table which is updated every time the user edits their row. This number would indicate the number of empty columns, and I could sort my list with that. This, however, sounds like unnecessary trouble, but maybe there is no other way? I'm sure somebody on here will know!
Thanks,
Sander
You can do it in just about any database with a giant case statement:
order by ((case when col1 is not null then 1 else 0 end) +
(case when col2 is not null then 1 else 0 end) +
. . .
(case when col20 is not null then 1 else 0 end)
) desc
You could order by the number of empty columns:
order by
case when col1 is null then 1 else 0 end +
case when col2 is null then 1 else 0 end +
case when col3 is null then 1 else 0 end +
...
case when col20 is null then 1 else 0 end
(Note the + at the end of the lines: it's only one column with the integer count of empty fields, sorted in ascending order.)
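A small sketch of this ordering, run against SQLite from Python. The profiles table, its three optional columns, and the rows are hypothetical; a real table would list all 20 columns in the same way.

```python
import sqlite3

# Hypothetical table with three optional columns instead of twenty.
conn = sqlite3.connect(":memory:")
conn.execute("create table profiles (name text, col1 text, col2 text, col3 text)")
conn.executemany(
    "insert into profiles values (?, ?, ?, ?)",
    [
        ("empty", None, None, None),
        ("full", "x", "y", "z"),
        ("partial", "x", None, None),
    ],
)

# The sum of the case expressions is the number of empty columns per row;
# ascending order puts the most complete rows first.
ordered = [
    row[0]
    for row in conn.execute(
        """
        select name
        from profiles
        order by
            (case when col1 is null then 1 else 0 end) +
            (case when col2 is null then 1 else 0 end) +
            (case when col3 is null then 1 else 0 end)
        """
    )
]

print(ordered)
```

Because the count is computed at query time, no extra maintained column is needed.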
I have a table that has a processed_timestamp column -- if a record has been processed then that field contains the datetime it was processed, otherwise it is null.
I want to write a query that returns two rows:
NULL xx -- count of records with null timestamps
NOT NULL yy -- count of records with non-null timestamps
Is that possible?
Update: The table is quite large, so efficiency is important. I could just run two queries to calculate each total separately, but I want to avoid hitting the table twice if I can avoid it.
In MySQL you could do something like
SELECT
IF(ISNULL(processed_timestamp), 'NULL', 'NOT NULL') as myfield,
COUNT(*)
FROM mytable
GROUP BY myfield
In T-SQL (MS SQL Server), this works:
SELECT
CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END FieldContent,
COUNT(*) FieldCount
FROM
TheTable
GROUP BY
CASE WHEN Field IS NULL THEN 'NULL' ELSE 'NOT NULL' END
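The same CASE-based grouping also runs on SQLite, which makes it easy to try from Python. In this sketch the table mytable and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical table: two processed rows, three unprocessed ones.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (processed_timestamp text)")
conn.executemany(
    "insert into mytable values (?)",
    [("2024-01-01 10:00:00",), (None,), ("2024-01-02 11:30:00",), (None,), (None,)],
)

# Group on the same CASE expression that labels each row.
counts = dict(
    conn.execute(
        """
        select case when processed_timestamp is null
                    then 'NULL' else 'NOT NULL' end as field_content,
               count(*) as field_count
        from mytable
        group by case when processed_timestamp is null
                      then 'NULL' else 'NOT NULL' end
        """
    ).fetchall()
)

print(counts)
```

Note that GROUP BY only produces rows for groups that exist, so if no timestamps are null the 'NULL' row is absent rather than shown with a count of 0.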
Oracle:
group by nvl2(field, 'NOT NULL', 'NULL')
Try the following, it's vendor-neutral:
select
'null ' as type,
count(*) as quant
from tbl
where tmstmp is null
union all
select
'not null' as type,
count(*) as quant
from tbl
where tmstmp is not null
After having our local DB2 guru look at this, he concurs: none of the solutions presented to date (including this one) can avoid a full scan (of the table if the timestamp is not indexed, or of the index otherwise). They all scan every record in the table exactly once.
All the CASE/IF/NVL2() solutions do a null-to-string conversion for each row, introducing unnecessary load on the DBMS. This solution does not have that problem.
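A minimal sketch of the UNION ALL form, run on SQLite from Python. The table tbl and its rows are hypothetical, chosen so that no timestamp is null.

```python
import sqlite3

# Hypothetical table in which every row has a timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (tmstmp text)")
conn.executemany(
    "insert into tbl values (?)",
    [("2024-01-01 10:00:00",), ("2024-01-02 11:30:00",)],
)

# Each branch is an aggregate without GROUP BY, so it always yields one row.
quantities = dict(
    conn.execute(
        """
        select 'null' as type, count(*) as quant
        from tbl
        where tmstmp is null
        union all
        select 'not null' as type, count(*) as quant
        from tbl
        where tmstmp is not null
        """
    ).fetchall()
)

print(quantities)
```

With this form both rows are always returned, including a count of 0 when one category is empty.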
Stewart,
Maybe consider this solution. It is (also!) vendor non-specific.
SELECT count([processed_timestamp]) AS notnullrows,
count(*) - count([processed_timestamp]) AS nullrows
FROM table
As for efficiency, this avoids 2x index seeks/table scans/whatever by including the results on one row. If you absolutely require 2 rows in the result, two passes over the set may be unavoidable because of unioning aggregates.
Hope this helps
If it's Oracle then you can do:
select decode(field,NULL,'NULL','NOT NULL'), count(*)
from table
group by decode(field,NULL,'NULL','NOT NULL');
I'm sure that other DBs allow for a similar trick.
Another MySQL method is to use the CASE operator, which can be generalised to more alternatives than IF():
SELECT CASE WHEN processed_timestamp IS NULL THEN 'NULL'
ELSE 'NOT NULL' END AS a,
COUNT(*) AS n
FROM logs
GROUP BY a
SQL Server (starting with 2012):
SELECT IIF(processed_timestamp IS NULL, 'NULL', 'NOT NULL'), COUNT(*)
FROM MyTable
GROUP BY IIF(processed_timestamp IS NULL, 'NULL', 'NOT NULL');
Another way in T-SQL (SQL Server):
select count(case when t.timestamps is null
then 1
else null end) NULLROWS,
count(case when t.timestamps is not null
then 1
else null end) NOTNULLROWS
from myTable t
If your database has an efficient COUNT(*) function for a table, you could COUNT whichever is the smaller number, and subtract.
In Oracle
SELECT COUNT(*), COUNT(TIME_STAMP_COLUMN)
FROM TABLE;
count(*) returns the count of all rows
count(column_name) returns the number of rows which are not NULL, so
SELECT COUNT(*) - COUNT(TIME_STAMP_COLUMN) NUL_COUNT,
COUNT(TIME_STAMP_COLUMN) NON_NUL_COUNT
FROM TABLE
ought to do the job.
If the column is indexed, you might end up with some sort of range scan and avoid actually reading the table.
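The difference between count(*) and count(column) is easy to confirm on a small table. This sketch uses SQLite from Python; the events table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table: five rows, two of which have a timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (time_stamp_column text)")
conn.executemany(
    "insert into events values (?)",
    [("2024-01-01 10:00:00",), (None,), (None,), ("2024-01-03 09:15:00",), (None,)],
)

# count(*) counts every row; count(column) skips rows where the column is null.
total, non_null = conn.execute(
    "select count(*), count(time_stamp_column) from events"
).fetchone()
null_count = total - non_null

print(total, non_null, null_count)
```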
I personally like Pax's solution, but if you absolutely require only one row returned (as I had recently), in MS SQL Server 2005/2008 you can "stack" the two queries using a CTE:
with NullRows (countOf)
AS
(
SELECT count(*)
FROM table
WHERE [processed_timestamp] IS NOT NULL
)
SELECT count(*) AS nulls, countOf
FROM table, NullRows
WHERE [processed_timestamp] IS NULL
GROUP BY countOf
Hope this helps
[T-SQL]:
select [case], count(*) tally
from (
select
case when [processed_timestamp] is null then 'null'
else 'not null'
end [case]
from myTable
) a
group by [case]
And you can add into the case statement whatever other values you'd like to form a partition, e.g. today, yesterday, between noon and 2pm, after 6pm on a Thursday.
Select Sum(Case When processed_timestamp IS NULL
Then 1
Else 0
End) not_processed_count,
Sum(Case When processed_timestamp Is Not NULL
Then 1
Else 0
End) processed_count,
Count(1) total
From table
Edit: didn't read carefully, this one returns a single row.