Retrieving count of two columns having null values - sql

I have two columns in a table and I want a SINGLE query that returns the count of NULL values in each column. In any given row, if the first column is NULL, the second may or may not be NULL, and vice versa.
I am a beginner and don't know enough about this to attempt it myself.

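SELECT
(SELECT COUNT(*) FROM table WHERE column1 IS NULL) AS column1_nulls,
(SELECT COUNT(*) FROM table WHERE column2 IS NULL) AS column2_nulls
If you want to do this in a single query, use sub-queries. Note that the test has to be IS NULL: a comparison such as column1 = NULL is never true, so it would count nothing.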
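As an alternative to sub-queries, both counts can also be read in a single pass by comparing COUNT(*) with COUNT(column). The sketch below is only an illustration: it assumes Python's built-in sqlite3 module, and the table name t and its sample rows are made up, not taken from the question.

```python
import sqlite3

# Hypothetical table and sample rows, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", None), (None, "b"), (None, None), ("c", "d"), (None, "e")],
)

# COUNT(*) counts every row, COUNT(col) skips NULLs,
# so the difference is the number of NULLs in that column.
column1_nulls, column2_nulls = conn.execute(
    """
    SELECT COUNT(*) - COUNT(column1) AS column1_nulls,
           COUNT(*) - COUNT(column2) AS column2_nulls
    FROM t
    """
).fetchone()

print(column1_nulls, column2_nulls)
```

This reads the table once instead of once per sub-query, which may matter on large tables.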

Related

Null values for strings are being counted in GBQ

I have a survey table where the answers are string type and, of course, not every respondent answered each question. I've set the empty rows to NULL, but when I count distinct values the NULLs get counted as well. I am using Plotly for the visualization and I am certain the issue doesn't lie there. When using SELECT DISTINCT in BigQuery on a column, the NULLs still show up and are also displayed on the graph. The schema also has these columns as nullable.
I have tried assigning an empty string to the empty rows, which didn't make sense to me and, as expected, resulted in the empty rows being counted as well. I am not sure how to proceed from here. If anyone has any advice on where the issue might lie, it would be much appreciated. Thanks.
I have created some dummy data to try to replicate your issue. When I run the following:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT COUNT(column) AS count FROM data
I get the result 2, which is expected, ignoring the NULL value.
I see in your screenshot above you are selecting DISTINCT values, which will return three distinct values, with NULL being one of them, as you can see below:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT DISTINCT * FROM data
If you want to see a count of two distinct values, then you should use the COUNT() function (for example COUNT(DISTINCT column)) rather than selecting with DISTINCT.
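The difference between counting and selecting distinct values can be checked on a small sample. The sketch below does not run on BigQuery; it assumes Python's built-in sqlite3 module and a made-up table data(answer), relying on the assumption that COUNT and DISTINCT treat NULL the same way in both systems.

```python
import sqlite3

# Hypothetical survey answers; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (answer TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Yes",), ("No",), ("Yes",), (None,)],
)

# Both aggregates skip NULLs.
non_null, distinct_non_null = conn.execute(
    "SELECT COUNT(answer), COUNT(DISTINCT answer) FROM data"
).fetchone()

# SELECT DISTINCT keeps NULL as one of the distinct values.
distinct_rows = [row[0] for row in conn.execute("SELECT DISTINCT answer FROM data")]

print(non_null, distinct_non_null, distinct_rows)
```

If the chart is fed from a SELECT DISTINCT result, adding WHERE answer IS NOT NULL to that query would be one way to keep the NULL bucket out of it.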
To return the rows that don't contain NULL values, you can use the query below:
WITH table AS(
SELECT "Yes" AS value UNION ALL
SELECT "No" AS value UNION ALL
SELECT NULL AS value
)
Select *
from table
where value is not NULL;
And to count the number of rows that don't contain NULL values, you can use the query below:
Select countif(value is not null) count from table

Turning multiple rows into single row based on ID, and keeping null values

I have tried some of the various solutions posted on Stack for this issue but none of them keep null values (and it seems like the entire query is built off that assumption).
I have a table with 1 million rows. There are 10 columns. The first column is the id. Each id is unique to an "item" (in my case a sales order) but has multiple rows. Each row is either completely null or has a single value in one of the columns. No two rows with the same ID have data for the same column. I need to merge these multiple rows into a single row based on the ID. However, I need to keep the null values. If a column is null in all rows for an ID, I need to keep that null in the final data.
Can someone please help me with this query? I've been stuck on it for 2 hours now.
id   Age   firstname   lastname
1    13    null        null
1    null  chris       null
should output
1    13    chris       null
It sounds like you want an aggregation query:
select id, max(col1) as col1, max(col2) as col2, . . .
from t
group by id;
If all values are NULL, then this will produce NULL. If one of the rows (for an id) has a value, then this will produce that value.
select id, max(col1), max(col2).. etc
from mytable
group by id
As some others have mentioned, you should use an aggregation query to achieve this.
select t1.id, max(t1.col1), max(t1.col2)
from tableone t1
group by t1.id
This should return nulls. If you're having issues handling your nulls, maybe implement some logic using ISNULL(). Make sure your data fields really are nulls and not empty strings.
If nulls aren't being returned, check that EVERY row for that particular ID really holds NULL in that column. MAX() ignores NULLs, so if even one of those rows holds an empty string, the empty string is returned instead of NULL.
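The behaviour described in these answers can be tried on a tiny data set. The sketch below assumes Python's built-in sqlite3 module; the table t and its rows are made up, with id 2 added to show the empty-string case.

```python
import sqlite3

# Hypothetical rows: id 1 mirrors the example in the question,
# id 2 has an empty string (not NULL) in lastname.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, age INTEGER, firstname TEXT, lastname TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 13, None, None),
        (1, None, "chris", None),
        (2, None, None, ""),
        (2, None, None, None),
    ],
)

# MAX() skips NULLs and yields NULL only when every value in the group is NULL.
merged = conn.execute(
    """
    SELECT id, MAX(age), MAX(firstname), MAX(lastname)
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(merged)
```

For id 1 the lastname stays NULL because no row supplies a value, while for id 2 the empty string wins over NULL.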

Exclude leading NULL values from table

To give some context, I am using time series data (one column) and I want to study gaps in the data, represented by NULL values in the data set. However, I expect some leading NULL values that I am not interested in including in my final data set, and the number of leading NULL values will vary between data sets.
I would like to exclude the top x number of rows of my data set where the value of a particular column is NULL, without excluding NULL values that appear lower in the same column.
Any help would be much appreciated.
Thanks!
EDIT: I also know that my first record in the value column is always 1, if that helps.
Unfortunately, for SQL Server 2008, I can't think of anything cleaner than:
SELECT row_number,value FROM <table> t1
WHERE value is not NULL OR
EXISTS (select * FROM <table> t2
where t2.value is not null and
t2.row_number < t1.row_number)
Just as an aside, for SQL Server 2012, you could use MAX() with an appropriate OVER() clause such that it considers all previous rows. If that MAX() returns NULL then all preceding rows are known to be NULL, and that's what I'd recommend if/when you upgrade.
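The running-MAX idea can be sketched as follows. This is an illustration rather than SQL Server code: it assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions), and the table series with columns pos and value is made up.

```python
import sqlite3

# Hypothetical time series: two leading NULLs, then gaps further down.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (pos INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO series VALUES (?, ?)",
    [(1, None), (2, None), (3, 1), (4, None), (5, 7), (6, None)],
)

# The running MAX stays NULL until the first non-NULL value appears,
# so filtering on it drops only the leading NULL rows.
kept = conn.execute(
    """
    SELECT pos, value
    FROM (
        SELECT pos, value,
               MAX(value) OVER (
                   ORDER BY pos
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS seen
        FROM series
    )
    WHERE seen IS NOT NULL
    ORDER BY pos
    """
).fetchall()

print(kept)
```

The NULLs at positions 4 and 6 survive because a non-NULL value precedes them.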
You could find the first non-null item for each data set and then just query everything after that:
WITH FirstItem AS
(
SELECT
DataSetID,
MIN(row_number) row_number
FROM Data
WHERE value IS NOT NULL
GROUP BY DataSetID
)
SELECT d.* FROM Data d
INNER JOIN FirstItem fi
ON d.DataSetID = fi.DataSetID
AND d.row_number >= fi.row_number

Conditional ORDER BY depending on column values

I need to write a query that does this:
SELECT TOP 1
FROM a list of tables (Joins, etc)
ORDER BY Column X, Column Y, Column Z
If ColumnX is NOT NULL, then at the moment, I reselect, using a slightly different ORDER BY.
So, I do the same query, twice. If the first one has a NULL in a certain column, I return that row from my procedure. However, if the value isn't NULL - I have to do another identical select, except, order by a different column or two.
What I do now is select it into a temp table the first time. Then check the value of the column. If it's OK, return the temp table, else, redo the select and return that result set.
More details:
In English, the question I am asking the database:
Return all the results for a certain court appearance (by indexed foreign key). I expect around 1000 rows. Order them by the date of the appearance (column, not indexed, nullable), last appearance first. Check the 'importId'. If the import ID is not NULL for that top 1 row, then we need to run the same query again, but this time order by the import ID (last one first), and return that row. Otherwise, just return the top 1 row from the original query.
I'd say the BEST way to do this in a single query is a CASE expression...
SELECT TOP 1 * FROM ... ORDER BY
(CASE WHEN column1 IS NULL THEN column2 ELSE column1 END)
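A small experiment shows how such a CASE key behaves. The sketch assumes Python's built-in sqlite3 module, so it uses LIMIT 1 where SQL Server would use TOP 1; the table items and its values are made up.

```python
import sqlite3

# Hypothetical rows: column1 is nullable, column2 is the fallback sort key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, column1 INTEGER, column2 INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, None, 5), (2, 3, 9), (3, None, 1)],
)

# Each row is sorted by column1 when it has one, otherwise by column2.
# Effective sort keys: id 1 -> 5, id 2 -> 3, id 3 -> 1.
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM items
        ORDER BY CASE WHEN column1 IS NULL THEN column2 ELSE column1 END
        """
    )
]

# Same ordering with LIMIT 1 to fetch only the first row.
top_id = conn.execute(
    """
    SELECT id FROM items
    ORDER BY CASE WHEN column1 IS NULL THEN column2 ELSE column1 END
    LIMIT 1
    """
).fetchone()[0]

print(ordered_ids, top_id)
```

Note that the CASE is evaluated per row, so it mixes both columns into one sort key; whether that matches the two-step logic in the question depends on the data.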
You could use the COALESCE function to turn nullable columns into ORDER BY-friendly values.
SELECT CAST(COALESCE(MyColumn, 0) AS money) AS Column1
FROM MyTable
ORDER BY Column1;
I used in Firebird (columns are numeric):
ORDER BY CASE <condition> WHEN <value> THEN <column1>*1000 + <column2> ELSE <column3>*1000 + <column4> END

Returning more than one value from a sql statement

I was looking at sql inner queries (bit like the sql equivalent of a C# anon method), and was wondering, can I return more than one value from a query?
For example, return the number of rows in a table as one output value, and also, as another output value, return the distinct number of rows?
Also, how does distinct work? Is this based on whether one field may be the same as another (thus classified as "distinct")?
I am using Sql Server 2005. Would there be a performance penalty if I return one value from one query, rather than two from one query?
Thanks
You could do your first question by doing this:
SELECT
COUNT(field1),
COUNT(DISTINCT field2)
FROM table
(For the first field you could do * if needed to count null values.)
Distinct means the definition of the word. It eliminates duplicate returned rows.
Returning 2 values instead of 1 would depend on what the values were, if they were indexed or not and other undetermined possible variables.
If you mean subqueries within the SELECT list, then no, each one can only return 1 value. If you want more than 1 value you will have to use the subquery as a join.
If the inner query is inline in the SELECT, you may struggle to select multiple values. However, it is often possible to JOIN to a sub-query instead; that way, the sub-query can be named and you can get multiple results:
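SELECT a.Foo, a.Bar, x.[Count], x.[Avg]
FROM a
INNER JOIN (
-- b and Value are placeholder names for the aggregated table and column
SELECT Something, COUNT(1) AS [Count], AVG(Value) AS [Avg]
FROM b
GROUP BY Something
) x
ON x.Something = a.Something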
Which might help.
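The join-to-a-sub-query pattern can be sketched end to end. This assumes Python's built-in sqlite3 module; the tables a and b, their columns and their rows are all made up for illustration.

```python
import sqlite3

# Hypothetical tables: a holds one row per key, b holds the detail rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (something TEXT, foo TEXT)")
conn.execute("CREATE TABLE b (something TEXT, amount REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [("x", "first"), ("y", "second")])
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [("x", 10.0), ("x", 20.0), ("y", 5.0)],
)

# The derived table x returns two aggregates per key,
# and both become available to the outer SELECT through the join.
rows = conn.execute(
    """
    SELECT a.foo, x.cnt, x.avg_amount
    FROM a
    INNER JOIN (
        SELECT something, COUNT(*) AS cnt, AVG(amount) AS avg_amount
        FROM b
        GROUP BY something
    ) AS x ON x.something = a.something
    ORDER BY a.foo
    """
).fetchall()

print(rows)
```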
DISTINCT does what it says. IIRC, you can SELECT COUNT(DISTINCT Foo) etc to query distinct data.
You can return multiple results in 3 ways (off the top of my head):
By having a select with multiple values, e.g. select col1, col2, col3
With multiple queries, e.g. select 1; select '2'; select colA. You would get to them in a data reader by calling .NextResult()
Using output parameters: declare the parameters before executing the query, then get the values from them afterwards, e.g. set @param1 = '2' and then string myparam2 = sqlCommand.Parameters["@param1"].Value.ToString()
Distinct filters resulting rows to be unique.
Inner queries in the form:
SELECT * FROM tbl WHERE fld in (SELECT fld2 FROM tbl2 WHERE tbl.fld = tbl2.fld2)
only filter the outer rows; they cannot add values to the result. When you need values from a secondary query in the output, you usually need to do an inner join on the other query.
To get both counts in one row:
SELECT count(*), count(distinct fld) from table
will return a dataset with one row containing two columns. Column 1 is the total number of rows in the table. Column 2 counts only the distinct non-NULL values of fld (SQL Server does not accept count(distinct *)).
Distinct means the returned dataset will not have any duplicate rows. DISTINCT can appear only once, usually directly after SELECT. Thus a query such as:
SELECT distinct a, b, c FROM table
might have this result:
a1 b1 c1
a1 b1 c2
a1 b2 c2
a1 b3 c2
Note that values are duplicated across the whole result set but each row is unique.
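To see row-level DISTINCT in action, and to count distinct rows, the query can be wrapped in a sub-query. The sketch assumes Python's built-in sqlite3 module; the table t and its rows are made up to reproduce the result shown above, with one duplicate row added.

```python
import sqlite3

# Hypothetical rows; the first row appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a1", "b1", "c1"),
        ("a1", "b1", "c1"),
        ("a1", "b1", "c2"),
        ("a1", "b2", "c2"),
        ("a1", "b3", "c2"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# DISTINCT applies to the whole row (a, b, c), not to each column separately.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c"
).fetchall()

# Counting distinct rows by wrapping the DISTINCT query in a sub-query.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a, b, c FROM t)"
).fetchone()[0]

print(total_rows, distinct_count, distinct_rows)
```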
I'm not sure what your last question means. You should return from a query all the data relevant to the query. As for faster, only benchmarking can tell you which approach is faster.