Null values for strings are being counted in GBQ - google-bigquery

I have a survey table where the answers are of type STRING and, of course, not every respondent answered every question. I've set the unanswered rows to NULL, but when I count distinct values the NULLs get counted as well. I am using Plotly for the visualization and I am certain the issue doesn't lie there: when I run SELECT DISTINCT on a column in BigQuery, the NULLs still show up, and they are also displayed on the graph. The schema has these columns as NULLABLE.
I have also tried assigning an empty string to the unanswered rows, which didn't make sense to me and, as expected, resulted in the empty rows being counted as well. I am not sure how to proceed from here. If anyone has any advice on where the issue might lie, it would be much appreciated. Thanks.

I created some dummy data to try to replicate your issue. When I run the following:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT COUNT(column) AS count FROM data
I get the result 2, which is expected, ignoring the NULL value.
I see in your screenshot above you are selecting DISTINCT values, which will return three distinct values, with NULL being one of them, as you can see below:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT DISTINCT * FROM data
If you want a count of the two distinct non-NULL values, use COUNT(DISTINCT column) rather than SELECT DISTINCT, since COUNT() ignores NULLs.
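The difference between SELECT DISTINCT and the COUNT variants can be checked locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made only so the example runs anywhere; the table name `data` and column name `answer` are hypothetical). Standard SQL defines the NULL handling of COUNT and DISTINCT the same way in both engines.

```python
import sqlite3

# SQLite is used here as a stand-in for BigQuery (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (answer TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Yes",), ("No",), ("Yes",), (None,)],
)

# SELECT DISTINCT keeps NULL as a value of its own.
distinct_rows = conn.execute("SELECT DISTINCT answer FROM data").fetchall()

# COUNT(answer) counts non-NULL rows;
# COUNT(DISTINCT answer) counts distinct non-NULL values.
non_null_rows, distinct_values = conn.execute(
    "SELECT COUNT(answer), COUNT(DISTINCT answer) FROM data"
).fetchone()

print(len(distinct_rows), non_null_rows, distinct_values)
```

With this sample data, SELECT DISTINCT returns three rows (including the NULL), while COUNT(DISTINCT answer) reports two.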

To return only the rows that don't contain NULL values, you can use the query below:
WITH table AS(
SELECT "Yes" AS value UNION ALL
SELECT "No" AS value UNION ALL
SELECT NULL AS value
)
Select *
from table
where value is not NULL;
And to count the number of rows that don't contain NULL values, you can use the query below against the same table:
SELECT COUNTIF(value IS NOT NULL) AS count FROM table
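The reason to test with IS NOT NULL rather than comparing against the string 'null' is that any comparison involving NULL evaluates to NULL (unknown), never TRUE or FALSE. The following is a minimal sketch using Python's sqlite3 as a stand-in for BigQuery (an assumption for runnability; the table name `answers` is hypothetical, and SQLite has no COUNTIF, so a WHERE filter is used instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (value TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?)",
    [("Yes",), ("No",), (None,)],
)

# Comparing NULL with anything, even the string 'null', yields NULL.
comparison = conn.execute("SELECT NULL != 'null'").fetchone()[0]

# IS NOT NULL is the explicit test for missing values.
non_null_count = conn.execute(
    "SELECT COUNT(*) FROM answers WHERE value IS NOT NULL"
).fetchone()[0]

print(comparison, non_null_count)
```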

Related

Turning multiple rows into single row based on ID, and keeping null values

I have tried some of the various solutions posted on Stack for this issue but none of them keep null values (and it seems like the entire query is built off that assumption).
I have a table with 1 million rows and 10 columns. The first column is the id. Each id identifies one "item" (in my case a sales order) but appears on multiple rows. Each row is either completely null or has a single value in one of the columns. No two rows with the same id have data for the same column. I need to merge these multiple rows into a single row per id. However, I need to keep the null values: if a column is null in all rows for an id, it must stay null in the final data.
Can someone please help me with this query? I've been stuck on it for 2 hours now.
id   Age   firstname  lastname
1    13    null       null
1    null  chris      null
should output
1    13    chris      null
It sounds like you want an aggregation query:
select id, max(col1) as col1, max(col2) as col2, . . .
from t
group by id;
If all values are NULL, then this will produce NULL. If one of the rows (for an id) has a value, then this will produce that value.
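As a quick check of that behaviour, here is a minimal sketch using Python's sqlite3 module (a stand-in chosen only for runnability; the question does not name its database). The table name `t` and the second id are hypothetical, while the column names follow the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, age INTEGER, firstname TEXT, lastname TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 13, None, None),
        (1, None, "chris", None),
        (2, None, None, None),  # an id whose rows are entirely NULL
    ],
)

# MAX() ignores NULLs within each group and returns NULL
# only when every value in the group is NULL.
merged = conn.execute(
    """
    SELECT id, MAX(age), MAX(firstname), MAX(lastname)
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(merged)
```

Id 1 collapses to a single row with age 13, firstname chris, and a NULL lastname; id 2 stays entirely NULL.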
select id, max(col1), max(col2) -- and so on for the remaining columns
from mytable
group by id
As some others have mentioned, you should use an aggregation query to achieve this.
select t1.id, max(t1.col1), max(t1.col2)
from tableone t1
group by t1.id
This should return NULL wherever every row for an id is NULL. If you're having trouble handling NULLs, consider adding some logic with ISNULL(). Also make sure your data fields really are NULL and not empty strings.
If NULLs aren't being returned, check that EVERY row for that id holds ONLY NULLs in the column. MAX() ignores NULLs, so if even one row holds an empty string, the result is that empty string rather than NULL.

How to get null value when summing a column has a null value

DECLARE @tablo TABLE (oran INT, deger INT)
INSERT INTO @tablo
SELECT 10, NULL
UNION ALL
SELECT 10, 20
SELECT oran*deger/100 FROM @tablo
SELECT SUM(oran*deger/100) FROM @tablo
SELECT NULL+2
When I use the SUM function it returns a value, but if the column contains a NULL value, I want it to return NULL.
How can I do that?
Thanks in advance.
sum() (as with the other aggregation functions) ignores NULLs when combining values from different rows. You can do what you want with additional logic:
SELECT (case when count(*) = count(oran*deger)
then SUM(oran*deger/100)
end)
FROM @tablo;
The construct count(*) = count(<whatever>) is the shortest way I can think of to determine whether any value in the group is NULL. The first part counts the number of rows in the group. The second counts the number of non-NULL values. If these are different, then there is a NULL value somewhere, and the CASE expression, having no ELSE branch, returns NULL.
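To see the count(*) = count(expression) check in action, here is a minimal sketch using Python's sqlite3 module in place of SQL Server (an assumption for runnability; an ordinary table named `tablo` replaces the table variable, and the helper `null_aware_sum` is hypothetical).

```python
import sqlite3

QUERY = """
SELECT CASE WHEN COUNT(*) = COUNT(oran * deger)
            THEN SUM(oran * deger / 100)
       END
FROM tablo
"""


def null_aware_sum(rows):
    """Return the sum, or None if any oran * deger product is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablo (oran INTEGER, deger INTEGER)")
    conn.executemany("INSERT INTO tablo VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()[0]
    conn.close()
    return result


with_null = null_aware_sum([(10, None), (10, 20)])
without_null = null_aware_sum([(10, 20), (10, 30)])
print(with_null, without_null)
```

The first call returns None because one product is NULL; the second returns the plain integer sum.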

Exclude leading NULL values from table

To give some context, I am using time series data (one column) and I want to study gaps in the data, represented by NULL values in the data set. However, I expect some leading NULL values that I don't want to include in my final data set, and the number of leading NULL values varies between data sets.
I would like to exclude the top x number of rows of my data set where the value of a particular column is NULL, without excluding NULL values that appear lower in the same column.
Any help would be much appreciated.
Thanks!
EDIT: I also know that my first record in the value column is always 1, if that helps.
Unfortunately, for SQL Server 2008, I can't think of anything cleaner than:
SELECT row_number,value FROM <table> t1
WHERE value is not NULL OR
EXISTS (select * FROM <table> t2
where t2.value is not null and
t2.row_number < t1.row_number)
Just as an aside, for SQL Server 2012, you could use MAX() with an appropriate OVER() clause such that it considers all previous rows. If that MAX() returns NULL then all preceding rows are known to be NULL, and that's what I'd recommend if/when you upgrade.
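The running MAX() idea can be sketched as follows. This uses Python's sqlite3 module rather than SQL Server (an assumption for runnability; it needs an SQLite build with window functions, version 3.25 or later). The table name `data`, the column name `row_num`, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (row_num INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, None), (2, None), (3, 1), (4, None), (5, 7), (6, None)],
)

# running_max stays NULL until the first non-NULL value is reached,
# so filtering on it drops only the leading NULL rows.
trimmed = conn.execute(
    """
    SELECT row_num, value
    FROM (
        SELECT row_num,
               value,
               MAX(value) OVER (
                   ORDER BY row_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_max
        FROM data
    )
    WHERE running_max IS NOT NULL
    ORDER BY row_num
    """
).fetchall()

print(trimmed)
```

Rows 1 and 2 are dropped, while the NULLs at rows 4 and 6 are kept because a non-NULL value precedes them.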
You could find the first non-null item for each data set and then just query everything after that:
WITH FirstItem AS
(
SELECT
DataSetID,
MIN(row_number) row_number
FROM Data
WHERE value IS NOT NULL
GROUP BY DataSetID
)
SELECT d.* FROM Data d
INNER JOIN FirstItem fi
ON d.DataSetID = fi.DataSetID
AND d.row_number >= fi.row_number

Query multiple tables in access

We have 50 tables, and we need to query a column that exists in all of them. This column is a checkbox. We need to count, per table, how many rows are checked and how many are unchecked. We can't seem to get a single query to count the results and display them per table, as opposed to multiplying or combining the results.
We need 1 column per table, displaying the count of checked and unchecked.
Thanks
SELECT "Table1" AS source_table, qcpass, Count(*) AS row_count
FROM [5000028]
GROUP BY qcpass
UNION ALL
SELECT "Table2", qcpass, Count(*)
FROM [5000029]
GROUP BY qcpass;
Edit
Based on your feedback, try this (sorry, didn't realize you wanted 1 column per table):
Make a union query that combines all 50 tables. The result should be 1 row per table:
SELECT "5000028" as QCPASS, Count(*) FROM [5000028] group by QCPASS
UNION
SELECT "5000029" as QCPASS, Count(*) FROM [5000029] group by QCPASS
UNION...
Now make a "Crosstab" query which is pretty easy in Access. First, make a new query and select the Crosstab option at the top. This query will use the union query as its source.
This will have 3 columns. The first will be a constant value (you can use "Totals" if you like, it's just a placeholder). Set this as your "Row Heading".
The 2nd column will be QCPass. Set this as your "Column Heading".
The 3rd column will be Expr1. Set this as your "Value".
When you run this, you should see a 1-row table with 1 column per each of your source tables.
SELECT columna, 'tablename1' from tablename1 where ..
UNION
SELECT columna, 'tablename2' from tablename2 where ..
UNION
SELECT columna, 'tablename3' from tablename3 where ..
...
SELECT columna, 'tablename50' from tablename50 where ..

Retrieving count of two columns having null values

I have two columns in a table and I want a SINGLE query that returns the count of null values in each column. Within a row, if the first column is null, the second may or may not be null, and vice versa.
I am a beginner and don't know enough about this to try it out myself.
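Select column1,
(Select count(*) From table Where column1 is null),
(Select count(*) From table Where column2 is null)
From table
If you want to do this in a single query, use sub-queries. Note that NULL must be tested with IS NULL; a comparison such as column1 = null never matches any row.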
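As an alternative to sub-queries, the NULL counts for both columns can also be computed in one pass, because COUNT(column) skips NULLs while COUNT(*) does not. The following is a minimal sketch using Python's sqlite3 module (an assumption, since the question does not name its database; the table name `t` and the sample rows are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", None), (None, "b"), (None, None), ("c", "d"), (None, "e")],
)

# COUNT(*) counts every row; COUNT(col) counts only non-NULL values,
# so the difference is the number of NULLs in that column.
column1_nulls, column2_nulls = conn.execute(
    """
    SELECT COUNT(*) - COUNT(column1),
           COUNT(*) - COUNT(column2)
    FROM t
    """
).fetchone()

# "= NULL" evaluates to NULL for every row, so this filter matches nothing.
equals_null_matches = conn.execute(
    "SELECT COUNT(*) FROM t WHERE column1 = NULL"
).fetchone()[0]

print(column1_nulls, column2_nulls, equals_null_matches)
```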