Exclude leading NULL values from table - sql

To give some context, I am working with time series data (one column) and want to study gaps in the data, which are represented by NULL values in the data set. I expect some leading NULL values that I do not want to include in my final data set, but the number of leading NULL values varies between data sets.
I would like to exclude the first x rows of my data set where the value of a particular column is NULL, without excluding NULL values that appear further down in the same column.
Any help would be much appreciated.
Thanks!
EDIT: I also know that my first record in the value column is always 1, if that helps.

Unfortunately, for SQL Server 2008, I can't think of anything cleaner than:
SELECT row_number,value FROM <table> t1
WHERE value is not NULL OR
EXISTS (select * FROM <table> t2
where t2.value is not null and
t2.row_number < t1.row_number)
Just as an aside, for SQL Server 2012, you could use MAX() with an appropriate OVER() clause such that it considers all previous rows. If that MAX() returns NULL then all preceding rows are known to be NULL, and that's what I'd recommend if/when you upgrade.
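A minimal sketch of that window-function approach follows. It is written for SQLite 3.25+ rather than SQL Server, although the OVER() clause expresses the same idea in SQL Server 2012+. The table `readings` and its columns `seq` and `value` are hypothetical stand-ins for the question's `row_number` and value columns. The running MAX() stays NULL until the first non-NULL value appears, so filtering on it drops only the leading NULLs:

```sql
-- Hypothetical time series: seq orders the rows, NULL marks a gap.
CREATE TABLE readings (seq INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO readings (seq, value) VALUES
  (1, NULL), (2, NULL), (3, 1), (4, NULL), (5, 7), (6, NULL);

-- running_max is NULL only while every row so far (current row included) is NULL.
CREATE VIEW trimmed_readings AS
SELECT seq, value
FROM (
  SELECT seq, value,
         MAX(value) OVER (ORDER BY seq
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_max
  FROM readings
) AS w
WHERE running_max IS NOT NULL;

-- Returns seq 3 to 6; the later gaps at seq 4 and 6 are kept.
SELECT seq, value FROM trimmed_readings ORDER BY seq;
```

Because the frame includes the current row, this filter keeps exactly the rows the EXISTS version keeps: a row survives if it is non-NULL itself or if any earlier row is non-NULL.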

You could find the first non-null item for each data set and then just query everything after that:
WITH FirstItem AS
(
SELECT
DataSetID,
MIN(row_number) row_number
FROM Data
WHERE value IS NOT NULL
GROUP BY DataSetID
)
SELECT d.* FROM Data d
INNER JOIN FirstItem fi
ON d.DataSetID = fi.DataSetID
AND d.row_number >= fi.row_number
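Where window functions are available (SQL Server 2012+, or SQLite 3.25+ as assumed here), the same per-data-set trimming can also be sketched without the join by partitioning a running count by DataSetID. The column `seq` stands in for `row_number`, and the sample rows are invented for illustration. COUNT(value) is used here in place of MAX(value). It stays 0 until the first non-NULL value in each data set and works regardless of the value's type:

```sql
CREATE TABLE Data (DataSetID INTEGER, seq INTEGER, value INTEGER,
                   PRIMARY KEY (DataSetID, seq));
INSERT INTO Data (DataSetID, seq, value) VALUES
  (1, 1, NULL), (1, 2, 1), (1, 3, NULL), (1, 4, 5),
  (2, 1, NULL), (2, 2, NULL), (2, 3, NULL), (2, 4, 1), (2, 5, NULL);

-- non_null_so_far counts the non-NULL values seen so far within each data set.
CREATE VIEW trimmed_data AS
SELECT DataSetID, seq, value
FROM (
  SELECT DataSetID, seq, value,
         COUNT(value) OVER (PARTITION BY DataSetID ORDER BY seq
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS non_null_so_far
  FROM Data
) AS w
WHERE non_null_so_far > 0;

-- Data set 1 keeps seq 2 to 4; data set 2 keeps seq 4 and 5.
SELECT * FROM trimmed_data ORDER BY DataSetID, seq;
```

As with the INNER JOIN version, a data set that is entirely NULL produces no rows.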

Related

Null values for strings are being counted in GBQ

I have a survey table where the answers are of string type, and of course not every respondent answered every question. I've set the empty rows to NULL, but when counting distinct values the NULLs get counted as well. I am using Plotly for the visualization, and I am certain the issue doesn't lie there: when I use SELECT DISTINCT on a column in BigQuery, the NULLs still show up and are also displayed on the graph. The schema also defines these columns as nullable.
I also tried assigning an empty string to the empty rows. That didn't make sense to me and, as expected, resulted in the empty rows being counted as well. I am not sure how to proceed from here; if anyone has advice on where the issue might lie, it would be much appreciated. Thanks.
I have created some dummy data to try to replicate your issue. When I run the following:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT COUNT(column) AS count FROM data
I get the result 2, as expected, because COUNT(column) ignores the NULL value.
I see in your screenshot above that you are selecting DISTINCT values. That returns three distinct values, one of them NULL, as you can see below:
WITH data AS(
SELECT "Yes" AS column UNION ALL
SELECT "No" AS column UNION ALL
SELECT NULL AS column
)
SELECT DISTINCT * FROM data
If you want a count of the two distinct non-NULL values, use COUNT(DISTINCT column) rather than selecting with DISTINCT.
To return only the rows that don't contain NULL values, you can use the query below:
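The following small sketch shows the difference. It is written for SQLite with a hypothetical `survey` table, and BigQuery's COUNT treats NULL the same way. SELECT DISTINCT keeps NULL as one of the distinct values, COUNT(DISTINCT ...) skips it, and an explicit IS NOT NULL filter removes it from the list that gets plotted:

```sql
CREATE TABLE survey (answer TEXT);
INSERT INTO survey (answer) VALUES ('Yes'), ('No'), (NULL), ('Yes');

-- SELECT DISTINCT treats NULL as a value: this returns 'Yes', 'No' and NULL.
SELECT DISTINCT answer FROM survey;

-- Filter NULL out explicitly before plotting: this returns only 'Yes' and 'No'.
CREATE VIEW plotted_answers AS
SELECT DISTINCT answer FROM survey WHERE answer IS NOT NULL;
SELECT answer FROM plotted_answers;

-- COUNT ignores NULL: distinct_non_null = 2, non_null_rows = 3.
SELECT COUNT(DISTINCT answer) AS distinct_non_null,
       COUNT(answer) AS non_null_rows
FROM survey;
```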
WITH table AS(
SELECT "Yes" AS value UNION ALL
SELECT "No" AS value UNION ALL
SELECT NULL AS value
)
Select *
from table
where value is not NULL;
To count the rows that don't contain NULL values, you can use the query below:
SELECT COUNTIF(value IS NOT NULL) AS count FROM table

Turning multiple rows into single row based on ID, and keeping null values

I have tried several of the solutions posted on Stack Overflow for this issue, but none of them keep NULL values (the entire query seems to be built on the assumption that there are none).
I have a table with 1 million rows and 10 columns. The first column is the id. Each id is unique to an "item" (in my case a sales order) but has multiple rows. Each row is either completely NULL or has a single value in one of the columns, and no two rows with the same id have data in the same column. I need to merge these rows into a single row per id while keeping the NULL values: if a column is NULL in all rows for an id, I need to keep that NULL in the final data.
Can someone please help me with this query? I've been stuck on it for 2 hours now.
id  Age   firstname  lastname
1   13    null       null
1   null  chris      null
should output
id  Age   firstname  lastname
1   13    chris      null
It sounds like you want an aggregation query:
select id, max(col1) as col1, max(col2) as col2, . . .
from t
group by id;
If all values are NULL, then this will produce NULL. If one of the rows (for an id) has a value, then this will produce that value.
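A runnable sketch of that aggregation in SQLite follows. It uses the example rows from the question plus one invented id, and the table name `order_parts` is hypothetical. MAX() skips NULLs, so each column picks up the single non-NULL value for its id, or stays NULL when no row has one:

```sql
-- Hypothetical table mirroring the example in the question.
CREATE TABLE order_parts (id INTEGER, age INTEGER, firstname TEXT, lastname TEXT);
INSERT INTO order_parts (id, age, firstname, lastname) VALUES
  (1, 13, NULL, NULL),
  (1, NULL, 'chris', NULL),
  (2, NULL, NULL, 'smith');

-- One output row per id; a column stays NULL if every row for that id is NULL there.
CREATE VIEW merged_orders AS
SELECT id,
       MAX(age) AS age,
       MAX(firstname) AS firstname,
       MAX(lastname) AS lastname
FROM order_parts
GROUP BY id;

-- id 1 -> 13, 'chris', NULL; id 2 -> NULL, NULL, 'smith'
SELECT * FROM merged_orders ORDER BY id;
```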
select id, max(col1), max(col2).. etc
from mytable
group by id
As some others have mentioned, you should use an aggregation query to achieve this.
select t1.id, max(t1.col1), max(t1.col2)
from tableone t1
group by t1.id
This should return NULLs. If you're having issues handling your NULLs, consider adding some logic using ISNULL(). Make sure your data fields really are NULLs and not empty strings.
If NULLs aren't being returned, check that every row for that ID has only NULLs in the column in question. If one of them contains an empty string, MAX() will skip the NULLs and return the empty string instead.
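The empty-string pitfall can be reproduced in SQLite with a hypothetical two-row table. NULLIF() is standard SQL, also available in SQL Server, and it converts empty strings back to NULL before aggregating:

```sql
CREATE TABLE name_parts (id INTEGER, firstname TEXT);
INSERT INTO name_parts (id, firstname) VALUES (1, NULL), (1, '');

-- raw_max is '' because MAX() skips the NULL and keeps the empty string;
-- NULLIF(firstname, '') turns '' into NULL first, so cleaned_max is NULL.
CREATE VIEW name_check AS
SELECT id,
       MAX(firstname) AS raw_max,
       MAX(NULLIF(firstname, '')) AS cleaned_max
FROM name_parts
GROUP BY id;

SELECT * FROM name_check;
```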

Finding second highest number logic explanation

I need a bit of explanation of how the query below finds the second-highest value in a column:
SELECT MAX( column )
FROM table
WHERE column < ( SELECT MAX( column )
FROM table )
Assume I have a table with a single column named 'column' that contains the values 10, 20, 30 and 40.
My query above outputs 30, which is indeed the second highest.
This is how it works as I understand it:
The inner query finds MAX( column ), which is always 40 in my case.
Then, for each row, the WHERE clause checks whether the value is less than that maximum.
But how is that result stored so that the outer query can find MAX(column)?
Somewhere there must be a list of the values lower than the actual maximum (40 in my case), and that list would be 10, 20, 30.
Then the maximum of that list, 30, is found.
So how and where are all the values lower than the actual maximum (40 in this case) stored, so that they can be used at the end to find the maximum again from that list (10, 20, 30)?
Can anyone explain how this works?
This is how the query is logically evaluated.
T (c) => { 10, 20, 30, 40 }
MAX(c) => 40
SELECT c FROM T WHERE c < 40 => { 10, 20, 30 }
SELECT MAX(c) FROM T WHERE c < 40 => 30
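Those logical steps can be run one at a time, for example in SQLite, using the table name `t` and column `c` from the notation above. The engine is free to execute the query differently, for instance without materializing the intermediate list. The steps describe the result, not the physical plan:

```sql
CREATE TABLE t (c INTEGER);
INSERT INTO t (c) VALUES (10), (20), (30), (40);

-- Step 1: the inner query on its own returns 40.
SELECT MAX(c) FROM t;

-- Step 2: the rows that survive the WHERE filter are 10, 20 and 30.
SELECT c FROM t WHERE c < (SELECT MAX(c) FROM t);

-- Step 3: the outer MAX() over those rows returns 30.
CREATE VIEW second_highest AS
SELECT MAX(c) AS c FROM t WHERE c < (SELECT MAX(c) FROM t);
SELECT c FROM second_highest;
```

A repeated maximum does not change the answer, because every copy of 40 fails the `c < 40` filter.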
You have two queries here:
SELECT MAX( column )
FROM table
This finds the largest value in the column and returns it.
SELECT MAX( column )
FROM table
WHERE column <...
This finds the largest value in the column that is less than some other value.
By their powers combined...
SELECT MAX( column )
FROM table
WHERE column < (SELECT MAX( column )
FROM table)
This finds the largest value in the column that is less than the largest value in the column (that is, the second largest).
You understand this query, don't you?
SELECT MAX( column )
FROM table
It simply returns the maximum value stored in the column, so it is 40 in your example.
Now consider this query, which simplifies the issue a bit:
SELECT MAX( column )
FROM table
WHERE column < 40
This one should not be a problem to understand either. It is the same query as above, but it only considers rows whose column value is less than 40. How exactly the intermediate result is stored in the database (as a temporary table, etc.) is up to the DBMS, and you don't need to trouble yourself about it.
Please specify what exactly you don't understand and what you expect us to clarify.
The inner query selects the top value from the table. The RDBMS is smart enough to hold this value in memory, then hit the table again looking for the top value from the table, except this time it's looking for the top value that's smaller than the original top value.
There's not much more to it than that. I didn't write the RDBMS, so I don't know exactly how it operates.
SELECT column FROM table ORDER BY column DESC LIMIT 2;
This returns the top two values, so you will need to pick the second one out of the result yourself.
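The second row can also be selected directly with OFFSET. LIMIT/OFFSET is the syntax of MySQL, PostgreSQL and SQLite (used here), while SQL Server uses TOP or OFFSET ... FETCH instead. One caveat, shown with an invented duplicate 40: without DISTINCT, the second row may just be a copy of the maximum.

```sql
CREATE TABLE scores (c INTEGER);
INSERT INTO scores (c) VALUES (10), (20), (30), (40), (40);

-- Without DISTINCT, the second row is the duplicate 40.
SELECT c FROM scores ORDER BY c DESC LIMIT 1 OFFSET 1;

-- With DISTINCT, the second row is 30, matching the MAX(...) < MAX(...) query.
CREATE VIEW second_distinct AS
SELECT DISTINCT c FROM scores ORDER BY c DESC LIMIT 1 OFFSET 1;
SELECT c FROM second_distinct;
```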

Conditional ORDER BY depending on column values

I need to write a query that does this:
SELECT TOP 1
FROM a list of tables (Joins, etc)
ORDER BY Column X, Column Y, Column Z
If ColumnX is NOT NULL, then at the moment, I reselect, using a slightly different ORDER BY.
So I run the same query twice. If the first one has a NULL in a certain column, I return that row from my procedure. However, if the value isn't NULL, I have to run another identical select, except ordered by a different column or two.
What I do now is select it into a temp table the first time. Then check the value of the column. If it's OK, return the temp table, else, redo the select and return that result set.
More details:
In English, the question I am asking the database:
Return all the results for a certain court appearance (by indexed foreign key). I expect around 1000 rows. Order them by the date of the appearance (column, not indexed, nullable), last appearance first. Check an 'importId'. If the import ID is not NULL for that top 1 row, run the same query again, but this time order by the import ID (last one first) and return that row. Otherwise, just return the top 1 row from the original query.
I'd say the best way to do this in a single query is a CASE expression...
SELECT TOP 1 FROM ... ORDER BY
(CASE WHEN column1 IS NULL THEN column2 ELSE column1 END)
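A CASE expression changes the sort key row by row. The procedure described in the question instead depends on the top row of the first ordering, so it needs two steps. Those steps can be sketched in one statement with a CTE. This sketch assumes SQLite (LIMIT 1 in place of TOP 1), and the table `appearances`, its columns, and the case id 7 are all hypothetical:

```sql
-- Hypothetical schema: one row per appearance, case_id is the indexed foreign key.
CREATE TABLE appearances (
  id INTEGER PRIMARY KEY,
  case_id INTEGER NOT NULL,
  appearance_date TEXT,   -- ISO-8601 text sorts chronologically
  import_id INTEGER
);
INSERT INTO appearances (id, case_id, appearance_date, import_id) VALUES
  (1, 7, '2023-01-10', NULL),
  (2, 7, '2023-03-05', 12),
  (3, 7, '2023-02-01', 15),
  (4, 8, '2023-04-01', NULL);

CREATE VIEW chosen_appearance AS
WITH latest AS (
  -- First ordering: the most recent appearance for case 7.
  SELECT * FROM appearances
  WHERE case_id = 7
  ORDER BY appearance_date DESC
  LIMIT 1
)
-- If that row has no import_id, return it as-is...
SELECT * FROM latest WHERE import_id IS NULL
UNION ALL
-- ...otherwise return the row with the highest import_id for the same case.
SELECT * FROM (
  SELECT * FROM appearances
  WHERE case_id = 7
  ORDER BY import_id DESC
  LIMIT 1
) AS by_import
WHERE EXISTS (SELECT 1 FROM latest WHERE import_id IS NOT NULL);

-- Returns id 3: the latest row (id 2) has an import_id, so import_id 15 wins.
SELECT * FROM chosen_appearance;
```

Exactly one branch of the UNION ALL can produce a row, because the two WHERE conditions test the same top row for NULL and NOT NULL.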
You could use the COALESCE function to turn nullable columns into ORDER BY-friendly values.
SELECT CAST(COALESCE(MyColumn, 0) AS money) AS Column1
FROM MyTable
ORDER BY Column1;
I have used this in Firebird (the columns are numeric):
ORDER BY CASE <condition> WHEN <value> THEN <column1>*1000 + <column2> ELSE <column3>*1000 + <column4> END

Not getting the correct count in SQL

I am totally new to SQL. I have a simple select query similar to this:
SELECT COUNT(col1) FROM table1
There are about 120 records in the table, as shown in the GUI.
For some reason, this query always returns a number which is less than the actual count.
Can somebody please help me?
Try
select count(*) from table1
Edit: To explain further, count(*) gives you the rowcount for a table, including duplicates and nulls. count(isnull(col1,0)) will do the same thing, but slightly slower, since isnull must be evaluated for each row.
You might have some NULL values in the col1 column. Aggregate functions such as COUNT(col1) ignore NULLs.
Try this:
SELECT COUNT(ISNULL(col1,0)) FROM table1
Slightly tangential, but there's also the useful
SELECT count(distinct cola) from table1
which gives you the number of distinct non-NULL values in the column.
You are getting the correct count.
As per the documentation at https://learn.microsoft.com:
COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
COUNT(ALL expression) evaluates an expression for each row in a group and returns the number of nonnull values.
COUNT(DISTINCT expression) evaluates an expression for each row in a group and returns the number of unique, non null values.
In your case you passed a column name to COUNT, so you get the count of non-NULL values only. Your table probably has NULL values in that column (col1).
Hope this helps!
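The COUNT variants discussed above can be compared side by side on a small table with invented data. The sketch is written for SQLite, which has no two-argument ISNULL() function, so COALESCE() stands in for SQL Server's ISNULL(col1, ...):

```sql
CREATE TABLE table1 (col1 TEXT);
INSERT INTO table1 (col1) VALUES ('a'), ('a'), ('b'), (NULL), (NULL);

CREATE VIEW count_variants AS
SELECT COUNT(*) AS all_rows,                        -- 5: every row, NULLs included
       COUNT(col1) AS non_null,                     -- 3: NULLs skipped
       COUNT(DISTINCT col1) AS distinct_non_null,   -- 2: 'a' and 'b'
       COUNT(COALESCE(col1, '')) AS nulls_replaced  -- 5: NULLs replaced before counting
FROM table1;

SELECT * FROM count_variants;
```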