Counting number of null values in a row to divide by (unknown) number of columns

Counting number of null values in a row to divide by (unknown) number of columns - sql

I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of Case expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like for this % to be calculated every time a function (or procedure? not sure what is used in this case) is run and need to know the number of columns that exist in my table in order to count the null values in a row and then divide by the number of columns to find the "% of completeness".
Any help is greatly appreciated!
Thank you

One method uses cross apply to unpivot the columns to rows and count the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
select avg(case when col is not null then 1.0 else 0 end) completeness_ratio
from (values (col1), (col2), (col3), (col4)) x(col)
) x

Related

How to take average of two columns row by row in SQL?

I have a table match which looks like this (please see attached image). I wanted to retrieve a dataset that had a column of average values for home_goal and away_goal using this code
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
AVG(m.home_goal + m.away_goal) AS avg_goal
FROM match AS m;
However, I got this error
column "m.country_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: m.country_id,
My question is: why was GROUP BY clause required? Why couldn't SQL know how to take average of two columns row by row?
Thank you.

try this:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal)/2 AS avg_goal
FROM match AS m;
You have been asked for the group_by as avg() much like sum() work on multiple values of one column where you classify all columns that are not a columns wise operation in the group by
You are looking to average two distinct columns - it is a row-wise operations instead of column-wise

how to take average of two columns row by row?
You don't use AVG() for this; it is an aggregate function, that operates over a set of rows. Here, it seems like you just want a simple math computation:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) / 2.0 AS avg_goal
FROM match AS m;
Note the decimal denominator (2.0): this avoids integer division in databases that implement it.

Avg in the context of the function mentioned above is calculating the average of the values of the columns and not the average of the two values in the same row. It is an aggregate function and that’s why the group by clause is required.
In order to take the average of two columns in the same row you need to divide by 2.

Let's consider the following table:
CREATE TABLE Numbers([x] int, [y] int, [category] nvarchar(10));
INSERT INTO Numbers ([x], [y], [category])
VALUES
(1, 11, 'odd'),
(2, 22, 'even'),
(3, 33, 'odd'),
(4, 44, 'even');
Here is an example of using two aggregate functions - AVG and SUM - with GROUP BY:
SELECT
Category,
AVG(x) as avg_x,
AVG(x+y) as avg_xy,
SUM(x) as sum_x,
SUM(x+y) as sum_xy
FROM Numbers
GROUP BY Category
The result has two rows:
Category avg_x avg_xy sum_x sum_xy
even 3 36 6 72
odd 2 24 4 48
Please note that Category is available in the SELECT part because the results are GROUP BY'ed by it. If a GROUP BY is not specified then the result would be 1 row and Category is not available (which value should be displayed if we have sums and averages for multiple rows with different caetories?).
What you want is to compute a new column and for this you don't use aggregate functions:
SELECT
(x+y)/2 as avg_xy,
(x+y) as sum_xy
FROM Numbers
This returns all rows:
avg_xy sum_xy
6 12
12 24
18 36
24 48
If your columns are integers don't forget to handle rounding, if needed. For example (CAST(x AS DECIMAL)+y)/2 as avg_xy,

The simple arithmetic calculation:
(m.home_goal + m.away_goal) / 2.0
is not exactly equivalent to AVG(), because NULL values mess it up. Databases that support lateral joins provide a pretty easy (and efficient) way to actually use AVG() within a row.
The safe version looks like:
(coalesce(m.home_goal, 0) + coalesce(m.away_goal, 0)) /
nullif( (case when m.home_goal is not null then 1 else 0 end +
case when m.away_goal is not null then 1 else 0 end
), 0
)
Some databases have syntax extensions that allow the expression to be simplified.

How to get three count values from same column using SQL in Access?

I have a table that has an integer column from which I am trying to get a few counts from. Basically I need four separate counts from the same column. The first value I need returned is the count of how many records have an integer value stored in this column between two values such as 213 and 9999, including the min and max values. The other three count values I need returned are just the count of records between different values of this column. I've tried doing queries like...
SELECT (SELECT Count(ID) FROM view1 WHERE ((MyIntColumn BETWEEN 213 AND 9999));)
AS Value1, (SELECT Count(ID) FROM FROM view1 WHERE ((MyIntColumn BETWEEN 500 AND 600));) AS Value2 FROM view1;
So there are for example, ten records with this column value between 213 and 9999. The result returned from this query gives me 10, but it gives me the same value of 10, 618 times which is the number of total records in the table. How would it be possible for me to only have it return one record of 10 instead?

Use the Iif() function instead of CASE WHEN
select Condition1: iif( ), condition2: iif( ), etc
P.S. : What I used to do when working with Access was have the iif() resolve to 1 or 0 and then do a SUM() to get the counts. Roundabout but it worked better with aggregation since it avoided nulls.

SELECT
COUNT(CASE
WHEN MyIntColumn >= 213 AND MyIntColumn <= 9999
THEN MyIntColumn
ELSE NULL
END) AS FirstValue
, ??? AS SecondValue
, ??? AS ThirdValue
, ??? AS FourthValue
FROM Table
This doesn't need nesting or CTE or anything. Just define via CASE your condition within COUNTs argument.
I dont really understand what You want in the second, third an fourth column. Sounds to me, its very similar to the first one.

Reformatted, your query looks like:
SELECT (
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999
) AS Value1
FROM view1;
So you are selecting a subquery expression that is not related to the outer query. For each row in view1, you calculate the number of rows in view1.
Instead, try to do the calculation once. You just have to remove your outer query:
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999;

OLEDB Connection in MS Access does not support key words CASE and WHEN .
You can only use iif() function to count two, three.. values in same columns
SELECT Attendance.StudentName, Count(IIf([Attendance]![Yes_No]='Yes',1,Null)) AS Yes, Count(IIf([Attendance]![Yes_No]='No',1,Null)) AS [No], Count(IIf([Attendance]![Yes_No]='Not',1,Null)) AS [Not], Count(IIf([Attendance]![Yes_No],1,Null)) AS Total
FROM Attendance
GROUP BY Attendance.StudentName;

Split a query result based on the result count

I have a query based on basic criteria that will return X number of records on any given day.
I'm trying to check the result of the basic query then apply a percentage split to it based on the total of X and split it in 2 buckets. Each bucket will be a percentage of the total query result returned in X.
For example:
Query A returns 3500 records.
If the number of records returned from Query A is <= 3000, then split the 3500 records into a 40% / 60% split (1,400 / 2,100).
If the number of records returned from Query A is >=3001 and <=50,000 then split the records into a 10% / 90% split.Etc. Etc.
I want the actual records returned, and not just the math acting on the records that returns one row with a number in it (in the column).

I'm not sure how you want to display different parts of the resulting set of rows, so I've just added additional column(part) in the resulting set of rows that contains values 1 indicating that row belongs to the first part and 2 - second part.
select z.*
, case
when cnt_all <= 3000 and cnt <= 40
then 1
when (cnt_all between 3001 and 50000) and (cnt <= 10)
then 1
else 2
end part
from (select t.*
, 100*(count(col1) over(order by col1) / count(col1) over() )cnt
, count(col1) over() cnt_all
from split_rowset t
order by col1
) z
Demo #1 number of rows 3000.
Demo #2 number of rows 3500.
For better usability you can create a view using the query above and then query that view filtering by part column.
Demo #3 using of a view.

SQL Server SQL Select: How do I select rows where sum of a column is within a specified multiple?

I have a process that needs to select rows from a Table (queued items) each row has a quantity column and I need to select rows where the quantities add to a specific multiple. The mulitple is the order of between around 4, 8, 10 (but could in theory be any multiple. (odd or even)
Any suggestions on how to select rows where the sum of a field is of a specified multiple?

My first thought would be to use some kind of MOD function which I believe in SQL server is the % sign. So the criteria would be something like this
WHERE MyField % 4 = 0 OR MyField % 8 = 0
It might not be that fast so another way might be to make a temp table containing say 100 values of the X times table (where X is the multiple you are looking for) and join on that

Counting non-null columns in a rather strange way

I have a table which has 32 columns in an Oracle table.
Two of these columns are identity columns
the rest are values
I would like to get the average of all the value columns, which is complicated by the null (identity) columns. Below is the pseudocode for what I am trying to achieve:
SELECT
((nvl(val0, 0) + nvl(val1, 0) + ... nvl(valn, 0))
/ nonZero_Column_Count_In_This_Row)
Such that: nonZero_Column_Count_In_This_Row = (ifNullThenZeroElse1(val0) + ifNullThenZeroElse1(val1) ... ifNullThenZeroElse(valn))
The difficulty here is of course in getting 1 for any non-null column. It seems I need a function similar to NVL, but with an else clause. Something that will return 0 if the value is null, but 1 if not, rather than the value itself.
How should I go about about getting the value for the denominator?
PS: I feel I must explain some motivation behind this design. Ideally this table would have been organized as the identity columns and one value per row with some identifier for the row itself. This would have made it more normalized and the solution to this problem would have been pretty simple. The reasons for it not to be done like this are throughput, and saving space. This is a huge DB where we insert 10 million values per minute into. Making each of these values one row would mean 10M rows per minute, which is definitely not attainable. Packing 30 of them into a single row reduces the number of rows inserted to something we can do with a single DB, and the overhead data amount (the identity data) much less.

(Case When col is null then 0 else 1 end)

You could use NVL2(val0, 1, 0) + NVL2(val1, 1, 0) + ... since you are using Oracle.

Another option is to use the AVG function, which ignores NULLs:
SELECT AVG(v) FROM (
WITH q AS (SELECT val0, val1, val2, val3 FROM mytable)
SELECT val0 AS v FROM q
UNION ALL SELECT val1 FROM q
UNION ALL SELECT val2 FROM q
UNION ALL SELECT val3 FROM q
);
If you're using Oracle11g you can use the UNPIVOT syntax to make it even simpler.

I see this is a pretty old question, but I don't see a sufficient answer. I had a similar problem, and below is how I solved it. It's pretty clear a case statement is needed. This solution is a workaround for such cases where
SELECT COUNT(column) WHERE column {IS | IS NOT} NULL
does not work for whatever reason, or, you need to do several
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL1 IS NOT NULL;
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL2 IS NOT NULL;
queries but want it as a data set when you run the script. See below; I use this for analysis and it's been working great for me so far.
SUM(CASE NVL(valn, 'X')
WHEN 'X'
THEN 0
ELSE 1
END) as COLUMN_NAME
FROM YOUR_TABLE;
Cheers!
Doug

Generically, you can do something like this:
SELECT (
(COALESCE(val0, 0) + COALESCE(val1, 0) + ...... COALESCE(valn, 0))
/
(SIGN(ABS(COALESCE(val0, 0))) + SIGN(ABS(COALESCE(val1, 0))) + .... )
) AS MyAverage
The top line will return the sum of values (omitting NULL values) whereas the bottom line will return the number of non-null values.
FYI - it's SQL Server syntax, but COALESCE is just like ISNULL for the most part. SIGN just returns -1 for a negative number, 0 for zero, and 1 for a positive number. ABS is "absolute value".

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Counting number of null values in a row to divide by (unknown) number of columns - sql

Related

How to take average of two columns row by row in SQL?

How to get three count values from same column using SQL in Access?

Split a query result based on the result count

SQL Server SQL Select: How do I select rows where sum of a column is within a specified multiple?

Counting non-null columns in a rather strange way

Categories

Resources