Counting non-null columns in a rather strange way

I have an Oracle table with 32 columns.
Two of these columns are identity columns;
the rest are values.
I would like to get the average of all the value columns in each row, which is complicated by the NULL value columns. Below is the pseudocode for what I am trying to achieve:
SELECT
((nvl(val0, 0) + nvl(val1, 0) + ... nvl(valn, 0))
/ nonZero_Column_Count_In_This_Row)
Such that: nonZero_Column_Count_In_This_Row = (ifNullThenZeroElse1(val0) + ifNullThenZeroElse1(val1) + ... + ifNullThenZeroElse1(valn))
The difficulty here is, of course, getting 1 for any non-null column. It seems I need a function similar to NVL, but with an else clause: something that returns 0 if the value is null, but 1 if not, rather than the value itself.
How should I go about getting the value for the denominator?
PS: I feel I should explain the motivation behind this design. Ideally, this table would have been organized as the identity columns plus one value per row, with some identifier for the row itself. That would have been more normalized, and the solution to this problem would have been simple. The reasons it was not done this way are throughput and space. This is a huge DB into which we insert 10 million values per minute. Storing each value as its own row would mean 10M rows per minute, which is definitely not attainable. Packing 30 values into a single row reduces the number of inserted rows to something a single DB can handle, and it greatly reduces the overhead data (the identity data).

(Case When col is null then 0 else 1 end)

You could use NVL2(val0, 1, 0) + NVL2(val1, 1, 0) + ... since you are using Oracle.
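Below is a minimal sketch of the per-row average with a CASE-based denominator. It runs through Python's built-in sqlite3 module so it can be tried without an Oracle instance.

It makes a few assumptions:
- The table `mytable` is hypothetical and has only three value columns.
- COALESCE stands in for NVL, because SQLite has neither NVL nor NVL2. In Oracle, NVL2(val0, 1, 0) gives the same 0/1 as the CASE expression.
- A NULLIF wrapper is added to the denominator, so a row with no values yields NULL instead of dividing by zero.

```python
import sqlite3

# Hypothetical table mirroring the question: two identity columns plus value columns
# (only val0..val2 here to keep the sketch short).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id1 INTEGER, id2 INTEGER, val0 REAL, val1 REAL, val2 REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 10.0, 20.0, None),  # two non-null values
        (1, 2, None, None, 6.0),   # one non-null value
        (2, 1, None, None, None),  # no values at all
    ],
)

ROW_AVG_SQL = """
SELECT id1, id2,
       (COALESCE(val0, 0) + COALESCE(val1, 0) + COALESCE(val2, 0))
       / NULLIF(
             (CASE WHEN val0 IS NULL THEN 0 ELSE 1 END)   -- 1 for each non-null column
           + (CASE WHEN val1 IS NULL THEN 0 ELSE 1 END)
           + (CASE WHEN val2 IS NULL THEN 0 ELSE 1 END),
           0) AS row_avg                                  -- NULL instead of dividing by 0
FROM mytable
ORDER BY id1, id2
"""

rows = conn.execute(ROW_AVG_SQL).fetchall()
for row in rows:
    print(row)
```

COALESCE, CASE and NULLIF are all available in Oracle as well, so only the table and column names should need changing there.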

Another option is to use the AVG function, which ignores NULLs:
SELECT AVG(v) FROM (
WITH q AS (SELECT val0, val1, val2, val3 FROM mytable)
SELECT val0 AS v FROM q
UNION ALL SELECT val1 FROM q
UNION ALL SELECT val2 FROM q
UNION ALL SELECT val3 FROM q
);
If you're using Oracle 11g or later, you can use the UNPIVOT syntax to make it even simpler.
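Here is a quick way to check the AVG approach, sketched with Python's sqlite3 module on a hypothetical four-column table.

Two points to note:
- The WITH clause is moved to the top level of the statement.
- As written, the query returns one average over every value in the table, not one average per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (val0 REAL, val1 REAL, val2 REAL, val3 REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1.0, None, 3.0, None), (None, 5.0, None, 7.0)],
)

# Stack every value column into a single column v; AVG then skips the NULLs.
TABLE_AVG_SQL = """
WITH q AS (SELECT val0, val1, val2, val3 FROM mytable)
SELECT AVG(v) FROM (
    SELECT val0 AS v FROM q
    UNION ALL SELECT val1 FROM q
    UNION ALL SELECT val2 FROM q
    UNION ALL SELECT val3 FROM q
)
"""
(overall_avg,) = conn.execute(TABLE_AVG_SQL).fetchone()
print(overall_avg)  # (1 + 3 + 5 + 7) / 4 = 4.0
```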

I see this is a pretty old question, but I don't see a sufficient answer. I had a similar problem, and below is how I solved it. It's pretty clear a CASE expression is needed. This solution is a workaround for cases where
SELECT COUNT(column) WHERE column {IS | IS NOT} NULL
does not work for whatever reason, or you need to run several
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL1 IS NOT NULL;
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL2 IS NOT NULL;
queries but want the results as one data set when you run the script. See below; I use this for analysis, and it's been working great for me so far.
SELECT SUM(CASE WHEN valn IS NULL -- IS NULL works for any column type; NVL(valn, 'X') fails on numeric columns
THEN 0
ELSE 1
END) AS COLUMN_NAME -- repeat one SUM(...) per column
FROM YOUR_TABLE;
Cheers!
Doug
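The sketch below, in Python's sqlite3 with a hypothetical two-column table, shows that the SUM(CASE ...) pattern returns one non-null count per column in a single row. It also shows that COUNT(column), which skips NULLs, gives the same numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?)",
    [("x", None), ("y", "z"), (None, None)],
)

# One result row with a non-null count per column, instead of one query per column.
COUNTS_SQL = """
SELECT SUM(CASE WHEN col1 IS NULL THEN 0 ELSE 1 END) AS col1_nonnull,
       SUM(CASE WHEN col2 IS NULL THEN 0 ELSE 1 END) AS col2_nonnull,
       COUNT(col1) AS col1_count,  -- COUNT(column) also skips NULLs
       COUNT(col2) AS col2_count
FROM a_table
"""
counts = conn.execute(COUNTS_SQL).fetchone()
print(counts)  # (2, 1, 2, 1)
```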

Generically, you can do something like this:
SELECT (
(COALESCE(val0, 0) + COALESCE(val1, 0) + ...... COALESCE(valn, 0))
/
(SIGN(ABS(COALESCE(val0, 0))) + SIGN(ABS(COALESCE(val1, 0))) + .... )
) AS MyAverage
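The top line returns the sum of the values (NULLs count as 0), whereas the bottom line returns the number of values that are neither NULL nor zero. Note that a value that is exactly 0 is therefore left out of the denominator. This inflates the average for rows containing zeros, and a row that is entirely zero or NULL divides by zero.
FYI: it's SQL Server syntax, but COALESCE is standard SQL (it also works in Oracle) and, with two arguments, behaves much like ISNULL. SIGN returns -1 for a negative number, 0 for zero, and 1 for a positive number. ABS is the absolute value.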
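To see the difference between the SIGN(ABS(COALESCE(...))) denominator and a CASE ... IS NULL denominator, here is a small sketch in Python's sqlite3. It uses a hypothetical row containing a genuine 0 and a NULL. SIGN is registered by hand because not every SQLite build ships one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hand-registered SIGN: -1, 0 or 1, and NULL stays NULL.
conn.create_function("SIGN", 1, lambda x: None if x is None else (x > 0) - (x < 0))
conn.execute("CREATE TABLE t (val0 REAL, val1 REAL, val2 REAL)")
conn.execute("INSERT INTO t VALUES (2.0, 0.0, NULL)")  # a real zero plus a NULL

SQL = """
SELECT (COALESCE(val0, 0) + COALESCE(val1, 0) + COALESCE(val2, 0))
       / (SIGN(ABS(COALESCE(val0, 0))) + SIGN(ABS(COALESCE(val1, 0)))
          + SIGN(ABS(COALESCE(val2, 0)))),               -- counts non-zero values
       (COALESCE(val0, 0) + COALESCE(val1, 0) + COALESCE(val2, 0))
       / ((CASE WHEN val0 IS NULL THEN 0 ELSE 1 END)
          + (CASE WHEN val1 IS NULL THEN 0 ELSE 1 END)
          + (CASE WHEN val2 IS NULL THEN 0 ELSE 1 END))  -- counts non-null values
FROM t
"""
sign_avg, case_avg = conn.execute(SQL).fetchone()
print(sign_avg, case_avg)  # 2.0 1.0
```

The average of the non-null values 2 and 0 is 1.0. The SIGN-based denominator skips the zero and returns 2.0.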

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregate functions like AVG, SUM and COUNT. Why isn't there one for multiplication, "MUL" or something?
Does it exist in Oracle, MSSQL or MySQL? If not, is there a workaround that would give this behaviour?

By MUL, do you mean progressive multiplication of values?
Even with 100 rows of some small size (say, 10s), your MUL(column) is going to overflow any data type! With such a high probability of misuse and such limited scope for use, it does not need to be in the SQL Standard. As others have shown, there are mathematical ways of working it out, just as there are many ways to do tricky calculations in SQL using only standard (and commonly used) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, here are the core implementations for Oracle, MSSQL and MySQL:*
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(N,column)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server and watch the return type (both return float): http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number), and in cases where the result grows too large to turn it back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query
* LOG(0) and LOG(-ve) are undefined. The code below shows how to handle this in SQL Server only; equivalents can be written for the other SQL flavours using the same concept.
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
Take the abs() of data: if the minimum is 0, multiplying by anything else is futile, and the result is 0.
When data is 0, NULLIF converts it to NULL. abs() and log() then both return NULL, which excludes it from sum().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the sign separately.
Working out the final sign:
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add 0.5 and take the sign() again, so 0 and 1 are now both classified as 1, and only -1 remains -1.
Again, use NULLIF to remove the 1s from COUNT(), since we only need to count the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
More mathematical tricks: we subtract that 1 or 0 from 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
We multiply the SUM-based product by this final 1/-1 to get the real result.
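The same zero-and-sign handling can be tried in Python's sqlite3. This is a sketch with a hypothetical helper, `product_with_signs`, and a `mul` table, and it assumes two swaps:
- LN and EXP are registered by hand, because SQLite's math functions are an optional build feature.
- The round(0.5 - count % 2) trick is replaced by a plainer CASE on the count of negatives.

```python
import math
import sqlite3

def product_with_signs(values):
    conn = sqlite3.connect(":memory:")
    # NULL in -> NULL out, so SUM() skips rows that NULLIF turned into NULL.
    conn.create_function("LN", 1, lambda x: None if x is None else math.log(x))
    conn.create_function("EXP", 1, lambda x: None if x is None else math.exp(x))
    conn.execute("CREATE TABLE mul (data INTEGER)")
    conn.executemany("INSERT INTO mul VALUES (?)", [(v,) for v in values])
    sql = """
    SELECT CASE WHEN MIN(ABS(data)) = 0 THEN 0 ELSE         -- any zero -> product is 0
             EXP(SUM(LN(ABS(NULLIF(data, 0)))))              -- magnitude via logs
             * CASE WHEN SUM(CASE WHEN data < 0 THEN 1 ELSE 0 END) % 2 = 1
                    THEN -1 ELSE 1 END                       -- odd count of negatives -> negative
           END
    FROM mul
    """
    (result,) = conn.execute(sql).fetchone()
    conn.close()
    return result

print(product_with_signs([1, 2, 4, 8, -2, 0]))  # 0
print(product_with_signs([1, 2, 4, 8, -2]))     # approximately -128 (floating point)
```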

No, but you can use Mathematics :)
If yourColumn is always greater than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
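Here is a minimal check of the EXP(SUM(LOG(...))) identity, sketched with Python's sqlite3 on the 1, 2, 4, 8 sample values. SQLite's LOG is not guaranteed to be the natural log, so LN and EXP are registered by hand under the Oracle-style name LN; this substitution is an assumption of the sketch. The result is a float, so it should be compared with a tolerance rather than expected to be exactly 64.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("LN", 1, lambda x: None if x is None else math.log(x))
conn.create_function("EXP", 1, lambda x: None if x is None else math.exp(x))
conn.execute("CREATE TABLE your_table (your_column REAL)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(1,), (2,), (4,), (8,)])

# exp(ln 1 + ln 2 + ln 4 + ln 8) = 1 * 2 * 4 * 8
(product,) = conn.execute("SELECT EXP(SUM(LN(your_column))) FROM your_table").fetchone()
print(product)  # close to 64, up to floating-point rounding
```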

I see an Oracle answer is still missing, so here it is:
with yourTable as
( select 1 yourColumn from dual union all
select 2 from dual union all
select 4 from dual union all
select 8 from dual
)
select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
/
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.

With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function in MySQL, you'll need to build a .so (Linux) or .dll (Windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I'd bet they have options to create custom aggregates as well.
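Python's sqlite3 module offers the same idea as CREATE AGGREGATE through `Connection.create_aggregate`. The sketch below registers a hypothetical PRODUCT aggregate that skips NULLs the way SUM does and uses exact integer arithmetic instead of logarithms. It is an SQLite illustration only, not the PostgreSQL or MySQL mechanism linked above.

```python
import sqlite3

class Product:
    """Hypothetical PRODUCT aggregate: multiplies the non-NULL inputs."""

    def __init__(self):
        self.result = None  # stays None (SQL NULL) if every input is NULL

    def step(self, value):
        if value is None:   # ignore NULLs, like SUM
            return
        self.result = value if self.result is None else self.result * value

    def finalize(self):
        return self.result

conn = sqlite3.connect(":memory:")
conn.create_aggregate("PRODUCT", 1, Product)
conn.execute("CREATE TABLE your_table (your_column INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(1,), (2,), (4,), (8,), (None,)])

(product,) = conn.execute("SELECT PRODUCT(your_column) FROM your_table").fetchone()
print(product)  # 64
```

A value that outgrows a 64-bit integer will still fail when it is handed back to SQLite, which matches the overflow warning in the answer above.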

You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because LOG fails for numbers <= 0. I wrote a solution in this question that deals with this.

Using a recursive CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
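Here is a sketch of the same running-product recursion in SQLite through Python's sqlite3. It assumes the ids are consecutive and start at 1, so the join can use `id + 1` instead of the ROW_NUMBER() numbering used in the SQL Server version. Each row carries the product so far, and the last row holds the product of every val.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])

RUNNING_PRODUCT_SQL = """
WITH RECURSIVE cte(id, multiply) AS (
    SELECT id, val FROM foo WHERE id = 1        -- anchor: first row
    UNION ALL
    SELECT f.id, cte.multiply * f.val           -- multiply in the next row
    FROM foo AS f
    JOIN cte ON f.id = cte.id + 1
)
SELECT id, multiply FROM cte ORDER BY id
"""
rows = conn.execute(RUNNING_PRODUCT_SQL).fetchall()
print(rows)  # [(1, 2), (2, 6), (3, 24), (4, 120), (5, 720)]
```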

Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Counting number of null values in a row to divide by (unknown) number of columns

I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of Case expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like this % to be calculated every time a function (or procedure? not sure what is used in this case) is run. To do that, I need to know the number of columns in my table, so that I can count the null values in a row and then divide by the number of columns to get the "% of completeness".
Any help is greatly appreciated!
Thank you

One method uses cross apply to unpivot the columns into rows and compute the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
select avg(case when col is not null then 1.0 else 0 end) completeness_ratio
from (values (col1), (col2), (col3), (col4)) x(col)
) x
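Because the question says the column list may change, here is one way to avoid hard-coding it. The sketch builds the completeness expression from the live column list, using SQLite's `PRAGMA table_info` through Python's sqlite3. `completeness_sql` and the table are hypothetical; in SQL Server, the analogous source would be sys.columns or INFORMATION_SCHEMA.COLUMNS combined with dynamic SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, "a", None, "c", None), (2, "a", "b", "c", "d")],
)

def completeness_sql(conn, table, exclude=("id",)):
    # row[1] of PRAGMA table_info is the column name; columns added later are picked up
    # automatically. The table name is interpolated, so it must come from trusted code.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[1] not in exclude]
    flags = " + ".join(f'(CASE WHEN "{c}" IS NOT NULL THEN 1 ELSE 0 END)' for c in cols)
    return f"SELECT id, ({flags}) * 1.0 / {len(cols)} AS completeness_ratio FROM {table} ORDER BY id"

rows = conn.execute(completeness_sql(conn, "mytable")).fetchall()
print(rows)  # [(1, 0.5), (2, 1.0)]
```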

Returning several values within a CASE expression in subquery and separate columns again in main query

My test table looks like this:
# select * from a;
source | target | id
--------+--------+----
1 | 2 | 1
2 | 3 | 2
3 | 0 | 3
My query is this one:
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN source
ELSE 0
END
AS source,
CASE
WHEN id<>1
THEN target
ELSE 0
END
AS target
FROM a
) x;
The query seems a bit odd because the CASE expression with the same criteria is repeated for every column. I would like to simplify this and tried the following, but it doesn't work as expected.
SELECT *
FROM (
SELECT
CASE
WHEN id<>1
THEN (source, target)
ELSE (0, 0)
END
AS r
FROM a
) x;
It yields one column with a row value, but I would rather get the two original columns. Separating them with a (r).* or similar doesn't work, because the "record type has not been registered".
I found several questions here with solutions regarding functions returning RECORD values, but none regarding this example with a sub-select.
In reality, there is quite a long list of columns, so repeating the same CASE expression many times makes the whole query quite unreadable.
Since the real problem (as opposed to this simplified case) consists of several CASE expressions and several column groups, a solution with UNION won't help either: the number of UNIONs would be large, and the query would be just as unreadable as with many CASEs.
My actual question is: How can I get the original columns from the row value?

This answers the original question.
If I understood your needs, you want 0 and 0 for source and target when id = 1:
SELECT
0 AS source,
0 AS target
FROM tablename
WHERE id = 1
UNION ALL
SELECT
source,
target
FROM tablename
WHERE id <> 1
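A quick sanity check in Python's sqlite3, using the question's sample table `a`, confirms that the UNION ALL rewrite returns the same rows as the repeated-CASE query. The rows are sorted before comparing, because neither query specifies an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (source INTEGER, target INTEGER, id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)", [(1, 2, 1), (2, 3, 2), (3, 0, 3)])

CASE_SQL = """
SELECT CASE WHEN id <> 1 THEN source ELSE 0 END AS source,
       CASE WHEN id <> 1 THEN target ELSE 0 END AS target
FROM a
"""
UNION_SQL = """
SELECT 0 AS source, 0 AS target FROM a WHERE id = 1
UNION ALL
SELECT source, target FROM a WHERE id <> 1
"""
case_rows = sorted(conn.execute(CASE_SQL).fetchall())
union_rows = sorted(conn.execute(UNION_SQL).fetchall())
print(case_rows)   # [(0, 0), (2, 3), (3, 0)]
print(union_rows)  # [(0, 0), (2, 3), (3, 0)]
```

One difference to keep in mind: a row whose id is NULL comes out of the CASE version as (0, 0), but both WHERE clauses of the UNION ALL drop it.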

Revised answer: You can make your query work (fixing the record type has not been registered issue) by creating a TYPE:
CREATE TYPE stpair AS (source int, target int);
And cast the composite value column to that type:
SELECT id, (cv).source, (cv).target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN (source, target)::stpair
ELSE (0, 0)::stpair
END AS cv
FROM a
) AS x
Having said that, it should be far more convenient to use arrays:
SELECT id, av[1] AS source, av[2] AS target
FROM (
SELECT id, CASE
WHEN id <> 1 THEN ARRAY[source, target]
ELSE ARRAY[0, 0]
END AS av
FROM a
) AS x

Will this work for you?
select source, target, id from a where id <> 1
union all
select 0 as source, 0 as target, id from a where id = 1
order by id
I have used UNION ALL to include cases where multiple records may have ID=1.

SQL (SQLite) count for null-fields over all columns

I've got a table called datapoints with about 150 columns and 2600 rows. I know, 150 columns is too much, but I got this db after importing a csv and it is not possible to shrink the number of columns.
I have to get some statistical stuff out of the data. E.g. one question would be:
Give me the total number of fields (of all columns), which are null. Does somebody have any idea how I can do this efficiently?
For one column it isn't a problem:
SELECT count(*) FROM datapoints tb1 WHERE tb1.column1 IS NULL;
But how can I solve this for all columns together, without doing it by hand for every column?
Best,
Michael

Building on Lamak's idea, how about this idea:
SELECT (N * COUNT(*)) - (
COUNT(COLUMN_1)
+ COUNT(COLUMN_2)
+ ...
+ COUNT(COLUMN_N)
)
FROM DATAPOINTS;
where N is the number of columns. The trick will be in making the summation series of COUNT(column), but that shouldn't be too terrible with a good text editor and/or spreadsheet.
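As an alternative to a text editor or spreadsheet, the summation series can be generated from the table's own column list. The sketch below uses Python's sqlite3 and `PRAGMA table_info` on a hypothetical three-column `datapoints` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datapoints (column1 REAL, column2 REAL, column3 REAL)")
conn.executemany(
    "INSERT INTO datapoints VALUES (?, ?, ?)",
    [(1.0, None, None), (None, 2.0, 3.0), (4.0, 5.0, 6.0)],
)

# Build "N * COUNT(*) - (COUNT(c1) + COUNT(c2) + ...)" instead of typing 150 terms by hand.
cols = [row[1] for row in conn.execute("PRAGMA table_info(datapoints)")]
counts = " + ".join(f'COUNT("{c}")' for c in cols)
sql = f"SELECT {len(cols)} * COUNT(*) - ({counts}) FROM datapoints"

(null_total,) = conn.execute(sql).fetchone()
print(null_total)  # 3 NULL fields in total
```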

I don't think there is an easy way to do it. I'd get started on the 150 queries; you only have to replace one word (the column name) each time.

Well, COUNT (like most aggregate functions) ignores NULL values. Since you are using COUNT(*), it counts every row in the table, but you can use COUNT on any column instead. Something like this:
SELECT TotalRows - Column1NotNullCount AS Column1NullCount, TotalRows - Column2NotNullCount AS Column2NullCount, etc
FROM (
SELECT COUNT(1) TotalRows,
COUNT(column1) Column1NotNullCount,
COUNT(column2) Column2NotNullCount,
COUNT(column3) Column3NotNullCount ....
FROM datapoints) A

To get started it's often helpful to use a visual query tool to generate a field list and then use cut/paste/search/replace or manipulation in a spreadsheet program to transform it into what is needed. To do it all in one step you can use something like:
SELECT SUM(CASE WHEN COLUMN1 IS NULL THEN 1 ELSE 0 END) +
SUM(CASE WHEN COLUMN2 IS NULL THEN 1 ELSE 0 END) +
SUM(CASE WHEN COLUMN3 IS NULL THEN 1 ELSE 0 END) +
...
FROM DATAPOINTS;
With a visual query builder you can quickly generate:
SELECT COLUMN1, COLUMN2, COLUMN3 ... FROM DATAPOINTS;
You can then replace each comma with all the text that needs to appear between two field names, and then fix up the first and last fields. So in the example, search for "," and replace it with " IS NULL THEN 1 ELSE 0 END) + SUM(CASE WHEN ", and then fix up the first and last fields. (Use the searched form CASE WHEN col IS NULL: the simple form CASE col WHEN NULL compares col = NULL, which is never true.)
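Here is a small demonstration in Python's sqlite3 of why the searched CASE form matters when counting NULLs. The simple form `CASE column1 WHEN NULL` compares with `=`, and `column1 = NULL` is never true, so it counts nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datapoints (column1 REAL)")
conn.executemany("INSERT INTO datapoints VALUES (?)", [(None,), (None,), (1.0,)])

# Simple CASE: column1 = NULL is never true, so every row falls to ELSE 0.
broken = conn.execute(
    "SELECT SUM(CASE column1 WHEN NULL THEN 1 ELSE 0 END) FROM datapoints"
).fetchone()[0]
# Searched CASE with IS NULL counts the NULLs correctly.
fixed = conn.execute(
    "SELECT SUM(CASE WHEN column1 IS NULL THEN 1 ELSE 0 END) FROM datapoints"
).fetchone()[0]
print(broken, fixed)  # 0 2
```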

How to count two fields by using Select statement in SQL Server 2005?

Total records in table are 10.
Select count(ID) from table1 where col1 = 1 (Result is 8)
Select count(ID) from table1 where col1 = 0 (Result is 2)
So it's the same table, but each count is based on a different condition. How can I get both results (counts) using one SELECT statement?
Also, performance is a big concern here.
PS: I am using a stored procedure...
EDIT:
I want to clarify that the above query is just one part of a larger stored procedure's logic (for me, at least). The answers below gave me an idea for achieving this in a different way, so my question has changed a bit. Please help here? It's the same column (bool type), with a true or false state.

Use CASE:
SELECT
SUM(CASE col1 WHEN 1 THEN 1 ELSE 0 END) AS Count1,
SUM(CASE col1 WHEN 0 THEN 1 ELSE 0 END) AS Count0
FROM table1
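Here is a sketch of the conditional-aggregation answer in Python's sqlite3. It uses a hypothetical `table1` filled to match the question's counts: 10 rows, 8 with col1 = 1 and 2 with col1 = 0. Both counts come back from a single query, which typically means one pass over the table instead of two separate queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO table1 (col1) VALUES (?)", [(1,)] * 8 + [(0,)] * 2)

count1, count0 = conn.execute(
    """
    SELECT SUM(CASE col1 WHEN 1 THEN 1 ELSE 0 END) AS Count1,  -- rows where col1 = 1
           SUM(CASE col1 WHEN 0 THEN 1 ELSE 0 END) AS Count0   -- rows where col1 = 0
    FROM table1
    """
).fetchone()
print(count1, count0)  # 8 2
```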

You should use subselects or UNIONs; I don't see another way...

We Keep Coding
