Average without avg function - sql

I need to calculate the average of 6 columns, but when a field is empty I don't want it included in the calculation.
For example, (2, 3, 2, 3, 2, 3) should give 15/6 = 2.5, but (2, 3, 2, 3, 'empty', 'empty') should give 10/4 = 2.5.
How can I achieve this?
This is an average across 6 columns, so I can't simply use the AVG function, which ignores NULLs by default.

Ignoring NULLs is QUITE easy; SQL Server already ignores NULL values:
SELECT avg(x)
FROM (values(4),(6),(null)) x(x)
Result
5
Edit: since your comment says your numeric column is a varchar (which is silly), do this instead to handle empty and NULL values:
SELECT
avg(cast(nullif(x, '') as decimal(18,2)))
FROM (values('4'),('6'),(null), ('')) x(x)

You can create a view or subquery which skips rows with "empty" values and take the average over that view or subquery.
SELECT AVG(value) FROM
(SELECT value FROM table WHERE value <> the_value_that_janek_considers_as_empty)
But I would seriously advise you to reconsider your notion of "empty". There is no such thing as an empty numeric value in databases. Either you say "empty" when in fact you mean "null", or you are doing something horrible, like storing numbers in textual columns.
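A concrete sketch of that idea, assuming the column is a varchar and the empty string is what counts as "empty" (the names usable_values, the_table and value are placeholders):
CREATE VIEW usable_values AS
SELECT value
FROM the_table
WHERE value IS NOT NULL AND value <> '';

SELECT AVG(CAST(value AS decimal(10,2))) FROM usable_values;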

Catch your NULL values by using a CASE statement. See below:
SELECT
AVG(ISNULL(Column_to_Average, 0)) AS [Column_to_Average]
FROM #yourtable
OR
SELECT
AVG(CASE WHEN Column_to_Average IS NULL OR Column_to_Average=''
THEN 0 ELSE Column_to_Average END)
FROM #yourtable
OR you can also do this if you want to get the average of each row.
SELECT
(
(CASE WHEN Column_to_Average_1 IS NULL OR Column_to_Average_1='' THEN 0 ELSE Column_to_Average_1 END) +
(CASE WHEN Column_to_Average_2 IS NULL OR Column_to_Average_2='' THEN 0 ELSE Column_to_Average_2 END) +
(CASE WHEN Column_to_Average_3 IS NULL OR Column_to_Average_3='' THEN 0 ELSE Column_to_Average_3 END) +
(CASE WHEN Column_to_Average_4 IS NULL OR Column_to_Average_4='' THEN 0 ELSE Column_to_Average_4 END) +
(CASE WHEN Column_to_Average_5 IS NULL OR Column_to_Average_5='' THEN 0 ELSE Column_to_Average_5 END) +
(CASE WHEN Column_to_Average_6 IS NULL OR Column_to_Average_6='' THEN 0 ELSE Column_to_Average_6 END)
)/6 Column_to_Average
FROM #YourTable

select sum(x),              -- SQL Server ignores NULLs in SUM
       count(x),            -- and in COUNT(column)
       sum(x) / count(x) as avgx
from tbl
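One caveat not mentioned in that answer: if x is an integer column, sum(x)/count(x) performs integer division in SQL Server and truncates the result. Multiplying by 1.0 forces decimal division:
select sum(x) * 1.0 / count(x) as avgx   -- 1.0 forces decimal division
from tbl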

You face two problems:
You have to convert the strings to numbers, and avoid the null and empty strings.
You need to calculate the average of columns, so the avg function isn't usable as long as the data is in that form.
Basically there are two possible approaches, either you can just use the column values, or you can turn the columns into rows.
Using the column values naturally becomes repetitive. You have to both convert each column and check how many usable columns there are:
select
(
case when Col1 is null or Col1 = '' then 0.0 else cast(Col1 as float) end +
case when Col2 is null or Col2 = '' then 0.0 else cast(Col2 as float) end +
case when Col3 is null or Col3 = '' then 0.0 else cast(Col3 as float) end +
case when Col4 is null or Col4 = '' then 0.0 else cast(Col4 as float) end +
case when Col5 is null or Col5 = '' then 0.0 else cast(Col5 as float) end +
case when Col6 is null or Col6 = '' then 0.0 else cast(Col6 as float) end
) / (
case when Col1 is null or Col1 = '' then 0.0 else 1.0 end +
case when Col2 is null or Col2 = '' then 0.0 else 1.0 end +
case when Col3 is null or Col3 = '' then 0.0 else 1.0 end +
case when Col4 is null or Col4 = '' then 0.0 else 1.0 end +
case when Col5 is null or Col5 = '' then 0.0 else 1.0 end +
case when Col6 is null or Col6 = '' then 0.0 else 1.0 end
) as Average
from
TheTable
You could create a user-defined function to do the conversion and counting, but you would still need to call it for each column.
(A word of caution also: if there isn't a value in any of the columns, this gives a division-by-zero error.)
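A minimal sketch of such a helper (the name dbo.ToNumberOrNull is hypothetical, and TRY_CONVERT assumes SQL Server 2012 or later); it turns empty strings and non-numeric values into NULL so they can be skipped:
CREATE FUNCTION dbo.ToNumberOrNull (@value varchar(50))
RETURNS float
AS
BEGIN
    -- the empty string becomes NULL, and TRY_CONVERT returns NULL for anything non-numeric
    RETURN TRY_CONVERT(float, NULLIF(@value, ''));
END
You would still have to wrap every column in it, so the repetition remains.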
You can use unpivot to turn the columns into rows, but you need a unique value to group the result on, so that values belonging to the same original row stay together. Example:
select
Id, avg(cast(Col as float))
from
TheTable
unpivot
(Col for ColName in (Col1, Col2, Col3, Col4, Col5, Col6)) x
where
Col <> '' and Col is not null
group by
Id

Related

Concat String columns in hive

I need to concat 3 columns from my table, say a, b, c. Whenever a column's length is greater than 0 it has to be included in the concatenation, and the result stored as another column d in the below format.
1:a2:b3:c
I have tried the following query but I am not sure how to proceed as I am getting null as the result.
select a,b,c,
case when length(a) >0 then '1:'+a else '' end + case when length(b) > 0 then '2:'+b else '' end + case when length(c) > 0 then '3:'+c else '' end AS d
from xyz;
Appreciate the help :)
Use the concat() function:
select a,b,c,
concat(
case when length(a)>0 then concat('1:',a) else '' end,
case when length(b)>0 then concat('2:',b) else '' end,
case when length(c)>0 then concat('3:',c) else '' end
) as d
from (--test dataset
select stack(4, 'a','b','c', --all
'','b','c', --one empty
null,'b','c', --null
'','','' --all empty
) as (a,b,c)
)your_data;
Result:
OK
a     b     c     1:a2:b3:c
      b     c     2:b3:c
NULL  b     c     2:b3:c

Time taken: 0.284 seconds, Fetched: 4 row(s)
(The last row, where all three columns are empty, produces an empty d.)
As of Hive 2.2.0 you can use the || operator instead of concat:
select a,b,c,
case when length(a)>0 then '1:'||a else '' end||
case when length(b)>0 then '2:'||b else '' end||
case when length(c)>0 then '3:'||c else '' end as d

How to count unique integers in a string using hive?

I'm trying to count the unique digits (bytes) in a string.
DATA (Phone numbers for example with only numeric bytes):
1234567890
1111111112
Results:
10
2
I have tried the below and it didn't work, I think because sum() won't accept the if() UDF within it.
select phone
, sum(
cast(if(length(regexp_replace(phone,'0',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'1',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'2',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'3',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'4',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'5',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'6',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'7',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'8',''))<10,'1','0') as int) +
cast(if(length(regexp_replace(phone,'9',''))<10,'1','0') as int)
) as unique_bytes
from table;
I am not opposed to regular expressions as a solution either.
Use +, but like this:
select phone,
((case when phone like '%0%' then 1 else 0 end) +
(case when phone like '%1%' then 1 else 0 end) +
(case when phone like '%2%' then 1 else 0 end) +
(case when phone like '%3%' then 1 else 0 end) +
(case when phone like '%4%' then 1 else 0 end) +
(case when phone like '%5%' then 1 else 0 end) +
(case when phone like '%6%' then 1 else 0 end) +
(case when phone like '%7%' then 1 else 0 end) +
(case when phone like '%8%' then 1 else 0 end) +
(case when phone like '%9%' then 1 else 0 end)
) as ints
from table;
Your code has several issues:
sum() is an aggregation function and is not needed.
The if() is returning strings, but you are adding the values together.
I'm not sure why you are using regexp_replace() rather than just replace().
with tab1 as (
select stack(3,
'1','1234567890',
'2','1111111112',
'3','2222222223') as (col0, col1))
select tab1.col0, count(distinct tf.col)
from tab1
lateral view explode(split(tab1.col1,'')) tf as col
where tf.col regexp '\\d'
group by tab1.col0

SQL - Find and Replace the value in the table

I have a table in my database. It contains 35 columns and 150 rows. Some of its values are 0. How can I replace these zeros with the character '-'?
Just use UPDATE to do this:
UPDATE yourtable
SET col1 = CASE WHEN col1 = '0' THEN '-' ELSE col1 END,
col2 = CASE WHEN col2 = '0' THEN '-' ELSE col2 END,
col3 = ....
In SQL Server:
SELECT REPLACE(Column1, '0', '-'), REPLACE(Column2, '0', '-'), REPLACE(Column3, '0', '-'), etc.
FROM YOUR_TABLE_NAME;

Varchar to Decimal conversion in DB2 with empty and null values

I have a varchar field AMOUNT in DB2.
Possible table state values:
ID AMOUNT
-------------------------
1 123.4578
2 NULL
2 123.78
1 -8562.85441
2
1 0.0
-------------------------
Column AMOUNT can be empty as the second last row above depicts.
I want to do a SUM over the AMOUNT group by ID in a query.
The result SUM should be DECIMAL(16,2).
What is the correct way to do that, considering that the value can be both NULL and empty, and that the number of digits after the decimal point does not follow any fixed format?
It's a duplicate of this question, but the answers given there are incomplete and not accepted.
Thanks for reading!
I think the code you want is:
select (case when amount <> '' then cast(amount as float) end)
from table t;
You don't have to worry about NULL values. sum() ignores them. If you want to get 0 if all values are NULL/blank, then add else 0 to the case statement.
If you are concerned about other non-numeric values, you can try:
select (case when amount <> '' and
             not regexp_like(amount, '[^0-9.-]')
        then cast(amount as float) end)
from table t;
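Putting that together with the grouped sum the question asks for, here is a sketch (the table name T is a placeholder), casting the result to DECIMAL(16,2):
select ID,
       cast(sum(case when amount <> '' then cast(amount as float) end) as decimal(16,2)) as total_amount
from T
group by ID;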
CAST(CASE WHEN COALESCE(AMOUNT, '') = '' THEN NULL ELSE REPLACE(AMOUNT, ',', '.') END AS DECIMAL(16,2)) AS AMOUNT
or
CAST (123.4578 AS DECIMAL (16,2))
+ CAST (NULL AS DECIMAL (16,2))
+ CAST (123.78 AS DECIMAL (16,2))
+ CAST (-8562.85441 AS DECIMAL (16,2))
= RESULT DECIMAL(16,2)

Count the Null columns in a row in SQL

I was wondering about the possibility of counting the null columns of a row in SQL. I have a table Customer that has nullable columns, and I simply want a query that returns an int: the number of null columns for a certain row (a certain customer).
This method assigns a 1 or 0 for null columns, and adds them all together. Hopefully you don't have too many nullable columns to add up here...
SELECT
((CASE WHEN col1 IS NULL THEN 1 ELSE 0 END)
+ (CASE WHEN col2 IS NULL THEN 1 ELSE 0 END)
+ (CASE WHEN col3 IS NULL THEN 1 ELSE 0 END)
...
...
+ (CASE WHEN col10 IS NULL THEN 1 ELSE 0 END)) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
Note, you can also do this perhaps a little more syntactically cleanly with IF() if your RDBMS supports it.
SELECT
(IF(col1 IS NULL, 1, 0)
+ IF(col2 IS NULL, 1, 0)
+ IF(col3 IS NULL, 1, 0)
...
...
+ IF(col10 IS NULL, 1, 0)) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
I tested this pattern against a table and it appears to work properly.
My answer builds on Michael Berkowski's answer, but to avoid having to type out hundreds of column names, what I did was this:
Step 1: Get a list of all of the columns in your table
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'myTable';
Step 2: Paste the list in Notepad++ (any editor that supports regular expression replacement will work). Then use this replacement pattern
Search:
^(.*)$
Replace:
\(CASE WHEN \1 IS NULL THEN 1 ELSE 0 END\) +
Step 3: Prepend SELECT identityColumnName, and change the very last + to AS NullCount FROM myTable and optionally add an ORDER BY...
SELECT
identityColumnName,
(CASE WHEN column001 IS NULL THEN 1 ELSE 0 END) +
-- ...
(CASE WHEN column200 IS NULL THEN 1 ELSE 0 END) AS NullCount
FROM
myTable
ORDER BY
NullCount DESC
For ORACLE-DBMS only.
You can use the NVL2 function:
NVL2( string1, value_if_not_null, value_if_null )
Here is a select with a similar approach to the one Michael Berkowski suggested:
SELECT (NVL2(col1, 0, 1)
+ NVL2(col2, 0, 1)
+ NVL2(col3, 0, 1)
...
...
+ NVL2(col10, 0, 1)
) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
A more generic approach would be to write a PL/SQL block and use dynamic SQL: build a SELECT string with the NVL2 method from above for every column listed in all_tab_columns for the specific table.
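A minimal sketch of that idea (untested; the CUSTOMER table, the Customer column and the bind variable are placeholders, and LISTAGG assumes Oracle 11.2 or later):
DECLARE
  v_sql VARCHAR2(32767);
BEGIN
  SELECT 'SELECT '
         || LISTAGG('NVL2(' || column_name || ', 0, 1)', ' + ')
              WITHIN GROUP (ORDER BY column_id)
         || ' AS sum_of_nulls FROM CUSTOMER WHERE Customer = :cust_id'
    INTO v_sql
    FROM all_tab_columns
   WHERE table_name = 'CUSTOMER';
  -- inspect the generated statement, or run it with EXECUTE IMMEDIATE ... INTO ... USING ...
  DBMS_OUTPUT.PUT_LINE(v_sql);
END;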
Unfortunately, in a standard SQL statement you will have to enter each column you want to test; to test them all programmatically you could use T-SQL. A word of warning though: ensure you are working with genuine NULLs, as you can have blank stored values that the database will not recognise as a true NULL (I know this sounds strange).
You can avoid this by capturing the blank values and the NULLS in a statement like this:
CASE WHEN col1 + '' = '' THEN 1 ELSE 0 END
Or in some databases such as Oracle (not sure if there are any others) you would use:
CASE WHEN col1 || '' = '' THEN 1 ELSE 0 END
You don't state RDBMS. For SQL Server 2008...
SELECT CustomerId,
(SELECT COUNT(*) - COUNT(C)
FROM (VALUES(CAST(Col1 AS SQL_VARIANT)),
(Col2),
/*....*/
(Col9),
(Col10)) T(C)) AS NumberOfNulls
FROM Customer
Depending on what you want to do (and if you ignore the mavens), and if you use SQL Server 2012, you could do it another way.
The total number of candidate columns ("slots") must be known.
1. Select all the known "slots" column by column (they're known).
2. Unpivot that result to get a table with one row per original column. This works because the null columns don't unpivot, and you know all the column names.
3. Count(*) the result to get the number of non-nulls; subtract that from the number of slots to get your answer.
Like this, for 4 "seats" in a car
select 'empty seats' = 4 - count(*)
from
(
select carId, seat1,seat2,seat3,seat4 from cars where carId = #carId
) carSpec
unpivot (FieldValue FOR seat in ([seat1],[seat2],[seat3],[seat4])) AS results
This is useful if you may need to do more later than just count the number of non-null columns, as it gives you a way to manipulate the columns as a set too.
This will give you the number of columns which are not NULL. You can apply it as appropriate:
SELECT ISNULL(COUNT(col1), '') + ISNULL(COUNT(col2), '') + ISNULL(COUNT(col3), '')
FROM TABLENAME
WHERE ID = 1
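To turn that into a count of NULL columns, subtract it from the number of columns checked; a sketch for the three columns above:
SELECT 3 - (COUNT(col1) + COUNT(col2) + COUNT(col3)) AS null_count
FROM TABLENAME
WHERE ID = 1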
The below script gives you the NULL value count within a row, i.e. how many columns do not have values.
SELECT
*,
(SELECT COUNT(*)
FROM (VALUES (Tab.Col1)
,(Tab.Col2)
,(Tab.Col3)
,(Tab.Col4)) InnerTab(Col)
WHERE Col IS NULL) NullColumnCount
FROM (VALUES(1,2,3,4)
,(NULL,2,NULL,4)
,(1,NULL,NULL,NULL)) Tab(Col1,Col2,Col3,Col4)
Just to demonstrate I am using an inline table in my example.
Cast or convert all column values to a common type; it will help you compare columns of different types (as the SQL_VARIANT cast in the answer above does).
I haven't tested it yet, but I'd try to do it using a PL/SQL function.
CREATE OR REPLACE TYPE ANYARRAY AS TABLE OF ANYDATA
;
CREATE OR REPLACE Function COUNT_NULL
( ARR IN ANYARRAY )
RETURN number
IS
cnumber number := 0;  -- initialise, otherwise NULL + 1 stays NULL
BEGIN
for i in 1 .. ARR.count loop
if ARR(i) is null or ARR(i).accessNumber is null then
cnumber := cnumber + 1;
end if;
end loop;
RETURN cnumber;
EXCEPTION
WHEN OTHERS THEN
raise_application_error
(-20001,'An error was encountered - '
||SQLCODE||' -ERROR- '||SQLERRM);
END
;
Then use it in a select query like this
CREATE TABLE TEST (A NUMBER, B NUMBER, C NUMBER);
INSERT INTO TEST VALUES (NULL, NULL, NULL);
INSERT INTO TEST VALUES (1, NULL, NULL);
INSERT INTO TEST VALUES (1, 2, NULL);
INSERT INTO TEST VALUES (1, 2, 3);
SELECT ROWNUM,
       COUNT_NULL(ANYARRAY(ANYDATA.CONVERTNUMBER(A),
                           ANYDATA.CONVERTNUMBER(B),
                           ANYDATA.CONVERTNUMBER(C))) AS NULL_COUNT
FROM TEST;
Expected output
ROWNUM | NULL_COUNT
-------+-----------
1 | 3
2 | 2
3 | 1
4 | 0
This is how I tried it:
CREATE TABLE #temptablelocal (id int NOT NULL, column1 varchar(10) NULL, column2 varchar(10) NULL, column3 varchar(10) NULL, column4 varchar(10) NULL, column5 varchar(10) NULL, column6 varchar(10) NULL);
INSERT INTO #temptablelocal
VALUES (1,
NULL,
'a',
NULL,
'b',
NULL,
'c')
SELECT *
FROM #temptablelocal
WHERE id =1
SELECT count(1) countnull
FROM
(SELECT a.ID,
b.column_title,
column_val = CASE b.column_title
WHEN 'column1' THEN a.column1
WHEN 'column2' THEN a.column2
WHEN 'column3' THEN a.column3
WHEN 'column4' THEN a.column4
WHEN 'column5' THEN a.column5
WHEN 'column6' THEN a.column6
END
FROM
( SELECT id,
column1,
column2,
column3,
column4,
column5,
column6
FROM #temptablelocal
WHERE id =1 ) a
CROSS JOIN
( SELECT 'column1'
UNION ALL SELECT 'column2'
UNION ALL SELECT 'column3'
UNION ALL SELECT 'column4'
UNION ALL SELECT 'column5'
UNION ALL SELECT 'column6' ) b (column_title) ) AS pop WHERE column_val IS NULL
DROP TABLE #temptablelocal
Similarly, but dynamically:
drop table if exists myschema.table_with_nulls;
create table myschema.table_with_nulls as
select
n1::integer,
n2::integer,
n3::integer,
n4::integer,
c1::character varying,
c2::character varying,
c3::character varying,
c4::character varying
from
(
values
(1,2,3,4,'a','b','c','d'),
(1,2,3,null,'a','b','c',null),
(1,2,null,null,'a','b',null,null),
(1,null,null,null,'a',null,null,null)
) as test_records(n1, n2, n3, n4, c1, c2, c3, c4);
drop function if exists myschema.count_nulls(varchar,varchar);
create function myschema.count_nulls(schemaname varchar, tablename varchar) returns void as
$BODY$
declare
calc varchar;
sqlstring varchar;
begin
select
array_to_string(array_agg('(' || trim(column_name) || ' is null)::integer'),' + ')
into
calc
from
information_schema.columns
where
table_schema in ('myschema')
and table_name in ('table_with_nulls');
sqlstring = 'create temp view count_nulls as select *, ' || calc || '::integer as count_nulls from myschema.table_with_nulls';
execute sqlstring;
return;
end;
$BODY$ LANGUAGE plpgsql STRICT;
select * from myschema.count_nulls('myschema'::varchar,'table_with_nulls'::varchar);
select
*
from
count_nulls;
Though I see that I didn't finish parameterising the function.
My answer builds on Drew Chapin's answer, but with changes to get the result using a single script:
use <add_database_here>;
Declare @val Varchar(MAX);
Select @val = COALESCE(@val + str, str) From
(SELECT
'(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) +' str
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '<add table name here>'
) t1 -- getting the column names and wrapping each in a CASE WHEN that turns NULL into 1 and anything else into 0
Select @val = SUBSTRING(@val, 1, LEN(@val) - 1) -- removing the trailing plus sign
Select @val = 'SELECT <add_identity_column_here>, ' + @val + ' AS NullCount FROM <add table name here>' -- adding the SELECT for the identity column, the alias for the null count column, and the FROM
EXEC (@val) -- executing the resulting sql
With ORACLE:
select <number_of_columns> - json_value(json_array(<comma-separated list of columns>), '$.size()') from your_table
json_array builds an array containing only the non-null columns, and the json_value expression gives you the size of that array.
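A concrete sketch for a three-column table (the names are placeholders; the '$.size()' item method assumes Oracle 12.2 or later):
select key_column,
       3 - json_value(json_array(a, b, c), '$.size()' returning number) as null_col_count
from my_table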
There isn't a straightforward way of doing so like there would be with counting rows. Basically, you have to enumerate all the columns that might be null in one expression.
So for a table with possibly null columns a, b, c, you could do this:
SELECT key_column,
       (CASE WHEN a IS NULL THEN 1 ELSE 0 END)
     + (CASE WHEN b IS NULL THEN 1 ELSE 0 END)
     + (CASE WHEN c IS NULL THEN 1 ELSE 0 END) AS null_col_count
FROM my_table