mysql: Average over multiple columns in one row, ignoring nulls


I have a large table (of sites) with several numeric columns - say a through f. (These are site rankings from different organizations, like alexa, google, quantcast, etc. Each has a different range and format; they're straight dumps from the outside DBs.)
For many of the records, one or more of these columns is null, because the outside DB doesn't have data for it. They all cover different subsets of my DB.
I want column t to be their weighted average (each of a..f has a static weight which I assign), ignoring null values (which can occur in any of them), and being null only if they're all null.
I would prefer to do this with a simple SQL calculation, rather than doing it in app code or using some huge ugly nested if block to handle every permutation of nulls. (Given that I have an increasing number of columns to average over as I add in more outside DB sources, this would be exponentially more ugly and bug-prone.)
I'd use AVG, but that only aggregates across rows with GROUP BY, and this is within one record. The data is semantically nullable, and I don't want to average in some "average" value in place of the nulls; I want to count only the columns for which data is there.
Is there a good way to do this?
Ideally, what I want is something like UPDATE sites SET t = AVG(a*#a_weight,b*#b_weight,...) where any null values are just ignored and no grouping is happening.
EDIT: What I ended up using, based on van's answer and adding correct weighted averages (assuming that a has already been normalized as needed, in this case to a float from 0 to 1, where 1 is better; #a_weight and so on stand for the static weights):
UPDATE sites
SET t = (#a_weight * IFNULL(a, 0) + ...) / (IF(a IS NULL, 0, #a_weight) + ...)
WHERE (IF(a IS NULL, 0, 1) + ...) > 0

UPDATE sites
-- TODO: you might need to round it depending on your type
SET t =(COALESCE(a, 0) +
COALESCE(b, 0) +
COALESCE(c, 0) +
COALESCE(d, 0) +
COALESCE(e, 0) +
COALESCE(f, 0)
) /
((CASE WHEN a IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN e IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN f IS NULL THEN 0 ELSE 1 END CASE)
)
WHERE 0<>((CASE WHEN a IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN e IS NULL THEN 0 ELSE 1 END CASE) +
(CASE WHEN f IS NULL THEN 0 ELSE 1 END CASE)
)
You could also use COALESCE in the other parts (the divisor and the WHERE clause), but that would not handle a rating whose value is 0 properly, because it would be excluded from the count. The WHERE clause avoids a division by zero, but it also skips entries that have no rating at all, so you might need an additional UPDATE statement to handle that case.
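As a minimal sketch of the weighted variant, the following runs the same COALESCE/CASE idea against an in-memory SQLite database through Python's sqlite3 module, as a stand-in for MySQL. The weights, the three-column table and the helper weighted_avg_sql are assumptions made up for illustration. It also swaps the WHERE clause for NULLIF on the divisor, so rows where every column is NULL end up with t = NULL instead of being skipped.

```python
import sqlite3

# Hypothetical static weights per ranking column (not from the original post).
WEIGHTS = {"a": 0.5, "b": 0.3, "c": 0.2}

def weighted_avg_sql(weights):
    """Build 'weighted sum / sum of the weights of non-NULL columns' as SQL text."""
    numerator = " + ".join(f"{w} * COALESCE({col}, 0)" for col, w in weights.items())
    denominator = " + ".join(
        f"CASE WHEN {col} IS NULL THEN 0 ELSE {w} END" for col, w in weights.items()
    )
    # NULLIF turns a zero divisor (every column NULL) into NULL, so t becomes NULL.
    return f"({numerator}) / NULLIF({denominator}, 0)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, a REAL, b REAL, c REAL, t REAL)")
conn.executemany(
    "INSERT INTO sites (id, a, b, c) VALUES (?, ?, ?, ?)",
    [
        (1, 1.0, 0.0, 0.5),     # a genuine 0 rating still counts towards the divisor
        (2, 1.0, None, None),   # only a is present
        (3, None, None, None),  # nothing present: t should stay NULL
        (4, None, 0.4, 0.9),    # a is missing
    ],
)
conn.execute(f"UPDATE sites SET t = {weighted_avg_sql(WEIGHTS)}")
rows = conn.execute("SELECT id, t FROM sites ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Because the expression is generated from the weights dictionary, adding another outside source only means adding one entry there, which keeps the query from growing by hand.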

Related

Troubleshooting Errors with Two SUMs

I have a table, it's going to be used for a supplier scorecard, with eleven different fields that can be assigned a value of 1-5. Null values are allowed.
I need to write a query that will calculate the average of the fields that are filled out by each row. In other words, I might be dividing TOTAL by 11 in one row, and dividing TOTAL by 5 in another.
I'm working with this query:
select
cf$_vendor_no,
cf$_party,
cf$_environmental,
cf$_inspections,
cf$_invoice_process,
cf$_ncr,
cf$_on_time_delivery,
cf$_qms,
cf$_safety,
cf$_schedule,
cf$_scope_of_work,
cf$_turn_times,
sum(nvl(cf$_environmental,0)
+nvl(cf$_inspections,0)
+nvl(cf$_invoice_process,0)
+nvl(cf$_ncr,0)
+nvl(cf$_on_time_delivery,0)
+nvl(cf$_qms,0)
+nvl(cf$_safety,0)
+nvl(cf$_schedule,0)
+nvl(cf$_scope_of_work,0)
+nvl(cf$_turn_times,0))
/
sum(
case when cf$_environmental is not null then 1 else 0 end +
case when cf$_inspections is not null then 1 else 0 end +
case when cf$_invoice_process is not null then 1 else 0 end +
case when cf$_ncr is not null then 1 else 0 end +
case when cf$_on_time_delivery is not null then 1 else 0 end +
case when cf$_qms is not null then 1 else 0 end +
case when cf$_safety is not null then 1 else 0 end +
case when cf$_schedule is not null then 1 else 0 end +
case when cf$_scope_of_work is not null then 1 else 0 end +
case when cf$_turn_times is not null then 1 else 0 end) --as "average"
from supplier_scorecard_clv
group by cf$_vendor_no, cf$_party, cf$_environmental, cf$_inspections, cf$_invoice_process, cf$_ncr, cf$_on_time_delivery, cf$_qms, cf$_safety, cf$_schedule, cf$_scope_of_work, cf$_turn_times
And, it almost works.
The first SUM in my code will add the values in each row to give me a total. I get a total 25 for the first FARW002 row, I get 6 for the second, and 12 for the third.
The second SUM in my code works as well. I get a count of 6 for my first FARW002 row, 2 for my second, and 3 for my third.
However, when I try to combine these, as in the code snippet above, I get an "ORA-00923: FROM keyword not found where expected" error and I'm not sure why.

So, this is stupid but here's what the problem ended up being:
+nvl(cf$_turn_times,0))
/
sum(
When I changed the code to this - really I was just dicking around - it worked:
+nvl(cf$_turn_times,0))/sum(
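So, something about having the / and SUM separated from the rest of the query - which I only do to make the code more readable for me - was causing the issue. A likely explanation: SQL*Plus and similar Oracle clients treat a line containing only / as the command to run the statement entered so far, so the query would be cut off before it reached FROM.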
Thanks for nothing Juan!

Equality check for two boolean expressions (TSQL)

I have two tables, and I want to join them based on the sign of the related columns.
I'm joining on the condition a<0 = b<0, but the equality sign gives me a syntax error. I don't want to write (a<0 AND b<0) OR (NOT a<0 AND NOT b<0) because it doesn't look clean.

You know what the answer is:
where (a < 0 and b < 0) or (a >= 0 and b >= 0)
SQL Server doesn't treat boolean values as bona fide values, so you cannot treat them as regular values in other types of expressions.
You can express this using bitwise exclusive or ('^') if you really want:
where (case when a < 0 and b < 0 then 1 else 0 end) ^ (case when a >= 0 and b >= 0 then 1 else 0 end) = 1
However, I find that rather inscrutable.

As Arvo suggested, you can use Sign(). Assuming that a and b are integer types:
where Sign( Sign( a ) + 1 ) = Sign( Sign( b ) + 1 )
Explanation: The inner Sign() calls convert the input values to -1, 0 or 1. Adding 1 shifts those values to 0, 1 or 2. The outer Sign() calls collapse that back to 0 or 1 representing negative and non-negative inputs.
This kind of code is occasionally useful, but should be accompanied by a comment explaining the intent. If the technique is truly impenetrable then it should be explained in the comment or a citation provided.
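The Sign() explanation can be checked exhaustively over a small integer range. This is only a sketch in Python: sign and same_side are hypothetical helpers that mimic the T-SQL expression for non-NULL integers, and NULL handling is not covered.

```python
def sign(x: int) -> int:
    """Mimic T-SQL SIGN() for integers: returns -1, 0 or 1."""
    return (x > 0) - (x < 0)

def same_side(a: int, b: int) -> bool:
    """The Sign(Sign(a) + 1) = Sign(Sign(b) + 1) comparison from the answer."""
    return sign(sign(a) + 1) == sign(sign(b) + 1)

values = range(-3, 4)
# Collect every pair where the trick disagrees with the plain (a < 0) = (b < 0) test.
mismatches = [
    (a, b)
    for a in values
    for b in values
    if same_side(a, b) != ((a < 0) == (b < 0))
]
print(mismatches)  # an empty list means the two conditions agree on every pair
```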

Sqlite Get counts of all distinct values across a row

For a personal end of the year project I've scraped my attendance off the school website hoping to do some form of visualization of the data. I've now gotten stuck transforming that data into the form I need it in.
Currently my database looks like this
Date,One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Dee
2014-09-03,P,P,P,P,AU,AU,P,T*,AU,P
2014-09-04,P,P,P,P,N/A,AU,P,T*,N/A,P
2014-09-05,P,P,P,P,AU,AU,P,P,P,P
2014-09-09,P,P,P,P,AU,AU,P,P,AU,P
2014-09-11,AU,AU,P,AU,AU,P,AU,AU,AU,P
2014-09-15,P,P,P,P,AU,P,P,P,AU,P
2014-09-17,P,P,P,P,AU,AU,P,P,AU,P
Each column is a period, and each one has an indicator of my presence. My question is, is it possible to turn that into something like this using only sqlite?
Date,P,AU,T*,N/A
2014-09-03,6,3,1,0
2014-09-04,6,1,1,2
2014-09-05,8,2,0,0
2014-09-09,7,3,0,0
2014-09-11,3,7,0,0
2014-09-15,8,2,0,0
2014-09-17,7,3,0,0
2014-09-19,9,1,0,0
That is, counting each occurrence of a value across the row.

Something like this:
select date,
case when one = 'P' then 1 else 0 end +
case when two = 'P' then 1 else 0 end +
...
case when dee = 'P' then 1 else 0 end as p,
case when one = 'AU' then 1 else 0 end +
case when two = 'AU' then 1 else 0 end +
...
case when dee = 'AU' then 1 else 0 end as au,
...
from table
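Since the full query repeats one CASE per period for every code, it can be generated rather than typed. The sketch below does that with Python's sqlite3 module on the first three rows of the sample data; the table name attendance and the output aliases p, au, t and na are assumptions chosen for illustration. String comparison in SQLite is case-sensitive by default, so the literals match the stored codes exactly.

```python
import sqlite3

PERIODS = ["One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Dee"]
# Attendance code -> output column alias (aliases are hypothetical, chosen to be plain identifiers).
CODES = {"P": "p", "AU": "au", "T*": "t", "N/A": "na"}

def count_expr(code):
    """Sum of one CASE per period column, counting the cells equal to `code`."""
    return " + ".join(f"CASE WHEN {col} = '{code}' THEN 1 ELSE 0 END" for col in PERIODS)

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE attendance (Date TEXT, {', '.join(PERIODS)})")
sample = [
    "2014-09-03,P,P,P,P,AU,AU,P,T*,AU,P",
    "2014-09-04,P,P,P,P,N/A,AU,P,T*,N/A,P",
    "2014-09-05,P,P,P,P,AU,AU,P,P,P,P",
]
conn.executemany(
    f"INSERT INTO attendance VALUES ({', '.join('?' * 11)})",
    [line.split(",") for line in sample],
)

select_list = ", ".join(f"{count_expr(code)} AS {alias}" for code, alias in CODES.items())
result = conn.execute(f"SELECT Date, {select_list} FROM attendance ORDER BY Date").fetchall()
for row in result:
    print(row)
# ('2014-09-03', 6, 3, 1, 0)
# ('2014-09-04', 6, 1, 1, 2)
# ('2014-09-05', 8, 2, 0, 0)
```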

Counting how many data that exist [SQL]

I'm not sure whether this question has already been asked; it is probably easy, but I just can't see my way out of this problem.
It is about how many times we have sampled a material.
SELECT
TABLE01.MATERIAL_NO,
TABLE01.Sample_Tempt1,
TABLE01.Sample_Tempt2,
TABLE01.Sample_Tempt3,
TABLE01.Sample_Tempt4,
TABLE01.Sample_Tempt5
FROM
TABLE01
Is it possible to create another column that shows how many of the Sample_Tempt values exist?
I mean, if tempt1 and tempt2 have data, the column shows 2; if tempt2, tempt4 and tempt5 have data, the column shows 3; and so on.
Thank you for helping me ^^
Sample :
Material no | Sample_Tempt1 | Sample_Tempt2 | Sample_Tempt3 | Sample_Tempt4 | Sample_Tempt5 |
PO1025 120 150 102
PO1026 122
For PO1025, I want the new column to show 3 because only three sample values exist; for PO1026, I want it to show 1 since only one sample value exists. Quite simple, right?

If by "exist" you mean "value is not NULL", then you can count the number of non-NULL values in each row as:
SELECT t1.MATERIAL_NO,
t1.Sample_Tempt1, t1.Sample_Tempt2, t1.Sample_Tempt3, t1.Sample_Tempt4, t1.Sample_Tempt5,
((case when t1.Sample_Tempt1 is not null then 1 else 0 end) +
(case when t1.Sample_Tempt2 is not null then 1 else 0 end) +
(case when t1.Sample_Tempt3 is not null then 1 else 0 end) +
(case when t1.Sample_Tempt4 is not null then 1 else 0 end) +
(case when t1.Sample_Tempt5 is not null then 1 else 0 end)
) as NumTempts
FROM TABLE01 t1;
Note that I introduced a table alias. This makes the query easier to write and to read.
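Here is a small runnable sketch of that query using Python's sqlite3 module as a stand-in for whatever database the asker uses. Which of the five columns hold the sample values for PO1025 and PO1026 is not recoverable from the question, so the placement below is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE01 (MATERIAL_NO TEXT, Sample_Tempt1 INTEGER, Sample_Tempt2 INTEGER,"
    " Sample_Tempt3 INTEGER, Sample_Tempt4 INTEGER, Sample_Tempt5 INTEGER)"
)
conn.executemany(
    "INSERT INTO TABLE01 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("PO1025", 120, 150, None, 102, None),    # three samples (column placement assumed)
        ("PO1026", None, 122, None, None, None),  # one sample (column placement assumed)
    ],
)

cols = [f"Sample_Tempt{i}" for i in range(1, 6)]
# One CASE per column: 1 when the value is present, 0 when it is NULL.
num_tempts = " + ".join(f"(CASE WHEN t1.{c} IS NOT NULL THEN 1 ELSE 0 END)" for c in cols)
counts = conn.execute(
    f"SELECT t1.MATERIAL_NO, ({num_tempts}) AS NumTempts FROM TABLE01 t1 ORDER BY t1.MATERIAL_NO"
).fetchall()
print(counts)  # [('PO1025', 3), ('PO1026', 1)]
```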

Divide by zero error encountered even when divider is set as NOT NULL

I've come across the error "Divide by zero error encountered" when running this query.
SUM(CASE WHEN EML_DateSent IS NOT NULL THEN 1 ELSE 0 END) AS [Sends],
(SUM(CASE WHEN EML_DateViewed IS NOT NULL OR EML_DateClicked IS NOT NULL THEN 1 ELSE 0 END)) * 100 / SUM((CASE WHEN EML_Datesent IS NOT NULL THEN 1 ELSE 0 END)) AS [Views %],
(SUM(CASE WHEN EML_DateClicked IS NOT NULL THEN 1 ELSE 0 END)) * 100 / SUM((CASE WHEN EML_DateViewed IS NOT NULL OR EML_DateClicked IS NOT NULL THEN 1 ELSE 0 END)) AS [Clicks %]
It's an existing stored procedure that I edited so that it now calculates percentages. Is there a quick fix?

Try using a max/maximum function; the exact name depends on which database you are using (GREATEST in some, a two-argument MAX in others).
/ MAX(SUM((CASE WHEN EML_DateViewed IS NOT NULL OR EML_DateClicked IS NOT NULL THEN 1 ELSE 0 END)), 1)
This will use your sum if it has a value, if it is zero the division will use 1 instead.

You don't show the grouping criteria but it's obvious at least one of the groups has one of the dates set to NULL over the entire group. First, you don't have to put all that logic in a sum function. The count function does that, counting all the not null values, ignoring the null values. Where that doesn't work is where you're checking both dates, but that is solved by a simple coalesce. You want a count of where one date or the other or both are not null. There you can play a little trick:
select count(EML_DateSent) AS Sends,
count(coalesce(EML_DateViewed, EML_DateClicked)) * 100
/ case count(EML_DateSent)
when 0 then 1000000
else count(EML_DateSent)
end as "Views %",
count(EML_DateClicked) * 100
/ case count(coalesce(EML_DateViewed, EML_DateClicked))
when 0 then 1000000
else count(coalesce(EML_DateViewed, EML_DateClicked))
end AS "Clicks %"
from EML
group by whatever;
If the divisor is 0 (all nulls over the group), I have set it to a large number so you get a very small answer. This number must be large relative to the actual counts in your application, so adjust as needed.
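A different option from both answers above, shown only as a sketch, is to wrap each divisor in NULLIF(..., 0), a standard SQL function: a zero divisor becomes NULL, and dividing by NULL yields NULL rather than an error or an arbitrarily small number. The example runs on SQLite through Python's sqlite3 module; the campaign grouping column, the table layout and the dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EML (campaign TEXT, EML_DateSent TEXT, EML_DateViewed TEXT, EML_DateClicked TEXT)"
)
conn.executemany(
    "INSERT INTO EML VALUES (?, ?, ?, ?)",
    [
        ("spring", "2015-01-01", "2015-01-02", "2015-01-03"),  # sent, viewed, clicked
        ("spring", "2015-01-01", "2015-01-02", None),          # sent and viewed
        ("spring", "2015-01-01", None, None),                  # sent only
        ("spring", "2015-01-01", None, None),                  # sent only
        ("draft", None, None, None),                           # never sent: zero divisors
    ],
)

query = """
SELECT campaign,
       COUNT(EML_DateSent) AS sends,
       COUNT(COALESCE(EML_DateViewed, EML_DateClicked)) * 100
           / NULLIF(COUNT(EML_DateSent), 0) AS views_pct,
       COUNT(EML_DateClicked) * 100
           / NULLIF(COUNT(COALESCE(EML_DateViewed, EML_DateClicked)), 0) AS clicks_pct
FROM EML
GROUP BY campaign
ORDER BY campaign
"""
stats = conn.execute(query).fetchall()
for row in stats:
    print(row)
# ('draft', 0, None, None)
# ('spring', 4, 50, 50)
```

Whether NULL or a placeholder number is the better result for an empty group depends on how the report consumes the percentages.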
