No SUM() with NULL values - SQL

I'm looking for a solution to create sums of roughly 10 scores and targets of a product over 6 different dimensions. There are some more I won't bother you with. For every dimension I need a total. For example:
SalesPeriod. Product: Bikes. Dimensions: bmx, size, colours, with bars, etc. Targets: 1,2,3,4,5. Scores: 1,2,3,4,5.
So 10 totals for bmx bikes with size x, colour red and bars, and 10 totals for bmx bikes, size x, colour red, etc.
However, every score needs to be calculated only when none of the underlying values is a null. For example, if score 1 contains a null then it is not calculated, but if score 2 does not contain a null it should still be calculated.
At this point the calculation is done via a CASE statement which basically checks the values within each column/score and only calculates the total when the count of scores is equal to the expected number of rows.
The calculation requires a lot of CPU, and with a larger dataset it is very inefficient and simply takes too long.
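For illustration only, the kind of per-column CASE check described here might look roughly like this (all table and column names are made up, and the correlated count per group is one plausible reason such a query gets expensive):
SELECT f.columnid,
       CASE WHEN COUNT(f.score1) = (SELECT COUNT(*) FROM sales_scores x
                                    WHERE x.columnid = f.columnid)
            THEN SUM(f.score1)
       END AS total_score1
FROM sales_scores f
GROUP BY f.columnid;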
I'm looking for a solution that will be much more efficient. What could be my best option to try?

You can first filter (or group) to the products with non-NULL values only, using your same count method. I don't think there is any other method.
SELECT columnid, SUM(column1)
FROM table
GROUP BY columnid
HAVING COUNT(column1)=COUNT(*);
Then you can join it on columnid with another similar query on another columnN as well.
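If I understood the setup right, a rough sketch of that join could look like the following (it reuses the placeholder names from the query above, adds a hypothetical second column column2, and uses LEFT JOINs so a columnid still shows up when only one of its columns passes the count check):
SELECT p.columnid, s1.total1, s2.total2
FROM (SELECT DISTINCT columnid FROM table) p
LEFT JOIN (SELECT columnid, SUM(column1) AS total1
           FROM table
           GROUP BY columnid
           HAVING COUNT(column1) = COUNT(*)) s1 ON s1.columnid = p.columnid
LEFT JOIN (SELECT columnid, SUM(column2) AS total2
           FROM table
           GROUP BY columnid
           HAVING COUNT(column2) = COUNT(*)) s2 ON s2.columnid = p.columnid;
A total comes back as NULL for any columnid whose column contained a null, which matches the "only calculate when nothing is missing" rule.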
(I'm not sure if I understood your problem completely, but you basically want an efficient query with sum(scores) and sum(targets) only when they are not null? Or only when they are both not null? Or only scores? Or only targets?)

Related

select lowest percentage values

With a set of data that has for each user an age and a score, I would like to select and plot the lowest 15% of the scores for each age.
Is there a way to simply take out the lowest 15% of scores for each age? This would preferably be in an automated way (I don't want to have to repeat the process for all 100 ages).
I have tried conditional formatting, but then I would have to manually select which data to keep. I suspect it is a complicated IF function that hopefully I won't have to rewrite for each age.
I've created a column that ranks the scores with respect to the others of that age:
RANK(B4,$B$4:$B$426,1)
Then in a new column convert that to a percentage of the total number of entries for that age:
(E4/COUNT($E$4:$E$426))*100
Then an IF statement where it copies the score only if it is in the bottom 15% of scores:
IF(F4<15,B4,"-")
This process is long and messy, and it doesn't seem logical to repeat it 100 times, so how do I automate it?

Should I combine the columns of a fact table to make it narrower, or should I keep it more user-friendly with a lot of columns?

I have a Fact table that shows the results of KPIs. There are several KPIs, and some of these have a similar output.
My current columns are something like this:
KPI_ID, DOCUMENT_ID, TRUE_FALSE_FLAG1, TRUE_FALSE_FLAG2, DURATION_3, DURATION_4
So, for KPI number 1 (true/false output), the last three columns will be NULL values. Should I combine TRUE_FALSE_FLAG1 and TRUE_FALSE_FLAG2? What is BEST PRACTICE?
In total, there are 18 columns, where 12 of them are either true/false flags or durations in the shape of "number of days" (integer).
picture of the two alternatives
EDIT:
KPI 3 could be "duration of problem", and you'd have a bunch of problems, each with a documentID, represented as a row. Dur_3 would be like 5 days, 3 days, 10 days, etc. KPI 4 would be "delay of fix after repair was ordered", and the answer would still be an integer in days, but completely unrelated to KPI 3.
Reporting could be "average delay of fix". So roughly a select AVG() from table where KPI_ID = 3 group by KPI_ID.
Based on your latest comment, you are best with Alternative 2. Specifically, as long as every KPI has at most one True/False value and at most one duration to store, you are better off with Alternative 2.
EDIT: with Alternative 2, each KPI can store one True/False value AND one duration value
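For example, reporting against that narrower Alternative 2 shape could look roughly like this (the table name FACT_KPI_RESULTS and the generic DURATION_VALUE column are assumptions for illustration, not the actual design):
SELECT KPI_ID, AVG(DURATION_VALUE) AS avg_duration_days
FROM FACT_KPI_RESULTS
WHERE KPI_ID = 3   -- e.g. "duration of problem"
GROUP BY KPI_ID;
The true/false KPIs would presumably read from a single generic flag column in the same way, so adding a new KPI doesn't require adding new columns.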

Average Distinct Values in a single column in Power Pivot

I have a column in PowerPivot that basically goes:
1
1
2
3
4
3
5
4
If I =AVERAGE([Column]), it's going to average all 8 values in the sample column. I just need the average of the distinct values (i.e., in the example above I want the average of 1, 2, 3, 4, and 5).
Any thoughts on how to go about doing this? I tried a combination of =(DISTINCT(AVERAGE)) but it gives a formula error.
Thanks!!
Kevin
There must be a cleaner way of doing this, but here is one method: it uses a measure to get the sum of the values divided by the number of times each value appears (to basically give back the original value), then uses an iterator function to do that for each unique value.
Apologies for the uninspired measure names:
[m1] = SUM(Table1[theValue]) / COUNTROWS(Table1)
[m2] = AVERAGEX(VALUES(Table1[theValue]), [m1])
Assuming your table is called Table1 and the column is called theValue.

Dynamic use of MDX AVG function

Anyone have advice on how to build an average measure that is dynamic -- it doesn't specify a particular slice but instead uses your current view? I'm working within a front-end OLAP viewer (Strategy Companion) and I need a "dynamic" implementation based on the dimensions that are currently filtered in the data view.
My fact table looks something like this:
Key  AmountA  IndicatorA  AmountB  Other Data
1    5        1           null     25
2    6        1           null     52
3    7        1           2        106
4    null     0           4        108
Now I can specify a simple average for "[Measures].[AmountA]" with "[Measures].[AmountA] / [Measures].[IndicatorA]" which works great - "[IndicatorA]" sums up to the number of non-null values of "[AmountA]". And this also works great no matter what dimensions are selected in the view - it always divides by the count of rows that have been filtered in.
But what about [AmountB]? I don't have a null indicator column. I want to get an average value of [AmountB] for whatever rows have been filtered in for my current view. If I try to use the count of rows as a simple formula (pseudo-code "[Measures].[AmountB] / Count([Measures].[Key])") I get the wrong result, because it is counting all the null rows in the average.
So, I need a way to use the AVG function to specify the average of [AmountB] over the set of "whatever rows I'm currently filtering in, based on whatever dimensions I'm currently using". How do I specify this dynamic set?
I've tried several different uses of the AVG function and they have either returned null or summed up to huge numbers, clearly not the average I'm looking for.
Thanks-
Matt
Sorry, my first suggestion was wrong. If you don't have access to the OLAP cube you can't write any MDX query for this purpose (IMHO), because at that access level you don't have any detailed data (from your fact table) and you can only use aggregated data and dimensions from your cube.
Otherwise (if you do have access to the OLAP db), you can create this metric (a count of non-NULL rows) in your measure group and after that use it for the AVG calculation (as a calculated member in your cube or in the WITH section of your MDX query).
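For example, on the relational side that metric can come from a named calculation or an extra fact column mirroring the existing IndicatorA (the IndicatorB name and the fact table name below are assumptions):
SELECT [Key],
       AmountB,
       CASE WHEN AmountB IS NULL THEN 0 ELSE 1 END AS IndicatorB
FROM FactTable;
Summed into the measure group, [Measures].[AmountB] / [Measures].[IndicatorB] then behaves the same way the AmountA average already does.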

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have a query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name)/(SELECT COUNT(1) FROM names) FROM names GROUP BY name;
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.
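For example, a sketch that carries that total alongside each group without scanning the table a second time (this uses a window function instead of the non-grouped subselect, so it needs a database that supports them; the * 1.0 avoids integer division):
SELECT name,
       COUNT(*) AS name_count,
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS ratio
FROM names
GROUP BY name
ORDER BY name_count DESC;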