Microsoft Report Builder - Row totals percent of column total - sql

I'd really appreciate some help with Report Builder. As seen below, I have a report that shows the number of items. In my SQL query I have used a CASE statement to tag some of the items with a y or a n.
What I want to do is add a calculated cell that sums the values of the items tagged y, divides by the total, and multiplies by 100, giving the percentage of the total amount that comes from rows tagged y.
The answer I'm looking for is:
Apple | Y | 100
Pear | Y | 200
Orange| N | 500
Total | 800
Percent of Ys = 37.5% ((100 + 200) / 800 * 100)
I'm new to report builder so please let me know if this doesn't make sense.
Many thanks.

You could add two more columns to your query, using similar logic as your CASE statement for the Y/N column. The first column is populated with the value only when the condition for "Y" is true, otherwise it is zero. The second column is populated with the value only when the condition for "N" is true, otherwise it is zero. This would give you a result set similar to this:
Item | Tag | All | Y | N
Apple | Y | 100 | 100 | 0
Pear | Y | 200 | 200 | 0
Orange | N | 500 | 0 | 500
Total | | 800 | 300 | 500
Then your calculation is something like this:
Percent of Ys = (Sum(Y) / Sum(All)) * 100
i.e.
Percent of Ys = (300 / 800) * 100 = 37.5%
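The arithmetic above can be checked outside Report Builder with a minimal Python sketch. The row tuples and the helper name `percent_tagged` are made up for illustration and are not part of the report or the query.

```python
# Rows mirror the example table: (item, tag, count). Illustrative data only.
rows = [("Apple", "Y", 100), ("Pear", "Y", 200), ("Orange", "N", 500)]

def percent_tagged(rows, tag):
    """Share of the grand total contributed by rows carrying `tag`, in percent."""
    total = sum(count for _, _, count in rows)
    tagged = sum(count for _, t, count in rows if t == tag)
    # Guard against an empty dataset so we never divide by zero.
    return tagged / total * 100 if total else 0.0

print(percent_tagged(rows, "Y"))  # 37.5
```

The conditional sum plays the same role as the extra "Y" column: the value counts only when the tag matches, otherwise it contributes zero.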

Sum dataset in SQL with conditional expressions

I'm working with a dataset of items with different values and I would like a SQL query to calculate the total USD value of the dataset.
Example Dataset:
id | type | numOrdered
0 | apple | 1
1 | orange | 3
2 | apple | 10
3 | apple | 5
4 | orange | 2
5 | apple | 1
Consider this dataset of fruit orders. Let's say apples are worth $1 and oranges are worth $2. I would like to know how much total USD in fruit orders we have.
I'd like to perform the same operation as this example Javascript function, but using SQL:
let sum = 0;
for(let fruitOrder of fruitOrders) {
if(fruitOrder.type == "orange"){
sum += fruitOrder.numOrdered*2;
} else {
sum += fruitOrder.numOrdered*1;
}
}
return sum;
So the correct answer for this dataset would be $27 USD total since there are 17 apples worth $1 and 5 oranges worth $2.
I know how to break it down into two distinct queries giving me the number I want split by type
SELECT
sum("public"."fruitOrders"."numOrdered"*2) AS "sum"
FROM "public"."fruitOrders"
WHERE "public"."fruitOrders"."type" = 'orange';
which would return $10, the total USD value of oranges
SELECT
sum("public"."fruitOrders"."numOrdered") AS "sum"
FROM "public"."fruitOrders"
WHERE "public"."fruitOrders"."type" = 'apple';
which would return $17, the total USD value of apples
I just don't know how to sum those numbers together in SQL to get $27, the total USD value of the dataset.
If you want the values 1 and 2 hardcoded then you can use a CASE statement with SUM():
SELECT
sum(case type
when 'apple' then 1
when 'orange' then 2
end * numordered
) AS "sum"
FROM "public"."fruitOrders"
Result:
| sum |
| --- |
| 27 |
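As a quick way to try the CASE-inside-SUM approach without a database server, here is a small sketch that uses Python's built-in sqlite3 module with an in-memory table. It assumes SQLite rather than the asker's database, and the unqualified table name is a simplification.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruitOrders (id INTEGER, type TEXT, numOrdered INTEGER)")
conn.executemany(
    "INSERT INTO fruitOrders VALUES (?, ?, ?)",
    [(0, "apple", 1), (1, "orange", 3), (2, "apple", 10),
     (3, "apple", 5), (4, "orange", 2), (5, "apple", 1)],
)

# The CASE picks the unit price per row; SUM adds up price * quantity.
query = """
SELECT SUM(CASE type WHEN 'apple' THEN 1 WHEN 'orange' THEN 2 END * numOrdered)
FROM fruitOrders
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 27
```

Note that a type other than apple or orange would make the CASE yield NULL, so that row would simply be skipped by SUM.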

FormulaArray not averaging out all the specified entries

Table 1:
G H I J K
| Lane | Bowler | Score | Score | Score | 1
|:-----------|------------:|:------------:|:------------:|:------------:|
| Lane 1 | Thomas| 100 | 100 | 100 | 2
| Lane 2 | column | 200 | 200 | 100 | 3
| Lane 3 | Mary | 300 | 300 | 100 | 4
| Lane 1 | Cool | 150 | 400 | 100 | 5
| Lane 2 | right | 160 | 500 | 100 | 6
| Lane 9 | Susan | 170 | 600 | 100 | 7
Say I want to find the average for each lane that appears in table 2 and put it in column O:
Table 2:
N O
| Lane | Average | 1
|:-----------|------------:|
| Lane 1 | | 2
| Lane 2 | | 3
| Lane 3 | | 4
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7 )) for lane 1 (put this formula on cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7 )) for Lane 3 ("O4")
My first question is
What if I want to find the average of ALL the lanes that appear in table 2 together? That is, the average of Lane 1, Lane 2 and Lane 3 combined (but not other lanes, such as Lane 9).
My attempt:
= Average(IF(G2:G7 = N2:N4, I2:K7)) why doesn't this work?
My second question is
I have done the "average of each individual Lane" using vba:
.
Dim i As Integer
For i = 2 To 4
Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C12))"
Next i
.
What if I had done it in VBA without the .Formula method?
For Lane 1 only:
pseudo code:
Loop x from 2 to 7
If cell(N2) = cell(Gx) Then
Sum = Sum + Ix + Jx + Kx
totalEntries = totalEntries + 3
End If
Average = Sum / totalEntries
Would this be slower than using the built-in .Formula? Is there an advantage to doing it this way instead?
The answer to the first question, about why this FormulaArray
= Average(IF(G2:G7 = N2:N4, I2:K7)) doesn't work,
is implicit in how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array of TRUE/FALSE values marking the records from range $G$7:$G$12 that comply with the condition = $N7. Fig. 1 shows the first part of the FormulaArray as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part, G2:G7 = N2:N4, because it fails when trying to compare two arrays of different dimensions, resulting in #N/A (see Fig. 3).
However, even if the arrays had the same dimensions, the result would not include the duplicated values, because the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with any of the conditions = $N7, = $N8 or = $N9 (adding the comparisons with + is equivalent to the OR operator).
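To illustrate the OR logic outside Excel, here is a rough Python sketch using the question's data. The list layout and the helper name `average_for` are assumptions for illustration; the membership test `lane in wanted` plays the role of the added comparisons.

```python
# Table 1 from the question: one lane label and three scores per row.
lanes = ["Lane 1", "Lane 2", "Lane 3", "Lane 1", "Lane 2", "Lane 9"]
scores = [
    [100, 100, 100],
    [200, 200, 100],
    [300, 300, 100],
    [150, 400, 100],
    [160, 500, 100],
    [170, 600, 100],
]

def average_for(lanes, scores, wanted):
    """Average every score on rows whose lane is in `wanted` (the OR condition)."""
    picked = [s for lane, row in zip(lanes, scores) if lane in wanted for s in row]
    return sum(picked) / len(picked)

print(average_for(lanes, scores, {"Lane 1", "Lane 2", "Lane 3"}))  # 194.0
```

Each row of Table 1 is tested once against the whole set of wanted lanes, which is what the position-by-position comparison G2:G7 = N2:N4 fails to do.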
As regards the second question:
Performance is intrinsically tied to maintenance and efficiency.
The sample procedure just enters a formula that is hard-coded and only works for this particular case. For example:
If the ranges need to be expanded, the macro has to be updated. With worksheet formulas you may still have to change the formula, but there is no need to open the VBA editor.
If any column before column G is deleted because it has become obsolete, the macro needs to be updated, whereas worksheet formulas require no maintenance because they adjust automatically.
Regarding the macro without the .Formula method:
I find this redundant: it amounts to writing an algorithm for something an existing function already does efficiently and accurately, so such a macro adds nothing that isn't already available.
I would consider writing such a procedure only when the workbook is very large and so resource-heavy that it slows down noticeably. Even then, the benefit comes not from having the procedure write the formulas, but from having it calculate the results and enter the resulting values instead of the formulas, which keeps the workbook light, fast and smooth for the end user.
To get the average of them all, just use
=AVERAGE(I2:K7)
As to the VBA, as it is all done on the same lines, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i

Tabulate Command Stata

I don't know if Stata can do this, but I use the tabulate command a lot to find frequencies. For instance, I have a success variable that takes the values 0 or 1, and I would like to know the success rate for a certain group of observations, i.e. tab success if group==1. I was wondering if I can do the inverse of this operation. That is, I would like to find the values of "group" for which the frequency is greater than or equal to 15%, for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.
According to @Nick:
ssc install groups
sysuse auto
count
74
* return list (optional)
local nobs = r(N) // r(N) holds the total number of observations
groups rep78, sel(f >(0.15*`r(N)')) // groups whose frequency is more than 15% of all observations
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f >(0.10*`nobs')) // more than 10%
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+
I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?
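For readers who want to see the logic of the egen approach spelled out, here is a minimal Python sketch: compute the mean of success within each group, then keep the groups at or above the threshold. The records and the function name `groups_with_rate_at_least` are invented for illustration and are not Stata output.

```python
from collections import defaultdict

def groups_with_rate_at_least(records, threshold):
    """Return the groups whose mean of success (0/1) is >= threshold."""
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, observations]
    for group, success in records:
        totals[group][0] += success
        totals[group][1] += 1
    return sorted(g for g, (s, n) in totals.items() if s / n >= threshold)

# Made-up (group, success) pairs: group 1 succeeds 1/4, group 2 2/2, group 3 0/2.
records = [(1, 1), (1, 0), (1, 0), (1, 0), (2, 1), (2, 1), (3, 0), (3, 0)]
print(groups_with_rate_at_least(records, 0.15))  # [1, 2]
```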

Qlikview - Scatter chart dot colors dimension setup not working

I have some data that I want to display in scatter chart. I have the following two dimensions:
Dimension1: This is each record in the table - say unique id for each row. So the number of dots should be equal to number of records.
Dimension2: This is a combination of 2 columns, tp and vc. The color of each dot is based on these 2 columns.
id | tp | vc
1 | a | 1
2 | b | 2
3 | c | 1
So there will be dots of 3 colors based on the above tp and vc combinations. Then there are 3 expressions representing X and Y and Size of dot. I am not sure how to configure the dimensions to achieve the goal.
Thanks
You will need a calculated dimension, which is the concatenation expression defined as =tp & vc in your case.
This will be your single dimension. Your x, y and size expressions then make up the remaining requirements for this chart.
This will give you three colors, one for each unique record combination, and they will be labeled a1, b2 and c1.
id | tp | vc | x | y | size
1 | a | 1 | 3 | 5 | 7
2 | b | 2 | 1 | 2 | 10
3 | c | 1 | 9 | 5 | 5
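The effect of the concatenated dimension can be previewed with a short Python sketch. This is not QlikView syntax; the dictionary layout and the helper `color_key` are hypothetical and only mimic what =tp & vc produces for the sample rows.

```python
# Sample rows from the table above, as plain dictionaries.
records = [
    {"id": 1, "tp": "a", "vc": 1, "x": 3, "y": 5, "size": 7},
    {"id": 2, "tp": "b", "vc": 2, "x": 1, "y": 2, "size": 10},
    {"id": 3, "tp": "c", "vc": 1, "x": 9, "y": 5, "size": 5},
]

def color_key(record):
    """Mimic the calculated dimension tp & vc by joining the two values as text."""
    return f"{record['tp']}{record['vc']}"

# One distinct key per tp/vc combination, i.e. one color each.
keys = sorted({color_key(r) for r in records})
print(keys)  # ['a1', 'b2', 'c1']
```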

vba sum value in column except one

I have a matrix in Microsoft reports. It looks like this:
Product | Sold
apple | 1000
melon | 200
banana | 500
orange | 2000
sum(without orange) | x
sum | 3700
How do I write an expression in VBA to sum all values except orange? The number of fruit rows can vary, so I can't use a static index to identify the product.
=sum(IIf(Fields!Product.Value<>"orange", Fields!Sold.Value, 0))
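The same conditional-sum idea, written as a small Python sketch so the expected numbers can be checked. The row tuples and the helper name `sum_excluding` are illustrative assumptions, not report expressions.

```python
# (product, sold) pairs from the example matrix.
rows = [("apple", 1000), ("melon", 200), ("banana", 500), ("orange", 2000)]

def sum_excluding(rows, excluded):
    """Add up sold values, counting zero for the excluded product (like the IIf)."""
    return sum(sold if product != excluded else 0 for product, sold in rows)

print(sum_excluding(rows, "orange"))  # 1700
```

Because the test is on the product name rather than the row position, it keeps working when the number of fruit rows changes.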