Excel: one column has duplicates of each value; I need the average of the corresponding two values from the other column - excel-2007

Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would somehow like to get the following result:
column A column B
A 1.5
B 2
C 1
(these are the averages of 1 and 2, of 2 and 2, and of 1 and 1)
How do I achieve that?
Thanks

If you're using Excel 2007 or above, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A$6,D1,$B$1:$B$6)
Less typing and easier to read.

In D1:D3, type A, B, C. Then in E1, put this formula
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and use Paste Special > Values to paste them over themselves. Then delete columns A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.
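To make the per-letter calculation explicit, here is a small Python sketch (not part of either answer; `average_by_key` is a hypothetical name) of what SUMIF/COUNTIF, AVERAGEIF, or the pivot table's AVERAGE computes: sum the values for each key, count them, and divide.

```python
from collections import defaultdict


def average_by_key(rows):
    """Return {key: mean of its values} for (key, value) pairs."""
    totals = defaultdict(float)  # plays the role of SUMIF per key
    counts = defaultdict(int)    # plays the role of COUNTIF per key
    for key, value in rows:
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


data = [("A", 1), ("A", 2), ("B", 2), ("B", 2), ("C", 1), ("C", 1)]
print(average_by_key(data))
```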

Related

How to transpose columns when they encode multiple "records"?

I have a spreadsheet I have imported into OpenRefine. The creator encoded groups of information (records) in columns. I need to bring each of those groups of columns into its own row, along with all the relevant columns.
Using a simplified example, how would I go from this:
id foo1 foo2 foo3 bar1 bar2 bar3
1 4 6 a 7 9 b
2 5 5 a 8 8 b
3 6 4 a 9 7 b
To this:
id foobar1 foobar2 foobar3
1 4 6 a
1 7 9 b
2 5 5 a
2 8 8 b
3 6 4 a
3 9 7 b
I've been trying to think of a way forward with intermediate columns, but there are 6 groups of 5 columns and I'm currently stuck.
I found a solution. The steps are:
1. Concatenate each group of columns into a single column (FOO_CONCAT, BAR_CONCAT).
2. Delete the now-unneeded columns (foo1..3, bar1..3).
3. Transpose the CONCAT columns into a single column, with no prefix, ignoring blanks and filling down the other columns. Now the FOO_CONCATs and BAR_CONCATs are all in the same column.
4. Split that column into several columns, using the separator you used in step 1.
5. Rename the columns.
6. Strip out the prefixes (for clarity I had foo1:4, bar2:8, etc.).
7. Transform to numbers (Edit cells -> Common Transforms -> toNumber).
Now you're ready to transpose, facet, etc.
I think this is essentially the same as the solution you describe, but possibly with some shortcuts to avoid all the steps.
Given the example data you posted, I would:
1. On the "id" column, select Edit column->Add column based on this column from the menu.
2. Name the new column "foobar".
3. Use the GREL forEach(row.columnNames,cn,if(cn.startsWith("foo"),cells[cn].value,null)).join("|")+"~"+forEach(row.columnNames,cn,if(cn.startsWith("bar"),cells[cn].value,null)).join("|")
4. Once the new "foobar" column exists, use the menu option Edit cells->Split multi-valued cells on it, with the "~" character (as used in the GREL above) as the separator.
5. Then, also on the "foobar" column, use the menu option Edit column->Split into several columns, with the "|" character as in the GREL above.
6. Finally, on the id column, use the menu option Edit cells->Fill down.
This should result in the output you describe - if you don't need the original columns at this point you can either remove them, or (sometimes quicker) export the first X columns that have the reconfigured data using the custom tabular exporter, and then import that data into a new project.
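As a rough illustration of what those menu steps do to each row, here is a Python simulation (a sketch on plain lists, not OpenRefine code; `reshape` and its `prefixes` parameter are hypothetical). It builds the same "~"/"|" string as the GREL, splits on "~" into records, splits each record on "|" into columns, then fills the id down.

```python
def reshape(header, rows, prefixes=("foo", "bar")):
    """Simulate the GREL join + split + fill-down steps on plain lists."""
    records = []
    for row in rows:
        cells = dict(zip(header, row))
        # Like the GREL: join each prefix group with "|", then join groups with "~"
        combined = "~".join(
            "|".join(cells[cn] for cn in header if cn.startswith(p))
            for p in prefixes
        )
        # Split multi-valued cells on "~": only the first record keeps the id,
        # the new rows get a blank id, as in OpenRefine
        for n, group in enumerate(combined.split("~")):
            record_id = cells["id"] if n == 0 else None
            # Split into several columns on "|"
            records.append([record_id] + group.split("|"))
    # Fill down the id column
    last_id = None
    for record in records:
        if record[0] is None:
            record[0] = last_id
        else:
            last_id = record[0]
    return records


header = ["id", "foo1", "foo2", "foo3", "bar1", "bar2", "bar3"]
rows = [["1", "4", "6", "a", "7", "9", "b"],
        ["2", "5", "5", "a", "8", "8", "b"],
        ["3", "6", "4", "a", "9", "7", "b"]]
for record in reshape(header, rows):
    print(record)
```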
You can modify the GREL to deal with the exact column groupings you have. In my example I've used the column naming to group the values, but if that isn't the reality of the data you are dealing with you can use GREL like:
forEach(row.columnNames.slice(1,4),cn,cells[cn].value).join("|")+"~"+forEach(row.columnNames.slice(4,7),cn,cells[cn].value).join("|")
Which uses the 'slice' function to select certain columns rather than using some aspect of the column name to select them.
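If the groups are defined purely by position (the question mentions 6 groups of 5 columns), the slice idea generalises to consecutive fixed-size chunks after the id column. A Python sketch under that assumption; `reshape_by_position` and `group_size` are hypothetical names:

```python
def reshape_by_position(rows, group_size, id_index=0):
    """Emit one record per consecutive chunk of group_size non-id cells."""
    records = []
    for row in rows:
        rest = row[:id_index] + row[id_index + 1:]
        if len(rest) % group_size:
            raise ValueError("row length is not a multiple of group_size")
        # Python slices, like GREL slice(from, to), exclude the end index
        for start in range(0, len(rest), group_size):
            records.append([row[id_index]] + rest[start:start + group_size])
    return records


rows = [["1", "4", "6", "a", "7", "9", "b"],
        ["2", "5", "5", "a", "8", "8", "b"]]
print(reshape_by_position(rows, group_size=3))
```

For the real data you would pass `group_size=5`.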

Odd Even Sorting in VBA

I am trying to sort rows of data so that, within each type, the rows whose alphanumeric address has an odd integer part come first, followed by the rows with an even integer part.
The only way I have got it to (semi)work was this:
- Determine whether the integer part of the address is even or odd
- Write EVEN or ODD in a cell in that address's row
- Run the macro
- Filter the data by the EVEN or ODD designation
This approach isn't ideal. I am interested in rearranging the rows without having to use filtering.
Below is an example of how the sorting would go.
UNSORTED SORTED
Address Type Address Type
1.1p A 1.1p A
1.2p A 1.2p A
1.3p A 1.3p A
1.4p A 1.4p A
2.1p A 3.1p A
2.2p A 3.2p A
2.3p A 3.3p A
2.4p A 3.4p A
3.1p A 5.1p A
3.2p A 5.2p A
3.3p A 5.3p A
3.4p A 5.4p A
4.1p A 2.1p A
4.2p A 2.2p A
4.3p A 2.3p A
4.4p A 2.4p A
5.1p A 4.1p A
5.2p A 4.2p A
5.3p A 4.3p A
5.4p A 4.4p A
6.1p B 7.1p B
6.2p B 7.2p B
6.3p B 7.3p B
6.4p B 7.4p B
7.1p B 9.1p B
7.2p B 9.2p B
7.3p B 9.3p B
7.4p B 9.4p B
8.1p B 6.1p B
8.2p B 6.2p B
8.3p B 6.3p B
8.4p B 6.4p B
9.1p B 8.1p B
9.2p B 8.2p B
9.3p B 8.3p B
9.4p B 8.4p B
10.1p B 10.1p B
10.2p B 10.2p B
10.3p B 10.3p B
10.4p B 10.4p B
I am new to VBA. Thank you in advance for any suggestions.
I think you need to create a helper column where you can store a value that you can use for sorting.
The basic idea is to extract the numeric value from your "Address" column, check whether its integer part is even, and if so multiply it by a high value (e.g. 1000) so that it is guaranteed to be higher than the highest possible odd value. Then sort by Type first and by the helper column second.
You can either use a formula for the helper column, though it looks a little complicated to me (assuming your data starts in cell A2):
=VALUE(LEFT(A2, SEARCH("p", A2, 1)-1))*IF(ISODD(VALUE(LEFT(A2, SEARCH("p", A2, 1)-1))),1,1000)
or write a small UDF
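Function SortVal(s As String) As Double
    'Val reads the leading number, e.g. "3.2p" -> 3.2
    SortVal = Val(s)
    'push even integer parts above every odd value (assumes odd values < 1000)
    If Int(SortVal) Mod 2 = 0 Then SortVal = SortVal * 1000
End Function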
and put a call to it in your helper column
=SortVal(A2)
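To check the ordering logic before wiring it into a sheet, here is a Python sketch (not VBA) that mirrors SortVal and then sorts by Type first and the helper value second, which is the order the expected output implies. `sort_val` and the generated sample rows are illustrative assumptions; like the UDF, it assumes odd values stay below 1000.

```python
import re


def sort_val(address):
    """Mirror of the SortVal UDF: leading number, times 1000 if its integer part is even."""
    value = float(re.match(r"\d+(\.\d+)?", address).group())  # like VBA Val("3.2p") -> 3.2
    return value * 1000 if int(value) % 2 == 0 else value


# Sample rows from the question: (address, type); 1-5 are type A, 6-10 are type B
rows = [(f"{n}.{k}p", "A" if n <= 5 else "B") for n in range(1, 11) for k in range(1, 5)]
ordered = sorted(rows, key=lambda r: (r[1], sort_val(r[0])))
print([address for address, _ in ordered[:8]])
```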

Business Objects CountIf by cell reference

So I have a column with this data
1
1
1
2
3
4
5
5
5
How can I do a count-if where the value at any given location in the above table equals a cell I select? I.e., doing Count([NUMBER]) Where([NUMBER] = Coordinates(0,0)) would return 3, because the value at position 0 is 1 and there are 3 rows with that value.
It's basically like Excel, where COUNTIF(A:A, 1) gives the total number of rows where the value in A:A is 1. Is this possible in Business Objects Web Intelligence?
Functions in WebI operate on rows, so you have to think about it a little differently.
If your intent is to create a cell outside of the report block and display the count of specific values, you can use Count() with Where():
=Count([NUMBER];All) Where ([NUMBER] = "1")
In a freestanding cell, the above will produce a value of "3" for your sample data.
If you want to put the result in the same block and have each row show how many times its value occurs, for example:
NUMBER NUMBER Total
1 3
1 3
1 3
2 1
3 1
4 1
5 3
5 3
5 3
it gets a little more complicated. You have to have at least one other dimension in the query to reference. It can be anything, but you have to be counting something in conjunction with the NUMBER dimension. So, the following would work, assuming there's another dimension in the query named [Duh]:
=Count([NUMBER];All) ForAll([Duh])
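Outside WebI, the two cases in this answer correspond to a plain count of one value and a per-row count of each row's value. A Python sketch for comparison only (it is not WebI syntax; `count_per_row` is a hypothetical name):

```python
from collections import Counter


def count_per_row(values):
    """Pair each row's value with how many rows share it (the "NUMBER Total" column)."""
    counts = Counter(values)
    return [(value, counts[value]) for value in values]


numbers = [1, 1, 1, 2, 3, 4, 5, 5, 5]
print(Counter(numbers)[1])  # freestanding count of 1s, like Count ... Where
print(count_per_row(numbers))
```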

How to find conditional cumulative sums in an excel table using VBA macro

Let's say I have two columns.
3.5463 11
4.5592 12
1.6993 111
0.92521 112
1.7331 121
2.1407 122
1.4082 1111
2.0698 1112
2.3973 1121
2.4518 1122
1.1719 1211
1.153 1212
0.67139 1221
0.64744 1222
1.3705 11111
0.9557 11112
0.64868 11121
0.7325 11211
0.58874 11212
0.86673 11221
0.17075 11222
0.64026 12111
0.80229 12112
0.43422 12122
1.0405 12211
0.63376 12212
0.56491 12221
0.34626 12222
0.81631 111111
0.91837 111112
0.70013 111121
0.87384 111122
1.1474 111211
0.47411 111221
0.12249 111222
0.56728 112111
0.88169 112112
0.14509 112121
0.68655 112211
0.36274 112212
1.1652 121111
0.99314 121112
0.42024 121121
0.23937 121122
1.0346 122111
0.64642 122112
0.15632 122121
0.41725 122122
0.40793 122211
The first column holds a number, and the second column holds the ID associated with it. Some rows are blank and contain no numbers.
Define one of these numbers to be a "daughter" of another number if the ID of the first number is the same as the ID of the second, with an extra digit on the end. For example, both IDs 11211 and 11212 are daughters of 1121, because the ID of 1121 has an extra digit, either a 1 or a 2, added onto the end to form the ID of its daughters. Thus, 1121 is the parent of both 11211 and 11212.
Here is what I want the macro to do. It must output a third column which contains, for every row, a cumulative sum of the number in the first column of that row, plus the number of its parent, plus the number of the parent's parent, and so on, all the way up until it reaches either 11 or 12. It will begin by simply outputting the column 1 numbers for 11 and 12 in the third column. Then, in a loop beginning with 111, it will compute the cumulative sum for every row (the number in that row plus the third-column output of the parent), but only if that row has a number and an ID, and only if the parent exists and has an output in column 3. So, for example, the number in the 3rd column of the row with ID 11222 should be the number in column 1 of that row, plus that of 1122, plus that of 112, plus that of 11. So, 0.17075+2.4518+0.92521+3.5463, or 7.09406. However, if you try to do this for ID 111221, you will notice that the row where the parent 11122 should be is empty. Thus, the parent does not exist, and no value will be output in column 3 for 111221.
I would greatly appreciate it if someone has some time on their hands to code up this VBA macro for me in exchange for an accepted solution.
Thanks
I don't think a macro is needed, just some formulas. First, I put headers on the columns of data, such as "value" and "id". If you then select the columns (i.e., A and B) and sort by B ("id") and then A ("value"), the blank rows are grouped together and you can delete them. Now the data is almost ready.
When I did this, I converted the id column to text rather than numbers, so that sorting the table by id gives the pattern "11, 111, 1111," and so on, instead of "11, 12, 111, 112, 121."
Next, I added columns that split out the individual characters (levels) of each id, to help with parents and children. You can use Text to Columns or a MID formula; I used 6 more columns to the right. For each id row, each of these columns holds a "1", a "2", or a blank (null) value.
Then I added another column called "level", using a COUNTA formula across all the id-splitting columns. So the level of 11 is 2, 111 is 3, 11221 is 5, and so on. This gives the id's level (parent, child, grandchild, etc.).
Finally, I added a column on the right to compute the cumulative sum of the values. In concept it's one big nested IF statement, but in practice I needed two. The formula says: if the row above has a lower level number (i.e., it is some kind of parent), add the value of the current row to that row's cumulative sum. Otherwise, keep going up a row until a parent is found, and add the current row's value to its cumulative sum.
My final formula for all but the first 5 rows of data was (in the 6th row of data):
=if(K6<K7,L6+C7,if(K5<K7,L5+C7,if(K4<K7,L4+C7,if(K3<K7,L3+C7,if(K2<K7,L2+C7,C7)))))
The values were in column C, the original id in column D, the id split columns in E through J, the level in column K, and my formula in column L. This formula can be copied down the table. For the first 4 rows of data, you need one fewer IF statement for each row you go up. The 5th row of data might take the above formula; it depends on how it deals with the column headers in row 1. The formula in the 4th row of data might be:
=if(K4<K5,L4+C5,if(K3<K5,L3+C5,if(K2<K5,L2+C5,if(K1<K5,L1+C5,C5))))
I'm still learning how to format these comments, so I'll try to provide a sample of the layout I have...
C D E F G H I J K L
1 value id 1 2 3 4 5 6 lvl cumul_sum
2 3.546300 11 1 1 2 3.546300
3 1.699300 111 1 1 1 3 5.245600
4 1.408200 1111 1 1 1 1 4 6.653800
5 1.370500 11111 1 1 1 1 1 5 8.024300
6 0.816310 111111 1 1 1 1 1 1 6 8.840610
7 0.918370 111112 1 1 1 1 1 2 6 8.942670
8 0.955700 11112 1 1 1 1 2 5 7.609500
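Here is a Python sketch of the same level-based idea, assuming the blank rows have already been deleted: sort the ids as text, take the level as the id's length, and add each value to the cumulative sum of the nearest row above with a lower level. `cumulative_by_level` is a hypothetical name. Like the spreadsheet formula, it does not detect a missing parent (it simply uses the nearest shallower row), and unlike the formula it is not limited to looking five rows up.

```python
def cumulative_by_level(rows):
    """rows: (value, id) pairs with blank rows removed.
    Returns (id, level, cumulative sum) tuples in text-sorted id order."""
    ordered = sorted(rows, key=lambda r: str(r[1]))  # "11", "111", "1111", ...
    result = []
    for value, row_id in ordered:
        level = len(str(row_id))  # same as the COUNTA over the split digits
        total = value  # fallback when no shallower row exists above
        # Walk back up the sorted list to the nearest row with a lower level
        for _, prev_level, prev_total in reversed(result):
            if prev_level < level:
                total = prev_total + value
                break
        result.append((str(row_id), level, total))
    return result


sample = [(3.5463, 11), (1.6993, 111), (1.4082, 1111), (1.3705, 11111),
          (0.81631, 111111), (0.91837, 111112), (0.9557, 11112)]
for row_id, level, total in cumulative_by_level(sample):
    print(row_id, level, round(total, 6))
```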
As a native worksheet array formula¹ in D1,
=IF(LEN(B1), SUM(SUMIFS(A$1:INDEX(A:A, MATCH(1E+99, A:A)),
B$1:INDEX(B:B, MATCH(1E+99, A:A)), LEFT(B1, ROW(INDIRECT("2:"&LEN(B1)))))), TEXT(,))
The above does not compensate for missing parents (it never returns a null string); it totals every ancestor it can find and uses zero for any missing parent.
As a VBA UDF² in E1,
Function conditionalCumulativeSum(nums As Range, _
    ids As Range, sib As Range, _
    Optional nullOnBlank As Boolean = True)
    Dim i As Long
    'truncate any full column reference to the UsedRange
    Set nums = Intersect(nums, nums.Parent.UsedRange)
    'align the ids range with exactly the same rows as nums
    Set ids = Intersect(ids.EntireColumn, nums.EntireRow)
    'walk from the full id back to its 2-digit root (e.g. 11222, 1122, 112, 11)
    For i = Len(sib.Value2) To 2 Step -1
        'return a null string as soon as an ancestor id is missing
        If nullOnBlank And IsError(Application.Match(CDbl(Left(sib.Value2, i)), ids, 0)) Then
            conditionalCumulativeSum = vbNullString
            Exit For
        End If
        'add the value(s) stored against this ancestor id
        conditionalCumulativeSum = conditionalCumulativeSum + _
            Application.SumIfs(nums, ids, Left(sib.Value2, i))
    Next i
    'a blank id never enters the loop, so i stays 0: return a null string
    If i = 0 Then conditionalCumulativeSum = vbNullString
End Function
By default, the above returns a null string when it encounters a missing parent anywhere in the hereditary chain. This can be turned off by passing FALSE as the optional fourth parameter, in which case the UDF behaves identically to the native formula.
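The UDF's logic (walk from the full id back to its two-digit root, summing each ancestor, and bail out on a missing one) can be sketched in Python as follows. This is an illustration, not a port of the VBA: `conditional_cumulative_sums` is a hypothetical name, it assumes ids are unique, and `null_on_missing` plays the role of the UDF's nullOnBlank flag, with None standing in for the null string.

```python
def conditional_cumulative_sums(rows, null_on_missing=True):
    """rows: (value, id) pairs with blank rows removed; ids assumed unique.
    Returns {id: cumulative sum, or None when an ancestor is missing}."""
    values = {str(row_id): value for value, row_id in rows}
    result = {}
    for row_id in values:
        total = 0.0
        # The id itself, then its parent, grandparent, ... down to the 2-digit root
        for length in range(len(row_id), 1, -1):
            ancestor = row_id[:length]
            if ancestor not in values:
                if null_on_missing:
                    total = None
                    break
                continue  # otherwise treat the missing ancestor as zero
            total += values[ancestor]
        result[row_id] = total
    return result


sample = [(3.5463, 11), (0.92521, 112), (2.4518, 1122), (0.17075, 11222),
          (1.6993, 111), (2.0698, 1112), (0.47411, 111221)]
sums = conditional_cumulative_sums(sample)
print(round(sums["11222"], 5), sums["111221"])
```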
(Screenshot: results from the sample data)
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (e.g. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges more closely representing the extents of your actual data. Array formulas chew up calculation cycles quickly as the referenced ranges grow, so it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).

Mark accumulated values on a QlikView column if condition is fulfilled

I have a table in Qlikview with 2 columns:
A B
a 10
b 45
c 30
d 15
Based on this table, I have a formula with full accumulation defined as:
SUM(B)/SUM(TOTAL B)
As a result,
A B D
b 45 45/100=0.45
c 30 75/100=0.75
d 15 90/100=0.90
a 10 100/100=1
My question is: how do I colour the values in column A that have a column D value <= 0.8?
The challenge is that D is defined with full accumulation, but if I reference D in a formula, it doesn't consider the full accumulation!
I tried defining a formula E=if(D>0.8,'Y','N'), but unfortunately it doesn't use the visible (accumulated) value of D; it uses D with no accumulation. If this had worked, I would have hidden (not disabled) E and referenced it from the Text Colour option of the table's dimension column. Any ideas? Thanks
You can't get an expression column's value from within a dimension or its properties, because the expression columns rely on the dimensions provided; it would create an endless loop. Your options are:
Apply your background colour to the expression columns, not the dimensions. This would actually make more sense as the accumulated values would have the colour, not the dimension.
When loading this specific table, have QlikView create a new column that contains the accumulated values of B. This would mean, however, that the order of your chart-table would need to be fixed for the accumulations to make any sense.
Use aggregation to create a temporary table and accumulate the values using RangeSum(). Note that this will only accumulate properly if the table is sorted in ascending order of column A:
=IF(Aggr(RangeSum(Above(Sum(B),0,10)),A)/100>0.8,
rgb(0,0,0),
rgb(255,0,0)
)
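For comparison, the full-accumulation share and the <= 0.8 flag can be computed outside QlikView. A Python sketch, assuming the chart is sorted by B descending as in the question's result table; `accumulated_shares` is a hypothetical name, and inside a chart you would still need an approach like the Aggr/RangeSum expression above.

```python
def accumulated_shares(rows):
    """rows: (A, B) pairs. Sort by B descending, as in the question's chart,
    and return (A, B, running share of the grand total) tuples."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    grand_total = sum(b for _, b in ordered)  # like SUM(TOTAL B)
    running = 0
    out = []
    for a, b in ordered:
        running += b  # full accumulation
        out.append((a, b, running / grand_total))
    return out


table = [("a", 10), ("b", 45), ("c", 30), ("d", 15)]
for a, b, share in accumulated_shares(table):
    flagged = share <= 0.8  # rows the asker wants coloured
    print(a, b, round(share, 2), flagged)
```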