Combine data from multiple rows of multiple tables into a single row and show multiple rows of data based on input - VBA

I have a problem building a data table that combines data from two other data tables, depending on what is entered in the input sheet.
These are my sheets:
sheet 1) Data table 1
sheet 2) Data table 2
sheet 3) Input sheet:
In this sheet, the user fills in the origin, destination, and month.
sheet 4) Output sheet:
Row(s) with characteristics that combine the data in data table 1 and data table 2, with one column for each characteristic in the row:
(General; Month; Origin; feature 1; feature 2; month max; month min; Transit point; feature 1; feature 2; feature 3; month max; month min; Destination; feature 1; feature 2; month max; month min;) => feature 3 of the origin and destination doesn't have to be included in the output!
Depending on the month, origin, and destination entered in the input sheet, the output has to list all possible rows (routes) with that origin and that destination, along with the temperatures in that month at the origin, transit point, and destination.
I have tried VLOOKUP(MATCH), but that only works for one row, not when I want to list all possible rows.
I don't think this problem is that difficult, but I am really a rookie in Excel. Maybe it could work with a simple macro.

I'm a little unclear about some of your question, but perhaps you could adapt this solution to work for you?
http://thinketg.com/how-to-return-multiple-match-values-in-excel-using-index-match-or-vlookup/

I think this is what you want.
ColA ColB
a 1
b 2
c 3
a 4
b 5
c 6
a 7
b 8
9
10
11
7
8
9
9
16
17
18
19
20
In cell E1, enter c (this is the value you are looking up).
In cell F1, enter the formula below and press Ctrl+Shift+Enter, then fill it down. Each row returns the next matching value from column B, and rows beyond the last match return a blank.
=IF(ROWS(B$1:B1)<=COUNTIF($A$1:$A$20,$E$1),INDEX($B$1:$B$20,SMALL(IF($A$1:$A$20=$E$1,ROW($A$1:$A$20)-ROW($E$1)+1),ROWS(B$1:B1))),"")
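In the filled-down formula, ROWS(B$1:B1) acts as a counter k, SMALL(IF(...), k) picks the k-th matching position, INDEX returns the column B value at that position, and the outer IF blanks out rows past the number of matches. A minimal Python sketch of the same logic, using the first eight rows of the sample data above (the function name nth_matches is hypothetical, for illustration only):

```python
def nth_matches(col_a, col_b, lookup, rows_to_fill):
    """Mimic the fill-down INDEX/SMALL/IF formula: output row k holds the k-th match, else ""."""
    # 1-based positions where column A equals the lookup value, in ascending order,
    # so hits[k - 1] is what SMALL(IF(...), k) returns
    hits = [i + 1 for i, v in enumerate(col_a) if v == lookup]
    out = []
    for k in range(1, rows_to_fill + 1):        # k plays the role of ROWS(B$1:Bk)
        if k <= len(hits):                      # ROWS(...) <= COUNTIF(...)
            out.append(col_b[hits[k - 1] - 1])  # INDEX($B$1:$B$20, SMALL(..., k))
        else:
            out.append("")                      # the "" branch of the outer IF
    return out

col_a = ["a", "b", "c", "a", "b", "c", "a", "b"]
col_b = [1, 2, 3, 4, 5, 6, 7, 8]
print(nth_matches(col_a, col_b, "c", 4))  # [3, 6, '', '']
```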


updating the next several row values based on the value of a row in another column

I'm trying to figure out how to add the values of one column (the amount column) to the next few rows based on the value in another column (the days column). If the days value is greater than 1, then for each day beyond the first I add the amount to one of the following rows. So if days is three, I add the amount to the next two rows (the first day is just the current row). I think this is easier if I make a copy of the amount column, so I made a copy called backlog.
So let's say I have an amount column that represents the number of support tickets that need to be resolved each day. Each amount has a number of days it takes to be resolved. I need the total to be the sum of today's amount and the outstanding tickets from earlier days. So if I have an amount of 1 for 2 days, I have 1 ticket today, and I add that same 1 to tomorrow's ticket amount. If this doesn't make sense, the example below should help. I have a solution as well, but my main issue is doing this efficiently.
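Here is a sample dataframe to use:
import random
import numpy as np
import pandas as pd
# 10 zero-amount rows plus 15 rows with an amount of 1-3, in random order
amount = list(np.zeros(10)) + [random.randint(1, 3) for val in range(15)]
random.shuffle(amount)
ex = pd.DataFrame({
    'Amount': amount
})
# non-zero amounts get a random duration of 0-4 days; zero amounts get 0
ex.loc[ex['Amount'] > 0, 'Days'] = [random.randint(0, 4) for val in range(15)]
ex.loc[ex['Amount'] == 0, 'Days'] = 0
ex['Days'] = ex['Days'].astype(int)
ex['Backlog'] = ex['Amount']
ex.head(10)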
Input Dataframe:
Amount  Days  Backlog
2       0     2
1       3     1
2       2     2
3       0     3
Desired Output Dataframe:
Amount  Days  Backlog
2       0     2
1       3     1
2       2     3
3       0     6
In the last two values of the backlog column, I have a value of 3 (2 from the current day amount plus 1 from the prior day amount) and a value of 6 (3 for the current day + 2 from the previous day amount + 1 from two days ago).
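I have written the code below, which I think achieves this:
backlog_col = ex.columns.get_loc('Backlog')
for i in range(len(ex)):
    days = ex['Days'].iloc[i]
    if days >= 2:
        # add today's amount to each of the following (days - 1) rows
        for j in range(1, days):
            if i + j >= len(ex):
                break  # don't run past the end of the frame
            # positional assignment on ex itself; chained indexing such as
            # ex['Backlog'].iloc[i + j] += ... may not write back to ex
            ex.iloc[i + j, backlog_col] += ex['Amount'].iloc[i]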
The problem is that I'm already using two for loops to slice the data frame for two features first, so when this code is used as a function for a very large data frame it runs far too slowly, and my main goal has been to implement a faster way to do this. Is there a more efficient pandas method to achieve the same outcome? Possibly without having to use slow iteration or a nested for loop? I'm at a loss.
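One vectorized option (not from the original thread) is a difference array: put +Amount on the row where each carry starts and -Amount on the row just after it ends, then take a cumulative sum to spread every amount over its window. A minimal NumPy/pandas sketch, assuming the rows you pass in are already in day order; the function name backlog_vectorized is hypothetical:

```python
import numpy as np
import pandas as pd

def backlog_vectorized(amount, days):
    """Backlog = today's amount + amounts still outstanding from earlier rows.

    A row with amount a and d >= 2 days adds a to the next d - 1 rows
    (stopping at the end of the data), matching the loop in the question.
    """
    amount = np.asarray(amount, dtype=float)
    days = np.asarray(days, dtype=int)
    n = len(amount)
    rows = np.flatnonzero(days >= 2)          # rows whose amount carries forward
    start = rows + 1                          # first row that receives the carry
    stop = np.minimum(rows + days[rows], n)   # one past the last receiving row
    # difference array: +a where a carry begins, -a just after it ends
    diff = np.zeros(n + 1)
    np.add.at(diff, start, amount[rows])
    np.add.at(diff, stop, -amount[rows])
    # running total of the difference array = amount carried into each row
    carry = np.cumsum(diff[:n])
    return amount + carry

example = pd.DataFrame({"Amount": [2, 1, 2, 3], "Days": [0, 3, 2, 0]})
example["Backlog"] = backlog_vectorized(example["Amount"], example["Days"])
print(example["Backlog"].tolist())  # [2.0, 1.0, 3.0, 6.0]
```

Because it uses only array operations, it could be called on each slice produced by the two grouping features in place of the nested Python loop.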

Calculating Weekly Returns from Daily Time Series of Prices

I want to calculate weekly returns of a mutual fund from a time series of daily prices. My data looks like this:
A B C D E
DATE WEEK W.DAY MF.PRICE WEEKLY RETURN
02/01/12 1 1 2,7587
03/01/12 1 2 2,7667
04/01/12 1 3 2,7892
05/01/12 1 4 2,7666
06/01/12 1 5 2,7391 -0,007
09/01/12 2 1 2,7288
10/01/12 2 2 2,6707
11/01/12 2 3 2,7044
12/01/12 2 4 2,7183
13/01/12 2 5 2,7619 0,012
16/01/12 3 1 2,7470
17/01/12 3 2 2,7878
18/01/12 3 3 2,8156
19/01/12 3 4 2,8310
20/01/12 3 5 2,8760 0,047
The date is in dd/mm/yy format and "," is the decimal separator. The weekly return is (price on the last weekday - price on the first weekday) / (price on the first weekday). For example, the return for the first week is (2,7391 - 2,7587)/2,7587 = -0,007 and for the second it is (2,7619 - 2,7288)/2,7288 = 0,012.
The problem is that the list covers a whole year, and some weeks have fewer than five working days due to holidays or other reasons, so I can't simply copy and paste the formula above. I added the two extra columns for week number and weekday using the WEEKNUM and WEEKDAY functions, thinking they might help. I want to automate this with a formula or with VBA, and I'm hoping to get a table like this:
WEEK RETURN
1 -0,007
2 0,012
3 0,047
.
.
.
As I said, some weeks have fewer than five weekdays; some start on weekday 2 or end on weekday 3, etc., due to holidays or other reasons. So I'm looking for a way to tell Excel to "find the prices that correspond to the max and min weekday of each week and apply the formula (Price for last weekday - Price for first weekday)/(Price for first weekday)".
Sorry for the long post; I tried to be as clear as possible, and I would appreciate any help! (I have 5 separate worksheets for consecutive years, each with daily prices of 20 mutual funds.)
To do it in one formula, with the week number in G2:
=(INDEX(D:D,AGGREGATE(15,6,ROW($D$2:$D$16)/(($C$2:$C$16=AGGREGATE(14,6,$C$2:$C$16/($B$2:$B$16=G2),1))*($B$2:$B$16=G2)),1))-INDEX(D:D,MATCH(G2,B:B,0)))/INDEX(D:D,MATCH(G2,B:B,0))
You may need to change all the , to ; depending on your regional settings.
I would solve it using some lookup formulas to get the values for each week and then do a simple calculation for each week.
Resulting table:
H     I           J           K          L         M
week  first       last        first val  last val  return
1     02.01.2012  06.01.2012  2,7587     2,7391    -0,007
2     09.01.2012  13.01.2012  2,7288     2,7619    0,012
3     16.01.2012  20.01.2012  2,747      2,876     0,047
Formula in column I:
=MINIFS($A:$A;$B:$B;$H2)
Formula in column J:
=MAXIFS($A:$A;$B:$B;$H2)
Formula in column K:
=VLOOKUP($I2;$A:$D;4;FALSE)
Formula in column L:
=VLOOKUP($J2;$A:$D;4;FALSE)
Formula in column M:
=(L2-K2)/K2
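The same first-day/last-day logic can be expressed outside Excel as a group-by over the week number. A minimal pandas sketch using the sample prices from the question; the column names "date", "week" and "price" and the function name weekly_returns are assumptions for illustration:

```python
import pandas as pd

def weekly_returns(df):
    """Return (last price - first price) / first price for each week number.

    Sorting by date and taking the first and last row of each week means
    short weeks (holidays) need no special handling.
    """
    ordered = df.sort_values("date")
    price = ordered.groupby("week")["price"]
    first, last = price.first(), price.last()
    return (last - first) / first

dates = ["02/01/12", "03/01/12", "04/01/12", "05/01/12", "06/01/12",
         "09/01/12", "10/01/12", "11/01/12", "12/01/12", "13/01/12",
         "16/01/12", "17/01/12", "18/01/12", "19/01/12", "20/01/12"]
prices = [2.7587, 2.7667, 2.7892, 2.7666, 2.7391,
          2.7288, 2.6707, 2.7044, 2.7183, 2.7619,
          2.7470, 2.7878, 2.8156, 2.8310, 2.8760]
df = pd.DataFrame({
    "date": pd.to_datetime(dates, format="%d/%m/%y"),  # dd/mm/yy as in the question
    "week": [1] * 5 + [2] * 5 + [3] * 5,
    "price": prices,
})
print(weekly_returns(df).round(3))  # week 1: -0.007, week 2: 0.012, week 3: 0.047
```

Like the MINIFS/MAXIFS approach, this works from whichever days each week actually contains rather than assuming weekdays 1 and 5.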

Business Objects CountIf by cell reference

So I have a column with this data
1
1
1
2
3
4
5
5
5
How can I do a count where the value at any given location in the above table equals a cell I select? I.e., doing Count([NUMBER]) Where([NUMBER] = Coordinates(0,0)) would return 3, because there are 3 rows whose value equals the value in position 0 (which is 1).
It's basically like Excel, where you can do COUNTIF(A:A, 1) and it gives the total number of rows where the value in A:A is 1. Is this possible in Business Objects Web Intelligence?
Functions in WebI operate on rows, so you have to think about it a little differently.
If your intent is to create a cell outside of the report block and display the count of specific values, you can use Count() with Where():
=Count([NUMBER];All) Where ([NUMBER] = "1")
In a freestanding cell, the above will produce a value of "3" for your sample data.
If you want to put the result in the same block and have it count up the occurrences of values on that row, for example:
NUMBER  NUMBER Total
1       3
1       3
1       3
2       1
3       1
4       1
5       3
5       3
5       3
it gets a little more complicated. You have to have at least one other dimension in the query to reference. It can be anything, but you have to be counting something in conjunction with the NUMBER dimension. So, the following would work, assuming there's another dimension in the query named [Duh]:
=Count([NUMBER];All) ForAll([Duh])
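To make the two expected results explicit, here is the same counting expressed in Python rather than WebI syntax. This is only an illustration using the sample NUMBER values from the question; Counter is Python's standard library, not a Business Objects function:

```python
from collections import Counter

numbers = [1, 1, 1, 2, 3, 4, 5, 5, 5]
counts = Counter(numbers)                # value -> number of rows with that value

# freestanding cell: Count([NUMBER];All) Where ([NUMBER] = "1")
print(counts[1])                         # 3

# in-block "NUMBER Total" column: each row shows how often its own value occurs
number_total = [counts[n] for n in numbers]
print(number_total)                      # [3, 3, 3, 1, 1, 1, 3, 3, 3]
```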

How to find conditional cumulative sums in an excel table using VBA macro

Let's say I have two columns.
3.5463 11
4.5592 12
1.6993 111
0.92521 112
1.7331 121
2.1407 122
1.4082 1111
2.0698 1112
2.3973 1121
2.4518 1122
1.1719 1211
1.153 1212
0.67139 1221
0.64744 1222
1.3705 11111
0.9557 11112
0.64868 11121
0.7325 11211
0.58874 11212
0.86673 11221
0.17075 11222
0.64026 12111
0.80229 12112
0.43422 12122
1.0405 12211
0.63376 12212
0.56491 12221
0.34626 12222
0.81631 111111
0.91837 111112
0.70013 111121
0.87384 111122
1.1474 111211
0.47411 111221
0.12249 111222
0.56728 112111
0.88169 112112
0.14509 112121
0.68655 112211
0.36274 112212
1.1652 121111
0.99314 121112
0.42024 121121
0.23937 121122
1.0346 122111
0.64642 122112
0.15632 122121
0.41725 122122
0.40793 122211
In the first column there is a number, and each of those numbers has an associated ID in the second column. There are also some blank rows that do not contain any numbers.
Define one of these numbers to be a "daughter" of another number if the ID of the first number is the same as the ID of the second, with an extra digit on the end. For example, both IDs 11211 and 11212 are daughters of 1121, because the ID of 1121 has an extra digit, either a 1 or a 2, added onto the end to form the ID of its daughters. Thus, 1121 is the parent of both 11211 and 11212.
Here is what I want the macro to do. It must output a third column which contains, for every row, a cumulative sum of the number in the first column of that row, plus the number of its parent, plus the number of the parent's parent, and so on, all the way up until it reaches either 11 or 12. It will begin by simply outputting the column 1 numbers for 11 and 12 in the third column. Then, in a loop beginning with 111, it will compute the cumulative sum for every row (the number in that row plus the third-column output of the parent), but only if that row has a number and an ID, and only if the parent exists and has an output in column 3. So for example, the number in the 3rd column of the row with ID 11222 should be the number in column 1 of that row, plus that of 1122, plus that of 112, plus that of 11. So, 0.17075+2.4518+0.92521+3.5463, or 7.09406. However, if you try to do this for ID 111221, you will notice that the row where the parent 11122 should be is empty. Thus, the parent does not exist, and no value will be output in column 3 for 111221.
I would greatly appreciate it if someone has some time on their hands to code up this VBA macro for me in exchange for an accepted solution.
Thanks
I don't think a macro is needed, just some formulas. First, I put headers on my columns of data, such as "value" and "id." If you then select the column labels (i.e., A and B) and sort by B ("id") and then A ("value"), your blank rows will be grouped together, and you can delete them. Now the data is almost ready. When I did this, I converted the id column to text rather than numbers, so that sorting the table by id gives the order "11, 111, 1111," and so on, instead of "11, 12, 111, 112, 121." Then I added columns that split the id into its individual characters, or levels; this helps with identifying parents and children. You can use Text to Columns or a MID formula, but what I did was add 6 more columns to the right. For each id row, each of these columns contains a "1," a "2," or a blank (null) value. Then I added another column called "level," using a formula like COUNTA across all the id-splitting columns. So for 11 the level is 2, 111 is 3, 11221 is 5, and so on. This gives the id level (parent, child, grandchild, etc.). Finally, I added a column on the right to compute the cumulative sum of the values. Conceptually it is one big nested IF statement, but in practice I needed two. The formula says: if the row above has a lower level number (i.e., it is some kind of parent), add the value of the current row to the cumulative sum of the row above. Otherwise, keep going up a row until a parent is found, and add the current row's value to that row's cumulative sum.
My final formula for all but the first 5 rows of data was (in the 6th row of data):
=if(K6<K7,L6+C7,if(K5<K7,L5+C7,if(K4<K7,L4+C7,if(K3<K7,L3+C7,if(K2<K7,L2+C7,C7)))))
The values were in column C, the original id in column D, the id-split columns in E through J, the level in column K, and my formula in L. This formula can be copied down the table. For the first 4 rows, you just need one less IF statement for each row you go up. The fifth row of data might take the above formula; it depends on how it deals with the column headers in row 1. The formula in the 4th row of data might be:
=if(K4<K5,L4+C5,if(K3<K5,L3+C5,if(K2<K5,L2+C5,if(K1<K5,L1+C5,C5))))
I'm still learning how to format these comments, so I'll try to provide a sample of the layout I have...
   C         D       E  F  G  H  I  J  K    L
1  value     id      1  2  3  4  5  6  lvl  cumul_sum
2  3.546300  11      1  1              2    3.546300
3  1.699300  111     1  1  1           3    5.245600
4  1.408200  1111    1  1  1  1        4    6.653800
5  1.370500  11111   1  1  1  1  1     5    8.024300
6  0.816310  111111  1  1  1  1  1  1  6    8.840610
7  0.918370  111112  1  1  1  1  1  2  6    8.942670
8  0.955700  11112   1  1  1  1  2     5    7.609500
As a native worksheet array formula¹ in D1,
=IF(LEN(B1), SUM(SUMIFS(A$1:INDEX(A:A, MATCH(1E+99, A:A)),
B$1:INDEX(B:B, MATCH(1E+99, A:A)), LEFT(B1, ROW(INDIRECT("2:"&LEN(B1)))))), TEXT(,))
The above does not compensate for missing parents (null string). It totals everything it can find and uses zero for missing parents.
As a VBA UDF² in E1,
Function conditionalCumulativeSum(nums As Range, _
                                  ids As Range, sib As Range, _
                                  Optional nullOnBlank As Boolean = True)
    Dim i As Integer
    'truncate any full column reference to the UsedRange
    Set nums = Intersect(nums, nums.Parent.UsedRange)
    'match the nums and ids ranges
    Set ids = ids.Resize(nums.Rows.Count, nums.Columns.Count)
    For i = Len(sib.Value2) To 2 Step -1
        If nullOnBlank And IsError(Application.Match(--Left(sib, i), ids, 0)) Then
            conditionalCumulativeSum = vbNullString
            Exit For
        End If
        conditionalCumulativeSum = conditionalCumulativeSum + _
            Application.SumIfs(nums, ids, Left(sib, i))
    Next i
    If i = 0 Then conditionalCumulativeSum = vbNullString
End Function
By default, the above returns a null string when it encounters a missing parent anywhere in the chain of ancestors. This can be turned off by passing FALSE as the optional fourth parameter, and then the UDF will behave identically to the native formula.
[Image: results from sample data]
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel will wrap the formula in braces (e.g. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges that more closely match the extents of your actual data. Array formulas can consume a lot of calculation cycles, so it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).
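The UDF's approach (walk the ID prefixes from the full ID down to two digits, add each one's value, and return a blank if any ancestor is missing) can be checked outside Excel with a short Python sketch. It assumes each ID appears at most once, stores the table as a dict keyed by ID string (a hypothetical layout, not the worksheet), and uses None in place of the UDF's null string; the function name is hypothetical:

```python
def conditional_cumulative_sum(values, sib, null_on_blank=True):
    """Add the value of ID `sib` and of each ancestor ID, down to the 2-digit root.

    values maps an ID string to its number; blank rows are simply left out.
    With null_on_blank=True a missing ID anywhere in the chain returns None
    (the UDF's null string); with False, missing IDs count as zero, like the
    native array formula.
    """
    if not sib:
        return None
    total = 0.0
    # prefixes of length len(sib), len(sib) - 1, ..., 2, like the UDF's For loop
    for i in range(len(sib), 1, -1):
        prefix = sib[:i]
        if prefix not in values:
            if null_on_blank:
                return None
            continue  # treat the missing ancestor as 0
        total += values[prefix]
    return total

# subset of the question's data, keyed by ID string
values = {"11": 3.5463, "111": 1.6993, "112": 0.92521, "1112": 2.0698,
          "1122": 2.4518, "11222": 0.17075, "111221": 0.47411}
print(round(conditional_cumulative_sum(values, "11222"), 5))   # 7.09406
print(conditional_cumulative_sum(values, "111221"))            # None (11122 is missing)
print(round(conditional_cumulative_sum(values, "111221", null_on_blank=False), 5))  # 7.78951
```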

Excel: one column has duplicates of each value, I need to take averages of the corresponding two values from the other columns

Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would somehow like to get the following result:
column A column B
A 1.5
B 2
C 1
(these are the averages of 1 and 2, of 2 and 2, and of 1 and 1)
How do I achieve that?
Thanks
If you're using Excel 2007 or later, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A$6,D1,$B$1:$B$6)
Less typing, and easier to read.
In D1:D3, type A, B, C. Then in E1, put this formula
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and paste-special-values over itself. Then delete A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.
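Outside Excel, the equivalent of the AVERAGEIF or Average pivot table approach is a group-by mean. A minimal pandas sketch using the example data, with the "Letter" and "Number" headers suggested above:

```python
import pandas as pd

df = pd.DataFrame({"Letter": ["A", "A", "B", "B", "C", "C"],
                   "Number": [1, 2, 2, 2, 1, 1]})

# one row per letter with the mean of its numbers
averages = df.groupby("Letter", as_index=False)["Number"].mean()
print(averages)  # A -> 1.5, B -> 2.0, C -> 1.0
```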