Build a column by applying a formula to an existing column (like in Excel) - pandas

I am new to the community. Hopefully this was not answered already.
I am trying to add a column to a DataFrame whose values come from a formula applied to the existing columns. For example, I want to build a series of stock returns from a column of stock closing prices.
I know how to build a column by applying the same operation to every element of another column, but not how to write a formula that combines individual elements of a column (such as a row and the row before it) to create a new one.
Thanks for your help.
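A minimal sketch of one way to do this in pandas, assuming the closing prices live in a column named "close" (a hypothetical name, as are the sample prices). `shift(1)` lines each row up with the previous one, which plays the role of an Excel formula like `=B3/B2-1` filled down the column; `pct_change()` should give the same result in a single call.

```python
import pandas as pd

# Hypothetical closing prices; the column name "close" is an assumption.
df = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0]})

# Row-by-row formula: current close divided by the previous close, minus 1.
# shift(1) moves every value down one row, so row i sees the close of row i-1.
df["return"] = df["close"] / df["close"].shift(1) - 1

# Built-in shortcut that computes the same simple return.
df["return_pct_change"] = df["close"].pct_change()

print(df)
```

The first row has no previous close, so its return is NaN in both columns.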

Related

Reading xlsx into R and creating new header

I'm a newbie to R.
I read in an Excel file for a survey, starting from the 3rd row, because the survey download puts the question strings in the first two rows: the first row holds every question, and the second row holds the multiple-choice options (each option gets its own column, except the first option, which sits in the second row of the same column as its question in the first row).
So now, my dataframe starts with Row 3.
But now I need to create custom variable names, i.e. new variable names for each column, before I manipulate the data further. I'm looking for tips on how to best accomplish this.
What I am thinking:
Create an Excel file with the variable names, and then use this as the header. I'm not quite sure which code I would use to do this.
Code the names as an empty dataframe, and then somehow merge this so the empty dataframe column names are the column names for the file I imported.
I would appreciate some suggestions on how best to do this!

How to add a column that shows the difference between two existing columns? GREL in OpenRefine

So I'm trying to find a simple way to create a new column that displays the difference between two existing columns (each with numbers)... I can't seem to find the proper GREL expression....
So I'm trying to find the amount of items sold with a column named "stock_before" and the other named "stock_after".
I click on edit column from the column "stock_before" and then add column based on this column.
The GREL expression I have entered is:
value-cells["Stock_after"]
It returns no syntax error but still all of the cells for preview say "null"... I have transformed the value of the columns to numbers.
For Python I have tried:
substract(value,"Stock_after")
Same no syntax error but still everything null.
This seems so ridiculously simple but I couldn't find an answer... You can guess I'm fairly new to all this :) Hope someone out there can help me!
Thanks for having the patience to read this, and thanks for your time if you answer!
I'd like something similar to this (3 columns):
Stock_before, Stock_after, dif
1,1,0
3,1,2
4,4,0
2,1,1
In GREL, the expression cells["Stock_after"] returns a Cell object representing the corresponding cell, not the actual value of that cell. To get the value, you need to use cells["Stock_after"].value.
So your final GREL expression should be value - cells["Stock_after"].value.
You should also make sure your values are stored as numerals, not strings: they should appear in green in the table. If they do not, use a "To number" operation on both columns first.
You can find out more about GREL and Cell objects here:
https://github.com/OpenRefine/OpenRefine/wiki/Variables
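For comparison, here is a rough sketch of the same calculation outside OpenRefine, using pandas on the sample data from the question. It is only an illustration, not part of the GREL answer. The `pd.to_numeric` step mirrors the "To number" advice above, assuming the values might have been read in as strings.

```python
import pandas as pd

# Sample data from the question, deliberately stored as strings.
df = pd.DataFrame({
    "Stock_before": ["1", "3", "4", "2"],
    "Stock_after": ["1", "1", "4", "1"],
})

# Convert both columns to numbers first (the pandas analogue of "To number").
df["Stock_before"] = pd.to_numeric(df["Stock_before"])
df["Stock_after"] = pd.to_numeric(df["Stock_after"])

# Element-wise difference between the two columns.
df["dif"] = df["Stock_before"] - df["Stock_after"]

print(df["dif"].tolist())  # [0, 2, 0, 1]
```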

Searching for value in a linked table in power pivot

I have a PowerPivot table that has a column of IDs and a linked table that contains a set of specific IDs that I want to use to create an indicator variable which I can use to sort on in existing tables and charts. Essentially I want:
If the value in column EpisodeID is found anywhere in LostEpisodes[LostID], then return the value "1", otherwise "0".
LostEpisodes is the linked table and LostID is the column that contains the subset of IDs I want to be able to sort on.
I have tried using =IF(VALUES(LostEpisodes[LostID])=[EpisodeID],1,0) but got an error. Is my syntax wrong or should I be using a different approach? Seems simple enough, but I am new to PowerPivot and DAX.
Thanks
OK - So I have found an answer that works and wanted to share. Others may have more elegant solutions, but this worked. This is where I miss MATCH.
I have a linked table called LostEpisodes which contains 2 columns, EpisodeID and Lost (all contain the value of 1 as they are all lost episodes). For my purposes I am manually entering the episode IDs as there are only a few. EpisodeID is also in the main table and is the column I am matching on.
I started with one new column labeled LostLookup with the following formula:
=LOOKUPVALUE(LostEpisodes[Lost],LostEpisodes[EpisodeID],[EpisodeID])
I then created a new column with the following formula:
=if(ISBLANK([LostLookup]),"NotLost","Lost")
This creates the indicator variable I can now use in pivot tables and charts. I have tested it and it works great.
Hope this makes sense!
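As a point of comparison only, the same "is this ID in the lookup table?" indicator can be sketched in pandas. The table and column names follow the post; the ID values and the `LostFlag` column name are made up for illustration.

```python
import pandas as pd

# Main table and the small linked table of lost episodes (IDs are hypothetical).
episodes = pd.DataFrame({"EpisodeID": [101, 102, 103, 104]})
lost_episodes = pd.DataFrame({"EpisodeID": [102, 104], "Lost": [1, 1]})

# isin() checks each EpisodeID against the lookup column, much like MATCH.
is_lost = episodes["EpisodeID"].isin(lost_episodes["EpisodeID"])

# Turn the boolean result into the same labels used in the DAX formula.
episodes["LostFlag"] = is_lost.map({True: "Lost", False: "NotLost"})

print(episodes)
```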

Query several ranges and add automatically a column to know the source of each row

I am trying to achieve the following in Google Spreadsheets.
First, I want to query several ranges (in different sheets of the same spreadsheet). I tried a formula like =query(arrayformula({indirect(E2:E10)}),"select * where Col1 <>''"), with no success.
In E2:E10 I have a list of ranges. Column F contains a name that describes the source of the value in Column E.
My second problem is that I need to add a column to the output of that query that tells me the origin of each row.
If the sources are ranges of 3 columns by country I need to merge those tables and add that country to each row.
All credits to +Ben Liebrand who helped me out here: https://support.google.com/docs/profile/3464
"I just want to start off by saying that the indirect() function does not work in an arrayformula() function as expected. So you will need to take another approach. I can understand what you are trying to do so I added another TAB in your spreadsheet to demonstrate another approach. I know it was initially a specific design you were trying so I made some changes to what you had. Maybe you can take a look at what I have offered and maybe you can tweak your design.
I know what I am offering is just very rough but you will also notice that I removed the end row specifier from your ranges in the range table.
Don't assume my example to be the final result but I was just trying to show that the range you were trying to use with the indirect() function will not work.
So hopefully this will give you a new idea of how you can maybe handle this.
My formula also adds the country to each of the tables in the output. My formula looks like this
=query(ArrayFormula({
if(len(indirect(regexextract(F2,"\w+\!\w+")&":A")),G2,),indirect(F2);
if(len(indirect(regexextract(F3,"\w+\!\w+")&":A")),G3,),indirect(F3);
if(len(indirect(regexextract(F4,"\w+\!\w+")&":A")),G4,),indirect(F4);
if(len(indirect(regexextract(F5,"\w+\!\w+")&":A")),G5,),indirect(F5);
if(len(indirect(regexextract(F6,"\w+\!\w+")&":A")),G6,),indirect(F6);
if(len(indirect(regexextract(F7,"\w+\!\w+")&":A")),G7,),indirect(F7)
})," select * where Col1 <> '' ")
Hope this is of some help to you"
And I hope this is useful to the community.
Gerónimo
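The idea behind the formula above (stack several tables, tag each row with its source, then drop rows whose first column is empty) can be sketched in pandas as a comparison. The country names, column names, and values here are invented for illustration and are not from the spreadsheet.

```python
import pandas as pd

# Hypothetical 3-column tables, keyed by the country each one comes from.
tables = {
    "Spain": pd.DataFrame({"a": ["x1", "x2"], "b": [1, 2], "c": [3, 4]}),
    "France": pd.DataFrame({"a": ["y1", ""], "b": [5, 6], "c": [7, 8]}),
}

# Add a "source" column to each table before stacking them.
labelled = [frame.assign(source=name) for name, frame in tables.items()]
combined = pd.concat(labelled, ignore_index=True)

# Equivalent of "select * where Col1 <> ''": keep rows with a non-empty first column.
combined = combined[combined["a"] != ""].reset_index(drop=True)

print(combined)
```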

How to Stack a range of values (from multiple tables in another sheet) into a single column

I'm working on a quarterly report that Auto-generates all fields.
I could really use some help building a formula that pulls values from the first column ([T6-TOC]) of three separate tables (ROVH_Jan, ROVH_Feb, ROVH_MAR) existing in another worksheet (RVH 1825). I need the three ranges of values to stack in a single column, but I do not want to eliminate duplicate values.
I've tried using the =INDEX formula and VBA, but I can't get the syntax right.
Any suggestions?
These are sources I've viewed but didn't solve my problem.
https://superuser.com/questions/445410/pull-row-of-data-from-one-place-in-spreadsheet-to-another
http://forum.chandoo.org/threads/merge-stack-multiple-named-ranges-across-multiple-worksheets-in-a-master-sheet.11074/
Excel - Combine multiple columns into one column
http://www.mrexcel.com/forum/excel-questions/610527-how-do-i-stack-data-multiple-columns-into-one-column.html
Something like this should work for you:
=IF(ROW(A1)<=ROWS(ROVH_Jan),INDEX(ROVH_Jan[T6-TOC],ROW(A1)),IF(ROW(A1)<=ROWS(ROVH_Jan)+ROWS(ROVH_Feb),INDEX(ROVH_Feb[T6-TOC],ROW(A1)-ROWS(ROVH_Jan)),IF(ROW(A1)<=ROWS(ROVH_Jan)+ROWS(ROVH_Feb)+ROWS(ROVH_MAR),INDEX(ROVH_MAR[T6-TOC],ROW(A1)-ROWS(ROVH_Jan)-ROWS(ROVH_Feb)),"")))
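The formula walks through the three tables in order: while the row number fits in ROVH_Jan it reads from there, then it subtracts the earlier row counts to index into ROVH_Feb and ROVH_MAR. For readers who want to check the expected result outside Excel, here is a small pandas sketch of the same stacking. The table contents are made up, and duplicates are kept on purpose.

```python
import pandas as pd

# Hypothetical stand-ins for the three Excel tables; only the T6-TOC column matters.
rovh_jan = pd.DataFrame({"T6-TOC": [10, 20]})
rovh_feb = pd.DataFrame({"T6-TOC": [20, 30, 40]})
rovh_mar = pd.DataFrame({"T6-TOC": [10]})

# Stack the three columns end to end; ignore_index renumbers the rows 0..n-1.
stacked = pd.concat(
    [table["T6-TOC"] for table in (rovh_jan, rovh_feb, rovh_mar)],
    ignore_index=True,
)

print(stacked.tolist())  # [10, 20, 20, 30, 40, 10]
```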