I'm trying to set multiple columns with the apply method from an array (instead of having 3 different lines as the declaration). I would like to have 3 columns set from the dataframe apply method by different args from an array.
declaring in separate lines works, but not very clean.
days=np.array([30,45,60])
def move(row,days):
return row.X / 100 * np.sqrt(days/365)
### I am trying to clean this up -- there's got to be a simpler way!!
#df['Move30'] = df.apply(move,args=(days[0], ),axis=1)
#df['Move45'] = df.apply(move,args=(days[1], ),axis=1)
#df['Move60'] = df.apply(move,args=(days[2], ),axis=1)
### This succeeds but not any cleaner
df['Move30'], df['Move45'], df['Move60'] = df.apply(move,args=(days[0], ),axis=1), df.apply(move,args=(days[1], ),axis=1), df.apply(move,args=(days[2], ),axis=1)
### Is there some way to create...?
df['Move30'], df['Move45'], df['Move60'] = df.apply(move,args=([days[0],days[1],days[2]], ),axis=1)
You can write this as a for loop:
for d in days:
df[f'Move{d}'] = df.apply(move,args=(d, ),axis=1)
In python 2 you'd have to use 'Move' + str(d) instead of f'Move{d}'.
However, I suspect you'd be better off vectorizing this...
Related
I have a dataframe which has a column called hexa which has hex values like this. They are of dtype object.
hexa
0 00802259AA8D6204
1 00802259AA7F4504
2 00802259AA8D5A04
I would like to remove the first and last bits and reverse the values bitwise as follows:
hexa-rev
0 628DAA592280
1 457FAA592280
2 5A8DAA592280
Please help
I'll show you the complete solution up here and then explain its parts below:
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
return ''.join(reversed_bits)
df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
There are possibly a couple ways of doing it, but this way should solve your problem. The general strategy will be defining a function and then using the apply() method to apply it to all values in the column. It should look something like this:
df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
Now we need to define the function we're going to apply to it. Breaking it down into its parts, we strip the first and last bit by indexing. Because of how negative indexes work, this will eliminate the first and last bit, regardless of the size. Your result is a list of characters that we will join together after processing.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
The second line iterates through the list of characters, matches the first and second character of each bit together, and then concatenates them into a single string representing the bit.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
The second to last line returns the list you just made in reverse order. Lastly, the function returns a single string of bits.
def reverse_bits(bits):
trimmed_bits = bits[2:-2]
list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
return ''.join(reversed_bits)
I explained it in reverse order, but you want to define this function that you want applied to your column, and then use the apply() function to make it happen.
I am new to Snakemake and have a problem in Snakemake expand function.
First, I need to have a group of combinations and use them as base to expand another vector upon them with pair-wise elements combinations of it.
Lets say the set for the pairwise combination is
setC=["A","B","C","D"]
I get the partial group as follows:
part_group1 = expand("TEMPDIR/{setA}_{setB}_", setA = config["setA"], setB = config["setB"]
Then, (if that is OK), I used this partial group, to expand another set with its pairwise combinations. But I am not sure how to expand pairwise combinations of setC as seen below. It is obviously not correct; just written to clarify the question. Also, how to input the name of the expanded estimator from shell?
rule get_performance:
input:
xdata1 = TEMPDIR + part_group1 +"{setC}.rda"
xdata2 = TEMPDIR + part_group1 +"{setC}.rda"
estimator1= {estimator}
output:
results = TEMPDIR + "result_" + part_group1 +{estimator}_{setC}_{setC}.txt"
params:
Rfile = FunctionDIR + "function.{estimator}.R"
shell:
"Rscript {params.Rfile} {input.xdata1} {input.xdata12} {input.estimator1} "
"{output.results}"
The expand function will return a list of the product of the variables used. For example, if
setA=["A","B"]
setB=["C","D"]
then
expand("TEMPDIR/{setA}_{setB}_", setA = config["setA"], setB = config["setB"]
will give you:
["TEMPDIR/A_C_","TEMPDIR/A_D_","TEMPDIR/B_C_","TEMPDIR/B_D_"]
Your question is not very clear on what you want to achieve but I'll have a guess.
If you want to make pairwise combinations of setC:
import itertools
combiC=list(itertools.combinations(setC, 2))
combiList=list()
for c in combiC:
combiList.append(c[0]+"_"+c[1])
the you (probably) want the files:
rule all:
input: expand(TEMPDIR + "/result_{A}_{B}_estim{estimator}_combi{C}.txt",A=setA, B=setB, estimator=estimators, C=combiList)
I'm putting some words like "estim" and "combi" not to confuse the wildcards here. I do not know what the list or set "estimators" is supposed to be but I suppose you have declared it above.
Then your rule get_performance:
rule get_performance:
input:
xdata1 = TEMPDIR + "/{A}_{B}_{firstC}.rda",
xdata2 = TEMPDIR + "/{A}_{B}_{secondC}.rda"
output:
TEMPDIR + "/result_{A}_{B}_estim{estimator}_combi{firstC}_{secondC}.txt"
params:
Rfile = FunctionDIR + "/function.{estimator}.R"
shell:
"Rscript {params.Rfile} {input.xdata1} {input.xdata2} {input.estimator} {output.results}"
Again, this is a guess since you haven't defined all the necessary items.
I am trying to modify a method in an existing module to adapt functionality.
What does + operator do in this line?
for line in payment.move_line_ids + expense_sheet.account_move_id.line_ids:
Hello M.E.,
Solution
operator of use is concatenation/combine of two List/String/Tupple.
Example
Plus(+) Operator use with two List
a = [1,2,3]
b = [4,5]
print a + b
output = [1,2,3,4,5]
+ operator use with two String
a = "Vora"
b = " mayur"
print a + b
output = "vora mayur"
+ operator use with two tupple
a = (1,2,3)
b = (4,5)
print a + b
output = (1,2,3,4,5)
It concatenates account.move.line records from payment.move_line_ids and expense_sheet.account_move_id.line_ids into a single recordset, which is then iterated over. Please note that the result of the __add__ (+) operation might contain duplicates if the same account.move.line is present in both operands. If you want to avoid duplicates, use the | (OR) operator.
Is is possible in SPSS to store a value in a variable (not a variable created in a data set)?
For example I have a loop for which I want to pass the value 4 to all the locations in the loop that say NumLvl.
NumLvl = 4.
VECTOR A1L(NumLvl-1).
LOOP #i = 1 to NumLvl-1.
COMPUTE A1L(#i) = 0.
IF(att1 = #i) A1L(#i) = 1.
IF(att1 = NumLvl) A1L(#i) = -1.
END LOOP.
EXECUTE.
You can do this using DEFINE / !ENDDEFINE SPSSs Macro Facility, for example:
DEFINE !MyVar () 4 !ENDDEFINE.
You can then use !MyVar as a substitute for 4 wherever in your syntax you wish.
See DEFINE / !ENDDEFINE documentation for further notes.
I want to load data from two different Excel files, and use them in the same table in QlikView.
I have two files (DIVISION.xls and REGION.xls), and I'm using the following code:
let tt = 'DIVISION$' and 'REGION$';
FOR Each db_schema in 'DIVISION.xls','REGION.xls'
FOR Each v_db in $(tt)
div_reg_table:
LOAD *
FROM $(db_schema)
(biff, embedded labels, table is $(v_db));
NEXT
NEXT
This code works fine, and does not show any error, but I don't get any data, and I don't see my new table (div_reg_table).
Can you help me?
The main reason that your code does not load any data is due to the form of the tt variable.
When your inner loop executes, it evaluates tt (denoted by $(tt)), which then results in the evaluation of:
'DIVISION$' AND 'REGION$'
Which results in null, since these are just two strings.
If you change your statement slightly, from LET to SET and remove the AND, then the inner loop will work. For example:
SET tt = 'DIVISION$', 'REGION$';
However, this also now means that your inner loop will be executed for each value in tt for both workbooks. This means that unless you have a REGION sheet in your DIVISION workbook, then the load will fail.
To avoid this, you may wish to restructure your script slightly so that you have a control table which details the files and tables to load. An example I prepared is shown below:
FileList:
LOAD * INLINE [
FileName, TableName
REGION.XLS, REGION$
DIVISION.XLS, DIVISION$
];
FOR i = 0 TO NoOfRows('FileList') - 1
LET FileName = peek('FileName', i, 'FileList');
LET TableName = peek('TableName', i, 'FileList');
div_reg_table:
LOAD
*
FROM '$(FileName)'
(biff, embedded labels, table is '$(TableName)');
NEXT
If you have just two files DIVISION.XLS and REGION.XLS then it may be worth just using two separate loads one after the other, and removing the for loop entirely. For example:
div_reg_table:
LOAD
*
FROM [DIVISION.XLS]
(biff, embedded labels, table is DIVISION$)
WHERE DivisionName = 'AA';
LOAD
*
FROM [REGION.XLS]
(biff, embedded labels, table is REGION$)
WHERE RegionName = 'BB';
Don't you need to make the noofrows('Filelist') test be 1 less than the answer.
noofrows('filelist') will evaluate to 2 so you will get a loop step for 0,1 and 2. So 2 will return a null which will cause it to fail.
I did it like this:
FileList:
LOAD * INLINE [
FileName, TableName
REGION.XLS, REGION
DIVISION.XLS, DIVISION
];
let vNo= NoOfRows('FileList')-1;
FOR i = 0 TO $(vNo)
LET FileName = peek('FileName', i, 'FileList');
LET TableName = peek('TableName', i, 'FileList');
div_reg_table:
LOAD
*,
$(i) as I,
FileBaseName() as SOURCE
FROM '$(FileName)'
(biff, embedded labels, table is '$(TableName)');
NEXT