What does + operator do in this line? - odoo

I am trying to modify a method in an existing module to adapt its functionality.
What does the + operator do in this line?
for line in payment.move_line_ids + expense_sheet.account_move_id.line_ids:

Hello M.E.,
Solution
The + operator concatenates (combines) two lists, strings, or tuples.
Example
Plus (+) operator with two lists:
a = [1, 2, 3]
b = [4, 5]
print(a + b)
output: [1, 2, 3, 4, 5]
+ operator with two strings:
a = "Vora"
b = " mayur"
print(a + b)
output: "Vora mayur"
+ operator with two tuples:
a = (1, 2, 3)
b = (4, 5)
print(a + b)
output: (1, 2, 3, 4, 5)

It concatenates the account.move.line records from payment.move_line_ids and expense_sheet.account_move_id.line_ids into a single recordset, which is then iterated over. Note that the result of the __add__ (+) operation may contain duplicates if the same account.move.line is present in both operands. If you want to avoid duplicates, use the | (OR) operator instead.
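For illustration, a minimal sketch of the difference, assuming the two recordsets overlap (the records mentioned in the comments are hypothetical):
lines_a = payment.move_line_ids                   # say it holds lines 1, 2, 3
lines_b = expense_sheet.account_move_id.line_ids  # say it holds lines 3, 4

combined = lines_a + lines_b   # concatenation: line 3 appears twice
unique = lines_a | lines_b     # set union: each line appears only once

for line in combined:
    ...                        # line 3 would be processed twice here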

Related

In read_csv(), how to set multiple encodings?

In read_csv(), how do I set multiple encodings? For instance, when I use a and b below to parse the characters separately, the result is fine, but when I combine the characters from a and b, parsing fails, as in code c.
This works:
a <- readr::read_csv(I("type\nBlitzangebotsgebühr\nÜbertrag"),locale = locale(encoding='ISO-8859-1'))
This also works:
b <- readr::read_csv(I("type\n怳崬傒\n拲暥奜椏嬥"),locale = locale(encoding='Shift_JIS'))
This doesn't work:
c <- readr::read_csv(I("type\nBlitzangebotsgebühr\nÜbertrag\n怳崬傒\n拲暥奜椏嬥"),locale = locale(encoding='Shift_JIS'))

How to set multiple conditions for a Dataframe while modifying the values?

So, I'm looking for an efficient way to set values in an existing column and in a new column based on some conditions. If I have 10 conditions in a big data set, do I have to write 10 lines, or can I combine them somehow? I haven't figured it out yet.
Can you suggest something?
For example:
data_frame.loc[data_frame.col1 > 50 ,["col1","new_col"]] = "Cool"
data_frame.loc[data_frame.col2 < 100 ,["col1","new_col"]] = "Cool"
Can it be written in a single expression? "&" or "and" don't work...
Thanks!
Yes, you can do it. Here is an example:
data_frame.loc[(data_frame["col1"]>100) & (data_frame["col2"]<10000) | (data_frame["col3"]<500),"test"] = 0
Explanation:
The filter I used (combining "and" and "or" conditions) is: (data_frame["col1"]>100) & (data_frame["col2"]<10000) | (data_frame["col3"]<500)
The column that will be changed is "test" and the value assigned is 0.
You can try:
all_conditions = [condition_1, condition_2]
fill_with = [fill_condition_1_with, fill_condition_2_with]
df[["col1","new_col"]] = np.select(all_conditions, fill_with, default=default_value_here)

In SQL how do I group by every one of a long list of columns and get counts, assembled all into one table?

I have performed a stratified sample on a multi-label dataset before training a classifier and want to check how balanced it is now. The columns in the dataset are:
|_Body|label_0|label_1|label_10|label_100|label_101|label_102|label_103|label_104|label_11|label_12|label_13|label_14|label_15|label_16|label_17|label_18|label_19|label_2|label_20|label_21|label_22|label_23|label_24|label_25|label_26|label_27|label_28|label_29|label_3|label_30|label_31|label_32|label_33|label_34|label_35|label_36|label_37|label_38|label_39|label_4|label_40|label_41|label_42|label_43|label_44|label_45|label_46|label_47|label_48|label_49|label_5|label_50|label_51|label_52|label_53|label_54|label_55|label_56|label_57|label_58|label_59|label_6|label_60|label_61|label_62|label_63|label_64|label_65|label_66|label_67|label_68|label_69|label_7|label_70|label_71|label_72|label_73|label_74|label_75|label_76|label_77|label_78|label_79|label_8|label_80|label_81|label_82|label_83|label_84|label_85|label_86|label_87|label_88|label_89|label_9|label_90|label_91|label_92|label_93|label_94|label_95|label_96|label_97|label_98|label_99|
I want to group by every label_* column once, and create a dictionary of the results with positive/negative counts. At the moment I am accomplishing this in PySpark SQL like this:
# Evaluate how skewed the sample is after balancing it by resampling
stratified_sample = spark.read.json('s3://stackoverflow-events/1901/Sample.Stratified.{}.*.jsonl'.format(limit))
stratified_sample.registerTempTable('stratified_sample')
label_counts = {}
for i in range(0, 100):
    count_df = spark.sql('SELECT label_{}, COUNT(*) as total FROM stratified_sample GROUP BY label_{}'.format(i, i))
    rows = count_df.rdd.take(2)
    neg_count = getattr(rows[0], 'total')
    pos_count = getattr(rows[1], 'total')
    label_counts[i] = [neg_count, pos_count]
The output is thus:
{0: [1034673, 14491],
1: [1023250, 25914],
2: [1030462, 18702],
3: [1035645, 13519],
4: [1037445, 11719],
5: [1010664, 38500],
6: [1031699, 17465],
...}
This feels like it should be possible in one SQL statement, but I can't figure out how to do this or find an existing solution. Obviously I don't want to write out all the column names and generating SQL seems worse than this solution.
Can SQL do this? Thanks!
You can indeed do that in one statement, but I am not sure the performance will be good.
from pyspark.sql import functions as F
from functools import reduce
dataframes_list = [
    stratified_sample.groupBy(
        "label_{}".format(i)
    ).count().select(
        F.lit("label_{}".format(i)).alias("col"),
        "count"
    )
    for i in range(0, 100)
]
count_df = reduce(
    lambda a, b: a.union(b),
    dataframes_list
)
This will create a dataframe with 2 columns: col, which contains the name of the column you are counting, and count, the value of the count.
To convert it to a dict, see another post.
Here is a solution with a single SQL statement, to get all pos and neg counts:
sql = 'select '
for i in range(0, 100):
    sql = sql + ' sum(CASE WHEN label_{} > 0 THEN 1 ELSE 0 END) as label{}_pos_count, '.format(i, i)
    sql = sql + ' sum(CASE WHEN label_{} < 0 THEN 1 ELSE 0 END) as label{}_neg_count'.format(i, i)
    if i < 99:
        sql = sql + ', '
sql = sql + ' from stratified_sample '
df = spark.sql(sql)
rows = df.rdd.take(1)
label_counts = {}
for i in range(0, 100):
    label_counts[i] = [rows[0][2*i], rows[0][2*i+1]]
print(label_counts)
You can also generate SQL without a GROUP BY.
Something like:
SELECT COUNT(*) AS total, SUM(label_k) AS positive_k, ... FROM table
Then use the result to produce your dict {k: [total - positive_k, positive_k]}, as sketched below.
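A minimal sketch of that idea against the stratified_sample temp table registered above, assuming the label columns only take the values 0 and 1 (the generated column aliases are hypothetical):
# one SELECT with a SUM per label column; SUM counts the positives when labels are 0/1
select_parts = ['COUNT(*) AS total'] + [
    'SUM(label_{0}) AS positive_{0}'.format(i) for i in range(0, 100)
]
sql = 'SELECT ' + ', '.join(select_parts) + ' FROM stratified_sample'

row = spark.sql(sql).collect()[0]
label_counts = {
    i: [row['total'] - row['positive_{}'.format(i)], row['positive_{}'.format(i)]]
    for i in range(0, 100)
}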

setting multiple apply columns

I'm trying to set multiple columns with the apply method from an array, instead of having 3 separate declaration lines. I would like to have 3 columns set from the dataframe apply method, with the different args taken from an array.
Declaring them on separate lines works, but it is not very clean.
import numpy as np

days = np.array([30, 45, 60])

def move(row, days):
    return row.X / 100 * np.sqrt(days / 365)
### I am trying to clean this up -- there's got to be a simpler way!!
#df['Move30'] = df.apply(move,args=(days[0], ),axis=1)
#df['Move45'] = df.apply(move,args=(days[1], ),axis=1)
#df['Move60'] = df.apply(move,args=(days[2], ),axis=1)
### This succeeds but not any cleaner
df['Move30'], df['Move45'], df['Move60'] = df.apply(move,args=(days[0], ),axis=1), df.apply(move,args=(days[1], ),axis=1), df.apply(move,args=(days[2], ),axis=1)
### Is there some way to create...?
df['Move30'], df['Move45'], df['Move60'] = df.apply(move,args=([days[0],days[1],days[2]], ),axis=1)
You can write this as a for loop:
for d in days:
    df[f'Move{d}'] = df.apply(move, args=(d,), axis=1)
In python 2 you'd have to use 'Move' + str(d) instead of f'Move{d}'.
However, I suspect you'd be better off vectorizing this...
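For example, a vectorized sketch with hypothetical data, computing each Move column with array arithmetic instead of a row-wise apply:
import numpy as np
import pandas as pd

days = np.array([30, 45, 60])
df = pd.DataFrame({"X": [10.0, 25.0, 40.0]})  # hypothetical data

# column-wise arithmetic is vectorized, so no per-row apply is needed
for d in days:
    df[f"Move{d}"] = df["X"] / 100 * np.sqrt(d / 365)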

TypeError: 'DataFrame' object is not callable in concatenating different dataframes of certain types

I keep getting the following error.
I read a file that contains time-series data with 3 columns: [meter ID] [daycode (explained later)] [meter reading in kWh]
consum = pd.read_csv("data/File1.txt", delim_whitespace=True, encoding = "utf-8", names =['meter', 'daycode', 'val'], engine='python')
consum.set_index('meter', inplace=True)
test = consum.loc[[1048]]
I want to observe meter readings over the whole length of the data I have in this file, but first filter by meter ID.
test['day'] = test['daycode'].astype(str).str[:3]
test['hm'] = test['daycode'].astype(str).str[-2:]
For readability, I convert daycode based on its rule. The first 3 digits are in the range 1 to 365 x 2 = 730, and the last 2 digits are in the range 1 to 48. These are 30-min interval readings over a 2-year span (but not all meters have the full range).
So I create two files, one containing the dates and the other the times. I use the index to convert the digits of daycode into the corresponding date & time that these files contain.
#dcodebook index starts from 0. So minus 1 from the daycode before match
dcodebook = pd.read_csv("data/dcode.txt", encoding = "utf-8", sep = '\r', names =['match'])
#hcodebook starts from 1
hcodebook = pd.read_csv("data/hcode.txt", encoding = "utf-8", sep ='\t', lineterminator='\r', names =['code', 'print'])
hcodebook = hcodebook.drop(['code'], axis= 1)
For some weird reason, dcodebook had to be indexed using the .iloc function, as I understood it, but hcodebook needed .loc.
#iloc: by int-position
#loc: by label value
#ix: by both
day_df = dcodebook.iloc[test['day'].astype(int) - 1].reset_index(drop=True)
#to avoid duplicate index Valueerror, create separate dataframes..
hm_df = hcodebook.loc[test['hm'].astype(int) - 1]
#.to_frame error / do I need .reset_index(drop=True)?
The following line is where the code crashes.
datcode_df = day_df(['match']) + ' ' + hm_df(['print'])
print datcode_df
print test
What I don't understand:
I tested earlier that columns of different dataframes can be merged using simple addition, as seen above.
I initially assigned the result to the existing column ['daycode'] in the test dataframe, so that the previous values would be replaced, and the same error message was returned.
Please advise.
You need both DataFrames to be the same size, so it is necessary that day and hm line up one-to-one.
Then call reset_index with drop=True on both so they share the same indices, and finally remove the () in the join (select the columns with [] instead of calling the DataFrames):
day_df = dcodebook.iloc[test['day'].astype(int) - 1].reset_index(drop=True)
hm_df = hcodebook.loc[test['hm'].astype(int) - 1].reset_index(drop=True)
datcode_df = day_df['match'] + ' ' + hm_df['print']