Adding column value for a list of indexes - pandas

I have a list of indexes and I'm trying to populate a column 'Type' for these rows only.
What I tried to do:
index_list={1,5,9,10,13}
df.loc[index_list,'Type']=="gain+loss"
Output:
1 False
5 False
9 False
10 False
13 False
But the output just gives the list with all False instead of populating these rows.
Thanks for any advice.

You need to use a single equals sign instead of a double one. In Python, as in most programming languages, == is the comparison operator; what you need here is the assignment operator =.
So the following code will do what you want:
index_list=[1,5,9,10,13]
df.loc[index_list,'Type']="gain+loss"
(Note: the indexes are passed as a list here rather than a set; recent pandas versions do not support a set as an indexer.)
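As a quick sanity check, here is a minimal runnable sketch (the DataFrame and its Value column are made up for illustration):

import pandas as pd

df = pd.DataFrame({'Value': range(14)})   # hypothetical data, default integer index
index_list = [1, 5, 9, 10, 13]

# Assignment (=), not comparison (==): fills 'Type' for these rows only,
# leaving NaN everywhere else
df.loc[index_list, 'Type'] = "gain+loss"
print(df.loc[index_list, 'Type'])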

Related

How to group by one column if condition is true in another column summing values in third column with pandas

I can't think of how to do this:
As the headline explains, I want to group a dataframe by the column acquired_month, but only where another column contains Closed Won (in the example I made a helper column that just marks True if that condition is fulfilled, although I'm not sure that step is necessary). If those conditions are met, I want to sum the values of a third column, but I can't think how to do it. Here is my code so far:
us_lead_scoring.loc[us_lead_scoring['Stage'].str.contains('Closed Won'), 'closed_won_binary'] = True
acquired_date = us_lead_scoring.groupby('acquired_month')['closed_won_binary'].sum()
but this just sums the True/False column after the acquired_month groupby, not the value column where the True/False column is True. Any direction appreciated.
Thanks
If you need to aggregate column col, replace the non-matching values with 0 using Series.where and then aggregate with sum:
us_lead_scoring = pd.DataFrame({'Stage': ['Closed Won1', 'Closed Won2', 'Closed', 'Won'],
                                'col': [1, 3, 5, 6],
                                'acquired_month': [1, 1, 1, 2]})

out = (us_lead_scoring['col'].where(us_lead_scoring['Stage']
                                    .str.contains('Closed Won'), 0)
                             .groupby(us_lead_scoring['acquired_month'])
                             .sum()
                             .reset_index(name='SUM'))
print(out)
acquired_month SUM
0 1 4
1 2 0
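To check the result against the sample data: for month 1, the 'Closed Won1' and 'Closed Won2' rows contribute 1 + 3 = 4 while the non-matching 'Closed' row's 5 is replaced by 0; month 2 contains only the non-matching 'Won' row, so its sum is 0.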

How to set multiple conditions for a Dataframe while modifying the values?

So, I'm looking for an efficient way to set values within an existing column, and to set values for a new column, based on some conditions. If I have 10 conditions in a big data set, do I have to write 10 lines? Or can I combine them somehow? I haven't figured it out yet.
Can you guys suggest something?
For example:
data_frame.loc[data_frame.col1 > 50 ,["col1","new_col"]] = "Cool"
data_frame.loc[data_frame.col2 < 100 ,["col1","new_col"]] = "Cool"
Can it be written in a single expression? "&" or "and" don't work...
Thanks!
Yes, you can do it. Here is an example:
data_frame.loc[(data_frame["col1"]>100) & (data_frame["col2"]<10000) | (data_frame["col3"]<500),"test"] = 0
Explanation:
The filter I used combines "and" (&) and "or" (|) conditions: (data_frame["col1"]>100) & (data_frame["col2"]<10000) | (data_frame["col3"]<500). Each condition must be wrapped in parentheses, and the operators & and | must be used instead of the Python keywords and/or.
The column that will be changed is "test" and the value written is 0.
You can try:
all_conditions = [condition_1, condition_2]
fill_with = [fill_condition_1_with, fill_condition_2_with]
df[["col1","new_col"]] = np.select(all_conditions, fill_with, default=default_value_here)

How to return different rows from the same table on single query as a list

I have a table which has a boolean column; this column is used to filter some responses. I need to return a response as a tuple like {claimed, unclaimed} (imagine the table is called winnings).
While working on it I wrote two separate queries to return the claimed and then the unclaimed rows, manually constructing the response; after that I tried returning all rows without checking the boolean column and splitting them outside of the query. Now I'm wondering if there's a way to run a single query on the same table and return both claimed and unclaimed as separate results, mainly for performance, hoping it runs better. I've tried doing it with joins, but it returns a list of two-item tuples like:
[{claimed, unclaimed}, {claimed, unclaimed}]...
While I want:
{claimed, unclaimed}
# OR
[{claimed, unclaimed}]
At most that; no more tuples. Note that I'm not running raw queries but using a library, so excuse me if the terminology is not right.
This is the last query I ran:
SELECT w0."claimed", w1."claimed"
FROM "winnings" AS w0
INNER JOIN "winnings" AS w1 ON TRUE
WHERE (w0."claimed" AND NOT (w1."claimed"))
LIMIT 10;
EDIT: More details.
When I run the query from above this is the result I get:
=> SELECT w0."claimed", w1."claimed" FROM "winnings" AS w0 INNER JOIN "winnings" AS w1 ON TRUE WHERE (w0."claimed" AND NOT (w1."claimed")) LIMIT 10;
claimed | claimed
---------+---------
t | f
t | f
t | f
t | f
t | f
t | f
t | f
t | f
t | f
t | f
(10 rows)
This is converted to the following in Elixir, which is the language I'm using:
[
  true: false,
  true: false,
  true: false,
  true: false,
  true: false,
  true: false,
  true: false,
  true: false,
  true: false,
  true: false
]
This is a keyword list, which internally is a list of tuples such as [{true, false}, {true, false}] - I want: [{[true, true], [false, false]}]
Meaning that I want two lists, each with its respective rows: only claimed in one and only unclaimed in the other.
I don't really mind the output type as long as it includes the two lists with their rows, as I said.
To get the first column from a list of rows, you can use Enum.map/2 to get the first element of each tuple:
Enum.map(rows, &elem(&1, 0))
If you're a newcomer to Elixir, the & syntax may be a bit confusing. That code is shorthand for:
Enum.map(rows, fn field -> elem(field, 0) end)
You could make that into a function that does that for all your columns like this:
def columnize(rows = [first_row | _]) when is_tuple(first_row) do
  for column <- 1..tuple_size(first_row), do: Enum.map(rows, &elem(&1, column - 1))
end

def columnize([]) do
  []
end
The = [first_row | _] pattern match guarantees the argument is a list with at least one element and binds its first tuple to first_row; the when is_tuple(first_row) guard assures at least the first row is a tuple.
If you'd like to know more about the for, it's a comprehension.
Note that this function assumes all rows have the same number of columns and are tuples (which should be true in the results of a query; but may not be in other cases), and will throw an ArgumentError if the first row is wider than any other row. It will also cause errors if other elements of the list are not tuples.
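For example, applied to a list of two-element tuples shaped like the query result above:

iex> columnize([{true, false}, {true, false}, {true, false}])
[[true, true, true], [false, false, false]]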
This feels very unidiomatic though. Is this maybe an XY problem?

Getting only True values and their respective indices from a Pandas series

I have a pandas series that looks like this, extracted by querying a dataframe.
t_loc=
312 False
231 True
324 True
286 False
123 False
340 True
I want only the indices that have 'True' boolean value.
I tried t_loc.index, but it gives me all the indices. t_loc['True'] and t_loc[True] are both futile. Need help.
Also, I need to update these locations with a single number if True. How can I update a column in a dataframe given the location numbers ?
Desired O/P:
[231,324,340]
I need to update e.g. df[col1] at 231... is it df[col1].loc[231]? How do I specify multiple locations? Can I pass the entire list, since I need to update it with only one value for all the locations?
This actually works too (with == True, or == False for the opposite selection):
t_loc.index[t_loc == True]
You can try this as well:
t_loc.astype(int).index[t_loc == 1]
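Putting both parts together, a minimal sketch (t_loc is rebuilt from the question; df, its column col1, and the value 99 are hypothetical):

import pandas as pd

t_loc = pd.Series([False, True, True, False, False, True],
                  index=[312, 231, 324, 286, 123, 340])

true_idx = t_loc.index[t_loc].tolist()
print(true_idx)   # [231, 324, 340]

# A whole list of locations can be passed to .loc at once,
# assigning a single value to all of them
df = pd.DataFrame({'col1': 0}, index=t_loc.index)   # hypothetical dataframe
df.loc[true_idx, 'col1'] = 99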

negative of "contains" in openrefine

I would like to add a column based on another column and fill it with all the values that do NOT contain "jpg"
so the negation of this:
filter(value.split(","), v, v.contains("jpg")).join("|")
How can I write "does not contain"?
contains gives a boolean output, i.e. true or false. So we have:
v = "picture.jpg" -> v.contains("jpg") = TRUE
v = "picture.gif" -> v.contains("jpg") = FALSE
filter keeps all the values in an array that return TRUE for whatever condition you use in the filter. There are a couple of ways you could filter an array to find the values that don't contain a string, but with contains the simplest is probably to use not to reverse the result of your condition:
filter(value.split(","), v, not(v.contains("jpg"))).join("|")
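For example, if value is "picture.jpg,picture.gif,photo.png", the split produces three values, the filter drops "picture.jpg", and the join returns "picture.gif|photo.png".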