Columnwise setting all values that meet a condition to zero - numpy

I have an array that looks like this:
M=np.array([[1,2,3],[4,9,2],[3,5,6],[8,1,3]])
> M = [[1,2,3],
> [4,9,2],
> [3,5,6],
> [8,1,3]]
For each column I want to set the two smallest values to zero.
Thus I sort them in descending order (I know ascending would be faster)
M1 = np.sort(M, axis=0)[::-1]
Then I want to use something like
for column in range(M.shape[1]):
    for row in range(M.shape[0]):
        if M[row, column] < M1[1,column]:
            M[row, column] = 0
and receive:
> M = [[0,0,0],
> [4,9,0],
> [0,5,6],
> [8,0,3]]
How can I make this last part more efficient (for array or DataFrame)?

Try:
M[M< M1[1,:]]=0
Outputs:
[[0 0 3]
[4 9 0]
[0 5 6]
[8 0 3]]

You can use a boolean mask here, which you construct with:
M < M1[1,:]
We can then set the elements for which this condition holds to 0 with:
M[M < M1[1,:]] = 0
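As a minimal sketch of the mask approach on the array from the question: the comparison broadcasts the per-column threshold across all rows. When a column contains ties at the threshold (here the two 3s in the last column), the strict comparison zeroes fewer than two entries, which is why the output above differs from the one requested in the question. The second variant, based on np.argsort and np.put_along_axis, is an alternative that is not part of the answers; it zeroes exactly two entries per column, breaking ties by row order.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 9, 2], [3, 5, 6], [8, 1, 3]])

# Variant 1: threshold mask, as in the answers above.
# M1[1] is the second-largest value of each column (4 rows -> two values below it,
# unless there are ties).
M1 = np.sort(M, axis=0)[::-1]
masked = M.copy()
masked[masked < M1[1, :]] = 0   # the row of thresholds broadcasts over all rows

# Variant 2 (an alternative, not from the answers): zero exactly two entries
# per column. A stable argsort keeps the earlier row first when values are tied.
exact = M.copy()
idx = np.argsort(exact, axis=0, kind="stable")[:2]   # row indices of the two smallest
np.put_along_axis(exact, idx, 0, axis=0)

print(masked)
print(exact)
```

Both variants work on copies here so that the original M stays available for comparison.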

Related

Dataframe change column value on if statement and keeps the new value to next row

I wish good health to you and your family.
In my dataframe I have a column 'condition', which is cast with .astype(float).
For every row, a calculation is made from the data I put into this dataframe, and if the result is over a specific amount, the value of 'condition' is increased by 1. This works as it should.
I made another column named 'order', whose value should change when 'condition' has a value of 3. This code shows what I mean:
import pandas as pd
import numpy as np
def graph():
    df = (pd.DataFrame(np.random.randint(-3,4,size=(100, 1)), columns=[('condition')]))
    df['order'] = 0
    df.loc[(df['condition'] == 3) & (df['order'] == 0) , 'order'] = df['order'] + 1
    df.loc[(df['condition'] == -3) & (df['order'] == 1) , 'order'] = df['order'] + -1
    df.to_csv('copy_bars.csv')
graph()
As you can see, it changes the value in the 'order' column to 1 when the first condition is met. But it never changes back from 1 to 0 because of the second if statement. It is 0 only because I set the column to 0 at the beginning.
How could I modify the code so that, once the value is changed to 1, it keeps this new value until the second condition is met?
Row  Condition  Order
0    -1         0
1     3         1
2    -1         0
3     2         0
4    -2         0
5    -3         0
6     0         0
Instead of this, I would like the Order column to hold the value 1 for rows 1 to 4, so that my second condition can trigger.
If I understood correctly, this should be close to what you want. Because it works row by row and depends on two values, it is not easy to vectorize, but someone else can probably do it. Hope it works for you.
order = []
have_found_plus_3 = False
for i, row in df.iterrows():
    if row['condition'] == 3:
        have_found_plus_3 = True
    elif row['condition'] == -3:
        have_found_plus_3 = False
    if have_found_plus_3:
        order.append(1)
    else:
        order.append(0)
df['order'] = order
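One possible vectorized form of the same state machine, as a sketch rather than a tested replacement: mark the rows where the state switches (1 at condition 3, 0 at condition -3), leave every other row empty, and carry the last switch forward with ffill. The sample values are the ones from the table in the question; the helper name signal is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample 'condition' values taken from the table in the question.
df = pd.DataFrame({"condition": [-1, 3, -1, 2, -2, -3, 0]})

# Hypothetical helper series: NaN everywhere except where the state switches.
signal = pd.Series(np.nan, index=df.index)
signal[df["condition"] == 3] = 1.0    # switch on
signal[df["condition"] == -3] = 0.0   # switch off

# Carry the last switch forward; rows before the first switch start at 0.
df["order"] = signal.ffill().fillna(0).astype(int)
print(df)
```

On this sample, rows 1 to 4 get order 1 and the -3 in row 5 resets it to 0, which matches what the loop above produces.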

Count in string terms and stored mapped to other value

I have a pandas dataframe which includes columns (amongst others) like this, with RATING being an integer from 0 to 5 and COMMENT a string:
RATING COMMENT
1 some text
2 more text
3 other text
... ...
I would now like to mine (for lack of a better word) the comments for a list of key words:
list = ['like', 'trust', 'etc etc etc']
and would like to iterate through COMMENT and count the occurrences of each key word by rating, to get a df like this:
KEYWORD RATING COUNT
like 1 202
like 2 325
like 3 0
like 4 967
like 5 534
...
trust 1 126
....
How can I achieve this?
I am a beginner, so I would really appreciate your help (the simpler and more understandable, the better).
Thank you.
Hi, at the moment I have been iterating through manually, i.e.
# DATA_df is the original data
word_list = ['word', 'words', 'words', 'more']
values = [0] * len(word_list)
tot_val = [values] * 5
rating_table = pd.DataFrame(tot_val, columns=word_list)
for i, word in enumerate(word_list):
    for g in range(len(DATA_df['COMMENT'])):
        # substring test; the row is the rating of this comment minus 1
        if word in DATA_df['COMMENT'][g]:
            rating_table.iloc[DATA_df['RATING'][g] - 1, i] += 1
This gives a DF like so:
   word  words  words  more
0     0      0      0     0
1     0      0      0     0
2     0      0      0     0
3     0      0      0     0
4     0      0      0     0
that I am then trying to add to. It seems really clunky.
I managed to solve it. The key points I learnt: use groupby to pre-select the data by rating; this slices the data, and it is then possible to iterate through the groups. Also, str.lower() in combination with str.count() worked well.
I would be thankful if more experienced programmers could show me a better solution, but at least this works.
rating = [1,2,3,4,5]
rategroup = tp_clean.groupby('Rating')
#print (rategroup.groups)
results_list = []
for w in word_list:
    current = [w]
    for r in rating:
        stargroup = rategroup.get_group(str(r))
        found = stargroup['Content'].str.lower().str.count(w)
        c = found.sum()
        current.append(c)
    results_list.append(current)
results_df = pd.DataFrame(results_list, columns=['Keyword','1 Star','2 Star','3 Star','4 Star','5 Star'])
The one thing I am still struggling with is how to use regex to make it look for full words. I believe \b is the right one but how do I put it into str.count function?
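On the \b question, a possible sketch: str.count takes a regular expression, so the keyword can be wrapped in \b...\b (with re.escape in case it contains special characters), and flags=re.IGNORECASE can replace the str.lower() step. The sample comments and the long KEYWORD / RATING / COUNT layout below are assumptions made for illustration, not the original data.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real comments.
df = pd.DataFrame({
    "RATING": [1, 1, 2, 2],
    "COMMENT": ["I like it", "Unlikely to trust them",
                "Like, like, LIKE!", "trust is key"],
})
keywords = ["like", "trust"]

rows = []
for word in keywords:
    # \b matches a word boundary, so "like" does not match inside "Unlikely".
    pattern = rf"\b{re.escape(word)}\b"
    counts = df["COMMENT"].str.count(pattern, flags=re.IGNORECASE)
    # Sum the per-comment counts within each rating.
    per_rating = counts.groupby(df["RATING"]).sum()
    for rating, count in per_rating.items():
        rows.append({"KEYWORD": word, "RATING": rating, "COUNT": int(count)})

result = pd.DataFrame(rows)
print(result)
```

Ratings that never occur in the data do not appear in the result; they would need to be added separately if a row with a count of 0 is wanted.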

loop over a dataframe and populate with values

I am trying to loop over a dataframe and fill a new column with values according to a rule:
#formula for trading strategy
df['new_column'] = ""
for index,row in df.iterrows():
    if row.reversal == 1:
        row.new_column = 1
        index += 126
        row.new_column = -1
    else:
        row.new_column = 0
This formula is meant to populate the new column in a way that, when reversal=1, a value of 1 is given, followed by 0s for the next 125 rows, and a -1 in the 126th row. Then it should start again looking at whether the 127th item of the reversal column is 1 (indicating a reversal) or 0, etc. Instead, if reversal !=1, a value of 0 is given.
The problem is that when I take a look at the new column, it is still empty. There must be an error in the way I write the values into it. I looked at other ways to construct if statements for dataframes (e.g., lambda), but they do not allow me to perform all the operations in this code.
new_column can have the values 0, 1 or -1.
I suggest you initialise the column with 0, so there is no need to set 0 later:
df['new_column'] = 0
index = 0
while index < df.shape[0]:
    if df['reversal'][index] == 1:
        df.loc[index,'new_column'] = 1          #set 1 to same index
        df.loc[index + 126, 'new_column'] = -1  #set -1 to 126th row
        index = index + 127                     #inc index to next loop
    else:
        index = index + 1
Be careful that index + 126 does not exceed the number of rows in the dataframe.
You could modify the test to guard the loop (to avoid writing past the last row):
if df['reversal'][index] == 1 and (index + 126) < df.shape[0]:
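Putting the loop and the guard together, a small sketch that can be run on toy data. The function name mark_trades and the hold parameter are made up for illustration; hold=126 corresponds to the question, while the demo uses hold=3 so the result is easy to check by eye. A reversal too close to the end of the data is simply left at 0 here, which is one possible choice.

```python
import pandas as pd

def mark_trades(reversal, hold=126):
    """Return 1 at a reversal, -1 `hold` rows later, 0 elsewhere.

    Rows between an entry and its exit are skipped, so reversals
    inside a holding period are ignored.
    """
    signal = [0] * len(reversal)
    i = 0
    while i < len(reversal):
        # Guard: only enter if the exit row still exists.
        if reversal[i] == 1 and i + hold < len(reversal):
            signal[i] = 1
            signal[i + hold] = -1
            i += hold + 1   # resume right after the exit row
        else:
            i += 1
    return signal

df = pd.DataFrame({"reversal": [1, 0, 1, 0, 1, 0, 0, 1]})
df["new_column"] = mark_trades(df["reversal"].tolist(), hold=3)
print(df)
```

Working on a plain list and assigning the finished column once avoids repeated df.loc writes and does not depend on the dataframe having a default integer index.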

Gurobi Optimization Result Writing into Csv file

I am using Gurobi 7 to solve my MIP. I have several different variables. However, I am specifically interested in two of those, namely "x" and "y". For reference, here is the code that shows how I added the x and y variables to the model:
# Creating Variables
x = {}
y = {}
# Adding Variables
for i in range(I):
    x[i+1,P[i]-d[0]] = m.addVar(vtype=GRB.BINARY, name="x%s" % str([i+1,P[i]-d[0]]))
    x[i+1,P[i]] = m.addVar(vtype=GRB.BINARY, name="x%s" % str([i+1,P[i]]))
for i in range(I):
    for k in range(len(rangevalue)):
        y[i+1, rangevalue[k] - E[i]] = m.addVar(vtype=GRB.BINARY,
                                                name="y%s" % str([i+1, rangevalue[k] - E[i]]))
The code above may not make much sense on its own; I include it in case it is useful for answering my question.
After I solve the problem, I get the following results:
m.printAttr('X')
Variable X
-------------------------
x[1, 3] 1
sigmaminus[1] 874
x[2, 2] 1
sigmaminus[2] 1010
x[3, 2] 1
sigmaminus[3] 1945
x[4, 4] 1
sigmaplus[4] 75
x[5, 4] 1
sigmaminus[5] 1153
x[6, 5] 1
sigmaminus[6] 280
x[7, 3] 1
sigmaplus[7] 1138
x[8, 2] 1
sigmaplus[8] 538
x[9, 1] 1
sigmaplus[9] 2432
x[10, 5] 1
sigmaminus[10] 480
omega[1] 12
OMEGA[1] 12
omega[2] 9
OMEGA[2] 12
omega[3] 8
OMEGA[3] 9
omega[4] 8
OMEGA[4] 8
OMEGA[5] 8
y[1, 2] 1
y[2, 9] 1
y[3, 5] 1
y[4, 6] 1
y[5, 4] 1
y[6, 6] 1
y[7, 3] 1
y[8, 11] 1
y[9, 8] 1
y[10, 1] 1
phiplus[6] 1
phiminus[7] 1
phiminus[10] 1
I specifically want to display the x and y variables with their indexes. The other variables are not necessary. My question is: how can I write these results into a CSV file in one column, as follows?
x[1,3]
x[2,2]
x[3,2]
.
.
.
x[10,5]
y[1,2]
y[2,9]
y[3,5]
.
.
.
y[10,1]
I do not need their corresponding value which can only be "1" since they are binary variables. I just need to write the variables which have the value "1".
I would do something along these lines:
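import csv
if m.SolCount == 0:
    print("Model has no solution")
    exit(1)
var_names = []
for var in m.getVars():
    # Or use a list comprehension instead.
    # Keep only the x and y variables whose value is 1.
    if var.VarName[0] in ('x', 'y') and var.X > 0.1:
        var_names.append(var.VarName)
# Write to csv, one variable name per row
# (Python 3; on Python 2 open the file with 'wb' instead)
with open('out.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    # each row must be a sequence, otherwise the name is split into characters
    wr.writerows([name] for name in var_names)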
I hope this helps. I am going to test this snippet a bit later. Update: works as intended.
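The filtering and writing steps can be tried without a solver. In the sketch below, the list of (name, value) pairs is made-up data standing in for (var.VarName, var.X) from m.getVars(), and select_names and write_single_column are hypothetical helper names. Matching on the prefixes "x[" and "y[" is an assumption based on the names shown in the printAttr output above.

```python
import csv
import io

# Made-up (name, value) pairs standing in for (var.VarName, var.X).
solution = [
    ("x[1, 3]", 1.0),
    ("sigmaminus[1]", 874.0),
    ("y[1, 2]", 1.0),
    ("x[2, 1]", 0.0),
]

def select_names(pairs, prefixes=("x[", "y["), threshold=0.5):
    """Keep names with a wanted prefix whose binary value is (numerically) 1."""
    return [name for name, value in pairs
            if name.startswith(prefixes) and value > threshold]

def write_single_column(names, handle):
    """Write one name per row; each row is a one-element list."""
    writer = csv.writer(handle)
    writer.writerows([name] for name in names)

# An in-memory buffer is used here; a real file would be opened with
# open("out.csv", "w", newline="").
buffer = io.StringIO()
write_single_column(select_names(solution), buffer)
print(buffer.getvalue())
```

Because the names contain a comma, the csv module quotes each one, so the file still has a single column when read back.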

Gnuplot - Iteration with two commands

I'm trying to build a sort of bar-chart using a simple data file (.example) containing only 0s or 1s. Here is the data contained in .example:
dest P1 P2 P3 P4 P5 NA
D1 0 1 1 0 0 0
D2 0 0 1 0 0 0
D3 0 1 0 1 0 0
""
GPV 1 1 1 1 1 1
and here is the code I'm using:
set style histogram rowstacked title textcolor lt -1
set datafile missing 'nan'
set style data histograms
plot '.example' using ( $2==0 ? 1 : 0 ) ls 17 title 'NA', \
'' using ( $2==1 ? 1 : 0 ) ls 1, \
for [i=3:5] '.example' using ( column(i)==0 ? 1 : 0) ls 17 notitle, \
for [i=3:5] '' using ( column(i)==1 ? 1 : 0) ls i-1
where the last two commands iterate over a potentially large number of columns, stacking white or colored boxes depending on the value of column(i). To keep the same color order among different columns in the histogram, I would need to merge the two iterations into a single one with two commands.
Is this possible? Any suggestions on how to do that?
You can use nested loops, which I think is what you want to achieve. You can use an outer loop iterating over your large number of columns and an inner loop iterating over the two options (white vs. colored), for [i=3:5] for [j=0:1], and tell gnuplot to ignore the column if its content doesn't match the value of j using 1/0 (or use the trick, valid for histograms, of setting it to 0 as you're already doing):
set style histogram rowstacked title textcolor lt -1
set datafile missing 'nan'
set style data histograms
plot '.example' using ( $2==0 ? 1 : 0 ) ls 17 title 'NA', \
'' using ( $2==1 ? 1 : 0 ) ls 1, \
for [i=3:5] for [j=0:1] '.example' using ( column(i) == j ? 1 : 0 ) \
ls ( j == 0 ? 17 : i-1 ) notitle
The code above is equivalent to what you already have, except that the value of j switches the style depending on whether the column's value is 0 or 1.