Excel: looking for a value in a specific range - vba

I have over 3000 values in column A and a value x in column B. I want Excel to look through the values in column A and return "yes" if there is a value bigger than x yet smaller than x+7 (x < value in column A < x+7). If no such value exists, display "no".
Here's an example:
Column A:
A2: 11.2
A3: 11.3
A4: 11.4
A5: 13.5
A6: 13.6
A7: 20.5
A8: 20.6
A9: 30.5
Column B:
B2: 11.1
B3: 20.7
In this case, since there are values in column A that are bigger than 11.1 and within the range (bigger than B2 but smaller than B2+7), I need Excel to return "yes". Ideally, it would also return the first such value in column A.
Here's what I have tried so far but have had no success:
=IF(AND((B2+7)>A1:A3000>B2),"yes","no")
=IF(AND((B2+7)>$A$2:$A$3000,$A$2:$A$3000>B2),"yes","no")
How can I do this in Excel? Is there a way to do this other than using IF?

Forgive me if I am not understanding the question, but isn't the answer:
=IF(AND((B2+7)>$A$2:$A$10,$A$2:$A$10>B2),"yes","no")
That would be the formula in C2, testing B2 against the entries in the list spanning A2:A10. You'd copy that formula down the column for all the entries in column B.

Try this (for cell B2)
=IF(SUMPRODUCT((A:A>B2)*(A:A<B2+7)),"Yes","No")
and copy down as far as required.
For the second part, to return the next larger value, try this
=IF(SUMPRODUCT((A:A>B2)*(A:A<B2+7)),INDEX(A:A,MATCH(B2,A:A,1)+1),"No")
Note that this requires the data in column A to be sorted ascending (as your sample data is) and cell A1 to contain 0. If these conditions are not possible, then you might have to consider a VBA user-defined function.
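For readers who want to check the logic outside Excel, here is a minimal Python sketch of what the SUMPRODUCT test computes, using the sample data from the question (the function name is made up for illustration):

```python
# Sample values from the question's column A (sorted ascending)
col_a = [11.2, 11.3, 11.4, 13.5, 13.6, 20.5, 20.6, 30.5]

def check(x, values, window=7):
    """Return ('yes', first match) if any value lies strictly
    between x and x + window, else ('no', None)."""
    matches = [v for v in values if x < v < x + window]
    return ('yes', matches[0]) if matches else ('no', None)

print(check(11.1, col_a))  # ('yes', 11.2)
print(check(20.7, col_a))  # ('no', None)
```

The SUMPRODUCT formula counts matching cells (any nonzero count is truthy), which is the same test as `matches` being non-empty here.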

I don't understand what you require, but maybe =IF(AND(A2>B$2,A2<B$2+7),"yes","no") copied down would serve to test each column A value against B2 and B2+7.

Related

How to update column A value with column B value based on column B's string length property?

I scraped a real estate website and produced a CSV output with data that needs to be cleaned and structured. So far, my code properly organizes and reformats the data to work with stats software.
However, every now and then my 'Gross area' column has the wrong value in m2. The correct value appears in another column ('Furbished').
Gross_area        Furbished
170 #erroneous    190 m2
170 #erroneous    190 m2
160 #correct      Yes
155 #correct      No
I tried using the np.where function. However, I could not specify the condition based on string length, which would let me target all '_ _ _ m2' values in the 'Furbished' column and reinsert them into 'Gross_area'. It just doesn't work.
df['Gross area'] = np.where(len(df['Furbished']) == 6, df['Furbished'], df['Gross area'])
As an alternative, I tried setting cumulative conditions to precisely target my '_ _ _ m2' values and insert them into my 'Gross area' column. It does not work either:
df['Gross area'] = np.where(df['Furbished'] != 'Yes' or 'No', df['Furbished'], df['Gross area'])
The outcome I seek is:
Gross_area    Furbished
190 m2        190 m2
190 m2        190 m2
160           Yes
155           No
Any suggestions? A string-length criterion on the Furbished column would be the best option, as I have other instances that would require the same treatment :)
Thanks in advance for your help!
There is probably a better way to do this, but you could get the intended effect with a simple df.apply() call.
df['Gross area'] = df.apply(lambda row: row['Furbished'] if len(row['Furbished']) == 6 else row['Gross area'], axis=1)
With a simple change, you can also keep the 'Gross area' column as a numeric type.
df['Gross area'] = df.apply(lambda row: float(row['Furbished'][:-2]) if len(row['Furbished']) == 6 else row['Gross area'], axis=1)
You can use Series.where:
df['Gross_area'] = df['Furbished'].where(df['Furbished'].str.len() == 6, df['Gross_area'])
This keeps the value in the Furbished column where its length is 6, and otherwise uses the value in the Gross_area column.
Result:
Gross_area Furbished
0 190 m2 190 m2
1 190 m2 190 m2
2 160 #correct Yes
3 155 #correct No
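Put together as a runnable sketch (the column values below are stand-ins mirroring the question's data):

```python
import pandas as pd

# Stand-in frame mirroring the question's scraped data
df = pd.DataFrame({
    'Gross_area': ['170 #erroneous', '170 #erroneous', '160 #correct', '155 #correct'],
    'Furbished':  ['190 m2', '190 m2', 'Yes', 'No'],
})

# Keep Furbished where its length is 6 (values like '190 m2'),
# otherwise fall back to the existing Gross_area value
df['Gross_area'] = df['Furbished'].where(df['Furbished'].str.len() == 6,
                                         df['Gross_area'])
print(df['Gross_area'].tolist())
# ['190 m2', '190 m2', '160 #correct', '155 #correct']
```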
Thanks a lot for your help! Derek's suggestion was the simplest to implement in my program:
df['Gross area']=df['Furbished'].where(df['Furbished'].str.len()==6,df['Gross area'])
I could create a set of rules to replace or delete all the misreferenced data :)
To update data from given column A if column B equals given string
df['Energy_Class']=np.where(df['Energy_Class']=='Usado',df['Bathrooms'],df['Energy_Class'])
To replace string segment found within column rows
net = []
for row in net_col:
    net.append(row)
net_in = [s for s in net if 'm²' in s]
print(net_in)
net_1 = [s.replace('m²', '') for s in net]
net_2 = [s.replace(',', '.') for s in net_1]
net_3 = [s.replace('Sim', '') for s in net_2]
df['Net area'] = np.array(net_3)
To create a new column and append standard value B if value A is found in an existing target column's rows
Terrace_list = []
carac_0 = df['Caracs/0']
for row in carac_0:
    if row == 'Terraço':
        Terrace_list.append('Yes')
    else:
        Terrace_list.append('No')
df['Terraces'] = np.array(Terrace_list)
To append pre-set value B in existing column X if value A found in existing column Y.
df.loc[df['Caracs/1']=='Terraço','Terraces']='Yes'
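That last .loc pattern, as a self-contained sketch (the frame below is hypothetical):

```python
import pandas as pd

# Hypothetical listing data
df = pd.DataFrame({'Caracs/1': ['Terraço', 'Garagem', 'Terraço'],
                   'Terraces': ['No', 'No', 'No']})

# Set Terraces to 'Yes' wherever Caracs/1 equals 'Terraço'
df.loc[df['Caracs/1'] == 'Terraço', 'Terraces'] = 'Yes'
print(df['Terraces'].tolist())  # ['Yes', 'No', 'Yes']
```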
Hope this helps someone out.

Pandas dataframe selection df['a'][50][:51]

I have a dataframe where one of the columns is named 'a'.
I came across the following selection expression:
dataframe['a'][50][:50]
I understand dataframe['a'][50] selects the value at index label 50 in column 'a', but what does [:50] do?
Thank you
If dataframe['a'][50][:50] doesn't error out and actually returns something, it means the value at index label 50 in column 'a' is an iterable (more precisely, a sequence type) such as a list, string, or tuple.
dataframe['a'][50][:50] returns elements 0 through 49 of that value.
As said above, if that value is not a sequence type, you will get an error. Try checking dataframe['a'][50] to see whether it is a sequence type.
Note: dataframe['a'][50] is chained indexing, which is not recommended; however, that is out of the scope of this question, so I won't go into detail.
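A small sketch of the chained indexing being described (toy data; the column contents are hypothetical):

```python
import pandas as pd

# Toy frame where column 'a' holds strings (a sequence type)
df = pd.DataFrame({'a': ['x' * 60, 'short']})

# df['a'][0] -> the value whose index label is 0 (a 60-character string here)
# [:50]      -> then an ordinary Python slice of that value
first_50 = df['a'][0][:50]
print(len(first_50))  # 50
```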

How to put an upper bound over the sum of each row in a table in GAMS?

I have a table called "latencies" defined over two sets, a and b, and a variable y defined over them. I also have a parameter over a whose values must be satisfied:
table latencies(a, b)
        b1  b2  b3
    a1   1   2   3
    a2   4   5   6
    a3   7   9   8;
parameter pam1(a) /"a1" 12, "a2" 13, "a3" 14/;
positive variable y(a,b);
I am trying to make the sum of each row from the latencies table at most each respective element in the parameter pam1.
equations maxime(a), ...;
maxime(a)..
sum(a, y(a,b)) =l= pam1(a);
So the sum of the first row in latencies should be less than or equal to 12, the sum of the 2nd row should be less than or equal to 13, etc. However, I am getting these errors: "Set is under control already" and "Uncontrolled set entered as constant" on the same equation above. How do I do this?
Here is the corrected solution (which works):
equations maxime(a), ...;
maxime(a)..
sum(b, y(a,b)) =l= pam1(a);
I was incorrectly using the row index (a) as the controlling index of the sum. Since the equation is already declared over a (maxime(a)), a is "under control" there, and the sum must instead run over b, the column index. That is how you iterate over each row's columns and put an upper bound on the row sum.
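Not GAMS, but the row-sum logic of the corrected constraint can be sanity-checked in Python with the question's numbers (NumPy used purely for illustration):

```python
import numpy as np

# The latencies table and pam1 bounds from the question
latencies = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 9, 8]])
pam1 = np.array([12, 13, 14])

# sum(b, y(a,b)) =l= pam1(a): one sum per row a, taken over the column index b
row_sums = latencies.sum(axis=1)
print(row_sums.tolist())            # [6, 15, 24]
print((row_sums <= pam1).tolist())  # [True, False, False]
```

Summing over axis 1 (the columns) is the analogue of controlling the sum with b; summing over a would instead collapse the index the equation itself is declared over.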

Need explanation on how pandas.drop is working here

I have a data frame, let's say xyz. I have written code to find out the % of null values each column of the dataframe has. My code below:
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)
Let's say I got the following results:
abc 26.63
def 36.58
ghi 78.46
I want to drop column ghi because it has more than 70% of null values.
I achieved it using the following code:
xyz = xyz.drop(xyz.loc[:,round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70].columns, 1)
But I did not understand how this code works; can anyone please explain it?
the code is doing the following:
xyz.drop( [...], 1)
removes the specified elements along a given axis, either by row or by column. In this particular case, df.drop(..., 1) means you're dropping along axis 1, i.e., columns.
xyz.loc[:, ... ].columns
will return a list with the column names resulting from your slicing condition
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70
this instruction counts the nulls in each column and normalizes by the number of rows, effectively computing the percentage of NaN in each column. The amount is then rounded to 2 decimal places, and finally it returns True if the NaN percentage is more than 70%. Hence, you get a mapping between columns and a True/False array.
Putting everything together: you're first producing a Boolean array that marks which columns have more than 70% nan, then, using .loc you use Boolean indexing to look only at the columns you want to drop ( nan % > 70%), then using .columns you recover the name of such columns, which then are used by the .drop instruction.
Hopefully this clears things up!
If that code is hard to understand, you can just use dropna with thresh, since pandas already covers this case.
df=df.dropna(axis=1,thresh=round(len(df)*0.3))
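Both approaches side by side, on a toy frame (the 80%-null column 'ghi' below is made up):

```python
import numpy as np
import pandas as pd

# Toy frame: 'ghi' is 80% NaN, 'abc' has no NaN
xyz = pd.DataFrame({'abc': range(10),
                    'ghi': [1.0, 2.0] + [np.nan] * 8})

# Approach 1: boolean-index the columns by null percentage, then drop them
null_pct = round(100 * xyz.isnull().sum() / len(xyz.index), 2)
dropped = xyz.drop(xyz.loc[:, null_pct > 70].columns, axis=1)

# Approach 2: dropna with thresh = minimum non-null count a column must
# have to be kept (30% of the rows, i.e. at most 70% NaN allowed)
dropped2 = xyz.dropna(axis=1, thresh=round(len(xyz) * 0.3))

print(list(dropped.columns))   # ['abc']
print(list(dropped2.columns))  # ['abc']
```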

Find biggest subset that sums to zero in excel or access(vba,sql or anything)

I have a column of numbers in Excel, with positives and negatives; it is an accounting book. I need to eliminate the cells that sum to zero: I want to remove that subset so the remaining elements cannot form any subset summing to zero. I think this is the problem of finding the largest zero-sum subset. By remove/eliminate, I mean to mark them in Excel.
For example:
a set {1,-1,2,-2,3,-3,4,-4,5,-5,6,7,8,9},
I need a function that find subset {1,-1,2,-2,3,-3,4,-4,5,-5} and mark each element.
This suggestion may be a little heavy-handed, but it should be able to handle a broad class of problems -- like when one credit may be zeroed out by more than one debit (or vice versa) -- if that's what you want. Like you asked for, it will literally find the largest subset that sums to zero:
Enter your numbers in column A, say in the range A1:A14.
In column B, beside your numbers, enter 0 in each of the cells B1:B14. Eventually, these cells will be set to 1 if the corresponding number in column A is selected, or 0 if it isn't.
In cell C1, enter the formula =A1*B1. Copy the formula down to cells C2:C14.
At the bottom of column B, in cell B15, enter the formula =SUM(B1:B14). This formula calculates the count of your numbers that are selected.
At the bottom of column C, in cell C15, enter the formula =SUM(C1:C14). This formula calculates the sum of your numbers that are selected.
Activate the Solver Add-In and use it for the steps that follow.
Set the objective to maximize the value of cell $B$15 -- in other words, to maximize the count of your numbers that are selected (that is, to find the largest subset).
Set the following three constraints to require the values in cells B1:B14 (that indicate whether or not each of your numbers is selected) to be 0 or 1: a) $B$1:$B$14 >= 0, b) $B$1:$B$14 <= 1, and, c) $B$1:$B$14 = integer.
Set the following constraint to require the selected numbers to add up to 0: $C$15 = 0.
Use the Solver Add-In to solve the problem.
Hope this helps.
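Outside of Solver, the same "largest subset summing to zero" search can be brute-forced; here is a minimal Python sketch using the question's sample set (fine for small inputs only, since the number of subsets grows as 2^n):

```python
from itertools import combinations

def largest_zero_sum_subset(nums):
    """Brute force: try subset sizes from largest to smallest and
    return the first subset whose elements sum to zero."""
    for size in range(len(nums), 0, -1):
        for combo in combinations(nums, size):
            if sum(combo) == 0:
                return list(combo)
    return []

data = [1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, 7, 8, 9]
print(largest_zero_sum_subset(data))
# [1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
```

This is exponential in the worst case, which is exactly why the Solver-based integer-programming formulation above is attractive for larger books.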
I think that you need to better define your problem because as it is currently stated there is no clear answer.
Here's why. Take this set of numbers:
{ -9, -5, -1, 6, 7, 10 }
There are 64 possible subsets, including the empty set, and of these, three have zero sums:
{ -9, -1, 10 }, { -5, -1, 6 } & { }
There are two possible "biggest" zero-sum subsets.
If you remove either of these you end up with either of:
{ -5, 6, 7 } or { -9, 7, 10 }
Neither of these sum to zero, but there's no rule to determine which subset to pick.
You could decide to remove the "merged" set of zero sum subsets. This would leave you with:
{ 7 }
But does that make sense in your accounting package?
Equally, you could just decide to eliminate only pairs of matching positive and negative numbers, but many transactions would involve triples (e.g. sale = cost + tax).
I'm not sure your question can be answered unless you describe your requirements more clearly.