Find the closest value in a column of data in VBA

I am writing VBA code to find the closest value in a range of data.
Example: in the worksheet "Sheet6" I have the value 31.848,
and in the worksheet "Z73" I have a list of values:
65.47
31.74
54.56
0.16
35.71
26.78
56.54
47.62
39.68
1.55
15.87
32.55
17.86
So I need to find the value in the list that is closest to 31.848.
Please help me with macro code to do this.

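The general solution to your problem is as follows:
A = 31.848
C = infinity (the smallest distance found so far)
for each value B in your list:
    if |A - B| < C then
        C = |A - B|
        closest = B
    end if
next value
The closest value will be stored in closest, and its distance from A in C.
This is pseudocode, not working VBA, so it will not solve your problem as written.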
As for actual code, as #braX rightfully stated, we need to know what you have tried so far, where exactly you are stuck, and what your VBA skills are, so we know where to start with our help.
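The pseudocode above can be sketched in Python, used here purely for illustration since the question asks for VBA. The function name closest_value is made up for this example, and the list is the one from the question.

```python
def closest_value(target, values):
    """Return the element of values with the smallest absolute distance to target."""
    if not values:
        raise ValueError("values must not be empty")
    # min() with a key compares |v - target| for every v and keeps the smallest
    return min(values, key=lambda v: abs(v - target))


# Values from worksheet "Z73" in the question
values = [65.47, 31.74, 54.56, 0.16, 35.71, 26.78, 56.54,
          47.62, 39.68, 1.55, 15.87, 32.55, 17.86]

print(closest_value(31.848, values))  # prints 31.74
```

The same loop translates fairly directly to a VBA For Each over the cells of the range, keeping the smallest distance and the matching cell value in two variables.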

Function to filter values in PySpark

I'm trying to run a for loop in PySpark that needs to filter a variable for an algorithm.
Here's an example of my dataframe df_prods:
+----------+--------------------+--------------------+
|ID | NAME | TYPE |
+----------+--------------------+--------------------+
| 7983 |SNEAKERS 01 | Sneakers|
| 7034 |SHIRT 13 | Shirt|
| 3360 |SHORTS 15 | Short|
I want to iterate over a list of IDs, get the match from the algorithm and then filter on the product's type.
I created a function that gets the type:
def get_type(ID_PROD):
    return [row[0] for row in df_prods.filter(df_prods.ID == ID_PROD).select("TYPE").collect()]
And wanted it to return:
print(get_type(7983))
Sneakers
But I find two issues:
1- It takes a long time (longer than a similar approach took in plain Python).
2- It returns a list of strings, ['Sneakers'], and when I try to filter the products, this happens:
type = get_type(7983)
df_prods.filter(df_prods.type == type)
java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [Sneakers]
Does anyone know a better way to approach this on PySpark?
Thank you very much in advance. I'm having a very hard time learning PySpark.
A small adjustment to your function: this returns the actual string in the target column from the first record found after filtering.
from pyspark.sql.functions import col
def get_type(ID_PROD):
    # Take the TYPE value of the first matching row; raises IndexError if no row matches
    return df_prods.filter(col("ID") == ID_PROD).select("TYPE").collect()[0]["TYPE"]
type = get_type(7983)
df_prods.filter(col("TYPE") == type) # works
I find using col("colname") to be much more readable.
About the performance issue you've mentioned, I really cannot say without more details (e.g. inspecting the data and the rest of your application). Try this syntax and tell me if the performance improves.
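On the performance side, one possible approach, assuming the ID/TYPE table is small enough to collect to the driver, is to collect it once and look IDs up in a plain dictionary instead of running a Spark filter per ID. The sketch below uses a hard-coded list of tuples as a stand-in for df_prods.select("ID", "TYPE").collect() so that it runs without Spark; the name type_by_id is made up for this example.

```python
# Stand-in for df_prods.select("ID", "TYPE").collect().
# In PySpark each element would be a Row, which unpacks like a tuple.
rows = [(7983, "Sneakers"), (7034, "Shirt"), (3360, "Short")]

# Build the lookup table once, outside the loop over IDs
type_by_id = {prod_id: prod_type for prod_id, prod_type in rows}


def get_type(id_prod):
    """Return the product type for id_prod, or None if the ID is unknown."""
    return type_by_id.get(id_prod)


print(get_type(7983))  # prints Sneakers
```

Whether this helps depends on the size of the data; for a large product table a join against a DataFrame of IDs would likely be a better fit than collecting.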

How to split an atomic vector or a data column within a data frame in R

I'm trying to either split one data column within a data frame into multiple columns added onto the existing data frame, or split up the atomic vector I created in an effort to identify individual variables using conditionals.
I'm using a data set that was created in Brazil, so it had many formatting issues that I've already corrected, e.g. commas instead of decimal points, date/time formats, etc. The biggest problem I have now is the final column in the data frame, whose rows contain between 1 and 6 results.
This is what I have so far, and I am receiving this error:
Error in CF_IDs$IDs : $ operator is invalid for atomic vectors
CF_IDs <- NewSet1$IDs[ ((NewSet1$Behaviour == "DP" | NewSet1$Behaviour == "P") & NewSet1$Interaction == "S") ]
str_split_fixed[CF_IDs$IDs, ",", 6]
Right now my data frame looks like this:
Behaviour|Interaction|IDs
P |S |15L,33L,38L
D |N |43L,17L
D |N |9L,10L
I'm trying to split up the IDs column while also not creating an issue with NAs. I want to separate them individually in order to figure out each unique variable out of 52 instead of what is currently out of 403.
*Edit: Turning the last column into multiple rows would also work, but I have no idea how to do that. It would potentially look something like:
Behaviour|Interaction|IDs
P |S |15L
P |S |33L
P |S |38L
D |N |43L
D |N |17L
D |N |9L
D |N |10L
I figured out how to do it and wanted to post in case someone else stumbles upon this.
library(tidyr)
# Separate the IDs column into individual rows, keeping the corresponding Behaviour and Interaction values
NewSet1_IDs <- separate_rows(NewSet1, IDs)
# Remove rows caused by blank space at the end of IDs in the original data set
NewSet1_IDs <- NewSet1_IDs[!(NewSet1_IDs$IDs == ""),]
This gives the desired result.
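For readers who want to see the row-splitting logic spelled out, here is a minimal sketch in plain Python. It is not how tidyr implements separate_rows; it only illustrates the idea on the sample rows from the question. The function name split_ids_into_rows and the trailing comma in the last sample row (added to show the blank-entry case) are assumptions for this example.

```python
def split_ids_into_rows(rows, sep=","):
    """Turn one row with several IDs into one row per ID, skipping blanks."""
    out = []
    for behaviour, interaction, ids in rows:
        for one_id in ids.split(sep):
            one_id = one_id.strip()
            if one_id:  # drop empty entries caused by trailing separators
                out.append((behaviour, interaction, one_id))
    return out


rows = [
    ("P", "S", "15L,33L,38L"),
    ("D", "N", "43L,17L"),
    ("D", "N", "9L,10L,"),  # trailing comma added here to show blank handling
]

for row in split_ids_into_rows(rows):
    print(row)
```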

Include missing dates with missing values with libreoffice-calc

I searched a lot, but didn't find an answer to the following question:
Financial data often come as daily data but with missing dates (weekends, banking holidays ...). I would like the data on a true daily basis, with empty values on the dates that were originally missing.
So far I have done this half-manually in LibreOffice Calc, which takes a lot of time. I didn't find a way to really automate it, as there is no fixed rule for which dates are missing.
Example:
I have:
21/12/18 1
27/12/18 2
28/12/18 3
02/01/19 4
I want:
21/12/18 1
22/12/18
23/12/18
24/12/18
25/12/18
26/12/18
27/12/18 2
28/12/18 3
29/12/18
30/12/18
31/12/18
01/01/19
02/01/19 4
I'm not familiar with LibreOffice Calc. In Excel or Google Sheets, I would use a lookup table.
In one tab of the spreadsheet, I would enter the dates I want in column A. In another tab, I would place the actual data I have. Then, in column B of the first tab, I would look up the value for that day from the data on the second tab.
Assuming the value 1 is in B2, put a start date in, say, D2, and in E2 enter:
=IFERROR(INDEX(B:B,MATCH(D2,A:A,0)),"")
then copy both down to suit.
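The lookup idea in the answers above (generate every date in the range, then look each one up in the original data and leave a blank where nothing is found) can be sketched outside the spreadsheet as well. This is a minimal Python illustration using the sample data from the question; the function name fill_missing_dates is made up for this example.

```python
from datetime import date, timedelta


def fill_missing_dates(data):
    """Return (day, value) pairs for every day from the first to the last date.

    Days that are absent from data get None as their value.
    """
    start, end = min(data), max(data)
    filled = []
    day = start
    while day <= end:
        filled.append((day, data.get(day)))  # None plays the role of the "" in IFERROR
        day += timedelta(days=1)
    return filled


# Sample data from the question (dd/mm/yy)
data = {
    date(2018, 12, 21): 1,
    date(2018, 12, 27): 2,
    date(2018, 12, 28): 3,
    date(2019, 1, 2): 4,
}

for day, value in fill_missing_dates(data):
    print(day.strftime("%d/%m/%y"), "" if value is None else value)
```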

Split Words in the comment text

I am trying to write a macro which will split the comment. My supervisor wants to prioritize the comments, e.g.:
Low : Comment 1
Medium : Comment 2
High: Comment 3
I was able to write a macro to export comments from Word to an Excel file, but I am struggling to add the code that splits the comment text.
The output should be displayed in Excel with the following headings:
Comment ID | Page | Section/Paragraph Name | Comment Scope   | Comment text | Priority | Reviewer    | Comment Date
1          | 1    | 1.1heading1            | example heading | Comment 1    | Low      | BlueDolphin | 1/1/1
2          | 2    | 1.2heading             | example2        | Comment 2    | Medium   | BlueDolphin | 1/1/1
3          | 3    | 1.3heading             | 3example3       | Comment 3    | High     | BlueDolphin | 1/1/1
Any help is much appreciated.
This looks pretty straightforward. Just turn on the Macro Recorder, select the cells of interest, click Data > Text to Columns, and follow the prompts. Turn off the Macro Recorder when done. That's it.
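Text to Columns splits on a delimiter; the same split can be sketched in code. The Python example below is only an illustration of the logic, not the VBA macro the question asks for. It assumes each comment starts with one of the three priorities from the question followed by a colon; the names PRIORITIES and split_priority are made up for this example.

```python
# Priorities named in the question, compared case-insensitively
PRIORITIES = {"low", "medium", "high"}


def split_priority(comment):
    """Split 'Priority : text' into (priority, text).

    Comments without a recognised priority prefix are returned with an
    empty priority and their text unchanged (apart from trimming).
    """
    prefix, sep, rest = comment.partition(":")
    if sep and prefix.strip().lower() in PRIORITIES:
        return prefix.strip(), rest.strip()
    return "", comment.strip()


for raw in ["Low : Comment 1", "Medium : Comment 2", "High: Comment 3"]:
    print(split_priority(raw))
```

Checking the prefix against a known list keeps a comment that merely contains a colon from being split by mistake.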

Libreoffice Calc Finding MAX from a subset of results

I have a Libreoffice Calc workbook for tracking writing, with 3 sheets in it. 'Time Tracking', 'Time Summary' and 'Yearly Stats'. 'Time Tracking' is where user data is entered, 'Time Summary' is a pivot table for 'Time Tracking'; and 'Yearly Stats' shows long-term progress.
Time Summary (running off some test data) looks a bit like this:
|Column A (Weeks) | ... |Column M (Total Words)
-------+-----------------------+-----+----------------------
Row 7 |02/10/17 - 08/10/17 | |3500
Row 8 |13/11/17 - 19/11/17 | |2300
Row 9 |30/04/18 - 06/05/18 | |1000
Row 10 |30/10/17 - 05/11/17 | |700
Yearly Stats looks like this:
|A |B |C
-------+--------------------+--------+----
Row 1 | |2017 |2018
Row 2 |Total Words |6500 |1000
...
Row 7 |Max Words (Week) |3500 |3500
The formula for 'Yearly Stats'.B7:C7 is currently =MAX($'Time Summary'.$M$7:$M$10), but I need to modify it to filter by the year on the column heading.
https://ask.libreoffice.org/en/question/62260/minif-and-maxif-function-in-calc/ looked useful, but when I tried it, MAX returned the maximum of ROW (that is, 10) instead of ROW giving the position of the maximum value, even though it seems to work in the example file from the link.
The example formula is:
=IFERROR(INDEX($Sheet1.$J$2:$J$13,MAX(ROW($Sheet1.$J$2:$J$13)*($Sheet1.$A$2:$A$13=A2))-1,1),NA())
My formula uses RIGHT() to compare the last two characters of the column heading with the last two characters of the week in $'Time Summary'.$A$7:$A$10 and is:
=IFERROR(INDEX($'Time Summary'.$M$7:$M$10,MAX(ROW('Time Summary'.$M$7:$M$10)*(RIGHT($'Time Summary'.$A7:$A$10,2)=RIGHT(B1,2)))-6,1),NA())
I have, of course, remembered to press CTRL+SHIFT+ENTER as the instructions say, to get the array in the formula to work.
So that's the explanation of my problem. What is it that I'm getting wrong?
Ok, this is a bit long-winded, but I've managed to solve the problem by using the following formula:
=IF(MAX(IF(RIGHT(INDIRECT(CONCATENATE("$'Time Summary'.$A7:$A$",COUNTIF($'Time Summary'.$A:$A,"<>''")+2)),2)=RIGHT(B1,2),INDIRECT(CONCATENATE("$'Time Summary'.$Q$",ROW(INDIRECT(CONCATENATE("$'Time Summary'.$Q7:$Q$",COUNTIF($'Time Summary'.$Q:$Q,"<>''")+5))))),0))>0,MAX(IF(RIGHT(INDIRECT(CONCATENATE("$'Time Summary'.$A$7:$A$",COUNTIF($'Time Summary'.$A:$A,"<>''")+2)),2)=RIGHT(B1,2),INDIRECT(CONCATENATE("'Time Summary'.$Q",ROW(INDIRECT(CONCATENATE("$'Time Summary'.$Q$7:$Q$",COUNTIF($'Time Summary'.$Q:$Q,"<>''")+5))))),0)),NA())
It is wrapped in an IF that replaces any 0 result with #N/A (just for neatness of output).
Also, the right-hand end of each range is calculated to find where the bottom row is, leaving out the total, which is another reason the formula is so large.
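Stripped of the range calculations, the formula is a conditional maximum: keep only the weeks whose label ends in the same two digits as the year heading, then take the largest word count. A minimal Python sketch of that logic, using the test data from the question (the function name max_words_for_year is made up for this example):

```python
def max_words_for_year(rows, year):
    """Return the largest word count among weeks ending in the given year.

    rows holds (week_label, total_words) pairs; the year is matched on the
    last two characters of the label, like RIGHT(..., 2) in the formula.
    Returns None when no week matches (the formula shows #N/A instead).
    """
    suffix = str(year)[-2:]
    matches = [words for label, words in rows if label.strip()[-2:] == suffix]
    return max(matches) if matches else None


# 'Time Summary' test data from the question (columns A and M)
weeks = [
    ("02/10/17 - 08/10/17", 3500),
    ("13/11/17 - 19/11/17", 2300),
    ("30/04/18 - 06/05/18", 1000),
    ("30/10/17 - 05/11/17", 700),
]

print(max_words_for_year(weeks, 2017))  # prints 3500
print(max_words_for_year(weeks, 2018))  # prints 1000
```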