VBA: Intersect vs. Match method

I have a question about two built-in VBA functions, .Match and .Intersect. I currently have two one-dimensional arrays whose information I want to consolidate into a new array. I realize I posted a question about the approach to this problem earlier, but this question is about which method would be better. Would one of them consolidate the information into a new array faster than the other? And is one method more reliable than the other?

From Excel help
Excel Developer Reference
Application.Intersect Method
Returns a Range object that represents the rectangular intersection of two or more ranges.
Arrays are not ranges, so Intersect is not applicable to your question as stated.
A more detailed explanation of what you are trying to do, and of what form your raw data is in, will allow better advice.

If you are merging your two arrays in VBA, then .Match and .Intersect do not behave the same way: you won't be able to merge with Match, you will only be able to find a value.
Hence, I would say use the Intersect method, but note that it only accepts Range objects, so it applies only if your data sits in worksheet ranges rather than in VBA arrays.
If you want a more precise answer, please tell us more precisely what you want to do with your arrays, with examples and the code you have already built.
Regards,
Max

Intersect is a method for finding the intersection of two or more ranges: it won't work with arrays. It returns the subset range that is the intersection of the range arguments.
Unless your arrays are sorted, it would probably be more efficient to just loop through and compare the arrays than to use .Match.
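To illustrate the point that Match, unlike Intersect, can be applied to VBA arrays, here is a minimal sketch. It assumes that "consolidate" means collecting the values present in both arrays, which the question does not confirm. The function name CommonValues is made up for this example, and Application.Match is only available when the code runs inside Excel.

```vba
Option Explicit

' Hypothetical helper: returns the values of arrA that also occur in arrB.
' Assumes both arguments are non-empty one-dimensional arrays.
Public Function CommonValues(arrA As Variant, arrB As Variant) As Variant
    Dim result() As Variant
    Dim n As Long
    Dim i As Long

    ' Size the result for the worst case, then trim it once at the end.
    ReDim result(0 To UBound(arrA) - LBound(arrA))

    For i = LBound(arrA) To UBound(arrA)
        ' Application.Match returns an error Variant when nothing matches.
        If Not IsError(Application.Match(arrA(i), arrB, 0)) Then
            result(n) = arrA(i)
            n = n + 1
        End If
    Next i

    If n = 0 Then
        CommonValues = Array()              ' nothing in common
    Else
        ReDim Preserve result(0 To n - 1)
        CommonValues = result
    End If
End Function

Public Sub DemoCommonValues()
    Dim v As Variant
    ' Prints 4, 6 and 9 to the Immediate window.
    For Each v In CommonValues(Array(2, 4, 6, 8, 9), Array(4, 6, 9))
        Debug.Print v
    Next v
End Sub
```

Each Match call scans arrB from the start, so this is not inherently faster than a hand-written nested loop. Which one wins would have to be measured on your own data.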

Related

performing intersections between lists of numbers in vba

I'm new to VBA (I only know R) and I need to perform the operation below, but I don't know how to do it:
I need to create two distinct lists of numbers, for example A={2,4,6,8,9} and B={4,6,9}, and then I need to create a third variable that contains only the numbers of A that are not in B: C={2,8}.
How can I do that?
Thank you
VBA does not provide a method or structure for doing that. You will need to loop through the collections and test each value against every other value.
You may use a Dictionary object or an array (or a spreadsheet) to hold the sets. You may use nested For...Next loops to loop through the sets.
VBA is a simple and powerful language. A prime reason people use R and Python, in spite of their complexities and difficulties, is that VBA is a very old language that does not include native methods for handling multi-dimensional sets.
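As a sketch of the Dictionary approach mentioned above, the following computes the numbers of A that are not in B. The function name SetDifference is made up for this example. It uses late binding to Scripting.Dictionary, which assumes Windows, since that object is not available in Excel for Mac.

```vba
Option Explicit

' Hypothetical helper: returns the elements of setA that are not in setB.
Public Function SetDifference(setA As Variant, setB As Variant) As Variant
    Dim inB As Object
    Dim result As Object
    Dim v As Variant

    ' Late binding, so no library reference is required (Windows only).
    Set inB = CreateObject("Scripting.Dictionary")
    Set result = CreateObject("Scripting.Dictionary")

    ' Index every element of B for fast membership tests.
    For Each v In setB
        inB(v) = True
    Next v

    ' Keep the elements of A that were not seen in B.
    ' Using them as keys also drops duplicates.
    For Each v In setA
        If Not inB.Exists(v) Then result(v) = True
    Next v

    SetDifference = result.Keys             ' zero-based Variant array
End Function

Public Sub DemoSetDifference()
    Dim v As Variant
    ' Prints 2 and 8 to the Immediate window.
    For Each v In SetDifference(Array(2, 4, 6, 8, 9), Array(4, 6, 9))
        Debug.Print v
    Next v
End Sub
```

This replaces the nested loops with one pass over each list, at the cost of depending on the Dictionary object.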

matching two columns in excel with slight difference in the spelling

I am working on huge Excel sheets from different sources about the same thing. The sources report and write down information differently. So, for example, one would write the location as "Khurais" whereas the other would write it as "Khorais".
Since both of these files contain important information, I would like to combine them in one Excel sheet so that I can deal with them more easily. So if you have any suggestion or tool that you think would be beneficial, please share it here.
P.s. The words in the excel sheet are translations of Arabic words.
You could use Levenshtein distance to determine if two words are "close" to each other. Based on that you could match.
You could use FuzzyLookup, a macro that allows you to do appropriate matching. It worked really well for me in the past and is actually really well documented.
You can find it here: https://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html including examples on how to use it.
Hope that helps!
PS: obviously you can also use it strictly within VBA (without using worksheet functions).
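For the Levenshtein distance suggested above, a minimal VBA sketch could look like the following. It is a plain textbook dynamic-programming version, not the code of the FuzzyLookup macro linked above. The comparison is case-sensitive, and the function name is my own.

```vba
Option Explicit

' Number of single-character insertions, deletions and substitutions
' needed to turn s into t (case-sensitive).
Public Function Levenshtein(ByVal s As String, ByVal t As String) As Long
    Dim d() As Long
    Dim i As Long, j As Long
    Dim cost As Long
    Dim best As Long

    ReDim d(0 To Len(s), 0 To Len(t))

    ' Distance from a prefix to the empty string is the prefix length.
    For i = 0 To Len(s)
        d(i, 0) = i
    Next i
    For j = 0 To Len(t)
        d(0, j) = j
    Next j

    For i = 1 To Len(s)
        For j = 1 To Len(t)
            If Mid$(s, i, 1) = Mid$(t, j, 1) Then cost = 0 Else cost = 1

            best = d(i - 1, j) + 1                      ' deletion
            If d(i, j - 1) + 1 < best Then
                best = d(i, j - 1) + 1                  ' insertion
            End If
            If d(i - 1, j - 1) + cost < best Then
                best = d(i - 1, j - 1) + cost           ' substitution
            End If

            d(i, j) = best
        Next j
    Next i

    Levenshtein = d(Len(s), Len(t))
End Function
```

Placed in a standard module, it can also be called from a cell, for example =Levenshtein(A2,B2). For "Khurais" and "Khorais" it returns 1. What distance counts as "close" is a threshold you would have to choose for your data.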
The Double Metaphone algorithm springs to mind. It attempts to convert strings into phonetic representations. For example, "Folly" and "Pholee" should have the same phonetic code.
If you could generate these codes, you could then match your records based on them, instead of the strings.
Here's an article that explains, along with sample VBA code:
https://bytes.com/topic/access/insights/965241-fuzzy-string-matching-double-metaphone-algorithm
Hope that inspires you :)

Excel or Numbers, How Populate adjacent Column?

So I feel like this is a pretty simple question, but I cannot for the life of me find the answer, here or elsewhere.
I'm trying to autopopulate a column with custom text. I suppose it would be the row adjacent.
Thought vlookup was the solution, but I'm rusty.
Basically it's financial: if the Description contains, say, "Amazon" or "Subway", I'd like to populate the adjacent cell with "Amazon" or "Online Shopping", or "Subway" or "Fast Food".
I'm using Numbers but assume that Excel advice would apply for such a (seemingly) simple task.
Make sense?
Also, hope I formatted the image correctly.
Ok thanks!
Just looking at the sample data, I can see a pattern that emerges from these transactions. My first thought would be to jump to VBA for Excel, but VBA is not available in Numbers.
VLOOKUP will only work with Range_Lookup set to TRUE, which means it will try to find the closest match. This might lead to incorrect matches being returned, or to problems with the requirement that the table array being queried is sorted.
The only other thing that came to mind, which would work for a single query value such as "Amazon" or "Subway", would be to use a formula that checks whether that substring is found in the Description column for each cell. This would be something like:
=IF(FIND("Amazon",D1)>0,"Amazon","")
The problem with this is that it only checks for one value and has no error handling, so each string that is checked without the word "Amazon" in it will return a #VALUE! error in Excel.
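If Excel with VBA is an option (this will not work in Numbers), a small user-defined function can check several keywords and return an empty string instead of an error. This is only a sketch: the function name CategoryFor and the keyword-to-category pairs are assumptions based on the examples in the question.

```vba
Option Explicit

' Hypothetical helper: returns a category for a transaction description,
' or "" when no keyword is found.
Public Function CategoryFor(ByVal description As String) As String
    Dim keywords As Variant
    Dim categories As Variant
    Dim i As Long

    ' Assumed mapping; keywords(i) maps to categories(i).
    keywords = Array("Amazon", "Subway")
    categories = Array("Online Shopping", "Fast Food")

    For i = LBound(keywords) To UBound(keywords)
        ' vbTextCompare makes the substring test case-insensitive.
        If InStr(1, description, keywords(i), vbTextCompare) > 0 Then
            CategoryFor = categories(i)
            Exit Function
        End If
    Next i

    CategoryFor = ""                        ' no keyword matched
End Function
```

With the function in a standard module, =CategoryFor(D1) can be filled down the adjacent column. The first matching keyword wins, so the order of the keywords matters when a description contains more than one.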

Naming array dimensions in Excel VBA

I'm working with three-dimensional arrays and it would be neat if I could name the array dimensions. The question marks in the example below give me the idea that this is possible.
Is it, and if so, how does it work? I can't seem to find it anywhere.
The three question marks are showing you that this array has three dimensions. If there were only one question mark, it would mean that the variable was declared as one-dimensional. This is built into VB and can't be changed, as far as I know.
I think there's real value in making your code more readable and self-documenting. If I had a three-dimensional array, I would probably create some custom class modules to model the objects that I was using.
If your first dimension is a SchoolID, your second dimension is a ClassID, and your third dimension is a StudentID, then code using custom class modules like this
Debug.Print Schools(10).Classes(3).Students(7).Name
is more readable than
Debug.Print arrLeeftijdenG5(10,3,7)
I don't know what you're storing, so it's just an example. Consider using custom class modules to model the real-world objects your code is manipulating. There's a bit more setup involved, but it pays dividends down the road.
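Class modules each live in their own file, so to keep an illustration in a single standard module, the sketch below uses nested user-defined Types instead. That is a lighter substitute for the class-module design described above, not the same thing. The type names and the sample value are made up, and the sizes match the indices in the example.

```vba
Option Explicit

' Each level of the former three-dimensional array gets a named type.
Private Type Student
    Name As String
End Type

Private Type SchoolClass
    Students() As Student
End Type

Private Type School
    Classes() As SchoolClass
End Type

Public Sub DemoNamedLevels()
    Dim Schools(1 To 10) As School

    ' Size the nested arrays for the one school and class used here.
    ReDim Schools(10).Classes(1 To 3)
    ReDim Schools(10).Classes(3).Students(1 To 7)

    Schools(10).Classes(3).Students(7).Name = "Example student"

    ' Same access path as in the class-module example.
    Debug.Print Schools(10).Classes(3).Students(7).Name
End Sub
```

Types cannot carry methods or validation, so class modules remain the better fit once the objects need behaviour of their own.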

How to invoke UDF for each element in an array in hive?

I have a Hive table with one column being an array of strings. I also have a set of custom UDFs that manipulate individual strings. I would like to make Hive execute my custom UDF on each element in an array and then return the result as a modified array.
This seems like a simple requirement, but I wasn't able to find a simple solution for it. I found two possibilities, neither of them really simple:
Do some Hive SQL gymnastics with explode and lateral view, then invoke the UDF, then aggregate back into an array. This seems like overkill, as I don't see it executing in fewer than 2 MapReduce jobs (but I could be wrong here).
Implement each of my UDFs as a GenericUDF that is supplied with an array, processes each element in it, and returns an array again. This requires a lot more development.
Is there any simple way to do this?
There's no way I know of to do it without either more custom UDF code or, as you say, more MR jobs.
But I would suggest a possible third option: write a GenericUDF that takes two arguments, an array and the class name of another UDF. Instantiate and call that UDF through reflection, pass it every element of the array, and return the resulting array. This might be a bit difficult to write, but at least then you won't have to rewrite all of your existing UDFs, as you mentioned.