VBA: Efficient Vlookup from another Workbook

I need to do a VLOOKUP from another workbook on about 400,000 cells with VBA. These cells are all in one column, and the results shall be written into one column. I already know how VLOOKUP works, but my runtime is much too high when using AutoFill. Do you have a suggestion for how I can improve it?

Don't use VLOOKUP; use INDEX/MATCH: http://www.randomwok.com/excel/how-to-use-index-match/
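Another common speedup for this volume in VBA is to read both ranges into arrays once and resolve each value through a Scripting.Dictionary, rather than issuing 400,000 worksheet-function calls. Here is a minimal sketch of the idea in Python (the data and names are illustrative, not from the question):

```python
# Build a hash map once, then each of the 400,000 lookups is O(1),
# instead of scanning the lookup table for every single cell.

lookup_rows = [("A1", 10), ("B2", 20), ("C3", 30)]  # (key, value) pairs from the source workbook
keys_to_resolve = ["B2", "C3", "missing"]           # the 400,000-cell column, in miniature

index = {key: value for key, value in lookup_rows}  # one O(n) pass over the lookup table

# Unknown keys get "#N/A", mirroring what VLOOKUP would report.
results = [index.get(key, "#N/A") for key in keys_to_resolve]
```

The same pattern in VBA (range to Variant array, Scripting.Dictionary, write the result array back in one assignment) avoids both AutoFill and per-cell worksheet calls.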

If you are able to adjust what the data looks like a slight amount, you may be interested in using a binary search. It's been a while since I last used one (writing code for a group exercise check-in program). https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/implementing-binary-search-of-an-array was helpful in setting up the idea behind it.
If you are able to sort the data in some order, say by last name (I'm not sure what data you are working with), you can then use that ordering for the binary search.
Edit:
The reasoning for a binary search is the computational time it takes: the number of iterations is log2(400000) rather than 400000. So instead of up to 400,000 iterations, it would take at most 19 with a binary search. As you can see, the more data you have, the bigger the speedup a binary search yields.
This would only be a beneficial way if you are able to manipulate the data in such a way that would allow you to use a binary search.
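The search described above can be sketched in a few lines; the array contents here are illustrative:

```python
import math

def binary_search(sorted_keys, target):
    """Return the index of target in sorted_keys, or -1 if absent.

    Each iteration halves the search interval, hence the log2 bound.
    """
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid
        elif sorted_keys[mid] < target:
            lo = mid + 1   # target lies in the upper half
        else:
            hi = mid - 1   # target lies in the lower half
    return -1

# For n = 400000 the loop body runs at most ceil(log2(n)) = 19 times.
max_iterations = math.ceil(math.log2(400000))
```

The same loop translates directly to VBA over a sorted Variant array; the only requirement is that the lookup column is sorted.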
So, if you can give us a bit more background on what data you are using and any restrictions you have with that data we would be able to give more constructive feedback.

Related

VBA - what is faster? Case statements or If Statements

I am writing some code for a very large spreadsheet, where certain things need to go in specific places, and I am wondering what sort of statement would be faster, and what the pros and cons of each are.
Here is my dilemma -
There is a massive table, which could be ranging from column A to column CD, or some other column making it disgusting to look at in one go.
I am trying to create a "lastrow" variable (Integer) to describe the bottom row of data.
At the moment I have a very inefficient loop, reading up from the bottom - "Cells(Rows.Count, 1).End(xlUp).Row", sort of thing -
and it currently re-writes the lastrow variable if it finds a larger number. I just want a quick and easy way to find the last used row.
Is looping the way forward with this?
Kind Regards,
lewisthegruffalo
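For what it's worth, one common approach is to take, for each column, `Cells(Rows.Count, c).End(xlUp).Row` and keep the maximum; also note that a row counter should be declared As Long rather than As Integer, since Integer overflows at 32,767 and sheets have far more rows. The bottom-up scan can be sketched like this in Python (the grid contents are illustrative):

```python
def last_used_row(grid):
    """Return the 1-based index of the last row with any non-empty cell.

    grid is a list of rows, each a list of cell values ("" = empty),
    mimicking scanning from the bottom of the sheet with End(xlUp).
    """
    for r in range(len(grid) - 1, -1, -1):   # walk upward from the bottom
        if any(cell != "" for cell in grid[r]):
            return r + 1                      # convert to 1-based, like Excel rows
    return 0                                  # sheet is completely empty

grid = [
    ["a", "",  "x"],
    ["",  "b", ""],
    ["",  "",  ""],   # trailing blank row, as below real data
]
```

In VBA you would not loop cell by cell like this; End(xlUp) jumps straight to the last used cell per column, so only one check per column is needed.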

Pig: how to loop through all fields/columns?

I'm new to Pig. I need to do some calculation for all fields/columns in a table. However, I can't find a way to do it by searching online. It would be great if someone here can give some help!
For example: I have a table with 100 fields/columns, most of them numeric. I need to find the average of each field/column. Is there an elegant way to do it without repeating AVG(column_xxx) 100 times?
If there's just one or two columns, then I can do
B = GROUP A ALL;
C = FOREACH B GENERATE AVG(A.column_1), AVG(A.column_2);
However, if there are 100 fields, it's really tedious to write AVG 100 times, and it's easy to make errors.
One way I can think of is embedding Pig in Python and using Python to generate such a string and compile it. However, that still sounds weird even if it works.
Thank you in advance for help!
I don't think there is a nice way to do this in Pig. However, this should work well enough and can be done in 5 minutes:
DESCRIBE the table (or alias) in question
Copy the output and reorganize it manually into the script part you need (for example with Excel)
Finish and store the script
If you need to cope with columns that can suddenly change etc., there is probably no good way to do it in Pig. Perhaps you could read in all the columns (in R, for example) and do your operation there.
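The string-generation idea mentioned in the question is actually quite workable: take the column names from DESCRIBE and build the GENERATE line mechanically. A Python sketch (the column names are illustrative):

```python
# Build the FOREACH ... GENERATE line for many columns instead of
# typing AVG(...) 100 times by hand.
numeric_columns = ["col_%d" % i for i in range(1, 101)]   # e.g. parsed from DESCRIBE output

# Pig aggregates over a grouped relation reference the original alias, hence "A.".
projections = ", ".join("AVG(A.%s)" % c for c in numeric_columns)
script_line = "C = FOREACH B GENERATE %s;" % projections
```

The generated line can be pasted into the script, or fed to Pig programmatically if you are embedding it anyway.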

What do I need to do in order to structure these results?

I'm trying to learn VBA, but it's very different from the type of programming I'm used to. For that reason I would appreciate your guidance.
I want to structure the results I get from simulations. There are screenshots below illustrating what I'm trying to describe with words here:
What I want to do is:
Copy all the results from one sheet to a new sheet (to keep the original data).
Delete certain columns, for instance B & D:E
Move (or copy, doesn't matter) rows 30:38 up besides rows 11:19, with one empty column in between. The result will be as shown in the last figure below. (The number of rows in each block varies, and there are 4 blocks).
I don't know if these are the recommended procedures, but I know I can:
Delete columns this way:
Sub sbDeleteAColumnMulti()
Columns("D").Delete
End Sub
Copy/paste a range like this:
Sub copyRangeOver()
Dim i As Integer
i = 6
Dim copyRange As Range
Set copyRange = ThisWorkbook.Worksheets("Sheet2").Range("A" & i + 1 & ":CA" & i + 1)
Dim countD As Integer
countD = 10
copyRange.Copy Destination:=Cells(countD, 2)
End Sub
A few things that are complicating stuff for me: the headers (except the first one, Bus ( A ) -LL Fault) are shifted one column to the right (they are now above Ik'', not Bus Name).
I don't know in advance how many rows are in each "block", thus I need to check this (but I know there are only 4 "blocks"). All "blocks" are the same size, so I can just check the number of rows between two Bus Names.
Now, I don't want someone to write me a code! What I hope someone will help me with is to suggest a procedure I can follow for this to work. I know that many roads lead to Rome, and I see that this question might come off as either a "Primarily opinion-based question" or "Too broad". However, I think it's a legitimate question that belongs here. I'm not trying to start a debate over what the "best" way of doing this is; I just want to find a way that works, as I currently don't know where to start. I'm not afraid of "the complicated way" if it's more robust and cleaner.
What I don't know is what kind of Modules, Class Modules (if any), etc. I need. Do I need Collections, or Public/Private subs? What would the purpose of each of those be in this case?
What I start with: (Edit: none of the cells are merged, it's just a bunch of whitespaces)
What I want:
Update:
Here's the first chunk of code I get when recording a macro (note that my workbook has more columns and rows than in the example I gave):
Range("D:I,K:M,O:P").Select
Range("O1").Activate
Selection.Delete Shift:=xlToLeft
ActiveWindow.SmallScroll Down:=39
Range("C52:E78").Select
Selection.Copy
ActiveWindow.SmallScroll Down:=-42
Range("G13").Select
ActiveSheet.Paste
ActiveWindow.SmallScroll Down:=84
Range("C91:E117").Select
To me, this looks like a piece of crap. Of course, it's possible that I should have created the macro differently, but if I did it the right way, I don't think it's much to work with. I guess I can delete all the SmallScroll-lines, but still...
Also, I don't see how I can adapt this, so that it will work if I have a different number of rows in each block.
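Regarding the varying number of rows per block: instead of hard-coding ranges like C52:E78, the block boundaries can be computed from the key column (runs of non-blank Bus Name cells separated by blanks). A Python sketch of that detection step (the column contents are illustrative):

```python
def find_blocks(key_column):
    """Return (start, end) 0-based index pairs for runs of non-empty cells.

    key_column mimics the Bus Name column; blocks are separated by blank cells.
    """
    blocks, start = [], None
    for i, value in enumerate(key_column):
        if value != "" and start is None:
            start = i                       # a new block begins
        elif value == "" and start is not None:
            blocks.append((start, i - 1))   # the block ended on the previous row
            start = None
    if start is not None:                   # block runs to the end of the column
        blocks.append((start, len(key_column) - 1))
    return blocks

column = ["Bus1", "Bus2", "", "", "Bus3", "Bus4", "Bus5", ""]
```

With the four (start, end) pairs in hand, the copy destinations from the recorded macro become simple arithmetic instead of fixed cell addresses.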
To get this, you're going to want to start with using the Macro Recorder from Excel.
If you are doing the exact same formatting options for the exact same data output each time, this is by far your best bet. The recorder will copy whatever you do for formatting and write the code you need. It may not be the best code but it will by far be the best option for what you are describing.
If (when?) you need to start adding logic other than the same formatting, you will then have functional code which will make your life easier.
But isn't the macro recorder going to generate bad code and/or it's better to just code from scratch?
I'm fairly experienced at this point and often use the macro recorder because... while it does put in a lot of code which isn't strictly speaking necessary, it gets you a ton of the more obscure stuff (how do I format the cell border to be this way?) etc. Of course it's better to not only use the recorder, but for your example it's especially well suited: you get all the formatting recorded and can then modify the logic without wasting time figuring out the syntax for formatting, deleting columns, etc.
Very few languages offer the ability to basically say, "I want to do what I am doing now programmatically - how can I start?" the way VBA does. You can bypass a lot of annoying syntax issues when learning (especially if you've previously done any sort of coding) and focus right on the logic you want to add. It works out pretty well, honestly.

VBA: Performance of multidimensional List, Array, Collection or Dictionary

I'm currently writing code to combine two worksheets containing different versions of data.
To do this, I first want to sort both via a key column, combine 'em, and subsequently mark changes between the versions in the output worksheet.
As the data already amounts to several 10,000 lines and might some day exceed the rows-per-worksheet limit of Excel, I want these calculations to run outside of a worksheet. It should also perform better.
Currently I'm thinking of a Quicksort of first and second data and then comparing the data sets per key/line. Using the result of the comparison to subsequently format the cells accordingly.
Question
I'd just love to know, whether I should use:
List OR Array OR Collection OR Dictionary
OF Lists OR Arrays OR Collections OR Dictionaries
I have so far been unable to determine the differences in codability and performance between these 16 possibilities. Currently I'm implementing an Array OF Arrays approach, constantly wondering whether this makes sense at all.
Thanks in advance, appreciate your input and wisdom!
Some time ago, I had the same problem with a client's macro. In addition to the really big number of rows (over 50,000 and growing), it had the problem of becoming tremendously slow from a certain row number (around 5,000) when a "standard approach" was taken, that is, when the inputs for the calculations on each row were read from the same worksheet (a couple of rows above); this process of reading and writing was what made everything slower and slower (apparently, Excel starts from row 1, and the lower the row, the longer it takes to reach it).
I improved this situation with two different changes: firstly, setting a maximum number of rows per worksheet; once reached, a new worksheet was created and the reading/writing continued there (from the first rows). The other change was moving from reading/writing in Excel to reading from temporary .txt files and writing to Excel (all the lines were read right at the start to populate the files). These two modifications improved the speed a lot (from half an hour to a couple of minutes).
Regarding your question, I wouldn't rely too much on arrays in a macro (although I am not sure how much information each of these 10,000 lines contains); but I guess that this is a personal decision. I don't like collections too much because they are less efficient than arrays; and the same goes for dictionaries.
I hope that this "short" comment will be of some help.
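For what it's worth, if a keyed lookup structure is acceptable, the two versions can be compared without sorting at all: index both data sets by the key column and classify each key. In VBA a Scripting.Dictionary plays this role; here is the idea sketched in Python (the sample rows are illustrative):

```python
def diff_versions(old_rows, new_rows):
    """Compare two keyed data sets and classify each key.

    old_rows/new_rows are (key, value) pairs; returns
    {key: "added" | "removed" | "changed" | "same"}.
    """
    old = dict(old_rows)
    new = dict(new_rows)
    status = {}
    for key in old.keys() | new.keys():      # union of all keys seen in either version
        if key not in old:
            status[key] = "added"
        elif key not in new:
            status[key] = "removed"
        elif old[key] != new[key]:
            status[key] = "changed"
        else:
            status[key] = "same"
    return status

v1 = [(1, "a"), (2, "b"), (3, "c")]
v2 = [(2, "b"), (3, "x"), (4, "d")]
```

This replaces the Quicksort-plus-linear-compare plan with two O(n) passes, and the resulting status map can drive the cell formatting directly.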

Model clause in Oracle

I have recently become interested in Oracle, and the more I look into it, the more it attracts me.
I have recently come across the MODEL clause but, to be honest, I am not understanding its behaviour. Could anyone explain it with some examples?
Thanks in advance
Some examples of MODEL are given here.
Personally I've looked at MODEL several times and not yet succeeded in finding a use case for it. While it first appears to be useful, there are a lot of places where only literals work (rather than binds or variables), which restricts its flexibility. For example, in inter-row calculations, you can't readily refer to the 'previous' or 'next' row but have to be able to absolutely identify it by its attributes. So you can't say 'take the value of the row with the same date in the previous month'; you can only code a specific date.
It might be used (internally) by some analytical tools. But as an end-user tool, I never 'got' it. I have previously recommended that, if you ever find a problem you think can be solved by the application of the MODEL clause, go and have a lie down until the feeling wears off.
I think the MODEL clause is quite simple to understand, when you slowly read the official whitepaper. In my opinion, the whitepaper nicely explains the MODEL clause step by step, adding one feature at a time to the examples, leaving out the most advanced features to the official documentation.
From the whitepaper, I also find it easy to understand when to actually use the MODEL clause. In some examples, it is a lot simpler to do "Excel-spreadsheet-like" operations using MODEL rather than, for instance, using window functions, CONNECT BY, or subquery factoring. Think about Excel. Whenever you want to define a complex rule set for Excel columns, use the MODEL clause. Example Excel spreadsheet rules:
A10 = A9 + A8
B10 = A10 * 5
C10 = MAX(A1:A9)
D10 = C10 / A10
In other words, MODEL is a very powerful SQL spreadsheet!
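Those four rules behave like ordinary assignments over named cells, which is exactly how MODEL evaluates its rule set. A Python sketch that evaluates them over illustrative data (A1:A9 holding the values 1 through 9):

```python
# Evaluate the four spreadsheet-style rules from the example above.
# A1..A9 hold sample data; the rules then derive A10, B10, C10, D10.
A = {i: float(i) for i in range(1, 10)}   # A1..A9 = 1.0 .. 9.0

A10 = A[9] + A[8]                         # A10 = A9 + A8
B10 = A10 * 5                             # B10 = A10 * 5
C10 = max(A[i] for i in range(1, 10))     # C10 = MAX(A1:A9)
D10 = C10 / A10                           # D10 = C10 / A10
```

In the MODEL clause these would appear as rules like `a[10] = a[9] + a[8]` inside RULES, with the dimension and measure columns declared in DIMENSION BY and MEASURES.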
The best explanation is in the official white paper. It uses the SH demo schema and you really need it installed.
http://www.oracle.com/technetwork/middleware/bi-foundation/10gr1-twp-bi-dw-sqlmodel-131067.pdf
I don't think they do a very good job explaining this. It basically lets you load data up into an array and then loop through the array using straight SQL, instead of having to write procedural logic. A lot of the terms are based on spreadsheet terms (they are used in the Excel Help), so if you haven't used them in Excel, this will be confusing.
They should have drawn a picture for each of the queries, shown the array created, and then shown how you loop through the array. The syntax looks to be based on Excel syntax. I'm not sure whether this is common to all spreadsheet tools.
It has uses. Bin fitting is the most common; see the 2nd example. This is basically a complex GROUP BY where you are grouping by a range, but that range can change, which requires procedural logic. The example gives 3 ways to do it, one of which is the MODEL clause.
http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html
I think people (often managers) who do complex spreadsheet calculations may have an easier time seeing uses for this and picking up the lingo.
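For readers unfamiliar with the bin-fitting problem mentioned above: it groups consecutive rows into bins where each bin's running total may not exceed a capacity, so the group boundaries depend on the data itself, which is why a plain GROUP BY can't express it. A greedy Python sketch of the idea (the capacity and values are illustrative):

```python
def bin_fit(values, capacity):
    """Greedy bin fitting: start a new bin when the running total
    would exceed capacity. Returns a bin number for each value."""
    bins, total, current = [], 0, 1
    for v in values:
        if total + v > capacity:
            current += 1    # close the bin and open the next one
            total = 0
        total += v
        bins.append(current)
    return bins
```

The MODEL-clause version in the linked Ask Tom article carries the running total in a measure and resets it with a rule, which is the same state machine expressed declaratively.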