I know there is no way to thread or do anything fancy like that in VBA, but when I recently ran a find-duplicates macro in VBA on about 60,000 lines of Excel data, I opened Task Manager's performance tab and found all four cores of my processor under heavier-than-normal load: one core was at about 80% and the other three were at about 60%.
This is rather high, given that right now none of them is going over 15%.
So my question is: how does my sequential VBA macro get divided into four parallel parts that tax each processor core?
Assume I have a CPU running at a constant rate, pulling an equal amount of energy per instruction. I also have two functionally identical programs, which result in the same output, except one has been optimized to execute only 100 instructions, while the other program executes 200 instructions. Is the 100 instruction program necessarily faster than the 200 instruction program? Does a program with fewer instructions draw less power than a program with more instructions?
Things are much more complex than this.
For example, execution speed is in many cases dominated by memory access. As a practical example, one piece of code could process the pixels of an image first by rows and then by columns... a different piece of code could be more complex but process rows and columns in a single pass.
The second version could execute more instructions because of the more complex housekeeping of the data, but I wouldn't be surprised if it were faster, because of how memory is organized: reading an image one column at a time is going to "thrash the cache", and it's very possible that despite being simple, the code working that way could be a LOT slower than the more complex one doing the processing in a memory-friendly order. The simpler code may end up "stalled" a lot, waiting for cache lines to be filled or flushed to external memory.
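To make the cache effect concrete, here is a minimal VB.NET sketch (the array size is arbitrary, just chosen to be much larger than any CPU cache) that sums the same 2D array twice, once row by row and once column by column, and times both passes. Since .NET stores multidimensional arrays with the last index contiguous, the first pass walks memory sequentially while the second jumps a full row between accesses:

```vbnet
Imports System.Diagnostics

Module CacheDemo
    Sub Main()
        ' Illustrative size: 4000 x 4000 Doubles is ~128 MB, far larger than any CPU cache.
        Const Rows As Integer = 4000
        Const Cols As Integer = 4000
        Dim image(Rows - 1, Cols - 1) As Double
        Dim sum As Double = 0
        Dim sw As New Stopwatch()

        ' Row-major pass: consecutive accesses hit the same cache line.
        sw.Start()
        For r As Integer = 0 To Rows - 1
            For c As Integer = 0 To Cols - 1
                sum += image(r, c)
            Next
        Next
        sw.Stop()
        Console.WriteLine("Row-major:    " & sw.ElapsedMilliseconds & " ms")

        ' Column-major pass: every access jumps a full row (~32 KB) ahead,
        ' touching a different cache line each time.
        sw.Restart()
        For c As Integer = 0 To Cols - 1
            For r As Integer = 0 To Rows - 1
                sum += image(r, c)
            Next
        Next
        sw.Stop()
        Console.WriteLine("Column-major: " & sw.ElapsedMilliseconds & " ms")

        Console.WriteLine(sum) ' keep the loops from being optimized away
    End Sub
End Module
```

On a typical machine the column-major pass comes out noticeably slower, even though both loops execute essentially the same instructions.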
This is just an example; what actually happens inside a powerful modern CPU when code is executed is a very complex process: instructions are broken down into micro-operations, registers are renamed, parts of the code are executed speculatively based on what the branch predictors guess even before the program counter really reaches a certain instruction, and so on. Today, in many cases the only way to know for sure whether something is faster or slower is to try it with real data and measure.
Is the 100 instruction program necessarily faster than the 200 instruction program?
No. First, on some architectures (such as x86) different instructions can take different numbers of cycles. Second, there are effects, such as cache misses, page faults, and branch mispredictions, that complicate the picture further.
From this it follows that the answer to your headline question is "not necessarily".
Further reading:
I found a paper from 2017 comparing the energy usage, speed, and memory consumption of various programming languages. There is a clear positive correlation: faster languages also tend to use less energy.
BACKGROUND:
Every few weeks I need to read about ten thousand URLs for search engine optimization purposes (on sites I own/manage). In the future the total number of URLs will grow by a factor of 20 to support other language versions of my site.
When I use Rebol, it takes about a second per URL to read and process, around 3 hours total. To reduce the time to completion, I want to split the job into a number of smaller batches that can execute simultaneously on multiple (local) interpreters.
Current thinking is that my script will write a number of .r files, each of which, when launched in another process, will work through a subset of the URL list (a rough sketch of the batching idea is below).
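No Rebol code appears in this thread, so here is the batching pattern sketched in VB.NET (the language used later in this document) purely for illustration; the interpreter path, script name, file names, and worker count are all hypothetical stand-ins for the real Rebol setup:

```vbnet
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports System.Linq

Module BatchLauncher
    Sub Main()
        ' Hypothetical inputs: one URL per line, 8 parallel workers (< 10, as planned).
        Dim urls As String() = File.ReadAllLines("urls.txt")
        Dim workerCount As Integer = 8
        Dim workers As New List(Of Process)

        For i As Integer = 0 To workerCount - 1
            ' Round-robin split: worker i gets URLs i, i + workerCount, i + 2*workerCount, ...
            Dim batch = urls.Where(Function(u, idx) idx Mod workerCount = i)
            Dim batchFile As String = String.Format("batch{0}.txt", i)
            File.WriteAllLines(batchFile, batch)
            ' "rebol.exe worker.r" is a placeholder for launching one interpreter per batch.
            workers.Add(Process.Start("rebol.exe", String.Format("worker.r {0}", batchFile)))
        Next

        ' Wait for every batch to finish before collecting results.
        For Each w In workers
            w.WaitForExit()
        Next
    End Sub
End Module
```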
QUESTION:
I'm wondering if there are any tips, warnings, or restrictions for launching interpreter processes in this manner. For HTTP reads, I expect to launch fewer than 10 interpreters.
Happy to share my script and insights as I learn more.
I have a program producing a lot of data, which it writes to a CSV file line by line (as the data is created). If I were able to open the CSV file in Excel, it would be about 1 billion cells (75,000 × 14,600). I get a System.OutOfMemoryException thrown every time I try to access it (or even create an array of this size). If anyone has any idea how I can get the data into VB.NET so I can do some simple operations (all data needs to be available at once), then I'll try every idea you have.
I've looked at increasing the amount of RAM used, but other articles/posts say this will run short well before the 1 billion mark. There are no time constraints here: assuming it's no more than a few days/weeks, I can deal with it (I'll only be running it once or twice a year). If you don't know any way to do it, the only other solutions I can think of would be increasing the number of columns in Excel to ~75,000 (if that's possible; I can't write the data the other way around), or I suppose another language that could handle this?
At present it fails right at the start:
Dim bigmatrix(75000, 14600) As Double ' ~1.1 billion Doubles, roughly 8.8 GB
Many thanks,
Fraser :)
First, this will always require a 64-bit operating system and a fairly large amount of RAM, as you're trying to allocate about 8 GB.
This is theoretically possible in Visual Basic targeting .NET 4.5 if you turn on gcAllowVeryLargeObjects. That said, I would recommend using a jagged array instead of a multidimensional array if possible, as this removes the need for a single 8 GB allocation. (It will also potentially allow this to work in .NET 4 or earlier.)
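As a minimal sketch of the jagged-array approach (the dimensions are from the question; the module and loop structure are mine), each row becomes its own ~117 KB allocation, so no single object comes near the 2 GB per-object limit and gcAllowVeryLargeObjects isn't required, although the ~8.8 GB total still demands a 64-bit process:

```vbnet
Module BigMatrixDemo
    Sub Main()
        Const Rows As Integer = 75000
        Const Cols As Integer = 14600

        ' Jagged array: an array of 75,000 independent row arrays,
        ' each 14,600 Doubles (~117 KB), instead of one 8.8 GB block.
        Dim bigmatrix(Rows - 1)() As Double
        For r As Integer = 0 To Rows - 1
            bigmatrix(r) = New Double(Cols - 1) {} ' allocate one row at a time
        Next

        bigmatrix(0)(0) = 1.23 ' element access: bigmatrix(row)(col)
        Console.WriteLine(bigmatrix(0)(0))
    End Sub
End Module
```

If you do need the single multidimensional array instead, gcAllowVeryLargeObjects is a runtime setting enabled in the application's app.config.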
I'm building an Excel 2007 VSTO template where I load a worksheet with between 10,000 and 15,000 rows of data, and I was counting on Excel's built-in cell auto-complete to speed up future data entry. However, I noticed the auto-complete is very slow: after I start editing a cell, it takes Excel a couple of minutes to finish the "internal scan" of all the cells above in order to find a match. Is there anything I can do to speed it up?
I contacted MS support and they said it was by design and could only be addressed by a change request to the product team, which would take a long time.
I'm looking for some help with a LabVIEW data collection program. I collect 2 ms of data at 8 kHz (which gives 16 data points) per channel; I am collecting data on 4 analog channels with a National Instruments data acquisition board. The DAQmx collection task gives a 1D array of 4 waveforms.
If I don't display the data, I can do all my computation in about 2 ms, and it is OK if the processing loop lags a little behind the collection loop. Updating the chart on LabVIEW's front panel, however, introduces an unacceptable delay. We don't need to update the display very quickly; 5-10 Hz would probably be sufficient, but I don't know how to set this up.
My current LabVIEW VI has three parallel loops:
A timed-loop for data collection
A loop for analysis and processing
A low priority loop for caching data to disk as a TDMS file
Data is passed from the collection loop to the other loops using a queue. LabVIEW examples gave me some ideas, but I am stuck.
Any suggestions, references, ideas would be appreciated.
Thanks
Azim
Follow Up Question
eaolson suggests that I re-sample the data for display purposes. The data coming from the DAQmx read is a one-dimensional array of waveforms, so I would need to somehow build or concatenate the waveform data for each channel and then re-sample it before updating the front panel chart. I suppose the best approach would be to queue the data, then in a display loop dequeue it, build the array, re-sample the data based on screen resolution, and update the chart. Would there be any other approach? I will look on the [NI LabVIEW Forum](http://forums.ni.com/ni/board?board.id=170) for more information, as suggested by eaolson.
Updates
Changed the acceptable update rate for graphs to 5-10 Hz (thanks Underflow and eaolson)
Made the disk cache loop a low-priority one (thanks eaolson)
Thanks for all the responses.
Your overall architecture description sounds solid, but... getting to 30 Hz for any non-trivial graph is going to be challenging. Make sure you really need that rate before trying to make it happen; optimizing to that level might take some time.
References that should be helpful:
You can defer panel updates. This keeps the front panel from refreshing until you're ready for it to do so, allowing you to buffer data in the background, and only draw it occasionally.
You should know about (a)synchronous display. This option allows some control over display rates.
There is some general advice available about speeding execution.
There is a (somewhat dated) report on execution speed on the LAVA forums. Googling around the LAVA forums is a great idea if you need to optimize your speed.
Television updates at about 30 Hz, and anything much beyond that is faster than the human eye can follow. So 30 Hz should be the maximum update rate you consider for a display, not the starting point. Consider an update rate of 5-10 Hz.
LabVIEW charts append the most recent data to the historical data they store and display all the data at once. At 8 kHz, you're acquiring at least 8000 data points per channel per second. That means the array backing that graph has to continuously be resized to hold the new data. Also, even if your graph is 1000 pixels across, that means you're displaying 8 data points per screen pixel. There's not usually any reason to display any more than one data point per pixel. If you really need fast update rates, plot less data. Create an array to hold the historical data and plot only every Nth data point, where N is chosen so you're plotting, say, only a few hundred points.
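Since LabVIEW block diagrams can't be pasted as text, here is the every-Nth-point decimation logic sketched in VB.NET purely for illustration (the function name and target point count are arbitrary):

```vbnet
Imports System.Collections.Generic

Module DecimateDemo
    ' Keep roughly `targetPoints` evenly spaced samples for display.
    Function Decimate(samples As Double(), targetPoints As Integer) As Double()
        Dim stride As Integer = Math.Max(1, samples.Length \ targetPoints)
        Dim kept As New List(Of Double)
        For i As Integer = 0 To samples.Length - 1 Step stride
            kept.Add(samples(i)) ' every Nth point, where N = stride
        Next
        Return kept.ToArray()
    End Function

    Sub Main()
        ' One second at 8 kHz = 8000 samples; plot only ~400 of them.
        Dim raw(7999) As Double
        For i As Integer = 0 To raw.Length - 1
            raw(i) = Math.Sin(2 * Math.PI * 50 * i / 8000) ' 50 Hz test tone
        Next
        Dim plotted As Double() = Decimate(raw, 400)
        Console.WriteLine("raw: " & raw.Length & " points, plotted: " & plotted.Length & " points")
    End Sub
End Module
```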
Remember that your loops can run at different rates. It may be satisfactory to run the write-to-disk loop at a much lower frequency than the data collection rate, maybe every couple of seconds.
Avoid property nodes if you can. They run in the UI thread, which is slower than most other execution.
Other than that, it's really hard to offer a lot of substantial advice without seeing code or more specifics. Consider also asking your question at the NI LabVIEW forums. There are a lot of helpful people there.