Passing data between different speed loops in LabVIEW

Passing data between different speed loops in LabVIEW - labview

I have a (desktop) LabVIEW program running several large While loops. Each loop corresponds to the functions on an IO card in a myRIO DAQ system. Each card operates at a different speed, therefore each loop and subVIs in my code run at different speeds as well.
However, I'm now finding that I need to pass data from a low speed loop to a high speed loop, and I'm not sure how to best go about it.
The low speed loop actually connects via TCP to a Yokogowa power analyzer, and the loop speed is 50ms (20Hz). The high speed loop runs at 50kHz, and performs math operations using inputs from a high speed ADC to calculate motor torque, and needs the info from the low speed loop (power analyzer) to proceed. There's an 816:1 data flow difference.
At runtime, it appears to work fine, until I spin the motor up, then the overtorque routine kicks in and shuts me down.
So I next tried to queue the data, and that only significantly slowed the high speed loop.
That being said, my thought was to take the incoming data on the low speed loop, and fill an array with that data (816 deep) and queue it up to the high speed loop, but I'm not quite certain how to go about that as well.
How should I accomplish what I'm trying to do in a more efficient and proper manner?

Look to the Real-Time FIFO palette. The functions here create and operate a lockless-FIFO system explicitly designed for passing data in a deterministic way between loops. Used correctly, they guarantee that the slower loop, trying to write data, will not lock the FFO in a way that throws the faster loop off of its schedule.
You can find a simple example of the RT FIFO code here. You'll find more in the LabVIEW shipping examples.

If the high speed loop is running faster then it only really needs the latest value so you need a variable/tag type communication.
Depending on what you are aware of already there are a few options:
Local/Global Variable
Functional Global Variable (but globals are faster)
Notifier (if you use get staus you can read this like a variable.
I would pick one you are comfortable with and try that.

Related

Is faster code also more power efficient?

Assume I have a CPU running at a constant rate, pulling an equal amount of energy per instruction. I also have two functionally identical programs, which result in the same output, except one has been optimized to execute only 100 instructions, while the other program executes 200 instructions. Is the 100 instruction program necessarily faster than the 200 instruction program? Does a program with fewer instructions draw less power than a program with more instructions?

Things are much more complex than this.
For example execution speed is in many cases dominated by memory. As a practical example some code could process the pixels of an image first in rows and then in columns... a different code instead could be more complex but processing rows and columns at the same time.
The second version could execute more instructions because of more complex housekeeping of the data but I wouldn't be surprised if it was faster because of how memory is organized: reading an image one column at a time is going to "trash the cache" and it's very possible that despite being simple the code working that way could be a LOT slower than the more complex one doing the processing in a memory-friendly way. The simpler code may end up being "stalled" a lot waiting for the cache lines to be filled or flushed to the external memory.
This is just an example, but in reality what happens inside a CPU when code is executed is for many powerful processors today a very very complex process: instructions are exploded in micro-instructions, registers are renamed, there is speculative execution of parts of code depending on what branch predictors guess even before the program counter really reaches a certain instruction and so on. Today the only way to know for sure if something is faster or slower is in many cases just trying with real data and measure.

Is the 100 instruction program necessarily faster than the 200 instruction program?
No. Firstly, on some architectures (such as x86) different instructions can take a different number of cycles. Secondly, there are effects — such cache misses, page faults and branch mispreditictions — that complicate the picture further.
From this it follows that the answer to your headline question is "not necessarily".
Further reading.

I found a paper from 2017 comparing the energy usage, speed, and memory consumption of various programming languages. There is an obvious positive correlation between faster languages also using less energy.

Does optimizing code in TI-BASIC actually make a difference?

I know in TI-BASIC, the convention is to optimize obsessively and to save as many bits as possible (which is pretty fun, I admit).
For example,
DelVar Z
Prompt X
If X=0
Then
Disp "X is zero"
End //28 bytes
would be cleaned up as
DelVar ZPrompt X
If not(X
"X is zero //20 bytes
But does optimizing code this way actually make a difference? Does it noticeably run faster or save memory?

Yes. Optimizing your TI-Basic code makes a difference, and that difference is much larger than you would find for most programming languages.
In my opinion, the most important optimization to TI-Basic programs is size (making them as small as possible). This is important to me since I have dozens of programs on my calculator, which only has 24 kB of user-accessible RAM. In this case, it isn't really necessary to spend lots of time trying to save a few bytes of space; instead, I simply advise learning the shortest and most efficient ways to do things, so that when you write programs, they will naturally tend to be small.
Additionally, TI-Basic programs should be optimized for speed. Examples off of the top of my head include the quirk with the unclosed For( loop, calculating a value once instead of calculating it in every iteration of a loop (if possible), and using quickly-accessed variables such as Ans and the finance variables whenever the variable must be accessed a large number of times (e.g. 1000+).
A third possible optimization is for run-time memory usage. Every loop, function call, etc. has an overhead that must be stored in the memory stack in order to return to the original location, calculate values, etc. during the program's execution. It is important to avoid memory leaks (such as breaking out of a loop with Goto).
It is up to you to decide how you balance these optimizations. I prefer to:
First and foremost, guarantee that there are no memory leaks or incorrectly nested loops in my program.
Take advantage of any size optimizations that have little or no impact on the program's speed.
Consider speed optimizations, and decide if the added speed is worth the increase in program size.

TI-BASIC is an interpreted language, which usually means there is a huge overhead on every single operation.
The way an interpreted language works is that instead of actually compiling the program into code that runs on the CPU directly, each operation is a function call to the interpreter that look at what needs to be done and then calls functions to complete those sub tasks. In most cases, the overhead is a factor or two in speed, and often also in stack memory usage. However, the memory for non-stack is usually the same.
In your above example you are doing the exact same number of operations, which should mean that they run exactly as fast. What you should optimize are things like i = i + 1, which is 4 operations into i++ which is 2 operations. (as an example, TI-BASIC doesn't support ++ operator).
This does not mean that all operations take the exact same time, internally a operation may be calling hundreds of other functions or it may be as simple as updating a single variable. The programmers of the interpreter may also have implemented various peephole optimizations that optimizes very specific language constructs, e.g. for(int i = 0; i < count; i++) could both be implemented as a collection of expensive interpreter functions that behave as if i is generic, or it could be optimized to a compiled loop where it just had to update the variable i and reevaluate the count.
Now, not all interpreted languages are doomed to this pale existence. For example, JavaScript used to be one, but these days all major js engines JIT compile the code to run directly on the CPU.
UPDATE: Clarified that not all operations are created equal.

Absolutely, it makes a difference. I wrote a full-scale color RPG for the TI-84+CSE, and let me tell you, without optimizing any of my code, the game would flat out not run. At present, on the CSE, Sorcery of Uvutu can only run if every other program is archived and all other memory is out of RAM. The programs and data storage alone takes up 20k bytes in RAM, or just 1kb under all of available user memory. With all the variables in use, the memory approaches dangerously low points. I had points in my development where due to poor optimizations, I couldn't even start the game without getting a "memory all gone" error. I had plans to implement various extra things, but due to space and speed concerns, it was impossible to do so. That's only the consideration to space.
In the speed department, the game became, and still is, slow in the overworld. Walking around in the overworld is painfully slow compared to other games, and that's because of what I have to do in that code; I have to check for collisions, check if the user is moving to a new map, check if they pressed a key that should illicit a response, check if a battle should go on, and more. I was able to make slight optimizations to the walking speed, but even then, I could blatantly tell I had made improvements. It still was pretty awfully slow (at least compared to every other port I've made), but I made it a little more tolerable.
In summary, through my own experiences crafting a large project, I can say that in TI-Basic, optimizing code does make a difference. Other answers mentioned this, but TI-Basic is an interpreted language. This means the code isn't compiled into faster, lower level code, but the stuff that you put in the program is read straight out as it executes, is interpreted by the interpreter, calls the subroutines and other stuff it needs to to execute the commands, and then returns back to read the next line. As a result of that, and the fact that the TI-84+ series CPU, the Zilog Z80, was designed in 1976, you get a rather slow interpreter, especially for this day and age. As such, the fewer the commands you run, and the more you take advantage of system weirdness such as Ans being the fastest variable that can also hold the most types of data (integers/floats, strings, lists, matrices, etc), the better the performance you're gonna get.
Sources: My own experiences, documented here: https://codewalr.us/index.php?topic=778.msg27190#msg27190
TI-84+CSE RAM numbers came from here: https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-c-se?category=specifications
Information about the Z80 came from here: http://segaretro.org/Zilog_Z80

Depends, if it's just a basic math program then no. For big games then YES. The TI-84 has only 3.5MB of space available and has the combo of an ancient Z80 processor and a whopping 128KB of RAM. TI-BASIC is also quite slow as it's interpreted (look it up for further information) so if you to make fast-running games then YES. Optimization is very important.

An example: Am I understanding GPU advantage correctly?

Just reading a bit about what the advantage of GPU is, and I want to verify I understand on a practical level. Lets say I have 10,000 arrays each containing a billion simple equations to run. On a cpu it would need to go through every single equation, 1 at a time, but with a GPU I could run all 10,000 arrays as as 10,000 different threads, all at the same time, so it would finish a ton faster...is this example spot on or have I misunderstood something?

I wouldn't call it spot on, but I think you're headed in the right direction. Mainly, a GPU is optimized for graphics-related calculations. This does not, however, mean that's all it is capable of.
Without knowing how much detail you want me to go into here, I can say at the very least the concept of running things in parallel is relevant. The GPU is very good at performing many tasks simultaneously in one go (known as running in parallel). CPUs can do this too, but the GPU is specifically optimized to handle much larger numbers of specific calculations with preset data.
For example, to render every pixel on your screen requires a calculation, and the GPU will attempt to do as many of these calculations as it can all at the same time. The more powerful the GPU, the more of these it can handle at once and the faster its clock speed. The end result is a higher-end GPU can run your OS and games in 4k resolution, whereas other cards (or integrated graphics) might only be able to handle 1080p or less.
There's a lot more to this as well, but I figured you weren't looking for the insanely technical explanation.
The bottom line is this: For running a single task on one piece of data, the CPU will normally be faster. A single CPU core is generally much faster than a single GPU core. However, they typically have many cores and for running a single task on many pieces of data (so you have to run it once for each), the GPU will usually be faster. But these are data-driven situations, and as such each situation should be assessed on an individual basis to determine which to use and how to use it.

updating 2 800 000 records with 4 threads

I have a VB.net application with an Access Database with one table that contains about 2,800,000 records, each raw is updated with new data daily. The machine has 64GB of ram and i7 3960x and its over clocked to 4.9GHz.
Note: data sources are local.
I wonder if I use ~10 threads will it finish updating the data to the rows faster.
If it is possiable what would be the mechanisim of deviding this big loop to multiple threads?
Update: Sometimes the loop has to repeat the calculation for some row depending on results also the loop have exacly 63 conditions and its 242 lines of code.

Microsoft Access is not particularly good at handling many concurrent updates, compared to other database platforms.
The more your tasks need to do calculations, the more you will typically benefit from concurrency / threading. If you spin up 10 threads that do little more than send update commands to Access, it is unlikely to be much faster than it is with just one thread.
If you have to do any significant calculations between reading and writing data, threads may show a performance improvement.
I would suggest trying the following and measuring the result:
One thread to read data from Access
One thread to perform whatever calculations are needed on the data you read
One thread to update Access
You can implement this using a Producer / Consumer pattern, which is pretty easy to do with a BlockingCollection.
The nice thing about the Producer / Consumer pattern is that you can add more producer and/or consumer threads with minimal code changes to find the sweet spot.
Supplemental Thought
IO is probably the bottleneck of your application. Consider placing the Access file on faster storage if you can (SSD, RAID, or even a RAM disk).

Well if you're updating 2,800,000 records with 2,800,000 queries, it will definitely be slow.
Generally, it's good to avoid opening multiple connections to update your data.
You might want to show us some code of how you're currently doing it, so we could tell you what to change.
So I don't think (with the information you gave) that going multi-thread for this would be faster. Now, if you're thinking about going multi-thread because the update freezes your GUI, now that's another story.
If the processing is slow, I personally don't think it's due to your servers specs. I'd guess it's more something about the logic you used to update the data.

Don't wonder, test. Write it so you could dispatch as much threads to make the work and test it with various numbers of threads. What does the loop you are talking about look like?
With questions like "if I add more threads, will it work faster"? it is always best to test, though there are rule of thumbs. If the DB is local, chances are that Oded is right.

Scattered-write speed versus scattered-read speed on modern Intel or AMD CPUs?

I'm thinking of optimizing a program via taking a linear array and writing each element to a arbitrary location (random-like from the perspective of the CPU) in another array. I am only doing simple writes and not reading the elements back.
I understand that a scatted read for a classical CPU can be quite slow as each access will cause a cache miss and thus a processor wait. But I was thinking that a scattered write could technically be fast because the processor isn't waiting for a result, thus it may not have to wait for the transaction to complete.
I am unfortunately unfamiliar with all the details of the classical CPU memory architecture and thus there may be some complications that may cause this also to be quite slow.
Has anyone tried this?
(I should say that I am trying to invert a problem I have. I currently have an linear array from which I am read arbitrary values -- a scattered read -- and it is incredibly slow because of all the cache misses. My thoughts are that I can invert this operation into a scattered write for a significant speed benefit.)

In general you pay a high penalty for scattered writes to addresses which are not already in cache, since you have to load and store an entire cache line for each write, hence FSB and DRAM bandwidth requirements will be much higher than for sequential writes. And of course you'll incur a cache miss on every write (a couple of hundred cycles typically on modern CPUs), and there will be no help from any automatic prefetch mechanism.

I must admit, this sounds kind of hardcore. But I take the risk and answer anyway.
Is it possible to divide the input array into pages, and read/scan each page multiple times. Every pass through the page, you only process (or output) the data that belongs in a limited amount of pages. This way you only get cache-misses at the start of each input page loop.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas