I need to resample an arbitrary number of complex signals, perform some miscellaneous operations on them, and finally sum them and save them to a file. The length of the signals forces me to buffer the signals into chunks and operate on them as such.
Most (all that I could find) resampling VIs can operate on chunks, using a reset flag to differentiate between new and appended data. My issue is that I would like to resample my signals in parallel (or at least in an interleaved fashion), which doesn't work because the resample VI keeps its previous state. A way around this would be to resample each signal sequentially, save it to a temporary file, and then operate on the new files. This is a poor solution.
Practically, what I need (I think) is for the resampling VI to be cloneable, so that I could make an instance of it for each signal. The VI I am currently using is the Rational Resample VI.
Any ideas?
The Rational Resample VI is polymorphic, so you can just select the "Multi-Channel" instance to process several channels directly.
Moreover, even a single Rational Resample VI is defined as "Preallocated clones reentrant execution" (LV2014, 32-bit, Windows). So if you place several Rational Resample VIs into several different loops, each of them will maintain its own state (independent of the other instances). They will execute in parallel as far as the LabVIEW execution system allows. Source: http://zone.ni.com/reference/en-XX/help/371361J-01/lvconcepts/reentrancy/
I have three inputs to Merge Signals that arrive at different times. The output of Merge Signals appears to wait for all the signals and then outputs them together. What I want is an output for every signal (on the current output) as soon as it arrives.
For example: if I write 1 as the initial value and 5, 5, 5 in all three numerics, with a 3 s time delay, I will have 6, 11, and 16 in target 1, target 2, and target 3, and overall 16 on the current output. I don't want that to appear all at once on the current output; I want it to appear as it does in the targets, with the same timing.
Please see the attached photo.
Can anyone help me with that?
Thanks.
All nodes in LabVIEW fire when all their inputs arrive. This language uses synchronous data flow, not asynchronous (which is the behavior you were describing).
The output of Merge Signals is a single data structure that contains all the input signals — merged, like the name says. :-)
To get the behavior you want, you need some sort of asynchronous communication. In older versions of LabVIEW, I would tell you to create a queue refnum and go look at examples of a producer/consumer pattern.
But in LabVIEW 2016 and later, right-click each of the tunnels coming out of your flat sequence and choose "Create>>Channel Writer...". In the dialog that appears, choose the Messenger channel. Wire all the outputs of the new nodes together. This creates an asynchronous wire, which draws very differently from your regular wires. On the wire, right-click and choose "Create>>Channel Reader...". Put the reader node inside a For Loop and wire a 3 to the N terminal. Now you have the behavior that as each block finishes, it sends its data to the loop.
Move the Write nodes inside the Flat Sequence if you want to guarantee the enqueue order. If you do the Writes outside instead, you'll sometimes get out-of-order data (i.e. when the data-generation nodes happen to run quickly).
Side note: I (and most LabVIEW architects) would strongly recommend you avoid using sequence structures as much as possible. They’re a bad habit to get into — lots of writings online about their disadvantages.
I have a (desktop) LabVIEW program running several large While loops. Each loop corresponds to the functions on an IO card in a myRIO DAQ system. Each card operates at a different speed, therefore each loop and subVIs in my code run at different speeds as well.
However, I'm now finding that I need to pass data from a low speed loop to a high speed loop, and I'm not sure how to best go about it.
The low-speed loop actually connects via TCP to a Yokogawa power analyzer, and the loop period is 50 ms (20 Hz). The high-speed loop runs at 50 kHz and performs math operations using inputs from a high-speed ADC to calculate motor torque; it needs the info from the low-speed loop (power analyzer) to proceed. There's an 816:1 data-flow difference.
At runtime it appears to work fine until I spin the motor up; then the overtorque routine kicks in and shuts me down.
So I next tried to queue the data, and that only significantly slowed the high speed loop.
That being said, my thought was to take the incoming data from the low-speed loop, fill an array with that data (816 deep), and queue it up to the high-speed loop, but I'm not quite certain how to go about that either.
How should I accomplish what I'm trying to do in a more efficient and proper manner?
Look to the Real-Time FIFO palette. The functions here create and operate a lockless FIFO system explicitly designed for passing data in a deterministic way between loops. Used correctly, they guarantee that the slower loop, trying to write data, will not lock the FIFO in a way that throws the faster loop off its schedule.
You can find a simple example of the RT FIFO code here. You'll find more in the LabVIEW shipping examples.
If the high-speed loop is running faster, then it really only needs the latest value, so you need a variable/tag style of communication.
Depending on what you are already aware of, there are a few options:
Local/Global Variable
Functional Global Variable (but globals are faster)
Notifier (if you use Get Status you can read this like a variable)
I would pick one you are comfortable with and try that.
I have a flow graph with a file source (with repeat off) and a GUI Time Sink. The graph is throttled by a throttle block at 2 samples / sec. I expect to see two new samples in my GUI Time Sink every second. However, instead of 1-second updates, the GUI Time Sink doesn't display anything at all. If I turn repeat on on the file source, the GUI Time Sink does update. Why doesn't it update when repeat is off?
My question is similar to this one. In my case, I also have a file source throttled down to a very slow sample rate. However, my sink is a GUI Time Sink, not a file sink, and I see no option for an "Unbuffered" parameter on the Time Sink.
(Screenshots attached: my flow graph, with repeat off and with repeat on.)
This is actually multiple problems in one:
You're assuming the time sink will show two new values when they come in. That's not true: it will only update the display once it has (at least) as many new items as the number of points you configured it to show.
You're assuming GNU Radio will happily read single items (or two) at a time. Typically, that is not the case: it will ask the file source for as many items as there is space in the output buffer, something like 8192 (the exact number is not fixed).
Also, Throttle doesn't work the way you think it does. It takes the number of input samples it gets in each call to its work function (e.g. 8192), divides that number by the throttle rate you set, and then simply blocks for that many seconds; at 2 samples/s, a single 8192-item call would block for about 4096 seconds. Throttle regulates the average rate over a longer time scale, or, in your really minimal-rate case, a very long time scale.
You can limit the number of items in an output buffer, but not below a page size (4kB); for complexes that is 1024 items at least.
I think the classical graphical GNU Radio sinks might just not be the right thing to analyze files sample-by-sample.
I recommend trying the example flow graphs that come with Tim O'Shea's gr-pyqt. They are very handy for this kind of analysis.
I have this molecular dynamics program that writes atom positions and velocities to a file every n steps of the simulation. The actual writing is taking something like 90% of the running time! (I checked by eliminating the writes.) So I desperately need to optimize that.
I see that some Fortran compilers have an extension to change the write buffer size (called the I/O block size) and the "number of blocks" in the OPEN statement, but it appears that gfortran doesn't. I also read somewhere that gfortran uses an 8192-byte write buffer.
I even tried to do an FSTAT (right after opening; is that right?) to see what block size and number of blocks it is using, but it returns -1 for both. (I'm compiling for Windows, 64-bit.)
Isn't there a way to enlarge the write buffer for a file in gfortran? Will it be different when compiling for Linux rather than Windows?
I'd really, really rather stay in Fortran, but as a desperate measure, isn't there a way to do this by adding some C routine?
Thanks!
IanH's question is key. Unformatted I/O is MUCH faster than formatted. The conversion from base 2 to base 10 is very CPU intensive. If you don't need the values to be human readable, use unformatted I/O. If you want to be able to read the values in another language, then use access='stream'.
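As a minimal sketch of that suggestion (the array name, size, and file name are invented for illustration, not taken from the question):

```fortran
! Hypothetical example (not from the original post): write a coordinate array
! as raw binary.  form='unformatted' avoids the base-2 to base-10 conversion,
! and access='stream' omits record markers so the file is readable from C too.
program stream_write_demo
  implicit none
  integer, parameter :: n = 100000
  real(8) :: pos(3, n)
  integer :: u

  call random_number(pos)

  open(newunit=u, file='traj.bin', form='unformatted', access='stream', &
       status='replace')
  write(u) pos          ! one statement writes the whole array
  close(u)
end program stream_write_demo
```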
Another approach would be to add your own buffering. Replace the write statement with a call to a subroutine. Have that subroutine store values and write only when it has received M values. You'll also have to have a "flush" call to the subroutine to make it write the last values if there are fewer than M.
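A rough sketch of such a buffering layer, with all names and the buffer length M invented for illustration:

```fortran
! Illustrative buffering module: collect values and write them in blocks of M.
! All names and the value of m are invented for this sketch.
module buffered_io
  implicit none
  private
  public :: buf_put, buf_flush

  integer, parameter :: m = 65536      ! buffer length; tune to taste
  real(8) :: buf(m)
  integer :: nbuf = 0

contains

  subroutine buf_put(unit, val)
    integer, intent(in) :: unit
    real(8), intent(in) :: val
    nbuf = nbuf + 1
    buf(nbuf) = val
    if (nbuf == m) call buf_flush(unit)   ! write only when the buffer is full
  end subroutine buf_put

  subroutine buf_flush(unit)
    integer, intent(in) :: unit
    if (nbuf > 0) write(unit) buf(1:nbuf) ! one unformatted write per block
    nbuf = 0
  end subroutine buf_flush

end module buffered_io
```

You would call buf_put wherever the original code had a write statement, and buf_flush once before closing the file.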
If gcc C is faster at IO, you could mix Fortran and C with Fortran's ISO_C_Binding: https://stackoverflow.com/questions/tagged/fortran-iso-c-binding. There are examples of the use of the ISO C Binding in the gfortran manual under "Mixed Language Programming".
If you spend 90% of your runtime writing coordinates/velocities every n timesteps, the obvious quick fix would be to write data less often, say every 100*n timesteps. But I'm sure you already thought of that yourself.
But yes, gfortran has a fixed 8k buffer, whose size cannot be changed except by modifying the libgfortran source and rebuilding it. The reason for the buffering is to amortize the syscall overhead; (simplistic) tests on Linux showed that 8k is sufficient and more than that goes far into diminishing returns territory. That being said, if you have some substantiated claims that bigger buffers are useful on some I/O patterns and/or OS, there's no reason why the buffer can't be made larger in a future release.
As for your performance issues, as already mentioned, unformatted I/O is a lot faster than formatted I/O. Additionally, gfortran has rather high per-I/O-statement overhead. You can amortize that by writing arrays (or array sections) rather than individual elements (this matters mostly for unformatted I/O; for formatted I/O there is so much other work to do that it doesn't help as much).
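For example, the difference between element-by-element writes and a single array write might look like this (names and sizes are made up, not from the original program):

```fortran
! Hypothetical comparison: many small writes vs. one array write.
program write_overhead_demo
  implicit none
  integer, parameter :: n = 1000000
  real(8) :: x(n)
  integer :: u, i

  call random_number(x)
  open(newunit=u, file='scratch.bin', form='unformatted', access='stream', &
       status='replace')

  ! Slow: one I/O statement (and its fixed overhead) per element.
  do i = 1, n
     write(u) x(i)
  end do

  ! Fast: a single I/O statement for the whole array.
  write(u) x

  close(u)
end program write_overhead_demo
```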
I am thinking that if the cost of I/O is comparable to, or even larger than, the cost of the simulation itself, then it probably isn't such a good idea to store all these data to disk in the first place. It is better to do whatever processing you intend to do directly during the simulation, instead of saving lots of intermediate data and later reading them back in to do the processing.
Moreover, MD is an inherently highly parallelizable problem, and with I/O you will severely cripple the efficiency of parallelization! I would avoid I/O whenever possible.
For individual trajectories, you normally just need to store the initial condition of each trajectory, along with its key statistics or important snapshots at a small number of time values. When you need one specific trajectory plotted, you can regenerate the exact same trajectory, or section of trajectory, from the initial condition or the closest snapshot, at a cost similar to reading it from disk.
I have a FORTRAN MPI code to solve a flow field.
At the start I want to read data from file and distribute it to the participating processes.
The data consists of several 3-D arrays (velocities in space: x, y, z).
Every process stores only a part of the array.
So having every process read the file (the easiest way, I think) is not going to work, as each process would only store the first part of the file, corresponding to the portion of the arrays that it can hold.
Can MPI_Bcast work for 3-D arrays? But then things become complex.
Or is there an easier way?
You have, broadly speaking, 2 or 3 choices, depending on your platform.
One process reads the input data and sends (parts of) it to the other processes. I wouldn't usually use broadcast for this since it is a collective operation and all processes have to take part. I'd usually just send the necessary information to each process. If it is convenient (and not a memory issue) you could certainly broadcast all the input data to all the processes, it's just not a pattern of operation that I use or see much.
All processes read the data that they require. This may involve a process reading an entire input file and only storing those parts it requires. But if you have very large input files you can write routines to read only the necessary part into each process's memory space. This approach may involve processes competing for disk access, which is only slow in a relative sense: if you are running large-scale and long-running parallel computations waiting a few seconds while all the processes get their data is not much of an overhead.
If you have a parallel file system then you can use MPI's parallel I/O routines so that each process reads only those parts of the input data that it requires.
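As a hedged sketch of the first option above, assume for illustration a single nx-by-ny-by-nz velocity array stored as a raw binary file and decomposed into z-slabs (the file name, sizes, and decomposition are assumptions, not from the question):

```fortran
! Illustrative only: rank 0 reads a velocity field from a raw binary file and
! sends each rank its slab of z-planes.  Assumes nz divides evenly by the
! number of ranks; file name and sizes are placeholders.
program scatter_field
  use mpi
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  integer :: rank, nprocs, ierr, u, p, nzloc
  real(8), allocatable :: uglob(:,:,:), uloc(:,:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  nzloc = nz / nprocs
  allocate(uloc(nx, ny, nzloc))

  if (rank == 0) then
     allocate(uglob(nx, ny, nz))
     open(newunit=u, file='velocity.bin', form='unformatted', access='stream')
     read(u) uglob
     close(u)
     uloc = uglob(:, :, 1:nzloc)              ! rank 0 keeps the first slab
     do p = 1, nprocs - 1
        call MPI_Send(uglob(:, :, p*nzloc+1:(p+1)*nzloc), nx*ny*nzloc, &
                      MPI_DOUBLE_PRECISION, p, 0, MPI_COMM_WORLD, ierr)
     end do
  else
     call MPI_Recv(uloc, nx*ny*nzloc, MPI_DOUBLE_PRECISION, 0, 0, &
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  end if

  call MPI_Finalize(ierr)
end program scatter_field
```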
The canonical way to handle such an I/O pattern in MPI is to either:
Read the data on rank 0, then use MPI_Scatter to distribute it. If memory is tight, do this blockwise, or use 1-to-1 communication rather than MPI_Scatter.
Use MPI-I/O, and have each rank read its own subset of the data file (to be useful, this of course requires a file format where you can figure out the boundaries without first reading through the entire file).
For extreme scalability, one can combine the two approaches: a subset of processes (say, sqrt(N) as a rough rule of thumb) does the MPI I/O, and each I/O process then exchanges the data with its own group of processes.
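A sketch of the MPI-I/O route, under the same assumed raw-binary layout and z-slab decomposition as the sketch above (again purely illustrative):

```fortran
! Illustrative MPI-I/O read: every rank reads only its own slab of the file.
program mpiio_read
  use mpi
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  integer :: rank, nprocs, ierr, fh, nzloc
  integer(kind=MPI_OFFSET_KIND) :: offset
  real(8), allocatable :: uloc(:,:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  nzloc = nz / nprocs
  allocate(uloc(nx, ny, nzloc))

  ! Byte offset of this rank's slab in a plain stream of 8-byte reals.
  offset = int(rank, MPI_OFFSET_KIND) * nx * ny * nzloc * 8

  call MPI_File_open(MPI_COMM_WORLD, 'velocity.bin', MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)
  call MPI_File_read_at_all(fh, offset, uloc, nx*ny*nzloc, &
                            MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)
  call MPI_File_close(fh, ierr)

  call MPI_Finalize(ierr)
end program mpiio_read
```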
If you are running your code on fewer than 1000 cores with a good file system (e.g. Lustre), then just use Fortran I/O, where each rank opens the file and reads the data it needs (skipping the rest). Yes, it takes a few minutes, but you're only reading the file once, during startup.
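A minimal sketch of that approach with plain Fortran stream I/O, where each rank positions itself at its own byte offset and reads only its slab (layout and sizes are the same assumptions carried over from the sketches above):

```fortran
! Illustrative: every rank opens the raw binary file itself and reads only its
! slab, using a stream-access positioned read to skip the rest.
program each_rank_reads
  use mpi
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  integer :: rank, nprocs, ierr, u, nzloc
  integer(8) :: pos
  real(8), allocatable :: uloc(:,:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  nzloc = nz / nprocs
  allocate(uloc(nx, ny, nzloc))

  ! Stream positions are 1-based byte addresses; skip the slabs of lower ranks.
  pos = 1_8 + int(rank, 8) * int(nx, 8) * ny * nzloc * 8

  open(newunit=u, file='velocity.bin', form='unformatted', access='stream', &
       status='old', action='read')
  read(u, pos=pos) uloc
  close(u)

  call MPI_Finalize(ierr)
end program each_rank_reads
```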
MPI I/O (binary only) is non-trivial, and you are usually better off using higher-level libraries such as HDF5 or Parallel netCDF. Performance will depend on how the data is read (contiguous vs. non-contiguous and so on). The following links may be helpful ...
http://www.osc.edu/supercomputing/training/pario/parallel-io-nov04.pdf
https://support.scinet.utoronto.ca/wiki/images/0/01/Parallel_io_course.pdf